

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
8220419

Ahmed, Hassan Masud

SIGNAL PROCESSING ALGORITHMS AND ARCHITECTURES

Stanford University Ph.D. 1982

University Microfilms International, 300 N. Zeeb Road, Ann Arbor, MI 48106

Copyright 1982
by
Ahmed, Hassan Masud
All Rights Reserved

PLEASE NOTE:

In all cases this material has been filmed in the best possible way from the available copy. Problems encountered with this document have been identified here with a check mark ✓.

1. Glossy photographs or pages ____
2. Colored illustrations, paper or print ____
3. Photographs with dark background ____
4. Illustrations are poor copy ____
5. Pages with black marks, not original copy ____
6. Print shows through as there is text on both sides of page X
7. Indistinct, broken or small print on several pages
8. Print exceeds margin requirements ____
9. Tightly bound copy with print lost in spine ____
10. Computer printout pages with indistinct print ____
11. Page(s) ____ lacking when material received, and not available from school or author.
12. Page(s) 9 seem to be missing in numbering only as text follows.
13. Two pages numbered ____. Text follows.
14. Curling and wrinkled pages ____
15. Other ____

University Microfilms International

SIGNAL PROCESSING ALGORITHMS
AND
ARCHITECTURES

A DISSERTATION

SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING

AND THE COMMITTEE ON GRADUATE STUDIES

OF STANFORD UNIVERSITY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

Hassan M. Ahmed

June 1982

© Copyright 1982

by

Hassan M. Ahmed
I certify that I have read this thesis and that in my opinion it is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

(Principal Adviser)

I certify that I have read this thesis and that in my opinion it is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

I certify that I have read this thesis and that in my opinion it is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Approved for the University Committee on Graduate Studies:

Dean of Graduate Studies & Research
ABSTRACT

The advent of Very Large Scale Integration (VLSI) technology has provided the ability to construct large systems on a single silicon chip. This dissertation is concerned with exploiting this ability to design a powerful signal processing chip capable of efficiently implementing such popular algorithms as the discrete Fourier transform, ladder filters and associated matrix algebra operations. The latter include Givens rotations and Cholesky factorization.

The goal of the present work is to map algorithms efficiently onto architectures by maintaining a close link with the theoretical basis of a particular signal processing method. It is shown that all of the algorithms considered can be cast into a mathematical framework involving generalized vector rotations. Such rotation operations provide a natural description of the algorithms, and the computational complexity measured in terms of these elementary operations is much lower than in terms of the usual measure, the total number of multiplications. Thus, unlike present-day signal processing computers, which emphasize rapid multiplication, the signal processing architectures in this thesis are based on the ability to perform vector rotations in generalized coordinate systems.
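The paragraph above frames every algorithm as a vector rotation in a generalized coordinate system. As a minimal sketch of what that can mean, the Python below assumes the unified formulation usually credited to Walther, in which a parameter m selects circular (m = 1), linear (m = 0) or hyperbolic (m = -1) coordinates and the rotation leaves the quantity x² + m·y² unchanged. The names `generalized_rotate` and `invariant` are hypothetical, and the closed-form cos/cosh evaluation is for illustration only; it is not the thesis's own derivation.

```python
import math

def generalized_rotate(x, y, theta, m):
    """Rotate (x, y) by 'angle' theta in the coordinate system selected by m.

    m = 1  : circular rotation, preserves x**2 + y**2
    m = 0  : linear 'rotation' (a shear), preserves x
    m = -1 : hyperbolic rotation, preserves x**2 - y**2
    (Assumed Walther-style convention; illustrative only.)
    """
    if m == 1:
        c, s = math.cos(theta), math.sin(theta)
        return x * c - y * s, y * c + x * s
    if m == 0:
        return x, y + x * theta
    if m == -1:
        c, s = math.cosh(theta), math.sinh(theta)
        return x * c + y * s, y * c + x * s
    raise ValueError("m must be 1, 0 or -1")

def invariant(x, y, m):
    """The generalized 'norm' x^2 + m*y^2 left unchanged by the rotation."""
    return x * x + m * y * y
```

For instance, rotating (3, 1) by 0.4 with m = -1 leaves x² - y² = 8 unchanged up to rounding, just as the circular case preserves the Euclidean length.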

It is shown that the CORDIC algorithm of Volder provides a convenient implementation of vector rotations with only simple components such as adders, registers and shifters. Unfortunately, throughput is severely compromised owing to the need for performing special operations to account for the limited region of convergence and the spurious scale constants inherent to the method. New techniques that circumvent these problems with no additional hardware and only a marginal speed penalty are described.
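A minimal floating-point sketch of Volder's CORDIC iteration in circular rotation mode follows. It assumes a fixed iteration count n and uses multiplication by 2**-i as a stand-in for the hardware's arithmetic right shift; the arctangent table would live in a small ROM. The function name `cordic_rotate` is hypothetical. The sketch also exhibits the two difficulties named above: the raw result is stretched by the spurious constant K = ∏ sqrt(1 + 2^(-2i)) ≈ 1.6468, and the iteration only converges for |θ| up to Σ atan(2^(-i)) ≈ 1.7433 rad (about 99.9°). The naive remedies used here (a final division by K, and simply rejecting larger angles) are the kind of overhead the new techniques are meant to avoid; they are not those techniques.

```python
import math

def cordic_rotate(x, y, theta, n=32):
    """Rotate (x, y) by theta radians using n CORDIC micro-rotations.

    Each step uses only add/subtract, a scaling by 2**-i (a right shift
    in fixed-point hardware) and a lookup of atan(2**-i).
    """
    # Table of elementary angles; in hardware this would be a small ROM.
    angles = [math.atan(2.0 ** -i) for i in range(n)]
    # Domain of convergence: the sum of all elementary angles.
    limit = sum(angles)
    if abs(theta) > limit:
        raise ValueError(f"|theta| must not exceed {limit:.4f} rad")
    z = theta
    for i in range(n):
        sigma = 1.0 if z >= 0 else -1.0   # rotate so as to drive z toward 0
        shift = 2.0 ** -i                  # stands in for '>> i'
        x, y = x - sigma * y * shift, y + sigma * x * shift
        z -= sigma * angles[i]
    # Each micro-rotation stretched the vector by sqrt(1 + 2**(-2i)).
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(n))
    # Removing the spurious scale factor costs a full multiply/divide here.
    return x / gain, y / gain
```

Calling `cordic_rotate(1.0, 0.0, 0.7)` should return (cos 0.7, sin 0.7) to within roughly the last elementary angle, atan(2**-31); angles beyond about 1.7433 rad are rejected rather than pre-rotated, which is the convergence-region problem in its plainest form.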

Further speed enhancements through the use of a newly developed method known as hybrid CORDIC are discussed. Additionally, floating point CORDIC (FLORDIC) algorithms that are conceptually simpler than their fixed point counterparts are developed, and the connection of CORDIC to the convergence computation methods is shown.

The architecture of a dual CORDIC block chip is described for a target application of real-time speech analysis. The resulting chip is shown to have a higher throughput per area than conventional chips based on fast multiplication. This is attributed to the close match of the present chip to the algorithms.

Large mesh-connected processor architectures for matrix factorization are developed which are also closely matched to the algorithms. Individual processing elements in the mesh are based on CORDIC operations, in fact on the aforementioned signal processing chip.
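To make "matrix factorization built from rotations" concrete, the sketch below reduces a matrix to upper-triangular form (the R factor of a QR decomposition) purely by Givens rotations, each zeroing one subdiagonal entry. The rotation angle is found from the pair being annihilated (what CORDIC does in its vectoring mode) and then applied to the rest of the two rows (rotation mode). Plain floating-point `math.hypot` stands in for CORDIC, the function name `givens_triangularize` is hypothetical, and the sequential loop order is just one choice; it is not the array schedule developed in the thesis.

```python
import math

def givens_triangularize(a):
    """Reduce a (list of row lists) to upper-triangular form with Givens rotations.

    Returns a new matrix R. The rotations are orthogonal, so every column
    of R has the same Euclidean norm as the corresponding input column.
    """
    r = [row[:] for row in a]
    rows, cols = len(r), len(r[0])
    for j in range(min(rows - 1, cols)):
        for i in range(j + 1, rows):
            p, q = r[j][j], r[i][j]
            if q == 0.0:
                continue
            # 'Vectoring': choose the rotation that maps (p, q) onto (h, 0).
            h = math.hypot(p, q)
            c, s = p / h, q / h
            # 'Rotation': apply the same plane rotation to rows j and i.
            for k in range(j, cols):
                u, v = r[j][k], r[i][k]
                r[j][k] = c * u + s * v
                r[i][k] = -s * u + c * v
    return r
```

Each rotation touches only two rows, which is what allows such a factorization to be spread over an array of rotation processors.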

Finally, a new technique for signal detection in additive Gaussian noise is developed with a view towards ease of implementation. It is based on ladder filters and may be implemented using the signal processing chip mentioned above.

ACKNOWLEDGMENTS

Studying towards a Ph.D. degree is much more than an educational experience. During this period, many important friendships are formed. While I do indeed intend to acknowledge many individuals for their direct contributions to my thesis, I would like to thank them and many more at the outset for their friendships, which I have cherished.

Professor Martin Morf has unquestionably been my mentor. His wide-ranging interests and abilities have afforded me the opportunity to pursue my own interests, which have often been away from the mainstream of the Information Systems Laboratory. For this and for the pleasure of his interactions on my research, I am very grateful.

I am very happy to acknowledge Professor James D. Meindl for his constant encouragement, his many comments that aided in focusing my research and his careful review of the manuscript. During the course of a Ph.D. in a field that is as volatile as VLSI is in the industrial arena, one is often plagued with doubts about the merit of pursuing a degree. I am grateful to Professor Meindl for encouraging me to complete my studies (which, in retrospect, I see was the correct decision); in him, I've truly found a friend.

Professor John L. Hennessy provided many useful comments during his reading of the thesis that led to its improvement. I would like to thank him for those, as well as for his speedy review, which allowed me to "run off to Europe" before the end of the quarter.

It has been a pleasure to have worked with Peng Ang on the design of the chip (Chapter Six) and with Jean-Marc Delosme on the multiprocessor array (Chapter Five). Their contributions through an uncountable number of stimulating discussions are integral to those chapters. Valuable discussions with Professor Thomas Kailath and Professor Abbas ElGamal are also gratefully acknowledged.

The lab secretaries, especially Barbara, Rachel, Kathy, Charlotte, Mieko and Jean, are all good friends. I'm convinced, more than ever, that they hold the lab together. I am grateful for their friendships and for all the fun discussions. I feel the same very special attachment to my roommates (ex-roommates) Rich Baker and Peter Glynn.

The support of the Defense Advanced Research Projects Agency and the Natural Sciences and Engineering Research Council of Canada at various phases of my stay at Stanford is gratefully acknowledged. CODEX Corporation, particularly Dr. G.D. Forney and Dr. S.U.H. Qureshi, has provided me with the opportunity to keep my hand in the "industrial pie" while studying. For this, I am very thankful.

Mrs. Rachel Levy typed this manuscript with excellence and speed. However, when the deadlines finally came, it would have been impossible for me to have completed the dissertation on time without her selfless willingness to devote her time solely to it. To her and to my friend, Mr. Amr Badawi, who did the "legwork" to submit the thesis, I am eternally grateful.

Finally, these acknowledgements would be incomplete without expressing my deep love and thanks to my parents for their patience, love and encouragement. (Being three thousand miles apart hasn't been easy for any of us.) So, thanks Mom and Dad!!

I dedicate this thesis to my mother and father and to my good friend Pegge, the three people who have shown me more of life and love than many experience in a lifetime.

TABLE OF CONTENTS

Chapter    Page

1. INTRODUCTION .... 1
BIBLIOGRAPHY .... 5

2. SIGNAL PROCESSING ALGORITHMS .... 10
2.1 THE DISCRETE FOURIER TRANSFORM .... 11
2.2 EXACT LEAST SQUARES LADDER ALGORITHMS .... 11
BIBLIOGRAPHY .... 16

3. APPLICATIONS OF LADDER FORMS .... 18
3.1 THE SPEECH ANALYSIS PROBLEM .... 18
3.1.1 Speech Synthesis Techniques .... 19
3.1.2 Speech Analysis .... 21
3.1.3 Speech Analysis with Square Root Normalized Ladder Forms: The Analysis Filter .... 23
3.1.4 An Alternate View of the Square Root Normalized Ladder Equations .... 25
3.2 ADAPTIVE EQUALIZATION .... 2?
3.2.1 Equalizer Structure .... 30
3.3 DETECTION OF DIGITAL SIGNALS .... 33
3.3.1 Timing Recovery with Ladder Forms .... 35
3.3.2 Simulation Results .... 50
CHAPTER SUMMARY AND CONCLUSIONS .... 50
APPENDICES .... 63
BIBLIOGRAPHY .... 66
4. NUMERICAL ALGORITHMS .... 69
4.1 THE CORDIC ALGORITHMS .... 70
4.1.1 Some Convergence Properties .... 74
4.1.2 Implementation Issues .... 78
4.1.3 Scale Factor Normalization .... 79
4.1.4 Scaling in a Parallel Implementation .... 82
4.1.5 Extending the Domain of Convergence .... 84
4.2 LOW OVERHEAD SOLUTIONS TO THE PROBLEMS OF CONVERGENCE REGION AND SPURIOUS SCALE FACTORS .... 87
4.2.1 Effect on Angular Resolution .... 94
4.2.2 Simulation Results .... 95
4.2.3 Computational Speed and Hardware Complexity .... 95
4.3 HYBRID CORDIC ALGORITHMS .... 98
4.3.1 Interpolation with CORDIC's .... 98
4.3.2 A Taylor Series Approach to Hybrid CORDIC's .... 102
4.4 FLOATING POINT CORDIC ALGORITHMS (FLORDIC) .... 106
4.5 THE CONVERGENCE COMPUTATION TECHNIQUE .... 111
4.5.1 Examples of the Convergence Computation Technique .... 113
4.5.2 Hybrid Convergence Computation .... 115
4.5.3 Hardware Implementation .... 119
4.6 RELATIONSHIP BETWEEN THE CORDIC AND CONVERGENCE COMPUTATION ALGORITHMS .... 119
4.7 A GENERALIZED CONVERGENCE COMPUTATION METHOD AND THE CORDIC CONNECTION .... 126
4.7.1 Examples of the Generalized Technique .... 130
CHAPTER SUMMARY AND CONCLUSIONS .... 137
APPENDIX .... 140
BIBLIOGRAPHY .... 141
5. PARALLEL PROCESSORS FOR LINEAR ALGEBRA .... 144
5.1 CHOLESKY FACTORIZATION .... 145
5.1.1 Fast Cholesky by Rows in Ladder Form .... 146
5.2 SOLUTION OF LINEAR SYSTEMS OF EQUATIONS .... 151
5.2.1 Architecture for Givens' Algorithm .... 154
5.3 COMPLEXITY DISTRIBUTION AND ACTIVITY CHARTS .... 158
5.3.1 Activity Charts .... 159
5.3.2 A Two-dimensional Array for Givens' Algorithm .... 159
5.3.3 Dual Arrays .... 163
5.4 A FORMAL APPROACH TO COMPLEXITY MAPPING .... 164
5.4.1 Construction of Multiprocessor Arrays .... 169
5.4.2 An Approach to Formalism .... 164
5.4.2.1 Distance Measures .... 184
5.5 EIGENVALUE DECOMPOSITION .... 196
CHAPTER SUMMARY AND CONCLUSIONS .... 200
BIBLIOGRAPHY .... 205

6. A LADDER FORM CHIP SET .... 208
6.1 IMPLEMENTATION OF THE NORMALIZED LADDER EQUATIONS .... 208
6.2 LADDER FORM CHIP ARCHITECTURES .... 212
6.3 DESIGN OF A CORDIC PROCESSOR .... 214
6.3.1 The Fully Parallel CORDIC Block .... 216
6.3.1.1 Pipelining .... 221
6.3.2 The Parallel-Serial CORDIC Block .... 221
6.3.3 The Serial-Parallel Realization .... 225
6.3.4 The Fully Serial CORDIC Block .... 228
6.4 ARCHITECTURAL TRADEOFFS - A COMPARISON OF THE CORDIC REALIZATIONS .... 228
6.5 THE MICRO-CONTROLLER .... 234
6.5.1 The Speech Analysis Microcode .... 237
6.6 OTHER APPLICATIONS .... 236
6.6.1 The Discrete Fourier Transform .... 239
6.6.2 Speech Synthesis .... 241
6.6.3 The Unnormalized Ladder Form .... 241
6.6.4 Adaptive Equalization .... 245
CHAPTER SUMMARY AND CONCLUSIONS .... 247
APPENDIX .... 248
BIBLIOGRAPHY .... 251

7. CONCLUSIONS .... 253
LIST OF FIGURES

Figure    Page

2.1 Ladder Filter Structure .... 12
3.1 Filter Model for Speech Synthesis .... 20
3.2 Speech Synthesis .... 20
3.3 Three Chip Synthesizer by Texas Instruments, Inc. .... 22
3.4 Speech Analysis .... 22
3.5 Channel Equalization .... 29
3.6 Tapped Delay Line Equalizer .... 29
3.7 Ladder Filter Equalizer .... 31
3.8 Performance of Ladder Equalizer .... 32
3.9 Non-Return to Zero Transmission Format .... 36
3.10 Digital Transmission System .... 36
3.11 Statistical Distribution of y n j .... 44
3.12 Detection Threshold vs. False Alarm Probability .... 46
3.13 Missed Detection Probability for Various Pp. X .... 47
3.14 Effective Degrees of Freedom of 7 ,l.r(X) .... 48
3.15 Baseband Simulation, SNR = 46 dB, 8th order Ladder .... 51
3.16 Baseband Simulation, SNR = 20 dB, 8th order Ladder .... 54
3.17 Binary PSK, SNR = 12.4 dB, 8th order Ladder .... 57
3.18 Binary PSK, SNR = 0 dB, 8th order Ladder .... 58
3.19 Binary PSK, SNR = 12.4 dB, 2nd order Ladder .... 59
3.20 Binary FSK, SNR = 26.3 dB, 8th order Ladder .... 60
3.21 Binary FSK, SNR = 12.4 dB, 4th order Ladder .... 61
4.1 Rotation in Generalized Coordinate Systems .... 71
4.2 The CORDIC Functions .... 73
4.3 The Reversed Sign CORDIC Functions .... 75
4.4 Volder's Rotation Sequences .... 80
4.5 A Parallel CORDIC Machine Architecture .... 83
4.6 Performance of Walther's CORDIC Machine .... 86
4.7 Computer Simulation Results .... 96
4.8 Performance of Hybrid CORDIC Scheme .... 100
4.9 Geometric Interpretation of the CCM .... 104
4.10 A Machine Architecture for the CCM .... 120
5.1 Recursions Induced on the Rows of the Cholesky Factors .... 149
5.2 Fast Cholesky by Rows in Ladder Form .... 149
5.3 A Pipelined Array of Processors .... 150
5.4 Fully Pipelined Givens Method on a Linear Array .... 155
5.5 Array Input Sequence for Givens Algorithm .... 157
5.6 Linear Array Activity Chart .... 160
5.7 A Two Dimensional Array for Givens Method .... 162
5.8 Operation of the Dual Linear Array .... 165
5.9 Time-Space Dual Array Activity Chart .... 166
5.10 Dual Triangular Array .... 167
5.11 Triangular Array for Hyperbolic Cholesky .... 183
5.12 The Rectangular Array .... 186
5.13 The Double-Hexagonal Array .... 187
5.14 The Hexagonal Array .... 189
5.15 Closed Ball Topology .... 191
5.16 Coordinate Assignment for Theorem 1 .... 193
5.17 Traversal Ordering .... 194
5.18 Coordinate System for Givens Array .... 197
5.19 Eigenvalue Decomposition - QR Decomposition .... 201
5.20 Eigenvalue Decomposition - RQ Calculation .... 202
5.21 Eigenvalue Decomposition - Activity Chart .... 203
6.1 CORDIC Implementation of Square Root Ladder Form .... 210
6.2 Dual-CORDIC Chip Architecture .... 213
6.3 The Fully Parallel CORDIC Block .... 217
6.4 Bit Slice of Arithmetic Unit .... 220
6.5 A Register Cell .... 220
6.6 Pipelined CORDIC Block .... 222
6.7 The Parallel-Serial CORDIC Block .... 223
6.8 Bus Structure of the Parallel-Serial Architecture .... 226
6.9 The Serial-Parallel CORDIC Block .... 227
6.10 Performance Comparison of Various Architectures .... 230
6.11 Microcontroller Instruction Set .... 236
6.12 Microcontroller Architecture .... 236
6.13 The Discrete Fourier Transform Implementation .... 240
6.14 Ladder Form Speech Synthesizer .... 242
6.15 The Unnormalized Ladder Form .... 244
6.16 LMS Adaptive Equalizer .... 246
6.17 Adaptive Equalizer Implementation .... 246
CHAPTER ONE

INTRODUCTION

Over the past decade, the world has been witness to an electronic revolution that has been primarily due to the development of Large Scale Integration (LSI) technology and, in particular, the microprocessor. Virtually every industry has benefited from the ability to fabricate a computing machine on a single chip. The currently emerging Very Large Scale Integration (VLSI) technology promises still higher circuit densities on a chip. With the ability to manufacture over a million devices on a chip, what does one build, and how? The problem here is twofold [Mo79]. First, it is difficult to build truly general purpose systems that have a wide market appeal. Whereas SSI (small scale integration) afforded the ability to construct two flip-flops per chip (a rather general circuit), the complexity of VLSI systems appears necessarily to specialize them. Consequently, the main thrust in VLSI research in general, and in this thesis in particular, has been to study structures that are general purpose within a class of problems. Second, since entire systems are fabricated on chips, the design tasks cannot be segmented as with SSI, MSI and, to some degree, LSI systems, and the designer needs to be familiar with many aspects of design, from circuits to machine organization and algorithms [Me81].

This thesis is characterized by both of the above items. First, the chosen class of problems is in signal processing and related linear algebra operations. The ultimate goal is to design a programmable, custom integrated circuit capable of efficiently implementing complex signal processing algorithms used for speech analysis, digital communications and other areas related to estimation theory. Signal processing has traditionally offered the greatest challenge to integration, being a speed-intensive application on the leading edge of technology, even for the simplest algorithms. VLSI affords faster circuits than LSI, thus allowing the consideration of still more complex methods of processing. Second, efficient realizations necessarily include a detailed study of the algorithms to be implemented, of any numerical methods required to realize them, and of computer architecture itself. Present day thinking has led to signal processing microcomputers, which may be simply viewed as general purpose microcomputers with a rapid multiply-and-accumulate facility; the Nippon Electric chip [KNSYM80], the AMI chip [AMI79] and the Bell Laboratories digital signal processor (DSP) [Bo80] are typical examples. In contrast, the premise of this dissertation is to examine the mathematical formulation of a class of algorithms in detail, identify the natural primitive operations, and effectively "map" the algorithms onto efficient architectures, thus realizing integrated systems that "pack" more computing power into a given area. As a result, many different areas, ranging from signal processing methods and numerical algorithms to computer architecture and parallel processing, will be covered in the chapters to come, with new contributions being made to all. This study culminates in a signal processing chip of novel architecture, whose primitive operation set is based on coordinate transformation rather than multiplication.
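The coordinate-transformation primitive is not defined at this point in the text. As an illustrative assumption only, the sketch below uses the classical circular CORDIC iteration, a well-known shift-and-add method that either rotates a vector by a given angle ("rotate" mode) or rotates it onto the x-axis to obtain its length and angle ("vector" mode). Between them, these two modes yield rotations, sines and cosines, and the square root of a sum of squares without a multiplier. Whether the chip developed later uses exactly this form is not claimed here. The function name `cordic` and its parameters are hypothetical, and floating-point multiplication by 2^-i stands in for the arithmetic shifts a fixed-point datapath would use.

```python
import math


def cordic(x, y, z, mode, n_iter=32):
    """Circular CORDIC sketch.

    mode == "rotate": rotate (x, y) by the angle z (radians, |z| <= about 1.74).
    mode == "vector": rotate (x, y) onto the positive x-axis (x > 0 assumed);
                      z accumulates the angle of the original vector.
    Returns (x, y, z) with the constant CORDIC gain divided out of x and y.
    """
    if mode not in ("rotate", "vector"):
        raise ValueError("mode must be 'rotate' or 'vector'")
    # Elementary angles atan(2^-i); hardware would hold these in a small table.
    angles = [math.atan(2.0 ** -i) for i in range(n_iter)]
    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i); the product is a constant.
    gain = 1.0
    for i in range(n_iter):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    for i in range(n_iter):
        if mode == "rotate":
            d = 1.0 if z >= 0.0 else -1.0   # turn toward the remaining angle
        else:
            d = -1.0 if y >= 0.0 else 1.0   # turn toward the x-axis
        t = 2.0 ** -i                       # stands in for an arithmetic right shift by i
        x, y = x - d * y * t, y + d * x * t
        z -= d * angles[i]
    return x / gain, y / gain, z
```

For example, `cordic(3.0, 4.0, 0.0, "vector")` returns approximately `(5.0, 0.0, 0.927)`: the length and angle of the vector (3, 4), obtained with only shifts, additions and a fixed final scaling.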

Many estimation-related signal processing algorithms perform matrix operations such as "factorizations", in which an arbitrary matrix is represented as a product of matrices of simpler structure. The large computational complexity of these operations has driven a search for faster computing structures [La74] [SK75a] [Ch75] [Ku79] as well as for algorithms of lower computational complexity [St73] [Mo74] [SK75b] [He78] [LM80] [De82]. Existing uniprocessors have been used with only limited success for three major reasons. First, they are unable to efficiently compute a variety of elementary operations such as multiplication, vector rotation and trigonometric functions. These operations are very common in the algorithms of interest here. Second, general purpose computer architectures provide only cumbersome address arithmetic for data structures, such as circular buffers, that occur frequently in communications applications [SBA78]. Finally, signal processing algorithms exhibit a substantial amount of parallelism that is not efficiently exploited in a uniprocessor system. (A notable exception is the AMD2900 family [AMD78], which allows some parallelism through the extensive use of two-port random access memories (RAMs).) The first and third items have a profound impact on the design of a machine's arithmetic facility, which is the prime concern of this thesis, while the second item falls into the realm of the program controller and will not be considered. Parallel processing architectures for handling such complex algorithms have received much attention in the literature (see [Ku77] for a good survey); however, VLSI technology has imposed new constraints which merit renewed interest in parallel computing structures. The major technological constraint impacting the architecture of an integrated parallel processor is connectivity [MC80, Chapter 8]. The ability to effectively utilize silicon area for constructing many processing elements will be limited by the area available for interconnect. Furthermore, the communications rate between elements is influenced by the capacitance of the interconnect lines. Both of these factors clearly call for parallel structures with short communications paths. Therefore, the problem to be solved is to "map" the complex algorithms onto parallel architectures which exhibit short communications paths between elements, and to ensure that the elements are capable of performing the primitive operations which naturally describe the algorithms. In fact, it will become apparent that the aforementioned signal processing chip is suitable as an element of the parallel structure.
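As an illustration of a factorization built entirely from vector rotations (a sketch, not the algorithm developed in this thesis), the QR factorization A = QR can be computed with plane (Givens) rotations alone, each of which zeros one matrix entry. The function name `givens_qr` is hypothetical, and matrices are plain lists of rows.

```python
import math


def givens_qr(A):
    """QR factorization of an m x n matrix (list of rows) using plane rotations only.

    Returns (Q, R) with A = Q R, Q orthogonal (m x m) and R upper triangular (m x n).
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Zero column j from the bottom up, rotating adjacent rows (i-1, i).
        for i in range(m - 1, j, -1):
            a, b = R[i - 1][j], R[i][j]
            if b == 0.0:
                continue
            rho = math.hypot(a, b)      # length of (a, b): a vectoring step
            c, s = a / rho, b / rho     # cosine and sine of the rotation angle
            for k in range(n):          # rotate rows i-1 and i of R
                x, y = R[i - 1][k], R[i][k]
                R[i - 1][k], R[i][k] = c * x + s * y, -s * x + c * y
            for k in range(m):          # accumulate Q <- Q G^T (columns i-1 and i)
                x, y = Q[k][i - 1], Q[k][i]
                Q[k][i - 1], Q[k][i] = c * x + s * y, -s * x + c * y
    return Q, R
```

Each rotation touches only two adjacent rows, which is the kind of short, local data movement the connectivity argument above favours.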

As is usual with studies of computer architecture, it is necessary to define a measure of success (albeit subjective) for the research (e.g., what constitutes a better architecture?). The goals set above provide three different indicators of success. First, the chip must be general purpose within a class of problems; that is, one structure should be capable of efficiently implementing the desired set of algorithms. In this same context, a second measure of success is to demonstrate the utility of the chip as a processing element in any parallel architectures that are developed. Third, the chip architecture should provide more computing power per silicon area than existing integrated signal processors for the class of problems under consideration. This will be one of the easier items to show, since many of the complex signal processing algorithms to be considered involve square root operations, which cannot be efficiently computed by any existing microcomputer. Still another desirable feature would be to provide the multiply-and-accumulate primitive operation without impacting chip area, since existing signal processors could then be viewed as "special cases" of the present chip. Finally, a regular layout would be extremely desirable, and perhaps even mandatory, if the above goals are to be met. Why do this project at all? The rewards are in many ways obvious, for the ability to perform a variety of signal processing tasks with an integrated circuit is much sought after.

Chapter Two begins with a view of popular signal processing methods. Target applications for their use are studied in Chapter Three in order to appreciate the throughput requirements which will be placed on the final chip. In this context, a new method for optimal signal detection is developed, since an important feature of this work is to construct algorithms which may be readily implemented. The study of algorithms allows identification of the predominant elementary operations; this turns out to be a much larger set than the usual multiply and accumulate primitives common in today's signal processors. Chapter Four is devoted to devising efficient numerical techniques for evaluating these operations. Large arrays or meshes of processors are developed in Chapter Five for some relatively computation-intensive matrix operations. These are again based on a rich operation set. Finally, a novel signal processing chip architecture, motivated by the studies of the previous chapters, appears in Chapter Six.

BIBLIOGRAPHY

[AMD78] Advanced Micro Devices Inc., AMD2900 Bipolar Family Users Manual, 1978.

[AMI79] American Microsystems Inc., Signal Processing Peripheral Reference Manual, 1979.

[Bo80] J.R. Boddie, G.T. Daryanani, I.I. Eldumiati, K.N. Gadenz, J.S. Thompson, S.M. Walters, "A Digital Signal Processor for Telecommunications Applications," Proc. of Int'l. Solid State Circuits Conference, San Francisco, CA, 1980.

[Ch75] S.C. Chen, "Speedup of Iterative Programs in Multiprocessor Systems," Ph.D. Dissertation, University of Illinois at Urbana-Champaign, Dept. of Computer Science, January 1975.

[De82] J.M. Delosme, "Algorithms for Finite Shift-Rank Processes," Ph.D. Dissertation, Stanford University, Dept. of Electrical Engineering, June 1982.

[He78] D. Heller, "A Survey of Parallel Algorithms in Numerical Linear Algebra," SIAM Review, Vol. 20, No. 4, October 1978, pp. 740-776.

[KNSYM80] Y. Kawakami, T. Nishitani, E. Sugimoto, E. Yamauchi, M. Suzuki, "A Single-Chip Digital Signal Processor for Voiceband Applications," Proc. of Int'l. Solid State Circuits Conference, San Francisco, CA, 1980.

[Ku77] D. Kuck, "A Survey of Parallel Machine Organization and Programming," Association for Computing Machinery, Computing Surveys, Vol. 9, No. 1, March 1977, pp. 29-59.

[Ku79] H.T. Kung, "Let's Design Algorithms for VLSI," Proc. of the First Caltech VLSI Symposium, California Institute of Technology, 1979, pp. 65-90.

[La74] L. Lamport, "The Parallel Execution of DO Loops," Communications of the Association for Computing Machinery, Vol. 17, No. 2, February 1974, pp. 83-93.

[LM80] D.T.L. Lee and M. Morf, "Recursive Square-Root Ladder Estimation Algorithms," Proc. 1980 ICASSP, Denver, CO, April 9-11, 1980.

[MC80] C. Mead, L. Conway, Introduction to VLSI Systems, Addison-Wesley, 1980.

[Me81] C. Mead, "VLSI and Technological Innovations," Proc. of VLSI81 International Conference, Edinburgh, Scotland, August 1981.

[Mo74] M. Morf, "Fast Algorithms for Multivariable Systems," Ph.D. Dissertation, Stanford University, Dept. of Electrical Engineering, 1974.

[Mo79] G. Moore, "Are We Really Ready for VLSI?," Proc. of the First Caltech VLSI Symposium, California Institute of Technology, 1979.

[SBA78] A. Sewards, L. Beaudet, H. Ahmed, "Forward Error Correction on an Aeronautical Satellite Channel," Proc. of the AGARD Panel on Avionics, May 1978.

[SK75a] A.H. Sameh, D. Kuck, "Linear System Solvers for Parallel Computers," Technical Report 75-701, University of Illinois at Urbana-Champaign, Dept. of Computer Science, February 1975.

[SK75b] A. Sameh, D. Kuck, "A Parallel QR-Algorithm for Symmetric Tridiagonal Matrices," Proc. of Second Langley Conference on Scientific Computing, 1975.

[St73] H. Stone, "An Efficient Parallel Algorithm for the Solution of a Tridiagonal Linear System of Equations," Journal of the Association for Computing Machinery, Vol. 20, No. 1, January 1973, pp. 27-36.

CHAPTER TWO

SIGNAL PROCESSING ALGORITHMS

Signal processing algorithms may be broadly classified as either frequency domain or time domain. The former, more traditional methods make extensive use of orthogonal transforms, while the latter frequently entail the application of estimation theory. The more popular algorithms from both of these classifications will be considered for VLSI realization in this treatise. It will become clear that a single computation structure is capable of efficiently implementing both types of algorithms.

The Discrete Fourier Transform (DFT) [OS75] is perhaps the most common of all frequency domain algorithms. Its wide applicability in areas such as spectral estimation, filtering, digital communications, control and identification makes its VLSI realization a problem of great importance. A variety of fast algorithms have been developed for the DFT, the most common being the Fast Fourier Transform (FFT) algorithm of Cooley and Tukey [CT65]. Considerable attention has been given to the construction of DFT and FFT processors, e.g., [De74] [De79] [Pe68] [Sw78].

Somewhat less attention has been given to the newer time domain algorithms based on estimation theory. Ladder forms (see [Tu80] for a good survey) are among the most promising of such algorithms, owing to their low computational complexity and recursive structure. The basis of these so called "ladder filters" lies in the theory of exact least squares predictors and whiteners [VT68]. These algorithms enjoy as wide applicability as the DFT, frequently with better properties. For example, ladder forms used for spectral estimation exhibit considerably less sidelobe leakage than the Fourier transform. There are also a variety of signal processing related matrix operations; however, these will be considered in a later chapter.

2.1 THE DISCRETE FOURIER TRANSFORM

The DFT, $X(k)$, of an $N$-point sequence, $x(n)$, is defined by:

$$X(k) = \begin{cases} \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, & 0 \le k \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

where

$$W_N = e^{-j 2\pi / N}$$

and $x(n)$, $X(k)$ are in general complex.
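A minimal sketch of the definition above, evaluated directly in O(N^2) operations, alongside a recursive radix-2 decimation-in-time FFT (one form of the Cooley-Tukey approach [CT65] mentioned earlier) for comparison. The function names `dft` and `fft` are hypothetical, and the FFT assumes N is a power of two.

```python
import cmath


def dft(x):
    """Direct evaluation of X(k) = sum_{n=0}^{N-1} x(n) W_N^{nk} for k = 0..N-1; O(N^2)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two; O(N log N)."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # N/2-point DFT of the even-indexed samples
    odd = fft(x[1::2])    # N/2-point DFT of the odd-indexed samples
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # twiddle factor W_N^k times odd term
        X[k] = even[k] + t
        X[k + N // 2] = even[k] - t
    return X
```

Both functions return the same N values (up to rounding); the FFT reuses the half-length transforms, which is where its saving over the direct sum comes from.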

The DFT is the traditional method of obtaining a spectral representation of a time series and has been effectively employed in a multitude of arenas, including speech and picture processing. Implementations of the DFT will be considered in Chapter Six.

2.2 EXACT LEAST SQUARES LADDER ALGORITHMS

Consider the problem of whitening a correlated time series, i.e., determining the process of new (or unpredictable) information contained in each element of the series, known as the "innovations process" [GK73]. The time series is assumed to arise from a finite order autoregressive (AR) process, $y_T$, and the innovation of each sample is determined from an autoregressive, linear least squares prediction of that sample based on $n$ previous observations. This problem has the cascade ladder form solution shown in Figure 2.1. Notice the simplicity of the structure, which consists of delay elements, gains and summing junctions. The filter coefficients are adjusted to yield the forward and backward prediction errors defined as:

$$\varepsilon_{n,T} = y_T - \mathrm{lse}\left(y_T \,\middle|\, \{y_k\}_{k=T-n}^{T-1}\right)$$

$$r_{n,T} = y_{T-n} - \mathrm{lse}\left(y_{T-n} \,\middle|\, \{y_k\}_{k=T-n+1}^{T}\right)$$

where $\mathrm{lse}(x \mid y)$ denotes the linear least squares estimate of $x$ given $y$.

An extremely useful property of this structure is that all the prediction errors of orders 1 to $n$ are obtained simultaneously in one set of calculations. This is particularly useful for identifying the order of a model. The algorithm is said to be both order and time recursive, i.e., the coefficients of successive filter stages are computed from the previous stages (order recursion), and the filter coefficients are also updated based on new data (time recursion) through the addition of correction terms, rather than by recomputing the filter for each time step or order.
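To make the order recursion concrete, the following sketch runs a ladder whose gains are fixed and already known, and returns the forward prediction errors of every order from a single pass over the data. It assumes the stage recursions e[n+1,T] = e[n,T] - k_fwd[n]*r[n,T-1] and r[n+1,T] = r[n,T-1] - k_bwd[n]*e[n,T], with e[0,T] = r[0,T] = y[T] and prewindowing (zero data before the first sample). The adaptive computation of the gains is omitted, and `ladder_errors`, `k_fwd` and `k_bwd` are hypothetical names.

```python
def ladder_errors(y, k_fwd, k_bwd):
    """Fixed-gain ladder (lattice) filter; returns e[n][T] for every order n = 0..N.

    Stage n+1 (gains k_fwd[n], k_bwd[n]) computes
        e[n+1, T] = e[n, T]   - k_fwd[n] * r[n, T-1]
        r[n+1, T] = r[n, T-1] - k_bwd[n] * e[n, T]
    with e[0, T] = r[0, T] = y[T] and r[n, -1] = 0 (prewindowing).
    """
    N = len(k_fwd)
    e_out = [[0.0] * len(y) for _ in range(N + 1)]
    r_prev = [0.0] * (N + 1)              # r[n, T-1] for n = 0..N
    for T, y_t in enumerate(y):
        r_now = [0.0] * (N + 1)
        e = r_now[0] = y_t
        e_out[0][T] = e
        for n in range(N):
            r_now[n + 1] = r_prev[n] - k_bwd[n] * e   # uses e[n, T] before it is updated
            e = e - k_fwd[n] * r_prev[n]              # now e[n+1, T]
            e_out[n + 1][T] = e
        r_prev = r_now
    return e_out
```

For a first-order autoregressive series y_T = a*y_{T-1} + w_T, a single stage with both gains equal to a returns the driving noise w_T as the order-1 error, i.e., it whitens the series.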

Although ladder structures may be derived when the process covariance is either known or unknown, it is the latter case which is of foremost interest in real-time applications and in this thesis. The process covariance is estimated from segments of past data, the exact segment being determined by a data window. The "sliding window" ladder form [PFM82] uses a rectangular data window of constant length, $R$ samples. The most popular is the "weighted prewindowed" ladder form [LM80], in which a positive "forgetting factor", $\lambda$, of magnitude less than unity is used to weight the previous data at each time step, thus deemphasizing the importance of older samples. Both of these filters are able to track small variations in process statistics because of their continued deemphasis of old data.
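A minimal sketch contrasting the two data windows on the simplest covariance term, the signal power. The exponential window updates R_T = lambda*R_{T-1} + y_T^2, while the sliding window adds the newest squared sample and subtracts the one that has just left the window. The function names are hypothetical, `window_len` plays the role of the window length R in the text, and prewindowing (zero data before the first sample) is assumed.

```python
from collections import deque


def exp_weighted_power(y, lam):
    """Exponential window: R_T = lam * R_{T-1} + y_T**2, with R_{-1} = 0 (prewindowing)."""
    R = 0.0
    out = []
    for v in y:
        R = lam * R + v * v
        out.append(R)
    return out


def sliding_window_power(y, window_len):
    """Rectangular window: sum of y_t**2 over the most recent `window_len` samples.

    Each step adds the newest squared sample and subtracts the one leaving the window.
    """
    buf = deque()
    R = 0.0
    out = []
    for v in y:
        buf.append(v)
        R += v * v
        if len(buf) > window_len:
            old = buf.popleft()
            R -= old * old
        out.append(R)
    return out
```

For example, `exp_weighted_power([1, 2, 3, 4], 0.5)` gives `[1.0, 4.5, 11.25, 21.625]` and `sliding_window_power([1, 2, 3, 4], 2)` gives `[1.0, 5.0, 13.0, 25.0]`. Older data fades geometrically under the exponential window but drops out abruptly under the rectangular one; the sliding window also has to buffer its last `window_len` samples, while the exponential window needs only the previous value of R.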

The prewindowed ladder form is described in [LM80] and defined by:

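$$\varepsilon_{0,T} = r_{0,T} = y_T$$

$$\Delta_{n+1,T+1} = \lambda\, \Delta_{n+1,T} + \frac{\varepsilon_{n,T+1}\, r_{n,T}}{1 - \gamma_{n-1,T}}$$

$$K^{r}_{n+1,T} = \frac{\Delta_{n+1,T}}{R^{\varepsilon}_{n,T}}$$

$$K^{\varepsilon}_{n+1,T} = \frac{\Delta_{n+1,T}}{R^{r}_{n,T-1}}$$

$$\varepsilon_{n+1,T} = \varepsilon_{n,T} - K^{\varepsilon}_{n+1,T}\, r_{n,T-1}$$

$$r_{n+1,T} = r_{n,T-1} - K^{r}_{n+1,T}\, \varepsilon_{n,T}$$

$$R^{\varepsilon}_{n,T+1} = \lambda\, R^{\varepsilon}_{n,T} + \frac{\varepsilon^{2}_{n,T+1}}{1 - \gamma_{n-1,T}}$$

$$R^{r}_{n,T+1} = \lambda\, R^{r}_{n,T} + \frac{r^{2}_{n,T+1}}{1 - \gamma_{n-1,T+1}}$$

$$\gamma_{n,T} = \gamma_{n-1,T} + \frac{r^{2}_{n,T}}{R^{r}_{n,T}}$$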
where

$\Delta_{n+1,T}$ is the $(n+1)$th order partial correlation of $y_T$,

$K^{\varepsilon}_{n+1,T}$, $K^{r}_{n+1,T}$ are the forward and backward filter gains of the $(n+1)$th filter stage,

$R^{\varepsilon}_{n,T}$, $R^{r}_{n,T}$ are the covariances of the forward and backward prediction errors, and

$\gamma_{n,T}$ is a likelihood variable of $n$th order.
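The prewindowed recursions can be sketched in a few dozen lines. This is a minimal, unoptimized sketch under stated assumptions: the time updates are re-indexed from T+1 to T, gamma_{-1,T} is taken as 0, the residual covariances start at a small positive `delta` (a common initialization assumed here, not taken from the text), and the partial correlations, delayed backward errors and likelihood variables start at zero. The function name `prewindowed_lsl` and its parameters are hypothetical.

```python
def prewindowed_lsl(y, order, lam=0.99, delta=1e-3):
    """Unnormalized prewindowed least squares ladder (sketch).

    Returns (Ke, Kr, e_top, gammas): the final forward and backward gains of each
    stage, the top-order forward errors e[order, T], and gamma[order-1, T] for each T.
    """
    Delta = [0.0] * order     # Delta[n] holds Delta_{n+1}
    Re = [delta] * order      # R^e_n, most recent value
    Rr = [delta] * order      # R^r_n, most recent value
    r_prev = [0.0] * order    # r_{n, T-1}
    g_prev = [0.0] * order    # gamma_{n, T-1}
    Ke = [0.0] * order
    Kr = [0.0] * order
    e_top, gammas = [], []
    for y_t in y:
        e = y_t                          # e_{0,T} = y_T
        r_now = [y_t] + [0.0] * order    # r_{0,T} = y_T; higher orders filled below
        g_now = [0.0] * order
        for n in range(order):
            g_old = g_prev[n - 1] if n > 0 else 0.0   # gamma_{n-1, T-1}
            g_new = g_now[n - 1] if n > 0 else 0.0    # gamma_{n-1, T}
            Delta[n] = lam * Delta[n] + e * r_prev[n] / (1.0 - g_old)
            Ke[n] = Delta[n] / Rr[n]                  # Rr[n] still holds R^r_{n, T-1}
            Re[n] = lam * Re[n] + e * e / (1.0 - g_old)           # R^e_{n, T}
            Kr[n] = Delta[n] / Re[n]
            Rr[n] = lam * Rr[n] + r_now[n] ** 2 / (1.0 - g_new)   # R^r_{n, T}
            g_now[n] = g_new + r_now[n] ** 2 / Rr[n]              # gamma_{n, T}
            r_now[n + 1] = r_prev[n] - Kr[n] * e      # r_{n+1, T}
            e = e - Ke[n] * r_prev[n]                 # e_{n+1, T}
        r_prev = r_now[:order]
        g_prev = g_now
        e_top.append(e)
        gammas.append(g_now[-1])
    return Ke, Kr, e_top, gammas
```

Applied to a first-order autoregressive sequence with coefficient 0.8, the first forward gain settles near 0.8 and the second near zero, and the top-order error approaches the driving white noise, as expected for such a process.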

These equations appear quite formidable computationally, exhibiting a number of housekeeping chores at each time step, such as the continuous estimation of the residual covariances $R^{\varepsilon}_{n,T}$ and $R^{r}_{n,T}$. This observation prompted the development of the so called "square root normalized" ladder forms, in which the random variables in the algorithm are normalized to have unit variance, thus eliminating the need for the $R^{\varepsilon}_{n,T}$ and $R^{r}_{n,T}$ equations. The variables are then further normalized so as to remove the $\gamma_{n,T}$ update. The resulting algorithm comprises fewer equations and has the added feature that all quantities have magnitude bounded by unity, thereby making fixed point implementation viable. The prewindowed algorithm is summarized as follows:

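Order Updates:

$$\rho_{n+1,T} = \sqrt{1 - v^{2}_{n,T}}\,\sqrt{1 - \eta^{2}_{n,T-1}}\;\rho_{n+1,T-1} + v_{n,T}\,\eta_{n,T-1}$$

$$v_{n+1,T} = \frac{v_{n,T} - \rho_{n+1,T}\,\eta_{n,T-1}}{\sqrt{1 - \rho^{2}_{n+1,T}}\,\sqrt{1 - \eta^{2}_{n,T-1}}}$$

$$\eta_{n+1,T} = \frac{\eta_{n,T-1} - \rho_{n+1,T}\,v_{n,T}}{\sqrt{1 - \rho^{2}_{n+1,T}}\,\sqrt{1 - v^{2}_{n,T}}}$$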

where $v$ and $\eta$ are the normalized forward and backward residuals, respectively, and $\rho$ is the filter gain (or reflection coefficient, or normalized partial correlation).

Time Updates:

$$R_T = \lambda\, R_{T-1} + y_T^{2}$$

$$v_{0,T} = \eta_{0,T} = \frac{y_T}{\sqrt{R_T}}$$

Notice that the major amount of complexity is in the three order-update equations, since these must be executed for each stage of the ladder filter. These equations will consequently be of primary concern in the present work, and it suffices to state in advance that a computing structure capable of efficiently performing these computations will also be capable of efficiently calculating the time updates.
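A corresponding sketch of the normalized order and time updates, under assumptions of the same kind: R starts at a small positive `delta`, rho and the delayed backward residuals start at zero, and a tiny floor inside the square roots (an added safeguard, not part of the algorithm as stated) guards against rounding when a residual approaches unit magnitude. The names `normalized_ladder` and `_one_minus_sq_root` are hypothetical.

```python
import math


def _one_minus_sq_root(x, floor=1e-15):
    """sqrt(1 - x^2), with a tiny floor guarding against rounding to |x| = 1."""
    return math.sqrt(max(1.0 - x * x, floor))


def normalized_ladder(y, order, lam=0.99, delta=1e-3):
    """Square root normalized prewindowed ladder (sketch).

    Returns (rho, history): the final reflection coefficients rho_{n+1,T}
    and a copy of them after every time step.
    """
    rho = [0.0] * order       # rho_{n+1, T-1}, updated in place to rho_{n+1, T}
    eta_prev = [0.0] * order  # eta_{n, T-1}; zero before the first sample (prewindowing)
    R = delta                 # R_{-1}: assumed small positive starting value
    history = []
    for y_t in y:
        # Time updates.
        R = lam * R + y_t * y_t
        v = y_t / math.sqrt(R)            # v_{0,T}
        eta_now = [0.0] * order
        eta_now[0] = v                    # eta_{0,T} = v_{0,T}
        # Order updates, stage by stage.
        for n in range(order):
            eta_old = eta_prev[n]         # eta_{n, T-1}
            c_v = _one_minus_sq_root(v)
            c_eta = _one_minus_sq_root(eta_old)
            rho[n] = c_v * c_eta * rho[n] + v * eta_old
            c_rho = _one_minus_sq_root(rho[n])
            v_next = (v - rho[n] * eta_old) / (c_rho * c_eta)
            eta_next = (eta_old - rho[n] * v) / (c_rho * c_v)
            if n + 1 < order:
                eta_now[n + 1] = eta_next  # eta_{n+1, T}, needed at the next time step
            v = v_next                     # v_{n+1, T}
        eta_prev = eta_now
        history.append(list(rho))
    return rho, history
```

Every quantity in the loop stays within [-1, 1], which is the property that makes fixed point arithmetic viable; the sketch itself uses floating point for readability.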

While applications of the DFT have been quite well studied, new uses of ladder algorithms, some of which will be described in the next chapter, are constantly emerging.

BIBLIOGRAPHY

[CT65] J.W. Cooley, J.W. Tukey, "An Algorithm for the Machine Calculation of Complex Fourier Series," Math. Computation, Vol. 19, 1965, pp. 297-301.

[De74] A.M. Despain, "Fourier Transform Computers Using CORDIC Iterations," IEEE Trans. Comput., Vol. C-23, Oct. 1974, pp. 993-1001.

[De79] A.M. Despain, "Very Fast Fourier Transform Algorithms for Hardware Implementation," IEEE Trans. Computers, Vol. C-28, No. 5, May 1979, pp. 333-341.

[GK73] M. Gevers, T. Kailath, "An Innovations Approach to Least-Squares Estimation, Part VI: Discrete-Time Innovations Representations and Recursive Estimation," IEEE Transactions on Automatic Control, Vol. AC-18, December 1973, pp. 588-600.

[LM80] D.T.L. Lee and M. Morf, "Recursive Square-Root Ladder Estimation Algorithms," Proc. 1980 ICASSP, Denver, CO, April 9-11, 1980.

[OS75] A. Oppenheim, R. Schafer, Digital Signal Processing, Prentice Hall, 1975.

[PFM82] B. Porat, B. Friedlander, M. Morf, "Square Root Covariance Ladder Algorithms," IEEE Trans. on Automatic Control, Vol. 27, No. 4, August 1982.

[Sw78] E. Swartzlander, "VLSI Technology for Signal Processing," Proc. of GOMAC, Monterey, CA, 1978, pp. 76-79.

[VT68] H. Van Trees, Detection, Estimation and Modulation Theory, Volume I, J. Wiley and Sons, 1968.

CHAPTER THREE

APPLICATIONS OF LADDER FORMS

This chapter will explore a variety of signal processing applications of ladder forms. Speech analysis and synthesis based on linear predictive techniques will be discussed first, followed by adaptive equalization and finally a new technique for digital signal detection. Algorithms of linear algebra can frequently be cast into a ladder structure; however, examples of this will be deferred until Chapter Five. The aim of the present study is to identify the important components of the algorithms that will ultimately be implemented in a special architecture, thus unifying the algorithms, and to develop an appreciation for the throughput requirements to be placed on the signal processing chip.

3.1 THE SPEECH ANALYSIS PROBLEM

Ladder forms are useful structures for the analysis and synthesis of speech using the methods of linear predictive coding (LPC) [MG76]. Analysis, not to be confused with recognition, refers to the problem of determining a spectral representation of a discretized segment of speech, while synthesis is the act of reconstructing the analog speech from its spectral representation.

Techniques for speech synthesis will be described first, following which the speech analysis problem will be discussed. The use of ladder forms for speech analysis will be the major case motivating the design of a ladder form chip set.

3.1.1 Speech Synthesis Techniques

A trivial way to synthesize a particular segment of speech is to store samples of the segment in digital form, for example as pulse code modulation (PCM) [TS71]. If these samples are taken frequently enough (i.e., at least at the Nyquist rate), then the speech is regenerated by passing the samples through a digital to analog (D/A) converter. Such an approach requires a relatively large amount of data to synthesize a short segment of speech. Alternatively, good quality speech may generally be synthesized with much less data by deriving parametric models for the speech production process and replacing a speech segment by its fewer model parameters, which presumably require fewer bits.

A linear model for speech production was developed in 1960 by Fant [Fa60], in which the various physiological elements of the vocal tract are modelled as time-varying linear filters (Figure 3.1), the aggregation of which is referred to as the "synthesis filter". Speech is produced by exciting the synthesis filter, whose spectral parameters have been adjusted to yield a particular speech segment, as shown in Figure 3.2. The filter input for unvoiced sounds, such as /ʃ/ in fish, is a white noise signal. Voiced sounds, such as /i/ in eve, are produced with an impulse train of period P, the pitch period.

Speech signals admit autoregressive modelling, meaning that the aggregate filter has only poles. This filter is typically of tenth order, so that the data set required to synthesize a short segment of speech consists of the ten filter coefficients, some pitch period information and perhaps some power information for the white noise process. This can represent considerable savings over direct storage of PCM coded samples, as will be seen in the example to follow.

Figure 3.1: Filter Model for Speech Synthesis (blocks: glottal model, vocal tract model, spectral correction, lip radiation; inputs: impulse train, white noise; output: speech)

Figure 3.2: Speech Synthesis (synthesis filter set by reflection coefficients; inputs: impulse train, white noise; output: speech)

Texas Instruments Inc. announced a linear predictive speech synthesis chip set [WB78] in 1978, consisting of three chips, depicted functionally in Figure 3.3. A read only memory (ROM) is used to store the speech production parameters for the vocabulary to be used in any particular application (e.g., TI's SPEAK and SPELL requires the 26 letters of the alphabet as well as approximately 200 words). These production parameters are retrieved as required by a controller chip and presented to the synthesizer chip, which is an all-pole ladder synthesis filter.
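A minimal sketch of an all-pole ladder synthesis filter driven by an impulse train (voiced excitation). It assumes reflection coefficients k_m with |k_m| < 1 and the stage convention of the ladder recursions in Chapter Two (f_m(t) = f_{m-1}(t) - k_m b_{m-1}(t-1)), run in reverse so that the excitation enters at the top stage. The names `impulse_train` and `ladder_synthesis` are hypothetical; the TI chip's actual parameter coding and arithmetic are not described here.

```python
def impulse_train(length, period):
    """Voiced excitation: a unit impulse every `period` samples (the pitch period P)."""
    return [1.0 if t % period == 0 else 0.0 for t in range(length)]


def ladder_synthesis(excitation, k):
    """All-pole ladder synthesis filter with reflection coefficients k[0..M-1].

    Per sample, the excitation enters as f_M(t) and is passed down the stages,
        f_{m-1}(t) = f_m(t) + k_m * b_{m-1}(t-1),
    then the backward signals are refreshed,
        b_0(t) = f_0(t),  b_m(t) = b_{m-1}(t-1) - k_m * f_{m-1}(t).
    The output sample is f_0(t). All state starts at zero.
    """
    M = len(k)
    b_prev = [0.0] * M            # b_m(t-1) for m = 0..M-1
    out = []
    for x in excitation:
        f = [0.0] * M             # f_m(t) for m = 0..M-1
        acc = x                   # f_M(t)
        for m in range(M, 0, -1):
            acc = acc + k[m - 1] * b_prev[m - 1]
            f[m - 1] = acc
        b = [0.0] * M
        b[0] = f[0]
        for m in range(1, M):
            b[m] = b_prev[m - 1] - k[m - 1] * f[m - 1]
        b_prev = b
        out.append(f[0])
    return out
```

With a single coefficient of 0.5 and a unit impulse, `ladder_synthesis([1.0, 0.0, 0.0, 0.0], [0.5])` returns `[1.0, 0.5, 0.25, 0.125]`, the impulse response of 1/(1 - 0.5 z^-1). Replacing the impulse train with white noise gives the unvoiced case.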

One complete parameter string is 49 bits in length and is retrieved every 20 milliseconds. In contrast, a 20 ms speech segment sampled at 8 kHz using 8-bit PCM would require 1280 bits! Clearly, the use of LPC in speech synthesis yields a remarkable saving in the synthesis data set (and hence in the total amount of ROM required for this chip set); however, some quality must be sacrificed. In many applications, the quality afforded by 8-bit PCM is not required.
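The storage comparison above is simple arithmetic; the sketch below reproduces it and also expresses both figures as bit rates. Only the 49 bits, 20 ms, 8 kHz and 8 bits per sample come from the text; the function name `frame_bits` is hypothetical.

```python
def frame_bits(frame_ms=20, lpc_bits=49, fs_hz=8000, pcm_bits=8):
    """Compare bits per frame and bit rates for LPC parameters versus PCM samples."""
    samples = fs_hz * frame_ms // 1000          # samples per frame
    pcm_frame_bits = samples * pcm_bits         # bits to store the frame as PCM
    return {
        "samples": samples,
        "pcm_bits": pcm_frame_bits,
        "lpc_bits": lpc_bits,
        "pcm_bps": pcm_frame_bits * 1000 // frame_ms,
        "lpc_bps": lpc_bits * 1000 // frame_ms,
        "ratio": pcm_frame_bits / lpc_bits,
    }
```

With the default values this gives 160 samples and 1280 PCM bits per frame, or 2450 bit/s for the LPC parameter strings versus 64000 bit/s for 8-bit PCM, a reduction of roughly 26 to 1.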

3.1.2 Speech Analysis

As depicted in Figure 3.4, speech analysis involves determining the filter, referred to as the "analysis" or "inverse" filter, which whitens a speech process. Since the synthesis filter, when driven by an impulse train and white noise, produces speech, the analysis operations must generate the impulse train, the white noise process and the spectral parameters from the speech process. Consequently, a cascade of the analysis and synthesis filters has unity transfer function (assuming the modelling assumptions were accurate). An interpretation of the analysis problem is clear in an estimation context. The input or speech process is correlated, and the analysis filter is that filter which produces the innovations process [GK73] from the input. Under assumptions of stationarity, frequency domain expressions for the whitening filter were given by Wiener [Wi49]. However, for the purpose of realizing a speech analyzer in VLSI, it is preferable to view the problem in the time domain, and in particular in a discrete time domain, hence the use of ladder filters.

Figure 3.3: Three Chip Synthesizer by Texas Instruments Inc. (blocks: memory, controller, synthesizer; output: speech)

Figure 3.4: Speech Analysis (analysis filter with input speech; outputs: impulse train (pitch), white noise (power), reflection coefficients)

The analysis problem is actually threefold:

(1) determine the whitening filter;

(2) determine the period of the impulse train, i.e., the pitch period;

(3) determine the power of the noise process.

This thesis will primarily address the first issue. Item (3) is of course a byproduct of the whitening filter, since the noise process (i.e., the uncorrelated innovations process) is the output of this filter; however, the noise power is frequently normalized to unity and the gain is accounted for elsewhere. Many authors have examined various techniques for pitch period extraction; see for example [AH71]. A novel maximum likelihood approach using ladder forms was developed by Lee and Morf [LH80]. This method utilizes the statistical distribution of the ladder form likelihood variable (see Chapter Two) to reduce the problem of detecting a pitch pulse to a binary hypothesis testing problem.

3.1.3 Speech Analysis with Square Root Normalized Ladder Forms - The Analysis Filter

Notice that the ladder form algorithms of Section 2.2 compute the best whitening filter for a process in a least squares sense, i.e., the new information, or innovation, of a sample of the input process is obtained as the difference between that sample and a linear least squares estimate of the


sample from a history of the process. Therefore, ladder algorithms may be used to determine the analysis filter under a least squares error criterion.

Generally, analysis is performed on a frame of speech samples under the assumption that the short term speech spectrum is stationary. However, since ladder algorithms are both time and order recursive, analysis may be done on a sample by sample basis, eliminating the need for such assumptions. In fact, slow variations in the spectrum are tracked by the algorithm due to its adaptive nature. Furthermore, sample by sample analysis determines the inverse filter independently of any particular coding technique, leaving this choice to the system designer.
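For comparison, the conventional frame-based analysis can be sketched as follows. This minimal sketch assumes the classical autocorrelation (Levinson-Durbin) method, not the sample-by-sample ladder recursions of Section 2.2: reflection coefficients are computed once for a whole frame and then used in a lattice analysis filter whose output is the prediction error, i.e., the innovations. The AR(2) test frame and all function names are hypothetical.

```python
import numpy as np


def autocorr(x, maxlag):
    """Biased sample autocorrelation r[0..maxlag] of a frame x."""
    n = len(x)
    return np.array([np.dot(x[: n - l], x[l:]) / n for l in range(maxlag + 1)])


def levinson(r, order):
    """Levinson-Durbin: reflection coefficients k and predictor a (a[0] = 1) from r."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        km = -acc / err
        a_prev = a.copy()
        for i in range(1, m):
            a[i] = a_prev[i] + km * a_prev[m - i]
        a[m] = km
        k[m - 1] = km
        err *= 1.0 - km * km          # prediction error power shrinks at each order
    return k, a, err


def lattice_analysis(x, k):
    """Prediction-error (analysis) lattice with zero initial state; returns the last forward residual."""
    M = len(k)
    b_prev = np.zeros(M + 1)          # backward residuals b_m(t-1), m = 0..M
    out = np.empty(len(x))
    for t, xt in enumerate(x):
        f = np.empty(M + 1)
        b = np.empty(M + 1)
        f[0] = b[0] = xt
        for m in range(1, M + 1):
            f[m] = f[m - 1] + k[m - 1] * b_prev[m - 1]
            b[m] = b_prev[m - 1] + k[m - 1] * f[m - 1]
        out[t] = f[M]
        b_prev = b
    return out


# Hypothetical AR(2) frame: x_t = 1.2 x_{t-1} - 0.5 x_{t-2} + w_t
rng = np.random.default_rng(0)
w = rng.standard_normal(20000)
x = np.zeros_like(w)
for t in range(len(w)):
    x[t] = w[t] + (1.2 * x[t - 1] if t >= 1 else 0.0) - (0.5 * x[t - 2] if t >= 2 else 0.0)

k, a, err = levinson(autocorr(x, 2), 2)
e = lattice_analysis(x, k)            # whitened output, close to w
```

The lattice output equals the direct-form prediction error filter built from the same coefficients; the ladder algorithms differ in that they update the reflection coefficients at every sample instead of once per frame.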

Remark:

A least squares error criterion is frequently objected to because it does not appear to correlate well with subjective distortion measures. However, a weighted least squares (WLS) criterion correlates very well with these subjective measures. Often, WLS can be achieved by applying a least squares criterion to a prefiltered version of the data, hence justifying the use of a least squares measure.
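The equivalence is easiest to see for a diagonal weighting: minimizing $\sum_i w_i e_i^2$ is the same as ordinary least squares on data scaled sample by sample by $\sqrt{w_i}$. The sketch below illustrates only that identity, with made-up data and weights; a frequency weighting realized by a linear prefilter is the analogous operation for a stationary process and is not shown.

```python
import numpy as np

# Hypothetical regression problem: 50 observations, 3 unknowns, per-sample weights w_i > 0.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)
w = rng.uniform(0.1, 2.0, 50)

# Weighted least squares directly: solve (X^T W X) theta = X^T W y with W = diag(w).
theta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

# Ordinary least squares on data first scaled ("prefiltered") by sqrt(w).
s = np.sqrt(w)
theta_ls, *_ = np.linalg.lstsq(s[:, None] * X, s * y, rcond=None)

print(np.allclose(theta_wls, theta_ls))  # True
```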

Ladder algorithms, like most signal processing algorithms, are computationally expensive. The discussions of Chapter Two suggested that the square root normalized ladder recursions are more suited to digital implementation than their unnormalized counterparts, since there are fewer equations and all variables are magnitude normalized, making fixed point implementation viable. Unfortunately, the normalized equations also require the calculation of square roots, which is generally a time consuming proposition. The next section will explore a method of recasting the ladder recursions, in an effort


to expose the operations which are fundamental to the equations and to reduce their complexity to a manageable level.

3.1.4 An Alternate View of the Square Root Normalized Ladder Equations

Recall the square root normalized ladder algorithm:

$$\rho_{n+1,T} = \sqrt{1-\nu_{n,T}^2}\,\sqrt{1-\eta_{n,T-1}^2}\;\rho_{n+1,T-1} + \nu_{n,T}\,\eta_{n,T-1}$$

$$\nu_{n+1,T} = \frac{\nu_{n,T} - \rho_{n+1,T}\,\eta_{n,T-1}}{\sqrt{1-\rho_{n+1,T}^2}\,\sqrt{1-\eta_{n,T-1}^2}}$$

$$\eta_{n+1,T} = \frac{\eta_{n,T-1} - \rho_{n+1,T}\,\nu_{n,T}}{\sqrt{1-\rho_{n+1,T}^2}\,\sqrt{1-\nu_{n,T}^2}}$$

where $\nu$ and $\eta$ are the normalized forward and backward residuals respectively, and $\rho$ is the filter gain (or reflection coefficient, or normalized partial correlation).

Clearly, the digital realization of these three equations is nontrivial, requiring square root, multiplication and division operations. Generally, these operations are quite expensive, requiring either special hardware (e.g., array multipliers) or considerable execution time through the repeated use of simple hardware (e.g., shift and add instead of multiplication, or Newton's method [SD65] for square root operations). However, by placing appropriate interpretations on the ladder variables, the ladder equations may be written in terms of a small number of vector rotations. Efficient numerical algorithms for computing these rotations will be given in Chapter Four, thus allowing for a particularly effective implementation of the ladder filter.
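To make the cost of the square root concrete, Newton's method for $\sqrt{a}$ can be sketched as below. This is a generic floating point illustration, not the hardware procedure of [SD65]; the function names, the starting value and the fixed iteration count are assumptions. Each iteration needs a division, which is exactly the kind of operation the rotation formulation tries to avoid.

```python
def newton_sqrt(a, x0=1.0, iters=6):
    """Approximate sqrt(a) for a >= 0 with Newton's iteration x <- (x + a / x) / 2.

    Each iteration costs one division, one addition and one halving (a shift in
    fixed point hardware). Convergence is quadratic once x is near sqrt(a), but a
    poor starting value x0 needs extra iterations.
    """
    if a < 0.0:
        raise ValueError("a must be non-negative")
    if a == 0.0:
        return 0.0
    x = x0
    for _ in range(iters):
        x = 0.5 * (x + a / x)
    return x


def complement(x, iters=6):
    """x^c = sqrt(1 - x^2), the quantity the normalized ladder needs for |x| < 1."""
    return newton_sqrt(1.0 - x * x, iters=iters)


print(complement(0.8))   # close to 0.6
```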

For notational convenience, let


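$$\rho = \rho_{n+1,T-1}, \qquad \rho_+ = \rho_{n+1,T} \tag{3.1a}$$

$$\nu = \nu_{n,T}, \qquad \nu_+ = \nu_{n+1,T} \tag{3.1b}$$

$$\eta = \eta_{n,T-1}, \qquad \eta_+ = \eta_{n+1,T} \tag{3.1c}$$

$$x^c = \sqrt{1-x^2}, \qquad x^{-c} = (1-x^2)^{-1/2} \tag{3.1d}$$

Then the ladder recursions may be rewritten as:

$$\rho_+ = \nu^c\,\eta^c\,\rho + \nu\,\eta \tag{3.2a}$$

$$\nu_+ = (\nu - \rho_+\,\eta)\,/\,(\rho_+^c\,\eta^c) \tag{3.2b}$$

$$\eta_+ = (\eta - \rho_+\,\nu)\,/\,(\rho_+^c\,\nu^c) \tag{3.2c}$$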

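Now, observe that:

(1) Since $|\nu|, |\rho|, |\eta| < 1$ always, interpret $\nu$, $\eta$, $\rho$ as cosines of some angles $\theta_\nu$, $\theta_\eta$ and $\theta_\rho$ respectively. Note that if $x = \cos\theta_x$ (with $0 \le \theta_x \le \pi$), then $x^c = \sin\theta_x$.

(2) Let

$$V = \begin{bmatrix} \nu^c & \nu \\ -\nu & \nu^c \end{bmatrix}, \qquad A = \begin{bmatrix} \rho & 0 \\ 0 & 1 \end{bmatrix}, \qquad N = \begin{bmatrix} \eta^c & -\eta \\ \eta & \eta^c \end{bmatrix} \tag{3.3}$$

where $V$ and $N$ are orthogonal, or 2x2 rotation, matrices. Then the matrix product $VAN$ is

$$VAN = \begin{bmatrix} \nu^c\eta^c\rho + \nu\eta & \nu\eta^c - \rho\eta\nu^c \\ \nu^c\eta - \rho\nu\eta^c & \nu^c\eta^c + \rho\nu\eta \end{bmatrix} \tag{3.4}$$

$$= \begin{bmatrix} \rho_+ & \nu_* \\ \eta_* & \text{don't care} \end{bmatrix} \tag{3.5}$$

and

$$[\nu_+ \;\; \eta_+] = [1 \;\; 0]\; R \begin{bmatrix} \nu_* & \eta_* \\ 0 & 0 \end{bmatrix} \tag{3.6a}$$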


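where

$$R = \rho_+^{-c} \begin{bmatrix} 1 & \rho_+ \\ \rho_+ & 1 \end{bmatrix} \tag{3.6b}$$

is S-orthogonal with respect to:

$$S = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$$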

Notice that under the interpretations of (1), both $V$ and $N$ are orthogonal matrices, representing rotations of the column and row vectors of $A$ through $\theta_\nu$ and $\theta_\eta$ respectively. It is interesting to note the fundamental nature of rotations in the ladder recursions. In particular, rotations by $\theta_\nu$ and $\theta_\eta$ give almost the entire update of the ladder variables, a final J-rotation being necessary to yield $\nu_+$ and $\eta_+$. This is not too surprising, since ladder forms are naturally related to Gram-Schmidt orthonormalization and orthogonal transformations, as shown in [MML81]. Implementation of the ladder recursions, which appear to have manageable complexity when rotations are the fundamental operations, will be considered in Chapter Six.
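A quick numerical check of the rotation form can be sketched as follows. It assumes $V = [[\nu^c, \nu], [-\nu, \nu^c]]$, $A = \mathrm{diag}(\rho, 1)$, $N = [[\eta^c, -\eta], [\eta, \eta^c]]$ and a final J-rotation $R = \rho_+^{-c}[[1, \rho_+], [\rho_+, 1]]$ applied to the off-diagonal entries of $VAN$, and it compares the result with (3.2a) to (3.2c) evaluated directly. Function names are hypothetical, and floating point square roots stand in for the rotation algorithms of Chapter Four.

```python
import numpy as np


def comp(x):
    """x^c = sqrt(1 - x^2), notation (3.1d)."""
    return np.sqrt(1.0 - x * x)


def ladder_direct(rho, nu, eta):
    """One update of the square root normalized ladder, (3.2a)-(3.2c)."""
    rho_p = comp(nu) * comp(eta) * rho + nu * eta
    nu_p = (nu - rho_p * eta) / (comp(rho_p) * comp(eta))
    eta_p = (eta - rho_p * nu) / (comp(rho_p) * comp(nu))
    return np.array([rho_p, nu_p, eta_p])


def ladder_rotations(rho, nu, eta):
    """Same update as two plane rotations around diag(rho, 1), then a J-rotation."""
    V = np.array([[comp(nu), nu], [-nu, comp(nu)]])         # rotation through theta_nu
    A = np.array([[rho, 0.0], [0.0, 1.0]])
    N = np.array([[comp(eta), -eta], [eta, comp(eta)]])     # rotation through theta_eta
    M = V @ A @ N
    rho_p, nu_s, eta_s = M[0, 0], M[0, 1], M[1, 0]         # M[1, 1] is not needed
    R = np.array([[1.0, rho_p], [rho_p, 1.0]]) / comp(rho_p)  # hyperbolic (J-) rotation
    nu_p, eta_p = np.array([1.0, 0.0]) @ R @ np.array([[nu_s, eta_s], [0.0, 0.0]])
    return np.array([rho_p, nu_p, eta_p])


rng = np.random.default_rng(5)
for _ in range(5):
    rho, nu, eta = rng.uniform(-0.9, 0.9, 3)
    print(np.allclose(ladder_direct(rho, nu, eta), ladder_rotations(rho, nu, eta)))  # True
```

Algebraically, the off-diagonal entries of $VAN$ equal $\rho_+^c\nu_+$ and $\rho_+^c\eta_+$, so the J-rotation only has to remove the common factor $\rho_+^c$.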

3.2 ADAPTIVE EQUALIZATION

An important problem in digital telephony is to achieve high data rates across a telephone line in the presence of a severe filtering effect caused by the channel. The act of prefiltering the input data to the receiver (Figure 3.5) to compensate for the channel characteristics is known as equalization [LSW65]. A compromise equalizer is essentially a fixed transfer function, high pass filter. While a performance improvement is realized, such an equalizer cannot effectively cope with either the variation of characteristics from line to line or the slow time variations of any particular line. An adaptive equalizer is a filter whose spectral properties are continuously set


by an ongoing learning of the channel characteristics. Initially, the equalizer is trained with a known training sequence; subsequently, it is able to track small variations of the line based on the actual, useful data.

Popular implementations of adaptive equalizers are of the tapped delay line variety (Figure 3.6), in which the tap coefficients are adjusted through a least mean squares (LMS) gradient algorithm [Wi70] as defined by:

$$z_k = \sum_n c_n^k\, r_{k-n}$$

$$c_n^{k+1} = c_n^k - \Delta\, e_k\, r_{k-n}^*$$

where

$c_n^k$ is the nth complex tap coefficient at time k

$r_n$ is a complex input sample (applies to all linear modulation schemes)

$z_k$ is the complex equalizer output

$\Delta$ is a real adaptation constant

$e_k$ is a complex error signal supplied from elsewhere in the modem (referred to as decision feedback equalization)
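The LMS recursion above can be sketched as follows for the training phase, where the error is formed against the known training symbols. This is a minimal illustration under stated assumptions: the QPSK training sequence, the two-tap minimum phase channel, the noise level, the 11 taps and $\Delta = 0.01$ are all hypothetical, and zero decision delay is used because the assumed channel is minimum phase.

```python
import numpy as np


def lms_equalizer(r, training, n_taps=11, delta=0.01):
    """Tapped delay line equalizer trained by the LMS recursion c <- c - delta * e * conj(r)."""
    c = np.zeros(n_taps, dtype=complex)
    buf = np.zeros(n_taps, dtype=complex)   # buf[n] holds r_{k-n}
    err = np.empty(len(r), dtype=complex)
    for k in range(len(r)):
        buf = np.roll(buf, 1)
        buf[0] = r[k]
        z = np.dot(c, buf)                   # z_k = sum_n c_n r_{k-n}
        e = z - training[k]                  # error against the known training symbol
        c = c - delta * e * np.conj(buf)     # gradient step on |e_k|^2
        err[k] = e
    return c, err


rng = np.random.default_rng(2)
K = 3000
a = ((2 * rng.integers(0, 2, K) - 1) + 1j * (2 * rng.integers(0, 2, K) - 1)) / np.sqrt(2)
h = np.array([1.0, 0.35 + 0.2j])            # hypothetical channel impulse response
noise = 0.01 * (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
r = np.convolve(a, h)[:K] + noise

c, err = lms_equalizer(r, a)
mse_first = np.mean(np.abs(err[:100]) ** 2)
mse_last = np.mean(np.abs(err[-500:]) ** 2)
```

Comparing `mse_first` with `mse_last` shows the gradient adaptation settling; how fast it settles depends on the eigenvalue spread of the input correlation matrix, which is the weakness discussed next.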

This structure will be revisited briefly in Chapter Six. A drawback of gradient algorithms is their relatively slow convergence rate, owing to the locally optimal but globally suboptimal trajectory followed by the filter during the adaptation. Specifically, changing the coefficients of the equalizer at each time step in the manner which results in the largest reduction of the mean square error at that step (local optimality) does not guarantee the most rapid trajectory to the desired solution (global optimality). In contrast, the ladder algorithms are globally optimal in a mean square sense, since they satisfy a least squares criterion at each step of the iteration (it is noteworthy that least squares tapped delay line filters


Figure 3.5: Channel Equalization [diagram: modulator, channel (with disturbances), equalizer, demodulator]

Figure 3.6: Tapped Delay Line Equalizer



also exist; see e.g. [Mo74]). Furthermore, they exhibit the same order of computational complexity as the LMS algorithm [Sh79], thus motivating their study for fast startup equalizers. Various authors have studied the exact least squares equalizer [Sh79], and the results of Satorius and Pack [SP80] are presented here.

3.2.1 Equalizer Structure

Let $\{a(k)\}_{k=0}^{\infty}$ be a known training sequence transmitted across the channel of Figure 3.5. Let the input sequence to the equalizer be $\{x(k)\}_{k=0}^{\infty}$ and let

$$X_N(k) = [x(k)\;\; x(k-1)\;\; \cdots\;\; x(k-N)]^T \tag{3.7}$$

with $x(i) = 0$ for all $i < 0$.

The problem is to determine the $N+1$ dimensional vector $C_N(k)$ of tap coefficients which minimizes the exponentially weighted mean square error:

$$\text{mse} = \sum_{p=0}^{k} \lambda^{k-p} \left| a(p) - C_N^T(k)\, X_N(p) \right|^2 \tag{3.8}$$

The parameter $\lambda$, $0 < \lambda \le 1$, is a real constant which determines the memory of the equalizer. The equalizer can effectively track slow channel variations only if the memory is finite, i.e., $\lambda < 1$.
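For reference, the minimizer of (3.8) can be computed directly from the weighted normal equations $\Phi(k) C_N(k) = \theta(k)$, with $\Phi(k) = \sum_p \lambda^{k-p} X_N(p) X_N^T(p)$ and $\theta(k) = \sum_p \lambda^{k-p} X_N(p)\, a(p)$. The sketch below does this for real-valued data (complex data would need conjugates) at a cost of $O(N^3)$ per solve; it is a brute force reference, not the ladder solution of [SP80]. The noiseless test signal and function names are hypothetical.

```python
import numpy as np


def regressors(x, N):
    """Row p is X_N(p) = [x(p), x(p-1), ..., x(p-N)], with x(j) = 0 for j < 0 (3.7)."""
    K = len(x)
    X = np.zeros((K, N + 1))
    for i in range(N + 1):
        X[i:, i] = x[: K - i]
    return X


def weighted_ls_taps(x, a, N, lam):
    """Tap vector C_N(k) minimizing sum_{p=0}^{k} lam**(k-p) * (a(p) - C^T X_N(p))**2."""
    X = regressors(x, N)
    k = len(x) - 1
    w = lam ** (k - np.arange(k + 1))          # exponential forgetting weights
    Phi = X.T @ (w[:, None] * X)               # weighted sample covariance
    theta = X.T @ (w * a)                      # weighted cross-correlation
    return np.linalg.solve(Phi, theta)         # normal equations Phi C = theta


# Hypothetical noiseless check: a(p) is produced by a known 4-tap filter acting on x.
rng = np.random.default_rng(6)
x = rng.standard_normal(300)
c_true = np.array([0.8, -0.3, 0.1, 0.05])
a = regressors(x, 3) @ c_true
print(np.allclose(weighted_ls_taps(x, a, 3, 0.98), c_true))   # True
```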

Equating the derivative of (3.8) to zero yields the equalizer equations for minimum mean square error, which have a solution in ladder form, as demonstrated in [SP80]. The resulting equalizer structure, shown in Figure 3.7, consists of an exact least squares ladder whitening filter and a tapped delay line. Figure 3.8 is a comparison of this ladder algorithm (LSALE) with


Figure 3.7: Ladder Filter Equalizer [diagram panels: the least squares, adaptive lattice equalizer; the mth stage of the lattice]


Figure 3.8: Performance of Ladder Equalizer [plots: comparison by simulation of the convergence properties of LSALE, ALCE and the gradient algorithm against the optimum, for an 11 tap equalizer with noise variance .001; one panel for a channel correlation matrix eigenvalue ratio of 11 and one for a ratio of 21; horizontal axis: number of iterations, up to 900]


the conventional LMS tapped delay line equalizer (Gradient) and a gradient ladder (ALCE) equalizer, for a typical telephone channel (from [SP80]). Notice that the convergence and bias of the exact least squares ladder equalizer are far superior to those of the other structures, for the same order of computational complexity, because the convergence trajectories of the former method are globally optimal.
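The property that an exact least squares filter satisfies its criterion at every step can be checked numerically. The sketch below swaps in the conventional exponentially weighted recursive least squares (RLS) tapped delay line recursion, i.e., a least squares tapped delay line filter of the kind mentioned alongside [Mo74], rather than the ladder algorithm of [SP80]. Both minimize (3.8), but RLS costs $O(N^2)$ operations per sample, whereas the text notes that the ladder has the same order of complexity as LMS. The code verifies that the recursive taps agree with a direct solution of the normal equations at intermediate and final steps, apart from the effect of the small initialization constant `delta`. Data, filter length and names are hypothetical.

```python
import numpy as np


def regressors(x, N):
    """Row p is X_N(p) = [x(p), ..., x(p-N)] with x(j) = 0 for j < 0."""
    K = len(x)
    X = np.zeros((K, N + 1))
    for i in range(N + 1):
        X[i:, i] = x[: K - i]
    return X


def rls_taps(x, a, N, lam, delta=1e-6):
    """Exponentially weighted RLS; returns the tap vector after every sample."""
    P = np.eye(N + 1) / delta       # inverse of the (slightly regularized) weighted covariance
    c = np.zeros(N + 1)
    history = []
    for xp, ap in zip(regressors(x, N), a):
        pi = P @ xp
        g = pi / (lam + xp @ pi)    # gain vector
        e = ap - c @ xp             # a priori error
        c = c + g * e
        P = (P - np.outer(g, pi)) / lam
        history.append(c.copy())
    return np.array(history)


def batch_taps(x, a, N, lam):
    """Direct solution of the normal equations for the same weighted criterion."""
    X = regressors(x, N)
    w = lam ** (len(x) - 1 - np.arange(len(x)))
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * a))


rng = np.random.default_rng(7)
x = rng.standard_normal(400)
a = regressors(x, 3) @ np.array([0.8, -0.3, 0.1, 0.05]) + 0.05 * rng.standard_normal(400)
C = rls_taps(x, a, 3, 0.99)
print(np.allclose(C[-1], batch_taps(x, a, 3, 0.99), atol=1e-5))               # True
print(np.allclose(C[99], batch_taps(x[:100], a[:100], 3, 0.99), atol=1e-5))   # True
```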

3.3 DETECTION OF DIGITAL SIGNALS

An age old problem has been the detection of digital signals in additive coloured Gaussian noise (see e.g., [VT68]). Solutions that are optimal in a maximum likelihood sense have been described [WJ65]; however, attention has been primarily afforded to obtaining computable expressions for the likelihood ratios that occur in these solutions. Optimal detector structures are quite simple when the additive noise is white and has a circularly symmetric probability density, for example Gaussian [WJ65]. However, noise colouring complicates the problem, and two implementation approaches have been frequently employed in such situations. The first is simply to ignore the colouring altogether and, therefore, employ well known, realizable optimal detectors for additive white Gaussian noise. The second approach is to whiten the observed signal plus noise process and follow this with a detector which is optimal for white noise. This latter technique, using the so called whitening filter, requires prior knowledge of the covariance structure of the noise process in order to determine the inverse filter. This is clearly "asking for a lot". While one frequently does not know the covariance structure of a noise process, it may be possible to assert a model for the noise, e.g., that it is an autoregressive process of modest order. In this case, ladder forms provide a nice structure for optimal detection.
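When the noise model is known, whitening is a fixed FIR operation. The sketch below assumes a hypothetical second order autoregressive noise with known coefficients `a1`, `a2` and shows that the inverse filter turns strongly correlated noise into a white sequence. In the scheme described here, the ladder form would instead learn the equivalent coefficients adaptively from the data.

```python
import numpy as np

# Hypothetical AR(2) noise model: v_t = a1 v_{t-1} + a2 v_{t-2} + w_t, w_t white.
a1, a2 = 0.9, -0.4
rng = np.random.default_rng(3)
w = rng.standard_normal(50000)
v = np.zeros_like(w)
for t in range(len(w)):
    v[t] = w[t] + (a1 * v[t - 1] if t >= 1 else 0.0) + (a2 * v[t - 2] if t >= 2 else 0.0)


def lag1_corr(s):
    """Normalized lag-one sample autocorrelation."""
    return np.dot(s[:-1], s[1:]) / np.dot(s, s)


# Known model: the whitening (inverse) filter is the FIR filter 1 - a1 z^-1 - a2 z^-2.
u = v.copy()
u[1:] -= a1 * v[:-1]
u[2:] -= a2 * v[:-2]

print(lag1_corr(v))               # strongly correlated; theoretical value a1 / (1 - a2) is about 0.64
print(abs(lag1_corr(u)) < 0.03)   # True: the filtered noise is white
```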


Ladder filters are very useful for estimating the likelihood ratio of digital signals in additive Gaussian noise. In fact, it will become clear that the ladder form may essentially be used as an adaptive whitening filter which learns the covariance structure of the noise process and thus supplies a set of asymptotically orthogonal (i.e., white) sufficient statistics for detection. Simulations have shown that amplitude shift keyed (ASK), frequency shift keyed (FSK) and phase shift keyed (PSK) signals may be readily detected with ladder forms, e.g., [Le80]; however, in order to explain the detector structure, a simpler problem will be considered initially. It is desired to recover the timing of a known rate, baseband digital bit stream transmitted across an infinite bandwidth, additive Gaussian noise channel. The transmission format is of the non-return to zero (NRZ) variety, as shown in Figure 3.9. The timing recovery algorithm operates as follows. Both transmitter and receiver know the data rate, and each has a free running clock at the same frequency but at a different phase. Transitions in the received data are used to force the receiver clock into synchronism with the transmitter, so that it suffices to detect the occurrence of bit transitions for timing recovery. (This is known as jam-timing correction. Frequently, the bit transitions are only used to make small timing corrections, thus resulting in a smoother timing acquisition.) It is worth remarking that FSK and PSK signals may be analyzed in a similar manner to what will be presented here.

In embarking on this discussion, it is important to realize that the detailed analysis of the detector is not within the scope of this dissertation and will be presented elsewhere. However, the simple example of timing recovery is chosen to illustrate, firstly, the utility of ladder forms, and hence the utility of a signal processing chip capable of implementing the ladder


recursions (the design of which is the goal here). Secondly, this example is meant to illustrate the importance of developing algorithms that are readily implemented. In this case, knowing that an efficient ladder form chip is to be designed, a good optimal detection algorithm (or at least a timing recovery algorithm) utilizing the ladder form will be developed.

3.3.1 Timing Recovery With Ladder Forms

Consider the system of Figure 3.10, in which the receiver uses an unnormalized, prewindowed or sliding window ladder form to estimate likelihood variables. Suppose that the noise process is zero mean with an $n \times n$ covariance matrix, $\Sigma$, such that the joint distribution of $n$ samples of the input process, $y_n$, is:

$$p_y(\Sigma) = (2\pi)^{-n/2}\, |\Sigma|^{-1/2} \exp\!\left(-\tfrac{1}{2}\, y^T \Sigma^{-1} y\right)$$

where $y = [y_1\; y_2\; \cdots\; y_n]^T$ and $\Sigma$ is an unknown parameter.

Proposition 3.1: The statistic $T(y) = y^T \Sigma^{-1} y$ is sufficient for the family $P_y = \{ p_y(\Sigma) \mid \Sigma \in \omega \}$, with $\omega$ the parameter space.

Proof: $p_y(\Sigma)$ admits the factorization $g_\Sigma(T(y))\, h(y)$ with:

$$g_\Sigma(T(y)) = (2\pi)^{-n/2}\, |\Sigma|^{-1/2} \exp\!\left(-\tfrac{1}{2}\, T(y)\right), \qquad h(y) = 1$$

and so $T(y)$ is sufficient for $P_y$ by the well known factorization theorem [Le59].


Figure 3.9: Non-Return to Zero Transmission Format [plot: V(t) versus t, with bit values 1 and 0 and the level +V marked]

Figure 3.10: Digital Transmission System [diagram: modulator, channel with additive Gaussian noise, demodulator]


Proposition 3.2: $\gamma_n = y^T \Sigma_n^{-1} y = r_n^T \Sigma_r^{-1} r_n$

where:

$r_n$ = vector of backward residuals

$\Sigma_r$ = covariance of the backward residuals

$\bar{\gamma}_n = 1 - \gamma_n$

Proof: This was shown by Lee and Morf [LM80], assuming that the sample covariance matrix provides a good estimate of $\Sigma$, which is reasonable for a sufficiently large sample.
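The identity behind Proposition 3.2 can be illustrated with an exact, known covariance. In this sketch a Cholesky ($LDL^T$) factorization stands in for the orthogonalization the ladder performs recursively from data: it turns $y$ into uncorrelated innovations $e_i$ with variances $d_i$, and $y^T \Sigma^{-1} y$ becomes the sum of the normalized squared innovations. Reversing the sample ordering gives the backward-residual version. $\Sigma$, $y$ and the function name are hypothetical; the ladder itself works with sample estimates, as the proof notes.

```python
import numpy as np


def ldl_innovations(Sigma, y):
    """Innovations of y for covariance Sigma = L D L^T, L unit lower triangular.

    e_i is the error in predicting y_i linearly from y_1..y_{i-1}; d_i is its variance.
    """
    C = np.linalg.cholesky(Sigma)      # Sigma = C C^T, C lower triangular
    d = np.diag(C) ** 2
    L = C / np.diag(C)                 # divide column j by C[j, j] -> unit diagonal
    e = np.linalg.solve(L, y)          # e = L^{-1} y
    return e, d


rng = np.random.default_rng(8)
B = rng.standard_normal((6, 6))
Sigma = B @ B.T + 6.0 * np.eye(6)      # hypothetical positive definite covariance
y = rng.standard_normal(6)

quad = y @ np.linalg.solve(Sigma, y)                      # y^T Sigma^{-1} y
e_f, d_f = ldl_innovations(Sigma, y)                      # forward ordering
e_b, d_b = ldl_innovations(Sigma[::-1, ::-1], y[::-1])    # reversed ordering (backward residuals)
print(np.isclose(quad, np.sum(e_f**2 / d_f)), np.isclose(quad, np.sum(e_b**2 / d_b)))  # True True
```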

Hence, $\gamma_{n,T}$ is a sufficient statistic for detection; in fact, it is a likelihood variable. Although this property of $\gamma_{n,T}$ will not be specifically exploited in this simple motivation, it certainly justifies examining $\gamma_{n,T}$ as a possible test statistic for transition detection. For this reason, it is fruitful to establish its statistical distribution under various conditions. However, it is important to note that the unnormalized, weighted prewindowed ladder form equations of Chapter Two must be modified to normalize the covariances correctly. This influences the distribution of $\gamma_n$. In the sequel, it is assumed that the algorithm has been appropriately modified to normalize the covariances and partial covariances by the number of samples.

Proposition 3.3: Suppose that sufficient data has been observed to accurately estimate $\Sigma$, and hence $R^r_{k,T}$, and further assume that no transitions have occurred for a sufficiently long time, so that $E(\varepsilon_{k,T}) = E(r_{k,T}) = 0$ for $0 \le k \le n$. Then $\gamma_{n,T} \sim \chi^2_n$, i.e., a central chi squared variate with $n$ degrees of freedom.


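Proof: Under the conditions stated,

$$r_{k,T} \sim N(0, R^r_{k,T})$$

and

$$E[r_{k,T}\, r_{l,T}] = 0 \quad \forall\, k \neq l$$

by the orthogonality principle of linear least squares estimation [Pa65]. Hence

$$\gamma_{n,T} = [r_{1,T}\; \cdots\; r_{n,T}] \begin{bmatrix} R^r_{1,T} & & 0 \\ & \ddots & \\ 0 & & R^r_{n,T} \end{bmatrix}^{-1} \begin{bmatrix} r_{1,T} \\ \vdots \\ r_{n,T} \end{bmatrix} = \sum_{i=1}^{n} \frac{r_{i,T}^2}{R^r_{i,T}}$$

but

$$\frac{r_{i,T}}{\sqrt{R^r_{i,T}}} \sim N(0,1) \;\Rightarrow\; \gamma_{n,T} \sim \chi^2_n$$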

In keeping with the spirit of a simple example, the present analysis will be considerably simplified by ignoring the random nature of the reflection coefficients and replacing them with fixed values during periods of non-transition (simulations presented at the end of this section will justify this simplification). It is now necessary to explore how the reflection coefficients change during a transition.

Proposition 3.4: Let a transition of $y$ occur at $t = t_0$ from $+V$ to $-V$. The reflection coefficients of the ladder filter are unaltered by the transition.


Proof: see appendix A.

Proposition 3.5: For a transition of $2V$ in the received signal at $t = t_0$, $E(r_k) = 2V$ at $t = t_0 + k$ for $0 < k \le n$. Otherwise $E(r_k) = 0$.

Proof: see appendix B. The consequence of this proposition is that an impulse of change in residual mean propagates through the ladder form, altering the expectation of each residual for one time step.

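Theorem 3.1: For a transition of $2V$ at $t = t_0$ in the received signal, $\gamma_{n,t} \sim \chi^2_n(\lambda_t)$ for $t_0 < t \le t_0 + n$, with noncentrality parameter given by:

$$\lambda_t = \frac{4V^2}{R^r_{i,t}}, \qquad i = t - t_0$$

Proof:

$$\gamma_{n,t} = \sum_{i=1}^{n} \frac{r_{i,t}^2}{R^r_{i,t}}, \qquad t_0 < t \le t_0 + n$$

but at $t = t_0 + k$

$$\frac{r_{i,t}}{\sqrt{R^r_{i,t}}} \sim N(0,1) \quad \forall\, i \neq k$$

and from Proposition 3.5

$$\frac{r_{k,t}}{\sqrt{R^r_{k,t}}} \sim N\!\left(\frac{2V}{\sqrt{R^r_{k,t}}},\, 1\right)$$

$$\Rightarrow\; \gamma_{n,t} \sim \sum_{n-1} \chi^2_1 + \chi^2_1(\lambda_t) = \chi^2_n(\lambda_t)$$

by the reproductive property of the chi square distribution [Le59].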
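Proposition 3.3 and Theorem 3.1 can be checked by simulation under the same idealization (fixed, exactly known residual variances). The sketch assumes hypothetical values $V = 1$, $R^r_{k,t} = 0.5$ and $n = 8$, draws the normalized residuals directly rather than running a ladder filter, and compares the sample moments of $\gamma$ with the chi squared values: mean $n$ and variance $2n$ without a transition, mean $n + \lambda$ and variance $2(n + 2\lambda)$ with one.

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 8, 200000
V, R_k = 1.0, 0.5                       # hypothetical step half-amplitude and residual variance
lam = (2 * V) ** 2 / R_k                # noncentrality 4 V^2 / R, here 8.0

z = rng.standard_normal((trials, n))    # normalized residuals r_i / sqrt(R_i) with no transition
gamma_h0 = np.sum(z ** 2, axis=1)

z1 = z.copy()
z1[:, 0] += 2 * V / np.sqrt(R_k)        # the single residual whose mean is shifted by the transition
gamma_h1 = np.sum(z1 ** 2, axis=1)

print(gamma_h0.mean(), gamma_h0.var())  # approximately n and 2n
print(gamma_h1.mean(), gamma_h1.var())  # approximately n + lam and 2 (n + 2 lam)
```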


This theorem and Proposition 3.3 are sufficient to completely characterize $\gamma_{n,T}$ both before and after a transition. Following a transition, $\gamma_{n,T}$ reverts to having a zero noncentrality parameter within $n$ time steps, as a consequence of Proposition 3.5. Hence, the timing recovery problem has now been reduced to a simple two hypothesis testing problem, for which a wealth of knowledge exists. It is important to note that most of these proofs rely on $r_{i,t}/\sqrt{R^r_{i,t}}$ being normally distributed. Strictly speaking, this is not the case, since $R^r_{i,t}$ is only an estimate of the true error covariance (that is, the correct distribution is Student's t). However, after a sufficient number of samples (which is quite small for Gaussian signals), this estimate has enough degrees of freedom to justify the approximation.

Theorem 3.1 indicates that there are $n$ sequential opportunities for detecting the transition, since $\lambda$ is nonzero for $n$ samples. (Earlier detection is of course more desirable with a jam timing scheme, since the expected timing error is reduced.) Let $P_{F,i}$ be the false alarm probability at the ith sample ($i = 1, 2, \ldots, n$) and $P_{M,i}$ the associated missed detection probability. Then the overall false alarm ($P_{FA}$) and detection ($P_D$) probabilities are:

$$P_{FA} = \sum_{i=1}^{n} P_{F,i} \tag{3.9}$$

$$P_D = 1 - \prod_{i=1}^{n} P_{M,i} \tag{3.10}$$

Notice that while the false alarm probability increases additively, the missed detection probability decreases geometrically.
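Equations (3.9) and (3.10) combine the per-sample probabilities as sketched below. The additive form of (3.9) is the union bound, which is accurate when the per-sample false alarm probabilities are small; for independent trials the exact value would be $1 - \prod_i (1 - P_{F,i})$. The product in (3.10) likewise treats the $n$ trials as independent. The function name and the numbers in the usage line are hypothetical.

```python
import math


def overall_probabilities(pf_each, pm_each):
    """Combine per-sample false alarm and miss probabilities over n detection opportunities.

    P_FA follows the additive form of (3.9); P_D follows (3.10), where an overall miss
    requires every one of the n trials to miss (trials treated as independent).
    """
    p_fa = sum(pf_each)
    p_d = 1.0 - math.prod(pm_each)
    return p_fa, p_d


# Hypothetical per-sample values: four opportunities, each with P_F = 0.01 and P_M = 0.5.
p_fa, p_d = overall_probabilities([0.01] * 4, [0.5] * 4)
print(round(p_fa, 6), p_d)   # 0.04 0.9375
```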

Consider the evaluation of the false alarm ($P_F$) and detection ($P_D$) probabilities for a single sample. A bit transition is now determined through the two hypothesis problem:

$H_0$: no transition, i.e., $\gamma_{n,T} \sim \chi^2_n$

versus $H_1$: transition, i.e., $\gamma_{n,T} \sim \chi^2_n(\lambda)$

which results in a threshold detection scheme of the form:

$$\gamma_{n,T} \begin{cases} < T & \text{no transition} \\ > T & \text{transition} \\ = T & \text{randomize if desired} \end{cases}$$

where $T$ is the threshold which determines the type I and type II (false alarm and missed detection) error probabilities. Unfortunately, it is difficult to characterize the test, since $\lambda$ is not a location parameter of the chi squared distribution. However, good results may be obtained through the use of Fisher's normal approximation to the central $\chi^2$ distribution and Patnaik's [Pa49] approximation to the noncentral $\chi^2$. For $\gamma_n(\lambda)$, a chi squared variate with a large number, $n$, of degrees of freedom and noncentrality parameter $\lambda$, these approximations are:

$$\sqrt{2\gamma_n(0)} \sim N\!\left(\sqrt{2n-1},\, 1\right)$$

and

$$\sqrt{2\gamma_n(\lambda)} \sim N(\mu, \sigma^2)$$

with

$$\mu = \sqrt{\frac{n+2\lambda}{n+\lambda}}\;\sqrt{2\nu - 1} \tag{3.11}$$

$$\sigma^2 = \frac{n+2\lambda}{n+\lambda} \tag{3.12}$$

The approximation of the noncentral chi squared distribution is better than that of the central distribution for a given value of $n$, because the former has effectively $\nu > n$ degrees of freedom, given by:

$$\nu = \frac{(n+\lambda)^2}{n+2\lambda} \tag{3.13}$$


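The two approximate distributions are shown in Figure 3.11. Once a threshold, $T$, is established, the false alarm and detection probabilities are given by:

$$P_F = \text{Prob}[\text{reject } H_0 \mid \lambda = 0] = \operatorname{erfc}\!\left(T - \sqrt{2n-1}\right) \tag{3.14}$$

and

$$P_D = 1 - \text{Prob}[\text{accept } H_0 \mid \lambda = \lambda_0 \neq 0] = 1 - \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{T} e^{-(x-\mu)^2/2\sigma^2}\, dx = \operatorname{erfc}\!\left(\frac{T-\mu}{\sigma}\right) \tag{3.15}$$

where $\sigma$, $\mu$ are given by (3.11), (3.12) for $\lambda = \lambda_0$, and

$$\operatorname{erfc} x = 1 - \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\, dt$$

is the complementary error function. (In this Gaussian tail form it differs from the conventional definition $\frac{2}{\sqrt{\pi}}\int_x^\infty e^{-t^2}dt$ by a change of scale: the function used here equals half the conventional erfc evaluated at $x/\sqrt{2}$.)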
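A minimal sketch of the single-sample computation follows. It assumes the Patnaik form $\sigma^2 = (n+2\lambda)/(n+\lambda)$, $\nu = (n+\lambda)^2/(n+2\lambda)$, $\mu = \sigma\sqrt{2\nu - 1}$, which reduces to mean $\sqrt{2n-1}$ and unit variance when $\lambda = 0$, and it implements the document's erfc (a Gaussian tail probability) through the standard library's `math.erfc` with a rescaled argument. The threshold is applied on the $\sqrt{2\gamma}$ scale on which (3.14) is written. Parameter values and function names are hypothetical.

```python
import math
from statistics import NormalDist


def tail(x):
    """Gaussian tail probability, i.e. the 'erfc' of (3.14)-(3.15) (standard erfc rescaled)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_inv(p):
    """Inverse of tail(): the x with tail(x) = p."""
    return NormalDist().inv_cdf(1.0 - p)


def approx_params(n, lam):
    """Mean, variance and effective degrees of freedom for sqrt(2 * gamma_n(lam))."""
    var = (n + 2.0 * lam) / (n + lam)
    nu = (n + lam) ** 2 / (n + 2.0 * lam)
    mu = math.sqrt(var) * math.sqrt(2.0 * nu - 1.0)
    return mu, var, nu


def threshold(n, p_f):
    """Threshold on sqrt(2 * gamma) giving false alarm probability p_f, from (3.14)."""
    return math.sqrt(2.0 * n - 1.0) + tail_inv(p_f)


def detection_probability(n, lam, T):
    """Single-sample detection probability from (3.15)."""
    mu, var, _ = approx_params(n, lam)
    return tail((T - mu) / math.sqrt(var))


n, p_f = 8, 1e-3                         # hypothetical values
T = threshold(n, p_f)
for lam_db in (0.0, 10.0, 16.0):
    lam = 10.0 ** (lam_db / 10.0)        # lambda given in dB, as in the figures
    print(lam_db, detection_probability(n, lam, T))
```

Setting $\lambda = 0$ in `detection_probability` returns the false alarm probability itself, which is a convenient consistency check on (3.14) and (3.15).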

Example 3.1:

Recall the timing recovery scheme outlined in Section 3.3. Such a "jammed-timing" scheme is clearly more sensitive to false alarms than to missed detections, for an incorrect timing jam (false alarm) can cause large timing errors. However, failing to make an occasional timing correction (missed detection) is not crucial since, once the transmitter and receiver clocks are synchronized, they drift slowly.


The detection threshold for a given $P_F$ is determined from (3.14),

$$T = \sqrt{2n-1} + \operatorname{erfc}^{-1}(P_F)$$

following which the detection probability is given by (3.15). Figures 3.12 and 3.13 show the threshold and miss probabilities for various $P_F$, $\lambda$ for the case of $n = 3$. The integrity of the normal approximation to the noncentral chi squared may be ascertained from the effective degrees of freedom, $\nu$, shown in Figure 3.14.

Remarks:

1) The noncentrality parameter is a measure of the signal to noise ratio of the whitened signal components.

2) Applications which are more sensitive to missed detection than to false alarm (e.g., radar) can benefit from repeated detection trials, as indicated by Equations (3.9) and (3.10). For example, $P_F = .0001$ leads to $P_M = .06$ at $\lambda = 16$ dB. However, with an eighth order ladder filter, there are eight opportunities for detection, which results in $P_{FA} = .0008$ and $P_M = 1 - P_D = 1.2 \times 10^{-10}$, assuming $\lambda_i \approx \lambda$, $i = 1, 2, \ldots, n$ (which simulations will show to be a reasonable assumption).

3) Normal approximations to a chi squared variate are accurate for large degrees of freedom (e.g., $\nu > 20$). This fact, together with Figure 3.14, should be borne in mind when using Figure 3.13. The normal approximation provides a convenient vehicle for a simple exposition such as this; however, accurate results for low signal to noise ratios must be obtained from the chi squared distributions.

4) The present analysis has involved the test of a simple hypothesis against a simple alternative. This corresponds to having prior knowledge of the noise power (effectively, $\lambda$). A composite hypothesis

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
P ro b ab ility of False Alarm(Pp)

1 0 '7 10"6 10-5 1 0 '4 H O '3 10‘ 2 1 0 '1 lO13

Figure 3 .1 2 : D e te c tio n Threshold v s . F a ls e Alarm P r o b a b ilit y

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
- 47 -

:=.oi

Lambda (dS)

Figure 3 .1 3 : Missed D e te ctio n P r o b a b i l it y f o r Various Pr ,X

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
- 48 -

40 -

20 .

10 .

Lambda (dB)

0 5 10 15 20

F igure 3 .1 4 : E f f e c t i v e Degrees o f Freedom o f y_ -i- U )


— — — n , i

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
- 49 -

t e s t would provide re s u lts for c a se s w here th e noise pow er a n d th e bias

in th e p a rtia l c o rre la tio n coefficients due to th e noise is unknown.

5) The previous section has explored timing recovery, rather than the complete problem of signal detection. A simple differential detection method using the timing recovery data may be readily constructed, for every transition corresponds to a bit change. The usual problems associated with differential detection [TS71] are encountered. Ladder forms may also be used for memoryless detection; however, a treatment of that subject is not consistent with the goals of this dissertation.

6) The baseband case that has been explored here consists of a nearly discontinuous signal corrupted by a differentiable noise process. Singular detection theory¹ shows that it is always theoretically possible to detect the signal with arbitrarily low error probability by differentiating the signal plus noise process. The ladder filter provides a realistic detection structure in the singular case.

7) Many assumptions were made in presenting this simplified analysis, since a detailed evaluation was not consistent with the theme of the thesis. Perhaps the most serious of these was to ignore the contribution of the autoregressive noise process to the reflection coefficients of the ladder. However, this assumption is not as restrictive as it may initially appear, since the detector can first be trained to the noise spectrum in the absence of a signal, and account may then be taken of the noise reflection coefficients when a signal is present (i.e. during normal operation of the detector).

1 I am grateful to Professor Kailath for pointing out this fact.
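Remark 2 above can be illustrated with a small calculation. Equations (3.9) and (3.10) are not reproduced here, so this sketch assumes the common form for $N$ independent detection opportunities in which a detection in any one trial counts: $P_{F,N} = 1 - (1 - P_F)^N$ and $P_{M,N} = P_M^N$, with $\lambda_i \approx \lambda$ so that every trial has the same $P_F$ and $P_M$. The function name is hypothetical.

```python
def combined_error_probabilities(p_f, p_m, trials):
    """Return (P_F, P_M) after `trials` independent detection opportunities.

    Assumed model: a false alarm occurs if any trial raises one, and a miss
    occurs only if every trial misses.
    """
    if trials < 1:
        raise ValueError("trials must be >= 1")
    p_f_total = 1.0 - (1.0 - p_f) ** trials
    p_m_total = p_m ** trials
    return p_f_total, p_m_total


if __name__ == "__main__":
    # Eight opportunities, as for an eighth order ladder filter.
    pf8, pm8 = combined_error_probabilities(1e-4, 0.06, 8)
    print(f"P_F = {pf8:.2e}, P_M = {pm8:.2e}")
```

Under this model the miss probability falls geometrically with $N$, while the false alarm probability grows only about linearly, roughly $N P_F$ for small $P_F$.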


3.3.2 Simulation Results

A modem simulation program embodying a transmitter, an additive Gaussian noise channel and a receiver was written to gather simulation data. The noise source was an eighth order autoregressive model driven by white noise. Figure 3.15 shows the evolution of the $K_{i,T}$ with time for a clean, baseband, NRZ format signal. Notice that $K_{1,T} = 1$, which was assumed in Proposition 3.4, is a good assumption, as is the approximation that the remaining partial correlation coefficients are zero. Notice further that Proposition 3.5 is verified, since the values of the backward residuals behave as expected. The same results are shown in Figure 3.16 for a much lower signal to noise ratio.

Simulations were also performed for PSK and FSK signals for various signal to noise ratios and ladder filter orders. These are shown in Figures 3.17 - 3.21. Notice that a normalized value of $\gamma_{n,T}$ has been plotted for convenience, as well as the change in $\gamma_{n,T}$ from sample to sample. The results (e.g. Figures 3.15 - 3.18) show that $\lambda_i \approx \lambda$, $i = 1, 2, \ldots, n$ following a transition, which provides a convenient way to apply Equations (3.9) and (3.10), since $P_{F_i}$ and $P_{M_i}$ become independent of $i$. The transitions in $\gamma_{n,T}$, even for low signal to noise ratios, are quite pronounced. The effect of reducing the order of the ladder is to widen the band of noise (e.g. Figures 3.19 and 3.21), since the ladder is unable to match that portion of the noise spectrum.

CHAPTER SUMMARY AND CONCLUSIONS

The purpose of Chapter Three was to identify target applications and throughput requirements for the signal processing chip to be designed.

Figure 3.15: Baseband Simulation, SNR = 46 dB, 8th order ladder (signal + noise and ladder traces vs. time)

Figure 3.16: Baseband Simulation, SNR = 20 dB, 8th order ladder (signal + noise and ladder traces vs. time)

Figure 3.17: Binary PSK, SNR = 12.4 dB, 8th order ladder

Figure 3.18: Binary PSK, SNR = 0 dB, 8th order ladder

Figure 3.19: Binary PSK, SNR = 12.4 dB, 2nd order ladder

Figure 3.20: Binary FSK, SNR = 26.3 dB, 8th order ladder

Figure 3.21: Binary FSK, SNR = 12.4 dB, 4th order ladder

Ladder filters were shown to provide a unified structure for the signal processing tasks of speech analysis and synthesis, adaptive equalization and digital signal detection. Voiceband applications of these tasks frequently entail handling data at an 8 kHz sample rate. Therefore, the signal processing chip must be capable of computing the ladder filter equations at that rate. All of the suggested applications use filters which are typically of eighth or tenth order. It would be ideal if the chip to be designed could compute all ten stages in the requisite amount of time; however, it will suffice to compute one stage per chip and cascade ten chips to form the tenth order filter.

The complexity of the normalized ladder form equations motivated an alternate formulation that exposes the fundamental role of generalized vector rotations in describing the algorithms. It was shown that the recursions could be readily computed as a sequence of a few two-dimensional rotations. The arithmetic unit of the chip will therefore be based on efficient numerical techniques, studied in the next chapter, for performing these rotations.

Finally, note that the new scheme developed for digital signal detection is important for two reasons. First, it is important in its own right as a detection scheme, because it requires less a priori knowledge than existing schemes and it is capable of adapting to changes in the noise environment. Second, it is an example of devising an algorithm that is amenable to implementation for a specific task. In this case, by using ladder filters in the detection algorithm, it is now possible to perform signal detection using the same hardware that would have been developed for speech processing and adaptive equalization. In fact, the commonality of equalizer and detector structures makes the implementation of a modem quite easy.


APPENDIX A

To prove that $K^{\varepsilon}_{i,T}$ and $K^{r}_{i,T}$ are constant across a transition, the following simplifying assumption is made:

Since the noise spectrum is not known, the ladder filter will be analyzed with no noise. The resulting values of the partial correlation coefficients will be used to analyze the signal plus noise case. Strictly speaking, this is not correct, since the coefficients are biased by the noise spectrum.

The NRZ baseband signal admits the autoregressive model:

$$y_T = k\, y_{T-1} \qquad (A.1)$$

where

$$k = \begin{cases} -1 & \text{at a transition} \\ +1 & \text{everywhere else} \end{cases}$$

Now, from the orthogonality principle of linear least squares estimation [Pa65]:

$$K^{\varepsilon}_{1,T} = \frac{E(y_T\, y_{T-1})}{E(y_{T-1}\, y_{T-1})} \qquad (A.2)$$

From (A.1), $E(y_T\, y_{T-1}) = k\, E(y_{T-1}\, y_{T-1})$, so that

$$K^{\varepsilon}_{1,T} = k = 1,$$

i.e., the best estimate of $y_T$ from $y_{T-1}$ is simply $y_{T-1}$. Similarly, $K^{r}_{1,T} = 1$.

Finally, all higher order partial correlation coefficients are zero since

$$\varepsilon_{n,T} = y_T - \hat{E}\left[y_T \mid y_{T-1}, y_{T-2}, \ldots, y_{T-n}\right] = y_T - \sum_{i=1}^{n} a_i\, y_{T-i}$$

but $E\{\varepsilon_{n,T}\, y_{T-i}\} = 0$, $i = 1, 2, \ldots, n$, which leads to

$$a_1 = 1, \qquad a_i = 0, \quad i = 2, 3, \ldots, n.$$

Hence, the best estimate of $y_T$ is $y_{T-1}$, setting $K^{\varepsilon}_{1,T} = K^{r}_{1,T} = 1$ and $K^{\varepsilon}_{i,T} = K^{r}_{i,T} = 0$ for $i > 1$.
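The argument can be checked numerically by replacing the expectations in (A.2) with time averages over a noiseless NRZ sequence. This is a sketch under that substitution; the samples-per-bit value and the function names are illustrative. With $t$ transitions in $N$ samples the estimate equals $(N - 1 - 2t)/(N - 1)$, which approaches 1 when transitions are rare, consistent with $K^{\varepsilon}_{1,T} = 1$ away from a transition.

```python
def nrz_waveform(bits, samples_per_bit):
    """Noiseless baseband NRZ signal: each bit (+1 or -1) held for samples_per_bit samples."""
    return [b for b in bits for _ in range(samples_per_bit)]


def first_partial_correlation(y):
    """Time-average version of (A.2): sum y_T*y_{T-1} / sum y_{T-1}**2."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] * y[t - 1] for t in range(1, len(y)))
    return num / den


if __name__ == "__main__":
    y = nrz_waveform([1, -1, 1, 1], samples_per_bit=10)  # 40 samples, 2 transitions
    print(first_partial_correlation(y))  # equals (39 - 2*2) / 39
```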

APPENDIX B

Proof of Proposition 3.5:

Let

$$u_T = \begin{cases} 1 & T < t_0 \\ -1 & T \ge t_0 \end{cases}$$

$$\delta_T = \ldots$$

Proposition 3.4 asserted that

$$K^{\varepsilon}_{i,T} = K^{r}_{i,T} = \begin{cases} 1 & i = 1 \\ 0 & i = 2, 3, \ldots, n \end{cases}$$

When the input to the ladder form is the signal $u_T$ plus $n_T$, where $n_T$ is a zero mean noise process, the above values of $K^{\varepsilon}_{i,T}$ and $K^{r}_{i,T}$ yield the following (recall that the bias in $K^{\varepsilon}_{i,T}$ and $K^{r}_{i,T}$ due to the noise was ignored in this simple exposition). Since $K^{\varepsilon}_{i,T} = K^{r}_{i,T} = 0$, $i = 2, 3, \ldots, n$, it is clear that

$$E(r_{i,T}) = E(r_{1,T-i+1}) = \ldots$$


BIBLIOGRAPHY

[AH71] B. Atal, S. Hanauer, "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave," Journal of the Acoustical Society of America, Vol. 50, 1971, pp. 637-655.

[Fa60] G. Fant, Acoustic Theory of Speech Production, Mouton and Co., 1960.

[GK73] M. Gevers, T. Kailath, "An Innovations Approach to Least-Squares Estimation, Part VI: Discrete-Time Innovations Representations and Recursive Estimation," IEEE Transactions on Automatic Control, Vol. AC-18, December 1973, pp. 588-600.

[Le59] E. Lehmann, Testing Statistical Hypotheses, J. Wiley and Sons, 1959.

[Le80] D.T. Lee, "Canonical Ladder Form Realizations and Fast Estimation Algorithms," Ph.D. Dissertation, Stanford University, Dept. of Electrical Engineering, 1980.

[LM80] D. Lee, M. Morf, "A Novel Innovations Based Time Domain Pitch Detector," Proc. of Int'l. Conf. on Acoustics, Speech and Signal Processing, Denver, CO, 1980, pp. 40-44.

[LSW65] R. Lucky, J. Salz and E. Weldon, Principles of Data Communications, McGraw-Hill, 1968.

[MG76] J. Markel, A. Gray Jr., Linear Prediction of Speech, Springer-Verlag, 1976.

M. Morf, C. Muravchik, D. Lee, "Hilbert Space Array Methods for Alpha-Stationary Process Estimation," Proc. of Int'l. Conf. on Acoustics, Speech and Signal Processing, Atlanta, GA, 1981, pp. 856-859.

[Mo74] M. Morf, "Fast Algorithms for Multivariable Systems," Ph.D. Dissertation, Stanford University, Dept. of Electrical Engineering, 1974.

[Pa49] P.B. Patnaik, "The Non-Central χ² and F Distributions and Their Applications," Biometrika, Vol. 36, 1949, pp. 202-232.

[Pa65] A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-Hill, 1965.

[Sh79] M. Shensa, "A Least-Squares Lattice Decision Feedback Equalizer," Proc. of Int'l. Communications Conference, 1980, pp. 57.6.1-57.6.5.

[SP80] E. Satorius, J. Pack, "A Least Squares Adaptive Lattice Equalizer Algorithm," Naval Ocean Systems Center, Technical Report 575, September 1980.

[TS71] H. Taub, D. Schilling, Principles of Communication Systems, McGraw-Hill, 1971.

[VT68] H. Van Trees, Detection, Estimation and Modulation Theory, Volume 1, J. Wiley and Sons, 1968.

[Wi49] N. Wiener, Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Applications, Technology Press and Wiley, 1949.

[Wi70] B. Widrow, "Adaptive Filters," in Aspects of Network and System Theory (R. Kalman, N. DeClaris, eds.), Holt, Rinehart and Winston, 1970.

[WJ65] J. Wozencraft, I. Jacobs, Principles of Communication Engineering, J. Wiley, 1965.

CHAPTER FOUR

NUMERICAL ALGORITHMS

The previous chapter demonstrated the need for the efficient computation of elementary trigonometric functions and square roots, as well as of $2 \times 2$ rotations and J-rotations. Examples in future chapters will demonstrate the proliferation of these operations in matrix algebra algorithms that are commonplace in signal processing. This suggests a radical departure from current thinking regarding signal processing computers, which are presently based on fast multiply and add circuits [AMI79] [Bell81] [KNSYM80]. It now seems that computers capable of vector rotation would be more natural as signal processors.

The problem of constructing numerical algorithms, efficient in both hardware and software, for the above functions (which comprise the rather rich set of elementary operations in signal processing) has been addressed by a number of authors, e.g. [Me62] [CET62] [Sp65] [SK71] [DeL70]. Two promising approaches for the present requirements are the CORDIC algorithms of Volder [Vo59] and Walther [Wa71] and the convergence computation techniques of Chen [Ch71].

This chapter will first describe both of these techniques. Some major improvements to the algorithms, which simplify their implementation while enhancing throughput, will be presented. A generalization of Chen's method, which provides a framework for demonstrating that the CORDIC algorithms are actually a special case of the convergence computation method, will be considered last. Many new and useful functions may be computed in the generalized framework.


4.1 THE CORDIC ALGORITHMS

The CORDIC computing technique was first presented by Volder [Vo59] as iterative equations for computing some elementary functions, such as plane rotations and hyperbolic (or J-) rotations on two dimensional vectors. Multiplication, division, $\tan^{-1}$, $\tanh^{-1}$ and square roots were also included in Volder's method. Walther [Wa71] showed that the CORDIC algorithms could be unified into one set of iterative equations parameterized by a quantity $m$, which represents a coordinate system in which the radial component, or norm, $R$, and the angular component, $\Phi$, of a vector $X = (x, y)$ are given by

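$$R = \sqrt{x^2 + m y^2} = \left\| (x, y)^T \right\|_S, \qquad S = \begin{pmatrix} 1 & 0 \\ 0 & m \end{pmatrix} \qquad (4.1)$$

$$\Phi = m^{-1/2} \tan^{-1}\!\left( \frac{y\sqrt{m}}{x} \right) \qquad (4.2)$$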

Figure 4.1 depicts $R$ and $\Phi$ for $m = -1, 0, 1$, which are referred to as the hyperbolic, linear and circular coordinate systems, respectively. The CORDIC iterations correspond to rotating a vector along one of the curves of Figure 4.1.
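For reference, (4.1) and (4.2) can be evaluated directly in each coordinate system. A minimal sketch (the function name is hypothetical); the $m = 0$ branch uses the limit of (4.2) as $m \to 0$, and the $m = -1$ branch uses the identity $(-1)^{-1/2}\tan^{-1}(i\,y/x) = \tanh^{-1}(y/x)$.

```python
import math


def generalized_polar(x, y, m):
    """Radial and angular components (4.1)-(4.2) for m = 1, 0, -1.

    Assumes x > 0, and |y| < x when m = -1.
    """
    r = math.sqrt(x * x + m * y * y)
    if m == 1:
        phi = math.atan(y / x)          # circular
    elif m == 0:
        phi = y / x                     # linear: limit of (4.2) as m -> 0
    elif m == -1:
        phi = math.atanh(y / x)         # hyperbolic
    else:
        raise ValueError("m must be 1, 0 or -1")
    return r, phi


if __name__ == "__main__":
    for m in (1, 0, -1):
        print(m, generalized_polar(5.0, 3.0, m))
```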

The CORDIC recursions rotate a vector $X_i = (x_i, y_i)^T$ to a new vector $X_{i+1} = (x_{i+1}, y_{i+1})^T$ according to

(4.3)

where $\mu_i = \pm 1$ determines the direction of rotation at each iteration and $\{\delta_i\}$ is a sequence of arbitrary constants representing the magnitude of that rotation. After $n$ iterations, the new radial and angular components of the vector are

$$\Phi_n = \Phi_0 - \alpha \qquad (4.4a)$$

Figure 4.1: Rotation in Generalized Coordinate Systems (panels for $m = 1$, $m = 0$ and $m = -1$; S = shaded area)

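$$R_n = R_0 K_m \qquad (4.4b)$$

where

$$\alpha = \sum_{i=0}^{n-1} \mu_i \alpha_i = \sum_{i=0}^{n-1} \mu_i\, m^{-1/2} \tan^{-1}\!\left(\delta_i \sqrt{m}\right) \qquad (4.5a)$$

$$K_m = \prod_{i=0}^{n-1} K_i = \prod_{i=0}^{n-1} \sqrt{1 + m\,\delta_i^2} \qquad (4.5b)$$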

An auxiliary variable is introduced to accumulate the total rotation:

$$z_{i+1} = z_i - \mu_i \alpha_i \qquad (4.6)$$

(notice that $\alpha_i > 0$, the sign of the rotation being chosen at each step through $\mu_i$).

As an example, consider evaluating $\tan^{-1}(y_0/x_0)$. If the initial vector $X_0 = (x_0, y_0)$ is rotated through a sequence of angles $\{\alpha_i\}$ until $X_n = (x_n, 0)$, then $z_n$ will equal the net rotation, provided it was initially equal to zero. Thus $z$ is a useful quantity since after $n$ rotations it accumulates the net rotation,

$$z_n = z_0 - \sum_{i=0}^{n-1} \mu_i \alpha_i \qquad (4.7)$$

where $\mu_i$ is the sign of the $i$th rotation ($\pm 1$).

The CORDIC algorithm comprises Equations (4.3) and (4.6). If the rotations are made to proceed in a manner which forces $y$ or $z$ to zero, i.e., $y_n \to 0$ or $z_n \to 0$, then $x_n$, $y_n$ and $z_n$ yield some useful functions, one of which is the arctangent example given above. Figure 4.2 summarizes the rich complement of functions that are obtained when $y \to 0$ or $z \to 0$.
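The iteration can be sketched in floating point as follows. The sketch assumes the clockwise form $x_{i+1} = x_i + m\mu_i\delta_i y_i$, $y_{i+1} = y_i - \mu_i\delta_i x_i$ for (4.3), together with (4.6) for $z$; this choice of signs is an assumption, although it does multiply $x^2 + m y^2$ by exactly $1 + m\delta_i^2$ at each step, as (4.1) and (4.5b) require. The shift sequence $\delta_i = 2^{-F_i}$ and all names are illustrative. Under these assumptions, with $m = 1$, driving $z$ to zero gives $x_n = K_1(x_0\cos z_0 + y_0\sin z_0)$, and driving $y$ to zero gives $x_n = K_1\sqrt{x_0^2 + y_0^2}$ and $z_n = z_0 - \tan^{-1}(y_0/x_0)$.

```python
import math


def elementary_angle(m, delta):
    """alpha_i = m**-0.5 * atan(delta * sqrt(m)), from (4.5a)."""
    if m == 1:
        return math.atan(delta)
    if m == 0:
        return delta
    if m == -1:
        return math.atanh(delta)
    raise ValueError("m must be 1, 0 or -1")


def cordic(x, y, z, m, shifts, mode, s=1):
    """Run len(shifts) CORDIC iterations with delta_i = 2**-F_i.

    mode="rotation" drives z -> 0; mode="vectoring" drives y -> 0.
    s = 1 gives the z-update of (4.6); s = -1 gives the sign-reversed update.
    Assumed (clockwise) recursion; the signs are an assumption:
        x <- x + m*mu*delta*y,  y <- y - mu*delta*x,  z <- z - s*mu*alpha
    """
    for f in shifts:
        delta = 2.0 ** -f
        alpha = elementary_angle(m, delta)
        if mode == "rotation":
            mu = s if z >= 0 else -s        # steps z toward zero
        elif mode == "vectoring":
            mu = 1 if x * y >= 0 else -1    # steps y toward zero (x > 0)
        else:
            raise ValueError("mode must be 'rotation' or 'vectoring'")
        x, y = x + m * mu * delta * y, y - mu * delta * x
        z = z - s * mu * alpha
    return x, y, z


if __name__ == "__main__":
    shifts = range(40)                      # F_i = i for the circular system
    k1 = math.prod(math.sqrt(1 + 4.0 ** -f) for f in shifts)
    xr, yr, _ = cordic(1.0, 0.0, 0.5, 1, shifts, "rotation")
    print(xr / k1, math.cos(0.5))           # x_n / K_1 ~ cos z_0
    xv, _, zv = cordic(3.0, 4.0, 0.0, 1, shifts, "vectoring")
    print(xv / k1, zv, -math.atan(4 / 3))   # ~5, and z_n ~ -atan(4/3)
```

The hyperbolic system ($m = -1$) runs through the same function, but only with a shift sequence that satisfies the convergence criterion discussed in Section 4.1.1.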

Figure 4.2: The CORDIC Functions

Notice that $\delta_i$ is related to the tangent of $\alpha_i$ (in the appropriate coordinate system, of course), and this has reduced the number of multiplications in Equation (4.3) from the usual four required in a vector rotation. The penalty for this simplification is the spurious scale factor, $K$, since each iteration of (4.3) is a rotation as well as a stretching.
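Because $K_m$ depends only on the shift sequence, it can be tabulated once. A short sketch of (4.5b) with $\delta_i = 2^{-F_i}$; the sequences are illustrative, the hyperbolic one omits the repeated shifts that practical designs use (so it only illustrates the product), and the function name is hypothetical.

```python
import math


def scale_factor(m, shifts):
    """K_m = prod_i sqrt(1 + m * delta_i**2) with delta_i = 2**-F_i, as in (4.5b)."""
    k = 1.0
    for f in shifts:
        delta = 2.0 ** -f
        k *= math.sqrt(1.0 + m * delta * delta)
    return k


if __name__ == "__main__":
    print(scale_factor(1, range(0, 24)))   # circular, F_i = 0, 1, ..., 23
    print(scale_factor(0, range(0, 24)))   # linear: exactly 1
    print(scale_factor(-1, range(1, 24)))  # hyperbolic, F_i = 1, ..., 23, no repeats
```

For the circular system with $F_i = 0, 1, 2, \ldots$ the product converges quickly to about 1.6468, so each circular CORDIC operation stretches the vector by roughly 65%, while the linear system has no stretching at all.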

A definition of positive and negative angles is implicit in the CORDIC equations, since the direction of rotation is chosen at each iteration to achieve a prescribed destination, e.g., $z_i \to 0$. The definitions are chosen to yield the functions of Figure 4.2. It will frequently be convenient to reverse the definition of positive angles so as to generate slightly different functions. For example, recall Equation (3.4), in which the matrix, $N$, is a transposed rotation matrix. This may be represented as a plane rotation through the negated angle. Furthermore, the need for multiply-and-subtract rather than multiply-and-accumulate will become apparent in a variety of cases later. It would clearly be more convenient to absorb these sign changes into the CORDIC recursions (which is a relatively simple control task in a CORDIC machine) than to incur the speed and complexity penalties of separately negating a quantity. The reversed sign functions shown in Figure 4.3 can be readily generated by simply reversing the sign in the $z_{i+1}$ iteration (Equation (4.6)); that is, this iteration is defined as

$$z_{i+1} = z_i - s\,\mu_i \alpha_i$$

where $s = 1$ yields the normal CORDIC equations and $s = -1$ results in the sign reversed equations, in which the direction of rotation has been reversed. Although this result is intuitively satisfying, a proof is given in the appendix.
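The effect of $s$ is easiest to see in the linear system. This sketch assumes the linear ($m = 0$) form of (4.3) is $y_{i+1} = y_i - \mu_i\delta_i x_i$ with $x$ unchanged (the sign convention here is an assumption), so that driving $z$ to zero yields $y_n = y_0 - s\,x_0 z_0$: $s = 1$ gives multiply-and-subtract and $s = -1$ gives multiply-and-accumulate, with no separate negation. The function name is hypothetical.

```python
def linear_cordic_rotation(x, y, z, shifts, s=1):
    """Linear (m = 0) CORDIC in rotation mode (z -> 0).

    Assumed recursion: y <- y - mu*delta*x, z <- z - s*mu*delta, x unchanged,
    so y_n = y_0 - s * x_0 * z_0 up to the final residual in z.
    """
    for f in shifts:
        delta = 2.0 ** -f
        mu = s if z >= 0 else -s   # s*mu has the sign of z, so z steps toward zero
        y = y - mu * delta * x
        z = z - s * mu * delta
    return y


if __name__ == "__main__":
    shifts = range(40)
    print(linear_cordic_rotation(2.0, 1.0, 0.75, shifts, s=1))   # ~ 1 - 2*0.75 = -0.5
    print(linear_cordic_rotation(2.0, 1.0, 0.75, shifts, s=-1))  # ~ 1 + 2*0.75 = 2.5
```

Only the control of the $z$ adder changes between the two cases; the $x$ and $y$ datapaths are untouched.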

4.1.1 Some Convergence Properties

The choice of $\{\delta_i\}$ is predetermined and crucial to the convergence behavior of the algorithm. Consider the so-called vectoring case, in which the direction of rotation is chosen to reduce the magnitude of the angle at each iteration, thus bringing the resultant vector to the abscissa, i.e.,

$$|\Phi_{i+1}| = \big|\, |\Phi_i| - \alpha_i \,\big| \qquad (4.8)$$

Figure 4.3: The Reversed Sign CORDIC Functions (panels for the circular ($m = 1$), linear ($m = 0$) and hyperbolic ($m = -1$) systems, with $z \to 0$ and with $y \to 0$)
The sum of the remaining rotations at each step must be large enough to bring the angle to within at least $\alpha_{n-1}$ of zero, thus guaranteeing the "granularity" of the calculation, or the "angular resolution", to be $\alpha_{n-1}$ in $n$ steps. (This must be true even when $\Phi_i = 0$ and $|\Phi_{i+1}| = \alpha_i$.) This condition implies

$$\alpha_i - \sum_{j=i+1}^{n-1} \alpha_j \le \alpha_{n-1}, \qquad i = 0, 1, 2, \ldots, n-2 \qquad (4.9)$$

The domain of convergence of the algorithm is limited by the total possible rotation, i.e.,

$$|\Phi_0| - \sum_{j=0}^{n-1} \alpha_j \le \alpha_{n-1} \;\;\Rightarrow\;\; \max|\Phi_0| = \alpha_{n-1} + \sum_{j=0}^{n-1} \alpha_j \qquad (4.10)$$
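The criterion (4.9) and the domain (4.10) are easy to check numerically for a candidate sequence. In this sketch (function names hypothetical) the circular sequence $\alpha_i = \tan^{-1}2^{-i}$ passes, while the hyperbolic sequence $\alpha_i = \tanh^{-1}2^{-i}$, $i = 1, 2, \ldots$, taken without repetitions, fails; this is why practical hyperbolic sequences repeat certain shifts.

```python
import math


def satisfies_criterion(alphas):
    """Check (4.9): alpha_i - sum_{j=i+1}^{n-1} alpha_j <= alpha_{n-1} for i = 0..n-2."""
    n = len(alphas)
    tol = 1e-15  # absorbs floating-point rounding in the sums
    for i in range(n - 1):
        if alphas[i] - sum(alphas[i + 1:]) > alphas[-1] + tol:
            return False
    return True


def convergence_domain(alphas):
    """max |Phi_0| = alpha_{n-1} + sum_{j=0}^{n-1} alpha_j, as in (4.10)."""
    return alphas[-1] + sum(alphas)


if __name__ == "__main__":
    circular = [math.atan(2.0 ** -i) for i in range(24)]
    hyperbolic = [math.atanh(2.0 ** -i) for i in range(1, 24)]  # no repeated shifts
    print(satisfies_criterion(circular), math.degrees(convergence_domain(circular)))
    print(satisfies_criterion(hyperbolic))
```

For the circular sequence the domain comes out at about 99.9 degrees, so any angle in the right half plane, and somewhat beyond, is reachable without pre-rotation.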

In order to show that $\Phi$ converges to within $\alpha_{n-1}$ of zero in $n$ steps, proceed as follows:

Lemma 4.1 [Walther]:

$$|\Phi_i| \le \alpha_{n-1} + \sum_{j=i}^{n-1} \alpha_j$$

Proof: By induction.

$i = 0$: The hypothesis is true for $i = 0$ by (4.10) above.

General $i$: Assume the hypothesis is true for some $i$:

$$|\Phi_i| \le \alpha_{n-1} + \sum_{j=i}^{n-1} \alpha_j$$

$$\Rightarrow\; |\Phi_i| - \alpha_i \le \alpha_{n-1} + \sum_{j=i+1}^{n-1} \alpha_j$$

$$\Rightarrow\; -\alpha_i \le |\Phi_i| - \alpha_i \le \alpha_{n-1} + \sum_{j=i+1}^{n-1} \alpha_j$$

Now apply (4.9):

$$-\Big(\alpha_{n-1} + \sum_{j=i+1}^{n-1} \alpha_j\Big) \le -\alpha_i \le |\Phi_i| - \alpha_i \le \alpha_{n-1} + \sum_{j=i+1}^{n-1} \alpha_j$$

i.e.,

$$\big|\, |\Phi_i| - \alpha_i \,\big| \le \alpha_{n-1} + \sum_{j=i+1}^{n-1} \alpha_j$$

Finally, applying (4.8) yields

$$|\Phi_{i+1}| \le \alpha_{n-1} + \sum_{j=i+1}^{n-1} \alpha_j$$

making the hypothesis true for $i + 1$, and the lemma is proved by induction.

Theorem 4.1 [Walther]:

$\Phi$ converges to within $\alpha_{n-1}$ of zero in $n$ steps.

Proof: This is a direct consequence of the above lemma for $i = n$.

Remarks

(1) By replacing $\Phi$ with $z$, exactly the same argument applies to prove convergence in the rotation case, when the variable $z$ is driven to zero. Consequently, $z$ also has the same domain of convergence as $\Phi$, i.e.,

$$\max|z_0| = \max|\Phi_0| = \alpha_{n-1} + \sum_{j=0}^{n-1} \alpha_j$$

(2) The theorem on convergence is very important since it yields a guaranteed, computable convergence bound on the algorithm.

(3) The algorithm behaves well even outside its region of convergence. Suppose $\Phi_0$ is too large and outside the domain of convergence. Then the signs $\mu_i$ will all be chosen the same and the calculation will approximate the true angle as closely as it can, i.e., the calculation will proceed to an angle of $\sum_{j=0}^{n-1} \alpha_j$ and then saturate. For example, if $\Phi = \tan^{-1}(y_0/x_0)$ is to be computed and $\Phi > \max|\Phi_0|$, then the result will be this saturated value. Thus, the results are predictable and reasonable even outside the domain of convergence.
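Theorem 4.1 and Remark 3 can be observed directly by running only the $z$-recursion of the rotation mode with $\mu_i = \mathrm{sign}(z_i)$, for which $|z_{i+1}| = \big||z_i| - \alpha_i\big|$ exactly as in (4.8). This sketch uses the circular angles $\alpha_i = \tan^{-1}2^{-i}$ with $n = 20$ and random starting values drawn from the domain (4.10); the seed, sample count and function name are arbitrary choices.

```python
import math
import random


def cordic_rotation_residual(z0, alphas):
    """Drive z -> 0 with z_{i+1} = z_i - mu_i*alpha_i, mu_i = sign(z_i); return z_n."""
    z = z0
    for a in alphas:
        z -= a if z >= 0 else -a
    return z


if __name__ == "__main__":
    n = 20
    alphas = [math.atan(2.0 ** -i) for i in range(n)]
    bound = alphas[-1] + sum(alphas)                   # max |z_0| from (4.10)
    rng = random.Random(1)
    worst = max(abs(cordic_rotation_residual(rng.uniform(-bound, bound), alphas))
                for _ in range(10_000))
    print(worst <= alphas[-1] + 1e-12)                 # Theorem 4.1: |z_n| <= alpha_{n-1}
    # Remark 3: far outside the domain every mu_i is +1, so z_n = z_0 - sum(alphas).
    print(cordic_rotation_residual(10.0, alphas), 10.0 - sum(alphas))
```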

4.1.2 Implementation Issues

The choice of $\delta_i$ may be quite arbitrary; however, the $\delta_i$ are almost always taken to be integral powers of the machine radix, generally two, so that the scaling by $\delta_i$ in (4.3) can be performed with arithmetic shifters instead of multipliers. If $\delta_i = 2^{-F_i}$, the choice of $\{\delta_i\}$ is equivalent to the choice of $\{F_i\}$. Once a sequence is chosen, it is prestored, fixing the magnitude of $\delta_i$ and hence $\alpha_i$ and $K_i$. However, the direction of rotation, $\mu_i$, is chosen at each iteration to realize the particular zero forcing, $z_n \to 0$ or $y_n \to 0$, mentioned above. The choice of $\{F_i\}$ is crucial to the size of the scale constant, $K$, in (4.5b) and to the domain of convergence of the algorithm (4.10).
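The point of choosing $\delta_i = 2^{-F_i}$ is that the products in (4.3) become shifts. The following integer sketch of the circular system with $F_i = i$ assumes a 30-bit fractional fixed-point format, prestored tables of the angles and of $1/K_1$ (the start value $1/K_1$ is used only so that the output is easy to check), and the clockwise sign convention $x_{i+1} = x_i + \mu_i\delta_i y_i$, $y_{i+1} = y_i - \mu_i\delta_i x_i$, which is an assumption. All names are hypothetical.

```python
import math

FRAC_BITS = 30                       # fixed-point format: value = integer / 2**FRAC_BITS
ONE = 1 << FRAC_BITS
N_ITER = 30
# Prestored tables (hypothetical layout): alpha_i = atan(2**-i) and 1/K_1, as integers.
ANGLES = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]
INV_K = round(ONE / math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(N_ITER)))


def fixed_point_rotate(theta):
    """Circular rotation mode on integers: each scaling by delta_i = 2**-i is a shift.

    Starting from (1/K_1, 0) cancels the stretch K_1; with the assumed clockwise
    recursion the result is approximately (cos(theta), -sin(theta)).
    """
    x, y, z = INV_K, 0, round(theta * ONE)
    for i in range(N_ITER):
        mu = 1 if z >= 0 else -1
        x, y = x + mu * (y >> i), y - mu * (x >> i)   # arithmetic right shifts
        z -= mu * ANGLES[i]
    return x / ONE, y / ONE


if __name__ == "__main__":
    print(fixed_point_rotate(0.5), (math.cos(0.5), -math.sin(0.5)))
```

Each iteration needs only two shifts, three additions and one table lookup, which is the hardware economy that motivates the radix-two choice.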

Recall the relationship between $\alpha_i$ and $\delta_i$:

$$\alpha = \sum_{i=0}^{n-1} \mu_i \alpha_i = \sum_{i=0}^{n-1} \mu_i\, m^{-1/2} \tan^{-1}\!\left(\delta_i \sqrt{m}\right)$$

The sequence $\{F_i\}$ must be chosen such that $\{\alpha_i\}$ satisfies the convergence criterion of (4.9). For a radix 2 machine, Volder suggested some sequences for $m = -1, 0, 1$ that appear in Figure 4.4. These sequences have been commonly used in CORDIC processors reported in the literature [Wa71] [HT80]. However, the nuisance scale factors and the restricted regions of convergence have been circumvented only at the expense of a hardware penalty, a speed penalty, or both. Some existing techniques for removing the scale factors and extending the domains of convergence are now reviewed, following which new hardware and speed efficient methods will be developed.

4.1.3 Scale Factor Normalization

Walther [Wa71] suggested storing the scale factor $K$ in memory, since its magnitude is known once the sequence $\{F_i\}$ is determined (see Equation (4.5b)). A CORDIC operation with $m = \pm 1$ is then followed by a division of the result by the stored constant $K$. Such divisions are readily performed with the same CORDIC block; however, a large speed overhead is clearly incurred by the need to execute two CORDIC operations for each desired operation. This is very undesirable in a high throughput application.

Another approach, due to Haviland and Tuszynski [HT80], involves the use of scaling cycles to transform $X_{i+1}$ to $X'_{i+1}$. For some elements of the sequence $\{F_i\}$, the equations

$$X'_{i+1} = \begin{pmatrix} 1 + \gamma_i 2^{-F_i} & 0 \\ 0 & 1 + \gamma_i 2^{-F_i} \end{pmatrix} X_{i+1} \qquad (4.11)$$

immediately follow the regular CORDIC iteration, thus scaling the magnitude of the vector $X_{i+1}$. The value of $\gamma_i = \pm 1$ is chosen during each scaling cycle to move $\|X_{i+1}\|$ towards unity, while $\gamma_i = 0$ during a cycle in which no scaling is to be performed.

Figure 4.4: CORDIC Shift Sequences $\{F_i\}$ for $m = -1, 0, 1$

These additional equations are not executed for each element in $\{F_i\}$, but rather for a select set, $\{G_j\} \subset \{F_i\}$ (i.e. when $\gamma_i \ne 0$), in order to make the overall scale factor converge to unity. Let $K$ be the scale factor introduced by the CORDIC algorithm and let $K^*$ be the scaling resulting from the additional operations, i.e.,

$$K^* = \prod_j \left(1 + \gamma_j 2^{-G_j}\right) \qquad (4.12)$$

Then $\{G_j\}$ is chosen such that $K K^* = 1$.
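One way the set $\{G_j\}$ might be selected offline is sketched below. It uses a simple greedy rule, an illustrative choice and not necessarily the procedure of [HT80]: at each candidate shift, pick $\gamma_i \in \{-1, 0, 1\}$ that brings the running product $K K^*$ closest to one. The function name and the shift range are hypothetical.

```python
import math


def choose_scaling_cycles(k, shifts):
    """Greedy choice of gamma_i in {-1, 0, +1} so that K * prod(1 + gamma_i 2**-F_i) -> 1.

    Returns (gammas, residual) with residual = K*K_star - 1.
    """
    gammas = []
    product = k
    for f in shifts:
        step = 2.0 ** -f
        best = min((-1, 0, 1), key=lambda g: abs(product * (1 + g * step) - 1.0))
        gammas.append(best)
        product *= 1 + best * step
    return gammas, product - 1.0


if __name__ == "__main__":
    shifts = range(1, 31)
    k1 = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(31))  # circular K_1
    gammas, residual = choose_scaling_cycles(k1, shifts)
    selected = [f for f, g in zip(shifts, gammas) if g != 0]     # the set {G_j}
    print(len(selected), "scaling cycles, residual", residual)
```

Since each accepted factor changes the product by roughly $2^{-F_i}$, the residual after the last candidate shift is of the order of that shift, so the number of scaling cycles, and hence the speed penalty, grows with the required precision.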

While this approach to scaling does not introduce as large a speed overhead as did Walther's technique, the penalty incurred is still quite significant (approximately 50% is reported in [HT80]). Some additional control is also required to recognize the elements of $\{G_j\}$, since scaling is not done at each iteration.

A further undesirable effect of these two approaches is to make the execution time for the circular and hyperbolic CORDIC functions much longer than that for multiplication and division, in which no scaling is required. This nonuniformity of execution speeds is quite a nuisance in a synchronous multi-CORDIC processor environment (see e.g. [AMLA81]), since processors computing multiplications or divisions must wait for others which are operating in the $m = \pm 1$ systems.


4.1.4 Scaling in a Parallel Implementation

The structure of the three CORDIC equations is such that they may be executed concurrently, as shown in Figure 4.5. Such a realization will be referred to as parallel. The scaling cycle technique of Haviland and Tuszynski for normalizing the spurious constant, $K$, can be implemented in parallel realizations without a speed overhead and with a modest amount of additional hardware. By substituting (4.3) for $X_{i+1}$ in (4.11) and dropping the superscript, the combined CORDIC iteration and scaling equations are

$$X_{i+1} = (x_{i+1} \;\; y_{i+1})^T = \left(1 + \gamma_i 2^{-F_i}\right) \begin{pmatrix} 1 & -m\sigma_i 2^{-F_i} \\ \sigma_i 2^{-F_i} & 1 \end{pmatrix} X_i$$

where $\sigma_i = \pm 1$ is the rotation direction.

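Expanding out for the vector components (with $\sigma_i = \pm 1$ the rotation direction) yields:

$$x_{i+1} = \underbrace{x_i - m\sigma_i y_i 2^{-F_i}}_{A} + \underbrace{\gamma_i x_i 2^{-F_i}}_{B} - \underbrace{m\sigma_i\gamma_i y_i 2^{-2F_i}}_{C} \qquad (4.14a)$$

$$y_{i+1} = \underbrace{y_i + \sigma_i x_i 2^{-F_i}}_{D} + \underbrace{\gamma_i y_i 2^{-F_i}}_{E} + \underbrace{\sigma_i\gamma_i x_i 2^{-2F_i}}_{F} \qquad (4.14b)$$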

Now terms A and D are the normal CORDIC iterations. Term B is an additional term in the x-update, but it is simply the output of the shifter in the y-channel. Hence this requires no additional circuitry in a parallel realization. Similarly, E is available at the output of the x-channel shifter. Finally, C and F are new terms, but since they are scaled by $2^{-2F_i}$, they are insignificant for most of the sequence $\{F_i\}$. They must be computed for the first few values of $i$, where their contribution is significant, but not for all values of $i$, meaning that the shifter required for these terms can be made rather small. For example, the scaling sequence given in [HT80] would require only four possible shifts for sixteen-bit quantities. Hence, by building four-input rather than two-input adders and two additional, smaller shifters, the scaling cycle technique for normalizing the spurious scale factors can be implemented with a very low speed penalty (some penalty is due to circuit speed). Only a modest amount of extra hardware is needed, since the larger additional terms, B and E, are available in the unscaled CORDIC. Observing that the hardware can be so shared yields a potentially large speed advantage in a parallel implementation, where speed is likely to be an important concern anyway (or else a serial realization could have been employed).

Figure 4.5: A Parallel CORDIC Machine Architecture (two +/- arithmetic units)
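The claim that terms B and E come for free can be checked numerically. The sketch below assumes the iteration $x_{i+1} = x_i - m\sigma_i y_i 2^{-F_i}$, $y_{i+1} = y_i + \sigma_i x_i 2^{-F_i}$ with $\sigma_i = \pm 1$, and a scaling step that multiplies the rotated vector by $(1 + \gamma_i 2^{-F_i})$. The grouping into A to F follows the text, while the sign conventions and the function names `fused_step` and `rotate_then_scale` are assumptions. It confirms that the fused four-input update equals rotate-then-scale, and that C and F fall below $2^{-16}$ once $F_i \ge 8$ for operands of magnitude below one.

```python
import math


def fused_step(x, y, shift, sigma, gamma, m=1):
    """One CORDIC iteration merged with a scaling step (parallel form).

    Returns the new (x, y) and the two doubly shifted terms (C, F).
    """
    s = 2.0 ** -shift
    A = x - m * sigma * y * s             # normal x-update
    B = gamma * x * s                     # x shifted by F_i (already formed for the y adder)
    C = -m * sigma * gamma * y * s * s    # new term, shifted by 2*F_i
    D = y + sigma * x * s                 # normal y-update
    E = gamma * y * s                     # y shifted by F_i (already formed for the x adder)
    F = sigma * gamma * x * s * s         # new term, shifted by 2*F_i
    return A + B + C, D + E + F, (C, F)


def rotate_then_scale(x, y, shift, sigma, gamma, m=1):
    """Reference: a plain CORDIC iteration followed by a separate scaling step."""
    s = 2.0 ** -shift
    xr, yr = x - m * sigma * y * s, y + sigma * x * s
    return xr * (1 + gamma * s), yr * (1 + gamma * s)


x0, y0 = 0.6, -0.3
for shift in range(12):
    fx, fy, (c, f) = fused_step(x0, y0, shift, sigma=1, gamma=-1)
    rx, ry = rotate_then_scale(x0, y0, shift, sigma=1, gamma=-1)
    small = max(abs(c), abs(f)) < 2.0 ** -16
    print(shift, math.isclose(fx, rx, abs_tol=1e-12),
          math.isclose(fy, ry, abs_tol=1e-12), small)
```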

4.1.5 Extending the Domain of Convergence

This section will mainly be concerned with the domain of convergence of the trigonometric functions (i.e., $m = 1$), since a desirable, finite domain, i.e., the circle, is readily defined. In the case of multiplication, division and hyperbolic functions, the desirable domain of convergence is infinite (or at least restricted only by the finite representation of numbers). Recognizing that the sequences $\{F_i\}$ of Figure 4.4 do not yield a complete region of convergence for the algorithm (Equation 4.10), what can be done when a rotation initiates (or terminates) outside the region?

Walther [Wa71] suggested the use of prescaling identities to operate on a new vector lying inside the domain of convergence. For example, with the $\{F_i\}$ of Figure 4.4 the domain of convergence includes angles whose magnitude does not exceed 1.74 radians. In order to calculate, say, the sine of a large argument, the angle is first divided by $\pi/2$, resulting in a quotient $Q$ and a remainder $R$ with $|R| < \pi/2$, which is in the region of convergence. Now

$$\sin\Phi = \sin\left(Q\frac{\pi}{2} + R\right) = \begin{cases} \sin R & \text{if } Q \bmod 4 = 0 \\ \cos R & \text{if } Q \bmod 4 = 1 \\ -\sin R & \text{if } Q \bmod 4 = 2 \\ -\cos R & \text{if } Q \bmod 4 = 3 \end{cases}$$

so that only $\sin R$ or $\cos R$ need be computed. Clearly this technique implies both a substantial speed and control penalty, as one division operation and many decisions must be performed.
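Walther's prescaling identity can be sketched as follows. A small rotation-mode CORDIC with $F_i = i$ (40 iterations, an arbitrary choice) plays the role of the core evaluator and is only ever called with $0 \le R < \pi/2$, which lies inside its convergence range. The names `cordic_cos_sin` and `sin_any` are illustrative.

```python
import math

N = 40                                            # number of CORDIC iterations (illustrative)
ALPHA = [math.atan(2.0 ** -i) for i in range(N)]  # incremental angles, F_i = i
K = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(N))


def cordic_cos_sin(z):
    """Rotation-mode circular CORDIC; converges only for |z| <= about 1.74 rad."""
    x, y = 1.0 / K, 0.0                           # pre-divide by the gain K
    for i, a in enumerate(ALPHA):
        sigma = 1 if z >= 0 else -1
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * a
    return x, y                                   # (cos z, sin z)


def sin_any(theta):
    """sin(theta) for any theta via theta = Q*pi/2 + R with 0 <= R < pi/2."""
    q, r = divmod(theta, math.pi / 2)             # q = floor(theta / (pi/2))
    cos_r, sin_r = cordic_cos_sin(r)
    return (sin_r, cos_r, -sin_r, -cos_r)[int(q) % 4]


for t in (0.3, 2.0, 4.0, -5.5, 100.0):
    print(t, sin_any(t), math.sin(t))
```

The quadrant selection costs one division and a case decision per call, which is exactly the overhead the text attributes to this approach.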

Still another approach is that of prerotation suggested by Haviland and Tuszynski [HT80]. Prior to computing a vector rotation which does not lie in the region of convergence, rotations by $\pi/2$ and $\pi/4$ are performed. These prerotations are relatively simple, requiring only a magnitude multiplication by $1/\sqrt{2}$, which is accounted for in the scaling cycles generating $K^*$ (Equation 4.12). Both a speed and a control penalty are once again incurred.

Combining both the prerotation and scaling methods to obviate the difficulties with the CORDIC algorithms can result in considerable overhead. For example, Walther's CORDIC processor, which combines his two techniques, exhibits the execution times given in Figure 4.6. Notice that while this machine requires only 70 μsec for the CORDIC iterations which compute the sine function, an additional 80 μsec are required for prescaling and normalization!

A new method capable of tailoring the convergence region and scale factors to desired values will now be described. Ideally, any new technique should result in large regions of convergence and unity scale factors, with only modest speed overhead.

Maximum Execution Times

ROUTINE        CORDIC       PRESCALE,    MISC. DATA TRANSFERS   TOTAL
               EXECUTION    NORMALIZE    FROM COMPUTER
               (μsec)       (μsec)       (μsec)                 (μsec)

LOAD             0            5            25                     30
STORE            0            0            15                     15

ADD              0           15            25                     40
SUBTRACT         0           25            25                     50
MULTIPLY        60           15            25                    100
DIVIDE          60           15            25                    100

SIN             70           85             5                    160
COS             70           85             5                    160
TAN            130           85             5                    220
ATAN            70           15             5                     90
SINH            70           55             5                    130
COSH            70           55             5                    130
TANH           130           55             5                    190
ATANH           70           45             5                    120
EXPONENTIAL     70           55             5                    130
LOGARITHM       70           45             5                    120
SQUARE ROOT     70           25             5                    100

Figure 4.6: Performance of Walther's CORDIC Machine

4.2 LOW OVERHEAD SOLUTIONS TO THE PROBLEMS OF CONVERGENCE REGION AND SPURIOUS SCALE FACTORS

Novel solutions to the aforementioned problems of scale and convergence, which do not suffer from the hardware and speed overhead of those in [Wa71] and [HT80], can be found by returning to the theory of the CORDIC algorithms. Let $K_1$ be the value of the scale factor, $K$, when $m = 1$ and $K_{-1}$ the value of $K$ when $m = -1$. Once the sequence $\{F_i\}$ has been determined, Equation 4.5b shows that the value of $K$ can only be influenced through the choice of $n$. It would be ideal if $K_1$ and $K_{-1}$ were integral powers of the machine radix, in this case 2, since they could then be removed with simple shifters. Unfortunately this is not possible with the sequence of Figure 4.4 for any value of $n$, as shown in Claim 4.1.

Claim 4.1:

For a radix 2 machine, there exists no $n$ such that $K_1$ and $K_{-1}$ are integral powers of 2 using the sequences $\{F_i\}_{i=0}^{n-1}$ in Figure 4.4.

Proof:

m = 1:

$$K_1 = \prod_{i=0}^{n-1} \sqrt{1 + \delta_i^2}$$

$$\ln K_1 = \frac{1}{2}\sum_{i=0}^{n-1} \ln\left(1 + 4^{-i}\right) \qquad \text{since } \delta_i = 2^{-F_i} = 2^{-i}$$

$$\le \frac{1}{2}\sum_{i=0}^{n-1} 4^{-i} \qquad \text{(concavity of the logarithm: } \ln(1+x) \le x\text{)}$$

$$< \frac{1}{2}\lim_{n \to \infty}\sum_{i=0}^{n-1} 4^{-i} = \frac{1}{2}\cdot\frac{4}{3} = \frac{2}{3}$$

$$\Rightarrow \quad 1 < K_1 < e^{2/3} < 2$$

Similarly, it is possible to show that $1/2 < K_{-1} < 1$.
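Claim 4.1 can also be checked numerically. The sketch assumes that Figure 4.4 uses $F_i = i$ for $m = 1$ and Walther's hyperbolic sequence 1, 2, 3, 4, 4, 5, ..., 13, 13, 14, ... (repeating 4, 13, 40, ...) for $m = -1$; under that assumption $K_{-1} \approx 0.828$ at $n = 16$, which agrees with the value quoted in Example 4.1. The helper names are illustrative.

```python
import math


def scale_factor(shifts, m):
    """K_m = prod_i sqrt(1 + m * 2**(-2 F_i)) for m = +1 or -1."""
    return math.prod(math.sqrt(1.0 + m * 4.0 ** -f) for f in shifts)


def hyperbolic_shifts(n):
    """1, 2, 3, 4, 4, 5, ..., 13, 13, 14, ...: repeat 4, 13, 40, ... (length n)."""
    seq, f, repeat_at = [], 1, 4
    while len(seq) < n:
        seq.append(f)
        if f == repeat_at and len(seq) < n:
            seq.append(f)
            repeat_at = 3 * repeat_at + 1
        f += 1
    return seq


for n in (4, 8, 16, 32, 64):
    k1 = scale_factor(range(n), 1)
    km1 = scale_factor(hyperbolic_shifts(n), -1)
    # Neither value is an integral power of two: 1 < K_1 < 2 and 1/2 < K_-1 < 1.
    print(n, round(k1, 4), round(km1, 4))
```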

While this is motivation enough to search for new sequences, it is also possible to show that the trigonometric functions cannot be made to converge for all angles in the circle using the sequence of Figure 4.4.

Claim 4.2:

For a radix 2 machine, there exists no $n$ such that, by using $\{F_i\}_{i=0}^{n-1}$ of Figure 4.4, the trigonometric functions converge $\forall\, \Phi_0 \in [-\pi, \pi)$.

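Proof:

$$\max|\Phi_0| = \alpha_{n-1} + \sum_{j=0}^{n-1} \alpha_j \qquad \text{(from Equation 4.10)}$$

$$\le \alpha_0 + \lim_{n \to \infty} \sum_{j=0}^{n-1} \alpha_j \qquad \text{(since } \alpha_{n-1} \le \alpha_0 \text{ and the partial sums increase with } n\text{)}$$

$$= \tan^{-1} 1 + \lim_{n \to \infty} \sum_{j=0}^{n-1} \tan^{-1} 2^{-j}$$

$$\le \tan^{-1} 1 + \lim_{n \to \infty} \sum_{j=0}^{n-1} 2^{-j}$$

$$= 2.785 < \pi$$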
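The bound in Claim 4.2 is loose. A direct computation of $\alpha_{n-1} + \sum_j \alpha_j$ for $F_i = i$ (assumed here to be the circular sequence of Figure 4.4) shows that the range settles near 1.743 rad, the 1.74 rad figure quoted in Section 4.1.5, well short of $\pi$. The function name is illustrative.

```python
import math


def convergence_range(shifts):
    """max|Phi_0| = alpha_{n-1} + sum_j alpha_j for circular CORDIC (m = 1)."""
    alphas = [math.atan(2.0 ** -f) for f in shifts]
    return alphas[-1] + sum(alphas)


bound = math.atan(1.0) + 2.0          # the proof's bound, about 2.785
for n in (4, 8, 16, 32, 64):
    r = convergence_range(range(n))
    print(n, round(r, 4), r < bound < math.pi)
```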

It is not possible to state a similar result for the hyperbolic functions, since $\Phi_0 \in (-\infty, \infty)$ and so the algorithm is limited by the finite bit representation of numbers.

Since the sequences $\{F_i\}$ of Figure 4.4 exhibit some unfavorable properties, alternate sequences may be sought. These too should be non-decreasing sequences, so that each iteration refines the rotation, and they must consist only of integers to accommodate the use of shifters.

Lemma 4.2:

Suppose $\{F_i\}_{i=0}^{n-1}$ is a non-decreasing sequence of integers. If $\{F_i\}$ satisfies the convergence criterion (Equation 4.9) and a new sequence, $\{G_i\}_{i=0}^{n-1}$, is constructed by repeating the $l$th element of $\{F_i\}$ such that $\{G_i\} = \{F_0\; F_1 \cdots F_{l-1}\; F_l\; F_l\; F_{l+1} \cdots F_{n-2}\}$, then $\{G_i\}$ also satisfies the convergence criterion for $l = 0, 1, \ldots, n-2$.

Proof:

Let $\{\alpha'_i\}_{i=0}^{n-1}$ be the sequence of incremental rotations corresponding to $\{G_i\}$, i.e., $\alpha_0 \cdots \alpha_{l-1}\; \alpha_l\; \alpha_l\; \alpha_{l+1} \cdots \alpha_{n-2}$. $\{F_i\}$ satisfying the convergence criterion implies

$$\alpha_i - \sum_{j=i+1}^{n-1} \alpha_j \le \alpha_{n-1}, \qquad i = 0, 1, 2, \ldots, n-2 \qquad (4.13)$$

It is necessary to prove that:

$$\alpha'_i - \sum_{j=i+1}^{n-1} \alpha'_j \le \alpha'_{n-1}, \qquad i = 0, 1, 2, \ldots, n-2 \qquad (4.14)$$

Notice that:

$$\alpha'_i = \begin{cases} \alpha_i & i \le l \\ \alpha_{i-1} & i > l \end{cases} \qquad (4.15)$$

Furthermore, notice that for the larger values of $i$, $\alpha_i \approx \delta_i$ for sequences like those of Figure 4.4, and since generally $F_i = F_{i-1} + 1$, we will use the approximation $\alpha_{n-2} \approx 2\alpha_{n-1}$.

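In order to prove (4.14), proceed as follows:

i > l:

$$\alpha'_i - \sum_{j=i+1}^{n-1} \alpha'_j = \alpha_{i-1} - \sum_{j=i+1}^{n-1} \alpha_{j-1} = \alpha_{i-1} - \sum_{j=i}^{n-2} \alpha_j \le 2\alpha_{n-1} \approx \alpha_{n-2} = \alpha'_{n-1}$$

where the inequality is (4.13) at index $i-1$ with the term $\alpha_{n-1}$ moved to the right-hand side.

i ≤ l:

$$\alpha'_i - \sum_{j=i+1}^{n-1} \alpha'_j = \alpha_i - \sum_{j=i+1}^{l} \alpha_j - \sum_{j=l+1}^{n-1} \alpha_{j-1}$$

$$= \alpha_i - \sum_{j=i+1}^{l} \alpha_j - \sum_{j=l}^{n-2} \alpha_j$$

$$= \alpha_i - \sum_{j=i+1}^{n-2} \alpha_j - \alpha_l$$

$$\le 2\alpha_{n-1} - \alpha_l \lesssim 0 < \alpha'_{n-1}$$

since $\alpha_l \ge \alpha_{n-2} \approx 2\alpha_{n-1}$.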

Theorem 4.2:

If $\{F_i\}_{i=0}^{n-1}$ satisfies the convergence criterion and $\{G_i\}_{i=0}^{n-1}$ is constructed by repeating the $l$th element of $\{F_i\}$, possibly multiple times and for one or more $l$ within the limits of the sequence length, then $\{G_i\}$ also satisfies the convergence criterion.

Proof:

Follows from repeated application of Lemma 4.2. However, notice that the implicit assumption in the proof of Lemma 4.2 that $\alpha_{n-2} \approx 2\alpha_{n-1}$ restricts the form of $\{F_i\}$ and may be violated through excessive repetition.

This theorem is very powerful because it guarantees that the multitude of sequences generated in the particular manner suggested do in fact satisfy the convergence criterion. Another technique for constructing new sequences is very similar, but the sequence is allowed to grow in length.

Lemma 4.3:

If $\{F_i\}_{i=0}^{n-1}$ satisfies the convergence criterion (Equation 4.9) and a new sequence, $\{G_i\}_{i=0}^{n}$, is constructed by repeating the $l$th element of $\{F_i\}$ such that $\{G_i\} = \{F_0\; F_1 \cdots F_{l-1}\; F_l\; F_l\; F_{l+1} \cdots F_{n-1}\}$, then $\{G_i\}$ also satisfies the convergence criterion for $l = 0, 1, \ldots, n-2$.

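Proof:

It is necessary to prove:

$$\alpha'_i - \sum_{j=i+1}^{n} \alpha'_j \le \alpha'_n, \qquad i = 0, 1, \ldots, n-1 \qquad (4.15)$$

Once again, proceed in two parts:

i > l:

$$\alpha'_i - \sum_{j=i+1}^{n} \alpha'_j = \alpha_{i-1} - \sum_{j=i+1}^{n} \alpha_{j-1} = \alpha_{i-1} - \sum_{j=i}^{n-1} \alpha_j \le \alpha_{n-1} = \alpha'_n$$

i ≤ l:

$$\alpha'_i - \sum_{j=i+1}^{n} \alpha'_j = \alpha_i - \sum_{j=i+1}^{l} \alpha_j - \sum_{j=l+1}^{n} \alpha_{j-1}$$

$$= \alpha_i - \sum_{j=i+1}^{l} \alpha_j - \sum_{j=l}^{n-1} \alpha_j$$

$$= \alpha_i - \sum_{j=i+1}^{n-1} \alpha_j - \alpha_l$$

$$\le \alpha_{n-1} - \alpha_l \le 0 < \alpha'_n$$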

Theorem 4.3:

If $\{F_i\}_{i=0}^{n-1}$ satisfies the convergence criterion and $\{G_i\}_{i=0}^{n'-1}$, $n' > n$, is constructed by repeating the $l$th element of $\{F_i\}$, possibly multiple times and for one or more $l$, then $\{G_i\}$ also satisfies the convergence criterion.

Proof:

Follows from repeated application of Lemma 4.3.

Theorem 4.3 provides a useful construction scheme similar to Theorem 4.2. At first glance, this is very close to the scaling cycle technique of Haviland et al. [HT80]. However, while their method only scales during a scaling cycle, the present technique scales and rotates simultaneously, hence moving closer to the final result at every iteration. Clearly, this is a more efficient method.

Example 4.1:

Let $n = 16$. Starting with the sequence of Figure 4.4 for $m = 1$, apply the construction of Theorem 4.2 to generate the sequence

$\{G_i\} = \{0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 8, 9\}$

This yields $K_1 = 1.99$ and $\max|\Phi_0| = 172.2°$, compared with $K_1 = 1.67$ and $\max|\Phi_0| = 99.2°$ for the original sequence.

Next, start with the sequence for $m = -1$ to construct

$\{G_i\} = \{1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 6, 7, 8, 9, 10\}$

which yields $K_{-1} = 0.500$ and $\max|\Phi_0| = 3.37$, compared with $K_{-1} = 0.828$ and $\max|\Phi_0| = 1.12$ for the original sequence.

Notice that both sequences result in scale factors which are easily compensated, as well as greatly enlarged regions of convergence.
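The construction of Lemma 4.2 and Theorem 4.2 is easy to mechanize. The sketch below repeats the element at a chosen position and drops the last one, then checks the convergence criterion as stated in (4.13) (for $m = 1$ only) and computes $K_m$ and the range $\alpha_{n-1} + \sum_j \alpha_j$. The repetition positions are simply one choice that reproduces the $m = 1$ sequence of Example 4.1; the $m = -1$ sequence is entered directly as printed. All function names are illustrative.

```python
import math


def alpha(f, m):
    """Incremental rotation angle for shift f in coordinate system m = +1 or -1."""
    d = 2.0 ** -f
    return math.atan(d) if m == 1 else math.atanh(d)


def repeat_element(seq, l):
    """Lemma 4.2: repeat the l-th element and drop the last, keeping length n."""
    return seq[:l + 1] + [seq[l]] + seq[l + 1:-1]


def satisfies_criterion(seq):
    """alpha_i - sum_{j>i} alpha_j <= alpha_{n-1} for i = 0..n-2 (m = 1 only)."""
    a = [alpha(f, 1) for f in seq]
    tail = 0.0
    for i in range(len(a) - 2, -1, -1):
        tail += a[i + 1]                      # tail = sum_{j=i+1}^{n-1} alpha_j
        if a[i] - tail > a[-1]:
            return False
    return True


def summary(seq, m):
    """Return (K_m, alpha_{n-1} + sum_j alpha_j) for the sequence."""
    k = math.prod(math.sqrt(1.0 + m * 4.0 ** -f) for f in seq)
    a = [alpha(f, m) for f in seq]
    return k, a[-1] + sum(a)


G = list(range(16))                           # circular sequence F_i = i, n = 16
for l in (1, 3, 3, 6, 6, 9):                  # positions that reproduce Example 4.1
    G = repeat_element(G, l)
print(G)   # [0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 8, 9]
k1, r1 = summary(G, 1)
print(satisfies_criterion(G), round(k1, 3), round(math.degrees(r1), 1))

H = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 6, 7, 8, 9, 10]   # Example 4.1, m = -1
km1, rm1 = summary(H, -1)
print(round(km1, 3), round(rm1, 2))
```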

It is apparent from this example that no special scaling cycles are required during the computation. The algorithm proceeds normally, and $K_1 \approx 2$ and $K_{-1} \approx 1/2$ at termination. These effects are removed by a single final shift operation, which is quite inexpensive to perform. Furthermore, the construction of Lemma 4.2 in effect trades $\alpha_{n-1}$ for $\alpha_l$, and since $\alpha_l \ge \alpha_{n-1}$ (due to the non-decreasing nature of $\{F_i\}$), this method will always result in a larger domain of convergence. Similarly, applying Theorem 4.3 also yields a larger convergence region.

Example 4.2:

Let $n = 16$, $m = 1$ and start with the new sequence $\{F_i\} = \{i - 1\}_{i=0}^{15}$, which provides a larger domain of convergence than Volder's original sequence. In fact, $\{F_i\}$ yields $K_1 = 3.73$ and $\max|\Phi_0| = 162.6°$. Apply Theorem 4.2 to construct

$\{G_i\} = \{-1, 0, 1, 2, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7, 8, 9\}$

which yields $K_1 = 4.0$ and $\max|\Phi_0| = 212.7°$. Theorem 4.3 yields similar results; however, the sequence is longer.

Example 4.2 provides a sequence that simultaneously results in a complete region of convergence and scale factor normalization with two shift operations. Precision loss owing to the two-bit shift is overshadowed by the accuracy of the scale factor.

4.2.1 Effect on Angular Resolution

A final property affected by $\{G_i\}$ is the "angular resolution" of the algorithm. Recall that if $\{F_i\}$ satisfies (4.9), then $\Phi$ (or $z$) converges to within $\alpha_{n-1}$ of zero within $n$ steps, where $\alpha_{n-1}$ is defined to be the angular resolution or angular granularity of the computation. When $\{G_i\}$ is formed from $\{F_i\}$ using the construction of Theorem 4.2, $\alpha'_{n-1}$ is larger than $\alpha_{n-1}$, and so the angular resolution is worse. Typically, however, the resolution is still accurate enough for most applications, as was evidenced in the examples, where $\alpha'_{n-1} = 0.002$ radians when $m = 1$. The construction of Theorem 4.3 may be employed when the angular resolution of the original $\{F_i\}$ must be met, since $\alpha'_{n'-1} = \alpha_{n-1}$. In Example 4.2, such a construction would yield a scale factor of four, a complete region of convergence and uncompromised angular resolution with only five additional CORDIC iterations. This compares favourably with the eleven or so scaling iterations, as well as the prerotation overhead, of other implementations [HT80]. Similar comments apply to the case of $m = -1$.
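The Theorem 4.3 variant of Example 4.2 can be sketched the same way. Here the base sequence $\{i - 1\}$ is grown by repeating the values 2, 2, 3, 3, 3 (the same repetitions that appear in the Theorem 4.2 sequence of Example 4.2, chosen for comparison) without dropping any elements, giving the five additional iterations mentioned above. The function name `repeat_grow` is illustrative.

```python
import math


def repeat_grow(seq, l):
    """Lemma 4.3: repeat the l-th element; the sequence grows by one."""
    return seq[:l + 1] + [seq[l]] + seq[l + 1:]


base = list(range(-1, 15))                    # {i - 1}, i = 0..15 (Example 4.2)
G = base
for value in (2, 2, 3, 3, 3):
    G = repeat_grow(G, G.index(value))

alphas = [math.atan(2.0 ** -f) for f in G]    # F = -1 gives atan(2)
K1 = math.prod(math.sqrt(1.0 + 4.0 ** -f) for f in G)
print(len(G), G)
print(f"K_1 = {K1:.4f}, sum of alphas = {sum(alphas):.4f} rad (> pi)")
print("resolution unchanged:", alphas[-1] == math.atan(2.0 ** -base[-1]))
```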


4.2.2 Simulation Results

Computer simulation results appear in Figure 4.7, which confirm the operation of the CORDIC algorithm for Example 4.2. In the table, the vector has components X and Y, while Z represents an angle ($\Phi$). Scaling in the simulation program was accomplished by right shifting both X and Y two bits. The examples given in Figure 4.7 clearly show that the region of convergence of the algorithm is indeed complete and that the spurious scale factors are readily compensated for. The final entry in the table shows that the algorithm even converges for large angles. The sequences of Figure 4.4 would have given erroneous results in this case.
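A simulation in the spirit of Figure 4.7 can be sketched in a few lines. It uses floating point rather than the fixed-point arithmetic of the original program (an assumption), the sequence of Example 4.2, and a division by four in place of the two-bit right shift; the initial values are those listed in Figure 4.7. The function name `cordic` is illustrative.

```python
import math

G = [-1, 0, 1, 2, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7, 8, 9]   # Example 4.2
ALPHA = [math.atan(2.0 ** -f) for f in G]


def cordic(x, y, z, vectoring=False):
    """Circular CORDIC over G; the gain (about 4.0045) is removed by x/4, y/4."""
    for f, a in zip(G, ALPHA):
        if vectoring:
            sigma = -1 if y >= 0 else 1        # rotate the vector towards the +x axis
        else:
            sigma = 1 if z >= 0 else -1        # drive the angle z towards zero
        d = 2.0 ** -f
        x, y = x - sigma * y * d, y + sigma * x * d
        z -= sigma * a
    return x / 4, y / 4, z                     # equivalent of a two-bit right shift


for x0, y0 in ((0.25, 0.25), (-0.1, -0.2), (-0.25, 0.0)):
    x, y, z = cordic(x0, y0, 0.0, vectoring=True)
    print("vectoring", round(x, 4), round(y, 4), round(z, 4),
          "expected angle", round(math.atan2(y0, x0), 4))

for x0, y0, z0 in ((-0.25, 0.0, 1.5708), (0.20, -0.1, 0.61), (0.25, 0.25, 2.3562)):
    x, y, z = cordic(x0, y0, z0)
    ex = x0 * math.cos(z0) - y0 * math.sin(z0)
    ey = x0 * math.sin(z0) + y0 * math.cos(z0)
    print("rotation", round(x, 4), round(y, 4), "expected", round(ex, 4), round(ey, 4))
```

Because the Example 4.2 sequence converges for any angle in $[-\pi, \pi]$, no prerotation or quadrant logic appears anywhere in the loop.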

4.2.3 Computational Speed and Hardware Complexity

Techniques reported by Walther [Wa71] and Haviland et al. [HT80] for scaling and improving the convergence region of CORDIC algorithms have been shown to be expensive both in the required additional hardware and in execution time. The need for separate scaling and prerotations imposes a speed overhead as large as 120%, i.e., the additional operations consume more time than the CORDIC iterations themselves! In comparison, the results may be computed in nearly the time required for the CORDIC recursions alone by using the technique just described. Furthermore, no additional hardware is required for decision making or prerotations, a very significant saving.

There is an essential difference between the present method and the scaling cycle technique of [HT80] which accounts for the enhanced efficiency of the former. During a scaling cycle, the method of [HT80] scales the magnitude of a vector but does not rotate it. In the present scheme, rotation and scaling occur simultaneously, since the natural scaling of the CORDIC algorithm is exploited. Hence, the present scheme is significantly more efficient.

Operation   Initial Values          Expected Final Values     Final Values                  Angular Error
            X      Y      Z         X       Y       Z         X       Y            Z        (Radians)
Vectoring   .25    .25    0         .350    0       .7850     .350    .0007        .7836    .0010
Vectoring   -.1    -.2    0         .224    0       -2.03     .220    .0002        -2.035   .0007
Vectoring   -.25   0      0         .250    0       3.1416    .250    (illegible)  3.1399   .0017
Rotation    -.25   0      1.5708    0       -.250   0         -.0001  -.2503       .0003    .0003
Rotation    .20    -.1    .61       .221    .0326   0         .222    .0320        .0013    .0013
Rotation    .25    .25    2.3562    -.3536  0       0         -.3539  -.0011       -.002    .002

Figure 4.7: Computer Simulation Results

A final point regarding advantages of the present scheme is the uniformity in the number of required iterations for the elementary functions, compared with the disparity in the number of iterations in the schemes presented in [Wa71] and [HT80]. Computing various CORDIC functions requires an approximately equal number of iterations, since the modified sequences are of almost equal length. Thus, the suggested method lends itself more easily to implementations of many CORDIC processors operating in parallel, say in a pipeline or a tightly coupled mode, because the waiting time of a processor for others to complete their calculations is minimized [AMLA81].

Remark:

The CORDIC scheme was originally structured in a manner which ensured that $\{F_i\}$ and $K$ were always constant. However, it is clear that the execution speed of the CORDIC recursions can be enhanced in some cases by testing, at each iteration, the proximity of the vector being rotated to its destination value. It can indeed happen that the desired result is achieved in the first few iterations. However, in this case, the scale constant is also determined solely by these few iterations, and its removal involves a division operation (since its value is determined by the number of iterations executed). Similarly, during vectoring operations, the CORDIC equations always rotate the vector to the positive abscissa. Some speed advantage can be realized by rotating a vector to the nearest coordinate axis; however, scaling is once again problematic.


4.3 HYBRID CORDIC ALGORITHMS

CORDIC algorithms offer the ability to compute vector rotations and trigonometric functions to good precision without the need for multipliers or stored trigonometric tables (aside from $\{\alpha_i\}$). In contrast, an array multiplier coupled with such tables (referred to as the stored table approach) is capable of computing the rotations much faster than the iterative CORDIC algorithms. This section will explore the combination of CORDICs with multipliers and stored tables in order to achieve faster vector rotations than the basic CORDICs, but with a considerably lower storage requirement than the stored table approach. The performance measure to be maintained by all techniques is angular resolution. Only the trigonometric functions will be considered, although the method to be presented can be readily applied to other coordinate systems.

4.3.1 Interpolation with CORDICs

Consider the CORDIC algorithms for $m = 1$ with the sequences $\{F_i\}$ of Figure 4.4. After $n$ iterations, the angular resolution is of the order of $2^{-n+1}$ radians. A multiplier-accumulator together with sine and cosine tables could accomplish the same result in four multiply operations, provided the tables segmented the unit circle into $2^n\pi$ parts. The minimum storage required is 1/8th of the circle, i.e., sines and cosines for some $2^{n-3}\pi$ different angles. For even moderate values of $n$, such as 16 or 24, this becomes quite significant.

The storage requirement may be reduced considerably by quantizing the circle much more coarsely, rotating by the closest angle using the multiplier and then interpolating using CORDICs to achieve the desired resolution.
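The interpolation scheme can be sketched as below. The stored table is simulated by calling `math.cos` and `math.sin` only at multiples of $2\pi 2^{-k}$, and, following the text, the CORDIC loop runs $F_i = i$ from $i = k - 2$ to $n - 1$. Dividing out the interpolation gain at the end is an assumption made for clarity (a hardware design could presumably fold this constant into the stored table), and the parameter values and the name `hybrid_rotate` are illustrative.

```python
import math


def hybrid_rotate(x, y, theta, n=24, k=12):
    """Rotate (x, y) by theta: coarse table rotation, then CORDIC interpolation."""
    step = 2 * math.pi / 2 ** k                 # table resolution R_1 = 2*pi*2**-k
    phi = round(theta / step) * step            # nearest quantized angle
    c, s = math.cos(phi), math.sin(phi)         # stands in for the stored table
    x, y = x * c - y * s, x * s + y * c         # four multiplications
    z = theta - phi                             # residual, |z| <= R_1 / 2
    gain = 1.0
    for i in range(k - 2, n):                   # interpolation with F_i = i
        d = 2.0 ** -i
        sigma = 1 if z >= 0 else -1
        x, y = x - sigma * y * d, y + sigma * x * d
        z -= sigma * math.atan(d)
        gain *= math.sqrt(1.0 + d * d)
    return x / gain, y / gain


n, k = 24, 12
print("table entries (1/8 circle):", 2 ** (k - 3), "vs", round(2 ** (n - 3) * math.pi))
for t in (0.1, 1.0, 2.5, -3.0):
    xr, yr = hybrid_rotate(1.0, 0.0, t, n, k)
    print(t, abs(xr - math.cos(t)), abs(yr - math.sin(t)))
```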

Let $\{F_i\}_{i=0}^{n-1} = \{i\}_{i=0}^{n-1}$ (as, for instance, when $m = 1$) so that the CORDIC algorithm results in an angular resolution of $R_0 = 2^{-n+1}$ (radians) after $n$ iterations. Let the unit circle be quantized to $2^k$ parts, yielding resolution $R_1 = 2\pi 2^{-k} \approx 2^{-k+2.65}$. An interpolation step via CORDICs starts at $i = k - 2$ and proceeds to $n - 1$, i.e., $n - k + 1$ CORDIC iterations are required. For purposes of analysis, assume all quantities are represented with equal wordlengths, the multiplier time is $T_M$ and a CORDIC iteration requires time $T_C$. The execution time, $E$, and storage requirement, $S$, for the three schemes under consideration are (memory manipulation time is ignored):

CORDIC only:

$E_C = nT_C \qquad S_C = n$ locations

Multiplier only:

$E_M = 4T_M \qquad S_M = \dfrac{2^n\pi}{8} = 2^{n-3}\pi$ locations

Hybrid:

$E_H = 4T_M + (n - k + 1)\,T_C \qquad S_H \approx 2^{k-3}$ locations

Then:

$$\frac{E_H}{E_M} = 1 + \frac{(n-k+1)\,T_C}{4\,T_M}, \qquad \frac{S_M}{S_H} \approx 2^{n-k}\pi, \qquad \frac{E_C}{E_H} = \frac{n}{n - k + 1 + 4T_M/T_C}$$

i.e., while the execution time ratios are only linearly dependent on $n - k$ (the number of interpolated bits), the storage requirement varies exponentially! These ratios are depicted graphically in Figure 4.8.

Figure 4.8: Performance of Hybrid CORDIC Scheme (horizontal axis: n − k, in bits)

Example 4.3:

Let $n = 24$, $k = 15$ and $T_M/T_C = 2$. Then:

$$\frac{E_H}{E_M} = 2.25, \qquad \frac{S_M}{S_H} = 512\pi \qquad \text{and} \qquad \frac{E_C}{E_H} = 1.33$$

i.e., while the hybrid method is only 2.25 times slower than the stored table method, it provides more than a 1500-fold reduction in storage! On the other hand, the hybrid method is also 33% faster than the standard CORDIC.
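The ratios of Section 4.3.1 are simple enough to tabulate directly. The sketch below encodes $E_H/E_M = 1 + (n-k+1)T_C/(4T_M)$, $S_M/S_H = 2^{n-k}\pi$ and $E_C/E_H = n/(n-k+1+4T_M/T_C)$ (memory time ignored) and reproduces Example 4.3; the function name is illustrative.

```python
import math


def interpolation_ratios(n, k, tm_over_tc):
    """Section 4.3.1 ratios: E_H/E_M, S_M/S_H and E_C/E_H."""
    e_h_over_e_m = 1 + (n - k + 1) / (4 * tm_over_tc)
    s_m_over_s_h = 2 ** (n - k) * math.pi
    e_c_over_e_h = n / (n - k + 1 + 4 * tm_over_tc)
    return e_h_over_e_m, s_m_over_s_h, e_c_over_e_h


eh_em, sm_sh, ec_eh = interpolation_ratios(24, 15, 2)   # Example 4.3
print(f"E_H/E_M = {eh_em:.2f}, S_M/S_H = {sm_sh:.0f}, E_C/E_H = {ec_eh:.2f}")
# E_H/E_M = 2.25, S_M/S_H = 1608, E_C/E_H = 1.33
```

Sweeping `k` in this function is a quick way to explore the speed and storage trade-off mentioned in Remark 1 below.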

Remarks:

1) The choice of $k$ can be optimized for the desired combination of speed and storage constraints.

2) The array multiplier is likely to be much larger in an integrated realization than a CORDIC block, so the area penalty incurred by the hybrid scheme through the addition of a CORDIC block is marginal. (This is especially true if the CORDIC iterations are done with the multiplier in the case when $T_M = T_C$.) In any event, the overriding consideration is the required storage. Clearly, the hybrid scheme requires less area than the multiplier scheme, since the memory area is drastically reduced.

3) This technique, although presented only for $m = 1$, can be readily applied to other coordinate systems.


4.3.2 A Taylor Series Approach to Hybrid CORDICs

The foregoing hybrid CORDIC method provided the ability to trade execution speed for storage by initially computing a coarse rotation and then using the CORDIC method to interpolate the remaining bits. An alternate hybrid scheme, which first employs the CORDIC iterations to reduce the angle to a small value and then utilizes a Taylor series expansion to achieve the final result, will now be described. As in the previous section, only the plane rotation case will be discussed in detail; however, the extension to the additional CORDIC functions will be obvious. This hybrid approach achieves a superior performance improvement while requiring no additional storage; however, it does not allow for as much control over the speed tradeoff as the method of Section 4.3.1 did (through the choice of $n - k$).

Consider the vector rotation:

$$X_n = \begin{pmatrix} \cos z & -\sin z \\ \sin z & \cos z \end{pmatrix} X_0$$

Let all quantities be represented to $n$-bit precision. Then, a vector rotation requires $n$ CORDIC iterations. Suppose that $m < n$ iterations are performed, yielding a vector $X_m$ and a residual angle, $\varphi_m$, close to zero (zero being the destination value of the angle). Next, note that the Taylor series expansions of $\sin\varphi$ and $\cos\varphi$ around the origin are:

$$\sin\varphi = \sum_{k=0}^{\infty} \frac{(-1)^k \varphi^{2k+1}}{(2k+1)!}, \qquad \cos\varphi = \sum_{k=0}^{\infty} \frac{(-1)^k \varphi^{2k}}{(2k)!}$$

However, for $\varphi_m$ close to zero, only the first term of each expansion is significant, i.e.,

$$\sin\varphi_m \approx \varphi_m, \qquad \cos\varphi_m \approx 1$$

Therefore the hybrid CORDIC algorithm is:

Step 1: Compute $m$ CORDIC iterations to obtain $X_m$ with a small residual angle $\varphi_m$.

Step 2:

$$X_n = \begin{pmatrix} 1 & -\varphi_m \\ \varphi_m & 1 \end{pmatrix} X_m$$

Notice that no additional storage is required in this hybrid scheme and that only two multiplications are necessary in the final step of the algorithm. It is straightforward to show that:

$$\frac{E_H}{E_M} = \frac{m}{4}\,\frac{T_C}{T_M} + \frac{1}{2}, \qquad 0 < m < n$$

$$\frac{E_C}{E_H} = \frac{n}{m + 2T_M/T_C}, \qquad 0 < m < n$$

These relations are shown graphically in Figure 4.9.
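Step 1 and Step 2 can be sketched as follows for $m = 1$ with $F_i = i$. The iteration count uses $m = \lceil (n+1)/2 \rceil$, which keeps the first neglected term $\varphi_m^2/2$ below $2^{-n}$; the variable is called `m_iter` to avoid a clash with the coordinate-system parameter. Dividing out the CORDIC gain explicitly is an assumption made to keep the example short, and the function name is illustrative.

```python
import math


def taylor_hybrid_rotate(x, y, theta, n=24):
    """Rotate (x, y) by theta: m CORDIC iterations, then a first-order Taylor step."""
    m_iter = math.ceil((n + 1) / 2)            # m = ceil((n + 1) / 2) iterations
    z, gain = theta, 1.0
    for i in range(m_iter):                    # Step 1: F_i = i
        d = 2.0 ** -i
        sigma = 1 if z >= 0 else -1
        x, y = x - sigma * y * d, y + sigma * x * d
        z -= sigma * math.atan(d)
        gain *= math.sqrt(1.0 + d * d)
    x, y = x / gain, y / gain                  # gain is a constant for fixed m
    phi = z                                    # residual angle phi_m
    return x - phi * y, y + phi * x            # Step 2: cos(phi) ~ 1, sin(phi) ~ phi


for t in (-1.5, 0.2, 1.0, 1.7):
    xr, yr = taylor_hybrid_rotate(1.0, 0.0, t)
    print(t, abs(xr - math.cos(t)), abs(yr - math.sin(t)))
```

Step 2 costs exactly the two multiplications `phi * y` and `phi * x`, and no table is consulted at any point.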

It is necessary to determine how large $m$ must be in order for the truncation of the Taylor series to be justified. For $n$-bit precision, $\varphi_m$ must be chosen small enough that the additional terms of the series are individually smaller than may be accommodated by the finite bit representation. (Note that it is not necessary to guarantee that the summation of the remaining terms be so small, since finite bit addition is not associative. Indeed, failure to recognize this fact would seriously compromise the performance of this method.) Therefore, it suffices to choose $m$ such that

$$\varphi_m < 1 \quad \text{and} \quad \frac{\varphi_m^2}{2} < 2^{-n}$$

Figure 4.9: Geometric Interpretation of the CCM
i.e., $\varphi_m < 2^{-(n-1)/2}$. However, as indicated in Section 4.3.1, $\varphi_m$ is bounded by the angular resolution $2^{-m+1}$ after $m$ iterations, so this requires

$$m \ge \frac{n+1}{2}$$

i.e., up to almost one half of the bits may be obtained from the final step of the algorithm, with the simple inclusion of a multiplier! (This is superior to some of the newer schemes, e.g., [Fa81], which employ truncated power

Example 4.4

Recall Example 4.3, in which $n = 24$ and $T_M/T_C = 2$. With $m = \lceil n/2 \rceil + 1 = 13$, the performance of the present method is:

which is better than Example 4.3. The real benefit of this method is that no storage is required.
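The hybrid scheme of this example can be sketched in Python. The sketch rests on two assumptions of our own.

- The residual angle after $m$ circular iterations obeys $\varphi_m \le 2^{-(m-1)}$.
- The final step keeps only the first-order Taylor terms, $\cos\varphi_m \approx 1$ and $\sin\varphi_m \approx \varphi_m$, so it costs the two multiplications mentioned earlier.

With the first neglected term taken as $\varphi_m^2/2$, the smallest admissible $m$ for $n = 24$ comes out as 13, which agrees with Example 4.4. The names `smallest_m` and `hybrid_rotate` are hypothetical. The rotation signs are chosen for a counterclockwise rotation; as noted for (4.18), the signs have no bearing on the argument.

```python
import math

def smallest_m(n):
    """Smallest m with (2**-(m-1))**2 / 2 < 2**-n.

    Assumes phi_m <= 2**-(m-1) and that the first neglected Taylor term
    of the final step is phi_m**2 / 2.
    """
    m = 1
    while 2.0 ** (-2 * (m - 1)) / 2 >= 2.0 ** -n:
        m += 1
    return m

def hybrid_rotate(x, y, theta, m):
    """Rotate (x, y) by theta: m circular CORDIC iterations, then one
    small-angle step with cos(phi) ~ 1 and sin(phi) ~ phi.

    Assumes |theta| <= sum(atan(2**-i)) ~ 1.743 rad so the iterations converge.
    """
    z, gain = theta, 1.0
    for i in range(m):
        d = 2.0 ** -i if z >= 0 else -(2.0 ** -i)   # delta_i with rotation sign
        x, y = x - d * y, y + d * x                  # shift-and-add rotation
        z -= math.atan(d)                            # z_{i+1} = z_i - alpha_i
        gain *= math.sqrt(1.0 + d * d)               # accumulate the CORDIC gain K
    phi = z                                          # residual angle phi_m
    x, y = x - phi * y, y + phi * x                  # final step: two multiplications
    return x / gain, y / gain                        # remove the CORDIC gain


m = smallest_m(24)
print(m, hybrid_rotate(1.0, 0.0, 1.0, m))
```

With 13 iterations, the residual angle is at most about $2.4 \times 10^{-4}$. The error of the final step is therefore of the order of $\varphi_m^2/2 \approx 3 \times 10^{-8}$, well inside 24 bit precision.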

Remarks:

1. In this example, only 13 CORDIC iterations are performed. If 13 iterations were performed with the method of Section 4.3.1, then $E_H/E_M = 2.75$, which is worse than the present scheme.

2. For large $m$, $n$ it is clear that $\alpha_{m-1} \approx \delta_{m-1} = 2^{-(m-1)}$. Since $\varphi_m < \alpha_{m-1}$, it is possible to sacrifice angular resolution and approximate $\varphi_m$ with a power of two (e.g., $\alpha_{m-1} \approx 2^{-(m-1)}$). Then


$$\frac{E_H}{E_M} = \frac{(m + 2)\,T_C}{4\,T_M}$$

which represents an additional performance improvement.

3. This technique may be readily applied to the other CORDIC functions. For example, the hyperbolic rotations involve Taylor series expansions of $\cosh\varphi_m$ and $\sinh\varphi_m$, which are very similar in form to those for $\cos\varphi_m$ and $\sin\varphi_m$.

4. Angular Resolution Revisited

Theorem 4.2 provided a very powerful method for improving the domain of convergence of the CORDIC algorithm; however, it reduced the angular resolution, as mentioned in Section 4.2.1. Figure 4.7 shows that the angular residual, which is equivalent to $\delta$, was nonetheless very small. Exploiting this fact, it is natural to improve the resolution by applying Step 2 of the foregoing hybrid algorithm, which requires the addition of a multiplier to the CORDIC arithmetic unit. Whether it is preferable to construct this multiplier or simply employ Theorem 4.3 to improve the resolution will depend on the application.

4.4 FLOATING POINT CORDIC ALGORITHMS (FLORDIC)

All expositions on CORDIC algorithms to date have been aimed at fixed point implementations, and the natural assumption has been that, if the algorithms could be generalized, an integrated realization would be much more complex and costly. The FLORDIC technique to be described in this section is a floating point CORDIC method. Interestingly, FLORDIC algorithms appear to be simpler to implement than the fixed point CORDICs, since they can be structured to require little or no shifting for the scaling by $\delta_i$. This is very significant for bit parallel realizations like Figure 4.5, in which the barrel shifter consumes most of the chip area (as will become obvious in chapter six). Accordingly, the throughput per area advantages of bit parallel versus bit serial realizations are quite large (some details appear in chapter six).

Consider the floating point number representation:

$$X = M_x\, b^{c_x + e} \qquad (4.17)$$

where

$X$ is a floating point number

$M_x$ is a mantissa in sign plus magnitude format [Hw79]

$c_x$ is an integer characteristic

$b$ is the characteristic base

$e$ is an offset quantity for an excess storage format [Hw79]

$M_x$ is generally in either normalized or standardized form, the latter representation being used in most mainframe computers while the former is the IEEE microcomputer floating point standard.

Recall the CORDIC iterations (Equations 4.3 and 4.6):

$$x_{i+1} = x_i + m\,\delta_i\, y_i \qquad (4.18a)$$

$$y_{i+1} = y_i - \delta_i\, x_i \qquad (4.18b)$$

$$z_{i+1} = z_i - \alpha_i \qquad (4.18c)$$

(the fact that the signs in the equations depend on the direction of rotation has been ignored here, since it has no bearing on the presentation)

In the sequel, assume that the characteristic and mantissa in the floating point format are separable (all floating point arithmetic units are capable of doing this, making it a reasonable assumption). Furthermore, let $\delta_i = p^{-F_i}$ for some $p$. Then, the CORDIC equations may be written:

$$x_{i+1} = x_i + m\, M_{y_i}\, b^{c_{y_i} + e}\, p^{-F_i} \qquad (4.19a)$$

$$y_{i+1} = y_i - M_{x_i}\, b^{c_{x_i} + e}\, p^{-F_i} \qquad (4.19b)$$

The major difficulty in realizing these equations lies in the product terms of the generic form:

$$b^{c + e}\, p^{-F}$$

Many options exist for computing this product.

Case 1: $p = 2$, $b = 2^k$

In this case, which is the closest to the fixed point situation, Equation 4.19 becomes:

$$x_{i+1} = x_i + m\, M_{y_i}\, b^{(c_{y_i} - F_i/k) + e} \qquad (4.20a)$$

$$y_{i+1} = y_i - M_{x_i}\, b^{(c_{x_i} - F_i/k) + e} \qquad (4.20b)$$

These equations are readily recognized as floating point additions of $x_i$ and $y_i$, in which one of the two variables has had its characteristic modified by $F_i/k$. Subtracting a constant from the characteristic is an easy task, so the integrated implementation of (4.20) is readily achieved with a floating point adder. Well... not quite. Notice that $F_i/k$ is not always an integer. Let $q_i = \lfloor F_i/k \rfloor$ (an integer) and $r_i = F_i \bmod k$, so that $F_i = k q_i + r_i$. Then Equation 4.20 may be written:

$$x_{i+1} = x_i + m\,(2^{-r_i} M_{y_i})\, b^{(c_{y_i} - q_i) + e} \qquad (4.21a)$$

$$y_{i+1} = y_i - (2^{-r_i} M_{x_i})\, b^{(c_{x_i} - q_i) + e} \qquad (4.21b)$$

Now Equation 4.21 is particularly easy to implement. Simply shift the mantissa and subtract a constant from the characteristic of $x_i$ (or $y_i$), and do a floating point addition with $y_i$ (or $x_i$) to yield the result. The resulting mantissa is guaranteed to be in the correct format since $r_i < k$.

Remarks:

1) The mantissa shift could be done with the shift unit used in the floating point adder, thus necessitating no further hardware. Alternatively, a separate shifter will enhance throughput.

2) Even if a separate shifter is built, its shift range is merely from 0 to $k - 1$, rather than the entire range of $\{F_i\}$ as in the fixed point case. For example, consider a 32 bit floating point representation in which $b = 16$, i.e., $k = 4$, and a 24 bit mantissa is used. The values of $\{F_i\}$ range from 0 to 23, requiring a 23 position barrel shifter for a parallel realization (quite a formidable task), while the present floating point method would require a three position shifter only. The latter is, of course, much, much simpler to integrate and consumes roughly 12% of the area.

3) $F_i$ can be stored as $(q_i, r_i)$ with no storage penalty whatsoever.
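Case 1 can be modelled in Python with numbers held as (mantissa, characteristic) pairs, using $b = 16$ ($k = 4$) as in Remark 2. The model rests on several assumptions, all for illustration only:

- exact `Fraction` mantissas;
- an offset $e = 0$;
- normalization to $1/b \le |M| < 1$;
- a floating point adder modelled as exact addition followed by renormalization.

The helper names (`normalize`, `scale_pow2`, `flordic_step`) are hypothetical. The sketch shows that scaling by $2^{-F_i}$ needs only a mantissa shift of $r_i < k$ bits plus a subtraction of $q_i$ from the characteristic.

```python
from fractions import Fraction

K = 4            # b = 2**K = 16 (hexadecimal characteristic), as in Remark 2
B = 2 ** K

def normalize(v):
    """Return (M, c) with v = M * B**c and 1/B <= |M| < 1 (0 maps to (0, 0))."""
    v = Fraction(v)
    if v == 0:
        return Fraction(0), 0
    c = 0
    while abs(v) >= 1:
        v, c = v / B, c + 1
    while abs(v) < Fraction(1, B):
        v, c = v * B, c - 1
    return v, c

def value(M, c):
    """Numeric value M * B**c (offset e taken as 0)."""
    return M * Fraction(B) ** c

def scale_pow2(M, c, F):
    """Multiply M * B**c by 2**-F: split F = K*q + r, shift the mantissa
    right by r < K bits and subtract q from the characteristic."""
    q, r = divmod(F, K)
    return M / 2 ** r, c - q

def flordic_step(x, y, F, m=1):
    """One Case-1 iteration of (4.21): x += m * 2**-F * y and y -= 2**-F * x.

    x and y are (mantissa, characteristic) pairs; the floating point adder
    is modelled by exact addition followed by renormalization.
    """
    ys = scale_pow2(*y, F)
    xs = scale_pow2(*x, F)
    new_x = normalize(value(*x) + m * value(*ys))
    new_y = normalize(value(*y) - value(*xs))
    return new_x, new_y
```

For example, with $F = 9$ the split is $q = 2$, $r = 1$. The mantissa moves by one bit and the characteristic drops by two.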

Case 2: $p = b$

This case is perhaps the most interesting, since it is really the CORDIC algorithm applied to a radix $p$ machine (it is interesting that this results in a natural connection to floating point algorithms). Substitute $k = 1$ into (4.20) to get:

$$x_{i+1} = x_i + m\, M_{y_i}\, b^{(c_{y_i} - F_i) + e} \qquad (4.22a)$$

$$y_{i+1} = y_i - M_{x_i}\, b^{(c_{x_i} - F_i) + e} \qquad (4.22b)$$

This case results in a truly simple implementation, since $y_{i+1}$ (or $x_{i+1}$) is readily obtained through a floating point addition of $y_i$ ($x_i$) and $x_i$ ($m y_i$), after a constant has been subtracted from the characteristic of $x_i$ ($y_i$). This is clearly much simpler than the fixed point situation, since only adders are required. The amount of shift required by the floating point adder for radix point alignment is simply $|c_{y_i} - c_{x_i} + F_i|$ ($|c_{x_i} - c_{y_i} + F_i|$), and the modification of the characteristic of $x_i$ ($y_i$) need only be explicitly computed for

the characteristic of $y_{i+1}$ when $c_{x_i} - F_i < c_{y_i}$

the characteristic of $x_{i+1}$ when $c_{y_i} - F_i < c_{x_i}$
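A minimal Python sketch of the $y$ update in Case 2 follows. It assumes $b = 16$, an offset $e = 0$ and exact `Fraction` arithmetic, and the function name `case2_y_update` is hypothetical. No mantissa shift is performed before the addition. Only the characteristic of $x_i$ changes, and the adder's alignment shift, in base-$b$ digits, is $|c_{y_i} - c_{x_i} + F_i|$.

```python
from fractions import Fraction

B = 16   # hypothetical base b; Case 2 takes p = b, so delta_i = b**-F_i

def case2_y_update(Mx, cx, My, cy, F):
    """y_{i+1} = y_i - M_x * b**(c_x - F), following (4.22b) with offset e = 0.

    Returns the new value and the radix-point alignment shift (in base-b
    digits) that the floating point adder performs, |c_y - c_x + F|.
    """
    c_scaled = cx - F                  # subtract the constant F from c_x
    shift = abs(cy - c_scaled)         # equals |c_y - c_x + F|
    new = Fraction(My) * Fraction(B) ** cy - Fraction(Mx) * Fraction(B) ** c_scaled
    return new, shift
```

The $x$ update is symmetric: subtract $F_i$ from $c_{y_i}$, multiply by $m$, and add to $x_i$.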

This section has thus far demonstrated that CORDICs may be readily generalized to floating point representations having simple realizations; however, it is still necessary to describe a means for simultaneously obtaining a sufficiently large region of convergence and an appropriate angular resolution to accommodate the large dynamic range of numbers, i.e., an appropriate choice of $\{F_i\}$. When $m = 1$, any of the sequences of Section 4.2 may be employed, since the size of the desired region of convergence is not altered by the floating point representation (i.e., whether floating or fixed point, convergence is still desired for all angles in the circle). Unfortunately, this is not the case for the linear and hyperbolic coordinate systems, and the following alternatives must be considered:

1) Resort to the use of some negative integers in $\{F_i\}$, as was done in Example 4.2. This will improve the region of convergence; however, additional iterations will be necessary if angular resolution is not to be compromised.

2) Utilize the prescaling identities in [Wa71] when large quantities are encountered.

3) When $m = -1, 0$, it is most convenient to exploit the separable nature of floating point representations to perform only fixed point operations on the mantissas and account for scaling with simple additions to the characteristics of the quantities involved. Since the mantissas are always normalized (or standardized), they lie in a very limited range, and convergence of the fixed point CORDIC algorithm does not pose any problems. Examples of this idea are presented in [Ah82], to which the reader is referred for additional details.

The latter approach is very attractive, since convergence of the algorithm is not of major concern.

4.5 THE CONVERGENCE COMPUTATION TECHNIQUE

A new iterative technique, based on a generalization of the convergence division method [Go64], for the evaluation of exponentials, logarithms, ratios and square roots of fractional numbers was introduced by Chen [Ch71].

Consider initially the convergence division method for evaluating

$$Q = N/D$$

At each iteration, a constant $R_i$ is chosen to multiply both numerator and denominator, so that after $K$ iterations:

$$Q = \frac{N}{D} = \frac{N R_0 R_1 \cdots R_{K-1}}{D R_0 R_1 \cdots R_{K-1}}$$

The sequence $\{R_i\}$ is chosen so that the denominator converges to unity. Hence, the numerator converges to the desired quotient.
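A short Python sketch of convergence division follows. The choice $R_i = 2 - D_i$ is a common one and is assumed here; the text leaves $\{R_i\}$ open. With this choice $1 - D_{i+1} = (1 - D_i)^2$, so the number of correct bits roughly doubles per iteration. The function name `convergence_divide` is hypothetical, and the sketch assumes $1/2 \le D < 1$.

```python
def convergence_divide(N, D, iterations=6):
    """Quotient N / D by multiplying N and D by the same R_i until D -> 1.

    Assumes 1/2 <= D < 1 and R_i = 2 - D_i, so 1 - D_{i+1} = (1 - D_i)**2.
    """
    for _ in range(iterations):
        R = 2.0 - D          # factor chosen to push D towards unity
        N, D = N * R, D * R  # numerator and denominator scaled together
    return N                 # D is now ~1, so N ~ N_0 / D_0
```

Starting from $D = 0.7$, the denominator error falls as $0.3 \to 0.09 \to 0.0081 \to \dots$, reaching double precision within six iterations.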

Chen's generalization operates similarly and is based on the co-transformation of a number pair $(x, y)$ such that some function $F(x, y)$ is invariant (e.g., above, $(x, y) = (D, N)$ and $F(x, y) = y/x$). The iterations, or transformations, are made to proceed in a manner driving $x$ to a known value $x_u$, so that $y$ converges to the corresponding $y_u$, which is the desired result.

In order to evaluate a function $z_0 = f(x)|_{x = x_0}$, introduce a variable $y$ to form the convergence function $F(x, y)$ satisfying:

(1) there exists a known initiation value $y = y_0$ such that $F(x_0, y_0) = z_0$;

(2) there exists a convenient transformation of $(x_k, y_k)$ into $(x_{k+1}, y_{k+1})$ such that $F(x_{k+1}, y_{k+1})$ is invariant $\forall k \ge 0$;

(3) a known destination value $x_u$ is reached through the sequence of $x$-transformations, and the resulting $y$-transformations converge to $y = y_u = F(x_u, y_u) = z_0$.

A geometric interpretation of these conditions is given in Figure 4.9. The function $F(x, y)$ is constrained to lie in the $z = z_0$ plane of a three dimensional cube which has $P_0 = (x_0, y_0, z_0)$ as one vertex. The invariant transformation implies that, at each iteration, the point $P_k = (x_k, y_k, z_k)$ lies on the curve $F(x, y)$, i.e.,

$$z_0 = F(x_0, y_0) = F(x_1, y_1) = \cdots = F(x_k, y_k) = \cdots = F(x_u, y_u) = z_0$$

Furthermore, the curve $F$ must pass through the point $Q = (x_u, y_u, z_0)$ as a consequence of the third condition above.

The transformation rule taking $P_k$ to $P_{k+1}$ involves the selection of a pair of functions, $\varphi$ and $\psi$, such that

$$x_{k+1} = \varphi(x_k, y_k)$$

$$y_{k+1} = \psi(x_k, y_k)$$

Some examples will now be given.

4.5.1 Examples of the Convergence Computation Technique

Consider the computation of $f(x) = w e^x$ for $0 \le x < \ln 2$. Let $F(x, y) = y e^x$, with initiation value $y_0 = w$ and destination $x_u = 0$. The transformation rules are

$$x_{k+1} = \varphi(x_k, y_k) = x_k - \ln a_k$$

$$y_{k+1} = \psi(x_k, y_k) = y_k a_k$$

Then clearly

$$F(x_{k+1}, y_{k+1}) = y_{k+1} e^{x_{k+1}} = y_k e^{x_k} = F(x_k, y_k)$$

is invariant.

The algorithm is terminated when $x \to 0$. As with the CORDIC algorithms, $a_k$ is chosen in such a way as to replace multiplications with shift and add operations, i.e., choose $a_k = 1 + 2^{-m}$, where the selection of 'm' is detailed in [Ch71]. (Notice the similarity of $a_k$ to $\delta_k$ in the CORDIC algorithm.)
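The exponential algorithm can be sketched in Python. The selection rule for $m$ given in [Ch71] is not reproduced in the text, so the sketch substitutes a simple greedy rule; this rule is an assumption. For each $m = 1, 2, \dots$, accept $a_k = 1 + 2^{-m}$ as long as $x$ stays non-negative. The multiplication $y_k a_k$ is written as a shift and an add, and `math.log1p` stands in for the table of $\ln a_k$. The function name `ccm_exp` is hypothetical.

```python
import math

def ccm_exp(x, w=1.0, n=40):
    """w * e**x for 0 <= x < ln 2 by the convergence computation method.

    Invariant: y * e**x stays equal to w * e**x0. The greedy choice of m
    is an assumption, not the selection rule of [Ch71].
    """
    y = w
    for m in range(1, n + 1):
        ln_a = math.log1p(2.0 ** -m)   # table entry ln(1 + 2**-m)
        while x >= ln_a:               # accept a_k while x stays >= 0
            x -= ln_a                  # x_{k+1} = x_k - ln a_k
            y += y * 2.0 ** -m         # y_{k+1} = y_k (1 + 2**-m): shift and add
    return y                           # x now lies below ln(1 + 2**-n)
```

Because $x$ ends below $\ln(1 + 2^{-n}) \approx 2^{-n}$, the relative error of the result is about $2^{-n}$.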

Additional examples are:

Logarithm:

$f(x) = w + \ln x$ for $1/2 \le x < 1$

$F(x, y) = y + \ln x$

initiation $y_0 = w$, destination $x_u = 1$

Transformation: $x_{k+1} = x_k a_k$, $\quad y_{k+1} = y_k - \ln a_k$

Ratio Algorithm:

$f(x) = w/x$ for $1/2 \le x < 1$

$F(x, y) = y/x$

initiation $y_0 = w$, destination $x_u = 1$

Transformation: $x_{k+1} = x_k a_k$, $\quad y_{k+1} = y_k a_k$

Inverse Square Root:

$f(x) = w/\sqrt{x}$ for $1/4 \le x < 1$

$F(x, y) = y/\sqrt{x}$

initiation $y_0 = w$, destination $x_u = 1$

Transformation: $x_{k+1} = x_k a_k^2$, $\quad y_{k+1} = y_k a_k$

Notice that if $w = x$, then $\sqrt{x}$ is obtained as the result.
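The three transformation pairs can be sketched in Python with $a_k = 1 + 2^{-m}$. As in the exponential sketch, the choice of $m$ is a greedy rule of our own: accept $a_k$ while $x$ does not pass 1. Section 4.5.2 describes Chen's rule, which instead takes $m$ as the leading 1-bit position of $|1 - x_k|$. The function names are hypothetical.

```python
import math

def ccm_log(x, w=0.0, n=40):
    """w + ln x for 1/2 <= x < 1: x_{k+1} = x_k a_k, y_{k+1} = y_k - ln a_k."""
    y = w
    for m in range(1, n + 1):
        a = 1.0 + 2.0 ** -m
        while x * a <= 1.0:               # drive x towards 1 from below
            x, y = x * a, y - math.log1p(2.0 ** -m)
    return y

def ccm_ratio(x, w, n=40):
    """w / x for 1/2 <= x < 1: x_{k+1} = x_k a_k, y_{k+1} = y_k a_k."""
    y = w
    for m in range(1, n + 1):
        a = 1.0 + 2.0 ** -m
        while x * a <= 1.0:
            x, y = x * a, y * a
    return y

def ccm_inv_sqrt(x, w, n=40):
    """w / sqrt(x) for 1/4 <= x < 1: x_{k+1} = x_k a_k**2, y_{k+1} = y_k a_k."""
    y = w
    for m in range(1, n + 1):
        a = 1.0 + 2.0 ** -m
        while x * a * a <= 1.0:
            x, y = x * a * a, y * a
    return y
```

For example, `ccm_inv_sqrt(0.36, 0.36)` takes $w = x$ and so returns approximately $\sqrt{0.36} = 0.6$.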


4.5.2 Hybrid Convergence Computation

As with the hybrid CORDIC algorithms of Section 4.3, it is possible to derive hybrid convergence computation schemes. Since these depend on the individual transformation equations of the CCM for the particular function desired, this section will illustrate the hybrid concept with an example.

Consider the ratio algorithm, which is defined for $1/2 \le x < 1$. By choosing $a_k = 1 + 2^{-m}$, where $m$ is the position number of the leading 1-bit in $|1 - x_k|$, the division operation requires an expected $N/4$ iterations for $N + 1$ bit quantities [Ch71]. The destination, $x_k \to 1$, could be reached in a single iteration if $a_0 = 1/x_0$ were looked up in a table (ROM). (This will be referred to as the stored table approach.) However, the amount of storage is prohibitive even for 16 or 24 bit precision, since there are $2^{15}$ or $2^{23}$ distinct values of $x$ satisfying $1/2 \le x < 1$. Now assume that the storage requirement is reduced by quantizing the interval $1/2 \le x < 1$ much more coarsely, e.g., to $2^Q$ quantities. Furthermore, assume that a fixed point multiplier, with multiply time $T_M$, is available for scaling by $a_0$ (since $a_0$ is no longer of simple form). Then a hybrid CCM scheme involving table lookup is:

Step 1: From $x_0$, look up an $a_0$ in the table which is closest to $1/x_0$.

Step 2: Calculate

$$x_1 = x_0 a_0, \qquad y_1 = y_0 a_0$$

Step 3: Calculate $m$ = position of the leading 1 bit in $|1 - x_1|$.

Step 4: Continue with the normal CCM transformation until convergence is achieved.

The hybrid CCM essentially uses table lookup to get close to the final result and then refines the precision of the result via the CCM recursions. It thus realizes the major advantage that fewer iterations are required to achieve the final result than with the standard CCM, while much less memory is necessary than with the stored table approach. An additional fixed point multiplier increases the amount of hardware over the standard CCM approach. However, since the multiplier allows removal of the restriction that $a_k = 1 + 2^{-m}$, as in the usual CCM, some further advantage is likely attainable in step 4.
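A Python sketch of the four steps for the ratio algorithm follows. It uses the following assumptions and illustrative choices:

- **Table.** A hypothetical $2^Q$-entry table holds the reciprocals of the cell centres of a linear quantization of $[1/2, 1)$.
- **Iteration.** Step 4 uses $a_k = 1 + s\,2^{-m}$, where $m$ is the leading 1-bit position of $|1 - x_k|$.
- **Sign.** The sign $s$ is our addition. After the table step, $x_1$ may land slightly above 1, and the multiplier removes the restriction to $a_k > 1$.

The function name and return layout are hypothetical.

```python
import math

def hybrid_ccm_ratio(x0, w, Q=8, N=30):
    """Hybrid CCM for w / x0 with 1/2 <= x0 < 1 (a sketch).

    Steps 1-2: table lookup of a0 ~ 1/x0 and one multiplication of x0, y0.
    Steps 3-4: CCM iterations a_k = 1 + s * 2**-m until |1 - x| < 2**-N.
    Returns (y, x1, iterations).
    """
    # Hypothetical ROM: reciprocals of the centres of the 2**Q cells of [1/2, 1).
    table = [1.0 / (0.5 + (j + 0.5) * 2.0 ** -(Q + 1)) for j in range(2 ** Q)]
    j = int((x0 - 0.5) * 2 ** (Q + 1))       # Step 1: cell containing x0
    a0 = table[j]
    x, y = x0 * a0, w * a0                   # Step 2: two multiplications
    x1 = x
    iterations = 0
    while abs(1.0 - x) >= 2.0 ** -N and iterations < 4 * N:
        e = 1.0 - x
        m = 1 - math.frexp(abs(e))[1]        # Step 3: 2**-m <= |e| < 2**(1-m)
        s = 1.0 if e > 0 else -1.0           # assumed: approach 1 from either side
        x, y = x + s * x * 2.0 ** -m, y + s * y * 2.0 ** -m   # Step 4: shift and add
        iterations += 1
    return y, x1, iterations
```

Because $x_0$ lies within $2^{-(Q+2)}$ of a cell centre, $|1 - x_1|$ stays below $2^{-(Q+1)}$. The CCM iterations therefore only have to supply the remaining bits.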

In order to analyze this method, let the ROM contain values of $a_0$ arising from a linear quantization of the interval $[1/2, 1)$ to $2^Q$ levels. Let

$S_H$ = storage required by the hybrid scheme (in words)

$S_T$ = storage required by the table lookup method

$T_M$ = multiplier time

$T_C$ = time for one usual CCM iteration

$E_C$ = expected execution time for the usual CCM

$E_H$ = expected execution time for the hybrid scheme

$E_T$ = expected execution time for the table lookup method

$N + 1$ = number of bits in the number representation

Then


$$\frac{S_H}{S_T} = 2^{Q-N} \qquad (4.23)$$

$$E_C = \frac{N\,T_C}{4} \quad (\text{since } N/4 \text{ iterations}) \qquad (4.24)$$

Now, for some $x_0$, the retrieved $a_0$ will differ from $1/x_0$, and the maximum deviation will determine the number of usual CCM iterations required in step 4 above. It is reasonable to estimate the maximum discrepancy as occurring when $x_0$ falls between two table entries, $I_1$ and $I_2$, which are:

$$I_1 = x_0 - 2^{-(Q+2)}, \qquad I_2 = x_0 + 2^{-(Q+2)}$$

with corresponding values of $a_0$, denoted $a_0^1$ and $a_0^2$:

$$a_0^1 = \frac{1}{x_0 - 2^{-(Q+2)}}, \qquad a_0^2 = \frac{1}{x_0 + 2^{-(Q+2)}}$$

Hence:

$$x_0 a_0^1 = \frac{1}{1 - 2^{-(Q+2)}/x_0} \le \frac{1}{1 - 2^{-(Q+1)}} \approx 1 + 2^{-(Q+1)} \quad \text{for } x_0 \in [1/2, 1)$$

and

$$x_0 a_0^2 = \frac{1}{1 + 2^{-(Q+2)}/x_0} \ge \frac{1}{1 + 2^{-(Q+1)}} \approx 1 - 2^{-(Q+1)}$$

i.e., $Q$ bits of precision are obtained in the first two steps.

Therefore, step 4 must account for the remaining $N - Q$ bits of precision, requiring an expected $(N - Q)/4$ CCM iterations.

Relative speeds of the hybrid CCM with one multiplier and the stored table approaches are compared through the ratio:

Ej Tu

= 4 + {N (4.25)

Equations (4.23) and (4.25) summarize the relative performance of the two schemes. Memory access time has been ignored in this simple analysis. While this is reasonable for the hybrid CCM, it gives optimistic results for the stored table approach, which has a large memory and hence a slower access. Notice that while $E_H/E_T$ depends linearly on $N - Q$, $S_H/S_T$ exhibits an exponential dependence. Therefore, an exponential reduction in storage requirements (through the use of the hybrid CCM) incurs only a linear reduction in speed! Recall that this was also the case with the hybrid CORDIC algorithms.

Similar hybrid schemes may be obtained for the other CCM functions. While only quantization of the domain of the function has been considered, there may be an advantage to quantizing the range instead. Furthermore, optimal quantization has not been considered, although the present linear scheme is expected to be quite efficient.

Finally, note that a Taylor series type hybrid scheme could also be developed for many of the CCM functions in much the same manner as presented in Section 4.3.2.


4.5.3 Hardware Implementation

Chen [Ch71] has also proposed a machine architecture for implementing this algorithm. This is shown in Figure 4.10. Notice that multiplications have been avoided by choosing all scale factors so that scaling involves only integral powers of the machine radix. As with the CORDICs, the major arithmetic components are, therefore, a shifter and an adder. Recall that in the CORDIC algorithms, the values of $\{\delta_i\}$ were chosen to be powers of the machine radix, necessitating storage of the corresponding angles $\alpha_i$ in a memory. Similarly, by choosing $a_k = 1 + 2^{-m}$ here, a memory is required to store the values of $\ln a_k$.

The operation of Chen's machine is quite simple. Consider, for example, the algorithm of Section 4.5.1 for computing $w e^x$, whose transformation rules are:

$$x_{k+1} = \varphi(x_k, y_k) = x_k - \ln a_k$$

$$y_{k+1} = \psi(x_k, y_k) = y_k a_k$$

and $a_k$ has the form $1 + 2^{-m}$. The $y$-transformation is easily calculated by placing $y_k$ in the T register and its scaled value, $2^{-m} y_k$, in the U register. Adding the two yields $y_{k+1}$, which is placed back into the Y register. Similarly, the value of $x_k$ is placed in T, while the value of $a_k$ is used to determine $-\ln a_k$ from the memory. Adding yields $x_{k+1}$, which is placed in the X register, thus completing an iteration.
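A register-level Python sketch of one such machine follows. It uses integer fixed-point words, so that each iteration is literally one shift and two additions. The following are assumptions for illustration and not Chen's exact control sequence:

- the 32 fraction bits;
- Python variables named after the registers X, Y, T and U;
- the dictionary `C` standing in for the scratchpad memory $C(m) = -\ln(1 + 2^{-m})$;
- the greedy rule for accepting $a_k$.

```python
import math

FRAC = 32                 # hypothetical word: 32 fraction bits
ONE = 1 << FRAC

# Scratchpad memory: C(m) = -ln(1 + 2**-m) in fixed point, one word per m.
C = {m: -round(math.log1p(2.0 ** -m) * ONE) for m in range(1, FRAC + 1)}

def chen_machine_exp(x, w):
    """w * e**x for 0 <= x < ln 2 using only shifts and adds per iteration."""
    X = round(x * ONE)            # X register holds x_k
    Y = round(w * ONE)            # Y register holds y_k
    for m in range(1, FRAC + 1):  # stop once m exceeds the word length
        while X + C[m] >= 0:      # accept a_k = 1 + 2**-m while x stays >= 0
            T, U = Y, Y >> m      # T <- y_k, U <- 2**-m * y_k (shifter)
            Y = T + U             # adder: y_{k+1} = y_k * a_k
            T = X                 # T <- x_k
            X = T + C[m]          # adder: x_{k+1} = x_k - ln a_k
    return Y / ONE
```

The only rounding comes from the stored constants and the right shifts, so the result is accurate to roughly $10^{-8}$ for this word length.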

4.6 RELATIONSHIP BETWEEN THE CORDIC AND CONVERGENCE COMPUTATION ALGORITHMS

In his original treatise, Chen [Ch71] notes that his convergence computation technique differs fundamentally from other algorithms, including CORDIC. However, it is possible to show that a generalization of the CCM includes the CORDIC equations as a special case. This not only demonstrates that the two algorithms are in fact intimately linked, but also provides the ability to include vector rotation and polar/rectangular coordinate conversion in Chen's method, hence realizing a unified structure for implementing all of these functions.

Figure 4.10: A Machine Architecture for the CCM (scratchpad memory holding $C(m) = -\ln(1 + 2^{-m})$, a shifter and an adder; $(X)$: contents of register X; $C(m)$: contents of memory location $m$)

It will be convenient to defer discussion of the generalized structure of Chen's technique until after the connection between the two algorithms has been demonstrated by an alternate means. The main interest here will be the CORDIC algorithms for $m = \pm 1$, since multiplication and division (i.e., $m = 0$) are in fact the basis of the convergence computation technique and are hence obviously computable.

Consider the convergence computation algorithm for the exponential given by:

$$f(x) = w e^x$$

$$F(x, y) = y e^x$$

$$x_{k+1} = x_k - \ln a_k, \qquad y_{k+1} = y_k a_k$$

Suppose the quantities are all complex, i.e., let

$$x = j\vartheta$$

and

$$a_k = |a_k|\, e^{j\alpha_k}$$

Then

$$F(\vartheta, y) = y e^{j\vartheta}$$

and the $x$ transformation becomes:

$$j\vartheta_{k+1} = j\vartheta_k - \ln a_k = j(\vartheta_k - \alpha_k) - \ln|a_k|$$

or

$$\vartheta_{k+1} = (\vartheta_k - \alpha_k) + j\ln|a_k| \qquad (4.26)$$

It now follows that:

$$y_{k+1} = y_k a_k = y_k |a_k| e^{j\alpha_k}$$

so the transformation pair is therefore

$$\vartheta_{k+1} = \vartheta_k - \alpha_k + j\ln|a_k|, \qquad y_{k+1} = y_k |a_k| e^{j\alpha_k} = y_k a_k \qquad (4.27)$$

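Notice that $\vartheta_k - \alpha_k$ is representative of a rotation through $\alpha_k$, the argument of $a_k$. It is straightforward to verify that $F(\vartheta, y)$ is invariant, since:

$$F(\vartheta_{k+1}, y_{k+1}) = y_{k+1} e^{j\vartheta_{k+1}} = y_k a_k\, e^{j\vartheta_k - j\alpha_k - \ln|a_k|} = y_k |a_k| e^{j\alpha_k}\, \frac{e^{j\vartheta_k} e^{-j\alpha_k}}{|a_k|} = y_k e^{j\vartheta_k} = F(\vartheta_k, y_k)$$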

These transformations may now be used to obtain the CORDIC iterations. Let

$$y_k = \beta_k + j\gamma_k, \qquad X_k = \begin{bmatrix} \beta_k \\ \gamma_k \end{bmatrix} \qquad (4.28)$$

so that the complex valued '$y_k$' transformation is in fact two real valued transformations of '$\beta_k$' and '$\gamma_k$'. Furthermore, since the choice of $a_k$ is arbitrary, let

$$a_k = 1 + j\delta_k \qquad (4.29)$$

where $\delta_k$ can be chosen to be a power of the machine radix, so that scaling by $a_k$ corresponds to a shift and add.

Now substituting (4.28) and (4.29) into (4.27) yields the $X_k$ update of the CORDIC algorithm, i.e.,

$$X_{k+1} = \begin{bmatrix} 1 & -\delta_k \\ \delta_k & 1 \end{bmatrix} X_k \qquad (4.30)$$

where

$$\delta_k = \tan\alpha_k$$

Taking the real part of (4.26) yields the auxiliary $z$ update, i.e.,

$$z_k = \mathrm{Re}(\vartheta_k)$$

so that

$$z_{k+1} = \mathrm{Re}(\vartheta_k) - \alpha_k = z_k - \alpha_k \qquad (4.31)$$

Equations (4.30) and (4.31) are the CORDIC algorithm in the circular case.

The connection between the CCM and circular CORDIC is completed through the derivation of the spurious scale factor, $K_1$, mentioned in Section 4.1 of this chapter. With initial values $y_0 = w$, $\vartheta_0 = v$ and destination $\mathrm{Re}(\vartheta_k) \to 0$ (which corresponds to the termination condition $x_k \to 0$ in the real valued $w e^x$ algorithm), the destination value of $F(\vartheta, y)$ is:

$$F(\vartheta_u, y_u) = y_u e^{-\sum \ln|a_i|} = F(\vartheta_0, y_0) = w e^{jv}$$

hence,

$$y_u = e^{\sum \ln|a_i|}\, w e^{jv} = \left(\prod_{i=0}^{u-1} |a_i|\right) w e^{jv}$$

Now, since $w$ is a complex quantity, the resulting $y_u$ corresponds to a rotation of the $w$ vector through an angle $v$, however with a spurious scale change. The scale factor for the chosen sequence, $\{a_k = 1 + j\delta_k\}$, becomes:

$$K = \prod |a_i| = \prod \sqrt{1 + \delta_i^2} = K_1 \qquad (4.32)$$

This is precisely the $K_1$ of Equation 4.5b. Furthermore, since $a_k = 1 + j\delta_k = |a_k| e^{j\alpha_k}$, the argument of $a_k$ is

$$\alpha_k = \tan^{-1}\delta_k \qquad (4.33)$$

and again this is precisely the incremental rotation at each iteration of the CORDIC algorithm.
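The complex-valued CCM can be run directly with Python's complex numbers to check (4.26) to (4.33) numerically. The sketch assumes $\delta_k = \pm 2^{-k}$ for $k = 0, 1, \dots$, with the sign chosen to drive $\mathrm{Re}(\vartheta_k) \to 0$, and a real starting angle $\vartheta_0 = v$ with $|v| \lesssim 1.74$. The function name `ccm_complex_rotate` is hypothetical. It returns the final $y_u$, the accumulated scale factor $K$, and $F$ before and after the iterations.

```python
import cmath
import math

def ccm_complex_rotate(w, v, n=40):
    """Complex CCM exponential with a_k = 1 + j*delta_k, delta_k = +-2**-k.

    Returns (y_u, K, F_start, F_end) with F = y * exp(j*theta).
    """
    theta = complex(v, 0.0)             # theta_0 = v (real)
    y = complex(w)                      # y_0 = w
    F_start = y * cmath.exp(1j * theta)
    K = 1.0
    for k in range(n):
        d = 2.0 ** -k if theta.real >= 0 else -(2.0 ** -k)   # drive Re(theta) -> 0
        a = complex(1.0, d)                                   # (4.29): a_k = 1 + j delta_k
        y = y * a                                             # (4.27): y_{k+1} = y_k a_k
        theta = theta - math.atan(d) + 1j * math.log(abs(a))  # (4.26) with alpha_k = atan(delta_k)
        K *= abs(a)                                           # (4.32): K = prod |a_k|
    F_end = y * cmath.exp(1j * theta)
    return y, K, F_start, F_end
```

The run shows $F$ staying constant and $y_u$ equal to $K w e^{jv}$, with $K \approx 1.6468$, the familiar circular CORDIC gain. The imaginary part of $\vartheta$ is what absorbs the growth of $|y|$.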

Remarks:

(1) Since these equations were derived based on the exponential algorithm of the convergence computation method, the destination $x_u \to 0$ (which becomes $\mathrm{Re}(\vartheta_u) \to 0$) naturally yields the rotation mode of the CORDIC method. The vectoring mode may be readily obtained by choosing $a_k$ such that $X_k$ of (4.28) is driven to the abscissa.

(2) The invariant functional in this case is simply the original vector together with a complex phase factor which "reverses" the effect of each incremental rotation. The imaginary part of the angle (4.26) "reverses" the effect of magnitude scaling at each iteration, thus leaving the original vector invariant.

The hyperbolic and circular systems are intimately connected through the angular conversion

$$\mu_k = j\alpha_k \qquad (4.34)$$

Making this substitution into (4.33) yields the identity:

$$\delta_k = \tan\alpha_k = -j\tanh\mu_k \qquad (4.35)$$

which reduces Equation 4.30 to:

$$X_{k+1} = \begin{bmatrix} 1 & j\tanh\mu_k \\ -j\tanh\mu_k & 1 \end{bmatrix} X_k \qquad (4.36)$$

However, $\delta_k = \tanh\mu_k$ in the hyperbolic system (the notation is unfortunate, since the same symbol, $\delta_k$, is used to represent different quantities in Walther's original paper), so that (4.36) becomes the usual CORDIC iteration for $m = -1$:

$$X_{k+1} = \begin{bmatrix} 1 & \delta_k \\ \delta_k & 1 \end{bmatrix} X_k \qquad (4.37)$$

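Turning now to the derivation of $K_{-1}$, recall from Equation 4.32

$$K = \prod_{k=0}^{u-1} |a_k| = \prod_{k=0}^{u-1} \sqrt{1 + \tan^2\alpha_k} \quad \text{(by definition)}$$

$$= \prod_{k=0}^{u-1} \frac{1}{\cos\alpha_k}$$

$$= \prod_{k=0}^{u-1} \frac{1}{\cosh\mu_k} \quad \text{when } \mu_k = j\alpha_k$$

$$= \prod_{k=0}^{u-1} \sqrt{1 - \tanh^2\mu_k}$$

$$= \prod_{k=0}^{u-1} \sqrt{1 - \delta_k^2} = K_{-1} \qquad (4.38)$$

(recall that this latter $\delta_k$ is defined differently than for the circular system)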
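Both steps of the hyperbolic argument can be spot-checked numerically in Python. The check covers the identity (4.35), with $\mu = j\alpha$, and the equality of the two products in (4.38) when $\delta_k = \tanh\mu_k$. The sequence $\delta_k = 2^{-k}$, $k = 1, \dots, n$, is a hypothetical choice: it omits the repeated iterations a practical hyperbolic CORDIC needs for convergence, which does not affect the identity being checked.

```python
import cmath
import math

def check_hyperbolic_link(n=30, alpha=0.3):
    """Return (error in (4.35), prod 1/cosh(mu_k), prod sqrt(1 - delta_k**2))."""
    # (4.35): with mu = j*alpha, tan(alpha) = -j * tanh(mu).
    err = abs(cmath.tan(alpha) - (-1j) * cmath.tanh(1j * alpha))
    # (4.38): the two forms of K_{-1} agree when delta_k = tanh(mu_k).
    deltas = [2.0 ** -k for k in range(1, n + 1)]
    mus = [math.atanh(d) for d in deltas]
    k_sech = math.prod(1.0 / math.cosh(mu) for mu in mus)
    k_sqrt = math.prod(math.sqrt(1.0 - d * d) for d in deltas)
    return err, k_sech, k_sqrt
```

Unlike $K_1$, the hyperbolic factor $K_{-1}$ is less than one, since every term $\sqrt{1 - \delta_k^2}$ is below unity.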

Summarizing, the convergence computation transformations for $y_{k+1}$ are precisely the CORDIC equations under the analogy that

$$y_k = X_k$$

$$\mathrm{Re}(\vartheta_k) = z_k$$

$$\mathrm{Im}(\vartheta_k) = \sum_{i=0}^{k-1} \ln|a_i|$$

The termination condition $\mathrm{Re}(\vartheta_k) \to 0$ therefore corresponds to $z_k \to 0$, which is the rotation mode of the CORDIC. It was shown that $y_u$ indeed corresponds to a vector rotation.

In some sense, Chen's method is more general than the CORDIC technique, since $\vartheta_k$ is an augmented version of $z_k$. It is straightforward to derive the CORDIC equations from the convergence computation transformations, but not vice versa. Note that the CORDIC algorithm yields some additional insight into its more general counterpart.

4.7 A GENERALIZED CONVERGENCE COMPUTATION METHOD AND THE CORDIC CONNECTION

The previous section established the CORDIC algorithms as a special case of a slightly generalized method of Chen, namely, the inclusion of complex valued functions. However, that approach was cumbersome, since it required the inclusion of a complex angle whose imaginary part maintained the invariance of $F(\vartheta, y)$. The convergence computation method will now be further generalized to vector valued functions, thus circumventing the need for complex quantities. This approach will also relax the invariance requirement on $F(\vartheta, y)$.

Suppose that it is desired to compute a vector valued function of a scalar quantity (as per usual, boldface quantities are vectors):

$$\mathbf{z}_0 = \mathbf{f}(x)\big|_{x = x_0}$$

Introduce a vector, $\mathbf{Y}$, and form the vector valued function $\mathbf{F}(x, \mathbf{Y})$ such that:

(1) T here e x ists a knovm initial value Y = Y 0 satisfying F (x 0 ,Y0) = z 0.

(3 ) e x is t0 ? nofrv rm afm r\ Q • {^, V.- ^ V,_ t

s u c h th a t F(xt+1,Yjt+i) is re la te d in a known, invertible m a n n e r to

F o r exam ple

(a) F(xfc+1,Y i+1)= F(rt ,Yjfc) V i.e.. F is invariant.

(b) F(xJk+1,Yfc+1)= cfcF(xfc,Yjt) w ith know n ck i* 0 V jfc>0

(3) T here ex ists a k n own d e stin a tio n x u r e a c h e d u n d e r G, su c h t h a t

x - » x u im plies Y -» Y u = F (x u,Ya) = g ( z „ ) with g ( ) a know n

fu n ctio n having a single valued in v erse. F o r exam ple

(a) F ( x*+1,Y*+1)= F(xfc,Yjfc) V f c 2 i0 t h e n g ( z o) = z o.

(b) F(xi+ 1.Yfc+1)= cfcF(xfc,Yjfc), t h e n g ( z 0) = ck
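To see where the factor in (3b) comes from, apply (2b) once per step and telescope back to the initial value of condition (1):

$$\mathbf{F}(x_\omega,\mathbf{Y}_\omega) = c_{\omega-1}\,\mathbf{F}(x_{\omega-1},\mathbf{Y}_{\omega-1}) = c_{\omega-1}c_{\omega-2}\,\mathbf{F}(x_{\omega-2},\mathbf{Y}_{\omega-2}) = \cdots = \left(\prod_{k=0}^{\omega-1} c_k\right)\mathbf{F}(x_0,\mathbf{Y}_0) = \left(\prod_{k=0}^{\omega-1} c_k\right)\mathbf{z}_0$$

Because every $c_k \neq 0$, this $g$ is inverted simply by dividing $\mathbf{Y}_\omega$ by the known product.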

Although the invariance restriction on $\mathbf{F}$ has been relaxed in a very general manner, simple variations of $\mathbf{F}$, like the examples above, will prove quite useful in practice. Simple variations will not impose undue hardship on the implementation with respect to the inversion of $g$. In particular, it will now be shown that variation (b) in items (2) and (3) allows an extremely simple connection to CORDICs, without the need for complex valued angles.

Vector rotation is described by the function:

$$\mathbf{f}(x) = \mathbf{A}(x)\,\boldsymbol{\theta}$$

where $\boldsymbol{\theta}$ is a known vector, corresponding to the vector being rotated in the CORDIC algorithms, and $\mathbf{A}$ is a generalized rotation matrix that takes the form:

$$\mathbf{A}(x) = \begin{bmatrix} \operatorname{cs} x & -m\,\operatorname{si} x \\ \operatorname{si} x & \operatorname{cs} x \end{bmatrix}$$

in which:

$m$ is the parameter of the coordinate system, defined in Section 4.1

$x$ is the generalized angle of the CORDIC algorithm

$\operatorname{cs} x$, $\operatorname{si} x$ are the cosine and sine of $x$ in the generalized coordinate system.

Define:

$$\mathbf{F}(x,\mathbf{Y}) = \mathbf{A}(x)\,\mathbf{Y}$$

so choosing $\mathbf{Y}_0 = \boldsymbol{\theta}$ yields $\mathbf{z}_0 = \mathbf{F}(x_0,\mathbf{Y}_0)$, thus satisfying condition (1).

Next, choose the transformation $G: \{x_k,\mathbf{Y}_k\} \to \{x_{k+1},\mathbf{Y}_{k+1}\}$ as

$$x_{k+1} = x_k - \alpha_k$$
$$\mathbf{Y}_{k+1} = \begin{bmatrix} 1 & -m\delta_k \\ \delta_k & 1 \end{bmatrix}\mathbf{Y}_k$$

with $\alpha_k = \operatorname{tn}^{-1}\delta_k$, where $\operatorname{tn}^{-1}$ is a generalized inverse tangent (defined, for example, in Equation 4.2 for the CORDIC algorithms) and $\{\delta_k\}$ is a set of arbitrary constants, analogous to the $a_k$ in Chen's method.

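It is now easy to verify that

$$\mathbf{F}(x_{k+1},\mathbf{Y}_{k+1}) = c_k\,\mathbf{F}(x_k,\mathbf{Y}_k) \qquad \text{with} \qquad c_k = \frac{1}{\operatorname{cs}\alpha_k}$$

since

$$\begin{aligned}
\mathbf{F}(x_{k+1},\mathbf{Y}_{k+1}) &= \mathbf{A}(x_{k+1})\,\mathbf{Y}_{k+1} = \mathbf{A}(x_k - \alpha_k)\,\mathbf{Y}_{k+1} \\
&= \mathbf{A}(x_k - \alpha_k)\begin{bmatrix} 1 & -m\delta_k \\ \delta_k & 1 \end{bmatrix}\mathbf{Y}_k \\
&= \frac{1}{\operatorname{cs}\alpha_k}\,\mathbf{A}(x_k - \alpha_k)\begin{bmatrix} \operatorname{cs}\alpha_k & -m\,\operatorname{si}\alpha_k \\ \operatorname{si}\alpha_k & \operatorname{cs}\alpha_k \end{bmatrix}\mathbf{Y}_k \\
&= \frac{1}{\operatorname{cs}\alpha_k}\,\mathbf{A}(x_k)\,\mathbf{Y}_k \\
&= \frac{1}{\operatorname{cs}\alpha_k}\,\mathbf{F}(x_k,\mathbf{Y}_k)
\end{aligned}$$

where the third line uses $\delta_k = \operatorname{tn}\alpha_k = \operatorname{si}\alpha_k/\operatorname{cs}\alpha_k$, and the fourth follows by linearity of the argument of $\mathbf{A}(x)$, i.e. $\mathbf{A}(a)\,\mathbf{A}(b) = \mathbf{A}(a+b)$. This satisfies condition (2b).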
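The constant $c_k$ can also be written directly in terms of $\delta_k$. Assuming, as holds for $m = 1, 0, -1$, that $\operatorname{cs}^2 x + m\,\operatorname{si}^2 x = 1$ and that $\operatorname{cs}\alpha_k > 0$, dividing this identity by $\operatorname{cs}^2\alpha_k$ and using $\operatorname{tn}\alpha_k = \delta_k$ gives

$$\frac{1}{\operatorname{cs}^2\alpha_k} = 1 + m\,\delta_k^2 \qquad\Longrightarrow\qquad c_k = \frac{1}{\operatorname{cs}\alpha_k} = \sqrt{1 + m\,\delta_k^2}$$

so that $\prod_k c_k$ is the usual CORDIC scale factor: $\prod_k \sqrt{1+\delta_k^2}$ in circular coordinates, $1$ in linear coordinates and $\prod_k \sqrt{1-\delta_k^2}$ in hyperbolic coordinates.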

Finally, condition (3b) is satisfied by choosing $x_\omega = 0$, so that:

$$\mathbf{F}(x_\omega,\mathbf{Y}_\omega) = \mathbf{I}\,\mathbf{Y}_\omega = \mathbf{Y}_\omega = \left(\prod_{k=0}^{\omega-1} c_k\right)\mathbf{z}_0$$

where $\mathbf{I}$ is the $2 \times 2$ identity matrix. (This final equation assumes that $\operatorname{cs} x = 1$ and $\operatorname{si} x = 0$ when $x = 0$, in all the coordinate systems under consideration. This is certainly the case for the CORDIC algorithms.) The latter part of the equality is, of course, the consequence of iterating the $\mathbf{Y}$ transformations.

Remark:

Notice that the transformation, $G$, is exactly the CORDIC equations and, in fact, all the CORDIC functions are computed through the appropriate choice of the coordinate system, that is, the choice of $\alpha_k$, $\delta_k$ and hence $\operatorname{cs} x$ and $\operatorname{si} x$. The functions corresponding to vectoring in the CORDIC algorithm are also readily obtained by interchanging the significance of $x$ and $\mathbf{Y}$ in conditions (1) - (3) and forcing $\mathbf{Y}$ to a destination value on the abscissa. In this connection, the true generality of this method is apparent, since no distinction is made between $x$ and $\mathbf{Y}$. Rather than starting with a function, $f(x)$, of a single variable, simply begin with a function, $F(x,\mathbf{Y})$, of two independent variables, either of which may be controlled. (This is precisely what the CORDIC equations do, the two independent variables being the vector, $\mathbf{X}$, or the angle.)
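The Python sketch below runs the transformation $G$ numerically in rotation mode: it drives $x$ to the destination $x_\omega = 0$ and accumulates $\prod_k c_k$ with $c_k = 1/\operatorname{cs}\alpha_k = \sqrt{1 + m\delta_k^2}$. Several choices are assumptions made for illustration rather than taken from the text: $\delta_k = \pm 2^{-e}$ with the sign of $x_k$; shift exponents starting at 0 for $m = 1, 0$; and, for $m = -1$, exponents starting at 1 with the conventional repetitions at $e = 4, 13, 40, \ldots$. The names `generalized_ccm_rotate`, `tn_inv` and `shift_indices` are hypothetical.

```python
import math


def tn_inv(m, d):
    """Generalized inverse tangent: atan (m = 1), identity (m = 0), atanh (m = -1)."""
    if m == 1:
        return math.atan(d)
    if m == 0:
        return d
    if m == -1:
        return math.atanh(d)
    raise ValueError("m must be 1, 0 or -1")


def shift_indices(m, n):
    """Exponents e for delta_k = +/- 2**-e (an assumed, conventional schedule)."""
    if m != -1:
        return list(range(n))
    out, e, repeat = [], 1, 4
    while len(out) < n:
        out.append(e)
        if e == repeat:            # hyperbolic mode repeats e = 4, 13, 40, ...
            out.append(e)
            repeat = 3 * repeat + 1
        e += 1
    return out[:n]


def generalized_ccm_rotate(m, x0, theta, n=40):
    """Apply G until x reaches x_omega = 0.

    Returns (Y_omega, prod of c_k, residual x). In exact arithmetic
    Y_omega = (prod c_k) * A(x0) theta, with A the generalized rotation matrix.
    """
    x = x0
    y1, y2 = theta
    gain = 1.0
    for e in shift_indices(m, n):
        delta = 2.0 ** -e if x >= 0.0 else -(2.0 ** -e)
        x -= tn_inv(m, delta)                           # x_{k+1} = x_k - alpha_k
        y1, y2 = y1 - m * delta * y2, delta * y1 + y2   # Y_{k+1} = [[1, -m d], [d, 1]] Y_k
        gain *= math.sqrt(1.0 + m * delta * delta)      # c_k = 1 / cs(alpha_k)
    return (y1, y2), gain, x
```

Dividing the returned $\mathbf{Y}_\omega$ by the returned product recovers $\mathbf{A}(x_0)\boldsymbol{\theta}$. For example, with $m = 1$, $x_0 = 0.5$ and $\boldsymbol{\theta} = (1, 0)$, the scaled result approximates $(\cos 0.5, \sin 0.5)$. With $m = 0$, the same loop performs a shift-and-add multiplication, since $\mathbf{A}(x_0)$ only adds $x_0\theta_1$ to the second component.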

Summarizing, the foregoing generalization provides a convergence method which:

(1) reduces to Chen's method in the scalar case, when $F(x,y)$ is invariant;

(2) provides additional useful functions, including the CORDIC functions, within a single computational framework.

To the author's knowledge, the generalized rotations and vectoring have never been computed with Chen's method before.

4.7.1 Examples of the Generalized Technique

The CORDIC equations have already been shown to arise from the generalized convergence computation method when the invariance restriction on $F$ is relaxed. Further examples will now be given in which $F$ is truly invariant. In the spirit of the comment ending the previous section, the function $z$ will be ignored; rather, a function $F(x,\mathbf{Y})$ will be defined from the outset, and either $x$ or $\mathbf{Y}$ may be driven to a destination value. Notice that the manner in which the method was defined in Section 4.7 does not restrict $F$ from being scalar or matrix valued, and in fact, both of these cases are considered in the following examples:

Example 4.5 The CORDIC algorithms: these were developed in detail in the previous section.

Example 4.6 Frequently, it is necessary to divide a quantity, $x$, by the product of two other quantities, $\xi$ and $\eta$. Rather than first forming the product $\xi\eta$ and then using Chen's method to obtain $x/(\xi\eta)$, it is preferable to utilize the generalized CCM to obtain the result from $x$, $\xi$, $\eta$ directly.

Define

$$\mathbf{Y} = (\xi, \eta) \qquad (4.39)$$

and

$$F(x_k,\mathbf{Y}_k) = \frac{x_k}{\xi_k\eta_k} \qquad (4.40)$$

(Notice that $\xi\eta$ can be written as $(\boldsymbol{\pi}_1\mathbf{Y})(\boldsymbol{\pi}_2\mathbf{Y})$, where $\boldsymbol{\pi}_1$ and $\boldsymbol{\pi}_2$ are first and second component extraction matrices, respectively. This more cumbersome notation is ignored here.)

The result $F_0(x_0,\mathbf{Y}_0) = \dfrac{x_0}{\xi_0\eta_0}$ is desired.

Choose the transformations:

$$x_{k+1} = x_k a_k b_k \qquad (4.41a)$$
$$\xi_{k+1} = \xi_k a_k \qquad (4.41b)$$
$$\eta_{k+1} = \eta_k b_k \qquad (4.41c)$$

under which $F$ is clearly invariant.

Now, with destination value $\mathbf{Y}_\omega = (1,1)$, it is clear from (4.41b) and (4.41c) that

$$\prod_{k=0}^{\omega-1} a_k = \frac{1}{\xi_0}, \qquad \prod_{k=0}^{\omega-1} b_k = \frac{1}{\eta_0}$$

so

$$x_\omega = x_0\prod_{k=0}^{\omega-1} a_k \prod_{k=0}^{\omega-1} b_k = \frac{x_0}{\xi_0\eta_0}$$

is the desired result.

The sequences $\{a_k\}$ and $\{b_k\}$ are both chosen to have the form $1 + 2^{-m}$, so that the transformations may be implemented with only shifts and adds.
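A minimal Python sketch of Example 4.6, assuming $0 < \xi_0, \eta_0 \le 1$ so that factors $1 + 2^{-m} > 1$ can drive both quantities up to the destination $(1,1)$. The greedy rule used here (apply $1 + 2^{-m}$, possibly repeatedly, whenever it does not push the quantity above 1) is an assumption made for illustration, and $a_k$ and $b_k$ are applied as separate sub-steps, which amounts to taking $b_k = 1$ or $a_k = 1$ in (4.41a). `math.ldexp(v, -m)` stands in for a right shift by $m$ bits; the function name is hypothetical.

```python
import math


def ccm_divide_by_product(x0, xi0, eta0, n_bits=40):
    """Approximate x0 / (xi0 * eta0) without forming xi0 * eta0 (Example 4.6).

    Assumes 0 < xi0 <= 1, 0 < eta0 <= 1 and 1 <= n_bits <= 50, so that every
    shift 2**-m is still visible in float64 arithmetic.
    """
    if not (0.0 < xi0 <= 1.0 and 0.0 < eta0 <= 1.0):
        raise ValueError("this sketch assumes xi0 and eta0 in (0, 1]")
    if not 1 <= n_bits <= 50:
        raise ValueError("n_bits must be between 1 and 50")
    x, xi, eta = x0, xi0, eta0
    for m in range(1, n_bits + 1):
        # a_k = 1 + 2**-m, applied to xi (4.41b) and x (4.41a) while xi stays <= 1
        while xi + math.ldexp(xi, -m) <= 1.0:
            xi += math.ldexp(xi, -m)
            x += math.ldexp(x, -m)
        # b_k = 1 + 2**-m, applied to eta (4.41c) and x (4.41a) while eta stays <= 1
        while eta + math.ldexp(eta, -m) <= 1.0:
            eta += math.ldexp(eta, -m)
            x += math.ldexp(x, -m)
    # F = x / (xi * eta) never changed, and (xi, eta) is now within ~2**-n_bits of (1, 1)
    return x
```

For instance, `ccm_divide_by_product(0.7, 0.6, 0.8)` approximates $0.7/0.48$. In hardware, each `while` test corresponds to a per-shift decision, and each update is one shift and one add.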

Example 4.7 The generalized CCM can be used to compute $x_0 + \ln \xi_0\eta_0$ and $x_0 + \ln(\xi_0/\eta_0)$ without explicitly computing $\xi_0\eta_0$ or $\xi_0/\eta_0$ (or, alternatively, without writing $\ln \xi_0\eta_0 = \ln\xi_0 + \ln\eta_0$ and then applying Chen's method twice to obtain two separate logarithms). Consider evaluating

$$x_0 + \ln \xi_0\eta_0$$

by defining the function

$$F(x_k,\mathbf{Y}_k) = x_k + \ln \xi_k\eta_k$$

with $\mathbf{Y}$ as given by (4.39). Then the transformations

$$\xi_{k+1} = \xi_k a_k \qquad (4.42a)$$
$$\eta_{k+1} = \eta_k b_k \qquad (4.42b)$$
$$x_{k+1} = x_k - \ln a_k - \ln b_k \qquad (4.42c)$$

leave $F$ invariant.

Again choosing $\mathbf{Y}_\omega = (1,1)$ yields $\prod_{k=0}^{\omega-1} a_k = 1/\xi_0$ and $\prod_{k=0}^{\omega-1} b_k = 1/\eta_0$, so

$$x_\omega = x_0 - \sum_{k=0}^{\omega-1}\left(\ln a_k + \ln b_k\right) = x_0 + \ln \xi_0\eta_0$$

As with the CCM, $a_k$ and $b_k$ are chosen to be of the form $1 + 2^{-m}$, and a small table of $\ln(1 + 2^{-m})$ is maintained.
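A minimal Python sketch of Example 4.7, assuming $0 < \xi_0, \eta_0 \le 1$ and a greedy selection of factors $1 + 2^{-m}$ (repetition allowed) that never pushes $\xi_k$ or $\eta_k$ above 1. The update (4.42c) subtracts entries of a small table of $\ln(1 + 2^{-m})$, built here with `math.log1p`; `LN_TABLE` and `ccm_add_log_product` are hypothetical names.

```python
import math

# Small table of ln(1 + 2**-m) for m = 0..50 (index 0 is not used below)
LN_TABLE = [math.log1p(math.ldexp(1.0, -m)) for m in range(51)]


def ccm_add_log_product(x0, xi0, eta0, n_bits=40):
    """Approximate x0 + ln(xi0 * eta0) without forming xi0 * eta0 (Example 4.7).

    Assumes 0 < xi0 <= 1, 0 < eta0 <= 1 and 1 <= n_bits <= 50.
    """
    if not (0.0 < xi0 <= 1.0 and 0.0 < eta0 <= 1.0):
        raise ValueError("this sketch assumes xi0 and eta0 in (0, 1]")
    if not 1 <= n_bits <= 50:
        raise ValueError("n_bits must be between 1 and 50")
    x, xi, eta = x0, xi0, eta0
    for m in range(1, n_bits + 1):
        while xi + math.ldexp(xi, -m) <= 1.0:
            xi += math.ldexp(xi, -m)      # xi_{k+1} = xi_k a_k       (4.42a)
            x -= LN_TABLE[m]              # x_{k+1} = x_k - ln a_k    (4.42c)
        while eta + math.ldexp(eta, -m) <= 1.0:
            eta += math.ldexp(eta, -m)    # eta_{k+1} = eta_k b_k     (4.42b)
            x -= LN_TABLE[m]              # x_{k+1} = x_k - ln b_k    (4.42c)
    # F = x + ln(xi * eta) is invariant, and ln(xi * eta) is now close to 0
    return x
```

A single pass thus yields the logarithm of the product, with one table lookup per applied factor.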

Example 4.8: Define

$$F(x_k,\mathbf{Y}_k) = x_k + \ln(\xi_k/\eta_k)$$

The $F$-invariant transformations are

$$\xi_{k+1} = \xi_k a_k$$
$$x_{k+1} = x_k - \ln a_k$$
$$\eta_{k+1} = \eta_k = \eta_0$$

Choosing the destination $\mathbf{Y}_\omega = (\eta_0, \eta_0)$ yields

$$\prod_{k=0}^{\omega-1} a_k = \eta_0/\xi_0$$

and

$$x_\omega = x_0 + \ln(\xi_0/\eta_0)$$

Notice that choosing $\mathbf{Y}_\omega = (\eta_0, \eta_0)$ rather than $\mathbf{Y}_\omega = (1,1)$, as in Example 4.7, allows for simpler transformation rules. However, the penalty incurred for this simplification is that $\{a_k\}$ must be chosen such that $\eta_0$ is a reachable value of $\xi$, which implies less freedom in the choice of that sequence.
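A Python sketch of Example 4.8 under the assumption $0 < \xi_0 \le \eta_0$, so that $\eta_0$ can be reached from $\xi_0$ with factors $a_k = 1 + 2^{-m} > 1$. Allowing each shift to repeat, as done here, is one assumed way of guaranteeing that $\eta_0$ is reachable. Only $\xi$ and $x$ are updated, since $\eta_k = \eta_0$ throughout; the function name is hypothetical.

```python
import math


def ccm_add_log_ratio(x0, xi0, eta0, n_bits=40):
    """Approximate x0 + ln(xi0 / eta0) with destination Y_omega = (eta0, eta0) (Example 4.8).

    Assumes 0 < xi0 <= eta0 (so eta0 is reachable) and 1 <= n_bits <= 50.
    """
    if not 0.0 < xi0 <= eta0:
        raise ValueError("this sketch assumes 0 < xi0 <= eta0")
    if not 1 <= n_bits <= 50:
        raise ValueError("n_bits must be between 1 and 50")
    x, xi = x0, xi0                                # eta_k = eta0 for every k
    for m in range(1, n_bits + 1):
        ln_a = math.log1p(math.ldexp(1.0, -m))     # table entry ln(1 + 2**-m)
        while xi + math.ldexp(xi, -m) <= eta0:
            xi += math.ldexp(xi, -m)               # xi_{k+1} = xi_k a_k
            x -= ln_a                              # x_{k+1} = x_k - ln a_k
    # F = x + ln(xi / eta0) is invariant, and xi is now within ~2**-n_bits of eta0
    return x
```

Compared with the Example 4.7 sketch, only one quantity is normalized, which reflects the simpler transformation rules noted in the text.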

Example 4.9: Matrix Inversion

Obtaining the inverse of nonsingular matrices fits the generalized CCM structure very well. In this case, all of $x$, $\mathbf{Y}$ and $F$ are matrix valued and will be denoted $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{F}$ respectively. Consider the general function

$$\mathbf{F}(\mathbf{X}_k,\mathbf{Y}_k) = \mathbf{Y}_k^{-1}\mathbf{X}_k$$

Choose a set of known, elementary eliminator matrices, $\{\mathbf{C}_k\}$, for the transformations:

$$\mathbf{X}_{k+1} = \mathbf{C}_k\mathbf{X}_k \qquad (4.43a)$$
$$\mathbf{Y}_{k+1} = \mathbf{C}_k\mathbf{Y}_k \qquad (4.43b)$$

It is readily seen that $\mathbf{F}$ is invariant under (4.43) because

$$\mathbf{F}(\mathbf{X}_{k+1},\mathbf{Y}_{k+1}) = \mathbf{Y}_{k+1}^{-1}\mathbf{X}_{k+1} = \mathbf{Y}_k^{-1}\mathbf{C}_k^{-1}\mathbf{C}_k\mathbf{X}_k = \mathbf{F}(\mathbf{X}_k,\mathbf{Y}_k)$$

Now choosing the destination $\mathbf{Y}_\omega = \mathbf{I}$, it is clear that

$$\prod_{k=0}^{\omega-1}{}^{*}\,\mathbf{C}_k = \mathbf{Y}_0^{-1}$$

where $\prod^{*}$ denotes left matrix product. Then

$$\mathbf{X}_\omega = \prod_{k=0}^{\omega-1}{}^{*}\,\mathbf{C}_k\,\mathbf{X}_0 = \mathbf{Y}_0^{-1}\mathbf{X}_0$$

If $\mathbf{X}_0 = \mathbf{I}$, then $\mathbf{X}_\omega = \mathbf{Y}_0^{-1}$.

Remark: This may be viewed as the matrix counterpart of convergence division.

The choice of $\{\mathbf{C}_k\}$ is not a trivial matter, since the matrices must be of simple form in order to make this scheme practical. A simple $2 \times 2$ matrix example provides some guidelines for an obvious choice of $\mathbf{C}_k$ (but probably not the best).

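Let $\mathbf{Y}_0 = \begin{bmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{bmatrix}$ and let $\mathbf{C}_k = \mathbf{I} + \tilde{\mathbf{C}}_k$ with $\tilde{\mathbf{C}}_k = \begin{bmatrix} 0 & 0 \\ 0 & 2^{-m} \end{bmatrix}$.

Apply a sequence of such $\mathbf{C}_k$ transformations, say $n$ in number, to obtain

$$\mathbf{Y}_n = \begin{bmatrix} y_{11} & y_{12} \\ y'_{21} & y'_{22} \end{bmatrix}, \qquad y'_{21} = y_{11}$$

Next apply $\mathbf{C}_n = \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}$ to obtain

$$\mathbf{Y}_{n+1} = \begin{bmatrix} y_{11} & y_{12} \\ 0 & y'_{22} - y_{12} \end{bmatrix}$$

Now choose a sequence $\mathbf{C}_k = \mathbf{I} + \tilde{\mathbf{C}}_k$, $k = n+1, \ldots$, which will yield

$$\mathbf{Y} = \begin{bmatrix} y_{11} & y_{12} \\ 0 & y''_{22} \end{bmatrix}, \qquad y''_{22} = y_{12}$$

and apply

$$\mathbf{C} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}$$

to diagonalize $\mathbf{Y}$, giving $\mathbf{Y}_\omega = \operatorname{diag}(y_{11}, y_{12})$.

All of these transformations are easily implemented with shifts and additions alone, and result in $\mathbf{X}_\omega$, which is closely related to $\mathbf{Y}_0^{-1}$ by known constants: with $\mathbf{X}_0 = \mathbf{I}$, $\mathbf{X}_\omega = \mathbf{Y}_\omega\mathbf{Y}_0^{-1} = \operatorname{diag}(y_{11}, y_{12})\,\mathbf{Y}_0^{-1}$.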
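The $2 \times 2$ scheme above is one shift-and-add choice of eliminators. As a point of comparison, the Python sketch below swaps in ordinary Gauss-Jordan eliminators (row exchanges, row scalings and row subtractions, which are not restricted to shifts and adds). Each elementary matrix $\mathbf{C}_k$ is applied to both $\mathbf{X}$ and $\mathbf{Y}$, so $\mathbf{F} = \mathbf{Y}^{-1}\mathbf{X}$ stays invariant while $\mathbf{Y}$ is driven to $\mathbf{I}$. The function name is hypothetical, and the matrix is assumed to be nonsingular.

```python
def ccm_matrix_inverse(Y0):
    """Drive Y to I with elementary eliminators C_k, applying each C_k to X as well.

    Since F = Y^{-1} X is invariant and X_0 = I, the final X equals Y_0^{-1}.
    Assumes Y0 is square and nonsingular.
    """
    n = len(Y0)
    Y = [[float(v) for v in row] for row in Y0]
    X = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        # Permutation eliminator: bring the largest pivot candidate to row j
        p = max(range(j, n), key=lambda i: abs(Y[i][j]))
        if Y[p][j] == 0.0:
            raise ValueError("matrix is singular")
        Y[j], Y[p] = Y[p], Y[j]
        X[j], X[p] = X[p], X[j]
        # Diagonal scaling eliminator: make the pivot equal to 1
        s = 1.0 / Y[j][j]
        Y[j] = [s * v for v in Y[j]]
        X[j] = [s * v for v in X[j]]
        # Row-subtraction eliminators: clear column j in every other row
        for i in range(n):
            if i != j and Y[i][j] != 0.0:
                f = Y[i][j]
                Y[i] = [a - f * b for a, b in zip(Y[i], Y[j])]
                X[i] = [a - f * b for a, b in zip(X[i], X[j])]
    return X
```

For example, `ccm_matrix_inverse([[4.0, 7.0], [2.0, 6.0]])` approximates `[[0.6, -0.7], [-0.2, 0.4]]`. The row exchanges are exactly the kind of operation that is awkward in array hardware, which is one reason the text looks for simpler eliminators.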

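Example 4.10: Solution of a Linear System of Equations

Consider the system $\mathbf{A}_0\mathbf{x} = \mathbf{b}_0$ to be solved for $\mathbf{x}$. Define

$$\mathbf{F}(\mathbf{A}_k,\mathbf{b}_k) = \mathbf{A}_k\mathbf{x} - \mathbf{b}_k = \mathbf{0}$$

and the transformations

$$\mathbf{A}_{k+1} = \mathbf{C}_k\mathbf{A}_k \qquad (4.44a)$$
$$\mathbf{b}_{k+1} = \mathbf{C}_k\mathbf{b}_k \qquad (4.44b)$$

with $\mathbf{C}_k$ as defined in the previous example. Then $\mathbf{F}$ is invariant since:

$$\mathbf{F}(\mathbf{A}_{k+1},\mathbf{b}_{k+1}) = \mathbf{C}_k(\mathbf{A}_k\mathbf{x} - \mathbf{b}_k) = \mathbf{0} = \mathbf{F}(\mathbf{A}_k,\mathbf{b}_k)$$

Next choose the destination $\mathbf{A}_\omega = \mathbf{I}$, yielding (from 4.44a)

$$\prod_{k=0}^{\omega-1}{}^{*}\,\mathbf{C}_k = \mathbf{A}_0^{-1}$$

Then (4.44b) becomes

$$\mathbf{b}_\omega = \prod_{k=0}^{\omega-1}{}^{*}\,\mathbf{C}_k\,\mathbf{b}_0 = \mathbf{A}_0^{-1}\mathbf{b}_0 = \mathbf{x}$$

Therefore $\mathbf{b}_\omega$ converges to the solution vector $\mathbf{x}$.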

Remark: It is also possible to have $\mathbf{A}$, $\mathbf{b}$ converge to a value from which $\mathbf{x}$ is readily attainable. A simple example of this is when $\mathbf{x}$ can be obtained by back substitution. In this case, the destination value of $\mathbf{A}$ is $\mathbf{A}_\omega = \mathbf{U}$, where $\mathbf{U}$ is any upper triangular form. Thus from (4.44a)

$$\mathbf{U} = \prod_{k=0}^{\omega-1}{}^{*}\,\mathbf{C}_k\,\mathbf{A}_0$$

and (4.44b) becomes

$$\mathbf{b}_\omega = \mathbf{U}\mathbf{A}_0^{-1}\mathbf{b}_0 = \mathbf{L}\mathbf{b}_0$$

where $\mathbf{L}$ is a lower triangular form (when the $\mathbf{C}_k$ are lower triangular eliminators). By virtue of the invariance of $\mathbf{F}$,

$$\mathbf{F}_\omega = \mathbf{0} = \mathbf{U}\mathbf{x} - \mathbf{L}\mathbf{b}_0 = \mathbf{U}\mathbf{x} - \mathbf{b}_\omega$$

which can be solved by back substitution. In this case, the chosen sequence of eliminator matrices provides a lower-upper triangular decomposition of $\mathbf{A}$. An orthogonal decomposition may be realized by choosing orthogonal eliminator matrices, in particular the CORDIC matrices.
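A Python sketch of the Remark in Example 4.10, using Gaussian eliminators $\mathbf{C}_k = \mathbf{I} - f\,\mathbf{e}_i\mathbf{e}_j^T$ (an assumed, standard choice) applied to $\mathbf{A}$ and $\mathbf{b}$ together until $\mathbf{A}$ reaches upper triangular form, followed by back substitution. No pivoting is performed, so the sketch assumes that no zero pivot arises; the function name is hypothetical.

```python
def ccm_solve_via_upper_triangular(A0, b0):
    """Apply eliminators C_k to A and b together until A = U, then back-substitute.

    Assumes A0 is square and that no zero pivot arises (no pivoting is done).
    """
    n = len(A0)
    A = [[float(v) for v in row] for row in A0]
    b = [float(v) for v in b0]
    for j in range(n - 1):
        if A[j][j] == 0.0:
            raise ValueError("zero pivot; this sketch does not pivot")
        for i in range(j + 1, n):
            f = A[i][j] / A[j][j]          # C_k = I - f e_i e_j^T
            A[i] = [a - f * p for a, p in zip(A[i], A[j])]
            b[i] -= f * b[j]               # same C_k applied to b (4.44b)
    # F = A x - b = 0 is invariant, so U x = b_omega; solve by back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        if A[i][i] == 0.0:
            raise ValueError("matrix is singular")
        s = sum(A[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x
```

Replacing these eliminators by plane rotations would give the orthogonal decomposition mentioned at the end of the Remark.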

There are numerous other functions of multiple arguments which can be computed using the generalized CCM structure. The key to the practical utility of the method is, of course, the existence of convenient transformations which allow $F$ to vary in a known, invertible manner. However, the theoretical importance of the generalized CCM lies in providing a unified structure under which the computation of many seemingly unrelated functions can be studied.

CHAPTER SUMMARY AND CONCLUSIONS

This chapter has been somewhat lengthy and merits a summary of the important developments. The motivation for studying numerical algorithms arose from the need to calculate a variety of elementary functions appearing in the signal processing algorithms of the previous chapters, for which the CORDIC and Convergence Computation Methods (CCM) appeared to be promising techniques. It was noted that the convergence properties of the CORDIC algorithms were not adequate, owing to limited regions of convergence and the existence of spurious scale factors. Circumventing these problems with ideas appearing in the literature [Wa71] [HT80] incurred large amounts of hardware and speed overhead, in fact as large as a 120% speed penalty. However, it was shown that, through hardware sharing, the scaling technique of Haviland et al. [HT80] could be realized in hardware with minimal speed penalty and only a modest increase in circuitry.

The most attractive technique for both scaling and increasing the region of convergence was based on a new method which utilized only the regular CORDIC iterations. It was shown that the convergence properties could be controlled through the appropriate selection of the sequence $\{F_i\}$, and a method was given to generate good sequences which were guaranteed to satisfy the convergence criterion of the CORDIC algorithms. The effects of this new method on angular resolution were also studied.

CORDIC algorithms tend to be relatively slow due to their iterative nature; in fact, they generate one bit of equivalent precision per iteration. A new scheme known as the Hybrid CORDIC method was developed which combined the advantages of table lookup and the CORDIC algorithms. This scheme showed improved speed performance with only modest amounts of storage. When compared with a completely table lookup approach, the hybrid CORDIC method realizes an exponential reduction in storage with only a linear increase in execution time. A Taylor series approximation approach to hybrid CORDICs provided equivalent speed performance without any additional storage.

Floating point CORDIC (FLORDIC) algorithms were developed, based entirely on floating point operations. These were conceptually simpler to implement than their fixed point counterparts because the need for a large shifter was eliminated. It was also shown that floating point calculations could be performed with fixed point CORDIC algorithms. Floating point representations provide guarantees on the convergence properties of the algorithms owing to the limited dynamic range of the mantissa.

The Convergence Computation Method was shown to be intimately related to the CORDIC algorithms. By generalizing the CCM to vector valued functionals and relaxing the invariance constraint, a very simple derivation of the CORDIC equations was obtained. Hence, a unified set of equations (and therefore, also a single computation structure) provided many functions of interest. New functions, including matrix operations, can be computed with the generalized CCM. Finally, the CCM was also seen to be conducive to a hybrid architecture.


APPENDIX

It is necessary to prove that the auxiliary equation of the CORDIC algorithm, when defined as:

$$z_{i+1} = z_i - \varepsilon\,\mu_i\,\alpha_i$$

provides the functions of Figure 4.2 when $\varepsilon = 1$, while $\varepsilon = -1$ results in the reversed sign functions of Figure 4.3.

When $\varepsilon = 1$, the normal auxiliary equation as defined in [Vo59] is obtained, so the functions of Figure 4.2 are generated. Now, let the initial value of $z$ be denoted $z_0$. When $\varepsilon = -1$, $z_n \to 0$ implies:

$$\alpha = \sum_{i=0}^{n-1} \mu_i\,\alpha_i = -z_0 \qquad \text{(A.1)}$$

Walther [Wa71] showed that the solution to the CORDIC difference equations (i.e. Equation 4.3) is:

$$x_n = K\left[x_0\cos(\sqrt{m}\,\alpha) - y_0\sqrt{m}\,\sin(\sqrt{m}\,\alpha)\right]$$
$$y_n = K\left[y_0\cos(\sqrt{m}\,\alpha) + \frac{x_0}{\sqrt{m}}\,\sin(\sqrt{m}\,\alpha)\right]$$

where $x_0$ and $y_0$ are the initial values of $x$ and $y$ respectively. Substituting (A.1) yields:

$$x_n = K\left[x_0\cos(-\sqrt{m}\,z_0) - y_0\sqrt{m}\,\sin(-\sqrt{m}\,z_0)\right] = K\left[x_0\cos(\sqrt{m}\,z_0) + y_0\sqrt{m}\,\sin(\sqrt{m}\,z_0)\right]$$
$$y_n = K\left[y_0\cos(-\sqrt{m}\,z_0) + \frac{x_0}{\sqrt{m}}\,\sin(-\sqrt{m}\,z_0)\right] = K\left[y_0\cos(\sqrt{m}\,z_0) - \frac{x_0}{\sqrt{m}}\,\sin(\sqrt{m}\,z_0)\right]$$

Substituting in the various values of $m$ yields the functions of Figure 4.3.
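A numerical check of the appendix result for circular coordinates ($m = 1$), written in Python. It assumes that the $x$, $y$ updates use the same direction $\mu_i$ as the auxiliary equation, with $\delta_i = \mu_i 2^{-i}$ and $\alpha_i = \tan^{-1} 2^{-i}$, and that $\mu_i$ is chosen at each step so that $|z|$ decreases; the function name is hypothetical. With $\varepsilon = -1$, the scaled outputs should match $x_0\cos z_0 + y_0\sin z_0$ and $y_0\cos z_0 - x_0\sin z_0$.

```python
import math


def cordic_rotation_eps(eps, x0, y0, z0, n=40):
    """Circular (m = 1) CORDIC rotation with z_{i+1} = z_i - eps * mu_i * alpha_i.

    mu_i is chosen each step so that |z| decreases; returns (x_n, y_n, K).
    """
    x, y, z, K = x0, y0, z0, 1.0
    for i in range(n):
        d = 2.0 ** -i
        mu = eps * (1.0 if z >= 0.0 else -1.0)   # so that eps * mu = sign(z)
        x, y = x - mu * d * y, y + mu * d * x     # delta_i = mu_i * 2**-i
        z -= eps * mu * math.atan(d)              # auxiliary equation
        K *= math.sqrt(1.0 + d * d)               # scale factor K
    return x, y, K
```

Since $z_n \to 0$ forces $\sum_i \mu_i\alpha_i = \varepsilon z_0$, the vector is rotated by $+z_0$ when $\varepsilon = 1$ and by $-z_0$ when $\varepsilon = -1$, which is the sign reversal described above.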


BIBLIOGRAPHY

[Ah82] H. Ahmed, "Numerical Techniques for the ESL Boundary Cell," internal report, ESL Inc., San Jose, CA, 1982.

[AMI79] American Microsystems Inc., Signal Processing Peripheral Reference Manual, 1979.

[AMLA81] H.M. Ahmed, M. Morf, D.T.L. Lee and P.H. Ang, "A VLSI Speech Analysis Chip Set Based on Square-Root Normalized Ladder Forms," Proc. 1981 ICASSP, Atlanta, GA, Mar.-Apr. 1981, pp. 648-653.

[Bell81] Bell System Technical Journal, Vol. 60, No. 7, Part 2, September 1981 (entire issue).

[CET62] D. Cantor, G. Estrin, R. Turn, "Logarithmic and Exponential Function Evaluation in a Variable Structure Digital Computer," IRE Trans. on Electronic Computers, Vol. EC-14, 1965, pp. 85-86.

[Ch71] T.C. Chen, "Automatic Computation of Exponentials, Logarithms, Ratios and Square Roots," IBM Journal of Research and Development, July 1972, pp. 380-388.

[DeL70] B. de Lugish, "A Class of Algorithms for Automatic Evaluation of Certain Elementary Functions in a Binary Computer," Technical Report No. 399, University of Illinois, Dept. of Computer Science, June 1970.

[Fa81] M. Farmwald, "On the Design of High Performance Digital Arithmetic Units," Ph.D. Dissertation, Stanford University, Dept. of Computer Science, 1981.

[Go64] R. Goldschmidt, "Applications of Division by Convergence," M.S. Dissertation, Massachusetts Institute of Technology, Dept. of Electrical Engineering, June 1964.

[HT80] G. Haviland, A. Tuszynski, "A CORDIC Arithmetic Processor Chip," IEEE Trans. on Computers, Vol. C-29, No. 2, February 1980.

[Hw79] K. Hwang, Computer Arithmetic: Principles, Architecture and Design, J. Wiley, 1979.

[KNSYM80] Y. Kawakami, T. Nishitani, E. Sugimoto, E. Yamauchi, M. Suzuki, "A Single-Chip Digital Signal Processor for Voiceband Applications," Proc. of Int'l. Solid State Circuits Conference, San Francisco, CA, 1980.

[Me62] J. Meggitt, "Pseudo Division and Pseudo Multiplication Processes," IBM Journal of Research and Development, Vol. 6, 1962, pp. 210-226.

[SK71] B. Sarkar, E. Krishnamurty, "Economic Pseudo-division Processes for Obtaining Square Roots, Logarithm and Arctan," IEEE Transactions on Computers, Vol. C-20, Dec. 1971, pp. 1589-1593.

[Sp65] W. Specker, "A Class of Algorithms for $\ln x$, $\exp x$, $\sin x$, $\cos x$, $\tan^{-1} x$ and $\cot^{-1} x$," IRE Transactions on Electronic Computers, Vol. EC-14, 1965, pp. 85-86.

[Vo59] J.E. Volder, "The CORDIC Trigonometric Computing Technique," IRE Trans. on Electronic Computers, Vol. EC-8, No. 3, pp. 330-334, Sept. 1959.

[Wa71] J.S. Walther, "A Unified Algorithm for Elementary Functions," Proc. of the 1971 Spring Joint Computer Conference, pp. 379-385.


CHAPTER FIVE

PARALLEL PROCESSORS FOR LINEAR ALGEBRA

Linear algebra operations form a major component of many different complex signal processing tasks. However, matrix algebra algorithms are themselves frequently quite complex, requiring, for instance, a number of operations which is polynomial in the matrix order. As a result, some authors have been prompted to construct large arrays or meshes of processing elements (see e.g. [SK75] [Ch75] [KL80]). However, all of these efforts have been based on the use of fast multipliers as the central element of each processor in the mesh. Once again, generalized rotations, in particular the CORDIC operations, are fundamental to a large number of algorithms which are commonly used to perform matrix operations like factorization and eigenvalue decomposition. The prime interest of this chapter will be the synthesis of large arrays of processors which are capable of exploiting the inherent parallelism of a number of matrix algebra algorithms of interest. Fortuitously, a few simple structures will be sufficiently general purpose to implement all the algorithms considered. Individual processors of the array will be based on CORDIC operations. By matching the operations and the mesh connections closely to the algorithms, more efficient structures will be realized than have been reported in the literature to date.

VLSI structures must be regular in nature in order to manage the large design complexity afforded by the technology. This is particularly true of interconnections, which not only consume the majority of chip area but also reduce operational speed due to their capacitive loading. Therefore, all array architectures to be derived here will be restricted from the outset to have regular structures with local communication paths, where processing elements in the array may only communicate with nearest neighbours. This results naturally in pipelined implementations.

This chapter will consider the solution of a system of linear equations, Cholesky factorization and eigenvalue decomposition [SB80], which are the most common matrix algebra problems arising in signal processing and statistics, among other areas. A formal method for constructing and analyzing large arrays, so as to guarantee some general applicability of the structure, will be given.

5.1 CHOLESKY FACTORIZATION

A major problem of interest in linear least-squares estimation is to obtain the Cholesky factors of an $n \times n$ Toeplitz matrix, $\mathbf{T}$, which is frequently the covariance matrix of a stationary stochastic process.

The Cholesky decomposition of $\mathbf{T}$ as $\mathbf{L}\mathbf{L}^T$, with $\mathbf{L}$ lower triangular, can be obtained in a computationally efficient manner using the Fast Cholesky algorithms developed by [Mo74], [MLNV77], [LeRG77]. In designing VLSI structures to realize these algorithms, it is important to utilize the theory in a manner which is conducive to implementation. In particular, recursive fast Cholesky algorithms exist for generating $\mathbf{L}$ either by columns or by rows. However, the former algorithm requires access to all $n$ entries (the entire first column) defining $\mathbf{T}$, while the latter uses the elements sequentially, which, in a real time application, could correspond to using the covariance information in sequence as it 'arrives'. Hence, the row recursions are more suitable for implementation from a data access viewpoint. Alternately, the recursions by columns induce a ladder form structure [LeRG77], [DM80], thus making this algorithm attractive because ladder forms exhibit natural pipelining. No such ladder recursion for the rows appears to exist.

In this section, a ladder form for the recursion by rows is derived which naturally suggests a pipelined architecture based on elementary rotations, allowing the use of a linear array of CORDIC processors. This is an example of using the theory in an appropriate framework for implementation, since now both a simple data access scheme and a pipeline architecture are defined by the same set of recursive equations.

5.1.1 Fast Cholesky by Rows in Ladder Form

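It is easiest to derive the ladder form for the row recursions by examining the fast Cholesky algorithm by columns [Mo74]. Let:

$$\mathbf{T} = [\,\mathbf{t}_1 \mid \mathbf{t}_2 \mid \cdots \mid \mathbf{t}_n\,] \qquad (5.1)$$

$$\mathbf{Z} = \begin{bmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} = \text{column shift matrix} \qquad (5.2)$$

$$\mathbf{e} = [\,1 \;\; 0 \;\; \cdots \;\; 0\,]^T \qquad (5.2b)$$

$t_{kl}$ = $l$th element of column $\mathbf{t}_k$

Then the column recursions in normalized form are:

Initialization:

$$[\,\mathbf{c}_1 \mid \mathbf{c}_2\,]^0 = [\,\mathbf{t}_1 \mid \mathbf{t}_1 - t_{11}\mathbf{e}\,]/\sqrt{t_{11}} \qquad (5.3)$$

Recursion:

$$[\,\mathbf{c}_1 \mid \mathbf{c}_2\,]^{k+1} = [\,\mathbf{Z}\mathbf{c}_1^k \mid \mathbf{c}_2^k\,]\,\boldsymbol{\Theta}_k \qquad (5.4)$$

where

$$\boldsymbol{\Theta}_k = \begin{bmatrix} \cosh\vartheta_k & \sinh\vartheta_k \\ \sinh\vartheta_k & \cosh\vartheta_k \end{bmatrix}$$

is $J$-orthogonal (with $J = \operatorname{diag}(1,-1)$) with

$$\vartheta_k = \tanh^{-1}\left(-\frac{c^k_{2,k+2}}{c^k_{1,k+1}}\right)$$

where $c^k_{i,l}$ denotes the $l$th element of $\mathbf{c}_i^k$. Then the desired Cholesky factor, $\mathbf{L}$, is given in column partitioned form by:

$$\mathbf{L} = [\,\mathbf{c}_1^0 \mid \mathbf{c}_1^1 \mid \cdots \mid \mathbf{c}_1^{n-1}\,]$$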
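A Python sketch of the column recursions (5.3)-(5.4), assuming $\mathbf{T}$ is symmetric positive definite so that $|\tanh\vartheta_k| < 1$ at every step. Indices are 0-based in the code, so $c^k_{2,k+2}$ and $c^k_{1,k+1}$ become `c2[k + 1]` and `c1[k]`. Floating point `cosh`/`sinh` factors stand in for a CORDIC hyperbolic rotation, and the function name is hypothetical.

```python
import math


def fast_cholesky_by_columns(t):
    """Cholesky factor L of the symmetric Toeplitz matrix whose first column is t.

    Implements (5.3)-(5.4) and returns L as a list of rows.
    Assumes T is positive definite, so |tanh(theta_k)| < 1.
    """
    n = len(t)
    s = math.sqrt(t[0])
    c1 = [v / s for v in t]                   # t_1 / sqrt(t_11)
    c2 = [0.0] + [v / s for v in t[1:]]       # (t_1 - t_11 e) / sqrt(t_11)
    cols = [c1]
    for k in range(n - 1):
        zc1 = [0.0] + c1[:-1]                 # Z c_1^k: shift down one place
        rho = -c2[k + 1] / zc1[k + 1]         # tanh(theta_k); zc1[k + 1] == c1[k]
        ch = 1.0 / math.sqrt(1.0 - rho * rho)
        sh = rho * ch
        # [c1 | c2]^{k+1} = [Z c1^k | c2^k] Theta_k, applied row by row
        c1 = [ch * a + sh * b for a, b in zip(zc1, c2)]
        c2 = [sh * a + ch * b for a, b in zip(zc1, c2)]
        c2[k + 1] = 0.0                       # the annihilated element (zero in exact arithmetic)
        cols.append(c1)
    return [[cols[j][i] for j in range(n)] for i in range(n)]
```

Only the first column of $\mathbf{T}$ enters, and every step is a single $2 \times 2$ hyperbolic rotation applied to all rows of $[\,\mathbf{Z}\mathbf{c}_1^k \mid \mathbf{c}_2^k\,]$, which is the operation a CORDIC processor would perform.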

Notice that the recursion only requires knowledge of the first column of the Toeplitz matrix, $\mathbf{T}$. It can be shown that (5.4) is the Chandrasekhar equation for a moving average process, i.e. where $\mathbf{T}$ is a banded matrix [Mo74], [MKD74]. In this algorithm, each column of $\mathbf{L}$ is obtained by performing a $J$-orthogonal transformation (or a CORDIC hyperbolic rotation) on the row vectors of $[\,\mathbf{Z}\mathbf{c}_1^k \mid \mathbf{c}_2^k\,]$ to annihilate $c^k_{2,k+2}$. However, this recursion on the columns of $\mathbf{L}$ naturally induces a normalized recursion for its rows, as shown in Figure 5.1. The arrows indicate which rotation angle is applied in going from one column to another or from one row to the next. Unlike the column algorithm, which uses a single rotation matrix for each recursion, the latter algorithm requires several distinct rotations to compute a single row. This fact, together with the shift property (indicated by the arrows in Figure 5.1), suggests a ladder form realization using $J$-orthogonal sections, which may be written as:

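Initialization:

$$v_{k,\ell} = 0, \qquad k > \ell$$

$$v_{0,\ell} = \eta_{0,\ell} = t_{1,\ell+1}/\sqrt{t_{11}}, \qquad \ell = 0, 1, 2, \ldots, n-1$$

Recursion:

for $k = 0$ to $\ell - 1$ begin:

$$[\,v_{k+1,\ell} \;\; \eta_{k+1,\ell}\,] = [\,v_{k,\ell-1} \;\; \eta_{k,\ell}\,]\,\boldsymbol{\Theta}_k$$

end;

with $\boldsymbol{\Theta}_k$ as defined earlier and

$$\vartheta_k = \tanh^{-1}\left(-\frac{\eta_{k,k+1}}{v_{k,k}}\right)$$

Then for $\mathbf{L} = [\,\mathbf{l}_0 \mid \mathbf{l}_1 \mid \cdots \mid \mathbf{l}_{n-1}\,]^T$, the rows are defined by:

$$\mathbf{l}_\ell^T = [\,v_{0,\ell} \;\; v_{1,\ell} \;\; \cdots \;\; v_{n-1,\ell}\,]$$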

In this algorithm, the recursion on $k$ is the order update of the ladder, while iterating on $\ell$ corresponds to the time update. The temporal sequence here is the sequential access of the covariance information (the entries of $\mathbf{t}_1$), which may well become available as a time series in a real application.
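The ladder recursion can also be sketched in Python, processing one row (that is, one new covariance entry $t_{1,\ell+1}$) at a time. Section $k$ keeps two pieces of state, mirroring the description of the pipelined array: the value $\tanh\vartheta_k$, computed once when $\ell = k+1$, and the one-row delay $v_{k,\ell-1}$. This is a sequential simulation rather than a model of the array timing; it assumes $\mathbf{T}$ is positive definite, and the function name is hypothetical.

```python
import math


def fast_cholesky_by_rows(t):
    """Rows of the Toeplitz Cholesky factor via the J-orthogonal ladder.

    Row l uses only t[0..l]. Section k keeps tanh(theta_k), computed once
    when l == k + 1, and the one-row delay v_{k,l-1}. Assumes T is positive definite.
    """
    n = len(t)
    s = math.sqrt(t[0])
    rho = [0.0] * n              # rho[k] = tanh(theta_k), stored in section k
    prev_v = [0.0] * n           # prev_v[k] = v_{k,l-1}, the delay in section k
    rows = []
    for l in range(n):
        v = [0.0] * n            # v[k] = v_{k,l}; stays 0 for k > l
        v[0] = t[l] / s          # v_{0,l} = t_{1,l+1} / sqrt(t_11)
        eta = t[l] / s           # eta_{0,l}, the same value as v_{0,l}
        for k in range(l):       # order update through sections 0 .. l-1
            a, b = prev_v[k], eta
            if k == l - 1:       # section k sees its first nonzero data
                rho[k] = -b / a
            ch = 1.0 / math.sqrt(1.0 - rho[k] ** 2)
            sh = rho[k] * ch
            v[k + 1] = ch * a + sh * b    # [v_{k+1,l}  eta_{k+1,l}] =
            eta = sh * a + ch * b         #     [v_{k,l-1}  eta_{k,l}] Theta_k
        prev_v = v
        rows.append(v)
    return rows
```

Each row is produced as soon as its covariance entry arrives, which is the element-by-element data access that motivates the row form.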

The ladder form consisting of $J$-orthogonal sections is shown in Figure 5.2, and it may be readily implemented using the linear pipelined array of Figure 5.3. Data enters the pipe only via the leftmost processor, and the $k$th processor produces zero outputs until $\ell = k+1$, when it calculates and stores $\vartheta_k$ and produces $\eta_{k+1,k+1}$ and $v_{k+1,k+1}$. Thereafter, all entering data is rotated through. Notice that $\vartheta_k$ is only calculated once.

To summarize, this section has shown how a novel architecture is obtained through an intimate connection between theory and implementation. Specifically, a new ladder form structure for the fast Cholesky algorithm by rows was derived, to exploit the natural pipelining of ladder structures and to obtain an element by element data access scheme. The natural operations defining the algorithm were $J$-rotations; hence, a pipelined linear array consisting of CORDIC processors was employed. It is important to note that this ladder structure turns out to be the same as for the Levinson algorithm in ladder form and the fast Cholesky by columns in ladder form [DM80] algorithm. In particular, the row and column recursions for the Cholesky factors become equivalent when pipelining is introduced. Consequently, the pipelined linear array is a unified VLSI realization of all of these algorithms, suggesting a good match of algorithms to architecture.

5.2 SOLUTION OF LINEAR SYSTEMS OF EQUATIONS

C onsider th e m a trix equation

Ax = b (5.6a)
w here,

A: is th e coefficient m a trix of dim en sio n n x n

x : is the- n-dim ensional v e c to r to b e d e t e r m in ed

b: is a known v ecto r

Popular methods of solving for x invariably factor A such that at least one of its factors is of simple form, e.g. upper triangular. When such a factorization is done with elementary row and column operations applied to both sides of (5.6a), the reduced system

$$Ux = c, \quad U \text{ upper triangular} \qquad (5.6b)$$

may be solved by back substitution. Factorization of A is the most computationally intensive operation and will be of prime concern here. The popular Gaussian elimination method employs row operations to factor A = LU, where L is lower triangular. However, numerical stability of LR procedures (as LU factorization procedures are called) demands the use of pivoting (except when A is positive definite), a procedure requiring the physical exchange of rows of a matrix. This is an extremely cumbersome task in array processor architectures [Ku79], [KuS80]. In contrast, QR factorizations based, for instance, on Givens' method are stable without pivoting and are therefore good candidates for array implementation (this was noted, among others, by H.T. Kung and C. Leiserson [KL80]). In this approach, the factorization A = QU, where Q is orthogonal, is obtained through a sequence of orthogonal transformations applied to A.
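Back substitution on the reduced system (5.6b) is the inexpensive final step. Here is a minimal Python sketch, assuming U is a dense upper-triangular list of lists with a nonzero diagonal. The name `back_substitute` is illustrative and does not come from the source.

```python
def back_substitute(U, c):
    """Solve U x = c for upper-triangular U by back substitution."""
    n = len(U)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # Remove the contributions of the unknowns already computed (x[i+1:]).
        s = c[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))
        if U[i][i] == 0:
            raise ZeroDivisionError(f"U[{i}][{i}] is zero; the system is singular")
        x[i] = s / U[i][i]
    return x

U = [[2.0, 1.0, -1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
c = [1.0, 7.0, 8.0]
print(back_substitute(U, c))  # [1.0, 1.0, 2.0]
```

The loop runs from the last row upward, so each step needs only values that are already known. The cost is about n²/2 multiply-and-add operations, which is small next to the factorization.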

Before embarking on an implementation of Givens' method, it is of interest to note that QR factorization also constitutes a significant step in the eigenvector decomposition of a matrix, e.g. [SB80]. Important applications of eigenvector decomposition include beamforming, system identification, spectral estimation, e.g. [Sc81], and communications [VT68].

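In Givens' method, an n × n matrix A is operated on by an orthogonal matrix $Q_{ri}$ such that the (i, r) element of A is annihilated:

$$Q_{ri} = \begin{bmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & \cos\theta_{ir} & \cdots & -\sin\theta_{ir} & & \\
& & \vdots & \ddots & \vdots & & \\
& & \sin\theta_{ir} & \cdots & \cos\theta_{ir} & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{bmatrix} \qquad (5.7)$$

The trigonometric entries lie in the r-th and i-th rows and columns. The matrix is the identity elsewhere.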

Then A is replaced by $Q_{ri}A$ and the process repeats. Finally:

$$U = \prod_{r=1}^{n-1} \prod_{i=r+1}^{n} Q_{ri}\, A = Q^{T} A \qquad (5.8)$$

where $\prod$ denotes a left matrix product.
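Equations (5.7) and (5.8) can be checked numerically with the sketch below. It makes several assumptions:

- the helper names are hypothetical;
- indices are 0-based instead of the text's 1-based ones;
- dense matrices are used for clarity rather than efficiency.

The sketch builds each $Q_{ri}$ explicitly and chooses $\theta_{ir}$ with `atan2` so that the (i, r) element vanishes. It accumulates the left product as $Q^T$.

```python
import math

def identity(n):
    return [[1.0 if p == q else 0.0 for q in range(n)] for p in range(n)]

def matmul(X, Y):
    """Dense product X Y of list-of-lists matrices."""
    return [[sum(X[p][k] * Y[k][q] for k in range(len(Y))) for q in range(len(Y[0]))]
            for p in range(len(X))]

def givens_matrix(n, r, i, theta):
    """Q_ri of (5.7), 0-based: the identity except in rows/columns r and i."""
    Q = identity(n)
    c, s = math.cos(theta), math.sin(theta)
    Q[r][r], Q[r][i] = c, -s
    Q[i][r], Q[i][i] = s, c
    return Q

def triangularize_by_products(A):
    """Form U = (prod Q_ri) A as in (5.8), accumulating the left product Qt = Q^T."""
    n = len(A)
    U = [row[:] for row in A]
    Qt = identity(n)
    for r in range(n - 1):
        for i in range(r + 1, n):
            # theta_ir chosen so that Q_ri annihilates the (i, r) element of U
            theta = -math.atan2(U[i][r], U[r][r])
            Q = givens_matrix(n, r, i, theta)
            U = matmul(Q, U)    # A <- Q_ri A
            Qt = matmul(Q, Qt)  # each new factor multiplies on the left
    return U, Qt

A = [[4.0, 1.0, 2.0],
     [2.0, 3.0, 1.0],
     [1.0, 2.0, 5.0]]
U, Qt = triangularize_by_products(A)
```

Forming each $Q_{ri}$ explicitly costs $O(n^3)$ per rotation and only mirrors the equations. The algorithm summarized next in the text touches only the two affected rows.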

Notice that multiplication by $Q_{ri}$ in effect rotates the column vectors of

$$\begin{bmatrix} a_{r1} & a_{r2} & \cdots & a_{rn} \\ a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix}$$

through $\theta_{ir}$, which is a very natural application of the CORDIC technique. The algorithm is summarized as follows:

    for r = 1 to n-1 begin:
        for i = r+1 to n begin:
            θ_ir = -tan⁻¹(a_ir / a_rr);   a_rr <- √(a_rr² + a_ir²);
            for j = r+1 to n begin:
                [a_rj; a_ij] <- [cos θ_ir  -sin θ_ir; sin θ_ir  cos θ_ir] [a_rj; a_ij];
            end;
            [b_r; b_i] <- [cos θ_ir  -sin θ_ir; sin θ_ir  cos θ_ir] [b_r; b_i];
        end;
    end;

The j-loop applies the rotation that zeroes out $a_{ir}$ to the remaining elements of the r-th and i-th rows. The i-loop repeats this operation to zero out all the elements below $a_{rr}$.
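A direct Python transcription of the loop nest is sketched below, under these assumptions:

- it is sequential and does not model the array;
- it uses 0-based indices;
- it replaces $-\tan^{-1}(a_{ir}/a_{rr})$ by `-math.atan2(a_ir, a_rr)`, so that the update $a_{rr} \leftarrow \sqrt{a_{rr}^2 + a_{ir}^2}$ stays correct when $a_{rr} \le 0$;
- the function name `givens_solve` is hypothetical.

Back substitution then yields x.

```python
import math

def givens_solve(A, b):
    """Solve A x = b (A square, nonsingular) by Givens' method followed by back
    substitution. Mirrors the r, i, j loop nest; indices are 0-based."""
    n = len(A)
    a = [row[:] for row in A]          # work on copies; a ends up holding U
    b = b[:]                           # ... and b ends up holding c of (5.6b)
    for r in range(n - 1):
        for i in range(r + 1, n):
            # theta_ir; atan2 copes with a_rr <= 0 so the a_rr update stays valid
            theta = -math.atan2(a[i][r], a[r][r])
            c, s = math.cos(theta), math.sin(theta)
            a[r][r], a[i][r] = math.hypot(a[r][r], a[i][r]), 0.0
            for j in range(r + 1, n):  # rotate the remaining pairs (a_rj, a_ij)
                a[r][j], a[i][j] = c * a[r][j] - s * a[i][j], s * a[r][j] + c * a[i][j]
            b[r], b[i] = c * b[r] - s * b[i], s * b[r] + c * b[i]
    x = [0.0] * n                      # back substitution on U x = c
    for k in range(n - 1, -1, -1):
        x[k] = (b[k] - sum(a[k][j] * x[j] for j in range(k + 1, n))) / a[k][k]
    return x
```

For example, `givens_solve([[0.0, 1.0], [1.0, 1.0]], [1.0, 2.0])` returns approximately `[1.0, 1.0]`, even though the leading element is zero. In that case Gaussian elimination without pivoting would divide by zero.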

On standard general-purpose machines, Givens' method is more computationally intensive than Gaussian elimination. Each of its steps is a plane rotation, whereas each step of Gaussian elimination calls for only one multiplication and one addition. However, CORDIC algorithms perform a plane rotation in no more time than a bit-serial multiplication, which is the proposed embodiment of multipliers in [KuS80] for large arrays. Hence, when elementary operations are counted (an elementary operation being a multiply-and-add or any other CORDIC operation), Givens' method has the same complexity as Gaussian elimination without pivoting. On multiprocessor machines, where pivoting is rather cumbersome, Givens' algorithm becomes an attractive alternative.
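The floating-point sketch below shows the two standard CORDIC modes, to illustrate why a plane rotation costs about as much as a shift-and-add multiplication:

- vectoring mode would deliver $\theta_{ir}$ and the new $a_{rr}$ in one pass;
- rotation mode applies a given angle to a pair such as $(a_{rj}, a_{ij})$.

The iteration count `N_ITER` and the function names are assumptions. Real hardware works in fixed point, where each `2.0 ** -k` factor is a wired shift. The vectoring routine assumes x > 0, and no half-plane correction is shown.

```python
import math

N_ITER = 32                                       # assumed number of micro-rotations
ANGLES = [math.atan(2.0 ** -k) for k in range(N_ITER)]
K = 1.0                                           # inverse of the accumulated CORDIC gain
for k in range(N_ITER):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * k))

def cordic_vector(x, y):
    """Vectoring mode: rotate (x, y), x > 0, onto the positive x axis.
    Returns (sqrt(x^2 + y^2), angle applied), the angle being -atan(y / x)."""
    z = 0.0
    for k in range(N_ITER):
        d = -1.0 if y > 0 else 1.0                # turn toward y = 0
        x, y = x - d * y * 2.0 ** -k, y + d * x * 2.0 ** -k
        z += d * ANGLES[k]
    return x * K, z

def cordic_rotate(x, y, theta):
    """Rotation mode: rotate (x, y) by theta (|theta| up to about 1.74 rad)."""
    z = theta
    for k in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0               # drive the residual angle to 0
        x, y = x - d * y * 2.0 ** -k, y + d * x * 2.0 ** -k
        z -= d * ANGLES[k]
    return x * K, y * K
```

For instance, `cordic_vector(3.0, 4.0)` returns approximately `(5.0, -0.9273)`. Feeding that angle to `cordic_rotate` rotates any other pair by the same $\theta$. Each micro-rotation uses only additions and scalings by powers of two. The constant gain is removed once, at the end.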

5.2.1 Architectures for Givens' Algorithm

Consider the implementation of Givens' algorithm on a linear array of n CORDIC processors, where n is the order of A. The processors may be of the kind described in chapter six. Given the row indices r and i, the first task is to compute the angle $\theta_{ir}$ and the new value of $a_{rr}$; this is done in one CORDIC operation. Next, the vectors $[a_{rj}, a_{ij}]^T$, j = r + 1, ..., n, must be rotated through the angle $\theta_{ir}$. To perform this with several processors in parallel, $\theta_{ir}$ would have to be transmitted simultaneously to all these processors. However, this involves global communications and is not acceptable for a VLSI implementation. Thus, in the designs to be presented, the rotations are pipelined on arrays where each processor can communicate only with an immediate neighbor in one cycle. Notice that the local communications restriction imposed at the outset leads naturally to a pipelined structure.

A fully pipelined implementation of Givens' method on a linear array is shown in Figure 5.4 for n = 5. The values $a_{ij}$ and $b_i$ appearing at the outputs of the processors correspond to the entries of U and c of Equation 5.6b, respectively. The need for the leftward and rightward data paths, as well as for the first-in, first-out (FIFO) stacks, will become apparent shortly. The movement of data in this array is very natural and is summarized as follows.

The value of $\theta_{ir}$ is the major parameter in updating the r-th and i-th rows and zeroing $a_{ir}$. It is always computed in the leftmost processor and propagates to the right as the new elements of these rows are computed in sequence. However, for a given r, the task of setting $a_{ir}$ to zero must be done for n − r rows. This requires that the r-th subrow $[a_{rr}\ a_{r,r+1} \cdots a_{rn}]$ be resident in the leftmost n − r + 1 processors for each of the n − r values of i. Thus, the new value of $a_{rj}$ (for each j = r, r + 1, ..., n) is computed in the (j − r + 1)-th processor. For example, at t = 5, $a_{22}$ is updated in processor 1 and is required in the same processor at t = 6 for computing $\theta_{42}$. Finally, the new elements of the i-th row propagate left. This movement reflects the fact that as r increases, the subrow to be operated on becomes shorter, the leading elements having already been zeroed. Thus the elements must move left, since the value of $\theta_{ir}$, which is based on $a_{rr}$, is always computed in the leftmost processor.

Figure 5.1: Recursions Induced on the Rows of the Cholesky Factors

Figure 5.2: Fast Cholesky by Rows in Ladder Form

Figure 5.3: Pipelined Array of Processors (a data manager feeds Proc 1, Proc 2, ..., Proc n, with data paths to the left and to the right)

Figure 5.4: Fully Pipelined Givens Method on a Linear Array (n = 5; the entries $a_{ij}$ enter the array in skewed order)

Consider now the evaluation of the right-hand side of the reduced system, i.e., computing c. From the algorithm given earlier, the elements of b must be rotated in pairs in exactly the same fashion as the rows of A. This is readily done in the rightmost processor of the array, as shown in Figure 5.4. When the angle $\theta_{ir}$ arrives at this processor, it is used to rotate the appropriate subvector of b. Finally, back substitution is performed to obtain the result, x. The reader is referred to [Ku80] and [De82] for back substitution methods on array processors.

A few comments regarding the linear array structure are in order. First, note that a memory management system is required to provide data to the array in the order required. In this particular case, the memory manager is quite simple, being just a bank of FIFOs (first-in, first-out stacks). This is readily seen from Figure 5.5, which shows the data inputs required by each processor during the operation sequence for n = 5. The dashed arrows represent the leftward data movement in the array. If each processor is fed by a FIFO, the y-input at each is an element of the FIFO. The x-input is a pseudo-input, because its value is updated and recirculated in the same processor. The new elements of the i-th row (for i = r + 1, r + 2, ..., n) are propagated left and then into the associated FIFO, thereby setting up the data input for the next value of r. At each r, the buffer length is n − r. The FIFO address logic must therefore account for the shrinking size of the buffer, a task that is relatively easy in a bit-serial nMOS realization. Notice finally that during start-up there are some blank entries in the FIFOs of processors 2, 3, ..., n. These do not, however, correspond to empty or wasted memory locations, because a data transfer does not occur until a processor has been activated by the rightward movement of $\theta_{ir}$. For example, the first data transfer to processor 3 from its FIFO does not occur until t = 3.

Figure 5.5: Array Input Sequence for Givens' Algorithm (n = 5; x is the recirculated input and y the FIFO input of each processor; r is the stage being processed in processor 1)

     t  r |  Proc 1  |  Proc 2  |  Proc 3  |  Proc 4  |  Proc 5  |  Proc 6
          |  x    y  |  x    y  |  x    y  |  x    y  |  x    y  |  x    y
     1  1 | a11  a21 |          |          |          |          |
     2  1 | a11  a31 | a12  a22 |          |          |          |
     3  1 | a11  a41 | a12  a32 | a13  a23 |          |          |
     4  1 | a11  a51 | a12  a42 | a13  a33 | a14  a24 |          |
     5  2 | a22  a32 | a12  a52 | a13  a43 | a14  a34 | a15  a25 |
     6  2 | a22  a42 | a23  a33 | a13  a53 | a14  a44 | a15  a35 | b1   b2
     7  2 | a22  a52 | a23  a43 | a24  a34 | a14  a54 | a15  a45 | b1   b3
     8  3 | a33  a43 | a23  a53 | a24  a44 | a25  a35 | a15  a55 | b1   b4
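One consistent reading of the timing in Figure 5.5 can be written down as a schedule. The sketch below is an assumption inferred from that figure, not a formula stated in the text. It assumes that:

- the update of $a_{rj}$ for the pair (r, i) takes place in processor j − r + 1;
- processor 1 begins stage r at $s_r = 1 + \sum_{k<r}(n-k)$;
- $\theta_{ir}$ advances one processor per time step.

The b column (processor n + 1) is left out. The assertion checks that no processor is asked to do two things at once.

```python
def linear_array_schedule(n):
    """Map each update (r, i, j) of Givens' method (1-based, r < i <= n, r <= j <= n)
    to (time step, processor) on the linear array. Assumed reading of Figure 5.5:
    column j of stage r lives in processor j - r + 1, processor 1 begins stage r at
    s_r = 1 + sum_{k<r} (n - k), and theta_ir moves one processor right per step.
    The processor that rotates b is omitted."""
    sched = {}
    start = 1                                       # s_1
    for r in range(1, n):
        for i in range(r + 1, n + 1):
            for j in range(r, n + 1):
                t = start + (i - r - 1) + (j - r)   # pipeline delay of j - r steps
                sched[(r, i, j)] = (t, j - r + 1)
        start += n - r                              # stage r keeps processor 1 busy n - r steps
    return sched

sched = linear_array_schedule(5)
# No (time, processor) slot is used twice.
assert len(set(sched.values())) == len(sched)
print(sched[(2, 3, 2)])   # (5, 1): a22 updated while computing theta_32
print(sched[(2, 4, 2)])   # (6, 1): a22 reused in processor 1 to compute theta_42
```

For n = 5 this reproduces the example in the text: $a_{22}$ is updated in processor 1 at t = 5 and reused there at t = 6. Stage r = 3 starts at t = 8, as in Figure 5.5. Each stage occupies a processor for n − r consecutive steps, which is also the FIFO length at that stage.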

5.3 COMPLEXITY DISTRIBUTION AND ACTIVITY CHARTS

The previous sections have given examples of reducing the temporal complexity of an algorithm through the addition of computing resources, i.e., through an increase in spatial complexity. Whereas uniprocessors execute algorithms in time, the linear array enjoys one spatial dimension as well. It is quite natural to think of enhancing the execution speed of Givens' algorithm through additional spatial dimensions, for example by using a two-dimensional mesh of processors. While this can be achieved by inspection, the goal of this section is to demonstrate, by example, that the linear array may be employed systematically to construct higher dimensional networks. An attempt at formalizing the procedure even further will be made in a later section.

5.3.1 Activity Charts

Givens' algorithm consists of three nested loops and could therefore intuitively make efficient use of at least three dimensions, namely one temporal and two spatial dimensions. The linear array has exploited only a single spatial dimension and a doubly indexed time dimension. It is instructive to show the activity of the processors of the linear array as a function of time. This diagram, shown in Figure 5.6, will be termed the activity chart. The six individual processors are drawn horizontally across the page, and their operational evolution at each time step is indicated vertically. This chart is a useful tool for synthesizing a two-dimensional array.
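A text version of such a chart can be generated mechanically. The sketch below assumes a reading of the input sequence of Figure 5.5, not the source's own construction:

- processor j − r + 1 handles column j of stage r;
- processor 1 starts stage r at $1 + \sum_{k<r}(n-k)$.

Each output row is a time step and each column one of the processors 1 to n. A cell names the row pair (r, i) being rotated, and `..` marks an idle processor. The b processor is omitted, so n = 5 gives five columns rather than six.

```python
def activity_chart(n):
    """Text activity chart of the linear array for an n x n matrix: one line per time
    step, one column per processor 1..n; a cell 'ri' names the row pair (r, i) being
    rotated there and '..' marks an idle processor. Assumed timing (read from
    Figure 5.5): column j of stage r is handled by processor j - r + 1, and processor 1
    starts stage r at 1 + sum_{k<r} (n - k). The b processor is omitted; labels assume n <= 9."""
    busy = {}
    start = 1
    for r in range(1, n):
        for i in range(r + 1, n + 1):
            for j in range(r, n + 1):
                t = start + (i - r - 1) + (j - r)
                busy[(t, j - r + 1)] = f"{r}{i}"
        start += n - r
    last = max(t for t, _ in busy)
    lines = []
    for t in range(1, last + 1):
        cells = [busy.get((t, p), "..") for p in range(1, n + 1)]
        lines.append(f"t={t:2d}  " + " ".join(cells))
    return "\n".join(lines)

print(activity_chart(5))
```

The busy cells form a staircase. It moves one processor to the right per step and loses one processor per stage, which is the pattern exploited when the chart is used to synthesize a two-dimensional array.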

5.3.2 A Two-Dimensional Array for Givens' Algorithm

The goal of this section is to reduce the doubly indexed time axis of the linear array (in effect "two" temporal dimensions) to a singly indexed one by mapping the complexity of one index onto another spatial dimension. This operation is quite simple and systematic given the activity chart of the linear array. It is necessary only to observe that:

1) The leftward movement of data in the linear array is in effect preparing the input matrix of the algorithm for the next value of r.

2) For increasing r, the size of the rows to be operated on decreases, meaning that the number of processors required at each successive value of r becomes continually smaller.

With these facts and the activity chart, it is quite simple to construct a 2-D array with well-defined operation by stacking a set of one-dimensional arrays. Consider that for a given r, the linear array is a structure with one time and one space dimension. Its output can be fed into exactly the same structure, executing exactly the same operations for the subsequent r (except that this structure is smaller). Simply stacking these linear structures and adding the interconnections yields the triangular array structure of Figure 5.7. An orthogonal decomposition is now performed in O(n) time steps, compared with $O(n^2)$ on the linear array.

Figure 5.6: Linear Array Activity Chart (processor inputs at each time step; inactive processors marked)

Remarks:

1) It is important to notice that the triangular array has been constructed using quite a general principle. No prior 2-D structure was assumed. This has resulted in a more processor-efficient solution than beginning with an assumed configuration such as the rectangular or hexagonal arrays. In fact, the present triangular array has the interconnection of a rectangular structure, with the redundant processors removed as a consequence of the synthesis procedure. (Interested readers may find the rectangular array structure in [KR81]. It can also be obtained from the activity chart by ignoring the shrinking dimension of the linear array with increasing r.)

2) Many authors, e.g. [Mu71], [KuS80], have noted that the "locus" of active processors in a multiprocessor array may be viewed as a series of computation wavefronts resembling plane waves. It is interesting to note that the linear array is more efficient than the triangular array in its processor utilization, since it corresponds to a cut through a higher dimensional array along a computation wavefront.

Figure 5.7: A Two-Dimensional Array for Givens' Algorithm

3) Only the triangularization of a square matrix A has been considered. The array structures extend naturally, however, to the problem of triangularizing an n × p matrix, e.g. for solving linear least-squares problems. A linear array of p processors could be operated exactly as in Figure 5.4, and the execution time would be p + n(n − 1)/2 units.
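The rectangular case of Remark 3 can be sketched sequentially. Rotations are applied for r = 1, ..., p only, and the rows below p of the rotated right-hand side hold the residual. This Python sketch assumes n ≥ p and full column rank. It does not model the array or its p + n(n − 1)/2 timing, and `givens_least_squares` is a hypothetical name.

```python
import math

def givens_least_squares(A, b):
    """Least-squares solution of A x ~ b for an n x p matrix A (n >= p, full column
    rank), by Givens rotations applied for r = 1..p only (0-based below)."""
    n, p = len(A), len(A[0])
    a = [row[:] for row in A]
    b = b[:]
    for r in range(p):
        for i in range(r + 1, n):
            theta = -math.atan2(a[i][r], a[r][r])   # annihilates a[i][r]
            c, s = math.cos(theta), math.sin(theta)
            for j in range(r, p):
                a[r][j], a[i][j] = c * a[r][j] - s * a[i][j], s * a[r][j] + c * a[i][j]
            b[r], b[i] = c * b[r] - s * b[i], s * b[r] + c * b[i]
    # Rows 0..p-1 of a now form U; b[p:] holds the residual components.
    x = [0.0] * p
    for k in range(p - 1, -1, -1):
        x[k] = (b[k] - sum(a[k][j] * x[j] for j in range(k + 1, p))) / a[k][k]
    return x
```

As an example, fit a line to the points (0, 1), (1, 2), (2, 2), (3, 4) by calling `givens_least_squares([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]], [1.0, 2.0, 2.0, 4.0])`. The result is approximately [0.9, 0.9], the same as the normal-equation solution, without ever forming $A^T A$.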

4) The present results could be generalized to still higher dimensional arrays, particularly for algorithms with many nested loops, since the basic principle involves unravelling the loops and mapping them onto spatial dimensions. Four-dimensional arrays (in which one dimension is time) are, for all practical purposes, the limit in this world, although abstractions to still higher dimensions may indeed prove useful. Algorithms with finite complexity enjoy the property that the dimensional "axes" onto which the complexity is mapped are finite. Hence, multiple indexing on these axes, i.e. their repeated use, is a means for creating higher dimensional arrays in a four-dimensional world. The simplest example of a machine that exploits this principle is the uniprocessor. The entire complexity of an algorithm is mapped onto the time axis, and certain sections of code are executed repeatedly in time rather than being performed in space, as is the case with mesh-connected systems.

5.3.3 Dual Arrays

The linear and triangular arrays presented here are particular examples of mapping algorithmic complexity onto many dimensions of computation; however, the mapping is not unique. In the present case of Givens' algorithm, the complexity lies in the three nested loops. In "unwinding" these loops, a decision was made to map certain loops in space, but this choice was not unique. In particular, another solution to the linear array readily arises by mapping the 'i' loop into space and performing the 'j' loop in time. The array operation is depicted in Figure 5.8 and the associated activity chart in Figure 5.9. Each processor is dedicated to a specific rotation and first calculates and stores an angle (angle in place). Future inputs are rotated by that angle. This array will be called the time/space dual, or simply the dual, to that of Figure 5.6. The temporal complexity is still $O(n^2)$ with O(n) processors, but the exact numbers are n + 2 − r time steps per r-loop with n − r processors. Previously there were n + 2 − r processors with n − r time steps, i.e. the two arrays are exact duals. A dual triangular array to that of Figure 5.7 is also readily generated using the dual activity chart; this array is shown in Figure 5.10.
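The dual mapping can be mimicked in software. In this sketch the class and function names are hypothetical, and timing is not modeled. Each `RotationCell` is dedicated to one rotation (r, i) and stores its angle from the first pair it sees. Row r then streams through the chain one column per step, so each stage uses n − r cells (1-based r) and n − r + 2 column steps, counting the b column.

```python
import math

class RotationCell:
    """One processor of the dual linear array, dedicated to a single rotation (r, i)."""
    def __init__(self):
        self.c = None   # cos(theta_ir), fixed by the first input ("angle in place")
        self.s = None   # sin(theta_ir)

    def step(self, top, own):
        """Combine an element of row r (top) with the matching element of row i (own)."""
        if self.c is None:
            rho = math.hypot(top, own)
            # theta_ir = -atan2(own, top): chosen so that 'own' is annihilated
            self.c, self.s = (1.0, 0.0) if rho == 0.0 else (top / rho, -own / rho)
        return (self.c * top - self.s * own,   # new element of row r, passed on
                self.s * top + self.c * own)   # new element of row i, kept in place

def dual_triangularize(A, b):
    """Triangularize [A | b]: for each r the 'i' loop is mapped onto a chain of cells
    (space) and the 'j' loop onto successive steps (time). Returns rows of [U | c]."""
    n = len(A)
    rows = [row[:] + [bi] for row, bi in zip(A, b)]   # b carried as column n
    out = []
    for r in range(n):                                 # 0-based stage index
        cells = [RotationCell() for _ in range(r + 1, n)]
        top_row = []
        for j in range(r, n + 1):                      # one column per time step
            top = rows[r][j]
            for k, cell in enumerate(cells):           # row r element flows cell to cell
                top, rows[r + 1 + k][j] = cell.step(top, rows[r + 1 + k][j])
            top_row.append(top)
        out.append([0.0] * r + top_row)
    return out

def solve_upper(Uc):
    """Back substitution on the rows of [U | c] produced above."""
    n = len(Uc)
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (Uc[k][n] - sum(Uc[k][j] * x[j] for j in range(k + 1, n))) / Uc[k][k]
    return x
```

Each cell fixes its angle from the first pair it receives, so row r meets rows r + 1, ..., n in the same order as in the sequential algorithm. The resulting U therefore agrees with the sequential one up to rounding.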

5.4 A FORMAL APPROACH TO COMPLEXITY MAPPING

Ad hoc techniques are generally applied to obtain an array architecture that computes a particular algorithm. Hence, while it is known that the chosen algorithm may be executed efficiently on the array, little can be said about the general applicability of the architecture. A step towards a systematic construction procedure was taken in the previous sections, where the activity chart of the linear array provided the means to construct a two-dimensional array. The notion of dual arrays also provided a means for examining alternate architectures. This section is concerned with formalizing the idea of dual arrays, as well as the use of activity charts of linear arrays for obtaining higher dimensional structures. Stating results that are general to any conceivable algorithm is a difficult task. However, statements can be made about a particular class of algorithms that satisfy some program model. (Recall that it was stated in chapter one that a goal of this dissertation was to examine a class of problems, namely signal processing algorithms.) Fortuitously, complex signal processing tasks often employ a variety of linear algebra algorithms that admit the specification of such a model.

Figure 5.8: Operation of the Dual Linear Array

Figure 5.9: Time-Space Dual Array Activity Chart

Figure 5.10: Dual Triangular Array

A convenient notation for describing data dependencies in programs will prove useful. Following Kuck [Ku77] and others:

Definition 5.1: A basic loop is one in which the loop body does not contain loops.

Definition 5.2: The set of input variables to a loop L, i.e., the independent variables used on the right-hand side of assignment statements in the loop body, is denoted $I(L)$. The set of output variables is denoted $\Omega(L)$.

Definition 5.3: A loop L with index set $\{I_1, I_2, \ldots, I_n\}$, at a particular iteration, is denoted $L(i_1, i_2, \ldots, i_n)$. This iteration occurs for $I_1 = i_1, I_2 = i_2, \ldots, I_n = i_n$.

Definition 5.4: For two loops $L_i$, $L_j$:

(i) $L_j$ is data dependent on $L_i$, denoted $L_i\,\delta\,L_j$, if for some $x \in I(L_j)$ the condition $x \in \Omega(L_i)$ also holds.

(ii) $L_j$ is data antidependent on $L_i$, denoted $L_i\,\bar{\delta}\,L_j$, if for some $x \in I(L_i)$ the condition $x \in \Omega(L_j)$ is satisfied.

(iii) $L_j$ is data output dependent on $L_i$, denoted $L_i\,\delta^{o}\,L_j$, if $x \in \Omega(L_j)$ is computed after that of $L_i$.


(iv) $L_j$ is indirectly data dependent on $L_i$, denoted $L_i\,\Delta\,L_j$, if there is a sequence of statements $S_1, \ldots, S_k$ such that $S_1\,\delta\,S_2 \cdots \delta\,S_k$, with $S_1 \in L_i$ and $S_k \in L_j$.

Definition 5.5: Data Dependencies:

For any loop L with index set $\{I_j\}_{j=1}^{d}$ and body statements $S_1, S_2, \ldots, S_k$, let $(i_1, i_2, \ldots, i_d), (k_1, k_2, \ldots, k_d) \in I$, where I is the index space. Then, for some $x \in \Omega[S_i(k_1, k_2, \ldots, k_d)]$, the following data dependencies hold:

(i) $S_i(k_1, \ldots, k_d) < S_j(i_1, \ldots, i_d)$ and $x \in I(S_j(i_1, \ldots, i_d)) \Rightarrow S_i\,\delta\,S_j$, where $S_i < S_j$ means $S_i$ is executed before $S_j$.

(ii) $x \in I(S_j(i_1, \ldots, i_d))$ and $S_j(i_1, \ldots, i_d) < S_i(k_1, \ldots, k_d) \Rightarrow S_j\,\bar{\delta}\,S_i$.

(iii) $x \in \Omega(S_j(i_1, \ldots, i_d))$ and $S_i(k_1, \ldots, k_d) < S_j(i_1, \ldots, i_d) \Rightarrow S_i\,\delta^{o}\,S_j$.

Definition 5.6: Data Dependence Graph:

A data dependence graph G has s nodes, one for each $S_i$, $1 \le i \le s$. For each dependence relation between $S_i$ and $S_j$, there is a corresponding labelled arc from the node representing $S_i$ to the $S_j$ node.
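Definitions 5.4 to 5.6 translate into a simple graph construction for straight-line code. The sketch below is an illustration that rests on several assumptions:

- each statement is given as (name, input set, output set), in execution order;
- every later read or write of a shared variable produces an arc, labelled flow ($\delta$), anti ($\bar{\delta}$) or output ($\delta^{o}$);
- intervening redefinitions are not tracked, so some arcs that a full analysis would prune may appear;
- the three statements modelling one inner step of Givens' method are hypothetical.

```python
from itertools import combinations

def dependence_graph(statements):
    """Build the labelled arcs of a data dependence graph (Definition 5.6).

    statements: list of (name, inputs, outputs) in execution order, where inputs and
    outputs are sets of variable names (I(S) and Omega(S)). For every ordered pair
    S_a < S_b and shared variable x this returns (S_a, S_b, kind, x) with kind:
      'flow'   S_a writes x, S_b reads it   (S_a delta S_b)
      'anti'   S_a reads x, S_b writes it   (S_a delta-bar S_b)
      'output' both write x                 (S_a delta-o S_b)
    Intervening redefinitions are not tracked, so this may over-report arcs."""
    arcs = []
    for (a, a_in, a_out), (b, b_in, b_out) in combinations(statements, 2):
        arcs += [(a, b, "flow", x) for x in sorted(a_out & b_in)]
        arcs += [(a, b, "anti", x) for x in sorted(a_in & b_out)]
        arcs += [(a, b, "output", x) for x in sorted(a_out & b_out)]
    return arcs

# Hypothetical straight-line model of one (r, i, j) step of Givens' method.
givens_step = [
    ("S1", {"a_rr", "a_ir"}, {"theta"}),                  # theta <- -atan(a_ir / a_rr)
    ("S2", {"a_rr", "a_ir"}, {"a_rr"}),                   # a_rr <- sqrt(a_rr^2 + a_ir^2)
    ("S3", {"theta", "a_rj", "a_ij"}, {"a_rj", "a_ij"}),  # rotate the pair (a_rj, a_ij)
]
for arc in dependence_graph(givens_step):
    print(arc)
# ('S1', 'S2', 'anti', 'a_rr')
# ('S1', 'S3', 'flow', 'theta')
```

The flow arc from the angle computation to the rotation is what forces $\theta_{ir}$ to be delivered to every processor that rotates a column pair.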

5.4.1 Construction of Multiprocessor Arrays

The foregoing notation provides a convenient vehicle for describing the mapping of algorithmic complexity onto large collections of processors. Matrix algebra operations exhibit considerable structure, which allows the formation of a program model on which the present results are based. Many matrix algorithms admit the following structure:

    do while I_1, C_1;
        B_1
        do while I_2, C_2;
            B_2
              ⋮
                do while I_M, C_M;
                    B_M
                end I_M;
              end I_{M-1};
              ⋮
    end I_1;

where $\{I_i\}_{i=1}^{M}$ are the loop indices, $\{C_i\}_{i=1}^{M}$ are the loop conditionals and $\{B_i\}_{i=1}^{M}$ are the loop bodies, assumed to be basic. Frequently, only $B_M$ and/or $B_{M-1}$ are nonempty.

Let $B_i(i_1, i_2, \ldots, i_M)$ denote the execution of the loop body $B_i$ when $I_1 = i_1$, $I_2 = i_2$, etc. Let $B_i(i)$ denote the execution of $B_i$ when $I_i = i$, while $B_i(I_i)$ means the execution of $B_i$ for any allowable value of $I_i$. Furthermore, all loop bodies are restricted to have a single entry point and a single exit. The goal of this section is to examine under what conditions general statements can be made regarding the construction of multiprocessor arrays having local connectivity. Specifically, there are two issues:

i) The data dependencies of an algorithm limit the amount of parallelism that may be employed. What sorts of dependencies are allowable such that a multiprocessor having local connectivity results in a speed advantage? In particular, do the matrix algebra operations of interest exhibit these dependencies?

ii) What is the performance advantage achieved by the multiprocessor? For example, an algorithm requiring $O(n^3)$ steps on a uniprocessor should require no more than $O(n^2)$ steps if O(n) processors are employed. Clearly the data dependencies influence the achievable speed enhancement.

The main thrust of this section will be to show that, under certain conditions, arrays that achieve the performance enhancement alluded to in (ii) do exist. Proof of existence will be given by providing a candidate array. Arrays of arbitrary complexity are possible for very complex programs, e.g., an n-dimensional array for a program with n appropriately structured loops. However, VLSI integration on a two-dimensional silicon surface imposes very severe constraints on array size. While it is possible to construct an array with three spatial dimensions and only local connectivity in a four-dimensional universe, two spatial dimensions are the limit on a silicon plane. Programs of large computational complexity may be implemented only by multiply indexing the dimensions.

Three-dimensional arrays (where one dimension is time) will be studied first, and generalizations will be mentioned later. The main interest of this study is to establish the sorts of data dependencies that are allowable in the program model, given that a three-dimensional array structure with local connectivity is desired. The performance limitations imposed on the array by the dependencies will also be examined. Since there are two available spatial dimensions, the two most deeply nested loops will be distributed in space, because these are executed most frequently.

Let the indices $I_1, I_2, \ldots, I_M$ assume O(n) distinct values each (i.e. $I_i$ assumes at most $k_i n$ values). This yields an algorithm of complexity $O(n^M)$, assuming each loop body consists of only a few simple instructions, whose number is independent of n.

Lemma 5.1: Let $B_r(a, b)$ denote the execution of $B_r$ when $I_{M-1} = a$, $I_M = b$, and let * denote any allowable value in the appropriate index set. If

(d1): $B_M(i_M)\ \delta\ B_M(i_M + l)$, $i_M \in I_M$, $I_{M-1}$ = constant, $l \ge 0$

(d2): $B_M(i_{M-1}, d)\ \delta\ B_{M-1}(i_{M-1} + l, *)$, $i_{M-1} \in I_{M-1}$, $d \in I_M$, $l > 0$

(d3): $B_{M-1}(i_{M-1}, *)\ \delta\ B_M(i_{M-1}, i)$, $i_{M-1} \in I_{M-1}$, $\forall i \in I_M$

(d4): $B_M(i_{M-1}, i_M + s)\ \delta\ B_M(i_{M-1} + 1, i_M)$, $s \le 0$

are allowable dependencies, then there exists a one-dimensional computing structure with only nearest-neighbor connections, capable of computing the program results in $O(n^{M-1})$ time.

Remark: Item (d1) frequently degenerates in matrix algorithms, so that no dependence exists between successive executions of the innermost loop. Simply stated, applying these operations to a set of rows (or columns) does not generate data dependencies among the row elements for that same operation. The present model quite reasonably assumes that such an elementary operation is applied to individual matrix elements in the most deeply nested (i.e., M-th) loop.

Proof:

Consider the candidate structure of processors (a circle denotes a processor, and its label indicates the loop body that it executes):

where:

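$$U(B_M(i)) = \begin{cases} \Omega(B_{M-1}) \cup \bigcup_{k=1}^{i} \Omega(B_M(k)), & i \le l \\ \Omega(B_{M-1}) \cup \bigcup_{k=i-l+1}^{i} \Omega(B_M(k)), & i > l \end{cases}$$

is the output of the $B_M$ processors to the right.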

Dependence (d1) is clearly satisfied by the expression for $U(B_M(i))$. Dependence (d3) is satisfied since $\Omega(B_{M-1})$ propagates through the array. Finally, dependences (d2) and (d4) are satisfied by the reverse flow of data. It is assumed that each processor has local memory.

Suppose that $B_{M-1}(1, *)$ is executed at $T = 1$. Then the throughput of the array will be determined by when successive operations of $B_{M-1}$ can be initiated, assuming everything happens in some basic time unit. From (d2) it is clear that $B_{M-1}(i, *)$ is initiated at time $t_i$ given by

$$t_i = \max\big(i,\ i + (2d - l)\big)$$

Now since $I_{M-1}$ varies over at most $k_{M-1}n$ values, the last $B_{M-1}$ process is initiated at

$$\max\big(k_{M-1}n,\ k_{M-1}n + (2d - l)\big)$$


Let $I_M$ vary over at most $k_M n$ values. Then all the calculations of the $(M-1)^{th}$ and $M^{th}$ loops are complete at time (we assume $n > l$)

$$T = (k_M + k_{M-1})n + (2d - l)\, I_{\{2d - l > 0\}}$$

where $I_{\{C\}}$ is the indicator function of the logical event $C$. It is clear that $T = O(n)$, since the second term is a constant. The remaining $M - 2$ outer loops each range over $O(n)$ values, so the execution time for the entire program is

$$T_P = O(n^{M-2}) \cdot T = O(n^{M-1})$$

and the proof is complete.

The foregoing linear array has a corresponding activity chart, and it is now possible to generalize the idea of "stacking" the rows of the activity chart to construct a two-dimensional array.

Theorem 5.1.

If (d1)-(d4) and the additional dependencies (where $B_r(a, b, c)$ denotes the execution of $B_r$ when $I_{M-2} = a$, $I_{M-1} = b$, $I_M = c$)

(d5): $B_{M-1}(i_{M-1}, *)\ \delta\ B_{M-1}(i_{M-1} + k, *)$, $\quad i_{M-1} \in I_{M-1},\ k > 0,\ I_{M-2} = \text{constant}$

(this is allowable in Lemma 5.1 with memory)

(d6): $B_{M-2}(i_{M-2}, *, *)\ \delta\ B_{M-1}(i_{M-2}, i, *)$, $\quad \forall\, i \in I_{M-1}$


(d7): $B_{M-1}(i_{M-2}, f, g)\ \delta\ B_{M-2}(i_{M-2} + l, *, *)$, $\quad i_{M-2} \in I_{M-2},\ f \in I_{M-1},\ g \in I_M$ constant

are allowable, then there exists a two-dimensional array of processors with local connectivity, capable of executing the program model in $O(n^{M-2})$ time.

Proof:

Beginning with the structure of Lemma 5.1, which satisfies (d1)-(d4) and is independent of (d5)-(d6), construct the candidate structure:

[Figure: candidate two-dimensional processor array, with rows of $B_M$ processors extending to $B_M(k_M n)$]

It is clear that (d1) and (d3) are satisfied. Next, note that (d2) and (d4) are satisfied through the additional connectivity, which also satisfies (d5) and (d6).

Notice that frequently some of the processes, particularly the $B_M$ processes, will be null, so that the array will not be rectangular.


The next task, to complete the proof, is to ascertain the execution time of the program model on this array. It is once again necessary to determine the time steps at which successive $B_{M-2}$ processes can be initiated, given that $B_{M-2}(I_{M-2} = 1, *, *)$ occurs at $t = t_0$. Let $t_i$ represent the time of occurrence of $B_{M-1}(i, *)$ for $I_{M-2}$ = constant. From the candidate array structure, this is given by

$$t_i = \left\lceil \frac{i}{l} \right\rceil 2d + i + t_0$$

Notice that (d6) has no effect on $t_i$, since $t_i$ is always at least as large as the time required for $B_{M-2}(i_{M-2}, *, *)$ to propagate to $B_{M-1}(i_{M-2}, i, *)$.

Next, let $t'_i$ denote the time of occurrence of $B_{M-2}(i, *, *)$. From (d7), it is clear that

$$t'_i = i + \left\lceil \frac{i}{l} \right\rceil \phi(f, g), \qquad \phi \text{ a function of } f, g \text{ only}$$

$$t'_{k_{M-2}n} = k_{M-2}n + \left\lceil \frac{k_{M-2}n}{l} \right\rceil \phi(f, g) = O(n)$$

so that

$$T' = O(n) = \text{time to complete the three inner loops.}$$

Therefore, since the remaining $M - 3$ outer loops each range over $O(n)$ values, the program execution time is

$$T_P = \sum_{i_1} \cdots \sum_{i_{M-3}} T' = O(n^{M-3}) \cdot O(n) = O(n^{M-2}),$$


thus completing the proof.

It is interesting to evaluate the latency, $L$, through the array for a given $I_{M-2} = i_{M-2}$, i.e., the time to complete the two inner loops. This is given by

$$L \approx (t_{k_{M-1}n} - t_0) + k_M n = O(n) \qquad (5.9)$$

Remarks:

1) Neither the candidate array of Lemma 5.1 nor that of Theorem 5.1 has been allowed to accept input, for example in the manner of Figure 5.6. This is readily accommodated in the present structure by pretending that each processor has a memory associated with it, which it accesses as part of the loop body $B_k$. In practice, the memory need not exist if input is provided from an external source in the same manner; however, the memory concept is a convenient trick to ensure that none of the proofs are violated.

2) Although the proofs above are constructive in nature, it is important to remember that the propositions themselves refer to the existence of arrays having a certain performance.

The proof of Theorem 5.1 has immense value in providing an array that is generally applicable to algorithms satisfying the particular program model used. In contrast, previous methods of synthesizing such arrays have been quite ad hoc, presupposing a specific problem, array interconnection, or both. Hence, the range of applicability of such multiprocessors is seldom known (e.g., ILLIAC IV). Furthermore, the array structures can be generated automatically from the algorithm. Applications of the theorem will be given shortly for some popular three-loop algorithms.

Generalizations to arrays of higher dimension, while of academic interest, are not practical in VLSI implementations unless the connectivity is entirely two-dimensional, which is clearly restrictive. In any event, if one continues to augment the data dependencies, as was done for Theorem 5.1 in the case of (d5)-(d7), it is possible to show the following result:

For any program satisfying the program model and data dependencies, there exists an $(M-1)$-dimensional computing structure with only local connectivity, capable of executing the program in $O(n)$ time steps.

Remark:

(1) The generalization is possibly of interest for 4-dimensional arrays (one time dimension) if current talk of "three-dimensional VLSI" proves correct.

(2) In the case of 3-D VLSI, as well as for non-VLSI arrays (i.e., cases where local connectivity is practiced), it is unfair to assign equal cost (time) to all communications.

It is worth remarking that the construction used in the proof of Theorem 5.1 is simply an extension of the "stacking" idea of Section 5.3.2 employed for Givens' algorithm. The time-space duality concept of Section 5.3.3 can be similarly extended to a useful and general result.

Lemma 5.2:

The time-space dual linear array of the $M^{th}$ loop always exists.


A dual array of Lemma 5.1 is:

[Figure: dual linear array of $B_M$ and $B_{M-1}$ processors]

Theorem 5.2 (Theorem of Duality):

For each array satisfying the construction of Theorem 5.1 (i.e., for that class of algorithms), there exists at least one dual array in which the loop of the program model mapped into the temporal dimension is exchanged for a loop mapped into a spatial dimension.

Proof:

Follows as a consequence of Lemma 5.2 when the linear array in the proof of Theorem 5.1 is replaced by its dual.

Corollary:

There exist "space dual" arrays in which the spatial dimensions of two loops of the program model are exchanged.

Space dual arrays will not be studied in this dissertation, mainly because they must be studied at a constructive level. Clearly, existence-level studies can be quite trivial, since a space dual array may be obtained by simply reordering (i.e., renaming) the spatial dimensions of any particular array.

Remark:

Simply stated, the foregoing has discussed the mapping of algorithmic complexity onto many dimensions for algorithms which satisfy a particular structure. Fundamentally, there is no difference between any two dimensions, although a difference may be imposed by data dependence or connectivity constraints, e.g., the dual nature of time and space. In this world, the practical limit is four dimensions, i.e., four loops in the program model; however, higher-dimensional programs in which the total number of computations is finite can be thought to generate higher-dimensional arrays by multiply indexing along a given dimension. By indexing in the time dimension, the usual concept of time sharing, or computation in time, arises; space sharing is clearly the more unusual case.

Examples:

There are a variety of important matrix algebra algorithms, primarily for factorization, which satisfy the program model. In view of their recent proliferation in the literature [MD81], arrays for their realization will be derived as examples of the techniques just described. These arrays appeared in [ADM82], where they were obtained by inspection. In contrast, the arrays to follow are all derived from a general method. A general purpose array structure, namely a triangular array, which is usable for all the examples, is the result of either approach in this case.

Example 5.1:

Givens' Algorithm - This has already been presented in Section 5.3.


Example 5.2:

Gaussian Elimination is a well-known method for obtaining the LU decomposition of a matrix $A = [a_{ij}]$ as follows:

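for r = 1 to n begin;
    for i = r + 1 to n begin;
        $\theta_{ri} = a_{ir} / a_{rr}$;  $a_{rr} \leftarrow a_{rr}$;
        for j = r + 1 to n begin;
            $\begin{bmatrix} a_{rj} \\ a_{ij} \end{bmatrix} \leftarrow \begin{bmatrix} 1 & 0 \\ -\theta_{ri} & 1 \end{bmatrix} \begin{bmatrix} a_{rj} \\ a_{ij} \end{bmatrix}$;
        end;
        $\begin{bmatrix} b_r \\ b_i \end{bmatrix} \leftarrow \begin{bmatrix} 1 & 0 \\ -\theta_{ri} & 1 \end{bmatrix} \begin{bmatrix} b_r \\ b_i \end{bmatrix}$;
    end;
end;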
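A minimal Python sketch of the loops above, assuming a square, nonsingular matrix that needs no pivoting (the pseudocode performs none). Storing each multiplier $\theta_{ri}$ in the eliminated position $a_{ir}$ is an illustrative choice, not part of the original: it leaves the unit lower-triangular $L$ and the upper-triangular $U$ in a single array. The function names are hypothetical.

```python
def gaussian_eliminate(A, b):
    """Row-reduce A (a list of row lists) in place, mirroring Example 5.2.

    No pivoting is performed. On return the upper triangle of A holds U,
    the strict lower triangle holds the multipliers theta_ri (so L is unit
    lower triangular), and b has received the same row operations.
    """
    n = len(A)
    for r in range(n):
        for i in range(r + 1, n):
            theta = A[i][r] / A[r][r]        # theta_ri = a_ir / a_rr
            A[i][r] = theta                  # keep the multiplier as L[i][r]
            for j in range(r + 1, n):
                # [a_rj; a_ij] <- [[1, 0], [-theta, 1]] [a_rj; a_ij]
                A[i][j] -= theta * A[r][j]
            b[i] -= theta * b[r]             # same linear "rotation" applied to b
    return A, b


def back_substitute(U, y):
    """Solve U x = y, reading only the upper triangle of U."""
    n = len(U)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
b = [4.0, 10.0, 24.0]
LU, y = gaussian_eliminate(A, b)
x = back_substitute(LU, y)
print(x)  # [1.0, 1.0, 1.0]
```

Each update touches only the pivot row $r$ and the target row $i$, which is why the same $2 \times 2$ "linear rotation" can be mapped onto the CORDIC array used for Givens' method.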

This is simply Givens' method in which the coefficient matrix is a linear (i.e., a multiply and accumulate) rather than a circular rotation. Its implementation is exactly that of Givens' method (e.g., Figure 5.10), however, with a linear metric chosen for the CORDIC processors.

Example 5.3:

The Hyperbolic Cholesky Algorithm, which is an efficient method of factoring a positive definite matrix, $A$, first appeared in [MD81]. Following Delosme and Morf's description of the algorithm [DM81], observe that $A$ satisfies the identity

$$A = V^T V - W^T W$$

where


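$$v_{ij} = \begin{cases} a_{ij}/\sqrt{a_{ii}}, & j \ge i \\ 0, & \text{otherwise} \end{cases} \qquad \text{and} \qquad w_{ij} = \begin{cases} a_{ij}/\sqrt{a_{ii}}, & j > i \\ 0, & \text{otherwise} \end{cases}$$

are the elements of the upper triangular matrices $V$ and $W$.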

Hyperbolic rotations are now applied to combine rows of $V$ and $W$ in a manner forcing $W$ to zero while maintaining the structure of $V$. This yields the unique Cholesky decomposition of $A$. This algorithm may be written as [DM81]:

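for r = 2 to n begin;
    for i = r - 1 to 1 begin;
        $\theta_{ri} = -\tanh^{-1}(w_{ir}/v_{rr})$;  $v_{rr} \leftarrow \sqrt{v_{rr}^2 - w_{ir}^2}$;
        for j = r + 1 to n begin;
            $\begin{bmatrix} v_{rj} \\ w_{ij} \end{bmatrix} \leftarrow \begin{bmatrix} \cosh\theta_{ri} & \sinh\theta_{ri} \\ \sinh\theta_{ri} & \cosh\theta_{ri} \end{bmatrix} \begin{bmatrix} v_{rj} \\ w_{ij} \end{bmatrix}$;
        end;
    end;
end;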
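A minimal Python sketch of this recursion, assuming $A$ is symmetric positive definite so that $|w_{ir}/v_{rr}| < 1$ at every step. It uses plain lists rather than any matrix library, and the function name is hypothetical. The annihilated entry $w_{ir}$ is set to zero explicitly, which the pseudocode leaves implicit.

```python
import math


def hyperbolic_cholesky(A):
    """Return upper-triangular V with V^T V = A, following Example 5.3."""
    n = len(A)
    s = [math.sqrt(A[i][i]) for i in range(n)]
    # Normalization: v_ij = a_ij / sqrt(a_ii) for j >= i; w_ij likewise for j > i.
    V = [[A[i][j] / s[i] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    W = [[A[i][j] / s[i] if j > i else 0.0 for j in range(n)] for i in range(n)]
    for r in range(1, n):                  # r = 2, ..., n in 1-based indexing
        for i in range(r - 1, -1, -1):     # i = r-1 down to 1
            t = W[i][r] / V[r][r]          # |t| < 1 for positive definite A
            theta = -math.atanh(t)         # theta_ri
            ch, sh = math.cosh(theta), math.sinh(theta)
            V[r][r] = math.sqrt(V[r][r] ** 2 - W[i][r] ** 2)
            W[i][r] = 0.0                  # entry annihilated by the rotation
            for j in range(r + 1, n):      # hyperbolic rotation of (v_rj, w_ij)
                v, w = V[r][j], W[i][j]
                V[r][j] = ch * v + sh * w
                W[i][j] = sh * v + ch * w
    return V


A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
R = hyperbolic_cholesky(A)
# R is approximately [[2, 1, 1], [0, 2, 1], [0, 0, 2]], and R^T R reproduces A.
```

Because each rotation satisfies $\cosh^2\theta - \sinh^2\theta = 1$, it preserves $V^T V - W^T W = A$ while driving $W$ to zero, which is the invariant the text relies on.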

It is readily verified that this algorithm satisfies the program model. Applying Theorem 5.1 for $n = 4$, in particular the construction in its proof, the three-dimensional (two spatial dimensions) array of Figure 5.11 is obtained. Incoming components of the matrix, $A$, are first normalized to compute $v_{ij}$ and $w_{ij}$. Then each processor in the array computes and stores the angle through which it will rotate all subsequent inputs, resulting in the triangular array with angles in place.

[Figure 5.11: Triangular Array for Hyperbolic Cholesky. Input rows, with time steps $t = 1, \ldots, 5$ across the top: $a_{11}\ a_{12}\ a_{13}\ a_{14}\ b_1$; $a_{22}\ a_{23}\ a_{24}\ b_2$; $a_{33}\ a_{34}\ b_3$; $a_{44}\ b_4$.]

Notice that a general purpose array, namely a triangular array, has been shown to exist for the program model, as confirmed by the examples.

5.4.2 An Approach To Formalism

The previous section provided some formalism for the synthesis of multiprocessor arrays, which guaranteed the structures to be general purpose for a given program model. This section is concerned with developing additional ideas for the analysis of multiprocessor arrays, based on the notion of "metric spaces", particularly the notion of distance. It is clear that data dependencies and array interconnections limit the speed of multiprocessor arrays, and the approach to be developed will provide a systematic method for their analysis.

Distances will be measured in units of time (which is in accordance with our usual concept of distance, since time is similar to the spatial dimensions), and distance measures will be determined by the array connectivity. Data dependencies in a program will establish a partial ordering which will determine the manner in which the computation wavefronts [see e.g., Mu71 for a discussion] traverse the array; together with the distance measures, this will allow execution time analysis for various program models. In fact, simple proofs of the theorems of the previous section can also be obtained.

5.4.2.1 Distance Measures

Let $d_\Omega(P, Q)$ denote the "distance" between two elements, $P$ and $Q$, of a collection $\Omega$.


Definition 5.7:

A distance measure must satisfy the following properties [Ro66]:

(1) $d_\Omega(P, Q) > 0,\ P \ne Q$; $\quad d_\Omega(P, P) = 0$

(2) $d_\Omega(P, Q) = d_\Omega(Q, P)$

(3) $d_\Omega(P, Q) + d_\Omega(Q, R) \ge d_\Omega(P, R), \quad P, Q, R \in \Omega$

The latter item is known as the Triangle Inequality. Consider the rectangular array of processors, with local connectivity, shown in Figure 5.12. Each processor is assigned a set of co-ordinates, e.g., let $P$ have co-ordinates $(p_x, p_y)$, and the collection of processors is the set $\Omega$.

Definition 5.8:

The distance between elements $P$, $Q$ of the rectangular array is:

$$d_R(P, Q) = |p_x - q_x| + |p_y - q_y| \qquad (5.10)$$

It is clear that $d_R$ satisfies the first two properties of Definition 5.7. To prove the triangle inequality:

$$\begin{aligned} d_R(P, Q) + d_R(Q, R) &= |p_x - q_x| + |q_x - r_x| + |p_y - q_y| + |q_y - r_y| \\ &\ge |p_x - q_x + q_x - r_x| + |p_y - q_y + q_y - r_y| \\ &= d_R(P, R) \end{aligned}$$

The distance $d_R(P, Q)$ is a measure of the time required to go from $P$ to $Q$ following the connectivity of the array.
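To check numerically that $d_R$ is the time needed to go from $P$ to $Q$ along the array links, the following Python sketch compares Equation 5.10 with a breadth-first hop count on a small $4 \times 4$ array. It assumes each link costs one time unit. The grid size and helper names are illustrative only.

```python
from collections import deque


def d_rect(P, Q):
    """Definition 5.8 (Equation 5.10): |p_x - q_x| + |p_y - q_y|."""
    return abs(P[0] - Q[0]) + abs(P[1] - Q[1])


# Assumed nearest-neighbour links of the rectangular array (N, S, E, W).
RECT_LINKS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hop_distances(src, processors, links):
    """Breadth-first search: fewest link traversals from src to each processor."""
    present = set(processors)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        p = queue.popleft()
        for dx, dy in links:
            q = (p[0] + dx, p[1] + dy)
            if q in present and q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


grid = [(x, y) for x in range(4) for y in range(4)]
hops = hop_distances((0, 0), grid, RECT_LINKS)
print(hops[(3, 2)], d_rect((0, 0), (3, 2)))  # 5 5
```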

Next consider the double hexagonal array of Figure 5.13 (the array is termed double hexagonal due to the interconnect pattern, not due to the shape of the collection of processors, which would vary considerably with the number of processors and artistic ability).


[Figure 5.12: The Rectangular Array (processors labelled by co-ordinates, e.g., 0,0; 2,0; 3,0)]


[Figure 5.13: The Double-Hexagonal Array (processors labelled by co-ordinates, e.g., 0,0; 0,2; 0,3; 1,0; 2,0; 2,3; 3,0; 3,1)]


Definition 5.9:

The distance between elements $P$, $Q$ of the double hexagonal array is:

$$d_X(P, Q) = \max\big[\,|p_x - q_x|,\ |p_y - q_y|\,\big]$$

and this is the time required to go from $P$ to $Q$. Again, $d_X$ clearly satisfies the first two properties of Definition 5.7. To prove the triangle inequality:

$$\begin{aligned} d_X(P, Q) + d_X(Q, R) &= \max\big[\,|p_x - q_x|, |p_y - q_y|\,\big] + \max\big[\,|q_x - r_x|, |q_y - r_y|\,\big] \\ &\ge \max\big[\,|p_x - q_x| + |q_x - r_x|,\ |p_y - q_y| + |q_y - r_y|\,\big] \\ &\ge \max\big[\,|p_x - r_x|,\ |p_y - r_y|\,\big] \\ &= d_X(P, R) \end{aligned}$$
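The axioms of Definition 5.7 can also be checked exhaustively on a small array. The Python sketch below does so for $d_X$ on a $4 \times 4$ grid of co-ordinates (an illustrative size) and, for contrast, shows that squared Euclidean distance fails the triangle inequality. The function names are hypothetical.

```python
from itertools import product


def d_dhex(P, Q):
    """Definition 5.9: max(|p_x - q_x|, |p_y - q_y|)."""
    return max(abs(P[0] - Q[0]), abs(P[1] - Q[1]))


def satisfies_definition_5_7(d, points):
    """Brute-force check of the three properties of Definition 5.7 on a finite set."""
    for P, Q in product(points, repeat=2):
        if d(P, Q) < 0 or (d(P, Q) == 0) != (P == Q):   # property (1)
            return False
        if d(P, Q) != d(Q, P):                          # property (2): symmetry
            return False
    return all(d(P, Q) + d(Q, R) >= d(P, R)             # property (3): triangle inequality
               for P, Q, R in product(points, repeat=3))


def d_squared(P, Q):
    """Squared Euclidean distance: a counter-example violating property (3)."""
    return (P[0] - Q[0]) ** 2 + (P[1] - Q[1]) ** 2


grid = [(x, y) for x in range(4) for y in range(4)]
print(satisfies_definition_5_7(d_dhex, grid))     # True
print(satisfies_definition_5_7(d_squared, grid))  # False
```

The counter-example fails because, for example, $(0,0) \to (1,0) \to (2,0)$ costs $1 + 1 = 2$ under squared distance, while the direct value is $4$.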

Finally, consider the hexagonal array of Figure 5.14.

Definition 5.10:

The distance between elements $P$, $Q$ of the hexagonal array is:

$$d_H(P, Q) = d_R(P, Q)\, I_{[\bar{C}]_{PQ}} + d_X(P, Q)\, I_{[C]_{PQ}}$$

where $I$ is the indicator function of the logical event

$$[C]_{PQ} = \big[\,\mathrm{sgn}(p_x - q_x) = \mathrm{sgn}(p_y - q_y)\,\big]$$

To prove that this expression satisfies the triangle inequality, proceed as follows:

$$\begin{aligned} d_H(P, Q) + d_H(Q, R) &= d_R(P, Q) I_{[\bar C]_{PQ}} + d_X(P, Q) I_{[C]_{PQ}} + d_R(Q, R) I_{[\bar C]_{QR}} + d_X(Q, R) I_{[C]_{QR}} \\ &= \begin{cases} d_R(P, Q) + d_R(Q, R), & [\bar C]_{PQ} \cap [\bar C]_{QR} \\ d_X(P, Q) + d_X(Q, R), & [C]_{PQ} \cap [C]_{QR} \\ d_R(P, Q) + d_X(Q, R), & [\bar C]_{PQ} \cap [C]_{QR} \\ d_X(P, Q) + d_R(Q, R), & [C]_{PQ} \cap [\bar C]_{QR} \end{cases} \\ &\ge d_H(P, R) \end{aligned}$$


[Figure 5.14: The Hexagonal Array (processors labelled by co-ordinates, e.g., 0,0; 0,2; 0,3; 1,0; 2,0; 2,2; 3,0)]


This inequality is clearly true for the first two cases by the results of Definitions 5.8 and 5.9. It is also true for the latter two cases, for it can never happen that the array is traversed from $P$ to $R$ via a longer path than $P, Q, R$. Clearly, if that were the case, the path $P, Q, R$ would be chosen, thus achieving an equality at most.
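Definition 5.10 can be tested the same way. The Python sketch below assumes that, in Figure 5.14, each processor is linked to its four rectangular neighbours and to the two diagonal neighbours $(p_x \pm 1, p_y \pm 1)$ whose co-ordinates change with the same sign. This is the link pattern consistent with the indicator in Definition 5.10. The sketch checks that $d_H$ equals the breadth-first hop count and satisfies the triangle inequality on a $4 \times 4$ grid.

```python
from collections import deque
from itertools import product


def sgn(v):
    return (v > 0) - (v < 0)


def d_hex(P, Q):
    """Definition 5.10: d_X when sgn(p_x - q_x) == sgn(p_y - q_y), otherwise d_R."""
    dx, dy = P[0] - Q[0], P[1] - Q[1]
    if sgn(dx) == sgn(dy):             # event [C]_PQ
        return max(abs(dx), abs(dy))   # d_X
    return abs(dx) + abs(dy)           # d_R


# Assumed links: rectangular neighbours plus the same-sign diagonal.
HEX_LINKS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]


def hop_distances(src, processors, links):
    """Breadth-first search: fewest link traversals from src to each processor."""
    present = set(processors)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        p = queue.popleft()
        for dx, dy in links:
            q = (p[0] + dx, p[1] + dy)
            if q in present and q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


grid = [(x, y) for x in range(4) for y in range(4)]
print(d_hex((0, 0), (2, 1)), d_hex((0, 0), (2, -1)))  # 2 3
triangle_ok = all(d_hex(P, Q) + d_hex(Q, R) >= d_hex(P, R)
                  for P, Q, R in product(grid, repeat=3))
print(triangle_ok)  # True
```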

These distance measures impose a topology which is useful for analyzing the computations being performed on an array.

Definition 5.12: [Ro68]

A "closed ball" with center $P$ and radius $r$ is the set of elements $Q$ such that:

$$d_\Omega(P, Q) \le r, \quad P, Q \in \Omega$$

For example, Figure 5.15 shows the shape of a closed ball in the collection, $\Omega$, of processors for the rectangular and hexagonal arrays. Notice that since the collection of processors is finite, the shape of the closed ball depends on both the center $P$ and the radius $r$.

Theorem 5.3:

Let $\{C_i\}$ denote a collection of computations occurring at time $t_0$ ($C_i$ could be as simple as a data transfer) whose results are required to perform a calculation $C$ in processor $P$. The earliest instant at which $C$ can be performed is $t_P = t_0 + r$, where $r$ is the radius of the smallest closed ball of center $P$ enclosing $\{C_i\}$, i.e.,

$$t_P = t_0 + \min\{\, r : d(Q, P) \le r \ \ \forall\, Q \,\}$$

where $Q$ ranges over the processors in which the $C_i$ occur.


[Figure 5.15: Closed Ball Topology, showing a closed ball centred at $P$ in the Rectangular Array and in the Hexagonal Array]


Proof:

The theorem is true by negation, for if $C$ could be performed at $P$ at $t < t_P$, that would imply the existence of $r_0 < r$ such that $d(Q, P) \le r_0$ for all $Q$. But this is false by the definition of $r$.
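Once a distance measure is fixed, Theorem 5.3 reduces to a one-line computation. In the Python sketch below, the source processors and the target $P$ are hypothetical, chosen only to show that the same set of required results can reach $P$ one time step earlier on the hexagonal array than on the rectangular one.

```python
def sgn(v):
    return (v > 0) - (v < 0)


def d_rect(P, Q):
    """Definition 5.8."""
    return abs(P[0] - Q[0]) + abs(P[1] - Q[1])


def d_hex(P, Q):
    """Definition 5.10."""
    dx, dy = P[0] - Q[0], P[1] - Q[1]
    if sgn(dx) == sgn(dy):
        return max(abs(dx), abs(dy))
    return abs(dx) + abs(dy)


def earliest_time(t0, P, sources, d):
    """Theorem 5.3: t_P = t0 + radius of the smallest closed ball centred at P
    that contains every processor Q holding a required result."""
    return t0 + max(d(Q, P) for Q in sources)


sources = [(0, 0), (3, 3)]   # processors whose results C needs (illustrative)
P = (1, 2)
print(earliest_time(0, P, sources, d_rect))  # 3
print(earliest_time(0, P, sources, d_hex))   # 2
```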

It is clear from Figure 5.15 and Theorem 5.3 that the hexagonal interconnection provides for more flexible computation than the rectangular array, since a closed ball of radius $r$ encloses more processors.
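The phrase "encloses more processors" can be quantified. The Python sketch below counts the processors in a closed ball of radius $r$ placed far from the array boundary. This is an idealization, since on a finite array the shape also depends on $P$, as noted above. For comparison, the sketch also includes the double hexagonal measure $d_X$.

```python
def sgn(v):
    return (v > 0) - (v < 0)


def d_rect(P, Q):
    return abs(P[0] - Q[0]) + abs(P[1] - Q[1])


def d_dhex(P, Q):
    return max(abs(P[0] - Q[0]), abs(P[1] - Q[1]))


def d_hex(P, Q):
    dx, dy = P[0] - Q[0], P[1] - Q[1]
    if sgn(dx) == sgn(dy):
        return max(abs(dx), abs(dy))
    return abs(dx) + abs(dy)


def closed_ball_size(d, r, center=(0, 0)):
    """Count processors Q with d(center, Q) <= r on an unbounded array.

    All three measures are at least max(|dx|, |dy|), so searching the
    (2r + 1) x (2r + 1) box around the center is sufficient.
    """
    cx, cy = center
    return sum(1 for x in range(cx - r, cx + r + 1)
               for y in range(cy - r, cy + r + 1)
               if d(center, (x, y)) <= r)


for r in range(1, 4):
    print(r, closed_ball_size(d_rect, r), closed_ball_size(d_hex, r),
          closed_ball_size(d_dhex, r))
# 1 5 7 9
# 2 13 19 25
# 3 25 37 49
```

The counts follow $2r^2 + 2r + 1$, $3r^2 + 3r + 1$ and $(2r + 1)^2$ respectively, so for a given delay the hexagonal array can gather operands from about 50% more processors than the rectangular one.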

Example 5.4:

Suppose the latency of the candidate three-dimensional array of Theorem 5.2 is desired for a given value of $I_{M-2} = i_{M-2}$. This problem is readily solved using the distance concept and the processor co-ordinate assignment of Figure 5.16. The array has the distance measure given by Equation 5.10, since it has a rectangular interconnection pattern.

The data dependencies establish a partial ordering determining the traversal of the array. This ordering is shown in Figure 5.17, together with the dependence graphs. Now, in order to calculate the execution time on the array, proceed as follows:

(i) Let $L$ denote the execution time for the inner two loops for a given $I_{M-2} = i_{M-2}$.

(ii) The calculation is complete once the distance from $B_{M-1}(i_{M-2}, 1, *)$ to $B_M(i_{M-2}, k_{M-1}n, k_M n)$ is traversed. However, the choice of traversal path is not arbitrary (in particular, it is not the most direct path); rather, it is determined by the data dependencies of the program model. Following this path yields:


[Figure 5.16: Coordinate Assignment for Theorem 1]


[Figure 5.17: Traversal Ordering (dependence graphs for (d2), (d3) and (d7), and the resulting partial orderings of the $B_{M-2}$, $B_{M-1}$ and $B_M$ processes)]

$$\begin{aligned} L = {} & \sum_{j=1}^{k_{M-1}n - 1} d_R\big[B_{M-1}(j, *);\, B_{M-1}(j+1, *)\big] + \sum_{j=1}^{\lceil k_{M-1}n/l \rceil} 2\, d_R\big[B_{M-1}(jl, *);\, B_M(jl, d)\big] \\ & + d_R\big[B_{M-1}(k_{M-1}n, *);\, B_M(k_{M-1}n, k_M n)\big] \end{aligned}$$

(iii) Next, applying Theorem 5.3, it is readily ascertained that:

$$d_R\big[B_{M-1}(j, *);\, B_{M-1}(j+1, *)\big] = 1$$

$$d_R\big[B_{M-1}(jl, *);\, B_M(jl, d)\big] = d$$

$$d_R\big[B_{M-1}(k_{M-1}n, *);\, B_M(k_{M-1}n, k_M n)\big] = k_M n$$

(iv) Thus, the latency of the array for this program model is given by

$$L \approx \left\lceil \frac{k_{M-1}n}{l} \right\rceil (2d) + (k_M + k_{M-1})\,n = O(n)$$

This is precisely the latency evaluated in the previous section, Equation 5.9, since it is easily seen that $t_{k_{M-1}n} - t_0 = \lceil k_{M-1}n/l \rceil\, 2d + k_{M-1}n$.

Example 5.5:

This example will demonstrate how "quick and dirty" approximations can be obtained from this method. Consider Givens' algorithm, which satisfies the program model, implemented on the array of Figure 5.6. This array has a rectangular interconnection. With the co-ordinate system of Figure 5.18, it is easy to estimate the execution time, $T$, of the algorithm as follows:

$$T \approx n + d_R\big[(1,1);\,(1,n)\big] + d_R\big[(1,n);\,(n,n)\big] \approx 3n$$

Notice that this is the approximate execution time that was determined for that array earlier.
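The estimate is simple enough to evaluate directly. In the Python sketch below, the co-ordinates $(1,1)$, $(1,n)$ and $(n,n)$ are taken from the expression above. Exact evaluation gives $n + (n-1) + (n-1) = 3n - 2$, which the text rounds to $3n$. The function name is hypothetical.

```python
def d_rect(P, Q):
    """Definition 5.8 (Equation 5.10)."""
    return abs(P[0] - Q[0]) + abs(P[1] - Q[1])


def givens_time_estimate(n):
    """T ~ n + d_R[(1,1);(1,n)] + d_R[(1,n);(n,n)] on the Figure 5.18 co-ordinates."""
    return n + d_rect((1, 1), (1, n)) + d_rect((1, n), (n, n))


for n in (4, 16, 64):
    print(n, givens_time_estimate(n), givens_time_estimate(n) / n)
# 4 10 2.5
# 16 46 2.875
# 64 190 2.96875
```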

[Figure 5.18: Coordinate System for Givens Array]

5.5 EIGENVALUE DECOMPOSITION

Determining the eigenvalues of an $n \times n$ matrix, $B$, is an extremely important calculation in linear algebra, owing to the appearance of eigenvalues in the solution of a large number of problems. The most efficient techniques for calculating eigenvalues are the LR algorithm of Rutishauser and the QR procedure of Francis [SB80]. Based on triangular and orthogonal decompositions respectively, both these algorithms are iterative in nature. The focus of this section will be on the QR algorithm, since the decomposition is stable without pivoting, as mentioned earlier. However, LR without pivoting may be implemented on the arrays to be presented, simply through a selection of the linear metric in the CORDIC processors (exactly as in Example 5.2 above).

Since the QR and LR algorithms are both iterative, it is desirable to reduce the original matrix, $B$, to a sparser structure, $A$, which requires less computation per iteration; for example, $A$ is tridiagonal or Hessenberg. Similarity transformations applied using Givens' rotations are a convenient method of achieving the desired structure, and since array architectures have already been presented for Givens' method, the sequel will assume that $A$ is in tridiagonal form.
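A Python sketch of that reduction for a symmetric $B$. It assumes plain Givens similarity transformations $A \leftarrow G A G^T$ applied column by column, which is one common ordering; the text does not fix one. Each rotation zeroes one entry below the subdiagonal, and symmetry then zeroes its mirror image above the superdiagonal. The names are hypothetical.

```python
import math


def rotate_rows(A, p, q, c, s):
    """A <- G A, where G rotates rows p and q by [[c, s], [-s, c]]."""
    for j in range(len(A)):
        x, y = A[p][j], A[q][j]
        A[p][j], A[q][j] = c * x + s * y, -s * x + c * y


def rotate_cols(A, p, q, c, s):
    """A <- A G^T for the same rotation G (acts on columns p and q)."""
    for i in range(len(A)):
        x, y = A[i][p], A[i][q]
        A[i][p], A[i][q] = c * x + s * y, -s * x + c * y


def givens_tridiagonalize(B):
    """Reduce symmetric B to tridiagonal form by Givens similarity transforms."""
    A = [row[:] for row in B]
    n = len(A)
    for k in range(n - 2):
        for i in range(k + 2, n):
            a, b = A[k + 1][k], A[i][k]
            h = math.hypot(a, b)
            if h == 0.0:
                continue                    # entry is already zero
            c, s = a / h, b / h             # rotation in the (k+1, i) plane zeroes A[i][k]
            rotate_rows(A, k + 1, i, c, s)
            rotate_cols(A, k + 1, i, c, s)  # completing the similarity keeps the eigenvalues
    return A


B = [[4.0, 1.0, 2.0, 2.0],
     [1.0, 3.0, 0.0, 1.0],
     [2.0, 0.0, 2.0, 1.0],
     [2.0, 1.0, 1.0, 5.0]]
A = givens_tridiagonalize(B)
# Entries with |i - j| > 1 are now zero to rounding; trace (14) and
# Frobenius norm are unchanged by the orthogonal similarity.
```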

Let $A_0 = A$. This will be iteratively modified, so let $A_k$ denote the matrix at the $k^{th}$ iteration. Then the QR algorithm is described as follows:

Step 1: Form the orthogonal-upper triangular decomposition of $A_k$, i.e., $A_k = Q_k R_k$, with $Q_k$ orthogonal and $R_k$ upper triangular. This step can be accomplished with Givens' method, as discussed in an earlier section.

Step 2: Form $A_{k+1} = R_k Q_k$ and repeat the procedure.

Consider one step of the procedure, so that the iteration indices may be dropped. The orthogonal matrix, $Q$, may be written as the product of the Givens rotation matrices, i.e.:

$$Q A = Q_{n-1} Q_{n-2} \cdots Q_1 A = R$$

(notice that a transposition of $Q$, as in Step 1, is not required, since it is used on the left-hand side of the equation here), while the product in Step 2 may be cast into the form:

$$R Q^T = R\, Q_1^T Q_2^T \cdots Q_{n-1}^T$$


It is convenient to rewrite the product as:

$$(R Q^T)^T = Q_{n-1} \cdots Q_2 Q_1 R^T$$

This latter form is quite simple to implement, since a premultiplication by $Q_j$ is a rotation through $\theta_j$. The angle is already known from the decomposition phase of the algorithm, so this rotation is readily performed by redefining the direction of rotation of the CORDIC algorithm, as discussed in Section 4.1. The tridiagonal structure of $A$ makes the formation of $R Q^T$ from $(R Q^T)^T$ an easy task involving only local communications between processors.
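The Python sketch below carries out one step of this procedure on a tridiagonal matrix, assuming ordinary floating-point arithmetic in place of CORDIC. Step 1 records the $(\cos\theta_j, \sin\theta_j)$ pair of each rotation $Q_j$. Step 2 then reuses exactly those stored rotations, in the same order, on $R^T$, and transposes the result to obtain $R Q^T$. The function names are hypothetical.

```python
import math


def rotation(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [hypot(a, b), 0]."""
    h = math.hypot(a, b)
    return (1.0, 0.0) if h == 0.0 else (a / h, b / h)


def rotate_rows(M, p, c, s):
    """Premultiply rows p and p+1 of M by [[c, s], [-s, c]] (in place)."""
    for j in range(len(M[0])):
        x, y = M[p][j], M[p + 1][j]
        M[p][j], M[p + 1][j] = c * x + s * y, -s * x + c * y


def transpose(M):
    return [list(col) for col in zip(*M)]


def qr_step(A):
    """One unshifted QR step on a tridiagonal matrix: returns R Q^T,
    where Q = Q_{n-1} ... Q_1 satisfies Q A = R."""
    n = len(A)
    R = [row[:] for row in A]
    angles = []                           # stored (cos, sin) of each Q_j
    for p in range(n - 1):                # Step 1: Q_j zeroes R[p+1][p]
        c, s = rotation(R[p][p], R[p + 1][p])
        rotate_rows(R, p, c, s)
        angles.append((c, s))
    Rt = transpose(R)                     # Step 2: (R Q^T)^T = Q_{n-1} ... Q_1 R^T
    for p, (c, s) in enumerate(angles):   # same rotations, same order
        rotate_rows(Rt, p, c, s)
    return transpose(Rt)


A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
for _ in range(200):
    A = qr_step(A)
print([round(A[i][i], 6) for i in range(3)])  # [3.414214, 2.0, 0.585786]
```

Repeating the step drives the off-diagonal entries towards zero, so the diagonal approaches the eigenvalues, here $2 + \sqrt{2}$, $2$ and $2 - \sqrt{2}$.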

Consider an example with $n = 4$. The initial matrix, $A$,

$$A = \begin{bmatrix} a_{11} & a_{12} & 0 & 0 \\ a_{21} & a_{22} & a_{23} & 0 \\ 0 & a_{32} & a_{33} & a_{34} \\ 0 & 0 & a_{43} & a_{44} \end{bmatrix}$$

is reduced to

$$R = \begin{bmatrix} r_{11} & r_{12} & r_{13} & 0 \\ 0 & r_{22} & r_{23} & r_{24} \\ 0 & 0 & r_{33} & r_{34} \\ 0 & 0 & 0 & r_{44} \end{bmatrix} \quad \text{and} \quad R^T = \begin{bmatrix} r_{11} & 0 & 0 & 0 \\ r_{12} & r_{22} & 0 & 0 \\ r_{13} & r_{23} & r_{33} & 0 \\ 0 & r_{24} & r_{34} & r_{44} \end{bmatrix}$$

in Step 1. A linear array of processors suffices for this computation, since it has only $O(n)$ complexity owing to the tridiagonal structure of $A$. Only three processors, each one dedicated to a particular rotation angle, are required, with an implementation like Figure 5.8, where the angles are in place.

Remark: The dual array is not a good alternative here, since each processor performs only three calculations owing to the sparseness of $A$. The dual array (like Figure 5.6) would be preferable if $A$ were less sparse, e.g., Hessenberg. The tradeoff is clear: the load is split between the initial reduction of $B$ to $A$ and the iterations of the QR method. This example brings out the utility of dual arrays; the initial structure of $A$ has a profound impact on which array is preferable.

Once the QR decomposition has been completed, the angles are in place and the processor outputs may be fed back to compute the second step of the iteration. Bidirectional connections are required between neighbours in the array, owing to the need for transposing sparse matrices in this second step. An activity chart of the array during Step 1 is shown in Figure 5.19, Figure 5.20 shows Step 2, and finally Figure 5.21 shows the combination of Step 2 pipelined behind Step 1. The beginning of Step 1 of the next iteration is also shown, as are the data movements. Notice that only local communication is required.

Finally, note that improving the convergence of the iterative QR procedure often requires shifting the diagonal elements of $A_k$ at each step [SB80], involving some additions. These may be readily performed with the arithmetic unit of the CORDIC block. Furthermore, an arithmetic test facility which influences the memory management unit proves useful in detecting when the problem splits into smaller problems.
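The effect of shifting can be sketched in Python, assuming the simple choice $\mu = (A_k)_{nn}$ (a Rayleigh-quotient shift; the text does not specify which shift is used). The step factors $A_k - \mu I$ with Givens rotations and adds $\mu I$ back after forming $R Q^T$; the subtraction and the addition are the "some additions" mentioned above. The driver counts steps until the last subdiagonal entry is negligible, which is the point at which the problem splits. The function names are hypothetical.

```python
import math


def qr_step_shifted(A, mu):
    """One shifted QR step on a tridiagonal matrix:
    Givens rotations give G (A - mu I) = R; return R G^T + mu I."""
    n = len(A)
    R = [[A[i][j] - (mu if i == j else 0.0) for j in range(n)] for i in range(n)]
    rots = []
    for p in range(n - 1):                      # zero the subdiagonal entry R[p+1][p]
        a, b = R[p][p], R[p + 1][p]
        h = math.hypot(a, b)
        c, s = (1.0, 0.0) if h == 0.0 else (a / h, b / h)
        for j in range(n):
            x, y = R[p][j], R[p + 1][j]
            R[p][j], R[p + 1][j] = c * x + s * y, -s * x + c * y
        rots.append((c, s))
    for p, (c, s) in enumerate(rots):           # right-multiply by each G_p^T in turn
        for i in range(n):
            x, y = R[i][p], R[i][p + 1]
            R[i][p], R[i][p + 1] = c * x + s * y, -s * x + c * y
    for i in range(n):                          # restore the shift
        R[i][i] += mu
    return R


def steps_to_deflate(A, shifted, tol=1e-12, max_iter=1000):
    """Iterate until the last subdiagonal entry drops below tol."""
    n = len(A)
    for k in range(1, max_iter + 1):
        mu = A[n - 1][n - 1] if shifted else 0.0
        A = qr_step_shifted(A, mu)
        if abs(A[n - 1][n - 2]) < tol:
            return A, k
    raise RuntimeError("no deflation within max_iter steps")


A0 = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
A_s, k_s = steps_to_deflate(A0, shifted=True)
A_u, k_u = steps_to_deflate(A0, shifted=False)
print(k_s, k_u)  # the shifted iteration deflates in far fewer steps
```

The eigenvalues of this example are $3$ and $3 \pm \sqrt{3}$. Without a shift the last subdiagonal entry shrinks only by the ratio $(3 - \sqrt{3})/3 \approx 0.42$ per step.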

[Figure 5.19: Eigenvalue Decomposition - QR Decomposition (activity chart of the three CORDIC processors over $t = 1, \ldots, 6$, showing inputs, outputs and inactive slots)]

[Figure 5.20: Eigenvalue Decomposition - RQ Calculation]

[Figure 5.21: Eigenvalue Decomposition - Activity Chart (Processors 1 to 3: QR decomposition, followed by $(RQ)^T$ and the start of the next QR decomposition)]

CHAPTER SUMMARY AND CONCLUSIONS

Large arrays of processors were presented for the matrix operations that are common in many areas, including signal processing. A new ladder structure for the fast Cholesky algorithm by rows was developed, which was readily implemented on a linear array of CORDIC processors. It was noted that the Levinson algorithm in ladder form and the fast Cholesky algorithms by rows and by columns are all equivalent under pipelining, and hence enjoy the same implementation.

Linear and triangular array architectures were presented for Givens' method, and the notion of dual arrays was given. It was shown that the activity chart of the linear array provides a useful tool for the systematic construction of arrays of higher dimension. This idea was formalized to include algorithms that satisfy a particular model, thus guaranteeing an array that is generally applicable within a class of problems. Some ideas of real analysis were cast into a framework appropriate for analyzing array computations. This provides a convenient method for analyzing the performance of a particular program/architecture combination. Although only simple examples were analyzed, this technique is believed to be the correct approach for attacking more general problems in which program and communication costs are different and not necessarily integer valued. Different program segments and different data paths may have widely varying associated costs, as in a distributed sensor network, for example [CB79]. The use of nearest-neighbour connections only was an implicit assumption in this work; however, many interesting possibilities for generalization exist. For example, if data needs to be transferred to a distant processor, what is the impact of accomplishing this transfer in a small number of steps, each of which is larger than simply the distance to the nearest neighbour?

Finally, an array architecture was given for the celebrated QR algorithm. The utility of dual arrays was brought out in this example.

BIBLIOGRAPHY

[CB79] D. Cohen, J. Barnett, Y. Yemini, D. Schwabe, "DSN - Distributed Sensor Networks," Information Sciences Institute, Univ. of Southern California, Working paper ISI/WP-12, April, 1979.

[Ch75] S.C. Chen, "Speedup of Iterative Programs in Multiprocessor Systems," Ph.D. Dissertation, University of Illinois at Urbana-Champaign, Dept. of Computer Science, January, 1975.

[De82] J.M. Delosme, "Algorithms for Finite Shift-Rank Processes," Ph.D. Dissertation, Stanford University, Dept. of Electrical Engineering, June 1982.

[DM80] J.M. Delosme, M. Morf, "A Tree Classification of Algorithms for Toeplitz and Related Equations Including Generalized Levinson and Doubling Type Algorithms," Proc. 19th IEEE CDC, December 10-12, 1980, pp. 42-46.

[DM81] J.-M. Delosme, M. Morf, "Scattering Arrays for Matrix Computations," Proc. of the 25th Int'l. Tech. Symp. of SPIE, San Diego, CA, August, 1981.

[KL80] H.T. Kung, C. Leiserson, "Highly Concurrent Systems," in Introduction to VLSI Systems (Mead and Conway), Addison-Wesley, 1980.

[KR81] S.-Y. Kung, D. Rao, "Highly Parallel Architectures for Solving Linear Equations," Proc. of Int'l. Conf. on Acoustics, Speech and Signal Processing, Atlanta, GA, March 1981, pp. 39-42.

[Ku77] D. Kuck, "A Survey of Parallel Machine Organization and Programming," ACM Computing Surveys, Vol. 9, No. 1, March 1977.

[Ku79] H.T. Kung, "Let's Design Algorithms for VLSI Systems," Proc. of 1st Caltech VLSI Symposium, pp. 65-90, January 1979.

[KuS80] S.Y. Kung, "VLSI Array Processor for Signal Processing," Proc. of 1st MIT Conf. on Advanced Research on Integrated Circuits, January 28-30, 1980.

[LeRG77] LeRoux, C. Gueguen, "A Fixed Point Computation of Partial Correlation Coefficients in Linear Prediction," Proc. 1977 ICASSP, pp. 742-743.

[MD81] M. Morf, J.-M. Delosme, "Matrix Decompositions and Inversions Via Elementary Signature-Orthogonal Transformations," ISMM Int'l. Symp. on Mini and Microcomputers in Control and Measurement, San Francisco, CA, May, 1981.

[Mo74] M. Morf, "Fast Algorithms for Multivariable Systems," Ph.D. Dissertation, Electrical Engineering Dept., Stanford University, Stanford, CA, 1974.

[MLNV77] M. Morf, D. Lee, J. Nickolls, A. Vieira, "A Classification of Algorithms for ARMA Models and Ladder Realizations," Proceedings of the 1977 IEEE Int'l. Conf. on Acoustics, Speech and Signal Processing.

[Mu71] Y. Muraoka, "Parallelism Exposure and Exploitation in Programs," Ph.D. Dissertation, Dept. of Comp. Sci., Univ. of Illinois at Urbana, 1971.

[Ro56] M. Rosenlicht, Introduction to Analysis, Scott, Foresman and Co., 1968.

[SB80] J. Stoer, R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag, NY, 1980.

[Sc81] R. Schmidt, "A Signal Subspace Approach to Multiple Emitter Location and Spectral Estimation," Ph.D. Dissertation, Dept. of Electrical Engineering, Stanford University, 1981.

[SK75] A.H. Sameh, D. Kuck, "Linear System Solvers for Parallel Computers," Technical Report 75-701, University of Illinois at Urbana-Champaign, Dept. of Computer Science, February, 1975.

[VT68] H.L. Van Trees, Detection, Estimation and Modulation Theory, John Wiley and Sons, New York, 1968.

CHAPTER SIX

A LADDER FORM CHIP SET

It is now possible to consider the design of a VLSI chip or chip set which would be suitable for the implementation of the algorithms of Chapters Three and Four. Such a chip would have a wide range of applications: certainly as many as there are for ladder forms. In addition, the chip set should be useful for other signal processing, such as the DFT of Chapter Two and the matrix operations of the previous chapter. A critical examination has revealed that CORDIC operations provide a natural description of all these algorithms. Hence they should map quite efficiently onto a general purpose CORDIC processor. However, the ladder filter formulation of Chapter Three (Equations 3.4-3.6) appears to pose the most interesting challenge. This chapter will therefore concentrate on implementing the square root normalized ladder recursions, with the analysis of 8 kHz sampled speech as the target application. In many ways, this is a very real problem of interest in the industry today. Furthermore, the resulting architecture will be more interesting than simply a general purpose CORDIC processor, and it will also be capable of readily implementing the remaining algorithms.

6.1 IMPLEMENTATION OF THE NORMALIZED LADDER EQUATIONS

Recall that in Chapter Three, ladder filters provided a convenient structure for speech analysis. The square root normalized ladder recursions were considered preferable for implementation to their unnormalized counterparts for two reasons: the equations were fewer in number, and all variables were less than unity in magnitude, thus making fixed point implementation viable. The equations were expressed in terms of generalized rotations in order to reduce the complexity of the algorithm. This section will pursue the implementation of Equations 3.4-3.6 using the CORDIC algorithms.

Consider first the matrix product $VAN$ (Equation 3.4), which corresponds to rotations of the column vectors of $A$ through $\vartheta_\nu$ and $\vartheta_\eta$. Prior to performing a rotation, it is necessary to compute the angle (e.g. $\vartheta_\nu$ from $\nu$). This operation represents considerable overhead, since one CORDIC operation is required to calculate $x^c$ and a second operation is then needed to calculate $\vartheta_x$ from $x$. Due to the sparseness of $A$, it is preferable to implement the rotation by $\vartheta_\eta$, i.e. the product $AN$, with straightforward multiplication (also using a CORDIC processor). This avoids the need to calculate $\vartheta_\eta$. However, the product $AN$ is no longer sparse, and the rotation of its columns through $\vartheta_\nu$ is most efficiently realized as a vector rotation.
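The two-operation overhead can be made concrete. The sketch below rests on three assumptions: the usual normalized-ladder complement $x^c = (1 - x^2)^{1/2}$ for $|x| < 1$, the standard CORDIC vectoring results, and scale factors that have already been removed. Under these assumptions the two macrocycles compute:

$$\text{hyperbolic vectoring } (m=-1),\ (x_0, y_0, z_0) = (1,\ x,\ 0):\qquad x_n = \sqrt{1 - x^2} = x^c$$

$$\text{circular vectoring } (m=1),\ (x_0, y_0, z_0) = (x^c,\ x,\ 0):\qquad z_n = \tan^{-1}\frac{x}{x^c} = \sin^{-1} x = \vartheta_x$$

The second operation cannot start until the first has delivered $x^c$. For a matrix sparse enough to multiply directly, both operations are saved.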

A two p ro c e sso r realizatio n of one la d d e r filte r s ta g e is shown in Figure

6.1 . H ere, it is assu m ed th a t b o th p ro c e s s o rs c a n co m p u te all the CORDIC

functions, th e p a rtic u la r fu nction being s e le c te d by a s e t of control signals

w hich r e p r e s e n t th e value of 'ml and id en tify th e zero fo rced variable 2 or

y in th e CORDIC algorithm . F u rth e rm o re , th e spu rio u s scale fac to rs which

a p p e a r in th e CORDIC re c u rsio n s a re a ssu m e d to b e norm alized out and th e

convergence reg io n of th e algorithm s is a ssu m e d to be e x ten d ed by one of

th e m eth o d s of C h ap ter Four.

Each time slot in Figure 6.1 corresponds to one complete CORDIC operation and is referred to as a "macrocycle". It is further subdivided into many "microcycles", each of which corresponds to one iteration of the CORDIC algorithm.
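The division of a macrocycle into microcycles can be sketched in software. The model below is a floating-point version of the standard unified CORDIC iteration (Walther's form, whose sign conventions may differ from those of Chapter Four). It uses the usual shift sequences and divides the scale factor out at the end, standing in for the normalization assumed above. It does not include the convergence-extension methods of Chapter Four, so arguments must stay within the basic convergence region. Each pass of the loop is one microcycle, and one call is one macrocycle. The function names are hypothetical.

```python
import math


def shift_sequence(m, n):
    """Shift index for each of the n microcycles (standard choice, an assumption).
    Circular and linear modes use 0, 1, 2, ...; hyperbolic mode starts at 1 and
    repeats shifts 4, 13, 40, ... so that the iteration converges."""
    if m != -1:
        return list(range(n))
    seq, k, repeat = [], 1, 4
    while len(seq) < n:
        seq.append(k)
        if k == repeat and len(seq) < n:
            seq.append(k)
            repeat = 3 * repeat + 1
        k += 1
    return seq


def elementary_angle(m, k):
    """alpha_k, the constant the z-channel takes from its ROM."""
    t = 2.0 ** -k
    if m == 1:
        return math.atan(t)
    if m == 0:
        return t
    return math.atanh(t)


def cordic(x, y, z, m, vectoring, n=32):
    """One macrocycle of n microcycles of
         x <- x - m*d*2^-k*y,  y <- y + d*2^-k*x,  z <- z - d*alpha_k
    with m = 1 (circular), 0 (linear) or -1 (hyperbolic).
    Rotation mode drives z -> 0 (d from the sign of z); vectoring mode drives
    y -> 0 (d from the sign of y, assuming x > 0)."""
    gain = 1.0
    for k in shift_sequence(m, n):          # one microcycle per pass
        t = 2.0 ** -k
        if vectoring:
            d = -1.0 if y >= 0 else 1.0
        else:
            d = 1.0 if z >= 0 else -1.0
        x, y = x - m * d * t * y, y + d * t * x
        z -= d * elementary_angle(m, k)
        gain *= math.sqrt(1.0 + m * t * t)  # scale factor of this microcycle
    return x / gain, y / gain, z            # idealized scale-factor removal
```

Chaining two calls reproduces the two-operation angle computation described earlier in this section. A hyperbolic vectoring call started from $(1, \nu, 0)$ yields $\sqrt{1-\nu^2}$ in $x$. A circular vectoring call on that result then yields the angle in $z$.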

Figure 6.1: CORDIC Implementation of Square Root Ladder Form (CORDIC blocks 1 and 2 over five time slots; modes HYP, LIN and CIRC; operations include computing complements and angles, rotations, and a final hyperbolic rotation; $R_i$ denotes scratchpad register $i$)

During $\tau = 1, 2$, the first CORDIC block computes the matrix product $AN$, which is straightforward due to the simple structure of $A$. Simultaneously, the second block prepares for the rotation through $\vartheta_\nu$ (or multiplication by $V$) by computing the angle as

$$\vartheta_\nu = \tan^{-1}\frac{\nu}{\nu^c}$$

The two column vectors of $AN$ are then rotated through $\vartheta_\nu$, one column in each processor. Subsequently, the final $J$-rotation (Equation 3.6) is performed during $\tau = 5$, after computing its "angle" $\vartheta_\rho$ as

$$\vartheta_\rho = \tanh^{-1}\rho^+$$

The updated variables $\rho^+$, $\nu^+$ and $\eta^+$ appear in various time slots during the computation.
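A hyperbolic rotation through $\vartheta_\rho = \tanh^{-1}\rho^+$ has a particularly simple matrix form, because $\cosh\vartheta_\rho = (1-(\rho^+)^2)^{-1/2}$ and $\sinh\vartheta_\rho = \rho^+(1-(\rho^+)^2)^{-1/2}$. The signs of the off-diagonal terms in Equation 3.6 may differ from this generic form:

$$\begin{bmatrix}\cosh\vartheta_\rho & \sinh\vartheta_\rho\\ \sinh\vartheta_\rho & \cosh\vartheta_\rho\end{bmatrix} = \frac{1}{\sqrt{1-(\rho^+)^2}}\begin{bmatrix}1 & \rho^+\\ \rho^+ & 1\end{bmatrix}, \qquad |\rho^+| < 1.$$

Hence a single hyperbolic-mode CORDIC operation in rotation mode, with $\vartheta_\rho$ loaded into $z$, applies the normalized $J$-rotation. No explicit division or square root is needed.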

The ladder recursions are readily implemented with two "perfect" CORDIC processors in five CORDIC operations. This raises three issues. First, is it possible to design a "perfect" CORDIC, i.e., one which does not suffer from the scaling and convergence problems of the CORDIC algorithms pointed out in Chapter Four? Secondly, can these processors be made to operate sufficiently fast for the target application? Finally, how should this chip be designed? The last point will be examined first, and a detailed answer to the second question will naturally arise. Note at the outset that one of the contributions of Chapter Four was to provide extremely low overhead solutions to the scaling and convergence shortcomings of the CORDIC algorithms, thus answering the first point.


6.2 LADDER FORM CHIP ARCHITECTURE

Prior to embarking on a chip design, it is necessary to establish the design constraints, which are now itemized:

1) The chip will be fabricated in a single polysilicon, dual threshold, n-channel silicon gate MOS process. This decision is based on the relatively good availability of such processes, as well as their omnipresence in the academic VLSI community.

2) A single chip should be capable of implementing at least one, and preferably more, stages of a ladder filter with an input rate of 8000 16-bit samples per second. Multiple chips should be easily interfaced, providing protocol-free data transfer, so that filters of large order and multiprocessor arrays (data flow architectures) may be readily constructed.

3) Fixed point CORDIC algorithms will be utilized, since many signal processing algorithms, especially square root normalized ladder forms, are conducive to that.

4) The chip should be microprogrammed for flexibility and ease of design. This will allow rapid modification of the control program in applications other than speech analysis.
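Constraint 2 fixes a time budget. The hypothetical helper below is not part of the design; it only turns the sample rate into a per-microcycle time. It makes three assumptions: one ladder stage takes five macrocycles (the five time slots of Figure 6.1), the stages on a chip are processed one after another, and each macrocycle contains a certain number of microcycles. That last number has not been fixed at this point in the text, so the value 20 below is only a placeholder.

```python
def microcycle_budget_ns(sample_rate_hz, stages_per_chip,
                         macrocycles_per_stage, microcycles_per_macrocycle):
    """Longest allowed microcycle (in ns) if every stage on the chip must be
    processed once per input sample, one macrocycle after another."""
    sample_period_s = 1.0 / sample_rate_hz
    macrocycle_s = sample_period_s / (stages_per_chip * macrocycles_per_stage)
    return 1e9 * macrocycle_s / microcycles_per_macrocycle


# Placeholder figures: 5 macrocycles per stage, 20 microcycles per macrocycle.
for stages in (1, 2, 4):
    print(stages, round(microcycle_budget_ns(8000, stages, 5, 20), 1))
```

Each doubling of the number of stages per chip halves the time available per microcycle. This is the throughput trade-off taken up in Section 6.3.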

These constraints suggest the general structure of Figure 6.2. It consists of two CORDIC processors, some scratchpad memory, input/output (I/O) and a microprogram controller. Whether or not to include the controller on the same chip as the arithmetic facility is an interesting issue. (Arithmetic facility here refers to the CORDIC blocks, some scratchpad area, I/O and bus structure.) An onboard controller is attractive in many applications, but it makes chip testing difficult. Furthermore, a ladder filter constructed of many of these chips can be operated with a single controller, since each chip in the ladder performs the same operations. For development purposes, the decision was made to have a separate controller; however, the option to integrate it at a later date remains.

Figure 6.2: Dual-CORDIC Chip Architecture (CORDIC blocks 1 and 2, input port, scratchpad registers, output port, microprogram controller)

Only three scratchpad locations are actually required to execute one stage of the ladder algorithms, as indicated in Figure 6.1. However, a total of eight will be included in the prototype chip to allow for applications that are more storage intensive.

The input and output ports are completely synchronous ports without any handshake protocol. Such ports are most natural for ladder forms and the arrays of Chapter Five, which are inherently data flow architectures with local connectivity. Multiple chips may be readily connected without upsetting the ladder data flow. Since many real time signal processing applications are of a data flow variety, the I/O structure is quite generally applicable.

The implementation of a single CORDIC processor will be considered in detail next. The architectural definition of the arithmetic facility will then be completed by specifying the bus structure. Discussion of the controller is deferred to a later section.

6.3 DESIGN OF A CORDIC PROCESSOR

In focusing attention on a single CORDIC processor of Figure 6.2, the following questions must be resolved.

1) How are the scaling and convergence problems of the CORDIC algorithms to be solved?

2) The CORDIC iterations involve three equations. Will these be executed concurrently or in sequence?

3) Is the arithmetic in the CORDIC equations to be done bit serially or bit parallel?

4) How are the truncation errors that occur in the CORDIC iterations handled?

All arithmetic will be two's complement in the interest of simplicity and speed. Item (4) is readily handled by performing all internal arithmetic to 20 bits, since the additional four guard bits guarantee zero error due to truncation in sixteen CORDIC iterations [Wa71]. Items (2) and (3) have a profound impact on chip area and throughput. Parallel execution of equations and arithmetic will clearly result in higher chip throughput, but also in more chip area and hence reduced yield. However, for a filter of large order, fewer high-throughput chips would be required. A single such chip can compute more stages of the filter than a single, serially based, lower throughput chip. The bit serial arithmetic realizations offer potential advantages of reduced power and pinout. A specific configuration will clearly be chosen based on the problem to be solved.
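The reason four guard bits suffice for sixteen iterations can be checked directly. Each truncated shift loses less than one least significant bit (LSB), so sixteen iterations lose less than sixteen LSBs. With 20 bits, that is less than one LSB of a 16-bit result.

The sketch below checks this bound for linear-mode CORDIC. Linear mode is chosen only because its x-channel stays constant, which keeps the bookkeeping simple. The sketch compares the truncated fixed-point result with an exact rational computation that uses the same rotation directions. Python's `>>` on a negative integer is a flooring (arithmetic) shift, which models truncating a two's complement word. The function name is hypothetical.

```python
from fractions import Fraction

N_ITER = 16


def linear_cordic_fixed(x, z, frac_bits, n=N_ITER):
    """Linear-mode CORDIC (y_n ~ x*z) with frac_bits fractional bits.
    Returns (truncated fixed-point y, exact y for the same directions d_k)."""
    scale = 1 << frac_bits
    xi, zi = int(x * scale), int(z * scale)   # quantized operands
    x_q = Fraction(xi, scale)
    y_fix, y_exact = 0, Fraction(0)
    for k in range(n):
        d = 1 if zi >= 0 else -1
        y_fix += d * (xi >> k)                # truncated shift: loses < 1 LSB
        y_exact += d * x_q / (1 << k)         # same step without truncation
        zi -= d * (scale >> k)                # 2^-k is exact in this format
    return Fraction(y_fix, scale), y_exact


y20, exact20 = linear_cordic_fixed(0.7, -0.45, 20)
print(float(abs(y20 - exact20)) * 2 ** 16)   # accumulated error in 16-bit LSBs
```

With 16 fractional bits, the same bound allows up to sixteen LSBs of accumulated truncation error.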

Returning now to item 1) above, recall that some existing as well as new solutions to the convergence and scaling problems were studied in Chapter Four. These ranged from simple ideas to elaborate schemes of prerotations and scaling cycles which involved special control and hardware. The importance of this choice is not to be understated, in view of its marked effect on chip and control complexity as well as throughput. Clearly the superior scheme is the new one of Section 4.2, since it involves no hardware or control overhead and incurs only a minimal speed penalty. Furthermore, since the scheme relies only on the standard CORDIC recursions, no special operational capability beyond the shift-and-add requirement need be built into each CORDIC processor. The sequence of Example 4.2 will be employed with $m = 1$, and a similar sequence can be generated for $m = -1$.

Given this decision, various architectures for the arithmetic facility will be studied next.

6.3.1 The Fully Parallel CORDIC Block

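The CORDIC equations are recalled for convenience:

$$\begin{bmatrix} x_{i+1} \\ y_{i+1} \end{bmatrix} = \begin{bmatrix} 1 & m\,\delta_i \\ -\delta_i & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix}, \qquad z_{i+1} = z_i - \varepsilon\,\alpha_i$$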

Suppose that all three CORDIC equations are executed concurrently using bit parallel arithmetic. This configuration will be known as the "fully parallel" approach. It exhibits the highest throughput of all the architectures to be considered, while occupying the most area. Since the chosen scaling technique makes use of the standard CORDIC equations, the major logic components required are adders and two's complement parallel scalers. Figure 6.3 shows the fully parallel architecture, whose register transfer language (RTL) description is:

    t_i:  Φ1:  BUF_X ← X;  BUF_Y ← Y;  BUF_Z ← Z
               SHFT_X ← BUF_X;  SHFT_Y ← BUF_Y;  ZAU_a ← ROM
          Φ2:  X ← XAU;  Y ← YAU;  Z ← ZAU

where t_i is the i-th microcycle, and SHFT_X and SHFT_Y refer to the scalers in the X and Y channels respectively. (Note that control information is assumed to be valid throughout the microcycle.)

Figure 6.3: The Fully Parallel CORDIC Block (one add/subtract arithmetic unit per channel)

This description is deliberately written in a two-phase clock format, as this is the realization chosen for the architecture. It is nevertheless quite a general description, because Φ1 and Φ2 may be viewed as the active and inactive phases of some clock C.

During the first phase, Φ1, data is transferred from the data registers through the buffers to the scalers and arithmetic units. This is the critical timing path. At the end of the second phase, the new results emerging from the AUs are written back into the data registers, completing one iteration of the CORDIC equations. Since data is only written back on Φ2, the return path can be precharged on Φ1, resulting in a speed advantage. The z-channel is particularly simple, since one of its operands, α, is supplied by a ROM. A control signal based on the sign of either y or z is used to select the direction of rotation at each iteration, thus realizing the function to be computed. For example, in computing the arctan function, the goal is to drive $y_n \to 0$, thus making the sign of y the criterion for selecting the direction of rotation. Notice that an additional signal, based on ε, is required for the z-channel to define the direction of rotation (see Section 4.1).

Each channel (i.e. X, Y, Z) of the fully parallel architecture has dedicated busses for the read and write functions. In fact, a single bus may be shared, since these functions occur on alternate clock phases. Unfortunately, precharging is then precluded with a two-phase clock scheme. Whether or not precharging would realize a significant speed advantage is a question which can only be answered through circuit simulation and a detailed knowledge of the process parameters (the latter is frequently not available in a university environment). Since the mixed parallel-serial approaches of the next sections are of primary interest in this thesis, the detailed development of the fully parallel system is left to the reader.

Some development and layout was, however, necessary in order to obtain chip size estimates for comparison purposes. These will be discussed later. It suffices to state that they were based on three components:

- pseudo-static registers, refreshed once per clock;
- a barrel shifter [MC80], augmented to propagate the sign for the arithmetic scaling;
- a fully active, two's complement arithmetic unit.

An active carry circuit provides the ability to propagate the carry signal rapidly. In contrast, the Manchester carry chain [MC80] performs poorly in propagating high levels.

A particularly simple design of the AU is possible, since this unit is only required to add and subtract. Addition can be done with one full adder [Pe72] per bit position and a half adder in the least significant bit position. However, if the latter is also made a full adder, a two's complement addition (i.e. a subtraction) is readily obtained. The subtrahend is logically complemented and added to the other operand, yielding a one's complement addition. Forcing a carry into the least significant position results in a two's complement operation. This avoids the need to wait for the end-around carry [Hw79] required by one's complement addition.
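The add/subtract scheme can be modelled at the bit level. This is a behavioural sketch with hypothetical function names, not the circuit of Figure 6.4. It chains one full adder per bit position, including the least significant one. For subtraction, it inverts the subtrahend and forces a carry of 1 into the least significant slice.

```python
WIDTH = 20  # internal word length (20 bits, as chosen in Section 6.3)


def full_adder(a, b, carry_in):
    """One bit slice: sum and carry-out of three input bits."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def to_bits(value, width=WIDTH):
    """Two's complement bits of value, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Signed value of an LSB-first two's complement bit list."""
    value = sum(bit << i for i, bit in enumerate(bits))
    return value - (1 << len(bits)) if bits[-1] else value


def add_sub(a, b, subtract, width=WIDTH):
    """a + b, or a - b computed as a + (one's complement of b) + 1."""
    b_bits = to_bits(b, width)
    if subtract:
        b_bits = [1 - bit for bit in b_bits]   # invert the subtrahend
    carry = 1 if subtract else 0               # forced carry into the LSB slice
    out = []
    for a_bit, b_bit in zip(to_bits(a, width), b_bits):
        s, carry = full_adder(a_bit, b_bit, carry)
        out.append(s)
    return from_bits(out)                      # carry out of the MSB is dropped
```

Because the forced carry enters at the least significant position, no end-around carry has to propagate after the addition completes.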

A logic diagram of the AU is given in Figure 6.4. Notice that an inverter has been included to form the one's complement of an operand during a subtraction. Strictly speaking, the inverter can be eliminated by adding a single coupler to the register cell, which puts the complement of the register contents onto the bus (Figure 6.5). Unfortunately, the control line needed in each register to activate this coupler consumes more area than the extra inverter.

Figure 6.4: Bit Slice of Arithmetic Unit (inputs $a_i$, $b_i$; sum $S_i$; carry to position $i+1$; ADD control)

Figure 6.5: A Register Cell (control lines WRITE, REFRESH, READ, READ NEGATIVE)

In addition to being a specific realization of the CORDIC block, the fully parallel structure also serves as a functional architecture for the other configurations to be studied.

6.3.1.1 Pipelining

The individual CORDIC iterations may be pipelined as shown in Figure 6.6, which is particularly convenient in a data flow architecture. This method essentially leads to a distributed scaler (unlike Figure 6.3), in which a small shifter supporting one or two shift values is built for each iteration of the algorithm. Interestingly, such a realization is likely to be smaller in an MOS technology than that of Figure 6.3, since the select lines to the scaler are no longer needed. The intermediate storage required between iterations is readily accommodated by the pseudo-static node storage afforded by the MOS technology.

The computation rate (clock rate) of the structure of Figure 6.6 may now be selected through a combination of throughput and latency requirements.
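Below is a behavioural model of a pipelined block. Each stage has its shift hard-wired, so no barrel shifter is needed. A new operand enters every clock, and results emerge one per clock after a latency equal to the number of stages. The sketch assumes circular rotation mode, sixteen stages, and floating-point arithmetic with the scale factor divided out at the output. The names are hypothetical.

```python
import math

N_STAGES = 16


def make_stage(k):
    """Stage k: one CORDIC iteration with a fixed shift of k bits."""
    t = 2.0 ** -k
    alpha = math.atan(t)               # constant for this stage, no ROM lookup

    def stage(state):
        x, y, z = state
        d = 1.0 if z >= 0 else -1.0    # rotation mode: drive z toward 0
        return (x - d * t * y, y + d * t * x, z - d * alpha)

    return stage


STAGES = [make_stage(k) for k in range(N_STAGES)]
GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -k) for k in range(N_STAGES))


def run_pipeline(angles):
    """Feed one angle per clock; return a list of (clock, cos, sin) outputs."""
    regs = [None] * N_STAGES           # pipeline register after each stage
    out = []
    stream = list(angles) + [None] * N_STAGES   # trailing bubbles flush the pipe
    for clock, angle in enumerate(stream):
        entry = None if angle is None else (1.0, 0.0, angle)
        # Every register loads at once from the previous register contents.
        regs = [STAGES[0](entry) if entry is not None else None] + [
            STAGES[i](regs[i - 1]) if regs[i - 1] is not None else None
            for i in range(1, N_STAGES)
        ]
        if regs[-1] is not None:
            x, y, _ = regs[-1]
            out.append((clock, x / GAIN, y / GAIN))
    return out
```

In this model the clock period is set by a single stage, which determines throughput, while latency grows with the number of stages. This is the trade-off mentioned above.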

6.3.2 The Parallel-Serial CORDIC Block

"Parallel-serial" refers to the realization in which the CORDIC equations are executed sequentially, but with bit parallel arithmetic. With reference to the architecture shown in Figure 6.7, the RTL description is:

    t1:   BUF ← X;  SHFT ← Y
    Φ2:   TEMP ← AU           /* TEMP is a scratchpad register */
    t2:   BUF ← Y;  SHFT ← X
    Φ2:   Y ← AU
    t3:   BUF ← Z;  AU ← ROM (α_i);  X ← TEMP
    Φ2:   Z ← AU

Figure 6.7: The Parallel-Serial CORDIC Block

N otice th a t th e d a ta r e g is te r s m ay b e re a d on two b u sses so th a t b o th

AU o p eran d s a re a c c e sse d sim ultaneously, saving co n sid erab le tim e over

sequential fe tc h schem es. B eyond th e obvious c o m p o n e n t red u c tio n , th e

m ajo r difference b etw een th is sc h em e and th e fully p a ra lle l a p p ro a c h is th e

n eed to buffer th e new value, xi+1, u n til T/i+i has b e e n co m puted, since th e

la tte r q u a n tity re q u ire s th e value Xj. This buffer could be e ith e r one of th e

s c ra tc h p a d lo catio n s or an ad ditional location b u ilt a d ja c e n t to th e x -

re g iste r. The ev en tu al tra n s fe r of Xi+i fro m th e buffer to th e x r e g is te r is

done on of C3. W hereas previously all writing was p e rfo rm e d during <S?2,

m aking this tra n s fe r on saves a n e n tire clock cycle, i.e., it is n o t

n e c e ssa ry to w ait fo r th e s u b se q u e n t $2 o ccurring in C4. However, if th e

buffer was one of th e s c ra tc h p a d re g is te rs , being tra n s fe rre d to x on

WRBUS, th e n p rec h a rg in g of th is bus during is n e c e ssa rily p rec lu d ed , a t

le a s t during C3. This im poses a severe size p e n a lty on th e a rith m e tic u n it

and s c ra tc h p a d re g is te rs sin c e th ey a re th e n re q u ire d to pull th e bus in

b o th d irectio n s. A p re fe ra b le solution is to build a sm all, slow buffer

r e g is te r a d ja c e n t to th e X re g is te r w ith a short, d e d ic a te d tra n s fe r p ath.

Since WRBUS will th e n no lo n g e r be re q u ire d fo r TEMP -» X , it m ay be

p re c h a rg e d as before. The s h o r t tra n s fe r p a th to X m ay b e p re c h a rg e d on

$2 a n d u se d on $ lt or vice v e rsa , th e la tte r being p re fe ra b le since YfRBUS

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
- 225 -

and the dedicated transfer path to X may be precharged with the same signal. An alternate approach is to not precharge the transfer path at all and to utilize the entire clock cycle to effect the transfer. Which option is preferable can be ascertained precisely through circuit simulation for the specific process being used.
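A small sketch shows why the buffer is needed. It assumes the usual circular-mode CORDIC step $x_{i+1} = x_i - \sigma_i 2^{-i} y_i$, $y_{i+1} = y_i + \sigma_i 2^{-i} x_i$, uses integers with an arithmetic right shift in place of the fixed-point registers and the $2^{-i}$ scaler, and models the single write path by producing one result at a time. The function names and the `temp` variable are hypothetical.

```python
def parallel_iteration(x: int, y: int, i: int, sigma: int):
    """Reference CORDIC step: both results computed from the old x_i, y_i."""
    return x - sigma * (y >> i), y + sigma * (x >> i)


def sequential_iteration(x: int, y: int, i: int, sigma: int,
                         use_buffer: bool = True):
    """Same step with one result written at a time (single write path).

    With use_buffer=True, x_{i+1} is parked in a TEMP register so that the
    y update still reads the old x_i; TEMP -> X is done last.
    """
    if use_buffer:
        temp = x - sigma * (y >> i)   # x_{i+1} held in TEMP
        y = y + sigma * (x >> i)      # y_{i+1} still sees x_i
        x = temp                      # TEMP -> X transfer
    else:
        x = x - sigma * (y >> i)      # x_i overwritten too early ...
        y = y + sigma * (x >> i)      # ... so y_{i+1} wrongly uses x_{i+1}
    return x, y


if __name__ == "__main__":
    x0, y0 = 12000, -7000             # integers stand in for fixed-point data
    print(parallel_iteration(x0, y0, 2, 1))
    print(sequential_iteration(x0, y0, 2, 1))
    print(sequential_iteration(x0, y0, 2, 1, use_buffer=False))
```

Without the buffer, the y update reads the already overwritten x, and the result departs from the reference step.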

This realization of the CORDIC block has the logic built under a three-bus structure for the most effective area utilization. Both AU operands are accessed simultaneously, via direct drive of metal busses through couplers, while the third bus is used to write the result back into one of the registers. Parallel access of the AU operands results in the best throughput.

The bus structure for a single bit path is shown in Figure 6.6. MOS technology affords a particularly noteworthy method of bus control through the use of couplers. The scratchpad area is also neatly accessed by effectively distributing the multiplexing so that it is local to each register. All registers and the arithmetic unit are similar to those of the parallel realization.

6.3.3 The Serial-Parallel Realization

When bit-serial arithmetic is employed to compute all three CORDIC equations concurrently, the realization is termed serial-parallel. While this and the fully serial approach of the next section are included here for completeness, it is to be emphasized that these were developed by Peng Ang [An80] based on the functional architecture of Section 6.5.1.

The details of the CORDIC block are shown in Figure 6.9. The I-register indicated in that figure acts as a controller to the $2^{-i}$ scaler: it selects which of the 20 bits in the X and Y registers get fed directly to the full


(Figure: CORDIC registers (write; read on BUSS1 and on BUSS2), AU buffer register (write; read on BUSS1 only), barrel scaler, arithmetic unit with operation control, and scratchpad registers 0 to 7, each with its own read and write controls.)

Figure 6.8: Bus Structure of Parallel-Serial Architecture


(Figure: block diagram with AU and buffer units, the I-register/shifter multiplexer, and sign control and mode inputs.)

Figure 6.9: The Serial-Parallel CORDIC Block


adder. It is pertinent to note that the complexity of the scaler of a parallel implementation is completely bypassed in the serial arithmetic approach.
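At the bit level, the I-register can stand in for a parallel scaler, as this sketch illustrates. It assumes 20-bit two's-complement registers read least significant bit first: at S-cycle j the full adder receives bit j of X and bit j+i of Y, with the sign bit repeated once j+i runs past the top of the register (an arithmetic shift). The function names are hypothetical, and the sketch models only one serial add or subtract, not the complete three-equation block.

```python
WIDTH = 20  # assumed register length (the 20-bit X, Y, Z quantities)


def to_bits(value: int, width: int = WIDTH) -> list:
    """Two's-complement bits of value, least significant first."""
    return [(value >> j) & 1 for j in range(width)]


def from_bits(bits: list) -> int:
    """Inverse of to_bits: interpret LSB-first bits as two's complement."""
    value = sum(b << j for j, b in enumerate(bits))
    if bits[-1]:
        value -= 1 << len(bits)
    return value


def serial_shift_add(x: int, y: int, i: int, subtract: bool = False,
                     width: int = WIDTH) -> int:
    """Compute x + 2**-i * y (or x - 2**-i * y) one bit per S-cycle.

    The I-register role is modelled by `tap`: at S-cycle j the adder reads
    bit j+i of Y, and the sign bit is repeated once j+i passes the top bit,
    which is exactly an arithmetic right shift by i.
    """
    xb, yb = to_bits(x, width), to_bits(y, width)
    carry = 1 if subtract else 0             # two's-complement subtraction
    out = []
    for j in range(width):                   # one S-cycle per bit, LSB first
        tap = min(j + i, width - 1)
        a = xb[j]
        b = yb[tap] ^ int(subtract)          # invert the shifted operand
        out.append(a ^ b ^ carry)            # full-adder sum
        carry = (a & b) | (carry & (a ^ b))  # full-adder carry
    return from_bits(out)
```

The result wraps modulo $2^{20}$, as a fixed-length hardware register would.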

Timing is hierarchically organized into three levels. These comprise

(a) the macro (or T) cycles
(b) the iteration (or I) cycles
(c) the serial (or S) cycles.

There are twenty I-cycles nested within each macro cycle, one for each of the twenty CORDIC iterations performed on the 20 bit quantities. Further embedded within each I-cycle are twenty S-cycles, one for each bit processed by the serial arithmetic during an individual CORDIC iteration. At this level, the iterations involving the X, Y and Z registers of the respective CORDIC blocks are serially computed and recirculated back into their X', Y' and Z' registers. At the start of a new iteration cycle, the controller invokes a parallel load from the primed registers to their non-primed counterparts.
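The three timing levels can be sketched as nested loops. This assumes, as the parallel load at the start of each iteration cycle implies, that each I-cycle is one CORDIC iteration and each S-cycle moves one bit through the serial adders. The event names are hypothetical, and the sketch only enumerates the schedule.

```python
from collections import Counter
from typing import Iterator, Optional, Tuple

Event = Tuple[str, int, int, Optional[int]]  # (kind, T, I, S)


def schedule(macro_cycles: int = 1, iterations: int = 20,
             bits: int = 20) -> Iterator[Event]:
    """Yield timing events in order: T cycles contain I cycles contain S cycles."""
    for t in range(macro_cycles):
        for i in range(iterations):
            # Start of an iteration cycle: X', Y', Z' -> X, Y, Z in parallel.
            yield ("parallel_load", t, i, None)
            for s in range(bits):
                # One bit of the X, Y and Z updates leaves the serial adders
                # and is recirculated into the primed registers.
                yield ("serial_bit", t, i, s)


if __name__ == "__main__":
    print(Counter(kind for kind, *_ in schedule(macro_cycles=2)))
```

Each macro cycle therefore contains 20 parallel loads and 400 S-cycles.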

6.3.4 The Fully Serial CORDIC Block

Bit-serial arithmetic is used to compute the three CORDIC equations in sequence in the fully serial block. It is essentially a third of the serial-parallel approach and so will not be covered in any detail (refer to [AMLA81]). It will be seen that for the target application, this structure does not exhibit sufficient throughput in a current conservative nMOS technology.

6.4 ARCHITECTURAL TRADEOFFS


A COMPARISON OF THE CORDIC REALIZATIONS

Some statistics on the four CORDIC block realizations, particularly size, have already been mentioned. These were based on sixteen bit scratchpad


registers and input samples, with twenty bit quantities X, Y, Z; the additional four bits are added to the CORDIC blocks to reduce the effects of roundoff at each step of the iteration. A more complete set of performance figures is given in Figure 6.10. Comparisons with the Speak and Spell synthesizer by Texas Instruments Inc. [WB78] as well as with the Bell Laboratories Echo Canceller chip [CD80] are also included in that figure. Some of the quantities in the table are quite subjective, so the reasoning behind them will be given here.

Arithmetic Area: This area corresponds to the two CORDIC blocks and the scratchpad registers. In the case of Speak and Spell, this is the area of the synthesizer chip without the integrated digital to analog converter. Unfortunately, it was not possible to separate out the arithmetic area from that occupied by the control functions for the echo canceller chip. Size estimates for the CORDIC chips are based on a five micron nMOS technology. It is seen that all four CORDIC realizations are smaller than the commercial chips by a considerable margin. Interestingly, the fully parallel CORDIC realization is less than twice the size of the serial-parallel technique (16 mm² versus 9 mm²). The latter method requires some buffer registers which are not necessary in the parallel implementation, as is evidenced from Figures 6.6 and 6.8. Hence, the parallel approach puts this area to good use, resulting in a much smaller area difference between the two than might originally have been speculated.

It has been noted that the CORDIC speech analysis chips are smaller than both commercial efforts, but it is only fair to mention that industrial designs are required to be robust to the processing variation of the fabrication facility. Consequently, they are based on more


LADDER FORM SPEECH ANALYSIS CHIP

                             Full       Parallel-  Serial-    Full      Speak &       Echo
                             Parallel   Serial     Parallel   Serial    Spell         Canceller
Arithmetic area (relative)   1.76       1.43       1.0        0.7       3.1           -
Arithmetic area (mm²)*       16         13         9          6.3       28 (a)        72 (b)
No. of transistors*          6000       4720       4000       3100      ~12000 (a)    35000 (b)
Control complexity           1-2        1          1          >1        1-2           big
Relative throughput          20         6.67       1          0.33      12.5 (c)      126 (taps)/5
Min. clock rate* for one     0.6 MHz    1.9 MHz    12.7 MHz   38 MHz    ~0.8 MHz      ~2 MHz
stage (8 KHz speech
analysis)
Microprogram cost            ~1         ~1         ~1         ~1        >1            (random logic)
Relative design effort       1.2        1          1          <1        >>1           >>1

(a) Arithmetic area only. (b) Includes control. (c) Synthesis only.
* All quantities except those asterisked are relative.

Figure 6.10: Performance Comparison of Various Architectures



pessimistic design parameters than the typical parameters utilized in the design of the ladder form chips. While this accounts for a part of the size discrepancy, the lion's share is certainly attributable to the fact that the CORDIC implementation is well matched to the algorithms (for example, through the correct set of primitive operations), resulting in a compact realization. This is especially true given that the ladder filter chips also exhibit superior throughput compared with both Speak and Spell and the echo canceller.

Operational Speed: Minimum Clock Rate For One Stage Of The Analysis Filter With 8 KHz Sampled Speech. The title of this item is self-explanatory. It is seen that if one chip were used per stage of the ladder filter, the fully parallel chip could be clocked at a very reasonable 600 KHz, about twenty times slower than the 12.7 MHz clock rate required by the serial-parallel approach. This is quite remarkable in view of the fact that the former is less than twice the size of the latter. While 12.7 MHz appears to be quite a formidable clock rate for current day nMOS technology, it is in fact quite reasonable for the serial approaches, which exhibit extremely short propagation paths by virtue of their serial nature. In contrast, the fully serial approach is not particularly useful, since it requires a prohibitively high clock frequency (38 MHz) for the target five micron nMOS technology. It may well be useful for lower speed applications or, alternatively, if it were implemented in a faster technology such as submicron nMOS (one could argue this point forever, since in a submicron technology the fully parallel approach would be extremely desirable and favourable).

The two commercial chips are not really capable of speech analysis (at least with the chosen algorithms), since they are unable to compute the


rather complex operations required in the normalized ladder form recursions. While square roots could be computed using iterative techniques such as Newton's method [SW65], the speed overhead would be prohibitively large. However, a speed comparison with the analysis chips will still be made on the basis that analysis involves five rotations per stage of the filter. By comparison, synthesis involves a single rotation per stage. Since Speak and Spell is capable of synthesizing speech at 10000 samples/sec from a tenth order filter, it is effectively capable of 12.5 rotations at 8 KHz (ten rotations per sample at 10000 samples/sec), or roughly two stages of the analysis filter, at its operational clock rate (we are really giving Speak and Spell the benefit of the doubt here, as already mentioned). This provides its performance figure in Figure 6.10. Similarly, the echo canceller, which consists of 126 tapped delay line stages, is considered to be capable of 64 rotations for 8 KHz data.
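The Speak and Spell figure follows from a little arithmetic, counting one rotation per synthesis stage and five per analysis stage as the text does. The function name is hypothetical.

```python
ROTATIONS_PER_ANALYSIS_STAGE = 5   # from the text


def rotations_at_8khz(sample_rate_hz: float, rotations_per_sample: float,
                      target_rate_hz: float = 8000.0) -> float:
    """Rotations per 8 kHz sample that a chip of the given rate sustains."""
    return rotations_per_sample * sample_rate_hz / target_rate_hz


# Speak and Spell: tenth order synthesis lattice, one rotation per stage,
# at 10000 samples/s.
ss_rotations = rotations_at_8khz(10000, 10)                 # 12.5
ss_stages = ss_rotations / ROTATIONS_PER_ANALYSIS_STAGE     # 2.5

# Echo canceller, credited in the text with 64 rotations at 8 kHz.
ec_stages = 64 / ROTATIONS_PER_ANALYSIS_STAGE               # 12.8
```

The 2.5 analysis stages are the "roughly two" of the text. Dividing the echo canceller's 64 rotations the same way gives about 12.8 stages.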

Relative Throughput: An alternate way to compare the operational speed of the chips under consideration is to examine how many ladder stages (for 8 KHz samples) could be computed by the four CORDIC realizations at the same clock rate. That they can in fact be clocked at the same rate arises from their compact and regular layouts. Hence, even the bit parallel approaches exhibit quite small path delays, as is justified in the appendix. The chosen clock rate for comparison is 12.7 MHz, which is the design limit of the serial-parallel approach for calculating one stage of the ladder filter. Notice once again from the table the remarkable property that, for less than twice the area, the parallel architecture affords approximately twenty times the throughput. Hence, a ten stage ladder could be constructed with a single chip, compared with ten chips if the serial-parallel approach is


utilized.
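Reading Figure 6.10 this way, the relative throughput entries of the four CORDIC realizations match, to rounding, the 12.7 MHz comparison clock divided by each realization's minimum clock rate for one stage. The 38 MHz of the fully serial block is also about three times 12.7 MHz, consistent with computing the three CORDIC equations in sequence. Treating the table this way is an interpretation, and the function names below are hypothetical.

```python
import math

COMPARISON_CLOCK_MHZ = 12.7   # serial-parallel design limit, from the text

# Minimum clock rate (MHz) for one ladder stage at 8 kHz, from Figure 6.10.
MIN_CLOCK_MHZ = {
    "full parallel": 0.6,
    "parallel-serial": 1.9,
    "serial-parallel": 12.7,
    "full serial": 38.0,
}


def relative_throughput(min_clock_mhz: float,
                        clock_mhz: float = COMPARISON_CLOCK_MHZ) -> float:
    """Ladder stages computed at clock_mhz by a block needing min_clock_mhz
    for a single stage."""
    return clock_mhz / min_clock_mhz


def chips_for_ladder(stages: int, min_clock_mhz: float) -> int:
    """Chips needed for a ladder of the given order at the comparison clock."""
    return math.ceil(stages / relative_throughput(min_clock_mhz))


if __name__ == "__main__":
    for name, f in MIN_CLOCK_MHZ.items():
        print(f"{name:16s} {relative_throughput(f):6.2f} "
              f"{chips_for_ladder(10, f):3d} chips for 10 stages")
```

The fully parallel ratio comes out near 21, which the text quotes as approximately twenty. A ten stage ladder needs one fully parallel chip but ten serial-parallel chips.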

The throughput figures for the two commercial chips are inferred from the literature, based on the assumptions above, as well as on the fact that their maximum clock rates are precisely those for which they were designed to operate, i.e. they are incapable of 12.7 MHz operation. Again, this is admittedly unfair, since these designs are based on worst case processing parameters whereas the CORDIC based analysis chips have been designed with typical processing parameters. The difference in the parameters usually results in a speed factor of two. In any event, the table shows clearly that the present ladder filter chips pack much more performance into a given area, and this is due to the fact that their implementation is intimately matched to the theory of the problem, resulting in a good mapping of the algorithms onto the architectures. For example, the arithmetic unit's operation set consists of the most natural operations describing the algorithms.

The remaining figures in the table are quite subjective. For example, all four implementations are quite straightforward to control with the controller structure chosen (to be described in the next section), involving just direct interpretation of the microinstruction. Speak and Spell is also quite simple to control by virtue of its microprogrammed nature; however, it does not lend itself to being readily adapted to other applications, since it does not provide the powerful instruction set of the controller to be described. The echo canceller has a random logic controller which is quite large, difficult to alter and also time consuming to design. The microprogram cost (programming ease and storage) of the analysis chips is the same for all four approaches and probably lower than that of Speak and Spell due to the special controller architecture. Finally, the CORDIC chips


are extremely regular, since they consist only of registers, adders and multiplexers. Hence, they are much easier to design than either of the commercial chips.

Remark: The author acknowledges that some of the numbers in Figure 6.10 may be incorrect, since they were inferred from publications. My apologies to Texas Instruments and Bell Laboratories, as well as to the readers, if this is indeed the case.

6.5 THE MICROCONTROLLER

A good microprogram control strategy should have a simple structure, be easily programmed and be capable of efficiently implementing the basic operations required by the signal processing algorithms of interest. In this presentation, the controller will be described (based on the CORDIC operations being of interest) and then implementations for a variety of signal processing algorithms will be given.

Simplicity of structure and programming is realized with a two level control philosophy. The higher level or "macrolevel" of operation consists of a set of powerful instructions (Figure 6.11), few in number, which define the functional operation of the chip, e.g. as a speech synthesizer, adaptive equalizer, filter, etc. A chip user need only be concerned with this level of operation. Macrolevel instructions may invoke one or more operations at the microlevel, the second level of control. Note that while the prefix "macro" or "micro" is used to distinguish between instructions at the two levels of operation, they both form a part of the controller's microprogram. A microinstruction is a single iteration of the CORDIC recursions (the precise definition depends on which of the four implementations is used).


Macrolevel instructions are not uniform in their execution time. The CORDIC operations are considerably slower than the data transfer or the SADD and SSUB instructions. While the latter are said to require one microcycle, the former consume one macrocycle. In fact, the CORDIC instructions such as MUL, JROT, etc. are calls to subprocedures which implement the recursive CORDIC algorithms as a sequence of microlevel instructions. In order to avoid processor waiting, as well as in the interest of programming ease, macroinstructions may only invoke operations of the same group (Figure 6.11) in the two processors.
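The group 2 operations listed in Figure 6.11 can be made concrete with a floating-point sketch of the generalized CORDIC iteration in its circular, linear and hyperbolic modes, each run in rotation or vectoring mode. It is not the chip's microcode. The iteration count, the final division by the accumulated gain (standing in for scaling cycles), the lowercase function names and the stated convergence limits are assumptions of the sketch, and the vectoring modes assume X > 0.

```python
import math

MODE_M = {"circular": 1, "linear": 0, "hyperbolic": -1}


def shift_sequence(mode: str, n: int) -> list:
    """Shift indices; hyperbolic mode starts at 1 and repeats 4, 13, 40, ..."""
    if mode != "hyperbolic":
        return list(range(n))
    seq, i, repeat = [], 1, 4
    while len(seq) < n:
        seq.append(i)
        if i == repeat and len(seq) < n:
            seq.append(i)                 # repeated step for convergence
            repeat = 3 * repeat + 1
        i += 1
    return seq


def elementary_angle(mode: str, i: int) -> float:
    t = 2.0 ** -i
    if mode == "circular":
        return math.atan(t)
    if mode == "linear":
        return t
    return math.atanh(t)


def cordic(x, y, z, mode, vectoring, n=28):
    """Generic CORDIC: rotation drives z to 0, vectoring drives y to 0.

    The circular/hyperbolic gain is removed by a final division, standing
    in for the chip's scaling cycles.
    """
    m = MODE_M[mode]
    gain = 1.0
    for i in shift_sequence(mode, n):
        if vectoring:
            d = -1.0 if y >= 0 else 1.0   # assumes x > 0
        else:
            d = 1.0 if z >= 0 else -1.0
        t = 2.0 ** -i
        x, y = x - m * d * y * t, y + d * x * t
        z -= d * elementary_angle(mode, i)
        gain *= math.sqrt(1.0 + m * t * t)
    return x / gain, y / gain, z


# Group 2 operations; eps models the sign reversal bit (+1 or -1).
def mul(x, y, z, eps=1):    # Y + eps*X*Z, |Z| < 2
    return cordic(x, y, eps * z, "linear", False)[1]


def div(x, y, z, eps=1):    # Z + eps*Y/X, X > 0, |Y/X| < 2
    return cordic(x, eps * y, z, "linear", True)[2]


def atan(x, y, z, eps=1):   # Z + eps*atan(Y/X), X > 0
    return cordic(x, eps * y, z, "circular", True)[2]


def crot(x, y, z, eps=1):   # rotate [X, Y] by eps*Z, |Z| < ~1.74 rad
    return cordic(x, y, eps * z, "circular", False)[:2]


def atanh(x, y, z, eps=1):  # Z + eps*atanh(Y/X), X > 0, |Y/X| < ~0.8
    return cordic(x, eps * y, z, "hyperbolic", True)[2]


def jrot(x, y, z, eps=1):   # hyperbolic rotation of [X, Y] by eps*Z, |Z| < ~1.1
    return cordic(x, y, eps * z, "hyperbolic", False)[:2]
```

One loop body serves all six operations; only the mode constant, the direction rule and the elementary angle change. The same economy is what makes these operations attractive as a small hardware instruction set.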

A simple microcontroller structure is shown in Figure 6.12. A two port memory provides the necessary microcode to each processor. Separate program counters control program execution at the two levels, while a field of the instruction is used for address sequencing via the next address logic (NAL). All of the necessary control signals to direct the operation of the CORDIC processors, as well as the I/O and scratchpad communications, are provided by various fields of the microcode. An iteration counter, whose sequencing is controlled by the NAL, is provided for simple looping constructs. A loop is initiated with a DO instruction specifying the beginning of a construct. The final instruction of the loop body is signified by a microprogrammed (nanoprogrammed?!) LOOP bit. At this point, control returns to the address following the DO instruction and the iteration counter is decremented. Notice that a DO forever facility is provided by setting n = 255.
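The loop mechanism can be modelled in a few lines. This is a hypothetical software model of the next address logic. It assumes a single iteration counter (so no nested loops), that DO n runs the body n times, and that the LOOP bit rides on the last body instruction. The instruction tuples and `run` are illustrative names only.

```python
from typing import List, Optional, Tuple

Instr = Tuple[str, Optional[int], bool]   # (mnemonic, operand, LOOP bit)
FOREVER = 255                             # DO 255 never terminates


def run(program: List[Instr], max_steps: int = 1000) -> List[str]:
    """Return the sequence of non-DO instructions executed."""
    trace: List[str] = []
    pc, counter, loop_start, steps = 0, 0, None, 0
    while pc < len(program) and steps < max_steps:
        op, arg, loop_bit = program[pc]
        steps += 1
        if op == "DO":
            counter, loop_start = arg, pc + 1  # body starts after the DO
            pc += 1
            continue
        trace.append(op)
        if loop_bit and loop_start is not None:
            if counter == FOREVER:
                pc = loop_start
                continue
            counter -= 1                       # iteration counter decrement
            if counter > 0:
                pc = loop_start
                continue
        pc += 1
    return trace


if __name__ == "__main__":
    prog = [("DO", 3, False), ("MOVE", None, False),
            ("SADD", None, True), ("NOP", None, False)]
    print(run(prog))
```

The `max_steps` guard exists only so that the DO forever case can be exercised in software.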

Both of the macrolevel processes are tightly coupled in their address sequencing by virtue of a single program counter, PC0. The two port memory appears as a wide single port device at this level of operation. Furthermore, since both processes must be from the same group, the chip


Mnemonic   Operands    Sign reversal   Operation                                Comments
                       bit (ε)
MOVE¹      src,dest    no              data transfer                            src, dest are X, Y, Z or scratchpad regs or I/O
SADD¹      k           no              X + 2^(-k)·Y                             k is a 4 bit unsigned integer
SSUB¹      k           no              X - 2^(-k)·Y                             uses sign reversal bit and SADD internally
DO¹        n           no              initiates loop for 0 < n < 254           do "forever" if n = 255
MUL²                   yes             Y + εXZ                                  multiply and accumulate
DIV²                   yes             Z + εY/X                                 divide and accumulate
ATAN²                  yes             Z + ε·tan⁻¹(Y/X)
CROT²                  yes             plane rotation of [X Y]^T by angle εZ
ATANH²                 yes             Z + ε·tanh⁻¹(Y/X)
JROT²                  yes             hyperbolic rotation of [X Y]^T by εZ

ε = ±1. Superscript 1 denotes a group 1 instruction; superscript 2 denotes a group 2 instruction.

Figure 6.11: Microcontroller Instruction Set

(Figure: a 2-port program memory whose port 1 drives CORDIC 1 and whose port 2 drives CORDIC 2.)

Figure 6.12: Microcontroller Architecture


may be viewed as an SIMD machine invoking two concurrent operations, with the wide memory output being the instruction (alternately, it can be viewed as a constrained MIMD structure). When a group 2 macroinstruction is executed, program control is transferred to PC1 and PC2 at the microlevel of operation. Now, a true MIMD structure exists, with a separate instruction for each processor. These are, of course, the actual iterations of the CORDIC algorithms. Microlevel procedures must be of the same length (again for simplicity), and program control is returned to PC0 simultaneously for both processors. It is worth noting that it is frequently not possible to achieve equal length procedures for the various CORDIC instructions, due to the need for scaling cycles and pre-rotations [HT80]. However, the execution time disparity may be reduced considerably with the methods of Section 4.2.

Finally, note that although the concurrency structure of a program is restricted by this controller (e.g. an unconstrained MIMD philosophy would be more general), many signal processing algorithms fit the controller well, for example, the algorithms to follow.
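The timing consequence of the two-level scheme can be sketched as follows. The sketch assumes one microcycle per group 1 macroinstruction. It charges a group 2 pair the longer of its two microprocedures, since both processors return to PC0 together, so the shorter procedure is effectively padded. That padding is why equal-length procedures matter. The microprocedure lengths below are placeholders, not figures from the text.

```python
GROUP1 = {"MOVE", "SADD", "SSUB", "DO", "NOP"}
GROUP2 = {"MUL", "DIV", "ATAN", "CROT", "ATANH", "JROT"}

# Placeholder microprocedure lengths in microcycles (NOT values from the text).
PLACEHOLDER_LENGTHS = {"MUL": 20, "DIV": 20, "ATAN": 22,
                       "CROT": 22, "ATANH": 24, "JROT": 24}


def group(op: str) -> int:
    if op in GROUP1:
        return 1
    if op in GROUP2:
        return 2
    raise ValueError(f"unknown mnemonic {op!r}")


def microcycles(program, lengths=PLACEHOLDER_LENGTHS) -> int:
    """Microcycles used by a list of (processor 1, processor 2) mnemonics."""
    total = 0
    for op1, op2 in program:
        if group(op1) != group(op2):
            raise ValueError(f"{op1} ; {op2} mixes instruction groups")
        if group(op1) == 1:
            total += 1                    # PC0 steps once per group 1 pair
        else:
            # PC1 and PC2 run the two microprocedures; control returns to
            # PC0 only when both finish, so the pair costs the longer one.
            total += max(lengths[op1], lengths[op2])
    return total
```

For example, `microcycles([("MOVE", "MOVE"), ("MUL", "ATAN"), ("MOVE", "NOP")])` charges the MUL/ATAN pair the ATAN length. A pair such as `("MUL", "MOVE")` is rejected.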

6.5.1 The Speech Analysis Microcode

Returning to the target application of speech analysis, it can now be made much more tangible by actually writing a program in the microcode language of this chip which executes the ladder form recursions depicted in the "flowchart" of Figure 6.1. Notice that one instruction defines an operation for each of the two processors.

MOVE 1,X ; 1,X ; Note SIMD Structure
MOVE ,Y ; v,Y
MOVE 0,Z ; 0,Z
ATANH ; ATANH ; T = 1
MOVE X,R1 ; v,Y
MOVE ρ,Z ; 0,Z
MUL ; ATAN ; Note MIMD Structure
MOVE η,X ; Z,R2
MOVE R2,Z ; η,X
MOVE NOP ; M,Z
CROT ; MUL ; T = 3
MOVE X,R3 ; R1,X
MOVE Y,R4 ; R2,Z
MOVE R1,X ; NOP
ATANH ; CROT ; T = 4
MOVE R3,X ; Y,X
MOVE R3,X ; Y,X
MOVE Z,R3 ; 0,Y
MOVE NOP ; R3,Z
JROT ; JROT ; T = 5

6.6 OTHER APPLICATIONS

Although the design of the chip has been presented in light of quite a specific application, it is actually applicable to a large class of signal processing problems. The chip architecture is a rather powerful one, consisting of two concurrent processors which operate on a very rich set of primitive operations. These are further augmented by a controller which is easily programmed to support the concurrent operation of these processors in a manner which is common to a host of signal processing applications. This section will explore the use of the chip for computing discrete Fourier


transforms (DFT), performing LPC based speech synthesis, and implementing both unnormalized ladder forms and LMS type adaptive equalizers. Extensive use of "flowcharts" similar to Figure 6.1 will be made. Some further applications of the chip which will not be presented are echo cancellation and adaptive line enhancement.

Before embarking on the description of these other applications, it is worth noting that a single CORDIC version of this chip with a simpler control strategy will often be useful. The ideas presented in the design of the present chip and controller apply directly to the realization of the single CORDIC version.

6.6.1 The Discrete Fourier Transform

Recall the DFT algorithm of Section 2.1. Notice that $x(n)W_N^{kn}$ is a rotation of the complex vector $x(n)$ through an angle $\theta_{kn} = -2\pi kn/N$, so the present chip should be ideal for implementing the DFT. Consider utilizing $N/2$ chips for an $N$ point transform. Each CORDIC block is used to compute one point of the transform, as shown in Figure 6.13. The quantity $-2\pi k/N$ is a constant for each CORDIC block, so that as samples arrive, the angle $\theta_{kn} = (-2\pi k/N)n$ is first calculated and then the current sample $x(n)$ is rotated. Two microcycles are then required to accumulate the real and imaginary parts of the transform sample. After $N$ samples, the transform calculation is complete. The final two microcycles impose very little additional speed overhead compared with the calculation of $\theta_{kn}$ and the rotation of $x(n)$.
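The following floating-point sketch models one CORDIC block computing one DFT point: the angle $\theta_{kn}$ is advanced by $-2\pi k/N$ per sample, $x(n)$ is rotated by it, and the real and imaginary parts are accumulated. The quadrant pre-rotation by $\pi$, the 24 iterations, the division by the CORDIC gain and the function names are assumptions of the sketch. The direct DFT via `cmath` is included only for comparison.

```python
import cmath
import math

ITERATIONS = 24
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))


def cordic_rotate(x: float, y: float, theta: float):
    """Rotate (x, y) by theta with circular-mode CORDIC (gain removed)."""
    theta = math.remainder(theta, 2 * math.pi)   # now in [-pi, pi]
    if theta > math.pi / 2:                      # pre-rotation by pi
        x, y, theta = -x, -y, theta - math.pi
    elif theta < -math.pi / 2:
        x, y, theta = -x, -y, theta + math.pi
    for i, a in enumerate(ANGLES):
        d = 1.0 if theta >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        theta -= d * a
    return x / GAIN, y / GAIN


def dft_point(samples, k: int) -> complex:
    """X(k) as computed by one CORDIC block while the samples x(n) arrive."""
    n_points = len(samples)
    step = -2 * math.pi * k / n_points   # constant for this block
    theta = 0.0                          # theta_kn, updated every sample
    acc_re = acc_im = 0.0
    for s in samples:
        re, im = cordic_rotate(s.real, s.imag, theta)
        acc_re += re                     # the two accumulate microcycles
        acc_im += im
        theta = math.remainder(theta + step, 2 * math.pi)
    return complex(acc_re, acc_im)


def direct_dft_point(samples, k: int) -> complex:
    n_points = len(samples)
    return sum(s * cmath.exp(-2j * math.pi * k * n / n_points)
               for n, s in enumerate(samples))
```

Calling `dft_point` for each k from 0 to N-1 corresponds to the arrangement with one CORDIC block per transform point.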

Each rotate and accumulate operation requires 36 microcycles, or clock cycles. For maximum throughput, one CORDIC block per transform point is used. Hence, a sample data rate of 210K complex samples/second is


(Figure: timing of CORDIC blocks k and k+1 over two macrocycles and two microcycles, each block switching between linear (LIN) and circular (CIRC) modes. Subscript R (I) denotes the real (imaginary) part; x'(k,l) denotes the rotated sample and X(k,l) = Σ_i x'(k,i).)

Figure 6.13: The Discrete Fourier Transform Implementation



possible with an 8 MHz clock. Of course, for lower data rate operation, the CORDIC blocks may be time shared, resulting in fewer chips for an N-point transform.
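These figures can be checked with a few divisions. At 8 MHz, 36 microcycles per sample would allow about 222K samples/s. 38 microcycles (the 36 plus the two accumulation microcycles) give about 210K, which matches the quoted rate, so the sketch assumes that the accumulation is counted on top of the 36. The chip count assumes two CORDIC blocks per chip, as in the N/2-chip arrangement, and the function names are hypothetical.

```python
import math

CLOCK_HZ = 8e6
ROTATE_MICROCYCLES = 36      # quoted in the text
ACCUMULATE_MICROCYCLES = 2   # quoted in the text


def sample_rate(clock_hz: float, microcycles_per_sample: int) -> float:
    """Complex samples/s when every block finishes one sample per
    microcycles_per_sample clock cycles (one block per transform point)."""
    return clock_hz / microcycles_per_sample


def chips_needed(n_points: int, points_per_block: int = 1,
                 blocks_per_chip: int = 2) -> int:
    """Chips for an N-point DFT, with optional time sharing of blocks."""
    blocks = math.ceil(n_points / points_per_block)
    return math.ceil(blocks / blocks_per_chip)


if __name__ == "__main__":
    print(sample_rate(CLOCK_HZ, ROTATE_MICROCYCLES))
    print(sample_rate(CLOCK_HZ, ROTATE_MICROCYCLES + ACCUMULATE_MICROCYCLES))
    print(chips_needed(64), chips_needed(64, points_per_block=4))
```

With time sharing, the chip count falls in proportion to `points_per_block`, and the supported sample rate falls by the same factor.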

Despain [De74] [De79] has studied the use of CORDIC algorithms for the DFT quite extensively. The reader is referred to his work.

6.6.2 Speech Synthesis

Many applications involve the synthesis of stored speech segments, e.g. consumer products such as Speak and Spell. The synthesis problem (Chapter Three) may be readily implemented in ladder form using a single processor variant of the present chip (Figure 6.14), because each stage of the filter is just a rotation by an amount related to the reflection coefficient of the stage.

When the synthesis application involves stored speech segments, rather than arbitrary ones as would be the case in digital telephony, the implementation can be significantly simplified by storing $\theta_n$ rather than the reflection coefficient $\rho_n$.
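The benefit of storing the angle can be illustrated with a normalized lattice, which is assumed here. In that structure each synthesis stage is the plane rotation with $\sin\theta_n = \rho_n$ and $\cos\theta_n = (1-\rho_n^2)^{1/2}$. A chip fed $\rho_n$ must therefore first spend an operation per stage recovering the angle, whereas a stored $\theta_n$ can go straight to the rotation. The stage ordering and sign convention below are one common choice, not necessarily those of Figure 6.14. The rotations use `math.cos` and `math.sin` rather than a CORDIC emulation, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def angles_from_reflection(rho: Sequence[float]) -> List[float]:
    """Offline conversion rho_n -> theta_n with sin(theta_n) = rho_n."""
    return [math.asin(r) for r in rho]


def lattice_synthesize(excitation: Sequence[float],
                       thetas: Sequence[float]) -> List[float]:
    """All-pole normalized lattice: every stage is a 2x2 plane rotation.

    Stage m maps (f_m(t), b_{m-1}(t-1)) to (f_{m-1}(t), b_m(t)).
    """
    order = len(thetas)
    b_prev = [0.0] * order          # b_j(t-1) for j = 0 .. order-1
    out = []
    for sample in excitation:
        f = sample                  # f_order(t) is the excitation
        b_new = [0.0] * order
        for m in range(order, 0, -1):
            c, s = math.cos(thetas[m - 1]), math.sin(thetas[m - 1])
            bd = b_prev[m - 1]
            f, b_m = c * f - s * bd, s * f + c * bd   # the stage rotation
            if m < order:
                b_new[m] = b_m      # b_order(t) is never used
        b_new[0] = f                # b_0(t) = f_0(t) is the output
        out.append(f)
        b_prev = b_new
    return out
```

For a single stage with $\rho = 0.5$, the impulse response is $c(-\rho)^t$ with $c = (1-\rho^2)^{1/2}$, that is, a scaled version of the response of $1/(1+\rho z^{-1})$.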

6.6.3 The Unnormalized Ladder Form

Recall the unnormalized prewindowed ladder form equations from Chapter Two:

$$\Delta_{n+1,T} = \Delta_{n+1,T-1} + \frac{\varepsilon_{n,T}\, r_{n,T-1}}{\gamma_{n-1,T-1}}$$

$$K^{\varepsilon}_{n+1,T} = \frac{\Delta_{n+1,T}}{R^{r}_{n,T-1}}, \qquad K^{r}_{n+1,T} = \frac{\Delta_{n+1,T}}{R^{\varepsilon}_{n,T}}$$

$$\varepsilon_{n+1,T} = \varepsilon_{n,T} - K^{\varepsilon}_{n+1,T}\, r_{n,T-1}$$

$$r_{n+1,T} = r_{n,T-1} - K^{r}_{n+1,T}\, \varepsilon_{n,T}$$

$$R^{\varepsilon}_{n,T+1} = R^{\varepsilon}_{n,T} + \frac{\varepsilon_{n,T+1}^{2}}{\gamma_{n-1,T}}, \qquad R^{r}_{n,T+1} = R^{r}_{n,T} + \frac{r_{n,T+1}^{2}}{\gamma_{n-1,T+1}}$$

$$\gamma_{n+1,T} = \gamma_{n,T} - \frac{r_{n+1,T}^{2}}{R^{r}_{n+1,T}}$$

where

$\Delta_{n+1,T}$ is the $(n+1)$th order partial correlation of $y^T$,

$K^{\varepsilon}_{n+1,T}$, $K^{r}_{n+1,T}$ are the forward and backward filter gains of the $(n+1)$th filter stage,

$R^{\varepsilon}_{n,T}$, $R^{r}_{n,T}$ are the covariances of the forward and backward prediction errors, and

$\gamma_{n,T}$ is a likelihood variable of $n$th order.

(Figure: per-stage flowchart over macrocycles T = 1, 2 and 3 using ATAN, MOVE and CROT instructions; each stage applies the plane rotation $\begin{bmatrix}\cos\theta_p & -\sin\theta_p \\ \sin\theta_p & \cos\theta_p\end{bmatrix}$.)

Figure 6.14: Ladder Form Speech Synthesizer

It is possible to realize these equations with the CORDIC functions when 773. = 0, provided that there is sufficient dynamic range in the fixed point storage format to represent the quantities encountered. The required range is clearly a function of the input process statistics.

Figure 6.15 shows the two CORDIC implementation of the unnormalized, prewindowed ladder form. Notice that the sign reversed CORDIC functions of Figure 4.3 have also been employed. The ease with which the sign reversal is handled by the CORDIC processors saves considerable overhead. Rotations appear to be the correct complexity measure, since the unnormalized and


Figure 6.15


normalized algorithms require roughly the same number of them per stage (since some preprocessing of the data must be done in the normalized case).
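The sign reversal amounts to flipping the direction of the Y update, so a single MUL-type pass yields Y - XZ directly. The following minimal sketch assumes a linear-mode CORDIC with |Z| < 2 and floating-point arithmetic. It applies the pass to an order update of the form $\varepsilon_{n+1,T} = \varepsilon_{n,T} - K\, r_{n,T-1}$, with the gain K, the error values and the function name all hypothetical. In the unnormalized form the gains are not bounded, which is where the dynamic range concern above enters. The sketch covers only the multiply-accumulate, not the divisions that form the gains.

```python
def cordic_mul(x: float, y: float, z: float, sign: int = 1,
               iterations: int = 28) -> float:
    """Return y + sign*x*z by linear-mode CORDIC rotation (needs |z| < 2).

    sign = -1 models the sign reversal bit: only the direction of the
    y update changes, so no separate negation step is needed.
    """
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0       # drive z toward zero
        y += sign * d * x * 2.0 ** -i     # shift-and-add on y
        z -= d * 2.0 ** -i
    return y


# Order update eps_next = eps_n - K * r_prev with hypothetical values.
eps_n, r_prev, gain_k = 0.3, 0.4, 0.75
eps_next = cordic_mul(r_prev, eps_n, gain_k, sign=-1)
```

With these values the update is 0.3 - 0.4 × 0.75, so `eps_next` comes out at essentially zero.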

6.6.4 Adaptive Equalization

The most common realization of adaptive equalizers in present day modems is a complex transversal filter (Figure 6.16) whose coefficients are adjusted using the least mean square (LMS) gradient algorithm [Wi70]. The formulation is recalled from Section 3.2:

$$\begin{aligned} z_k &= \sum_n c_n^k\, r_{k-n} \\ c_n^{k+1} &= c_n^k - \Delta\, e_k\, r_{k-n}^{*} \end{aligned} \tag{6.1}$$

where

$c_n^k$ is the $n$th complex tap coefficient at time $k$,

$r_n$ is a complex input sample (applies to all linear modulation schemes),

$z_k$ is the complex equalizer output,

$\Delta$ is a real adaptation constant, and

$e_k$ is a complex error signal supplied from elsewhere in the modem (referred to as decision feedback equalization).

The equalizer is a significant portion of the modem data processing, especially when the complex multiplications are realized as real multiply and accumulate operations. Again, this problem is more naturally represented in terms of rotations, since these are a well-known representation of complex multiplications. An implementation of one equalizer iteration is given in Figure 6.17.
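The product $c \cdot r$ rotates $r$ by $\arg c$ and scales it by $|c|$, which is why a complex multiply-accumulate maps onto rotations. The sketch below mirrors that decomposition with three hypothetical helpers named after the ATAN, CROT and MUL labels in Figure 6.17. Which operand each label acts on is an assumption. The helpers use ordinary floating-point trigonometry rather than CORDIC iterations.

```python
import math


def atan_step(c):
    """Hypothetical ATAN primitive: polar form (magnitude, angle) of c."""
    return math.hypot(c.real, c.imag), math.atan2(c.imag, c.real)


def crot_step(r, theta):
    """Hypothetical CROT primitive: rotate the vector (Re r, Im r) by theta."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return complex(r.real * cos_t - r.imag * sin_t, r.real * sin_t + r.imag * cos_t)


def mul_step(v, magnitude):
    """Hypothetical MUL primitive: scale a vector by a real magnitude."""
    return complex(v.real * magnitude, v.imag * magnitude)


def rotate_accumulate(coeffs, samples):
    """Compute sum_n c_n * r_n using only rotations and real scalings."""
    acc = 0j
    for c, r in zip(coeffs, samples):
        magnitude, theta = atan_step(c)
        acc += mul_step(crot_step(r, theta), magnitude)
    return acc


coeffs = [0.5 + 0.5j, -1.0 + 0.25j]
samples = [1.0 - 1.0j, 0.5 + 2.0j]
print(rotate_accumulate(coeffs, samples))
print(sum(c * r for c, r in zip(coeffs, samples)))  # direct complex products, for comparison
```

Both print statements give the same value up to rounding. The rotation form trades each four-multiply complex product for one angle computation, one rotation and two real scalings.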


[Figure 6.16: LMS Adaptive Equalizer]

[Figure 6.17: Adaptive Equalizer Implementation. ATAN, CROT and MUL operations over time steps T=1 and T=2, acting on the real and imaginary parts of the coefficients and of the error $e_k$; $S_R(z)$ and $S_I(z)$ are the real and imaginary partial sums of $z_k$ at order n. (Diagram not reproduced.)]


Notice that the modem prefiltering can also be done as defined by (6.1), except that the filter is time invariant. It is therefore desirable to store the rotation parameters (angle and magnitude) of each coefficient rather than the filter coefficients themselves.

All of the foregoing examples are naturally described by rotations. Efficient implementations based on the rotation framework can be obtained either with the present chip or with uniprocessors which are variations of the present architecture and control structure.

CHAPTER SUMMARY AND CONCLUSIONS

The design of a two-processor CORDIC chip was presented with the application of real-time speech analysis using ladder filters in mind. Four possible architectures, exploiting varying degrees of parallelism, were considered and compared to some existing chips. It was seen that by carefully matching architecture to algorithms, this chip offers more computation power for a given chip area.

A two-level microprogram control strategy, designed to facilitate user programming, was presented. A limited number of powerful instructions allow the user to use the two-CORDIC machine for a variety of interesting signal processing functions, including speech analysis.


APPENDIX

Is it possible to clock the parallel architectures at 12.7 MHz? The electrical properties given in [MC80] will be used in an attempt to answer this question. The data flow in the parallel architecture is shown in Figure 5.3. From the preliminary design and layout that was performed to generate Figure 6.10, the following first-order (i.e., RC) equivalent circuit of the critical timing path is obtained (precharging is not assumed):

[Equivalent circuit of the critical timing path (diagram not reproduced): supply $V_{DD}$; R1 = 10 kΩ; AU delay; R3 = 14.6 kΩ; C1 = 0.3 pF, C2 = 0.3 pF, C3 = 1.1 pF]

It is simpler to analyze the more pessimistic circuit:

[Pessimistic equivalent circuit (diagram not reproduced): supply $V_{DD}$; R2 = 20 kΩ; C1 = 1.4 pF, C2 = 0.3 pF]


Let $T_c$ be the clock period and $T_{AU}$ = 40 ns the delay through the gates of the arithmetic unit. If signal rise times are measured to three volts (assuming a 5 V supply), it is clear that:

$$ T_c \ge R_1 C_1 + R_2 C_2 + T_{AU} \approx 81\ \text{ns} $$

Hence the maximum clock frequency is very close to the desired 12.7 MHz.
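The arithmetic behind this estimate can be checked in a few lines. The sketch takes the 81 ns bound as given, since the individual R and C values of the equivalent circuit are only partly legible in the source. It also shows why measuring rise times to about 3 V lets each RC section count as roughly one time constant.

```python
import math

VDD = 5.0  # supply voltage assumed in the text

# Node voltage after one RC time constant, and the time (in units of RC)
# needed to charge from 0 V to exactly 3 V.
v_one_tau = VDD * (1.0 - math.exp(-1.0))
t_3v_in_rc = -math.log(1.0 - 3.0 / VDD)

t_clock_min = 81e-9      # lower bound on T_c from the equation above, seconds
f_max = 1.0 / t_clock_min
t_target = 1.0 / 12.7e6  # clock period needed for 12.7 MHz, seconds

print(f"one RC charges the node to {v_one_tau:.2f} V")
print(f"reaching 3 V takes {t_3v_in_rc:.3f} RC")
print(f"f_max = {f_max / 1e6:.2f} MHz; 12.7 MHz needs T_c = {t_target * 1e9:.1f} ns")
```

The 81 ns bound corresponds to about 12.3 MHz, against a required period of about 78.7 ns. This gap is what the text calls "very close".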

Notice that instruction fetch and decode time has been ignored in calculating the cycle time of the chip, because these functions are overlapped with instruction execution through a single instruction prefetch. It cannot be emphasized enough that this simple analysis is not robust to processing variations and is likely quite optimistic.

It remains to show that $T_{AU}$ = 40 ns is achievable with the process parameters of [MC80] (remembering again that they are extremely optimistic; however, they do provide a common basis for comparison). Since the adder is 20 bits wide with a ripple carry, a propagation delay of 2 ns/bit (20 × 2 ns = 40 ns) is tolerable in the carry circuit. Referring to Figure 6.4, it is clear that the AU speed is indeed limited by this circuit.

Let the AU operands, $a_i$ and $b_i$, be stable at the respective inputs. The carry signal incurs a single pair delay in each stage of the arithmetic unit. The equivalent circuit of a stage is:

[Equivalent circuit of one carry stage (diagram not reproduced): pull-up to $V_{DD}$]


Sample layouts that were used to determine the comparative chip areas provided in Figure 6.10 lead to:

$$ R_u = 30\ \text{k}\Omega, \quad R_{d1} = 3\ \text{k}\Omega, \quad R_{d2} = 6\ \text{k}\Omega, \quad C = 0.1\ \text{pF} $$

Let $\tau_r = R_u C$ and $\tau_f = R_{d2} C$. An inverter typically switches at 1.5 V, so define:

$t_r$ = rise time = time to charge C from 0 V to 1.5 V

$t_f$ = fall time = time to discharge C from 5 V to 1 V

$t_p$ = pair delay = $t_r + t_f$

Since $t_r = -\tau_r \ln(1 - 1.5/5) = -\tau_r \ln 0.7$ and $t_f = -\tau_f \ln(1/5) = -\tau_f \ln 0.2$, then:

$$ t_p = -(\tau_f \ln 0.2 + \tau_r \ln 0.7) \approx 2\ \text{ns} $$
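Substituting the layout values into these definitions gives the figure of about 2 ns per stage. The sketch below assumes single-pole RC charging and discharging toward the rails, as in the equivalent circuit. Summing 20 ripple-carry stages then gives roughly the 40 ns budget for $T_{AU}$ (about 40.7 ns before $t_p$ is rounded to 2 ns).

```python
import math

VDD = 5.0
R_U = 30e3    # pull-up resistance, ohms
R_D2 = 6e3    # pull-down resistance used for the fall time, ohms
C = 0.1e-12   # node capacitance, farads
BITS = 20     # adder width (ripple carry)

tau_r = R_U * C
tau_f = R_D2 * C

# Rise: V(t) = VDD * (1 - exp(-t / tau_r)); solve V(t) = 1.5 V.
t_r = -tau_r * math.log(1.0 - 1.5 / VDD)
# Fall: V(t) = VDD * exp(-t / tau_f); solve V(t) = 1 V.
t_f = -tau_f * math.log(1.0 / VDD)
t_p = t_r + t_f

print(f"t_r = {t_r * 1e9:.2f} ns, t_f = {t_f * 1e9:.2f} ns, t_p = {t_p * 1e9:.2f} ns")
print(f"{BITS}-bit ripple carry: {BITS * t_p * 1e9:.1f} ns")
```

The slow pull-up dominates: $\tau_r$ is five times $\tau_f$, so the rise contributes slightly more than half of each pair delay despite its smaller voltage swing.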

Hence, $T_{AU}$ = 40 ns is achievable with the rather optimistic process parameters presented in [MC80]. In reality, 60 ns to 100 ns would be more likely.¹

¹ I am grateful to Professor Hennessy for pointing out these figures and for supplying me with the performance of some adders for comparison.


BIBLIOGRAPHY

[AMLA81] H.M. Ahmed, M. Morf, D.T.L. Lee and P.H. Ang, "A VLSI Speech Analysis Chip Set Based on Square-Root Normalized Ladder Forms," Proc. 1981 ICASSP, Atlanta, GA, Mar.-Apr. 1981, pp. 648-653.

[An80] P. Ang, Notes on the Serial-Parallel CORDIC Block, 1980.

[CD80] Y.-S. Chen, D. Duttweiler, "A 35,000 Transistor Chip VLSI Echo Canceler," Proc. of the Intl. Solid State Circuits Conference, San Francisco, CA, 1980.

[De74] A.M. Despain, "Fourier Transform Computers Using CORDIC Iterations," IEEE Trans. Comput., Vol. C-23, Oct. 1974, pp. 993-1001.

[De79] A.M. Despain, "Very Fast Fourier Transform Algorithms for Hardware Implementation," IEEE Trans. Comput., Vol. C-28, No. 5, May 1979, pp. 333-341.

[HT80] G. Haviland, A. Tuzynski, "A CORDIC Arithmetic Processor Chip," IEEE Trans. on Computers, Vol. C-29, No. 2, February 1980.

[Hw79] K. Hwang, Computer Arithmetic: Principles, Architecture and Design, J. Wiley, 1979.

[MC80] C. Mead, L. Conway, Introduction to VLSI Systems, Addison-Wesley, 1980.


[Pe72] J. Peatman, The Design of Digital Systems, McGraw-Hill, 1972.

[SD65] R. Southworth, S. DeLeeuw, Digital Computation and Numerical Methods, McGraw-Hill, 1965.

[Wa71] J.S. Walther, "A Unified Algorithm for Elementary Functions," Proc. of the 1971 Spring Joint Computer Conference, pp. 379-385.

[WB78] R. Wiggins, L. Brantingham, "Three-Chip System Synthesizes Human Speech," Electronics, Vol. 51, No. 16, Aug. 31, 1978, pp. 109-116.

[Wi70] B. Widrow, "Adaptive Filters," in Aspects of Network and System Theory (Kalman, DeClaris), Holt, Rinehart and Winston, 1970.


CHAPTER SEVEN

CONCLUSIONS

This dissertation was devoted to the study of signal processing algorithms for efficient integrated implementation. The goal was to design a general-purpose signal processing chip that would provide a larger throughput per silicon area than existing signal processors. This advantage was to come not from technological advances in IC processing, but from an architecture closely matched to the algorithms of interest. This requires a detailed study of areas spanning from signal processing theory to computer architecture and circuit design. The chip that was designed represents a radical departure from current-day signal processors, because its arithmetic unit is based on the ability to perform generalized vector rotations as primitive operations rather than the usual multiply and accumulate function.

The motivation for vector rotation primitives arose from the realization that many algorithms can be cast into a form where coordinate transformations are the natural operations describing them. In fact, multiplication is really a vector rotation in a particular coordinate system, i.e., it is a subset of the rich complement of operations that naturally describe many signal processing algorithms. Both the discrete Fourier transform (DFT) and the square root normalized ladder form were cast into a rotation framework. In the latter case, the entire ladder update was shown to require only five rotations per stage, i.e., five primitive operations. However, the complexity in terms of multiplications is formidable owing to the square root operations. Matrix algebra algorithms that occur commonly in signal processing were


also shown to be naturally described by rotations.
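The claim that multiplication is a rotation in a particular coordinate system can be made concrete with the linear mode of the unified CORDIC iteration of [Wa71]. The same shift-and-add step that rotates vectors in circular coordinates computes $y + x z$ when the coordinate system is linear. The sketch below is a textbook floating-point version, not the chip's implementation. It assumes $|z| \le 2$ and a fixed iteration count, and the name `linear_cordic_multiply` is hypothetical.

```python
import math


def linear_cordic_multiply(x, z, y=0.0, iterations=32):
    """Linear-mode CORDIC: return approximately y + x * z using shift-and-add steps.

    For |z| <= 2 the residual angle shrinks below 2**-(iterations - 1),
    so the error is at most |x| * 2**-(iterations - 1).
    """
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        y += d * math.ldexp(x, -i)    # y <- y + d * x * 2^-i (a shift in hardware)
        z -= d * math.ldexp(1.0, -i)  # z <- z - d * 2^-i
    return y


print(linear_cordic_multiply(3.0, 0.625))          # close to 3.0 * 0.625
print(linear_cordic_multiply(-1.5, -1.2, y=0.5))   # close to 0.5 + (-1.5) * (-1.2)
```

Only the update of `y` differs from the circular mode. There, `x` is updated as well, and the elementary angles are `atan(2^-i)` instead of `2^-i`.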

The CORDIC algorithms were evaluated as a means for performing vector rotations because their implementation is very simple, requiring only adders, shifters and registers. Unfortunately, the algorithms are inherently slow owing to their iterative nature, and they do not provide a sufficiently large domain of convergence for most applications. The desired results are also scaled by a spurious scale constant which is difficult to compensate for. In fact, existing methods of circumventing these shortcomings incur a severe hardware and speed penalty. A new method was developed for simultaneously enlarging the region of convergence of the algorithm and compensating for the scale factor, without the addition of any hardware and with only a minor speed overhead. This had a profound impact on the chip design, reducing its cycle time requirement by nearly 50%. Chip size was also reduced, since special hardware was not required.
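For reference, the conventional circular CORDIC iteration that these remarks refer to can be sketched as follows. This is the textbook rotation mode with an explicit final division by the gain $K$. It is not the author's compensation method, which is not detailed here, and the iteration count is an arbitrary choice. The sketch makes both shortcomings visible. The raw output is scaled by $K \approx 1.6468$, and the iteration only converges for $|\theta|$ up to the sum of the elementary angles, about 1.743 rad (roughly 99.9°). It needs Python 3.8+ for `math.prod`.

```python
import math

ITERATIONS = 32
# Elementary angles atan(2^-i) and the accumulated gain K = prod sqrt(1 + 2^-2i).
ANGLES = [math.atan(math.ldexp(1.0, -i)) for i in range(ITERATIONS)]
K = math.prod(math.sqrt(1.0 + math.ldexp(1.0, -2 * i)) for i in range(ITERATIONS))
CONVERGENCE_LIMIT = sum(ANGLES)  # largest |theta| the iteration can reach


def cordic_rotate(x, y, theta):
    """Rotate (x, y) by theta with shift-and-add CORDIC steps, then remove the gain K."""
    if abs(theta) > CONVERGENCE_LIMIT:
        raise ValueError("angle outside the CORDIC domain of convergence")
    z = theta
    for i, alpha in enumerate(ANGLES):
        d = 1.0 if z >= 0 else -1.0
        # Multiplying by 2^-i is a shift in fixed point (an exponent change in floating point).
        x, y = x - d * math.ldexp(y, -i), y + d * math.ldexp(x, -i)
        z -= d * alpha
    return x / K, y / K


print(f"K = {K:.6f}, convergence limit = {CONVERGENCE_LIMIT:.4f} rad")
print(cordic_rotate(1.0, 0.0, math.pi / 3))
```

For example, `cordic_rotate(1.0, 0.0, math.pi / 3)` returns approximately (0.5, 0.866). An angle of 2.0 rad is rejected, because it lies outside the domain of convergence.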

Two hybrid CORDIC techniques, which combine the CORDIC algorithms with a multiplier, were developed for enhancing the operation speed of the algorithms. When compared with a stored table approach, in which a multiplier is used to perform a vector rotation, the hybrid CORDIC afforded an exponential reduction in required storage in exchange for a linear increase in execution time. Consequently, the hybrid method has a much higher throughput-per-area figure of merit than the stored table approach.

Floating point CORDIC algorithms that are based solely on floating point additions were also developed. They are conceptually simpler than their fixed point counterparts, since no explicit shifting is required (effectively, the shift occurs automatically during radix point alignment in the floating point adder).

Chen's convergence computation method (CCM) was considered for the


arithmetic unit design as well. The CCM was generalized to vector-valued functions, and it was shown to be intimately related to the CORDIC, the CORDIC being a special case of the generalized CCM. Hence, a unified framework for many elementary functions was discovered in this generalization.

Target applications for the signal processing chip included real-time speech analysis and synthesis, adaptive equalization, digital signal detection and some matrix operations. An effort was made to cast many of these problems into a ladder form structure, since the ladder form has a nice implementation (based on rotations) and exhibits natural pipelining. A new signal detection scheme was described which uses the likelihood variable in the ladder filter to detect changes in signals corrupted by additive Gaussian noise. The performance of the detector was shown to depend on the transient behaviour of the ladder following a bit change. Detecting a change in the input signal was reduced to a binary hypothesis testing problem in which the relevant test statistic distributions were asymptotically chi-squared.

Fast Cholesky factorization by rows was shown to have a ladder form realization also, which was the same as if the factorization had been done by columns. The equivalence of the fast Cholesky algorithm and the Levinson algorithm in ladder form was demonstrated under pipelining, thus providing a unified structure for their implementation. Large arrays of processors that utilized the chip as a processing element were constructed for a variety of matrix algebra operations. It was noted that these algorithms exhibit considerable structure, allowing for the definition of a program model. With the aid of the model, a general-purpose array structure was shown to exist. Unlike ad hoc techniques for constructing arrays that are specific to an algorithm, this guarantees that any program which satisfies the model


(admittedly, the model was restricted to a class of signal processing problems) can be implemented on the general-purpose array, and its performance is characterized by the data dependencies of the algorithm. A simple technique for analyzing the performance of an array of processors was also given.

Finally, a microprogrammed chip consisting of two CORDIC processors and a scratchpad area was designed. Four different configurations were considered, which indicated that a better throughput-per-area ratio was achievable with bit-parallel rather than bit-serial arithmetic. In comparisons with two commercial chips, the CORDIC processor exhibited higher throughput per area for a ladder filter algorithm, hence satisfying one of the goals set forth in the introduction. A two-level microprogram control strategy was chosen so that the user need not be bothered with the details of the CORDIC iterations. With a small but powerful instruction set, signal processing algorithms may be readily programmed, because they are generally quite short and repetitive (hence speed intensive) and do not often exhibit conditional branching. The generality of the chip was shown with a variety of examples in Chapter Six.

In conclusion, this thesis spans an interesting mixture of topics. The goal was to design a chip whose architecture was intimately matched to the theoretical foundations of the algorithms of interest. This was largely done by identifying the most natural set of primitive operations which described the algorithms, and to a lesser degree, by modifying algorithms to be conducive to implementation (e.g., fast Cholesky in ladder form, ladder forms for signal detection and ladder forms in a rotation framework). The result is a chip that is somewhat unconventional but very well suited to the class of problems of interest.
