Signal Processing Algorithms and Architectures

8220419
Ahmed, Hassan Masud
SIGNAL PROCESSING ALGORITHMS AND ARCHITECTURES
Stanford University Ph.D. 1982
Copyright 1982
by
Ahmed, Hassan Masud
All Rights Reserved
SIGNAL PROCESSING ALGORITHMS
AND
ARCHITECTURES
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Hassan M. Ahmed
June 1982
(c) Copyright 1982
by
Hassan M. Ahmed
I certify that I have read this thesis and that in my
opinion it is fully adequate, in scope and quality, as
a dissertation for the degree of Doctor of Philosophy.
(Principal Advis
I certify that I have read this thesis and that in my
opinion it is fully adequate, in scope and quality, as
I certify that I have read this thesis and that in my
opinion it is fully adequate, in scope and quality, as
Approved for the University Committee on Graduate Studies:
Dean of Graduate Studies & Research
ABSTRACT
The advent of the Very Large Scale Integration (VLSI) technology has
provided the ability to construct large systems on a single silicon chip. This
dissertation is concerned with exploiting this ability to design a powerful
signal processing chip capable of efficiently implementing such popular
algorithms as the discrete Fourier transform, ladder filters and associated
matrix algebra operations. The latter include Givens rotations and Cholesky
factorization.
The goal of the present work is to efficiently map algorithms onto
architectures by maintaining a close link with the theoretical basis of a
particular signal processing method. It is shown that all of the algorithms
considered can be cast into a mathematical framework involving generalized
vector rotations. Such rotation operations provide a natural description of
the algorithms and the computational complexity measured in terms of
these elementary operations is much lower than in terms of the usual
measure of total number of multiplications. Thus, unlike present day signal
processing computers which emphasize rapid multiplication, the signal
processing architectures in this thesis are based on the ability to perform
vector rotations in generalized coordinate systems.
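As a rough illustration of the rotation primitive referred to above, the following Python sketch applies a circular (Givens-type) rotation to a two-element vector so that its second component is annihilated. The function names are hypothetical and do not come from the thesis, and only the circular coordinate system is shown. A multiplier-based machine would spend four multiplications on this step, whereas a rotation-based machine would count it as one elementary operation.

```python
import math

def rotate(x, y, theta):
    """Circular rotation of (x, y) by theta, clockwise for positive theta."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x + s * y, -s * x + c * y

def givens_zero(x, y):
    """Pick the angle that rotates (x, y) onto the x-axis, then apply it."""
    theta = math.atan2(y, x)  # hypothetical helper step, not the thesis's notation
    return rotate(x, y, theta)

r, z = givens_zero(3.0, 4.0)
print(r, z)  # r is the vector length, z is numerically zero
```

Repeating this step over pairs of rows is how a Givens-based triangularization zeroes the subdiagonal entries of a matrix.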
It is shown that the CORDIC algorithm of Volder provides a convenient
implementation of vector rotations with only simple components such as
adders, registers and shifters. Unfortunately, throughput is severely
compromised owing to the need for performing special operations to
account for the limited region of convergence and spurious scale constants
inherent to the method. New techniques to circumvent these problems with
no additional hardware and only a marginal speed penalty are described.
Further speed enhancements through the use of a newly developed method
known as hybrid CORDIC are discussed. Additionally, floating point CORDIC
(FLORDIC) algorithms that are conceptually simpler than their fixed point
counterparts are developed and the connection of CORDIC to the
convergence computation methods is shown.
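For orientation, a minimal Python sketch of the conventional circular-mode CORDIC rotation follows. It is the textbook shift-and-add iteration, not the modified, hybrid or floating point variants developed in this thesis. The names `N_ITER`, `ANGLES`, `K` and `cordic_rotate` are illustrative assumptions, and multiplication by a power of two stands in for a hardware shift. The sketch exposes the two issues mentioned above: the result is scaled by the constant `K`, and the iteration converges only for angles of magnitude up to the sum of the elementary angles (about 1.74 radians).

```python
import math

N_ITER = 24  # assumed iteration count, chosen for illustration only

# Elementary rotation angles atan(2^-i) and the accumulated scale constant K.
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
K = 1.0
for i in range(N_ITER):
    K *= math.sqrt(1.0 + 2.0 ** (-2 * i))

def cordic_rotate(x, y, theta):
    """Rotate (x, y) counterclockwise by theta using shift-and-add steps.

    Returns the rotated vector scaled by K, plus the residual angle.
    """
    z = theta
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0   # rotation direction drives z toward zero
        shift = 2.0 ** -i             # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * ANGLES[i]
    return x, y, z

x, y, z = cordic_rotate(1.0, 0.0, math.pi / 6)
print(x / K, y / K)  # approximately cos(pi/6), sin(pi/6) after removing K
```

Dividing by `K` at the end is the kind of extra scale-correction step whose cost the thesis sets out to reduce.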
The architecture of a dual CORDIC block chip is described for a target
application of real time speech analysis. The resulting chip is shown to have
a higher throughput per area than conventional chips based on fast
multiplications. This is attributed to the close match of the present chip to
the algorithms.
Large mesh connected processor architectures for matrix factorization
are developed which are also closely matched to the algorithms. Individual
processing elements in the mesh are based on CORDIC operations, in fact on
the aforementioned signal processing chip.
Finally, a new technique for signal detection in additive Gaussian noise
is developed with a view towards ease of implementation. It is based on
ladder filters and may be implemented using the signal processing chip
mentioned above.
ACKNOWLEDGEMENTS
Studying towards a Ph.D. degree is much more than an educational
experience. During this period, many important friendships are formed.
While I do indeed intend to acknowledge many individuals for their direct
contributions to my thesis, I would like to thank them and many more at the
outset for their friendships, which I have cherished.
Professor Martin Morf has unquestionably been my mentor. His wide
ranging interests and abilities have afforded me the opportunity to pursue
my own interests, which have often been away from the mainstream of the
Information Systems Laboratory. For this and for the pleasure of his
interactions on my research, I am very grateful.
I am very happy to acknowledge Professor James D. Meindl for his
constant encouragement, his many comments that aided in focusing my
research and his careful review of the manuscript. During the course of a
Ph.D. in a field that is as volatile as VLSI is in the industrial arena, one is
often plagued with doubts about the merit of pursuing a degree. I am
grateful to Professor Meindl for encouraging me to complete my studies
(which in retrospect, I see was the correct decision) - in him, I've truly found
a friend.
Professor John L. Hennessy provided many useful comments during his
reading of the thesis that led to its improvement. I would like to thank him
for those, as well as for his speedy review, which allowed me to "run off to
Europe" before the end of the quarter.
It has been a pleasure to have worked with Peng Ang on the design of
the chip (Chapter Six) and with Jean-Marc Delosme on the multiprocessor
array (Chapter Five). Their contributions through an uncountable number
of stimulating discussions are integral to those chapters. Valuable
discussions with Professor Thomas Kailath and Professor Abbas ElGamal are
also gratefully acknowledged.
The lab secretaries, especially Barbara, Rachel, Kathy, Charlotte, Mieko
and Jean are all good friends. I'm convinced, more than ever, that they hold
the lab together. I am grateful for their friendships and for all the fun
discussions. I feel the same very special attachment to my roommates (ex
roommates) Rich Baker and Peter Glynn.
The support of the Defense Advanced Research Projects Agency and the
Natural Sciences and Engineering Research Council of Canada at various
phases of my stay at Stanford is gratefully acknowledged. CODEX
Corporation, particularly Dr. G.D. Forney and Dr. S.U.H. Qureshi have
provided me with the opportunity to keep my hand in the "industrial pie"
while studying. For this, I am very thankful.
Mrs. Rachel Levy typed this manuscript with excellence and speed.
However, when the deadlines finally came, it would have been impossible for
me to have completed the dissertation on time without her selfless
willingness to devote her time solely to it. To her and to my friend, Mr. Amr
Badawi, who did the "legwork" to submit the thesis, I am eternally grateful.
Finally, these acknowledgements would be incomplete without
expressing my deep love and thanks to my parents for their patience, love
and encouragement. (Being three thousand miles apart hasn't been easy for
any of us.) So, thanks Mom and Dad!!
I dedicate this thesis to my mother and father and to my good friend
Pegge - the three people who have shown me more of life and love than many
experience in a lifetime.
TABLE OF CONTENTS
Chapter
1. INTRODUCTION
BIBLIOGRAPHY
2. SIGNAL PROCESSING ALGORITHMS
2.1 THE DISCRETE FOURIER TRANSFORM
2.2 EXACT LEAST SQUARES LADDER ALGORITHMS
BIBLIOGRAPHY
3. APPLICATIONS OF LADDER FORMS
3.1 THE SPEECH ANALYSIS PROBLEM
3.1.1 Speech Synthesis Techniques
3.1.2 Speech Analysis
3.1.3 Speech Analysis with Square Root Normalized Ladder Forms - The Analysis Filter
3.1.4 An Alternate View of the Square Root Normalized Ladder Equations
3.2 ADAPTIVE EQUALIZATION
3.2.1 Equalizer Structure
3.3 DETECTION OF DIGITAL SIGNALS
3.3.1 Timing Recovery with Ladder Forms
3.5.2 Simulation Results
CHAPTER SUMMARY AND CONCLUSIONS
APPENDICES
BIBLIOGRAPHY
4. NUMERICAL ALGORITHMS
4.1 THE CORDIC ALGORITHMS
4.1.1 Some Convergence Properties
4.1.2 Implementation Issues
4.1.3 Scale Factor Normalization
4.1.4 Scaling in a Parallel Implementation
4.1.5 Extending the Domain of Convergence
4.2 LOW OVERHEAD SOLUTIONS TO THE PROBLEMS OF CONVERGENCE REGION AND SPURIOUS SCALE FACTORS
4.2.1 Effect on Angular Resolution
4.2.2 Simulation Results
4.2.3 Computational Speed and Hardware Complexity
4.3 HYBRID CORDIC ALGORITHMS
4.3.1 Interpolation with CORDIC's
4.3.2 A Taylor Series Approach to Hybrid CORDIC's
4.4 FLOATING POINT CORDIC ALGORITHMS (FLORDIC)
4.5 THE CONVERGENCE COMPUTATION TECHNIQUE
4.5.1 Examples of the Convergence Computation Technique
4.5.2 Hybrid Convergence Computation
4.5.3 Hardware Implementation
4.6 RELATIONSHIP BETWEEN THE CORDIC AND CONVERGENCE COMPUTATION ALGORITHMS
4.7 A GENERALIZED CONVERGENCE COMPUTATION METHOD AND THE CORDIC CONNECTION
4.7.1 Examples of the Generalized Technique
CHAPTER SUMMARY AND CONCLUSIONS
APPENDIX
BIBLIOGRAPHY
5. PARALLEL PROCESSORS FOR LINEAR ALGEBRA
5.1 CHOLESKY FACTORIZATION
5.1.1 Fast Cholesky by Rows in Ladder Form
5.2 SOLUTION OF LINEAR SYSTEMS OF EQUATIONS
5.2.1 Architecture for Givens' Algorithm
5.3 COMPLEXITY DISTRIBUTION AND ACTIVITY CHARTS
5.3.1 Activity Charts
5.3.2 A Two-dimensional Array for Givens' Algorithm
5.3.3 Dual Arrays
5.4 A FORMAL APPROACH TO COMPLEXITY MAPPING
5.4.1 Construction of Multiprocessor Arrays
5.4.2 An Approach to Formalism
5.4.2.1 Distance Measures
5.5 EIGENVALUE DECOMPOSITION
CHAPTER SUMMARY AND CONCLUSIONS
BIBLIOGRAPHY
6. A LADDER FORM CHIP SET
6.1 IMPLEMENTATION OF THE NORMALIZED LADDER EQUATIONS
6.2 LADDER FORM CHIP ARCHITECTURES
6.3 DESIGN OF A CORDIC PROCESSOR
6.3.1 The Fully Parallel CORDIC Block
6.3.1.1 Pipelining
6.3.2 The Parallel-Serial CORDIC Block
6.3.3 The Serial-Parallel Realization
6.3.4 The Fully Serial CORDIC Block
6.4 ARCHITECTURAL TRADEOFFS - A COMPARISON OF THE CORDIC REALIZATIONS
6.5 THE MICRO-CONTROLLER
6.5.1 The Speech Analysis Microcode
6.6 OTHER APPLICATIONS
6.6.1 The Discrete Fourier Transform
6.6.2 Speech Synthesis
6.6.3 The Unnormalized Ladder Form
6.6.4 Adaptive Equalization
CHAPTER SUMMARY AND CONCLUSIONS
APPENDIX
BIBLIOGRAPHY
7. CONCLUSIONS
LIST OF FIGURES
Figure
2.1 Ladder Filter Structure
3.1 Filter Model for Speech Synthesis
3.2 Speech Synthesis
3.3 Three Chip Synthesizer by Texas Instruments, Inc.
3.4 Speech Analysis
3.5 Channel Equalization
3.6 Tapped Delay Line Equalizer
3.7 Ladder Filter Equalizer
3.8 Performance of Ladder Equalizer
3.9 Non-Return to Zero Transmission Format
3.10 Digital Transmission System
3.11 Statistical Distribution of y n j
3.12 Detection Threshold vs. False Alarm Probability
3.13 Missed Detection Probability for Various Pp. X
3.14 Effective Degrees of Freedom of 7 ,l.r(X)
3.15 Baseband Simulation, SNR = 46 dB, 8th order Ladder
3.16 Baseband Simulation, SNR = 20 dB, 8th order Ladder
3.17 Binary PSK, SNR = 12.4 dB, 8th order Ladder
3.18 Binary PSK, SNR = 0 dB, 8th order Ladder
3.19 Binary PSK, SNR = 12.4 dB, 2nd order Ladder
3.20 Binary FSK, SNR = 26.3 dB, 8th order Ladder
3.21 Binary FSK, SNR = 12.4 dB, 4th order Ladder
4.1 Rotation in Generalized Coordinate Systems
4.2 The CORDIC Functions
4.3 The Reversed Sign CORDIC Functions
4.4 Volder's Rotation Sequences
4.5 A Parallel CORDIC Machine Architecture
4.6 Performance of Walther's CORDIC Machine
4.7 Computer Simulation Results
4.8 Performance of Hybrid CORDIC Scheme
4.9 Geometric Interpretation of the CCM
4.10 A Machine Architecture for the CCM
5.1 Recursions Induced on the Rows of the Cholesky Factors
5.2 Fast Cholesky by Rows in Ladder Form
5.3 A Pipelined Array of Processors
5.4 Fully Pipelined Givens Method on a Linear Array
5.5 Array Input Sequence for Givens Algorithm
5.6 Linear Array Activity Chart
5.7 A Two Dimensional Array for Givens Method
5.8 Operation of the Dual Linear Array
5.9 Time-Space Dual Array Activity Chart
5.10 Dual Triangular Array
5.11 Triangular Array for Hyperbolic Cholesky
5.12 The Rectangular Array
5.13 The Double-Hexagonal Array
5.14 The Hexagonal Array
5.15 Closed Ball Topology
5.16 Coordinate Assignment for Theorem 1
5.17 Traversal Ordering
5.18 Coordinate System for Givens Array
5.19 Eigenvalue Decomposition - QR Decomposition
5.20 Eigenvalue Decomposition - RQ Calculation
5.21 Eigenvalue Decomposition - Activity Chart
6.1 CORDIC Implementation of Square Root Ladder Form
6.2 Dual-CORDIC Chip Architecture
6.3 The Fully Parallel CORDIC Block
6.4 Bit Slice of Arithmetic Unit
6.5 A Register Cell
6.6 Pipelined CORDIC Block
6.7 The Parallel-Serial CORDIC Block
6.8 Bus Structure of the Parallel-Serial Architecture
6.9 The Serial-Parallel CORDIC Block
6.10 Performance Comparison of Various Architectures
6.11 Microcontroller Instruction Set
6.12 Microcontroller Architecture
6.13 The Discrete Fourier Transform Implementation
6.14 Ladder Form Speech Synthesizer
6.15 The Unnormalized Ladder Form
6.16 LMS Adaptive Equalizer
6.17 Adaptive Equalizer Implementation
CHAPTER ONE
INTRODUCTION
Over the past decade, the world has been witness to an electronic
revolution that has been primarily due to the development of the Large
Scale Integration (LSI) technology, and in particular, the microprocessor.
Virtually every industry has benefitted from the ability to fabricate a
computing machine on a single chip. The currently emerging Very Large
Scale Integration (VLSI) technology promises still higher circuit densities on
a chip. With the ability to manufacture over a million devices on a chip,
what does one build, and how? The problem here is twofold [Mo79]. First, it
is difficult to build truly general purpose systems that have a wide market
appeal. Whereas SSI (small scale integration) afforded the ability to
construct two flip-flops per chip - a rather general circuit - the complexity of
VLSI systems appears to necessarily specialize them. Consequently, the
main thrust in VLSI research in general and this thesis in particular, has
been to study structures that are general purpose within a class of
problems. Second, since entire systems are fabricated on chips, the design
tasks cannot be segmented as with SSI and MSI and to some degree, LSI
systems and the designer needs to be familiar with many aspects of design,
from circuits to machine organization and algorithms [Me81].
This thesis is characterized by both of the above items. First, the
chosen class of problems is in signal processing and related linear algebra
operations. The ultimate goal is to design a programmable, custom
integrated circuit capable of efficiently implementing complex signal
processing algorithms used for speech analysis, digital communications and
other areas related to estimation theory. Signal processing has traditionally
offered the greatest challenge to integration, being a speed intensive
application on the leading edge of technology, even for the simplest
algorithms. VLSI affords faster circuits than LSI, thus allowing the
consideration of still more complex methods of processing. Secondly,
efficient realizations necessarily include a detailed study of the algorithms
to be implemented, any numerical methods required to realize them and a
study of computer architecture itself. Present day thinking has led to signal
processing microcomputers which may be simply viewed as general purpose
microcomputers with a rapid multiply-and-accumulate facility; the Nippon
Electric chip [KNSYM80], the AMI chip [AMI79] and the Bell Laboratories
digital signal processor (DSP) [Bo80] are typical examples. In contrast, the
premise of this dissertation is to examine the mathematical formulation of a
class of algorithms in detail, identify the natural primitive operations and
effectively "map" the algorithms onto efficient architectures, thus realizing
integrated systems that "pack" more computing power into a given area. As
a result, many different areas, ranging from signal processing methods and
numerical algorithms to computer architecture and parallel processing will
be covered in the chapters to come, with new contributions being made to
all. This study culminates in a signal processing chip of novel architecture,
whose primitive operations set is based on coordinate transformation rather
than multiplication.
Many estimation related signal processing algorithms perform matrix
operations such as "factorizations" in which an arbitrary matrix is
represented as a product of matrices of simpler structure. The large
computational complexity of these operations is responsible for a quest for
faster computing structures [La74] [SK75a] [Ch75] [Ku79] as well as
algorithms of lower computational complexity [St73] [Mo74] [SK75b] [He78]
[LM80] [De82]. Existing uniprocessors have been used with only limited
success for three major reasons. First, they are unable to efficiently
compute a variety of elementary operations such as multiplication, vector
rotation and trigonometric functions. These operations are very common to
the algorithms of interest here. Secondly, general purpose computer
architectures provide only cumbersome address arithmetic for data
structures, such as circular buffers, that occur frequently in
communications applications [SBA78]. Finally, signal processing algorithms
exhibit a substantial amount of parallelism that is not efficiently exploited in
a uniprocessor system. (A notable exception is the AMD2900 family [AMD78],
which allows some parallelism through the extensive use of two port random
access memories (RAMs).) The first and third items have a profound impact
on the design of a machine's arithmetic facility, which is the prime concern
of this thesis, while the second item falls into the realm of the program
controller and will not be considered. Parallel processing architectures for
handling such complex algorithms were afforded much attention in the
literature (see [Ku77] for a good survey); however, VLSI technology has
imposed new constraints which merit renewed interest in parallel computing
structures. The major technological constraint impacting the architecture
of an integrated parallel processor is one of connectivity [MC80, Chapter 8].
The ability to effectively utilize silicon area for constructing many
processing elements will be limited by the area available for interconnect.
Furthermore, the communications rate between elements is influenced by
the capacitance of the interconnect lines. Both of these factors clearly call
for parallel structures with short communications paths. Therefore, the
problem to be solved is to "map" the complex algorithms onto parallel
architectures which exhibit short communications paths between elements,
and to ensure that the elements are capable of performing the primitive
operations which naturally describe the algorithms. In fact, it will become
apparent that the aforementioned signal processing chip is suitable as an
element of the parallel structure.
As is usual with studies of computer architecture, it is necessary to
define a measure of success (albeit subjective) in the research (e.g. what
constitutes a better architecture?). The goals set above provide three
different indicators of success. First, the chip must be general purpose
within a class of problems, that is, one structure should be capable of
efficiently implementing the desired set of algorithms. In this same context,
a second measure of success is to demonstrate the utility of the chip as a
processing element in any parallel architectures that are developed.
Thirdly, the chip architecture should provide more computing power per
silicon area than existing integrated signal processors, for the class of
problems under consideration. This will be one of the easier items to show,
since many of the complex signal processing algorithms to be considered
involve square root operations, which cannot be efficiently computed by any
existing microcomputer. Still another desirable feature would be to provide
the multiply-and-accumulate primitive operation without impacting chip
area, since then existing signal processors could be viewed as
"special cases" of the present chip. Finally, a regular layout would be
extremely desirable, and perhaps even mandatory, if the above goals are to
be met. Why do this project at all? The rewards are in many ways obvious,
for the ability to perform a variety of signal processing tasks with an
integrated circuit is much sought after.
Chapter Two begins with a view of popular signal processing methods.
Target applications for their use are studied in Chapter Three, in order to
appreciate the throughput requirements which will be placed on the final
chip. In this context, a new method for optimal signal detection is
developed, since an important feature of this work is to construct
algorithms which may be readily implemented. The study of algorithms
allows identification of the predominant elementary operations; this turns
out to be a much larger set than the usual multiply and accumulate
primitives common in today's signal processors. Chapter Four is devoted to
devising efficient numerical techniques for evaluating these operations.
Large arrays or meshes of processors are developed in Chapter Five for
some relatively computation intensive matrix operations. These are again
based on a rich operations set. Finally, a novel signal processing chip
architecture appears in Chapter Six, which was motivated through the
studies of the previous chapters.
BIBLIOGRAPHY
[AMD78] Advanced Micro Devices Inc., AMD2900 Bipolar Family Users
Manual, 1978.
[AMI79] American Microsystems Inc., Signal Processing Peripheral
Reference Manual, 1979.
[Bo80] J.R. Boddie, G.T. Daryanani, I.I. Eldumiati, K.N. Gadenz, J.S.
Thompson, S.M. Walters, "A Digital Signal Processor for
Telecommunications Applications," Proc. of Int'l. Solid State
Circuits Conference, San Francisco, CA, 1980.
[Ch75] S.C. Chen, "Speedup of Iterative Programs in Multiprocessor
Systems," Ph.D. Dissertation, University of Illinois at Urbana-Champaign,
Dept. of Computer Science, January 1975.
[De82] J.M. Delosme, "Algorithms for Finite Shift-Rank Processes," Ph.D.
Dissertation, Stanford University, Dept. of Electrical
Engineering, June 1982.
[He78] D. Heller, "A Survey of Parallel Algorithms in Numerical Linear
Algebra," SIAM Review, Vol. 20, No. 4, October 1978, pp. 740-776.
[KNSYM80] Y. Kawakami, T. Nishitani, E. Sugimoto, E. Yamauchi, M. Suzuki,
"A Single-Chip Digital Signal Processor for Voiceband
Applications," Proc. of Int'l. Solid State Circuits Conference, San
Francisco, CA, 1980.
[Ku77] D. Kuck, "A Survey of Parallel Machine Organization and
Programming," Assoc. of Computing Machinery, Computing
Surveys, Vol. 9, No. 1, March 1977, pp. 29-59.
[Ku79] H.T. Kung, "Let's Design Algorithms for VLSI," Proc. of the First
Caltech VLSI Symposium, California Institute of Technology,
1979, pp. 65-90.
[La74] L. Lamport, "The Parallel Execution of DO Loops,"
Communications of the Assoc. of Computing Machinery, Vol. 17,
No. 2, February 1974, pp. 83-93.
[LM80] D.T.L. Lee and M. Morf, "Recursive Square-Root Ladder Estimation
Algorithms," Proc. 1980 ICASSP, Denver, CO, April 9-11, 1980.
[MC80] C. Mead, L. Conway, Introduction to VLSI Systems, Addison
Wesley, 1980.
[Me81] C. Mead, "VLSI and Technological Innovations," Proc. of VLSI81
International Conference, Edinburgh, Scotland, August 1981.
[Mo74] M. Morf, "Fast Algorithms for Multivariable Systems," Ph.D.
Dissertation, Stanford University, Dept. of Electrical
Engineering, 1974.
[Mo79] G. Moore, "Are We Really Ready for VLSI?," Proc. of the First
Caltech VLSI Symposium, California Institute of Technology,
1979.
[SBA78] A. Sewards, L. Beaudet, H. Ahmed, "Forward Error Correction on
an Aeronautical Satellite Channel," Proc. of the AGARD Panel on
Avionics, May 1978.
[SK75a] A.H. Sameh, D. Kuck, "Linear System Solvers for Parallel
Computers," Technical Report 75-701, University of Illinois at
Urbana-Champaign, Dept. of Computer Science, February 1975.
[SK75b] A. Sameh, D. Kuck, "A Parallel QR-Algorithm for Symmetric,
Tridiagonal Matrices," Proc. of Second Langley Conference on
Scientific Computing, 1975.
[St73] H. Stone, "An Efficient Parallel Algorithm for the Solution of a
Tridiagonal Linear System of Equations," Journal of the
Association of Computing Machinery, Vol. 20, No. 1, January
1973, pp. 27-36.
CHAPTER TWO
SIGNAL PROCESSING ALGORITHMS
Signal processing algorithms may be broadly classified as either
frequency or time domain. The former, more traditional methods make
extensive use of orthogonal transforms while the latter frequently entail the
application of estimation theory. The more popular algorithms from both of
these classifications will be considered for VLSI realization in this treatise.
It will become clear that a single computation structure is capable of
efficiently implementing both types of algorithms.
The Discrete Fourier Transform (DFT) [OS75] is perhaps the most
common of all frequency domain algorithms. Its wide applicability in areas
such as spectral estimation, filtering, digital communications, control and
identification makes its VLSI realization a problem of great importance. A
variety of fast algorithms have been developed for the DFT, the most
common being the Fast Fourier Transform (FFT) algorithm of Cooley and
Tukey [CT65]. Considerable attention has been given to the construction of
DFT and FFT processors, e.g. [De74] [De79] [Pe68] [Sw78].
Somewhat less attention has been afforded to the newer time domain
algorithms based on estimation theory. Ladder forms (see [Tu80] for a good
survey) are among the most promising of such algorithms owing to their low
computational complexity and recursive structure. The basis of these so
called "ladder filters" lies in the theory of exact least squares predictors and
whiteners [VT68]. These algorithms enjoy as wide applicability as the DFT,
however, with frequently better properties. For example, ladder forms used
for spectral estimation exhibit considerably less sidelobe leakage than the
Fourier transform. There are also a variety of signal processing related
matrix operations, however these will be considered in a later chapter.
2.1 THE DISCRETE FOURIER TRANSFORM
The DFT, X(k), of an N point sequence, x(n), is defined by:
\[
X(k) =
\begin{cases}
\displaystyle\sum_{n=0}^{N-1} x(n)\, W_N^{kn}, & 0 \le k \le N-1 \\
0, & \text{otherwise}
\end{cases}
\]
where
\[
W_N = e^{-j 2\pi / N}
\]
and x(n), X(k) are in general complex.
The DFT is the traditional method of obtaining a spectral representation
of a time series and has been effectively employed in a multitude of arenas
including speech and picture processing. Implementations of the DFT will be
considered in Chapter Six.
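As an illustration of the definition above, the following is a minimal sketch of a direct evaluation of the DFT sum. The function name `dft` is an assumption made for this example, and the O(N^2) loop is chosen only for clarity; it is not the FFT of [CT65] nor any of the implementations discussed later. The sequence is assumed to be non-empty.

```python
import cmath

def dft(x):
    """Direct evaluation of X(k) = sum_n x(n) * W_N**(k*n), 0 <= k <= N-1.

    x is a non-empty sequence of real or complex samples.
    W_N = exp(-j*2*pi/N) as in the definition above.
    """
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)  # the twiddle factor W_N
    return [sum(x[n] * W ** (k * n) for n in range(N)) for k in range(N)]

if __name__ == "__main__":
    # A constant sequence has all of its energy in the k = 0 bin.
    print([round(abs(v), 6) for v in dft([1, 1, 1, 1])])
```

Each output point requires N complex multiply-and-accumulate steps, which is the operation count the fast algorithms are designed to reduce.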
2.2 EXACT LEAST SQUARES LADDER ALGORITHMS
Consider the problem of whitening a correlated time series, i.e.,
determining the process of new (or unpredictable) information contained in
each element of the series, known as the "innovations process" [GK73]. The
time series is assumed to arise from a finite order autoregressive (AR)
process, $y_T$, and the innovation of each sample is determined from an
autoregressive, linear least squares prediction of that sample based on 'n'
previous observations. This problem has the cascade ladder form solution
shown in Figure 2.1. Notice the simplicity of the structure, which consists of
delay elements, gains and summing junctions. The filter coefficients are
adjusted to yield the forward and backward prediction errors defined as:
\[
\begin{aligned}
\varepsilon_{n,T} &= y_T - \mathrm{llse}\left( y_T \mid \{y_k\}_{k=T-n}^{T-1} \right) \\
r_{n,T} &= y_{T-n} - \mathrm{llse}\left( y_{T-n} \mid \{y_k\}_{k=T-n+1}^{T} \right)
\end{aligned}
\]
where $\mathrm{llse}(x \mid y)$ denotes the linear, least squares estimate of x given y.
An extremely useful property of this structure is that all the prediction
errors of order 1 to n are obtained simultaneously in one set of
calculations. This is particularly useful for identifying the order of a model.
The algorithm is said to be both order and time recursive, i.e., the
coefficients of successive filter stages are computed from the previous
stages (order recursion) and the filter coefficients are also updated based
on new data (time recursion) through the addition of correction terms
rather than recomputing the filter for each time step or order.
Although ladder structures may be derived when the process
covariance is either known or unknown, it is the latter case which is of
foremost interest in realtime applications and in this thesis. The process
covariance is estimated from segments of past data; the exact segment
being determined by a data window. The "sliding window" ladder form
[PFM82] uses a rectangular data window of constant length, R samples.
The "weighted prewindowed" ladder form [LM80], in which a positive
"forgetting factor", $\lambda$, of magnitude less than unity is used to weight the
previous data at each time step, thus deemphasizing the importance of older
samples, is the most popular. Both of these filters are able to track small
variations in process statistics because of their continued deemphasis of old
data.
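The difference between the two data windows can be sketched on the simplest statistic, the running energy of the input. The function names below are hypothetical, and the exponentially weighted recursion R_T = lambda * R_(T-1) + y_T^2 is the standard textbook form, assumed here for illustration rather than quoted from [LM80] or [PFM82].

```python
from collections import deque

def prewindowed_power(samples, lam):
    """Exponentially weighted energy: R_T = lam * R_{T-1} + y_T**2.

    lam is the forgetting factor (0 < lam < 1); older samples are
    weighted by successively higher powers of lam.
    """
    R = 0.0
    out = []
    for y in samples:
        R = lam * R + y * y
        out.append(R)
    return out

def sliding_window_power(samples, length):
    """Energy over a rectangular window holding the last `length` samples."""
    window = deque(maxlen=length)  # the oldest sample drops out automatically
    out = []
    for y in samples:
        window.append(y)
        out.append(sum(v * v for v in window))
    return out

if __name__ == "__main__":
    data = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
    print(prewindowed_power(data, 0.5))
    print(sliding_window_power(data, 2))
```

The weighted form needs one multiply and one add per sample and no sample storage, whereas the rectangular window must retain the last R samples; both forget old data, which is what allows slow changes in the statistics to be tracked.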
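The prewindowed ladder form is described in [LM80] and defined by:
\[
\begin{aligned}
\varepsilon_{0,T} &= r_{0,T} = y_T \\
\Delta_{n+1,T+1} &= \lambda\,\Delta_{n+1,T} + \frac{\varepsilon_{n,T+1}\, r_{n,T}}{\gamma_{n-1,T}} \\
K^{\varepsilon}_{n+1,T} &= \frac{\Delta_{n+1,T}}{R^{\varepsilon}_{n,T}} \\
K^{r}_{n+1,T} &= \frac{\Delta_{n+1,T}}{R^{r}_{n,T-1}} \\
\varepsilon_{n+1,T} &= \varepsilon_{n,T} - K^{r}_{n+1,T}\, r_{n,T-1} \\
r_{n+1,T} &= r_{n,T-1} - K^{\varepsilon}_{n+1,T}\, \varepsilon_{n,T} \\
R^{\varepsilon}_{n,T+1} &= \lambda\, R^{\varepsilon}_{n,T} + \frac{\varepsilon^{2}_{n,T+1}}{\gamma_{n-1,T}} \\
R^{r}_{n,T+1} &= \lambda\, R^{r}_{n,T} + \frac{r^{2}_{n,T+1}}{\gamma_{n-1,T+1}} \\
\gamma_{n+1,T} &= \gamma_{n,T} - \frac{r^{2}_{n+1,T}}{R^{r}_{n+1,T}}
\end{aligned}
\]
where
$\Delta_{n+1,T}$ is the $(n+1)$th order partial correlation of $y_T$,
$K^{\varepsilon}_{n+1,T}$, $K^{r}_{n+1,T}$ are the forward and backward filter gains of the
$(n+1)$th filter stage,
$R^{\varepsilon}_{n,T}$, $R^{r}_{n,T}$ are the covariances of the forward and backward
prediction errors and
$\gamma_{n,T}$ is a likelihood variable of nth order.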
These equations appear quite formidable computationally, exhibiting a
number of housekeeping chores at each timestep, such as continuous
estimation of the residual covariances $R^{\varepsilon}_{n,T}$ and $R^{r}_{n,T}$. This observation
prompted the development of the so called "square root normalized" ladder
forms in which the random variables in the algorithm are normalized to
have unit variance, thus eliminating the need for the $R^{\varepsilon}_{n,T}$ and $R^{r}_{n,T}$
equations. The variables are then further normalized by the likelihood
variable, thus removing the $\gamma_{n,T}$ update. The resulting algorithm comprises
fewer equations and has the added feature that all quantities have
magnitude bounded by unity, thereby making fixed point implementation
viable. The prewindowed algorithm is summarized:
Order Updates:
\[
\begin{aligned}
\rho_{n+1,T} &= \sqrt{1-\nu^{2}_{n,T}}\,\sqrt{1-\eta^{2}_{n,T-1}}\;\rho_{n+1,T-1} + \nu_{n,T}\,\eta_{n,T-1} \\
\nu_{n+1,T} &= \frac{\nu_{n,T} - \rho_{n+1,T}\,\eta_{n,T-1}}{\sqrt{1-\rho^{2}_{n+1,T}}\,\sqrt{1-\eta^{2}_{n,T-1}}} \\
\eta_{n+1,T} &= \frac{\eta_{n,T-1} - \rho_{n+1,T}\,\nu_{n,T}}{\sqrt{1-\rho^{2}_{n+1,T}}\,\sqrt{1-\nu^{2}_{n,T}}}
\end{aligned}
\]
where $\nu$ and $\eta$ are the normalized forward and backward residuals
respectively and $\rho$ is the filter gain (or reflection coefficient or normalized
partial correlation).
Time Updates:
R t - R t - i [ X + h$ ]“*
Vt
V o .T - vo. t = —j -m-
"v R t - i
Notice that the major amount of complexity is in the former three equations
since these must be executed for each stage of the ladder filter. These
equations will consequently be of primary concern in the present work and it
suffices to state in advance that a computing structure capable of efficiently
performing these computations will also be capable of efficiently calculating
the last three equations.
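The per-stage cost can be made concrete with a small sketch of one square root normalized order update, written directly from the three order update equations. The names `comp` and `order_update` are hypothetical, floating point is used in place of the fixed point arithmetic the text has in mind, and all inputs are assumed to have magnitude strictly less than one so that no divisor vanishes.

```python
import math

def comp(x):
    """Complement sqrt(1 - x**2) of a normalized variable with |x| <= 1."""
    return math.sqrt(max(0.0, 1.0 - x * x))

def order_update(rho_prev, nu, eta):
    """One stage of the normalized ladder order update.

    rho_prev : rho_{n+1,T-1}, reflection coefficient from the previous time step
    nu       : nu_{n,T},      normalized forward residual entering the stage
    eta      : eta_{n,T-1},   delayed normalized backward residual
    Returns (rho_{n+1,T}, nu_{n+1,T}, eta_{n+1,T}).
    """
    rho = comp(nu) * comp(eta) * rho_prev + nu * eta
    nu_next = (nu - rho * eta) / (comp(rho) * comp(eta))
    eta_next = (eta - rho * nu) / (comp(rho) * comp(nu))
    return rho, nu_next, eta_next

if __name__ == "__main__":
    print(order_update(0.3, 0.5, -0.4))
```

Counted naively, each stage costs three square roots, two divisions and roughly ten multiplications per sample, and the outputs again have magnitude bounded by unity.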
While applications of the DFT have been quite well studied, new uses of
ladder algorithms, some of which will be described in the next chapter, are
constantly emerging.
BIBLIOGRAPHY
[CT65] J.W. Cooley, J.W. Tukey, "An Algorithm for the Machine Calculation
of Complex Fourier Series," Math. Computation, Vol. 19, 1965, pp.
297-301.
[De74] A.M. Despain, "Fourier Transform Computers Using CORDIC
Iterations," IEEE Trans. Comput., Vol. C-23, Oct. 1974, pp. 993-1001.
[De79] A.M. Despain, "Very Fast Fourier Transform Algorithms for
Hardware Implementation," IEEE Trans. Computers, Vol. C-28,
No. 5, May 1979, pp. 333-341.
[GK73] M. Gevers, T. Kailath, "An Innovations Approach to Least-Squares
Estimation, Part VI: Discrete-Time Innovations Representations
and Recursive Estimation," IEEE Transactions on Automatic
Control, Vol. AC-18, December 1973, pp. 588-600.
[LM80] D.T.L. Lee and M. Morf, "Recursive Square-Root Ladder Estimation
Algorithms," Proc. 1980 ICASSP, Denver, CO, April 9-11, 1980.
[OS75] A. Oppenheim, R. Schafer, Digital Signal Processing, Prentice
Hall, 1975.
[PFM82] B. Porat, B. Friedlander, M. Morf, "Square Root Covariance
Ladder Algorithms," IEEE Trans. on Automatic Control, Vol. 27,
No. 4, August 1982.
[Sw78] E. Swartzlander, "VLSI Technology for Signal Processing," Proc.
of GOMAC, Monterey, CA, 1978, pp. 76-79.
[VT68] H. Van Trees, Detection, Estimation and Modulation Theory,
Volume I, J. Wiley and Sons, 1968.
CHAPTER THREE
APPLICATIONS OF LADDER FORMS
This section will explore a variety of signal processing applications of
ladder forms. Speech analysis and synthesis based on linear predictive
techniques will be discussed first, followed by adaptive equalization and
finally a new technique for digital signal detection. Algorithms of linear
algebra can frequently be cast into a ladder structure, however examples of
this will be deferred till Chapter Five. The aim of the present study is to
identify the important components of the algorithms that will ultimately be
implemented in a special architecture, thus unifying the algorithms, and to
develop an appreciation for the throughput requirements to be placed on
the signal processing chip.
3.1 THE SPEECH ANALYSIS PROBLEM
Ladder forms are useful structures for the analysis and synthesis of
speech using the methods of linear predictive coding (LPC) [MG76].
Analysis, not to be confused with recognition, refers to the problem of
determining a spectral representation of a discretized segment of speech
while synthesis is the act of reconstructing the analog speech from its
spectral representation.
Techniques for speech synthesis will be described first, following which
the speech analysis problem will be discussed. The use of ladder forms for
speech analysis will be the major case motivating the design of a ladder
form chip set.
3.1.1 Speech Synthesis Techniques
A trivial way to synthesize a particular segment of speech is to store
samples of the segment in digital form, for example pulse code modulation
(PCM) [TS71]. If these samples are made frequently enough (i.e. at least at
the Nyquist rate) then the speech is regenerated by passing the samples
through a digital to analog (D/A) converter. Such an approach requires a
relatively large amount of data to synthesize a short segment of speech.
Alternately, good quality speech may generally be synthesized with much
less data by deriving parametric models for the speech production process
and replacing a speech segment by its fewer model parameters, which
presumably requires fewer bits.
A linear model for speech production was developed in 1960 by Fant
[Fa60] in which the various physiological elements of the vocal tract are
modelled as time varying linear filters (Figure 3.1), the aggregation of which
is referred to as the "synthesis filter". Speech is produced by exciting the
synthesis filter whose spectral parameters have been adjusted to yield a
particular speech segment, as shown in Figure 3.2. The filter input for
unvoiced sounds, such as the fricatives in "fish", is a white noise signal.
Voiced sounds such as /i/ in "eve" are produced with an impulse train of
period P, the pitch period.
Speech signals admit to autoregressive modelling, meaning that the
aggregate filter has only poles. This filter is typically of tenth order so that
the data set required to synthesize a short segment of speech consists of
the ten filter coefficients, some pitch period information and perhaps some
power information for the white noise process. This can represent
considerable savings over direct storage of PCM coded samples as will be
seen in the example to follow.
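The all-pole model can be sketched as a short routine that drives an autoregressive filter with either excitation. This is an illustrative direct form recursion only: the function names, the sign convention s[t] = e[t] - sum_k a[k] s[t-1-k] and the Gaussian noise source are assumptions made for this example, and the ladder realization of the synthesis filter discussed in the text is not what is shown.

```python
import random

def synthesize(a, excitation):
    """All-pole (autoregressive) synthesis filter in direct form.

    a          : list of predictor coefficients a[0..p-1] (hypothetical values)
    excitation : input samples (impulse train or white noise)
    Implements s[t] = e[t] - sum_k a[k] * s[t-1-k].
    """
    s = []
    for t, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a):
            if t - 1 - k >= 0:
                acc -= ak * s[t - 1 - k]
        s.append(acc)
    return s

def impulse_train(length, period):
    """Voiced excitation: unit impulses every `period` samples (pitch period P)."""
    return [1.0 if t % period == 0 else 0.0 for t in range(length)]

def white_noise(length, seed=0):
    """Unvoiced excitation: zero mean, unit variance Gaussian samples."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

if __name__ == "__main__":
    voiced = synthesize([-0.5], impulse_train(8, 4))
    unvoiced = synthesize([-0.5], white_noise(8))
    print(voiced)
    print(unvoiced)
```

Only the coefficient list, the pitch period and a noise power need to be stored per segment, which is the data reduction the paragraph describes.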
[Figure 3.1: Filter Model for Speech Synthesis. Inputs: impulse train, white
noise. Blocks: Glottal Model, Vocal Tract Model, Spectral Correction, Lip
Radiation. Output: speech.]
[Figure 3.2: Speech Synthesis. Inputs: impulse train, white noise, reflection
coefficients. Block: Synthesis Filter. Output: speech.]
Texas Instruments Inc. announced a linear predictive speech synthesis
chip set [WB78] in 1978, consisting of three chips depicted functionally in
Figure 3.3. A read only memory (ROM) is used to store the speech
production parameters for the vocabulary to be used in any particular
application (e.g., TI's SPEAK and SPELL requires the 26 letters of the
alphabet as well as approximately 200 words). These production parameters
are retrieved as required by a controller chip and presented to the
synthesizer chip which is an all pole, ladder synthesis filter.
One complete parameter string is 49 bits in length and is retrieved
every 20 milliseconds. In contrast, a 20 ms speech segment sampled at
8 kHz utilizing 8 bit PCM would require 1280 bits! Clearly, the use of LPC in
speech synthesis yields a remarkable savings in the synthesis data set (and
hence in the total amount of ROM required for this chip set), however some
quality must be sacrificed. Frequently, the quality afforded by 8 bit PCM is
not required.
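The storage comparison is simple arithmetic and can be checked with a few lines. The helper name `pcm_bits` is hypothetical; the figures of 49 bits per frame, 20 ms frames, 8 kHz sampling and 8 bit PCM are those quoted in the text.

```python
def pcm_bits(sample_rate_hz, frame_ms, bits_per_sample):
    """Bits needed to store one frame of PCM coded speech."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * bits_per_sample

if __name__ == "__main__":
    lpc_frame_bits = 49                      # one LPC parameter string
    pcm_frame_bits = pcm_bits(8000, 20, 8)   # 160 samples of 8 bits each
    frames_per_second = 1000 // 20
    print("PCM bits per frame:", pcm_frame_bits)
    print("PCM bit rate (bit/s):", pcm_frame_bits * frames_per_second)
    print("LPC bit rate (bit/s):", lpc_frame_bits * frames_per_second)
    print("Reduction factor:", pcm_frame_bits / lpc_frame_bits)
```

With these figures the PCM representation needs 64000 bit/s against 2450 bit/s for the LPC parameters, a reduction of roughly 26 to 1.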
3.1.2 Speech Analysis
Depicted in Figure 3.4, speech analysis involves determining the filter,
referred to as the "analysis" or "inverse" filter, which whitens a speech
process. Clearly, since the synthesis filter, when driven by an impulse train
and white noise, produces speech, the analysis operations must generate
the impulse train, white noise process and spectral parameters from the
speech process. Consequently, a cascade of the analysis and synthesis
filters has unity transfer function (assuming the modelling assumptions
were accurate). An interpretation of the analysis problem is clear in an
estimation context. The input or speech process is correlated and the
analysis filter is that filter which produces the innovations process [GK73]
from the input. Under assumptions of stationarity, frequency domain
expressions for the whitening filter were given by Wiener [Wi49]. However,
for the purpose of realizing a speech analyzer in VLSI, it is preferable to view
the problem in the time domain, and in particular, in a discrete time
domain, hence the use of ladder filters.
[Figure 3.3: Three Chip Synthesizer by Texas Instruments Inc. Blocks:
Memory, Synthesizer, Controller. Output: speech.]
[Figure 3.4: Speech Analysis. Input: speech. Block: Analysis Filter.
Outputs: impulse train (pitch), white noise (power), reflection coefficients.]
The analysis problem is actually threefold:
(1) determine the whitening filter,
(2) determine the period of the impulse train, i.e., the pitch period,
(3) determine the power of the noise process.
This thesis will primarily address the first issue. Item (3) is of course a
byproduct of the whitening filter since the noise process (i.e. the
uncorrelated innovations process) is the output of this filter; however,
frequently the noise power is normalized to unity and account of the gain is
taken elsewhere. Many authors have examined various techniques for pitch
period extraction (see for example [AH71]). A novel maximum likelihood
approach using ladder forms was developed by Lee and Morf [LM80]. This
method utilizes the statistical distribution of the ladder form likelihood
variable (see Chapter Two) to reduce the problem of detecting a pitch pulse
to a binary hypothesis testing problem.
3.1.3 Speech Analysis with Square Root Normalized Ladder Forms - The Analysis Filter
Notice that the ladder form algorithms of Section 2.2 compute the best
whitening filter for a process in a least squares sense, i.e. the new
information or innovation of a sample of the input process is obtained as the
difference between that sample and a linear least squares estimate of the
sample from a history of the process. Therefore, ladder algorithms may be
used to determine the analysis filter under a least square error criterion.
Generally, analysis is performed on a frame of speech samples under
the assumption that the short term speech spectrum is stationary.
However, since ladder algorithms are both time and order recursive,
analysis may be done on a sample by sample basis, eliminating the need for
such assumptions. In fact, slow variations in the spectrum are tracked by
the algorithm due to its adaptive nature. Furthermore, sample by sample
analysis determines the inverse filter independent of any particular coding
technique, leaving this choice to the system designer.
Remark:
A least squares error criterion is frequently objected to because it does
not appear to correlate well with subjective distortion measures. However, a
weighted least squares (WLS) criterion correlates very well with these
subjective measures. Often, WLS can be achieved by applying a least
squares criterion to a prefiltered version of the data, hence justifying the
use of a least squares measure.
Ladder algorithms, like most signal processing algorithms, are quite
computationally expensive, requiring much processing power. The
discussions of Chapter Two suggested that the square root normalized
ladder recursions are more suited to digital implementation than their
unnormalized counterparts, since there are fewer equations and all
variables are magnitude normalized, making fixed point implementation
viable. Unfortunately, the normalized equations also require the calculation
of square roots, which is generally a time consuming proposition. The next
section will explore a method of recasting the ladder recursions, in an effort
to expose the operations which are fundamental to the equations and to
reduce their complexity to a manageable level.
3.1.4 An Alternate View of the Square Root Normalized Ladder Equations
Recall the square root normalized ladder algorithm:
\[
\begin{aligned}
\rho_{n+1,T} &= \sqrt{1-\nu^{2}_{n,T}}\,\sqrt{1-\eta^{2}_{n,T-1}}\;\rho_{n+1,T-1} + \nu_{n,T}\,\eta_{n,T-1} \\
\nu_{n+1,T} &= \frac{\nu_{n,T} - \rho_{n+1,T}\,\eta_{n,T-1}}{\sqrt{1-\rho^{2}_{n+1,T}}\,\sqrt{1-\eta^{2}_{n,T-1}}} \\
\eta_{n+1,T} &= \frac{\eta_{n,T-1} - \rho_{n+1,T}\,\nu_{n,T}}{\sqrt{1-\rho^{2}_{n+1,T}}\,\sqrt{1-\nu^{2}_{n,T}}}
\end{aligned}
\]
where $\nu$ and $\eta$ are the normalized forward and backward residuals
respectively and $\rho$ is the filter gain (or reflection coefficient or normalized
partial correlation).
Clearly, the digital realization of these three equations is nontrivial,
requiring the use of square root, multiplication and division operations.
Generally, these operations are quite expensive, requiring either special
hardware (e.g. array multipliers) or requiring considerable execution time
through the repeated use of simple hardware (e.g. shift and add instead of
multiplication or Newton's method [SD65] for square root operations).
However, by placing appropriate interpretations on the ladder variables, the
ladder equations may be written in terms of vector rotations which are few
in number. Efficient numerical algorithms for computing these rotations
will be given in Chapter Four, thus allowing for a particularly effective
implementation of the ladder filter.
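For notational convenience, let

$$ \rho = \rho_{n+1,T-1}, \qquad \rho_+ = \rho_{n+1,T} \tag{3.1a} $$
$$ \nu = \nu_{n,T}, \qquad \nu_+ = \nu_{n+1,T} \tag{3.1b} $$
$$ \eta = \eta_{n,T-1}, \qquad \eta_+ = \eta_{n+1,T} \tag{3.1c} $$
$$ x^{c} = \sqrt{1-x^2}, \qquad x^{-c} = (1-x^2)^{-1/2} \tag{3.1d} $$

Then the ladder recursions may be rewritten as:

$$ \rho_+ = \nu^c \eta^c \rho + \nu\eta \tag{3.2a} $$
$$ \nu_+ = (\nu - \rho_+ \eta) / \rho_+^{c}\, \eta^{c} \tag{3.2b} $$
$$ \eta_+ = (\eta - \rho_+ \nu) / \rho_+^{c}\, \nu^{c} \tag{3.2c} $$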
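Now, observe that:

(1) Since $|\nu|, |\rho|, |\eta| < 1$ always, interpret $\nu$, $\eta$, $\rho$ as
    cosines of some angles $\theta_\nu$, $\theta_\eta$ and $\theta_\rho$
    respectively. Note that if $x = \cos\theta_x$ then $x^c = \sin\theta_x$.

(2) Let

$$ V = \begin{bmatrix} \nu^c & \nu \\ -\nu & \nu^c \end{bmatrix}, \quad
   A = \begin{bmatrix} \rho & 0 \\ 0 & 1 \end{bmatrix}, \quad
   N = \begin{bmatrix} \eta^c & -\eta \\ \eta & \eta^c \end{bmatrix} \tag{3.3} $$

    where V and N are orthogonal or 2x2 rotations. Then the matrix
    product VAN is

$$ VAN = \begin{bmatrix} \nu^c \eta^c \rho + \nu\eta & \nu\eta^c - \rho\,\eta\,\nu^c \\
   \eta\,\nu^c - \rho\,\nu\,\eta^c & \nu^c \eta^c + \rho\nu\eta \end{bmatrix} \tag{3.4} $$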
$$ = \begin{bmatrix} \rho_+ & \nu_+\,\rho_+^{c} \\ \eta_+\,\rho_+^{c} & \text{don't care} \end{bmatrix} \tag{3.5} $$
I/. T}.
an d [1/+ 77+] = [1 0 ] /? (3.6a)
0 0
p r
w here R (3.6b)
0
is J-orthogonal with respect to:

$$ \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} $$
Notice that under the interpretations of (1) both V and N are
orthogonal matrices representing rotations of the column vectors of A
through $\theta_\nu$ and $\theta_\eta$. It is interesting to note the
fundamental nature of rotations in the ladder recursions. In particular,
rotations by $\theta_\nu$ and $\theta_\eta$ give almost the entire update of
the ladder variables, a final J-rotation being necessary to yield $\nu_+$
and $\eta_+$. This is not too surprising since ladder forms are naturally
related to the Gram-Schmidt orthonormalization and orthogonal
transformations as shown in [MML81]. Implementation of the ladder
recursions, which appear to have manageable complexity when rotations are
the fundamental operations, will be considered in Chapter Six.
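
The rotation interpretation can be checked numerically. The following
Python sketch is illustrative only: the function names are hypothetical,
and V, A and N are assumed to be the 2x2 matrices implied by the product
(3.4). It evaluates (3.2a)-(3.2c) directly, then recovers the same values
from the product VAN, rescaling the two off-diagonal entries by
$\rho_+^{-c}$ in place of the final J-rotation.

```python
import math


def comp(x):
    """Complement x^c = sqrt(1 - x^2), as in (3.1d)."""
    return math.sqrt(1.0 - x * x)


def ladder_update_direct(rho, nu, eta):
    """Direct evaluation of the normalized recursions (3.2a)-(3.2c)."""
    rho_p = comp(nu) * comp(eta) * rho + nu * eta
    nu_p = (nu - rho_p * eta) / (comp(rho_p) * comp(eta))
    eta_p = (eta - rho_p * nu) / (comp(rho_p) * comp(nu))
    return rho_p, nu_p, eta_p


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def ladder_update_rotations(rho, nu, eta):
    """Same update obtained from the product V A N (assumed matrix forms)."""
    V = [[comp(nu), nu], [-nu, comp(nu)]]        # rotation by the angle of nu
    A = [[rho, 0.0], [0.0, 1.0]]
    N = [[comp(eta), -eta], [eta, comp(eta)]]    # rotation by the angle of eta
    M = matmul2(matmul2(V, A), N)
    rho_p = M[0][0]                              # (1,1) entry is rho_+
    scale = 1.0 / comp(rho_p)                    # final normalization by rho_+^c
    return rho_p, M[0][1] * scale, M[1][0] * scale
```

Calling both functions with the same arguments, for example
(0.3, 0.5, -0.4), should return matching triples, which shows that two
plane rotations produce everything except the last normalization.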
3.2 ADAPTIVE EQUALIZATION
An important problem in digital telephony is to achieve high data rates
across a telephone line, in the presence of a severe filtering effect
caused by the channel. The act of prefiltering the input data to the
receiver (Figure 3.5) to compensate for the channel characteristics is
known as equalization [LSW65]. A compromise equalizer is basically a fixed
transfer function, high pass filter. While a performance improvement is
realized, such an equalizer is unable to effectively cope with either the
variation of characteristics from line to line, or with the slow time
variations of any particular line. An adaptive equalizer is a filter whose
spectral properties are continuously set by an ongoing learning of the
channel characteristics. Initially, the equalizer is trained with a known
training sequence and subsequently, it is able to track small variations
of the line, based on the actual, useful data.
Popular implementations of adaptive equalizers are of the tapped delay
line variety (Figure 3.6), in which the tap coefficients are adjusted
through a least mean squares (LMS) gradient algorithm [Wi70] as defined by:

$$ z_k = \sum_n c_n^k\, r_{k-n} $$
$$ c_n^{k+1} = c_n^k - \Delta\, e_k\, r_{k-n}^{*} $$

where
    $c_n^k$   is the nth complex tap coefficient at time k
    $r_n$     is a complex input sample (applies to all linear
              modulation schemes)
    $z_k$     is the complex equalizer output
    $\Delta$  is a real adaptation constant
    $e_k$     is a complex error signal supplied from elsewhere in the
              modem (referred to as decision feedback equalization)
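
The tap update can be sketched in a few lines of Python. This is a
minimal illustration, not the modem implementation: the function name is
hypothetical, and the error signal is assumed to be formed against a
known training sequence rather than supplied by a decision device
elsewhere in the modem.

```python
def lms_equalize(received, reference, num_taps, step):
    """Train a complex tapped delay line with the LMS gradient update.

    received  : input samples r_k
    reference : known training symbols (an assumption of this sketch)
    num_taps  : number of tap coefficients c_n
    step      : real adaptation constant (Delta)
    """
    taps = [0j] * num_taps
    outputs = []
    for k in range(len(received)):
        # Delay line contents r_{k-n}, n = 0..num_taps-1 (zero before start).
        window = [received[k - n] if k - n >= 0 else 0j
                  for n in range(num_taps)]
        z = sum(c * r for c, r in zip(taps, window))   # equalizer output z_k
        e = z - reference[k]                           # error signal e_k
        # Gradient step: move each tap against e_k times the conjugate input.
        taps = [c - step * e * r.conjugate() for c, r in zip(taps, window)]
        outputs.append(z)
    return taps, outputs
```

For a channel that only scales the symbols by 0.5, a single tap driven
this way settles near 2, the inverse of the channel gain. The slow
geometric approach to that value is the convergence behaviour discussed
in the text.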
This structure will be revisited briefly in Chapter Six. A drawback of
gradient algorithms is their relatively slow convergence rate, owing to
the locally optimal but globally suboptimal trajectory followed by the
filter during the adaptation. Specifically, changing the coefficients of
the equalizer at each time step in a manner which results in the largest
reduction of the mean square error at that step (local optimality) does
not guarantee the most rapid trajectory to the desired solution (global
optimality). In contrast, the ladder algorithms are globally optimal in a
mean square sense since they satisfy a least squares criterion at each
step of the iteration (it is noteworthy that least squares tapped delay
line filters also exist, see e.g. [Mo74]). Furthermore, they exhibit the
same order of computational complexity as does the LMS algorithm [Sh79],
thus motivating their study for fast startup equalizers. Various authors
have studied the exact least squares equalizer [Sh79] and the results of
Satorius and Pack [SP80] are presented here.

Figure 3.5: Channel Equalization

Figure 3.6: Tapped Delay Line Equalizer
3.2.1 Equalizer Structure
Let $\{a_k\}_{k \ge 0}$ be a known training sequence transmitted across the
channel of Figure 3.5. Let the input sequence to the equalizer be
$\{x_k\}_{k \ge 0}$ and let

$$ X_N(k) = [\, x(k) \;\; x(k-1) \;\cdots\; x(k-N) \,]^T \tag{3.7} $$

with

$$ x(i) = 0 \quad \forall\, i < 0 . $$

The problem is to determine the N+1 dimensional vector $F_N(k)$ of
tap coefficients, which minimizes the mean square error:

$$ \mathrm{mse} = \sum_{p=0}^{k} \lambda^{k-p}\, \bigl| a(p) + F_N^T(k)\, X_N(p) \bigr|^2 \tag{3.8} $$

The parameter $\lambda$, $0 < \lambda \le 1$, is a real constant which
determines the memory of the equalizer. The equalizer can effectively
track slow channel variations if the memory is not infinite.
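
As a hedged illustration of the role of the memory parameter, the
weighted cost (3.8) can be evaluated directly for a candidate tap
vector. The function name and the zero padding of the input before time
zero are assumptions of this sketch.

```python
def weighted_squared_error(a, x, taps, lam):
    """Exponentially weighted cost of (3.8) evaluated at the latest time k.

    a    : training symbols a(0..k)
    x    : equalizer input samples x(0..k)
    taps : candidate tap vector F_N (length N + 1)
    lam  : memory parameter lambda, 0 < lam <= 1
    """
    k = len(a) - 1
    order = len(taps) - 1
    total = 0.0
    for p in range(k + 1):
        # X_N(p) = [x(p), x(p-1), ..., x(p-N)], taking x(i) = 0 for i < 0.
        window = [x[p - i] if p - i >= 0 else 0.0 for i in range(order + 1)]
        err = a[p] + sum(f * xi for f, xi in zip(taps, window))
        # Older errors are discounted by lam for each step of age.
        total += lam ** (k - p) * abs(err) ** 2
    return total
```

With lam = 1 every past error counts equally (infinite memory). With
lam < 1 an error that is m samples old is weighted by lam**m, so old
channel conditions are gradually forgotten.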
Equating the derivative of (3.8) to zero yields the equalizer equations
for minimum mean square error, which have a solution in ladder form as
demonstrated in [SP80]. The resulting equalizer structure, shown in Figure
3.7, consists of an exact least squares ladder whitening filter and a
tapped delay line. Figure 3.8 is a comparison of this ladder algorithm
(LSALE) with the conventional LMS tapped delay line equalizer (Gradient)
and a gradient ladder (ALCE) equalizer, for a typical telephone channel
(from [SP80]). Notice that the convergence and bias of the exact least
squares ladder equalizer are far superior to the other structures, for the
same order of computational complexity, because the convergence
trajectories of the former method are globally optimal.

Figure 3.7: Ladder Filter Equalizer
(panels: the least squares, adaptive lattice equalizer; the mth stage of
the lattice)

Figure 3.8: Performance of Ladder Equalizer
(panels: comparison by simulation of convergence properties for
channel-correlation matrix eigenvalue ratio = 11 and eigenvalue ratio = 21;
11 tap equalizer, noise variance .001; curves for ALCE, LSALE, gradient
algorithm and optimum versus number of iterations)
3.3 DETECTION OF DIGITAL SIGNALS
An age old problem has been the detection of digital signals in additive
coloured Gaussian noise (see e.g., [VT68]). Solutions that are optimal in a
maximum likelihood sense have been described [WJ65], however attention
has been primarily afforded to obtaining computable expressions for the
likelihood ratios that occur in these solutions. Optimal detector
structures are quite simple when the additive noise is white and has a
circularly symmetric probability density, for example Gaussian [WJ65].
However, noise colouring complicates the problem and two implementation
approaches have been frequently employed in such situations. The first of
these is to simply ignore the colouring altogether and therefore, employ
well known, realizable optimal detectors for additive white Gaussian
noise. The second approach is to whiten the observed signal plus noise
process and follow this with a detector which is optimal for white noise.
This latter technique, using the so called whitening filter, involves
prior knowledge of the covariance structure of the noise process in order
to determine the inverse filter. This is clearly "asking for a lot". While
one frequently does not know the covariance structure of a noise process,
it may be possible to assert a model for the noise, e.g. it is an
autoregressive process of modest order. In this case, ladder forms provide
a nice structure for optimal detection.
Ladder filters are very useful for estimating the likelihood ratio of
digital signals in additive Gaussian noise. In fact, it will become clear
that the ladder form may essentially be used as an adaptive whitening
filter which learns the covariance structure of the noise process and thus
supplies a set of asymptotically orthogonal (i.e., white) sufficient
statistics for detection. Simulations have shown that amplitude shift
keyed (ASK), frequency shift keyed (FSK) and phase shift keyed (PSK)
signals may be readily detected with ladder forms, e.g. [Le80]; however, in
order to explain the detector structure, a simpler problem will be
considered initially. It is desired to recover the timing of a known rate,
baseband digital bit stream transmitted across an infinite bandwidth,
additive Gaussian noise channel. The transmission format is of the
non-return to zero (NRZ) variety, as shown in Figure 3.9. The timing
recovery algorithm operates as follows. Both transmitter and receiver know
the data rate and each have free running clocks at equal frequencies,
however at different phases. Transitions in receiver data are used to
force the receiver clock into synchronism with the transmitter so that it
suffices to detect the occurrence of bit transitions for timing recovery.
(This is known as jam-timing correction. Frequently, the bit transitions
are only used to make small timing corrections, thus resulting in a
smoother timing acquisition.) It is worth remarking that FSK and PSK
signals may be analyzed in a similar manner to what will be presented
here.
In embarking on this discussion, it is important to realize that the
detailed analysis of the detector is not within the scope of this
dissertation and will be presented elsewhere. However, the simple example
of timing recovery is chosen to illustrate firstly, the utility of ladder
forms, hence the utility of a signal processing chip capable of
implementing the ladder recursions (the design of which is the goal here).
Secondly, this example is meant to illustrate the importance of developing
algorithms that are readily implemented. In this case, knowing that an
efficient ladder form chip is to be designed, a good optimal detection
algorithm (or at least a timing recovery algorithm) utilizing the ladder
form will be developed.
3.3.1 Timing Recovery With Ladder Forms
Consider the system of Figure 3.10 in which the receiver uses an
unnormalized, prewindowed or sliding window ladder form to estimate
likelihood variables. Suppose that the noise process is zero mean with an
$n \times n$ covariance matrix, $\Sigma$, such that the joint distribution
of n samples of the input process, $y_n$, is:

$$ p_y(\Sigma) = (2\pi)^{-n/2}\, |\Sigma|^{-1/2} \exp\!\left( -\tfrac{1}{2}\, y^T \Sigma^{-1} y \right) $$

where $y = [\, y_1 \; y_2 \;\cdots\; y_n \,]^T$ and $\Sigma$ is an unknown
parameter.

Proposition 3.1: The statistic $T(y) = y^T \Sigma^{-1} y$ is sufficient for
the family $P_y = \{\, p_y(\Sigma) \mid \Sigma \in \omega \,\}$ with $\omega$
the parameter space.

Proof: $p_y(\Sigma)$ admits the factorization $g_\Sigma(T(y))\, h(y)$ with:

$$ g_\Sigma(T(y)) = (2\pi)^{-n/2}\, |\Sigma|^{-1/2} \exp\!\left( -\tfrac{1}{2}\, T(y) \right) $$
$$ h(y) = 1 $$

and so $T(y)$ is sufficient for $P_y$ by the well known factorization
theorem [Le59].
Figure 3.9: Non-Return to Zero Transmission Format

Figure 3.10: Digital Transmission System
Proposition 3.2: $\gamma_n = y^T \Sigma_n^{-1} y = r_n^T R_n^{-1} r_n$
where:
    $r_n$ = vector of backward residuals
    $\gamma_n^c = 1 - \gamma_n$

Proof: This was shown by Lee and Morf [LM80], assuming that the sample
covariance matrix provides a good estimate of $\Sigma$, which is reasonable
for a sufficiently large sample.
Hence, $\gamma_{n,T}$ is a sufficient statistic for detection; in fact, it
is a likelihood variable. Although this property of $\gamma_{n,T}$ will not
be specifically exploited in this simple motivation, it certainly
justifies examining $\gamma_{n,T}$ as a possible test statistic for
transition detection. For this reason, it is fruitful to establish its
statistical distribution under various conditions. However, it is
important to note that the unnormalized, weighted prewindowed ladder form
equations of Chapter Two must be modified to normalize the covariances
correctly. This influences the distribution of $\gamma_n$. In the sequel,
it is assumed that the algorithm has been appropriately modified to
normalize the covariances and partial covariances by the number of
samples.
Proposition 3.3: Suppose that sufficient data has been observed to
accurately estimate $\Sigma$ and hence $K^r_{n,T}$, and further assume that
no transitions have occurred for a sufficiently long time so that
$E(\varepsilon_{k,T}) = E(r_{k,T}) = 0$ for $0 < k \le n$. Then
$\gamma_{n,T} \sim \chi_n^2$, i.e., a central chi squared variate with n
degrees of freedom.
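Proof: Under the conditions stated,

$$ r_{k,T} \sim N(0, R_{k,T}) $$

and

$$ E[\, r_{k,T}\, r_{l,T} \,] = 0 \quad \forall\, k \ne l $$

by the orthogonality principle of linear least squares estimation [Pa65].

$$ \gamma_{n,T} = [\, r_{1,T} \;\cdots\; r_{n,T} \,]
   \begin{bmatrix} R_{1,T} & & 0 \\ & \ddots & \\ 0 & & R_{n,T} \end{bmatrix}^{-1}
   \begin{bmatrix} r_{1,T} \\ \vdots \\ r_{n,T} \end{bmatrix}
   = \sum_{i=1}^{n} \frac{r_{i,T}^2}{R_{i,T}} $$

but

$$ \frac{r_{i,T}}{\sqrt{R_{i,T}}} \sim N(0, 1) \;\Rightarrow\; \gamma_{n,T} \sim \chi_n^2 $$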
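
The proof can be illustrated with a small Monte Carlo experiment. This
is a sketch under stated assumptions: the residuals are drawn as
independent zero mean Gaussians with known variances, and the function
names and seed are hypothetical. The average of the statistic should
then sit close to n, the mean of a central chi squared variate with n
degrees of freedom.

```python
import random


def gamma_statistic(residuals, variances):
    """Sum of squared residuals, each normalized by its own variance."""
    return sum(r * r / v for r, v in zip(residuals, variances))


def simulate_mean_gamma(variances, trials, seed=0):
    """Average the statistic over independent zero mean Gaussian residuals."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Uncorrelated residuals: r_i ~ N(0, R_i), no transition present.
        residuals = [rng.gauss(0.0, v ** 0.5) for v in variances]
        total += gamma_statistic(residuals, variances)
    return total / trials
```

Because each residual is divided by its own variance, the result does
not depend on the individual variances, only on how many residuals are
summed.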
In keeping with the spirit of a simple example, the present analysis will
be considerably simplified by ignoring the random nature of the reflection
coefficients and replacing them with fixed values during periods of non
transition (Simulations presented at the end of this section will justify
this simplification). It is now necessary to explore how the reflection
coefficients change during a transition.

Proposition 3.4: Let a transition of 'y' occur at $t = t_0$ from $+V$ to
$-V$. The reflection coefficients of the ladder filter are unaltered by
the transition.
Proof: see appendix A.

Proposition 3.5: For a transition of 2V in the received signal at
$t = t_0$, $E(r_k) = 2V$ at $t = t_0 + k$ for $0 < k \le n$. Otherwise
$E(r_k) = 0$.

Proof: see appendix B. The consequence of this proposition is that an
impulse of change in residual mean propagates through the ladder form,
altering the expectation of each residual for one time step.
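Theorem 3.1: For a transition of 2V at $t = t_0$ in the received signal,
$\gamma_{n,t} \sim \chi_n^2(\lambda_i)$ for $t_0 < t \le t_0 + n$ with
noncentrality parameter given by:

$$ \lambda_i = \frac{4V^2}{R_{i,t}}, \qquad i = t - t_0 $$

Proof:

$$ \gamma_{n,t} = \sum_{i=1}^{n} \frac{r_{i,t}^2}{R_{i,t}}, \qquad t_0 < t \le t_0 + n $$

but at $t = t_0 + k$

$$ \frac{r_{i,t}}{\sqrt{R_{i,t}}} \sim N(0, 1) \quad \forall\, i \ne k $$

and from proposition 3.5

$$ \frac{r_{k,t}}{\sqrt{R_{k,t}}} \sim N\!\left( \frac{2V}{\sqrt{R_{k,t}}},\; 1 \right) $$

$$ \gamma_{n,t} \sim \sum_{i=1}^{n-1} \chi_1^2 + \chi_1^2(\lambda_k) = \chi_n^2(\lambda_k) $$

by the reproductive property of the chi square distribution [Le59].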
This theorem and proposition 3.3 are sufficient to completely
characterize $\gamma_{n,T}$ both before and after a transition. Following a
transition, $\gamma_{n,T}$ reverts to having a zero noncentrality parameter
in 'n' time steps as a consequence of proposition 3.5. Hence, the timing
recovery problem has now been reduced to a simple two hypothesis testing
problem, for which a wealth of knowledge exists. It is important to note
that most of these proofs rely on $r_{i,t}$ being normally distributed.
Strictly speaking, this is not the case since $R_{i,t}$ is only an estimate
(that is, the correct distribution is Student's-t) of the true error
covariance. However, after a sufficient number of samples (which is quite
small for Gaussian signals), this estimate has enough degrees of freedom
to justify the approximation.
Theorem 3.1 indicates that there are 'n' sequential opportunities for
detecting the transition since $\lambda$ is nonzero for n samples (Earlier
detection is of course more desirable with a jam timing scheme since the
expected timing error is reduced). Let $P_{F,i}$ be the false alarm
probability at the ith sample (i = 1, 2, ..., n) and $P_{M,i}$ the
associated missed detection probability. Then, the overall false alarm
($P_{FA}$) and detection ($P_D$) probabilities are:

$$ P_{FA} = \sum_{i=1}^{n} P_{F,i} \tag{3.9} $$

$$ P_D = 1 - \prod_{i=1}^{n} P_{M,i} \tag{3.10} $$

Notice that as the false alarm probability increases additively, the
missed detection probability decreases geometrically.
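
The two combination rules are simple enough to state as code. The sketch
below uses hypothetical function names and takes the per-sample
probabilities as plain lists. The overall miss probability is returned
separately from the detection probability so that very small values are
not lost to rounding when subtracted from one.

```python
import math


def overall_false_alarm(p_false_alarm):
    """Equation (3.9): per-sample false alarm probabilities add."""
    return sum(p_false_alarm)


def overall_miss(p_miss):
    """A transition is missed only if every one of the n trials misses it."""
    return math.prod(p_miss)


def overall_detection(p_miss):
    """Equation (3.10): one minus the product of the miss probabilities."""
    return 1.0 - overall_miss(p_miss)
```

With eight trials at a per-sample false alarm probability of 1e-4 and a
per-sample miss probability of 0.06, the overall false alarm probability
is 8e-4, while the overall miss probability is 0.06 raised to the eighth
power.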
Consider the evaluation of the false alarm ($P_F$) and detection ($P_D$)
probabilities for a single sample. A bit transition is now determined
through the two hypothesis problem:

    $H_0$: no transition, i.e., $\gamma_{n,T} \sim \chi_n^2$

versus

    $H_1$: transition, i.e., $\gamma_{n,T} \sim \chi_n^2(\lambda)$

which results in a threshold detection scheme of the form:

    $\gamma_{n,T} < T$     no transition
    $\gamma_{n,T} > T$     transition
    $\gamma_{n,T} = T$     randomize if desired
where T is the threshold which determines the type I and type II (false
alarm and missed detection) error probabilities. Unfortunately, it is
difficult to characterize the test since $\lambda$ is not a location
parameter of the chi-squared distribution. However, good results may be
obtained through the use of Fisher's normal approximation to the central
$\chi^2$ distribution and Patnaik's [Pa49] approximation to the
non-central $\chi^2$. For $\gamma_n(\lambda)$ a chi-squared variate with a
large number, n, degrees of freedom and noncentral parameter, $\lambda$,
these approximations are:

$$ \sqrt{2\gamma_n(0)} \sim N(\sqrt{2n-1},\; 1) $$

and

$$ \sqrt{2\gamma_n(\lambda)} \sim N(\mu,\; \sigma^2) $$

with

$$ \mu = \sqrt{\frac{n+2\lambda}{n+\lambda}}\; \sqrt{2\nu - 1} \tag{3.11} $$

$$ \sigma^2 = \frac{n+2\lambda}{n+\lambda} \tag{3.12} $$

The approximation of the noncentral chi-squared distribution is better
than for the central distribution for a given value of 'n' because the
former has effectively $\nu > n$ degrees of freedom given by:

$$ \nu = \frac{(n+\lambda)^2}{n+2\lambda} \tag{3.13} $$
The two approximate distributions are shown in Figure 3.11. Once a
threshold, T, is established, the false alarm and detection probabilities
are given by:

$$ P_F = \mathrm{Prob}[\, \text{reject } H_0 \mid \lambda = 0 \,]
       = \mathrm{erfc}\,(T - \sqrt{2n-1}) \tag{3.14} $$

and

$$ P_D = 1 - \mathrm{Prob}[\, \text{accept } H_0 \mid \lambda = \lambda_0 \ne 0 \,]
       = 1 - \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{T} e^{-(x-\mu)^2 / 2\sigma^2}\, dx
       = \mathrm{erfc}\!\left( \frac{T - \mu}{\sigma} \right) \tag{3.15} $$

where $\mu$, $\sigma$ are given by (3.11), (3.12) for $\lambda = \lambda_0$
and

$$ \mathrm{erfc}\; x = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt $$

is the complementary error function.
Example 3.1:

Recall the timing recovery scheme outlined in Section 3.3. Such a
"jammed-timing" scheme is clearly more sensitive to false alarm
information than missed detection, for an incorrect timing jam (false
alarm) can cause large timing errors. However, failing to make an
occasional timing correction (missed detection) is not crucial since once
the transmitter and receiver clocks are synchronized, they drift slowly.
The detection threshold for a given $P_F$ is determined from (3.14),

$$ T = \sqrt{2n-1} + \mathrm{erfc}^{-1}(P_F) $$

following which the detection probability is given by (3.15). Figures 3.12
and 3.13 show the threshold and miss probabilities for various $P_F$,
$\lambda$ for the case of n = 3. The integrity of the normal approximation
to the noncentral chi-squared may be ascertained from the effective
degrees of freedom, $\nu$, shown in Figure 3.14.
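
The threshold and miss probability calculations can be sketched as
follows. Several points are assumptions of this sketch rather than
statements from the text: the function names are hypothetical, the
"erfc" of the text is taken to be the upper tail of the unit normal
distribution, the Patnaik constants are written in their standard
textbook form, and a noncentrality quoted in dB is converted as
10**(dB/10).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gaussian_tail(x):
    """Upper tail probability of the unit normal (the 'erfc' of the text)."""
    return 1.0 - STD_NORMAL.cdf(x)


def threshold(n, p_false_alarm):
    """T = sqrt(2n - 1) + erfc^{-1}(P_F), obtained by inverting (3.14)."""
    return math.sqrt(2 * n - 1) + STD_NORMAL.inv_cdf(1.0 - p_false_alarm)


def miss_probability(n, lam, p_false_alarm):
    """Single-sample miss probability under the normal approximations."""
    var = (n + 2 * lam) / (n + lam)          # variance of sqrt(2 * gamma)
    dof = (n + lam) ** 2 / (n + 2 * lam)     # effective degrees of freedom
    mean = math.sqrt(var * (2 * dof - 1))    # mean of sqrt(2 * gamma)
    t = threshold(n, p_false_alarm)
    # Detection probability is the tail above T; a miss is its complement.
    return 1.0 - gaussian_tail((t - mean) / math.sqrt(var))
```

For n = 8, a false alarm probability of 1e-4 and a noncentrality of
16 dB, this sketch gives a threshold near 7.6 and a miss probability of
roughly 0.06.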
Remarks:
1) The noncentrality parameter is a measure of the signal to noise ratio
   of the whitened signal components.
2) Applications which are more sensitive to missed detection than false
   alarm (e.g. radar) can benefit from repeated detection trials, as
   indicated by Equations (3.9) and (3.10). For example, $P_F = .0001$
   leads to a $P_M \approx .06$ at $\lambda = 16$ dB. However, with an
   eighth order ladder filter, there are eight opportunities for detection
   which results in $P_{FA} \approx .0008$ and
   $P_M = 1 - P_D = 1.2 \times 10^{-10}$ assuming $\lambda_i \approx \lambda$,
   i = 1, 2, ..., n (which simulations will show to be a reasonable
   assumption).
3) Normal approximations to a chi-squared variate are accurate for large
   degrees of freedom (e.g. $\nu > 20$). This fact, together with Figure
   3.14 should be borne in mind when using Figure 3.13. The normal
   approximation provides a convenient vehicle for a simple exposition
   such as this, however accurate results for low signal to noise ratios
   must be obtained from the chi-squared distributions.
4) The present analysis has involved the test of a simple hypothesis
   against a simple alternative. This corresponds to having prior
   knowledge of the noise power (effectively, $\lambda$). A composite
   hypothesis test would provide results for cases where the noise power
   and the bias in the partial correlation coefficients due to the noise
   is unknown.
5) The previous section has explored timing recovery, rather than the
   complete problem of signal detection. A simple differential detection
   method using the timing recovery data may be readily constructed, for
   every transition corresponds to a bit change. The usual problems
   associated with differential detection [TS71] are encountered. Ladder
   forms may be used for memoryless detection also, however a treatment
   of that subject is not consistent with the goals of this dissertation.
6) The baseband case that has been explored here consists of a nearly
   discontinuous signal corrupted by a differentiable noise process.
   Singular detection theory¹ shows that it is always theoretically
   possible to detect the signal with arbitrarily low error probability
   by differentiating the signal plus noise process. The ladder filter
   provides a realistic detection structure in the singular case.
7) Many assumptions were made in presenting this simplified analysis
   since a detailed evaluation was not consistent with the theme of the
   thesis. Perhaps the most serious of these was to ignore the
   contribution of the autoregressive noise process to the reflection
   coefficients of the ladder. However, this assumption is not as
   restrictive as it may initially appear since the detector can first be
   trained to the noise spectrum in the absence of a signal and then
   account may be taken of the noise reflection coefficients when a signal
   is present (i.e. during normal operation of the detector).

¹ I am grateful to Professor Kailath for pointing out this fact.

Figure 3.12: Detection Threshold vs. False Alarm Probability
(horizontal axis: Probability of False Alarm ($P_F$))

Figure 3.13: Missed Detection Probability for Various $P_F$, $\lambda$
(horizontal axis: Lambda (dB))

Figure 3.14: Effective Degrees of Freedom of $\gamma_{n,T}(\lambda)$
(horizontal axis: Lambda (dB))
3.3.2 Simulation Results
A modem simulation program embodying a transmitter, additive Gaussian
noise channel and a receiver was written to gather simulation data. The
noise source was an eighth order autoregressive model being driven by
white noise. Figure 3.15 shows the evolution of the $K_{i,T}$ with time for
a clean, baseband, NRZ format signal. Notice that $K_{1,T} \approx 1$, which
was assumed in Proposition 3.4, is a good assumption, as is the
approximation that the remaining partial correlation coefficients are
zero. Notice further that Proposition 3.5 is verified since the values of
the backwards residuals behave as expected. The same results are shown in
Figure 3.16 for a much lower signal to noise ratio.

Simulations were also performed for PSK and FSK signals for various
signal to noise ratios and ladder filter orders. These are shown in
Figures 3.17 - 3.21. Notice that a normalized value of $\gamma_{n,T}$ has
been plotted for convenience as well as the change in $\gamma_{n,T}$ from
sample to sample. The results (e.g. Figures 3.15 - 3.18) show that
$\lambda_i \approx \lambda$, i = 1, 2, ..., n following a transition, which
provides a convenient way to apply Equations (3.9) and (3.10) since
$P_{F,i}$ and $P_{M,i}$ become independent of i. The transitions in
$\gamma_{n,T}$, even for low signal to noise ratios, are quite profound.
The effect of reducing the order of the ladder is to increase the band of
noise around $\gamma_{n,T}$ (e.g. Figures 3.19 and 3.21) since the ladder
is unable to match that portion of the noise spectrum.
CHAPTER SUMMARY AND CONCLUSIONS
The purpose of Chapter Three was to identify target applications and
throughput requirements for the signal processing chip to be designed.

Figure 3.15: Baseband Simulation, SNR=46 dB, 8th order ladder
Figure 3.16: Baseband Simulation, SNR=20dB, 8th order ladder
Figure 3.17: Binary PSK, SNR=12.4dB, 8th order ladder
Figure 3.18: Binary PSK, SNR=0dB, 8th order ladder
Figure 3.19: Binary PSK, SNR=12.4dB, 2nd order ladder
Figure 3.20: Binary FSK, SNR=26.3dB, 8th order ladder
Figure 3.21: Binary FSK, SNR=12.4dB, 4th order ladder

Ladder filters were shown to provide a unified structure for the signal
processing tasks of speech analysis and synthesis, adaptive equalization
and digital signal detection. Voiceband applications of these tasks
frequently entail the handling of data at an 8 kHz sample rate. Therefore,
the signal processing chip must be capable of computing the ladder filter
equations at that rate. All of the suggested applications use filters
which are typically of eighth or tenth order. It would be ideal if the
chip to be designed could compute all ten stages in the requisite amount
of time, however, it will suffice to compute one stage per chip and
cascade ten chips to form the tenth order filter.

The complexity of the normalized ladder form equations motivated an
alternate formulation to expose the fundamental nature of generalized
vector rotations in describing the algorithms. It was shown that the
recursions could be readily computed as a sequence of a few
two-dimensional rotations. The arithmetic unit of the chip will therefore
be based on efficient numerical techniques, studied in the next chapter,
for performing these rotations.

Finally, note that the new scheme developed for digital signal detection
is important for two reasons. It is first important in its own right as a
detection scheme because it requires less a priori knowledge than existing
schemes, and it is capable of adapting to changes in the noise
environment. Secondly, it is an example of devising an algorithm that is
amenable to implementation for a specific task. In this case, by using
ladder filters in the detection algorithm, it is now possible to perform
signal detection using the same hardware that would have been developed
for speech processing and adaptive equalization. In fact, the commonality
of equalizer and detector structures makes the implementation of a modem
quite easy.
APPENDIX A
To prove that $K^{\varepsilon}_{1,T}$ and $K^{r}_{1,T}$ are constant across a
transition, the following simplifying assumption is made:
    Since the noise spectrum is not known, the ladder filter will be
    analyzed with no noise. The resulting values of the partial
    correlation coefficients will be used to analyze the signal plus noise
    case. Strictly speaking, this is not correct since the coefficients
    are biased by the noise spectrum.
The NRZ baseband signal admits the autoregressive model:
\[ y_n = k\, y_{n-1} \tag{A.1} \]
where
\[ k = \begin{cases} -1 & \text{at a transition} \\ +1 & \text{everywhere else} \end{cases} \]
Now, from the orthogonality principle of linear least squares estimation
[Pa65]:
\[ K^{\varepsilon}_{1,T} = \frac{E(y_T\, y_{T-1})}{E(y_{T-1}\, y_{T-1})} \tag{A.2} \]
From (A.1), $E(y_T\, y_{T-1}) = k\, E(y_{T-1}\, y_{T-1})$
\[ \therefore\; K^{\varepsilon}_{1,T} = k = 1 \]
i.e., the best estimate of $y_T$ from $y_{T-1}$ is simply $y_{T-1}$. Similarly,
$K^{r}_{1,T} = 1$.
Finally, all higher order partial correlation coefficients are zero since:
\[ \varepsilon_{n,T} = y_T - E(y_T \mid y_{T-1}, y_{T-2}, \ldots, y_{T-n}) = y_T - \sum_{i=1}^{n} a_i\, y_{T-i} \]
but $E(\varepsilon_{n,T}\, y_{T-i}) = 0$, $i = 1, 2, \ldots, n$, which leads to
\[ a_1 = 1, \qquad a_i = 0, \quad i = 2, 3, \ldots, n . \]
Hence, the best estimate of $y_T$ is $y_{T-1}$, setting
$K^{\varepsilon}_{1,T} = K^{r}_{1,T} = 1$ and
$K^{\varepsilon}_{i,T} = K^{r}_{i,T} = 0$, $i > 1$.
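The statement that the first order coefficient is essentially unity can be
checked numerically. The following is only an illustrative sketch: the helper
names, the 50 samples per bit and the random bit pattern are assumptions made
for the example and do not come from the text. It forms the sample estimate of
the ratio in (A.2) for a noiseless NRZ waveform.

```python
import random

def nrz(bits, samples_per_bit):
    """Noiseless NRZ waveform: +1 for a one, -1 for a zero (assumed mapping)."""
    out = []
    for b in bits:
        out.extend([1.0 if b else -1.0] * samples_per_bit)
    return out

def lag1_coefficient(y):
    """Sample estimate of E(y_T y_{T-1}) / E(y_{T-1} y_{T-1}), cf. (A.2)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] * y[t - 1] for t in range(1, len(y)))
    return num / den

rng = random.Random(0)                      # fixed seed, purely for repeatability
bits = [rng.random() < 0.5 for _ in range(200)]
y = nrz(bits, samples_per_bit=50)
k1 = lag1_coefficient(y)
print(k1)   # close to 1: only the rare transition samples contribute -1
```

Each transition contributes a single product of -1 among many products of +1,
so the estimate departs from unity only in proportion to the transition rate.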
APPENDIX B
Proof of Proposition 3.5:
Let
\[ u_T = \begin{cases} 1 & T < t_0 \\ -1 & T \ge t_0 \end{cases} \]
6? ~
Proposition 3.4 asserted that
\[ K^{\varepsilon}_{i,T} = K^{r}_{i,T} = \begin{cases} 1 & i = 1 \\ 0 & i = 2, 3, \ldots, n \end{cases} \]
When the input to the ladder form is $y_T = u_T + n_T$, where $n_T$ is a zero mean
noise process, then the above values of $K^{\varepsilon}_{i,T}$ and $K^{r}_{i,T}$ yield
(recall that the bias in $K^{\varepsilon}_{i,T}$, $K^{r}_{i,T}$ due to the noise was
ignored in this simple exposition):
Since $K^{\varepsilon}_{i,T} = K^{r}_{i,T} = 0$, $i = 2, 3, \ldots, n$, it is clear that
■£’( r i.7 ’) = ■£’(7'l.7 ’- £ + l ) = S F d y -iJ .!
BIBLIOGRAPHY
[AH71] B. Atal, S. Hanauer, "Speech Analysis and Synthesis by Linear
Prediction of the Speech Wave," Journal of the Acoustical Society
of America, Vol. 50, 1971, pp. 637-655.
[Fa60] G. Fant, Acoustic Theory of Speech Production, Mouton and Co.,
1960.
[GK73] M. Gevers, T. Kailath, "An Innovations Approach to Least-Squares
Estimation, Part VI: Discrete-Time Innovations Representations
and Recursive Estimation," IEEE Transactions on Automatic
Control, Vol. AC-18, December, 1973, pp. 588-600.
[Le59] E. Lehmann, Testing Statistical Hypotheses, J. Wiley and Sons,
1959.
[Le80] D.T. Lee, "Canonical Ladder Form Realizations and Fast
Estimation Algorithms," Ph.D. Dissertation, Stanford University,
Dept. of Electrical Engineering, 1980.
[LM80] D. Lee, M. Morf, "A Novel Innovations Based Time Domain Pitch
Detector," Proc. of Int'l. Conf. on Acoustics, Speech and Signal
Processing, Denver, CO, 1980, pp. 40-44.
[LSW65] R. Lucky, J. Salz and E. Weldon, Principles of Data
Communications, McGraw-Hill, 1968.
[MG76] J. Markel, A. Gray Jr., Linear Prediction of Speech, Springer-Verlag,
1976.
M. Morf, C. Muravchik, D. Lee, "Hilbert Space Array Methods for
Alpha-Stationary Process Estimation," Proc. of Int'l. Conf. on
Acoustics, Speech and Signal Processing, Atlanta, GA, 1981, pp.
856-859.
[Mo74] M. Morf, "Fast Algorithms for Multivariable Systems," Ph.D.
Dissertation, Stanford University, Dept. of Electrical
Engineering, 1974.
[Pa49] P.B. Patnaik, "The Non-Central $\chi^2$ and F Distributions and Their
Applications," Biometrika, Vol. 36, pp. 202-232, 1949.
[Pa65] A. Papoulis, Probability, Random Variables and Stochastic
Processes, McGraw-Hill, 1965.
[Sh79] M. Shensa, "A Least-Squares Lattice Decision Feedback
Equalizer," Proc. of Int'l. Communications Conference, 1980, pp.
57.6.1 - 57.6.5.
[SP80] E. Satorius, J. Pack, "A Least Squares Adaptive Lattice Equalizer
Algorithm," Naval Ocean Systems Center, Technical Report 575,
September, 1980.
[TS71] H. Taub, D. Schilling, Principles of Communications Systems,
McGraw-Hill, 1971.
[VT68] H. Van Trees, Detection, Estimation and Modulation Theory,
Volume 1, J. Wiley and Sons, 1968.
[Wi49] N. Wiener, Extrapolation, Interpolation and Smoothing of
Stationary Time Series with Engineering Applications,
Technology Press and Wiley, 1949.
[Wi70] B. Widrow, "Adaptive Filters," in Aspects of Network and Systems
Theory (Kalman, DeClaris), Holt, Rinehart and Winston, 1970.
[WJ65] J. Wozencraft, I. Jacobs, Principles of Communication
Engineering, J. Wiley, 1965.
CHAPTER FOUR
NUMERICAL ALGORITHMS
The previous chapter evidenced the need for the efficient computation
of elementary trigonometric functions and square roots as well as for 2 x 2
rotations and J-rotations. Examples in future chapters will demonstrate
the proliferation of these operations in matrix algebra algorithms that are
commonplace in signal processing. This provides for a radical departure in
current day thinking regarding signal processing computers, which are
presently based on fast multiply and add circuits [AMI79] [Bell81]
[KNSYM80]. It now seems that computers capable of vector rotation would
be more natural as signal processors.
The problem of constructing both hardware and software efficient
numerical algorithms for the above functions, which comprise the rather
rich set of elementary operations in signal processing, has been addressed
by a number of authors, e.g. [Me62] [CET62] [Sp65] [SK71] [DeL70]. Two
promising approaches for the present requirements are the CORDIC
algorithms of Volder [Vo59] and Walther [Wa71] and the convergence
computation techniques of Chen [Ch71].
This chapter will first describe both of these techniques. Some major
improvements to the algorithms, which simplify their implementation while
enhancing throughput, will be presented. A generalization of Chen's
method, that provides a framework for demonstrating that the CORDIC
algorithms are actually a special case of the convergence computation
method, will be considered last. Many new and useful functions may be
computed in the generalized framework.
4.1 THE CORDIC ALGORITHMS
The CORDIC computing technique was first presented by Volder [Vo59]
as iterative equations for computing some elementary functions such as
plane rotations and hyperbolic (or J-) rotations on two dimensional vectors.
Multiplication, division, $\tan^{-1}$, $\tanh^{-1}$ and square roots were also included
in Volder's method. Walther [Wa71] showed that the CORDIC algorithms could
be unified into one set of iterative equations parameterized by a quantity
'm' which represented a coordinate system in which the radial component
or norm, $R$, and angular component, $\Phi$, of a vector $X = (x, y)$ are given
by
\[ R = \sqrt{x^2 + m y^2} = \left\| (x, y) \begin{bmatrix} 1 & 0 \\ 0 & m \end{bmatrix}^{1/2} \right\|_2 \tag{4.1} \]
\[ \Phi = m^{-1/2} \tan^{-1}\!\left( y \sqrt{m} / x \right) \tag{4.2} \]
Figure 4.1 depicts $R$, $\Phi$ for $m = -1, 0, 1$ which are referred to as the
hyperbolic, linear and circular coordinate systems respectively. The CORDIC
iterations correspond to rotating a vector along one of the curves of Figure
4.1.
The CORDIC recursions rotate a vector $X_i = (x_i, y_i)^T$ to a new vector
$X_{i+1} = (x_{i+1}, y_{i+1})^T$ according to
\[ X_{i+1} = \begin{bmatrix} 1 & m \mu_i \delta_i \\ -\mu_i \delta_i & 1 \end{bmatrix} X_i \tag{4.3} \]
where $\mu_i = \pm 1$ determines the direction of rotation at each iteration and
$\{\delta_i\}$ is a sequence of arbitrary constants representing the magnitude of that
rotation. After 'n' iterations, the new radial and angular components of the
vector are
\[ \Phi_n = \Phi_0 - \alpha \tag{4.4a} \]
\[ R_n = R_0 \cdot K \tag{4.4b} \]
where
\[ \alpha = \sum_{i=0}^{n-1} \mu_i \alpha_i = \sum_{i=0}^{n-1} \mu_i\, m^{-1/2} \tan^{-1}(\delta_i \sqrt{m}) \tag{4.5a} \]
\[ K = \prod_{i=0}^{n-1} K_i = \prod_{i=0}^{n-1} \sqrt{1 + m \delta_i^2} \tag{4.5b} \]
Figure 4.1: Rotation in Generalized Coordinate Systems (panels m = 1, m = 0, m = -1; S = shaded area)
An auxiliary variable is introduced to accumulate the total rotation:
\[ z_{i+1} = z_i - \mu_i \alpha_i \tag{4.6} \]
(notice that $\alpha_i > 0$ with the sign of the rotation being chosen at each step
through $\mu_i$)
As an example, consider evaluating $\tan^{-1}(y_0 / x_0)$. If the initial vector
$X_0 = (x_0, y_0)$ is rotated through a sequence of angles $\{\alpha_i\}$ until
$X_n = (x_n, 0)$ then $z_n$ will equal the net rotation provided it was initially
equal to zero. Thus $z$ is a useful quantity since after n rotations it
accumulates the net rotation,
\[ z_n = z_0 - \sum_{i=0}^{n-1} \mu_i \alpha_i \tag{4.7} \]
where $\mu_i$ = sign of the $i$th rotation ($\pm 1$).
The CORDIC algorithm is comprised of Equations (4.3) and (4.6). If the
rotations are made to proceed in a manner which forces $y_n$ to zero, i.e.,
$y_n \to 0$, then $x_n$ and $z_n$ result in some useful functions, one of which is the
arctan example given above. Figure 4.2 summarizes the rich complement of
functions that are obtained when $y \to 0$ or $z \to 0$.
Figure 4.2: The CORDIC Functions
Notice that $\delta_i$ is related to the tangent of $\alpha_i$ (in the appropriate
coordinate system of course) and this has reduced the number of
multiplications in Equation (4.3) from the usual four required in a vector
rotation. The penalty for this simplification is the spurious scale factor, $K$,
since each iteration of (4.3) is a rotation as well as a stretching.
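The vectoring case can be sketched in a few lines. This is a floating point
illustration rather than a hardware description, and it rests on stated
assumptions: the recursion is taken to be x' = x + m*mu*delta*y,
y' = y - mu*delta*x with delta = 2^(-F_i), the direction mu is chosen to shrink
|y|, and the function and variable names are hypothetical. The routine returns
the net rotation (the sum of mu_i * alpha_i) and the accumulated scale factor K.

```python
import math

def alpha_i(m, delta):
    """Incremental angle m^(-1/2) * atan(delta * sqrt(m)) for m in {1, 0, -1}."""
    if m == 1:
        return math.atan(delta)
    if m == 0:
        return delta
    return math.atanh(delta)

def cordic_vectoring(x, y, m, F):
    """Drive y toward zero; return (x_n, y_n, net rotation, scale factor K)."""
    net = 0.0
    K = 1.0
    for Fi in F:
        delta = 2.0 ** (-Fi)
        mu = 1.0 if x * y >= 0 else -1.0      # direction that reduces |y|
        x, y = x + m * mu * delta * y, y - mu * delta * x
        net += mu * alpha_i(m, delta)          # accumulated rotation
        K *= math.sqrt(1.0 + m * delta * delta)
    return x, y, net, K

# Circular system: the net rotation approximates atan(y0/x0), x_n approximates K*R.
x_n, y_n, net, K = cordic_vectoring(1.0, 0.5, 1, list(range(16)))
print(net, math.atan2(0.5, 1.0))
print(x_n, K * math.sqrt(1.0 + 0.25))

# Linear system: the same loop performs a division, net approximates y0/x0.
_, _, quotient, _ = cordic_vectoring(2.0, 1.0, 0, list(range(1, 17)))
print(quotient)
```

Only additions and scalings by powers of two appear inside the loop; the
square roots are needed solely to report K, which is a constant once the
sequence is fixed.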
A definition of positive and negative angles is implicit in the CORDIC
equations, since the direction of rotation is chosen at each iteration to
achieve a prescribed destination, e.g., $z_n \to 0$. The definitions are chosen to
yield the functions of Figure 4.2. It will be frequently convenient to reverse
the definition of positive angles so as to generate slightly different functions.
For example, recall Equation (3.4) in which the matrix, N, is a transposed
rotation matrix. This may be represented as a plane rotation through the
negated angle. Furthermore, the need for multiply-and-subtract rather than
multiply-and-accumulate will become apparent in a variety of cases later.
It would clearly be more convenient to absorb these sign changes into the
CORDIC recursions (which is a relatively simple control task in a CORDIC
machine) rather than to incur the speed and complexity penalties of
separately negating a quantity. The reversed sign functions shown in Figure
4.3 can be readily generated by simply reversing the sign in the $z_{i+1}$
iteration (Equation (4.6)), that is, this iteration is defined as
\[ z_{i+1} = z_i - s\, \mu_i \alpha_i \]
where $s = 1$ yields the normal CORDIC equations and $s = -1$ results in the
sign reversed equations in which the direction of rotation has been reversed.
Although this result is intuitively satisfying, the details of the proof are given
in the appendix.
Figure 4.3: The Reversed Sign CORDIC Functions (panels: circular m = 1, linear m = 0, hyperbolic m = -1, each for z -> 0 and y -> 0)
4.1.1 Some Convergence Properties
The choice of $\{\alpha_i\}$ is predetermined and crucial to the convergence
behavior of the algorithm. Consider the so called vectoring case in which
the direction of rotation is chosen to reduce the magnitude of the angle at
each iteration, thus bringing the resultant vector to the abscissa, i.e.,
\[ |\Phi_{i+1}| = \bigl|\, |\Phi_i| - \alpha_i \,\bigr| \tag{4.8} \]
The sum of the remaining rotations at each step must be large enough
to bring the angle to at least within $\alpha_{n-1}$ of zero, thus guaranteeing the
"granularity" of the calculation or the "angular resolution" to be $\alpha_{n-1}$ in
'n' steps (This must be true even when $\Phi_i = 0$ and $|\Phi_{i+1}| = \alpha_i$). This
condition implies:
\[ \alpha_i - \sum_{j=i+1}^{n-1} \alpha_j < \alpha_{n-1}, \qquad i = 0, 1, 2, \ldots, n-2 \tag{4.9} \]
The domain of convergence of the algorithm is limited by the total possible
rotation, i.e.,
\[ |\alpha| \le \sum_{j=0}^{n-1} \alpha_j \;\;\to\;\; \max |\Phi_0| = \alpha_{n-1} + \sum_{j=0}^{n-1} \alpha_j \tag{4.10} \]
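Both conditions are easy to evaluate for a candidate shift sequence. The
sketch below is an illustration under stated assumptions: delta_i = 2^(-F_i),
the criterion is read as alpha_i minus the sum of the later alpha_j being less
than alpha_(n-1), the domain bound as alpha_(n-1) plus the sum of all alpha_j,
and the helper names are hypothetical.

```python
import math

def incremental_angles(m, F):
    """alpha_i = m^(-1/2) * atan(2^-F_i * sqrt(m)) for m in {1, 0, -1}."""
    out = []
    for Fi in F:
        d = 2.0 ** (-Fi)
        out.append(math.atan(d) if m == 1 else d if m == 0 else math.atanh(d))
    return out

def satisfies_criterion(alphas):
    """alpha_i - sum_{j>i} alpha_j < alpha_{n-1} for i = 0 .. n-2."""
    n = len(alphas)
    return all(alphas[i] - sum(alphas[i + 1:]) < alphas[n - 1]
               for i in range(n - 1))

def max_initial_angle(alphas):
    """Domain of convergence: alpha_{n-1} plus the sum of all alpha_j."""
    return alphas[-1] + sum(alphas)

circular = incremental_angles(1, range(16))         # F_i = 0, 1, ..., 15
hyperbolic = incremental_angles(-1, range(1, 17))   # F_i = 1, 2, ..., 16, no repeats

print(satisfies_criterion(circular), max_initial_angle(circular))
print(satisfies_criterion(hyperbolic))
```

For the circular system the plain sequence 0, 1, 2, ... passes and gives a
domain of roughly 1.74 radians. The unrepeated hyperbolic sequence fails the
test, which is why hyperbolic shift sequences contain repeated elements.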
In order to show that $\Phi$ converges to within $\alpha_{n-1}$ of zero in 'n' steps,
proceed as follows:
Lemma 4.1 [Walther]:
\[ |\Phi_i| < \alpha_{n-1} + \sum_{j=i}^{n-1} \alpha_j \]
Proof: By induction
i = 0: The hypothesis is true for i = 0 by (4.10) above.
General i: Assume the hypothesis is true for some i.
\[ |\Phi_i| < \alpha_{n-1} + \sum_{j=i}^{n-1} \alpha_j \]
\[ |\Phi_i| - \alpha_i < \alpha_{n-1} + \sum_{j=i+1}^{n-1} \alpha_j \]
\[ -\alpha_i \le |\Phi_i| - \alpha_i < \alpha_{n-1} + \sum_{j=i+1}^{n-1} \alpha_j \]
Now apply (4.9):
\[ -\Bigl( \alpha_{n-1} + \sum_{j=i+1}^{n-1} \alpha_j \Bigr) < -\alpha_i \le |\Phi_i| - \alpha_i < \alpha_{n-1} + \sum_{j=i+1}^{n-1} \alpha_j \]
i.e.,
\[ \bigl|\, |\Phi_i| - \alpha_i \,\bigr| < \alpha_{n-1} + \sum_{j=i+1}^{n-1} \alpha_j \]
However, applying (4.8) yields:
\[ |\Phi_{i+1}| < \alpha_{n-1} + \sum_{j=i+1}^{n-1} \alpha_j \]
making the hypothesis true for i + 1 and the lemma is proved by induction.
Theorem 4.1 [Walther]:
$\Phi$ converges to within $\alpha_{n-1}$ of zero in n steps.
Proof: This is a direct consequence of the above lemma for i = n.
Remarks
(1) By replacing $\Phi$ with $z$, exactly the same argument applies to
prove convergence in the rotation case when the variable $z$ is
driven to zero. Consequently, $z$ also has the same domain of
convergence as $\Phi$, i.e., $\max |z_0| = \max |\Phi_0|$ as given by (4.10).
(2) The theorem on convergence is very important since it yields a
guaranteed, computable convergence bound on the algorithm.
(3) The algorithm has nice behavior properties even outside its region
of convergence. Suppose $\Phi_0$ is too large and outside the domain
of convergence. Then the signs of $\mu_i$ will all be chosen the same
and the calculation will approximate, as closely as it can, the true
angle, i.e., the calculation will proceed to an angle of
$\sum_{j=0}^{n-1} \alpha_j$ and then saturate. For example, if
$\Phi = \tan^{-1}(y_0 / x_0)$ is to be computed and $\Phi > \max |\Phi_0|$, then
the result will be $\max |\Phi_0|$. Thus, the results are predictable and
reasonable even outside the domain of convergence.
4.1.2 Implementation Issues
The choice of $\{\delta_i\}$ may be quite arbitrary; however, they are almost
always taken to be integral powers of the machine radix, generally two, so
that the scaling by $\delta_i$ in (4.3) can be performed with arithmetic shifters
instead of multipliers. If $\delta_i = 2^{-F_i}$, the choice of $\{\delta_i\}$ is equivalent
to the choice of $\{F_i\}$. Once a sequence is chosen, it is prestored, fixing the
magnitude of $\delta_i$ and hence $\alpha_i$ and $K_i$. However the direction of rotation
$\mu_i$ is chosen at each iteration to realize the particular zero forcing $z_n \to 0$
and $y_n \to 0$ mentioned above. The choice of $\{F_i\}$ is crucial to the size of
the scale constant, $K$, in (4.5b) and the domain of convergence of the
algorithm (4.10).
Recall the relationship between $\alpha_i$ and $\delta_i$,
\[ \alpha = \sum_{i=0}^{n-1} \mu_i \alpha_i = \sum_{i=0}^{n-1} \mu_i\, m^{-1/2} \tan^{-1}(\delta_i \sqrt{m}) \]
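The role of the shifters can be illustrated with an integer-only circular
rotation. The sketch below is an assumption-laden illustration, not the
machine described in the text: it uses a hypothetical 30 bit fractional
format, the plain sequence F_i = i, prestored angles, and the sign convention
x' = x + mu*(y >> F), y' = y - mu*(x >> F), z' = z - mu*alpha. Under that
convention a positive mu turns the vector clockwise, so starting from (1/K, 0)
the outputs approximate cos(z0) and -sin(z0).

```python
import math

FRAC_BITS = 30  # assumed fixed-point format: value = integer / 2**FRAC_BITS

def to_fixed(v):
    return int(round(v * (1 << FRAC_BITS)))

def cordic_rotate_fixed(z0, n=24):
    """Force z to zero using only adds and arithmetic right shifts."""
    F = list(range(n))
    alphas = [to_fixed(math.atan(2.0 ** -Fi)) for Fi in F]   # prestored table
    K = math.prod(math.sqrt(1.0 + 4.0 ** -Fi) for Fi in F)   # known once F is fixed
    x, y, z = to_fixed(1.0 / K), 0, to_fixed(z0)             # pre-divide by K
    for Fi, a in zip(F, alphas):
        mu = 1 if z >= 0 else -1
        x, y = x + mu * (y >> Fi), y - mu * (x >> Fi)        # shifts replace multiplies
        z -= mu * a
    scale = float(1 << FRAC_BITS)
    return x / scale, y / scale, z / scale

c, s, residual = cordic_rotate_fixed(0.6)
print(c, math.cos(0.6))
print(s, -math.sin(0.6))
```

Pre-dividing the starting vector by K is only one way of handling the scale
factor; the sections that follow discuss the cost of such normalization.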
The sequence $\{F_i\}$ must be chosen such that $\{\alpha_i\}$ satisfies the
convergence criterion of (4.9). For a radix 2 machine, Volder suggested
some sequences for $m = -1, 0, 1$ that appear in Figure 4.4. These
sequences have been commonly used in CORDIC processors reported in the
literature [Wa71] [HT80]. However the nuisance scale factors and the
restricted regions of convergence have been circumvented only at the
expense of either a hardware or speed penalty or both. Some existing
techniques for removing the scale factors and extending the domains of
convergence are now reviewed, following which new hardware and speed
efficient methods will be developed.
4.1.3 Scale Factor Normalization
Walther [Wa71] suggested storing the scale factor $K$ in memory, since
its magnitude is known once the sequence $\{F_i\}$ is determined (see Equation
4.5b). A CORDIC operation with $m = \pm 1$ is followed by a division of the
result by the stored constant. Division is readily performed with the same
CORDIC block; however, a large speed overhead is clearly incurred by the
need for executing two CORDIC operations for each desired operation. This
is very undesirable in a high throughput application.
Another approach due to Haviland and Tuszynski [HT80] involves the
use of scaling cycles to transform $X_{i+1}$ to $X'_{i+1}$. For some elements of the
sequence $\{F_i\}$, the equations
\[ X'_{i+1} = \begin{bmatrix} 1 + \gamma_i 2^{-F_i} & 0 \\ 0 & 1 + \gamma_i 2^{-F_i} \end{bmatrix} X_{i+1} \tag{4.11} \]
immediately follow the regular CORDIC iteration, thus scaling the magnitude
of the vector $X_{i+1}$. The value of $\gamma_i = \pm 1$ is chosen during each scaling
cycle to move $\|X_{i+1}\|$ towards unity, while $\gamma_i = 0$ during a cycle in which no
scaling is to be performed.
Figure 4.4: Shift sequences $\{F_i\}$ for a radix 2 machine, $m = -1, 0, 1$
These additional equations are not executed for each element in $\{F_i\}$
but rather for a select set, $\{G_j\} \subset \{F_i\}$ (i.e. when $\gamma_i \ne 0$) in order to
make the overall scale factor converge to unity. Let $K$ be the scale factor
introduced by the CORDIC algorithm and let $K^*$ be the scaling resulting
from the additional operations, i.e.,
\[ K^* = \prod_{i=0}^{n-1} \left( 1 + \gamma_i 2^{-F_i} \right) \tag{4.12} \]
Then $\{G_j\}$ is chosen such that $K K^* = 1$.
While this approach to scaling does not introduce as large a speed
overhead as did Walther's technique, the penalty incurred is still quite
significant (approximately 50% reported in [HT80]). Some additional control
is also required to recognize the elements of $\{G_j\}$ since scaling is not done
at each iteration.
A further undesirable effect of these two approaches is to make the
execution time for the circular and hyperbolic CORDIC functions much
longer than that for multiplication and division in which no scaling is
required. This nonuniformity of execution speeds is quite a nuisance in a
synchronous multi-CORDIC processor environment (see e.g. [AMLA81]), since
processors computing multiplications or divisions must wait for others
which are operating in the $m = \pm 1$ systems.
4.1.4 Scaling in a Parallel Implementation
The structure of the three CORDIC equations is such that they may be
executed concurrently as shown in Figure 4.5. Such a realization will be
referred to as parallel. The scaling cycle technique of Haviland and
Tuszynski for normalizing the spurious constant, $K$, can be implemented in
parallel realizations without a speed overhead and with a modest amount of
additional hardware. By substituting (4.3) for $X_{i+1}$ in (4.11) and dropping
the superscript, the combined CORDIC iteration and scaling equations are
\[ X_{i+1} = (x_{i+1} \;\; y_{i+1})^T = \begin{bmatrix} 1 + \gamma_i 2^{-F_i} & 0 \\ 0 & 1 + \gamma_i 2^{-F_i} \end{bmatrix} \begin{bmatrix} 1 & m \mu_i 2^{-F_i} \\ -\mu_i 2^{-F_i} & 1 \end{bmatrix} X_i \]
Expanding out for the vector components yields:
\[ x_{i+1} = x_i + \underbrace{m \mu_i y_i 2^{-F_i}}_{A} + \underbrace{\gamma_i x_i 2^{-F_i}}_{B} + \underbrace{m \mu_i \gamma_i y_i 2^{-2F_i}}_{C} \tag{4.14a} \]
\[ y_{i+1} = y_i \underbrace{{}- \mu_i x_i 2^{-F_i}}_{D} + \underbrace{\gamma_i y_i 2^{-F_i}}_{E} \underbrace{{}- \mu_i \gamma_i x_i 2^{-2F_i}}_{F} \tag{4.14b} \]
Now terms A and D are the normal CORDIC iterations. Term B is an
additional term in the x-update but it is simply the output of the shifter in
the y-channel. Hence this requires no additional circuitry in a parallel
realization. Similarly E is available at the output of the x-channel shifter.
Finally, C and F are new terms but since they are scaled by $2^{-2F_i}$, they
are insignificant for most of the sequence $\{F_i\}$. They must be computed for
the first few values of 'i', where their contribution is significant, but not for
all values of 'i', meaning that the shifter required for these terms can be
made rather small. For example, the scaling sequence given in [HT80] would
require only four possible shifts for sixteen bit quantities. Hence, by
building four input rather than two input adders and two additional, smaller
shifters, the scaling cycle technique for normalizing the spurious scale
factors can be implemented with a very low speed penalty (some penalty is
due to circuit speed). Only a modest amount of extra hardware is needed
since the larger additional terms, B and E, are available in the unscaled
CORDIC. Observing that the hardware can be so shared yields a potentially
large speed advantage in a parallel implementation where speed is likely to
be an important concern anyway (or else a serial realization could have
been employed).
Figure 4.5: A Parallel CORDIC Machine Architecture
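The claim that one combined update reproduces a CORDIC step followed by a
scaling cycle can be confirmed numerically. The sketch below is illustrative
only and rests on assumptions: the CORDIC step is taken as
x' = x + m*mu*d*y, y' = y - mu*d*x with d = 2^(-F), the scaling cycle
multiplies both components by (1 + gamma*d), and the six terms are labelled A
to F to match the discussion above. All function names are hypothetical.

```python
import itertools

def rotate_then_scale(x, y, m, mu, gamma, F):
    """Regular CORDIC step followed by a separate scaling cycle."""
    d = 2.0 ** (-F)
    x1, y1 = x + m * mu * d * y, y - mu * d * x
    return (1.0 + gamma * d) * x1, (1.0 + gamma * d) * y1

def combined_update(x, y, m, mu, gamma, F):
    """Single pass using the six terms A..F of the expanded equations."""
    d = 2.0 ** (-F)
    A = m * mu * y * d              # normal CORDIC term in the x update
    B = gamma * x * d               # shifted x, already formed for the y update
    C = m * mu * gamma * y * d * d  # second order term, scaled by 2^(-2F)
    D = -mu * x * d                 # normal CORDIC term in the y update
    E = gamma * y * d               # shifted y, already formed for the x update
    Fterm = -mu * gamma * x * d * d # second order term, scaled by 2^(-2F)
    return x + A + B + C, y + D + E + Fterm

worst = 0.0
for m, mu, gamma, F in itertools.product((-1, 0, 1), (-1, 1), (-1, 0, 1), range(1, 8)):
    a = rotate_then_scale(0.75, -0.3, m, mu, gamma, F)
    b = combined_update(0.75, -0.3, m, mu, gamma, F)
    worst = max(worst, abs(a[0] - b[0]), abs(a[1] - b[1]))
print(worst)   # differences at rounding level only
```

Terms C and F carry a factor of 2^(-2F), so for a fixed word length they fall
below the least significant bit after the first few shift values.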
4.1.5 Extending the Domain of Convergence
This section will mainly be concerned with the domain of convergence
of the trigonometric functions (i.e., $m = 1$) since a desirable, finite domain,
i.e. the circle, is readily defined. In the case of multiplication, division and
hyperbolic functions, the desirable domain of convergence is infinite (or at
least, restricted only by the finite representation of numbers). Recognizing
that the sequences $\{F_i\}$ of Figure 4.4 do not yield a complete region of
convergence for the algorithm (Equation 4.10), what can be done when a
rotation initiates (or terminates) outside the region?
Walther [Wa71] suggested the use of prescaling identities to operate on
a new vector lying inside the domain of convergence. For example, with the
$\{F_i\}$ of Figure 4.4 the domain of convergence includes angles whose
magnitude does not exceed 1.74 radians. In order to calculate, say, the sine
of a large argument, the angle is first divided by $\pi/2$ resulting in a
quotient $Q$ and a remainder $R$ with $|R| < \pi/2$ which is in the region of
convergence. Now
\[ \sin \Phi = \sin\!\left( Q \frac{\pi}{2} + R \right) = \begin{cases} \sin R & \text{if } Q \bmod 4 = 0 \\ \cos R & \text{if } Q \bmod 4 = 1 \\ -\sin R & \text{if } Q \bmod 4 = 2 \\ -\cos R & \text{if } Q \bmod 4 = 3 \end{cases} \]
so that only $\sin R$ or $\cos R$ need be computed. Clearly this technique
implies both a substantial speed and control penalty as one division
operation and many decisions must be performed.
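The prescaling identity can be sketched as follows. This is an illustration
only: the library sine and cosine stand in for the CORDIC core, which is an
assumption made so the example is self-contained, and the function name is
hypothetical. The division and the four-way decision are exactly the overhead
referred to above.

```python
import math

def sin_by_prescaling(theta, sin_core=math.sin, cos_core=math.cos):
    """Evaluate sin(theta) using core evaluations on [0, pi/2) only."""
    half_pi = math.pi / 2.0
    q = math.floor(theta / half_pi)   # quotient Q (the costly division step)
    r = theta - q * half_pi           # remainder R, 0 <= R < pi/2 up to rounding
    k = q % 4                         # non-negative in Python, also for negative Q
    if k == 0:
        return sin_core(r)
    if k == 1:
        return cos_core(r)
    if k == 2:
        return -sin_core(r)
    return -cos_core(r)

for angle in (-10.0, -3.0, -0.5, 0.0, 1.0, 2.0, 4.0, 7.5, 100.0):
    print(angle, sin_by_prescaling(angle), math.sin(angle))
```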
Still another approach is that of prerotation suggested by Haviland and
Tuszynski [HT80]. Prior to computing a vector rotation which does not lie in
the region of convergence, rotations by $\pi/2$ and $\pi/4$ are performed.
These prerotations are relatively simple, requiring only a magnitude
multiplication by $1/\sqrt{2}$ which is accounted for in the scaling cycles
generating $K^*$ (Equation 4.12). Both a speed and control penalty is once
again incurred.
Combining both the prerotations and scaling methods, to obviate the
difficulties with the CORDIC algorithms, can result in considerable overhead.
For example, Walther's CORDIC processor, which combines his two
techniques, exhibits the execution times given in Figure 4.6. Notice that
while this machine requires only 70 µsec for the CORDIC iterations which
compute the sine function, an additional 80 µsec are required for prescaling
and normalization!
A new method capable of tailoring the convergence region and scale
factors to desired values will be described. Ideally, any new technique
should result in large regions of convergence and unity scale factors, with
only modest speed overhead.
Maximum Execution Times (µsec)

                            PRESCALE,     DATA TRANSFERS
               CORDIC       NORMALIZE,    FROM
ROUTINE        EXECUTION    MISC.         COMPUTER          TOTAL
LOAD             0             5            25                30
STORE            0             0            15                15
ADD              0            15            25                40
SUBTRACT         0            25            25                50
MULTIPLY        60            15            25               100
DIVIDE          60            15            25               100
SIN             70            85             5               160
COS             70            85             5               160
TAN            130            85             5               220
ATAN            70            15             5                90
SINH            70            55             5               130
COSH            70            55             5               130
TANH           130            55             5               190
ATANH           70            45             5               120
EXPONENTIAL     70            55             5               130
LOGARITHM       70            45             5               120
SQUARE ROOT     70            25             5               100

Figure 4.6: Performance of Walther's CORDIC Machine
4.2 LOW OVERHEAD SOLUTIONS TO THE PROBLEMS OF CONVERGENCE
REGION AND SPURIOUS SCALE FACTORS
Novel solutions to the aforementioned problems of scale and
convergence, which do not suffer from the hardware and speed overhead of
those in [Wa71] and [HT80], can be found by returning to the theory of the
CORDIC algorithms. Let $K_1$ be the value of the scale factor, $K$, when
$m = 1$ and $K_{-1}$ the value of $K$ when $m = -1$. Once the sequence $\{F_i\}$
has been determined, Equation 4.5b shows that the value of $K$ can only be
influenced through the choice of 'n'. It would be ideal if $K_1$ and $K_{-1}$ were
integral powers of the machine radix, in this case 2, since they could then be
removed with simple shifters. Unfortunately this is not possible with the
sequence of Figure 4.4 for any value of 'n' as shown in Claim 4.1.
Claim 4.1:
For a radix 2 machine, there exists no 'n' such that $K_1$ and $K_{-1}$ are
integral powers of 2 using the sequences $\{F_i\}_{i=0}^{n-1}$ in Figure 4.4.
Proof:
$m = 1$:
\[ K_1 = \prod_{i=0}^{n-1} \sqrt{1 + \delta_i^2} \]
\[ \to\; \ln K_1^2 = \sum_{i=0}^{n-1} \ln\!\left( 1 + 4^{-i} \right) \qquad \text{since } \delta_i = 2^{-F_i} = 2^{-i} \]
\[ \le \sum_{i=0}^{n-1} 4^{-i} \qquad \text{(Jensen's Inequality)} \]
\[ < \lim_{n \to \infty} \sum_{i=0}^{n-1} 4^{-i} = 4/3 \]
\[ \to\; 1 < K_1 < e^{2/3} < 2 \]
Similarly, it is possible to show that $1/2 < K_{-1} < 1$.
While this is motivation enough to search for new sequences $\{F_i\}$, it is
also possible to show that the trigonometric functions cannot be made to
converge for all angles in the circle using the sequence of Figure 4.4.
Claim 4.2:
For a radix 2 machine, there exists no 'n' such that by using $\{F_i\}$
of Figure 4.4, the trigonometric functions converge $\forall\, \Phi_0 \in [-\pi, \pi)$.
Proof:
\[ \max |\Phi_0| = \alpha_{n-1} + \sum_{j=0}^{n-1} \alpha_j \qquad \text{(from Equation 4.10)} \]
\[ \le \alpha_0 + \lim_{n \to \infty} \sum_{j=0}^{n-1} \alpha_j \]
\[ = \tan^{-1} 1 + \lim_{n \to \infty} \sum_{j=0}^{n-1} \tan^{-1} 2^{-j} \]
\[ \le \tan^{-1} 1 + \lim_{n \to \infty} \sum_{j=0}^{n-1} 2^{-j} \]
\[ = 2.785 < \pi \]
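Both bounds can be spot-checked for finite 'n'. The sketch below assumes the
circular sequence F_i = i and uses hypothetical helper names; it evaluates the
scale factor and the domain bound directly and compares them with e^(2/3) and
pi/4 + 2. It is a numerical illustration, not a substitute for the proofs.

```python
import math

def K1(n):
    """Circular scale factor for F_i = i, i = 0 .. n-1."""
    return math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(n))

def max_angle(n):
    """alpha_{n-1} plus the sum of all alpha_j, with alpha_j = atan(2^-j)."""
    alphas = [math.atan(2.0 ** -j) for j in range(n)]
    return alphas[-1] + sum(alphas)

k_bound = math.exp(2.0 / 3.0)        # bound on K_1 from Claim 4.1
angle_bound = math.pi / 4.0 + 2.0    # the 2.785 of Claim 4.2

for n in (1, 2, 4, 8, 16, 32):
    print(n, K1(n), max_angle(n))
print(k_bound, angle_bound, math.pi)
```

The computed values settle near 1.647 and 1.743 radians, comfortably inside
the analytical bounds, so neither a power-of-two scale factor nor full-circle
convergence is reachable by changing 'n' alone.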
It is not possible to state a similar result for the hyperbolic functions, since
$\Phi_0 \in (-\infty, \infty)$ and so the algorithm is limited by the finite bit
representation of numbers.
Since the sequences $\{F_i\}$ of Figure 4.4 exhibit some unfavorable
properties, alternate sequences may be sought. These too should be
non-decreasing sequences so that each iteration refines the rotation, and they
must consist only of integers to accommodate the use of shifters.
Lemma 4.2:
Suppose $\{F_i\}_{i=0}^{n-1}$ is a non-decreasing sequence of integers. If $\{F_i\}$
satisfies the convergence criterion (Equation 4.9) and a new sequence,
$\{G_i\}_{i=0}^{n-1}$, is constructed by repeating the $l$th element of $\{F_i\}$ such that
$\{G_i\} = \{F_0\; F_1 \cdots F_{l-1}\; F_l\; F_l\; F_{l+1} \cdots F_{n-2}\}$ then $\{G_i\}$ also satisfies
the convergence criterion for $l = 0, 1, \ldots, n-2$.
Proof:
Let $\{\alpha'_i\}$ be the sequence of incremental rotations corresponding to
$\{G_i\}$, i.e., $\alpha_0 \cdots \alpha_{l-1}\; \alpha_l\; \alpha_l\; \alpha_{l+1} \cdots \alpha_{n-2}$.
$\{F_i\}$ satisfying the convergence criterion implies
\[ \alpha_i - \sum_{j=i+1}^{n-1} \alpha_j < \alpha_{n-1}, \qquad i = 0, 1, 2, \ldots, n-2 \tag{4.13} \]
It is necessary to prove that:
\[ \alpha'_i - \sum_{j=i+1}^{n-1} \alpha'_j < \alpha'_{n-1}, \qquad i = 0, 1, 2, \ldots, n-2 \tag{4.14} \]
Notice that:
\[ \alpha'_i = \begin{cases} \alpha_i & i \le l \\ \alpha_{i-1} & i > l \end{cases} \tag{4.15} \]
Furthermore, notice that for the larger values of $i$, $\alpha_i \approx \delta_i$ for sequences
like those of Figure 4.4, and since generally $F_i = F_{i-1} + 1$, we will use the
approximation $\alpha_{n-2} \approx 2\alpha_{n-1}$.
In order to prove (4.14), proceed as follows:
$i > l$:
\[ \alpha'_i - \sum_{j=i+1}^{n-1} \alpha'_j = \alpha_{i-1} - \sum_{j=i+1}^{n-1} \alpha_{j-1} \]
\[ = \alpha_{i-1} - \sum_{j=i}^{n-2} \alpha_j \]
\[ < 2\alpha_{n-1} \approx \alpha_{n-2} = \alpha'_{n-1} \]
$i \le l$:
\[ \alpha'_i - \sum_{j=i+1}^{n-1} \alpha'_j = \alpha_i - \sum_{j=i+1}^{l} \alpha_j - \sum_{j=l+1}^{n-1} \alpha_{j-1} \]
\[ = \alpha_i - \sum_{j=i+1}^{l} \alpha_j - \sum_{j=l}^{n-2} \alpha_j \]
\[ = \alpha_i - \sum_{j=i+1}^{n-2} \alpha_j - \alpha_l \]
\[ < 2\alpha_{n-1} - \alpha_l \le 0 < \alpha'_{n-1} \]
Theorem 4.2:
If $\{F_i\}_{i=0}^{n-1}$ satisfies the convergence criterion and $\{G_i\}_{i=0}^{n-1}$ is
constructed by repeating the $l$th element of $\{F_i\}$ multiply and for one or
more $l$ within the limits of sequence length, then $\{G_i\}$ also satisfies the
convergence criterion.
Proof:
Follows from repeated application of Lemma 4.2. However notice that
the implicit assumption in the proof of Lemma 4.2 that $\alpha_{n-2} \approx 2\alpha_{n-1}$ restricts
the form of $\{F_i\}$ and may be violated through excessive repetition.
This theorem is very powerful because it guarantees that the multitude
of sequences generated in the particular manner suggested do in fact
satisfy the convergence criterion. Another technique for constructing new
sequences is very similar, but the sequence is allowed to grow in length.
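Lemma 4.3:
If $\{F_i\}_{i=0}^{n-1}$ satisfies the convergence criterion (Equation 4.9) and a new
sequence, $\{G_i\}_{i=0}^{n}$, is constructed by repeating the $l$th element of $\{F_i\}$
such that $\{G_i\} = \{F_0\; F_1 \cdots F_{l-1}\; F_l\; F_l\; F_{l+1} \cdots F_{n-1}\}$ then $\{G_i\}$ also
satisfies the convergence criterion for $l = 0, 1, \ldots, n-2$.
Proof:
It is necessary to prove:
\[ \alpha'_i - \sum_{j=i+1}^{n} \alpha'_j < \alpha'_n \tag{4.15} \]
Once again, proceed in two parts:
$i > l$:
\[ \alpha'_i - \sum_{j=i+1}^{n} \alpha'_j = \alpha_{i-1} - \sum_{j=i+1}^{n} \alpha_{j-1} = \alpha_{i-1} - \sum_{j=i}^{n-1} \alpha_j < \alpha_{n-1} = \alpha'_n \]
$i \le l$:
\[ \alpha'_i - \sum_{j=i+1}^{n} \alpha'_j = \alpha_i - \sum_{j=i+1}^{l} \alpha_j - \sum_{j=l+1}^{n} \alpha_{j-1} \]
\[ = \alpha_i - \sum_{j=i+1}^{l} \alpha_j - \sum_{j=l}^{n-1} \alpha_j \]
\[ = \alpha_i - \sum_{j=i+1}^{n-1} \alpha_j - \alpha_l \]
\[ < \alpha_{n-1} - \alpha_l \le 0 < \alpha'_n \]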
Theorem 4.3:
If $\{F_i\}_{i=0}^{n-1}$ satisfies the convergence criterion and $\{G_i\}_{i=0}^{n'-1}$, $n' > n$, is
constructed by repeating the $l$th element of $\{F_i\}$ multiply and for one or
more $l$, then $\{G_i\}$ also satisfies the convergence criterion.
Proof:
Follows from repeated application of Lemma 4.3.
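The two constructions can be checked numerically. The following is a minimal
sketch, assuming the convergence criterion (4.9) has the usual form
$\alpha_i - \sum_{j>i} \alpha_j \le \alpha_{n-1}$ and that the starting sequence
for $m = 1$ is $F_i = i$. The function names are illustrative only and do not
appear in the text.

```python
import math

def alpha(F, m=1):
    """Elementary angle for shift F: atan(2^-F) if m = 1, atanh(2^-F) if m = -1."""
    d = 2.0 ** (-F)
    return math.atan(d) if m == 1 else math.atanh(d)

def satisfies_convergence(F, m=1):
    """Check alpha_i - sum_{j>i} alpha_j <= alpha_{n-1} for i = 0 .. n-2."""
    a = [alpha(f, m) for f in F]
    tail = 0.0                                  # running sum of alpha_j, j > i
    for i in range(len(a) - 1, -1, -1):
        if i < len(a) - 1 and a[i] - tail > a[-1] + 1e-12:
            return False
        tail += a[i]
    return True

def repeat_fixed_length(F, l):
    """Lemma 4.2 style: repeat element l and drop the last element."""
    return F[:l + 1] + [F[l]] + F[l + 1:-1]

def repeat_growing(F, l):
    """Lemma 4.3 style: repeat element l; the sequence grows by one."""
    return F[:l + 1] + [F[l]] + F[l + 1:]

F = list(range(16))                             # assumed starting sequence
G = F
for l in (1, 3, 3, 6, 6, 9):                    # repeated use of Lemma 4.2
    G = repeat_fixed_length(G, l)
print(G)
print(satisfies_convergence(F), satisfies_convergence(G))
```

A sequence with gaps, such as [0, 2, 4, 6], fails the same check, which is why
the constructions only ever repeat elements and never skip them.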
Theorem 4.3 provides a useful construction scheme similar to Theorem
4.2. At first glance, this is very close to the scaling cycle technique of
Haviland et al. [HT80]. However, while their method only scales during a
scaling cycle, the present technique scales and rotates simultaneously,
hence moving closer to the final result. Clearly, this is a more efficient
method.
Example 4.1:
Let $n = 16$. Starting with the sequence of Figure 4.4 for $m = 1$, apply
the construction of Theorem 4.2 to generate the sequence
    $\{G_i\} = \{0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 8, 9\}$
This yields $K_1 = 1.99$ and $\max|\varphi_0| = 172.2^\circ$ compared with $K_1 = 1.67$ and
$\max|\varphi_0| = 99.2^\circ$ for the original sequence.
Next, start with the sequence for $m = -1$ to construct
    $\{G_i\} = \{1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 6, 7, 8, 9, 10\}$
which yields $K_{-1} = .500$ and $\max|\varphi_0| = 3.37$ compared with $K_{-1} = .828$ and
$\max|\varphi_0| = 1.12$ for the original sequence.
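The figures quoted for the two constructed sequences can be reproduced with a
few lines of arithmetic. This sketch assumes the usual definitions, which are
not restated in this section: $K_m = \prod_i \sqrt{1 + m\,2^{-2F_i}}$ and a
maximum angle equal to the sum of the elementary angles ($\arctan$ for $m = 1$,
$\mathrm{arctanh}$ for $m = -1$). Function and variable names are illustrative.

```python
import math

def scale_factor(F, m):
    """K_m = product over i of sqrt(1 + m * 2^(-2 F_i))."""
    k = 1.0
    for f in F:
        k *= math.sqrt(1.0 + m * 4.0 ** (-f))
    return k

def max_angle(F, m):
    """Sum of elementary angles in radians (atan for m = 1, atanh for m = -1)."""
    fn = math.atan if m == 1 else math.atanh
    return sum(fn(2.0 ** (-f)) for f in F)

G_circ = [0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 8, 9]      # m = 1
G_hyp = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 6, 7, 8, 9, 10]      # m = -1

print(f"K_1  = {scale_factor(G_circ, 1):.2f}, "
      f"max angle = {math.degrees(max_angle(G_circ, 1)):.1f} degrees")
print(f"K_-1 = {scale_factor(G_hyp, -1):.2f}, "
      f"max angle = {max_angle(G_hyp, -1):.2f} radians")
```

Both scale factors land close to a power of two, which is what allows them to be
removed with a shift.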
Notice that both sequences result in scale factors which are easily
compensated, as well as greatly enlarged regions of convergence.
It is apparent from this example that no special scaling cycles are
required during the computation. The algorithm proceeds normally and
$K_1 \approx 2$ and $K_{-1} \approx 1/2$ at termination. These effects are removed by a
single final shift operation which is quite inexpensive to perform.
Furthermore, the construction of Lemma 4.2 in effect trades $\alpha_{n-1}$ for
$\alpha_l$, and since $\alpha_l > \alpha_{n-1}$ (due to the non-decreasing nature of $\{F_i\}$, and
hence the non-increasing nature of $\{\alpha_i\}$) this method will always result in
a larger domain of convergence. Similarly, applying Theorem 4.3 also yields
a larger convergence region.
Example 4.2:
Let $n = 16$, $m = 1$ and start with the new sequence $\{F_i\} = \{i - 1\}_{i=0}^{n-1}$
which provides a larger domain of convergence than Volder's original
sequence. In fact, $\{F_i\}$ yields $K_1 = 3.73$ and $\max|\varphi_0| = 162.6^\circ$. Apply
Theorem 4.2 to construct
    $\{G_i\} = \{-1, 0, 1, 2, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7, 8, 9\}$
which yields $K_1 = 4.0$ and $\max|\varphi_0| = 212.7^\circ$. Theorem 4.3 yields similar
results, however the sequence is longer.
Example 4.2 provides a sequence that simultaneously results in a
complete region of convergence and scale factor normalization with two
shift operations. Precision loss owing to the two bit shift is overshadowed by
the accuracy of the scale factor.
4.2.1 Effect on Angular Resolution
A final property affected by $\{F_i\}$ is the "angular resolution" of the
algorithm. Recall that if $\{F_i\}$ satisfies (4.9) then $\varphi$ (or $z$) converges to
within $\alpha_{n-1}$ of zero within 'n' steps, where $\alpha_{n-1}$ is defined to be the
angular resolution or angular granularity of the computation. When $\{G_i\}$
is formed from $\{F_i\}$ using the construction of Theorem 4.2, $\alpha'_{n-1}$ is larger
than $\alpha_{n-1}$ and so the angular resolution is worse. Typically, however, the
resolution is still accurate enough for most applications, as was evidenced in
the examples where $\alpha'_{n-1} = 0.002$ radians when $m = 1$. The construction
of Theorem 4.3 may be employed when the angular resolution of the original
$\{F_i\}$ must be met, since $\alpha'_{n'-1} = \alpha_{n-1}$. In Example 4.2, such a construction
would yield a scale factor of four, a complete region of convergence and
uncompromised angular resolution with only five additional CORDIC
iterations. This compares favourably with the some eleven scaling iterations
as well as the pre-rotation overhead of other implementations [HT80].
Similar comments apply to the case of $m = -1$.
4.2.2 Simulation Results
Computer simulation results appear in Figure 4.7 which confirm the
operation of the CORDIC algorithm for Example 4.2. In the table, the vector
has components X and Y while Z represents an angle ($\varphi$). Scaling in the
simulation program was accomplished by right shifting both X and Y two
bits. The examples given in Figure 4.7 clearly show that the region of
convergence of the algorithm is indeed complete and that the spurious scale
factors are readily compensated for. The final entry in the table shows that
the algorithm even converges for large angles. The sequences of Figure 4.4
would have given erroneous results in this case.
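The rotation case can be illustrated with a short floating point model. This is
only a sketch of the circular ($m = 1$) rotation mode using the sequence of
Example 4.2; it assumes the standard CORDIC sign rule (rotate so as to drive $z$
toward zero) and does not model the finite wordlength of the simulation program,
so the low-order digits will differ from Figure 4.7. The function name is
illustrative.

```python
import math

G = [-1, 0, 1, 2, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7, 8, 9]   # sequence of Example 4.2

def cordic_rotate(x, y, z, shifts=G):
    """Circular rotation mode: rotate (x, y) by z, driving z toward zero."""
    for f in shifts:
        d = 1.0 if z >= 0.0 else -1.0        # direction of this micro-rotation
        delta = 2.0 ** (-f)                  # a shift in hardware
        x, y = x - d * delta * y, y + d * delta * x
        z -= d * math.atan(delta)
    # K_1 is about 4 for this sequence, so a 2-bit right shift normalises it.
    return x / 4.0, y / 4.0, z

# Inputs of the last entry of Figure 4.7 (a rotation through 2.3562 radians).
x, y, z = cordic_rotate(0.25, 0.25, 2.3562)
print(f"{x:.4f} {y:.4f} {z:.4f}")
```

The residual angle stays within the angular resolution
$\alpha'_{n-1} \approx 0.002$ radians even though the requested rotation is well
beyond 99 degrees.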
4.2.3 Computational Speed and Hardware Complexity
Techniques reported by Walther [Wa71] and Haviland et al. [HT80] for
scaling and improving the convergence region of CORDIC algorithms have
been shown to be expensive both in the required additional hardware and
the execution time. The need for separate scaling and prerotations imposes
a speed overhead as large as 120%, i.e., the additional operations consume
more time than the CORDIC iterations themselves! In comparison, the
results may be computed in nearly the time required for the CORDIC
recursions alone, by using the technique just described. Furthermore, no
additional hardware is required for decision making or prerotations, a very
significant savings.
There is an essential difference between the present method and the
scaling cycle technique of [HT80] which accounts for the enhanced
efficiency of the former. During a scaling cycle, the method of [HT80] scales
the magnitude of a vector but does not rotate it. In the present scheme,
rotation and scaling occur simultaneously, since the natural scaling of the
CORDIC algorithm is exploited. Hence, the present scheme is significantly
more efficient.

            Initial Values         Expected Final Values    Final Values                  Angular Error
Operation   X      Y      Z        X       Y       Z        X       Y            Z        (Radians)
Vectoring   .25    .25    0        .350    0       .7850    .350    .0007        .7836    .0010
Vectoring   -.1    -.2    0        .224    0       -2.030   .220    .0002        -2.035   .0007
Vectoring   -.25   0      0        .250    0       3.1416   .250    (illegible)  3.1399   .0017
Rotation    -.25   0      1.5708   0       -.250   0        -.0001  -.2503       .0003    .0003
Rotation    .20    -.1    .61      .221    .0326   0        .222    .0320        .0013    .0013
Rotation    .25    .25    2.3562   -.3536  0       0        -.3539  -.0011       -.002    .002

Figure 4.7: Computer Simulation Results
A final point regarding advantages of the present scheme is the
uniformity in number of required iterations for the elementary functions
compared with the disparity in the number of iterations in the schemes
presented in [Wa71] and [HT80]. Computing various CORDIC functions
requires an approximately equal number of iterations, since the modified
sequences are of almost equal length. Thus, the suggested method lends
itself more easily to implementations of many CORDIC processors operating
in parallel, say in a pipeline or a tightly coupled mode, because the waiting
time of a processor for others to complete their calculations is minimized
[AMLA81].
Remark:
The CORDIC scheme was originally structured in a manner which
ensured that $\{F_i\}$ and $K$ were always constant. However, it is clear that the
execution speed of the CORDIC recursions can be enhanced in some cases by
testing the proximity of the vector being rotated to its destination value at
each iteration. It can indeed happen that the desired result is achieved in
the first few iterations. However, in this case, the scale constant is also
determined solely by these few iterations, and its removal involves a division
operation (since its value is determined by the number of iterations
executed). Similarly, during vectoring operations, the CORDIC equations
always rotate the vector to the positive abscissa. Some speed advantage can
be realized by rotating a vector to the nearest coordinate axis, however
scaling is once again problematic.
4.3 HYBRID CORDIC ALGORITHMS
CORDIC algorithms offer the ability to compute vector rotations and
trigonometric functions to good precision without the need for multipliers
or stored trigonometric tables (aside from $\alpha_i$). In contrast, an array
multiplier coupled with such tables (referred to as the stored table
approach) is capable of computing the rotations much faster than the
iterative CORDIC algorithms. This section will explore the combination of
CORDIC's with multipliers and stored tables in order to achieve faster vector
rotations than the basic CORDIC's, but with a considerably lower storage
requirement than the stored table approach. The performance measure to
be maintained by all techniques is angular resolution. Only the
trigonometric functions will be considered although the method to be
presented can be readily applied to other coordinate systems.
4.3.1 Interpolation with CORDIC's
Consider the CORDIC algorithms for $m = 1$ with the sequences $\{F_i\}$ of
Figure 4.4. After 'n' iterations, the angular resolution is of the order of
$2^{-n+1}$ radians. A multiplier accumulator together with sine and cosine
tables could accomplish the same result in four operations provided the
tables segmented the unit circle into $2^n \pi$ parts. The minimum storage
required is 1/8th of the circle, i.e., sines and cosines for some $2^{n-3}\pi$
different angles. For even moderate values of 'n' such as 16 or 24, this
becomes quite significant.
The storage requirement may be reduced considerably by quantizing
the circle much more coarsely, rotating by the closest angle using the
multiplier and then interpolating using CORDIC's to achieve the desired
resolution.
Let $\{F_i\}_{i=0}^{n-1} = \{i\}_{i=0}^{n-1}$ (as, for instance, when $m = 1$) so that the CORDIC
algorithm results in an angular resolution of $R_C = 2^{-n+1}$ (radians) after
'n' iterations. Let the unit circle be quantized to $2^k$ parts yielding
resolution $R_M = 2\pi\,2^{-k} \approx 2^{-k+2.65}$. An interpolation step via CORDIC's
starts at $i = k - 2$ and proceeds to $n - 1$, i.e., $n - k + 1$ CORDIC iterations
are required. For purposes of analysis, assume all quantities are
represented with equal wordlengths, the multiplier time is $T_M$ and a
CORDIC iteration requires time $T_C$. The execution time, $E$, and storage
requirement, $S$, for the three schemes under consideration are (memory
manipulation time is ignored):
CORDIC only:
    $E_C = n T_C$                        $S_C = n$ locations
Multiplier only:
    $E_M = 4 T_M$                        $S_M = 2^n \pi / 8 = 2^{n-3}\pi$ locations
Hybrid:
    $E_H = 4 T_M + (n - k + 1) T_C$      $S_H = 2^{k-3}$ locations
Then:
\[
\frac{E_H}{E_M} = 1 + \frac{(n-k+1)\,T_C}{4\,T_M}, \qquad
\frac{S_M}{S_H} = 2^{n-k}\pi, \qquad
\frac{E_C}{E_H} = \frac{n}{n-k+1+4\,T_M/T_C}
\]
i.e., while the execution time ratios are only linearly dependent on $n - k$ (the
number of interpolated bits), the storage requirement varies exponentially!
These ratios are depicted graphically in Figure 4.8.
[Figure 4.8: Performance of Hybrid CORDIC Scheme (plotted against n - k, in bits)]
Example 4.3:
Let $n = 24$, $k = 15$ and $T_M / T_C = 2$. Then:
\[
\frac{E_H}{E_M} = 2.25, \qquad \frac{S_M}{S_H} = 512\pi \qquad \text{and} \qquad \frac{E_C}{E_H} = 1.33
\]
i.e., while the hybrid method is only 2.25 times slower than the stored table
method, it provides a 1500 fold reduction in storage! On the other hand, the
hybrid method is also 33% faster than the standard CORDIC.
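The three ratios are simple enough to tabulate for other choices of 'k'. The
sketch below merely evaluates the expressions for $E_H/E_M$, $S_M/S_H$ and
$E_C/E_H$ given in this section; the function name and argument names are
illustrative.

```python
import math

def hybrid_interpolation_ratios(n, k, tm_over_tc):
    """Return (E_H/E_M, S_M/S_H, E_C/E_H) for the interpolating hybrid scheme."""
    iters = n - k + 1                          # CORDIC interpolation steps
    eh_over_em = 1.0 + iters / (4.0 * tm_over_tc)
    sm_over_sh = 2.0 ** (n - k) * math.pi
    ec_over_eh = n / (iters + 4.0 * tm_over_tc)
    return eh_over_em, sm_over_sh, ec_over_eh

ratios = hybrid_interpolation_ratios(24, 15, 2.0)   # values of Example 4.3
print(ratios)
```

Sweeping `k` with this function shows the trade directly: each additional
table bit halves the storage ratio while removing one CORDIC iteration.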
Remarks:
1) The choice of 'k' can be optimized for the desired combination of
   speed and storage constraints.
2) The array multiplier is likely to be much larger in an integrated
   realization than a CORDIC block, so the area penalty incurred by
   the hybrid scheme through the addition of a CORDIC block is
   marginal (this is especially true if the CORDIC iterations are done
   with the multiplier in the case when $T_M = T_C$). In any event, the
   overriding consideration is the required storage. Clearly, the
   hybrid scheme requires less area than the multiplier scheme since
   the memory area is drastically reduced.
3) This technique, although presented only for $m = 1$, can be readily
   applied to other coordinate systems.
4.3.2 A Taylor Series Approach to Hybrid CORDIC's
The foregoing hybrid CORDIC method provided the ability to trade
execution speed for storage by initially computing a coarse rotation and
then using the CORDIC method to interpolate the remaining bits. An
alternate hybrid scheme which first employs the CORDIC iterations to
reduce the angle to a small value and then utilizes a Taylor series expansion
to achieve the final result will now be described. As in the previous section,
only the plane rotation case will be discussed in detail, however the
extension to the additional CORDIC functions will be obvious. This hybrid
approach achieves superior performance improvement while requiring no
additional storage, however it does not allow for as much control over the
speed tradeoff as the method of Section 4.3.1 did (through the choice of
$n - k$).
Consider the vector rotation:
\[
X_n = \begin{bmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{bmatrix} X_0
\]
Let all quantities be represented to 'n'-bit precision. Then, a vector rotation
requires 'n' CORDIC iterations. Suppose that $m < n$ iterations are
performed yielding a vector $X_m$ and a residual angle, $\varphi_m$, close to zero,
which is the destination angle. Next, note that the Taylor series expansions
of $\sin\varphi$ and $\cos\varphi$ around the origin are:
\[
\cos\varphi = \sum_{k=0}^{\infty} \frac{(-1)^k \varphi^{2k}}{(2k)!}, \qquad
\sin\varphi = \sum_{k=0}^{\infty} \frac{(-1)^k \varphi^{2k+1}}{(2k+1)!}
\]
However, for $\varphi_m$ close to zero, only the first term of the expansion is
significant, i.e.,
\[
\cos\varphi_m \approx 1, \qquad \sin\varphi_m \approx \varphi_m
\]
Therefore the hybrid CORDIC algorithm is:
Step 1: Compute 'm' CORDIC iterations to obtain $X_m$ with a small
        residual angle $\varphi_m$.
Step 2:
\[
X_n = \begin{bmatrix} 1 & -\varphi_m \\ \varphi_m & 1 \end{bmatrix} X_m
\]
Notice that no additional storage is required in this hybrid scheme and that
only two multiplications are necessary in the final step of the algorithm. It
is straightforward to show that:
\[
\frac{E_H}{E_M} = \frac{m}{4}\,\frac{T_C}{T_M} + \frac{1}{2}, \qquad 0 < m < n
\]
\[
\frac{E_C}{E_H} = \frac{n}{m + 2\,T_M/T_C}, \qquad 0 < m < n
\]
These relations are shown graphically in Figure 4.9.
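The two steps of the hybrid algorithm can be sketched in floating point as
follows. The sketch assumes $F_i = i$ for the CORDIC iterations and the usual
sign rule (rotate so as to drive the residual angle toward zero). The final
division by the accumulated CORDIC gain is an addition made here so that the
result can be compared directly with $\cos$ and $\sin$; it is not part of the
two steps in the text.

```python
import math

def taylor_hybrid_rotate(x, y, phi, m):
    """Rotate (x, y) by phi: m CORDIC steps, then a first-order Taylor step."""
    gain = 1.0
    for i in range(m):                         # Step 1, with F_i = i
        d = 1.0 if phi >= 0.0 else -1.0
        delta = 2.0 ** (-i)
        x, y = x - d * delta * y, y + d * delta * x
        phi -= d * math.atan(delta)            # phi ends as the residual phi_m
        gain *= math.sqrt(1.0 + delta * delta)
    # Step 2: cos(phi_m) ~ 1 and sin(phi_m) ~ phi_m, i.e. two multiplications.
    x, y = x - phi * y, y + phi * x
    return x / gain, y / gain                  # scale factor removed for comparison

n, m = 24, 13
x, y = taylor_hybrid_rotate(1.0, 0.0, 0.6, m)
print(abs(x - math.cos(0.6)), abs(y - math.sin(0.6)))
```

With $m = 13$ the residual angle is below $2^{-12}$, so the neglected
second-order term is below $2^{-24}$ and both errors printed above stay under
the 24-bit resolution.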
It is necessary to determine how large 'm' must be in order for the
truncation of the Taylor series to be justified. For 'n' bit precision, $\varphi_m$
must be chosen small enough such that the additional terms of the series
are individually smaller than may be accommodated by the finite bit
representation. (Note that it is not necessary to guarantee that the
summation of the remaining terms be so small since finite bit addition is not
associative. Indeed, failure to recognize this fact would seriously compromise
the performance of this method.) Therefore, it suffices to choose 'm' such
that
\[
\frac{\varphi_m^2}{2!} < 2^{-n} \quad \text{and} \quad \frac{\varphi_m^3}{3!} < 2^{-n}
\]
i.e.,
\[
\varphi_m < 2^{-(n-1)/2}
\]
However, as indicated in Section 4.3.1, this requires
\[
m \ge \frac{n+1}{2}
\]
i.e., up to almost one half of the bits may be obtained from the final step of
the algorithm, with the simple inclusion of a multiplier! (This is superior to
some of the newer schemes, e.g., [Fa81], which employ truncated power
series.)

[Figure 4.9: Geometric Interpretation of the CCM]
Example 4.4:
Recall Example 4.3 in which $n = 24$ and $T_M / T_C = 2$. With
$m = \lceil (n+1)/2 \rceil = 13$, the performance of the present method is:
\[
\frac{E_H}{E_M} = 2.125, \qquad \frac{E_C}{E_H} = 1.41
\]
which is better than Example 4.3. The real benefit of this method is that no
storage is required.
Remarks:
1. In this example, only 13 CORDIC iterations are performed. If 13
   iterations were performed with the method of Section 4.3.1, then
   $E_H / E_M = 2.75$ which is worse than the present scheme.
2. For large $m$, $n$ it is clear that $\alpha_{m-1} \approx \delta_{m-1} = 2^{-m+1}$. Since
   $\varphi_m < \alpha_{m-1}$, it is possible to sacrifice angular resolution and
   approximate $\varphi_m$ with a power of two (e.g., $\alpha_{m-1}$). Then
   \[
   \frac{E_H}{E_M} = \frac{m + 2}{4}\,\frac{T_C}{T_M}
   \]
   which represents an additional performance improvement.
3. This technique may be readily applied to the other CORDIC
   functions. For example, the hyperbolic rotations involve Taylor
   series expansions of $\cosh\varphi_m$ and $\sinh\varphi_m$ which are very similar
   in form to those for $\cos\varphi_m$ and $\sin\varphi_m$.
4. Angular Resolution Revisited
   Theorem 4.2 provided a very powerful method for improving the
   domain of convergence of the CORDIC algorithm, however it
   reduced the angular resolution as mentioned in Section 4.2.1.
   Figure 4.7 shows that the angular residual, which is equivalent to
   $\varphi_m$, was nonetheless very small. Exploiting this fact, it is natural
   to improve the resolution by applying Step 2 of the foregoing
   hybrid algorithm, which requires the addition of a multiplier to the
   CORDIC arithmetic unit. Whether it is preferable to construct this
   multiplier or simply employ Theorem 4.3 to improve the resolution
   will depend on the application.
4.4 FLOATING POINT CORDIC ALGORITHMS (FLORDIC)
All expositions on CORDIC algorithms to date have been aimed at fixed
point implementations, and the natural assumption has been that if the
algorithms could be generalized, an integrated realization would be much
more complex and costly. The FLORDIC technique to be described in this
section is a floating point CORDIC method. Interestingly, FLORDIC
algorithms appear to be simpler to implement than the fixed point CORDICs
since they can be structured to require little or no shifting for the scaling by
$\delta_i$. This is very significant for bit parallel realizations like Figure 4.5 in
which the barrel shifter consumes most of the chip area (as will become
obvious in chapter six). Accordingly, the throughput per area advantages of
bit parallel versus bit serial realizations are quite large (some details appear
in chapter six).
Consider floating point number representation:
\[
X = M_x\, b^{\,c_x + e} \tag{4.17}
\]
where
    $X$ is a floating point number
    $M_x$ is a mantissa in sign plus magnitude format [Hw79]
    $c_x$ is an integer characteristic
    $b$ is the characteristic base
    $e$ is an offset quantity for an excess storage format [Hw79]
$M_x$ is generally in either normalized or standardized form, the latter
representation being used in most mainframe computers while the former is
the IEEE microcomputer floating point standard.
Recall the CORDIC iterations (Equations 4.3 and 4.6):
\[
x_{i+1} = x_i + m\,\delta_i\, y_i \tag{4.18a}
\]
\[
y_{i+1} = y_i - \delta_i\, x_i \tag{4.18b}
\]
\[
z_{i+1} = z_i - \alpha_i \tag{4.18c}
\]
(the fact that the signs in the equations depend on the direction of rotation
has been ignored here since it has no bearing on the presentation).
In the sequel, assume that the characteristic and mantissa in the
floating point format are separable (all floating point arithmetic units are
capable of doing this, making it a reasonable assumption). Furthermore, let
$\delta_i = p^{-F_i}$ for some $p$. Then, the CORDIC equations may be written:
\[
x_{i+1} = x_i + m\, M_{y_i}\, b^{\,c_{y_i} + e}\, p^{-F_i} \tag{4.19a}
\]
\[
y_{i+1} = y_i - M_{x_i}\, b^{\,c_{x_i} + e}\, p^{-F_i} \tag{4.19b}
\]
The major difficulty in realizing these equations is the product terms
of the generic form:
\[
b^{\,c + e}\, p^{-F}
\]
Many options exist for computing this product.
Case 1: $p = 2$, $b = 2^k$
In this case, which is the closest to the fixed point situation, Equation
4.19 becomes:
\[
x_{i+1} = x_i + m\, M_{y_i}\, b^{\,(c_{y_i} - F_i/k) + e} \tag{4.20a}
\]
\[
y_{i+1} = y_i - M_{x_i}\, b^{\,(c_{x_i} - F_i/k) + e} \tag{4.20b}
\]
These equations are readily recognized as floating point additions of $x_i$
and $y_i$ in which one of the two variables has had its characteristic modified
by $F_i/k$. Subtracting a constant from the characteristic is an easy task, so
the integrated implementation of (4.20) is readily achieved with a floating
point adder. Well... not quite. Notice that $F_i/k$ is not always an integer. Let
$q_i = \mathrm{integer}[F_i/k]$ and $r_i = F_i \bmod k$. Then Equation 4.20 may be written:
\[
x_{i+1} = x_i + m\,(2^{-r_i} M_{y_i})\, b^{\,(c_{y_i} - q_i) + e} \tag{4.21a}
\]
\[
y_{i+1} = y_i - (2^{-r_i} M_{x_i})\, b^{\,(c_{x_i} - q_i) + e} \tag{4.21b}
\]
Now Equation 4.21 is particularly easy to implement. Simply shift the
mantissa and subtract a constant from the characteristic of $x_i$ (or $y_i$) and
do a floating point addition with $y_i$ (or $x_i$) to yield the result. The resulting
mantissa is guaranteed to be in the correct format since $r_i < k$.
Remarks:
1) The mantissa shift could be done with the shift unit used in the
   floating point adder, thus necessitating no further hardware.
   Alternatively, a separate shifter will enhance throughput.
2) Even if a separate shifter is built, its shift range is merely from 0
   to $k - 1$ rather than the entire range of $\{F_i\}$ as in the fixed point
   case. For example, consider a 32 bit floating point representation
   in which $b = 16$, i.e. $k = 4$, and a 24 bit mantissa is used. The
   values of $\{F_i\}$ range from 0 to 23 requiring a 23 position barrel
   shifter for a parallel realization (quite a formidable task), while the
   present floating point method would require a three position
   shifter only. The latter is of course much, much simpler to
   integrate and consumes roughly 12% of the area.
3) $F_i$ can be stored as $q_i$, $r_i$ with no storage penalty whatever.
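The decomposition of Case 1 can be checked with exact rational arithmetic. In
this sketch the operand value and the helper names are hypothetical; the offset
$e$ is omitted because it appears identically on both sides of the identity
being checked.

```python
from fractions import Fraction

def split_shift(F, k):
    """Split a CORDIC shift F into (q, r) with F = q*k + r and 0 <= r < k."""
    return divmod(F, k)

def scaled_operand(mantissa, c, F, k):
    """Return (2^-r * M, c - q), representing M * b^c * 2^-F with b = 2^k."""
    q, r = split_shift(F, k)
    return mantissa * Fraction(1, 2 ** r), c - q

k = 4                                    # b = 16, as in Remark 2
b = 2 ** k
M, c = Fraction(5, 8), 3                 # hypothetical operand 5/8 * 16^3
for F in range(24):                      # shifts 0 .. 23 for a 24 bit mantissa
    m2, c2 = scaled_operand(M, c, F, k)
    # Mantissa shift plus characteristic adjustment equals the full shift.
    assert m2 * Fraction(b) ** c2 == M * Fraction(b) ** c / 2 ** F

largest_mantissa_shift = max(split_shift(F, k)[1] for F in range(24))
print(largest_mantissa_shift)            # prints 3
```

The mantissa shifter never needs more than $k - 1 = 3$ positions, whatever the
value of $F_i$, which is the point of Remark 2.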
Case 2: $p = b$
This case is perhaps the most interesting, since it is really the CORDIC
algorithm applied to a radix $p$ machine (it is interesting that this results in a
natural connection to floating point algorithms). Substitute $k = 1$ into
(4.20) to get:
\[
x_{i+1} = x_i + m\, M_{y_i}\, b^{\,(c_{y_i} - F_i) + e} \tag{4.22a}
\]
\[
y_{i+1} = y_i - M_{x_i}\, b^{\,(c_{x_i} - F_i) + e} \tag{4.22b}
\]
This case results in a truly simple implementation since $y_{i+1}$ (or $x_{i+1}$) is
readily obtained through a floating point addition of $y_i$ ($x_i$) and $x_i$ ($m y_i$),
after a constant has been subtracted from the characteristic of $x_i$ ($y_i$). This
is clearly much simpler than the fixed point situation since only adders are
required. The amount of shift required by the floating point adder for radix
point alignment is simply $|c_{y_i} - c_{x_i} + F_i|$ ($|c_{x_i} - c_{y_i} + F_i|$) and the
modification of the characteristic of $x_i$ ($y_i$) need only be explicitly computed
for
    the characteristic of $y_{i+1}$ when $c_{x_i} - F_i < c_{y_i}$
    the characteristic of $x_{i+1}$ when $c_{y_i} - F_i < c_{x_i}$
This section has thus far demonstrated that CORDICs may be readily
generalized to floating point representations having simple realizations,
however it is still necessary to describe a means for simultaneously
obtaining a sufficiently large region of convergence and an appropriate
angular resolution to accommodate the large dynamic range of numbers, i.e.
an appropriate choice of $\{F_i\}$. When $m = 1$, any of the sequences of Section
4.2 may be employed since the size of the desired region of convergence is
not altered by the floating point representation (i.e. whether floating or
fixed point, convergence is still desired for all angles in the circle).
Unfortunately, this is not the case for the linear and hyperbolic coordinate
systems and the following alternatives must be considered:
1) Resort to the use of some negative integers in $\{F_i\}$ as was done in
   Example 4.2. This will improve the region of convergence, however
   additional iterations will be necessary if angular resolution is not
   to be compromised.
2) Utilize the prescaling identities in [Wa71] when large quantities are
   encountered.
3) When $m = -1, 0$ it is most convenient to exploit the separable
   nature of floating point representations to perform only fixed point
   operations on the mantissas and account for scaling with simple
   additions to the characteristics of the quantities involved. Since
   the mantissas are always normalized (or standardized), they lie in
   a very limited range and convergence of the fixed point CORDIC
   algorithm does not pose any problems. Examples of this idea are
   presented in [Ah82], to which the reader is referred for additional
   details.
The latter approach is very attractive since convergence of the algorithm is
not of major concern.
4.5 THE CONVERGENCE COMPUTATION TECHNIQUE
A new iterative technique, based on a generalization of the convergence
division method [Go64], for the evaluation of exponentials, logarithms, ratios
and square roots of fractional numbers was introduced by Chen [Ch71].
Consider initially the convergence division method for evaluating
\[
Q = N / D
\]
At each iteration, a constant $R_i$ is chosen to multiply both numerator and
denominator so after K iterations:
\[
Q = \frac{N\, R_0 R_1 \cdots R_{K-1}}{D\, R_0 R_1 \cdots R_{K-1}}
\]
The sequence $\{R_i\}$ is chosen so that the denominator converges to unity.
Hence, the numerator converges to the desired quotient.
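A minimal sketch of convergence division follows. The text does not specify
how $R_i$ is chosen; the rule $R_i = 2 - D_i$ used here is an assumption (it is
the choice usually associated with this style of division), and the function
name is illustrative.

```python
def convergence_divide(n, d, iterations=6):
    """Approximate n / d for 0.5 <= d < 1 by driving the denominator to 1."""
    for _ in range(iterations):
        r = 2.0 - d          # assumed multiplier R_i; not specified in the text
        n *= r               # numerator and denominator share the same factor,
        d *= r               # so the quotient n / d is unchanged
    return n                 # d is now ~1, so n is ~ the quotient

quotient = convergence_divide(0.3, 0.75)
print(quotient)
```

With this choice the error in the denominator is squared at every step, so six
iterations are more than enough for double precision when $1/2 \le D < 1$.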
Chen's generalization operates similarly and is based on the
co-transformation of a number pair $(x, y)$ such that some function $F(x, y)$ is
invariant (e.g. above $(x, y) = (N, D)$ and $F(x, y) = x/y$). The iterations or
transformations are made to proceed in a manner driving $x$ to a known
value $x_\omega$ so $y$ converges to the corresponding $y_\omega$ which is the desired
result.
In order to evaluate a function $z_0 = f(x)|_{x = x_0}$, introduce a variable
$y$ to form the convergence function $F(x, y)$ satisfying
(1) there exists a known initiation value $y = y_0$ such that
    $F(x_0, y_0) = z_0$
(2) there exists a convenient transformation of $(x_k, y_k)$ into
    $(x_{k+1}, y_{k+1})$ such that $F(x_{k+1}, y_{k+1})$ is invariant $\forall\, k \ge 0$.
(3) a known destination value $x_\omega$ is reached through the sequence of
    $x$-transformations and the resulting $y$-transformations converge
    to $y = y_\omega = F(x_\omega, y_\omega) = z_0$.
A geometric interpretation of these conditions is given in Figure 4.9.
The function $F(x, y)$ is constrained to lie in the $z = z_0$ plane of a three
dimensional cube which has $P_0 = (x_0, y_0, z_0)$ as one vertex. The invariant
transformation implies that at each iteration, the point $P_k = (x_k, y_k, z_k)$
lies on the curve $F(x, y)$, i.e.,
\[
z_0 = F(x_0, y_0) = F(x_1, y_1) = \cdots = F(x_k, y_k) = \cdots = F(x_\omega, y_\omega) = y_\omega
\]
Furthermore, the curve $F$ must pass through the point $Q(x_\omega, y_\omega, z_0)$
as a consequence of the third condition above.
The transformation rule taking $P_k$ to $P_{k+1}$ involves the selection of a
pair of functions, $\varphi$ and $\psi$, such that
\[
x_{k+1} = \varphi(x_k, y_k)
\]
\[
y_{k+1} = \psi(x_k, y_k)
\]
Some examples will now be given.
4.5.1 Examples of the Convergence Computation Technique:
Consider the computation of $f(x) = w e^x$ for $0 \le x < \ln 2$. Let
$F(x, y) = y e^x$ with initiation value $y_0 = w$ and destination $x_\omega = 0$. The
transformation rules are
\[
x_{k+1} = \varphi(x_k, y_k) = x_k - \ln a_k
\]
\[
y_{k+1} = \psi(x_k, y_k) = y_k\, a_k
\]
Then clearly
\[
F(x_{k+1}, y_{k+1}) = y_{k+1}\, e^{x_{k+1}} = y_k\, e^{x_k} = F(x_k, y_k)
\]
is invariant.
The algorithm is terminated when $x \to 0$. As with the CORDIC
algorithms, $a_k$ is chosen in such a way as to replace multiplications with
shift and add operations, i.e., choose $a_k = 1 + 2^{-m}$ where the selection of
'm' is detailed in [Ch71] (notice the similarity of $a_k$ to $\delta_k$ in the CORDIC
algorithm).
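The exponential example can be sketched as follows. The rule for selecting 'm'
is given in [Ch71] and is not reproduced in the text, so this sketch assumes a
simple greedy rule: step through $m = 1, 2, \ldots$ and apply $a_k = 1 + 2^{-m}$
whenever $\ln a_k$ does not exceed the current $x_k$. The logarithms would come
from a small stored table in hardware; here they are computed directly.

```python
import math

def ccm_exp(w, x, n_bits=30):
    """Approximate w * e**x for 0 <= x < ln 2 using factors a_k = 1 + 2**-m."""
    y = w
    for m in range(1, n_bits + 1):
        log_a = math.log1p(2.0 ** (-m))   # ln(1 + 2^-m)
        if x >= log_a:                    # assumed greedy selection of m
            x -= log_a                    # x_{k+1} = x_k - ln a_k
            y += y * 2.0 ** (-m)          # y_{k+1} = y_k * a_k, a shift and add
    return y                              # x has been driven to about 2^-n_bits

approx = ccm_exp(1.0, 0.5)
print(approx, math.exp(0.5))
```

Because $y e^x$ is invariant, the error in the result is governed entirely by
how close the final $x$ is to the destination value of zero.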
Additional examples are:

Logarithm:
    $f(x) = w + \ln x$ for $1/2 \le x < 1$
    $F(x, y) = y + \ln x$
    initiation $y_0 = w$, destination $x_\omega = 1$
    Transformation:
        $x_{k+1} = x_k\, a_k$
        $y_{k+1} = y_k - \ln a_k$

Ratio Algorithm:
    $f(x) = w / x$ for $1/2 \le x < 1$
    $F(x, y) = y / x$
    initiation $y_0 = w$, destination $x_\omega = 1$
    Transformation:
        $x_{k+1} = x_k\, a_k$
        $y_{k+1} = y_k\, a_k$

Inverse Square Root:
    $f(x) = w / \sqrt{x}$ for $1/4 \le x < 1$
    $F(x, y) = y / \sqrt{x}$
    initiation $y_0 = w$, destination $x_\omega = 1$
    Transformation:
        $x_{k+1} = x_k\, a_k^2$
        $y_{k+1} = y_k\, a_k$
Notice if $w = x$ then $\sqrt{x}$ is obtained as the result.
4.5.2 Hybrid Convergence Computation
As with the hybrid CORDIC algorithms of Section 4.3, it is possible to
derive hybrid convergence computation schemes. Since these depend on
the individual transformation equations of the CCM for the particular
function desired, this section will illustrate the hybrid concept with an
example.
Consider the ratio algorithm, which is defined for $1/2 \le x < 1$. By
choosing $a_k = 1 + 2^{-m}$, where $m$ is the position number of the leading
1-bit in $|1 - x_k|$, the division operation requires an expected $N/4$
iterations for $(N+1)$-bit quantities [Ch71]. The destination, $x_k \to 1$,
could be reached in a single iteration if $a_0 = 1/x_0$ were looked up in a
table (ROM). (This will be referred to as the stored table approach.)
However, the amount of storage is prohibitive even for 16 or 24 bit
precision, since there are $2^{15}$ or $2^{23}$ distinct values of $x$
satisfying $1/2 \le x < 1$. Now assume that the storage requirement is
reduced by quantizing the interval $1/2 \le x < 1$ much more coarsely, e.g.,
to $2^Q$ quantities. Furthermore, assume that a fixed point multiplier, with
multiply time $T_M$, is available for scaling by $a_0$ (since $a_0$ is no
longer of simple form). Then a hybrid CCM scheme involving table lookup is:
Step 1: From $x_0$, look up an $a_0$ in the table which is closest to
        $1/x_0$.
Step 2: Calculate
            $x_1 = x_0 a_0$
            $y_1 = y_0 a_0$
Step 3: Calculate $m$ = position of the leading 1-bit in $|1 - x_1|$.
Step 4: Continue with the normal CCM transformation until convergence is
        achieved.
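
The four steps can be sketched in Python for the ratio algorithm. Several
details here are assumptions made only for illustration: the table holds the
reciprocal of the upper edge of each quantization cell (so that $x_1$ never
exceeds 1 and factors of the form $1 + 2^{-m}$ suffice afterwards), steps 3
and 4 are realized by trying each $m$ in order instead of locating the
leading 1-bit of $|1 - x_1|$, and the names `build_table` and `hybrid_ratio`
are hypothetical.

```python
def build_table(q):
    """ROM sketch: reciprocal of the upper edge of each of 2**q cells on [1/2, 1)."""
    width = 0.5 / 2 ** q
    return [1.0 / (0.5 + (i + 1) * width) for i in range(2 ** q)]

def hybrid_ratio(w, x, table, n_bits=40):
    """Return (approximation of w / x, CCM iterations used) for 1/2 <= x < 1."""
    cell = int((x - 0.5) * 2 * len(table))    # Step 1: index from the leading bits of x
    a0 = table[cell]
    x, y = x * a0, w * a0                     # Step 2: one full multiplication each
    iterations = 0
    for m in range(1, n_bits + 1):            # Steps 3-4: ordinary CCM refinement
        a = 1.0 + 2.0 ** -m
        if x * a <= 1.0:
            x, y = x * a, y * a
            iterations += 1
    return y, iterations
```

With `build_table(0)` the single table entry is 1, so the routine degenerates
to the ordinary CCM; larger `q` starts the refinement closer to the
destination.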
The hybrid CCM essentially uses table lookup to get close to the final
result and then refines the precision of the result via the CCM recursions,
hence realizing the major advantage that fewer iterations are required to
achieve the final result than with the standard CCM, while much less memory
is necessary than with the stored table approach. An additional fixed point
multiplier increases the amount of hardware over the standard CCM
approach. However, since the multiplier allows removal of the restriction
that $a_k = 1 + 2^{-m}$, as in the usual CCM, some further advantage is likely
attainable in step 4.
In order to analyze this method, let the ROM contain values of $a_0$ arising
from a linear quantization of the interval $[1/2, 1)$ to $2^Q$ levels. Let

    $S_H$ = storage required by the hybrid scheme (in words)
    $S_T$ = storage required by the table lookup method
    $T_M$ = multiplier time
    $T_C$ = time for one usual CCM iteration
    $E_C$ = expected execution time for the usual CCM
    $E_H$ = expected execution time for the hybrid scheme
    $E_T$ = expected execution time for the table lookup method
    $N+1$ = number of bits in the number representation

Then

$$\frac{S_H}{S_T} = 2^{Q-N} \qquad (4.23)$$

$$E_C = \frac{N T_C}{4} \quad \text{(since } N/4 \text{ iterations)} \qquad (4.24)$$
Now, for some $x_0$, the retrieved $a_0$ will differ from $1/x_0$, and the
maximum deviation will determine the number of usual CCM iterations
required in step 4 above. It is reasonable to estimate the maximum
discrepancy as occurring when $x_0$ falls between two table entries, $I_1$
and $I_2$, which are:

$$I_1 = x_0 - 2^{-(Q+2)}$$
$$I_2 = x_0 + 2^{-(Q+2)}$$

with corresponding values of $a_0$, denoted $a_0^1$, $a_0^2$:

$$a_0^1 = \frac{1}{x_0 - 2^{-(Q+2)}}, \qquad a_0^2 = \frac{1}{x_0 + 2^{-(Q+2)}}$$

Hence:

$$x_0 a_0^1 = \frac{x_0}{x_0 - 2^{-(Q+2)}} = \frac{1}{1 - 2^{-(Q+2)}/x_0}
  \le \frac{1}{1 - 2^{-(Q+1)}} \approx 1 + 2^{-(Q+1)}
  \quad \text{for } x_0 \in [1/2, 1)$$

and

$$x_0 a_0^2 = \frac{1}{1 + 2^{-(Q+2)}/x_0} \ge \frac{1}{1 + 2^{-(Q+1)}}
  \approx 1 - 2^{-(Q+1)}$$

i.e., $Q$ bits of precision are obtained in the first two steps.
Therefore, step 4 must account for the remaining $N - Q$ bits of
precision, requiring an expected $(N - Q)/4$ CCM iterations.

Relative speeds of the hybrid CCM with one multiplier and the stored
table approaches are compared through the ratio:

$$\frac{E_H}{E_T} = 1 + \frac{(N - Q)\,T_C}{4\,T_M} \qquad (4.25)$$
Equations (4.23) and (4.25) summarize the relative performance of the
two schemes. Memory access time has been ignored in this simple analysis.
While this is reasonable for the hybrid CCM, it gives optimistic results for
the stored table approach, which has a large memory and hence a slower
access. Notice that while $E_H/E_T$ depends linearly on $N - Q$, $S_H/S_T$
exhibits an exponential dependence. Therefore, an exponential reduction in
storage requirements (through the use of the hybrid CCM) incurs only a
linear reduction in speed! Recall that this was also the case with the
hybrid CORDIC algorithms.
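
The trade-off can be tabulated numerically. The sketch below is a reading of
the two ratios under explicit assumptions: (4.25) is taken to be
$E_H/E_T = 1 + (N - Q)T_C/(4T_M)$, which follows if the hybrid scheme costs
one multiplication plus $(N - Q)/4$ CCM iterations and the stored table
approach costs one multiplication, with memory access ignored. The timing
values passed in are placeholders, not measurements, and the function names
are hypothetical.

```python
def storage_ratio(n, q):
    """S_H / S_T as in (4.23)."""
    return 2.0 ** (q - n)

def speed_ratio(n, q, t_c, t_m):
    """E_H / E_T, assuming E_H = T_M + (N - Q) * T_C / 4 and E_T = T_M."""
    return 1.0 + (n - q) * t_c / (4.0 * t_m)

# Placeholder timings: one CCM iteration = 1 unit, one multiplication = 4 units.
for q in (4, 8, 12):
    print(q, storage_ratio(15, q), speed_ratio(15, q, t_c=1.0, t_m=4.0))
```

Each increase of $Q$ by one doubles the storage ratio while changing the
speed ratio by only a fixed increment, which is the exponential versus
linear behaviour noted above.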
Similar hybrid schemes may be obtained for the other CCM functions.
While only quantization of the domain of the function has been considered,
there may be an advantage to quantizing the range instead. Furthermore,
optimal quantization has not been considered, although the present linear
scheme is expected to be quite efficient.

Finally, note that a Taylor series type hybrid scheme could also be
developed for many of the CCM functions in much the same manner as
presented in Section 4.3.2.

4.5.3 Hardware Implementation

Chen [Ch71] has also proposed a machine architecture for
implementing this algorithm. This is shown in Figure 4.10. Notice that
multiplications have been avoided by choosing all scale factors to be integral
powers of the machine radix. As with the CORDICs, the major arithmetic
components are, therefore, a shifter and an adder. Recall that in the
CORDIC algorithms, the values of $\delta_k$ were chosen to be powers of the
machine radix, necessitating storage of the corresponding angles $\alpha_k$
in a memory. Similarly, by choosing $a_k = 1 + 2^{-m}$ here, a memory is
required to store the values of $\ln a_k$.
The operation of Chen's machine is quite simple. Consider, for
example, the algorithm of Section 4.5.1 for computing $w e^x$, whose
transformation rules are:

$$x_{k+1} = \phi(x_k, y_k) = x_k - \ln a_k$$
$$y_{k+1} = \psi(x_k, y_k) = y_k a_k$$

and $a_k$ has the form $1 + 2^{-m}$. The $y$-transformation is easily
calculated by placing $y_k$ in the T register and its scaled value in the U
register. Adding the two yields $y_{k+1}$, which is placed back into the Y
register. Similarly, the value of $x_k$ is placed in T while the value of
$a_k$ is used to determine $-\ln a_k$ from the memory. Adding yields
$x_{k+1}$, which is placed in the X register, thus completing an iteration.
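
The register-level description can be mimicked with integer arithmetic. The
following Python sketch is an illustration only: the word length, the number
of table entries and the rule for accepting a given $m$ (subtract
$\ln(1 + 2^{-m})$ whenever $x$ stays non-negative) are assumptions, and the
names `FRAC_BITS`, `N_ITER`, `LN_TABLE` and `chen_exp` are hypothetical.

```python
import math

FRAC_BITS = 30                 # assumed fixed-point fraction length
ONE = 1 << FRAC_BITS
N_ITER = 28                    # assumed number of table entries / iterations

# Memory contents: ln(1 + 2**-m) in fixed point (Figure 4.10 stores the negated value).
LN_TABLE = [round(math.log(1.0 + 2.0 ** -m) * ONE) for m in range(N_ITER + 1)]

def chen_exp(w, x):
    """Approximate w * exp(x) for 0 <= x < ln 2 using shifts, adds and a table."""
    X = round(x * ONE)
    Y = round(w * ONE)
    for m in range(1, N_ITER + 1):
        if X >= LN_TABLE[m]:       # keep x non-negative on its way to 0
            X -= LN_TABLE[m]       # x_{k+1} = x_k - ln a_k    (table lookup + add)
            Y += Y >> m            # y_{k+1} = y_k (1 + 2**-m) (shift + add)
    return Y / ONE
```

Only the initial and final conversions use floating point; the loop body is
restricted to a comparison, a table read, a shift and two additions, which
mirrors the shifter, adder and memory of the machine.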

[Figure 4.10: A Machine Architecture for the CCM. Block diagram with a
scratchpad, a memory, a shifter and an adder operating on x or y, and a test
on m leading to the termination algorithm. Captions: (X) denotes the
contents of register X; C(m) denotes the contents of memory location m, with
C(m) = -ln(1 + 2^{-m}).]

4.6 RELATIONSHIP BETWEEN THE CORDIC AND CONVERGENCE COMPUTATION
ALGORITHMS

In his original treatise, Chen [Ch71] notes that his convergence
computation technique differs fundamentally from other algorithms
including CORDIC. However, it is possible to show that a generalization of
the CCM includes the CORDIC equations as a special case. This not only
demonstrates that the two algorithms are in fact intimately linked, but also
provides the ability to include vector rotation and polar/rectangular
coordinate conversion in Chen's method, hence realizing a unified
structure for implementing all of these functions.
It will be convenient to defer discussion of the generalized structure of
Chen's technique until after the connection between the two algorithms has
been demonstrated by an alternate means. The main interest here will be
the CORDIC algorithms for $m = \pm 1$, since multiplication and division
(i.e., $m = 0$) are in fact the basis of the convergence computation
technique and hence obviously computable.
Consider the convergence computation algorithm for the exponential
given by:

$$f(x) = w e^{x}$$
$$F(x, y) = y e^{x}$$
$$x_{k+1} = x_k - \ln a_k$$
$$y_{k+1} = y_k a_k$$

Suppose the quantities are all complex, i.e., let

$$x = j\vartheta$$

and

$$a_k = |a_k| e^{j\alpha_k}$$

Then

$$F(\vartheta, y) = y e^{j\vartheta}$$

and the $x$ transformation becomes:

$$j\vartheta_{k+1} = j\vartheta_k - \ln a_k = j(\vartheta_k - \alpha_k) - \ln|a_k|$$

or

$$\vartheta_{k+1} = (\vartheta_k - \alpha_k) + j \ln|a_k| \qquad (4.26)$$

It now follows that:

$$y_{k+1} = y_k a_k = y_k |a_k| e^{j\alpha_k}$$

so the transformation pair is therefore

$$\vartheta_{k+1} = \vartheta_k - \alpha_k + j \ln|a_k|$$
$$y_{k+1} = y_k |a_k| e^{j\alpha_k} = y_k a_k \qquad (4.27)$$

Notice that $\vartheta_k - \alpha_k$ is representative of a rotation through
$\alpha_k$, the argument of $a_k$. It is straightforward to verify that
$F(\vartheta, y)$ is invariant, since:

$$F(\vartheta_{k+1}, y_{k+1}) = y_{k+1} e^{j\vartheta_{k+1}}
  = y_k a_k e^{j\vartheta_k - \ln a_k}
  = y_k e^{j\vartheta_k}
  = F(\vartheta_k, y_k)$$
These transformations may now be used to obtain the CORDIC
iterations. Let

$$y_k = \beta_k + j\zeta_k, \qquad [\beta_k \;\; \zeta_k]^T = \mathbf{X}_k \qquad (4.28)$$

so that the complex valued $y_k$ transformation is in fact two real valued
transformations of $\beta_k$ and $\zeta_k$. Furthermore, since the choice of
$a_k$ is arbitrary, let

$$a_k = 1 + j\delta_k \qquad (4.29)$$

where $\delta_k$ can be chosen to be a power of the machine radix, so that
scaling by $a_k$ corresponds to a shift and add.

Now substituting (4.28), (4.29) into (4.27) yields the $\mathbf{X}_k$ update
of the CORDIC algorithm, i.e.

$$\mathbf{X}_{k+1} = \begin{bmatrix} 1 & -\delta_k \\ \delta_k & 1 \end{bmatrix} \mathbf{X}_k \qquad (4.30)$$

where

$$\delta_k = \tan \alpha_k$$

Taking the real part of (4.26) yields the auxiliary $z$ update, i.e.,

$$z_k = \mathrm{Re}(\vartheta_k)$$

so that

$$z_{k+1} = \mathrm{Re}(\vartheta_k) - \alpha_k = z_k - \alpha_k \qquad (4.31)$$

Equations (4.30) and (4.31) are the CORDIC algorithm in the circular case.
The connection between the CCM and circular CORDIC is completed
through the derivation of the spurious scale factor, $K_1$, mentioned in
Section 4.1 of this chapter. With initial values $y_0 = w$, $\vartheta_0 =
\vartheta$ and destination $\mathrm{Re}(\vartheta_k) \to 0$ (which
corresponds to the termination condition $x_k \to 0$ in the real valued
$w e^x$ algorithm), the destination value of $F(x, y)$ is:

$$F(\vartheta_\omega, y_\omega) = y_\omega e^{-\sum_k \ln|a_k|}
  = F(\vartheta_0, y_0) = w e^{j\vartheta}$$

hence

$$y_\omega = e^{\sum_k \ln|a_k|}\, w e^{j\vartheta}
  = \prod_{k=0}^{\omega-1} |a_k| \; w e^{j\vartheta}$$

Now since $w$ is a complex quantity, the resulting $y_\omega$ corresponds to
a rotation of the $w$ vector through an angle $\vartheta$, however with a
spurious scale change. The scale factor for the chosen sequence,
$\{a_k = 1 + j\delta_k\}$, becomes:

$$K = \prod_{k=0}^{\omega-1} |a_k| = \prod_{k=0}^{\omega-1} \sqrt{1 + \delta_k^2} = K_1 \qquad (4.32)$$

This is precisely the $K_1$ of Equation 4.5b. Furthermore, since
$a_k = 1 + j\delta_k = |a_k| e^{j\alpha_k}$, the argument of $a_k$ is

$$\alpha_k = \tan^{-1} \delta_k \qquad (4.33)$$

and again this is precisely the incremental rotation at each iteration of the
CORDIC algorithm.
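
The equivalence can be checked numerically. The Python sketch below runs the
complex valued $y$ recursion side by side with the real matrix form (4.30)
and accumulates the scale factor (4.32). The sign rule for $\delta_k$
(choose the sign that drives $z$ toward zero, with $|\delta_k| = 2^{-k}$) is
the conventional CORDIC rotation-mode choice and is an assumption here, as is
the function name `ccm_rotate`.

```python
import cmath
import math

def ccm_rotate(w, theta, n=32):
    """Rotate complex w through theta (|theta| below about 1.74 rad).

    Returns (y, K, (beta, zeta)): the complex CCM result, the accumulated
    scale factor, and the same vector computed with the real matrix form.
    """
    y, z, K = w, theta, 1.0
    beta, zeta = w.real, w.imag
    for k in range(n):
        delta = math.copysign(2.0 ** -k, z)   # assumed sign rule: drive z to 0
        a = complex(1.0, delta)               # a_k = 1 + j delta_k        (4.29)
        y *= a                                # y_{k+1} = y_k a_k          (4.27)
        beta, zeta = beta - delta * zeta, zeta + delta * beta   # matrix form (4.30)
        z -= math.atan(delta)                 # z_{k+1} = z_k - alpha_k    (4.31)
        K *= abs(a)                           # accumulates prod |a_k|     (4.32)
    return y, K, (beta, zeta)
```

Dividing the returned `y` by `K` removes the spurious scale change and
should approximate `w * cmath.exp(1j * theta)`.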
Remarks:

(1) Since these equations were derived based on the exponential
    algorithm of the convergence computation method, the destination
    $x_\omega \to 0$ (which becomes $\mathrm{Re}(\vartheta_k) \to 0$)
    naturally yields the rotation mode of the CORDIC method. The vectoring
    mode may be readily obtained by choosing $a_k$ such that
    $\mathbf{X}_k$ of (4.28) is driven to the abscissa.

(2) The invariant functional in this case is simply the original vector
    together with a complex phase factor which "reverses" the effect
    of each incremental rotation. The imaginary part of the angle
    (4.26) "reverses" the effect of magnitude scaling at each iteration,
    thus leaving the original vector invariant.
The hyperbolic and circular systems are intimately connected through
the angular conversion

$$\mu_k = j\alpha_k \qquad (4.34)$$

Making this substitution into (4.33) yields the identity:

$$\delta_k = \tan \alpha_k = -j \tanh \mu_k \qquad (4.35)$$

which reduces Equation 4.30 to:

$$\mathbf{X}_{k+1} = \begin{bmatrix} 1 & j\tanh \mu_k \\ -j\tanh \mu_k & 1 \end{bmatrix} \mathbf{X}_k \qquad (4.36)$$

However, $\delta_k = \tanh \mu_k$ in the hyperbolic system (the notation is
unfortunate since the same symbol, $\delta_k$, is used to represent
different quantities in Walther's original paper), so that, once the factor
$j$ is absorbed into the second component of $\mathbf{X}_k$, (4.36) becomes
the usual CORDIC iteration for $m = -1$:

$$\mathbf{X}_{k+1} = \begin{bmatrix} 1 & \delta_k \\ \delta_k & 1 \end{bmatrix} \mathbf{X}_k \qquad (4.37)$$
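Turning now to the derivation of $K_{-1}$, recall from Equation 4.32:

$$K = \prod_{k=0}^{\omega-1} \sqrt{1 + \delta_k^2}$$
$$\;\; = \prod_{k=0}^{\omega-1} \sqrt{1 + \tan^2 \alpha_k} \quad \text{by definition}$$
$$\;\; = \prod_{k=0}^{\omega-1} \frac{1}{\cos \alpha_k}$$
$$\;\; = \prod_{k=0}^{\omega-1} \frac{1}{\cosh \mu_k} \quad \text{when } \mu_k = j\alpha_k$$
$$\;\; = \prod_{k=0}^{\omega-1} \sqrt{1 - \tanh^2 \mu_k}$$
$$\;\; = \prod_{k=0}^{\omega-1} \sqrt{1 - \delta_k^2} = K_{-1} \qquad (4.38)$$

(recall that this latter $\delta_k$ is defined differently than for the
circular system).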
Summarizing, the convergence computation transformations for
$\vartheta_{k+1}$, $y_{k+1}$ are precisely the CORDIC equations under the
analogy that

$$y_k = \mathbf{X}_k$$
$$\mathrm{Re}(\vartheta_k) = z_k$$
$$\mathrm{Im}(\vartheta_\omega) = \ln K_1$$

The termination condition $\mathrm{Re}(\vartheta_k) \to 0$ therefore
corresponds to $z_k \to 0$, which is the rotation mode of the CORDIC. It was
shown that $y_\omega$ indeed corresponds to a vector rotation.

In some sense, Chen's method is more general than the CORDIC
technique since $\vartheta_k$ is an augmented version of $z_k$. It is
straightforward to derive the CORDIC equations from the convergence
computation transformations but not vice versa. Note that the CORDIC
algorithm yields some additional insight into its more general counterpart.

4.7 A GENERALIZED CONVERGENCE COMPUTATION METHOD
AND THE CORDIC CONNECTION

The previous section established the CORDIC algorithms as a special
case of a slightly generalized method of Chen, namely, the inclusion of
complex valued functions. However, that approach was cumbersome since it
required the inclusion of a complex angle, whose imaginary part
maintained the invariance of $F(\vartheta, y)$. The convergence computation
method will now be further generalized to vector valued functions, thus
circumventing the need for complex quantities. This approach will also
relax the invariance requirement on $F(\vartheta, y)$.

Suppose that it is desired to compute a vector valued function of a
scalar quantity (as per usual, boldface quantities are vectors):

$$\mathbf{z}_0 = \mathbf{f}(x)\big|_{x = x_0}$$
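Introduce a vector, $\mathbf{Y}$, and form the vector valued function
$\mathbf{F}(x, \mathbf{Y})$ such that:

(1) There exists a known initial value $\mathbf{Y} = \mathbf{Y}_0$
    satisfying $\mathbf{F}(x_0, \mathbf{Y}_0) = \mathbf{z}_0$.

(2) There exists a transformation
    $G: (x_k, \mathbf{Y}_k) \to (x_{k+1}, \mathbf{Y}_{k+1})$
    such that $\mathbf{F}(x_{k+1}, \mathbf{Y}_{k+1})$ is related in a known,
    invertible manner to $\mathbf{F}(x_k, \mathbf{Y}_k)$. For example
    (a) $\mathbf{F}(x_{k+1}, \mathbf{Y}_{k+1}) = \mathbf{F}(x_k, \mathbf{Y}_k)$
        $\forall\, k \ge 0$, i.e., $\mathbf{F}$ is invariant.
    (b) $\mathbf{F}(x_{k+1}, \mathbf{Y}_{k+1}) = c_k \mathbf{F}(x_k, \mathbf{Y}_k)$
        with known $c_k \ne 0$ $\forall\, k \ge 0$.

(3) There exists a known destination $x_\omega$ reached under $G$, such that
    $x \to x_\omega$ implies
    $\mathbf{Y} \to \mathbf{Y}_\omega = \mathbf{F}(x_\omega, \mathbf{Y}_\omega) = \mathbf{g}(\mathbf{z}_0)$
    with $\mathbf{g}(\cdot)$ a known function having a single valued
    inverse. For example
    (a) if $\mathbf{F}(x_{k+1}, \mathbf{Y}_{k+1}) = \mathbf{F}(x_k, \mathbf{Y}_k)$
        $\forall\, k \ge 0$, then $\mathbf{g}(\mathbf{z}_0) = \mathbf{z}_0$.
    (b) if $\mathbf{F}(x_{k+1}, \mathbf{Y}_{k+1}) = c_k \mathbf{F}(x_k, \mathbf{Y}_k)$,
        then $\mathbf{g}(\mathbf{z}_0) = \left( \prod_{k=0}^{\omega-1} c_k \right) \mathbf{z}_0$.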
Although the invariance restriction on $\mathbf{F}$ has been relaxed in a
very general manner, simple variations of $\mathbf{F}$, like the examples
above, will prove quite useful in practice. Simple variations will not impose
undue hardship on the implementation with respect to the inversion of
$\mathbf{g}$. In particular, it will now be shown that variation (b) in items
(2) and (3) allows an extremely simple connection to CORDICs, without the
need for complex valued angles.
Vector rotation is described by the function:

$$\mathbf{f}(x) = \mathbf{A}(x)\, \mathbf{\Theta}$$

where $\mathbf{\Theta}$ is a known vector corresponding to the initial vector
in the CORDIC algorithms and $\mathbf{A}$ is a generalized rotation matrix
that takes the form:

$$\mathbf{A}(x) = \begin{bmatrix} \mathrm{cs}\, x & -m\, \mathrm{si}\, x \\ \mathrm{si}\, x & \mathrm{cs}\, x \end{bmatrix}$$

in which:

    $m$ is the parameter of the coordinate system, defined in Section 4.1;
    $x$ is the generalized angle of the CORDIC algorithm;
    $\mathrm{cs}\, x$, $\mathrm{si}\, x$ are the cosine and sine of $x$ in
    the generalized coordinate system.
Define:

$$\mathbf{F}(x, \mathbf{Y}) = \mathbf{A}(x)\, \mathbf{Y}$$

so choosing $\mathbf{Y}_0 = \mathbf{\Theta}$ yields
$\mathbf{z}_0 = \mathbf{F}(x_0, \mathbf{Y}_0)$, thus satisfying condition (1).

Next, choose the transformation
$G: (x_k, \mathbf{Y}_k) \to (x_{k+1}, \mathbf{Y}_{k+1})$ as

$$x_{k+1} = x_k - \alpha_k$$
$$\mathbf{Y}_{k+1} = \begin{bmatrix} 1 & -m\delta_k \\ \delta_k & 1 \end{bmatrix} \mathbf{Y}_k$$

with $\alpha_k = \mathrm{tn}^{-1} \delta_k$, where $\mathrm{tn}^{-1}$ is a
generalized inverse tangent (defined for example in Equation 4.2 for the
CORDIC algorithms) and $\delta_k$ is a set of arbitrary constants, analogous
to the $a_k$ in Chen's method.
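It is now easy to verify that

$$\mathbf{F}(x_{k+1}, \mathbf{Y}_{k+1}) = c_k \mathbf{F}(x_k, \mathbf{Y}_k)$$

with

$$c_k = 1 / \mathrm{cs}\, \alpha_k$$

since

$$\mathbf{F}(x_{k+1}, \mathbf{Y}_{k+1}) = \mathbf{A}(x_{k+1})\, \mathbf{Y}_{k+1} = \mathbf{A}(x_k - \alpha_k)\, \mathbf{Y}_{k+1}$$
$$\;\; = \mathbf{A}(x_k - \alpha_k) \begin{bmatrix} 1 & -m\delta_k \\ \delta_k & 1 \end{bmatrix} \mathbf{Y}_k$$
$$\;\; = \frac{1}{\mathrm{cs}\, \alpha_k}\, \mathbf{A}(x_k - \alpha_k) \begin{bmatrix} \mathrm{cs}\, \alpha_k & -m\, \mathrm{si}\, \alpha_k \\ \mathrm{si}\, \alpha_k & \mathrm{cs}\, \alpha_k \end{bmatrix} \mathbf{Y}_k$$
$$\;\; = \frac{1}{\mathrm{cs}\, \alpha_k}\, \mathbf{A}(x_k)\, \mathbf{Y}_k \quad \text{by linearity of the argument of } \mathbf{A}(x)$$
$$\;\; = \frac{1}{\mathrm{cs}\, \alpha_k}\, \mathbf{F}(x_k, \mathbf{Y}_k)$$

thus satisfying condition (2b).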
Finally, condition (3b) is satisfied by choosing $x_\omega = 0$ so that:

$$\mathbf{F}(x_\omega, \mathbf{Y}_\omega) = \mathbf{I}\, \mathbf{Y}_\omega = \mathbf{Y}_\omega = \left( \prod_{k=0}^{\omega-1} c_k \right) \mathbf{z}_0$$

where $\mathbf{I}$ is the 2x2 identity matrix. (This final equation assumes
that $\mathrm{cs}\, x = 1$ and $\mathrm{si}\, x = 0$ when $x = 0$, in all the
coordinate systems under consideration. This is certainly the case for the
CORDIC algorithms.) The latter part of the equality is, of course, the
consequence of iterating the $\mathbf{Y}$ transformations.
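
Conditions (1), (2b) and (3b) can be checked numerically for the three
coordinate systems. The Python sketch below assumes the usual CORDIC
interpretations (cs and si are cos and sin for $m = 1$, 1 and $x$ for
$m = 0$, cosh and sinh for $m = -1$), takes $\delta_k = \pm 2^{-k}$ with the
sign that drives $x$ toward zero, and repeats iterations 4 and 13 in the
hyperbolic case as is conventional. These choices and all function names are
illustrative assumptions.

```python
import math

def cs(m, x):
    """Generalized cosine for m in {1, 0, -1}."""
    return math.cos(x) if m == 1 else math.cosh(x) if m == -1 else 1.0

def si(m, x):
    """Generalized sine for m in {1, 0, -1}."""
    return math.sin(x) if m == 1 else math.sinh(x) if m == -1 else x

def tn_inv(m, d):
    """Generalized inverse tangent."""
    return math.atan(d) if m == 1 else math.atanh(d) if m == -1 else d

def apply_A(m, x, Y):
    """Return A(x) Y for the generalized rotation matrix A."""
    c, s = cs(m, x), si(m, x)
    return (c * Y[0] - m * s * Y[1], s * Y[0] + c * Y[1])

def generalized_rotate(m, x0, theta_vec, n=40):
    """Drive x to 0 under G; return (Y_omega, product of the c_k)."""
    x, Y, c_prod = x0, theta_vec, 1.0
    ks = list(range(0 if m != -1 else 1, n))
    if m == -1:
        ks = sorted(ks + [4, 13])              # conventional repeated iterations
    for k in ks:
        d = math.copysign(2.0 ** -k, x)        # delta_k, signed to drive x to 0
        alpha = tn_inv(m, d)
        x -= alpha                             # x_{k+1} = x_k - alpha_k
        Y = (Y[0] - m * d * Y[1], d * Y[0] + Y[1])
        c_prod *= 1.0 / cs(m, alpha)           # c_k = 1 / cs(alpha_k)
    return Y, c_prod
```

For each $m$, the returned vector should agree with `c_prod` times
`apply_A(m, x0, theta_vec)`, i.e. with the product of the $c_k$ applied to
$\mathbf{z}_0$.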
Remark:

Notice that the transformation, $G$, is exactly the CORDIC equations and
in fact, all the CORDIC functions are computed through the appropriate
choice of the coordinate system, that is, the choice of $\alpha_k$,
$\delta_k$ and hence, $\mathrm{cs}\, x$ and $\mathrm{si}\, x$. The functions
corresponding to vectoring in the CORDIC algorithm are also readily obtained
by interchanging the significance of $x$ and $\mathbf{Y}$ in conditions
(1) - (3) and forcing $\mathbf{Y}$ to a destination value on the abscissa.
In this connection, the true generality of this method is apparent since no
distinction is made between $x$ and $\mathbf{Y}$. Rather than starting with
a function, $\mathbf{f}(x)$, of a single variable, simply begin with a
function, $\mathbf{F}(x, \mathbf{Y})$, of two independent variables, either
of which may be controlled. (This is precisely what the CORDIC equations do,
the two independent variables being the vector, $\mathbf{X}$, or the angle
$\vartheta$.)
Summarizing, the foregoing generalization provides a convergence
method which:

(1) reduces to Chen's method in the scalar case, when $F(x, y)$ is
    invariant.
(2) provides additional useful functions, including the CORDIC
    functions, within a single computational framework.

To the author's knowledge, the generalized rotations and vectoring have
never been computed with Chen's method before.

4.7.1 Examples of the Generalized Technique

The CORDIC equations have already been shown to arise from the
generalized convergence computation method when the invariance
restriction on $\mathbf{F}$ is relaxed. Further examples will now be given,
in which $\mathbf{F}$ is truly invariant. In the spirit of the comment ending
the previous section, the function $\mathbf{f}$ will be ignored; rather, a
function $\mathbf{F}(x, \mathbf{Y})$ will be defined from the outset and
either $x$ or $\mathbf{Y}$ may be driven to a destination value.
Notice that the manner in which the method was defined in Section 4.7 does
not restrict $\mathbf{F}$ from being scalar or matrix valued, and in fact,
both of these cases are considered in the following examples:

Example 4.5: The CORDIC algorithms - these were developed in detail in the
previous section.
Example 4.6: Frequently, it is necessary to divide a quantity, $x$, by the
product of two other quantities $\xi$ and $\eta$. Rather than first forming
the product $\xi\eta$ and then using Chen's method to obtain
$x / (\xi\eta)$, it is preferable to utilize the generalized CCM to obtain
the result from $x$, $\xi$, $\eta$ directly.

Define

$$\mathbf{Y} = (\xi, \eta) \qquad (4.39)$$

and

$$F(x_k, \mathbf{Y}_k) = \frac{x_k}{\xi_k \eta_k} \qquad (4.40)$$

(Notice that $\xi\eta$ can be written in terms of $\pi_1 \mathbf{Y}$ and
$\pi_2 \mathbf{Y}$, where $\pi_1$ and $\pi_2$ are first and second component
extraction matrices, respectively. This more cumbersome notation is ignored
here.)

The result $F(x_0, \mathbf{Y}_0) = \dfrac{x_0}{\xi_0 \eta_0}$ is desired.

Choose the transformations:

$$x_{k+1} = x_k a_k b_k \qquad (4.41a)$$
$$\xi_{k+1} = \xi_k a_k \qquad (4.41b)$$
$$\eta_{k+1} = \eta_k b_k \qquad (4.41c)$$

under which $F$ is clearly invariant.

Now, with destination value $\mathbf{Y}_\omega = (1, 1)$ it is clear from
(4.41b) and (4.41c) that

$$\prod_{k=0}^{\omega-1} a_k = \frac{1}{\xi_0}, \qquad \prod_{k=0}^{\omega-1} b_k = \frac{1}{\eta_0}$$

so

$$x_\omega = x_0 \prod_{k=0}^{\omega-1} a_k \prod_{k=0}^{\omega-1} b_k = \frac{x_0}{\xi_0 \eta_0}$$

is the desired result.

The sequences $\{a_k\}$ and $\{b_k\}$ are both chosen to have the form
$1 + 2^{-m}$ so that the transformations may be implemented with simply
shifts and adds.
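
A minimal Python sketch of this example follows. It assumes that $\xi$ and
$\eta$ both lie in $[1/2, 1)$ and that each factor $1 + 2^{-m}$ is tried once
per variable in order of increasing $m$ (a factor of 1 meaning the step is
skipped); this greedy selection and the function name `divide_by_product`
are illustrative assumptions.

```python
def divide_by_product(x, xi, eta, n_bits=40):
    """Approximate x / (xi * eta) for 1/2 <= xi, eta < 1 without forming xi * eta."""
    for m in range(1, n_bits + 1):
        f = 1.0 + 2.0 ** -m
        a = f if xi * f <= 1.0 else 1.0       # a_k drives xi toward 1    (4.41b)
        b = f if eta * f <= 1.0 else 1.0      # b_k drives eta toward 1   (4.41c)
        # (4.41a): x picks up both factors, so x / (xi * eta) is unchanged
        xi, eta, x = xi * a, eta * b, x * a * b
    return x
```

When both components of $\mathbf{Y}$ have reached (1, 1) to working
precision, the value left in `x` is the quotient.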
Example 4.7: The generalized CCM can be used to compute
$x_0 + \ln \xi_0 \eta_0$ and $x_0 + \ln (\xi_0 / \eta_0)$ without explicitly
computing $\xi_0 \eta_0$ or $\xi_0 / \eta_0$ (or alternatively, without
writing $\ln \xi_0 \eta_0 = \ln \xi_0 + \ln \eta_0$ and then applying
Chen's method twice to obtain two separate logarithms). Consider
evaluating

$$x_0 + \ln \xi_0 \eta_0$$

by defining the function

$$F(x_k, \mathbf{Y}_k) = x_k + \ln \xi_k \eta_k$$

with $\mathbf{Y}$ as given by (4.39). Then the transformations

$$\xi_{k+1} = \xi_k a_k \qquad (4.42a)$$
$$\eta_{k+1} = \eta_k b_k \qquad (4.42b)$$
$$x_{k+1} = x_k - \ln a_k - \ln b_k \qquad (4.42c)$$

leave $F$ invariant.

Again choosing $\mathbf{Y}_\omega = (1, 1)$ yields

$$\prod_{k=0}^{\omega-1} a_k = \frac{1}{\xi_0}, \qquad \prod_{k=0}^{\omega-1} b_k = \frac{1}{\eta_0}$$

so

$$x_\omega = x_0 - \sum_{k=0}^{\omega-1} \ln a_k - \sum_{k=0}^{\omega-1} \ln b_k = x_0 + \ln \xi_0 \eta_0$$

As with the CCM, $a_k$ and $b_k$ are chosen to be of the form $1 + 2^{-m}$
and a small table of $\ln(1 + 2^{-m})$ is maintained.
Example 4.8: Define

$$F(x_k, \mathbf{Y}_k) = x_k + \ln (\xi_k / \eta_k)$$

The $F$-invariant transformations are

$$\xi_{k+1} = \xi_k a_k$$
$$x_{k+1} = x_k - \ln a_k$$
$$\eta_{k+1} = \eta_k = \eta_0$$

Choosing the destination $\mathbf{Y}_\omega = (\eta_0, \eta_0)$ yields

$$\prod_{k=0}^{\omega-1} a_k = \eta_0 / \xi_0$$

and

$$x_\omega = x_0 + \ln (\xi_0 / \eta_0)$$

Notice that choosing $\mathbf{Y}_\omega = (\eta_0, \eta_0)$ rather than
$\mathbf{Y}_\omega = (1, 1)$ as in Example 4.7 allows for simpler
transformation rules. However, the penalty incurred for this simplification
is that $\{a_k\}$ must be chosen such that $\eta_0$ is a reachable value of
$\xi_k$, which implies less freedom in the choice of that sequence.
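
The reachability constraint can be made concrete with a small Python sketch.
It assumes $\eta_0/2 \le \xi_0 \le \eta_0$, so that factors of the form
$1 + 2^{-m}$, each tried once in order of increasing $m$, can carry $\xi$ up
to $\eta_0$; that restriction, the greedy selection and the name
`log_ratio` are assumptions made for illustration.

```python
import math

def log_ratio(x, xi, eta, n_bits=40):
    """Approximate x + ln(xi / eta), assuming eta / 2 <= xi <= eta."""
    for m in range(1, n_bits + 1):
        a = 1.0 + 2.0 ** -m
        if xi * a <= eta:          # keep xi from overshooting the destination eta_0
            xi *= a                # xi_{k+1} = xi_k a_k
            x -= math.log(a)       # x_{k+1} = x_k - ln a_k (a table lookup in hardware)
    return x
```

Only $\xi$ and $x$ are updated; $\eta$ is held at $\eta_0$ and serves purely
as the comparison target.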
Example 4.9: Matrix Inversion

Obtaining the inverse of nonsingular matrices fits the generalized CCM
structure very well. In this case, all of $x$, $Y$ and $F$ are matrix valued
and will be denoted $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{F}$ respectively.
Consider the general function

$$\mathbf{F}(\mathbf{X}_k, \mathbf{Y}_k) = \mathbf{Y}_k^{-1} \mathbf{X}_k$$

Choose a set of known, elementary eliminator matrices, $\{\mathbf{C}_k\}$,
for the transformations:

$$\mathbf{X}_{k+1} = \mathbf{C}_k \mathbf{X}_k \qquad (4.43a)$$
$$\mathbf{Y}_{k+1} = \mathbf{C}_k \mathbf{Y}_k \qquad (4.43b)$$

It is readily seen that $\mathbf{F}$ is invariant under (4.43) because

$$\mathbf{F}(\mathbf{X}_{k+1}, \mathbf{Y}_{k+1}) = \mathbf{Y}_{k+1}^{-1} \mathbf{X}_{k+1}
  = \mathbf{Y}_k^{-1} \mathbf{C}_k^{-1} \mathbf{C}_k \mathbf{X}_k
  = \mathbf{F}(\mathbf{X}_k, \mathbf{Y}_k)$$

Now choosing the destination $\mathbf{Y}_\omega = \mathbf{I}$ it is clear
that

$$\prod_{k=0}^{\omega-1}{}^{L}\; \mathbf{C}_k = \mathbf{Y}_0^{-1}$$

where $\prod^{L}$ denotes left matrix product. Then

$$\mathbf{X}_\omega = \prod_{k=0}^{\omega-1}{}^{L}\; \mathbf{C}_k \mathbf{X}_0 = \mathbf{Y}_0^{-1} \mathbf{X}_0$$

If $\mathbf{X}_0 = \mathbf{I}$ then $\mathbf{X}_\omega = \mathbf{Y}_0^{-1}$.

Remark: This may be viewed as the matrix counterpart of
convergence division.

The choice of $\{\mathbf{C}_k\}$ is not a trivial matter since the matrices
must be of simple form in order to make this scheme practical. A simple 2x2
matrix example provides some guidelines for an obvious choice of
$\mathbf{C}_k$ (but probably not the best).
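Let

$$\mathbf{Y}_0 = \begin{bmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{bmatrix}$$

and let $\mathbf{C}_k = \mathbf{I} + \hat{\mathbf{C}}_k$ with

$$\hat{\mathbf{C}}_k = \begin{bmatrix} 0 & 0 \\ 0 & 2^{-m} \end{bmatrix}$$

Apply a sequence of such $\mathbf{C}_k$ transformations, say $n$ in number,
to obtain

$$\mathbf{Y}_n = \begin{bmatrix} y_{11} & y_{12} \\ y'_{21} & y'_{22} \end{bmatrix}$$

Next apply a further elementary eliminator $\mathbf{C}_n$ that annihilates
the (2,1) element. Now choose a sequence
$\{\mathbf{C}_k = \mathbf{I} + \hat{\mathbf{C}}_k\}$, $k \ge n+1$, which will
yield an upper triangular matrix

$$\begin{bmatrix} y_{11} & y_{12} \\ 0 & y''_{22} \end{bmatrix}$$

and apply

$$\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}$$

to diagonalize $\mathbf{Y}_\omega$.

All of these transformations are easily implemented with shifts and
additions alone, and result in $\mathbf{X}_\omega$ which is closely related
to $\mathbf{Y}_0^{-1}$ by known constants.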
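
The invariance of $\mathbf{Y}^{-1}\mathbf{X}$ under a common left
multiplication can be illustrated with ordinary elementary row operations.
The Python sketch below is not the shift-and-add scheme of the 2x2 example:
it uses exact scaling and elimination multipliers (a Gauss-Jordan style
sequence of eliminators), assumes every pivot encountered is nonzero, and the
function name `invert_by_elimination` is hypothetical.

```python
def invert_by_elimination(Y0):
    """Drive Y to I with elementary eliminators applied to X and Y alike.

    Starting from X_0 = I, the invariant Y^{-1} X means X ends as Y0^{-1}.
    Assumes nonzero pivots (no row interchanges are performed).
    """
    n = len(Y0)
    Y = [row[:] for row in Y0]
    X = [[float(i == j) for j in range(n)] for i in range(n)]   # X_0 = I
    for p in range(n):
        scale = 1.0 / Y[p][p]                 # eliminator: scale row p to a unit pivot
        Y[p] = [v * scale for v in Y[p]]
        X[p] = [v * scale for v in X[p]]
        for r in range(n):
            if r != p:
                f = Y[r][p]                   # eliminator: row r minus f times row p
                Y[r] = [a - f * b for a, b in zip(Y[r], Y[p])]
                X[r] = [a - f * b for a, b in zip(X[r], X[p])]
    return X
```

Every operation on `Y` is repeated on `X`, which is exactly the pairing of
(4.43a) and (4.43b).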
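Example 4.10: Solution of a Linear System of Equations

Consider the system $\mathbf{A}_0 \mathbf{x} = \mathbf{b}_0$ to be solved
for $\mathbf{x}$. Define

$$\mathbf{F}(\mathbf{A}_k, \mathbf{b}_k) = \mathbf{A}_k \mathbf{x} - \mathbf{b}_k = \mathbf{0}$$

and the transformations

$$\mathbf{A}_{k+1} = \mathbf{C}_k \mathbf{A}_k \qquad (4.44a)$$
$$\mathbf{b}_{k+1} = \mathbf{C}_k \mathbf{b}_k \qquad (4.44b)$$

with $\mathbf{C}_k$ as defined in the previous example. Then $\mathbf{F}$ is
invariant since:

$$\mathbf{F}(\mathbf{A}_{k+1}, \mathbf{b}_{k+1}) = \mathbf{C}_k (\mathbf{A}_k \mathbf{x} - \mathbf{b}_k) = \mathbf{0} = \mathbf{F}(\mathbf{A}_k, \mathbf{b}_k)$$

Next choose the destination $\mathbf{A}_\omega = \mathbf{I}$, yielding (from
4.44a)

$$\prod_{k=0}^{\omega-1}{}^{L}\; \mathbf{C}_k = \mathbf{A}_0^{-1}$$

Then (4.44b) becomes

$$\mathbf{b}_\omega = \prod_{k=0}^{\omega-1}{}^{L}\; \mathbf{C}_k \mathbf{b}_0 = \mathbf{A}_0^{-1} \mathbf{b}_0 = \mathbf{x}$$

Therefore $\mathbf{b}_\omega$ converges to the solution vector $\mathbf{x}$.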
Remark: It is also possible to have $\mathbf{A}$, $\mathbf{b}$ converge to
a value from which $\mathbf{x}$ is readily attainable. A simple example of
this is when $\mathbf{x}$ can be obtained by back-substitution. In this case,
the destination value of $\mathbf{A}$ is $\mathbf{A}_\omega = \mathbf{U}$,
where $\mathbf{U}$ is any upper triangular form. Thus from (4.44a)

$$\mathbf{U} = \prod_{k=0}^{\omega-1}{}^{L}\; \mathbf{C}_k \mathbf{A}_0$$

and (4.44b) becomes

$$\mathbf{b}_\omega = \mathbf{U} \mathbf{A}_0^{-1} \mathbf{b}_0 = \mathbf{L} \mathbf{b}_0$$

where $\mathbf{L}$ is a lower triangular form. By virtue of the invariance
of $\mathbf{F}$,

$$\mathbf{F}_\omega = \mathbf{0} = \mathbf{U}\mathbf{x} - \mathbf{L}\mathbf{b}_0 = \mathbf{U}\mathbf{x} - \mathbf{b}_\omega$$

which can be solved by back-substitution. In this case, the chosen sequence
of eliminator matrices provides a lower-upper triangular decomposition of
$\mathbf{A}$. An orthogonal decomposition may be realized by choosing
orthogonal eliminator matrices, in particular the CORDIC matrices.
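
The back-substitution variant of the remark can be sketched in Python as
follows. The eliminators here are ordinary row-elimination steps with exact
multipliers, not the shift-and-add matrices of Example 4.9, and nonzero
pivots are assumed; the function name `solve_by_elimination` is hypothetical.

```python
def solve_by_elimination(A0, b0):
    """Solve A0 x = b0 by driving A to upper triangular form U.

    The same eliminators act on A and b, so U x = b_omega holds at the
    destination and x follows by back-substitution. Assumes nonzero pivots.
    """
    n = len(A0)
    A = [row[:] for row in A0]
    b = b0[:]
    for p in range(n):                           # A_{k+1} = C_k A_k, b_{k+1} = C_k b_k
        for r in range(p + 1, n):
            f = A[r][p] / A[p][p]
            A[r] = [a - f * c for a, c in zip(A[r], A[p])]
            b[r] -= f * b[p]
    x = [0.0] * n
    for i in reversed(range(n)):                 # back-substitution on U x = b_omega
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x
```

Replacing the row eliminations by plane rotations would give the orthogonal
decomposition mentioned at the end of the remark.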
There are numerous other functions of multiple arguments which can
be computed using the generalized CCM structure. The key to the practical
utility of the method is, of course, the existence of convenient
transformations which allow $\mathbf{F}$ to vary in a known, invertible
manner. However, the theoretical importance of the generalized CCM lies in
providing a unified structure under which the computation of many seemingly
unrelated functions can be studied.
This chapter has been somewhat lengthy and merits a summary of the
important developments. The motivation for studying numerical algorithms
arose from the need to calculate a variety of elementary functions
appearing in the signal processing algorithms of the previous chapters, for
which the CORDIC and Convergence Computation Methods (CCM) appeared to
be promising techniques. It was noted that the convergence properties of
the CORDIC algorithms were not adequate owing to limited regions of
convergence and the existence of spurious scale factors. Circumventing
these problems with ideas appearing in the literature [Wa71] [HT80]
incurred large amounts of hardware and speed overhead, in fact as large as
a 120% speed penalty. However, it was shown that through hardware
sharing, the scaling technique of Haviland et al. [HT80] could be realized in
hardware with minimal speed penalty and only a modest increase in
circuitry.
The most attractive technique for both scaling and increasing the
region of convergence was based on a new method which utilized only the
regular CORDIC iterations. It was shown that the convergence properties
could be controlled through the appropriate selection of the sequence,
$\{F_i\}$, and a method was given to generate good sequences which were
guaranteed to satisfy the convergence criterion of the CORDIC algorithms.
The effects of this new method on angular resolution were also studied.
CORDIC algorithms tend to be relatively slow due to their iterative
nature; in fact, they generate one bit equivalent precision per iteration. A
new scheme known as the Hybrid CORDIC method was developed which
combined the advantages of table lookup and the CORDIC algorithms. This
scheme showed improved speed performance with only modest amounts of
storage. When compared with a completely table lookup approach, the
hybrid CORDIC method realizes an exponential reduction in storage with
only a linear increase in execution time. A Taylor series approximation
approach to hybrid CORDICs provided equivalent speed performance without
any additional storage.
Floating point CORDIC (FLORDIC) algorithms were developed, based
entirely on floating point operations. These were conceptually simpler to
implement than their fixed point counterparts because the need for a large
shifter was eliminated. It was also shown that floating point calculations
could be performed with fixed point CORDIC algorithms. Floating point
representations provide guarantees on the convergence properties of the
algorithms owing to the limited dynamic range of the mantissa.
The Convergence Computation Method was shown to be intimately
related to the CORDIC algorithms. By generalizing the CCM to vector valued
functionals and relaxing the invariance constraint, a very simple derivation
of the CORDIC equations was obtained. Hence, a unified set of equations
(and therefore, also a single computation structure) provided many
functions of interest. New functions including matrix operations can be
computed with the generalized CCM. Finally, the CCM was also seen to be
conducive to a hybrid architecture.

APPENDIX

It is necessary to prove that the auxiliary equation of the CORDIC
algorithm, when defined as:

$$z_{i+1} = z_i - \varepsilon \mu_i \alpha_i$$

provides the functions of Figure 4.2 when $\varepsilon = 1$, while
$\varepsilon = -1$ results in the reversed sign functions of Figure 4.3.

When $\varepsilon = 1$, the normal auxiliary equation as defined in [Vo59]
is obtained, so the functions of Figure 4.2 are generated. Now, let the
initial value of $z$ be denoted $z_0$. When $\varepsilon = -1$,
$z_n \to 0$ implies:

$$\alpha = \sum_{i=0}^{n} \mu_i \alpha_i = -z_0 \qquad (A.1)$$

Walther [Wa71] showed that the solution to the CORDIC difference equations
(i.e., Equation 4.3) is:

$$x_n = K \left[ x_0 \cos(\sqrt{m}\,\alpha) - y_0 \sqrt{m}\, \sin(\sqrt{m}\,\alpha) \right]$$
$$y_n = K \left[ y_0 \cos(\sqrt{m}\,\alpha) + \frac{x_0}{\sqrt{m}} \sin(\sqrt{m}\,\alpha) \right]$$

where $x_0$ and $y_0$ are the initial values of $x$ and $y$ respectively.
Substituting (A.1) yields:

$$x_n = K \left[ x_0 \cos(-\sqrt{m}\, z_0) - y_0 \sqrt{m}\, \sin(-\sqrt{m}\, z_0) \right]
      = K \left[ x_0 \cos(\sqrt{m}\, z_0) + y_0 \sqrt{m}\, \sin(\sqrt{m}\, z_0) \right]$$
$$y_n = K \left[ y_0 \cos(-\sqrt{m}\, z_0) + \frac{x_0}{\sqrt{m}} \sin(-\sqrt{m}\, z_0) \right]
      = K \left[ y_0 \cos(\sqrt{m}\, z_0) - \frac{x_0}{\sqrt{m}} \sin(\sqrt{m}\, z_0) \right]$$

Substituting in the various values of $m$ yields the functions of Figure 4.3.

BIBLIOGRAPHY

[Ah82]    H. Ahmed, "Numerical Techniques for the ESL Boundary Cell,"
          internal report, ESL Inc., San Jose, CA, 1982.

[AMI79]   American Microsystems Inc., Signal Processing Peripheral
          Reference Manual, 1979.

[AMLA81]  H.M. Ahmed, M. Morf, D.T.L. Lee and P.H. Ang, "A VLSI Speech
          Analysis Chip Set Based on Square-Root Normalized Ladder
          Forms," Proc. 1981 ICASSP, Atlanta, GA, Mar.-Apr. 1981,
          pp. 648-653.

[Bell81]  Bell System Technical Journal, Vol. 60, No. 7, part 2,
          September 1981 (entire issue).

[CET62]   D. Cantor, G. Estrin, R. Turn, "Logarithmic and Exponential
          Function Evaluation in a Variable Structure Digital Computer,"
          IRE Trans. on Electronic Computers, Vol. EC-14, 1965, pp. 85-86.

[Ch71]    T.C. Chen, "Automatic Computation of Exponentials, Logarithms,
          Ratios and Square Roots," IBM Journal of Research and
          Development, July 1972, pp. 380-388.

[DeL70]   B. de Lugish, "A Class of Algorithms for Automatic Evaluation of
          Certain Elementary Functions in a Binary Computer," Technical
          Report No. 399, University of Illinois, Dept. of Computer
          Science, June 1970.

[Fa81]    M. Farmwald, "On the Design of High Performance Digital
          Arithmetic Units," Ph.D. Dissertation, Stanford University,
          Dept. of Computer Science, 1981.

[Go64]    R. Goldschmidt, "Applications of Division by Convergence," M.S.
          Dissertation, Massachusetts Institute of Technology, Dept. of
          Electrical Engineering, June 1964.

[HT80]    G. Haviland, A. Tuzynski, "A CORDIC Arithmetic Processor Chip,"
          IEEE Trans. on Computers, Vol. C-29, No. 2, February 1980.

[Hw79]    K. Hwang, Computer Arithmetic, Principles, Architecture and
          Design, J. Wiley, 1979.

[KNSYM80] Y. Kawakami, T. Nishitani, E. Sugimoto, E. Yamauchi, M. Suzuki,
          "A Single-Chip Digital Signal Processor for Voiceband
          Applications," Proc. of Int'l. Solid State Circuits Conference,
          San Francisco, CA, 1980.

[Me62]    J. Meggitt, "Pseudo Division and Pseudo Multiplication
          Processes," IBM Journal of Research and Development, Vol. 6,
          1962, pp. 210-226.

[SK71]    B. Sarkar, E. Krishnamurty, "Economic Pseudo-division Processes
          for Obtaining Square Roots, Logarithm and Arctan," IEEE
          Transactions on Computers, Vol. C-20, Dec. 1971, pp. 1589-1593.

[Sp65]    W. Specker, "A Class of Algorithms for ln x, exp x, sin x, cos x,
          tan^{-1} x and cot^{-1} x," IRE Transactions on Electronic
          Computers, Vol. EC-14, 1965, pp. 85-86.

[Vo59]    J.E. Volder, "The CORDIC Trigonometric Computing Technique,"
          IRE Trans. on Electronic Computers, Vol. EC-8, No. 3,
          pp. 330-334, Sept. 1959.

[Wa71]    J.S. Walther, "A Unified Algorithm for Elementary Functions,"
          Proc. of the 1971 Spring Joint Computer Conference,
          pp. 379-385.
- 144 -
CHAPTER FIVE
PARALLEL PROCESSORS FOR LINEAR ALGEBRA
Linear algebra operations form a major component of many different complex signal processing tasks. However, matrix algebra algorithms are themselves frequently quite complex, requiring, for instance, a number of operations which is polynomial in the matrix order. As a result, some authors have been prompted to construct large arrays or meshes of processing elements (see e.g. [SK75] [Ch75] [KL80]). However, all of these efforts have been based on the use of fast multipliers as the central element of each processor in the mesh. Once again, generalized rotations, in particular the CORDIC operations, are fundamental to a large number of algorithms which are commonly used to perform matrix operations like factorization and eigenvalue decomposition. The prime interest of this chapter will be the synthesis of large arrays of processors which are capable of exploiting the inherent parallelism of a number of matrix algebra algorithms of interest. Fortuitously, a few simple structures will be sufficiently general purpose to implement all the algorithms considered. Individual processors of the array will be based on CORDIC operations. By matching the operations and the mesh connections closely to the algorithms, more efficient structures will be realized than have been reported in the literature to date.
VLSI structures must be regular in nature in order to manage the large design complexity afforded by the technology. This is particularly true of interconnections, which not only consume the majority of chip area but also reduce operational speed due to their capacitive loading. Therefore, all array architectures to be derived here will be restricted from the outset to have regular structures with local communications paths, where processing elements in the array may only communicate with nearest neighbours. This results naturally in pipelined implementations.
This chapter will consider the solution of a system of linear equations, Cholesky factorization and eigenvalue decomposition [SB80], which are the most common matrix algebra problems arising in signal processing and statistics, among other areas. A formal method for constructing and analyzing large arrays, so as to guarantee some general applicability of the structure, will be given.
5.1 CHOLESKY FACTORIZATION
A major problem of interest in linear least-squares estimation is to obtain the Cholesky factors of an $n \times n$ Toeplitz matrix, $T$, which is frequently the covariance matrix of a stationary stochastic process.
The Cholesky decomposition of $T$ as $LL^T$ with $L$ lower triangular can be obtained in a computationally efficient manner using the Fast Cholesky algorithms developed by [Mo74], [MLNV77], [LeRG77]. In designing VLSI structures to realize these algorithms, it is important to utilize the theory in a manner which is conducive to implementation. In particular, recursive fast Cholesky algorithms exist for generating $L$ either by columns or by rows. However, the former algorithm requires access to all $n$ entries (the entire first column) defining $T$, while the latter uses the elements sequentially, which, in a real time application, could correspond to using the covariance information in sequence as it 'arrives'. Hence, the row recursions are more suitable for implementation from a data access viewpoint. Alternately, the recursions by columns induce a ladder form structure [LeRG77], [DM80], thus making this algorithm attractive because ladder forms exhibit natural pipelining. No such ladder recursion for the rows appears to exist.
In this section, a ladder form for the recursion by rows is derived which naturally suggests a pipelined architecture based on elementary rotations, allowing the use of a linear array of CORDIC processors. This is an example of using the theory in an appropriate framework for implementation, since now both a simple data access scheme and a pipeline architecture are defined by the same set of recursive equations.
5.1.1 Fast Cholesky by Rows in Ladder Form
It is easiest to derive the ladder form for the row recursions by examining the fast Cholesky algorithm by columns [Mo74]. Let:
$$T = [\, t_1 \mid t_2 \mid \cdots \mid t_n \,] \tag{5.1}$$
$$Z = \begin{bmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} = \text{column shift matrix} \tag{5.2}$$
$$e_1 = [\, 1 \;\; 0 \;\; \cdots \;\; 0 \,]^T \tag{5.2b}$$
$$t_{k,l} = l\text{th element of column } t_k$$
Then the column recursions in normalized form are:
Initialization:
$$[\, c_1 \mid c_2 \,]^0 = [\, t_1 \mid t_1 - t_{11} e_1 \,] / \sqrt{t_{11}} \tag{5.3}$$
Recursion:
$$[\, c_1 \mid c_2 \,]^{k+1} = [\, Z c_1^k \mid c_2^k \,]\, \Theta_k \tag{5.4}$$
where
$$\Theta_k = \begin{bmatrix} \cosh\theta_k & \sinh\theta_k \\ \sinh\theta_k & \cosh\theta_k \end{bmatrix}$$
is J-orthogonal with
$$\theta_k = -\tanh^{-1}\!\left( \frac{c^k_{2,k+2}}{c^k_{1,k+1}} \right).$$
Then the desired Cholesky factor, $L$, is given in column partitioned form by:
$$L = [\, c_1^0 \mid c_1^1 \mid \cdots \mid c_1^{n-1} \,].$$
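The column recursion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the function name `fast_cholesky_by_columns` is hypothetical, the input is assumed to be the first column of a symmetric positive definite Toeplitz matrix, and the rotation angle is taken as $\tanh\theta_k = -c^k_{2,k+2}/c^k_{1,k+1}$, the value that annihilates $c^k_{2,k+2}$.

```python
import math


def fast_cholesky_by_columns(t):
    """Cholesky factor L (list of rows) of the Toeplitz matrix with first column t.

    Assumes the Toeplitz matrix is symmetric positive definite, so |tanh(theta_k)| < 1.
    """
    n = len(t)
    scale = math.sqrt(t[0])
    c1 = [v / scale for v in t]          # c_1^0 = t_1 / sqrt(t_11)
    c2 = [0.0] + c1[1:]                  # c_2^0 = (t_1 - t_11 e_1) / sqrt(t_11)
    columns = [c1]
    for k in range(n - 1):
        zc1 = [0.0] + c1[:-1]            # Z c_1^k: shift down by one position
        rho = -c2[k + 1] / zc1[k + 1]    # tanh(theta_k); annihilates element k+2 of c_2
        ch = 1.0 / math.sqrt(1.0 - rho * rho)   # cosh(theta_k)
        sh = rho * ch                            # sinh(theta_k)
        c1, c2 = ([ch * a + sh * b for a, b in zip(zc1, c2)],
                  [sh * a + ch * b for a, b in zip(zc1, c2)])
        columns.append(c1)               # c_1^{k+1} is the next column of L
    # Transpose the column list so that L[i][j] is row i, column j.
    return [[columns[j][i] for j in range(n)] for i in range(n)]
```

A quick check is to form $LL^T$ and compare it with the Toeplitz matrix built from `t`; each pass of the loop costs one hyperbolic rotation per row, which is the operation a CORDIC processor would supply.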
Notice that the recursion only requires knowledge of the first column of the Toeplitz matrix, $T$. It can be shown that (5.4) is the Chandrasekhar equation for a moving average process, i.e. where $T$ is a banded matrix [Mo74], [MKD74]. In this algorithm, each column of $L$ is obtained by performing a J-orthogonal transformation (or a CORDIC hyperbolic rotation) on the row vectors of $[\, Z c_1^k \mid c_2^k \,]$ to annihilate $c^k_{2,k+2}$. However, this recursion on the columns of $L$ naturally induces a normalized recursion for its rows as shown in Figure 5.1. The arrows indicate which rotation angle is applied in going from one column to another or from one row to the next. Unlike the column algorithm, which uses a single rotation matrix for each recursion, the latter algorithm requires several distinct rotations to compute a single row. This fact, together with the shift property (indicated by the arrows in Figure 5.1), suggests a ladder form realization using J-orthogonal sections which may be written as:
Initialization:
Ci = 0 , V k
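$$\eta_{0,\ell} = \nu_{0,\ell} = t_{1,\ell+1} / \sqrt{t_{11}}, \qquad \ell = 0, 1, 2, \ldots, n-1$$
Recursion:
for $k = 0$ to $\ell - 1$ begin:
$$[\, \eta_{k+1,\ell} \;\; \nu_{k+1,\ell} \,] = [\, \eta_{k,\ell-1} \;\; \nu_{k,\ell} \,]\, \Theta_k$$
end;
with $\Theta_k$ as defined earlier and $\theta_k = -\tanh^{-1}\!\left( \nu_{k,k+1} / \eta_{k,k} \right)$.
Then for $L = [\, l_0 \mid l_1 \mid \cdots \mid l_{n-1} \,]^T$, the rows are defined by:
$$l_\ell^T = [\, \eta_{0,\ell} \;\; \eta_{1,\ell} \;\; \cdots \;\; \eta_{n-1,\ell} \,].$$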
In this algorithm, the recursion on $k$ is the order update of the ladder, while iterating on $\ell$ corresponds to the time update. The temporal sequence here is the sequential access of the covariance information (the entries of $t_1$), which may well become available as a time series in a real application.
The ladder form consisting of J-orthogonal sections is shown in Figure 5.2 and it may be readily implemented using the linear pipelined array of Figure 5.3. Data enters the pipe only via the leftmost processor, and the $k$th processor produces zero outputs until $\ell = k+1$, when it calculates and stores $\theta_k$ and produces $\eta_{k+1,k+1}$ and $\nu_{k+1,k+1}$. Thereafter, all entering data is rotated through $\theta_k$. Notice that $\theta_k$ is only calculated once.
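The behaviour of the pipelined ladder can be mimicked sequentially. The sketch below is an illustration under assumptions rather than the author's design: the name `fast_cholesky_by_rows` and the per-section state (`stored_eta`, `rho`) are hypothetical, section $k$ is assumed to hold the previous entry $\eta_{k,\ell-1}$ and a rotation parameter $\tanh\theta_k$ fixed on first use, and the covariance entries are consumed one at a time in the order they would arrive.

```python
import math


def fast_cholesky_by_rows(t):
    """Rows of the Cholesky factor of the Toeplitz matrix with first column t.

    Entries of t are consumed one per time step l, as in a real time application.
    Assumes the Toeplitz matrix is symmetric positive definite.
    """
    n = len(t)
    scale = math.sqrt(t[0])
    stored_eta = [0.0] * n      # section k remembers eta_{k, l-1}
    rho = [None] * n            # tanh(theta_k), computed once per section
    rows = []
    for l in range(n):          # time update: one covariance entry per step
        eta = nu = t[l] / scale             # eta_{0,l} = nu_{0,l}
        row = [0.0] * n
        row[0] = eta
        for k in range(l):      # order update through sections 0 .. l-1
            x = stored_eta[k]               # eta_{k, l-1}
            if rho[k] is None:              # first data to reach section k (l = k+1)
                rho[k] = -nu / x            # chosen so that nu_{k+1,k+1} = 0
            ch = 1.0 / math.sqrt(1.0 - rho[k] ** 2)
            sh = rho[k] * ch
            stored_eta[k] = eta             # keep eta_{k,l} for the next time step
            eta, nu = ch * x + sh * nu, sh * x + ch * nu
            row[k + 1] = eta                # eta_{k+1,l} is entry k+1 of row l
        stored_eta[l] = eta                 # eta_{l,l} seeds the next section
        rows.append(row)
    return rows
```

Each row is complete as soon as its covariance entry has passed through the active sections, so no access to later entries of the first column is needed, in contrast to the column form.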
To summarize, this section has shown how a novel architecture is obtained through an intimate connection between theory and implementation. Specifically, a new ladder form structure for the fast Cholesky algorithm by rows was derived to exploit the natural pipelining of ladder structures and to obtain an element by element data access scheme. The natural operations defining the algorithm were J-rotations, hence a pipelined linear array consisting of CORDIC processors was employed. It is important to note that this ladder structure turns out to be the same as for the Levinson algorithm in ladder form and the fast Cholesky by columns in ladder form [DM80] algorithm. In particular, the row and column recursions for the Cholesky factors become equivalent when pipelining is introduced. Consequently, the pipelined linear array is a unified VLSI realization of all of these algorithms, suggesting a good match of algorithms to architecture.
5.2 SOLUTION OF LINEAR SYSTEMS OF EQUATIONS
Consider the matrix equation
$$Ax = b \tag{5.6a}$$
where,
A: is the coefficient matrix of dimension $n \times n$
x: is the n-dimensional vector to be determined
b: is a known vector
Popular methods of solving for $x$ invariably factor $A$ such that at least one of its factors is of simple form, e.g. upper triangular. When such factorization is done with elementary row and column operations applied to both sides of (5.6a), the reduced system
$$Ux = c, \quad \text{with } U \text{ upper triangular} \tag{5.6b}$$
may be solved by back substitution. Factorization of $A$ is the most computationally intensive operation, and will be of prime concern here. The popular Gaussian Elimination method employs row operations to factor $A = LU$ where $L$ is lower triangular. However, numerical stability of LR procedures (as LU factorization procedures are called) demands the use of pivoting (except when $A$ is positive definite), which is a procedure requiring the physical exchange of rows of a matrix. This is an extremely cumbersome task in array processor architectures [Ku79], [KuS80]. In contrast, QR factorizations based, for instance, on Givens' method are stable without pivoting and therefore good candidates for array implementation (this was noted, among others, by H.T. Kung and C. Leiserson [KL80]). In this approach, the factorization $A = QU$, where $Q$ is orthogonal, is obtained through a sequence of orthogonal transformations applied to $A$.
Before embarking on an implementation of Givens' method, it is of interest to note that QR factorization also constitutes a significant step in the eigenvector decomposition of a matrix, e.g. [SB80]. Important applications of eigenvector decomposition include beamforming, system identification, spectral estimation, e.g. [Sc81], and communications [VT68].
In Givens' method, an $n \times n$ matrix $A$ is operated on by an orthogonal matrix, $Q_{ri}^T$, such that the $(i,r)$ element of $A$ is annihilated.
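$$Q_{ri} = \begin{bmatrix} 1 & & & & & & 0 \\ & \ddots & & & & & \\ & & \cos\theta_{ri} & \cdots & \sin\theta_{ri} & & \\ & & \vdots & \ddots & \vdots & & \\ & & -\sin\theta_{ri} & \cdots & \cos\theta_{ri} & & \\ & & & & & \ddots & \\ 0 & & & & & & 1 \end{bmatrix} \tag{5.7}$$
where the trigonometric entries lie in the $r$th and $i$th rows and columns.
Then $A$ is replaced by $Q_{ri}^T A$ and the process repeats. Finally:
$$U = \prod_{r} \prod_{i} Q_{ri}^T A = Q^T A \tag{5.8}$$
where $\prod$ denotes a left matrix product.
Notice that multiplication by $Q_{ri}^T$ in effect rotates the column vectors of
$$\begin{bmatrix} a_{r1} & a_{r2} & \cdots & a_{rn} \\ a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix}$$
through $\theta_{ri}$, a very natural application of the CORDIC technique. The algorithm is summarized as follows:
for r = 1 to n-1 begin:
  for i = r+1 to n begin:
    $\theta_{ri} = -\tan^{-1}(a_{ir}/a_{rr})$ ; $a_{rr} \leftarrow \sqrt{a_{rr}^2 + a_{ir}^2}$ ;
    for j = r+1 to n begin:
      $\begin{bmatrix} a_{rj} \\ a_{ij} \end{bmatrix} \leftarrow \begin{bmatrix} \cos\theta_{ri} & -\sin\theta_{ri} \\ \sin\theta_{ri} & \cos\theta_{ri} \end{bmatrix} \begin{bmatrix} a_{rj} \\ a_{ij} \end{bmatrix}$
    end;
    $\begin{bmatrix} b_r \\ b_i \end{bmatrix} \leftarrow \begin{bmatrix} \cos\theta_{ri} & -\sin\theta_{ri} \\ \sin\theta_{ri} & \cos\theta_{ri} \end{bmatrix} \begin{bmatrix} b_r \\ b_i \end{bmatrix}$
  end;
end;
The j-loop performs that rotation on the $r$th and $i$th rows which zeroes out $a_{ir}$, while the i-loop repeats this operation to zero out all the elements below $a_{rr}$.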
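The three nested loops translate almost directly into code. The following Python sketch is illustrative only: the name `givens_solve` is hypothetical, `math.atan2` is used in place of the quotient $\tan^{-1}(a_{ir}/a_{rr})$ so that a zero $a_{rr}$ does not cause a division error (an assumption, not part of the algorithm as stated), and a plain back substitution is appended to recover $x$ for a nonsingular $A$.

```python
import math


def givens_solve(A, b):
    """Reduce Ax = b to Ux = c with Givens rotations, then back substitute.

    A is a list of n rows of length n and is assumed nonsingular.
    Returns (U, c, x); the inputs are not modified.
    """
    n = len(A)
    U = [row[:] for row in A]
    c = list(b)
    for r in range(n - 1):                       # pivot row
        for i in range(r + 1, n):                # row whose element (i, r) is zeroed
            theta = -math.atan2(U[i][r], U[r][r])
            cs, sn = math.cos(theta), math.sin(theta)
            U[r][r] = math.hypot(U[r][r], U[i][r])   # new a_rr
            U[i][r] = 0.0                            # annihilated element
            for j in range(r + 1, n):            # rotate the remaining column pairs
                arj, aij = U[r][j], U[i][j]
                U[r][j] = cs * arj - sn * aij
                U[i][j] = sn * arj + cs * aij
            c[r], c[i] = cs * c[r] - sn * c[i], sn * c[r] + cs * c[i]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):               # back substitution on Ux = c
        acc = c[r] - sum(U[r][j] * x[j] for j in range(r + 1, n))
        x[r] = acc / U[r][r]
    return U, c, x
```

Every update inside the loops is the same two-element plane rotation, which is why a single CORDIC cell type suffices for the whole computation.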
On standard general purpose machines, Givens' method is more computationally intensive than Gaussian elimination since it performs a plane rotation where Gaussian elimination calls for only one multiplication and one addition. However, CORDIC algorithms perform a plane rotation in no more time than a bit serial multiplication, which is the proposed embodiment of multipliers in [KuS80] for large arrays. Hence, when elementary operations are counted (an elementary operation being a multiply-and-add or any other CORDIC operation), Givens' method has the same complexity as Gaussian elimination without pivoting. On multiprocessor machines, where pivoting is rather cumbersome, Givens' algorithm becomes an attractive alternative.
5.2.1 Architectures for Givens' Algorithm
Consider the implementation of Givens' algorithm on a linear array of $n$ CORDIC processors, where $n$ is the order of $A$; the processors being perhaps of the kind to be described in chapter six. Given the row indices $r$ and $i$, the first task is to compute the angle $\theta_{ri}$ and the new value of $a_{rr}$; this is done in one CORDIC operation. Next, the vectors $[\, a_{rj} \;\; a_{ij} \,]^T$, $j = r+1, \ldots, n$ must be rotated through the angle $\theta_{ri}$. In order to perform this with several processors in parallel, $\theta_{ri}$ should be transmitted simultaneously to all these processors. However, this involves global communications and is not acceptable for a VLSI implementation. Thus, in the designs to be presented, the rotations are pipelined on arrays where each processor is able to communicate only with an immediate neighbor in one cycle. Notice that the local communications restriction imposed at the outset leads naturally to a pipelined structure.
A fully pipelined implementation of Givens' method on a linear array is shown in Figure 5.4 for $n = 5$ (the values $a_{ij}$ and $b_i$ appearing at the outputs of the processors correspond to the entries of $U$ and $c$ of Equation 5.6b respectively). The need for the leftward and rightward data paths as well as the first-in, first-out (FIFO) stacks will become apparent shortly. The movement of data in this array is very natural and is summarized as follows. The value of $\theta_{ri}$, which is the major parameter in updating the $r$th and $i$th rows and zeroing $a_{ir}$, is always computed in the leftmost processor and propagates to the right as the new elements of these rows are computed in sequence. However, for a given $r$, the task of setting $a_{ir}$ to zero must be done for $n - r$ rows, requiring that the $r$th subrow $[\, a_{rr} \;\; a_{r,r+1} \;\; \cdots \;\; a_{rn} \,]$ be resident in the leftmost $n - r + 1$ processors for each of the $n - r$ values of $i$. Thus, the new value of $a_{rj}$ (for each $j = r, r+1, \ldots, n$) is computed in the $(j - r + 1)$th processor (e.g., at $t = 5$, $a_{22}$ is updated in processor 1 and required in the same processor at $t = 6$ for computing $\theta_{24}$). Finally, the new elements of the $i$th row propagate left. This movement is exemplary of the fact that as $r$ increases, the subrow to be operated on becomes shorter, the leading elements having already been zeroed. Thus the elements must move left, since the value of $\theta_{ri}$, which is based on $a_{rr}$, is always computed in the leftmost processor.
Figure 5.1: Recursions Induced on the Rows of the Cholesky Factors
Figure 5.2: Fast Cholesky by Rows in Ladder Form
Figure 5.3: Pipelined Array of Processors
Figure 5.4: Fully Pipelined Givens Method on a Linear Array
Consider now the evaluation of the right hand side of the reduced system, i.e., computing $c$. With reference to the algorithm given earlier, it is evident that the elements of $b$ must be rotated in pairs in exactly the same fashion as the rows of $A$ were affected. This is readily done in the rightmost processor of the array as shown in Figure 5.4. When the angle $\theta_{ri}$ arrives at this processor, it is used to rotate the appropriate subvector of $b$. Finally, back substitution is performed to obtain the result, $x$. The reader is referred to [Ku80] and [De82] for back substitution methods on array processors.
A few comments regarding the linear array structure are in order. First, note that a memory management system is required to provide data to the array in the order required. In this particular case, the memory manager is quite simple, being just a bank of FIFOs (first in, first out stacks). This is readily seen from Figure 5.5, which shows the data inputs required by each processor during the operations sequence for $n = 5$. The dashed arrows are representative of the leftward data movement in the array. If each processor is fed by a FIFO, the y-input at each is an element of the FIFO (the x-input is a pseudo input because its value is updated and recirculated in the same processor). The new elements of the $i$th row (for $i = r+1, r+2, \ldots, n$) are propagated left and then into the associated FIFO, thereby setting up the data input for the next value of $r$. At each $r$, the buffer length is $n - r$, so that the FIFO address logic must account for the shrinking size of the buffer, a task which is relatively easy in a bit serial, nMOS realization. Notice finally that during start up there are some blank entries in the FIFOs of processors 2, 3, ..., n. These do not, however, correspond to empty or wasted memory locations. This is because a data transfer does not occur until a processor has been activated by the rightward movement of $\theta_{ri}$. Thus, for example, the first data transfer to processor 3 from its FIFO is not until $t = 3$.
Figure 5.5: Array Input Sequence for Givens Algorithm (x, y input pair of each processor at each time step)
 t   r   Proc 1    Proc 2    Proc 3    Proc 4    Proc 5    Proc 6
 1   1   a11 a21
 2   1   a11 a31   a12 a22
 3   1   a11 a41   a12 a32   a13 a23
 4   1   a11 a51   a12 a42   a13 a33   a14 a24
 5   2   a22 a32   a12 a52   a13 a43   a14 a34   a15 a25
 6   2   a22 a42   a23 a33   a13 a53   a14 a44   a15 a35   b1 b2
 7   2   a22 a52   a23 a43   a24 a34   a14 a54   a15 a45   b1 b3
 8   3   a33 a43   a23 a53   a24 a44   a25 a35   a15 a55   b1 b4
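The input pattern of Figure 5.5 is regular enough to be generated programmatically. The sketch below is a hedged reading of that figure, not a specification from the text: the function name `givens_input_schedule` is hypothetical, and the timing rule (processor 1 starts the rotation for the pair $(r, i)$ one step after the previous pair, and each processor to the right lags its left neighbour by one step) is an assumption inferred from the time steps shown for $n = 5$.

```python
def givens_input_schedule(n):
    """Map (time step, processor number) -> (x input label, y input label).

    Labels such as 'a23' are only unambiguous for n < 10; they are for illustration.
    """
    schedule = {}
    start = 1                                  # step at which processor 1 begins row r
    for r in range(1, n):                      # r = 1 .. n-1
        for i in range(r + 1, n + 1):          # one rotation per row below the pivot
            t1 = start + (i - r - 1)           # step at which theta_ri is computed
            for j in range(r, n + 1):
                p = j - r + 1                  # column j is handled by processor j-r+1
                schedule[(t1 + p - 1, p)] = (f"a{r}{j}", f"a{i}{j}")
            p = n - r + 2                      # rightmost active processor rotates b
            schedule[(t1 + p - 1, p)] = (f"b{r}", f"b{i}")
        start += n - r                         # n - r rotations are issued per pivot row
    return schedule
```

For example, `givens_input_schedule(5)[(6, 2)]` gives the pair `('a23', 'a33')`, and no entry exists for processor 3 before step 3, matching the remark about start-up blanks.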
5.3 COMPLEXITY DISTRIBUTION AND ACTIVITY CHARTS
The previous sections have given examples of reducing the temporal complexity of an algorithm through the addition of computing resources, i.e. through an increase in spatial complexity. Whereas uniprocessors execute algorithms in time, the linear array enjoys one spatial dimension as well. It is quite natural to think of enhancing the execution speed of Givens' algorithm through additional spatial dimensions, for example, by using a two dimensional mesh of processors. While this end can be achieved by inspection, it is the goal of this section to demonstrate by example that the linear array may be employed systematically to construct higher dimensional networks. An attempt at formalizing the procedure even further will be made in a later section.
5.3.1 Activity Charts
Givens' algorithm consists of three nested loops and could therefore intuitively make efficient use of at least three dimensions, namely, one temporal and two spatial dimensions. The linear array has exploited only a single spatial dimension and a doubly indexed time dimension. It is instructive to show the activity of the processors of the linear array as a function of time, and this diagram, shown in Figure 5.6, will be termed the activity chart. The six individual processors are drawn horizontally across the page and their operational evolution at each time step is indicated vertically. This chart is a useful tool for synthesizing a two dimensional array.
5.3.2 A Two-dimensional Array for Givens' Algorithm
The goal of this section is to reduce the doubly indexed time axis of the linear array (in effect "two" temporal dimensions) to a singly indexed one, by mapping the complexity of one index onto another spatial dimension. This operation is quite simple and systematic given the activity chart of the linear array. It is necessary only to observe that:
1) The leftward movement of data in the linear array is in effect preparing the input matrix of the algorithm for the next value of 'r'.
2) For increasing 'r', the size of the rows to be operated on decreases, meaning that the number of processors required for the 'r' dimension becomes continually smaller.
With these facts and the activity chart, it is quite simple to construct a 2-D array with well defined operation, by stacking a set of one dimensional arrays. Consider that for a given 'r', the linear array is a one time/one space dimensional structure whose output can be fed into exactly the same structure executing exactly the same operations (except that this structure is smaller) for the subsequent 'r'. Simply stacking these linear structures and adding the interconnections yields the triangular array structure of Figure 5.7. An orthogonal decomposition is now performed in $O(n)$ time steps compared with $O(n^2)$ on the linear array.
Figure 5.6: Linear Array Activity Chart
Figure 5.7: A Two Dimensional Array for Givens Algorithm
Remarks:
1) It is important to notice that the triangular array has been constructed using quite a general principle. No prior 2-D structure was assumed. This has resulted in a more processor efficient solution than beginning with an assumed configuration such as the rectangular or hexagonal arrays. In fact, the present triangular array actually has the interconnection of a rectangular structure with the redundant processors removed as a consequence of the synthesis procedure. (Interested readers may find the rectangular array structure in [KR81]. It can also be obtained from the activity chart by ignoring the shrinking dimension of the linear array with increasing 'r'.)
2) Many authors, e.g. [Mu71] [KuS80], have noted that the "locus" of active processors in a multiprocessor array may be viewed as a series of computation wavefronts resembling plane waves. It is interesting to note that the linear array is more efficient than the triangular array in its processor utilization since it corresponds to a cut through a higher dimensional array along a computation wavefront.
3) Only the triangularization of a square matrix $A$ has been considered. The array structures extend naturally, however, to the problem of triangularizing an $n \times p$ matrix, e.g. for solving the linear least-squares problems. A linear array of 'p' processors could be operated exactly as in Figure 5.4 and the execution time would be $p + n(n-1)/2$ units.
4) The present results could be generalized to still higher dimensional arrays, particularly for algorithms with many nested loops, since the basic principle involves unravelling the loops and mapping them onto spatial dimensions. Four dimensional arrays (in which one dimension is time) are, for all practical purposes, the limit in this world, although abstractions to still higher dimensions may indeed prove useful. Algorithms with finite complexity enjoy the property that the dimensional "axes" onto which the complexity is mapped are finite. Hence, multiple indexing on these axes, i.e. their repeated use, is a means for creating higher dimensional arrays in a four dimensional world. The simplest example of a machine which exploits this principle is the uniprocessor. The entire complexity of an algorithm is mapped onto the time axis and certain sections of code are executed repeatedly in time rather than performing them in space as is the case with mesh connected systems.
5.3.3 Dual Arrays
The linear and triangular arrays presented here are particular examples of mapping algorithmic complexity onto many dimensions of computation; however, the mapping is not unique. In the present case of Givens' algorithm, complexity lies in the three nested loops. In "unwinding" these loops, a decision was made to map certain loops in space; however, this choice was not unique. In particular, another solution readily arises to the linear array by mapping the 'i' loop into space and performing the 'j' loop in time. The array operation is now depicted in Figure 5.8 and the associated activity chart in Figure 5.9. Each processor is dedicated to a specific rotation and first calculates and stores an angle (angle in place). Future inputs are rotated by that angle. This array will be called the time/space dual, or simply the dual, to that of Figure 5.6. While the temporal complexity is still $O(n^2)$ with $O(n)$ processors, the exact numbers indicate $n + 2 - r$ time steps per r-loop with $n - r$ processors, whereas previously it was $n + 2 - r$ processors with $n - r$ time steps, i.e. the exact dual. A dual triangular array to that of Figure 5.7 is also readily generated using the dual activity chart, and this array is shown in Figure 5.10.
Figure 5.8: Operation of the Dual Linear Array
Figure 5.9: Time-Space Dual Array Activity Chart
Figure 5.10: Dual Triangular Array
5.4 A FORMAL APPROACH TO COMPLEXITY MAPPING
Ad hoc techniques are generally applied in order to obtain an array architecture that computes a particular algorithm. Hence, while it is known that the chosen algorithm may be executed efficiently on the array, little can be said about the general applicability of the architecture. A step towards a systematic construction procedure was taken in the previous sections, where the activity chart of the linear array provided the ability to construct a two dimensional array. The notion of dual arrays also provided a means for examining alternate architectures. This section will be concerned with formalizing the idea of dual arrays as well as the use of activity charts of linear arrays for obtaining higher dimensional structures. Stating results that are general to any conceivable algorithm is a difficult task; however, statements can be made about a particular class of algorithms that satisfy some program model (recall that it was stated in chapter one that a goal of this dissertation was to examine a class of problems, namely signal processing algorithms). Fortuitously, complex signal processing tasks often employ a variety of linear algebra algorithms that admit to the specification of a model.
A convenient notation for describing data dependencies in programs
will prove useful. Following Kuck [Ku77] and others:
Definition 5.1: A basic loop is one in which the loop body does not contain
loops.
Definition 5.2: The set of input variables to a loop, $L$, i.e., independent
variables used on the right hand side of assignment statements in the loop
body, is denoted $I(L)$ while the set of output variables is denoted
$\Omega(L)$.
Definition 5.3: A loop, $L$, with index set $\{I_1, I_2, \ldots, I_n\}$, at a
particular iteration is denoted $L(i_1, i_2, \ldots, i_n)$. This iteration
occurs for $I_1 = i_1$, $I_2 = i_2$, ..., $I_n = i_n$.
Definition 5.4: For two loops $L_i$, $L_j$:
(i) $L_j$ is data dependent on $L_i$, denoted $L_i \,\delta\, L_j$, if for some
$x \in I(L_j)$ the condition $x \in \Omega(L_i)$ also holds.
(ii) $L_j$ is data independent of $L_i$, denoted $L_i \,\bar{\delta}\, L_j$, if
$\forall x \in I(L_j)$, the condition $x \notin \Omega(L_i)$ is satisfied.
(iii) $L_j$ is data output dependent on $L_i$, denoted $L_i \,\delta^o\, L_j$, if
$x \in \Omega(L_j)$ is computed after that of $L_i$.
(iv) $L_j$ is indirectly data dependent on $L_i$, denoted $L_i \,\Delta\, L_j$,
if there is a sequence of statements $S_1, \ldots, S_k$ such that
$S_1 \,\delta\, S_2 \cdots \delta\, S_k$ and $S_1 \in L_i$, $S_k \in L_j$.
Definition 5.5: Data Dependencies:
For any loop, $L$, with index set $\{I_j\}_{j=1}^{d}$ and body statements
$S_1, S_2, \ldots, S_k$, and $(i_1, i_2, \ldots, i_d)$,
$(k_1, k_2, \ldots, k_d) \in I$ where $I$ is the index space, and for some
$x \in \Omega[S_i(k_1, k_2, \ldots, k_d)]$ the following data dependencies
hold:
(i) $S_i(k_1, \ldots, k_d) < S_j(i_1, \ldots, i_d)$ and
$x \in I(S_j(i_1, \ldots, i_d)) \rightarrow S_i \,\delta\, S_j$, where
$S_i < S_j$ means $S_i$ is executed before $S_j$.
(ii) $x \in I(S_j(i_1, \ldots, i_d))$ and
$S_j(i_1, \ldots, i_d) < S_i(k_1, \ldots, k_d) \rightarrow S_j \,\bar{\delta}\, S_i$
(iii) $x \in \Omega(S_j(i_1, \ldots, i_d))$ and
$S_i(k_1, \ldots, k_d) < S_j(i_1, \ldots, i_d) \rightarrow S_i \,\delta^o\, S_j$
Definition 5.6: Data Dependence Graph:
A data dependence graph, $G$, of $s$ nodes, one for each $S_i$, $1 \le i \le s$.
For each dependence relation between $S_i$ and $S_j$, there is a corresponding
labelled arc from the node representing $S_i$ to the $S_j$ node.
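The dependence graph just defined can be illustrated with a small sketch. It assumes each statement is described only by its input set and output set, listed in execution order. The function name, the arc labels ("flow", "anti", "output") and the three-statement body are hypothetical choices for illustration and are not taken from [Ku77]. The sketch is conservative: it records an arc whenever the variable sets intersect and does not check whether an intervening statement overwrites the variable.

```python
from itertools import combinations


def dependence_graph(statements):
    """Build labelled arcs between statements given in execution order.

    `statements` is a list of (name, inputs, outputs) tuples, where inputs
    and outputs are sets of variable names (I(S) and Omega(S) in the text).
    Returns a set of (source, target, label) arcs.
    """
    arcs = set()
    # combinations() preserves list order, so S_i always executes before S_j.
    for (ni, in_i, out_i), (nj, in_j, out_j) in combinations(statements, 2):
        if out_i & in_j:
            arcs.add((ni, nj, "flow"))    # S_j reads a value S_i wrote
        if in_i & out_j:
            arcs.add((ni, nj, "anti"))    # S_j overwrites a value S_i read
        if out_i & out_j:
            arcs.add((ni, nj, "output"))  # both statements write the variable
    return arcs


# Hypothetical loop body:
#   S1: a = b + c
#   S2: d = a * 2
#   S3: a = d - 1
body = [
    ("S1", {"b", "c"}, {"a"}),
    ("S2", {"a"}, {"d"}),
    ("S3", {"d"}, {"a"}),
]
graph = dependence_graph(body)
```

Each arc of `graph` corresponds to one labelled arc of $G$; a node with no incoming arcs can be executed first.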
5.4.1 Construction of Multiprocessor Arrays
The foregoing notation provides a convenient vehicle to describe the
mapping of algorithmic complexity onto large collections of processors.
Matrix algebra operations exhibit considerable structure allowing the
formation of a program model on which the present results are based. Many
matrix algorithms admit to the following structure:
    do while I_1, C_1 ;
        B_1
        do while I_2, C_2 ;
            B_2
                .
                .
                .
                do while I_M, C_M ;
                    B_M
                end I_M ;
            end I_{M-1} ;
    end I_1 ;
where $\{I_i\}_{i=1}^{M}$ are the loop indices, $\{C_i\}_{i=1}^{M}$ are the loop
conditionals and $\{B_i\}_{i=1}^{M}$ are the loop bodies, assumed to be basic.
Frequently, only $B_M$ and/or $B_{M-1}$ are nonempty.
Let $B_i(i_1, i_2, \ldots, i_M)$ denote the execution of the loop body $B_i$ when
$I_1 = i_1$, $I_2 = i_2$ etc., let $B_i(i)$ denote the execution of $B_i$ when
$I_i = i$ while $B_i(I_i)$ means the execution of $B_i$ for any allowable value
of $I_i$. Furthermore, restrict all loop bodies to have a single entry point
and a single exit. The goal of this section is to examine under what
conditions some general statements can be made regarding the construction
of multiprocessor arrays having local connectivity. Specifically, there are
two issues:
i) The data dependencies of an algorithm limit the amount of
parallelism that may be employed. What sorts of dependencies are
allowable such that a multiprocessor having local connectivity
results in a speed advantage? In particular, do the matrix algebra
operations of interest exhibit these dependencies?
ii) What is the performance advantage achieved by the
multiprocessor? For example, an algorithm requiring $O(n^3)$ steps
on a uniprocessor should at least require no more than $O(n^2)$
steps if $O(n)$ processors are employed. Clearly the data
dependencies influence the achievable speed enhancement.
The main thrust of this section will be to show that under certain
conditions, arrays that achieve the performance enhancement alluded to in
(ii) do exist. Proof of existence will be given by providing a candidate array.
Although arrays of arbitrary complexity are possible for very complex
programs, e.g., an n-dimensional array for a program with n appropriately
structured loops, VLSI integration on a two dimensional silicon surface
imposes very severe constraints on array size. While it is possible to
construct an array with three spatial dimensions and only local connectivity
in a four dimensional universe, two spatial dimensions are the limit on a
silicon plane. Programs of large computational complexity may be
implemented only by multiply indexing the dimensions.
Three dimensional arrays (where one dimension is time) will be studied
first while generalizations will be mentioned later. The main interest of this
study is to establish the sorts of data dependencies that are allowable in the
program model, given that a three dimensional array structure with local
connectivity is desired. The performance limitations on the array due to the
dependencies will also be examined. Since there are two available spatial
dimensions, the two most deeply nested loops will be distributed in space
because these are executed most frequently.
Let the indices $I_1, I_2, \ldots, I_M$ assume $O(n)$ distinct values each (i.e.
$I_i$ assumes at most $k_i n$ values) yielding an algorithm of complexity
$O(n^M)$ (assuming each loop body consists of only a few simple instructions,
whose number is independent of $n$).
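The operation count behind the $O(n^M)$ figure can be checked with a short sketch. It assumes rectangular index ranges, so that loop $i$ takes exactly $k_i n$ values (matrix algorithms usually have triangular ranges, for which this is an upper bound), and it ignores any delay imposed by data dependencies. The function names and the sample constants are hypothetical.

```python
from itertools import product


def body_executions(n, k):
    """Count executions of the innermost body for nested loops in which
    loop i runs over k[i] * n index values (rectangular ranges assumed)."""
    ranges = [range(k_i * n) for k_i in k]
    return sum(1 for _ in product(*ranges))


def steps_with_linear_array(n, k):
    """Steps remaining when the innermost loop is spread over k[-1] * n
    processors, ignoring delays caused by data dependencies."""
    return body_executions(n, k) // (k[-1] * n)


n, k = 4, [1, 2, 1]      # hypothetical three-loop program, M = 3
total = body_executions(n, k)           # k_1 * k_2 * k_3 * n**3
parallel = steps_with_linear_array(n, k)  # k_1 * k_2 * n**2
```

Distributing one loop in space removes one factor of $n$ from the step count, which is the best case that the lemma and theorem below aim to achieve when the dependencies permit.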
Lemma 5.1: If (let $B_r(a,b)$ denote the execution of $B_r$ when $I_{M-1} = a$,
$I_M = b$ and let $*$ denote any allowable value in the appropriate index set):
(d1): $B_M(i_M) \,\delta\, B_M(i_M + l)$, $i_M \in I_M$, $I_{M-1}$ = constant, $l \ge 0$
(d2): $B_M(i_{M-1}, d) \,\delta\, B_{M-1}(i_{M-1} + l, *)$, $i_{M-1} \in I_{M-1}$, $d \in I_M$, $l > 0$
(d3): $B_{M-1}(i_{M-1}, *) \,\delta\, B_M(i_{M-1}, i)$, $i_{M-1} \in I_{M-1}$, $\forall i \in I_M$
(d4): $B_M(i_{M-1}, i_M + \epsilon) \,\delta\, B_M(i_{M-1} + 1, i_M)$, $\epsilon \le 0$
are allowable dependencies, then there exists a one dimensional computing
structure with only nearest neighbor connections, capable of computing the
program results in $O(n^{M-1})$ time.
Remark: Item (d1) frequently degenerates in matrix algorithms such that
no dependence exists between successive executions of the innermost loop.
Simply stated, the application of these operations to a set of rows (or
columns) does not generate data dependencies among particular row
elements for that same operation. The present model quite reasonably
assumes that such an elementary operation is applied to individual matrix
elements in the most deeply nested (i.e., $M$-th) loop.
Proof:
Consider the candidate structure of processors: (a circle denotes a
processor and its label indicates the loop body that it executes)
w here:
fc= 1
a (5 ir -i) U Q(îf(fc)) i> f

fc=i-/+1
is the output of the $B_M$ processors to the right.
Dependence (d1) is clearly satisfied by the expression for $I(B_M(i))$.
Dependence (d3) is satisfied since $\Omega(B_{M-1})$ propagates through the array.
Finally dependence (d2) and (d4) are satisfied by the reverse flow of data. It
is assumed that each processor has local memory.
Suppose that $B_{M-1}(1, *)$ is executed at $T = 1$. Then the throughput
of the array will be determined by when successive operations of $B_{M-1}$ can
be initiated, assuming everything happens in some basic time unit. From
(d2) it is clear that $B_{M-1}(i, *)$ is initiated at time $t_i$ given by
ti = m ax ( i , i + (2d —iyf 1 J)
Now since $I_{M-1}$ varies over at most $k_{M-1} n$ values, the last $B_{M-1}$
process is initiated at:
m a x ( k u .- j i , k j / ^ n + —J )
Let $I_M$ vary over at most $k_M n$ values. Then all the calculations of the
$(M-1)$-th and $M$-th loop are complete at time (we assume $n > l$)
T - { k j i + k j i - 1)71 + (2 d -Z )[ U J/{Z(i-£>oj
where $I_{\{C\}}$ is the indicator function of the logical event, $C$. It is clear
that $T = O(n)$ since the 2nd term is a constant. Therefore, the execution time
for the entire program is $O(n^{M-1})$
and the proof is complete.
The foregoing linear array has a corresponding activity chart and it is
now possible to generalize the idea of "stacking" of the rows of the activity
chart to construct a two-dimensional array.
Theorem 5.1.
If (d1) - (d4) and the additional dependencies ($B_r(a,b,c)$ denotes the
execution of $B_r$ when $I_{M-2} = a$, $I_{M-1} = b$, $I_M = c$):
(d5): $B_{M-1}(i_{M-1}, *) \,\delta\, B_{M-1}(i_{M-1} + k, *)$, $i_{M-1} \in I_{M-1}$, $k > 0$,
$I_{M-2}$ = constant
(this is allowable in Lemma 5.1 with memory)
(d6): $B_{M-2}(i_{M-2}, *, *) \,\delta\, B_{M-1}(i_{M-2}, i, *)$, $\forall i \in I_{M-1}$
(d7): $B_{M-1}(i_{M-2}, f, *) \,\delta\, B_{M-2}(i_{M-2} + g, *, *)$, $i_{M-2} \in I_{M-2}$, $f \in I_{M-1}$ constant.
are allowable, then there exists a two dimensional array of processors with
local connectivity, capable of executing the program model in $O(n^{M-2})$
time.
Proof:
Beginning with the structure of Lemma 5.1 which satisfies (d1) - (d4)
and is independent of (d5)-(d6), construct the candidate structure:
[Diagram: candidate two dimensional array structure]
It is clear that (d1), (d3) are satisfied. Next, note that (d2), (d4) are
satisfied through the additional connectivity, which also satisfies (d5) and
(d6).
Notice that frequently, some of the processes, particularly the $B_M$
processes will be null, so that the array will not be rectangular.
The next task is to ascertain the execution time of the program model
on this array, to complete the proof. It is once again necessary to
determine the time steps at which successive $B_{M-2}$ processes can be
initiated given that $B_{M-2}(I_{M-2} = 1, *, *)$ occurs at $t = t_0$. Let $t_i$
represent the time of occurrence of $B_{M-1}(i, *)$ for $I_{M-2}$ = constant. From
the candidate array structure, this is given by
ti = \r-jH 2 d + i +t0
Notice that (d6) has no effect in $t_i$ since $t_i$ is always at least as large as
the time required for $B_{M-2}(i_{M-2}, *, *)$ to propagate to $B_{M-1}(i_{M-2}, i, *)$.
Next, let $t'_i$ denote the time of occurrence of $B_{M-2}(i, *, *)$. From (d7), it
is clear that
= i + [------ J/ (£,g) ; / is a fu n c tio n of f,g only
" ~ kji-Z71 + l ~ ^ ------ J/ ( £ • ? )
$= O(n)$
$T' = O(n)$ = time to complete the three inner loops.
$\therefore$ The program execution time is $T'_p$:
t=i
t= i
thus completing the proof.
It is interesting to evaluate the latency, $L$, through the array for a given
$I_{M-2} = i_{M-2}$, i.e. the time to complete the two inner loops. This is given by:
~ kyn
$= O(n)$    (5.9)
Remarks:
1) Neither of the candidate arrays of Lemma 5.1 nor of Theorem 5.1
have been allowed to accept input, for example in the manner of
Figure 5.6. This is readily accommodated in the present structure
by pretending that each processor has a memory associated with
it, which it accesses as part of the loop body $B_k$. In practice, the
memory need not exist if input is provided from an external source
in the same manner, however, the memory concept is a convenient
trick to ensure that none of the proofs are violated.
2) Although the proofs above are constructive in nature, it is
important to remember that the propositions themselves refer to
the existence of arrays having a certain performance.
The proof of Theorem 5.1 has immense value in providing an array that
is generally applicable to algorithms satisfying the particular program
model used. In contrast, previous methods of synthesizing such arrays have
been quite ad hoc, presupposing a specific problem, array interconnection
or both. Hence, the range of applicability of the multiprocessors is seldom
known (ILLIAC IV). Furthermore, the array structures can be generated
automatically from the algorithm. Applications of the theorem will be given
shortly, for some popular three loop algorithms.
Generalizations to arrays of higher dimension, while of academic
interest, are not practical in VLSI implementations unless the connectivity
is entirely two dimensional, which is clearly restrictive. In any event, if one
continues to augment the data dependencies, as was done for Theorem 5.1
in the case of (d5)-(d7) it is possible to show the following result:
For any program satisfying the program model and data dependencies,
there exists an $(M-1)$ dimensional computing structure with only
local connectivity, capable of executing the program in $O(n)$ time
steps.
Remark:
(1) The generalization is possibly of interest for 4 dimensional arrays (one
time dimension) if current day talk of "three dimensional VLSI" proves
correct.
(2) In the case of 3-D VLSI as well as for non-VLSI arrays (i.e., cases where
local connectivity is practiced), it is unfair to assign equal cost (time) to
all communications.
It is worth remarking that the construction used in the proof of
Theorem 5.1 is simply an extension of the "stacking" idea of Section 5.3.2
employed for Givens' algorithm. The time space duality concept of Section
5.3.3 can be similarly extended to a useful and general result.
Lemma 5.2:
The time space dual linear array of the $M$-th loop always exists.
A dual array of Lemma 5.1 is:
[Diagram: dual linear array with processors labelled $B_{M-1}(1)$, $B_{M-1}(2)$]
Theorem 5.2 (Theorem of Duality)
For each array satisfying the construction of Theorem 5.1 (i.e., for that
class of algorithms), there exists at least one dual array in which the loop of
the program model mapped into the temporal dimension is exchanged for a
loop mapped in a spatial dimension.
Proof:
Follows as a consequence of Lemma 5.2 when the linear array in the
proof of Theorem 5.1 is replaced by its dual.
Corollary:
There exist "space dual" arrays in which the spatial dimension of two-
loops of the program model are exchanged.
Space dual arrays will not be studied in this dissertation, mainly
because they must be studied at a constructive level. Clearly existence
level studies can be quite trivial since a space dual array may be obtained
by simply reordering (i.e., renaming) the spatial dimensions of any
particular array.
Remark:
Simply stated, the foregoing has discussed the mapping of algorithmic
complexity onto many dimensions for algorithms which satisfy a particular
structure. Fundamentally, there is no difference between any two
dimensions although it may be imposed by data dependence or connectivity
constraints, e.g., the dual nature of time and space. In this world, the
practical limit is four dimensions, i.e., four loops in the program model,
however, higher dimensional programs in which the total number of
computations are finite can be thought to generate higher dimensional
arrays by multiply indexing along a given dimension. By indexing in the
time dimension, the usual concept of time sharing or computation in time
arises; space sharing is clearly the more unusual case.
Examples:
There are a variety of important matrix algebra algorithms, primarily
for factorization, which satisfy the program model. In view of their recent
proliferation in the literature [MD81], arrays for their realization will be
derived as examples for the techniques just described. These arrays
appeared in [ADM82] where they were obtained by inspection. In contrast,
the arrays to follow are all derived from a general method. A general
purpose array structure, namely a triangular array, which is usable for all
the examples, is the result of either approach in this case.
Example 5.1:
Givens Algorithm - This has already been presented in Section 5.3.
Example 5.2:
Gaussian Elimination is a well known method for obtaining the LR
decomposition of a matrix $A = [a_{ij}]$ as follows:
    for r = 1 to n begin;
        for i = r+1 to n begin;
            θ_ri = a_ir / a_rr ;  a_rr ← a_rr ;
            for j = r+1 to n begin;
                [ a_rj ]     [    1    0 ] [ a_rj ]
                [ a_ij ]  ←  [ −θ_ri   1 ] [ a_ij ]
            end;
            [ b_r ]     [    1    0 ] [ b_r ]
            [ b_i ]  ←  [ −θ_ri   1 ] [ b_i ]
        end;
    end;
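The loop structure of Example 5.2 can be sketched in Python as follows. This is a minimal illustration, assuming no pivoting is needed (every pivot $a_{rr}$ is nonzero) and assuming the multiplier is subtracted so that the subdiagonal entries are eliminated. The function name and the sample system are hypothetical.

```python
def eliminate(a, b):
    """Reduce the system [a | b] to upper triangular form without pivoting.

    Returns (upper, rhs, theta), where theta[i][r] is the multiplier used to
    clear entry (i, r). The inputs are left unmodified.
    """
    n = len(a)
    u = [row[:] for row in a]
    y = b[:]
    theta = [[0.0] * n for _ in range(n)]
    for r in range(n - 1):                 # outer loop: pivot row
        for i in range(r + 1, n):          # middle loop: row being reduced
            theta[i][r] = u[i][r] / u[r][r]
            u[i][r] = 0.0
            for j in range(r + 1, n):      # inner loop: multiply-accumulate
                u[i][j] -= theta[i][r] * u[r][j]
            y[i] -= theta[i][r] * y[r]
    return u, y, theta


# Hypothetical 2 x 2 system.
upper, rhs, theta = eliminate([[2.0, 1.0], [4.0, 5.0]], [3.0, 9.0])
```

The three nested loops over `r`, `i` and `j` are the three loops of the program model; the innermost body is a single multiply-accumulate, which is the property the array construction relies on.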
This is simply the Givens' method in which the coefficient matrix is a
linear (i.e., a multiply and accumulate) rather than a circular rotation. Its
implementation is exactly that of Givens' method (e.g. Figure 5.10), however,
with a linear metric chosen for the CORDIC processors.
Example 5.3:
The Hyperbolic Cholesky Algorithm, which is an efficient method of
factoring a positive definite matrix, $A$, first appeared in [MD81]. Following
Delosme and Morf's description of the algorithm [DM81], observe that $A$
satisfies the identity:
$A = V^T V - W^T W$
where
v tj = 0 otherw ise
and
m ;- = if ? > i
■U/'y- = 0 otherw ise
are the elements of the upper triangular matrices $V$ and $W$.
Hyperbolic rotations are now applied to combine rows of $V$ and $W$ in a
manner forcing $W$ to zero while maintaining the structure of $V$. This yields
the unique Cholesky decomposition of $A$. This algorithm may be written as
[DM81]:
    for r = 2 to n begin;
        for i = r−1 to 1 begin;
            θ_ri = −tanh⁻¹(w_ir / v_rr) ;  v_rr ← sqrt(v_rr² − w_ir²) ;
            for j = r+1 to n begin;
                [ v_rj ]     [ cosh θ_ri   sinh θ_ri ] [ v_rj ]
                [ w_ij ]  ←  [ sinh θ_ri   cosh θ_ri ] [ w_ij ]
            end;
        end;
    end;
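The elementary operation inside the loops above is a single hyperbolic rotation. The sketch below shows, for one pair of scalars, how the angle is chosen so that the $W$ entry is forced to zero while $v^2 - w^2$ is preserved. It assumes $|w| < |v|$ so that the inverse hyperbolic tangent is defined; the function names and sample values are hypothetical.

```python
import math


def hyperbolic_rotate(v, w):
    """Choose theta so that rotating the pair (v, w) zeroes w.

    Requires |w| < |v|. Returns (theta, v_new, w_new); w_new is zero up to
    rounding and v_new equals sqrt(v*v - w*w) for positive v.
    """
    theta = -math.atanh(w / v)
    v_new, w_new = apply_rotation(theta, v, w)
    return theta, v_new, w_new


def apply_rotation(theta, v, w):
    """Apply the stored angle to another (v, w) pair from the same rows."""
    c, s = math.cosh(theta), math.sinh(theta)
    return c * v + s * w, s * v + c * w


theta, v_new, w_new = hyperbolic_rotate(5.0, 3.0)   # 5**2 - 3**2 == 4**2
v2, w2 = apply_rotation(theta, 2.0, 1.0)            # a later pair of entries
```

Storing `theta` and reapplying it to later pairs mirrors the way each processor keeps its angle and rotates all subsequent inputs.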
It is readily verified that this algorithm satisfies the program model.
Applying Theorem 5.1 for $n = 4$, in particular the construction in its proof,
the three dimensional (two spatial dimensions) array of Figure 5.11 is
obtained. Incoming components of the matrix, $A$, are first normalized to
compute $v_{ij}$ and $w_{ij}$. Then each processor in the array computes and stores
the angle through which it will rotate all subsequent inputs, resulting in the
triangular array with angles in place.
[Figure 5.11: Triangular Array for Hyperbolic Cholesky]
Notice that a general purpose array, namely a triangular array, has
been shown to exist for the program model, as confirmed by the examples.
5.4.2 An Approach To Formalism
The previous section provided some formalism for the synthesis of
multiprocessor arrays, which guaranteed the structures to be general
purpose for a given program model. This section is concerned with
developing additional ideas for the analysis of multiprocessor arrays, based
on the notion of "metric spaces", particularly the notion of distance. It is
clear that data dependencies and array interconnections limit the speed of
multiprocessor arrays and the approach to be developed will provide a
systematic method for their analysis.
Distances will be measured in units of time (which is in accordance with
our usual concept of distance since time is similar to spatial dimensions)
and distance measures will be determined by the array connectivity. Data
dependencies in a program will establish a partial ordering which will
determine the manner in which the computation wavefronts [see e.g., Mu71
for a discussion] traverse the array, which together with the distance
measures, will allow execution time analysis for various program models. In
fact, simple proofs to the theorems of the previous section can also be
obtained.
5.4.2.1 Distance Measures
Let $d_\Omega(P,Q)$ denote the "distance" between two elements, $P$ and $Q$
of a collection $\Omega$.
Definition 5.7:
A distance measure must satisfy the following properties [Ro68]
(1) $d_\Omega(P,Q) > 0$, $P \ne Q$ ; $d_\Omega(P,P) = 0$
(2) $d_\Omega(P,Q) = d_\Omega(Q,P)$
(3) $d_\Omega(P,Q) + d_\Omega(Q,R) \ge d_\Omega(P,R)$, $P, Q, R \in \Omega$
The latter item is known as the Triangle Inequality. Consider the
rectangular array of processors, with local connectivity, shown in Figure
5.12. Each processor is assigned a set of coordinates, e.g., let $P$ have
coordinates $(p_x, p_y)$, and the collection of processors is the set $\Omega$.
Definition 5.8:
The distance between elements $P$, $Q$ of the rectangular array is:
$d_R(P,Q) = |p_x - q_x| + |p_y - q_y|$    (5.10)
It is clear that $d_R$ satisfies the first two properties of Definition 5.7. To
prove the triangle inequality:
$d_R(P,Q) + d_R(Q,R) = |p_x - q_x| + |q_x - r_x| + |p_y - q_y| + |q_y - r_y|$
$\ge |p_x - q_x + q_x - r_x| + |p_y - q_y + q_y - r_y|$
$= d_R(P,R)$
The distance, $d_R(P,Q)$ is a measure of the time required to go from $P$ to $Q$
following the connectivity of the array.
Next consider the double hexagonal array of Figure 5.13 (the array is
termed double hexagonal due to the interconnect pattern, not to the shape
of the collection of processors, which would vary considerably with number
of processors and artistic ability).
[Figure 5.12: The Rectangular Array]
[Figure 5.13: The Double-Hexagonal Array]
Definition 5.9:
The distance between elements $P$, $Q$ of the double hexagonal array is:
$d_X(P,Q) = \max[\,|p_x - q_x|,\ |p_y - q_y|\,]$
and this is the time required to go from $P$ to $Q$. Again, $d_X$ clearly
satisfies the first two properties of Definition 5.7. To prove the triangle
inequality:
$d_X(P,Q) + d_X(Q,R) = \max[\,|p_x - q_x|,\ |p_y - q_y|\,] + \max[\,|q_x - r_x|,\ |q_y - r_y|\,]$
$\ge \max[\,|p_x - q_x| + |q_x - r_x|,\ |p_y - q_y| + |q_y - r_y|\,]$
$\ge \max[\,|p_x - r_x|,\ |p_y - r_y|\,]$
$= d_X(P,R)$
Finally, consider the hexagonal array of Figure 5.14.
[Figure 5.14: The Hexagonal Array]
Definition 5.10:
The distance between elements $P$, $Q$ of the hexagonal array is:
$d_H(P,Q) = d_R(P,Q)\, I_{[\bar{C}]_{PQ}} + d_X(P,Q)\, I_{[C]_{PQ}}$
where $I_{[C]_{PQ}}$ is the indicator function of the logical event
$[C]_{PQ} = \{\mathrm{sgn}(p_x - q_x) = \mathrm{sgn}(p_y - q_y)\}$
To prove this expression satisfies the triangle inequality proceed as follows:
$d_H(P,Q) + d_H(Q,R) = d_R(P,Q) I_{[\bar{C}]_{PQ}} + d_X(P,Q) I_{[C]_{PQ}} + d_R(Q,R) I_{[\bar{C}]_{QR}} + d_X(Q,R) I_{[C]_{QR}}$
which equals
$d_R(P,Q) + d_R(Q,R)$ ,  $[\bar{C}]_{PQ} \cap [\bar{C}]_{QR}$
$d_X(P,Q) + d_X(Q,R)$ ,  $[C]_{PQ} \cap [C]_{QR}$
$d_R(P,Q) + d_X(Q,R)$ ,  $[\bar{C}]_{PQ} \cap [C]_{QR}$
$d_X(P,Q) + d_R(Q,R)$ ,  $[C]_{PQ} \cap [\bar{C}]_{QR}$
$\ge d_H(P,R)$
This inequality is clearly true for the first two cases by the results of
Definitions 5.8 and 5.9. It is also true for the latter two cases for it can
never happen that the array is traversed from $P$ to $R$ via a longer path
than $P$, $Q$, $R$. Clearly if that were the case, the path $P$, $Q$, $R$ would be
chosen, thus achieving an equality at most.
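The three distance measures can be compared numerically with a brief sketch. It assumes processors are addressed by integer coordinate pairs and that the hexagonal measure uses the maximum form when the two coordinate differences have the same sign, as Definition 5.10 is read here. The function names and the 5 x 5 test grid are hypothetical, and the exhaustive check is only a sanity test, not a substitute for the proofs above.

```python
from itertools import product


def sgn(x):
    """Sign of x as -1, 0 or +1."""
    return (x > 0) - (x < 0)


def d_rect(p, q):
    """Rectangular array distance (Definition 5.8)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def d_dhex(p, q):
    """Double hexagonal array distance (Definition 5.9)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def d_hex(p, q):
    """Hexagonal array distance (Definition 5.10, as read above)."""
    same_sign = sgn(p[0] - q[0]) == sgn(p[1] - q[1])
    return d_dhex(p, q) if same_sign else d_rect(p, q)


def satisfies_triangle_inequality(d, points):
    """Exhaustively test d(P,Q) + d(Q,R) >= d(P,R) on a finite point set."""
    return all(d(p, q) + d(q, r) >= d(p, r)
               for p, q, r in product(points, repeat=3))


grid = list(product(range(5), repeat=2))   # hypothetical 5 x 5 array
results = {d.__name__: satisfies_triangle_inequality(d, grid)
           for d in (d_rect, d_dhex, d_hex)}
```

For any pair of processors the hexagonal distance lies between the double hexagonal and rectangular distances, reflecting the intermediate amount of connectivity.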
These distance measures impose a topology which is useful for
analyzing the computations being performed on an array.
Definition 5.12: [Ro68]
A "closed ball" with center $P$ and radius, $r$, is the set of elements $Q$
such that:
$d_\Omega(P,Q) \le r$ ,  $P, Q \in \Omega$
For example, Figure 5.15 shows the shape of a closed ball in the collection
$\Omega$, of processors for the rectangular and hexagonal array. Notice that since
the collection of processors is finite, the shape of the closed ball depends on
both the center $P$ and the radius $r$.
Theorem 5.3:
Let $\{C_i\}$ denote a collection of computations occurring at time $t_0$ ($C_i$
could be as simple as a data transfer) whose results are required to perform
a calculation $C$ in processor $P$. The earliest instant at which $C$ can be
performed is $t_P = t_0 + r$ where $r$ is the radius of the smallest closed ball
of center $P$ enclosing $\{C_i\}$, i.e.,
$t_P = t_0 + \min\{\, r : d(Q,P) \le r \ \ \forall Q \,\}$
[Figure 5.15: Closed Ball Topology (Rectangular Array; Hexagonal Array)]
Proof:
The theorem is true by negation for if $C$ could be performed at $P$ at
$t < t_P$, that would imply the existence of $r_0 < r$ such that $d(Q,P) \le r_0$.
But this is false by the definition of $r$.
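Theorem 5.3 reduces to taking the largest distance from any contributing processor to the target. The sketch below assumes the contributing computations are identified by the coordinates of the processors holding them, and that the distance function is supplied by the caller. The function names and sample coordinates are hypothetical.

```python
def rectangular(p, q):
    """Rectangular array distance of Equation 5.10."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def double_hexagonal(p, q):
    """Double hexagonal array distance of Definition 5.9."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def earliest_start(t0, target, sources, distance):
    """Earliest time at which `target` can use results produced at time t0
    by the processors in `sources`.

    The radius of the smallest closed ball centred on the target that
    encloses every source is the largest source-to-target distance.
    """
    radius = max(distance(q, target) for q in sources)
    return t0 + radius


sources = [(0, 0), (2, 5), (3, 2)]   # hypothetical processor coordinates
t_rect = earliest_start(0, (2, 2), sources, rectangular)
t_dhex = earliest_start(0, (2, 2), sources, double_hexagonal)
```

With the same sources and target, the richer interconnection gives the earlier start time because its closed balls enclose more processors for a given radius.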
It is clear from Figure 5.15 and Theorem 5.3 that the hexagonal
interconnection provides for more flexible computation than the
rectangular array since a closed ball of radius $r$ encloses more
processors.
Example 5.4:
Suppose the latency of the candidate three dimensional array of
Theorem 5.2 is desired for a given value of $I_{M-2} = i_{M-2}$. This problem is
readily solved using the distance concept and the processor coordinate
assignment of Figure 5.16. The array has distance measure given by
Equation 5.10 since it has a rectangular interconnection pattern.
The data dependencies establish a partial ordering determining the
traversal of the array. This ordering is shown in Figure 5.17, together with
the dependence graph. Now, in order to calculate the execution time on the
array proceed as follows:
(i) Let $L$ denote the execution time for the inner two-loops for a
given $I_{M-2} = i_{M-2}$.
(ii) The calculation is complete once the distance from $B_{M-2}(i_{M-2}, *, *)$
to $B_M(i_{M-2}, k_{M-1}n, k_M n)$ is traversed. However the choice of
traversal path is not arbitrary (in particular, it is not the most
direct path) but rather it is determined by the data dependencies
of the program model. Following this path yields:
[Figure 5.16: Coordinate Assignment for Theorem 1]
[Figure 5.17: Traversal Ordering (dependence graphs and partial orderings)]
L = S
3=1
1
+ d s [ B n { k ji-in - j l,d ) \ B u -i{k ji-in - —j l , *)] )!j
+ B ll(k n -in .,k ji 7i)]
k u .n —l
*jf-in-f li
3=2
(iii) Next, applying Theorem 5.3 it is readily ascertained that:
d R [ B j i - i ( l - j B f f - x{ l - j - l , * ) ] - 1 0 <,?' < Z- 2
d n [B ji-i{k fi-in - #): B u { k U - xTt, = ka n
d u l B n - i i k j / ^ n - j l . *)\ B x {k ]i- l-n.-tj-H)l,<£)'\ - d+Z
(iv) Thus, the latency of the array for this program model is given by
$L$:
+ 1
- r ^ *------ ] (2d) + {kjj + k j i ^ y n
$= O(n)$
This is precisely the latency evaluated in the previous section, Equation
5.9, since it is easily seen that:
E xam ple 5.5:
This ex am p le will d e m o n stra te how "quick and. d irty " approxim ations
c a n b e o b ta in e d from th is m ethod. C onsider Givens algorithm , w hich
satisfies th e p ro g ra m m odel, im p lem en ted o n th e a rr a y of F igure 5.6. This
a rra y h a s a re c ta n g u la r in terconnection. With th e co o rd in ate system of
f ig u r e 5.18, it is e a sy to e stim a te th e ex e cu tio n tim e , T, of th e algorithm as
follows:
T ~ 7i + «&[(1.1) : ( !.« ) ] + cfe[(l.n) ; (n.n)]

= 371
N otice t h a t th is is th e ap p ro x im ate e x ecu tio n tim e th a t was d e te rm in e d for
t h a t a rr a y e a rlie r.
5.5 EIGENVALUE DECOMPOSITION
Determining the eigenvalues of an $n \times n$ matrix, $B$, is an extremely
important calculation in linear algebra, owing to the appearance of
eigenvalues in the solution to a large number of problems. The most
efficient techniques for calculating eigenvalues are the LR algorithm of
Rutishauser and the QR procedure of Francis [SB80]. Based on triangular
and orthogonal decompositions respectively, both these algorithms are
iterative in nature. The focus of this section will be on the QR algorithm
since the decomposition is stable without pivoting, as mentioned earlier.
However, LR without pivoting may be implemented on the arrays to be
presented, simply through a selection of the linear metric in the CORDIC
processors (exactly as in Example 5.2 above).
[Figure 5.18: Coordinate System for Givens Array]
Since the QR and LR algorithms are both iterative, it is desirable to reduce the original matrix, B, to a sparser structure, A, which requires less computation per iteration; for example, A may be tridiagonal or Hessenberg. Similarity transformations applied using Givens' rotations are a convenient method of achieving the desired structure, and since array architectures have already been presented for Givens' method, the sequel will assume that A is in tridiagonal form.
Let $A_0 = A$. This will be iteratively modified, so let $A_k$ denote the matrix at the k-th iteration. Then the QR algorithm is described as follows:
Step 1: Form the orthogonal-upper triangular decomposition of $A_k$, i.e. $A_k = Q_k^T R_k$, with $Q_k$ orthogonal and $R_k$ upper triangular. This step can be accomplished with Givens' method as discussed in an earlier section.
Step 2: Form $A_{k+1} = R_k Q_k^T$ and repeat the procedure.
Consider one step of the procedure so that the iteration indices may be dropped. The orthogonal matrix, Q, may be written as the product of the Givens rotation matrices, i.e.:
$$Q A = Q_n Q_{n-1} \cdots Q_1 A = R$$
(notice that a transposition of Q, as in step 1, is not required since it is used on the left hand side of the equation here), while the product in step 2 may be cast into the form:
$$R Q = R\, Q_n Q_{n-1} \cdots Q_1$$
It is convenient to rewrite the product as:
$$(RQ)^T = Q_1^T Q_2^T \cdots Q_n^T R^T$$
This latter form is quite simple to implement since a premultiplication by $Q_j^T$ is a rotation through the negative of the corresponding angle. The angle is already known from the decomposition phase of the algorithm so this rotation is readily performed by redefining the direction of rotation of the CORDIC algorithm, as discussed in Section 4.1. The tridiagonal structure of A makes the formation of RQ from $(RQ)^T$ an easy task involving only local communications between processors.
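One iteration of the procedure can be sketched in software. This is an illustrative floating point sketch, not the CORDIC array itself. It assumes the standard unshifted QR step, in which the rotations found during the decomposition are stored and then reapplied, transposed, on the right, so that the result is a similarity transform of A. The function names and the exact transposition convention are assumptions of this sketch.

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r


def rotate_rows(M, j, c, s):
    """Premultiply M by the rotation acting on rows j and j+1."""
    for k in range(len(M[0])):
        top, bottom = M[j][k], M[j + 1][k]
        M[j][k] = c * top + s * bottom
        M[j + 1][k] = -s * top + c * bottom


def rotate_cols(M, j, c, s):
    """Postmultiply M by the transpose of the same rotation (columns j, j+1)."""
    for row in M:
        left, right = row[j], row[j + 1]
        row[j] = c * left + s * right
        row[j + 1] = -s * left + c * right


def qr_step(A):
    """Step 1: reduce A to upper triangular R, keeping each (c, s) pair.
    Step 2: reapply the stored rotations on the right to form the next iterate."""
    R = [row[:] for row in A]
    stored = []
    for j in range(len(R) - 1):
        c, s = givens(R[j][j], R[j + 1][j])
        rotate_rows(R, j, c, s)
        stored.append((c, s))
    for j, (c, s) in enumerate(stored):
        rotate_cols(R, j, c, s)
    return R


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0, 0.0],
         [1.0, 3.0, 1.0, 0.0],
         [0.0, 1.0, 2.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]
    for _ in range(50):
        A = qr_step(A)
    print([round(A[i][i], 6) for i in range(4)])  # diagonal approaches the eigenvalues
```

Because the angles are computed once and reused, the second loop needs no further angle calculation, which mirrors the remark that the angles are already in place after the decomposition phase.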
Consider an example with n = 4. The initial matrix, A,
$$A = \begin{bmatrix} a_{11} & a_{12} & 0 & 0 \\ a_{21} & a_{22} & a_{23} & 0 \\ 0 & a_{32} & a_{33} & a_{34} \\ 0 & 0 & a_{43} & a_{44} \end{bmatrix}$$
is reduced to:
$$R = \begin{bmatrix} r_{11} & r_{12} & r_{13} & 0 \\ 0 & r_{22} & r_{23} & r_{24} \\ 0 & 0 & r_{33} & r_{34} \\ 0 & 0 & 0 & r_{44} \end{bmatrix} \quad \text{and} \quad R^T = \begin{bmatrix} r_{11} & 0 & 0 & 0 \\ r_{12} & r_{22} & 0 & 0 \\ r_{13} & r_{23} & r_{33} & 0 \\ 0 & r_{24} & r_{34} & r_{44} \end{bmatrix}$$
in step 1. A linear array of processors suffices for this computation since it has only O(n) complexity owing to the tridiagonal structure of A. Only three processors, each one dedicated to a particular rotation angle, are required with an implementation like Figure 5.8, where the angles are in place.
Remark: The dual array is not a good alternative here, since each processor performs only three calculations owing to the sparseness of A. The dual array (like Figure 5.6) would be preferable if A were less sparse, e.g. Hessenberg. The tradeoff is obvious. The load is split between the initial reduction of B to A and the iterations of the QR method. This example brings out the utility of dual arrays. The initial structure of A has a profound impact on which array is preferable.
Once the QR decomposition has been completed, the angles are in place and the processor outputs may be fed back to compute the second step of the iteration. Bidirectional connections are required between neighbours in the array, owing to the need for transposing sparse matrices in this second step. An activity chart of the array during step 1 is shown in Figure 5.19, Figure 5.20 for step 2 and finally Figure 5.21 for the combination of step 2 pipelined behind step 1. The beginning of step 1 of the next iteration is also shown, as are the data movements. Notice that only local communication is required.
Finally, note that improving the convergence of the iterative QR procedure often requires shifting the diagonal elements of $A_k$ at each step [SB80], involving some additions. These may be readily performed with the arithmetic unit of the CORDIC block. Furthermore, an arithmetic test facility which influences the memory management unit proves useful in detecting when the problem splits into smaller problems.
Figure 5.19: Eigenvalue Decomposition - QR Decomposition
(activity chart for t = 1 to 6 showing the inputs and outputs of each CORDIC processor)
Figure 5.20: Eigenvalue Decomposition - RQ Calculation
Figure 5.21: Eigenvalue Decomposition - Activity Chart
(Processors 1 to 3: QR decomposition, (RQ)^T calculation, next QR decomposition)
Large arrays of processors were presented for the matrix operations that are common in many areas including signal processing. A new ladder structure for the fast Cholesky algorithm by rows was developed which was readily implemented on a linear array of CORDIC processors. It was noted that the Levinson in ladder form and fast Cholesky by rows and columns algorithms are all equivalent under pipelining and hence enjoy the same implementation.
Linear and triangular array architectures were presented for Givens method and the notion of dual arrays was given. It was shown that the activity chart of the linear array provides a useful tool for the systematic construction of arrays of higher dimension. This idea was formalized to include algorithms that satisfy a particular model, thus guaranteeing an array that is generally applicable within a class of problems. Some ideas of real analysis were cast into a framework appropriate for analyzing array computations, thus allowing a convenient method for analyzing the performance of a particular program/architecture combination. Although simple examples were analyzed, this technique is believed to be the correct approach for attacking more general problems in which program and communication costs are different and not necessarily integer valued. Different program segments and different data paths may have widely varying associated costs, as in a distributed sensor network (see [CB79], for example). An implicit assumption in this work was that only nearest neighbour connections are available; however, many interesting possibilities for generalization exist. For example, if data needs to be transferred to a distant processor, what is the impact of accomplishing this transfer in a small number of steps, each of which is larger than simply the distance to the nearest neighbour?
Finally, an array architecture was given for the celebrated QR algorithm. The utility of dual arrays was brought out in this example.
BIBLIOGRAPHY
[CB79] D. Cohen, J. Barnett, Y. Yemini, D. Schwabe, "DSN - Distributed Sensor Networks," Information Sciences Institute, Univ. of Southern California, Working paper ISI/WP-12, April, 1979.
[Ch75] S.C. Chen, "Speedup of Iterative Programs in Multiprocessor Systems," Ph.D. Dissertation, University of Illinois at Urbana-Champaign, Dept. of Computer Science, January, 1975.
[De82] J.M. Delosme, "Algorithms for Finite Shift-Rank Processes," Ph.D. Dissertation, Stanford University, Dept. of Electrical Engineering, June 1982.
[DM80] J.M. Delosme, M. Morf, "A Tree Classification of Algorithms for Toeplitz and Related Equations Including Generalized Levinson and Doubling Type Algorithms," Proc. 19th IEEE CDC, December 10-12, 1980, pp. 42-46.
[DM81] J.-M. Delosme, M. Morf, "Scattering Arrays for Matrix Computations," Proc. of the 25th Int'l. Tech. Symp. of SPIE, San Diego, CA, August, 1981.
[KL80] H.T. Kung, C. Leiserson, "Highly Concurrent Systems," in Introduction to VLSI Systems (Mead and Conway), Addison-Wesley, 1980.
[KR81] S.-Y. Kung, D. Rao, "Highly Parallel Architectures for Solving Linear Equations," Proc. of Int'l. Conf. on Acoustics, Speech and Signal Processing, Atlanta, GA, March, 1981, pp. 39-42.
[Ku77] D. Kuck, "A Survey of Parallel Machine Organization and Programming," ACM Computing Surveys, Vol. 9, No. 1, March 1977.
[Ku79] H.T. Kung, "Let's Design Algorithms for VLSI Systems," Proc. of 1st Caltech VLSI Symposium, pp. 65-90, January 1979.
[KuS80] S.Y. Kung, "VLSI Array Processor for Signal Processing," Proc. of 1st MIT Conf. on Advanced Research on Integrated Circuits, January 28-30, 1980.
[LeRG77] LeRoux, C. Gueguen, "A Fixed Point Computation of Partial Correlation Coefficients in Linear Prediction," Proc. 1977 ICASSP, pp. 742-743.
[MD81] M. Morf, J.-M. Delosme, "Matrix Decompositions and Inversions Via Elementary Signature-Orthogonal Transformations," ISSM Int'l. Symp. on Mini and Microcomputers in Control and Measurement, San Francisco, CA, May, 1981.
[Mo74] M. Morf, "Fast Algorithms for Multivariable Systems," Ph.D. Dissertation, Electrical Engineering Dept., Stanford University, Stanford, CA, 1974.
[MLNV77] M. Morf, D. Lee, J. Nickolls, A. Vieira, "A Classification of Algorithms for ARMA Models and Ladder Realizations," Proceedings of the 1977 IEEE Intl. Conf. on Acoustics, Speech and Signal Processing,
[Mu71] Y. Muraoka, "Parallelism Exposure and Exploitation in Programs," Ph.D. Dissertation, Dept. of Comp. Sci., Univ. of Illinois at Urbana, 1971.
[Ro68] M. Rosenlicht, Introduction to Analysis, Scott, Foresman and Co., 1968.
[SB80] J. Stoer, R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag, NY, 1980.
[Sc81] R. Schmidt, "A Signal Subspace Approach to Multiple Emitter Location and Spectral Estimation," Ph.D. Dissertation, Dept. of Electrical Engineering, Stanford University, 1981.
[SK75] A.H. Sameh, D. Kuck, "Linear System Solvers for Parallel Computers," Technical Report 75-701, University of Illinois at Urbana-Champaign, Dept. of Computer Science, February, 1975.
[VT68] H.L. Van Trees, Detection, Estimation and Modulation Theory, John Wiley and Sons, New York, 1968.
CHAPTER SIX
A LADDER FORM CHIP SET
It is now possible to consider the design of a VLSI chip or chip set which would be suitable for the implementation of the algorithms of Chapters Three and Four. Such a chip would have a wide range of applications, certainly as many possibilities as there are for ladder forms. In addition, the chip set should be useful for other signal processing such as the DFT of Chapter Two and the matrix operations of the previous chapter. A critical examination has revealed that CORDIC operations provide a natural description of all these algorithms, hence they should map quite efficiently onto a general purpose CORDIC processor. However, the ladder filter formulation of Chapter Three (Equations 3.4-3.6) appears to pose the most interesting challenge, so this chapter will concentrate on implementing the square root normalized ladder recursions for the analysis of 8 kHz sampled speech as a target application. In many ways, this is a very real problem of interest in the industry today. Furthermore, the resulting architecture will be more interesting than simply a general purpose CORDIC processor and it will also be capable of readily implementing the remaining algorithms.
6.1 IMPLEMENTATION OF THE NORMALIZED LADDER EQUATIONS
Recall that in Chapter Three, ladder filters provided a convenient structure for speech analysis. The square root normalized ladder recursions were considered preferable to their unnormalized counterparts for implementation because the equations were fewer in number and all variables were less than unity in magnitude, thus making fixed point implementation viable. The equations were expressed in terms of generalized rotations in order to reduce the complexity of the algorithm. This section will pursue the implementation of Equations 3.4-3.6 using the CORDIC algorithms.
Consider first the matrix product VAN (Equation 3.4) which corresponds to rotations of the column vectors of A through $-\vartheta_\nu$ and $\vartheta_\eta$. Prior to performing a rotation, it is necessary to compute the angle (e.g. $\vartheta_\nu$ from $\nu$). This operation represents considerable overhead since one CORDIC operation is required to calculate $\nu^c$ and then a second operation is needed for calculating $\tan^{-1}$ in order to compute $\vartheta_\nu$ from $\nu$. Due to the sparseness of A it is preferable to implement the rotation by $\vartheta_\eta$, i.e. the product AN, with straightforward multiplication (also using a CORDIC processor), thereby avoiding the need to calculate $\vartheta_\eta$. However, the product AN is no longer sparse and the rotation of its columns through $-\vartheta_\nu$ is most efficiently realized as a vector rotation.
A two processor realization of one ladder filter stage is shown in Figure 6.1. Here, it is assumed that both processors can compute all the CORDIC functions, the particular function being selected by a set of control signals which represent the value of 'm' and identify the zero forced variable z or y in the CORDIC algorithm. Furthermore, the spurious scale factors which appear in the CORDIC recursions are assumed to be normalized out and the convergence region of the algorithms is assumed to be extended by one of the methods of Chapter Four.
Each time slot in Figure 6.1 corresponds to one complete CORDIC operation and is referred to as a "macrocycle". It is further subdivided into many "microcycles", each of which corresponds to one iteration of the CORDIC algorithm.
Figure 6.1: CORDIC Implementation of Square Root Ladder Form
(macrocycle schedule for the two CORDIC blocks, giving the hyperbolic, linear or circular mode used in each time slot)
During T = 1, 2 the first CORDIC block computes the matrix product AN (which is straightforward due to the simple structure of A). Simultaneously, the second block prepares for the rotation through $-\vartheta_\nu$ (or multiplication by V) by computing the angle as
$$\vartheta_\nu = \tan^{-1} \frac{\nu}{\nu^c}$$
Then the two column vectors of AN are rotated through $-\vartheta_\nu$, one column in each processor. Subsequently, the final J rotation (Equation 3.6) is performed during T = 5 after computing its "angle" $\vartheta_\rho$ as
$$\vartheta_\rho = \tanh^{-1} \rho_+$$
The updated variables $\rho_+$, $\nu_+$ and $\eta_+$ appear in various time slots during the computation.
The ladder recursions are readily implemented with two "perfect" CORDIC processors in five CORDIC operations. This raises three issues. First, is it possible to design a "perfect" CORDIC, i.e., one which does not suffer from the scaling and convergence problems of the CORDIC algorithms which were pointed out in Chapter Four? Secondly, can these processors be made to operate sufficiently fast for the target application? Finally, how should the chip itself be designed?
The latter point will be examined first and a detailed answer to the second question will naturally arise. Note at the outset that one of the contributions of Chapter Four was to provide extremely low overhead solutions to the scaling and convergence shortcomings of the CORDIC algorithms, thus answering the first point.
6.2 LADDER FORM CHIP ARCHITECTURE
Prior to embarking on a chip design, it is necessary to establish the design constraints, which are now itemized:
1) The chip will be fabricated in a single polysilicon, dual threshold, n-channel silicon gate MOS process. This decision is based on the relatively good availability of such processes as well as their omnipresence in the academic VLSI community.
2) A single chip should be capable of implementing at least one and preferably more stages of a ladder filter with an input rate of 8000, 16 bit samples per second. Multiple chips should be easily interfaced, providing protocol free data transfer, so that filters of large order and multiprocessor arrays (data flow architectures) may be readily constructed.
3) Fixed point CORDIC algorithms will be utilized since many signal processing algorithms, especially square root normalized ladder forms, are conducive to that.
4) The chip should be microprogrammed for flexibility and design ease. This will allow rapid modification of the control program in applications other than speech analysis.
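The throughput constraint can be turned into a rough timing budget. The sketch below is only a back-of-the-envelope aid: it takes the 8 kHz speech target and the five CORDIC operations per ladder stage from the text, and it assumes sixteen microcycles per macrocycle, which is an illustrative figure rather than a value fixed at this point. The function name is hypothetical.

```python
def microcycle_budget_us(sample_rate_hz, stages_per_chip,
                         macrocycles_per_stage, microcycles_per_macrocycle):
    """Time available per CORDIC iteration, in microseconds, if every stage
    on the chip must finish within one input sample period."""
    sample_period_us = 1e6 / sample_rate_hz
    iterations_per_sample = (stages_per_chip
                             * macrocycles_per_stage
                             * microcycles_per_macrocycle)
    return sample_period_us / iterations_per_sample


if __name__ == "__main__":
    # 8 kHz input, 5 macrocycles per stage, 16 microcycles each (assumed).
    for stages in (1, 2, 4, 8):
        print(stages, microcycle_budget_us(8000, stages, 5, 16))
```

Each additional stage computed by the same chip divides the time available per iteration, which is why the achievable microcycle time determines how many stages one chip can carry.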
These constraints suggest the general structure of Figure 6.2 which consists of two CORDIC processors, some scratchpad memory, input/output (I/O) and a microprogram controller. Whether or not to include the controller on the same chip as the arithmetic facility is an interesting issue. (Arithmetic facility here refers to the CORDIC blocks, some scratchpad area, I/O and bus structure.) An onboard controller is attractive in many applications but it makes chip testing difficult. Furthermore, a ladder filter constructed of many of these chips can be operated with a single controller since each chip in the ladder performs the same operations. For development purposes, the decision was made to have a separate controller; however, the option to integrate it at a later date remains.
Figure 6.2: Dual-CORDIC Chip Architecture
(CORDIC Block 1, CORDIC Block 2, input port, scratchpad registers, output port and microprogram controller)
Only three scratchpad locations are actually required to execute one stage of the ladder algorithms, as indicated in Figure 6.1; however, a total of eight will be included in the prototype chip to allow for applications that are more storage intensive.
The input and output ports are completely synchronous ports without any handshake protocol. Such ports are most natural for ladder forms and the arrays of Chapter Five, which are inherently data flow architectures with local connectivity. Multiple chips may be readily connected without upsetting the ladder data flow. Since many real time signal processing applications are of a data flow variety, the I/O structure is quite generally applicable.
The implementation of a single CORDIC processor will be considered in detail next. The architectural definition of the arithmetic facility will then be completed by specifying the bus structure. Discussion of the controller is deferred to a later section.
6.3 DESIGN OF A CORDIC PROCESSOR
In focussing attention on a single CORDIC processor of Figure 6.2, the following questions must be resolved.
1) How are the scaling and convergence problems of the CORDIC algorithms to be solved?
2) The CORDIC iterations involve three equations. Will these be executed concurrently or in sequence?
3) Is the arithmetic in the CORDIC equations to be done bit serially or bit parallel?
4) How are the truncation errors that occur in the CORDIC iterations handled?
All arithmetic will be two's complement in the interest of simplicity and speed. Item (4) is readily handled by performing all internal arithmetic to 20 bits, since the additional four guard bits guarantee zero error due to truncation in sixteen CORDIC iterations [Wa71]. Items (2) and (3) have a profound impact on chip area and throughput. Parallel execution of equations and arithmetic will clearly result in higher chip throughput but also in more chip area and hence reduced yield. However, for a given filter of large order, fewer chips of higher throughput would be required since more stages of the filter could be computed by a single chip compared with a single, serially based, lower throughput chip. The bit serial arithmetic realizations offer potential advantages of reduced power and pinout. A specific configuration will clearly be chosen based on the problem to be solved.
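The choice of four guard bits can be illustrated with a small calculation. The sketch assumes the usual rule of thumb, which is taken here to be the reasoning behind the [Wa71] citation, that each iteration perturbs the result by at most about one least significant bit, so that L iterations call for roughly log2(L) extra bits. The function names are hypothetical.

```python
def guard_bits(iterations):
    """Smallest g with 2**g >= iterations, i.e. ceil(log2(iterations)),
    computed with integer arithmetic to avoid floating point rounding."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    return (iterations - 1).bit_length()


def internal_word_length(result_bits, iterations):
    """Internal word length needed to keep `result_bits` free of accumulated
    truncation error under the one-LSB-per-iteration assumption."""
    return result_bits + guard_bits(iterations)


if __name__ == "__main__":
    print(guard_bits(16), internal_word_length(16, 16))
```

For sixteen iterations and sixteen bit data this reproduces the 20 bit internal word length quoted above.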
Returning now to item 1) above, recall that some existing as well as new solutions to the convergence and scaling problems were studied in Chapter Four. These ranged from simple ideas to elaborate schemes of prerotations and scaling cycles which involved special control and hardware. The importance of this choice is not to be understated in view of its marked effect on chip and control complexity as well as throughput. Clearly the superior scheme is the new one of Section 4.2 since it involves no hardware or control overhead and incurs only a minimal speed penalty. Furthermore, since the scheme relies only on the standard CORDIC recursions, no special operational capability beyond the shift and add requirement need be built into each CORDIC processor. The sequence of Example 4.2 will be employed with m = 1 and a similar sequence can be generated for m = -1.
Given this decision, various architectures for the arithmetic facility will be studied next.
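6.3.1 The Fully Parallel CORDIC Block
The CORDIC equations are recalled for convenience:
$$\begin{bmatrix} x_{i+1} \\ y_{i+1} \end{bmatrix} = \begin{bmatrix} 1 & m \mu_i \delta_i \\ -\mu_i \delta_i & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix}$$
$$z_{i+1} = z_i - \varepsilon \mu_i \alpha_i$$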
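The recursions can be exercised in software. The sketch below is a floating point illustration of the circular case (m = 1) only. It assumes a shift sequence of 2^-i for i = 0, 1, ..., fixes one sign convention for the z update, and divides out the accumulated gain at the end instead of using the scaling scheme of Chapter Four. The function name and the mode strings are hypothetical.

```python
import math


def cordic_circular(x, y, z, mode, iterations=16):
    """Circular CORDIC. mode "vector" drives y toward 0 and accumulates the
    angle in z; mode "rotate" drives z toward 0 and rotates (x, y) by z."""
    gain = 1.0
    for i in range(iterations):
        delta = 2.0 ** -i
        alpha = math.atan(delta)          # value a ROM would supply
        if mode == "vector":
            mu = 1.0 if y >= 0 else -1.0  # direction chosen from the sign of y
        else:
            mu = -1.0 if z >= 0 else 1.0  # direction chosen from the sign of z
        # All three updates use the old x and y, as in the concurrent equations.
        x, y = x + mu * delta * y, y - mu * delta * x
        z = z + mu * alpha
        gain *= math.sqrt(1.0 + delta * delta)
    return x / gain, y / gain, z


if __name__ == "__main__":
    print(cordic_circular(3.0, 4.0, 0.0, "vector"))  # magnitude and angle of (3, 4)
    print(cordic_circular(1.0, 0.0, 0.5, "rotate"))  # (1, 0) rotated by 0.5 rad
```

Only shifts (multiplication by a power of two), additions and a table lookup appear inside the loop, which is the property the hardware exploits.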
Suppose that all three CORDIC equations are executed concurrently using bit parallel arithmetic. This configuration will be known as the "fully parallel" approach and it exhibits the highest throughput of all the architectures to be considered while occupying the most area. Since the chosen scaling technique makes use of the standard CORDIC equations, the major logic components required are adders and two's complement parallel scalers. Figure 6.3 shows the fully parallel architecture whose register transfer language (RTL) description is:
t_i: Φ1: BUF_X <- X ; BUF_Y <- Y ; BUF_Z <- Z
         SHFT_X <- BUF_Y ; SHFT_Y <- BUF_X ; ZAU_a <- ROM
     Φ2: X <- XAU ; Y <- YAU ; Z <- ZAU
where t_i is the i-th microcycle and SHFT_X, SHFT_Y refer to the scalers in the X and Y channels respectively (note that control information is assumed to be valid throughout the microcycle).
Figure 6.3: The Fully Parallel CORDIC Block
(one +/- arithmetic unit in each of the three channels)
This description is deliberately written in a two phase clock format as this is the realization chosen for the architecture, although it is clearly quite a general description because Φ1, Φ2 may be viewed as the active and inactive phases of some clock 'C'.
During the first phase, Φ1, data is transferred from the data register, through the buffers to the scalers and arithmetic units. This is the critical timing path. At the end of the second phase, the new results which are emerging from the AU's are written back into the data registers, completing one iteration of the CORDIC equations. Since data is only written back on Φ2, the return path can be precharged on Φ1, resulting in a speed advantage. The z-channel is particularly simple since one of its operands, α, is supplied by a ROM. A control signal based on either the sign of y or z is used to select the direction of rotation at each iteration, thus realizing the function to be computed. For example, in computing the arctan function, the goal is to drive $y_n \to 0$, thus making the sign of y the criterion for selecting the direction of rotation. Notice that an additional signal is required for the z-channel to define the direction of rotation, based on ε (see Section 4.1).
Each channel (i.e. X, Y, Z) of the fully parallel architecture has dedicated busses for the read and write functions. In fact, a single bus may be shared since these functions are on alternate clock phases. Unfortunately, precharging is then precluded with a two phase clock scheme. Whether or not precharging would realize a significant speed advantage is a question which can only be answered through circuit simulation and a detailed knowledge of the process parameters (the latter is frequently not available in a university environment). Since the mixed parallel-serial approaches of the next sections are of primary interest in this thesis, the detailed development of the fully parallel system is left to the reader.
Some development and layout was however necessary in order to obtain chip size estimates for comparison purposes. These will be discussed later; it suffices to state that they were based on the use of pseudo-static registers, refreshed once per clock, a barrel shifter [MC80] augmented to propagate sign for the arithmetic scaling, and a fully active, two's complement arithmetic unit. (An active carry circuit provides the ability to propagate the carry signal rapidly. In contrast, the Manchester carry chain [MC80] performs poorly in propagating high levels.) A cute design of the AU is possible since this unit is only required to add and subtract. Addition can be simply done with one full adder [Pe72] per bit position and a half adder in the least significant bit position. However, by making the latter a full adder also, a two's complement addition (i.e. a subtraction) is readily obtained. The subtrahend is logically complemented and added to the other operand, yielding a one's complement addition. Forcing a carry into the least significant position results in a two's complement operation (and this avoids the need to wait for the end around carry [Hw79] required by one's complement addition).
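The add/subtract trick described above can be modelled at the bit level. The following is a behavioural sketch, not the circuit of the thesis: it chains one full adder per bit, complements the second operand when subtracting, and forces a carry into the least significant position. The 20 bit default width follows the internal word length quoted earlier; the function names are hypothetical.

```python
def full_adder(a, b, carry_in):
    """One bit slice: returns (sum, carry_out) for single-bit inputs."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def add_sub(a, b, subtract, width=20):
    """Ripple through `width` full adders. For a subtraction each bit of b is
    inverted and the carry into the least significant bit is forced to 1."""
    invert = 1 if subtract else 0
    carry = invert
    result = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = ((b >> i) & 1) ^ invert
        sum_bit, carry = full_adder(a_bit, b_bit, carry)
        result |= sum_bit << i
    return result


def to_signed(value, width=20):
    """Interpret a width-bit pattern as a two's complement integer."""
    return value - (1 << width) if value & (1 << (width - 1)) else value


if __name__ == "__main__":
    print(to_signed(add_sub(5, 3, subtract=True)))
    print(to_signed(add_sub(3, 5, subtract=True)))
```

The same adder chain serves both operations; only the inversion and the forced carry differ, so no end around carry has to be awaited.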
A logic diagram of the AU is given in Figure 6.4. Notice that an inverter has been included to form the one's complement of an operand during a subtraction. Strictly speaking, this can be eliminated through the addition of a single coupler in the register cell, which enables the complement of the register contents onto the bus (Figure 6.5). Unfortunately, the addition of a control line in each register to activate this coupler consumes more area than the extra inverter.
Figure 6.4: Bit Slice of Arithmetic Unit
Figure 6.5: A Register Cell
(control lines WRITE, REFRESH, READ NEGATIVE and READ)
In addition to being a specific realization of the CORDIC block, the fully parallel structure also serves as a functional architecture for the other configurations to be studied.
6.3.1.1 Pipelining
The individual CORDIC iterations may be pipelined as shown in Figure 6.6, which is particularly convenient in a data flow architecture. This method essentially leads to a distributed scaler (unlike Figure 6.3) where a small shifter, supporting one or two shift values, is built for each iteration of the algorithm. Interestingly, such a realization is likely to be smaller in an MOS technology than that of Figure 6.3, since the select lines to the scaler are no longer needed. Intermediate storage required between iterations is readily accommodated by the pseudostatic node storage afforded by the MOS technology.
The computation rate (clock rate) of the structure of Figure 6.6 may now be selected through a combination of throughput and latency requirements.
6.3.2 The Parallel-Serial CORDIC Block

"Parallel-Serial" refers to the realization in which the CORDIC
equations are executed sequentially, but with bit parallel arithmetic.
With reference to the architecture shown in Figure 6.7, the RTL description
is:

    C1  φ1: BUF <- X; SHFT <- Y
        φ2: TEMP <- AU      /* TEMP is a scratchpad register */
    C2  φ1: BUF <- Y; SHFT <- X
        φ2: Y <- AU
    C3  φ1: BUF <- Z; AU <- ROM(α_i); X <- TEMP
        φ2: Z <- AU

[Figure 6.6]

Figure 6.7: The Parallel-Serial CORDIC Block

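The role of the TEMP buffer in the schedule above can be illustrated with a short Python sketch. It is only a behavioural model under stated assumptions: integer operands, Python's arithmetic right shift standing in for the scaler, the circular-mode sign convention, and hypothetical function names (`parallel_step`, `parallel_serial_step`, `unbuffered_step`) that are not taken from the original design.

```python
def parallel_step(x, y, z, i, d, alpha):
    """Reference: all three CORDIC equations evaluated at once (d is +1 or -1)."""
    return x - d * (y >> i), y + d * (x >> i), z - d * alpha


def parallel_serial_step(x, y, z, i, d, alpha):
    """Same iteration as three sequential passes through one adder."""
    # Cycle 1: BUF <- X, SHFT <- Y, result parked in TEMP (X is NOT overwritten).
    temp = x - d * (y >> i)
    # Cycle 2: BUF <- Y, SHFT <- X, result written to Y using the OLD x.
    y = y + d * (x >> i)
    # Cycle 3: X <- TEMP while the adder combines Z with the angle constant.
    x = temp
    z = z - d * alpha
    return x, y, z


def unbuffered_step(x, y, z, i, d, alpha):
    """Incorrect variant: X is written early, so the y update sees the new x."""
    x = x - d * (y >> i)
    y = y + d * (x >> i)
    z = z - d * alpha
    return x, y, z


if __name__ == "__main__":
    args = (1000, -300, 50, 2, 1, 7)  # illustrative values only
    print(parallel_step(*args))
    print(parallel_serial_step(*args))
    print(unbuffered_step(*args))
```

With these illustrative operands the buffered schedule reproduces the fully parallel result, whereas the unbuffered variant produces a different y because the shifted operand is taken from the already updated x.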
Notice that the data registers may be read on two busses so that both
AU operands are accessed simultaneously, saving considerable time over
sequential fetch schemes. Beyond the obvious component reduction, the
major difference between this scheme and the fully parallel approach is the
need to buffer the new value, $x_{i+1}$, until $y_{i+1}$ has been computed, since
the latter quantity requires the value $x_i$. This buffer could be either one of
the scratchpad locations or an additional location built adjacent to the X
register. The eventual transfer of $x_{i+1}$ from the buffer to the X register is
done on φ1 of C3. Whereas previously all writing was performed during φ2,
making this transfer on φ1 saves an entire clock cycle, i.e., it is not
necessary to wait for the subsequent φ2 occurring in C4. However, if the
buffer were one of the scratchpad registers, being transferred to X on
WRBUS, then precharging of this bus during φ1 is necessarily precluded, at
least during C3. This imposes a severe size penalty on the arithmetic unit
and scratchpad registers since they are then required to pull the bus in
both directions. A preferable solution is to build a small, slow buffer
register adjacent to the X register with a short, dedicated transfer path.
Since WRBUS will then no longer be required for TEMP -> X, it may be
precharged as before. The short transfer path to X may be precharged on
φ2 and used on φ1, or vice versa, the latter being preferable since WRBUS
and the dedicated transfer path to X may be precharged with the same
signal. An alternate approach is to not precharge the transfer path at all
and utilize the entire clock cycle to effect the transfer. Precisely which
option is preferable can be ascertained through circuit simulation for the
specific process being used.

This realization of the CORDIC block has the logic built under a three
bus structure for most effective area utilization. Both AU operands are
accessed simultaneously, via direct drive of metal busses through couplers,
while the third bus is used to write the result back into one of the registers.
Parallel access of the AU operands results in the best throughput.

The bus structure for a single bit path is shown in Figure 6.8. MOS
technology affords a particularly noteworthy method of bus control through
the use of couplers. The scratchpad area is also neatly accessed by
effectively distributing the multiplexing to be local to each register. All
registers and the arithmetic unit are similar to those of the parallel
realization.

6.3.3 The Serial-Parallel Realization

When bit serial arithmetic is employed to compute all three CORDIC
iterations concurrently, the realization is termed serial-parallel. While
this and the fully serial approach of the next section are included here for
completeness, it is to be emphasized that these were developed by Peng Ang
[An80] based on the functional architecture of Section 6.5.1.

The details of the CORDIC block are shown in Figure 6.9. The I-register
indicated in that figure acts as a controller to the $2^{-i}$ scaler. It selects
which of the 20 bits in the X and Y registers get fed directly to the full
adder. It is pertinent to note that the complexity of the scaler of a parallel
implementation is completely bypassed in the serial arithmetic approach.

Figure 6.8: Bus Structure of Parallel-Serial Architecture

Figure 6.9: The Serial-Parallel CORDIC Block

Timing is hierarchically organized into three levels. These comprise
(a) the macro (or T) cycles
(b) the iteration (or I) cycles
(c) the serial (or S) cycles.
There are twenty I-cycles nested within each macro cycle,
corresponding to the iterations of the serial arithmetic on 20 bit quantities.
Further embedded within each I-cycle are twenty S-cycles for the
individual CORDIC iterations. At this level, the iterations involving the X, Y
and Z registers of the respective CORDIC blocks are serially computed and
recirculated back into their X', Y' and Z' registers. At the start of a new
iteration cycle, the controller invokes a parallel load from the primed
registers to their non-primed counterparts.
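How bit selection can take the place of a scaler in serial arithmetic may be seen in the following Python sketch. It is a simplified model rather than the circuit of Figure 6.9: the 20 bit word length is from the text, while the LSB-first ordering, the sign extension rule and the function names (`to_bits`, `from_bits`, `serial_add_shifted`) are assumptions made for illustration.

```python
WIDTH = 20  # word length of the X, Y and Z quantities quoted in the text


def to_bits(value, width=WIDTH):
    """Two's complement bits of value, least significant bit first."""
    return [(value >> j) & 1 for j in range(width)]


def from_bits(bits):
    """Inverse of to_bits; the last bit is interpreted as the sign bit."""
    raw = sum(b << j for j, b in enumerate(bits))
    return raw - (1 << len(bits)) if bits[-1] else raw


def serial_add_shifted(x, y, shift, width=WIDTH):
    """Compute x + (y >> shift) one bit at a time through a single full adder."""
    xb, yb = to_bits(x, width), to_bits(y, width)
    carry, out = 0, []
    for j in range(width):
        # No barrel shifter: simply pick bit j+shift of y, repeating the
        # sign bit once the index runs past the top of the word.
        yj = yb[min(j + shift, width - 1)]
        total = xb[j] + yj + carry
        out.append(total & 1)   # sum bit recirculated into the result register
        carry = total >> 1      # carry held for the next serial cycle
    return from_bits(out)


if __name__ == "__main__":
    print(serial_add_shifted(1000, -300, 2))
```

The shift amount only changes which bit of the second operand is presented to the adder in each serial cycle, which is why the scaler hardware of the parallel realizations has no counterpart here.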
6.3.4 The Fully Serial CORDIC Block

Bit serial arithmetic is used to compute the three CORDIC equations in
sequence in the fully serial block. It is essentially a third of the
serial-parallel approach and so will not be covered in any detail (refer to
[AMLA81]). It will be seen that for the target application, this structure does
not exhibit sufficient throughput in a current conservative nMOS technology.

6.4 ARCHITECTURAL TRADEOFFS

A COMPARISON OF THE CORDIC REALIZATIONS
Some statistics on the four CORDIC block realizations, particularly size,
have already been mentioned. These were based on sixteen bit scratchpad
registers and input samples with twenty bit quantities X, Y, Z; the additional
four bits being added to the CORDIC blocks to reduce the effects of roundoff
at each step of the iteration. A more complete set of performance figures
is given in Figure 6.10. Comparisons with the Speak and Spell synthesizer
by Texas Instruments Inc. [WB78] as well as with the Bell Laboratories Echo
Canceller chip [CD80] are also included in that figure. Some of the
quantities in the table are quite subjective so the reasoning behind them will
be given here.

Arithmetic Area: This area corresponds to the two CORDIC blocks and
the scratchpad registers. In the case of Speak and Spell, this is the
area of the synthesizer chip without the integrated digital to analog
converter. Unfortunately, it was not possible to separate out the
arithmetic area from that occupied by the control functions for the
echo canceller chip. Size estimates for the CORDIC chips are based on
a five micron nMOS technology. It is seen that all four CORDIC
realizations are smaller than the commercial chips by a considerable
margin. Interestingly, the fully parallel CORDIC realization is less than
twice the size of the serial-parallel technique. The latter method
requires some buffer registers which are not necessary in the parallel
implementation, as is evident from Figures 6.6 and 6.8. Hence, the
parallel approach puts this area to good use, resulting in a much
smaller area difference between the two than might have been
originally speculated.

It has been noted that the CORDIC speech analysis chips are smaller
than both commercial efforts, but it is only fair to mention that
industrial designs are required to be robust to the processing variation
of the fabrication facility. Consequently, they are based on more
pessimistic design parameters than the typical parameters utilized in
the design of the ladder form chips. While this accounts for a part of
the size discrepancy, the lion's share is certainly attributed to the fact
that the CORDIC implementation is well matched to the algorithms
(such as the correct set of primitive operations), resulting in a compact
realization. This is especially true given that the ladder filter chips also
exhibit superior throughput compared with both Speak and Spell and
the echo canceller.

                     LADDER FORM SPEECH ANALYSIS CHIP
                   FULL      PARALLEL  SERIAL    FULL     SPEAK &      ECHO
                   PARALLEL  SERIAL    PARALLEL  SERIAL   SPELL        CANCELLER
                                                                       (Partially
                                                                       Dynamic)
ARITHMETIC AREA    1.76      1.43      1.0       0.7      3.1
  (mm^2)*          16        13        9         6.3      28           72
                                                          (Arithmetic  (Includes
                                                          Area)        Cntl)
# TRANSISTORS*     6000      4720      4000      3100     ~12000       35000
                                                          (Arithmetic  (Includes
                                                          Area)        Cntl)
CONTROL            1-2       1         1         >1       1-2          BIG
COMPLEXITY                                                (Synthesis
                                                          Only)
RELATIVE           20        6.67      1         0.33     12.5         126 (TAPS)/5
THROUGHPUT
MIN CLK RATE*      0.6 MHz   1.9 MHz   12.7 MHz  38 MHz   ~0.8 MHz     ~2 MHz
FOR 1 STAGE
(8 KHz SPEECH
ANALYSIS)
MICROPROGRAM       ~1        ~1        ~1        ~1       >1           (Random
COST                                                                   Logic)
RELATIVE           1.2       1         1         <1       >>1          >>1
DESIGN EFFORT

* All quantities except those asterisked are relative.
Figure 6.10: Performance Comparison of Various Architectures

Operational Speed (Minimum Clock Rate For One Stage Of The Analysis
Filter With 8 KHz Sampled Speech): The title of this
section being self explanatory, it is seen that if one chip were used per
stage of the ladder filter, the fully parallel chip could be clocked at a
very reasonable 600 KHz, about twenty times slower than the 12.7 MHz
clock rate required by the serial-parallel approach. This is quite
remarkable in view of the fact that the former is less than twice the size
of the latter. While 12.7 MHz appears to be quite a formidable clock
rate for current day nMOS technology, it is in fact quite reasonable for
the serial approaches which exhibit extremely short propagation paths
by virtue of their serial natures. In contrast, the fully serial approach
is not particularly useful since it requires a prohibitively high clock
frequency for the target five micron nMOS technology. It may well be
useful for lower speed applications or alternatively, if it were
implemented in a faster technology such as submicron nMOS (one could
argue this point forever, since in a submicron technology, the fully
parallel approach would be extremely desirable and favourable).

The two commercial chips are not really capable of speech analysis (at
least with the chosen algorithms) since they are unable to compute the
rather complex operations required in the normalized ladder form
recursions. While square roots could be computed using iterative
techniques such as Newton's method [SW65], the speed overhead would
be prohibitively large. However, a speed comparison with the analysis
chips will still be made on the basis that analysis involves five rotations
per stage of the filter. By comparison, synthesis involves a single
rotation per stage. Since Speak and Spell is capable of synthesizing
speech at 10000 samples/sec from a tenth order filter, it is effectively
capable of 12.5 rotations at 8 KHz or roughly two stages of the analysis
filter (we are really giving Speak and Spell the benefit of the doubt here
as already mentioned) at its operational clock rate, thus providing a
performance figure in Figure 6.10. Similarly, the echo canceller, which
consists of 126 tapped delay line stages, is considered to be capable of
64 rotations for 8 KHz data.
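The Speak and Spell estimate quoted above follows from simple rate arithmetic, which the Python sketch below reproduces. The function name `rotations_per_sample` is hypothetical, and the model assumes, as the text does, one rotation per synthesis stage and five rotations per analysis stage.

```python
ROTATIONS_PER_ANALYSIS_STAGE = 5  # figure used in the text for analysis


def rotations_per_sample(synth_rate_hz, filter_order, target_rate_hz):
    """Rotations available per input sample when run at target_rate_hz.

    Assumes one rotation per stage of the synthesis filter.
    """
    return synth_rate_hz * filter_order / target_rate_hz


rotations = rotations_per_sample(10_000, 10, 8_000)
analysis_stages = rotations / ROTATIONS_PER_ANALYSIS_STAGE

if __name__ == "__main__":
    print(rotations, analysis_stages)
```

On these assumptions the chip is credited with 12.5 rotations per 8 KHz sample, which corresponds to 2.5 analysis stages, in line with the "roughly two stages" stated above.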
Relative Throughput: An alternate way to compare the operational
speed of the chips under consideration is to examine how many ladder
stages (for 8 KHz samples) could be computed by the four CORDIC
realizations at the same clock rate. That they can in fact be clocked at
the same rate arises from their compact and regular layouts. Hence,
even the bit parallel approaches exhibit quite small path delays, as is
justified in the appendix. The chosen clock rate for comparison is 12.7
MHz, which is the design limit of the serial-parallel approach for
calculating one stage of the ladder filter. Notice once again from the
table the remarkable property that for less than twice the area, the
parallel architecture affords approximately twenty times the
throughput. Hence, a ten stage ladder could be constructed with a
single chip, compared with ten chips if the serial-parallel approach is
utilized.

The throughput figures for the two commercial chips are inferred from
the literature, based on the assumptions above, as well as the fact that
their maximum clock rates are precisely those for which they were
designed to operate, i.e. they are incapable of 12.7 MHz operation.
Again, this is admittedly unfair since these designs are based on worst
case processing parameters whereas the CORDIC based analysis chips
have been designed with typical processing parameters. The difference
in the parameters usually results in a speed factor of two. In any event,
the table shows clearly that the present ladder filter chips pack much
more performance into a given area and this is due to the fact that
their implementation is intimately matched to the theory of the
problem, resulting in a good mapping of the algorithms onto the
architectures. For example, the arithmetic unit's operation set
consists of the most natural operations describing the algorithms.
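The relative throughput entries can be cross-checked against the minimum clock rates of Figure 6.10 with the short Python sketch below. It assumes that throughput scales linearly with clock rate, so that the number of stages computed at 12.7 MHz is the ratio of that rate to the minimum rate for one stage; the dictionary keys and the name `stages_at` are illustrative only.

```python
REFERENCE_MHZ = 12.7  # comparison clock rate chosen in the text

# Minimum clock rate (MHz) for one ladder stage, as listed in Figure 6.10.
MIN_CLOCK_MHZ = {
    "fully parallel": 0.6,
    "parallel-serial": 1.9,
    "serial-parallel": 12.7,
    "fully serial": 38.0,
}


def stages_at(clock_mhz, min_clock_mhz):
    """Ladder stages computed per sample period, assuming linear scaling."""
    return clock_mhz / min_clock_mhz


relative = {name: stages_at(REFERENCE_MHZ, f) for name, f in MIN_CLOCK_MHZ.items()}

if __name__ == "__main__":
    for name, value in relative.items():
        print(f"{name:16s} {value:6.2f}")
```

The ratios come out at roughly 21, 6.7, 1 and 0.33. The small difference from the tabulated value of 20 for the fully parallel chip may simply reflect rounding of the quoted clock rates, although that is an assumption rather than something stated in the text.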
The remaining figures in the table are quite subjective. For example, all
four implementations are quite straightforward to control with the
controller structure chosen (to be described in the next section), involving
just direct interpretation of the microinstruction. Speak and Spell is also
quite simple to control by virtue of its microprogrammed nature; however, it
does not lend itself to being readily adapted to other applications since it
does not provide the powerful instruction set of the controller to be
described. The echo canceller has a random logic controller which is quite
large, difficult to alter and also time consuming to design. The
microprogram cost (programming ease and storage) of the analysis chips is
the same for all four approaches and probably lower than that of Speak and
Spell due to the special controller architecture. Finally, the CORDIC chips
are extremely regular since they consist only of registers, adders and
multiplexers. Hence, they are much easier to design than either of the
commercial chips.

Remark: The author acknowledges that some of the numbers in Figure 6.10
may be incorrect since they were inferred from publications. My apologies
to Texas Instruments and Bell Laboratories, as well as the readers, if this is
indeed the case.

6.5 THE MICROCONTROLLER
A good microprogram control strategy should have a simple structure,
be easily programmed and be capable of efficiently implementing the basic
operations required by the signal processing algorithms of interest. In this
presentation, the controller will be described (based on the CORDIC
operations being of interest) and then implementations for a variety of
signal processing algorithms will be given.

Simplicity of structure and programming is realized with a two level
control philosophy. The higher level or "macrolevel" of operation consists of
a set of powerful instructions (Figure 6.11), few in number, which define the
functional operation of the chip, e.g. as a speech synthesizer, adaptive
equalizer, filter etc. A chip user need only be concerned with this level of
operation. Macrolevel instructions may invoke one or more operations at
the microlevel, the second level of control. Note that while the prefix
"macro" or "micro" is used to distinguish between instructions at the two
levels of operation, they both form a part of the controller's microprogram.
A microinstruction is a single iteration of the CORDIC recursions (the precise
definition depends on which of the four implementations is used).

Macrolevel instructions are not uniform in their execution time. The
CORDIC operations are considerably slower than the data transfer or SADD
and SSUB instructions. While the latter are said to require one microcycle,
the former consume one macrocycle. In fact, the CORDIC instructions such
as MUL, JROT etc. are calls to subprocedures which implement the recursive
CORDIC algorithms as a sequence of microlevel instructions. In order to
avoid processor waiting, as well as in the interest of programming ease,
macroinstructions may only invoke operations of the same group (Figure
6.11) in the two processors.

A simple microcontroller structure is shown in Figure 6.12. A two port
memory provides the necessary microcode to each processor. Separate
program counters control program execution at the two levels while a field
of the instruction is used for address sequencing via the next address logic
(NAL). All of the necessary control signals to direct the operation of the
CORDIC processors as well as the I/O and scratchpad communications are
provided by various fields of the microcode. An iteration counter, whose
sequencing is controlled by the NAL, is provided for simple looping
constructs. A loop is initiated with a DO instruction specifying the beginning
of a construct. The final instruction of the loop body is signified by a
microprogrammed (nanoprogrammed?!) LOOP bit. At this point, control
returns to the address following the DO instruction and the iteration counter
is decremented. Notice that a DO forever facility is provided by setting
n=255.
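The looping mechanism just described may be modelled with a toy sequencer in Python. This is a sketch under several assumptions that go beyond the text: a program is represented as a list of (mnemonic, operand, loop_bit) tuples, DO n is taken to execute the loop body n times, loops are not nested (there is a single iteration counter), and a `max_steps` limit is added purely so that a DO forever loop terminates in simulation.

```python
FOREVER = 255  # DO operand that requests an endless loop, as stated in the text


def run(program, max_steps=100):
    """Return the sequence of addresses executed by the toy sequencer."""
    pc, counter, loop_start = 0, 0, None
    trace = []
    while pc < len(program) and len(trace) < max_steps:
        mnemonic, operand, loop_bit = program[pc]
        trace.append(pc)
        if mnemonic == "DO":
            # Load the iteration counter and remember the address after DO.
            counter, loop_start = operand, pc + 1
        elif loop_bit and loop_start is not None:
            if counter != FOREVER:
                counter -= 1          # decrement at the end of the loop body
            if counter > 0:
                pc = loop_start       # return to the address following DO
                continue
            loop_start = None         # count exhausted: fall out of the loop
        pc += 1
    return trace


if __name__ == "__main__":
    program = [
        ("DO", 3, False),
        ("MOVE", None, False),
        ("SADD", 1, True),    # LOOP bit set on the last body instruction
        ("MOVE", None, False),
    ]
    print(run(program))
```

For the four-instruction program shown, the two-instruction body at addresses 1 and 2 is executed three times before control falls through to address 3.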
Instruction  Operands   Sign Reversal  Operation                  Comments
Mnemonic                Bit (ε)
MOVE^1       src,dest   no             data transfer              src, dest are X, Y, Z or
                                                                  scratchpad regs or I/O
SADD^1       k          no             X + 2^{-k} Y               k is a 4 bit unsigned
                                                                  integer
SSUB^1       k          no             X - 2^{-k} Y               uses sign reversal bit
                                                                  and SADD internally
DO^1         n          no             initiates loop             do "forever" if n=255
                                       for 0<n<254
MUL^2                   yes            Y + εXZ                    Multiply and Accumulate
DIV^2                   yes            Z + εY/X                   Divide and Accumulate
ATAN^2                  yes            Z + ε tan^{-1}(Y/X)
CROT^2                  yes            plane rotation of [X Y]^T by angle εZ
ATANH^2                 yes            Z + ε tanh^{-1}(Y/X)
JROT^2                  yes            hyperbolic rotation of [X Y]^T by εZ
ε = ±1    ^1 denotes group 1 instruction    ^2 denotes group 2 instruction
Figure 6.11: Microcontroller Instruction Set

Figure 6.12: Microcontroller Architecture (two port program memory
feeding CORDIC 1 and CORDIC 2)

Both of the macrolevel processes are tightly coupled in their address
sequencing by virtue of a single program counter, PC0. The two port
memory appears as a wide single port device at this level of operation.
Furthermore, since both processes must be from the same group, the chip
may be viewed as an SIMD machine invoking two concurrent operations, with
the wide memory output being the instruction (alternately, it can be viewed
as a constrained MIMD structure). When a group 2 macroinstruction is
executed, program control is transferred to PC1 and PC2 at the microlevel
of operation. Now, a true MIMD structure exists with a separate instruction
for each processor. These are, of course, the actual iterations of the CORDIC
algorithms. Microlevel procedures must be of the same length (again for
simplicity) and program control is returned to PC0 simultaneously for both
processors. It is worth noting that it is frequently not possible to achieve
equal length procedures for the various CORDIC instructions due to the need
for scaling cycles and pre-rotations [HT80]. However, the execution time
disparity may be reduced considerably with the methods of Section 4.2.

Finally, note that although the concurrency structure of a program is
restricted by this controller (e.g. an unconstrained MIMD philosophy would be
more general), many signal processing algorithms fit the controller well, for
example, the algorithms to follow.
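The group 2 instructions of Figure 6.11 are realized as sequences of shift and add microinstructions. As an illustration, the Python sketch below models two of them, CROT and ATAN, in floating point. It is not the chip's microcode: the 20 iteration count is borrowed from the word length quoted earlier, multiplication by 2^-i stands in for the hardware shift, the scaling cycles and pre-rotations mentioned above are omitted, and the names `crot`, `atan_acc`, `ANGLES` and `GAIN` are chosen here for convenience.

```python
import math

ITERATIONS = 20  # assumed iteration count for this sketch
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
# Accumulated magnitude growth of the unscaled iterations (about 1.6468).
GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(ITERATIONS))


def crot(x, y, z):
    """Rotate [x, y] by the angle z (radians); the result is scaled by GAIN."""
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0 else -1.0          # steer z towards zero
        x, y, z = (x - d * y * 2.0 ** -i,
                   y + d * x * 2.0 ** -i,
                   z - d * ANGLES[i])
    return x, y, z


def atan_acc(x, y, z=0.0):
    """Drive y to zero (x > 0), accumulating atan(y/x) into z."""
    for i in range(ITERATIONS):
        d = -1.0 if y >= 0 else 1.0          # steer y towards zero
        x, y, z = (x - d * y * 2.0 ** -i,
                   y + d * x * 2.0 ** -i,
                   z - d * ANGLES[i])
    return x, y, z


if __name__ == "__main__":
    xr, yr, _ = crot(1.0, 0.0, 0.5)
    print(xr / GAIN, yr / GAIN)      # close to cos(0.5), sin(0.5)
    print(atan_acc(1.0, 1.0)[2])     # close to pi/4
```

Both routines share the same three update equations and differ only in the rule that selects the direction of each micro-rotation, which is consistent with the observation that the CORDIC instructions can be served by one arithmetic structure.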
6.5.1 The Speech Analysis Microcode

Returning to the target application of speech analysis, it can now be
made much more tangible by actually writing a program in the microcode
language of this chip which executes the ladder form recursions depicted in
the "flowchart" of Figure 6.1. Notice that one instruction defines an
operation for each of the two processors.

    MOVE  1,X   ; 1,X        ; Note SIMD Structure
    MOVE  ν,Y   ; ν,Y
    MOVE  0,Z   ; 0,Z
    ATANH       ; ATANH      ; T = 1
    MOVE  X,R1  ; ν,Y
    MOVE  ρ,Z   ; 0,Z
    MUL         ; ATAN       ; Note MIMD Structure
    MOVE  η,X   ; Z,R2
    MOVE  R2,Z  ; η,X
    MOVE  NOP   ; M,Z
    CROT        ; MUL        ; T = 3
    MOVE  X,R3  ; R1,X
    MOVE  Y,R4  ; R2,Z
    MOVE  R1,X  ; NOP
    ATANH       ; CROT       ; T = 4
    MOVE  R3,X  ; Y,X
    MOVE  R3,X  ; Y,X
    MOVE  Z,R3  ; 0,Y
    MOVE  NOP   ; R3,Z
    JROT        ; JROT       ; T = 5

6.6 OTHER APPLICATIONS
Although the design of the chip has been presented in light of quite a
specific application, it is actually applicable to a large class of signal
processing problems. The chip architecture is a rather powerful one
consisting of two concurrent processors which operate on a very rich set of
primitive operations. These are further augmented by a controller which is
easily programmed to support the concurrent operation of these processors
in a manner which is common to a host of signal processing applications.
This section will explore the use of the chip for computing discrete Fourier
transforms (DFT), performing LPC based speech synthesis and implementing
LMS type adaptive equalizers. Extensive use of "flowcharts" similar to
Figure 6.1 will be made. Some further applications of the chip which will not
be presented are echo cancellation and adaptive line enhancement.

Before embarking on the description of these other applications, it is
worth noting that a single CORDIC version of this chip with a simpler control
strategy will often be useful. The ideas presented in the design of the
present chip and controller apply directly to the realization of the single
CORDIC version.

6.6.1 The Discrete Fourier Transform

Recall the DFT algorithm of Section 2.1. Notice that $x(n) W_N^{kn}$ is a
rotation of the complex vector $x(n)$ through an angle
$\theta_{kn} = -2\pi kn/N$, so the present chip should be ideal for
implementing the DFT. Consider utilizing $N/2$ chips for an $N$ point
transform. Each CORDIC block is used to compute one point of the
transform as shown in Figure 6.13. The quantity $-2\pi k/N$ is a constant
for each CORDIC block so that as samples arrive, the angle
$\theta_{kn} = (-2\pi k/N)n$ is first calculated and then the current sample
$x(n)$ is rotated. Two microcycles are then required to accumulate the real
and imaginary parts of the transform sample. After $N$ samples, the
transform calculation is complete. The final two microcycles impose very
little additional speed overhead compared with the calculation of
$\theta_{kn}$ and the rotation of $x(n)$.
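The rotate and accumulate procedure for a single transform point can be sketched in Python as follows. The sketch is illustrative only: `math.cos` and `math.sin` stand in for the CORDIC plane rotation, the running angle is formed by repeated addition of the per-block constant, and the function names are hypothetical. A direct evaluation of the DFT sum is included for comparison.

```python
import cmath
import math


def dft_point_by_rotation(samples, k):
    """Compute X(k) by rotating each sample and accumulating the parts."""
    n_points = len(samples)
    step = -2.0 * math.pi * k / n_points   # constant held by this block
    theta = 0.0                            # angle for sample n = 0
    acc_re = acc_im = 0.0
    for x in samples:
        c, s = math.cos(theta), math.sin(theta)   # stands in for CROT
        rot_re = x.real * c - x.imag * s
        rot_im = x.real * s + x.imag * c
        acc_re += rot_re                   # first accumulate microcycle
        acc_im += rot_im                   # second accumulate microcycle
        theta += step                      # angle for the next sample
    return complex(acc_re, acc_im)


def dft_point_direct(samples, k):
    """Reference: the defining sum of x(n) exp(-j 2 pi k n / N)."""
    n_points = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * k * n / n_points)
               for n, x in enumerate(samples))


if __name__ == "__main__":
    data = [1 + 2j, -1 + 0.5j, 0.25 - 3j, 2 + 0j]
    for k in range(len(data)):
        print(k, dft_point_by_rotation(data, k), dft_point_direct(data, k))
```

In a fixed point realization the running angle would presumably be kept modulo 2π so that it stays within the convergence range of the rotation; that detail is omitted here.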
Figure 6.13: The Discrete Fourier Transform Implementation
(Note: subscript R (I) denotes real (imaginary) part.)

Each rotate and accumulate operation requires 36 microcycles or clock
cycles. For maximum throughput, one CORDIC block per transform point is
used. Hence, a sample data rate of 210K complex samples/second is
possible with an 8 MHz clock. Of course, for lower data rate operation, the
CORDIC blocks may be time shared, resulting in fewer chips for an N-point
transform.
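The quoted sample rate can be checked with a line of arithmetic, shown here as a small Python sketch. The function name `complex_sample_rate` is hypothetical, and the 38 cycle case is an assumption introduced only for comparison (36 cycles plus the two accumulate microcycles); the text itself states 36.

```python
def complex_sample_rate(clock_hz, cycles_per_sample):
    """Complex samples per second when each sample costs a fixed cycle count."""
    return clock_hz / cycles_per_sample


rate_36 = complex_sample_rate(8e6, 36)   # about 222 thousand samples/second
rate_38 = complex_sample_rate(8e6, 38)   # about 210.5 thousand samples/second

if __name__ == "__main__":
    print(round(rate_36), round(rate_38))
```

A cost of 36 cycles per sample gives about 222K samples/second, while 38 cycles gives about 210.5K, which is closer to the 210K figure quoted; which accounting was intended cannot be determined from the text alone.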

Despain [De74] [De79] has studied the use of CORDIC algorithms for the
DFT quite extensively. The reader is referred to his work.

6.6.2 Speech Synthesis

Many applications involve the synthesis of stored speech segments, e.g.
consumer products such as SPEAK&SPELL. The synthesis problem (Chapter
Three) may be readily implemented in ladder form using a single processor
variant of the present chip (Figure 6.14) because each stage of the filter is
just a rotation by an amount related to the reflection coefficient of the
stage.

When the synthesis application involves stored rather than arbitrary
speech segments (as would be the case in digital telephony), the
implementation can be significantly simplified by storing $\theta_n$ rather than
the reflection coefficient $\rho_n$.

6.6.3 The Unnormalized Ladder Form

Recall the unnormalized prewindowed ladder form equations from
Chapter Two:
£cj - T c.r - V t
7 n - l.T
rsc - & n+ l.T

■Kn+1.7 -
Figure 6.14: Ladder Form Speech Synthesizer
1ST _ ^7 1 + 1 .7
X n +l.T -
s7i+l;r = Sti.T — Kr. +l.T r n .T - l
r n+ l.T = r n .T - l ~ ■^r +1.7’ S-n.T
/n -l.r+ l
K Z.T +1 - * - - R n . T + = ? ' 7+1

7n-l.T
7 n+ l.r = 7n.T ~
fin+l.T
where
$\Delta_{n+1,T}$ is the $(n+1)$th order partial correlation of $y_T$
$K^{\epsilon}_{n+1,T}$, $K^{r}_{n+1,T}$ are the forward and backward filter gains of the
$(n+1)$th filter stage
$R^{\epsilon}_{n,T}$, $R^{r}_{n,T}$ are the covariances of the forward and backward
prediction errors and
$\gamma_{n,T}$ is a likelihood variable of $n$th order

It is possible to realize these equations with the CORDIC functions when
773. = 0 provided that there is sufficient dynamic range in the fixed point
storage format to represent the quantities encountered. The required range
is clearly a function of the input process statistics.

Figure 6.15 shows the two CORDIC implementation of the unnormalized,
prewindowed ladder form. Notice that the sign reversed CORDIC functions of
Figure 4.3 have also been employed. The ease with which the sign reversal is
handled by the CORDIC processors saves considerable overhead. Rotations
appear to be the correct complexity measure since the unnormalized and
normalized algorithms require roughly the same number of them per stage
(since some preprocessing of the data must be done in the normalized
case).

[Figure 6.15]

6.6.4 Adaptive Equalization

The most common realization of adaptive equalizers in present day
modems is a complex transversal filter (Figure 6.16) whose coefficients are
adjusted using the least mean square (LMS) gradient algorithm [Wi70]. The
formulation is recalled from Section 3.2:

    $z_k = \sum_n c_n^k r_{k-n}$                                    (6.1)
c *+1 = e* - Ae„
w here
c£ is th e n th com plex ta p coefficient a t tim e t
r n is a com plex in p u t sam ple (applies to all lin e a r
m o d u la tio n schem es)
z k is th e com plex eq ualizer o u tp u t
A is a re a l a d a p ta tio n c o n sta n t
en is a com plex e rro r signal supplied fro m elsew here in th e
m o d em (re fe rre d to as decision fee d b a c k equalization)
The equalizer is a significant portion of the modem data processing, especially when the complex multiplications are realized as real multiply and accumulate operations. Again, this problem is more naturally represented in terms of rotations since these are a well known representation of complex multiplications. An implementation of one equalizer iteration is given in Figure 6.17.
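The remark that rotations represent complex multiplications can be made concrete with a small Python sketch: a tap c is written in polar form, the input sample is rotated by the angle of c, and the result is scaled by the magnitude of c. Reading these three steps as the ATAN, CROT and MUL blocks of Figure 6.17 is an interpretation rather than something the text states, the function name is hypothetical, and floating point trigonometry stands in for the CORDIC hardware.

```python
import cmath
import math


def multiply_by_rotation(c, r):
    """Compute c * r as a plane rotation followed by a real scaling (sketch)."""
    # Polar form of the tap: magnitude and angle (the role assumed for ATAN)
    magnitude, theta = cmath.polar(c)
    # Rotate the input sample (x, y) by theta (the role assumed for CROT)
    x, y = r.real, r.imag
    x_rot = x * math.cos(theta) - y * math.sin(theta)
    y_rot = x * math.sin(theta) + y * math.cos(theta)
    # Scale both components by the real magnitude (the role assumed for MUL)
    return complex(magnitude * x_rot, magnitude * y_rot)


if __name__ == "__main__":
    c, r = 0.6 - 0.8j, 2 + 1j
    print(multiply_by_rotation(c, r), c * r)
```

The rotation needs no complex multiplier at all, only the angle of the tap, which is the sense in which the problem is more natural in a rotation framework.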
Figure 6.16: LMS Adaptive Equalizer
Figure 6.17: Adaptive Equalizer Implementation (ATAN, CROT and MUL blocks over time steps T=1 and T=2; S_R(z), S_I(z) are the real and imaginary partial sums of z_k at order n)
Notice that the modem prefiltering can also be done as defined by (6.1) except that the filter is time invariant so that it is desirable to store iSr. rather than the filter coefficients themselves.
All of the foregoing examples are naturally described by rotations. Efficient implementations based on the rotation framework can be obtained either with the present chip or uniprocessors which are variations of the present architecture and control structure.
The design of a two processor CORDIC chip was presented with the application of real-time speech analysis using ladder filters in mind. Four possible architectures, exploiting varying degrees of parallelism, were considered and compared to some existing chips. It was seen that by carefully matching architecture to algorithms, this chip offers more computation power for a given chip area.
A two level microprogram control strategy was presented which was designed to facilitate user programming. A limited number of powerful instructions allow the user to use the two CORDIC machine for a variety of interesting signal processing functions, including speech analysis.
APPENDIX
Is it possible to clock the parallel architectures at 12.7 MHz? The electrical properties given in [MC80] will be used in an attempt to answer this question. The data flow in the parallel architecture is shown in figure 5.3. From the preliminary design and layout that was performed to generate figure 6.10, the following first order (i.e. RC) equivalent circuit of the critical timing path is obtained (precharging is not assumed):
[Circuit diagram: V_DD supply; R1 = 10K; AU delay; R3 = 14.6K; C1 = 0.3 pF; C2 = 0.3 pF; C3 = 1.1 pF]
It is simpler to analyze the more pessimistic circuit:
[Circuit diagram: V_DD supply; R2 = 20K; C1 = 1.4 pF; C2 = 0.3 pF]
Let T_c be the clock period and T_AU = 40 ns the delay through the gates of the arithmetic unit. If signal rise times are measured to three volts (assuming a 5V supply), it is clear that:
    T_c \geq R_1 C_1 + R_2 C_2 + T_{AU} = 81 \text{ ns}
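The arithmetic behind the bound can be checked with a few lines of Python. The sketch uses only figures quoted above (the 81 ns bound, T_AU = 40 ns, R2 = 20K, C2 = 0.3 pF and C1 = 1.4 pF). The value of R1 that it derives is inferred by subtraction and is not a value read from the text, and all variable names are illustrative.

```python
T_C_NS = 81.0      # clock period bound quoted in the text
T_AU_NS = 40.0     # arithmetic unit gate delay quoted in the text
R2_OHM = 20e3      # R2 of the pessimistic circuit
C2_F = 0.3e-12     # C2 of the pessimistic circuit
C1_F = 1.4e-12     # C1 of the pessimistic circuit


def max_clock_mhz(period_ns):
    """Highest clock frequency (MHz) for a given clock period (ns)."""
    return 1e3 / period_ns


# Contribution of the second RC section, in nanoseconds
r2c2_ns = R2_OHM * C2_F * 1e9
# What remains of the 81 ns budget for the first RC section
r1c1_ns = T_C_NS - r2c2_ns - T_AU_NS
# Series resistance this would imply for C1 = 1.4 pF (an inference only)
implied_r1_ohm = r1c1_ns * 1e-9 / C1_F

if __name__ == "__main__":
    print(f"maximum clock = {max_clock_mhz(T_C_NS):.2f} MHz (target 12.7 MHz)")
    print(f"R2*C2 = {r2c2_ns:.1f} ns, remaining R1*C1 = {r1c1_ns:.1f} ns")
    print(f"implied R1 = {implied_r1_ohm / 1e3:.1f} kOhm")
```

An 81 ns period corresponds to roughly 12.3 MHz, about three percent short of the 12.7 MHz target.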
Hence the maximum clock frequency is very close to the desired 12.7 MHz. Notice that instruction fetch and decode time has been ignored in calculating the cycle time of the chip because these functions are overlapped with instruction execution through a single instruction prefetch. It cannot be emphasized enough that this simple analysis is not robust to processing variations and it is likely quite optimistic.
It remains to show that T_AU = 40 ns is achievable with the process parameters of [MC80] (remembering again that they are extremely optimistic, however they do provide a common basis for comparison). Since the adder is 20 bits wide, with a ripple carry, a propagation delay of 2 ns/bit is tolerable in the carry circuit. Referring to figure 6.4, it is clear that the AU speed is indeed limited by this circuit.
Let the AU operands, a_k and b_k, be stable at the respective inputs. The carry signal incurs a single pair delay in each stage of the arithmetic unit. The equivalent circuit of a stage is:
[Circuit diagram: equivalent circuit of one carry stage, supplied from V_DD]
Sample layouts, that were used to determine the comparative chip areas provided in figure 6.10, lead to:
    R_u = 30\ \mathrm{k}\Omega \qquad R_{d1} = 3\ \mathrm{k}\Omega \qquad R_{d2} = 6\ \mathrm{k}\Omega \qquad C = 0.1\ \mathrm{pF}
Let \tau_r = R_u C and \tau_f = R_{d2} C. An inverter typically switches at 1.5 V so define:
    t_r = rise time = time to charge C from 0 V to 1.5 V
    t_f = fall time = time to discharge C from 5 V to 1 V
    t_p = pair delay = t_r + t_f
Then:
    t_p = -(\tau_f \ln 0.2 + \tau_r \ln 0.7) = 2 \text{ ns}
Hence, T_AU = 40 ns is achievable with the rather optimistic process parameters presented in [MC80]. In reality, 60 ns to 100 ns would be more likely.¹
¹ I am grateful to Professor Hennessy for pointing out these figures and for supplying me with the performance of some adders for comparison.
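The pair delay estimate in this appendix can be reproduced numerically. The Python sketch below uses the quoted values (pull-up resistance 30 kΩ, pull-down resistance 6 kΩ, C = 0.1 pF, a 5 V supply and a 20 bit ripple carry adder) together with the usual exponential charging law for an RC node; the variable names are illustrative.

```python
import math

V_DD = 5.0          # supply voltage in volts
R_UP = 30e3         # pull-up resistance in ohms
R_DOWN = 6e3        # pull-down resistance used for the fall time, in ohms
C_NODE = 0.1e-12    # node capacitance in farads
ADDER_BITS = 20     # width of the ripple carry adder

tau_r = R_UP * C_NODE     # rise time constant (3 ns)
tau_f = R_DOWN * C_NODE   # fall time constant (0.6 ns)

# Charging from 0 V towards V_DD reaches 1.5 V when 1 - exp(-t/tau) = 0.3
t_rise = -tau_r * math.log(1.0 - 1.5 / V_DD)   # = -tau_r * ln 0.7
# Discharging from 5 V reaches 1 V when exp(-t/tau) = 0.2
t_fall = -tau_f * math.log(1.0 / V_DD)         # = -tau_f * ln 0.2

t_pair = t_rise + t_fall
t_adder = ADDER_BITS * t_pair

if __name__ == "__main__":
    print(f"pair delay = {t_pair * 1e9:.2f} ns")
    print(f"20 bit carry chain = {t_adder * 1e9:.1f} ns")
```

The pair delay comes out slightly above 2 ns, so the 20 bit carry chain lands close to the 40 ns assumed for the arithmetic unit.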
BIBLIOGRAPHY
[AMLA81] H.M. Ahmed, M. Morf, D.T.L. Lee and P.H. Ang, "A VLSI Speech Analysis Chip Set Based on Square-Root Normalized Ladder Forms," Proc. 1981 ICASSP, Atlanta, GA, Mar.-Apr. 1981, pp. 648-653.
[An80] P. Ang, Notes on the Serial-Parallel CORDIC Block, 1980.
[CD80] Y.-S. Chen, D. Duttweiler, "A 35,000 Transistor Chip VLSI Echo Canceler," Proc. of the Intl. Solid State Circuits Conference, San Francisco, CA, 1980.
[De74] A.M. Despain, "Fourier Transform Computers Using CORDIC Iterations," IEEE Trans. Comput., Vol. C-23, Oct. 1974, pp. 993-1001.
[De79] A.M. Despain, "Very Fast Fourier Transform Algorithms for Hardware Implementation," IEEE Trans. Comput., Vol. C-28, No. 5, May 1979, pp. 333-341.
[HT80] G. Haviland, A. Tuzynski, "A CORDIC Arithmetic Processor Chip," IEEE Trans. on Computers, Vol. C-29, No. 2, February 1980.
[Hw79] K. Hwang, Computer Arithmetic: Principles, Architecture and Design, J. Wiley, 1979.
[MC80] C. Mead, L. Conway, Introduction to VLSI Systems, Addison-Wesley, 1980.
[Pe72] J. Peatman, The Design of Digital Systems, McGraw-Hill, 1972.
[SD65] R. Southworth, S. DeLeeuw, Digital Computation and Numerical Methods, McGraw-Hill, 1965.
[Wa71] J.S. Walther, "A Unified Algorithm for Elementary Functions," Proc. of the 1971 Spring Joint Computer Conference, pp. 379-385.
[WB78] R. Wiggins, L. Brantingham, "Three Chip System Synthesizes Human Speech," Electronics, Vol. 51, No. 16, Aug. 31, 1978, pp. 109-116.
[Wi70] B. Widrow, "Adaptive Filters," in Aspects of Network and Systems Theory (Kalman, DeClaris), Holt, Rinehart and Winston, 1970.
CHAPTER SEVEN
CONCLUSIONS
This dissertation was devoted to the study of signal processing algorithms for efficient, integrated implementation. The goal was to design a general purpose signal processing chip which would provide a larger throughput per silicon area than existing signal processors, not because of technological advantages in IC processing, but rather by virtue of having an architecture that is closely matched to the algorithms of interest. This requires a detailed study of areas spanning from signal processing theory to computer architecture and circuit design. The chip that was designed represents a radical departure from current day signal processors because its arithmetic unit is based on the ability to perform generalized vector rotations as primitive operations rather than the usual multiply and accumulate function.
The motivation for vector rotation primitives arose from the realization that many algorithms can be cast into a form where coordinate transformations are the natural operations describing them. In fact, multiplication is really a vector rotation in a particular coordinate system, i.e. it is a subset of the rich complement of operations that naturally describe many signal processing algorithms. Both the discrete Fourier transform (DFT) and the square root normalized ladder form were cast into a rotation framework. In the latter case, the entire ladder update was shown to only require five rotations per stage, i.e. five primitive operations. However, the complexity in terms of multiplications is formidable owing to the square root operations. Matrix algebra algorithms that occur commonly in signal processing were also shown to be naturally described by rotations.
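One way to see multiplication as a rotation is the linear coordinate system of Walther's unified algorithm [Wa71], where the same shift-and-add iteration used for circular rotations accumulates a product instead. The Python sketch below is a floating point illustration of that well known linear mode and is not the arithmetic of the chip; the function name and iteration count are assumptions, and the multiplier z must satisfy |z| < 2 for the iteration to converge.

```python
def linear_cordic_multiply(x, z, n_iter=32):
    """Approximate x * z with shift-and-add steps (linear CORDIC mode, sketch).

    Each step drives the residual multiplier z towards zero by adding or
    subtracting 2**-i, and moves x * 2**-i into the accumulator y with the
    same sign, so y converges to x times the original z.
    """
    y = 0.0
    for i in range(n_iter):
        direction = 1.0 if z >= 0 else -1.0
        y += direction * x * 2.0 ** -i   # a shift and an add in hardware
        z -= direction * 2.0 ** -i       # remaining part of the multiplier
    return y


if __name__ == "__main__":
    print(linear_cordic_multiply(3.0, 0.75))    # close to 2.25
    print(linear_cordic_multiply(-1.5, -1.2))   # close to 1.8
```

The structure is the same as a circular CORDIC rotation with the arctangent table replaced by the powers of two themselves, which is why a single arithmetic unit can serve both purposes.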
The CORDIC algorithms were evaluated as a means for performing vector rotations because their implementation is very simple, requiring only adders, shifters and registers. Unfortunately, the algorithms are inherently slow owing to their iterative nature and they do not provide a sufficiently large domain of convergence for most applications. The desired results are also scaled by a spurious scale constant which is difficult to compensate for. In fact, existing methods of circumventing these shortcomings incur a severe hardware and speed penalty. A new method was developed for simultaneously enlarging the region of convergence of the algorithm and compensating for the scale factor without the addition of any hardware and with only a minor speed overhead. This had a profound impact on the chip design, reducing its cycle time requirement by nearly 50%. Chip size was also reduced since special hardware was not required.
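For readers unfamiliar with the iteration, a minimal Python sketch of the conventional circular CORDIC rotation follows. It illustrates the two shortcomings mentioned above, the scale constant and the limited domain of convergence. It is the textbook algorithm in floating point, not the new method developed in this work, and all names are illustrative.

```python
import math


def cordic_rotate(x, y, angle, n_iter=24):
    """Rotate (x, y) by angle (radians) with shift-and-add micro-rotations.

    The result is NOT normalized: it is larger than the true rotation by
    the constant returned by cordic_gain(n_iter).
    """
    z = angle
    for i in range(n_iter):
        direction = 1.0 if z >= 0 else -1.0
        # The right-hand side uses the old x and y; 2**-i is a shift in hardware
        x, y = x - direction * y * 2.0 ** -i, y + direction * x * 2.0 ** -i
        z -= direction * math.atan(2.0 ** -i)
    return x, y


def cordic_gain(n_iter=24):
    """Scale constant accumulated by n_iter micro-rotations."""
    return math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(n_iter))


def cordic_max_angle(n_iter=24):
    """Largest rotation angle (radians) the basic iteration can reach."""
    return sum(math.atan(2.0 ** -i) for i in range(n_iter))


if __name__ == "__main__":
    gain = cordic_gain()
    x, y = cordic_rotate(1.0, 0.0, math.pi / 3)
    print(x / gain, y / gain)                   # close to cos and sin of 60 degrees
    print(gain)                                 # about 1.6468
    print(math.degrees(cordic_max_angle()))     # just under 100 degrees
```

Dividing by the gain costs a multiplication per output in this sketch, and angles beyond roughly 100 degrees need a separate pre-rotation, which are the overheads the text refers to.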
Two hybrid CORDIC techniques, which combine the CORDIC algorithms with a multiplier, were developed for enhancing the operation speed of the algorithms. When compared with a stored table approach in which a multiplier is used to perform a vector rotation, the hybrid CORDIC afforded an exponential reduction in required storage in exchange for a linear increase in execution time. Consequently, the hybrid method has a much higher throughput per area figure of merit than the stored table approach.
Floating point CORDIC algorithms that are based solely on floating point additions were also developed. They are conceptually simpler than their fixed point counterparts since no explicit shifting is required (effectively, the shift occurs automatically during radix point alignment in the floating point adder).
Chen's convergence computation method (CCM) was considered for the arithmetic unit design as well. The CCM was generalized to vector valued functions and it was shown to be intimately related to the CORDIC; CORDIC being a special case of the generalized CCM. Hence, a unified framework for many elementary functions was discovered in this generalization.
Target applications for the signal processing chip included real time speech analysis and synthesis, adaptive equalization, digital signal detection and some matrix operations. An effort was made to cast many of these problems into a ladder form structure since the ladder form has a nice implementation (based on rotations) and exhibits natural pipelining. A new signal detection scheme was described which uses the likelihood variable in the ladder filter to detect changes in signals corrupted by additive Gaussian noise. The performance of the detector was shown to depend on the transient behaviour of the ladder following a bit change. Detecting a change in the input signal was reduced to a binary hypothesis testing problem in which the relevant test statistic distributions were asymptotically chi-squared.
Fast Cholesky factorization by rows was shown to have a ladder form realization also, which was the same as if the factorization had been done by columns. The equivalence of the fast Cholesky algorithm and the Levinson algorithm in ladder form was demonstrated under pipelining, thus providing a unified structure for their implementation. Large arrays of processors that utilized the chip as a processing element were constructed for a variety of matrix algebra operations. It was noted that these algorithms exhibit considerable structure, allowing for the definition of a program model. With the aid of the model, a general purpose array structure was shown to exist. Unlike ad hoc techniques for constructing arrays that are specific to an algorithm, this guarantees that any program which satisfies the model (admittedly, the model was restricted to a class of signal processing problems) can be implemented on the general purpose array and its performance is characterized by the data dependencies of the algorithm. A simple technique for analyzing the performance of an array of processors was also given.
Finally, a microprogrammed chip consisting of two CORDIC processors and a scratch pad area was designed. Four different configurations were considered that indicated that a better throughput per area ratio was achievable with bit parallel rather than bit serial arithmetic. In comparisons with two commercial chips, the CORDIC processor exhibited higher throughput per area for a ladder filter algorithm, hence satisfying one of the goals set forth in the introduction. A two level microprogram control strategy was chosen such that the user need not be bothered with the details of the CORDIC iterations. By working with a small but powerful instruction set, signal processing algorithms may be readily programmed because they are generally quite short and repetitive (hence speed intensive) and do not often exhibit conditional branching. The generality of the chip was shown with a variety of examples in chapter six.
In conclusion, this thesis spans an interesting mixture of topics. The goal was to design a chip whose architecture was intimately matched to the theoretical foundations of the algorithms of interest. This was largely done by identifying the most natural set of primitive operations which described the algorithms, and to a lesser degree, by modifying algorithms to be conducive to implementation (e.g. fast Cholesky in ladder form, ladder forms for signal detection and ladder forms in a rotation framework). The result is a chip that is somewhat unconventional but very well suited to the class of problems of interest.

Ahmed, H assan Masud

SIGNAL PROCESSING ALGORITHMS AND ARCHITECTURES

Stanford University Ph.D. 1982
