Parallel Algorithms for Singular Value Decomposition

Renard R. Ulrey†, Anthony A. Maciejewski‡, and Howard Jay Siegel‡
†NCR Corporation, 2001 Danfield Ct., Ft. Collins, CO 80525 USA
‡Parallel Processing Laboratory, School of Electrical Engineering, Purdue University, West Lafayette, IN 47907-1285 USA

Abstract

In motion rate control applications, it is faster and easier to solve the equations involved if the singular value decomposition (SVD) of the Jacobian matrix is first determined. A parallel SVD algorithm with minimum execution time is desired. One approach using Givens rotations lends itself to parallelization, reduces the iterative nature of the algorithm, and efficiently handles rectangular matrices. This research focuses on the minimization of the SVD execution time when using this approach. Specific issues addressed include considerations of data mapping, effects of the number of processors used on execution time, impacts of the interconnection network on performance, and trade-offs between modes of parallelism. Results are verified by experimental data collected on the PASM parallel machine prototype.

1: Introduction

Decreasing the execution time of computerized tasks is the focus of a tremendous amount of study. The use of parallel computer systems is one method to help decrease these times. The performance of a parallel system, however, is dependent on the algorithm implementation and the parallel machine characteristics. Performance optimization is therefore complicated, due to the wide variety of algorithm characteristics [7] and the rapidly growing variety of parallel machines that have been built or proposed. Thus, the study of mapping algorithms onto parallel machines is an important research area.

The singular value decomposition (SVD) of matrices has been extensively used in control applications, e.g.,
during the computational analysis of robotic manipulators [8, 22]. The decomposition aids the computational solution of system equations such as the motion rate control formula ẋ = Jθ̇, where ẋ ∈ R^M specifies the end effector velocity, θ̇ ∈ R^N specifies the joint velocities, and J ∈ R^{M×N} is the Jacobian matrix [21]. For systems with many cooperating manipulators the value of N can reach into the hundreds, resulting in a severe computational burden for achieving real-time control.

(This research was supported in part by the National Science Foundation under grant CDA-9015696, by Sandia National Laboratories, and by Rome Laboratory. © 1996 IEEE.)

In general, computation of the SVD of an arbitrary matrix is an iterative procedure, so the number of operations required to calculate it to within acceptable error limits is not known beforehand. The control of many systems, however, is based on equations involving the current Jacobian matrix, which can be regarded as a perturbation of the previous matrix, i.e., J(t+Δt) = J(t) + ΔJ. It has been demonstrated that for these cases knowledge of the previous state can be used during the computation of the current SVD to decrease execution time [12]. This paper describes and analyzes two SVD algorithm implementations for these cases. Experimental data obtained on the PASM prototype parallel computer [1, 19] is provided that supports the conclusions of the algorithm analyses.

Section 2 provides background information about SVD, Givens rotations, and PASM. Descriptions of the two parallel SVD implementations being analyzed are presented in Section 3. Section 4 demonstrates an analysis approach to determine which implementation has the shortest execution time. The performances of SVD implementations on PASM are evaluated in Section 5.

2: Background information

The SVD of a matrix J ∈ R^{M×N} is defined as the matrix factorization J = UDV^T, where U ∈ R^{M×M}
and V ∈ R^{N×N} are orthogonal matrices of the singular vectors, and D ∈ R^{M×N} is a nonnegative diagonal matrix. The singular values of J are ordered from largest to smallest along the diagonal of D. It is assumed here that M ≤ N.

The Golub-Reinsch algorithm [6] is the standard technique for determining the SVD of a matrix. This method, however, has two unattractive aspects. The first is that the algorithm, as it is defined, cannot use knowledge of a previous matrix decomposition. The second is that the technique is relatively serial in nature, making more parallelizable algorithms desirable.

Several parallel SVD algorithms have been implemented for various machine architectures, including those proposed in [3, 4, 10, 11, 16]. These implementations also do not allow their iterative natures to be reduced. The algorithms being studied in this paper are based on a methodology presented in [12], which exclusively uses Givens rotations [6] to orthogonalize matrix columns.

Successive Givens rotations are used to generate the orthogonal matrix V that will result in JV = B, where the columns of B ∈ R^{M×N} are orthogonal. A matrix with orthogonal columns can be written as the product of an orthogonal matrix U and a diagonal matrix D (i.e., B = UD) by letting the columns of U equal the normalized columns of B,

  u_i = b_i / ||b_i||   (where ||b_i|| = sqrt(b_i^T b_i)),   (1)

and defining the diagonal elements of D to be equal to the norms of the columns of B, d_ii = ||b_i||. This results in the SVD of J.

The orthogonal matrix V that will orthogonalize the columns of J is formed as a product of Givens rotations, each of which orthogonalizes two columns. Considering the ith and kth columns of an arbitrary matrix A, a single Givens rotation results in new columns, a_i' and a_k', given by

  a_i' = a_i cos(φ) + a_k sin(φ)   (2)
  a_k' = a_k cos(φ) − a_i sin(φ).   (3)

The cos(φ) and sin(φ) terms necessary to achieve orthogonality are computed using the formulas in [14], which are based on the quantities

  p = a_i^T a_k,   q = a_i^T a_i − a_k^T a_k,   and   v = sqrt(4p² + q²).

Using these quantities, when q ≥ 0,

  cos(φ) = sqrt((v + q)/(2v))   and   sin(φ) = p / (v cos(φ)).   (4)

When q < 0,

  sin(φ) = sgn(p) sqrt((v − q)/(2v))   and   cos(φ) = p / (v sin(φ)),   (5)

where sgn(p) equals 1 if p ≥ 0 and −1 if p < 0.
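As a concrete illustration of the rotation in equations (2)-(5), the following NumPy sketch computes p, q, and v, selects the appropriate branch for cos(φ) and sin(φ), and updates the two columns in place (the function name and in-place interface are illustrative, not from the paper):

```python
import numpy as np

def givens_orthogonalize(A, i, k):
    """Orthogonalize columns i and k of A in place with one Givens rotation.

    The q >= 0 and q < 0 branches are the two formula sets that avoid
    subtracting nearly equal numbers."""
    ai, ak = A[:, i].copy(), A[:, k].copy()
    p = ai @ ak                    # p = a_i^T a_k
    q = ai @ ai - ak @ ak          # q = a_i^T a_i - a_k^T a_k
    v = np.hypot(2.0 * p, q)       # v = sqrt(4 p^2 + q^2)
    if v == 0.0:                   # both columns zero; nothing to rotate
        return
    if q >= 0.0:
        c = np.sqrt((v + q) / (2.0 * v))
        s = p / (v * c)
    else:
        sgn = 1.0 if p >= 0.0 else -1.0
        s = sgn * np.sqrt((v - q) / (2.0 * v))
        c = p / (v * s)
    A[:, i] = ai * c + ak * s      # equation (2)
    A[:, k] = ak * c - ai * s      # equation (3)
```

After the call the two columns are numerically orthogonal, and, since the rotation is orthogonal, the combined norm of the column pair is preserved.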
Two sets of formulas are given so that ill-conditioned equations resulting from the subtraction of nearly equal numbers can always be avoided.

To orthogonalize each possible pair of columns requires N(N−1)/2 rotations, referred to as a sweep [6]. The matrix V can be computed by iteratively forming the product of a set of sweeps and testing for convergence. While the number of sweeps required to orthogonalize the columns of J is not generally known beforehand, it was shown in [12] that by using the V matrix from the SVD of the previous J to find an initial estimate for B,

  B(t+Δt) = J(t+Δt)V(t) = [J(t) + ΔJ]V(t),   (6)

one can obtain a good approximation to the new SVD using a single sweep if ΔJ is small. Therefore, in this work the current V matrix is calculated using

  V(t+Δt) = V(t) ∏∏ G_ik,   (7)

where G_ik denotes the Givens rotation to orthogonalize columns i and k. Only a single sweep is performed to update the matrix V.

The PASM (partitionable SIMD/MIMD) parallel processing system [1, 19] was used to implement these algorithms. PASM, designed at Purdue University, supports mixed-mode parallelism; that is, it can operate in either the SIMD or MIMD mode of parallelism, and can switch modes at instruction-level granularity with generally negligible overhead. A small-scale 30-processor PASM prototype has been built with 16 PEs (processor/memory pairs) in the computational engine. For inter-PE communications, PASM uses a partitionable circuit-switched multistage cube interconnection network [18], also called an Omega network [9]. The network can be used in both SIMD and MIMD modes.

PASM is capable of employing barrier synchronization [5] in MIMD mode, called Barrier MIMD (BMIMD). Each PE executes its code independently until it arrives at a synchronization point called a barrier. Then, each PE waits at the barrier until all PEs indicate they have reached it. One use for this is to synchronize inter-PE transfers performed in MIMD mode.

3: Data mapping

3.1: Overview

Based on the equations in Section 2, Fig.
1 gives an algorithm to calculate V, D, and U using Givens rotations. This algorithm assumes that the SVD of the Jacobian matrix from the previous control sample period has been computed. Thus, for step 1, the previous V matrix is available on the system. It is assumed that the algorithm then converges with a single sweep of rotations in step 2.

Referring to the parallel execution of a Givens rotation by all PEs as a rotation step, N−1 rotation steps must be performed on N/2 column pairs to form all N(N−1)/2 column pairs. With unique column pairs distributed among N/2 PEs, inter-PE communication is avoided within each rotation step. After the initial rotation step, however, an inter-PE communication is required before each remaining rotation step. This rotate/transfer/rotate sequence is required both to form all column pairs and to converge the B and V matrices to their single-sweep values. Newly updated columns are transferred in each communication step.

  1. Calculate the initial estimate for B from J and the previous V.
  2. For all column pairs (i, k) do:
       calculate p, q, and v;
       calculate cos(φ) and sin(φ);
       perform the rotation on columns i and k of B;
       perform the rotation on columns i and k of V.
  3. Calculate D from B (d_ii = ||b_i||).
  4. Calculate U from B and D (normalize the columns of B).

Fig. 1: High-level algorithm for finding the SVD using Givens rotations.

As presented in Section 2, the calculations involved in this algorithm are straightforward. Of greater interest are ways to effectively map matrix elements to particular parallel machines, and the types of inter-PE communication these mappings dictate. Various implementations of column transfer operations have been devised, including those in [3, 4, 16]. Each of these methods maps a unique column pair to each of N/2 PEs.
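The Fig. 1 steps can be sketched serially in NumPy as follows. This is a minimal single-process sketch under stated assumptions: the function name, variable names, and the serial pair ordering are illustrative, and the paper's implementations instead distribute the column pairs across PEs.

```python
import numpy as np

def svd_single_sweep(J_new, V_prev):
    """Single-sweep SVD update in the spirit of Fig. 1.

    Starts from the estimate B = J(t+dt) V(t), applies one sweep of Givens
    rotations to B while accumulating the same rotations into V, then reads
    the diagonal of D off the column norms of B and forms U by normalizing
    the columns of B."""
    B = J_new @ V_prev                     # step 1: initial estimate for B
    V = V_prev.astype(float)               # working copy of V
    N = B.shape[1]
    for i in range(N - 1):                 # step 2: one sweep, N(N-1)/2 pairs
        for k in range(i + 1, N):
            p = B[:, i] @ B[:, k]
            q = B[:, i] @ B[:, i] - B[:, k] @ B[:, k]
            v = np.hypot(2.0 * p, q)
            if v == 0.0:
                continue                   # pair already (trivially) orthogonal
            if q >= 0.0:
                c = np.sqrt((v + q) / (2.0 * v))
                s = p / (v * c)
            else:
                sgn = 1.0 if p >= 0.0 else -1.0
                s = sgn * np.sqrt((v - q) / (2.0 * v))
                c = p / (v * s)
            for X in (B, V):               # identical rotation on B and V
                xi = X[:, i].copy()
                X[:, i] = xi * c + X[:, k] * s
                X[:, k] = X[:, k] * c - xi * s
    d = np.linalg.norm(B, axis=0)          # step 3: diagonal of D
    U = B / np.where(d > 0.0, d, 1.0)      # step 4: normalized columns of B
    return U, d, V
```

Because every rotation applied to B is also applied to V, the invariant B = J_new·V holds throughout, so U·diag(d)·V^T reconstructs J_new regardless of how well a single sweep has converged; a production version would iterate sweeps and test for convergence when ΔJ is large.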
The availability of a multistage cube network on PASM allows matrix data to be distributed across more PEs than allowed by the implementations in [3, 4, 16], and thus increases the number of PEs that can perform useful work while still performing all necessary inter-PE communications in single transfer steps.

Two different methods for mapping matrices to PEs are presented. These implementations assume that M = 2^m …

… M in these two equations, the number of FLOPs used in each implementation continues to decrease as R (and the number of PEs) increases, up to the maximum allowed when R = M. Setting the derivative of the DT count of the 2CPP approach to zero results in the mathematically optimal value of R = (N² − 2N + NM − 4N)/(2N − 2). In this equation, R may be less than M, depending on the values of N and M. Setting the derivative of the DT count of the 1CPP approach to zero results in the mathematically optimal value of R = (N² − N + NM − 2M)/(2N − 1). Again, R may be less than M, depending on the values of N and M. An examination of this equation, however, provides interesting results. Letting M = N, the equation reduces to R = N − 2N/(2N − 1), so the optimal value of R will be between N − 2 and N − 1. Therefore, when using the 1CPP algorithm with M = N, the number of DTs will decrease as R increases from 1 to M − 2. Also, if M in the original equation is reduced to less than N/2 by some power of two, the mathematically optimal value of R is larger than its assumed maximum value of M ≤ N/2, and the minimum number of DTs is always reached when the maximum number of PEs is used.

[Table 1: SVD algorithm operation count totals — floating-point operation and data transfer counts for the 2CPP and 1CPP implementations as functions of N, M, R, and r.]
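As a quick numerical check of the M = N reduction above, the stated 1CPP optimum R = (N² − N + NM − 2M)/(2N − 1) can be evaluated directly (plain Python; the helper name is illustrative):

```python
def r_opt_1cpp(N, M):
    """Mathematically optimal R from setting the derivative of the 1CPP
    data-transfer count to zero, as stated in the text."""
    return (N * N - N + N * M - 2 * M) / (2 * N - 1)

for N in (4, 8, 16, 64):
    R = r_opt_1cpp(N, N)                       # the M = N special case
    # reduces to R = N - 2N/(2N - 1), strictly between N - 2 and N - 1
    assert abs(R - (N - 2 * N / (2 * N - 1))) < 1e-9
    assert N - 2 < R < N - 1
```

Since 2N/(2N − 1) is only slightly larger than 1, the optimum sits just below N − 1, consistent with the claim that the DT count keeps decreasing as R grows from 1 to M − 2.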
The possibility that the number of DTs performed by an algorithm may increase as the number of PEs increases means that there could be a case when the total algorithm execution time increases when more PEs are used. A method is presented in the next subsection for determining whether this is true for a given system and problem size.

4.4: Performance prediction

A method is adapted from [15] to predict the number of PEs to use that will minimize the execution time for the SVD algorithm. This method gives relative weights to the FLOP and DT operations by the determination of a communication ratio (CR). This ratio is used with the complexity equations in Table 1 to predict only whether performance improves as more PEs are used. Because the numbers of FLOP and DT operations do not account for the total execution time, machine-dependent data was collected to use for the prediction.

The CR is calculated in terms of the average expected time to perform a DT over the average expected time to perform a FLOP (including memory access and array address calculation times). The units of measure for the CR are (secs./DT)/(secs./FLOP) = FLOPs/DT. Various methods can be used to determine the CR. The one chosen executes one implementation of the SVD algorithm on a small matrix, using the minimum number of PEs that the implementation allows. The 1CPP algorithm was arbitrarily selected to measure the CR, with four PEs being used to decompose a random 4×4 matrix. Although the PASM prototype can operate in different modes of parallelism, SIMD mode is used throughout this analysis for consistency. Hardware timers are used to measure the execution times of the operations being considered. Because the PASM prototype currently performs all FP calculations in software and has a relatively fast inter-PE communication network, its CR measured 0.119.
It is assumed for this analysis that the CR does not vary with the number of PEs used. Using the CR, the predicted performance (PP) of a machine running an SVD implementation is approximated by PP = (no. of FLOPs) + CR · (no. of DTs), and is a function of both matrix size and the number of PEs. With this definition, PP has units of number of FLOPs. Because the PP equations for the 2CPP and 1CPP approaches (PP_2CPP and PP_1CPP) do not consider many overhead operations, they do not provide absolute execution times, but they are reasonable estimates of relative execution times as R, N, and M are varied. Therefore, they can be analyzed to determine the number of PEs that will provide minimum execution time on a particular machine.

4.5: Implementation comparison

The operation counts of the 2CPP and 1CPP approaches are now compared. One comparison covers when the number of PEs equals the minimum common number that the two implementations can use (N PEs). A second comparison is for when the maximum common number of PEs is used (NM/2 PEs). These two cases are focused on because various numbers of PEs can be used, depending on the values of N, M, and R. The third case directly compares PP_2CPP and PP_1CPP.

To compare the two implementations with N PEs, replacements are made for R and r in the equations of Table 1 which correspond to using N PEs with either approach. The 2CPP approach requires both fewer FLOPs and fewer DTs under the constraints that M ≥ 2 and N ≥ 4 (details in [20]). Because these constraints are met for all values of N and M of interest, the 2CPP implementation is expected to be the fastest (neglecting differences in overhead between the two approaches) when the minimum common number of PEs is used.
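The PP-based selection rule can be sketched as follows. Note that the operation-count triples below are hypothetical placeholders, not values from Table 1; only the weighting PP = (no. of FLOPs) + CR · (no. of DTs) and the measured CR = 0.119 come from the text.

```python
CR = 0.119  # measured (secs/DT)/(secs/FLOP) on the PASM prototype, SIMD mode

def predicted_performance(flops, dts, cr=CR):
    """PP = (no. of FLOPs) + CR * (no. of DTs), in equivalent FLOPs."""
    return flops + cr * dts

def best_pe_count(candidates, cr=CR):
    """Pick the PE count whose (FLOPs, DTs) pair minimizes PP.

    candidates: list of (num_pes, flops, dts) triples. In the paper these
    counts would come from the Table 1 complexity equations; the triples
    used below are illustrative placeholders."""
    return min(candidates, key=lambda t: predicted_performance(t[1], t[2], cr))[0]

# Hypothetical counts for one problem size at three candidate PE counts:
candidates = [(4, 9000.0, 500.0), (8, 5200.0, 900.0), (16, 3500.0, 2100.0)]
```

With PASM's low CR the FLOP term dominates and `best_pe_count(candidates)` selects 16 PEs; on a machine with expensive communication (say `cr=10.0`) the same counts favor 4 PEs, illustrating how the DT growth can outweigh the FLOP savings.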
To compare the two approaches using NM/2 PEs, the same method is followed, with different values replacing R and r in the equations of Table 1. Analysis in [20] shows that the 1CPP implementation uses fewer FLOPs when NM/2 PEs are used, under the constraint that M > 2, which is true for all values of M of interest. It is also shown that the 1CPP implementation uses fewer DTs under the constraint (M(N−1) − (m+1) + M) > N². This inequality is not true for all values of N and M, but it can easily be shown to be true when M
