An Introduction to Parallel Algorithms

Joseph JaJa

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks.

The programs and applications presented in this book have been included for their instructional value. They have been tested with care, but are not guaranteed for any particular purpose. The publisher does not offer any warranties or representations, nor does it accept any liabilities with respect to the programs or applications.

Library of Congress Cataloging-in-Publication Data
Includes bibliographical references and index.
1. Parallel processing (Electronic computers). 2. Computer algorithms. I. Title.

Copyright © 1992 by Addison-Wesley Publishing Company, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.

PREFACE

This book is an introduction to the design and analysis of parallel algorithms. There is sufficient material for a one-semester course at the senior or first-year graduate level, and for a follow-up graduate-level course covering more advanced material. Our principal model for algorithmic design is the shared-memory model; however, all of our algorithms are described at a high level similar to that provided by a data-parallel programming environment. It is assumed that the reader has some understanding of elementary discrete mathematics, basic data structures, and algorithms, at the level covered in typical undergraduate curricula in computer science and computer engineering.

The Need

Fundamental physical limitations on processing speeds will eventually force high-performance computations to be targeted principally at the exploitation of parallelism. Just as the fastest cycle times are approaching these fundamental barriers, new generations of parallel machines are emerging. It is just a matter of time until the transition affects the whole activity surrounding general-purpose computing. We hope that this book will serve as a vehicle to prepare students in computer science and computer engineering for such a transition.

The Model

A universally accepted model for designing and analyzing sequential algorithms consists of a central processing unit with a random-access memory attached to it. The success of this model
is primarily due to its simplicity and its ability to capture in a significant way the performance of sequential algorithms on von Neumann-type computers. Unfortunately, parallel computation suffers from the lack of such a commonly accepted model, due to the additional complexity introduced by the presence of a set of interconnected processors. In particular, the performance of parallel algorithms seems to depend on several factors such as overall resource requirements, computational concurrency, processor allocation and scheduling, communication, and synchronization.

Our formal model for algorithm design in this book is the shared-memory model or, more precisely, the parallel random-access machine (PRAM). This formal model allows us to establish optimality results and to relate parallel complexity to complexity measures defined on classical models such as the circuit model and Turing machines. However, algorithms are described at a higher level called the work-time presentation framework. In this framework, an algorithm is described in terms of a sequence of time units, where any number of concurrent primitive operations can take place during each time unit. The performance of the parallel algorithm is measured in terms of two parameters: work, which is defined to be the total number of operations used by the algorithm, and parallel time, which is the number of time units required to execute the algorithm. This framework provides a simple machine-independent model for describing parallel algorithms. It is also closely related to data-parallel algorithms that have been shown to be efficiently implementable on a wide variety of parallel machines (SIMD or MIMD, shared memory or distributed memory). The data-parallel programming model has been steadily gaining acceptance in the parallel processing community.

The Use of this Book

There are a number of ways to organize the material of this book into a one-semester course or a sequence of two one-semester courses. The opening chapters (except the sections marked with a star) and Section 4.1 represent the core material that should be covered in a first course in parallel algorithms. The later chapters are mostly independent of each other and can be covered in almost any order (however, a parallel approach is not recommended!). One possible sequence for a basic course in parallel algorithms consists of Chapters 1 and 2, Sections 3.1-3.3 and 4.1-4.2, and selected sections from the later chapters. The remaining chapters, through Chapter 10, can be used as the basis for a second, more advanced course in parallel algorithms.

A section marked with a star represents either optional material or advanced material that can be skipped without loss of continuity. Each chapter ends with a number of exercises, some of which are straightforward applications of the ideas developed in the chapter, while others present nontrivial extensions. Exercises judged to be on the difficult side are starred. More details and extensions can be found in the references cited at the end of each chapter.

At the end of each chapter, I have cited references that are related to the material covered in the chapter. My apologies to the many authors whose work on parallel algorithms was not cited. I will consider any comments sent in this regard and will make appropriate updates in future editions of the book.

Comments or corrections can be communicated to the author by e-mail.

Acknowledgments

I would like to thank the many people who have contributed significantly to the quality of this book.

The work on this book could not have been started if it were not for the generous support and the professional environment provided by the Parallel Processing Group at the National Institute of Standards and Technology (NIST), where I spent a sabbatical during the year 1989-1990. In particular, I wish to thank Carpenter and Mink for their hospitality and their help.

My colleagues at the University of Maryland have been extremely helpful in providing suggestions and constructive criticism during the preparation of the manuscript. In particular, U. Vishkin has provided me with valuable insights and comments, especially during the initial stage of this work. O. Berkman, L. Davis, B. Grane, R. Greenberg, S. Khuller, Y. Matias, K. Nakajima, S. Rangarajan, and R. Thurimella read portions of the manuscript and made corrections and suggestions for improving the quality of the book. I would also like to thank graduate students M. Farach, S. Krishnamurthy, E. McFadden, K. W. Ryu, and S. S. Yu for their help. My graduate student Kolagotla has kindly supplied almost all of the original figures for this book.

Outside the University of Maryland, there are many people who have reviewed one or more chapters of the book and provided me with their insightful comments. I would like to thank in particular M. A. Gancari, Ganda, Pan, Reif, and Schmit for their help.
I am also indebted to the following reviewers for their many helpful comments on earlier versions of the manuscript: S. Baase, L. Heath, Manber, B. Moret, S. Seidman, D. Hirschberg, Roepe, and Zaha.

Finally, I wish to thank my wife, S. Laskowski, mostly for her understanding of the time commitment necessary to write such a book, and partly for her editorial comments on several portions of this book.

College Park, Maryland
Joseph JaJa

CONTENTS

Preface

1  INTRODUCTION
   1.1  Parallel Processing
   1.2  Background
   1.3  Parallel Models
   1.4  Performance of Parallel Algorithms
   1.5  The Work-Time Presentation Framework of Parallel Algorithms
   1.6  The Optimality Notion
   1.7  *Communication Complexity
   1.8  Summary
   Exercises
   Bibliographic Notes
   References

2  BASIC TECHNIQUES
   2.1  Balanced Trees
   2.2  Pointer Jumping
   2.3  Divide and Conquer
   2.4  Partitioning
   2.5  Pipelining
   2.6  Accelerated Cascading
   2.7  Symmetry Breaking
   2.8  Summary
   Exercises
   Bibliographic Notes
   References

3  LISTS AND TREES
   3.1  List Ranking
   3.2  The Euler-Tour Technique
   3.3  Tree Contraction
   3.4  Lowest Common Ancestors
   3.5  Summary
   Exercises
   Bibliographic Notes
   References

4  SEARCHING, MERGING, AND SORTING
   4.1  Searching
   4.2  Merging
   4.3  Sorting
   4.4  Sorting Networks
   4.5  Selection
   4.6  *Lower Bounds for Comparison Problems
   4.7  Summary
   Exercises
   Bibliographic Notes
   References

5  GRAPHS
   5.1  Connected Components
   5.2  Minimum Spanning Trees
   5.3  Biconnected Components
   5.4  Ear Decomposition
   5.5  Directed Graphs
   5.6  Summary
   Exercises
   Bibliographic Notes
   References

6  PLANAR GEOMETRY
   6.1  The Convex-Hull Problem Revisited
   6.2  Intersections of Convex Sets
   6.3  Plane Sweeping
   6.4  Visibility Problems
   6.5  Dominance Counting
   6.6  Summary
   Exercises
   Bibliographic Notes
   References

7  STRINGS
   7.1  Preliminary Facts About Strings
   7.2  String Matching
   7.3  Text Analysis
   7.4  Pattern Analysis
   7.5  Suffix Trees
   7.6  Applications of Suffix Trees
   7.7  Summary
   Exercises
   Bibliographic Notes
   References

8  ARITHMETIC COMPUTATIONS
   8.1  Linear Recurrences
   8.2  Triangular Linear Systems
   8.3  The Discrete Fourier Transform
   8.4  Polynomial Multiplication and Convolution
   8.5  Toeplitz Matrices
   8.6  Polynomial Division
   8.7  Polynomial Evaluation and Interpolation
   8.8  General Dense Matrices
   8.9  *Dense Structured Matrices
   8.10 Summary
   Exercises
   Bibliographic Notes
   References

9  RANDOMIZED ALGORITHMS
   9.1  Performance Measures of Randomized Parallel
        Algorithms
   9.2  The Problem of the Fractional Independent Set
   9.3  Point Location in Triangulated Planar Subdivisions
   9.4  Pattern Matching
   9.5  Verification of Polynomial Identities
   9.6  Sorting
   9.7  Maximum Matching
   9.8  Summary
   Exercises
   Bibliographic Notes
   References

10 LIMITATIONS OF PRAMS
   10.1 Simulations Between PRAM Models
   10.2 Lower Bounds for the CREW PRAM
   10.3 Lower Bounds for the EREW PRAM
   10.4 Lower Bounds for the CRCW PRAM
   10.5 Introduction to P-Completeness
   10.6 Summary
   Exercises
   Bibliographic Notes
   References

1 Introduction

The purpose of this chapter is to introduce several parallel models and to specify a suitable framework for presenting and analyzing parallel algorithms. A commonly accepted model for designing and analyzing sequential algorithms consists of a central processing unit with a random-access memory attached to it. The typical instruction set for this model includes reading from and writing into the memory, and basic logic and arithmetic operations. The success of this model is due to its simplicity and its ability to capture the performance of sequential algorithms on von Neumann-type computers. Unfortunately, parallel computation suffers from the lack of such a widely accepted algorithmic model. There is no such model primarily because the performance of parallel algorithms depends on a set of interrelated factors in a complex fashion that is machine dependent. These factors include computational concurrency, processor allocation and scheduling, communication, and synchronization.

In this chapter, we start with a general discussion of parallel processing and related performance measures. We then introduce the models most widely used in algorithm development and analysis. These models are based on directed acyclic graphs, shared memory, and networks. Directed acyclic graphs can be used to represent certain parallel computations in a natural way, and can provide a simple parallel model that does not include any architecture-related features. The shared-memory model, where a number of processors communicate through a common global memory,
offers an attractive framework for the development of algorithmic techniques for parallel computations. Unlike the two other models, the network model captures communication by incorporating the topology of the interconnections into the model itself. We show several parallel algorithms on these models, followed by a brief comparison.

The shared-memory model serves as our vehicle for designing and analyzing parallel algorithms in this book and has been a fertile ground for theoretical research into both the power and limitations of parallelism. We shall describe a general framework for presenting and analyzing parallel algorithms in this model.

1.1 Parallel Processing

The main purpose of parallel processing is to perform computations faster than can be done with a single processor by using a number of processors concurrently. The pursuit of this goal has had a tremendous influence on almost all the activities related to computing. The need for faster solutions and for solving larger-size problems arises in a wide variety of applications. These include fluid dynamics, weather prediction, modeling and simulation of large systems, information processing and extraction, image processing, artificial intelligence, and automated manufacturing.

Three main factors have contributed to the current strong trend in favor of parallel processing. First, the hardware cost has been falling steadily; hence, it is now possible to build systems with many processors at a reasonable cost. Second, very large scale integration (VLSI) circuit technology has advanced to the point where it is possible to design complex systems requiring millions of transistors on a single chip. Third, the fastest cycle time of a von Neumann-type processor seems to be approaching fundamental physical limitations beyond which no improvement is possible; in addition, as higher performance is squeezed out of a sequential processor, the associated cost increases dramatically. All these factors have pushed researchers into exploring parallelism and its potential use in important applications.

A parallel computer is simply a collection of processors, typically of the same type, interconnected in a certain fashion to allow the coordination of their activities and the exchange of data.
The processors are assumed to be located within a small distance of one another, and are primarily used to solve a given problem jointly. Contrast such computers with distributed systems, where a set of possibly many different types of processors are distributed over a large geographic area, and where the primary goals are to use the available distributed resources, and to collect information and transmit it over a network connecting the various processors.

Parallel computers can be classified according to a variety of architectural features and modes of operations. In particular, these criteria include the type and the number of processors, the interconnections among the processors and the corresponding communication schemes, the overall control and synchronization, and the input/output operations. These considerations are outside the scope of this book.

Our main goal is to present algorithms that are suitable for implementation on parallel computers. We emphasize techniques, paradigms, and methods, rather than detailed algorithms for specific applications. An immediate question comes to mind: How should an algorithm be evaluated for its suitability for parallel processing? As in the case of sequential algorithms, there are several important criteria, such as time performance, space utilization, and programmability. The situation for parallel algorithms is more complicated due to the presence of additional parameters, such as the number of processors, the capacities of the local memories, the communication scheme, and the synchronization protocols. To get started, we introduce two general measures commonly used for evaluating the performance of a parallel algorithm.

Let $P$ be a given computational problem and let $n$ be its input size. Denote the sequential complexity of $P$ by $T^*(n)$. That is, there is a sequential algorithm that solves $P$ within this time bound, and, in addition, we can prove that no sequential algorithm can solve $P$ faster. Let $A$ be a parallel algorithm that solves $P$ in time $T_p(n)$ on a parallel computer with $p$ processors. Then, the speedup achieved by $A$ is defined to be

$$S_p(n) = \frac{T^*(n)}{T_p(n)}.$$

Clearly, $S_p(n)$ measures the speedup factor obtained by algorithm $A$ when $p$ processors are available. Ideally, since $S_p(n) \le p$, we would like to design algorithms that achieve $S_p(n) \approx p$. In reality, there are several factors that introduce inefficiencies.
These include insufficient concurrency in the computation, delays introduced by communication, and overhead incurred in synchronizing the activities of various processors and in controlling the system.

Note that $T_1(n)$, the running time of the parallel algorithm $A$ when the number $p$ of processors is equal to 1, is not necessarily the same as $T^*(n)$; hence, the speedup is measured relative to the best possible sequential algorithm. It is common practice to replace $T^*(n)$ by the time bound of the best known sequential algorithm whenever the complexity of the problem is not known.

Another performance measure of the parallel algorithm $A$ is efficiency, defined by

$$E_p(n) = \frac{T_1(n)}{p\,T_p(n)}.$$

This measure provides an indication of the effective utilization of the $p$ processors relative to the given algorithm. A value of $E_p(n)$ approximately equal to 1, for some $p$, indicates that algorithm $A$ runs approximately $p$ times faster using $p$ processors than it does with one processor. It follows that each of the processors is doing "useful work" during each time step relative to the total amount of work required by algorithm $A$.

There exists a limiting bound on the running time, denoted by $T_\infty(n)$, beyond which the algorithm cannot run any faster, no matter what the number of processors. Hence, $T_p(n) \ge T_\infty(n)$ for any value of $p$, and thus the efficiency $E_p(n)$ satisfies

$$E_p(n) \le \frac{T_1(n)}{p\,T_\infty(n)}.$$

Therefore, the efficiency of an algorithm degrades quickly as $p$ grows beyond $T_1(n)/T_\infty(n)$.

Our main goal in this book is to develop parallel algorithms that can provably achieve the best possible speedup. Therefore, our model of parallel computation must allow the mathematical derivation of an estimate on the running time $T_p(n)$ and the establishment of lower bounds on the best possible speedup for a given problem. Before introducing several candidate models, we outline the background knowledge that readers should have.

1.2 Background

Readers should have an understanding of elementary data structures and basic techniques for designing and analyzing sequential algorithms. Such material is usually covered at the undergraduate level in computer science and computer engineering curricula. Our terminology and notation are standard; they are described in several of the references given at the end of this chapter.

Algorithms are expressed in a high-level language in common use.
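The speedup and efficiency measures defined in Section 1.1 can be made concrete with a short sketch. The function names and all timings below are hypothetical, chosen only for illustration; they do not come from the book or from any measured machine.

```python
def speedup(t_star, t_p):
    """S_p(n) = T*(n) / T_p(n)."""
    return t_star / t_p


def efficiency(t_1, t_p, p):
    """E_p(n) = T_1(n) / (p * T_p(n))."""
    return t_1 / (p * t_p)


def efficiency_bound(t_1, t_inf, p):
    """Upper bound on efficiency: T_1(n) / (p * T_inf(n))."""
    return t_1 / (p * t_inf)


# Hypothetical timings in arbitrary time units (assumptions, not data).
t_star = 1000   # best sequential time T*(n)
t_1 = 1200      # parallel algorithm run on one processor, T_1(n)
t_inf = 10      # limiting running time T_inf(n)

# Each pair is (number of processors p, assumed running time T_p(n)).
for p, t_p in [(4, 320), (64, 30), (1024, 12)]:
    print(p, speedup(t_star, t_p), efficiency(t_1, t_p, p),
          efficiency_bound(t_1, t_inf, p))
```

With these assumed values, $T_1(n)/T_\infty(n) = 120$, so the bound on efficiency drops below 1 once $p$ exceeds 120, which is the degradation the text describes.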
Each algorithm begins with a description of its input and its output, followed by a statement (which consists of a sequence of one or more statements). We next give a list of the statements most frequently used in our algorithms. We shall augment this list later with constructs needed for expressing parallelism.

1. Assignment statement:
      variable := expression
   The expression on the right is evaluated and assigned to the variable on the left.

2. Begin/end statement:
      begin
         statement
         statement
         ...
         statement
      end
   This block defines a sequence of statements that must be executed in the order in which they appear.

3. Conditional statement:
      if (condition) then statement [else statement]
   The condition is evaluated, and the statement following then is executed if the value of the condition is true. The else part is optional; it is executed if the condition is false. In the case of nested conditional statements, we use braces to indicate the if statement associated with each else statement.

4. Loops: We use one of the following two formats:
      for variable = initial value to final value do statement
      while (condition) do statement
   The interpretation of the for loop is as follows. If the initial value is less than or equal to the final value, the statement following do is executed, and the value of the variable is incremented by one. Otherwise, the execution of the loop terminates. The same process is repeated with the new value of the variable, until that value exceeds the final value, in which case the execution of the loop terminates.
   The while loop is similar, except that the condition is tested before each execution of the statement. If the condition is true, the statement is executed; otherwise, the execution of the loop terminates.

5. Exit statement:
      exit
   This statement causes the execution of the whole algorithm to terminate.

The bounds on the resources (for example, time and space) required by a sequential algorithm are measured as a function of the input size, which reflects the amount of data to be processed. We are primarily interested in the worst-case analysis of algorithms; hence, given an input size $n$, each resource bound represents the maximum amount of that resource required by any instance of size $n$.
These bounds are expressed asymptotically using the following standard notation:

• $T(n) = O(f(n))$ if there exist positive constants $c$ and $n_0$ such that $T(n) \le c f(n)$ for all $n \ge n_0$.
• $T(n) = \Omega(f(n))$ if there exist positive constants $c$ and $n_0$ such that $T(n) \ge c f(n)$ for all $n \ge n_0$.
• $T(n) = \Theta(f(n))$ if $T(n) = O(f(n))$ and $T(n) = \Omega(f(n))$.

The running time of a sequential algorithm is estimated by the number of basic operations required by the algorithm as a function of the input size. This definition naturally leads to the questions of what constitutes a basic operation, and whether the cost of an operation should be a function of the word size of the data involved. These issues depend on the specific problem at hand and the model of computation used. Briefly, we charge a unit of time to the operations of reading from and writing into the memory, and to basic arithmetic and logic operations (such as adding, subtracting, comparing, or multiplying two numbers, and computing the bitwise logic OR or AND of two words). The cost of an operation does not depend on the word size; hence, we are using what is called the uniform cost criterion. A formal computational model suitable for our purposes is the Random Access Machine (RAM), which assumes the presence of a central processing unit with a random-access memory attached to it, and some way to handle the input and the output operations. A knowledge of this model beyond our informal description is not necessary for understanding the material covered in this book. For more details concerning the analysis of algorithms, refer to the bibliographic notes at the end of this chapter.

Finally, all logarithms used in this book are to the base 2 unless otherwise stated. A logarithm used in an asymptotic expression will always have a minimum value of 1.

1.3 Parallel Models

The RAM model has been used successfully to predict the performance of sequential algorithms. Modeling parallel computation is considerably more challenging given the new dimension introduced by the presence of many interconnected processors. We should state at the outset that we are primarily interested in algorithmic models that can be used as general frameworks for describing and analyzing parallel algorithms. Ideally, we would like our model to satisfy the following (conflicting) requirements:

• Simplicity: The model should be simple enough to allow us to describe parallel algorithms easily,
  and to analyze mathematically important performance measures such as speed, communication, and memory utilization. In addition, the model should not be tied to any particular class of architectures, and hence should be as hardware-independent as possible.
• Implementability: The parallel algorithms developed for the model should be easily implementable on parallel computers. In addition, the analysis performed should capture in a significant way the actual performance of these algorithms on parallel computers.

Thus far, no single algorithmic parallel model has proved to be acceptable to most researchers in parallel processing. The literature contains an abundant number of parallel algorithms for specific architectures and specific parallel machines; such accounts describe a great many largely unrelated techniques and methods. In this section, we introduce three parallel models and briefly discuss their relevance to parallel algorithms. Other parallel models, such as parallel comparison trees, sorting networks, and Boolean circuits, will be introduced later in the book. We also state our choice of the parallel model used in this book and provide justification for this choice.

1.3.1 DIRECTED ACYCLIC GRAPHS

Many computations can be represented by directed acyclic graphs (dags) in a natural way. Each input is represented by a node that has no incoming arcs. Each operation is represented by a node that has incoming arcs from the nodes representing its operands. The indegree of each internal node is at most two. A node whose outdegree is equal to zero represents an output. We assume the unit cost criterion, where each node represents an operation that takes one unit of time.

A directed acyclic graph with $n$ input nodes represents a computation that has no branching instructions and that has an input of size $n$. Therefore, an algorithm is represented by a family of dags $\{G_n\}$, where $G_n$ corresponds to the algorithm with input size $n$. This representation is suitable for analyzing numerical computations, since branching instructions are typically used to execute a sequence of operations a certain number of times, dependent on the input size $n$. In this case, we can unroll a branching instruction by duplicating the sequence of operations to be repeated the appropriate number of times.

A dag specifies the operations performed by the algorithm, and implies precedence constraints on the order in which these operations must be performed. It is completely architecture-independent.

Consider the problem of computing the sum $S$ of the $n = 2^k$ elements of an array $A$. Two possible algorithms are represented by their dags in Fig. 1.1 for $n = 8$. One of the algorithms computes the partial sums consecutively.
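The summation example can be sketched in code by tracking, for each ordering of the additions, the length of the longest chain of dependent operations (the depth of the corresponding dag). This is a minimal sketch under stated assumptions: the consecutive order follows the text, while the balanced binary-tree order is assumed here as the natural second algorithm. The function names and the sample array are hypothetical.

```python
def chain_sum(a):
    """Add the elements one after another: ((a[0] + a[1]) + a[2]) + ..."""
    total, depth = a[0], 0
    for x in a[1:]:
        total += x   # each addition needs the result of the previous one
        depth += 1   # so every addition adds one level to the dag
    return total, depth


def tree_sum(a):
    """Add adjacent pairs level by level (assumes len(a) is a power of 2)."""
    level, depth = list(a), 0
    while len(level) > 1:
        # All additions on one level are independent of one another.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth


a = [3, 1, 4, 1, 5, 9, 2, 6]   # n = 8 = 2**3
print(chain_sum(a))            # (31, 7)
print(tree_sum(a))             # (31, 3)
```

Both orderings perform $n - 1 = 7$ additions, but the consecutive order has depth $n - 1$, whereas the tree order has depth $\log n = 3$, which is why the shape of the dag matters for parallel execution.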
