From the Proceedings of the Distributed Data Acquisition, Computing, and Control Symposium, December 1980.

THE HYBRID CUBE NETWORK

Robert J. McMillen and Howard Jay Siegel
Purdue University, School of Electrical Engineering
West Lafayette, IN 47907

Abstract

Large-scale, multimicroprocessor-based distributed and parallel computer systems are now technologically feasible. The Ballistic Missile Defense (BMD) Agency is designing a test bed for evaluating such systems as they may apply to BMD tasks. A highly flexible network is needed for communications among processors, memories, and other devices. The Hybrid Cube network is proposed here for this purpose. The Hybrid Cube is an implementation of a multistage cube network that exploits the advantages of two network configurations: (1) processor-to-memory, in which processors are connected to one side of the network and memories are connected to the other; and (2) processing element-to-processing element (PE-to-PE), in which processors are paired with memories to form processing elements which use the network to communicate with each other. The network is designed to support both SIMD and MIMD environments.

I. INTRODUCTION

Large-scale, multimicroprocessor-based distributed and parallel computer systems are now technologically feasible. The Ballistic Missile Defense (BMD) Agency is designing a test bed for evaluating such systems as they may apply to BMD tasks [8]. A highly flexible network is needed for communications among processors, memories, and other devices. The Hybrid Cube network is proposed here for this purpose. It is in the class of Generalized Cube networks [23]. Various properties of cube networks have been explored (e.g., [1, 2, 3, 5, 6, 7, 9, 10, 11, 12, 14, 15, 16, 18, 19, 23, 24, 25, 26, 27, 28]).
The Hybrid Cube is an implementation of a multistage cube network that exploits the advantages of two network configurations: (1) processor-to-memory, in which processors are connected to one side of the network and memories are connected to the other; and (2) processing element-to-processing element (PE-to-PE), in which processors are paired with memories to form processing elements which use the network to communicate with each other. The network is designed to support both SIMD and MIMD environments.

(This work was supported by the Ballistic Missile Defense Agency under grant number DASG60-80-C-0022 and the Air Force Office of Scientific Research, Air Force Systems Command, USAF, under AFOSR-78-3581. The United States Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation hereon. The views, opinions, and/or findings contained in this report are those of the author(s) and should not be construed as an official Department of the Army position, policy, or decision, unless so designated by other official documentation. © 1980 IEEE.)

The acronym SIMD stands for single instruction stream - multiple data stream [4]. Typically, an SIMD machine is a computer system consisting of a control unit, N processors, N memory modules, and an interconnection network. The control unit broadcasts instructions to all of the processors, and all active processors execute the same instruction at the same time. Thus, there is a single instruction stream. Each active processor executes the instruction on data in its own associated memory module. Thus, there is a multiple data stream. The interconnection network, sometimes referred to as an alignment or permutation network, provides a communications facility for the processors and memory modules.
An MIMD (multiple instruction stream - multiple data stream) machine [4] typically consists of N processors and N memories, where each processor can follow an independent instruction stream. As with the SIMD architecture, there is a multiple data stream and an interconnection network. Thus, there are N independent processors which may communicate among themselves. There may be a coordinator unit to help orchestrate the activities of the processors.

A partitionable SIMD/MIMD system [18] is a parallel processing system which can be structured as one or more independent SIMD and/or MIMD machines of varying size (e.g., PASM [20]). In order to be a flexible research tool, the BMD test bed should be reconfigurable under software control. It should be able to emulate different size multimicroprocessor systems. Furthermore, by designing it to be a partitionable SIMD/MIMD system, it would be able to simulate several SIMD and/or MIMD systems which can communicate with each other. This is also a natural way to view simulating both the data source (e.g., radar system) and the data processing system. The proposed Hybrid Cube network can support a partitionable SIMD/MIMD dynamically reconfigurable test bed.

Implementation considerations for the Hybrid Cube network for the BMD test bed are the main focus of this paper. However, this study of design options for each of the two multistage cube networks in the Hybrid Cube is applicable to any SIMD and/or MIMD system which may use this type of network. In particular, a multistage cube network is being considered for PASM [13, 17, 20, 21, 22]. Thus, the discussions of design tradeoffs here will aid in the development of both the BMD test bed and PASM, as well as other systems.

The logical structure of cube networks is defined in Section II.
In Section III, the advantages and disadvantages of the PE-to-PE and processor-to-memory configurations are discussed, as well as the implications of circuit switching versus packet switching. Section IV describes how the two configurations and implementations are merged to form the Hybrid Cube network. In designing the switching elements many options are available, especially if packet switching is performed. In Section V, many options are enumerated and their effects on the cost and performance of the network characterized. Several architectural designs for switching elements are presented in Section VI. Consideration is given to design options that affect performance under blocking and non-blocking conditions. Incorporating the designs into the Hybrid Cube network is discussed.

II. NETWORK DEFINITION

The Generalized Cube network is a multistage cube-type network topology which was introduced in [23]. It has been shown that this topology is equivalent to that used by the omega [6], indirect binary n-cube [11], STARAN [2], and SW-banyan (F = S = 2) [7] networks [19, 23]. In this section, the overall structure of the cube network is described.

The Generalized Cube topology is shown in Figure II.1. Assume the network has N inputs and N outputs. In Figure II.1, N = 8. The Generalized Cube topology has $m = \log_2 N$ stages, where each stage consists of a set of N lines connected to N/2 interchange boxes. Each interchange box is a two-input, two-output device. The labels of the input/output (I/O) lines entering the upper and lower inputs of an interchange box are used as the labels for the upper and lower outputs, respectively. The labels are the integers from 0 to N-1.

[Figure II.1: Generalized Cube topology for N = 8.]

[Figure II.2: Three-dimensional cube structure, with vertices labeled from 0 to 7 in binary.]

The connections in this network are based on the cube interconnection functions [12]. Let $P = p_{m-1} \ldots p_1 p_0$ be the binary representation of an arbitrary I/O line label. Then the cube interconnection functions can be defined as:

$\mathrm{cube}_i(p_{m-1} \ldots p_1 p_0) = p_{m-1} \ldots p_{i+1} \bar{p}_i p_{i-1} \ldots p_1 p_0$

where $0 \le i < m$.

[...]

[...] assume that at least one whole word is transferred between R/G cycles. This implies using multiple transfer cycles to multiplex the word over the data path. If the R/G cycles are synchronous, $P \lceil D/B \rceil + 2$ network clock cycles are required for each stage to execute one transfer ($\lceil X \rceil$ is the smallest integer greater than or equal to X). The time required for one packet to traverse the entire network unimpeded is

$T_S = \lceil W/P \rceil (P \lceil D/B \rceil + 2) m$

(recall m is the number of stages in the network). In an asynchronous network the number of network clock cycles required for a message to be transferred from one stage to the next is $W \lceil D/B \rceil + 2$. The time required to traverse the network unimpeded is

$T_A = (W \lceil D/B \rceil + 2) m$

It is not surprising that $T_A \le T_S$. The equation for $T_S$ reflects the overhead due to fragmentation (the $\lceil W/P \rceil$ term) and due to the extra R/G cycles ("$\lceil W/P \rceil \cdot 2$" versus "2" in $T_A$).

[Figure VI.1: Architecture of a packet switching node containing two input queues.]

[Figure VI.2: Architecture of a packet switching node containing four input queues.]

As an example, let D = 16, B = 8, P = 2, and W = 4. The message to be sent contains a one word routing tag and three data words. Assume the network can accommodate 128 processors, so that m = 7. Then $T_S$ = 84 cycles and $T_A$ = 70 cycles. Notice in this example that there is no fragmentation, so the difference in times is due to the additional R/G cycles. Synchronous operation requires 20% more network clock cycles in this case. If W = P, there is no speed difference between the two implementations.
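The example figures can be checked with a short calculation. The sketch below is only an illustration, not part of the original analysis: it reads the synchronous time as $T_S = \lceil W/P \rceil (P \lceil D/B \rceil + 2) m$ and the asynchronous time as $T_A = (W \lceil D/B \rceil + 2) m$, and it assumes that D is the word width in bits, B the data path width in bits, P the packet size in words, and W the message length in words. Those meanings are inferred from the example, and the function names are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Smallest integer greater than or equal to a / b, for positive integers."""
    return -(-a // b)


def sync_cycles(W: int, P: int, D: int, B: int, m: int) -> int:
    """Assumed reading of T_S: each of ceil(W/P) packets pays its own
    request/grant overhead (2 cycles) at each of the m stages."""
    return ceil_div(W, P) * (P * ceil_div(D, B) + 2) * m


def async_cycles(W: int, D: int, B: int, m: int) -> int:
    """Assumed reading of T_A: the whole W-word message moves under a single
    request/grant exchange (2 cycles) at each of the m stages."""
    return (W * ceil_div(D, B) + 2) * m


# Values from the example in the text: D = 16, B = 8, P = 2, W = 4, m = 7.
ts = sync_cycles(W=4, P=2, D=16, B=8, m=7)
ta = async_cycles(W=4, D=16, B=8, m=7)
print(ts, ta)  # 84 70
```

Under this reading, setting P equal to W makes the two functions return the same value, which matches the remark that there is no speed difference when W = P.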
In the preceding discussion, the timing analysis was performed assuming no blockage occurred. One aspect of the design that can make blockage worse than it needs to be is illuminated by the following scenario. There is a node in stage i whose queues are empty, whose upper output link is unavailable, and whose lower output link is available. Two packets, A and B, arrive on the respective input links and are stored in the corresponding input queues. Both packets request the upper output link. Since that link is not available, the packets wait. During the next network clock cycle, a packet, C, arrives on the lower input and desires the lower output. Despite the fact that the lower link is available, packet C cannot proceed because packet B is ahead of it in the queue. Once the upper output link becomes available, packet C must wait one or two additional network clock cycles (depending upon whether packet A or B is selected first). Due to the queuing discipline implemented, packets can be unnecessarily delayed.

To avoid that problem, the design of Figure VI.2 is proposed. Associated with each input link are two queues, labeled straight and exchange. To determine which queue an incoming packet should be placed in, an additional signal, S/E (straight/exchange), is provided. S/E is derived from the routing tag and presented simultaneously with a request. Regardless of implementation (discrete or LSI), the S/E signal can be multiplexed with a data line. Since S/E is specified during the request cycle, the packet can be gated directly into the appropriate queue during the transfer cycle(s). The two 2-to-1 multiplexers and switch box shown would actually be implemented as two 4-to-1 multiplexers to minimize delay. Each multiplexer would be able to select any of the four queues, under supervision of the control unit.
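The blocking scenario above can be replayed with a toy model. The following is a minimal sketch under stated assumptions, not the node design itself: the class names, the `forwardable` method, and the representation of links as the constants `UPPER` and `LOWER` are all hypothetical, and output arbitration, queue capacity, and timing are not modeled. It only shows which packets sit at the head of a queue with a free output.

```python
from collections import deque

UPPER, LOWER = 0, 1


def setting_for(in_link: int, out_link: int) -> str:
    """A packet leaving on the same side it entered needs 'straight'."""
    return "straight" if in_link == out_link else "exchange"


class TwoQueueNode:
    """One FIFO queue per input link."""

    def __init__(self):
        self.queues = {UPPER: deque(), LOWER: deque()}

    def enqueue(self, name, in_link, out_link):
        self.queues[in_link].append((name, out_link))

    def forwardable(self, available):
        """Names of head-of-queue packets whose requested output is free."""
        return [q[0][0] for q in self.queues.values()
                if q and q[0][1] in available]


class FourQueueNode(TwoQueueNode):
    """Separate straight and exchange FIFO queues per input link."""

    def __init__(self):
        self.queues = {(link, s): deque()
                       for link in (UPPER, LOWER)
                       for s in ("straight", "exchange")}

    def enqueue(self, name, in_link, out_link):
        key = (in_link, setting_for(in_link, out_link))
        self.queues[key].append((name, out_link))


def run_scenario(node):
    """Packets A, B, C from the text; only the lower output is available."""
    node.enqueue("A", UPPER, UPPER)   # upper input, wants upper output
    node.enqueue("B", LOWER, UPPER)   # lower input, wants upper output
    node.enqueue("C", LOWER, LOWER)   # lower input, wants lower output
    return node.forwardable(available={LOWER})


print(run_scenario(TwoQueueNode()))   # []     C is stuck behind B
print(run_scenario(FourQueueNode()))  # ['C']  C has its own queue
```

In the four-queue model, B waits in the exchange queue of the lower input while C sits alone in the straight queue, so C is no longer held up by a packet bound for a different output.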
There is a distinct tradeoff in the design of a packet switched node between cost and performance. One factor that enters into the tradeoff that has not been mentioned concerns component speed, which is a function of device technology. If the network is fast relative to the speed of the processors, there may be low contention, and designs such as the one shown in Figure VI.2 may not be necessary. On the other hand, if the processors can generate messages nearly as fast as the network can accept them, more sophisticated designs may be a necessity. A key aspect of determining the architecture of a packet switched node is simulation. Designs similar to those discussed in this section need to be evaluated under different load conditions and in light of the options discussed in Section V.

[Figure VI.3: Architecture of a circuit switching node controlled by routing tags. (U) = upper, (L) = lower, S/E = straight/exchange.]

B. Circuit Switching

A proposed architecture for a circuit switched node is shown in Figure VI.3. The architecture is very similar to that of a two-queue, one word per queue, packet switched node. The primary difference is that either input data path can be connected directly to either output data path. The input registers and control unit allow circuits to be established in the network via routing tags. To establish a path, a routing tag enters the network and proceeds from stage to stage as though routing a packet. In this case, however, once the packet has passed through a node, the node connects the appropriate input and output together and holds the connection.
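The "connect and hold" behavior can be pictured as a small state machine. The sketch below is an illustration only, with hypothetical names (`CircuitNode`, `request`, `release`, `output_for`). It assumes a two-state interchange box in which one setting, straight or exchange, governs both inputs at once, so a second circuit through a node can be set up only if it needs the setting already being held. Request, grant, and transfer signalling is reduced to a boolean return value.

```python
class CircuitNode:
    """Toy model of a 2x2 node that holds circuits until they are released."""

    def __init__(self):
        self.setting = None    # None, "straight", or "exchange"
        self.holders = set()   # input links ("upper"/"lower") holding a circuit

    def request(self, in_link: str, setting: str) -> bool:
        """Return True (grant) if the circuit can be set up, else False."""
        if in_link in self.holders:
            return False       # this input already holds a circuit
        if self.holders and setting != self.setting:
            return False       # node is held in the other setting
        self.setting = setting
        self.holders.add(in_link)
        return True

    def release(self, in_link: str) -> None:
        """Drop the circuit held by in_link; free the node if it was the last."""
        self.holders.discard(in_link)
        if not self.holders:
            self.setting = None

    def output_for(self, in_link: str):
        """Output link currently connected to in_link, or None."""
        if in_link not in self.holders:
            return None
        if self.setting == "straight":
            return in_link
        return "lower" if in_link == "upper" else "upper"


node = CircuitNode()
print(node.request("upper", "straight"))  # True
print(node.request("lower", "exchange"))  # False, node is held straight
print(node.request("lower", "straight"))  # True
```

A real node would also have to latch the routing tag and pass the grant signal through once the circuit is set; those details are left out here.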
To move the routing tag from one node to the next requires a request, grant, transfer sequence (or multiple transfers if the routing tag is wider than the node-to-node data path width). Consequently, request and grant control lines are provided as shown. In addition, the S/E control line must be included since granting a request depends on the desired node configuration. (For example, if a node was set to "straight" as a result of a routing tag that entered via the upper input link, only "straight" requests on the lower input link can be granted until the upper input is released.) A new requirement imposed on the network is to provide a means of informing processors of completion of the circuit. The following is a description of the sequence of events that occur when a node in stage i receives a request:

1. The S/E input line is examined to see if the required connection is available.
2. If available, the grant output line is made active, but only for the duration of the grant cycle.
3. The routing tag which has been placed on the data bus by the requesting node in stage i+1 is latched into the register of the node in stage i during the transfer cycle. The grant output line is reset.
4. During the next request phase, node i establishes the circuit connecting input directly to output. The grant output line is also directly connected to the grant input line.

Both circuits are held as long as the corresponding request line remains active. The grant input lines of nodes in the last stage (stage 0) are connected to the respective "memory ready" signals. When a processor receives a grant signal during a non-grant cycle, it is assured the desired memory module is available and ready for use. During the establishment of a path, a processor can monitor progress by counting grant pulses. To release a circuit at any time, it is only necessary for a processor to withdraw the request signal from the stage m-1 node to which it is connected.

Because circuits "tie up" the nodes they are constructed from, there exists a possibility of deadlock. To avoid deadlock requires some discipline on the part of the processors. The scheme used by Barnes [1] permits processors to access only one word before releasing the path. Other possibilities include a word limit greater than one or a time limit scheme which requires removal of the request if the circuit has not been established after some specified number of cycles. A variation of the latter would be to extend the time limit as a function of the number of stages the circuit has acquired, to reward progress. Once again, simulation is required to verify the effectiveness of any such schemes.

C. Considerations for the Hybrid Cube

Given the designs proposed for packet and circuit switching, combining them for use as the building blocks of the Hybrid Cube network is to be considered. Because of the high degree of similarity between the proposed node architectures, if LSI implementation is employed, the circuit and packet switching capabilities can be combined into one chip for modularity and quantity production. The complexity of such a chip module depends on the packet switching options chosen. If the packet switching is performed with synchronous R/G cycles, a fixed message size, and one word per packet, the packet and circuit architectures are extremely compatible. Any of the combinations, however, can be accomplished. An advantage of combining the two architectures into one chip is that it can be used in stage m, the interface stage of the Hybrid Cube network. Using the chip for that purpose would increase the complexity of the control section, but not significantly. If the Hybrid Cube is implemented with discrete components, the cost of combining the architectures would not be worth the modularity gained.

VII. CONCLUSIONS

The Hybrid Cube interconnection network has been presented.
Its advantages for the proposed BMD test bed have been discussed. Even if the actual Hybrid Cube is not used in a system, the information presented can be applied to the construction of a regular (non-hybrid) cube network. The implementation issues examined should aid a system designer in making implementation choices for either a hybrid or regular cube-type network for an SIMD and/or MIMD system, such as the BMD test bed or PASM.

VIII. ACKNOWLEDGEMENTS

The idea of a hybrid cube network, part circuit switched for processor-to-memory communications and part packet switched for PE-to-PE communications, was suggested by Bill McDonald.

IX. REFERENCES

[1] G. H. Barnes, "Design and validation of a connection network for many-processor multiprocessor systems," 1980 International Conference on Parallel Processing, pp. 79-80, August 1980.

[2] K. E. Batcher, "The flip network in STARAN," 1976 International Conference on Parallel Processing, pp. 65-71, August 1976.

[3] D. M. Dias and J. R. Jump, "Analysis and simulation of buffered delta networks," Proceedings of the Workshop on Interconnection Networks for Parallel and Distributed Processing, April 1980.

[4] M. J. Flynn, "Very high-speed computing systems," Proceedings of the IEEE, Vol. 54, pp. 1901-1909, December 1966.

[5] L. R. Goke and G. J. Lipovski, "Banyan networks for partitioning multiprocessor systems," First Annual Symposium on Computer Architecture, pp. 21-28, December 1973.

[6] D. H. Lawrie, "Access and alignment of data in an array processor," IEEE Transactions on Computers, Vol. C-24, pp. 1145-1155, December 1975.

[7] G. J. Lipovski and A. Tripathi, "A reconfigurable varistructure array processor," 1977 International Conference on Parallel Processing, August 1977.

[8] W. C. McDonald and J. M. Williams, "The advanced data processing test bed," Compsac, pp. 346-351, March 1978.

[9] W. C. McDonald and J. M. Williams, "Evaluation of a multimicroprocessor interconnection network for a class of sensor data processing problems," SPIE's 24th Annual International Technical Symposium and Exhibition, July 1980, to appear.

[10] J. H. Patel, "Processor-memory interconnections for multiprocessors," Sixth Annual Symposium on Computer Architecture, pp. 168-177, April 1979.

[11] M. C. Pease, "The indirect binary n-cube microprocessor array," IEEE Transactions on Computers, Vol. C-26, pp. 458-473, May 1977.

[12] H. J. Siegel, "Analysis techniques for SIMD machine interconnection networks and the effects of processor address masks," IEEE Transactions on Computers, Vol. C-26, pp. 153-161, February 1977.

[13] H. J. Siegel, "Preliminary design of a versatile parallel image processing system," Third Biennial Conference on Computing in Indiana, pp. 11-25, April 1978.

[14] H. J. Siegel, "Interconnection networks for SIMD machines," Computer, Vol. 12, pp. 57-65, June 1979.

[15] H. J. Siegel, "A model of SIMD machines and a comparison of various interconnection networks," IEEE Transactions on Computers, Vol. C-28, pp. 907-917, December 1979.

[16] H. J. Siegel, "The theory underlying the partitioning of permutation networks," IEEE Transactions on Computers, Vol. C-29, September 1980.

[17] H. J. Siegel, F. Kemmerer, and M. Washburn, "Parallel memory system for a partitionable SIMD/MIMD machine," 1979 International Conference on Parallel Processing, pp. 212-221, August 1979.

[18] H. J. Siegel, R. J. McMillen, and P. T. Mueller, Jr., "A survey of interconnection methods for reconfigurable parallel processing systems," AFIPS 1979 National Computer Conference, pp. 529-542, June 1979.

[19] H. J. Siegel and R. J. McMillen, The Use of the Multistage Cube Network in a Multimicroprocessor Test Bed, School of Electrical Engineering, Purdue University, Technical Report TR-EE 80-16, 75 pp., June 1980.

[20] H. J. Siegel, P. T. Mueller, Jr., and H. E. Smalley, Jr., "Control of a partitionable multimicroprocessor system," 1978 International Conference on Parallel Processing, pp. 9-17, August 1978.

[21] H. J. Siegel and P. T. Mueller, Jr., "The organization and language design of microprocessors for an SIMD/MIMD system," Second Rocky Mountain Symposium on Microcomputers, pp. 311-340, August 1978.

[22] H. J. Siegel, L. J. Siegel, R. J. McMillen, P. T. Mueller, Jr., and S. D. Smith, "An SIMD/MIMD multimicroprocessor system for image processing and pattern recognition," 1979 IEEE Computer Society Conference on Pattern Recognition and Image Processing, August 1979.

[23] H. J. Siegel and S. D. Smith, "Study of multistage SIMD interconnection networks," Fifth Annual Symposium on Computer Architecture, pp. 223-229, April 1978.

[24] S. D. Smith and H. J. Siegel, "Recirculating, pipelined, and multistage SIMD interconnection networks," 1978 International Conference on Parallel Processing, pp. 206-214, August 1978.

[25] A. R. Tripathi and G. J. Lipovski, "Packet switching in banyan networks," Sixth Annual Symposium on Computer Architecture, pp. 160-167, April 1979.

[26] K. Y. Wen, Interprocessor Connections - Capabilities, Exploitation, and Effectiveness, Ph.D. Thesis, Department of Computer Science, University of Illinois, Report UIUCDCS-R-76-830, 170 pp., October 1976.

[27] C. Wu and T. Feng, "Fault-diagnosis for a class of multistage interconnection networks," 1979 International Conference on Parallel Processing, pp. 269-278, August 1979.

[28] C. Wu and T. Feng, "On a class of multistage interconnection networks," IEEE Transactions on Computers, Vol. C-29, pp. 694-702, August 1980.
