in which all processors execute the same program but operate on different data is called a Single Instruction stream, Multiple Data stream (SIMD) system. The multiple data streams are the sequences of data items accessed by the individual processors in their own memories. The third scheme involves a number of independent processors, each executing a different program and accessing its own sequence of data items. Such machines are called Multiple Instruction stream, Multiple Data stream (MIMD) systems. The fourth possibility is a Multiple Instruction stream, Single Data stream (MISD) system. In such a system, a common data structure is manipulated by separate processors, each executing a different program. This form of computation does not occur often in practice, so it is not pursued here.

This chapter concentrates on MIMD structures because they are the most useful for general purposes. However, we first briefly consider the SIMD structure to illustrate the kind of applications for which it is well-suited.

12.2 ARRAY PROCESSORS

The SIMD form of parallel processing, also called array processing, was the first form of parallel processing to be studied and implemented. In the early 1970s, a system named ILLIAC-IV [2] was designed at the University of Illinois using this approach and was later built by Burroughs Corporation. Figure 12.1 illustrates the structure of an array processor. A two-dimensional grid of processing elements executes an instruction stream that is broadcast from a central control processor. As each instruction is broadcast, all elements execute it simultaneously. Each processing element is connected to its four nearest neighbors for purposes of exchanging data. End-around connections may be provided in both rows and columns, but they are not shown in the figure.

Figure 12.1 An array processor: a control processor broadcasts instructions to a grid of processing elements.

Let us consider a specific computation in order to understand the capabilities of the SIMD architecture. The grid of processing elements can be used to solve two-dimensional problems. For example, if each element of the grid represents a point in space, the array can be used to compute the temperature at points in the interior of a conducting plane. Assume that the edges of the plane are held at some fixed temperatures. An approximate solution at the discrete points represented by the processing elements is derived as follows. First, the outer edges are initialized to the specified temperatures. All interior points are initialized to arbitrary values, not necessarily the same. Iterations are then executed in parallel at all points. Each iteration consists of calculating an improved estimate of the temperature at a point by averaging the current values of its four nearest neighbors. The process stops when changes in the estimates during successive iterations are less than some predefined small quantity.

The capability needed in the array processor to perform such calculations is quite simple. Each element must be able to exchange values with each of its neighbors over the paths shown in the figure. Each processing element has a few registers and some local memory to store data. It also has a register, which we can call the network register, that facilitates movement of values to and from its neighbors. The control processor can broadcast an instruction to shift the values in the network registers one step up, down, left, or right. Each processing element also contains an ALU to execute arithmetic instructions broadcast by the control processor. Using these basic facilities, a sequence of instructions can be broadcast repeatedly to implement the iterative loop. The control processor must be able to determine when each of the processing elements has developed its component of the temperature to the required accuracy. To do this, each element sets a status bit to 1 to indicate this condition. The grid interconnections include a facility that allows the controller to detect when all status bits are set at the end of an iteration.
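The whole procedure is easy to mimic in software. The following Python sketch is a minimal illustration under assumed values (the grid size, edge temperatures, and tolerance eps are mine, not the text's): shifting a copy of the grid plays the role of the broadcast network-register shifts, the interior update models the lockstep averaging, and the final test models the controller checking that every status bit is set.

    import numpy as np

    # Illustrative setup (assumed values): a small grid whose outer
    # edges are held at fixed temperatures.
    T = np.zeros((8, 8))
    T[0, :], T[-1, :], T[:, 0], T[:, -1] = 100.0, 0.0, 50.0, 50.0

    eps = 1e-4                 # the "predefined small quantity"
    converged = False
    while not converged:
        # Four broadcast shift steps: every element reads the value held
        # in a neighbor's network register, one direction at a time.
        below = np.roll(T, -1, axis=0)
        above = np.roll(T, 1, axis=0)
        right = np.roll(T, -1, axis=1)
        left = np.roll(T, 1, axis=1)
        new = T.copy()
        # All interior elements average their four neighbors in lockstep.
        new[1:-1, 1:-1] = (below + above + right + left)[1:-1, 1:-1] / 4.0
        # Each element "sets its status bit" when its estimate has settled;
        # the controller stops the loop only when all status bits are set.
        converged = bool(np.all(np.abs(new - T) < eps))
        T = new

On the actual hardware, each of these whole-array operations corresponds to a single instruction broadcast by the control processor and executed by all processing elements at the same time.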
An interesting question with respect to array processors is whether it is better to use a relatively small number of powerful processors or a large number of very simple processors. ILLIAC-IV is an example of the former choice: its 64 processors were relatively powerful. Array processors introduced in the late 1980s are examples of the latter choice. The CM-2 machine produced by the Thinking Machines Corporation could accommodate up to 65,536 processors, but each processor is only one bit wide. MasPar's MP-1216 has a maximum of 16,384 processors that are 4 bits wide. The Cambridge Parallel Processing Gamma II Plus machines can have up to 4096 processors that can operate on either byte-sized or bit-sized operands. These choices reflect the belief that in the SIMD environment, it is more useful to have a high degree of parallelism rather than to have fewer but more powerful processors.

Array processors are highly specialized machines. They are well-suited to numerical problems that can be expressed in matrix or vector format. Recall that supercomputers with a vector architecture are also suitable for solving such problems. A key difference between vector-based machines and array processors is that the former achieve high performance through heavy use of pipelining, whereas the latter provide extensive parallelism by replication of computing modules. Neither array processors nor vector-based machines are particularly useful in speeding up general computations, and they do not have a large commercial market.

12.3 THE STRUCTURE OF GENERAL-PURPOSE MULTIPROCESSORS

12.4 INTERCONNECTION NETWORKS

A hypercube network connects 2^n nodes in an n-dimensional cube. In addition to a processor, each node may contain some memory and I/O circuits. Nodes are assigned n-bit binary addresses in such a way that the addresses of any two neighbors differ in exactly one bit position, as shown in the figure.

Figure 12.7 A 3-dimensional hypercube network.

Routing messages through the hypercube is particularly easy. If the processor at node Ni wishes to send a message to node Nj, it proceeds as follows. The binary addresses of the source, i, and the destination, j, are compared from least to most significant bits. Suppose that they first differ in position p. Node Ni then sends the message to its neighbor whose address, k, differs from i in bit position p. Node Nk forwards the message to the appropriate neighbor using the same address comparison scheme. The message gets closer to destination node Nj with each of these hops from one node to another. For example, a message from node N2 to node N5 requires 3 hops, passing through nodes N3 and N1. The maximum distance that any message needs to travel in an n-dimensional hypercube is n hops.
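The address-scan rule is mechanical enough to express directly. In the Python sketch below (the function name route and the integer encoding of node addresses are illustrative choices, not from the text), each hop flips the lowest-order bit in which the current address still differs from the destination:

    def route(src, dst):
        # Return the sequence of nodes visited when the addresses are
        # compared from least to most significant bit, as described above.
        path = [src]
        while src != dst:
            diff = src ^ dst         # bit positions where the addresses differ
            src ^= diff & -diff      # flip the lowest-order differing bit
            path.append(src)
        return path

    # The example from the text: routing from N2 (010) to N5 (101).
    print(route(0b010, 0b101))       # prints [2, 3, 1, 5]: 3 hops via N3 and N1

Because each hop clears exactly one differing bit, the number of hops equals the number of bit positions in which i and j differ, which can never exceed n.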
Scanning the address pattern from right to left is only one of the methods that can be used to determine message routing. Any other scheme that moves the message closer to its destination on each hop is equally acceptable, as long as the routing decision can be made at each node on the path using only local information. This feature of the hypercube is attractive from the reliability viewpoint. The existence of multiple paths between two nodes means that when faulty links are encountered, they can usually be avoided by simple, local routing decisions. If one of the shortest routes is not available, the message may be sent over a longer path. When this is done, care must be taken to avoid looping, which is the situation in which the message circulates in a closed loop and never reaches its destination. A simple policy of this kind is sketched at the end of this section.

Hypercube interconnection networks have been used in a number of machines. The best-known examples include Intel's iPSC, which used a 7-dimensional cube to connect up to 128 nodes, and NCUBE's NCUBE/ten, which had up to 1024 nodes in a 10-dimensional cube. Hypercube networks lost much of their popularity in the early 1990s when mesh-based structures emerged as a more attractive alternative.
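The fault-avoidance behavior can be sketched as well. The depth-first policy below is only an illustration under simple assumptions (it is not a scheme given in the text): it tries hops that reduce the distance to the destination first, falls back to a longer path when the preferred links are faulty, and never revisits a node, which rules out looping.

    def fault_route(src, dst, n, faulty):
        # Search for a path from src to dst in an n-dimensional hypercube,
        # avoiding the links in `faulty` (a set of frozenset node pairs).
        def search(node, visited):
            if node == dst:
                return [node]
            neighbors = [node ^ (1 << b) for b in range(n)]
            healthy = [m for m in neighbors
                       if frozenset((node, m)) not in faulty and m not in visited]
            # Prefer neighbors that are closest to the destination.
            healthy.sort(key=lambda m: bin(m ^ dst).count("1"))
            for m in healthy:
                rest = search(m, visited | {m})
                if rest is not None:
                    return [node] + rest
            return None              # dead end: backtrack toward a longer path
        return search(src, {src})

    # With the direct link between N2 and N3 faulty, the message still
    # reaches N5 over an alternative route.
    print(fault_route(0b010, 0b101, 3, {frozenset((0b010, 0b011))}))
    # prints [2, 0, 1, 5]

Any policy with these two properties, making progress when it can and never revisiting a node, is guaranteed to stop: the message either reaches its destination or runs out of healthy links to try.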
