ACKNOWLEDGEMENT

First and foremost, I thank my teacher, who assigned me this term paper to bring out my creative capabilities.

I express my gratitude to my parents for being a continuous source of encouragement and for all the financial aid they have given me. I would like to acknowledge the assistance provided to me by the library staff of LPU Phagwara. My heartfelt gratitude goes to my friends and roommates for helping me to complete my task in time.

Manpreet


Introduction

In computing, MIMD (multiple instruction, multiple data) is a technique employed to achieve parallelism. Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data. MIMD architectures may be used in a number of application areas such as computer-aided design/computer-aided manufacturing, simulation, modeling, and as communication switches.

MIMD machines can be of either the shared memory or the distributed memory category. These classifications are based on how MIMD processors access memory. Shared memory machines may be of the bus-based, extended, or hierarchical type. Distributed memory machines may have hypercube or mesh interconnection schemes.

In parallel systems, there are two kinds of fundamental models: shared memory and message passing. From a programmer's perspective, shared memory computers, while easy to program, are difficult to build and do not scale beyond a few processors. Message passing computers, while easy to build and scale, are difficult to program. In some sense, the shared memory model and the message passing model are equivalent.

MIMD Architecture's Goal

Without access to a shared memory architecture, a processor in a multicomputer cannot directly access a remote memory block. Rather than reallocating the processor to a different process or letting it stall while it waits, distributed MIMD architectures are used. Thus, the main point of this architectural design is to develop a message passing parallel computer system organized such that the processor time spent in communication within the network is reduced to a minimum.

Components of the MultiComputer and Their Tasks

Within a multicomputer, there are a large number of nodes and a communication network linking these nodes together. Inside each node, there are three important elements that have tasks related to message passing:

a. Computation Processor and Private Memory
b. Communication Processor. This component is responsible for organizing communication among the multicomputer nodes, "packetizing" a message as a chunk of memory on the sending end, and "depacketizing" the same message for the receiving node.
c. Router, commonly referred to as a switch unit. The router's main task is to transmit the message from one node to the next and to assist the communication processor in organizing the communication of the message through the network of nodes.

The development of each of the above three elements took place progressively, as hardware became more and more complex and useful. First came the Generation I multicomputers, in which messages were passed through direct links between nodes, but there were no communication processors or routers. Then Generation II multicomputers came along with independent switch units that were separate from the processor, and finally, in the Generation III multicomputer, all three components exist.

MultiComputer Classification Schemes

One method of classifying a multicomputer's scheme of message transmission is its interconnection network topology, which in plain English means how the nodes are geometrically arranged as a network.
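To give a feel for how the geometric arrangement of nodes affects message transmission, the following is a minimal sketch that builds two small topologies and measures three properties: the number of links per node (degree), the largest number of hops a message may need between any two nodes (diameter), and the total number of links. The function names `ring`, `hypercube`, `hops_from`, `diameter`, `max_degree` and `link_count` are made up for this illustration and do not come from any particular library.

```python
from collections import deque


def ring(n):
    """Adjacency lists for a ring of n nodes (each node linked to its two neighbours)."""
    return {i: sorted({(i - 1) % n, (i + 1) % n}) for i in range(n)}


def hypercube(dim):
    """Adjacency lists for a hypercube of 2**dim nodes; neighbours differ in exactly one bit."""
    return {i: [i ^ (1 << b) for b in range(dim)] for i in range(2 ** dim)}


def hops_from(graph, source):
    """Breadth-first search: minimum hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter(graph):
    """Longest of the shortest paths over all pairs of nodes."""
    return max(max(hops_from(graph, s).values()) for s in graph)


def max_degree(graph):
    """Largest number of links attached to any single node."""
    return max(len(nbrs) for nbrs in graph.values())


def link_count(graph):
    """Total number of links; each link appears in two adjacency lists."""
    return sum(len(nbrs) for nbrs in graph.values()) // 2


if __name__ == "__main__":
    for name, g in (("ring(8)", ring(8)), ("hypercube(3)", hypercube(3))):
        print(name, "degree", max_degree(g), "diameter", diameter(g), "links", link_count(g))
```

Both example networks have eight nodes, so comparing them shows the tradeoff directly: the hypercube reaches any node in fewer hops, but it pays for that with more links per node and more links overall.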

Interconnection Network Topology

Background Information On Interconnection Topologies

Interconnection network topology has a great influence on message transmission. The idea behind it is related to path-finding problems in computer science such as the Traveling Salesman problem: the shortest path from point A to point B (or in this case node A to node B) is directly linked to the shape that the nodes generate when linked together. There is great value in this information.

With a geometrically arranged network (e.g. star, hypercube) there are several tradeoffs associated with the given design's ability to transmit messages as quickly as possible:

1. The Network's Size. The more nodes you have, the longer it may take for node X to pass the message to node Y, for obvious reasons.
2. The Degree of the Node. This term means the number of input and output links to a given node. The lower the degree of a node, the better the design for message transmission.
3. The Network's Diameter is the longest of the shortest paths between any pair of nodes in the network. When the diameter is smaller, the latency of a message transmission is reduced as well.
4. The Arc Connectivity is the minimum number of arcs that need to be removed in order to split the network into two disconnected networks.
5. The network's Bisection Width is the minimum number of links that need to be removed so that the entire network splits into two halves.
6. Finally, the Cost is the number of communication links required for the network.

Non-Uniform Memory Access (NUMA) Machines

Non-uniform memory access (NUMA) machines were designed to avoid the memory access bottleneck of UMA machines. The logically shared memory is physically distributed among the processing nodes of NUMA machines, leading to distributed shared memory architectures. On one hand these parallel computers became highly scalable, but on the other hand they are very sensitive to data allocation in local memories. Accessing a local memory segment of a node is much faster than accessing a remote memory segment. Not by chance, the structure and design of these machines resemble in many ways that of distributed memory multicomputers. The main difference is in the organization of the address space.
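The sensitivity of NUMA machines to data allocation can be made concrete with a small calculation. The sketch below is only a simplified model: it assumes every access is either purely local or purely remote, and the latencies of 100 ns and 400 ns are hypothetical numbers chosen for illustration, not measurements of any real machine.

```python
def average_access_time(local_fraction, local_ns, remote_ns):
    """Mean memory access time when local_fraction of accesses hit the node's own memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    # Hypothetical latencies: 100 ns for local memory, 400 ns for remote memory.
    for fraction in (0.5, 0.9, 0.99):
        print(fraction, average_access_time(fraction, 100.0, 400.0))
```

Under these assumed numbers, a program that finds only half of its data locally sees an average access time much closer to the remote latency than one that finds nearly all of its data locally, which is why placing data in the right local memory matters.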

In multiprocessors, a global address space is applied that is uniformly visible from each processor; that is, all processors can transparently access all memory locations. In multicomputers, the address space is replicated in the local memories of the processing elements. This difference in the address space of the memory is also reflected at the software level: distributed memory multicomputers are programmed on the basis of the message-passing paradigm, while NUMA machines are programmed on the basis of the global address space (shared memory) principle.

Cache-Coherent Non-Uniform Memory Access (CC-NUMA) Machines

All the CC-NUMA machines share the common goal of building a scalable shared memory multiprocessor. The main difference among them is in the way the memory and cache coherence mechanisms are distributed among the processing nodes. Another distinguishing design issue is the selection of the interconnection network among the nodes. They demonstrate a progress from bus-based networks towards a more general interconnection network and from the snoopy cache coherency protocol towards a directory scheme. The figure below shows the design space of the CC-NUMA machines.

The Wisconsin multicube architecture is the closest generalization of a single bus-based multiprocessor. It relies completely on the snoopy cache protocol, but in a hierarchical way. The main goal of the Stanford FLASH design was the efficient integration of cache-coherent shared memory with high-performance message passing. The FLASH applies a directory scheme for maintaining cache coherence.
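The general idea of a directory scheme can be sketched in a few lines. This is a heavily simplified toy model, not the protocol of FLASH or of any other real machine: the class name `Directory`, its methods, and the message log are all assumptions made for this illustration. The directory remembers which nodes hold a copy of each memory block, so a write only needs to send invalidations to those nodes instead of broadcasting on a shared bus as a snoopy protocol would.

```python
class Directory:
    """Toy directory: for each memory block, remember which nodes hold a cached copy."""

    def __init__(self):
        self.sharers = {}   # block name -> set of node ids holding a valid copy
        self.messages = []  # log of invalidation messages sent, kept for inspection

    def read(self, node, block):
        # A read adds the reading node to the block's sharer set.
        self.sharers.setdefault(block, set()).add(node)

    def write(self, node, block):
        # A write must leave the writer as the only holder of the block, so every
        # other sharer receives a point-to-point invalidation message.
        for other in sorted(self.sharers.get(block, set()) - {node}):
            self.messages.append(("invalidate", other, block))
        self.sharers[block] = {node}


if __name__ == "__main__":
    directory = Directory()
    for node in (0, 1, 2):
        directory.read(node, "A")
    directory.write(1, "A")
    print(directory.messages)
    print(directory.sharers)
```

Because the invalidations go only to nodes recorded as sharers, the number of messages depends on how widely a block is shared rather than on the total number of nodes, which is the property that makes directory schemes attractive for larger machines.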

Cache-Only Memory Access (COMA) Machines

COMA machines try to avoid the problems of static memory allocation of NUMA and CC-NUMA machines by excluding main memory blocks from the local memory of nodes and employing only large caches as node memories. In these architectures only cache memories are present; no main memory is employed, either in the form of a central shared memory as in UMA machines or in the form of a distributed main memory as in NUMA and CC-NUMA computers. Similarly to the way virtual memory has eliminated the need to handle memory addresses explicitly, COMA machines render static data allocation to local memories superfluous. In COMA machines data allocation is demand driven.

Data is always attracted to the local (cache) memory where it is needed. In COMA machines similar cache coherence schemes can be applied as in other shared memory systems. The only difference is that these techniques must be extended with the capability of finding the data on a cache read miss and of handling replacement. Since COMA machines are scalable parallel architectures, only cache coherence protocols that support large-scale parallel systems can be applied, that is, directory schemes and hierarchical cache coherent schemes. Two representative COMA architectures are DDM (Data Diffusion Machine) and KSR1.

Multi-processor (shared memory system): Advantages and Disadvantages

+ No need to partition data or program; uniprocessor programming techniques can be adapted
+ Communication between processors is efficient
- Synchronized access to shared data in memory is needed. Synchronising constructs (semaphores, conditional critical regions, monitors) result in nondeterministic behaviour, which can lead to programming errors that are difficult to discover
- Lack of scalability due to the (memory) contention problem
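The need for synchronized access to shared data can be illustrated with threads standing in for processors. The sketch below is a minimal example under that assumption; the function name `locked_count` and its parameters are made up for illustration. Each worker increments a shared counter, and a lock makes the read-modify-write step atomic. Without the lock, two workers could read the same old value and one update could be lost, depending on how their steps interleave.

```python
import threading


def locked_count(workers, increments):
    """Have several threads increment one shared counter, guarding each update with a lock."""
    counter = {"value": 0}
    lock = threading.Lock()

    def work():
        for _ in range(increments):
            # The lock ensures only one thread at a time reads and updates the counter.
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter["value"]


if __name__ == "__main__":
    print(locked_count(4, 1000))
```

With the lock in place the final count always equals workers times increments, whatever order the threads run in. The order in which threads acquire the lock is still not fixed from run to run, which hints at why errors in synchronized code can be difficult to reproduce and discover.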


Classification of MIMD computers
