
Introduction to Multiprocessor Concepts
Introduction

• Why parallel processing?
-to meet the increasing demand for higher performance, lower cost, and sustained productivity in real-life applications
• How is it achieved?
-through multiprogramming, multiprocessing, or multicomputing
-Parallelism can appear in various forms such as lookahead, pipelining, vectorization, concurrency, simultaneity, data parallelism, partitioning, interleaving, overlapping, multiplicity, replication, time sharing, multithreading, and distributed computing, at various processing levels.
Multiprocessors and Multicomputers

• Parallel computers can be modeled as physical machines having either a shared common memory or unshared, distributed memories
Shared-Memory Multiprocessors

• 3 main shared-memory multiprocessor models:
-Uniform memory access (UMA)
-Nonuniform memory access (NUMA)
-Cache-only memory architecture (COMA)
• They differ in how the memory and peripheral resources are shared or distributed
The UMA model

• Physical memory is uniformly shared by all the processors
• All processors have equal access time to all memory words
• Peripherals are usually shared as well, though each processor may have a private cache
• Multiprocessors are called tightly coupled systems because of the high degree of resource sharing
• The system interconnect can be a common bus, a crossbar switch, or a multistage network
• The UMA model is suitable for general-purpose and time-sharing applications by multiple users
Cont..

• It is also used to speed up the execution of large programs in time-critical applications
• Coordination of parallel events is done through shared variables in the common memory
• Symmetric multiprocessor: all processors have equal access to all peripheral devices and are equally capable of running the OS kernel and I/O service routines
• Asymmetric multiprocessor: only one or a subset of processors is executive-capable
-The master processor can execute the OS and handle I/O
-The other processors have no I/O capability and are called attached processors (APs)
-APs execute user code under the supervision of the master processor
Cont..
Approximate performance of a
Uniprocessor

• Arrays A(I), B(I), and C(I) are assumed to have N elements each
• Statements L2, L4, and L6 are each assumed to take 1 machine cycle; the time needed by L1, L3, L5, and L7 is ignored
• K cycles are needed for each interprocessor communication operation through the shared memory (the program in question is sketched below)
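The program itself is not reproduced in this text; the following is a minimal sketch, in C, of the kind of two-loop array computation that the L1-L7 labels and the 2N-cycle estimate suggest (the exact statements are an assumption):

/* Hypothetical reconstruction of the example program; the slide's
 * L1-L7 statement labels are shown as comments. */
void compute(int N, double a[], const double b[], const double c[]) {
    for (int i = 0; i < N; i++) {       /* L1 */
        a[i] = b[i] + c[i];             /* L2: 1 cycle per iteration */
    }                                   /* L3 */
    double sum = 0.0;                   /* L4 */
    for (int j = 0; j < N; j++) {       /* L5 */
        sum = sum + a[j];               /* L6: 1 cycle per iteration */
    }                                   /* L7 */
    (void)sum;  /* on a uniprocessor this takes roughly 2N cycles */
}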
Cont..

• Bus contention and memory access conflicts are ignored
• A uniprocessor takes
-2N cycles (N for the I loop and N for the J loop)
• A multiprocessor with M processors takes fewer cycles:
-the looping is partitioned into M sections with L = N/M elements per section, each section handled by one processor (a parallel sketch follows below)
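As an illustration only (OpenMP is used here for brevity and is not mentioned in the slides; the way partial sums are merged is an assumption), the partitioned version might look like this:

#include <omp.h>

/* Each of the M processors handles roughly L = N/M iterations of each loop;
 * merging the partial sums is where the K-cycle shared-memory
 * communication cost comes in. */
double compute_parallel(int N, double a[], const double b[], const double c[]) {
    double sum = 0.0;
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        a[i] = b[i] + c[i];          /* ~N/M iterations per processor */
    }
    #pragma omp parallel for reduction(+:sum)
    for (int j = 0; j < N; j++) {
        sum += a[j];                 /* partial sums combined by the reduction */
    }
    return sum;
}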
NUMA model

• NUMA is a shared-memory system in which the access time varies with the location of the memory word
• The shared memory is physically distributed to all processors in the form of local memories
• The collection of all local memories forms a global address space accessible by all processors
Cont..
COMA

• A special case of the NUMA model in which the distributed main memories are converted into caches
• There is no memory hierarchy at each processor node
• All the caches together form a global address space
• Remote cache access is assisted by distributed cache directories
Cont..

• Multiprocessor systems are suitable for general-purpose multiuser applications where programmability is the major concern
• A shortcoming of multiprocessors is their lack of scalability
• It is also difficult to build a large system around a centralized shared memory, since the design is limited by latency tolerance
• Hence distributed-memory multicomputers can be used instead; they offer greater scalability but are less programmable
Distributed memory computers

• A distributed-memory multicomputer system consists of multiple computers (nodes) interconnected by a message-passing network
• Each node is an autonomous computer consisting of a processor and local memory; attached disks or I/O peripherals are optional
• The message-passing network provides point-to-point static connections among the nodes
• Since a processor can access only its own private local memory, multicomputers are called no-remote-memory-access (NORMA) machines
• Internode communication is carried out by passing messages through the static connection network (a message-passing sketch follows below)
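As a concrete illustration of message passing between nodes, here is a minimal sketch using MPI (MPI is an example choice; the slides do not name any particular message-passing system):

#include <mpi.h>
#include <stdio.h>

/* Node 0 sends a value from its local memory to node 1; no node can read
 * another node's memory directly (NORMA), so data moves only via messages. */
int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int local_value = 42;    /* resides in node 0's local memory */
        MPI_Send(&local_value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int received;
        MPI_Recv(&received, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("node 1 received %d from node 0\n", received);
    }

    MPI_Finalize();
    return 0;
}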
Cont..
Conditions for Parallelism

• To exploit parallelism in computing, the key areas are identified as
-Computation models for parallel computing
-Interprocessor communication in parallel architectures
-System integration
Data dependencies

• To execute several program segments in parallel, each segment must be independent of the others
• Dependence graphs are used to describe the dependence relations among statements
• The nodes of a dependence graph correspond to instructions, and the directed edges show the ordering relations between them
• Analysis of the dependence graph reveals opportunities for parallelization and vectorization
Data Dependence

• Flow dependence: S2 is flow-dependent on S1 if S2 reads a value that S1 writes and an execution path exists from S1 to S2
• Antidependence: S2 is antidependent on S1 if S2 overwrites a variable that S1 reads
• Output dependence: S1 and S2 are output-dependent if they write to the same variable
• I/O dependence: two statements access the same file or I/O device (the first three types are illustrated in the sketch below)
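A small illustrative fragment (not taken from the slides) showing flow, anti-, and output dependence:

/* Hypothetical statements used only to illustrate the dependence types. */
void dependence_demo(void) {
    int a, b, c;
    a = 10;        /* S1 */
    b = a + 1;     /* S2: flow-dependent on S1 (reads the a written by S1) */
    a = 20;        /* S3: antidependent on S2 (overwrites the a that S2 reads)
                      and output-dependent on S1 (both write a) */
    c = a * b;     /* S4: flow-dependent on S2 and S3 */
    (void)c;
}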
Data dependence in programs

• Dependence is a partial ordering relation; that is, not every pair of statements is related
Dependence on I/O operations

• In the slide's example, S1 and S3 are I/O-dependent because both access the same file on tape unit 4 (a similar sketch follows this list)
• Hence data dependence relations should not be arbitrarily violated during program execution, or erroneous results may be produced
• In a uniprocessor system, repeated runs yield the same results because the order of execution is fixed
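The statements S1 and S3 themselves are not reproduced here; the following hypothetical fragment shows the same kind of I/O dependence, where two statements must stay ordered because they touch the same file even though they share no variables:

#include <stdio.h>

/* Hypothetical illustration of I/O dependence on a shared file. */
void io_dependence_demo(void) {
    FILE *f = fopen("unit4.dat", "w+");
    if (!f) return;
    fprintf(f, "%d\n", 1);     /* S1: writes to the file */
    int x = 2;                 /* S2: unrelated to the file */
    rewind(f);
    fscanf(f, "%d", &x);       /* S3: reads the same file -> I/O-dependent on S1 */
    fclose(f);
    (void)x;
}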
Cont..
• In a multiprocessor system, the program order may or may not be preserved, depending on the type of memory model used
• Determinism is obtained by
-control exercised by the programmer
-constrained modification of writable data in shared memory
Control Dependencies

• Control dependence arises when the order of execution of statements cannot be determined before run time
• Different paths taken after a conditional branch may introduce or eliminate data dependences among instructions
• Dependence may also exist between operations performed in successive iterations of a loop
Cont..

• The successive iterations of the loop in the slide's example are control-independent (a loop of this kind is sketched below)
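The loop itself is not reproduced here; a hypothetical loop of the control-independent kind, where the branch in each iteration depends only on data from that same iteration:

/* Hypothetical control-independent loop: the if-test in iteration i uses
 * only b[i], so the iterations can be examined (and run) independently. */
void control_independent(int n, int a[], const int b[]) {
    for (int i = 0; i < n; i++) {
        if (b[i] > 0)
            a[i] = b[i];
        else
            a[i] = 0;
    }
}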


Cont..

• Control dependence example (a loop of the control-dependent kind is sketched below)
• Control dependence often restricts the parallelism that can be exploited
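A hypothetical loop of the control-dependent kind (not the slide's original example): whether the body executes in iteration i depends on a value that may have been produced in iteration i-1, so the iterations cannot simply run in parallel.

/* Hypothetical control-dependent loop: the test on a[i-1] depends on the
 * possible update made in the previous iteration. */
void control_dependent(int n, int a[]) {
    for (int i = 1; i < n; i++) {
        if (a[i - 1] < 0)
            a[i] = -a[i];
    }
}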
Resource Dependence

• Resource dependence is concerned with conflicts in using shared resources such as integer units, floating-point units, registers, memory areas, and the ALU
Bernstein’s Conditions

• A process is a software entity corresponding to the abstraction of a program fragment defined at various processing levels
• Ii : the set of all input variables needed to execute process Pi (operands fetched from memory or registers)
• Oi : the set of all output variables generated after executing Pi (results to be stored in registers or memory locations)
• If P1 and P2 have no input/output dependences on each other, they can execute in parallel, written P1 || P2; this is captured by Bernstein's conditions:
-flow independence: I2 ∩ O1 = ∅
-anti-independence: I1 ∩ O2 = ∅
-output independence: O1 ∩ O2 = ∅
Cont..

Example

• Consider five processes P1, P2, P3, P4, P5 in program order
• Assume each statement takes 1 step to execute and no pipelining is used (a hypothetical worked check of Bernstein's conditions follows below)
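The actual statements behind P1-P5 are not reproduced in this text; the following hypothetical statements show how the I and O sets are formed and how Bernstein's conditions are checked pairwise:

/* Hypothetical statements (not the slide's P1-P5), with their I/O sets. */
void bernstein_demo(int a, int b, int e) {
    int c, d;
    c = a + b;   /* P1: I1 = {a, b}, O1 = {c} */
    d = a * e;   /* P2: I2 = {a, e}, O2 = {d} */
    a = c - 1;   /* P3: I3 = {c},    O3 = {a} */
    /* P1 || P2: I1∩O2, I2∩O1 and O1∩O2 are all empty  -> can run in parallel
     * P1 || P3: I3∩O1 = {c}  (flow dependence)         -> not parallel
     * P2 || P3: I2∩O3 = {a}  (antidependence)          -> not parallel */
    (void)d; (void)a;
}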
Cont..

• The parallelism relation || is commutative and associative, but it is not transitive (Pi || Pj and Pj || Pk do not imply Pi || Pk), so it is not an equivalence relation
Detection of parallelism
Cont..

• In general, P1 || P2 || ... || Pn holds if and only if Pi || Pj for all i ≠ j
• Here n is the number of processes; Bernstein's conditions can be violated collectively or only partially
Hardware and Software Parallelism

• Hardware parallelism is defined by the machine architecture and hardware multiplicity
• It is often a function of cost and performance trade-offs
• It also indicates the peak performance of the processor resources
Cont..

• It can be characterized by the number of instructions issued per machine cycle: if a processor issues k instructions per machine cycle, it is called a k-issue processor
• A conventional processor may take 1 or 2 machine cycles to issue a single instruction
• One Intel processor variant is a 3-issue processor: it issues 3 instructions per machine cycle, e.g. 1 arithmetic, 1 memory-access, and 1 branch instruction
• An IBM processor variant issues 4: 1 arithmetic, 1 memory-access, 1 floating-point, and 1 branch instruction per cycle
• A multiprocessor system built with n k-issue processors should be able to handle a maximum of nk threads of instructions simultaneously (a worked example follows below)
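For instance (illustrative numbers only, not taken from the slides): a system built from n = 4 processors, each of them a 3-issue processor (k = 3), has a hardware parallelism of at most n × k = 4 × 3 = 12 instructions issued, and hence 12 instruction threads handled, per machine cycle.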
Software Parallelism

• Software parallelism is defined by the control and data dependences in the program
• It is a function of the algorithm, the programming style, and the compiler optimization
• The parallelism in a program varies during the execution period
• It often limits the sustained performance of the processor
Mismatch between software
parallelism and hardware parallelism
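The mismatch itself is presumably illustrated by a figure that is not reproduced here. As a rough, hypothetical illustration: in the fragment below the four loads and the two multiplies are mutually independent, so with enough hardware the seven operations could finish in about three steps (software parallelism of roughly 7/3 instructions per cycle), whereas a processor that can issue, say, only one memory access and one arithmetic operation per cycle cannot exploit all of that parallelism.

/* Hypothetical fragment: x and y can be computed independently, but limited
 * issue width forces the hardware to serialize much of the work. */
double mismatch_demo(const double *a, const double *b,
                     const double *c, const double *d) {
    double x = *a * *b;   /* independent of y */
    double y = *c * *d;   /* independent of x */
    return x + y;         /* needs both x and y */
}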
