
Introduction to Multiprocessor Concepts
Introduction

• Why parallel processing?
-to meet the increasing demand for higher performance, lower cost, and sustained productivity in real-life applications
• How is it achieved?
-through multiprogramming, multiprocessing, or multicomputing
-Parallelism can appear in various forms such as lookahead, pipelining, vectorization, concurrency, simultaneity, data parallelism, partitioning, interleaving, overlapping, multiplicity, replication, time sharing, multithreading, and distributed computing, at various processing levels.
Multiprocessors and Multicomputers

• Parallel computers can be modeled as physical machines having either a shared common memory or unshared, distributed memories
Shared-Memory Multiprocessors

• 3 main shared-memory multiprocessor models:
-Uniform memory access (UMA)
-Nonuniform memory access (NUMA)
-Cache-only memory architecture (COMA)
• They differ in how the memory and peripheral resources are shared or distributed
The UMA model

• Physical memory is uniformly shared by all the processors
• All processors have equal access time to all memory words
• Peripherals are usually shared as well, though each processor may have a private cache
• Multiprocessors are called tightly coupled systems because of the high degree of resource sharing
• The system interconnect can be a common bus, a crossbar switch, or a multistage network
• The UMA model is suitable for general-purpose and time-sharing applications by multiple users
Cont..

• It is also used to speed up the execution of large programs in time-critical applications
• Coordination of parallel events is done through shared variables in the common memory
• Symmetric multiprocessor: all processors have equal access to all peripheral devices and are equally capable of running the OS kernel and I/O service routines
• Asymmetric multiprocessor: only one or a subset of processors is executive-capable
-The master processor can execute the OS and handle I/O
-The other processors have no I/O capability and are called attached processors (APs)
-APs execute user code under the supervision of the master processor
Cont..
Approximate performance of a
Uniprocessor

• Arrays A(I), B(I), and C(I) are assumed to have N elements each
• Statements L2, L4, and L6 are each assumed to take 1 machine cycle; the time needed by L1, L3, L5, and L7 is ignored
• K cycles are needed for each interprocessor communication operation through the shared memory (the program in question is sketched below)
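The program itself is not reproduced in this text; the following is a minimal sketch, in C, of the kind of two-loop array computation that the L1-L7 labels and the 2N-cycle estimate suggest (the exact statements are an assumption):

/* Hypothetical reconstruction of the example program; the slide's
 * L1-L7 statement labels are shown as comments. */
void compute(int N, double a[], const double b[], const double c[]) {
    for (int i = 0; i < N; i++) {       /* L1 */
        a[i] = b[i] + c[i];             /* L2: 1 cycle per iteration */
    }                                   /* L3 */
    double sum = 0.0;                   /* L4 */
    for (int j = 0; j < N; j++) {       /* L5 */
        sum = sum + a[j];               /* L6: 1 cycle per iteration */
    }                                   /* L7 */
    (void)sum;  /* on a uniprocessor this takes roughly 2N cycles */
}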
Cont..

• Bus contention and memory access conflicts are ignored
• A uniprocessor takes
-2N cycles (N for the I loop and N for the J loop)
• A multiprocessor with M processors takes fewer cycles:
-the looping is partitioned into M sections with L = N/M elements per section, each section handled by one processor (a parallel sketch follows below)
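As an illustration only (OpenMP is used here for brevity and is not mentioned in the slides; the way partial sums are merged is an assumption), the partitioned version might look like this:

#include <omp.h>

/* Each of the M processors handles roughly L = N/M iterations of each loop;
 * merging the partial sums is where the K-cycle shared-memory
 * communication cost comes in. */
double compute_parallel(int N, double a[], const double b[], const double c[]) {
    double sum = 0.0;
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        a[i] = b[i] + c[i];          /* ~N/M iterations per processor */
    }
    #pragma omp parallel for reduction(+:sum)
    for (int j = 0; j < N; j++) {
        sum += a[j];                 /* partial sums combined by the reduction */
    }
    return sum;
}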
NUMA model

• NUMA is a shared-memory system in which the access time varies with the location of the memory word
• The shared memory is physically distributed to all processors in the form of local memories
• The collection of all local memories forms a global address space accessible by all processors
Cont..
COMA

• A special case of the NUMA model in which the distributed main memories are converted into caches
• There is no memory hierarchy at each processor node
• All the caches together form a global address space
• Remote cache access is assisted by distributed cache directories
Cont..

• Multiprocessor systems are suitable for general-purpose multiuser applications where programmability is the major concern
• A shortcoming of multiprocessors is their lack of scalability
• It is also difficult to build a large system around a centralized shared memory, since the design is limited by latency tolerance
• Hence distributed-memory multicomputers can be used instead; they offer greater scalability but are less programmable
Distributed memory computers

• A distributed-memory multicomputer system consists of multiple computers (nodes) interconnected by a message-passing network
• Each node is an autonomous computer consisting of a processor and local memory; attached disks or I/O peripherals are optional
• The message-passing network provides point-to-point static connections among the nodes
• Since a processor can access only its own private local memory, multicomputers are called no-remote-memory-access (NORMA) machines
• Internode communication is carried out by passing messages through the static connection network (a message-passing sketch follows below)
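As a concrete illustration of message passing between nodes, here is a minimal sketch using MPI (MPI is an example choice; the slides do not name any particular message-passing system):

#include <mpi.h>
#include <stdio.h>

/* Node 0 sends a value from its local memory to node 1; no node can read
 * another node's memory directly (NORMA), so data moves only via messages. */
int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int local_value = 42;    /* resides in node 0's local memory */
        MPI_Send(&local_value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int received;
        MPI_Recv(&received, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("node 1 received %d from node 0\n", received);
    }

    MPI_Finalize();
    return 0;
}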
Cont..
Conditions for Parallelism

• To exploit parallelism in computing, the key areas are identified as
-Computation models for parallel computing
-Interprocessor communication in parallel architectures
-System integration
Data dependencies

• To execute several program segments in parallel, each segment must be independent of the others
• Dependence graphs are used to describe the dependence relations among statements
• The nodes of a dependence graph correspond to instructions, and the directed edges show the ordering relations between them
• Analysis of the dependence graph reveals opportunities for parallelization and vectorization
Data Dependence

• Flow dependence: S2 is flow-dependent on S1 if S2 reads a value that S1 writes and an execution path exists from S1 to S2
• Antidependence: S2 is antidependent on S1 if S2 overwrites a variable that S1 reads
• Output dependence: S1 and S2 are output-dependent if they write to the same variable
• I/O dependence: two statements access the same file or I/O device (the first three types are illustrated in the sketch below)
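A small illustrative fragment (not taken from the slides) showing flow, anti-, and output dependence:

/* Hypothetical statements used only to illustrate the dependence types. */
void dependence_demo(void) {
    int a, b, c;
    a = 10;        /* S1 */
    b = a + 1;     /* S2: flow-dependent on S1 (reads the a written by S1) */
    a = 20;        /* S3: antidependent on S2 (overwrites the a that S2 reads)
                      and output-dependent on S1 (both write a) */
    c = a * b;     /* S4: flow-dependent on S2 and S3 */
    (void)c;
}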
Data dependence in programs

• Dependence is a partial ordering relation; that is, not every pair of statements is related
Dependence on I/O operations

• In the slide's example, S1 and S3 are I/O-dependent because both access the same file on tape unit 4 (a similar sketch follows this list)
• Hence data dependence relations should not be arbitrarily violated during program execution, or erroneous results may be produced
• In a uniprocessor system, repeated runs yield the same results because the order of execution is fixed
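The statements S1 and S3 themselves are not reproduced here; the following hypothetical fragment shows the same kind of I/O dependence, where two statements must stay ordered because they touch the same file even though they share no variables:

#include <stdio.h>

/* Hypothetical illustration of I/O dependence on a shared file. */
void io_dependence_demo(void) {
    FILE *f = fopen("unit4.dat", "w+");
    if (!f) return;
    fprintf(f, "%d\n", 1);     /* S1: writes to the file */
    int x = 2;                 /* S2: unrelated to the file */
    rewind(f);
    fscanf(f, "%d", &x);       /* S3: reads the same file -> I/O-dependent on S1 */
    fclose(f);
    (void)x;
}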
Cont..
• In a multiprocessor system, the program order may or may not be preserved, depending on the type of memory model used
• Determinism is obtained by
-control exercised by the programmer
-constrained modification of writable data in shared memory
Control Dependencies

• Control dependence arises when the order of execution of statements cannot be determined before run time
• Different paths taken after a conditional branch may introduce or eliminate data dependences among instructions
• Dependence may also exist between operations performed in successive iterations of a loop
Cont..

• The successive iterations of the loop in the slide's example are control-independent (a loop of this kind is sketched below)
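The loop itself is not reproduced here; a hypothetical loop of the control-independent kind, where the branch in each iteration depends only on data from that same iteration:

/* Hypothetical control-independent loop: the if-test in iteration i uses
 * only b[i], so the iterations can be examined (and run) independently. */
void control_independent(int n, int a[], const int b[]) {
    for (int i = 0; i < n; i++) {
        if (b[i] > 0)
            a[i] = b[i];
        else
            a[i] = 0;
    }
}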


Cont..

• Control dependence example (a loop of the control-dependent kind is sketched below)
• Control dependence often restricts the parallelism that can be exploited
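A hypothetical loop of the control-dependent kind (not the slide's original example): whether the body executes in iteration i depends on a value that may have been produced in iteration i-1, so the iterations cannot simply run in parallel.

/* Hypothetical control-dependent loop: the test on a[i-1] depends on the
 * possible update made in the previous iteration. */
void control_dependent(int n, int a[]) {
    for (int i = 1; i < n; i++) {
        if (a[i - 1] < 0)
            a[i] = -a[i];
    }
}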
Resource Dependence

• Resource dependence is concerned with conflicts in using shared resources such as integer units, floating-point units, registers, memory areas, and the ALU
Bernstein’s Conditions

• A process is a software entity corresponding to the abstraction of a program fragment defined at various processing levels
• Ii : the set of all input variables needed to execute process Pi (operands fetched from memory or registers)
• Oi : the set of all output variables generated after executing Pi (results to be stored in registers or memory locations)
• If P1 and P2 have no input/output dependences on each other, they can execute in parallel, written P1 || P2; this is captured by Bernstein's conditions:
-flow independence: I2 ∩ O1 = ∅
-anti-independence: I1 ∩ O2 = ∅
-output independence: O1 ∩ O2 = ∅
Cont..

Example

• Consider five processes P1, P2, P3, P4, P5 in program order
• Assume each statement takes 1 step to execute and no pipelining is used (a hypothetical worked check of Bernstein's conditions follows below)
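The actual statements behind P1-P5 are not reproduced in this text; the following hypothetical statements show how the I and O sets are formed and how Bernstein's conditions are checked pairwise:

/* Hypothetical statements (not the slide's P1-P5), with their I/O sets. */
void bernstein_demo(int a, int b, int e) {
    int c, d;
    c = a + b;   /* P1: I1 = {a, b}, O1 = {c} */
    d = a * e;   /* P2: I2 = {a, e}, O2 = {d} */
    a = c - 1;   /* P3: I3 = {c},    O3 = {a} */
    /* P1 || P2: I1∩O2, I2∩O1 and O1∩O2 are all empty  -> can run in parallel
     * P1 || P3: I3∩O1 = {c}  (flow dependence)         -> not parallel
     * P2 || P3: I2∩O3 = {a}  (antidependence)          -> not parallel */
    (void)d; (void)a;
}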
Cont..

• The parallelism relation || is commutative and associative, but it is not transitive (Pi || Pj and Pj || Pk do not imply Pi || Pk), so it is not an equivalence relation
Detection of parallelism
Cont..

• In general, P1 || P2 || ... || Pn holds if and only if Pi || Pj for all i ≠ j
• Here n is the number of processes; Bernstein's conditions can be violated collectively or only partially
Hardware and Software Parallelism

• Hardware parallelism is defined by the machine architecture and hardware multiplicity
• It is often a function of cost and performance trade-offs
• It also indicates the peak performance of the processor resources
Cont..

• It can be characterized by the number of instructions issued per machine cycle: if a processor issues k instructions per machine cycle, it is called a k-issue processor
• A conventional processor may take 1 or 2 machine cycles to issue a single instruction
• One Intel processor variant is a 3-issue processor: it issues 3 instructions per machine cycle, e.g. 1 arithmetic, 1 memory-access, and 1 branch instruction
• An IBM processor variant issues 4: 1 arithmetic, 1 memory-access, 1 floating-point, and 1 branch instruction per cycle
• A multiprocessor system built with n k-issue processors should be able to handle a maximum of nk threads of instructions simultaneously (a worked example follows below)
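For instance (illustrative numbers only, not taken from the slides): a system built from n = 4 processors, each of them a 3-issue processor (k = 3), has a hardware parallelism of at most n × k = 4 × 3 = 12 instructions issued, and hence 12 instruction threads handled, per machine cycle.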
Software Parallelism

• Software parallelism is defined by the control and data dependences in the program
• It is a function of the algorithm, the programming style, and the compiler optimization
• The parallelism in a program varies during the execution period
• It often limits the sustained performance of the processor
Mismatch between software
parallelism and hardware parallelism
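The mismatch itself is presumably illustrated by a figure that is not reproduced here. As a rough, hypothetical illustration: in the fragment below the four loads and the two multiplies are mutually independent, so with enough hardware the seven operations could finish in about three steps (software parallelism of roughly 7/3 instructions per cycle), whereas a processor that can issue, say, only one memory access and one arithmetic operation per cycle cannot exploit all of that parallelism.

/* Hypothetical fragment: x and y can be computed independently, but limited
 * issue width forces the hardware to serialize much of the work. */
double mismatch_demo(const double *a, const double *b,
                     const double *c, const double *d) {
    double x = *a * *b;   /* independent of y */
    double y = *c * *d;   /* independent of x */
    return x + y;         /* needs both x and y */
}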
