Parallel Computation Models: Slide 1
Lecture 3
Lecture 4
Slide 1
Parallel Computation Models
• PRAM (parallel RAM)
• Fixed Interconnection Network
– bus, ring, mesh, hypercube, shuffle-exchange
• Boolean Circuits
• Combinatorial Circuits
• BSP
• LOGP
Slide 2
PARALLEL AND DISTRIBUTED
COMPUTATION
• MANY INTERCONNECTED PROCESSORS WORKING CONCURRENTLY
[Figure: processors P1, P2, P3, P4, P5, ..., Pn attached to an interconnection network]
• CONNECTION MACHINE
• INTERNET Connects all the computers of the world
Slide 3
TYPES OF MULTIPROCESSING FRAMEWORKS
PARALLEL DISTRIBUTED
TECHNICAL ASPECTS
PURPOSES
Slide 4
FOR PARALLEL SYSTEMS
• COMMUNICATION SERVICES
ROUTING
BROADCASTING
Slide 5
PARALLEL ALGORITHMS
Slide 6
We need a model of computation
• NETWORK (VLSI) MODEL
Slide 7
HYPERCUBE
[Figure: 4-dimensional hypercube with nodes labelled 0000 through 1111]
N = 2^4 = 16 PROCESSORS
degree = 4 (log2 N)
diameter = 4
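The degree and diameter quoted for the hypercube can be checked with a short sketch. It assumes the usual labelling in which two nodes are adjacent when their binary addresses differ in exactly one bit; the function names `hypercube_neighbors` and `hypercube_diameter` are illustrative and do not come from the slides.

```python
from collections import deque


def hypercube_neighbors(node: int, dim: int) -> list[int]:
    """Neighbors of `node` in a dim-dimensional hypercube: flip one address bit."""
    return [node ^ (1 << bit) for bit in range(dim)]


def hypercube_diameter(dim: int) -> int:
    """Largest shortest-path distance from node 0, found by breadth-first search.

    The hypercube looks the same from every node, so the eccentricity of
    node 0 equals the diameter of the whole network.
    """
    dist = {0: 0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for nxt in hypercube_neighbors(node, dim):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())


dim = 4
print("N =", 2 ** dim)
print("neighbors of 0110:", [format(v, "04b") for v in hypercube_neighbors(0b0110, dim)])
print("degree =", len(hypercube_neighbors(0, dim)))
print("diameter =", hypercube_diameter(dim))
```

For `dim = 4` every node has four neighbors and the farthest node from 0000 is 1111, four hops away, which matches the figures on the slide.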
Slide 8
Other important topologies
• binary trees
• mesh of trees
• cube connected cycles
Slide 9
Model Equivalence
• given two models M1 and M2, and a problem
of size n
Slide 10
PRAM
• Parallel Random Access Machine
• Shared-memory multiprocessor
• unlimited number of processors, each
– has unlimited local memory
– knows its ID
– able to access the shared memory
• unlimited shared memory
Slide 11
PRAM MODEL
[Figure: processors P1, P2, ..., Pi, ..., Pn, each connected to a common memory with cells 1, 2, 3, ..., m]
ASSUMPTION: at each time unit each Pi can read a memory cell, make an internal
computation and write another memory cell.
Slide 13
PRAM Instruction Set
• accumulator architecture
– memory cell R0 accumulates results
Slide 14
PRAM Complexity Measures
• for each individual processor
– time: number of instructions executed
– space: number of memory cells accessed
• PRAM machine
– time: time taken by the longest running processor
– hardware: maximum number of active processors
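As a rough illustration of the machine-level measures, the sketch below computes them from a hypothetical execution trace. The trace format (one set of active processor IDs per time step, one instruction per active processor per step) is an assumption made for this example, and "hardware" is read here as the largest number of processors active in any single step.

```python
from collections import Counter


def pram_measures(trace: list[set[int]]) -> tuple[int, int]:
    """Return (time, hardware) for a trace of active-processor sets.

    time     = instructions executed by the longest-running processor
    hardware = maximum number of processors active in any one step
    """
    instructions = Counter(pid for step in trace for pid in step)
    time = max(instructions.values())
    hardware = max(len(step) for step in trace)
    return time, hardware


# Hypothetical run: P0 is active in all five steps, the others join and leave.
trace = [{0}, {0, 1}, {0, 1, 2, 3}, {0, 1}, {0}]
time, hardware = pram_measures(trace)
print("time =", time, "hardware =", hardware)
```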
Slide 15
Two Technical Issues for PRAM
Slide 16
Processor Activation
• P0 places the number of processors (p) in the
designated shared-memory cell
– each active Pi, where i < p, starts executing
– O(1) time to activate
– all processors halt when P0 halts
Slide 17
THE PRAM IS A THEORETICAL (UNFEASIBLE) MODEL
• The interconnection network between processors and memory would require
a very large amount of area.
• Instead of designing ad hoc algorithms for bounded-degree networks, design more
general algorithms for the PRAM model and simulate them on a feasible network.
Slide 18
• For the PRAM model there exists a well developed body of techniques
and methods to handle different classes of computational problems.
COARSE-GRAINED MODELS
• local computation
• communication phase
• synchronization phase
Slide 19
Metrics
A measure of relative performance between a multiprocessor
system and a single-processor system is the speed-up S(p),
defined as follows:
$S(p) = \frac{\text{execution time using a single-processor system}}{\text{execution time using a multiprocessor with } p \text{ processors}} = \frac{T_1}{T_p}$
$\text{Efficiency} = E_p = \frac{S_p}{p}$
$\text{Cost} = C_p = p \, T_p$
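The three metrics translate directly into code. The sketch below uses made-up timings (a sequential time of 100 units and a parallel time of 25 units on 8 processors) purely for illustration; the function names are not from the slides.

```python
def speedup(t1: float, tp: float) -> float:
    """S(p) = T1 / Tp."""
    return t1 / tp


def efficiency(t1: float, tp: float, p: int) -> float:
    """Efficiency = S(p) / p."""
    return speedup(t1, tp) / p


def cost(tp: float, p: int) -> float:
    """Cost = p * Tp."""
    return p * tp


# Hypothetical measurements, chosen only to make the arithmetic easy to follow.
t1, tp, p = 100.0, 25.0, 8
print("speedup    =", speedup(t1, tp))
print("efficiency =", efficiency(t1, tp, p))
print("cost       =", cost(tp, p))
```

With these numbers the speed-up is 4, the efficiency is 0.5 (50%), and the cost is 200, twice the sequential time.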
Slide 20
Metrics
• A parallel algorithm is cost-optimal when
parallel cost = sequential time:
Cp = T1
Ep = 100%
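The step from Cp = T1 to Ep = 100% follows from the definitions of speed-up, efficiency, and cost (S_p = T_1 / T_p, E_p = S_p / p, C_p = p T_p). A short derivation, written out as a sketch:

```latex
\begin{align*}
E_p &= \frac{S_p}{p} = \frac{T_1}{p \, T_p} = \frac{T_1}{C_p}, \\
C_p = T_1 \;&\Longrightarrow\; E_p = \frac{T_1}{T_1} = 1 = 100\%.
\end{align*}
```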
Slide 21
Amdahl’s Law
• f = fraction of the problem that’s
inherently sequential
(1 – f) = fraction that’s parallel
• Speedup with p processors: $S_p = \frac{1}{f + \frac{1 - f}{p}}$
Slide 22
Amdahl’s Law
• Upper bound on speedup (p = ∞)
In $S_p = \frac{1}{f + \frac{1 - f}{p}}$ the term $\frac{1 - f}{p}$ converges to 0, so $S_\infty = \frac{1}{f}$
• Example:
f = 2%
S = 1 / 0.02 = 50
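The example can be reproduced numerically. This sketch evaluates Amdahl's formula, speedup = 1 / (f + (1 - f) / p), for increasing processor counts and compares the result with the limit 1 / f; the function names and the chosen values of p are illustrative.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Speedup with p processors when a fraction f of the work is sequential."""
    return 1.0 / (f + (1.0 - f) / p)


def amdahl_limit(f: float) -> float:
    """Upper bound on speedup as the number of processors grows without bound."""
    return 1.0 / f


f = 0.02  # 2% of the problem is inherently sequential
for p in (1, 10, 100, 1000, 10**6):
    print(f"p = {p:>7}: speedup = {amdahl_speedup(f, p):.2f}")
print("limit:", amdahl_limit(f))
```

The speedup climbs toward 50 but never reaches it, no matter how many processors are added.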
Slide 23
PRAM
• Too many interconnections give problems with synchronization
• However, it is the best conceptual model for designing efficient
parallel algorithms
– due to its simplicity and the possibility of efficiently simulating PRAM
algorithms on more realistic parallel architectures
Slide 24
Shared-Memory Access
Concurrent (C) means that many processors can perform the operation simultaneously on
the same memory cell
Exclusive (E) means not concurrent: at most one processor may access a given memory cell at a time
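To make the C/E distinction concrete, the sketch below classifies a single time step by the addresses that the processors read and write. The function `access_mode` and its input format are assumptions for illustration; it returns the most restrictive read/write combination (for example EREW, CREW, or CRCW) under which the step would be legal.

```python
from collections import Counter


def access_mode(read_addrs: list[int], write_addrs: list[int]) -> str:
    """Name the most restrictive access mode that permits this step.

    read_addrs[i] / write_addrs[i] is the memory cell that processor i
    reads / writes during the step.
    """
    concurrent_read = any(n > 1 for n in Counter(read_addrs).values())
    concurrent_write = any(n > 1 for n in Counter(write_addrs).values())
    read_letter = "C" if concurrent_read else "E"
    write_letter = "C" if concurrent_write else "E"
    return f"{read_letter}R{write_letter}W"


print(access_mode([0, 1, 2], [3, 4, 5]))  # every cell touched by one processor
print(access_mode([0, 0, 1], [3, 4, 5]))  # two processors read cell 0
print(access_mode([0, 0, 1], [3, 3, 5]))  # cell 0 read twice, cell 3 written twice
```

A step that needs concurrent writes also needs a rule for resolving the conflict, which this sketch does not model.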
Slide 25
Example CRCW-PRAM
• Initially
– table A contains values 0 and 1
– output contains value 0
Slide 26
Example CREW-PRAM
• Assume initially table A contains [0,0,0,0,0,1] and we
have the parallel program
Slide 27
Pascal triangle
PRAM CREW
Slide 28