
Cluster Computing

(IN 4700)

Dr. C. Amalraj
12/02/2021
The University of Moratuwa
amalraj@uom.lk
Lecture 2:
Parallel Processing
Outline
 Introduction: Parallel Processing
 Why use Parallel Processing?
 Flynn’s Classical Taxonomy
 SISD, SIMD, MISD, MIMD
 Parallel Architectures
 Designing Parallel Programs
 Embarrassingly parallel problems
Parallel Processing
What is Parallel Processing?
 The simultaneous use of multiple resources to solve a
computational problem:
 The problem is broken into discrete parts that can be
solved concurrently
 Instructions from each part execute simultaneously
on different CPUs
Why use Parallel Processing?
 Save time
 Solve larger problems:
 Many problems are so large and/or complex that it is
impractical or impossible to solve them on a single
computer
 Use of non-local resources:
 Using compute resources on a wide area network, or
even the Internet when local compute resources are
scarce
 E.g. : SETI@home : over 1.3 million users, 3.2 million
computers in nearly every country in the world.
Why use Parallel Processing?
 Limits to serial computing:
 Transmission speeds : limits on how fast data can move
through hardware
 Limits to miniaturization
 Heating issues : power consumption is proportional to
frequency
 Economic limitations : it is increasingly expensive to
make a single processor faster
 Current computer architectures are increasingly relying
upon hardware level parallelism to improve
performance:
 Multiple execution units
 Pipelined instructions
 Multi-core
Hardware Level Parallelism
 Multiple execution units
 An execution unit is a part of the CPU that performs
the operations and calculations as instructed by the
computer program.
 Pipelined instructions / Instruction pipelining
 A technique that implements a form of parallelism
called instruction-level parallelism within a single
processor.
 Allows faster CPU throughput (the number of
instructions that can be executed in a unit of time)
 Multi-core
a single computing component with two or more
independent processing units (called "cores")
 each core reads and executes program instructions
The Instruction Cycle
Why use Parallel Processing?
 Parallelism and Moore's law:
Moore's law : the performance of chips
effectively doubles every 2 years due to the
addition of more transistors on the chip
Parallel computation is necessary to take full
advantage of the gains allowed by Moore's law
Flynn’s Classical Taxonomy
Classification of Parallel Computers : Flynn’s Classical Taxonomy
 Single Instruction, Single Data (SISD):
A serial (non-parallel) computer
Single Instruction: Only one instruction stream is
being acted on by the CPU during any one clock cycle
Single Data: Only one data stream is being used as
input during any one clock cycle
Flynn’s Classical Taxonomy
 Single Instruction, Multiple Data (SIMD):
Single Instruction: All processing units
execute the same instruction at any given
clock cycle
Multiple Data: Each processing unit can
operate on a different data element
Best suited for problems characterized by a
high degree of regularity, such as image
processing.
E.g. : GPU
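As an illustrative sketch (not from the slides), the C loop below shows the SIMD idea: one instruction, such as a vector add, applied to many data elements in the same clock cycle. The function name vector_add and the OpenMP simd directive are assumptions for illustration; a compiler such as gcc can vectorize it with -O2 -fopenmp-simd.

    #include <stddef.h>

    /* Add two float arrays element by element.  With SIMD the compiler
       emits vector instructions, so one add operates on several
       elements at once. */
    void vector_add(const float *a, const float *b, float *c, size_t n)
    {
        #pragma omp simd
        for (size_t i = 0; i < n; i++)
            c[i] = a[i] + b[i];   /* same instruction, different data */
    }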
Flynn’s Classical Taxonomy
 Multiple Instruction, Single Data (MISD):
Multiple Instruction: Each processing unit
operates on the data independently via
separate instruction streams.
Single Data: A single data stream is fed into
multiple processing units.
Few actual examples.
 E.g. : Space Shuttle flight control computers
Flynn’s Classical Taxonomy
 Multiple Instruction, Multiple Data (MIMD):
Multiple Instruction: Every processor may be
executing a different instruction stream
Multiple Data: Every processor may be
working with a different data stream
E.g. : networked parallel computer clusters
and "grids", multi-processor SMP computers,
multi-core PCs.
Parallel Architectures
 Shared Memory :
All processors can access all memory as a global
address space (see the sketch below)
Changes in a memory location effected by one
processor are visible to all other processors
Shared memory machines can be divided into
two main classes based upon memory access
times:
Uniform Memory Access (UMA)
Non-Uniform Memory Access (NUMA)
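A minimal shared-memory sketch in C (an illustration, not part of the slides; POSIX threads are an assumed programming model here): two threads increment one counter that lives in the common address space, so a mutex keeps their updates from interfering.

    #include <pthread.h>
    #include <stdio.h>

    long counter = 0;                     /* lives in the shared address space */
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    void *worker(void *arg)
    {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);    /* update is visible to all threads */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);   /* prints 200000 */
        return 0;
    }

Build with gcc -pthread.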
Parallel Architectures
 Uniform Memory Access (UMA) :
Commonly represented by Symmetric
Multiprocessor (SMP) machines
Identical processors
Equal access times to memory
Parallel Architectures
 Non-Uniform Memory Access (NUMA) :
 Made by physically linking two or more SMPs
 One SMP can directly access memory of another
 Not all processors have equal access time to all
memories
 Memory access across link is slower
Parallel Architectures
 Distributed Memory :
 Processors have their own local memory
 Changes in a processor’s local memory have no effect
on the memory of other processors
 Needs message passing
 Explicit programming required
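Because no processor can read another's local memory, data moves by explicit messages. A minimal message-passing sketch (MPI is assumed here as the library; the slides do not name one): rank 0 sends one integer to rank 1.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* explicit send: rank 1 cannot see rank 0's memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Typically launched with something like mpirun -np 2 ./a.out.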
Parallel Architectures
 Shared vs Distributed Memory :
Parallel Architectures
 Hybrid Distributed-Shared Memory :
 Shared memory component : a cache coherent SMP
machine
 Distributed memory component : networking of
multiple SMP machines
Designing Parallel Programs
Automatic and Manual Parallelization :
 Manual Parallelization : time consuming, complex
and error-prone
 Automatic Parallelization : done by a parallelizing
compiler or pre-processor. Two different ways:
 Fully Automatic :
 compiler analyzes the source code and identifies
opportunities for parallelism
 Programmer Directed :
 using "compiler directives" or flags, the programmer
explicitly tells the compiler how to parallelize the code
 E.g. : OpenMP
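A minimal programmer-directed sketch in C (illustrative; the particular loop is not from the slides): one OpenMP directive tells the compiler to split the iterations across threads and combine the partial sums.

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        double sum = 0.0;

        /* the directive tells the compiler how to parallelize the loop */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < 1000000; i++)
            sum += 1.0 / (i + 1);

        printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }

Build with gcc -fopenmp.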
Designing Parallel Programs
Partitioning :
 Breaking the problem into discrete "chunks" of
work that can be distributed to multiple tasks
 Two basic ways to partition :
 Domain decomposition : the data associated with
a problem is decomposed
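A small sketch of domain decomposition (illustrative; block_range is a hypothetical helper): an n-element data set is split into near-equal contiguous blocks, one block per task.

    /* Block decomposition: task 'rank' out of 'size' tasks gets the
       index range [lo, hi) of an n-element data set. */
    void block_range(long n, int rank, int size, long *lo, long *hi)
    {
        long chunk = n / size;
        long rem   = n % size;   /* the first 'rem' tasks get one extra element */
        *lo = rank * chunk + (rank < rem ? rank : rem);
        *hi = *lo + chunk + (rank < rem ? 1 : 0);
    }

For example, 10 elements over 3 tasks gives the ranges [0,4), [4,7) and [7,10).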
Designing Parallel Programs
Partitioning :
 Two basic ways to partition :
 Functional decomposition : the focus is on
the computation that is to be performed rather
than on the data manipulated by the computation
Designing Parallel Programs
Load Balancing :
 Practice of distributing work among tasks so that all
tasks are kept busy all of the time
 Two types :
 Static load balancing : assigning a fixed amount of
work to each processing site
 Dynamic Load Balancing : Two types :
 Task-oriented : when one processing site finishes
its task, it is assigned another task (see the
sketch after this list)
 Data-oriented : when a processing site finishes
its task before other sites, the site with the most
work gives the idle site some of its data to
process
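A sketch of task-oriented dynamic load balancing with OpenMP (illustrative; expensive_work and process_all are hypothetical names): schedule(dynamic) hands the next chunk of iterations to whichever thread becomes idle, instead of fixing the split up front as schedule(static) would.

    #include <math.h>

    /* Hypothetical per-item work whose cost varies with 'reps'. */
    static double expensive_work(double x, int reps)
    {
        for (int i = 0; i < reps; i++)
            x = sin(x) + 1.0;
        return x;
    }

    void process_all(double *items, const int *cost, int n)
    {
        /* a thread that finishes its chunk early simply takes the next one */
        #pragma omp parallel for schedule(dynamic, 4)
        for (int i = 0; i < n; i++)
            items[i] = expensive_work(items[i], cost[i]);
    }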
Designing Parallel Programs
Granularity :
 Qualitative measure of the ratio of
computation to communication
 Fine-grain Parallelism : relatively small
amounts of computation between
communication events
 Facilitates load balancing
 High communication overhead
 Coarse-grain Parallelism : significant work
done between communications
 Most efficient granularity depends on the
algorithm and the hardware environment
used
Amdahl's Law
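For reference, Amdahl's Law relates the maximum speedup S to the
fraction p of a program that can be parallelized and the number of
processors N:

    S(N) = 1 / ((1 - p) + p / N)

Even as N grows without bound the speedup cannot exceed 1 / (1 - p),
so the serial fraction of a program ultimately limits the benefit of
adding more processors.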
Embarrassingly parallel
 Embarrassingly parallel problem : little or no effort is
required to separate the problem into a number of parallel
tasks
 No dependency (or communication) between the parallel
tasks
 Examples :
 Distributed relational database queries using distributed
set processing
 Rendering of computer graphics
 Event simulation and reconstruction in particle physics
 Brute-force searches in cryptography
 Ensemble calculations of numerical weather prediction
 Tree growth step of the random forest machine learning
technique
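A sketch of an embarrassingly parallel computation in C (illustrative, not from the slides): a Monte Carlo estimate of pi. Every sample is independent, so the loop parallelizes with no communication between tasks beyond the final tally.

    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const long samples = 10000000;
        long hits = 0;

        #pragma omp parallel
        {
            unsigned int seed = 1234u + omp_get_thread_num();  /* per-thread seed */
            long local_hits = 0;

            /* iterations are fully independent: no communication needed */
            #pragma omp for
            for (long i = 0; i < samples; i++) {
                double x = (double)rand_r(&seed) / RAND_MAX;
                double y = (double)rand_r(&seed) / RAND_MAX;
                if (x * x + y * y <= 1.0)
                    local_hits++;
            }

            #pragma omp atomic
            hits += local_hits;
        }

        printf("pi is approximately %f\n", 4.0 * hits / samples);
        return 0;
    }

Build with gcc -fopenmp; rand_r is a POSIX function, used here only to keep each thread's random stream independent.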
Applications of parallel processing
Summary
 Parallel Processing : Simultaneous use of multiple
resources to solve a computational problem
 Need for parallel processing : Limits to serial computing
and Moore’s Law
 Flynn’s Classical Taxonomy : SISD, SIMD, MIMD, MISD
 Parallel architectures : Shared memory, distributed
memory and hybrid
 Designing parallel programs : Automatic parallelization,
partitioning, load balancing and granularity
 Embarrassingly parallel problems : very easy to solve by
parallel processing
