
Unit 1 – Introduction to Parallel Computing



Objectives
• To explain the concept of serial computing and its limitations
• To explain the concept of parallel computing and its importance
• To describe the terminology used in parallel computing
• To calculate performance using the speed-up factor



Serial Computing



Serial Computation
• Traditionally, software has been written for serial computation:
  • run on a single computer
  • instructions are run one after another
  • only one instruction is executed at a time

Von Neumann Architecture
• The von Neumann architecture has been the basis for virtually all computer designs since the first generation.

[Figure: Von Neumann architecture – the CPU (Control Unit and Arithmetic Logic Unit) connected to Main Memory and I/O Equipment]



Von Neumann Architecture
• John von Neumann first authored the general requirements for an
electronic computer in 1945
• Data and instructions are stored in memory
• Control unit fetches instructions/data from memory, decodes the
instructions and then sequentially coordinates operations to
accomplish the programmed task.
• Arithmetic Logic Unit performs basic arithmetic operations
• Input/Output is the interface to the human operator



Von Neumann Architecture – Components
• Main Memory
  • Stores program instructions and data
  • Program instruction – code to instruct the computer
  • Data – information to be used by the program
• Control Unit
  • Fetches instructions/data from memory
  • Decodes instructions
  • Sequentially coordinates the operations to be performed
• Arithmetic Logic Unit
  • Performs basic arithmetic operations
• Input/Output
  • Interface to the human operator



Instruction Cycle
• Processor fetches instruction from memory
• Program counter contains address of next instruction to
be fetched
• Program counter incremented after each fetch

[Figure: Instruction cycle – START → Fetch Next Instruction (Fetch Cycle) → Execute Instruction (Execute Cycle) → HALT]
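
As an illustration, here is a toy fetch-execute loop in Python; the three-instruction program and its opcodes are invented for this sketch, not taken from the slides:

# Toy fetch-execute cycle: the program counter (pc) holds the address of
# the next instruction, is incremented after each fetch, and a HALT
# opcode ends the loop. The opcodes are invented for illustration.
program = [("LOAD", 5), ("ADD", 3), ("HALT", None)]

pc = 0    # program counter: address of the next instruction to fetch
acc = 0   # accumulator register
while True:
    opcode, operand = program[pc]   # fetch cycle
    pc += 1                         # program counter incremented after each fetch
    if opcode == "LOAD":            # execute cycle
        acc = operand
    elif opcode == "ADD":
        acc += operand
    elif opcode == "HALT":
        break
print(acc)  # prints 8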



Modern Instruction Cycle

[Figures: (a) a three-stage pipeline; (b) a superscalar CPU – taken from Stallings, Operating Systems, 4th Edition]



Processor Registers
• User-visible registers
  • Minimize main-memory references
  • Referenced by assembly language instructions (i.e. not by high-level languages)
  • Types of registers:
    • Data – can be assigned by the programmer
    • Address – used for direct or indirect references to main memory
    • Condition codes or flags
• Control and status registers
  • Control the operation of the processor
  • Control the execution of programs
  • Program Counter (PC): contains the address of an instruction to be fetched
  • Instruction Register (IR): contains the instruction most recently fetched
  • Program Status Word (PSW): condition codes; interrupt enable/disable; supervisor/user mode

Limitations of Serial Computing
• Limitations of a single CPU
  • Performance
  • Cache memory
• Transmission speeds
  • The speed of a serial computer is directly dependent upon how fast data can move through hardware. Absolute limits are the speed of light (30 cm/nanosecond) and the transmission limit of copper wire (9 cm/nanosecond). Increasing speeds necessitate increasing proximity of processing elements.
• Limits to miniaturization
  • Processor technology is allowing an increasing number of transistors to be placed on a chip. However, even with molecular or atomic-level components, a limit will be reached on how small components can be.
• Economic limitations
  • It is increasingly expensive to make a single processor faster. Using a larger number of moderately fast commodity processors to achieve the same (or better) performance is less expensive.



Parallel Computing



Parallel Computing
• Simultaneous use of multiple compute resources to solve a single problem
• Use of multiple processors or computers working together on a common task

What is Parallel Computing?
• In the simplest sense, parallel computing is the simultaneous use of
multiple compute resources to solve a computational problem.
• To be run using multiple CPUs
• A problem is broken into discrete parts that can be solved
concurrently
• Each part is further broken down to a series of instructions
• Instructions from each part execute simultaneously on different CPUs
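
A minimal sketch of this decomposition using Python's standard multiprocessing module; the four-way split and the per-part work function are invented for illustration:

from multiprocessing import Pool

def process_part(part):
    # A stand-in for the series of instructions applied to one part;
    # here each part is simply summed.
    return sum(part)

if __name__ == "__main__":
    data = list(range(1_000_000))
    # Break the problem into four discrete parts...
    parts = [data[i::4] for i in range(4)]
    # ...and execute them simultaneously on different CPUs.
    with Pool(processes=4) as pool:
        partial_results = pool.map(process_part, parts)
    print(sum(partial_results))  # same answer as sum(data)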



Parallel Computing: Resources
• The compute resources can include:
• A single computer with multiple processors
• A single computer with (multiple) processor(s) and some specialized computer resources (GPU – graphics processing unit, FPGA – field-programmable gate array, etc.)
• An arbitrary number of computers connected by a network
• A combination of both





Characteristics of a Parallelizable Computational Problem
• Can be broken apart into discrete pieces of work that can be solved simultaneously
• Can execute multiple program instructions at any moment in time
• Can be solved in less time with multiple compute resources than with a single compute resource



Application of Parallel Computing
• Multiple complex, interrelated events happening at the same time, yet within a sequence, e.g. weather and ocean patterns, planetary and galactic orbits, etc.
• Traditionally, parallel computing has been considered to be "the high end of computing" and has been motivated by numerical simulations of complex systems and "Grand Challenge Problems" such as:
  • weather and climate
  • chemical and nuclear reactions
  • biology, the human genome
  • geology, seismic activity
  • mechanical devices – from prosthetics to spacecraft
  • electronic circuits
  • manufacturing processes



The Real World is Massively Parallel



Application of Parallel Computing
• Today, commercial applications are providing an equal or greater driving
force in the development of faster computers. These applications require
the processing of large amounts of data in sophisticated ways. Example
applications include:
• parallel databases, data mining
• oil exploration
• web search engines, web-based business services
• computer-aided diagnosis in medicine
• management of national and multi-national corporations
• advanced graphics and virtual reality, particularly in the entertainment industry
• networked video and multi-media technologies
• collaborative work environments
• Ultimately, parallel computing is an attempt to maximize the infinite but
seemingly scarce commodity called time.



Importance of Parallel Computing
• Ability to solve larger problems
• Ability to solve the same problem faster than a single CPU
• Provide concurrency (do multiple things at the same time)
• Limits to serial computing
• Taking advantage of non-local resources - using available compute
resources on a wide area network, or even the Internet when local
compute resources are scarce.
• Cost savings - using multiple "cheap" computing resources instead of
paying for time on a supercomputer
• Overcoming memory constraints - single computers have very finite
memory resources. For large problems, using the memories of multiple
computers may overcome this obstacle



The future
• During the past 10 years, the trends indicated by ever faster networks,
distributed systems, and multi-processor computer architectures
(even at the desktop level) clearly show that parallelism is the future
of computing.
• It will take multiple forms, mixing general-purpose solutions (your PC…) and very specialized solutions such as IBM's Cell¹, ClearSpeed², and GPGPU from NVIDIA.
  • ¹ Cell is a multi-core microprocessor microarchitecture that combines a general-purpose Power Architecture core of modest performance with streamlined coprocessing elements which greatly accelerate multimedia and vector processing applications, as well as many other forms of dedicated computation.
  • ² ClearSpeed Technology Ltd is a semiconductor company, formed in 2002 to develop enhanced SIMD processors for use in high-performance computing and embedded systems.



Limitations to Parallelism
• True data dependency
  • A data dependency is a situation in which a program statement (instruction) refers to the data of a preceding statement
• Procedural dependency
• Resource conflicts
• Output dependency
• Anti-dependency
  • An anti-dependency, also known as write-after-read (WAR), occurs when an instruction requires a value that is later updated. In the following example, instruction 2 anti-depends on instruction 3: the ordering of these instructions cannot be changed, nor can they be executed in parallel (possibly changing the instruction ordering), as this would affect the final value of A.

    1. B = 3
    2. A = B + 1
    3. B = 7
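
For contrast, here is a brief sketch of the dependency types named above, written as simple assignments (the variable names are arbitrary):

# True (flow) dependency – read-after-write (RAW):
B = 3
A = B + 1   # must wait for the write to B above

# Anti-dependency – write-after-read (WAR):
A = B + 1   # reads B...
B = 7       # ...so this write cannot be moved before the read

# Output dependency – write-after-write (WAW):
B = 3
B = 7       # the final value of B depends on keeping this order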



Data Dependency Problem
• The major problem of executing multiple instructions in a scalar program is the handling of data dependencies. If data dependencies are not effectively handled, it is difficult to achieve an execution rate of more than one instruction per clock cycle.



Terminology in Parallel Computing



• Supercomputing / High Performance Computing (HPC)
  • Using the world's fastest and largest computers to solve large problems
• Node
  • A standalone "computer in a box"
  • Nodes are networked together to comprise a supercomputer
• CPU / Socket / Processor / Core
  • The CPU (Central Processing Unit) was a singular execution component for a computer
  • Multiple CPUs were incorporated into a node
  • Individual CPUs were subdivided into multiple "cores"
  • CPUs with multiple cores are sometimes called "sockets" (vendor dependent)
  • A node may contain multiple CPUs, each containing multiple cores



• Task
  • A logically discrete section of computational work
  • Typically a program or program-like set of instructions that is executed by a processor
• Parallel tasks
  • Tasks that can be executed by multiple processors safely (yielding correct results)
• Serial execution
  • Execution of a program sequentially, one statement at a time. In the simplest sense, this is what happens on a one-processor machine. However, virtually all parallel programs have sections that must be executed serially.
• Parallel execution
  • Execution of a program by more than one task, with each task able to execute the same or a different statement at the same moment in time
• Pipelining
  • Breaking a task into steps performed by different processor units, with inputs streaming through, much like an assembly line; a type of parallel computing (see the sketch below)
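
A hedged sketch of pipelining in Python, using multiprocessing queues as the assembly line; the two stage functions are invented for the example:

from multiprocessing import Process, Queue

def add_one(x):
    return x + 1

def double(x):
    return x * 2

def stage(inp, out, fn):
    # Each stage repeatedly takes an item, processes it, and passes it
    # on, so different items occupy different stages at the same time.
    while True:
        item = inp.get()
        if item is None:        # sentinel: forward it and stop
            out.put(None)
            break
        out.put(fn(item))

if __name__ == "__main__":
    q_in, q_mid, q_out = Queue(), Queue(), Queue()
    p1 = Process(target=stage, args=(q_in, q_mid, add_one))
    p2 = Process(target=stage, args=(q_mid, q_out, double))
    p1.start(); p2.start()
    for x in range(5):
        q_in.put(x)             # inputs stream through the pipeline
    q_in.put(None)
    while True:
        item = q_out.get()
        if item is None:
            break
        print(item)             # (x + 1) * 2 for x = 0..4: 2, 4, 6, 8, 10
    p1.join(); p2.join()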



• Symmetric Multi-Processor (SMP)
  • Hardware architecture where multiple processors share a single address space and access to all resources; shared-memory computing
• Distributed memory
  • Hardware: network-based memory access for physical memory that is not common
  • Programming model: tasks can only logically "see" local machine memory and must use communications to access memory on other machines where other tasks are executing
• Shared memory
  • Hardware: a computer architecture where all processors have direct (usually bus-based) access to common physical memory
  • Programming model: parallel tasks all have the same "picture" of memory and can directly address and access the same logical memory locations regardless of where the physical memory actually exists
• Communications
  • The exchange of data between parallel tasks, e.g. through a shared memory bus or over a network



• Granularity
  • A qualitative measure of the ratio of computation to communication
  • Coarse: relatively large amounts of computational work are done between communication events
  • Fine: relatively small amounts of computational work are done between communication events
• Scalability
  • A parallel system's (hardware and/or software) ability to demonstrate a proportionate increase in parallel speedup with the addition of more processors
• Parallel overhead
  • The amount of time required to coordinate parallel tasks
• Massively parallel
  • Hardware that comprises a given parallel system, having many processors
• Speed
  • The time it takes the program to execute



Performance Measurement: Speed-Up Factor



Amdahl's Law
Amdahl's Law states that potential program
speedup is defined by the fraction of code (P) that
can be parallelized:

              1
speedup = -------
           1 - P

• If none of the code can be parallelized, P = 0 and the speedup = 1 (no speedup). If all of the code is parallelized, P = 1 and the speedup is (in theory) infinite.
• If 50% of the code can be parallelized, the maximum speedup = 2, meaning the code will run twice as fast.
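
A one-function check of these figures (a sketch, using the two example fractions above):

def amdahl_max_speedup(p):
    # Upper bound on speedup when fraction p of the code is
    # parallelizable and arbitrarily many processors are available.
    return 1.0 / (1.0 - p)

print(amdahl_max_speedup(0.0))   # 1.0 -> no speedup
print(amdahl_max_speedup(0.5))   # 2.0 -> the code runs twice as fast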



Amdahl's Law
• Introducing the number of processors performing the parallel fraction
of work, the relationship can be modeled by
               1
speedup = -----------
           P/N + S

• where P = parallel fraction, N = number of processors and S = serial fraction



Amdahl's Law
• It soon becomes obvious that there are limits to the scalability of parallelism. For example, at P = .50, .90 and .99 (50%, 90% and 99% of the code is parallelizable):

                    speedup
        --------------------------------
    N      P = .50    P = .90    P = .99
    10      1.82       5.26       9.17
    100     1.98       9.17      50.25
    1000    1.99       9.91      90.99
    10000   1.99       9.99      99.02
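
A short sketch that recomputes this table from the formula above (printed to two decimals; an entry or two may differ in the last digit because the table truncates rather than rounds):

def amdahl_speedup(p, n):
    # Speedup for parallel fraction p on n processors; s is the serial fraction.
    s = 1.0 - p
    return 1.0 / (p / n + s)

print(f"{'N':>6} {'P=.50':>8} {'P=.90':>8} {'P=.99':>8}")
for n in (10, 100, 1000, 10000):
    cells = " ".join(f"{amdahl_speedup(p, n):8.2f}" for p in (0.50, 0.90, 0.99))
    print(f"{n:>6} {cells}")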



Amdahl's Law
• However, certain problems demonstrate increased performance by increasing the problem size. For example:
  • 2D grid calculations: 85 seconds (85%)
  • Serial fraction: 15 seconds (15%)
• We can increase the problem size by doubling the grid dimensions and halving the time step. This results in four times the number of grid points and twice the number of time steps. The timings then look like:
  • 2D grid calculations: 680 seconds (97.84%)
  • Serial fraction: 15 seconds (2.16%)
• Problems that increase the percentage of parallel time with their size are more scalable than problems with a fixed percentage of parallel time (see the check below).
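
A quick arithmetic check of the scaled percentages (a sketch):

parallel, serial = 85.0, 15.0
print(parallel / (parallel + serial))               # 0.85 -> 85% parallel

# Doubling the grid dimensions (4x grid points) and halving the time
# step (2x time steps) gives 8x the parallel work: 85 * 8 = 680 seconds.
parallel_scaled = parallel * 8
print(parallel_scaled / (parallel_scaled + serial)) # ~0.9784 -> 97.84%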



Complexity
• In general, parallel applications are much more complex than corresponding
serial applications, perhaps an order of magnitude. Not only do you have multiple
instruction streams executing at the same time, but you also have data flowing
between them.
• The costs of complexity are measured in programmer time in virtually every
aspect of the software development cycle:
• Design
• Coding
• Debugging
• Tuning
• Maintenance
• Adhering to "good" software development practices is essential when working with parallel applications - especially if somebody besides you will have to work with the software.
2022 Universiti Tenaga Nasional. All rights reserved.
