Parallel computing refers to the process of breaking down larger problems into smaller, independent,
often similar parts that can be executed simultaneously by multiple processors communicating via
shared memory, the results of which are combined upon completion as part of an overall algorithm.
Networks connect multiple stand-alone computers (nodes) to make larger parallel computer clusters.
Von Neumann Architecture
Named after the Hungarian mathematician John von Neumann, who first authored the general
requirements for an electronic computer in his 1945 papers.
Also known as "stored-program computer" - both program instructions and data are kept in
electronic memory. Differs from earlier computers which were programmed through "hard
wiring".
The basic design comprises four main components:
Memory
Control Unit
Arithmetic Logic Unit
Input/Output
Read/write, random access memory is used to store both program instructions and data
Program instructions are coded data which tell the computer to do something
Control unit fetches instructions/data from memory, decodes the instructions and
then sequentially coordinates operations to accomplish the programmed task.
Flynn's Taxonomy
Flynn's taxonomy is a categorization of forms of parallel computer architectures. From the
viewpoint of the assembly language programmer, parallel computers are classified by the concurrency in
processing sequences (or streams), data, and instructions.
Flynn's classification organizes computer systems by the number of instructions and data items
that are manipulated simultaneously.
The operations performed on the data in the processor constitute a data stream.
Flynn's taxonomy distinguishes multi-processor computer architectures according to how they can be
classified along the two independent dimensions of Instruction Stream and Data Stream. Each of these
dimensions can have only one of two possible states: Single or Multiple
Flynn's classification divides computers into four major groups: SISD, SIMD, MISD, and MIMD.
Single Instruction, Single Data (SISD)
Single Instruction: Only one instruction stream is being acted on by the CPU during any one
clock cycle
Single Data: Only one data stream is being used as input during any one clock cycle
Single Instruction, Multiple Data (SIMD)
Single Instruction: All processing units execute the same instruction at any given clock cycle
Multiple Data: Each processing unit can operate on a different data element
Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD
instructions and execution units.
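As a rough sketch of the SIMD idea (not taken from the text above; the function and array names are
illustrative), the loop below applies the same add instruction across many data elements, which a
vectorizing compiler can map onto SIMD execution units. The omp simd pragma is only a hint and
assumes OpenMP support:

    #include <stddef.h>

    /* One instruction stream (the add), many data elements per clock:
       with vectorization, several iterations execute per SIMD instruction. */
    void vector_add(const float *a, const float *b, float *c, size_t n)
    {
        #pragma omp simd
        for (size_t i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }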
CPU
Contemporary CPUs consist of one or more cores, each a distinct execution unit with its own
instruction stream.
Node
A standalone unit of a computer system containing memory and one or more processors running an
operating system; nodes are networked together to form a larger parallel computer.
Task
A logically discrete section of computational work. A task is typically a program or program-like set of
instructions that is executed by a processor
Shared Memory
Describes a computer architecture where all processors have direct access to common physical memory.
Symmetric Multi-Processor (SMP)
Shared memory hardware architecture where multiple processors share a single address space and have
equal access to all resources - memory, disk, etc.
Distributed Memory
In hardware, refers to network-based memory access for physical memory that is not common.
Communications
Parallel tasks typically need to exchange data. There are several ways this can be accomplished, such as
through a shared memory bus or over a network.
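As a minimal sketch of network-based data exchange (assuming MPI, which the text does not
prescribe; the value and message tag are illustrative), task 0 sends an integer that task 1 receives:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;                                /* data produced by task 0 */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);               /* task 1 receives it */
        }

        MPI_Finalize();
        return 0;
    }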
Synchronization
The coordination of parallel tasks in real time, very often associated with communications.
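A common synchronization point is a barrier: no task continues until every task has reached it. A
minimal OpenMP sketch (the two printf stages stand in for real work):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            printf("thread %d finished stage one\n", tid);   /* each thread's share of stage one */
            #pragma omp barrier    /* no thread starts stage two until all reach this point */
            printf("thread %d starting stage two\n", tid);   /* may rely on all stage-one results */
        }
        return 0;
    }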
Computational Granularity
Fine: relatively small amounts of computational work are done between communication events
Coarse: relatively large amounts of computational work are done between communication events
Observed Speedup
One of the simplest and most widely used indicators of a parallel program's performance, defined as
the ratio

              wall-clock time of serial execution
    speedup = -------------------------------------
              wall-clock time of parallel execution
Parallel Overhead
Execution time required to coordinate parallel tasks, as opposed to time spent doing useful work.
Parallel overhead can include factors such as:
Synchronizations
Data communications
Massively Parallel
Refers to the hardware that comprises a given parallel system - having many processing elements.
Embarrassingly Parallel
Solving many similar, but independent tasks simultaneously; little to no need for coordination between
the tasks.
Scalability
Refers to a parallel system's (hardware and/or software) ability to demonstrate a proportionate increase
in parallel speedup with the addition of more resources.
What Is Parallel Programming?
Parallel programming, in simple terms, is the process of decomposing a problem into smaller
tasks that can be executed at the same time using multiple compute resources.
Parallel programming works by assigning tasks to different nodes or cores. In High Performance
Computing (HPC) systems, a node is a self-contained unit of a computer system that contains
memory and processors running an operating system. Processors, such as central processing
units (CPUs) and graphics processing units (GPUs), are chips that contain a set of cores. Cores
are the units executing commands; there can be multiple cores in a processor and multiple
processors in a node.
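A minimal sketch of this decomposition on a single node, assuming OpenMP (the array size and
contents are illustrative): the loop iterations are split among the available cores, and each core's
partial sum is combined by the reduction.

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double data[N];
        double sum = 0.0;

        /* Iterations are divided among the cores of the node; the reduction
           merges the per-thread partial sums into one result at the end. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            data[i] = (double)i;
            sum += data[i];
        }

        printf("sum = %f using up to %d threads\n", sum, omp_get_max_threads());
        return 0;
    }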
Amdahl's Law states that potential program speedup is defined by the fraction of code (P) that can be
parallelized:
                  1
    speedup = ---------
                1 - P
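For example, if 90% of a program can be parallelized (P = 0.90), the speedup can never exceed
1 / (1 - 0.90) = 10, no matter how many processors are used; if only half the program can be
parallelized, the limit is 2.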
Two types of scaling based on time to solution: strong scaling and weak scaling.
Strong scaling: the total problem size stays fixed as more processors are added.
Weak scaling: the problem size per processor stays fixed as more processors are added, so the total
problem size is proportional to the number of processors used.
For weak scaling, perfect scaling means problem Px (x times larger, run on x processors) completes in
the same time as the single-processor run of the original problem.
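For example, under perfect strong scaling a fixed-size problem that takes 64 hours on one processor
takes 1 hour on 64 processors; under perfect weak scaling, a problem 64 times larger runs on 64
processors in the same wall-clock time as the original single-processor run.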
Shared Memory
Multiple processors can operate independently but share the same memory resources.
Historically, shared memory machines have been classified as UMA and NUMA, based upon
memory access times.
Uniform Memory Access (UMA)
Identical processors with equal access and access times to memory.
Sometimes called CC-UMA - Cache Coherent UMA. Cache coherent means if one processor
updates a location in shared memory, all the other processors know about the update. Cache
coherency is accomplished at the hardware level.
Non-Uniform Memory Access (NUMA)
Not all processors have equal access time to all memories. If cache coherency is maintained, then
may also be called CC-NUMA - Cache Coherent NUMA.
Distributed Memory
Processors have their own local memory. Memory addresses in one processor do not map to
another processor, so there is no concept of global address space across all processors.
The network "fabric" used for data transfer varies widely, though it can be as simple as Ethernet
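Because there is no global address space, data needed by every task must be moved over the network
explicitly. A minimal sketch (assuming MPI; the variable is illustrative) in which task 0 copies a
value into every other task's local memory:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, config = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0)
            config = 7;    /* exists only in task 0's local memory so far */

        /* Explicit communication: no task can address another task's memory,
           so task 0's value is broadcast into each task's own copy. */
        MPI_Bcast(&config, 1, MPI_INT, 0, MPI_COMM_WORLD);

        printf("task %d now holds config = %d\n", rank, config);
        MPI_Finalize();
        return 0;
    }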