CSC429

COMPUTER ORGANIZATION
AND ARCHITECTURE
PARALLEL PROCESSING
CHAPTER 7
Contents
• Multiple processor organizations
- Types of parallel processor system
- Parallel organization
• Symmetric multiprocessors
• Clusters
• Nonuniform Memory Access (NUMA)
FMH/UiTM 2
Classification of Parallel Processors
• The organization of a computer system can be classified by the
number of instructions and data items that can be manipulated
simultaneously
• The sequence of instructions read from memory constitutes an instruction stream
• The operations performed on data in the processor constitute a data stream
• Parallel processing may occur in the instruction stream, the data stream, or both
Types of Parallel Processor
Organizations
• Single instruction, single data stream (SISD)
- Instructions are executed sequentially
• Single instruction, multiple data stream (SIMD)
- All processors receive the same instruction from the control unit, but operate on different sets of data
• Multiple instruction, single data stream (MISD)
- Of theoretical interest only, as no practical system is known to have been built with this organization
• Multiple instruction, multiple data stream (MIMD)
- Several programs can execute at the same time. Most
multiprocessors come in this category
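The four categories above follow directly from counting instruction streams and data streams. A minimal sketch of that classification, assuming a hypothetical helper named `flynn_class` (not part of the slides):

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the category name for the given number of streams.

    One stream maps to "S" (single), more than one to "M" (multiple).
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


print(flynn_class(1, 1))  # one instruction stream, one data stream
print(flynn_class(1, 8))  # one instruction stream, eight data streams
print(flynn_class(4, 4))  # four instruction streams, four data streams
```

The stream counts passed in are illustrative only; real machines are classified by their organization, not by a runtime parameter.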
Single Instruction, Single Data Stream
(SISD)
• A single processor executes a single instruction stream to
operate on data stored in a single memory
• Uniprocessors fall into this category
CU – Control Unit
PU – Processing Unit
MU – Memory Unit
IS – Instruction Stream
DS – Data Stream
LM – Local Memory
Single Instruction, Multiple Data Stream
(SIMD)
• A single machine instruction controls the simultaneous
execution of a number of processing elements on a lockstep
basis
• Each processing element has an associated data memory, so
that each instruction is executed on a different set of data by
the different processors
• Application: vector and array processing
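The lockstep idea can be modelled in a few lines. This is only a sketch: the Python loop runs the processing elements one after another, whereas real SIMD hardware runs them simultaneously, and the names `simd_step` and `local_memories` are assumptions made for illustration.

```python
from typing import Callable, List


def simd_step(op: Callable[[int], int],
              local_memories: List[List[int]],
              index: int) -> None:
    """Broadcast one instruction: every processing element applies `op`
    to the item at `index` in its own local data memory."""
    for lm in local_memories:
        lm[index] = op(lm[index])


# Four processing elements, each with its own local data memory.
local_memories = [[1, 2], [10, 20], [100, 200], [1000, 2000]]

# The control unit issues one instruction per step; all elements execute it.
for index in range(2):
    simd_step(lambda x: x * 2, local_memories, index)

print(local_memories)
```

Each element ends up with its own data doubled, although only a single instruction stream was issued.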
Single Instruction, Multiple Data Stream
(SIMD)
[Figure: SIMD organization. CU = control unit, MU = memory unit, DS = data stream, LM = local memory]
Multiple Instruction, Single Data Stream
(MISD)
• A sequence of data is transmitted to a set of processors
• Each processor executes a different instruction sequence
• It is not clear whether it has ever been implemented
Multiple Instruction, Multiple Data
Stream (MIMD)
• A set of processors simultaneously executes different instruction sequences on different sets of data
• Examples: SMPs, NUMA systems and clusters
• Two types:
- MIMD with shared memory
- MIMD with distributed memory
Multiple Instruction, Multiple Data
Stream (MIMD)
[Figure: MIMD organization. CU = control unit, MU = memory unit, DS = data stream, LM = local memory]
Taxonomy of Parallel Processor
Architectures
Symmetric Multiprocessors (SMP)
An SMP can be defined as a standalone computer system with the following characteristics:
• Two or more similar processors of comparable capability.
• Processors share the same main memory and I/O facilities.
• Processors are interconnected by a bus or other internal connection scheme.
• Memory access time is approximately the same for each processor.
• All processors share access to I/O devices, either through the same channels or through different channels that provide paths to the same device.
• All processors can perform the same functions (hence the term symmetric).
• The system is controlled by an integrated operating system that provides interaction between processors at the job, task, file, and data element levels.
Symmetric Multiprocessors (SMP)
Advantages
• Performance: If the work to be done by a computer can be organized so that some portions of the work can be done in parallel, then a system with multiple processors will yield greater performance than one with a single processor of the same type.
• Availability: In a symmetric multiprocessor, because all processors can perform the
same functions, the failure of a single processor does not halt the machine. Instead,
the system can continue to function at reduced performance.
• Incremental growth: A user can enhance the performance of a system by adding an
additional processor.
• Scaling: Vendors can offer a range of products with different price and performance
characteristics based on the number of processors configured in the system.
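The performance advantage depends on how much of the work can actually run in parallel. One common way to estimate this is Amdahl's law, which the slides do not name; the sketch below brings it in as an assumption, and the 90% parallel fraction is a made-up figure.

```python
def speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: overall speedup when only `parallel_fraction` of the
    work can be spread across `processors` identical processors."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical workload in which 90% of the work is parallelizable.
for n in (1, 2, 4, 8):
    print(n, round(speedup(0.9, n), 2))
```

Under this model, adding processors gives diminishing returns, because the serial portion of the work does not shrink.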
Block Diagram of Tightly Coupled
Multiprocessor
• Each processor is self-contained, including a control unit, ALU, registers, and, typically, one or more levels of cache
• Each processor has access to a shared main memory and the I/O devices through some form of interconnection mechanism
• The processors can communicate with each other through memory (messages and status information left in common data areas)
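Communication through memory can be sketched with two threads that share a common data area. This is an analogy, not a model of real SMP hardware: Python threads stand in for processors, and the names `shared`, `producer` and `consumer` are hypothetical.

```python
import threading

# Common data area in shared memory, visible to both "processors".
shared = {"status": None, "message": None}
lock = threading.Lock()        # serializes access to the shared area
ready = threading.Event()      # signals that a message has been left


def producer() -> None:
    """Leave a message and a status flag in the common data area."""
    with lock:
        shared["message"] = sum(range(10))
        shared["status"] = "done"
    ready.set()


def consumer(results: list) -> None:
    """Wait for the signal, then read what the producer left behind."""
    ready.wait()
    with lock:
        results.append((shared["status"], shared["message"]))


results = []
threads = [
    threading.Thread(target=consumer, args=(results,)),
    threading.Thread(target=producer),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

The two threads never call each other directly; all information passes through the shared dictionary.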
Symmetric Multiprocessor Organization
• The most common organization for personal computers, workstations, and servers is the time-shared bus.
• The time-shared bus is the simplest mechanism for constructing a multiprocessor system.
• The bus consists of control, address, and data lines.
• To facilitate DMA transfers from I/O subsystems to processors,
the following features are provided:
- Addressing: It must be possible to distinguish modules on the bus to
determine the source and destination of data.
- Arbitration: Any I/O module can temporarily function as “master.” A
mechanism is provided to arbitrate competing requests for bus
control, using some sort of priority scheme.
- Time-sharing: When one module is controlling the bus, other modules
are locked out and must, if necessary, suspend operation until bus
access is achieved.
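Arbitration and time-sharing can be illustrated with a fixed-priority scheme. The slides only say "some sort of priority scheme", so the fixed priorities and the module names `io0`, `cpu0` and `cpu1` below are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional


def arbitrate(requests: List[str], priority: Dict[str, int]) -> Optional[str]:
    """Grant the bus to the requesting module with the highest priority
    (lower number means higher priority). Return None if nobody requests."""
    if not requests:
        return None
    return min(requests, key=lambda module: priority[module])


# Hypothetical fixed priorities for three modules on the bus.
priority = {"io0": 0, "cpu0": 1, "cpu1": 2}

pending = ["cpu1", "io0", "cpu0"]   # all three request the bus at once
grant_order = []
while pending:
    master = arbitrate(pending, priority)
    grant_order.append(master)      # the master uses the bus; others wait
    pending.remove(master)

print(grant_order)
```

Only one module holds the bus at a time, and the others stay in `pending` until they are granted access, which mirrors the time-sharing feature.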
• The bus organization has several attractive features:
- Simplicity: This is the simplest approach to multiprocessor
organization. The physical interface and the addressing, arbitration,
and time-sharing logic of each processor remain the same as in a
single-processor system.
- Flexibility: It is generally easy to expand the system by attaching more
processors to the bus.
- Reliability: The bus is essentially a passive medium, and the failure of
any attached device should not cause failure of the whole system.
Clusters
• A cluster is a group of interconnected, whole computers
working together as a unified computing resource that can
create the illusion of being one machine
• Clusters serve purposes ranging from general-purpose business needs, such as web-service support, to computation-intensive scientific calculations
• Three types of cluster architecture:
- High performance
- High availability
- Load balancing
Clusters
• Collection of independent whole uniprocessors or SMPs
- Usually called nodes
- Interconnected to form a cluster
• Working together as unified resource
- Illusion of being one machine
• Communication via fixed path or network connections
Benefits of Cluster Organization
• High processing speed
• Offer scalability – easily expanded by adding additional nodes
to the network
• Provide high availability of resources – other nodes can act as a backup when one system fails
• Processing power is cost-effective compared with a mainframe
• Drawback: high cost to implement and maintain
Benefits of Clusters
• Absolute scalability
- It is possible to create large clusters that far surpass the power of even the largest standalone machines.
- A cluster can have tens, hundreds, or even thousands of machines, each of which is a multiprocessor.
• Incremental scalability
- A cluster is configured in such a way that it is possible to add new systems to the cluster in small increments.
- A user can start out with a modest system and expand it as needs grow, without having to go through a major upgrade in which an existing small system is replaced with a larger system.
• High availability
- Because each node in a cluster is a standalone computer, the failure of one node does not mean loss of service.
- In many products, fault tolerance is handled automatically in software.
• Superior price/performance
- By using commodity building blocks, it is possible to put together a cluster with equal or greater computing power than a single large machine, at much lower cost.
Cluster Configurations
• In the literature, clusters are classified in a number of different
ways.
• Two cluster configurations:
- Standby server with no shared disk
- Shared disk
• Standby server with no shared disk
- The only interconnection is a high-speed link that can be used for message exchange to coordinate cluster activity.
▪ The link can be a LAN that is shared with other computers that are not part of the cluster.
▪ The link can be a dedicated interconnection facility.
• Shared disk
- There is generally still a message link between nodes.
- A disk subsystem is directly linked to multiple computers within the cluster.
- The common disk subsystem is typically a RAID system.
- RAID or a similar redundant disk technology is commonly used in clusters so that the high availability achieved by the presence of multiple computers is not compromised by a shared disk that is a single point of failure.
Cluster Computer Architecture
• The individual computers are connected by some high-speed
LAN or switch hardware.
• Each computer is capable of operating independently.
• A middleware layer of software is installed in each computer
to enable cluster operation.
• The cluster middleware provides a unified system image to the
user, known as a single-system image.
• The middleware is also responsible for providing high
availability, by means of load balancing and responding to
failures in individual components.
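The middleware's two availability roles, load balancing and reacting to node failures, can be sketched as a round-robin dispatcher that skips nodes marked as failed. Round-robin is an assumption, since the slides do not say which policy is used, and the node and job names are hypothetical.

```python
from itertools import cycle
from typing import Dict, Iterable, List, Tuple


def dispatch(jobs: Iterable[str], nodes: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Assign jobs to healthy nodes in round-robin order.

    `nodes` maps a node name to True (healthy) or False (failed).
    """
    healthy = [name for name, is_up in nodes.items() if is_up]
    if not healthy:
        raise RuntimeError("no healthy nodes available")
    next_node = cycle(healthy)
    return [(job, next(next_node)) for job in jobs]


# Hypothetical three-node cluster in which node-b has failed.
nodes = {"node-a": True, "node-b": False, "node-c": True}
assignments = dispatch(["job1", "job2", "job3"], nodes)
print(assignments)
```

The caller submits jobs to one entry point and never names a node, which is the single-system image in miniature.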
[Figure: Cluster computer architecture]
Clusters vs. SMP
• Both provide multiprocessor support to high-demand applications
• Both are available commercially
• SMP:
- Easier to manage and control
- Closer to single-processor systems
▪ Scheduling is the main difference
- Less physical space
- Lower power consumption
• Clustering:
- Superior incremental and absolute scalability
- Lower cost
- Superior availability
▪ All components of the system can readily be made highly redundant
Nonuniform Memory Access (NUMA)
(Tightly coupled)
• Alternative to SMP and clusters
• Nonuniform memory access
- All processors have access to all parts of memory
▪ Using load and store instructions
- A processor's access time differs depending on the region of memory
▪ Different processors access different regions of memory at different speeds
• Cache-coherent NUMA (CC-NUMA)
- Cache coherence is maintained among the caches of the various processors
- Significantly different from SMP and clusters
Motivation
• An SMP has a practical limit on the number of processors
- Bus traffic limits this to between 16 and 64 processors
• In clusters, each node has its own memory
- Applications do not see a large global memory
- Coherence is maintained by software, not hardware
• NUMA retains the SMP flavour while providing large-scale multiprocessing
• The objective is to maintain a transparent system-wide memory while permitting multiprocessor nodes, each with its own bus or internal interconnection system
NUMA Organization
NUMA Pros & Cons
• Potentially effective performance at higher levels of parallelism than a single SMP
• Not very supportive of software changes
• Performance can break down if there is too much access to remote memory
- Can be avoided by:
▪ L1 & L2 cache design that reduces all memory accesses
• Needs good temporal locality in software
• Not transparent
- Page allocation, process allocation and load balancing changes can be difficult
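The effect of remote accesses on performance can be estimated with a weighted average of local and remote access times. The latencies below are invented round numbers, not measurements, and the function name `average_access_time` is an assumption.

```python
def average_access_time(remote_fraction: float,
                        local_ns: float,
                        remote_ns: float) -> float:
    """Average memory access time when `remote_fraction` of the accesses
    go to another node's memory and the rest stay local."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


# Hypothetical latencies: 100 ns local, 300 ns remote.
for fraction in (0.0, 0.25, 0.5, 1.0):
    print(fraction, average_access_time(fraction, 100.0, 300.0))
```

As the remote fraction grows, the average drifts toward the remote latency, which is why good locality matters on a NUMA system.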
END OF CHAPTER 7