COMPUTER ORGANIZATION
AND ARCHITECTURE
PARALLEL PROCESSING
CHAPTER 7
Contents
• Multiple processor organizations
- Types of parallel processor system
- Parallel organization
• Symmetric multiprocessors
• Clusters
• Nonuniform Memory Access (NUMA)
FMH/UiTM 2
Classification of Parallel Processors
• The organization of a computer system can be classified by the
number of instructions and data items that can be manipulated
simultaneously
• The sequence of instructions read from memory constitutes an
instruction stream
• The operations performed on the data in the processor constitute a
data stream
• Parallel processing may occur in the instruction stream, the data
stream, or both
Types of Parallel Processor Organizations
• Single instruction, single data stream (SISD)
- Instructions are executed sequentially
• Single instruction, multiple data stream (SIMD)
- All processors receive the same instruction from the control unit, but
operate on different sets of data
• Multiple instruction, single data stream (MISD)
- Mainly of theoretical interest; it is not clear that a practical system
has ever been built this way
• Multiple instruction, multiple data stream (MIMD)
- Several programs can execute at the same time. Most
multiprocessors fall into this category
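The classification above depends only on how many instruction streams and data streams are active at once. A minimal sketch, assuming a hypothetical helper `flynn_class` that takes the two counts as integers (the name and signature are illustrative, not part of the slides):

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the category name for the given number of concurrent streams.

    Hypothetical helper for illustration: 1 stream maps to "S" (single),
    more than 1 maps to "M" (multiple).
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


print(flynn_class(1, 1))  # a uniprocessor
print(flynn_class(1, 8))  # one control unit driving eight processing elements
print(flynn_class(4, 4))  # four processors, each with its own program and data
```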
Single Instruction, Single Data Stream
(SISD)
• A single processor executes a single instruction stream to
operate on data stored in a single memory
• Uniprocessors fall into this category
CU – Control Unit
PU – Processing Unit
MU – Memory Unit
IS – Instruction Stream
DS – Data Stream
LM – Local Memory
Single Instruction, Multiple Data Stream
(SIMD)
• A single machine instruction controls the simultaneous
execution of a number of processing elements on a lockstep
basis
• Each processing element has an associated data memory, so
that each instruction is executed on a different set of data by
the different processors
• Application: vector and array processing
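The lockstep idea can be sketched in software: one instruction is broadcast, and every processing element applies it to its own local memory. The function `simd_step` and the list-of-lists memory layout below are assumptions made for illustration. The Python loop visits the elements one after another, whereas SIMD hardware would perform these operations simultaneously.

```python
from typing import Callable, List


def simd_step(instruction: Callable[[int], int],
              local_memories: List[List[int]]) -> None:
    """Apply the same instruction to the local memory of every processing element."""
    for memory in local_memories:           # one iteration per processing element
        for k, value in enumerate(memory):  # each word in that element's memory
            memory[k] = instruction(value)


# Three processing elements, each with its own data.
local_memories = [[1, 2], [10, 20], [100, 200]]
simd_step(lambda x: x + 1, local_memories)  # the single broadcast instruction
print(local_memories)  # [[2, 3], [11, 21], [101, 201]]
```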
[Figure: SIMD organization, labelled with the CU, PU, MU, IS, DS and LM legend given for SISD]
Multiple Instruction, Single Data Stream
(MISD)
• A sequence of data is transmitted to a set of processors
• Each processor executes a different instruction sequence
• It is not clear that this organization has ever been implemented
Multiple Instruction, Multiple Data
Stream (MIMD)
• A set of processors simultaneously executes different instruction
sequences on different sets of data
• Examples: SMPs, NUMA systems, and clusters
• Two types:
- MIMD with shared memory
- MIMD with distributed memory
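A rough software analogy for shared-memory MIMD is a set of threads, each running a different instruction sequence on different data, with results placed in memory that all of them can see. The function names below (`total`, `largest`, `run_mimd`) are hypothetical. In standard CPython builds the threads interleave rather than run truly in parallel, so this sketch shows the structure, not the speedup.

```python
import threading


def total(data, results):
    """Instruction sequence A: sum its own data set."""
    results["total"] = sum(data)


def largest(data, results):
    """Instruction sequence B: find the maximum of a different data set."""
    results["largest"] = max(data)


def run_mimd():
    results = {}  # stands in for shared main memory
    workers = [
        threading.Thread(target=total, args=([1, 2, 3], results)),
        threading.Thread(target=largest, args=([7, 9, 4], results)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()  # wait until both "processors" have finished
    return results


print(run_mimd())
```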
[Figure: MIMD organization, labelled with the CU, PU, MU, IS, DS and LM legend given for SISD]
[Figure: taxonomy of parallel processor architectures]
Symmetric Multiprocessors (SMP)
An SMP can be defined as a standalone computer system with the following characteristics:
Symmetric Multiprocessors (SMP)
Advantages
Block Diagram of Tightly Coupled Multiprocessor
• Each processor is self-contained, including a control unit, ALU,
registers, and, typically, one or more levels of cache
• Each processor has access to a shared main memory and the I/O devices
through some form of interconnection mechanism
• The processors can communicate with each other through memory
(messages and status information left in common data areas)
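Communication through memory can be sketched with two threads that share a common data area holding a message and a status flag. The class `CommonDataArea` and the functions `producer`, `consumer` and `exchange` are hypothetical names chosen for this example; a real SMP would rely on hardware cache coherence and operating-system synchronization primitives rather than Python objects.

```python
import threading


class CommonDataArea:
    """Hypothetical shared region: one message slot plus a status flag."""

    def __init__(self):
        self.lock = threading.Lock()    # guards the message slot
        self.ready = threading.Event()  # status information: "message available"
        self.message = None


def producer(area, text):
    with area.lock:
        area.message = text  # leave the message in the common area
    area.ready.set()         # update the status so the other side can see it


def consumer(area, inbox):
    area.ready.wait(timeout=5)  # poll the status (bounded wait)
    with area.lock:
        inbox.append(area.message)


def exchange(text):
    area, inbox = CommonDataArea(), []
    threads = [
        threading.Thread(target=consumer, args=(area, inbox)),
        threading.Thread(target=producer, args=(area, text)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return inbox


print(exchange("job 42 finished"))
```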
Symmetric Multiprocessor Organization
• The most common organization for personal computers, workstations, and
servers is the time-shared bus.
• The time-shared bus is the simplest mechanism for constructing a
multiprocessor system.
• The bus consists of control, address, and data lines.
Symmetric Multiprocessor Organization
• To facilitate DMA transfers from I/O subsystems to processors,
the following features are provided:
- Addressing: It must be possible to distinguish modules on the bus to
determine the source and destination of data.
- Arbitration: Any I/O module can temporarily function as “master.” A
mechanism is provided to arbitrate competing requests for bus
control, using some sort of priority scheme.
- Time-sharing: When one module is controlling the bus, other modules
are locked out and must, if necessary, suspend operation until bus
access is achieved.
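The arbitration and time-sharing features can be illustrated with a fixed-priority arbiter: among the modules requesting the bus, the one with the highest priority becomes master, and the others wait. The slides only say "some sort of priority scheme", so the fixed-priority rule, the module names, and the functions `arbitrate` and `run_bus` are all assumptions for this sketch.

```python
def arbitrate(requests, priority):
    """Pick the next bus master; a lower number means a higher priority."""
    if not requests:
        return None
    return min(requests, key=lambda module: priority[module])


def run_bus(requests, priority):
    """Grant the bus to one module at a time; the rest are locked out meanwhile."""
    pending = set(requests)
    grant_order = []
    while pending:
        master = arbitrate(pending, priority)
        grant_order.append(master)  # this module controls the bus for its transfer
        pending.remove(master)      # it releases the bus; others may now compete
    return grant_order


# Hypothetical modules with distinct priorities.
priority = {"dma_disk": 1, "cpu0": 2, "cpu1": 3}
print(run_bus(["cpu1", "cpu0", "dma_disk"], priority))  # ['dma_disk', 'cpu0', 'cpu1']
```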
Symmetric Multiprocessor Organization
• The bus organization has several attractive features:
- Simplicity: This is the simplest approach to multiprocessor
organization. The physical interface and the addressing, arbitration,
and time-sharing logic of each processor remain the same as in a
single-processor system.
- Flexibility: It is generally easy to expand the system by attaching more
processors to the bus.
- Reliability: The bus is essentially a passive medium, and the failure of
any attached device should not cause failure of the whole system.
Clusters
• A cluster is a group of interconnected, whole computers
working together as a unified computing resource that can
create the illusion of being one machine
• Clusters serve purposes ranging from general-purpose business needs,
such as web-service support, to computation-intensive scientific
calculations
• Three types of cluster architecture:
- High performance
- High availability
- Load balancing
Clusters
• Collection of independent whole uniprocessors or SMPs
- Usually called nodes
- Interconnected to form a cluster
• Working together as unified resource
- Illusion of being one machine
• Communication via fixed path or network connections
Benefits of Cluster Organization
• High processing speed
• Offer scalability – easily expanded by adding additional nodes
to the network
• Provide high availability of resources: the remaining nodes can act as
a backup if one node fails
• Processing power is cost-effective compared with a mainframe
• Drawback: high cost to implement and maintain
Benefits of Clusters
• Absolute scalability
• Incremental scalability
• High availability
• Superior price/performance
Cluster Configurations
• In the literature, clusters are classified in a number of different
ways.
• Two cluster configurations:
- Standby server with no shared disk
- Shared disk
• Standby server with no shared disk
- The only interconnection is a high-speed link that can be used for
message exchange to coordinate cluster activity.
- The link can be a LAN that is shared with other computers that are not
part of the cluster, or it can be a dedicated interconnection facility.
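One common use of the message link in a standby configuration is a heartbeat: the active server sends periodic messages, and the standby takes over when they stop arriving. The slides do not describe this mechanism, so the class `StandbyServer`, its timeout parameter, and the explicit `now` timestamps are assumptions made for a minimal sketch.

```python
class StandbyServer:
    """Hypothetical standby node that promotes itself when heartbeats stop."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s  # silence longer than this means "primary failed"
        self.last_heartbeat = 0.0
        self.active = False

    def on_heartbeat(self, now: float) -> None:
        """Record a heartbeat message received over the cluster link."""
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        """Return True once this node has taken over as the active server."""
        if not self.active and now - self.last_heartbeat > self.timeout_s:
            self.active = True  # take over the primary's work
        return self.active


standby = StandbyServer(timeout_s=3.0)
standby.on_heartbeat(now=10.0)
print(standby.check(now=12.0))  # heartbeat is recent, stay passive
print(standby.check(now=13.5))  # silence exceeded the timeout, take over
```

Passing the time in as a parameter keeps the sketch deterministic; a real implementation would read a clock and receive heartbeats from the network.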
Cluster Configurations
• Shared disk
- There is generally still a message link between nodes.
- In addition, a disk subsystem is directly linked to multiple computers
within the cluster.
- The common disk subsystem is typically a RAID system.
- RAID or some similar redundant disk technology is common in clusters
so that the high availability achieved by the presence of multiple
computers is not compromised by a shared disk that is a single point
of failure.
[Figure: cluster configurations]
Cluster Computer Architecture
• The individual computers are connected by some high-speed
LAN or switch hardware.
• Each computer is capable of operating independently.
• A middleware layer of software is installed in each computer
to enable cluster operation.
• The cluster middleware provides a unified system image to the
user, known as a single-system image.
• The middleware is also responsible for providing high
availability, by means of load balancing and responding to
failures in individual components.
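The middleware's two availability duties, load balancing and responding to failures, can be sketched as a round-robin dispatcher that skips nodes marked as failed. The class `ClusterMiddleware`, its methods, and the round-robin policy are illustrative assumptions; the slides do not name a specific balancing algorithm.

```python
class ClusterMiddleware:
    """Hypothetical single entry point that hides the individual nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.failed = set()
        self._next = 0  # index of the node to try next

    def mark_failed(self, node):
        self.failed.add(node)

    def mark_recovered(self, node):
        self.failed.discard(node)

    def dispatch(self, request):
        """Return the node chosen to serve the request (round robin, skipping failures)."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node not in self.failed:
                return node
        raise RuntimeError("no healthy node available")


cluster = ClusterMiddleware(["node1", "node2", "node3"])
print([cluster.dispatch(r) for r in range(4)])  # ['node1', 'node2', 'node3', 'node1']
cluster.mark_failed("node2")
print([cluster.dispatch(r) for r in range(3)])  # ['node3', 'node1', 'node3']
```

The caller never addresses a node directly, which is the essence of the single-system image: requests go to the cluster, and the middleware decides where they run.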
[Figure: cluster computer architecture]
Clusters vs. SMP
• Both provide multiprocessor support for high-demand applications
• Both are available commercially
• SMP:
- Easier to manage and control
- Closer to single-processor systems; scheduling is the main difference
- Takes less physical space
- Consumes less power
• Clustering:
- Superior incremental and absolute scalability
- Lower cost
- Superior availability, since all components of the system can readily
be made highly redundant
Nonuniform Memory Access (NUMA)
(Tightly coupled)
• Alternative to SMP and clusters
• Nonuniform memory access
- All processors have access to all parts of main memory, using load and
store operations
- A processor's memory access time depends on which region of memory is
accessed
- Different processors access different regions of memory at different
speeds
• Cache-coherent NUMA (CC-NUMA)
- Cache coherence is maintained among the caches of the various
processors
- Significantly different from SMP and clusters
Motivation
• An SMP has a practical limit on the number of processors
- Bus traffic limits it to between 16 and 64 processors
• In a cluster, each node has its own memory
- Applications do not see a large global memory
- Coherence is maintained by software, not hardware
• NUMA retains the SMP flavour while providing large-scale
multiprocessing
• The objective is to maintain a transparent system-wide memory
while permitting multiprocessor nodes, each with its own bus or
internal interconnection system
[Figure: NUMA organization]
NUMA Pros & Cons
• Can deliver effective performance at higher levels of parallelism
than a single SMP
• Does not require major software changes compared with an SMP
• Performance can break down if there is too much access to remote
memory
- This can be mitigated by L1 and L2 caches designed to reduce all memory
accesses, which requires software with good temporal locality
• Not transparent
- Changes to page allocation, process allocation, and load balancing can be
difficult
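The cost of remote accesses can be made concrete with a simple weighted average: if a fraction of memory accesses go to remote memory, the mean access time lies between the local and remote latencies. The function `effective_access_time` and the latency figures used below (100 ns local, 300 ns remote) are assumptions for illustration, not measurements from any real NUMA system.

```python
def effective_access_time(remote_fraction: float,
                          local_ns: float,
                          remote_ns: float) -> float:
    """Mean memory access time when a given fraction of accesses is remote."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


# Hypothetical latencies: 100 ns for local memory, 300 ns for remote memory.
for fraction in (0.0, 0.1, 0.5, 1.0):
    mean_ns = effective_access_time(fraction, local_ns=100.0, remote_ns=300.0)
    print(f"{fraction:.0%} remote -> {mean_ns:.0f} ns")
```

Under these assumed numbers, moving from 10% to 50% remote accesses raises the mean latency from 120 ns to 200 ns, which is why placing pages and processes close to each other matters.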
END OF CHAPTER 7