
ADVANCED COMPUTER ARCHITECTURE

UNIT-3
Basic Multiprocessor Architecture
 Flynn’s classification
 UMA, NUMA
 Loosely Coupled and Tightly Coupled System
 Centralized Shared Memory Architecture and Distributed Shared Memory Architecture
 Array Processor
 Vector Processor

A multiprocessor system is defined as "a system with more than one processor", and, more precisely, "a
number of central processing units linked together to enable parallel processing to take place".
The key objective of a multiprocessor is to boost a system's execution speed.

Flynn’s Classification/ Flynn’s Taxonomy:

Stream:
 A sequence of items (data / instructions)
 A sequence or flow of either instruction or data operated on by the computer.
 It is of two types
o Instruction stream
o Data stream

Instruction Stream:
In the complete cycle of instruction execution, a flow of instructions from main memory to the CPU is
established. This flow of instructions is called the instruction stream.

Data Stream:
The bidirectional flow of operands between the processor and memory is called the data stream.

It can be said that the sequence of instructions executed by the CPU forms the instruction stream, and the
sequence of data (operands) required for the execution of those instructions forms the data stream.

Parallel computing is computing in which the jobs are broken into discrete parts that can be executed
concurrently.
Each part is further broken down to a series of instructions.
Instructions from each part execute simultaneously on different CPUs.
Parallel systems deal with the simultaneous use of multiple computer resources that can include a single
computer with multiple processors, a number of computers connected by a network to form a parallel processing
cluster or a combination of both.
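As a minimal sketch of this idea, the following Python program (names such as `sum_squares` and the four-way split are invented for illustration) breaks one job into discrete parts and runs them concurrently in separate worker processes using the standard library's `multiprocessing.Pool`:

```python
# One job (summing squares over 0..999) is broken into four parts; each part
# is a series of instructions that runs concurrently in its own worker process.
from multiprocessing import Pool

def sum_squares(chunk):
    """One discrete part of the job; each worker executes this independently."""
    lo, hi = chunk
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    # Break the job into four parts, one per worker process.
    chunks = [(0, 250), (250, 500), (500, 750), (750, 1000)]
    with Pool(processes=4) as pool:
        partial = pool.map(sum_squares, chunks)  # parts execute concurrently
    total = sum(partial)                         # combine the partial results
    print(total)  # -> 332833500, same as the sequential computation
```

Combining the partial results gives the same answer as the sequential loop; any speedup comes from the parts executing simultaneously on different CPUs.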

Parallel systems are more difficult to program than computers with a single processor, because the architecture
of parallel computers varies widely and the processes of multiple CPUs must be coordinated and
synchronized.

The crux of parallel processing is the use of multiple CPUs.


Based on the number of instruction streams and data streams that can be processed simultaneously, computing
systems were classified by Michael Flynn in 1966 into four major categories:
1. Single instruction stream, single data stream (SISD)
2. Single instruction stream, multiple data stream (SIMD)
3. Multiple instruction stream, single data stream (MISD)
4. Multiple instruction stream, multiple data stream (MIMD)

1. Single-instruction, single-data (SISD) systems –


 An SISD computing system is a uniprocessor machine which is capable of executing a single
instruction, operating on a single data stream.
 In SISD, machine instructions are processed in a sequential manner and computers adopting this
model are popularly called sequential computers.
 Most conventional computers have SISD architecture.
 All the instructions and data to be processed have to be stored in primary memory.

The speed of the processing element in the SISD model is limited by the rate at which the
computer can transfer information internally. Dominant representative SISD systems are IBM PCs and
workstations.

Ex: IBM 701, IBM 1620, IBM 360-91, CDC 6600

2. Single-instruction, multiple-data (SIMD) systems –

 An SIMD system is a multiprocessor machine capable of executing the same instruction on all the
CPUs but operating on different data streams.
 Machines based on an SIMD model are well suited to scientific computing since they involve lots of
vector and matrix operations.
 So that the information can be distributed to all the processing elements (PEs), the data elements of
vectors are divided into multiple sets (N sets for an N-PE system), and each PE processes one
data set.
Dominant representative SIMD systems are Cray's vector processing machines.

Ex: Illiac-IV, STARAN
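The SIMD idea can be illustrated in miniature with plain Python (the function name `simd_execute` is a hypothetical name for this sketch): a single decoded instruction is applied to every PE's own pair of operands, giving one instruction stream and multiple data streams.

```python
# SIMD in miniature: one instruction is broadcast to N processing elements;
# PE i applies it to its own private data set (a_i, b_i). Functions from the
# operator module stand in for the single decoded instruction.
import operator

def simd_execute(instruction, data_sets):
    """Apply the SAME instruction to every PE's pair of operands."""
    return [instruction(a, b) for (a, b) in data_sets]  # conceptually parallel

# Four PEs, four data streams, one instruction stream.
data = [(1, 10), (2, 20), (3, 30), (4, 40)]
print(simd_execute(operator.add, data))  # -> [11, 22, 33, 44]
print(simd_execute(operator.mul, data))  # -> [10, 40, 90, 160]
```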

3. Multiple-instruction, single-data (MISD) systems –


 An MISD computing system is a multiprocessor machine capable of executing different instructions on
different PEs, with all of them operating on the same data set.
 The system performs different operations on the same data set. Machines built using the MISD
model are not useful in most applications; a few such machines have been built, but none of them is
available commercially.

4. Multiple-instruction, multiple-data (MIMD) systems –

An MIMD system is a multiprocessor machine capable of executing multiple instructions on
multiple data sets.
Each PE in the MIMD model has separate instruction and data streams; therefore machines built using this
model are suited to any kind of application.
Unlike SIMD and MISD machines, PEs in MIMD machines work asynchronously.
MIMD machines are broadly categorized into shared-memory MIMD and distributed-memory
MIMD based on the way PEs are coupled to the main memory.

In the shared memory MIMD model (tightly coupled multiprocessor systems), all the PEs are connected to a
single global memory and they all have access to it.
The communication between PEs in this model takes place through the shared memory: a modification of the
data stored in the global memory by one PE is visible to all other PEs. Dominant representative shared-memory
MIMD systems are Silicon Graphics machines and Sun/IBM SMP (Symmetric Multi-Processing) machines.
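A rough software analogue of this tightly coupled model can be sketched with Python's `multiprocessing` shared `Array` (the `worker` function and the values written are invented for illustration): several processes update one global memory, and a write by any process is visible to all the others.

```python
# Shared-memory (tightly coupled) MIMD sketch: worker processes write into a
# single shared array -- the "global memory" -- and every write is visible to
# all processes, including the parent.
from multiprocessing import Process, Array, Lock

def worker(shared, lock, index, value):
    with lock:                  # coordinate access to the global memory
        shared[index] = value   # this write is visible to all other PEs

if __name__ == "__main__":
    shared = Array('i', [0, 0, 0, 0])   # the single global memory
    lock = Lock()
    procs = [Process(target=worker, args=(shared, lock, i, (i + 1) * 10))
             for i in range(4)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(list(shared))  # -> [10, 20, 30, 40]
```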

In distributed-memory MIMD machines (loosely coupled multiprocessor systems), all PEs have a local
memory. The communication between PEs in this model takes place through the interconnection network (the
interprocess communication, or IPC, channel). The network connecting the PEs can be configured as a tree, mesh,
or other topology in accordance with the requirement.
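By contrast, the loosely coupled model can be sketched with message passing over a queue, which plays the role of the IPC channel (the `node` function and its rank-squaring work are hypothetical examples):

```python
# Distributed-memory (loosely coupled) MIMD sketch: each node keeps its
# result in its own local variables and communicates only by sending
# messages over a queue -- the interprocess communication (IPC) channel.
from multiprocessing import Process, Queue

def node(rank, out_queue):
    local = rank * rank           # lives only in this node's local memory
    out_queue.put((rank, local))  # the only way other nodes can see it

if __name__ == "__main__":
    q = Queue()
    procs = [Process(target=node, args=(r, q)) for r in range(4)]
    for p in procs: p.start()
    results = dict(q.get() for _ in range(4))   # gather the messages
    for p in procs: p.join()
    print([results[r] for r in range(4)])  # -> [0, 1, 4, 9]
```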

UNIFORM MEMORY ACCESS (UMA)

Multiprocessors can be categorized into three shared-memory models:

1. Uniform Memory Access (UMA)
2. Non-uniform Memory Access (NUMA)
3. Cache-only Memory Access (COMA)

 UMA stands for uniform memory access.
 It is a shared-memory architecture used in parallel computers.
 All the processors in the UMA model share the physical memory uniformly.
 In a UMA architecture, the access time to a memory location is independent of which processor makes the
request and of which memory chip holds the shared data.
 Although the UMA architecture is not suitable for building scalable parallel computers, it is excellent
for constructing small, single-bus multiprocessors.
 Two such machines are
the Encore Multimax of Encore Computer Corporation, representing the technology of the late 1980s, and
the Power Challenge of Silicon Graphics Computing Systems, representing the technology of the
1990s.
 In UMA, a single memory controller is used.
 Uniform memory access is slower than non-uniform memory access.
 In uniform memory access, bandwidth is more restricted than in non-uniform memory access.
 Three types of buses are used in uniform memory access: single, multiple, and
crossbar.
 It is applicable to general-purpose and time-sharing applications.
 All processors have equal access time to any memory location.
 Because access to shared memory is balanced, these systems are also called SMP (symmetric
multiprocessor) systems. Each processor has an equal opportunity to read from and write to memory, at
equal access speed.
 Examples of this architecture are Sun Starfire servers, the HP V series, the Compaq AlphaServer GS, and
Silicon Graphics Inc. multiprocessor servers.
 In the UMA model of multiprocessor, physical memory is uniformly shared by all the processors.
 All processors have equal access time to all memory words, which is why it is called uniform memory
access.
 Peripherals are also shared in some fashion.
 UMA machines are also called tightly coupled systems because of their high degree of resource sharing.

NON-UNIFORM MEMORY ACCESS (NUMA)

 NUMA (non-uniform memory access) is a method of configuring a cluster of microprocessors in
a multiprocessing system so that they can share memory locally, improving performance and the system's
ability to be expanded.
 NUMA can be thought of as a "cluster in a box."
 NUMA is a multiprocessor model in which each processor is connected to its own dedicated memory.
 Non-uniform memory access (NUMA) machines were intended to avoid the memory-access
bottleneck of UMA machines. The logically shared memory is physically distributed among the
processing nodes of NUMA machines.
 In NUMA machines, as in multicomputers, the main design issues are the organization of the processor
nodes, the interconnection network, and the possible approaches to reducing remote memory accesses.
 Typical NUMA machines are the Cray T3D and the Hector multiprocessor.
 In NUMA, multiple memory controllers are used.
 Non-uniform memory access is faster than uniform memory access.
 Non-uniform memory access is applicable to real-time and time-critical applications.
Difference between UMA and NUMA:

1. UMA stands for Uniform Memory Access; NUMA stands for Non-uniform Memory Access.
2. In UMA, a single memory controller is used; in NUMA, multiple memory controllers are used.
3. Uniform memory access is slower than non-uniform memory access.
4. UMA has limited bandwidth; NUMA has more bandwidth than UMA.
5. UMA is applicable to general-purpose and time-sharing applications; NUMA is applicable to real-time and time-critical applications.
6. In UMA, memory access time is balanced (equal for all processors); in NUMA, memory access time is not equal.
7. UMA uses three types of buses: single, multiple, and crossbar; NUMA uses two types of buses: tree and hierarchical.

Loosely Coupled and Tightly Coupled System

Multiprocessors are classified by the way their memory is organized.

MULTIPROCESSOR

1. Tightly coupled multiprocessor system (also called a closely coupled system): uses common shared memory; such a machine is called a multiprocessor.
2. Loosely coupled multiprocessor system: uses distributed memory; such a machine is called a multicomputer.
Loosely coupled system
A loosely coupled multiprocessor system is a type of multiprocessing where the individual processors
are configured with their own memory and are capable of executing user and operating system
instructions independent of each other. This type of architecture paves the way for parallel processing.
Loosely coupled multiprocessor systems are also known as distributed memory, as the processors do
not share physical memory and have their own IO channels.
It is a concept of system design and computing in which every individual component has no knowledge of
the definitions of the other components.
In a loosely coupled system, hardware and software may interact but they are not dependent on each
other.
1. It uses the distributed memory concept; hence this architecture needs more space.
2. Contention is low in a loosely coupled system.
3. It has high scalability.
4. The data rate in a loosely coupled system is low.
5. The cost of a loosely coupled system is low.
6. It has a static interconnection network.
7. It can operate under multiple operating systems.
8. In a loosely coupled system, each processor has its own cache memory.
9. Throughput is low in a loosely coupled system.
10. Security is low in a loosely coupled system.
11. Power consumption is higher than in a tightly coupled system.
12. It is reusable, owing to its flexibility.
Tightly coupled system
It is a type of multiprocessing system in which there is shared memory.
In a tightly coupled multiprocessor system, the data rate is higher than in a loosely coupled multiprocessor
system.
It is a concept of system design and computing in which hardware and software components are linked
together in such a manner that each component depends on the others.
Tightly coupled architecture promotes interdependent applications and code.
Tightly coupled architecture is fragile as the minor issue in one segment can bring the whole system
down.
1. It uses the shared memory concept; hence this architecture needs less space.
2. Contention is high in a tightly coupled system.
3. It has low scalability.
4. The data rate in a tightly coupled system is high.
5. The cost of a tightly coupled system is high.
6. It has a dynamic interconnection network.
7. It operates under a single operating system.
8. In a tightly coupled system, cache memory is assigned according to the needs of processing.
9. Throughput is high in a tightly coupled system.
10. Security is high in a tightly coupled system.
11. Power consumption is lower than in a loosely coupled system.
12. It is not reusable, lacking flexibility.
Comparison Between Loosely Coupled and Tightly Coupled System:

1. A tightly coupled multiprocessor has shared memory; a loosely coupled multiprocessor has distributed memory.
2. The degree of coupling between processors is high in a tightly coupled system and low in a loosely coupled one.
3. In a tightly coupled system the CPUs are in close communication; in a loosely coupled system the CPUs or systems may be located at different places.
4. In a tightly coupled system the CPUs share the computer bus, memory, and I/O devices; in a loosely coupled system there is no such sharing.
5. A tightly coupled system is efficient when there is a high degree of interaction between processes and for high-speed, real-time processing; a loosely coupled system is efficient when the tasks running on different processors have minimal interaction between them (parallel applications).
6. Memory conflicts occur in a tightly coupled system because of the shared memory; no memory conflicts occur in a loosely coupled system.
7. In a tightly coupled system, processors communicate with each other through shared memory; in a loosely coupled system, processors communicate using message-passing techniques.
8. The data rate is high in a tightly coupled system and low in a loosely coupled one.
9. A tightly coupled system is more expensive; a loosely coupled system is less expensive.
10. A tightly coupled system is compact in size; a loosely coupled system is larger.
11. A tightly coupled system is less scalable (only a limited number of CPUs can be added); a loosely coupled system is more scalable (more CPUs can be added to improve system performance).
12. A tightly coupled system is less fault tolerant (if the shared memory is corrupted, all processors fail); a loosely coupled system is more fault tolerant (a fault in one system/module does not lead to a complete system breakdown).
13. A tightly coupled system is less complex; a loosely coupled system is more complex because additional hardware is required to provide communication between individual processors.
14. A tightly coupled system is more portable; a loosely coupled system is less portable.
15. Tightly coupled multiprocessors are applied in parallel processing systems; loosely coupled multiprocessors are applied in distributed computing systems.

Centralized Shared Memory Architecture and Distributed Shared Memory Architecture


Array Processor
A processor that performs computations on a vast array of data is known as an array processor.
Array processors are also referred to as multiprocessors or vector processors.
An array processor executes only one instruction at a time, applied to an array of data.
Array processors work with massive data sets to perform computations.
Hence, they are used to enhance a computer's performance.
Classification of Array Processors
Array processors can be divided into two categories:
1. Attached Array Processors
2. SIMD(Single Instruction Stream, Multiple Data Stream) Array Processors

Attached Array Processor


The attached array processor is an auxiliary processor connected to a general-purpose computer to enhance and
improve the machine's performance in numerical computational tasks.

It provides excellent performance by using numerous functional units in parallel processing.


The attached array processor includes a common processor with an input/output interface and a local memory
interface.
The main memory and the local memory are linked.

The attached array processor intends to improve the performance of the host computer in specific numeric
computations.

SIMD Array Processor


SIMD refers to the organization of a single computer with multiple parallel processors.
The processing units are designed to work together under the supervision of a single control unit, resulting in a
single instruction stream and multiple data streams.

An array processor's general block diagram is given below.


It comprises several identical processing elements (PEs), each with its local memory M. An ALU and registers
are included in each processor element.
The master control unit controls the processing elements' actions.
It also decodes instructions and determines how they should be carried out.
The program is stored in the main memory.
The control unit retrieves the instructions.
Vector instructions are sent to all PEs simultaneously, and the results are stored in memory.
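The organization described above can be sketched as a toy software simulation (the class names `PE` and `MasterControlUnit` and their methods are invented for illustration, not real hardware interfaces): the master control unit decodes an instruction and broadcasts it to every PE, and each PE applies it to its own local memory M.

```python
# Toy simulation of a SIMD array processor: a master control unit broadcasts
# each decoded vector instruction; every processing element (PE) applies it
# to its own local memory M, so the whole vector is updated in one "step".

class PE:
    """One processing element: an ALU plus a private local memory M."""
    def __init__(self, data):
        self.M = data  # local memory holds this PE's element of the vector

    def execute(self, op, operand):
        if op == "ADD":
            self.M += operand
        elif op == "MUL":
            self.M *= operand

class MasterControlUnit:
    def __init__(self, vector):
        self.pes = [PE(x) for x in vector]   # one PE per vector element

    def broadcast(self, op, operand):
        for pe in self.pes:                  # conceptually simultaneous
            pe.execute(op, operand)

    def store(self):
        return [pe.M for pe in self.pes]     # gather results to main memory

mcu = MasterControlUnit([1, 2, 3, 4])
mcu.broadcast("ADD", 10)   # single instruction, four data streams
mcu.broadcast("MUL", 2)
print(mcu.store())  # -> [22, 24, 26, 28]
```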

The ILLIAC IV computer, manufactured by the Burroughs Corporation, is the most well-known SIMD array
processor. Single Instruction Multiple Data (SIMD) processors are highly specialized computers. They are
useful only for numerical problems that can be stated as vectors or matrices; they are not suitable for other
kinds of computation.
Configurations of SIMD
1. Array processors that use RAM (random access memory) are known as having a Dedicated Memory
Organisation. Examples:
 ILLIAC-IV
 CM-2
 MP-1

2. Associative processors that use content-addressable memory are known as having a Global Memory
Organisation. Example:
 BSP
Usage of Array Processors
 Array processors enhance the total speed of instruction processing.
 The design of most array processors is optimized for repetitive arithmetic operations, making them
faster at vector arithmetic than the host CPU. Since most array processors run asynchronously from the
host CPU, the system's overall capacity is thus improved.
 Array processors have their own local memory, providing additional memory to systems with limited
memory. This is an essential consideration for systems with a limited physical memory or address
space.

Applications of Array Processors


Array processing is used at various places, including:-
 Radar Systems
 Sonar Systems
 Anti-jamming
 Seismic Exploration
 Wireless communication
 Medical applications
 Used for Speech Enhancement
 Used in Astronomy applications

Array processors are extremely useful for dealing with problems that require a great deal of parallelism.
However, they do require a change in programming methodology. Converting conventional (sequential)
programs to support array processors is complex, and different (parallel) algorithms may be needed to match
the parallel approach.

Vector Processor
A vector processor is basically a central processing unit that has the ability to execute a complete vector of
input with a single instruction.
More specifically, it is a complete unit of hardware resources that executes an operation over a sequential set
of similar data items in memory using a single instruction.
The elements of a vector are ordered so that they occupy successive addresses in memory; this is why we say
the data is processed sequentially.
It holds a single control unit but has multiple execution units that perform the same operation on different data
elements of the vector.
Unlike scalar processors, which operate on only a single pair of data at a time, a vector processor operates on
multiple pairs of data. However, scalar code can be converted into vector code; this conversion process is
known as vectorization. Vector processing thus allows operations on multiple data elements with the help of a
single instruction.
Such instructions are called single-instruction, multiple-data (SIMD) or vector instructions. Modern CPUs
make use of vector processing because it is more advantageous than scalar processing.
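A small sketch of vectorization in Python (the `vadd` helper is a stand-in, not a real vector instruction): the scalar version issues one add per loop iteration, while the vector version expresses the same work as one logical operation over whole vectors.

```python
# Vectorization sketch: the scalar code performs one add per pair of elements;
# the "vector" code expresses the same work as a single operation over whole
# vectors (emulated here in software; a real vector CPU would run it as one
# vector-add instruction in hardware).

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]

# Scalar code: one instruction per data pair, one pair per loop iteration.
c_scalar = []
for i in range(len(a)):
    c_scalar.append(a[i] + b[i])

def vadd(x, y):
    """Software stand-in for a single vector-add instruction."""
    return [xi + yi for xi, yi in zip(x, y)]

# Vectorized code: one logical operation on multiple data elements.
c_vector = vadd(a, b)
print(c_vector)               # -> [11.0, 22.0, 33.0, 44.0]
assert c_vector == c_scalar   # vectorization preserves the result
```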
Architecture and Working
The figure below represents the typical diagram showing vector processing by a vector computer:

The functional units of a vector computer are as follows:


IPU or instruction processing unit
Vector register
Scalar register
Scalar processor
Vector instruction controller
Vector access controller
Vector processor

Because it has several functional pipes, a vector processor can execute instructions over the operands. Both
data and instructions are present in memory at the desired memory locations, so the instruction processing
unit (IPU) fetches each instruction from memory.
Once an instruction is fetched, the IPU determines whether it is scalar or vector in nature. If it is scalar, the
instruction is transferred to the scalar register, and further scalar processing is then
performed.
When the instruction is a vector instruction, it is fed to the vector instruction controller. This vector
instruction controller first decodes the vector instruction and then determines the addresses of the vector
operands in memory.
It then signals the vector access controller to fetch the required operands. The vector access controller fetches
the desired operands from memory and provides them to the instruction register so that they can be processed
by the vector processor.
When multiple vector instructions are present, the vector instruction controller provides them to the task
system. If the task system finds that a vector task is very long, the processor divides the task into
subvectors.
These subvectors are fed to the vector processor, which uses several pipelines to execute the instruction over
the operands fetched from memory at the same time.
The various vector instructions are scheduled by the vector instruction controller.
Classification of Vector Processor
The classification of vector processors relies on the way vectors are formed as well as on the presence of
vector instructions for processing. Depending on these criteria, vector processors are classified as follows:

Register to Register Architecture


This architecture is widely used in vector computers. In this architecture, operands and previous results are
fetched indirectly from main memory through registers.
The several vector pipelines present in the vector computer help in retrieving the data from the registers and
in storing the results in the desired registers. These vector registers are programmable by user instructions.
This means that the data is fetched and stored in the desired register according to the register address present
in the instruction. These vector registers have a fixed length, like the registers in a normal processing unit.
Some examples of supercomputers using the register-to-register architecture are the Cray-1, Fujitsu machines, etc.
Memory to Memory Architecture
In memory-to-memory architecture, the operands and results are fetched directly from memory rather than
through registers. Note, however, that the addresses of the desired data to be accessed must be present in the
vector instruction.
This architecture enables the fetching of data blocks of 512 bits from memory to the pipeline. However, due
to the high memory access time, the pipelines of the vector computer require a longer startup time, as more
time is needed to initiate a vector instruction.
Some examples of supercomputers with memory-to-memory architecture are the Cyber 205, CDC machines, etc.

Advantages of Vector Processor


A vector processor uses vector instructions, which improve the code density of a program.
The sequential arrangement of data helps the hardware handle the data more efficiently.
It offers a reduction in instruction bandwidth.
From the above discussion, we can conclude that register-to-register architecture is better than memory-to-memory
architecture because it offers a reduction in vector access time.
