
Computer Architecture Unit 11

between memory and the vector registers in the vector load/store unit. This unit is
responsible for overlapping read and write operations from memory and for
masking the high latency associated with main-memory access.
1. Load Vector Operation: This operation moves a vector from memory to a
vector register.
2. Store Vector Operation: This operation moves a vector from a vector
register to memory.
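The effect of these two operations can be sketched in Python, modelling main memory as a flat array and a vector register as a fixed-length list. The function names and the stride parameter are illustrative assumptions, not terms from the text:

```python
def load_vector(memory, base, length, stride=1):
    """Load Vector: copy `length` elements from `memory`, starting at
    address `base`, into a vector register (here, a plain list)."""
    return [memory[base + i * stride] for i in range(length)]

def store_vector(memory, base, vreg, stride=1):
    """Store Vector: write the contents of vector register `vreg` back
    to `memory`, starting at address `base`."""
    for i, value in enumerate(vreg):
        memory[base + i * stride] = value

mem = list(range(100))          # a toy main memory holding 0..99
v1 = load_vector(mem, 10, 4)    # v1 now holds [10, 11, 12, 13]
store_vector(mem, 50, v1)       # mem[50:54] now holds [10, 11, 12, 13]
```

The stride parameter hints at why such a unit is useful: a real vector load can gather elements that are not adjacent in memory, one element per pipeline beat.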
Vector Functional Units: This unit contains several vector functional units for:
 integer operations
 floating-point operations
 logical operations
As shown in figure 11.1, the Cray-1 has six functional units. The NEC SX/2 has
sixteen functional units: four shift units, four integer add/logical, four FP
add and four FP multiply/divide.
Memory: The memory unit of a vector processor differs from the memory unit
used in ordinary processors: it permits pipelined data transfer to and from
memory. Interleaved memory is used to support this pipelined transfer.
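Interleaving spreads consecutive addresses across independent memory modules so that a pipelined stream of accesses keeps every module busy. A minimal sketch of low-order interleaving follows; the four-module count is an assumption chosen for illustration:

```python
NUM_MODULES = 4  # assumed number of memory banks (illustrative)

def module_of(address):
    """Low-order interleaving: consecutive addresses land in
    consecutive modules, so a sequential burst rotates through
    all banks before revisiting any of them."""
    return address % NUM_MODULES

def offset_in_module(address):
    """Word position within the selected module."""
    return address // NUM_MODULES

# Addresses 0..7 map to modules 0,1,2,3,0,1,2,3: a pipelined
# sequential access stream touches each bank only once per 4 words.
mapping = [(a, module_of(a)) for a in range(8)]
```

With this mapping, while module 0 is still completing the access for address 0, accesses to addresses 1, 2 and 3 can already be in flight in modules 1, 2 and 3.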
Self Assessment Questions
1. The first vector machine was _____________.
2. ________ operations get the scalar inputs present in scalar registers.

11.3 Pipelining
We discussed this concept in Units 4 and 5, but a brief recap will give us a
better grounding for the sections that follow.
What is Pipelining?
An implementation technique by which the execution of multiple instructions
can be overlapped is called pipelining. In other words, it is a method that
breaks a sequential process down into numerous sub-operations; every
sub-operation is then executed concurrently in its own dedicated segment.
The main advantage of pipelining is that it increases instruction
throughput, which specifies the count of instructions completed per unit
time. Thus, a program runs faster. In pipelining, several computations can
run in distinct segments simultaneously.
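The throughput gain can be quantified. With k one-cycle segments and no stalls, n instructions finish in k + (n - 1) cycles instead of the n·k cycles a non-pipelined unit would need; the standard speedup formula is sketched below:

```python
def pipelined_cycles(n, k):
    """Cycles for n instructions on an ideal k-stage pipeline:
    the first instruction takes k cycles to fill the pipe, and
    every later instruction completes one cycle after the previous."""
    return k + (n - 1)

def sequential_cycles(n, k):
    """Cycles without pipelining: each instruction takes all k steps."""
    return n * k

n, k = 100, 5
speedup = sequential_cycles(n, k) / pipelined_cycles(n, k)
# 500 cycles shrink to 104, approaching the ideal factor of k = 5.
```

As n grows, the speedup approaches k, which is why deeper pipelines (more segments) promise higher throughput on long instruction streams.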

Manipal University of Jaipur B1648 Page No. 241



A register is connected to every segment in the pipeline to provide
isolation between segments. Thus, each segment can operate on
distinct data simultaneously. Pipelining is also called virtual parallelism, as it
provides an illusion of parallelism at the instruction level only.
In pipelining, the CPU executes each instruction in a series of the following
small, common steps:
1. Instruction fetching
2. Instruction decoding
3. Operand address calculation and loading
4. Instruction execution
5. Storing the result of the execution
6. Write back
While executing a sequence of instructions, the CPU can pipeline these
common steps. In a non-pipelined CPU, by contrast, the instructions are
executed in strict sequence through the steps listed above.
To understand pipelining, let us follow an instruction through the data
path of a five-segment pipeline. Consider a pipeline with five processing
units, where each unit is assumed to take one cycle to finish its work, as
described in the following steps:
a) Instruction fetch cycle: In the first step, the instruction at the
address held in the PC register is fetched from memory into the
Instruction Register (IR).
b) Instruction decode/register fetch cycle: The fetched instruction is
decoded and the register file is read into two temporary registers.
Decoding and register reading are done in parallel.
c) Effective address calculation cycle: In this cycle, the addresses of
the operands are calculated and the effective addresses thus
calculated are placed into the ALU output register.
d) Memory access/branch completion cycle: In this cycle, the address
calculated during the prior cycle is used to access memory. For
load and store instructions, data either returns from memory and is
placed in the Load Memory Data (LMD) register, or is written into
memory. For a branch instruction, the PC is replaced with the
branch destination address held in the ALU output register.




e) Write-back cycle: In the last cycle, the result is written into
the register file.
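The overlap of these five cycles across successive instructions can be visualised with a short simulation. The stage abbreviations and instruction names below are illustrative labels, not notation from the text:

```python
# The five cycles described above, abbreviated in order.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(num_instructions):
    """Return {cycle: [(instruction, stage), ...]} for an ideal
    five-stage pipeline with no stalls: instruction i enters IF in
    cycle i+1 and advances one stage per cycle."""
    diagram = {}
    for i in range(num_instructions):
        for s, stage in enumerate(STAGES):
            diagram.setdefault(i + s + 1, []).append((f"I{i + 1}", stage))
    return diagram

d = pipeline_diagram(3)
# In cycle 3, I1 computes its effective address (EX) while I2 is
# decoded (ID) and I3 is fetched (IF) - three instructions in flight.
```

The diagram also confirms the earlier cycle count: three instructions finish in 5 + (3 - 1) = 7 cycles rather than 15.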
Pipelines are of two types: linear and non-linear. A linear pipeline performs
only one pre-defined, fixed function, with data moving in a forward direction
from one stage to the next. On the other hand, a dynamic pipeline, which
allows feed-forward and feedback connections in addition to the streamline
connections, is called a non-linear pipeline.
An instruction pipeline operates on a stream of instructions by decomposing
and overlapping the three phases of the instruction cycle. Superpipeline
design is an approach that uses ever finer-grained pipeline stages in order
to keep more instructions in the pipeline. As RISC instructions are simpler
than those used in CISC processors, they are more conducive to pipelining.
Self Assessment Questions
3. _____________ specifies the count of instructions completed per unit
time.
4. Pipelining is also called ________________ as it provides an illusion
of parallelism at the instruction level only.
5. Linear pipelines perform only one pre-defined, fixed function at specific
times in a forward direction. (True/False)

11.4 MIMD Architectural Concepts


Computers with multiple processors that are capable of executing vector
arithmetic operations using multiple instruction streams and multiple data
streams are called Multiple Instruction streams Multiple Data stream (MIMD)
computers. All multiprocessing computers are MIMD computers. The
framework of an MIMD computer is shown in figure 11.2.




[Figure: a control unit issues instruction streams (IS) to processors P1, P2, …, Pn, which exchange data streams (DS) with a shared memory; an I/O unit is also connected. P: Processor, IS: Instruction Stream, DS: Data Stream.]

Figure 11.2: The Framework of an MIMD Computer

MIMD machines are also described as multiple independent processors
operating as components of a larger system; examples include parallel
processors, multiprocessors and multi-computers. There are two forms of
MIMD machines:
 multiprocessors (shared-memory machines)
 multi-computers (message-passing machines)
11.4.1 Multiprocessor
Multiprocessors are systems with multiple CPUs, which are capable of
independently executing different tasks in parallel. They have the following
main features:
 They have either a shared common memory or unshared distributed
memories.
 They also share resources, for example I/O devices, system utilities,
program libraries and databases.
 They run an integrated operating system that provides interaction
among the processors and their programs at the job, task, file and
data-element levels.
Types of multiprocessors
There are three types of multiprocessors, distinguished by the way in
which shared memory is implemented (see figure 11.3). They are:
 UMA (Uniform Memory Access)
 NUMA (Non-Uniform Memory Access)
 COMA (Cache-Only Memory Access)

Figure 11.3: Shared Memory Multiprocessors

Basically, the memory is divided into several modules, and the way these
modules are accessed is what places large multiprocessors into the
different categories. Let us discuss them in detail.
UMA (Uniform Memory Access): In this category, every processor has a
similar access time to every memory module. Hence each memory word can be
read as quickly as any other memory word. If not, the quick references are
slowed down to match the slow ones so that programmers cannot observe the
difference; this is what uniformity means here. Uniformity makes
performance predictable, which is a significant aspect of code writing.
Figure 11.4 shows uniform memory access from the CPU on the left.

Figure 11.4: Uniform and Non-Uniform Memory Access
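The practical difference between uniform and non-uniform access can be sketched as an average-latency calculation. The latencies (in nanoseconds) and the fraction of remote references below are illustrative assumptions, not figures from the text:

```python
def average_access_time(local_ns, remote_ns, remote_fraction):
    """Mean memory latency when a given fraction of references
    goes to remote (slower) memory and the rest stays local."""
    return (1 - remote_fraction) * local_ns + remote_fraction * remote_ns

# UMA: every module costs the same, so the mix does not matter.
uma = average_access_time(100, 100, 0.3)    # 100.0 ns regardless of mix

# NUMA: remote modules are slower, so placement of data matters.
numa = average_access_time(100, 400, 0.3)   # 0.7*100 + 0.3*400 = 190.0 ns
```

This is why uniformity makes performance predictable: on the UMA machine the programmer's data placement cannot change the average, whereas on the NUMA machine reducing the remote fraction directly reduces latency.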

