
Pipelining in Computer Architecture

Computer Organization and Architecture

Introduction-

● A program consists of a number of instructions.


● These instructions may be executed in the following two ways-

1. Non-Pipelined Execution
2. Pipelined Execution

1. Non-Pipelined Execution-

In non-pipelined architecture,

● All the instructions of a program are executed sequentially one after the other.
● A new instruction executes only after the previous instruction has executed
completely.
● This style of executing the instructions is highly inefficient.

Example-
Consider a program consisting of three instructions.

In a non-pipelined architecture, these instructions execute one after the other.

If time taken for executing one instruction = t, then-

Time taken for executing ‘n’ instructions = n x t

2. Pipelined Execution-

In pipelined architecture,

● Multiple instructions are executed in parallel.


● This style of executing the instructions is highly efficient.

The execution of an instruction is divided into 5 subtasks:


1. In the first subtask, the instruction is fetched.
2. The fetched instruction is decoded in the second stage.
3. In the third stage, the operands of the instruction are fetched.
4. In the fourth stage, arithmetic and logical operations are performed on the
operands to execute the instruction.
5. In the fifth stage, the result is stored in memory.
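The payoff of this division can be sketched numerically. This is an idealized model, assuming one instruction takes time t, the pipeline has k equal stages of t/k each, and no stalls ever occur:

```python
# Sketch: execution time with and without pipelining (idealized model:
# k equal stages of t/k each, no stalls).
def non_pipelined_time(n, t):
    # Instructions run strictly one after another: n x t.
    return n * t

def pipelined_time(n, k, t):
    # The first instruction takes all k stages; every later instruction
    # finishes one stage-time (t/k) after the previous one.
    return (k + n - 1) * (t / k)

n, k, t = 100, 5, 5  # 100 instructions, 5 stages, 5 ns per instruction
print(non_pipelined_time(n, t))  # 500 ns
print(pipelined_time(n, k, t))   # 104.0 ns
```

For long instruction streams the pipelined time approaches n × (t/k), a nearly k-fold speedup.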

What is Pipelining?
Pipelining is the technique of overlapping the execution of multiple
instructions by passing them through a series of stages called a pipeline.
It allows storing and executing instructions in an orderly process. It is
also known as pipeline processing.

Before moving forward with pipelining, check these topics out to
understand the concept better:

● Memory Organization

Memory Organization in Computer Architecture
A memory unit is the collection of storage units or devices together.
The memory unit stores the binary information in the form of bits.
Generally, memory/storage is classified into 2 categories:

● Volatile Memory: This loses its data when power is switched off.
● Non-Volatile Memory: This is permanent storage and does not
lose any data when power is switched off.

Memory Hierarchy

Memory Access Methods


Each memory type is a collection of numerous memory locations. To
access data from any memory, first it must be located and then the
data is read from the memory location. Following are the methods to
access information from memory locations:
1. Random Access: Main memories are random access memories,
in which each memory location has a unique address. Using this
unique address any memory location can be reached in the
same amount of time in any order.
2. Sequential Access: This method allows memory access in a
sequence or in order.
3. Direct Access: In this mode, information is stored in tracks, with
each track having a separate read/write head.

Hit Ratio

The performance of cache memory is measured in terms of a quantity
called hit ratio. When the CPU refers to memory and finds the word in
the cache, it is said to produce a hit. If the word is not found in the
cache and has to be fetched from main memory, it counts as a miss.
The ratio of the number of hits to the total CPU references to memory
is called hit ratio.
Hit Ratio = Hit/(Hit + Miss)
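The formula above can be sketched directly in code; the hit and miss counts below are made-up illustrative numbers:

```python
# Sketch: hit ratio = hits / (hits + misses); counts are illustrative.
def hit_ratio(hits, misses):
    return hits / (hits + misses)

print(hit_ratio(950, 50))  # 0.95
```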

● Memory Mapping and Virtual Memory

The transfer of data from main memory to cache memory is
called mapping. There are 3 main types of mapping:
● Associative Mapping: The associative memory stores both the
address and the data.
● Direct Mapping: The 15-bit CPU address is divided into 2
fields. The number of bits in the index field is equal to the number of
address bits required to access the cache memory.
● Set Associative Mapping: The disadvantage of direct mapping is
that two words with the same index address can't reside in cache
memory at the same time. This problem can be overcome by set
associative mapping. In this, we can store two or more words of
memory under the same index address. Each data word is stored
together with its tag, and this combination forms a set.
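The field split used by direct mapping can be sketched as follows. The 9-bit index / 6-bit tag split assumes a 512-word cache addressed by the 15-bit address mentioned above, and the sample address is illustrative:

```python
# Sketch: splitting a 15-bit CPU address for direct mapping.
# The 9-bit index / 6-bit tag split assumes a 512-word cache (2^9 lines).
INDEX_BITS = 9

def split_address(addr):
    index = addr & ((1 << INDEX_BITS) - 1)  # low 9 bits select the cache line
    tag = addr >> INDEX_BITS                # high 6 bits identify the block
    return tag, index

tag, index = split_address(0b000010111111111)  # illustrative 15-bit address
print(tag, index)  # 2 511
```

Two addresses with the same index but different tags compete for the same cache line, which is exactly the conflict set associative mapping avoids.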
Replacement Algorithms

Data is continuously replaced with new data in the cache memory
using replacement algorithms. Following are the 2 replacement
algorithms used:

● FIFO - First in First out. Oldest item is replaced with the latest
item.
● LRU - Least Recently Used. Item which is least recently used by
CPU is removed.
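The LRU policy can be sketched with a tiny cache model; the keys and the capacity are made-up for illustration:

```python
from collections import OrderedDict

# Sketch: LRU replacement for a tiny cache using an OrderedDict;
# the least recently used key is evicted when capacity is exceeded.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def access(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)     # mark as most recently used
        elif len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict least recently used
        self.store[key] = value

cache = LRUCache(2)
cache.access("A", 1)
cache.access("B", 2)
cache.access("A", 1)   # A is now most recently used
cache.access("C", 3)   # evicts B, the least recently used
print(list(cache.store))  # ['A', 'C']
```

A FIFO cache would differ only in the hit case: it would not move the key to the end, so the oldest insertion is always evicted first.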

Virtual Memory

Virtual memory is the separation of logical memory from physical
memory. This separation provides large virtual memory for
programmers when only small physical memory is available.

Virtual memory is used to give programmers the illusion that they
have a very large memory even though the computer has a small main
memory. It makes the task of programming easier because the
programmer no longer needs to worry about the amount of physical
memory available.
● Parallel Processing

Instead of processing each instruction sequentially, a parallel
processing system provides concurrent data processing to
reduce the execution time. Throughput is the number of
instructions that can be executed in a unit of time.
Classification of Computer Systems

According to the number of instruction and data streams (Flynn's
classification), computers can be divided into 4 major groups:

1. SISD
2. SIMD
3. MISD
4. MIMD

SISD (Single Instruction Stream, Single Data Stream)


It represents the organization of a single computer containing a
control unit, a processor unit and a memory unit. Instructions are
executed sequentially. Parallel processing in this case may be
achieved by pipelining or multiple functional units.

SIMD (Single Instruction Stream, Multiple Data Stream)

It represents an organization that includes multiple processing units
under the control of a common control unit. All processors receive the
same instruction from the control unit but operate on different parts of
the data.

They are highly specialized computers. They are basically used for
numerical problems that are expressed in the form of vectors or matrices.
However, they are not suitable for other types of computations.

MISD (Multiple Instruction Stream, Single Data Stream)

It consists of a single computer containing multiple processors
connected with multiple control units and a common memory unit. It
is capable of processing several instructions over a single data stream
simultaneously. The MISD structure is only of theoretical interest, since
no practical system has been constructed using this organization.

MIMD (Multiple Instruction Stream, Multiple Data Stream)

It represents the organization which is capable of processing several
programs at the same time. It is the organization of a single computer
containing multiple processors connected with multiple control units
and a shared memory unit. The shared memory unit contains multiple
modules to communicate with all processors simultaneously.
Multiprocessors and multicomputers are examples of MIMD. It
fulfills the demand of large scale computations.

Pipelining is a technique where multiple instructions are overlapped
during execution. The pipeline is divided into stages, and these stages are
connected with one another to form a pipe-like structure. Instructions
enter from one end and exit from the other end.

Pipelining increases the overall instruction throughput.


In a pipeline system, each segment consists of an input register
followed by a combinational circuit. The register is used to hold data,
and the combinational circuit performs operations on it. The output of
the combinational circuit is applied to the input register of the next
segment.

A pipeline system is like the modern-day assembly line setup in
factories. For example, in the car manufacturing industry, huge assembly
lines are set up, and at each point there are robotic arms to perform a
certain task; the car then moves on ahead to the next arm.

Types of Pipeline
1. Arithmetic Pipeline
2. Instruction Pipeline
3. Processor Pipelining
4. Unifunction Vs. Multifunction Pipelining
5. Static vs Dynamic Pipelining
6. Scalar vs Vector Pipelining

Arithmetic Pipeline
Arithmetic pipelines are found in most computers. They
are used for floating point operations, multiplication of fixed point
numbers, etc. For example, the input to the Floating Point Adder
pipeline is:

X = A*2^a

Y = B*2^b

Here A and B are mantissas (the significant digits of the floating point
numbers), while a and b are exponents.

Floating point addition and subtraction are done in 4 parts:

1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Produce the result.
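These four steps can be sketched as follows, using base-10 values X = A*10^a and Y = B*10^b for readability (real hardware works in base 2, and the operand values below are illustrative):

```python
# Sketch of the four floating point adder stages, in base 10 for
# readability (hardware uses base 2); operand values are illustrative.
def fp_add(A, a, B, b):
    # Stage 1: compare the exponents (swap so X has the larger one).
    if a < b:
        A, a, B, b = B, b, A, a
    # Stage 2: align the mantissas by shifting the smaller operand right.
    B = B / (10 ** (a - b))
    # Stage 3: add (or subtract) the mantissas.
    mantissa = A + B
    exponent = a
    # Stage 4: normalize so the mantissa is a fraction less than 1.
    while abs(mantissa) >= 1:
        mantissa /= 10
        exponent += 1
    return mantissa, exponent

result = fp_add(0.9504, 3, 0.8200, 2)  # X = 0.9504*10^3, Y = 0.8200*10^2
print(result)  # approximately (0.10324, 4)
```

In a pipelined adder each stage handles a different pair of operands every cycle, with the intermediate registers mentioned below carrying results between stages.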

Registers are used for storing the intermediate results between the
above operations.

Instruction Pipeline

In this, a stream of instructions can be executed by overlapping the fetch,
decode and execute phases of an instruction cycle. This type of
technique is used to increase the throughput of the computer system.

An instruction pipeline reads instructions from the memory while
previous instructions are being executed in other segments of the
pipeline. Thus we can execute multiple instructions simultaneously.
The pipeline will be more efficient if the instruction cycle is divided
into segments of equal duration.
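The overlap of phases can be visualized with a small space-time diagram. This sketch assumes an ideal 4-stage pipeline (Fetch, Decode, Execute, Write-back) with one cycle per stage and no stalls:

```python
# Sketch: space-time diagram of an ideal 4-stage instruction pipeline
# (Fetch, Decode, Execute, Write-back), one cycle per stage, no stalls.
STAGES = ["F", "D", "E", "W"]

def schedule(n):
    total_cycles = n + len(STAGES) - 1
    rows = []
    for i in range(n):
        # Instruction i occupies stage s during clock cycle i + s.
        row = [". "] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name + " "
        rows.append("I%d: %s" % (i + 1, "".join(row)))
    return rows

for line in schedule(4):
    print(line)
```

Each printed row shows one instruction moving one stage to the right per clock cycle, so a new instruction completes every cycle once the pipeline is full.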

3. Processor Pipelining

Here, the processors are pipelined to process the same data stream. The
data stream is processed by the first processor and the result is stored in the
memory block. The result in the memory block is accessed by the second
processor. The second processor reprocesses the result obtained by the first
processor and then passes the refined result to the third processor, and so on.

4. Unifunction Vs. Multifunction Pipelining

A pipeline that performs the same fixed function every time is a unifunctional
pipeline. On the other hand, a pipeline that performs multiple functions at
different times, or multiple functions at the same time, is a multifunctional
pipeline.

5. Static vs Dynamic Pipelining

The static pipeline performs a fixed function each time. The static pipeline is
unifunctional. The static pipeline executes the same type of instructions
continuously. Frequent changes in the type of instruction may degrade the
performance of the pipeline.

A dynamic pipeline performs several functions simultaneously. It is a
multifunctional pipeline.

6. Scalar vs Vector Pipelining

Scalar pipelining processes the instructions with scalar operands. The vector
pipeline processes the instruction with vector operands.

Pipeline Conflicts
There are some factors that cause the pipeline to deviate from its normal
performance. Some of these factors are given below:

1. Timing Variations

All stages cannot take the same amount of time. This problem generally
occurs in instruction processing, where different instructions have
different operand requirements and thus different processing times.

2. Data Hazards

When several instructions are in partial execution, a problem arises if they
reference the same data. We must ensure that the
next instruction does not attempt to access data before the current
instruction has finished with it, because this will lead to incorrect results.

3. Branching

In order to fetch and execute the next instruction, we must know what
that instruction is. If the present instruction is a conditional branch,
and its result will lead us to the next instruction, then the next
instruction may not be known until the current one is processed.
4. Interrupts

Interrupts insert unwanted instructions into the instruction stream.
Interrupts affect the execution of instructions.

5. Data Dependency

It arises when an instruction depends upon the result of a previous
instruction, but this result is not yet available.

Pipelining Hazards
Whenever a pipeline has to stall due to some reason, it is called a pipeline
hazard. Below we have discussed four pipelining hazards.

1. Data Dependency

Consider the following two instructions and their pipeline execution: an Add
instruction followed by a Sub instruction that reads its result.

The result of the Add instruction is stored in register R2, and we know that
the final result is stored at the end of the execution of the instruction, which
happens at clock cycle t4.

But the Sub instruction needs the value of register R2 at cycle t3. So
the Sub instruction has to stall for two clock cycles; if it doesn't stall, it will
generate an incorrect result. This dependence of one instruction on another
instruction for data is data dependency.
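The two-cycle stall can be sketched with a small calculation. The stage numbers below (result written in stage 5, operands read in stage 3) and the forwarding shortcut are illustrative, not tied to a specific CPU:

```python
# Sketch: stall cycles for a read-after-write dependency, assuming the
# result is written in stage 5 and operands are read in stage 3
# (illustrative stage numbers, matching the Add/Sub example).
WRITE_STAGE = 5
READ_STAGE = 3

def stall_cycles(distance, forwarding=False):
    # distance: how many instructions apart the writer and reader are
    # (1 = adjacent, as with Add followed immediately by Sub).
    if forwarding:
        return 0  # idealized: the result is forwarded as soon as computed
    return max(0, WRITE_STAGE - READ_STAGE - (distance - 1))

print(stall_cycles(1))                   # 2 (the Sub stalls two cycles)
print(stall_cycles(3))                   # 0 (far enough apart, no stall)
print(stall_cycles(1, forwarding=True))  # 0
```

This is why the forwarding (bypassing) technique discussed later removes most data-dependency stalls.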

2. Memory Delay

When an instruction or data is required, it is first searched for in the cache
memory; if it is not found there, it is a cache miss. The data is then searched
for in main memory, which may take ten or more cycles. The pipeline has to
stall for that number of cycles, and this is a memory delay hazard. A cache
miss also delays all the subsequent instructions.

3. Branch Delay

Suppose four instructions I1, I2, I3, I4 are pipelined in sequence. The
instruction I1 is a branch instruction and its target instruction is Ik. Now,
processing starts and instruction I1 is fetched, decoded, and the target address
is computed at the 4th stage in cycle t3.

But until then, the instructions I2, I3, I4 are fetched in cycles 1, 2 & 3, before
the branch target address is computed. As I1 is found to be a branch instruction,
the instructions I2, I3, I4 have to be discarded, because the instruction Ik has to
be processed next after I1. So, this delay of three cycles 1, 2, 3 is a branch
delay.

Prefetching the branch target will reduce the branch delay. For example, if the
branch target is identified at the decode stage, then the branch delay will
reduce to 1 clock cycle.
4. Resource Limitation

If two instructions request access to the same resource in the same
clock cycle, then one of the instructions has to stall and let the other
use the resource. This stalling is due to resource limitation. However, it
can be prevented by adding more hardware.

Resolving Pipelining Complications (Data Hazards)

In order to resolve or remove data hazards effectively, we can follow these
approaches:

1. Data hazards occur due to data dependence between instructions, so in
order to resolve data hazards we must eliminate the data dependency.

2. To eliminate hazards, instructions that are in the pipeline and ready for
execution are allowed to proceed, while instructions that depend on them,
or would produce unacceptable results, are delayed.

3. Another way to handle data hazards is the use of pipeline stalls between
some instructions. To block an instruction from entering the pipeline while
an earlier instruction completes, stalls (NOPs) are inserted between pipeline
stages. Inserting stalls removes the dependency between the instructions,
but stalls reduce the ideal performance.

4. Data hazards can be eliminated by using forwarding or bypassing.

5. Proper synchronization of instructions is required to deal with data hazards
effectively.

6. Data hazards can also be removed by reordering instructions, but care is
needed with the reordering technique, because reordering instructions to
avoid one type of hazard can cause other hazards as well.

7. False data dependencies between instructions can be removed by using the
register renaming technique.
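Register renaming can be sketched as follows; the physical register names, the free list, and the toy instruction format are all illustrative, not a real ISA:

```python
# Sketch: removing a false (write-after-write) dependency by renaming
# each architectural destination register to a fresh physical register.
# Register names, free list, and instruction format are illustrative.
free_regs = ["P1", "P2", "P3"]
rename_map = {}

def rename_dest(instr):
    op, dest, srcs = instr
    srcs = [rename_map.get(s, s) for s in srcs]  # read current mappings
    new_dest = free_regs.pop(0)                  # grab a fresh physical register
    rename_map[dest] = new_dest
    return (op, new_dest, srcs)

# R1 is reused as a destination by both instructions (a false dependency);
# after renaming, each write gets its own physical register and the two
# instructions can proceed independently.
program = [("ADD", "R1", ["R2", "R3"]),
           ("MUL", "R1", ["R4", "R5"])]
renamed = [rename_dest(i) for i in program]
print(renamed)  # [('ADD', 'P1', ['R2', 'R3']), ('MUL', 'P2', ['R4', 'R5'])]
```

True (read-after-write) dependencies survive renaming, which is why forwarding and stalls are still needed for them.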

Advantages of Pipelining
● Instruction throughput increases.
● Increase in the number of pipeline stages increases the number of
instructions executed simultaneously.
● A faster ALU can be designed when pipelining is used.
● Pipelined CPUs work at higher clock frequencies than the RAM.
● Pipelining increases the overall performance of the CPU.

Disadvantages of Pipelining
● Designing of the pipelined processor is complex.
● Instruction latency increases in pipelined processors.
● The throughput of a pipelined processor is difficult to predict.
● The longer the pipeline, the worse the problem of hazards for branch
instructions.

Key Takeaways
● Pipelining divides instruction execution into 5 stages: instruction fetch,
instruction decode, operand fetch, instruction execution and operand store.
● The pipeline allows the execution of multiple instructions concurrently
with the limitation that no two instructions would be executed at the
same stage in the same clock cycle.
● All the stages must process at equal speed else the slowest stage
would become the bottleneck.
● Whenever a pipeline has to stall for any reason it is a pipeline hazard.
This is all about pipelining. So, basically, pipelining is used to improve the
performance of the system by improving its efficiency.
