
Concepts of Pipelining
By Suparna Dutta

Types of Pipeline

Pipelining is divided into 2 categories:

• Arithmetic Pipeline
• Instruction Pipeline

Arithmetic Pipelining

Arithmetic pipelines are found in most computers. They are used for floating-point operations, multiplication of fixed-point numbers, etc. Consider, for example, the floating-point adder pipeline: floating-point addition and subtraction are done in 4 parts:

• Compare the exponents.
• Align the mantissas.
• Add or subtract the mantissas.
• Produce the result.
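The four stages above can be sketched in Python on a simplified decimal representation (a hypothetical (mantissa, exponent) pair, not any particular IEEE format):

```python
def fp_add(a, b):
    """Sketch of the 4-stage floating-point adder on numbers represented
    as integer (mantissa, exponent) pairs meaning mantissa * 10**exponent."""
    ma, ea = a
    mb, eb = b
    # Stage 1: compare the exponents; the smaller one becomes the common exponent.
    e = min(ea, eb)
    # Stage 2: align the mantissas by scaling each to the common exponent.
    ma *= 10 ** (ea - e)
    mb *= 10 ** (eb - e)
    # Stage 3: add (for subtraction, add a negated operand) the mantissas.
    m = ma + mb
    # Stage 4: produce the result, normalized by stripping trailing zeros.
    while m != 0 and m % 10 == 0:
        m //= 10
        e += 1
    return (m, e)

print(fp_add((95, -1), (5, -2)))  # 9.5 + 0.05 → (955, -2), i.e. 9.55
```

In a real pipeline each stage is a separate hardware segment, so four different additions can be in flight at once, one per stage.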
Instruction Pipelining

• In this technique, a stream of instructions is executed by overlapping the fetch, decode, and execute phases of the instruction cycle. It is used to increase the throughput of the computer system.
• An instruction pipeline reads instructions from memory while previous instructions are being executed in other segments of the pipeline. Thus we can execute multiple instructions simultaneously. The pipeline is more efficient if the instruction cycle is divided into segments of equal duration.
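The throughput gain from overlapping can be sketched as follows (a minimal model assuming an ideal k-stage pipeline with equal one-cycle stages and no hazards):

```python
def pipelined_cycles(n, k):
    """Cycles for n instructions on an ideal k-stage pipeline:
    k cycles to fill the pipe, then one instruction completes per cycle."""
    return k + (n - 1)

def speedup(n, k):
    """Speedup over a non-pipelined machine taking k cycles per instruction."""
    return (n * k) / pipelined_cycles(n, k)

print(pipelined_cycles(100, 5))  # → 104
print(round(speedup(100, 5), 2))
```

For large n the speedup approaches k, the number of stages, which is why equal-duration segments matter: the slowest segment sets the cycle time.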

Pipelining Hazards

• As we all know, the CPU's speed is limited by memory. There is one more case to consider: several instructions are at some stage of execution in a pipelined design, and these instructions may become dependent on one another, reducing the pipeline's pace. Dependencies arise for a variety of reasons, which we will examine shortly. The dependencies in the pipeline are referred to as hazards since they put the execution at risk.
• The terms dependencies and hazards are used interchangeably in computer architecture. A hazard, in essence, prevents an instruction present in the pipe from being executed during its designated clock cycle. We use the term clock cycle because each of the instructions may be in a separate machine cycle.

Types of Hazards

The three different types of hazards in computer architecture are:

1. Structural
2. Data
3. Control
Data Hazard

• Data hazards in pipelining emerge when the execution of one instruction depends on the results of another instruction that is still being processed in the pipeline. The order of the READ and WRITE operations on the registers is used to classify data hazards into three groups.

Types of Data Hazard

There are mainly three types of data hazards:

1. RAW (Read After Write) [flow/true data dependency]
2. WAR (Write After Read) [anti-data dependency]
3. WAW (Write After Write) [output data dependency]

Let there be two instructions I and J, such that J follows I. Then:

• A RAW hazard occurs when instruction J tries to read data before instruction I writes it. E.g.:
  I: R2 <- R1 + R3
  J: R4 <- R2 + R3
• A WAR hazard occurs when instruction J tries to write data before instruction I reads it. E.g.:
  I: R2 <- R1 + R3
  J: R3 <- R4 + R5
• A WAW hazard occurs when instruction J tries to write its output before instruction I writes it. E.g.:
  I: R2 <- R1 + R3
  J: R2 <- R4 + R5
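The three cases can be checked mechanically from the register sets each instruction reads and writes; a minimal sketch (the helper and its set encoding are illustrative, not from the source):

```python
def classify_hazards(i_reads, i_writes, j_reads, j_writes):
    """Classify data hazards between instruction I and a later instruction J,
    given the sets of registers each one reads and writes."""
    hazards = set()
    if i_writes & j_reads:
        hazards.add("RAW")  # J reads a register before I has written it
    if i_reads & j_writes:
        hazards.add("WAR")  # J writes a register before I has read it
    if i_writes & j_writes:
        hazards.add("WAW")  # J writes a register that I also writes
    return hazards

# The RAW example above: I: R2 <- R1 + R3, J: R4 <- R2 + R3
print(classify_hazards({"R1", "R3"}, {"R2"}, {"R2", "R3"}, {"R4"}))  # → {'RAW'}
```

Running the WAR and WAW examples through the same helper reproduces the other two classifications.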

Control Hazard

• Branch hazards are caused by branch instructions and are known as control hazards in computer architecture. The flow of program/instruction execution is controlled by branch instructions. Remember that conditional statements are used in higher-level languages for iterative loops and condition testing (correlate with while, for, and if statements). These are converted into one of the BRANCH instruction variations. As a result, a control hazard develops when the decision to execute one instruction depends on the result of another instruction, such as a conditional branch that examines the condition's resulting value.
Solution for Control Dependency

Branch prediction is the method through which stalls due to control dependency can be eliminated. In this, a prediction is made at the first stage about which branch will be taken. For a correctly predicted branch, the branch penalty is zero.

• Branch penalty: the number of stalls introduced during branch operations in the pipelined processor is known as the branch penalty.

NOTE: If the target address is available after the ID stage, the number of stalls introduced in the pipeline is 1. If the branch target address were instead available only after the ALU stage, there would be 2 stalls. Generally, if the target address is available after the kth stage, then there will be (k – 1) stalls in the pipeline.

• Total number of stalls introduced in the pipeline due to branch instructions = branch frequency × branch penalty

Structural Hazard

Hardware resource conflicts among the instructions in the pipeline cause structural hazards. Memory, a GPR register, or an ALU might all be used as resources here. When more than one instruction in the pipe requires access to the very same resource in the same clock cycle, a resource conflict is said to arise. In an overlapping pipelined execution, this is a circumstance where the hardware cannot handle all potential combinations of instructions.
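The branch-penalty arithmetic from the control-dependency discussion can be sketched as follows (function and variable names are illustrative):

```python
def stalls_per_branch(k):
    """(k - 1) stalls when the target address becomes available
    only after the k-th pipeline stage."""
    return k - 1

def total_branch_stalls(num_instructions, branch_frequency, branch_penalty):
    """Total stalls = (number of branch instructions) * (penalty per branch)."""
    return num_instructions * branch_frequency * branch_penalty

# Target known after ID (stage 2) -> the single stall quoted above.
print(stalls_per_branch(2))  # → 1
# 1000 instructions, 20% branches, 1-cycle penalty each:
print(total_branch_stalls(1000, 0.2, 1))  # → 200.0
```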

Solution

To maximize the rendering speed, allow stages that are not bottlenecks to consume as much time as the bottleneck stage.

Pipeline Optimization

• For a given reservation table, find the current average sampling period (ASP).
• Find the largest number of cycles for which any resource is busy. This is equal to the minimum possible average sampling period (MASP).
• If ASP = MASP, there is nothing to be done.
• Otherwise, try to re-schedule events such that MASP is achieved.

Non-Linear Pipeline

A non-linear pipeline is a pipeline made of different pipelines that are present at different stages. The different pipelines are connected to perform multiple functions. It also has feedback and feed-forward connections. It is made such that it performs various functions at different time intervals. In a non-linear pipeline the functions are dynamically assigned.
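The MASP step can be sketched directly from a reservation table, here encoded (a hypothetical encoding) as one set of busy time-slices per resource row:

```python
def masp(table):
    """Minimum possible average sampling period: the largest number of
    time-slices for which any single resource is busy."""
    return max(len(busy_slots) for busy_slots in table)

# Resource A busy at slices 0 and 3, B at slice 1, C at slice 2 -> MASP is 2,
# because resource A alone needs 2 cycles of every initiation period.
print(masp([{0, 3}, {1}, {2}]))  # → 2
```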

Reservation Table

• A reservation table is a way of representing the task flow pattern of a pipelined system. Each row of the reservation table represents one resource of the pipeline and each column represents one time-slice of the pipeline. All the elements of the table are either 0 or 1. If one resource (say, resource i) is used in a time-slice (say, time-slice j), then the (i, j)-th element of the table will have the entry 1. On the other hand, if a resource is not used in a particular time-slice, then that entry of the table will have the value 0.

For one particular reservation table (the table itself is not reproduced here):

1. Forbidden latencies are: 0, 2, 3, 5
2. Pipeline collision vector is: (101101)
3. Greedy cycle is: (1, 6)*
4. Minimal average latency is: 3.5
5. Throughput is: 0.28
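Since the slide's reservation table is not reproduced, here is a sketch of how forbidden latencies and the collision vector are derived from any table, encoding each row as the set of time-slices in which that resource is busy (the example table is hypothetical, not the slide's):

```python
def forbidden_latencies(table):
    """A latency is forbidden if some resource is busy in two time-slices
    that many cycles apart: two initiations would collide on that resource."""
    forbidden = set()
    for busy_slots in table:
        times = sorted(busy_slots)
        for i in range(len(times)):
            for j in range(i + 1, len(times)):
                forbidden.add(times[j] - times[i])
    return forbidden

def collision_vector(forbidden):
    """Bit string C_m ... C_1, where bit k is 1 iff latency k is forbidden."""
    m = max(forbidden)
    return "".join("1" if k in forbidden else "0" for k in range(m, 0, -1))

# Hypothetical 3-resource table: rows are sets of busy time-slices.
table = [{0, 5}, {1, 4}, {2, 3}]
print(sorted(forbidden_latencies(table)))        # → [1, 3, 5]
print(collision_vector(forbidden_latencies(table)))  # → 10101
```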
Definitions

• Latency: the number of time units (clock cycles) between two initiations of a pipeline.
• Forbidden latency: a latency that causes a collision.
• Permissible latency: a latency that will not cause a collision.
• Latency sequence: a sequence of permissible latencies between successive task initiations.
• Latency cycle: a latency sequence which repeats the same subsequence (cycle) indefinitely.
• Average latency: the average latency of a latency cycle is obtained by dividing the sum of all latencies by the number of latencies along the cycle.
• Collision vector: the combined set of permissible and forbidden latencies can be easily displayed by a collision vector.

• Simple cycle: a latency cycle in which each state appears only once, e.g. (3), (6), (8), (1, 8), (3, 8), and (6, 8).
• Greedy cycle: a simple cycle whose edges are all made with minimum latencies from their respective starting states. Greedy cycles must first be simple, and their average latencies must be lower than those of other simple cycles, e.g. (1, 8) and (3); one of them gives the MAL (minimum average latency).
• MAL (minimum average latency): the minimum average latency obtained from the greedy cycles. Of the greedy cycles (1, 8) and (3), the cycle (3) leads to the MAL value 3.

Cycles: (1, 8), (1, 8, 6, 8), (1, 8, 3, 8), (3), (6), (3, 8), (3, 6, 3)
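The average-latency and MAL computations over latency cycles are straightforward to sketch:

```python
def average_latency(cycle):
    """Average latency of a latency cycle: sum of latencies / their count."""
    return sum(cycle) / len(cycle)

def minimal_average_latency(cycles):
    """MAL: the smallest average latency among the candidate (greedy) cycles."""
    return min(average_latency(c) for c in cycles)

# The greedy cycles above, (1, 8) and (3): averages 4.5 and 3.0, so MAL = 3.
print(minimal_average_latency([(1, 8), (3,)]))  # → 3.0
```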
Instruction Level Parallelism

• Instruction Level Parallelism (ILP) refers to architectures in which multiple operations of a single program can be performed in parallel. It covers the compiler design techniques and processor designs that execute operations, like memory load and store, integer addition, and float multiplication, in parallel to improve the performance of the processor. Examples of architectures that exploit ILP are VLIWs and superscalar architectures.

Compiler Techniques for Exposing ILP

Basic pipeline scheduling and loop unrolling: to avoid a pipeline stall, a dependent instruction must be separated from the source instruction by a distance in clock cycles equal to the pipeline latency of that source instruction.

A compiler's ability to perform this scheduling depends both on the amount of ILP available in the program and on the latencies of the functional units in the pipeline. Throughout this chapter we will assume the FP unit latencies shown in the figure (not reproduced here).

Example

for (i = 1000; i > 0; i = i - 1)
    x[i] = x[i] + s;

How will this loop run when it is scheduled on a simple pipeline for MIPS with these latencies? This code takes 10 clock cycles per iteration.
Compiler Techniques for Exposing ILP contd.

This loop is parallel: the body of each iteration is independent. The first step is to translate the above segment to MIPS assembly language. In the following code segment, R1 is initially the address of the element in the array with the highest address, and F2 contains the scalar value, s. Register R2 is precomputed, so that 8(R2) is the last element to operate on. The straightforward MIPS code, not scheduled for the pipeline, looks like this:

Loop: L.D    F0,0(R1)    ;F0=array element
      ADD.D  F4,F0,F2    ;add scalar in F2
      S.D    F4,0(R1)    ;store result
      DADDUI R1,R1,#-8   ;decrement pointer by 8 bytes (per DW)
      BNE    R1,R2,Loop  ;branch if R1!=R2

Let's start by seeing how well this loop will run when it is scheduled on a simple pipeline for MIPS with the latencies. We can schedule the loop to obtain only one stall:

Loop: L.D    F0,0(R1)
      DADDUI R1,R1,#-8
      ADD.D  F4,F0,F2
      stall
      BNE    R1,R2,Loop  ;delayed branch
      S.D    F4,8(R1)    ;altered & interchanged with DADDUI

Compiler Techniques for Exposing ILP contd.

• Loop unrolling can also be used to improve scheduling. Because it eliminates the branch, it allows instructions from different iterations to be scheduled together. In this case, we can eliminate the data-use stall by creating additional independent instructions within the loop body. If we simply replicated the instructions when we unrolled the loop, the resulting reuse of the same registers could prevent us from effectively scheduling the loop. Thus, we will want to use different registers for each iteration, increasing the required register count.
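As a high-level sketch (Python stands in for the assembly; the distinct temporaries t0..t3 play the role of the distinct registers), unrolling x[i] = x[i] + s by four looks like this:

```python
def add_scalar_unrolled(x, s):
    """Apply x[i] = x[i] + s, unrolled by 4: one branch test per 4 elements,
    with a distinct temporary per unrolled iteration (register renaming)."""
    n = len(x)
    i = 0
    while i + 4 <= n:
        t0 = x[i] + s      # each temporary is independent of the others,
        t1 = x[i + 1] + s  # so a scheduler could interleave these freely
        t2 = x[i + 2] + s
        t3 = x[i + 3] + s
        x[i], x[i + 1], x[i + 2], x[i + 3] = t0, t1, t2, t3
        i += 4
    while i < n:           # clean-up loop for the leftover elements
        x[i] += s
        i += 1
    return x

print(add_scalar_unrolled([1, 2, 3, 4, 5], 10))  # → [11, 12, 13, 14, 15]
```

Had the body reused a single temporary t0 four times, the four computations would be serialized by WAR/WAW dependences, which is exactly the register-reuse problem described above.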