
Pipeline

Until now, we have assumed that there is no overlap in the execution of the basic steps of
successive instructions. A substantial improvement in performance can be achieved by
overlapping the execution of these basic steps, a technique called pipelining.
Example: Suppose that we want to perform the combined multiply and add operation
on a stream of numbers:
Ai * Bi + Ci for i = 1, 2, 3, ..., 7
Each suboperation is implemented in a segment of the pipeline. The suboperations
performed in each segment of the pipeline are as follows:

R1 ← Ai, R2 ← Bi             Input Ai and Bi

R3 ← R1 * R2, R4 ← Ci        Multiply and input Ci

R5 ← R3 + R4                 Add Ci to product

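[Figure: Ai and Bi are loaded into R1 and R2, which feed the multiplier; the product is stored in R3 and Ci in R4, which feed the adder; the sum is stored in R5.]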

Clock pulse   Segment 1      Segment 2            Segment 3
number        R1     R2      R3          R4       R5
1             A1     B1      ---         ---      ---
2             A2     B2      A1 * B1     C1       ---
3             A3     B3      A2 * B2     C2       A1 * B1 + C1
4             A4     B4      A3 * B3     C3       A2 * B2 + C2
5             A5     B5      A4 * B4     C4       A3 * B3 + C3
6             A6     B6      A5 * B5     C5       A4 * B4 + C4
7             A7     B7      A6 * B6     C6       A5 * B5 + C5
8             ---    ---     A7 * B7     C7       A6 * B6 + C6
9             ---    ---     ---         ---      A7 * B7 + C7
Content of registers in pipeline
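The register contents in the table above can be reproduced with a short simulation. The following is a minimal sketch, not part of the original notes: the function name `simulate_pipeline` and the sample input values are assumptions chosen for illustration. It clocks all five registers together on each pulse, as a real pipeline would.

```python
def simulate_pipeline(a, b, c):
    """Clock a 3-segment multiply-add pipeline and record register contents.

    Hypothetical helper for illustration: returns one tuple
    (pulse, R1, R2, R3, R4, R5) per clock pulse; None marks an empty register.
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    history = []
    # n inputs need n + 2 clock pulses to pass through 3 segments.
    for pulse in range(1, n + 3):
        # All registers load on the same pulse, so every new value is
        # computed from the OLD register contents before any is overwritten.
        new_r5 = r3 + r4 if r3 is not None else None      # segment 3: add
        new_r3 = r1 * r2 if r1 is not None else None      # segment 2: multiply
        # Ci enters segment 2 one pulse after Ai and Bi enter segment 1.
        new_r4 = c[pulse - 2] if 2 <= pulse <= n + 1 else None
        new_r1 = a[pulse - 1] if pulse <= n else None     # segment 1: input
        new_r2 = b[pulse - 1] if pulse <= n else None
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        history.append((pulse, r1, r2, r3, r4, r5))
    return history


if __name__ == "__main__":
    # Sample data (assumed values, seven items as in the example).
    A = [1, 2, 3, 4, 5, 6, 7]
    B = [2, 2, 2, 2, 2, 2, 2]
    C = [10, 10, 10, 10, 10, 10, 10]
    for row in simulate_pipeline(A, B, C):
        print(row)
```

With seven inputs the loop runs for nine pulses, matching the nine rows of the table: the first result appears in R5 at pulse 3 and the last at pulse 9.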
Example

1. Suppose a k-segment pipeline with a clock cycle time of tp is used to execute
n tasks.
2. The first task T1 requires a time equal to k tp to complete its operation, since it
passes through all k segments in the pipe.
3. The remaining n - 1 tasks emerge from the pipe at the rate of one task per clock
cycle, so they are completed after a further time equal to (n - 1) tp.
4. Therefore, completing n tasks using a k-segment pipeline requires k + (n - 1)
clock cycles.

SPEEDUP

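The speedup of pipelined processing over equivalent non-pipelined processing is
defined by the ratio of the two execution times. If the non-pipelined unit takes a time tn
to complete each task, n tasks take n tn, while the pipeline takes (k + n - 1) tp, so

$$ S = \frac{n\,t_n}{(k + n - 1)\,t_p} $$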

As the number of tasks increases, n becomes much larger than k - 1, and k + n - 1
approaches the value of n. Under this condition, the speedup becomes

$$ S = \frac{t_n}{t_p} $$
If we assume that the time it takes to process a task is the same in the pipelined and
non-pipelined circuits, we will have tn = k tp. Including this assumption, the speedup reduces to

$$ S = \frac{k\,t_p}{t_p} = k $$
This shows that the theoretical maximum speedup that a pipeline can provide is k,
where k is the number of segments in the pipeline.
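The way the speedup approaches k as n grows can be checked numerically. This is a small sketch under the assumptions stated in the text (tn = k tp unless given otherwise); the function names and the sample values k = 4 and tp = 20 are illustrative only.

```python
def pipeline_cycles(k, n):
    """Clock cycles needed to complete n tasks on a k-segment pipeline."""
    return k + n - 1


def speedup(k, n, tp, tn=None):
    """Speedup S = n*tn / ((k + n - 1)*tp).

    If tn is not given, assume a task takes the same total time in both
    circuits, i.e. tn = k*tp, as in the text.
    """
    if tn is None:
        tn = k * tp
    return (n * tn) / (pipeline_cycles(k, n) * tp)


if __name__ == "__main__":
    # Assumed sample values: 4 segments, 20 ns clock cycle.
    for n in (1, 10, 100, 10_000):
        print(n, pipeline_cycles(4, n), round(speedup(4, n, 20), 3))
```

For a single task the speedup is 1, since the pipeline offers no overlap; for large n the value stays below k but gets arbitrarily close to it.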

For example:

When a finite number of tasks are executed in a pipeline, the space-time diagram
clearly shows the pipeline start-up region, where not all stages are yet in use, and
the pipeline drainage region, composed of stages that have become idle because the last
task has left them. If a pipeline executes many tasks, the overhead of start-up and
drainage can be ignored and the effective throughput of the pipeline taken to be 1 task per
cycle.
In terms of instruction execution, this corresponds to an effective CPI (cycles per
instruction) of 1. If the 5-stage pipeline has to be drained after executing 7 instructions
(as in Figure 2.b), then 7 instructions are executed in 11 cycles, making the effective CPI
equal to 11/7 = 1.57.
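The effective CPI figure follows directly from the k + n - 1 cycle count. A minimal sketch (the function name `effective_cpi` is an assumption for illustration):

```python
def effective_cpi(stages, instructions):
    """Effective cycles per instruction when the pipeline is filled once
    and drained once around a run of `instructions` instructions."""
    cycles = stages + instructions - 1   # start-up plus one cycle per remaining task
    return cycles / instructions


if __name__ == "__main__":
    print(round(effective_cpi(5, 7), 2))        # the 5-stage, 7-instruction case
    print(round(effective_cpi(5, 100_000), 4))  # long runs approach a CPI of 1
```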

For example:

There are 4 segments with delays t1 = 60 ns, t2 = 70 ns, t3 = 100 ns and t4 = 80 ns, and the
interface registers have a delay of tr = 10 ns.

Using pipeline:
The clock cycle is chosen to be tp = t3 + tr = 100 + 10 = 110 ns, since t3 is the slowest segment.
Non-pipeline:
tn = t1 + t2 + t3 + t4 + tr = 60 + 70 + 100 + 80 + 10 = 320 ns
The pipelined adder has a speedup of 320/110 = 2.9
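The same arithmetic can be written as a small helper. This is only a sketch of the calculation above; the function name `clock_and_speedup` is hypothetical.

```python
def clock_and_speedup(segment_delays_ns, register_delay_ns):
    """Return (tp, tn, speedup) for a pipeline with the given segment delays.

    tp: clock cycle, set by the slowest segment plus the register delay.
    tn: non-pipelined time per task, all segment delays plus one register delay.
    """
    tp = max(segment_delays_ns) + register_delay_ns
    tn = sum(segment_delays_ns) + register_delay_ns
    return tp, tn, tn / tp


if __name__ == "__main__":
    tp, tn, s = clock_and_speedup([60, 70, 100, 80], 10)
    print(tp, tn, round(s, 1))
```

Note that the speedup is well below the segment count of 4 because the segments are unbalanced: every segment is clocked at the pace of the 100 ns one.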

Pipeline conflicts:

There are 3 major difficulties that cause the instruction pipeline to deviate from its
normal operation.

1. Resource conflict
2. Data dependency
3. Branch difficulties

Resource conflict:

Resource conflicts are caused by access to memory by 2 segments at the same time. Most of
these conflicts can be resolved by using separate instruction and data memories.
Data dependency:

Data dependency conflicts arise when an instruction depends on the result of a previous
instruction, but this result is not yet available.
Branch difficulties:

Branch difficulties arise from branch and other instructions that change the value
of the PC.
Example:

Cycle            1    2    3    4    5    6    7    8    9    10   11
Instr 1          FI   DA   FO   EX
Instr 2               FI   DA   FO   EX
Branch Instr 3             FI   DA   FO   EX
Instr 4                              -    -    FI   DA   FO   EX
Instr 5                              -    -    -    FI   DA   FO   EX

Hazard:
In a pipelined system, instruction executions are overlapped. This means that some of the
operations required for instructions I(k+1), I(k+2), ... may be started and completed before
instruction I(k) is completed. This difference from sequential execution can cause problems if
it is not properly considered in the design of the control.

The existence of such dependencies causes what is called a “hazard”. Hazards must be
detected and resolved so that the results produced by the machine match the
programmer’s expectations.

The hardware technique that detects and resolves hazards is called an interlock.
A hazard occurs whenever an object within the system (e.g. a register, flag or memory
location) is accessed or modified by two separate instructions that are close enough in the
program that they will be active simultaneously in the pipeline. In computer
architecture, a hazard is a potential problem that can happen in a pipelined processor.

Types of hazards:

1. Instruction hazard
2. Data hazard
3. Branching hazards
4. Structural hazards

1. INSTRUCTION HAZARDS:

An instruction RAW hazard occurs when the instruction to be initiated in the pipeline is
fetched from a location that is yet to be updated by some uncompleted instruction in the
pipeline. The instruction initiation must be suspended until the required change has occurred.

To handle this hazard, a centralized controller is required to keep track of the addresses in
the range sets of the instructions inside the pipeline. On every instruction fetch, the PC must
be compared for a possible match with the addresses in the range sets held by the
subsequent stages. A match with any of these addresses means there is an instruction RAW
hazard, and the instruction fetch must be suspended until the instruction updating the object
moves past the point of hazard in the pipeline.
Types of hazards:

1. RAW (read after write) or true dependency hazard
2. WAR (write after read) or anti-dependency hazard
3. WAW (write after write) or output dependency hazard

We demonstrate the 3 hazards with the help of a sample program:

Instruction no.   Code

1. STORE R4,A
2. SUB R3,A
3. STORE R3,A
4. ADD R3,A
5. STORE R3,A
RAW hazard:

In the above code, the second instruction must use the value of A updated by the first
instruction. If the second instruction (SUB) reads the value of A before instruction 1 has had a
chance to update it, a wrong value will be used by the CPU. This situation is called a
RAW hazard.

WAR hazard:

A WAR hazard between 2 instructions i and j occurs when instruction j attempts to
write onto some object that is being read by instruction i.

A WAR hazard exists between instructions 2 and 3, since an attempt by instruction 3 to
record a value in A before instruction 2 has read the old value is clearly wrong.
WAW hazard:

A WAW hazard between 2 instructions i and j occurs when instruction j
attempts to write onto some object that is also required to be modified by
instruction i.

Similarly, a WAW hazard occurs between instructions 3 and 5, since an attempt by instruction
5 to store before the store of instruction 3 is clearly incorrect.

2. DATA HAZARDS:

Data hazards occur when data is modified. Ignoring potential data hazards can result in race
conditions. There are 3 situations in which a data hazard can occur:

1. RAW (read after write).
An operand is modified and read soon after.

Example:

Instr 1: R3 ← R1 + R2
Instr 2: R5 ← R3 + R2

The first instruction calculates a value to be saved in register R3, and the second uses
this value to compute a result for register R5. If the second instruction reads R3 before the
first has written it, it will use a stale value.

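2. WAR (write after read)
An operand is read and written soon after.

Example:

Instr 1: R1 ← R2 + R3
Instr 2: R3 ← R4 + R5
We must ensure that the second instruction does not store its result in register R3 before
the first instruction has had a chance to fetch its operands.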

3. WAW (write after write)
Two instructions that write the same operand are performed.

For example:

Instr -1: R2←R1+R3
Instr -2: R2←R4+R5

We must delay the WB (write back) of Instr 2 until Instr 1 has completed its own write.
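The three cases above can be told apart by comparing which registers each instruction reads and writes. The following is a minimal sketch, assuming each instruction is described by a set of registers read and a set of registers written; the helper names `instr` and `classify_hazards` are hypothetical.

```python
def instr(dest, *sources):
    """Describe an instruction 'dest <- f(sources)' by its read and write sets."""
    return {"reads": set(sources), "writes": {dest}}


def classify_hazards(first, second):
    """Return the potential data hazards between two instructions,
    where `first` precedes `second` in program order."""
    hazards = set()
    if first["writes"] & second["reads"]:
        hazards.add("RAW")   # second reads what first writes
    if first["reads"] & second["writes"]:
        hazards.add("WAR")   # second writes what first reads
    if first["writes"] & second["writes"]:
        hazards.add("WAW")   # both write the same operand
    return hazards


if __name__ == "__main__":
    # The three register examples from the text.
    print(classify_hazards(instr("R3", "R1", "R2"), instr("R5", "R3", "R2")))
    print(classify_hazards(instr("R1", "R2", "R3"), instr("R3", "R4", "R5")))
    print(classify_hazards(instr("R2", "R1", "R3"), instr("R2", "R4", "R5")))
```

This only reports that a hazard is possible; whether it actually causes a wrong result depends on how far apart the two instructions are in the pipeline.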
Eliminating hazards:

There are several established techniques for preventing hazards:

1. Bubbling the pipeline (also called a pipeline break or pipeline stall)
2. Eliminating data hazards (data forwarding)

Pipeline stall (bubbles):

Data dependency in pipelines (execution of one instruction depending on completion of a
previous instruction) can cause pipeline stalls, which diminish performance.

As each instruction is fetched, control logic determines whether a hazard could or will occur. If
so, the control logic inserts NOPs into the pipeline. Thus, before the next instruction
is executed, the previous one will have had sufficient time to complete, preventing the hazard.

Bubble insertion

The first solution is for the assembler to detect this type of data dependency and insert
three redundant but harmless instructions (such as adding 0 to a register or shifting a
register by 0 bits). They perform no useful work and just take up memory locations to space
out the data-dependent instructions.

We say that the assembler inserts three bubbles in the pipeline to resolve a read-after-
compute data dependency. Actually, inserting two bubbles might suffice if we note that
writing into and reading out of a register each take 1 ns.

Insertion of bubbles in a pipeline implies reduced throughput; inserting three
bubbles hurts performance more than inserting two. Therefore, an
objective of pipeline designers is to minimize the instances of bubble insertion. An
intelligent compiler can also help mitigate the performance penalty.
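Bubble insertion by an assembler can be sketched as a pass over the program that pads each read-after-compute dependency with NOPs. This is an illustrative sketch only: the tuple layout `(op, dest, sources)`, the register names, and the `gap` parameter (the number of bubbles required between a producer and its consumer) are assumptions, not a description of any real assembler.

```python
NOP = ("nop", None, ())


def insert_bubbles(program, gap=3):
    """Pad a program with NOPs so that every instruction reading a register
    is separated from the latest writer of that register by at least `gap`
    instruction slots.

    program: list of (op, dest, sources) tuples in program order.
    """
    out = []
    last_write = {}  # register -> position in `out` of its latest writer
    for op, dest, sources in program:
        # Earliest slot at which every source register is safe to read.
        ready = max(
            (last_write[s] + gap + 1 for s in sources if s in last_write),
            default=0,
        )
        while len(out) < ready:
            out.append(NOP)          # one bubble per missing slot
        if dest is not None:
            last_write[dest] = len(out)
        out.append((op, dest, sources))
    return out


if __name__ == "__main__":
    # Hypothetical two-instruction program with a dependency on $8.
    prog = [("add", "$8", ("$1", "$2")), ("sub", "$9", ("$8", "$3"))]
    for line in insert_bubbles(prog, gap=3):
        print(line)
```

Calling it with `gap=2` models the observation that two bubbles may suffice. Independent instructions that already sit between the producer and the consumer count towards the gap, which is how a compiler that reorders code can reduce the number of bubbles.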

A read-after-load data dependency is shown in Figure 4, where the third
instruction uses the value that the second instruction loads into register $8. Without data
forwarding, the third instruction must read register $8 in cycle 7, one cycle after the
second instruction completes the load process. This implies the insertion of three bubbles
into the pipeline to avoid using the wrong value in the third instruction. Note that data
forwarding alone cannot solve the problem in this case. The value needed by the ALU in
the third instruction becomes available at the end of cycle 5; so even if a data forwarding
mechanism is available, we still need one bubble to ensure correct operation.

sw: store word; lw: load word
Data Forwarding:

In Figure 3, the second instruction writes into register $8, which the third instruction
reads, and the fourth instruction needs the result of the third instruction in register $9.
Note that writing into register $8 is completed in cycle 6; hence, reading
of the new value from register $8 is possible beginning with cycle 7. The third
instruction, however, reads out registers $8 and $2 in cycle 4 and will thus not get the
intended value for register $8. This data dependency problem can be solved by bubble
insertion or via data forwarding.

Note that in Figure 3, even though the result of the second instruction is not yet stored in
register $8 by the time the third instruction needs it, the result is in fact available at the
output of the ALU. Thus, if a bypass path is provided from the output of the ALU to one
of its inputs, the third instruction can use the result without waiting for it to be written
to the register. This approach is known as data forwarding.
Forwarding involves feeding output data into a previous stage of a pipeline. For instance, let’s
say we want to write the value 3 to register 1, and then add 7 to register 1 and store the result
in register 2, i.e.

Instr 0: register 1 = 6
Instr 1: register 1 = 3
Instr 2: register 2 = register 1 + 7 = 10

Following execution, register 2 should contain the value 10. However, if Instr 1 does
not completely exit the pipeline before Instr 2 starts execution, register 1
does not yet contain the value 3 when Instr 2 performs its addition. In such an event, Instr 2 adds 7
to the old value of register 1 (6), and so register 2 would contain 13.
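The difference between the two outcomes can be shown with a toy model of this three-instruction sequence. This is a deliberately simplified sketch, not a pipeline simulator: the function name `run` and the single `pending` result standing in for a value that has been computed but not yet written back are assumptions for illustration.

```python
def run(forwarding):
    """Execute the three-instruction example and return register 2.

    Instr 1's result is modelled as 'pending': computed, but not yet
    written to the register file when Instr 2 fetches its operand.
    """
    regs = {1: 6, 2: 0}     # Instr 0 has completed: register 1 = 6
    pending = (1, 3)        # Instr 1: register 1 = 3, still in the pipeline

    # Instr 2 fetches register 1 while Instr 1 has not yet written back.
    dest, value = pending
    if forwarding and dest == 1:
        operand = value     # bypass path: take the result straight from Instr 1
    else:
        operand = regs[1]   # stale value read from the register file
    result = operand + 7

    # Both instructions then write back in program order.
    regs[dest] = value
    regs[2] = result
    return regs[2]


if __name__ == "__main__":
    print("without forwarding:", run(False))
    print("with forwarding:   ", run(True))
```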

Operand hazard:

Logic to detect hazards in operand fetches can be worked out as follows. From the
definition of range and domain sets (the domain set holds the addresses an instruction reads;
the range set holds the addresses it writes), we can devise a mechanism to keep track of the domain
and range sets of the various instructions passing through the pipeline stages. Each stage is
associated with one set of storage registers for the domain set addresses and another for the
range set addresses. This storage is required for all the stages beyond the decode stage. When an
instruction moves from stage to stage, it carries its range and domain set information with it.

Let k be the stage beyond which hazards can occur. For simplicity, assume that there is only
one element in each domain set and each range set. Let Dk and Rk be the domain and range
address registers associated with stage k.

If a stage j needs to detect any of the 3 hazards, it must compare its Rj and Dj with the Ri and
Di of the stages i ahead of it.
The comparators are connected as follows:

1. Stage j detects a RAW hazard by comparing Dj and Ri for all i > j > k
2. Stage j detects a WAR hazard by comparing Rj and Di for all i > j > k
3. Stage j detects a WAW hazard by comparing Rj and Ri for all i > j > k
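The three comparator rules can be expressed as a loop over the stages ahead of stage j. The sketch below assumes, as in the text, one domain address and one range address per stage; the dictionary layout, the function name `detect_hazards`, and the sample addresses are hypothetical, and every stage in the dictionary is assumed to lie beyond stage k.

```python
def detect_hazards(stages, j):
    """Compare stage j against every later stage i > j.

    stages: dict mapping stage index -> (D, R), the domain (read) and
            range (write) address held by the instruction in that stage,
            with None for an unused register.
    Returns a list of (hazard_kind, i) pairs.
    """
    dj, rj = stages[j]
    found = []
    for i in sorted(stages):
        if i <= j:
            continue                      # only stages ahead of j are compared
        di, ri = stages[i]
        if dj is not None and dj == ri:
            found.append(("RAW", i))      # j reads what stage i will write
        if rj is not None and rj == di:
            found.append(("WAR", i))      # j writes what stage i reads
        if rj is not None and rj == ri:
            found.append(("WAW", i))      # j and stage i write the same address
    return found


if __name__ == "__main__":
    # Assumed snapshot: stage index -> (domain address, range address).
    snapshot = {3: ("A", "B"), 4: ("C", "A"), 5: ("B", "B")}
    print(detect_hazards(snapshot, 3))
```

In hardware these comparisons happen in parallel, one comparator per pair of registers, rather than in a loop.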