
Pipeline

Course 14
The idea of Pipelining
Traditional Pipeline Concept
• Laundry Example
• Ann, Brian, Cathy, and Dave (A, B, C, D)
each have one load of clothes
to wash, dry, and fold
• Washer takes 30 minutes

• Dryer takes 40 minutes

• “Folder” takes 20 minutes


Traditional Pipeline Concept
[Figure: sequential laundry timeline, 6 PM to Midnight; loads A, B, C, D run back to back, each taking 30 + 40 + 20 minutes]
• Sequential laundry takes 6 hours for 4 loads
• If they learned pipelining, how long would laundry take?
Traditional Pipeline Concept
[Figure: pipelined laundry timeline, 6 PM to Midnight; tasks A, B, C, D in order, with stage times 30, 40, 40, 40, 40, 20 minutes]
• Pipelined laundry takes 3.5 hours for 4 loads
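The stated times can be checked directly from the 30/40/20-minute stage lengths above; the 40-minute dryer, the slowest stage, sets the pipelined rate:

    Sequential: 4 × (30 + 40 + 20) = 360 min = 6 hours
    Pipelined:  30 + 4 × 40 + 20  = 210 min = 3.5 hours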
Traditional Pipeline Concept
[Figure: the same pipelined laundry timeline (tasks A–D in order; 30, 40, 40, 40, 40, 20 minutes)]
• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously using different resources
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to “fill” the pipeline and time to “drain” it reduces speedup
• Stall for dependences
Let’s bring the pipelining concept to the processor
Classical 5-Stage Pipeline
• Instruction Fetch (IF)
• Instruction Decode (ID)
• Execute (EX)
• Memory Access (MEM)
• Write Back (WB)
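To make the overlap concrete, here is a minimal sketch (not from the slides) that prints which stage each instruction occupies in each clock cycle, assuming an ideal pipeline with no hazards:

    # Minimal sketch: ideal 5-stage pipeline overlap, no hazards.
    STAGES = ["IF", "ID", "EX", "MEM", "WB"]

    def pipeline_diagram(num_instructions):
        total_cycles = num_instructions + len(STAGES) - 1
        print("instr " + " ".join(f"c{c + 1:<3}" for c in range(total_cycles)))
        for i in range(num_instructions):
            row = ["    "] * total_cycles
            for s, stage in enumerate(STAGES):
                row[i + s] = f"{stage:<4}"      # instruction i is in stage s at cycle i + s
            print(f"i{i + 1:<5}" + " ".join(row))

    pipeline_diagram(4)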
Pipeline concept in the processor
What makes pipelining easy?
• All instructions take the same number of clock cycles
• No dependencies among the instructions
What makes pipelining challenging?
• Hazards: situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle
• Hazards reduce the performance from the ideal speedup gained by pipelining
• There are three types of hazards: structural hazards, data hazards, and control hazards
Structural Hazards
• They arise from resource conflicts when the hardware cannot support
all possible combinations of instructions in simultaneous overlapped
execution.
• If some combination of instructions cannot be accommodated
because of a resource conflict, the machine is said to have a structural
hazard.
Example of Structural Hazards
Suppose a machine has a single shared memory pipeline for data and instructions. When an instruction contains a data-memory reference (a load), it conflicts with the instruction fetch of a later instruction (instr 3), as sketched below:
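A small model of this conflict (an assumption-level sketch, not the slide's hardware): with one instruction entering the pipeline per cycle and a single shared memory port, a load's MEM stage lands in the same cycle as the instruction fetch three instructions later.

    # Assumes one shared memory port for instruction fetch and data access,
    # and one instruction entering the pipeline per cycle (no other stalls).
    STAGES = ["IF", "ID", "EX", "MEM", "WB"]

    def memory_conflicts(is_load):
        """is_load[i] is True if instruction i reads/writes data memory in MEM."""
        mem_offset = STAGES.index("MEM")            # MEM is reached 3 cycles after IF
        conflicts = []
        for i, load in enumerate(is_load):
            if load and i + mem_offset < len(is_load):
                # instruction (i + mem_offset) does its IF in that same cycle
                conflicts.append((i, i + mem_offset))
        return conflicts

    # A load followed by three ordinary instructions: the load's MEM collides
    # with the IF of instruction 3 (counting from 0).
    print(memory_conflicts([True, False, False, False]))   # -> [(0, 3)]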
Example of Structural Hazards
• [Figures omitted: the detailed per-stage drawing is simplified into a grid of stage abbreviations per clock cycle]
Data Hazards
• They arise when an instruction depends on the result of a previous instruction, but that result has not yet been written back, because the instructions overlap in the pipeline.
• Example of data hazards: [Figure omitted: an ADD that writes R1, followed by a SUB and an AND that read R1 before the ADD has written it back]
• How can we solve this? With stalls again?
• No, because stalling costs extra cycles. There is a better option.
• The answer is…
Forwarding for Data Hazards

• The first forwarding passes the value of R1 from EXadd to EXsub
• The second forwarding also passes the value of R1, from MEMadd to EXand
• This code can now be executed without stalls (a sketch of the forwarding check follows)
• What if we used stalls instead? [Figure omitted: the same code with stall cycles inserted instead of forwarding]
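A minimal sketch of the forwarding test itself (the latch and field names below are illustrative assumptions, not the slides' notation): the value needed in EX is taken from the EX/MEM latch if it was produced one instruction earlier, or from the MEM/WB latch if produced two instructions earlier.

    # Illustrative forwarding check. ex_mem / mem_wb stand for the pipeline
    # latches after the EX and MEM stages; rs, rt are the source registers of
    # the instruction currently in EX.
    def forward_sources(rs, rt, ex_mem, mem_wb):
        """Return where each EX operand comes from: 'EX/MEM', 'MEM/WB', or 'REG'."""
        def pick(src):
            if ex_mem["reg_write"] and ex_mem["rd"] == src and src != 0:
                return "EX/MEM"        # result produced one instruction earlier (the ADD)
            if mem_wb["reg_write"] and mem_wb["rd"] == src and src != 0:
                return "MEM/WB"        # result produced two instructions earlier
            return "REG"               # no hazard: read the register file normally
        return pick(rs), pick(rt)

    # SUB R4,R1,R5 is in EX while ADD R1,R2,R3 has just left EX:
    ex_mem = {"reg_write": True, "rd": 1}
    mem_wb = {"reg_write": False, "rd": 0}
    print(forward_sources(1, 5, ex_mem, mem_wb))   # ('EX/MEM', 'REG')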
Control Hazards
• A control hazard occurs when the pipeline makes a wrong decision on a branch prediction and therefore brings instructions into the pipeline that must subsequently be discarded
• Also called branch hazards, they arise from the pipelining of branches (e.g., an if-then-else structure) and other instructions that change the PC, which takes effect at the end of the MEM stage
• Control hazards can cause a greater performance loss for the DLX pipeline than data hazards. When a branch is executed, it may or may not change the PC (program counter) to something other than its current value plus 4. If a branch changes the PC to its target address, it is a taken branch; if it falls through, it is not taken
• If instruction i is a taken branch, then the PC is normally not changed until the end of the MEM stage
Control Hazards
• The simplest method of dealing with branches is to stall the pipeline as soon as the branch is detected, until we reach the MEM stage, which determines the new PC. The pipeline behavior looks like this: [Figure omitted: pipeline timing of the branch and the stalled instructions that follow it]
• The stall does not occur until after the ID stage (where we know that the instruction is a branch)
• This control-hazard stall must be implemented differently from a data-hazard stall, since the IF cycle of the instruction following the branch must be repeated as soon as we know the branch outcome
• Thus, the first IF cycle is essentially a stall (because it never performs useful work), which brings the total to three stall cycles
Control Hazards
• Three clock cycles wasted for every branch is a significant loss.
• The number of wasted clock cycles can be reduced by two steps:
  - Find out earlier in the pipeline whether the branch is taken or not taken;
  - Compute the taken PC (i.e., the address of the branch target) earlier.
Both steps should be taken as early in the pipeline as possible. To predict earlier in the pipeline whether the branch is taken or not taken, there are several branch prediction schemes.
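One standard way to quantify this loss (a textbook formula, not stated on these slides) is to fold the branch penalty into the pipeline speedup:

    Pipeline speedup = Pipeline depth / (1 + Branch frequency × Branch penalty)

For example, with a 5-stage pipeline, 20% branches, and the 3-cycle stall above: 5 / (1 + 0.2 × 3) ≈ 3.1 instead of the ideal 5.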
Branch prediction Schemes
• Stall pipeline (as shown earlier)
• Predict not taken
• Predict taken
• Delayed branch
Predict not Taken
• A higher-performance, and only slightly more complex, scheme is to predict the branch as not taken, simply allowing the hardware to continue as if the branch were not executed. Care must be taken not to change the machine state until the branch outcome is definitely known.

When the branch is not taken, determined during ID, we have already fetched the fall-through instruction and just continue. If the branch is taken during ID, we restart the fetch at the branch target; this causes the instructions following the branch to stall one clock cycle.
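A minimal sketch of this behaviour (assuming, as on the slide, that the outcome and target are known in ID):

    # Predict-not-taken: keep fetching PC + 4; if a branch in ID turns out to
    # be taken, redirect the fetch and squash the one wrongly fetched instruction.
    def next_fetch(pc, branch_in_id):
        """branch_in_id is None, or {'taken': bool, 'target': int}."""
        if branch_in_id and branch_in_id["taken"]:
            return branch_in_id["target"], True     # redirect; one-cycle bubble
        return pc + 4, False                         # fall through: no penalty

    print(next_fetch(100, None))                            # (104, False)
    print(next_fetch(104, {"taken": True, "target": 200}))  # (200, True)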
Predict Taken
• An alternative scheme is to predict the branch as taken. As soon as
the branch is decoded and the target address is computed, we
assume the branch to be taken and begin fetching and executing at
the target address.
• Because in the DLX pipeline the target address is not known any earlier than the branch outcome, there is no advantage to this approach. In machines where the target address is known before the branch outcome, a predict-taken scheme might make sense.
Delayed Branch
• Delayed branch: the slot after the branch is given to the compiler to reschedule an instruction into
• In a delayed branch, the execution sequence with a branch delay of length n is:

    branch instruction
    sequential successor 1
    sequential successor 2
    ...
    sequential successor n
    branch target if taken

• The sequential successors are in the branch-delay slots. These instructions are executed whether or not the branch is taken.
• Where to get instructions to fill the branch delay slot?
Delayed Branch
• Where to get instructions to fill the branch delay slot?
• From before the branch instruction
• From the target address: only valuable when the branch is taken
• From the fall-through path: only valuable when the branch is not taken
If the compiler cannot find such an instruction, it must insert a nop (no operation) in the slot
From Before the Branch Instruction
• The delay slot is scheduled with an instruction from before the branch
• There is no dependency between the branch and that instruction (the ADD instruction in the picture example)
• Best choice: it always improves performance
From target address
• The delay slot is scheduled from the branch target address (target address L in the picture example)
• There is a dependency between the ADD instruction and the branch BEQZ, so we cannot reschedule the ADD; instead, we reschedule the SUB instruction into the delay slot
• It must be OK to execute that instruction if the branch is not taken
From Fall-Through
• The delay slot is scheduled from the fall-through path
• It must be OK to execute that instruction if the branch is taken (a small selection sketch follows)
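The three choices above can be summarized as a simple selection rule. The sketch below is a simplification with hypothetical helper names, not the slides' compiler:

    # Pick an instruction for a single branch-delay slot, in order of preference.
    def fill_delay_slot(before, target_first, fallthrough_first, safe_on_wrong_path):
        """
        before:             independent instruction from before the branch, or None
        target_first:       first instruction at the branch target, or None
        fallthrough_first:  first instruction on the fall-through path, or None
        safe_on_wrong_path: True if executing the instruction on the other path is harmless
        """
        if before is not None:
            return before                      # best choice: always does useful work
        if target_first is not None and safe_on_wrong_path(target_first):
            return target_first                # useful only when the branch is taken
        if fallthrough_first is not None and safe_on_wrong_path(fallthrough_first):
            return fallthrough_first           # useful only when the branch is not taken
        return "NOP"                           # nothing suitable: insert a no-op

    # Example: an ADD before the branch is independent of it, so it fills the slot.
    print(fill_delay_slot("ADD R1,R2,R3", "SUB R4,R5,R6", "OR R7,R8,R9", lambda i: True))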
