Lecture 14: Pipelining
Course 14
The idea of Pipelining
Traditional Pipeline Concept
• Laundry Example
• Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes
to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• Folding takes 20 minutes
• Sequential laundry takes 6 hours for 4 loads
• If they learned pipelining, how long would laundry take?
[Figure: sequential timeline for loads A to D; each load takes 30, 40, then 20 minutes]
Traditional Pipeline Concept
• Pipelined laundry takes 3.5 hours for 4 loads
[Figure: pipelined timeline from 6 PM to midnight, tasks A to D in task order; stage times 30, 40, 40, 40, 40, 20 minutes]
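The laundry numbers above can be checked with a short sketch. It assumes the stage times of 30, 40, and 20 minutes from the timeline, and it assumes a finished load can wait between stages until the next machine is free; the function names are illustrative only.

```python
def sequential_time(stage_minutes, loads):
    """Total minutes when each load finishes every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_time(stage_minutes, loads):
    """Total minutes when a load enters a stage as soon as both the load and the stage are free."""
    # finish[s] holds the time at which stage s finished its most recent load.
    finish = [0] * len(stage_minutes)
    for _ in range(loads):
        ready = 0  # time at which the current load leaves the previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, finish[s])
            finish[s] = start + minutes
            ready = finish[s]
    return finish[-1]


stages = [30, 40, 20]  # wash, dry, fold
print(sequential_time(stages, 4) / 60)  # 6.0 hours
print(pipelined_time(stages, 4) / 60)   # 3.5 hours
```

The pipelined total is 30 + 4 × 40 + 20 = 210 minutes: the 40-minute dryer is the slowest stage, so it sets the rate at which loads complete.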
Traditional Pipeline Concept
• Pipelining doesn’t help the latency of a single task; it helps the
throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously using different resources
• Potential speedup = number of pipeline stages
• Unbalanced lengths of pipeline stages reduce speedup
• Time to “fill” the pipeline and time to “drain” it reduce speedup
• Stalls for dependences
[Figure: pipelined laundry timeline, tasks A to D in task order; stage times 30, 40, 40, 40, 40, 20 minutes]
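The effect of fill and drain time on speedup can be illustrated with a minimal sketch. It assumes perfectly balanced stages of one time unit each and no stalls, which is an idealization; `pipeline_speedup` is a hypothetical helper name.

```python
def pipeline_speedup(num_stages, num_tasks):
    """Speedup over unpipelined execution, assuming every stage takes one equal time unit."""
    unpipelined = num_stages * num_tasks
    # The first task needs num_stages units to pass through the pipeline;
    # each later task then completes one unit after the previous one.
    pipelined = num_stages + (num_tasks - 1)
    return unpipelined / pipelined


for n in (1, 4, 100, 10_000):
    print(n, round(pipeline_speedup(5, n), 2))
```

With only a few tasks the fill and drain overhead dominates; as the number of tasks grows, the speedup approaches the number of stages but never reaches it.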
Let’s bring the pipelining concept to the
Processor
Classical 5-Stage Pipeline
• Instruction Fetch (IF)
• Instruction Decode (ID)
• Execute (EX)
• Memory Access (MEM)
• Write Back (WB)
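How the five stages overlap across instructions can be sketched as a small diagram generator. This assumes an ideal pipeline with no hazards, where one instruction enters IF every cycle; the function name is illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions):
    """Return one row per instruction; row[c] is the stage occupied in cycle c, or ''."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i is in stage s during cycle i + s
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_diagram(4), start=1):
    print(f"instr {i}: " + " ".join(f"{cell:<3}" for cell in row))
```

Each row is shifted one cycle to the right of the previous one, so in a full pipeline every stage is busy with a different instruction.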
Pipeline concept in the processor
What makes pipelining easy?
• All instructions take the same number of clock cycles
• No dependencies among the instructions
What makes pipelining challenging?
• Hazards: situations that prevent the next instruction in the
instruction stream from executing during its designated clock
cycle
• Hazards reduce the performance from the ideal speedup gained by
pipelining
• There are 3 types of hazards: structural hazards, data hazards, and
control hazards
Structural Hazards
• They arise from resource conflicts when the hardware cannot support
all possible combinations of instructions in simultaneous overlapped
execution.
• If some combination of instructions cannot be accommodated
because of a resource conflict, the machine is said to have a structural
hazard.
Example of Structural Hazards
A machine has a single memory shared between data and instructions. As a result, when an instruction contains a
data-memory reference (load), it conflicts with the instruction fetch of a later instruction (instr 3).
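The conflict described above can be located mechanically. The following is a minimal sketch, assuming a single memory port, the classical five stages, one instruction issued per cycle, and 0-based cycle and instruction indices; `memory_conflicts` and the opcode strings are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
MEMORY_OPS = {"load", "store"}  # instructions that access data memory in MEM


def memory_conflicts(ops):
    """Return (cycle, memory_instr, fetching_instr) for each clash on the single memory port.

    Instruction i is in stage s during cycle i + s (both 0-based).
    """
    mem = STAGES.index("MEM")
    conflicts = []
    for i, op in enumerate(ops):
        if op not in MEMORY_OPS:
            continue
        cycle = i + mem   # cycle in which instruction i accesses data memory
        fetcher = cycle   # the instruction whose IF falls in that same cycle
        if fetcher < len(ops):
            conflicts.append((cycle, i, fetcher))
    return conflicts


print(memory_conflicts(["load", "add", "sub", "and", "or"]))  # [(3, 0, 3)]
```

With the load as instruction 0, its MEM stage and the IF stage of instruction 3 both need the memory in cycle 3, so one of them has to wait.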
Example of Structural Hazards
• To simplify the picture, from this:
• To this:
Data Hazards
• They arise when an instruction depends on the result of a previous
instruction, but that result has not yet been written back when the
dependent instruction needs it, because the instructions overlap
in the pipeline.
• Example of Data Hazards :
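As a hypothetical illustration (not the slide's original example), the sketch below flags read-after-write dependences that are too close together. It assumes no forwarding, and that the register file is written in the first half of WB and read in the second half of ID, so only the two instructions right after the producer see a stale value; the `window` parameter encodes that assumption.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]      # register written, if any
    srcs: Tuple[str, ...]    # registers read


def raw_hazards(program: List[Instr], window: int = 2) -> List[Tuple[int, int, str]]:
    """Return (producer, consumer, register) for reads that come too soon after a write.

    window is the assumed number of following instructions that would read a stale value.
    """
    hazards = []
    for i, producer in enumerate(program):
        if producer.dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if producer.dest in program[j].srcs:
                hazards.append((i, j, producer.dest))
    return hazards


program = [
    Instr("ADD R1, R2, R3", "R1", ("R2", "R3")),
    Instr("SUB R4, R1, R5", "R4", ("R1", "R5")),
    Instr("AND R6, R1, R7", "R6", ("R1", "R7")),
    Instr("OR  R8, R1, R9", "R8", ("R1", "R9")),
]
for producer, consumer, reg in raw_hazards(program):
    print(f"{program[consumer].text} reads {reg} before {program[producer].text} writes it back")
```

Here SUB and AND are flagged, while OR is far enough behind ADD to read the updated R1 under the stated assumptions.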
Control Hazards
• The stall does not occur until after the ID stage (where we know that the
instruction is a branch).
• This control hazard stall must be implemented differently from a data
hazard stall, since the IF cycle of the instruction following the branch must
be repeated as soon as we know the branch outcome.
• Thus, the first IF cycle is essentially a stall (because it never performs
useful work), which brings the total to 3 stall cycles.
Control Hazards
• Three clock cycles wasted for every branch is a significant loss.
• The number of wasted clock cycles can be reduced in two steps:
- Find out earlier in the pipeline whether the branch is taken or not taken;
- Compute the taken PC (i.e., the address of the branch target) earlier.
Both steps should be done as early in the pipeline as possible. To decide
earlier whether the branch is taken or not taken, several branch
prediction schemes are available.
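To see why the branch penalty matters, the sketch below estimates its effect on cycles per instruction (CPI) and on speedup. It assumes an ideal CPI of 1, balanced stages, and branches as the only source of stalls; the 20% branch fraction is a made-up figure for illustration, and both function names are hypothetical.

```python
def effective_cpi(branch_fraction, branch_penalty, base_cpi=1.0):
    """Average cycles per instruction when every branch costs branch_penalty stall cycles."""
    return base_cpi + branch_fraction * branch_penalty


def speedup_with_stalls(depth, branch_fraction, branch_penalty):
    """Speedup over an unpipelined machine, assuming the ideal speedup equals the depth."""
    return depth / effective_cpi(branch_fraction, branch_penalty)


BRANCH_FRACTION = 0.2  # assumed share of branches in the instruction mix

for penalty in (3, 1):
    cpi = effective_cpi(BRANCH_FRACTION, penalty)
    speedup = speedup_with_stalls(5, BRANCH_FRACTION, penalty)
    print(f"penalty={penalty}: CPI={cpi:.2f}, speedup={speedup:.2f}")
```

Under these assumptions, cutting the penalty from three cycles to one recovers a large part of the ideal five-fold speedup.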
Branch prediction Schemes
• Stall pipeline (as shown earlier)
• Predict not taken
• Predict taken
• Delayed branch
Predict not Taken
• A higher performance, and only slightly more complex, scheme is to
predict the branch as not taken, simply allowing the hardware
to continue as if the branch were not executed. Care must be taken
not to change the machine state until the branch outcome is
definitely known.
When the branch is determined during ID to be not taken, we have already fetched the fall-through instruction and simply continue. If the branch is
determined during ID to be taken, we restart the fetch at the branch target. This causes all instructions following the branch to stall one clock cycle.
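The cost of the predict-not-taken scheme can be sketched by counting cycles over an instruction trace. This assumes a five-stage pipeline, a one-cycle penalty for each taken branch as described above, and no other stalls; the trace labels and the function name are hypothetical.

```python
def predict_not_taken_cycles(trace, depth=5, taken_penalty=1):
    """Total cycles to run trace under predict-not-taken.

    trace is a list of "op", "branch_not_taken", or "branch_taken" labels.
    """
    if not trace:
        return 0
    # Without stalls, the last instruction leaves the pipeline at depth + n - 1.
    ideal = depth + len(trace) - 1
    # Each taken branch forces the fetch to restart at the target, costing one cycle.
    stalls = taken_penalty * sum(1 for kind in trace if kind == "branch_taken")
    return ideal + stalls


trace = ["op", "branch_not_taken", "op", "branch_taken", "op"]
print(predict_not_taken_cycles(trace))  # 10
```

Correctly predicted (not taken) branches cost nothing extra, so only the taken branch adds a cycle to the ideal total of 9.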
Predict Taken
• An alternative scheme is to predict the branch as taken. As soon as
the branch is decoded and the target address is computed, we
assume the branch to be taken and begin fetching and executing at
the target address.
• Because in the DLX pipeline the target address is not known any earlier
than the branch outcome, there is no advantage in this approach. In
some machines where the target address is known before the branch
outcome, a predict-taken scheme might make sense.
Delayed Branch
• Delayed branch: provides a delay slot into which an instruction from the sequence can be rescheduled
• In a delayed branch, the execution cycle with a branch delay of length
n is :