Pipelining
Princeton University
Fall 2015
Pipelining is Natural: Assembly Line!
Laundry Example
• Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
Sequential Laundry
[Figure: time axis from 6 PM to 2 AM in sixteen 30-minute slots; loads A, B, C, D (task order) run one after another]
Sequential laundry takes 8 hours for 4 loads
If they learned pipelining, how long would laundry take?
Pipelined Laundry: Start work ASAP
[Figure: time axis from 6 PM to 2 AM with seven 30-minute slots in use; loads A, B, C, D (task order) overlap in time]
• Pipelined laundry takes 3.5 hours for 4 loads!
Slow Dryers
[Figure: time axis from 6 PM to 2 AM with eleven 30-minute slots in use; loads A, B, C, D (task order) overlap in time]
5.5 Hours. What is going on here?
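The three laundry timelines can be reproduced with a short simulation. This is a minimal sketch, not taken from the slides. It assumes four 30-minute stages per load (inferred from 8 hours for 4 sequential loads), that a finished load can wait between stages, and that the slow dryer takes 60 minutes, which is one setting that reproduces the 5.5-hour figure. The function and variable names are illustrative.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """A load enters a stage once it has left the previous stage
    and the previous load has left this stage."""
    finish = [0] * len(stage_minutes)  # finish time of the latest load, per stage
    for _ in range(loads):
        ready = 0  # time at which this load leaves the previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, finish[s])
            finish[s] = start + minutes
            ready = finish[s]
    return finish[-1]


UNIFORM = [30, 30, 30, 30]      # assumed: four equal 30-minute stages
SLOW_DRYER = [30, 60, 30, 30]   # assumed: the second stage (dryer) takes 60 minutes

print(sequential_minutes(UNIFORM, 4) / 60)    # sequential laundry, in hours
print(pipelined_minutes(UNIFORM, 4) / 60)     # pipelined laundry, in hours
print(pipelined_minutes(SLOW_DRYER, 4) / 60)  # pipelined with a slow dryer, in hours
```

Under these assumptions the slow dryer sets the pace: every load after the first waits for the dryer, so the pipeline advances at the rate of its slowest stage.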
Pipelining Lessons
MIPS
Pipe Stages == The Five Execution Steps
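1. Instruction Fetch (Ifetch)
2. Register Read / Decode (Reg)
3. Execute (Exec)
4. Memory Access (Mem)
5. Write Back (Wr)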
IDEAL?
Can We Pipeline the Multicycle Datapath?
Can We Pipeline the Unicycle Datapath?
[Figure: timing diagram over clock cycles 1 to 10 (Clk), with a row for the unicycle implementation and rows for the pipeline implementation]
Pipeline Implementation:
Load: Ifetch, Reg, Exec, Mem, Wr
Looks good, but…
Store: Ifetch, Reg, Exec, Mem, Wr
Multicycle Machine
10 ns/cycle x 4.6 CPI (inst mix) x 100 inst = 4600 ns
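The multicycle figure is cycle time × CPI × instruction count. The sketch below reproduces that arithmetic and, for comparison, estimates an ideal pipeline. The pipeline numbers are assumptions, not values from the slides: the same 10 ns cycle, five stages, one instruction completing per cycle, and no stalls. The function names are illustrative.

```python
def execution_time_ns(cycle_ns, cpi, instructions):
    """Execution time = cycle time x cycles per instruction x instruction count."""
    return cycle_ns * cpi * instructions


def ideal_pipeline_time_ns(cycle_ns, stages, instructions):
    """Assumed ideal pipeline: one instruction finishes per cycle
    after (stages - 1) cycles spent filling the pipe."""
    return cycle_ns * (instructions + stages - 1)


multicycle_ns = execution_time_ns(10, 4.6, 100)    # the slide's multicycle machine
pipelined_ns = ideal_pipeline_time_ns(10, 5, 100)  # assumption: 5 stages, no stalls

print(multicycle_ns, pipelined_ns)
```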
Unicycle Implementation Detail
[Figure: timing diagram with Clock and Time axes]
3 Stage Pipeline Implementation Detail
[Figure: three combinational logic blocks, each followed by a register (REG), driven by a common Clock; timing diagram of Op1 through Op4 over Time]
Delay = 39 ns
Throughput = 77 MHz
• Space operations 13 ns apart
• 3 operations executing simultaneously
Limitation 1: Nonuniform Pipelining
[Figure: three combinational logic blocks of unequal size, each followed by a register (REG), driven by a common Clock]
Delay = 18 * 3 = 54 ns
Throughput = 55 MHz
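Both sets of numbers follow from one rule: the clock period equals the slowest stage, delay is the period times the number of stages, and throughput is one operation per period. The sketch below applies that rule. The 13 ns and 18 ns periods come from the slides; the individual nonuniform stage delays (8, 18, 13 ns) are hypothetical values chosen only so that the slowest stage is 18 ns.

```python
def pipeline_metrics(stage_delays_ns):
    """Return (delay in ns, throughput in MHz) for a pipeline whose clock
    period is set by its slowest stage (register overhead included)."""
    clock_ns = max(stage_delays_ns)
    delay_ns = clock_ns * len(stage_delays_ns)
    throughput_mhz = 1000.0 / clock_ns
    return delay_ns, throughput_mhz


uniform = pipeline_metrics([13, 13, 13])    # three equal 13 ns stages
nonuniform = pipeline_metrics([8, 18, 13])  # hypothetical split; slowest stage is 18 ns

print(uniform)     # delay 39 ns, throughput a little under 77 MHz
print(nonuniform)  # delay 54 ns, throughput a little over 55 MHz
```

Note that both examples contain the same total stage time (39 ns); only the unequal split makes the second pipeline slower.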
Limitation 2: Deep Pipelines
[Figure: six stages, each a combinational logic block followed by a register (REG), with delays alternating 5 ns and 3 ns]
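The cost of going deeper can be sketched numerically. The example below assumes that the 5 ns figures are combinational logic and the 3 ns figures are register delay, and that a total of 30 ns of logic is split evenly across the stages. These are readings of the figure, not statements from the slides, and `deep_pipeline` is an illustrative name.

```python
def deep_pipeline(total_logic_ns, register_ns, stages):
    """Split the logic evenly across `stages`; each stage also pays one register delay."""
    clock_ns = total_logic_ns / stages + register_ns
    return {
        "clock_ns": clock_ns,
        "delay_ns": clock_ns * stages,
        "throughput_mhz": 1000.0 / clock_ns,
        "register_share": register_ns / clock_ns,  # fraction of each cycle spent in the register
    }


for stages in (1, 3, 6):
    print(stages, deep_pipeline(30, 3, stages))
```

Under these assumptions, doubling the depth from 3 to 6 stages raises throughput by less than a factor of two, because the fixed register delay takes up a growing share of each clock cycle.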
Structural Hazards
• Resource oversubscription
• Suppose we had only one memory
• In laundry, think of a washer/dryer combo unit
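The single-memory case can be made concrete with a small sketch. It assumes the five stages Ifetch, Reg, Exec, Mem, Wr, one instruction fetched per cycle, and no stalls, so instruction i is in its Mem stage during cycle i + 3 (counting from 0). The function name and the boolean encoding of the program are illustrative.

```python
STAGES = ["Ifetch", "Reg", "Exec", "Mem", "Wr"]
MEM_STAGE = STAGES.index("Mem")


def memory_conflicts(uses_data_memory):
    """Return the cycles in which Ifetch and Mem both need the one memory.

    uses_data_memory[i] is True if instruction i is a load or store.
    Instruction i is fetched in cycle i and reaches Mem in cycle i + MEM_STAGE.
    """
    conflicts = []
    for cycle in range(len(uses_data_memory)):  # instruction `cycle` is fetched now
        older = cycle - MEM_STAGE               # instruction currently in Mem, if any
        if older >= 0 and uses_data_memory[older]:
            conflicts.append(cycle)
    return conflicts


# A load followed by four instructions that do not touch data memory.
print(memory_conflicts([True, False, False, False, False]))
```

In this example the load's Mem stage collides with the fetch of the fourth instruction, which is the oversubscription the slide describes.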
Pipeline Hazards
Control Hazards
• What is the next instruction?
• Branch instructions take time to compute this.
Solution 1: Stall
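The cost of stalling on every branch can be estimated with a one-line model. The numbers below (20% branches, a 1-cycle stall per branch, base CPI of 1) are hypothetical and chosen only for illustration; the slides do not give them.

```python
def cpi_with_branch_stalls(base_cpi, branch_fraction, stall_cycles):
    """Average CPI when every branch stalls the pipeline for `stall_cycles` cycles."""
    return base_cpi + branch_fraction * stall_cycles


# Hypothetical workload: 20% branches, 1 stall cycle per branch.
print(cpi_with_branch_stalls(1.0, 0.20, 1))
```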
Pipeline Hazards
Data Hazards
Value from prior instruction is needed before write back
Solution: Bypassing
Summary
• Pipelining is a fundamental concept in computers/nature
• Multiple instructions in flight
• Limited by length of longest stage; latency vs. throughput
• Hazards gum up the works
Real Stuff
• MIPS I instruction set architecture made pipeline visible
(delayed branch, delayed load)
• More performance from deeper pipelines and parallelism, up to a point
• Pentium 4 has 22 pipe stages!
Review: Pipelined Datapath
Review: Pipeline Hazards
Data Hazards
Value from prior instruction is needed before write back
Solution: Bypassing
         scheduled   unscheduled
gcc      31%         54%
spice    14%         42%
tex      25%         65%
Pipeline Control
• Control is divided into 5 stages
• Signal values same as unicycle case!
• Timing is different…
What About Data Hazards?
Forwarding Unit
EX Hazard:
EX/MEM.RegWrite AND EX/MEM.RegisterRd != 0 AND
EX/MEM.RegisterRd == ID/EX.RegisterReadRs(Rt)
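The EX hazard condition translates directly into a boolean check, applied once for Rs and once for Rt. This is a minimal sketch: the dataclasses stand in for the EX/MEM and ID/EX pipeline registers, and their field names are illustrative rather than taken from a specific implementation.

```python
from dataclasses import dataclass


@dataclass
class ExMem:
    reg_write: bool   # EX/MEM.RegWrite
    register_rd: int  # EX/MEM.RegisterRd


@dataclass
class IdEx:
    register_rs: int  # ID/EX.RegisterRs
    register_rt: int  # ID/EX.RegisterRt


def ex_hazard(ex_mem, id_ex):
    """Return (forward_rs, forward_rt): whether each ALU operand should be
    taken from the EX/MEM register instead of the register file."""
    writes_real_reg = ex_mem.reg_write and ex_mem.register_rd != 0
    forward_rs = writes_real_reg and ex_mem.register_rd == id_ex.register_rs
    forward_rt = writes_real_reg and ex_mem.register_rd == id_ex.register_rt
    return forward_rs, forward_rt


# add $1, $2, $3 is in MEM while sub $4, $1, $5 is in EX: forward the Rs operand.
print(ex_hazard(ExMem(reg_write=True, register_rd=1), IdEx(register_rs=1, register_rt=5)))
```

The `RegisterRd != 0` term keeps the unit from forwarding a value destined for register 0, which always reads as zero in MIPS.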
What About Load-Use Stall?
Hazard Detection Unit
Nop is all zeros!!
ID/EX.MemRead AND
(ID/EX.RegisterRt == IF/ID.RegisterRs OR
ID/EX.RegisterRt == IF/ID.RegisterRt)
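The load-use condition can be written as a small predicate. This is a sketch under the same naming as the condition above; the dataclasses are illustrative stand-ins for the ID/EX and IF/ID pipeline registers.

```python
from dataclasses import dataclass


@dataclass
class IdEx:
    mem_read: bool    # ID/EX.MemRead (the instruction in EX is a load)
    register_rt: int  # ID/EX.RegisterRt (the load's destination)


@dataclass
class IfId:
    register_rs: int  # IF/ID.RegisterRs
    register_rt: int  # IF/ID.RegisterRt


def load_use_stall(id_ex, if_id):
    """True when the instruction being decoded reads the register
    that the load currently in EX has not yet fetched from memory."""
    return id_ex.mem_read and (
        id_ex.register_rt == if_id.register_rs
        or id_ex.register_rt == if_id.register_rt
    )


# lw $2, 20($1) is in EX while and $4, $2, $5 is in ID: stall one cycle.
print(load_use_stall(IdEx(mem_read=True, register_rt=2), IfId(register_rs=2, register_rt=5)))
```

When the predicate is true, the decoded instruction is held back and a nop takes its place in EX, which fits the note that a nop is all zeros.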
What About Control Hazards?
(Predict Not-Taken Machine)
Architectural State Change?
Review: Exceptions
Exception
An event that disrupts program execution.
Review: Multicycle Exception Handling
Exceptions in Pipelines
Don’t Forget EPC and Cause!!!
Pipeline Exception Handling
Look at this mess!!!
Precise vs. Imprecise Exceptions
Precise Exceptions
• EPC has value of excepting instruction PC
• Easy for OS to handle
• We have been looking at precise exception machine
Imprecise Exceptions
• Reduce pipeline complexity by putting current PC or
other approximation into EPC
• OS figures it out
Summary