
Pipelining

Speeding up through pipelining


▪ Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, fold, and put away
• Washer takes 30 minutes
• Dryer takes 30 minutes
• “Folder” takes 30 minutes
• “Stasher” takes 30 minutes to put clothes into drawers
Sequential Laundry

[Figure: timeline from 6 PM to 2 AM in sixteen 30-minute slots, tasks A to D in task order; each load finishes all four 30-minute steps before the next load starts]

▪ Sequential laundry takes 8 hours for 4 loads
▪ If they learned pipelining, how long would laundry take?
Pipelined Laundry: Start work ASAP
[Figure: timeline starting at 6 PM in 30-minute slots, tasks A to D in task order; each load starts 30 minutes after the previous one, for seven slots in total]

Pipelined laundry takes 3.5 hours for 4 loads!
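
The 8-hour and 3.5-hour totals can be reproduced with a little arithmetic. The sketch below is an illustration only and is not part of the original slides; the function names are hypothetical, and it assumes every stage takes the same 30 minutes and that a load moves on as soon as the next stage is free.

```python
STAGE_MINUTES = 30  # each step takes 30 minutes (from the slide)
NUM_STAGES = 4      # washer, dryer, folder, stasher
NUM_LOADS = 4       # Ann, Brian, Cathy, Dave


def sequential_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """Each load finishes every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """The first load needs `stages` slots; each later load finishes one slot later."""
    return (stages + loads - 1) * stage_minutes


seq = sequential_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES)
pipe = pipelined_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES)
print(f"sequential: {seq / 60} hours")  # sequential: 8.0 hours
print(f"pipelined:  {pipe / 60} hours")  # pipelined:  3.5 hours
```
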
Pipelining Lessons
▪ Pipelining doesn’t help latency of a single task; it helps throughput of the entire workload
▪ Multiple tasks operate simultaneously using different resources
▪ Potential speedup = number of pipe stages
▪ Pipeline rate is limited by the slowest pipeline stage
▪ Unbalanced lengths of pipe stages reduce speedup
▪ Time to “fill” the pipeline and time to “drain” it reduce speedup

[Figure: pipelined laundry timeline from 6 PM in seven 30-minute slots, tasks A to D in task order]
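
The last four lessons can be checked numerically. This is a rough sketch under stated assumptions, not something taken from the slides: the `speedup` helper is hypothetical, the pipeline is assumed to advance in lockstep (every slot lasts as long as the slowest stage), and the 60-minute dryer is an invented example of an unbalanced stage.

```python
def speedup(stage_minutes: list[int], loads: int) -> float:
    """Sequential time divided by pipelined time for `loads` tasks.

    Assumes a lockstep pipeline: every slot lasts as long as the slowest stage.
    """
    sequential = loads * sum(stage_minutes)
    slot = max(stage_minutes)
    pipelined = (len(stage_minutes) + loads - 1) * slot
    return sequential / pipelined


balanced = [30, 30, 30, 30]
unbalanced = [30, 60, 30, 30]  # hypothetical: a dryer that takes 60 minutes

print(round(speedup(balanced, 4), 2))       # 2.29 (fill and drain time dominate)
print(round(speedup(balanced, 1000), 2))    # 3.99 (approaches the number of stages)
print(round(speedup(unbalanced, 1000), 2))  # 2.49 (limited by the slowest stage)
```

With only four loads the fill and drain time keeps the speedup well below 4; with many loads it approaches the number of stages, unless one stage is slower than the rest.
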
MIPS Datapath
▪ Datapath contains 5 stages
▪ Instruction fetch (IF), Decode (ID), Execute (EX), Memory (Mem), Writeback (W)

[Figure: MIPS datapath with PC, Instruction Memory, Registers, ALU, and Data Memory, labelled Stage 1 (IF), Stage 2 (ID), Stage 3 (EX), Stage 4 (Mem), and Stage 5 (W)]

▪ Can I pipeline the MIPS stages?



Pipelining Instructions

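Stage times:
• Fetch = 200 ps
• Decode = 100 ps
• Execute = 200 ps
• Memory = 200 ps
• Write back = 100 ps

[Figure: six instructions, each passing through IF, ID, EX, M, W, with each instruction starting one cycle after the previous one; horizontal axis is time (in cycles), vertical axis is instruction]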

What is the latency for this pipeline?
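
One way to work out the answer from the stage times is sketched below. It assumes the pipelined clock period is set by the slowest stage and that every instruction spends exactly one cycle in each of the five stages; the variable names are illustrative only.

```python
# Stage times in picoseconds, as listed on the slide.
STAGE_PS = {
    "Fetch": 200,
    "Decode": 100,
    "Execute": 200,
    "Memory": 200,
    "Write back": 100,
}

# Assumption: the clock must be long enough for the slowest stage.
clock_ps = max(STAGE_PS.values())

# Each instruction occupies one full clock cycle per stage.
pipelined_latency_ps = len(STAGE_PS) * clock_ps

# Without pipelining, an instruction only needs the sum of the stage times.
unpipelined_latency_ps = sum(STAGE_PS.values())

print(clock_ps)                # 200
print(pipelined_latency_ps)    # 1000
print(unpipelined_latency_ps)  # 800
```

Under these assumptions the latency of a single instruction grows from 800 ps to 1000 ps, which matches the earlier lesson that pipelining helps throughput rather than latency.
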


Pipeline Performance

Single-cycle (Tc = 800 ps)

Pipelined (Tc = 200 ps)
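
To see what the two clock periods mean for a whole program, the sketch below compares total execution time for n instructions. It is an idealized estimate, assuming a five-stage pipeline that never stalls; the helper names are hypothetical.

```python
SINGLE_CYCLE_TC_PS = 800  # from the slide
PIPELINED_TC_PS = 200     # from the slide
NUM_STAGES = 5            # IF, ID, EX, Mem, W


def single_cycle_time_ps(n: int) -> int:
    """One long clock cycle per instruction."""
    return n * SINGLE_CYCLE_TC_PS


def pipelined_time_ps(n: int) -> int:
    """NUM_STAGES cycles for the first instruction, then one more cycle per
    instruction (assumes no stalls or flushes)."""
    return (NUM_STAGES + n - 1) * PIPELINED_TC_PS


for n in (3, 1_000_000):
    s = single_cycle_time_ps(n)
    p = pipelined_time_ps(n)
    print(n, s, p, round(s / p, 2))
# 3 2400 1400 1.71
# 1000000 800000000 200000800 4.0
```

The speedup approaches 4 rather than 5 because the stages are unbalanced: the 100 ps stages still occupy a full 200 ps cycle.
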


Why Pipeline? Because the resources are there!

[Figure: five instructions (Inst 1 to Inst 5) in instruction order, overlapped in time (clock cycles); each passes through Im, Reg, ALU, Dm, Reg in successive cycles]
MIPS Pipelined Datapath
▪ State registers between pipeline stages to isolate them

[Figure: pipelined MIPS datapath with stages IF:IFetch, ID:Dec, EX:Execute, MEM:MemAccess, and WB:WriteBack, separated by the pipeline registers IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB and driven by the system clock; Inst 5 through Inst 1 occupy the five stages from left to right]
Pipeline Hazards
▪ Data hazards: an instruction uses the result of a previous instruction (RAW)
    ADD R1, R2, R3
    SUB R4, R1, R5
  or
    SW R1, 4(R2)
    LW R3, 4(R2)

▪ Control hazards: the address of the next instruction to be executed depends on a previous instruction
          BEQ R1,R2,CONT
          SUB R6,R7,R8

    CONT: ADD R3,R4,R5

▪ Structural hazards: two instructions need access to the same resource
• e.g., single memory shared for instruction fetch and load/store
Structural Hazard
[Figure: lw followed by Inst 1 to Inst 4 in instruction order over time (clock cycles), each passing through Mem, Reg, ALU, Mem, Reg; in the cycle where lw is reading data from memory, Inst 3 is reading its instruction from memory]

▪ Fix with separate instruction and data memories (I$ and D$)
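
The conflict in the figure can be located by listing which cycle each instruction spends in each stage. The following is a minimal sketch, assuming a five-stage pipeline with one instruction issued per cycle, no stalls, and a single memory shared by instruction fetch and data access; `stage_cycle` and `memory_conflicts` are hypothetical helpers.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_cycle(issue_index: int, stage: str) -> int:
    """Cycle (1-based) in which instruction number `issue_index` is in `stage`.

    Assumes instruction 0 is fetched in cycle 1 and nothing stalls.
    """
    return issue_index + 1 + STAGES.index(stage)


def memory_conflicts(program: list[tuple[str, bool]]) -> list[tuple[int, str, str]]:
    """Return (cycle, data_accessor, fetcher) for every cycle in which one
    instruction accesses data memory while another is being fetched."""
    conflicts = []
    for i, (name, uses_data_memory) in enumerate(program):
        if not uses_data_memory:
            continue
        cycle = stage_cycle(i, "MEM")
        for j, (other, _) in enumerate(program):
            if stage_cycle(j, "IF") == cycle:
                conflicts.append((cycle, name, other))
    return conflicts


# (name, does it access data memory?)
program = [("lw", True), ("Inst 1", False), ("Inst 2", False),
           ("Inst 3", False), ("Inst 4", False)]
print(memory_conflicts(program))  # [(4, 'lw', 'Inst 3')]
```

With separate instruction and data memories the two accesses in cycle 4 go to different resources, so the conflict disappears.
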
Data Hazards

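ADD R1, R2, R3
SUB R4, R1, R5

[Figure: the two instructions pass through F, D, EX, M, W one cycle apart over time (in cycles); ADD writes data to R1 late in its pipeline ("Write Data to Here"), while SUB gets data from R1 early in its own ("Get data from Here")]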
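
A read-after-write dependence like this one can be found mechanically by tracking which instruction last wrote each register. The sketch below is illustrative only: it assumes the simple three-register format `OP Rd, Rs, Rt` used in the example (loads, stores, and branches would need their own parsing), and `parse` and `raw_dependences` are hypothetical names.

```python
def parse(instr: str) -> tuple[str, str, list[str]]:
    """Split 'ADD R1, R2, R3' into (op, destination, sources)."""
    op, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return op, regs[0], regs[1:]


def raw_dependences(program: list[str]) -> list[tuple[int, int, str]]:
    """Return (producer_index, consumer_index, register) for each source
    register that was written by an earlier instruction."""
    last_writer: dict[str, int] = {}
    found = []
    for i, instr in enumerate(program):
        _, dest, srcs = parse(instr)
        for reg in srcs:
            if reg in last_writer:
                found.append((last_writer[reg], i, reg))
        last_writer[dest] = i  # later readers depend on the newest writer
    return found


program = ["ADD R1, R2, R3", "SUB R4, R1, R5"]
print(raw_dependences(program))  # [(0, 1, 'R1')]
```

Whether a reported dependence actually causes a hazard depends on how close the two instructions are and on the pipeline design.
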
One Way to handle a Data Hazard

▪ By waiting (introducing stalls), but this impacts performance

[Figure: add $1,… and the dependent sub $4,$1,$5 over time, each passing through IM, Reg, ALU, DM, Reg; stall slots are inserted between the two instructions]
Additional Way to “Fix” a Data Hazard
▪ By forwarding

[Figure: add $1,… followed by sub $4,$1,$5, and $6,$1,$7, or $8,$1,$9, and xor $4,$1,$5 in instruction order over time, each passing through IM, Reg, ALU, DM, Reg]
Internal data forwarding
▪ Fix data hazards by forwarding results to where they are needed

[Figure: add $1,… followed by sub $4,$1,$5, and $6,$1,$7, or $8,$1,$9, and xor $4,$1,$5 in instruction order over time, each passing through IM, Reg, ALU, DM, Reg, with the add result forwarded to the later instructions that read $1]

ALU-to-ALU forwarding vs. full forwarding


Forwarding with Load-use Data Hazards
[Figure: lw $1,4($2) followed by sub $4,$1,$5, and $6,$1,$7, or $8,$1,$9, and xor $4,$1,$5 in instruction order over time, each passing through IM, Reg, ALU, DM, Reg]

▪ sub needs to stall
▪ Will still need one stall cycle even with forwarding
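
The one-cycle stall follows from comparing when the loaded value exists with when the next instruction needs it. The sketch below is a simplified model, not the slides' own method: it assumes a five-stage pipeline with full forwarding, an ALU result available at the end of EX, a loaded value available at the end of the memory stage, and a consumer that needs its operand at the start of its own EX stage. The function name is hypothetical.

```python
EX_CYCLE = 3   # an instruction fetched in cycle 1 is in EX in cycle 3
MEM_CYCLE = 4  # ... and in the memory stage in cycle 4


def stalls_with_forwarding(producer_is_load: bool, distance: int) -> int:
    """Stall cycles for a consumer issued `distance` instructions after the
    producer (distance 1 = the very next instruction)."""
    # The value can be forwarded once the producing stage has finished.
    ready_after = MEM_CYCLE if producer_is_load else EX_CYCLE
    # Cycle in which the consumer would enter EX if it never stalled.
    consumer_ex = EX_CYCLE + distance
    return max(0, ready_after + 1 - consumer_ex)


print(stalls_with_forwarding(True, 1))   # 1: lw $1 then sub reading $1
print(stalls_with_forwarding(True, 2))   # 0: lw $1 then and, two instructions later
print(stalls_with_forwarding(False, 1))  # 0: add $1 then sub reading $1
```
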
Control Hazard

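JR R25
...
XX: ADD ...

[Figure: two instructions pass through F, D, EX, M, W one cycle apart over time (in cycles); the jump destination becomes available partway through the first instruction ("Destination Available Here"), but the fetch of the next instruction needs it earlier ("Need Destination Here")]

Simple solution: flush instruction fetch until the branch is resolved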
