Pipeline Hazards
[Figure: pipeline diagram, instruction order LOAD, Instr 1, Instr 2, Instr 3, Instr 4; each instruction passes through Mem, Reg, ALU, Mem, Reg]
Operation on memory by 2 different instructions in the same clock cycle
Hazards CS510 Computer Architectures Lecture 7 - 4
Structural Hazards with Single-Port Memory
[Figure: pipeline timing, Time (clock cycles) CC1 to CC9, instruction order LOAD, Instr 1, Instr 2, Instr 3; Instr 3 is held back by three Stall slots]
3 cycles stall with 1-port memory
[Figure: pipeline timing with separate IM and DM, instruction order LOAD, Instr 1, Instr 2, Instr 3, Instr 4, Instr 5; each instruction passes through IM, Reg, ALU, DM, Reg]
No stall with 2-port memory
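The two memory diagrams can be checked with a small reservation sketch. It assumes, as the slides draw it, that every instruction uses memory twice (fetch in stage 1, Mem in stage 4) and that an older instruction's Mem access has priority over a younger fetch. The names `fetch_cycles` and `ports` are illustrative, not from the lecture.

```python
def fetch_cycles(n_instr, ports):
    """Return the clock cycle in which each instruction is fetched.

    Assumption: 5-stage pipeline, fetch in stage 1 and Mem in stage 4,
    and every instruction accesses memory in both stages.
    """
    usage = {}   # clock cycle -> memory accesses already booked
    fetch = []
    cycle = 1
    for _ in range(n_instr):
        # Stall the fetch while all memory ports are taken by older instructions.
        while usage.get(cycle, 0) >= ports:
            cycle += 1
        usage[cycle] = usage.get(cycle, 0) + 1          # instruction fetch
        usage[cycle + 3] = usage.get(cycle + 3, 0) + 1  # Mem stage, 3 cycles later
        fetch.append(cycle)
        cycle += 1   # in-order fetch: next instruction no earlier than next cycle
    return fetch

print(fetch_cycles(4, ports=1))  # LOAD, Instr 1, Instr 2, Instr 3
print(fetch_cycles(5, ports=2))
```

Under these assumptions the fourth instruction is fetched in CC7 instead of CC4 with one port, which is the 3-cycle stall on the slide, and nothing stalls with two ports.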
[Figure: pipeline diagram (Mem, Reg, ALU, Mem, Reg) for the sequence below; R1 is produced by ADD and read by the instructions that follow]
ADD R1,R2,R3
SUB R4,R1,R3
AND R6,R1,R7
OR R8,R1,R9
XOR R10,R11,R1
[Figure inset: Clock Cycle, with Store into Ri and Read from Ri marked for Register Ri]
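The dependences on R1 in this sequence can be listed mechanically. The sketch below is a minimal read-after-write scan; the tuple encoding `(op, dest, sources)` and the function name `raw_dependences` are assumptions made for illustration.

```python
# Instruction sequence from the slide, encoded as (op, dest, sources).
program = [
    ("ADD", "R1", ("R2", "R3")),
    ("SUB", "R4", ("R1", "R3")),
    ("AND", "R6", ("R1", "R7")),
    ("OR", "R8", ("R1", "R9")),
    ("XOR", "R10", ("R11", "R1")),
]

def raw_dependences(prog):
    """Return (producer_index, consumer_index, register) for each RAW dependence."""
    deps = []
    last_writer = {}  # register -> index of the most recent instruction writing it
    for i, (_op, dest, srcs) in enumerate(prog):
        for reg in dict.fromkeys(srcs):  # each source register once, in order
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        last_writer[dest] = i
    return deps

for producer, consumer, reg in raw_dependences(program):
    distance = consumer - producer
    print(f"{program[consumer][0]} reads {reg} written by "
          f"{program[producer][0]}, distance {distance}")
```

The distance (1 for SUB up to 4 for XOR) is what decides whether a consumer reads R1 before ADD has written it back; the scan itself does not model any particular pipeline.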
[Figure: two further pipeline diagrams of the same ADD, SUB, AND, OR, XOR sequence on R1]
[Figure: datapath showing Zero?, MUX, D/A Buffer, A/M Buffer, M/W Buffer, ALU, Data Memory]
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) for the sequence below]
LOAD R1,0(R2)
SUB R4,R1,R6
AND R6,R1,R7
OR R8,R1,R9
Load Delay = 2 cycles
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) for the sequence below]
LOAD R1,0(R2)
SUB R4,R1,R6
AND R6,R1,R7
OR R8,R1,R9
Load Delay with Forwarding = 1 cycle
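The two load-delay figures can be reproduced with a small stall counter. This is a sketch that models only the load delay: it assumes results of non-load instructions never cause a stall, and the `load_delay` parameter, the `(op, dest, sources)` encoding, and the extra `ADD R5,R10,R11` filler instruction are hypothetical.

```python
def total_stalls(prog, load_delay):
    """Count stall cycles caused by loads whose result is used too early.

    load_delay = number of instruction slots after a LOAD in which
    its destination register cannot yet be read.
    """
    ready = {}   # register -> earliest issue slot at which it may be read
    slot = 0     # issue slot of the current instruction
    stalls = 0
    for op, dest, srcs in prog:
        earliest = max((ready.get(r, 0) for r in srcs), default=0)
        if earliest > slot:
            stalls += earliest - slot
            slot = earliest
        if op == "LOAD":
            ready[dest] = slot + 1 + load_delay
        slot += 1
    return stalls

slide_sequence = [
    ("LOAD", "R1", ("R2",)),
    ("SUB", "R4", ("R1", "R6")),
    ("AND", "R6", ("R1", "R7")),
    ("OR", "R8", ("R1", "R9")),
]
print(total_stalls(slide_sequence, load_delay=2))  # no forwarding
print(total_stalls(slide_sequence, load_delay=1))  # with forwarding

# Hypothetical scheduled version: an independent instruction fills the slot.
scheduled = [slide_sequence[0], ("ADD", "R5", ("R10", "R11"))] + slide_sequence[1:]
print(total_stalls(scheduled, load_delay=1))
```

With the slide's sequence the counter gives 2 stall cycles without forwarding and 1 with forwarding; placing one independent instruction after the LOAD removes the remaining stall.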
[Chart: scheduled vs. unscheduled code, by benchmark]
         unscheduled   scheduled
gcc      54%           31%
spice    42%           14%
tex      65%           25%
[Figure: DLX datapath with PC, +4, Add, Zero?, Instr. Memory, Reg File, ALU, Data Memory, LMD, SMD, Sign Ext (16 to 32), MUXes, and the F/D, D/A, A/M and M/W buffers]
• Branch Address Calculation
• Branch Decision for target address
• Decide Condition
Control Hazard on Branches: Three Stall Cycles
[Figure: pipeline timing (IM, Reg, ALU, DM, Reg), Time (clock cycles) CC1 to CC9, program execution order in instructions]
40 BEQ R1,R3,36
44 AND R12,R2,R5
48 OR R13,R6,R2
52 ADD R14,R2,R2
80 LD R4,R7,100
The instructions following the branch shouldn't be executed when the branch condition is true!
Branch Target available
Execute stage, Next Address is available here.
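The addresses on this slide are consistent with a PC-relative target, 40 + 4 + 36 = 80. The sketch below works that out and relates the number of stall cycles to the stage in which the branch is resolved. Both the PC + 4 base and the rule "the target is fetched in the cycle after the resolving stage" are assumptions; `branch_target` and `stall_cycles` are illustrative names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def branch_target(pc, offset):
    """Target of a taken branch, assuming the offset is relative to PC + 4."""
    return pc + 4 + offset

def stall_cycles(resolve_stage):
    """Stall cycles before the correct next instruction can be fetched.

    Assumption: the correct fetch starts in the cycle after the branch
    leaves resolve_stage, so the stall count equals the number of
    stages the branch has passed through beyond IF.
    """
    return STAGES.index(resolve_stage)

print(branch_target(40, 36))   # BEQ at address 40 with offset 36
print(stall_cycles("MEM"))     # branch resolved late in the pipeline
print(stall_cycles("ID"))      # branch resolved right after decode
```

Resolving at MEM gives the three stall cycles in the slide title; moving the decision to ID would leave one.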
[Figure: revised DLX datapath with PC, +4, Add, Add, Instr. Memory, Reg File, ALU, Data Memory, LMD, SMD, Sign Ext (16 to 32), MUXes, and the F/D, D/A, A/M and M/W buffers]
To get the Condition Earlier.
Target Address available after ID.
3 cycle penalty
Revised DLX pipeline (get the branch address at EX)
Branch instruction    IF     ID     EX     MEM    WB
Branch successor             stall  IF     ID     EX     MEM    WB
Branch successor + 1                       IF     ID     EX     MEM
Branch successor + 2                              IF     ID
Delayed Branch of length n
branch instruction
sequential successor 1
sequential successor 2
........
sequential successor n
branch target if taken
- Always improves performance
- Branch must not depend on rescheduled instructions

- Improves performance when TAKEN (loop)
- Must be alright to execute rescheduled instructions if Not Taken
- May need to duplicate the instruction if it is the target of another branch instr.

- Improves performance when NOT TAKEN
- Must be alright to execute instructions if Taken
Limitations on Delayed Branch
• Difficulty in finding useful instructions to fill the delayed
branch slots
• Solution - Squashing
– Delayed branch associated with a branch prediction
– Instructions in the predicted path are executed in the
delayed branch slot
– If the branch outcome is mispredicted, instructions in the
delayed branch slot are squashed (discarded)
Code Motion
    LW   R1, 0(R2)
    SUB  R1, R1, R3      ; depends on LW, need to stall
    BEQZ R1, L           ; branches to L when TAKEN
    OR   R4, R5, R6
    ADD  R10, R4, R3
L:  ADD  R7, R8, R9
If the branch is almost always NOT TAKEN, and R4 is not needed on the taken path, and R5 and R6 are not modified in the following instruction(s), moving OR R4, R5, R6 up to fill the stall after LW can increase speed.
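The conditions on this slide can be phrased as a small legality check for moving an instruction up across others. This is a sketch under stated assumptions: the `(op, dest, sources)` encoding and the name `can_hoist` are hypothetical, and the set of registers needed on the taken path is taken to be just the sources of the instruction at L.

```python
def can_hoist(moved, crossed, live_on_taken_path):
    """Check whether `moved` may be placed above the `crossed` instructions.

    moved, crossed entries: (op, dest, sources); dest is None for branches.
    live_on_taken_path: registers whose old values the taken path still needs.
    """
    _op, dest, srcs = moved
    if dest in live_on_taken_path:
        return False                  # taken path would see a clobbered register
    for _c_op, c_dest, c_srcs in crossed:
        if c_dest is not None and c_dest in srcs:
            return False              # a crossed instruction produces a source
        if c_dest is not None and c_dest == dest:
            return False              # both write the same register
        if dest in c_srcs:
            return False              # a crossed instruction reads the old value
    return True

or_instr = ("OR", "R4", ("R5", "R6"))
crossed = [("SUB", "R1", ("R1", "R3")), ("BEQZ", None, ("R1",))]

print(can_hoist(or_instr, crossed, live_on_taken_path={"R8", "R9"}))
print(can_hoist(or_instr, crossed, live_on_taken_path={"R4", "R8", "R9"}))
```

The first call corresponds to the slide's case, where R4 is not needed on the taken path; the second shows the move being rejected once R4 is needed there.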
[Figure: Misprediction Rate by benchmark (doduc, gcc, ora, tomcatv, alvinn, hydro2d, compress, espresso, mdljsp2, swm256); two panels, Always taken (axis 0% to 70%) and Taken backwards, Not Taken Forwards (axis 0% to 14%)]
Evaluating Static Branch Prediction Strategies
• Misprediction rate ignores frequency of branch
• Instructions between mispredicted branches is a better metric
[Figure: bar chart by benchmark (gcc, doduc, ora, tomcatv, alvinn, hydro2d, compress, espresso, mdljsp2, swm256); axis ticks 10, 100, 1000, 10000; series Profile-based and Direction-based]
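A short calculation shows why the second metric is more informative. The counts below are made-up illustrative numbers, not measurements from the lecture, and the function names are hypothetical.

```python
def misprediction_rate(branches, mispredicted):
    """Fraction of executed branches that were mispredicted."""
    return mispredicted / branches

def instructions_between_mispredictions(instructions, mispredicted):
    """Average number of instructions executed per mispredicted branch."""
    return instructions / mispredicted

# Two hypothetical programs of equal length and equal misprediction rate,
# but with different branch frequency.
prog_a = {"instructions": 1_000_000, "branches": 200_000, "mispredicted": 20_000}
prog_b = {"instructions": 1_000_000, "branches": 50_000, "mispredicted": 5_000}

for name, p in (("A", prog_a), ("B", prog_b)):
    rate = misprediction_rate(p["branches"], p["mispredicted"])
    gap = instructions_between_mispredictions(p["instructions"], p["mispredicted"])
    print(f"program {name}: misprediction rate {rate:.0%}, "
          f"{gap:.0f} instructions between mispredictions")
```

Both programs mispredict 10% of their branches, yet program B runs four times as many instructions between mispredictions because it branches less often; the rate alone hides that difference.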