Computer Architecture
Spring 2016
Tufts University
LD   F6, 34(R2)
LD   F2, 45(R3)
MULT F0, F2,F4
SUBD F8, F6,F2
DIVD F10, F0,F6
ADDD F6, F8,F2
• The DIVD is supposed to execute next to last
– It will probably execute pretty early
– There's a problem with that
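To see which register dependences exist in the six-instruction sequence on this slide, here is a minimal Python sketch. The tuple encoding (opcode, destination, sources), the reduction of memory operands to their base register, and the name `find_hazards` are assumptions made for illustration. The check is deliberately conservative: it compares every pair of instructions and ignores intervening writes.

```python
# Each instruction is (opcode, destination, sources). Memory operands are
# reduced to their base register. This encoding is an assumption of the sketch.
PROGRAM = [
    ("LD",   "F6",  ["R2"]),
    ("LD",   "F2",  ["R3"]),
    ("MULT", "F0",  ["F2", "F4"]),
    ("SUBD", "F8",  ["F6", "F2"]),
    ("DIVD", "F10", ["F0", "F6"]),
    ("ADDD", "F6",  ["F8", "F2"]),
]

def find_hazards(program):
    """Return (kind, earlier_index, later_index, register) for every pair."""
    hazards = []
    for j, (_, dst_j, srcs_j) in enumerate(program):
        for i in range(j):
            _, dst_i, srcs_i = program[i]
            if dst_i in srcs_j:        # later instruction reads what earlier wrote
                hazards.append(("RAW", i, j, dst_i))
            if dst_j in srcs_i:        # later instruction overwrites an earlier source
                hazards.append(("WAR", i, j, dst_j))
            if dst_i == dst_j:         # both write the same register
                hazards.append(("WAW", i, j, dst_i))
    return hazards

for kind, i, j, reg in find_hazards(PROGRAM):
    print(f"{kind}: {PROGRAM[i][0]} -> {PROGRAM[j][0]} on {reg}")
```

Among the pairs it reports is a WAR dependence between the DIVD (which reads F6) and the ADDD (which writes F6), so the order in which those two execute matters.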
EE194/Comp140 Mark Hempstead 5
Data flow
Cycle 1 2 3 4 5 6 7 8
Add R3, R2,R1    F1 F2 D1 D2 E1 E2 E3 M1 M2 W1 W2
Add R4, R3,R5    F1 F2 D1 D2 D2 D2 E1 E2 E3 M1 M2 W1 W2
[Figure: pipeline diagram with stages IM, Reg, ALU, MUX, Reg and a separate Ld/St pipe (ALU, DM). Successive frames show Ld R3, 10(R1) and Sub R7, R8,R9 moving through it.]
Multiple small pipes
• The load stalls due to a load miss.
• The Sub advances in the Arith. Pipe.
[Figure: Arith. pipe (IM, Reg, ALU, MUX, Reg) holding Sub R7, R8,R9; Ld/St pipe (ALU, DM) holding Ld R3, 10(R1).]
Multiple small pipes
• The load is still stalled.
• The Sub retires.
[Figure: Sub R7, R8,R9 leaving the Arith. pipe (IM, Reg, ALU, MUX, Reg); Ld R3, 10(R1) still in the Ld/St pipe (ALU, DM).]
Unintended consequences
• We had an unrelated instruction that would otherwise have been stuck behind a stalled load.
– We let it advance past the load into its own mini-pipe
– But it retired out of order (OOO). Won’t that cause problems with exceptions? Yes!
– And it still doesn’t solve all of our problems
– Consider this sequence:
Ld R3, 10(R1)
Add R4, R3,R5
Sub R7, R8,R9
[Figure: successive frames of the Arith. pipe and Ld/St pipe as Ld R3, 10(R1), Add R4, R3,R5 and Sub R7, R8,R9 enter the pipeline. In the last frame the Ld is in the Ld/St pipe.]
Multiple small pipes
• And here we sit, stuck, until the load miss finishes. Any ideas?
[Figure: Arith. pipe (IM, Reg, ALU) holding Sub R7, R8,R9 and Add R4, R3,R5; Ld/St pipe (ALU, DM) holding Ld R3, 10(R1).]
We’ve fixed just one problem
• We fixed the problem of a load miss stalling all
upstream arithmetic instructions (even unrelated
ones).
• But a load miss still stalls an upstream arithmetic
instruction that needs the load result.
• And that arithmetic instruction will, in turn, stall
all upstream arithmetic instructions (whether or
not they’re really waiting for the load data).
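The remaining stall can be sketched with a small timing model. This is an illustrative Python sketch, not the slides' pipeline: it assumes one instruction issues per cycle in program order, that an instruction issues only once its source registers are ready, and that a load miss takes 10 cycles (a made-up number). The function name `in_order_issue` and the tuple encoding are hypothetical.

```python
# Hypothetical latencies in cycles; the 10-cycle load models a cache miss.
LATENCY = {"Ld": 10, "Add": 1, "Sub": 1}

def in_order_issue(program, latency):
    """Return the issue cycle of each (op, dst, srcs) instruction."""
    ready_at = {}     # register -> cycle at which its value is available
    issue_cycle = []
    next_issue = 0    # in-order: nothing issues before the instruction ahead of it
    for op, dst, srcs in program:
        operands_ready = max((ready_at.get(r, 0) for r in srcs), default=0)
        cycle = max(next_issue, operands_ready)
        issue_cycle.append(cycle)
        ready_at[dst] = cycle + latency[op]
        next_issue = cycle + 1
    return issue_cycle

# The Sub sits behind an Add that needs the load result.
SUB_BEHIND_ADD = [
    ("Ld",  "R3", ["R1"]),
    ("Add", "R4", ["R3", "R5"]),
    ("Sub", "R7", ["R8", "R9"]),
]
# The Sub sits directly behind the load (the case the separate pipes fixed).
SUB_BEHIND_LOAD = [
    ("Ld",  "R3", ["R1"]),
    ("Sub", "R7", ["R8", "R9"]),
    ("Add", "R4", ["R3", "R5"]),
]

print(in_order_issue(SUB_BEHIND_ADD, LATENCY))   # [0, 10, 11]
print(in_order_issue(SUB_BEHIND_LOAD, LATENCY))  # [0, 1, 10]
```

Under these assumptions the Sub issues at cycle 1 when it directly follows the load, but not until cycle 11 when the dependent Add is ahead of it, even though the Sub never needs the load data.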
[Figure: the same pipeline (IM, Reg, ALU, MUX, Reg, with a Ld/St pipe of ALU and DM) with reservation stations (Res.St) added. Successive frames show Ld R3, 10(R1), Add R4, R3,R5 and Sub R7, R8,R9 moving through it.]
ECEC 621 38
Mark Hempstead
WAW & WAR hazards
• Consider the following code snippets, where the Sub finishes
early as in our example.
WAW                     WAR
Ld  R3, 10(R1)          Ld  R3, 10(R1)
Add R4, R3,R5           Add R4, R3,R7
Sub R4, R8,R9           Sub R7, R8,R9
• WAW: R4 has the wrong final value. Why?
– The Add writes it after the Sub does
• WAR: the Add gets the wrong R7. Why?
– Again, the Sub writes R7 before the Add issues.
– Actually, our reservation station doesn’t have this
problem, since the Add grabbed R7 before the Sub even
entered the pipe. But other implementations might.
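Both hazards can be reproduced with a toy model. The Python sketch below assumes a machine that reads and writes the register file directly at the moment each instruction executes, with no reservation stations and no renaming. That is the "other implementations" case, not the reservation-station design from the slides. The register values, the loaded value `LOADED`, and the function name `execute` are made up for illustration.

```python
import operator

def execute(program, order, regs):
    """Run instructions in the given order against a copy of the register file."""
    regs = dict(regs)
    for i in order:
        dst, fn, srcs = program[i]
        regs[dst] = fn(*(regs[s] for s in srcs))  # read sources now, write now
    return regs

INITIAL = {"R1": 0, "R3": 0, "R4": 0, "R5": 5, "R7": 7, "R8": 20, "R9": 8}
LOADED = 100  # hypothetical value returned by Ld R3, 10(R1)

WAW = [
    ("R3", lambda base: LOADED, ["R1"]),   # Ld  R3, 10(R1)
    ("R4", operator.add, ["R3", "R5"]),    # Add R4, R3,R5
    ("R4", operator.sub, ["R8", "R9"]),    # Sub R4, R8,R9
]
WAR = [
    ("R3", lambda base: LOADED, ["R1"]),   # Ld  R3, 10(R1)
    ("R4", operator.add, ["R3", "R7"]),    # Add R4, R3,R7
    ("R7", operator.sub, ["R8", "R9"]),    # Sub R7, R8,R9
]
PROGRAM_ORDER = [0, 1, 2]
SUB_FIRST = [2, 0, 1]   # the Sub finishes early, as in the example

print(execute(WAW, PROGRAM_ORDER, INITIAL)["R4"])  # 12  (the Sub's result)
print(execute(WAW, SUB_FIRST, INITIAL)["R4"])      # 105 (the Add overwrote it)
print(execute(WAR, PROGRAM_ORDER, INITIAL)["R4"])  # 107 (Add used the old R7)
print(execute(WAR, SUB_FIRST, INITIAL)["R4"])      # 112 (Add used the new R7)
```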
Register renaming
• We can fix the WAW problem by register
renaming.
Before renaming         After renaming
Ld  R3, 10(R1)          Ld  R3, 10(R1)
Add R4, R3,R5           Add R400, R3,R5
Sub R4, R8,R9           Sub R4, R8,R9
• It seems dumb of the compiler to reuse R4 like
that; wasn’t it just asking for trouble?
– Not all ISAs have a ton of registers; the compiler may
have been forced to reuse R4.
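One common way to automate this is a rename map from architectural to physical registers. The Python sketch below is an assumption-laden illustration, not the scheme on the slide: the slide renames only the Add's destination (to R400), while this sketch gives every destination a fresh name (P0, P1, ...). The function name `rename` and the physical register names are hypothetical.

```python
PROGRAM = [
    ("Ld",  "R3", ["R1"]),
    ("Add", "R4", ["R3", "R5"]),
    ("Sub", "R4", ["R8", "R9"]),
]

def rename(program):
    """Give every destination a fresh physical register (P0, P1, ...)."""
    mapping = {}      # architectural register -> current physical register
    renamed = []
    next_phys = 0
    for op, dst, srcs in program:
        # Sources use the mapping as it stands before this instruction's write.
        new_srcs = [mapping.get(s, s) for s in srcs]
        phys = f"P{next_phys}"
        next_phys += 1
        mapping[dst] = phys
        renamed.append((op, phys, new_srcs))
    return renamed, mapping

renamed, mapping = rename(PROGRAM)
for op, dst, srcs in renamed:
    print(op, dst, srcs)
print("R4 now lives in", mapping["R4"])
```

After renaming, the Add and the Sub write different physical registers, so the order in which they finish no longer matters: the map says the architectural R4 is whatever the Sub produced.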
[Figure: pipeline with reservation stations (Res.St) and a reorder buffer (ROB FIFO). Ld R3, 10(R1) and then Add R4, R3,R5 enter the pipeline.]
ROB FIFO (oldest first):
Load R3,10(R1)
Add R4,R3,R5
Same example, with ROB
• Sub issues & enters the ROB
• Add moves to ID
• Load moves to reservation station with R1.
[Figure: pipeline with reservation stations and ROB; Sub R7, R8,R9, Add R4, R3,R5 and Ld R3, 10(R1) are in flight.]
ROB FIFO (oldest first):
Load R3,10(R1)
Add R4,R3,R5
Sub R7,R8,R9
Same example, with ROB
• Sub moves to ID
• Add moves to reservation station with R5 but without R3.
• Load moves to EX.
[Figure: pipeline with reservation stations and ROB; Sub R7, R8,R9 and Add R4, R3,R5 are behind Ld R3, 10(R1), which is in the Ld/St pipe.]
ROB FIFO (oldest first):
Load R3,10(R1)
Add R4,R3,R5
Sub R7,R8,R9
Same example, with ROB
• Sub grabs R8,R9 and moves to a reservation station
• Add is stuck waiting for R3.
• Load moves to Mem and has an L1 miss.
[Figure: Add R4, R3,R5 in the Arith. pipe; Ld R3, 10(R1) in the Ld/St pipe.]
ROB FIFO (oldest first):
Load R3,10(R1)   0x34
Add R4,R3,R5
Sub R7,R8,R9     0x12
Same example, with ROB
• The load is at the top of the ROB & has data. It commits.
• The Add exits the pipe and puts its result into the ROB.
[Figure: pipeline with reservation stations and ROB; both pipes are empty.]
ROB FIFO (oldest first):
Add R4,R3,R5     0x56
Sub R7,R8,R9     0x12
Same example, with ROB
• Now (finally!) the Sub is at the top of the ROB, and so can
commit.
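The commit rule in this walkthrough (results may arrive in any order, but only the entry at the top of the ROB may commit) can be sketched as a FIFO. This Python sketch is illustrative only: the class and method names are hypothetical, it retires at most one entry per call, and it ignores reservation stations and operand forwarding. The result values 0x34, 0x56 and 0x12 are the ones shown in the ROB on the slides.

```python
from collections import deque

class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # program order, oldest entry at the left

    def allocate(self, name, dst):
        """Add an instruction at the tail when it issues."""
        entry = {"name": name, "dst": dst, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def commit_one(self, regs):
        """Retire the head entry if it has its result; otherwise do nothing."""
        if self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            regs[entry["dst"]] = entry["value"]
            return entry["name"]
        return None

rob, regs = ReorderBuffer(), {}
load = rob.allocate("Load", "R3")
add = rob.allocate("Add", "R4")
sub = rob.allocate("Sub", "R7")

sub.update(value=0x12, done=True)    # Sub finishes first
print(rob.commit_one(regs))          # None: the Load is still at the top
load.update(value=0x34, done=True)   # the load miss finally returns
print(rob.commit_one(regs))          # Load
add.update(value=0x56, done=True)
print(rob.commit_one(regs))          # Add
print(rob.commit_one(regs))          # Sub
```

Because registers are updated only at commit, the register file always reflects program order, which is what makes exceptions precise in this kind of design.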
[Figure: pipeline with reservation stations and ROB; both pipes are empty.]
ROB FIFO (oldest first):
Sub R7,R8,R9     0x12
Dynamic Scheduling: Basic Idea
• Example
DIVD F0, F2, F4
ADDD F10, F0, F8      Dependency (needs F0 from the DIVD)
SUBD F12, F8, F14     Independent
• Hazard detection during decode stalls whole pipeline
• No reason to stall in these cases: Out-of-Order Execution
– Reduces stalls, improves FU utilization, allows more parallel execution
– Precise interrupts are needed to give the appearance of sequential execution
– First, we will study it without precise interrupts, then see how to add them
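As a rough illustration of the idea, the Python sketch below starts each instruction as soon as the registers it reads are ready, regardless of program order. It is a simplification under stated assumptions: unlimited functional units, no WAR/WAW handling, and made-up latencies (40 cycles for DIVD, 2 for ADDD and SUBD). The function name `dataflow_start_cycles` is hypothetical.

```python
# Hypothetical latencies in cycles; real values depend on the machine.
LATENCY = {"DIVD": 40, "ADDD": 2, "SUBD": 2}

PROGRAM = [
    ("DIVD", "F0",  ["F2", "F4"]),
    ("ADDD", "F10", ["F0", "F8"]),    # depends on the DIVD through F0
    ("SUBD", "F12", ["F8", "F14"]),   # independent of both
]

def dataflow_start_cycles(program, latency):
    """Start each instruction as soon as its RAW producers have finished."""
    finish = {}   # register -> cycle at which its latest value is ready
    starts = []
    for op, dst, srcs in program:
        start = max((finish.get(r, 0) for r in srcs), default=0)
        starts.append(start)
        finish[dst] = start + latency[op]
    return starts

print(dataflow_start_cycles(PROGRAM, LATENCY))  # [0, 40, 0]
```

With these numbers the SUBD starts at cycle 0 alongside the DIVD, while only the ADDD waits for F0.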
Loop unrolling example
Loop: Ld r2, 100(r1)
Add r3, r3, r2
Add r1,r1,#1
J loop