You are on page 1of 31

Lecture 6: Dynamic Scheduling with

Scoreboarding and Tomasulo Algorithm


(Section 2.4)

Nov. 9, 2004 1
Scoreboard Implications
• Out-of-order completion => WAR, WAW hazards

• Solutions for WAR


– CDC 6600: Stall Write to allow Reads to take place; Read registers only during Read
Operands stage.
• For WAW, must detect hazard: stall in the Issue stage until other completes

• Need to have multiple instructions in execution phase => multiple execution units
or pipelined execution units

• Scoreboard replaces ID with 2 stages (Issue and RO)


• Scoreboard keeps track of dependencies, state or operations
– Monitors every change in the hardware.
– Determines when to read ops, when can execute, when can wb.
– Hazard detection and resolution is centralized.

2
Four Stages of Scoreboard Control
1. Issue—decode instructions & check for structural hazards (ID1)
If a functional unit for the instruction is free and no other active instruction has the same
destination register (WAW), the scoreboard issues the instruction to the functional unit and
updates its internal data structure. If a structural or WAW hazard exists, then the instruction
issue stalls, and no further instructions will issue until these hazards are cleared.
2. Read operands—wait until no data hazards, then read operands (ID2)
A source operand is available if no earlier issued active instruction is going to write it, or if the
register containing the operand is being written by a currently active functional unit. When the
source operands are available, the scoreboard tells the functional unit to proceed to read the
operands from the registers and begin execution. The scoreboard resolves RAW hazards
dynamically in this step, and instructions may be sent into execution out of order.

3
Four Stages of Scoreboard Control
3.Execution—operate on operands (EX)
The functional unit begins execution upon receiving operands. When the result is
ready, it notifies the scoreboard that it has completed execution.
4.Write result—finish execution (WB)
Once the scoreboard is aware that the functional unit has completed execution,
the scoreboard checks for WAR hazards. If none, it writes results. If WAR, then it
stalls the instruction.
Example:
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F8,F8,F14
CDC 6600 scoreboard would stall SUBD until ADDD reads operands
CDC 6600 has one integer, 2 FP multipliers, 1 FP divide, 1 FP add units.
See Fig. A.50.

4
Three Parts of the Scoreboard
1. Instruction status—which of 4 steps the instruction is in

2. Functional unit status—Indicates the state of the functional unit (FU). 9 fields for each
functional unit
Busy—Indicates whether the unit is busy or not
Op—Operation to perform in the unit (e.g., + or –)
Fi—Destination register
Fj, Fk—Source-register numbers
Qj, Qk—Functional units producing source registers Fj, Fk
Rj, Rk—Flags indicating when Fj, Fk are ready and not yet read. Set to
No after operand are read.

3. Register result status—Indicates which functional unit will write each register, if one
exists. Blank when no pending instructions will write that register

5
Scoreboard Example Cycle 7
Read Execution Write Note:
Instruction status
Instruction j k Issue operands
complete Result (1) In-order Issue
(2) I2 could not be issued at
LD F6 34+ R2 1 2 3 4 cycle 2 due to structural hazard
LD F2 45+ R3 5 6 7 (3) I3 issued in cycle 6, but
MULTD F0 F2 F4 6 stalled at read because I2 isn’t
SUBD F8 F6 F2 7 complete
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
TimeName Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 No
Mult1 Yes Mult F0 F2 F4 Integer No Yes
Mult2 No
Add Yes Subd F8 F6 F2 Integer Yes No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
7 FU Mult Integer Add
6
27
Review: Scoreboard
• Limitations of 6600 scoreboard
– No forwarding
– Limited to instructions in basic block (small window)
– Large number of functional units (structural hazards)
– Stall on WAR hazards
– Stall on WAW hazards
DIV.DF0, F2, F4
ADD.D F6, F0, F8
WAR S.D F6, 0(R1)
SUB.D F8, F10, F14 WAW
Antidependence MUL.D F6, F10, F8 Output dependence

Name dependence

Nov. 2, 2004 Lec. 7 8


Dynamic Scheduling: Tomasulo Algorithm

• For IBM 360/91 about 3 years after CDC 6600 that proposed
scoreboarding
• Goal: High Performance without special compilers
• Differences between Tomasulo Algorithm & Scoreboard
– Control & buffers distributed with Function Units vs. centralized in
scoreboard; called “reservation stations”
– Registers in instructions replaced by pointers to reservation station buffer
– HW renaming of registers to avoid WAW hazards
– Buffer operand values to avoid WAR hazards
– Common Data Bus broadcasts results to all FUs
– Load and Stores treated as FUs as well
• Why study? Lead to Alpha 21264, HP 8000, MIPS 10000,
Pentium II, Power PC 604 …
Nov. 2, 2004 Lec. 7 9
FP unit and load-store unit using Tomasulo’s alg.

Nov. 2, 2004 Lec. 7 10


Dynamic Algorithm: Tomasulo Algorithm
DIV.D F0, F2, F4
ADD.D S, F0, F8
S.D S, 0(R1) register renaming
SUB.D T, F10, F14
MUL.D F6, F10, T
• Implemented through reservation stations (rs) per functional unit
– Buffers an operand as soon as it is available – avoids WAR hazards.
– Pending instr. designate rs that will provide their inputs – avoids WAW hazards.
– The last write in a sequence of same-register-writing actually updates the
register
– Decentralize hazard detection and execution control
– Instruction results are passed directly to the FU from rs rather than from
registers
 Through common data bus (CDB)

Nov. 2, 2004 Lec. 7 11


Three Stages of Tomasulo Algorithm
1. Issue—get instruction from FP Op Queue
Stall if structural hazard, ie. no space in the rs. If reservation station (rs) is free, the
issue logic issues instr to rs & read operands into rs if ready (Register renaming
=> Solves WAR). Make status of destination register waiting for this latest instn
even if the previous instn writing to this register hasn’t completed => Solves WAW
hazards.
2. Execution—operate on operands (EX)
When both operands are ready then execute;
if not ready, watch CDB for result – Solves RAW
3. Write result—finish execution (WB)
Write on Common Data Bus to all awaiting units;
mark reservation station available. Write result into dest. reg. if its status is r. =>
Solves WAW.
• Normal data bus: data + destination (“go to” bus)
• CDB: data + source (“come from” bus)
– 64 bits of data + 4 bits of Functional Unit source address
– Write if matches expected Functional Unit (produces result)
– Does broadcast

Nov. 2, 2004 Lec. 7 12


Reservation Station Components
Op—Operation to perform in the unit (e.g., + or –)
Vj, Vk— Value of the source operand.
Qj, Qk— Name of the RS that would provide the source
operands. Value zero means the source operands already
available in Vj or Vk, or is not necessary.
Busy—Indicates reservation station or FU is busy

Register File Status Qi:


Qi —Indicates which functional unit will write each register, if
one exists. Blank (0) when no pending instructions that will write
that register meaning that the value is already available.

Nov. 2, 2004 Lec. 7 13


Tomasulo Status pp. 99
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 x x x Load1 No
LD F2 45+ R3 x x Load2 Yes
MULTD F0 F2 F4 x
SUBD F8 F6 F2 x
DIVD F10 F0 F6 x
ADDD F6 F8 F2 x
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk A
Load1 No
Load2 yes Load 45+Regs[R3]
0 Add1 yes SUB Mem[34+Regs[R2]] Load2
0 Add2 yes ADD Add1 Load2
Add3 No
0 Mult1 yes MUL Regs[F4] Load2
0 Mult2 yes DIV Mem[34+Regs[R2]] Mult1

Register result status


Field F0 F2 F4 F6 F8 F10 F12

14
Tomasulo Example Cycle 0
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 Load1 No
LD F2 45+ R3 Load2 No
MULTD F0 F2 F4 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
0 FU
Nov. 2, 2004 Lec. 7 15
Tomasulo Example Cycle 1
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 Load1 Yes 34+R2
LD F2 45+ R3 Load2 No
MULTD F0 F2 F4 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
1 FU Load1
Nov. 2, 2004 Lec. 7 16
Tomasulo Example Cycle 2
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2- Load1 Yes 34+R2
LD F2 45+ R3 2 Load2 Yes 45+R3
MULTD F0 F2 F4 Load3 No
SUBD F8 F6 F2 Assume Load takes 2 cycles
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
2 FU Load2 Load1
Nov. 2, 2004 Lec. 7 17
Tomasulo Example Cycle 3
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 Load1 Yes 34+R2
LD F2 45+ R3 2 3- Load2 Yes 45+R3
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 No read value
Add3 No
0 Mult1 Yes Mult R(F4) Load2
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
3 FU Mult1 Load2 Load1
Nov. 2, 2004 Lec. 7 18
Tomasulo Example Cycle 4
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 Load2 Yes 45+R3
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 Yes Sub M(A1) Load2
0 Add2 No
Add3 No
0 Mult1 Yes Mult R(F4) Load2
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
4 FU Mult1 Load2 M(A1) Add1
Nov. 2, 2004 Lec. 7 19
Tomasulo Example Cycle 5
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4
DIVD F10 F0 F6 5
ADDD F6 F8 F2
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
2 Add1 Yes Sub M(A1) M(A2)
0 Add2 No
Add3 No
10 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
5 FU Mult1 M(A2) M(A1) Add1 Mult2
Nov. 2, 2004 Lec. 7 20
Tomasulo Example Cycle 6
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 --
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
1 Add1 Yes Sub M(A1) M(A2)
0 Add2 Yes Add M(A2) Add1
Add3 No
9 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
6 FU Mult1 M(A2) Add2 Add1 Mult2
Nov. 2, 2004 Lec. 7 21
Tomasulo Example Cycle 7
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 Yes Sub M(A1) M(A2)
0 Add2 Yes Add M(A2) Add1
Add3 No
8 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
7 FU Mult1 M(A2) Add2 Add1 Mult2
Nov. 2, 2004 Lec. 7 22
Tomasulo Example Cycle 8
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
2 Add2 Yes Add M1-M2 M(A2)
Add3 No
7 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
8 FU Mult1 M(A2) Add2 M1-M2 Mult2
Nov. 2, 2004 Lec. 7 23
Tomasulo Example Cycle 9
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 --
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
1 Add2 Yes Add M1-M2 M(A2)
Add3 No
6 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
9 FU Mult1 M(A2) Add2 M1-M2 Mult2
Nov. 2, 2004 Lec. 7 24
Tomasulo Example Cycle 10
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 -- 10
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
0 Add2 Yes Add M1-M2 M(A2)
Add3 No
5 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
10 FU Mult1 M(A2) Add2 M1-M2 Mult2
Nov. 2, 2004 Lec. 7 25
Tomasulo Example Cycle 11
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
4 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
11 FU Mult1 M(A2) M1-M2+M(A2)
M1-M2 Mult2
Nov. 2, 2004 Lec. 7 26
Tomasulo Example Cycle 12
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
4 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
12 FU Mult1 M(A2) M1-M2+M(A2)
M1-M2 Mult2
Nov. 2, 2004 Lec. 7 27
Tomasulo Example Cycle 15
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- 15 Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
0 Mult1 Yes Mult M(A2) R(F4)
0 Mult2 Yes Div M(A1) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
15 FU Mult1 M(A2) M1-M2+M(A2)
M1-M2 Mult2
Nov. 2, 2004 Lec. 7 28
Tomasulo Example Cycle 16
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- 15 16 Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
Mult1 No
40 Mult2 Yes Div M*F4 M(A1)
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
16 FU M*F4 M(A2) M1-M2+M(A2)
M1-M2 Mult2
Nov. 2, 2004 Lec. 7 29
Tomasulo Example Cycle 56
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- 15 16 Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5 17 -- 56
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
Mult1 No
0 Mult2 Yes Div M*F4 M(A1)
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
56 FU M*F4 M(A2) M1-M2+M(A2)
M1-M2 Mult2
Nov. 2, 2004 Lec. 7 30
Tomasulo Example Cycle 57
Instruction status Execution Write
Instruction j k Issue complete Result Busy Address
LD F6 34+ R2 1 2--3 4 Load1 No
LD F2 45+ R3 2 3--4 5 Load2 No
MULTD F0 F2 F4 3 6 -- 15 16 Load3 No
SUBD F8 F6 F2 4 6 -- 7 8
DIVD F10 F0 F6 5 17 -- 56 57
ADDD F6 F8 F2 6 9 -- 10 11
Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk
0 Add1 No
Add2 No
Add3 No
Mult1 No
0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
57 FU M*F4 M(A2) M1-M2+M(A2)
M1-M2 result
Nov. 2, 2004 Lec. 7 31

You might also like