
CKV

Advanced VLSI Architecture


MEL G624

Lecture 19: Instruction Level Parallelism


Tomasulo’s Algorithm

Loop: L.D F0,0(R1)
      MUL.D F4,F0,F2
      S.D F4,0(R1)
      DADDIU R1,R1,-8
      BNE R1,R2,Loop

[Figure: basic structure of a floating-point unit using Tomasulo's algorithm. The instruction queue holds the loop instructions (L.D, MUL.D, S.D, DADDIU, BNE). Load-store operations go to the address unit, which feeds the load buffers and store buffers; floating-point operations go to the reservation stations in front of the FP adders and FP multipliers. Also shown: FP registers (F0 to F10), operand buses, the memory unit (data and address), and the common data bus (CDB).]
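
The way the reservation stations rename registers for the loop body can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the slide's own material: the names `Station`, `issue` and `PROGRAM` are hypothetical, the tag names (`Load1`, `Mult1`, `Store1`) follow a common textbook convention, only the three floating-point and memory instructions are modelled, and the integer address register R1 is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    name: str                  # tag that would be broadcast on the CDB
    op: str
    qj: Optional[str] = None   # tag producing the first operand (None = ready)
    qk: Optional[str] = None   # tag producing the second operand (None = ready)


# Assumed mapping from opcode to the kind of buffer/station it occupies.
UNIT = {"L.D": "Load", "S.D": "Store", "MUL.D": "Mult"}


def issue(program):
    """Issue instructions in order and rename FP sources to producer tags."""
    reg_status = {}   # FP register -> tag of the station that will write it
    counters = {}     # how many stations of each kind have been handed out
    stations = []
    for op, dest, srcs in program:
        unit = UNIT[op]
        counters[unit] = counters.get(unit, 0) + 1
        tag = f"{unit}{counters[unit]}"
        # A source with no pending producer is read from the register file.
        tags = [reg_status.get(s) for s in srcs] + [None, None]
        stations.append(Station(tag, op, tags[0], tags[1]))
        if dest is not None:
            reg_status[dest] = tag
    return stations, reg_status


PROGRAM = [
    ("L.D", "F0", []),              # F0 <- Mem[0 + R1]
    ("MUL.D", "F4", ["F0", "F2"]),  # F4 <- F0 * F2
    ("S.D", None, ["F4"]),          # Mem[0 + R1] <- F4
]

stations, reg_status = issue(PROGRAM)
for s in stations:
    print(s)
```

In this sketch MUL.D waits on the tag `Load1` rather than on the register name F0, and S.D waits on `Mult1`; that substitution of tags for register names is the renaming step.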


Tomasulo’s Algorithm

Loop: L.D F0,0(R1)
      MUL.D F4,F0,F2
      S.D F4,0(R1)
      DADDIU R1,R1,-8
      BNE R1,R2,Loop

If a load precedes a store to the same address in program order, interchanging them results in a WAR hazard.

If a store precedes a load from the same address in program order, interchanging them results in a RAW hazard.

Interchanging two stores to the same address results in a WAW hazard.
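
The three rules above can be written as a small lookup. The following Python sketch is illustrative only; the function name `reorder_hazard` and the string encoding of the operations are assumptions made for the example.

```python
from typing import Optional


def reorder_hazard(first: str, second: str,
                   addr_first: int, addr_second: int) -> Optional[str]:
    """Return the hazard created by interchanging two memory operations.

    `first` and `second` are "load" or "store", given in program order.
    Returns None when the two operations can be interchanged safely.
    """
    if addr_first != addr_second:
        return None            # different addresses never conflict
    if first == "load" and second == "store":
        return "WAR"           # the store would overwrite before the load reads
    if first == "store" and second == "load":
        return "RAW"           # the load would read before the store writes
    if first == "store" and second == "store":
        return "WAW"           # the final value in memory would be wrong
    return None                # two loads can always be interchanged


print(reorder_hazard("load", "store", 56, 56))
print(reorder_hazard("store", "load", 56, 56))
print(reorder_hazard("store", "store", 56, 56))
```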


Tomasulo’s Algorithm

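Address calculations for load-store operations are performed in program order.

[Figure: load buffers, store buffers, address unit and memory unit (data and address). A load to address 56 (ld 56) is at the address unit while a pending access to address 56 is already buffered: stall due to RAW.]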
Tomasulo’s Algorithm

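Address calculations for load-store operations are performed in program order.

[Figure: load buffers, store buffers, address unit and memory unit (data and address). A store to address 56 (sw 56) is at the address unit while a pending access to address 56 is already buffered: stall due to WAW.]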
Tomasulo’s Algorithm

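Address calculations for load-store operations are performed in program order.

[Figure: load buffers, store buffers, address unit and memory unit (data and address). A store to address 56 (sw 56) is at the address unit while a pending access to address 56 is already buffered: stall due to WAR.]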
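
The three stall cases can be modelled as an address check against the buffers. The Python sketch below is a simplified illustration, not a description of any particular hardware: the class `MemoryBuffers` and its methods are hypothetical names, the buffers hold addresses only, and it assumes (following the hazard definitions) that a load is compared against pending stores while a store is compared against both pending stores and pending loads.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MemoryBuffers:
    load_addrs: List[int] = field(default_factory=list)   # pending loads
    store_addrs: List[int] = field(default_factory=list)  # pending stores

    def stall_reason(self, op: str, addr: int) -> Optional[str]:
        """Return "RAW", "WAW" or "WAR" if the operation must stall."""
        if op == "ld":
            # A load must wait for an earlier store to the same address.
            return "RAW" if addr in self.store_addrs else None
        if op == "sw":
            # A store must wait for earlier stores and earlier loads.
            # If both match, this sketch reports WAW first.
            if addr in self.store_addrs:
                return "WAW"
            if addr in self.load_addrs:
                return "WAR"
            return None
        raise ValueError(f"unknown operation: {op}")

    def issue(self, op: str, addr: int) -> Optional[str]:
        """Buffer the operation if it does not stall; return the stall reason."""
        reason = self.stall_reason(op, addr)
        if reason is None:
            target = self.load_addrs if op == "ld" else self.store_addrs
            target.append(addr)
        return reason


with_store = MemoryBuffers(store_addrs=[56])
with_load = MemoryBuffers(load_addrs=[56])
print(with_store.stall_reason("ld", 56))
print(with_store.stall_reason("sw", 56))
print(with_load.stall_reason("sw", 56))
```

Because the address unit works in program order, every address already in a buffer belongs to an earlier instruction, which is why a simple membership test is enough in this model.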
Tomasulo’s Algorithm
Tomasulo's scheme went unused for many years after the 360/91, but was widely adopted in multiple-issue processors in the 1990s.

The presence of caches, with their unpredictable memory access times, has become a major motivation for dynamic scheduling.

As processors become more aggressive in their issue capability, techniques such as register renaming, dynamic scheduling and speculation become more important.

Tomasulo's scheme can achieve high performance without requiring the compiler to target code to a specific pipeline structure.
Hardware-Based Speculation
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F4, F2
….

Loop: L.D F0,0(R1)
      MUL.D F4,F0,F2
      S.D F4,0(R1)
      DADDIU R1,R1,-8
      BNE R1,R2,Loop
OR F2, F3, F6
……
Hardware-Based Speculation
A wide-issue processor may need to execute a branch every clock cycle to maintain maximum performance.

Exploiting more parallelism requires that we overcome the limitation of control dependence.

This is done by speculating on the outcome of branches and executing the program as if our guesses were correct.

Mechanisms are needed to handle situations where the speculation is incorrect.
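
One possible way to make speculative results undoable is to hold them in a buffer until the branch is resolved. The Python sketch below illustrates only that idea; the class `SpeculativeRegisters` and its methods are hypothetical, and the slides do not specify which recovery mechanism is used.

```python
class SpeculativeRegisters:
    """Register state whose speculative writes can be committed or discarded."""

    def __init__(self, initial):
        self.committed = dict(initial)  # architectural (safe) state
        self.pending = []               # speculative writes, in program order

    def write(self, reg, value):
        # Speculative results are buffered, not written to committed state.
        self.pending.append((reg, value))

    def read(self, reg):
        # Later speculative instructions see the newest buffered value.
        for r, v in reversed(self.pending):
            if r == reg:
                return v
        return self.committed[reg]

    def resolve(self, prediction_correct):
        # Correct guess: make the buffered writes permanent, in order.
        # Wrong guess: drop them, leaving committed state untouched.
        if prediction_correct:
            for r, v in self.pending:
                self.committed[r] = v
        self.pending.clear()


regs = SpeculativeRegisters({"R1": 16})
regs.write("R1", regs.read("R1") - 8)   # DADDIU R1,R1,-8 executed speculatively
print(regs.read("R1"))                  # speculative view of R1
regs.resolve(prediction_correct=False)  # the branch guess was wrong
print(regs.committed["R1"])             # committed state is unchanged
```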

Hardware-Based Speculation

Hardware-based speculation combines three key ideas:

Dynamic branch prediction to choose which instructions to execute

Speculation to allow execution of instructions before control dependences are resolved (with the ability to undo the effects of an incorrectly speculated sequence)

Dynamic scheduling to deal with the scheduling of different combinations of basic blocks
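
As an illustration of the first idea, the sketch below applies a 2-bit saturating counter to the BNE that closes the loop. The slides do not say which prediction scheme is meant, so the 2-bit counter is an assumption chosen because it is a common textbook example; the names `TwoBitPredictor` and `count_mispredictions` and the four-iteration loop are likewise hypothetical.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 taken."""

    def __init__(self, state=2):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor):
    """Run the predictor over a sequence of branch outcomes."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Assumed loop of four iterations: BNE is taken three times, then falls through.
loop_branch = [True, True, True, False]
print(count_mispredictions(loop_branch, TwoBitPredictor()))
```

Each misprediction counted here corresponds to a speculated sequence whose effects would have to be undone.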

Thank You for Attending
