ILP Techniques: Laxmi N. Bhuyan CS 162 Spring 2003
F0,F2,F4
F10,F0,F8
F12,F8,F14
Pentium Datapath
Pentium consists of two pipes (U-pipe
and V-pipe) operating in parallel. U-pipe
contains an 8-stage FP pipeline (see
Pentium Figure)
Two decode stages: decode and control in the first
stage, register read in the second stage
See I-cache and D-cache in Fig. 6-1.
What is a TLB? How does virtual
memory work?
Scoreboard Implications
Scoreboard replaces ID, EX, WB with 4 stages
Out-of-order completion => WAR, WAW hazards?
Solution for WAR => stall the write at the WB stage until the
earlier instruction has read its operands
For WAW, detect the hazard at the ID stage and stall issue
until the other instruction completes
Need to have multiple instructions in execution
phase => multiple execution units or pipelined
execution units
Scoreboard keeps track of dependencies and the state of
operations
Stage                 Wait until                                 Bookkeeping
Issue
Read operands         Rj and Rk                                  Rj ← No; Rk ← No
Execution complete    Functional unit done
Write result          ∀f((Fj(f) ≠ Fi(FU) or Rj(f) = No) &
                      (Fk(f) ≠ Fi(FU) or Rk(f) = No))
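
The wait conditions in the table above can be sketched in Python. This is a minimal illustration, not the CDC 6600 logic itself: the FUStatus class, the function names, and the unit names in the example are all hypothetical, and the issue check follows the earlier slide (free unit, no WAW) because the Issue row of the table did not survive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FUStatus:
    """One functional unit's status (field names mirror Fi, Fj, Fk, Rj, Rk)."""
    busy: bool = False
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # first source register
    fk: Optional[str] = None  # second source register
    rj: bool = False          # Fj is ready and has not been read yet
    rk: bool = False          # Fk is ready and has not been read yet


def can_issue(fu, dest, units):
    """Issue: the unit is free (structural) and no active unit writes dest (WAW)."""
    no_waw = all(not (u.busy and u.fi == dest) for u in units.values())
    return (not units[fu].busy) and no_waw


def can_read_operands(fu, units):
    """Read operands: wait until Rj and Rk are both Yes."""
    return units[fu].rj and units[fu].rk


def can_write_result(fu, units):
    """Write result: for all f, (Fj(f) != Fi(FU) or Rj(f) = No) and
    (Fk(f) != Fi(FU) or Rk(f) = No). This is the WAR check."""
    dest = units[fu].fi
    return all(
        (f.fj != dest or not f.rj) and (f.fk != dest or not f.rk)
        for f in units.values()
    )


# Hypothetical state: add1 still has to read F8, and add2 wants to write F8.
units = {
    "div": FUStatus(True, "F0", "F2", "F4", False, False),
    "add1": FUStatus(True, "F10", "F0", "F8", False, True),
    "add2": FUStatus(True, "F8", "F12", "F14", False, False),
    "mul": FUStatus(),
}
print(can_issue("mul", "F0", units))     # False: div already targets F0 (WAW)
print(can_write_result("add2", units))   # False: add1 has not read F8 yet (WAR)
```

Once add1 reads its operands (Rk set to No), the same can_write_result call returns True and add2 may write F8.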
Summary
Instruction Level Parallelism (ILP) in SW or HW
Loop level parallelism is easiest to see
SW: parallelism and dependencies are defined by the program;
they become hazards if HW cannot resolve them
SW dependencies and compiler sophistication determine whether
the compiler can unroll loops
Memory dependencies hardest to determine
HW exploiting ILP
Works when dependences can't be known at compile time
Code for one machine runs well on another
Tomasulo Algorithm
Results go to FUs from RSs, not through registers, over a Common Data Bus
that broadcasts results to all FUs
Loads and stores are treated as FUs with RSs as well
Integer instructions can go past branches, allowing
FP ops beyond basic block in FP queue
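
The Common Data Bus broadcast described above can be sketched as follows. This is a simplified illustration under stated assumptions: the ReservationStation class and the field names vj, vk, qj, qk (operand values and producer tags) are not in the slides and are introduced here only for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    op: str
    vj: Optional[float] = None  # value of operand 1, once known
    vk: Optional[float] = None  # value of operand 2, once known
    qj: Optional[str] = None    # tag of the station that will produce operand 1
    qk: Optional[str] = None    # tag of the station that will produce operand 2

    def ready(self):
        """A station can start executing once it waits on no producer."""
        return self.qj is None and self.qk is None


def broadcast(tag, value, stations):
    """Put (tag, value) on the bus; every station waiting on tag captures it."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None


# Hypothetical state: both adders wait on the multiplier station "Mult1".
add1 = ReservationStation("Add1", "ADD", vk=2.0, qj="Mult1")
add2 = ReservationStation("Add2", "SUB", qj="Mult1", qk="Mult1")
broadcast("Mult1", 6.0, [add1, add2])
print(add1.ready(), add1.vj)  # True 6.0
print(add2.ready(), add2.vk)  # True 6.0
```

A single broadcast wakes every waiting consumer at once, and no result has to pass through the register file first.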
DAP Spr.98 UCB 13
Tomasulo Organization
[Figure: FP Op Queue, FP Registers, Load Buffer, Store Buffer, FP Add Res. Station, FP Mul Res. Station, Common Data Bus]
Tomasulo v. Scoreboard
(IBM 360/91 v. CDC 6600)
Tomasulo (IBM 360/91)              Scoreboard (CDC 6600)
Pipelined functional units         Multiple functional units
(6 load, 3 store, 3 +, 2 ×/÷)      (1 load/store, 1 +, 2 ×, 1 ÷)
Window size: 14 instructions       5 instructions
No issue on structural hazard      Same
WAR: renaming avoids               Stall completion
WAW: renaming avoids               Stall issue
Broadcast results from FU          Write/read registers
Distributed reservation stations   Central scoreboard
Tomasulo Drawbacks
Complexity
delays of 360/91, MIPS 10000, IBM 620?
Tomasulo Summary
Reservation stations: renaming to a larger set of registers +
buffering source operands
Prevents registers from becoming a bottleneck
Avoids WAR, WAW hazards of Scoreboard
Allows loop unrolling in HW
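
How renaming through reservation stations avoids WAR and WAW can be sketched as below. This is an illustrative model only: the issue and write_back functions, the reg_status table, and the station names are hypothetical and are not taken from the slides.

```python
def issue(dest, srcs, station, reg_status, reg_file):
    """Capture each source as a value if available, else as a producer tag,
    then rename dest to the issuing station."""
    operands = []
    for r in srcs:
        if r in reg_status:
            operands.append(("tag", reg_status[r]))   # wait for the producer
        else:
            operands.append(("value", reg_file[r]))   # buffer the value now
    reg_status[dest] = station
    return operands


def write_back(tag, value, reg_status, reg_file):
    """Write a register only if the station is still its latest producer."""
    for reg, producer in list(reg_status.items()):
        if producer == tag:
            reg_file[reg] = value
            del reg_status[reg]


reg_file = {"F0": 1.0, "F2": 2.0, "F4": 4.0, "F6": 6.0, "F8": 8.0}
reg_status = {}

issue("F0", ["F2", "F4"], "Mult1", reg_status, reg_file)
ops = issue("F6", ["F0", "F8"], "Add1", reg_status, reg_file)
print(ops)  # [('tag', 'Mult1'), ('value', 8.0)]

# WAR on F8: Add1 already buffered the old F8, so a later writer is safe.
issue("F8", ["F2", "F4"], "Add2", reg_status, reg_file)
# WAW on F6: the newer writer takes over the register's tag.
issue("F6", ["F2", "F2"], "Add3", reg_status, reg_file)

write_back("Add1", 9.0, reg_status, reg_file)  # stale result, F6 unchanged
write_back("Add3", 4.0, reg_status, reg_file)
print(reg_file["F6"])  # 4.0
```

Because source values are copied at issue and destinations are tracked by tag, neither the later write to F8 nor the second write to F6 needs a stall.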
[Figure: Reorder Buffer, FP Regs, Res Stations, FP Adder]
Renaming Registers
Common variation of speculative design
Reorder buffer keeps instruction information
but not the result
Extend register file with extra
renaming registers to hold speculative results
Rename register allocated at issue;
result into rename register on execution complete;
rename register into real register on commit
Operands read either from register file
(real or speculative) or via Common Data Bus
Advantage: operands always come from a single source (the extended
register file)
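
The three steps above (allocate at issue, fill on execution complete, copy on commit) can be sketched as follows. This is a minimal model under stated assumptions: the RenamedRegisterFile class and its method names are hypothetical, None stands for "result not ready yet", and the reorder buffer holds only bookkeeping, not results, as the slide describes.

```python
from collections import deque


class RenamedRegisterFile:
    def __init__(self, real):
        self.real = dict(real)  # architectural (real) registers
        self.rename = {}        # rename register id -> speculative value
        self.map = {}           # architectural name -> newest rename id
        self.rob = deque()      # in-order (rename id, dest); no results here
        self.next_id = 0

    def issue(self, dest):
        """Allocate a rename register for dest at issue."""
        rid = self.next_id
        self.next_id += 1
        self.rename[rid] = None
        self.map[dest] = rid
        self.rob.append((rid, dest))
        return rid

    def complete(self, rid, value):
        """Execution complete: the result goes into the rename register."""
        self.rename[rid] = value

    def read(self, reg):
        """Read the newest value for reg; None means it is not ready yet."""
        rid = self.map.get(reg)
        return self.real[reg] if rid is None else self.rename[rid]

    def commit(self):
        """Commit the oldest instruction if its result is ready."""
        if not self.rob or self.rename[self.rob[0][0]] is None:
            return False
        rid, dest = self.rob.popleft()
        self.real[dest] = self.rename.pop(rid)
        if self.map.get(dest) == rid:
            del self.map[dest]
        return True


rf = RenamedRegisterFile({"F0": 0.0, "F2": 2.0})
a = rf.issue("F0")
b = rf.issue("F2")
rf.complete(b, 5.0)                  # finishes out of order
print(rf.read("F2"), rf.real["F2"])  # 5.0 2.0
print(rf.commit())                   # False: the older instruction is not done
```

The speculative value of F2 is visible to later readers immediately, while the real register changes only when the instruction commits in program order.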