Professional Documents
Culture Documents
Vliw/Epic:: Statically Scheduled ILP
Vliw/Epic:: Statically Scheduled ILP
Joel Emer
Joel Emer
November 28, 2005
6.823, L21-2
Little’s Law
or
N = T ×
L
Throughput per Cycle
One Operation
Latency in Cycles
Joel Emer
November 28, 2005
6.823, L21-3
Issue Width W
Issue Group
Previously
Issued Lifetime L
Instructions
MIPS R10000
Control
Logic
[ SGI/MIPS
Technologies
Inc., 1995 ]
Joel Emer
Superscalar processor
The compiler:
• Schedules to maximize parallel execution
operations/instruction
Loop Unrolling
November 28, 2005
6.823, L21-11
Unroll 4 ways
loop: ld f1, 0(r1) Int1 Int 2 M1 M2 FP+ FPx
ld f2, 8(r1)
add r2 bne sd f8
add r2, 32
Software Pipelining
6.823, L21-13
sd f5 fadd f5
sd f8, -8(r2)
sd f6 fadd f6
bne r1, r3, loop
epilog
add r2 sd f7 fadd f7
bne sd f8 fadd f8
How many FLOPS/cycle?
sd f5
4 fadds / 4 cycles = 1
Joel Emer
Software Pipelining
November 28, 2005
6.823, L21-14
Startup overhead
Loop Iteration
time
Software Pipelined
performance
Loop Iteration
time
Joel Emer
November 28, 2005
ld r1, ()
Prolog
ld r1, () add r2, r1, #1
Loop ld r1, () add r2, r1, #1 st r2, ()
add r2, r1, #1 st r2, ()
Epilog
st r2, ()
P7
P6
RRB=3 P5
P4
P3
P2
R1 + P1
P0
Rotating Register Base (RRB) register points to base of
current register set. Value added on to logical register
specifier to give physical register number. Usually, split
into rotating and non-rotating registers.
Cydra-5:
6.823, L21-22
Joel Emer
November 28, 2005
6.823, L21-24
Joel Emer
November 28, 2005
6.823, L21-26
IA-64 Registers
b0: Inst 1 if
Inst 2
br a==b, b2
Inst 1
b1: Inst 3 else Inst 2
Inst 4 p1,p2 <- cmp(a==b)
br b3 Predication (p1) Inst 3 || (p2) Inst 5
(p1) Inst 4 || (p2) Inst 6
b2: Inst 5 Inst 7
then Inst 8
Inst 6
Dynamic Execution
Joel Emer
November 28, 2005
6.823, L21-29
0x4 nop
E M W
IR IR IR
Add
ASrc 31
we
rs1
rs2
A
PC addr D rd1 we
inst IR ws ALU Y addr
wd rd2
Inst GPRs B rdata
Memory Data
Imm Memory R
wdata
Ext wdata
BSrc
MD1 MD2
Load r1
Use r1 Check for exception in
Inst 3 Chk.s r1 original home block
Use r1 jumps to fixup code if
Inst 3
Can’t move load above branch exception detected
because might cause spurious
exception
Joel Emer
November 28, 2005
Clustered VLIW
6.823, L21-32
• Unpredictable branches
• Variable memory latency
(unpredictable cache misses)
• Code size explosion
• Compiler complexity
Question:
How applicable are the VLIW-inspired
techniques to traditional RISC/CISC
processor architectures?
Joel Emer
November 28, 2005
6.823, L21-34
Thank you !