VLIW hardware is simple and straightforward: a VLIW instruction separately directs each functional unit.
One VLIW instruction, one operation per functional unit (FU):
FU: add r1,r2,r3
FU: load r4,r5+4
FU: mov r6,r2
FU: mul r7,r8,r9
Figure labels: datapath control; Bit Slice ALU (x3)
Aggressive (but tedious) hand programming at the microcode level provided performance well above that of sequential processors.
Copyright 2001, James C. Hoe, CMU and John P. Shen, Intel
VLIW Instruction (100 - 1000 bits)
- Multiple functional units execute all of the operations in an instruction concurrently, providing fine-grain parallelism within each instruction.
- Instructions directly control the hardware with no interpretation and minimal decoding.
- A powerful optimizing compiler is responsible for locating and extracting ILP from the program and for scheduling operations to exploit the available parallel resources.
- The processor does not make any run-time control decisions below the program level.
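The concurrent execution of all operations in one instruction can be illustrated with a minimal Python sketch. The tuple format `(opcode, dest, src1, src2)`, the `OPS` table, and the `execute` helper are hypothetical names chosen for illustration. The sketch also assumes that every slot reads its operands before any slot writes a result, which the slides do not state explicitly.

```python
import operator

# Hypothetical opcode table; a real VLIW machine has many more operations.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def execute(instruction, regs):
    """Execute one VLIW instruction.

    Assumption: all slots read the register state from before the
    instruction, and all results are written back together afterwards.
    """
    results = {}
    for slot in instruction:
        if slot is None:          # an empty slot is a nop
            continue
        opcode, dest, src1, src2 = slot
        results[dest] = OPS[opcode](regs[src1], regs[src2])
    new_regs = dict(regs)         # leave the caller's state untouched
    new_regs.update(results)
    return new_regs

regs = {"r1": 0, "r2": 2, "r3": 3, "r7": 0, "r8": 4, "r9": 5}
instruction = (
    ("add", "r1", "r2", "r3"),    # r1 = r2 + r3, reads the old r2
    ("mul", "r7", "r8", "r9"),    # r7 = r8 * r9
    None,                         # nop
    ("sub", "r2", "r2", "r3"),    # r2 = r2 - r3, written after all reads
)
after = execute(instruction, regs)
print(after["r1"], after["r7"], after["r2"])
```

Note that the `add` slot sees the old value of `r2` even though the `sub` slot in the same instruction overwrites it. Nothing in the hardware model checks for conflicts between slots; in a VLIW design that responsibility falls to the compiler.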
Figure labels: Instruction Memory; Sequencer; Condition Codes; Functional Unit (x4), e.g. FAdd (1 cycle), FDiv (16 cycles)
Compatibility issues still limit interest in general-purpose VLIW technology. However, VLIW may be the only way to build 8-16 operations/cycle machines.
Example

Original Program (3-Address Code)
A: r1 = a + b
B: r2 = c + d
C: e = r1 * r2
D: b = b + 1

Dependency Graph
C depends on A and B; D has no true dependence on A, B, or C.

VLIW Instructions
00: add a,b,r1 | add c,d,r2 | add b,1,b
01: mul r1,r2,e | nop | nop
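The packing in this example can be reproduced with a small greedy scheduler, sketched below in Python. This is an illustration under stated assumptions, not the compiler algorithm the slides have in mind. The `Op` class, `schedule`, and `format_bundles` are hypothetical names. The sketch assumes every operation has a latency of one cycle, that any operation can go in any slot, and that operands are read before results are written within an instruction (so a write-after-read pair may share an instruction, as `add a,b,r1` and `add b,1,b` do above).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    label: str
    opcode: str
    srcs: tuple
    dest: str

def schedule(ops, width=3):
    """Greedily place each op, in program order, in the earliest legal instruction."""
    cycle_of = {}   # label -> index of the instruction holding that op
    bundles = []    # bundles[i] is the list of ops in instruction i
    for i, op in enumerate(ops):
        earliest = 0
        for prev in ops[:i]:
            c = cycle_of[prev.label]
            if prev.dest in op.srcs:      # true dependence: needs prev's result
                earliest = max(earliest, c + 1)
            if prev.dest == op.dest:      # output dependence: keep write order
                earliest = max(earliest, c + 1)
            if op.dest in prev.srcs:      # anti-dependence: same cycle is fine
                earliest = max(earliest, c)
        cycle = earliest
        while cycle < len(bundles) and len(bundles[cycle]) >= width:
            cycle += 1                    # instruction is full, try the next one
        while len(bundles) <= cycle:
            bundles.append([])
        bundles[cycle].append(op)
        cycle_of[op.label] = cycle
    return bundles

def format_bundles(bundles, width=3):
    """Render instructions in the slide's notation, padding empty slots with nop."""
    lines = []
    for n, bundle in enumerate(bundles):
        slots = [f"{op.opcode} {','.join(op.srcs)},{op.dest}" for op in bundle]
        slots += ["nop"] * (width - len(slots))
        lines.append(f"{n:02d}: " + " | ".join(slots))
    return lines

program = [
    Op("A", "add", ("a", "b"), "r1"),
    Op("B", "add", ("c", "d"), "r2"),
    Op("C", "mul", ("r1", "r2"), "e"),
    Op("D", "add", ("b", "1"), "b"),
]
for line in format_bundles(schedule(program)):
    print(line)
# 00: add a,b,r1 | add c,d,r2 | add b,1,b
# 01: mul r1,r2,e | nop | nop
```

With `width=1` the same routine falls back to four sequential instructions, which gives a rough sense of how much of the speedup comes from the extra slots rather than from reordering.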
Difficulties with variable memory latencies (caching)
VLIW memory systems can be very complex
- Simple memory systems may provide very low performance
- Program-controlled multi-layer, multi-banked memory
Exceptions and interrupts are easily managed
Run-time behavior is highly predictable
- Allows real-time applications
- Greater potential for code optimization
Superscalar
yes no yes yes maybe yes (Resv. Stations) yes (renaming) yes yes no no
- DSP Processors (TI TMS320C6x)
- Intel/HP EPIC IA-64 (Explicitly Parallel Instruction Computing)
- Transmeta Crusoe (x86 on VLIW??)
- Sun MAJC (Microprocessor Architecture for Java Computing)