VLIW
Scalar Processors
Fetch and execute one instruction at a time.
A program represents a plan of execution.
The processor acts as an interpreter that executes the instructions in the program one at a time.
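The interpreter view of a scalar processor can be sketched as a fetch-decode-execute loop that handles exactly one instruction per step. This is a minimal illustration, not a model of any real machine: the three-operand tuple format and the `add`/`mul` opcodes are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction format: (opcode, dest, src1, src2).
Instruction = Tuple[str, str, str, str]

def run_scalar(program: List[Instruction], regs: Dict[str, int]) -> int:
    """Execute one instruction per step; return the number of steps taken."""
    pc = 0
    steps = 0
    while pc < len(program):
        opcode, dest, src1, src2 = program[pc]  # fetch
        a, b = regs[src1], regs[src2]           # decode: read the operands
        if opcode == "add":                     # execute
            regs[dest] = a + b
        elif opcode == "mul":
            regs[dest] = a * b
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        pc += 1
        steps += 1
    return steps

regs = {"r1": 2, "r2": 3, "r3": 0, "r4": 0}
prog = [("add", "r3", "r1", "r2"), ("mul", "r4", "r3", "r2")]
steps = run_scalar(prog, regs)
print(steps, regs["r3"], regs["r4"])  # 2 5 15
```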
[Figure: scalar processor pipeline stages: Fetch, Decode]
Superscalar Processors
The hardware decides which operations to issue.
More than one instruction is issued at a time.
Dynamic scheduling.
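Dynamic scheduling can be sketched as follows: each cycle, the hardware examines a limited window of not-yet-issued instructions and issues up to `degree` of them whose operands are ready. This is a simplified sketch under stated assumptions: every result is available one cycle after issue, register renaming (for example through a reorder buffer) removes false dependencies so only true data dependencies matter, and the instruction tuples are hypothetical. The `window` parameter illustrates why a constrained window limits the capacity to detect independent instructions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical instruction form: (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]

def true_deps(program: List[Instr]) -> List[Set[int]]:
    """For each instruction, the indices of earlier instructions whose results it reads."""
    last_writer: Dict[str, int] = {}
    deps: List[Set[int]] = []
    for i, (dest, srcs) in enumerate(program):
        deps.append({last_writer[s] for s in srcs if s in last_writer})
        last_writer[dest] = i
    return deps

def dynamic_issue(program: List[Instr], degree: int, window: int) -> List[List[int]]:
    """Return the instruction indices the hardware issues in each cycle."""
    if degree < 1 or window < 1:
        raise ValueError("degree and window must both be at least 1")
    deps = true_deps(program)
    finished: Set[int] = set()           # issued in an earlier cycle, result available
    pending = list(range(len(program)))  # not yet issued, in program order
    cycles: List[List[int]] = []
    while pending:
        issued: List[int] = []
        for i in pending[:window]:       # hardware examines only the window
            if len(issued) == degree:
                break
            if deps[i] <= finished:      # all producers have completed
                issued.append(i)
        for i in issued:
            pending.remove(i)
        finished.update(issued)
        cycles.append(issued)
    return cycles

program = [
    ("r1", ()),            # 0: no inputs
    ("r2", ("r1",)),       # 1: needs r1
    ("r3", ()),            # 2: independent
    ("r4", ()),            # 3: independent
    ("r5", ("r3", "r4")),  # 4: needs r3 and r4
]
print(dynamic_issue(program, degree=4, window=4))  # [[0, 2, 3], [1, 4]]
print(dynamic_issue(program, degree=4, window=2))  # [[0], [1, 2], [3], [4]]
```

In this example, halving the window doubles the cycle count even though the issue degree stays at 4, because the independent instructions 2 and 3 are not visible early enough.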
[Figure: superscalar processor organization: register file, reorder buffer, four execution units, data cache]
Execution in Superscalar
[Figure: superscalar execution with degree 4]
Disadvantages of Superscalar
Complex hardware.
Constrained window size, which limits the capacity to detect independent instructions.
Higher power consumption.
VLIW
Very Long Instruction Word.
Instructions are hundreds of bits long.
Uses a long instruction called a multiop.
Multiple functional units are used concurrently.
The functional units share a common register file.
Code compaction is done by the compiler.
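The concurrent use of functional units on a shared register file can be sketched by executing one multiop in a single step: every operation reads its operands before any result is written back. This is a minimal sketch; the tuple format, the opcode table, and the rule that no two operations in one word write the same register (normally guaranteed by the compiler) are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical operation format: (opcode, dest, src1, src2).
Op = Tuple[str, str, str, str]

ALU: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def execute_multiop(word: List[Op], regs: Dict[str, int]) -> None:
    """Execute every operation of one long instruction in the same cycle."""
    dests = [dest for _, dest, _, _ in word]
    if len(dests) != len(set(dests)):
        # Assumed compiler guarantee: no two writes to one register per word.
        raise ValueError("two operations in one word write the same register")
    # All functional units read the shared register file first...
    results = [(dest, ALU[opc](regs[s1], regs[s2])) for opc, dest, s1, s2 in word]
    # ...and write back only at the end of the cycle.
    for dest, value in results:
        regs[dest] = value

regs = {"r0": 0, "r1": 10, "r2": 20}
# Both operations see the old values of r1 and r2, so this word swaps them.
execute_multiop([("add", "r1", "r2", "r0"), ("add", "r2", "r1", "r0")], regs)
print(regs["r1"], regs["r2"])  # 20 10
```

Executing the same two operations one after the other would leave both registers at 20; the read-all-then-write-all step is what makes the operations behave as if they ran concurrently.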
A Brief History
Joseph Fisher developed trace scheduling in 1979.
He coined the acronym VLIW.
In 1984, two companies were started:
Multiflow, founded by Joseph Fisher
Cydrome, founded by Bob Rau
In 1987, Cydrome delivered its first machine, the 256-bit Cydra 5.
Multiflow delivered:
Trace/200 (1987)
Trace/300 (1988)
Trace/500 (1990)
Cydrome closed in 1988 and Multiflow closed in 1990.
Since then, VLIW machines have seen a revival and some degree of success.
[Figure: VLIW processor organization: instruction register, shared register file, four execution units, data cache]
Instruction Format
[Figure: a long instruction word with one slot per functional unit: FP ADD | FP MULT | INT ALU | Branch | Load/Store]
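Filling the fixed slots of such an instruction word can be sketched as below. The slot names follow the instruction format above; representing operations as assembly-like strings is a hypothetical choice, and the sketch assumes at most one operation per functional unit per word. Every slot without an operation has to be filled with an explicit NOP.

```python
from typing import Dict, List

# Slot order of the long instruction word, one slot per functional unit.
SLOTS = ["FP ADD", "FP MULT", "INT ALU", "Branch", "Load/Store"]

def pack_word(ops: Dict[str, str]) -> List[str]:
    """Place each operation in its unit's slot; every unused slot becomes a NOP."""
    unknown = set(ops) - set(SLOTS)
    if unknown:
        raise ValueError(f"no slot for unit(s): {sorted(unknown)}")
    return [ops.get(slot, "NOP") for slot in SLOTS]

# Hypothetical operations for the integer ALU and the load/store unit.
word = pack_word({"INT ALU": "add r3, r1, r2", "Load/Store": "ld r4, 0(r5)"})
print(word)  # ['NOP', 'NOP', 'add r3, r1, r2', 'NOP', 'ld r4, 0(r5)']
```

Keying the input by functional unit enforces one operation per slot; deciding which cycle each operation belongs to is left to the compiler's scheduler.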
VLIW Execution
[Figure: VLIW execution with degree 4]
Case Studies
Defoe
Intel Itanium processor
Defoe Architecture
[Figure: Defoe block diagram: L2 cache interface, dispersal unit, two load/store units, branch/compare unit, D-cache]
Instruction Encoding
A 64-bit compressed VLIW architecture.
Uses variable-length multiops.
Individual operations are encoded as 32-bit words.
A special stop bit indicates the end of an instruction word.
Stop bit (1 bit) | Predicate (4 bits) | OPCODE (9 bits) | RDEST (6 bits) | RSRC 1 (6 bits) | RSRC 2 (6 bits)
Abhilash.P.K.
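Packing the fields listed above into a 32-bit operation, and using the stop bit to group operations into variable-length instruction words, can be sketched as follows. The field widths (1 + 4 + 9 + 6 + 6 + 6 = 32 bits) come from the slide; placing the stop bit in the most significant position and the field values used in the example are assumptions.

```python
from typing import Dict, List, Tuple

# (field, width in bits), assumed to run from the most significant bit down.
FIELDS: List[Tuple[str, int]] = [
    ("stop", 1), ("pred", 4), ("opcode", 9),
    ("rdest", 6), ("rsrc1", 6), ("rsrc2", 6),
]

def encode(**values: int) -> int:
    """Pack named field values into one 32-bit operation; missing fields are 0."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    word = 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | v
    return word

def decode(word: int) -> Dict[str, int]:
    """Unpack a 32-bit operation back into its fields."""
    fields: Dict[str, int] = {}
    for name, width in reversed(FIELDS):  # least significant field first
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields

def split_words(ops: List[int]) -> List[List[int]]:
    """Group a stream of operations into instruction words ending at a stop bit."""
    words: List[List[int]] = []
    current: List[int] = []
    for op in ops:
        current.append(op)
        if decode(op)["stop"]:
            words.append(current)
            current = []
    if current:
        raise ValueError("operation stream ends without a stop bit")
    return words

op = encode(stop=1, opcode=5, rdest=3, rsrc1=1, rsrc2=2)
print(hex(op))  # 0x80143042
stream = [encode(opcode=1), encode(opcode=2, stop=1), encode(opcode=3, stop=1)]
print([len(w) for w in split_words(stream)])  # [2, 1]
```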
Instruction Format
Instructions are either 64 or 128 bits long.
Each instruction (a molecule) is made up of individual operations (atoms).
64 general-purpose registers (GPRs).
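Splitting a molecule into its atoms can be sketched as below. The slide gives only the molecule lengths (64 or 128 bits); the 32-bit atom width, which yields two or four atoms per molecule, and the lowest-field-first atom order are assumptions made for illustration.

```python
from typing import List

ATOM_BITS = 32  # assumption: every atom occupies a 32-bit field

def atoms_of(molecule: int, molecule_bits: int) -> List[int]:
    """Split a molecule into atoms, lowest-order field first (assumed order)."""
    if molecule_bits not in (64, 128):
        raise ValueError("a molecule is either 64 or 128 bits long")
    if not 0 <= molecule < (1 << molecule_bits):
        raise ValueError("value does not fit in the molecule width")
    mask = (1 << ATOM_BITS) - 1
    return [(molecule >> (i * ATOM_BITS)) & mask
            for i in range(molecule_bits // ATOM_BITS)]

print(len(atoms_of(0, 64)), len(atoms_of(0, 128)))  # 2 4
print([hex(a) for a in atoms_of(0x11112222_33334444, 64)])
# ['0x33334444', '0x11112222']
```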
Compiler Support
Instruction scheduling algorithms are critical.
Three important scheduling algorithms:
Trace scheduling
Trace scheduling-2
Superblock scheduling
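The core compiler step is compacting independent operations into long instruction words. The sketch below shows only that step, using plain list scheduling inside a single basic block; this is a simpler, swapped-in technique that does not schedule across branches the way trace or superblock scheduling does. The one-cycle latency, the per-unit slot counts, the critical-path priority, and the example block are all assumptions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical basic block: op name -> (functional unit, ops whose results it uses).
Block = Dict[str, Tuple[str, List[str]]]

def list_schedule(block: Block, units: Dict[str, int]) -> List[List[str]]:
    """Compact a basic block into long instruction words.

    Assumes a one-cycle latency for every operation, at most units[u]
    operations of type u per word, and acyclic dependencies. Ready
    operations are taken longest-path-to-the-end first (critical path).
    """
    users: Dict[str, List[str]] = {op: [] for op in block}
    for op, (_, deps) in block.items():
        for d in deps:
            users[d].append(op)

    height: Dict[str, int] = {}
    def path_len(op: str) -> int:
        if op not in height:
            height[op] = 1 + max((path_len(u) for u in users[op]), default=0)
        return height[op]

    scheduled: Set[str] = set()  # ops placed in earlier words
    words: List[List[str]] = []
    while len(scheduled) < len(block):
        # Ready = all producers were placed in an earlier word.
        ready = [op for op, (_, deps) in block.items()
                 if op not in scheduled and all(d in scheduled for d in deps)]
        ready.sort(key=lambda op: (-path_len(op), op))
        free = dict(units)  # slots still open in this word
        word: List[str] = []
        for op in ready:
            unit = block[op][0]
            if free.get(unit, 0) > 0:
                free[unit] -= 1
                word.append(op)
        if not word:
            raise ValueError("cannot schedule: missing unit or cyclic dependency")
        scheduled.update(word)
        words.append(word)
    return words

block: Block = {
    "ld1": ("Load/Store", []),
    "ld2": ("Load/Store", []),
    "add": ("INT ALU", ["ld1", "ld2"]),
    "mul": ("FP MULT", []),
    "st": ("Load/Store", ["add"]),
}
one_port = {"FP ADD": 1, "FP MULT": 1, "INT ALU": 1, "Branch": 1, "Load/Store": 1}
print(list_schedule(block, one_port))
# [['ld1', 'mul'], ['ld2'], ['add'], ['st']]
print(list_schedule(block, {**one_port, "Load/Store": 2}))
# [['ld1', 'ld2', 'mul'], ['add'], ['st']]
```

The independent `mul` is pulled up into the first word, and adding a second load/store slot shortens the block from four words to three.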
Advantages
Less hardware complexity.
Static scheduling, so much more hardware can be devoted to useful computation.
Software has a larger window to look at, so it can find more ILP.
Shortcomings
Wasteful encoding with NOPs: the compiler has to explicitly insert NOPs into unused slots.
Increased program size.
Hard to maintain code compatibility between generations.
New versions of the architecture can force major rewriting of the compiler.
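The program-size cost of NOP padding can be worked through with a small calculation. The sketch assumes five 32-bit slots per word (the FP ADD, FP MULT, INT ALU, Branch, and Load/Store format) and compares that with a compressed encoding in which a stop bit ends each word, as in the Defoe encoding, so only real operations are stored. The four-word schedule is a hypothetical example.

```python
from typing import List

SLOT_BITS = 32      # assumption: every slot or operation is a 32-bit field
SLOTS_PER_WORD = 5  # FP ADD, FP MULT, INT ALU, Branch, Load/Store

def fixed_size_bits(schedule: List[List[str]]) -> int:
    """Every word is full width; unused slots hold explicit NOPs."""
    return len(schedule) * SLOTS_PER_WORD * SLOT_BITS

def compressed_size_bits(schedule: List[List[str]]) -> int:
    """Only real operations are stored and a stop bit marks the last one;
    an empty cycle still needs one NOP to carry the stop bit."""
    return sum(max(len(word), 1) for word in schedule) * SLOT_BITS

schedule = [["ld1", "mul"], ["ld2"], ["add"], ["st"]]
fixed = fixed_size_bits(schedule)
compressed = compressed_size_bits(schedule)
print(fixed, compressed, 1 - compressed / fixed)  # 640 160 0.75
```

Here 15 of the 20 fixed slots are NOPs, so the padded encoding is four times larger than the compressed one.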
Future of VLIW
Newer VLIW processors are mainly used for:
Stream and image processing, e.g. Philips TriMedia
Digital signal processing, e.g. TMS320C62x from Texas Instruments
Mobile computing, e.g. Transmeta Crusoe
High-end server applications, e.g. Intel Itanium
Stream and media processing, with their large amounts of ILP, lend themselves to the VLIW style.
Superscalars will be forced to use simpler structures and seek help from software.