
VLIW ARCHITECTURE

Increasing Processor Performance


Semiconductor Technology
Parallel Processing
    Multiprocessors, Multicomputers
    Parallelism within the Processor
        Pipelining
        ILP

VLIW

ILP (Instruction Level Parallelism)


Parallel execution of instructions.
Overlapping of instructions.
ILP processors:
    Superscalar processors
    VLIW processors


Scalar Processors
Fetch and execute one instruction at a time.
A program represents a plan of execution.
The processor acts as an interpreter that executes the instructions in the program one at a time.


Execution in a Scalar Processor

Fetch Decode Execute Write Back

Superscalar processors
More than one instruction is issued at a time.
Decisions about which operations run in parallel are made by hardware.
Dynamic scheduling.


Basic Superscalar Approach


INSTRUCTION CACHE
INSTRUCTION BUFFERS, DECODERS, DISPATCHER
REGISTER FILE
REORDER BUFFER
EXECUTION UNIT #1
EXECUTION UNIT #2
EXECUTION UNIT #3
EXECUTION UNIT #4
DATA CACHE

Execution in Superscalar

Fetch Decode Execute Write Back

With degree 4
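
As a rough illustration of what issue degree 4 buys, the sketch below counts cycles for an ideal four-stage pipeline (Fetch, Decode, Execute, Write Back). It assumes one cycle per stage, no stalls, and no dependences between instructions, which real programs rarely satisfy. The function name `pipeline_cycles` and the instruction count of 12 are illustrative and do not come from the slides.

```python
import math

def pipeline_cycles(num_instructions: int, stages: int = 4, degree: int = 1) -> int:
    """Ideal cycle count: fill the pipeline once, then complete
    `degree` instructions per cycle."""
    if num_instructions <= 0:
        return 0
    groups = math.ceil(num_instructions / degree)  # issue groups of up to `degree`
    return stages + groups - 1

scalar = pipeline_cycles(12, degree=1)  # 4 + 12 - 1 cycles
wide = pipeline_cycles(12, degree=4)    # 4 + 3 - 1 cycles
print(scalar, wide, scalar / wide)
```

Under these assumptions the speedup approaches the issue degree only for long instruction streams, because the pipeline fill time is paid once in both cases.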

Disadvantages of Superscalar
Complex hardware.
Constrained window size, which limits the capacity to detect independent instructions.
Higher power consumption.


VLIW
Very Long Instruction Word.
Instructions are hundreds of bits in length.
Each long instruction is called a MultiOp.
Multiple functional units are used concurrently.
Functional units share a common register file.
Code compaction is done by the compiler.


A Brief History
Joseph Fisher, trace scheduling, 1979.
He coined the acronym VLIW.
In 1984, two companies were started:
Multiflow, started by Joseph Fisher
Cydrome, founded by Bob Rau.


In 1987, Cydrome delivered its first machine, the 256-bit Cydra 5.
Multiflow delivered:
Trace/200 - 1987
Trace/300 - 1988
Trace/500 - 1990


Cydrome closed in 1988.
Multiflow closed in 1990.
Since then, VLIW machines have seen a revival and some degree of success.


Basic VLIW Approach


INSTRUCTION CACHE
INSTRUCTION REGISTER
REGISTER FILE
EXECUTION UNIT #1
EXECUTION UNIT #2
EXECUTION UNIT #3
EXECUTION UNIT #4
DATA CACHE

Instruction Format
Instruction word slots: FP ADD | FP MULT | INT ALU | Branch | Load/Store
Instruction Issue Unit
Functional units: FP ADD, FP MULT, INT ALU, Branch, Load/Store
Register File
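
One way to picture this format is as a fixed record with one slot per functional unit, where any slot the compiler cannot fill holds a NOP. The sketch below uses the five slot names from the slide. The `make_multiop` helper, the `"nop"` marker, and the operation strings are hypothetical and chosen only for illustration.

```python
SLOTS = ("FP ADD", "FP MULT", "INT ALU", "Branch", "Load/Store")
NOP = "nop"

def make_multiop(ops: dict) -> tuple:
    """Build one long instruction word: one operation per slot,
    with NOP in every slot that has no operation."""
    unknown = set(ops) - set(SLOTS)
    if unknown:
        raise ValueError(f"no such functional unit: {sorted(unknown)}")
    return tuple(ops.get(slot, NOP) for slot in SLOTS)

word = make_multiop({"INT ALU": "add r1, r2, r3", "Load/Store": "ld r4, 0(r5)"})
for slot, op in zip(SLOTS, word):
    print(f"{slot:<10} {op}")
```

Because the slot position alone selects the functional unit, the issue hardware needs no dispatch logic; the cost is that unused slots still occupy space in the instruction word.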


VLIW Execution

Fetch Decode Execute Write Back

With degree 4

Case Studies
Defoe.
Intel Itanium Processor.
Transmeta Crusoe Processor.


Defoe Architecture
Block diagram components:
From L2 Cache
Score Board & Fetch
Dispersal Unit
Simple Integer (x2), Complex Integer
Load/Store (x2)
Branch/Cmp
64-entry Register File
16 x Pred
D-Cache
To L2 Cache

Instruction Encoding
64-bit compressed VLIW architecture.
Uses variable-length MultiOps.
Individual operations are encoded as 32-bit words.
A special stop bit indicates the end of an instruction word.

Stop bit (1 bit) | Predicate (4 bits) | OPCODE (9 bits) | RDEST (6 bits) | RSRC 1 (6 bits) | RSRC 2 (6 bits)

Abhilash.P.K.
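
The field widths above add up to 32 bits, so one operation fits in a single word. The sketch below packs and unpacks such a word and groups a stream of operations into MultiOps using the stop bit. The bit ordering (stop bit as the most significant bit) is an assumption, since the slide gives only the widths, and the names `encode`, `decode`, and `split_multiops` are hypothetical.

```python
# Field widths from the slide, listed from most to least significant bit.
# The ordering is an assumption; only the widths are given in the source.
FIELDS = (("stop", 1), ("predicate", 4), ("opcode", 9),
          ("rdest", 6), ("rsrc1", 6), ("rsrc2", 6))

def encode(**values: int) -> int:
    """Pack the named fields into one 32-bit word (missing fields are 0)."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word

def decode(word: int) -> dict:
    """Unpack a 32-bit word into its fields, starting from the low bits."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields

def split_multiops(words):
    """Group operations into MultiOps; a set stop bit ends the current one."""
    groups, current = [], []
    for w in words:
        current.append(w)
        if decode(w)["stop"]:
            groups.append(current)
            current = []
    if current:  # trailing operations with no stop bit seen yet
        groups.append(current)
    return groups

a = encode(opcode=1)
b = encode(opcode=2, stop=1)
c = encode(opcode=3, stop=1)
print([len(g) for g in split_multiops([a, b, c])])
```

Marking the end of each MultiOp with a stop bit lets short instruction words stay short, which is how a compressed encoding avoids storing a NOP for every idle unit.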

Intel Itanium Processor


Intel's first implementation of IA-64.
IA-64 is an ISA for the EPIC (Explicitly Parallel Instruction Computing) style of VLIW, developed jointly by Intel and HP.


64-bit processor, with:
4 integer units
4 multimedia units
2 load/store units
2 extended-precision floating-point units
2 single-precision floating-point units

Transmeta Crusoe Processor


Designed to reduce power consumption.
Dynamic scheduling consumes more power.
VLIW replaces the complex hardware mechanisms for gaining ILP with simpler and more power-efficient ones.


Instruction Format
Instructions (molecules) are either 64 or 128 bits long.
Each molecule is made up of atoms (individual operations).
64 general-purpose registers (GPRs).


Compiler Support
Instruction scheduling algorithms are critical.
Three important scheduling algorithms:
Trace scheduling
Trace scheduling-2
Superblock scheduling
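
To give a feel for what a static scheduler does, the sketch below packs the operations of one basic block into long instruction words. This is plain greedy list scheduling, not trace or superblock scheduling, which go further and move operations across branch boundaries. It assumes that every operation takes one cycle and that any slot can run any operation. The name `list_schedule` and the example dependence graph are hypothetical.

```python
def list_schedule(ops, deps, width):
    """Greedy list scheduling within one basic block.

    ops   : operation names in program order
    deps  : dict mapping an operation to the set of operations it must wait for
    width : number of issue slots per long instruction word
    """
    done, schedule = set(), []
    remaining = list(ops)
    while remaining:
        # An operation is ready once everything it depends on ran in an earlier word.
        ready = [op for op in remaining if deps.get(op, set()) <= done][:width]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependence")
        schedule.append(ready)
        done.update(ready)
        remaining = [op for op in remaining if op not in ready]
    return schedule

ops = ["a", "b", "c", "d", "e"]
deps = {"c": {"a", "b"}, "d": {"c"}, "e": {"a"}}
for cycle, word in enumerate(list_schedule(ops, deps, width=2)):
    print(cycle, word)
```

With two slots the five operations fit in three words instead of five, and the dependence chain a, c, d sets the lower bound regardless of how many slots are added.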


Advantages
Less hardware complexity.
Static scheduling.
Much more hardware can be devoted to useful computation.
Software has a larger window to look at, so it can find more ILP.

Shortcomings
Wasteful encoding with NOPs.
The compiler has to explicitly add NOPs.
Increased program size.
Hard to maintain code compatibility between generations.
New versions of the architecture can force major rewriting of the compiler.
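
The cost of NOP padding is easy to estimate for a fixed-width encoding: every slot that holds no real operation is wasted. The sketch below computes that fraction for a made-up schedule. The name `nop_overhead`, the 4-slot width, and the example schedule are assumptions for illustration, not measurements of any real machine.

```python
def nop_overhead(schedule, width):
    """Return (nop_slots, total_slots, wasted_fraction) for a list of
    long instruction words, each given as the list of real operations it holds."""
    total = len(schedule) * width
    used = sum(len(word) for word in schedule)
    nops = total - used
    return nops, total, (nops / total if total else 0.0)

# Hypothetical schedule on a 4-slot machine: 4 real operations in 3 words.
schedule = [["a", "b"], ["c"], ["d"]]
nops, total, wasted = nop_overhead(schedule, width=4)
print(f"{nops} of {total} slots are NOPs ({wasted:.0%})")
```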

Future of VLIW
Newer processors are mainly used for:
Stream and image processing, e.g. Philips TriMedia
Digital signal processing, e.g. TMS320C62x from Texas Instruments
Mobile computing, e.g. Transmeta Crusoe
High-end server applications, e.g. Intel Itanium


Stream and media processing lend themselves to VLIW style with large amounts of ILP.
Superscalars will be forced to use simpler structures and seek help from software.
