Professional Documents
Culture Documents
Only 6 instructions
CKV
Vector Execution Time
Execution Time depends on three factors:
V3 <- V1 * V2
Simplifies control of deep pipeline because elements in
vector are independent No Hazards
CKV
Vector Execution Time
ADDVVV4, V2, V3
Separate convoys?
Sequences with RAW hazards can be in the same convey
via chaining.
Vector version of register bypassing
Allows a vector operation to start as soon as the individual
elements of its vector source operand become available
CKV
Vector Execution Time
Chaining Flexible Chaining
LV V1
MULV V3, V1, V2 V1 V2 V3 V4 V5
ADDV V5, V3, V4
Chain Chain
Load Unit
Memory
CKV
Vector Chaining Advantage
Without chaining, must wait for last element of result to be
written before starting dependent instruction
Load
Mul
Add
Time
With chaining, can start dependent instruction as soon as
first result appears
Load
Mul
Add
CKV
Vector Execution Time
Timing metric to estimate the time for a convoy
Chime
Application and architecture must support long vectors.
Assume the same as Cray‐1
Floating‐point add => 6 clock cycles
Floating‐point multiply => 7 clock cycles
Floating‐point divide => 20 clock cycles
Vector load => 12 clock cycles
CKV
Multiple Lanes
A[5] B[5]
Lane
A[4] B[4]
A[3] B[3]
A[2] B[2]
A[1] B[1]
+ + + + +
C[0] C[0] C[1] C[2] C[3]
CKV
Multiple Lanes
CKV
Vector Length Registers
“Strip‐mining”
Break loops into pieces that fit into vector registers
Modify