
a) Loop:  L.D     F0, 0(R1)      ; F0 = array element
          ADD.D   F4, F0, F2     ; add scalar in F2
          S.D     F4, 0(R1)      ; store result
          DADDUI  R1, R1, #-8    ; decrement pointer by 8 bytes (per DW)
          BNE     R1, R2, Loop   ; branch if R1 != R2

b) Stalls for part (a) are shown as follows:

                                   Clock cycle issued
   Loop:  L.D     F0, 0(R1)        1
          stall                    2
          ADD.D   F4, F0, F2       3
          stall                    4
          stall                    5
          S.D     F4, 0(R1)        6
          DADDUI  R1, R1, #-8      7
          stall                    8
          BNE     R1, R2, Loop     9
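The issue cycles in (b) can be reproduced with a small sketch. The stall counts between producer and consumer below are assumptions, since the solution does not state them; they are chosen to match the table in (b). The names STALLS, LOOP and issue_cycles are illustrative only.

```python
# Stalls required between a producing instruction and a dependent consumer.
# ASSUMED values (not stated in the solution), chosen to match part (b).
STALLS = {
    ("load", "fpalu"): 1,    # L.D result used by ADD.D
    ("fpalu", "store"): 2,   # ADD.D result used by S.D
    ("int", "branch"): 1,    # DADDUI result used by BNE
}

# Loop body from part (a): (name, kind, destination register, source registers)
LOOP = [
    ("L.D",    "load",   "F0", ["R1"]),
    ("ADD.D",  "fpalu",  "F4", ["F0", "F2"]),
    ("S.D",    "store",  None, ["F4", "R1"]),
    ("DADDUI", "int",    "R1", ["R1"]),
    ("BNE",    "branch", None, ["R1", "R2"]),
]

def issue_cycles(program):
    """Return (name, issue cycle) pairs for an in-order, single-issue pipeline."""
    produced = {}   # register -> (kind of producer, cycle the producer issued)
    result = []
    cycle = 0
    for name, kind, dest, sources in program:
        cycle += 1  # earliest slot: one cycle after the previous instruction
        for src in sources:
            if src in produced:
                prod_kind, prod_cycle = produced[src]
                gap = STALLS.get((prod_kind, kind), 0)
                cycle = max(cycle, prod_cycle + gap + 1)
        result.append((name, cycle))
        if dest is not None:
            produced[dest] = (kind, cycle)
    return result

for name, cycle in issue_cycles(LOOP):
    print(f"{name:7s} issues in cycle {cycle}")
```

Every cycle number missing between two consecutive issue cycles corresponds to one stall row in the table.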

c) The code from (b) is scheduled as follows:

   Loop:  L.D     F0, 0(R1)
          DADDUI  R1, R1, #-8
          ADD.D   F4, F0, F2
          stall
          stall
          S.D     F4, 8(R1)
          BNE     R1, R2, Loop

d) The scheduled loop takes 7 clock cycles, so 2 clock cycles are saved compared with the 9 cycles in (b).

e) We unroll the loop as follows:

   Loop:  L.D     F0, 0(R1)
          ADD.D   F4, F0, F2
          S.D     F4, 0(R1)        ; drop DADDUI & BNE
          L.D     F6, -8(R1)
          ADD.D   F8, F6, F2
          S.D     F8, -8(R1)       ; drop DADDUI & BNE
          L.D     F10, -16(R1)
          ADD.D   F12, F10, F2
          S.D     F12, -16(R1)
          DADDUI  R1, R1, #-24
          BNE     R1, R2, Loop

f) We schedule the unrolled loop as follows:

   Loop:  L.D     F0, 0(R1)
          L.D     F6, -8(R1)
          L.D     F10, -16(R1)
          ADD.D   F4, F0, F2
          ADD.D   F8, F6, F2
          ADD.D   F12, F10, F2
          S.D     F4, 0(R1)
          S.D     F8, -8(R1)
          DADDUI  R1, R1, #-24
          S.D     F12, 8(R1)
          BNE     R1, R2, Loop

g) The execution time of the scheduled unrolled loop is 11 clock cycles per iteration: 11 instructions issue in 11 cycles with no stalls.
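To compare the three versions, the cycle counts can be normalized per array element. The sketch below assumes 9 cycles for (b), 7 issue slots for the schedule in (c), and 11 cycles for the unrolled loop, which handles 3 elements per iteration. The names versions and cycles_per_element are illustrative only.

```python
from fractions import Fraction

# label -> (clock cycles per loop iteration, array elements per iteration)
versions = {
    "unscheduled (b)": (9, 1),
    "scheduled (c)": (7, 1),
    "unrolled and scheduled (f)": (11, 3),
}

def cycles_per_element(cycles, elements):
    """Exact cost of processing one array element."""
    return Fraction(cycles, elements)

baseline = cycles_per_element(*versions["unscheduled (b)"])
for label, (cycles, elements) in versions.items():
    cost = cycles_per_element(cycles, elements)
    speedup = baseline / cost
    print(f"{label}: {float(cost):.2f} cycles per element, "
          f"speedup {float(speedup):.2f}x over (b)")
```

Under these counts the unrolled loop costs 11/3 cycles per element, compared with 9 for the original loop.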

2.

Instruction status

Instruction             Issue   Execute   Write result
L.D    F2, 32(R1)
L.D    F6, 44(R2)
MUL.D  F8, F6, F4
SUB.D  F0, F6, F2
DIV.D  F10, F8, F2
ADD.D  F2, F0, F6

Reservation stations

Name    Busy   Op     Vj   Vk                  Qj      Qk      A
Load1   no
Load2   yes    LOAD                                            45+Regs[R2]
Add1    yes    SUB         Mem[34+Regs[R1]]    Load2
Add2    yes    ADD                             Add1    Load2
Add3    no
Mult1   yes    MUL         Regs[F4]            Load2
Mult2   yes    DIV         Mem[34+Regs[R1]]    Mult1

Register status

Field   F0     F2     F4    F6      F8      F10     F12   ...   F30
Qi      Add1   Add2         Load2   Mult1   Mult2
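The Qj, Qk and Qi entries follow from the renaming rule applied at issue: each source operand is tagged with the station that will produce it, or left untagged when the value is already available. The sketch below is a minimal illustration. It assumes the first L.D has already written F2, that none of the remaining instructions has written its result yet, and that stations are assigned as in the tables. The names IN_FLIGHT and issue_all are illustrative only.

```python
# Instructions still in flight, in issue order:
# (reservation station, op, destination, first source, second source).
# ASSUMPTION: the first L.D has completed, so F2 is available as a value
# until the later ADD.D issues and claims it.
IN_FLIGHT = [
    ("Load2", "LOAD", "F6",  None, None),
    ("Mult1", "MUL",  "F8",  "F6", "F4"),
    ("Add1",  "SUB",  "F0",  "F6", "F2"),
    ("Mult2", "DIV",  "F10", "F8", "F2"),
    ("Add2",  "ADD",  "F2",  "F0", "F6"),
]

def issue_all(in_flight):
    """Issue instructions in order; return (station entries, register status)."""
    qi = {}        # register -> station that will write it
    stations = {}  # station -> Op, Qj, Qk
    for station, op, dest, src_j, src_k in in_flight:
        stations[station] = {
            "Op": op,
            # None means the operand is already available as a value (Vj or Vk)
            "Qj": qi.get(src_j),
            "Qk": qi.get(src_k),
        }
        qi[dest] = station  # later readers of dest wait on this station
    return stations, qi

stations, qi = issue_all(IN_FLIGHT)
for name, entry in stations.items():
    print(name, entry)
print("Register status:", qi)
```

An operand with no Q tag is the one that appears as a value in the Vj or Vk column, for example Regs[F4] for the MUL.D.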
