Loop: L.D     F0, 0(R1)       ; F0 = array element
      ADD.D   F4, F0, F2      ; add scalar in F2
      S.D     F4, 0(R1)       ; store result
      DADDUI  R1, R1, #-8     ; decrement pointer by 8 bytes (per DW)
      BNE     R1, R2, Loop    ; branch if R1 != R2
b) The stalls for the code in part (a) are as follows:

                                   Clock cycle issued
Loop: L.D     F0, 0(R1)            1
      stall                        2
      ADD.D   F4, F0, F2           3
      stall                        4
      stall                        5
      S.D     F4, 0(R1)            6
      DADDUI  R1, R1, #-8          7
      stall                        8
      BNE     R1, R2, Loop         9

c) The code from (b) can be scheduled as follows:

Loop: L.D     F0, 0(R1)
      DADDUI  R1, R1, #-8
      ADD.D   F4, F0, F2
      stall
      stall
      S.D     F4, 8(R1)
      BNE     R1, R2, Loop

d) The scheduled loop takes 7 clock cycles per iteration instead of 9, so 2 clock cycles are saved.

e) We unroll the loop three times as follows:

Loop: L.D     F0, 0(R1)
      ADD.D   F4, F0, F2
      S.D     F4, 0(R1)       ; drop DADDUI & BNE
      L.D     F6, -8(R1)
      ADD.D   F8, F6, F2
      S.D     F8, -8(R1)      ; drop DADDUI & BNE
      L.D     F10, -16(R1)
      ADD.D   F12, F10, F2
      S.D     F12, -16(R1)
      DADDUI  R1, R1, #-24
      BNE     R1, R2, Loop

f) We schedule the unrolled loop as follows:

Loop: L.D     F0, 0(R1)
      L.D     F6, -8(R1)
      L.D     F10, -16(R1)
      ADD.D   F4, F0, F2
      ADD.D   F8, F6, F2
      ADD.D   F12, F10, F2
      S.D     F4, 0(R1)
      S.D     F8, -8(R1)
      DADDUI  R1, R1, #-24
      S.D     F12, 8(R1)
      BNE     R1, R2, Loop
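The cycle counts for the loop variants in this problem can be checked with a small script. The following is a minimal sketch, not part of the original solution. It assumes a single-issue, in-order pipeline with no branch delay slot. The stall distances are assumptions inferred from the stall pattern in part (b): 1 stall between an FP load and a dependent FP add, 2 stalls between an FP add and a dependent store, and 1 stall between an integer op and a dependent branch. The names STALLS, KIND and cycles are hypothetical.

```python
# Stall cycles needed between a producer and a dependent consumer.
# ASSUMPTION: values inferred from the stall pattern shown in part (b).
STALLS = {
    ("load", "fpalu"): 1,
    ("fpalu", "store"): 2,
    ("int", "branch"): 1,
}

KIND = {"L.D": "load", "ADD.D": "fpalu", "S.D": "store",
        "DADDUI": "int", "BNE": "branch"}


def cycles(program):
    """Return the issue cycle of the last instruction of one loop body.

    program is a list of (opcode, destination or None, [source registers]).
    """
    produced = {}  # register -> (issue cycle, kind) of its latest producer
    clock = 0
    for op, dest, srcs in program:
        kind = KIND[op]
        issue = clock + 1  # in-order: at best one cycle after the previous one
        for reg in srcs:
            if reg in produced:
                when, producer = produced[reg]
                issue = max(issue, when + 1 + STALLS.get((producer, kind), 0))
        clock = issue
        if dest is not None:
            produced[dest] = (issue, kind)
    return clock


def body(dst_load, dst_add, offset):
    """One load/add/store copy of the loop body at the given offset."""
    return [("L.D", dst_load, ["R1"]),
            ("ADD.D", dst_add, [dst_load, "F2"]),
            ("S.D", None, [dst_add, "R1"])]


TAIL = [("DADDUI", "R1", ["R1"]), ("BNE", None, ["R1", "R2"])]

ORIGINAL = body("F0", "F4", 0) + TAIL

SCHEDULED = [("L.D", "F0", ["R1"]), ("DADDUI", "R1", ["R1"]),
             ("ADD.D", "F4", ["F0", "F2"]), ("S.D", None, ["F4", "R1"]),
             ("BNE", None, ["R1", "R2"])]

UNROLLED = (body("F0", "F4", 0) + body("F6", "F8", -8)
            + body("F10", "F12", -16) + TAIL)

UNROLLED_SCHEDULED = [
    ("L.D", "F0", ["R1"]), ("L.D", "F6", ["R1"]), ("L.D", "F10", ["R1"]),
    ("ADD.D", "F4", ["F0", "F2"]), ("ADD.D", "F8", ["F6", "F2"]),
    ("ADD.D", "F12", ["F10", "F2"]),
    ("S.D", None, ["F4", "R1"]), ("S.D", None, ["F8", "R1"]),
    ("DADDUI", "R1", ["R1"]), ("S.D", None, ["F12", "R1"]),
    ("BNE", None, ["R1", "R2"]),
]

if __name__ == "__main__":
    for name, prog in [("original", ORIGINAL), ("scheduled", SCHEDULED),
                       ("unrolled", UNROLLED),
                       ("unrolled+scheduled", UNROLLED_SCHEDULED)]:
        print(name, cycles(prog))
```

Under these assumptions the sketch reports 9 cycles for the original loop, 7 for the scheduled loop, 21 for the unrolled loop before scheduling, and 11 for the scheduled unrolled loop. The last figure covers three array elements, so it works out to about 3.67 cycles per element.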
2.
Instruction status

Instruction             Issue    Execute    Write result
L.D    F2, 32(R1)
L.D    F6, 44(R2)
MUL.D  F8, F6, F4
SUB.D  F0, F6, F2
DIV.D  F10, F8, F2
ADD.D  F2, F0, F6
Reservation stations

Name    Busy   Op     Vj    Vk                  Qj      Qk      A
Load1   no
Load2   yes    LOAD                                             45+Regs[R2]
Add1    yes    SUB          Mem[34+Regs[R1]]    Load2
Add2    yes    ADD                              Add1    Load2
Add3    no
Mult1   yes    MUL          Regs[F4]            Load2
Mult2   yes    DIV          Mem[34+Regs[R1]]    Mult1
Register status

Field   F0     F2     F4    F6      F8      F10     F12   ...   F30
Qi      Add1   Add2         Load2   Mult1   Mult2
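The Qj/Qk tags and the Qi register-status entries in these tables follow from issuing the six instructions in program order and then letting the first load write its result. The following is a minimal sketch of that bookkeeping only, not a full Tomasulo simulator. It ignores timing, memory addresses and the integer base registers. It also assumes that Load1 broadcasts after all six instructions have issued, and it stands in an opaque string for the loaded value. The class name Tomasulo and its methods are hypothetical.

```python
# Reservation-station names per functional-unit class, as in the tables.
POOLS = {"load": ["Load1", "Load2"],
         "add": ["Add1", "Add2", "Add3"],
         "mult": ["Mult1", "Mult2"]}
UNIT = {"L.D": "load", "ADD.D": "add", "SUB.D": "add",
        "MUL.D": "mult", "DIV.D": "mult"}


class Tomasulo:
    def __init__(self):
        self.free = {k: list(v) for k, v in POOLS.items()}
        self.rs = {}  # busy station name -> {Op, Vj, Vk, Qj, Qk}
        self.qi = {}  # FP register -> station that will write it

    def _operand(self, reg):
        """Return (V, Q): a value if available, else the producing station."""
        if reg in self.qi:
            return None, self.qi[reg]
        return f"Regs[{reg}]", None

    def issue(self, op, dest, srcs=()):
        """Allocate a station, rename the sources, claim the destination."""
        name = self.free[UNIT[op]].pop(0)
        entry = {"Op": op, "Vj": None, "Vk": None, "Qj": None, "Qk": None}
        for slot, reg in zip("jk", srcs):
            entry["V" + slot], entry["Q" + slot] = self._operand(reg)
        self.rs[name] = entry
        self.qi[dest] = name
        return name

    def write_result(self, name):
        """Broadcast a result: fill waiting operands and free the station."""
        value = f"result({name})"  # ASSUMPTION: opaque placeholder value
        for entry in self.rs.values():
            for slot in "jk":
                if entry["Q" + slot] == name:
                    entry["Q" + slot], entry["V" + slot] = None, value
        for reg in [r for r, q in self.qi.items() if q == name]:
            del self.qi[reg]
        op = self.rs.pop(name)["Op"]
        self.free[UNIT[op]].append(name)


def run_example():
    t = Tomasulo()
    t.issue("L.D", "F2")                    # Load1
    t.issue("L.D", "F6")                    # Load2
    t.issue("MUL.D", "F8", ("F6", "F4"))    # Mult1
    t.issue("SUB.D", "F0", ("F6", "F2"))    # Add1
    t.issue("DIV.D", "F10", ("F8", "F2"))   # Mult2
    t.issue("ADD.D", "F2", ("F0", "F6"))    # Add2
    t.write_result("Load1")                 # first load completes
    return t


if __name__ == "__main__":
    t = run_example()
    print(t.qi)
    for name, entry in sorted(t.rs.items()):
        print(name, entry)
```

With these assumptions, Load1 ends up free and the remaining tags agree with the tables. F2 stays mapped to Add2 rather than being cleared by the Load1 broadcast, because the later ADD.D has already renamed it.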