
CSE 490/590

Spring 2013 Homework #3 Solutions


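1.
First sequence:

    SW   r3, 0(r2)
    AND  r1, r2, r3
    ADD  r1, r1, r3

Data dependency (highlighted in red in the original): r1, written by the AND and read by the ADD.

Penalty w/o Forwarding: 3
Cycle Reduction (with forwarding): 3

Second sequence:

    LW   r1, 4(r2)
    XOR  r0, r1
    ADD  r1, r2, r3

Data dependency (highlighted in red in the original): r1, written by the LW and read by the XOR.

Penalty w/o Forwarding: 3
Cycle Reduction (with forwarding): 2

The penalty cannot be completely eliminated via forwarding. One way to eliminate it is to insert an independent instruction between the LW and the XOR.

[Figure: pipeline diagram for each sequence, with stages IF, ID, RD, ALU, MEM, WB.]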
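The penalties above can be reproduced with a small timing model. This is only a sketch under assumptions that are not stated in the original: the function name `stall_cycles` is hypothetical, a value produced in one stage is assumed to become usable only in a later cycle, and forwarding is assumed to deliver results to the input of the ALU stage.

```python
# Six-stage pipeline named in the solution.
STAGES = ["IF", "ID", "RD", "ALU", "MEM", "WB"]

def stall_cycles(ready_stage, need_stage, distance=1):
    """Stall cycles for a consumer issued `distance` cycles after its producer.

    Assumption: a value produced in `ready_stage` is usable only in a later
    cycle, and the consumer needs it when it enters `need_stage`.
    """
    ready = STAGES.index(ready_stage)           # cycle in which the producer occupies ready_stage
    need = STAGES.index(need_stage) + distance  # cycle in which the consumer would enter need_stage
    return max(0, ready - need + 1)

# Without forwarding: value written in WB, read in RD.
no_fwd = stall_cycles("WB", "RD")
# With forwarding, ALU result to ALU input (AND -> ADD).
alu_fwd = stall_cycles("ALU", "ALU")
# With forwarding, loaded value from MEM to ALU input (LW -> XOR).
load_fwd = stall_cycles("MEM", "ALU")
# Same load, with one independent instruction placed in between.
load_fwd_gap = stall_cycles("MEM", "ALU", distance=2)

print("penalty without forwarding:", no_fwd)
print("reduction, first sequence:", no_fwd - alu_fwd)
print("reduction, second sequence:", no_fwd - load_fwd)
print("load stall with one independent instruction:", load_fwd_gap)
```

Under these assumptions the model gives a 3-cycle penalty without forwarding, a reduction of 3 for the AND/ADD pair, a reduction of 2 for the LW/XOR pair, and no remaining stall once an independent instruction separates the LW from the XOR.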

2.

    ADD  r5, r4, r3
    SW   r3, 0(r2)
    SUB  r6, r4, 0(r2)
    LW   r1, 0(r2)

There is one hazard here. It is between the SW and the SUB. It can be eliminated with forwarding: the critical forwarding path is from the output of the RD stage to the input of the MD stage, as shown at the right. The addition of this new stage increases the amount of external fragmentation, since not every instruction will use the new stage.

3.
The actual performance is lower since this equation is an oversimplification. Some reasons for this are as follows. The load cannot be perfectly balanced across all stages. Latch overhead varies due to fan-in and fan-out constraints when connected to the combinational logic in the various stages.

4.

Nonpipelined Latency = Cycle Time = 31 ns
Pipelined Cycle Time = 9 ns + 0.5 ns + 1 ns = 10.5 ns
Pipelined Latency = 10.5 ns * 5 = 52.5 ns
Potential Speedup = 31 / 10.5 = 2.952
Internal Fragmentation = 5 * 10.5 ns - 31 ns = 21.5 ns

External Fragmentation exists, but is minimized. For example, a jump instruction may not require all five stages, a store instruction will not require the last (write back) stage, and an instruction that does not access memory will not require the fourth stage.

[Figure: five-stage pipelined datapath driven by CLK, with latches between stages. Labeled components: Program Counter (1 ns), Add (2.5 ns), Instruction Cache (6 ns), Instruction Type Decoder (3 ns), Function Decoder (2.5 ns), Source Operand Decoder (2 ns), Immediate Operand Decoder (2 ns), Destination Operand Decoder (2 ns), Register File (4 ns), MUX (1 ns), ALU (4 ns), Data Cache (6 ns). A 0.5 ns setup time is also labeled.]

Stage #1 Latency = 6 ns
Stage #2 Latency = 9 ns
Stage #3 Latency = 5 ns
Stage #4 Latency = 6 ns
Stage #5 Latency = 5 ns
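The arithmetic for problem 4 can be checked with a short script. This is a minimal sketch: the stage latencies and the 0.5 ns and 1 ns overhead terms are taken from the solution, while the variable names are illustrative and the two overhead terms are simply lumped together as per-stage latch overhead.

```python
# Stage latencies from the figure, in nanoseconds.
stage_latency_ns = [6, 9, 5, 6, 5]

# The two per-stage overhead terms used in the solution (0.5 ns + 1 ns).
latch_overhead_ns = 0.5 + 1.0

# Without pipelining, one instruction passes through all the logic in one cycle.
nonpipelined_latency = sum(stage_latency_ns)

# With pipelining, the slowest stage plus latch overhead sets the clock.
cycle_time = max(stage_latency_ns) + latch_overhead_ns
pipelined_latency = cycle_time * len(stage_latency_ns)

# Potential speedup compares cycle times (throughput), not latencies.
speedup = nonpipelined_latency / cycle_time

# Internal fragmentation: time one instruction spends beyond the useful logic delay.
internal_fragmentation = pipelined_latency - nonpipelined_latency

print(f"Nonpipelined latency:   {nonpipelined_latency} ns")
print(f"Pipelined cycle time:   {cycle_time} ns")
print(f"Pipelined latency:      {pipelined_latency} ns")
print(f"Potential speedup:      {speedup:.3f}")
print(f"Internal fragmentation: {internal_fragmentation} ns")
```

The speedup falls well short of the stage count of 5 because the 9 ns second stage, not the 6.2 ns average, determines the clock period, and every stage also pays the latch overhead.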
