
ECE 313 Computer Organization Name SOLUTION

FINAL EXAM December 14, 2002

This exam is open book and open notes. You have 2 hours.

Problems 1-4 refer to a proposed MIPS instruction lwu (load word - update) which
implements update addressing – an addressing mode that is used in the PowerPC (see p.
177). The assembly language form of lwu and its register transfers are shown below:

New Instruction          Equivalent MIPS Instructions
lwu rt, immed(rs)        lw   rt, immed(rs)
                         addi rs, rs, immed

Register Transfers
address <- Reg[rs] + sign_extend(immed);
Reg[rt] <- MEM[address];
Reg[rs] <- address;
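
The register transfers above can be sketched in Python as a small behavioral model. This is only an illustration, not part of the exam solution: the names `sign_extend16`, `lwu`, and `lw_addi`, the register list, and the dictionary used as memory are all assumptions made for this sketch. Register $0 is not treated specially.

```python
def sign_extend16(immed):
    """Interpret the low 16 bits of immed as a two's-complement value."""
    immed &= 0xFFFF
    return immed - 0x10000 if immed & 0x8000 else immed


def lwu(reg, mem, rt, rs, immed):
    """Apply the three lwu register transfers in order.

    reg is a list of 32 register values; mem is a dict keyed by byte
    address (a hypothetical stand-in for data memory).
    """
    address = (reg[rs] + sign_extend16(immed)) & 0xFFFFFFFF
    reg[rt] = mem[address]
    reg[rs] = address


def lw_addi(reg, mem, rt, rs, immed):
    """The equivalent two-instruction MIPS sequence."""
    # lw rt, immed(rs)
    reg[rt] = mem[(reg[rs] + sign_extend16(immed)) & 0xFFFFFFFF]
    # addi rs, rs, immed
    reg[rs] = (reg[rs] + sign_extend16(immed)) & 0xFFFFFFFF


# Usage: with $1 = 100 and a word stored at address 300,
# lwu $4, 200($1) loads that word into $4 and leaves $1 = 300.
regs = [0] * 32
regs[1] = 100
lwu(regs, {300: 42}, rt=4, rs=1, immed=200)
```

The two functions agree whenever rt and rs are different registers. If rt equals rs, the two-instruction sequence adds the immediate to the loaded value instead, so the equivalence should be read as assuming rt != rs.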

1. Multicycle Processor Design 20 Points

Modify the multicycle processor design to efficiently implement the lwu instruction. Mark
changes on the state diagram below and the datapath diagram on the next page.

[State diagram: multicycle control, including state 10 added in the solution]

State 0   Instruction fetch (Start)
          MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01,
          ALUOp = 00, PCWrite, PCSource = 00
State 1   Instruction decode / register fetch
          ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
State 2   Memory address computation
          ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
State 3   Memory access (OP = 'LW')
          MemRead, IorD = 1
State 4   Writeback step
          RegWrite, MemtoReg = 1, RegDst = 0
State 5   Memory access (OP = 'SW')
          MemWrite, IorD = 1
State 6   Execution
          ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
State 7   R-type completion
          RegDst = 1, RegWrite, MemtoReg = 0
State 8   Branch completion
          ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
State 9   Jump completion (OP = 'JMP')
          PCWrite, PCSource = 10
State 10  (added state)
          MemRead, IorD = 1, RegWrite, MemtoReg = 0, RegDst = 2


3. Pipelined Processor Design 20 Points

Modify the pipelined processor datapath and control to implement the lwu instruction.
Note that in a pipelined implementation all register updates should take place during the WB
stage. To make this possible, the register file has been extended with a second “write port”
so that it can write two registers at the same time.

Mark any changes to the datapath on the diagram on the next page. In addition, show all
control outputs in the table below:

          EX Stage Control Lines            MEM Stage Control Lines     WB Stage Control Lines
Instr.    RegDst  ALUOp1  ALUOp0  ALUSrc    Branch  MemRead  MemWrite   RegWrite  MemtoReg  RegWrite2
lwu       0       0       0       1         0       1        0          1         0         1


4. Data Hazards & Forwarding 20 Points

The following sequence of MIPS instructions includes the new lwu instruction. Assume
that this sequence is executing on the modified pipeline design from Problem 3, but that
this design is altered to perform forwarding as in Fig. 6.40 on p. 484.

lwu $4, 200($1)
add $6, $1, $7
sub $5, $6, $4

(a) Circle any data dependencies which exist between these instructions.

(b) Note that the lwu instruction writes the rs register as well as the rt register. Can the
data dependencies on the rs register be resolved by forwarding alone, or will stalling
be necessary? Why or why not?

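Solution: No stalling is necessary. The new rs value is the address computed by
the ALU in the EX stage, so it can be forwarded like any other ALU result.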
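
As an aid for part (a), the read-after-write dependencies in the sequence can be listed mechanically. The sketch below is an illustration only: the tuple layout and the function name `raw_dependencies` are assumptions of this example. It encodes the fact stated in part (b) that lwu writes both rt ($4) and rs ($1).

```python
# Each entry: (assembly text, registers written, registers read).
PROGRAM = [
    ("lwu $4, 200($1)", {4, 1}, {1}),   # lwu writes rt ($4) and rs ($1)
    ("add $6, $1, $7", {6}, {1, 7}),
    ("sub $5, $6, $4", {5}, {6, 4}),
]


def raw_dependencies(program):
    """Return (producer index, consumer index, register) triples.

    For every source register of every instruction, find the most
    recent earlier instruction that writes that register.
    """
    deps = []
    for consumer, (_, _, sources) in enumerate(program):
        for reg in sorted(sources):
            for producer in range(consumer - 1, -1, -1):
                if reg in program[producer][1]:
                    deps.append((producer, consumer, reg))
                    break
    return deps


for producer, consumer, reg in raw_dependencies(PROGRAM):
    print(f"${reg}: {PROGRAM[producer][0]}  ->  {PROGRAM[consumer][0]}")
```

Under this encoding the dependencies are $1 from lwu to add, $4 from lwu to sub, and $6 from add to sub.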

(c) Fill in the multicycle diagram shown below to show the execution of the instruction
sequence, including stalls (if any). Shade active stages and show forwarding.

Time:              0    2    4    6    8    10   12   14

lwu $4, 200($1)    IF   ID   EX   MEM  WB
add $6, $1, $7          IF   ID   EX   MEM  WB
sub $5, $6, $4               IF   ID   EX   MEM  WB

Register labels on the figure: $4 and $1 on lwu; $1 and $7 on add; $6 and $4 on sub.


5. Pipelined Processor Timing 10 Points

This problem refers to the pipelined datapath used in Problem 3. Assume that the pipelined
datapath components have the same delay characteristics as the single-cycle components
described on page 373 of the book – ALU and memory have a 2ns delay, register file read
and write each have a 1ns delay.

(a) Assume that all other components have no delay. What is the minimum clock
period at which this design can operate properly?

Solution: The worst-case register-to-register delay is 2 ns (an ALU operation or a
memory access), so the minimum clock period is 2 ns.

(b) Now assume that in addition to the delays given above, the delay of multiplexers is
0.1ns. What is the minimum clock period at which this design can operate
properly? What stages limit the execution time?

Solution: The worst-case register-to-register delay is 2.1 ns, so the minimum clock
period is 2.1 ns.
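
The two answers can be checked with a short calculation. The sketch below is not taken from the exam: it assumes one plausible placement of multiplexers on the critical paths (the ALUSrc multiplexer ahead of the ALU in EX, and the MemtoReg multiplexer ahead of the register write in WB), and the function names are hypothetical. Delays are kept in picoseconds to avoid floating-point rounding.

```python
# Component delays from the problem statement, in picoseconds.
ALU_PS = 2000
MEMORY_PS = 2000
REG_READ_PS = 1000
REG_WRITE_PS = 1000


def stage_delays(mux_ps):
    """Per-stage delay, ASSUMING a mux sits on the EX and WB paths only."""
    return {
        "IF": MEMORY_PS,                 # instruction memory read
        "ID": REG_READ_PS,               # register file read
        "EX": mux_ps + ALU_PS,           # assumed: ALUSrc mux, then ALU
        "MEM": MEMORY_PS,                # data memory access
        "WB": mux_ps + REG_WRITE_PS,     # assumed: MemtoReg mux, then write
    }


def min_clock_period(mux_ps):
    """Return (period in ps, list of stages that set the period)."""
    delays = stage_delays(mux_ps)
    period = max(delays.values())
    limiting = [stage for stage, d in delays.items() if d == period]
    return period, limiting


print(min_clock_period(0))     # part (a): multiplexers have no delay
print(min_clock_period(100))   # part (b): 0.1 ns multiplexers
```

Under these assumptions the period is 2 ns in part (a), set by IF, EX, and MEM, and 2.1 ns in part (b), set by EX alone.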


6. Short Answers 10 Points

Provide a short answer for each of the following questions:

(a) When would a compiler be able to use the lwu instruction to increase the speed of a
program?

Solution: When adjacent memory locations or array elements
are accessed in a loop, one lwu instruction can perform both the
memory access and the address update needed to reach the next
element. The original MIPS instruction set requires two
instructions (lw and addi).

(b) Why do RISC architectures use fixed-width instructions?

Solution: To make instructions easy to decode and the
implementation simpler. Variable-width instructions save
memory, but require that the hardware handle instructions of
varying width.

(c) What are the steps required to add two floating point numbers?

Solution: 1. Shift one number to make the exponents equal
2. Perform the addition
3. Normalize the result and adjust the exponent
4. Round the result (and renormalize if rounding leaves it unnormalized)
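
A minimal sketch of floating-point addition on a toy decimal format may make the sequence concrete. Everything here is an assumption of the example rather than part of the exam: numbers are positive pairs `(significand, exponent)` meaning significand x 10^exponent, the significand has `P = 4` digits, alignment is done exactly with integers (as if unlimited guard digits were kept), and rounding is round-half-up. The sketch normalizes before rounding and renormalizes if rounding overflows the significand.

```python
P = 4  # significand digits in this toy format (assumption)


def fp_add(a, b):
    """Add two positive toy floating-point numbers (sig, exp)."""
    (sig_a, exp_a), (sig_b, exp_b) = a, b

    # Align: bring both operands to the smaller exponent.
    exp = min(exp_a, exp_b)
    sig_a *= 10 ** (exp_a - exp)
    sig_b *= 10 ** (exp_b - exp)

    # Add the aligned significands.
    total = sig_a + sig_b

    # Normalize: keep P digits and raise the exponent to match.
    extra = max(len(str(total)) - P, 0)
    sig, remainder = divmod(total, 10 ** extra)
    exp += extra

    # Round half up on the discarded digits.
    if 2 * remainder >= 10 ** extra:
        sig += 1

    # Rounding may carry out to P + 1 digits; renormalize if so.
    if sig >= 10 ** P:
        sig //= 10
        exp += 1
    return sig, exp


# 99.99 + 0.1610 with four significant digits
print(fp_add((9999, -2), (1610, -4)))
```

Here 99.99 + 0.1610 = 100.151, which the four-digit format holds as 1002 x 10^-1, that is, 100.2.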

(d) What is the motivation for out-of-order execution in dynamic pipelining?

Solution: Out-of-order execution allows instructions to begin
executing even though earlier instructions have stalled,
which increases performance.

(e) When does a page fault occur?

Solution: When a processor with virtual memory attempts to
access instructions or data that are not currently in
main memory but are instead stored on disk.


7. Cache Memories 20 Points

The diagram on the next page shows a cache memory design which contains 8 blocks of
four words each. Note that on a cache miss all four words in a block are replaced at the
same time.

(a) How many bits will there be in the “Block Offset” field of the address?

Solution: Two bits, as the diagram already shows. The index
field is 3 bits.

(b) How many bits will there be in the “Tag” field of each address?

Solution:   32-bit address
          -  2-bit byte offset
          -  2-bit block offset
          -  3-bit block index
          = 25 bits

(c) How many bits of storage will be required for this cache memory?

Solution: (128 bits + 25 bits + 1 valid bit) X 8 = 1232 bits
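
The arithmetic in parts (a) through (c) can be reproduced with a few lines of Python. This is a sketch under the parameters stated in the problem (32-bit byte addresses, 4-byte words, four words per block, eight blocks, one valid bit per block); the constant and function names are hypothetical.

```python
ADDRESS_BITS = 32
BYTES_PER_WORD = 4
WORDS_PER_BLOCK = 4
NUM_BLOCKS = 8


def log2_exact(n):
    """Base-2 logarithm of a power of two, computed with integers."""
    if n <= 0 or n & (n - 1):
        raise ValueError("expected a power of two")
    return n.bit_length() - 1


def cache_geometry():
    """Return the address field widths and total storage in bits."""
    byte_offset = log2_exact(BYTES_PER_WORD)
    block_offset = log2_exact(WORDS_PER_BLOCK)
    index = log2_exact(NUM_BLOCKS)
    tag = ADDRESS_BITS - byte_offset - block_offset - index
    data_bits = WORDS_PER_BLOCK * BYTES_PER_WORD * 8
    # Each block stores its data, its tag, and one valid bit.
    total_bits = (data_bits + tag + 1) * NUM_BLOCKS
    return {
        "byte_offset": byte_offset,
        "block_offset": block_offset,
        "index": index,
        "tag": tag,
        "data_bits": data_bits,
        "total_bits": total_bits,
    }


print(cache_geometry())
```

The result matches the figures above: a 25-bit tag and 1232 bits of storage.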

[Figure: cache memory design. The address is divided into Tag, Index, Block Offset,
and a 2-bit Byte Offset; the cache outputs are Hit, Tag, and Data.]

V   Tag   Data (128 bits: four 32-bit words)
           0   1   2   3    Block 0
           4   5   6   7    Block 1
           8   9  10  11    Block 2
          12  13  14  15    Block 3
          16  17  18  19    Block 4
          20  21  22  23    Block 5
          24  25  26  27    Block 6
          28  29  30  31    Block 7


(d) Given the word references below, mark each reference as a hit or miss and show the
cache contents in the table below.

Reference 1 4 8 5 20 17 19 56 9 11 4 43 99 6 9 17
Hit/Miss

0 1 2 3 Block 0
4 5 6 7 Block 1
8 9 10 11 Block 2
12 13 14 15 Block 3
16 17 18 19 Block 4
20 21 22 23 Block 5
24 25 26 27 Block 6
28 29 30 31 Block 7
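
One way to work part (d) is to simulate the cache. The sketch below is an illustration, not the exam's answer key: it assumes the cache is direct-mapped (as the 3-bit index suggests), starts empty, and that the references are word addresses, so block number = word // 4, index = block mod 8, and tag = block // 8. The function name `simulate` is hypothetical.

```python
REFERENCES = [1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 99, 6, 9, 17]


def simulate(refs, num_blocks=8, words_per_block=4):
    """Return ('H'/'M' per reference, final tag stored in each block).

    A tag of None means the block is still invalid.
    """
    tags = [None] * num_blocks
    outcome = []
    for word in refs:
        block = word // words_per_block
        index = block % num_blocks
        tag = block // num_blocks
        if tags[index] == tag:
            outcome.append("H")
        else:
            outcome.append("M")
            tags[index] = tag   # the whole four-word block is replaced
    return outcome, tags


outcome, tags = simulate(REFERENCES)
for word, result in zip(REFERENCES, outcome):
    print(word, result)
```

Running it prints one hit or miss per reference, in order. The second element of the result records which tag each block holds at the end, from which the final cache contents can be filled in.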
