This action might not be possible to undo. Are you sure you want to continue?
§ § § § § §
This exam has 8 pages (including a cheat sheet at the end). Read instructions carefully! You have 50 minutes, so budget your time! No written references or calculators are allowed. To make sure you receive credit, please write clearly and show your work. We will not answer questions regarding course material.
Question 1 2 3 Total
Maximum 30 50 20 100
Part (a) The single-cycle datapath from lecture appears below.0] Sign extend MemWrite Read register 1 Read register 2 Write register Write data Read data 1 Read data 2 0 M u x 1 ALUSrc ALU Zero Result Read Read address data Write address Data Write memory data MemRead MemToReg 1 M u x 0 Add M u x 1 PCSrc Registers ALUOp 2 . Clearly mark all wires that are active during the execution of lw instruction.11] x 1 RegDst I [15 . do parts (a) and (b) as a small warm-up exercise. Field Bits op = 0 31-26 rs 25-21 0 20-6 func = 8 5-0 Before implementing the jr instruction.16] 0 M u I [15 . This instruction is useful only if we can also return from function calls.21] I [20 . The format of jr instruction is shown below. you have seen how to implement jal instruction. (10 points) Part (b) On the diagram below. implement jr rs instruction (jump to register rs) in the single-cycle datapath. write (next to the signal’s name) values of all non-0 control signals required for the lw instruction.Question 1: Single-cycle CPU implementation (30 points) In sections. To perform the return. (5 points) 0 Add PC 4 Shift left 2 RegWrite Read Instruaddress ction [31-0] Instruction memory I [25 .
11] x 1 RegDst I [15 . Part (d) On the diagram below. Memory and Register file all take 2ns. do not modify the main functional units themselves (the memory.0] Sign extend MemWrite Read register 1 Read register 2 Write register Write data Read data 1 Read data 2 0 M u x 1 ALUSrc ALU Zero Result Read Read address data Write address Data Write memory data MemRead MemToReg 1 M u x 0 Add M u x 1 PCSrc Registers ALUOp 3 . (5 points) 0 Add PC 4 Shift left 2 RegWrite Read Instruaddress ction [31-0] Instruction memory I [25 . and everything else is instantaneous. full points will only be rewarded to solutions that do not lengthen the clock cycle. write (next to the signal’s name) values of all non-0 control signals required for the jr instruction. You should only add wires and muxes to the datapath. Assume that the ALU. Try to keep your diagram neat! (10 points) Note: While we’re primarily concerned about correctness.Question 1 continued Part (c) The single-cycle datapath from lecture (same as on the previous page) appears below. Show what changes are needed to support jr instruction.21] I [20 .16] 0 M u I [15 . register file and ALU).
return the greater of 2 inputs): alu_result = (A_input > B_input) ? A_input : B_input. ALUOp for this instruction is MAX2..e. Assume that the ALU. register file. Try to keep your diagram neat! (10 points) Note: While we’re primarily concerned about correctness. rm) Note that register rd is both an input and an output. full points will only be rewarded to solutions that use a minimal number of cycles and do not lengthen the clock cycle. rm # rd = max(rs. rt. Instruction max4 has the following format: Field Bits op 31-26 rs 25-21 rt 20-16 rd 15-11 rm 10-6 func 5-0 Part (a) The multicycle datapath from lecture appears below. rd. implement the max4 instruction that writes into register rd the largest value of 4 registers: max4 rs. PCWrite PC IorD ALUSrcA 0 RegDst 0 M u x 1 MemRead Address IRWrite Memory Write data Mem Data [31-26] [25-21] [20-16] [15-11] [15-0] Instr register Memory data register 0 M u x 1 Sign extend Shift left 2 0 M u x 1 Read reg 1 Read reg 2 Write register Write data Read data 1 Read data 2 Register file A RegWrite M u x 1 ALU Zero Result 0 4 1 2 3 ALUOp ALU Out 0 M u x 1 PCSrc B MemWrite ALUSrcB MemToReg 4 . You should only add wires and muxes to the datapath. Show what changes are needed to support max4. and everything else is instantaneous. Given this improved ALU.Question 2: Multi-cycle CPU implementation and its performance (50 points) Assume that the ALU can perform the max2 operation (i. rt. rd. and ALU). Memory and Register file all take 2ns. do not modify the main functional units themselves (the memory.
Remember to account for any control signals that you added or modified in the previous part of the question! (15 points) Branch completion Instruction fetch and PC increment IorD = 0 MemRead = 1 IRWrite = 1 ALUSrcA = 0 ALUSrcB = 01 ALUOp = ADD PCSource = 0 PCWrite = 1 Register fetch and branch computation Op = BEQ ALUSrcA = 1 ALUSrcB = 00 ALUOp = SUB PCSource = 1 PCWrite = Zero ALUSrcA = 0 ALUSrcB = 11 ALUOp = ADD Op = R-type Op = MAX4 ALUSrcA = 1 ALUSrcB = 00 ALUOp = func R-type execution RegDst = 1 MemToReg = 0 RegWrite = 1 Writeback Effective address computation Op = LW/SW ALUSrcA = 1 Op = SW ALUSrcB = 10 ALUOp = ADD IorD = 1 MemWrite = 1 Memory write Op = LW Memory read lw register write IorD = 1 MemRead = 1 RegDst = 0 MemToReg = 1 RegWrite = 1 5 .Question 2 continued Part (b) Complete this finite state machine diagram for the max4 instruction. Control values not shown in each stage are assumed to be 0.
0(a0) a1. 4(a0) t1. 4 t1. (5 points) 6 . the second of which uses the max4 instruction: Program 1 lw add label: add lw slt bne move skip: bne jr v0. 0(a0) t0. a1. 12 a0. t1. (5 points) Part (d) The max4 instruction can be used in place of three branch instructions. assume slt and move can be considered as R-type instructions and jr and bne take the same amount of time as beq. 12(a0) v0. Also assume that the first element in the input array is the largest (so that the first branch of program 1 is always taken). a0. v0 t2. Below are two functionally equivalent programs. a0. t1 a0. reducing the number of instructions that need to be executed. t2 $ra Assume the datapath and control that you implemented in parts (a) and (b). t0. label $ra Program 2 lw lw lw lw max4 jr v0. $zero. Fill in the missing control signals in the FSM diagram. How much faster (or slower) is program 2 than program 1? You may leave your answer as a fraction.Question 2 continued Part (c) The FSM diagram on the previous page is incomplete. 8(a0) t2. t1. (15 points) Part (e) What is average CPI of program 1? You may leave your answer as an expression. skip v0. The control signals for the fourth cycle of sw instruction are missing. 0(a0) t2.
beq.time required on multi-cycle datapath (from q. add.minimum time to perform each instruction . 2).0Ghz Pentium4 x86 processor be if it achieved a CPI of 0.5 on a workload? (5 points) 7 . calculate: .time required on a single-cycle datapath (from q. State any assumptions. 1) .Question 3: More performance (20 points) Part (a) Assume the following delays for the main functional units: Functional Unit Memory Given the following instructions: lw. sw. ALU Register File Time delay 5 ns 4 ns 3 ns Write your answers in the table below. (15 points) Time to perform the instruction: Instruction lw sw add beq minimum time in single-cycle in multi-cycle Part (b) How much faster than a 1GHz single-cycle MIPS processor would a 3.
P = Number of instructions executedP × CPIX. Speedup is a metric for relative performance of 2 executions: Speedup = Performance after improvement ⁄ Performance before improvement = Execution time before improvement / Execution time after improvement 8 .Performance 1. CPI is the average number of clock cycles per instruction: CPI = Number of cycles needed ⁄ Number of instructions executed 3.P × Clock cycle timeX 2. Formula for computing the CPU time of a program P running on a machine X: CPU timeX.
This action might not be possible to undo. Are you sure you want to continue?
We've moved you to where you read on your other device.
Get the full title to continue reading from where you left off, or restart the preview.