
Question 1 (4 points). Consider enhancing a computer by adding vector hardware. When a computation is run in vector mode on the vector hardware, it is 100 times faster than in the normal mode. We call the percentage of the original time that could be spent using the vector mode the percentage of vectorization.

1a) What percentage of vectorization is necessary to achieve a speedup of 10? (Just set up the equation to show how to get the answer.)

Let $f$ be the fraction of the original time that can run in vector mode, with the original execution time normalized to 1.

$$\text{Speedup} = \frac{\text{Old execution time}}{\text{New execution time}} = \frac{1}{\text{Time in vector mode} \times \frac{1}{100} + \text{Time in normal mode}}$$

$$10 = \frac{1}{\frac{f}{100} + (1 - f)} \quad\Rightarrow\quad f = \frac{10}{11} \approx 90.9\%$$

1b) What percentage of the enhanced computation time is spent in vector mode if a speedup of 10 is attained? (Just set up the equation to show how to get the answer.)

$$\text{Percentage time} = \frac{\text{Time in vector mode}}{\text{Total time}} = \frac{f/100}{(1 - f) + f/100} = \frac{1/110}{11/110} = \frac{1}{11} \approx 9.1\%$$
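The two results for Question 1 can be checked numerically. The following is a minimal sketch using exact fractions; the function names (`speedup`, `required_fraction`, `enhanced_time_share`) are illustrative and not part of the original solution.

```python
from fractions import Fraction

def speedup(f, k):
    """Overall speedup when a fraction f of the original time runs k times faster."""
    return 1 / ((1 - f) + f / k)

def required_fraction(target, k):
    """Fraction f of the original time that must be vectorized to reach `target`."""
    # Solving target = 1 / ((1 - f) + f / k) for f gives the expression below.
    return (1 - Fraction(1, target)) / (1 - Fraction(1, k))

def enhanced_time_share(f, k):
    """Share of the new (enhanced) execution time that is spent in vector mode."""
    return (f / k) / ((1 - f) + f / k)

f = required_fraction(10, 100)
print(f)                            # 10/11
print(speedup(f, 100))              # 10
print(enhanced_time_share(f, 100))  # 1/11
```

Using `Fraction` rather than floats keeps the answers in the same exact form as the hand derivation.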

Question 2 (6 points). Suppose we build an optimizing compiler that discards 50% of the ALU instructions (but cannot reduce other instructions). Assume that the original total instruction count is $3 \times 10^9$ and that the original ALU instruction count is $2 \times 10^9$. Let the clock rate be 1 GHz, let each ALU instruction take 1 clock cycle, and let each non-ALU instruction take 2 clock cycles. (It is sufficient to derive the formulas, and not necessary to calculate the final numerical results.)

2a) Calculate the original MIPS rate and execution time.

$$\text{Execution time} = \text{Instruction count} \times \text{Cycles per instruction} \times \text{Clock cycle time} = \frac{2 \times 10^9 \times 1 + 1 \times 10^9 \times 2}{10^9\ \text{cycles/s}} = 4\ \text{s}$$

$$\text{MIPS rate} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{3000}{4} = 750\ \text{MIPS}$$

2b) Calculate the new MIPS rate and execution time when we use the optimizing compiler.

$$\text{Execution time} = \frac{1 \times 10^9 \times 1 + 1 \times 10^9 \times 2}{10^9\ \text{cycles/s}} = 3\ \text{s}$$

$$\text{MIPS rate} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{2000}{3} \approx 666\ \text{MIPS}$$

2c) Discuss the results in (a) and (b). Are there any contradictions?

At first, it would seem that a decrease in MIPS rate would lead to an increase in execution time, which is clearly not the case here. However, one notes the optimizing compiler eliminated the most lightweight instructions, and so the MIPS rate will decrease as the instructions that take multiple cycles become more dominant. This reinforces the idea that MIPS rate is not a reliable indicator of performance in applications.
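The execution-time and MIPS formulas of Question 2 can be sketched as a short calculation. The helper names and default parameters below are assumptions chosen to mirror the numbers in the question (1 GHz clock, 1 cycle per ALU instruction, 2 cycles otherwise).

```python
CLOCK_HZ = 10**9  # 1 GHz, as stated in the question

def execution_time(alu, other, alu_cpi=1, other_cpi=2, clock_hz=CLOCK_HZ):
    """Execution time in seconds for the given instruction mix."""
    cycles = alu * alu_cpi + other * other_cpi
    return cycles / clock_hz

def mips_rate(alu, other):
    """Millions of instructions executed per second."""
    return (alu + other) / (execution_time(alu, other) * 10**6)

original = (2 * 10**9, 1 * 10**9)   # (ALU, non-ALU) instruction counts
optimized = (1 * 10**9, 1 * 10**9)  # the compiler removed half of the ALU instructions

for name, mix in (("original", original), ("optimized", optimized)):
    print(name, execution_time(*mix), "s", round(mips_rate(*mix), 1), "MIPS")
```

The optimized program finishes sooner even though its MIPS rate is lower, which is the point made in 2c.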

Question 3 (6 points). Consider a new addressing mode that allows one source operand to be in memory, e.g., “ADD R1, R2, (R3)” adds the contents of register R2 to the contents stored at address R3 in memory. To reduce complexity, you restrict all memory addressing to be register indirect only (i.e., no displacement). However, “ADD R1, (R2), (R3)” and “ADD R1, R2, 4(R3)” are illegal instructions.

3a) Give an advantage of this new addressing mode.

There are several instances where the new addressing mode performs an operation in 1 instruction that used to take 2 instructions. So,

    LD  R1, 0(R2)
    ADD R3, R3, R1

becomes

    ADD R3, R3, (R2)

This would greatly reduce execution time for working with arrays, as it would not be necessary to load the value before using it in execution. The loop overhead for setting up these computations would be decreased.

3b) Consider “ADD R1, (R2), (R3).” Suggest a simple code sequence of two instructions to simulate this illegal instruction using the new addressing mode and only registers R1, R2, and R3.

    LD  R1, (R2)
    ADD R1, R1, (R3)

3c) Consider “ADD R1, R2, 4(R3).” Suggest a simple code sequence of two instructions to simulate this illegal instruction using the new addressing mode and only registers R1, R2 and R3.

    ADDI R1, R3, #4
    ADD  R1, R2, (R1)
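To check that the two-instruction sequences for 3b and 3c compute the same value as the illegal instructions they replace, a toy interpreter is enough. This is only a sketch: the tuple encoding of instructions, the `run` function, and the sample register and memory contents are all invented for illustration.

```python
def run(program, regs, mem):
    """Execute (op, dst, src1, src2) tuples on a copy of regs.

    A source operand is a register name, an integer immediate, or
    ("mem", reg), meaning the memory word whose address is held in reg.
    """
    regs = dict(regs)

    def value(operand):
        if isinstance(operand, tuple):   # register-indirect memory operand
            return mem[regs[operand[1]]]
        if isinstance(operand, int):     # immediate
            return operand
        return regs[operand]             # plain register

    for op, dst, a, b in program:
        if op == "LD":
            regs[dst] = value(a)
        else:                            # ADD and ADDI both add two operands
            regs[dst] = value(a) + value(b)
    return regs

regs = {"R1": 0, "R2": 100, "R3": 200}   # hypothetical initial state
mem = {100: 7, 200: 5, 204: 11}

# 3b: ADD R1, (R2), (R3)  ->  LD R1, (R2) ; ADD R1, R1, (R3)
seq_3b = [("LD", "R1", ("mem", "R2"), None),
          ("ADD", "R1", "R1", ("mem", "R3"))]

# 3c: ADD R1, R2, 4(R3)   ->  ADDI R1, R3, #4 ; ADD R1, R2, (R1)
seq_3c = [("ADDI", "R1", "R3", 4),
          ("ADD", "R1", "R2", ("mem", "R1"))]

print(run(seq_3b, regs, mem)["R1"])  # mem[R2] + mem[R3]
print(run(seq_3c, regs, mem)["R1"])  # R2 + mem[R3 + 4]
```

In both sequences R1 serves as the scratch register, which is safe because R1 is also the destination and is overwritten anyway.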

Question 4 (6 points). Consider single precision IEEE 754 representation with a “truncate” policy. This is a 32-bit representation with 1 sign bit, 8 exponent bits encoded using a bias of 127, and the remaining 23 bits used to encode the fractional part of the mantissa. In class, you learned that the quantity of one-tenth has the binary representation of 0.000110011001100…

4a) Give the IEEE representation of two-tenths.

$0.1_{10} = 1.10011001100\ldots_2 \times 2^{-4}$, so $0.2_{10} = 2 \times 0.1_{10} = 1.100110011\ldots_2 \times 2^{-3}$. Thus the sign field is 0 and the exponent field is $-3 + \text{bias} = -3 + 127 = 124$, giving our representation as:

    0 01111100 10011001100110011001100

4b) Give the IEEE representation of two.

$2_{10} = 1 \times 2^{1}$. Thus the sign field is 0 and the exponent field is $1 + \text{bias} = 1 + 127 = 128$, giving our representation as:

    0 10000000 00000000000000000000000

4c) Give the IEEE representation of two and two-tenths.

$2.2 = 2 + 0.2$. Normalize to a common exponent and add:

$0.2_{10} = 1.100110011\ldots_2 \times 2^{-3} = 0.0001100110011\ldots_2 \times 2^{1}$

$2_{10} = 1 \times 2^{1}$

Thus $2.2_{10} = 1.0001100110011\ldots_2 \times 2^{1}$, giving our representation as:

    0 10000000 00011001100110011001100
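The three bit patterns in Question 4 can be reproduced with exact rational arithmetic. The sketch below handles only positive values in the normalized range and truncates the mantissa instead of rounding; the function name `ieee754_single_truncate` is made up for this example.

```python
from fractions import Fraction

def ieee754_single_truncate(value):
    """Return 'sign exponent fraction' bit fields for a positive rational value.

    Assumes the value is positive and falls in the normalized range.
    The 23-bit fraction is truncated, not rounded.
    """
    value = Fraction(value)
    assert value > 0
    exponent = 0
    while value >= 2:          # scale down until 1 <= value < 2
        value /= 2
        exponent += 1
    while value < 1:           # scale up until 1 <= value < 2
        value *= 2
        exponent -= 1
    fraction = int((value - 1) * 2**23)   # int() drops the bits beyond the 23rd
    return f"0 {exponent + 127:08b} {fraction:023b}"

print(ieee754_single_truncate(Fraction(1, 5)))   # two-tenths
print(ieee754_single_truncate(2))                # two
print(ieee754_single_truncate(Fraction(11, 5)))  # two and two-tenths
```

Passing `Fraction(1, 5)` rather than the float `0.2` matters here: the float is already a rounded double, whereas the fraction lets the truncation act on the exact decimal value.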

Question 5 (6 points). A distinguished computer architect suggested these two factors to improve performance:

• How fast you can crank up the clock.
• How many instructions you need to perform a task.

5a) Suppose that you can double the clock rate of your processor. Explain why you may not see a performance improvement.

Other parts of the machine may become a bottleneck for execution. In particular, if other parts of the machine are not changed, the speed of transferring information from memory becomes critical, as the processor may have to stall to wait for retrieving data from memory.

5b) You are tempted to define a new instruction for a complex task that occurs frequently. Suppose the new instruction requires many cycles for execution. Use exceptions to explain why this new instruction may hurt performance.

If an instruction that takes a large number of cycles to complete is put into the pipeline and other instructions finish and overwrite the operands it reads from, it will become impossible to have a precise exception; that is, it is no longer possible to restart the instruction and produce the proper result. This becomes very important on machines with out-of-order execution.

5c) What is a precise exception for a pipeline?

A precise exception is one in which all of the instructions prior to the exception have been completed and do not need to be restarted, and the state of the machine is such that it is possible to restart execution of all instructions in the pipeline after the exception.

Question 6 (6 points). Consider the simple 5-stage integer pipeline.

6a) What is data forwarding?

Forwarding is making the data available to subsequent instructions as soon as the computation is complete and allowing instructions to receive this data in the beginning of the EX stage instead of retrieving it in ID. Thus, the results of the ALU and MEM register are given as possible source operands to the ALU.

6b) Write a MIPS assembly code to explain how data forwarding may eliminate the data hazard stall cycle.

    ADD R1, R2, R3
    ADD R2, R1, R1

Without forwarding, the result of R1 is not written until clock cycle five and thus the second ADD does not enter EX until cycle 6. With forwarding, the R1 result is piped back to the input and the ADD is able to enter EX at cycle 4 with no stalls.

6c) Write a MIPS assembly code to show an example where forwarding cannot completely eliminate the data hazard stall cycles.

    LD  R1, 0(R2)
    ADD R1, R1, R1

Here the data is not available until after the MEM stage in cycle 4, and so this cannot be forwarded to the ADD’s EX stage in cycle 4. Data cannot be forwarded backwards in time, so this must stall and the EX of the ADD can proceed in cycle 5.
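The cycle numbers quoted in 6b and 6c follow from a simple timing rule, sketched below. The model is an assumption made for illustration: the producing instruction is fetched in cycle 1 (so its stages occupy cycles 1 to 5), the dependent instruction follows immediately, and without forwarding the dependent instruction can read the register in ID during the producer's WB cycle.

```python
# Cycles occupied by the producer's EX, MEM and WB stages when it is fetched in cycle 1.
EX, MEM, WB = 3, 4, 5

def consumer_ex_cycle(producer, forwarding):
    """Cycle in which the instruction right after `producer` can enter EX.

    `producer` is "alu" (result ready at the end of EX) or "load"
    (result ready at the end of MEM).
    """
    earliest = EX + 1                      # hazard-free case: one cycle behind the producer
    if forwarding:
        ready = EX if producer == "alu" else MEM
        return max(earliest, ready + 1)    # a value can only be used after it exists
    # No forwarding: ID overlaps the producer's WB, so EX follows one cycle later.
    return max(earliest, WB + 1)

print(consumer_ex_cycle("alu", forwarding=False))   # 6
print(consumer_ex_cycle("alu", forwarding=True))    # 4
print(consumer_ex_cycle("load", forwarding=True))   # 5
```

The load case still loses one cycle with forwarding because the loaded value does not exist until the end of MEM.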

Question 7 (6 points). Consider the simple 5-stage integer pipeline, where a branch instruction causes a one-cycle delay.

7a) Construct an example using MIPS assembly code to show how you may schedule the branch delay slot to always eliminate the one-cycle branch delay.

This can be done if there is an independent instruction before the branch instruction. For example, consider the following loop:

    L1: SUB  R1, R1, 4
        ADD  R2, R2, -4
        ADD  R3, R3, R1
        BNEZ R1, L1

Here either ADD instruction can be scheduled in the branch delay slot because they are independent of the branch instruction.

7b) There are cases where scheduling the branch delay slot may not always eliminate the one-cycle branch delay. Construct an example using MIPS assembly code to illustrate one such case. What assumption must be made to ensure that the program will work correctly?

If there are no readily available independent instructions, then the instruction to which the branch jumps may be chosen for the branch delay slot. In order for this to be viable, it must be possible to either stop execution of this instruction or remove the effects that the instruction has on the state of the machine. For example, convert:

    L1: SUB  R1, R2, R3
        ADD  R5, R5, R1
        ADD  R2, R5, R3
        BNEZ R2, L1

to

        SUB  R1, R2, R3
    L1: ADD  R5, R5, R1
        ADD  R2, R5, R3
        BNEZ R2, L1
        SUB  R1, R2, R3

where the last SUB is in the branch delay slot.
