
Pipelining tutorial

1. Assume that IF steps take 25 nanoseconds, MEM steps of instruction execution require 45 nanoseconds, and the other steps require 20 nanoseconds. Consider a machine which supports two instruction schedules: R class instructions take 4 steps and I class instructions take 5 steps.
i. For a multi-cycle implementation, what is the minimum clock cycle time?
ii. Assume an instruction mix of 60% R class and 40% I class instructions. How long does it take to execute 100 instructions, in nanoseconds?

Ans. i. In a multi-cycle implementation each step takes one clock cycle, so the clock cycle time is the time of the longest step (MEM): 45 ns.
ii. For a multi-cycle implementation, exec_time = (cycle time × CPI × IC) for R + (cycle time × CPI × IC) for I:
$$\text{exec\_time} = 100 \times (4 \times 45 \times 0.6 + 5 \times 45 \times 0.4) = 100 \times (108 + 90) = 19800\ \text{ns}$$
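The two steps of the answer (pick the slowest step as the clock period, then weight each class's cycle count by its share of the mix) can be sketched as below. The 4-cycle and 5-cycle counts for R and I class instructions are the ones the worked answer uses; the function names are illustrative only.

```python
# Multi-cycle timing sketch for Question 1 (step times taken from the question).
# Assumption: R class = 4 cycles, I class = 5 cycles, as in the worked answer.

STEP_TIMES_NS = {"IF": 25, "MEM": 45, "other": 20}

def min_cycle_time(step_times):
    """Each step gets one clock cycle, so the slowest step sets the period."""
    return max(step_times.values())

def exec_time_ns(n_instr, cycle_ns, mix):
    """mix maps class name -> (fraction of instructions, cycles per instruction)."""
    avg_cpi = sum(frac * cpi for frac, cpi in mix.values())
    return n_instr * cycle_ns * avg_cpi

cycle = min_cycle_time(STEP_TIMES_NS)
mix = {"R": (0.6, 4), "I": (0.4, 5)}
total = exec_time_ns(100, cycle, mix)
print(f"cycle time = {cycle} ns, 100 instructions take {total:.0f} ns")
```

Computing the average CPI first (0.6 × 4 + 0.4 × 5 = 4.4) gives the same result as summing the per-class times.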

Q. You have a system that contains a special processor for doing floating-point operations. You have determined that 60% of your computations can use the floating-point processor. When a program uses the floating-point processor, it runs 40% faster than when it doesn't use it.
i. What is the overall speedup obtained by using the floating-point processor?
ii. In order to improve the speedup you are considering two options:
Option 1: Modifying the compiler so that 70% of the computations can use the floating-point processor. Cost of this option is $50K.
Option 2: Modifying the floating-point processor so that it runs 100% faster than when it is not used. Assume in this case that 50% of the computations can use the floating-point processor. Cost of this option is $60K.
Which option would you recommend? Justify your answer quantitatively.

ANS: i. Let F be the fraction of computation time that can use the floating-point processor and S its speedup. Then
$$\text{overall speedup} = \frac{1}{(1 - F) + F/S}$$
With F = 0.6 and S = 1.4: overall speedup = 1/[(1 - 0.6) + 0.6/1.4] ≈ 1.207.

ii. For option 1, F = 0.7 and S = 1.4: overall speedup = 1/[(1 - 0.7) + 0.7/1.4] = 1.25. Cost/Performance = $50K/1.25 = $40K.
For option 2, F = 0.5 and S = 2: overall speedup = 1/[(1 - 0.5) + 0.5/2] ≈ 1.33. Cost/Performance = $60K/1.33 ≈ $45K.
Therefore, Option 1 is better: although Option 2 gives a slightly larger speedup, Option 1 costs less per unit of speedup.
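The answer applies Amdahl's law and then compares cost per unit of speedup. A small sketch that reproduces both parts, using only the fractions, speedups, and costs stated in the question (the option labels are just dictionary keys chosen here):

```python
# Amdahl's law for the floating-point questions above.

def amdahl_speedup(fraction, speedup):
    """Overall speedup when `fraction` of execution time is accelerated by `speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)

# i. Baseline: 60% of the computation runs 1.4x faster on the FP processor.
baseline = amdahl_speedup(0.6, 1.4)

# ii. Compare the two options by cost per unit of speedup (lower is better).
options = {
    "Option 1 (compiler)": {"fraction": 0.7, "speedup": 1.4, "cost_k": 50},
    "Option 2 (FP processor)": {"fraction": 0.5, "speedup": 2.0, "cost_k": 60},
}
results = {}
for name, o in options.items():
    s = amdahl_speedup(o["fraction"], o["speedup"])
    cost_perf = o["cost_k"] / s
    results[name] = (s, cost_perf)
    print(f"{name}: speedup {s:.3f}, cost/performance ${cost_perf:.1f}K")

best = min(results, key=lambda n: results[n][1])
print(f"baseline speedup {baseline:.3f}; recommend {best}")
```

Ranking by raw speedup alone would pick Option 2; the cost/performance ratio is what makes Option 1 the better buy.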

Q. Given a 100 MHz machine with a miss penalty of 20 cycles, a hit time of 2 cycles, and a miss rate of 5%, calculate the average memory access time (AMAT).
Ans. AMAT = hit_time + miss_rate × miss_penalty. Since the clock rate is 100 MHz, the cycle time is 1/(100 MHz) = 10 ns, which gives
$$\text{AMAT} = 10\ \text{ns} \times (2 + 20 \times 0.05) = 30\ \text{ns}$$
Note: Here we needed to multiply by the cycle time because hit_time and miss_penalty were given in cycles.
Q. Suppose doubling the size of the cache decreases the miss rate to 3%, but causes the hit time to increase to 3 cycles and the miss penalty …
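The AMAT formula with a cycle-to-nanosecond conversion, as a minimal sketch (the function name `amat_ns` is illustrative). Only the first question is evaluated, since the new miss penalty in the follow-up question is not given here.

```python
def amat_ns(clock_hz, hit_cycles, miss_rate, miss_penalty_cycles):
    """AMAT = hit_time + miss_rate * miss_penalty, converted from cycles to ns."""
    cycle_ns = 1e9 / clock_hz  # 100 MHz -> 10 ns per cycle
    return cycle_ns * (hit_cycles + miss_rate * miss_penalty_cycles)

print(f"AMAT = {amat_ns(100e6, 2, 0.05, 20):.1f} ns")
```

Keeping hit time and miss penalty in cycles and converting once at the end mirrors the note in the answer.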

1. Consider the following sequence of instructions:
Add #20, R0, R1
Mul #3, R2, R3
And #$3A, R2, R4
Add R0, R2, R5
In all instructions, the destination operand is given last. Initially, registers R0 and R2 contain 2000 and 50, respectively. These instructions are executed in a computer that has a four-stage pipeline. Assume that the first instruction is fetched in clock cycle 1, and that instruction fetch requires only one clock cycle.
a. Describe the operation being performed …
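A hypothetical model of this sequence, useful for checking register results and which instruction occupies which stage in each cycle. Assumptions, not stated in the question: the stages are named F (fetch), D (decode), E (execute), W (write), each takes one cycle, "#" marks an immediate and "$" a hexadecimal value. Since no instruction reads a register written by an earlier one (all read only R0 and R2), the sketch assumes no stalls.

```python
# Hypothetical 4-stage pipeline model for the sequence above.
STAGES = ["F", "D", "E", "W"]  # assumed: Fetch, Decode, Execute, Write

# (mnemonic, first source: immediate int or register name, second source, destination)
PROGRAM = [
    ("Add", 20, "R0", "R1"),
    ("Mul", 3, "R2", "R3"),
    ("And", 0x3A, "R2", "R4"),
    ("Add", "R0", "R2", "R5"),
]

OPS = {
    "Add": lambda a, b: a + b,
    "Mul": lambda a, b: a * b,
    "And": lambda a, b: a & b,
}

def run(regs):
    """Execute PROGRAM sequentially and return the final register file."""
    regs = dict(regs)
    for op, src1, src2, dst in PROGRAM:
        a = regs[src1] if isinstance(src1, str) else src1  # register or immediate
        regs[dst] = OPS[op](a, regs[src2])
    return regs

def schedule(n_instr, stages=STAGES):
    """Instruction i (0-based) is in stage s during clock cycle i + s + 1 (no stalls)."""
    table = {}
    for i in range(n_instr):
        for s, name in enumerate(stages):
            table.setdefault(i + s + 1, []).append(f"I{i + 1}:{name}")
    return table

final = run({"R0": 2000, "R2": 50})
for cycle, busy in sorted(schedule(len(PROGRAM)).items()):
    print(f"cycle {cycle}: {', '.join(busy)}")
print({r: final[r] for r in ("R1", "R3", "R4", "R5")})
```

Under these assumptions all four instructions finish in 7 clock cycles, with every stage busy in cycle 4.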


Eg. Consider an un-pipelined processor. Assume that it has a 1-ns clock cycle and that it uses 4 cycles for ALU operations, 5 cycles for branches, and 4 cycles for memory operations. Assume that the relative frequencies of these operations are 50%, 35%, and 15%, respectively. Suppose that due to clock skew and setup, pipelining the processor adds 0.15 ns of overhead to the clock. Ignoring any latency impact, how much speedup in the instruction execution rate will we gain from pipelining?

The average instruction execution time on an un-pipelined processor is
$$\text{clock cycle} \times \text{Avg. CPI} = 1\ \text{ns} \times ((0.5 \times 4) + (0.35 \times 5) + (0.15 \times 4)) = 4.35\ \text{ns}$$
The average instruction execution time on the pipelined processor is 1 ns + 0.15 ns = 1.15 ns.
So speedup = 4.35/1.15 ≈ 3.78.
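The same calculation as a short sketch, assuming (as the answer does) that the pipelined processor completes one instruction per clock, so its average instruction time is just the lengthened clock period. The frequencies, cycle counts, and 0.15 ns overhead are those given in the question.

```python
def avg_time_unpipelined(clock_ns, mix):
    """mix: list of (relative frequency, cycles) pairs."""
    return clock_ns * sum(freq * cycles for freq, cycles in mix)

CLOCK_NS = 1.0
OVERHEAD_NS = 0.15                              # clock skew + setup added by pipelining
MIX = [(0.50, 4), (0.35, 5), (0.15, 4)]         # ALU, branch, memory

unpipelined = avg_time_unpipelined(CLOCK_NS, MIX)
pipelined = CLOCK_NS + OVERHEAD_NS              # ideal CPI of 1, longer clock
speedup = unpipelined / pipelined
print(f"{unpipelined:.2f} ns / {pipelined:.2f} ns = speedup {speedup:.2f}")
```

Making the overhead a named constant shows how directly it limits the gain: the speedup is the un-pipelined average time divided by (clock + overhead).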

Q. Assume a pipeline with four stages: fetch instruction (FI), decode instruction and calculate address (DA), fetch operand (FO), and execute (EX). Draw a diagram for a sequence of 7 instructions in which the third instruction is a branch that is taken and in which there are no data dependencies.
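One way to produce such a diagram, as a sketch under assumptions not given in the question: the branch target is known only after the branch's EX stage, any instructions fetched after the branch are discarded (and not drawn), and the target is fetched in the next cycle. Instruction 3 is the branch; instructions 4 to 7 are the ones on the taken path. The helper names are hypothetical.

```python
STAGES = ["FI", "DA", "FO", "EX"]

def branch_timing(n_instr=7, branch_at=3):
    """Return {instruction number: cycle of its FI stage} for a taken branch
    resolved in EX; wrong-path fetches are discarded and omitted."""
    branch_ex = branch_at + len(STAGES) - 1          # cycle in which the branch executes
    start = {}
    for i in range(1, n_instr + 1):
        if i <= branch_at:
            start[i] = i                             # normal one-per-cycle issue
        else:
            start[i] = branch_ex + (i - branch_at)   # refill after the branch resolves
    return start

def draw(start):
    """Render a cycle-by-stage text diagram; return (text, total cycles)."""
    last = max(start.values()) + len(STAGES) - 1
    rows = ["instr  " + " ".join(f"{c:>2}" for c in range(1, last + 1))]
    for i, s in start.items():
        cells = ["  "] * last
        for k, name in enumerate(STAGES):
            cells[s - 1 + k] = name
        rows.append(f"{i:>5}  " + " ".join(cells))
    return "\n".join(rows), last

diagram, total_cycles = draw(branch_timing())
print(diagram)
print("total cycles:", total_cycles)
```

With these assumptions the branch costs three idle fetch slots, so the 7 instructions need 13 cycles instead of the 10 an unbroken pipeline would take.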

1. Design a 16-bit memory of total capacity 8192 bits using SRAM chips of size 64 × 1 bit. Give the array configuration of the chips on the memory board, showing all required input and output signals for assigning this memory to the lowest address space. The design should allow for both byte and 16-bit word accesses.
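The sizing arithmetic behind the chip array can be checked with a small sketch. It assumes byte addressing, so the lowest address bit selects the byte within a word; one hypothetical assignment is A0 for byte select, A1 to A6 to the six address pins of every chip, and A7 to A9 into a 3-to-8 decoder producing the row chip-selects, with all higher address bits 0 for the lowest address space. These signal names are illustrative, not from the question.

```python
def memory_layout(total_bits=8192, word_bits=16, chip_words=64, chip_bits=1):
    """Size a word-organized memory built from chip_words x chip_bits chips.
    Assumes power-of-two sizes and a byte-addressable memory."""
    words = total_bits // word_bits
    chips_per_row = word_bits // chip_bits          # chips side by side supply one word
    rows = words // chip_words                      # each row stores chip_words words
    # (n - 1).bit_length() equals log2(n) when n is a power of two
    chip_addr_lines = (chip_words - 1).bit_length()        # pins shared by every chip
    row_select_lines = (rows - 1).bit_length()             # inputs to the row decoder
    byte_select_lines = (word_bits // 8 - 1).bit_length()  # picks a byte within a word
    return {
        "words": words,
        "chips_per_row": chips_per_row,
        "rows": rows,
        "total_chips": chips_per_row * rows,
        "chip_addr_lines": chip_addr_lines,
        "row_select_lines": row_select_lines,
        "byte_select_lines": byte_select_lines,
        "byte_address_bits": byte_select_lines + chip_addr_lines + row_select_lines,
    }

layout = memory_layout()
for key, value in layout.items():
    print(f"{key}: {value}")
```

For byte accesses, the byte-select bit together with a byte/word control signal would enable either the 8 chips holding the low byte, the 8 holding the high byte, or all 16 for a word access.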