
# Homework 4 (Due October 20)
Question 1: /30

In this question, consider the following series of address references (given as word addresses): 4, 1, 20, 5, 8, 17, 16, 44, 45, 4, 20, 21, 1, 20, 5, 21.

For each of the following cache organizations, show the content of the cache after each memory reference and indicate whether the reference is a hit or a miss. Use [tag, M(address), ...] to describe the content of each entry. For example, [4,M(46)] indicates that the entry contains tag=4 and the data from memory location 46. Similarly, [4,M(46),M(47)] indicates that the entry contains a block of two words from memory locations 46 and 47. As discussed in class, avoid drawing the cache after each reference by drawing only one cache and indicating that an entry E1 is replaced by E2 by crossing out E1 and writing E2 next to it. Assume Least Recently Used replacement and assume that the cache is initially empty (invalid entries).

(a) a direct mapped cache with 16 one-word blocks.

An address's tag is floor(A/16), the index is A % 16.

| Index | Content of cache (ordered from oldest to most recent) |
|---|---|
| 0 | [1,M(16)] |
| 1 | [0,M(1)] [1,M(17)] [0,M(1)] |
| 2 | |
| 3 | |
| 4 | [0,M(4)] [1,M(20)] [0,M(4)] [1,M(20)] |
| 5 | [0,M(5)] [1,M(21)] [0,M(5)] [1,M(21)] |
| 6 | |
| 7 | |
| 8 | [0,M(8)] |
| 9 | |
| 10 | |
| 11 | |
| 12 | [2,M(44)] |
| 13 | [2,M(45)] |
| 14 | |
| 15 | |

4(miss), 1(miss), 20(miss), 5(miss), 8(miss), 17(miss), 16(miss), 44(miss), 45(miss), 4(miss), 20(miss), 21(miss), 1(miss), 20(hit), 5(miss), 21(miss)

(b) a direct mapped cache with two-word blocks and total size of 16 words.

For this part, cache blocks are twice as large, so index 0 holds the data held by indices 0 and 1 in part (a). An address's index is floor(A/2) % 8. The tags are unchanged from part (a).
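The index and tag formulas in parts (a) and (b) follow one pattern: block address = floor(A / block size), index = block address % number of sets, tag = floor(block address / number of sets). The sketch below is one way to cross-check the hit/miss sequences under that assumption. The names `simulate`, `REFS`, `num_sets`, `block_words` and `ways` are illustrative and not part of the assignment; setting `ways` above 1 models a set-associative cache with LRU replacement.

```python
from collections import OrderedDict

# Word-address reference stream from Question 1.
REFS = [4, 1, 20, 5, 8, 17, 16, 44, 45, 4, 20, 21, 1, 20, 5, 21]


def simulate(refs, num_sets, block_words=1, ways=1):
    """Return [(address, 'hit' | 'miss'), ...] for an LRU cache.

    ways=1 gives a direct mapped cache; ways>1 gives a set-associative one.
    """
    # One OrderedDict per set, keyed by tag, ordered from least to
    # most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    results = []
    for addr in refs:
        block = addr // block_words   # block address
        index = block % num_sets      # which set the block maps to
        tag = block // num_sets       # remaining upper part of the address
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)    # mark as most recently used
            results.append((addr, "hit"))
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used
            lines[tag] = None
            results.append((addr, "miss"))
    return results


if __name__ == "__main__":
    configs = {
        "(a)": dict(num_sets=16, block_words=1, ways=1),
        "(b)": dict(num_sets=8, block_words=2, ways=1),
    }
    for name, cfg in configs.items():
        outcome = simulate(REFS, **cfg)
        print(name, ", ".join(f"{a}({r})" for a, r in outcome))
```

The same function can be called with `ways=2` and `num_sets=8` or `num_sets=4` to check the set-associative organizations.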

| Index | Content of cache (ordered from oldest to most recent) |
|---|---|
| 0 | [0,M(0),M(1)] [1,M(16),M(17)] [0,M(0),M(1)] |
| 1 | |
| 2 | [0,M(4),M(5)] [1,M(20),M(21)] [0,M(4),M(5)] [1,M(20),M(21)] [0,M(4),M(5)] [1,M(20),M(21)] |
| 3 | |
| 4 | [0,M(8),M(9)] |
| 5 | |
| 6 | [2,M(44),M(45)] |
| 7 | |

4(miss), 1(miss), 20(miss), 5(miss), 8(miss), 17(miss), 16(hit), 44(miss), 45(hit), 4(hit), 20(miss), 21(hit), 1(miss), 20(hit), 5(miss), 21(miss)

(c) a 2-way associative cache with one-word blocks and total size of 16 words

Two-way set associativity means addresses mapping to index 0 hold blocks that would have mapped to indices 0 and 8 in part (a). The index is A % 8 now, and the tag is floor(A/8). A set holding addresses 6 and 14, where 6 is the most recently used, looks like [0,M(6)|1,M(14)].

| Index | Content of first way | Content of second way |
|---|---|---|
| 0 | [1,M(8)] | [2,M(16)] |
| 1 | [0,M(1)] | [2,M(17)] |
| 2 | | |
| 3 | | |
| 4 | [0,M(4)] [5,M(44)] [2,M(20)] | [2,M(20)] [0,M(4)] |
| 5 | [0,M(5)] [2,M(21)] | [5,M(45)] [0,M(5)] |
| 6 | | |
| 7 | | |

4(miss), 1(miss), 20(miss), 5(miss), 8(miss), 17(miss), 16(miss), 44(miss), 45(miss), 4(miss), 20(miss), 21(miss), 1(hit), 20(hit), 5(miss), 21(hit)

(d) a 2-way associative cache with two-word blocks and total size of 16 words

With both 2-way set associativity and 2-word blocks, there are only 4 sets or valid indices. An address's index is floor(A/2) % 4, its tag is floor(A/8).

| Index | Content of first way | Content of second way |
|---|---|---|
| 0 | [0,M(0),M(1)] [2,M(16),M(17)] | [1,M(8),M(9)] [0,M(0),M(1)] |
| 1 | | |
| 2 | [0,M(4),M(5)] | [2,M(20),M(21)] [5,M(44),M(45)] [2,M(20),M(21)] |
| 3 | | |

4(miss), 1(miss), 20(miss), 5(hit), 8(miss), 17(miss), 16(hit), 44(miss), 45(hit), 4(hit), 20(miss), 21(hit), 1(miss), 20(hit), 5(hit), 21(hit)

Question 2: /25

Compute the total number of bits required to implement each of the caches in question 1. Note that the number of bits needed to implement the cache represents the total amount of memory needed for storing all of the data, tags and valid bits. Assume that each memory word is 32-bit long and that the entire memory contains 512KB (128 Kilo words). That is, the address of a word is 17-bit long.

To hold the data for all parts, we need 16 32-bit words, which is 512 bits. The difference for each cache organization is in the tag bits, valid bits, and bits to store LRU information.

- Index bits do not need to be stored, but are log2(the number of cache lines); for the set-associative caches this is log2(the number of sets).
- Tag bits are all bits needed to uniquely identify the address, which will be (17 - index bits - offset bits) for each cache entry.
- We need one valid bit for each cache entry.
- We need log2(associativity) bits for each block to store LRU information. In this case, 1 bit per entry.

a)
- Tag bits: (17-4) * 16 = 208
- Valid bits: 16
- Total: 512+208+16 = 736 bits

b)
- Tag bits: (17-3-1) * 8 = 104
- Valid bits: 8
- Total: 512+104+8 = 624 bits

c)
- Tag bits: (17-3) * 16 = 224
- Valid bits: 16
- LRU bits: 16
- Total: 512+224+16+16 = 768 bits

d)
- Tag bits: (17-2-1) * 8 = 112
- Valid bits: 8
- LRU bits: 8
- Total: 512+112+8+8 = 640 bits

Note that you are not expected to take into account the LRU overhead for this question. In this case, the answers for (c) and (d) change to 752 and 632, respectively.
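The bit counts in Question 2 all come from the same accounting: data bits, plus tag bits per entry, plus one valid bit per entry, plus optional LRU bits. A minimal sketch of that accounting follows, assuming power-of-two sizes and the 17-bit word address given in the question. The function name `cache_bits` and its parameters are hypothetical and chosen only for illustration.

```python
ADDRESS_BITS = 17   # word-address width stated in the question
WORD_BITS = 32      # bits per memory word
CACHE_WORDS = 16    # total data capacity of every cache in Question 1


def log2_int(n):
    """Integer log2; assumes n is a power of two."""
    return n.bit_length() - 1


def cache_bits(block_words, ways, include_lru=True):
    """Total storage bits for a cache holding CACHE_WORDS words."""
    entries = CACHE_WORDS // block_words      # number of cache blocks
    num_sets = entries // ways
    index_bits = log2_int(num_sets)           # not stored, only used to select a set
    offset_bits = log2_int(block_words)       # selects the word within a block
    tag_bits = ADDRESS_BITS - index_bits - offset_bits
    data = CACHE_WORDS * WORD_BITS
    tags = tag_bits * entries
    valid = entries
    lru = log2_int(ways) * entries if include_lru else 0
    return data + tags + valid + lru


if __name__ == "__main__":
    organizations = {
        "a": (1, 1),  # (block_words, ways)
        "b": (2, 1),
        "c": (1, 2),
        "d": (2, 2),
    }
    for name, (block_words, ways) in organizations.items():
        with_lru = cache_bits(block_words, ways)
        without_lru = cache_bits(block_words, ways, include_lru=False)
        print(name, with_lru, without_lru)
```

Passing `include_lru=False` corresponds to ignoring the LRU overhead, which only affects the two set-associative organizations.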

Question 3: /20

Consider a program in which 40% of the instructions are memory load or store instructions, and assume that the CPI for this machine is 1 when the data and instructions are always found in the cache. Assume that the cache miss penalty is 50 cycles.

1. What is the effective CPI if the instruction cache miss rate is 4% and the data cache miss rate is 6%?
2. Assume that an L2 cache is added to the system, and that the hit time for the L2 cache is 10 cycles and its miss penalty is 40 cycles. What would be the effective CPI if 60% of the references to the L2 cache (the misses from L1) are L2 hits?

1. When we miss in either cache, we need to stall 50 cycles.

$$
\begin{aligned}
\text{CPI} &= \text{CPI(base)} + \text{CPI(inst)} + \text{CPI(data)} \\
&= 1 + 0.04 \times 50 + 0.06 \times 0.4 \times 50 \\
&= 4.2
\end{aligned}
$$

2. Here we first get the average miss time (T) for an L1 cache (in part 1 it was 50 cycles).

$$
\begin{aligned}
T &= T(\text{L2}) + T(\text{L2 miss}) \\
&= 10 + (1 - 0.6) \times 40 \\
&= 26
\end{aligned}
$$

Now we repeat the steps in part 1, but we use 26 instead of 50 for the expected miss latency.

$$
\begin{aligned}
\text{CPI} &= \text{CPI(base)} + \text{CPI(inst)} + \text{CPI(data)} \\
&= 1 + 0.04 \times 26 + 0.06 \times 0.4 \times 26 \\
&= 2.664 \approx 2.7
\end{aligned}
$$

Question 4: /25

Do problems 5.6.4, 5.6.5 and 5.6.6 from the textbook.

5.6.4) What's the optimal block size for a miss latency of 20 * B cycles?

Just minimize the average latency for a memory access (miss rate * miss penalty).

a)

| Block Size | Average latency equation | Average Latency |
|---|---|---|
| 8 | 0.04 * 20 * 8 | 6.4 |
| 16 | 0.04 * 20 * 16 | 12.8 |
| 32 | 0.03 * 20 * 32 | 19.2 |
| 64 | 0.015 * 20 * 64 | 19.2 |
| 128 | 0.02 * 20 * 128 | 51.2 |

The minimum is at a block size of 8.

b)

| Block Size | Average latency equation | Average Latency |
|---|---|---|
| 8 | 0.08 * 20 * 8 | 12.8 |
| 16 | 0.03 * 20 * 16 | 9.6 |
| 32 | 0.018 * 20 * 32 | 11.52 |
| 64 | 0.015 * 20 * 64 | 19.2 |
| 128 | 0.02 * 20 * 128 | 51.2 |

The minimum is at a block size of 16.

5.6.5) What's the optimal block size for a miss latency of 24+B cycles?

a)

| Block Size | Average latency equation | Average Latency |
|---|---|---|
| 8 | 0.04 * (24+8) | 1.28 |
| 16 | 0.04 * (24+16) | 1.6 |
| 32 | 0.03 * (24+32) | 1.68 |
| 64 | 0.015 * (24+64) | 1.32 |
| 128 | 0.02 * (24+128) | 3.04 |

The minimum is at a block size of 8.

b)

| Block Size | Average latency equation | Average Latency |
|---|---|---|
| 8 | 0.08 * (24+8) | 2.56 |
| 16 | 0.03 * (24+16) | 1.2 |
| 32 | 0.018 * (24+32) | 1.008 |
| 64 | 0.015 * (24+64) | 1.32 |
| 128 | 0.02 * (24+128) | 3.04 |

The minimum is at a block size of 32.

5.6.6) For constant miss latency, what's the optimal block size?

Here C is the constant miss latency in cycles.

a)

| Block Size | Average latency equation | Average Latency |
|---|---|---|
| 8 | 0.04 * C | 0.04C |
| 16 | 0.04 * C | 0.04C |
| 32 | 0.03 * C | 0.03C |
| 64 | 0.015 * C | 0.015C |
| 128 | 0.02 * C | 0.02C |

The minimum is at a block size of 64.

b)

| Block Size | Average latency equation | Average Latency |
|---|---|---|
| 8 | 0.08 * C | 0.08C |
| 16 | 0.03 * C | 0.03C |
| 32 | 0.018 * C | 0.018C |
| 64 | 0.015 * C | 0.015C |
| 128 | 0.02 * C | 0.02C |

The minimum is at a block size of 64.
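The three problems in Question 4 apply the same rule, average latency = miss rate * miss penalty, with a different penalty model each time. The sketch below repeats that computation, assuming the miss rates as read from the tables in this solution. For the constant-latency case C is normalized to 1, which does not change which block size is smallest. `MISS_RATES`, `PENALTIES` and `best_block_size` are illustrative names, not from the textbook.

```python
# Miss rate per block size, as read from the tables in this solution.
MISS_RATES = {
    "a": {8: 0.04, 16: 0.04, 32: 0.03, 64: 0.015, 128: 0.02},
    "b": {8: 0.08, 16: 0.03, 32: 0.018, 64: 0.015, 128: 0.02},
}

# Miss penalty in cycles as a function of block size B.
PENALTIES = {
    "5.6.4": lambda b: 20 * b,   # penalty grows linearly with block size
    "5.6.5": lambda b: 24 + b,   # fixed overhead plus per-byte cost
    "5.6.6": lambda b: 1,        # constant latency C, normalized to C = 1
}


def best_block_size(rates, penalty):
    """Return (block size with the lowest average latency, all latencies)."""
    latencies = {b: rate * penalty(b) for b, rate in rates.items()}
    best = min(latencies, key=latencies.get)
    return best, latencies


if __name__ == "__main__":
    for problem, penalty in PENALTIES.items():
        for part, rates in MISS_RATES.items():
            best, latencies = best_block_size(rates, penalty)
            print(problem, part, "optimal block size:", best)
```

Keeping the penalty as a function argument makes it easy to try other latency models without touching the miss-rate data.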