cs1541 HW4


Published by: Fei Kou on Dec 02, 2010
Copyright: Attribution Non-commercial


02/25/2014


Homework 4 (Due October 20)
Question 1: /30

In this question, consider the following series of address references (given as word addresses): 4, 1, 20, 5, 8, 17, 16, 44, 45, 4, 20, 21, 1, 20, 5, 21. For each of the following cache organizations, show the content of the cache after each memory reference and indicate whether the reference is a hit or a miss. Use [tag, M(address), ...] to describe the content of each entry. For example, [4,M(46)] indicates that the entry contains tag=4 and the data from memory location 46. Similarly, [4,M(46),M(47)] indicates that the entry contains a block of two words from memory locations 46 and 47. As discussed in class, avoid drawing the cache after each reference by drawing only one cache and indicating that an entry E1 is replaced by E2 by crossing out E1 and writing E2 next to it. Assume Least Recently Used replacement and assume that the cache is initially empty (invalid entries).

(a) a direct mapped cache with 16 one-word blocks. An address's tag is floor(A/16), and the index is A % 16.

Index   Content of cache (ordered from oldest to most recent)
0       [1,M(16)]
1       [0,M(1)] [1,M(17)] [0,M(1)]
2-3     (empty)
4       [0,M(4)] [1,M(20)] [0,M(4)] [1,M(20)]
5       [0,M(5)] [1,M(21)] [0,M(5)] [1,M(21)]
6-7     (empty)
8       [0,M(8)]
9-11    (empty)
12      [2,M(44)]
13      [2,M(45)]
14-15   (empty)

4(miss), 1(miss), 20(miss), 5(miss), 8(miss), 17(miss), 16(miss), 44(miss), 45(miss), 4(miss), 20(miss), 21(miss), 1(miss), 20(hit), 5(miss), 21(miss)

The second reference to 20 (the 11th reference) is a miss, because the reference to 4 just before it replaced [1,M(20)] at index 4.

(b) a direct mapped cache with two-word blocks and total size of 16 words. For this part, cache blocks are twice as large, so index 0 would hold data held by indices 0 and 1 in part a. An address's index is floor(A/2) % 8. The tags are unchanged from a.

Index   Content of cache (ordered from oldest to most recent)
0       [0,M(0),M(1)] [1,M(16),M(17)] [0,M(0),M(1)]
1       (empty)
2       [0,M(4),M(5)] [1,M(20),M(21)] [0,M(4),M(5)] [1,M(20),M(21)] [0,M(4),M(5)] [1,M(20),M(21)]
3       (empty)
4       [0,M(8),M(9)]
5       (empty)
6       [2,M(44),M(45)]
7       (empty)

4(miss), 1(miss), 20(miss), 5(miss), 8(miss), 17(miss), 16(hit), 44(miss), 45(hit), 4(hit), 20(miss), 21(hit), 1(miss), 20(hit), 5(miss), 21(miss)

(c) a 2-way associative cache with one-word blocks and total size of 16 words. Two-way set associativity means addresses mapping to index 0 hold blocks that would have mapped to indices 0 and 8 in part a. The index is A % 8 now, and the tag is floor(A/8). A cache set holding addresses 6 and 14, where 6 is the most recently used, looks like [0,M(6)|1,M(14)].

Index   Content of first way            Content of second way
0       [1,M(8)]                        [2,M(16)]
1       [0,M(1)]                        [2,M(17)]
2-3     (empty)                         (empty)
4       [0,M(4)] [5,M(44)] [2,M(20)]    [2,M(20)] [0,M(4)]
5       [0,M(5)] [2,M(21)]              [5,M(45)] [0,M(5)]
6-7     (empty)                         (empty)

4(miss), 1(miss), 20(miss), 5(miss), 8(miss), 17(miss), 16(miss), 44(miss), 45(miss), 4(miss), 20(miss), 21(miss), 1(hit), 20(hit), 5(miss), 21(hit)

(d) a 2-way associative cache with two-word blocks and total size of 16 words. With both 2-way set associativity and 2-word blocks, there are only 4 sets or valid indices. An address's index is floor(A/2) % 4, and its tag is floor(A/8).

Index   Content of first way             Content of second way
0       [0,M(0),M(1)] [2,M(16),M(17)]    [1,M(8),M(9)] [0,M(0),M(1)]
1       (empty)                          (empty)
2       [0,M(4),M(5)]                    [2,M(20),M(21)] [5,M(44),M(45)] [2,M(20),M(21)]
3       (empty)                          (empty)

4(miss), 1(miss), 20(miss), 5(hit), 8(miss), 17(miss), 16(hit), 44(miss), 45(hit), 4(hit), 20(miss), 21(hit), 1(miss), 20(hit), 5(hit), 21(hit)
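The hit/miss sequences for the four organizations can be cross-checked with a small LRU cache model. This is a sketch, not part of the original solution: the function name `simulate` and the (sets, ways, words per block) parameterization are illustrative assumptions. Each set is an `OrderedDict` kept in least- to most-recently-used order, so a hit moves the tag to the end and a miss on a full set evicts the first entry.

```python
from collections import OrderedDict

# Word-address reference string from Question 1.
REFS = [4, 1, 20, 5, 8, 17, 16, 44, 45, 4, 20, 21, 1, 20, 5, 21]

# Hypothetical parameterization: (number of sets, ways per set, words per block).
CONFIGS = {
    "a": (16, 1, 1),  # direct mapped, 16 one-word blocks
    "b": (8, 1, 2),   # direct mapped, 8 two-word blocks
    "c": (8, 2, 1),   # 2-way, one-word blocks, 8 sets
    "d": (4, 2, 2),   # 2-way, two-word blocks, 4 sets
}


def simulate(refs, num_sets, ways, block_words):
    """Return 'hit' or 'miss' for each reference under LRU replacement."""
    # One OrderedDict per set, keyed by tag, ordered least -> most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    results = []
    for addr in refs:
        block = addr // block_words   # block number in memory
        index = block % num_sets      # set the block maps to
        tag = block // num_sets       # remaining high-order part of the address
        entries = sets[index]
        if tag in entries:
            entries.move_to_end(tag)  # mark as most recently used
            results.append("hit")
        else:
            if len(entries) == ways:
                entries.popitem(last=False)  # evict the least recently used block
            entries[tag] = True
            results.append("miss")
    return results


if __name__ == "__main__":
    for part, (num_sets, ways, block_words) in CONFIGS.items():
        outcome = simulate(REFS, num_sets, ways, block_words)
        print(part, outcome.count("hit"), "hits:",
              ", ".join(f"{a}({r})" for a, r in zip(REFS, outcome)))
```

With `tag = block // num_sets`, the tag works out to floor(A/16) for (a) and (b) and floor(A/8) for (c) and (d). Run as a script, it reports 1, 5, 3 and 8 hits for parts (a) through (d).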

Question 2: /25
Compute the total number of bits required to implement each of the caches in question 1. Assume that each memory word is 32-bit long and that the entire memory contains 512KB (128 Kilo words). Note that the number of bits needed to implement the cache represents the total amount of memory needed for storing all of the data, tags, and valid bits.

In this case, the address of a word is 17 bits long (128K words = 2^17 words). To hold the data for all parts, we need 16 32-bit words, which is 512 bits. The difference for each cache organization is in the tag bits, valid bits, and bits to store LRU information. That is:
• Index bits do not need to be stored, but there are log2(the number of sets) of them; for a direct mapped cache this is log2(the number of cache lines).
• Tag bits are all bits needed to uniquely identify the address, which will be (17 - index bits - offset bits) for each cache entry.
• We need one valid bit for each cache entry.
• We need log2(associativity) bits for each block to store LRU information, i.e. 1 bit per entry for the 2-way caches.

a) • Tag bits: (17-4) * 16 = 208
   • Valid bits: 16
   • Total: 512 + 208 + 16 = 736 bits
b) • Tag bits: (17-3-1) * 8 = 104
   • Valid bits: 8
   • Total: 512 + 104 + 8 = 624 bits
c) • Tag bits: (17-3) * 16 = 224
   • Valid bits: 16
   • LRU bits: 16
   • Total: 512 + 224 + 16 + 16 = 768 bits
d) • Tag bits: (17-2-1) * 8 = 112
   • Valid bits: 8
   • LRU bits: 8
   • Total: 512 + 112 + 8 + 8 = 640 bits

Note that you are not expected to take into account the LRU overhead for this question. In this case, the answers for (c) and (d) change to 752 and 632, respectively.
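The four totals follow a single formula. A minimal sketch, assuming the parameters stated in the question (17-bit word addresses, 32-bit words, a 16-word cache) and one LRU bit per block for the 2-way caches as in the solution; `cache_bits` is a hypothetical helper name.

```python
import math

ADDRESS_BITS = 17  # 128K words = 2**17 word addresses
WORD_BITS = 32
CACHE_WORDS = 16


def cache_bits(ways, block_words, include_lru=True):
    """Total storage bits: data + tags + valid bits (+ optional LRU bits)."""
    blocks = CACHE_WORDS // block_words
    num_sets = blocks // ways
    index_bits = int(math.log2(num_sets))      # not stored, but removed from the tag
    offset_bits = int(math.log2(block_words))  # word offset within a block
    tag_bits = ADDRESS_BITS - index_bits - offset_bits
    total = CACHE_WORDS * WORD_BITS   # data: 16 * 32 = 512 bits
    total += tag_bits * blocks        # one tag per block
    total += blocks                   # one valid bit per block
    if include_lru and ways > 1:
        total += int(math.log2(ways)) * blocks  # log2(associativity) bits per block
    return total


if __name__ == "__main__":
    parts = {"a": (1, 1), "b": (1, 2), "c": (2, 1), "d": (2, 2)}
    for part, (ways, block_words) in parts.items():
        print(part, cache_bits(ways, block_words),
              cache_bits(ways, block_words, include_lru=False))
```

The `include_lru=False` column gives the totals without LRU overhead.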

Question 3: /20
Consider a program in which 40% of the instructions are memory load or store instructions. Assume that the cache miss penalty is 50 cycles, and assume that the CPI for this machine is 1 when the data and instructions are always found in the cache.
1. What is the effective CPI if the instruction cache miss rate is 4% and the data cache miss rate is 6%?
2. Assume that an L2 cache is added to the system, and that the hit time for the L2 cache is 10 cycles and its miss penalty is 40 cycles. What would be the effective CPI if 60% of the references to the L2 cache (the misses from L1) are L2 hits?

1. When we miss in either cache, we need to stall 50 cycles.
CPI = CPI(base) + CPI(inst) + CPI(data)
    = 1 + 0.04*50 + 0.06*0.4*50
    = 4.2

2. Here we first get the average miss time (T) for an L1 cache (in part 1 it was 50 cycles).
T = T(L2) + T(L2 miss)
  = 10 + (1-0.6)*40
  = 26
Now we repeat the steps in part 1, but we use 26 instead of 50 for the expected miss latency.
CPI = CPI(base) + CPI(inst) + CPI(data)
    = 1 + 0.04*26 + 0.06*0.4*26
    = 2.664

Question 4: /25
Do problems 5.6.4, 5.6.5, and 5.6.6 from the textbook.

5.6.4) What's the optimal block size for a miss latency of 20 * B cycles?
Just minimize the average latency for a memory access (miss rate * miss penalty).

a)
Block Size   Average latency equation   Average Latency
8            0.04 * 20 * 8              6.4
16           0.04 * 20 * 16             12.8
32           0.03 * 20 * 32             19.2
64           0.015 * 20 * 64            19.2
128          0.02 * 20 * 128            51.2
Optimal block size: 8

b)
Block Size   Average latency equation   Average Latency
8            0.08 * 20 * 8              12.8
16           0.03 * 20 * 16             9.6
32           0.018 * 20 * 32            11.52
64           0.015 * 20 * 64            19.2
128          0.02 * 20 * 128            51.2
Optimal block size: 16

5.6.5) What's the optimal block size for a miss latency of 24+B cycles?

a)
Block Size   Average latency equation   Average Latency
8            0.04 * (24+8)              1.28
16           0.04 * (24+16)             1.6
32           0.03 * (24+32)             1.68
64           0.015 * (24+64)            1.32
128          0.02 * (24+128)            3.04
Optimal block size: 8

b)
Block Size   Average latency equation   Average Latency
8            0.08 * (24+8)              2.56
16           0.03 * (24+16)             1.2
32           0.018 * (24+32)            1.008
64           0.015 * (24+64)            1.32
128          0.02 * (24+128)            3.04
Optimal block size: 32

5.6.6) For constant miss latency, what's the optimal block size?

a)
Block Size   Average latency equation   Average Latency
8            0.04 * C                   0.04C
16           0.04 * C                   0.04C
32           0.03 * C                   0.03C
64           0.015 * C                  0.015C
128          0.02 * C                   0.02C
Optimal block size: 64

b)
Block Size   Average latency equation   Average Latency
8            0.08 * C                   0.08C
16           0.03 * C                   0.03C
32           0.018 * C                  0.018C
64           0.015 * C                  0.015C
128          0.02 * C                   0.02C
Optimal block size: 64
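Questions 3 and 4 both reduce to short arithmetic, so they can be recomputed in a few lines. A sketch under stated assumptions: the function names are hypothetical, and the Question 4 miss rates are read back from the latency equations in the solution tables rather than copied from the textbook.

```python
# --- Question 3 ---
def effective_cpi(base_cpi, mem_fraction, i_miss_rate, d_miss_rate, miss_time):
    """Base CPI plus instruction-fetch stalls plus load/store stalls."""
    inst_stalls = i_miss_rate * miss_time                 # every instruction is fetched
    data_stalls = mem_fraction * d_miss_rate * miss_time  # only loads/stores use the data cache
    return base_cpi + inst_stalls + data_stalls


def l1_miss_time(l2_hit_time, l2_hit_rate, l2_miss_penalty):
    """Average time to service an L1 miss when an L2 cache is present."""
    return l2_hit_time + (1 - l2_hit_rate) * l2_miss_penalty


# --- Question 4 ---
BLOCK_SIZES = [8, 16, 32, 64, 128]
MISS_RATES = {  # assumed: recovered from the equations in the tables
    "a": [0.04, 0.04, 0.03, 0.015, 0.02],
    "b": [0.08, 0.03, 0.018, 0.015, 0.02],
}


def best_block_size(miss_rates, miss_latency):
    """Block size minimizing miss rate * miss latency(block size)."""
    costs = {b: r * miss_latency(b) for b, r in zip(BLOCK_SIZES, miss_rates)}
    return min(costs, key=costs.get)


if __name__ == "__main__":
    print("CPI without L2:", effective_cpi(1, 0.4, 0.04, 0.06, 50))
    t = l1_miss_time(10, 0.6, 40)
    print("L1 miss time with L2:", t)
    print("CPI with L2:", effective_cpi(1, 0.4, 0.04, 0.06, t))
    latencies = {
        "20 * B": lambda b: 20 * b,
        "24 + B": lambda b: 24 + b,
        "constant C": lambda b: 1,  # C factors out, so any positive constant works
    }
    for name, latency in latencies.items():
        for part, rates in MISS_RATES.items():
            print(name, part, best_block_size(rates, latency))
```

Because floating-point arithmetic is used, printed values may differ from 4.2, 26 and 2.664 in the last digits.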
