
Key 7.

7.10 Notice that we have blocks of size four, so each block (row) in our cache holds four words. The word offsets within a block are labeled 00, 01, 10 and 11. There are four rows (cache blocks), also labeled 00, 01, 10 and 11. The sequence we are given is a sequence of word addresses. To convert it into a sequence of block numbers we divide by 4 and throw away the remainder (which is the offset). Word 2 is found in block 0 (it is actually the third word in that block). Word 3 is also found in block 0 (3/4 = 0 with a remainder of 3); it is the 4th word in that block. Word 11 is in block 2 and is the 4th word in that block.

Thus our word reference sequence:

2 3 11 16 21 13 64 48 19 11 3 22 4 27 6 11

when divided by 4 using integer division becomes our block number reference sequence:

0 0 2 4 5 3 16 12 4 2 0 5 1 6 1 2

We map these block numbers into 4 cache blocks by direct mapping (mod 4) to get the cache block number sequence:

0 0 2 0 1 3 0 0 0 2 0 1 1 2 1 2

Thus we get the following map, where the contents of each cache block are read left to right over time (CBN is cache block number and BN is block number):

CBN   BN (contents over time, left to right)
 0    0  4  16  12  4  0
 1    5  1
 2    2  6  2
 3    3

To see this, consider the following explanation. Word 2 maps to block 0; this gives us a miss, and block 0 (which contains words 0-3) is loaded into cache block 0 mod 4, which is cache block 0. Word 3 maps to block 0, which is in CBN 0, so we have a hit. Word 11 maps to block 2 (as are all words 8-11); 2 mod 4 is 2, miss, so block 2 is stored in CBN 2. Word 16 maps to block 4 (as are all words 16-19); 4 mod 4 is 0, miss, so block 0 is removed from CBN 0 and replaced by block 4. Word 21 maps to block 5 (words 20-23); 5 mod 4 is 1, miss, so CBN 1 now contains block 5. Word 13 maps to block 3; 3 mod 4 is 3, miss, so CBN 3 now contains block 3. Word 64 maps to block 16, which in turn maps to CBN 0, miss. Word 48 maps to block 12, which in turn maps to CBN 0, miss. Word 19 maps to block 4, which in turn maps to CBN 0, miss. Word 11 maps to block 2, which in turn maps to CBN 2, hit. Word 3 maps to block 0, which in turn maps to CBN 0, miss. Word 22 maps to block 5, which in turn maps to CBN 1, hit. Word 4 maps to block 1, which in turn maps to CBN 1, miss. Word 27 maps to block 6, which in turn maps to CBN 2, miss. Word 6 maps to block 1, which in turn maps to CBN 1, hit. Word 11 maps to block 2, which in turn maps to CBN 2, miss.

Thus we had 4 hits and 12 misses.

Final contents of the cache:

CBN 0 contains block number 0 (words 0-3)
CBN 1 contains block number 1 (words 4-7)
CBN 2 contains block number 2 (words 8-11)
CBN 3 contains block number 3 (words 12-15)

Whew!!
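For anyone who wants to check the bookkeeping, here is a minimal Python sketch (my own, not part of the original key) that replays the word-address sequence through a 4-block direct-mapped cache with 4-word blocks; it reports 4 hits, 12 misses, and final contents 0, 1, 2, 3:

words = [2, 3, 11, 16, 21, 13, 64, 48, 19, 11, 3, 22, 4, 27, 6, 11]

cache = [None] * 4          # cache[CBN] holds the block number currently stored there
hits = misses = 0

for w in words:
    block = w // 4          # block number = word address divided by block size
    cbn = block % 4         # direct mapping: block number mod number of cache blocks
    if cache[cbn] == block:
        hits += 1
        print(f"word {w:2d} -> block {block:2d} -> CBN {cbn}: hit")
    else:
        misses += 1
        old = cache[cbn]
        note = "" if old is None else f" (replaces block {old})"
        print(f"word {w:2d} -> block {block:2d} -> CBN {cbn}: miss{note}")
        cache[cbn] = block

print(f"{hits} hits, {misses} misses; final cache contents: {cache}")
# Expected: 4 hits, 12 misses; final contents [0, 1, 2, 3]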

7.14 The miss penalty is the time to transfer one block from main memory to the cache.

Assume that it takes 1 clock cycle to send the address to the main memory.

a. Configuration (a) requires 16 main memory accesses to retrieve a cache block, and words of the block are transferred 1 at a time. Miss penalty = 1 + 16 × 10 + 16 × 1 = 177 clock cycles.

b. Configuration (b) requires 4 main memory accesses to retrieve a cache block, and words of the block are transferred 4 at a time. Miss penalty = 1 + 4 × 10 + 4 × 1 = 45 clock cycles.

c. Configuration (c) requires 4 main memory accesses to retrieve a cache block, and words of the block are transferred 1 at a time. Miss penalty = 1 + 4 × 10 + 16 × 1 = 57 clock cycles.
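As a quick sanity check, here is a small Python sketch (my own, assuming 1 cycle to send the address, 10 cycles per memory access, and 1 cycle per bus transfer, as stated above) that reproduces the three miss penalties:

def miss_penalty(mem_accesses, bus_transfers, addr=1, access=10, transfer=1):
    # miss penalty = address send + memory access time + bus transfer time
    return addr + mem_accesses * access + bus_transfers * transfer

print("a:", miss_penalty(16, 16))   # 1 + 16*10 + 16*1 = 177 clock cycles
print("b:", miss_penalty(4, 4))     # 1 + 4*10  + 4*1  = 45 clock cycles
print("c:", miss_penalty(4, 16))    # 1 + 4*10  + 16*1 = 57 clock cycles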

7.28 Two principles apply to this cache behavior problem. First, a two-way set-associative cache of the same size as a direct-mapped cache has half as many sets as the direct-mapped cache has blocks. Second, LRU replacement can behave pessimally (as poorly as possible) for access patterns that cycle through a sequence of addresses that reference more blocks than will fit in a set managed by LRU replacement.

Consider three addresses, call them A, B, and C, that all map to the same set in the two-way set-associative cache, but to two different sets in the direct-mapped cache. Without loss of generality, let A map to one set in the direct-mapped cache and B and C map to another set. Let the access pattern be A B C A B C A . . . and so on. The direct-mapped cache will then have miss, miss, miss, hit, miss, miss, hit, . . . , and so on. With LRU replacement, the block at address C will replace the block at address A in the two-way set-associative cache just in time for A to be referenced again. Thus, the two-way set-associative cache will miss on every reference as this access pattern repeats.

Here is a second answer. The example we did in class had 8 cache blocks and thus used mod 8 for direct mapping and mod 4 for two-way. Suppose A has block number 0, B has block number 4, and C has block number 8. Then for direct mapping, A and C go to cache block 0 while B goes to cache block 4, so A B C A B C has the miss pattern M M M M H M, since B is a hit the second time around. Since A, B and C are all zero mod 4, they all map to set 0 of the two-way cache, which can hold only two blocks. So the A B C A B C pattern results in all misses. This is not quite an example of the first answer, since that would require B and C to map to the same direct-mapped cache location rather than A and C.
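A short Python sketch (my own illustration, using the in-class parameters above: 8 direct-mapped blocks versus 4 two-way sets with LRU replacement) makes the contrast concrete by replaying A B C A B C with A, B, C at block numbers 0, 4 and 8:

from collections import OrderedDict

pattern = [0, 4, 8] * 3        # A B C repeated three times

# Direct mapped: block n lives in cache block n mod 8.
dm = [None] * 8
dm_result = []
for b in pattern:
    slot = b % 8
    dm_result.append("H" if dm[slot] == b else "M")
    dm[slot] = b

# Two-way set associative: block n maps to set n mod 4; each set holds 2 blocks, LRU.
sets = [OrderedDict() for _ in range(4)]
sa_result = []
for b in pattern:
    s = sets[b % 4]
    if b in s:
        sa_result.append("H")
        s.move_to_end(b)          # mark as most recently used
    else:
        sa_result.append("M")
        if len(s) == 2:
            s.popitem(last=False) # evict the least recently used block
        s[b] = True

print("direct mapped:", " ".join(dm_result))   # M M M M H M M H M
print("2-way LRU    :", " ".join(sa_result))   # all misses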

7.32 Here are the computations for each machine.

We are interested in the miss rate times the miss penalty per instruction.

C1: Miss penalty is 6 + 1 = 7 units of time. For instruction fetches the miss rate is 4%, hence MR × MP = 7 × .04 = .28 units of time spent on stall cycles. But half of the instructions contain a data reference and 6% of these miss, so on average each instruction requires an additional .5 × 7 × .06 = .21 units of time. Therefore the average instruction spends .49 units of time on stall cycles.

C2: Miss penalty is 10. Instruction fetches: 10 × .02 = .2 units of time. Data references: .5 × 10 × .04 = .2 units of time. Total of .40 units of time spent on stall cycles.

C3: Miss penalty is 10. Instruction fetches: 10 × .02 = .2 units of time. Data references: .5 × 10 × .03 = .15 units of time. Total of .35 units of time spent on stall cycles.

Therefore C1 spends the most time on cache misses.
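The arithmetic is easy to verify with a short Python sketch (my own, using the same assumption as above that half of the instructions make a data reference):

def stalls_per_instruction(miss_penalty, instr_miss_rate, data_miss_rate,
                           data_refs_per_instr=0.5):
    # stall time per instruction = instruction-fetch stalls + data-reference stalls
    instr_stalls = miss_penalty * instr_miss_rate
    data_stalls = data_refs_per_instr * miss_penalty * data_miss_rate
    return instr_stalls + data_stalls

print(f"C1: {stalls_per_instruction(7, 0.04, 0.06):.2f}")    # .28 + .21 = .49
print(f"C2: {stalls_per_instruction(10, 0.02, 0.04):.2f}")   # .20 + .20 = .40
print(f"C3: {stalls_per_instruction(10, 0.02, 0.03):.2f}")   # .20 + .15 = .35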
