
5.1 In this exercise we look at memory locality properties of matrix
computation. The following code is written in C, where elements within the
same row are stored contiguously. Assume each word is a 64-bit integer.
for (I=0; I<8; I++)
    for (J=0; J<8000; J++)
        A[I][J]=B[I][0]+A[J][I];
5.1.1 How many 64-bit integers can be stored in a 16-byte cache
block?
Answer:
1 byte = 8 bits
=> 16 bytes = 128 bits
=> 128/64 = 2, so a 16-byte cache block can store two 64-bit integers.
5.1.2 Which variable references exhibit temporal locality?
Answer:
I, J, and B[I][0]. I and J are reused on every iteration, and B[I][0] is
reused by all 8000 iterations of the inner loop for a given I.

5.1.3 Which variable references exhibit spatial locality?
Answer: A[I][J] (consecutive values of J access adjacent elements of the
same row)

Locality is affected by both the reference order and data layout. The same
computation can also be written below in Matlab, which differs from C in that
it stores matrix elements within the same column contiguously in memory.
for I=1:8
    for J=1:8000
        A(I,J)=B(I,0)+A(J,I);
    end
end
5.1.4 Which variable references exhibit temporal locality?
Answer: I, J, and B(I,0)

5.1.5 Which variable references exhibit spatial locality?
Answer: A(J,I) and B(I,0)
5.1.6 How many 16-byte cache blocks are needed to store all 64-bit
matrix elements being referenced using Matlab’s matrix storage? How many
using C’s matrix storage? (Assume each row contains more than one element.)
Answer:
From matrix A, the code references 8*8000 = 64 000 integers. At 2 integers
per 16-byte block, that requires 64 000/2 = 32 000 blocks.
From matrix B, the code references the first element of each of 8 rows.
Using C's matrix storage (elements within the same row are stored
contiguously), the first element of each row lies in a different block, so
matrix B needs 8 blocks
=> 32 000 + 8 = 32 008 blocks
Using Matlab's matrix storage (elements within the same column are stored
contiguously), the 8 elements are adjacent and fit in 4 blocks (2 integers
per block)
=> 32 000 + 4 = 32 004 blocks
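
The block counts for matrix B can be checked with a short C sketch. It assumes, hypothetically, that B starts on a cache-block boundary; the function name and the stride parameter are illustrative and not part of the exercise. The stride is the distance in elements between B[i][0] and B[i+1][0]: the number of columns for row-major storage, and 1 for column-major storage.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_BYTES  8   /* each matrix element is a 64-bit integer */
#define BLOCK_BYTES 16  /* cache block size from the exercise */

/* Count the distinct cache blocks touched by B[i][0] for i = 0..rows-1.
 * Assumption: B begins at a block-aligned address (offset 0 here).
 * elem_stride = elements between B[i][0] and B[i+1][0]:
 *   row-major (C):         number of columns in B
 *   column-major (Matlab): 1 */
size_t blocks_for_first_column(size_t rows, size_t elem_stride)
{
    assert(rows > 0 && elem_stride > 0);
    size_t count = 0;
    size_t last_block = (size_t)-1;  /* sentinel: no block seen yet */
    for (size_t i = 0; i < rows; i++) {
        size_t block = (i * elem_stride * WORD_BYTES) / BLOCK_BYTES;
        /* Addresses only increase, so a changed block number is a new block. */
        if (block != last_block) {
            count++;
            last_block = block;
        }
    }
    return count;
}
```

Under these assumptions, blocks_for_first_column(8, 1) models the column-major case and any stride of 2 or more models the row-major case with more than one element per row.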

5.2 Caches are important to providing a high-performance memory hierarchy
to processors. Below is a list of 64-bit memory address references, given as
word addresses.
0x03, 0xb4, 0x2b, 0x02, 0xbf, 0x58, 0xbe, 0x0e, 0xb5, 0x2c, 0xba, 0xfd
5.2.1 For each of these references, identify the binary word address, the tag, and
the index given a direct-mapped cache with 16 one-word blocks. Also list
whether each reference is a hit or a miss, assuming the cache is initially empty.
Answer:
16 blocks = 2^4 => 4-bit index
Word address  Binary address  Tag  Index  Hit/Miss
0x03          0000 0011       0x0  0011   M
0xb4          1011 0100       0xb  0100   M
0x2b          0010 1011       0x2  1011   M
0x02          0000 0010       0x0  0010   M
0xbf          1011 1111       0xb  1111   M
0x58          0101 1000       0x5  1000   M
0xbe          1011 1110       0xb  1110   M
0x0e          0000 1110       0x0  1110   M
0xb5          1011 0101       0xb  0101   M
0x2c          0010 1100       0x2  1100   M
0xba          1011 1010       0xb  1010   M
0xfd          1111 1101       0xf  1101   M
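
The tag and index columns follow from a simple bit split, which might be sketched in C as below. The struct and function names are illustrative only, and the sketch assumes one-word blocks (no offset field) and a power-of-two number of blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Tag and index of a word address in a direct-mapped cache
 * with one-word blocks. */
typedef struct {
    uint64_t tag;
    uint64_t index;
} cache_fields;

cache_fields split_word_address(uint64_t word_addr, uint64_t num_blocks)
{
    /* The mask trick below only works for a power-of-two block count. */
    assert(num_blocks != 0 && (num_blocks & (num_blocks - 1)) == 0);
    cache_fields f;
    f.index = word_addr & (num_blocks - 1);  /* low log2(num_blocks) bits */
    f.tag   = word_addr / num_blocks;        /* remaining high bits */
    return f;
}
```

For example, split_word_address(0xb4, 16) yields tag 0xb and index 0x4 (binary 0100), matching the second row of the table.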
5.2.2 For each of these references, identify the binary word address, the tag, the
index, and the offset given a direct-mapped cache with two-word blocks and a
total size of eight blocks. Also list if each reference is a hit or a miss, assuming
the cache is initially empty.
Answer:
8 blocks = 2^3 => 3-bit index
2 words/block = 2^1 => 1-bit offset
Word address  Binary address  Tag  Index  Offset  Hit/Miss
0x03          0000 0011       0x0  001    1       M
0xb4          1011 0100       0xb  010    0       M
0x2b          0010 1011       0x2  101    1       M
0x02          0000 0010       0x0  001    0       H
0xbf          1011 1111       0xb  111    1       M
0x58          0101 1000       0x5  100    0       M
0xbe          1011 1110       0xb  111    0       H
0x0e          0000 1110       0x0  111    0       M
0xb5          1011 0101       0xb  010    1       H
0x2c          0010 1100       0x2  110    0       M
0xba          1011 1010       0xb  101    0       M
0xfd          1111 1101       0xf  110    1       M
0x02, 0xbe, and 0xb5 hit because each shares a two-word block (same tag and
index) with an earlier reference (0x03, 0xbf, and 0xb4, respectively) that
has not been evicted in between.
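
The hit/miss column can be checked by simulating the cache. The following is a minimal C sketch, not a definitive model: the function name, the MAX_BLOCKS bound, and the array name exercise_trace are assumptions made for illustration, and the cache is taken to start empty as the exercise states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 64  /* arbitrary upper bound chosen for this sketch */

/* The twelve word addresses listed in the exercise. */
const uint64_t exercise_trace[12] = {
    0x03, 0xb4, 0x2b, 0x02, 0xbf, 0x58,
    0xbe, 0x0e, 0xb5, 0x2c, 0xba, 0xfd
};

/* Simulate an initially empty direct-mapped cache on a trace of word
 * addresses. hit[i] is set to true when reference i hits.
 * Returns the total number of hits. */
size_t simulate_direct_mapped(const uint64_t *trace, size_t n,
                              size_t num_blocks, size_t words_per_block,
                              bool *hit)
{
    assert(num_blocks > 0 && num_blocks <= MAX_BLOCKS && words_per_block > 0);
    bool valid[MAX_BLOCKS] = { false };
    uint64_t tags[MAX_BLOCKS] = { 0 };
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t block_addr = trace[i] / words_per_block;  /* drop the offset */
        size_t index = (size_t)(block_addr % num_blocks);
        uint64_t tag = block_addr / num_blocks;
        hit[i] = valid[index] && tags[index] == tag;
        if (hit[i]) {
            hits++;
        } else {
            valid[index] = true;  /* a miss loads the block */
            tags[index] = tag;
        }
    }
    return hits;
}
```

Calling simulate_direct_mapped(exercise_trace, 12, 8, 2, hit) models the cache in this part: eight blocks of two words each. Under this sketch the references 0x02, 0xbe, and 0xb5 hit and the other nine miss.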

5.2.3 You are asked to optimize a cache design for the given references. There
are three direct-mapped cache designs possible, all with a total of eight words of
data:
■ C1 has 1-word blocks
■ C2 has 2-word blocks
■ C3 has 4-word blocks.
Answer:
C1 has 1-word blocks => 8 blocks => 3-bit index, no offset bits
C2 has 2-word blocks => 4 blocks => 2-bit index + 1-bit offset
C3 has 4-word blocks => 2 blocks => 1-bit index + 2-bit offset
Index plus offset totals 3 bits in every design, so the tag is the upper
5 bits of the address in all three.
Word     Binary           Cache 1          Cache 2          Cache 3
address  address    Tag   Index  Hit/Miss  Index  Hit/Miss  Index  Hit/Miss
0x03     0000 0011  0x00  011    M         01     M         0      M
0xb4     1011 0100  0x16  100    M         10     M         1      M
0x2b     0010 1011  0x05  011    M         01     M         0      M
0x02     0000 0010  0x00  010    M         01     M         0      M
0xbf     1011 1111  0x17  111    M         11     M         1      M
0x58     0101 1000  0x0b  000    M         00     M         0      M
0xbe     1011 1110  0x17  110    M         11     H         1      H
0x0e     0000 1110  0x01  110    M         11     M         1      M
0xb5     1011 0101  0x16  101    M         10     H         1      M
0x2c     0010 1100  0x05  100    M         10     M         1      M
0xba     1011 1010  0x17  010    M         01     M         0      M
0xfd     1111 1101  0x1f  101    M         10     M         1      M

Cache 1 miss rate = 12/12 = 100%
Cache 2 miss rate = 10/12 = 83%
Cache 3 miss rate = 11/12 = 92%
=> Cache 2 has the lowest miss rate and provides the best performance.
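
The three miss counts can be reproduced with a small C sketch. The names count_misses, refs, and TOTAL_WORDS are illustrative assumptions. Instead of a separate tag, the sketch stores the whole block address held by each cache line; for a direct-mapped cache the comparison is equivalent, since the index is already implied by the line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOTAL_WORDS 8  /* data capacity shared by C1, C2 and C3 */

/* The twelve word addresses listed in the exercise. */
const uint64_t refs[12] = {
    0x03, 0xb4, 0x2b, 0x02, 0xbf, 0x58,
    0xbe, 0x0e, 0xb5, 0x2c, 0xba, 0xfd
};

/* Count misses for an initially empty direct-mapped cache that holds
 * TOTAL_WORDS words of data in blocks of words_per_block words. */
size_t count_misses(const uint64_t *trace, size_t n, size_t words_per_block)
{
    assert(words_per_block > 0 && TOTAL_WORDS % words_per_block == 0);
    size_t num_blocks = TOTAL_WORDS / words_per_block;
    uint64_t resident[TOTAL_WORDS] = { 0 };  /* block address in each line */
    int valid[TOTAL_WORDS] = { 0 };
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t block_addr = trace[i] / words_per_block;
        size_t line = (size_t)(block_addr % num_blocks);
        if (!valid[line] || resident[line] != block_addr) {
            valid[line] = 1;              /* miss: load the block */
            resident[line] = block_addr;
            misses++;
        }
    }
    return misses;
}
```

With the reference list above, count_misses(refs, 12, 1), count_misses(refs, 12, 2), and count_misses(refs, 12, 4) model C1, C2, and C3 and give 12, 10, and 11 misses.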
