Professional Documents
Culture Documents
lecture-10.ppt
Announcements
Exam grading done
mean was 50, high was 70 solution sample should be up on website soon some got them in recitation working on plan for everyone else (worst case = recitation on Monday) please read the syllabus for details about the process for handling it
15-213, S08
10 4
0 Memory: 4 8 12
1 5 9 13
2 6 10 14
From lecture-9.ppt
15-213, S08
1 hit rate
Typical numbers (in percentages): 3-10% for L1 can be quite small (e.g., < 1%) for L2, depending on size, etc.
Hit Time
Typical numbers: 1-2 clock cycle for L1 5-20 clock cycles for L2
Miss Penalty
Additional time required because of a miss typically 50-200 cycles for main memory (Trend: increasing!)
15-213, S08
15-213, S08
combinations of indices and constraints on valid locations called a replacement policy write-through vs. write-back
on a miss, usually need to pick something to replace with the new item
CPU looks first for data in L1, then in main memory Typical system structure:
CPU chip L1 cache register file ALU
bus
bus interface
main memory
15-213, S08
abcd
... ...
pqrs
wxyz
The big slow main memory has room for many 4-word blocks
...
15-213, S08
wwww
... ...
wwww
wwww
The big slow main memory has room for many 4-word blocks
...
15-213, S08
valid
tag
B1
tag tag
0 0
1 1
B1 B1
10
valid
tag
B1
tag tag
0 0
1 1
B1 B1
11
Addressing Caches
Address A:
t bits
v
set 0: v v set 1: v v
s bits
b bits
0
tag
tag tag tag tag
0 1 B1 0 1 B1
0 1 B1 0 1 B1 0 1 B1 0 1 B1
m-1
<tag>
set S-1:
tag
The word at address A is in the cache if the tag bits in one of the <valid> lines in set <set index> match <tag> The word contents begin at offset <block offset> bytes from the beginning of the block
15-213, S08
12
Addressing Caches
Address A:
t bits
v
set 0: v v set 1: v v
s bits
b bits
0
tag
tag tag tag tag
0 1 B1 0 1 B1
0 1 B1 0 1 B1 0 1 B1 0 1 B1
m-1
<tag>
1.
set S-1:
tag
13
Locate the set based on <set index> 2. Locate the line in the set based on <tag> 3. Check that the line is valid 4. Locate the data in the line based on <block offset>
15-213, S08
14
15-213, S08
Use the set index bits to determine the set of interest. set 0: valid set 1: valid set S-1: valid
selected set
tag
tag tag
t bits
m-1
tag
15
0110 =?
b0
b1 b2
b3
(2) The tag bits in the cache line must match the tag bits in the address
m-1
16
0110
b0
b1 b2
b3
(3) If cache hit, block offset selects starting byte. t bits 0110 tag s bits b bits i 100 set index block offset0
15-213, S08
m-1
17
0 1 7 8 0
v
0 1
tag
? 0 1
data
? M[8-9] M[0-1]
M[6-7]
15-213, S08
18
set 0:
E=2
set 1:
19
15-213, S08
set 0:
cache block cache block cache block cache block cache block cache block
selected set
set 1:
valid
valid valid
tag
tag tag b bits block offset
0
set S-1:
t bits
m-1
20
tag
15-213, S08
must compare the tag in each valid line in the selected set.
1 1
1001 0110 b0 b1 b2 b3
(2) The tag bits in one of the cache lines =? must match the tag bits in the address t bits 0110 m-1 tag
21
1 1
1001 0110 b0 b1 b2 b3
m-1
22
0 1 7 8 0
0 1 0 1 0 1 0
tag
? 00 10 01
data
23
t bits
m-1
tag
24
15-213, S08
00 01 10 11
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
Middle-Order Bit Indexing 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
15-213, S08
25
00 01 10 11
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
Middle-Order Bit Indexing 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
15-213, S08
26
Memory
disk
200 B 3 ns 8B
8-64 KB 3 ns 32 B
30 GB 8 ms $0.05/MB
27
15-213, S08
What to do on a replacement?
28
29
15-213, S08
Locality Example #1
Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer
30
15-213, S08
Locality Example #2
Question: Does this function have good locality?
int sum_array_cols(int a[M][N]) { int i, j, sum = 0; for (j = 0; j < N; j++) for (i = 0; i < M; i++) sum += a[i][j]; return sum; }
31
15-213, S08
Locality Example #3
Question: Can you permute the loops so that the function scans the 3-d array a[] with a stride-1 reference pattern (and thus has good spatial locality)?
int sum_array_3d(int a[M][N][N]) { int i, j, k, sum = 0; for (i = 0; i < M; i++) for (j = 0; j < N; j++) for (k = 0; k < N; k++) sum += a[k][i][j]; return sum;
32
15-213, S08