Professional Documents
Culture Documents
Cache Part II
© IT - TDT Computer Organisation 2
Acknowledgement
• The contents of these slides have origin from School of
Computing, National University of Singapore.
• We greatly appreciate support from Mr. Aaron Tan Tuck
Choy for kindly sharing these materials.
© IT - TDT Computer Organisation 3
Assembly
Language Cache Part II
Processor:
Type of Cache Misses
Datapath Direct-Mapped Cache
Processor:
Set Associative Cache
Control Fully Associative Cache
Block Replacement Policy
Pipelining Cache Framework
Improving Miss Penalty
Cache Multi-Level Caches
Cache II 5
Conflict Miss
• Two or more distinct memory blocks map to the same cache block
• Big problem in direct-mapped caches
• Solution 1: Increase cache size
• Inherent restriction on cache size due to SRAM technology
• Solution 2: Set-Associative caches (coming next ..)
Capacity Miss
• Due to limited cache size
• Will be further clarified in "fully associative caches" later
CS2100
Cache II 6
Block Size
Average Tradeoff
Access Time (1/2)
= Hit rate x Hit Time + (1-Hit rate) x Miss penalty
CS2100
Cache II
8 KB
Miss rate (%)
16 KB
5 64 KB
256 KB
0
8 16 32 64 128 256
Block size (bytes)
CS2100
7
Cache II 8
CS2100
Cache II 9
CS2100
Cache II 11
SA Cache Structure
Valid Tag Data Valid Tag Data Set Index
00
01
Set 10
11
2-way Set Associative Cache
• An example of 2-way set associative cache
• Each set has two cache blocks
• A memory block maps to an unique set
• In the set, the memory block can be placed in either of the cache
blocks
Need to search both to look for the memory block
CS2100
Cache II 12
31 N+M-1 N N-1 0
Tag Set Index Offset
Offset, N = 2 bits
Block Number = 32 – 2 = 30 bits
1 Block Check: Number of Blocks = 230
= 4 bytes
31 N+M-1 N N-1 0
Tag Set Index Offset
Number of Cache Blocks
Cache = 4KB / 4bytes = 1024 = 210
4 KB
4-way associative, number of sets
= 1024 / 4 = 256 = 28
Set Index, M = 8 bits
= = = =
A B C D
Note the
simultaneous
"search" on all
A
elements of a set B
C 4 x 1 Select
D 32
Hit Data
CS2100
Cache II 15
5
6 Direct Mapped Cache
7
8
9
10 Example:
11 Given this memory access sequence: 0 4 0 4 0 4 0 4
12
Result:
13
Cold Miss = 2 (First two accesses)
14
Conflict Miss = 6 (The rest of the accesses)
15
CS2100
Cache II 16
5
6 2-way Set Associative Cache
7
8
9
10 Example:
11 Given this memory access sequence: 0 4 0 4 0 4 0 4
12
Result:
13
Cold Miss = 2 (First two accesses)
14
Conflict Miss = None (as all of them are hits!)
15
CS2100
Cache II 17
CS2100
Cache II 18
31 4 3 2 0
Tag Set Index Offset
Offset, N = 3 bits
Block Number = 32 – 3 = 29 bits
2-way associative, number of sets= 2 = 21
Set Index, M = 1 bits
Cache Tag = 32 – 3 – 1 = 28 bits
CS2100
Cache II
Miss
4, 0 , 8, 36, 0
Example: LOAD #1 Tag Index Offset
Load from 4 000000000000000000000000000 0 100
Block 0 Block 1
Set
Valid Tag W0 W1 Valid Tag W0 W1
Index
0 01 0 M[0] M[4] 0
1 0 0
CS2100
19
Cache II
Miss Hit
4, 0 , 8, 36, 0
Example: LOAD #2 Tag Index Offset
Load from 0 000000000000000000000000000 0 000
Result:
[Valid and Tag match] in Set 0-Block 0 [ Spatial Locality ]
Block 0 Block 1
Set Valid Tag W0 W1 Valid Tag W0 W1
Index
0 1 0 M[0] M[4] 0
1 0 0
CS2100
20
Cache II
Miss Hit Miss
4, 0 , 8, 36, 0
Example: LOAD #3 Tag Index Offset
Load from 8 000000000000000000000000000 1 000
Block 0 Block 1
Set
Index
Valid Tag W0 W1 Valid Tag W0 W1
0 1 0 M[0] M[4] 0
1 01 0 M[8] M[12] 0
CS2100
21
Cache II
Miss Hit Miss Miss
4, 0 , 8, 36, 0
Example: LOAD #4 Tag Index Offset
Load from 36 000000000000000000000000010 0 100
Block 0 Block 1
Set
Index
Valid Tag W0 W1 Valid Tag W0 W1
1 1 0 M[8] M[12] 0
CS2100
22
Cache II
Miss Hit Miss Miss Hit
4, 0 , 8, 36, 0
Example: LOAD #5 Tag Index Offset
Load from 0 000000000000000000000000000 0 000
Block 0 Block 1
Set
Valid Tag W0 W1 Valid Tag W0 W1
Idx
1 1 0 M[8] M[12] 0
CS2100
23
Cache II 24
FULLY ASSOCIATIVE
CACHE
How about total freedom?
CS2100
Cache II 25
CS2100
Cache II 26
• Key Idea:
• Memory block placement is no longer restricted by cache index /
cache set index
++ Can be placed in any location, BUT
--- Need to search all cache blocks for memory access
CS2100
Cache II 27
31 N N-1 0
Tag Offset
CS2100
Cache II
Cache Performance
Observations:
1. Cold/compulsory miss remains the same
Identical block size irrespective of cache size/associativity.
2. For the same cache size, conflict miss goes down
with increasing associativity.
3. Conflict miss is 0 for FA caches.
4. For the same cache size, capacity miss remains
the same irrespective of associativity.
5. Capacity miss decreases with increasing cache
size .
CS2100
Cache II 30
BLOCK REPLACEMENT
POLICY
Who should I kick out next…..?
CS2100
Cache II 31
CS2100
Cache II 32
Access 16
0 8 12 4
x Miss (Evict Block 0)
Access 12
8 12 4 16 Hit
Access 0
8 4 16 12
x Miss (Evict Block 8)
Access 4 4 16 12 0 Hit
16 12 0 4
CS2100
Cache II 33
CS2100
Cache II
1 0 1 0 M[8] M[12]
2 0
3 0
CS2100
34
Cache II
1 0 1 1 M[8] M[12]
2 0 1 4 M[32] M[36]
3 0
CS2100
35
Cache II
1 01 0 M[8] M[12] 0
CS2100
36
Cache II 37
CS2100
Cache II 38
CS2100
Cache II 39
CS2100
Cache II 40
EXPLORATION
What else is there?
CS2100
Cache II 41
CS2100
Cache II 42
Multilevel
• Options:
Cache: Idea
• Separate data and instruction caches or a unified
cache
• Sample sizes:
• L1: 32KB, 32-byte block, 4-way set associative
• L2: 256KB, 128-byte block, 8-way associative
• L3: 4MB, 256-byte block, Direct mapped
Processor
L1 I$
Reg Unified
Memory Disk
L2 $
L1 D$
CS2100
Cache II 43
CS2100
Cache II 44
per Core:
-32KB L1 Inst Cache
-32KB L1 Data Cache
-256KB L2 Inst/Data Cache
-up to 2.5MB LLC
CS2100
Cache II 45
Reading Assignment
• Large and Fast: Exploiting Memory Hierarchy
• Chapter 7 sections 7.3 (3rd edition)
• Chapter 5 sections 5.3 (4th edition)
CS2100
© IT - TDT Computer Organisation 46
Q&A