Cache Memory
Muhammad Tahir
Lecture 19-20
Contents
1 Introduction
2 Placement Policies
3 Cache Examples
4 Caching Principles
5 Performance Analysis
Introduction Placement Policies Cache Examples Caching Principles Performance Analysis
• A cache:
• is a small but fast memory, providing faster access (reduced latency)
• holds identical copies of the most frequently used blocks of main memory
• is transparent to the user
• can have multiple levels
Cache Structures
[Figure: Basic cache structure. The CPU issues Read/Write and an Address to the cache memory (data array, addressed by index and offset) and receives a data word; on a miss the cache stalls the CPU, refills/updates the block from main memory, and then signals Cache Ready.]
[Figure: Cache lines as copies of main memory. Each cache line pairs a tag (derived from the block address) with a block of data bytes, e.g., lines holding copies of main memory locations 100 and 101.]
[Figure: Memory hierarchy. Data moves as words between the CPU register file and the cache, and as blocks/lines between the cache and main memory.]
Caching Terminology
Cache Blocks
• Each cache block or cache line has a tag field to find whether
the requested data is already in the cache or not
• A cache block has a valid bit to determine whether the data
in the block is valid or not
• There is a dirty bit to mark whether the block is modified
while in cache (dirty) or not modified (clean)
[Figure: Cache array of 2^k lines; each line (0 … 2^k) stores a valid bit (V), a dirty bit (D), a tag, and a 16-byte data block.]
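The per-line bookkeeping above can be sketched as a small record type. This is a teaching sketch (the `CacheLine` name and Python representation are illustrative, not from the slides), using the 16-byte block size of the examples:

```python
from dataclasses import dataclass, field

@dataclass
class CacheLine:
    """One cache line: valid bit, dirty bit, tag, and a 16-byte block."""
    valid: bool = False            # does this line hold valid data?
    dirty: bool = False            # modified while in the cache?
    tag: int = 0                   # identifies which memory block is cached
    data: bytearray = field(default_factory=lambda: bytearray(16))
```

A freshly reset line starts invalid and clean, which is why a tag match alone is never enough: the valid bit must also be set.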
Cache Addressing
• Logically, main memory is divided into multiple blocks
• Each block of main memory maps to a cache line
• The cache line is selected by the index field of the address
• The tag field is used to determine the presence or absence of the data in the cache
• The tag and index fields together form the block address
• The data element to select from the cache line is determined by the offset field
Address fields (32-bit address): Tag = bits [31:12] (20 bits), Index = bits [11:4] (8 bits), Offset = bits [3:0] (4 bits, including a 2-bit word offset)
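With these widths (20-bit tag, 8-bit index, 4-bit offset), extracting the fields is a matter of shifts and masks. A minimal sketch, assuming the 32-bit address layout shown above (`split_address` is an illustrative name):

```python
OFFSET_BITS = 4   # 16-byte blocks -> 4-bit byte offset (top 2 bits select the word)
INDEX_BITS = 8    # 256 cache lines -> 8-bit index

def split_address(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit byte address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

For the read address used in the later examples (binary 1010 1010 1010 1100 0000 0000 0001 0000, i.e. 0xAAAC0010), this yields tag 0xAAAC0, index 1, offset 0.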
[Figure: Direct-mapped cache lookup. The 8-bit index selects one of 256 lines (0–255); the line's stored tag is compared with the 20-bit address tag and combined with the valid bit to generate Hit, while the 2-bit word offset selects the 32-bit data word from the 16-byte block.]
Placement Policies
• Direct mapped
• A block can go in only a single line of the cache – there are many sets, but each set holds only one block
• Only a single tag needs to be checked against the block number
• Different blocks can map to the same line
• A particular block can go to only one line (which may lead to conflict misses)
• Set associative
• A block can be placed in any of the N lines belonging to the indexed set
• There are M sets and each set contains N blocks (B = M × N)
• N > 1 but less than the total number of lines in the cache
• Fully associative
• Any block can be placed in any line
• All tags are compared against the incoming block number
• A special case of a set-associative cache where N equals the number of cache lines
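The three policies differ only in how many lines share a set, so the mapping can be sketched with one function: `ways = 1` gives direct mapped, `1 < ways < num_lines` gives set associative, and `ways = num_lines` gives fully associative (the function name is illustrative):

```python
def set_index(block_number: int, num_lines: int, ways: int) -> int:
    """Which set a memory block maps to in a cache of num_lines lines,
    organized as (num_lines // ways) sets of `ways` lines each."""
    num_sets = num_lines // ways
    return block_number % num_sets
```

For block 13 in an 8-line cache this gives line 13 mod 8 = 5 when direct mapped, set 13 mod 4 = 1 when 2-way, and set 0 (the only set) when fully associative.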
Spectrum of Associativity
[Figure: Spectrum of associativity for block 13 of main memory mapped into an 8-line cache: direct mapped – line 13 mod 8; 2-way set associative – set 13 mod 4; 4-way set associative – set 13 mod 2; fully associative – anywhere.]
[Figure: Read example in a 2-way set-associative cache. The address 1010 1010 1010 1100 0000 0000 0001 0000 is split into a 20-bit tag, an 8-bit index, and a 4-bit offset. The index selects line 1, where way 0 is valid and its stored tag matches, so the two tag comparators and a 2-to-1 encoder deliver the 32-bit data word and assert hit. Each line holds an LRU bit and, per way, V, D, Tag, and a 16-byte data block.]
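The hit check in this example can be sketched as follows; the cache state is hypothetical (way 0 of set 1 pre-loaded with the example block), and only the valid-bit/tag comparison is modeled:

```python
def lookup(cache, addr: int) -> bool:
    """2-way set-associative hit check: 256 sets, 20-bit tag, 8-bit index.
    cache: list of 256 sets, each a list of two (valid, tag) entries."""
    index = (addr >> 4) & 0xFF   # bits [11:4]
    tag = addr >> 12             # bits [31:12]
    return any(valid and stored_tag == tag for valid, stored_tag in cache[index])

# Hypothetical state: all lines invalid except way 0 of set 1.
cache = [[(False, 0), (False, 0)] for _ in range(256)]
cache[1][0] = (True, 0xAAAC0)   # holds the block for address 0xAAAC0010
```

Reading 0xAAAC0010 indexes set 1, matches the stored tag 0xAAAC0, and hits; any address with a different tag in that set misses.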
[Figure: Fully associative cache. Every line stores V, Tag, and block data; all stored tags are compared in parallel with the address tag, and a match asserts hit and selects the data.]
[Figure sequence: step-by-step read of address 1010 1010 1010 1100 0000 0000 0001 0000 in the 2-way set-associative cache – split the address into tag (20 bits), index (8 bits), and word offset (2 bits); use the index to select the set; compare both ways' tags and valid bits; on a hit, route the selected 32-bit word through the 2-to-1 encoder; if miss, load the block from memory.]
Cache Misses
Replacement Policy
• Random
• Least Recently Used (LRU)
• The LRU state must be updated on every access
• A true implementation is only feasible for small sets (e.g., 2-way)
• A pseudo-LRU binary tree is often used for 4- to 8-way caches
• First In, First Out (FIFO), a.k.a. round-robin
• Used in highly associative caches
• Not Least Recently Used (NLRU)
• FIFO with an exception for the most recently used block or blocks
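True LRU for one set can be sketched with an ordered dictionary, where every access moves the touched tag to the most-recently-used end (`LRUSet` is an illustrative name, not from the slides):

```python
from collections import OrderedDict

class LRUSet:
    """One N-way set with true LRU replacement (teaching sketch)."""
    def __init__(self, ways: int):
        self.ways = ways
        self.lines = OrderedDict()       # tag -> data; last entry = most recent

    def access(self, tag) -> bool:
        """Return True on hit; on a miss, evict the LRU line if the set is full."""
        if tag in self.lines:
            self.lines.move_to_end(tag)  # LRU state updated on every access
            return True
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)  # evict the least recently used tag
        self.lines[tag] = None
        return False
```

Note that even a hit mutates the ordering; this per-access update cost is exactly why true LRU is only practical for small associativities.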
[Figure: Cache split into a tag array and a data array, managed by a cache controller. The CPU sends Rd/Wr/Flush and an address (tag and index to the tag array; index and offset to the data array); the controller stalls the CPU on a miss, refills/updates the block from main memory, and asserts Cache Ready.]
• Allocate policy
• Do we allocate an entry in the cache for the block on a write miss?
• Write-allocate: brings the block into the cache on a write miss. Most caches today are of this type because, in general, there is some locality between writes and reads.
• No-write-allocate: brings a block in only on a read miss
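The payoff of write-allocate under write-to-read locality can be sketched with a tiny hit counter over a fully associative cache of tags (a simplified model; `run` and the trace are illustrative):

```python
def run(accesses, write_allocate: bool) -> int:
    """Count hits for a trace of ('r'|'w', tag) accesses.
    Write-allocate fills the cache on write misses; no-write-allocate
    fills only on read misses (write misses go straight to memory)."""
    cache, hits = set(), 0
    for op, tag in accesses:
        if tag in cache:
            hits += 1
        elif op == 'r' or write_allocate:
            cache.add(tag)   # allocate on read miss (and write miss if enabled)
    return hits

trace = [('w', 100), ('r', 100)]  # write a block, then read it back
```

On this trace, write-allocate scores a hit on the read-back, while no-write-allocate misses both times.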
Suggested Reading
Acknowledgment
References