Organization
Ajit Pal
Professor
Department of Computer Science and
Engineering
Indian Institute of Technology Kharagpur
INDIA -721302
Outline
Key characteristics of memory systems
Hierarchical Memory Organization
Basic principles of cache memory
Basic issues in cache design:
Cache size
Mapping functions
Replacement algorithms
Write policy
Block size
Number of caches
Performance analysis
Summary
Ajit Pal, IIT Kharagpur
Introduction
Memory systems are critical to performance
Computer designers have devoted a great deal of
attention to developing sophisticated
mechanisms to improve the performance of
memory systems
The primary approach used to improve
performance is hierarchical memory
organization
Programs exhibit temporal locality and
spatial locality
[Figure: block diagram of a computer: ALU, registers, and timing & control unit connected through the system bus to memory and, via an interface, to I/O devices]
[Figure: storage capacity in Mb versus access time in sec for different types of memories]
Observation: the larger the capacity, the slower the device
Key characteristics of Memory Systems
[Figure: procurement cost versus year]
Observation: higher capacity, lower cost
Cost decreasing over the years
Gap in Performance Between Memory and
CPUs Plotted over time
Register file: Multiported SRAM (?), 200+ GB/s, 300+ ps, >1X10E-2, >10M (?)
Three Properties:
Inclusion: M1 ⊂ M2 ⊂ … ⊂ Mn
Coherence: copies of the same information at different levels are
consistent
Locality of Reference:
Temporal locality
Spatial Locality
Sequential locality
[Figure: cache memory (faster) next to main memory (bigger); labels: addresses 0, 1, 2, 3, ..., 2^n - 1; m; k words each]
Cache Memory: Basic Principles
Start
Receive address RA from CPU
Is block containing RA in cache?
Yes: deliver RA word to the CPU
No: access main memory for block containing RA, allocate cache slot for main memory block, then deliver RA word to the CPU
Done
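The read flow above can be sketched in a few lines of Python. This is only an illustrative model, not the slides' own code: the name read_word, the 4-word block size, and the use of an unbounded dictionary as the cache (so no replacement is ever needed) are all assumptions made for the example.

```python
def read_word(ra, cache, main_memory, block_size=4):
    """Follow the read flow for address RA; returns (word, hit)."""
    block_no = ra // block_size              # block containing RA
    hit = block_no in cache                  # "Is block containing RA in cache?"
    if not hit:
        # Miss: access main memory for the block and allocate a cache slot.
        start = block_no * block_size
        cache[block_no] = main_memory[start:start + block_size]
    # Hit, or block just loaded: deliver the RA word to the CPU.
    return cache[block_no][ra % block_size], hit

main_memory = list(range(100, 132))          # 32 words of made-up data
cache = {}                                   # hypothetical: block number -> words
first = read_word(9, cache, main_memory)     # miss; block 2 is loaded
second = read_word(10, cache, main_memory)   # hit in the same block
```

The second access hits because address 10 lies in the block that the first access brought in, which is the spatial locality the hierarchy relies on.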
Basic Issues: Cache Size
Mapping Functions:
One place: Direct mapping
Any place: Associative mapping
Few places: Set associative mapping
Block Identification
Given an address, how do we find where it goes
in the cache?
Indexing
Full search
Limited search
This is done by first breaking down an address
into three parts:
Tag: used for identifying a match
Index: index of the set
Offset: offset of the address in the cache block
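As a rough illustration of the three-part split, the fields can be extracted with shifts and masks. The function name split_address and the field widths used in the call (8 index bits, 4 offset bits) are assumptions chosen for the example, not values from the slides.

```python
def split_address(addr, index_bits, offset_bits):
    """Return the (tag, index, offset) fields of an address."""
    offset = addr & ((1 << offset_bits) - 1)                 # low-order bits
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # middle bits
    tag = addr >> (offset_bits + index_bits)                 # remaining high bits
    return tag, index, offset

# Hypothetical widths: 8-bit index, 4-bit offset.
fields = split_address(0x12345678, index_bits=8, offset_bits=4)
```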
Direct mapping:
i = j mod m,
where
i = the cache line number
j = main memory block number
m = number of lines in the cache
Advantages
Simple
Inexpensive
Disadvantages
Fixed cache location for a main memory word
Two words with the same index but different tag values cannot reside in the cache simultaneously
Vulnerable to continuous swapping
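A minimal sketch of the rule i = j mod m and of the continuous-swapping problem follows. The class name, the 8-line cache size, and the access trace are assumptions made for illustration. Blocks 3 and 11 map to the same line, so alternating between them misses every time.

```python
class DirectMappedCache:
    def __init__(self, m):
        self.m = m                   # number of lines in the cache
        self.lines = [None] * m      # block number currently held by each line
        self.misses = 0

    def access(self, j):
        """Access main memory block j; return True on a hit."""
        i = j % self.m               # i = j mod m
        if self.lines[i] == j:
            return True
        self.lines[i] = j            # the only candidate line is overwritten
        self.misses += 1
        return False

cache = DirectMappedCache(m=8)
for j in [3, 11, 3, 11, 3, 11]:      # 3 mod 8 == 11 mod 8 == 3
    cache.access(j)
```

All six accesses miss even though seven of the eight lines stay empty, which is the cost of a fixed cache location.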
Associative Mapping: Full Search
[Figure: cache lines shown as tag (T) and data (D) pairs]
Associative mapping
Allows any main memory block to be mapped into any cache line
Better performance
Expensive to implement
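The following sketch models associative mapping in software. The class name and the FIFO eviction policy are assumptions, since the slides do not fix a replacement algorithm here, and the sequential search of a list stands in for what hardware does with one comparator per line in parallel.

```python
class FullyAssociativeCache:
    def __init__(self, m):
        self.m = m                   # number of lines in the cache
        self.lines = []              # block numbers (tags), oldest first

    def access(self, j):
        """Access main memory block j; return True on a hit."""
        if j in self.lines:          # full search over every stored tag
            return True
        if len(self.lines) == self.m:
            self.lines.pop(0)        # assumed FIFO eviction when full
        self.lines.append(j)         # the block may go into any free line
        return False

cache = FullyAssociativeCache(m=8)
results = [cache.access(j) for j in [3, 11, 3, 11, 3, 11]]
```

Blocks 3 and 11 would compete for one line under i = j mod m with m = 8. Here they coexist, so only the first two accesses miss.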
Mapping Functions
Q1: Where can a block be placed in the cache? (Block placement)
Direct Mapped, Fully Associative, Set Associative
Q2: How is a block found if it is in the cache? (Block identification)
Tag/Block
Direct Mapping
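[Figure: direct-mapped cache. Address fields: Tag, Index, BO. Each cache line holds V, Tag and Data. A decoder selects the line, and one comparator (=) produces Hit / Miss alongside the Data output]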
Fully Associative Mapping
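[Figure: fully associative cache. Address fields: Tag, BO. Each cache line holds V, Tag and Data, with one comparator (=) per line producing Hit / Miss alongside the Data output]
Parallel search
Associative search
Content Addressable Memory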
Set-Associative Mapping: Limited Search
A compromise that combines the strengths of direct and
associative mapping
Reduces the disadvantages of each
m = v × k, i = j mod v
i = cache set number
j = main memory block number
m = number of lines in the cache
v = number of sets
k = number of lines in each set
A generalization of the previous two approaches
[Figure: address format, cache size = 4Kb. Bits 31-10: TAG, bits 9-2: SET, bits 1-0: remaining low-order field]
Set-Associative Mapping
[Figure: set-associative cache organization, cache size = 4Kb, with AND and OR gates shown]
Set-Associative Mapping
Example: two-way set-associative mapping, v = m/2, k = 2
For v = m and k = 1, it reduces to direct mapping
For v = 1 and k = m, it reduces to associative mapping
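The two limiting cases can be checked with a small model built on i = j mod v. The class and function names, the FIFO policy within a set, m = 8, and the access trace are all assumptions made for this sketch.

```python
class SetAssociativeCache:
    def __init__(self, v, k):
        self.v, self.k = v, k                # v sets, k lines per set
        self.sets = [[] for _ in range(v)]   # block numbers held by each set

    def access(self, j):
        """Access main memory block j; return True on a hit."""
        lines = self.sets[j % self.v]        # i = j mod v selects the set
        if j in lines:                       # limited search: k lines only
            return True
        if len(lines) == self.k:
            lines.pop(0)                     # assumed FIFO eviction within the set
        lines.append(j)
        return False

def count_misses(v, k, trace):
    cache = SetAssociativeCache(v, k)
    return sum(not cache.access(j) for j in trace)

m = 8                                        # total lines, m = v * k
trace = [3, 11, 3, 11, 3, 11]
direct = count_misses(v=m, k=1, trace=trace)         # behaves as direct mapping
two_way = count_misses(v=m // 2, k=2, trace=trace)   # two-way set-associative
fully = count_misses(v=1, k=m, trace=trace)          # behaves as associative mapping
```

On this trace the direct-mapped configuration misses on every access, while two lines per set are already enough to keep both conflicting blocks resident.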
Size of Tags Versus Associativity
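Direct mapped (32-bit address): 18-bit tag, 12-bit index, 2-bit offset
Total no. of tag bits = 2^12 × 18
Total no. of comparators = 1
Fully associative (32-bit address): 30-bit tag, 2-bit offset
Total no. of tag bits = 2^12 × 30
Total no. of comparators = 2^12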
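The same arithmetic can be written as a small function that also covers the associativities in between. The function name and its parameters are assumptions. The defaults (32-bit address, 2-bit offset, 2^12 lines) follow the figures above, and the comparator count assumes one comparator per line of a set.

```python
def tag_storage(address_bits=32, offset_bits=2, lines=4096, ways=1):
    """Return (tag bits per line, total tag bits, comparators needed)."""
    sets = lines // ways
    index_bits = sets.bit_length() - 1       # log2(sets) for powers of two
    tag_bits = address_bits - offset_bits - index_bits
    return tag_bits, lines * tag_bits, ways  # one comparator per way

direct = tag_storage(ways=1)                 # direct mapped
four_way = tag_storage(ways=4)               # an intermediate point
fully = tag_storage(ways=4096)               # fully associative
```

Each doubling of associativity removes one index bit and adds one tag bit per line, so tag storage and comparator count both grow with associativity.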
[Figure: cache lookup datapath with decoder, tag comparator (=), BO, Data output and Hit / Miss signal]
Basic Issues: Block Size
[Figure: Princeton architecture (CPU with a single memory) versus Harvard architecture (CPU with separate program memory and data memory)]
[Figure: two cache organizations. One processor has split I-Cache-1 and D-Cache-1 backed by Unified Cache-2. The other has Unified Cache-1 backed by Unified Cache-2]
Cache Pentium 4:
Two on-chip, each 8KB, Off-chip 256KB
Step 1:
CPU generates a 44-bit address
The address is split into
a 29-bit tag
a 9-bit set index (2^9 = 512 sets)
a 6-bit block offset (2^6 = 64-byte blocks)
Step 2:
The “right” set is selected
using the index bits
Step 3:
The tag is compared to
both tags in the set; if there is a
match AND valid = 1,
it is a hit (otherwise, a
miss)
Step 4:
On a hit, select the matching
block and return the byte at the
right offset
The selection of the matching
block is done via a 2:1
multiplexer
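The four steps can be traced with a short software model. The data layout (a list of 512 sets, each holding two (valid, tag, block) tuples), the function name lookup, and the sample address are assumptions for illustration. The loop over the two ways stands in for the two parallel comparators and the 2:1 multiplexer.

```python
TAG_BITS, INDEX_BITS, OFFSET_BITS = 29, 9, 6     # 44-bit address in total

def lookup(sets, addr):
    """Return the byte at addr on a hit, or None on a miss."""
    # Step 1: split the address into tag, set index and block offset.
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    # Step 2: select the set. Step 3: compare the tag with both ways.
    for valid, line_tag, block in sets[index]:
        if valid and line_tag == tag:
            return block[offset]                 # Step 4: byte at the offset
    return None

# 512 sets, two ways each, all initially invalid.
sets = [[(False, 0, bytes(64)), (False, 0, bytes(64))] for _ in range(512)]
sets[7][1] = (True, 0x5, bytes(range(64)))       # made-up block: tag 5, set 7
addr = (0x5 << 15) | (7 << 6) | 3                # tag 5, set 7, offset 3
hit = lookup(sets, addr)
miss = lookup(sets, addr + (1 << 15))            # same set, tag 6
```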