Registers
L1 Cache
L2 Cache
L3 Cache
Main memory
Disk cache
Disk
Optical
Tape
Addressing
Size
Mapping Function
Replacement Algorithm
Write Policy
Block Size
Number of Caches
Virtual memory
Facility that allows programs to address memory from a
logical point of view, without regard to the amount of main
memory physically available
When used, the address fields of machine instructions
contain virtual addresses
For reads from and writes to main memory, a hardware
memory management unit (MMU) translates each virtual
address into a physical address in main memory
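As a concrete illustration, here is a minimal sketch of the translation an MMU performs, assuming a single-level page table with 4 KB pages (the page size and table contents are illustrative assumptions, not figures from this chapter; real MMUs use multi-level tables, TLBs, and permission bits):

    # Minimal sketch of MMU-style virtual-to-physical translation
    PAGE_SIZE = 4096           # 4 KB pages (assumption)
    OFFSET_BITS = 12           # log2(PAGE_SIZE)

    # Hypothetical page table: virtual page number -> physical frame number
    page_table = {0: 5, 1: 9, 2: 3}

    def translate(virtual_addr):
        vpn = virtual_addr >> OFFSET_BITS        # virtual page number
        offset = virtual_addr & (PAGE_SIZE - 1)  # offset within the page
        frame = page_table[vpn]                  # KeyError here models a page fault
        return (frame << OFFSET_BITS) | offset

    print(hex(translate(0x1234)))   # VPN 1 -> frame 9, giving 0x9234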
Cost
More cache is expensive
Speed
More cache is faster (up to a point)
Checking cache for data takes time
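The speed trade-off can be made concrete with the usual average-access-time calculation; the hit ratio and timings below are illustrative assumptions:

    # Average memory access time for a single-level cache (illustrative numbers)
    t_cache = 1.0     # ns, time to check/access the cache (assumption)
    t_mem = 10.0      # ns, main memory access time (assumption)
    hit_ratio = 0.95  # assumption

    # On a miss we pay the cache check plus the main memory access
    t_avg = hit_ratio * t_cache + (1 - hit_ratio) * (t_cache + t_mem)
    print(f"average access time = {t_avg:.2f} ns")   # 1.50 ns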
Because there are fewer cache lines than main memory blocks,
an algorithm is needed for mapping main memory blocks into
cache lines
Three techniques can be used
Direct
The simplest technique
Maps each block of main memory into only one possible cache line
Associative
Permits each main memory block to be loaded into any line of the cache
The cache control logic interprets a memory address simply as a Tag and a Word field
To determine whether a block is in the cache, the cache control logic must
simultaneously examine every line’s Tag for a match
Set Associative
A compromise that exhibits the strengths of both the direct and associative
approaches while reducing their disadvantages
Cache of 64kByte
Cache block of 4 bytes
i.e. cache is 16k (2^14) lines of 4 bytes
16MBytes main memory
24 bit address
(2^24 = 16M)
24 bit address
2 bit word identifier (4 byte block)
22 bit block identifier
8 bit tag (=22-14)
14 bit slot or line
No two blocks that map to the same line have the same Tag field
Check contents of cache by finding line and checking Tag
Address format: Tag (8 bits) | Line (14 bits) | Word (2 bits)
Mapping method
i = j modulo m
where
i cache line number
j main memory block number
m number of lines in the cache
Cache line    Main memory blocks assigned
0             0, m, 2m, … 2^s-m
1             1, m+1, 2m+1, … 2^s-m+1
…
m-1           m-1, 2m-1, 3m-1, … 2^s-1
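A short sketch tying the example together: splitting a 24-bit address into the Tag/Line/Word fields above is exactly the i = j modulo m computation, because the Line field is just the low 14 bits of the block number (the sample address is arbitrary):

    # Direct mapping for the example above: 64 kByte cache, 4-byte blocks,
    # 16 MBytes main memory, 24-bit addresses
    WORD_BITS = 2    # 4-byte block
    LINE_BITS = 14   # 2^14 = 16k lines
    TAG_BITS = 8     # 24 - 14 - 2

    def split_address(addr):
        word = addr & ((1 << WORD_BITS) - 1)
        line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
        tag = addr >> (WORD_BITS + LINE_BITS)
        return tag, line, word

    addr = 0x16339C                 # arbitrary sample address
    tag, line, word = split_address(addr)
    j = addr >> WORD_BITS           # main memory block number
    assert line == j % (1 << LINE_BITS)   # Line field == j modulo m
    print(f"tag={tag:#x} line={line:#x} word={word}")  # tag=0x16 line=0xce7 word=0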
Simple
Inexpensive
Fixed location for given block
If a program repeatedly accesses 2 blocks that map to the same
line, cache misses are very high (thrashing); see the sketch below
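A toy simulation of that worst case, assuming the 16k-line cache from the example: two blocks whose numbers differ by exactly m collide on the same line, so alternating between them never hits:

    # Toy direct-mapped cache: one stored block number per line
    M = 1 << 14                 # number of cache lines, from the example above
    cache = {}                  # line -> block number currently cached
    hits = misses = 0

    # Blocks j and j + M map to the same line (j mod M == (j + M) mod M),
    # so alternating references evict each other every time
    for block in [7, 7 + M] * 50:
        line = block % M
        if cache.get(line) == block:
            hits += 1
        else:
            misses += 1
            cache[line] = block

    print(f"hits={hits} misses={misses}")   # hits=0 misses=100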
Set associative address format: Tag (9 bits) | Set (13 bits) | Word (2 bits)
Direct mapping: no choice
Each block only maps to one line
Replace that line
Least recently used (LRU): replace the block in the set that has been in the
cache longest with no reference to it
For two-way set associative, this is easily implemented.
Each line includes a USE bit
When a line is referenced, its USE bit is set to 1 and the USE bit of the other line
in that set is set to 0
When a block is to be read into the set, the line whose USE bit is 0 is used
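A minimal sketch of this USE-bit scheme for a two-way set-associative cache (the set count and access pattern are illustrative assumptions):

    # Two-way set-associative cache with USE-bit replacement, as described above.
    # Each set holds two lines; the USE bit marks the more recently used line.
    NUM_SETS = 4    # illustrative; real caches have far more sets

    sets = [[{"tag": None, "use": 0}, {"tag": None, "use": 0}]
            for _ in range(NUM_SETS)]

    def access(block):
        s = sets[block % NUM_SETS]
        tag = block // NUM_SETS
        for i, line in enumerate(s):
            if line["tag"] == tag:                    # hit: mark this line used
                line["use"], s[1 - i]["use"] = 1, 0
                return True
        victim = 0 if s[0]["use"] == 0 else 1         # miss: replace USE == 0 line
        s[victim] = {"tag": tag, "use": 1}
        s[1 - victim]["use"] = 0
        return False

    for b in [0, 4, 0, 8, 4]:   # blocks 0, 4, 8 all fall in set 0
        print(b, "hit" if access(b) else "miss")      # miss, miss, hit, miss, miss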
Because we are assuming that more recently used memory locations are more
likely to be referenced, LRU should give the best hit ratio
LRU is also relatively easy to implement for a fully associative cache
The cache mechanism maintains a separate list of indexes to all the lines in the
cache
When a line is referenced, it moves to the front of the list
For replacement, the line at the back of the list is used
Because of its simplicity of implementation, LRU is the most popular replacement
algorithm.
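A sketch of this list-based scheme, using Python's OrderedDict to stand in for the index list (the capacity and reference string are illustrative):

    from collections import OrderedDict

    # Fully associative LRU: front of the ordered dict = most recently used,
    # back = replacement victim
    class LRUCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.lines = OrderedDict()   # block number -> cached data

        def access(self, block):
            if block in self.lines:
                self.lines.move_to_end(block, last=False)   # move to front
                return True                                 # hit
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=True)               # evict from the back
            self.lines[block] = "data"
            self.lines.move_to_end(block, last=False)
            return False                                    # miss

    cache = LRUCache(capacity=4)
    for b in [1, 2, 3, 1, 4, 5]:
        print(b, "hit" if cache.access(b) else "miss")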
LRU Replacement Algorithm - Method
Solution
(Each line in set 0 has a valid bit (v), a TAG, and its data block; C0–C3 are the
per-line LRU counters, with 0 meaning most recently used.)

Ref   Hit/Miss   C0  C1  C2  C3   Action
-     -          0   0   0   0    Initialization
5     Miss       0   1   1   1    Block 0 filled
6     Miss       1   0   2   2    Block 1 filled
5     Hit        0   1   2   2    Block 0 accessed
7     Miss       1   2   0   3    Block 2 filled
8     Miss       2   3   1   0    Block 3 filled
6     Hit        3   0   2   1    Block 1 accessed
9     Miss       0   1   3   2    Block 0 replaced
10    Miss       1   2   0   3    Block 2 replaced

SET 0 Hit Ratio = 2 / 8 = 0.25
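The counter bookkeeping in the table can be reproduced with a short simulation; the reference string 5, 6, 5, 7, 8, 6, 9, 10 and the counter convention (0 = most recently used, largest counter = victim) are taken from the solution above:

    # Counter-based LRU over a 4-line set, reproducing the table above
    lines = [None] * 4    # block held by each line (None = invalid)
    counters = [0] * 4    # 0 = most recently used
    hits = 0
    refs = [5, 6, 5, 7, 8, 6, 9, 10]

    for ref in refs:
        if ref in lines:                             # hit
            hits += 1
            i = lines.index(ref)
            old = counters[i]
            # Age only the lines that were more recent than line i
            counters = [c + 1 if c < old else c for c in counters]
        else:                                        # miss
            # Fill an invalid line if any, else evict the largest counter (LRU)
            i = lines.index(None) if None in lines else counters.index(max(counters))
            lines[i] = ref
            counters = [c + 1 for c in counters]     # age every other line
        counters[i] = 0
        print(ref, lines, counters)

    print(f"hit ratio = {hits}/{len(refs)} = {hits/len(refs)}")   # 2/8 = 0.25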
Unified cache
One cache for data and instructions
Split cache
Two separate caches, one for data and one for instructions
Advantages of unified cache
Higher hit rate
Balances load of instruction and data fetch
Only one cache to design & implement
Advantages of split cache
Eliminates cache contention between instruction
fetch/decode unit and execution unit
Important in pipelining
Fetch/Decode Unit
Fetches instructions from L2 cache
Decodes instructions into micro-ops
Stores micro-ops in L1 cache
Out of order execution logic
Schedules micro-ops
Based on data dependence and resources
May speculatively execute
Execution units
Execute micro-ops
Data from L1 cache
Results in registers
Memory subsystem
L2 cache and system bus
Core      Cache Type   Cache Size (kB)   Cache Line Size (words)   Associativity   Location   Write Buffer Size (words)
ARM720T   Unified      8                 4                         4-way           Logical    8