
CACHE MEMORY MAPPING FUNCTIONS
Cache/ Main Memory Structure
Cache Design
• If memory contains 2^n addressable words
– Memory can be broken up into M blocks with K words per block. Number of blocks M = 2^n / K
– Cache consists of C lines or slots, each consisting of K words
– C << M
– How to map blocks of memory to lines in the cache?
Mapping functions
 Mapping functions determine how memory blocks are placed in the cache.
 E.g. a processor with main memory addressable by a 16-bit address and a cache of 2048 words:
 Size of main memory = 2^16 = 64K words
 Size of a block = 2^4 = 16 words
 No. of blocks in main memory = 2^16 / 2^4 = 2^12 = 4K = 4096 blocks of 16 words each
 Size of cache = 2048 (= 2K = 2^11) words
 No. of blocks in cache = 2^11 / 2^4 = 2^7 = 128 blocks of 16 words each
 Three mapping functions:
 Direct mapping
 Associative mapping
 Set-associative mapping.
Direct mapping
• Simplest mapping technique

• Each block of main memory maps to only one cache line - i.e. if a block is in cache, it
must be in one specific place

• Cache block no = memory block no % number of cache lines or blocks (b = j mod c)


• More than one memory block is mapped onto the same position in the cache.

• Memory blocks 0, 128, 256, …, 3968 map to cache block 0;

• memory blocks 1, 129, 257, …, 3969 map to cache block 1;

• memory blocks 127, 255, …, 4095 map to cache block 127.

• 4096/128 = 32, so 32 memory blocks map to each cache block.


• To identify which memory block is currently in a cache block, tag bits are attached to each block: Tag no = memory block no / number of cache lines or blocks (quotient)
• May lead to contention for cache blocks even if the cache is not full.

• If a program repeatedly accesses 2 blocks that map to the same line, the miss rate becomes very high (thrashing)
• Contention is resolved by allowing the new block to replace the old block, leading to a trivial replacement algorithm
• Simple to implement and inexpensive, but not very flexible
Direct mapping
• Block j of the main memory maps to block (j modulo 128) of the cache.

• The memory address is divided into three fields:

• 4 word bits – which one of 16 words (each block has 16 = 2^4 words)

• 7 block bits – which cache line or block this memory block is placed in (128 = 2^7) [b = j mod c]

• 5 tag bits – compared with the tag bits associated with that cache location, to identify which one of 32 blocks (32 = 2^5) is currently resident there [t = j / c]
Direct Mapping
Tag | Block | Word
 5  |   7   |  4     Main memory address

11101,1111111,1100

• Tag: 11101
• Block: 1111111 = 127, block 127 of the cache
• Word: 1100 = 12, word 12 of block 127 in the cache
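The 5/7/4 field split above can be checked with a small sketch in code (the helper name is illustrative, not part of the lecture):

```python
# Decode a 16-bit address into the 5-bit tag, 7-bit block and 4-bit word
# fields of this direct-mapped example.
def decode_direct(addr):
    word = addr & 0xF            # low 4 bits: word within the block
    block = (addr >> 4) & 0x7F   # next 7 bits: cache block (line) number
    tag = addr >> 11             # top 5 bits: tag
    return tag, block, word

# The example address 11101 1111111 1100:
print(decode_direct(0b1110111111111100))  # (29, 127, 12): tag 11101, block 127, word 12
```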
Direct Mapping
Cache block  Memory blocks (tag = 0, 1, 2, 3, …, 31)
0            0, 128, 256, 384, …, 3968
1            1, 129, 257, 385, …, 3969
2            2, 130, 258, 386, …, 3970
3            3, 131, 259, 387, …, 3971
:            :
126          126, 254, 382, 510, …, 4094
127          127, 255, 383, 511, …, 4095
Fully Associative Mapping

 A fully associative mapping scheme can overcome the problems of the direct mapping scheme
 A main memory block can load into any line of cache
 Memory address is interpreted as tag and word
 Tag uniquely identifies block of memory
 Every line’s tag is examined for a match
 Also need a Dirty and Valid bit
 Flexible, and uses cache space efficiently.
 All slots searched in parallel for target
 But Cache searching gets expensive!
 Ideally need circuitry that can simultaneously examine all tags for a match
 Lots of circuitry needed, high cost
 Need replacement policies now that anything can get thrown out of the cache
Associative mapping

The memory address is divided into two fields:

4 word bits – which one of 16 words (each block has 16 = 2^4 words)

12 tag bits – identify which one of the 4096 blocks (4096 = 2^12) is currently resident in the cache block.
Associative Mapping
Tag | Word
 12 |  4     Main memory address

111011111111,1100

• Tag: 111011111111
• Word: 1100 = 12, word 12 of a block in the cache
Set Associative Mapping
• Compromise between fully-associative and direct-mapped cache
– Cache is divided into a number of sets
– Each set contains a number of cache lines or blocks
– A given block maps to any line in a specific set
• Use direct-mapping to determine which set in cache corresponds to a set in memory
• Memory block could then be in any line of that set
– e.g. 2 lines per set (2 way associative mapping)
• A given block can be in either of 2 lines in a specific set
– e.g. K cache lines or blocks per set
• K way associative mapping
• A given block can be in one of K lines in a specific set
• Much easier to simultaneously search one set than all lines
Set Associative Mapping
• Set-associative mapping is a combination of direct and associative mapping.
• Blocks of cache are grouped into sets.
• The mapping function allows a block of the main memory to reside in any block of a specific set.

• The number of blocks per set is a design parameter.

 One extreme is to have all the blocks in one set, requiring no set bits (fully associative mapping).

 The other extreme is to have one block per set, which is the same as direct mapping.

 No. of blocks per set = set size = k (k-way associative)

 No. of cache sets = no. of cache blocks / k

 Set no = memory block no % number of cache sets (remainder): s = j mod (c/k)

 Tag no = memory block no / number of cache sets (quotient): t = j / (c/k)
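As a sketch, the set/tag formulas can be applied to the lecture's 128-block cache with k = 2:

```python
# Set-associative placement: c cache blocks, k blocks per set (2-way here).
c, k = 128, 2
num_sets = c // k               # c/k = 64 sets

def place(j):                   # j = memory block number
    s = j % num_sets            # set number: s = j mod (c/k), remainder
    t = j // num_sets           # tag:        t = j / (c/k), quotient
    return s, t

print(place(3970))  # (2, 62): memory block 3970 goes to set 2 with tag 62
```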


Set-Associative mapping
• Divide the cache into 64 sets, with two blocks per set.
• Memory blocks 0, 64, 128, etc. map to set 0, and each can occupy either of the two positions in that set.
• The memory address is divided into three fields:

4 word bits: one of 16 words within a block (each block has 16 = 2^4 words)

6 set bits: point to a particular set in the cache (128/2 = 64 = 2^6)

6 tag bits: used to check if the desired block is present in the cache (4096/64 = 64 = 2^6)
Set-Associative Mapping
Tag | Set | Word
 6  |  6  |  4     Main memory address

111011,111111,1100

• Tag: 111011
• Set: 111111 = 63, set 63 of the cache
• Word: 1100 = 12, word 12 of the matching block in set 63
Set-Associative Mapping
Set  Memory blocks (tag = 0, 1, 2, …, 62, 63)
0    0, 64, 128, …, 3968, 4032
1    1, 65, 129, …, 3969, 4033
2    2, 66, 130, …, 3970, 4034
3    3, 67, 131, …, 3971, 4035
:    :
62   62, 126, 190, …, 4030, 4094
63   63, 127, 191, …, 4031, 4095
Where can a memory block be placed in cache?

° Memory block 12 is to be placed in a cache of 8 blocks:

• Fully associative mapping - any cache block
• Direct mapping: cache block no = block number % number of cache blocks
• Set associative mapping: cache set no = block number % number of cache sets

• Fully associative: block 12 can go anywhere (blocks 0-7).
• Direct mapped: block 12 can go only into block 4 (12 mod 8).
• Set associative (4 sets, Set 0-3): block 12 can go anywhere in set 0 (12 mod 4).
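The three placements of block 12 can be computed directly (a sketch; variable names are illustrative):

```python
# Where memory block 12 may live in an 8-block cache, per mapping scheme.
num_blocks = 8
num_sets = 4                         # 2-way set associative: 8 blocks / 2 per set
j = 12                               # memory block number

direct_line = j % num_blocks         # direct mapped: exactly one line
sa_set = j % num_sets                # set associative: one set, either line in it
fa_lines = list(range(num_blocks))   # fully associative: any line

print(direct_line)  # 4
print(sa_set)       # 0
```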
Direct Mapping with C = 4
Each slot contains K words.
Tag: identifies which memory block is in the slot.
Valid: set after a block is copied from memory, to indicate the cache line holds valid data.

[Figure: a 4-slot cache (Slot 0-3, each with Valid, Dirty and Tag fields) alongside main memory Blocks 0-7.]
Associative Mapping Example

[Figure: a 4-slot cache (Slot 0-3, each with Valid, Dirty and Tag fields) alongside main memory Blocks 0-7.]

• A block can map to any slot.
• The tag is used to identify which block is in which slot.
• All slots are searched in parallel for the target.
Set Associative Mapping
• To compute cache set number:
– SetNum = j mod v
• j = main memory block number
• v = number of sets in cache
[Figure: a cache of two sets (Set 0: Slots 0-1; Set 1: Slots 2-3) alongside main memory Blocks 0-5.]
Direct Mapping Address: 64K Cache Example

 Given a 16 MB memory = 24-bit address

 4 bytes in a block = 2-bit word identifier

 64K cache = 14 bits to address the cache slot or line (2^16 / 2^2 = 2^14 cache lines)

 Leaves 8 bits for the tag (= 22 − 14)

 No two blocks that map to the same line have the same tag field

 Check the contents of the cache by finding the line and checking the tag

 Also need a Valid bit and a Dirty bit

 Valid – Indicates if the slot holds a block belonging to the program being executed
 Dirty – Indicates if a block has been modified while in the cache. Will need to be written back to
memory before slot is reused for another block
Direct Mapping Example, 64K Cache

Main memory address:

1B0007 = 0001 1011 0000 0000 0000 0111

Tag = 0001 1011, Line = 0000 0000 0000 01, Word = 11
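The same decomposition can be sketched in code for the 8/14/2 field split (the helper name is illustrative):

```python
# Decode a 24-bit address into 8-bit tag, 14-bit line, 2-bit word fields.
def decode_64k(addr):
    word = addr & 0x3              # low 2 bits: word within the 4-byte block
    line = (addr >> 2) & 0x3FFF    # next 14 bits: cache line
    tag = addr >> 16               # top 8 bits: tag
    return tag, line, word

tag, line, word = decode_64k(0x1B0007)
print(hex(tag), hex(line), word)   # 0x1b 0x1 3
```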
Associative Mapping 64K Cache Example

A 22-bit tag is stored with each slot in the cache – no bits for a slot/line number are needed, since all tags are searched in parallel.
The tag field of a target memory address is compared with the tag entries in the cache to check for a hit.

The least significant 2 bits of the address identify which word is required from the block, e.g.:

 Address: FFFFFC = 1111 1111 1111 1111 1111 1100

 Tag: left 22 bits (drop the 2 word bits on the right):

 11 1111 1111 1111 1111 1111
 = 3FFFFF

 Address: 16339C = 0001 0110 0011 0011 1001 1100

 Tag: left 22 bits (drop the 2 word bits on the right):

 00 0101 1000 1100 1110 0111
 = 058CE7
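Since the tag is just the address with the 2 word bits dropped, both examples reduce to one shift (a minimal sketch):

```python
# Fully associative 64K example: the 22-bit tag is the address >> 2.
def assoc_tag(addr):
    return addr >> 2   # drop the 2 word bits; the 22-bit tag remains

print(hex(assoc_tag(0xFFFFFC)))  # 0x3fffff
print(hex(assoc_tag(0x16339C)))  # 0x58ce7
```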
Set Associative Mapping 64K Cache Example

 E.g. given our 64 KB cache with a line size of 4 bytes, we have 16384 lines. Say we decide to create 8192 sets, where each set contains 2 lines. Then we need 13 bits to identify a set (2^13 = 8192), leaving 24 − 13 − 2 = 9 tag bits.
 Use set field to determine cache set to look in
 Compare tag field of all slots in the set to see if we have a hit, e.g.:
 Address = 16339C = 0001 0110 0011 0011 1001 1100
 Tag = 0 0010 1100 = 02C
 Set = 0 1100 1110 0111 = 0CE7
 Word = 00 = 0
 Address = 008004 = 0000 0000 1000 0000 0000 0100
 Tag = 0 0000 0001 = 001
 Set = 0 0000 0000 0001 = 0001
 Word = 00 = 0
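Both worked addresses can be checked with a sketch of the 9/13/2 field split (helper name is illustrative):

```python
# Decode a 24-bit address into 9-bit tag, 13-bit set, 2-bit word fields
# for the 8192-set, 2-way example.
def decode_set_assoc(addr):
    word = addr & 0x3
    set_no = (addr >> 2) & 0x1FFF   # 13 set bits
    tag = addr >> 15                # 9 tag bits
    return tag, set_no, word

print([hex(f) for f in decode_set_assoc(0x16339C)])  # ['0x2c', '0xce7', '0x0']
print([hex(f) for f in decode_set_assoc(0x008004)])  # ['0x1', '0x1', '0x0']
```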
Cache Definitions
• Hit: data is found at a given memory level.
• Miss: data is not found at a given memory level.
• Hit rate: the percentage of memory accesses in which data is found at a given memory level.
• Miss rate: the percentage of memory accesses in which data is not found.
• Miss rate = 1 − hit rate.
• Hit time: the time required to access data at a given memory level.
• Miss penalty: the time required to process a miss, including the time it takes to replace a block of memory plus the time it takes to deliver the data to the processor.
• Hit ratio (hit rate) = hits / (hits + misses) = no. of hits / total accesses
Memory Access Time Example

 Assume that it takes 1 cycle to send the address, 15 cycles for each DRAM access and 1 cycle to
send a word of data.
 Assuming a cache block of 4 words and one-word wide DRAM,
 miss penalty = 1 + 4x15 + 4x1 = 65 cycles
 With main memory and bus width of 2 words,
 miss penalty = 1 + 2x15 + 2x1 = 33 cycles.
 For a 4-word-wide memory and bus, miss penalty = 1 + 1x15 + 1x1 = 17 cycles. Expensive due to the wide bus and control circuits.
 With interleaved memory of 4 memory banks and a one-word-wide bus,
 miss penalty = 1 + 1x15 + 4x1 = 20 cycles.
 The memory controller must supply consecutive addresses to different memory banks. Interleaving is universally adopted in high-performance computers.
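The four miss-penalty figures above follow from the same three terms (address cycle, DRAM accesses, bus transfers); a sketch:

```python
# Miss-penalty arithmetic: 1 cycle to send the address, 15 cycles per
# DRAM access, 1 cycle per word on the bus; 4-word cache blocks.
ADDR, DRAM, XFER, WORDS = 1, 15, 1, 4

one_word_bus  = ADDR + WORDS * DRAM + WORDS * XFER  # 4 accesses, 4 transfers
two_word_bus  = ADDR + 2 * DRAM + 2 * XFER          # 2 accesses, 2 transfers
four_word_bus = ADDR + 1 * DRAM + 1 * XFER          # 1 access, 1 transfer
interleaved4  = ADDR + 1 * DRAM + WORDS * XFER      # 4 banks overlap their accesses

print(one_word_bus, two_word_bus, four_word_bus, interleaved4)  # 65 33 17 20
```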
Replacement Algorithms - Associative & Set Associative
 The algorithm must be implemented in hardware (for speed)
 Distinguish an empty location from a full one - Valid bit
 Least Recently Used (LRU)
 E.g. in a 2-way set-associative cache, which of the 2 blocks is LRU?
 For each slot, keep an extra USE bit. Set it to 1 when the slot is accessed, and clear the others to 0.
 For more than 2-way set associative, a time stamp is needed for each slot - expensive
 First In First Out (FIFO)
 Replace the block that has been in the cache longest
 Easy to implement as a circular buffer
 Least Frequently Used (LFU)
 Replace the block which has had the fewest hits
 Needs a counter to sum the number of hits
 Random
 Almost as good as LFU and simple to implement
Replacement Algorithms
• It is difficult to determine which blocks to evict
• Least Recently Used (LRU) block:
• The cache controller tracks references to all blocks as computation proceeds
• Tracking counters are incremented or cleared as hits and misses occur
Replacement Algorithms
• For Associative & Set-Associative Cache
Which location should be emptied when the cache is full and a miss
occurs?
– First In First Out (FIFO)
– Least Recently Used (LRU)
• Distinguish an Empty location from a Full one
– Valid Bit

Replacement Algorithms

CPU reference:  A    B    C    A    D    E    A    D    C    F
                Miss Miss Miss Hit  Miss Miss Miss Hit  Hit  Miss

Cache (FIFO):   A    A    A    A    A    E    E    E    E    E
                     B    B    B    B    B    A    A    A    A
                          C    C    C    C    C    C    C    F
                                    D    D    D    D    D    D

Hit ratio = no. of hits / total no. of accesses = 3 / 10 = 0.3

Replacement Algorithms

CPU reference:  A    B    C    A    D    E    A    D    C    F
                Miss Miss Miss Hit  Miss Miss Hit  Hit  Hit  Miss

Cache (LRU,     A    B    C    A    D    E    A    D    C    F
most recently        A    B    C    A    D    E    A    D    C
used first):              A    B    C    A    D    E    A    D
                                    B    C    C    C    E    A

Hit ratio = no. of hits / total no. of accesses = 4 / 10 = 0.4
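Both traces can be replayed with a small simulation of a 4-block cache (a sketch tracking block identity only, no tags or data):

```python
from collections import OrderedDict, deque

refs, capacity = list("ABCADEADCF"), 4

def fifo_hits(refs):
    q, hits = deque(), 0
    for r in refs:
        if r in q:
            hits += 1
        else:
            if len(q) == capacity:
                q.popleft()            # evict the oldest resident block
            q.append(r)
    return hits

def lru_hits(refs):
    d, hits = OrderedDict(), 0         # insertion order = recency order
    for r in refs:
        if r in d:
            hits += 1
            d.move_to_end(r)           # mark as most recently used
        else:
            if len(d) == capacity:
                d.popitem(last=False)  # evict the least recently used
            d[r] = True
    return hits

print(fifo_hits(refs) / len(refs))  # 0.3
print(lru_hits(refs) / len(refs))   # 0.4
```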

Interleaving
 Divides the memory system into a number of memory modules.
 Each module has its own address buffer register (ABR) and data buffer register
(DBR).
 Arranges addressing so that successive words in the address space are placed in
different modules.
 When requests for memory access involve consecutive addresses, the access will
be to different modules.
 Since parallel access to these modules is possible, the average rate of fetching
words from the Main Memory can be increased.
Methods of address layouts

Consecutive words in a module (high-order interleaving):
 MM address = [ Module (k bits) | Address in module (m bits) ]
 Consecutive words are placed in the same module.
 High-order k bits of a memory address determine the module.
 Low-order m bits of a memory address determine the word within a module.
 When a block of words is transferred from main memory to cache, only one module is busy at a time.

Consecutive words in consecutive modules (low-order interleaving):
 MM address = [ Address in module (m bits) | Module (k bits) ]
 Consecutive words are located in consecutive modules.
 Consecutive addresses are placed in consecutive modules.
 While transferring a block of data, several memory modules can be kept busy at the same time.

[Figure: each module (Module 0 … Module 2^k − 1) with its own ABR and DBR.]
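The two layouts differ only in which bits of the address select the module; a sketch, assuming 4 modules (k = 2 bits) and 16-word modules (m = 4 bits) — the parameter values are illustrative:

```python
K_BITS, M_BITS = 2, 4   # module-select bits, word-in-module bits (assumed)

def high_order_module(addr):   # module number in the high-order k bits
    return addr >> M_BITS

def low_order_module(addr):    # module number in the low-order k bits
    return addr & ((1 << K_BITS) - 1)

# Module hit by four consecutive addresses 0..3:
print([high_order_module(a) for a in range(4)])  # [0, 0, 0, 0] - one module busy
print([low_order_module(a) for a in range(4)])   # [0, 1, 2, 3] - all four busy
```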
Hit Rate and Miss Penalty
• Hit rate can be improved by increasing block size, while
keeping cache size constant
• Block sizes that are neither very small nor very large give best
results.
• Miss penalty can be reduced if load-through approach is used
when loading new blocks into cache.
Caches on the processor chip
• In high-performance processors, 2 levels of caches are normally used.

• Average access time in a system with 2 levels of caches is

T_ave = h1·c1 + (1 − h1)·h2·c2 + (1 − h1)·(1 − h2)·M

where h1 and h2 are the hit rates of the L1 and L2 caches, c1 and c2 their access times, and M the time to access main memory.