In the name of Allah, the Most Gracious, the Most Merciful
Exploiting Memory Hierarchy
"Allah is sufficient for us; Allah will give us of His bounty; indeed, to Allah we direct our hopes."
DRAM (Dynamic Random Access Memory) & SRAM (Static Random Access Memory)
Hit Rate
High Hit Rate: most memory accesses are being serviced by the cache
Low Hit Rate: most memory accesses are being serviced by main memory
[Figure: memory-hierarchy pyramid from the CPU through Level 1 down to Level n, with access time increasing with distance from the CPU.]
Placement Strategy: keep the frequently used data and instructions in the cache.
Temporal locality: The tendency of a program to access the same memory location multiple times in a short period of time.
Spatial locality: The tendency of a program to access memory locations that are near each other in memory.
Efficient Replacement Policy: when the cache is full, evict the least-used data and instructions, e.g.:
LRU (Least Recently Used)
LFU (Least Frequently Used)
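As a sketch of how an LRU policy behaves, here is a minimal software model (the class name and the 2-block capacity are illustrative, not from the notes):

```python
from collections import OrderedDict

# Minimal LRU cache model: on a hit the block becomes most recently
# used; on a miss with a full cache, the least recently used block
# is evicted. (Illustrative sketch, not hardware-accurate.)
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # block address -> data

    def access(self, addr, data=None):
        if addr in self.blocks:
            self.blocks.move_to_end(addr)    # mark most recently used
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[addr] = data
        return "miss"

cache = LRUCache(2)
print([cache.access(a) for a in (1, 2, 1, 3, 2)])
# ['miss', 'miss', 'hit', 'miss', 'miss']  (3 evicts 2, so 2 misses again)
```

An LFU policy would instead keep a use counter per block and evict the block with the smallest count.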
Locality
Miss Penalty: the time and resources required to service a cache miss, i.e. the time it takes the CPU to retrieve the requested data or instruction from main memory.
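Hit rate and miss penalty combine in the standard average memory access time (AMAT) formula; the cycle counts below are illustrative, not from the notes:

```python
# AMAT = hit time + miss rate * miss penalty (all in cycles here).
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# 1-cycle hit, 5% miss rate, 100-cycle miss penalty:
print(amat(1, 0.05, 100))  # 6.0 cycles on average
```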
Decreasing Miss Rates
Direct mapped: one unique cache location for each memory block
cache block index = memory block address % number of blocks in the cache
Set associative: each memory block can be placed in any location within one unique set of cache locations –
if each set holds n blocks, the cache is n-way set-associative
cache set index = memory block address % number of sets in the cache
all cache entries in the corresponding set are searched (in parallel) to locate the block
– Increasing the degree of associativity
• reduces the miss rate
• increases the hit time, because of the parallel search and subsequent fetch
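The two mapping formulas above can be checked with a small sketch (the block address and cache size are illustrative); direct mapped is the 1-way case and fully associative is the single-set case:

```python
# Which set a memory block maps to, for a cache of num_blocks blocks
# organized into 'ways'-way sets. Direct mapped: ways = 1.
def cache_set(block_addr, num_blocks, ways):
    num_sets = num_blocks // ways
    return block_addr % num_sets

# Memory block 12 in an 8-block cache:
print(cache_set(12, 8, 1))  # 4: direct mapped, 12 % 8
print(cache_set(12, 8, 2))  # 0: 2-way, 4 sets, 12 % 4
print(cache_set(12, 8, 8))  # 0: fully associative, one set
```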
[Figure: a reference to Xn misses in a cache holding X1, X2, X3, X4, ..., Xn-2, Xn-1, so Xn is fetched from memory and placed in the cache.]
Cache Directory (Cache Map): includes an index and a tag for each cache block.
The index identifies the location of the cache block in the cache.
The tag identifies the specific memory block (data or instruction) stored in that cache block.
Hash Table: similar to the cache directory, but the data is stored in key-value pairs. When the CPU requests a data item, the key is used to index the hash table and check whether the item is in the cache.
Direct Mapped Cache

[Figure: an 8-entry direct-mapped cache (indices 000-111) below a memory array; memory block addresses 00001, 01001, 10001, 11001 all map to index 001, and 00101, 01101, 10101, 11101 all map to index 101 (index = low 3 bits of the block address).]
Direct Mapped Cache

Cache = 8 blocks (2^3); memory = 32 blocks. Examples of the mapping (index = block address % 8):
00000 = 0  -> 000    00001 = 1  -> 001
01000 = 8  -> 000    01001 = 9  -> 001
10000 = 16 -> 000    10001 = 17 -> 001
11000 = 24 -> 000    11001 = 25 -> 001

Cache size: 8 blocks x block size (varies depending on the system) = 8 x block size
Cache address (index) width: log2(number of cache blocks) = log2(8) = 3 bits
Tag width: log2(number of memory blocks / number of cache blocks) = log2(32/8) = log2(4) = 2 bits
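The widths above can be reproduced directly in code (same 8-block cache and 32-block memory; the sample address is illustrative):

```python
from math import log2

cache_blocks = 8    # 2^3 cache blocks
memory_blocks = 32  # 2^5 memory blocks

index_bits = int(log2(cache_blocks))                 # 3
tag_bits = int(log2(memory_blocks // cache_blocks))  # 2

# Splitting the 5-bit block address 11010 (= 26):
addr = 0b11010
index = addr % cache_blocks   # low 3 bits: 010 = 2
tag = addr // cache_blocks    # high 2 bits: 11 = 3
print(index_bits, tag_bits, index, tag)  # 3 2 2 3
```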
Direct Mapped Cache: worked example
(5-bit memory block addresses; index = low 3 bits, tag = high 2 bits)
(2) Address referenced 11010 (miss), block fetched into index 010:

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem(11010)
011    N
100    N
101    N
110    Y  10   Mem(10110)
111    N

(3) Address referenced 10110 (hit): the cache is unchanged; Mem(10110) at index 110 is sent to the CPU.
(4) Address referenced 10010 (miss): index 010 held tag 11 (Mem(11010)); it is replaced by tag 10, Mem(10010):

Index  V  Tag  Data
000    N
001    N
010    Y  10   Mem(10010)
011    N
100    N
101    N
110    Y  10   Mem(10110)
111    N
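The hit/miss outcomes of this worked example can be replayed with a small simulator. The notes do not show the accesses before step (2), so the sequence below starts with a miss on 10110 so that the state matches the step-(2) table; everything else follows the example:

```python
# Direct-mapped cache simulator: 8 blocks, 5-bit block addresses,
# index = low 3 bits, tag = high 2 bits. Each entry stores a tag
# (None = invalid); storing a tag models fetching the block.
def simulate(addresses, num_blocks=8):
    tags = [None] * num_blocks
    results = []
    for addr in addresses:
        index = addr % num_blocks
        tag = addr // num_blocks
        if tags[index] == tag:
            results.append("hit")
        else:
            results.append("miss")
            tags[index] = tag      # fetch block, overwriting old tag
    return results

print(simulate([0b10110, 0b11010, 0b10110, 0b10010]))
# ['miss', 'miss', 'hit', 'miss']
```

The last access (10010) misses even though index 010 is valid, because the stored tag (11) does not match: two different memory blocks compete for the same cache location.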
Direct Mapped Cache, MIPS style

[Figure: a 32-bit address (showing bit positions 31-0) split into a 20-bit tag (bits 31-12), a 10-bit index (bits 11-2), and a 2-bit byte offset (bits 1-0), indexing a 1024-entry table of Valid/Tag/Data lines; the stored tag is compared with the address tag to produce Hit, and the 32-bit data word is returned.]

Cache: 1024 blocks, 1 word (32 bits) per block.
Byte offset (the 2 least significant bits) is ignored for word accesses.
Index = the next 10 bits of the memory address.
Tag = the remaining 20 bits.
Cache block = memory block address % number of cache blocks = block address % 1024.
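The field widths above can be sketched as address-manipulation code (the sample address is illustrative):

```python
# Split a 32-bit byte address for the 1024-block, 1-word-per-block
# cache: bits 1-0 byte offset, bits 11-2 index, bits 31-12 tag.
def split_address(addr):
    byte_offset = addr & 0x3
    index = (addr >> 2) & 0x3FF   # 10 bits -> 1024 entries
    tag = addr >> 12              # remaining 20 bits
    return tag, index, byte_offset

tag, index, off = split_address(0x12345678)
print(hex(tag), index, off)  # 0x12345 414 0
```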
Cache Read Hit/Miss
Cache Write Hit/Miss
Direct Mapped Cache
Taking Advantage of Spatial Locality

[Figure: address split into a 16-bit tag, a 12-bit index, a 2-bit block offset, and a 2-bit byte offset, indexing a 4K-entry cache whose data field is 128 bits (four 32-bit words); a multiplexor selects the requested 32-bit word using the block offset, and the tag comparison produces Hit.]

Write-miss handling with multi-word blocks:
1. Fetch the block from memory.
2. Replace the word that caused the miss within the block.
3. Write the block to both the cache and memory.
Unlike the case of 1-word blocks, a write miss with a multi-word block therefore causes a memory read.
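The four-field address split above can be sketched the same way; the sample addresses are illustrative, chosen to fall in the same 16-byte block:

```python
# Split a 32-bit address for the 4K-entry cache with 4-word blocks:
# bits 1-0 byte offset, bits 3-2 block offset (mux select),
# bits 15-4 index, bits 31-16 tag.
def split_multiword(addr):
    byte_offset = addr & 0x3
    block_offset = (addr >> 2) & 0x3   # which of the 4 words
    index = (addr >> 4) & 0xFFF        # 12 bits -> 4096 entries
    tag = addr >> 16                   # remaining 16 bits
    return tag, index, block_offset, byte_offset

# Word addresses 4 bytes apart inside one block share tag and
# index, so the second access hits (spatial locality):
print(split_multiword(0x1234))  # (0, 291, 1, 0)
print(split_multiword(0x1238))  # (0, 291, 2, 0)
```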
Direct Mapped Cache
Taking Advantage of Spatial Locality
40%
35%
30%
25%
Miss rate
20%
15%
10%
5%
0%
4 16 64 256
Block size (bytes) 1 KB
8 KB
16 KB
Miss rate vs. block size for various cache sizes 64 KB
256 KB
Improving Cache Performance

[Figure: three memory organizations for higher bandwidth: (a) processor, one-word-wide bus, cache, and memory; (b) a wide memory and bus with a multiplexor between cache and processor; (c) four interleaved memory banks (bank 0 to bank 3) sharing a one-word-wide bus.]
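The interleaved-bank organization can be sketched as follows (four banks, as in the figure; the address-to-bank rule shown is the usual low-order interleaving, an assumption not spelled out in the notes):

```python
# Low-order interleaving: consecutive word addresses rotate through
# the banks, so the word accesses for one cache block can be
# overlapped across banks instead of queuing on a single memory.
def bank_of(word_addr, num_banks=4):
    return word_addr % num_banks

print([bank_of(a) for a in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```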