
COMPUTER ORGANIZATION
After Midterm
In the name of Allah, the Most Gracious, the Most Merciful

11
Exploiting Memory Hierarchy
"Sufficient for us is Allah; Allah will give us of His bounty; indeed, to Allah we turn in hope."
DRAM & SRAM
DRAM: Dynamic Random Access Memory. SRAM: Static Random Access Memory.
Hit Rate

• High hit rate: most memory accesses are serviced by the cache.
• Low hit rate: most memory accesses are serviced by main memory.

[Figure: Levels in the memory hierarchy. Level 1 sits nearest the CPU and Level n farthest; access time increases with distance from the CPU, and the width of each level represents the size of the memory at that level.]

Hit Rate

Factors that increase the hit rate:
• Size of the cache
• The placement of data and instructions (locality)
• The cache replacement policy
[Figure: Blocks of data are transferred between adjacent levels of the hierarchy and the processor.]
Hit Rate

• Placement strategy: keep the frequently used data and instructions in the cache.
  • Temporal locality: the tendency of a program to access the same memory location multiple times in a short period of time.
  • Spatial locality: the tendency of a program to access memory locations that are near each other in memory.
• Efficient replacement policy: replace the least used data and instructions in the cache.
  • LRU (Least Recently Used)
  • LFU (Least Frequently Used)
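The LRU policy above can be sketched with an ordered map that tracks recency of use. This is a minimal illustration, not a hardware design; the capacity and keys are arbitrary.

```python
from collections import OrderedDict

# Minimal LRU replacement sketch: oldest entries sit at the front,
# the most recently used entry at the back.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def access(self, key, value=None):
        """Return the cached value on a hit; on a miss, insert and evict the LRU entry."""
        if key in self.store:
            self.store.move_to_end(key)      # mark as most recently used
            return self.store[key]
        if value is not None:
            if len(self.store) >= self.capacity:
                self.store.popitem(last=False)  # evict the least recently used
            self.store[key] = value
        return value

cache = LRUCache(2)
cache.access("A", 1)
cache.access("B", 2)
cache.access("A")        # A becomes most recently used
cache.access("C", 3)     # cache is full, so B (least recently used) is evicted
print(list(cache.store))  # ['A', 'C']
```

An LFU cache would instead keep a use counter per entry and evict the entry with the smallest count.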
Locality

• Temporal locality: the tendency of a program to access the same memory location multiple times in a short period of time.
• Spatial locality: the tendency of a program to access memory locations that are near each other in memory.

Why does code have locality, considering instructions and data? Instructions are mostly fetched sequentially, and loops re-execute the same instructions (spatial and temporal locality); data such as array elements are accessed one after another, and variables are often reused (spatial and temporal locality).


Hit and Miss

Two adjacent levels:
• Upper (nearer the CPU)
• Lower (farther from the CPU)

Miss penalty: the time and resources required to service a cache miss, i.e. the time it takes for the CPU to retrieve the requested data or instruction from main memory.
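One standard way to see the cost of the miss penalty is the average memory access time, AMAT = hit time + miss rate × miss penalty. The formula is standard, but the numbers below are illustrative only, not from the slides.

```python
# Illustrative numbers (assumptions, not slide data):
hit_time = 1        # cycles to service a hit in the cache
miss_rate = 0.05    # fraction of accesses that miss
miss_penalty = 100  # cycles to fetch the block from main memory

# Average memory access time: every access pays the hit time,
# and the missing fraction also pays the miss penalty.
amat = hit_time + miss_rate * miss_penalty
print(amat)  # 6.0 cycles on average
```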
Decreasing Miss Rates

• Direct mapped: one unique cache location for each memory block.
  • cache block index = memory block address % number of cache blocks
• Fully associative: a memory block can be placed anywhere in the cache.
  • All cache entries are searched (in parallel) to locate a block.
• Set associative: each memory block maps to a unique set of cache locations; if each set holds n blocks, the cache is n-way set-associative.
  • cache set index = memory block address % number of sets in the cache
  • All cache entries in the corresponding set are searched (in parallel) to locate a block.
• Increasing the degree of associativity:
  • reduces the miss rate
  • increases the hit time, because of the parallel search and then the fetch
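The two placement formulas above can be checked directly. The cache and set counts below are illustrative (an 8-block cache treated as 4 sets of 2 blocks); the addresses are block addresses, not byte addresses.

```python
NUM_CACHE_BLOCKS = 8
NUM_SETS = 4  # an 8-block cache with 4 sets is 2-way set-associative

def direct_mapped_index(block_addr):
    # Direct mapped: one unique cache slot per memory block.
    return block_addr % NUM_CACHE_BLOCKS

def set_index(block_addr):
    # Set associative: the block may go in any way of this set.
    return block_addr % NUM_SETS

print(direct_mapped_index(0b10110))  # 22 % 8 = 6, i.e. cache index 110
print(set_index(0b10110))            # 22 % 4 = 2
```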
CACHE

[Figure: (a) Before the reference to Xn, the cache holds blocks X1 … Xn-1. (b) The reference to Xn causes a miss, so Xn is fetched from memory into the cache.]


CACHE

Cache directory (cache map): includes an index and a tag for each cache block.
• Index: identifies the location of the cache block in the cache.
• Tag: identifies the specific data or instruction stored in the cache block.

Hash table: similar to the cache directory, but the data is stored in key-value pairs. When the CPU requests a data item, the key is used to index the hash table and check whether the data item is in the cache.
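The directory lookup can be modeled as a hash table keyed by the index, where the stored tag confirms which memory block is actually present. This is a sketch of the idea, reusing the 10110 address from the slides.

```python
# Cache directory modeled as a hash table: index -> (tag, data).
directory = {}

def lookup(index, tag):
    entry = directory.get(index)
    if entry is not None and entry[0] == tag:
        return entry[1]   # hit: the stored tag matches the requested tag
    return None           # miss: slot is empty or holds a different block

# Block 10110: index = low bits 110, tag = high bits 10.
directory[0b110] = (0b10, "Mem(10110)")
print(lookup(0b110, 0b10))  # hit:  Mem(10110)
print(lookup(0b110, 0b11))  # miss: None (same index, different tag)
```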
Direct Mapped Cache

[Figure: An 8-block cache (indices 000 to 111) and a 32-block memory. Memory blocks whose addresses end in the same 3 bits share a cache index; for example 00101, 01101, 10101, and 11101 all map to index 101, and 00001, 01001, 10001, and 11001 all map to index 001.]
Direct Mapped Cache

• Cache = 8 blocks (2^3)
• Memory = 32 blocks (2^5)

Example mappings (memory block address → decimal value):
• 00000 = 0 and 00001 = 1
• 01000 = 8 and 01001 = 9
• 10000 = 16 and 10001 = 17
• 11000 = 24 and 11001 = 25

Cache size: 8 blocks × block size (varies depending on the system) = 8 × block size
Cache address (index): log2(number of cache blocks) = log2(8) = 3 bits
Tag bits: log2(number of memory blocks / number of cache blocks) = log2(32/8) = log2(4) = 2 bits
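The three bit-width calculations above can be verified directly for this 8-block cache and 32-block memory:

```python
from math import log2

cache_blocks = 8
memory_blocks = 32

index_bits = int(log2(cache_blocks))                 # log2(8)    = 3 bits
tag_bits = int(log2(memory_blocks // cache_blocks))  # log2(32/8) = 2 bits
address_bits = int(log2(memory_blocks))              # log2(32)   = 5 bits = tag + index

print(index_bits, tag_bits, address_bits)  # 3 2 5
```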
Direct Mapped Cache

(0) Initial state:

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | N |     |
011   | N |     |
100   | N |     |
101   | N |     |
110   | N |     |
111   | N |     |

(1) Address 10110 referenced (miss):

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | N |     |
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem(10110)
111   | N |     |
Direct Mapped Cache

(2) Address 11010 referenced (miss):

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | Y | 11  | Mem(11010)
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem(10110)
111   | N |     |

(3) Address 10110 referenced (hit, data sent to the CPU):

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | Y | 11  | Mem(11010)
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem(10110)
111   | N |     |
Direct Mapped Cache

(3) Address 10110 referenced (hit, data sent to the CPU): state unchanged.

(4) Address 10010 referenced (miss; the block at index 010 is replaced):

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | Y | 10  | Mem(10010)
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem(10110)
111   | N |     |
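The four steps above can be replayed with a small simulation of the 8-block direct-mapped cache: the index is the low 3 bits of the block address and the tag is the remaining high bits.

```python
# 8-block direct-mapped cache: one valid bit and one tag per block.
valid = [False] * 8
tags = [0] * 8

def access(block_addr):
    index = block_addr % 8   # low 3 bits select the cache block
    tag = block_addr // 8    # remaining high bits identify the memory block
    if valid[index] and tags[index] == tag:
        return "hit"
    valid[index], tags[index] = True, tag  # fetch the block, replacing any old one
    return "miss"

# The reference sequence from the slides.
results = [access(a) for a in [0b10110, 0b11010, 0b10110, 0b10010]]
print(results)  # ['miss', 'miss', 'hit', 'miss']
```

The last access misses even though index 010 is valid, because the stored tag (11) differs from the requested tag (10), so the block is replaced.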
Direct Mapped Cache, MIPS style

[Figure: Address (showing bit positions 31..0) split into a 20-bit tag, a 10-bit index, and a 2-bit byte offset. The index selects one of 1024 entries (Valid, Tag, Data); the stored tag is compared with the address tag to produce Hit, and the 32-bit data word is read out.]

• Cache: 1024 blocks, 1 word (32 bits) per block.
• Byte offset (the 2 least significant bits) is ignored when selecting a block.
• Index = the next 10 bits of the memory address.
• Tag = the remaining 20 bits.
• Cache block index = memory block address % number of cache blocks = memory block address % 1024.
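The MIPS-style address split above can be sketched with shifts and masks; the example address is arbitrary, chosen only to exercise all three fields.

```python
# 32-bit byte address for a 1024-block, 1-word-per-block cache:
# bits [1:0] byte offset, bits [11:2] index, bits [31:12] tag.
def split_address(addr):
    byte_offset = addr & 0x3
    index = (addr >> 2) & 0x3FF  # 10 bits -> 1024 entries
    tag = addr >> 12             # remaining 20 bits
    return tag, index, byte_offset

print(split_address(0x1234))  # (1, 141, 0)
```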
Cache Read Hit/Miss
Cache Write Hit/Miss
Direct Mapped Cache
Taking Advantage of Spatial Locality

Write miss:
• Tags are unequal.
• Fetch the block from memory.
• Replace the word that caused the miss.
• Write the block to both cache and memory.
• Unlike the 1-word-block case, a write miss with a multi-word block causes a memory read.

[Figure: Address (showing bit positions 31..0) split into a 16-bit tag, a 12-bit index, a 2-bit block offset, and a 2-bit byte offset. The cache has 4K entries with 128-bit (4-word) data blocks; the index selects an entry, the stored tag is compared to produce Hit, and a multiplexor uses the block offset to select one of the four 32-bit words.]
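The address split for this 4K-entry, 4-word-block cache can be sketched the same way; the example address is arbitrary.

```python
# 32-bit byte address for a 4K-entry cache with 4-word (16-byte) blocks:
# bits [1:0] byte offset, [3:2] block (word) offset, [15:4] index, [31:16] tag.
def split_multiword(addr):
    byte_offset = addr & 0x3
    block_offset = (addr >> 2) & 0x3  # selects one of 4 words via the mux
    index = (addr >> 4) & 0xFFF       # 12 bits -> 4096 entries
    tag = addr >> 16                  # remaining 16 bits
    return tag, index, block_offset, byte_offset

print(split_multiword(0x10074))  # (1, 7, 1, 0)
```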
Direct Mapped Cache
Taking Advantage of Spatial Locality

[Figure: Miss rate (0% to 40%) vs. block size (4 to 256 bytes) for cache sizes of 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB. Miss rate falls as block size grows, but for the smallest caches it rises again at large block sizes.]
Improving Cache Performance

Program | Block size (words) | Instruction miss rate | Data miss rate | Effective combined miss rate
gcc     | 1                  | 6.1%                  | 2.1%           | 5.4%
gcc     | 4                  | 2.0%                  | 1.7%           | 1.9%
spice   | 1                  | 1.2%                  | 1.3%           | 1.2%
spice   | 4                  | 0.3%                  | 0.6%           | 0.4%

Miss rates for gcc and spice on a MIPS R2000 with one- and four-word block sizes.

GCC (GNU Compiler Collection) is a set of compilers that can be used to generate code for the MIPS R2000 processor.

Increasing Bandwidth

Assume:
• Cache block of 4 words
• 1 clock cycle to send the address to the memory address buffer (1 bus trip)
• 15 clock cycles for each memory data access
• 1 clock cycle to send a word to the memory data buffer (1 bus trip)

(a) One-word-wide memory organization: 1 + 4×15 + 4×1 = 65 cycles
(b) Wide memory organization (4-word-wide memory and bus): 1 + 1×15 + 1×1 = 17 cycles
(c) Interleaved memory organization (4 one-word banks sharing a one-word bus): 1 + 1×15 + 4×1 = 20 cycles; the interleaved memory banks compete for the bus.

[Figure: three organizations: (a) CPU, cache, and memory connected by a one-word bus; (b) CPU and cache connected through a multiplexor to a wide bus and wide memory; (c) four memory banks (bank 0 to bank 3) interleaved on a one-word bus.]
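The three miss-penalty calculations above follow directly from the stated assumptions:

```python
# Assumptions from the slides: 1 cycle to send the address, 15 cycles per
# memory access, 1 cycle per bus transfer, 4-word cache block.
SEND_ADDR, ACCESS, TRANSFER, WORDS = 1, 15, 1, 4

one_word_wide = SEND_ADDR + WORDS * ACCESS + WORDS * TRANSFER  # 4 accesses, 4 transfers
wide_memory   = SEND_ADDR + 1 * ACCESS + 1 * TRANSFER          # whole block at once
interleaved   = SEND_ADDR + 1 * ACCESS + WORDS * TRANSFER      # parallel access, serial bus

print(one_word_wide, wide_memory, interleaved)  # 65 17 20
```

Interleaving gets most of the benefit of a wide memory (the 15-cycle accesses overlap across banks) without the cost of a wide bus, paying only the four serial one-word transfers.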
