
COMPUTER ORGANIZATION
After Midterm
In the name of Allah, the Most Gracious, the Most Merciful

11
Exploiting Memory Hierarchy
"Sufficient for us is Allah; Allah will give us of His bounty; indeed, to Allah we turn in hope."
DRAM & SRAM
DRAM: Dynamic Random Access Memory. SRAM: Static Random Access Memory.
Hit Rate

• High hit rate: most memory accesses are serviced by the cache.
• Low hit rate: most memory accesses are serviced by main memory.

[Figure: Levels in the memory hierarchy. Level 1 sits nearest the CPU and Level n farthest; access time increases with distance from the CPU, and the width of each level represents the size of the memory at that level.]

Hit Rate

Factors that increase the hit rate:
• Size of the cache
• The placement of data and instructions (locality)
• The cache replacement policy
[Figure: Blocks of data are transferred between adjacent levels of the hierarchy and the processor.]
Hit Rate

• Placement strategy: keep the frequently used data and instructions in the cache.
  • Temporal locality: the tendency of a program to access the same memory location multiple times in a short period of time.
  • Spatial locality: the tendency of a program to access memory locations that are near each other in memory.
• Efficient replacement policy: replace the least used data and instructions in the cache.
  • LRU (Least Recently Used)
  • LFU (Least Frequently Used)
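The LRU policy above can be sketched with an ordered map that tracks recency of use. This is a minimal illustration, not a hardware design; the capacity and keys are arbitrary.

```python
from collections import OrderedDict

# Minimal LRU replacement sketch: oldest entries sit at the front,
# the most recently used entry at the back.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def access(self, key, value=None):
        """Return the cached value on a hit; on a miss, insert and evict the LRU entry."""
        if key in self.store:
            self.store.move_to_end(key)      # mark as most recently used
            return self.store[key]
        if value is not None:
            if len(self.store) >= self.capacity:
                self.store.popitem(last=False)  # evict the least recently used
            self.store[key] = value
        return value

cache = LRUCache(2)
cache.access("A", 1)
cache.access("B", 2)
cache.access("A")        # A becomes most recently used
cache.access("C", 3)     # cache is full, so B (least recently used) is evicted
print(list(cache.store))  # ['A', 'C']
```

An LFU cache would instead keep a use counter per entry and evict the entry with the smallest count.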
Locality

• Temporal locality: the tendency of a program to access the same memory location multiple times in a short period of time.
• Spatial locality: the tendency of a program to access memory locations that are near each other in memory.

Why does code have locality, considering instructions and data? Instructions are mostly fetched sequentially, and loops re-execute the same instructions (spatial and temporal locality); data such as array elements are accessed one after another, and variables are often reused (spatial and temporal locality).


Hit and Miss

Two adjacent levels:
• Upper (nearer the CPU)
• Lower (farther from the CPU)

Miss penalty: the time and resources required to service a cache miss, i.e. the time it takes for the CPU to retrieve the requested data or instruction from main memory.
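One standard way to see the cost of the miss penalty is the average memory access time, AMAT = hit time + miss rate × miss penalty. The formula is standard, but the numbers below are illustrative only, not from the slides.

```python
# Illustrative numbers (assumptions, not slide data):
hit_time = 1        # cycles to service a hit in the cache
miss_rate = 0.05    # fraction of accesses that miss
miss_penalty = 100  # cycles to fetch the block from main memory

# Average memory access time: every access pays the hit time,
# and the missing fraction also pays the miss penalty.
amat = hit_time + miss_rate * miss_penalty
print(amat)  # 6.0 cycles on average
```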
Decreasing Miss Rates

• Direct mapped: one unique cache location for each memory block.
  • cache block index = memory block address % number of cache blocks
• Fully associative: a memory block can be placed anywhere in the cache.
  • All cache entries are searched (in parallel) to locate a block.
• Set associative: each memory block maps to a unique set of cache locations; if each set holds n blocks, the cache is n-way set-associative.
  • cache set index = memory block address % number of sets in the cache
  • All cache entries in the corresponding set are searched (in parallel) to locate a block.
• Increasing the degree of associativity:
  • reduces the miss rate
  • increases the hit time, because of the parallel search and then the fetch
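The two placement formulas above can be checked directly. The cache and set counts below are illustrative (an 8-block cache treated as 4 sets of 2 blocks); the addresses are block addresses, not byte addresses.

```python
NUM_CACHE_BLOCKS = 8
NUM_SETS = 4  # an 8-block cache with 4 sets is 2-way set-associative

def direct_mapped_index(block_addr):
    # Direct mapped: one unique cache slot per memory block.
    return block_addr % NUM_CACHE_BLOCKS

def set_index(block_addr):
    # Set associative: the block may go in any way of this set.
    return block_addr % NUM_SETS

print(direct_mapped_index(0b10110))  # 22 % 8 = 6, i.e. cache index 110
print(set_index(0b10110))            # 22 % 4 = 2
```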
CACHE

[Figure: (a) Before the reference to Xn, the cache holds blocks X1 … Xn-1. (b) The reference to Xn causes a miss, so Xn is fetched from memory into the cache.]


CACHE

Cache directory (cache map): includes an index and a tag for each cache block.
• Index: identifies the location of the cache block in the cache.
• Tag: identifies the specific data or instruction stored in the cache block.

Hash table: similar to the cache directory, but the data is stored in key-value pairs. When the CPU requests a data item, the key is used to index the hash table and check whether the data item is in the cache.
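The directory lookup can be modeled as a hash table keyed by the index, where the stored tag confirms which memory block is actually present. This is a sketch of the idea, reusing the 10110 address from the slides.

```python
# Cache directory modeled as a hash table: index -> (tag, data).
directory = {}

def lookup(index, tag):
    entry = directory.get(index)
    if entry is not None and entry[0] == tag:
        return entry[1]   # hit: the stored tag matches the requested tag
    return None           # miss: slot is empty or holds a different block

# Block 10110: index = low bits 110, tag = high bits 10.
directory[0b110] = (0b10, "Mem(10110)")
print(lookup(0b110, 0b10))  # hit:  Mem(10110)
print(lookup(0b110, 0b11))  # miss: None (same index, different tag)
```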
Direct Mapped Cache

[Figure: An 8-block cache (indices 000 to 111) and a 32-block memory. Memory blocks whose addresses end in the same 3 bits share a cache index; for example 00101, 01101, 10101, and 11101 all map to index 101, and 00001, 01001, 10001, and 11001 all map to index 001.]
Direct Mapped Cache

• Cache = 8 blocks (2^3)
• Memory = 32 blocks (2^5)

Example mappings (memory block address → decimal value):
• 00000 = 0 and 00001 = 1
• 01000 = 8 and 01001 = 9
• 10000 = 16 and 10001 = 17
• 11000 = 24 and 11001 = 25

Cache size: 8 blocks × block size (varies depending on the system) = 8 × block size
Cache address (index): log2(number of cache blocks) = log2(8) = 3 bits
Tag bits: log2(number of memory blocks / number of cache blocks) = log2(32/8) = log2(4) = 2 bits
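The three bit-width calculations above can be verified directly for this 8-block cache and 32-block memory:

```python
from math import log2

cache_blocks = 8
memory_blocks = 32

index_bits = int(log2(cache_blocks))                 # log2(8)    = 3 bits
tag_bits = int(log2(memory_blocks // cache_blocks))  # log2(32/8) = 2 bits
address_bits = int(log2(memory_blocks))              # log2(32)   = 5 bits = tag + index

print(index_bits, tag_bits, address_bits)  # 3 2 5
```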
Direct Mapped Cache

(0) Initial state:

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | N |     |
011   | N |     |
100   | N |     |
101   | N |     |
110   | N |     |
111   | N |     |

(1) Address 10110 referenced (miss):

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | N |     |
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem(10110)
111   | N |     |
Direct Mapped Cache

(2) Address 11010 referenced (miss):

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | Y | 11  | Mem(11010)
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem(10110)
111   | N |     |

(3) Address 10110 referenced (hit, data sent to the CPU):

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | Y | 11  | Mem(11010)
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem(10110)
111   | N |     |
Direct Mapped Cache

(3) Address 10110 referenced (hit, data sent to the CPU): state unchanged.

(4) Address 10010 referenced (miss; the block at index 010 is replaced):

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | Y | 10  | Mem(10010)
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem(10110)
111   | N |     |
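The four steps above can be replayed with a small simulation of the 8-block direct-mapped cache: the index is the low 3 bits of the block address and the tag is the remaining high bits.

```python
# 8-block direct-mapped cache: one valid bit and one tag per block.
valid = [False] * 8
tags = [0] * 8

def access(block_addr):
    index = block_addr % 8   # low 3 bits select the cache block
    tag = block_addr // 8    # remaining high bits identify the memory block
    if valid[index] and tags[index] == tag:
        return "hit"
    valid[index], tags[index] = True, tag  # fetch the block, replacing any old one
    return "miss"

# The reference sequence from the slides.
results = [access(a) for a in [0b10110, 0b11010, 0b10110, 0b10010]]
print(results)  # ['miss', 'miss', 'hit', 'miss']
```

The last access misses even though index 010 is valid, because the stored tag (11) differs from the requested tag (10), so the block is replaced.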
Direct Mapped Cache, MIPS style

[Figure: Address (showing bit positions 31..0) split into a 20-bit tag, a 10-bit index, and a 2-bit byte offset. The index selects one of 1024 entries (Valid, Tag, Data); the stored tag is compared with the address tag to produce Hit, and the 32-bit data word is read out.]

• Cache: 1024 blocks, 1 word (32 bits) per block.
• Byte offset (the 2 least significant bits) is ignored when selecting a block.
• Index = the next 10 bits of the memory address.
• Tag = the remaining 20 bits.
• Cache block index = memory block address % number of cache blocks = memory block address % 1024.
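The MIPS-style address split above can be sketched with shifts and masks; the example address is arbitrary, chosen only to exercise all three fields.

```python
# 32-bit byte address for a 1024-block, 1-word-per-block cache:
# bits [1:0] byte offset, bits [11:2] index, bits [31:12] tag.
def split_address(addr):
    byte_offset = addr & 0x3
    index = (addr >> 2) & 0x3FF  # 10 bits -> 1024 entries
    tag = addr >> 12             # remaining 20 bits
    return tag, index, byte_offset

print(split_address(0x1234))  # (1, 141, 0)
```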
Cache Read Hit/Miss
Cache Write Hit/Miss
Direct Mapped Cache
Taking Advantage of Spatial Locality

Write miss:
• Tags are unequal.
• Fetch the block from memory.
• Replace the word that caused the miss.
• Write the block to both cache and memory.
• Unlike the 1-word-block case, a write miss with a multi-word block causes a memory read.

[Figure: Address (showing bit positions 31..0) split into a 16-bit tag, a 12-bit index, a 2-bit block offset, and a 2-bit byte offset. The cache has 4K entries with 128-bit (4-word) data blocks; the index selects an entry, the stored tag is compared to produce Hit, and a multiplexor uses the block offset to select one of the four 32-bit words.]
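The address split for this 4K-entry, 4-word-block cache can be sketched the same way; the example address is arbitrary.

```python
# 32-bit byte address for a 4K-entry cache with 4-word (16-byte) blocks:
# bits [1:0] byte offset, [3:2] block (word) offset, [15:4] index, [31:16] tag.
def split_multiword(addr):
    byte_offset = addr & 0x3
    block_offset = (addr >> 2) & 0x3  # selects one of 4 words via the mux
    index = (addr >> 4) & 0xFFF       # 12 bits -> 4096 entries
    tag = addr >> 16                  # remaining 16 bits
    return tag, index, block_offset, byte_offset

print(split_multiword(0x10074))  # (1, 7, 1, 0)
```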
Direct Mapped Cache
Taking Advantage of Spatial Locality

[Figure: Miss rate (0% to 40%) vs. block size (4 to 256 bytes) for cache sizes of 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB. Miss rate falls as block size grows, but for the smallest caches it rises again at large block sizes.]
Improving Cache Performance

Program | Block size (words) | Instruction miss rate | Data miss rate | Effective combined miss rate
gcc     | 1                  | 6.1%                  | 2.1%           | 5.4%
gcc     | 4                  | 2.0%                  | 1.7%           | 1.9%
spice   | 1                  | 1.2%                  | 1.3%           | 1.2%
spice   | 4                  | 0.3%                  | 0.6%           | 0.4%

Miss rates for gcc and spice on a MIPS R2000 with one- and four-word block sizes.

GCC (GNU Compiler Collection) is a set of compilers that can be used to generate code for the MIPS R2000 processor.

Increasing Bandwidth

Assume:
• Cache block of 4 words
• 1 clock cycle to send the address to the memory address buffer (1 bus trip)
• 15 clock cycles for each memory data access
• 1 clock cycle to send a word to the memory data buffer (1 bus trip)

(a) One-word-wide memory organization: 1 + 4×15 + 4×1 = 65 cycles
(b) Wide memory organization (4-word-wide memory and bus): 1 + 1×15 + 1×1 = 17 cycles
(c) Interleaved memory organization (4 one-word banks sharing a one-word bus): 1 + 1×15 + 4×1 = 20 cycles; the interleaved memory banks compete for the bus.

[Figure: three organizations: (a) CPU, cache, and memory connected by a one-word bus; (b) CPU and cache connected through a multiplexor to a wide bus and wide memory; (c) four memory banks (bank 0 to bank 3) interleaved on a one-word bus.]
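The three miss-penalty calculations above follow directly from the stated assumptions:

```python
# Assumptions from the slides: 1 cycle to send the address, 15 cycles per
# memory access, 1 cycle per bus transfer, 4-word cache block.
SEND_ADDR, ACCESS, TRANSFER, WORDS = 1, 15, 1, 4

one_word_wide = SEND_ADDR + WORDS * ACCESS + WORDS * TRANSFER  # 4 accesses, 4 transfers
wide_memory   = SEND_ADDR + 1 * ACCESS + 1 * TRANSFER          # whole block at once
interleaved   = SEND_ADDR + 1 * ACCESS + WORDS * TRANSFER      # parallel access, serial bus

print(one_word_wide, wide_memory, interleaved)  # 65 17 20
```

Interleaving gets most of the benefit of a wide memory (the 15-cycle accesses overlap across banks) without the cost of a wide bus, paying only the four serial one-word transfers.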
