Memory Hierarchy
Shovon Roy
Lecturer, CSE, BUBT
Patterson, D. A., & Hennessy, J. L. (2016). Computer Organization and Design, Revised 4th Edition.
Computer Architecture
Principle of locality
In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repeatedly over a short period of time. There are two different types of locality: temporal locality (a recently accessed item is likely to be accessed again soon) and spatial locality (items near a recently accessed item are likely to be accessed soon).
Hit time: The time required to access a level of the memory hierarchy, including the time needed to determine whether the access is a hit or a miss.
Miss penalty: The time required to fetch a block into a level of the memory hierarchy from the lower level, including the time to access the block, transmit it from one level to the other, insert it in the level that experienced the miss, and then pass the block to the requestor.
Example
Suppose the CPU references memory 100 times. Of these, 80 are hits and 20 are misses. Assume that when a hit occurs the memory access time is 10 ns, and when a miss occurs the memory access time is 100 ns. Calculate the total memory access time and the average memory access time.
Soln:
Total time for all hits = 80 × 10 = 800 ns
Total time for all misses = 20 × 100 = 2000 ns
Total memory access time = 800 + 2000 = 2800 ns
H (hit ratio) = no. of hits / total accesses = 80 / 100 = 0.8
Tavg = H × hit time + (1 − H) × miss time
     = 0.8 × 10 + (1 − 0.8) × 100
     = 28 ns
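As a quick check, the worked example above can be reproduced in a short Python sketch (variable names are illustrative):

```python
# Average memory access time for the worked example above:
# 100 references, 80 hits at 10 ns each, 20 misses at 100 ns each.
hits, misses = 80, 20
hit_time_ns, miss_time_ns = 10, 100

total_time = hits * hit_time_ns + misses * miss_time_ns   # 800 + 2000
hit_ratio = hits / (hits + misses)                        # 0.8
t_avg = hit_ratio * hit_time_ns + (1 - hit_ratio) * miss_time_ns

print(total_time)  # 2800 ns
print(t_avg)       # ≈ 28 ns
```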
Practice
Assume that for a certain processor, a read request takes 50 ns on a cache miss and 5 ns on a
cache hit. Suppose while running a program, it was observed that 80% of the processor's read
requests result in a cache hit. The average read access time is ________ ns.
Types of cache access
1. Simultaneous access: Requests to the cache memory and the main memory are generated simultaneously.
• Tavg = H × Tcm + (1 − H) × Tmm
2. Hierarchical access: Only the faster memory is accessed first; that is, the CPU first accesses the cache memory, and only when there is a miss does it access the main memory.
• Tavg = H × Tcm + (1 − H) × (Tcm + Tmm)
       = H × Tcm + Tcm + Tmm − H × Tcm − H × Tmm
  Tavg = Tcm + (1 − H) × Tmm
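The two access models above differ only in the miss term; a minimal sketch (values are the illustrative H = 0.8, Tcm = 10 ns, Tmm = 100 ns):

```python
# The two access models defined above (all times in ns).
def t_avg_simultaneous(h, t_cm, t_mm):
    # Cache and main memory requests are generated in parallel.
    return h * t_cm + (1 - h) * t_mm

def t_avg_hierarchical(h, t_cm, t_mm):
    # Main memory is accessed only after the cache reports a miss.
    return t_cm + (1 - h) * t_mm

print(t_avg_simultaneous(0.8, 10, 100))  # ≈ 28 ns
print(t_avg_hierarchical(0.8, 10, 100))  # ≈ 30 ns
```

Note how hierarchical access always pays the cache access time Tcm, even on a miss.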
Example
In a two-level hierarchy, if the top level has an access time of 10 ns and the bottom level has
an access time of 60 ns, what is the hit rate on the top level required to give an average access
time of 15 ns?
Soln:
We know, for hierarchical access: Tavg = Tcm + (1 − H) × Tmm
15 = 10 + (1 − H) × 60
1 − H = 5 / 60
H ≈ 0.917, i.e., a hit rate of about 91.7% is required.

Note: when a miss brings a whole block from main memory, the miss term uses the block access time instead:
Tavg = Tcm + (1 − H) × Tblock
where,
Tblock = block access time from main memory = block size × Tmm
For example, with block size 16 and Tmm = 50 ns, Tblock = 16 × 50 = 800 ns.
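Rearranging the hierarchical formula gives the required hit rate directly (a sketch, assuming Tavg = Tcm + (1 − H) × Tmm):

```python
# Solving the example above for the hit rate H,
# assuming the hierarchical model Tavg = Tcm + (1 - H) * Tmm.
t_cm, t_mm, t_avg = 10, 60, 15
h = 1 - (t_avg - t_cm) / t_mm
print(h)  # ≈ 0.9167, i.e. about 91.7 %
```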
NOTES
• Main memory is divided into equal-size partitions called blocks or frames.
• Cache memory is divided into partitions of the same size as the blocks, called lines.
• During cache mapping, a block of main memory is simply copied to the cache; the block is not actually removed from the main memory.
[Figure: main memory blocks 0–99 mapped onto a 10-line cache memory]
Cache line number = (Main Memory Block Address) Mod (Number of lines in Cache)
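The mapping rule above is a single modulo operation; a minimal sketch:

```python
# Direct mapping rule from the note above: each main-memory block
# always lands in one particular cache line.
def cache_line(block_address, num_lines):
    return block_address % num_lines

# With 100 main-memory blocks and a 10-line cache (as in the figure):
print(cache_line(3, 10))   # line 3
print(cache_line(98, 10))  # line 8
```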
Cache Mapping: Direct Mapping
[Figure: direct mapping of main memory blocks 0–99 onto a 10-line cache memory; each cache line holds a tag and a block]

CPU request (main memory block no.) | Mapping (cache memory block no.) | Hit/Miss | Comments
Cache Mapping: Direct Mapping
Blocks in main memory = 8 (000 – 111)

[Figure: 8 main memory blocks (000–111) mapped onto a 4-line cache memory (00–11); each line holds a tag]

CPU request (main memory block no.) | Mapping (cache memory block no.) | Tag | Hit/Miss | Comments

Cache line number = (Main Memory Block Address) Mod (Number of lines in Cache)
Cache Mapping: Direct Mapping
Note: A block size of 2 bytes means each block in the main memory has 2 cells (1 byte each); the byte no. for the first cell is 0 and for the second cell is 1. So the number of bits for the byte no. field is 1 (as 2 bytes = 2^1 bytes ⇒ 1 bit). Now, tell me the number of bits for a block size of 32 bytes.
A block size of 32 bytes means each block in the main memory has 32 cells (1 byte each); the byte no. for the 1st cell is 00000 and for the 32nd cell is 11111. So the number of bits for the byte no. field is 5 (as 32 bytes = 2^5 bytes ⇒ 5 bits).
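For any power-of-two block size, the byte no. field width is just the base-2 logarithm; a one-line sketch:

```python
# Number of byte-offset bits for a power-of-two block size,
# as worked out in the note above.
import math

def byte_offset_bits(block_size_bytes):
    return int(math.log2(block_size_bytes))

print(byte_offset_bits(2))   # 1 bit
print(byte_offset_bits(32))  # 5 bits
```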
Cache Mapping: Direct Mapping

Main memory address: [ Block no. | Byte no. ]
Interpreted by the cache as: [ Tag | Cache memory block no. | Byte no. ]

[Figure: 16 one-byte locations (addresses 0000–1111) grouped into 8 blocks (000–111) of 2 bytes each, mapped onto a 4-line cache memory (00–11)]
Cache Mapping: Direct Mapping
CPU request (main memory block no.) | Mapping (cache memory block no.) | Tag | Hit/Miss | Comments
Assume there is a main memory with 16 blocks, where the block size is 4 words, and a 4-line cache memory. Find the length of the addresses and the number of bits in the tag field. Then assign the following reference addresses to the appropriate cache blocks using direct mapping:
CPU requests: W2 W13 W15 W53 W41 W49 W0
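The exercise above can be checked with a minimal direct-mapped simulation (a sketch, assuming word Wi belongs to block i // 4 and block b maps to line b mod 4):

```python
# Minimal direct-mapped cache simulation for the exercise above:
# 16 main-memory blocks, block size 4 words, 4 cache lines.
def simulate(word_requests, block_size=4, num_lines=4):
    lines = [None] * num_lines          # block currently held by each line
    results = []
    for w in word_requests:
        block = w // block_size         # which block the word belongs to
        line = block % num_lines        # direct mapping rule
        if lines[line] == block:
            results.append("hit")
        else:
            results.append("miss")
            lines[line] = block         # replace whatever was there
    return results

print(simulate([2, 13, 15, 53, 41, 49, 0]))
# ['miss', 'miss', 'hit', 'miss', 'miss', 'miss', 'miss']
```

Note that W0 misses at the end even though block 0 was loaded first: W49 (block 12) maps to the same line 0 and evicted it.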
Cache Mapping: Direct Mapping (Valid-Bit, Modified-Bit)
When a computer starts, or the cache has just been erased, the cache contains garbage values. Now suppose the CPU requests some data and the cache has not been initialized, so it still holds garbage. As usual, the tag stored in the cache block and the tag generated by the CPU are compared. What if they happen to be the same? That would be treated as a CACHE HIT on garbage data, and the CPU would go in the wrong direction. That is a problem.

Solution: We use a valid bit for each line and initialize it to 0 when the computer starts. Here, 0 means invalid content and 1 means valid content; an access can hit only if the valid bit is 1. The valid bit is not a part of the memory address.
Cache Mapping: Direct Mapping (Valid-Bit, Modified-Bit)
If the CPU performs a read, there are no issues. But if it performs a write, and suppose the write-back technique is being followed, then the CPU changes the cache content, and that change must later be propagated to main memory; here lies an issue. The problem is how it will be known that the CPU has changed the cache content. To solve this problem, we need another bit called the modified or dirty bit. If the dirty bit is 0, there are no changes in the cache; if it is 1, the cache content has been changed, and main memory must be updated with the new value. The modified bit is also not a part of the memory address.
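The two status bits above can be sketched as fields of a cache line (class and field names are illustrative; write-back is assumed):

```python
# Sketch of one cache line carrying the valid and modified (dirty)
# bits described above, under a write-back policy.
class CacheLine:
    def __init__(self):
        self.valid = 0      # 0 at startup: contents are garbage
        self.dirty = 0      # becomes 1 once the CPU writes to this line
        self.tag = None
        self.data = None

    def read_hit(self, tag):
        # A hit requires both a tag match AND a set valid bit.
        return self.valid == 1 and self.tag == tag

    def write(self, tag, data):
        self.valid, self.dirty = 1, 1   # changed copy must reach memory later
        self.tag, self.data = tag, data

line = CacheLine()
print(line.read_hit(tag=0b10))  # False: valid bit is 0, even on a tag match
line.write(tag=0b10, data=42)
print(line.read_hit(tag=0b10))  # True
print(line.dirty)               # 1: main memory must be updated on eviction
```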
Cache Mapping: Set Associative Mapping

CPU Requests: 1, 5, 1, 5, 1, 5

With direct mapping on a 4-line cache, blocks 1 and 5 both map to the same cache line, so in that case all six accesses are misses: the two blocks keep replacing each other one by one.

Soln: Use set associative mapping.

[Figure: 8 main memory blocks (000–111) and a 4-line cache; blocks 1 and 5 conflict on line 1]
Cache Mapping: 2-Way Set Associative Mapping

CPU Requests: 1, 5, 1, 5, 1, 5

With a 2-way set associative cache (4 lines ⇒ 2 sets, 0 and 1), blocks 1 and 5 both map to set 1 (1 mod 2 = 5 mod 2 = 1), but the set now has two ways, so both blocks can stay in the cache. After the first 1 and 5 are loaded, every later access hits.

CPU Requests: 1 (miss), 5 (miss), 1 (hit), 5 (hit), 1 (hit), 5 (hit)

Cache set number = (Main Memory Block Address) Mod (Number of sets in Cache)
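A short simulation confirms the hit/miss pattern above (a sketch, assuming LRU replacement within each set):

```python
# 2-way set associative sketch for the request stream above.
# With 4 cache lines and 2 ways there are 2 sets; blocks 1 and 5 both
# map to set 1 (1 mod 2 = 5 mod 2 = 1) but can now coexist in its 2 ways.
def simulate(block_requests, num_sets=2, ways=2):
    sets = [[] for _ in range(num_sets)]   # each set: blocks, LRU first
    results = []
    for b in block_requests:
        s = sets[b % num_sets]
        if b in s:
            results.append("hit")
            s.remove(b)                    # refresh LRU position
        else:
            results.append("miss")
            if len(s) == ways:
                s.pop(0)                   # evict least recently used
        s.append(b)
    return results

print(simulate([1, 5, 1, 5, 1, 5]))
# ['miss', 'miss', 'hit', 'hit', 'hit', 'hit']
```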
Cache Mapping: 2-Way Set Associative Mapping

Main memory address: [ Block no. | Byte no. ]
Interpreted by the cache as: [ Tag | Cache memory set no. (set offset) | Byte no. (byte offset) ]
Example
A computer has a 256-Kbyte, 4-way set associative, write-back data cache with a block size of 32 bytes. The processor sends 32-bit addresses to the cache controller. Each cache tag directory entry contains, in addition to the address tag, 2 valid bits, 1 modified bit, and 1 replacement bit. Find the number of bits in the tag field and the size of the cache tag directory.

Soln:
32-bit address: [ Tag (16) | set offset (11) | byte offset (5) ]
Number of bits for byte offset: block size 32 bytes = 2^5 bytes ⇒ 5 bits
Blocks in cache memory = cache size / block size = 256 KB / 32 B = 256 × 1024 B / 32 B = 8192 = 2^13
Number of sets in cache = no. of blocks in cache / associativity = 2^13 / 4 = 2^11
So, number of bits for set offset = 11 bits
Number of bits for Tag = 32 − (5 + 11) = 16 bits
Tag directory size = blocks in cache memory × (number of bits for Tag + extra bits)
= 2^13 × (16 + 2 + 1 + 1) = 163840 bits = 160 Kbits
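The same arithmetic, spelled out as a sketch:

```python
# The arithmetic of the worked example above.
address_bits = 32
cache_size = 256 * 1024        # 256 KB
block_size = 32                # bytes
associativity = 4
extra_bits = 2 + 1 + 1         # 2 valid + 1 modified + 1 replacement

byte_offset = block_size.bit_length() - 1           # 5 bits (32 = 2^5)
blocks_in_cache = cache_size // block_size          # 8192 = 2^13
num_sets = blocks_in_cache // associativity         # 2048 = 2^11
set_offset = num_sets.bit_length() - 1              # 11 bits
tag_bits = address_bits - set_offset - byte_offset  # 16 bits

directory_bits = blocks_in_cache * (tag_bits + extra_bits)
print(tag_bits)        # 16
print(directory_bits)  # 163840 bits = 160 Kbits
```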
Cache Mapping: Fully Associative Mapping
Address formats with increasing associativity:
Direct mapping: [ Tag | Cache memory block no. | Byte no. ]
k-way set associative mapping: [ Tag | set offset | byte offset ]
Fully associative mapping: [ Tag | byte offset ]

Here,
• All the lines of the cache are freely available.
• Thus, any block of main memory can map to any line of the cache.
• If all the cache lines are occupied, one of the existing blocks has to be replaced.
• The tag is compared against every line to decide Hit or Miss; size of the MUX for tag selection = (no. of blocks in cache) : 1
Cache Mapping: Hardware Implementation in 2-Way Set Associative Mapping
[Figure: hardware for a 2-way set associative cache with 4 sets (00–11), receiving main memory blocks 000–111]

No replacement policy is needed in the direct mapping technique, as there is only one candidate block to be replaced if the line is already occupied.
Cache Mapping: Block Replacement in k-Way Set Associative Mapping
Warm-up: MM blocks = 8 (0–7), CM blocks = 4. Let’s assume a 2-way set associative cache.
MM block requests: 1, 3, 1, 5
Consider a 4-way set associative cache (initially empty) with 16 cache blocks in total. The main memory consists of 256 blocks, and the requests for memory blocks arrive in the following order:
0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155
Which one of the following memory blocks will not be in the cache if the LRU replacement policy is used?
a) 3
b) 8
c) 129
d) 216
Cache set number = (Main Memory Block Address) Mod (Number of sets in Cache)
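The question above can be checked with a small LRU simulation (a sketch, assuming set = block mod number-of-sets and true LRU within each set):

```python
# LRU simulation for the question above: 16 cache blocks, 4-way,
# hence 16 / 4 = 4 sets; set = block mod 4.
def lru_sets(requests, num_sets=4, ways=4):
    sets = [[] for _ in range(num_sets)]   # per set: blocks, LRU first
    for b in requests:
        s = sets[b % num_sets]
        if b in s:
            s.remove(b)                    # hit: refresh recency
        elif len(s) == ways:
            s.pop(0)                       # miss on full set: evict LRU
        s.append(b)
    return sets

reqs = [0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155]
cached = {b for s in lru_sets(reqs) for b in s}
print([b for b in (3, 8, 129, 216) if b not in cached])  # [216] -> option d
```

Block 216 maps to set 0, which also receives 0, 4, 8, 48, 32, and 92; by the end, 216 is the LRU block and is evicted.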
Cache Mapping: Block Replacement in Fully Associative Mapping
Consider a fully associative cache with 8 cache blocks (0-7) and the following sequence of
memory block requests:
4, 3, 25, 8, 19, 6, 25, 8, 16, 35, 45, 22, 8, 3, 16, 25, 7
If LRU replacement policy is used, which cache block will have memory block 7?
a) 4
b) 5
c) 6
d) 7
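This one can also be simulated (a sketch, assuming a miss fills the first empty slot and otherwise takes the LRU block's slot):

```python
# Fully associative LRU sketch for the question above: 8 slots (0-7).
def simulate(requests, num_slots=8):
    slots = [None] * num_slots
    recency = []                           # block numbers, LRU first
    for b in requests:
        if b in recency:
            recency.remove(b)              # hit: refresh recency
        elif len(recency) == num_slots:    # cache full: evict LRU block
            victim = recency.pop(0)
            slots[slots.index(victim)] = b
        else:                              # cache not yet full
            slots[slots.index(None)] = b
        recency.append(b)
    return slots

reqs = [4, 3, 25, 8, 19, 6, 25, 8, 16, 35, 45, 22, 8, 3, 16, 25, 7]
slots = simulate(reqs)
print(slots.index(7))  # 5 -> option b
```

Block 6 occupied slot 5 and was never reused, so it is the LRU victim when block 7 arrives.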
Virtual Memory
Virtual memory is a storage scheme that gives the user the illusion of having a very big main memory. This is done by treating main memory as a “cache” for secondary storage.
In this scheme, the user can load processes bigger than the available main memory, under the illusion that enough memory is available to load the process.
Instead of loading one big process into main memory, the operating system loads different parts of more than one process into main memory.
By doing this, the degree of multiprogramming is increased, and therefore the CPU utilization is also increased.
How Virtual Memory Works?
In the modern world, virtual memory has become quite common. In this scheme, a process or program is divided into equally sized chunks called pages, and each page has an address called a virtual address, which is translated by a combination of hardware and software into a physical address in RAM. Main memory is likewise divided into equally sized blocks called frames, and each frame is the same size as a page.
Whenever some pages need to be loaded into main memory for execution and there is not enough memory for all of them, then instead of stopping the pages from entering main memory, the OS searches for the areas of RAM that have been least used recently, or not referenced at all, and copies them out to secondary memory to make space for the new pages.
Since all of this happens automatically, it makes the computer feel as if it has unlimited RAM.
Demand Paging
According to the concept of virtual memory, in order to execute a process, only a part of the process needs to be present in main memory; that is, only a few of its pages will be present in main memory at any time.
However, deciding which pages need to be kept in main memory and which in secondary memory is difficult, because we cannot say in advance which page a process will require at a particular time.
To overcome this problem, the concept of demand paging was introduced. It suggests keeping all pages in secondary memory until they are required. In other words, it says: do not load any page into main memory until it is needed. Whenever a page is referenced for the first time, it will be found in secondary memory and brought into main memory.
What is a Page Fault?
If the referenced page is not present in main memory, there is a miss; this is called a page miss or page fault. The CPU then has to access the missed page from secondary memory.
Page Table
The page table contains the virtual-to-physical address translations in a virtual memory system. The table, which is stored in memory, is typically indexed by the virtual page number; each entry contains the physical page number for that virtual page if the page is currently in memory. The page table is stored in main memory at the time of process creation, and its base address is stored in the process control block. A page table is created for each process separately.
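The lookup described above can be sketched in a few lines (page size, table contents, and function name are illustrative assumptions):

```python
# Minimal sketch of the translation described above. The page table
# maps virtual page numbers to physical frame numbers; a missing frame
# triggers a page fault (the page must be fetched from secondary memory).
PAGE_SIZE = 4096                       # assumed page/frame size in bytes

page_table = {0: 7, 1: 3, 2: None}     # None: page not in main memory

def translate(virtual_address):
    vpn = virtual_address // PAGE_SIZE        # virtual page number
    offset = virtual_address % PAGE_SIZE      # byte offset within the page
    frame = page_table.get(vpn)
    if frame is None:
        raise RuntimeError("page fault: page %d not in memory" % vpn)
    return frame * PAGE_SIZE + offset         # physical address

print(translate(4100))   # page 1, offset 4 -> frame 3 -> 12292
```

In a real system this lookup is done by hardware (the MMU, usually with a TLB cache of recent translations), and a page fault is handled by the operating system.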