
Large and Fast: Exploiting Memory Hierarchy

Shovon Roy
Lecturer, CSE, BUBT
Patterson, D. A., & Hennessy, J. L. (2016). Computer Organization and Design, Revised 4th Edition.
Principle of locality
In computer science, locality of reference, also known as the principle of locality, is the
tendency of a processor to access the same set of memory locations repetitively over a short
period of time. There are two different types of locality:

1. Temporal locality (locality in time)


If an item is referenced, it will tend to be referenced again soon. For example, if you recently brought a book from the bookshelf to your desk to look at, you will probably need to look at it again soon.

2. Spatial locality (locality in space)


If an item is referenced, items whose addresses are close by will tend to be referenced
soon. For example, when you brought out the book on early English computers to write a
term paper on important historical developments in computer hardware, you also noticed
that there was another book shelved next to it about early mechanical computers, so you
also brought back that book and, later on, found something useful in that book. Libraries
put books on the same topic together on the same shelves to increase spatial locality.
Memory hierarchy
We take advantage of the principle of locality by implementing the memory of a computer as a memory hierarchy. A memory hierarchy is a structure that uses multiple levels of memories; as the distance from the processor increases, the size of the memories and the access time both increase.

[Figure: memory hierarchy pyramid — Cache at the top, Main memory below it, HDD at the bottom]
Cache memory
The data or contents of main memory that are used frequently by the CPU are stored in the cache memory so that the processor can access them in a shorter time. Cache memory is a special, very high-speed memory. It is costlier than main memory or disk memory. It holds frequently requested data and instructions so that they are immediately available to the CPU when needed. Whenever the CPU needs to access memory, it first checks the cache memory. If the data is not found in the cache, the CPU then accesses main memory. Cache memory is placed between the CPU and main memory.
Cache Performance
When the processor needs to read or write a location in main memory, it first checks for a
corresponding entry in the cache.
• If the processor finds that the memory location is in the cache, a cache hit has occurred
and data is read from cache.
• If the processor does not find the memory location in the cache, a cache miss has occurred.
For a cache miss, the cache allocates a new entry and copies in data from main memory,
then the request is fulfilled from the contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity called Hit
ratio.
Hit ratio or hit rate = hit / (hit + miss) = no. of hits / total accesses
Miss ratio or miss rate = 1 - Hit ratio

Hit time: The time required to access a level of the memory hierarchy, including the time needed to determine whether
the access is a hit or a miss.
Miss penalty: The time required to fetch a block into a level of the memory hierarchy from the lower level, including the
time to access the block, transmit it from one level to the other, insert it in the level that experienced the miss, and then
pass the block to the requestor.
Example
Suppose the CPU makes 100 memory references. Of these, the number of hits is 80 and the number of misses is 20. Assume that when a hit occurs, the memory access time is 10 ns, and when a miss occurs, the memory access time is 100 ns. Calculate the total memory access time and the average memory access time.

Soln:
Total time for all hits = 80 * 10 = 800 ns
Total time for all misses = 20 * 100 = 2000 ns
-----------------------------------------------------------
Total memory access time = 2800 ns

So, average memory access time , Tavg = 2800 / 100 = 28 ns

H (hit ratio) = no. of hits / total accesses = 80 / 100 = 0.8

Tavg = H * hit time + (1 - H) * miss time
     = 0.8 * 10 + (1 - 0.8) * 100
     = 28 ns
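To make the arithmetic concrete, here is a minimal Python sketch (not from the slides; variable names are illustrative) that reproduces the numbers above both ways:

```python
# Average memory access time from hit/miss counts (illustrative sketch).
hits, misses = 80, 20
hit_time_ns, miss_time_ns = 10, 100

total_time = hits * hit_time_ns + misses * miss_time_ns  # 800 + 2000 = 2800 ns
accesses = hits + misses                                 # 100
t_avg = total_time / accesses                            # 28.0 ns

# Equivalent weighted form: Tavg = H * hit time + (1 - H) * miss time
H = hits / accesses                                      # 0.8
t_avg_weighted = H * hit_time_ns + (1 - H) * miss_time_ns

print(total_time, t_avg, round(t_avg_weighted, 6))       # 2800 28.0 28.0
```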
Practice
Assume that for a certain processor, a read request takes 50 ns on a cache miss and 5 ns on a
cache hit. Suppose while running a program, it was observed that 80% of the processor's read
requests result in a cache hit. The average read access time is ________ ns?
Types of cache access
1. Simultaneous access: Requests to the cache memory and to main memory are issued at the same time.
• Tavg = H * Tcm + (1 - H) * Tmm

2. Hierarchical access: Only the faster memory is accessed first; the CPU first accesses the cache memory, and only on a miss does it access main memory.
• Tavg = H * Tcm + (1 - H) * (Tcm + Tmm)
       = H * Tcm + Tcm + Tmm - H * Tcm - H * Tmm
  Tavg = Tcm + (1 - H) * Tmm
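A small sketch contrasting the two formulas, assuming illustrative values for H, Tcm, and Tmm (they are not given on this slide):

```python
# Simultaneous vs. hierarchical access time (illustrative values).
H, Tcm, Tmm = 0.8, 10, 100  # hit ratio, cache time (ns), main memory time (ns)

t_simultaneous = H * Tcm + (1 - H) * Tmm  # a miss costs only Tmm
t_hierarchical = Tcm + (1 - H) * Tmm      # a miss costs Tcm + Tmm

print(round(t_simultaneous, 2))  # 28.0 ns
print(round(t_hierarchical, 2))  # 30.0 ns
```

Hierarchical access always pays the cache probe time Tcm, which is why its average is slightly higher for the same H.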
Example
In a two-level hierarchy, if the top level has an access time of 10 ns and the bottom level has
an access time of 60 ns, what is the hit rate on the top level required to give an average access
time of 15 ns?

Soln:

Tavg = Tcm + (1 - H) * Tmm


15 = 10 + (1 - H) * 60
5 = 60 - 60H
so, H = 55/60 ≈ 0.917
Tavg when locality of reference is considered
1. Simultaneous access: Tavg = H * Tcm + (1 - H) * Tblock

2. Hierarchical access: Tavg = Tcm + (1 - H) * Tblock

where,
Tblock = block access time from main memory
= block size * Tmm

Now suppose the block transfer time is also included:


Tavg when locality of reference and block transfer both are included

1. Simultaneous access: Tavg = H * Tcm + (1 - H) * (Tblock + Tcm)

2. Hierarchical access: Tavg = Tcm + (1 - H) * (Tblock + Tcm)


Example
In a two-level hierarchy, the top level has an access time of 10 ns and the bottom level has an access time of 50 ns; the hit rate on the top level is 90%. If the block size of the cache is 16 bytes, find the average memory access time.
(Note: Consider that the system uses locality of reference.)

Soln:
We know,
Hierarchical access: Tavg = Tcm + (1 - H) * Tblock
where,
Tblock = block access time from main memory
= block size * Tmm
Now, Tblock = block size * Tmm = 16 * 50 = 800 ns

So, Tavg = Tcm + (1 - H) * Tblock


= 10 + (1 - 0.9) * 800
= 90 ns
Example
In a two-level hierarchy, the cache has an access time of 12 ns and the main memory has an access time of 120 ns; the hit rate on the cache is 90%. If the block size of the cache is 16 bytes, find the average memory access time including the miss penalty.
(Miss penalty: the time to bring a block from main memory into the cache on a cache miss.)

Soln:

Hierarchical access: Tavg = Tcm + (1 - H) * (Tblock + Tcm)


where,
Tblock = block access time from main memory
= block size * Tmm
Now, Tblock = block size * Tmm = 16 * 120 = 1920 ns

So, Tavg = Tcm + (1 - H) * (Tblock + Tcm)


= 12 + (1 - 0.9) * (1920 + 12)
= 205.2 ns
Cache write
Cache write problem (write propagation problem):
When the CPU updates the content of any block in the cache, the original content in main memory needs to be updated as well.
There are two types of cache write policy:

1. Write-through
In write-through, data is updated in the cache and in main memory simultaneously. This process is simpler and more reliable but time consuming.
2. Write-back
The data is updated only in the cache and written to main memory at a later time. Data is updated in memory only when the cache line is about to be replaced.
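A minimal sketch of the two policies (the CacheLine class and functions are hypothetical, not from the slides): write-through pushes every write to memory immediately, while write-back only marks the line dirty and flushes it at replacement time:

```python
# Write-through vs. write-back (illustrative sketch; real caches track whole blocks).
class CacheLine:
    def __init__(self):
        self.data = None
        self.dirty = False

def write(line, memory, addr, value, write_through=True):
    line.data = value
    if write_through:
        memory[addr] = value   # update main memory immediately
    else:
        line.dirty = True      # write-back: defer the memory update

def evict(line, memory, addr):
    if line.dirty:             # write-back: flush only if the line was modified
        memory[addr] = line.data
        line.dirty = False
```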
Cache Mapping
Cache mapping defines how a block from the main memory is mapped to the cache memory
in case of a cache miss.
OR
Cache mapping is a technique by which the contents of main memory are brought into the
cache memory.

NOTES
• Main memory is divided into equal-size partitions called blocks or frames.
• Cache memory is divided into partitions of the same size as the blocks, called lines.
• During cache mapping, a block of main memory is simply copied into the cache; the block is not actually removed from main memory.

Cache mapping is performed using the following three techniques:


1. Direct Mapping
2. Fully Associative Mapping
3. K-way Set Associative Mapping
Cache Mapping

The CPU always generates a main memory address (not a cache memory address). So how does the CPU search for content in the cache memory?
A special technique is followed to keep content from main memory in the cache memory, and the same technique is used to retrieve from the cache memory the contents required by the CPU.
Cache Mapping: Direct Mapping

Before going into depth, let's see an example.
Suppose,
Blocks in cache = 10 (0 - 9)
Blocks in main memory = 100 (00 - 99)

[Figure: main memory blocks 00-99 mapped onto cache blocks 0-9]

Cache line number = (Main Memory Block Address) Mod (Number of lines in Cache)
Cache Mapping: Direct Mapping
CPU request (MM block no.) | Mapping (CM block no.) | Hit/Miss | Comments
Block no. 63 | 63 % 10 = 3 | Miss | Bring MM block no. 63 into cache at block no. 3
Block no. 93 (contents of block no. 63 currently present) | 93 % 10 = 3 | Miss | Bring MM block no. 93 into cache at block no. 3 by replacing block no. 63
Cache Mapping: Direct Mapping

The contents of any memory block (03, 13, 23, 33, ..., 93) may be present in the cache at block 3 right now. The problem is how the CPU identifies which main memory block the current content came from.
We need extra information to solve this problem: the mod value is the same for all of these blocks, so we keep a unique value as extra information to identify the current content accurately. This value is called the tag.
Cache Mapping: Direct Mapping

The main memory block no. is split into: Tag | Cache memory block no.

[Figure: main memory blocks 00-99 mapped onto cache blocks 0-9; each cache block now stores a tag identifying which main memory block it currently holds]
Cache Mapping: Direct Mapping
CPU request (MM block no.) | Mapping (CM block no.) | Tag | Hit/Miss | Comments
Block no. 63 | 63 % 10 = 3 | 6 | Miss | Bring MM block no. 63 into cache at block no. 3
Block no. 93 (contents of block no. 63 currently present) | 93 % 10 = 3 | 9 | Miss | Bring MM block no. 93 into cache at block no. 3 by replacing block no. 63 and update the tag to 9
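The mapping rule and the tag are two halves of the same division; a one-line sketch (Python, decimal block numbers as in this example):

```python
# Direct mapping: cache line = remainder, tag = quotient.
def direct_map(block_no, cache_lines=10):
    return block_no % cache_lines, block_no // cache_lines  # (line, tag)

print(direct_map(63))  # (3, 6): block 63 -> cache line 3, tag 6
print(direct_map(93))  # (3, 9): block 93 -> same line 3, tag 9
```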
Cache Mapping: Direct Mapping

Let's see another example.
Suppose,
Blocks in cache = 4 (00 - 11)
Blocks in main memory = 8 (000 - 111)

[Figure: main memory blocks 000-111 mapped onto cache blocks 00-11; each cache line stores a tag]

Cache line number = (Main Memory Block Address) Mod (Number of lines in Cache)
Cache Mapping: Direct Mapping
CPU request (MM block no.) | Mapping (CM block no.) | Tag | Hit/Miss | Comments
Req1: Block no. 101 | 01 | 1 | Miss | Bring MM block no. 101 into cache at block no. 01 with tag 1
Req2: Block no. 001 (contents of block no. 101 currently present) | 01 | 0 | Miss | Bring MM block no. 001 into cache at block no. 01 by replacing block no. 101 and update the tag to 0

Both CPU requests map to the same CM block 01, so the cache compares the stored tag (1) with the requested tag (0). This time they are not the same, so there is a miss in the cache memory.
Cache Mapping: Direct Mapping

In the last example we have,
Blocks in cache = 4 (00 - 11)
Blocks in main memory = 8 (000 - 111)
Block size = 2 Bytes

Blocks in cache memory = Cache size / Block size
Size of cache memory = 4 * 2 = 8 Bytes
Size of main memory = 8 * 2 = 16 Bytes = 2^4 Bytes
No. of bits for main memory address = 4 bits

Note: Block size 2 bytes means each block in the main memory has 2 cells (1 byte each); the byte no. for the first cell is 0 and for the second cell is 1. So the byte no. field needs 1 bit (2 bytes = 2^1 bytes => 1 bit). Now, how many bits are needed for a block size of 32 bytes?
Block size 32 bytes means each block in the main memory has 32 cells (1 byte each); the byte no. for the 1st cell is 00000 and for the 32nd cell is 11111. So the byte no. field needs 5 bits (32 bytes = 2^5 bytes => 5 bits).
Cache Mapping: Direct Mapping

Main memory address:
Tag | Cache memory block no. | Byte no.

[Figure: 4-bit main memory addresses 0000-1111 grouped into blocks 000-111 (2 bytes per block), mapped onto cache blocks 00-11, each cache line holding a 1-bit tag]
Cache Mapping: Direct Mapping
CPU request | Mapping (CM block no.) | Tag | Hit/Miss | Comments
Req1: Address 1011 (block no. 101, byte no. 1) | 01 | 1 | Miss | Bring MM block no. 101 into cache at block no. 01 with tag 1
Req2: Address 1010 (block no. 101, byte no. 0) | 01 | 1 | Hit | Send byte 0 of this block to the CPU

Both CPU requests map to the same CM block 01, so the cache compares the stored tag (1) with the requested tag (1). This time they are the same, so there is a hit in the cache memory.
Cache Mapping: Direct Mapping
We have seen that the main memory address is divided as:
Tag | Cache memory block no. (cache line no.) | Byte no. (byte offset)

Number of bits in byte offset = log2(block size) bits
Number of bits in CM block no. = log2(no. of blocks in cache) bits
Number of bits in Tag = remaining bits
Tag directory size = No. of blocks in CM * No. of bits for a Tag

Example: 32-bit address, cache size = 64KB, block size = 16B
1. Byte offset = log2(16) = 4 bits
2. No. of CM blocks = Cache size / Block size = 64KB / 16B = (64*1024)B / 16B = 4096 blocks
   So, number of bits in CM block no. = log2(4096) = 12 bits
3. Bits in Tag = 32 - (4 + 12) = 16 bits
4. Tag directory size = 4096 * 16 = 65,536 bits = 8192 Bytes = 8KB

Address split (32 bits): Tag = 16 bits | CM block no. = 12 bits | Byte offset = 4 bits
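A sketch reproducing the bit-field arithmetic above (sizes assumed to be powers of two, as in the example):

```python
from math import log2

# Direct-mapped address breakdown for the 64KB cache / 16B block example.
addr_bits = 32
cache_size = 64 * 1024                             # 64 KB
block_size = 16                                    # bytes

byte_offset = int(log2(block_size))                # 4 bits
cm_blocks = cache_size // block_size               # 4096 blocks
index_bits = int(log2(cm_blocks))                  # 12 bits
tag_bits = addr_bits - (index_bits + byte_offset)  # 16 bits

tag_dir_bits = cm_blocks * tag_bits                # 65,536 bits
print(byte_offset, index_bits, tag_bits, tag_dir_bits // 8 // 1024, "KB")  # 4 12 16 8 KB
```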
Cache Mapping: Direct Mapping (Practice)

Assume there is a main memory with 16 blocks where the block size is 4 words, and a 4-line cache memory. Find the length of the addresses and the number of bits for the tag field. Assign the following referenced addresses to the appropriate cache blocks using direct mapping:
CPU requests: W2 W13 W15 W53 W41 W49 W0
Cache Mapping: Direct Mapping (Valid-Bit, Modified-Bit)

When the computer is turned on, there is nothing in the cache memory (as the cache is a volatile memory). But "nothing in the cache" doesn't mean that the cache is totally empty or erased; it contains leftover garbage values.

Now assume the CPU requests some data from the cache, and the cache has not been initialized, so it contains garbage. As usual, the tag in the cache block and the tag generated by the CPU are compared. What if they happen to be the same (CACHE HIT)? The CPU would then proceed with wrong data. That is a problem.

Solution: We use a valid bit and initialize it to 0 when the computer starts. Here, 0 means invalid content and 1 means valid content. The valid bit is not a part of the memory address.

[Figure: cache blocks 00-11, each with a valid bit initialized to 0]
Cache Mapping: Direct Mapping (Valid-Bit, Modified-Bit)

The CPU can access cache memory for a read or a write operation. For a read operation, newly requested content can simply be brought in by replacing the current content, with no issues. But for a write operation, suppose the write-back technique is being followed: the CPU changes the cache content, which will eventually need to be updated in main memory, and here lies an issue. The problem is how it will be known that the CPU has changed the cache content.

To solve this problem, we need another bit called the modified (or dirty) bit. If the dirty bit is 0, there are no changes in the cache block; if it is 1, the cache content has been changed, and main memory must be updated with the new value. The modified bit is also not a part of the memory address.

[Figure: cache blocks 00-11, each with a valid bit and a modified bit]
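A sketch of a lookup that honors both bits (the Line class is hypothetical): a hit requires valid == 1 in addition to a tag match, and a write sets the dirty bit as in write-back:

```python
# Cache line with valid and dirty bits (illustrative sketch).
class Line:
    def __init__(self):
        self.valid = 0     # 0 at power-on: contents are garbage
        self.dirty = 0     # 1 once the CPU has modified the cached block
        self.tag = None

def is_hit(line, tag):
    # A garbage tag that happens to match must not count: check valid first.
    return line.valid == 1 and line.tag == tag

def write(line, tag):
    if not is_hit(line, tag):
        line.valid, line.tag = 1, tag  # allocate on miss (block fetch omitted)
    line.dirty = 1                     # main memory is now stale for this block
```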
Cache Mapping: Set Associative Mapping

CPU Requests: 1, 5, 1, 5, 1, 5
With direct mapping, all of these requests miss in the cache, because main memory blocks 1 and 5 map to the same cache block and keep replacing each other one by one.
Soln: Use set associative mapping.

[Figure: direct-mapped cache with 4 blocks; MM blocks 1 and 5 both map to cache block 01 and repeatedly evict each other]
Cache Mapping: 2-Way Set Associative Mapping

CPU Requests: 1, 5, 1, 5, 1, 5
In this case, the first requests for blocks 1 and 5 miss in the cache, but all of the later requests hit, because the two blocks can reside in the same set at the same time.

CPU Requests: 1(miss), 5(miss), 1(hit), 5(hit), 1(hit), 5(hit)

Cache set number = (Main Memory Block Address) Mod (Number of sets in Cache)

[Figure: 2-way set associative cache with sets 0 and 1; MM blocks 1 and 5 both map to set 1 and occupy its two ways]
Cache Mapping: 2-Way Set Associative Mapping

The main memory address
Block no. | Byte no.
is now interpreted as
Tag | Cache memory set no. (set offset) | Byte no. (byte offset)

[Figure: the same 2-way set associative example, showing how the block number splits into tag and set offset]
Example
A computer has a 256 Kbyte, 4-way set associative, write-back data cache with a block size of 32 bytes. The processor sends 32-bit addresses to the cache controller. Each cache tag directory entry contains, in addition to the address tag, 2 valid bits, 1 modified bit, and 1 replacement bit. Find the number of bits in the tag field and the size of the cache tag directory.

Soln:
Address split (32 bits): Tag = 16 | Set offset = 11 | Byte offset = 5

Number of bits for byte offset: block size 32 bytes = 2^5 bytes => 5 bits
Blocks in cache memory = cache size / block size = 256KB / 32B = (256*1024)B / 32B = 8192 = 2^13
Number of sets in cache = No. of blocks in cache / associativity = 2^13 / 4 = 2^11
So, number of bits for set offset = 11 bits
Number of bits for Tag = 32 - (5 + 11) = 16 bits
Tag directory size = Blocks in cache memory * (Number of bits for Tag + extra bits)
= 2^13 * (16 + 2 + 1 + 1) = 163840 bits = 160 Kbits
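The same arithmetic as a sketch; the extra term (2 + 1 + 1) is the valid, modified, and replacement bits stated in the problem:

```python
from math import log2

# 4-way set associative tag directory (256 KB cache, 32 B blocks, 32-bit address).
cache_size, block_size, ways, addr_bits = 256 * 1024, 32, 4, 32

byte_offset = int(log2(block_size))            # 5 bits
blocks = cache_size // block_size              # 2^13 = 8192
sets = blocks // ways                          # 2^11 = 2048
set_bits = int(log2(sets))                     # 11 bits
tag_bits = addr_bits - set_bits - byte_offset  # 16 bits

extra = 2 + 1 + 1                              # valid + modified + replacement bits
tag_dir = blocks * (tag_bits + extra)          # 163,840 bits
print(tag_bits, tag_dir, tag_dir // 1024, "Kbits")  # 16 163840 160 Kbits
```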
Cache Mapping: Fully Associative Mapping
Address formats with increasing associativity:

Direct mapping: Tag | CM block no. | Byte offset
k-way set associative mapping: Tag | Set offset | Byte offset
Fully associative mapping: Tag | Byte offset

[Figure: the same cache drawn as direct mapped, 2-way set associative, and fully associative — associativity increases left to right]
Cache Mapping: Fully Associative Mapping

Here,
• All the lines of the cache are freely available.
• Thus, any block of main memory can map to any line of the cache.
• If all the cache lines are occupied, then one of the existing blocks has to be replaced.

Need for a Replacement Algorithm:

In fully associative mapping,

• A replacement algorithm is required.
• The replacement algorithm suggests which block to replace when all the cache lines are occupied.
• Thus, replacement algorithms like FIFO, LRU, etc. are employed.
Cache Mapping: Hardware Implementation in Direct Mapping
Blocks in cache = 4 = 2^2
Block size = 16 bytes = 2^4
Main memory address = 8 bits, split as Tag (2 bits) | CM block no. (2 bits) | Byte offset (4 bits)

[Figure: the stored tags of the 4 cache blocks (00-11) feed a 4:1 MUX selected by the CM block no.; the MUX output is compared with the address tag to produce Hit or Miss]

• Size of MUX for Tag selection = No. of blocks in cache : 1
Cache Mapping: Hardware Implementation in 2-Way Set Associative Mapping

[Figure: 4 sets (00-11), each with 2 ways; the stored tags of each way feed a MUX selected by the set number, and each way's selected tag is compared with the address tag]

No. of blocks in cache = 8
2-way set associative
No. of sets = 8/2 = 4
• Size of MUX for Tag selection = No. of sets in cache : 1
• No. of MUXes of that size for Tag selection = K (one per way)
Cache Mapping: Hardware Implementation in Fully Associative Mapping

• No. of comparators = No. of blocks in cache (every stored tag is compared with the address tag in parallel)


Cache Mapping: Block Replacement in Direct Mapping

MM blocks = 256 (0 - 255)
CM blocks = 8 (0 - 7)
MM block requests: 2, 3, 4, 8, 0, 12, 18, 24, 6, 13, 11, 9

No replacement policy is needed in the direct mapping technique, because each main memory block maps to exactly one cache block; if that block is already occupied, its content is simply replaced.

[Figure: cache blocks 000-111]
Cache Mapping: Block Replacement in k-Way Set Associative Mapping

MM blocks = 8 (0 - 7)
CM blocks = 4
Let's assume a 2-way set associative cache.
MM block requests: 1, 3, 1, 5

A replacement policy is used when the set is full.

FIFO (First In First Out): block 1 is replaced by block 5 (block 1 entered the set first).

LRU (Least Recently Used): block 3 is replaced by block 5 (block 1 was re-referenced more recently than block 3).
Cache Mapping: Block Replacement in k-Way Set Associative Mapping

Consider a 4-way set associative cache (initially empty) with 16 cache blocks in total. The main memory consists of 256 blocks, and the requests for memory blocks arrive in the following order:
0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155
Which one of the following memory blocks will not be in the cache if the LRU replacement policy is used?

a) 3
b) 8
c) 129
d) 216

[Figure: 4 sets (00 - 11), 4 ways each]

Cache set number = (Main Memory Block Address) Mod (Number of sets in Cache)
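A hedged sketch of an LRU simulator you can use to check this exercise (and, with num_sets=1, the fully associative practice on the next slide); it is illustrative, not from the slides:

```python
# LRU k-way set associative cache simulator (illustrative sketch).
def simulate_lru(requests, num_sets, ways):
    sets = [[] for _ in range(num_sets)]  # each set ordered LRU -> MRU
    for block in requests:
        s = sets[block % num_sets]        # set number = block mod number of sets
        if block in s:
            s.remove(block)               # hit: move to the MRU position
        elif len(s) == ways:
            s.pop(0)                      # miss with full set: evict the LRU block
        s.append(block)
    return sets

reqs = [0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155]
print(simulate_lru(reqs, num_sets=4, ways=4))
# A fully associative cache is just one set spanning all the blocks:
# simulate_lru(reqs, num_sets=1, ways=8)
```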
Cache Mapping: Block Replacement in Fully Associative Mapping

Consider a fully associative cache with 8 cache blocks (0-7) and the following sequence of memory block requests:
4, 3, 25, 8, 19, 6, 25, 8, 16, 35, 45, 22, 8, 3, 16, 25, 7
If the LRU replacement policy is used, which cache block will hold memory block 7?

a) 4
b) 5
c) 6
d) 7

[Figure: cache blocks 0-7]
Virtual Memory
Virtual memory is a storage scheme that gives the user the illusion of having a very big main memory. This is done by treating main memory as a "cache" for secondary storage, so that secondary memory appears to extend the main memory.

In this scheme, the user can load processes bigger than the available main memory, under the illusion that enough memory is available to load them. Instead of loading one big process entirely into main memory, the operating system loads different parts of more than one process into main memory. By doing this, the degree of multiprogramming is increased, and therefore the CPU utilization is also increased.
How Virtual Memory Works?
In the modern world, virtual memory has become quite common. In this scheme, a process (program) is divided into equally sized chunks called pages, and each page has a virtual address that is translated by a combination of hardware and software into a physical address (RAM). Main memory is likewise divided into equally sized blocks called frames, where each frame is the same size as a page.

Whenever some pages need to be loaded into main memory for execution and there is not enough memory for all of them, then, instead of refusing to bring the pages in, the OS searches for areas of RAM that have been least used recently or are not being referenced, copies them out to secondary memory, and thereby makes space for the new pages in main memory.

Since this whole procedure happens automatically, the computer feels as if it has unlimited RAM.
Demand Paging
According to the concept of virtual memory, in order to execute a process, only a part of the process needs to be present in main memory; that is, only a few pages will be present in main memory at any time.

However, deciding which pages need to be kept in main memory and which in secondary memory is difficult, because we cannot say in advance which page a process will require at a particular time.

To overcome this problem, the concept of demand paging is introduced. It suggests keeping all pages in secondary memory until they are required; in other words, do not load any page into main memory until it is required.

Whenever a page is referred to for the first time, it will have to be fetched from secondary memory.
What is a Page Fault?

If the referenced page is not present in main memory, there is a miss, known as a page miss or page fault. The CPU then has to fetch the missed page from secondary memory.

Page Table

The page table contains the virtual-to-physical address translations in a virtual memory system. The table, which is stored in memory, is typically indexed by the virtual page number; each entry contains the physical page number for that virtual page if the page is currently in memory. The page table is stored in main memory at the time of process creation, and its base address is stored in the process control block. A page table is created for each process separately.
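A minimal sketch of the lookup this describes (the table contents are hypothetical; the valid flag models "currently in memory", and an invalid entry raises a page fault):

```python
# Page table lookup: virtual address -> physical address (illustrative sketch).
PAGE_SIZE = 4096  # assumed page/frame size in bytes

# page_table[virtual page no.] = (valid, physical frame no.)
page_table = {0: (True, 5), 1: (False, None), 2: (True, 1)}

def translate(virtual_addr):
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)  # split off the page offset
    valid, pfn = page_table.get(vpn, (False, None))
    if not valid:
        raise RuntimeError(f"page fault on virtual page {vpn}")  # fetch from disk
    return pfn * PAGE_SIZE + offset                # same offset within the frame

print(hex(translate(0x2010)))  # vpn 2 -> frame 1 -> 0x1010
```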

[Figure: address translation using a page table — the page table register indicates the starting address of the active process's page table]
