
Virtual Memory

Caches provide fast access to recently used portions of a program's code and data. In the same way, main memory can act as a cache for secondary storage, which is usually implemented with magnetic disk. This technique is called virtual memory. Virtual memory:
- Allows safe and efficient sharing of memory among programs
- Removes the programming burden of a small, limited amount of main memory

Virtual Memory : Motivation


Handling dynamic interactions of programs:
- Each program is compiled into its own (virtual) address space; virtual memory translates this address space into the physical address space.
- Virtual memory enforces protection of each program's address space from other programs.
- Virtual memory allows a single user program to exceed the size of primary memory.

Virtual Memory : Terminology


Page: a virtual memory block

Page fault: a virtual memory miss

Address translation: a virtual address is translated into a physical address, which in turn can be used to access main memory.

Virtual Memory : Address translation

Handling page faults


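To overcome the high cost of page faults:
- Pages should be large enough to amortize the long disk access time.
- Organizations that reduce the page fault rate are attractive (e.g., fully associative placement of pages in memory).
- Page faults can be handled in software, because the software overhead is small compared with the disk access time.
- Which write policy should be adopted?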

Searching a page
Using a page table:
- The page table is used to locate pages in memory. It is a structure, residing in main memory, that is indexed by the virtual page number and gives the physical page number of each page.
- Each program has its own page table.
- Page table register: points to the start of the page table in memory.
- State of the program: the page table, program counter, and registers together identify the state of a program.
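A minimal C sketch of the lookup described above, assuming a single-level page table, 4 KB pages (as in the example that follows), and a hypothetical entry layout `PageTableEntry` holding only a valid bit and a physical page number; real entries also carry dirty, reference, and protection bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page table entry: a valid bit and a physical page number. */
typedef struct {
    bool     valid;
    uint32_t ppn;
} PageTableEntry;

enum { PAGE_OFFSET_BITS = 12 };   /* assumes 4 KB pages */

/* Translate a virtual address with a single-level page table.
   page_table is the table the page table register points to; it must
   have an entry for every VPN. Returns false on a page fault. */
bool pt_translate(const PageTableEntry *page_table, uint32_t va, uint32_t *pa)
{
    uint32_t vpn    = va >> PAGE_OFFSET_BITS;               /* index into the table */
    uint32_t offset = va & ((1u << PAGE_OFFSET_BITS) - 1);  /* copied unchanged */
    PageTableEntry pte = page_table[vpn];
    if (!pte.valid)
        return false;                                       /* page fault: the OS must fetch the page */
    *pa = (pte.ppn << PAGE_OFFSET_BITS) | offset;
    return true;
}
```

The page offset passes through untouched; only the page number is looked up, which is why the table needs one entry per virtual page.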

Searching a page

Example: Compute the total page table size for a system with a 32-bit virtual address, 4 KB pages, and 4 bytes per page table entry.
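A worked answer, assuming a single-level page table with one entry per virtual page:

```latex
\begin{aligned}
\text{number of entries} &= \frac{2^{32}\ \text{bytes of virtual address space}}{2^{12}\ \text{bytes per page}} = 2^{20} \\
\text{page table size} &= 2^{20}\ \text{entries} \times 4\ \text{bytes per entry} = 2^{22}\ \text{bytes} = 4\ \text{MB}
\end{aligned}
```

Since each program has its own page table, this 4 MB is needed for every program that is running.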

Handling Writes in Virtual Memory system


Is write-through practical? No: a write to disk takes millions of clock cycles, far too long to pay on every store. Use write-back (copy-back) instead:
- Perform individual writes into the page in memory.
- Copy the page back to disk when it is replaced in memory.
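A sketch of write-back at page granularity in C. `Frame`, `frame_write`, `frame_evict`, and the 4 KB `FRAME_BYTES` are hypothetical names chosen for illustration; the `dirty` flag plays the role of the page's dirty bit, so a page that was never written is not copied back at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { FRAME_BYTES = 4096 };   /* assumed page size */

/* Hypothetical descriptor for one page resident in main memory. */
typedef struct {
    bool          dirty;               /* set by any write to the page */
    unsigned char data[FRAME_BYTES];   /* the page's contents in memory */
} Frame;

/* Write-back: a store updates only the in-memory copy and marks it dirty. */
void frame_write(Frame *f, size_t offset, unsigned char value)
{
    f->data[offset] = value;
    f->dirty = true;
}

/* On replacement, copy the page back to disk only if it was modified.
   disk_page stands in for the page's home location on disk.
   Returns true if a disk write was performed. */
bool frame_evict(Frame *f, unsigned char disk_page[FRAME_BYTES])
{
    if (!f->dirty)
        return false;                  /* clean: the disk copy is already up to date */
    memcpy(disk_page, f->data, FRAME_BYTES);
    f->dirty = false;
    return true;
}
```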

Making address translation fast: A TLB


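- The page table is stored in main memory.
- Each memory access by a program therefore takes two memory accesses: one to read the page table entry and one to access the data itself.
- There has to be some way to reduce this time: rely on locality of reference to the page table.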

A TLB (Translation Look-aside Buffer) is a special cache that keeps track of recently used translations. It avoids accessing the page table again for a recently accessed page.
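The check-the-TLB-first flow can be sketched in C as follows. The 16-entry fully associative TLB, the round-robin replacement, the 4 KB page size, and all type and function names are assumptions made for the sketch; on a TLB hit the page table in memory is never read.

```c
#include <stdbool.h>
#include <stdint.h>

enum { TLB_ENTRIES = 16, OFFSET_BITS = 12 };   /* assumed sizes */

typedef struct { bool valid; uint32_t vpn, ppn; } TlbEntry;
typedef struct { bool valid; uint32_t ppn; } Pte;

typedef struct {
    TlbEntry entries[TLB_ENTRIES];
    unsigned next_victim;   /* round-robin replacement (an assumption) */
} Tlb;

/* Translate va: search the TLB first; on a TLB miss read the page table
   (the extra memory access) and cache the translation in the TLB.
   Returns false on a page fault. */
bool translate(Tlb *tlb, const Pte *page_table, uint32_t va,
               uint32_t *pa, bool *tlb_hit)
{
    uint32_t vpn = va >> OFFSET_BITS;
    uint32_t off = va & ((1u << OFFSET_BITS) - 1);
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb->entries[i].valid && tlb->entries[i].vpn == vpn) {
            *tlb_hit = true;
            *pa = (tlb->entries[i].ppn << OFFSET_BITS) | off;
            return true;
        }
    }
    *tlb_hit = false;
    Pte pte = page_table[vpn];          /* the access a TLB hit avoids */
    if (!pte.valid)
        return false;                   /* page fault */
    tlb->entries[tlb->next_victim] = (TlbEntry){ true, vpn, pte.ppn };
    tlb->next_victim = (tlb->next_victim + 1) % TLB_ENTRIES;
    *pa = (pte.ppn << OFFSET_BITS) | off;
    return true;
}
```

A second access to the same page then hits in the TLB, which is exactly the locality the TLB relies on.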

TLB (cont)

Handling a memory reference (access):
- Turn on the reference bit each time the page is accessed.
- Turn on the dirty bit each time a write is performed.
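A small C sketch of these two rules; the `TlbEntry` layout and `tlb_record_access` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical TLB entry with its status bits. */
typedef struct {
    bool     valid;
    bool     ref;     /* reference (use) bit */
    bool     dirty;   /* set once the page has been written */
    unsigned vpn, ppn;
} TlbEntry;

/* Record an access that hit this TLB entry. */
void tlb_record_access(TlbEntry *e, bool is_write)
{
    e->ref = true;          /* every access sets the reference bit */
    if (is_write)
        e->dirty = true;    /* only writes set the dirty bit */
}
```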

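Handling a TLB miss
A TLB miss means either that the page is in memory but its translation is not in the TLB, or that the page is not in memory at all (a page fault).
For a TLB miss without a page fault:
- Bring the translation from the page table into the TLB.
- Which entry should be replaced? How should it be replaced?
- When an entry is replaced, copy its dirty and reference bits back to the page table, since these are the only fields of a TLB entry that can change.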
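A sketch of the refill step in C, assuming the replacement policy has already chosen the `victim` slot (the policy itself is left open, as in the questions above). `Pte`, `TlbEntry`, and `tlb_refill` are hypothetical names.

```c
#include <stdbool.h>

typedef struct { bool valid, ref, dirty; unsigned ppn; } Pte;
typedef struct { bool valid, ref, dirty; unsigned vpn, ppn; } TlbEntry;

/* Handle a TLB miss for vpn by overwriting the chosen victim slot.
   Returns false if the page table entry is invalid, i.e. the miss is
   really a page fault and the OS must bring the page in first. */
bool tlb_refill(TlbEntry *victim, Pte *page_table, unsigned vpn)
{
    if (!page_table[vpn].valid)
        return false;                          /* page fault */
    if (victim->valid) {                       /* write status bits back before overwriting */
        Pte *old = &page_table[victim->vpn];
        old->ref   = old->ref   || victim->ref;
        old->dirty = old->dirty || victim->dirty;
    }
    *victim = (TlbEntry){ .valid = true, .vpn = vpn, .ppn = page_table[vpn].ppn,
                          .ref = page_table[vpn].ref, .dirty = page_table[vpn].dirty };
    return true;
}
```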

TLB (cont)
Some typical values for a TLB:
- TLB size: 16 to 512 entries
- Block size: 1 to 2 page table entries
- Hit time: 0.5 to 1 clock cycle
- Miss penalty: 10 to 100 clock cycles
- Miss rate: 0.01% to 1%
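These values combine in the usual average-access-time formula: average translation time = hit time + miss rate × miss penalty. A one-line C helper (its name is hypothetical) makes the arithmetic explicit; with the worst values above (1-cycle hit, 1% miss rate, 100-cycle penalty) the average is 1 + 0.01 × 100 = 2 cycles.

```c
/* Average cycles per address translation:
   hit time + miss rate * miss penalty (miss_rate as a fraction, not %). */
double tlb_avg_cycles(double hit_time, double miss_rate, double miss_penalty)
{
    return hit_time + miss_rate * miss_penalty;
}
```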

Integrating virtual memory, TLB and Cache

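Example: which combinations of TLB, page table, and cache outcomes are possible?

TLB    Page Table   Cache   Possible?
Hit    Hit          Miss    Yes (the page table is not actually checked when the TLB hits)
Miss   Hit          Hit     Yes
Miss   Hit          Miss    Yes
Miss   Miss         Miss    Yes (a TLB miss followed by a page fault; the data must then miss in the cache)
Hit    Miss         Miss    No
Hit    Miss         Hit     No
Miss   Miss         Hit     No

The last three are impossible: a translation cannot be in the TLB, and data cannot be in the cache, if the page is not present in main memory.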
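The rule behind the table can be written as a C predicate. The sketch assumes, as in this organization, that the cache is physically addressed, so a page table miss (page not in memory) rules out both a TLB hit and a cache hit; the function name is hypothetical.

```c
#include <stdbool.h>

/* Can this TLB / page table / cache outcome occur together? */
bool combination_possible(bool tlb_hit, bool pt_hit, bool cache_hit)
{
    /* A page table miss means the page is not in main memory, so neither
       its translation (TLB) nor its data (physically addressed cache) can
       be present. Every other combination can occur. */
    if (!pt_hit && (tlb_hit || cache_hit))
        return false;
    return true;
}
```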

Simple Memory System Example


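Addressing
- 14-bit virtual addresses
- 12-bit physical addresses
- Page size = 64 bytes (so the page offset is 6 bits)

Virtual address (bits 13..0):
- VPN (Virtual Page Number): bits 13..6 (8 bits)
- VPO (Virtual Page Offset): bits 5..0 (6 bits)

Physical address (bits 11..0):
- PPN (Physical Page Number): bits 11..6 (6 bits)
- PPO (Physical Page Offset): bits 5..0 (6 bits), identical to the VPO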
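A C sketch of these field splits for this 14-bit/12-bit system; the type and function names are hypothetical. For example, `split_va(0x03D4)` yields VPN 0x0F and VPO 0x14, and with PPN 0x0D, `make_pa` gives physical address 0x354.

```c
#include <stdint.h>

/* Field widths of the simple memory system: 14-bit VA, 12-bit PA, 64-byte pages. */
enum { PAGE_OFFSET_BITS = 6, VA_BITS = 14, PA_BITS = 12 };

typedef struct { uint16_t vpn, vpo; } VirtualFields;

/* Split a 14-bit virtual address into VPN (bits 13..6) and VPO (bits 5..0). */
VirtualFields split_va(uint16_t va)
{
    va &= (1u << VA_BITS) - 1;                      /* keep only the 14 address bits */
    VirtualFields f = { (uint16_t)(va >> PAGE_OFFSET_BITS),
                        (uint16_t)(va & ((1u << PAGE_OFFSET_BITS) - 1)) };
    return f;
}

/* Build a 12-bit physical address: PPN in bits 11..6, PPO (= VPO) in bits 5..0. */
uint16_t make_pa(uint16_t ppn, uint16_t ppo)
{
    return (uint16_t)(((ppn << PAGE_OFFSET_BITS) | ppo) & ((1u << PA_BITS) - 1));
}
```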

Simple Memory System Page Table


Only the first 16 entries are shown ("-" = no valid mapping):

VPN  PPN  Valid      VPN  PPN  Valid
00   28   1          08   13   1
01   -    0          09   17   1
02   33   1          0A   09   1
03   02   1          0B   -    0
04   -    0          0C   -    0
05   16   1          0D   2D   1
06   -    0          0E   11   1
07   -    0          0F   0D   1

Simple Memory System TLB


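TLB: 16 entries, 4-way set associative (4 sets)
- TLBT (TLB tag): virtual address bits 13..8 (upper 6 bits of the VPN)
- TLBI (TLB index): virtual address bits 7..6 (lower 2 bits of the VPN)
- The VPO (bits 5..0) is not used in the TLB lookup.

Set   Tag PPN Valid   Tag PPN Valid   Tag PPN Valid   Tag PPN Valid
0     03  -   0       09  0D  1       00  -   0       07  02  1
1     03  2D  1       02  -   0       04  -   0       0A  -   0
2     02  -   0       08  -   0       06  -   0       03  -   0
3     07  -   0       03  0D  1       0A  34  1       02  -   0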
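The TLB and the page table above can be encoded directly and searched in C, as a sketch. The encoding is an assumption: invalid page table entries are stored as -1, PPN fields the tables leave blank are stored as 0, and VPNs beyond the 16 page table entries shown are reported as `NOT_SHOWN` rather than guessed. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* TLB of the simple memory system: 4 sets x 4 ways.
   TLBI = low 2 bits of the VPN, TLBT = remaining 6 bits. */
typedef struct { uint8_t tag, ppn; bool valid; } TlbEntry;

static const TlbEntry TLB[4][4] = {
    {{0x03, 0x00, 0}, {0x09, 0x0D, 1}, {0x00, 0x00, 0}, {0x07, 0x02, 1}},
    {{0x03, 0x2D, 1}, {0x02, 0x00, 0}, {0x04, 0x00, 0}, {0x0A, 0x00, 0}},
    {{0x02, 0x00, 0}, {0x08, 0x00, 0}, {0x06, 0x00, 0}, {0x03, 0x00, 0}},
    {{0x07, 0x00, 0}, {0x03, 0x0D, 1}, {0x0A, 0x34, 1}, {0x02, 0x00, 0}},
};

/* First 16 page table entries; -1 marks an invalid entry. */
static const int PAGE_TABLE[16] = {
    0x28, -1, 0x33, 0x02, -1, 0x16, -1, -1,
    0x13, 0x17, 0x09, -1, -1, 0x2D, 0x11, 0x0D,
};

typedef enum { TLB_HIT, TLB_MISS_PT_HIT, PAGE_FAULT, NOT_SHOWN } Outcome;

/* Look up a VPN in the TLB and, on a TLB miss, in the page table. */
Outcome lookup_vpn(uint8_t vpn, uint8_t *ppn)
{
    uint8_t tlbi = vpn & 0x3, tlbt = vpn >> 2;
    for (int way = 0; way < 4; way++) {
        const TlbEntry *e = &TLB[tlbi][way];
        if (e->valid && e->tag == tlbt) { *ppn = e->ppn; return TLB_HIT; }
    }
    if (vpn >= 16)
        return NOT_SHOWN;                 /* only the first 16 entries are given */
    if (PAGE_TABLE[vpn] < 0)
        return PAGE_FAULT;
    *ppn = (uint8_t)PAGE_TABLE[vpn];
    return TLB_MISS_PT_HIT;
}
```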


Simple Memory System Cache

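16 lines, 4-byte line size, direct mapped
- CT (cache tag): physical address bits 11..6 (6 bits, the same bits as the PPN)
- CI (cache index): physical address bits 5..2 (4 bits)
- CO (cache offset): physical address bits 1..0 (2 bits)
- CI and CO together make up the PPO.

Idx  Tag  Valid  B0  B1  B2  B3
0    19   1      99  11  23  11
1    15   0      -   -   -   -
2    1B   1      00  02  04  08
3    36   0      -   -   -   -
4    32   1      43  6D  8F  09
5    0D   1      36  72  F0  1D
6    31   0      -   -   -   -
7    16   1      11  C2  DF  03
8    24   1      3A  00  51  89
9    2D   0      -   -   -   -
A    2D   1      93  15  DA  3B
B    0B   0      -   -   -   -
C    12   0      -   -   -   -
D    16   1      04  96  34  15
E    13   1      83  77  1B  D3
F    14   0      -   -   -   -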
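A C sketch of a read from this cache, with the contents copied from the table above (the names `CacheLine`, `CACHE`, and `cache_read` are hypothetical). For physical address 0x354 it reports a hit in line 5 and returns byte 0x36.

```c
#include <stdbool.h>
#include <stdint.h>

/* Direct-mapped cache: 16 lines of 4 bytes.
   CO = PA bits 1..0, CI = bits 5..2, CT = bits 11..6. */
typedef struct { uint8_t tag; bool valid; uint8_t bytes[4]; } CacheLine;

static const CacheLine CACHE[16] = {
    {0x19, 1, {0x99, 0x11, 0x23, 0x11}}, {0x15, 0, {0}},
    {0x1B, 1, {0x00, 0x02, 0x04, 0x08}}, {0x36, 0, {0}},
    {0x32, 1, {0x43, 0x6D, 0x8F, 0x09}}, {0x0D, 1, {0x36, 0x72, 0xF0, 0x1D}},
    {0x31, 0, {0}},                      {0x16, 1, {0x11, 0xC2, 0xDF, 0x03}},
    {0x24, 1, {0x3A, 0x00, 0x51, 0x89}}, {0x2D, 0, {0}},
    {0x2D, 1, {0x93, 0x15, 0xDA, 0x3B}}, {0x0B, 0, {0}},
    {0x12, 0, {0}},                      {0x16, 1, {0x04, 0x96, 0x34, 0x15}},
    {0x13, 1, {0x83, 0x77, 0x1B, 0xD3}}, {0x14, 0, {0}},
};

/* Returns true on a cache hit and stores the addressed byte in *byte. */
bool cache_read(uint16_t pa, uint8_t *byte)
{
    unsigned co = pa & 0x3;          /* byte within the 4-byte line */
    unsigned ci = (pa >> 2) & 0xF;   /* which of the 16 lines */
    unsigned ct = pa >> 6;           /* remaining 6 bits: the tag */
    const CacheLine *line = &CACHE[ci];
    if (!line->valid || line->tag != ct)
        return false;
    *byte = line->bytes[co];
    return true;
}
```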

Address Translation Example #1


Virtual Address: 0x03D4
(VPN = bits 13..6, split into TLBT = bits 13..8 and TLBI = bits 7..6; VPO = bits 5..0)

VPN 0F   TLBI 03   TLBT 03   TLB Hit? Y
Page Fault? NO   PPN: 0D

Physical Address: 0x354
(PPN = bits 11..6, PPO = bits 5..0; for the cache: CT = bits 11..6, CI = bits 5..2, CO = bits 1..0)

Offset (CO) 00   CI 05   CT 0D   Cache Hit? Y
Byte: 36
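The bit-level steps behind these answers, writing 0x03D4 as a 14-bit binary number: set 3 of the TLB holds a valid entry with tag 03 and PPN 0D, and cache line 5 is valid with tag 0D, so byte 0 of that line (0x36) is returned.

```latex
\begin{aligned}
\texttt{0x03D4} &= \underbrace{00001111}_{\text{VPN}}\;\underbrace{010100}_{\text{VPO}}
  && \text{VPN} = \texttt{0x0F},\ \text{VPO} = \texttt{0x14} \\
\text{VPN} &= \underbrace{000011}_{\text{TLBT}}\;\underbrace{11}_{\text{TLBI}}
  && \text{TLBT} = \texttt{0x03},\ \text{TLBI} = 3 \\
\text{PA} &= \underbrace{001101}_{\text{PPN} = \texttt{0x0D}}\;\underbrace{010100}_{\text{PPO} = \text{VPO}}
  && \text{PA} = \texttt{0x354} \\
\text{PA} &= \underbrace{001101}_{\text{CT}}\;\underbrace{0101}_{\text{CI}}\;\underbrace{00}_{\text{CO}}
  && \text{CT} = \texttt{0x0D},\ \text{CI} = 5,\ \text{CO} = 0
\end{aligned}
```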

Address Translation Example #2


Virtual Address: 0x0B8F
(VPN = bits 13..6, split into TLBT = bits 13..8 and TLBI = bits 7..6; VPO = bits 5..0)

VPN ___   TLBI ___   TLBT ___   TLB Hit? __
Page Fault? __   PPN: ___

Physical Address: ___
(PPN = bits 11..6, PPO = bits 5..0; for the cache: CT = bits 11..6, CI = bits 5..2, CO = bits 1..0)

Offset (CO) ___   CI ___   CT ___   Cache Hit? __
Byte: ___

Address Translation Example #3


Virtual Address: 0x0040
(VPN = bits 13..6, split into TLBT = bits 13..8 and TLBI = bits 7..6; VPO = bits 5..0)

VPN ___   TLBI ___   TLBT ___   TLB Hit? __
Page Fault? __   PPN: ___

Physical Address: ___
(PPN = bits 11..6, PPO = bits 5..0; for the cache: CT = bits 11..6, CI = bits 5..2, CO = bits 1..0)

Offset (CO) ___   CI ___   CT ___   Cache Hit? __
Byte: ___

EXERCISE
Imagine a system with the following parameters:
- Virtual addresses: 20 bits
- Physical addresses: 18 bits
- Page size: 1 KB
- TLB: 2-way set associative, 16 total entries

Solve for the following virtual addresses:
- 0x078E6
- 0x04AA4

[Tables: first 32 entries of the page table; TLB contents]
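A C sketch of the first step for these addresses: splitting a virtual address into VPN, VPO, TLBI, and TLBT for any power-of-two page size and number of TLB sets (all names are hypothetical). Here the page size is 1024 bytes and the TLB has 16 / 2 = 8 sets, so `split_for_tlb(0x078E6, 1024, 8)` gives VPN 0x1E, VPO 0xE6, TLBI 6, and TLBT 3. The remaining steps need the page table and TLB contents.

```c
#include <stdint.h>

typedef struct { uint32_t vpn, vpo, tlbi, tlbt; } VaFields;

/* Number of bits needed to index x items; x must be a power of two. */
static unsigned log2u(uint32_t x)
{
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

/* Split va for a TLB lookup given the page size and the number of TLB sets. */
VaFields split_for_tlb(uint32_t va, uint32_t page_bytes, uint32_t tlb_sets)
{
    unsigned off_bits = log2u(page_bytes);
    unsigned idx_bits = log2u(tlb_sets);
    VaFields f;
    f.vpo  = va & (page_bytes - 1);      /* low bits: offset within the page */
    f.vpn  = va >> off_bits;             /* remaining bits: virtual page number */
    f.tlbi = f.vpn & (tlb_sets - 1);     /* low VPN bits select the TLB set */
    f.tlbt = f.vpn >> idx_bits;          /* high VPN bits are the TLB tag */
    return f;
}
```

The same function reproduces the simple memory system's split: `split_for_tlb(0x03D4, 64, 4)` gives VPN 0x0F, TLBI 3, TLBT 3.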

More exercises
1. We have a 32-byte direct-mapped cache with a block size of 4 bytes.
a) To which block will byte address 36 map?
b) To which block will word address 36 map?

More exercises
2. We have a 256-byte direct-mapped cache with a block size of 32 bytes.
a) To which block will byte address 300 map?
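Both exercises use the same direct-mapped placement rule: block index = (byte address / block size) mod (number of blocks). A C sketch (the function name is hypothetical, and a 4-byte word is assumed for part 1b):

```c
/* Direct-mapped placement: block address = byte address / block size,
   cache block index = block address mod number of blocks in the cache. */
unsigned dm_block_index(unsigned byte_addr, unsigned cache_bytes, unsigned block_bytes)
{
    unsigned num_blocks = cache_bytes / block_bytes;
    return (byte_addr / block_bytes) % num_blocks;
}
```

With these assumptions, byte address 36 maps to block 36 / 4 = 9, and 9 mod 8 = 1; word address 36 is byte address 144, block address 36, and 36 mod 8 = 4; in exercise 2, byte address 300 maps to block 300 / 32 = 9, and 9 mod 8 = 1.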

More exercises
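Row major:

    static int arr[10000][10000];   /* static: about 400 MB with 4-byte ints, too large for the stack */
    int i, j;
    for (i = 0; i < 10000; i++)         /* outer loop over rows */
        for (j = 0; j < 10000; j++)     /* inner loop walks along a row */
            arr[i][j] = arr[i][j] * 2;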

More exercises
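Column major:

    static int arr[10000][10000];   /* static: about 400 MB with 4-byte ints, too large for the stack */
    int i, j;
    for (j = 0; j < 10000; j++)         /* outer loop over columns */
        for (i = 0; i < 10000; i++)     /* inner loop walks down a column */
            arr[i][j] = arr[i][j] * 2;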

More exercises
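C stores arrays in row-major order: the elements of each row are contiguous in memory. With this layout, consecutive elements of a row usually fall in the same cache line, so the row-by-row loop uses all the elements of each cache line it brings in before moving on. Hence, traversing the array in row-major order is faster than traversing it in column-major order, because the row-major traversal has much better spatial locality. (Temporal locality is the same in both loops: each element is read and then written once.)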
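A C sketch that makes the spatial-locality difference concrete by counting how often consecutive accesses move to a different cache line. The 64-byte line size and the function names are assumptions; the offset formula is C's row-major rule for `int arr[N][N]`.

```c
#include <stdbool.h>
#include <stddef.h>

enum { N = 10000 };   /* matches int arr[10000][10000] */

/* Byte offset of arr[i][j] from the start of a row-major int arr[N][N]. */
size_t elem_offset(size_t i, size_t j)
{
    return (i * N + j) * sizeof(int);
}

/* Over the first `count` iterations of the chosen loop order, count how
   often an access lands in a different cache line than the previous one. */
size_t line_switches(bool row_by_row, size_t count, size_t line_bytes)
{
    size_t switches = 0, prev_line = 0;
    for (size_t k = 0; k < count; k++) {
        size_t i = row_by_row ? k / N : k % N;   /* row index of k-th access */
        size_t j = row_by_row ? k % N : k / N;   /* column index of k-th access */
        size_t line = elem_offset(i, j) / line_bytes;
        if (k > 0 && line != prev_line)
            switches++;
        prev_line = line;
    }
    return switches;
}
```

With 4-byte `int`s, the row-by-row order stays in each 64-byte line for 16 consecutive accesses, whereas the column-by-column order moves to a new line on every access, because consecutive accesses are 10000 × sizeof(int) bytes apart.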

More exercises
The following C program is run on a processor with a cache that has 8-word (32-byte) blocks and holds 256 bytes of data:

    int i, j, c, stride, array[512];
    for (i = 0; i < 10000; i++)
        for (j = 0; j < 512; j += stride)
            c = array[j] + 17;

If we consider only the cache activity generated by references to the array, and we assume that integers are words, what is the expected miss rate when the cache is direct mapped and stride = 256? What if stride = 255? Would either of these change if the cache were 2-way set associative?
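A small C simulation of the exercise, offered as one way to check an answer rather than as the answer key. It assumes `array[0]` starts at byte address 0 (block-aligned), models only the `array[j]` loads, and uses LRU replacement for the 2-way case; the function name and these assumptions are not part of the exercise.

```c
#include <stdbool.h>

enum {
    BLOCK_BYTES = 32,                         /* 8-word blocks */
    CACHE_BYTES = 256,                        /* data capacity */
    NUM_BLOCKS  = CACHE_BYTES / BLOCK_BYTES,  /* 8 blocks */
    INT_BYTES   = 4,                          /* integers are words */
    ARRAY_LEN   = 512,
    OUTER_ITERS = 10000,
    MAX_WAYS    = 2
};

/* Returns the miss rate of the array references for the given stride.
   ways = 1: direct mapped; ways = 2: 2-way set associative, LRU. */
double miss_rate(int stride, int ways)
{
    int  num_sets = NUM_BLOCKS / ways;
    long tag[NUM_BLOCKS][MAX_WAYS];
    bool valid[NUM_BLOCKS][MAX_WAYS] = {{false}};
    long last_use[NUM_BLOCKS][MAX_WAYS] = {{0}};
    long accesses = 0, misses = 0;

    for (int i = 0; i < OUTER_ITERS; i++) {
        for (int j = 0; j < ARRAY_LEN; j += stride) {
            long block = (long)j * INT_BYTES / BLOCK_BYTES;  /* memory block number */
            int  set   = (int)(block % num_sets);
            long t     = block / num_sets;
            int  way   = -1;
            accesses++;
            for (int w = 0; w < ways; w++)
                if (valid[set][w] && tag[set][w] == t)
                    way = w;                                 /* hit */
            if (way < 0) {                                   /* miss: choose a victim */
                misses++;
                way = 0;
                for (int w = 0; w < ways; w++) {
                    if (!valid[set][w]) { way = w; break; }  /* empty way first */
                    if (last_use[set][w] < last_use[set][way])
                        way = w;                             /* else least recently used */
                }
                valid[set][way] = true;
                tag[set][way] = t;
            }
            last_use[set][way] = accesses;                   /* LRU timestamp */
        }
    }
    return (double)misses / (double)accesses;
}
```

Under these assumptions the sketch gives a miss rate of 1.0 for stride 256 and about 0.667 for stride 255 with the direct-mapped cache (array[255] and array[510] map to the same block), and 0.0001 (compulsory misses only) for both strides with the 2-way cache. A different base address for `array` can change the stride-255 result.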

Protection with virtual memory


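Sharing of memory by multiple processes requires protection from unauthorized access.

Measures:
- The write access bit in the TLB (and page table) can protect a page from being written.
- The processor supports two modes of execution: user mode and kernel (supervisor) mode.
- Provide a portion of the processor state that a user process can read but not write (e.g., the user/kernel mode bit and the page table register).
- Provide a mechanism by which the processor can go from user mode to kernel mode and vice versa (e.g., a system call exception into kernel mode and a return-from-exception back to user mode).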
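A C sketch of how the write access bit and the user/kernel mode can combine in a permission check on each access. The `kernel_only` bit and all names are hypothetical; real architectures define their own protection bits.

```c
#include <stdbool.h>

typedef enum { MODE_USER, MODE_KERNEL } Mode;

/* Hypothetical per-page protection bits, as kept in a TLB / page table entry. */
typedef struct {
    bool valid;
    bool writable;      /* write access bit */
    bool kernel_only;   /* hypothetical: page reserved for the operating system */
} PageProt;

typedef enum { ACCESS_OK, FAULT_PAGE, FAULT_PROTECTION } AccessResult;

/* Returns whether an access may proceed, or which kind of fault it raises. */
AccessResult check_access(PageProt p, Mode mode, bool is_write)
{
    if (!p.valid)
        return FAULT_PAGE;                   /* page not in memory */
    if (p.kernel_only && mode == MODE_USER)
        return FAULT_PROTECTION;             /* user code may not touch OS pages */
    if (is_write && !p.writable)
        return FAULT_PROTECTION;             /* write access bit is off */
    return ACCESS_OK;
}
```

Because the mode and the protection bits are part of the state a user process can read but not write, a user program cannot simply turn these checks off.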

Protection with virtual memory(cont)


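Handling a context switch
A context switch changes the internal state of the processor so that a different process can use it. It includes saving the state needed to return to the currently executing process later.

When the OS context switches from process P1 to process P2, it must ensure that P2 cannot access P1's pages through P1's page table or translations.

How can this be enforced?
- Without a TLB: it suffices for the OS to load the page table register with the address of P2's page table.
- With a TLB: the OS must clear P1's entries from the TLB, or, more efficiently, extend the virtual address space by adding a process identifier to each TLB entry so that only the running process's entries can hit.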
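A C sketch of a TLB lookup that includes a process identifier (often called an ASID; that name, the types, and the 8-entry size are assumptions here). Because the identifier is part of the match, P2 cannot hit on a translation P1 left behind, and the TLB need not be flushed on every context switch.

```c
#include <stdbool.h>
#include <stdint.h>

enum { TLB_ENTRIES = 8 };   /* assumed size */

/* TLB entry tagged with the owning process's identifier. */
typedef struct { bool valid; uint8_t asid; uint32_t vpn, ppn; } TlbEntry;

/* A hit requires both the VPN and the current process identifier to match. */
bool tlb_lookup(const TlbEntry tlb[TLB_ENTRIES], uint8_t current_asid,
                uint32_t vpn, uint32_t *ppn)
{
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn && tlb[i].asid == current_asid) {
            *ppn = tlb[i].ppn;
            return true;
        }
    }
    return false;
}
```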

Calculation of CPI (Cycles Per Instruction)


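For the multi-cycle MIPS implementation:
- Load: 5 cycles
- Store: 4 cycles
- R-type: 4 cycles
- Branch: 3 cycles
- Jump: 3 cycles

If a program has 50% R-type instructions, 10% load instructions, 20% store instructions, 8% branch instructions and 2% jump instructions, what is the CPI?

Solution: CPI = (4x50 + 5x10 + 4x20 + 3x8 + 3x2)/100 = 360/100 = 3.6

Note that the given mix adds up to only 90% (50 + 10 + 20 + 8 + 2 = 90). Dividing by 100 implicitly treats the missing 10% of instructions as taking zero cycles; averaging over the instructions actually listed gives CPI = 360/90 = 4.0.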
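A C sketch of the CPI calculation as a weighted average (type and function names are hypothetical). It divides by the sum of the weights rather than by a fixed 100, so for the mix above, whose percentages sum to 90, it returns 360 / 90 = 4.0 rather than 3.6.

```c
#include <stddef.h>

/* One instruction class: its cycle count and its share of the program
   (a percentage or a raw count; only the ratios matter). */
typedef struct { const char *name; double cycles; double weight; } InstrClass;

/* CPI = sum(cycles_i * weight_i) / sum(weight_i). */
double cpi(const InstrClass *mix, size_t n)
{
    double cycles = 0.0, total = 0.0;
    for (size_t i = 0; i < n; i++) {
        cycles += mix[i].cycles * mix[i].weight;   /* cycles contributed by this class */
        total  += mix[i].weight;                   /* share of instructions covered */
    }
    return cycles / total;
}
```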