Architecture - Lecture 5 - Virtual Memory and Memory Hierarchy
Memory manager
What hardware support does the OS require? Kernel and user modes.
Kernel mode: access to all state, including privileged state.
User mode: access to user state ONLY.
Processes
Definition: A process is an instance of a running program. A process virtualizes the CPU and memory, providing each program with two key abstractions:
Logical control flow: gives each program the illusion that it has exclusive use of the CPU, with a private set of register values.
Private address space: gives each program the illusion that it has exclusive use of memory.
How is this illusion maintained? Process execution is interleaved (multitasking), and the address space is managed by the virtual memory system.
Concurrent Processes
Two processes run concurrently (are concurrent) if their flows overlap in time; otherwise, they are sequential. Examples: Concurrent: A & B, A & C. Sequential: B & C.
(Figure: execution timelines of processes A, B, and C over time)
Context Switching
Processes are managed by a shared chunk of OS code called the kernel. Important: the kernel is not a separate process; it runs as part of some user process. Control flow passes from one process to another via a context switch. The time between context switches is typically 10-20 ms.
(Figure: over time, control alternates between process A's and process B's user code, with kernel context-switch code in between)
Each process has its own private address space. User processes cannot access the top region of memory.
(Figure: MIPS user address space: unused region below 0x00400000; read-only segment (.text) at 0x00400000; read/write segment (.data, .bss) at 0x10000000; run-time heap (managed by malloc) above it, growing up to the brk pointer)
Switches mode from user to kernel. Jumps to a pre-defined place in the kernel program. Still an expected change of control flow.
Precise exceptions/interrupts are difficult for deeply pipelined processors, which could take exceptions out of order.
Why precise exceptions are difficult with pipelining: imagine a 5-stage pipeline during cycle i:
Stage MEM: a ld causes a virtual memory exception
Stage EX: an add causes an arithmetic overflow
Stage ID: an invalid instruction exception
Stage IF: a virtual memory exception due to instruction fetch
MIPS Interrupts
What does the CPU do on an exception?
Set the EPC register to point to the restart location
Change CPU mode to kernel and disable interrupts
Set the Cause register to the reason for the exception; also set the BadVaddr register on an address exception
Jump to the interrupt vector
Interrupt vectors:
0x8000 0000: TLB refill
0x8000 0180: all other exceptions
Privileged state: EPC, Cause, BadVaddr
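The steps above can be sketched as a small C model of the privileged state. This is only an illustration: the struct layout and the two cause codes below are stand-ins, not the real CP0 encoding; only the two vector addresses come from the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of the privileged state an exception touches. */
typedef struct {
    uint32_t epc;            /* restart address */
    uint32_t cause;          /* reason for the exception */
    uint32_t badvaddr;       /* faulting address, for address exceptions */
    int      kernel_mode;
    int      interrupts_enabled;
} cp0_state;

/* Hypothetical cause codes for this sketch. */
enum { EXC_TLB_REFILL = 2, EXC_ADDR_ERROR = 4 };

#define VEC_TLB_REFILL 0x80000000u
#define VEC_GENERAL    0x80000180u

/* Performs the four steps and returns the vector the CPU jumps to. */
uint32_t raise_exception(cp0_state *cp0, uint32_t pc, uint32_t cause,
                         uint32_t bad_addr) {
    cp0->epc = pc;                     /* remember where to restart */
    cp0->cause = cause;                /* record why we trapped */
    if (cause == EXC_TLB_REFILL || cause == EXC_ADDR_ERROR)
        cp0->badvaddr = bad_addr;      /* faulting address */
    cp0->kernel_mode = 1;              /* enter kernel mode */
    cp0->interrupts_enabled = 0;       /* disable interrupts */
    return cause == EXC_TLB_REFILL ? VEC_TLB_REFILL : VEC_GENERAL;
}
```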
A simple handler (e.g., one that increments a counter) performs these steps:
get the address of the counter
load the counter
increment the counter
store the counter
restore the status register: enable interrupts and user mode
return to the program (via EPC)
This handler can't survive nested exceptions, which is OK since it does not re-enable interrupts while running. It does not use any user registers, so there is no need to save them.
(Figure: an interrupt vector with entries 0 .. n-1, each pointing to the code for its exception handler)
But the CPU is much faster than I/O devices, and most OSs use a common handler anyway. Make the common case fast!
Exception Example #1
Memory reference: the user writes to a memory location, but that portion (page) of the user's memory is currently not in physical memory. The page fault handler loads the page into physical memory and returns to the faulting instruction, which succeeds on the second try.
Exception Example #2
Memory reference: the user writes to a memory location, but the address is not valid. The page fault handler detects the invalid address and sends a SIGSEGV signal to the user process, which exits with a segmentation fault.
Exceptions & interrupts: illegal opcode, divide by zero, timer expiration, I/O, etc. An exceptional instruction becomes a jump to OS code.
Precise exceptions
External interrupts are attached to some instruction in the pipeline. New state: the EPC, Cause, and BadVaddr registers.
Disk storage is ~100X cheaper than DRAM storage. To access large amounts of data in a cost-effective manner, the bulk of the data must be stored on disk.
(Figure: the memory hierarchy: CPU registers (~128 B, 0.5-1 cycle access) exchange 8 B words with the cache; the cache exchanges 32 B lines with memory; memory exchanges 8 KB pages with disk)
Bottom line: design decisions made for DRAM caches (main memory acting as a cache for disk, below SRAM caches in the hierarchy) are driven by the enormous cost of misses.
Associativity? High, to minimize the miss rate.
Hit time? Must match cache/DRAM performance.
Miss penalty? Very high: ~20 ms (a disk access).
(Figure: the page table records each page's location; for pages on disk, the OS retrieves the information needed to locate them)
Address Translation: Hardware converts virtual addresses to physical addresses via an OS-managed lookup table (page table)
Based on slides by C. Kozyrakis
Before the fault: the page table maps the virtual address to a location on disk. After the fault: the OS has loaded the page into memory, and the page table maps the virtual address to a physical address.
(Figure: servicing a page fault: the I/O controller reads the page from disk and a DMA transfer moves it over the memory-I/O bus into memory)
(Figure: process address space, top to bottom: stack at $sp growing down; runtime heap (via malloc) growing up to the brk pointer; uninitialized data (.bss); initialized data (.data); program text (.text); forbidden region at address 0)
(Figure: address translation: each process's virtual pages VP 0 .. N-1 map to arbitrary physical pages, e.g. PP 2, PP 7, PP 10)
(Figure: each process has its own page table; e.g., process i maps VP 1 to PP 6 and marks VP 2 not present, while process j maps its VP 1 to PP 9)
Virtual pages can map to any physical frame (fully associative).
(Figure: 1 KB virtual pages mapping to arbitrary 1 KB frames in physical memory)
Page Table/Translation
With 1 KB pages, the low 10 bits of the virtual address are the page offset and the remaining bits are the page table index. The Page Table Base Register points to the page table; each entry holds a valid bit (indicating whether the virtual page is currently mapped), access rights, and a frame number. The frame number concatenated with the 10-bit offset forms the physical address (in the slide's figure: a 32-bit virtual address, a 20-bit frame number, and a 30-bit physical address).
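Assuming the 1 KB pages used in these slides, splitting a virtual address into a page table index and an offset is just shifting and masking:

```c
#include <stdint.h>

/* With 1 KB pages, the low 10 bits are the page offset and the
   remaining high bits index the page table. */
#define PAGE_BITS   10u
#define PAGE_SIZE   (1u << PAGE_BITS)   /* 1024 bytes */
#define OFFSET_MASK (PAGE_SIZE - 1u)    /* 0x3FF */

static uint32_t vpn(uint32_t va)    { return va >> PAGE_BITS; }
static uint32_t offset(uint32_t va) { return va & OFFSET_MASK; }
```

For example, virtual address 0x1404 (5124) falls in virtual page 5 at offset 4.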
Context switches occur on a timeslice (when the scheduler determines that the next process should get the CPU) or when blocking for I/O.
Translation Process
Valid page:
Check the access rights (R, W, X) against the access type
Generate the physical address if the access is allowed
Generate a protection fault if the access is illegal
Invalid page:
The page is not currently mapped, and a page fault is generated
The frame table maps physical addresses back to virtual addresses and tracks which frames are in use.
The MIPS architecture disallows unaligned memory accesses. This is an interesting legacy problem on the 80x86, which does support unaligned accesses.
Note: the page table overhead is per process!
TLB Entries
A TLB entry just holds a cached Page Table Entry (PTE): virtual page, physical frame, dirty bit, valid bit, and access rights. Additional bits for LRU, etc. may be added.
(Figure: the virtual page number is compared against the TLB tags; on a TLB hit, the physical address is formed and split into cache tag, index, and byte offset; on a cache hit, the data is returned)
TLB problems
What happens on a context switch?
If the TLB entries have a Process ID (PID) associated with them, nothing needs to be done. Otherwise, the OS must flush the entries in the TLB.
Limited reach:
A 64-entry TLB with 8 KB pages maps only 0.5 MB, smaller than many L2 caches, so the TLB miss rate can exceed the L2 miss rate! This motivates larger page sizes.
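The reach arithmetic above is simply entries times page size, which is easy to check:

```c
/* TLB reach = number of entries x page size (bytes). */
unsigned long tlb_reach(unsigned entries, unsigned long page_size) {
    return entries * page_size;
}
```

With 64 entries and 8 KB pages this gives 64 x 8192 = 524288 bytes = 0.5 MB, matching the slide; doubling the page size doubles the reach without adding entries.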
A TLB entry with process IDs holds: Virtual Page, PID (Process ID), Physical Page, and the flag bits N (do not cache this memory address), D (dirty bit), V (valid bit), and G (global: valid regardless of PID).
Page Sharing
Another benefit of paging is that pages can easily be shared by mapping multiple virtual pages to the same physical frame. This is useful in many circumstances:
Applications that need to share data
Read-only data for applications, the OS, etc.
Example: the Unix fork() system call creates a second process with a copy of the current address space.
(Figure: two virtual addresses mapping to the same 1 KB frame in physical memory)
Copy-on-Write
Shared pages are mapped read-only; when a process writes to one, the resulting protection fault lets the OS copy the page and give the writer its own private copy.
Total page table size is therefore 2^20 entries x 4 bytes = 4 MB. But only a small fraction of those pages are actually used! The large page table size is made even worse by the fact that the page table must itself stay in memory; otherwise, what would happen if the page table were suddenly swapped out?
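The 4 MB figure follows from the general formula: number of virtual pages times bytes per PTE. Assuming a 32-bit address space with 4 KB pages (2^20 pages) and 4-byte PTEs, which is one common way to reach 2^20 entries:

```c
#include <stdint.h>

/* Flat page table size = 2^(va_bits - page_bits) entries x pte_bytes each. */
uint64_t page_table_bytes(unsigned va_bits, unsigned page_bits,
                          unsigned pte_bytes) {
    return ((uint64_t)1 << (va_bits - page_bits)) * pte_bytes;
}
```

Larger pages shrink the table: with 8 KB pages the same address space needs only 2 MB of PTEs.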
Two-level Paging
(Figure: the virtual address splits into a first-level index into the master block (page directory), a second-level index into a page table, and an offset; the selected PTE then points to the desired page)
Problems: multiple page faults. Accessing a PTE page table can cause a page fault, and accessing the actual page can then cause a second page fault.
Virtually indexed, physically tagged caches
(Figure: the virtual address splits into a 19-bit virtual page number and a 13-bit page offset; the cache uses a 19-bit physical tag, a 9-bit index, and a 4-bit byte offset, with index and byte offset taken from untranslated bits)
Translation and cache access happen in one cycle: access the cache with the untranslated part of the address while the TLB translates, then perform the tag check with the physical address. Cache data is based on physical addresses, so there are no problems with aliasing, I/O, or multiprocessor snooping. But this only works when VPN bits are not needed for the cache lookup: Cache Size <= Page Size * Associativity, i.e. Set Size <= Page Size. This is the most common solution, and we want L1 to be small anyway.
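The sizing constraint is a one-line check: the bytes mapped by one way of the cache (its set size) must not exceed a page, so that all index bits fall inside the untranslated page offset.

```c
/* Virtually indexed, physically tagged caches require
   set size = cache_size / associativity <= page_size. */
int vipt_ok(unsigned cache_size, unsigned assoc, unsigned page_size) {
    return cache_size / assoc <= page_size;
}
```

For example, with 8 KB pages an 8 KB direct-mapped L1 works, a 32 KB 2-way L1 does not, and a 32 KB 4-way L1 works again: raising associativity is how designers grow a VIPT L1 without growing the page size.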
Strategy
Pages not currently in use can be written back to secondary storage. When page ownership changes, save the current contents to disk so they can be restored later.
Costs
Address translation is still required to convert virtual addresses into physical addresses. Writing to the disk is a slow operation, and TLB misses can limit performance.
Finding a Block
Index with a partial map (cache) or a full map (page table)
Index and search (set associative)
Search (fully associative)
Replacing a Block
Random: pick one of the elements to replace
Least Recently Used (LRU): use bits to track usage
Writing
Write back: data is written only on eviction
Write through: each write is passed through to the lower level
Five Components