
Computer Architecture Memory Management

Topics: Memory, Paging, Segmentation, Virtual Memory, Caches


Computer Architecture WS 06/07 Dr.-Ing. Stefan Freinatis

Memory
Core Memory
Period: 1950 ... 1975. Non-volatile. Matrix of magnetic cores; a bit is stored by changing the magnetic polarity of a core. Access time 3 µs ... 300 ns. Destructive read:
after reading a core, the content is lost. A read cycle must be followed by a write cycle in order to restore the content.
Image source: http://www.psych.usyd.edu.au/pdp-11/core.html

Memory
Semiconductor Memory (1970 ...)
Dynamic memory (DRAM)
Storing a bit by charging a capacitor
(sometimes just the self-capacitance of a transistor)

One transistor per bit:
high density / capacity per area unit.

Volatile. Destructive read. Self-discharging:
periodic refresh needed.
Image source: http://www.research.ibm.com/journal/rd/391/adler.html

Memory
Semiconductor Memory (1970 ...)
Static memory (SRAM)
Storing a bit in a flip-flop
(setting / resetting the flip-flop)

6 transistors per bit:
more chip area than with DRAM.

Volatile. Non-destructive read. No self-discharge. Fast!
Image source: Wikipedia on SRAM (English)

Memory Hierarchy

Program(mer)s want unlimited amounts of fast memory. Economical solution: Memory hierarchy.

Memory hierarchy levels in typical desktop / server computers, figure from [HP06 p.288]

Main Memory
Central to the computer system. A large array of words / bytes. Holds many programs at a time,
for multi-programming / tasking to be effective.

Figure: memory layout of a time-sharing system (the operating system plus programs 1 ... n in working memory).

Address Binding
Program = binary executable file. Code/data accessible via addresses.


... i = i + 1; check(i); ...
Addresses in the source code are symbolic, here: i (a variable) and check (a function). The compiler typically binds the symbolic addresses to relocatable addresses, such as "i is 14 bytes from the beginning of the module". The compiler may also be instructed to produce absolute addresses (non-relocatable code). The loader finally binds the relocatable addresses to absolute addresses, such as "i is at 74014", when loading the code into memory.

Address Binding Schemes


The binding of code and data to logical memory addresses can be done at three stages:

Compile time (Program creation)


The resulting code is absolute code. All addresses are absolute. The program must be loaded exactly to a particular logical address in memory.

Load time
The code must be relocatable, that is, all addresses are given as an offset from some starting address (relative addresses). The loader calculates and fills in the resulting absolute addresses at load time (before execution starts).

Execution time
The relocatable code is executed. Address translation from relative to absolute addresses takes place at execution time (for every single memory access). Special hardware needed (MMU).

Logical / Physical Addresses



Logical Address
The address generated by the CPU, also termed virtual address. All logical addresses form the logical (virtual) address space.

Physical Address
The address seen by the memory. All physical addresses form the physical address space. In compile-time and load-time address-binding schemes the logical and the physical addresses are the same. In execution-time address-binding the logical and physical addresses differ.

Memory Management Unit



Hardware device that maps logical addresses to physical addresses (MMU).

A program (a process) deals with logical addresses, it never sees the real physical addresses.
Figure from [Sil00 p.258]

Protection

Protecting the kernel against user processes


No user process may read, modify or even destroy kernel data (or kernel code). Access to kernel data (system tables) only through system calls.

Protecting user processes from one another


No user process may read or modify other processes' data or code. Any data exchange between processes only via IPC.

MMU equipped with a limit register, loaded with the highest allowed logical address.
This is done by the dispatcher as part of the context switch.

Any address beyond the limit causes an error. Assumption: contiguous physical memory per process.
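The limit check described above can be sketched as follows; a relocation (base) register is assumed alongside the limit register, and all names and numbers are illustrative:

```python
# Minimal sketch of the check the MMU performs on every memory access.
# The register names and numbers are illustrative, not from the slides.
def translate(logical_addr: int, base: int, limit: int) -> int:
    """Relocate a logical address, trapping on a limit violation."""
    if logical_addr < 0 or logical_addr > limit:
        raise MemoryError("addressing error: trap to operating system")
    return base + logical_addr  # contiguous physical memory assumed

print(translate(100, base=30000, limit=12000))  # -> 30100
```

The dispatcher would reload `base` and `limit` at each context switch, so every process sees logical address 0 as its own start.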

Protection

Limit register for protecting process spaces against each other


Figure from [Sil00 p.266]

Memory Occupation
Obtaining better memory-space utilization
Initially the entire program plus its data (variables) needed to be in memory.

Dynamic Loading
Load what is needed when it is needed.

Overlays
Replace code by other code.

Dynamic Linking (Shared Libraries)


Use shared code rather than duplicating everything in each executable.

Swapping
Temporarily kick out a process from memory.

Dynamic Loading

Routines are kept on disk


Main program is loaded into memory.

Routine loaded when needed


Upon each call it is checked whether the routine is in memory. If not, the routine is loaded into memory.

Unused routines are never loaded


Although the total program size may be large, the portion that is actually executed can be much smaller.

No special OS support required


Dynamic loading is implemented by the user. System libraries (and corresponding system calls) may help the programmer.


Overlays

Existing code is replaced by new code


Similar to dynamic loading, but instead of adding new routines to the memory, existing code is replaced by the loaded code.

No special OS support required


Overlay technique implemented by the user.

Example: Consider a two-pass assembler with the following module sizes: pass 1: 70 kB, pass 2: 80 kB, symbol table: 20 kB, common routines: 30 kB. Loading everything at once would require 200 kB.

Pass 1 and pass 2 do not need to be in memory at the same time: overlay them.

Overlays
Pass 1, when finished, is overlaid by pass 2. An additional overlay driver is needed (10 kB), but the total memory requirement now is 140 kB instead of 200 kB.

Figure from [Sil00 p.262]
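The savings can be checked with a few lines of arithmetic (module sizes taken from the assembler example above):

```python
# Memory requirement of the two-pass assembler, with and without overlays.
sizes = {"pass1": 70, "pass2": 80, "symtab": 20, "common": 30}  # kB
overlay_driver = 10  # kB

all_at_once = sum(sizes.values())  # everything resident at once: 200 kB
# With overlays, pass 1 and pass 2 share memory: only the larger counts.
with_overlays = (max(sizes["pass1"], sizes["pass2"])
                 + sizes["symtab"] + sizes["common"] + overlay_driver)

print(all_at_once, with_overlays)  # -> 200 140
```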

Dynamic Linking
Different processes use same code
This is especially true for shared system libraries (e.g. reading from the keyboard, graphical output on screen, networking, printing, disk access).

Single copy of shared code in memory


Rather than linking the libraries statically to each program (which increases the size of each binary executable), the libraries (or individual routines) are linked dynamically during execution time. Each library only resides once in physical memory.

Stub
is a piece of program code initially located at the library references in the program. When first called it loads the library (if not yet loaded) and replaces itself with the address of the library routine.

OS support required
since a user process cannot look beyond its address space whether (and where) the library code may be located in physical memory (protection!).
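A hedged sketch of the stub mechanism, with a hypothetical `load_routine` standing in for the OS mapping the shared library:

```python
# Sketch of a dynamic-linking stub: the stub resolves the real routine
# on first call and replaces itself in the program's linkage table.
# load_routine() is a hypothetical stand-in for the OS loader.
def load_routine(name):
    return {"twice": lambda x: 2 * x}[name]   # pretend library code

calls = {}  # plays the role of the program's linkage table

def make_stub(name):
    def stub(arg):
        real = load_routine(name)  # load the library (if not yet loaded)
        calls[name] = real         # replace stub with the real routine
        return real(arg)
    return stub

calls["twice"] = make_stub("twice")
print(calls["twice"](21))  # first call resolves through the stub -> 42
print(calls["twice"](5))   # later calls reach the routine directly -> 10
```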

Swapping
Memory Occupation

A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution.

Backing store: a fast disk large enough to accommodate copies of all memory images for all users; it must provide direct access to these memory images.

Roll out, roll in: a swapping variant used for priority-based scheduling algorithms; a lower-priority process is swapped out so that a higher-priority process can be loaded and executed.

The major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped.

Swapping

Figure from [Sil00 p.263]

Figure: Process P1 is swapped out, and process P2 is swapped in.



Memory Allocation
Allocation of physical memory to a process

Contiguous
The physical memory space is contiguous (linear) for each process.

Fixed-sized partitions Variable sized partitions


Placement schemes: first fit, best fit, worst fit

Non-Contiguous
The physical memory space per process is fragmented (has holes).

Paging Segmentation Combination of Paging and Segmentation



Contiguous Memory Allocation


The physical memory allocated to a process is contiguous (no holes).

Fixed-sized partitions
Memory is divided into fixed-sized partitions. Originally used by IBM OS/360, no longer in use today.

Simple to implement. The degree of multiprogramming is bound by the number of partitions. Internal fragmentation.

Figure: operating system plus processes 1 ... 4, each in its own fixed partition; one partition is free.

Contiguous Memory Allocation


The physical memory allocated to a process is contiguous (no holes).

Variable-sized partitions
Partitions are of variable size.

The OS must keep a free list, listing free memory (holes).

The OS must provide a placement scheme. The degree of multiprogramming is limited only by available memory. No (or very little) internal fragmentation, but external fragmentation:
the holes may be too small for a new process.

Figure: operating system plus processes 1 ... 4 in variable-sized partitions.

Compaction
Reducing external fragmentation (for variable-sized partitions): the occupied partitions are moved together so that the scattered holes merge into one contiguous block of free memory.

The copy operation is expensive.

Placement Schemes
Satisfying a request of size n from a list of free holes.
General to the following schemes: find a large enough hole, allocate the portion needed, and return the remainder (leftover hole) to the free list.

First fit
Find the first hole that is large enough. Fastest method.

Best fit
Find the smallest hole that is large enough. The entire list must be searched (unless it is sorted by hole size). This strategy produces the smallest leftover hole.

Worst fit
Find the largest hole. Search entire list (unless sorted). This strategy produces the largest left-over hole, which may be more useful than the smallest leftover hole from the best-fit approach.
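The three schemes can be sketched over a free-hole list as follows (hole positions and sizes are illustrative):

```python
# Sketch of the three placement schemes over a list of free holes.
# A hole is a (start_address, size) pair; the values are illustrative.
def first_fit(holes, n):
    # Take the first hole that is large enough.
    return next((h for h in holes if h[1] >= n), None)

def best_fit(holes, n):
    # Search all holes, take the smallest sufficient one.
    return min((h for h in holes if h[1] >= n),
               key=lambda h: h[1], default=None)

def worst_fit(holes, n):
    # Search all holes, take the largest one.
    return max((h for h in holes if h[1] >= n),
               key=lambda h: h[1], default=None)

holes = [(100, 50), (300, 20), (500, 90)]
print(first_fit(holes, 30))  # -> (100, 50): first hole large enough
print(best_fit(holes, 30))   # -> (100, 50): smallest sufficient hole
print(worst_fit(holes, 30))  # -> (500, 90): largest hole
```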


First Fit
Example: we need a certain amount of memory; the search starts at the bottom of the free list. The first hole encountered that is large enough is taken; the remainder becomes the leftover hole.

Figure: memory layout before and after first-fit allocation.

Best Fit
Example: same request, search starting at the bottom. We have to search all holes. The top hole fits best and is taken. This scheme creates the smallest leftover hole among the three schemes.

Figure: memory layout before and after best-fit allocation.

Worst Fit
Example: same request, search starting at the bottom. We have to search all holes. The bottom hole is found to be the largest and is taken. This scheme creates the largest leftover hole among the three schemes.

Figure: memory layout before and after worst-fit allocation.


Paging
The physical address space of a process can be non-contiguous.

Physical memory is divided into fixed-sized frames.
Frame size is a power of 2, between 512 bytes and 8192 bytes.

Logical memory is divided into pages.
Page size is identical to frame size.

The OS keeps track of all free frames (free-frame list). Running a program of size n pages requires finding n free frames.

A page table translates logical to physical addresses. Internal fragmentation, but no external fragmentation.

Address Translation

An address generated by the CPU is divided into:

Page number p: used as an index into a page table, which contains the base address f of the corresponding frame in physical memory.
Page offset d: the offset from the frame start; physical memory address = f + d.

logical address = | page number p (m − n bits) | page offset d (n bits) |

The logical address is m bits wide. Page size = frame size = 2^n.

Paging
Physical address = f + d, with f = PageTable[p], p = the m − n most significant bits of the logical address, d = the n least significant bits.

Figure from [Sil00 p.270]

Paging

Paging model: logical address space is contiguous, whereas the corresponding physical address space is not.

Figure from [Sil00 p.271]

Paging
What is the physical address of k?
n = 2 (page size is 4 bytes), m = 4 (logical address space is 16 bytes). k is located at logical address 10.

10 (decimal) = 1010 (binary), so p = 10b = 2 and d = 10b = 2.
The page table maps pages 0, 1, 2, 3 to the frame base addresses 20, 24, 4, 8.

f = PageTable[2] = 4
Physical address = f + d = 4 + 2 = 6

Figure from [Sil00 p.272]
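The worked example can be reproduced in a few lines; the page table holds the frame base addresses from the figure:

```python
# Address translation for the worked example: m = 4 bit logical
# addresses, n = 2 bit offset (4-byte pages).
m, n = 4, 2
page_table = [20, 24, 4, 8]        # frame base address f per page p

def physical(logical: int) -> int:
    p = logical >> n               # upper m - n bits: page number
    d = logical & ((1 << n) - 1)   # lower n bits: page offset
    return page_table[p] + d       # physical address = f + d

print(physical(10))  # k at logical 10 = 1010b: p=2, d=2, f=4 -> 6
```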

Free-Frame List
The OS must maintain a table of free frames (free-frame list).

Example: before allocation the free-frame list contains frames 14, 13, 18, 20, 15. A new process with four pages is loaded: pages 0, 1, 2, 3 are placed into frames 14, 13, 18, 20, so the page table of the new process maps 0 → 14, 1 → 13, 2 → 18, 3 → 20. Afterwards only frame 15 remains on the free-frame list.

Page-Table
Where to locate the page table?

Dedicated registers within CPU


Only suitable for small memory. Used e.g. in PDP-11 (8 page registers, each page 8 kB, 64 kB main memory total). Fast access (high speed registers).

Table in main memory


A dedicated CPU register, the page-table base register (PTBR), points to the table in memory (the table currently in use). With each context switch the PTBR is reloaded (then pointing to another page table in memory). The actual size of the page table is given by a second register, the page table length register (PTLR).

With the latter scheme we need two memory accesses, one for the page table, and one for accessing the memory location itself. Slowdown! Solution: Special hardware cache: translation look-aside buffer (TLB)


Translation Look-Aside Buffer

A translation look-aside buffer (TLB) is a small fast-lookup associative memory.

The associative registers contain page-frame entries (key | value): the key is a page number, the value the corresponding frame number (or frame address). When a page number is presented to the TLB, all keys are checked simultaneously. If the desired page number is not in the TLB, the translation must be fetched from the page table in memory.

Translation Look-Aside Buffer

Paging hardware with TLB. Figure from [Sil00 p.276]

Memory Access Time

Assume: memory access time = 100 ns, TLB access time = 20 ns.

When the page number is in the TLB (hit):
total access time = 20 ns + 100 ns = 120 ns
When the page number is not in the TLB (miss):
total access time = 20 ns + 100 ns + 100 ns = 220 ns

With 80% hit ratio: average access time = 0.8 × 120 ns + 0.2 × 220 ns = 140 ns
With 98% hit ratio: average access time = 0.98 × 120 ns + 0.02 × 220 ns = 122 ns
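The averages can be recomputed as follows:

```python
# Average (effective) access time with a TLB, using the times above.
MEM, TLB = 100, 20                 # ns

def avg_access_time(hit_ratio):
    hit = TLB + MEM                # 120 ns: TLB plus one memory access
    miss = TLB + MEM + MEM         # 220 ns: page-table plus data access
    return hit_ratio * hit + (1 - hit_ratio) * miss

print(round(avg_access_time(0.80)))  # -> 140 (ns)
print(round(avg_access_time(0.98)))  # -> 122 (ns)
```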

Protection
With paging, the processes' memory spaces are automatically protected against each other, since each process is assigned its own set of frames. If a process tries to access a page that is not in its page table (or is marked invalid, see next slide), the process is trapped by the OS.

Example: the page table maps pages 0, 1, 2, 3 to the frame base addresses 20, 24, 4, 8, so the valid physical addresses are 20 ... 23, 24 ... 27, 4 ... 7, and 8 ... 11.

Figure from [Sil00 p.272]

Frame Attributes
Each frame may be characterized by additional bits in the page table.

Valid / invalid
Whether the frame is currently allocated to the process

Read-Only
Frame is read-only

Execute-Only
Frame contains code

Shared
Frame is accessible to other processes as well.
Figure from [Sil00 p.277]


Shared Pages
Implementation of shared memory through paging is rather easy.

A shared page is a page whose frame is allocated to other processes as well. Many processes share a page in that each of the shared pages is mapped to the same frame in physical memory. Shared code must be non-self modifying code (reentrant code).
Figure on the next slide: three processes are using an editor. The editor needs 3 pages for its code. Rather than loading the code three times into memory, the code is shared: it is loaded only once, but is visible to each process as if it were its private code. The data (the text being edited), of course, is private to each process. Each process thus has its own data frame.

Shared Pages
Note: free memory is shown in gray, occupied memory in white.

Pages 0, 1, 2 of each process are mapped to the physical frames 3, 4, 6.

Figure from [Sil00 p.283]

Paging
Logical address space of modern CPUs: 2^32 ... 2^64 bytes.
Assume a 32-bit CPU and a frame size of 4 kB: 2^32 / 2^12 = 2^20 page table entries (per process).
Each entry is 20 bit + 20 bit = 40 bit = 5 byte:
20 bit for the page number, 20 bit for the frame number (less than the 32 bit a full frame address would require).

2^20 × 5 byte = 5 MB per page table!
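The calculation, spelled out:

```python
# Page-table size for a 32-bit address space with 4 kB frames.
address_bits = 32
offset_bits = 12                       # 4 kB frames = 2**12 bytes
entries = 2 ** (address_bits - offset_bits)   # pages per process
entry_bytes = 5                        # 20-bit page + 20-bit frame number

print(entries)                                 # -> 1048576 (= 2**20)
print(entries * entry_bytes // 2 ** 20, "MB")  # -> 5 MB per page table
```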

Two-Level Paging
Often a process will not use all of its logical address space. Rather than allocating the page table contiguously in main memory (for the worst case), the page table is divided into small pieces and is paged itself.

An outer page table entry points to a frame containing inner page table entries; an inner page table entry points to the final destination frame.

Two-Level Paging

The logical address is divided into a page number p1 (10 bits), a page number p2 (10 bits), and a page offset d (12 bits). The numbers are for the 32-bit, 4 kB-frame example.

The outer page table has at most 2^10 entries; each page of the inner table also has 2^10 entries. p1 indexes the outer table, p2 the inner table, and d is the offset within the final destination frame in memory.

Figure from [Sil00 p.279]
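A hedged sketch of the two-level walk for this 10/10/12-bit layout (the tiny sparse tables are illustrative):

```python
# Two-level translation for the 32-bit, 4 kB-frame layout:
# 10-bit outer index p1, 10-bit inner index p2, 12-bit offset d.
def split(logical):
    p1 = logical >> 22               # top 10 bits
    p2 = (logical >> 12) & 0x3FF     # middle 10 bits
    d = logical & 0xFFF              # low 12 bits
    return p1, p2, d

def translate(logical, outer_table):
    p1, p2, d = split(logical)
    inner_table = outer_table[p1]    # frame holding inner page table
    frame = inner_table[p2]          # final destination frame number
    return (frame << 12) | d         # physical address

outer = {0: {1: 7}}                  # sparse: only page (0, 1) mapped
print(hex(translate(0x1ABC, outer))) # -> 0x7abc
```

Because the tables are sparse, unused parts of the logical address space cost no inner-table memory at all, which is the whole point of paging the page table.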

Multi-Level Paging

Tree-structure principle
Each outer page table entry defines the root node of a tree.

Two / three / four level paging
SPARC (32 bit): three-level paging. Motorola 68030 (32 bit): four-level paging.

Better memory utilization
than using a contiguous (and possibly maximum-sized) page table.

Increase in access time,
since we hop several times until the final memory location is reached. Caching (TLB), however, helps a lot. Four-level paging with a 98% hit rate:
effective access time = 0.98 × 120 ns + 0.02 × 520 ns = 128 ns