
Memory Management

• Memory accesses and memory management are a very important part of modern computer operation. Every instruction has to be fetched from memory before it can be executed, and most instructions involve retrieving data from memory, storing data in memory, or both.
• The advent of multitasking OSes compounds the complexity of memory management: as processes are swapped in and out of the CPU, their code and data must be swapped in and out of memory as well, all at high speed and without interfering with any other processes.
• Memory management is the functionality of an operating system that handles or manages primary memory and moves processes back and forth between main memory and disk during execution. Memory management keeps track of each and every memory location, regardless of whether it is allocated to some process or free.
Basic Hardware
¨ The CPU can only access its registers and main memory. It cannot, for
example, make direct access to the hard drive, so any data stored there
must first be transferred into the main memory chips before the CPU
can work with it.
¨ Memory accesses to registers are very fast, generally one clock tick,
and a CPU may be able to execute more than one machine instruction
per clock tick.
¨ Memory accesses to main memory are comparatively slow, and may
take a number of clock ticks to complete. This would require
intolerable waiting by the CPU if it were not for an intermediary fast
memory cache built into most modern CPUs. The basic idea of the
cache is to transfer chunks of memory at a time from the main
memory to the cache, and then to access individual memory locations
one at a time from the cache.
• For proper system operation, we must protect the operating system from access by user processes. On multiuser systems, we must additionally protect user processes from one another. This protection must be provided by the hardware, because the operating system doesn't usually intervene between the CPU and its memory accesses.
• Separate per-process memory space protects the processes from each other and is fundamental to having multiple processes loaded in memory for concurrent execution.
• This is usually implemented using a base register and a limit register for each process.
• Every memory access made by a user process is checked against these two registers, and if a memory access is attempted outside the valid range, a fatal error is generated. The OS itself has access to all existing memory locations, as this is necessary to swap users' code and data in and out of memory. Changing the contents of the base and limit registers is therefore a privileged activity, allowed only to the OS kernel.
A base and a limit register define a logical address space

Hardware address protection with base and limit registers
• Protection of memory space is accomplished by having the CPU hardware compare every address generated in user mode with the registers.
• Any attempt by a program executing in user mode to access operating-system memory or other users' memory results in a trap to the operating system, which treats the attempt as a fatal error.
• This scheme prevents a user program from (accidentally or deliberately) modifying the code or data structures of either the operating system or other users.
• The base and limit registers can be loaded only by the operating system, which uses a special privileged instruction. Since privileged instructions can be executed only in kernel mode, and since only the operating system executes in kernel mode, only the operating system can load the base and limit registers.
• This scheme allows the operating system to change the value of the registers but prevents user programs from changing the registers' contents.
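As a concrete illustration, here is a minimal sketch in C of the check the hardware performs on every user-mode access. The register values are sample numbers chosen for illustration, not from any particular machine:

/* Sketch of the hardware base/limit check. The OS loads these two
   registers (via privileged instructions) when dispatching a process. */
#include <stdio.h>
#include <stdint.h>

uint32_t base_reg  = 300040;   /* first legal address for this process */
uint32_t limit_reg = 120900;   /* size of its legal address range      */

/* Returns 1 if the access is legal, 0 if it must trap to the OS. */
int access_ok(uint32_t addr)
{
    /* legal iff base <= addr < base + limit */
    return addr >= base_reg && addr < base_reg + limit_reg;
}

int main(void)
{
    printf("%d\n", access_ok(300040));   /* 1: first legal address         */
    printf("%d\n", access_ok(420940));   /* 0: base + limit, out of range  */
    return 0;
}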
Address Binding
• Usually, a program resides on a disk as a binary executable file. To be executed, the program must be brought into memory and placed within a process. Depending on the memory management in use, the process may be moved between disk and memory during its execution. The processes on the disk that are waiting to be brought into memory for execution form the input queue.
• User programs typically refer to memory addresses with symbolic names (such as the variables i and count). A compiler typically binds these symbolic addresses to relocatable addresses (such as "14 bytes from the beginning of this module"). The linkage editor or loader in turn binds the relocatable addresses to absolute addresses (such as 74014). Each binding is a mapping from one address space to another.
The binding of instructions and data to memory addresses can be done at any step along the way:
• Compile Time - If it is known at compile time where
a program will reside in physical memory, then
absolute code can be generated by the compiler,
containing actual physical addresses. However if the
load address changes at some later time, then the
program will have to be recompiled. MS-DOS .COM
programs use compile time binding.
• Load Time - If the location at which a program will
be loaded is not known at compile time, then the
compiler must generate relocatable code, which
references addresses relative to the start of the
program. If that starting address changes, then the
program must be reloaded but not recompiled.
• Execution Time - If a program can be moved around in memory during the course of its execution, then binding must be delayed until execution time. This requires special hardware, and it is the method implemented by most modern OSes.
Logical Versus Physical Address Space
• An address generated by the CPU is commonly referred to as a logical address, whereas an address seen by the memory unit (that is, the one loaded into the memory-address register of the memory) is commonly referred to as a physical address.
• The compile-time and load-time address-binding methods generate identical logical and physical addresses.
• The execution-time address-binding scheme results in differing logical and physical addresses. In this case the logical address is also known as a virtual address, and the two terms are used interchangeably.
• The set of all logical addresses used by a program composes the logical address space, and the set of all corresponding physical addresses composes the physical address space.
Each physical memory location has an address, and physical addresses start at zero. (Example: a 32-bit computer with 16 MB of memory.) Access to the memory is protected using base and limit registers (hardware protection): base ≤ address < base + limit.

Address binding is the process of translating logical addresses into their corresponding physical addresses:
• The run-time mapping of logical to physical addresses is handled by the memory-management unit (MMU).
• The MMU can take on many forms. One of the simplest is a modification of
the base-register scheme. The base register is now termed a relocation
register, whose value is added to every memory request at the hardware
level.
• Note that user programs never see physical addresses. User programs work
entirely in logical address space, and any memory references or manipulations
are done using purely logical addresses. Only when the address gets sent to the
physical memory chips is the physical memory address generated.
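A minimal sketch of this dynamic relocation, with hypothetical register values (346 → 14346 as the example translation) and -1 standing in for the trap the hardware would raise:

#include <stdint.h>

/* Sketch of MMU translation with a relocation (base) register and a
   limit register; the OS loads both at dispatch time. */
uint32_t relocation_reg = 14000;   /* added to every logical address */
uint32_t limit_reg      = 3000;    /* size of the logical space      */

/* Translate a logical address; returns -1 to stand in for a trap. */
int64_t mmu_translate(uint32_t logical)
{
    if (logical >= limit_reg)
        return -1;                              /* addressing error: trap */
    return (int64_t)relocation_reg + logical;   /* e.g. 346 -> 14346      */
}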
Dynamic Loading
¨ Rather than loading an entire program into memory at
once, dynamic loading loads up each routine as it is called.
¨ The advantage is that unused routines need never be
loaded, reducing total memory usage and generating faster
program startup times.
¨ The downside is the added complexity and overhead of
checking to see if a routine is loaded every time it is called
and then loading it up if it is not already loaded.
¨ Dynamic loading does not require special support from the
operating system. Operating systems may help the
programmer, however, by providing library routines to
implement dynamic loading.
Dynamic Linking and Shared Libraries
¨ Dynamically linked libraries are system libraries that are linked to user
programs when the programs are run.
¨ With static linking, library modules are fully included in executable modules, wasting both disk space and main memory, because every program that includes a certain routine from the library must have its own copy of that routine linked into its executable code.
¨ With dynamic linking, however, only a stub is linked into the executable
module, containing references to the actual library module linked in at
run time.
¨ The stub is a small piece of code that indicates how to locate the
appropriate memory-resident library routine or how to load the library if
the routine is not already present. When the stub is executed, it checks to
see whether the needed routine is already in memory. If it is not, the
program loads the routine into memory.
• If the code section of the library routines is reentrant (meaning it does not modify itself while it runs, making it safe to re-enter), then main memory can be saved by loading only one copy of dynamically linked routines into memory and sharing the code amongst all processes that are concurrently using it. (Each process would have its own copy of the data section of the routines, but that may be small relative to the code segments.) The OS must manage shared routines in memory.
• An added benefit of dynamically linked libraries (DLLs, also known as shared libraries or shared objects on UNIX systems) involves easy upgrades and updates. When a program uses a routine from a standard library and the routine changes, the program must be re-built (re-linked) in order to incorporate the changes. If DLLs are used, however, then as long as the stub doesn't change, the program can be updated merely by loading new versions of the DLLs onto the system. Version information is maintained in both the program and the DLLs, so that a program can specify a particular version of the DLL if necessary.
Swapping
• A process must be in memory to be executed. A process, however, can be swapped temporarily out of memory to a backing store and then brought back into memory for continued execution.
• Swapping makes it possible for the total physical address space of all processes to exceed the real physical memory of the system, thus increasing the degree of multiprogramming in a system.
Standard Swapping
• Standard swapping involves moving processes between main memory and a backing store. The backing store is commonly a fast disk.
• If compile-time or load-time address binding is used, then processes must be swapped back into the same memory location from which they were swapped out. If execution-time binding is used, then processes can be swapped back into any available location.
• Swapping is a very slow process compared to other operations.
• For example, if a user process occupied 10 MB and the transfer rate for the backing store were 40 MB per second, then it would take 1/4 second (250 milliseconds) just to do the data transfer. Adding in a latency lag of 8 milliseconds per transfer, ignoring head seek time for the moment, and recognizing that swapping involves moving old data out as well as new data in, the overall transfer time required for this swap is 2 × (250 + 8) = 516 milliseconds, or over half a second. For efficient processor scheduling, the CPU time slice should be significantly longer than this lost transfer time.
• To reduce swapping transfer overhead, we want to transfer as little information as possible, which requires that the system know how much memory a process is actually using, as opposed to how much it might use. Programmers can help by freeing dynamic memory that they are no longer using.
• It is important to swap processes out of memory only when they are idle, or more to the point, only when there are no pending I/O operations. (Otherwise the pending I/O operation could write into the wrong process's memory space.) The solution is either to swap only totally idle processes, or to do I/O operations only into and out of OS buffers, which are then transferred to or from the process's main memory as a second step.
• Most modern OSes no longer use standard swapping, because it is too slow and there are faster alternatives available (e.g., paging).
Swapping on Mobile Systems
Swapping is typically not supported on mobile platforms, for several
reasons:
• Mobile devices typically use flash memory in place of more spacious hard
drives for persistent storage, so there is not as much space available.
• Flash memory can only be written to a limited number of times before it
becomes unreliable.
• The bandwidth to flash memory is also lower.
Apple's iOS asks applications to voluntarily free up memory:
• Read-only data (e.g., code) is simply removed, and reloaded later if needed.
• Modified data (e.g., the stack) is never removed, but . . .
• Apps that fail to free up sufficient memory can be terminated by the OS.
Android follows a similar strategy:
• Prior to terminating a process, Android writes its application state to flash memory for quick restarting.
Contiguous Memory Allocation
• The main memory must accommodate both the operating system and the various user processes, so main memory needs to be allocated in the most efficient way possible.
• The memory is usually divided into two partitions: one for the resident operating system and one for the user processes.
• The operating system is allocated space first, usually at either low or high memory locations, and then the remaining available memory is allocated to processes as needed. (The OS is usually loaded low, because that is where the interrupt vectors are located, but on older systems part of the OS was loaded high to make more room in low memory, within the 640K barrier, for user processes.)
• One approach to memory management is to load each process into a contiguous space.
• In contiguous memory allocation, each process is contained in a single section of memory that is contiguous to the section containing the next process.
Memory Protection
• When the CPU scheduler selects a process for execution, the dispatcher loads the relocation and limit registers with the correct values as part of the context switch. Because every address generated by a CPU is checked against these registers, we can protect both the operating system and the other users' programs and data from being modified by this running process.

Hardware support for relocation and limit registers


Memory Allocation
• One of the simplest methods for allocating memory is to divide
memory into several fixed-sized partitions. Each partition may
contain exactly one process. Thus, the degree of multiprogramming
is bound by the number of partitions. In this multiple partition
method, when a partition is free, a process is selected from the
input queue and is loaded into the free partition. When the process
terminates, the partition becomes available for another process.
• As processes enter the system, they are put into an input queue.
The operating system takes into account the memory requirements
of each process and the amount of available memory space in
determining which processes are allocated memory. When a
process is allocated space, it is loaded into memory, and it can then
compete for CPU time. When a process terminates, it releases its
memory, which the operating system may then fill with another
process from the input queue.
• The memory blocks available comprise a set of holes of various sizes
scattered throughout memory. When a process arrives and needs
memory, the system searches the set for a hole that is large enough for
this process. If the hole is too large, it is split into two parts. One part is
allocated to the arriving process; the other is returned to the set of holes.
When a process terminates, it releases its block of memory, which is then
placed back in the set of holes. If the new hole is adjacent to other holes,
these adjacent holes are merged to form one larger hole.
Another approach is to keep a list of unused (free) memory blocks (holes), and to find a hole of a suitable size whenever a process needs to be loaded into memory. There are many different strategies for finding the "best" allocation of memory to processes, including the three most commonly used, sketched in code after this list:
• First fit - Search the list of holes until one is found that is big enough to
satisfy the request, and assign a portion of that hole to that process.
Whatever fraction of the hole not needed by the request is left on the
free list as a smaller hole. Subsequent requests may start looking
either from the beginning of the list or from the point at which this
search ended.
• Best fit - Allocate the smallest hole that is big enough to satisfy the
request. This saves large holes for other process requests that may
need them later, but the resulting unused portions of holes may be too
small to be of any use, and will therefore be wasted. Keeping the free
list sorted can speed up the process of finding the right hole.
• Worst fit - Allocate the largest hole available, thereby increasing the
likelihood that the remaining portion will be usable for satisfying
future requests.
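The three strategies can be sketched as below, scanning a hypothetical array of free holes (sizes in KB, chosen to match the partition exercise at the end of this chapter; splitting and coalescing of holes is omitted):

#include <stddef.h>

typedef struct { size_t start, size; } hole_t;

/* Hypothetical free list: 100, 500, 200, 300, 600 KB holes. */
#define NHOLES 5
hole_t holes[NHOLES] = {
    {0, 100}, {100, 500}, {600, 200}, {800, 300}, {1100, 600}
};

/* Each function returns the index of the chosen hole, or -1 if none fits. */
int first_fit(size_t req) {
    for (int i = 0; i < NHOLES; i++)
        if (holes[i].size >= req) return i;      /* first big enough */
    return -1;
}

int best_fit(size_t req) {
    int best = -1;
    for (int i = 0; i < NHOLES; i++)
        if (holes[i].size >= req &&
            (best == -1 || holes[i].size < holes[best].size))
            best = i;                             /* smallest that fits */
    return best;
}

int worst_fit(size_t req) {
    int worst = -1;
    for (int i = 0; i < NHOLES; i++)
        if (holes[i].size >= req &&
            (worst == -1 || holes[i].size > holes[worst].size))
            worst = i;                            /* largest available */
    return worst;
}

For a 212 KB request against this list, first_fit picks the 500 KB hole, best_fit the 300 KB hole, and worst_fit the 600 KB hole.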
Fragmentation
• Both the first-fit and best-fit strategies for memory allocation
suffer from external fragmentation. As processes are loaded and
removed from memory, the free memory space is broken into little
pieces. External fragmentation exists when there is enough total
memory space to satisfy a request but the available spaces are not
contiguous: storage is fragmented into a large number of small
holes.
• The amount of memory lost to fragmentation may vary with
algorithm, usage patterns, and some design decisions such as
which end of a hole to allocate and which end to save on the free
list.
• Statistical analysis of first fit, for example, shows that for N blocks
of allocated memory, another 0.5 N will be lost to fragmentation.
This property is known as the 50-percent rule.
• Memory fragmentation can be internal as well as external. Internal fragmentation occurs with all memory-allocation strategies; it is caused by the fact that memory is allocated in blocks of a fixed size, whereas the actual memory needed will rarely be exactly that size.
• One solution to the problem of external fragmentation is compaction. The goal is to shuffle the memory contents so as to place all free memory together in one large block, i.e., to move all processes down to one end of physical memory. With execution-time binding this only involves updating the relocation register for each process, as all internal work is done using logical addresses. Compaction is not always possible; it requires dynamic (execution-time) relocation.
Segmentation
• Most users (programmers) do not think of their programs as
existing in one continuous linear address space. Rather they tend
to think of their memory in multiple segments, each dedicated to a
particular use, such as code, data, the stack, the heap, etc.
• Memory segmentation supports this view by providing addresses
with a segment number (mapped to a segment base address) and
an offset from the beginning of that segment.
• The programmer therefore specifies each address by two quantities: a segment name and an offset. For simplicity of implementation, segments are numbered and referred to by a segment number rather than by a segment name. Thus, a logical address consists of a two-tuple:
<segment-number, offset>.
• A segment table maps segment-offset addresses to physical addresses, and simultaneously checks for invalid addresses, using a system similar to the page tables and relocation/base registers.
• A logical address consists of two parts: a segment number, s, and an offset into that segment, d. The segment number is used as an index into the segment table. The offset d of the logical address must be between 0 and the segment limit.
• If it is not, we trap to the operating system (logical addressing attempt beyond end of segment). When an offset is legal, it is added to the segment base to produce the address in physical memory of the desired byte. The segment table is thus essentially an array of base-limit register pairs.
Each entry in the segment table has a segment base and a segment limit.
The segment base contains the starting physical address where the
segment resides in memory, and the segment limit specifies the length of
the segment.
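In code form, the check-and-add performed on every reference might look like the following sketch (the table contents and the -1 trap convention are hypothetical; real hardware does this check in parallel with the access):

#include <stdint.h>

typedef struct { uint32_t base, limit; } seg_entry;

#define NSEGS 5
seg_entry seg_table[NSEGS];   /* filled in by the OS per process */

/* Translate <s, d>; returns the physical address, or -1 for a trap. */
int64_t seg_translate(uint32_t s, uint32_t d)
{
    if (s >= NSEGS || d >= seg_table[s].limit)
        return -1;                            /* trap: beyond end of segment */
    return (int64_t)seg_table[s].base + d;    /* segment base + offset       */
}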
Consider the segment table shown in the figure. Which of the following logical addresses will produce a trap (addressing error)? Calculate the physical address if no trap is produced.
a) 0, 430
b) 1, 11
c) 2, 100
d) 3, 425
e) 4, 95
Paging
• Segmentation permits the physical address space of a
process to be noncontiguous. Paging is another memory-
management scheme that offers this advantage.
• Paging is a memory-management scheme that allows a process's physical memory to be noncontiguous, and which eliminates external fragmentation by allocating memory in equal-sized blocks known as pages.
• Paging avoids most of the problems of the other methods and is the predominant memory-management technique used today.
• The basic method for implementing paging involves breaking
physical memory into fixed-sized blocks called frames and
breaking logical memory into blocks of the same size called
pages. When a process is to be executed, its pages are loaded
into any available memory frames from their source (a file
system or the backing store).
• The backing store is divided into fixed-sized blocks that are the
same size as the memory frames or clusters of multiple frames.
• Every address generated by the CPU is divided into two parts: a
page number (p) and a page offset (d). The page number is
used as an index into a page table. The page table contains the
base address of each page in physical memory. This base
address is combined with the page offset to define the physical
memory address that is sent to the memory unit.
Paging
hardware

Paging model of
logical and physical
memory
• The page size (like the frame size) is defined by the hardware. The size of a page is a power of 2, varying between 512 bytes and 1 GB per page, depending on the computer architecture.
• The selection of a power of 2 as a page size makes the translation of a logical address into a page number and page offset particularly easy. If the size of the logical address space is 2^m, and the page size is 2^n bytes, then the high-order m − n bits of a logical address designate the page number, and the n low-order bits designate the page offset.
• Thus, the logical address is as follows:

    | page number p (m − n bits) | page offset d (n bits) |

where p is an index into the page table and d is the displacement within the page.
• The number of bits in the page number and the number of bits in the frame number do not have to be identical. The former determines the address range of the logical address space, and the latter relates to the physical address space.
• Consider the example in which a process has 16 bytes of logical memory, mapped in 4-byte pages into 32 bytes of physical memory (8 frames). In the logical address of this example, n = 2 and m = 4.
• Logical address 0 is page 0, offset 0. Indexing into the page table, we find that page 0 is in frame 5. Thus, logical address 0 maps to physical address 20 [= (5 × 4) + 0].
• Logical address 3 (page 0, offset 3) maps to physical address 23 [= (5 × 4) + 3].
• Logical address 4 is page 1, offset 0; according to the page table, page 1 is mapped to frame 6. Thus, logical address 4 maps to physical address 24 [= (6 × 4) + 0].
• Logical address 13 maps to physical address 9.
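The example's translations can be reproduced with a short sketch. Pages 0, 1, and 3 map to frames 5, 6, and 2 per the mappings above; page 2's frame (1 here) is assumed from the usual textbook figure and is not exercised by these four addresses:

#include <stdio.h>
#include <stdint.h>

enum { PAGE_BITS = 2, PAGE_SIZE = 1 << PAGE_BITS };  /* n = 2: 4-byte pages */
uint32_t page_table[4] = { 5, 6, 1, 2 };             /* page -> frame       */

uint32_t translate(uint32_t logical)
{
    uint32_t p = logical >> PAGE_BITS;        /* high-order m - n bits */
    uint32_t d = logical & (PAGE_SIZE - 1);   /* low-order n bits      */
    return page_table[p] * PAGE_SIZE + d;     /* frame base + offset   */
}

int main(void)
{
    printf("%u %u %u %u\n",
           translate(0), translate(3), translate(4), translate(13));
    /* prints: 20 23 24 9 */
    return 0;
}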
• Paging itself is a form of dynamic relocation. Every logical address is bound by the paging hardware to some physical address.
• When a paging scheme is used, there is no external fragmentation: any free frame can be allocated to a process that needs it. However, we may have some internal fragmentation.
• In the worst case, a process would need n pages plus 1 byte. It would be allocated n + 1 frames, resulting in internal fragmentation of almost an entire frame.
• Larger page sizes waste more memory, but are more efficient in terms of overhead. Modern trends have been to increase page sizes, and some systems even support multiple page sizes to try to get the best of both worlds.
• Page-table entries (frame numbers) are typically 32-bit numbers, allowing access to 2^32 physical page frames. If those frames are 4 KB in size each, that translates to 16 TB of addressable physical memory (32 + 12 = 44 bits of physical address space).
Paging Examples

Assume a page size of 1K and a 15-bit logical address space. How many pages are in the system?
Answer: 2^5 = 32
Assuming a 15-bit address space with 8 logical pages, how large are the pages?
Answer: 2^12 = 4K
Consider a logical address space of 8 pages of 1024 words mapped into a memory of 32 frames.
a) How many bits are there in the logical address?
b) How many bits are there in the physical address?
Answer: a) 13 bits  b) 15 bits
When a process arrives in the system to be executed, its size, expressed in pages, is
examined. Each page of the process needs one frame. Thus, if the process requires n
pages, at least n frames must be available in memory. If n frames are available, they
are allocated to this arriving process. The first page of the process is loaded into one
of the allocated frames, and the frame number is put in the page table for this
process. The next page is loaded into another frame, its frame number is put into the
page table, and so on.

Free frames
(a) before allocation
(b) after allocation
• Divide physical memory into fixed-sized blocks called frames (size is a power of 2, between 512 bytes and 8192 bytes).
• Divide logical memory into blocks of the same size, called pages.
• Keep track of all free frames.
• To run a program of size n pages, find n free frames and load the program.
• Set up a page table to translate logical to physical addresses.
• An important aspect of paging is the clear separation between the programmer's view of memory and the actual physical memory.
• Processes are blocked from accessing anyone else's memory because all of their memory requests are mapped through their page table. There is no way for them to generate an address that maps into any other process's memory space.
• The OS must be aware of the allocation details of physical memory: which frames are allocated, which frames are available, how many total frames there are, and so on.
• This information is generally kept in a data structure called a frame table. The frame table has one entry for each physical page frame, indicating whether the frame is free or allocated and, if it is allocated, to which page of which process or processes.
• The operating system must also keep track of each individual process's page table, updating it whenever the process's pages get moved in and out of memory, and applying the correct page table when processing system calls for a particular process. This all increases the overhead involved when swapping processes in and out of the CPU.
Hardware Support
• Page lookups must be done for every memory reference, and
whenever a process gets swapped in or out of the CPU, its page
table must be swapped in and out too, along with the instruction
registers, etc. It is therefore appropriate to provide hardware
support for this operation, in order to make it as fast as possible
and to make process switches as fast as possible also.
• The hardware implementation of the page table can be done in
several ways. In the simplest case, the page table is implemented
as a set of dedicated registers. These registers should be built with
very high-speed logic to make the paging-address translation
efficient.
• For example, the DEC PDP-11 uses 16-bit addressing and 8 KB pages, resulting in only 8 pages per process. (It takes 13 bits to address 8 KB of offset, leaving only 3 bits to define a page number.)
• An alternative is to store the page table in main memory and to use a single register (called the page-table base register, PTBR) to record where in memory the page table is located.
  • Process switching is fast, because only the single register needs to be changed.
  • However, memory access is now twice as slow, because every logical memory access requires two physical memory accesses: one to fetch the frame number from the page table in memory, and then another to access the desired memory location.
• The solution to this problem is to use a very special, small, fast-lookup hardware cache called the translation look-aside buffer (TLB).
  • The benefit of the TLB is that it can search an entire table for a key value in parallel, and if the key is found anywhere in the table, then the corresponding lookup value is returned.
• The TLB is associative, high-speed memory. Each entry in the TLB consists of two parts: a key (or tag) and a value. When the associative memory is presented with an item, the item is compared with all keys simultaneously. If the item is found, the corresponding value field is returned.
• The search is fast; a TLB lookup in modern hardware is part of the instruction pipeline, essentially adding no performance penalty. To be able to execute the search within a pipeline step, however, the TLB must be kept small. It is typically between 32 and 1,024 entries in size.
• The TLB is used with page tables in the following way. The TLB contains only a few of the page-table entries. When a logical address is generated by the CPU, its page number is presented to the TLB. If the page number is found, its frame number is immediately available and is used to access memory.
• If the page number is not in the TLB (known as a TLB miss), a memory reference to the page table must be made. Depending on the CPU, this may be done automatically in hardware or via an interrupt to the operating system.
• In addition, we add the page number and frame number to the TLB, so that they will be found quickly on the next reference. If the TLB is already full of entries, an existing entry must be selected for replacement.
• Replacement policies range from least recently used (LRU) through round-robin to random.
• Some TLBs allow certain entries to be wired down, meaning that they cannot be removed from the TLB. Typically, TLB entries for key kernel code are wired down.
• Some TLBs store address-space identifiers (ASIDs) in each TLB entry. An ASID uniquely identifies each process and is used to provide address-space protection for that process.
• When the TLB attempts to resolve virtual page numbers, it ensures that the ASID for the currently running process matches the ASID associated with the virtual page. If the ASIDs do not match, the attempt is treated as a TLB miss.
• In addition to providing address-space protection, an ASID allows the TLB to contain entries for several different processes simultaneously. Without this feature, the TLB has to be flushed clean on every process switch.
• The percentage of times that the page number of interest is found in the TLB is called the hit ratio.
• For example, suppose that it takes 100 nanoseconds to access main memory. Then a TLB hit takes 100 nanoseconds total (100 to get the data), and a TLB miss takes 200 (100 to get the frame number from the page table, and then another 100 to get the data). So with an 80% TLB hit ratio, the average memory-access time would be:
  effective access time = 0.80 × 100 + 0.20 × 200 = 120 nanoseconds
We suffer a 20% slowdown in average memory-access time.
• A 99% hit ratio would yield 101 nanoseconds average access time, for a 1% slowdown:
  effective access time = 0.99 × 100 + 0.01 × 200 = 101 nanoseconds
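The same computation as a small helper, a sketch assuming (as in the figures above) that the TLB lookup itself costs nothing and a miss costs one extra memory access:

#include <stdio.h>

/* Effective access time, given the hit ratio and the cost of one
   memory access in ns; a miss adds one extra page-table access. */
double eat(double hit_ratio, double mem_ns)
{
    return hit_ratio * mem_ns + (1.0 - hit_ratio) * (2.0 * mem_ns);
}

int main(void)
{
    printf("%.0f ns\n", eat(0.80, 100.0));   /* 120 ns */
    printf("%.0f ns\n", eat(0.99, 100.0));   /* 101 ns */
    return 0;
}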
TLBs are a hardware feature and therefore would
seem to be of little concern to operating systems and
their designers. But the designer needs to understand
the function and features of TLBs, which vary by
hardware platform.
Consider a system with an 80% hit ratio, a 50-nanosecond time to search the associative registers, and a 750-nanosecond time to access memory. Find the time to access a page
a) when the page number is in associative memory;
b) when the page number is not in associative memory;
c) and find the effective memory-access time.

a) The time required is 50 nanoseconds to get the page number from associative memory and 750 nanoseconds to read the desired word from memory: Time = 50 + 750 = 800 nanoseconds.
b) When the page number is not in associative memory, one extra memory access is required to read the page table from memory: Time = 50 + 750 + 750 = 1550 nanoseconds.
c) Effective access time = 0.8 × 800 + 0.2 × 1550 = 950 nanoseconds.
Protection
• Memory protection in a paged environment is accomplished by
protection bits associated with each frame.
• A bit or bits can be added to the page table to classify a page as
read-write, read-only, read-write-execute, or some combination of
these sorts of things. Then each memory reference can be checked
to ensure it is accessing the memory in the appropriate mode.
• One additional bit, the valid-invalid bit, is generally attached to each entry in the page table.
• When this bit is set to valid, the associated page is in the process's logical address space and is thus a legal (valid) page. When the bit is set to invalid, the page is not in the process's logical address space. Illegal addresses are trapped by use of the valid-invalid bit. The operating system sets this bit for each page to allow or disallow access to the page.
In a system with a 14-bit address space (0 to 16383), we have a program that should use only addresses 0 to 10468. Given a page size of 2 KB, addresses in pages 0, 1, 2, 3, 4, and 5 are mapped normally through the page table. Any attempt to generate an address in pages 6 or 7, however, will find that the valid-invalid bit is set to invalid, and the computer will trap to the operating system (invalid page reference).

Because the program extends only to address 10468, any reference beyond that address is illegal. However, references to page 5 are classified as valid, so accesses to addresses up to 12287 are valid; only the addresses from 12288 to 16383 are invalid. This reflects the internal fragmentation of paging: areas of memory in the last page are not entirely filled by the process, and may contain data left over from whoever used that frame last.
• Many processes use only a small fraction of the address space available to them. It would be wasteful in these cases to create a page table with entries for every page in the address range; most of this table would be unused but would take up valuable memory space.
• Some systems provide hardware, in the form of a page-table length register (PTLR), to indicate the size of the page table. This value is checked against every logical address to verify that the address is in the valid range for the process. Failure of this test causes an error trap to the operating system.
Shared Pages
• Paging systems can make it very easy to share blocks of memory, by mapping the same physical frame into the page tables of multiple processes. This may be done with either code or data.
• If code is reentrant, that means it does not write to or change the code in any way (it is non-self-modifying), and it is therefore safe to re-enter it. More importantly, it means the code can be shared by multiple processes, so long as each has its own copy of the data and registers, including the instruction register.
• In the example, three different users are running the editor simultaneously, but the code is loaded into memory (in the page frames) only once.
Consider a system that supports 40 users, each of whom executes a text editor. If the text editor consists of 150 KB of code and 50 KB of data space, we would need 8,000 KB to support the 40 users. If the code is reentrant (pure) code, however, it can be shared: one copy of the 150 KB of code plus 40 copies of the 50 KB data space require only 2,150 KB. The figure shows three processes sharing a three-page editor, each page 50 KB in size; each process has its own data page.
Structure of a Paging Table
Hierarchical Paging
• Most modern computer systems support logical address spaces of 2^32 to 2^64.
• With a 2^32 address space and 4K (2^12) pages, this leaves 2^20 entries in the page table. At 4 bytes per entry, this amounts to a 4 MB page table, which is too large to reasonably keep in contiguous memory. (Note that with 4K pages, it would take 1024 pages just to hold the page table!)
• One simple solution to this problem is to divide the page table into smaller pieces.
• One way is to use a two-level paging algorithm, in which the page table itself is also paged.
For example, consider again the system with a 32-bit logical address space and a page size of 4 KB. A logical address is divided into a page number consisting of 20 bits and a page offset consisting of 12 bits.
• Because we page the page table, the page number is further divided into two 10-bit page numbers.
• The first identifies an entry in the outer page table, which identifies where in memory to find one page of the inner page table.
• The second 10 bits find a specific entry in that inner page table, which in turn identifies a particular frame in physical memory.
• The remaining 12 bits of the 32-bit logical address are the offset within the 4K frame.
p1 is an index into the outer page table and p2 is the displacement within the
page of the inner page table. The address-translation method for this
architecture is shown below:

Because address translation works from the outer page table inward,
this scheme is also known as a forward-mapped page table.
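Extracting p1, p2, and d from a 32-bit address is a matter of shifts and masks, as in this sketch of the layout just described:

#include <stdint.h>

/* Split a 32-bit logical address into 10-bit p1, 10-bit p2, and a
   12-bit offset d, per the two-level layout described above. */
void split(uint32_t addr, uint32_t *p1, uint32_t *p2, uint32_t *d)
{
    *p1 = (addr >> 22) & 0x3FF;   /* index into the outer page table */
    *p2 = (addr >> 12) & 0x3FF;   /* index into the inner page table */
    *d  =  addr        & 0xFFF;   /* offset within the 4 KB frame    */
}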
• The VAX minicomputer from Digital Equipment Corporation (DEC) was
the most popular minicomputer of its time and was sold from 1977 through
2000. The VAX architecture supported a variation of two-level paging. The
VAX is a 32-bit machine with a page size of 512 bytes.

• The logical address space of a process is divided into four equal sections, each of which consists of 2^30 bytes. Each section represents a different part of the logical address space of a process. The first 2 high-order bits of the logical address designate the appropriate section, the next 21 bits represent the logical page number within that section, and the final 9 bits represent an offset in the desired page.
• By partitioning the page table in this manner, the operating system can
leave partitions unused until a process needs them. Entire sections of virtual
address space are frequently unused, and multilevel page tables have no
entries for these spaces, greatly decreasing the amount of memory needed to
store virtual memory data structures.
With a 64-bit logical address space and 4K pages, there are 52 bits' worth of page numbers, which is still too many entries even for two-level paging. One could add more paging levels, but with 10-bit page-table indexes it would take 7 levels of indirection, which would make memory access prohibitively slow. So some other approach must be used.
• With 64 bits, a two-tiered scheme leaves 42 bits in the outer table.
• We can page the outer page table, giving us a three-level paging scheme, but this still leaves 32 bits in the outermost table, which is therefore still 2^34 bytes (16 GB) in size.
• The next step would be a four-level paging scheme, where the second-level outer page table itself is also paged, and so forth.
• The 64-bit UltraSPARC would require seven levels of paging (a prohibitive number of memory accesses) to translate each logical address.
Hashed Page Tables
⁂A common approach for handling address spaces larger
than 32 bits is to use a hashed page table, with the hash
value being the virtual page number.
⁂Each entry in the hash table contains a linked list of
elements that hash to the same location (to handle
collisions).
⁂Each element consists of three fields:
(1) the virtual page number,
(2) the value of the mapped page frame,
(3) a pointer to the next element in the linked list.
The virtual page number in the virtual address is hashed into the hash table.
The virtual page number is compared with field 1 in the first element in the
linked list. If there is a match, the corresponding page frame (field 2) is used to
form the desired physical address. If there is no match, subsequent entries in
the linked list are searched for a matching virtual page number.

The figure above illustrates a hashed page table using chain-and-bucket hashing.
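A sketch of the lookup, with the three fields of each element as listed above; the bucket count and field types are hypothetical:

#include <stdint.h>
#include <stddef.h>

typedef struct hpt_entry {
    uint64_t vpn;               /* (1) virtual page number           */
    uint64_t frame;             /* (2) value of the mapped frame     */
    struct hpt_entry *next;     /* (3) next element in the chain     */
} hpt_entry;

#define HPT_BUCKETS 1024
hpt_entry *hpt[HPT_BUCKETS];    /* hash table of collision chains    */

/* Walk the chain for vpn's bucket; -1 means unmapped (fault path). */
int64_t hpt_lookup(uint64_t vpn)
{
    for (hpt_entry *e = hpt[vpn % HPT_BUCKETS]; e != NULL; e = e->next)
        if (e->vpn == vpn)
            return (int64_t)e->frame;
    return -1;
}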
A variation of this scheme that is useful for 64-bit address spaces uses
clustered page tables, which are similar to hashed page tables except that each
entry in the hash table refers to several pages (such as 16) rather than a single
page. Therefore, a single page-table entry can store the mappings for multiple
physical-page frames. Clustered page tables are particularly useful for sparse
address spaces, where memory references are noncontiguous and scattered
throughout the address space.
Inverted Page Tables
⁂Each process has an associated page table. The page table
has one entry for each page that the process is using.
⁂Since the table is sorted by virtual address, the operating
system is able to calculate where in the table the associated
physical address entry is located and to use that value
directly.
⁂One of the drawbacks of this method is that each page table
may consist of millions of entries. These tables may
consume large amounts of physical memory just to keep
track of how other physical memory is being used.
• A solution to this problem is to use an inverted page table. An
inverted page table has one entry for each real page (or frame) of
memory.
• Each entry consists of the virtual address of the page stored in that
real memory location, with information about the process that owns
the page. Thus, only one page table is in the system, and it has only
one entry for each page of physical memory.
• Inverted page tables often require that an address-space identifier be stored in each entry of the page table, since the table usually contains several different address spaces mapping physical memory.
• Storing the address-space identifier ensures that a logical page for a particular process is mapped to the corresponding physical page frame.
• Examples of systems using inverted page tables include the 64-bit UltraSPARC and PowerPC.
• Access to an inverted page table can be slow, as it may be necessary to search the entire table in order to find the desired page (or to discover that it is not there). Hashing the table can help speed up the search.
• Inverted page tables prohibit the normal method of implementing shared memory, which is to map multiple logical pages to a common physical frame, because each frame is now mapped to one and only one process.
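A sketch of the search: the index of the matching entry is itself the frame number. The table size and field types are hypothetical, and the linear scan is exactly the slowness (and the candidate for hashing) described above:

#include <stdint.h>

typedef struct { uint32_t pid; uint64_t vpn; } ipt_entry;

#define NFRAMES 4096
ipt_entry ipt[NFRAMES];       /* one entry per physical frame */

/* Find the frame holding <pid, vpn>; -1 means not resident (page fault). */
int64_t ipt_lookup(uint32_t pid, uint64_t vpn)
{
    for (int64_t f = 0; f < NFRAMES; f++)
        if (ipt[f].pid == pid && ipt[f].vpn == vpn)
            return f;         /* frame number == table index */
    return -1;
}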
Practice Exercises
Assuming a 1-KB page size, what are the page numbers and offsets for the following address references (provided as decimal numbers)?
A. 2375  B. 19366  C. 30000  D. 256  E. 16285
Answer:
Page size = 2^n = 1024 B = 2^10 B, so the offset field has n = 10 bits; the page number is address div 1024 and the offset is address mod 1024:
A. page 2, offset 327
B. page 18, offset 934
C. page 29, offset 304
D. page 0, offset 256
E. page 15, offset 925
Assuming a 1-KB page size, what are the page numbers and offsets for the following address references (provided as decimal numbers)?
A. 3085  B. 42095  C. 215201  D. 650000  E. 2000001
Given five memory partitions of 100 KB, 500 KB, 200 KB, 300 KB, and 600 KB (in order), how would each of the first-fit, best-fit, and worst-fit algorithms place processes of 212 KB, 417 KB, 112 KB, and 426 KB (in order)? Which algorithm makes the most efficient use of memory?

First fit: 212 KB → 500 KB; 417 KB → 600 KB; 112 KB → 200 KB; 426 KB must wait.
Best fit: 212 KB → 300 KB; 417 KB → 500 KB; 112 KB → 200 KB; 426 KB → 600 KB.
Worst fit: 212 KB → 600 KB; 417 KB → 500 KB; 112 KB → 300 KB; 426 KB must wait.
In this example, best fit turns out to be the best, because it is the only algorithm that places all four processes and leaves no process waiting.
Consider a paging system with the page table stored in memory.
a) If a memory reference takes 200 nanoseconds, how long does a paged memory reference take?
b) If we add associative registers, and 75 percent of all page-table references are found in the associative registers, what is the effective memory reference time? (Assume that finding a page-table entry in the associative registers takes zero time, if the entry is there.)
Answer:
a) 400 nanoseconds: 200 to access the page table in memory and 200 to access the word itself.
b) Effective access time = 0.75 × 200 + 0.25 × 400 = 250 nanoseconds.
Consider a logical address space of 64 pages of 1024 words each, mapped onto a physical memory of 32 frames.
a. How many bits are there in the logical address?
b. How many bits are there in the physical address?
Answer: a) 16 bits (6 bits of page number + 10 bits of offset)
b) 15 bits (5 bits of frame number + 10 bits of offset)

Consider a computer system with a 32-bit logical address and 4-KB page size. The system supports up to 512 MB of physical memory. How many entries are there in a conventional single-level page table?
Answer: 2^32 / 2^12 = 2^20 entries.
What is the size of the physical address space in a paging system which has a page table containing 64 entries of 11 bits each (including the valid-invalid bit) and a page size of 512 bytes?
A. 2^11  B. 2^15  C. 2^19  D. 2^20
Answer: C. Each entry leaves 10 bits for the frame number, so the physical address space is 2^10 frames × 2^9 bytes = 2^19 bytes.

Using the page table shown below, translate physical address 25 to a virtual address. The address length is 16 bits and the page size is 2048 words, while the size of the physical memory is four frames.

Page   Present (1 = in, 0 = out)   Frame
0      1                           3
1      1                           2
2      1                           0
3      0                           –

A. 25  B. 6125  C. 2073  D. 4121
Answer: D. Physical address 25 is offset 25 in frame 0; frame 0 holds page 2, so the virtual address is 2 × 2048 + 25 = 4121.
