
Memory Organization

1.0 Hierarchy of Storage


Today’s computer systems use a combination of memory types to provide the best
performance at the best cost. This approach is called hierarchical memory. As a rule,
the faster memory is, the more expensive it is per bit of storage. By using a hierarchy
of memories, each with different access speeds and storage capacities, a computer
system can exhibit performance above what would be possible without a combination
of the various types.
We classify memory based on its “distance” from the processor, with distance measured
by the number of machine cycles required for access. The closer memory is to the
processor, the faster it should be. As memory gets further from the main processor, we
can afford longer access times. Thus, slower technologies are used for these memories,
and faster technologies are used for memories closer to the CPU. The better the
technology, the faster and more expensive the memory becomes. Thus, faster
memories tend to be smaller than slower ones, due to cost.

The following terminology is used when referring to this memory hierarchy:
• Hit—The requested data resides in a given level of memory (typically, we are concerned with the hit rate only for upper levels of memory).
• Miss—The requested data is not found in the given level of memory.
• Hit rate—The percentage of memory accesses found in a given level of memory.
• Miss rate—The percentage of memory accesses not found in a given level of memory. Note: Miss Rate = 1 − Hit Rate.
• Hit time—The time required to access the requested information in a given level of memory.
• Miss penalty—The time required to process a miss, which includes replacing a block in an upper level of memory, plus the additional time to deliver the requested data to the processor. (The time to process a miss is typically significantly larger than the time to process a hit.)
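These quantities are commonly combined into an effective (average) access time: every access pays the hit time, and a miss additionally pays the miss penalty. The short Python sketch below shows the arithmetic; the access times and hit rate used are illustrative assumptions, not figures from this text.

```python
def average_access_time(hit_time_ns, miss_penalty_ns, hit_rate):
    """Every access pays the hit time; a miss additionally pays the miss penalty."""
    miss_rate = 1 - hit_rate              # Miss Rate = 1 - Hit Rate
    return hit_time_ns + miss_rate * miss_penalty_ns

# Illustrative numbers (not from the text): 10 ns hit time,
# 70 ns miss penalty, 95% hit rate.
print(average_access_time(10, 70, 0.95))  # about 13.5 ns on average
```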

Primary storage
Primary storage (or main memory or internal memory), often referred to simply
as memory, is the only one directly accessible to the CPU. The CPU continuously reads
instructions stored there and executes them as required. Any data actively operated on
is also stored there in a uniform manner.
Historically, early computers used delay lines, Williams tubes, or rotating magnetic
drums as primary storage. By 1954, those unreliable methods were mostly replaced by
magnetic core memory. Core memory remained dominant until the 1970s, when
advances in integrated circuit technology allowed semiconductor memory to become
economically competitive.
This led to modern random-access memory (RAM). RAM is small and light, but relatively expensive per bit. (The particular types of RAM used for primary storage are also volatile, i.e. they lose their contents when not powered.)
Traditionally, there are two more sublayers of primary storage besides the main large-capacity RAM:
Processor registers are located inside the processor. Each register typically holds a word of data (often 32 or 64 bits). CPU instructions direct the arithmetic and logic unit to perform various calculations or other operations on, or with the help of, this data. Registers are the fastest of all forms of computer data storage.
Processor cache is an intermediate stage between the ultra-fast registers and the much slower main memory. It is introduced solely to increase the performance of the computer. The most actively used information in main memory is duplicated in the cache, which is faster but of much smaller capacity; conversely, the cache is much slower but much larger than the processor registers. A multi-level hierarchical cache setup is also commonly used—the primary cache is the smallest and fastest and is located inside the processor; the secondary cache is somewhat larger and slower.
Main memory is directly or indirectly connected to the central processing unit via a memory bus. This is actually two buses: an address bus and a data bus. The CPU first sends a number over the address bus, called the memory address, that indicates the desired location of the data. Then it reads or writes the data itself over the data bus. Additionally, a memory management unit (MMU) is a small device between the CPU and RAM that recalculates the actual memory address, for example to provide the abstraction of virtual memory, among other tasks.
As the RAM types used for primary storage are volatile (cleared at start up), a computer
containing only such storage would not have a source to read instructions from, in order
to start the computer. Hence, non-volatile primary storage containing a small startup
program (BIOS) is used to bootstrap the computer, that is, to read a larger program
from non-volatile secondary storage to RAM and start to execute it. A non-volatile
technology used for this purpose is called ROM, for read-only memory (the terminology
may be somewhat confusing as most ROM types are also capable of random access).
Many types of "ROM" are not literally read only, as updates are possible; however, writing is slow, and the memory must be erased in large portions before it can be rewritten. Some embedded systems run programs directly from ROM (or similar), because such programs are rarely changed. Standard computers do not store non-rudimentary programs in ROM, but rather use large capacities of secondary storage, which is non-volatile as well and not as costly.
Recently, primary storage and secondary storage in some uses refer to what was
historically called, respectively, secondary storage and tertiary storage.
Secondary storage
Secondary storage (also known as external memory or auxiliary storage), differs from
primary storage in that it is not directly accessible by the CPU. The computer usually
uses its input/output channels to access secondary storage and transfers the desired
data using intermediate area in primary storage. Secondary storage does not lose the
data when the device is powered down—it is non-volatile. Per unit, it is typically also
two orders of magnitude less expensive than primary storage. Consequently, modern
computer systems typically have two orders of magnitude more secondary storage than
primary storage and data is kept for a longer time there.
In modern computers, hard disk drives are usually used as secondary storage. The time
taken to access a given byte of information stored on a hard disk is typically a few
thousandths of a second, or milliseconds. By contrast, the time taken to access a given
byte of information stored in random access memory is measured in billionths of a
second, or nanoseconds. This illustrates the very significant access-time difference
which distinguishes solid-state memory from rotating magnetic storage devices: hard
disks are typically about a million times slower than memory. Rotating optical storage
devices, such as CD and DVD drives, have even longer access times. With disk drives,
once the disk read/write head reaches the proper placement and the data of interest
rotates under it, subsequent data on the track are very fast to access. As a result, in
order to hide the initial seek time and rotational latency, data are transferred to and
from disks in large contiguous blocks.
When data reside on disk, block access to hide latency offers a ray of hope in designing
efficient external memory algorithms. Sequential or block access on disks is orders of
magnitude faster than random access, and many sophisticated paradigms have been
developed to design efficient algorithms based upon sequential and block access.
Another way to reduce the I/O bottleneck is to use multiple disks in parallel in order to
increase the bandwidth between primary and secondary memory.
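The effect of block transfers can be seen with a rough cost model: each request pays the seek time and rotational latency once, then streams data at the sustained bandwidth, so larger blocks amortize the fixed positioning cost. The parameters in the sketch below are invented for illustration.

```python
def disk_request_time_ms(block_bytes, seek_ms=9.0, rotational_ms=4.0,
                         bandwidth_bytes_per_s=150e6):
    """Rough model: each request pays seek + rotational latency once,
    then streams the block at the sustained bandwidth."""
    transfer_ms = block_bytes / bandwidth_bytes_per_s * 1000
    return seek_ms + rotational_ms + transfer_ms

# Larger blocks amortize the fixed positioning cost over more data.
for size in (4 * 1024, 64 * 1024, 1024 * 1024):
    t_ms = disk_request_time_ms(size)
    effective_mb_per_s = size / (t_ms / 1000) / 1e6
    print(f"{size:>8} bytes: {t_ms:6.2f} ms, {effective_mb_per_s:6.2f} MB/s effective")
```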
Some other examples of secondary storage technologies are: flash memory (e.g. USB
flash drives or keys), floppy disks, magnetic tape, paper tape, punched cards,
standalone RAM disks, and Iomega Zip drives.
The secondary storage is often formatted according to a file system format, which
provides the abstraction necessary to organize data into files and directories, providing
also additional information (called metadata) describing the owner of a certain file, the
access time, the access permissions, and other information.
Most computer operating systems use the concept of virtual memory, allowing
utilization of more primary storage capacity than is physically available in the system.
As the primary memory fills up, the system moves the least-used chunks (pages) to
secondary storage devices (to a swap file or page file), retrieving them later when they
are needed. The more of these retrievals from slower secondary storage are necessary, the more overall system performance is degraded.

Tertiary storage
Tertiary storage or tertiary memory, provides a third level of storage. Typically it
involves a robotic mechanism which will mount (insert) and dismount removable mass
storage media into a storage device according to the system's demands; this data is
often copied to secondary storage before use. It is primarily used for archival of rarely
accessed information since it is much slower than secondary storage (e.g. 5–60
seconds vs. 1-10 milliseconds). This is primarily useful for extraordinarily large data
stores, accessed without human operators. Typical examples include tape libraries and
optical jukeboxes.
When a computer needs to read information from the tertiary storage, it will first
consult a catalog database to determine which tape or disc contains the information.
Next, the computer will instruct a robotic arm to fetch the medium and place it in a
drive. When the computer has finished reading the information, the robotic arm will
return the medium to its place in the library.
Off-line storage
Off-line storage is computer data storage on a medium or a device that is not under
the control of a processing unit. The medium is recorded, usually in a secondary or
tertiary storage device, and then physically removed or disconnected. It must be
inserted or connected by a human operator before a computer can access it again.
Unlike tertiary storage, it cannot be accessed without human interaction.
Off-line storage is used to transfer information, since the detached medium can be
easily physically transported. Additionally, in case a disaster, for example a fire,
destroys the original data, a medium in a remote location will probably be unaffected,
enabling disaster recovery. Off-line storage increases general information security,
since it is physically inaccessible from a computer, and data confidentiality or integrity
cannot be affected by computer-based attack techniques. Also, if the information
stored for archival purposes is accessed seldom or never, off-line storage is less
expensive than tertiary storage.
In modern personal computers, most secondary and tertiary storage media are also
used for off-line storage. Optical discs and flash memory devices are most popular, and
to much lesser extent removable hard disk drives. In enterprise uses, magnetic tape is
predominant. Older examples are floppy disks, Zip disks, or punched cards.
Fig 6.1: Memory hierarchy

1.2 Characteristics of Storage


Storage technologies at all levels of the storage hierarchy can be differentiated by
evaluating certain core characteristics as well as measuring characteristics specific to a
particular implementation. These core characteristics are volatility, mutability,
accessibility, and addressability. For any particular implementation of any storage
technology, the characteristics worth measuring are capacity and performance.

Volatility
• Non-volatile memory
Retains the stored information even if it is not constantly supplied with electric power. It is suitable for long-term storage of information. It is nowadays used for most secondary, tertiary, and off-line storage. In the 1950s and 1960s, it was also used for primary storage, in the form of magnetic core memory.
• Volatile memory
Requires constant power to maintain the stored information. The fastest memory technologies of today are volatile ones (though this is not a universal rule). Since primary storage is required to be very fast, it predominantly uses volatile memory.
Differentiation
➢ Dynamic random-access memory
A form of volatile memory which also requires the stored information to be periodically re-read and re-written, or refreshed, otherwise it would vanish.
➢ Static memory
A form of volatile memory similar to DRAM with the exception that it never needs to be refreshed as long as power is applied. (It loses its content if power is removed.)

Mutability
➢ Read/write storage or mutable storage
Allows information to be overwritten at any time. A computer without some
amount of read/write storage for primary storage purposes would be useless for
many tasks. Modern computers typically use read/write storage also for
secondary storage.
➢ Read only storage
Retains the information stored at the time of manufacture, and write once
storage (Write Once Read Many) allows the information to be written only once
at some point after manufacture. These are called immutable storage.
Immutable storage is used for tertiary and off-line storage. Examples include CD
ROM and CD-R.
➢ Slow write, fast read storage
Read/write storage which allows information to be overwritten multiple times,
but with the write operation being much slower than the read operation.
Examples include CD-RW and flash memory.

Accessibility
➢ Random access
Any location in storage can be accessed at any moment in approximately the
same amount of time. Such characteristic is well suited for primary and
secondary storage.
➢ Sequential access
The accessing of pieces of information will be in a serial order, one after the
other; therefore the time to access a particular piece of information depends
upon which piece of information was last accessed. Such characteristic is typical
of off-line storage.

Addressability
➢ Location-addressable
Each individually accessible unit of information in storage is selected with its numerical memory address. In modern computers, location-addressable storage is usually limited to primary storage, accessed internally by computer programs, since location-addressability is very efficient but burdensome for humans.
➢ File addressable
Information is divided into files of variable length, and a particular file is selected with human-readable directory and file names. The underlying device is still location-addressable, but the operating system of a computer provides the file system abstraction to make the operation more understandable. In modern computers, secondary, tertiary and off-line storage use file systems.
➢ Content-addressable
Each individually accessible unit of information is selected on the basis of (part of) the contents stored there. Content-addressable storage can be implemented in software (a computer program) or hardware (a dedicated device), with hardware being the faster but more expensive option. Hardware content-addressable memory is often used in a computer's CPU cache.

Capacity
➢ Raw capacity
The total amount of stored information that a storage device or medium can hold. It is expressed as a quantity of bits or bytes (e.g. 10.4 megabytes).
➢ Memory storage density
The compactness of stored information. It is the storage capacity of a medium divided by a unit of length, area or volume (e.g. 1.2 megabytes per square inch).

Performance
➢ Latency
The time it takes to access a particular location in storage. The relevant unit of
measurement is typically nanosecond for primary storage, millisecond for
secondary storage, and second for tertiary storage. It may make sense to
separate read latency and write latency, and in case of sequential access
storage, minimum, maximum and average latency.
➢ Throughput
The rate at which information can be read from or written to the storage. In
computer data storage, throughput is usually expressed in terms of megabytes
per second or MB/s, though bit rate may also be used. As with latency, read rate
and write rate may need to be differentiated. Also accessing media sequentially,
as opposed to randomly, typically yields maximum throughput.

Energy use
Storage devices that reduce fan usage or automatically shut down during inactivity, as well as low-power hard drives, can reduce energy consumption by as much as 90 percent.
2.5-inch hard disk drives often consume less power than larger ones. Low-capacity solid-state drives have no moving parts and consume less power than hard disks. Also, main memory may use more power than hard disks.

1.3 Types of memory


Even though a large number of memory technologies exist, there are only two basic
types of memory: RAM (random access memory) and ROM (read-only memory).

RAM serves as the “main memory”. Often called primary memory, RAM is used to store programs and data that the computer needs when executing programs; but RAM is volatile, and loses this information once the power is turned off. There are two general types of chips used to build the bulk of RAM in today’s computers: SRAM and DRAM (static and dynamic random-access memory).

• Dynamic RAM is constructed of tiny capacitors that leak electricity. DRAM requires a recharge every few milliseconds to maintain its data. Static RAM technology, in contrast, holds its contents as long as power is available.
• SRAM is faster and much more expensive than DRAM; however, designers use DRAM because it is much denser (can store many bits per chip), uses less power, and generates less heat than SRAM. For these reasons, both technologies are often used in combination: DRAM for main memory and SRAM for cache.
• The basic operation of all DRAM memories is the same, but there are many flavors, including Multibank DRAM (MDRAM), Fast-Page Mode (FPM) DRAM, Extended Data Out (EDO) DRAM, Burst EDO DRAM (BEDO DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Synchronous-Link (SL) DRAM, Double Data Rate (DDR) SDRAM, and Direct Rambus (DR) DRAM. The different types of SRAM include asynchronous SRAM, synchronous SRAM, and pipeline burst SRAM.

ROM (read only memory): stores critical information necessary to operate the system,
such as the program necessary to boot the computer. ROM is not volatile and always
retains its data. This type of memory is also used in embedded systems or any systems
where the programming does not need to change. Many appliances, toys, and most
automobiles use ROM chips to maintain information when the power is shut off. ROMs
are also used extensively in calculators and peripheral devices such as laser printers,
which store their fonts in ROMs. There are five basic types of ROM: ROM, PROM, EPROM, EEPROM, and flash memory. PROM (programmable read-only memory) is a variation on ROM.

• PROMs can be programmed by the user with the appropriate equipment. Whereas ROMs are hardwired, PROMs have fuses that can be blown to program the chip. Once programmed, the data and instructions in PROM cannot be changed.
• EPROM (erasable PROM) is programmable with the added advantage of being reprogrammable (erasing an EPROM requires a special tool that emits ultraviolet light). To reprogram an EPROM, the entire chip must first be erased.
• EEPROM (electrically erasable PROM) removes many of the disadvantages of EPROM: no special tools are required for erasure (this is performed by applying an electric field) and you can erase only portions of the chip, one byte at a time.
• Flash memory is essentially EEPROM with the added benefit that data can be written or erased in blocks, removing the one-byte-at-a-time limitation. This makes flash memory faster than EEPROM.

1.4 Cache memory


A computer processor is very fast and is constantly reading information from memory,
which means it often has to wait for the information to arrive, because the memory
access times are slower than the processor speed. A cache memory is a small,
temporary, but fast memory that the processor uses for information it is likely to need
again in the very near future. The size of cache memory can vary enormously. A typical
personal computer’s level 2 (L2) cache is 256K or 512K. Level 1 (L1) cache is smaller,
typically 8K or 16K. L1 cache resides on the processor, whereas L2 cache resides
between the CPU and main memory. L1 cache is, therefore, faster than L2 cache.

The purpose of cache is to speed up memory accesses by storing recently used data
closer to the CPU, instead of storing it in main memory. Although cache is not as large
as main memory, it is considerably faster. Whereas main memory is typically composed
of DRAM with, say, a 60ns access time, cache is typically composed of SRAM, providing
faster access with a much shorter cycle time than DRAM (a typical cache access time is
10ns). Cache does not need to be very large to perform well. A general rule of thumb is
to make cache small enough so that the overall average cost per bit is close to that of
main memory, but large enough to be beneficial. Because this fast memory is quite
expensive, it is not feasible to use the technology found in cache memory to build all of
main memory.

Cache is not accessed by address; it is accessed by content. For this reason, cache is
sometimes called content addressable memory or CAM. Under most cache mapping
schemes, the cache entries must be checked or searched to see if the value being
requested is stored in cache. To simplify this process of locating the desired data,
various cache mapping algorithms are used.

Cache Mapping Schemes


For cache to be functional, it must store useful data. However, this data becomes
useless if the CPU can’t find it. When accessing data or instructions, the CPU first
generates a main memory address. If the data has been copied to cache, the address
of the data in cache is not the same as the main memory address. For example, data
located at main memory address 2E3 could be located in the very first location in
cache. How, then, does
the CPU locate data when it has been copied into cache? The CPU uses a specific
mapping scheme that “converts” the main memory address into a cache location. This
address conversion is done by giving special significance to the bits in the main
memory address. We first divide the bits into distinct groups we call fields. Depending
on the mapping scheme, we may have two or three fields. How we use these fields
depends on the particular mapping scheme being used. The mapping scheme
determines where the data is placed when it is originally copied into cache and also
provides a method for the CPU to find previously copied data when searching cache.

How data is copied into cache


Main memory and cache are both divided into the same size blocks (the size of these
blocks varies). When a memory address is generated, cache is searched first to see if
the required word exists there. When the requested word is not found in cache, the
entire main memory block in which the word resides is loaded into cache.
One field of the main memory address points us to a location in cache in which the data
resides if it is resident in cache (this is called a cache hit), or where it is to be placed if
it is not resident (which is called a cache miss).

The cache block referenced is then checked to see if it is valid. This is done by
associating a valid bit with each cache block. A valid bit of 0 means the cache block is
not valid (we have a cache miss) and we must access main memory. A valid bit of 1
means it is valid (we may have a cache hit but we need to complete one more step
before we know for sure). We then compare the tag in the cache block to the tag field
of our address. (The tag is a special group of bits derived from the main memory
address that is stored with its corresponding block in cache.) If the tags are the same,
then we have found the desired cache block (we have a cache hit). At this point we
need to locate the desired word in the block; this can be done using a different portion
of the main memory address called the word field.

Direct Mapped Cache


Direct mapped cache assigns cache mappings using a modular approach. Because there
are more main memory blocks than there are cache blocks, it should be clear that main
memory blocks compete for cache locations. Direct mapping maps block X of main memory to cache block X mod N, where N is the total number of blocks in cache. For
example, if cache contains 10 blocks, then main memory block 0 maps to cache block
0, main memory block 1 maps to cache block 1, . . . , main memory block 9 maps to
cache block 9, and main memory block 10 maps to cache block 0.
Fig 6.2: Direct Mapped Cache

You may be wondering, if main memory blocks 0 and 10 both map to cache block 0,
how does the CPU know which block actually resides in cache block 0 at any given
time? The answer is that each block is copied to cache and identified by the tag
previously described. If we take a closer look at cache, we see that it stores more than
just that data copied from main memory.
Fig 6.3: Memory address

To perform direct mapping, the binary main memory address is partitioned into the
fields. The size of each field depends on the physical characteristics of main memory
and cache. The word field (sometimes called the offset field) uniquely identifies a word
from a specific block; therefore, it must contain the appropriate number of bits to do
this. This is also true of the block field—it must select a unique block of cache. The tag
field is whatever is left over. When a block of main memory is copied to cache, this tag
is stored with the block and uniquely identifies this block. The total of all three fields
must, of course, add up to the number of bits in a main memory address.
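As a concrete illustration, the sketch below decomposes a word address into tag, block, and word fields for a small hypothetical direct-mapped cache and performs the valid-bit and tag check described earlier; the cache size and block size are assumptions chosen for the example, not values from the text.

```python
# Assumed parameters: a word-addressed memory, 16 cache blocks,
# 8 words per block, so the block field is 4 bits and the word field 3 bits.
BLOCKS, WORDS_PER_BLOCK = 16, 8
BLOCK_BITS, WORD_BITS = 4, 3           # log2(BLOCKS), log2(WORDS_PER_BLOCK)

def split_address(addr):
    word  = addr & (WORDS_PER_BLOCK - 1)           # low-order word (offset) field
    block = (addr >> WORD_BITS) & (BLOCKS - 1)     # middle block field
    tag   = addr >> (WORD_BITS + BLOCK_BITS)       # tag is whatever is left over
    return tag, block, word

# cache[i] holds (valid, tag, data_block) for cache block i
cache = [(False, None, None)] * BLOCKS

def lookup(addr):
    tag, block, word = split_address(addr)
    valid, stored_tag, data = cache[block]
    if valid and stored_tag == tag:
        return data[word]                          # cache hit
    return None                                    # cache miss: block must be fetched

def install(addr, data_block):
    """On a miss, copy the whole main-memory block into its cache slot."""
    tag, block, _ = split_address(addr)
    cache[block] = (True, tag, data_block)
```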

Fully associative cache


This algorithm allows a main memory block to be placed anywhere in cache. The only
way to find a block mapped this way is to search all of cache. (This is similar to your
author’s desk!) This requires the entire cache to be built from associative memory so it
can be searched in parallel. That is, a single search must compare the requested tag to
all tags in cache to determine whether the desired data block is present in cache.
Associative memory requires special hardware to allow associative searching, and is,
thus, quite expensive.
Using associative mapping, the main memory address is partitioned into two pieces, the
tag and the word.

Fig 6.4: Main Memory Address format for Associative mapping

When the cache is searched for a specific main memory block, the tag field of the main
memory address is compared to all the valid tag fields in cache; if a match is found, the
block is found. (Remember, the tag uniquely identifies a main memory block.) If there is
no match, we have a cache miss and the block must be transferred from main memory.

With fully associative mapping, when cache is full, we need a replacement algorithm to
decide which block we wish to throw out of cache (we call this our victim block). A
simple first-in, first-out algorithm would work, as would a least-recently used
algorithm.

Set associative cache


This scheme is similar to direct mapped cache, in that we use the address to map the
block to a certain cache location. The important difference is that instead of mapping to
a single cache block, an address maps to a set of several cache blocks. All sets in cache
must be the same size.

In set-associative cache mapping, the main memory address is partitioned into three
pieces: the tag field, the set field, and the word field. The tag and word fields assume
the same roles as before; the set field indicates into which cache set the main memory
block maps. Suppose we are using 2-way set associative mapping with a main memory
of 214 words, a cache with 16 blocks, where each block contains 8 words. If cache
consists of a total of 16 blocks, and each set has 2 blocks, then there are 8 sets in
cache. Therefore, the set field is 3 bits, the word field is 3 bits, and the tag field is 8
bits.
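The field sizes in this example can be checked mechanically; the short sketch below recomputes them from the stated parameters.

```python
import math

# Parameters from the example above: 2^14-word main memory,
# 16 cache blocks, 8 words per block, 2-way set associative.
memory_words, cache_blocks, words_per_block, ways = 2**14, 16, 8, 2

sets      = cache_blocks // ways                  # 16 / 2 = 8 sets
word_bits = int(math.log2(words_per_block))       # 3
set_bits  = int(math.log2(sets))                  # 3
tag_bits  = int(math.log2(memory_words)) - set_bits - word_bits   # 14 - 3 - 3 = 8

print(sets, word_bits, set_bits, tag_bits)        # 8 3 3 8
```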

Fig 6.5: Two-way set associative cache

Fig 6.6: Set associative mapping

1.5 Replacement Policies


If there is contention for a cache block, there is only one possible action: The existing
block is kicked out of cache to make room for the new block. This process is called
replacement. With fully associative cache and set associative cache, we need a
replacement algorithm to determine the “victim” block to be removed from cache.

The algorithm for determining replacement is called the replacement policy.

Optimal algorithm

We like to keep values in cache that will be needed again soon, and throw out blocks
that won’t be needed again, or that won’t be needed for some time. An algorithm that
could look into the future to determine the precise blocks to keep or eject based on
these two criteria would be best. This is what the optimal algorithm does. We want to
replace the block that will not be used for the longest period of time in the future.

For example, if the choice for the victim block is between block 0 and block 1, and block
0 will be used again in 5 seconds, whereas block 1 will not be used again for 10
seconds, we would throw out block 1.

Least recently used (LRU) algorithm

We might guess that any value that has not been used recently is unlikely to be needed
again soon. We can keep track of the last time each block was accessed (assign a
timestamp to the block), and select as the victim block the block that has been used
least recently. This is the least recently used (LRU) algorithm. Unfortunately, LRU
requires the system to keep a history of accesses for every cache block, which requires
significant space and slows down the operation of the cache.

First in, first out (FIFO)


First in, first out (FIFO) is another popular approach. With this algorithm, the block that
has been in cache the longest (regardless of how recently it has been used) would be
selected as the victim to be removed from cache memory.

Random selection

Another approach is to select a victim at random.

The problem with LRU and FIFO is that there are degenerate referencing situations in
which they can be made to thrash (constantly throw out a block, then bring it back,
then throw it out, then bring it back, repeatedly). Some people argue that random
replacement, although it sometimes throws out data that will be needed soon, never
thrashes. Unfortunately, it is difficult to have truly random replacement, and it can
decrease average performance.

The algorithm selected often depends on how the system will be used. No single
(practical) algorithm is best for all scenarios. For that reason, designers use algorithms
that perform well under a wide variety of circumstances.
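To see how the policies differ in practice, the following sketch simulates FIFO and LRU replacement for a small fully associative cache on a made-up block reference string; both the cache size and the reference string are illustrative assumptions.

```python
from collections import OrderedDict, deque

def count_misses(references, capacity, policy):
    """Misses for a fully associative cache of `capacity` blocks ('FIFO' or 'LRU')."""
    misses = 0
    if policy == "LRU":
        cache = OrderedDict()                      # keys kept in recency order
        for block in references:
            if block in cache:
                cache.move_to_end(block)           # hit: refresh recency
                continue
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)          # evict the least recently used block
            cache[block] = True
    else:                                          # FIFO
        resident, order = set(), deque()
        for block in references:
            if block in resident:
                continue                           # hit: FIFO order is unaffected
            misses += 1
            if len(resident) == capacity:
                resident.discard(order.popleft())  # evict the oldest resident block
            resident.add(block)
            order.append(block)
    return misses

refs = [0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2]
print(count_misses(refs, 3, "FIFO"), count_misses(refs, 3, "LRU"))   # 9 8 for this string
```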

1.6 CPU Registers


Registers are used to quickly accept, store, and transfer data and instructions that are being used immediately by the CPU. These registers are at the top of the memory hierarchy, and are the fastest way for the system to manipulate data. In a very simple microprocessor, the register set may consist of a single memory location, usually called an accumulator. Registers are built from fast multi-ported memory cells. Various types of registers are used for various purposes. Some of the most commonly used registers are: Accumulator (AC), Data Register (DR), Address Register (AR), Program Counter (PC), Memory Data Register (MDR), Index Register (IR), and Memory Buffer Register (MBR).

Registers are used when performing operations. While the system is running, the CPU uses these registers to carry out its work. When input is given to the system, it is placed in registers, and when the system produces results after processing, those results are also taken from registers. The CPU thus uses registers to process the data supplied by the user.

Registers take part in each step of the instruction cycle:

1) Fetch: The fetch operation takes the instructions given by the user; the instructions stored in main memory are fetched using registers.
2) Decode: The decode operation interprets the instruction; the CPU determines which operation is to be performed.
3) Execute: The execute operation is performed by the CPU. The results produced by the CPU are then stored into memory, after which they are displayed on the user's screen.

Registers are the most important components of CPU. Each register performs a specific
function.

A brief description of most important CPU Registers and their functions are
given below:

1. Memory Address Register (MAR): This register holds the address of memory
where CPU wants to read or write data. When CPU wants to store some data in the
memory or reads the data from the memory, it places the address of the required
memory location in the MAR.

2. Memory Buffer Register (MBR): This register holds the contents of the data or instruction read from, or written to, memory. The contents of an instruction placed in this register are transferred to the Instruction Register, while the contents of data are transferred to the accumulator or I/O register. In other words, this register is used to store data/instructions coming from the memory or going to the memory.

3. I/O Address Register (I/O AR): I/O Address register is used to specify the
address of a particular I/O device.

4. I/O Buffer Register (I/O BR): I/O Buffer Register is used for exchanging data
between the I/O module and the processor.

5. Program Counter (PC): The Program Counter register is also known as the Instruction Pointer Register. This register is used to store the address of the next instruction to be fetched for execution. When the instruction is fetched, the value of the PC is incremented. Thus this register always points to, or holds, the address of the next instruction to be fetched.

6. Instruction Register (IR): Once an instruction is fetched from main memory, it is stored in the Instruction Register. The control unit takes the instruction from this register, decodes it, and executes it by sending signals to the appropriate components of the computer to carry out the task.

7. Accumulator Register (AC): The accumulator register is located inside the ALU. It is used during arithmetic and logical operations of the ALU. The control unit stores data values fetched from main memory in the accumulator for arithmetic or logical operation. This register holds the initial data to be operated upon, the intermediate results, and the final result of the operation. The final result is transferred to main memory through the MBR.

8. Stack Control Register (SCR): A stack represents a set of memory blocks; the
data is stored in and retrieved from these blocks in an order, i.e. First In and Last Out
(FILO). The Stack Control Register is used to manage the stacks in memory. The size of
this register is 2 or 4 bytes.
9. Flag Register (FR): The Flag register is used to indicate the occurrence of a certain condition during an operation of the CPU. It is a special-purpose register with a size of one or two bytes. Each bit of the flag register constitutes a flag (or alarm), such that the bit value indicates whether a specified condition was encountered while executing an instruction. For example, if a zero value is put into an arithmetic register (accumulator) as a result of an arithmetic operation or a comparison, then the zero flag will be raised by the CPU. Thus, the subsequent instruction can check this flag, and when the zero flag is "ON" it can take an appropriate route in the algorithm.

10. Data Register (DR): A register used in microcomputers to temporarily store data
being transmitted to or from a peripheral device.

1.7 Memory Management Schemes


1. Memory Protection
If multiple processes are going to be in memory, we need to make sure that processes
cannot interfere with each other’s memory. Each memory management scheme we
consider must provide for some form of memory protection.
2. Swapping
In a multiprogrammed system, there may not be enough memory to have all processes
that are in the system in memory at once. If this is the case, programs must be brought
into memory when they are selected to run on the CPU. This is where medium-term
scheduling comes in.
Swapping is when a process is moved from main memory to the backing store, then
brought back into memory later for continued execution.
The backing store is usually a disk, which does have space to hold memory images for
all processes. Swapping time is dominated by transfer time of this backing store, which
is directly proportional to the amount of memory swapped.
The short-term scheduler can only choose among the processes in memory, and to keep
the CPU as busy as possible, we would like a number of processes in memory. Assuming
that an entire process must be in memory (not a valid assumption when we add virtual
memory), how can we allocate the available memory among the processes?
Fig 6.7: Swapping

Dynamic loading
i. A routine is not loaded until it is called.
ii. Better memory-space utilization; an unused routine is never loaded.
iii. Useful when large amounts of code are needed to handle infrequently occurring cases.
iv. No special support from the operating system is required; it is implemented through program design.

3. Contiguous Allocation
If all processes are the same size (not likely!), we could divide up the available memory
into chunks of that size, and swap processes in and out of these chunks. In reality, they
are different sizes. For the moment, we will assume that each process may have a
different size, but that its size is fixed throughout its lifetime. Processes are allocated
chunks of contiguous memory of various sizes. When a process arrives (or is swapped
in) it is allocated an available chunk (a “hole”) large enough to hold it. The system
needs to remember what memory is allocated and what memory is free.
CPU’s logical address is first checked for legality (limit register) to ensure that it will be
mapped into this process’ physical address space, then it is added to an offset
(relocation or base register) to get the physical address. If the program is reloaded into
a different portion of memory later, the same logical addresses remain valid, but the
base register will change.
How do we decide which “hole” to use when a process is added? We may have several available holes to choose from, and it may be advantageous to choose one over another.
• First-fit: choose the first hole we find that is large enough. This is fast, minimizing the search.
• Best-fit: Allocate the smallest available hole that is large enough to work. A search is needed. The search may be shortened by maintaining the list of holes ordered by size.
• Worst-fit: Allocate the largest hole. This is counterintuitive, but may be reasonable. It produces the largest leftover hole. However, in practice it performs worse.
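A minimal sketch of the first-fit and best-fit choices, assuming the free holes are kept in a simple list of sizes (the hole sizes and the request below are hypothetical):

```python
def first_fit(holes, request):
    """Return the index of the first hole large enough, or None."""
    for i, size in enumerate(holes):
        if size >= request:
            return i
    return None

def best_fit(holes, request):
    """Return the index of the smallest hole that is large enough, or None."""
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return min(candidates)[1] if candidates else None

# Hypothetical free-hole list (sizes in KB).
holes = [100, 500, 200, 300, 600]
print(first_fit(holes, 90))    # 0: the 100 KB hole is found first
print(best_fit(holes, 250))    # 3: the 300 KB hole is the tightest fit
```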

4. Fragmentation
The problem with any of these is that once a number of processes have come and gone,
we may have shredded up our memory into a bunch of small holes, each of which alone
may be too small to be of much use, but could be significant when considered together.
This is known as fragmentation.
• External fragmentation: total memory space exists to satisfy a request, but it is not
contiguous.
For example: 3 holes of size 20 are available, but a process cannot be allocated
because it requires 30.
• Internal fragmentation: this occurs when the sizes of all memory requests are rounded up to the next multiple of some convenient size, say 4K. So if a process needs 7K, it is rounded up to 8K, and the 1K extra is wasted space. The cost may be worthwhile, as this could decrease external fragmentation.
External fragmentation can be reduced by compaction – shuffling of allocated memory
around to turn the small holes into one big chunk of available memory. This can be
done, assuming relocatable code, but it is expensive.
Contiguous allocation will not be sufficient for most real systems.

5. Paging
An approach that breaks up memory into fixed-size chunks is called paging. The size of
the chunks is called the page size, typically a power of 2 around 4K. We break up both
our logical memory and physical memory into chunks of this size. The logical memory
chunks are called pages and the physical memory chunks are called frames. The system
keeps track of the free frames of physical memory, and when a program of size n
pages is to be loaded, n free frames must be located.
Fig 6.8: Frames and pages (logical memory is divided into pages, physical memory into frames)


We can create a virtual memory which is larger than our physical memory by extending
the idea of process swapping to allow swapping of individual pages. This allows only
part of a process to be in memory at a time, and in fact allows programs to access a
logical memory that is larger than the entire physical memory of the system.
Fragmentation: we have no external fragmentation, but we do have internal
fragmentation.
The contiguous allocation scheme required only a pair of registers, the base/relocation
register and the limit register to translate logical addresses to physical addresses and to
provide access restrictions. For paging, we need something more complicated. A page
table is needed to translate logical to physical addresses.
Fig 6.9: Page table implementation
A page table is maintained for each process, and is kept in main memory (in its most straightforward implementation).
A page-table base register can be used to locate the page table in memory, and a page-table length register to determine its size.
It is possible that not all of a process’ pages are in memory, but we will not consider
that just yet.
Logical addresses are broken into:
• A page number, p, which is the index into the page table
• A page offset, d, which is added to the base address of the physical memory frame
that holds logical page p
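Putting these two fields together, the translation can be sketched as follows; the 4 KB page size and the contents of the page table are assumptions made for the example.

```python
PAGE_SIZE = 4096                      # assumed 4 KB pages (a power of two)
OFFSET_BITS = 12                      # log2(PAGE_SIZE)

# A hypothetical per-process page table: page_table[p] = frame number.
page_table = {0: 5, 1: 2, 2: 7}

def translate(logical_addr):
    p = logical_addr >> OFFSET_BITS            # page number: index into the page table
    d = logical_addr & (PAGE_SIZE - 1)         # page offset within the page
    frame = page_table[p]                      # raises KeyError if the page is not mapped
    return frame * PAGE_SIZE + d               # frame base address plus the offset

print(hex(translate(0x1ABC)))                  # page 1 -> frame 2, so 0x2ABC
```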

Fig 6.10: Translating logical addresses to physical addresses

If a page moves to a different frame, we don’t have to tell the program – just update the page table and the same logical addresses will work.
Disadvantage: every data/instruction access now requires two memory accesses: one
for the page table and one for the data/instruction. This is unacceptable! Memory
access is slow compared to operations on registers!
The two-memory-access problem can be solved with hardware support, possibly by using a fast-lookup hardware cache called associative memory or a translation look-aside buffer (TLB).
The TLB is an associative memory – a piece of hardware that lets us search all entries
for a line whose page number is p. If it’s there, we get our frame number f out without
a memory access to do a page table lookup. Since this is relatively expensive
hardware, it will not be large enough to hold all page table entries, only those we’ve
accessed in the recent past. If we try to access a page that is not in the TLB, we go to
the page table and look up the frame number. At this point, the page is moved into the
TLB so if we look it up again in the near future, we can avoid the memory access.
Fortunately, even if the TLB is significantly smaller than the page table, we are likely to
get a good number of TLB “hits”. This is because of locality – the fact that programs
tend to reuse the same pages of logical memory repeatedly before moving on to some
other group of pages.
TLB hit is still more expensive than a direct memory access (no paging at all) but much
better than the two references from before.
A TLB is typically around 64 entries. Tiny, but good enough to get a good hit rate. TLB
management could be entirely in the MMU, but often (on RISC systems like Sparc,
MIPS,
Alpha) the management is done in software. A TLB miss is trapped to the OS to choose
a TLB victim and get the new page table entry into the TLB.
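A highly simplified sketch of this lookup order (TLB first, page table on a miss, then install the entry) follows; the TLB capacity, the page-table contents, and the naive eviction choice are all illustrative assumptions.

```python
page_table = {0: 5, 1: 2, 2: 7}     # hypothetical page table (page -> frame)
tlb = {}                             # small cache of recent page-table entries
TLB_CAPACITY = 2                     # tiny, just to force some misses

def frame_for(page):
    if page in tlb:                  # TLB hit: no memory access for the page table
        return tlb[page]
    frame = page_table[page]         # TLB miss: consult the page table in memory
    if len(tlb) >= TLB_CAPACITY:     # simple eviction; a real TLB picks a victim entry
        tlb.pop(next(iter(tlb)))
    tlb[page] = frame                # install the entry for future lookups
    return frame
```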

Fig 6.11: TLB implementation

Page Replacement Algorithms


1. First-In-First-Out (FIFO) Algorithm
The first page in is the first page out when it comes time to kick someone out.
2. Least Recently Used (LRU) Algorithm
Select the frame that we have not used for the longest time in the past.
3. Counting Algorithms
One more group of algorithms to consider are those that keep track of the number of references that have been made to each page.
Least-frequently used (LFU): replaces the page with the smallest count. We haven’t used it much, so maybe that means we never will.
Most-frequently used (MFU): based on the argument that the page with the smallest count was probably just brought in and has yet to be used.
4. Allocation of Frames
It is also possible to select the victim from a frame currently allocated to another
process (global replacement).
Each process needs a certain minimum number of pages.
pages for instructions
pages for local data
pages for global data

Allocation may be fixed:
equal allocation – p processes share m frames, m/p each
proportional allocation – processes that have more logical memory get more frames
priority allocation – high-priority processes get more frames
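Proportional allocation, for example, can be sketched as follows; the process sizes and frame count are made up for illustration.

```python
def proportional_allocation(process_sizes, total_frames):
    """Give each process frames in proportion to its logical-memory size.
    Sketch only: fractions are truncated and leftover frames are ignored."""
    total_size = sum(process_sizes)
    return [int(size / total_size * total_frames) for size in process_sizes]

# Hypothetical: two processes of 10 and 127 pages sharing 64 frames.
print(proportional_allocation([10, 127], 64))   # -> [4, 59]
```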

Thrashing
Thrashing occurs when a process does not have “enough” frames allocated to store the pages it uses repeatedly; the page fault rate then becomes very high.
Since thrashing leads to low CPU utilization (all processes are frequently in I/O wait, having page faults serviced), the operating system’s medium- or long-term scheduler may step in, detect this, and increase the degree of multiprogramming.
With another process in the system, there are fewer frames to go around, and the problem most likely just gets worse.
Paging works because of locality of reference – a process accesses clusters of memory for a while before moving on to the next cluster. For reasonable performance, a process needs enough frames to store its current locality.
Thrashing occurs when the combined localities of all processes exceed the capacity of
memory.

Segmentation
Another possible memory management scheme, sort of a hybrid of contiguous
allocation and paging, is called segmentation.
Memory is allocated for a process as a collection of segments. These segments
correspond to logical units of memory in use by a process:
main program
procedure, function, method
object, local variables
global variables
common block (Fortran)
stack
heap space
This can be thought of as a finer grained approach to contiguous allocation – smaller
contiguous
chunks, corresponding to these logical units – are scattered throughout memory. With
segmentation, a logical address consists of an ordered pair: (segment-number, offset)
A segment table contains two entries for each segment:
base – the starting physical address where the segment resides in memory
limit – the length of the segment
Two registers locate the segment table in memory:
Segment-table base register (STBR) points to the segment table’s location in memory
Segment-table length register (STLR) indicates number of segments used by a
program; segment number s is legal if s < STLR.
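The legality checks and translation described above can be sketched as follows; the segment table contents are hypothetical.

```python
# Hypothetical segment table: (base, limit) per segment number.
segment_table = [(1400, 1000), (6300, 400), (4300, 1100)]
STLR = len(segment_table)            # segment-table length register

def translate(segment, offset):
    if segment >= STLR:              # segment number must be legal: s < STLR
        raise ValueError("illegal segment number")
    base, limit = segment_table[segment]
    if offset >= limit:              # offset must fall inside the segment
        raise ValueError("segment overflow (addressing error)")
    return base + offset             # physical address

print(translate(2, 53))              # -> 4353
```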

Fig 6.12: Segmentation


With segmentation, segments may be relocated by moving the segment and updating
the segment table. The segment number, and hence the logical addresses, remain the
same.
Segment sharing is straightforward, as long as each process uses the same segment number. This is required because the code in a segment uses addresses in the (segment number, offset) format.
Allocation for the segments uses the same techniques as contiguous allocation (first-fit,
best-fit), but since the chunks are smaller, there is less likelihood of some of the
problems arising, though external fragmentation is there.
Revision questions
1. Explain the significance of memory management in a computer system
2. Describe the memory hierarchy in a computer system
3. Differentiate between SRAM and DRAM
4. What are the techniques used for memory allocation?
5. Explain the replacement algorithms used in memory management
