
UNIT 3

Syllabus: Memory System - Memory systems hierarchy - Main memory organization - Types of Main memory: SRAM, DRAM and its characteristics and performance - latency - cycle time - bandwidth - memory interleaving.
Cache memory: Address mapping-line size-replacement and policies- coherence
Virtual memory: Paging (Single level and multilevel) – Page Table Mapping – TLB
Reliability of memory systems: Error detecting and error correcting systems. Tutorial
on design aspects of memory system

Memory System - Memory systems hierarchy


The memory unit is an essential component in any digital computer since it is needed
for storing programs and data. A very small computer with a limited application may
be able to fulfill its intended task without the need of additional storage capacity. Most
general-purpose computers would run more efficiently if they were equipped with
additional storage beyond the capacity of the main memory. There is just not enough
space in one memory unit to accommodate all the programs used in a typical
computer. Moreover, most computer users accumulate and continue to accumulate
large amounts of data-processing software. Not all accumulated information is
needed by the processor at the same time. Therefore, it is more economical to use
low-cost storage devices to serve as a backup for storing the information that is not
currently used by the CPU. The memory unit that communicates directly with the CPU is
called the main memory. Devices that provide backup storage are called auxiliary
memory. The most common auxiliary memory devices used in computer systems are
magnetic disks and tapes. They are used for storing system programs, large data
files, and other backup information. Only programs and data currently needed by the
processor reside in main memory. All the other information is stored in auxiliary
memory and transferred to main memory when needed.

Figure 1. Memory hierarchy



The total memory capacity of a computer can be visualized as being a hierarchy of
components. The memory hierarchy system consists of all storage devices employed
in a computer system from the slow but high-capacity auxiliary memory to a
relatively faster main memory, to an even smaller and faster cache memory
accessible to the high-speed processing logic. Fig 1 illustrates the components in a
typical memory hierarchy. At the bottom of the hierarchy are the relatively slow
magnetic tapes used to store removable files. Next are the magnetic disks used as
backup storage. The main memory occupies a central position by being able to
communicate directly with the CPU and with auxiliary memory devices through an
I/O processor. When programs not residing in main memory are needed by the CPU,
they are brought in from auxiliary memory. Programs not currently needed in main
memory are transferred into auxiliary memory to provide space for currently used
programs and data.

A special very-high-speed memory called a cache memory is sometimes used to


increase the speed of processing by making current programs and data available to
the CPU at a rapid rate. The cache memory is employed in computer systems to
compensate for the speed differential between main memory access time and
processor logic. CPU logic is usually faster than main memory access time, with the
result that processing speed is limited primarily by the speed of main memory. A
technique used to compensate for the mismatch in operating speeds is to employ an
extremely fast, small cache between the CPU and main memory whose access time is close to the processor logic clock cycle time. The cache is used for storing segments of programs currently being executed in the CPU and temporary data frequently needed in the present calculations.

Figure 2. Memory hierarchy in a computer system

The I/O processor manages data transfers between auxiliary memory and main memory, while the cache organization is concerned with the transfer of information between main memory and the CPU. Thus each is involved with a different level in the memory hierarchy system. The reason for
having two or three levels of memory hierarchy is economics. As the storage
capacity of the memory increases, the cost per bit for storing binary information
decreases and the access time of the memory becomes longer. The auxiliary memory
has a large storage capacity, is relatively inexpensive, but has low access speed
compared to main memory. The cache memory is very small, relatively expensive,
and has very high access speed. Thus as the memory access speed increases, so
does its relative cost. The overall goal of using a memory hierarchy is to obtain the
highest-possible average access speed while minimizing the total cost of the entire
memory system.



Auxiliary and cache memories are used for different purposes. The cache holds those
parts of the program and data that are most heavily used, while the auxiliary memory
holds those parts that are not presently used by the CPU. Moreover, the CPU has
direct access to both cache and main memory but not to auxiliary memory. The
transfer from auxiliary to main memory is usually done by means of direct memory
access of large blocks of data. The typical access time ratio between cache and main
memory is about 1 to 7. Many operating systems are designed to enable the CPU to process a number of independent programs concurrently. This
concept, called multiprogramming, refers to the existence of two or more programs
in different parts of the memory hierarchy at the same time. In this way it is possible
to keep all parts of the computer busy by working with several programs in sequence.

A program with its data normally resides in auxiliary memory. When the program
or a segment of the program is to be executed, it is transferred to main memory to be
executed by the CPU. Thus one may think of auxiliary memory as containing the
totality of information stored in a computer system. It is the task of the operating
system to maintain in main memory a portion of this information that is currently
active. The part of the computer system that supervises the flow of information
between auxiliary memory and main memory is called the memory management
system.

Main memory organization


The main memory is the central storage unit in a computer system. It is a relatively
large and fast memory used to store programs and data during the computer
operation. The principal technology used for the main memory is based on
semiconductor integrated circuits. Integrated circuit RAM chips are available in two
possible operating modes, static and dynamic.

The static RAM consists essentially of internal flip-flops that store the binary
information. The stored information remains valid as long as power is applied to the
unit. The dynamic RAM stores the binary information in the form of electric
charges that are applied to capacitors. The capacitors are provided inside the chip
by MOS transistors. The stored charge on the capacitors tends to discharge with time
and the capacitors must be periodically recharged by refreshing the dynamic memory.
Refreshing is done by cycling through the words every few milliseconds to restore the
decaying charge. The dynamic RAM offers reduced power consumption and larger
storage capacity in a single memory chip. The static RAM is easier to use and has
shorter read and write cycles.

Most of the main memory in a general-purpose computer is made up of RAM integrated


circuit chips, but a portion of the memory may be constructed with ROM chips.
Originally, RAM was used to refer to a random-access memory, but now it is used to
designate a read/write memory to distinguish it from a read-only memory, although
ROM is also random access. RAM is used for storing the bulk of the programs and
data that are subject to change. ROM is used for storing programs that are
permanently resident in the computer and for tables of constants that do not change
in value once the production of the computer is completed.



Among other things, the ROM portion of main memory is needed for storing an initial
program called a bootstrap loader. The bootstrap loader is a program whose function
is to start the computer software operating when power is turned on. Since RAM is
volatile, its contents are destroyed when power is turned off. The contents of ROM
remain unchanged after power is turned off and on again. The startup of a computer
consists of turning the power on and starting the execution of an initial program. Thus
when power is turned on, the hardware of the computer sets the program counter to
the first address of the bootstrap loader. The bootstrap program loads a portion of the
operating system from disk to main memory and control is then transferred to the
operating system, which prepares the computer for general use. RAM and ROM chips
are available in a variety of sizes. If the memory needed for the computer is larger than
the capacity of one chip, it is necessary to combine a number of chips to form the
required memory size.

RAM AND ROM CHIPS


Figure 3. Block diagram of a RAM chip

A RAM chip is better suited for communication with the CPU if it has one or more control inputs that select the chip only when needed. Another common feature is a
bidirectional data bus that allows the transfer of data either from memory to CPU
during a read operation, or from CPU to memory during a write operation. A
bidirectional bus can be constructed with three-state buffers. A three-state buffer
output can be placed in one of three possible states: a signal equivalent to logic 1, a
signal equivalent to logic 0, or a high-impedance state. The logic 1 and 0 are normal
digital signals. The high-impedance state behaves like an open circuit, which means
that the output does not carry a signal and has no logic significance. The block
diagram of a RAM chip is shown in Fig. 3. The capacity of the memory is 128 words of
eight bits (one byte) per word. This requires a 7-bit address and an 8-bit bidirectional
data bus.

Table 1. Function table of the RAM chip

The read and write inputs specify the memory operation, and the two chip select (CS) control inputs enable the chip only when it is selected by the microprocessor. The availability of more than one control input to select the chip facilitates the decoding of the address lines when multiple chips are used in the microcomputer. The read and write inputs are sometimes combined into one line labeled R/W. When the chip is selected, the two binary states in this line specify the two operations of read or write. The function table listed in Table 1 specifies the operation of the RAM chip. When the WR input is enabled, the memory



stores a byte from the data bus into a location specified by the address input lines.
When the RD input is enabled, the content of the selected byte is placed into the data
bus. The RD and WR signals control the memory operation as well as the bus buffers
associated with the bidirectional data bus.

A ROM chip is organized externally in a similar manner. However, since a ROM can only be read, the data bus can only be in an output mode. The block diagram of a ROM chip is shown in Fig. 4. For the same-size chip, it is possible to have more bits of ROM than of RAM, because the internal binary cells in ROM occupy less space than in RAM. For this reason, the diagram specifies a 512-byte ROM, while the RAM has only 128 bytes. The nine address lines in the ROM chip specify any one of the 512 bytes stored in it. The two chip select inputs must be CS1 = 1 and CS2 = 0 for the unit to operate. Otherwise, the data bus is in a high-impedance state. There is no need for a read or write control because the unit can only be read. Thus when the chip is enabled by the two select inputs, the byte selected by the address lines appears on the data bus.

Figure 4. Block diagram of a ROM chip

The designer of a computer system must calculate the amount of memory required for the particular application and assign it to either RAM or ROM. The interconnection between memory and processor is then established from knowledge of the size of memory needed and the type of RAM and ROM chips available. The addressing of memory can be established by means of a table that specifies the memory address assigned to each chip. The table, called a memory address map, is a pictorial representation of assigned address space for each chip in the system.

Table 2. Memory address map for the microcomputer

Types of Main memory: SRAM, DRAM and its characteristics and performance

SRAM: Static Random Access Memory is a form of semiconductor memory widely used in electronics, microprocessor and general computing applications. This form of computer memory gains its name from the fact that data is held in the memory chip in a static fashion, and does not need to be dynamically refreshed as in the case of DRAM. Because it avoids this refresh requirement, SRAM offers better performance and lower power usage. However, SRAM is also more expensive than DRAM, and it



requires a lot more space. While the data in the SRAM memory does not need to be
refreshed dynamically, it is still volatile, meaning that when the power is removed
from the memory device, the data is not held, and will disappear.

SRAM has the advantage that it offers better performance than DRAM because DRAM needs to be refreshed periodically when in use, while SRAM does not. However, SRAM is more expensive and less dense than DRAM, so SRAM sizes are orders of magnitude smaller than DRAM sizes.

SRAM basics - There are two key features to SRAM - Static Random Access Memory - and these set it apart from other types of memory that are available:

• The data is held statically: This means that the data is held in the
semiconductor memory without the need to be refreshed as long as the power is
applied to the memory.
• SRAM memory is a form of random access memory: A random access memory
is one in which the locations in the semiconductor memory can be written to or
read from in any order, regardless of the last memory location that was accessed.

The circuit for an individual SRAM memory cell typically comprises four transistors configured as two cross-coupled inverters. In this format the circuit has two stable states, and these equate to the logical "0" and "1" states. The transistors are MOSFETs, because the amount of power consumed by a MOS circuit is considerably less than that of bipolar transistor technology, which is the other feasible option; using bipolar technology also limits the level of integration because heat removal becomes a major issue.

In addition to the four transistors in the basic memory cell, an additional two transistors are required to control access to the memory cell during the read and write operations. This makes a total of six transistors, forming what is termed a 6T memory cell. Although more complex in terms of the number of components, it has a number of advantages. The main advantage of the six-transistor SRAM circuit is its reduced static power: in the four-transistor version there is a constant current flow through one or other of the load resistors, and this increases the overall power consumption of the chip.

SRAM memory cell operation

The operation of the SRAM memory cell is relatively straightforward. When the cell is
selected, the value to be written is stored in the cross-coupled flip-flops. The cells
are arranged in a matrix, with each cell individually addressable. Most SRAM
memories select an entire row of cells at a time, and read out the contents of all the
cells in the row along the column lines. While it is not strictly necessary to have two bit lines carrying the signal and its inverse, this is normal practice because it improves the noise margins and the data integrity. If the two lines read the same, the system knows there is an issue and re-interrogates the cell. The two bit lines are passed to the two inputs of a comparator so that the advantages of the differential data mode can be exploited, and the small voltage swings that are present can be



more accurately detected. Access to the SRAM memory cell is enabled by the Word
Line. This controls the two access control transistors which control whether the cell
should be connected to the bit lines. These two lines are used to transfer data for both
read and write operations.

DRAM: SRAM is commonly used for a computer's cache memory, such as a


processor's L2 or L3 cache. It is not used for a computer's main memory because of
its cost and size. Most computers use DRAM instead because it supports greater
densities at a lower cost per megabyte (MB).

DRAM stores data by writing a charge to the capacitor by way of an access transistor
and was invented in 1966 by Robert Dennard at IBM and was patented in 1967.
DRAM reads the state of charge in a transistor-capacitor circuit: a charged state is read as a 1 bit, and a low charge is read as a 0 bit. Several of these transistor-capacitor circuits together create a “word” of memory.

DRAM uses capacitors that lose charge over time due to leakage, even if the supply
voltage is maintained. Since the charge on a capacitor decays when a voltage is
removed, DRAM must be supplied with a voltage to retain memory (and is thus
volatile). Capacitors can lose their charge a bit even when supplied with voltage if they
have devices nearby (like transistors) that draw a little current even if they are in an
“off” state; this is called capacitor leakage. Due to capacitor leakage, DRAM needs
to be refreshed often.

The differences between SRAM and DRAM are as follows:

1. SRAM stores information as long as the power is supplied. DRAM stores information only as long as the power is supplied, and only for a few milliseconds after power is switched off.
2. Transistors are used to store information in SRAM. Capacitors are used to store data in DRAM.
3. SRAM does not use storage capacitors, hence no refreshing is required. In DRAM, the contents of the capacitors need to be refreshed periodically to store information for a longer time.
4. SRAM is faster compared to DRAM. DRAM provides slower access speeds.
5. SRAM does not have a refreshing unit. DRAM has a refreshing unit.
6. SRAMs are expensive. DRAMs are cheaper.
7. SRAMs are low-density devices. DRAMs are high-density devices.
8. In SRAM, bits are stored in voltage form. In DRAM, bits are stored in the form of electric charge.
9. SRAMs are used in cache memories. DRAMs are used in main memories.
10. SRAM consumes more power and generates more heat. DRAM uses less power and generates less heat.



Latency, Cycle Time, Bandwidth, Memory interleaving
Latency - Simply put, latency is a delay. This delay can be measured in nanoseconds
(real-world time). However, when it comes to digital electronics, we often use clock cycles
because this way we get comparative numbers that aren’t dependent on frequency or data
rate of a part. While memory speed (or data rate) addresses how fast your memory
controller can access or write data to memory, RAM latency focuses on how soon it can
start the process. The former is measured in MT/s (Mega transfers per second) and the
latter in nanoseconds. RAM latency is measured in terms of memory bus clock cycles; the
fewer clock cycles, the lower the latency. RAM latency can be manually adjusted using
lower latency settings, although most computers will automatically set it.

Memory Timings (Cycle timing) - Unlike latency, memory timings (that sequence of
numbers you see on your memory module) are measured in clock cycles. So, each number
in memory timings, like 16-19-19-39, indicates the number of clock ‘ticks’ or cycles it takes
to complete a certain task. To simplify things, imagine your memory space as a giant
spreadsheet with rows and columns, where each cell can hold binary data (0 or 1).

1. CAS Latency (tCL) – The first memory timing is something called Column Access Strobe
(CAS) Latency. Although the term strobe is a bit antiquated today as it’s a vestige from
the Asynchronous DRAM days, the term CAS is still used in the industry. CAS Latency
of RAM indicates the number of cycles it takes to get a response from memory once
the memory controller sends over a column it needs to access. Unlike all other timings
below, tCL is an exact number and not a maximum/minimum.

2. Row Address to Column Address Delay (tRCD) – It denotes the minimum number of
clock cycles it will take to open a row and access the required column.

3. Row Precharge Time (tRP) – The third number in the memory-timing sequence indicates the minimum number of clock cycles needed to close (precharge) the currently open row before another row can be accessed.

4. Row Active Time (tRAS) – The last number in that memory timing sequence denotes
the minimum number of clock cycles a row needs to remain open to access the data.
This is usually the biggest delay.

Calculating RAM Latency or CAS Latency - CAS latency is the amount of time your memory takes to respond to a request from the memory controller. Because CAS latency is measured in clock cycles, we need to account for memory speed (data rate) to get a real-world CAS latency in nanoseconds:
CAS latency (ns) = (CL x 2000) / data rate (MT/s)
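As a quick illustration of this formula, the following Python sketch converts CAS latency to nanoseconds; the DDR4-3200 and CL16 figures are assumed example values, not taken from the text above.

# Real-world CAS latency in nanoseconds: latency_ns = (CL x 2000) / data rate (MT/s)
def cas_latency_ns(cl, data_rate_mts):
    return (cl * 2000) / data_rate_mts

# Assumed example values: a DDR4-3200 module (3200 MT/s) with CL = 16
print(cas_latency_ns(16, 3200))   # -> 10.0 ns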

Memory Bandwidth - Computers need memory to store and use data, such as in graphical processing or loading simple documents. A random access memory (RAM) module may advertise a specific figure, such as 10 gigabytes (GB), but what matters more for performance is the memory bandwidth: the amount of data that can be transferred to or from memory per second. As a computer gets older, regardless of how many RAM modules are installed, its memory bandwidth will lag behind that of newer systems, because the RAM itself is only part of the bandwidth equation, along with processor and memory-controller speed. The advertised figure should therefore be read as a maximum: a module rated at 10 GB per second of peak bandwidth will generally sustain a lower bandwidth in practice, for the same reason.
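For illustration only, the peak (theoretical) bandwidth of a module can be estimated from its data rate, bus width and channel count. The sketch below uses assumed values (DDR4-3200, a 64-bit bus, two channels); real sustained bandwidth will be lower, as noted above.

# Peak bandwidth (GB/s) = data rate (MT/s) x bytes per transfer x channels
def peak_bandwidth_gbs(data_rate_mts, bus_width_bits=64, channels=1):
    bytes_per_transfer = bus_width_bits // 8
    return data_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9

# Assumed example: DDR4-3200 on a 64-bit bus in dual-channel mode
print(peak_bandwidth_gbs(3200, 64, 2))   # -> 51.2 GB/s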

Memory Interleaving - Abstraction is one of the most important aspects of computing and is a widely implemented practice in the computational field. Memory interleaving is more or less an abstraction technique, though it is a bit different from abstraction in the usual sense. It is a technique that divides memory into a number of modules such that successive words in the address space are placed in different modules. Suppose, for example, a 64 MB memory is made up of 4 MB chips, as shown below:

Why do we use Memory Interleaving? - Whenever the processor requests data from the main memory, a block (chunk) of data is transferred to the cache and then to the processor. So whenever a cache miss occurs, the data has to be fetched from the main memory. But main memory is relatively slower than the cache, so interleaving is used to improve the access time of the main memory.

Memory interleaving is classified into two types:


1. High Order Interleaving – In high-order interleaving, the most significant bits of the address select the memory chip. The least significant bits are sent as addresses to each chip. One problem is that consecutive addresses tend to be in the same chip. The maximum rate of data transfer is limited by the memory cycle time. It is also known as Memory Banking. Shown in Figure a.

2. Low Order Interleaving – In low-order interleaving, the least significant


bits select the memory bank (module). In this, consecutive memory
addresses are in different memory modules. This allows memory access at
much faster rates than allowed by the cycle time. Shown in Figure b.



Figure a

Figure b
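A minimal Python sketch of the two schemes, assuming 4 modules and 16-bit addresses (illustrative values only): it shows how the module-select bits are taken from the high end or the low end of the address.

# Assume 4 modules (2 select bits) and 16-bit addresses (illustrative values).
MODULE_BITS = 2
ADDR_BITS = 16
OFFSET_BITS = ADDR_BITS - MODULE_BITS

def high_order(addr):
    # Most significant bits select the module; consecutive addresses stay in one module.
    return addr >> OFFSET_BITS, addr & ((1 << OFFSET_BITS) - 1)

def low_order(addr):
    # Least significant bits select the module; consecutive addresses rotate across modules.
    return addr & ((1 << MODULE_BITS) - 1), addr >> MODULE_BITS

for a in range(4):                       # four consecutive addresses
    print(a, high_order(a), low_order(a))
# high-order: all four land in module 0; low-order: they land in modules 0, 1, 2, 3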

Cache memory: Address mapping-line size


Analysis of a large number of typical programs has shown that the references
to memory at any given interval of time tend to be confined within a few
localized areas in memory. This phenomenon is known as the property of
locality of reference. The reason for this property may be understood considering that
a typical computer program flows in a straight-line fashion with program loops
and subroutine calls encountered frequently. When a program loop is executed,
the CPU repeatedly refers to the set of instructions in memory that constitute the
loop. Every time a given subroutine is called, its set of instructions are fetched
from memory. Thus loops and subroutines tend to localize the references to
memory for fetching instructions. To a lesser degree, memory references to data also
tend to be localized. Table-lookup procedures repeatedly refer to that portion in
memory where the table is stored. Iterative procedures refer to common memory locations, and arrays of numbers are confined within a local portion of memory. The
result of all these observations is the locality of reference property, which states



that over a short interval of time, the addresses generated by a typical program
refer to a few localized areas of memory repeatedly, while the remainder of
memory is accessed relatively infrequently.

If the active portions of the program and data are placed in a fast small memory,
the average memory access time can be reduced, thus reducing the total
execution time of the program. Such a fast small memory is referred to as a
cache memory. It is placed between the CPU and main memory as illustrated in Fig.
2. The cache memory access time is less than the access time of main memory by a
factor of 5 to 10. The cache is the fastest component in the memory hierarchy and
approaches the speed of CPU components.

The fundamental idea of cache organization is that by keeping the most


frequently accessed instructions and data in the fast cache memory, the average
memory access time will approach the access time of the cache. Although the cache
is only a small fraction of the size of main memory, a large fraction of memory
requests will be found in the fast cache memory because of the locality of reference
property of programs.

The basic operation of the cache is as follows. When the CPU needs to access
memory, the cache is examined. If the word is found in the cache, it is read from
the fast memory. If the word addressed by the CPU is not found in the cache, the
main memory is accessed to read the word. A block of words containing the one
just accessed is then transferred from main memory to cache memory. The block
size may vary from one word (the one just accessed) to about 16 words adjacent to
the one just accessed. In this manner, some data are transferred to cache so that future
references to memory find the required words in the fast cache memory.

The performance of cache memory is frequently measured in terms of a quantity


called hit ratio. When the CPU refers to memory and finds the word in cache, it is
said to produce a hit. If the word is not found in cache, it is in main memory and
it counts as a miss. The ratio of the number of hits divided by the total CPU references to memory (hits plus misses) is the hit ratio. Hit ratios of 0.9 and higher have been reported; such a high ratio verifies the validity of the locality of reference property. The average memory access time of a
computer system can be improved considerably by use of a cache. If the hit ratio is
high enough so that most of the time the CPU accesses the cache instead of main
memory, the average access time is closer to the access time of the fast cache
memory.
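As a simple illustration, the average access time can be modelled as t_avg = h x t_cache + (1 - h) x t_main, ignoring block-transfer overhead. The 10 ns and 70 ns figures below are assumed values, chosen to roughly match the 1-to-7 ratio mentioned earlier.

# Simple model: t_avg = h * t_cache + (1 - h) * t_main
def avg_access_time(hit_ratio, t_cache_ns, t_main_ns):
    return hit_ratio * t_cache_ns + (1 - hit_ratio) * t_main_ns

print(avg_access_time(0.9, 10, 70))   # -> 16.0 ns, close to the cache access time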

The basic characteristic of cache memory is its fast access time. Therefore, very little
or no time must be wasted when searching for words in the cache. The
transformation of data from main memory to cache memory is referred to as a mapping process. Three types of mapping procedures are of practical interest when considering the organization of cache memory:

Figure 11. Example of cache memory.



1. Associative mapping
2. Direct mapping
3. Set-associative mapping
To help in the discussion of these three mapping procedures we will use a specific
example of a memory organization as shown in Fig. 11. The main memory can store
32K words of 12 bits each. The cache is capable of storing 512 of these words at
any given time. For every word stored in cache, there is a duplicate copy in main
memory. The CPU communicates with both memories. It first sends a 15-bit address
to cache. If there is a hit, the CPU accepts the 12-bit data from cache. If there is a
miss, the CPU reads the word from main memory and the word is then transferred to
cache.



Associative Mapping - The fastest and
most flexible cache organization uses an
associative memory. This organization is
illustrated in Fig. 12. The associative
memory stores both the address and
content (data) of the memory word. This
permits any location in cache to store any
word from main memory.

Figure 12. Associative mapping cache (all numbers in octal).

The diagram shows three words presently stored in the cache. The address value of
15 bits is shown as a five-digit octal
number and its corresponding 12-bit word
is shown as a four-digit octal number. A CPU address of 15 bits is placed in the
argument register and the associative memory is searched for a matching address. If
the address is found, the corresponding 12-bit data is read and sent to the
CPU. If no match occurs, the main memory is accessed for the word. The address-data pair is then transferred to the associative cache memory. If the cache is full, an address-data pair must be displaced to make room for a pair that is
needed and not presently in the cache. The decision as to what pair is replaced
is determined from the replacement algorithm that the designer chooses for the
cache. A simple procedure is to replace cells of the cache in round-robin order
whenever a new word is requested from main memory. This constitutes a first-
in first-out (FIFO) replacement policy.
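A minimal Python sketch of this fully associative behaviour (a behavioural model, not the hardware itself): the full 15-bit address is the lookup key, and an OrderedDict gives the FIFO replacement order described above; the 512-word capacity follows the running example.

from collections import OrderedDict

CACHE_WORDS = 512                      # cache capacity from the running example
cache = OrderedDict()                  # address -> data; insertion order gives FIFO

def read(addr, main_memory):
    if addr in cache:                  # the whole address is matched associatively
        return cache[addr]             # hit
    data = main_memory[addr]           # miss: fetch the word from main memory
    if len(cache) >= CACHE_WORDS:
        cache.popitem(last=False)      # FIFO: displace the oldest address-data pair
    cache[addr] = data
    return data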

Direct Mapping - Associative memories are expensive compared to random-access memories because of the added logic associated with each cell. The possibility of using a random-access memory for the cache is investigated in Fig.
13. The CPU address of 15 bits is divided into two fields. The nine least
significant bits constitute the index field and the remaining six bits form the tag



field. The figure shows that main memory needs an address that includes both
the tag and the index bits. The number of bits in the index field is equal to the
number of address bits required to access the cache memory.

In the general case, there are 2^k words in cache memory and 2^n words in main memory. The n-bit memory address is divided into two fields: k bits for the index field and n - k bits for the tag field. The direct mapping cache
organization uses the n-bit address to access the main memory and the k-bit
index to access the cache. The internal organization of the words in the cache
memory is as shown in Fig. 14(b). Each word in cache consists of the data word
and its associated tag. When a new word is first brought into the cache, the tag bits are stored alongside the data bits. When the CPU generates a memory request, the index field is used for the address to access the cache.

Figure 13. Addressing relationships between main and cache memories.

The tag field of the CPU address is compared with the tag in the word read from the cache. If the two tags
match, there is a hit and the desired data word is in cache. If there is no match, there
is a miss and the required word is read from main memory. It is then stored in the
cache together with the new tag, replacing the previous value. The disadvantage
of direct mapping is that the hit ratio can drop considerably
if two or more words whose addresses have the same index
but different tags are accessed repeatedly. However, this
possibility is minimized by the fact that such words are
relatively far apart in the address range (multiples of 512
locations in this example).

Figure 14. Direct mapping cache organization: (a) main memory; (b) cache memory.

To see how the direct-mapping organization operates, consider the numerical example shown in Fig. 14.



The word at address zero is presently stored in the cache (index = 000, tag = 00, data
= 1220). Suppose that the CPU now wants to access the word at address 02000. The
index address is 000, so it is used to access the cache. The two tags are then
compared. The cache tag is 00 but the address tag is 02, which does not produce a
match. Therefore, the main memory is accessed and the data word 5670 is
transferred to the CPU. The cache word at index address 000 is then replaced with a
tag of 02 and data of 5670. The direct-mapping example just described uses a block
size of one word. The same organization but using a block size of 8 words is shown in
Fig. 15.
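The address split used in this example (6-bit tag, 9-bit index, one-word blocks) can be sketched in Python as follows; the octal values repeat the figures from the text.

# 15-bit address = 6-bit tag | 9-bit index (block size of one word, as in the example)
def split(addr15):
    return addr15 >> 9, addr15 & 0o777     # (tag, index)

cache = {}                                  # index -> (tag, data); at most 512 entries
cache[0o000] = (0o00, 0o1220)               # word 1220 from address 00000 is cached

tag, index = split(0o02000)                 # CPU now requests address 02000
hit = index in cache and cache[index][0] == tag
print(hit)                                  # False: tags 02 and 00 differ, so it is a miss
cache[index] = (tag, 0o5670)                # word 5670 replaces the entry, with tag 02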

The index field is now divided into two parts: the block field and the word field. In a 512-word cache
there are 64 blocks of 8 words each, since 64 x 8 =
512. The block number is specified with a 6-bit
field and the word within the block is specified
with a 3-bit field. The tag field stored within the
cache is common to all eight words of the same
block. Every time a miss occurs, an entire block of
eight words must be transferred from main
memory to cache memory. Although this takes
extra time, the hit ratio will most likely improve
with a larger block size because of the sequential
nature of computer programs.

Figure 15 Direct mapping cache with block size of 8 words.

Set-Associative Mapping - It was mentioned previously that the disadvantage of direct mapping is that two words with the same index in their address but with different tag values cannot reside in cache memory at the same time. A third type of cache organization, called set-associative mapping, is an improvement over the direct-mapping organization in that each word of cache can store two or more words of memory under the same index address. Each data word is stored together with its tag, and the number of tag-data items in one word of cache is said to form a set.

Figure 16. Two-way set-associative mapping cache.

An example of a set-associative cache organization for a set size of two is shown in Fig.
16. Each index address refers to two data words and their associated tags. Each tag
requires six bits and each data word has 12 bits, so the word length is 2(6 + 12) = 36 bits. An index address of nine bits can accommodate 512 words. Thus, the size of



cache memory is 512 x 36. It can accommodate 1024 words of main memory since
each word of cache contains two data words. In general, a set-associative cache of set
size k will accommodate k words of main memory in each word of cache.

The octal numbers listed in Fig. 16 are with reference to the main memory
contents illustrated in Fig. 14(a). The words stored at addresses 01000 and
02000 of main memory are stored in cache memory at index address 000.
Similarly, the words at addresses 02777 and 00777 are stored in cache at index
address 777. When the CPU generates a memory request, the index value of the
address is used to access the cache. The tag field of the CPU address is then
compared with both tags in the cache to determine if a match occurs. The
comparison logic is done by an associative search of the tags in the set similar
to an associative memory search: thus the name "set-associative." The hit ratio
will improve as the set size increases because more words with the same index
but different tags can reside in cache.
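A brief behavioural sketch of the two-way lookup: each index holds a set of up to two tag-data pairs, and the CPU's tag is compared with every tag in the set.

SET_SIZE = 2                           # two-way set-associative, as in Fig. 16
cache = {}                             # index -> list of (tag, data) pairs

def lookup(addr15):
    tag, index = addr15 >> 9, addr15 & 0o777
    for t, data in cache.get(index, []):
        if t == tag:                   # associative comparison within the set
            return data                # hit
    return None                        # miss; if the set already holds SET_SIZE pairs,
                                       # a replacement algorithm must pick a victim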

When a miss occurs in a set-associative cache and the set is full, it is necessary to replace one of the tag-data items with a new value. The most common replacement algorithms used are: random replacement, first-in first-out (FIFO), and least recently used (LRU).

Writing into Cache


An important aspect of cache organization is concerned with memory write requests. When the CPU finds a word in cache during a read operation, the main memory is not involved in the transfer. However, if the operation is a write, there are two ways that the system can proceed. The simplest and most commonly used procedure is to update main memory with every memory write operation, with cache memory being updated in parallel if it contains the word at the specified address. This is called the write-through method. This method has the advantage that main memory always contains the same data as the cache. This characteristic is important in systems with direct memory access transfers. It ensures that the data residing in main memory are valid at all times, so that an I/O device communicating through DMA would receive the most recently updated data.

The second procedure is called the write-back method. In this method only the
cache location is updated during a write operation. The location is then marked
by a flag so that later when the word is removed from the cache it is copied into
main memory. The reason for the write-back method is that during the time a word
resides in the cache, it may be updated several times; however, as long as the word
remains in the cache, it does not matter whether the copy in main memory is out
of date, since requests for the word are filled from the cache.

Cache Initialization
One more aspect of cache organization that must be taken into consideration is
the problem of initialization. The cache is initialized when power is applied to the
computer or when the main memory is loaded with a complete set of programs from



auxiliary memory. After initialization the cache is considered to be empty, but in
effect it contains some non-valid data. It is customary to include with each word in
cache a valid bit to indicate whether or not the word contains valid data.

The cache is initialized by clearing all the valid bits to 0. The valid bit of a
particular cache word is set to 1 the first time this word is loaded from main memory
and stays set unless the cache has to be initialized again. The introduction of the
valid bit means that a word in cache is not replaced by another word unless the
valid bit is set to 1 and a mismatch of tags occurs. If the valid bit happens to be 0,
the new word automatically replaces the invalid data. Thus the initialization
condition has the effect of forcing misses from the cache until it fills with valid data.

Replacement and policies


A virtual memory system is a combination of hardware and software techniques. The memory management software system handles all the software operations for the efficient utilization of memory space. It must decide:

1. Which page in main memory ought to be removed to make room for a new
page?
2. When a new page is to be transferred from auxiliary memory to main memory.
3. Where the page is to be placed in main memory.

The hardware mapping mechanism and the memory management software together constitute the architecture of a virtual memory. When a program starts
execution, one or more pages are transferred into main memory and the page
table is set to indicate their position. The program is executed from main
memory until it attempts to reference a page that is still in auxiliary memory.
This condition is called page fault. When page fault occurs, the execution of the
present program is suspended until the required page is brought into main
memory. Since loading a page from auxiliary memory to main memory is
basically an I/O operation, the operating system assigns this task to the I/O
processor. In the meantime, control is transferred to the next program in
memory that is waiting to be processed in the CPU. Later, when the memory
block has been assigned and the transfer completed, the original program can
resume its operation.

When a page fault occurs in a virtual memory system, it signifies that the
page referenced by the CPU is not in main memory. A new page is then
transferred from auxiliary memory to main memory. If main memory is full, it
would be necessary to remove a page from a memory block to make room for
the new page. The policy for choosing pages to remove is determined from the
replacement algorithm that is used.

The goal of a replacement policy is to try to remove the page least likely to
be referenced in the immediate future. Three of the most common replacement algorithms used are first-in first-out (FIFO), least recently used (LRU), and optimal page replacement.



1. FIFO Page Replacement - The simplest page-replacement algorithm is a FIFO
algorithm. A FIFO replacement algorithm associates with each page the time when
that page was brought into memory. When a page must be replaced, the oldest page
is chosen. We can use a FIFO queue to hold all pages in memory: we replace the page at the head of the queue, and when a page is brought into memory, we insert it at the tail of the queue.

The FIFO page-replacement algorithm is easy to understand and program. Its performance is not always good. The page replaced may be an initialization module
that was used a long time ago and is no longer needed. If we select for replacement a
page that is in active use, everything still works correctly. After we page out an active
page to bring in a new one, a fault occurs almost immediately to retrieve the active
page. Some other page will need to be replaced to bring the active page back into
memory. Bad replacement choice increases the page-fault rate and slows process
execution, but does not cause incorrect execution.

To illustrate the problems that are possible with a FIFO page-replacement algorithm, we consider the reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1. For this example reference string, our three frames are initially empty. The first three references (7, 0, 1) cause page faults and are brought into these empty frames. The next
reference (2) replaces page 7, because page 7 was brought in first. Since 0 is the next
reference and 0 is already in memory, we have no fault for this reference. The first
reference to 3 results in page 0 being replaced, since it was the first of the three pages
in memory (0, 1, and 2) to be brought in. Because of this replacement, the next
reference, to 0, will fault. Page 1 is then replaced by page 0. This process continues as
shown in Figure 22. Every time a fault occurs, we show which pages are in our three
frames. There are 15 faults altogether.

Figure 22 FIFO page-replacement algorithm.

Consider the curve of page faults versus the number of available frames. For some reference strings (for example, 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5), we notice that the number of faults for four frames (10) is greater than the number of faults for three frames (nine)! This most unexpected result is known as Belady's anomaly: for some page-replacement algorithms, the page-fault rate may increase as the number of allocated frames increases, whereas we would expect that giving more memory to a process would improve its performance.
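A small simulation (a sketch, not part of the original text) that counts FIFO page faults; it reproduces the 15 faults of Figure 22 and the 9-versus-10 behaviour that illustrates Belady's anomaly.

from collections import deque

def fifo_faults(refs, frames):
    memory, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in memory:
            faults += 1
            if len(memory) == frames:           # memory full: evict the oldest page
                memory.discard(queue.popleft())
            memory.add(page)
            queue.append(page)
    return faults

refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
print(fifo_faults(refs, 3))                     # -> 15, as in Figure 22
belady = [1,2,3,4,1,2,5,1,2,3,4,5]
print(fifo_faults(belady, 3), fifo_faults(belady, 4))   # -> 9 10 (Belady's anomaly)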

2. Optimal Page Replacement - One result of the discovery of Belady's anomaly was the search for an optimal page-replacement algorithm. An optimal page-



replacement algorithm has the lowest page-fault rate of all algorithms, and will never suffer from Belady's anomaly. In the optimal page-replacement algorithm we replace the page that will not be used for the longest period of time. Use of this page-replacement algorithm guarantees the lowest possible page-fault rate for a fixed number of frames. Consider again our sample reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1.
The optimal page-replacement algorithm would yield nine page faults, as shown in
Figure 23. The first three references cause faults that fill the three empty frames. The
reference to page 2 replaces page 7, because 7 will not be used until reference 18,
whereas page 0 will be used at 5, and page 1 at 14. The reference to page 3 replaces
page 1, as page 1 will be the last of the three pages in memory to be referenced again.

Figure 23 Optimal page-replacement algorithm.

With only nine page faults, optimal replacement is much better than a FIFO algorithm,
which had 15 faults. (If we ignore the first three, which all algorithms must suffer, then
optimal replacement is twice as good as FIFO replacement.) In fact, no replacement
algorithm can process this reference string in three frames with less than nine faults.
Unfortunately, the optimal page-replacement algorithm is difficult to implement,
because it requires future knowledge of the reference string.
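Although OPT is not realizable in practice, it is easy to simulate offline when the whole reference string is known. The sketch below evicts the resident page whose next use lies farthest in the future (or that is never used again) and reproduces the nine faults of Figure 23.

def opt_faults(refs, frames):
    memory, faults = set(), 0
    for i, page in enumerate(refs):
        if page in memory:
            continue
        faults += 1
        if len(memory) == frames:
            # Evict the page whose next reference is farthest away (or never occurs).
            def next_use(p):
                rest = refs[i + 1:]
                return rest.index(p) if p in rest else float('inf')
            memory.discard(max(memory, key=next_use))
        memory.add(page)
    return faults

refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
print(opt_faults(refs, 3))      # -> 9, as in Figure 23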

3. LRU Page Replacement - If the optimal algorithm is not feasible, perhaps an approximation to the optimal algorithm is possible. The key distinction between the FIFO and OPT algorithms (other than looking backward or forward in time) is that the FIFO algorithm uses the time when a page was brought into memory, while the OPT algorithm uses the time when a page is to be used. If we use the recent past as an approximation of the near future, then we will replace the page that has not been used for the longest period of time (Figure 24). This approach is the least-recently-used (LRU) algorithm.

LRU replacement associates with each page the time of that page's last use. When a
page must be replaced, LRU chooses that page that has not been used for the
longest period of time. This strategy is the optimal page-replacement algorithm
looking backward in time, rather than forward.

The result of applying LRU replacement to our example reference string is shown in
Figure 24. The LRU algorithm produces 12 faults. Notice that the first five faults are
the same as the optimal replacement. When the reference to page 4 occurs, however,
LRU replacement sees that, of the three frames in memory, page 2 was used least



recently. The most recently used page is page 0, and just before that page 3 was used.
Thus, the LRU algorithm replaces page 2, not knowing that page 2 is about to be used.
When it then faults for page 2, the LRU algorithm replaces page 3 since, of the three
pages in memory {0, 3, 4}, page 3 is the least recently used. Despite these problems,
LRU replacement with 12 faults is still much better than FIFO replacement with 15.

Figure 24 LRU page-replacement algorithm.

The LRU policy is often used as a page-replacement algorithm and is considered to be good. The major problem is how to implement LRU replacement. An LRU page-
replacement algorithm may require substantial hardware assistance. Neither optimal
replacement nor LRU replacement suffers from Belady's anomaly. There is a class of
page-replacement algorithms, called stack algorithms, that can never exhibit Belady's
anomaly.
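One simple software model of LRU keeps the resident pages ordered by recency; the sketch below uses an OrderedDict for this and reproduces the 12 faults of Figure 24. As noted above, an efficient hardware implementation needs additional support (such as counters or a stack).

from collections import OrderedDict

def lru_faults(refs, frames):
    memory, faults = OrderedDict(), 0          # keys ordered from least to most recently used
    for page in refs:
        if page in memory:
            memory.move_to_end(page)           # a hit makes the page most recently used
        else:
            faults += 1
            if len(memory) == frames:
                memory.popitem(last=False)     # evict the least recently used page
            memory[page] = True
    return faults

refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
print(lru_faults(refs, 3))                     # -> 12, as in Figure 24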

Virtual memory: Paging (Single level and multilevel)


An address generated by the CPU is commonly referred to as a logical address,
whereas an address seen by the memory unit-that is, the one loaded into the memory-
address register of the memory-is commonly referred to as a physical address.

Figure 5. Dynamic relocation using a relocation register.

The compile-time and load-time address-binding methods generate identical logical and physical addresses. However, the execution-time address-binding scheme results in differing logical and physical addresses. In this case, we usually refer to the logical address as a virtual address. We use logical address and virtual address interchangeably in this text. The set of all logical addresses generated by a program is a logical-address
space; the set of all physical addresses corresponding to these logical addresses is a
physical-address space. Thus, in the execution-time address-binding scheme, the
logical- and physical-address spaces differ.



The run-time mapping from virtual to physical addresses is done by a hardware device
called the memory-management unit (MMU). The base register is now called a
relocation register. The value in the relocation register is added to every address
generated by a user process at the time it is sent to memory. For example, if the base
is at 14000, then an attempt by the user to address location 0 is dynamically relocated
to location 14000; an access to location 346 is mapped to location 14346. The user
program never sees the real physical addresses. The user program deals with logical
addresses. The memory-mapping hardware converts logical addresses into physical
addresses.

The user program supplies logical addresses; these logical addresses must be
mapped to physical addresses before they are used. The concept of a logical-address
space that is bound to a separate physical address space is central to proper memory
management.
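A minimal sketch of this relocation step (the 14000 base value repeats the example above; real MMUs also check the address against a limit register, which is omitted here):

RELOCATION_REGISTER = 14000        # base value from the example

def to_physical(logical_address):
    # Every CPU-generated (logical) address is relocated at the time it is sent to memory.
    return logical_address + RELOCATION_REGISTER

print(to_physical(0))              # -> 14000
print(to_physical(346))            # -> 14346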

In a memory hierarchy system, programs and data are first stored in auxiliary
memory. Portions of a program or data are brought into main memory as they are
needed by the CPU. Virtual memory is a concept used in some large computer
systems that permit the user to construct programs as though a large memory space
were available, equal to the totality of auxiliary memory. Each address that is
referenced by the CPU goes through an address mapping from the so-called
virtual address to a physical address in main memory.

Virtual memory is used to give programmers the illusion that they have a very
large memory at their disposal, even though the computer actually has a relatively
small main memory. A virtual memory system provides a mechanism for
translating program-generated addresses into correct main memory locations.
This is done dynamically, while programs are being executed in the CPU. The
translation or mapping is handled automatically by the hardware by means of
a mapping table.

Address Space and Memory Space - An address used by a programmer will be called a virtual address, and the set of such addresses the address space. An
address in main memory is called a location or physical address. The set of such
locations is called the memory space. Thus, the address space is the set of
addresses generated by programs as they reference instructions and data; the
memory space consists of the actual main memory locations directly addressable
for processing. In most computers the address and memory spaces are identical. The address space is allowed to be larger than the memory space in computers with virtual memory.

Figure 17. Relation between address and memory space in a virtual memory system.



In a multi-program computer system, programs and data are transferred to and
from auxiliary memory and main memory based on demands imposed by the
CPU. Suppose that program 1 is currently being executed in the CPU. Program 1
and a portion of its associated data are moved from auxiliary memory into main
memory as shown in Fig. 17. Portions of programs and data need not be in contiguous locations in memory, since information is being moved in and out, and empty spaces may be available in scattered locations in memory.

Figure 18. Memory table for mapping a virtual address.

The address field of the instruction code has a sufficient number of bits to specify all virtual addresses. In our example, the address field of an instruction code will consist of 20 bits, but physical memory addresses must be specified with only 15 bits. Thus the CPU will reference instructions and data with a 20-bit address, but the information at this address must be taken from physical memory, because access to auxiliary storage for individual words would be prohibitively long. (Remember that, for efficient transfers, auxiliary storage moves an entire record to the main memory.) A table is then needed, as shown in Fig. 18, to map a virtual address of 20 bits to a physical address of 15 bits. The mapping is a dynamic operation, which means that every address is translated immediately as a word is referenced by the CPU.

The mapping table may be stored in a separate memory as shown in Fig. 18 or in
main memory. In the first case, an additional memory unit is required as well as
one extra memory access time. In the second case, the table takes space from main
memory and two accesses to memory are required with the program running at
half speed. A third alternative is to use an associative memory.

Address Mapping Using Pages - The table implementation of the address mapping is simplified if the information in the address space and the memory
space are each divided into groups of fixed size. The physical memory is
broken down into groups of equal size called blocks, which may range from 64 to
4096 words each. The term page refers to groups of address space of the same
size. For example, if a page or block consists of 1K words, then, using the
previous example, address space is divided into 1024 pages and main memory
is divided into 32 blocks.

Although both a page and a block are split into groups of 1K words, a page refers
to the organization of address space, while a block refers to the organization
of memory space. The programs are also considered to be split into pages.
Portions of programs are moved from auxiliary memory to main memory in



records equal to the size of a page. The term "page frame" is sometimes used to denote a block. Consider a computer with an address space of 8K and a memory space of 4K. If we split each into groups of 1K words we obtain eight pages and four blocks as shown in Fig. 19. At any given time, up to four pages of address space may reside in main memory in any one of the four blocks.

Figure 19. Address space and memory space split into groups of 1K words.

The mapping from address space to memory space is facilitated if each virtual address is considered to be represented by two numbers: a page number address and a line within the page. In a computer with 2^p words per page, p bits are used to specify a line address and the remaining high-order bits of the virtual address specify the page number. In the example of Fig. 20, a virtual address has 13 bits. Since each page consists of 2^10 = 1024 words, the high-order three bits of a virtual address will specify one of the eight pages and the low-order 10 bits give the line address within the page.

The memory-page table consists of eight words, one for each page. The address
in the page table denotes the page number and the content of the word gives
the block number where that page is stored in main memory. The table shows
that pages 1, 2, 5, and 6 are now available in main memory in blocks 3, 0, 1, and
2, respectively. A presence bit 1 in each location indicates whether the page has
been transferred from auxiliary memory into main memory. A 0 in the presence
bit indicates that this page is not available in main memory. The CPU
references a word in memory with a virtual address of 13 bits. The three high-
order bits of the virtual address specify a page number and also an address
for the memory-page table. The content of the word in the memory page table
at the page number address is read out into the memory table buffer register.

Figure 20. Memory table in a paged system.

If the presence bit is a 1, the block number thus read is transferred to the two high-order bits of the main memory address register. The line number from the virtual address is transferred into the 10 low-order bits of the memory address register. A read signal to main memory transfers the



content of the word to the main memory buffer register ready to be used by the
CPU. If the presence bit in the word read from the page table is 0, it signifies
that the content of the word referenced by the virtual address does not reside
in main memory.
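The translation just described can be sketched as follows (a behavioural model only): the 3 high-order bits of the 13-bit virtual address index the memory-page table, and if the presence bit is 1 the block number is joined to the 10-bit line number. The page-to-block contents repeat the example of Fig. 20.

# Memory-page table from the example: page number -> (presence bit, block number)
page_table = {0: (0, None), 1: (1, 3), 2: (1, 0), 3: (0, None),
              4: (0, None), 5: (1, 1), 6: (1, 2), 7: (0, None)}

def translate(virtual_addr):              # 13-bit virtual address
    page = virtual_addr >> 10             # high-order 3 bits = page number
    line = virtual_addr & 0x3FF           # low-order 10 bits = line within the page
    present, block = page_table[page]
    if not present:
        raise RuntimeError("page fault")  # page must be brought in from auxiliary memory
    return (block << 10) | line           # 12-bit main memory address (2-bit block + line)

print(translate(1029))                    # page 1, line 5 -> block 3, physical address 3077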
Figure 21. An associative memory page table.

Associative Memory Page Table - A random-access memory page table is inefficient with respect to storage utilization. In the example of Fig. 21 we observe that eight words of memory are needed, one for each page, but at least four words will always be marked empty because main memory cannot accommodate more than four blocks.

In general, a system with n pages and m blocks would require a memory-page
table of n locations, of which up to m will be marked with block numbers and all
others will be empty. In the earlier example with 1024 pages and 32 blocks, the
capacity of the memory-page table must be 1024 words and only 32 locations may
have a presence bit equal to 1. At any given time, at least 992 locations will be
empty and not in use.

A more efficient way to organize the page table would be to construct it with a
number of words equal to the number of blocks in main memory. In this way
the size of the memory is reduced and each location is fully utilized. This
method can be implemented by means of an associative memory with each
word in memory containing a page number together with its corresponding
block number. The page field in each word is compared with the page number
in the virtual address. If a match occurs, the word is read from memory and
its corresponding block number is extracted.

Replace the random-access memory-page table with an associative memory of four
words as shown in Fig. 21. Each entry in the associative memory array
consists of two fields. The first three bits specify a field for storing the page
number. The last two bits constitute a field for storing the block number. The
virtual address is placed in the argument register. The page number bits in the
argument register are compared with all page numbers in the page field of the
associative memory. If the page number is found, the 5-bit word is read out
from memory. The corresponding block number, being in the same word, is
transferred to the main memory address register. If no match occurs, a call to
the operating system is generated to bring the required page from auxiliary
memory.
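
A rough Python model of this associative lookup (the table contents are illustrative, based on the four-block example above):

    # Associative page table: each entry pairs a page number with its block number.
    # Only resident pages appear, so the table needs just four entries.
    assoc_table = [(1, 3), (2, 0), (5, 1), (6, 2)]   # (page, block)

    def lookup(page_number):
        """Compare the page number against every entry 'simultaneously'."""
        for page, block in assoc_table:              # hardware does this in parallel
            if page == page_number:
                return block
        raise LookupError("page fault: OS must load page %d" % page_number)

    print(lookup(5))   # -> 1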

Paging
Paging is a memory-management scheme that permits the physical-address space
of a process to be non-contiguous. Paging avoids the considerable problem of fitting
memory chunks of varying sizes onto the backing store. When some code fragments or data residing in main
memory need to be swapped out, space must be found on the backing store.

Basic Method

Figure 8 Paging hardware.

Physical memory is broken into fixed-sized blocks called frames. Logical memory is
also broken into blocks of the same size called pages. When a process is to be
executed, its pages are loaded into any available memory frames from the backing
store. The backing store is divided into fixed-sized blocks that are of the same size
as the memory frames.

The hardware support for paging is illustrated in Figure 8. Every address generated by
the CPU is divided into two parts: a page number (p) and a page offset (d). The page
number is used as an index into a page table. The page table contains the base address
of each page in physical memory. This base address is combined with the page offset
to define the physical memory address that is sent to the memory unit. The paging
model of memory is shown in Figure 9.
The page size (like the frame size) is defined by the hardware. The selection of a page
size that is a power of 2 makes the translation of a logical address into a page number
and page offset particularly easy. If the size of the logical-address space is 2^m, and the
page size is 2^n addressing units (bytes or words), then the high-order m - n bits of a
logical address designate the page number, and the n low-order bits designate the page
offset. Thus,
the logical address is as follows:

    | page number p (m - n bits) | page offset d (n bits) |

Figure 9 Paging model of logical and physical memory.

where p is an index into the page table and d is the displacement within the page.
As an example, consider the memory in Figure 10. Using a page size of 4 bytes and
a physical memory of 32 bytes (8 pages), we show how the user's view of memory
can be mapped into physical memory. Logical address 0 is page 0, offset 0.
Indexing into the page table, we find that page 0 is in frame 5.
Thus, logical address 0 maps to physical address 20 (= (5 x 4) + 0). Logical address 3
(page 0, offset 3) maps to physical address 23 (= (5 x 4) + 3). Logical address 4 is page
1, offset 0; according to the page table, page 1 is mapped to frame 6. Thus, logical
address 4 maps to physical address 24 (= (6 x 4) + 0). Logical address 13 (page 3,
offset 1) maps to physical address 9 (= (2 x 4) + 1), since page 3 is mapped to frame 2.
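
These translations can be reproduced with a short sketch; the page-table entries below are the ones implied by the example (the mapping for page 2 is assumed, since the text does not use it):

    PAGE_SIZE = 4
    page_table = {0: 5, 1: 6, 2: 1, 3: 2}   # page -> frame; page 2 -> 1 is assumed

    def logical_to_physical(addr):
        page, offset = divmod(addr, PAGE_SIZE)
        return page_table[page] * PAGE_SIZE + offset

    for a in (0, 3, 4, 13):
        print(a, "->", logical_to_physical(a))   # 0->20, 3->23, 4->24, 13->9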

Figure 10 Paging example for a 32-byte memory with 4-byte pages.

Paging itself is a form of dynamic relocation. Every logical address is bound by the
paging hardware to some physical address. Using paging is similar to using a table of
base (or relocation) registers, one for each frame of memory. When we use a paging
scheme, we have no external fragmentation: Any free frame can be allocated to a
process that needs it. However, we may have some internal fragmentation.

When a process arrives in the system to be executed, its size, expressed in pages, is
examined. Each page of the process needs one frame. Thus, if the process requires n
pages, at least n frames must be available in memory. If n frames are available, they
are allocated to this arriving process. The first page of the process is loaded into one of
the allocated frames, and the frame number is put in the page table for this process.
The next page is loaded into another frame, and its frame number is put into the page
table, and so on.

An important aspect of paging is the clear separation between the user's view of
memory and the actual physical memory. The user program views memory as
one single contiguous space, containing only this one program. In fact, the user
program is scattered throughout physical memory, which also holds other programs.
The difference between the user's view of memory and the actual physical memory
is reconciled by the address-translation hardware. The logical addresses are
translated into physical addresses. This mapping is hidden from the user and is
controlled by the operating system. Since the operating system manages physical
memory, it must be aware of its allocation details; this information is generally kept
in a data structure called a frame table. The frame table has one entry for each physical page
frame, indicating whether the latter is free or allocated and, if it is allocated, to which
page of which process or processes.

TLB - Hardware Support


Each operating system has its own methods for storing page tables. Most allocate a
page table for each process. A pointer to the page table is stored with the other register
values in the process control block. The hardware implementation of the page table
can be done in several ways. In the simplest case, the page table is implemented as a
set of dedicated registers. These registers should be built with very high-speed logic
to make the paging-address translation efficient. Every access to memory must go
through the paging map, so efficiency is a major consideration.

The use of registers for the page table is satisfactory if the page table is reasonably
small (for example, 256 entries). Most contemporary computers allow the page table
to be very large (for example, 1 million entries). For these machines, the use of fast
registers to implement the page table is not feasible. Rather, the page table is kept in
main memory, and a page-table base register (PTBR) points to the page table.
Changing page tables requires changing only this one register, substantially reducing
context-switch time.
Figure 11 Free frames (a) before allocation and
(b) after allocation.

The problem with this approach is the time required to access a user memory
location. If we want to access location i, we must first index into the page table,
using the value in the PTBR offset by the page number for i. This task requires a
memory access. It provides us with the frame number, which is combined with the
page offset to produce the actual address. We can then access the desired place in
memory. With this scheme, two memory accesses are needed to access a byte (one for
the page-table entry, one for the byte). Thus, memory access is slowed by a factor of 2.
This delay would be intolerable under most circumstances.

The standard solution to this problem is to use a special, small, fast lookup
hardware cache, called translation look-aside buffer (TLB). The TLB is associative,
high-speed memory. Each entry in the TLB consists of two parts: a key (or tag) and a
value. When the associative memory is presented with an item, it is compared with all
keys simultaneously. If the item is found, the corresponding value field is returned.
The search is fast; the hardware, however, is expensive. Typically, the number of
entries in a TLB is small, often numbering between 64 and 1,024.

The TLB is used with page tables in the following way. The TLB contains only a few of
the page-table entries. When a logical address is generated by the CPU, its page
number is presented to
the TLB. If the page
number is found, its
frame number is
immediately available and
is used to access memory.
The whole task may take
less than 10 percent
longer than it would if an
unmapped memory
reference were used.
Figure 12 Paging hardware with
TLB.

If the page number is not in the TLB (known as a TLB miss), a memory reference to
the page table must be made. When the frame number is obtained, we can use it to
access memory (Figure 12). In addition, we
add the page number and frame number to the TLB, so that they will be found quickly on
the next reference. If the TLB is already full of entries, the operating system must select
one entry for replacement. Replacement policies range from least recently used (LRU) to
random. The percentage of times that a particular page number is found in the TLB is
called the hit ratio. To find the effective memory-access time, we must weigh each case
by its probability:

Effective access time = hit ratio x (time to search TLB + time to access memory)
+ miss ratio x (time to search TLB + time to access the page table in memory
+ time to access the desired byte in memory)
Numerical example - An 80-percent hit ratio means that we find the desired page
number in the TLB 80 percent of the time. If it takes 20 nanoseconds to search the TLB,
and 100 nanoseconds to access memory, then a mapped memory access takes 120
nanoseconds when the page number is in the TLB. If we fail to find the page number in
the TLB (20 nanoseconds), then we must first access memory for the page table and
frame number (100 nanoseconds), and then access the desired byte in memory (100
nanoseconds), for a total of 220 nanoseconds. The effective memory-access time is then:

Effective access time = 0.80 x 120 + 0.20 x 220 = 140 nanoseconds.
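
A quick check of this arithmetic (a minimal sketch using the timing values from the example above):

    def effective_access_time(hit_ratio, tlb_ns, mem_ns):
        """Weighted average of TLB-hit and TLB-miss access times."""
        hit_time = tlb_ns + mem_ns            # TLB search + one memory access
        miss_time = tlb_ns + 2 * mem_ns       # TLB search + page table + desired byte
        return hit_ratio * hit_time + (1 - hit_ratio) * miss_time

    print(effective_access_time(0.80, 20, 100))   # 140.0 nanoseconds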

Reliability of memory systems: Error detecting and error correcting systems.
What is Error? - An error is a condition in which the output information does not match
the input information. During transmission, digital signals suffer from noise that
can introduce errors in the binary bits travelling from one system to another. That means
a 0 bit may change to 1 or a 1 bit may change to 0.

Types Of Errors - In a data sequence, if a 1 is changed to 0 or a 0 is changed to 1, it is
called a "bit error". Generally, three types of errors occur in data transmission
from transmitter to receiver. They are

1. Single Bit Data Errors - The change of one bit in the whole data sequence is
called a "single bit error". Occurrence of a single bit error is very rare in a serial
communication system. This type of error occurs mainly in a parallel
communication system, since each bit is transferred on its own line and a
single noisy line changes only the corresponding bit.

2. Multiple Bit Data Errors - If two or more bits of the data sequence change between
transmitter and receiver, it is called a "multiple bit error". This type of error
occurs in both serial and parallel data communication networks.

3. Burst Errors - The change of a set of bits in the data sequence is called a "burst error".
The burst length is measured from the first changed bit to the last changed bit.


Here we identify the error from the 3rd bit to the 6th bit. The bits between the 3rd and
6th bits are also considered to be in error. This set of bits is called a "burst error".
These burst bits change from transmitter to receiver, which may cause a major error in
the data sequence. This type of error occurs in serial communication and is difficult to
solve.

Error-Detecting codes - Whenever a message is transmitted, it may get scrambled by
noise or the data may get corrupted. To avoid this, we use error-detecting codes, which
are additional data added to a given digital message to help us detect if an error
occurred during transmission of the message. A simple example of an error-detecting
code is the parity check.

Error-Correcting codes - Along with error-detecting code, we can also pass some data
to figure out the original message from the corrupt message that we received. This
type of code is called an error-correcting code. Error-correcting codes also deploy the
same strategy as error-detecting codes but additionally, such codes also detect the
exact location of the corrupt bit. In error-correcting codes, parity check has a simple
way to detect errors along with a sophisticated mechanism to determine the corrupt
bit location. Once the corrupt bit is located, its value is reverted (from 0 to 1 or 1
to 0) to get the original message.

How to Detect and Correct Errors? - To detect and correct the errors, additional
bits are added to the data bits at the time of transmission.

• The additional bits are called parity bits. They allow detection or correction of
the errors.
• The data bits along with the parity bits form a code word.

Types of Error detection


1. Parity Checking
2. Cyclic Redundancy Check (CRC)
3. Longitudinal Redundancy Check (LRC)
4. Check Sum

Parity Checking of Error Detection - It is the simplest technique for detecting
errors. The MSB of an 8-bit word is used as the parity bit and the remaining 7 bits
are used as data or message bits. The parity of the 8-bit transmitted word can be
either even parity or odd parity. Parity check is also called "Vertical Redundancy
Check (VRC)".

1. Even parity -- Even parity means the number of 1's in the given word,
including the parity bit, should be even (2, 4, 6, ...).

2. Odd parity -- Odd parity means the number of 1's in the given word,
including the parity bit, should be odd (1, 3, 5, ...).

Use of Parity Bit - The parity bit can be set to 0 or 1 depending on the type of
parity required.

1. For even parity, this bit is set to 1 or 0 such that the no. of "1 bits" in the
entire word is even. Shown in fig. (a).
2. For odd parity, this bit is set to 1 or 0 such that the no. of "1 bits" in the
entire word is odd. Shown in fig. (b).

How Does Error Detection Take Place?

Parity checking at the receiver can detect the presence of an error if the parity of the
received signal is different from the expected parity. That means, if it is known that
the parity of the transmitted signal is always going to be "even" and the received
signal has an odd parity, then the receiver can conclude that the received signal is
not correct. If an error is detected, then the receiver will ignore the received byte and
request retransmission of the same byte from the transmitter.
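
A small illustration of even-parity generation and checking (a sketch; the function names and data value are illustrative):

    def add_even_parity(data7):
        """Append an even-parity bit as the MSB of an 8-bit word (7 data bits)."""
        parity = bin(data7).count("1") % 2           # 1 if the data has an odd number of 1s
        return (parity << 7) | data7

    def check_even_parity(word8):
        """True if the received 8-bit word still contains an even number of 1s."""
        return bin(word8).count("1") % 2 == 0

    w = add_even_parity(0b0110111)                   # five 1s in the data -> parity bit = 1
    print(check_even_parity(w))                      # True (no error)
    print(check_even_parity(w ^ 0b00000100))         # single-bit error flips parity -> False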

CRC Code Generation - Based on the desired number of check bits, we add some
zeros (0) to the actual data. This new binary data sequence is divided by a divisor word
of length n + 1, where n is the number of check bits to be added. The remainder
obtained as a result of this modulo-2 division is added to the dividend bit sequence
to form the cyclic code. The generated code word is completely divisible by the
divisor that is used in the generation of the code. This code word is sent by the
transmitter.

At the receiver side, we divide the received code word by the same divisor to get the
actual code word. For an error-free reception of data, the remainder is 0. If the
remainder is non-zero, that means there is an error in the received code / data
sequence. The probability of error detection depends upon the number of check bits
(n) used to construct the cyclic code. For single-bit and two-bit errors, the probability
of detection is 100 %.
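
The modulo-2 division used above can be sketched as follows (the generator and data values are illustrative, not a particular standard CRC):

    def mod2_div(dividend, divisor):
        """Bitwise modulo-2 (XOR) division; returns the remainder as an int."""
        div_len = divisor.bit_length()
        rem = dividend
        while rem.bit_length() >= div_len:
            shift = rem.bit_length() - div_len
            rem ^= divisor << shift
        return rem

    data, gen = 0b1101, 0b1011                 # 4 data bits, generator of length 4 (3 check bits)
    padded = data << (gen.bit_length() - 1)    # append n zeros to the data
    codeword = padded | mod2_div(padded, gen)  # add the remainder to form the cyclic code
    print(bin(codeword))                       # transmitted code word
    print(mod2_div(codeword, gen) == 0)        # receiver: zero remainder -> no error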

Longitudinal Redundancy Check - In the longitudinal redundancy check method, a BLOCK
of bits is arranged in a table format (in rows and columns) and we calculate the parity
bit for each column separately. The set of these parity bits is also sent along with our
original data bits. Longitudinal redundancy check is a bit-by-bit parity computation, as
we calculate the parity of each column individually. This method can easily detect
burst errors and single bit errors, but it fails to detect 2-bit errors that occur in the
same vertical slice (column).
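
A compact illustration of column-wise even parity over a block of words (a sketch; the block values are arbitrary):

    from functools import reduce

    def lrc_even(block):
        """Column-wise even parity over a block of 8-bit words: XOR of all rows."""
        return reduce(lambda a, b: a ^ b, block, 0)

    block = [0b10110100, 0b01101100, 0b11100010]
    parity_row = lrc_even(block)                     # one parity bit per column
    print(format(parity_row, "08b"))
    # receiver: XOR of the data rows and the parity row is all zeros if no error
    print(lrc_even(block + [parity_row]) == 0)       # True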

Check Sum - Checksums are similar to parity bits except that the number of bits in the
sum is larger than a single parity bit and the result is constrained to be zero. That
means if the sum computed at the receiver is non-zero, an error is detected.

A checksum of a message is an arithmetic sum of code words of a certain length. The
sum is stated by means of 1's complement and stored or transferred as a code
extension of the actual code word. At the receiver, a new checksum is calculated from
the bit sequence received from the transmitter. The checksum method includes parity
bits, check digits and longitudinal redundancy check (LRC).

For example, if we have to transfer and detect errors for a long data sequence (also
called a data string), then we divide it into shorter words and we can store the data
in a word of the same width. Each incoming word is added to the data already
stored. At every instance, the newly accumulated sum is called the "checksum". At the
receiver side, if the checksum of the received bits is the same as that of the
transmitter's, no error is found.
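
A minimal sketch of a 1's-complement checksum over 8-bit words (illustrative word size and data values):

    def ones_complement_sum(words, bits=8):
        """Add words with end-around carry, as in a 1's-complement checksum."""
        mask = (1 << bits) - 1
        total = 0
        for w in words:
            total += w
            total = (total & mask) + (total >> bits)   # fold the carry back in
        return total

    data = [0x4A, 0x3C, 0x91]
    checksum = (~ones_complement_sum(data)) & 0xFF      # transmit the complement of the sum
    # receiver: summing the data plus checksum gives all ones (0xFF) when error-free
    print(hex(ones_complement_sum(data + [checksum])))  # 0xff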

Hamming Code - This error detecting and correcting code technique was developed by
R.W. Hamming. This code not only identifies the erroneous bit in the whole data sequence
but also corrects it. This code uses a number of parity bits located at certain
positions in the codeword. The number of parity bits depends upon the number of
information bits. The Hamming code uses the relation between the redundancy bits and
the data bits, and it can be applied to any number of data bits.

What is a Redundancy Bit? - Redundancy means "the difference between the number
of bits of the actual data sequence and the number of transmitted bits". These redundancy
bits are used in communication systems to detect and correct errors, if any.

How does the Hamming code actually correct errors? - In a Hamming code, the
redundancy bits are placed at certain calculated positions in order to detect and correct
errors. The number of bit positions in which two code words differ is called the
"Hamming distance". To understand the error detection and correction mechanism of
the Hamming code, let's go through the following stages.

Number of parity bits - The number of parity bits to be added to a data string depends
upon the number of information bits of the data string which is to be transmitted.
Number of parity bits will be calculated by using the data bits. This relation is given
below.
2^P >= n + P + 1

Here, n represents the number of bits in the data string and P represents the number
of parity bits.

For example, if we have a 4-bit data string, i.e. n = 4, then the number of parity bits to
be added can be found by using the trial and error method.

Let's take P = 2, then 2^P = 2^2 = 4 and n + P + 1 = 4 + 2 + 1 = 7

Since 4 < 7, this violates the required relation 2^P >= n + P + 1.

So let's try P = 3, then 2^P = 2^3 = 8 and n + P + 1 = 4 + 3 + 1 = 8

So we can say that 3 parity bits are required to transfer the 4 bit data with single bit
error correction.
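
This trial-and-error search is easy to automate (a small sketch of the same calculation):

    def parity_bits_needed(n):
        """Smallest P satisfying 2**P >= n + P + 1 (single-error-correcting Hamming code)."""
        P = 0
        while 2 ** P < n + P + 1:
            P += 1
        return P

    print(parity_bits_needed(4))   # 3 parity bits for 4 data bits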

Where to Place these Parity Bits? - After calculating the number of parity bits
required, we should know the appropriate positions to place them in the information
string, to provide single bit error correction. In the above considered example, we have
4 data bits and 3 parity bits. So, the total codeword to be transmitted is of 7 bits (4
+ 3). We generally represent the data sequence from right to left, as shown below.

bit 7, bit 6, bit 5, bit 4, bit 3, bit 2, bit 1

The parity bits have to be located at the positions that are powers of 2, i.e. at positions
1, 2, 4, 8, 16, etc. Therefore the codeword after including the parity bits will be like this

D7, D6, D5, P4, D3, P2, P1

Here P1, P2 and P4 are parity bits, and D3, D5, D6 and D7 are data bits.
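
A short sketch of encoding 4 data bits into this 7-bit code word using even parity (the parity groupings follow the standard power-of-2 positions; the helper name is illustrative):

    def hamming74_encode(d3, d5, d6, d7):
        """Place data bits at positions 3,5,6,7 and even-parity bits at positions 1,2,4."""
        p1 = d3 ^ d5 ^ d7          # parity over positions 1, 3, 5, 7
        p2 = d3 ^ d6 ^ d7          # parity over positions 2, 3, 6, 7
        p4 = d5 ^ d6 ^ d7          # parity over positions 4, 5, 6, 7
        # codeword listed as D7 D6 D5 P4 D3 P2 P1
        return [d7, d6, d5, p4, d3, p2, p1]

    print(hamming74_encode(1, 0, 1, 1))   # e.g. data bits D3=1, D5=0, D6=1, D7=1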
