
COMPUTER ORGANIZATION

AND ARCHITECTURE

Dr. Priyanka Singh


UNIT IV: MEMORY SYSTEM
Basic concepts – Semiconductor RAM – ROM – Speed, size, and cost –
Cache memories – Improving cache performance – Virtual memory –
Memory management requirements – Associative memories –
Secondary storage devices.



Basic Concepts
◼ Programs and the data they operate on are held in the memory of the computer.
◼ The maximum size of the memory that can be used in any computer is determined
by the addressing scheme. For example, a computer that generates 16-bit
addresses is capable of addressing up to 2^16 = 64K (kilo) memory locations.
◼ Machines whose instructions generate 32-bit addresses can utilize a memory that
contains up to 2^32 = 4G (giga) locations, whereas machines with 64-bit addresses
can access up to 2^64 = 16E (exa) ≈ 16 × 10^18 locations.
◼ The number of locations represents the size of the address space of the computer.
◼ The memory is usually designed to store and retrieve data in word-length quantities.
Consider, for example, a byte-addressable computer whose instructions generate
32-bit addresses. When a 32-bit address is sent from the processor to the memory
unit, the high-order 30 bits determine which word will be accessed. If a byte quantity
is specified, the low-order 2 bits of the address specify which byte location within
that word is involved.
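As an illustration, here is a minimal sketch (Python, not from the slides) of how a 32-bit byte address splits into a word address and a byte offset on a byte-addressable machine with 4-byte words:

```python
def split_byte_address(addr: int):
    """Split a 32-bit byte address for a machine with 4-byte words.

    The high-order 30 bits select the word; the low-order 2 bits
    select the byte within that word.
    """
    word_address = addr >> 2    # high-order 30 bits
    byte_offset = addr & 0b11   # low-order 2 bits
    return word_address, byte_offset

# Example: byte address 0x00001007 lies in word 0x401, byte 3.
print(split_byte_address(0x00001007))  # (1025, 3)
```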


Connection between the processor and its memory
◼ The processor uses the address lines to specify the memory location
involved in a data transfer operation, and uses the data lines to transfer
the data.
◼ At the same time, the control lines carry the command indicating a Read or
a Write operation and whether a byte or a word is to be transferred.
◼ The control lines also provide the necessary timing information and are used
by the memory to indicate when it has completed the requested operation.
◼ When the processor-memory interface receives the memory’s response, it
asserts the MFC (Memory Function Completed) signal.
◼ This is the processor’s internal control signal that indicates that the requested
memory operation has been completed.
◼ When MFC is asserted, the processor proceeds to the next step in its execution
sequence.
Memory Parameters
• Capacity
• Speed
• Latency
• Bandwidth

For a memory organized as L locations (words) of W bits each, the total capacity is L × W bits.



Speed
◼ A useful measure of the speed of memory units is the time that elapses between the
initiation of an operation to transfer a word of data and the completion of that
operation. This is referred to as the memory access time TA.
◼ Another important measure is the memory cycle time Tc, which is the minimum
time delay required between the initiation of two successive memory operations, for
example, the time between two successive Read operations.
◼ The cycle time is usually slightly longer than the access time, depending on the
implementation details of the memory unit.
◼ After the first Read operation, the information read from memory can be used
immediately by the CPU. However, the memory remains busy with internal
operations for some additional time, called the recovery time, RT.
◼ During this time, another memory access, Read or Write, cannot be initiated.
◼ The cycle time is therefore the sum of the access time and the recovery time:
Tc = TA + RT
Latency
◼ Data transfers to and from the main memory often involve blocks of data. The
speed of these transfers has a large impact on the performance of a
computer system.
◼ The memory access time defined earlier is not sufficient for describing the
memory’s performance when transferring blocks of data.
◼ During block transfers, memory latency is the amount of time it takes to
transfer the first word of a block.
◼ The time required to transfer a complete block depends also on the rate at
which successive words can be transferred and on the size of the block.
◼ The time between successive words of a block is much shorter than the time
needed to transfer the first word.



Bandwidth
◼ The discussion of block transfers above illustrates that we need a parameter
other than memory latency to describe the memory’s performance during block transfers.
◼ A useful performance measure is the number of bits or bytes that can be
transferred in one second. This measure is often referred to as the memory
bandwidth.
◼ It depends on the speed of access to the stored data and on the number of
bits that can be accessed in parallel.
◼ The rate at which data can be transferred to or from the memory depends on
the bandwidth of the system interconnections.
◼ For this reason, the interconnections used are designed to ensure that the
bandwidth available for data transfers between the processor and the memory is
very high.



Transfer rate
This is the rate at which data can be transferred into or out of a memory unit.
For random-access memory, it is equal to 1/(cycle time). For non-random-
access memory, the following relationship holds:

Tn = TA + n/R

where
Tn = average time to read or write n bits
TA = average access time
n = number of bits
R = transfer rate, in bits per second (bps)
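A small sketch (Python) applying the relationship above; the access time and transfer rate below are assumed illustrative values, not from the slides:

```python
def time_to_transfer(t_a: float, n_bits: int, r_bps: float) -> float:
    """Tn = TA + n/R: average time to read or write n bits."""
    return t_a + n_bits / r_bps

# Assumed example values: 60 ns access time, 1 Gbit/s transfer rate.
t_a = 60e-9
r = 1e9
print(time_to_transfer(t_a, 512, r))  # 60 ns + 512 ns = 5.72e-07 s
```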



Main Memory
◼ A memory unit is the collection of storage units or devices together. The
memory unit stores the binary information in the form of bits. Generally,
memory/storage is classified into two categories:
❑ Volatile memory: this loses its data when power is switched off (e.g., RAM).
❑ Non-volatile memory: this is permanent storage and does not lose any
data when power is switched off.
◼ Main memory is the central storage unit in a computer system:
❑ Large memory
❑ Made up of integrated chips
❑ Types:
◼ RAM (random access memory)
◼ ROM (read-only memory)
RAM (Random Access Memory)
◼ One distinguishing characteristic of memory that is designated as RAM is
that it is possible both to read data from the memory and to write new data
into the memory easily and rapidly.
◼ Both the reading and writing are accomplished through the use of electrical
signals.
◼ The other distinguishing characteristic of traditional RAM is that it is volatile.
◼ A RAM must be provided with a constant power supply. If the power is
interrupted, then the data are lost. Thus, RAM can be used only as temporary
storage.
◼ The two traditional forms of RAM used in computers are DRAM and SRAM.
Semiconductor RAM
◼ Semiconductor random-access memories (RAMs) are available in a wide range of
speeds. Their cycle times range from 100 ns to less than 10 ns.
◼ Memory cells are usually organized in the form of an array, in which each cell is
capable of storing one bit of information. Each row of cells constitutes a memory
word, and all cells of a row are connected to a common line
referred to as the word line, which is driven by the address decoder on the chip.
◼ The cells in each column are connected to a Sense/Write circuit by two bit lines, and
the Sense/Write circuits are connected to the data input/output lines of the chip.
◼ During a Read operation, these circuits sense, or read, the information stored in the
cells selected by a word line and place this information on the output data lines.
◼ During a Write operation, the Sense/Write circuits receive input data and store them
in the cells of the selected word.



Organization of bit cells in a memory chip



◼ The figure is an example of a very small memory circuit consisting of 16 words of
8 bits each. This is referred to as a 16 × 8 organization. The data input and
the data output of each Sense/Write circuit are connected to a single
bidirectional data line that can be connected to the data lines of a
computer. Two control lines, R/W and CS, are provided. The R/W
(Read/Write) input specifies the required operation, and the CS (Chip Select)
input selects a given chip in a multichip memory system.
◼ The memory circuit in the figure stores 128 bits and requires 14 external
connections for address, data, and control lines. It also needs two lines for
power supply and ground connections.
◼ Consider now a slightly larger memory circuit, one that has 1K (1024)
memory cells. This circuit can be organized as a 128 × 8 memory, requiring a
total of 19 external connections.


Alternatively, the same number of cells can be organized into a 1K × 1 format. In
this case, a 10-bit address is needed, but there is only one data line, resulting in
15 external connections. The required 10-bit address is divided into two groups
of 5 bits each to form the row and column addresses for the cell array. A row
address selects a row of 32 cells, all of which are accessed in parallel. But only
one of these cells is connected to the external data line, based on the column
address.
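A small sketch (Python) that reproduces the pin-count arithmetic of these examples; the counting conventions in the comments are taken from the text above:

```python
import math

def pin_count(words: int, bits_per_word: int, include_power: bool = True) -> int:
    """Address lines + data lines + 2 control lines (R/W, CS),
    optionally + 2 power/ground pins.

    Mirrors the arithmetic above; the 14-connection figure for the
    16 x 8 chip excludes the two power/ground pins.
    """
    addr = math.ceil(math.log2(words))
    pins = addr + bits_per_word + 2
    return pins + 2 if include_power else pins

print(pin_count(16, 8, include_power=False))  # 14 (16 x 8)
print(pin_count(128, 8))                      # 19 (128 x 8)
print(pin_count(1024, 1))                     # 15 (1K x 1)
```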



Static RAM
◼ Memories that consist of circuits
capable of retaining their state as
long as power is applied are known
as static memories.
◼ Static RAM (SRAM) is a digital device
that uses the same logic elements
used in the processor.
◼ In a SRAM, binary values are stored
using traditional flip-flop logic-gate
configurations. A static RAM will hold
its data as long as power is supplied
to it.
◼ Four transistors (T1, T2, T3, T4) are cross-connected in an arrangement that
produces a stable logic state. In logic state 1, point C1 is high and point C2
is low; in this state, T1 and T4 are off and T2 and T3 are on.
◼ In logic state 0, point C1 is low and point C2 is high; in this state, T1 and T4
are on and T2 and T3 are off. Both states are stable as long as the direct
current (dc) voltage is applied.
◼ Unlike the DRAM, no refresh is needed to retain data. As in the DRAM, the
SRAM address line is used to open or close a switch.
◼ The address line controls two transistors (T5 and T6). When a signal is
applied to this line, the two transistors are switched on, allowing a read or
write operation.
◼ For a write operation, the desired bit value is applied to line B, while its
complement is applied to line B̄. This forces the four transistors (T1, T2, T3,
T4) into the proper state. For a read operation, the bit value is read from line
B.
Dynamic RAM
◼ Static RAMs are fast, but their cells require several transistors.
Less expensive and higher density RAMs can be implemented
with simpler cells. But, these simpler cells do not retain their
state for a long period, unless they are accessed frequently for
Read or Write operations. Memories that use such cells are
called dynamic RAMs (DRAMs).
◼ A dynamic RAM (DRAM) is made with cells that store data as
charge on capacitors.
◼ The presence or absence of charge in a capacitor is interpreted
as a binary 1 or 0. Because capacitors have a natural tendency
to discharge, dynamic RAMs require periodic charge refreshing
to maintain data storage.
◼ The term dynamic refers to this tendency of the stored charge
to leak away, even with power continuously applied.
◼ The address line is activated when the bit value from this cell is to be read or
written. The transistor acts as a switch that is closed (allowing current to flow) if a
voltage is applied to the address line and open (no current flows) if no voltage is
present on the address line.
◼ For the write operation, a voltage signal is applied to the bit line; a high voltage
represents 1, and a low voltage represents 0. A signal is then applied to the
address line, allowing a charge to be transferred to the capacitor.
◼ For the read operation, when the address line is selected, the transistor turns on
and the charge stored on the capacitor is fed out onto a bit line and to a sense
amplifier. The sense amplifier compares the capacitor voltage to a reference
value and determines if the cell contains a logic 1 or a logic 0. The readout from
the cell discharges the capacitor, which must be restored to complete the
operation.
◼ Although the DRAM cell is used to store a single bit (0 or 1), it is essentially an
analog device. The capacitor can store any charge value within a range; a
threshold value determines whether the charge is interpreted as 1 or 0.
◼ Static RAM (SRAM)
❑ A bit of data is stored using the state of a flip-flop.
❑ Retains its value indefinitely, as long as it is kept powered.
❑ Mostly used to build the cache memory of a CPU.
❑ Faster and more expensive than DRAM.
◼ Dynamic RAM (DRAM)
❑ Each cell stores a bit with a capacitor and a transistor.
❑ Large storage capacity.
❑ Needs to be refreshed frequently.
❑ Used to build main memory.
❑ Slower and cheaper than SRAM.
ROM
◼ ROM is used for storing programs that are permanently resident in the
computer and for tables of constants that do not change in value once the
production of the computer is completed.
◼ The ROM portion of main memory is needed for storing an initial program
called the bootstrap loader, whose function is to start the computer software
operating when power is turned on.
◼ There are five basic ROM types:
❑ ROM - Read-Only Memory
❑ PROM - Programmable Read-Only Memory
❑ EPROM - Erasable Programmable Read-Only Memory
❑ EEPROM - Electrically Erasable Programmable Read-Only Memory
❑ Flash EEPROM memory

◼ An important application of ROMs is microprogramming. Other potential
applications include:
❑ Library subroutines for frequently wanted functions
❑ System programs
❑ Function tables
◼ For a modest-sized requirement, the advantage of ROM is that the data or
program is permanently in main memory and need never be loaded from a
secondary storage device.
◼ A ROM is created like any other integrated circuit chip, with the data actually
wired into the chip as part of the fabrication process. This presents two
problems:
❑ The data insertion step includes a relatively large fixed cost, whether one
or thousands of copies of a particular ROM are fabricated.
❑ There is no room for error. If one bit is wrong, the whole batch of ROMs
must be thrown out.
PROM
◼ When only a small number of ROMs with a particular memory content is
needed, a less expensive alternative is the programmable ROM (PROM).
◼ Like the ROM, the PROM is nonvolatile and may be written into only once.
◼ For the PROM, the writing process is performed electrically and may be
performed by a supplier or customer at a time later than the original chip
fabrication.
◼ Special equipment is required for the writing or “programming” process.
PROMs provide flexibility and convenience.
◼ The ROM remains attractive for high-volume production runs.
Read-mostly memory
◼ Another variation on read-only memory is the read-mostly memory, which is
useful for applications in which read operations are far more frequent than
write operations but for which nonvolatile storage is required.
◼ There are three common forms of read-mostly memory: EPROM, EEPROM,
and flash memory.
EPROM
◼ The erasable programmable read-only memory (EPROM) is read
and written electrically, as with PROM. However, before a write operation, all
the storage cells must be erased to the same initial state by exposure of the
packaged chip to ultraviolet radiation.
◼ Erasure is performed by shining an intense ultraviolet light through a window
that is designed into the memory chip. This erasure process can be
performed repeatedly; each erasure can take as much as 20 minutes to
perform.
◼ Thus, the EPROM can be altered multiple times and, like the ROM and
PROM, holds its data virtually indefinitely.
◼ For comparable amounts of storage, the EPROM is more expensive than
PROM, but it has the advantage of the multiple update capability.
EEPROM
◼ A more attractive form of read-mostly memory is electrically erasable
programmable read-only memory (EEPROM). This is a read-mostly memory
that can be written into at any time without erasing prior contents; only the
byte or bytes addressed are updated.
◼ The write operation takes considerably longer than the read operation, on the
order of several hundred microseconds per byte.
◼ The EEPROM combines the advantage of nonvolatility with the flexibility of
being updatable in place, using ordinary bus control, address, and data lines.
◼ EEPROM is more expensive than EPROM and also is less dense, supporting
fewer bits per chip.
Flash memory
◼ Another form of semiconductor memory is flash memory (so named because
of the speed with which it can be reprogrammed).
◼ First introduced in the mid-1980s, flash memory is intermediate between
EPROM and EEPROM in both cost and functionality.
◼ Like EEPROM, flash memory uses an electrical erasing technology. An entire
flash memory can be erased in one or a few seconds, which is much faster
than EPROM. In addition, it is possible to erase just blocks of memory rather
than an entire chip.
◼ Flash memory gets its name because the microchip is organized so that a
section of memory cells is erased in a single action, or “flash.” However,
flash memory does not provide byte-level erasure. Like EPROM, flash
memory uses only one transistor per bit, and so achieves the high density
(compared with EEPROM) of EPROM.
Cache Memory
◼ Small amount of fast memory
◼ Sits between normal main memory and CPU
◼ May be located on the CPU chip or module
❑ An entire block of data is copied from memory to the cache, because the
principle of locality tells us that once a byte is accessed, it is likely that a
nearby data element will be needed soon.
Cache Memory Principles
◼ Cache memory is designed to combine the fast access time of expensive,
high-speed memory with the large size of less expensive, lower-speed
memory.
◼ There is a relatively large and slow main memory together with a smaller, faster
cache memory. The cache contains a copy of portions of main memory. When
the processor attempts to read a word of memory, a check is made to determine
if the word is in the cache. If so, the word is delivered to the processor. If not, a
block of main memory, consisting of some fixed number of words, is read into
the cache and then the word is delivered to the processor.
◼ Because of the phenomenon of locality of reference, when a block of data is
fetched into the cache to satisfy a single memory reference, it is likely that there
will be future references to that same memory location or to other words in the
block.
Cache and Main Memory
Cache/Main Memory Structure
• The cache consists of m blocks, called lines.
• Each line contains K words, plus a tag of a few bits.
• The number of lines is considerably less than the number of main memory blocks (m << M).
Cache Read Operation
The processor generates the read address (RA) of a word to be read. If the
word is contained in the cache, it is delivered to the processor. Otherwise, the
block containing that word is loaded into the cache, and the word is delivered
to the processor.
Cache Hits
◼ The processor does not need to know explicitly about the existence of the cache. It simply
issues Read and Write requests using addresses that refer to locations in the memory.
◼ The cache control circuitry determines whether the requested word currently exists in the
cache. If it does, the Read or Write operation is performed on the appropriate cache location.
In this case, a read or write hit is said to have occurred. The main memory is not involved
when there is a cache hit in a Read operation.
◼ For a Write operation, the system can proceed in one of two ways.
◼ In the first technique, called the write-through protocol, both the cache location and the main
memory location are updated.
◼ The second technique is to update only the cache location and to mark the block containing
it with an associated flag bit, often called the dirty or modified bit. The main memory location
of the word is updated later, when the block containing this marked word is removed from the
cache to make room for a new block. This technique is known as the write-back, or copy-
back, protocol.
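A minimal sketch (Python, not from the slides) contrasting the two write policies; it models only the policy decision, not real cache-line bookkeeping:

```python
cache = {}   # address -> (value, dirty_bit)
memory = {}  # backing store

def write(addr, value, write_through=True):
    if write_through:
        cache[addr] = (value, False)  # update the cache...
        memory[addr] = value          # ...and main memory immediately
    else:
        cache[addr] = (value, True)   # write-back: mark the line dirty;
                                      # memory is updated only on eviction

def evict(addr):
    value, dirty = cache.pop(addr)
    if dirty:                         # write-back: flush on removal
        memory[addr] = value
```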



Cache Misses
◼ A Read operation for a word that is not in the cache constitutes a Read miss. It causes the
block of words containing the requested word to be copied from the main memory into the
cache.
◼ After the entire block is loaded into the cache, the particular word requested is forwarded to
the processor.
◼ Alternatively, this word may be sent to the processor as soon as it is read from the main
memory.
◼ The latter approach, which is called load-through, or early restart, reduces the processor’s
waiting time somewhat, at the expense of more complex circuitry.
◼ When a Write miss occurs in a computer that uses the write-through protocol, the information
is written directly into the main memory.
◼ For the write-back protocol, the block containing the addressed word is first brought into the
cache, and then the desired word in the cache is overwritten with the new information.



Cache Example
Consider a Level 1 cache capable of holding 1000 words with a 0.1 µs
access time. Level 2 is memory with a 1 µs access time.
◼ If 95% of memory accesses hit in the L1 cache:
❑ T = (0.95)(0.1 µs) + (0.05)(0.1 + 1 µs) = 0.15 µs
◼ If 5% of memory accesses hit in the L1 cache:
❑ T = (0.05)(0.1 µs) + (0.95)(0.1 + 1 µs) = 1.05 µs
◼ Want as many cache hits as possible!
[Figure: effective access time falls from 1.1 µs at a 0% hit ratio to 0.1 µs at a 100% hit ratio.]
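A quick sketch (Python) that reproduces these numbers:

```python
def effective_access_time(hit_ratio, t_cache_us=0.1, t_mem_us=1.0):
    """Hits cost t_cache; misses cost t_cache + t_mem (cache is checked first)."""
    return hit_ratio * t_cache_us + (1 - hit_ratio) * (t_cache_us + t_mem_us)

print(effective_access_time(0.95))  # 0.15 (microseconds)
print(effective_access_time(0.05))  # 1.05
```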
Elements Of Cache Design
Cache Addresses
◼ Almost all nonembedded processors, and many embedded processors,
support virtual memory.
◼ In essence, virtual memory is a facility that allows programs to address
memory from a logical point of view, without regard to the amount of main
memory physically available.
◼ When virtual memory is used, the address fields of machine instructions
contain virtual addresses. For reads from and writes to main memory, a
hardware memory management unit (MMU) translates each virtual address
into a physical address in main memory.
◼ When virtual addresses are used, the system designer may choose to place
the cache between the processor and the MMU or between the MMU and
main memory.
Logical cache & physical cache
• A logical cache, also known as a virtual cache, stores data using virtual
addresses. The processor accesses the cache directly, without going through
the MMU.
• A physical cache stores data using main-memory physical addresses.
◼ One obvious advantage of the logical cache is that cache access speed is
faster than for a physical cache, because the cache can respond before the
MMU performs an address translation.
◼ The disadvantage has to do with the fact that most virtual memory systems
supply each application with the same virtual memory address space. That
is, each application sees a virtual memory that starts at address 0. Thus, the
same virtual address in two different applications refers to two different
physical addresses. The cache memory must therefore be completely
flushed with each application context switch, or extra bits must be added to
each line of the cache to identify which virtual address space this address
refers to.
Cache Size
◼ We would like the size of the cache to be small enough so that the overall
average cost per bit is close to that of main memory alone and large enough
so that the overall average access time is close to that of the cache alone.
◼ There are several other motivations for minimizing cache size. The larger the
cache, the larger the number of gates involved in addressing the cache. The
result is that large caches tend to be slightly slower than small ones—even
when built with the same integrated circuit technology and put in the same
place on chip and circuit board.
◼ The available chip and board area also limits cache size. Because the
performance of the cache is very sensitive to the nature of the workload, it is
impossible to arrive at a single “optimum” cache size.
Mapping Function
◼ Because there are fewer cache lines than main memory blocks, an
algorithm is needed for mapping main memory blocks into cache lines.
◼ Further, a means is needed for determining which main memory block
currently occupies a cache line.
◼ The choice of the mapping function dictates how the cache is
organized.
◼ Three techniques can be used:
❑ direct,
❑ associative, and
❑ set-associative
Cache mapping (Example 1)



Mapping Function – 64K Cache Example 2
◼ Suppose we have the following configuration:
❑ Word size of 1 byte
❑ Cache of 64 KB
❑ Cache line / block size is 4 bytes
◼ i.e. the cache has 64 KB / 4 bytes = 16,384 (2^14) lines of 4 bytes = 16K lines
❑ Main memory of 16 MB
◼ 24-bit address (2^24 = 16M)
◼ 16 MB / 4 bytes per block → 4M (2^22) memory blocks
❑ Somehow we have to map the 4M blocks in memory onto the 16K lines in
the cache. Multiple memory blocks will have to map to the same line in
the cache!
Parameters for mapping (Example 2)
◼ For all three cases, the example includes the following elements:
❑ The cache can hold 64 KB.
❑ Data are transferred between main memory and the cache in blocks of
4 bytes each. This means that the cache is organized as 16K = 2^14 lines
of 4 bytes each.
❑ The main memory consists of 16 MB, with each byte directly addressable
by a 24-bit address (2^24 = 16M). Thus, for mapping purposes, we can
consider main memory to consist of 4M (2^22) blocks of 4 bytes each.
Direct mapping
◼ The simplest technique, known as direct mapping, maps each
block of main memory into only one possible cache line. The
mapping is expressed as
i = j modulo m
where
i = cache line number
j = main memory block number
m = number of lines in the cache
◼ Each block of main memory maps into one unique line of the cache. The next
m blocks of main memory map into the cache in the same fashion; that is,
block Bm of main memory maps into line L0 of cache, block Bm+1 maps into
line L1, and so on.
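A one-line sketch (Python) of the direct-mapping rule, using the m = 4 cache lines of the next slide:

```python
m = 4  # number of lines in the cache

def cache_line(block_number: int) -> int:
    """Direct mapping: i = j mod m."""
    return block_number % m

for j in range(8):
    print(f"memory block {j} -> cache line {cache_line(j)}")
# block 0 -> line 0, block 4 -> line 0, block 1 -> line 1, block 5 -> line 1, ...
```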
Direct Mapping with C=4
◼ Shrinking our example to a cache of 4 slots (each slot/line/block still
contains 4 words):
❑ Cache Line → Memory Blocks Held
◼ 0 → 0, 4, 8, …
◼ 1 → 1, 5, 9, …
◼ 2 → 2, 6, 10, …
◼ 3 → 3, 7, 11, …
❑ In general:
◼ 0 → 0, C, 2C, 3C, …
◼ 1 → 1, C+1, 2C+1, 3C+1, …
◼ 2 → 2, C+2, 2C+2, 3C+2, …
◼ 3 → 3, C+3, 2C+3, 3C+3, …
Direct Mapping with C=4
[Figure: a 4-slot cache (each slot with Valid, Dirty, and Tag fields) alongside main memory blocks 0-7.]
Each slot contains K words.
Tag: identifies which memory block is in the slot.
Valid: set after a block is copied from memory, to indicate that the cache line has valid data.
(Example 1: direct-mapping walkthrough figures omitted)
◼ For purposes of cache access, each main memory address can be viewed as
consisting of three fields. The least significant w bits identify a unique word or
byte within a block of main memory; in most contemporary machines, the
address is at the byte level. The remaining s bits specify one of the 2^s blocks
of main memory. The cache logic interprets these s bits as a tag of s - r bits
(most significant portion) and a line field of r bits. This latter field identifies
one of the m = 2^r lines of the cache. To summarize:
❑ Address length = (s + w) bits
❑ Number of addressable units = 2^(s+w) words or bytes
❑ Block size = line size = 2^w words or bytes
❑ Number of blocks in main memory = 2^s
❑ Number of lines in cache = m = 2^r
❑ Size of tag = (s - r) bits
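A sketch (Python) of this field decomposition using the Example 2 parameters (24-bit addresses, 4-byte blocks so w = 2, 16K lines so r = 14, and an 8-bit tag):

```python
W, R = 2, 14  # word-offset bits, line bits (Example 2)

def direct_map_fields(addr: int):
    word = addr & ((1 << W) - 1)         # low 2 bits: byte within block
    line = (addr >> W) & ((1 << R) - 1)  # next 14 bits: cache line
    tag = addr >> (W + R)                # top 8 bits: tag
    return tag, line, word

# 24-bit address 0x16339C:
print([hex(f) for f in direct_map_fields(0x16339C)])
# ['0x16', '0xce7', '0x0'] -> tag 0x16, line 0x0CE7, word 0
```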
Direct-Mapping Cache Organization
(Example 2)
◼ The direct mapping technique is simple and inexpensive to implement.
◼ Its main disadvantage is that there is a fixed cache location for any given
block. Thus, if a program happens to reference words repeatedly from two
different blocks that map into the same line, then the blocks will be
continually swapped in the cache, and the hit ratio will be low (a
phenomenon known as thrashing).
Fully Associative Mapping
◼ A fully associative mapping scheme can overcome the problems of the direct
mapping scheme:
❑ A main memory block can load into any line of cache
❑ Memory address is interpreted as tag and word
❑ Tag uniquely identifies a block of memory
❑ Every line’s tag is examined for a match
❑ Also need Dirty and Valid bits
◼ But cache searching gets expensive!
❑ Ideally we need circuitry that can simultaneously examine all tags for a match
❑ Lots of circuitry needed, high cost
◼ Need replacement policies now that anything can get thrown out of the cache
(will look at this shortly)
Associative mapping



Associative Mapping Example
[Figure: a 4-slot cache (each slot with Valid, Dirty, and Tag fields) alongside main memory blocks 0-7.]
A block can map to any slot.
The tag is used to identify which block is in which slot.
All slots are searched in parallel for the target.
(Example 1: associative-mapping figures omitted)

Fully Associative Cache Organization
Associative Mapping 64K Cache Example
◼ A 22-bit tag is stored with each slot in the cache - no bits for a slot/line number
are needed, since all tags are searched in parallel. Compare the tag field of a
target memory address with each tag entry in the cache to check for a hit. The
least significant 2 bits of the address identify which word is required from the
block, e.g.:
❑ Address: FFFFFC = 1111 1111 1111 1111 1111 1100
❑ Tag (leftmost 22 bits): 11 1111 1111 1111 1111 1111 = 3FFFFF
❑ Address: 16339C = 0001 0110 0011 0011 1001 1100 (Example 2)
❑ Tag (leftmost 22 bits): 00 0101 1000 1100 1110 0111 = 058CE7
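A sketch (Python) of the associative tag/word split for these two addresses:

```python
def assoc_fields(addr: int):
    """22-bit tag (high bits) and 2-bit word offset (low bits)."""
    return addr >> 2, addr & 0b11

for a in (0xFFFFFC, 0x16339C):
    tag, word = assoc_fields(a)
    print(f"{a:06X}: tag={tag:06X}, word={word}")
# FFFFFC: tag=3FFFFF, word=0
# 16339C: tag=058CE7, word=0
```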
(Example 2)



◼ With associative mapping, there is flexibility as to which block to replace
when a new block is read into the cache. The principal disadvantage of
associative mapping is the complex circuitry required to examine the tags of
all cache lines in parallel.
Set Associative Mapping
◼ Set-associative mapping is a compromise that exhibits the strengths of both
the direct and associative approaches while reducing their disadvantages. In
this case, the cache consists of a number of sets, each of which consists of a
number of lines. The relationships are
m = v × k
i = j modulo v
where
i = cache set number
j = main memory block number
m = number of lines in the cache
v = number of sets
k = number of lines in each set
◼ This is referred to as k-way set-associative mapping. With set-associative
mapping, block Bj can be mapped into any of the lines of set (j modulo v).
❑ A given block maps to any line in a specific set
◼ Use direct mapping to determine which set in the cache a memory block maps to
◼ The memory block could then be in any line of that set
❑ e.g. 2 lines per set
◼ 2-way associative mapping
◼ A given block can be in either of 2 lines in a specific set
❑ e.g. K lines per set
◼ K-way associative mapping
◼ A given block can be in one of K lines in a specific set
◼ Much easier to simultaneously search one set than all lines
Set Associative Mapping
◼ To compute the cache set number:
Set Num = j mod v
where j = main memory block number and v = number of sets in the cache
[Figure: a cache of 2 sets with 2 slots each (Set 0: slots 0-1, Set 1: slots 2-3) alongside main memory blocks 0-5.]
k-Way Set-Associative Cache Organization
◼ With fully associative mapping, the tag in a memory address is quite large
and must be compared to the tag of every line in the cache. With k-way set-
associative mapping, the tag in a memory address is much smaller and is
only compared to the k tags within a single set. To summarize:
◼ Address length = (s + w) bits
◼ Number of addressable units = 2^(s+w) words or bytes
(Example 1: set-associative mapping figures omitted)

Set Associative Mapping 64K Cache Example
◼ E.g. given our 64 KB cache with a line size of 4 bytes, we have 16,384 lines. Say
that we decide to create 8,192 sets, where each set contains 2 lines. Then we need
13 bits to identify a set (2^13 = 8192).
◼ Use the set field to determine which cache set to look in
◼ Compare the tag field of all slots in the set to see if we have a hit, e.g.:
❑ Address = 16339C = 0001 0110 0011 0011 1001 1100 (Example 2)
◼ Tag = 0 0010 1100 = 02C
◼ Set = 0 1100 1110 0111 = 0CE7
◼ Word = 00 = 0
❑ Address = 008004 = 0000 0000 1000 0000 0000 0100
◼ Tag = 0 0000 0001 = 001
◼ Set = 0 0000 0000 0001 = 0001
◼ Word = 00 = 0
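A sketch (Python) of the tag/set/word split used here (w = 2 word-offset bits, 13 set bits, 9-bit tag):

```python
W, S = 2, 13  # word-offset bits, set-index bits

def set_assoc_fields(addr: int):
    word = addr & ((1 << W) - 1)
    set_index = (addr >> W) & ((1 << S) - 1)
    tag = addr >> (W + S)
    return tag, set_index, word

for a in (0x16339C, 0x008004):
    tag, s, w = set_assoc_fields(a)
    print(f"{a:06X}: tag={tag:03X}, set={s:04X}, word={w}")
# 16339C: tag=02C, set=0CE7, word=0
# 008004: tag=001, set=0001, word=0
```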
Two-Way Set-Associative Mapping Example

(Example 2)



Question: Consider a direct-mapped cache of size 32 KB with a block size of 32
bytes. The CPU generates 32-bit addresses. Find the number of bits required
for cache indexing and the number of tag bits.
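A worked sketch (Python) for this question: offset = log2(block size), index = log2(number of lines), and the tag is whatever remains of the address:

```python
import math

block = 32         # bytes per block
cache = 32 * 1024  # 32 KB, direct mapped
addr_bits = 32

offset = int(math.log2(block))           # 5 bits
index = int(math.log2(cache // block))   # log2(1024) = 10 bits
tag = addr_bits - index - offset         # 32 - 10 - 5 = 17 bits
print(offset, index, tag)                # 5 10 17
```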



Example: A 4-way set-associative cache memory with a capacity of 16 KB is built
using a block size of 8 words. The word length is 32 bits. The size of the physical
address space is 4 GB. How many bits are needed for the Tag field?
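A worked sketch (Python) under the same approach; here a block is 8 words × 4 bytes = 32 bytes:

```python
import math

block = 8 * 4      # 8 words of 32 bits = 32 bytes
cache = 16 * 1024  # 16 KB, 4-way set associative
ways = 4
addr_bits = int(math.log2(4 * 1024**3))  # 4 GB -> 32-bit addresses

offset = int(math.log2(block))     # 5 bits
sets = cache // (block * ways)     # 128 sets
index = int(math.log2(sets))       # 7 bits
tag = addr_bits - index - offset   # 32 - 7 - 5 = 20 bits
print(tag)                         # 20
```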



Replacement Policy
◼ The replacement policy is the technique we use to determine
which line in the cache should be thrown out when we want to
put a new block in from memory
◼ Direct mapping
❑ No choice
❑ Each block only maps to one line
❑ Replace that line
Replacement Algorithms - Associative & Set-Associative
◼ Least recently used (LRU)
❑ e.g. in a 2-way set-associative cache, which of the 2 blocks is LRU?
◼ For each slot, keep an extra USE bit. Set it to 1 when the slot is accessed,
and set the other slot’s bit to 0.
❑ For more than 2-way set-associative, a time stamp is needed for each slot - expensive
◼ First-in first-out (FIFO)
❑ Replace the block that has been in the cache longest
❑ Easy to implement as a circular buffer
◼ Least frequently used (LFU)
❑ Replace the block which has had the fewest hits
❑ Needs a counter to sum the number of hits
◼ Random - almost as good as LFU and simple to implement

Example: Consider a reference string: 4, 7, 6, 1, 7, 6, 1, 2, 7, 2. The number of frames
in memory is 3. Find the number of page faults under:
◼ FIFO page replacement algorithm
◼ LRU page replacement algorithm
◼ Optimal page replacement algorithm

[Figures: page-fault tables for the reference string 4, 7, 6, 1, 7, 6, 1, 2, 7, 2 omitted.]

◼ Optimal: in the event of a page fault, replace the page which will not be used for the
longest duration of time in the future.
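A compact simulator (Python, not from the slides) that counts the page faults for this string under each policy:

```python
def faults(refs, frames, policy):
    mem, hist, count = [], [], 0
    for i, p in enumerate(refs):
        if p in mem:
            hist.append(p)  # record the use for LRU
            continue
        count += 1
        if len(mem) == frames:
            if policy == "FIFO":
                victim = mem[0]  # oldest resident page
            elif policy == "LRU":  # page whose last use is furthest in the past
                victim = min(mem, key=lambda q: max(j for j, r in enumerate(hist) if r == q))
            else:  # OPT: page not needed for the longest time in the future
                fut = refs[i + 1:]
                victim = max(mem, key=lambda q: fut.index(q) if q in fut else len(fut) + 1)
            mem.remove(victim)
        mem.append(p)
        hist.append(p)
    return count

refs = [4, 7, 6, 1, 7, 6, 1, 2, 7, 2]
for pol in ("FIFO", "LRU", "OPT"):
    print(pol, faults(refs, 3, pol))  # FIFO 6, LRU 6, OPT 5
```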
Least frequently used (LFU)
◼ Replace the block in the set that has experienced the fewest references.
LFU could be implemented by associating a counter with each line.
[Figure: LFU page-fault table for the reference string omitted; number of misses = 8.]
FIFO: Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5. Find the number of page
faults in each case.
Case 1: when the number of frames = 3
Case 2: when the number of frames = 4
That increasing the number of frames does not necessarily decrease the number
of page faults is known as Belady’s anomaly.


Thrashing
◼ If page faults, and hence swapping, happen very frequently, the operating
system has to spend much of its time swapping pages in and out. This
state is called thrashing.
◼ Because of this, CPU utilization is reduced.
◼ Causes: insufficient physical memory, and an excessive degree of
multiprogramming.
◼ How to eliminate thrashing: to resolve thrashing, you can do any of the
suggestions below.
❑ Increase the amount of RAM in the computer.
❑ Decrease the number of programs being run on the computer.
❑ Adjust the size of the swap file.



Cache Performance
◼ Two measures that characterize the performance of a cache are the hit ratio and the
effective access time:

Hit ratio = (number of times referenced words are in the cache) / (total number of memory accesses)

Effective access time = [(# hits)(time per hit) + (# misses)(time per miss)] / (total number of memory accesses)
Average memory access time (AMAT) with one- and two-level caches
◼ Let h be the hit rate, M the miss penalty, and C the time to access information in the
cache. The average access time experienced by the processor is
tavg = hC + (1 - h)M
◼ The average access time experienced by the processor in a system with two cache
levels is
tavg = h1C1 + (1 - h1)(h2C2 + (1 - h2)M)
where
h1 is the hit rate in the L1 caches.
h2 is the hit rate in the L2 cache.
C1 is the time to access information in the L1 caches.
C2 is the miss penalty to transfer information from the L2 cache to an L1 cache.
M is the miss penalty to transfer information from the main memory to the L2 cache.
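A small sketch (Python) of the two-level formula; the cycle counts below are assumed illustrative values, not from the slides:

```python
def amat_two_level(h1, c1, h2, c2, m):
    """tavg = h1*C1 + (1 - h1) * (h2*C2 + (1 - h2)*M)"""
    return h1 * c1 + (1 - h1) * (h2 * c2 + (1 - h2) * m)

# Assumed values: 1-cycle L1, 10-cycle L2 penalty, 100-cycle memory penalty.
print(amat_two_level(h1=0.95, c1=1, h2=0.90, c2=10, m=100))  # 1.9 cycles
```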
THANK YOU
