
Parallel Processing

Chapter - 4

Memory Hierarchies

Dr. Basant Tiwari


basanttiw@gmail.com
Department of Computer Science, Hawassa University
Memory Hierarchies

• Basic concept of hierarchical memory organization


• Discrepancies between memory speed and processor speed
• Cache memory design and implementation
• Virtual memory design and implementation
• Secondary memory technology (RAID)
Memory Hierarchy Technology
• Memory is the place where we keep data as well as instructions. As we
know, a variety of memories is available in the market.
• These memories are categorized according to their properties, such as the
speed of accessing data, storage capacity, volatile or non-volatile nature,
how data are stored and accessed, the rate at which data are transferred, etc.
• As an end user, the most important points that one considers while
designing the memory organization for a computer are:
• Size (capacity),

• Speed (access time),

• Cost per byte,

• Transfer bandwidth, and

• Unit of transfer.

• Storage devices such as registers, caches, main memory, disk devices
and tape units are often organized as a hierarchy of memory.
Memory Hierarchy Technology
As one goes down the hierarchy the following occur:
• Decrease in cost per bit
• Increase in capacity
• Increase in access time
• Decrease in frequency of access of the memory by the processor
Memory Hierarchy Technology
• The Access Time (Ti) refers to the round-trip time from the CPU to the ith level of
memory.
• The Memory Size (Si) is the number of bytes or words in level i.
• The cost of the ith-level memory is estimated by the product Ci·Si, where Ci is the
cost per byte.
• The Bandwidth (Bi) refers to the rate at which information is transferred between
adjacent levels.
• The Unit of Transfer (Xi) refers to the grain size for data transfer between levels i
and i+1.

• Memory devices at a lower level are faster to access, smaller in size and
more expensive per byte, having a higher bandwidth and using a smaller
unit of transfer as compared with those at a higher level. That is:
• Ti-1 < Ti
• Si-1 < Si
• Ci-1 > Ci
• Bi-1 > Bi
• Xi-1 < Xi
• for i = 1, 2, 3 and 4 in the hierarchy, where i = 0 corresponds to the CPU
registers.
Memory Hierarchy : Register and Cache
• The registers are very high speed memory and these are parts of the
processor complex, built either on the processor chip or on the processor
board.
• Register assignment is often made by the compiler. Register transfer operations
are directly controlled by the processor after instructions are decoded.
• Register transfer is conducted at processor speed, usually in one clock cycle.
Therefore many designers do not consider registers a level of memory.

• A cache memory is a fast and very expensive memory placed in between CPU
and main memory to increase the overall speed of program execution. Caches
are faster than main memory. The cache memories are very expensive and are
used in very small sizes.
• The references to memory at any given interval of time tend to be confined within a
few localized areas in memory. This principle is called locality of reference.
The cache works on this principle.
• The cache is controlled by the MMU and is transparent to the programmer. It is
used to increase the speed of processing by making the current program and data
available to the CPU at a rapid rate.
Memory Hierarchy : Main Memory

• The memory with which the CPU can directly communicate is called primary
memory.

• The main memory holds currently needed programs and data. These are
relatively expensive. There are two types of primary memory:

• RAM (Random Access Memory)

• ROM (Read Only Memory)

• RAM (Random Access Memory)

• This is a semiconductor memory in which both read and write operations can
be performed; therefore it is also called Read-Write (R/W) memory.

• The user program resides in the RAM just prior to execution by the CPU;
that is the reason why it is also known as user memory.

• Whenever electrical power goes off, the contents of this memory are lost.
Therefore it is also known as volatile or temporary memory. Nowadays
these are available as semiconductor chips called integrated circuits (ICs).
PRIMARY MEMORY
• Internally, a RAM is organized as equal-sized registers. Each register
holds a binary number called a memory word.
• Each register in the chip is identified by its address. Once a particular
register is selected by providing its address on the address inputs of the
chip, two operations can be performed.
• One: the contents of that register can be read out. This is called a read
operation.
• Two: a new binary number can be entered into that register. This is
called a write operation.
• There are two types of RAM in use:
• Static RAM
• Dynamic RAM
STATIC RAM

• The information in static RAM is stored in the form of voltage,
which is a static electrical quantity. These memories are formed
using a number of flip-flops, which can be manufactured with MOS
transistors.
• The voltage level in the output of a flip-flop stores a bit.
• Normally if the voltage is high a 1 is stored and if the voltage is
low, a 0 is stored.
• One flip-flop occupies a relatively large area within the chip;
therefore the packing density of static RAM is small. But the access
time of static RAM is low, so SRAM is a very fast memory. SRAMs
are very costly but easier to use.
DYNAMIC RAM
• The information in a dynamic RAM is stored in the form of a charge, which
is a dynamic electrical quantity. The charge is stored in a capacitor which
is provided by a MOSFET (Metal Oxide Semiconductor Field Effect
Transistor). The complete unit of a MOSFET and capacitor is called a MOS
cell.

• The presence of charge in the MOS cell may represent ‘1’ and the
absence of charge in the MOS cell may represent ‘0’.

• A MOS cell occupies a small area within the chip, thus a small chip may
contain a large number of such cells. Therefore the packing density of a
DRAM is large. Another advantage of a DRAM is that it is inexpensive
as compared to SRAM.

• But its access time is large as compared to SRAM, which means DRAMs
are slower. Another problem with DRAM is that the stored charge leaks
away within a few milliseconds. Therefore each DRAM must have a circuit
capable of reading the contents of the DRAM and recharging
(refreshing) the same contents. Such a circuit is called a refreshing
circuit.
DYNAMIC RAM

• A refreshing circuit recharges the DRAM every few milliseconds.
The IC available for this circuit is called a DRAM CONTROLLER.
Sometimes this controller circuit is built into the chip itself.
Such a DRAM chip is called an iRAM (Integrated RAM), Quasi-static
RAM or Pseudo-static RAM.
• Since DRAMs use MOSFET technology, their power consumption
is very small as compared to SRAM.
• Because of their low cost, higher packing density and small power
consumption, DRAMs are extensively used in computers.
Difference between Static & Dynamic RAM
S.No.  Static RAM                                 Dynamic RAM
1.     Bits stored in the form of voltage.        Bits stored in the form of charge.
2.     The smallest unit is a flip-flop.          The smallest unit is a MOS cell.
3.     Uses MOS technology.                       Uses MOSFET technology.
4.     Low access time, so faster.                High access time, so slower.
5.     Expensive.                                 Inexpensive.
6.     Low packing density.                       High packing density.
7.     Does not require a refreshing circuit.     Requires a refreshing circuit.
8.     Suitable for text information.             Suitable for video/graphics information.
9.     Large power dissipation.                   Small power dissipation.
Read Only Memory (ROM)
• Read-only memory is a type of non-volatile memory used in
computers and other electronic devices. Data stored in ROM
cannot be electronically modified after the manufacture of the
memory device.

• Permanent storage

• Microprogramming

• It contains the programming needed to start a PC, which is essential for
boot-up.

• There are numerous ROM chips located on the motherboard and a few on
expansion boards. These chips are essential for the basic input/output
system (BIOS), boot-up, reading and writing to peripheral devices, basic
data management and the software for basic processes for certain utilities.
Types of ROM
• ROM - written during manufacture
• Very expensive for small runs
• PROM - programmable (once)
• Needs special equipment to program
• Read "mostly" types:
• EPROM - Erasable Programmable ROM
• Erased by UV light
• EEPROM - Electrically Erasable PROM
• Takes much longer to write than read
• Flash memory
• Erased electrically, a whole memory (or block) at a time
Memory Hierarchy : Disk Drive and Tape Unit

• Devices that provide backing storage are called auxiliary memory. The
most commonly used auxiliary memories in computer systems are
magnetic disks and tapes.

• Disk drives and tape units are handled by the operating system
with limited user intervention.

• Disk storage is considered the highest level of online memory. It
holds system programs such as the OS and compilers, and some
user programs and their data sets.

• Magnetic tape units are offline memory used for backup
storage.
Memory Hierarchy Properties
• Information stored in a memory hierarchy (M1, M2,..Mn) satisfies three
important properties:
• Inclusion Property: it implies that all information items are originally
stored in level Mn. During processing, subsets of Mn are copied into
Mn-1. Similarly, subsets of Mn-1 are copied into Mn-2, and so on.
• Coherence Property: it requires that copies of the same information
item at successive memory levels be consistent.
• If a word is modified in the cache, copies of that word must be updated
immediately or eventually at all higher levels.
• Locality of References: the memory hierarchy was developed based
on a program behavior known as locality of references. Memory
references are generated by the CPU for either instruction or data
access. Frequently used information is found in the lower levels in order
to minimize the effective access time of the memory hierarchy.
Memory Hierarchy Properties
1. Access by word (4 bytes) from a cache block of 32 bytes, such as
block a.

2. Access by block (32 bytes) from a memory page of 32 blocks or
1 KB, such as block b from page B.

3. Access by page (1 KB) from a file consisting of many pages, such
as pages A and B in segment F.

4. Segment transfer with different numbers of pages.

[Figure] The inclusion property and data transfers between adjacent levels of a memory hierarchy.
Memory Capacity Planning
• The performance of a memory hierarchy is determined by the effective
access time (Teff) to any level in the hierarchy. It depends on the hit ratio
and access frequencies at successive levels.
• Hit Ratio (h): is a concept defined for any two adjacent levels of a
memory hierarchy. When an information item is found in Mi, it is a hit;
otherwise, a miss. The hit ratio (hi) at Mi is the probability that an
information item will be found in Mi. The miss ratio at Mi is defined as
1 - hi.
• The ratio of the number of hits divided by the total CPU references to
memory (hits + misses) is the hit ratio. A hit ratio of 0.9 or higher
confirms the validity of the locality of reference property.
(Num times referenced words are in cache)
Hit Ratio = --------------------------------------------------------
(Total number of memory accesses)
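
As a rough illustration of this ratio, here is a minimal sketch that counts hits over a simulated block-reference trace; the tiny FIFO cache, its size and the trace are illustrative assumptions, not from the slides:

```python
# A tiny FIFO cache over a block-reference trace; sizes and trace are
# illustrative assumptions, not taken from the slides.
from collections import deque

def hit_ratio(trace, cache_size=4):
    cache = deque(maxlen=cache_size)   # oldest block falls out when full
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1                  # referenced item already resident: hit
        else:
            cache.append(block)        # miss: bring the block in
    return hits / len(trace)

# Locality of reference: repeated references to a few blocks give a high ratio.
print(hit_ratio([1, 2, 1, 2, 1, 3, 1, 2]))   # 0.625
```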
Memory Capacity Planning
Effective Access Time (Teff):
• In practice, we wish to achieve as high a hit ratio as possible at Mi. Every time a
miss occurs, a penalty must be paid to access the next higher level of memory.
Misses are called BLOCK MISSES in the cache and PAGE FAULTS in the main
memory, because blocks and pages are the units of transfer between levels.
• The time penalty for a page fault is much greater than that for a block miss, due to
the fact that T1 < T2 < T3. Stone (1990) has pointed out that “a cache miss is 2 to
4 times as costly as a cache hit, but a page fault is 1000 to 10,000 times as costly
as a page hit”.
• The access frequency to Mi is defined as fi = hi(1-h1)(1-h2)…(1-hi-1). This is
indeed the probability of successfully accessing Mi: there are i-1 misses at
the lower levels and a hit at Mi.

• The Effective Access Time (Teff) of a memory hierarchy is the sum, over all
levels, of each level's access frequency times its access time:

Teff = Σ fi·Ti (i = 1 … n)
     = h1T1 + (1-h1)h2T2 + (1-h1)(1-h2)h3T3 + … + (1-h1)(1-h2)…(1-hn-1)hnTn

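A minimal sketch of this formula, assuming two illustrative levels where the last level always hits:

```python
# Evaluate Teff = sum_i fi*Ti with fi = hi*(1-h1)...(1-h_{i-1}).
# The hit ratios and access times below are illustrative values.

def effective_access_time(hit_ratios, access_times):
    teff, miss_prob = 0.0, 1.0
    for h, t in zip(hit_ratios, access_times):
        teff += (h * miss_prob) * t    # fi * Ti for this level
        miss_prob *= (1.0 - h)         # probability all levels so far missed
    return teff

# Two levels: cache (20 ns, h1 = 0.95) and main memory (100 ns, always hits).
print(effective_access_time([0.95, 1.0], [20, 100]))   # 24.0 ns
```
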
Cache Memory
CACHE MEMORY
• A cache memory is a fast and very expensive memory placed in between CPU and main
memory to increase the overall speed of program execution. Caches are faster than main
memory. The cache memories are very expensive and are used in very small sizes.
• Small amount of fast memory
• May be located on CPU chip or module
• The references to memory at any given interval of time tend to be confined within a few
localized areas in memory. This principle is called locality of reference.
• For example, if a memory read cycle takes 100 ns and a cache read cycle takes 20 ns,
then for four continuous references (the first one brings the main memory content into
the cache; the next three are read from the cache):

Time taken with cache = (100 × 1) + (20 × 3) = 160 ns
(100 ns for the first read operation, 20 ns for each of the last three reads)

Time taken without cache = 100 × 4 = 400 ns


• The performance of cache is closely related to the nature of the
programs being executed.
Cache and Main Memory
Cache Mapping Techniques
• The transformation of data from main memory to cache
memory is referred to as a mapping process. Three types
of mapping procedures are of practical interest when
considering the organization of cache memory:
 Direct mapping
 Associative mapping
 Set-Associative mapping
Direct Mapping
• Simplest mapping technique - each block of main memory
maps to only one specific cache block
• i.e. if a block is in the cache, it must be in one specific place

• Formula to map a memory block to a cache line:

i = j mod c
• i=Cache block

• j=Main Memory Block Number to be transferred

• c=Number of blocks in Cache

• i.e. we divide the memory block number by the number of cache blocks,
and the remainder is the cache block address.
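
A minimal sketch of this rule in code; the cache size and block numbers are illustrative:

```python
# Direct mapping: memory block j can live only in cache block j mod c.
# The cache size (c = 4) and the block range are illustrative.

def direct_map(j, c):
    return j % c                       # the one slot block j may occupy

c = 4
for j in range(8):                     # main-memory blocks 0..7
    print(f"memory block {j} -> cache block {direct_map(j, c)}")
# Blocks 0 and 4 contend for cache block 0, blocks 1 and 5 for block 1,
# and so on - the source of the thrashing problem noted below.
```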
Direct Mapping with C=4

[Figure: a 4-slot cache (Slot 0 - Slot 3, each with Valid, Dirty and Tag
fields) shown against main-memory blocks 0 - 7.]

• Each slot contains K words.
• Tag: identifies which memory block is in the slot.
• Valid: set after a block is copied from memory, to indicate that the cache
line holds valid data.
Direct Mapping pros & cons

• Simple
• Inexpensive
• Fixed location for given block
• If a program repeatedly accesses 2 blocks that map to the same
cache block, cache misses are very high – a
condition called thrashing
Fully Associative Mapping
• A fully associative mapping scheme can overcome the problems of the
direct mapping scheme
• A main memory block can load into any line/block of cache

• Memory address is interpreted as tag and word

• Tag uniquely identifies block of memory

• Every line’s tag is examined for a match

• Search the memory in parallel.

• But Cache searching gets expensive!

• Ideally need circuitry that can simultaneously examine all tags for a
match
• Lots of circuitry needed, high cost
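
Hardware examines all tags simultaneously with comparators; the sequential loop below is only an illustrative sketch of the same lookup, with made-up tags and valid bits:

```python
# Sequential stand-in for the parallel tag comparison described above;
# the tag and valid-bit contents are made up for illustration.

def associative_lookup(tags, valid, block):
    for line, (tag, v) in enumerate(zip(tags, valid)):
        if v and tag == block:         # valid line with matching tag: hit
            return line
    return None                        # miss: the block may load into ANY line

tags  = [7, 0, 3, 5]                   # one tag per cache line
valid = [True, True, False, True]      # line 2 holds stale data
print(associative_lookup(tags, valid, 3))   # None - tag matches but line invalid
print(associative_lookup(tags, valid, 5))   # 3
```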
FULL ASSOCIATIVE MAPPING
Set Associative Mapping
• Compromise between fully-associative and direct-mapped cache.
• Cache is divided into a number of sets
• Each set contains a number of lines/cache blocks
• A given block maps to any line/cache block in a specific set
 Use direct-mapping to determine which set in the cache
corresponds to a set in memory
 Memory block could then be in any line/cache block of that set
• e.g. 2 lines/cache block per set
 2 way associative mapping
 A given memory block can be in either of 2 lines/cache block in a
specific set
• e.g. K lines per set
 K way associative mapping
 A given memory block can be in one of K lines/cache block in a
specific set
 Much easier to simultaneously search one set than all lines
Set Associative Mapping
• To compute cache set number:
SetNum = j mod v
• j = main memory block number
• v = number of sets in cache

[Figure: a two-set, two-way cache (Set 0 = slots 0-1, Set 1 = slots 2-3)
shown against main-memory blocks 0 - 5.]
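
A minimal sketch of this placement rule, using the two-set, two-way geometry of the figure (all sizes illustrative):

```python
# SetNum = j mod v chooses the set; the block may then occupy any of the
# k ways inside it. v = 2 sets and k = 2 ways mirror the figure above.

def set_assoc_place(j, v=2, k=2):
    set_num = j % v                            # direct-mapped choice of set
    slots = [set_num * k + way for way in range(k)]
    return set_num, slots                      # candidate slots for block j

for j in range(6):                             # main-memory blocks 0..5
    s, slots = set_assoc_place(j)
    print(f"block {j} -> set {s}, candidate slots {slots}")
```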
Cache Writing Policy

Write through
• Simplest technique to handle the cache inconsistency problem -
all writes go to main memory as well as the cache at the same time.
• Multiple CPUs must monitor main memory traffic (snooping) to
keep their local caches up to date in case another CPU
also has a copy of a shared memory location in its cache.
• Simple, but lots of traffic
• Slows down writes
Cache Writing Policy

Write Back
• Updates are initially made in the cache only
• The dirty bit is set when we write to the cache; this indicates that the
cache is now inconsistent with main memory

• The dirty bit for a cache slot is cleared when a new block is loaded into it

• If a cache line is to be replaced and its dirty bit is set, the existing
cache line is written back to main memory before the
new memory block is loaded
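
A minimal sketch of the write-back policy, assuming a single cache slot for brevity; the names and data are illustrative:

```python
# One cache slot with a dirty bit; dictionary "memory" stands in for RAM.

class WriteBackSlot:
    def __init__(self):
        self.block, self.data, self.dirty = None, None, False

    def load(self, block, memory):
        if self.block == block:
            return                         # already resident
        if self.dirty:                     # write the old line back first
            memory[self.block] = self.data
        self.block, self.data = block, memory.get(block)
        self.dirty = False                 # fresh copy matches main memory

    def write(self, block, data, memory):
        self.load(block, memory)
        self.data = data
        self.dirty = True                  # cache now inconsistent with memory

memory = {0: "old", 1: "x"}
slot = WriteBackSlot()
slot.write(0, "new", memory)               # update stays in the cache only
slot.load(1, memory)                       # replacement forces the write-back
print(memory[0])                           # "new"
```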
Virtual Memory
What is Virtual Memory
• Virtual memory is an alternate set of memory addresses.
Programs use these virtual addresses rather than real addresses
to store instructions and data.
• When the program is actually executed, the virtual addresses are
converted into real memory addresses.
OBJECTIVE
• When a computer is executing many programs at the same time,
virtual memory lets them share physical memory efficiently.
• It eliminates the restriction that a program must fit within the
small, limited physical memory of the computer.
• When many programs are running at the same time, by giving
each program its own suitable memory area, VM protects the
programs from interfering with each other's memory areas.
How does it work…
• To facilitate copying virtual memory into real memory, the
operating system divides virtual memory into pages, each of
which contains a fixed number of addresses.

• Each page is stored on a disk until it is needed.

• When the page is needed, the operating system copies it from
disk to main memory, translating the virtual addresses into real
addresses.
Paging
• Paging is a virtual memory management technique.

• Paging provides a somewhat easier interface for programs, in that its
operation tends to be more automatic and thus transparent.

• Each unit of transfer, referred to as a page, is of a fixed size and is swapped
by the virtual memory manager outside of the program's control.

• Paging uses virtual addresses, which are mapped to physical memory as
necessary.

• A technique used by virtual memory operating systems to help ensure that the
data you need is available as quickly as possible.

• The operating system copies a certain number of pages from your storage
device to main memory.

• When a program needs a page that is not in main memory (called a PAGE
FAULT), the operating system copies the required page into memory and
copies another page back to the disk.
Page Table
PAGE FAULT

• An interrupt to the software, raised by the hardware, when a
program accesses a page that is not mapped in physical
memory (or RAM).

• A page fault occurs when a program attempts to access a block
of memory that is not stored in the physical memory, or RAM.

• The fault notifies the operating system that it must locate the
data in virtual memory, then transfer it from the storage device,
such as an HDD, to the RAM.
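
A minimal sketch of the translation step that raises such a fault, assuming an illustrative page size and page-table contents:

```python
# Virtual page number indexes a page table; a missing entry is a page fault.
# PAGE_SIZE and the table contents are illustrative.

PAGE_SIZE = 1024

def translate(virtual_addr, page_table):
    page, offset = divmod(virtual_addr, PAGE_SIZE)
    frame = page_table.get(page)           # None means "not in RAM"
    if frame is None:
        raise RuntimeError(f"page fault: page {page} not in physical memory")
    return frame * PAGE_SIZE + offset      # real (physical) address

page_table = {0: 5, 1: 2}                  # virtual page -> physical frame
print(translate(1500, page_table))         # page 1, offset 476 -> 2524
translate(5000, page_table)                # page 4 unmapped: raises a page fault
```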
PAGING REPLACEMENT ALGORITHMS

• FIFO (first in/first out): rather than choosing the
victim page at random, the oldest page is the first to be
removed.
• LRU (Least Recently Used): move out the page that has gone
unused for the longest time.
• LFU (Least Frequently Used): move out the page that has been
used least often in the past.
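
A minimal sketch of FIFO and LRU victim selection; the reference trace and frame count are illustrative:

```python
# Count page faults under FIFO or LRU eviction; the trace and frame
# count are illustrative.
from collections import OrderedDict

def count_faults(trace, frames, lru=False):
    resident, faults = OrderedDict(), 0
    for page in trace:
        if page in resident:
            if lru:
                resident.move_to_end(page)    # LRU: mark as most recently used
        else:
            faults += 1
            if len(resident) == frames:
                resident.popitem(last=False)  # evict oldest (FIFO) or least recent (LRU)
            resident[page] = True
    return faults

trace = [1, 2, 3, 1, 4, 1, 2]
print(count_faults(trace, frames=3))              # FIFO: 6 faults
print(count_faults(trace, frames=3, lru=True))    # LRU: 5 faults
```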
Secondary Storage
Disk Geometry
• Disks consist of platters, each with two surfaces
• Each surface consists of concentric rings called tracks
• Each track consists of sectors separated by gaps

[Figure: one disk surface, showing the spindle, concentric tracks
(including track k), sectors and the gaps between them.]
Disk Geometry
(Multiple-Platter View)
• Aligned tracks form a cylinder
[Figure: three platters (surfaces 0 - 5) stacked on a common spindle;
the aligned tracks across all surfaces form cylinder k.]
Disk Capacity
• Capacity: maximum number of bits that can be stored
• Vendors express capacity in units of gigabytes (GB), where 1 GB = 10^9
bytes, or terabytes (TB, where 1 TB = 10^12 bytes)

• Capacity is determined by technology factors:
• Recording density (bits/in): number of bits that can be squeezed into a
1-inch segment of a track.
• Track density (tracks/in): number of tracks that can be squeezed into a
1-inch radial segment.
• Areal density (bits/in^2): product of recording density and track density.

• Modern disks partition tracks into disjoint subsets called recording
zones
• Each track in a zone has the same number of sectors, determined by the
circumference of the innermost track
• Each zone has a different number of sectors per track
Computing Disk Capacity
• Capacity = (# bytes/sector) x (avg. # sectors/track) x
(# tracks/surface) x (# surfaces/platter) x
(# platters/disk)
• Example:
• 512 bytes/sector
• 300 sectors/track (on average)
• 20,000 tracks/surface
• 2 surfaces/platter
• 5 platters/disk

• Capacity = 512 × 300 × 20,000 × 2 × 5
= 30,720,000,000 bytes
= 30.72 GB
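
The same computation as a small script, using the example numbers above:

```python
# The slide's capacity formula evaluated with its example numbers.

def disk_capacity(bytes_per_sector, sectors_per_track,
                  tracks_per_surface, surfaces_per_platter, platters):
    return (bytes_per_sector * sectors_per_track *
            tracks_per_surface * surfaces_per_platter * platters)

cap = disk_capacity(512, 300, 20_000, 2, 5)
print(f"{cap:,} bytes")                # 30,720,000,000 bytes
print(cap / 10**9, "GB")               # 30.72 GB (vendor GB = 10^9 bytes)
```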
Disk Operation (Single-Platter View)

The disk surface


Read/write head
spins at a fixed
is attached to end
rotational rate
• of the arm and flies over
disk surface on
thin cushion of air

spindle

By moving radially, arm can


position read/write head over
any track
Disk Operation (Multi-Platter View)

[Figure: multiple platters on one spindle, with one arm per surface.]

• The read/write heads move in unison from cylinder to cylinder.
Disk Controller

[Figure: the processor and main memory sit on the system bus; a disk
controller connects two disk drives to the bus.]

Disks are connected to the system bus through a disk controller.

Disk Controller

• Seek
• Read
• Write
• Error checking
Disk Access Time
• Average time to access some target sector approximated by :
• Taccess = Tavg seek + Tavg rotation + Tavg transfer

• Seek time (Tavg seek)
• Time to position the heads over the cylinder containing the target sector
• Typical Tavg seek = 9 ms

• Rotational latency (Tavg rotation)
• Time waiting for the first bit of the target sector to pass under the r/w head
• Tavg rotation = 1/2 × 1/RPM × 60 sec/1 min

• Transfer time (Tavg transfer)
• Time to read the bits in the target sector
• Tavg transfer = 1/RPM × 1/(avg # sectors/track) × 60 secs/1 min
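
A minimal sketch that evaluates this model, with illustrative drive parameters:

```python
# Taccess = Tavg seek + Tavg rotation + Tavg transfer, in milliseconds.
# The drive parameters (9 ms seek, 7200 RPM, 400 sectors/track) are illustrative.

def disk_access_time_ms(seek_ms, rpm, sectors_per_track):
    ms_per_rev = 60_000 / rpm                    # one full revolution
    rotation_ms = 0.5 * ms_per_rev               # half a revolution on average
    transfer_ms = ms_per_rev / sectors_per_track # time for one sector
    return seek_ms + rotation_ms + transfer_ms

print(disk_access_time_ms(seek_ms=9, rpm=7200, sectors_per_track=400))
# ~13.19 ms - seek and rotation dominate; the transfer itself is negligible
```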
RAID

• RAID is short for redundant array of independent (or
inexpensive) disks.

• It is a category of disk drives that employ two or more drives
in combination for fault tolerance and performance.

• RAID disk drives are used frequently on servers but aren't
generally necessary for personal computers.

• RAID allows you to store the same data redundantly (in
multiple places) in a balanced way to improve overall
storage performance.
RAID Disk Arrays

• Redundant Array of Inexpensive Disks
• Using multiple disks makes huge storage cheaper, and also makes it
possible to improve the reliability of the overall system.
• RAID0 – data striping
• RAID1 – identical copies of data on two disks
• RAID2, 3, 4 – increased reliability
• RAID5 – parity-based error-recovery
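
A minimal sketch of RAID0 striping (the data-striping level listed above); the disk count is illustrative:

```python
# RAID0 striping: logical block b goes to disk b mod n at stripe b // n.
# The disk count is illustrative.

def raid0_place(block, n_disks):
    return block % n_disks, block // n_disks   # (disk, stripe offset)

for b in range(8):                             # logical blocks 0..7 on 4 disks
    disk, stripe = raid0_place(b, 4)
    print(f"logical block {b} -> disk {disk}, stripe {stripe}")
# Consecutive blocks land on different disks, so large transfers
# proceed from several drives in parallel.
```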
End of Chapter - 4
