Parallel Processing Chapter - 4: Memory Hierarchies
• A cache memory is a fast and very expensive memory placed between the CPU
and main memory to increase the overall speed of program execution. Caches
are faster than main memory, but because they are very expensive they are
used only in small sizes.
• References to memory in a given interval of time tend to be confined to a
few localized areas of memory. This principle is called locality of reference,
and the cache works on this principle.
• The cache is controlled by the MMU and is transparent to the programmer. It
increases the speed of processing by making the current program and data
available to the CPU at a rapid rate.
Memory Hierarchy : Main Memory
• The memory with which the CPU can communicate directly is called primary
memory.
• The main memory holds currently needed programs and data. It is
relatively expensive. There are two types of primary memory: RAM and ROM.
• RAM is a semiconductor memory in which both read and write operations can
be performed; therefore it is also called read-write (R/W) memory.
• The user program resides in RAM just prior to its execution by the CPU,
which is why RAM is also known as user memory.
• Whenever electrical power goes off, the contents of this memory are lost.
Therefore it is also known as volatile or temporary memory. Nowadays
these are available as semiconductor chips called integrated circuits (ICs).
PRIMARY MEMORY
• Internally, a RAM is organized as equal-sized registers. Each register
holds a binary number called a memory word.
• Each register in the chip is identified by its address. Once a particular
register is selected by providing its address on the address inputs of the
chip, two operations can be performed.
• One: the contents of that register can be read out. This is called a read
operation.
• Two: a new binary number can be entered into that register. This is
called a write operation.
• There are two types of RAM in use.
Static RAM.
Dynamic RAM.
DYNAMIC RAM
• The presence of charge in a MOS cell may represent ‘1’ and the
absence of charge may represent ‘0’.
• A MOS cell occupies a small area within the chip, so a small chip may
contain a large number of such cells. Therefore the packing density of a
DRAM is high. Another advantage of a DRAM is that it is inexpensive
compared to SRAM.
• But its access time is large compared to SRAM; that is, DRAMs
are slower. Another problem with DRAM is that the stored charge leaks
away within a few milliseconds. Therefore each DRAM must have a circuit
capable of reading the contents of the DRAM and recharging
(refreshing) them. Such a circuit is called a refreshing circuit.
SRAM vs DRAM
• SRAM: bit stored in the form of voltage. DRAM: bit stored in the form of charge.
• SRAM: expensive. DRAM: inexpensive.
• ROM provides permanent storage and is used for microprogramming.
• Disk drives and tape units are handled by the operating system
with limited user intervention.
• Magnetic tape units serve as offline memory for backup
storage.
Memory Hierarchy Properties
• Information stored in a memory hierarchy (M1, M2, …, Mn) satisfies three
important properties:
• Inclusion Property: it implies that all information items are originally
stored in level Mn. During processing, subsets of Mn are copied into
Mn-1. Similarly, subsets of Mn-1 are copied into Mn-2, and so on.
• Coherence Property: it requires that copies of the same information
item at successive memory levels be consistent.
• If a word is modified in the cache, copies of that word must be updated
immediately or eventually at all higher levels.
• Locality of Reference: the memory hierarchy was developed based
on a program behavior known as locality of reference. Memory
references are generated by the CPU for either instruction or data
access. Frequently used information is kept in the lower levels in order
to minimize the effective access time of the memory hierarchy.
Memory Hierarchy Properties
(Figure: the inclusion property and data transfers between adjacent levels of a
memory hierarchy; e.g., access by word (4 bytes) from a 32-byte cache block.)
Memory Capacity Planning
• The performance of a memory hierarchy is determined by the effective
access time (Teff) to any level in the hierarchy. It depends on the hit ratio
and access frequencies at successive levels.
• Hit Ratio (h): a concept defined for any two adjacent levels of a
memory hierarchy. When an information item is found in Mi, it is a hit;
otherwise, a miss. The hit ratio (hi) at Mi is the probability that an
information item will be found in Mi. The miss ratio at Mi is defined as
1 - hi.
• The ratio of the number of hits to the total number of CPU references to
memory (hits + misses) is the hit ratio. A hit ratio of 0.9 or higher
verifies the validity of the locality of reference property.

             (Number of times referenced words are in cache)
Hit Ratio = --------------------------------------------------
             (Total number of memory accesses)
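The hit-ratio definition above can be sketched in a few lines of Python. The reference stream and cache contents here are hypothetical, chosen only to illustrate the counting:

```python
def hit_ratio(references, cached_words):
    """Hit ratio = hits / total number of memory accesses."""
    hits = sum(1 for word in references if word in cached_words)
    return hits / len(references)

# Hypothetical reference stream and cache contents for illustration.
refs = ["a", "b", "a", "c", "a", "b", "d", "a", "b", "a"]
cache = {"a", "b", "c"}
print(hit_ratio(refs, cache))  # 0.9 — consistent with locality of reference
```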
Memory Capacity Planning
Effective Access Time (Teff):
• In practice, we wish to achieve as high a hit ratio as possible at Mi. Every time a
miss occurs, a penalty must be paid to access the next higher level of memory.
Misses are called BLOCK MISSES in the cache and PAGE FAULTS in the
main memory, because blocks and pages are the units of transfer between levels.
• The time penalty for a page fault is much greater than that for a block miss, due to
the fact that T1 < T2 < T3. Stone (1990) has pointed out that “a cache miss is 2 to
4 times as costly as a cache hit, but a page fault is 1000 to 10,000 times as costly
as a page hit”.
• The access frequency to Mi is defined as fi = (1-h1)(1-h2)…(1-hi-1)hi. This is
indeed the probability of successfully accessing Mi, when there are i-1 misses at
the lower levels and a hit at Mi. The effective access time is then the sum of
fi·ti over all levels: Teff = f1·t1 + f2·t2 + … + fn·tn.
Direct Mapping
• To compute the cache block number: i = j mod c
• i = cache block number, j = main memory block number, c = number of blocks in the cache
• i.e., we divide the memory block number by the number of cache blocks, and
the remainder is the cache block address.
Direct Mapping with C=4
(Figure: direct mapping of main memory blocks into the four cache slots.)
• Simple
• Inexpensive
• Fixed location for given block
• If a program repeatedly accesses 2 blocks that map to the same
cache block, cache misses are very high – a
condition called thrashing
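The direct-mapping rule i = j mod c can be sketched as follows; the mapping table printed at the end also shows why thrashing occurs (blocks 0 and 4 both land in slot 0):

```python
def direct_map(j, c):
    """Cache block index for memory block j with c cache blocks: i = j mod c."""
    return j % c

# With C = 4 cache blocks, memory blocks 0..9 map to:
print([direct_map(j, 4) for j in range(10)])  # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
# Blocks 0 and 4 compete for slot 0: alternating accesses to them thrash.
```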
Fully Associative Mapping
• A fully associative mapping scheme can overcome the problems of the
direct mapping scheme
• A main memory block can load into any line/block of cache
• Ideally need circuitry that can simultaneously examine all tags for a
match
• Lots of circuitry needed, high cost
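In hardware the tags are compared simultaneously; a software sketch can only search them one at a time. This is a minimal illustration of the lookup, with an assumed tag store, not a full cache model:

```python
def find_block(cache_tags, tag):
    """Hardware compares all tags in parallel; this sketch searches
    sequentially. Returns the cache line index holding `tag`, or None
    on a miss (the block may then be loaded into ANY free line)."""
    for line, t in enumerate(cache_tags):
        if t == tag:
            return line
    return None

tags = [7, 42, 3, 19]          # hypothetical tag store, one tag per cache line
print(find_block(tags, 3))     # 2  (hit in line 2)
print(find_block(tags, 99))    # None  (miss)
```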
Set Associative Mapping
• Compromise between fully-associative and direct-mapped cache.
• Cache is divided into a number of sets
• Each set contains a number of lines/cache blocks
• A given block maps to any line/cache block in a specific set
Use direct-mapping to determine which set in the cache
corresponds to a set in memory
Memory block could then be in any line/cache block of that set
• e.g. 2 lines/cache block per set
2 way associative mapping
A given memory block can be in either of 2 lines/cache block in a
specific set
• e.g. K lines per set
K way associative mapping
A given memory block can be in one of K lines/cache block in a
specific set
Much easier to simultaneously search one set than all lines
Set Associative Mapping
• To compute the cache set number:
SetNum = j mod v
• j = main memory block number
• v = number of sets in the cache
(Figure: set-associative mapping of main memory blocks to cache sets.)
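The set-selection rule SetNum = j mod v can be sketched as below. The cache size (8 lines, 2-way, so v = 4 sets) is an assumed example:

```python
def set_number(j, v):
    """Set-associative placement: memory block j maps to set j mod v.
    Within the chosen set, the block may occupy any of the k lines."""
    return j % v

# Hypothetical cache: 8 lines, 2-way set associative -> v = 8 // 2 = 4 sets
v = 4
for j in [0, 4, 5, 13]:
    print(f"block {j} -> set {set_number(j, v)}")
# Blocks 0 and 4 share set 0 but can coexist (2 lines per set), unlike direct mapping.
```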
Cache Writing Policy
Write through
• Simplest technique to handle the cache inconsistency problem –
all writes go to main memory as well as the cache at the same time.
• Multiple CPUs must monitor main memory traffic (snooping) to
keep each CPU's local cache up to date in case another CPU
also has a copy of a shared memory location in its cache.
• Simple, but generates lots of traffic
• Slows down writes
Cache Writing Policy
Write Back
• Updates initially made in cache only
• A dirty bit is set when we write to the cache; this indicates the
cache is now inconsistent with main memory
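The dirty-bit mechanism can be sketched for a single cache line. This is a minimal, assumed model (one line, a dict standing in for main memory), not a full write-back cache:

```python
class WriteBackLine:
    """Minimal sketch of one write-back cache line with a dirty bit."""
    def __init__(self, data):
        self.data = data
        self.dirty = False     # line starts consistent with main memory

    def write(self, data):
        self.data = data
        self.dirty = True      # update made in cache only; memory is now stale

    def evict(self, memory, addr):
        if self.dirty:         # write back to memory only if modified
            memory[addr] = self.data
        self.dirty = False

memory = {0x10: "old"}
line = WriteBackLine("old")
line.write("new")              # cache updated, main memory still holds "old"
line.evict(memory, 0x10)       # dirty line written back on eviction
print(memory[0x10])            # new
```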
Virtual Memory
• A technique used by virtual memory operating systems to help ensure that the
data you need is available as quickly as possible.
• The operating system copies a certain number of pages from your storage
device to main memory.
• When a program needs a page that is not in main memory (called a PAGE
FAULT), the operating system copies the required page into memory and
copies another page back to the disk.
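The page-fault handling described above can be sketched as a lookup in a page table. The table contents and the free-frame policy here are assumed for illustration (a real OS also evicts a page when memory is full):

```python
def translate(page_table, virtual_page, memory_pages):
    """Sketch of virtual-to-physical translation with page-fault handling.
    `page_table` maps virtual page -> physical frame (absent = on disk)."""
    frame = page_table.get(virtual_page)
    if frame is None:
        # PAGE FAULT: the OS copies the page from storage into a frame.
        frame = len(memory_pages)          # assumed: next free frame
        memory_pages.append(virtual_page)
        page_table[virtual_page] = frame
    return frame

table = {0: 0, 2: 1}                  # hypothetical resident pages
frames = [0, 2]
print(translate(table, 2, frames))    # 1  (page already resident)
print(translate(table, 5, frames))    # 2  (page fault: page 5 loaded into frame 2)
```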
Page Table
PAGE FAULT
Disk Geometry
• Each surface of a platter holds concentric tracks; each track is divided
into sectors separated by gaps, and the platters rotate about a spindle.
(Figure: single-surface view showing the tracks, sectors, and gaps of track k.)
Disk Geometry
(Multiple-Platter View)
• Aligned tracks form a cylinder
(Figure: cylinder k spanning surfaces 0–5 of platters 0–2, all on one spindle.)
Disk Capacity
• Capacity: maximum number of bits that can be stored
• Vendors express capacity in units of gigabytes (GB), where 1 GB = 10^9
bytes, or terabytes (TB, 10^12 bytes)
• The read/write heads, mounted on an arm, move in unison
from cylinder to cylinder.
(Figure: disk arm with read/write heads over the spindle.)
Disk Controller
• The disk controller is attached to the system bus and performs:
• Seek
• Read
• Write
• Error checking
Disk Access Time
• Average time to access some target sector approximated by :
• Taccess = Tavg seek + Tavg rotation + Tavg transfer
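The access-time formula can be evaluated numerically. The disk parameters below (9 ms average seek, 7200 RPM, 512-byte sectors, 100 MB/s transfer rate) are assumed values for illustration; average rotational latency is taken as half a revolution:

```python
def avg_access_time(seek_ms, rpm, sector_bytes, transfer_rate_mb_s):
    """Taccess = Tavg_seek + Tavg_rotation + Tavg_transfer (all in ms).
    Average rotational latency = time for half a revolution."""
    t_rotation = 0.5 * (60_000 / rpm)                            # ms per half-rev
    t_transfer = sector_bytes / (transfer_rate_mb_s * 1e6) * 1e3  # ms per sector
    return seek_ms + t_rotation + t_transfer

# Hypothetical disk: 9 ms avg seek, 7200 RPM, 512-byte sectors, 100 MB/s
print(round(avg_access_time(9, 7200, 512, 100), 3))  # 13.172 ms
```

Note that seek and rotation dominate: the transfer of one sector takes only about 5 microseconds here, so reading many contiguous sectors per seek is far more efficient than scattered single-sector accesses.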