
1

CHAPTER 4
MEMORY SYSTEM DESIGN
Outline
2

 Characteristics of a Memory System


 Memory Hierarchy
 Main Memory
 SRAM
 DRAM
 Organization of a Memory Chip
 Memory Module Organization
 Cache Memory
 Elements of Cache Design
 Secondary Memory
Characteristics of a Memory System
3

 Location
 Processor
 Internal (Main)
 External (Secondary)

 Capacity
 Word Size
 Number of Words

 Unit of Transfer
 Word
 Block
Characteristics of a Memory System
4

 Access Method
 Sequential (Tape)
◼ Start at the beginning and read through in order
◼ Access time depends on location of data and previous location
 Direct (Disk)
◼ Individual blocks have unique address
◼ Access is by jumping to vicinity plus sequential search
◼ Access time depends on location of data and previous location
 Random (RAM/ROM)
◼ Individual addresses identify locations exactly
◼ Access time is independent of location or previous access
Characteristics of a Memory System
5

 Access Method (Contd.)


 Associative (Cache)
◼ Based on content
◼ Data is located by a comparison with contents of a portion
of the store
◼ Access time is independent of location or previous access
Characteristics of a Memory System
6

 Performance
 Access Time
◼ Time between presenting the address and getting the valid
data
 Cycle Time
◼ Time may be required for the memory to “recover” before
next access
◼ Cycle time is access + recovery

 Transfer Rate
◼ Rate at which data can be moved
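These three measures are related: a quick sketch with made-up timing numbers (60 ns access, 40 ns recovery — assumptions for illustration, not figures from this chapter):

```python
# Illustrative numbers only (assumed, not from the chapter).
access_time_ns = 60     # address presented -> valid data
recovery_time_ns = 40   # time the memory needs before the next access

cycle_time_ns = access_time_ns + recovery_time_ns   # cycle = access + recovery
word_size_bits = 32

# One word per cycle gives the peak transfer rate:
transfer_rate_bps = word_size_bits / (cycle_time_ns * 1e-9)
print(cycle_time_ns, "ns cycle,", transfer_rate_bps / 1e6, "Mbit/s")
```

Note that the transfer rate is limited by the cycle time, not the (shorter) access time.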
Characteristics of a Memory System
7

 Physical Type
 Semiconductor
◼ RAM / ROM
 Magnetic
◼ Disk & Tape
 Optical
◼ CD & DVD
 Magneto-Optical
◼ CD-RW
Characteristics of a Memory System
8

 Physical Characteristics
 Volatile/ Non-Volatile
 Erasable / Non-Erasable

 Power Consumption

 Organization
 Physical arrangement of bits into words
◼ Not always obvious
Memory Hierarchy
9

 Memory design is governed by three questions:


 How large?
 How fast?
 How much?

 Three rules:
 Faster access time, greater cost per bit.
 Greater capacity, slower access time.
 Greater capacity, smaller cost per bit.

To solve this dilemma, designers use a hierarchy of memory systems.


Memory Hierarchy
10

[Figure: the memory hierarchy pyramid]
 Inboard memory: Registers → Cache → Main Memory
 Outboard storage: Magnetic Disk → CD-ROM / CD-RW / DVD / DVD-RW
 Off-line storage: Magnetic Tape, WORM
 Moving down the hierarchy: decreasing cost per bit, increasing capacity, increasing access time, decreasing frequency of access.
Locality of Reference
11

 The memory hierarchy presented works because of a natural phenomenon known as “locality of reference”.
 Locality of reference is the tendency for the same values, or related storage locations, to be accessed frequently, depending on the memory access pattern.
 During the execution of a program, memory
references for instructions and data tend to cluster.
 Keeping the current cluster in the faster memory
level allows faster memory access.
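A tiny sketch of why clustering matters — a hypothetical trace of 64 consecutive word addresses (an assumed access pattern, as produced by a loop over an array) touches only a handful of 16-word blocks, so keeping those few blocks in fast memory serves most references:

```python
BLOCK_WORDS = 16  # assumed block size, for illustration

# Hypothetical trace: summing an array generates consecutive addresses
# (spatial locality) and reuses the same few variables (temporal locality).
trace = list(range(64))                        # 64 word addresses
blocks = {addr // BLOCK_WORDS for addr in trace}
print(len(trace), "references fall in only", len(blocks), "blocks")
```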
Main Memory
12

 Relatively large and fast.


 Used to store programs and data during the
computer operation.
 The principal technology is based on semiconductor ICs.

 Usually referred to as Random Access Memory


(RAM).
 The more accurate name would be Read/Write Memory (R/WM).
RAM
13

 Allows both read and write operations.


 Both operations are performed electrically.

 Volatile.
 Used for temporary storage only.
 If the power is disconnected, the contents become invalid.

 Two main varieties.


 Static.
 Dynamic.
Dynamic RAM (DRAM)
14

 Usually used for Main Memory in most computer


systems.
 Inexpensive.

 Uses only one transistor per bit.


 Data is stored as charge in capacitors.
 Destructive read.
◼ Charge on capacitor
is drained during a read.
◼ Data must be re-written
after a read.
DRAM – (Contd.)
15

 Charge on a capacitor decays naturally.


 Therefore, DRAM needs refreshing even when powered to maintain
the data.
 Refreshing is done by reading and re-writing each word every few
milliseconds.
◼ Refresh Rate.
 During “suspended” operation, notebook computers use power
mainly for DRAM refresh.
Static RAM (SRAM)
16

 Consists of internal flip flop like structures that store the


binary information.
 No charges to leak.
◼ No refreshing is needed.
 Non-destructive read.
 More complex construction.
◼ Larger cell, Less dense.
 More expensive.
 Faster.

 Usually used for Cache Memory.


SRAM vs. DRAM
17

 Storage cells in DRAM are simpler and smaller.


+ DRAM is more dense.
◼ More bits per square area.
+ DRAM is less expensive.
+ DRAM uses less power.

 DRAM requires extra circuitry to implement refresh


mechanism.
 DRAM is slower.
SRAM Chip Organization
18
Read Only Memory (ROM)
19

 Read but cannot write.


 Non volatile.

 Used for:
 Microprogramming.
 System programs.
 Whole programs in embedded systems.
 Library subroutines and function tables.
 Constants.

 Manufactured with the data wired into the chip.


 No room for mistakes.
ROM Structure
20
Programmable ROM (PROM)
21

 Non volatile.
 Can be programmed - written into - only once.

 Programming is done electrically and can be done after


manufacturing.
 Special equipment is needed for the programming
process.
 Uses fuses instead of diodes.
 Fuses that need to be removed are “vaporized” during the programming process using a high-voltage pulse (10 – 30 V).

 CAN NOT BE ERASED.


Erasable PROM (EPROM)
22

 Uses floating-gate MOS transistors with insulating material that


changes behavior when exposed to ultraviolet light.
 Programmed electrically and erased optically.
 Erasing can be repeated a relatively large but limited number of times
(~100,000 times).
 Erasing time ~20 minutes.

 Electrically read and written.


 Before writing, ALL cells must be erased by exposure to ultraviolet light.

 Non volatile.
 More expensive than PROM.
Electrically Erasable PROM (EEPROM)
23

 Uses the same floating-gate transistors, except that the


insulating material is much thinner.
 The thinner insulation allows the stored charge to be changed electrically, using voltage alone.
 Can be written to any time without erasing the previous
contents.
 Only the bytes addressed are modified.
 Write takes a relatively long time (~100msec/byte).
 Can be erased only about 10,000 times.

 Non volatile.
 Updatable in place.
 More expensive and less dense than EPROM.
Flash Memory
24

 Called flash due to the speed of re-programming.


 Uses electrical erasure technology.

 An entire chip can be erased in 1-2 sec.


 Possible to erase only blocks of data.
 Does not provide byte level erasure.
 Uses one transistor per bit.
 Very high density.

 Cost is between EPROM and EEPROM.


 Non Volatile.
Organization of a Memory Chip
25

 The basic element of a semiconductor memory is the


memory cell.
 There are different types, but they all share some common properties:
◼ Two states, 1 and 0.
◼ It is possible to write into the cell. (At least once).
◼ They can be read to sense the state.
Organization of a Memory Chip
26

 How to organize a 16 Mbit chip?


 1 Mega words of 16 bits each.
◼ Tall and narrow organization.

 Chips like to be square.

 Typical organization is:


◼ 2048 × 2048 × 4-bit array.
◼ Organized internally as a square structure with decoders for row
and column.
◼ Simplifies decoding logic.
◼ Reduces number of address pins.
◼ Row and column address bits are multiplexed.
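The pin saving from multiplexing can be sketched as follows; the example address is made up, but the 2048 × 2048 geometry is the one from the slide:

```python
ROWS = COLS = 2048            # 2048 x 2048 array of 4-bit cells
ROW_BITS = COL_BITS = 11      # 2**11 = 2048

def split_row_col(addr):
    """Split a 22-bit cell address into row and column parts.
    The chip needs only 11 address pins: the row half is latched
    first (RAS), then the column half (CAS) on the same pins."""
    return (addr >> COL_BITS) & (ROWS - 1), addr & (COLS - 1)

row, col = split_row_col(4101)   # arbitrary example address
print(row, col)                  # 2 5
```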
Organization of the Memory Chip
27
Memory Module Organization
28

 Most high capacity RAM chips contain only a single bit


per location.
 To build a multi-bit per location module, we will need
multiple chips.

 Design a 256 KByte memory system using eight 256K × 1 chips.
 256K requires 18 address wires
◼ We will apply 9 wires to the row selectors and 9 to the column
selectors
 The outputs of the chips are combined together to form the
8 bit output of the system.
Organization of the 256 KByte System
29
 Each chip receives all 18 bits
of the address.
 Each chip produces/receives
a single bit of the data.
Memory Module Organization
30

 What if the size of the system is not the same as the


chips?

 Design a 1 MByte system using 256K X 1 chips.


 We will have to arrange the chips themselves into columns
and rows.
 There will be 4 columns of chips.
 Number of columns = system’s address space / chip’s address space.
 There will be 8 rows of chips.
 Number of rows = system’s word size / chip’s word size.

 Some of the address wires will have to be used for selecting


different rows of chips.
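The chip-array arithmetic above can be checked with a short sketch (the variable names are just illustrative):

```python
import math

KB = 1024
system_words = 1024 * KB        # 1 MByte system, byte-addressable
system_word_bits = 8
chip_words = 256 * KB           # 256K x 1 chips
chip_word_bits = 1

chip_columns = system_words // chip_words        # address-space ratio
chip_rows = system_word_bits // chip_word_bits   # word-size ratio

total_addr_bits = int(math.log2(system_words))   # 20 address wires in total
chip_addr_bits = int(math.log2(chip_words))      # 18 go to every chip
select_bits = total_addr_bits - chip_addr_bits   # 2 bits pick 1 of 4 columns

print(chip_columns, chip_rows, select_bits)      # 4 8 2
```

The 2 leftover address bits drive a 1-of-4 decoder that enables one column of chips.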
Organization of the 1 M Byte System
31
32
Cache Memory
33

 Cache Memory is intended to give:


 Memory speed approaching that of the fastest
memories available.
 Large memory size at the price of less expensive types
of semiconductor memories.

 Small amount of fast memory.


 Sits between normal main memory and CPU.
 May be located on CPU chip or module.
Conceptual Operation
34

 Relatively large and slow main memory together with faster, smaller
cache.
 Cache contains a copy of portions of main memory.
 When processor attempts to read a word from memory, a check is
made to determine if the word exists in cache.
 If it is, the word is delivered to the processor.
 If not, a block of main memory is read into the cache, then the word is delivered
to the processor.

[Figure: CPU ↔ Cache (word transfer) ↔ Main Memory (block transfer)]
Hit Ratio
35

 A measure of the efficiency of the cache structure.


 When the CPU refers to memory and the word is found in the cache, this is called a hit.
 When the word is not found in cache, this is called a
miss.

 Hit ratio is the total number of hits divided by the


total number of access attempts (hits + misses).
 It has been shown in practice that hit ratios higher than 0.9 are possible.
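A common way to see why hit ratios near 0.9 matter is the average (effective) access time of the two-level system. The sketch below assumes a simple model in which a miss adds the full main-memory time on top of the cache lookup; the timings are made up:

```python
def effective_access_time(hit_ratio, t_cache_ns, t_main_ns):
    """Assumed model: every access pays the cache lookup, and a miss
    additionally pays the main-memory access."""
    return t_cache_ns + (1 - hit_ratio) * t_main_ns

hits, misses = 950, 50
hit_ratio = hits / (hits + misses)              # 0.95
print(effective_access_time(hit_ratio, 5, 50))  # ≈ 5 + 0.05 * 50 = 7.5 ns
```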
Cache vs. Main Memory Structure
36

[Figure: a cache of C lines, each holding a tag plus a block of K words, alongside a main memory of 2^n addressable words divided into blocks of K words]
Main Memory and Cache Memory
37

 Main Memory consists of 2n addressable words.


 Each word has a unique n-bit address.
 We can consider that main memory is made up of
blocks of K words each.
 Usually, K is about 16

 Cache consists of C lines of K words each.

 A block of main memory is copied into a line of Cache.


 The “tag” field of the line identifies which main memory
block each cache line represents
Elements of Cache Design
38

 Size
 Mapping function
 Replacement algorithm
 Write policy
 Line size
 Number of caches
Elements of Cache Design...
39

 Cache Size
◼ Small enough not to be too costly
◼ Large enough so overall average access time is small
◼ Affected by the available chip and board area
Elements of Cache Design...
40

 Mapping Function
 Number of cache lines is much smaller than number of blocks in main memory
 Mapping function needed
◼ A method to map main memory blocks into cache lines
 Three mapping techniques used
 Direct
 Associative
 Set Associative

 Typical memory-cache organization


 Cache of 64kByte
◼ Organized as 16k lines of 4 bytes
◼ Cache block of 4 bytes
 16 MBytes main memory
◼ Byte addressable memory
◼ 24 bit address (2^24 = 16M)
Direct Mapping
41

 Each block of main memory maps to only one cache


line
 i = j modulo m
◼ i = cache line number,
◼ j = main memory block number, and
◼ m = number of lines in the cache

 i.e. if a block is in cache, it must be in one specific place

 Mapping function implemented using main memory


address
Direct Mapping
42

 Map each block of memory into only one possible


cache line.
A block of main memory can only be brought into the
same line of cache every time.

Cache Line | Main memory blocks assigned
0 | 0, C, 2C, 3C, …
1 | 1, C+1, 2C+1, 3C+1, …
… | …
C–1 | C–1, 2C–1, 3C–1, 4C–1, …
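The mapping i = j mod m (written here with the table's symbol C for the number of cache lines) can be sketched directly:

```python
C = 16 * 1024    # number of cache lines (16K in the running 64 KByte example)

def line_for_block(j):
    """Direct mapping: memory block j always lands in cache line j mod C."""
    return j % C

# Blocks 0, C, 2C, ... all compete for line 0; blocks 1, C+1, ... for line 1.
print([line_for_block(j) for j in (0, C, 2 * C, 1, C + 1)])  # [0, 0, 0, 1, 1]
```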


Direct Mapping...
43

 Address viewed as having three fields


 Word, line and tag identifier

 Least Significant w bits identify unique word in a block

 Most Significant s bits specify one of 2^s memory blocks


 The MSBs are split into
◼ a tag of (s – r) bits (most significant)
◼ Stored in the cache along with the data words of the line
◼ a cache line field of r bits
◼ Identifies one of m = 2^r lines of the cache
Elements of Cache Design...

 Direct Mapping
 Address Structure
Tag (s–r) = 8 bits | Line or Slot (r) = 14 bits | Word (w) = 2 bits

 24 bit address
 2 bit word identifier (4 byte block)
 22 bit block identifier
◼ 8 bit tag (=22-14)
◼ 14 bit slot or line

 No two blocks in the same line have the same Tag field
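The 8/14/2 field split can be expressed directly with shifts and masks; round-tripping an address through assembly and splitting checks the layout (the field values below are arbitrary examples):

```python
WORD_BITS, LINE_BITS, TAG_BITS = 2, 14, 8    # 24-bit address from the slide

def split_address(addr):
    """Break a 24-bit address into (tag, line, word) fields."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word

def make_address(tag, line, word):
    """Assemble the fields back into a 24-bit address."""
    return (tag << (LINE_BITS + WORD_BITS)) | (line << WORD_BITS) | word

print(split_address(make_address(0xAB, 0x1234, 3)))   # (171, 4660, 3)
```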
Reading From a Direct Mapped System
45
 The processor produces a 24 bit address.
 The cache uses the middle 14 bits to identify one of
its 16 K lines.
 The upper 8 bits of the address are matched to the
tag field of the cache entry.
 If they match, then the lowest order two bits of the
address are used to access the word in the cache line.
 If not, the address is used to fetch the block containing the
specified word from main memory into the cache.
Elements of Cache Design...
46

 Direct Mapping Cache Organization


Elements of Cache Design...
47

 Direct Mapping Summary


 Address length = (s + w) bits
 Number of addressable units = 2^(s+w) words or bytes
 Block size = line size = 2^w words or bytes
 Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
 Number of lines in cache = m = 2^r
 Size of tag = (s – r) bits


Direct Mapping
48

 Advantages.
 Simple.

 Inexpensive to implement.

 Disadvantages.
 There is a fixed location for each block in the cache.
◼ If a program addresses words from two blocks mapped to the same line, the blocks have to be swapped in and out of cache repeatedly.
Associative Mapping
49

 To improve the hit ratio of the cache, another mapping technique is often used: “associative mapping”.

 A block of main memory may be mapped into ANY


line of the cache.
A block of memory is no longer restricted to a single
line of cache.
Associative Mapping
50

 A main memory address is considered to be made up of two pieces:
 Tag
◼ Upper bits of the address
 Word address within a block
◼ Lower 2 bits of the address
Associative Mapping Address Structure
51

 16 Mbytes of memory.
 24 bits in address.
 4 byte blocks: lowest order 2 bits identify the word.
 The remaining 22 bits identify the block mapped to the line.
Tag = 22 bits | Word = 2 bits
Reading From an Associative Mapped System
52

 The processor produces a 24 bit address.


 The upper 22 bits of the address are matched to
the tag field of EACH cache entry.
 This matching must be done simultaneously to each of
the entries.
 i.e. Associative memory.
Associative Mapping Cache
Organization
53
Associative Mapping
54

 Advantages.
 Improves hit ratio for certain situations.

 Disadvantages.
 Requires very complicated matching hardware for matching the tag against the entries of every line.
◼ Expensive.
Set Associative Mapping
55

 Set Associative Mapping helps reduce the complexity of


the matching hardware for an associative mapped
cache.

 Cache is divided into a number of sets.


 Each set contains a number of lines.
A 2-way set associative cache has 2 lines per set.

 A block of memory is restricted to a SPECIFIC set of


lines.
 A block of main memory may map to ANY line in the given
set.
Set Associative Mapping
56

 A main memory address is considered to be made up of three pieces:
 Tag.
◼ Upper bits of the address.
 Set number.
◼ Middle bits of the address.
 Word address within a block.
◼ Lower 2 bits of the address.
Set Associative Mapping Address Structure
57

 16 Mbytes of memory.
 24 bits in address.

 4 byte blocks: lowest order 2 bits.
 8K sets in a 2-way set associative cache: middle 13 bits.
 Rest (9 bits) is used to identify the block mapped to the line.
Tag = 9 bits | Set = 13 bits | Word = 2 bits
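The set-associative split works the same way as the direct-mapped one, with the middle 13 bits selecting a set rather than a line (example values are arbitrary):

```python
WORD_BITS, SET_BITS, TAG_BITS = 2, 13, 9    # 24-bit address, 8K sets, 2-way

def split_address(addr):
    """Break a 24-bit address into (tag, set, word) fields."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_no = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_no, word

# A block may occupy either line of its set, so on a read both lines of
# set `set_no` have their tags compared against `tag` simultaneously.
print(split_address((5 << 15) | (100 << 2) | 1))   # (5, 100, 1)
```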
Reading From a Set Associative Mapped System
58

 The processor produces a 24 bit address.


 The cache uses the middle 13 bits to identify one of
its 8 K sets.
 The upper 9 bits of the address are matched to the
tag field of the cache entries that make up the set.
 The number of lines to match to is very limited.
 Therefore, the matching hardware is much simpler.
Set Associative Mapping Cache Organization
59
Set Associative Mapping
60

 Advantages.
 Combines advantages of direct and associative
mapping techniques.

 Disadvantages.
 Increasing the size of the set does not always improve
the hit ratio.
 2-way set associative has a much higher hit ratio than direct
mapping.
 Increasing it to 4-way improves the hit ratio slightly more.
 Beyond that no significant improvement has been seen.
Replacement Algorithms
61

 What happens if there is a “miss” and the cache is


already full?
 One of the items in the cache needs to be “replaced” with
the new item.
 Which one??
 Depends on the mapping technique used.

 Direct mapping.
 No choice.
 Memory blocks map into certain cache lines.
◼ The entry occupying that line must be swapped out.
Replacement Algorithms
62

 Associative & Set Associative:


 Random.

 First-in First-out (FIFO).
 Least Recently Used (LRU).

 Least Frequently Used (LFU).


◼ The last three require additional bits for each entry to keep
track of order, time or number of times used.

 Usually, these algorithms are implemented in hardware for speed.
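As a software sketch of LRU replacement for one set (real caches track recency with per-line age bits in hardware; this toy model and its names are illustrative):

```python
from collections import OrderedDict

class LRUSet:
    """Toy model of one cache set managed with LRU replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.tags = OrderedDict()   # least recently used first

    def access(self, tag):
        """Return True on hit; on a miss, evict the LRU tag if the set is full."""
        if tag in self.tags:
            self.tags.move_to_end(tag)     # mark as most recently used
            return True
        if len(self.tags) >= self.ways:
            self.tags.popitem(last=False)  # evict least recently used
        self.tags[tag] = None
        return False

s = LRUSet(ways=2)
hits = [s.access(t) for t in (1, 2, 1, 3)]  # the miss on 3 evicts 2, not 1
print(hits, list(s.tags))   # [False, False, True, False] [1, 3]
```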
Writing Into Cache
63

 Cache entries are supposed to be exact “copies” of


what is in main memory.
 What happens when the CPU wants to write into memory?
 Which memory does it write to?

 Two techniques are possible.


 Write-through.

 Write-back.
Write-Through
64

 The simplest and most commonly used technique is


to update both the cache and main memory at the
same time.

 Advantage.
 Memory and cache are always in sync.

 Disadvantage.
 Memory write becomes slow.
Write-Back
65

 The update is done ONLY to the word in the cache, and the block containing the word is marked (“dirty”).
 When the block is to be swapped out of cache, the marked block is written back to main memory.

 Advantage.
 Reduces memory traffic because a word may be
updated several times while in cache.
 Disadvantage.
 Cache and memory will be out of sync for a while.
 What about DMA?
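A minimal sketch of the write-back idea — the dirty bit defers the memory update until eviction (class and variable names are illustrative, not from the slides):

```python
class WriteBackLine:
    """Toy write-back cache line: writes set a dirty bit instead of
    going straight to memory."""
    def __init__(self, data):
        self.data = data
        self.dirty = False

    def write(self, value):
        self.data = value
        self.dirty = True               # memory update is deferred

    def evict(self, memory, block_no):
        if self.dirty:                  # one write-back, however many writes
            memory[block_no] = self.data
        self.dirty = False

memory = {7: 0}
line = WriteBackLine(memory[7])
for v in (1, 2, 3):                     # three writes, zero memory traffic
    line.write(v)
line.evict(memory, 7)
print(memory[7])   # only the final value reaches main memory
```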
Number of Caches
66

 When a cache miss occurs, the system suffers through a


large delay while the block is read from main memory
into the cache.

 Two possible solutions.


◼ Speed up the transfer of information.
◼ The transfer rate is limited by issues that may not be under our control.

◼ Speed up the source of the information.


◼ Main memory is between 7X and 10X slower than cache.
◼ We can insert an intermediate level of memory between cache and main
memory.
Cache Levels
67

 In most of today’s designs, cache sits on the same chip


as the CPU. “On-chip cache”
 Data travels a very short distance
 No need to use the very slow bus
 This is known as L1 cache
◼ Intel calls this level L0

 To reduce the penalty of a cache miss, a second level of


cache is inserted between main memory and the on-
chip cache.
 L2 cache
Cache Levels
68
[Figure: memory system with on-chip cache, off-chip cache, and main memory on the bus, contrasting the Pentium and Pentium Pro arrangements]


“L2” Cache
69

 A very fast, SRAM based, cache is placed off-chip.


 Slower than the on-chip cache.
 Larger than the on-chip cache.

 On-Module Cache.
◼ CPU uses a dedicated, internal, fast, memory bus to access
cache.
 On-Mother-Board Cache.
◼ The CPU has to use the system bus to get to it.
◼ Still much faster than DRAM based main memory.
Cache Strategy
70

 On-Chip Cache is optimized to increase “hit rate”.


 Block size about 4 words
 Many blocks

 Off-Chip Cache is optimized to reduce “miss


penalty”.
 Larger block size
 Smaller number of blocks.
Secondary Memory
71
Types of External Memory
 Magnetic Disk
 RAID

 Removable

 Optical
 CD-ROM

 CD-Recordable (CD-R)
 CD-R/W

 DVD

 Magnetic Tape
Magnetic Disk
 Disk substrate coated with magnetizable material (iron
oxide…rust)
 Substrate used to be aluminium
 Now glass
 Improved surface uniformity
◼ Increases reliability
 Reduction in surface defects
◼ Reduced read/write errors
 Lower flight heights (See later)
 Better stiffness
 Better shock/damage resistance
How a Hard Drive Works
74

 See video
Read and Write Mechanisms
 Recording & retrieval via conductive coil called a head
 May be single read/write head or separate ones
 During read/write, head is stationary, platter rotates
 Write
 Current through coil produces magnetic field
 Pulses sent to head
 Magnetic pattern recorded on surface below
 Read (traditional)
 Magnetic field moving relative to coil produces current
 Coil is the same for read and write
 Read (contemporary)
 Separate read head, close to write head
 Partially shielded magnetoresistive (MR) sensor
 Electrical resistance depends on direction of magnetic field
 High frequency operation
 Higher storage density and speed
Inductive Write MR Read
Data Organization and Formatting
 Concentric rings or tracks
 Gaps between tracks
 Reduce gap to increase capacity

 Same number of bits per track (variable packing density)
 Constant angular velocity

 Tracks divided into sectors


 Minimum block size is one sector
 May have more than one sector per block
Disk Data Layout
Disk Velocity
 Bit near centre of rotating disk passes fixed point slower than bit on
outside of disk
 Increase spacing between bits in different tracks
 Rotate disk at constant angular velocity (CAV)
 Gives pie shaped sectors and concentric tracks
 Individual tracks and sectors addressable
 Move head to given track and wait for given sector
 Waste of space on outer tracks
◼ Lower data density
 Can use zones to increase capacity
 Each zone has fixed bits per track
 More complex circuitry
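The capacity cost of CAV versus zoning can be sketched with made-up geometry (100 tracks, innermost track holding 1000 sectors, radius growing linearly — all assumptions for illustration):

```python
TRACKS = 100
INNER_SECTORS = 1000     # sectors the innermost track can hold (assumed)

# CAV: every track holds only what the innermost track can.
cav_sectors = TRACKS * INNER_SECTORS

# Idealized zoning: a track's capacity scales with its circumference;
# assume radius grows linearly from 1 to 2 units across the tracks.
zoned_sectors = sum(int(INNER_SECTORS * (1 + i / (TRACKS - 1)))
                    for i in range(TRACKS))

print(zoned_sectors, ">", cav_sectors)   # zoning recovers outer-track space
```

Real drives use a small number of discrete zones rather than a per-track sector count, trading some of this gain for simpler circuitry.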
Disk Layout Methods Diagram
Finding Sectors
 Must be able to identify start of track and sector
 Format disk
 Additional information not available to user
 Marks tracks and sectors
Winchester Disk Format
Seagate ST506
Characteristics
 Fixed (rare) or movable head
 Removable or fixed
 Single or double (usually) sided
 Single or multiple platter
 Head mechanism
 Contact (Floppy)
 Fixed gap

 Flying (Winchester)
Fixed/Movable Head Disk
 Fixed head
 One read write head per track
 Heads mounted on fixed rigid arm

 Movable head
 One read write head per side
 Mounted on a movable arm
Removable or Not
 Removable disk
 Can be removed from drive and replaced with another
disk
 Provides unlimited storage capacity

 Easy data transfer between systems

 Nonremovable disk
 Permanently mounted in the drive
Multiple Platter
 One head per side
 Heads are joined and aligned
 Aligned tracks on each platter form cylinders
 Data is striped by cylinder
 reduces head movement
 Increases speed (transfer rate)
Multiple Platters
Tracks and Cylinders
Floppy Disk
 8”, 5.25”, 3.5”
 Small capacity
 Up to 1.44Mbyte (2.88M never popular)
 Slow
 Universal
 Cheap
 Obsolete?