
1

CHAPTER 4
MEMORY SYSTEM DESIGN
Outline
2

 Characteristics of a Memory System


 Memory Hierarchy
 Main Memory
 SRAM
 DRAM
 Organization of a Memory Chip
 Memory Module Organization
 Cache Memory
 Elements of Cache Design
 Secondary Memory
Characteristics of a Memory System
3

 Location
 Processor
 Internal (Main)
 External (Secondary)

 Capacity
 Word Size
 Number of Words

 Unit of Transfer
 Word
 Block
Characteristics of a Memory System
4

 Access Method
 Sequential (Tape)
◼ Start at the beginning and read through in order
◼ Access time depends on location of data and previous location
 Direct (Disk)
◼ Individual blocks have unique address
◼ Access is by jumping to vicinity plus sequential search
◼ Access time depends on location of data and previous location
 Random (RAM/ROM)
◼ Individual addresses identify locations exactly
◼ Access time is independent of location or previous access
Characteristics of a Memory System
5

 Access Method (Contd.)


 Associative (Cache)
◼ Based on content
◼ Data is located by a comparison with contents of a portion
of the store
◼ Access time is independent of location or previous access
Characteristics of a Memory System
6

 Performance
 Access Time
◼ Time between presenting the address and getting the valid
data
 Cycle Time
◼ Time may be required for the memory to “recover” before
next access
◼ Cycle time is access + recovery

 Transfer Rate
◼ Rate at which data can be moved
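These three measures are related: a quick sketch with made-up timing numbers (60 ns access, 40 ns recovery — assumptions for illustration, not figures from this chapter):

```python
# Illustrative numbers only (assumed, not from the chapter).
access_time_ns = 60     # address presented -> valid data
recovery_time_ns = 40   # time the memory needs before the next access

cycle_time_ns = access_time_ns + recovery_time_ns   # cycle = access + recovery
word_size_bits = 32

# One word per cycle gives the peak transfer rate:
transfer_rate_bps = word_size_bits / (cycle_time_ns * 1e-9)
print(cycle_time_ns, "ns cycle,", transfer_rate_bps / 1e6, "Mbit/s")
```

Note that the transfer rate is limited by the cycle time, not the (shorter) access time.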
Characteristics of a Memory System
7

 Physical Type
 Semiconductor
◼ RAM / ROM
 Magnetic
◼ Disk & Tape
 Optical
◼ CD & DVD
 Magneto-Optical
◼ CD-RW
Characteristics of a Memory System
8

 Physical Characteristics
 Volatile/ Non-Volatile
 Erasable / Non-Erasable

 Power Consumption

 Organization
 Physical arrangement of bits into words
◼ Not always obvious
Memory Hierarchy
9

 Memory design is governed by three questions:


 How large?
 How fast?
 How much?

 Three rules:
 Faster access time, greater cost per bit.
 Greater capacity, slower access time.
 Greater capacity, smaller cost per bit.

To solve this dilemma, designers use a hierarchy of memory systems.


Memory Hierarchy
10

[Figure: the memory hierarchy pyramid]
 Inboard memory: Registers → Cache → Main Memory
 Outboard storage: Magnetic Disk → CD-ROM / CD-RW / DVD / DVD-RW
 Off-line storage: Magnetic Tape, WORM
 Moving down the hierarchy: decreasing cost per bit, increasing capacity, increasing access time, decreasing frequency of access.
Locality of Reference
11

 The memory hierarchy presented works because of a natural phenomenon known as “locality of reference”.
 Locality of reference is the tendency for the same values, or related storage locations, to be accessed frequently, depending on the memory access pattern.
 During the execution of a program, memory
references for instructions and data tend to cluster.
 Keeping the current cluster in the faster memory
level allows faster memory access.
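A tiny sketch of why clustering matters — a hypothetical trace of 64 consecutive word addresses (an assumed access pattern, as produced by a loop over an array) touches only a handful of 16-word blocks, so keeping those few blocks in fast memory serves most references:

```python
BLOCK_WORDS = 16  # assumed block size, for illustration

# Hypothetical trace: summing an array generates consecutive addresses
# (spatial locality) and reuses the same few variables (temporal locality).
trace = list(range(64))                        # 64 word addresses
blocks = {addr // BLOCK_WORDS for addr in trace}
print(len(trace), "references fall in only", len(blocks), "blocks")
```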
Main Memory
12

 Relatively large and fast.


 Used to store programs and data during the
computer operation.
 The principal technology is based on semiconductor ICs.

 Usually referred to as Random Access Memory


(RAM).
 The more accurate name would be Read/Write Memory (R/WM).
RAM
13

 Allows both read and write operations.


 Both operations are performed electrically.

 Volatile.
 Used for temporary storage only.
 If the power is disconnected, the contents become invalid.

 Two main varieties.


 Static.
 Dynamic.
Dynamic RAM (DRAM)
14

 Usually used for Main Memory in most computer


systems.
 Inexpensive.

 Uses only one transistor per bit.


 Data is stored as charge in capacitors.
 Destructive read.
◼ Charge on capacitor
is drained during a read.
◼ Data must be re-written
after a read.
DRAM – (Contd.)
15

 Charge on a capacitor decays naturally.


 Therefore, DRAM needs refreshing even when powered to maintain
the data.
 Refreshing is done by reading and re-writing each word every few
milliseconds.
◼ Refresh Rate.
 During “suspended” operation, notebook computers use power
mainly for DRAM refresh.
Static RAM (SRAM)
16

 Consists of internal flip flop like structures that store the


binary information.
 No charges to leak.
◼ No refreshing is needed.
 Non-destructive read.
 More complex construction.
◼ Larger cell, Less dense.
 More expensive.
 Faster.

 Usually used for Cache Memory.


SRAM vs. DRAM
17

 Storage cells in DRAM are simpler and smaller.


+ DRAM is more dense.
◼ More bits per square area.
+ DRAM is less expensive.
+ DRAM uses less power.

 DRAM requires extra circuitry to implement refresh


mechanism.
 DRAM is slower.
SRAM Chip Organization
18
Read Only Memory (ROM)
19

 Read but cannot write.


 Non volatile.

 Used for:
 Microprogramming.
 System programs.
 Whole programs in embedded systems.
 Library subroutines and function tables.
 Constants.

 Manufactured with the data wired into the chip.


 No room for mistakes.
ROM Structure
20
Programmable ROM (PROM)
21

 Non volatile.
 Can be programmed - written into - only once.

 Programming is done electrically and can be done after


manufacturing.
 Special equipment is needed for the programming
process.
 Uses fuses instead of diodes.
 Fuses that need to be removed are “vaporized” during the programming process using a high-voltage pulse (10 – 30 V).

 CAN NOT BE ERASED.


Erasable PROM (EPROM)
22

 Uses floating-gate MOS transistors with insulating material that


changes behavior when exposed to ultraviolet light.
 Programmed electrically and erased optically.
 Erasing can be repeated a relatively large but limited number of times
(~100,000 times).
 Erasing time ~20 minutes.

 Electrically read and written.


 Before writing, ALL cells must be erased by exposure to ultraviolet light.

 Non volatile.
 More expensive than PROM.
Electrically Erasable PROM (EEPROM)
23

 Uses the same floating-gate transistors, except that the


insulating material is much thinner.
 The thinner insulation allows the stored charge to be changed electrically, using voltage alone.
 Can be written to any time without erasing the previous
contents.
 Only the bytes addressed are modified.
 Write takes a relatively long time (~100msec/byte).
 Can be erased only about 10,000 times.

 Non volatile.
 Updatable in place.
 More expensive and less dense than EPROM.
Flash Memory
24

 Called flash due to the speed of re-programming.


 Uses electrical erasure technology.

 An entire chip can be erased in 1-2 sec.


 Possible to erase only blocks of data.
 Does not provide byte level erasure.
 Uses one transistor per bit.
 Very high density.

 Cost is between EPROM and EEPROM.


 Non Volatile.
Organization of a Memory Chip
25

 The basic element of a semiconductor memory is the


memory cell.
 There are different types, but they all share some common properties:
◼ Two states, 1 and 0.
◼ It is possible to write into the cell. (At least once).
◼ They can be read to sense the state.
Organization of a Memory Chip
26

 How to organize a 16 Mbit chip?


 1 Mega words of 16 bits each.
◼ Tall and narrow organization.

 Chips like to be square.

 Typical organization is:


◼ 2048 × 2048 × 4-bit array.
◼ Organized internally as a square structure with decoders for row
and column.
◼ Simplifies decoding logic.
◼ Reduces number of address pins.
◼ Row and column address bits are multiplexed.
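The pin saving from multiplexing can be sketched as follows; the example address is made up, but the 2048 × 2048 geometry is the one from the slide:

```python
ROWS = COLS = 2048            # 2048 x 2048 array of 4-bit cells
ROW_BITS = COL_BITS = 11      # 2**11 = 2048

def split_row_col(addr):
    """Split a 22-bit cell address into row and column parts.
    The chip needs only 11 address pins: the row half is latched
    first (RAS), then the column half (CAS) on the same pins."""
    return (addr >> COL_BITS) & (ROWS - 1), addr & (COLS - 1)

row, col = split_row_col(4101)   # arbitrary example address
print(row, col)                  # 2 5
```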
Organization of the Memory Chip
27
Memory Module Organization
28

 Most high capacity RAM chips contain only a single bit


per location.
 To build a multi-bit per location module, we will need
multiple chips.

 Design a 256 KByte memory system using eight 256K × 1 chips.
 256K requires 18 address wires
◼ We will apply 9 wires to the row selectors and 9 to the column
selectors
 The outputs of the chips are combined together to form the
8 bit output of the system.
Organization of the 256 KByte System
29
 Each chip receives all 18 bits
of the address.
 Each chip produces/receives
a single bit of the data.
Memory Module Organization
30

 What if the size of the system is not the same as the


chips?

 Design a 1 MByte system using 256K X 1 chips.


 We will have to arrange the chips themselves into columns
and rows.
 There will be 4 columns of chips.
 Number of columns = system’s address space / chip’s address space.
 There will be 8 rows of chips.
 Number of rows = system’s word size / chip’s word size.

 Some of the address wires will have to be used for selecting


different rows of chips.
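The chip-array arithmetic above can be checked with a short sketch (the variable names are just illustrative):

```python
import math

KB = 1024
system_words = 1024 * KB        # 1 MByte system, byte-addressable
system_word_bits = 8
chip_words = 256 * KB           # 256K x 1 chips
chip_word_bits = 1

chip_columns = system_words // chip_words        # address-space ratio
chip_rows = system_word_bits // chip_word_bits   # word-size ratio

total_addr_bits = int(math.log2(system_words))   # 20 address wires in total
chip_addr_bits = int(math.log2(chip_words))      # 18 go to every chip
select_bits = total_addr_bits - chip_addr_bits   # 2 bits pick 1 of 4 columns

print(chip_columns, chip_rows, select_bits)      # 4 8 2
```

The 2 leftover address bits drive a 1-of-4 decoder that enables one column of chips.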
Organization of the 1 M Byte System
31
32
Cache Memory
33

 Cache Memory is intended to give:


 Memory speed approaching that of the fastest
memories available.
 Large memory size at the price of less expensive types
of semiconductor memories.

 Small amount of fast memory.


 Sits between normal main memory and CPU.
 May be located on CPU chip or module.
Conceptual Operation
34

 Relatively large and slow main memory together with faster, smaller
cache.
 Cache contains a copy of portions of main memory.
 When processor attempts to read a word from memory, a check is
made to determine if the word exists in cache.
 If it is, the word is delivered to the processor.
 If not, a block of main memory is read into the cache, then the word is delivered
to the processor.

[Figure: CPU ↔ Cache (word transfer) ↔ Main Memory (block transfer)]
Hit Ratio
35

 A measure of the efficiency of the cache structure.


 When the CPU refers to memory and the word is found in the cache, this is called a hit.
 When the word is not found in cache, this is called a
miss.

 Hit ratio is the total number of hits divided by the


total number of access attempts (hits + misses).
 It has been shown in practice that hit ratios higher than 0.9 are possible.
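A common way to see why hit ratios near 0.9 matter is the average (effective) access time of the two-level system. The sketch below assumes a simple model in which a miss adds the full main-memory time on top of the cache lookup; the timings are made up:

```python
def effective_access_time(hit_ratio, t_cache_ns, t_main_ns):
    """Assumed model: every access pays the cache lookup, and a miss
    additionally pays the main-memory access."""
    return t_cache_ns + (1 - hit_ratio) * t_main_ns

hits, misses = 950, 50
hit_ratio = hits / (hits + misses)              # 0.95
print(effective_access_time(hit_ratio, 5, 50))  # ≈ 5 + 0.05 * 50 = 7.5 ns
```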
Cache vs. Main Memory Structure
36

[Figure: a cache of C lines, each holding a tag plus a block of K words, alongside a main memory of 2^n addressable words divided into blocks of K words]
Main Memory and Cache Memory
37

 Main Memory consists of 2n addressable words.


 Each word has a unique n-bit address.
 We can consider that main memory is made up of
blocks of K words each.
 Usually, K is about 16

 Cache consists of C lines of K words each.

 A block of main memory is copied into a line of Cache.


 The “tag” field of the line identifies which main memory
block each cache line represents
Elements of Cache Design
38

 Size
 Mapping function
 Replacement algorithm
 Write policy
 Line size
 Number of caches
Elements of Cache Design...
39

 Cache Size
◼ Small enough not to be too costly
◼ Large enough so overall average access time is small
◼ Affected by the available chip and board area
Elements of Cache Design...
40

 Mapping Function
 Number of cache lines is much smaller than number of blocks in main memory
 Mapping function needed
◼ A method to map main memory blocks into cache lines
 Three mapping techniques used
 Direct
 Associative
 Set Associative

 Typical memory-cache organization


 Cache of 64kByte
◼ Organized as 16k lines of 4 bytes
◼ Cache block of 4 bytes
 16 MBytes main memory
◼ Byte addressable memory
◼ 24 bit address (2^24 = 16M)
Direct Mapping
41

 Each block of main memory maps to only one cache


line
 i = j modulo m
◼ i = cache line number,
◼ j = main memory block number, and
◼ m = number of lines in the cache

 i.e. if a block is in cache, it must be in one specific place

 Mapping function implemented using main memory


address
Direct Mapping
42

 Map each block of memory into only one possible


cache line.
A block of main memory can only be brought into the
same line of cache every time.

Cache Line | Main memory blocks assigned
0 | 0, C, 2C, 3C, …
1 | 1, C+1, 2C+1, 3C+1, …
… | …
C–1 | C–1, 2C–1, 3C–1, 4C–1, …
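The mapping i = j mod m (written here with the table's symbol C for the number of cache lines) can be sketched directly:

```python
C = 16 * 1024    # number of cache lines (16K in the running 64 KByte example)

def line_for_block(j):
    """Direct mapping: memory block j always lands in cache line j mod C."""
    return j % C

# Blocks 0, C, 2C, ... all compete for line 0; blocks 1, C+1, ... for line 1.
print([line_for_block(j) for j in (0, C, 2 * C, 1, C + 1)])  # [0, 0, 0, 1, 1]
```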


Direct Mapping...
43

 Address viewed as having three fields


 Word, line and tag identifier

 Least Significant w bits identify unique word in a block

 Most Significant s bits specify one of 2^s memory blocks


 The MSBs are split into
◼ a tag of (s – r) bits (most significant)
◼ Stored in the cache along with the data words of the line
◼ a cache line field of r bits
◼ Identifies one of m = 2^r lines of the cache
Elements of Cache Design...

 Direct Mapping
 Address Structure
Tag (s–r) = 8 bits | Line or Slot (r) = 14 bits | Word (w) = 2 bits

 24 bit address
 2 bit word identifier (4 byte block)
 22 bit block identifier
◼ 8 bit tag (=22-14)
◼ 14 bit slot or line

 No two blocks in the same line have the same Tag field
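The 8/14/2 field split can be expressed directly with shifts and masks; round-tripping an address through assembly and splitting checks the layout (the field values below are arbitrary examples):

```python
WORD_BITS, LINE_BITS, TAG_BITS = 2, 14, 8    # 24-bit address from the slide

def split_address(addr):
    """Break a 24-bit address into (tag, line, word) fields."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word

def make_address(tag, line, word):
    """Assemble the fields back into a 24-bit address."""
    return (tag << (LINE_BITS + WORD_BITS)) | (line << WORD_BITS) | word

print(split_address(make_address(0xAB, 0x1234, 3)))   # (171, 4660, 3)
```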
Reading From a Direct Mapped System
45
 The processor produces a 24 bit address.
 The cache uses the middle 14 bits to identify one of
its 16 K lines.
 The upper 8 bits of the address are matched to the
tag field of the cache entry.
 If they match, then the lowest order two bits of the
address are used to access the word in the cache line.
 If not, the address is used to fetch the block containing the
specified word from main memory into the cache.
Elements of Cache Design...
46

 Direct Mapping Cache Organization


Elements of Cache Design...
47

 Direct Mapping Summary


 Address length = (s + w) bits
 Number of addressable units = 2^(s+w) words or bytes
 Block size = line size = 2^w words or bytes
 Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
 Number of lines in cache = m = 2^r
 Size of tag = (s – r) bits


Direct Mapping
48

 Advantages.
 Simple.

 Inexpensive to implement.

 Disadvantages.
 There is a fixed location for each block in the cache.
◼ If a program addresses words from two blocks mapped to the same line, the blocks have to be swapped in and out of cache repeatedly.
Associative Mapping
49

 To improve the hit ratio of the cache, another mapping technique is often used: “associative mapping”.

 A block of main memory may be mapped into ANY


line of the cache.
A block of memory is no longer restricted to a single
line of cache.
Associative Mapping
50

 A main memory address is considered to be made up of two pieces:
 Tag
◼ Upper bits of the address
 Word address within a block
◼ Lower 2 bits of the address
Associative Mapping Address Structure
51

 16 Mbytes of memory.
 24 bits in address.
 4 byte blocks: lowest order 2 bits identify the word.
 The remaining 22 bits identify the block mapped to the line.
Tag = 22 bits | Word = 2 bits
Reading From an Associative Mapped System
52

 The processor produces a 24 bit address.


 The upper 22 bits of the address are matched to
the tag field of EACH cache entry.
 This matching must be done simultaneously to each of
the entries.
 i.e. Associative memory.
Associative Mapping Cache
Organization
53
Associative Mapping
54

 Advantages.
 Improves hit ratio for certain situations.

 Disadvantages.
 Requires very complicated matching hardware for matching the tag against the entries of every line.
◼ Expensive.
Set Associative Mapping
55

 Set Associative Mapping helps reduce the complexity of


the matching hardware for an associative mapped
cache.

 Cache is divided into a number of sets.


 Each set contains a number of lines.
A 2-way set associative cache has 2 lines per set.

 A block of memory is restricted to a SPECIFIC set of


lines.
 A block of main memory may map to ANY line in the given
set.
Set Associative Mapping
56

 A main memory address is considered to be made up of three pieces:
 Tag.
◼ Upper bits of the address.
 Set number.
◼ Middle bits of the address.
 Word address within a block.
◼ Lower 2 bits of the address.
Set Associative Mapping Address Structure
57

 16 Mbytes of memory.
 24 bits in address.

 4 byte blocks: lowest order 2 bits.
 8K sets in a 2-way set associative cache: middle 13 bits.
 Rest (9 bits) is used to identify the block mapped to the line.
Tag = 9 bits | Set = 13 bits | Word = 2 bits
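The set-associative split works the same way as the direct-mapped one, with the middle 13 bits selecting a set rather than a line (example values are arbitrary):

```python
WORD_BITS, SET_BITS, TAG_BITS = 2, 13, 9    # 24-bit address, 8K sets, 2-way

def split_address(addr):
    """Break a 24-bit address into (tag, set, word) fields."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_no = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_no, word

# A block may occupy either line of its set, so on a read both lines of
# set `set_no` have their tags compared against `tag` simultaneously.
print(split_address((5 << 15) | (100 << 2) | 1))   # (5, 100, 1)
```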
Reading From a Set Associative Mapped System
58

 The processor produces a 24 bit address.


 The cache uses the middle 13 bits to identify one of
its 8 K sets.
 The upper 9 bits of the address are matched to the
tag field of the cache entries that make up the set.
 The number of lines to match to is very limited.
 Therefore, the matching hardware is much simpler.
Set Associative Mapping Cache Organization
59
Set Associative Mapping
60

 Advantages.
 Combines advantages of direct and associative
mapping techniques.

 Disadvantages.
 Increasing the size of the set does not always improve
the hit ratio.
 2-way set associative has a much higher hit ratio than direct
mapping.
 Increasing it to 4-way improves the hit ratio slightly more.
 Beyond that no significant improvement has been seen.
Replacement Algorithms
61

 What happens if there is a “miss” and the cache is


already full?
 One of the items in the cache needs to be “replaced” with
the new item.
 Which one??
 Depends on the mapping technique used.

 Direct mapping.
 No choice.
 Memory blocks map into certain cache lines.
◼ The entry occupying that line must be swapped out.
Replacement Algorithms
62

 Associative & Set Associative:


 Random.

 First-in First-out (FIFO).
 Least Recently Used (LRU).

 Least Frequently Used (LFU).


◼ The last three require additional bits for each entry to keep
track of order, time or number of times used.

 Usually, these algorithms are implemented in hardware for speed.
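As a software sketch of LRU replacement for one set (real caches track recency with per-line age bits in hardware; this toy model and its names are illustrative):

```python
from collections import OrderedDict

class LRUSet:
    """Toy model of one cache set managed with LRU replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.tags = OrderedDict()   # least recently used first

    def access(self, tag):
        """Return True on hit; on a miss, evict the LRU tag if the set is full."""
        if tag in self.tags:
            self.tags.move_to_end(tag)     # mark as most recently used
            return True
        if len(self.tags) >= self.ways:
            self.tags.popitem(last=False)  # evict least recently used
        self.tags[tag] = None
        return False

s = LRUSet(ways=2)
hits = [s.access(t) for t in (1, 2, 1, 3)]  # the miss on 3 evicts 2, not 1
print(hits, list(s.tags))   # [False, False, True, False] [1, 3]
```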
Writing Into Cache
63

 Cache entries are supposed to be exact “copies” of


what is in main memory.
 What happens when the CPU wants to write into memory?
 Which memory does it write to?

 Two techniques are possible.


 Write-through.

 Write-back.
Write-Through
64

 The simplest and most commonly used technique is


to update both the cache and main memory at the
same time.

 Advantage.
 Memory and cache are always in sync.

 Disadvantage.
 Memory write becomes slow.
Write-Back
65

 The update is done ONLY to the word in the cache, and the block containing the word is marked (“dirty”).
 When the block is to be swapped out of cache, the marked block is written back to main memory.

 Advantage.
 Reduces memory traffic because a word may be
updated several times while in cache.
 Disadvantage.
 Cache and memory will be out of sync for a while.
 What about DMA?
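A minimal sketch of the write-back idea — the dirty bit defers the memory update until eviction (class and variable names are illustrative, not from the slides):

```python
class WriteBackLine:
    """Toy write-back cache line: writes set a dirty bit instead of
    going straight to memory."""
    def __init__(self, data):
        self.data = data
        self.dirty = False

    def write(self, value):
        self.data = value
        self.dirty = True               # memory update is deferred

    def evict(self, memory, block_no):
        if self.dirty:                  # one write-back, however many writes
            memory[block_no] = self.data
        self.dirty = False

memory = {7: 0}
line = WriteBackLine(memory[7])
for v in (1, 2, 3):                     # three writes, zero memory traffic
    line.write(v)
line.evict(memory, 7)
print(memory[7])   # only the final value reaches main memory
```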
Number of Caches
66

 When a cache miss occurs, the system suffers through a


large delay while the block is read from main memory
into the cache.

 Two possible solutions.


◼ Speed up the transfer of information.
◼ The transfer rate is limited by issues that may not be under our control.

◼ Speed up the source of the information.


◼ Main memory is between 7X and 10X slower than cache.
◼ We can insert an intermediate level of memory between cache and main
memory.
Cache Levels
67

 In most of today’s designs, cache sits on the same chip


as the CPU. “On-chip cache”
 Data travels a very short distance
 No need to use the very slow bus
 This is known as L1 cache
◼ Intel calls this level L0

 To reduce the penalty of a cache miss, a second level of


cache is inserted between main memory and the on-
chip cache.
 L2 cache
Cache Levels
68
[Figure: memory system with on-chip cache, off-chip cache, and main memory on the bus, contrasting the Pentium and Pentium Pro arrangements]


“L2” Cache
69

 A very fast, SRAM based, cache is placed off-chip.


 Slower than the on-chip cache.
 Larger than the on-chip cache.

 On-Module Cache.
◼ CPU uses a dedicated, internal, fast, memory bus to access
cache.
 On-Mother-Board Cache.
◼ The CPU has to use the system bus to get to it.
◼ Still much faster than DRAM based main memory.
Cache Strategy
70

 On-Chip Cache is optimized to increase “hit rate”.


 Block size about 4 words
 Many blocks

 Off-Chip Cache is optimized to reduce “miss


penalty”.
 Larger block size
 Smaller number of blocks.
Secondary Memory
71
Types of External Memory
 Magnetic Disk
 RAID

 Removable

 Optical
 CD-ROM

 CD-Recordable (CD-R)
 CD-R/W

 DVD

 Magnetic Tape
Magnetic Disk
 Disk substrate coated with magnetizable material (iron
oxide…rust)
 Substrate used to be aluminium
 Now glass
 Improved surface uniformity
◼ Increases reliability
 Reduction in surface defects
◼ Reduced read/write errors
 Lower flight heights (See later)
 Better stiffness
 Better shock/damage resistance
How a Hard Drive Works
74

 See video
Read and Write Mechanisms
 Recording & retrieval via conductive coil called a head
 May be single read/write head or separate ones
 During read/write, head is stationary, platter rotates
 Write
 Current through coil produces magnetic field
 Pulses sent to head
 Magnetic pattern recorded on surface below
 Read (traditional)
 Magnetic field moving relative to coil produces current
 Coil is the same for read and write
 Read (contemporary)
 Separate read head, close to write head
 Partially shielded magnetoresistive (MR) sensor
 Electrical resistance depends on direction of magnetic field
 High frequency operation
 Higher storage density and speed
Inductive Write MR Read
Data Organization and Formatting
 Concentric rings or tracks
 Gaps between tracks
 Reduce gap to increase capacity

 Same number of bits per track (variable packing density)
 Constant angular velocity

 Tracks divided into sectors


 Minimum block size is one sector
 May have more than one sector per block
Disk Data Layout
Disk Velocity
 Bit near centre of rotating disk passes fixed point slower than bit on
outside of disk
 Increase spacing between bits in different tracks
 Rotate disk at constant angular velocity (CAV)
 Gives pie shaped sectors and concentric tracks
 Individual tracks and sectors addressable
 Move head to given track and wait for given sector
 Waste of space on outer tracks
◼ Lower data density
 Can use zones to increase capacity
 Each zone has fixed bits per track
 More complex circuitry
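The capacity cost of CAV versus zoning can be sketched with made-up geometry (100 tracks, innermost track holding 1000 sectors, radius growing linearly — all assumptions for illustration):

```python
TRACKS = 100
INNER_SECTORS = 1000     # sectors the innermost track can hold (assumed)

# CAV: every track holds only what the innermost track can.
cav_sectors = TRACKS * INNER_SECTORS

# Idealized zoning: a track's capacity scales with its circumference;
# assume radius grows linearly from 1 to 2 units across the tracks.
zoned_sectors = sum(int(INNER_SECTORS * (1 + i / (TRACKS - 1)))
                    for i in range(TRACKS))

print(zoned_sectors, ">", cav_sectors)   # zoning recovers outer-track space
```

Real drives use a small number of discrete zones rather than a per-track sector count, trading some of this gain for simpler circuitry.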
Disk Layout Methods Diagram
Finding Sectors
 Must be able to identify start of track and sector
 Format disk
 Additional information not available to user
 Marks tracks and sectors
Winchester Disk Format
Seagate ST506
Characteristics
 Fixed (rare) or movable head
 Removable or fixed
 Single or double (usually) sided
 Single or multiple platter
 Head mechanism
 Contact (Floppy)
 Fixed gap

 Flying (Winchester)
Fixed/Movable Head Disk
 Fixed head
 One read write head per track
 Heads mounted on fixed rigid arm

 Movable head
 One read write head per side
 Mounted on a movable arm
Removable or Not
 Removable disk
 Can be removed from drive and replaced with another
disk
 Provides unlimited storage capacity

 Easy data transfer between systems

 Nonremovable disk
 Permanently mounted in the drive
Multiple Platter
 One head per side
 Heads are joined and aligned
 Aligned tracks on each platter form cylinders
 Data is striped by cylinder
 reduces head movement
 Increases speed (transfer rate)
Multiple Platters
Tracks and Cylinders
Floppy Disk
 8”, 5.25”, 3.5”
 Small capacity
 Up to 1.44Mbyte (2.88M never popular)
 Slow
 Universal
 Cheap
 Obsolete?