
Unit 5

Memory and I/O System

The Memory System

Overview

Memory technology
Organization of the main memory
Cache memory
Associative memory
Virtual memory mechanism

Some Basic Concepts

Memory Types

Characteristics of Memory Technologies

Big-endian and Little-endian

The maximum size of the memory that can be used in any computer is determined by the addressing scheme.
16-bit addresses: 2^16 = 64K memory locations


Most modern computers are byte addressable.


[Figure: Byte-address assignments within 32-bit words. Word addresses are 0, 4, 8, ..., up to 2^k - 4. In (a), bytes 0, 1, 2, 3 of word 0 run from the most significant byte down; in (b), byte 0 is the least significant byte of word 0.]

(a) Big-endian assignment

(b) Little-endian assignment
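The byte-order convention of the machine a program runs on can be checked at run time. A minimal sketch in C (the function name is illustrative, not from the text):

```c
#include <stdint.h>
#include <string.h>

/* Return 1 on a little-endian host, 0 on a big-endian one. */
int is_little_endian(void)
{
    uint32_t word = 0x01020304;  /* four distinct byte values in one word */
    uint8_t first_byte;          /* the byte stored at the lowest address */
    memcpy(&first_byte, &word, 1);
    return first_byte == 0x04;   /* little-endian stores the LSB first */
}
```

On a little-endian host (e.g. x86) this returns 1; on a big-endian host it returns 0.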

Traditional Architecture
The processor connects to the memory through a k-bit address bus (driven from the MAR) and an n-bit data bus (through the MDR), plus control lines (R/W, MFC, etc.). Up to 2^k addressable locations; word length = n bits.

Figure 5.1. Connection of the memory to the processor.



Basic Concepts

Block transfer: bulk data transfer.
Memory access time: the time from Read to MFC.
Memory cycle time: the time between two memory operations (Reads or Writes).
RAM: any location can be accessed for a Read or Write operation in some fixed amount of time that is independent of the location's address.
Cache memory: for reducing memory access time.
Virtual memory, memory management unit: the physical memory address is different from the address specified in the program.

Semiconductor RAM Memories


Types of Semiconductor Memory


Semiconductor Memory

Random Access Memory:
- Static RAM (SRAM)
- Dynamic RAM (DRAM): Fast Page Mode DRAM (FPM DRAM), Extended Data Out DRAM (EDO DRAM), Synchronous DRAM (SDRAM), Double Data Rate Synchronous DRAM (DDR SDRAM), Rambus DRAM (RDRAM)

Read Only Memory:
- Mask Programmable (ROM)
- Field Programmable: Fusible-link PROM (PROM), UV-light Erasable PROM (EPROM), Electrically Alterable PROM (EAPROM), Electrically Erasable PROM (EEPROM)

Internal Organization of Memory Chips


[Figure: A 16x8 memory (16 words of 8 bits). Word lines W0-W15 are driven by an address decoder from inputs A0-A3; each bit column of flip-flop (FF) cells has a Sense/Write circuit connected to data lines b7-b0 and to the R/W and CS control inputs.]

The 16x8 organization has 16 external connections: 4 address, 8 data, 2 control (R/W, CS), and 2 for power/ground.
1K (1024) memory cells organized as 128x8 need 19 external connections (7 + 8 + 2 + 2).
The same 1K cells organized as 1Kx1 need 15 external connections (10 + 1 + 2 + 2).

Figure 5.2. Organization of bit cells in a memory chip.

A Memory Chip
[Figure: A 10-bit address is split into a 5-bit row address, which a 5-bit decoder uses to select one of 32 word lines (W0-W31) in a 32x32 memory cell array, and a 5-bit column address, which drives a 32-to-1 output multiplexer and input demultiplexer through the Sense/Write circuitry. Control inputs: R/W, CS.]

Figure 5.3. Organization of a 1Kx1 memory chip.

Static Memories

The circuits are capable of retaining their state as long as power is applied.
[Figure: Two cross-coupled inverters form the latch; transistors T1 and T2 connect it to the bit lines b and b' when the word line is activated.]

Figure 5.4. A static RAM cell.

Static Memories

CMOS cell: low power consumption


[Figure: Transistors T3 and T4 (connected to Vsupply) together with T1 and T2 form the cross-coupled inverters holding nodes X and Y; T5 and T6 connect the cell to bit lines b and b' under control of the word line.]

Figure 5.5. An example of a CMOS memory cell.

Asynchronous DRAMs

Static RAMs are fast, but they occupy more chip area and are more expensive. Dynamic RAMs (DRAMs) are cheap and area-efficient, but they cannot retain their state indefinitely; they need to be refreshed periodically.

[Figure: A single transistor T connects a storage capacitor C to the bit line under control of the word line.]

Figure 5.6. A single-transistor dynamic memory cell.

A Dynamic Memory Chip


[Figure: Address lines A20-9 / A8-0 are multiplexed. The Row Address Strobe (RAS) latches the row address into the row address latch; the row decoder selects one row of a 4096 x (512 x 8) cell array. The Column Address Strobe (CAS) latches the column address; the column decoder selects 8 of the 512 bytes sensed by the Sense/Write circuits, which drive data lines D7-D0 under CS and R/W control.]

Figure 5.7. Internal organization of a 2Mx8 dynamic memory chip.

Fast Page Mode

When the DRAM in the last slide is accessed, the contents of all 4096 cells in the selected row are sensed, but only 8 bits are placed on the data lines D7-D0, as selected by A8-A0. Fast page mode makes it possible to access the other bytes in the same row without having to reselect the row: a latch is added at the output of the sense amplifier in each column. This is good for bulk transfers.

Synchronous DRAMs

The operations of SDRAM are controlled by a clock signal.


[Figure: A refresh counter and the multiplexed row/column address input feed the row address latch and the column address counter. The row decoder and column decoder select cells in the array; Read/Write circuits and latches connect to data input and data output registers. A mode register and timing control block is driven by the Clock, RAS, CAS, R/W, and CS inputs; data flows through the Data lines.]

Figure 5.8. Synchronous DRAM.

Synchronous DRAMs

No CAS pulses are needed in burst operation. Refresh circuits are included (a refresh every 64 ms). Clock frequencies exceed 100 MHz, e.g. Intel PC100 and PC133.

Latency and Bandwidth

The speed and efficiency of data transfers among memory, processor, and disk have a large impact on the performance of a computer system.
Memory latency: the amount of time it takes to transfer a word of data to or from the memory.
Memory bandwidth: the number of bits or bytes that can be transferred in one second; it measures how much time is needed to transfer an entire block of data.
Bandwidth is not determined solely by the memory. It is the product of the rate at which data are transferred (and accessed) and the width of the data bus.
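The product described above can be illustrated with a small calculation. The figures used below (a 133 MHz bus, 8 bytes wide) are hypothetical examples, not taken from the text:

```c
/* Peak bandwidth (bytes/s) = transfer rate (transfers/s) x bus width (bytes).
   This is the product described above; sustained bandwidth is always lower. */
unsigned long long peak_bandwidth(unsigned long long transfers_per_sec,
                                  unsigned bus_width_bytes)
{
    return transfers_per_sec * (unsigned long long)bus_width_bytes;
}
```

For example, a hypothetical 133 MHz, 64-bit (8-byte) bus gives 133,000,000 x 8 = 1,064,000,000 bytes per second. Note that a DDR device doubles the transfer rate, not the clock frequency.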

DDR SDRAM

Double-Data-Rate SDRAM. Standard SDRAM performs all actions on the rising edge of the clock signal; DDR SDRAM accesses the cell array in the same way, but transfers data on both edges of the clock. The cell array is organized in two banks, each of which can be accessed separately. DDR SDRAMs and standard SDRAMs are most efficiently used in applications where block transfers are prevalent.

Structures of Larger Memories


[Figure: The two high-order bits of a 21-bit address (A20, A19) drive a 2-bit decoder that selects one of four rows of 512Kx8 chips through their chip-select inputs; the remaining 19 bits (A0-A18) form the internal chip address. The four chips in a row supply D31-24, D23-16, D15-8, and D7-0. Each chip has a 19-bit address input, an 8-bit data input/output, and a chip select.]

Figure 5.10. Organization of a 2Mx32 memory module using 512Kx8 static memory chips.

Memory System Considerations

The choice of a RAM chip for a given application depends on several factors: cost, speed, power, and size. SRAMs are faster, more expensive, and smaller; they are used as cache memories. DRAMs are slower, cheaper, and larger; they are used as main memories.

Memory Controller
[Figure: The processor sends Address, R/W, Request, and Clock signals to the memory controller, which generates the multiplexed Row/Column address and the RAS, CAS, R/W, CS, and Clock signals for the memory. Data passes directly between processor and memory.]

Figure 5.11. Use of a memory controller.

Read-Only Memories


Read-Only-Memory

Memory may be volatile or non-volatile. The non-volatile read-only family:
ROM
PROM: programmable ROM
EPROM: erasable, reprogrammable ROM
EEPROM: can be programmed and erased electrically

[Figure: A ROM cell. Transistor T connects the bit line to ground through point P; the connection is left open to store a 1 and connected to store a 0.]

Figure 5.12. A ROM cell.

Flash Memory

Flash memory is similar to EEPROM. The difference: it is only possible to write an entire block of cells, not a single cell. It has low power consumption and is used in portable equipment. Such modules are implemented as flash cards and flash drives.

Speed, Size and Cost


Smaller, faster, and costlier (per byte) storage devices sit at the top of the hierarchy; larger, slower, and cheaper (per byte) devices at the bottom:

L0: registers - CPU registers hold words retrieved from the L1 cache.
L1: on-chip L1 cache (SRAM) - holds cache lines retrieved from the L2 cache.
L2: off-chip L2 cache (SRAM) - holds cache lines retrieved from main memory.
L3: main memory (DRAM) - holds disk blocks retrieved from local disks.
L4: local secondary storage (local disks) - holds files retrieved from disks on remote network servers.
L5: remote secondary storage (distributed file systems, Web servers)

Cache Memories


Cache
Processor <-> Cache <-> Main memory

Figure 5.14. Use of a cache memory.

Key terms: replacement algorithm, hit/miss, write-through / write-back, load-through.

Cache Memory

The effectiveness of the cache mechanism is based on a property of computer programs called locality of reference: references to memory at any given time interval tend to be confined within localized areas. Analysis of programs shows that most of their execution time is spent in routines in which instructions are executed repeatedly: loops, nested loops, or a few procedures that call each other. Many instructions in localized areas of the program are executed repeatedly during some time period, while the remainder of the program is accessed relatively infrequently.

Locality of Reference

Locality of reference is manifested in two ways:

1. Temporal: a recently executed instruction is likely to be executed again very soon. Information that will be used in the near future is likely to be in use already (e.g. reuse of information in loops).

2. Spatial: instructions in close proximity to a recently executed instruction are also likely to be executed soon. If a word is accessed, adjacent words are likely to be accessed soon (e.g. related data items such as arrays are usually stored together, and instructions are executed sequentially).

Locality of Reference (contd..)

If the active segments of a program can be placed in a fast (cache) memory, then total execution time can be reduced significantly. The temporal aspect of locality of reference suggests that whenever an item of information (instruction or data) is first needed, it should be brought into the cache. The spatial aspect suggests that instead of bringing just one item from main memory into the cache, it is wise to bring several items that reside at adjacent addresses as well (i.e. a block of information).

Locality of Reference (contd..)


Example (C):

    sum = 0;
    for (i = 0; i < n; i++)
        sum = sum + a[i];
    return sum;

Step by step:
1. sum = 0
2. i = 0
3. sum = sum + a[i]
4. i = i + 1
5. Check the value of i
6. If i < n, go to step 3
7. Otherwise, return (print) sum

Locality of Reference example


Locality in the example:
Data: array elements are referenced in succession (spatial locality); sum is referenced on each iteration (temporal locality).
Instructions: instructions are referenced in sequence (spatial locality); the loop is cycled through repeatedly (temporal locality).

Principles of cache
The following diagram explains the cache from the access point of view:

CPU <-> Cache (512 x 12) <-> Main Memory (32K x 12)

Principles of cache (contd..)

When a read request is received from the CPU, the contents of a block of memory words containing the specified location are transferred into the cache. When the program references any of the locations in this block, the contents are read from the cache. The number of blocks in the cache is smaller than the number of blocks in main memory; the correspondence between main memory blocks and those in the cache is specified by a mapping function.

Principles of cache (contd..)


Assume the cache is full and a memory word not in the cache is referenced. The control hardware decides which block is to be removed from the cache to create space for the new block containing the referenced word. The collection of rules for making this decision is called the replacement algorithm.

Read/ Write operations on cache

Cache Hit Operation

The CPU issues Read/Write requests using addresses that refer to locations in main memory. The cache control circuitry determines whether the requested word currently exists in the cache. If it does, the Read/Write operation is performed on the appropriate location in the cache (a Read/Write hit).

Read/Write operations on cache in case of Hit


In a Read operation, main memory is not involved. In a Write operation, two things can happen:

1. Cache and main memory locations are updated simultaneously (write-through), OR
2. Only the cache location is updated and marked with a dirty (modified) bit; the main memory location is updated when the cache block is removed (write-back, or copy-back).
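The two policies can be contrasted with a toy model. This is a sketch, not a real cache implementation; the one-line cache and all names are illustrative:

```c
/* A one-word "cache" over a one-word "memory", to contrast write policies. */
typedef struct {
    int data;
    int dirty;   /* used only by write-back: 1 means modified */
} CacheLine;

/* Write-through: cache and main memory are updated together. */
void write_through(CacheLine *c, int *mem, int value)
{
    c->data = value;
    *mem = value;
}

/* Write-back: only the cache is updated; memory catches up on eviction. */
void write_back(CacheLine *c, int value)
{
    c->data = value;
    c->dirty = 1;    /* mark the line as modified */
}

void evict(CacheLine *c, int *mem)
{
    if (c->dirty) {  /* copy back only if the line was modified */
        *mem = c->data;
        c->dirty = 0;
    }
}
```

With write-through, memory is always current; with write-back, memory lags until eviction, which saves memory traffic for repeatedly written words.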


Read/Write operations on cache in case of Miss

Read Operation

When the addressed word is not in the cache, a read miss occurs. There are two ways this can be dealt with:

1. The entire block containing the requested word is copied from main memory into the cache, and the requested word is forwarded to the CPU from the cache (load-through), OR
2. The requested word is sent from memory to the CPU first, and then the cache is updated (early restart).

Read/Write operations on cache in case of Miss (contd..)

Write Operation

If the addressed word is not in the cache, a write miss occurs. If the write-through protocol is used, the information is written directly into main memory. In the write-back protocol, the block containing the word is first brought into the cache, and the desired word is then overwritten.

Mapping Functions

The correspondence between main memory blocks and those in the cache is specified by a memory mapping function. There are three mapping techniques:

1. Direct mapping
2. Associative mapping
3. Set-associative mapping

Direct Mapping

Example: Consider a cache consisting of 128 blocks of 16 words each (128 x 16 = 2048 = 2K words). Main memory is addressed by a 16-bit address, so it holds 64K words, i.e. 4096 (4K) blocks of 16 words each.

The 16-bit address is divided into a 5-bit tag, a 7-bit block field, and a 4-bit word field (5 + 7 + 4 = 16).

Direct Mapping

Block j of main memory maps onto block (j modulo 128) of the cache.

[Figure: Cache blocks 0-127, each with a stored tag; main memory blocks 0-4095. Blocks 0, 128, 256, ... map to cache block 0; blocks 127, 255, ... map to cache block 127.]

Word field (4 bits): selects one of the 16 = 2^4 words in a block.
Block field (7 bits): points to a particular block in the cache (128 = 2^7).
Tag field (5 bits): compared with the tag bits stored at that cache location, to identify which of the 32 blocks (4096/128) that map there is currently resident.

Main memory address: Tag (5 bits) | Block (7 bits) | Word (4 bits)

Figure 5.15. Direct-mapped cache.

Direct Mapping
Main memory address: Tag (5 bits) | Block (7 bits) | Word (4 bits)

11101,1111111,1100

Tag: 11101. Block: 1111111 = 127, i.e. block 127 of the cache. Word: 1100 = 12, i.e. word 12 within that block.
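Under the 5/7/4 split above, the fields can be extracted with shifts and masks. A sketch in C (the function names are illustrative):

```c
/* 16-bit address = 5-bit tag | 7-bit block | 4-bit word (direct mapping). */
unsigned word_field(unsigned addr)  { return addr & 0xF; }          /* low 4 bits  */
unsigned block_field(unsigned addr) { return (addr >> 4) & 0x7F; }  /* next 7 bits */
unsigned tag_field(unsigned addr)   { return (addr >> 11) & 0x1F; } /* top 5 bits  */

/* Main memory block j maps onto cache block j modulo 128. */
unsigned cache_block_for(unsigned mem_block) { return mem_block % 128; }
```

For the address 1110111111111100 (0xEFFC) the fields come out as tag 29 (11101), block 127, word 12, matching the worked example above.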

Associative Mapping
[Figure: Any main memory block (0-4095) can be placed in any cache block (0-127); each cache block stores a 12-bit tag.]

Word field (4 bits): selects one of the 16 = 2^4 words in a block.
Tag field (12 bits): identifies which of the 4096 = 2^12 main memory blocks is resident in a cache block.

Main memory address: Tag (12 bits) | Word (4 bits)

Figure 5.16. Associative-mapped cache.

Associative Mapping
Main memory address: Tag (12 bits) | Word (4 bits)

111011111111,1100

Tag: 111011111111. Word: 1100 = 12, i.e. word 12 of a block in the cache.
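The same shift-and-mask idea applies under the associative 12/4 split. A sketch (names are illustrative):

```c
/* 16-bit address = 12-bit tag | 4-bit word (associative mapping). */
unsigned assoc_word(unsigned addr) { return addr & 0xF; }           /* low 4 bits */
unsigned assoc_tag(unsigned addr)  { return (addr >> 4) & 0xFFF; }  /* 12-bit tag */
```

For 0xEFFC this gives word 12 and tag 0xEFF (111011111111), as in the example above; the hardware compares this tag against all 128 stored tags in parallel.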


Set-Associative Mapping

[Figure: The cache is divided into 64 sets of two blocks each; each block stores a 6-bit tag. A main memory block (0-4095) maps to exactly one set but may occupy either block of that set.]

Word field (4 bits): selects one of the 16 = 2^4 words in a block.
Set field (6 bits): points to a particular set in the cache (128/2 = 64 = 2^6).
Tag field (6 bits): checked against the tags of the set to see whether the desired block is present (4096/64 = 2^6 blocks map to each set).

Main memory address: Tag (6 bits) | Set (6 bits) | Word (4 bits)

Figure 5.17. Set-associative-mapped cache with two blocks per set.

Set-Associative Mapping
Main memory address: Tag (6 bits) | Set (6 bits) | Word (4 bits)

111011,111111,1100

Tag: 111011. Set: 111111 = 63, i.e. set 63 of the cache. Word: 1100 = 12, i.e. word 12 within the block.
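The 6/6/4 split can likewise be computed with shifts and masks. A sketch (names are illustrative):

```c
/* 16-bit address = 6-bit tag | 6-bit set | 4-bit word (2-way set-associative). */
unsigned sa_word(unsigned addr) { return addr & 0xF; }          /* low 4 bits  */
unsigned sa_set(unsigned addr)  { return (addr >> 4) & 0x3F; }  /* next 6 bits */
unsigned sa_tag(unsigned addr)  { return (addr >> 10) & 0x3F; } /* top 6 bits  */
```

For 0xEFFC this gives word 12, set 63, and tag 59 (111011), matching the example; only the two tags within set 63 need to be compared.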

Replacement Policies

When the cache is full and new data must be brought into it, a decision must be made as to which data to remove from the cache. The guideline for making this decision is called the replacement policy. The replacement policy depends on the mapping: in direct mapping there is no choice, since each block can be placed in only one cache location.


Replacement Policies (contd..)

In case of associative and set-associative mapping:

- Random replacement.
- First-In First-Out (FIFO): the item chosen is the one that has been in the cache (or set) longest. A simple implementation replaces blocks in round-robin order whenever a new word is requested from memory, which constitutes a FIFO policy.
- Least Recently Used (LRU): the item chosen is the one that has been least recently used by the CPU.
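The round-robin scheme described above needs only one counter per set. A sketch, assuming a hypothetical 4-way set:

```c
#define WAYS 4  /* assumed associativity; illustrative */

typedef struct { int next_victim; } SetState;

/* Replace blocks of a set in round-robin order: a FIFO-style policy. */
int choose_victim_fifo(SetState *s)
{
    int victim = s->next_victim;
    s->next_victim = (s->next_victim + 1) % WAYS;  /* advance the pointer */
    return victim;
}
```

LRU, by contrast, needs per-block usage information (e.g. counters or an ordering) that is updated on every hit, which is why FIFO/round-robin is cheaper to build.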


Associative Memory


Associative Memory

Many data-processing applications require searching for items in a table stored in memory. Example: an account number may be searched for in a file to determine the holder's name and account status. Usually all items are stored in memory where they can be addressed in sequence. The usual search procedure chooses a sequence of addresses, reads the content of memory at each address, and compares it with the item being searched for, until a match occurs.


Associative Memory (contd..)

The number of accesses to memory depends on the location of the item and the efficiency of the search algorithm. The time required to find an item stored in memory can be reduced considerably if stored data can be identified for access by the content of the data itself rather than by an address. A memory unit accessed by content is called an associative memory or content-addressable memory (CAM).

Associative Memory (contd..)

This type of memory is accessed simultaneously and in parallel on the basis of data content rather than by a specific address or location. When a word is written into an associative memory, no address is given; the memory is capable of finding an empty location to store the word. When a word is to be read from an associative memory, the content of the word, or part of it, is specified; the memory locates all words that match the specified content and makes them available for reading.

Associative Memory (contd..)

Associative memory is uniquely suited to parallel searches by data association. It is more expensive than a random-access memory because each cell must have storage capability as well as logic circuitry for matching its content against an external argument. Associative memories are used in applications where the search time is critical and must be very short.

Hardware Organization
[Figure: The argument register (A) and key register (K) feed an associative memory array and logic of m words, n bits per word; each word drives one bit of the m-bit match register (M). Input, Read, and Write lines control the array; matched words appear at the output.]

Operation

The unit consists of a memory array and logic for m words of n bits each. The argument register A and key register K each have n bits, one for each bit of a word. The match register M has m bits, one for each memory word. Each word in memory is compared in parallel with the content of the argument register; the words that match set a corresponding bit in the match register.

Operation (contd..)

After the matching process, the bits in the match register that have been set indicate that their corresponding words matched. Reading is done by sequential access to memory for those words whose corresponding match-register bits have been set. The key register provides a mask for choosing a particular field or key in the argument word: a key bit is set to 0 for masking, otherwise set to 1.


Operation (contd..)
Example of matching:

A       101 111100
K       111 000000
Word 1  100 111100  (no match)
Word 2  101 000001  (match)

Only the leftmost three bits are compared, because K = 111 000000 masks the rest.
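The masked comparison above reduces to one expression: a word matches when it agrees with the argument in every unmasked bit position. A sketch (the bit width and names are illustrative):

```c
/* Associative-memory match: the word matches argument 'arg' in every bit
   position where the key register 'key' has a 1 (key = 0 masks the bit). */
int cam_match(unsigned word, unsigned arg, unsigned key)
{
    return ((word ^ arg) & key) == 0;   /* XOR exposes differing bits */
}
```

With the example values (A = 101111100, K = 111000000), word 1 = 100111100 fails because it differs in an unmasked bit, while word 2 = 101000001 matches because all its differences are masked out.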


Associative memory of m words, n cells per word

[Figure: The array of m words by n cells. Cell C_ij holds bit j of word i: row i runs C_i1 ... C_ij ... C_in and produces match output M_i; column j (bit j) is driven by argument bit A_j and key bit K_j.]

Operation

The cells in the array are denoted by the letter C with two subscripts: the first gives the word number and the second the bit position within the word. Example: C_ij is the cell for bit j in word i.

A bit A_j in the argument register is compared with all the bits in column j of the array, provided that K_j = 1. This is done for all columns j = 1, 2, ..., n. If a match occurs between all unmasked bits of the argument and the bits in word i, the corresponding bit M_i in the match register is set to 1. If one or more unmasked bits do not match, M_i is cleared to 0.

Internal organization of a typical cell C_ij

[Figure: Each cell contains a flip-flop F_ij, set and reset from the Input line under Write control and read out on the Output line under Read control. Match logic compares F_ij with argument bit A_j, subject to key bit K_j, and contributes to the match output M_i.]

Internal Organization of one cell (contd..)

Each cell consists of a flip-flop storage element F_ij and the circuits for reading, writing, and matching. The input bit is transferred into the storage cell during a write operation; the stored bit is read out during a read operation. The match logic compares the content of the storage cell with the corresponding unmasked bit of the argument and provides an output to the decision logic that sets the bit in M_i.

MATCH LOGIC

Per-bit agreement is x_j = A_j F_ij + A_j' F_ij' (x_j = 1 when stored bit j equals argument bit A_j). The match output for word i is then

M_i = (x_1 + K_1')(x_2 + K_2') ... (x_n + K_n')

so a masked column (K_j = 0) always contributes a 1 and cannot prevent a match.


Read Operation

If more than one word in memory matches the unmasked argument field, all matched words will have 1s in the corresponding bit positions of the match register. In a read operation, all matched words are read in sequence by applying a read signal to each word line whose corresponding M_i bit is 1. In applications where no two identical items are stored in the memory, at most one word can match, in which case the M_i output can be used directly as the read signal for the corresponding word.

Write Operation

Can take two different forms


1. Entire memory may be loaded with new information 2.Unwanted words to be deleted and new words to be inserted

1. Entire memory: writing can be done by addressing each location in sequence. This makes the unit a random-access memory for writing and a content-addressable memory for reading. The number of address lines needed for decoding is d, where m = 2^d and m is the number of words.

Write Operation (contd..)

2.Unwanted words to be deleted and new words to be inserted :

A tag register is used, with as many bits as there are words in memory. For every active (valid) word in memory, the corresponding tag bit is set to 1. When a word is deleted, its tag bit is reset to 0. A new word is stored by scanning the tag register until the first 0 bit is encountered; after the word is stored, that bit is set to 1.

Virtual Memory


Virtual Memory

In the early days memory was expensive and hence small. Programmers used secondary storage for overlaying: they were responsible for breaking programs into overlays, deciding where to keep them in secondary memory, and arranging the transfer of overlays between main memory and secondary memory. In 1961, researchers at Manchester University proposed a method for performing the overlay process automatically, which gave rise to today's concept of virtual memory.


Virtual Memory - Background

The concept of address space is separated from that of physical memory locations. Programs reference instructions and data independently of the available physical memory. Addresses issued by the processor for instructions or data are called virtual or logical addresses; they are translated into physical addresses by a combination of hardware and software components.

Types of Memory

Real memory: main memory.

Virtual memory: memory on disk. It allows for effective multiprogramming and relieves the user of the tight constraints of main memory.

Overview
[Figure: The processor issues a virtual address to the MMU (Memory Management Unit), which produces a physical address for the cache; on a cache miss, the physical address goes to main memory, and pages move between main memory and disk storage by DMA transfer. Data flows back to the processor from the cache.]

Figure 5.26. Virtual memory organization.

Address Space and Memory Space

An address used by a programmer is called a virtual address, and the set of such addresses is called the address space. An address in main memory is called a location or physical address, and the set of such locations is called the memory space. In computers with virtual memory, the address space is allowed to be larger than the memory space.

Address Space and Memory Space

Address space: virtual (logical) addresses, generated by programs.
Memory space: physical addresses, i.e. actual main memory addresses.
A mapping translates each virtual address to a physical address.

Address Translation

All programs and data are composed of fixed-length units called pages, each of which consists of a block of words that occupy contiguous locations in main memory. A page cannot be too small or too large. The virtual memory mechanism bridges the size and speed gaps between main memory and secondary storage, similar to the way a cache bridges those between processor and main memory.

Memory Table for mapping a virtual address

The CPU references instructions and data with virtual addresses, but this information must be fetched from physical memory for speedy access. A table is needed to map each virtual address to a physical address (a dynamic operation). This table may be kept in a separate memory, in main memory, or in an associative memory.


Paging

Each process has its own page table. Each page table entry contains the frame number of the corresponding page in main memory. One bit is needed to indicate whether the page is in main memory or not; another bit indicates whether the page has been modified, which helps decide whether it must be copied back on replacement.


Paging
Virtual address: Page number | Offset

Page table entry: Present bit | Modified bit | Other control bits | Frame number

Address Translation
[Figure: The virtual address from the processor is split into a virtual page number and an offset. The page table base register is added to the virtual page number to locate the entry in the page table, which holds control bits and the page frame number. The page frame concatenated with the offset forms the physical address in main memory.]

Virtual-memory address translation.
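The translation in the figure can be sketched as a table lookup. The 4-bit offset (16-word pages) and the tiny table below are illustrative assumptions, not the text's parameters:

```c
/* Minimal page-table walk; names and sizes are illustrative. */
#define OFFSET_BITS 4   /* assumed: 16-word pages */

typedef struct {
    unsigned frame;     /* page frame number in main memory */
    unsigned present;   /* 1 if the page is resident        */
} PTE;

/* Returns the physical address, or -1 to signal a page fault. */
long translate(const PTE *table, unsigned vaddr)
{
    unsigned page   = vaddr >> OFFSET_BITS;
    unsigned offset = vaddr & ((1u << OFFSET_BITS) - 1);
    if (!table[page].present)
        return -1;      /* page fault: page not in main memory */
    return ((long)table[page].frame << OFFSET_BITS) | offset;
}
```

For example, with page 2 mapped to frame 5, virtual address 0x25 (page 2, offset 5) translates to physical address 0x55, while a reference to a non-resident page raises a fault.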

Organization of memory Mapping Table in a paged system


[Figure: The virtual address is split into a page number and a line number. The memory page table, indexed by page number, holds a presence bit and a block (frame) number. For a page present in main memory, the block number concatenated with the line number is loaded into the main memory address register; a presence bit of 0 signals that the page is not resident.]

Translation Lookaside Buffer

Each virtual memory reference can cause two physical memory accesses: one to fetch the page table entry and one to fetch the data. To overcome this problem, a high-speed cache of page table entries is set up, called a Translation Lookaside Buffer (TLB).


Translation Lookaside Buffer (contd..)

The TLB contains the page table entries that have been used most recently.


[Figure: The virtual address from the processor is split into a virtual page number and an offset. Each TLB entry holds a virtual page number, control bits, and the page frame in memory. The incoming virtual page number is compared with the stored ones: on a hit, the page frame is concatenated with the offset to form the physical address in main memory; on a miss, the page table must be consulted.]

Figure 5.28. Use of an associative-mapped TLB.



Translation Lookaside Buffer (contd..)

Given a virtual address, the processor first examines the TLB. If the page table entry is present (TLB hit), the frame number is retrieved and the real address is formed. If the entry is not found in the TLB (TLB miss), the page number is used to index the process's page table.

Translation Lookaside Buffer (contd..)

On a TLB miss, the system first checks whether the page is already in main memory; if it is not, a page fault is issued. The TLB is then updated to include the new page table entry.
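The sequence above can be sketched as: consult the TLB first, and fall back to the page table only on a miss. The sizes and structure names below are illustrative:

```c
#define TLB_ENTRIES 4   /* assumed TLB size; illustrative */

typedef struct {
    unsigned vpage;     /* virtual page number             */
    unsigned frame;     /* page frame in main memory       */
    int valid;          /* 1 if this entry is in use       */
} TLBEntry;

/* Fully associative lookup: returns 1 on a hit and sets *frame. */
int tlb_lookup(const TLBEntry *tlb, unsigned vpage, unsigned *frame)
{
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpage == vpage) {
            *frame = tlb[i].frame;
            return 1;   /* TLB hit: no page-table access needed */
        }
    }
    return 0;           /* TLB miss: consult the page table */
}
```

In hardware all entries are compared in parallel, like the associative memory described earlier; the loop here only simulates that comparison.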

