Overview
- Memory technology
- Organization of the main memory
- Cache memory
- Associative memory
- Virtual memory mechanism
Memory Types
The maximum size of the memory that can be used in any computer is determined by the addressing scheme.
16-bit addresses: 2^16 = 64K memory locations
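The relationship between address width and memory size can be sketched in Python (the helper name is mine, not from the slides):

```python
# Number of addressable locations for a given address width.
def addressable_locations(address_bits: int) -> int:
    return 2 ** address_bits

print(addressable_locations(16))  # 65536 locations, i.e. 64K
print(addressable_locations(32))  # 4294967296 locations, i.e. 4G
```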
[Figure: a memory of 2^k locations, numbered 0 through 2^k - 1, addressed with k-bit addresses.]
Traditional Architecture
[Figure: connection of the memory to the processor. The processor's MAR drives a k-bit address bus and the MDR an n-bit data bus; control lines (R/W, MFC, etc.) complete the interface. Up to 2^k addressable locations, word length = n bits.]
Basic Concepts
- Block transfer: bulk data transfer.
- Memory access time: the time from Read to MFC.
- Memory cycle time: the time between two memory operations (Reads or Writes).
- RAM: any location can be accessed for a Read or Write operation in some fixed amount of time that is independent of the location's address.
- Cache memory: for reducing memory access time.
- Virtual memory, memory management unit: the physical memory address can differ from the address specified in the program.
Field-Programmable ROM and DRAM Variants
- ROM: Fusible-link PROM, UV-light Erasable PROM (EPROM), Electrically Alterable PROM (EAPROM), Electrically Erasable PROM (EEPROM)
- DRAM: Fast Page Mode DRAM (FPM DRAM), Extended Data Out DRAM (EDO DRAM), Synchronous DRAM (SDRAM), Double Data Rate Synchronous DRAM (DDR SDRAM)
16 words of 8 bits each: a 16x8 memory organization. It has 16 external connections: 4 address, 8 data, 2 control, 2 power/ground.
1K memory cells organized as 128x8: 19 external connections (7 + 8 + 2 + 2).
Organized as 1Kx1: 15 external connections (10 + 1 + 2 + 2).
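The pin-count arithmetic above can be sketched in Python (the helper name is hypothetical; it assumes 2 control and 2 power/ground connections, as in the examples):

```python
import math

# External connections = address lines + data lines + 2 control + 2 power/ground.
def external_connections(words: int, bits_per_word: int) -> int:
    address_lines = int(math.log2(words))
    return address_lines + bits_per_word + 2 + 2

print(external_connections(16, 8))    # 16  (4 + 8 + 2 + 2)
print(external_connections(128, 8))   # 19  (7 + 8 + 2 + 2)
print(external_connections(1024, 1))  # 15  (10 + 1 + 2 + 2)
```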
A Memory Chip
[Figure: a 1Kx1 memory chip. A 32x32 memory cell array; a 5-bit row address feeds a 5-bit decoder selecting word lines W0 through W31; a 5-bit column address controls a 32-to-1 output multiplexer and input demultiplexer; Sense/Write circuitry; R/W and CS control lines; 10-bit address in total.]
Static Memories
The circuits are capable of retaining their state as long as power is applied.
[Figure: a static RAM cell. Two cross-coupled inverters (transistors T3 through T6) form a latch at points X and Y, connected to the bit lines b and b' through access transistors T1 and T2.]
Asynchronous DRAMs
Static RAMs are fast, but they occupy more chip area and are more expensive. Dynamic RAMs (DRAMs) are cheap and area-efficient, but they cannot retain their state indefinitely: they need to be refreshed periodically.
[Figure: a dynamic RAM cell, consisting of a single transistor T and a storage capacitor C.]
[Figure: internal organization of an asynchronous DRAM. Row address bits A20-9 select a row; column address bits A8-0 drive the column decoder; CS, R/W, and CAS control the chip; data appears on lines D7 through D0.]
When the DRAM in the previous slide is accessed, the contents of all 4096 cells in the selected row are sensed, but only 8 bits are placed on the data lines D7-0, as selected by A8-0. Fast page mode makes it possible to access the other bytes in the same row without having to reselect the row: a latch is added at the output of the sense amplifier in each column. This is good for bulk transfers.
Synchronous DRAMs
[Figure: a synchronous DRAM. Cell array with row and column decoders; Clock, RAS, CAS, R/W, and CS inputs; data lines.]
Synchronous DRAMs
- No CAS pulses are needed in burst operation.
- Refresh circuits are included (refresh every 64 ms).
- Clock frequency > 100 MHz (e.g. Intel PC100 and PC133).
The speed and efficiency of data transfers among memory, processor, and disk have a large impact on the performance of a computer system.
- Memory latency: the amount of time it takes to transfer a word of data to or from the memory.
- Memory bandwidth: the number of bits or bytes that can be transferred in one second; it determines how much time is needed to transfer an entire block of data.
Bandwidth is not determined solely by the memory. It is the product of the rate at which data are transferred (and accessed) and the width of the data bus.
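The bandwidth relation can be illustrated with a small sketch (the 133 MHz and 64-bit figures are assumed for illustration, not taken from the slides):

```python
# Bandwidth = transfer rate x data bus width (reported here in bytes per second).
def bandwidth_bytes_per_sec(transfer_rate_hz: float, bus_width_bits: int) -> float:
    return transfer_rate_hz * bus_width_bits / 8

# A hypothetical 133 MHz bus that is 64 bits wide:
print(bandwidth_bytes_per_sec(133e6, 64))  # 1064000000.0 bytes/s, about 1 GB/s
```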
DDR SDRAM
Double-Data-Rate SDRAM Standard SDRAM performs all actions on the rising edge of the clock signal. DDR SDRAM accesses the cell array in the same way, but transfers the data on both edges of the clock. The cell array is organized in two banks. Each can be accessed separately. DDR SDRAMs and standard SDRAMs are most efficiently used in applications where block transfers are prevalent.
[Figure 5.10: organization of a 2M x 32 memory module using 512K x 8 static memory chips. A 19-bit address plus a 2-bit decoder generating the chip selects; four groups of chips drive D31-24, D23-16, D15-8, and D7-0.]
The choice of a RAM chip for a given application depends on several factors: cost, speed, power, and size. SRAMs are faster, more expensive, and smaller; they are used as cache memories. DRAMs are slower, cheaper, and larger; they are used as main memories.
Memory Controller
[Figure: the processor sends the address, R/W, request, and clock to a memory controller, which generates the row/column address, RAS, CAS, R/W, CS, and clock for the memory; data passes directly between processor and memory.]
Read-Only Memories
Read-Only-Memory
Volatile vs. non-volatile memory.
- ROM
- PROM: programmable ROM
- EPROM: erasable, reprogrammable ROM
- EEPROM: can be programmed and erased electrically
[Figure: a ROM cell. A transistor T connects the bit line to ground at point P; the connection is left open to store a 1 and closed to store a 0.]
Flash Memory
Similar to EEPROM. Difference: it is only possible to write an entire block of cells, not a single cell. Low power; used in portable equipment. Used to implement larger storage modules.
[Figure: memory hierarchy. Levels further from the processor (L3, L4, L5) are larger, slower, and cheaper (per byte) storage devices; local disks hold files retrieved from disks on remote network servers.]
Cache Memories
Cache
[Figure: the cache is placed between the processor and main memory.]
Cache Memory
The effectiveness of the cache mechanism is based on a property of computer programs called locality of reference: references to memory in any given time interval tend to be confined within localized areas. Analysis of programs shows that most of their execution time is spent on routines in which instructions are executed repeatedly. These may be loops, nested loops, or a few procedures that call each other. Many instructions in localized areas of a program are executed repeatedly during some time period, while the remainder of the program is accessed infrequently. This property is called locality of reference.
Locality of Reference
Locality of reference is manifested in two ways:
1. Temporal: a recently executed instruction is likely to be executed again very soon. Information to be used in the near future is likely to be in use already (e.g. reuse of information in loops).
2. Spatial: instructions in close proximity to a recently executed instruction are also likely to be executed soon. If a word is accessed, adjacent (nearby) words are likely to be accessed soon (e.g. related data items such as arrays are usually stored together, and instructions are executed sequentially).
If the active segments of a program can be placed in a fast (cache) memory, total execution time can be reduced significantly. The temporal aspect of locality of reference suggests that whenever an information item (instruction or data) is first needed, it should be brought into the cache. The spatial aspect suggests that instead of bringing just one item from main memory into the cache, it is wise to bring several items that reside at adjacent addresses as well (i.e. a block of information).
Principles of cache
The following diagram is used to explain the cache from the access point of view. [Figure: a main memory of 32K x 12.]
When a read request is received from the CPU, the contents of a block of memory words containing the specified location are transferred into the cache. When the program references any of the locations in this block, the contents are read from the cache. The number of blocks in the cache is smaller than the number of blocks in main memory. The correspondence between main memory blocks and those in the cache is specified by a mapping function.
Assume the cache is full and a memory word not in the cache is referenced. Control hardware decides which block to remove from the cache to create space for the new block containing the referenced word. The collection of rules for making this decision is called the replacement algorithm.
The CPU issues Read/Write requests using addresses that refer to locations in main memory. The cache control circuitry determines whether the requested word currently exists in the cache. If it does, the Read/Write operation is performed on the appropriate location in the cache (a Read/Write hit).
In a read operation, main memory is not involved. In a write operation, two things can happen:
1. The cache and main memory locations are updated simultaneously (write-through), OR
2. Only the cache location is updated and marked with a dirty or modified bit; the main memory location is updated when the cache block is removed (write-back or copy-back).
Read Operation
When the addressed word is not in the cache, a read miss occurs. There are two ways to deal with this:
1. The entire block containing the requested word is copied from main memory to the cache, and the requested word is forwarded to the CPU from the cache (load-through), OR
2. The requested word is sent from memory to the CPU first, and the cache is then updated (early restart).
Write Operation
If the addressed word is not in the cache, a write miss occurs. If the write-through protocol is used, the information is written directly into main memory. In the write-back protocol, the block containing the word is first brought into the cache, and the desired word is then overwritten.
Mapping Functions
The correspondence between main memory blocks and those in the cache is specified by a memory mapping function. There are three techniques in memory mapping: direct, associative, and set-associative.
Direct Mapping
Example: consider a cache consisting of 128 blocks of 16 words each (128 x 16 = 2048 = 2K words). Main memory is addressed by a 16-bit address (main memory has 64K words, which is 4K (4096) blocks of 16 words each).
[Figure: direct-mapped cache. The 16-bit main memory address is split into a 5-bit tag, a 7-bit block field, and a 4-bit word field; each of the cache blocks 0 through 127 stores a tag.]
Block j of main memory maps onto block (j modulo 128) of the cache.
- Word field (4 bits): selects one of the 16 = 2^4 words in a block.
- Block field (7 bits): points to a particular block in the cache (128 = 2^7).
- Tag field (5 bits): compared with the tag bits stored with that cache location, to identify which of the 32 (4096/128) memory blocks that map there is currently resident.
Direct Mapping
Main memory address: Tag (5) | Block (7) | Word (4)
11101 1111111 1100
Tag: 11101. Block: 1111111 = 127, the 127th block of the cache. Word: 1100 = 12, the 12th word of the 127th block in the cache.
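The field extraction for this direct-mapped example can be sketched in Python (the function name is mine; the 5/7/4 split is from the slides):

```python
# Split a 16-bit address into its 5-bit tag, 7-bit block, 4-bit word fields.
def split_direct(address: int):
    word  = address & 0xF           # low 4 bits
    block = (address >> 4) & 0x7F   # next 7 bits
    tag   = (address >> 11) & 0x1F  # top 5 bits
    return tag, block, word

addr = 0b1110111111111100  # the worked example: 11101 1111111 1100
print(split_direct(addr))  # (29, 127, 12): tag 11101, block 127, word 12
```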
Associative Mapping
[Figure: associative-mapped cache. A main memory block (0 through 4095) can be placed in any of the cache blocks 0 through 127, each stored with a 12-bit tag.]
- Word field (4 bits): selects one of the 16 = 2^4 words in a block.
- Tag field (12 bits): identifies which of the 4096 = 2^12 memory blocks is resident in a cache block.
Main memory address: Tag (12) | Word (4)
Associative Mapping
Main memory address: Tag (12) | Word (4)
111011111111, 1100
Tag: 111011111111. Word: 1100 = 12, the 12th word of the matched block.
Set-Associative Mapping
[Figure: set-associative-mapped cache with two blocks per set. Cache blocks 0 through 127 are grouped into sets (Set 0: blocks 0 and 1, and so on), each block stored with a 6-bit tag; main memory blocks 0 through 4095 map onto the sets.]
- Set field (6 bits): points to a particular set in the cache (128/2 = 64 = 2^6).
- Tag field (6 bits): checked to see whether the desired block is present (4096/64 = 2^6).
Set-Associative Mapping
Main memory address: Tag (6) | Set (6) | Word (4)
111011 111111 1100
Tag: 111011. Set: 111111 = 63, the 63rd set of the cache. Word: 1100 = 12, the 12th word of the matched block in the 63rd set.
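The corresponding split for this two-way set-associative example can be sketched similarly (the function name is mine; the 6/6/4 split is from the slides):

```python
# Split a 16-bit address into its 6-bit tag, 6-bit set, 4-bit word fields.
def split_set_assoc(address: int):
    word      = address & 0xF           # low 4 bits
    set_index = (address >> 4) & 0x3F   # next 6 bits
    tag       = (address >> 10) & 0x3F  # top 6 bits
    return tag, set_index, word

addr = 0b1110111111111100  # the worked example: 111011 111111 1100
print(split_set_assoc(addr))  # (59, 63, 12): tag 111011, set 63, word 12
```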
Replacement Policies
When the cache is full and new data must be brought into the cache, a decision must be made as to which data to remove. The guideline for making this decision is called the replacement policy. The replacement policy depends on the mapping: with direct mapping there is no choice to make, since each memory block can be placed in only one cache block.
A simple procedure is to replace blocks of the cache in round-robin order whenever a new word is requested from memory; this constitutes a first-in first-out (FIFO) replacement policy. Common policies:
- Random replacement.
- First-in first-out (FIFO): the item chosen is the one that has been in the set longest.
- Least recently used (LRU): the item chosen is the one that has been least recently used by the CPU.
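An LRU policy can be sketched with an ordered dictionary of resident block tags (a minimal software model, not the hardware mechanism the slides describe):

```python
from collections import OrderedDict

class LRUCache:
    """Tracks which block tags are resident, evicting the least recently used."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, tag) -> bool:
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # hit: mark as most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # full: evict the least recently used
        self.blocks[tag] = True
        return False

cache = LRUCache(2)
print(cache.access('A'))  # False (miss)
print(cache.access('B'))  # False (miss)
print(cache.access('A'))  # True  (hit; A becomes most recently used)
print(cache.access('C'))  # False (miss; evicts B, the least recently used)
print(cache.access('B'))  # False (miss; B was evicted)
```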
Associative Memory
Many data-processing applications require searching for items in a table stored in memory. Example: an account number may be searched in a file to determine the holder's name and account status. Usually, all items are stored in memory where they can be addressed in sequence. A search procedure then chooses a sequence of addresses, reads the content of memory at each address, and compares the information read with the item being searched for, until a match occurs.
The number of accesses to memory depends on the location of the item and the efficiency of the search algorithm. The time required to find an item stored in memory can be reduced considerably if stored data can be identified for access by the content of the data itself rather than by an address. A memory unit accessed by content is called an associative memory or content-addressable memory (CAM).
This type of memory is accessed simultaneously and in parallel on the basis of data content rather than by a specific address or location. When a word is written into an associative memory, no address is given; the memory is capable of finding an empty location to store the word. When a word is to be read from an associative memory, the content of the word, or part of it, is specified; the memory locates all words that match the specified content and marks them for reading.
Associative memory is uniquely suited to parallel searches by data association. It is more expensive than a random-access memory because each cell must have storage capability as well as logic circuitry for matching its content against an external argument. Associative memories are used in applications where the search time is very critical and must be very short.
Hardware Organization
[Figure: hardware organization of an associative memory. An n-bit argument register (A) and n-bit key register (K) feed an associative memory array and logic of m words, n bits per word; Read and Write control inputs; an m-bit match register (M); input and output data lines.]
Operation
The unit consists of a memory array and logic for m words with n bits per word. The argument register A and key register K each have n bits, one for each bit of a word. The match register M has m bits, one for each memory word. Each word in memory is compared in parallel with the content of the argument register; the words that match the bits of the argument register set the corresponding bits in the match register.
Operation (contd..)
After the matching process, the bits set in the match register indicate that their corresponding words have been matched. Reading is done by sequential access to memory for those words whose corresponding match-register bits have been set. The key register provides a mask for choosing a particular field or key in the argument word: a bit is set to 0 for masking, otherwise set to 1.
Operation (contd..)
Example of matching:
A:      101 111100
K:      111 000000  (only the leftmost three argument bits are unmasked)
Word 1: 100 111100  (no match)
Word 2: 101 000001  (match)
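The masked comparison in this example can be sketched with integer bit operations (the helper name is mine):

```python
# A word matches when its unmasked bits (where K is 1) equal the argument's.
def matches(argument: int, key: int, word: int) -> bool:
    return (argument & key) == (word & key)

A = 0b101111100
K = 0b111000000  # only the leftmost three bits participate in the comparison
print(matches(A, K, 0b100111100))  # False: word 1, unmasked bits 100 != 101
print(matches(A, K, 0b101000001))  # True:  word 2, unmasked bits 101 == 101
```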
[Figure: associative memory array. Key bits K1 ... Kj ... Kn mask the columns (Bit 1 ... Bit j ... Bit n); cell Cij holds bit j of word i (words 1 ... m); each word i produces a match-register bit Mi.]
Operation
The cells in the array are denoted by the letter C with two subscripts: the first gives the word number, and the second specifies the bit position within the word. Example: Cij is the cell for bit j in word i.
Bit Aj of the argument register is compared with all the bits in column j of the array, provided that Kj = 1. This is done for all columns j = 1, 2, ..., n. If a match occurs between all unmasked bits of the argument and the bits of word i, the corresponding bit Mi in the match register is set to 1. If one or more unmasked bits of the argument and the word do not match, Mi is cleared to 0.
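The per-bit match rule can be sketched over bit lists (a software model of the parallel hardware comparison; the names are mine):

```python
# Bit j of word i matches when Kj = 0 (masked out) or Aj equals the stored bit.
# Mi is 1 only if every bit position matches.
def match_word(A_bits, K_bits, F_bits) -> bool:
    return all(k == 0 or a == f for a, k, f in zip(A_bits, K_bits, F_bits))

A = [1, 0, 1, 1, 1, 1, 1, 0, 0]
K = [1, 1, 1, 0, 0, 0, 0, 0, 0]  # only the first three bits are unmasked
print(match_word(A, K, [1, 0, 0, 1, 1, 1, 1, 0, 0]))  # False (word 1)
print(match_word(A, K, [1, 0, 1, 0, 0, 0, 0, 0, 1]))  # True  (word 2)
```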
[Figure: one cell of the associative memory. Input and Write lines; Aj and Kj from the argument and key registers; an S-R flip-flop Fij; a Read line and Output; match logic driving Mi.]
Each cell consists of a flip-flop storage element Fij and the circuits for reading, writing, and matching. The input bit is transferred into the storage cell during a write operation, and the stored bit is read out during a read operation. The match logic compares the content of the storage cell with the corresponding unmasked bit of the argument and provides an output for the decision logic that sets the bit in Mi.
MATCH LOGIC
[Figure: match logic for word i. For each bit j = 1 ... n, Aj is compared with the stored bit Fij (using Fij and its complement F'ij) under control of Kj; the per-bit results are combined to produce Mi.]
Read Operation
If more than one word in memory matches the unmasked argument field, all matched words will have 1s in the corresponding bit positions of the match register. In a read operation, all matched words are read in sequence by applying a read signal to each word line whose corresponding Mi bit is 1. In applications where no two identical items are stored in the memory, only one word can match, in which case the Mi output can be used directly as the read signal for the corresponding word.
Write Operation
Two forms of writing are possible:
1. The entire memory may be loaded with new information.
2. Unwanted words are deleted and new words inserted.
For loading the entire memory, writing can be done by addressing each location in sequence. This makes the device a random-access memory for writing and a content-addressable memory for reading. The number of address lines needed for decoding is d, where m = 2^d and m is the number of words.
For deletion and insertion, a tag register is used that has as many bits as there are words in memory. For every active (valid) word in memory, the corresponding bit in the tag register is set to 1. When a word is deleted, the corresponding tag bit is reset to 0. A word is stored in the memory by scanning the tag register until the first 0 bit is encountered; after storing the word, that bit is set to 1.
Virtual Memory
Virtual Memory
In the early days, memory was expensive and hence small. Programmers used secondary storage for overlaying: they were responsible for breaking programs into overlays, deciding where to keep them in secondary memory, and arranging the transfers of overlays between main memory and secondary memory. In 1961, Manchester University proposed a method for performing the overlay process automatically, which gave rise to the concept of virtual memory used today.
Virtual memory separates the concept of address space from that of memory locations. Programs reference instructions and data independently of the available physical memory. Addresses issued by the processor for instructions or data are called virtual or logical addresses. Virtual addresses are translated into physical addresses by a combination of hardware and software components.
Types of Memory
- Real memory: main memory.
- Virtual memory: memory on disk; allows effective multiprogramming and relieves the user of the tight constraints of main memory.
Overview
[Figure 5.26: virtual memory organization. The processor issues a virtual address to the MMU; the resulting physical address accesses the cache and main memory; data moves between main memory and disk storage by DMA transfer.]
The address used by a programmer is called a virtual address, and the set of such addresses is called the address space. An address in main memory is called a location or physical address, and the set of such locations is called the memory space. In computers with virtual memory, the address space is allowed to be larger than the memory space.
[Figure: the address space of virtual addresses is mapped onto the smaller memory space of physical locations.]
Address Translation
All programs and data are composed of fixed-length units called pages, each of which consists of a block of words that occupy contiguous locations in main memory. A page should be neither too small nor too large. The virtual memory mechanism bridges the size and speed gaps between main memory and secondary storage, in a way similar to a cache.
The CPU references instructions and data with virtual addresses, but this information must be taken from physical memory for speedy access. A table is needed to map each virtual address to a physical address (a dynamic operation). This table may be kept in main memory.
Paging
Each process has its own page table. Each page table entry contains the frame number of the corresponding page in main memory. One bit indicates whether the page is in main memory or not. Another bit indicates whether the page has been modified, which helps in deciding whether it must be copied back when replaced.
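Page-table translation can be sketched as follows (the 10-bit offset and the dictionary-based table are assumed for illustration; the entry fields mirror the present bit and frame number described above):

```python
OFFSET_BITS = 10  # assumed page size of 2^10 = 1K words

def translate(virtual_addr: int, page_table: dict) -> int:
    page_number = virtual_addr >> OFFSET_BITS
    offset = virtual_addr & ((1 << OFFSET_BITS) - 1)
    entry = page_table.get(page_number)
    if entry is None or not entry['present']:
        raise LookupError('page fault: page %d not in main memory' % page_number)
    # Physical address = frame number concatenated with the offset.
    return (entry['frame'] << OFFSET_BITS) | offset

page_table = {0: {'present': True, 'frame': 5},
              1: {'present': False, 'frame': 0}}
print(hex(translate(0x00AB, page_table)))  # 0x14ab: frame 5, offset 0xAB
```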
Paging
[Figure: a virtual address is divided into a page number and an offset.]
Address Translation
[Figure: address translation. The page table base register plus the virtual page number gives the address of the page table entry; the entry supplies the page frame, which is combined with the offset to form the physical address.]
Each virtual memory reference can cause two physical memory accesses: one to fetch the page table entry and one to fetch the data. To overcome this problem, a high-speed cache is set up for page table entries.
Contains page table entries that have been most recently used
TLB
[Figure: TLB operation. The virtual page number is compared with the TLB entries, each holding a virtual page number, control bits, and a page frame; on a hit the page frame is combined with the offset, and on a miss the page table must be consulted.]
Given a virtual address, the processor examines the TLB. If the page table entry is present (a TLB hit), the frame number is retrieved and the real address is formed. If the entry is not found in the TLB (a TLB miss), the page number is used to index the process's page table.
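The TLB-then-page-table lookup just described can be sketched as follows (dictionaries stand in for the hardware structures):

```python
# On a hit the frame comes from the TLB; on a miss the page table is consulted
# and the entry is cached in the TLB for next time.
def lookup(vpn: int, tlb: dict, page_table: dict):
    if vpn in tlb:
        return tlb[vpn], 'hit'
    frame = page_table[vpn]
    tlb[vpn] = frame
    return frame, 'miss'

tlb = {}
page_table = {3: 7}
print(lookup(3, tlb, page_table))  # (7, 'miss'): first access walks the page table
print(lookup(3, tlb, page_table))  # (7, 'hit'):  the entry is now cached in the TLB
```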