Memory Hierarchy

• The memory system is organized in several levels, using progressively faster technologies as we move towards the processor.
• The level closest to the processor is the fastest; levels farther from the processor are slower.
• The entire addressable memory space is available in the largest but slowest level, i.e., the magnetic disk.
• The maximum size of memory that a computer can use is determined by its addressing scheme. For example, a computer with 16-bit addresses is capable of addressing up to 2^16 = 64K memory locations.

The memory unit that communicates directly with the CPU is called the main memory. Only programs and data currently needed by the processor reside in main memory. Devices that provide backup storage are called auxiliary memory or secondary storage (magnetic disks/hard drives and SSDs).

A very high-speed memory called cache is employed between the CPU and main memory to compensate for the mismatch in their operating speeds. Cache memory increases the speed of processing by making current programs and data available to the CPU at a rapid rate; its access time is close to the processor's logic clock-cycle time. It is used for storing segments of programs currently being executed in the CPU and temporary data that is frequently needed.
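A quick sketch of this addressing arithmetic in Python (the helper name is illustrative, not part of the notes):

    def addressable_locations(address_bits: int) -> int:
        """Number of distinct memory locations for a given address width."""
        return 2 ** address_bits

    # A 16-bit address reaches 2^16 = 65,536 locations, i.e., 64K.
    assert addressable_locations(16) == 64 * 1024
    print(addressable_locations(16))  # 65536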
Cache Memory

Cache memory is a high-speed memory that is small in size but faster than the main memory (RAM). The CPU can access it more quickly than main memory, and it can be accessed only by the CPU. It holds the data and instructions that are frequently used by the CPU, so that they are instantly available whenever the CPU needs them. In other words, if the CPU finds the required data or instructions in the cache, it does not need to access primary memory (RAM). Thus, by acting as a buffer between the RAM and the CPU, the cache speeds up system performance.

Types of Cache Memory

L1: The first level of cache memory, called Level 1 cache or L1 cache, is a small amount of memory present inside the CPU itself. If a CPU has four cores (a quad-core CPU), then each core has its own L1 cache. The size of L1 ranges from 2 KB to 64 KB. The L1 cache is divided into two parts: an instruction cache, which stores the instructions required by the CPU, and a data cache, which stores the data required by the CPU.
L2: Known as Level 2 cache or L2 cache, this may be located inside or outside the CPU. The cores of a CPU may each have their own separate L2 cache, or they may share one L2 cache among themselves. If it is outside the CPU, it is connected to the CPU by a very high-speed bus. The size of L2 ranges from 256 KB to 512 KB. It is slower than L1.
L3: Known as Level 3 cache or L3 cache, this cache is not present in all processors; some high-end processors have it. It is used to enhance the performance of the L1 and L2 caches. It is located outside the CPU and is shared by all the cores of the CPU. The size of L3 ranges from 1 MB to 8 MB. Although it is slower than the L1 and L2 caches, it is faster than RAM.

Performance of Cache Memory

The basic operation of the cache is as follows: when the CPU needs to access memory, the cache is examined first. If the word is found in the cache, it is read from the cache; otherwise, it is read from main memory. The performance of cache memory is frequently measured in terms of a quantity called the hit ratio. When the CPU refers to memory and finds the word in the cache, it is said to produce a hit; if the word is not found in the cache, it is in main memory and the reference counts as a miss. The hit ratio is the number of hits divided by the total number of CPU references to memory (hits + misses). If the hit ratio is high enough, the average memory access time of the computer system is substantially improved.
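A minimal sketch of these two measures, under a simple model in which a hit costs the cache access time and a miss costs the main-memory access time; the timing figures (20 ns cache, 100 ns main memory) are illustrative, not from the notes:

    def hit_ratio(hits: int, misses: int) -> float:
        """Fraction of memory references satisfied by the cache."""
        return hits / (hits + misses)

    def avg_access_time(h: float, t_cache: float, t_main: float) -> float:
        """Average memory access time: hits served from cache, misses from main memory."""
        return h * t_cache + (1 - h) * t_main

    h = hit_ratio(hits=900, misses=100)                 # 0.9
    print(avg_access_time(h, t_cache=20, t_main=100))   # 28.0 ns: close to cache speed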
Mapping Techniques of Cache Memory

The transformation of data from main memory to cache memory is referred to as a mapping process. There are three types of mapping procedures:
1. Associative mapping
2. Direct mapping
3. Set-associative mapping

NOTE: For every word stored in the cache, there is a duplicate copy in main memory. The CPU communicates with both memories. It first sends a 15-bit address to the cache (the example configuration here is a main memory of 2^15 = 32K words, each 12 bits wide). If there is a hit, the CPU accepts the 12-bit data from the cache. If there is a miss, the CPU reads the word from main memory, and the word is then transferred to the cache.

1. Associative Mapping: The associative memory stores both the address and the content (data) of the memory word. This permits any location in the cache to store any word from main memory.

2. Direct Mapping: The direct-mapping cache organization uses the n-bit address to access main memory and the k-bit index to access the cache. Each word in the cache consists of the data word and its associated tag. When a new word is first brought into the cache, the tag bits are stored alongside the data bits. When the CPU generates a memory request, the index field of the address is used to access the cache, and the tag field of the CPU address is compared with the tag stored in the cache. If the two tags match, there is a hit and the desired data word is in the cache. If there is no match, there is a miss, and the required word is read from main memory; it is then stored in the cache together with the new tag, replacing the previous value. Disadvantage: if two or more words that have the same index but different tags are accessed repeatedly, the hit ratio may drop, because each access evicts the other word.

3. Set-Associative Mapping: Set-associative mapping is an improvement over direct mapping in which each word of the cache can store two or more words of memory under the same index address. Each data word is stored together with its tag, and the number of tag-data items in one word of cache is said to form a set. When the CPU generates a memory request, the index value of the address is used to access the cache. The tag field is then compared with all the tags in the set to determine whether a match occurs. The comparison logic is done by an associative search of the tags in the set, hence the name "set-associative". The hit ratio improves as the set size increases, because more words with the same index but different tags can reside in the cache.

When a miss occurs and the set is full, it is necessary to replace one of the tag-data items with a new value. The most common replacement algorithms are Random Replacement, FIFO, and LRU (a sketch follows this list):
• Random Replacement: the control chooses one tag-data item for replacement at random.
• FIFO: selects for replacement the item that has been in the set the longest.
• LRU: selects for replacement the item that has been least recently used by the CPU.
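A minimal sketch of a set-associative cache with LRU replacement, assuming a word-addressed toy memory with a block size of one word; the class and parameter names are illustrative, not from the notes. Direct mapping is the special case ways=1:

    from collections import OrderedDict

    class SetAssociativeCache:
        """Toy set-associative cache: the index selects a set, tags are searched associatively."""
        def __init__(self, num_sets: int, ways: int):
            self.num_sets = num_sets
            self.ways = ways
            # One OrderedDict per set, mapping tag -> data; insertion order tracks recency for LRU.
            self.sets = [OrderedDict() for _ in range(num_sets)]
            self.hits = self.misses = 0

        def access(self, address: int, memory: dict):
            index = address % self.num_sets        # index field selects the set
            tag = address // self.num_sets         # tag field identifies the word within the set
            s = self.sets[index]
            if tag in s:                           # associative search within the set
                self.hits += 1
                s.move_to_end(tag)                 # mark as most recently used
                return s[tag]
            self.misses += 1
            if len(s) == self.ways:                # set full: evict the least recently used item
                s.popitem(last=False)
            s[tag] = memory[address]               # bring the word in from main memory
            return s[tag]

    memory = {a: a * 10 for a in range(64)}        # toy main memory
    cache = SetAssociativeCache(num_sets=4, ways=2)
    for a in [0, 4, 0, 8, 4, 12, 0]:               # addresses 0, 4, 8, 12 all share index 0
        cache.access(a, memory)
    print(cache.hits, cache.misses)                # 1 hit, 6 misses: same-index addresses compete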
Swapping

Swapping is a mechanism in which a process is temporarily swapped out of main memory to secondary storage (the backing store), making that memory available to other processes, which are swapped in. A process needs to be in memory to be executed; it can be swapped temporarily out of memory to the backing store and then brought back into memory for continued execution.

In a multiprogramming environment with a round-robin CPU-scheduling algorithm, when a quantum expires, the memory manager swaps out the process that just finished and swaps in another process waiting for the memory space. In a priority-based CPU-scheduling algorithm, if a higher-priority process arrives and wants service, the memory manager swaps out a lower-priority process so that it can load and execute the higher-priority process. When the higher-priority process finishes, the lower-priority process can be swapped back in and continued. This variant of swapping is sometimes called roll out, roll in.

Swapping requires a backing store. The system maintains a ready queue consisting of all processes whose memory images are on the backing store and are ready to run. Whenever the CPU scheduler decides to execute a process, it calls the dispatcher. The dispatcher checks whether the next process in the queue is in memory; if it is not, and there is no free memory, the dispatcher swaps out a process currently in memory and swaps in the desired process. The context-switch time in such a swapping system is high. For example, assume a user process of size 1 MB and a backing store with a transfer rate of 5 MB per second. The actual transfer of the 1 MB process to or from memory takes 1 MB / (5 MB per sec) = 1/5 sec = 200 milliseconds.
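A minimal sketch of that transfer-time arithmetic (the function name is illustrative):

    def swap_transfer_ms(process_mb: float, rate_mb_per_s: float) -> float:
        """Time to move a process image to or from the backing store, in milliseconds."""
        return process_mb / rate_mb_per_s * 1000

    print(swap_transfer_ms(1, 5))   # 200.0 ms, as in the example above
    # A full swap (one process out, another in) takes roughly twice this, plus latency.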
Contiguous Memory Allocation

The main memory must accommodate both the OS and the various user processes. The memory is usually divided into two partitions: one for the resident operating system, and one for the user processes. We may place the OS in either low or high memory. In contiguous memory allocation, each process is contained in a single contiguous section of memory.

Memory Mapping and Protection:
• Memory mapping is provided by using a relocation register together with a limit register.
• The relocation register contains the value of the smallest physical address; the limit register contains the range of logical addresses. Each logical address must be less than the value in the limit register.
• The MMU maps the logical address dynamically by adding the value in the relocation register. This mapped address is sent to memory.
When the CPU scheduler selects a process for execution, the dispatcher loads the relocation and limit registers with the correct values. Because every address generated by the CPU is checked against these registers, both the OS and the other users' programs and data are protected from being modified by the running process.
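A minimal sketch of the relocation/limit check performed by the MMU on every CPU-generated address (the function name and the sample register values are illustrative):

    def mmu_translate(logical: int, relocation: int, limit: int) -> int:
        """Map a logical address to a physical address, trapping on a violation."""
        if logical >= limit:            # protection check against the limit register
            raise MemoryError("addressing error: trap to OS")
        return logical + relocation     # dynamic relocation by the relocation register

    print(mmu_translate(346, relocation=14000, limit=12000))   # 14346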
Fragmentation

As processes are loaded and removed from memory, the free memory space is broken into little pieces. Over time it can happen that processes cannot be allocated to the available memory blocks because the blocks are too small, and those blocks remain unused. This problem is known as fragmentation.

Fragmentation is of two types:
• External fragmentation: the total memory space is enough to satisfy a request, but the available space is not contiguous, i.e., storage is fragmented into a number of small holes, so it cannot be used.
• Internal fragmentation: the memory block allocated to a process is slightly bigger than requested, so some portion of the block is left unused. Since it cannot be used by another process, this size difference is called internal fragmentation.

One solution to external fragmentation is compaction.

Compaction: shuffle the memory contents so as to place all free memory together in one large block. It is possible only if relocation is dynamic and is done at execution time. The compaction algorithm moves all processes toward one end of memory; all holes move in the other direction, producing one large hole of available memory. This scheme is expensive.

[Diagram: how fragmentation wastes memory, and how compaction creates one large free block from fragmented memory.]

Another possible solution to external fragmentation is to permit the logical address space of a process to be non-contiguous, thus allowing a process to be allocated physical memory wherever it is available. Two complementary techniques that achieve this are paging and segmentation; these techniques can also be combined for better results.

Paging

Paging is a memory-management scheme that permits the physical address space of a process to be non-contiguous. Because of its advantages, it is used in most operating systems and is handled by hardware. Paging avoids the problem of fitting varying-sized memory chunks onto the backing store. In paging, the process address space is broken into blocks of equal size called pages. Similarly, main (physical) memory is divided into small fixed-size blocks called frames; the size of a frame is kept the same as that of a page, to achieve optimum utilization of main memory and to avoid external fragmentation.

Basic Method: Paging involves breaking physical memory into fixed-size blocks called frames and breaking logical memory into blocks of the same size called pages. To run a program of size n pages, we need to find n free frames and load the program into them. A page table is set up to translate logical addresses to physical addresses.
• The page number is used as an index into the page table. The page table contains the base address (f) of each page in physical memory.
• This base address (f) is combined with the page offset to define the physical memory address that is sent to the memory unit.

A logical address generated by the CPU is divided into:
• Page number (p) – used as an index into the page table, which contains the base address of each page in physical memory.
• Page offset (d) – combined with the base address to define the physical memory address.

The page size (and hence the frame size) is defined by the hardware. The size of a page is typically a power of 2, varying between 512 bytes and 16 MB per page. If the size of the logical address space is 2^m, and the page size is 2^n bytes, then the high-order m − n bits of a logical address designate the page number, and the n low-order bits designate the page offset. Thus, the logical address has the form:

    | page number p (m − n bits) | page offset d (n bits) |
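A minimal sketch of this split and the resulting translation, assuming a page size of 2^n bytes; page_table is a hypothetical list mapping page numbers to frame numbers (pages 0 and 1 match the worked example below, the entries for pages 2 and 3 are illustrative):

    def translate(logical: int, n: int, page_table: list[int]) -> int:
        """Split a logical address into (p, d) and translate p through the page table."""
        p = logical >> n                 # high-order m - n bits: page number
        d = logical & ((1 << n) - 1)     # low-order n bits: page offset
        f = page_table[p]                # frame holding this page
        return (f << n) | d              # physical address = frame base + offset

    # 4-byte pages (n = 2); page 0 -> frame 5, page 1 -> frame 6
    page_table = [5, 6, 1, 2]
    print(translate(0, 2, page_table))   # 20
    print(translate(4, 2, page_table))   # 24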
Advantages and Disadvantages of Paging

Paging avoids external fragmentation, but it still suffers from internal fragmentation (on average, half of the last page of each process is wasted). Because pages and frames are of equal size, swapping becomes very easy. However, the page table requires extra memory space, which is a drawback for a system with a small RAM.

Paging Example

For example, using a page size of 4 bytes and a physical memory of 32 bytes (8 frames), let us see how the user's view of memory can be mapped into physical memory. Logical address 0 is page 0, offset 0. Indexing into the page table, we find that page 0 is in frame 5. Thus, logical address 0 maps to physical address 20: (frame × page size) + offset = (5 × 4) + 0.
Similarly, logical address 3 (page 0, offset 3) maps to physical address 23 = (5 × 4) + 3. Logical address 4 is page 1, offset 0; according to the page table, page 1 is mapped to frame 6. Thus, logical address 4 maps to physical address 24 = (6 × 4) + 0.

Hardware Support (Paging Hardware with TLB)

The use of registers to hold the page table is feasible only when the page table is small; most computers need large page tables, so this approach is not feasible for them. Instead, we can use a page-table base register (PTBR), which points to the page table in memory. Changing page tables then requires changing only this one register, substantially reducing context-switch time. The problem with this approach is that two memory accesses are needed to access one byte (one for the page-table entry, one for the byte itself), so memory access is slowed.

The solution to this problem is the Translation Look-aside Buffer (TLB), a special, small, fast-lookup hardware cache. Each entry in the TLB consists of two parts: a key (the page number) and a value (the frame number). When the associative memory is presented with an item, the item is compared with all keys simultaneously; if it is found, the corresponding value field is returned. The number of entries in a TLB is small, typically between 64 and 1,024. The search is fast, but the hardware is expensive.

The TLB contains only a few of the page-table entries. When a logical address is generated by the CPU, its page number is presented to the TLB. If the page number is found (known as a TLB hit), its frame number is immediately available and is used to access memory. If the page number is not found in the TLB (known as a TLB miss), a memory reference to the page table must be made to obtain the frame number, which is then used to access memory.
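A minimal sketch of the TLB lookup sitting in front of the page table, reusing the hypothetical page table from the example above; the TLB here is a plain dict with no capacity limit or eviction policy, which a real TLB would need:

    tlb: dict[int, int] = {}                  # page number -> frame number
    page_table = [5, 6, 1, 2]                 # full page table, held in memory
    tlb_hits = tlb_misses = 0

    def lookup_frame(p: int) -> int:
        """Return the frame for page p, consulting the TLB before the page table."""
        global tlb_hits, tlb_misses
        if p in tlb:                          # TLB hit: no page-table memory reference
            tlb_hits += 1
            return tlb[p]
        tlb_misses += 1                       # TLB miss: read the page table in memory
        tlb[p] = page_table[p]                # cache the entry for next time
        return tlb[p]

    for p in [0, 1, 0, 0, 1]:
        lookup_frame(p)
    print(tlb_hits, tlb_misses)               # 3 2: only the first reference to each page misses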