
* Interrupts, USB, and SATA are not covered here

1. CACHE MEMORY

● Cache Memory is a special very high-speed memory.


● It is used to speed up memory access and keep pace with the high-speed CPU.
● Cache memory is costlier than main memory or disk memory but more
economical than CPU registers.
● Cache memory is an extremely fast memory type that acts as a buffer
between RAM and the CPU.
● It holds frequently requested data and instructions so that they are
immediately available to the CPU when needed.

Cache Mapping
(Figure: mapping formula and a simple cache representation; example access sequence: 22, 26, 22, 26, 16, 3, 16, 18, 16)
Key points of Direct Mapping
❖ In direct mapping, each memory block is assigned to one specific line in the
cache (typically line = block number mod number of cache lines).
❖ If that line is already occupied when a new block needs to be loaded, the
old block is discarded.
❖ An address is split into two parts, an index field and a tag field. The index
selects the cache line, and the tag is stored in the cache alongside the data
so that the resident block can be identified.
❖ Direct mapping's performance is directly proportional to the hit ratio

Need of Tag bits and Valid bits


1. The tags contain the address information required to identify whether a
word in the cache corresponds to the requested word.
2. The valid bit indicates whether an entry contains a valid address. If the
bit is not set, there cannot be a match for this block.
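The tag/valid-bit check above can be sketched as a minimal simulation, replayed on the example access sequence from the notes (the 8-line, one-word-per-block cache size is an assumption chosen for illustration):

```python
# Minimal direct-mapped cache simulation with valid and tag bits.
# Assumes an 8-line cache with one word per block (illustrative sizes).
NUM_LINES = 8

valid = [False] * NUM_LINES   # valid bit per line
tags = [None] * NUM_LINES     # tag stored per line

def access(addr):
    """Return True on a cache hit, False on a miss (and fill the line)."""
    line = addr % NUM_LINES   # index field selects the line
    tag = addr // NUM_LINES   # tag field identifies the block
    if valid[line] and tags[line] == tag:
        return True                          # hit: valid entry, matching tag
    valid[line], tags[line] = True, tag      # miss: load block, old one discarded
    return False

sequence = [22, 26, 22, 26, 16, 3, 16, 18, 16]
hits = sum(access(a) for a in sequence)
print(hits, len(sequence) - hits)  # 4 hits, 5 misses
```

Note the access to 18 misses even though the cache is not full: 18 and 26 map to the same line, which is the conflict behaviour direct mapping is known for.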

Handling Writes
1. Write-through - The simplest way to keep the main memory and the
cache consistent is always to write the data into both the memory and the
cache.
2. Write-back - When a write occurs, the new value is written only to the
block in the cache. The modified block is written to the lower level of the
hierarchy only when it is replaced. Write-back schemes can improve
performance.
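The two write policies can be contrasted with a toy model (a sketch only: a single-block cache, a dict standing in for main memory, and a dirty bit driving the write-back; all names are illustrative):

```python
# Toy contrast of write-through vs write-back for a single cache block.
memory = {0x10: 5}                      # stand-in for main memory
cache = {"addr": 0x10, "data": 5, "dirty": False}

def write_through(addr, value):
    cache["addr"], cache["data"] = addr, value
    memory[addr] = value                # write goes to memory immediately

def write_back(addr, value):
    if cache["dirty"] and cache["addr"] != addr:
        memory[cache["addr"]] = cache["data"]   # flush dirty block on replace
    cache["addr"], cache["data"], cache["dirty"] = addr, value, True

write_through(0x10, 7)
assert memory[0x10] == 7    # memory updated on every write

write_back(0x10, 9)
assert memory[0x10] == 7    # memory NOT yet updated; block is dirty
write_back(0x20, 1)         # replacement flushes the dirty block
assert memory[0x10] == 9
```

The write-back variant defers the memory write until replacement, which is exactly where its performance advantage comes from.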

How does cache memory work with CPU?


When the CPU needs data, it first looks inside the L1 cache. If it does not find
it in L1, it looks inside the L2 cache. If it again does not find the data in the L2
cache, it looks into the L3 cache. If the data is found in cache memory, it is
known as a cache hit. On the contrary, if the data is not found inside the cache,
it is called a cache miss.

If the data is not available in any of the cache levels, the CPU looks inside the
Random Access Memory (RAM). If RAM also does not have the data, it will get
that data from the Hard Disk Drive.
CACHE PERFORMANCE
❏ The performance of the cache is measured in terms of the hit ratio.
❏ The CPU searches for the data in the cache whenever it needs to read or write
any data from the main memory. Two cases may occur:
● If the CPU finds that data in the cache, a cache hit occurs and it reads
the data from the cache.
● On the other hand, if it does not find that data in the cache, a cache
miss occurs. Furthermore, during cache miss, the cache allows the
entry of data and then reads data from the main memory.
● Therefore, we can define the hit ratio as the number of hits divided by
the sum of hits and misses.

Main Formulas:

1. Memory-stall clock cycles - cycles spent waiting for memory (cache-miss scenarios):
   Memory-stall clock cycles = (Memory accesses / Program) × Miss rate × Miss penalty
2. CPU execution clock cycles - cycles for normal execution (cache-hit scenarios):
   CPU time = (CPU execution clock cycles + Memory-stall clock cycles) × Clock cycle time
3. Write-buffer stalls are negligible, so they can be ignored. Hence the memory
stall can be written as follows:
   Memory-stall clock cycles = (Memory accesses / Program) × Miss rate × Miss penalty

AMAT - Average memory access time:
   AMAT = Hit time + Miss rate × Miss penalty
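As a worked example of the AMAT formula (the hit time, miss rate, and miss penalty below are assumed values chosen for illustration):

```python
# AMAT = Hit time + Miss rate × Miss penalty (illustrative values)
hit_time = 1        # cycles to access the cache on a hit
miss_rate = 0.05    # fraction of accesses that miss
miss_penalty = 100  # cycles to fetch the block from main memory

amat = hit_time + miss_rate * miss_penalty
print(amat)  # 6.0 cycles on average per memory access
```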

Ways to improve Cache Performance
1. Reducing Cache Misses by More Flexible Placement of Blocks.
a. Associative mapping - In this type of mapping, the
associative memory is used to store content and
addresses of the memory word. Any block can go into any
line of the cache.

b. Set associative mapping - This form of mapping is an


enhanced form of direct mapping in which the drawbacks of
direct mapping are removed. A block in memory can map to
any one of the lines of a specific set.
2. Reducing the Miss Penalty Using
Multilevel Caches

1. L1 cache, or primary cache, is extremely fast but relatively small,


and is usually embedded in the processor chip as CPU cache.
2. L2 cache, or secondary cache, is often more capacious than L1. L2
cache may be embedded on the CPU, or it can be on a separate chip
or coprocessor and have a high-speed alternative system bus
connecting the cache and CPU. That way it doesn't get slowed by
traffic on the main system bus.
3. Level 3 (L3) cache is specialized memory developed to improve the
performance of L1 and L2. L1 or L2 can be significantly faster than
L3, though L3 is usually double the speed of DRAM.
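The miss-penalty reduction from a second cache level can be quantified by extending the AMAT formula: an L1 miss is first served by L2, and only an L2 miss pays the full main-memory penalty (all timings below are assumptions for illustration):

```python
# Two-level AMAT: only an L2 miss pays the full memory penalty
# (illustrative numbers).
l1_hit, l1_miss_rate = 1, 0.05
l2_hit, l2_miss_rate = 10, 0.20
mem_penalty = 100

amat_one_level = l1_hit + l1_miss_rate * mem_penalty
amat_two_level = l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_penalty)
print(amat_one_level, amat_two_level)  # 6.0 vs 2.5 cycles
```

Even though L2 is ten times slower than L1 here, catching 80% of L1 misses in L2 cuts the average access time by more than half.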
Cache Replacement Algorithms
In computing, cache algorithms (also frequently called cache replacement algorithms or
cache replacement policies) are optimizing instructions, or algorithms, that a computer
program or a hardware-maintained structure can utilize in order to manage a cache of
information stored on the computer. When the cache is full, the algorithm must choose which
items to discard to make room for the new ones.

1. Random Replacement
2. FIFO
3. Least Recently Used
4. Most Recently used
1. Random Replacement - Randomly selects a candidate item and discards it to make
space when necessary. This algorithm does not require keeping any information about
the access history.

2. FIFO - First in First out policy


● The block which entered the cache first is replaced first.
● This can lead to a problem known as "Belady's Anomaly": it states that if we
increase the number of lines in the cache memory, the number of cache misses may increase.

Example: suppose we have the sequence 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, and the cache memory has
4 lines.
3. Least Recently Used (LRU) - Discards the least recently used item first; this
requires keeping track of when each block was last used.

4. Most Recently Used (MRU)

In contrast to Least Recently Used (LRU), MRU discards the most recently used items first.

The access sequence for the below example is A B C D E C D B.

Here, A, B, C, and D are placed in the cache as there is still space available. At the 5th
access (E), the block which held D is replaced with E, as D was the block used
most recently. The next access to C is a hit, and at the following access to D, C is
replaced, as it was the block accessed just before D, and so on.
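The FIFO and MRU examples above can be replayed with a small simulator (a sketch; the 4-line cache size and the two access sequences are the ones used in the examples, and the `victim` selection is parameterised by an index into an age/recency list):

```python
# Replay the FIFO and MRU examples on a 4-line cache.
def simulate(sequence, capacity, evict_index, update_on_hit):
    cache, order, misses = set(), [], 0
    for block in sequence:
        if block in cache:
            if update_on_hit:              # LRU/MRU policies track recency
                order.remove(block)
                order.append(block)
        else:
            misses += 1
            if len(cache) == capacity:
                victim = order.pop(evict_index)  # 0 = oldest, -1 = most recent
                cache.remove(victim)
            cache.add(block)
            order.append(block)
    return misses

# FIFO: evict the oldest insertion; hits do not reorder the queue.
fifo = simulate([7, 0, 1, 2, 0, 3, 0, 4, 2, 3], 4, evict_index=0, update_on_hit=False)
# MRU: evict the most recently used block.
mru = simulate(list("ABCDECDB"), 4, evict_index=-1, update_on_hit=True)
print(fifo, mru)  # 6 misses for FIFO, 6 misses for MRU
```

In the MRU run, E evicts D and the next access to D evicts C, exactly as traced in the example above.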
2. VIRTUAL MEMORY
❖ Virtual Memory is a storage scheme that gives the user the illusion of having a
very big main memory. This is done by treating a part of secondary memory
as if it were main memory.
❖ In this scheme, the user can load processes bigger than the available
main memory, under the illusion that enough memory is available to load the
process.
❖ Instead of loading one big process in the main memory, the Operating System
loads the different parts of more than one process in the main memory.
❖ By doing this, the degree of multiprogramming will be increased and therefore,
the CPU utilization will also be increased.

How virtual memory works


➢ Virtual memory uses both hardware and software to operate. When an
application is in use, data from that program is stored in a physical address using
RAM.
➢ A memory management unit (MMU) maps logical addresses to the
corresponding physical addresses.
➢ If, at any point, the RAM space is needed for something more urgent, data can be
swapped out of RAM and into virtual memory. The computer's memory manager
is in charge of keeping track of the shifts between physical and virtual memory. If
that data is needed again, the computer's MMU will use a context switch to
resume execution.
➢ While copying virtual memory into physical memory, the OS divides memory
into fixed-size units, kept in pagefiles or swap files. Each page is
stored on disk, and when the page is needed, the OS copies it from the disk to
main memory and translates its virtual addresses into real addresses.
➢ However, the process of swapping virtual memory to physical is rather slow. This
means using virtual memory generally causes a noticeable reduction in
performance.

Address Translation

● Translation of the virtual page number to a physical page number. The


physical page number constitutes the upper portion of the physical
address, while the page offset, which is not changed, constitutes the lower
portion.
● The number of bits in the page offset field determines the page size.
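The split into page number and unchanged offset can be sketched as follows (assuming 4 KiB pages, hence a 12-bit offset; the page-to-frame mapping is made up for illustration):

```python
# Virtual-to-physical translation with 4 KiB pages (12-bit page offset).
PAGE_SIZE = 4096
OFFSET_BITS = 12

page_table = {0: 5, 1: 2, 2: 7}   # virtual page -> physical frame (illustrative)

def translate(vaddr):
    vpn = vaddr >> OFFSET_BITS          # upper bits: virtual page number
    offset = vaddr & (PAGE_SIZE - 1)    # lower bits: page offset, unchanged
    frame = page_table[vpn]             # translated physical page number
    return (frame << OFFSET_BITS) | offset

print(hex(translate(0x1234)))  # VPN 1 -> frame 2, offset 0x234 -> 0x2234
```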
Translation Lookaside Buffer
● A translation lookaside buffer (TLB) is a memory cache that stores the recent
translations of virtual memory to physical memory. It is used to reduce the time
taken to access a user memory location. It can be called an address-translation
cache.
● TLB contains page table entries that have been most recently used.
● Since the page tables are stored in main memory, every memory
access by a program can take at least twice as long: one memory
access to obtain the physical address and a second access to get the
data.
Process:

1. Given a virtual address, the processor examines the TLB. If a
page table entry is present (TLB hit), the frame number is
retrieved and the real address is formed.
2. If a page table entry is not found in the TLB (TLB miss), the page
number is used as an index into the page table. The entry is
checked to see whether the page is already in main memory; if it is
not in main memory, a page fault is issued. Then the TLB is updated
to include the new page entry.
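The hit/miss/page-fault path above can be sketched as a toy model (the TLB and page table are plain dicts, a page fault is simulated by loading the page on demand, and frame number 9 is an arbitrary stand-in for whatever frame the OS would choose):

```python
# Sketch of the TLB lookup path: TLB hit -> done; TLB miss -> walk the
# page table; page not present -> page fault, then update the TLB.
tlb = {}                       # virtual page -> frame (recent translations)
page_table = {0: 5, 1: None}   # None models a page that is not in memory
stats = {"tlb_hit": 0, "tlb_miss": 0, "page_fault": 0}

def lookup(vpn):
    if vpn in tlb:
        stats["tlb_hit"] += 1
        return tlb[vpn]
    stats["tlb_miss"] += 1
    frame = page_table.get(vpn)
    if frame is None:                  # page fault: bring the page in
        stats["page_fault"] += 1
        frame = page_table[vpn] = 9    # pretend the OS picked frame 9
    tlb[vpn] = frame                   # update TLB with the new entry
    return frame

for vpn in [0, 0, 1]:
    lookup(vpn)
print(stats)  # {'tlb_hit': 1, 'tlb_miss': 2, 'page_fault': 1}
```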

Implementing Protection with


Virtual Memory
● The protection mechanism must ensure that although multiple
processes are sharing the same main memory, one renegade process
cannot write into the address space of another user process or into
the operating system either intentionally or unintentionally.
● Hardware must provide the following:
○ Support at least two modes that indicate whether the running process is
a user process or an operating system process
○ Provide a portion of the processor state that a user process can read but
not write. This includes the user/supervisor mode bit, which dictates
whether the processor is in user or supervisor mode, the page table
pointer, and the TLB. To write these elements, the operating system uses
special instructions that are only available in supervisor mode.
○ Provide mechanisms whereby the processor can go from user mode to
supervisor mode and vice versa, using a system-call exception.

Advantages of Virtual Memory


● Allowing users to operate multiple applications at the same time or
applications that are larger than the main memory
● Freeing applications from having to compete for shared memory space and
allowing multiple applications to run at the same time
● Allowing core processes to share memory between libraries, which consists of
written code that provides the foundation for a program's operations
● Improving security by isolating and segmenting where the computer stores
information
● Improving efficiency and speed by allowing more processes to sit in virtual
memory
● Lowering the cost of computer systems as you find the right amount of main
memory and virtual memory.
Direct Memory Access(DMA)

● Direct memory access (DMA) is a feature of computer systems that allows
certain hardware subsystems to access main system memory
independently of the central processing unit (CPU).
● Without DMA, when the CPU is using programmed input/output, it is
typically fully occupied for the entire duration of the read or write operation,
and is thus unavailable to perform other work. With DMA, the CPU first
initiates the transfer, then it does other operations while the transfer is in
progress, and it finally receives an interrupt from the DMA controller
(DMAC) when the operation is done.
● This feature is useful at any time that the CPU cannot keep up with the rate
of data transfer, or when the CPU needs to perform work while waiting for
a relatively slow I/O data transfer.

Working steps
1. If the DMA controller is free, it requests the control of bus from the
processor by raising the bus request signal.
2. Processor grants the bus to the controller by raising the bus grant signal,
now DMA controller is the bus master.
3. The processor initiates the DMA controller by sending the memory
addresses, number of blocks of data to be transferred and direction of data
transfer.
4. After assigning the data transfer task to the DMA controller, instead of
waiting idly till completion of the data transfer, the processor resumes the
execution of the program after retrieving instructions from the stack.
5. The DMA controller performs the data transfer according to the control
information received from the processor.
6. After completion of data transfer, it disables the bus request signal and
CPU disables the bus grant signal thereby moving control of buses to the
CPU.
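The handshake above can be sketched as a toy event trace (a pure simulation: the signal names mirror the numbered steps, and the transfer itself is just a list copy; nothing here models real bus hardware):

```python
# Toy trace of the DMA handshake: bus request/grant, transfer, interrupt.
def dma_transfer(memory, src, dst, count, log):
    log.append("BUS_REQUEST")             # 1. DMAC asks for the bus
    log.append("BUS_GRANT")               # 2. processor grants it; DMAC is bus master
    log.append("CPU_RESUMES_OTHER_WORK")  # 4. CPU keeps executing meanwhile
    for i in range(count):                # 5. DMAC moves the block of data
        memory[dst + i] = memory[src + i]
    log.append("BUS_RELEASED")            # 6. bus handed back to the CPU
    log.append("INTERRUPT")               # DMAC signals completion
    return memory

log = []
mem = dma_transfer(list(range(16)), src=0, dst=8, count=4, log=log)
print(mem[8:12], log[-1])  # [0, 1, 2, 3] INTERRUPT
```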
Types of Data Transfer
a) Burst Mode: In this mode, the DMA controller hands the buses back to the CPU
only after completion of the whole data transfer. Meanwhile, if the CPU requires
the bus, it has to stay idle and wait for the data transfer.

b) Cycle Stealing Mode: In this mode, DMA gives control of the buses back to the
CPU after the transfer of every byte. It continuously issues a request for bus
control, makes the transfer of one byte, and returns the bus. This way, the CPU
does not have to wait long if it needs the bus for a higher-priority task.

c) Transparent Mode: Here, DMA transfers data only when the CPU is executing
instructions that do not require the use of the buses.

Key Points
➔ To speed up the transfer of data between I/O devices and memory,
DMA controller acts as station master.
➔ DMA controller is a control unit, part of I/O device’s interface circuit,
which can transfer blocks of data between I/O devices and main
memory with minimal intervention from the processor.
➔ It is controlled by the processor. The processor initiates the DMA
controller by sending the starting address, Number of words in the
data block and direction of transfer of data .i.e. from I/O devices to
the memory or from main memory to I/O devices.
➔ More than one external device can be connected to the DMA
controller.
➔ DMA controller contains an address unit, for generating addresses
and selecting I/O device for transfer.
➔ It also contains the control unit and data count for keeping counts of
the number of blocks transferred and indicating the direction of
transfer of data.
➔ When the transfer is completed, DMA informs the processor by
raising an interrupt.
I/O Interface - Parallel and Serial
● The I/O interface of a device consists of the circuitry needed to connect
that device to the bus.
● On one side of the interface are the bus lines for address, data, and control.
On the other side are the connections needed to transfer data between the
interface and the I/O . This side is called a port, and it can be either a
parallel or a serial port.
● A parallel port transfers multiple bits of data simultaneously to or from the
device. A serial port sends and receives data one bit at a time.

Functions of I/O Interface


1. Registers
a. for temporary storage of data
b. status register containing status information
c. control register that holds the information governing the behavior of the
interface
2. Contains address-decoding circuitry to determine when it is being
addressed by the processor.
3. Generates the required timing signals
4. Performs any format conversion that may be necessary to transfer data
between the processor and the I/O device, such as parallel-to-serial
conversion in the case of a serial port.
Parallel Interface
Input Interface - Keyboard

Registers - a data register, KBD_DATA, and a status register, KBD_STATUS.


Status flag - the keyboard status flag, KIN.

Working of Keyboard:
A typical keyboard consists of mechanical switches that are normally open.
1. When a key is pressed, its switch closes and establishes a path for an
electrical signal.
This signal is detected by an encoder circuit that generates the ASCII
code for the corresponding character.

Main issue:
Bouncing - A difficulty with such mechanical pushbutton switches is that the
contacts bounce when a key is pressed, resulting in the electrical connection
being made then broken several times before the switch settles in the closed
position.

Solution:
1. Using a debouncing circuit along with the encoder circuit. The I/O routine
can then read the input character as soon as it detects that KIN is equal to 1.
2. Using a software-based solution - The software detects that a key has been
pressed when it observes that the keyboard status flag, KIN, has been set
to 1. The I/O routine can then introduce a sufficient delay before reading the
contents of the input buffer, KBD_DATA, to ensure that the bouncing has
subsided.
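The software-based debounce can be sketched as a polling loop (a sketch only: KIN and KBD_DATA are modelled as fields of a toy device object, and the 10 ms delay is an assumed debounce interval, not a value from the notes):

```python
import time

# Sketch of software debouncing: poll KIN, wait for bouncing to
# subside, then read KBD_DATA (the device is a toy stand-in).
class Keyboard:
    def __init__(self):
        self.KIN = 0          # status flag set by the encoder
        self.KBD_DATA = 0     # data register holding the ASCII code

    def press(self, ch):      # encoder loads the code and sets the flag
        self.KBD_DATA = ord(ch)
        self.KIN = 1

def read_char(kbd, debounce_s=0.01):
    while kbd.KIN == 0:       # busy-wait on the status flag
        pass
    time.sleep(debounce_s)    # delay so contact bouncing subsides
    kbd.KIN = 0               # reading clears the status flag
    return chr(kbd.KBD_DATA)

kbd = Keyboard()
kbd.press("A")
print(read_char(kbd))  # A
```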

Explanation of the circuit:

Encoder circuit:
● The output of the encoder consists of one byte of data representing the
encoded character and one control signal called Valid.
● When a key is pressed, the Valid signal changes from 0 to 1, causing the
ASCII code of the corresponding character to be loaded into the
KBD_DATA register and the status flag KIN to be set to 1.
Status flag
● The status flag is cleared to 0 when the processor reads the contents of
the KBD_DATA register.

Address
● When the processor requests a Read operation, it places the address of
the appropriate register on the address lines of the bus.

R/W’ =1, indicating a Read operation

Slave Ready
● Slave-ready signal is set at the same time, to inform the processor that the
requested data or status information has been placed on the data lines.
STATUS FLAG CIRCUIT

● The KIN flag is the output of a NOR latch connected as shown.


● KIN = 1 only when Master-ready is low.
● Both the flip-flop and the latch are reset to 0 when Read-data
becomes equal to 1, indicating that KBD_DATA is being read.
Output Interface - Display
Working:
1. When the display is ready to accept a character, it sets its Ready
signal, which causes the DOUT flag in the DISP_STATUS register to
be set to 1. When the I/O routine checks DOUT and finds it equal to 1,
it sends a character to DISP_DATA.
2. This clears the DOUT flag to 0 and sets the New-data signal to 1.
3. In response, the display returns Ready to 0 and accepts and displays
the character in DISP_DATA.
4. When it is ready to receive another character, it sets Ready again,
and the cycle repeats.

Both the input interface and the output interface have one more diagram with
gates. If required, please refer to the book.

Serial Interface

➔ A serial interface is used to connect the processor to I/O devices that


transmit data one bit at a time.
➔ Data are transferred in a bit-serial fashion on the device side and in a
bit-parallel fashion on the processor side. The transformation
between the parallel and serial formats is achieved with shift
registers that have parallel access capability.
➔ The input shift register accepts bit-serial input from the I/O device.
When all 8 bits of data have been received, the contents of this shift
register are loaded in parallel into the DATAIN register.
➔ Similarly, output data in the DATAOUT register are transferred to the
output shift register, from which the bits are shifted out and sent to
the I/O device.
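The input side of this serial-to-parallel conversion can be sketched with a shift register (a toy model; the MSB-first bit order here is an assumption made for the example, and SIN is modelled as a plain field):

```python
# Input shift register: accept bits serially; once 8 bits arrive, load the
# assembled byte in parallel into DATAIN and set SIN (MSB-first, an assumption).
class SerialInput:
    def __init__(self):
        self.shift_reg = 0
        self.nbits = 0
        self.DATAIN = None
        self.SIN = 0                  # status flag: DATAIN holds new data

    def shift_in(self, bit):
        self.shift_reg = (self.shift_reg << 1) | bit
        self.nbits += 1
        if self.nbits == 8:           # parallel load into DATAIN
            self.DATAIN, self.SIN = self.shift_reg, 1
            self.shift_reg, self.nbits = 0, 0

rx = SerialInput()
for bit in [0, 1, 0, 0, 0, 0, 0, 1]:  # 0b01000001 = 'A'
    rx.shift_in(bit)
print(chr(rx.DATAIN), rx.SIN)  # A 1
```

With double buffering (below), the shift register is immediately free to receive the next character while DATAIN still holds this one.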

Status Flags - SIN and SOUT


● SIN = 1 when new data are loaded into DATAIN from the shift register,
and cleared to 0 when these data are read by the processor.
● SOUT =1 when data are transferred from DATAOUT to the output shift
register. It is cleared to 0 when the processor writes new data into
DATAOUT.

Need of double buffering


● With double buffering, the transfer of the second character can begin
as soon as the first character is loaded from the shift register into the
DATAIN register.
● Thus, provided the processor reads the contents of DATAIN before
the serial transfer of the second character is completed, the interface
can receive a continuous stream of input data over the serial line.

How is the clock managed? Asynchronous vs. Synchronous


● During serial transmission, the receiver needs to know when to shift
each bit into its input shift register.
● Since there is no separate line to carry a clock signal from the
transmitter to the receiver, the timing information needed must be
embedded into the transmitted data using an encoding scheme.

Asynchronous Transmission

● The line connecting the transmitter and the receiver is in the 1 state
when idle.
● Start bit=0, followed by 8 data bits and 1 or 2 Stop bits. The Stop bits
have a logic value of 1.
● The 1-to-0 transition at the beginning of the Start bit alerts the
receiver that data transmission is about to begin.
● Using its own clock, the receiver determines the position of the next
8 bits, which it loads into its input register. The Stop bits following the
transmitted character, which are equal to 1, ensure that the Start bit
of the next character will be recognized.
● When transmission stops, the line remains in the 1 state until another
character is transmitted.
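Framing one character for asynchronous transmission can be sketched as follows (a sketch; the LSB-first data order is the usual UART convention, assumed here, with one Stop bit):

```python
# Frame a byte for asynchronous transmission: Start bit (0), 8 data
# bits (LSB-first, as in a typical UART), then one Stop bit (1).
def frame(byte):
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]

def deframe(bits):
    assert bits[0] == 0 and bits[-1] == 1   # check Start and Stop bits
    return sum(b << i for i, b in enumerate(bits[1:9]))

bits = frame(ord("A"))
print(len(bits), deframe(bits) == ord("A"))  # 10 True
```

The leading 0 is the 1-to-0 transition that alerts the receiver, and the trailing 1 guarantees that the next character's Start bit is again a 1-to-0 edge.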

Synchronous Transmission
★ Asynchronous transmission is useful only where the speed of transmission is
sufficiently low.
★ In synchronous transmission, the receiver generates a clock that is
synchronized to that of the transmitter by observing successive
1-to-0 and 0-to-1 transitions in the received signal.
★ It adjusts the position of the active edge of the clock to be in the
center of the bit position.
★ A variety of encoding schemes are used to ensure that enough signal
transitions occur to enable the receiver to generate a synchronized
clock and to maintain synchronization.
★ Once synchronization is achieved, data transmission can continue
indefinitely.
