
The Buffer Cache

Memory Hierarchy
• Files are stored on the hard disk, and processes can
access these files and create new files on the disk.
• When a process requests a file, the kernel brings the
file's data into main memory, where the user process
can read or change it.
• The kernel could read and write directly between the
hard disk and memory on every request, but response
time and throughput would be poor because of the
disk's slow data transfer speed.
• To minimize the frequency of disk access, the
kernel keeps a pool of buffers holding recently
and/or frequently accessed data. This pool is called
the buffer cache.
Buffer Cache
[Figure: Main Memory containing the Buffer Cache]

• The buffer cache is not the same as cache memory
(the hardware CPU cache).
• The buffer cache is a part of main memory that
holds copies of blocks of data from secondary
storage.
• The buffer cache is usually kept in the system area
of main memory.
• When main memory is divided into partitions, one
or two partitions are allocated to the system area,
so the buffer cache is managed by the operating
system only.
[Figure: Main Memory with the Buffer Cache, and the Secondary Storage Device]

• Communication first takes place between the buffer
cache and the secondary storage device.
• The buffer cache is an array, or pool, of buffers.
• Each buffer has two areas: a header and a data area.
• Data area: contains the data of one disk block.
• For example, with 50 buffers the cache holds the
data of 50 disk blocks.
• Access mechanism:
• Data is read from secondary storage only when a
cache miss is followed by a main memory (buffer
cache) miss.
• Before going to secondary storage, the kernel first
searches the buffer cache for the block of data.
Access Mechanism:

• When a process wants to read a file, the kernel
first looks for the data in the buffer cache. If the
data is found there, it is sent to the process.
• If the data is not found in the buffer cache, it is
read from the disk and kept in the buffer cache so
that it can be made available to the process.
• If no buffer is free, one buffer has to be chosen
as a victim. Every buffer in the cache already holds
some block from secondary storage,
• so one of those blocks is pushed out of its buffer
to free the buffer for a new block from secondary
storage.
• The replacement algorithm used for this is the
Least Recently Used (LRU) algorithm.
• The header area is used for buffer management; the
data area holds one block of data of size 256 bytes,
512 bytes, or 1024 bytes.
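The read path above (hit, miss, and LRU eviction when no buffer is free) can be sketched in a few lines of Python. This is only a toy model, not kernel code: the class name TinyBufferCache, the dict standing in for the disk, and the capacity of 3 are all assumptions made for illustration.

```python
from collections import OrderedDict

class TinyBufferCache:
    """Toy model: block number -> block data, evicting the least recently used."""

    def __init__(self, capacity, disk):
        self.capacity = capacity      # number of buffers in the pool
        self.disk = disk              # dict standing in for secondary storage
        self.buffers = OrderedDict()  # oldest entry first, newest last
        self.disk_reads = 0

    def read(self, blkno):
        if blkno in self.buffers:             # cache hit
            self.buffers.move_to_end(blkno)   # now the most recently used
            return self.buffers[blkno]
        if len(self.buffers) >= self.capacity:
            self.buffers.popitem(last=False)  # evict the least recently used
        self.disk_reads += 1                  # cache miss: go to the disk
        self.buffers[blkno] = self.disk[blkno]
        return self.buffers[blkno]

disk = {n: f"data-{n}" for n in range(100)}
cache = TinyBufferCache(capacity=3, disk=disk)
for n in (2, 15, 32, 2, 10):
    cache.read(n)
print(list(cache.buffers))  # blocks still cached, oldest first
print(cache.disk_reads)     # number of reads that had to go to the disk
```

The second read of block 2 is a hit, so block 15 is the least recently used block when block 10 arrives and is the one pushed out.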
Buffer Headers

• When the system initializes, the kernel allocates
the space for the buffer cache. Each buffer has two
parts: a data region holding the data read from the
disk, and a buffer header.
• The data in a buffer corresponds to the data in
one logical disk block of a file system. The buffer
cache is the "in memory" representation of the
disk blocks.
• A disk block is never held in two buffers at the
same time, as this could lead to inconsistencies.
There is one and only one copy of a disk block in
the buffer cache.
• The buffer header contains metadata such as the
device number and the block number of the disk
block whose data the buffer holds.
• The buffer header also contains a pointer to the
data array for the buffer (i.e. a pointer to the data
region).
• The buffer header also contains the status of the
buffer. The status records:
– whether the buffer is locked or unlocked (busy/free)
– whether the buffer contains valid data
– whether the kernel must write the contents to disk
before reassigning the buffer (delayed write)
– whether the kernel is currently reading or writing
the contents of the buffer
– whether any process is waiting for the buffer to
become free.
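As a rough picture of what a buffer header carries, the sketch below models the fields listed above in Python. The names BufferHeader, Status and the individual flag names are invented for this example, and the 1024-byte data area is just one of the block sizes mentioned earlier; a real kernel lays this out as a C structure with list pointers as well.

```python
from dataclasses import dataclass, field
from enum import Flag, auto

class Status(Flag):
    BUSY = auto()            # locked by some process
    VALID = auto()           # data area holds valid data
    DELAYED_WRITE = auto()   # must be written to disk before reuse
    IO_IN_PROGRESS = auto()  # kernel is reading or writing the buffer
    WANTED = auto()          # a process is waiting for this buffer

@dataclass
class BufferHeader:
    device: int                 # device number
    blkno: int                  # block number on that device
    status: Status = Status(0)  # no flags set
    data: bytearray = field(default_factory=lambda: bytearray(1024))

buf = BufferHeader(device=0, blkno=10)
buf.status |= Status.BUSY | Status.VALID  # allocated and filled from disk
buf.status &= ~Status.BUSY                # released: still valid, no longer busy
print(buf.status)
```

Keeping the conditions as independent flags lets one buffer be, for example, valid and marked delayed write at the same time.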
Buffer Header
• The replacement algorithm used is the Least
Recently Used (LRU) algorithm.
• It can be implemented with a linked list: the
buffer headers contain a number of pointers, which
are used to keep the buffers on linked lists.
• The list is a doubly linked list, because every
buffer has two pointers for each queue it is on: one
pointing to the next buffer in the queue and one
pointing to the previous buffer in the queue.
• The question is how a buffer is chosen when its
data block has to be overwritten by a new data
block.
[Figure: Header Node of a circular doubly linked list of buffers]

• The header node points forward to the first buffer,
and each buffer has forward and backward pointers;
together they form a doubly linked circular list.
• Assume this list is the free list: the header node
is the header of the free list, and all the buffers
on it are free, i.e. not in use by any process.
• Assume the data blocks held in these buffers are
2, 15 and 32.
[Figure: free list: Header Node, then buffers holding blocks 2, 15, 32]

• Now suppose a process requests block 10. If it is
not in any of the buffers, block 10 has to be read in
and placed in one of the buffers that are currently
free (free does not mean empty: a free buffer still
holds old data).
• In that case a free buffer is taken and
overwritten, and it is always the first buffer on the
free list.
• Always take the buffer from the head of the list.
• Overwrite it with the data of block 10.
[Figure: free list Header, 2, 15, 32; block 10 is read into the buffer at the head of the list, replacing block 2]
• While the buffer is being written, it is locked;
when the writing is over, the buffer is freed.
• The buffer now holds the data of block 10, which
makes it the most recently used buffer.
[Figure: free list after the reassignment: Header, 15, 32, 10]
• Whenever a buffer is needed to hold a new data
block, it is always taken from the head of the
free list.
• Whenever a buffer is returned to the free list, it
is always placed at the tail of the list.
• A buffer that has just been used and becomes
free is the "most recently used" buffer.
• The most recently used buffers are therefore
always at the tail of the list.

[Figure: free list Header, 15, 32, 10, with the least recently used buffer (15) at the head and the most recently used buffer (10) at the tail]
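The head/tail rule can be tried out on a small circular doubly linked list with a header node. The sketch below is an illustration only; the class and method names (Node, FreeList, take_head, put_tail) are made up for this example and are not kernel names.

```python
class Node:
    def __init__(self, blkno=None):
        self.blkno = blkno
        self.next = self.prev = self   # a lone node points at itself

class FreeList:
    def __init__(self):
        self.head = Node()             # header node, holds no block

    def is_empty(self):
        return self.head.next is self.head

    def put_tail(self, node):
        """Return a buffer to the free list as the most recently used."""
        last = self.head.prev
        last.next = node
        node.prev = last
        node.next = self.head
        self.head.prev = node

    def take_head(self):
        """Remove and return the least recently used buffer, or None."""
        if self.is_empty():
            return None
        node = self.head.next
        node.prev.next = node.next
        node.next.prev = node.prev
        node.next = node.prev = node
        return node

    def blocks(self):
        out, node = [], self.head.next
        while node is not self.head:
            out.append(node.blkno)
            node = node.next
        return out

free = FreeList()
for n in (2, 15, 32):
    free.put_tail(Node(n))
victim = free.take_head()   # buffer that held block 2
victim.blkno = 10           # overwritten with block 10
free.put_tail(victim)       # released: goes to the tail
print(free.blocks())
```

Because removal and insertion only touch neighbouring pointers, both operations take constant time regardless of how many buffers are on the list.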
• Now suppose some process requests data block 12.
• The kernel first checks the buffer cache; only if
the block is not there is block 12 read from disk
into one of the buffers.
• WORST CASE: if block 12 is not in the cache and
there are 1000 buffers, all 1000 buffers have to be
checked before we can say that block 12 is not
present.
• That is, the search time to find out whether a
particular block is present is quite high IF A SINGLE
LIST IS MAINTAINED.
• So instead of a single list, a number of lists are
maintained, called HASH QUEUES.
• Suppose there are 4 hash queues. A buffer holding
block number n is placed on the hash queue given by
n mod 4,
• where n is the block number held in the buffer.
• This decides which buffer goes on which hash
queue.
• Example: if n mod 4 is zero, the buffer is put on HQ 0.
• Suppose n = 6.
• 6 mod 4 = 2
• So the buffer containing block 6 is on HQ 2.

n mod 4   Blocks on the queue
HQ 0      28   4  64
HQ 1      17   5  97
HQ 2      98  50  10
HQ 3       3  35  99
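The placement rule is a one-line computation. The short sketch below rebuilds the four queues shown above from the block numbers; the helper name hash_queue is hypothetical, and only the block number is hashed here (the device number is left out to keep the example small).

```python
NUM_QUEUES = 4

def hash_queue(blkno):
    """Index of the hash queue that a buffer holding blkno belongs on."""
    return blkno % NUM_QUEUES

queues = [[] for _ in range(NUM_QUEUES)]
for blkno in (28, 4, 64, 17, 5, 97, 98, 50, 10, 3, 35, 99):
    queues[hash_queue(blkno)].append(blkno)

for i, q in enumerate(queues):
    print(f"HQ {i}: {q}")
print(hash_queue(6))   # queue for block 6
```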
• The free list is a subset of the buffers that are
already on the various hash queues.

• For example:
[Figure: hash queues HQ 0 (28, 4, 64), HQ 1 (17, 5, 97), HQ 2 (98, 50, 10) and HQ 3 (3, 35, 99), with the free list header linking some of their buffers]
• In the situation above, some of the buffers are
free and are on the free list as well as on a hash
queue; buffers that are not free are only on a hash
queue, not on the free list.
Structure of the buffer pool

• The kernel caches data in the buffer pool according
to a least recently used (LRU) algorithm.
• The kernel also maintains a free list of buffers. The
free list is a doubly linked circular list of buffers.
• When the kernel wants to allocate a buffer, it
removes a node from the free list, usually from the
beginning of the list, but it can take one from the
middle of the list too.
• When the kernel frees a buffer, it adds the buffer
at the end of the free list.
• When the kernel wants to access the disk, it
searches the buffer pool for a particular device
number/block number combination (which is kept in
the buffer header).
• The entire buffer pool is organized as queues
hashed as a function of the device number/block
number combination. The figure below shows
the buffers on their hash queues.
The important thing to note here is that no two buffers in the buffer pool
can contain the data of the same disk block.
Example
• Suppose a process requests block 9.
• 9 mod 4 = 1, so a buffer containing block 9 can
only be on hash queue 1 (only HQ 1 has to be searched).
• If HQ 1 has no buffer containing block 9, the
block is not in the buffer cache. (The check uses
the device number and block number.)
• So the search time is greatly reduced when the
buffers are distributed over a number of hash
queues.
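To make the saving concrete, the sketch below counts how many buffers are examined when looking for block 9, once in a single list and once with four hash queues. The block numbers are the ones from the hash queue figure; the two function names are invented for the example.

```python
NUM_QUEUES = 4
CACHED = (28, 4, 64, 17, 5, 97, 98, 50, 10, 3, 35, 99)

queues = [[] for _ in range(NUM_QUEUES)]
for blkno in CACHED:
    queues[blkno % NUM_QUEUES].append(blkno)

def search_single_list(blkno):
    """Scan every cached block; return (found, buffers examined)."""
    examined = 0
    for cached in CACHED:
        examined += 1
        if cached == blkno:
            return True, examined
    return False, examined

def search_hash_queues(blkno):
    """Scan only the queue the block would hash to."""
    examined = 0
    for cached in queues[blkno % NUM_QUEUES]:
        examined += 1
        if cached == blkno:
            return True, examined
    return False, examined

print(search_single_list(9))   # every buffer is examined
print(search_hash_queues(9))   # only the buffers on HQ 1 are examined
```

With 12 buffers spread evenly over 4 queues, a miss costs 3 comparisons instead of 12; the same ratio applies to the 1000-buffer worst case described earlier.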
Different scenarios - Algorithm
• Suppose a process requests a particular block,
say block 9.
• First the kernel checks whether the block is on a
hash queue of buffers.
• There are two possibilities: the block is either
present in the buffer cache or not.
• If it is present, there are again two cases. In the
first, the buffer is currently locked (i.e. it is being
used by some other process).
• In the second, the buffer is free and is acquired
immediately.
• Otherwise, the process goes to hash queue 1 and
finds that block 9 is not on HQ 1 (block 9 is not in
the buffer cache).
• Then a buffer has to be taken from the free list and
overwritten with the data of block 9.
• Several situations can arise here.
• First, the free list may be empty, i.e. no buffer is
currently free. (The process has to wait until some
buffer becomes free and can be overwritten.)
• Second, the buffer taken from the free list may be
marked "delayed write".
Scenarios of retrieval of buffer

• High-level kernel algorithms in the file subsystem
invoke the buffer pool algorithms to manage
the buffer cache.
• The algorithms for reading and writing disk
blocks use the algorithm getblk to allocate a
buffer from the pool.
getblk(file_system_no, block_no)
{
    while (buffer not found)
    {
        if (block in hash queue)
        {
            if (buffer busy)                      /* scenario 5 */
            {
                sleep (event buffer becomes free);
                continue;                         /* back to while loop */
            }
            mark buffer busy;                     /* scenario 1 */
            remove buffer from free list;
            return buffer;
        }
        else                                      /* block not on hash queue */
        {
            if (there is no buffer on free list)  /* scenario 4 */
            {
                sleep (event any buffer becomes free);
                continue;                         /* back to while loop */
            }
            remove buffer from free list;
            if (buffer marked as delayed write)   /* scenario 3 */
            {
                asynchronous write buffer to disk;
                continue;                         /* back to while loop */
            }
            /* scenario 2: found a free buffer */
            remove buffer from old hash queue;
            put buffer onto new hash queue;
            return buffer;
        }
    }
}
• The five typical scenarios the kernel may follow in getblk to
allocate a buffer for a disk block are:
 1. The kernel finds the block on its hash queue, and its
buffer is free.
 2. The kernel cannot find the block on the hash queue, so it
allocates a buffer from the free list.
 3. The kernel cannot find the block on the hash queue and,
in attempting to allocate a buffer from the free list (as in
scenario 2), finds a buffer on the free list that has been
marked "delayed write". The kernel must write the "delayed
write" buffer to disk and allocate another buffer.
 4. The kernel cannot find the block on the hash queue, and
the free list of buffers is empty.
 5. The kernel finds the block on the hash queue, but its
buffer is currently busy.
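The five scenarios can be exercised with a small simulation. The sketch below is a simplified, single-process model and makes several assumptions: the class names Buf and Pool are invented, only the block number is hashed, the free list order 3, 5, 4, 28, 97, 10 is assumed for the example, and where the kernel would sleep the function simply returns None together with the scenario number.

```python
from collections import deque

NUM_QUEUES = 4
BLOCKS = (28, 4, 64, 17, 5, 97, 98, 50, 10, 3, 35, 99)
FREE = [3, 5, 4, 28, 97, 10]   # assumed free list order, head first

class Buf:
    def __init__(self, blkno):
        self.blkno = blkno
        self.busy = False
        self.delayed_write = False

class Pool:
    def __init__(self, free_order, delayed=(), busy=()):
        self.queues = [[] for _ in range(NUM_QUEUES)]
        by_blk = {}
        for n in BLOCKS:
            buf = Buf(n)
            buf.delayed_write = n in delayed
            buf.busy = n in busy
            by_blk[n] = buf
            self.queues[n % NUM_QUEUES].append(buf)
        self.free = deque(by_blk[n] for n in free_order)
        self.written = []          # blocks handed to the pretend disk writer

    def getblk(self, blkno):
        """Return (buffer, scenario); buffer is None where the kernel would sleep."""
        queue = self.queues[blkno % NUM_QUEUES]
        for buf in queue:
            if buf.blkno == blkno:
                if buf.busy:
                    return None, 5             # would sleep until it is free
                buf.busy = True
                self.free.remove(buf)
                return buf, 1
        skipped_delayed = False
        while self.free:
            buf = self.free.popleft()          # always take from the head
            buf.busy = True
            if buf.delayed_write:
                self.written.append(buf.blkno) # start the write, try the next one
                skipped_delayed = True
                continue
            self.queues[buf.blkno % NUM_QUEUES].remove(buf)
            buf.blkno = blkno                  # reassign the buffer
            queue.append(buf)
            return buf, (3 if skipped_delayed else 2)
        return None, 4                         # would sleep until any buffer is free

print(Pool(FREE).getblk(4)[1])                  # block cached and free
print(Pool(FREE).getblk(18)[1])                 # not cached, free buffer available
print(Pool(FREE, delayed={3, 5}).getblk(18)[1]) # delayed-write buffers at the head
print(Pool([]).getblk(18)[1])                   # free list empty
print(Pool(FREE, busy={99}).getblk(99)[1])      # block cached but busy
```

Each call runs against a fresh pool, so the printed scenario numbers follow the order of the list above: 1, 2, 3, 4, 5.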
Program for Retrieval of a Buffer using getblk

Implementation Concept:
• The buffer pool is managed according to LRU.
• The kernel maintains a free list of buffers:
– a doubly linked list
– a buffer is taken from the head of the free list.
• When returning a buffer, the kernel attaches it to
the tail.
• When the kernel accesses a disk block, it searches
the hash queues:
– separate queues (doubly linked circular lists)
– hashed as a function of the device and block number
– every disk block in the pool exists on one and only
one hash queue, and only once on that queue.
• The kernel determines the logical device number and
block number to search for.
Hash queue headers

blkno 0 mod 4   28   4  64
blkno 1 mod 4   17   5  97
blkno 2 mod 4   98  50  10
blkno 3 mod 4    3  35  99

Figure 3.3 Buffers on the Hash Queues


• The algorithms for reading and writing disk blocks
use the algorithm getblk
– The kernel finds the block on its hash queue
• The buffer is free.
• The buffer is currently busy.
– The kernel cannot find the block on the hash queue
• The kernel allocates a buffer from the free list.
• In attempting to allocate a buffer from the free list, finds a
buffer on the free list that has been marked “delayed write”.
• The free list of buffers is empty.
Retrieval of a Buffer: 1st Scenario (a)
• The kernel finds the block on the hash queue and its buffer is
free
Hash queue headers

blkno 0 mod 4   28   4  64
blkno 1 mod 4   17   5  97
blkno 2 mod 4   98  50  10
blkno 3 mod 4    3  35  99

freelist header

Search for block 4

Retrieval of a Buffer: 1st Scenario (b)

blkno 0 mod 4   28   4  64
blkno 1 mod 4   17   5  97
blkno 2 mod 4   98  50  10
blkno 3 mod 4    3  35  99

freelist header

Remove block 4 from free list

Before continuing to the other scenarios, let's see what happens after the buffer is
allocated. The kernel may read the data, manipulate it and/or change it in the buffer.
While it does so, the buffer stays marked busy so that no other process can access
the block. When the kernel is done with the block, it releases the buffer
using the brelse algorithm.
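The release step can be sketched in the same toy style. This is a deliberately reduced version covering only what the slides describe (clear the busy mark, put the buffer at the tail of the free list); the Buf class, the wanted flag name and the boolean return value are assumptions for illustration, and a real brelse has more cases.

```python
from collections import deque

class Buf:
    def __init__(self, blkno):
        self.blkno = blkno
        self.busy = True      # allocated by getblk and in use
        self.wanted = False   # set when another process is waiting for it

def brelse(free_list, buf):
    """Release buf; return True if waiting processes should be woken."""
    wake = buf.wanted
    buf.wanted = False
    buf.busy = False
    free_list.append(buf)     # tail of the free list = most recently used
    return wake

free_list = deque()
a, b = Buf(4), Buf(18)
b.wanted = True               # pretend some process slept waiting for block 18
print(brelse(free_list, a), brelse(free_list, b))
print([buf.blkno for buf in free_list])
```

Returning the buffer to the tail is what keeps the free list in least-recently-used order for the next getblk call.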
Retrieval of a Buffer: 2nd Scenario (a)
• The kernel cannot find the block on the hash queue, so it
allocates a buffer from the free list
Hash queue headers

blkno 0 mod 4   28   4  64
blkno 1 mod 4   17   5  97
blkno 2 mod 4   98  50  10
blkno 3 mod 4    3  35  99

freelist header

Search for block 18: Not in cache

Retrieval of a Buffer: 2nd Scenario (b)
Hash queue headers

blkno 0 mod 4   28   4  64
blkno 1 mod 4   17   5  97
blkno 2 mod 4   98  50  10  18
blkno 3 mod 4   35  99

freelist header

Remove 1st block from free list: Assign to 18

Retrieval of a Buffer: 3rd Scenario (a)
• The kernel cannot find the block on the hash queue, and finds delayed
write buffers on the free list
Hash queue headers

blkno 0 mod 4   28   4  64
blkno 1 mod 4   17   5 (delay)  97
blkno 2 mod 4   98  50  10
blkno 3 mod 4    3 (delay)  35  99

freelist header

(a) Search for block 18, Delayed write blocks on free list

Retrieval of a Buffer: 3rd Scenario (b)
Hash queue headers

blkno 0 mod 4   28  64
blkno 1 mod 4   17   5 (writing)  97
blkno 2 mod 4   98  50  10  18
blkno 3 mod 4    3 (writing)  35  99

freelist header

(b) Writing Blocks 3, 5, Reassign 4 to 18

Figure 3.8

Retrieval of a Buffer: 4th Scenario
• The kernel cannot find the buffer on the hash queue, and the free list is
empty
Hash queue headers

blkno 0 mod 4   28   4  64
blkno 1 mod 4   17   5  97
blkno 2 mod 4   98  50  10
blkno 3 mod 4    3  35  99

freelist header

Search for block 18, free list empty


Race for free buffer
Retrieval of a Buffer: 5th Scenario
• The kernel finds the buffer on the hash queue, but it is currently busy

Hash queue headers

blkno 0 mod 4   28   4  64
blkno 1 mod 4   17   5  97
blkno 2 mod 4   98  50  10
blkno 3 mod 4    3  35  99 (busy)

freelist header

Search for block 99, block busy


Race for a Locked buffer
Output:
Algorithms for Reading and writing disk blocks
Advantages of the buffer cache

• Uniform disk access => simpler system design (because the kernel
does not need to know the reason for the I/O).
• Copying data between user buffers and system buffers
=> the kernel eliminates the need for special
alignment of user buffers, making user programs simpler
and more portable.
• Use of the buffer cache can reduce the amount of disk
traffic, thereby increasing overall system throughput and
decreasing response time.
• A single image of each disk block in the cache => helps
ensure file system integrity (prevents data corruption).
Disadvantages of the buffer cache

• Since the kernel does not immediately write data
to disk for a delayed write => the system is vulnerable
to crashes that leave disk data in an incorrect state.
• Use of the buffer cache requires an extra data copy
when reading and writing to and from user
processes => slower performance when
transmitting large amounts of data.
