
I/O Management and Disk Scheduling
Chapter 11
I/O Driver
• OS module which controls an I/O device
• hides the device specifics from the layers above it in the OS/kernel
• translates logical I/O into device I/O (logical disk blocks into {track, head, sector})
• performs data buffering and scheduling of I/O operations
• structure: several synchronous entry points (device initialization, queue I/O requests, state control, read/write) and an asynchronous entry point (to handle interrupts)
Typical driver structure
driver_init() { ... }                     /* synchronous: device initialization */

driver_ioctl(request) { ... }             /* synchronous: device state control */

driver_strategy(request)                  /* synchronous: queue an I/O request */
{
    if (empty(request_queue))
        driver_start(request);            /* device idle: start the I/O now */
    else
        add(request, request_queue);      /* device busy: enqueue the request */
    block_current_process();              /* caller sleeps until the I/O completes */
    reschedule();
}

driver_start(request)
{
    current_request = request;
    start_dma(request);                   /* program the device; returns immediately */
}

driver_interrupt(state)                   /* asynchronous part: interrupt handler */
{
    if ((state == ERROR) && (retries++ < MAX)) {
        driver_start(current_request);    /* retry the failed request */
        return;
    }
    add_current_process_to_active_queue;  /* wake the blocked caller */
    if (!empty(request_queue))
        driver_start(get_next(request_queue));  /* start the next queued request */
}
User to Driver Control Flow

[Figure: flow of an I/O request from user space to the driver. A user read/write/ioctl call enters the kernel; an ordinary file goes through the File System, while a special file addresses the device directly. Block devices are reached through the Buffer Cache and the driver_strategy entry point; character devices are reached through the character queue and driver_read/driver_write.]
I/O Buffering
• before an I/O request is issued, the source/destination of the transfer must be locked in memory
• I/O buffering: data is copied from user space into kernel buffers which are pinned in memory
• works for character devices (terminals), network and disks
• buffer cache: a buffer in main memory for disk sectors
• character queue: follows the producer/consumer model (characters in the queue are read once; see the sketch after this list)
• unbuffered I/O to/from disk (block device) is also possible: VM paging, for instance
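
The character queue's producer/consumer model is essentially a bounded ring buffer shared by the device interrupt handler (producer) and the reading process (consumer). A minimal single-producer/single-consumer sketch in C; the names cq_put/cq_get and the queue size are illustrative assumptions, not from the slides:

#include <stdbool.h>

#define CQ_SIZE 256   /* queue capacity (assumed) */

struct char_queue {
    char buf[CQ_SIZE];
    int head;         /* next slot to read (consumer index) */
    int tail;         /* next slot to write (producer index) */
};

/* Producer side, e.g. the device interrupt handler.
   Returns false (character dropped) if the queue is full. */
bool cq_put(struct char_queue *q, char c)
{
    int next = (q->tail + 1) % CQ_SIZE;
    if (next == q->head)
        return false;                /* full */
    q->buf[q->tail] = c;
    q->tail = next;
    return true;
}

/* Consumer side, e.g. the reading process; each character is read
   exactly once, as the slide notes. Returns false if empty. */
bool cq_get(struct char_queue *q, char *c)
{
    if (q->head == q->tail)
        return false;                /* empty */
    *c = q->buf[q->head];
    q->head = (q->head + 1) % CQ_SIZE;
    return true;
}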
Buffer Cache
• when an I/O request is made for a sector, the buffer cache is checked first
• if the sector is missing from the cache, it is read into the buffer cache from the disk (sketched below)
• exploits locality of reference, like any other cache
• replacements are usually done in chunks (a whole track can be written back at once to minimize seek time)
• replacement policies are global and controlled by the kernel
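
The check-then-fill behavior can be sketched as follows; the hash-table layout and the names (cache_lookup, bread, alloc_buffer, disk_read) are assumptions for illustration:

#include <stddef.h>

#define NBUCKETS 1024

struct buffer {
    long sector;                 /* disk sector cached in this buffer */
    char data[512];
    struct buffer *next;         /* hash-chain link */
};

struct buffer *hash_table[NBUCKETS];

struct buffer *alloc_buffer(long sector);   /* evicts per replacement policy (assumed) */
void disk_read(long sector, char *dst);     /* synchronous disk read (assumed) */

/* Check the cache first ... */
struct buffer *cache_lookup(long sector)
{
    struct buffer *b;
    for (b = hash_table[sector % NBUCKETS]; b != NULL; b = b->next)
        if (b->sector == sector)
            return b;            /* hit: no disk I/O needed */
    return NULL;
}

/* ... and go to the disk only on a miss. */
struct buffer *bread(long sector)
{
    struct buffer *b = cache_lookup(sector);
    if (b == NULL) {
        b = alloc_buffer(sector);
        disk_read(sector, b->data);
    }
    return b;
}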
Replacement policies
• buffer cache organized like a stack: replace from the bottom
• LRU: replace the block that has been in the cache longest with no reference to it (on reference a block is moved to the top of the stack)
• LFU: replace the block with the fewest references (counters which are incremented on reference and blocks move accordingly)
• frequency-based replacement: define a "new" section at the top of the stack; a block's counter is left unchanged while the block is in the new section (so a short burst of references does not inflate its count)
Least Recently Used
• The block that has been in the cache the longest with no reference to it is replaced
• The cache consists of a stack of blocks
• Most recently referenced block is on the top of the stack
• When a block is referenced or brought into the cache, it is placed on the top of the stack
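
A minimal sketch of the LRU stack described above, kept as a doubly linked list; the names (lru_touch, lru_victim) are illustrative:

#include <stddef.h>

/* LRU stack: top = most recently used, bottom = replacement victim. */
struct block {
    long sector;
    struct block *prev, *next;   /* NULL for a block not yet in the list */
};

struct block *top, *bottom;

/* On a reference, or when a block is brought in, move it to the top. */
void lru_touch(struct block *b)
{
    if (b == top)
        return;
    /* unlink from its current position (no-op for a fresh block) */
    if (b->prev) b->prev->next = b->next;
    if (b->next) b->next->prev = b->prev;
    if (b == bottom) bottom = b->prev;
    /* push on top of the stack */
    b->prev = NULL;
    b->next = top;
    if (top) top->prev = b;
    top = b;
    if (bottom == NULL) bottom = b;
}

/* Replace from the bottom: in the cache longest with no reference. */
struct block *lru_victim(void) { return bottom; }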
Least Frequently Used
• The block that has experienced the fewest references is replaced
• A counter is associated with each block
• Counter is incremented each time the block is accessed
• Block with the smallest count is selected for replacement
• Problem: some blocks may be referenced many times in a short period of time and then not be needed any more, yet their high count keeps them in the cache
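
A compact LFU sketch over a fixed array of cache slots; the linear victim scan and all names are illustrative assumptions:

#define NBLOCKS 128

struct block {
    long sector;
    unsigned count;              /* reference counter */
};

struct block cache[NBLOCKS];

/* Counter is incremented on each access. */
void lfu_touch(struct block *b) { b->count++; }

/* Victim: the block with the smallest reference count. */
struct block *lfu_victim(void)
{
    struct block *victim = &cache[0];
    for (int i = 1; i < NBLOCKS; i++)
        if (cache[i].count < victim->count)
            victim = &cache[i];
    return victim;
}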
Application-controlled File Caching
• two-level block replacement: responsibility is split between kernel and user level
• a global allocation policy performed by the kernel, which decides which process will give up a block
• a block replacement policy decided by the user:
  – the kernel provides the candidate block as a hint to the process
  – the process can overrule the kernel's choice by suggesting an alternative block
  – the suggested block is replaced by the kernel
  – example of an alternative replacement policy: most-recently-used (MRU)
Sound kernel-user cooperation
• oblivious processes should do no worse than under LRU
• foolish processes should not hurt other processes
• smart processes should perform better than LRU whenever possible, and should never perform worse
  – if the kernel selects block A and the user chooses B instead, the kernel swaps the positions of A and B in the LRU list and places B in a "placeholder" which points to A (the kernel's choice)
  – if the user process misses on B (i.e. it made a bad choice) and B is found in the placeholder, then the block pointed to by the placeholder is chosen (prevents hurting other processes)
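
A sketch of the placeholder bookkeeping described above; the data structures, table size, and all names are illustrative assumptions:

void swap_lru_positions(long a, long b);   /* kernel primitive (assumed) */
long kernel_lru_victim(void);              /* kernel's global LRU choice (assumed) */

struct placeholder {
    long replaced;        /* block the user gave up (B) */
    long kernel_choice;   /* block the kernel had proposed (A) */
};

struct placeholder table[64];   /* per-process table, size assumed */
int nplaceholders;

/* Kernel proposes victim A; the user process overrules with B.
   Swap their LRU positions and remember the decision. */
void record_overrule(long a, long b)
{
    swap_lru_positions(a, b);
    table[nplaceholders].replaced = b;
    table[nplaceholders].kernel_choice = a;   /* placeholder points to A */
    nplaceholders++;
}

/* On a miss for `sector`: if it sits in a placeholder, the user chose
   badly, so evict the kernel's original choice instead. */
long victim_on_miss(long sector)
{
    for (int i = 0; i < nplaceholders; i++)
        if (table[i].replaced == sector)
            return table[i].kernel_choice;
    return kernel_lru_victim();
}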
Disk Performance Parameters
• To read or write, the disk head must be positioned at the desired track and at the beginning of the desired sector
• Seek time
  – time it takes to position the head at the desired track
• Rotational delay (rotational latency)
  – time it takes for the beginning of the sector to reach the head
Disk Performance Parameters
• Access time
  – sum of seek time and rotational delay
  – the time it takes to get into position to read or write
• Data transfer occurs as the sector moves under the head
Disk I/O Performance
• disks are at least four orders of magnitude slower than main memory
• the performance of disk I/O is vital for the performance of the computer system as a whole
• typical disk performance parameters:
  – seek time (to position the head at the track): ~20 ms
  – rotational delay (to reach the sector): ~8.3 ms
  – transfer rate: 1-2 MB/sec
  – access time (seek time + rotational delay) >> transfer time for a sector
  – therefore the order in which sectors are read matters a lot
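
To make the "access time >> transfer time" claim concrete, a small computation with the numbers above (512-byte sector; 1.5 MB/sec assumed as a mid-range transfer rate):

#include <stdio.h>

int main(void)
{
    double seek_ms     = 20.0;   /* from the slide */
    double rotation_ms = 8.3;    /* from the slide */
    double rate_bps    = 1.5e6;  /* 1.5 MB/sec, assumed mid-range */
    double sector_b    = 512.0;

    double transfer_ms = sector_b / rate_bps * 1e3;   /* ~0.34 ms */
    double access_ms   = seek_ms + rotation_ms;       /* 28.3 ms */

    printf("access %.1f ms vs transfer %.2f ms (ratio ~%.0f)\n",
           access_ms, transfer_ms, access_ms / transfer_ms);
    /* prints: access 28.3 ms vs transfer 0.34 ms (ratio ~83) */
    return 0;
}

Getting into position costs roughly 80 times more than the transfer itself, which is why the scheduling policies below focus on head movement.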
Disk Scheduling Policies
• Seek time is the reason for differences in performance
• For a single disk there will be a number of I/O requests
• If requests are selected randomly, we will get the worst possible performance
Disk Scheduling Policies
• First-in, first-out (FIFO)
  – Processes requests sequentially
  – Fair to all processes
  – Approaches random scheduling in performance if there are many processes
Disk Scheduling Policies
• Priority
  – Goal is not to optimize disk use but to meet other objectives
  – Short batch jobs may have higher priority
  – Provides good interactive response time
Disk Scheduling Policies
• Last-in, first-out (LIFO)
  – Good for transaction processing systems
    • the device is given to the most recent user, so there should be little arm movement
  – Possibility of starvation, since a job may never regain the head of the line
Disk Scheduling Policies
• Shortest Service Time First (SSTF)
  – Select the disk I/O request that requires the least movement of the disk arm from its current position
  – Always chooses the minimum seek time
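
A minimal SSTF selection over an array of pending track numbers; the representation and the function name are illustrative assumptions:

#include <stdlib.h>   /* abs */

/* SSTF: pick the pending request whose track is closest to the current
   head position. Returns the index of the chosen request. */
int sstf_pick(const int tracks[], int n, int head)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (abs(tracks[i] - head) < abs(tracks[best] - head))
            best = i;
    return best;
}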
Disk Scheduling Policies
• SCAN
  – Arm moves in one direction only, satisfying all outstanding requests until it reaches the last track in that direction
  – Then the direction is reversed
Disk Scheduling Policies
• C-SCAN
  – Restricts scanning to one direction only
  – When the last track has been visited in one direction, the arm is returned to the opposite end of the disk and the scan begins again
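
A sketch of SCAN's service order: sort the pending tracks, serve everything from the head position outward in the current direction, then sweep back. The array representation and names are illustrative:

#include <stdlib.h>   /* qsort */

static int cmp_int(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

/* Fill `order` with SCAN's service order for the pending `tracks`,
   starting at `head` and initially moving toward higher tracks. */
void scan_order(int tracks[], int n, int head, int order[])
{
    int k = 0;
    qsort(tracks, n, sizeof(int), cmp_int);
    /* outward sweep: requests at or above the head, ascending */
    for (int i = 0; i < n; i++)
        if (tracks[i] >= head)
            order[k++] = tracks[i];
    /* direction reversed: remaining requests, descending */
    for (int i = n - 1; i >= 0; i--)
        if (tracks[i] < head)
            order[k++] = tracks[i];
}

For C-SCAN, the second loop would instead restart from the lowest track and run ascending again.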
Disk Scheduling Policies
• N-step-SCAN
  – Segments the disk request queue into subqueues of length N
  – Subqueues are processed one at a time, using SCAN
  – New requests are added to another subqueue while a queue is being processed
• FSCAN
  – Uses two subqueues
  – While one queue is being scanned, the other (initially empty) queue collects new requests
Disk Scheduling Policies
• scheduling is usually based on the position of the requested sector rather than on the priority of the requesting process
• shortest-service-time-first (SSTF): pick the request that requires the least movement of the head
• SCAN (back and forth over the disk): good distribution of service
• C-SCAN (one way with fast return): lower service variability, but the head may not be moved for a considerable period of time
• N-step-SCAN: scan N requests at a time by breaking the request queue into segments of size at most N
• FSCAN: uses two subqueues; during a scan one queue is consumed while the other one is filled
RAID
• Redundant Array of Independent Disks (RAID)
• idea: replace large-capacity disks with multiple smaller-capacity drives to improve I/O performance
• RAID is a set of physical disk drives viewed by the OS as a single logical drive
• data are distributed across the physical drives in a way that enables simultaneous access to data from multiple drives
• redundant disk capacity is used to compensate for the increase in the probability of failure due to multiple drives
• six RAID levels (design architectures)
RAID Level 0
• does not include redundancy
• data is striped across the available disks
  – the disk is divided into strips
  – strips are mapped round-robin to consecutive disks
  – a set of consecutive strips that maps exactly one strip to each array member is called a stripe

  disk 0    disk 1    disk 2    disk 3
  strip 0   strip 1   strip 2   strip 3
  strip 4   strip 5   strip 6   strip 7
  ...
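
The round-robin mapping from a logical strip number to a physical (disk, offset) pair is plain modular arithmetic; a sketch assuming a 4-disk array:

#define NDISKS 4   /* array members (assumed) */

struct location {
    int disk;      /* which physical drive holds the strip */
    int offset;    /* strip index within that drive */
};

/* RAID 0 round-robin: logical strip i -> disk i mod N, offset i / N. */
struct location raid0_map(int strip)
{
    struct location loc = { strip % NDISKS, strip / NDISKS };
    return loc;    /* e.g. strip 5 -> disk 1, offset 1 */
}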
RAID Level 1
• redundancy achieved by duplicating all the data
• each logical disk is mapped to two separate physical disks, so that every disk has a mirror disk containing the same data
  – a read can be serviced by either of the two disks that contain the requested data (improved performance over RAID 0 if reads dominate)
  – a write request must be performed on both disks, but the two writes can be done in parallel
  – recovery is simple, but cost is high

  disk 0    disk 1    mirror 0   mirror 1
  strip 0   strip 1   strip 0    strip 1
  strip 2   strip 3   strip 2    strip 3
  ...
RAID Levels 2 and 3
• parallel access: all disks participate in every I/O request
• small strips (byte or word size)
• RAID 2: an error-correcting code (Hamming) is calculated across corresponding bits on each data disk and stored on log(data) parity disks; necessary only if the error rate is high
• RAID 3: a single redundant disk keeps the parity, computed across the data disks (+ denotes bitwise XOR):

  P(i) = X2(i) + X1(i) + X0(i)

• in the event of a failure, data can be reconstructed, but only one request at a time can be satisfied; e.g. if disk X2 fails:

  X2(i) = P(i) + X1(i) + X0(i)
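
Because XOR is its own inverse, reconstruction is the same operation as parity generation; a sketch for the 3-data-disk array above (strip size and names are assumptions):

#include <stddef.h>

#define STRIP 512   /* strip size in bytes (assumed) */

/* Rebuild the failed disk X2 from parity and the surviving data disks:
   X2(i) = P(i) ^ X1(i) ^ X0(i). */
void raid3_rebuild(const unsigned char *p, const unsigned char *x1,
                   const unsigned char *x0, unsigned char *x2_out)
{
    for (size_t i = 0; i < STRIP; i++)
        x2_out[i] = p[i] ^ x1[i] ^ x0[i];
}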
RAID Levels 4 and 5
• independent access: each disk operates independently, so multiple I/O requests can be satisfied in parallel
• large strips
• RAID 4: a small write costs 2 reads + 2 writes (read old strip and old parity, write new strip and new parity)
  example: if the write modifies only strip X0 (using X0(i) + X0(i) = 0):

  P'(i) = X2(i) + X1(i) + X0'(i)
        = X2(i) + X1(i) + X0'(i) + X0(i) + X0(i)
        = P(i) + X0'(i) + X0(i)
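
The formula above translates directly into a read-modify-write parity update; a sketch with assumed strip size and names:

#include <stddef.h>

#define STRIP 512   /* strip size in bytes (assumed) */

/* Small write on one strip: P' = P ^ X0 ^ X0'. The two reads fetch
   old data and old parity; the two writes store new data and P'. */
void parity_update(const unsigned char *old_data,
                   const unsigned char *new_data,
                   unsigned char *parity)   /* in: P, out: P' */
{
    for (size_t i = 0; i < STRIP; i++)
        parity[i] ^= old_data[i] ^ new_data[i];
}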

• RAID 5: like RAID 4, but the parity strips are distributed round-robin across all disks (avoids the single-parity-disk bottleneck)

  disk 0    disk 1    disk 2    disk 3
  strip 0   strip 1   strip 2   P(0-2)
  strip 3   strip 4   P(3-5)    strip 5
  ...
UNIX SVR4 I/O
