
Operating System

Second Edition

Rohit Khurana
Founder and CEO
ITLESL, Delhi
VIKAS® PUBLISHING HOUSE PVT LTD
E-28, Sector-8, Noida-201301 (UP) India
Phone: +91-120-4078900 • Fax: +91-120-4078999
Registered Office: 576, Masjid Road, Jangpura, New Delhi-110014. India
E-mail: helpline@vikaspublishing.com
• Website: www.vikaspublishing.com
• Ahmedabad : 305, Grand Monarch, 100 ft Shyamal Road, Near
Seema Hall, Ahmedabad-380051 • Ph. +91-79-
65254204, +91-9898294208
• Bengaluru : First Floor, N S Bhawan, 4th Cross, 4th Main,
Gandhi Nagar, Bengaluru-560009 • Ph. +91-80-
22281254, 22204639
• Chennai : E-12, Nelson Chambers, 115, Nelson Manickam
Road, Aminjikarai, Chennai-600029 • Ph. +91-44-
23744547, 23746090
• Hyderabad : Aashray Mansion, Flat-G (G.F.), 3-6-361/8, Street
No. 20, Himayath Nagar, Hyderabad-500029 • Ph.
+91-40-23269992 • Fax. +91-40-23269993
• Kolkata : 82, Park Street, Kolkata-700017 • Ph. +91-33-
22837880
• Mumbai : 67/68, 3rd Floor, Aditya Industrial Estate, Chincholi
Bunder, Behind Balaji International School &
Evershine Mall, Malad (West), Mumbai-400064 •
Ph. +91-22-28772545, 28768301
• Patna : Flat No. 101, Sri Ram Tower, Beside Chiraiyatand
Over Bridge, Kankarbagh Main Rd., Kankarbagh,
Patna-800020 • Ph. +91-612-2351147

Operating System
ISBN: 978-93259-7563-7

First Edition 2013


Second Edition 2014

Vikas® is the registered trademark of Vikas Publishing House Pvt Ltd


Copyright © Author, 2013

All rights reserved. No part of this publication which is material protected by this copyright notice may be reproduced or transmitted or utilized or stored in any form or by any means now known or hereinafter invented, electronic, digital or mechanical, including photocopying, scanning, recording or by any information storage or retrieval system, without prior written permission from the publisher.

Information contained in this book has been published by VIKAS® Publishing House Pvt Ltd and has been obtained by its Authors from sources believed to be reliable and is correct to the best of their knowledge. However, the Publisher and its Authors shall in no event be liable for any errors, omissions or damages arising out of use of this information and specifically disclaim any implied warranties of merchantability or fitness for any particular use. Disputes, if any, are subject to Delhi Jurisdiction only.
Preface to the Second Edition

It is a well-known fact that the operating system is an integral part of a computer system. It is this which makes the computer system function. The market today is constantly witnessing the upgradation of popular operating systems, such as Microsoft Windows, Linux, Mac OS X, Android, and many more. The continuous advancements in technology have led the universities to revise their curricula so as to suit the knowledge and learning needs of their students. To keep pace with the various updated curricula, as well as to equip budding system programmers with the right knowledge and expertise, we have come up with the second edition of Operating System.
The revised edition has become more comprehensive with the inclusion of several new topics. More emphasis has been laid on the aesthetics of the text to make it more presentable and appealing. Like its previous edition, it continues to provide in-depth coverage of the fundamentals as well as advanced topics in the discipline. In addition, certain sections of the book have been thoroughly revised. I hope the readers will find the present edition more helpful and informative in expanding their knowledge of operating systems.

Key Additions
Though several enhancements have been made to the text, the following are the key additions to this new edition.
• Chapter 1 introduces two more types of operating system,
including Personal Computer (PC) Operating System and Mobile
Operating System.
• Chapter 2 now also includes the methods used for
communication in client-server systems: Socket, Remote
Procedure Call (RPC) and Remote Method Invocation (RMI).
• The topics Thread Library and Thread Scheduling have been
added in Chapters 3 and 4, respectively.
• A few topics, including Principles of Concurrency, Precedence
Graph, Concurrency Conditions and Sleeping Barber Problem
have been added in Chapter 5.
• Chapter 7 comprises an additional topic Structure of Page
Tables.
• Chapter 8 introduces topics like Demand Segmentation and
Cache Memory Organization.
• Chapter 9 now covers the concept of STREAMS. It also throws
light on how I/O requests from the users are transformed to
hardware operations.
• Chapters 10 and 11 add topics such as Disk Attachment, Stable
Storage, Tertiary Storage, Record Blocking and File Sharing.
• The text of Chapter 13 has been completely overhauled and includes new topics, such as Goals and Principles of Protection, Access Control Matrix and its Implementation, Revocation of Access Rights, Cryptography, Trusted Systems, and Firewalls.

Chapter Organization
The text is organized into 17 chapters.
• Chapter 1 introduces operating system, its services, and
structure. Also, it provides an insight into the organization of
computer system.
• Chapter 2 deals essentially with basic concepts of processes,
such as process scheduling, operations on processes and
communication between processes. It also introduces the
methods used for communication in client-server systems.
• Chapter 3 helps to understand the need and advantages of
threads, various multithreading models as well as threading
issues. It also introduces the concept of thread libraries and
discusses various operations on the threads of Pthread library.
• Chapter 4 spells out the scheduling criteria and different types of
scheduling algorithms. It also discusses several issues regarding
scheduling in multiprocessor and real-time systems.
• Chapter 5 throws light on several methods used for achieving
synchronization among cooperating processes.
• Chapter 6 describes the deadlock situation and the conditions
that lead to deadlock. It also provides methods for handling
deadlock.
• Chapter 7 familiarises the reader with the various memory
management strategies used for contiguous and non-contiguous
memory allocation.
• Chapter 8 introduces the concept of virtual memory. It also
discusses how virtual memory is implemented using demand
paging and demand segmentation.
• Chapter 9 discusses system I/O in detail, including the I/O system
design, interfaces, and functions. It also explains the STREAMS
mechanism of UNIX System V and the transformation of I/O
requests into hardware operation.
• Chapter 10 explains disk scheduling algorithms, disk
management, swap-space management and RAID. It also
introduces the concept of stable and tertiary storage.
• Chapter 11 acquaints the readers with basic concepts of files
including file types, attributes, operations, structure, and access
methods. It also describes the concepts of file-system mounting,
file sharing, record blocking and protection.
• Chapter 12 explores how the files and directories are
implemented. Management of free space on the disk is explained
as well.
• Chapter 13 impresses upon the reader the need for security and protection in computer systems. It also explains methods to implement them.
• Chapter 14 sheds light on multiprocessor and distributed systems
including their types, architecture and benefits. It also describes
the distributed file system.
• Chapter 15 covers the UNIX operating system including its
development and structure. It discusses how processes,
memory, I/O, files, and directories are managed in UNIX. It also
introduces elementary shell programming.
• Chapter 16 presents an in-depth examination of the Linux
operating system. It describes how theoretical concepts of
operating system relate to one another and to practice.
• Chapter 17 expounds on implementation of various operating
system concepts in Windows 2000 operating system.
Acknowledgement

In all my efforts towards making this book a reality, my special thanks go to my technical and editorial teams, without whom this work
would not have achieved its desired level of excellence. I sincerely
extend my thanks to my research and development team for
devoting their time and relentless effort in bringing out this high-
quality book. I convey my gratitude to my publisher Vikas Publishing
House Pvt. Ltd for sharing this dream and giving all the support in
realizing it.
In our attempt towards further improvement, I welcome you all to
send your feedback to itlesl@rediffmail.com. I will highly appreciate
all your constructive comments.
I hope you will enjoy reading the book and hope it proves to be a
good resource for all.

Rohit Khurana
Founder and CEO
ITLESL, Delhi
Contents

Preface to the Second Edition


Acknowledgement

1. Introduction to Operating System


1.1 Introduction
1.2 Operating System: Objectives and Functions
1.3 Different Views of an Operating System
1.4 Evolution of Operating Systems
1.4.1 Serial Processing 1.4.2 Batch Processing
1.4.3 Multiprogramming
1.5 Types of Operating Systems
1.5.1 Batch Operating Systems
1.5.2 Multiprogramming Operating Systems
1.5.3 Time-sharing Systems
1.5.4 Real-time Operating Systems
1.5.5 Distributed Operating Systems
1.5.6 Personal Computer Operating Systems
1.5.7 Mobile Operating Systems
1.6 Comparison between Different Operating Systems
1.7 Computer System Organization
1.7.1 Computer System Operation
1.7.2 Storage Structure 1.7.3 I/O Structure
1.8 Computer System Architecture
1.8.1 Single-Processor Systems 1.8.2 Multiprocessor Systems
1.8.3 Clustered Systems
1.9 Operating System Operations
1.9.1 Dual-Mode Operation 1.9.2 Timer
1.10 Operating-System Structures
1.10.1 System Components 1.10.2 Operating-system Services
1.10.3 User Operating-System Interface
1.10.4 System Calls 1.10.5 System Programs
1.10.6 System Structure 1.10.7 Virtual Machines
Let us Summarize
Exercises

2. Process Management
2.1 Introduction
2.2 Process Concept
2.2.1 The Process 2.2.2 Process States
2.2.3 Process Control Block (PCB)
2.3 Process Scheduling
2.4 Operations on Processes
2.4.1 Process Creation 2.4.2 Process Termination
2.5 Cooperating Processes
2.6 Inter-process Communication
2.6.1 Shared Memory Systems 2.6.2 Message Passing Systems
2.7 Communication in Client-Server Systems
2.7.1 Socket 2.7.2 Remote Procedure Call (RPC)
2.7.3 Remote Method Invocation (RMI)
Let us Summarize
Exercises

3. Threads
3.1 Introduction
3.2 Thread Concept
3.2.1 Advantages of Threads 3.2.2 Implementation of Threads
3.3 Multithreading Models
3.3.1 Many-to-One (M:1) Model 3.3.2 One-to-One (1:1) Model
3.3.3 Many-to-Many (M:M) Model
3.4 Threading Issues
3.4.1 fork() and exec() System Calls 3.4.2 Thread Cancellation
3.4.3 Thread-specific Data
3.5 Thread Libraries
3.5.1 Pthreads Library
Let us Summarize
Exercises

4. CPU Scheduling
4.1 Introduction
4.2 Scheduling Concepts
4.2.1 Process Behaviour 4.2.2 When to Schedule
4.2.3 Dispatcher
4.3 Scheduling Criteria
4.4 Scheduling Algorithms
4.4.1 First-Come First-Served (FCFS) Scheduling
4.4.2 Shortest Job First (SJF) Scheduling
4.4.3 Shortest Remaining Time Next (SRTN) Scheduling
4.4.4 Priority-based Scheduling
4.4.5 Highest Response Ratio Next (HRN) Scheduling
4.4.6 Round Robin (RR) Scheduling 4.4.7 Multilevel Queue
Scheduling
4.4.8 Multilevel Feedback Queue Scheduling
4.5 Multiple Processor Scheduling
4.6 Real-time Scheduling
4.6.1 Hard Real-time Systems 4.6.2 Soft Real-time Systems
4.7 Algorithm Evaluation
4.8 Thread Scheduling
Let us Summarize
Exercises

5. Process Synchronization
5.1 Introduction
5.2 Principles of Concurrency
5.3 Precedence Graph
5.4 Critical Regions
5.4.1 Critical-Section Problem
5.5 Synchronization: Software Approaches
5.5.1 Strict Alternation: Attempt for Two-Process Solution
5.5.2 Dekker’s Algorithm: Two-Process Solution
5.5.3 Peterson’s Algorithm: Two-Process Solution
5.5.4 Bakery Algorithm: Multiple-Process Solution
5.6 Synchronization Hardware
5.7 Semaphores
5.8 Classical Problems of Synchronization
5.8.1 Producer-Consumer Problem
5.8.2 Readers-Writers Problem
5.8.3 Dining-Philosophers Problem
5.8.4 Sleeping Barber Problem
5.9 Monitors
5.10 Message Passing
Let us Summarize
Exercises

6. Deadlock
6.1 Introduction
6.2 System Model
6.3 Deadlock Characterization
6.3.1 Deadlock Conditions 6.3.2 Resource Allocation Graph
6.4 Methods for Handling Deadlocks
6.5 Deadlock Prevention
6.6 Deadlock Avoidance
6.6.1 Resource Allocation Graph Algorithm
6.6.2 Banker’s Algorithm
6.7 Deadlock Detection
6.7.1 Single Instance of Each Resource Type
6.7.2 Multiple Instances of a Resource Type
6.8 Deadlock Recovery
6.8.1 Terminating the Processes
6.8.2 Preempting the Resources
Let us Summarize
Exercises

7. Memory Management Strategies


7.1 Introduction
7.2 Background
7.3 Bare Machine
7.4 Contiguous Memory Allocation
7.4.1 Single Partition 7.4.2 Multiple Partitions
7.5 Non-contiguous Memory Allocation
7.5.1 Paging 7.5.2 Segmentation
7.5.3 Segmentation with Paging
7.6 Swapping
7.7 Overlays
Let us Summarize
Exercises

8. Virtual Memory
8.1 Introduction
8.2 Background
8.3 Demand Paging
8.3.1 Performance of Demand Paging
8.4 Process Creation
8.4.1 Copy-on-Write 8.4.2 Memory-Mapped Files
8.5 Page Replacement
8.5.1 FIFO Page Replacement 8.5.2 Optimal Page Replacement
8.5.3 LRU Page Replacement 8.5.4 Second Chance Page
Replacement
8.5.5 Counting-Based Page Replacement Algorithm
8.6 Allocation of Frames
8.7 Thrashing
8.7.1 Locality 8.7.2 Working Set Model
8.7.3 Page-fault Frequency (PFF)
8.8 Demand Segmentation
8.9 Cache Memory Organization
8.9.1 Terminologies Related to Cache
8.9.2 Impact on Performance
8.9.3 Advantages and Disadvantages of Cache Memory
Let us Summarize
Exercises

9. I/O Systems
9.1 Introduction
9.2 I/O Hardware
9.3 I/O Techniques
9.3.1 Polling 9.3.2 Interrupt-driven I/O
9.3.3 Direct Memory Access (DMA)
9.4 Application I/O Interface
9.5 Kernel I/O Subsystem
9.5.1 I/O Scheduling 9.5.2 Buffering 9.5.3 Caching
9.5.4 Spooling 9.5.5 Error Handling
9.6 Transforming I/O Requests to Hardware Operations
9.7 Streams
9.8 Performance
Let us Summarize
Exercises

10. Mass-Storage Structure


10.1 Introduction
10.2 Disk Structure
10.3 Disk Scheduling
10.3.1 First-Come, First-Served (FCFS) Algorithm
10.3.2 Shortest Seek Time First (SSTF) Algorithm
10.3.3 Scan Algorithm 10.3.4 Look Algorithm
10.3.5 C-Scan and C-Look Algorithms
10.4 Disk Management
10.4.1 Disk Formatting 10.4.2 Boot Block
10.4.3 Management of Bad Sectors
10.5 Swap-Space Management
10.6 RAID Structure
10.6.1 Improving Performance and Reliability
10.6.2 RAID Levels
10.7 Disk Attachment
10.7.1 Host-attached Storage 10.7.2 Network-attached Storage
10.8 Stable Storage
10.9 Tertiary Storage
10.9.1 Removable Disks 10.9.2 Magnetic Tapes
Let us Summarize
Exercises

11. File Systems


11.1 Introduction
11.2 Files: Basic Concept
11.2.1 File Attributes 11.2.2 File Operations
11.2.3 File Types 11.2.4 File Structure 11.2.5 File Access
11.3 Directories
11.3.1 Single-level Directory System
11.3.2 Two-level Directory System
11.3.3 Hierarchical Directory System 11.3.4 Directory
Operations
11.4 File-System Mounting
11.5 Record Blocking
11.6 File Sharing
11.6.1 File Sharing among Multiple Users
11.6.2 File Sharing in Remote File Systems
11.6.3 Consistency Semantics
11.7 Protection
11.7.1 Types of Access 11.7.2 Access Control
Let us Summarize
Exercises
12. Implementation of File System
12.1 Introduction
12.2 File System Structure
12.3 File System Implementation
12.3.1 Operating Structures 12.3.2 Partitions and Mounting
12.3.3 Virtual File System (VFS)
12.4 Allocation Methods
12.4.1 Contiguous Allocation 12.4.2 Linked Allocation
12.4.3 Indexed Allocation
12.5 Implementing Directories
12.5.1 Linear List 12.5.2 Hash Table
12.6 Shared Files
12.7 Free-Space Management
12.8 Efficiency and Performance
12.8.1 Efficiency 12.8.2 Performance
12.9 Recovery
12.10 Log-Structured File System
Let us Summarize
Exercises

13. Protection and Security


13.1 Introduction
13.2 Goals of Protection
13.3 Principles of Protection
13.4 Protection Mechanisms
13.4.1 Protection Domain 13.4.2 Access Control Matrix
13.5 Revocation of Access Rights
13.6 Security Problem
13.6.1 Intruders 13.6.2 Types of Security Violations
13.7 Design Principles for Security
13.8 Security Threats
13.8.1 Program Threats 13.8.2 System and Network Threats
13.9 Cryptography
13.9.1 Encryption
13.10 User Authentication
13.10.1 Passwords 13.10.2 One-time Passwords
13.10.3 Smart Card 13.10.4 Biometric Techniques
13.11 Trusted Systems
13.12 Firewalling to Protect Systems and Networks
Let us Summarize
Exercises

14. Multiprocessor and Distributed Operating Systems


14.1 Introduction
14.2 Multiprocessor Systems
14.2.1 Interconnection Networks
14.2.2 Architecture of Multiprocessor Systems
14.2.3 Types of Multiprocessor Operating System
14.3 Distributed Systems
14.3.1 Distributed Operating System
14.3.2 Storing and Accessing Data in Distributed Systems
14.4 Computer Networks
14.4.1 Types of Networks 14.4.2 Network Topology
14.4.3 Switching Techniques 14.4.4 Communication Protocols
14.5 Distributed File System
Let us Summarize
Exercises

15. Case Study: UNIX


15.1 Introduction
15.2 History of UNIX
15.3 UNIX Kernel
15.4 Process Management
15.4.1 Process Creation and Termination
15.4.2 Inter-process Communication 15.4.3 Process
Scheduling
15.4.4 System Calls for Process Management
15.5 Memory Management
15.5.1 Implementation of Memory Management
15.5.2 System Calls for Memory Management
15.6 File and Directory Management
15.6.1 UNIX File System
15.6.2 UNIX Directory Structure
15.6.3 System Calls for File and Directory Management
15.7 I/O Management
15.8 Elementary Shell Programming
15.8.1 Logging in UNIX 15.8.2 Basic Shell Commands
15.8.3 Standard Input, Output and Error 15.8.4 Re-direction
15.8.5 Wildcards 15.8.6 Filters 15.8.7 Shell Program
Let us Summarize
Exercises

16. Case Study: Linux


16.1 Introduction
16.2 The Linux System
16.3 Process and Thread Management
16.3.1 Creation and Termination of Processes and Threads
16.3.2 Process Scheduling
16.4 Memory Management
16.4.1 Physical Memory Management
16.4.2 Virtual Memory Management
16.5 File System
16.5.1 Linux ext2 File System 16.5.2 Linux ext3 File System
16.5.3 Linux proc File System
16.6 I/O Management
Let us Summarize
Exercises

17. Case Study: Windows


17.1 Introduction
17.2 Structure
17.2.1 Hardware Abstraction Layer (HAL)
17.2.2 Kernel 17.2.3 Executive
17.3 Process and Thread Management
17.3.1 Inter-process Communication (IPC)
17.3.2 Scheduling
17.4 Memory Management
17.4.1 Paging 17.4.2 Handling Page Faults
17.5 File System
17.5.1 NTFS Physical Structure 17.5.2 Metadata Files
17.5.3 Directory Implementation
17.6 I/O Management
Let us Summarize
Exercises

Glossary
Chapter 1

Introduction to Operating System

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ Define the term operating system along with its objectives and
functions.
⟡ Understand different views of an operating system.
⟡ Explore how operating systems have evolved.
⟡ Discuss different types of operating systems and compare them.
⟡ Describe the basic computer system organization and architecture.
⟡ Describe operations of an operating system.
⟡ Understand the components of an operating system.
⟡ List the services provided by an operating system.
⟡ Describe the two types of user interface.
⟡ Explain different types of system calls.
⟡ List different categories of system programs.
⟡ Discuss various ways of structuring an operating system.
⟡ Understand the concept of virtual machines.

1.1 INTRODUCTION
A computer system consists of two main components: the hardware
and the software. The hardware components include the central
processing unit (CPU), the memory, and the input/output (I/O)
devices. The software part comprises the system and application
programs such as compilers, text editors, word processors,
spreadsheets, database systems, etc. An application program is
developed by an application programmer in some programming
language. The application programmers and the end users (users
who interact with the application programs to solve their problems)
are generally not concerned with the details of the computer
hardware, and hence do not directly interact with it. Thus, to use the
various hardware components, the application programs and the
users need an intermediate layer that provides a convenient interface
to use the system. This layer is referred to as an operating system
(OS).

1.2 OPERATING SYSTEM: OBJECTIVES AND FUNCTIONS
In simple terms, the operating system is defined as a program that is
running at all times on the computer (usually called the kernel). It is a
program that acts as an interface between the computer users and
the computer hardware (see Figure 1.1). It manages the computer
hardware and controls and coordinates the use of hardware among
various application programs. The operating system also provides a
way in which the various computer resources such as hardware,
software, and the data can be used in a proper and efficient manner.
An operating system has two common objectives: convenience
and efficiency. An operating system is designed in such a way that it
makes the computer system more convenient to use, and allows the
system resources to be used in an efficient manner. Some operating
systems are designed for convenience (for example, PC operating
systems), some for efficiency (for example, mainframe operating
systems), and some for the combination of both. Another important
objective that operating systems are expected to meet is their ability
to evolve. An operating system should be designed in such a way
that the testing, development as well as addition of new system
components could be performed effectively without interfering with
existing services.

Fig. 1.1 Components of a Computer System

An operating system performs two basically unrelated functions: extending the machine and managing resources. Both these functions are described here.

Operating System as an Extended Machine


An operating system can be described as an extended machine that
hides the hardware details from the user and makes the computer
system convenient and easy to use. As we know, a computer system
consists of several complex hardware components, and it is very
difficult for an end user to directly interact with these components.
Imagine a scenario where a user has to specify the parameters such
as the address of the block to be read, the number of sectors per
track, the inter-sector gap spacing, etc., while accessing some data
from a floppy disk. In such situations, it would be quite complex for
the users to perform even simple operations. Instead the user just
needs to know the name of the file from which the data is to be
accessed. One need not be concerned about the internal details of
the disk.
It is the responsibility of the operating system to give an abstract
view of the system to its users, without giving any unnecessary
details of the hardware components. For example, an operating
system hides the complex details of disk hardware, and simply gives
a file-oriented interface to its users. In the same way, it hides other
low-level features such as memory management, interrupts, timers,
I/O management, etc. Thus, the function of an operating system is to
present the user with the equivalent of an extended machine (or
virtual machine) that is easier to deal with than the underlying
hardware.
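
To make the idea of an extended machine concrete, the short C program below reads data through the file-oriented interface that the operating system provides; the user supplies only a file name, and the operating system works out the disk blocks, sectors and head movements behind the scenes. This is only an illustrative sketch; the file name records.dat is invented for the example.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char line[256];
    /* The name of the file is all the user has to supply. */
    FILE *fp = fopen("records.dat", "r");
    if (fp == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }
    /* The operating system and the file system translate each read into
       the proper disk-block addresses, sector counts and head movements. */
    while (fgets(line, sizeof line, fp) != NULL)
        fputs(line, stdout);
    fclose(fp);
    return EXIT_SUCCESS;
}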

Operating System as a Resource Manager


A computer system comprises a wide variety of resources such as
files, databases, memory and I/O devices. As a resource manager, it
is the responsibility of the operating system to manage all these
resources, and allocate them to various programs and users in such
a way that the computer system can be operated in an efficient and
fair manner. In the case of a stand-alone environment, where a single
user uses all the computer resources, managing them is not a big
deal. However, in the case of a multiuser (or networked) environment,
where multiple users compete for the resources, the need for
managing and protecting the resources is even greater. The
operating system, in this case, is responsible for keeping track of
allocated resources, granting resource request (in case the resource
is free), accounting for usage of resource, and resolving conflicting
requests from different programs and users.
While managing resources, the operating system allows two types
of multiplexing (or sharing) of resources, namely, time multiplexing
and space multiplexing. In the case of time multiplexing of a
resource, different programs or users use it turn by turn. That is, the
resource is first allocated to any one of the users for a specified period
of time, then to another user, and so on. It is the responsibility of the
operating system to decide which user will get the resource next and
for how much time.
In the case of space multiplexing of a resource, each user gets
some of its portion, instead of taking turns. The main memory is the
most common example of space-multiplexed resource. The actively
running programs get some area in the main memory for their
execution. In this case, the main responsibility of the operating
system is to allocate and deallocate memory to the programs
whenever required, keep track of occupied and free parts of the
memory, and provide some mechanism to protect the memory
allocated to a program from being accessed by other programs.
Another example of space-multiplexed resource is hard disk, which
can hold multiple files of different users at the same time. The
operating system is responsible for allocating disk space and keeping
track of which user is using which disk blocks.
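
The toy C program below sketches the two sharing schemes just described; the three users, the number of CPU "ticks" and the twelve-word memory are invented purely for illustration.

#include <stdio.h>

#define USERS     3
#define MEM_WORDS 12            /* a toy "main memory" */

int main(void)
{
    /* Time multiplexing: the single CPU is used turn by turn. */
    for (int tick = 0; tick < 6; tick++)
        printf("tick %d: CPU allocated to user %d\n", tick, tick % USERS);

    /* Space multiplexing: each user owns a portion of memory at the same time. */
    int memory[MEM_WORDS];
    int share = MEM_WORDS / USERS;
    for (int u = 0; u < USERS; u++) {
        for (int w = 0; w < share; w++)
            memory[u * share + w] = u;   /* mark the words owned by user u */
        printf("user %d owns words %d to %d\n", u, u * share, (u + 1) * share - 1);
    }
    return 0;
}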

1.3 DIFFERENT VIEWS OF AN OPERATING SYSTEM
An operating system can be compared to a government. Like a
government, it provides an environment for the other programs so
that they can do useful work. We can understand the role of the
operating system more fully by exploring it from two different
viewpoints: the user point of view and the system point of view.

User View
In a stand-alone environment, where a single user sits in front of a
personal computer, the operating system is designed basically for the
ease of use, and some attention is also paid to system
performance. However, since these systems are designed for a
single user to monopolize the resources, there is no sharing of
hardware and software among multiple users. Therefore, no attention
is paid to resource utilization.
In a networked environment, where multiple users share
resources and may exchange information, the operating system is
designed for resource utilization. In this case, the operating system
ensures that the available processor time, memory, and I/O devices
are used efficiently, and no individual user tries to monopolize the
system resources. In case the various users are connected to a mainframe or a minicomputer via their terminals, no attention is paid to the usability of individual systems. However, in case the users are
connected to the servers via their workstations, a compromise
between individual usability and resource utilization is made while
designing the operating system.
In handheld systems, the operating system is basically designed
for individual usability as these systems are mostly stand-alone units
for individual users. Finally, for the computers that have little or no
user view such as embedded systems, the operating system is
basically designed to ensure that these systems will run without user
intervention.

System View
As discussed earlier, the computer system consists of many
resources such as CPU time, memory, and I/O devices, which are
required to solve a computing problem. It is the responsibility of the
operating system to manage these resources, and allocate them to
various programs and users in a way such that the computer system
can be operated in an efficient and fair manner. Thus, from the
system’s point of view, the operating system primarily acts as a
resource allocator.
The operating system also acts as a control program that
manages the execution of user programs to avoid errors and
improper use of computer system. It also controls the I/O devices and
their operations.
1.4 EVOLUTION OF OPERATING SYSTEMS
The operating system may process its work serially or concurrently.
That is, it can dedicate all the computer resources to a single
program until the program finishes or can dynamically assign the
resources to various currently active programs. The execution of
multiple programs in an interleaved manner is known as
multiprogramming. In this section, we will discuss how operating systems have evolved from serial processing to multiprogramming systems.

1.4.1 Serial Processing


During the period from the late 1940s to the mid-1950s, operating systems did not exist. The programmers used to interact
directly with the computer hardware by writing programs in machine
language. The programs and the data were entered into the computer
with the help of some input device such as a card reader. In addition
to input devices, the machines also consisted of display lights that
were used to indicate an error condition in case an error occurred
during any operation. If the execution of programs was completed
successfully, the output appeared (after minutes, hours, or days) in
the printed form using the line printers attached with the machine.
Whenever a programmer needed to operate the computer
system, he had to first reserve the machine time by signing up for a
block of time on a hardcopy sign-up sheet. After signing up, the
programmer used to enter the machine room and spend the desired
block of time working on the system. Once the desired block of time
of the programmer was finished, the next programmer in the queue
was supposed to perform the same procedure. Thus, all the users were allowed to access the computer sequentially (one after the other), hence the term serial processing.
The main problem associated with serial processing was that in some cases, a programmer might sign up for 2 hours but finish his or her work in 1.5 hours. In such situations, the computer processing time would get wasted as no other programmer was allowed to enter the machine room during that time.
With the advancement in technologies, various system software
tools were developed that made serial processing more efficient.
These tools include language translators, loaders, linkers, debuggers,
libraries of common routines, and I/O routines. The programmers
could now code their programs in a programming language, which
could then be translated into an executable code with the help of a
language translator such as a compiler or an interpreter. The loader
automated the process of loading executable programs into memory.
It automatically transferred the program and the data from the input
device to the memory. The debuggers assisted the programmers in
detecting and examining the errors that occurred during program
execution (run-time errors). The linkers were used to link the
precompiled routines with the object code of the program so that they
could be executed along with the object code to produce the desired
output.
Though the use of the system software tools made serial processing a bit more efficient, it still resulted in low utilization of resources. User productivity was also low, as the users had to wait for their turns.

1.4.2 Batch Processing


In the mid-1950s, transistors were introduced, and they changed the entire scenario. Computers now became more reliable and were sold to customers with the expectation that they would continue to work for a long time. These machines were named mainframes, and were
generally operated by the professional operators. These machines
were so expensive that only major government agencies and big
corporations could afford them.
A clear separation was made between computer operators and
programmers. The programmers used to prepare a job that consisted
of the instructions, data and some control information about the
nature of the job, and submit it to the computer operator. The jobs
were generally in the form of punched cards. When the currently
running job was finished, the operator would take off the output using
the printer (which may be kept in other room), and the programmer
could collect the output later at any time. The operator performed the
same process for all the card decks submitted by the programmers.
Much computer time was wasted while the operator was moving from
one room to another.
To reduce this wasted time and speed up the processing, the
operator used to batch together the jobs with similar requirements,
and run these batches one by one. This system was known as batch
processing system. For example, the jobs that need FORTRAN
compiler can be batched together so that the FORTRAN compiler can
be loaded only once to process all these jobs. Note that the jobs in a
batch are independent of each other and belong to different users.
To improve resource utilization and user productivity, the first
operating system was developed by General Motors for IBM 701 in
the mid-1950s. This operating system was termed as a batch
operating system. Its major task was to transfer control
automatically from one job to the next in the batch without the
operator’s intervention. This was achieved by automating the
transition from execution of one job to that of the next in the batch.
Batch processing is implemented by the kernel (also known as
batch monitor), which is the memory-resident portion of the batch
operating system (see Figure 1.2). The rest of the memory is allocated to the user jobs one at a time.
Fig. 1.2 Memory Layout for a Batch System

When a batch of similar jobs is submitted for processing, the batch monitor reads the card reader and loads the first job in the
memory for processing. The beginning and end of each job in the
batch is identified by the JOB_START and JOB_END command,
respectively. When the batch monitor encounters the JOB_START
command, it starts the execution of the job, and when it encounters
the JOB_END command, it searches for another job in the batch.
Finally, when all the jobs in the batch are finished, the batch monitor
waits for the next batch to be submitted by the operator. Hence, the
operator intervention is required only at the time of start and end of a
batch.
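
A highly simplified C sketch of such a batch-monitor loop is given below. The batch is modelled as a text file (batch.txt, an assumed name) whose lines stand for cards, and "executing" a card is reduced to printing it; a real monitor would load and run the user job between the JOB_START and JOB_END commands.

#include <stdio.h>
#include <string.h>

/* Process the cards of one job until its JOB_END command is reached. */
static void run_job(FILE *batch)
{
    char card[128];
    while (fgets(card, sizeof card, batch) != NULL) {
        if (strncmp(card, "JOB_END", 7) == 0)
            return;
        printf("executing: %s", card);   /* stand-in for real execution */
    }
}

int main(void)
{
    char card[128];
    FILE *batch = fopen("batch.txt", "r");   /* assumed batch input */
    if (batch == NULL)
        return 1;
    /* Scan the batch: each JOB_START begins one job; when the batch is
       exhausted, the monitor would wait for the next batch. */
    while (fgets(card, sizeof card, batch) != NULL)
        if (strncmp(card, "JOB_START", 9) == 0)
            run_job(batch);
    fclose(batch);
    return 0;
}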
The main disadvantage of batch processing is that during the
execution, the CPU is often idle, because of the speed difference
between the CPU and I/O devices. To overcome the problem of
speed mismatch, the concept of simultaneous peripheral operation
online (SPOOLing) came into existence. Instead of inputting the jobs
from the card readers, the jobs were first copied from the punched
cards to the magnetic tape. The magnetic tape was then mounted on
a tape drive, and the operating system read the jobs from the input
tape. Similarly, instead of getting output directly on the printer, it was
sent to the magnetic tape. Once the jobs were finished, the output
tape was removed and connected to the printer (which is not
connected to the main computer) for printing the output. The
magnetic tapes were much faster than the card readers and printers.
This reduced the CPU idle time by solving the problem of speed
mismatch.
With the introduction of disk technology, the batch operating
system started keeping all the jobs on the disk rather than on the
tapes. The disks are much faster than the magnetic tapes, and allow
direct access, and hence, the problem of speed mismatch was further
reduced.

1.4.3 Multiprogramming
Though the batch processing system attempted to utilize the computer resources like the CPU and I/O devices efficiently, it still dedicated all resources to a single job at a time. The execution of a
single job cannot keep the CPU and I/O devices busy at all times
because during execution, the jobs sometimes require CPU and
sometimes I/O devices, but not both at one point of time. Hence,
when the CPU is busy, the I/O devices have to wait, and when the I/O
devices are busy, the CPU remains idle.
For example, consider two jobs P1 and P2 such that both of them
require CPU time and I/O time alternatively. The serial execution of P1
and P2 is shown in Figure 1.4 (a). The shaded boxes show the CPU
activity of the jobs, and white boxes show their I/O activity. It is clear
from the figure that when P1 is busy in its I/O activity, the CPU is idle
even if P2 is ready for execution.
The idle time of CPU and I/O devices can be reduced by using
multiprogramming that allows multiple jobs to reside in the main
memory at the same time. If one job is busy with I/O devices, CPU
can pick another job and start executing it. To implement
multiprogramming, the memory is divided into several partitions,
where each partition can hold only one job. The jobs are organized in
such a way that the CPU always has one job to execute. This
increases the CPU utilization by minimizing the CPU idle time.
The basic idea behind multiprogramming is that the operating
system loads multiple jobs into the memory from the job pool on the
disk. It then picks up one job among them and starts executing it.
When this job needs to perform the I/O activity, the operating system
simply picks up another job, and starts executing it. Again when this
job requires the I/O activity, the operating system switches to the third
job, and so on. When the I/O activity of the job gets finished, it gets
the CPU back. Therefore, as long as there is at least one job to
execute, the CPU will never remain idle. The memory layout for a
multiprogramming batched system is shown in Figure 1.3.

Fig. 1.3 Memory Layout for a Multiprogramming System
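
The switching policy described above can be sketched in a few lines of C. The job names, the number of CPU bursts, and the rule that a job alternates strictly between CPU and I/O activity are assumptions made only for this illustration.

#include <stdio.h>

enum state { READY, DOING_IO, DONE };

struct job {
    const char *name;
    int bursts_left;        /* CPU bursts still required */
    enum state st;
};

int main(void)
{
    struct job jobs[] = { { "P1", 3, READY }, { "P2", 2, READY } };
    int n = 2, finished = 0;

    while (finished < n) {
        int picked = -1;
        for (int i = 0; i < n; i++)          /* pick any job that is ready */
            if (jobs[i].st == READY) { picked = i; break; }
        if (picked < 0) {                    /* every job is busy with I/O */
            for (int i = 0; i < n; i++)
                if (jobs[i].st == DOING_IO)
                    jobs[i].st = READY;      /* its I/O activity has finished */
            continue;
        }
        printf("CPU runs %s\n", jobs[picked].name);
        if (--jobs[picked].bursts_left == 0) {
            jobs[picked].st = DONE;
            finished++;
        } else {
            jobs[picked].st = DOING_IO;      /* the job turns to its I/O activity */
        }
    }
    return 0;
}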

Figure 1.4 (b) shows the multiprogrammed execution of jobs P1 and P2; both are assumed to be in memory and waiting to get CPU time. Further assume that job P1 gets the CPU time first. When P1 needs to perform its I/O activity, the CPU starts executing P2. When P2 needs to perform its I/O activity, the CPU again switches to P1,
and so on. This type of execution of multiple processes is known as
concurrent execution.
Note that for simplicity, we have considered the concurrent
execution of only two programs P1 and P2, but in general there are
more than two programs that compete for system resources at any
point of time. The number of jobs competing to get the system
resources in a multiprogramming environment is known as the degree of multiprogramming. In general, the higher the degree of multiprogramming, the higher the resource utilization.

Fig. 1.4 Serial and Multiprogrammed Execution

In multiprogrammed systems, the operating system is responsible for making decisions for the users. When a job enters the system, it
is kept in the job pool on the disk that contains all those jobs that are
waiting for the allocation of main memory. If there is not enough
memory to accommodate all these jobs, then the operating system
must select which of them are to be loaded into the main memory. Making
this decision is known as job scheduling, which is discussed in
Chapter 4. To keep multiple jobs in the main memory at the same
time, some kind of memory management is required, which is
discussed in detail in Chapter 7. Moreover, if multiple jobs in the main
memory are ready for execution at the same time, the operating
system must choose one of them. Making this decision is known as
CPU scheduling, which is discussed in Chapter 4.
The main drawback of multiprogramming systems is that the
programmers have to wait for several hours to get their output.
Moreover, these systems do not allow the programmers to interact
with the system. To overcome these problems, an extension of
multiprogramming systems, called time-sharing systems is used. In
time-sharing (or multitasking) systems, multiple users are allowed
to interact with the system through their terminals. Each user is
assigned a fixed time-slot in which he or she can interact with the
system. The user interacts with the system by giving instructions to
the operating system or to a program using an input device such as
keyboard or a mouse, and then waits for the response.
The response time should be short—generally within 1 second.
The CPU in a time-sharing system switches so rapidly from one user to another that each user gets the impression that only he or she is
working on the system, even though the system is being shared by
multiple users simultaneously. A typical time-sharing system is shown
in Figure 1.5.
Fig. 1.5 Time-sharing System

The main advantage of time-sharing systems is that they provide a convenient environment in which the users can develop and
execute their programs. Unlike batch processing systems, they
provide quicker response time, and allow users to debug their
program interactively under the control of a debugging program.
Moreover, the users are allowed to share the system resources in
such a way that each user gets an impression that he or she has all
the resources with himself or herself.
The concept of time sharing was demonstrated in the early 1960s, but since it was expensive and difficult to implement at that time, time-sharing systems were not used until the early 1970s. However, these days, most systems are time-sharing systems.

1.5 TYPES OF OPERATING SYSTEMS


In the previous section, we have discussed the history and evolution
of operating systems from serial processing to time-sharing systems.
In this section, we will discuss each type of operating system in the
context of the level of complexity involved in performing different OS
functions such as CPU scheduling, memory management, I/O
management, and file management.

1.5.1 Batch Operating Systems


In batch systems, the instructions, data, and some control information
are submitted to the computer operator in the form of a job. The users
are not allowed to interact with the computer system. Thus, the
programs (such as payroll, forecasting, statistical analysis, and large
scientific applications) that do not require interaction are well-served
by a batch operating system.
Since jobs are executed in first-come-first-served (FCFS) manner,
the batch operating system requires very simple CPU scheduling
techniques. In addition, the batch system allows only one user
program to reside in the memory at a time, and thus, memory
management is also very simple in a batch operating system.
Since only one program is in execution at a time, time-critical device management is not required, which simplifies I/O management. Files are also accessed in a serial manner; therefore,
no concurrency control mechanism for file access is required, which
makes file management also very simple in a batch operating system.

1.5.2 Multiprogramming Operating Systems


Multiprogramming systems allow concurrent execution of multiple
programs, and hence, multiprogramming operating systems require
more sophisticated scheduling algorithms. The programs should be
scheduled in such a way that CPU remains busy for the maximum
amount of time. Memory management must provide isolation and
protection of multiple programs residing simultaneously in the main
memory. Multiprogramming operating systems allow sharing of I/O
devices among multiple users, and hence, more sophisticated I/O
management is required. File management in a multiprogramming
operating system must provide advanced protection, and concurrency
control methods.

1.5.3 Time-sharing Systems


Since a time-sharing system is an extension of a multiprogrammed
system, the operating system in time-sharing systems is even more
complex than in multiprogramming systems. Time-sharing systems
require more complicated CPU scheduling algorithms. Most time-
sharing systems make use of round-robin scheduling algorithm in
which each program is given a system-defined time slice for its
execution. When this time slice gets over, and the program still
requires CPU for its execution, it is interrupted by the operating
system and is placed at the end of the queue of waiting programs.
Memory management in time-sharing systems must provide isolation
and protection of multiple programs residing simultaneously in the
main memory. I/O management in a time-sharing system must be
sophisticated enough to cope with multiple users and devices. Like
multiprogramming operating systems, file management in a time-sharing system must provide advanced protection, access
control, and concurrency control methods.
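
The round-robin policy used by most time-sharing systems can be sketched as follows; the time slice of 2 units and the three programs with their service times are invented for the example (round-robin scheduling itself is discussed in detail in Chapter 4).

#include <stdio.h>

#define SLICE 2     /* system-defined time slice, in arbitrary units */

struct prog { const char *name; int time_left; };

int main(void)
{
    struct prog queue[] = { { "A", 5 }, { "B", 3 }, { "C", 4 } };
    int n = 3, head = 0, remaining = n;

    /* Treat the array as a circular queue of waiting programs. */
    while (remaining > 0) {
        struct prog *p = &queue[head];
        if (p->time_left > 0) {
            int run = p->time_left < SLICE ? p->time_left : SLICE;
            p->time_left -= run;     /* program runs until its slice expires */
            printf("%s runs for %d unit(s), %d left\n", p->name, run, p->time_left);
            if (p->time_left == 0)
                remaining--;
        }
        head = (head + 1) % n;       /* unfinished programs rejoin at the back */
    }
    return 0;
}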

1.5.4 Real-time Operating Systems


In real-time systems, the correctness of the computations depends
not only on the output of the computation but also on the time at
which the output is generated. A real-time system has well-defined,
fixed time constraints. If these time constraints are not met, the
system is said to have failed in spite of producing the correct output.
Thus, the main aim of real-time systems is to generate the correct
result within specified time constraints. Consider an example of a car
running on an assembly line. Certain actions are to be taken at
certain instants of time. If the actions are taken too early or too late,
then the car will be ruined. Therefore, for such systems, the deadlines
must be met in order to produce the correct result. The deadline
could be the start time or the completion time. Generally, the time
deadline refers to the completion time. Some other examples of real-
time systems are air-traffic control systems, fuel-injection systems,
robotics, undersea exploration, etc.
The real-time systems are of two types: hard real-time systems
and soft real-time systems. In hard real-time systems, the actions
must be taken within the specified timeline; otherwise, undesirable
results may be produced. Industrial control and robotics are the
examples of hard real-time systems. In soft real-time systems, it is
not mandatory to meet the deadline. A real-time task always gets the
priority over other tasks, and retains the priority until its completion. If
the deadline could not be met due to any reason, then it is possible to
reschedule the task and complete it. Multimedia, virtual reality, and
advanced scientific applications such as undersea exploration come
under the category of soft real-time systems.
In real-time systems, a process is activated upon the occurrence
of an event, which is often signaled by an interrupt. Each process is
assigned a certain level of priority depending on the importance of the
event it services. The CPU is assigned to the process having the
highest priority among competing processes; hence, priority-based
preemptive scheduling (discussed in Chapter 4) is generally used in
real-time systems.
Memory management in real-time systems is less demanding
because the number of processes residing in the memory is fairly
static, and processes generally do not move between the main
memory and the hard disk. However, protection and sharing of
memory is essential because processes tend to cooperate closely.
I/O management in real-time operating systems includes
providing time-critical device management, interrupt management
and I/O buffering. File management in real-time operating systems
must provide the same functionality as found in time-sharing systems,
which include protection and access control. However, here the main
objective of file management is to provide faster access to files
(because of timing constraints), instead of providing efficient
utilization of secondary storage.

1.5.5 Distributed Operating Systems


A distributed system is basically a computer network in which two or
more autonomous computers are connected via their hardware and
software interconnections, to facilitate communication and
cooperation. The computers can be interconnected by telephone
lines, coaxial cables, satellite links, radio waves, etc.
The main objective of a distributed operating system is to provide
transparency to its users. That is, users should not be bothered about
how various components and resources are distributed in the system.
A distributed operating system is generally designed to support
system-wide sharing of resources, such as I/O devices, files, and
computational capacity. In addition to providing typical operating
system services to local clients at each node, a distributed operating
system must provide some other services, such as global naming
conventions, distributed file system (DFS), internode process
communication, and remote procedure call (RPC).

1.5.6 Personal Computer Operating Systems


The personal computer (PC) systems are the most widely known
systems. The ultimate goal of PC operating systems is to provide a
good interface to a single user. They are commonly used for word
processing, spreadsheets, Internet access, etc. Microsoft Disk
Operating System (MS-DOS), Microsoft Windows, and Apple Mac OS
X are some popular PC operating systems. Some PC operating
systems are designed to run on several kinds of PC hardware. For
example, Windows is such an operating system that can run on a
variety of computer systems. In contrast, certain operating systems
can run only on specific PC hardware. For example, Mac OS X
operating system is specifically designed for Apple hardware.

1.5.7 Mobile Operating Systems


A mobile operating system (or mobile OS) is an operating system that
has been specifically designed for mobile devices including cell
phones, PDAs, tablet PCs, smart phones, and other hand-held
devices. Modern mobile operating systems mix the features of PC
operating systems with many other features, such as touch screen,
video camera, voice recorder, Bluetooth, Infrared, WiFi, GPS mobile
navigation, speech recognition, etc. Just as a PC operating system controls a desktop or laptop computer, a mobile operating system provides an environment for other programs to run on mobile
devices. With the continual growth in the use of mobile devices, most
companies have launched their own mobile operating system in the
market. However, some of the most popular mobile operating
systems are Android from Google Inc., Bada from Samsung
Electronics, iPhone OS from Apple, Symbian OS from Symbian Ltd.,
and Windows from Microsoft.

1.6 COMPARISON BETWEEN DIFFERENT OPERATING SYSTEMS
In this section, we discuss the differences between different operating
systems that we have studied so far.
Table 1.1 lists the differences between a batch operating system
and a multiprogramming operating system.
Table 1.1 Differences between a Batch Operating System and a Multiprogramming Operating System

• Batch operating system: A batch of similar jobs that consist of instructions, data, and system commands is submitted to the operator, and one job is executed at a time.
  Multiprogramming operating system: Multiple programs appear to run concurrently by rapidly switching the CPU between the programs.
• Batch operating system: The jobs are executed in the order in which they were submitted, that is, on an FCFS basis.
  Multiprogramming operating system: These systems need more sophisticated scheduling algorithms, as multiple processes reside in the main memory at the same time.
• Batch operating system: These systems do not tend to achieve efficient CPU utilization.
  Multiprogramming operating system: These systems ensure that the CPU always has something to execute, thus increasing CPU utilization.
• Batch operating system: Simple memory management is required.
  Multiprogramming operating system: Some form of memory management is needed to keep several jobs in memory at the same time.
• Batch operating system: Access to files is serial, so simple file management is needed.
  Multiprogramming operating system: A number of processes may attempt to access a file at the same time, which demands advanced protection and concurrency control methods.
• Batch operating system: At most one program can be executing at a time; thus, time-critical device management is not needed.
  Multiprogramming operating system: More sophisticated device management is needed, as devices are shared among several programs.

Table 1.2 lists the differences between a batch operating system and a real-time operating system.
Table 1.2 Differences between a Batch Operating System and a Real-Time Operating System

• Batch operating system: Batch systems are well suited for applications that have a long execution time and do not need any quick response.
  Real-time operating system: Real-time systems are designed for applications that need quick response, thus meeting the scheduling deadlines.
• Batch operating system: These systems do not involve any user interaction.
  Real-time operating system: These systems may require user interaction.
• Batch operating system: FCFS scheduling algorithm is used to execute the jobs.
  Real-time operating system: Majority of real-time systems use priority-based preemptive scheduling.
• Batch operating system: The main objective is to execute a large number of jobs as efficiently as possible.
  Real-time operating system: The main goal is to process the individual records with minimum delay.
• Batch operating system: Time-critical device management is not needed in batch systems.
  Real-time operating system: One of the main characteristics of real-time systems is time-critical device management.

Table 1.3 lists the differences between a multiprogramming operating system and a time-sharing operating system.
Table 1.3 Differences between a Multiprogramming Operating System and a Time-Sharing Operating System

• Multiprogramming operating system: It allows several programs to use the CPU at the same time and does not support user interaction.
  Time-sharing operating system: It is the logical extension of the multiprogramming system that supports interactive users and provides a quick response time.
• Multiprogramming operating system: The main objective is to maximize CPU utilization by organizing programs such that the CPU always has one to execute.
  Time-sharing operating system: The main objective is to minimize the response time by sharing the computing resources among several users. The CPU switches between multiple users so frequently that each user gets the impression that only he or she is using the CPU alone, while actually it is one CPU shared among many users.
• Multiprogramming operating system: A context switch occurs only when the currently running process stalls and the CPU is to be allocated to some other process.
  Time-sharing operating system: A context switch occurs each time the time slice of the currently running process is over.
• Multiprogramming operating system: A multiprogramming system is less complex than a time-sharing system.
  Time-sharing operating system: It is more complex than a multiprogramming system.

Table 1.4 lists the differences between a time-sharing operating system and a real-time operating system.
Table 1.4 Differences between a Time-Sharing Operating System and a Real-Time Operating System

• Time-sharing operating system: It is the logical extension of the multiprogramming system that supports interactive users and is designed to provide a quick response time.
  Real-time operating system: This system is designed for environments where a huge number of events must be processed within fixed time constraints.
• Time-sharing operating system: More sophisticated memory management is needed in time-sharing systems to provide separation and protection of multiple user programs.
  Real-time operating system: In these systems, the programs remain in the main memory most of the time and there is little swapping of programs between main and secondary memory. Thus, memory management is less demanding in real-time systems.
• Time-sharing operating system: Round-robin scheduling is used to execute the programs.
  Real-time operating system: Most real-time systems use priority-based preemptive scheduling.
• Time-sharing operating system: This system tends to reduce the CPU idle time by sharing it among multiple users.
  Real-time operating system: Effective resource utilization and user convenience are of secondary concern in a real-time system.

1.7 COMPUTER SYSTEM ORGANIZATION


The operating systems have always been closely tied to the
organization of the computer on which they run. Therefore, in this
section we will describe the basic computer organization. A computer
system basically consists of one or more processors (CPUs), several
device controllers, and the memory. All these components are
connected through a common bus that provides access to shared
memory. Each device controller acts as an interface between a
particular I/O device and the operating system. Thus, a device
controller plays an important role in operating that particular device.
For example, the disk controller helps in operating disks, USB
controller in operating mouse, keyboard, and printer, graphics adapter
in operating monitor, sound card in operating audio devices, and so
on. In order to access the shared memory, the memory controller is
also provided that synchronizes the access to the memory. The
interconnection of various components via a common bus is shown in
Figure 1.6.

Fig. 1.6 Bus Interconnection

1.7.1 Computer System Operation


When the system boots up, the initial program that runs on the
system is known as bootstrap program (also known as bootstrap
loader). The bootstrap program is typically stored in read-only
memory (ROM) or electrically erasable programmable ROM
(EEPROM). During the booting process, all the aspects of the system
like CPU registers, device controllers, and memory contents are
initialized, and then the operating system is loaded into the memory.
Once the operating system is loaded, the first process such as “init” is
executed, and the operating system then waits for some event to
occur.
The event notification is done with the help of an interrupt that is
fired either by the hardware or the software. When the hardware
needs to trigger an interrupt, it can do so by sending a signal to the
CPU via the system bus. When the software needs to trigger an
interrupt, it can do so with the help of a system call (or monitor call).
Whenever an interrupt is fired, the CPU stops executing the
current task, and jumps to a predefined location in the kernel’s
address space, which contains the starting address of the service
routine for the interrupt (known as interrupt handler). It then
executes the interrupt handler, and once the execution is completed,
the CPU resumes the task that it was previously doing.
To quickly handle the interrupts, a table of pointers to interrupt
routines is used. The table contains the addresses of the interrupt
handlers for the various devices, and is generally stored in the low
memory (say first 100 locations or so). The interrupt routine can be
called indirectly with the help of this table. This array of addresses is
known as interrupt vector. The interrupt vector is further indexed by
a unique device number, given with the interrupt request, to provide
the address of the interrupt handler for the interrupting device.
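As a rough illustration, the following C fragment models an interrupt vector as an array of pointers to handler routines indexed by a device number. The handler names, device numbers, and table size are assumptions made purely for the example and do not correspond to any particular hardware.

#include <stdio.h>

#define NUM_VECTORS 100              /* assumed size of the low-memory table */

typedef void (*interrupt_handler)(void);

static void keyboard_handler(void) { printf("servicing keyboard interrupt\n"); }
static void disk_handler(void)     { printf("servicing disk interrupt\n"); }

/* The interrupt vector: an array of addresses of interrupt handlers. */
static interrupt_handler interrupt_vector[NUM_VECTORS];

/* Called by the dispatch logic with the device number that accompanies
   the interrupt request. */
static void dispatch_interrupt(int device_number) {
    if (device_number >= 0 && device_number < NUM_VECTORS &&
        interrupt_vector[device_number] != NULL)
        interrupt_vector[device_number]();   /* indirect call through the table */
}

int main(void) {
    interrupt_vector[1]  = keyboard_handler;  /* hypothetical device numbers */
    interrupt_vector[14] = disk_handler;
    dispatch_interrupt(1);                    /* simulate a keyboard interrupt */
    dispatch_interrupt(14);                   /* simulate a disk interrupt */
    return 0;
}

The indirect call through the table is what allows the CPU to reach the correct handler quickly, without searching for it.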

1.7.2 Storage Structure


Whenever a program needs to be executed, it must be first loaded
into the main memory (called random-access memory or RAM).
RAM is the only storage area that can be directly accessed by the
CPU. RAM consists of an array of memory words, where each word
has its unique address. Two instructions, namely, load and store are
used to interact with the memory.
• The load instruction is used to move a word from the main
memory to a CPU register.
• The store instruction is used to move the content of the CPU
register to the main memory.
We know that a program is basically a set of instructions that are
executed to complete a given task. The instructions are executed by the CPU, which uses its registers as temporary storage areas of limited capacity. Usually, an instruction-execution cycle consists of the following phases.
1. Fetch phase: Whenever the CPU needs to execute an instruction, it first
fetches it from the main memory, and stores it in instruction register
(IR).
2. Decode phase: Once the instruction has been loaded into the IR, the
control unit examines and decodes the fetched instruction.
3. Calculate effective address phase: After decoding the instruction, the
operands (if required) are fetched from the main memory and stored in
one of the internal registers.
4. Execute phase: The instruction is executed on the operands, and the
result is stored in a register or written back to the main memory.
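The following C sketch simulates this cycle for a made-up machine that understands only load, add, store, and halt instructions. The instruction format, register names, and memory size are assumptions chosen only to make the phases visible.

#include <stdio.h>

enum opcode { OP_LOAD, OP_ADD, OP_STORE, OP_HALT };

struct instruction { enum opcode op; int reg; int addr; };

int main(void) {
    int memory[16] = {0};                 /* tiny main memory */
    int reg[4] = {0};                     /* CPU registers */
    struct instruction program[] = {      /* a three-instruction program */
        { OP_LOAD,  0, 10 },              /* R0 <- memory[10] */
        { OP_ADD,   0, 11 },              /* R0 <- R0 + memory[11] */
        { OP_STORE, 0, 12 },              /* memory[12] <- R0 */
        { OP_HALT,  0, 0 }
    };
    memory[10] = 5;
    memory[11] = 7;

    int pc = 0;                           /* program counter */
    for (;;) {
        struct instruction ir = program[pc++];    /* fetch into the IR */
        switch (ir.op) {                          /* decode */
        case OP_LOAD:  reg[ir.reg] = memory[ir.addr];  break;   /* execute */
        case OP_ADD:   reg[ir.reg] += memory[ir.addr]; break;
        case OP_STORE: memory[ir.addr] = reg[ir.reg];  break;
        case OP_HALT:  printf("result = %d\n", memory[12]); return 0;
        }
    }
}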
Since RAM is the only storage area that can be directly accessed
by the CPU, ideally all the programs and data should be stored in the
main memory permanently for fast execution and better system
performance. However, this is not practical because RAM is expensive and offers limited storage capacity. Moreover, it is volatile in nature, that is, it loses its contents when the power supply is switched off.
Therefore, we need some storage area that can hold a large
amount of data permanently. Such a type of storage is called
secondary storage. Secondary storage is non-volatile in nature, that
is, the data is permanently stored and survives power failure and
system crashes. However, data on the secondary storage is not
directly accessed by the CPU. Therefore, it needs to be transferred to
the main memory so that the CPU can access it. Magnetic disk
(generally called disk) is the primary form of secondary storage that
enables storage of enormous amounts of data. It is used to hold online data for the long term.
In addition to RAM and magnetic disk, some other form of storage
devices also exist, which include cache memory, flash memory,
optical discs, and magnetic tapes. The basic function of all the
storage devices is to store the data. However, they differ in terms of
their speed, cost, storage capacity, and volatility. On the basis of their
characteristics, such as cost per unit of data and speed with which
data can be accessed, they can be arranged in a hierarchical manner
as shown in Figure 1.7.

1.7.3 I/O Structure


Handling I/O devices is one of the main functions of an operating
system. A significant portion of the code of an operating system is
dedicated to managing I/O. One reason for this is the varying nature of
I/O devices. The operating system must issue commands to the
devices, catch interrupts, handle errors, and provide an interface
between the devices and the rest of the system.

Fig. 1.7 Memory Hierarchy

As already mentioned, a computer system consists of one or more processors and multiple device controllers that are connected
through a common bus. Each device controller controls a specific
type of device, and depending on the device controller one or more
devices may be attached to it. For example, a small computer-system
interface (SCSI) controller may have seven or more devices attached
to it. To perform its job, the device controller maintains some local
buffer storage and a set of special-purpose registers. The operating
systems usually have a device driver for each device controller. The
role of the device driver is to present an interface for the device to the
rest of the system. This interface should be uniform, that is, it should be the same for all devices to the extent possible.
To start an I/O operation, the device driver loads the appropriate
registers within the device controller, which in turn examines the
contents of registers to determine the action to be taken. Suppose,
the action to take is to read the data from the keyboard; the controller
starts transferring data from the device to its local buffer. Upon
completion of data transfer, the controller informs the device driver
(by generating an interrupt) that the transfer has been completed.
The device driver then returns the control along with the data or
pointer to the data to the operating system. This form of I/O is
interrupt-driven I/O, and this scheme wastes CPU’s time because the
CPU requests data from the device controller one byte at a time.
Thus, it is not feasible to transfer a large amount of data with this
scheme.
To solve this problem, another scheme, named direct memory access (DMA), is commonly used. In this scheme, after setting up the registers that tell the controller what to transfer and where, the CPU is free to perform other tasks. The device controller can now complete its job, that is, transfer a complete block of data between its local buffer and memory without CPU intervention. Once the block of data has been transferred, an interrupt is generated to inform the device driver that the operation has completed.
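As a rough illustration, the following C sketch models a hypothetical DMA-capable controller with a few memory-mapped registers. The register layout, the register names, and the completion handler are invented for the example and do not correspond to any real device.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical register block of a DMA-capable disk controller. */
struct dma_controller {
    volatile uint32_t source;      /* device block number to read            */
    volatile uint32_t dest;        /* physical memory address of the buffer  */
    volatile uint32_t count;       /* number of bytes to transfer            */
    volatile uint32_t command;     /* writing 1 starts the transfer          */
};

static struct dma_controller fake_ctrl;           /* stands in for real hardware */
static volatile int transfer_done = 0;

/* The device driver: set up the registers, then let the CPU do other work. */
static void start_dma_read(uint32_t block, uint32_t buffer_addr, uint32_t nbytes) {
    fake_ctrl.source  = block;
    fake_ctrl.dest    = buffer_addr;
    fake_ctrl.count   = nbytes;
    fake_ctrl.command = 1;                        /* controller begins the transfer */
}

/* Interrupt handler invoked by the controller when the whole block is done. */
static void dma_complete_handler(void) {
    transfer_done = 1;
    printf("DMA transfer of %u bytes completed\n", (unsigned)fake_ctrl.count);
}

int main(void) {
    start_dma_read(42, 0x8000, 4096);             /* request one 4 KB block        */
    /* ... the CPU performs other tasks here ...                                   */
    dma_complete_handler();                       /* simulate the completion interrupt */
    return transfer_done ? 0 : 1;
}

The point of the sketch is that the CPU is involved only at the start (loading the registers) and at the end (servicing the completion interrupt), not for every byte transferred.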

1.8 COMPUTER SYSTEM ARCHITECTURE


There exist a number of ways in which a computer system can be
organized. A common criterion of categorizing different computer
organizations is the number of general-purpose processors used.
1.8.1 Single-Processor Systems
Single-processor systems consist of one main CPU that can execute
a general-purpose instruction set, which includes instructions from
user processes. Other than the one main CPU, most systems also
have some special-purpose processors. These special-purpose
processors may be in the form of device-specific processors, such as disk and keyboard controllers, or, in mainframes, they may be I/O processors that move data among the system components. Note that the special-purpose processors execute a limited instruction set and do not execute instructions from the user processes. Furthermore, the use of
special-purpose processors does not turn a single-processor system
into a multiprocessor system.
In some systems, the special-purpose processors are managed
by the operating systems, and in others, they are low-level
components built into the hardware. In the former case, the operating
system monitors their status and sends them information for their
next task. For example, the main CPU sends requests to access the
disk to a disk controller microprocessor, which implements its own
disk queue and disk scheduling algorithm. In doing so, the main CPU is relieved of the disk scheduling overhead. In the latter case,
these special-purpose processors do their tasks autonomously, and
the operating system cannot communicate with them.

1.8.2 Multiprocessor Systems


As the name suggests, the multiprocessor systems (also known as
parallel systems or tightly coupled systems) consist of multiple
processors in close communication in a sense that they share the
computer bus and even the system clock, memory, and peripheral
devices. The main advantage of multiprocessor systems is that they
increase the system throughput by getting more work done in less
time. Another benefit is that it is more economical to have a single
multiprocessor system than to have multiple single-processor
systems. In addition, the multiprocessor systems are more reliable. If
one out of N processors fails, then the remaining N-1 processors
share the work of the failed processor among them, thereby
preventing the failure of the entire system.
Multiprocessor systems are of two types, namely, symmetric and
asymmetric. In symmetric multiprocessing systems, all the
processors are identical and perform identical functions. Each
processor runs an identical copy of the operating system and these
copies interact with each other as and when required. All processors
in symmetric multiprocessor system are peers—no master–slave
relationship exists between them. On the other hand, in asymmetric
multiprocessing systems, processors are different and each of
them performs a specific task. One processor controls the entire
system, and hence, it is known as a master processor. Other
processors, known as slave processors, either wait for the master’s
instructions to perform any task or have predefined tasks. This
scheme defines a master–slave relationship. The main disadvantage
of asymmetric multiprocessing systems is that the failure of the
master processor brings the entire system to a halt. Figure 1.8 shows
symmetric and asymmetric multiprocessor systems.
Fig. 1.8 Symmetric and Asymmetric Multiprocessing Systems

1.8.3 Clustered Systems


A clustered system is another type of system with multiple CPUs. In
clustered systems, two or more individual systems (called nodes) are
grouped together to form a cluster that can share storage and are
closely linked via high-speed local area network (LAN). Each node
runs its own instance of operating system. Like multiprocessor
systems, multiple processors in clustered systems also work together
in close communication to accomplish a computational work.
However, in this case, the CPUs reside in different systems instead of
a single system.
The main advantage of clustered systems is that they provide improved performance, high reliability, and high availability; that is, the services continue even if one or more systems in the cluster fail. This is achieved by adding a level of redundancy to the system.
Clustered systems are of two types, namely, asymmetric and
symmetric. In asymmetric clustered systems, one machine is designated as a hot-standby host that does nothing but monitor the active server, while the other machine runs the applications. In case the active server fails, the machine in hot-standby mode takes over the applications that were running on the active server. This process does not stop the running user applications, but only interrupts them for a short duration.
In symmetric clustered systems, all machines act as standby
hosts as well as active servers. This means, while running
applications they also monitor each other, which is a more efficient
method as it results in the better utilization of available resources.
In addition, there are two more types of clustered systems,
namely, parallel clustering and clustering over a wide area network
(WAN). In parallel clustering, multiple hosts are allowed to access
the same data on the shared storage. This can be achieved by using
some special versions of software and special releases of
applications. In clustering over a WAN, multiple machines in
buildings, cities, or countries are connected.

1.9 OPERATING SYSTEM OPERATIONS


As discussed earlier, modern operating systems are interrupt driven.
When there is no work to do, that is, no processes for execution, no
I/O activities, and no user to whom to respond, the operating system
will sit idle. Whenever an event occurs, it is signaled by triggering an
interrupt or a trap. For each type of interrupt, there exists a code
segment in the operating system that specifies the actions to be
taken. The part of the operating system called interrupt service
routine (ISR) executes the appropriate code segment to deal with the
interrupt.
In a multiprogrammed environment, the computer resources are
shared among several programs simultaneously. Though the sharing of resources improves resource utilization, it also introduces problems. An error in one user program can adversely affect the execution of other programs. It may also happen that the erroneous program modifies another program, the data of another program, or the operating system itself. Without protection against such errors, only one process could safely be allowed to execute at a time.
However, to improve the resource utilization, it is necessary to
allow resource sharing among several programs simultaneously.
Therefore, to handle such environment, the operating system must be
designed in such a way that it should ensure that an incorrect
program does not affect the execution of other programs, or the
operating system itself.

1.9.1 Dual-Mode Operation


In order to ensure the proper functioning of the computer system, the
operating system and all other programs and their data must be protected against incorrect programs. To achieve this protection,
two modes of operations, namely, user mode and monitor mode
(also known as supervisor mode, system mode, kernel mode, or
privileged mode) are specified. A mode bit is associated with the
computer hardware to indicate the current mode of operation. The
value “1” indicates the user mode and “0” indicates the monitor mode.
When the mode bit is 1, it implies that the execution is being done on
behalf of the user, and when it is 0, it implies that the execution is
being done on behalf of the operating system.
When the system gets started (or booted), it is in monitor mode.
Then the operating system is loaded and the user processes are
started in the user mode. When a trap or an interrupt occurs, the
hardware switches from the user mode to the monitor mode by
changing the mode bit value to 0. Therefore, whenever the operating
system has the control over the computer, it is in the monitor mode.
Whenever the control needs to be passed to the user program, the
hardware must change the mode to the user mode before passing
the control to the user program.

Fig. 1.9 Dual-Mode Operation

This dual mode of operation helps in protecting the operating system and the other programs from malicious programs. To achieve
this protection, some of the machine instructions that may cause
harm are designated as privileged instructions. These privileged
instructions are allowed to be executed only in the monitor mode. If
an attempt is made to execute a privileged instruction in the user
mode, the hardware treats it as an illegal instruction and traps it to the
operating system without executing it. The instruction used to switch
from the kernel mode to the user mode is an example of a privileged
instruction.
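The following C sketch is a toy model of this mechanism, not the behaviour of any real CPU: a mode bit distinguishes user mode (1) from monitor mode (0), and an attempt to run a privileged instruction in user mode is trapped to the operating system instead of being executed. All names and values are assumptions made for illustration.

#include <stdio.h>

/* 1 = user mode, 0 = monitor (kernel) mode, following the text above. */
static int mode_bit = 1;                       /* system currently in user mode */

static void trap_to_os(const char *reason) {
    mode_bit = 0;                              /* hardware switches to monitor mode */
    printf("trap: %s -- handled by the operating system\n", reason);
    mode_bit = 1;                              /* control returns to the user program */
}

/* A privileged instruction may execute only when mode_bit is 0. */
static void execute_privileged(const char *name) {
    if (mode_bit != 0) {
        trap_to_os("attempt to execute a privileged instruction in user mode");
        return;                                /* the instruction is not executed */
    }
    printf("executing privileged instruction: %s\n", name);
}

int main(void) {
    execute_privileged("set timer");           /* rejected: we are in user mode   */
    mode_bit = 0;                              /* pretend a system call occurred  */
    execute_privileged("set timer");           /* now allowed in monitor mode     */
    return 0;
}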
Note: The operating systems including Windows 2000 and IBM OS/2
provide greater protection for the operating system by supporting
privileged instructions.
1.9.2 Timer
When a process starts executing, it is quite possible that it gets stuck in an infinite loop and never returns control to the operating system. Therefore, it is necessary to prevent a user program from
gaining the control of the system for an infinite time. For this, a timer
is maintained, which interrupts the system after a specified period.
This period can be fixed or variable. A variable timer is usually
implemented by a fixed-rate clock and a counter.
It is the responsibility of the operating system to set the counter
that is decremented with every clock tick. Whenever the value of
counter reaches 0, an interrupt occurs. In this way the timer prevents
a user program from running too long. Initially, when a program starts,
a counter is initialized with the amount of time that a program is
allowed to run. The value of counter is decremented by 1 with each
clock tick, and once it becomes negative, the operating system
terminates the program for exceeding the assigned time limit. Note
that the instructions that modify the operations of the timer are also
designated as privileged instructions.
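A minimal sketch of this countdown mechanism in C is given below. The tick count, the time allowance, and the function names are arbitrary values chosen only to illustrate the idea of a counter decremented on every clock tick.

#include <stdio.h>

static int counter;                              /* ticks remaining for the program */

static void os_set_timer(int ticks_allowed) {    /* a privileged operation */
    counter = ticks_allowed;
}

/* Invoked by the fixed-rate clock on every tick. */
static int clock_tick_handler(void) {
    counter--;
    if (counter < 0) {
        printf("time limit exceeded: terminating the program\n");
        return 1;                                /* signal termination */
    }
    return 0;
}

int main(void) {
    os_set_timer(3);                             /* the program may run for 3 ticks */
    for (int tick = 1; tick <= 5; tick++) {
        printf("clock tick %d\n", tick);
        if (clock_tick_handler())
            break;                               /* program terminated by the OS */
    }
    return 0;
}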

1.10 OPERATING-SYSTEM STRUCTURES


Internally, different operating systems vary greatly from one another
in their structure. Therefore, before designing an operating system,
the designers must have clarity about which type of system is
desired. For this, the designers are required to study operating systems from several different viewpoints, for example, how a system is partitioned into its various components and how those components are interconnected, which services it offers and how they are provided, what kind of interface it provides to users and programmers, and so on. This section discusses all these aspects of
operating systems.

1.10.1 System Components


An operating system is a very large and complex system. To make its
designing easier, it is partitioned into smaller parts where each part
refers to a well-defined portion of the system and has defined inputs,
outputs, and functions. Though the structure of all systems may differ,
the common goal of most systems is to support the system
components described in this section.

Process Management
The basic concept supported by almost all the operating systems is
the process. A process is a program under execution or we can say
an executing set of machine instructions. A program by itself does
nothing; it is a passive entity. In contrast, a process is an active entity
that executes the instructions specified in the program. It is
associated with a program counter that specifies the instruction to be
executed next. The instructions of a process are executed
sequentially, that is, one after another until the process terminates. It
is not necessary for a program to have only a single process; rather it
may be associated with many processes. However, the different processes associated with the same program are treated as separate execution sequences. Furthermore, a process may spawn
several other processes (known as child processes) during
execution. These child processes may in turn create other child
processes, resulting in a process tree.
A process is intended to perform a specific task. To do its
intended task, each process uses some resources during its lifetime,
such as memory, files, CPU time, and I/O devices. These resources
can be allocated to the process either at the time of its creation or
during its execution. In addition to resources, some processes may
also need certain input when they are created. For example, if a
process is created to open a specific file, then it is required to provide
the desired file name as input to the process so that the process
could execute the appropriate instructions and system calls to
accomplish its task. After the process is terminated, the reusable
resources (if any) are reclaimed by the operating system.
A process can be either a system process executing the system’s
code or a user process executing the user’s code. Usually, a system
contains a collection of system and user processes. These processes
can be made to execute concurrently by switching the CPU among
them. In relation to process management, the responsibilities of an
operating system are as follows:
• to create and delete processes (including both user and system
processes),
• to suspend the execution of a process temporarily and later
resume it,
• to facilitate communication among processes by providing
communication mechanisms, and
• to provide mechanisms for dealing with deadlock.
Note: All the concepts related to process management are discussed
in Chapters 2 through 6.
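As a brief illustration of process creation, the following C sketch uses the UNIX fork() system call to spawn a child process, producing a small two-node process tree. It is a minimal example on POSIX systems and is not specific to any particular operating system discussed here.

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();                    /* create a child process */

    if (pid < 0) {
        perror("fork failed");             /* the process could not be created */
        return 1;
    } else if (pid == 0) {
        /* Child process: executes the same program from this point on. */
        printf("child  : pid = %d, parent = %d\n", getpid(), getppid());
    } else {
        /* Parent process: waits until the child terminates. */
        wait(NULL);
        printf("parent : pid = %d created child %d\n", getpid(), pid);
    }
    return 0;
}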

Memory Management
A computer system usually uses main memory and secondary
storage. The main memory is central to the operation of a computer
system. It is a huge collection of words or bytes, which may range
from hundreds of thousands to billions in size, and each byte or word
has a unique address. It holds the instructions and data currently
being processed by the CPU, the result of intermediate calculations,
and the recently processed data. It is the only storage that is directly
accessible to the CPU.
Whenever a program is to be executed, it is allocated space into
the main memory. As it executes, it accesses the data and
instructions from the main memory. After the program has been
executed, the memory space allocated to it is de-allocated and
declared available for some other process. This is the case of a single process in memory, which leads to inefficient memory utilization. To improve utilization, multiprogramming is used, in which multiple programs are allowed to reside in the main memory at the same time.
However, in this case, the operating system needs more
sophisticated memory management techniques as compared to the
single-user environment. It is the responsibility of the memory
manager to manage memory between multiple programs in an
efficient way. In relation to memory management, the responsibilities
of an operating system are as follows:
• to allocate and deallocate memory space as and when required,
• to make a decision on which of the processes (ready for
execution) should be allocated memory when it is available, and
• to keep track of the parts of memory that have been allocated and
to which processes.
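To make the idea of allocating and deallocating memory concrete, the following minimal C sketch requests a block of memory and releases it again. On UNIX-like systems the C library ultimately obtains such memory from the operating system's memory manager (for example through calls such as brk or mmap), though the exact mechanism varies from system to system.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Request memory for 1000 integers; the allocator, backed by the
       operating system's memory manager, finds a free region for it. */
    int *table = malloc(1000 * sizeof *table);
    if (table == NULL) {
        fprintf(stderr, "allocation failed: no memory available\n");
        return 1;
    }

    for (int i = 0; i < 1000; i++)     /* use the allocated region */
        table[i] = i;
    printf("last element = %d\n", table[999]);

    free(table);                       /* deallocate so the space can be reused */
    return 0;
}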
Usually, the capacity of the main memory is limited and not
enough to accommodate all data and programs in a typical computer.
Moreover, all the data stored is lost when power is lost or switched
off. Thus, it is required to use secondary storage in a computer
system to back up main memory. The most commonly used
secondary storage in computer systems is the disk. It stores most programs, such as compilers, sort routines, assemblers, etc. These
programs are loaded into the main memory when needed and
otherwise kept stored on the disk. Thus, it is important for a computer
system that the disk storage must be used efficiently. In relation to
disk management, the responsibilities of an operating system are as
follows:
• to allocate space on disk,
• to manage the unused (free) space available on disk, and
• to perform disk scheduling.
Note: Various memory management strategies are discussed in
Chapters 7 and 8. The disk management techniques are discussed in
Chapter 10.

File Management
Another important component of all the operating systems is file
management, which deals with the management and organization of
various files in the system. Since a computer system consists of various storage devices, such as hard disks, floppy disks, compact discs, and so on, the operating system provides an abstract view of these devices by hiding their internal structure so that the users can directly access the data (on physical devices) without exactly knowing where and how the data is actually stored.
The operating system defines a logical storage unit known as a
file, and all the data is stored in the form of files. Each file is
associated with some attributes such as its name, size, type, location,
date and time, etc. The users can perform various operations on files
such as create, delete, read, write, open, seek, rename, append, and
close. Operating system handles these operations with the help of
system calls.
To organize the files in a systematic manner, the operating system
provides the concept of directories. A directory can be defined as a
way of grouping files together. The directories are organized in a
hierarchical manner, which allows users to have subdirectories under
their directories, thus making the file system more logical and
organized. In relation to file management, the responsibilities of an
operating system are as follows:
• to create and delete files and directories,
• to back up the files onto some stable storage media,
• to map files onto secondary storage, and
• to provide primitives that enable one to manipulate the contents of
files and directories.
Note: The file management techniques are discussed in Chapters 11
and 12.
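As a small example of how a program works with files and directories through the operating system, the following C sketch uses the POSIX directory calls opendir, readdir, and closedir to list the entries of the current directory. The choice of POSIX is only for illustration; other operating systems offer equivalent calls under different names.

#include <stdio.h>
#include <dirent.h>

int main(void) {
    /* Ask the operating system to open the current directory. */
    DIR *dir = opendir(".");
    if (dir == NULL) {
        perror("opendir");
        return 1;
    }

    /* Read directory entries one by one until none remain. */
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL)
        printf("%s\n", entry->d_name);

    closedir(dir);                 /* release the directory handle */
    return 0;
}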

I/O Management
A computer system consists of several I/O devices such as keyboard,
monitor, printer, and so on. It is the responsibility of an operating
system to control and manage these devices. The operating system
also provides a device-independent interface between the devices
and the users so that the users can issue commands to use these
devices without actually knowing how their commands are being
executed. To hide the details of different devices, operating system
designers let the kernel use device-driver modules, which present
a uniform device-access interface. The I/O management is discussed
in Chapter 9.
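One common way to realize such a uniform device-access interface is a table of function pointers that every driver fills in. The following C sketch shows the idea with an invented "console" driver; the structure, the function names, and the driver itself are assumptions made only for illustration and do not represent any real kernel's interface.

#include <stdio.h>
#include <stddef.h>

/* A uniform device-access interface: every driver supplies the same
   set of operations, so the rest of the kernel can treat devices alike. */
struct device_ops {
    int (*open)(void);
    int (*read)(char *buf, size_t len);
    int (*write)(const char *buf, size_t len);
    int (*close)(void);
};

/* An invented "console" driver implementing the interface. */
static int console_open(void)  { return 0; }
static int console_read(char *buf, size_t len)        { (void)buf; (void)len; return 0; }
static int console_write(const char *buf, size_t len) { return (int)fwrite(buf, 1, len, stdout); }
static int console_close(void) { return 0; }

static const struct device_ops console_driver = {
    console_open, console_read, console_write, console_close
};

int main(void) {
    /* Device-independent code: it only knows the uniform interface. */
    const struct device_ops *dev = &console_driver;
    dev->open();
    dev->write("hello from the console driver\n", 30);
    dev->close();
    return 0;
}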

Protection and Security


In multiprogramming or multiuser systems where multiple processes
from many users are allowed to execute concurrently, some
protection mechanisms are needed to prevent the processes from
interfering with one another’s activities. These mechanisms must
ensure that the resources like memory segments, files, CPU, etc.,
can be accessed by only those processes which have been granted
permission by the operating system. For example, to ensure memory
protection, the operating system provides memory addressing
hardware that ensures that each process executes in its own address
space. In general, protection refers to any mechanism used to
control the access of processes, programs, or users to the resources
defined by the computer system. An unprotected resource is liable to
be misused by any unauthorized user.
Even when protection is ensured, a system may still fail and allow unauthorized access. To understand how this can happen,
consider a user whose authentication information has been stolen.
Now, any unauthorized person can use the stolen information to copy
or modify that user’s data even though the memory and file protection
are working. This occurs due to the lack of security that ensures the
protection of system against attacks from insiders as well as external
attacks. Some of the common security attacks include viruses,
worms, trap doors, identity theft, denial of service (DoS) attacks, etc.
Prevention from these attacks is considered as the responsibility of
operating systems on some systems, while on other systems,
prevention is implemented using policies or additional software.
Note: Various security and protection mechanisms are discussed in
Chapter 13.

1.10.2 Operating-system Services


Almost all the user programs need an environment in which they can
be executed. In addition, they need a set of services that reduce the burden of programming and make it easier. For instance,
programmers should not be bothered about how memory is allocated
to their programs, where their programs are loaded in memory during
execution, how multiple programs are managed and executed, how
their programs are organized in files to reside on disk, etc. Providing
this environment in which programs can be executed and the set of
services to user programs are the operating system responsibilities.
One set of operating-system services provides functions to help the
user. These services include the following:
• User interface: Providing a user interface (UI) to interact with
users is essential for an operating system. This interface can be
in one of the several forms. One is the command-line interface,
in which users interact with the operating system by typing
commands. Another is the batch interface, in which several
commands and directives to control those commands are
collected into files that are then executed. Another is the
graphical user interface (GUI), in which users interact with the
system with a pointing device, such as a mouse.
• Program execution: The system must allocate memory to the
user programs, and then load these programs into memory so
that they can be executed. The programs must be able to
terminate either normally or abnormally.
• I/O operations: Almost all the programs require I/O involving a file
or an I/O device. For efficiency and protection, the operating
system must provide a means to perform I/O instead of leaving it
for users to handle I/O devices directly.
• File-system manipulation: Often, programs need to manipulate
files and directories, such as creating a new file, writing contents
to a file, deleting or searching a file by providing its name, etc.
Some programs may also need to manage permissions for files or
directories to allow or deny other programs requests to access
these files or directories.
• Communication: A process executing in one computer may need
to exchange information with the processes executing on the
same computer or on a different computer connected via a
computer network. The information is moved between processes
by the operating system.
• Error detection: There is always a possibility of occurrence of
error in the computer system. Error may occur in the CPU,
memory, I/O devices, or in user program. Examples of errors
include an attempt to access an illegal memory location, power
failure, link failure on a network, too long use of CPU by a user
program, etc. The operating system must be constantly aware of
possible errors, and should take appropriate action in the event of
occurrence of error to ensure correct and consistent computing.
As we know, multiple programs may be executed concurrently, each of which may require multiple resources during its execution. Therefore, providing another set of services that help in allocating resources to programs in some order is necessary for an operating system. These services exist not to help the user directly, but to ensure the efficient and secure execution of programs.
• Resource allocation: In the case of multiprogramming, many programs execute concurrently, each of which requires many different types of resources, such as CPU cycles, memory, I/O devices, etc. Therefore, in such an environment, the operating system must allocate resources to programs in such a manner that resources are utilized efficiently and no program waits forever for other programs to complete their execution.
• Protection and security: Protection involves ensuring controlled
access to the system resources. In a multiuser or a networked
computer system, the owner of information may want to protect
information. When several processes execute concurrently, a
process should not be allowed to interfere with other processes or
with the operating system itself. Security involves protecting the
system from unauthorized users. To provide security, each user
should authenticate himself or herself before accessing system
resources. A common means of authenticating users is
username/password mechanism.
• Accounting: We may want to keep track of the usage of system
resources by each individual user. This information may be used
for accounting so that users can be billed or for accumulating
usage statistics, which is valuable for researchers.

1.10.3 User Operating-System Interface


Providing an interface to interact with the users is essential for an
operating system. Earlier operating systems provided users with a command-line or character-based interface. This interface
enables users to interact with the operating system by entering
commands to which it responds. On the other hand, most operating
systems nowadays provide graphical user interface (GUI) in addition
to character-based interface. GUI enables users to interact with the
operating system by clicking mouse buttons.

Command-line Interface
As mentioned, this interface enables users to interact with the
operating system by typing commands. These commands are then
interpreted and executed in order to provide response to the user.
MS-DOS is a commonly used operating system that provides a command-line interface. Figure 1.10 shows the MS-DOS
command-line interface. Some operating systems provide more than
one command-line interface; therefore, on such systems, command-
line interfaces are called shells. For example, UNIX provides C shell,
Bourne shell, Korn shell, etc.
Fig. 1.10 MS-DOS Command-Line Interface

Generally, the commands that can be given perform some operations on a file, such as creation, deletion, printing, executing,
and so on. These commands can be implemented in two ways. In the
first method, the code to execute the commands could be included in
the interface itself. Now, whenever a command is issued by the user,
the interface jumps to a section of its code and makes the appropriate
system call. Since the code for the commands is included in the interface itself, adding new commands requires changing the interface. In addition, the size of the interface increases with each new command.
Alternatively, system programs can be developed that include the
code for most commands. In this case, the interface has no idea
about the command implementation; instead it just uses the
command to locate the file and then loads and executes it. The UNIX
operating system follows this approach; therefore, a command in
UNIX would search a file with that name, load it in memory, and then
execute it. For example, the command
cp file1.txt file2.txt
interprets cp as the file name, searches it, and loads it in the memory
for execution. The command cp creates a copy of a given file. In the
aforementioned command, file1.txt and file2.txt are the names of
the source and the destination files, and they are treated as
parameters during the execution of the command cp. It means a copy
of file1.txt is created and named file2.txt.
With this approach, new commands can be added to the system easily by creating new system files. The names of the files then serve as the command names. Since adding new files does not require changing the interface, the size of the interface remains unchanged and small.
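A minimal sketch of this approach in C is shown below: a tiny shell reads a command line, creates a child process with fork(), and uses execvp() to locate and run the program file whose name matches the command. The prompt string, buffer size, and argument limit are arbitrary choices for the example, not part of any real shell.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char line[256];

    for (;;) {
        printf("myshell> ");                       /* arbitrary prompt */
        if (fgets(line, sizeof line, stdin) == NULL)
            break;                                 /* end of input: exit the shell */
        line[strcspn(line, "\n")] = '\0';

        /* Split the line into the command name and its arguments. */
        char *argv[16];
        int argc = 0;
        for (char *tok = strtok(line, " "); tok && argc < 15; tok = strtok(NULL, " "))
            argv[argc++] = tok;
        argv[argc] = NULL;
        if (argc == 0)
            continue;

        pid_t pid = fork();                        /* create a process for the command */
        if (pid == 0) {
            execvp(argv[0], argv);                 /* locate the program file and execute it */
            perror("command not found");           /* reached only if execvp fails */
            exit(1);
        } else if (pid > 0) {
            wait(NULL);                            /* shell waits for the command to finish */
        }
    }
    return 0;
}

Typing cp file1.txt file2.txt at the prompt of such a shell would therefore create a child process that searches for the cp program file and executes it with the two file names as parameters.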

Graphical User Interface


Since, in a command-line interface, users interact by issuing certain commands, there is always a chance of committing a mistake. For example, opening a file might require the user to enter the path where the file actually resides. Any mistake in entering the path will prevent the user from opening the file.
An alternative and more user-friendly method to interface with the
operating system is the graphical user interface (GUI). GUI provides
a rectangular area of screen called Window in which files, programs,
directories, and system functions are represented as small images or
symbols called icons. In addition, various Menus are provided that
list actions or commands to be performed by the user. Such interface
enables users to interact with the operating system by moving the
mouse to position the mouse cursor and clicking on some icon or
menu option. Depending upon the position of the mouse cursor and
the button (left or right) clicked on the mouse, some action is
performed such as opening of a file, execution of a program,
appearing of a menu, etc.
UNIX, Apple Macintosh (Mac OS), and various versions of Microsoft Windows (beginning with version 1.0) are some examples of operating systems that provide a GUI. Many operating systems provide users with both a command-line interface and a GUI, and which one to work with is a matter of personal choice. Many UNIX programmers prefer to use the command-line interface because it is faster to work with and provides powerful capabilities, whereas almost all Windows users use the GUI. The GUI of Microsoft Windows 7 is shown in Figure 1.11.

Fig. 1.11 GUI of Microsoft Windows 7

1.10.4 System Calls


As mentioned, providing services to user programs comes under the operating system's responsibilities. User programs interface with these services through system calls. In other words, user programs obtain these services from the operating system by making system calls. A system call is similar to a procedure call except that it switches the mode of execution from the user mode to the kernel mode and invokes the operating system. The operating system then determines what the user program actually wants, performs the requested work, and returns control to the instruction following the system call. The user program then proceeds again in the user mode.
To understand how system calls are used in the command-line
interface, let us take an example of a program that opens a file and
writes some data in it. Suppose the file name and the data to be
written on it is provided by the user during the execution of the
program. This program performs a sequence of system calls during its
execution. First, it makes a system call to prompt a message on
screen to ask the user to provide the file name and the data to be
written on the file. Then, the user provides the file name and the data
by typing it through keyboard, and reading this from keyboard again
needs a system call. Having obtained the file name, the program attempts
to open the required file, for which another system call needs to be
made. Once the file is opened, the data is written to the file (requires
a system call) and the file is closed (another system call). Finally, a
system call is performed to prompt the user with a message to inform
that the task is completed successfully.
The earlier discussion explains the use of system calls during
normal operation; however, error could occur during any operation.
For instance, when the program attempts to open the file, an error
such as file not found, hardware failure, file protection violation, etc.,
could occur. In this situation, the program cannot proceed with its
normal behaviour, instead, it should prompt an appropriate message
on the screen (a system call needed) and then terminate abnormally
(another system call).
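For concreteness, the following C sketch performs roughly the sequence described above using the POSIX system calls read, write, open, and close. The buffer sizes, messages, and error handling are simplified for illustration, and the exact call names differ on other operating systems.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    char name[128], data[256];

    /* System call: write a prompt to the screen. */
    write(STDOUT_FILENO, "Enter file name: ", 17);
    /* System call: read the file name typed on the keyboard. */
    ssize_t n = read(STDIN_FILENO, name, sizeof name - 1);
    if (n <= 0) exit(1);
    name[n] = '\0';
    name[strcspn(name, "\n")] = '\0';

    write(STDOUT_FILENO, "Enter data: ", 12);
    n = read(STDIN_FILENO, data, sizeof data);
    if (n < 0) exit(1);

    /* System call: open (create) the required file. */
    int fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        /* Error path: report the problem and terminate abnormally. */
        perror("could not open file");
        exit(1);
    }

    write(fd, data, (size_t)n);        /* system call: write the data to the file */
    close(fd);                         /* system call: close the file             */

    write(STDOUT_FILENO, "Task completed successfully\n", 28);
    return 0;
}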
It is now clear that even simple programs make heavy use of operating system services through system calls. In general, the services offered by these calls include creation and termination (or deletion) of processes; creation, deletion, reading, writing, opening, and closing of files; management of directories; and carrying out input and output. In fact, the set of services offered through system calls determines a significant part of the operating system's responsibilities. Note that a given system call may have the same or a different name in different operating systems.
Types of System Calls
All the system calls provided by an operating system can be roughly
grouped into following five major categories:
• Process management: The system calls under this category
include the calls to create a new process, terminate a process,
setting and retrieving process attributes (such as process priority,
its maximum allowable execution time, etc.), forcing a process to
wait for some time or some event to occur, etc. Some of the
commands that come under this category are create process,
terminate process, load, execute, end, and abort.

• File management: The system calls under this category include the calls to create, delete, open, close, read, and write a file. Some of the commands that come under this category are create, delete, open, close, read, and write.
• Device management: The system calls under this category include the calls to request a device, release it, and perform some operations (such as read or write) with the device. Some of the commands that come under this category are request, release, read, and write.
• Information maintenance: The system calls under this category include the calls to return information about the system, such as the system's current date and time, the number of current users, the version of the operating system, the amount of free memory, etc. Some of the commands that come under this category are time, date, get process attributes, and set process attributes.
• Communications: The system calls under this category include the calls to open and close a communication connection, read and write messages, etc. Some of the commands that come under this category are open connection and close connection.

1.10.5 System Programs


In addition to system calls, modern systems also provide a variety of
system programs. These programs act as an interface between the
operating system and the application programs. They provide an
environment in which application programs can be developed and
executed in a convenient manner. They can be classified into the
following categories:
• File management: The system programs under this category
provide commands such as cut/copy, dump, list, print, etc., to
perform various operations on files and directories.
• File modification: The system programs under this category
allow creating or modifying the contents of a file stored on disk or
some other storage device. Text editor is a system program that
belongs to this category.
• Communications: The system programs under this category
enable communication among different users, processes, or
systems by establishing virtual connections between them. With
the help of these programs, a user can send messages to other
users, log on some remote systems, or transfer data from other
system to its own system.
• Status information: The system programs under this category
are used to present the status of the computer system such as
system date and time, number of users connected, CPU
utilization, disk and memory usage, configuration information, etc.
• Programming language support: Nowadays, several
programming languages support different system programs such
as compilers, assemblers, and interpreters. These system
programs are generally provided to the users along with the
operating system.
• Program loading and execution: After a user program has been
compiled, it needs to be loaded into the main memory for execution.
The task of loading a program into the memory is performed by
the loader, a system program. A system may provide different
loaders including absolute loaders, relocatable loaders, overlay
loaders, etc. In addition, the successful execution of the program
also requires debugging, which is performed by debugger—
another system program under this category.

1.10.6 System Structure


Every operating system has its own internal structure in terms of file
arrangement, memory management, storage management, etc., and
the entire performance of the system depends on its structure. The
internal structure of operating system provides an idea of how the
components of the operating system are interconnected and blended
into kernel. This section discusses various system structures that
have evolved with time.

Simple Structure
Early operating systems were developed with an elementary
approach without much concern about the structure. In this approach,
the structure of the operating systems was not well-defined. The
operating systems were monolithic, written as a collection of
procedures where each procedure is free to call any other procedure.
An example of operating systems designed with this approach is MS-
DOS. Initially, MS-DOS was designed as a small and simple system with limited scope, but it grew beyond its original scope with time.
It was designed with the idea of providing more functionality within
less space; therefore, it was not carefully divided into modules. Figure
1.12 shows the structure of the MS-DOS system.
Though MS-DOS has some structuring, there is no clear separation between the different interfaces and levels of functionality. For example, application programs can directly call the basic I/O routines to read/write data on the disk instead of going through a series of interfaces. This freedom makes the MS-DOS system susceptible to malicious programs that may lead to a system crash. Moreover, due
to the lack of hardware protection and dual-mode operation in the
Intel 8088 system (for which MS-DOS system was developed), the
base hardware was directly accessible to the application programs.
Fig. 1.12 Structure of the MS-DOS System

Layered Approach
In the layered approach, the operating system is organized as a
hierarchy of layers with each layer built on the top of the layer below
it. The topmost layer is the user interface, while the bottommost layer
is the hardware. Each layer has a well-defined function and
comprises data structures and a set of routines. The layers are
constructed in such a manner that a typical layer (say, layer n) is able
to invoke operations on its lower layers and the operations of layer n
can be invoked by its higher layers.
Fig. 1.13 Layers in THE System

‘THE system’ was the first layer-based operating system, developed in 1968 by E.W. Dijkstra and his students. This operating system consisted of six layers (0-5), and each layer had a predefined function as shown in Figure 1.13.
The layered design of the operating system provides some
benefits, which are as follows:
• It simplifies the debugging and verification of the system. As the
lowest layer uses merely the base hardware, it can be debugged
without reference to the rest of the system. Once it has been
verified, its correct functioning can be assumed while the second
layer is being verified. Similarly, each higher level layer can be
debugged independent of the lower layer. If during verification,
any bug is found, it will be on the layer being debugged as lower
layers have already been verified.
• It supports information hiding. Each higher level layer is required
to know only what operations the lower layers provide and not
how they are being implemented.
The layered approach has some limitations too, which are as
follows:
• As each higher level layer is allowed to use only its lower level
layers, the layers must be defined carefully. For example, the
device driver of the physical disk must be defined at a layer below
the one containing memory-management routines. This is
because memory management needs to use the physical disk.
• The time taken in executing a system call is much longer as compared to that in non-layered systems. This is because any request by the user has to pass through a number of layers before the action can be taken. As a result, system overhead increases and efficiency decreases.

Microkernels
Initially, the size of the kernel was small; with Berkeley UNIX (BSD) began the era of large monolithic kernels. The monolithic kernel runs every
basic system service like scheduling, inter-process communication,
file management, process and memory management, device
management, etc., in the kernel space itself. The inclusion of all basic
services in kernel space increased the size of the kernel. In addition,
these kernels were difficult to extend and maintain. The addition of
new features required the recompilation of the whole kernel, which
was time and resource consuming.
To overcome these problems, an approach called microkernel was developed that emphasized modularizing the kernel. The idea is to remove the less essential components from the kernel and keep only a subset of the mechanisms typically included in a kernel, thereby reducing its size as well as the number of system calls. The components moved outside the kernel are implemented either as system- or user-level programs. The Mach system and OS X are examples of operating systems designed with the microkernel approach.
The main advantage of the microkernel approach is that the
operating system can be extended easily; the addition of new
services in the user space does not cause any changes at the kernel
level. In addition, microkernel offers high security and reliability as
most services run as user processes rather than kernel processes.
Thus, if any of the running services fail, the rest of the system
remains unaffected.
Note: Though in the microkernel approach, the size of the kernel was
reduced, still there is an issue regarding which services to be
included in the kernel and which services to be implemented at the
user level.

Modules
The module approach employs object-oriented programming
techniques to design a modular kernel. In this approach, the
operating system is organized around a core kernel and other
loadable modules that can be linked dynamically with the kernel
either at boot time or at run time. The idea is to have the kernel provide only core services, while other services can be added dynamically. An example of a module-based operating system is Solaris, which consists of a core kernel and seven loadable kernel
modules: scheduling classes, file systems, loadable system calls,
executable formats, streams modules, miscellaneous, and device and
bus drivers.
The modular approach is similar to the layered approach in the
sense that each kernel module has well-defined interfaces. However,
it is more flexible than the layered approach as each module is free to
call any other module.

1.10.7 Virtual Machines


A virtual machine is an identical copy of the bare hardware, including the CPU, disks, I/O devices, interrupts, etc. It allows each user to run the operating system or software packages of his or her choice on a single machine, thereby creating an illusion that each user has his or her own machine.
The virtual machine operating system (VMOS) creates several virtual machines by partitioning the resources of the real machine. The operating system uses the CPU scheduling and virtual memory concepts to create an appearance that each running process has its own processor as well as its own virtual memory (see Figure 1.14). Spooling and the file system are used to create the illusion that each user has his or her own card reader and line printer.

Fig. 1.14 Virtual Machine Structure

The virtual machine approach provides the following benefits:
• Using virtual machines does not result in any extra overhead or performance degradation, as each virtual machine has the same architecture as that of a real machine.
• Generally, while developing the operating system, the normal
functioning of the current system is to be halted. However, by
using the virtual machine system, each system programmer can
be provided with his own virtual machine for system development.
Thus, there is no need to interrupt the normal system operation.
• The VMOS keeps the virtual machines isolated from one another.
This results in the protection of system resources.

LET US SUMMARIZE
1. The operating system is defined as a program that is running at all times
on the computer (usually called the kernel). It acts as an interface
between the computer users and the computer hardware.
2. An operating system performs two basically unrelated functions: extending
the machine and managing resources. It can be described as an
extended machine that hides the hardware details from the user and
makes the computer system convenient and easy to use. It can also be
described as a resource manager that manages all the computer
resources efficiently.
3. The role of an operating system can be more fully understood by exploring
it from two different viewpoints: the user point of view and the system
point of view.
4. In serial processing, all the users are allowed to access the computer
sequentially (one after the other).
5. In the batch processing system, the operator used to batch together the
jobs with similar requirements, and run these batches one by one.
6. Multiprogramming allows multiple jobs to reside in the main memory at the
same time. If one job is busy with I/O devices, the CPU can pick another
job and start executing it. Thus, jobs are organized in such a way that the
CPU always has one to execute. This increases the CPU utilization by
minimizing the CPU idle time.
7. The number of jobs competing to get the system resources in a
multiprogramming environment is known as degree of multiprogramming.
8. An extension of multiprogramming systems is time-sharing systems (or
multitasking) in which multiple users are allowed to interact with the
system through their terminals.
9. The CPU in time-sharing systems switches so rapidly from one user to
another that each user gets the impression that only he or she is working
on the system, even though the system is being shared among multiple
users.
10. Different types of operating systems include batch operating systems,
multiprogramming operating systems, time-sharing systems, real-time
operating system, distributed operating system, PC operating systems,
and mobile operating systems.
11. A computer system basically consists of one or more processors (CPUs),
several device controllers, and the memory. All these components are
connected through a common bus that provides access to shared
memory. Each device controller acts as an interface between a particular
I/O device and the operating system.
12. When the system boots up, the initial program that runs on the system is
known as bootstrap program.
13. The event notification is done with the help of an interrupt that is fired
either by the hardware or the software.
14. Whenever an interrupt is fired, the CPU stops executing the current task,
and jumps to a predefined location in the kernel’s address space, which
contains the starting address of the service routine for the interrupt
(known as interrupt handler).
15. Whenever a program needs to be executed, it must be first loaded into the
main memory (called random-access memory or RAM). Two instructions,
namely, load and store are used to interact with the memory.
16. Secondary storage is nonvolatile in nature, that is, the data is permanently
stored and survives power failure and system crashes. Magnetic disk
(generally called disk) is the primary form of secondary storage that
enables the storage of enormous amount of data.
17. Handling I/O devices is one of the main functions of an operating system.
A significant portion of the code of an operating system is dedicated to managing I/O.
18. Single-processor systems consist of one main CPU that can execute a
general-purpose instruction set, which includes instructions from user
processes. Other than the one main CPU, most systems also have some
special-purpose processors.
19. The multiprocessor systems (also known as parallel systems or tightly
coupled systems) consist of multiple processors in close communication
in a sense that they share the computer bus and even the system clock,
memory, and peripheral devices.
20. A clustered system is another type of system with multiple CPUs. In
clustered systems, two or more individual systems (called nodes) are
grouped together to form a cluster that can share storage and are closely
linked via high-speed LAN (local area network).
21. The part of the operating system called interrupt service routine (ISR)
executes the appropriate code segment to deal with the interrupt.
22. In order to ensure the proper functioning of the computer system, the
operating system, and all other programs and their data must be
protected against the incorrect programs. To achieve this protection, two
modes of operations, namely, user mode and monitor mode (also known
as supervisor mode, system mode, kernel mode, or privileged mode) are
specified.
23. It is necessary to prevent a user program from gaining the control of the
system for an infinite time. For this, a timer is maintained, which interrupts
the system after a specified period.
24. Though the structure of all systems may differ, the common goal of most
systems is to support the system components including process
management, memory management, file management, I/O management,
and protection and security.
25. One of the major responsibilities of the operating system is to provide an
environment for an efficient execution of user programs. For this, it
provides certain services to the programs and the users. These services
are divided into two sets. One set of services exists for the convenience of
users, and another set of services ensures the efficient operations of the
system in a multiprogramming environment.
26. Providing an interface to interact with the users is essential for an
operating system. There are two types of user interface: command-line
interface and graphical user interface (GUI).
27. All the system calls provided by an operating system can be roughly
grouped into following five major categories, namely, process
management, file management, device management, information
maintenance, and communication.
28. The system programs act as an interface between the operating system
and the application programs. They provide an environment in which
application programs can be developed and executed in a convenient
manner.
29. Every operating system has its own internal structure in terms of file
arrangement, memory management, storage management, etc., and the
entire performance of the system depends on its structure. Various
system structures have evolved with time including simple structure,
layered structure, microkernel, and modules.
30. A virtual machine is an identical copy of the bare hardware, including the CPU, disks, I/O devices, interrupts, etc. It allows each user to run the operating system or software packages of his or her choice on a single machine, thereby creating an illusion that each user has his or her own machine.

EXERCISES
Fill in the Blanks
1. To achieve protection, some of the machine instructions that may harm
are designated as _____________.
2. When the system gets started, it is in _____________ mode.
3. The memory-resident portion of the batch operating system is known as
_____________ .
4. The number of jobs competing to get the system resources in
multiprogramming environment is known as _____________ .
5. The lowest level layer of a computer system is _____________ .

Multiple Choice Questions


1. The operating system acts as a:
(a) Resource manager
(b) Interface
(c) Neither (a) nor (b)
(d) Both (a) and (b)
2. Which of the following instructions is used to interact with memory?
(a) Load
(b) Store
(c) Both (a) and (b)
(d) None of these
3. Which of the following does not provide GUI?
(a) MS-DOS
(b) UNIX
(c) Apple Macintosh
(d) None of these
4. Which of the following is false?
(a) Time-sharing system provides quicker response time than the
multiprogramming system.
(b) Multiprogramming systems are more complex than time-sharing
systems.
(c) Time-sharing system is an extension of the multiprogramming system.
(d) In the time-sharing system, each user is assigned a fixed time-slot.
5. Transparency is the main objective of:
(a) Distributed operating system
(b) Multiprogramming operating system
(c) Real-time operating system
(d) Mobile operating system

State True or False


1. The operating system comprises five layers.
2. The set of services offered through system calls determines a significant
part of the operating system’s responsibilities.
3. A variable timer is usually implemented by a fixed-rate clock and a
counter.
4. More than one device can be attached to a device controller.
5. In the case of time multiplexing of a resource, each user gets some of its
portion, instead of taking turns.

Descriptive Questions
1. What is an operating system? Give the view of an OS as a resource
manager.
2. How does a computer system handle interrupts? Discuss how interrupts can be handled quickly.
3. Discuss the storage structure of a computer system.
4. How is an I/O operation handled by the system?
5. What do you mean by parallel clustering?
6. Describe briefly how operating systems have evolved from serial processing to multiprogramming systems.
7. Write short notes on the following:
(a) Multiprogramming
(b) Time-sharing systems
(c) Dual-mode operation
(d) Command-line interface
(e) Microkernel
(f) Virtual machines
8. Compare and contrast the different types of operating systems.
9. Discuss the various services that the operating system should provide.
10. Why is maintaining a timer important?
11. How is the GUI better than the command-line interface?
12. What are system calls? Describe the use of system calls with the help of
an example.
13. Explain various categories of system programs.
14. Discuss various system structures that have evolved with time.
chapter 2

Process Management

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ Understand the basic concepts of processes.
⟡ Discuss various states of a process and the transition between these
states.
⟡ Define the term process scheduling.
⟡ Explain various operations that can be performed on processes.
⟡ Understand the concept of cooperating process.
⟡ Provide an overview of inter-process communication.
⟡ Explain different mechanisms used for communication in client-
server environment.

2.1 INTRODUCTION
Earlier, only one program could be loaded into the main memory for execution at a time. This program had access to all the resources of the computer, such as memory, CPU time, I/O devices, and so on. As time went by, newer techniques incorporated a variety of novel and powerful features that dramatically improved the efficiency and functionality of the overall system. Modern computer systems support multiprogramming, which allows a number of programs to reside in the main memory at the same time. These programs can be executed concurrently, thereby requiring the system resources to be shared among them. Multiprogrammed systems need to distinguish among the multiple executing programs, and this is accomplished with the concept of a process (also called a task on some systems).
When multiple processes run on a system concurrently and more than one process requires the CPU at the same time, it becomes essential to select one process to which the CPU can be allocated. To serve this purpose, scheduling is required. Moreover, the multiple processes running on a system may also need to communicate with each other in order to exchange data or information. This kind of communication between processes is referred to as inter-process communication (IPC).

2.2 PROCESS CONCEPT


In this section, we will discuss some basic concepts of computer
processes.

2.2.1 The Process


As discussed in the previous chapter, a process is a program under
execution or an executing set of machine instructions. It can be either
a system process executing the system’s code, or a user process
executing the user’s code.
There is only a hairline difference between the program and the
process in the sense that a program is a passive entity that does not
initiate anything by itself whereas a process is an active entity that
performs all the actions specified in a particular program. A process
comprises not only the program code (known as text section) but
also a set of global variables (known as data section) and the
process control block (PCB).
There can be either one-to-one or one-to-many relationship
between programs and processes. A one-to-one relationship exists in
case only a single instance of a program is running on the system.
On the other hand, if the multiple instances of a single program are
running simultaneously or when a concurrent program (a program
that requires some of its parts to be executed concurrently) is being
run, there exists one-to-many relationship between programs and
processes. In this case, the text section of the multiple instances will
be same but the data section will be different.
One important thing to notice about processes is that some processes involve more computation than I/O operations, thereby demanding greater use of the CPU than of I/O devices during their lifetime. Such processes, where the speed of execution is governed by the CPU, are called CPU-bound or compute-bound. In contrast to this, some processes involve a lot of I/O operations during their lifetime. Such processes, where the speed of execution is governed by the I/O devices and not by the CPU, are called I/O-bound.

2.2.2 Process States


Each process in the operating system is tagged with a ‘state’ variable
—an integer value that helps the operating system to decide what to
do with the process. It also indicates the nature of the current activity
in a process. A process may be in one of the following states
depending on the current activity of the process.
• New: A process is said to be in ‘new’ state if it is being created.
• Ready: A process is said to be in ‘ready’ state if it is ready for the
execution and waiting for the CPU to be allocated to it.
• Running: A process is said to be in ‘running’ state if the CPU has
been allocated to it and it is being executed.
• Waiting: A process is said to be in ‘waiting’ state (also called
‘blocked’ state) if it has been blocked by some event. Unless that
event occurs, the process cannot continue its execution.
Examples of such blocking events are completion of some I/O
operation, reception of a signal, etc. Note that a process in
waiting state is unable to run even if the CPU is available.
• Terminated: A process is said to be in ‘terminated’ state if it has
completed its execution normally or it has been terminated
abnormally by the operating system because of some error or
killed by some other process.
Note: On a single processor system, only one process may be in
running state at one time; however, in a multiprocessor system with m
CPUs, at most m processes may be in running state at one time.
Each process undergoes change in state during its lifetime. The
change in state of a process is known as state transition of a
process. By and large, it is caused by the occurrence of some event
in the system. There are many possible state transitions (see Figure
2.1) that may crop up. These transitions along with their possible
causes are as follows:

Fig. 2.1 Process State Transition Diagram

• New → Ready: This transition takes place if a new process has been loaded into the main memory and it is waiting for the CPU to
be allocated to it.
• Ready → Running: This transition takes place if the CPU has
been allocated to a ready process and it has started its execution.
• Running → Ready: This transition may occur if:
■ the time slice of the currently running process has expired.
■ some higher priority process gets ready for execution, etc.
In this case, the CPU is preempted from the currently executing process and allocated to some other ready process.
• Running → Waiting: This transition may take place if the
currently running process
■ needs to perform some I/O operation,
■ has to wait for a message or some action from another
process,
■ requests for some other resource.
In this case, the CPU gets freed by the process and can be
allocated to some other ready process.
• Running → Terminated: This transition takes place if the
currently running process
■ has completed its task and requests to the operating system
for its termination,
■ is terminated by its parent in case the function performed by
it is no longer required,
■ is terminated by the kernel because it has exceeded its
resource usage limit or involved in a deadlock.
In this case, the CPU is preempted from the currently running
process and allocated to some other ready process.
• Waiting → Ready: This transition takes place if an event (for
example, I/O completion, signal reception, and synchronization
operation) for which the process was waiting, has occurred.

2.2.3 Process Control Block (PCB)


To keep track of all the processes in the system, the operating system
maintains a structurally organized table called process table that
includes an entry for each process. This entry is called process
control block (PCB)—a data structure created by the operating
system for representing a process. A process control block stores
descriptive information pertaining to a process such as its state,
program counter, memory management information, information
about its scheduling, allocated resources, accounting information,
etc., that is required to control and manage a particular process. The
basic purpose of the PCB is to indicate the progress of a process so far. Some of the important fields stored in a PCB are as follows (a simplified structure grouping these fields is sketched after the list):
• Process ID: Each process is assigned a unique identification
number called process identifier (PID) by the operating system
at the time of its creation. The PID is used to refer to the process in the operating system.
• Process state: It stores the current state of a process that can be
new, ready, running, waiting, or terminated.
• Parent process ID: It stores the PID of the parent, if the process
has been created by some other process.
• Child process IDs: It stores the PIDs of all the child processes of a parent process.
• Program counter: It contains the address of the next instruction to be executed in the process. Whenever the CPU switches from one process to another, the program counter of the old process is saved so that the operating system can resume from the same instruction when the old process is restarted.
• Event information: If the process is in waiting state then this field
contains the information about the event for which the process is
waiting to happen. For example, if the process is waiting for an
I/O device, then this field stores the ID of that device.
• Memory management information: It includes information
related to the memory configuration for a process such as the
value of base and limit registers, the page tables (if paging
memory management technique has been used) or the segment
tables (if segmentation memory management technique has been
used). Memory management techniques are discussed in detail in
Chapter 7.
• CPU registers: They store the contents of index registers,
general purpose registers, condition code information, etc., at the
time when the CPU was last freed by the process or preempted
from the process.
• CPU scheduling information: It includes the information used by
scheduling algorithms such as the process priority number (in
case the priority scheduling is to be used for the process), the
pointers to appropriate scheduling queues depending upon the
current state of the process, the time when CPU was last
allocated to the process, etc. CPU scheduling is discussed in
detail in Chapter 4.
• I/O status: It includes information like I/O devices allocated to a
process, pointers to the files opened by the process for I/O, the
current position in the files, etc.
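The following is a simplified sketch in C of a structure grouping these fields; the field names and types are illustrative only and do not correspond to any particular operating system.

struct pcb {
    int            pid;               /* process ID                                        */
    int            state;             /* new, ready, running, waiting, or terminated       */
    int            parent_pid;        /* PID of the parent process                         */
    unsigned long  program_counter;   /* address of the next instruction to execute        */
    unsigned long  registers[16];     /* saved CPU registers                               */
    int            priority;          /* CPU scheduling information                        */
    void          *memory_info;       /* base/limit values, page or segment tables         */
    int            open_files[16];    /* I/O status: files opened by the process           */
    int            waiting_event;     /* event information (e.g., ID of the awaited device) */
};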

2.3 PROCESS SCHEDULING


The main objective of multiprogramming is to keep the jobs organized in such a manner that the CPU always has one to execute. This ensures that the CPU is utilized to the maximum level by reducing its idle time, which is achieved by keeping the CPU busy at all times. This implies that some process must always be running on the CPU. However, when two or more processes compete for the CPU at the same time, a choice has to be made as to which process to allocate the CPU next. This procedure of determining the next process to be executed on the CPU is called process scheduling, and the module of the operating system that makes this decision is called the scheduler.

Scheduling Queues
For scheduling purposes, there exist different queues in the system;
these are as follows:
• Job queue: As the processes enter the system for execution, they are placed into a queue called the job queue (or input queue) on a mass storage device such as the hard disk.
• Ready queue: From the job queue, the processes which are
ready for the execution are shifted into the main memory. In the
main memory, these processes are kept into a queue called ready
queue. In other words, the ready queue contains all those
processes that are waiting for the CPU.
• Device queue: For each I/O device in the system, a separate
queue is maintained which is called device queue. The process
that needs to perform I/O during its execution is kept into the
queue of that specific I/O device; it waits there until it is served by
the device.
Generally, both the ready queue and device queue are maintained
as linked lists that contain PCBs of the processes in the queue as
their nodes. Each PCB includes a pointer to the PCB of the next
process in the queue (see Figure 2.2). In addition, the header node of
the queue contains pointers to the PCBs of the first and the last
process in the queue.

Fig. 2.2 Ready Queue and Device Queue Maintained as Linked List
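A minimal C sketch of such a linked-list queue of PCBs, with head and tail pointers as described above, is given below; the structure and function names are illustrative only and do not belong to any real kernel.

#include <stddef.h>

struct pcb {                         /* simplified PCB node                        */
    int pid;                         /* process identifier                         */
    struct pcb *next;                /* pointer to the PCB of the next process     */
};

struct queue {
    struct pcb *head;                /* PCB of the first process in the queue      */
    struct pcb *tail;                /* PCB of the last process in the queue       */
};

void enqueue(struct queue *q, struct pcb *p)   /* add a process at the tail        */
{
    p->next = NULL;
    if (q->tail == NULL)
        q->head = q->tail = p;
    else {
        q->tail->next = p;
        q->tail = p;
    }
}

struct pcb *dequeue(struct queue *q)           /* remove the process at the head   */
{
    struct pcb *p = q->head;
    if (p != NULL) {
        q->head = p->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    return p;
}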

Whenever a process in the job queue becomes ready to execute, it is brought into the ready queue where it waits for the CPU
allocation. Once CPU is allocated to it (that is, the process switches
to the running state), the following transitions may occur.
• If the process needs to perform some I/O operation during its
execution, it is removed from the ready queue and put into the
appropriate device queue. After the process completes its I/O
operation and is ready for the execution, it is switched from the
device queue to ready queue.
• If an interrupt occurs, the CPU can be taken away from the
currently executing process forcibly and the process has to wait
until the interrupt is handled. After that the process is put back
into the ready queue.
• If the time slice (in the case of time sharing systems) of the
process has expired, the process is put back into the ready
queue.
• If the process creates a new process and has to wait until the child process terminates, the parent process is suspended. After the child process completes its execution, the parent is put back into the ready queue.
• If the process has successfully completed its task, it is terminated.
The PCB and all the resources allocated to the process are
deallocated.
All these transitions can be represented with the help of a
queuing diagram as shown in Figure 2.3.

Fig. 2.3 Queuing Diagram


Note: In a single processor system, since there can be only one running
process at a time, there is no need to maintain a queue for the running
processes.

Types of Schedulers
The following types of schedulers (see Figure 2.4) may coexist in a
complex operating system.
• Long-term scheduler, also known as job scheduler or
admission scheduler, works with the job queue. It selects the
next process to be executed from the job queue and loads it into
the main memory for execution. The long-term scheduler must
select the processes in such a way that some of the processes
are CPU-bound while others are I/O-bound. This is because if all
the processes are CPU-bound, then the devices will remain
unused most of the time. On the other hand, if all the processes
are I/O-bound then the CPU will remain idle most of the time.
Thus, to achieve the best performance, a balanced mix of CPU-
bound and I/O-bound processes must be selected. The main
objective of this scheduler is to control the degree of
multiprogramming (that is, the number of processes in the ready
queue) in order to keep the processor utilization at the desired
level. For this, the long-term scheduler may admit new processes
in the ready queue in the case of poor processor utilization or
may reduce the rate of admission of processes in the ready
queue in case the processor utilization is high. In addition, the
long-term scheduler is generally invoked only when a process
exits from the system. Thus, the frequency of invocation of the long-term scheduler depends on the system and workload and is much lower than that of the other two types of schedulers.
• Short-term scheduler, also known as CPU scheduler or
process scheduler, selects a process from the ready queue and
allocates CPU to it. This scheduler is required to be invoked
frequently as compared to the long-term scheduler. This is
because generally a process executes for a short period and then
it may have to wait either for I/O or for something else. At that
time, CPU scheduler must select some other process and
allocate CPU to it. Thus, the CPU scheduler must be fast in order
to provide the least time gap between executions.

Fig. 2.4 Types of Schedulers

• Medium-term scheduler, also known as swapper, comes into play whenever a process is to be removed from the ready queue
(or from the CPU in case it is being executed) thereby reducing
the degree of multiprogramming. This process is stored at some
space on the hard disk and later brought into the memory to
restart execution from the point where it left off. This task of
temporarily switching a process in and out of main memory is
known as swapping (discussed in detail in Chapter 7). The
medium-term scheduler selects a process among the partially
executed or unexecuted swapped-out processes and swaps it in
the main memory. The medium-term scheduler is usually invoked
when there is some unoccupied space in the memory made by
the termination of a process or if the supply of ready processes
reduces below a specified limit.

Context Switch
Transferring the control of CPU from one process to another
demands saving the context of the currently running process and
loading the context of another ready process. This mechanism of
saving and restoring the context is known as context switch. The
portion of the process control block including the process state,
memory management information, and CPU scheduling information
together constitutes the context (also called state information) of a
process. Context switch may occur due to a number of reasons some
of which are as follows:
• The current process terminates and exits from the system.
• The time slice of the current process expires.
• The process has to wait for I/O or some other resource.
• Some higher priority process enters the system.
• The process relinquishes the CPU by invoking some system call.
Context switching is performed in two steps, which are as follows:
1. Save context: In this step, the kernel saves the context of the currently executing process in its PCB so that this context can be restored later, when the execution of the suspended process is resumed.
2. Restore context: In this step, the kernel loads the saved context
of a different process that is to be executed next. Note that if the
process to be executed is newly created and the CPU has not yet
been allocated to it, there will be no saved context. In this case,
the kernel loads the context of the new process. However, if the
process to be executed was in waiting state due to I/O or some
other reason, there will be saved context that can be restored.
One of the major drawbacks of context switching is that it incurs a considerable cost to the system in terms of real time and CPU cycles, because the system does not perform any productive work during switching. Therefore, as far as possible, context switching should be kept to a minimum. Figure 2.5 shows context switching between two processes P1
and P2.
Fig. 2.5 Context Switching between Processes P1 and P2

2.4 OPERATIONS ON PROCESSES


There are a number of operations that can be performed on processes, such as creating, terminating, suspending, or resuming a process. To successfully execute these operations, the operating system provides run-time services (or system calls) for process management.
by embedding the process supervisory calls in the user’s program or
indirectly by typing commands on the terminal which are translated by
the system into system calls. In this section, we will discuss only
process creation and termination operations.

2.4.1 Process Creation


Whenever an operating system is booted, a number of processes
(system processes) are created automatically. Out of these, some
involve user interaction (called foreground processes) while others
are not related with any user, but still perform some specific function
(called background processes). In addition to system processes,
new processes can be created afterward as well. Sometimes, a user
process may need to create one or more processes during its
execution. It can do so by invoking the process creation system call (for example, CreateProcess() in Windows and fork() in UNIX), which tells the operating system to create a new process. This
task of creating a new process on the request of some other process
is called process spawning. The process that spawns a new
process is called parent process whereas the spawned process is
called child process (or sub process). The newly created process
can further create new processes thereby generating hierarchy of
processes.
Whenever a process creates a child process, a number of different situations may arise depending on the operating system installed. Some of these situations are as follows:
• Either the parent and child process may run concurrently
(asynchronous process creation) or the parent process may
wait until the child process completes its task and terminates
(synchronous process creation).
• The newly created process may be the duplicate of the parent
process in which case it contains a copy of the address space of
its parent. On the other hand, the child process may have a new
program loaded into its address space.
• The child process may be restricted to a subset of resources
available to the parent process or the child process may obtain its
resources directly from the operating system. In the former case,
the resources being used by the parent process need to be
divided or shared among its various child processes.
Note: Every time a process creates a new process, the PID of the
child process is passed on to the parent process.
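As an illustration, the following minimal C sketch shows synchronous process creation on a UNIX-like system using the fork(), wait(), and exit() calls mentioned above (a POSIX environment is assumed; error handling is kept minimal).

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();                  /* spawn a child process                      */

    if (pid < 0) {                       /* fork failed                                */
        perror("fork");
        exit(1);
    } else if (pid == 0) {               /* child: runs a copy of the parent's address space */
        printf("child  PID = %d\n", getpid());
        exit(0);                         /* normal termination of the child            */
    } else {                             /* parent: fork() returns the child's PID     */
        printf("parent PID = %d, child PID = %d\n", getpid(), pid);
        wait(NULL);                      /* wait until the child terminates
                                            (synchronous process creation)             */
    }
    return 0;
}

Here, fork() returns 0 in the child and the PID of the child in the parent, which is how the child's PID is passed on to the parent (see the Note above).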

2.4.2 Process Termination


Depending upon the condition, a process may be terminated either
normally or forcibly by some other process. Normal termination
occurs when the process completes its task and invokes an
appropriate system call (for example, ExitProcess() in Windows and
exit() in UNIX) to tell the operating system that it is finished. As a
result, all the resources held by the process are de-allocated, the
process returns output data (if any) to its parent, and finally the
process is removed from the memory by deleting its PCB from the
process table.
Note: A process that no longer exists but still its PCB has not been
removed from the process table is known as a zombie process.
Contrary to this, a process may cause abnormal termination of
some other process. For this, the process invokes an appropriate
system call (for example, TerminateProcess() in Windows and kill()
in UNIX) that tells the operating system to kill some other process.
Generally, the parent process can invoke such a system call to
terminate its child process. This usually happens because of the
following reasons.
• Cascading termination in which the termination (whether normal or
forced) of a process causes the termination of all its children. On
some operating systems, a child process is not allowed to
execute when its parent is being terminated. In such cases, the
operating system initiates cascading termination.
• The task that was being performed by the child process is not
required.
• The child process has used up the allocated resources for more
than the permissible time.

2.5 COOPERATING PROCESSES


The processes that coexist in the memory at some time are called
concurrent processes. The concurrent processes may either be
independent or cooperating. The independent processes (also called
competitors), as the name implies, do not share any kind of
information or data with each other. They just compete with each
other for the resources like CPU, and I/O devices that are required to
accomplish their operations. The cooperating (also called
interacting) processes, on the other hand, need to exchange data or
information with each other. In other words, we can say a cooperating
process is the one that can affect or be affected by the actions of
other concurrent processes. The need for cooperation among
processes arises because of the following reasons.
• Several processes may need access to the same information. This
requires the operating system to provide a means for concurrent
access to the desired information.
• If a computer system has multiple processing elements (for
example, multiple CPUs or multiple I/O channels), we can make a task execute faster by breaking it into various subtasks and running each of them in parallel.
• The environment supporting cooperating processes will help a
single user to carry out multiple tasks at the same time. For
example, a single user may be editing, printing, and compiling files at the same time.
• The system’s functions can be divided into different processes
and threads in order to construct the system in a modular fashion.
2.6 INTER-PROCESS COMMUNICATION
Cooperating processes require some mechanism to exchange data
or pass information to each other. One such mechanism is inter-
process communication (IPC)—a very useful facility provided by
the operating system. IPC allows the processes running on a single
system to communicate with each other. Two basic communication
models for providing IPC are shared memory systems and
message passing systems. In the former model, a part of memory
is shared among the cooperating processes. The processes that
need to exchange data or information can do so by writing to and
reading from this shared memory. However, in the latter model, the
cooperating processes communicate by sending and receiving
messages from each other. The communication using message
passing is much more time consuming as compared to shared
memory. This is because the message passing system is
implemented with the help of operating system calls and thus, it
requires a major involvement of kernel. On the other hand, in shared
memory systems, system calls are used only to set up the shared
memory area. Once the shared area is set up, no further kernel
intervention is required.

2.6.1 Shared Memory Systems


In shared memory systems, the process that needs to communicate
with the other processes creates a shared memory segment in its
own address space. Other processes can communicate with this
process by attaching this shared memory segment to their own address space. All the communicating processes can read or write
data through this shared area. Note that these processes must be
synchronized so that no two processes are able to access the shared
area simultaneously. Figure 2.6 shows a shared memory
communication model.
Fig. 2.6 Shared Memory Communication Model

To understand the concept of shared memory systems, consider a common example of cooperating processes known as producer-
consumer problem. In this problem, there are two processes, one is
producer that produces the items and the other is consumer that
consumes the items produced by the producer. These two processes
need to run concurrently thereby requiring communication with each
other. One possible solution to this problem can be provided through
shared memory. Both the producer and consumer processes are
made to share a common buffer between them. The producer
process fills the buffer by placing the produced items in it and the
consumer process vacates the buffer by consuming these items.
The buffer shared between producer and consumer processes
may be bounded or unbounded. In bounded buffer, the size of buffer
is fixed; therefore, the producer process has to wait in case the buffer
is full; similarly the consumer process has to wait in case the buffer is
empty. On the other hand, in unbounded buffer, there is no limit on
the buffer size. Thus, only the consumer process has to wait in case
there is no item to be consumed. However, the producer process
need not wait and it may continuously produce items.
To implement the bounded buffer producer-consumer problem
using shared memory, consider that the shared buffer consists of N
slots with each capable of storing an item. Further, assume that the
buffer is implemented as a circular array having two pointers in and
out. The pointer in points to the next free slot in the buffer, while the
pointer out points to the slot containing the next item to be consumed.
Initially, both in and out are set to zero. The following code written in
‘C’ language illustrates the implementation of shared area.
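Since the items are integers (see the Note at the end of this solution), one possible form of these shared declarations is sketched below; the buffer capacity is a constant named size, corresponding to the N slots mentioned above, and the producing and consuming procedures are only declared here.

#define size 10                 /* number of slots (N) in the shared buffer                 */

int buffer[size];               /* the shared circular buffer of integer items              */
int in  = 0;                    /* points to the next free slot                             */
int out = 0;                    /* points to the slot holding the next item to be consumed  */

int  produce_item(void);        /* produces an item; definition not shown (see Note below)  */
void consume_item(int item);    /* consumes an item; definition not shown (see Note below)  */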

To implement the producer process, a local variable item_produced is used that stores the newly produced item. The
producer process produces an item, places it in the buffer at the
position denoted by in, and updates the value of in. It continues to do
so as long as buffer is not full. Once the buffer gets full, that is, when
(in + 1) % size == out, it goes to the waiting state and remains in
that state until some slot becomes free in the buffer (that is, until the
consumer process removes some item from the buffer). The following
code illustrates the implementation of the producer process.
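One possible form of this producer, based on the declarations sketched above (the wait is shown here as a simple loop; proper blocking and synchronization are taken up in Chapter 5), is:

void producer(void)
{
    int item_produced;                      /* holds the newly produced item         */
    while (1) {
        item_produced = produce_item();     /* produce the next item                 */
        while ((in + 1) % size == out)
            ;                               /* buffer full: wait until a slot frees  */
        buffer[in] = item_produced;         /* place the item at position in         */
        in = (in + 1) % size;               /* advance the in pointer                */
    }
}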
Likewise, to implement the consumer process, a local variable
item_consumed is used that stores the item to be consumed. The
consumer process removes an item from the position denoted by out
in the buffer, updates the value of out, and consumes that item. It
continues to do so as long as the buffer is not empty. Once the buffer
gets empty, that is, when in == out, it goes to the waiting state and
remains in that state until the producer process places some item in
the buffer. The following code illustrates the implementation of the
consumer process.
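A matching sketch of the consumer, again with the wait shown as a simple loop, is:

void consumer(void)
{
    int item_consumed;                      /* holds the item to be consumed         */
    while (1) {
        while (in == out)
            ;                               /* buffer empty: wait for an item        */
        item_consumed = buffer[out];        /* remove the item at position out       */
        out = (out + 1) % size;             /* advance the out pointer               */
        consume_item(item_consumed);        /* consume the removed item              */
    }
}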
Note: For the sake of simplicity, we have assumed that the item in the
buffer is of type integer, and the implementation of procedures for
producing or consuming items is not shown here.
This solution to the bounded buffer producer-consumer problem permits at most size-1 items in the buffer at the same time. In
order to have size items in the buffer at the same time, we will need
to develop a different solution. In addition, this solution does not
address how to implement synchronization between producer and
consumer processes. Both the solution and the synchronization are
discussed in Chapter 5.

2.6.2 Message Passing Systems


In message passing systems, two system calls, send() and
receive(), are used. The sender process (say, P1) sends the
message to the operating system by invoking the send() system call.
The operating system stores this message in the buffer area until the
receive() system call is invoked by the receiver process (say, P2).
After that the operating system delivers this message to P2. In case
there is no message available for P2 when it invokes the receive()
system call, the operating system blocks it until some message
arrives for it. On the other hand, if a number of messages arrive for
P2, the operating system puts them in a queue and delivers them in
FIFO order, one message for each invocation of the receive() call by P2. Figure 2.7 shows the message passing
communication model.

Fig. 2.7 Message Passing Communication Model

In message passing, it is not necessary for the communicating processes to reside on the same computer; rather, they may reside on
different computers connected via a network (a distributed
environment). Therefore, whenever two processes want to
communicate, a communication link must be established between
them. At the physical level, the communication link may be
implemented via shared variables or bus or the network, etc.
However, at the logical level, several issues related to the implementation of the communication link arise, which are discussed here.

Types of Communication
Processes may communicate with each other directly or indirectly.
In direct communication, processes address each other by their
PID assigned to them by the operating system. For example, if a
process P1 wants to send a message to process P2, then the system
calls send() and receive() will be defined as follows:
• send(PID2, message)

• receive(PID1, message)

Since both the sender and the receiver processes need to know each other's PID, this type of communication is known as symmetric direct
communication. However, asymmetry in addressing can be
introduced by making only the sender process address the receiver process by its PID; the receiver process need not know the PID of the sender process. In the case of asymmetric direct
communication, the calls send() and receive() will be defined as
follows:
• send(PID2, message)

• receive(id, message)

Now, when the operating system delivers a message to process P2 upon the invocation of a receive() call by it, the parameter id is
replaced with the PID of the sender process.
In indirect communication, messages are sent and received via
mailbox (also known as port)—a repository of inter-process
messages. A mailbox, as the name implies, is just like a postbox into
which messages sent by the processes can be stored and removed
by other processes. The different characteristics of a mailbox are as
follows:
• Each mailbox has a unique ID and the processes communicate
with each other through a number of mailboxes.
• The process that creates the mailbox is the owner of mailbox and
only this process can receive messages from it. Other processes
can only send messages to it. In other words, there can be
multiple senders but a single recipient for a mailbox.
• The process that knows the ID of a mailbox can send messages
to it.
• Besides a user process, the operating system may also own a
mailbox. In this case, the operating system may allow the
processes to create or delete a mailbox, send and receive
messages via mailbox. The process that creates the mailbox
becomes the owner of that mailbox and may receive messages
through this mailbox. However, with time, other processes can
also be made to receive messages through this mailbox by
passing ownership to them.
The system calls to send a message to a mailbox (say, X) and
receive a message from a mailbox will be defined as follows:
• send(X, message)

• receive(X, message)

As stated earlier, a communication link must exist between processes before starting the communication. The communication
link exhibits different properties in direct and indirect communication,
which are discussed in Table 2.1.

Table 2.1 Comparison of Direct and Indirect Communication

Direct communication:
• There exists only one link between each pair of communicating processes.
• A link is associated with just two processes.
• The link is established automatically between the communicating processes, provided the sender process knows the PID of the receiver process.

Indirect communication:
• There may be multiple links between each pair of communicating processes, where each link corresponds to exactly one mailbox.
• A link may be associated with more than two processes.
• The communication link can be established between two processes only if both the communicating processes share a mailbox with each other.

Synchronization
Messages can be sent or received either synchronously or
asynchronously, also called blocking or non-blocking,
respectively. Various design options for implementing send() and
receive() calls are as follows:

• Blocking send: If a process (say, P1) invokes the send() call to send a message to another process (say, P2) or to a mailbox, the
operating system blocks P1 until the message is received by P2 or
by the mailbox.
• Blocking receive: If there is no message available for P2 when it
invokes the receive() system call, the operating system blocks it
until some message arrives for it.
• Non-blocking send: P1 sends the message and continues to
perform its operation without waiting for the message delivery by
P2 or by mailbox.

• Non-blocking receive: When P2 invokes a receive() call, it either gets a valid message if some message is available for it or NULL if
there is no message available for it.
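As a concrete illustration, the following C sketch uses a UNIX pipe between a parent (the sender) and a child (the receiver). A pipe carries a byte stream rather than discrete messages, but the child's read() blocks while the pipe is empty and the parent's write() would block if the pipe's buffer were full, mirroring the blocking options described above (a POSIX environment is assumed).

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) < 0) { perror("pipe"); return 1; }

    if (fork() == 0) {                /* child: the receiver                        */
        char buf[32];
        close(fd[1]);                 /* close the unused write end                 */
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);   /* blocking receive        */
        if (n > 0) { buf[n] = '\0'; printf("received: %s\n", buf); }
        close(fd[0]);
        _exit(0);
    }

    close(fd[0]);                     /* parent: the sender                         */
    const char *msg = "hello";
    write(fd[1], msg, strlen(msg));   /* blocks only if the pipe buffer is full     */
    close(fd[1]);
    wait(NULL);
    return 0;
}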

Buffering
As discussed earlier, the messages sent by a process are temporarily
stored in a temporary queue (also called buffer) by the operating
system before delivering them to the recipient. This buffer can be
implemented in a variety of ways, which are as follows:
• No buffering: The capacity of buffer is zero, that is, no messages
may wait in the queue. This implies that sender process has to
wait until the message is received by the receiver process.
• Bounded buffer: The capacity of the buffer is fixed, say m, that
is, at most m messages may wait in the queue at a time. When
there are less than m messages waiting in the queue and a new
message arrives, it is added in the queue. The sender process
need not wait and it can resume its operation. However, if the
queue is full, the sender process is blocked until some space
becomes available in the queue.
• Unbounded buffer: The buffer has an unlimited capacity, that is,
an infinite number of messages can be stored in the queue. In
this case, the sender process never gets blocked.
• Double buffering: Two buffers are shared between the sender
and receiver process. In case one buffer fills up, the second one
is used. When the second buffer fills up, the first might have been
emptied. This way the buffers are used turn by turn, thus avoiding
the blocking of one process because of another.

2.7 COMMUNICATION IN CLIENT-SERVER SYSTEMS
So far we have discussed the communication mechanism for the
processes running on a single system. However, in an environment
(for example, client-server architecture) where processes are running
on separate systems connected via network, a different mechanism is
required to enable the communication. In this section, we discuss
some mechanisms that facilitate remote communications.

2.7.1 Socket
Socket is defined as an end-point of the communication path between
two processes. Each of the communicating processes creates a
socket and these sockets are to be connected enabling
communication. The socket is identified by a combination of IP
address and the port number. The IP address is used to identify the
machine on the network and the port number is used to identify the
desired service on that machine.
Usually, a machine provides a variety of services such as
electronic mail, Telnet, FTP, etc. To differentiate among these
services, each service is assigned a unique port number. To avail of a specific service on a machine, it is first required to connect to the machine and then to the port assigned for that service.
Note that the port numbers less than 1024 are considered well-known
and are reserved for standard services. For example, the port number
used for Telnet is 23.
Sockets employ the client-server architecture. The server listens to a socket bound to a specific port, waiting for a client to make a connection request. Whenever a client process requests a connection, it is
assigned a port number (greater than 1024) by the host computer
(say M). Using this port number and the IP address of host M, the
client socket is created. For example, if the client on host M having IP
address (125.61.15.7) wants to connect to Telnet server (listening to
port number 23) having IP address (112.56.71.8), it may be assigned
a port number 1345. Thus, the client socket and server socket used
for communication will be (125.61.15.7:1345) and (112.56.71.8:23)
respectively, as shown in Figure 2.8.

Fig. 2.8 Communication between Sockets

Note that each connection between the client and the server
employs a unique pair of sockets. That is, if another client on host M
wants to connect to Telnet server, it must be assigned a port number
different from 1345 (but greater than 1024).
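As an illustration, the following C sketch creates a client socket and connects it to the Telnet port of the server used in the example above, via the standard BSD sockets interface (the address is the illustrative one from the text; error handling is kept minimal).

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);            /* create the client socket    */
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in server;
    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_port   = htons(23);                        /* well-known Telnet port      */
    inet_pton(AF_INET, "112.56.71.8", &server.sin_addr);  /* server IP from the example  */

    /* The kernel assigns an ephemeral port (greater than 1024) to this end automatically. */
    if (connect(fd, (struct sockaddr *)&server, sizeof(server)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }
    /* ... exchange data with send()/recv() ... */
    close(fd);
    return 0;
}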

2.7.2 Remote Procedure Call (RPC)


RPC, as the name implies, is a communication mechanism that
allows a process to call a procedure on a remote system connected
via network. The calling process (client) can call the procedure on the
remote host (server) in the same way as it would call the local
procedure. The syntax of an RPC call is very similar to that of a conventional procedure call, as given below.
Call <Procedure_id> (<List of parameters>);

The RPC system facilitates communication between the client and the server by providing a stub on both client and server. For each
remote procedure, the RPC system provides a separate stub on the
client side. When the client process wants to invoke a remote
procedure, the RPC call is implemented in the following steps.
1. The RPC system invokes the stub for the remote procedure on
the client, passing to it the parameters that are to be passed
further to the remote procedure. The client process is suspended
from execution until the call is completed.
2. The client stub performs parameter marshalling, which involves
packaging the parameters into a machine-independent form so
that they can be transmitted over the network. It now prepares a
message containing the identifier of the procedure to be
executed and the marshalled parameters.
3. The client stub sends the message to the server. After the
message has been sent, the client stub blocks until it gets the
reply to its message.
4. The corresponding stub on the server side receives the message
and converts the parameters into a machine-specific form
suitable for the server.
5. The server stub invokes the desired procedure, passing
parameters to it. The server stub is suspended from execution
until completion of the call.
6. The procedure executes and the results are returned to the
server stub.
7. The server stub converts the results into a machine-independent
form and prepares a message.
8. The server stub sends the message containing the results to the
client stub.
9. The client stub converts the results into machine-specific form
suitable for the client.
10. The client stub forwards the results to the client process. With
this, the execution of RPC is completed, and now, the client
process can continue its execution.
Figure 2.9 depicts all the steps involved in the execution of RPC.

Fig. 2.9 Implementation of RPC
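Step 2 above, parameter marshalling, can be illustrated with a small C sketch that packs an integer and a string into a byte buffer in a fixed (network) byte order. This is only an illustration of the idea; real RPC systems generate such stub code automatically from an interface definition.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* marshal two parameters (an int and a string) into a machine-independent buffer */
size_t marshal(uint8_t *buf, int32_t a, const char *s)
{
    uint32_t na  = htonl((uint32_t)a);          /* fixed byte order for the integer */
    uint32_t len = htonl((uint32_t)strlen(s));  /* length-prefix the string         */
    memcpy(buf, &na, 4);
    memcpy(buf + 4, &len, 4);
    memcpy(buf + 8, s, strlen(s));
    return 8 + strlen(s);                       /* total size of the marshalled data */
}

int main(void)
{
    uint8_t msg[64];
    size_t n = marshal(msg, 42, "hello");
    printf("marshalled %zu bytes\n", n);
    return 0;
}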

2.7.3 Remote Method Invocation (RMI)


RMI is a Java-based approach that facilitates remote communication
between programs written in the Java programming language. It
allows an object executing in one Java virtual machine (JVM) to
invoke methods on an object executing in another Java virtual
machine either on the same computer or on some remote host
connected via network.
To enable the communication between the client and the server
using RMI, the remote methods must be transparent both to the client
and the server. For this, RMI implements the remote objects using
stubs and skeletons. A stub is a client-side proxy for a remote object
while a skeleton is the server-side proxy for the remote object. On
the client side, the stub acts on behalf of the actual remote object.
Whenever a client process wishes to invoke a remote method, the
stub for the remote object is called. This stub prepares a parcel that
contains the name of the method to be invoked on the server along
with the marshalled parameters and sends it to the server. At the
server, the skeleton for the remote object receives the parcel,
unmarshalls the parameters, and invokes the desired method. After
the execution of the method on the server, the skeleton prepares a
parcel containing the marshalled return value (or exception, if any)
and sends it to the client. The client stub then unmarshalls the return
value and forwards it to the client. Figure 2.10 shows the RMI
communication.

Fig. 2.10 RMI Communication

LET US SUMMARIZE
1. A process is a program under execution or an executing set of machine
instructions. It can be either a system process executing the system's code or a user process executing the user's code.
2. A process comprises not only the program code (known as text section)
but also a set of global variables (known as data section) and the process
control block (PCB).
3. The processes that involve more computation than I/O operations thereby
demanding greater use of CPU than I/O devices during their lifetime are
called CPU-bound or compute-bound processes.
4. The processes that involve a lot of I/O operations as compared to
computation during their lifetime are called I/O-bound processes.
5. Each process is labeled with a ‘state’ variable—an integer value that helps
the operating system to decide what to do with the process. It indicates
the nature of the current activity in a process.
6. Various possible states for a process are new, ready, running, waiting, and
terminated.
7. The change in state of a process is known as state transition of a process
and is caused by the occurrence of some event in the system.
8. To keep track of all the processes in the system, the operating system
maintains a table called process table that includes an entry for each
process. This entry is called process control block (PCB).
9. A process control block stores descriptive information pertaining to a
process such as its state, program counter, memory management
information, information about its scheduling, allocated resources,
accounting information, etc., that is required to control the process.
10. The procedure of determining the next process to be executed on the CPU
is called process scheduling and the module of the operating system that
makes this decision is called scheduler.
11. As the processes enter the system for execution, they are kept into a
queue called job queue (or input queue).
12. From the job queue, the processes which are ready for the execution are
brought into the main memory. In the main memory, these processes are
kept into a queue called ready queue.
13. For each I/O device in the system, a separate queue called device queue
is maintained. The process that needs to perform I/O during its execution
is kept into the queue of that specific I/O device and waits there until it is
served by the device.
14. The long-term scheduler, also known as job scheduler or admission
scheduler, selects the next process to be executed from the job queue
and loads it into the main memory for execution.
15. The short-term scheduler, also known as CPU scheduler or process
scheduler, selects a process from the ready queue and allocates the CPU
to it.
16. The medium-term scheduler, also known as swapper, selects a process
among the partially executed or unexecuted swapped-out processes and
swaps it in the main memory.
17. Transferring the control of CPU from one process to another demands
saving the context of the currently running process and loading the
context of another ready process. This task of saving and restoring the
context is known as context switch.
18. The portion of the process control block including the process state,
memory management information and CPU scheduling information
together constitute the context (also called state information) of a process.
19. A user process may create one or more processes during its execution by
invoking the process creation system call.
20. The task of creating a new process on the request of some other process
is called process spawning. The process that spawns a new process is
called parent process whereas the spawned process is called child
process.
21. When a process is terminated, all the resources held by the process are
de-allocated, the process returns output data (if any) to its parent, and
finally the process is removed from the memory by deleting its PCB from
the process table.
22. The processes that coexist in the memory at some time are called
concurrent processes. Concurrent processes may either be independent
or cooperating.
23. Independent processes (also called competitors), as the name implies, do
not share any kind of information or data with each other.
24. Cooperating (also called interacting) processes, on the other hand, need
to exchange data or information with each other.
25. The cooperating processes require some mechanism to communicate with
each other. One such mechanism is inter-process communication (IPC)—
a facility provided by the operating system.
26. Two basic communication models for providing IPC are ‘shared memory
systems’ and ‘message passing systems’.
27. A process running on a system can communicate with another process
running on remote system connected via network with the help of
communication mechanisms, including sockets, remote procedure call
(RPC), and remote method invocation (RMI).
EXERCISES
Fill in the Blanks
1. A process comprises _____________, _____________, and
_____________.
2. Context switching is performed in two steps, which are _____________
and _____________.
3. The processes that coexist in the memory at some time are called
_____________.
4. The two basic communication models for providing IPC are
_____________ and _____________.
5. A process that no longer exists but its PCB is still not removed from the
process table is known as a _____________.

Multiple Choice Questions


1. Which of the following state transitions is not possible?
(a) Ready → Running
(b) Running → Waiting
(c) Ready → Waiting
(d) Waiting → Ready
2. Which of the following is responsible for selecting a process among the
swapped-out processes and bringing it in the main memory?
(a) Short-term scheduler
(b) Medium-term scheduler
(c) Long-term scheduler
(d) None of these
3. Which of the following ways can be used to implement the buffer in
message passing system?
(a) No buffering
(b) Bounded buffer
(c) Unbounded buffer
(d) All of these
4. Which of the following is a temporary queue that stores the messages
sent by processes?
(a) Buffer
(b) Processor
(c) Scheduler
(d) None of these
5. PCB stands for:
(a) Process control buffer
(b) Processor controller and buffer
(c) Process controller block
(d) Process control block

State True or False


1. The operating system maintains a process table to keep track of all the
processes in the system.
2. If a process was blocked because of I/O wait and now is ready for
execution, then it will be placed in the job queue.
3. To keep track of all the processes in the system, the operating system
maintains a structurally organized process table.
4. The task of temporarily switching a process in and out of main memory is
known as swapping.
5. For each I/O device in the system, a separate queue called device queue
is maintained.

Descriptive Questions
1. What does a process control block contain?
2. Distinguish between CPU-bound and I/O-bound process.
3. Discuss various states of a process.
4. Describe the events under which state transitions between ready, running
and waiting take place.
5. What is the difference between symmetric and asymmetric direct
communication?
6. List three important fields stored in a process control block.
7. Distinguish among long-term, short-term and medium-term scheduler.
8. What is context switching? How is it performed, and what is its
disadvantage?
9. Describe the different models used for inter-process communication.
Which one is better?
10. In message passing systems, the processes can communicate directly or
indirectly. Compare both the ways.
11. Write short notes on the following:
a. Remote method invocation (RMI)
b. Cooperating processes
c. Scheduling queues
12. Consider the indirect communication method where mailboxes are used.
What will be the sequence of execution of send() and receive() calls in
the following two cases?
a. Suppose a process P wants to wait for two messages, one from
mailbox M and other from mailbox N.
b. Suppose P wants to wait for one message from mailbox M or from
mailbox N (or from both).
13. Explain the remote procedure call (RPC) method of communication in
client-server systems.
chapter 3

Threads

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ Understand the basic concepts of threads.
⟡ List various advantages of threads.
⟡ Describe the ways of implementing threads.
⟡ Discuss various models of multithreading.
⟡ Understand various threading issues.
⟡ Discuss the Pthreads library and its functions.

3.1 INTRODUCTION
In conventional operating systems, each process has a single thread
of control, that is, the process is able to perform only one task at a
time. To implement multiprogramming, multiple processes with each
having a separate address space may be created and the CPU may
be switched back and forth among them to create the illusion that the
processes are running in parallel. But as discussed in the previous
chapter, process creation and switching are time-consuming and
resource-intensive and thus, incur an overhead to the system.
Therefore, many modern operating systems employ multithreading
that allows a process to have multiple threads of control within the
same address space. These threads may run in parallel thereby
enabling the process to perform multiple tasks at a time.
3.2 THREAD CONCEPT
A thread is defined as the fundamental unit of CPU utilization. A
traditional process comprises a single thread of control, that is, it can
execute one task at a time and thus, is referred to as a single-
threaded process. However, to make the process perform several
tasks simultaneously, multiple threads of a single process can be
created with each thread having its own ID, stack and a set of
registers. In addition, all the threads of the same process share with
each other the code section, data section, and other resources
including list of open files, child processes, signals, etc., of the
process. A process with multiple threads of control is referred to as a
multithreaded process. Figure 3.1 shows the structure of a single-
thread process and a multithreaded process with four threads
(indicated by wavy lines) of control.

Fig. 3.1 Structure of Single-threaded and Multithreaded Process

Note: The traditional processes are termed heavyweight while a thread is referred to as a lightweight process (LWP).
Thread versus Process
As discussed in Chapter 2, a process is a program under execution.
Generally, the execution of a program constitutes multiple processes
in order to achieve concurrency within an application. However, this
idea of providing concurrency incurs high process switching
overhead, as it involves saving the context of currently running
process and loading the context of new process. A low-cost
alternative to achieve concurrency within an application is to use the
notion of threads.
A thread is also a unit of program execution, but one that utilizes the resources of the process to which it belongs. A process may have multiple threads, each executing within the environment of the process. Thus, the operating
system needs to save only the CPU state and stack pointer while
switching between threads of the same process. The resource state
is to be switched only when switching to a thread belonging to a
different process. This makes thread switching considerably faster
than process switching, reducing the switching overhead.
Some other differences between threads and processes are as
follows:
• A thread is a subset of a process, that is, it is dependent on the
process, whereas the processes may be independent.
• Each child process has a separate address space from that of its
parent while the threads belonging to the same process share the
address space of the process.
• Creation of a thread typically requires fewer resources than
process creation. Whenever a thread is created, the operating
system allocates to it a data structure that holds a set of registers,
stack, and priority of the thread. In contrast, when a process is
created, a process control block (PCB) is allocated, which is a
large data structure. The PCB includes a memory map, list of
open files, and environment variables. Allocating and managing
the memory map is typically the most time-consuming activity.
3.2.1 Advantages of Threads
The major advantage that threads provide over processes is low
overhead during switching. In addition, threads offer some other
advantages, which are as follows:
• Computational speedup: On a single processor system, a
process can be executed speedily by creating multiple threads in
the process and executing them in a quasi-parallel manner (that
is, by rapidly switching the CPU among multiple threads).
• Economic: Thread creation is more economical than process
creation. Every time a process is created, some memory and
resources are required to be allocated to it. On the other hand,
threads share the resources of the process to which they belong,
so there is no need to allocate memory and resources at the time
of thread creation.
• Efficient communication: As different threads of a process share
the same address space, communication among them can be
made via the shared memory. There is no need to execute
system calls, which cause extra overhead.
• Proper utilization of multiprocessor architecture: In
multiprocessor systems, threads prove more useful than
processes. Multiple threads of a single process can be made to
run on different CPUs at the same time, thereby achieving real
parallelism. In contrast, a single process can run only on one
CPU regardless of the number of available CPUs.
• Responsiveness: In the case of interactive processes, the major
performance criterion is response time. If such a process is
multithreaded, a part of the process (thread) is able to run even if
some other part of the process is blocked. As a result,
responsiveness of the process to the user increases.

3.2.2 Implementation of Threads


Threads can be implemented in different ways depending on the
extent to which the process and the operating system know about
them. Here, we will discuss two methods for implementing threads,
namely, kernel-level and user-level threads.

Kernel-level Threads
Kernel-level threads are implemented by the kernel, which is
responsible for creating, scheduling, and managing threads within the
kernel space. It maintains a thread table in addition to the process
table that holds the program counter, stack pointer, registers, state,
etc., of each thread in the system. Whenever a process wishes to create a new thread or terminate an existing one, it initiates a system call to the kernel. In response, the kernel creates or terminates the thread by
modifying the thread table. Many modern operating systems including
Solaris 2, Windows 2000, and Windows NT provide support for
kernel-level threads.

Advantages
• In a multiprocessor environment, multiple kernel-level threads
belonging to a process can be scheduled to run simultaneously
on different CPUs thereby resulting in computation speedup.
• As the threads are managed directly by the kernel, if one thread
issues a system call that blocks it, the kernel can choose another
thread to run either from the same process (to which the blocked
thread belongs) or from some different process.

Disadvantages
• The cost of creating and destroying threads in the kernel is
relatively greater than that of user-level threads.
• The kernel performs switching between the threads, which incurs
overhead to the system.

User-level Threads
User-level threads are implemented by a thread library associated
with the code of a process. The thread library provides support for
creating, scheduling, and managing threads within the user space
without any involvement from the kernel. Thus, the kernel is unaware
of the existence of threads in a process; it is concerned only with
managing single-threaded processes. Whenever a process wishes to
create or terminate a thread, it can do so by calling an appropriate
function from the thread library without the need of kernel
intervention. Moreover, each process maintains its own thread table
that keeps track of the threads belonging to that process and the
kernel maintains only the process table. POSIX Pthreads, Solaris 2
UI-threads, and Mach C-threads are some of the user-thread
libraries.

Advantages
• The user-level threads can be created and managed at a faster
speed as compared to kernel-level threads.
• The thread switching overhead is smaller as it is performed by the
thread library and there is no need to issue the system call.
• The thread library can schedule threads within a process using a
scheduling policy that best suits the process’s nature. For
example, for a real-time process, a priority-based scheduling
policy can be used. On the other hand, for a multithreaded Web
server, round-robin scheduling can be used.

Disadvantages
• At most one user-level thread of a process can be in execution at any time, because the kernel schedules the process as a single unit; this limits the degree of parallelism.
• If one user-level thread issues a blocking system call, the kernel
blocks the whole process to which the thread belongs even if
there is some other thread that is ready to run. This is because
the kernel does not know the difference between a thread and a
process; it simply treats a thread like a process.

3.3 MULTITHREADING MODELS


Many systems support a hybrid thread model that contains both user-
and kernel-level threads along with a relationship between these
threads. There may exist different types of relationship between user-
and kernel-level threads, each resulting in a specific multithreading
model. In this section, we will discuss three common multithreading
models.

3.3.1 Many-to-One (M:1) Model


In this model, the kernel creates only one kernel-level thread in each
process and the multiple user-level threads (created by thread library)
of the process are associated with this kernel-level thread (see Figure
3.2). As the threads are managed in the user space, this model
produces a similar effect as that of user-level threads. An example of
a thread library that employs this model is Green threads which is
available for Solaris 2.

Fig. 3.2 Many-to-One (M:1) Model

Advantages
• It incurs a low switching overhead as the kernel is not involved in switching between threads.

Disadvantages
• If one user-level thread issues a blocking system call, the kernel
blocks the whole parent process.
• As the kernel-level thread can be accessed by only one user-level
thread at a time, multiple user-level threads cannot run in parallel
on multiple CPUs thereby resulting in low concurrency.

3.3.2 One-to-One (1:1) Model


In this model, each user-level thread is associated with a kernel-level
thread (see Figure 3.3). The threads are managed by the kernel;
therefore, this model provides an effect similar to the kernel-level
threads. Many modern operating systems such as Windows 2000
and Windows NT employ this model.

Advantages
• Multiple threads can run in parallel on multiple CPUs in a
multiprocessor environment and thus, greater concurrency is
achieved.

Fig. 3.3 One-to-One (1:1) Model

• As each user-level thread is mapped into a different kernel-level thread, blocking of one user-level thread does not cause other user-level threads to block.

Disadvantages
• It results in high switching overhead due to the involvement of
kernel in switching.
• Most implementations of this model restrict the number of
threads that can be created in a process. This is because
whenever a user-level thread is created in a process, a
corresponding kernel-level thread is also required to be created.
The creation of many kernel-level threads incurs an overhead to
the system, thereby degrading the performance.

3.3.3 Many-to-Many (M:M) Model


In this model, many user-level threads are associated with many
kernel-level threads with the number of kernel-level threads being
equal to or less than that of user-level threads (see Figure 3.4). This
implies that more than one user-level thread may be associated with
the same kernel-level thread. This model overcomes the limitations of
both many-to-one and one-to-one models. The operating systems
including Solaris 2 and Tru64 UNIX employ this model.

Fig. 3.4 Many-to-Many (M:M) Model

Advantages
• Many user-level threads can be made to run in parallel on different
CPUs by mapping each user-level thread to a different kernel-
level thread.
• Blocking of one user-level thread does not result in the blockage
of other user-level threads that are mapped into different kernel-
level threads.
• Switching between user-level threads associated with the same
kernel-level thread does not incur much overhead.
• There is no restriction on the number of user-level threads that
can be created in a process; as many user-level threads as
required can be created.

Disadvantages
• The implementation of this model is very complex.

3.4 THREADING ISSUES


While multithreaded programs are executed, a number of issues
arise. In this section, we will discuss some of these issues.

3.4.1 fork() and exec() System Calls


Recall from the previous chapter the usage of fork() and exec()
system calls. Whenever a process invokes the fork() system call, a
new (child) process is created that is the exact duplicate of its parent
process. The child process executes the same code as that of its
parent. However, if it requires to load some other program in its
address space, it can do so by invoking the exec() system call,
passing the name of the desired program as a parameter to it.
In the case of multithreaded programs, the semantics of fork()
and exec() system calls are somewhat different. Here, the question
arises whether the newly created process (resulting when one thread invokes fork()) should contain all the threads of the process to which the invoking thread belongs or only the invoking thread itself. In
response to this, many UNIX systems offer two versions of fork()
system call: one to duplicate all the threads and another to duplicate
only the invoking thread.
The selection of a particular version of fork() system call to be
invoked depends on the application. If the newly-created process is to
invoke exec() system call immediately after the fork() system call, it
is unnecessary to duplicate all the threads. Thus, the latter version of
fork() should be used. On the other hand, if the newly-created
process does not require to invoke exec() after fork(), all threads
should be duplicated. Therefore, the former version of fork() should
be used.
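For instance, a minimal sketch of the fork-then-exec pattern on a POSIX system is given below (the program name ls is only an illustration); because the child immediately replaces its address space, duplicating only the invoking thread would have sufficed here.

/* A minimal sketch (POSIX assumed): the child calls exec() right after
   fork(), so only the invoking thread needs to be duplicated.          */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();                 /* create a child process        */
    if (pid < 0) {                      /* fork failed                   */
        perror("fork");
        exit(EXIT_FAILURE);
    }
    if (pid == 0) {                     /* child: load a new program     */
        execlp("ls", "ls", "-l", (char *)NULL);
        perror("execlp");               /* reached only if exec fails    */
        exit(EXIT_FAILURE);
    }
    wait(NULL);                         /* parent: wait for the child    */
    return 0;
}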

3.4.2 Thread Cancellation


The procedure of terminating a thread before it completes its
execution is known as thread cancellation and the thread that is to
be cancelled is known as target thread. Thread cancellation may be
performed in any of the following ways:
• Asynchronous cancellation: In this type of cancellation, the
target thread is terminated immediately after any thread indicates
its cancellation. The operating system may reclaim the resources allocated to the cancelled thread, but not necessarily all of them. Thus, asynchronous thread cancellation may not
release a system-wide resource thereby leaving the system in an
inconsistent state. Many operating systems support
asynchronous thread cancellation.
• Deferred cancellation: In this type of cancellation, the target
thread is not terminated immediately, rather it checks at regular
intervals whether it should be terminated or not. Thus, the target
thread gets the opportunity to terminate itself in an orderly
manner. Deferred cancellation ensures system consistency by
defining points in the code of a thread where it can safely be
cancelled. Whenever the target thread determines that it should
be terminated, it first checks whether it can safely be terminated.
If so, the target thread terminates; otherwise, its cancellation may
be deferred until it executes up to the safe point.
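As an illustration of deferred cancellation with Pthreads (a sketch assuming a POSIX system; the worker's loop body is a placeholder for real work), the target thread honours a pending cancellation request only at the safe point marked by pthread_testcancel().

/* Deferred cancellation sketch: the target thread is cancelled only at
   the safe point marked by pthread_testcancel().                       */
#include <pthread.h>
#include <unistd.h>

static void *worker(void *arg)
{
    (void)arg;
    /* Deferred cancellation is the Pthreads default; set it explicitly. */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
    for (;;) {
        /* ... perform one unit of work ... */
        pthread_testcancel();      /* safe point: honour a pending cancel */
    }
    return NULL;
}

int main(void)
{
    pthread_t target;
    pthread_create(&target, NULL, worker, NULL);
    sleep(1);                      /* let the target thread run a while   */
    pthread_cancel(target);        /* request its cancellation            */
    pthread_join(target, NULL);    /* wait until it actually terminates   */
    return 0;
}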

3.4.3 Thread-specific Data


Although the threads of a process share the process's data with each other, sometimes a thread may require its own copy of certain data, termed thread-specific data. For example, consider
an airline reservation application in which a separate thread is
created to handle each client’s request for flight reservation. Each
request may be assigned a unique ID in order to distinguish among
multiple clients. Now, to relate each thread with the ID of request it is
handling, we would need thread-specific data.
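A minimal Pthreads sketch of this idea is given below (the request IDs 101 and 102 are assumed values): each handler thread stores its request ID under a shared key and later retrieves its own private copy.

/* Thread-specific data sketch: one key, a distinct value per thread.   */
#include <pthread.h>
#include <stdio.h>

static pthread_key_t request_key;

static void *handle_request(void *arg)
{
    pthread_setspecific(request_key, arg);    /* store this thread's ID  */
    int *id = pthread_getspecific(request_key);
    printf("handling request %d\n", *id);
    return NULL;
}

int main(void)
{
    int ids[2] = {101, 102};
    pthread_t t[2];
    pthread_key_create(&request_key, NULL);   /* no destructor needed    */
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, handle_request, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    pthread_key_delete(request_key);
    return 0;
}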

3.5 THREAD LIBRARIES


A library which provides the programmers with an application
programming interface (API) for thread creation and management is
referred to as a thread library. A thread library can be implemented
either in the user space or in the kernel space. In the former case, the
entire code and data structures for the library reside in the user space
and there is no support from the kernel. If any function is invoked in
the API, it is treated as a local function call in the user space instead
of a system call. In contrast, in the latter case, the entire code and data structures for the library reside in the kernel space; the operating system directly supports the kernel-level library. Thus, whenever a function in the library's API is called, a system call is made to the kernel.
POSIX Pthreads, Win32, and Java are the three main thread libraries in use today. Here, we discuss the Pthreads library only.
Note: Win32 is a kernel-level library that provides API for creating
and managing threads on Windows systems, while Java thread
library provides API for creating and managing threads directly in
Java programs.

3.5.1 Pthreads Library


Pthreads refer to thread extensions of POSIX standard (IEEE
1003.1c) that can be implemented either in the kernel space or in the
user space as per the operating system’s designer choice. It
comprises different functions that are used for managing Pthreads. In
this section, we discuss some of these functions.

Creating a Pthread
A process can create a Pthread by calling the pthread_create()
function. The syntax of this function is as follows:
pthread_create (ptr_id, attr, start_routine, arg);

where
ptr_id is a pointer to the memory location where the ID of Pthread
will be stored.
attr specifies an attributes object that defines the attributes to be
used in Pthread creation.
start_routine is the routine to be executed by the newly-created
Pthread.
arg is the single argument that is passed to the Pthread during its
creation.
Once a Pthread has been created, it starts executing the
start_routine function within the environment of the process that has
created it.
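As a brief illustration (assuming a POSIX system and compilation with the -pthread option), the sketch below creates a Pthread with default attributes and passes it a single string argument; the message itself is arbitrary.

/* Pthread creation sketch using the POSIX prototype:
   int pthread_create(pthread_t *ptr_id, const pthread_attr_t *attr,
                      void *(*start_routine)(void *), void *arg);       */
#include <pthread.h>
#include <stdio.h>

static void *start_routine(void *arg)
{
    printf("new Pthread running with argument: %s\n", (char *)arg);
    return NULL;
}

int main(void)
{
    pthread_t ptr_id;
    /* A NULL attr selects the default attributes object. */
    pthread_create(&ptr_id, NULL, start_routine, "hello");
    pthread_join(ptr_id, NULL);   /* wait for it (pthread_join is described below) */
    return 0;
}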

Terminating a Pthread
A Pthread can get terminated under any of the following
circumstances.
• When it calls the pthread_exit(status_code) function.
• When it returns from its start_routine, because then the pthread_exit() function is called implicitly.
• When some other Pthread cancels it by calling the pthread_cancel() function.
• When the process that has created it terminates.


Detaching a Pthread
A Pthread can be detached from other Pthreads by calling the
pthread_detach() function. The syntax of this function is as follows:

pthread_detach (<pthread_id>);

where
pthread_id is the ID of the Pthread which is to be detached (target
thread).
Note that no other Pthreads can synchronize their activities with
the detached Pthread; however, the detached Pthread continues to
run until it gets terminated.

Waiting for Termination of a Pthread


A Pthread can wait for another Pthread to complete before its
termination by calling the pthread_join() function. The syntax of this
function is as follows:
pthread_join (<pthread_id>, adr(x));

where
<pthread_id> is the ID of the Pthread whose termination is awaited.
adr(x) is the address of the variable x in which the status of the target
Pthread is to be stored.
Following points should be kept in mind while using the
pthread_join() function.

• The Pthread that has invoked the pthread_join() function remains suspended until the target Pthread terminates.
• The pthread_join() function cannot be used for a detached
Pthread.
• No two Pthreads can invoke the pthread_join() function for each
other as it will result in a deadlock.
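A brief sketch of this usage (the status value 42 is arbitrary): the target Pthread hands a status code to pthread_exit(), and the waiting Pthread receives it through the address it passes to pthread_join().

/* pthread_join sketch: collect the status passed to pthread_exit().    */
#include <pthread.h>
#include <stdio.h>

static void *start_routine(void *arg)
{
    (void)arg;
    pthread_exit((void *)42);          /* status returned to the joiner  */
}

int main(void)
{
    pthread_t pthread_id;
    void *x;                           /* status of the target Pthread   */
    pthread_create(&pthread_id, NULL, start_routine, NULL);
    pthread_join(pthread_id, &x);      /* suspend until the target ends  */
    printf("target Pthread exited with status %ld\n", (long)x);
    return 0;
}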
Note: Generally, implementations of the Pthreads library are limited to UNIX-based systems (for example, Solaris 2) and are not natively available on Windows.
LET US SUMMARIZE
1. Many modern operating systems employ multithreading that allows a
process to have multiple threads of control within the same address
space. These threads may run in parallel thereby enabling the process to
perform multiple tasks at a time.
2. A thread is defined as the fundamental unit of CPU utilization. Multiple
threads of the same process share with each other the code section, data
section, and other resources including list of open files, child processes,
signals, etc., of the process. In addition, each thread has its own ID,
stack, set of registers, and program counter.
3. The major advantage that threads provide over processes is the low
overhead during switching. In addition, threads offer some other
advantages which include computational speedup, economy, efficient
communication, proper utilization of multiprocessor architecture, and
responsiveness.
4. Threads can be implemented in different ways depending on the extent to
which the process and the operating system know about them. Two
common methods for implementing threads include kernel-level and user-
level threads.
5. Kernel-level threads are implemented by the kernel. The kernel is
responsible for creating, scheduling, and managing threads in the kernel
space.
6. User-level threads are implemented by a thread library associated with the
code of a process. The thread library provides support for creating,
scheduling, and managing threads in the user space without any
involvement from the kernel.
7. Many systems support a hybrid thread model that contains both user- and
kernel-level threads along with a relationship between these threads.
There may exist different types of relationship between user- and kernel-
level threads, each resulting in a specific multithreading model. Three
common multithreading models are many-to-one, one-to-one, and many-
to-many.
8. In many-to-one multithreading model, the kernel creates only one kernel-
level thread in each process and the multiple user-level threads (created
by thread library) of the process are associated with this kernel-level
thread.
9. In one-to-one multithreading model, each user-level thread is associated
with a kernel-level thread.
10. In many-to-many multithreading model, many user-level threads are
associated with many kernel-level threads with the number of kernel-level
threads being equal to or less than that of user-level threads.
11. The procedure of terminating a thread before it completes its execution is
known as thread cancellation and the thread that is to be cancelled is
known as target thread. Thread cancellation may be done in any of the
two ways: asynchronous cancellation and deferred cancellation.
12. A library which provides an application programming interface (API) to
create and manage threads is referred to as a thread library.
13. A thread library can be implemented either in the user space or in the
kernel space.
14. POSIX Pthreads, Windows, and Java are the three main thread libraries
which are in use today.
15. Pthreads refer to thread extensions of POSIX standard (IEEE 1003.1c)
that provide the programmers an API for thread creation and
management.
16. The Pthreads library can be implemented either in the kernel space or in
the user space as per the operating system’s designer choice.

EXERCISES
Fill in the Blanks
1. Many modern operating systems employ _____________ that allows a
process to have multiple threads of control within the same address
space.
2. Two methods for implementing threads are _____________ and
_____________ threads.
3. _____________ allows a process to have multiple threads of control
within the same address space.
4. Whenever a process invokes the _____________ system call, a new
(child) process is created that is the exact duplicate of its parent process.
5. The procedure of terminating a thread before it completes its execution is
known as _____________ and the thread that is to be cancelled is known
as _____________.

Multiple Choice Questions


1. A process with multiple threads of control is referred to as a:
(a) Multithreaded process
(b) Single-threaded process
(c) Lightweight process
(d) Heavyweight process
2. Thread cancellation may be performed in which of the following ways?
(a) Asynchronous cancellation
(b) Deferred cancellation
(c) Synchronous cancellation
(d) Both (a) and (b)
3. Which of these are the advantages of threads?
(a) Economic
(b) Responsiveness
(c) Computational speedup
(d) All of these
4. The kernel-level threads are implemented by the:
(a) System
(b) User
(c) Processor
(d) Kernel
5. Which of these thread libraries can be implemented in both user and
kernel space?
(a) POSIX Pthreads
(b) Win32
(c) Java
(d) All of these

State True or False


1. In the case of user-level threads, the kernel maintains a thread table to
keep track of user-level threads.
2. A thread may require having its own copy of certain data, termed as
thread-specific data.
3. In a multiprocessor environment, multiple user-level threads belonging to
a process can be scheduled to run simultaneously on different CPUs
thereby resulting in computation speedup.
4. A process can create a Pthread by calling the pthread_create() function.
5. In deferred cancellation, the target thread is terminated immediately after
any thread indicates its cancellation.
Descriptive Questions
1. Define a thread. How is it different from a process?
2. Name some operating systems that provide support for kernel-level
threads.
3. List some advantages of one-to-one multithreading model.
4. List some advantages of threads over the traditional processes.
5. Differentiate between kernel- and user-level threads. Which one is
preferred over another and under what circumstances?
6. Describe some issues related with multithreaded programs.
7. How does the many-to-many multithreading model overcome the
limitations of many-to-one and one-to-one models?
8. Why does the switching among threads incur less overhead as compared
to process switching?
9. What is thread cancellation? Give two ways in which thread cancellation
can be performed.
10. Define thread library. Also give the functions that are used for managing
Pthreads.
chapter 4

CPU Scheduling

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ Understand the basic concepts of scheduling.
⟡ Discuss the criteria for scheduling.
⟡ Describe various scheduling algorithms.
⟡ Discuss scheduling for multiprocessor systems.
⟡ Describe real-time scheduling.
⟡ Evaluate various scheduling algorithms.
⟡ Discuss thread scheduling.

4.1 INTRODUCTION
As discussed in Chapter 2, CPU scheduling is the procedure
employed for deciding to which of the ready processes, the CPU
should be allocated. CPU scheduling plays a pivotal role in the basic
framework of the operating system owing to the fact that the CPU is
one of the primary resources of the computer system. The algorithm
used by the scheduler to carry out the selection of a process for
execution is known as scheduling algorithm. A number of
scheduling algorithms are available for CPU scheduling. Each
scheduling algorithm influences the resource utilization, overall
system performance, and quality of service provided to the user.
Therefore, a number of criteria have to be considered while selecting an algorithm for a particular system.

4.2 SCHEDULING CONCEPTS


Before we start discussing the scheduling criteria and scheduling
algorithms comprehensively, we will first take into account some
relatively important concepts of scheduling which are mentioned next.

4.2.1 Process Behaviour


CPU scheduling is greatly affected by the way a process behaves
during its execution. Almost all the processes continue to switch
between CPU (for processing) and I/O devices (for performing I/O)
during their execution. The time period elapsed in processing before
performing the next I/O operation is known as CPU burst and the
time period elapsed in performing I/O before the next CPU burst is
known as I/O burst. Generally, the process execution starts with a
CPU burst, followed by an I/O burst, then again by a CPU burst and
so on until the termination of the process. Thus, we can say that the
process execution comprises alternate cycles of CPU and I/O burst.
Figure 4.1 shows the sequence of CPU and I/O bursts upon the
execution of the following code segment written in C language.

Fig. 4.1 Alternate Cycles of CPU and I/O Bursts
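A representative sketch of such a code segment is shown below (the buffer size and the computation are placeholders): each read() or write() gives rise to an I/O burst, while the in-memory processing between them forms a CPU burst.

/* Illustrative only: read()/write() cause I/O bursts; the loop over the
   buffer in between forms a CPU burst.                                  */
#include <unistd.h>

int main(void)
{
    char buf[4096];
    ssize_t n;
    while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0) {  /* I/O burst */
        for (ssize_t i = 0; i < n; i++)                      /* CPU burst */
            buf[i] ^= 0x20;               /* placeholder computation      */
        write(STDOUT_FILENO, buf, (size_t)n);                /* I/O burst */
    }
    return 0;
}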


The length of the CPU burst and the I/O burst varies from process
to process depending on whether the process is CPU-bound or I/O-
bound. If the process is CPU-bound, it will have longer CPU bursts as
compared to I/O bursts, and vice versa in case the process is I/O-
bound. From the scheduling perspective, only the length of the CPU
burst is taken into consideration and not the length of the I/O burst.

4.2.2 When to Schedule


An important facet of scheduling is to determine when the scheduler
should make scheduling decisions. The following circumstances may
require the scheduler to make scheduling decisions.
• When a process switches from running to waiting state. This
situation may occur in case the process has to wait for I/O or the
termination of its child process or for some other reason. In such
situations, the scheduler has to select some ready process for
execution.
• When a process switches from running to ready state due to
occurrence of an interrupt. In such a situation, the scheduler may
decide to run a process from the ready queue. If the interrupt was
caused by some I/O device that has now completed its task, the
scheduler may choose the process that was blocked waiting for
the I/O.
• When a process switches from waiting state to ready state. This
situation may occur when the process has completed its I/O
operation. In such a situation, the scheduler may select either the
process that has now come to the ready state or the current
process may be continued.
• When a process terminates and exits the system. In this case, the
scheduler has to select a process for execution from the set of
ready processes.

4.2.3 Dispatcher
The CPU scheduler only selects a process to be executed next on
the CPU but it cannot assign CPU to the selected process. The
function of setting up the execution of the selected process on the
CPU is performed by another module of the operating system, known
as dispatcher. The dispatcher involves the following three steps to
perform this function.
1. Context switching is performed. The kernel saves the context of
currently running process and restores the saved state of the
process selected by the CPU scheduler. In case the process
selected by the short-term scheduler is new, the kernel loads its
context.
2. The system switches from the kernel mode to user mode as a
user process is to be executed.
3. The execution of the user process selected by the CPU
scheduler is started by transferring the control either to the
instruction that was supposed to be executed at the time the
process was interrupted, or to the first instruction if the process
is going to be executed for the first time after its creation.
Note: The amount of time required by the dispatcher to suspend
execution of one process and resume execution of another process is
known as dispatch latency. Low dispatch latency implies faster start
of process execution.

4.3 SCHEDULING CRITERIA


The scheduler must consider the following performance measures
and optimization criteria in order to maximize the performance of the
system.
• Fairness: It is defined as the degree to which each process is
getting an equal chance to execute. The scheduler must ensure
that each process should get a fair share of CPU time. However,
it may treat different categories of processes (batch, real-time, or
interactive) in a different manner.
• CPU utilization: It is defined as the percentage of time the CPU is
busy in executing processes. For higher utilization, the CPU must
be kept as busy as possible, that is, there must be some process
running at all times.
• Balanced utilization: It is defined as the percentage of time all
the system resources are busy. It considers not only the CPU
utilization but the utilization of I/O devices, memory, and other
resources also. To get more work done by the system, the CPU
and I/O devices must be kept running simultaneously. For this, it
is desirable to load a mixture of CPU-bound and I/O-bound
processes in the memory.
• Throughput: It is defined as the total number of processes that a
system can execute per unit of time. By and large, it depends on
the average length of the processes to be executed. For the
systems running long processes, throughput will be less as
compared to the systems running short processes.
• Turnaround time: It is defined as the amount of time that has
rolled by from the time of creation to the termination of a process.
To put it differently, it is the difference between the time a process
enters the system and the time it exits from the system. It
includes all the time the process has spent waiting to enter into
ready queue, within ready queue to get CPU, running on CPU,
and in I/O queues. It is inversely proportional to the throughput,
that is, more the turnaround time, less will be the throughput.
• Waiting time: It is defined as the time used up by a process while
waiting in the ready queue. It does not take into account the
execution time or time consumed for I/O. Thus, waiting time of a
process can be determined as the difference between turnaround
time and processing time. In practice, waiting time is a more
accurate measure as compared to turnaround time.
• Response time: It is defined as the time elapsed after the user
initiates a request and the system starts responding to this
request. For interactive systems, it is one of the best metrics for gauging performance. This is because in such
systems, only the speed with which the system responds to the
user’s request matters and not the time it takes to output the
response.
The basic purpose of a CPU scheduling algorithm is that it should
tend to maximize fairness, CPU utilization, balanced utilization and
throughput, and minimize turnaround, waiting and response time.
Practically speaking, no scheduling algorithm optimizes all the
scheduling criteria. Thus, in general, the performance of an algorithm
is evaluated on the basis of average measures. For example, an
algorithm that minimizes the average waiting time is considered as a
good algorithm because this improves the overall efficiency of the
system. However, in the case of response time, minimizing the
average is not a good criterion; rather, the variance in the response
time of the processes should be minimized. This is because it is not
desirable to have a process with long response time as compared to
other processes.

4.4 SCHEDULING ALGORITHMS


A wide variety of algorithms are used for the CPU scheduling. These
scheduling algorithms fall into two categories, namely, non-
preemptive and preemptive.
• Non-preemptive scheduling algorithms: Once the CPU is
allocated to a process, it cannot be taken back until the process
voluntarily releases it (in case the process has to wait for I/O or
some other event) or the process terminates. In other words, we
can say the decision to schedule a process is made only when
the currently running process either switches to the waiting state
or terminates. In both cases, the CPU executes some other
process from the set of ready processes. Some examples of non-
preemptive scheduling algorithms are first come first served
(FCFS), shortest job first (SJF), priority-based scheduling and
highest response ratio next (HRN) scheduling.
• Preemptive scheduling algorithms: The CPU can be forcibly
taken back from the currently running process before its
completion and allocated to some other process. The preempted
process is put back in the ready queue and resumes its execution
when it is scheduled again. Thus, a process may be scheduled
many times before its completion. In preemptive scheduling, the
decision to schedule another process is made whenever an
interrupt occurs causing the currently running process to switch to
ready state or a process having higher priority than the currently
running process is ready to execute. Some examples of
preemptive scheduling algorithms are shortest remaining time
next (SRTN), priority-based scheduling and round robin (RR)
scheduling.
Note: A non-preemptive scheduling algorithm is also known as a
cooperative or voluntary scheduling algorithm.

4.4.1 First-Come First-Served (FCFS) Scheduling


FCFS is one of the simplest scheduling algorithms. As the name
implies, the processes are executed in the order of their arrival in the
ready queue, which means the process that enters the ready queue
first gets the CPU first. FCFS is a non-preemptive scheduling
algorithm. Therefore, once a process gets the CPU, it retains the
control of CPU until it blocks or terminates.
To implement FCFS scheduling, the ready queue is managed as a FIFO (First-in First-out) queue. When the first
process enters the ready queue, it immediately gets the CPU and
starts executing. Meanwhile, other processes enter the system and
are added to the end of the queue by inserting their PCBs in the
queue. When the currently running process completes or blocks, the
CPU is allocated to the process at the front of the queue and its PCB
is removed from the queue. In case a currently running process was
blocked and later it comes to the ready state, its PCB is linked to the
end of queue.
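A minimal sketch of this computation is given below (an illustration, not taken from the text): with the processes listed in order of arrival, each one is run to completion in turn and the turnaround and waiting times are accumulated. The sample data matches Example 1 below, and running the program prints an average turnaround time of 21.75 ms and an average waiting time of 13.50 ms.

/* FCFS sketch (assumption: the arrays list processes in arrival order). */
#include <stdio.h>

int main(void)
{
    int arrival[] = {0, 2, 3, 5};            /* data of Example 1 below  */
    int burst[]   = {15, 6, 7, 5};
    int n = 4, clock = 0;
    double tat_sum = 0.0, wait_sum = 0.0;

    for (int i = 0; i < n; i++) {
        if (clock < arrival[i])
            clock = arrival[i];              /* CPU idle until arrival   */
        clock += burst[i];                   /* run to completion        */
        int turnaround = clock - arrival[i]; /* exit time - entry time   */
        int waiting = turnaround - burst[i]; /* turnaround - processing  */
        tat_sum  += turnaround;
        wait_sum += waiting;
    }
    printf("Average turnaround = %.2f ms, average waiting = %.2f ms\n",
           tat_sum / n, wait_sum / n);
    return 0;
}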
Example 1 Consider four processes P1, P2, P3, and P4 with their arrival times and required CPU burst (in milliseconds) as shown in the following table.

Process    Arrival time (ms)    CPU burst (ms)
P1         0                    15
P2         2                    6
P3         3                    7
P4         5                    5
How will these processes be scheduled according to FCFS
scheduling algorithm? Compute the average waiting time and
average turnaround time.
Solution The processes will be scheduled as depicted in the
following Gantt chart.

Initially, P1 enters the ready queue at t = 0 and CPU is allocated to it. While P1 is executing, P2, P3 and P4 enter the ready queue at t = 2, t = 3, and t = 5, respectively. When P1 completes, CPU is allocated to P2 as it has entered before P3 and P4. When P2 completes, P3 gets the CPU, after which P4 gets the CPU.
Since Turnaround time = Exit time – Entry time,
Turnaround time for P1= (15 – 0) = 15 ms
Turnaround time for P2= (21 – 2) = 19 ms
Turnaround time for P3= (28 – 3) = 25 ms
Turnaround time for P4= (33 – 5) = 28 ms
Average turnaround time = (15 + 19 + 25 + 28)/4 = 21.75 ms
Since Waiting time = Turnaround time – Processing time,
Waiting time for P1= (15 – 15) = 0 ms
Waiting time for P2= (19 – 6) = 13 ms
Waiting time for P3= (25 – 7) = 18 ms
Waiting time for P4= (28 – 5) = 23 ms
Average waiting time = (0 + 13 + 18 + 23)/4 = 13.5 ms
The performance of FCFS scheduling algorithm largely depends
on the order of arrival of processes in the ready queue, that is,
whether the processes having long CPU burst enter before those
having short CPU burst or vice versa. To illustrate this, assume that
the processes (shown in Example 1) enter the ready queue in the
order P4, P2, P3 and P1. Now, the processes will be scheduled as
shown in the following Gantt chart.

Using the above formulae, the average turnaround time and average waiting time can be computed as:
Average turnaround time = ((5 – 0) + (11 – 2) + (18 – 3) + (33 –
5))/4 = 14.25 ms
Average waiting time = ((5 – 5) + (9 – 6) + (15 – 7) + (28 – 15))/4
= 6 ms
It is clear that if the processes having shorter CPU burst execute
before those having longer CPU burst, the average waiting and
turnaround time may reduce significantly.
Example 2 Five jobs A through E arrive at a computer center with the following details:

Job    Arrival time (ms)    CPU burst (ms)
A      0                    9
B      1                    5
C      2                    2
D      3                    6
E      4                    8
Calculate the turnaround time and waiting time for all processes
applying FCFS algorithm.
Solution According to FCFS scheduling algorithm, the given
processes will be scheduled as depicted in the following Gantt chart.

Since Turnaround time = Exit time – Entry time,
Turnaround time for A = (9 – 0) = 9 ms
Turnaround time for B = (14 – 1) = 13 ms
Turnaround time for C = (16 – 2) = 14 ms
Turnaround time for D = (22 – 3) = 19 ms
Turnaround time for E = (30 – 4) = 26 ms
Since Waiting time = Turnaround time – Processing time,
Waiting time for A = (9 – 9) = 0 ms
Waiting time for B = (13 – 5) = 8 ms
Waiting time for C = (14 – 2) = 12 ms
Waiting time for D = (19 – 6) = 13 ms
Waiting time for E = (26 – 8) = 18 ms

Advantages
• It is easy to understand and implement as processes are simply added at the end and removed from the front of the queue. No process from the middle of the queue needs to be accessed.
• It is well suited for batch systems where the longer time periods
for each process are often acceptable.
Disadvantages
• The average waiting time is not minimal. Therefore, this
scheduling algorithm is never recommended where performance
is a major issue.
• It reduces the CPU and I/O devices utilization under some
circumstances. For example, assume that there is one long CPU-
bound process and many short I/O-bound processes in the ready
queue. Now, it may happen that while the CPU-bound process is
executing, the I/O-bound processes complete their I/O and come
to the ready queue for execution. There they have to wait for the
CPU-bound process to release the CPU and the I/O devices also
remain idle during this time. When the CPU-bound process needs
to perform I/O, it comes to the device queue and the CPU is
allocated to I/O-bound processes. As the I/O-bound processes
require a little CPU burst, they execute quickly and come back to
the device queue thereby leaving the CPU idle. Then the CPU-
bound process enters the ready queue and is allocated the CPU
which again makes the I/O processes waiting in ready queue at
some point of time. This happens again and again until the CPU-
bound process is done, which results in low utilization of CPU and
I/O devices.
• It is not suitable for time sharing systems where each process
should get the same amount of CPU time.

4.4.2 Shortest Job First (SJF) Scheduling


The shortest job first, also known as shortest process next (SPN) or
shortest request next (SRN), is a non-preemptive scheduling
algorithm that schedules the processes according to the length of
CPU burst they require. At any point of time, among all the ready
processes, the one having the shortest CPU burst is scheduled first.
Thus, a process has to wait until all the processes shorter than it
have been executed. In case two processes have the same CPU
burst, they are scheduled in the FCFS order.
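A minimal sketch of non-preemptive SJF selection is shown below (illustrative; the data matches Example 3 below): at every scheduling decision the arrived, unfinished process with the smallest CPU burst is run to completion, earlier arrivals winning ties.

/* Non-preemptive SJF sketch (arrays assumed to be in arrival order).    */
#include <stdio.h>

int main(void)
{
    int arrival[] = {0, 1, 3, 4};            /* data of Example 3 below  */
    int burst[]   = {7, 5, 2, 3};
    int finished[4] = {0};
    int n = 4, done = 0, clock = 0;

    while (done < n) {
        int pick = -1;
        for (int i = 0; i < n; i++)          /* shortest arrived job     */
            if (!finished[i] && arrival[i] <= clock &&
                (pick == -1 || burst[i] < burst[pick]))
                pick = i;
        if (pick == -1) { clock++; continue; }   /* nothing ready: idle  */
        clock += burst[pick];                    /* run it to completion */
        printf("P%d finishes at t = %d\n", pick + 1, clock);
        finished[pick] = 1;
        done++;
    }
    return 0;
}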
Example 3 Consider four processes P1, P2, P3, and P4 with their arrival times and required CPU burst (in milliseconds) as shown in the following table.

Process    Arrival time (ms)    CPU burst (ms)
P1         0                    7
P2         1                    5
P3         3                    2
P4         4                    3

How will these processes be scheduled according to SJF scheduling algorithm? Compute the average waiting time and
average turnaround time.
Solution The processes will be scheduled as depicted in the
following Gantt chart.

Initially, P1 enters the ready queue at t = 0 and gets the CPU as there are no other processes in the queue. While it is executing, P2, P3
and P4 enter the queue at t = 1, t = 3 and t = 4, respectively. When
CPU becomes free, that is, at t = 7, it is allocated to P3 because it has
the shortest CPU burst among the three processes. When P3
completes, CPU is allocated first to P4 and then to P2.
Since Turnaround time = Exit time – Entry time,
Turnaround time for P1= (7 – 0) = 7 ms
Turnaround time for P2= (17 – 1) = 16 ms
Turnaround time for P3= (9 – 3) = 6 ms
Turnaround time for P4= (12 – 4) = 8 ms
Average turnaround time = (7 + 16 + 6 + 8)/4 = 9.25 ms
Since Waiting time = Turnaround time – Processing time,
Waiting time for P1= (7 – 7) = 0 ms
Waiting time for P2= (16 – 5) = 11 ms
Waiting time for P3= (6 – 2) = 4 ms
Waiting time for P4= (8 – 3) = 5 ms
Average waiting time = (0 + 11 + 4 + 5)/4 = 5 ms

Example 4 Consider the same set of processes, their arrival times and CPU burst as shown in Example 2. How will these processes be
scheduled according to SJF scheduling algorithm? Compute the
average waiting time and average turnaround time.
Solution The processes will be scheduled as depicted in the
following Gantt chart.

Since Turnaround time = Exit time – Entry time,
Turnaround time for A = (9 – 0) = 9 ms
Turnaround time for B = (16 – 1) = 15 ms
Turnaround time for C = (11 – 2) = 9 ms
Turnaround time for D = (22 – 3) = 19 ms
Turnaround time for E = (30 – 4) = 26 ms
Average turnaround time = (9 + 15 + 9 + 19 + 26)/5 = 15.6 ms
Since Waiting time = Turnaround time – Processing time,
Waiting time for A = (9 – 9) = 0 ms
Waiting time for B = (15 – 5) = 10 ms
Waiting time for C = (9 – 2) = 7 ms
Waiting time for D = (19 – 6) = 13 ms
Waiting time for E = (26 – 8) = 18 ms
Average waiting time = (0 + 10 + 7 + 13 + 18)/5 = 9.6 ms

Advantages
• It eliminates the variance in waiting and turnaround times. In fact,
it is optimal with respect to average waiting time if all processes
are available at the same time. This is due to the fact that short
processes are made to run before the long ones which decreases
the waiting time for short processes and increases the waiting
time for long processes. However, the reduction in waiting time is
more than the increment and thus, the average waiting time
decreases.

Disadvantages
• It is difficult to implement as it needs to know the length of CPU
burst of processes in advance. In practice, having the prior
knowledge of the required processing time of processes is
difficult. Many systems expect users to provide estimates of CPU
burst of processes which may not always be correct.
• It does not favour the processes having longer CPU burst. This is
because as long as the short processes continue to enter the
ready queue, the long processes will not be allowed to get the
CPU. This results in starvation of long processes.

4.4.3 Shortest Remaining Time Next (SRTN) Scheduling


The shortest remaining time next also known as shortest time to go
(STG) is a preemptive version of the SJF scheduling algorithm. It
takes into account the length of the remaining CPU burst of the
processes rather than the whole length in order to schedule them.
The scheduler always chooses the process for execution that has the
shortest remaining processing time. While a process is being
executed, the CPU can be taken back from it and assigned to some
newly arrived process if the CPU burst of the new process is shorter
than its remaining CPU burst. Notice that if at any point of time, the
remaining CPU burst of two processes becomes equal; they are
scheduled in the FCFS order.
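A minimal sketch of SRTN is given below (illustrative; the data matches Example 6 below): time advances in 1 ms steps and, at each step, the arrived process with the shortest remaining burst gets the CPU, so a newly arrived process with a shorter burst preempts the current one automatically.

/* SRTN sketch: one-millisecond steps; shortest remaining burst wins,
   with earlier arrival breaking ties (arrays in arrival order).         */
#include <stdio.h>

int main(void)
{
    int arrival[]   = {0, 1, 3, 4};          /* data of Example 6 below  */
    int remaining[] = {7, 5, 2, 3};
    int n = 4, done = 0, clock = 0;

    while (done < n) {
        int pick = -1;
        for (int i = 0; i < n; i++)
            if (remaining[i] > 0 && arrival[i] <= clock &&
                (pick == -1 || remaining[i] < remaining[pick]))
                pick = i;
        if (pick != -1 && --remaining[pick] == 0) {
            printf("P%d finishes at t = %d\n", pick + 1, clock + 1);
            done++;
        }
        clock++;                              /* one millisecond elapses */
    }
    return 0;
}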
Example 5 Suppose that the following processes arrive for execution at the times indicated. Each process will run for the listed amount of time.

Process    Arrival time (ms)    CPU burst (ms)
P1         0.0                  8
P2         0.4                  4
P3         1.0                  1

What is the average turnaround time and average waiting time for
these processes with SRTN algorithm?

Solution According to SRTN algorithm, the given processes will be scheduled as depicted in the following Gantt chart.

Initially, P1 enters the ready queue at t = 0.0 and gets the CPU as
there are no other processes in the queue. While it is executing, at
time t = 0.4, P2 with CPU burst of 4 ms enters the queue. At that time
the remaining CPU burst of P1 is 7.6 ms which is greater than that of
P2. Therefore, the CPU is taken back from P1 and allocated to P2.
During execution of P2, P3 enters at t = 1.0 with a CPU burst of 1 ms.
Again CPU is switched from P2 to P3 as the remaining CPU burst of P2
at t = 1.0 is 3.4 ms which is greater than that of P3. When P3
completes at t = 2.0, the CPU is allocated to P2 because at that time
the remaining CPU burst of P2 (which is 3.4 ms) is shorter than that of P1 (which is 7.6 ms). Finally, when P2 completes its execution at t = 5.4
ms, the CPU is allocated to P1 which completes its execution at t =
13.
Since Turnaround time = Exit time – Entry time,
Turnaround time for P1= (13 – 0) = 13 ms
Turnaround time for P2= (5.4 – 0.4) = 5 ms
Turnaround time for P3= (2 – 1) = 1 ms
Average turnaround time = (13 + 5 + 1)/3 = 6.33 ms
Since Waiting time = Turnaround time – Processing time,
Waiting time for P1= (13 – 8) = 5 ms
Waiting time for P2= (5 – 4) = 1 ms
Waiting time for P3= (1 – 1) = 0 ms
Average waiting time = (5 + 1 + 0)/3 = 2 ms
Example 6 Consider the same set of processes, their arrival times
and CPU burst as shown in Example 3. How will these processes be
scheduled according to SRTN scheduling algorithm? Compute the
average waiting time and average turnaround time.
Solution The processes will be scheduled as depicted in the
following Gantt chart.

Initially, P1 enters the ready queue at t = 0 and gets the CPU as
there are no other processes in the queue. While it is executing, at
time t = 1, P2 with CPU burst of 5 ms enters the queue. At that time
the remaining CPU burst of P1 is 6 ms which is greater than that of P2.
Therefore, the CPU is taken back from P1 and allocated to P2. During
execution of P2, P3 enters at t = 3 with a CPU burst of 2 ms. Again
CPU is switched from P2 to P3 as the remaining CPU burst of P2 at t =
3 is 3 ms which is greater than that of P3. However, when at time t =
4, P4 with CPU burst of 3 ms enters the queue, the CPU is not
assigned to it because at that time the remaining CPU burst of
currently running process (that is, P3) is 1 ms which is shorter than
that of P4. When P3 completes, there are three processes P1(6 ms),
P2(3 ms) and P4(3 ms) in the queue. To break the tie between P2 and
P4, the scheduler takes into consideration their arrival order and the
CPU is allocated first to P2, then to P4 and finally, to P1.
Since Turnaround time = Exit time – Entry time,
Turnaround time for P1= (17 – 0) = 17 ms
Turnaround time for P2= (8 – 1) = 7 ms
Turnaround time for P3= (5 – 3) = 2 ms
Turnaround time for P4= (11 – 4) = 7 ms
Average turnaround time = (17 + 7 + 2 + 7)/4 = 8.25 ms
Since Waiting time = Turnaround time – Processing time,
Waiting time for P1= (17– 7) = 10 ms
Waiting time for P2= (7 – 5) = 2 ms
Waiting time for P3= (2 – 2) = 0 ms
Waiting time for P4= (7 – 3) = 4 ms
Average waiting time = (10 + 2 + 0 + 4)/4 = 4 ms

Advantages
• A long process that is near to its completion may be favored over
the short processes entering the system. This results in an
improvement in the turnaround time of the long process.
Disadvantages
• Like SJF, it also requires an estimate of the next CPU burst of a
process in advance.
• Favoring a long process nearing its completion over the several
short processes entering the system may affect the turnaround
times of short processes.
• It favors only those long processes that are just about to complete
and not those who have just started their operation. Thus,
starvation of long processes still may occur.

4.4.4 Priority-based Scheduling


In priority-based scheduling algorithm, each process is assigned a
priority and the higher priority processes are scheduled before the
lower priority processes. At any point of time, the process having the
highest priority among all the ready processes is scheduled first. In
case two processes are having the same priority, they are executed
in the FCFS order.
Priority scheduling may be either preemptive or non-preemptive.
The choice is made whenever a new process enters the ready queue
while some process is executing. If the newly arrived process has the
higher priority than the currently running process, the preemptive
priority scheduling algorithm preempts the currently running process
and allocates CPU to the new process. On the other hand, the non-
preemptive scheduling algorithm allows the currently running process
to complete its execution and the new process has to wait for the
CPU.
Note: Both SJF and SRTN are special cases of priority-based
scheduling where priority of a process is equal to inverse of the next
CPU burst. The lower the CPU burst, the higher the priority.
A major design issue related with priority scheduling is how to
compute priorities of the processes. The priority can be assigned to a
process either internally defined by the system depending on the
process’s characteristics like memory usage, I/O frequency, usage
cost, etc., or externally defined by the user executing that process.
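A minimal sketch of preemptive priority scheduling is given below (illustrative; the data matches Example 7 below): it has the same structure as the SRTN loop shown earlier, but the selection key is the priority number (a smaller number meaning a higher priority) instead of the remaining burst.

/* Preemptive priority sketch: at every millisecond the arrived process
   with the smallest priority number runs (arrays in arrival order).     */
#include <stdio.h>

int main(void)
{
    int arrival[]   = {0, 1, 3, 4};          /* data of Example 7 below  */
    int remaining[] = {7, 4, 3, 2};
    int priority[]  = {4, 3, 1, 2};
    int n = 4, done = 0, clock = 0;

    while (done < n) {
        int pick = -1;
        for (int i = 0; i < n; i++)          /* highest-priority arrived */
            if (remaining[i] > 0 && arrival[i] <= clock &&
                (pick == -1 || priority[i] < priority[pick]))
                pick = i;
        if (pick != -1 && --remaining[pick] == 0) {
            printf("P%d finishes at t = %d\n", pick + 1, clock + 1);
            done++;
        }
        clock++;
    }
    return 0;
}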
Example 7 Consider four processes P1, P2, P3, and P4 with their arrival times, required CPU burst (in milliseconds), and priorities as shown in the following table.

Process    Arrival time (ms)    CPU burst (ms)    Priority
P1         0                    7                 4
P2         1                    4                 3
P3         3                    3                 1
P4         4                    2                 2

Assuming that the lower priority number means the higher priority,
how will these processes be scheduled according to non-preemptive
as well as preemptive priority scheduling algorithm? Compute the
average waiting time and average turnaround time in both cases.
Solution
Non-preemptive priority scheduling algorithm
The processes will be scheduled as depicted in the following Gantt
chart.

Initially, P1 enters the ready queue at t = 0 and gets the CPU as
there are no other processes in the queue. While it is executing, P2,
P3, and P4 enter the queue at t = 1, t = 3, and t = 4, respectively. When
CPU becomes free, that is, at t = 7, it is allocated to P3 because it is
having the highest priority (that is, 1) among the three processes.
When P3 completes, CPU is allocated to the next lower priority
process, that is, P4 and finally, the lowest priority process P2 is
executed.
Since Turnaround time = Exit time – Entry time,
Turnaround time for P1= (7 – 0) = 7 ms
Turnaround time for P2= (16 – 1) = 15 ms
Turnaround time for P3= (10 – 3) = 7 ms
Turnaround time for P4= (12 – 4) = 8 ms
Average turnaround time = (7 + 15 + 7 + 8)/4 = 9.25 ms
Since Waiting time = Turnaround time – Processing time,
Waiting time for P1= (7 – 7) = 0 ms
Waiting time for P2= (15 – 4) = 11 ms
Waiting time for P3= (7 – 3) = 4 ms
Waiting time for P4= (8 – 2) = 6 ms
Average waiting time = (0 + 11 + 4 + 6)/4 = 5.25 ms
Preemptive priority scheduling algorithm
The processes will be scheduled as depicted in the following
Gantt chart.

Initially, P1 of priority 4 enters the ready queue at t = 0 and gets
the CPU as there are no other processes in the queue. While it is
executing, at time t = 1, P2 with priority 3, which is higher than the priority of the currently running process P1, enters the queue. Therefore, P1 is preempted
(with remaining CPU burst of 6 ms) and the CPU is allocated to P2.
During execution of P2, P3 of priority 1 enters at t = 3. Again CPU is
switched from P2(with remaining CPU burst of 2 ms) to P3 as the
priority of P3 is greater than that of P2. However, when at time t = 4, P4
of priority 2 enters the queue, the CPU is not assigned to it because it
has lower priority than currently running process P3. When P3
completes, there are three processes P1, P2, and P4 in the ready
queue having priorities 4, 3, and 2, respectively. The CPU is allocated
first to P4, then to P2 and finally to P1.
Since Turnaround time = Exit time – Entry time,
Turnaround time for P1= (16 – 0) = 16 ms
Turnaround time for P2= (10 – 1) = 9 ms
Turnaround time for P3= (6 – 3) = 3 ms
Turnaround time for P4= (8 – 4) = 4 ms

Average turnaround time = (16 + 9 + 3 + 4)/4 = 8 ms


Since Waiting time = Turnaround time – Processing time,
Waiting time for P1= (16 – 7) = 9 ms
Waiting time for P2= (9 – 4) = 5 ms
Waiting time for P3= (3 – 3) = 0 ms
Waiting time for P4= (4 – 2) = 2 ms
Average waiting time = (9 + 5 + 0 + 2)/4 = 4 ms
Example 8 Consider the following set of processes with the length of
CPU burst time given in milliseconds.
Assume arrival order is: P1, P2, P3, P4, P5 all at time 0 and a smaller
priority number implies a higher priority. Draw the Gantt charts
illustrating the execution of these processes using preemptive priority
scheduling.
Solution According to preemptive priority-based scheduling
algorithm, the given processes will be scheduled as depicted in the
following Gantt chart.

All processes enter the ready queue at t = 0. Since at that time,
the process P2 has the highest priority among all processes, the CPU
is allocated to it. When P2 completes its execution at t = 1, the CPU is
allocated to P5 as it has the highest priority among the remaining
processes. Now at t = 6, a choice has to be made between processes P1
and P3 as both have the same priority. To break the tie between P1
and P3, the scheduler takes into consideration their arrival order.
Thus, the CPU is allocated first to P1 and then to P3. Lastly at t = 18,
the CPU is allocated to P4.

Advantages
• Important processes are never made to wait because of the
execution of less important processes.
Disadvantages
• It suffers from the problem of starvation of lower priority
processes, since the continuous arrival of higher priority
processes will prevent lower priority processes indefinitely from
acquiring the CPU. One possible solution to this problem is aging
which is a process of gradually increasing the priority of a low
priority process with increase in its waiting time. If the priority of a
low priority process is increased after each fixed interval of time, it
is ensured that at some time it will become a highest priority
process and get executed.

4.4.5 Highest Response Ratio Next (HRN) Scheduling


The highest response ratio next scheduling is a non-preemptive
scheduling algorithm that schedules the processes according to their
response ratio. Whenever CPU becomes available, the process
having the highest value of response ratio among all the ready
processes is scheduled next. The response ratio of a process in the
queue is computed by using the following equation.

Response ratio = (Waiting time + CPU burst) / CPU burst

Initially, when a process enters, its response ratio is 1. It goes on increasing at the rate of (1/CPU burst) as the process's waiting time increases.
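A small sketch of this computation is given below (the values are those of Example 9, which follows).

/* Response ratio = (waiting time + CPU burst) / CPU burst               */
#include <stdio.h>

static double response_ratio(int waiting_time, int cpu_burst)
{
    return (double)(waiting_time + cpu_burst) / cpu_burst;
}

int main(void)
{
    /* two ready processes with CPU bursts 10 and 50, both having
       waited 5 ms so far (see Example 9 below)                          */
    printf("P: %.2f  Q: %.2f\n", response_ratio(5, 10), response_ratio(5, 50));
    printf("Q after waiting 30 ms: %.2f\n", response_ratio(30, 50));
    return 0;
}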
Example 9 Consider two processes P and Q with CPU burst 10 and
50. Calculate the response ratio of both the processes after waiting
for 5 ms. Also, calculate the response ratio of the process Q after
waiting for 30 ms.
Solution Response ratio of P and Q after waiting time of 5 ms can be calculated as:
Response ratio of P = (5 + 10)/10 = 1.5
Response ratio of Q = (5 + 50)/50 = 1.1
Now, after waiting for 30 ms, the response ratio of Q becomes:
Response ratio of Q = (30 + 50)/50 = 1.6

Example 10 Consider four processes P1, P2, P3, and P4 with their arrival times and required CPU burst (in milliseconds) as shown in the following table.

Process    Arrival time (ms)    CPU burst (ms)
P1         0                    3
P2         2                    4
P3         3                    5
P4         4                    2

How will these processes be scheduled according to HRN
scheduling algorithm? Compute the average waiting time and
average turnaround time.
Solution The processes will be scheduled as depicted in the
following Gantt chart.

Initially, P1 enters the ready queue at t = 0 and CPU is allocated to
it. By the time P1 completes, P2 and P3 have arrived at t = 2 and t = 3,
respectively. At t = 3, the response ratio of P2 is ((3 – 2) + 4)/4 = 1.25
and of P3 is 1 as it has just arrived. Therefore P2 is scheduled next.
During execution of P2, P4 enters the queue at t = 4. When P2
completes at t = 7, the response ratio of P3 is ((7 – 3) + 5)/5 = 1.8 and
of P4 is ((7 – 4) + 2)/2 = 2.5. As P4 has higher response ratio, the CPU
is allocated to it and after its completion, P3 is executed.
Since Turnaround time = Exit time – Entry time,
Turnaround time for P1= (3 – 0) = 3 ms
Turnaround time for P2= (7 – 2) = 5 ms
Turnaround time for P3= (14 – 3) = 11 ms
Turnaround time for P4= (9 – 4) = 5 ms
Average turnaround time = (3 + 5 + 11 + 5)/4 = 6 ms
Since Waiting time = Turnaround time – Processing time,
Waiting time for P1= (3 – 3) = 0 ms
Waiting time for P2= (5 – 4) = 1 ms
Waiting time for P3= (11 – 5) = 6 ms
Waiting time for P4= (5 – 2) = 3 ms
Average waiting time = (0 + 1 + 6 + 3)/4 = 2.5 ms

Advantages
• It favors short processes. This is because with increase in waiting
time, the response ratio of short processes increases speedily as
compared to long processes. Thus, they are scheduled earlier
than long processes.
• Unlike SJF, starvation does not occur since with increase in
waiting time, the response ratio of long processes also increases
and eventually they are scheduled.

Disadvantages
• Like SJF and SRTN, it also requires an estimate of the expected
service time (CPU burst) of a process.

4.4.6 Round Robin (RR) Scheduling


The round robin scheduling is one of the most widely used
preemptive scheduling algorithms, which considers all the processes
equally important and treats them in a fair manner. Each
process in the ready queue gets a fixed amount of CPU time
(generally from 10 to 100 milliseconds) known as time slice or time
quantum for its execution. If the process does not execute
completely till the end of time slice, it is preempted and the CPU is
allocated to the next process in the ready queue. However, if the
process blocks or terminates before the time slice expires, the CPU is
switched to the next process in the ready queue at that moment only.
To implement the round robin scheduling algorithm, the ready
queue is treated as a circular queue. All the processes arriving in the
ready queue are put at the end of queue. The CPU is allocated to the
first process in the queue, and the process executes until its time
slice expires. If the CPU burst of the process being executed is less
than one time quantum, the process itself releases the CPU and is
deleted from the queue. The CPU is then allocated to the next
process in the queue. However, if the process does not execute
completely within the time slice, an interrupt occurs when the time
slice expires. The currently running process is preempted, put back at
the end of the queue and the CPU is allocated to the next process in
the queue. The preempted process again gets the CPU after all the
processes before it in the queue have been allocated their CPU time
slice. The whole process continues until all the processes in queue
have been executed.
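The preempt-and-requeue cycle can be sketched in C as follows. The sketch assumes that all processes are already present in the ready queue (arrivals during execution are ignored) and the burst values are arbitrary, so it illustrates only the mechanism, not any particular example.

#include <stdio.h>

#define QUANTUM 3                     /* time slice in ms (assumed value)       */

int main(void)
{
    int burst[] = {6, 4, 7, 3};       /* remaining CPU bursts (assumed values)  */
    int n = 4, done = 0, clock = 0;

    while (done < n) {
        for (int i = 0; i < n; i++) { /* treat the ready queue as circular      */
            if (burst[i] == 0)
                continue;             /* this process has already finished      */
            int run = burst[i] < QUANTUM ? burst[i] : QUANTUM;
            clock += run;             /* the process executes for 'run' ms      */
            burst[i] -= run;
            if (burst[i] == 0) {
                done++;
                printf("Process P%d finishes at t = %d ms\n", i + 1, clock);
            }                         /* otherwise it is preempted and waits
                                         for its turn in the next round         */
        }
    }
    return 0;
}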
Example 11 Consider four processes P1, P2, P3, and P4 with their
arrival times and required CPU burst (in milliseconds) as shown in the
following table.

Process    Arrival time (ms)    CPU burst (ms)
P1         0                    10
P2         1                    5
P3         3                    2
P4         4                    3
Assuming that the time slice is 3 ms, how will these processes be
scheduled according to round robin scheduling algorithm? Compute
the average waiting time and average turnaround time.
Solution The processes will be scheduled as depicted in the
following Gantt chart.

| P1 | P2 | P3 | P1 | P4 | P2 | P1 | P1 |
0    3    6    8    11   14   16   19   20
Initially, P1 enters the ready queue at t = 0 and gets the CPU for 3
ms. While it executes, P2 and P3 enter the queue at t = 1 and t = 3,
respectively. Since P1 does not complete its execution within 3 ms, an interrupt
occurs when the time slice gets over. P1 is preempted (with remaining
CPU burst of 7 ms), put back in the queue after P3 because P4 has not
entered yet and the CPU is allocated to P2. During execution of P2, P4
enters the queue at t = 4 and is put at the end of the queue after P1.
When P2 times out, it is preempted (with remaining CPU burst of 2
ms) and put back at the end of queue after P4. The CPU is allocated
to the next process in the queue, that is, to P3 and it executes
completely before the time slice expires. Thus, the CPU is allocated
to the next process in the queue which is P1. P1 again executes for 3
ms, then preempted (with remaining CPU burst of 4 ms) and put back
at the end of the queue after P2 and the CPU is allocated to P4. P4
executes completely within the time slice and the CPU is allocated to
next process in the queue, that is, P2. As P2 completes before the time
out occurs, the CPU is switched to P1 at t = 16 for another 3 ms.
When the time slice expires, the CPU is again allocated to P1 as it is the
only process in the queue.
Since Turnaround time = Exit time – Entry time,
Turnaround time for P1= (20 – 0) = 20 ms
Turnaround time for P2= (16 – 1) = 15 ms
Turnaround time for P3= (8 – 3) = 5 ms
Turnaround time for P4= (14 – 4) = 10 ms

Average turnaround time = (20 + 15 + 5 + 10)/4 = 12.5 ms


Since Waiting time = Turnaround time – Processing time,
Waiting time for P1= (20 – 10) = 10 ms
Waiting time for P2= (15 – 5) = 10 ms
Waiting time for P3= (5 – 2) = 3 ms
Waiting time for P4= (10 – 3) = 7 ms
Average waiting time = (10 + 10 + 3 + 7)/4 = 7.5 ms
Example 12 Five jobs A through E arrive at a computer center with
following details:

Job    Arrival time (ms)    CPU burst (ms)
A      0                    9
B      1                    5
C      2                    2
D      3                    6
E      4                    8
Calculate the average turnaround time and average waiting time


of the processes applying Round Robin algorithm.
Solution According to this scheduling algorithm with a time quantum of 3 ms, the
given processes will be scheduled as depicted in the following Gantt
chart.

| A | B | C | D | E | A | B | D | E | A | E |
0   3   6   8   11  14  17  19  22  25  28  30
Since Turnaround time = Exit time – Entry time,


Turnaround time for A = (28 – 0) = 28 ms
Turnaround time for B = (19 – 1) = 18 ms
Turnaround time for C = (8 – 2) = 6 ms
Turnaround time for D = (22 – 3) = 19 ms
Turnaround time for E = (30 – 4) = 26 ms
Average turnaround time = (28 + 18 + 6 + 19 + 26)/5 = 19.4 ms
Since Waiting time = Turnaround time – Processing time,
Waiting time for A = (28 – 9) = 19 ms
Waiting time for B = (18 – 5) = 13 ms
Waiting time for C = (6 – 2) = 4 ms
Waiting time for D = (19 – 6) = 13 ms
Waiting time for E = (26 – 8) = 18 ms
Average waiting time = (19 + 13 + 4 + 13 + 18)/5 = 13.4 ms
Note that the performance of round robin scheduling is greatly
affected by the size of the time quantum. If the time quantum is too
small, a large number of context switches occur, which in turn increases
the system overhead; more time is spent in performing context switches
than in executing the processes. On the other hand, if
the time quantum is too large, the performance of round robin simply
degrades to FCFS.
Note: If the time quantum is too small, say 1 μs, the round robin
scheduling is called processor sharing.

Advantages
• It is efficient for time sharing systems where the CPU time is
divided among the competing processes.
• It increases the fairness among the processes.

Disadvantages
• The processes (even the short processes) may take a long time to
execute. This decreases the system throughput.
• It requires some extra hardware support such as a timer to cause
interrupt after each time out.
Note: Ideally, the size of time quantum should be such that 80% of
the processes could complete their execution within one time
quantum.

4.4.7 Multilevel Queue Scheduling


The multilevel queue scheduling is designed for the environments
where the processes can be categorized into different groups on the
basis of their different response time requirements or different
scheduling needs. One possible categorization may be based on
whether the process is a system process, batch process, or an
interactive process (see Figure 4.2). Each group of processes is
associated with a specific priority. For example, the system processes
may have the highest priority whereas the batch processes may have
the least priority.
To implement the multilevel queue scheduling algorithm, the ready queue is
partitioned into as many separate queues as there are groups.
Whenever a new process enters, it is assigned permanently to one of
the ready queues depending on its properties like memory
requirements, type and priority. Each ready queue has its own
scheduling algorithm. For example, for batch processes, FCFS
scheduling algorithm may be used, and for interactive processes, one
may use the round robin scheduling algorithm. In addition, the
processes in higher priority queues are executed before those in
lower priority queues. This implies no batch process can run unless
all the system processes and interactive processes have been
executed completely. Moreover, if a process enters into a higher
priority queue while a process in lower priority queue is executing,
then the lower priority process would be preempted in order to
allocate the CPU to the higher priority process.

Fig. 4.2 Multilevel Queue Scheduling

Advantages
• Processes are permanently assigned to their respective queues
and do not move between queues. This results in low scheduling
overhead.

Disadvantages
• The processes in lower priority queues may have to starve for
CPU in case processes are continuously arriving in higher priority
queues. One possible way to prevent starvation is to time slice
among the queues. Each queue gets a certain share of CPU time
which it schedules among the processes in it. Note that the time
slice of different priority queues may differ.

4.4.8 Multilevel Feedback Queue Scheduling


The multilevel feedback queue scheduling also known as multilevel
adaptive scheduling is an improved version of multilevel queue
scheduling algorithm. In this scheduling algorithm, processes are not
permanently assigned to queues; instead they are allowed to move
between the queues. The decision to move a process between
queues is based on the time taken by it in execution so far and its
waiting time. If a process uses too much CPU time, it is moved to a
lower priority queue. Similarly, a process that has been waiting for too
long in a lower priority queue is moved to a higher priority queue in
order to avoid starvation.
To understand this algorithm, consider a multilevel feedback
queue scheduler (see Figure 4.3) with three queues, namely, Q1, Q2,
and Q3. Further, assume that the queues Q1 and Q2 employ round robin
scheduling algorithm with time quantum of 5 ms and 10 ms,
respectively while in queue Q3, the processes are scheduled in FCFS
order. The scheduler first executes all processes in Q1. When Q1 is
empty, the scheduler executes the processes in Q2. Finally, when both
Q1 and Q2 are empty, the processes in Q3 are executed. While
executing processes in Q2, if a new process arrives in Q1, the currently
executing process is preempted and the new process starts
executing. Similarly, a process arriving in Q2 preempts a process
executing in Q3.
Initially, when a process enters the ready queue, it is placed in Q1
where it is allocated the CPU for 5 ms. If the process finishes its
execution within 5 ms, it exits from the queue. Otherwise, it is
preempted and placed at the end of Q2. Here, it is allocated the CPU
for 10 ms (if Q1 is empty) and still if it does not finish, it is preempted
and placed at the end of Q3.
Fig. 4.3 Multilevel Feedback Queue Scheduling
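The demotion step described above can be sketched in C as follows; the queue levels, quanta and field names are assumptions for illustration only.

#define NUM_QUEUES 3

/* Time slice of each queue; the last queue is FCFS, so it has no limit. */
static const int quantum[NUM_QUEUES] = {5, 10, -1};

struct task {
    int queue_level;                  /* 0 denotes the highest priority queue (Q1) */
    int remaining_burst;              /* CPU time still required (in ms)           */
};

/* Invoked when a task exhausts the time slice of its current queue. */
void demote(struct task *t)
{
    if (t->queue_level < NUM_QUEUES - 1)
        t->queue_level++;             /* move the task to the next lower queue     */
}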

Example 13 Consider four processes P1, P2, P3, and P4 with their
arrival times and required CPU burst (in milliseconds) as shown in the
following table.

Process    Arrival time (ms)    CPU burst (ms)
P1         0                    25
P2         12                   18
P3         25                   4
P4         32                   10
Assume that there are three ready queues Q1, Q2 and Q3. The CPU
time slice for Q1 and Q2 is 5 ms and 10 ms, respectively and in Q3,
processes are scheduled on FCFS basis. How will these processes
be scheduled according to multilevel feedback queue scheduling
algorithm? Compute the average waiting time and average
turnaround time.
Solution The processes will be scheduled as depicted in the
following Gantt chart.
Initially, P1 enters the system at t = 0, placed in Q1 and allocated
the CPU for 5 ms. Since, it does not execute completely, it is moved
to Q2 at t = 5. Now Q1 is empty so the scheduler picks up the process
from the head of Q2. Since, P1 is the only process in Q2, it is again
allocated the CPU for 10 ms. But during its execution, P2 enters Q1 at
t = 12, therefore P1 is preempted and P2 starts executing. At t = 17, P2
is moved to Q2 and placed after P1. The CPU is allocated to the first
process in Q2, that is, P1. While P1 is executing, P3 enters Q1 at t = 25 so
P1 is preempted, placed after P2 in Q2 and P3 starts executing. As P3
executes completely within time slice, the scheduler picks up the first
process in Q2 which is P2 at t = 29. While P2 is executing, P4 enters Q1
at t = 32 because of which P2 is preempted and placed after P1 in Q2.
The CPU is assigned to P4 for 5 ms and at t = 37, P4 is moved to Q2
and placed after P2. At the same time, the CPU is allocated to P1 (the first
process in Q2). When it completes at t = 42, the next process in Q2
which is P2, starts executing. When it completes, the last process in
Q2, that is, P4 is executed.

Since Turnaround time = Exit time – Entry time, we have:


Turnaround time for P1= (42 – 0) = 42 ms
Turnaround time for P2= (52 – 12) = 40 ms
Turnaround time for P3= (29 – 25) = 4 ms
Turnaround time for P4= (57 – 32) = 25 ms
Average turnaround time = (42 + 40 + 4 + 25)/4 = 27.75 ms
Since Waiting time = Turnaround time – Processing time, we
have:
Waiting time for P1= (42 – 25) = 17 ms
Waiting time for P2= (40 – 18) = 22 ms
Waiting time for P3= (4 – 4) = 0 ms
Waiting time for P4= (25 – 10) = 15 ms
Average waiting time = (17 + 22 + 0 + 15)/4 = 13.5 ms

Advantages
• It is fair to I/O-bound (short) processes as these processes need
not wait too long and are executed quickly.
• It prevents starvation by moving a lower priority process to a
higher priority queue if it has been waiting for too long.

Disadvantages
• It is the most complex scheduling algorithm.
• Moving the processes between queues causes a number of
context switches which results in an increased overhead.
• The turnaround time for long processes may increase significantly.

4.5 MULTIPLE PROCESSOR SCHEDULING


So far we have discussed the scheduling of a single processor
among a number of processes in the queue. In the case of more than
one processor, different scheduling mechanisms need to be
incorporated. In this section, we will concentrate on homogeneous
multiprocessor systems, that is, systems in which all
processors are identical in terms of their functionality, so that any
process in the queue can be assigned to any available processor.
The scheduling criteria for multiprocessor scheduling are the same as
those for single-processor scheduling. But there are also some new
considerations which are discussed here.

Implementation of Ready Queue


In multiprocessor systems, the ready queue can be implemented in
two ways. Either there may be a separate ready queue for each
processor [see Figure 4.4 (a)] or there may be a single shared ready
queue for all the processors [see Figure 4.4 (b)]. In the former case, it
may happen that at any moment the ready queue of one processor is
empty while another processor is very busy executing processes.
To prevent this situation, the latter approach is preferred, in which all
the processes enter one common queue and are scheduled on any
available processor.

Fig. 4.4 Implementation of Ready Queue in Multiprocessor Systems

Scheduling Approaches
The next issue is how to schedule the processes from the ready
queue to multiple processors. For this, one of following scheduling
approaches may be used.
• Symmetric multiprocessing (SMP): In this approach, each
processor is self-scheduling. For each processor, the scheduler
selects a process for execution from the ready queue. Since
multiple processors need to access a common data structure, this
approach necessitates synchronization among multiple
processors. This is required so that no two processors could
select the same process and no process is lost from the ready
queue.
• Asymmetric multiprocessing: This approach is based on the
master-slave structure among the processors. The responsibility
of making scheduling decisions, I/O processing and other system
activities is up to only one processor (called master), and other
processors (called slaves) simply execute the user’s code.
Whenever a slave processor becomes available, the master
processor examines the ready queue and selects a process for it.
This approach is easier to implement than symmetric
multiprocessing as only one processor has access to the system
data structures. But at the same time, this approach is inefficient
because a number of processes may block on the master
processor.

Load Balancing
On SMP systems having a private ready queue for each processor, it
might happen at a certain moment of time that one or more
processors are sitting idle while others are overloaded with a number
of processes waiting for them. Thus, in order to achieve better
utilization of multiple processors, load balancing is required, which
means to keep the workload evenly distributed among multiple
processors. There are two techniques to perform load balancing,
namely, push migration and pull migration.
In push migration technique, the load is balanced by periodically
checking the load of each processor and shifting the processes from
the ready queues of overloaded processors to that of less overloaded
or idle processors. On the other hand, in pull migration technique,
the idle processor itself pulls a waiting process from a busy
processor.
Note: Load balancing is often unnecessary on SMP systems with a
single shared ready queue.

Processor Affinity
Processor affinity refers to the effort of making a process run on the
same processor on which it was executed last time. Whenever a process
executes on a processor, the data most recently accessed by it is
kept in the cache memory of that processor. Next time if the process
is run on the same processor, then most of its memory accesses are
satisfied in the cache memory only and as a result, the process
execution speeds up. However, if the process is run on some different
processor next time, the cache of the older processor becomes
invalid and the cache of the new processor is to be re-populated. As
a result, the process execution is delayed. Thus, the operating system
should attempt to run a process on the same
processor each time instead of migrating it to another
processor.
When an operating system tries to make a process run on the
same processor but does not guarantee to always do so, it is referred
to as soft affinity. On the other hand, when an operating system
provides system calls that force a process to run on the same
processor, it is referred to as hard affinity. In soft affinity, there is a
possibility of process migration from one processor to another,
whereas in hard affinity, the process is never migrated to
another processor.
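On Linux, for instance, hard affinity can be requested through the sched_setaffinity() system call. The sketch below pins the calling process to CPU 0; the choice of CPU is arbitrary and error handling is minimal.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);                  /* start with an empty CPU set   */
    CPU_SET(0, &mask);                /* allow execution only on CPU 0 */

    /* A pid of 0 means the calling process. */
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
        perror("sched_setaffinity");
    return 0;
}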

4.6 REAL-TIME SCHEDULING


As discussed in Chapter 1, a real-time system has well-defined, fixed
time constraints which if not met, may lead to system failure even
though the output produced is correct. Therefore, the ultimate goal of
a real-time system is to produce the correct output within the specified
time constraints. Real-time systems are of two types: hard real-
time systems and soft real-time systems.

4.6.1 Hard Real-time Systems


In hard real-time systems, a process must be accomplished within the
specified deadlines; otherwise, undesirable results may be produced.
Servicing a process after its deadline has passed is of no use.
Industrial control and robotics are examples of hard real-
time systems.
In hard real-time systems, the scheduler requires a process to
declare its deadline requirements before entering into the system.
Then it employs a technique known as admission control algorithm
to decide whether the process should be admitted. The process is
admitted if the scheduler can ensure that it will be accomplished by
its deadline; otherwise it is rejected. The scheduler can give
assurance of process completion on time only if it knows the exact
time taken by each function of the operating system to perform, and
each function is guaranteed to be performed within that duration of
time. But practically, it is not possible to provide such assurance in
the case of systems with secondary storage and virtual memory. This
is because in such systems the amount of time to execute a process
may vary. Thus, hard real-time systems are composed of special-
purpose software running on hardware committed to their vital
processes.

4.6.2 Soft Real-time Systems


In soft real-time systems, the requirements are less strict; it is not
mandatory to meet the deadline. A real-time process always gets the
priority over other tasks, and retains the priority until its completion. If
the deadline could not be met due to any reason, then it is possible to
reschedule the task and complete it. Multimedia, virtual reality, and
advanced scientific applications such as undersea exploration come
under the category of soft real-time systems.
The implementation of scheduling in soft real-time systems
requires the following properties to be considered:
• The system must employ preemptive priority-based scheduling
and real-time processes must be assigned higher priority than
non real-time processes. Also, the priority of the real-time
processes must not change during their life time.
• The dispatch latency must be low so that a runnable real-time
process could start as early as possible.
The system can guarantee the first property by prohibiting aging
(discussed in Section 4.4.4) on real-time processes thereby
preserving their priority. However, guaranteeing the second property
is somewhat difficult. This is because most operating systems are
constrained to wait for the completion of some system call or I/O
operation before context switching a process thereby resulting in high
dispatch latency.
One way to keep dispatch latency low is to provide preemptive
kernels, which allows the preemption of a process running in kernel
mode. A number of approaches can be used to make the kernel
preemptive; two of them are as follows:
• The first approach is to place preemption points at the safe
locations (where the kernel data is not being modified) in the
kernel. A preemption point determines whether there is a high-
priority process ready for execution. If so, the kernel process is
preempted, context switching is performed, and the high-priority
process is made to run. After the high-priority process has been
executed, the preempted process is rescheduled.
• Since the first approach allows a kernel process to be preempted
only at preemption points, a high-priority process may have to
wait while the process is executing in unsafe locations. As a
result, dispatch latency would be large. Thus, an alternative
approach is to make the kernel process preemptible at all times.
However, to facilitate this, the approach needs to employ some
synchronization mechanisms. These mechanisms ensure the
protection of kernel data from modification by the high-priority
process if it arrives while the process to be preempted is updating
the kernel data. This approach is efficient and widely used.
The aforementioned approaches suffer from priority inversion
problem. This problem occurs when the high-priority process requires
accessing (read/write) the kernel data currently being accessed by
some low-priority process or a chain of low-priority processes. In such
a case, the high-priority process is forced to wait for the low-priority
process or processes to complete their execution thereby resulting in
large dispatch latency.
The priority inversion problem can be overcome using the priority
inheritance protocol. This protocol allows the low-priority processes
that are currently accessing the resources required by the high-
priority process to inherit higher priority until they finish their work with
the required resource. Once they have finished with the resource,
their priorities revert to the original ones.
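For example, POSIX threads expose the priority inheritance protocol through mutex attributes on systems that support this option; a minimal sketch is shown below.

#include <pthread.h>

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;

    pthread_mutexattr_init(&attr);
    /* A low-priority thread holding 'lock' temporarily inherits the priority
       of any higher-priority thread that blocks on the same mutex.           */
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    pthread_mutex_init(&lock, &attr);

    /* ... threads sharing the protected data would lock/unlock here ... */

    pthread_mutex_destroy(&lock);
    pthread_mutexattr_destroy(&attr);
    return 0;
}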

4.7 ALGORITHM EVALUATION


In Section 4.4, we have studied various scheduling algorithms. But
now the issue arises of selecting a scheduling algorithm for a
particular system. For this, we need to evaluate the performance of
different scheduling algorithms under given system workload and find
out the most suitable one for the system. This section discusses
some commonly used methods to evaluate scheduling algorithms.

Deterministic Modeling
Deterministic modeling is the simplest and most direct method used to
compare the performance of different scheduling algorithms on the
basis of some specific criteria. It takes into account the pre-specified
system workload and measures the performance of each scheduling
algorithm for that workload.
For example, consider a system with workload as shown below.
We have to select an algorithm out of FCFS, SJF, and RR (with time
slice 8 ms), which results in minimum average waiting time.

According to FCFS, SJF, and RR scheduling algorithms, the


processes will be scheduled as depicted in the Gantt charts shown in
Figure 4.5.
According to FCFS, the average waiting time = (0 + 6 + 19 + 20)/4
= 11.25 ms
According to SJF, the average waiting time = (0 + 13 + 4 + 5)/4 =
5.5 ms
According to RR, the average waiting time = (0 + 13 + 12 + 13)/4
= 9.5 ms
From the above calculation, we can study the comparative
performance of scheduling algorithms. SJF scheduling algorithm
results in average waiting time less than half of that in FCFS while the
RR scheduling results in an intermediate value. Thus, for the given
system workload, SJF scheduling will work best.

Fig. 4.5 Deterministic Modeling

Though the deterministic modeling returns exact measures to


compare the performance of scheduling algorithms, it requires the
exact processing requirements of processes to be provided as input.
Thus, deterministic modeling is suitable for systems in which the same
programs run again and again, thereby providing exact measures
of the CPU bursts and I/O bursts of processes.

Queuing Models
Generally, there is no fixed set of processes that run on systems;
thus, it is not possible to measure the exact processing requirements
of processes. However, we can measure the distributions of CPU
bursts and I/O bursts during the life time of processes and derive a
mathematical formula that identifies the probability of a specific CPU
burst. Similarly, the arrival rate of processes in the system can also
be approximated.
The use of mathematical models for evaluating performance of
various systems led to the development of queuing theory, a branch
of mathematics. The fundamental model of queuing theory is identical
to the computer system model. Each computer system is represented
as a set of servers (such as CPU, I/O devices, etc.) with each server
having its own queue. For example, CPU has a ready queue and an
I/O device has a device queue associated with itself. By having
knowledge of arrival rates of processes in each queue and service
rates of processes, we can find out the average length of queue,
average waiting time of processes in the queue, etc.
For example, consider that L denotes the average queue length, W
denotes the average waiting time of a process in the queue, and a
denotes the average arrival rate of processes in the queue. The
relationship between L, W, and a can be expressed by Little’s
formula, as given below:
L = a × W
This formula is based on the facts discussed next:
• During the time a process waits in the queue (W), (a × W) new
processes enter the queue.
• The system is in steady state, that is, the number of processes
exiting from the queue is equal to the number of processes
entering the queue.
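For example, if processes arrive at an average rate of a = 7 processes per second and each process waits in the queue for W = 2 seconds on an average, then by Little’s formula the average queue length is L = a × W = 7 × 2 = 14 processes.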
Note: The performance evaluation using the queuing theory is known
as queuing analysis.
In spite of the fact that queuing analysis provides a mathematical
formula to evaluate the performance of scheduling algorithms, it
suffers from a few limitations. We can use queuing analysis only for
limited classes of scheduling algorithms, not for all. Moreover, it is
based on approximations; therefore, the accuracy of calculated
results depends on how closely the approximations match with the
real system.

Simulations
Simulation is a more accurate method of evaluating scheduling algorithms;
it mimics the dynamic behaviour of a real computer system over
time. The computer system model is programmed and all the major
components of the system are represented by data structures. The
simulator employs a variable representing a clock. As the clock is
incremented, the current system state is changed to reflect the
changed actions of processes, scheduler, I/O devices, etc. While the
simulation executes, the system parameters that affect the
performance of scheduling algorithms such as CPU burst, I/O burst,
and so on are gathered and recorded.
The data to drive the simulation can be generated using the trace
tapes, which are created by monitoring the system under study and
recording the events taking place. The sequence of recorded events
is then used to drive the simulation. Although trace tapes provide an easy
method to compare the performance of two different scheduling
algorithms for the same set of real inputs, they need a vast amount of
storage space. Moreover, simulation requires a lot of computer time;
this makes it an expensive method.

4.8 THREAD SCHEDULING


As discussed in Chapter 3, there are two types of threads: user-level
threads and kernel-level threads. Kernel-level threads are managed
and scheduled by the operating system, whereas user-level threads
are managed by the thread library. Since CPU cannot execute user-
level threads directly, they need to be mapped to an associated
kernel-level thread. This mapping could be direct or indirect via a light
weight process (LWP).
Since the two types of threads are different, there exist two
schemes to schedule these threads: process contention scope and
system contention scope. In process contention scope (PCS)
scheme, the thread library schedules user-level threads to be
mapped on a single available LWP. This scheme is named so
because it schedules the threads belonging to the same process
which are competing for the CPU. That is, the scheduling is done at
the process level. Systems that use many-to-one and many-to-many
models (discussed in Chapter 3), use PCS scheme for scheduling
user-level threads. In PCS, the scheduler selects the runnable thread
with the highest priority to execute. Since PCS involves scheduling of
user-level threads, priorities are set or modified by the programmer
only—thread library is not responsible for setting or adjusting the
priorities.
Mapping a user-level thread to an LWP does not mean that the
thread is actually running on the CPU; rather, each LWP further
needs to be attached to some kernel thread that the operating system
will schedule to actually run on a CPU. The scheme that involves
deciding which kernel thread to schedule onto CPU is termed as
system contention scope (SCS). This scheme is named so
because it schedules the threads of the entire system competing for
the CPU. Thus, scheduling is done at system level. Systems using
one-to-one model use SCS scheme for scheduling kernel-level
threads.
Let us discuss the contention scope in the context of Pthreads. The POSIX
Pthread API allows thread scheduling using either PCS or SCS.
Pthread identifies the following contention scope values for different
contention schemes:
• PTHREAD_SCOPE_PROCESS: It schedules threads using PCS
scheme/scheduling.
• PTHREAD_SCOPE_SYSTEM: It schedules threads using SCS
scheme/scheduling.
PTHREAD_SCOPE_PROCESS schedules user-level threads to run on
available LWPs, which are maintained by the thread library.
PTHREAD_SCOPE_SYSTEM creates and binds an LWP to each
user-level thread on many-to-many systems, effectively mapping
threads using the one-to-one approach. In order to
get and set the contention scope values, the Pthread API provides the
following functions:
• pthread_attr_setscope(pthread_attr_t *attr, int scope)

• pthread_attr_getscope(pthread_attr_t *attr, int *scope)

where,
• pthread_attr_t *attr specifies a pointer to the attribute set for the
thread.
• int scope specifies how the contention scope is to be set. The
value of this parameter could be either PTHREAD_SCOPE_PROCESS or
PTHREAD_SCOPE_SYSTEM.

• int *scope gives a pointer to the int value which is set to the
current value of the contention scope.
Note: Both functions return a nonzero value in case an error
occurs.
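A minimal usage sketch is given below; thread_func is an assumed user-defined routine and error handling is omitted for brevity.

#include <pthread.h>

void *thread_func(void *arg) { return arg; }    /* assumed thread body */

int main(void)
{
    pthread_t tid;
    pthread_attr_t attr;
    int scope;

    pthread_attr_init(&attr);
    pthread_attr_getscope(&attr, &scope);       /* read the default contention scope */
    /* Request system contention scope (SCS) for the new thread. */
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    pthread_create(&tid, &attr, thread_func, NULL);
    pthread_join(tid, NULL);
    return 0;
}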

LET US SUMMARIZE
1. The algorithm used by the scheduler to carry out the selection of a
process for execution is known as scheduling algorithm.
2. The time period elapsed in processing before performing the next I/O
operation is known as CPU burst.
3. The time period elapsed in performing I/O before the next CPU burst is
known as I/O burst.
4. The module of the operating system that performs the function of setting
up the execution of the selected process on the CPU is known as
dispatcher.
5. For scheduling purposes, the scheduler may consider some performance
measures and optimization criteria which include fairness, CPU utilization,
balanced utilization, throughput, waiting time, turnaround time and
response time.
6. A wide variety of algorithms are used for the CPU scheduling. These
scheduling algorithms fall into two categories, namely, non-preemptive
and preemptive.
7. In non-preemptive scheduling algorithms, once the CPU is allocated to a
process, it cannot be taken back until the process voluntarily releases it or
the process terminates.
8. In preemptive scheduling algorithms, the CPU can be forcibly taken back
from the currently running process before its completion and allocated to
some other process.
9. FCFS is one of the simplest non-preemptive scheduling algorithms in
which the processes are executed in the order of their arrival in the ready
queue.
10. The shortest job first also known as shortest process next or shortest
request next is a non-preemptive scheduling algorithm that schedules the
processes according to the length of CPU burst they require.
11. The shortest remaining time next also known as shortest time to go is a
preemptive version of the SJF scheduling algorithm. It takes into account
the length of remaining CPU burst of the processes rather than the whole
length in order to schedule them.
12. In priority-based scheduling algorithm, each process is assigned a priority
and the higher priority processes are scheduled before the lower priority
processes.
13. The highest response ratio next scheduling is a non-preemptive
scheduling algorithm that schedules the processes according to their
response ratio. Whenever CPU becomes available, the process having
the highest value of response ratio among all the ready processes is
scheduled next.
14. The round robin scheduling is one of the most widely used preemptive
scheduling algorithms in which each process in the ready queue gets a
fixed amount of CPU time (generally from 10 to 100 milliseconds) known
as time slice or time quantum for its execution.
15. The multilevel queue scheduling is designed for the environments where
the processes can be categorized into different groups on the basis of
their different response time requirements or different scheduling needs.
16. The multilevel feedback queue scheduling also known as multilevel
adaptive scheduling is an improved version of multilevel queue scheduling
algorithm. In this scheduling algorithm, processes are not permanently
assigned to queues; instead they are allowed to move between the
queues.
17. In multiprocessor systems, the ready queue can be implemented in two
ways. Either there may be a separate ready queue for each processor or
there may be a single shared ready queue for all the processors.
18. In symmetric multiprocessing scheduling approach, each processor is self-
scheduling. For each processor, the scheduler selects a process for
execution from the ready queue.
19. In asymmetric multiprocessing scheduling approach, the responsibility of
making scheduling decisions, I/O processing and other system activities is
up to only one processor (called master), and other processors (called
slaves) simply execute the user’s code.
20. A real-time system has well-defined, fixed time constraints which if not
met, may lead to system failure even though the output produced is
correct. It is of two types: hard real-time system and soft real-time system.
21. In hard-real time systems, the scheduler requires a process to declare its
deadline requirements before entering into the system. Then it employs a
technique known as admission control algorithm to decide whether the
process should be admitted.
22. In soft real-time systems, the requirements are less strict; it is not
mandatory to meet the deadline. A real-time process always gets the
priority over other tasks, and retains the priority until its completion. If the
deadline could not be met due to any reason, then it is possible to
reschedule the task and complete it.
23. To select a scheduling algorithm for a particular system, we need to
evaluate the performance of different scheduling algorithms under given
system workload and find out the most suitable one for the system. Some
of the commonly used evaluation methods are deterministic modeling,
queuing models, and simulations.
24. There are two types of threads: user level threads and kernel level
threads. There exist two schemes to schedule these threads: process
contention scope and system contention scope. In process contention
scope (PCS) scheme, the thread library schedules user-level threads to
be mapped on a single available LWP. On the other hand, the system
contention scope (SCS) scheme involves deciding which kernel thread to
schedule onto CPU.

EXERCISES
Fill in the Blanks
1. The time period elapsed in performing I/O before the next CPU burst is
known as _____________.
2. In _____________ scheduling, once the CPU is allocated to a process, it
cannot be taken back until the process voluntarily releases it or the
process terminates.
3. _____________ is a non-preemptive scheduling algorithm in which the
processes are executed in the order of their arrival in the ready queue.
4. The _____________ is a preemptive scheduling algorithm in which each
process in the ready queue gets a fixed amount of CPU time.
5. The two schemes to schedule threads are _____________ and
_____________.

Multiple Choice Questions


1. The time period elapsed in processing before performing the next I/O
operation is called _____________.
(a) CPU burst
(b) I/O burst
(c) Process burst
(d) None of these
2. Which of these operating system modules performs the function of setting
up the execution of the selected process on the CPU?
(a) CPU scheduler
(b) Job scheduler
(c) Dispatcher
(d) None of these
3. Which of the following algorithms is also known as shortest time to go?
(a) Shortest job first
(b) Shortest process next
(c) Shortest remaining time next
(d) Shortest request next
4. Which of the following is also known as multilevel adaptive scheduling?
(a) Multilevel queue scheduling
(b) Multilevel feedback queue scheduling
(c) Multilevel scheduling
(d) None of these
5. In which of the following systems the scheduler requires a process to
declare its deadline requirements before entering into the system?
(a) Hard-real time systems
(b) Asymmetric systems
(c) Soft-real time systems
(d) None of these

State True or False


1. The algorithm used by scheduler to carry out selection of a process for
execution is known as scheduling algorithm.
2. Two main types of scheduling are preemptive and non-preemptive
scheduling.
3. In preemptive scheduling algorithms, once the CPU is allocated to a
process, it cannot be taken back until the process voluntarily releases it or
the process terminates.
4. The shortest remaining time next is a non preemptive version of the SJF
scheduling algorithm.
5. A real-time system has well-defined, fixed time constraints.

Descriptive Questions
1. Distinguish between non-preemptive and preemptive scheduling
algorithms.
2. Define throughput, turnaround time, waiting time, and response time.
3. List the situations that may require the scheduler to make scheduling
decisions.
4. Which non-preemptive scheduling algorithms suffer from starvation and
under what conditions?
5. Describe scheduling in soft real-time systems.
6. Explain the relation (if any) between the following pairs of scheduling
algorithms.
(a) Round robin and FCFS
(b) Multilevel feedback queue and FCFS
(c) SJF and SRTN
(d) SRTN and priority-based
7. Consider five processes P1, P2, P3, P4 and P5 with their arrival times,
required CPU burst (in milliseconds), and priorities as shown in the
following table.
Assume that the lower priority number means the higher priority.
Compute the average waiting time and average turnaround time
of processes for each of the following scheduling algorithms. Also
determine which of the following scheduling algorithms result in
minimum waiting time.
(a) FCFS
(b) SJF
(c) HRN
(d) Non-preemptive priority-based
8. Consider three processes P1, P2, and P3 with the same arrival time t = 0. Their
required CPU burst (in milliseconds) is shown in the following table.

Assuming that the time slice is 4 ms, how will these processes be
scheduled according to round robin scheduling algorithm? Compute the
average waiting time and average turnaround time.
9. Consider the same set of processes as shown in Question 7. Compute the
average waiting time and average turnaround time of processes for each
of the following scheduling algorithms.
(a) SRTN
(b) Preemptive priority-based
(c) Round robin (if CPU time slice is 2 ms)
Compare the performance of these scheduling algorithms with each other.
10. Which of the following scheduling algorithms favor the I/O-bound
processes and how?
(a) Multilevel feedback queue
(b) SJF
(c) HRN
11. Write short notes on the following.
(a) Thread scheduling
(b) Soft affinity vs. hard affinity
(c) Dispatcher
(d) Scheduling approaches for multiprocessor scheduling
12. Differentiate between multilevel queue and multilevel feedback queue
scheduling.
13. Consider a scheduling algorithm that prefers to schedule those processes
first which have consumed the least amount of CPU time. How will this
algorithm treat the I/O-bound and CPU-bound processes? Is there any
chance of starvation?
14. Explain various methods used for evaluating performance of scheduling
algorithms.
Chapter 5

Process Synchronization

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ Understand the principles of concurrency.
⟡ Describe how to implement control synchronization using
precedence graph.
⟡ Define the critical-section problem.
⟡ Explain the software solutions to critical-section problem, including
strict alternation, Dekker’s algorithm, Peterson’s algorithm and
Bakery algorithm.
⟡ Discuss the hardware-supported solutions for critical-section
problem.
⟡ Define semaphores.
⟡ Discuss various classical synchronization problems and their
solutions using semaphores.
⟡ Understand the concept of monitors and message passing.

5.1 INTRODUCTION
Operating systems that support multiprogramming allow multiple
processes to execute concurrently in a system even with a single
processor. The concurrent processes may interact with one another
by sharing data or exchanging messages and control signals in order
to coordinate their actions with respect to one another. To implement
the interactions among cooperating (or interacting) processes, the
operating system must provide a means of synchronization. Based on
the nature of interactions among cooperating processes, two kinds of
synchronization have been identified: control synchronization and
data access synchronization.
Control synchronization is needed when cooperating processes
need to coordinate their execution with respect to one another. For
example, a process cannot perform a certain action in its execution
until some other process or processes have been executed up to a
specific point in their execution. Control synchronization is
implemented with the help of precedence graph of cooperating
processes. On the other hand, data access synchronization is
needed when cooperating processes access shared data. The use of
shared data by cooperating processes may lead to unexpected
results because of race conditions (discussed in the next section).
The data access synchronization mechanisms provide mutually
exclusive access to shared data, thus ensuring that the race
conditions do not occur. This chapter discusses various mechanisms
that have been developed to provide synchronization among
cooperating processes.

5.2 PRINCIPLES OF CONCURRENCY


Concurrency which implies simultaneous execution of multiple
processes is central to the design of any operating system whether it
is a single-processor multiprogramming system, a multiprocessor or a
distributed system. On a single-processor multiprogramming system,
concurrency is achieved by executing the processes in an interleaved
manner. Though interleaving the execution of processes does not
actually result in parallel processing and the processes just appear to
run simultaneously, it proves advantageous in terms of processing
efficiency and program structuring. On the other hand, the
concurrency is achieved on a multiprocessor system by interleaving
and sometimes overlapping the execution of processes.
Concurrent processing whether it is on a single-processor
multiprogramming system or on a multiprocessor system poses
certain problems, which are as follows.
• One of the major functions of an operating system is to allocate
the resources to processes in an optimal manner; however, it
becomes difficult when resources are being shared among
several concurrent processes. For example, suppose that a
process requests for an I/O channel and the request is granted
but the process is suspended before it uses the channel. In such
a case, it is not desirable for the operating system to lock the
channel and prevent its access by other processes, as doing so
may lead to deadlock (discussed in Chapter 6).
• It can be difficult to locate the programming errors, because the
contexts in which errors occur cannot always be reproduced
easily.
• Safe sharing of global resources becomes jeopardized. For
example, if two processes share a single variable and attempt to
perform read and write operations on it, then the order in which
read and write operations are performed is critical.

Operating System Concerns


Concurrency raises various issues in the design and management of
operating systems. Some of these issues are given below.
• The operating system must keep track of the multiple processes
existing in the system. This is accomplished with the help of
process control blocks (discussed in Chapter 2).
• The operating system must be able to provide protection of data
and other physical resources such as I/O devices, memory, etc.,
assigned to one process from interference by other
processes.
• The operating system must be able to handle the resource
allocation and deallocation requests of multiple processes.
Sometimes, two or more processes may also simultaneously
request for the same resource. In such situations, the operating
system must have some means to decide which process should
be allocated the resources.
• When multiple processes run concurrently, the speed of a process
may differ from that of other processes; however, this should not affect
the functioning of the process or the output produced by
the process. It is the responsibility of the operating system to
ensure the correct functioning of each process.

Race Condition
As mentioned above, unordered execution of cooperating processes
may result in data inconsistency. To understand the concept, consider
two cooperating processes P1 and P2 that update the balance of an
account in a bank. The code segment for the processes is given in
Table 5.1.
Table 5.1 Code Segment for Processes P1 and P2

Process P1                       Process P2
Read Balance                     Read Balance
Balance = Balance + 1000         Balance = Balance – 400

Suppose that the balance is initially 5000, then after the execution
of both P1 and P2, it should be 5600. The correct result is achieved if
P1 and P2 execute one by one in any order either P1 followed by P2 or
P2 followed by P1. However, if the instructions of P1 and P2 are
interleaved arbitrarily, the balance may not be 5600 after the
execution of both P1 and P2. One possible interleaving sequence for
the execution of instructions of P1 and P2 is given in Table 5.2.

Table 5.2 Possible Interleaved Sequence

P1: Read Balance                   (reads 5000)
P2: Read Balance                   (reads 5000)
P1: Balance = Balance + 1000       (Balance becomes 6000)
P2: Balance = Balance – 400        (Balance becomes 4600)

The above interleaved sequence results in an inconsistent
balance, that is, 4600. If the order of the last two instructions is
interchanged, the balance would be 6000 (again, inconsistent). Note
that a situation where several processes sharing some data execute
concurrently and the result of execution depends on the order in
which the shared data is accessed by the processes is called race
condition.
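The race can be reproduced with two threads sharing a balance variable, as in the following sketch; the update is deliberately left unprotected to expose the problem.

#include <pthread.h>
#include <stdio.h>

long balance = 5000;                         /* shared data */

void *deposit(void *arg)  { balance = balance + 1000; return NULL; }
void *withdraw(void *arg) { balance = balance - 400;  return NULL; }

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, deposit, NULL);
    pthread_create(&t2, NULL, withdraw, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Usually prints 5600, but an unlucky interleaving of the two
       read-modify-write sequences can yield 4600 or 6000.          */
    printf("Final balance = %ld\n", balance);
    return 0;
}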

Mutual Exclusion
To avoid race conditions or inconsistent results, some form of
synchronization among the processes is required which ensures that
only one process is manipulating the shared data at a time. In other
words, we need to ensure mutual exclusion. That means if a
process P1 is manipulating shared data, no other cooperating process
should be allowed to manipulate it until P1 finishes with it. In the
previous example, since mutual exclusion was not ensured while
accessing the shared data, an inconsistent result was produced.

5.3 PRECEDENCE GRAPH


As discussed earlier, precedence graph is used to implement the
control synchronization. Here, we discuss the control synchronization
at the start or end of the process. A process (say, Pi) is said to
precede another process (say Pj) if Pj cannot begin its execution until
Pi has completed its execution. Graphically, it is represented using
the notation Pi → Pj. Notice that if there exists the relation Pi → Pj
and Pj → Pk, then it can be implied that Pi → Pk. In other words,
the precedence relation on processes is transitive. The transitive
precedence is represented using the notation Pi ⇒ Pk.
Formally, a precedence graph is defined as a directed graph G =
(N, E) where N is a set of nodes and E is a set of directed edges. The
nodes in precedence graph may correspond to the cooperating
processes, or the statements in a program that can be executed
concurrently. An edge from a node (say, representing a process P1) to
another node (say representing a process P2) implies that the process
P1 must be completed before the process P2 can begin its execution.

To understand how precedence graph is made, consider the


following sequential program segment with a set of statements.

Suppose that we want to execute some of these statements


concurrently. It is clear from the given statements that the statements
S1 and S2 can be executed concurrently because neither of these
depends on the other. The statement S3 can be executed only after
the statements S1 and S2 have been executed completely. That is,
both S1 and S2 precede S3. Similarly, we can observe that statement S3
precedes S4, S1 precedes S5 and both S4 and S5 precede S6. Figure
5.1 shows the precedence graph for the given set of statements.
Fig. 5.1 Precedence Graph

Given a precedence graph, we can determine the number of


processors required to execute it in an efficient manner as well as the
order in which the activities (or processes) should be executed. The
number of processors that can be used to efficiently execute a
precedence graph is equal to the cardinality of the largest set of the
nodes of graph such that no two nodes in the set depend on each
other. For example, the precedence graph shown in Figure 5.1 can
be efficiently executed with two processors. One processor executes
the statements S1, S5 and S6 in order, while the other simultaneously
executes statements S2, S3 and S4 in order. However, both
processors must synchronize with each other in such a way that the
second processor starts executing statement S3 only after the first
processor has completed the execution of S1.

Concurrency Conditions
To determine whether two statements (say Si and Sj) can be
executed concurrently while producing valid outputs, the following three
conditions (called Bernstein’s conditions) must hold:
• R(Si) ∩ W(Sj) = { }
• W(Si) ∩ R(Sj) = { }
• W(Si) ∩ W(Sj) = { }
Here, R(Si) is the read set of Si, which includes the variable(s)
whose value has been referenced in Si during execution and W(Si) is
the write set of Si, which includes the variable(s) whose value is to be
modified upon the execution of Si. For example, the read sets and
write sets of the given statements are as follows.

Observe that the statements S1 and S2 can be executed


concurrently, as R(S1) ∩ W(S2) = W(S1) ∩ R(S2) = W(S1) ∩ W(S2)={}.
However, the statement S3 cannot be executed concurrently with S2
because W(S2) ∩ R(S3)={y}. Similarly, we can observe that the
statement S4 can be executed concurrently with S1 and S2, but not
with S5. Also, the statement S5 cannot be executed concurrently with
S6.

5.4 CRITICAL REGIONS


The portion of the code of a process in which it accesses or changes
the shared data is known as its critical region (also called critical
section). It is important for the system to ensure that the execution of
critical sections by the cooperating processes is mutually exclusive.
5.4.1 Critical-Section Problem
The critical-section problem is to design a protocol that the processes
can use to cooperate. Each process must request permission to enter
its critical section and signal the entrance by setting the values of
some variables. The process does this in the code just before the
critical section. That part of code is called the entry section. After
executing the critical section, the process again sets some variables
to signal the exit from the critical section. The portion of code in which
the process does this is called the exit section. A solution to critical-
section problem must satisfy the following requirements.
• Mutual exclusion: This condition states that no two cooperating
processes can enter into their critical sections at the same time.
That is, the access to critical section must be mutually exclusive.
For example, if a process P1 is executing in its critical section, no
other cooperating process can enter in the critical section until P1
finishes with it.
• Progress: Suppose a process P1 is executing in its critical section,
then all other processes that wish to enter their critical sections
have to wait. When P1 finishes its execution in critical section, a
decision as to which process will enter its critical section next is to
be made. In the decision, only the waiting processes will
participate, and the decision should be made in a finite amount of
time. A process that has exited from its critical section cannot
prevent other processes from entering their critical sections.
• Bounded waiting: A process wishing to enter its critical section
cannot be delayed indefinitely. There is an upper bound on the
number of times that other processes are allowed to enter their
critical sections after a process has made a request to enter its
critical section and before the permission is granted.
In addition, there is one more requirement, which is that no
assumptions can be made about the number of processors available
or the relative speeds of execution of the processes.
A number of mechanisms have been developed to solve the
critical-section problem. These mechanisms include software
solutions, hardware-supported solutions, operating system primitives
and programming language constructs.

5.5 SYNCHRONIZATION: SOFTWARE APPROACHES

One way to achieve a solution to the critical-section problem is to leave the
responsibility of coordination to the processes that are to be
executed concurrently. The processes themselves coordinate with
one another to ensure mutual exclusion without any support from the
operating system or without using any programming language
constructs. Such solutions are referred to as software approaches.
In this section, we present the software approaches for two
processes and multiple processes.

5.5.1 Strict Alternation: Attempt for Two-Process Solution

An attempt to solve the critical-section problem applicable to two
processes is strict alternation. In this algorithm, the two processes,
say P0 and P1, share a variable turn of type int initialized to either 0
or 1. The general structure of the code segment for process P0 is as
follows.
The general structure of the code segment for process P1 is as
follows.
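In C-like notation, minimal sketches of these two code segments, consistent with the description that follows, are:

/* Process P0 (sketch) */
do {
    while (turn != 0)
        ;                    /* busy wait until it is P0's turn */
    /* critical section */
    turn = 1;                /* hand over the turn to P1        */
    /* remainder section */
} while (true);

/* Process P1 (sketch) */
do {
    while (turn != 1)
        ;                    /* busy wait until it is P1's turn */
    /* critical section */
    turn = 0;                /* hand over the turn to P0        */
    /* remainder section */
} while (true);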

The value of variable turn lets the processes decide whether to


enter critical-section or wait. For example, when process P0 wishes to
enter its critical section, it checks the value of turn. If the value of
turn is 0, process P0 enters its critical section. In the mean time, if
process P1 also wishes to enter in its critical section, it has to wait until
the value of turn becomes 1. Note that P0 sets value of turn to 1 after
it exits the critical section. Hence, P1 enters only when P0 exits.
Similarly, when P1 is executing its critical section (means the value of
turn is 1), if P0 attempts to enter its critical section, it has to wait until
value of turn becomes 0. This way the mutual exclusion is ensured.
Though the above algorithm ensures mutual exclusion, it does not
ensure the progress requirement. Suppose P0 is executing in its
critical section and P1 is waiting to enter its critical section. When P0
exits, it allows P1 to enter its critical section by setting value of turn to
1. P1 enters the critical section and P0 executes its remaining code.
Suppose P1 exits its critical section, sets value of turn to 0, and starts
executing its remaining code. Now, if P1 completes execution in its
remaining code and attempts to enter its critical section, it cannot
enter. This is because the value of turn is still 0 which means only P0
can enter. Clearly, P1 is blocked by P0 which is not executing in its
critical section.
Moreover, the strict alternation algorithm causes busy waiting.
When a process is in its critical section, another process is forced to
wait. The waiting process continuously tests the value of the turn
variable until it becomes equal to its own process number, at which
point it is allowed to enter its critical section. This procedure is called
busy waiting (or spin waiting). Busy waiting is usually undesirable as
the waiting process does nothing productive during this time and
thus, wastes CPU time.
Note: Processes enter their critical sections in a strictly alternating
order: first P0, then P1, then P0, then P1, and so on. A process cannot
enter its critical section twice in a row.

5.5.2 Dekker’s Algorithm: Two-Process Solution


In 1962, a Dutch mathematician, T. Dekker devised a software
solution to mutual exclusion problem for two processes. He was the
first person to devise such a solution that did not require strict
alternation. Dekker’s algorithm combines the idea of taking turns with
the lock variables and allows two processes to share the following
two variables.
int turn;
boolean flag[2];

Initially, the flag of both processes (that is, flag[0] and flag[1])
are set to false. When a process wishes to enter in its critical section,
it must set its corresponding flag to true in order to announce to the
other process that it is attempting to enter in its critical section. In
addition, the turn variable (initialized to either 0 or 1) is used to avoid
the livelock—a situation that arises when both processes prevent
each other indefinitely from entering into the critical sections.
Suppose we have two processes say, P0 and P1. The general
structure of the code segment for process P0 is as follows.
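A sketch of Dekker's algorithm for P0, in C-like pseudocode and following the description in the next paragraph (the flag and turn variables are those declared above), is:

do {
    flag[0] = true;                 /* announce intention to enter */
    while (flag[1]) {               /* P1 is also interested */
        if (turn == 1) {            /* it is P1's turn */
            flag[0] = false;        /* back off */
            while (turn == 1)
                ;                   /* wait until turn becomes 0 */
            flag[0] = true;         /* try again */
        }
    }
    /* critical section */
    flag[0] = false;                /* exit section */
    turn = 1;                       /* give the turn to P1 */
    /* remainder section */
} while (true);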
The general structure of the code segment for process P1 is as
follows.
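P1's segment is symmetric, with the roles of the indices 0 and 1 exchanged (again only a sketch):

do {
    flag[1] = true;
    while (flag[0]) {
        if (turn == 0) {
            flag[1] = false;
            while (turn == 0)
                ;                   /* wait until turn becomes 1 */
            flag[1] = true;
        }
    }
    /* critical section */
    flag[1] = false;
    turn = 0;                       /* give the turn to P0 */
    /* remainder section */
} while (true);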
To understand how Dekker’s algorithm works, suppose initially,
the process P0 wants to enter its critical section. Therefore, P0 sets
flag[0] to true. It then examines the value of flag[1]. If it is found
false, P0 immediately enters its critical section; otherwise, it checks
the value of turn. If turn=1, then P0 understands that it is the turn of P1
and so, sets flag[0] to false and continues to wait until turn becomes
0. On the other hand, if P0 finds turn=0, then it understands that it is
its turn and thus, periodically checks flag of P1, that is, flag[1]. When
at some point of time, P1 sets flag[1] to false, P0 proceeds. After the
process P0 has exited its critical section, it sets flag[0] to false and
turn to 1 to transfer the right to enter the critical section to P1.
In case both processes wish to enter their critical sections at the
same time which implies that both flag[0] and flag[1] are set to
true, the value of turn variable decides which process can enter into
its critical section. If turn=0, P1 sets flag[1] to false, thus allowing P0
to enter into its critical section and vice versa if turn=1. This ensures
the mutual exclusion requirement. Observe that when both processes
attempt to enter at the same time, the value of the turn variable allows
exactly one of them into its critical section, while the other can enter
only after the first has exited. Thus, each process eventually gets to
run, which satisfies the bounded-waiting requirement and ensures
freedom from deadlock and livelock.

5.5.3 Peterson’s Algorithm: Two-Process Solution


Though Dekker’s algorithm solved the critical-section problem, it is a
complex program which is difficult to follow. Thus in 1981, Peterson
proposed a much simpler algorithm to solve the critical-section
problem for two processes. This algorithm also lets the two processes
P0 and P1 to share the following two variables.
int turn;
boolean flag[2];

The value of variable turn is initialized to either 0 or 1 and both the
elements of array flag are initialized to false. The general structure
of the code segment for process P0 is as follows.
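A sketch of the widely used form of Peterson's algorithm for P0, using the turn and flag variables declared above, is:

do {
    flag[0] = true;                 /* P0 wants to enter */
    turn = 1;                       /* give priority to P1 */
    while (flag[1] && turn == 1)
        ;                           /* busy wait */
    /* critical section */
    flag[0] = false;                /* exit section */
    /* remainder section */
} while (true);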

The general structure of the code segment for process P1 is as follows.
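P1's segment is symmetric (again only a sketch):

do {
    flag[1] = true;                 /* P1 wants to enter */
    turn = 0;                       /* give priority to P0 */
    while (flag[0] && turn == 0)
        ;                           /* busy wait */
    /* critical section */
    flag[1] = false;                /* exit section */
    /* remainder section */
} while (true);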
When any process, suppose P0, wishes to enter its critical section,
it first sets flag[0] to true and the value of turn to other number, that
is, 1. It then verifies the following two conditions.
1. whether flag[1] is true
2. whether turn equals 1.
If any of these conditions is false, the process P0 enters its critical
section, otherwise, it waits. In case, only P0 wishes to enter the critical
section, the first condition remains false. The process P0 then
executes in its critical section, and after executing it resets flag[0] to
false, indicating that P0 is not in its critical section.

Now, consider the case when both P0 and P1 wish to enter their
critical sections at the same time. In this case, both elements of flag
will be set to true, and the value of turn will be set to 1 and to 0 one
after the other (by P0 and P1, respectively), but only the value written
last persists. Now, the first condition is true for both processes, so
the final value of turn decides which process enters its critical
section first; the other process has to wait. This implies that mutual
exclusion is preserved.
To verify that the algorithm also satisfies the other two
requirements, observe that the process P0 can be prevented from
entering its critical section if flag[1] is true and turn is 1. If P1 does
not wish to enter its critical section, then P0 finds flag[1] as false and
can enter its critical section. However, when both processes wish to
enter their critical section at the same time the variable turn plays its
role and allows one process to enter its critical section. Suppose turn
is 1, then P1 is allowed first and P0 is stuck in the loop. Now, when P1
exits from its critical section, it sets flag[1] to false to indicate that it
is not in its critical section now. This allows P0 to enter its critical
section. It means P0 enters its critical section after at most one entry
by P1, satisfying both progress and bounded-waiting requirements.

5.5.4 Bakery Algorithm: Multiple-Process Solution


Lamport proposed an algorithm, known as the bakery algorithm, to
solve the critical-section problem for N processes. The algorithm lets
the processes to share the following two variables.
boolean choosing[N];
int number[N];

All the elements of the arrays, that is, choosing and number are
initialized to false and 0, respectively.
The algorithm assigns a number to each process and serves the
process with the lowest number first. The algorithm cannot ensure
that two processes do not receive the same number. Thus, if two
processes, say Pi and Pj, receive the same number, then Pi is served
first if i<j. The general structure of the code segment for process, say
Pi, is as follows.
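A sketch of the bakery algorithm's structure for process Pi, using the choosing and number arrays declared above and the MAX(number) notation explained in the note below (j is a local loop index), is:

do {
    choosing[i] = true;
    number[i] = 1 + MAX(number);       /* take the next ticket number */
    choosing[i] = false;
    for (j = 0; j < N; j++) {
        while (choosing[j])
            ;                          /* wait while Pj is picking its number */
        while (number[j] != 0 &&
               (number[j] < number[i] ||
                (number[j] == number[i] && j < i)))
            ;                          /* wait for processes with smaller tickets */
    }
    /* critical section */
    number[i] = 0;                     /* exit section */
    /* remainder section */
} while (true);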
Note: For simplicity, the notation MAX(number) is used to retrieve the
maximum element in the array number.
To verify that mutual exclusion is preserved, suppose a process P0
is executing in its critical section and another process, say P1,
attempts to enter the critical section. For j=0, the process P1 is not
blocked in the first while loop because P0 had set choosing[0] to
false in the entry section. However, in the second while loop for j=0,
P1 finds the following:

• number[j]!=0, since P0 is executing in the critical section after
setting it to a nonzero number in the entry section.
• number[j]<number[i], since P1 is assigned its number after P0. P1
may be assigned the same number as P0, but in that case P1 finds
j<i because 0<1.
Since the overall condition in the second while loop evaluates to true,
P1 is blocked in that loop until P0 finishes execution in its critical
section, thus preserving the mutual exclusion requirement. The
algorithm not only preserves the mutual exclusion requirement, but
also the progress and bounded-waiting requirements. To verify these
requirements, observe that if two or more processes are waiting to
enter the critical sections, then the process that had come first is
allowed to enter the critical section first. The conditions in the second
while statement ensure this. It means the processes are served on a
first-come, first-served basis, and no process suffers starvation.

5.6 SYNCHRONIZATION HARDWARE


The software approaches incur high processing overhead and are
more prone to logical errors. An alternative is to use the
hardware approaches to solve the critical-section problem. The
hardware-supported solutions developed for the critical-section
problem make use of hardware instructions available on many
systems and thus, are effective and efficient.

Disabling Interrupts
On a system with a single-processor, only one process executes at a
time. The other processes can gain control of processor through
interrupts. Therefore, to solve the critical-section problem, it must be
ensured that when a process is executing in its critical section,
interrupt should not occur. A process can achieve this by disabling
interrupts before entering in its critical section. Note that the process
must enable the interrupts after finishing execution in its critical
section.
This method is simple, but it has certain disadvantages. First, it is
feasible in a single-processor environment only, because disabling
interrupts in a multiprocessor environment takes time as a message
must be passed to all the processors. This message passing delays
processes from entering into their critical sections, thus, decreasing
the system efficiency. Second, it may affect the scheduling goals,
since the processor cannot be preempted from a process executing
in its critical section.

Using TestAndSet() Instruction


Due to the disadvantages of the above method, many systems
provide special machine instructions to solve the critical-section
problem. One special instruction is the TestAndSet instruction which
can be defined as follows.
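A reconstructed definition of the conventional TestAndSet instruction, expressed here as a C-like function (on real hardware the whole operation executes as a single atomic instruction), is:

boolean TestAndSet(boolean *target)
{
    boolean rv = *target;     /* remember the old value */
    *target = true;           /* set the target to true */
    return rv;                /* return the old value */
}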

An important characteristic of the TestAndSet instruction is that it
executes as an atomic action. It means that if two TestAndSet
instructions are executed simultaneously (each on a different CPU) in
a multiprocessor system, one must complete before the other one
starts.
On systems that support the TestAndSet instruction, the mutual
exclusion can be implemented by allowing the processes to share a
Boolean variable, say lock, initialized to false. The general structure
of the code segment for process, say Pi, is as follows.
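A sketch of the entry and exit sections using TestAndSet and the shared lock variable is:

do {
    while (TestAndSet(&lock))
        ;                     /* busy wait until lock was false */
    /* critical section */
    lock = false;             /* exit section */
    /* remainder section */
} while (true);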
The algorithm is easy and simple to understand. Any process that
wishes to enter its critical section executes the TestAndSet instruction
and passes the value of lock as a parameter to it. If the value of lock
is false (means no process is in its critical section), the TestAndSet
instruction sets the lock to true and returns false, which breaks the
while loop and allows the process to enter its critical section.
However, if the value of lock is true, the TestAndSet instruction
returns true, thus, blocking the process in the loop. The algorithm
satisfies the mutual exclusion requirement, but does not satisfy the
bounded-waiting requirement.

Using Swap() Instruction


Another special hardware instruction is the Swap instruction that
operates on two Boolean variables. Like the TestAndSet instruction,
the Swap instruction also executes as an atomic action. This
instruction can be defined as follows.
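A reconstructed definition of the Swap instruction as a C-like function (it too is executed atomically by the hardware) is:

void Swap(boolean *a, boolean *b)
{
    boolean temp = *a;        /* exchange the contents of a and b */
    *a = *b;
    *b = temp;
}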
On systems that support the Swap instruction, the mutual exclusion
can be implemented by allowing the processes to share a Boolean
variable, say lock, initialized to false. In addition to this, each
process uses a local Boolean variable, say key. The general structure
of the code segment for process, say Pi, is as follows.
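A sketch of the structure for Pi using Swap, the shared lock and the local key, is:

do {
    key = true;
    while (key == true)
        Swap(&lock, &key);    /* key becomes false only when lock was false */
    /* critical section */
    lock = false;             /* exit section */
    /* remainder section */
} while (true);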

The algorithm is again easy to understand. Initially, the value of lock
is false, so the first process, say Pi, on executing the Swap
instruction, sets key to false and lock to true. The false value of key
allows the process to enter its critical section.
Any other process, say Pj, that attempts to enter its critical section
finds that the lock is true and when it is swapped with key, the key
remains true. The true value for key blocks the process Pj in the
while loop until the lock becomes false. Note that the lock becomes
false when Pi exits from the critical section. This algorithm also
satisfies only the mutual exclusion requirement and does not satisfy
the bounded-waiting requirement.

Modified Algorithm using TestAndSet() Instruction


To meet all the requirements of the solution for critical-section
problem, another algorithm is developed that uses the TestAndSet
instruction. The algorithm lets the processes to share the following
two variables.
boolean lock;
boolean waiting[N];

The variable lock and all the elements of array waiting are
initialized to false. Each process also has a local Boolean variable,
say key. The general structure of the code segment for process, say
Pi, is as follows.
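A sketch of this bounded-waiting algorithm, using the shared lock and waiting variables and the local key (j is a local index), is:

do {
    waiting[i] = true;
    key = true;
    while (waiting[i] && key)
        key = TestAndSet(&lock);       /* entry section */
    waiting[i] = false;
    /* critical section */
    j = (i + 1) % N;                   /* exit section: scan cyclically */
    while ((j != i) && !waiting[j])
        j = (j + 1) % N;
    if (j == i)
        lock = false;                  /* no process is waiting */
    else
        waiting[j] = false;            /* let Pj enter next */
    /* remainder section */
} while (true);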

To verify that the mutual exclusion requirement is met, suppose a
process Pi attempts to enter its critical section. It first sets the
waiting[i] and key to true, and then reaches the while loop in the
entry section. If Pi is the first process attempting to enter its critical
section, it finds that both the conditions in the while loop are true.
Then it executes the TestAndSet instruction which sets the lock to
true and returns false, since lock is initially false. The returned
value, that is false, is assigned to key, which allows the process Pi to
exit from the loop and enter its critical section after resetting the
waiting[i] to false.
Now, the value of lock is true; thus, any other process, say Pj, that
attempts to enter its critical section finds, when it executes the
TestAndSet instruction, that key is set to true, and it is blocked in the
while loop until either key or waiting[j] becomes false. Note that
neither key nor waiting[j] becomes false as long as Pi is in its critical
section. This maintains the mutual exclusion requirement.
To verify the progress requirement, observe in the exit section that
Pi sets either lock or waiting[j] to false. Setting lock (on which the
value of key depends) or waiting[j] to false allows any other waiting
process to enter its critical section.
The algorithm also satisfies the bounded-waiting requirement. To
verify this, observe that when any process, say Pi, exits from its
critical section, it scans the waiting array in the cyclic order (i+1,
i+2, ... , N-1, 0, 1, …, i-1) to locate the first process, say Pj, with
waiting[j] equal to true. If no such process is found, Pi sets lock to
false, so that any other process that now attempts to enter its critical
section need not to wait. On the other hand, if such a process is
found, it enters its critical section next, since Pi sets waiting[j] to
false. In this way, each process gets its turn to enter its critical
section after a maximum of N-1 processes.

5.7 SEMAPHORES
In 1965, Dijkstra suggested using an abstract data type called a
semaphore for controlling synchronization. A semaphore S is an
integer variable which is used to provide a general-purpose solution
to critical-section problem. In his proposal, two standard atomic
operations are defined on S, namely, wait and signal, and after
initialization, S is accessed only through these two operations. The
definition of wait and signal operation in pseudocode is as follows.
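A sketch of the classical busy-waiting definitions of the two operations on a semaphore S is:

wait(S) {
    while (S <= 0)
        ;            /* busy wait */
    S--;
}

signal(S) {
    S++;
}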
The solution of critical-section problem for N processes is
implemented by allowing the processes to share a semaphore S,
which is initialized to 1. The general structure of the code segment for
process, say Pi, is as follows.
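A sketch of that structure, with the shared semaphore S initialized to 1, is:

do {
    wait(S);             /* entry section */
    /* critical section */
    signal(S);           /* exit section */
    /* remainder section */
} while (true);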

Note that all the solutions presented so far for the critical-section
problem, including the solution using semaphore, require busy
waiting. It means if a process is executing in its critical section, all
other processes that attempt to enter their critical sections must loop
continuously in the entry section. Executing a loop continuously
wastes CPU cycles, and is considered a major problem in
multiprogramming systems with one processor.
To overcome the busy waiting problem, the definition of
semaphore is modified to hold an integer value and a list of
processes, and the wait and signal operations are also modified. In
the modified wait operation, when a process finds that the value of
the semaphore is negative, it blocks itself instead of busy waiting.
Blocking a process means it is inserted in the queue associated with
the semaphore and the state of the process is switched to the waiting
state. The signal operation is modified to remove a process, if any,
from the queue associated with the semaphore and restart it. The
modified definition of semaphore, the wait operation, and the signal
operation is as follows.
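A sketch of the modified semaphore in C-like pseudocode (the manipulation of the waiting list is indicated only as comments) is:

typedef struct {
    int value;
    struct process *list;      /* queue of processes waiting on the semaphore */
} semaphore;

wait(semaphore *S) {
    S->value--;
    if (S->value < 0) {
        /* add this process to S->list */
        block();               /* suspend the invoking process */
    }
}

signal(semaphore *S) {
    S->value++;
    if (S->value <= 0) {
        /* remove a process P from S->list */
        wakeup(P);             /* resume the suspended process P */
    }
}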
Note: The block() operation and wakeup() operation are provided by
the operating system as basic system calls.
An important requirement is that both the wait and signal
operations must be treated as atomic instructions. It means no two
processes can execute wait and signal operations on the same
semaphore at the same time. We can view this as a critical-section
problem, where the critical section consists of wait and signal
operations. This problem can be solved by employing any of the
solutions presented earlier.
In this way though, we have not completely eliminated the busy
waiting but limited the busy waiting to only the critical sections
consisting of wait and signal operations. Since these two operations
are very short, busy waiting occurs rarely and for a very short time
only.
The semaphore presented above is known as counting
semaphore or general semaphore, since its integer value can range
over an unrestricted domain. Another type of semaphore is binary
semaphore whose integer value can range only between 0 and 1.
Binary semaphore is simpler to implement than general semaphore.
The wait and signal operations for a binary semaphore S, initialized
to 1, are as follows.
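A sketch of the binary-semaphore operations, where the value of S is restricted to 0 and 1, is:

wait(S) {
    while (S == 0)
        ;        /* busy wait */
    S = 0;
}

signal(S) {
    S = 1;
}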

5.8 CLASSICAL PROBLEMS OF SYNCHRONIZATION
In the previous section, the use of a semaphore to develop a solution
to the critical-section problem was presented. Semaphores can also be
used to solve various synchronization problems. In this section, we
present some classical problems of synchronization and use
semaphores for synchronization in the solutions of these problems.
Note that these synchronization problems are used for testing almost
every newly proposed synchronization scheme.

5.8.1 Producer-Consumer Problem


The producer-consumer problem was introduced in Chapter 2 where
a solution to this problem was presented using shared memory.
Recall from the chapter that our solution allowed at most size-1 items
to be in the buffer at the same time. One possible solution to
eliminate this inadequacy is to have an integer variable count,
initialized with 0, to keep track of the number of items in the buffer.
The producer process increments the count every time it adds a new
item to the buffer, and the consumer process decrements the count
every time it removes an item from the buffer. The modified code for
the producer and consumer processes is as follows.
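A sketch of the modified shared-memory code follows; the names buffer, in, out, produce_item() and consume_item() are assumed here for illustration, along the lines of the shared-memory solution of Chapter 2.

/* producer process */
while (true) {
    item = produce_item();          /* produce the next item */
    while (count == size)
        ;                           /* buffer is full, wait */
    buffer[in] = item;              /* add the item to the buffer */
    in = (in + 1) % size;
    count++;                        /* one more item in the buffer */
}

/* consumer process */
while (true) {
    while (count == 0)
        ;                           /* buffer is empty, wait */
    item = buffer[out];             /* remove an item from the buffer */
    out = (out + 1) % size;
    count--;                        /* one less item in the buffer */
    consume_item(item);
}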
The producer process first determines whether the value of count
is equal to size. If it is, the producer waits, since the buffer is full.
Otherwise, it adds an item to the buffer and increments the count.
Similarly, the consumer process first determines whether the value of
count is 0. If it is, the consumer waits, since the buffer is empty.
Otherwise, it removes an item from the buffer and decrements the
count.

Though both the producer and consumer processes are correct
separately, concurrent execution of these processes may lead to
the race condition. To understand this, suppose the statement
count++ is internally implemented as follows.
register1 = count
register1 = register1 + 1
count = register1

Here, register1 is a local CPU register. In this implementation, the
value of count is first read into a local CPU register. Then, the value in
the register is incremented by one, which is finally assigned back to
the variable count. Similarly, suppose the statement count-- is
internally implemented as follows.
register2 = count
register2 = register2 – 1
count = register2

Further, suppose the value of count is currently 2, and the
producer process reads this value in register1 and then increments
the value in register1. The value in register1 becomes 3. However,
before the producer process assigns back the incremented value to
count, the scheduler decides to temporarily suspend it and start
running the consumer process. The consumer process reads the
value of count (which is still 2) in register2 and then decrements it.
The value in register2 becomes 1.
Now, the order in which count is updated by the producer and
consumer processes decides the final value of count. If the producer
updates count first and the consumer updates it second, its value
becomes 1. On the other hand, if the consumer updates it first and the
producer second, its value becomes 3. However, the only correct value
of count is 2, which now cannot be produced. The incorrect result is
generated because access to the variable count is unconstrained and
both processes manipulate it concurrently.
To avoid the possibility of occurrence of race condition, we
present a solution to the producer-consumer problem with bounded
buffer using semaphores. This solution not only avoids the race
condition but also allows up to size items to be in the buffer at the
same time, thus eliminating the inadequacies of the solutions using
shared memory. The following three semaphores are used in this solution.
• The mutex semaphore, initialized with 1, is used to provide the
producer and consumer processes the mutually exclusive access
to the buffer. This semaphore ensures that only one process,
either producer or consumer, is accessing the buffer and the
associated variables at a time.
• The full semaphore, initialized with 0, is used to count the
number of filled slots in the buffer. This semaphore ensures that the
consumer stops (waits) when the buffer is empty.
• The empty semaphore, initialized with the value of size, is used to
count the number of empty slots in the buffer. This semaphore ensures
that the producer stops (waits) when the buffer is full.
The general structure of the code segment for producer process
and consumer process is as follows.
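A sketch using the mutex, full and empty semaphores described above (produce_item() and consume_item() are assumed helper routines) is:

/* producer process */
do {
    item = produce_item();
    wait(empty);            /* wait for an empty slot */
    wait(mutex);            /* get exclusive access to the buffer */
    /* add the item to the buffer */
    signal(mutex);
    signal(full);           /* one more filled slot */
} while (true);

/* consumer process */
do {
    wait(full);             /* wait for a filled slot */
    wait(mutex);            /* get exclusive access to the buffer */
    /* remove an item from the buffer */
    signal(mutex);
    signal(empty);          /* one more empty slot */
    consume_item(item);
} while (true);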
5.8.2 Readers-Writers Problem
Concurrently executing processes that are sharing a data object,
such as a file or a variable, can be categorized into two groups:
readers and writers. The processes in the readers group want only to
read the contents of the shared object, whereas the processes in
writers group want to update (read and write) the value of shared
object. There is no problem if multiple readers access the shared
object simultaneously, however, if a writer and some other process
(either a reader or a writer) access the shared object simultaneously,
data may become inconsistent.
To ensure that such a problem does not arise, we must guarantee
that when a writer is accessing the shared object, no reader or writer
accesses that shared object. This synchronization problem is termed
as readers-writers problem, and it has many variations. The first
readers-writers problem (the simplest one) requires the following.
• All readers and writers should wait if a writer is accessing the
shared object. It means writers should get mutually exclusive
access to the shared object.
• No reader should be kept waiting unless a writer is currently
accessing the shared object. It means that if a reader is currently
reading the shared object and both a writer and another reader make
requests, the writer should wait, but the new reader should not wait
merely because a writer is waiting.
To develop the solution to the first readers-writers problem, the
readers are allowed to share two semaphores read and write, both
initialized with 1, and an integer variable count, initialized with 0. The
writers share the semaphore write with the readers. The functions of
read and write semaphores and count variable are as follows.

• The count variable is used to count the number of readers
currently reading the shared object. Each time a reader enters or
exits the critical section, count is updated.
• The read semaphore is used to provide mutual-exclusion to
readers when count is being updated.
• The write semaphore is used to provide mutual-exclusion to
writers. It is accessed by all the writers and only the first or last
reader that enters or exits its critical section, respectively.
The general structure of the code segment for a reader process
and a writer process is as follows.
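A sketch of the first readers-writers solution, using the read and write semaphores and the count variable described above, is:

/* reader process */
do {
    wait(read);             /* get exclusive access to count */
    count++;
    if (count == 1)
        wait(write);        /* first reader locks out the writers */
    signal(read);
    /* read the shared object */
    wait(read);
    count--;
    if (count == 0)
        signal(write);      /* last reader lets the writers in */
    signal(read);
} while (true);

/* writer process */
do {
    wait(write);            /* exclusive access to the shared object */
    /* write the shared object */
    signal(write);
} while (true);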
5.8.3 Dining-Philosophers Problem
To understand the dining-philosophers problem, consider five
philosophers sitting around a circular table. There is a bowl of rice in
the center of the table and five chopsticks—one in between each pair
of philosophers (see Figure 5.2).
Fig. 5.2 A Situation in Dining Philosophers Problem

Initially, all the philosophers are in the thinking stage and while
thinking they do not interact with each other. As time goes on,
philosophers might feel hungry. When a philosopher feels hungry, he
attempts to pick up the two chopsticks closest to him (that are in
between him and his left and his right philosophers). If the
philosophers on his left and right are not eating, he successfully gets
the two chopsticks. With the two chopsticks in his hand, he starts
eating. After eating is finished, he puts the chopsticks back on the
table and starts thinking again. On the other hand, if the philosopher
on his left or right is already eating, then he is unable to successfully
grab the two chopsticks at the same time, and thus, must wait. Note
that this situation is similar to the one that occurs in the system to
allocate resources among several processes. Each process should
get required resources to finish its task without being deadlocked and
starved.
A solution to this problem is to represent each chopstick as a
semaphore, and philosophers must grab or release chopsticks by
executing wait operation or signal operation, respectively, on the
appropriate semaphores. We use an array chopstick of size 5 where
each element is initialized to 1. The general structure of the code
segment for philosopher i is as follows.
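A sketch of this segment, taking chopstick[i] and chopstick[(i+1) % 5] as the two chopsticks nearest to philosopher i, is:

do {
    wait(chopstick[i]);                /* pick up one chopstick */
    wait(chopstick[(i + 1) % 5]);      /* pick up the other chopstick */
    /* eat */
    signal(chopstick[i]);              /* put down the chopsticks */
    signal(chopstick[(i + 1) % 5]);
    /* think */
} while (true);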

This solution is simple and ensures that no two neighbors are
eating at the same time. However, the solution is not free from
deadlock. Suppose all the philosophers attempt to grab the
chopsticks simultaneously and grab one chopstick successfully. Now,
all the elements of chopstick will be 0. Thus, when each philosopher
attempts to grab the second chopstick, he will go in waiting state
forever.
A simple solution to avoid this deadlock is to ensure that a
philosopher picks up either both chopsticks or no chopstick at all. It
means he must pick chopsticks in a critical section. A deadlock-free
solution to dining-philosophers problem is presented in Section 5.9
with the use of monitors.

5.8.4 Sleeping Barber Problem


To understand this problem, consider a barber shop that has one
barber, a barber chair, and n more chairs for waiting customers. If the
barber is busy cutting a customer’s hair, the arriving customers sit
on the available chairs and wait for their turn. If all the chairs are
occupied, the customers leave without a haircut. If there are no
customers to be served, the barber sits on his chair and falls asleep.
When a customer arrives, he wakes up the sleeping barber to get a
haircut. The problem is to synchronize the barber and the customers
in such a way that the race conditions do not occur.
The solution to sleeping barber problem using semaphores
requires the barber and the customers to share the following.
• An integer variable waiting (initialized with 0) used to keep count
of the waiting customers.
• The barber semaphore, which is a binary semaphore (initialized
with 0) used to check whether the barber is idle.
• The customers semaphore, which is a counting semaphore
(initialized with 0) used to count the number of waiting customers.
• The mutex semaphore, which is a binary semaphore (initialized
with 1) used to provide mutually exclusive access to the waiting
variable.
In addition, a global variable no_of_chairs is used to define the
total number of chairs available in the barber shop. Notice that the
variable waiting serves the same purpose as that of the customers
semaphore. The reason we have used a separate variable is that the
value of a semaphore cannot be accessed.
The general structure of the code segment for barber and the
customers is as follows.
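A sketch using the waiting variable and the barber, customers and mutex semaphores described above is given below; cut_hair() and get_haircut() are assumed placeholder routines.

/* barber process */
do {
    wait(customers);        /* sleep if there are no waiting customers */
    wait(mutex);            /* get access to waiting */
    waiting = waiting - 1;  /* one waiting customer is about to be served */
    signal(barber);         /* the barber is ready to cut hair */
    signal(mutex);
    cut_hair();
} while (true);

/* customer process */
wait(mutex);                        /* get access to waiting */
if (waiting < no_of_chairs) {
    waiting = waiting + 1;          /* occupy a chair */
    signal(customers);              /* wake up the barber if he is asleep */
    signal(mutex);
    wait(barber);                   /* wait if the barber is busy */
    get_haircut();
} else {
    signal(mutex);                  /* shop is full, leave without a haircut */
}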
Initially, when the barber process executes, it finds the number of
waiting customers to be 0 and thus sleeps. He continues to sleep until
awakened by the first customer. When a customer arrives, he first
needs to check whether any seats are available for waiting. For this,
he gets access to the waiting variable by executing the wait operation
on the mutex semaphore and checks whether the value of waiting is
less than the total number of chairs. If not, the customer releases
access to the waiting variable by executing the signal operation on the
mutex semaphore and leaves the barbershop without a haircut.
However, if seats are available, the customer increments the
value of waiting and then executes signal operation on the customers
semaphore thereby waking up the sleeping barber. At this point, both
barber and the customer are awake. With this, the customer releases
the waiting variable by executing signal operation on the mutex
semaphore. As it happens, the barber acquires access to waiting
variable by executing wait operation on mutex semaphore. He then
decrements the number of waiting customers and gets ready for his
job by executing signal operation on barber semaphore and
releasing access to waiting variable. After the haircut has been done,
the customer leaves the shop and the barber continues with the next
customer, if any; otherwise, he sleeps.

5.9 MONITORS
A monitor is a programming language construct which is also used to
provide mutually exclusive access to critical sections. The
programmer defines monitor type which consists of declaration of
shared data (or variables), procedures or functions that access these
variables, and initialization code. The general syntax of declaring a
monitor type is as follows.
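A sketch of the usual form of such a declaration follows; monitor_name and the procedure names P1, P2, ..., Pn stand for arbitrary identifiers.

monitor monitor_name
{
    /* declarations of shared variables */

    procedure P1(. . .) {
        . . .
    }
    procedure P2(. . .) {
        . . .
    }
    . . .
    procedure Pn(. . .) {
        . . .
    }
    initialization_code(. . .) {
        . . .
    }
}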
The variables defined inside a monitor can only be accessed by
the functions defined within the monitor, and no process is allowed to
directly access these variables. Thus, processes can access these
variables only through the execution of the functions defined inside
the monitor. Further, the monitor construct ensures that only one
process may be executing within the monitor at a time. If a process is
executing within the monitor, then other requesting processes are
blocked and placed on an entry queue.
Though the monitor construct ensures mutual exclusion for
processes, sometimes the programmer may find it insufficient to
represent some synchronization schemes. For such situations, the
programmer needs to define his own synchronization mechanisms.
He can do so by defining variables of condition type on which only
two operations can be invoked: wait and signal.
Suppose, programmer defines a variable C of condition type, then
execution of the operation C.wait() by a process, say Pi, suspends
the execution of Pi, and places it in a queue associated with the
condition variable C. On the other hand, the execution of the
operation C.signal() by a process, say Pi, resumes the execution of
exactly one suspended process Pj, if any. It means that the execution
of the signal operation by Pi allows other suspended process Pj to
execute within the monitor. However, only one process is allowed to
execute within the monitor at one time. Thus, monitor construct
prevents Pj from resuming as long as Pi is executing in the monitor.
The following possibilities exist to handle this situation.
• The process Pi must be suspended to allow Pj to resume and wait
until Pj leaves the monitor.
• The process Pj must remain suspended until Pi leaves the
monitor.
• The process Pi must execute the signal operation as its last
statement in the monitor so that Pj can resume immediately.

Dining-Philosophers Problem Using Monitors


Now, we are in a position to use the monitor construct to develop a
deadlock-free solution to the dining-philosophers problem. The
following monitor controls the distribution of chopsticks to the philosophers.
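A sketch of such a monitor, using the operation names mentioned in the text, is given below; the state array and the array self of condition variables are assumptions of this sketch.

monitor DiningPhilosophers
{
    enum {THINKING, HUNGRY, EATING} state[5];
    condition self[5];                 /* one condition variable per philosopher */

    void getChopsticks(int i) {
        state[i] = HUNGRY;
        verifyAndAllow(i);             /* try to start eating */
        if (state[i] != EATING)
            self[i].wait();            /* a neighbour is eating: wait */
    }

    void putDownChopsticks(int i) {
        state[i] = THINKING;
        verifyAndAllow((i + 4) % 5);   /* left neighbour */
        verifyAndAllow((i + 1) % 5);   /* right neighbour */
    }

    void verifyAndAllow(int i) {
        if (state[i] == HUNGRY &&
            state[(i + 4) % 5] != EATING &&
            state[(i + 1) % 5] != EATING) {
            state[i] = EATING;
            self[i].signal();          /* allow philosopher i to eat */
        }
    }

    initialization_code() {
        for (int i = 0; i < 5; i++)
            state[i] = THINKING;
    }
}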
Each philosopher that feels hungry must invoke the
getChopsticks() operation before he starts eating, and after eating is
finished, he must invoke the putDownChopsticks() operation and may
then start thinking. Thus, the general structure of the code segment
for philosopher i is as follows.
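A sketch of the philosopher's loop is:

do {
    /* think */
    DiningPhilosophers.getChopsticks(i);
    /* eat */
    DiningPhilosophers.putDownChopsticks(i);
} while (true);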

The getChopsticks() operation changes the state of the philosopher
process from thinking to hungry and then verifies whether the philosopher
on his left or right is in eating state. If either philosopher is in eating
state, then the philosopher process is suspended and its state
remains hungry. Otherwise, the state of philosopher process is
changed to eating.
After eating is finished, each philosopher invokes the
putDownChopsticks() operation before he starts thinking. This operation
changes the state of the philosopher process to thinking and then invokes
the verifyAndAllow() operation for the philosophers on his left and right
side (one by one). The verifyAndAllow() operation verifies whether the
philosopher feels hungry, and if so then allows him to eat in case
philosophers on his left and right side are not eating.

Producer-consumer Problem Using Monitor


The solution to the bounded-buffer producer-consumer problem using
a monitor defines a monitor ProducerConsumer which has two condition
variables full and empty to indicate the status of the buffer, an integer
count to indicate the current number of items in the buffer, and two
procedures put_item and remove_item to insert and remove items
from the buffer, respectively.
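A sketch of such a monitor follows; size is taken as the capacity of the buffer, item stands for the type of a buffer element, and the buffer manipulation itself is indicated only as comments.

monitor ProducerConsumer
{
    condition full, empty;
    int count = 0;

    void put_item(item x) {
        if (count == size)
            full.wait();        /* buffer full: wait for a free slot */
        /* insert x into the buffer */
        count++;
        if (count == 1)
            empty.signal();     /* buffer is no longer empty */
    }

    item remove_item() {
        item x;
        if (count == 0)
            empty.wait();       /* buffer empty: wait for an item */
        /* remove x from the buffer */
        count--;
        if (count == size - 1)
            full.signal();      /* buffer is no longer full */
        return x;
    }
}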
5.10 MESSAGE PASSING
Though semaphore and monitor provide process synchronization,
they can be used only on systems with a single primary memory. In a
distributed environment, where we have multiple systems connected
via network and each system has its own private memory, some other
mechanism is needed to support synchronization of processes.
Message passing is one such mechanism.
Recall from Chapter 2, in message passing the processes
communicate by sending and receiving messages to and from each
other. For this purpose, two system calls send() and receive() are
used. The syntax of these system calls is as follows.
send(destination, &message);
receive(source, &message);

Design Issues for Message Passing Systems


Message passing is not limited to distributed systems; the
processes on a single system can also send and receive messages to
and from each other. However, in both cases certain design issues
arise that do not occur in the case of semaphores and monitors.
When communicating processes reside on the same computer, the
most important issue is the performance. Usually, the time taken in
passing a message from one process to another is much more than
that in performing an operation on semaphore or entering a monitor.
Several ways have been suggested to improve the performance of
message passing. One such way is to limit the size of message to
machine’s registers capacity so that message passing can be
performed faster with the help of registers.
When communicating processes reside on different computers
connected via a network, some other issues need to be dealt with. One
of these is how a sender can know whether the receiver has received
the message or the message has been lost in the network. This can be
handled by making the receiver send back an
acknowledgement message to the sender as soon as it receives a
message. If a sender receives the acknowledgement message within
a predefined time interval, it realizes that the message has been
received successfully; otherwise, it retransmits the message.
Now, consider a situation where receiver receives the message
successfully but the acknowledgement message is lost in the
network. In such a case, the sender after waiting for a certain time
interval retransmits the message and the receiver gets the same
message again. To avoid duplicate messages, the receiver must be
capable of distinguishing between new messages and retransmitted
messages. This can be achieved by adding a sequence number in
each new message. If a receiver gets a message with the same
sequence number as that of previous one, it simply discards that
message.
Another important issue in message passing systems is
authentication; how can a sender know whether it is communicating
with the actual receiver or some imposter. In addition, the processes
should be named in such a way that there is no ambiguity in send()
and receive() system calls.

Producer-consumer Problem Using Message Passing


Here, we will present a solution to the bounded buffer producer-
consumer problem with message passing. As an analogy to N slots in
the shared buffer, N messages have been used in this solution.
Initially, the consumer process sends all N empty messages to the
producer process. When producer process produces an item, it takes
an empty message, puts the item into it, and sends the filled
message to the consumer process. If the consumer process is not in
a state of receiving message, the operating system keeps this
message in a buffer until it is received by the consumer process.
When consumer process receives the message, it extracts the item,
sends back the empty message to the producer process, and
consumes the item. This way the producer and consumer processes
communicate with each other.
If the producer process produces items at a faster speed than the
consumer process consumes them, at some time, all messages will
be filled and the producer process will have to wait for an empty
message from the consumer process. On the other hand, if consumer
process’s speed is faster, it will have to wait for the producer process
to fill up some empty message. The code construct of bounded buffer
producer-consumer problem with message passing is given next.
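A sketch using the send() and receive() calls given above follows; produce_item(), consume_item(), build_message() and extract_item() are assumed helper routines, and the value of N is chosen only for illustration.

#define N 100                          /* number of message slots (assumed value) */

/* producer process */
while (true) {
    item = produce_item();
    receive(consumer, &m);             /* wait for an empty message */
    build_message(&m, item);           /* put the item into the message */
    send(consumer, &m);                /* send the filled message */
}

/* consumer process */
for (i = 0; i < N; i++)
    send(producer, &m);                /* start by sending N empty messages */
while (true) {
    receive(producer, &m);             /* get a filled message */
    item = extract_item(&m);           /* take the item out of it */
    send(producer, &m);                /* return the empty message */
    consume_item(item);
}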
Note: Since the total number of messages always remains constant
in the system, the amount of memory sufficient to hold N messages
can be kept reserved in advance.
LET US SUMMARIZE
1. Based on the nature of interactions among cooperating processes, two
kinds of synchronization have been identified: control synchronization and
data access synchronization.
2. Control synchronization is needed when cooperating processes need to
coordinate their execution with respect to one another. It is implemented
with the help of precedence graph of cooperating processes.
3. Data access synchronization is needed when cooperating processes
access shared data. The use of shared data by cooperating processes
may lead to unexpected results because of race conditions.
4. Concurrency which implies simultaneous execution of multiple processes
is central to the design of any operating system whether it is a single-
processor multiprogramming system, a multiprocessor or a distributed
system.
5. A situation where several processes sharing some data execute
concurrently and the result of the execution depends on the order in which
the shared data is accessed by the processes is called race condition.
6. To avoid race conditions, some form of synchronization among the
processes is required which ensures that only one process is
manipulating the shared data at a time. In other words, we need to ensure
mutual exclusion.
7. Formally, a precedence graph is defined as a directed graph G = (N, E)
where N is a set of nodes representing processes or statements of a
program and E is a set of directed edges, where an edge from a node
(say, representing a process P1) to another node (say representing a
process P2) implies that the process P1 must be completed before the
process P2 can begin its execution.
8. Given a precedence graph, we can determine the number of processors
required to execute it in an efficient manner as well as the order in which
the activities (or processes) should be executed.
9. The portion of code of a process in which it accesses or changes the
shared data is known as its critical region (also called critical section).
10. The critical-section problem is to design a protocol that the processes can
use to cooperate. A solution to critical-section problem must meet the
mutual exclusion, progress, and bounded-waiting requirements.
11. One way to achieve a solution to the critical-section problem is to leave the
responsibility of coordination on the processes themselves that are to be
executed concurrently, without any support from the operating system or
any programming language constructs. Such solutions are referred to as
software approaches.
12. The software solutions to critical-section problem include strict alternation
(an attempt), Dekker’s algorithm (the first known correct solution),
Peterson’s algorithm (simpler than Dekker’s algorithm) and Bakery
algorithm (multiple-process solution).
13. The hardware-supported solutions developed for the critical-section
problem make use of hardware instructions available on many systems
and thus, are effective and efficient. These solutions include disabling
interrupts, TestAndSet() instruction, Swap() instruction and modified
TestAndSet() instruction.
14. In 1965, Dijkstra suggested using an abstract data type called a
semaphore for controlling synchronization. A semaphore S is an integer
variable which is used to provide a general-purpose solution to critical-
section problem.
15. Two standard atomic operations are defined on S, namely, wait and
signal, and after initialization, S is accessed only through these two
operations.
16. The semaphore whose integer value can range over an unrestricted
domain is known as counting semaphore or general semaphore. Another
type of semaphore whose integer value can range only between 0 and 1
is known as binary semaphore.
17. Semaphore can also be used to solve various synchronization problems.
Some classical problems of synchronization include producer-consumer
problem, readers-writers problem, dining-philosophers problem, and
sleeping barber problem.
18. A monitor is a programming language construct which is also used to
provide mutually exclusive access to critical sections. The programmer
defines monitor type which consists of declaration of shared data (or
variables), procedures or functions that access these variables, and
initialization code.
19. Monitor construct ensures that only one process may be executing within
the monitor at a time.
20. Programmer can define his own synchronization mechanisms by defining
variables of condition type on which only two operations can be invoked:
wait and signal.
21. In a distributed environment where we have multiple processors
connected via network and each processor has its own private memory,
another communication mechanism called message passing is used.

EXERCISES
Fill in the Blanks
1. The portion of the code of a process in which it accesses or changes the
shared data is known as _____________.
2. Unordered execution of cooperating processes may result in
_____________.
3. The two standard atomic operations defined on monitor are
_____________ and _____________.
4. The semaphore whose integer value can range over an unrestricted
domain is known as _____________.
5. Two types of synchronization that may be needed among the cooperating
processes include_____________ and _____________.

Multiple Choice Questions


1. Which of the following requirements must be met by a solution to critical-
section problem?
(a) Bounded waiting
(b) Progress
(c) Mutual exclusion
(d) All of these
2. In dining-philosophers problem, how many philosophers are there around
a table?
(a) Two
(b) Seven
(c) Five
(d) Six
3. Which of the following is considered an important issue in message
passing system?
(a) Server performance
(b) Authentication
(c) Routing
(d) All of these
4. Which abstract data type was introduced in 1965 by Dijkstra?
(a) Semaphore
(b) Monitor
(c) Scheduler
(d) None of these
5. Which of the following is used for implementing control synchronization?
(a) Semaphores
(b) Precedence graph
(c) Monitors
(d) Peterson’s algorithm

State True or False


1. From a precedence graph, we can determine the number of processors
required to execute it in an efficient manner.
2. On a system with a single-processor, only one process executes at a time.
3. Executing the Swap instruction as an atomic action is not a requirement.
4. The problem of busy waiting is eliminated completely with the use of
semaphores.
5. Monitors can be used for process synchronization in distributed
environment.

Descriptive Questions
1. Explain with an example why some form of synchronization among the
processes is required.
2. Define the critical-section problem. Also explain all the requirements that a
solution to the critical-section problem must meet.
3. Describe bakery algorithm to solve critical-section problem.
4. Give TestAndSet instruction. Also give the algorithm that uses TestAndSet
instruction to solve the critical-section problem and meets all the
requirements of the solution for the critical-section problem.
5. Write short notes on the following.
(a) Semaphore
(b) Swap instruction
(c) Entry and exit section
(d) Critical section
6. What is busy waiting? How is a semaphore used to overcome the busy-
waiting problem?
7. Explain the use of semaphore in developing a solution to bounded buffer
producer-consumer problem.
8. Describe the dining-philosophers problem. Give a solution to the dining-
philosopher problem with the use of monitors.
9. Can semaphores and monitors be used in distributed systems? Why or
why not?
10. Explain the bounded buffer producer-consumer problem and provide a
solution to this using message passing.
11. Discuss the Dekker’s algorithm and prove that this algorithm satisfies all
the three requirements for solution to critical-section problem.
12. Describe the use of precedence graph. What information does it provide?
chapter 6

Deadlock

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ Define system model.
⟡ Discuss the features that characterize the deadlock.
⟡ Discuss different methods of handling deadlock.
⟡ Explain how a deadlock can be prevented by eliminating one of the
four conditions of a deadlock.
⟡ Understand the concept of safe and unsafe state.
⟡ Explain various deadlock avoidance algorithms.
⟡ Discuss different deadlock detection methodologies.
⟡ List the ways to recover from a deadlock.

6.1 INTRODUCTION
Deadlock occurs when every process in a set of processes is in a
simultaneous wait state and each of them is waiting for the release of
a resource held exclusively by one of the waiting processes in the
set. None of the processes can proceed until at least one of the
waiting processes releases the acquired resource. Deadlocks may
occur on a single system or across several machines. This chapter
discusses the different ways in which these deadlocks can be
handled.
6.2 SYSTEM MODEL
A system consists of various types of resources like input/output
devices, memory space, processors, disks, etc. For some resource
types, several instances may be available. For example, a system
may have two printers. When several instances of a resource type
are available, any one of them can be used to satisfy the request for
that resource type.
A process may need multiple resource types to accomplish its
task. However, to use any resource type, it must follow some steps
which are discussed next.
1. Request for the required resource.
2. Use the allocated resource.
3. Release the resource after completing the task.
If the requested resource is not available, the requesting process
enters a waiting state until it acquires the resource. Consider a
system with a printer and a disk drive and two processes P1 and P2
are executing simultaneously on this system. During execution, the
process P1 requests for the printer and process P2 requests for the
disk drive and both the requests are granted. Further, the process P2
requests for the printer held by process P1 and process P1 requests for
the disk drive held by the process P2. Here, both processes will enter a
waiting state. Since each process is waiting for the release of
resource held by other, they will remain in waiting state forever. This
situation is called deadlock.
Note: When two processes are inadvertently waiting for the
resources held by each other, this situation is referred to as a deadly
embrace.

6.3 DEADLOCK CHARACTERIZATION


Before discussing the methods to handle a deadlock, we will discuss
the conditions that cause a deadlock and how a deadlock can be
depicted using resource allocation graph.

6.3.1 Deadlock Conditions


A deadlock occurs when all the following four conditions are satisfied
at any given point of time.
1. Mutual exclusion: Only one process can acquire a given resource at any
point of time. Any other process requesting for that resource has to wait
for earlier process to release it.
2. Hold and wait: A process is holding a resource allocated to it, and waiting
to acquire another resource held by some other process.
3. No preemption: Resource allocated to a process cannot be forcibly
revoked by the system, it can only be released voluntarily by the process
holding it.
4. Circular wait: A set of processes waiting for allocation of resources held
by other processes forms a circular chain in which each process is waiting
for the resource held by its successor process in chain.
In the absence of any one of these conditions, deadlock will not
occur. We will discuss these conditions in detail in subsequent
sections and see how they can be prevented.

6.3.2 Resource Allocation Graph


A deadlock can be depicted with the help of a directed graph known
as resource allocation graph. The graph consists of two different
types of nodes, namely, processes and resources. The processes are
depicted as circles and resources as squares. A directed arc from a
process to a resource (known as request edge) indicates that the
process has requested for the resource and is waiting for it to be
allocated. In contrast, a directed arc from a resource to a process
(known as assignment edge) indicates that the resource has been
allocated to the process. For example, consider the resource
allocation graph shown in Figure 6.1. Here, the process P1 is holding
resource R2 and requesting for the resource R1, which in turn is held
by the process P2. The process P2 is requesting for the resource R2
held by the process P1. That means there is a deadlock.
Fig. 6.1 Resource Allocation Graph

It can be observed that this graph forms a cycle (P1->R1->P2->R2->P1).
A cycle in the resource allocation graph indicates that there is
deadlock and the processes forming the part of the cycle are
deadlocked. If there is no cycle in a graph, there is no deadlock. In
this example, there is only one instance of each resource type.
However, there can exist multiple instances of a resource type. The
resource allocation graph for two instances (R21 and R22) of resource
type R2 is shown in Figure 6.2.

Fig. 6.2 Resource Allocation Graph for Multiple Instances of a Resource Type
This resource allocation graph has the following indications.
1. Process P1 is waiting for the allocation of resource R1 held by the process
P2.
2. Process P2 is waiting for the allocation of instance (R22) of resource type
R2.
3. Process P1 is holding an instance (R21) of resource type R2.
It can be observed that the graph forms a cycle but still processes
are not deadlocked. The process P2 can acquire the second instance
(R22) of the resource type R2 and complete its execution. After
completing the execution, it can release the resource R1 that can be
used by the process P1. Since, no process is in waiting state, there is
no deadlock.
From this discussion, it is clear that if each resource type has
exactly one instance, cycle in resource allocation graph indicates a
deadlock. If each resource type has several instances, cycle in
resource allocation graph does not necessarily imply a deadlock.
Thus, it can be concluded that if a graph contains no cycle, the set of
processes are not deadlocked; however, if there is a cycle then
deadlock may exist.

6.4 METHODS FOR HANDLING DEADLOCKS


A deadlock can be handled in four different ways which are as
follows:
• Prevent the deadlock from occurring.
• Adopt methods for avoiding the deadlock.
• Allow the deadlock to occur, detect it and recover from it.
• Ignore the deadlock.
Deadlock prevention or deadlock avoidance techniques can be
used to ensure that deadlocks never occur in a system. If any of
these two techniques is not used, a deadlock may occur. In this case,
an algorithm can be provided for detecting the deadlock and then
using the algorithm to recover the system from the deadlock.
One or the other method must be provided to either prevent the
deadlock from occurrence or detect the deadlock and taking an
appropriate action if a deadlock occurs. However, if in a system,
deadlock occurs less frequently (say, once in two years) then it is
better to ignore the deadlocks instead of adopting expensive
techniques for deadlock prevention, deadlock avoidance, or deadlock
detection and recovery.

6.5 DEADLOCK PREVENTION


As stated earlier, a deadlock occurs when all of the four conditions
are satisfied at any point of time. The deadlock can be prevented by
not allowing all four conditions to be satisfied simultaneously, that is,
by making sure that at least one of the four conditions does not hold.
Now let us analyze all four conditions one by one and see how their
occurrence can be prevented.

Eliminating Mutual Exclusion


The mutual exclusion property does not hold for the resources that
are sharable. For example, a file which is opened in read-only mode
can be shared among various processes. Hence, processes will
never have to wait for the sharable resources. However, there are
certain resources which can never be shared; a printer, for example,
can work for only one process at a time. It cannot print data being sent
as output from more than one process simultaneously. Hence, the condition of
mutual exclusion cannot be eliminated for all the resources.

Eliminating Hold and Wait Condition


This condition can be eliminated by not allowing any process to
request for a resource until it releases the resources held by it, which
is impractical as process may require the resources simultaneously.
Another way to prevent hold and wait condition is by allocating all the
required resources to the process before starting the execution of that
process. The disadvantage associated with it is that a process may
not know in advance about the resources that will be required during
its execution. Even if it knows in advance, it may unnecessarily hold
the resources which may be required at the end of its execution.
Thus, the resources are not utilized optimally.

Eliminating No Preemption
Elimination of this condition means that resources held by a process
can be preempted. If a process requests a resource held by some
other process then instead of making it wait, all the resources
currently held by this process can be preempted. The process will be
restarted only when it is allocated the requested as well as the
preempted resources. Note that only those resources can be
preempted whose current working state can be saved and can be
later restored. For example, the resources like printer and disk drives
cannot be preempted.

Eliminating Circular-Wait Condition


The circular wait condition can be eliminated by assigning a priority
number to each available resource and allowing a process to request
resources only in increasing order of these numbers. Whenever a
process requests a resource, the priority number of the required
resource is compared with the priority numbers of the resources
already held by it. If the priority number of the requested resource is
greater than that of all the currently held resources, the request is
granted. If the priority number of the requested resource is less than
that of a currently held resource, all the resources with a greater
priority number must be released first, before the new resource can
be acquired.
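To see how such resource ordering rules out circular wait in practice, here is a minimal Python sketch; it is illustrative and not from the text, and the resource names, their numbers and the use of threading locks are assumptions made only for the example.

import threading

# Hypothetical global ordering of resources (higher number = higher priority)
RESOURCE_ORDER = {"scanner": 1, "disk": 2, "printer": 3}
locks = {name: threading.Lock() for name in RESOURCE_ORDER}

def acquire_in_order(*names):
    # Acquire the named resources strictly in increasing order of their
    # numbers, so no circular wait can form among processes using this helper.
    for name in sorted(names, key=lambda n: RESOURCE_ORDER[n]):
        locks[name].acquire()

def release_all(*names):
    # Release in the reverse order of acquisition (conventional, not required).
    for name in sorted(names, key=lambda n: RESOURCE_ORDER[n], reverse=True):
        locks[name].release()

# Usage: every process that needs the printer and the disk acquires them the
# same way, so the lower-numbered disk is always taken before the printer.
acquire_in_order("printer", "disk")
# ... use the resources ...
release_all("printer", "disk")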

6.6 DEADLOCK AVOIDANCE


A deadlock can be prevented by eliminating any one of the four
necessary conditions of the deadlock. Preventing deadlock using this
method results in the inefficient use of resources. Thus, instead of
preventing deadlock, it can be avoided by never allowing allocation of
a resource to a process if it leads to a deadlock. This can be
achieved when some additional information is available about how
the processes are going to request resources in the future. This
information can be in the form of how many resources of each type
will be requested by a process and in which order. On the basis of the
amount of information available, different algorithms can be used for
deadlock avoidance.
One of the simplest algorithms requires each process to declare
the maximum number of resources (of each type) required by it
during its course of execution. This information is used to construct
an algorithm that will prevent the system from entering a state of
deadlock. This deadlock avoidance algorithm continuously examines
the state of resource allocation, ensuring that a circular wait condition
never exists in the system. The state of resource allocation can be
either safe or unsafe.

Safe and Unsafe State


A state is said to be safe if allocation of resources to processes does
not lead to deadlock. More precisely, a system is in safe state only if
there is a safe sequence. A safe sequence is a sequence of process
execution such that each and every process executes till its
completion. For example, consider a sequence of processes (P1, P2,
P3, …, Pn) forming a safe sequence. In this sequence, first the process
P1 will be executed till its completion, and then P2 will be executed till
its completion, and so on. The resources required by any process can
be allocated either from the available resources or from the resources
held by previously executed processes. When a process completes its
execution, it releases all the resources held by it, which can then be
utilized by the next process in the sequence. That is, the request for
resources by a process Pn can be satisfied either from the available
resources or from the resources held by a process Pm, where m<n.
Since this sequence of process execution is safe, a system following
this sequence is in the safe state. If no
such sequence of process execution exists then the state of the
system is said to be unsafe. Figure 6.3 depicts the relationship
between safe state, unsafe state and deadlock.
For example, consider a system in which three processes P1, P2
and P3 are executing and there are 10 instances of a resource type.
The maximum number of resources required by each process, the
number of resources already allocated and the total number of
available resources are shown in Figure 6.4.

Fig. 6.3 Relationship between Safe State, Unsafe State and Deadlock
Fig. 6.4 Safe Sequence of Execution of Processes
On the basis of the available information, it can be observed
that the resource requirement of the process P2 can easily be
satisfied. Therefore, resources are allocated to the process P2 and it
is allowed to execute till its completion. After the execution of the
process P2, all the resources held by it are released. The number of
resources now available is not enough to be allocated to the
process P1, whereas it is enough to be allocated to the process
P3. Therefore, resources are allocated to the process P3 and it is
allowed to execute till its completion. The number of resources
available after the execution of process P3 can now easily be
allocated to the process P1. Hence, the execution of the processes in
sequence P2, P3, P1 is safe (see Figure 6.4).
Now consider the sequence P2, P1, P3. In this sequence, after the
execution of process P2, the number of available resources is 6, all of
which are allocated to the process P1. Even after the allocation of all the
available resources, the process P1 is still short of one resource for its
complete execution. As a result, the process P1 enters a waiting state
and waits for process P3 to release the resource held by it, which in
turn is waiting for the remaining resources to be allocated for its
complete execution. Now the processes P1 and P3 are waiting for each
other to release the resources, leading to the deadlock. Hence, this
sequence of process execution is unsafe.
Note that a safe state is a deadlock-free state, whereas an unsafe
state may or may not result in a deadlock. That is, an unsafe state
may lead to a deadlock, but not always.

6.6.1 Resource Allocation Graph Algorithm


As discussed earlier, the resource allocation graph consists of two
types of edges: the request edge and the assignment edge. In addition
to these, another edge known as the claim edge can also be introduced
in this graph, which helps in avoiding deadlock. A claim edge from a
process to a resource indicates that the process will request that
resource in the near future. This edge is represented in the same way
as a request edge but with a dotted line. Whenever the process actually
requests the resource, the claim edge is converted to a request edge.
Also, whenever a resource is released by a process, the corresponding
assignment edge is converted back to a claim edge. The prerequisite of
this representation is that all the claim edges related to a process must
be depicted in the graph before the process starts executing. However,
a claim edge can be added at a later stage only if all the edges related
to that process are claim edges.

Fig. 6.5 Resource Allocation Graph with Claim Edges

Whenever a process requests a resource, the claim edge is
converted to a request edge only if converting the corresponding
request edge to an assignment edge does not lead to the formation of
a cycle in the graph, as a cycle in the graph indicates a deadlock. For
example, consider the resource allocation graph shown in Figure 6.5;
the claim edge from process P1 to the resource R1 cannot be
converted to a request edge, as doing so would lead to the formation
of a cycle in the graph.

6.6.2 Banker’s Algorithm


In case there are multiple instances of a resource type in a system,
deadlock cannot be avoided using the resource allocation graph
algorithm. This is because, when resources have multiple instances,
the presence of a cycle in the resource allocation graph does not
always imply a deadlock. In such situations, deadlock can be avoided
using an algorithm known as the banker's algorithm, which was devised by Dijkstra
in 1965. Here, we discuss the banker’s algorithm for the resource-
allocation system having a single resource with multiple instances
and for the resource-allocation system having multiple resources with
multiple instances of each resource.

Banker’s Algorithm for a Single Resource


The banker’s algorithm requires every process entering the system to
inform maximum number of resources (less than the total number of
available resources) that it needs during its execution. Whenever a
request from a process for allocating resources arrives, the banker’s
algorithm immediately checks whether allocating required number of
resources to the process leaves the system in safe state. If so, the
request is granted and resources are allocated; otherwise, the
process is forced to wait for other processes to release enough
resources.
To determine whether a state is safe, the algorithm checks whether
the available resources are sufficient to let some process run to
completion. If yes, that process is assumed to execute, and the
resources allocated to it are added to the available resources. The
algorithm then again checks for processes that can finish their
execution with the currently available resources. This procedure
continues, and if eventually all the processes can be executed
completely, the system state is considered to be safe.
To understand banker’s algorithm, consider a system with three
processes P1, P2, P3 and a single resource type X having 15 instances.
The maximum numbers of resources needed by P1, P2, and P3 are
eight, eight, and nine respectively. The initial resource allocation state
of the system is shown here.
The banker’s algorithm is based on the assumption that not all
processes require all resources at the same time. Thus, to start with,
each process is allocated a few resources. The resource allocation
state of the system after allocating four, five, and three resources to
P1, P2, and P3, respectively, is shown here.

This allocation state of the system is safe because, with the currently
available resources, the process P2 can first finish its execution
successfully, and after that the processes P1 and P3 can finish in any
order. Now suppose the process P3 requests two more resources. If
this request is granted, the resource allocation state of the system will
be as shown here.

This allocation state is unsafe, as no process can finish its execution
with the currently available resources. Thus, the request of process P3
is not granted and it has to wait for other processes to release some
resources.
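The following minimal Python sketch (illustrative, not part of the original text) applies the safety check just described to the single-resource example above; the function name is_safe and the list-based representation are assumptions made for the sketch.

def is_safe(available, allocated, maximum):
    allocated = list(allocated)
    need = [m - a for m, a in zip(maximum, allocated)]
    finished = [False] * len(allocated)
    while True:
        # Find a process whose remaining need fits in what is available.
        for i, done in enumerate(finished):
            if not done and need[i] <= available:
                available += allocated[i]   # it runs to completion and releases
                finished[i] = True
                break
        else:
            return all(finished)            # no candidate process is left

allocated = [4, 5, 3]                       # resources held by P1, P2, P3
maximum   = [8, 8, 9]                       # their declared maximum needs
available = 15 - sum(allocated)             # 3 instances currently free
print(is_safe(available, allocated, maximum))          # True: safe state

# Granting two more instances to P3 would leave the state unsafe:
print(is_safe(available - 2, [4, 5, 5], maximum))      # False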

Banker’s Algorithm for Multiple Resources


In case there are multiple resources with each having multiple
instances, the banker’s algorithm for a single resource with multiple
instances needs to be extended. However, the basic idea remains the
same. Before granting request from a process, the algorithm checks
whether granting the request leaves the system in safe state. If so,
the request is granted; otherwise, the process has to wait. To
implement the banker’s algorithm for multiple resources, certain data
structures are required, which help in determining whether the system
is in safe state. These data structures are as follows:
1. Available resources, A: A vector of size q stores information about the
number of resources available of each type.
2. Maximum, M: A matrix of order p×q stores information about the maximum
number of resources of each type required by each process (where p is the
number of processes). That is, M[i][j] indicates the maximum number of
resources of type j required by the process i.
3. Current allocation, C: A matrix of order p×q stores information about the
number of resources of each type allocated to each process. That is,
C[i][j] indicates the number of resources of type j currently held by the
process i.
4. Required, R: A matrix of order p×q stores information about the remaining
number of resources of each type required by each process. That is,
R[i][j] indicates the remaining number of resources of type j required by
the process i. Note that this matrix can be obtained as M–C, that is,
R[i][j] = M[i][j] - C[i][j].
The values of these data structures keep on changing during the
execution of processes. Note that the condition A≤B holds for two
vectors A and B of size p if and only if A[i]≤B[i] for all i=1, 2, 3, ...,
p. For example, if A={2, 1} and B={3, 4}, then A≤B.

Safety Algorithm
This algorithm is used for determining whether or not a system is in
safe state. To understand how this algorithm works, consider a vector
Complete of size p. Following are the steps of the algorithm.
1. Initialize Complete[i]=False for all i=1, 2, 3,..., p. Complete[i]=False
indicates that the ith process is still not completed.
2. Search for an i such that Complete[i]=False and R[i]≤A, that is, the
resources still required by this process are not more than the available
resources. If no such process exists, then go to step 4.
3. Allocate the required resources to the process and let it finish its
execution. Set Complete[i]=True for that process and add all the resources
held by it to the vector A. Go to step 2.
4. If Complete[i]=True for all i, then the system is in a safe state. Otherwise,
there exists a process for which Complete[i]=False and the resources
required by it are more than the available resources. Hence, it is in an
unending waiting state, leading to an unsafe state.
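A minimal Python sketch of these steps is given below; it is illustrative rather than a definitive implementation, and the function name is_safe_state as well as the list-of-lists representation of the matrices are assumptions. A is the available vector, C the current-allocation matrix and R the required matrix (M - C).

def is_safe_state(A, C, R):
    A = list(A)                          # work on a copy of the available vector
    p, q = len(C), len(A)
    complete = [False] * p               # the Complete vector of step 1
    progress = True
    while progress:
        progress = False
        for i in range(p):
            # Step 2: find an i with Complete[i] = False and R[i] <= A.
            if not complete[i] and all(R[i][j] <= A[j] for j in range(q)):
                # Step 3: let process i finish and return its resources to A.
                A = [A[j] + C[i][j] for j in range(q)]
                complete[i] = True
                progress = True
    # Step 4: the state is safe only if every process could complete.
    return all(complete)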

Resource-request Algorithm
Once it is confirmed that the system is in a safe state, an algorithm
called the resource-request algorithm is used for determining whether
a request by a process can be satisfied or not. To understand this
algorithm, let Req be a matrix of order p×q indicating the number of
resources of each type requested by each process at any given point
of time. That is, Req[i][j] indicates the number of resources of the
jth type requested by the ith process at any given point of time.
Following are the steps of this algorithm.
1. If Req[i][j] ≤ R[i][j], go to step 2; otherwise, an error occurs, as the
process is requesting more resources than the maximum number it declared.
2. If Req[i][j] ≤ A[j], go to step 3; otherwise, the process Pi must wait
until the required resources are available.
3. Allocate the resources tentatively and make the following changes in the
data structures: A[j] = A[j] - Req[i][j], C[i][j] = C[i][j] + Req[i][j] and
R[i][j] = R[i][j] - Req[i][j]. The resulting state is then checked with the
safety algorithm; if it is unsafe, the changes are undone and the process
must wait.
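The following Python sketch (again illustrative, not a definitive implementation) wires these steps together, reusing the is_safe_state function from the previous sketch; Req_i denotes the request vector of process i, and the tentative-allocation-and-rollback structure mirrors the description above.

def request_resources(i, Req_i, A, C, R):
    q = len(A)
    # Step 1: the request must not exceed the remaining need of process i.
    if any(Req_i[j] > R[i][j] for j in range(q)):
        raise ValueError("process requested more than its declared maximum")
    # Step 2: if the resources are not all available, the process must wait.
    if any(Req_i[j] > A[j] for j in range(q)):
        return False
    # Step 3: make the tentative allocation.
    for j in range(q):
        A[j] -= Req_i[j]
        C[i][j] += Req_i[j]
        R[i][j] -= Req_i[j]
    if is_safe_state(A, C, R):
        return True                       # the request is granted
    # Resulting state is unsafe: undo the allocation; the process must wait.
    for j in range(q):
        A[j] += Req_i[j]
        C[i][j] -= Req_i[j]
        R[i][j] += Req_i[j]
    return False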

An Example
Consider a system with three processes (P1, P2 and P3) and three
resource types (X, Y and Z). There are 10 instances of resource type
X, 5 of Y and 7 of Z. The matrix M for the maximum number of
resources required by each process, the matrix C for the number of
resources currently allocated to each process and the vector A for the
currently available resources are shown in Figure 6.6 (a), (b) and (c),
respectively. Now, the matrix R representing the number of remaining
resources required by each process can be obtained by the formula
M–C, and is shown in Figure 6.6 (d).

Fig. 6.6 Initial State of the System

It can be observed that currently the system is in a safe state and a
safe sequence of execution of processes is (P2, P3, P1). Now suppose
that the process P2 requests one more resource of each type, that is,
the request vector for process P2 is (1,1,1). First, it is checked whether
this request vector is less than or equal to its corresponding required
vector (3,2,2). Since the process has requested fewer resources than it
declared at the initial stage, it is then checked whether that many
resources of each type are available. If so, it is assumed that the
request is granted, and the changes shown in Figure 6.7 are made in
the corresponding matrices.

Fig. 6.7 State after Granting Request of P2

This new state of the system must be checked for safety. For this, the
safety algorithm is executed and it is determined that the sequence
(P2, P3, P1) is still a safe sequence. Thus, the request of process P2 is
granted immediately.
Fig. 6.8 An Example of Unsafe State

Consider another state of the system shown in Figure 6.8. Now, a
request for (1, 2, 2) from process P2 arrives. If this request is granted,
the resulting state is unsafe. This is because after the complete
execution of process P2, the resultant vector A is (5, 4, 5). Clearly, the
resource requirements of processes P1 and P3 cannot be satisfied.
Thus, even though the system has resources, the request cannot be
granted.
Example 1 Consider the following snapshot of the system:
Find out the following with the help of banker’s algorithm.
(a) How many resources does each process still need?
(b) Is the system currently in a safe state? If so, write the safe
sequence.
(c) If a request from process P2 arrives for (0 1 0 0), can it be granted
immediately?
Solution
(a) The resources each process in the system still needs can be
determined by subtracting their current allocation from the
maximum demand as shown here.
Process    Required resources
P0         0 0 0 0
P1         0 7 5 0
P2         6 6 2 2
P3         2 0 0 2
P4         0 3 2 0
(b) Using the safety algorithm, it can be determined that the system
is in safe state and the safe sequence is (P0, P3, P4, P1, P2).
(c) If the request from the process P2 for (0 1 0 0) resources is
granted, the resulting system state will be:

As the resulting system state is unsafe, the request from process P2
cannot be granted immediately.
6.7 DEADLOCK DETECTION
There is a possibility of deadlock if neither the deadlock prevention
nor deadlock avoidance method is applied in a system. In such a
situation, an algorithm must be provided for detecting the occurrence
of deadlock in a system. Once the deadlock is detected, a
methodology must be provided for the recovery of the system from
the deadlock. In this section, we will discuss how a deadlock can be
detected in case we have single or multiple instances of each
resource type.

6.7.1 Single Instance of Each Resource Type


When only a single instance of each resource type is available, a
deadlock can be detected by using a variation of the resource
allocation graph. In this variation, the nodes representing resources
and the corresponding edges are removed. This new variation of the
resource allocation graph is known as the wait-for graph, which shows
the dependency of a process on another process for resource
allocation. For example, an edge from the process Pi to Pj indicates
that the process Pi is waiting for the process Pj to release the
resources required by it. If there exist two edges Pn->Ri and Ri->Pm in
the resource allocation graph, then the corresponding edge in the
wait-for graph will be Pn->Pm, indicating that the process Pn is waiting
for the process Pm for the release of the resources. A resource
allocation graph involving six processes and five resources is shown
in Figure 6.9 (a). The corresponding wait-for graph is shown in Figure
6.9 (b).
Fig. 6.9 Converting Resource Allocation Graph to Wait-for Graph

If there exists a cycle in the wait-for graph, there is a deadlock in the
system and the processes forming the cycle are blocked in the
deadlock. In the wait-for graph [see Figure 6.9 (b)], the processes P2,
P3 and P5 form a cycle and hence are blocked in the deadlock. To take
appropriate action to recover from this situation, an algorithm needs
to be invoked periodically to detect the existence of a cycle in the
wait-for graph.
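A minimal Python sketch of such periodic cycle detection is shown below; the dictionary representation of the wait-for graph and the sample edges are assumptions (the exact graph of Figure 6.9 is not reproduced here), but the depth-first search itself is standard.

def find_cycle(wait_for):
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour = {p: WHITE for p in wait_for}
    path = []

    def dfs(p):
        colour[p] = GREY
        path.append(p)
        for q in wait_for.get(p, []):
            if colour.get(q, WHITE) == GREY:          # back edge: cycle found
                return path[path.index(q):]
            if colour.get(q, WHITE) == WHITE:
                cycle = dfs(q)
                if cycle:
                    return cycle
        colour[p] = BLACK
        path.pop()
        return None

    for p in list(wait_for):
        if colour[p] == WHITE:
            cycle = dfs(p)
            if cycle:
                return cycle              # the deadlocked processes
    return None

# Illustrative wait-for edges: P2, P3 and P5 wait on one another.
wait_for = {"P1": ["P2"], "P2": ["P3"], "P3": ["P5"], "P5": ["P2"], "P4": []}
print(find_cycle(wait_for))               # ['P2', 'P3', 'P5']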

6.7.2 Multiple Instances of a Resource Type


When multiple instances of a resource type exist, the wait-for graph
is not sufficient to detect a deadlock in the system. For such a
system, another algorithm, which uses certain data structures similar
to the ones used in the banker's algorithm, is applied. The data
structures used are as follows:
1. Available resources, A: A vector of size q stores information
about the number of available resources of each type.
2. Current allocation, C: A matrix of order p×q stores information
about the number of resources of each type allocated to each
process. That is, C[i][j] indicates the number of resources of
type j currently held by the process i.
3. Request, Req: A matrix of order p×q stores information about
the number of resources of each type currently requested by
each process. That is, Req[i][j] indicates the number of
resources of type j currently requested by the process i.

To understand the working of the deadlock detection algorithm,
consider a vector Complete of size p. Following are the steps to detect
the deadlock.
1. Initialize Complete[i]=False for all i=1, 2, 3,..., p. Complete[i]=False
indicates that the ith process is still not completed.
2. Search for an i such that Complete[i]=False and Req[i]≤A, that
is, the resources currently requested by this process are not more
than the available resources. If no such process exists, then go to step 4.
3. Assume the requested resources are allocated and let the process
finish its execution. Set A=A+C[i] and Complete[i]=True for that
process. Go to step 2.
4. If Complete[i]=False for some i, then the system is in a state
of deadlock and the ith process is deadlocked.
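A minimal Python sketch of these steps follows; it is illustrative, and differs from the earlier safety-check sketch only in that it works on the request matrix Req instead of the remaining-need matrix and reports which processes are deadlocked.

def detect_deadlock(A, C, Req):
    A = list(A)
    p, q = len(C), len(A)
    complete = [False] * p
    progress = True
    while progress:
        progress = False
        for i in range(p):
            # Step 2: process i can proceed if its current request fits in A.
            if not complete[i] and all(Req[i][j] <= A[j] for j in range(q)):
                A = [A[j] + C[i][j] for j in range(q)]    # assume i finishes
                complete[i] = True
                progress = True
    # Step 4: any process that could not complete is deadlocked.
    return [i for i in range(p) if not complete[i]]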

6.8 DEADLOCK RECOVERY


Once a deadlock has been detected in the system, some method
is needed to recover the system from the deadlock and continue with
the processing. This section discusses two different ways to break the
deadlock and let the system recover from the deadlock automatically.

6.8.1 Terminating the Processes


There are two methods that can be used for terminating the
processes to recover from the deadlock. These two methods are as
follows:
• Terminating one process at a time until the circular wait
condition is eliminated. This involves the overhead of invoking a
deadlock detection algorithm after the termination of each process to
check whether the circular wait condition has been eliminated, that is,
whether any processes are still deadlocked.
• Terminating all processes involved in the deadlock. This
method definitely ensures the recovery of the system from the
deadlock. Its disadvantage is that many of the processes may have
executed for a long time, perhaps close to their completion. As a
result, the computations performed till the time of termination are
discarded.
In both the cases, all the resources which were acquired by the
processes being terminated are returned to the system. While
terminating any process, it must be ensured that it does not leave any
part of the system in an inconsistent state. For example, a process
might be in the middle of updating a disk file and termination of such
a process may leave that file in an inconsistent state. Similarly, a
printer might be in the middle of printing some document; in this case,
when the system is recovered from the deadlock, the printer must be
reset to a correct state.
In the case of partial termination, while selecting the process to be
terminated, the choice must be such that it incurs minimum cost to the
system. The factors which can affect the selection of a process for
termination are as follows:
• Number of remaining resources required by it to complete its task
• Number of processes required to be terminated
• Number and type of resources held by the process
• Duration of time for which process has already been executed
• Priority of the process.

6.8.2 Preempting the Resources


An alternative method to recover the system from the state of deadlock is
to preempt the resources from the processes one by one and allocate
them to other processes until the circular-wait condition is eliminated.
The steps involved in the preemption of resources from a process
are as follows:
1. Select a process for preemption: The choice of resources and
processes must be such that they incur minimum cost to the system. All
the factors mentioned earlier must be considered while making this choice.
2. Roll back the process: After preempting the resources, the
corresponding process must be rolled back properly so that it does not
leave the system in an inconsistent state. Since resources are preempted
from the process, it cannot continue with its normal execution; hence it
must be brought to some safe state from where it can be restarted later. In
case no such safe state can be achieved, the process must be totally
rolled back. However, partial rollback is always preferred over total
rollback.
3. Prevent starvation: In case the selection of a process is based on the
cost factor, it is quite possible that the same process is selected repeatedly
for rollback, leading to a situation of starvation. This can be avoided
by including the number of rollbacks of a given process in the cost factor.

LET US SUMMARIZE
1. Deadlock occurs when every process in a set of processes is in a
simultaneous wait state and each of them is waiting for the release of a
resource held exclusively by one of the other waiting processes in the set.
2. A system consists of various types of resources like input/output devices,
memory space, processors, disks, etc. For some resource types, several
instances may be available. When several instances of a resource type
are available, any one of them can be used to satisfy the request for that
resource type.
3. Four necessary conditions for a deadlock are mutual exclusion, hold and
wait, no preemption and circular wait.
4. A deadlock can be depicted with the help of a directed graph known as
resource allocation graph.
5. If each resource type has exactly one instance, cycle in resource
allocation graph indicates a deadlock. If each resource type has several
instances, cycle in resource allocation graph does not necessarily imply a
deadlock.
6. Deadlock prevention or deadlock avoidance techniques can be used to
ensure that deadlocks never occur in a system.
7. A deadlock can be prevented by not allowing all four conditions to be
satisfied simultaneously, that is, by making sure that at least one of the
four conditions does not hold.
8. A deadlock can be avoided by never allowing allocation of a resource to a
process if it leads to a deadlock. This can be achieved when some
additional information is available about how the processes are going to
request for resources in future.
9. A state is said to be safe if allocation of resources to processes does not
lead to the deadlock. More precisely, a system is in safe state only if there
is a safe sequence. A safe sequence is a sequence of process execution
such that each and every process executes till its completion. If no such
sequence of process execution exists then the state of the system is said
to be unsafe.
10. There is a possibility of deadlock if neither deadlock prevention nor
deadlock avoidance method is applied in a system. In such a situation, an
algorithm must be provided for detecting the occurrence of deadlock in a
system.
11. When only a single instance of each resource type is available, a deadlock
can be detected by using a variation of the resource allocation graph known
as the wait-for graph.
12. When multiple instances of a resource type exist, the wait-for graph is not
sufficient to detect a deadlock in the system. For such a system, another
algorithm, which uses certain data structures similar to the ones used in the
banker's algorithm, is applied.
13. Once the deadlock is detected, a methodology must be provided for the
recovery of the system from the deadlock.
14. Two different ways in which a deadlock can be broken and the system
recovered are—terminate one or more processes to break the circular-wait
condition, or preempt resources from the processes involved in the
deadlock.

EXERCISES
Fill in the Blanks
1. If the requested resource is not available, the requesting process enters a
_____________ until it acquires the resource.
2. A deadlock can be depicted with the help of a directed graph known as
_____________.
3. Deadlock can be avoided using an algorithm known as _____________
devised by Dijkstra in 1965.
4. A _____________ is a sequence of process execution such that each and
every process executes till its completion.
5. When only a single instance of each resource type is available, the deadlock
can be detected by using a variation of the resource allocation graph called
_____________.

Multiple Choice Questions


1. Which one of the following is not associated with the resource allocation
graph to depict the deadlock?
(a) Request edge
(b) Claim edge
(c) Assignment edge
(d) None of these
2. Which of the following methods is used to avoid deadlock?
(a) Resource allocation graph
(b) Wait-for graph
(c) Banker’s algorithm
(d) All of these
3. Which of the following data structures is not used in algorithm for detecting
deadlock in the case of multiple instances of a resource type?
(a) Available resources
(b) Current allocation
(c) Request
(d) Maximum
4. In a resource allocation graph a directed arc from a resource to a process
is known as
(a) Request edge
(b) Assignment edge
(c) Process edge
(d) None of these
5. Which of the following are the different ways to handle a deadlock?
(a) Prevent the deadlock from occurring.
(b) Adopt methods for avoiding the deadlock.
(c) Allow the deadlock to occur, detect it and recover from it.
(d) All of these

State True or False


1. A deadlock can occur on a single system only.
2. The mutual exclusion property does not hold for the resources that are
sharable.
3. An unsafe state always leads to a deadlock.
4. In wait-for graph, only the processes are represented.
5. Resource allocation graph shows the dependency of a process on another
process for the resource allocation.

Descriptive Questions
1. Explain deadlock with an example.
2. What are the four conditions necessary for the deadlock? Explain them.
3. What are the steps performed by a process to use any resource type?
4. List the various ways to handle a deadlock.
5. How can the circular wait condition be prevented?
6. Mention different ways by which a system can be recovered from a
deadlock.
7. Consider a system having three instances of a resource type and two
processes. Each process needs two resources to complete its execution.
Can deadlock occur? Explain.
8. Consider a system in an unsafe state. Illustrate that it is possible for the
processes to complete their execution without entering a deadlock state.
9. Consider a system having six instances of a resource type and m processes.
For which values of m will a deadlock not occur?
10. Consider a system consisting of four processes and a single resource. The
current state of the system is given here.
For this state to be safe, what should be the minimum number of
instances of this resource?
11. Consider the following state of a system.

Answer the following questions using the banker’s algorithm for multiple
resources:
(a) What is the content of the matrix Required?
(b) Is the system in a safe state?
(c) Can a request from a process P2 for (1, 0, 2) be granted immediately?
12. What is a resource allocation graph? Explain with an example.
13. Explain the banker’s algorithm for multiple resources with the help of an
example.
chapter 7

Memory Management Strategies

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ Differentiate between logical and physical addresses.
⟡ Define address binding.
⟡ Understand the memory management in a bare machine.
⟡ Discuss memory management strategies that involve contiguous
memory allocation.
⟡ Explain memory management strategies that involve non-contiguous
memory allocation—for example, paging and segmentation.
⟡ Introduce a memory management scheme which is a combination of
segmentation and paging.
⟡ Discuss swapping.
⟡ Discuss overlays.

7.1 INTRODUCTION
To improve the utilization of the CPU and the speed of the computer’s
response to its users, the system keeps several processes in
memory, that is, several processes share the memory. Due to the
sharing of memory, there is need for memory management. It is the
job of memory manager, a part of the operating system, to manage
memory between multiple processes in an efficient way. For this, it
keeps track of which part of the memory is occupied and which part is
free. It also allocates and de-allocates memory to the processes
whenever required, and so on. Moreover, it provides the protection
mechanism to protect the memory allocated to each process from
being accessed by other processes.
For managing the memory, the memory manager may choose
from a number of available strategies. Memory allocation to the
processes using these strategies is either contiguous or non-
contiguous. This chapter discusses all the memory management
strategies in detail.

7.2 BACKGROUND
Every byte in the memory has a specific address that may range from
0 to some maximum value as defined by the hardware. This address
is known as physical address. Whenever a program is brought into
main memory for execution, it occupies certain number of memory
locations. The set of all physical addresses used by the program is
known as physical address space. However, before a program can
be executed, it must be compiled to produce the machine code. A
program is compiled to run starting from some fixed address and
accordingly all the variables and procedures used in the source
program are assigned some specific address known as logical
address. Thus, in machine code, all references to data or code are
made by specifying the logical addresses and not by the variable or
procedure names. The range of addresses that user programs can
use is system-defined and the set of all logical addresses used by a
user program is known as its logical address space.
When a user program is brought into main memory for execution,
its logical addresses must be mapped to physical addresses. The
mapping from addresses associated with a program to memory
addresses is known as address binding. The address binding can
take place at one of the following times.
• Compile time: The address binding takes place at compile time if
it is known at compile time which addresses the program will
occupy in the main memory. In this case, absolute code is
generated at compile time, that is, the logical addresses are the
same as the physical addresses.
• Load time: The address binding occurs at load time if it is not
known at compile time which addresses the program will occupy
in the main memory. In this case, relocatable code is generated at
compile time, which is then converted into absolute code at load
time.
• Run time: The address binding occurs at run time if the process
may be moved from one memory segment to another during its
execution. In this case also, relocatable code is generated at
compile time, which is then converted into absolute code at run
time.
Note: The run time address binding is performed by the hardware
device known as memory-management unit (MMU).

7.3 BARE MACHINE


In computer terminology, a bare machine refers to a computer having
no operating system. In the early stage of computing, there were no
operating systems. Programmers used to feed their programs into the
computer system directly using machine language—no system
software support was there. This approach is termed as bare
machine approach in the development of operating systems.
In such systems, the whole memory is assigned to the user
process which runs in the kernel mode. The user process has
complete control over the memory. No protection mechanisms are
required as no operating system or any other process is residing in
the memory. Thus, there are no overheads on the programmers to
provide security and protection mechanisms. In addition, no special
memory hardware is needed, which reduces the hardware cost to a
great extent. Moreover, the logical addresses generated by the user
program are equal to the physical addresses of the memory; hence,
no address translation is required. Therefore, no memory management
strategies are required in a bare machine. However, since there are
no operating system services for accessing the main memory, the
user must provide these.

7.4 CONTIGUOUS MEMORY ALLOCATION


In contiguous memory allocation, each process is allocated a single
contiguous part of the memory. The different memory management
schemes that are based on this approach are single partition and
multiple partitions.

7.4.1 Single Partition


One of the simplest ways to manage memory is to partition the main
memory into two parts. One of them is permanently allocated to the
operating system while the other part is allocated to the user process
(see Figure 7.1). In order to provide a contiguous area for the user
process, it usually resides at one extreme end of memory. Note that
in Figure 7.1, the operating system is in lower part of the memory.
However, it is not essential that the operating system resides at the
bottom of memory; it can reside at the upper part of the memory also.
The factor that decides the location of the operating system in the
memory is the location of the interrupt vector. The operating system is
placed at the same end of the memory where the interrupt vector is
located.
Fig. 7.1 Memory having Single Partition

Note: The portion of the operating system residing in the memory is known as
resident monitor.

Hardware Support
Single contiguous memory allocation is a simple memory
management scheme that is usually associated with stand-alone
computers with simple batch operating systems. Thus, no special
hardware support is needed, except for protecting the operating
system from the user process. This hardware protection mechanism
may include bounds register and two modes of CPU (supervisor and
user). The address of protected area is contained in bounds register.
Now, depending upon the mode of CPU, access to protected area is
controlled. If CPU is in user mode, each time a memory reference is
made, a hardware check is performed to ensure that it is not an
access to the protected area. If an attempt is made to access the
protected area, an interrupt is generated and control is transferred to
the operating system. On the other hand, if the CPU is in supervisor
mode, the operating system is allowed to access the protected area
as well as to execute the privileged instructions to alter the content of
bounds register.

Software Support
In this scheme, only one process can execute at a time. Whenever a
process is to be executed, its size is checked. If the size is less than
or equal to the size of the available memory, the operating system
loads it into the main memory for execution. After the termination of
that process, the operating system waits for another process. If the
size of the process is greater than the size of memory, an error occurs
and the next process is scheduled. The same sequence is performed
for the next process.

Advantages
This scheme is easy to implement. Generally, the operating system
needs to keep track of only the first and the last locations allocated to
the user process. In this case, the first location immediately follows
the operating system and the last location is determined by the
capacity of the memory. No hardware support is needed except for
protecting the operating system from the user process.

Disadvantages
The main drawback of this technique is that since only one user
process can execute at a time, the portion of the memory which is
allocated but not used by the process will get wasted as shown in
Figure 7.1. Thus, memory is not fully utilized. Another disadvantage is
that the process size must be smaller than or equal to the size of main
memory; otherwise, the process cannot be executed.
Note: The memory management scheme having a single partition is
used by single-process microcomputer operating systems, such as
CP/M and PC-DOS.

7.4.2 Multiple Partitions


A single partition scheme restricts the system to have only one
process in memory at a time, which reduces the utilization of the CPU
as well as of memory. Thus, monoprogramming systems are rarely
used. Most of the systems used today support multiprogramming,
which allows multiple processes to reside in the memory at the same
time. The simplest way to achieve multiprogramming is to divide the
main memory into a number of partitions, which may be of fixed or
variable size.

Multiprogramming with Fixed Partitions


In this technique, each partition is of fixed size and can contain only
one process. There are two alternatives for this technique—equal-
sized partitions or unequal-sized partitions (see Figure 7.2). First,
consider the case of equal-sized partitions where any process can be
loaded into any partition. Whenever a partition is free, a process
whose size is less than or equal to the partition size is selected from
the input queue and loaded into this partition. When the process
terminates, the partition becomes free to be allocated to another
process. The implementation of this method does not require much
effort since all partitions are of same size. The operating system is
required to keep track only of the partition occupied by each process.
For this, it maintains a table that keeps either the starting address of
each process or the partition number occupied by each process.
Fig. 7.2 Memory having Multiple Fixed Partitions

There is one problem with this method: the memory utilization is not
efficient. Any process, regardless of how small it is, occupies an entire
partition, which leads to the wastage of memory within the partition.
This phenomenon, which results in the wastage of memory within the
partition, is called internal fragmentation. For example, loading a
process of size 4M-n bytes into a partition of size 4M (where M stands
for megabytes) would result in a wasted space of n bytes within the
partition (see Figure 7.3).
Fig. 7.3 Internal Fragmentation

This problem cannot be resolved completely but can be reduced
to some extent by using the unequal-sized partition method, where a
separate input queue is maintained for each partition. Whenever a
process arrives, it is placed into the input queue of the smallest
partition large enough to hold it. When this partition becomes free, it
is allocated to the process. For example, according to Figure 7.2(b), if
a process of size 5M arrives, it will be accommodated in the partition
of size 6M. In this case also, some memory is wasted, that is, internal
fragmentation still exists, but less than that of the equal-sized partition
method.
With this method, there may be a possibility that the input queue for
a large partition is empty while the queue for a small partition is full
[see Figure 7.4 (a)]. That is, small jobs have to wait to be loaded into
memory, even though a large amount of memory is free. To prevent
this, a single input queue can be maintained [see Figure 7.4 (b)].
Whenever a partition becomes free, the process that fits in it can be
chosen from the input queue using some scheduling algorithm.
Fig. 7.4 Memory Allocation in Fixed Partitions

The fixed-partitioning technique is easy to implement and requires
less overhead, but it has some disadvantages, which are as follows.
• The number of processes in memory depends on the number of
partitions. Thus, the degree of multiprogramming is limited.
• The memory cannot be used efficiently in the case of processes of
small sizes.
Note: The technique having fixed partitions is no longer in use.

Multiprogramming with Variable Partitions


To overcome the disadvantages of the fixed-partitions technique, a
technique called MVT (Multiprogramming with a Variable number of
Tasks) is used. It is a generalization of the fixed-partitions technique
in which the partitions can vary in number and size. In this technique,
the amount of memory allocated is exactly the amount of memory a
process requires. To implement this, the table maintained by the
operating system stores both the starting address and ending
address of each process.
Initially, when there is no process in the memory, the whole
memory is available for the allocation and is considered as a single
large partition of available memory (a hole). Whenever a process
requests for the memory, the hole large enough to accommodate that
process is allocated. The rest of the memory is available to other
processes. As soon as the process terminates, the memory occupied
by it is de-allocated and can be used for other processes. Due to
subsequent memory allocations and de-allocations, at a given time,
some parts of memory will be in use while others will be free [see
Figure 7.5 (a)]. Now to make further allocations, the memory
manager must keep track of the free space in memory. For this, the
memory manager maintains a free-storage list that keeps track of the
unused part (holes of variable sizes) of memory. The free-storage list
is implemented as a linked list where each node contains the size of
the hole and the address of the next available hole [see Figure 7.5
(b)].

Fig. 7.5 Memory Map and Free-storage List


In general, at a certain point of time, there will be a set of holes of
various sizes dispersed in the memory. As a result, there may be a
possibility that the total available memory is large enough to
accommodate a waiting process; however, it cannot be utilized as it
is scattered. This results in the external fragmentation problem, which
is discussed in detail in the next section.

Partition Selection Algorithms


Whenever a process arrives and there are various holes large
enough to accommodate it, the operating system may use one of the
following algorithms to select a partition for the process.
• First fit: In this algorithm, the operating system scans the free-
storage list and allocates the first hole that is large enough to
accommodate the process. This algorithm is fast because it searches
as little as possible compared to the other algorithms.
• Best fit: In this algorithm, the operating system scans the free-
storage list and allocates the smallest hole whose size is larger
than or equal to the size of the process. Unlike the first fit algorithm,
it allocates a partition that is close to the size required for the
process. It is slower than the first fit algorithm as it has to search
the entire list every time. Moreover, it may lead to more wastage of
memory, as it tends to leave behind the smallest leftover holes,
which cannot be used to satisfy later memory allocation requests.
• Worst fit: In this algorithm, the operating system scans the entire
free-storage list and allocates the largest hole to the process.
Unlike best fit, this algorithm results in the largest leftover holes.
However, simulation indicates that worst fit allocation is not very
effective in reducing the wastage of memory.
• Quick fit: This algorithm follows a completely different approach to
select a free partition. It maintains separate lists for common
memory sizes (for example, 4K, 8K, 12K, and so on). In addition, a
separate list is maintained for the holes that do not fit into any of
the other lists. The main advantage of this algorithm is that it finds
a hole of the right size very quickly; however, maintaining multiple
lists of different sizes is a bit cumbersome. Moreover, it is difficult
to find free holes that are contiguous and can be merged to
perform compaction.
Note: First fit and best fit are among the most popular algorithms for
dynamic memory allocation.
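The following Python sketch contrasts how first fit, best fit and worst fit choose a hole from a free-storage list of (address, size) pairs; it is illustrative, and the hole values, the helper name select_hole and the policy names are assumptions, not taken from the figures.

def select_hole(holes, request, policy="first"):
    candidates = [h for h in holes if h[1] >= request]
    if not candidates:
        return None                                  # request must wait
    if policy == "first":
        return candidates[0]                         # first hole big enough
    if policy == "best":
        return min(candidates, key=lambda h: h[1])   # smallest adequate hole
    if policy == "worst":
        return max(candidates, key=lambda h: h[1])   # largest hole
    raise ValueError("unknown policy")

holes = [(200, 100), (600, 500), (1300, 300)]        # hypothetical (address, size)
for policy in ("first", "best", "worst"):
    print(policy, select_hole(holes, 250, policy))
# first (600, 500)   best (1300, 300)   worst (600, 500)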
Example 1 Consider the memory map given in Figure 7.5. How would
each of the first fit, best fit and worst fit algorithms allocate memory to
a process P of size 2M?
Solution According to the different algorithms, memory will be
allocated to the process P as shown in Figure 7.6.

Fig. 7.6 Memory Allocation Using Different Algorithms

Example 2 Given memory partitions of 100K, 500K, 200K, 300K, and
600K (in order), how would each of the first fit, best fit and worst fit
algorithms place processes of 212K, 417K, 112K and 426K (in
order)? Which algorithm makes the most efficient use of memory?
Show the diagram of memory status in each case.
Solution Assuming that the operating system resides in lower part of
main memory and occupies 200K, the initial status of memory is
shown in Figure 7.7.

Fig. 7.7 Initial Status of Memory

According to the first fit algorithm, the first process of size 212K is
placed in the 500K partition, resulting in a hole of size 288K. The next
process of size 417K is placed in the 600K partition, resulting in a hole
of size 183K. The next process of size 112K is placed in the 288K
partition (left after placing 212K in the 500K partition), resulting in a
hole of size 176K. Since now there is no partition big enough to
accommodate the last process of size 426K, the process has to wait.
According to best fit algorithm, the first process of size 212K is
placed in 300K partition, resulting in a hole of size 88K. The next
process of size 417K is placed in 500K partition, resulting in a hole of
size 83K. The next process of size 112K is placed in 200K partition,
resulting in a hole of size 88K. Finally, the last process of size 426K is
placed in 600K partition, resulting in a hole of size 174K.

Fig. 7.8 Memory Status


According to worst fit algorithm, the first process of size 212K is
placed in 600K partition, resulting in a hole of size 388K. The next
process of size 417K is placed in 500K partition, resulting in a hole of
size 83K. The next process of size 112K is placed in 388K partition
(left after placing 212K in 600K partition), resulting in a hole of size
276K. Since now there is no partition big enough to accommodate the
last process of size 426K, the process has to wait.
The memory maps according to first fit, best fit and worst fit
algorithms are shown in Figure 7.8.

Fragmentation Problem
An important issue related to contiguous multiple partition allocation
scheme is to deal with memory fragmentation. There are two facets
to memory fragmentation: internal and external fragmentation.
Internal fragmentation exists in the case of memory having multiple
fixed partitions when the memory allocated to a process is not fully
used by the process. Internal fragmentation has already been
discussed in detail. So here we focus on external fragmentation.
External fragmentation (also known as checkerboarding) occurs
in the case of memory having multiple variable partitions when the
total free memory in the system is large enough to accommodate a
waiting process but cannot be utilized because it is not contiguous.
To understand the external fragmentation problem, consider the
memory system (map along with free storage list) shown in Figure
7.5. Now, if a request for a partition of size 5M arrives, it cannot be
granted because no single partition is available that is large enough
to satisfy the request [see Figure 7.5 (a)]. However, the combined
free space is sufficient to satisfy the request.
To get rid of the external fragmentation problem, it is desirable to
relocate (or shuffle) some or all portions of the memory in order to
place all the free holes together at one end of memory to make one
large hole. This technique of reforming the storage is termed
compaction. Compaction results in the memory being partitioned into
two contiguous blocks—one of used memory and another of free
memory. Figure 7.9 shows the memory map of Figure 7.5(a) after
performing compaction. Compaction may take place the moment a
process frees some memory or when a request for allocating memory
fails, provided the combined free space is enough to satisfy the
request. Since it is expensive in terms of CPU time, it is rarely used.

Fig. 7.9 Memory after Compaction

Relocation and Protection


In a multiprogramming environment, multiple processes are executed,
due to which two problems can arise: relocation and protection.

Relocation
From the earlier discussion, it is clear that the different processes run
at different partitions. Now, suppose a process contains an instruction
that requires access to address location 50 in its logical address
space. If this process is loaded into a partition at address 10M, this
instruction will jump to the absolute address 50 in physical memory,
which is inside the operating system. In this case, it is required to
map the address location 50 in logical address space to the address
location 10M + 50 in the physical memory. Similarly, if the process is
loaded into some other partition, say at address 20M, then it should
be mapped to address location 20M+50. This problem is known as
relocation problem. This relocation problem can be solved by
equipping the system with a hardware register called relocation
register which contains the starting address of the partition into
which the process is to be loaded. Whenever an address is
generated during the execution of a process, the memory
management unit adds the content of the relocation register to the
address resulting in physical memory address.
Example 3 Consider that the logical address of an instruction in a
program is 7632 and the content of relocation register is 2500. To
which location in the memory will this address be mapped?
Solution Here, Logical address = 7632,
Content of relocation register = 2500
Since, Physical address = Logical address + Content of relocation
register
Physical address = 7632 + 2500 = 10132
Thus, the logical address 7632 will be mapped to the location 10132
in memory.
Example 4 If a computer system has 16-bit address lines and
supports a 1K page size, what will be the maximum page number
supported by the system?
Solution A computer system having 16-bit address lines implies that
the logical address is of 16 bits. Therefore, the size of the logical
address space is 2^16 and the page size is 1K, that is, 1 * 1024 bytes
= 2^10 bytes.
Thus, the page offset will be of 10 bits and the page number will be of
(16-10) = 6 bits.
Therefore, the maximum page number supported by this system
is 111111 (in binary), that is, 63.

Memory Protection
Using the relocation register, the problem of relocation can be solved,
but there is still a possibility that a user process may access the
memory addresses of other processes or of the operating system. To
protect the operating system from being accessed by other processes
and the processes from one another, another hardware register called
the limit register is used. This register holds the range of logical
addresses. Each logical address of a program is checked against this
register to ensure that it does not attempt to access a memory
address outside the allocated partition. Figure 7.10 shows the
relocation and protection mechanism using the relocation and limit
registers, respectively.

Fig. 7.10 Relocation and Protection using Relocation and Limit Register
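A minimal Python sketch of this check is given below; the particular register values are illustrative assumptions, chosen to match the 10M partition discussed above, and the trap is modelled simply as a raised exception.

RELOCATION_REGISTER = 10 * 1024 * 1024      # partition starts at 10M (assumed)
LIMIT_REGISTER      = 4 * 1024 * 1024       # partition is 4M long (assumed)

def translate(logical_address):
    # Every logical address is first checked against the limit register ...
    if logical_address >= LIMIT_REGISTER:
        raise RuntimeError("addressing error: trap to the operating system")
    # ... and only then relocated by adding the relocation register.
    return logical_address + RELOCATION_REGISTER

print(translate(50))        # 10485810, i.e. 10M + 50 as discussed above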
7.5 NON-CONTIGUOUS MEMORY ALLOCATION
In non-contiguous allocation approach, parts of a single process can
occupy noncontiguous physical addresses. In this section, we will
discuss memory management schemes based on non-contiguous
allocation of physical memory.

7.5.1 Paging
In paging, the physical memory is divided into fixed-sized blocks
called page frames, and the logical memory is also divided into fixed-sized
blocks called pages, which are of the same size as the page frames.
When a process is to be executed, its pages can be loaded into any
unallocated frames (not necessarily contiguous) from the disk. Figure
7.11 shows two processes A and B with all their pages loaded into the
memory. In this figure, the page size is of 4KB. Nowadays, the
systems typically support page sizes between 4KB and 8KB.
However, some systems support even larger page sizes.

Basic Operation
In paging, the mapping of logical addresses to physical addresses is
performed at the page level. When CPU generates a logical address,
it is divided into two parts: a page number (p) [high-order bits] and a
page offset (d) [low-order bits] where d specifies the address of the
instruction within the page p. Since the size of the logical address
space is a power of 2, the page size is always chosen as a power of 2
so that the logical address can easily be split into a page number and
a page offset. To understand this, consider that the size of the logical
address space is 2^m. Now, if we choose a page size of 2^n (bytes or
words), then n bits will specify the page offset and m-n bits will specify
the page number.
Example 5 Consider a system that generates logical address of 16
bits and page size is 4KB. How many bits would specify the page
number and page offset?
Fig. 7.11 Concept of Paging

Note: Some systems like Solaris support multiple page sizes (say 8KB and
4MB) depending on the data stored in the pages.

Solution Here, the logical address is of 16 bits, that is, the size of the
logical address space is 2^16, and the page size is 4KB, that is,
4 * 1024 bytes = 2^12 bytes.
Thus, the page offset will be of 12 bits and the page number will be of
(16-12) = 4 bits.
Now let us see how a logical address is translated into a physical
address. In paging, address translation is performed using a mapping
table, called page table. The operating system maintains a page
table for each process to keep track of which page frame is allocated
to which page. It stores the frame number allocated to each page and
the page number is used as index to the page table (see Figure
7.12).
When CPU generates a logical address, that address is sent to
MMU. The MMU uses the page number to find the corresponding
page frame number in the page table. That page frame number is
attached to the high-order end of the page offset to form the physical
address that is sent to the memory. The mechanism of translation of
logical address into physical address is shown in Figure 7.13.

Fig. 7.12 A Page Table


Fig. 7.13 Address Translation in Paging

Note: Since both the page and page frames are of same size, the offsets within
them are identical, and need not be mapped.

Example 6 Consider a paged memory system with eight pages of 8
KB each and 16 page frames in memory. Using the following page
table, compute the physical address for the logical address 18325.
Solution Since the total number of pages = 8, that is, 2^3, and each
page is of size 8 KB, that is, 2^13 bytes, the logical address will be of
16 bits. Out of these 16 bits, the three high-order bits represent the
page number and the 13 low-order bits represent the offset within the
page. In addition, there are 16, that is, 2^4, page frames in memory;
thus, the physical address will be of 17 bits.
Given logical address = 18325 which is equivalent to
0100011110010101. In this address, page number = 010, that is, 2
and page offset = 0011110010101. From the page table, it is clear
that the page number 2 is in page frame 1011.
Therefore, the physical address = 10110011110010101, which is
equivalent to 92053.
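The translation just performed can be summarized by the following minimal Python sketch; only the mapping of page 2 to frame 11 (binary 1011) is taken from the example, while the remaining page-table entries are placeholder assumptions.

PAGE_SIZE = 8 * 1024                      # 8 KB pages, as in Example 6
page_table = [5, 9, 11, 4, 0, 7, 3, 14]   # frame number for pages 0..7 (page 2 -> frame 11)

def translate(logical_address):
    page_number = logical_address // PAGE_SIZE      # high-order bits
    offset      = logical_address %  PAGE_SIZE      # low-order bits
    frame       = page_table[page_number]
    return frame * PAGE_SIZE + offset               # attach frame number to the offset

print(translate(18325))                   # 92053, matching the example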
Example 7 Consider a logical address space of eight pages of 1024
words each, mapped onto a physical memory of 32 frames.
(i) Calculate the number of bits in the logical address.
(ii) Calculate the number of bits in the physical address.
Solution (i) Since the total number of pages = 8, that is, 2^3, and each page size = 1024 words, that is, 2^10 words, the logical address will be of 13 bits (3+10). Out of these 13 bits, the three high-order bits represent the page number and the ten low-order bits represent the offset within the page.
(ii) There are 32, that is, 2^5, frames in the memory; thus, the physical address will be of 15 bits (5+10).

Advantages
• Since memory is always allocated in fixed-sized units (frames), any free frame can be allocated to a process. Thus, there is no external fragmentation.

Disadvantages
• Since memory is allocated in terms of an integral number of page
frames, there may be some internal fragmentation. That is, if the
size of a given process does not come out to be a multiple of
page size, then the last frame allocated to the process may not be
completely used. To illustrate this, consider a page size of 4KB and a process that requires 8195 bytes of memory, that is, 2 pages + 3 bytes. In this case, a third frame must be allocated for just 3 bytes, and almost the entire frame (4093 of its 4096 bytes) is wasted, resulting in internal fragmentation.

Page Allocation
Whenever a process requests for a page frame, the operating system
first locates a free page frame in the memory, and then allocates it to
the requesting process. For this, the operating system must keep
track of free and allocated page frames in physical memory. One way
to achieve this is to maintain a memory-map table (MMT). An MMT
is structured in the form of a static table in which each entry describes
the status of a page frame, indicating whether it is free or allocated.
Hence, an MMT for a given system contains only as many entries as there are page frames in the physical memory. That is, if the size of the physical memory is m and the page size is p, then
f = m/p, where f is the number of page frames.

Fig. 7.14 Memory Map Table

Since both m and p are usually integer powers of 2, the resultant value of f is also an integer. A memory map table for the physical memory shown in Figure 7.12 is shown in Figure 7.14.
This approach of keeping track of free page frames is simple to
implement. However, as the number of allocated page frames
increases, the system has to search more number of MMT entries to
find a free page frame, which makes this approach less efficient. To
illustrate this, let us assume that the free page frames are randomly distributed in memory; then
x = n/q
where x is the average number of MMT entries that need to be examined,
n is the number of free frames to be found, and
q is the probability that a given frame is free.
It is clear from the expression that x is inversely proportional to q, that is, the more memory is in use (or the lower the probability that a given page frame is free), the more MMT entries have to be examined.
Another problem with this approach occurs when any process
terminates and all the frames allocated to it need to be de-allocated.
At that time, all the page frames of the departing process found in the
page table should be marked as FREE in the MMT, which is a time-
consuming task.
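A minimal sketch of such an MMT, assuming it is kept as a simple array of per-frame status flags (the array and function names are illustrative, not from the text). The linear scan in find_free_frame() is exactly the step that becomes slower as more of the memory is in use:

#include <stdbool.h>

#define NUM_FRAMES 16

static bool frame_free[NUM_FRAMES];   /* one MMT entry per page frame */

/* mark a frame as free, e.g. when its owning process terminates */
void mark_frame_free(int f)
{
    frame_free[f] = true;
}

/* scan the MMT for a free frame; returns its number, or -1 if memory is full */
int find_free_frame(void)
{
    for (int f = 0; f < NUM_FRAMES; f++)
        if (frame_free[f]) {
            frame_free[f] = false;    /* mark it as allocated */
            return f;
        }
    return -1;
}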
Another approach to keep track of free frames is to maintain a list
of free frames in the form of a linked list. In this case, whenever n free frames are required, the first n nodes of the list can be removed
from the free list and can be allocated to the process. When a
process departs, the frames found in the page table of that process
can be linked to the beginning of the free list. Since the free list is an
unordered list, adding free frames to the beginning of the free list is
generally faster than adding them to the end of the list.
Note that unlike MMT, the time taken to remove or add page
frames to the free list is not affected by the amount of memory in use.
That is, the time complexity of the free list approach is independent of memory utilization. However, the time and memory overheads of storing and processing a linked list are generally higher than those of a static table.
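The free-list alternative can be sketched as a singly linked list of frame nodes (again, the structure and names are only illustrative). Both allocation and de-allocation work at the head of the list, which is why their cost does not depend on how much memory is currently in use:

#include <stdlib.h>

struct frame_node {
    int frame_number;
    struct frame_node *next;
};

static struct frame_node *free_list;   /* head of the list of free frames */

/* remove one frame from the head of the free list; -1 if none is left */
int alloc_frame(void)
{
    struct frame_node *node = free_list;
    if (node == NULL)
        return -1;
    free_list = node->next;
    int f = node->frame_number;
    free(node);
    return f;
}

/* link a frame released by a departing process back at the head */
void release_frame(int frame_number)
{
    struct frame_node *node = malloc(sizeof *node);
    if (node == NULL)
        return;                        /* out of memory: this sketch ignores it */
    node->frame_number = frame_number;
    node->next = free_list;
    free_list = node;
}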

Hardware Support for Paging


Each operating system has its own way of storing page tables. The
simplest way is to use registers to store the page table entries
indexed by page number. Though this method is fast and does not require any memory reference, its disadvantage is that it is not feasible in the case of large page tables, as registers are expensive.
Moreover, at every context switch, the page table needs to be
changed which in turn requires all the registers to be reloaded. This
degrades the performance.
Another way is to keep the entire page table in main memory and
the pointer to page table stored in a register called page-table base
register (PTBR). Using this method, the page table can be changed by reloading only one register, thus reducing the context-switch time to a great extent. The disadvantage of this scheme is that it requires two
memory references to access a memory location; first to access page
table using PTBR to find the page frame number and second to
access the desired memory location. Thus, memory accessing is
slowed down by a factor of two.
To overcome this problem, the system can be equipped with a
special hardware device known as Translation look-aside buffer
(TLB) (or associative memory). The TLB resides inside the MMU and contains a limited number of page table entries. When the CPU
generates a logical address and presents it to the MMU, it is
compared with the page numbers present in the TLB. If a match is
found in TLB (called TLB hit), the corresponding page frame number
is used to access the physical memory. In case a match is not found
in TLB (called TLB miss), memory is referenced for the page table.
Further, this page number and the corresponding frame number are
added to the TLB so that next time if this page is required, it can be
referenced quickly. Since the size of the TLB is limited, when it is full, an existing entry must be replaced. Figure 7.15 shows the mechanism of
paging using TLB.
Fig. 7.15 Paging with TLB

The TLB can contain entries for more than one process at the same time, so there is a possibility that two processes map the same page number to different frames. To resolve this ambiguity, a
process identifier (PID) can be added with each entry of TLB. For
each memory access, the PID present in the TLB is matched with
the value in a special register that holds the PID of the currently
executing process. If it matches, the page number is searched to
find the page frame number; otherwise it is treated as a TLB miss.
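A simplified sketch of this lookup, assuming a small software-modelled TLB whose entries carry a PID tag (all names are illustrative; a real TLB compares all entries in parallel in hardware):

#include <stdbool.h>

#define TLB_SIZE 16

struct tlb_entry {
    bool valid;
    int  pid;      /* process that owns the mapping   */
    int  page;     /* virtual page number             */
    int  frame;    /* page frame holding that page    */
};

static struct tlb_entry tlb[TLB_SIZE];

/* returns the frame number on a TLB hit, or -1 on a TLB miss
   (on a miss the page table is consulted and the TLB is updated) */
int tlb_lookup(int current_pid, int page)
{
    for (int i = 0; i < TLB_SIZE; i++)
        if (tlb[i].valid && tlb[i].pid == current_pid && tlb[i].page == page)
            return tlb[i].frame;
    return -1;
}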

Structure of Page Table


For structuring page table, there are different techniques, namely,
hierarchical paging, hash page table, and inverted page table. These
are discussed next.

Hierarchical Paging
In a system where the page table becomes so large that it occupies a significant amount of physical memory, the page table itself needs to be broken further into pages that can be stored non-contiguously. For example, consider a system with a 32-bit logical address space (2^32 bytes). With a page size of 4 KB (2^12 bytes), the page table consists of 2^20 (2^32/2^12) entries. If each page table entry consists of 4 bytes, the total physical memory occupied by the page table is 4 MB, which cannot be
stored in the main memory all the time. To get around this problem,
many systems use hierarchical (or multilevel) page table where, a
hierarchy of page tables with several levels is maintained. This
implies that the logical address space is broken down into multiple
page tables at different levels.
The simplest way is to use a two-level paging scheme in which
the top-level page table indexes the second level page table. In a
system having one large page table, the 32-bit logical address is
divided into a page number consisting of 20 bits and a page offset
consisting of 12 bits. Now, for a two-level page table, the 20-bit page number is further divided into a 10-bit index into the top-level page table (p1) and a 10-bit index into the second-level page table (p2) (see Figure 7.16).
Fig. 7.16 A 32-bit Logical Address with Two Page Table Fields

In this figure, p1 is the index into top-level page table and p2 is the
displacement within the page of the top-level page table. The address
translation scheme for 32-bit paging architecture is shown in Figure
7.17. This scheme is also called forward-mapped page table
because the address translation works from the top-level page table
towards the inner page table.

Fig. 7.17 Address Translation Scheme for Two-Level Page Table
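For the 32-bit layout of Figure 7.16 (a 10-bit p1, a 10-bit p2 and a 12-bit offset), a software walk of the two levels would look roughly as follows; the table layout and names are assumptions made for illustration:

#include <stdint.h>

/* outer table: 1024 pointers to second-level tables;
   each second-level table: 1024 frame numbers        */
static uint32_t *top_level[1024];

/* translate a 32-bit logical address, assuming both levels are present */
uint32_t translate_two_level(uint32_t logical)
{
    uint32_t p1     = (logical >> 22) & 0x3FF;   /* top 10 bits   */
    uint32_t p2     = (logical >> 12) & 0x3FF;   /* next 10 bits  */
    uint32_t offset =  logical        & 0xFFF;   /* low 12 bits   */

    uint32_t frame = top_level[p1][p2];          /* two memory references */
    return (frame << 12) | offset;
}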

For systems that support 64-bit logical address space, a two-level page table becomes inappropriate. For such systems, we need to
have three-level or four-level paging schemes. In such schemes, the
top-level page table of the two-level paging scheme is further divided
into smaller pieces. With a three-level paging scheme, the 64-bit
logical address contains 32 bits for indexing into the top-level page
table (see Figure 7.18). If each page table entry consists of 4 bytes, the top-level page table is 2^34 bytes in size, which is still very large. To further reduce its size, the top-level page table is divided into still smaller pieces, resulting in a four-level paging scheme.
Fig. 7.18 A 64-bit Logical Address with Three Page Table Fields

Note that the UltraSPARC architecture with 64-bit addressing would require seven levels of paging, and thus a large number
of memory accesses to translate each logical address. Thus,
hierarchical page tables are generally considered inappropriate for
64-bit architectures.

Hashed Page Tables


This technique is commonly used for handling address spaces larger than 32 bits. In this technique, the virtual page number is hashed into a hash table. The hash table is maintained such that each entry contains a
linked list of elements hashing to the same location. The linked list is
basically maintained to handle collisions. Each element in linked list
contains three fields: virtual page number, the value of mapped page
frame, and a pointer to the next element in the linked list.
The address translation of logical address into the physical
address using the hash page table is done as follows (also see
Figure 7.19):
1. A hash function is applied to the virtual page number in the logical address. This generates a hash value, which is used as an index into the hash page table.
2. The desired virtual page number is compared with the virtual
page numbers stored in the linked list associated with the
generated hash value.
3. Once a match is found, the corresponding page frame (field 2) of
that node is used to form the desired physical address.
Fig. 7.19 Hashed Page Table

Note that a hashed page table lookup may require many memory
references to search the desired virtual address and its
corresponding frame number because there is no guarantee on the
number of entries in the linked list.
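A rough sketch of the lookup just described, using chained hashing; the table size, node layout and hash function are assumptions made for illustration:

#include <stddef.h>

#define TABLE_SIZE 1024

struct hpt_node {
    unsigned long    vpage;    /* virtual page number        */
    unsigned long    frame;    /* mapped page frame          */
    struct hpt_node *next;     /* next element in the chain  */
};

static struct hpt_node *hash_table[TABLE_SIZE];

/* returns the frame holding vpage, or -1 if it is not mapped */
long hpt_lookup(unsigned long vpage)
{
    unsigned long slot = vpage % TABLE_SIZE;               /* hash function */
    for (struct hpt_node *n = hash_table[slot]; n != NULL; n = n->next)
        if (n->vpage == vpage)
            return (long)n->frame;
    return -1;
}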

Inverted Page Tables


There are a number of processes in the system and each process has its own page table, which contains one entry for every page of that process. Thus, each page table may consist of millions of entries, and such page tables consume a large amount of physical memory.
An alternative is to use inverted page table. An inverted page
table contains one entry for each page frame of main memory. Each
entry consists of the virtual address of the page stored in that page
frame along with the information about the process that owns that
page. Hence, only one page table is maintained in the system for all
the processes. This scheme is shown in Figure 7.20.
Fig. 7.20 Inverted Page Table

To understand the address translation in the inverted page table scheme, consider the simplified version of the inverted page table
used in the IBM RT. Each logical address in this system consists of
<process id, page number, offset>.

When a memory reference occurs, the pair <process id, page number> is presented to memory. The inverted page table is then searched for a match. Suppose the match occurs at the jth entry; then the physical address <j, offset> is generated. If there is no match,
an illegal address access has been attempted.
Though the inverted page table decreases the memory needed to store page tables, it increases the time needed to search the table when a page reference occurs. This is because the page table
is sorted by physical address, and lookups are made on virtual
address, therefore the entire table needs to be searched in order to
find the match. To reduce the search to one or at most a few page-
table entries, a hash table can be used.
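The search described above can be sketched over an array-based inverted page table, with one entry per frame so that a match at index j directly gives frame j (the structure and names are illustrative; unused entries are assumed to hold pid = -1):

#define NUM_FRAMES 4096

struct ipt_entry {
    int pid;      /* process that owns the page (-1 if the frame is unused) */
    int vpage;    /* virtual page stored in this frame                      */
};

static struct ipt_entry ipt[NUM_FRAMES];

/* linear search: returns the frame number j, or -1 for an illegal access */
int ipt_lookup(int pid, int vpage)
{
    for (int j = 0; j < NUM_FRAMES; j++)
        if (ipt[j].pid == pid && ipt[j].vpage == vpage)
            return j;
    return -1;
}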

7.5.2 Segmentation
A user views a program as a collection of segments such as main
program, routines, variables, etc. All of these segments are variable
in size and their size may also vary during execution. Each segment
is identified by a name (or segment number) and the elements within
a segment are identified by their offset from the starting of the
segment. Figure 7.21 shows the user view of a program.

Fig. 7.21 User View of a Program

Segmentation is a memory management scheme that implements the user view of a program. In this scheme, the entire logical address
space is considered as a collection of segments with each segment
having a number and a length. The length of a segment may range
from 0 to some maximum value as specified by the hardware and
may also change during the execution. The user specifies each
logical address consisting of a segment number (s) and an offset (d).
This differentiates segmentation from paging in which the division of
logical address into page number and page offset is performed by the
hardware.
To keep track of each segment, a segment table is maintained by
the operating system (see Figure 7.22). Each entry in the segment
table consists of two fields: segment base and segment limit. The
segment base specifies the starting address of the segment in
physical memory and the segment limit specifies the length of the
segment. The segment number is used as an index to the segment
table.

Fig. 7.22 A Segment Table

When the CPU generates a logical address, that address is sent to the MMU. The MMU uses the segment number of the logical address as an index to the segment table. The offset is compared with the segment limit and if it is greater than or equal to the limit, an invalid-address error is generated. Otherwise,
the offset is added to the segment base to form the physical address
that is sent to the memory. Figure 7.23 shows the hardware to
translate logical address into physical address in segmentation.
Fig. 7.23 Segmentation Hardware
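The limit check and base addition performed by this hardware can be sketched as follows (segment-table layout as in Figure 7.22; returning -1 to signal the invalid-address error is an assumption made for illustration):

struct segment_entry {
    unsigned long base;    /* starting physical address of the segment */
    unsigned long limit;   /* length of the segment                    */
};

/* returns the physical address, or -1 to signal an invalid-address error */
long seg_translate(const struct segment_entry *seg_table,
                   unsigned int s, unsigned long offset)
{
    if (offset >= seg_table[s].limit)        /* offset outside the segment */
        return -1;
    return (long)(seg_table[s].base + offset);
}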

Advantages
• Since a segment typically contains one type of object, each segment can have a different type of protection. For example, a procedure can be specified as execute-only whereas a char type array can be specified as read-only.
• It allows sharing of data or code between several processes. For
example, a common function or shared library can be shared
between various processes. Instead of having them in address
space of every process, they can be put in a segment and that
segment can be shared.
Example 8 Using the following segment table, compute the physical
address for the logical address consisting of segment and offset as
given below.
(a) segment 2 and offset 247
(b) segment 4 and offset 439

Solution
(a) Here, offset = 247 and segment is 2.
It is clear from the segment table that the limit of segment 2 is 780 and its base is 2200.
Since the offset is less than the segment limit, physical address is
computed as:
Physical address = Offset + Segment base
= 247 + 2200 = 2447
(b) Here, Offset = 439 and segment is 4.
It is clear from the segment table that the limit of segment 4 is 400 and its base is 1650.
Since the offset is greater than the segment limit, invalid-address
error is generated.

7.5.3 Segmentation with Paging


The idea behind the segmentation with paging is to combine the
advantages of both paging (such as uniform page size) and
segmentation (such as protection and sharing) together into a single
scheme. In this scheme, each segment is divided into a number of
pages. To keep track of these pages, a page table is maintained for
each segment. The segment offset in the logical address (comprising
segment number and offset) is further divided into a page number
and a page offset. Each entry of the segment table contains the segment base, the segment limit and one more field that contains the address of the segment's page table.
The logical address consists of three parts: segment number (s),
page number (p) and page offset (d). Whenever address translation is
to be performed, firstly, the MMU uses the segment number as an
index to segment table to find the address of page table. Then the
page number of logical address is attached to the high-order end of
the page table address and used as an index to page table to find the
page table entry. Finally, the physical address is formed by attaching
the frame number obtained from the page table entry to the high-
order end of the page offset. Figure 7.24 shows the address
translation in segmentation with paging scheme.
Fig. 7.24 Segmentation with Paging
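Combining the two previous sketches, and assuming (purely for illustration) that each segment-table entry stores the segment limit together with a pointer to that segment's page table, the translation of a logical address (s, p, d) may be sketched as:

#define PAGE_OFFSET_BITS 9   /* e.g. 512-byte pages, as in Example 9 */

struct seg_desc {
    unsigned long  limit;        /* segment length in bytes      */
    unsigned long *page_table;   /* page table of this segment   */
};

/* translate (segment s, page p, offset d) into a physical address */
long seg_page_translate(const struct seg_desc *seg_table,
                        unsigned int s, unsigned int p, unsigned int d)
{
    const struct seg_desc *seg = &seg_table[s];
    unsigned long seg_offset = ((unsigned long)p << PAGE_OFFSET_BITS) + d;
    if (seg_offset >= seg->limit)
        return -1;                               /* outside the segment */
    unsigned long frame = seg->page_table[p];
    return (long)((frame << PAGE_OFFSET_BITS) | d);
}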

Example 9 On a system using paging and segmentation, the virtual address space consists of up to 16 segments where each segment can be up to 2^16 bytes long. The hardware pages each segment into 512-byte pages. How many bits in the virtual address specify the:

(i) Segment number


(ii) Page number
(iii) Offset within page
(iv) Entire virtual address

Solution

(i) Since the virtual address space consists of 16 segments, that is, 2^4 segments, 4 bits are required to specify the segment number.
(ii) The size of each segment is 2^16 bytes, and each segment consists of n pages, each of which is 512 (2^9) bytes long. Therefore,
Number of pages (n) in each segment = size of each segment/size of each page
= 2^16/2^9 = 2^(16-9) = 2^7 pages
Thus, 7 bits are required to specify the page number.
(iii) Since the size of each page is 2^9 bytes, 9 bits are required to specify the offset within the page.
(iv) Entire virtual address = segment number + page number +
offset
= 4 + 7 + 9 = 20 bits

7.6 SWAPPING
Fig. 7.25 Swapping

In multiprogramming, a memory management scheme called swapping can be used to increase the CPU utilization. The process of
bringing a process to memory and after running for a while,
temporarily copying it to disk is known as swapping. Figure 7.25
shows the swapping process. The decision of which process is to be swapped in and which process is to be swapped out is made by the CPU scheduler. For example, consider a multiprogramming
environment with priority-based scheduling algorithm. When a
process of high-priority enters the input queue, a process of low
priority is swapped out so that the process of high priority can be
loaded and executed. On the termination of this process, the process
of low priority is swapped back in the memory to continue its
execution.
7.7 OVERLAYS
Memory management strategies require the entire process to be in
main memory before its execution. Thus, the size of the process is
limited to the size of physical memory. To overcome this limitation, a memory management scheme called overlaying can be used, which allows a process to execute even when the system has insufficient physical memory. The programmer splits a program into
smaller parts called overlays in such a way that no two overlays are
required to be in main memory at the same time. An overlay is loaded
into memory only when it is needed. Initially, overlay 0 would run.
When it is completed, it would call another overlay and so on until the
process terminates. These overlays reside on the disk and are swapped in and out of memory dynamically as needed, thereby reducing the amount of memory needed by the process. The major disadvantage
of this technique is that it requires a major involvement of the
programmer. Moreover, splitting a program into smaller parts is time
consuming.

LET US SUMMARIZE
1. To improve the utilization of the CPU and the speed of the computer’s
response to its users, the system keeps several processes in memory. It
is the job of memory manager, a part of the operating system, to manage
memory between multiple processes in an efficient way.
2. For managing the memory, the memory manager may use any of a number of available memory management strategies.
3. All the memory management strategies allocate memory to the processes
using either of two approaches: contiguous memory allocation or non-
contiguous memory allocation.
4. Every byte in the memory has a specific address that may range from 0 to
some maximum value as defined by the hardware. This address is known
as physical address.
5. A program is compiled to run starting from some fixed address and
accordingly all the variables and procedures used in the source program
are assigned some specific address known as logical address.
6. The mapping from addresses associated with a program to memory
addresses is known as address binding. Address binding can take place at compile time, load time or run time.
7. In computer terminology, a bare machine refers to a computer having no operating system. In such a system, the whole memory is assigned to the user process, which runs in the kernel mode.
8. In contiguous memory allocation, each process is allocated a single
contiguous part of the memory. The different memory management
schemes that are based on this approach are single partition and multiple
partitions.
9. In single partition technique, main memory is partitioned into two parts.
One of them is permanently allocated to the operating system while the
other part is allocated to the user process.
10. The simple way to achieve multiprogramming is to divide the main memory
into a number of partitions which may be of fixed or variable sizes.
11. There are two alternatives for multiple partition technique—equal-sized
partitions or unequal-sized partitions.
12. In the equal-sized partitions technique, any process can be loaded into any partition. Regardless of how small a process is, it occupies an entire partition, which leads to the wastage of memory within the partition. This phenomenon is called internal fragmentation.
13. In unequal-sized partition, whenever a process arrives, it is placed into the
input queue of the smallest partition large enough to hold it. When this
partition becomes free, it is allocated to the process.
14. MVT (Multiprogramming with a Variable number of Tasks) is the
generalization of the fixed partitions technique in which the partitions can
vary in number and size. In this technique, the amount of memory
allocated is exactly the amount of memory a process requires.
15. In MVT, the wastage of the memory space is called external fragmentation
(also known as checker boarding) since the wasted memory is not a part
of any partition.
16. Whenever a process arrives and there are various holes large enough to
accommodate it, the operating system may use one of the algorithms to
select a partition for the process: first fit, best fit, worst fit and quick fit.
17. In multiprogramming environment, multiple processes are executed due to
which two problems can arise which are relocation and protection.
18. The relocation problem can be solved by equipping the system with a
hardware register called relocation register which contains the starting
address of the partition into which the process is to be loaded.
19. To protect the operating system from access by other processes and the
processes from one another, another hardware register called limit
register is used.
20. In non-contiguous allocation approach, parts of a single process can
occupy non-contiguous physical addresses.
21. Paging and segmentation are the memory management techniques based
on the noncontiguous allocation approach.
22. In paging, the physical memory is divided into fixed-sized blocks called
page frames and logical memory is also divided into fixed-size blocks
called pages which are of same size as that of page frames. The address
translation is performed using a mapping table, called page table.
23. To keep track of free and allocated page frames in physical memory, the
operating system maintains a data structure called a memory-map table
(MMT). An MMT is structured in the form of a static table, in which each
entry describes the status of each page frame, indicating whether it is free
or allocated. Another approach to keep track of free frames is to maintain
a list of free frames in the form of a linked list.
24. For structuring page table, there are different techniques, namely,
hierarchical paging, hash page table, and inverted page table.
25. In hierarchical paging technique, a hierarchy of page tables with several
levels is maintained. This implies that the logical address space is broken
down into multiple page tables at different levels.
26. In hashed table technique, a hash table is maintained in which each entry
contains a linked list of elements hashing to the same location. Each
element in linked list contains three fields: virtual page number, the value
of mapped page frame, and a pointer to the next element in the linked list.
27. An inverted page table contains one entry for each page frame of main
memory. Each entry consists of the virtual address of the page stored in
that page frame along with the information about the process that owns
that page. Hence, only one page table is maintained in the system for all
the processes.
28. Segmentation is a memory management scheme that implements the user
view of a program. In this scheme, the entire logical address space is
considered as a collection of segments with each segment having a
number and a length. To keep track of each segment, a segment table is
maintained by the operating system.
29. The idea behind the segmentation with paging is to combine the
advantages of both paging (such as uniform page size) and segmentation
(such as protection and sharing) together into a single scheme. In this
scheme, each segment is divided into a number of pages. To keep track
of these pages, a page table is maintained for each segment.
30. A memory management scheme called swapping can be used to increase
the CPU utilization. The process of bringing a process to memory and
after running for a while, temporarily copying it to disk is known as
swapping.
31. Overlaying is a memory management technique that allows a process to
execute irrespective of the system having insufficient physical memory.
The programmer splits a program into smaller parts called overlays in
such a way that no two overlays are required to be in main memory at the
same time. An overlay is loaded into memory only when it is needed.

EXERCISES
Fill in the Blanks
1. The mapping from addresses associated with a program to memory
addresses is known as _____________.
2. The division of logical memory into fixed size blocks is called
_____________.
3. _____________ is a hardware device situated inside the MMU that is used to speed up page table lookups.
4. Each entry in an _____________ page table consists of the virtual
address of the page stored in that page frame along with the information
about the process that owns that page.
5. The process of bringing a process to memory and after running for a
while, temporarily copying it to disk is known as _____________.

Multiple Choice Questions


1. At what time does the address binding occur if the process is supposed to move from one memory segment to another during its execution?
(a) Compile time
(b) Load time
(c) Run time
(d) None of these
2. Which of the following memory management schemes suffer from internal
or external fragmentation?
(a) Multiple fixed-partition
(b) Multiple variable partition
(c) Paging
(d) Segmentation
3. In paging, the system can be equipped with a special hardware device
known as:
(a) Page table base register
(b) Translation look-aside buffer
(c) Memory map table
(d) None of these
4. Which of the following techniques is used for structuring a page table?
(a) Hierarchical paging
(b) Hashed page table
(c) Inverted page table
(d) All of these
5. Consider a paged memory system with 16 pages of 2048 bytes each in
logical memory and 32 frames in physical memory. How many bits will the
physical address comprise?
(a) 15
(b) 4
(c) 11
(d) 16

State True or False


1. In first fit algorithm, a partition close to the size required for a process is
allocated.
2. In paging, the pages and the page frames may not be of same size.
3. A memory map table is structured in the form of a static table, in which
each entry describes the status of each page frame, indicating whether it
is free or allocated.
4. The idea behind the segmentation with paging is to combine the
advantages of both paging (such as uniform page size) and segmentation
(such as protection and sharing) together into a single scheme.
5. A memory management scheme called overlaying, allows a process to
execute irrespective of the system having insufficient physical memory.

Descriptive Questions
1. Distinguish physical address and the logical address.
2. What is address binding? At what times does it take place?
3. Differentiate internal and external fragmentation.
4. Discuss the basic operation involved in paging technique with the help of
suitable diagram.
5. What do you mean by segmentation?
6. Consider the following memory map with a number of variable size
partitions.

Assume that initially, all the partitions are empty. How would each of the
first fit, best fit and the worst fit partition selection algorithms allocate
memory to the following processes arriving one after another?
(a) P1 of size 2M
(b) P2 of size 2.9M
(c) P3 of size 1.4M
(d) P4 of size 5.4M
Does any of the algorithms result in a process waiting because of insufficient available memory? Also determine with which of the algorithms the memory is used most efficiently.
7. Consider a paged memory system with 2^16 bytes of physical memory, 256 pages of logical address space, and a page size of 2^10 bytes. How many bytes are in a page frame?
8. Can a process on a paged memory system access memory allocated to
some other process? Why or why not?
9. The operating system makes use of two different approaches for keeping track of free page frames in the memory. Discuss both of them. Which one is better in terms of performance?
10. Discuss in detail the memory management strategies involving contiguous
memory allocation. Give suitable diagrams, wherever required.

11. Discuss the various techniques for structuring a page table.


12. What are the two major differences between segmentation and paging?
13. How does the segmentation scheme allow different processes to share
data or code?
14. Using the following segment table, compute the physical address for the
logical address consisting of segment and offset as given here.
(a) segment 0 and offset 193
(b) segment 2 and offset 546
(c) segment 3 and offset 1265
15. What is the idea behind combining segmentation with paging? When is it
useful?
16. On a system using paging and segmentation, the virtual address space consists of up to 8 segments where each segment can be up to 2^29 bytes long. The hardware pages each segment into 256-byte pages. How many bits in the virtual address specify the:

(i) Segment number


(ii) Page number
(iii) Offset within page
(iv) Entire virtual address

17. Write a short note on each of following:

(i) Swapping
(ii) Overlays
chapter 8

Virtual Memory

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ Understand the concept of virtual memory.
⟡ Implement virtual memory using demand paging.
⟡ Evaluate performance of demand paging.
⟡ Discuss how process creation and execution can be made faster
using copy-on-write.
⟡ Explain various page replacement algorithms.
⟡ Discuss allocation of frames to processes.
⟡ Explain thrashing along with its causes and prevention.
⟡ Discuss the demand segmentation, which is another way of
implementing virtual memory.
⟡ Understand the use of cache memory to increase the CPU
performance.
⟡ Explain the organization of cache memory.

8.1 INTRODUCTION
In Chapter 7, we discussed various memory management strategies.
All these strategies require the entire process to be in main memory
before its execution. Thus, the size of the process is limited to the
size of physical memory. To overcome this limitation, a memory
management scheme called overlaying can be used that allows a
process to execute irrespective of the system having insufficient
physical memory. This technique, however, suffers from the drawback that it requires a major involvement of the programmer. Moreover, splitting a program into smaller parts is time consuming.
This resulted in the formulation of another memory management
technique known as virtual memory. Virtual memory gives the
illusion that the system has much larger memory than actually
available memory. The basic idea behind this technique is that the
combined size of code, data and stack may exceed the amount of
physical memory. Thus, virtual memory frees programs from the
constraints of physical memory limitation. Virtual memory can be
implemented by demand paging or demand segmentation. Out of
these two ways, demand paging is commonly used as it is easier to
implement.

8.2 BACKGROUND
Whenever a program needs to be executed, it must reside in the
main memory. The programs with size smaller than the size of the
memory can be fitted entirely in the memory at once for execution.
However, the same is not possible for larger programs. Since, in real life, many programs may be larger than the size of the physical memory, there must be some way to execute these programs. By examining real-life programs, it has been observed that it is rarely the case that the entire program is required in the memory all at once for execution. In addition, some portions of the program are rarely or never
executed. For example, consider the following cases:
• Most of the programs consist of code segments that are written to
handle error conditions. Such a code is required to be executed
only when some error occurs during the execution of the program.
If the program is executed without any error, such code is never
executed. Thus, keeping such code segments in the memory is
merely wastage of memory.
• Certain subroutines of a program that provide additional
functionality are rarely used by the user. Keeping such
procedures in the memory also results in the wastage of memory.
A technique called virtual memory tends to avoid such wastage of
main memory. As per this technique, the operating system loads into
the memory only those parts of the program that are currently needed
for the execution of the process. The rest is kept on the disk and is
loaded only when needed. The main advantage of this scheme is that
the programmers get the illusion of much larger memory than
physical memory, thus, the size of the user program would no longer
be constrained by the amount of available physical memory. In
addition, since each user utilizes less physical memory, multiple users
are allowed to keep their programs simultaneously in the memory.
This results in increased utilization and throughput of the CPU.
Figure 8.1 illustrates the concept of virtual memory, where a 64M program can run on a 32M system by keeping only 32M of it in the memory at any instant; the parts of the program are swapped between memory and the disk as needed.

Fig. 8.1 Virtual Memory and Physical Memory

Note: In virtual memory systems, the logical address is referred to as virtual


address and logical address space is referred to as virtual address space.
8.3 DEMAND PAGING
As discussed earlier, the easiest way to implement virtual memory is
through demand paging, where a page is loaded into the memory
only when it is needed during program execution. Pages that are
never accessed are never loaded into the memory. A demand paging
system combines the features of paging with swapping. Thus, it
requires the same hardware as required in paging and swapping.
That is, it needs a secondary storage and page table.
• Secondary storage: To facilitate swapping, the entire virtual
address space of a process is stored contiguously on a
secondary storage device (usually, a disk). Whenever a process
is to be executed, an area on secondary storage device is
allocated to it on which its pages are copied. The area is known
as swap space of the process. During the execution of a
process, whenever a page is required, it is loaded into the main
memory from the swap space. Similarly, when a process is to be
removed from main memory, it is written back into the swap
space if it has been modified.
• Page table: To differentiate the pages that are in memory from those on the disk, an additional valid bit is maintained in each page table entry to indicate whether the page is in the memory or not. If a page is valid (that is, it exists in the virtual address space of the process) and is in the memory, the associated valid bit is set to 1; otherwise, it is set to 0. Figure 8.2
shows the page table in demand paging system.
Note: In demand paging system, the term lazy swapping is used
instead of swapping as the page is never swapped into the memory
until it is needed.
Fig. 8.2 Page Table in Demand Paging System

Whenever a process requests a page, the virtual address is sent to the MMU. The MMU checks the valid bit in the page table
entry of that page. If the valid bit is 1 (that is, the requested page is in
the memory), it is accessed as in paging (discussed in Chapter 7).
Otherwise, the MMU raises an interrupt called page fault or a
missing page interrupt and the control is passed to the page fault
routine in the operating system.
To handle the page fault, the page fault routine first checks whether the virtual address of the desired page is valid, using the information in the PCB stored in the process table. If it is invalid, it terminates the process with an error. Otherwise, it takes the following steps.
1. Locates a free page frame in the memory and allocates it to the process.
2. Swaps the desired page into this allocated page frame.
3. Updates the process table and page table to indicate that the
page is in the memory.
After performing these steps, the CPU restarts from the instruction
that it left off due to the page fault.
Fig. 8.3 Handling a Page Fault

Note: In demand paging system, the process of loading a page in the memory
is known as page-in operation instead of swap-in. It is because the whole
process is not loaded; only some pages are loaded into the memory.

In an extreme case, the operating system could decide to start the execution of a process with no pages in the memory. As soon as the
instruction pointer is set to the first instruction of the process, the
MMU immediately raises a page fault interrupt. Now, the missing
page (page containing the first instruction to be executed) is loaded
into the memory. This technique is known as pure demand paging,
which is based on the idea: ‘never bring a page until it is required’.
In some cases, programs access multiple pages from the memory
with each instruction. This results in multiple page faults in an
instruction. To avoid this and to attain a reasonable performance from
demand paging, programs should have locality of reference which
is discussed in Section 8.7.1.

Advantages
• It reduces the swap time since only the required pages are
swapped in instead of swapping the whole process.
• It increases the degree of multiprogramming by reducing the
amount of physical memory required for a process.
• It minimizes the initial disk overhead as initially not all pages are to
be read.
• It does not need extra hardware support.

8.3.1 Performance of Demand Paging


To determine how demand paging significantly affects the
performance of a computer system, we compute the effective
access time (EAT) for a demand-paged memory. The effective
memory access time can be computed as follows:
EAT = (1 − p) × ma + p × tpfh
where
p (0 ≤ p ≤1) is the probability of a page fault. If p=0, there is no
page fault. However, p=1 implies that every reference is a page
fault. We could expect p to be close to zero, that is, there will be
only a few page faults.
ma is the memory access time
tpfh is page fault handling time
Note that if there are no page faults (that is, p = 0), the EAT is equal to the memory access time, as shown below:
EAT = (1 − 0) × ma + 0 × tpfh = ma
For example, assuming a memory access time of 20 nanoseconds and a page fault handling time of 8 milliseconds, EAT can be calculated as:
EAT = (1 − p) × 20 + p × 8,000,000 ns = (20 + 7,999,980 × p) nanoseconds
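A quick way to see how sensitive EAT is to the page-fault rate is to evaluate the expression above for a few values of p; the following sketch uses the figures of this example:

#include <stdio.h>

int main(void)
{
    double ma   = 20.0;          /* memory access time, in nanoseconds    */
    double tpfh = 8000000.0;     /* page fault handling time: 8 ms in ns  */

    double rates[] = { 0.0, 0.000001, 0.0001, 0.001 };
    for (int i = 0; i < 4; i++) {
        double p   = rates[i];
        double eat = (1.0 - p) * ma + p * tpfh;
        printf("p = %-9g  EAT = %.2f ns\n", p, eat);
    }
    return 0;
}

Even a fault rate of one in a thousand pushes the effective access time from 20 ns to about 8,020 ns, a slowdown by a factor of roughly 400, which shows why the page-fault rate must be kept very low.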

We can observe that EAT is directly proportional to the page-fault rate, that is, the more page faults there are, the higher the EAT, and in turn, the poorer the performance of demand-paged memory.
Thus, to reduce EAT and improve the performance, the number of
page faults must be reduced. Different operating systems use
different approaches to do this. For example, the Windows NT
operating system at the time of page fault loads the missing page as
well as some of the adjacent pages of the process into the main
memory, assuming that these pages will be referenced in the near
future. However, this method is not beneficial if the preloaded pages are not referenced by the process. In contrast, the Linux
operating system allows the process to specify the pages to be
preloaded at the time of page fault. This facility can help the
programmer to improve EAT and the performance of demand-paged
memory.

8.4 PROCESS CREATION


Demand paging enables a process to start faster by just loading the
page containing its first instruction in memory. Virtual memory
provides two other techniques, namely, copy-on-write and memory-
mapped files that make the process creation and execution faster.
8.4.1 Copy-on-Write
Recall from Chapter 2, a process may need to create several
processes during its execution and the parent and child processes
have their own distinct address spaces. If the newly created process is a duplicate of the parent process, it will contain the same pages in its address space as those of the parent. However, if the newly created process needs to load another program into its memory space immediately after creation, then copying the parent's address space may be unnecessary. To avoid such copying, a technique made
available by virtual memory called copy-on-write can be employed.
In this technique, initially, parent and child processes are allowed
to share the same pages and these shared pages are marked as
copy-on-write pages. Now, if either process attempts to write on a
shared page, a copy of that page is created for that process. Note
that only the pages that can be modified (for example, pages
containing data) are marked as copy-on-write pages while the pages
that cannot be modified (for example, pages containing executable
code) are shared between parent and child processes.

Fig. 8.4 Implementing Copy-on-Write


Note: A number of operating systems including Windows XP, Linux and Solaris
support the use of copy-on-write technique.

To implement this, a bit is associated with the page table entry of each shared page to indicate that it is a copy-on-write page. Whenever
either of the processes, say child process, tries to modify a page, the
operating system creates a copy of that page, maps it to the address
space of the child process and turns off the copy-on-write bit. Now,
the child process can modify the copied page without affecting the
page of the parent process. Thus, copy-on-write technique makes the
process creation faster and conserves the memory.

8.4.2 Memory-Mapped Files


Every time a process needs to access a file on disk, it needs to use a
system call such as read(), write() and then access the disk. This
becomes very time consuming as disk access is much slower than
memory access. An alternative technique made available by virtual
memory is memory-mapped file that allows treating disk access as
memory access. The memory mapping of a file by a process means
binding that file to a portion of the virtual address space of the
process. This binding takes place at the time when the process
executes a memory map system call. A portion (equivalent to page
size in memory) of the desired file is mapped to a page in the virtual
memory of the process. After the file has been mapped, the process
can access the file pages in the same way as it would access other
pages in its address space thereby enhancing performance. Figure
8.5 shows the memory mapping of a file myfile by a process P.
Fig. 8.5 Memory-mapped File

Note that if the pages of process P that do not belong to the mapped file myfile are to be paged-in or out of the physical memory,
the virtual memory handler performs this job using the swap space of
process P. On the other hand, in case of read/write to file myfile, the
file system is used in combination with virtual memory handler.
Whenever the process, during execution, writes/modifies data in
mapped pages of the file, the new/modified data is not immediately
written to the file on the disk. In some systems, the operating system periodically checks whether any of the mapped pages of the file have been modified and, if so, updates the file on the disk; other systems may choose to write to the physical file only when the page frame containing the modified page needs to be evicted. Moreover, when
the process executes a memory unmap system call, the pages of
mapped file are removed from the virtual address space of the
process and the modified data (if any) is written to the file on disk.

8.5 PAGE REPLACEMENT


When a page fault occurs, the page fault routine locates a free page frame in the memory and allocates it to the process. However, there
is a possibility that the memory is full, that is, no free frame is
available for allocation. In that case, the operating system has to evict
an existing page from the memory to make space for the desired
page to be swapped in. The page to be evicted will be written to the
disk depending on whether it has been modified or not. If the page
has been modified while in the memory, it is rewritten to the disk;
otherwise no rewrite is needed.
To keep track whether the page has been modified, a modified
(M) bit (also known as dirty bit) is added to each page table entry.
This bit indicates whether the page has been modified. When a page
is first loaded into the memory, this bit is cleared. It is set by the
hardware when any word or byte in the page is written into. At the
time of page replacement, if dirty bit for a selected page is cleared, it
implies that the page has not been modified since it was loaded into
the memory. The page frame is written back to the swap space only if
dirty bit is set.
The system can select a page frame at random and replace it by
the new page. However, if the replaced page is accessed frequently,
then another page fault would occur when the replaced page is
accessed again resulting in degradation of system performance.
Thus, there must be some policy to select a page to be evicted. For
this, there are various page replacement algorithms, which can be
evaluated by determining the number of page faults using a reference
string. A reference string is an ordered list of memory references
made by a process. It can be generated by a random-number generator or by recording the actual memory references made by an
executing program. To illustrate the page replacement algorithms,
consider the reference string as shown in Figure 8.6 for the memory
with three page frames. For simplicity, instead of actual memory
references, we have considered only the page numbers.

Fig. 8.6 Reference String


8.5.1 FIFO Page Replacement
The first-in, first-out (FIFO) is the simplest page replacement
algorithm. As the name suggests, the first page loaded into the
memory is the first page to be replaced. That is, the page is replaced
in the order in which it is loaded into the memory.
To illustrate the FIFO replacement algorithm, consider our
example reference string shown in Figure 8.6. Assuming that initially
all the three frames are empty, the first two references made to page
5 and 0 cause page faults. As a result, they are swapped in the
memory. The third reference made to page 5 does not cause page
fault as it is already in the memory. The next reference made to page
3 causes a page fault and that page is brought in the memory. The
reference to page 2 causes a page fault which results in the
replacement of page 5 as it is the oldest page. Now, the oldest page
is 0, so reference made to page 5 will replace page 0. This process
continues until all the pages of reference string are accessed. It is
clear from Figure 8.7 that there are nine page faults.
To implement this algorithm, each page table entry includes the
time (called swap-in time) when the page was swapped in the
memory. When a page is to be replaced, the page with the earliest
swap-in time is replaced. Alternatively, a FIFO queue can be created
to keep track of all the pages in the memory with the earliest one at
the front and the recent at the rear of the queue. At the time of page
fault, the page at the front of the queue is removed and the newly
arrived page is added to the rear of the queue.
Fig. 8.7 FIFO Replacement Algorithm

The FIFO page replacement algorithm is easier to implement as compared to all other replacement algorithms. However, it is rarely
used as it is not very efficient. Since it does not consider the pattern
of the usage of a page, a frequently used one may be replaced
resulting in more page faults.
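FIFO replacement is easy to simulate. The following sketch counts page faults for an arbitrary reference string (the function name and limits are ours); run with the reference string of Example 1 later in this chapter and three frames, it reports 16 faults, in agreement with that example:

#include <stdio.h>
#include <stdbool.h>

#define MAX_FRAMES 8

/* count page faults for FIFO replacement on a reference string */
int fifo_faults(const int *refs, int n, int num_frames)
{
    int frames[MAX_FRAMES];
    int next = 0, loaded = 0, faults = 0;

    for (int i = 0; i < n; i++) {
        bool hit = false;
        for (int j = 0; j < loaded; j++)
            if (frames[j] == refs[i]) { hit = true; break; }
        if (hit)
            continue;
        faults++;
        if (loaded < num_frames) {
            frames[loaded++] = refs[i];     /* a free frame is available */
        } else {
            frames[next] = refs[i];         /* replace the oldest page   */
            next = (next + 1) % num_frames;
        }
    }
    return faults;
}

int main(void)
{
    int refs[] = { 1,2,3,4,2,1,5,6,2,1,2,3,7,6,3,2,1,2,3,6 };
    int n = (int)(sizeof refs / sizeof refs[0]);
    printf("page faults = %d\n", fifo_faults(refs, n, 3));   /* prints 16 */
    return 0;
}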

Belady’s Anomaly
FIFO page replacement algorithm suffers from Belady’s anomaly—a
situation in which increasing the number of page frames results in
more page faults. To illustrate this, consider the reference string
containing five pages, numbered 2 to 6 (see Figure 8.8). From this
figure, it is clear that with three page frames, a total of nine page
faults occur. On the other hand, with four page frames, a total of ten
page faults occur.
Fig. 8.8 Belady’s Anomaly

8.5.2 Optimal Page Replacement


Optimal page replacement (OPT) algorithm is the best possible page
replacement algorithm. The basic idea behind this algorithm is that
whenever a page fault occurs, some pages are in memory; out of these pages, one will be referenced at the next instruction while other pages may not be referenced until a certain number of instructions have been executed. In case of page replacement, the page whose next reference is farthest away is replaced. That is, the page to be referenced in the most distant future is replaced. For this, each page can be labeled in the
memory with the number of instructions to be executed before that
page is referenced for the first time. The page with the highest label is
replaced from the memory.
To illustrate this algorithm, consider our sample reference string
(see Figure 8.6). Like FIFO, the first two references made to page 5
and 0 cause page faults. As a result, they are swapped into the
memory. The third reference made to page 5 does not cause page
fault as it is already in the memory. The reference made to page 3
causes a page fault and thus the page is swapped into the memory. However, the reference made to page 2 replaces page 3 because page 3 is not required until much later, whereas pages 5 and 0 are required by the next few instructions. The page faults and the pages swapped in and
swapped out for all the page references are shown in Figure 8.9. This
algorithm causes seven page faults.

Fig. 8.9 Optimal Page Replacement Algorithm

The advantage of this algorithm is that it causes the fewest page faults as compared to other algorithms. The disadvantage of this algorithm is that its implementation requires prior knowledge of the future page references, which is generally not available. Though this algorithm is not used in practical systems, it is used as the basis for comparing the performance of other algorithms.
Note: To implement OPT, a program can be executed on a simulator
and all the page references are recorded. Using the page reference
records obtained during first run, it can be implemented at the second
run.

8.5.3 LRU Page Replacement


The least recently used (LRU) algorithm is an approximation to the
optimal algorithm. Unlike optimal algorithm, it uses the recent past
behavior of the program to predict the near future. It is based on the
assumption that the page that has been used in the last few
instructions will probably be referenced in the next few instructions.
Thus, it replaces the page that has not been referenced for the
longest time.
Consider the same sample reference string of Figure 8.6. As a
result of LRU algorithm, the page faults and the pages swapped in
and swapped out for all the page references are shown in Figure
8.10. Up to five references, the page faults are the same as those of the optimal algorithm. When a reference is made to page 2, page 0 is replaced as it was least recently used. However, page 0 is referenced again after page 5, leading to another page fault. Even so, the total number of page faults is eight, which is less than in the case of FIFO.

Fig. 8.10 LRU Page Replacement Algorithm

One way to implement LRU is to maintain a linked list of all the pages in the memory, with the most recently used page kept at the
head and the least recently used page kept at the tail of the list.
Whenever a page is to be replaced, it is deleted from the tail of the
linked list and the new page is inserted at the head of the linked list.
The problem with this implementation is that it requires updating the
list at every page reference irrespective of whether the page fault
occurs or not. It is because whenever a page in the memory is
referenced, being the most recent page, it is removed from its current
position and inserted at the head of the linked list. This results in
extra overhead.
Alternatively, the hardware can be equipped with a counter. This
counter is incremented by one after each instruction. The page table
has a field to store the value of the counter. Whenever a page is
referenced, the current value of the counter is copied to that field in
the page table entry for that page. Whenever a page is to be
replaced, this algorithm searches the page table for the entry having
the lowest counter value (means the least recently used page) and
replaces that page.
Clearly, it causes fewer page faults than the FIFO algorithm. Moreover, it does not suffer from Belady's anomaly. Thus, it is an improvement on the FIFO algorithm and is used in many systems. The disadvantage of this algorithm is that it is time consuming when implemented using a linked list; otherwise, it needs extra hardware support for its implementation.
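The counter-based implementation can be sketched as follows; the "counter" here is a logical clock incremented on every memory reference, and all names are illustrative:

#define NUM_FRAMES 4

static long last_used[NUM_FRAMES];     /* counter value at each frame's last use */
static long clock_counter;             /* incremented on every memory reference  */

/* record a reference to the page held in frame f */
void touch(int f)
{
    last_used[f] = ++clock_counter;
}

/* choose the frame whose page was least recently used */
int lru_victim(void)
{
    int victim = 0;
    for (int f = 1; f < NUM_FRAMES; f++)
        if (last_used[f] < last_used[victim])
            victim = f;
    return victim;
}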
Note: Both the optimal and LRU algorithms belong to a special class
of page replacement algorithms called stack algorithms that never
exhibit Belady’s anomaly.
Example 1 Consider the following page reference string:
1, 2, 3, 4, 2, 1, 5, 6, 2, 1, 2, 3, 7, 6, 3, 2, 1, 2, 3, 6
How many page faults would occur for the following replacement
algorithms, assuming one, two, three and four frames?

(i) LRU replacement


(ii) FIFO replacement
(iii) Optimal replacement

Remember that all frames are initially empty, so your first unique
pages will all cost one fault each.
Solution In case of one page frame, each page reference causes a
page fault. As a result, there will be 20 page faults for all the three
replacement algorithms.
Page frames =2
LRU replacement causes 18 page faults as shown in the figure given
below.

FIFO replacement also causes 18 page faults (see the figure given
below).

Optimal replacement, on the other hand, causes only 15 page faults (see the figure given below).
Page frames =3
LRU replacement causes 15 page faults as shown in figure given
below.

FIFO replacement causes 16 page faults as shown in the figure given below.
Optimal replacement causes 11 page faults as shown in figure given
below.

Page frames = 4
LRU replacement causes 10 page faults as shown in figure given
next.

FIFO replacement causes 14 page faults as shown in the figure given below.
Optimal replacement causes 8 page faults as shown in figure given
below.

8.5.4 Second Chance Page Replacement


The second chance page replacement algorithm (sometimes also
called clock algorithm) is a refinement over FIFO algorithm. It
replaces a page that is both old and unused, instead of simply the oldest page, which may be heavily used. To keep track of the usage
of the page, it uses the reference bit (R) which is associated with
each page. This bit indicates whether the reference has been made
to the page while it is in the memory or not. It is set whenever a page
is accessed for either reading or writing. If this bit is clear for a page, it means that the page has not been used recently.
Whenever a page is to be replaced, this algorithm uses the FIFO
algorithm to find the oldest page and inspects its reference bit. If this
bit is clear, the page is both the oldest and unused and thus,
replaced. Otherwise, the second chance is given to this page and its
reference bit is cleared and its load time is set to the current time.
Then the algorithm moves to the next oldest page using FIFO
algorithm. This process continues until a page is found whose
reference bit is clear. If the reference bit of all the pages is set (that is,
all the pages are referenced), then this algorithm will proceed as pure
FIFO.
This algorithm is implemented using a circular linked list and a
pointer that points to the next victim page. Whenever a page is to be
replaced, the list is traversed until a page whose reference bit is clear
is found. While traversing, the reference bit of each examined page is
cleared. When a page whose reference bit is clear is found, that page
is replaced with the new page and the pointer is advanced to the next
page. For example, consider the reference string with reference bits
shown in Figure 8.11. The algorithm starts with page 5, say at time
t=18. Since the reference bit of this page is set, it is cleared and the
page's load time is reset to the current system time as though it has
just arrived in the memory. The pointer is advanced to the next page,
that is, page 0. The reference bit of this page is clear, so it is replaced
by the new page. The pointer is then advanced to page 3, which will be
the starting point for the next invocation of this algorithm.
Fig. 8.11 Second Chance (Clock) Page Replacement Algorithm
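The victim-selection step of the clock algorithm can be sketched in Python as follows. The data structures (a list of resident pages, a parallel list of reference bits, and a "hand" index) are illustrative simplifications of the circular linked list described above.

```python
def clock_replace(frames, ref_bits, hand, new_page):
    """One victim selection of the second chance (clock) algorithm.

    frames   : list of resident pages arranged in a circle
    ref_bits : parallel list of reference (R) bits
    hand     : index of the next candidate victim
    new_page : page to be brought in
    Returns the new hand position (illustrative sketch).
    """
    while True:
        if ref_bits[hand] == 1:
            # Page referenced recently: give it a second chance.
            ref_bits[hand] = 0
            hand = (hand + 1) % len(frames)
        else:
            # Oldest unused page found: replace it with the new page.
            frames[hand] = new_page
            ref_bits[hand] = 1
            return (hand + 1) % len(frames)

# Mirroring the textual example: page 5 has R set, page 0 has R clear.
frames   = [5, 0, 3]
ref_bits = [1, 0, 1]
hand = clock_replace(frames, ref_bits, hand=0, new_page=7)
# page 5 gets a second chance; page 0 is replaced by the new page 7
```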

8.5.5 Counting-Based Page Replacement Algorithm


Besides the page replacement algorithms discussed earlier, there are
several other algorithms. Some of them keep record of how often
each page has been referenced by associating a counter with each
page. Initially, the value of this counter is 0, which is incremented
every time a reference to that page is made. That is, the counter
counts the number of references that have been made to each page.
A page with the highest counter value is heavily used, while for a
page with the lowest counter value, there are two possible
interpretations:
• That page is least frequently used.
• That page has been just brought in and has yet to be used.
The algorithm based on the first interpretation is known as least
frequently used (LFU) page replacement algorithm. In this
algorithm, when a page is to be replaced, the page with lowest value
of counter is chosen for replacement. Clearly, the page that is heavily
used is not replaced. The problem with this algorithm arises when
there is a page that was used heavily initially, but afterwards never
used again. For example, in a multipass compiler, some pages are
used heavily during pass 1 but may not be required thereafter. Still,
these pages are not replaced because they have a high counter
value. Thus, this algorithm may replace useful pages instead of
pages that are no longer required. The algorithm that is based on the
second interpretation is called the most frequently used (MFU)
page replacement algorithm.
Neither of these algorithms is commonly used, as their
implementation is expensive. Moreover, they do not approximate the
OPT page replacement algorithm.

8.6 ALLOCATION OF FRAMES


We know that physical memory is divided into page frames and, at a
time, only a limited number of frames is available for allocation. In a single-
user system, memory allocation is simple as all free frames can be
allocated to a single process. However, in multiprogramming systems
where a number of processes may reside in main memory at the
same time, the free frames must be divided among the competing
processes. Thus, a decision is to be made on the number of frames
that should be allocated to each process.
A common approach for allocating frames to processes is to use
an allocation algorithm. Several such algorithms can be used; all of
them are constrained in the following ways:
• Each process must be allocated at least some minimum number
of frames; the minimum amount is defined by the computer
architecture.
• The maximum number of frames to be allocated to runnable
processes cannot exceed the total number of free frames.

Allocation Algorithms
Two common algorithms used to divide free frames among
competing processes are as follows:
• Equal allocation: This algorithm allocates available frames to
processes in such a way that each runnable process gets an
equal share of frames. For example, if p frames are to be
distributed among q processes, then each process will get p/q
frames. Though this algorithm seems to be fair, it does not work
well in all situations. For example, let us consider we have two
processes P1 and P2 and the memory requirement of P1 is much
higher than P2. Now, allocating equal number of frames to both P1
and P2 does not make any sense as it would result in wastage of
frames. This is because the process P2 might be allocated more
frames than it actually needs.
• Proportional allocation: This algorithm allocates frames to each
process in proportion to its total size. To understand this
algorithm, let us consider F as the total number of available
frames and vi as the amount of virtual memory required by a
process pi. Therefore, the overall virtual memory, V, required by
all the running processes and the number of frames (ni) that
should be allocated to a process pi can be calculated as follows (see
also the sketch after the note below):
V = Σ vi and ni = (vi / V) × F
Note: The value of ni should be adjusted to an integer greater than
the minimum number of frames, and the sum of all ni should not
exceed F.
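As an illustration, the following Python sketch computes a proportional allocation; the rounding and the handling of the minimum are simplified assumptions made for the sketch only.

```python
def proportional_allocation(sizes_kb, frame_size_kb, free_frames, minimum=1):
    """Allocate free frames to processes in proportion to their size.

    sizes_kb : virtual-memory requirement of each process (in KB)
    Returns a list with the number of frames given to each process.
    (Illustrative sketch; rounding and the minimum are handled naively.)
    """
    demand = [size / frame_size_kb for size in sizes_kb]   # v_i, in frames
    total = sum(demand)                                    # V
    return [max(minimum, int(v / total * free_frames)) for v in demand]

# Example 2 (below): 2 KB frames, 48 free frames, processes of 20 KB and 160 KB
print(proportional_allocation([20, 160], 2, 48))   # -> [5, 42]
```

Running the same function with the data of Example 3 (sizes of 10 KB and 127 KB, 1 KB frames, 62 free frames) gives [4, 57], matching the answer worked out there.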
Example 2 Consider a system with 2 KB frame size and 48 free
frames. If there are two runnable processes P1 and P2 of sizes 20 KB
and 160 KB, respectively, how many frames would be allocated to
each process in each of the following cases?
(a) Equal allocation algorithm
(b) Proportional allocation algorithm
Solution As the frame size is 2 KB, P1 requires 10 frames and P2
requires 80 frames. Thus, a total of 90 frames are required to run
both P1 and P2. As the total number of free frames is only 48, then
(a) According to equal allocation algorithm, both P1 and P2 will be
allocated 24 frames each.
(b) According to proportional allocation algorithm, P1 will be allocated
5 frames (that is, 10/90 × 48) and P2 will be allocated 42 frames (that
is, 80/90 × 48).
Example 3 Consider a system with 1 KB frame size. If a process P1 of
size 10 KB and a process P2 of size 127 KB are running in a system
having 62 free frames, then how many frames would be allocated to
each process in each of the following cases?
(a) Equal allocation algorithm
(b) Proportional allocation algorithm
Which allocation algorithm is more efficient?
Solution Since the frame size is 1 KB, P1 requires 10 frames and P2
requires 127 frames. Thus, a total of 137 frames are required to run
both P1 and P2. As the total number of free frames is only 62, then
(a) According to equal allocation algorithm, both P1 and P2 will be
allocated 31 frames each.
(b) According to proportional allocation algorithm, P1 will be allocated
4 frames (that is, 10/137 × 62) and P2 will be allocated 57 frames
(that is, 127/137 × 62).


Since the process P1 does not need more than 10 frames, giving it
more frames under the equal allocation algorithm is a wastage of frames.
However, proportional allocation algorithm allocates available frames
to each process according to its size. Thus, it is more efficient than
equal allocation algorithm.

Global versus Local Allocation


The allocation of frames to processes is greatly affected by page
replacement algorithm. Recall from Section 8.5, a page replacement
algorithm is used to replace a page from the set of allocated frames,
in case of page fault. The algorithm used for page replacement may
belong to either of the two categories, namely, global replacement
and local replacement. In global replacement algorithm, the victim
page is selected from the set of frames allocated to any of the
multiple running processes. Thus, a global algorithm allows a process
to take frames from any other process, which means the number of
frames allocated to a process may change from time to time.
On the other hand, local replacement algorithm allows selecting
the victim page from the set of frames allocated to the faulty process
(that caused page fault) only. Thus, the number of frames allocated to
a process does not change.

8.7 THRASHING
When a process has not been allocated as many frames as it needs
to support its pages in active use, it causes a page fault. To handle
this situation, some of its pages should be replaced. But since all its
pages are being actively used, the replaced page will soon be
referenced again thereby causing another page fault. Eventually,
page faults would occur very frequently, replacing pages that would
soon be required to be brought back into memory. As a result, the
system would be mostly busy in performing paging (page-out, page-
in) rather than executing the processes. This high paging activity is
known as thrashing. It results in poor system performance as no
productive work is being performed during thrashing.
The system can detect thrashing by evaluating CPU utilization
against the degree of multiprogramming. Generally, as we increase
the degree of multiprogramming, CPU utilization increases. However,
this does not always hold true. To illustrate this, consider the graph
shown in Figure 8.12 that depicts the behaviour of paging systems.
Initially, the CPU utilization increases with increase in degree of
multiprogramming. It continues to increase until it reaches its
maximum. Now, if the number of running processes is still increased,
CPU utilization drops sharply. To enhance CPU utilization at this
point, the degree of multiprogramming must be reduced.

Fig. 8.12 Behaviour of Paging Systems

To understand why this happens, consider a paging system with a few
processes running on it that uses a global page
replacement algorithm. The operating system continuously observes
the system’s CPU utilization. When the CPU utilization is low, it
attempts to improve it by increasing the degree of multiprogramming,
that is, by starting new processes. Now, suppose a running process
requires more frames to continue its execution and thus, causes a
page fault. If all in-memory pages of the faulty process are in active
use, the page replacement algorithm selects the page to be replaced
from the set of frames allocated to other processes. When these
processes require the replaced pages, they also cause page faults,
taking frames from other processes.
Thus, at a time, most processes are waiting for their pages to be
brought in memory and as a result, CPU utilization drops. To improve
the still-lowering CPU utilization, the operating system again
increases the degree of multiprogramming. The new processes are
started by freeing frames from already running processes thereby
resulting in more page faults. The operating system still tries to
improve utilization by increasing degree of multiprogramming.
Consequently, thrashing sets in the system, decreasing the CPU
utilization and, in turn, the system throughput. Now, at this point, the
system must reduce the degree of multiprogramming in order to
eliminate thrashing and improve CPU utilization.

8.7.1 Locality
Thrashing can be prevented if each process is allocated as much
memory (as many frames) as it requires. But how should the operating
system know the memory requirement (the number of frames required)
of a process? The solution to this problem is influenced by two opposing
factors: over-commitment and under-commitment of memory. If a
process is allocated more frames (over-commitment)
than it requires, only a few page faults would occur. The process
performance would be good; however, the degree of
multiprogramming would be low. As a result, CPU utilization and
system performance would be poor. In contrast, under-commitment
of memory to a process causes high page-fault rate (as discussed
earlier) which would result in poor process performance. Thus, for
better system performance, it is necessary to allocate appropriate
number of frames to each process.
A clue about the number of frames needed by a process can be
obtained using the locality model of a process execution. Locality
model states that while a process executes, it moves from locality to
locality. Locality is defined as the set of pages that are actively used
together. It is a dynamic property in the sense that the identity of the
particular pages that form the actively used set varies with time. That
is, the program moves from one locality to another during its
execution.
Note: Localities of a process may coincide partially or wholly.
The principle of locality ensures that not too many page faults
would occur if the pages in the current locality of a process are
present in the memory. However, it does not rule out page faults
totally. Once all pages in the current locality of a process are in the
memory, page fault would not occur until the process changes
locality. On the other hand, if a process has not been allocated
enough frames to accommodate its current locality, thrashing would
result.

8.7.2 Working Set Model


Working set model is an approach used to prevent thrashing; it is
based on the assumption of locality. It uses a parameter (say, n) to
define the working set of a process, which is the set of pages that a
process has referenced in the latest n page references. The notion of
working set helps the operating system to decide how many frames
should be allocated to a process.
Since the locality of a process changes from time to time, so does
its working set. At a particular instant of time, a page in active use is
included in the working set while a page that was referenced before
the most recent n references is not included. Note that the
performance of working set strategy depends to a greater extent on
the value of n. Too large a value would result in over-commitment of
memory to a process; the working set may then contain pages that
will not be referenced again. In contrast, too small a
value would cause under-commitment of memory, which in turn
results in high page-fault rate, and consequently thrashing. Thus, the
value of n must be carefully chosen for the accuracy of working set
strategy.
The most important property of working set is its size, as it
indicates the number of frames required by a process. Knowledge of
the size of the working set of each process helps in computing the
total number of frames required by all the running processes. For
example, if WSSi denotes the size of the working set of a process Pi at
time t, then the total number of frames required (say, V) at time t can
be calculated as:
V = Σ WSSi

Thrashing can be prevented by ensuring V ≤ F, where F denotes
the total number of available frames in the memory at time t.
The idea behind the working set strategy is to have the working
set of processes in the memory at all times in order to prevent
thrashing. For this, the operating system continuously monitors the
working set of each running process and allocates enough frames to
accommodate its working set size. If some frames are still left, the
operating system may decide to increase the degree of
multiprogramming by starting a new process. On the other hand, if at
any instant the operating system finds V > F, it randomly selects
some process and suspends its execution thereby decreasing the
degree of multiprogramming. In totality, the degree of
multiprogramming is kept as high as possible and thus, working set
strategy results in the optimum utilization of the CPU.
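As a simple illustration, the following Python sketch computes the working set and its size (WSS) for a given reference string and window size n; the reference string used in the example is made up purely for demonstration.

```python
def working_set(reference_string, t, n):
    """Working set of a process at time t with window size n.

    The working set is the set of distinct pages referenced in the
    latest n references up to and including time t (illustrative sketch).
    """
    window = reference_string[max(0, t - n + 1): t + 1]
    return set(window)

refs = [1, 2, 1, 3, 4, 4, 4, 2, 5]          # made-up reference string
ws = working_set(refs, t=6, n=4)             # pages referenced in refs[3..6]
print(ws, len(ws))                           # {3, 4} and WSS = 2
```

Summing the working-set sizes of all running processes gives the demand V, which the operating system compares with F as described above.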

8.7.3 Page-fault Frequency (PFF)


PFF is another approach to prevent thrashing that takes into account
the page-fault rate of a process. As we know, a high page-fault rate
for a process means that not enough frames have been allocated to
the process and thus, more frames should be allocated. On the other
hand, a low page-fault rate means the process has been allocated an
excess number of frames and thus, some frames can be removed.
The PFF approach provides an idea of when to increase or decrease
the frame allocation.
Figure 8.13 depicts the desirable page-fault characteristic, which
is a graph of page-fault rate against the number of frames allocated
to the process. It is clear from the figure that the page-fault rate
decreases monotonically as we increase the number of frames. To
control page fault, the PFF approach establishes an upper and lower
limit on the page-fault rate of the processes (Figure 8.13). If a
process during its execution crosses the upper limit, more frames are
allocated to it. If there are no free frames, the operating system must
select some process and suspend its execution. The freed frames are
then distributed among the processes with high page-fault rates.
Conversely, if the page-fault rate of a process falls below the lower
limit, some frames are removed from it. This way the page-fault rate
of the processes can be controlled and as a result, thrashing can be
prevented.

Fig. 8.13 Page-fault Frequency
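The PFF policy can be sketched as a simple control rule, as shown below in Python. The upper and lower limits and the one-frame adjustment step are assumed values chosen only for illustration; they are not prescribed by the text.

```python
def adjust_allocation(page_fault_rate, frames_allocated,
                      free_frames, lower=0.02, upper=0.10):
    """Illustrative sketch of the PFF control policy.

    The limits (lower/upper) and the one-frame step are assumptions.
    Returns the new allocation, or None to signal that some process
    must be suspended because no free frame is available.
    """
    if page_fault_rate > upper:
        if free_frames > 0:
            return frames_allocated + 1    # give the process one more frame
        return None                        # no free frame: suspend some process
    if page_fault_rate < lower and frames_allocated > 1:
        return frames_allocated - 1        # reclaim a frame from the process
    return frames_allocated                # rate within limits: leave it alone
```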

8.8 DEMAND SEGMENTATION


Though demand paging is considered the most efficient virtual
memory implementation, it requires a significant amount of hardware
support. Some systems, such as IBM's OS/2 running on the
Intel 80286 processor, do not have enough hardware support for
implementing demand paging. Such systems implement virtual
memory in the form of demand segmentation. Unlike demand paging,
where a fixed-size page is brought into the main memory, in
demand segmentation, a segment of variable size is brought into the
memory. The working set of segmentation should include at least one
each of code, data, and stack segments.
To indicate whether a segment is present in the main memory, a
valid bit is associated with each segment stored in the segment table.
If it is present in the main memory, the processor can directly
reference its elements from the segment block; if it is not present,
a segment fault exception is generated, which halts the execution of
the current process. Now, the operating system brings the desired
segment in the main memory, and the process resumes its execution
from the same instruction that caused the segment fault. Note that if
sufficient space is not available in the memory to accommodate the
desired segment, the operating system executes the segment
replacement algorithm to make space for the desired segment.
Note: When a segment is brought into the memory, either the entire
segment is brought in or nothing at all.

Advantages and Disadvantages


The main advantage of demand segmentation is that it inherits the
benefits of protection and sharing provided by segmentation (as
discussed in Section 7.5.2). However, its main drawback is that since
the segments are of variable sizes, it is difficult to create appropriate-
sized memory blocks for them in the main memory. Thus, segment
replacement algorithms in case of demand segmentation are quite
complex. In addition, sometimes a problem may arise when the
segment size is too large to fit in the physical memory. For
handling such cases, some systems combine segmentation with
paging in which segment is further divided into fixed-size pages that
can be easily loaded into the fixed-size frames in the main memory.

8.9 CACHE MEMORY ORGANIZATION


Cache is a small but very high speed memory that aims to speed up
the memory access operation. It is placed between the CPU and the
main memory (see Figure 8.14). The cache memory is organized to
store the frequently accessed data and instructions so that the
average memory access time can be reduced, thereby reducing the
total execution time of the program. The cache memory tends to
reduce the access time of main memory by a factor of 5 to 10.

Fig. 8.14 Cache Memory

The cache organization is concerned with the transfer of


information between CPU and main memory. Whenever the
processor needs to access any memory word, first of all, the word is
searched in the cache memory. If the required word is present in the
cache (referred to as cache hit), it is read from there. However, if the
word is not found in cache (referred to as cache miss), then the main
memory is accessed to read the word. The block of words containing
the required word is then transferred from the main memory to the
cache memory so that the next time this word is required, it can be
accessed from the cache, thus reducing the memory access time. The
block size can vary from one word to 16 words adjacent to the
required word. If the cache is already full, an existing block is
overwritten; otherwise, the new block is simply added.

8.9.1 Terminologies Related to Cache


Various terminologies related to cache are discussed in this section.
• Cache hit: Whenever a read operation is to be performed, the
cache memory is examined at first. If the required data are found
in cache, a cache hit is said to have occurred. In that case, the
processor immediately fetches the word from the cache and
performs the operation.
• Cache miss: If the data are not found in the cache, it is termed a
cache miss.
• Cache hit time: The time taken to access the data from cache in
case of cache hit is known as cache hit time.
• Cache miss time penalty: In case of cache miss, the required
data are to be fetched from the main memory, transferred to
cache and then accessed. The time required to fetch the required
block from the main memory is called cache miss time penalty.
• Hit ratio: It is defined as the number of cache hits divided by the
total number of CPU references (including both cache hits and
cache misses), as given here:
Hit ratio = Number of cache hits / Total number of CPU references
The hit ratio always lies in the closed interval of 0 and 1. A
sufficiently high value of hit ratio indicates that most of the time
the processor accesses cache memory instead of the main
memory. Thus, the access time becomes approximately equal to
the access time of the cache memory.
• Miss ratio: It is defined as the number of cache misses divided by
the total number of CPU references, as given here:
Miss ratio = Number of cache misses / Total number of CPU references
Alternatively, it can be obtained by subtracting the hit ratio from 1:
Miss ratio = 1 – hit ratio

8.9.2 Impact on Performance


When cache memory is employed between the CPU and the main
memory, the CPU first examines the cache for the required word. In
case of cache hit, the CPU accesses only the cache and the main
memory is not accessed. On the other hand, in case of cache miss,
the CPU has to access the main memory as well as the cache. Let tC
and tM denote the time taken to access cache and main memory,
respectively, and h denote the hit ratio. Then, the average access
time (say, tA) can be computed using the following equation:
tA = h × tC + (1 – h) × (tC + tM)
where (1 – h) denotes the miss ratio.
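As a quick illustration, the following Python sketch evaluates this equation under the stated assumption that a miss accesses both the cache and the main memory; the timing values used in the example are arbitrary.

```python
def average_access_time(hit_ratio, t_cache, t_main):
    """Average memory access time with a cache (sketch of the equation above).

    On a hit only the cache is accessed; on a miss both the cache and
    the main memory are accessed, as described in the text.
    """
    return hit_ratio * t_cache + (1 - hit_ratio) * (t_cache + t_main)

# e.g. 90% hit ratio, 10 ns cache, 100 ns main memory
print(average_access_time(0.9, 10, 100))   # 9 + 0.1 * 110 = 20.0 ns
```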


A high number of cache misses decreases the performance of
CPU. Thus, to increase the CPU performance, the number of cache
misses must be reduced. This can be accomplished by increasing the
size of cache memory. A larger cache can hold more information;
therefore, it is more likely that the CPU finds the required data in the
cache itself. As a result, the number of cache hits increases and the
average access time is reduced. Thus, a large cache can help in
improving the performance of CPU.
Unfortunately, in some cases, large cache sizes can even lower
the performance. With the large cache size, the number of locations
in cache also increases and the CPU has to search for the required
piece of information through all these locations. Thus, the larger size
also tends to slow down the cache speed. Therefore, the cache size
should be optimum so that the performance of the CPU can be
maintained.

8.9.3 Advantages and Disadvantages of Cache Memory


Cache memory is used for accessing data speedily, that is, to minimize
the average access time. It is used to compensate for mismatch in
the speed of the main memory and the CPU. Cache memory has
several advantages. Some of these are as follows.
• CPU has direct access to cache memory. Thus, the data that
resides in cache can be accessed without any delay and at less
processing cost.
• Cache increases the performance and the processing speed of the
CPU.
• In the case of shared-memory multiprocessor architecture, where
all the processors share a common memory of the server, but
have their own private caches, the reliability is increased. This is
because when the server is down, all the processors can directly
access the frequently accessed data from their own caches.
• The use of cache results in increased CPU utilization due to faster
accessing of data.
Although cache memory has several advantages, there are some
disadvantages as well. Some of them are as follows.
• Since cache is small in size, the cost per bit of storing data is high.
Thus, the use of cache memory increases the hardware cost.
• Cache suffers from the problem of incoherency, that is,
inconsistent data in case of multiprocessor system. This is
because each of the processors has its own private cache along
with shared resources. If the same value of data exists in multiple
caches, the changes made in one cache are required to be made
in the other copies of cache also. However, absence of identical
changes creates the problem of inconsistent data.
• The cache memory suffers from the lack of synchronization in the
client-server system, when multiple accesses can be made on the
database. This problem occurs when there is no proper
integration between the cache and the RDBMS used. The single
access on the database does not create a problem. However,
another access by a different application can cause problem.

LET US SUMMARIZE
1. Virtual memory is a technique that enables to execute a program which is
only partially in memory. The virtual memory can be implemented by
demand paging or demand segmentation.
2. In demand paging, a page is loaded into the memory only when it is
needed during program execution. Pages that are never accessed are
never loaded into the memory.
3. Whenever a process requests for a page and that page is not in the
memory then MMU raises an interrupt called page fault or a missing page
interrupt.
4. A reference string is an ordered list of memory references made by a
process.
5. A technique made available by virtual memory called copy-on-write makes
the process creation faster and conserves memory.
6. The first-in, first-out (FIFO) is the simplest page replacement algorithm. As
the name suggests, the first page loaded into the memory is the first page
to be replaced.
7. The optimal page replacement (OPT) algorithm is the best possible page
replacement algorithm in which the page to be referenced in the most
distant future is replaced.
8. The least recently used (LRU) algorithm is an approximation to the optimal
algorithm in which the page that has not been referenced for the longest
time is replaced.
9. The second chance page replacement algorithm (sometimes also referred
to as clock algorithm) is a refinement over FIFO algorithm; it replaces the
page that is both the oldest as well as unused, instead of the oldest page
that may be heavily used.
10. The least frequently used (LFU) algorithm replaces the page that is least
frequently used.
11. The most frequently used (MFU) algorithm replaces the page with the
highest reference count, on the argument that the page with the lowest
count has probably just been brought in and has yet to be used.
12. In multiprogramming systems where a number of processes may reside in
the main memory at the same time, the free frames must be divided
among the competing processes. Thus, a decision is to be made on the
number of frames that should be allocated to each process.
13. Two common algorithms used to divide free frames among competing
processes include equal allocation and proportional allocation algorithm.
14. Equal allocation algorithm allocates available frames to the processes in
such a way that each runnable process gets an equal share of frames
while proportional allocation algorithm allocates frames to each process in
proportion to its total size.
15. A situation when the system is mostly busy in performing paging (page-
out, page-in) rather than executing the processes is known as thrashing. It
results in poor performance of the system as no productive work is
performed during thrashing.
16. A clue about the number of frames needed by a process can be obtained
using the locality model of a process execution. The locality model states
that while a process executes, it moves from locality to locality. Locality is
defined as the set of pages that are actively used together.
17. Working set model is an approach used to prevent thrashing, and is based
on the assumption of locality. It uses a parameter (say, n) to define the
working set of a process, which is the set of pages that a process has
referenced in the latest n page references.
18. PFF is another approach to prevent thrashing that takes into account the
page-fault rate of a process. This approach provides an idea of when to
increase or decrease the frame allocation.
19. In demand segmentation, a segment of variable size is brought into the
memory. The working set of segmentation should include at least one
each of code, data, and stack segments.
20. The main advantage of demand segmentation is that it inherits the benefits
of protection and sharing provided by segmentation.
21. Cache is a small but very high-speed memory that aims to speed up the
memory access operation. It is placed between the CPU and the main
memory.
22. The cache organization concerns itself with the transfer of information
between the CPU and the main memory.

EXERCISES
Fill in the Blanks
1. _____________ is a technique that enables to execute a program which is
only partially in memory.
2. Whenever a process is to be executed, an area on secondary storage
device is allocated to it on which its pages are copied. The area is known
as _____________ of the process.
3. A _____________ is an ordered list of memory references made by a
process.
4. The algorithm based on the interpretation that a page has just been brought in
and has yet to be used is _____________.
5. The system can detect thrashing by evaluating _____________ against
the _____________.

Multiple Choice Questions


1. Which of the following memory management strategies combines the
features of paging system with swapping?
(a) Demand segmentation
(b) Demand paging
(c) Both (a) and (b)
(d) None of these
2. Which of the following algorithms suffers from Belady’s anomaly?
(a) Optimal page replacement
(b) FIFO page replacement
(c) LRU page replacement
(d) None of these
3. The set of pages that are actively used together is known as
_____________.
(a) Locality
(b) Thrashing
(c) Cache
(d) None of these
4. Which of the following approaches is used to prevent thrashing?
(a) Equal allocation
(b) Working set model
(c) Page-fault frequency
(d) Both (b) and (c)
5. If the data is not found in cache, it is termed as _____________.
(a) Cache hit
(b) Cache miss
(c) Cache hit time
(d) Hit ratio

State True or False


1. The basic idea behind virtual memory is that the combined size of code,
data and stack cannot exceed the amount of physical memory.
2. The purpose of copy-on-write technique is to conserve memory by
avoiding copying of parent’s address space.
3. The LRU page replacement algorithm is also referred to as clock
algorithm.
4. Each process must be allocated at least some minimum number of
frames; the minimum amount is defined by the memory manager.
5. Cache memory is used for accessing the data speedily or to minimize the
average access time.

Descriptive Questions
1. Explain the concept of virtual memory.
2. What is demand paging? What are its advantages? Explain how it affects
the performance of a computer system.
3. When does a page fault occur? Mention the steps that are taken to handle
page fault.
4. Discuss the hardware support for demand paging.
5. Explain the algorithm used to minimize number of page faults.
6. Explain process creation.
7. ‘Copy-on-write technique makes the creation of process faster and
conserves memory.’ Explain.
8. What is the need of page replacement algorithms?
9. What is Belady’s anomaly? Does LRU replacement algorithm suffer from
this anomaly? Justify your answer with an example.
10. What are memory-mapped files?
11. Discuss the advantages and disadvantages of optimal page replacement
algorithm.
12. Compare LRU and FIFO page replacement algorithms.
13. Which algorithm is used as the basis for comparing performance of other
algorithms?
14. Discuss the two algorithms used for allocating physical frames to
processes.
15. Differentiate between global and local allocation.
16. How does the system keep track of modification of pages?
17. Write a short note on demand segmentation. How is it different from
demand paging?
18. Consider the following reference string consisting of 7 pages from 0 to 6.

Determine how many page faults would occur in the following algorithms:
(a) FIFO replacement
(b) Optimal replacement
(c) LRU replacement assuming one, two, three, and four frames.
19. Consider Figure 8.11(b) and suppose that R bits for the pages are 111001.
Which page will be replaced using second chance replacement
algorithm?
20. What will be the effect of setting the value of parameter n (in working-set
model) either too low or too high on the page-fault rate?
21. What is thrashing? Explain the approaches that can be used to prevent
thrashing.
22. What is cache memory? Explain its organization. Also, list its advantages
and disadvantages.
chapter 9

I/O Systems

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ Understand the basics of I/O hardware.
⟡ Explain how various I/O services are embodied in the application I/O
interface.
⟡ Discuss the services provided by the kernel I/O subsystem.
⟡ Describe STREAMS mechanism of the UNIX System V.
⟡ Explain how I/O requests are transformed to hardware operations.
⟡ Understand different factors affecting the performance of I/O.

9.1 INTRODUCTION
I/O and processing are the two main jobs that are performed on a
computer system. In most cases, the user is interested in I/O
operations rather than in processing. For example, while working in MS
Word, the user is interested in reading, entering or printing some
information, not in computing any answer. Thus, controlling and
managing the I/O devices and the I/O operations is one of the main
responsibilities of an operating system. The operating system must
issue commands to the devices to work, provide a device-
independent interface between devices and the rest of the system,
handle errors or exceptions, catch interrupts, etc.
Since a variety of I/O devices with varying speeds and
functionality are attached to the system, providing a device-
independent interface is a major challenge for operating system
designers. To meet this challenge, designers use a combination of
hardware and software techniques. The basic I/O hardware elements
include ports, buses, and device controllers that can accommodate a
wide variety of I/O devices. In addition, various device-drivers
modules are provided with the operating system kernel to
encapsulate the details and peculiarities of different devices. This
forms an I/O subsystem of the kernel, which separates the
complexities of managing I/O devices from the rest of the kernel.

9.2 I/O HARDWARE


A computer communicates with a variety of I/O devices ranging from
mouse, keyboard, and disk to highly specialized devices like fighter
plane steering. These devices are broadly classified into three
categories:
• Human-interface devices: These devices facilitate
communication between the user and the computer. For example,
keyboard, mouse, monitor, printer, etc.
• Storage devices: These devices are used for permanent storage
of data and information. For example, disks, tapes, etc.
• Network or transmission devices: These devices allow
communication with remote systems and devices. For example,
modem, Bluetooth, etc.
All these devices differ from each other in various aspects. Some
of the key aspects that distinguish each device from the other include
data transfer rate (varying from 10 to 10⁹ bps), unit of data transfer
(block or character), data representation (character coding and parity
conventions), error conditions, and complexity of control.
Despite the wide variety of devices, one needs to know only a few
concepts to understand how the devices are attached to the
computer, and how the operating system controls the hardware.
As already stated, the basic I/O hardware elements include ports,
buses, and device controllers. A port is a connection point through
which a device is attached to the computer system. It could be a
serial port or parallel port. After being attached, the device
communicates with the computer by sending signals over a bus. A
bus is a group of wires that specifies a set of messages that can be
sent over it. Recall Figure 1.6 of Chapter 1 that shows the
interconnection of various components to the computer system via a
bus. Note that the connection of some devices to the common bus is
shown via a device controller. A device controller (or adapter) is an
electronic component that can control one or more identical devices
depending on its type. For example, a serial-port controller is
simple and controls the signals sent to the serial port. On the other
hand, SCSI controller is complex and can control multiple devices.
Some devices have their own built-in controller. For example, disk
drive has its own disk controller, which consists of microcode and a
processor to do many tasks such as buffering, caching, and bad-
sector mapping.
Each device controller includes one or more registers that play an
important role in communication with the processor. The processor
writes data in these registers to let the device take some action and
reads data from these registers to know the status of the device.
There are two approaches through which the processor and controller
can communicate.
• In the first approach, some special I/O instructions are used,
which specify the read or write signal, I/O port address, and CPU
register. The I/O port address helps in the selection of correct
device.
• The second approach is memory-mapped I/O. In this, registers of
device are mapped into the address space of the processor and
standard data-transfer instructions are used to perform read/write
operation with the device.
Some systems use both the techniques. For example, in a PC I/O
instructions are used to control some I/O devices, whereas memory-
mapped I/O is used to control other devices.

9.3 I/O TECHNIQUES


There are basically three different ways to perform I/O operations
including programmed I/O (polling), interrupt-driven I/O, and direct
memory access (DMA). All these techniques are discussed in this
section.

9.3.1 Polling
A complete interaction between a host and a controller may be
complex, but the basic abstract model of interaction can be
understood by a simple example. Suppose that a host wishes to
interact with a controller and write some data through an I/O port. It is
quite possible that the controller is busy in performing some other
task; hence, the host has to wait before starting an interaction with
the controller. When the host is in this waiting state, we say the host
is busy-waiting or polling.
Note: Controllers are programmed to indicate something or
understand some indications. For example, every controller sets a
busy bit when busy and clears it when gets free.
To start the interaction, the host continues to check the busy bit
until the bit becomes clear. When the host finds that the busy bit has
become clear, it writes a byte in the data-out register and sets the
write bit to indicate the write operation. It also sets the command-
ready bit to let the controller take action. When the controller notices
that the ready bit is set, it sets the busy bit and starts interpreting the
command. As it identifies that write bit is set, it starts reading the
data-out register to get the byte and writes it to the device. After this,
the controller clears the ready bit and busy bit to indicate that it is
ready to take the next instruction. In addition, the controller also
clears an error bit (in the status register) to indicate successful
completion of the I/O operation.
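The host's side of this handshake can be sketched as follows in Python. The register and bit names are taken from the description above; the controller object and its attributes are hypothetical stand-ins, since real programmed I/O would use port or memory-mapped register accesses rather than Python attributes.

```python
# Hypothetical status-register bits, following the description above.
BUSY, COMMAND_READY, WRITE, ERROR = 0x1, 0x2, 0x4, 0x8

def polled_write(controller, byte):
    """Write one byte to a device using programmed I/O (polling sketch)."""
    # 1. Busy-wait (poll) until the controller clears its busy bit.
    while controller.status & BUSY:
        pass
    # 2. Place the byte in the data-out register and request a write.
    controller.data_out = byte
    controller.status |= WRITE
    # 3. Set the command-ready bit so the controller starts the operation.
    controller.status |= COMMAND_READY
    # 4. Wait for the controller to finish, then check the error bit.
    while controller.status & BUSY:
        pass
    return not (controller.status & ERROR)
```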
9.3.2 Interrupt-driven I/O
The above scheme for performing interaction between the host and
controller is not always feasible, since it requires busy-waiting by the
host. When either the controller or the device is slow, this waiting time
may be long. In that scenario, the host must switch to another task.
However, if the host switches to another task and stops checking the
busy bit, how would it come to know that the controller has become
free?
One solution to this problem is that the host must check the busy
bit periodically and determine the status of the controller. This
solution, however, is not feasible because in many cases the host
must service the device continuously; otherwise the data may be lost.
Another solution is to arrange the hardware with which a controller
can inform the CPU that it has finished the work given to it. This
mechanism of informing the CPU about completion of a task (rather
than CPU inquiring the completion of task) is called interrupt. The
interrupt mechanism eliminates the need of busy-waiting of processor
and hence is considered more efficient than the previous one.
Now let us understand, how the interrupt mechanism works. The
CPU hardware has an interrupt-request line, which the controllers use
to raise an interrupt. The controller asserts a signal on this line when
the I/O device becomes free after completing the assigned task. As
the CPU senses the interrupt-request line after executing every
instruction, it frequently comes to know that an interrupt has occurred.
To handle the interrupt, the CPU performs the following steps.
1. It saves the state of current task (at least the program counter)
so that the task can be restarted (later) from where it was
stopped.
2. It switches to the interrupt-handling routine (at some fixed
address in memory) for servicing the interrupt. The interrupt
handler determines the cause of interrupt, does the necessary
processing, and causes the CPU to return to the state prior to
the interrupt.
The above discussed interrupt-handling mechanism is the ideal
one. However, in modern operating systems, the interrupt-handling
mechanism must accommodate the following features.
• High-priority interrupts must be identified and serviced before low-
priority interrupts. If two interrupts occur at the same time, the
interrupt with high-priority must be identified and serviced first.
Also, if one interrupt is being serviced and another high-priority
interrupt occurs, the high-priority interrupt must be serviced
immediately by preempting the low-priority interrupt.
• The CPU must be able to disable the occurrence of interrupts.
This is useful when CPU is going to execute those instructions of
a process that must not be interrupted (like instructions in the
critical section of a process). However, disabling all the interrupts
is not a right decision. This is because the interrupts not only
indicate the completion of task by a device but many exceptions
also such as an attempt to access non-existent memory address,
divide by zero error, etc. To resolve this, most CPUs have two
interrupt-request lines: maskable and non-maskable interrupts.
Maskable interrupts are used by device controllers and can be
disabled by the CPU whenever required but non-maskable
interrupts handle exceptions and should not be disabled.

9.3.3 Direct Memory Access (DMA)


Devices like disk drives are frequently involved in transferring large
amounts of data. Keeping the CPU busy transferring the data one
byte at a time from such devices is clearly a waste of the CPU's
precious time. To avoid this, a scheme called Direct Memory Access
(DMA) is often used in systems. Note that for using DMA, the
hardware must have a DMA controller, which most systems have. In
DMA, the CPU assigns the task of transferring data to DMA controller
and continues with other tasks. The DMA controller can access the
system bus independent of CPU so it transfers the data on its own.
After the data has been transferred, it interrupts the CPU to inform
that the transfer is completed.
Note: Some systems have a separate DMA controller for each
device, whereas some systems have a single DMA controller for
multiple devices.
DMA works as follows. The CPU tells the DMA controller the
source of data, destination of data, and the byte count (the number
of bytes to transfer) by setting up several registers of the DMA
controller. The DMA controller then issues the command to the disk
controller to read the data into its internal buffer and verifies that no
read error has occurred. After this, the DMA controller starts
transferring data by placing the address on the bus and issuing read
request to disk controller. Since the destination memory address is on
the address lines of the bus, the disk controller reads data from its
buffer and writes to the destination address. After the data has been
written, the disk controller acknowledges DMA controller by sending a
signal over the bus. The DMA controller then increments the memory
address to use and decrements the byte count. If the byte count is
still greater than 0, the incremented memory address is placed over
the address lines of bus and read request is issued to the disk
controller. The process continues until the byte count becomes 0.
Once the transfer has been completed, the DMA controller generates
an interrupt to the CPU. The entire process is illustrated in Figure 9.1.
Fig. 9.1 Transferring Data using DMA

Note that when the DMA controller acquires the bus for
transferring data, the CPU has to wait for accessing the bus and the
main memory; though it can access cache. This mechanism is called
cycle stealing and it can slightly slow down the CPU. However, it is
important to note that a large amount of data gets transferred
with negligible involvement of the CPU. Hence, DMA is a very
good approach for keeping the CPU available for other tasks.
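The transfer loop performed by the DMA controller can be mimicked with a small simulation, as sketched below in Python. The lists standing in for main memory and for the disk controller's buffer are, of course, only stand-ins for real hardware registers and buses.

```python
def dma_transfer(disk_buffer, memory, dest_address, byte_count):
    """Simulation of the DMA controller's transfer loop described above.

    disk_buffer  : bytes already read into the disk controller's buffer
    memory       : a mutable list standing in for main memory
    dest_address : destination address set up by the CPU
    byte_count   : number of bytes to transfer
    The CPU is only involved again when the final interrupt is raised.
    """
    address = dest_address
    while byte_count > 0:
        # One bus transfer: the disk controller writes to the address
        # currently placed on the address lines by the DMA controller.
        memory[address] = disk_buffer[address - dest_address]
        address += 1          # DMA controller increments the memory address
        byte_count -= 1       # ... and decrements the byte count
    return "interrupt CPU"    # transfer complete: interrupt the CPU

memory = [0] * 16
print(dma_transfer(list(b"data"), memory, dest_address=4, byte_count=4))
```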

9.4 APPLICATION I/O INTERFACE


As stated earlier, there is a variety of I/O devices that can be attached
with the computer system and the operating system has to deal with
all these devices. However, it is almost impossible that operating
system developer would write separate code to handle every distinct
I/O device. Because then manufacturing of each new device would
lead to changing or adding some code in the operating system.
Clearly, this is not a feasible solution. Instead, I/O devices are
grouped under a few general kinds. For each general kind, a
standardized set of functions (called interface) is designed through
which the device can be accessed. The differences among the I/O
devices are encapsulated into the kernel modules called device
drivers. Note that a device driver is specific to each device and each
device driver exports one of the several standard interfaces. Figure
9.2 shows several software layers in the kernel I/O structure.

Fig. 9.2 Layers in the Kernel I/O Structure

With this structure implemented, the I/O subsystem becomes


independent of the hardware. Thus, the device can be accessed
through one of the standard interfaces and independent of the device
itself. Also, when hardware manufacturers create a new device, they
either make it compatible with one of the several available device
drivers or write the new device driver exporting one of the several
standard interfaces.

Block and Character Devices


A block device stores data in fixed-size blocks with each block
having a specific address. The data transfer to/from a block device is
performed in units of blocks. An important characteristic of block
devices is that they allow each block to be accessed independently
regardless of the other blocks. Some commonly used block devices
include hard disks, USB memory sticks, and CD-ROMs. Applications
can interact with block devices through the block-device interface,
which supports the following basic system calls.
• read(): To read from the device.
• write(): To write to the device.
• seek(): To specify the next block to be accessed.
On the other hand, a character device is the one that accepts
and produces a stream of characters. Unlike block devices, character
devices are not addressable. The data transfer to/from them is
performed in units of bytes. Some commonly used character devices
include keyboards, mice, and printers. Applications can interact with a
character device through the character-stream interface, which
supports the following basic system calls.
• get(): To read a character from the device.
• put(): To write a character to the device.
In addition to block and character devices, there are certain
devices that do not fit under any of these categories. For example,
clocks and timers are such devices. They are neither block
addressable nor do they accept or produce character streams; rather
they are only used to generate interrupts after some specified
interval.
Note: Though block devices allow random access, some applications
(for example, DBMS) may access the block device as a sequential
array of blocks. This type of access is referred to as raw I/O.
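The idea of one standard interface per device kind can be sketched as abstract classes, as shown below in Python. The class and method names simply mirror the calls listed above and are not taken from any real kernel.

```python
from abc import ABC, abstractmethod

class BlockDevice(ABC):
    """Sketch of the block-device interface (read/write/seek)."""
    @abstractmethod
    def seek(self, block_number): ...   # choose the next block to access
    @abstractmethod
    def read(self): ...                 # return one block
    @abstractmethod
    def write(self, block): ...         # write one block

class CharacterDevice(ABC):
    """Sketch of the character-stream interface (get/put)."""
    @abstractmethod
    def get(self): ...                  # return one character
    @abstractmethod
    def put(self, ch): ...              # write one character
```

A device driver would then export one of these interfaces while hiding the peculiarities of its particular hardware behind it.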

Network Devices
Since the performance and addressing characteristics of network I/O
are different from those of disk I/O, the interface provided for network I/O
is also different. Unlike read(), write() and seek() interface for disks,
the socket interface is provided for network I/O. This interface is
provided in most of the operating systems including UNIX and
Windows NT.
The socket interface consists of various system calls that enable
an application to perform the following tasks:
• To create a socket
• To connect a local socket to a remote address
• To listen to any remote application
• To send and receive packets over the connection.
The socket interface also provides a select() function that returns
information about the sockets. When this function is called, it reports
which sockets have space to accept a packet to be sent and which
sockets have a packet waiting to be received. This eliminates the
need for polling and busy waiting.
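A minimal, runnable sketch of these calls using Python's standard socket and select modules is given below; it sets up a loop-back connection within a single process purely for illustration.

```python
import socket, select

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # create a socket
server.bind(('127.0.0.1', 0))                                 # any free local port
server.listen(1)                                              # listen for a connection

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())                          # connect to a "remote" address
conn, _ = server.accept()
client.sendall(b'request packet')                             # send a packet

# select() reports which sockets have data waiting to be received, so the
# application does not have to poll or busy-wait.
readable, _, _ = select.select([conn], [], [], 5.0)
if readable:
    print(conn.recv(4096))                                    # receive the packet

for s in (conn, client, server):
    s.close()
```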

Clocks and Timers


Clocks and timers are needed by the operating system and time-
sensitive applications. Clocks and timers are used for the following:
• Setting a timer at time T to trigger X operation
• Getting the current time
• Getting the elapsed time.
These functions are mainly used by the operating system, and the
time-sensitive applications. However, no standardized system calls
are available across operating systems to implement these functions.
The hardware device used to trigger the operations and to measure
the elapsed time is known as programmable interval timer. This
device can be set to wait for a certain amount of time and then
generate interrupt; or it can be set to repeat the process to generate
interrupts periodically. Different components of the operating system
can use this timer for different purposes. For example:
• Scheduler can use it to generate interrupts for preempting the
processes at the end of their time slice.
• Network subsystem can use it to terminate the operations that are
operating too slowly because of network congestion or failure.
• I/O subsystem can use it to periodically flush the dirty cache
buffers to the disk.

Blocking and Non-blocking I/O


An operating system may use blocking or non-blocking I/O system
calls for application interface. The blocking I/O system call causes
the invoking process to remain blocked until the call is completed.
The process is removed from the run queue and placed into the wait
queue. Once the call has been completed, it is put back into the run
queue and the results returned by the system call are communicated
to it.
On the other hand, the non-blocking I/O system calls do not
suspend the execution of the invoking process for a long period;
rather they return quickly with a return value which indicates the
number of bytes that have been transferred. An alternative to non-
blocking I/O is asynchronous I/O where the invoking process need
not wait for I/O completion; rather it can continue its execution. When
the system call is completed, some signal or interrupt is generated,
and the results returned by the system call are provided to the
process.
Note: Most operating systems prefer to use blocking system calls as
their code is comparatively easier to use and understand.

9.5 KERNEL I/O SUBSYSTEM


The main concern of operating-system designers is the control of
devices attached to the computer. A wide variety of methods are used
to control these devices. These methods altogether form the I/O
subsystem of the kernel. The kernel I/O subsystem is responsible for
providing various I/O-related services, which include scheduling,
buffering, caching, and so on. In this section, we will discuss some of
these services.

9.5.1 I/O Scheduling


I/O scheduling means deciding the order in which the I/O requests
should be executed. Like process scheduling, I/O scheduling also
tends to improve the overall system performance. Thus, the I/O
requests from different processes should be scheduled in such a
manner that each process should get a fair share of an I/O device
and has to wait for the least possible time for I/O completion.
To implement I/O scheduling, a wait-queue mechanism is used.
As we have already studied in Chapter 2, for each I/O device in the
system, a queue is maintained. Whenever a process invokes a
blocking I/O system call, it is kept into the queue of that specific I/O
device. Now, depending on the application, the I/O scheduler may
use an appropriate scheduling algorithm to select the request from
the queue. For instance, it may use a priority-based algorithm to
serve I/O requests from a critical application on a priority-basis as
compared to less critical applications. Note that the I/O scheduler can
also rearrange the I/O requests in the queue for improving the
system’s performance.

9.5.2 Buffering
A buffer is a region of memory used for holding streams of data
during data transfer between an application and a device or between
two devices. Buffering serves the following purposes in a system.
• The speeds of the producer and the consumer of data streams
may differ. If the producer produces items faster than the consumer
can consume them (or vice versa), the consumer (or the producer)
would spend most of its time waiting (see the sketch after this
list). To cover up this speed mismatch between the
producer and consumer, buffering may be used. Both producer
and consumer share a common buffer. The producer produces an
item, places it in the buffer and continues to produce the next item
without having to wait for the consumer. Similarly, the consumer
can consume the items without having to wait for the producer.
However, due to fixed size of the buffer, the producer and
consumer still have to wait in case of full and empty buffer,
respectively. To resolve this, double buffering may be used
which allows sharing of two buffers between the producer and the
consumer thereby relaxing the timing requirements between
them.
• The sender and receiver may have different data transfer sizes. To
cope with such disparities, buffers are used. At the sender’s side,
large data is fragmented into small packets, which are then sent
to the receiver. At the receiver’s side, these packets are placed
into a reassembly buffer to produce the source data.
• Another common use of buffering is to support copy semantics for
application I/O. To understand the meaning of copy semantics,
consider that an application invokes the write() system call for
data in the buffer associated with it to be written to the disk.
Further, suppose that after the system call returns, the
application changes the contents of the buffer. As a result, the
version of the data meant to be written to the disk is lost. But with
copy semantics, the system can ensure that the appropriate
version of the data would be written to the disk. To ensure this, a
buffer is maintained in the kernel. At the time, the application
invokes write() system call, the data is copied to the kernel
buffer. Thus, any subsequent changes in the application buffer
would have no effect.
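The producer-consumer use of a buffer described in the first point above can be sketched with a bounded queue, as shown below in Python; the buffer size and item counts are arbitrary values chosen for the sketch.

```python
import threading, queue

buffer = queue.Queue(maxsize=8)          # shared buffer of 8 items

def producer():
    for item in range(32):
        buffer.put(item)                 # blocks only when the buffer is full

def consumer():
    for _ in range(32):
        item = buffer.get()              # blocks only when the buffer is empty
        # ... consume the item at the consumer's own pace ...
        buffer.task_done()

threading.Thread(target=producer).start()
threading.Thread(target=consumer).start()
buffer.join()                            # wait until every item is consumed
```

Because the buffer absorbs short bursts, each side waits only when the buffer is completely full or completely empty, which is exactly the speed-mismatch role described above.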

9.5.3 Caching
A cache is an area of very high speed memory, which is used for
holding copies of data. It provides a faster and an efficient means of
accessing data. It is different from the buffer in the sense that a buffer
may store the only existing copy of data (that does not reside
anywhere else) while a cache may store a copy of data that also
resides elsewhere.
Though caching and buffering serve different purposes,
sometimes an area of memory is used for both purposes. For
example, the operating system may maintain a buffer in the main
memory to store disk data for efficient disk I/O and at the same time
can use this buffer as cache to store the file blocks which are being
accessed frequently.

9.5.4 Spooling
SPOOL is an acronym for Simultaneous Peripheral Operations On-
line. Spooling refers to storing jobs in a buffer so that the CPU can
be utilized efficiently. Spooling is useful because devices access data
at different rates. The buffer provides a waiting station where the data
can rest while the slower device catches up. The most common
spooling application is print spooling. In a multiuser environment,
where multiple users can give the print command simultaneously, the
spooler loads the documents into a buffer from where the printer pulls
them off at its own rate. Meanwhile, a user can perform other
operations on the computer while the printing takes place in the
background. Spooling also lets a user place a number of print jobs on
a queue instead of waiting for each one to finish before specifying the
next one. The operating system also manages all requests to read or
write data from the hard disk through spooling.

9.5.5 Error Handling


Many kinds of hardware and application errors may occur in the
system during operation. For example, a device may stop working or
some I/O transfer call may fail. The failures may be either due to
transient reasons (such as an overloaded network) or permanent ones
(such as a disk controller failure). The kernel I/O subsystem protects
against transient failures so that an isolated failure does not bring the
whole system down. For instance, an I/O system call returns status
information that indicates the success or failure of the operation, and a
request that fails for a transient reason can often simply be retried.
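For example, at the application level a failed system call reports its status through its return value (and errno on UNIX-like systems), and the caller can decide whether to retry or report the error. The file name used below is only an illustrative assumption.

#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    char buf[512];
    int fd = open("/tmp/data.bin", O_RDONLY);   /* hypothetical file */
    if (fd == -1) {
        fprintf(stderr, "open failed: %s\n", strerror(errno));
        return 1;
    }

    ssize_t n = read(fd, buf, sizeof buf);
    if (n == -1)                  /* the call reports failure ...        */
        fprintf(stderr, "read failed: %s\n", strerror(errno));
    else                          /* ... or the number of bytes obtained */
        printf("read %zd bytes\n", n);

    close(fd);
    return 0;
}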
9.6 TRANSFORMING I/O REQUESTS TO
HARDWARE OPERATIONS
When a user requests for an I/O operation, a number of steps are
performed to transform the request into the hardware operation so as
to service this request. For instance, consider that a process wishes
to read data from a file. The following sequence of steps describes
the typical lifecycle of a read (blocking) request.
1. The process invokes a blocking read() system call.
2. The kernel code determines whether the parameters passed are
correct. After verification, the kernel checks whether the desired
file blocks are already available in a buffer; if so, they are
returned to the process and the I/O request is complete.
3. Otherwise, the desired data has to be read from the physical disk.
The execution of the invoking process is suspended; the process
is removed from the run queue and brought into the wait queue
of the appropriate device where it waits till it is scheduled.
Finally, the kernel I/O subsystem passes the request to the
appropriate device driver.
4. The device driver allocates buffer space in the kernel for
receiving the data and schedules the I/O request. Finally, it
sends commands to the device controller.
5. The device controller operates the device hardware to perform
the data transfer.
6. The driver may continuously monitor the device for its status and
data (in the case of programmed I/O) or may establish a DMA
transfer in the kernel memory. In the case of DMA, an interrupt is
generated by the DMA controller after the transfer is complete.
7. Control is then passed to the appropriate interrupt handler,
which determines the cause of the interrupt, stores the data (if
required), signals the device driver, and then returns.
8. Upon receiving the signal from the interrupt handler, the device
driver identifies the I/O request that has completed, determines
the status of the request, and signals the kernel I/O subsystem
that the I/O has completed.
9. The kernel returns the results of the system call (either data or
the error code) to the invoking process. The process is
unblocked by bringing it back into the run queue from the wait
queue.
10. The I/O request is completed and the process continues its
execution.
Figure 9.3 depicts the lifecycle of a typical I/O request.
Fig. 9.3 Life Cycle of an I/O Request
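The early part of this lifecycle (step 2, where the request may be satisfied without touching the device) can be sketched as follows. The structures and names are hypothetical simplifications; steps 3 to 8 are collapsed into a single placeholder function.

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

#define BLOCK_SIZE 512
#define CACHE_SLOTS 8

/* Hypothetical buffer cache entry: one cached disk block. */
struct cache_entry {
    bool valid;
    int  block_no;
    char data[BLOCK_SIZE];
};

static struct cache_entry cache[CACHE_SLOTS];

/* Placeholder for steps 3-8 of the lifecycle: driver, controller,
 * DMA and interrupt handling are collapsed into one call here.      */
static void read_block_from_device(int block_no, char *out) {
    printf("block %d not cached: request passed to the device driver\n",
           block_no);
    memset(out, 0, BLOCK_SIZE);           /* pretend the device filled it */
}

/* Step 2 of the lifecycle: satisfy the request from the buffer if
 * possible, otherwise fall through to a real device transfer.       */
void kernel_read(int block_no, char *out) {
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].valid && cache[i].block_no == block_no) {
            memcpy(out, cache[i].data, BLOCK_SIZE);
            printf("block %d served from the buffer cache\n", block_no);
            return;                       /* I/O request completes early */
        }
    }
    read_block_from_device(block_no, out);
}

int main(void) {
    char buf[BLOCK_SIZE];

    cache[0].valid = true;                /* pretend block 7 was read before */
    cache[0].block_no = 7;

    kernel_read(7, buf);                  /* hit  */
    kernel_read(9, buf);                  /* miss */
    return 0;
}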

9.7 STREAMS
STREAMS is a UNIX System V mechanism that enables
asynchronous I/O between a user and a device. It provides a full-
duplex (two-way communication) connection between a user process
and the device driver of the I/O device. A STREAM consists of a
stream head, driver end, and stream modules (zero or more). The
stream head acts as an interface to the user process, and the driver
end controls the device. Between the stream head and the driver
end are the stream modules that provide the functionality of
STREAMS processing. Each of the stream head, driver end, and
stream modules is associated with two queues: read queue and write
queue. The read queue is used to store the requests for reading from
the device while the write queue is used to store the requests for
writing to the device. Each queue can communicate with its
neighbouring queue via message passing. Figure 9.4 shows the
structure of STREAMS.
Fig. 9.4 Structure of STREAMS

Whenever a user process invokes a write() system call for output


to a device, the stream head prepares a message, copies the data
into it, and passes the message to the write queue of the adjacent
stream module. The message continues to pass down through the
write queues until it reaches the write queue of the driver end and
finally, to the device. Similarly, the user process can read from the
stream head by invoking a read() system call; however, this time the
communication takes place via read queues.
Though STREAMS facilitates non-blocking I/O, the user process
wishing to write to the device gets blocked if the write queue of the
stream head is full and remains blocked until there is space in the
write queue. Similarly, while reading from the device, the user
process remains blocked until some data becomes available in the
read queue.
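On System V derived platforms that still provide the STREAMS interface (through <stropts.h>), the user-level view of a stream looks roughly like the sketch below. The device path and the module name pushed onto the stream are illustrative assumptions only; the program will not build on systems without STREAMS support.

/* Sketch of user-level STREAMS usage; requires a System V platform
 * (e.g. Solaris) that provides <stropts.h>. The device path and the
 * module name below are assumptions used purely for illustration.   */
#include <stdio.h>
#include <fcntl.h>
#include <stropts.h>
#include <unistd.h>

int main(void) {
    int fd = open("/dev/term/a", O_RDWR);     /* hypothetical STREAMS device */
    if (fd == -1) { perror("open"); return 1; }

    /* Push a processing module between the stream head and the driver. */
    if (ioctl(fd, I_PUSH, "ldterm") == -1)
        perror("I_PUSH");

    /* Send a data message down the stream: the stream head copies the
     * data and passes it through the write queues towards the driver. */
    char text[] = "hello, stream";
    struct strbuf data = { .maxlen = 0, .len = (int)sizeof text, .buf = text };
    if (putmsg(fd, NULL, &data, 0) == -1)
        perror("putmsg");

    close(fd);
    return 0;
}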
STREAMS offers various benefits, which are as follows:
• It provides a framework for a modular and incremental approach to
writing network protocols and device drivers.
• Different STREAMS (or different devices) may utilize the same set
of stream modules, thereby allowing device-driver and protocol
code to be reused.
• Most implementations of UNIX provide support for STREAMS and
favour this method for writing protocols and device drivers.

9.8 PERFORMANCE
System performance is greatly affected by I/O. As we know, an I/O
system call invoked by an application has to pass through a
number of software layers, such as the kernel, the device driver, and
the device controller, before reaching the physical device. This demands
CPU time and therefore performing I/O is costly in terms of CPU
cycles. Moreover, the layers between the application and the physical
device impose the overhead of:
• Context switching while crossing the kernel’s protection boundary.
• Interrupt-handling and signal-handling in the kernel to serve the
I/O device.
• Load on the CPU and memory bus while copying data between
device controller and physical memory and between kernel
buffers and application space.
One cause behind context switching is the occurrence of
interrupts. Whenever an interrupt occurs, the system performs a state
change, executes the appropriate interrupt handler, and then restores
the state. Though modern computers are able to deal with several
thousands of interrupts per second, handling an interrupt is quite an
expensive task.
Another cause of a high context-switching rate is
network traffic. To understand how this happens, suppose that a
process on one machine wants to log in to a remote machine
connected via the network. Now, the following sequence of steps
takes place for transferring each character from the local machine to
the remote machine.
1. A character is typed on the local machine causing a keyboard
(hardware) interrupt. The system state is saved and the control
is passed to the appropriate interrupt handler.
2. After the interrupt has been handled, the character is passed to
the device driver and from there to the kernel. Finally, the
character is passed from the kernel to the user process. A
context switch occurs as the kernel switches from kernel mode
to user mode.
3. The user process invokes a network I/O system call to pass the
character through the network. A context switch occurs and the
character flows into the local kernel.
4. The character passes through network layers that prepare a
network packet which is transferred to the network device driver
and then to the network controller.
5. The network controller transfers the packet onto the network and
causes an interrupt. The system’s state is saved and the
interrupt is handled.
6. After the interrupt has been handled, a context switch occurs to
indicate the completion of the network I/O system call.
7. At the receiving side, the network packet is received by the
network hardware and an interrupt occurs which causes the
state save.
8. The character is unpacked and is passed to the device driver
and from there to the kernel. A context switch occurs and the
character is passed to the appropriate network daemon.
9. The network daemon determines which login session is involved
and passes the character to the network sub-daemon via the
kernel, thereby resulting in two context switches.
Thus, it is clear that passing data through the network involves a
lot of interrupts, state switches, and context switches. Moreover, if the
receiver has to echo the character back to the sender, the work
doubles.
In general, the efficiency of I/O in a system can be improved by:
• reducing the number of context switches.
• reducing the frequency of interrupt generation by employing large
data transfers, smart controllers, and polling.
• reducing the frequency of copying data in the memory during data
transfer between application and device.
• balancing the load of memory bus, CPU, and I/O.
• employing DMA-knowledgeable controllers for increasing
concurrency.

LET US SUMMARIZE
1. Controlling and managing the I/O devices and the I/O operations is one of
the main responsibilities of an operating system. The operating system
must issue commands to the devices to work, provide a device-
independent interface between devices and the rest of the system, handle
errors or exceptions, catch interrupts, and so on.
2. I/O devices are broadly classified into three categories, namely, human-
interface devices, storage devices, and network devices.
3. The basic I/O hardware elements include ports, buses, and device
controllers. A port is a connection point through which a device is
attached to the computer system. It could be a serial port or parallel port.
After being attached, the device communicates with the computer by
sending signals over a bus. A bus is a group of wires that specifies a set
of messages that can be sent over it.
4. A device controller (or adapter) is an electronic component that can
control one or more identical devices depending on the type of device
controller.
5. There are basically three different ways to perform I/O operations
including programmed I/O (polling), interrupt-driven I/O, and direct
memory access (DMA).
6. During programmed I/O, the host may have to wait continuously while the
controller is busy in performing some other task. This behaviour is often
called busy-waiting or polling.
7. In interrupt-driven I/O, the CPU is informed of the completion of a task
(rather than CPU inquiring the completion of the task) by means of
interrupts. The interrupt mechanism eliminates the need for busy-waiting by the
processor and hence is considered more efficient than programmed I/O.
8. In DMA, the DMA controller interacts with the device without the CPU
being bothered. As a result, the CPU can be utilized for multiple tasks.
9. All I/O devices are grouped under a few general kinds. For each general
kind, a standardized set of functions (called interface) is designed through
which the device can be accessed. The differences among the I/O
devices are encapsulated into the kernel modules called device drivers.
10. A block device stores data in fixed-size blocks with each block having a
specific address. A character device is the one that accepts and produces
a stream of characters. Unlike block devices, character devices are not
addressable.
11. A socket interface is provided in most of the operating systems for network
I/O. It also provides a select() function to provide information about the
sockets.
12. Clocks and timers are used for getting the current time and elapsed time,
and for setting the timer for some operation or interrupt.
13. An operating system may use blocking or non-blocking I/O system calls for
application interface. The blocking I/O system call causes the invoking
process to block until the call is completed. On the other hand, the non-
blocking I/O system calls do not suspend the execution of the invoking
process for a long period; rather they return quickly with a return value
which indicates the number of bytes that have been transferred.
14. A wide variety of methods are used to control the devices attached to the
computer system. These methods altogether form the I/O subsystem of
the kernel. The kernel I/O subsystem is responsible for providing various
I/O-related services, which include scheduling, buffering, caching,
spooling, and error handling.
15. When a user requests for an I/O operation, a number of steps are
performed to transform the I/O request into the hardware operation so as
to service the I/O request.
16. STREAMS is a UNIX System V mechanism that enables asynchronous
(non-blocking) I/O between a user and a device. It provides a full-duplex
(two-way communication) connection between a user process and the
device driver of the I/O device.
17. The efficiency of I/O in a system can be improved by reducing the number
of context switches, reducing the frequency of interrupt generation,
reducing the frequency of copying data in the memory during data transfer
between application and device, balancing the load of memory bus, CPU,
I/O, and employing DMA-knowledgeable controllers for increasing
concurrency.

EXERCISES
Fill in the Blanks
1. A device is attached with the computer system via a connection point
known as _____________.
2. DMA stands for _____________.
3. Applications can interact with the block and character devices through the
_________ and _____________, respectively.
4. _____________ means deciding the order in which the I/O requests
should be executed.
5. A stream consists of a _____________, _____________, and
_____________.

Multiple Choice Questions


1. Which of the following involves busy-waiting?
(a) Interrupt-driven I/O
(b) DMA
(c) Programmed I/O
(d) None of these
2. When the DMA controller acquires the bus for transferring data, the CPU
has to wait for accessing bus and main memory; though it can access
cache. This mechanism is called _____________.
(a) Cycle stealing
(b) Bus stealing
(c) CPU cycle
(d) None of these
3. Applications can interact with block devices through the _____________
interface.
(a) Block-character
(b) Block-device
(c) Blocking
(d) None of these
4. Which of the following allows sharing of two buffers between the producer
and the consumer thereby relaxing the timing requirements between
them?
(a) Single buffering
(b) Double buffering
(c) Circular buffering
(d) None of these
5. Which of the following is used to store the requests for writing to the
device?
(a) Circular queue
(b) Write queue
(c) Read queue
(d) None of these

State True or False


1. Maskable interrupts are used by device controllers; these can be disabled
by the CPU whenever required.
2. One of the function calls of network socket interface is select().
3. An area of memory cannot be used for both buffering and caching.
4. Storage devices are used for permanent storage of data and information.
5. The efficiency of I/O in a system can be improved by increasing the
number of context switches.

Descriptive Questions
1. Define the following terms:
(i) Port
(ii) Bus
(iii) Device controller
(iv) Spooling
2. Discuss the various categories of I/O devices. How do these devices differ
from each other?
3. What is asynchronous I/O?
4. State the difference between blocking and non-blocking I/O.
5. Describe some services provided by the I/O subsystem of a kernel.
6. List two common uses of buffering.
7. What is busy-waiting? Is it preferred over blocking-wait?
8. How does DMA result in increased system concurrency?
9. Differentiate between STREAMS driver and STREAMS module.
10. Write a short note on the following:
• Interrupt-driven I/O
• Block and character devices
• Kernel I/O structure
11. With the help of flow chart explain the lifecycle of the I/O operation.
12. Discuss the role of socket interface for network devices.
chapter 10

Mass-Storage Structure

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ Understand the physical structure of magnetic disks.
⟡ Describe disk scheduling and various algorithms that are used to
optimize disk performance.
⟡ Explain disk management including formatting of disks and
management of boot and damaged blocks.
⟡ Discuss swap-space management.
⟡ Explain RAID and its levels.
⟡ Describe disk attachment.
⟡ Explore stable storage and tertiary storage.

10.1 INTRODUCTION
As discussed in the previous chapter, a computer system consists of
several devices (such as mouse, keyboard, disk, monitor, CD-ROM)
that deal with different I/O activities. Among all these I/O devices, a disk
(of some kind) is considered an essential requirement for
almost all computers. Other devices like the mouse, CD-ROM, or
even keyboard and monitor are optional for some systems such as
servers. This is because servers are usually accessed by other
resources (say, clients) on the network. Therefore, this chapter mainly
focuses on disk related issues such as its physical structure,
algorithms used to optimize its performance, its management, and
reliability.

10.2 DISK STRUCTURE


A magnetic disk is the most commonly used secondary storage
medium. It offers high storage capacity and reliability. Whenever the
data stored on the disk needs to be accessed by the CPU, it is first
moved to the main memory and then the required operation is
performed. Once the operation has been performed, the modified
data must be copied back to the disk. The system is responsible for
transferring the data between the disk and the main memory as and
when required. Data on the disk survives power failures and system
crashes. There is a chance that a disk may sometimes fail and destroy
the data; however, such failures are rare.
Data is represented as magnetized spots on the disk. A
magnetized spot represents 1 and the absence of the magnetized
spot represents 0. To read the data, the magnetized spots on the disk
are converted into electrical impulses, which are then transferred to
the processor. Writing data onto the disk is accomplished by
converting the electrical impulses received from the processor into
magnetized spots on the disk. The data in a magnetic disk can be
erased and reused virtually infinitely. The disk is designed to reside in
a protective case or cartridge to shield it from dust and any other
external interference.

Organization of a Magnetic Disk


A magnetic disk consists of a plate/platter, which is made up of
metal or glass material, and its surface is covered with magnetic
material to store data on its surface. If the data can be stored only on
one side of the platter, the disk is single-sided, and if both sides are
used to hold the data, the disk is double-sided. When the disk is in
use, the spindle motor rotates the platters at a constant high speed.
Usually, the speeds at which they rotate are 60, 90 or 120 revolutions
per second.
The surface of a platter is divided into imaginary tracks and
sectors. Tracks are concentric circles where the data is stored, and
are numbered from the outermost to the innermost ring, starting with
zero. There are about 50,000 to 100,000 tracks per platter and a disk
generally has 1 to 5 platters. Tracks are further subdivided into
sectors (or track sectors). A sector is just like an arc that forms an
angle at the center. It is the smallest unit of information that can be
transferred to/from the disk. There are hundreds of sectors per track
and the sector size is typically 512 bytes. The inner tracks are of
smaller length than the outer tracks, thus, there are about 500 sectors
per track in the inner tracks, and about 1000 sectors per track
towards the boundary. In general, disks containing large number of
tracks on each surface of platter and more sectors per track have
high storage capacity.
A disk contains one read/write head for each surface of a platter,
which is used to store and retrieve data from the surface of the
platter. Information is stored on a sector magnetically by the
read/write head. The head moves across the surface of the platter to
access different tracks. All the heads are attached to a single
assembly called a disk arm. Thus, all the heads of different platters
move together. The assembly of disk platters mounted on a spindle
together with the heads mounted on a disk arm is known as head-
disk assembly. All the read/write heads are on the equal diameter
track on different platters at one time. The tracks of equal diameter on
different platters form a cylinder. Accessing data of one cylinder is
much faster than accessing data that is distributed among different
cylinders. A close look at the internal structure of a magnetic disk
may be had in Figure 10.1.
Fig. 10.1 Moving Head Disk Mechanism

Note: Some disks have one read/write head for each track of the platter. These
disks are termed as fixed-head disks, since one head is fixed on each track and
is not moveable. On the other hand, disks in which the head moves along the
platter surface are termed as moveable-head disks.

Accessing Data from the Disk


Data in a magnetic disk is recorded on the surface of the circular
tracks with the help of read/write head, which is mounted on the arm
assembly. These heads can be multiple in numbers to access the
adjacent tracks simultaneously and, thus, making a disk access
faster. Transfer of data between memory and disk drive is handled by
a disk controller, which interfaces the disk drive to the computer
system.
Some common interfaces used for disk drives on personal
computers and workstations are SCSI (small-computer-system-
interface; pronounced “scuzzy”), ATA (AT attachment) and SATA
(serial ATA). The latest technology implements the disk controller
within the disk drive. The controller accepts high-level I/O commands
(to read or write a sector) and starts positioning the disk arm over the
right track in order to read or write the data. The disk controller
computes an error-correcting code for the data to be written on the
sector and attaches it to the sector. When the sector is to be read, the
controller again computes the code from the sector data and
compares it with the stored code. If there is any difference between
them, the controller signals a read failure.
Remapping of bad sectors is another major task performed by
disk controllers. During initial formatting of the disk, if the controller
detects a bad (or damaged) sector, it logically maps the affected
sector to another physical location. The remapping is recorded, and
any further operation on that sector is carried out at the new
location. Management of bad sectors is discussed in Section 10.4.3.
The process of accessing data comprises the following three
steps.
1. Seek: As soon as the disk unit receives the read/write command, the
read/write heads are positioned on specific track on the disk platter. The
time taken in doing so is known as seek time. It is the average time
required to move the heads from one track to some other desired track on
the disk. Seek times of modern disk may range between 6–15
milliseconds.
2. Rotate: Once the heads are positioned on the desired track, the head of
the specific platter is activated. Since the disk rotates constantly, the head
has to wait for the required sector or cluster (desired data) to come under
it. This delay is known as rotational delay time or latency of the disk. The
average rotational latencies range from 4.2 to 6.7 ms.
3. Data transfer: Once the desired data location arrives under the head, the
read/write head transfers the data between the disk and primary memory.
The rate at which the data is read from or written to the disk is known as
the data transfer rate. Some of the latest hard disks have a data transfer
rate of 66 MB/second. The data transfer rate depends upon the rotational
speed of the disk. If the disk has a rotational speed of 6000 rpm (rotations
per minute), with 125 sectors per track and 512 bytes/sector, the data
transferred per revolution will be 125 × 512 = 64000 bytes. Hence, the
total transfer rate will be 64000 × 6000/60 = 6,400,000 bytes/second or
6.4 MB/second (see the sketch following this list).
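The arithmetic in step 3 can be reproduced with a short program; the figures are taken from the example above, and the average rotational latency is computed as half a revolution.

#include <stdio.h>

int main(void) {
    /* Figures taken from the example above. */
    double rpm = 6000.0;               /* rotational speed */
    double sectors_per_track = 125.0;
    double bytes_per_sector = 512.0;

    double bytes_per_revolution = sectors_per_track * bytes_per_sector;
    double revolutions_per_second = rpm / 60.0;
    double transfer_rate = bytes_per_revolution * revolutions_per_second;

    /* Average rotational latency is half a revolution. */
    double latency_ms = (60.0 / rpm) * 1000.0 / 2.0;

    printf("transfer rate: %.0f bytes/s (%.1f MB/s)\n",
           transfer_rate, transfer_rate / 1e6);
    printf("average rotational latency: %.1f ms\n", latency_ms);
    return 0;
}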
The combined time (seek time, latency time, and data transfer
time) is known as the access time. Specifically, it can be described
as the period of time that elapses between a request for information
from the disk or memory and the information arriving at the
requesting device. Memory access time refers to the time it takes to
transfer a character from the memory to or from the processor, while
disk access time refers to the time it takes to place the read/write
heads over the requested data. RAM may have an access time of 9–
70 nanoseconds, while the hard disk access time could be 10–40
milliseconds.
The reliability of the disk is measured in terms of the mean time
to failure (MTTF). It is the duration of time for which the system can
run continuously without any failure. Manufacturers claim that the
mean time to failure of disks ranges from 1,000,000 hours (about
114 years) to 1,500,000 hours (about 171 years), although various
research studies conclude that failure rates are, in some cases, even
13 times greater than what manufacturers claim. Generally, the
expected life span of most disks is about 4 to 5 years. However, the
failure rate of disks increases with their age.

10.3 DISK SCHEDULING


As discussed, accessing data from the disk requires seek time,
rotational delay and data transfer. Among these three, seek time is
the one that dominates the entire access time. Recall that it is the time
required to position the read/write heads on a specific track of the
disk platter. Whenever a disk access request arrives, the head is
moved to the required track. If requests arrive only one at a
time, they are serviced as and when they arrive and
nothing can be done to reduce seek time. However, there is always a
possibility that new requests arrive while the system is servicing an
earlier one. In this case, the new requests are placed in a queue of pending
requests. Thus, after completing the current request, the operating
system has a choice of which request from the queue to service next.
Several algorithms have been developed that serve this purpose for
the operating system; some of them are discussed here. We will see
that selecting requests in an appropriate order can reduce the seek
time significantly.

10.3.1 First-Come, First-Served (FCFS) Algorithm


In this algorithm, the request at the front of the queue is always
selected to be serviced next. That is, the requests are served on
First-come, First-served basis. To understand this concept, consider a
disk with 100 cylinders and a queue of pending requests to access
blocks at cylinders as shown below.
15, 96, 35, 27, 73, 42, 39, 55
Further, suppose that the head is resting at cylinder 40 when the
requests arrive. First, the head is moved to cylinder 15, since it is at
the front of the queue. After servicing the request at cylinder 15, the
head is moved to 96, then on 35, 27, 73, and so on (see Figure 10.2).
It is clear that servicing all the requests results in total head
movement of 271 cylinders.
Fig. 10.2 FCFS Algorithm

Though it is the simplest algorithm to select a request from the


queue, it does not optimize disk performance. It is clear that the head
movement from cylinder 15 to cylinder 96 and then back again to
cylinder 35 constitutes the major portion of the total head movement
of 271 cylinders. If the request for cylinder 96 is scheduled after the
requests for cylinders 35 and 27, then the total head movement could
be reduced noticeably from 271 to 195 cylinders.
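The total head movement under FCFS can be verified with a small sketch that simply walks the queue in arrival order; the queue and the initial head position are those of the example above.

#include <stdio.h>
#include <stdlib.h>

/* Total head movement when requests are served strictly in arrival order. */
int fcfs_head_movement(int start, const int *requests, int n) {
    int total = 0, pos = start;
    for (int i = 0; i < n; i++) {
        total += abs(requests[i] - pos);
        pos = requests[i];
    }
    return total;
}

int main(void) {
    int queue[] = {15, 96, 35, 27, 73, 42, 39, 55};   /* pending requests */
    int n = sizeof queue / sizeof queue[0];

    printf("FCFS total head movement: %d cylinders\n",
           fcfs_head_movement(40, queue, n));         /* prints 271 */
    return 0;
}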

10.3.2 Shortest Seek Time First (SSTF) Algorithm


As just discussed, scheduling request for cylinders that are far away
from the current head position after the request for the closest
cylinder can reduce the total head movement significantly. This is
what SSTF algorithm attempts to do. This algorithm suggests to the
operating system to select the request for the cylinder which is
closest to the current head position. To understand this, consider
once again the above pending-request queue with head initially at
cylinder 40. The request for cylinder 39 is closest to the current head
position, so it will move to cylinder 39. After servicing the request at
cylinder 39, the head will move to cylinder 42 and then to service
requests at cylinders 35, 27 and 15. Now, the request for cylinder 55
is closest, so the head will move to cylinder 55. The next closest
request is for cylinder 73, so the head will move to cylinder 73 and
finally to cylinder 96 (see Figure 10.3).

Fig. 10.3 SSTF Algorithm

This algorithm requires a total head movement of 112 cylinders, a


major improvement over FCFS. However, this algorithm can cause
some requests to wait indefinitely—a problem called starvation.
Suppose that while the request for cylinder 15 is being serviced, a new
request arrives for cylinder 17. Clearly, this algorithm makes the head
move to cylinder 17 next. Further, suppose that while the request at
cylinder 17 is being serviced, new requests arrive for cylinders 14 and
22. In fact, if requests close to the current head position arrive
continuously, then requests for cylinders far away will have to wait
indefinitely.
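The SSTF behaviour for the same request queue can be simulated with the sketch below, which repeatedly picks the pending request closest to the current head position; it reproduces the total of 112 cylinders.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

/* Repeatedly pick the pending request closest to the current head position. */
int sstf_head_movement(int start, const int *requests, int n) {
    bool served[n];
    for (int i = 0; i < n; i++) served[i] = false;

    int total = 0, pos = start;
    for (int count = 0; count < n; count++) {
        int best = -1, best_dist = 0;
        for (int i = 0; i < n; i++) {
            if (served[i]) continue;
            int dist = abs(requests[i] - pos);
            if (best == -1 || dist < best_dist) {
                best = i;
                best_dist = dist;
            }
        }
        served[best] = true;
        total += best_dist;
        pos = requests[best];
        printf("head moves to cylinder %d\n", pos);
    }
    return total;
}

int main(void) {
    int queue[] = {15, 96, 35, 27, 73, 42, 39, 55};
    int n = sizeof queue / sizeof queue[0];
    printf("SSTF total head movement: %d cylinders\n",
           sstf_head_movement(40, queue, n));         /* prints 112 */
    return 0;
}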

10.3.3 SCAN Algorithm


In this algorithm, the head starts at one end of the disk and moves
towards the other end servicing the requests at cylinders that are on
the way. Upon reaching the other end, the head reverses its
movement and continues servicing the requests that are on the way.
This process of moving the head across the disk continues.
To understand this concept, consider our example again. Here, in
addition to the request queue and current head position, we must
know the direction in which the head is moving. Suppose the head is
moving toward the cylinder 100; it will service the requests at
cylinders 42, 55, 73, and 96 in the same order. Then, upon reaching
the end, that is, at cylinder 100, the head reverses its movement and
services the requests at cylinders 39, 35, 27, and 15 (see Figure
10.4).

Fig. 10.4 SCAN Algorithm

The algorithm is simple and almost totally avoids the starvation


problem. However, all the requests that are behind the head will have
to wait (no matter how close they are to the head) until the head
reaches the end of the disk, reverses its direction, and comes back to
them. In contrast, a new request that is in front of the head will be
serviced almost immediately (no matter when it enters the queue).
10.3.4 LOOK Algorithm
The LOOK algorithm is a slight modification of the SCAN algorithm. Under this,
the head starts at one end and scans toward the other end,
servicing the requests on the way, just like the SCAN algorithm.
However, the head does not necessarily reach the end of the
disk; instead, it reverses its direction as soon as there are no more
requests pending in the direction in which it is moving. Figure 10.5
illustrates the LOOK algorithm for our example of queue of pending
requests. In this example, the head reverses its direction after
servicing the request for cylinder 96.

Fig. 10.5 LOOK Algorithm

10.3.5 C-SCAN and C-LOOK Algorithms


C-SCAN (circular SCAN) and C-LOOK (circular LOOK) are the
variants of SCAN and LOOK algorithms, respectively, which are
designed to provide a more uniform wait time. In these algorithms,
though the head scans through the disk in both directions, it
services the requests in one direction only. That is, when the head
reaches the other end, it immediately returns to the starting end
without servicing any requests on the way back. Figure 10.6 illustrates the C-SCAN
and C-LOOK algorithm for our sample queue of requests.

Fig. 10.6 C-SCAN and C-LOOK Algorithms
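The SCAN computation can also be expressed as a short sketch. The version below assumes the head is moving towards higher-numbered cylinders and uses the request queue of Example 1 that follows, reproducing its total of 9769 cylinders.

#include <stdio.h>

/* SCAN: head moves towards the highest cylinder first, servicing requests
 * on the way, then sweeps back towards cylinder 0 for the remaining ones. */
int scan_head_movement(int start, int max_cylinder,
                       const int *requests, int n) {
    int highest_above = start, lowest_below = start;
    int need_reverse = 0;

    for (int i = 0; i < n; i++) {
        if (requests[i] >= start && requests[i] > highest_above)
            highest_above = requests[i];
        if (requests[i] < start) {
            need_reverse = 1;
            if (requests[i] < lowest_below)
                lowest_below = requests[i];
        }
    }
    if (!need_reverse)                       /* everything lies ahead */
        return highest_above - start;
    /* Go all the way to the last cylinder, then back to the lowest request. */
    return (max_cylinder - start) + (max_cylinder - lowest_below);
}

int main(void) {
    /* Data of Example 1: 5000 cylinders (0-4999), head at 143 moving up. */
    int queue[] = {86, 1470, 913, 1774, 948, 1509, 1022, 1750, 130};
    int n = sizeof queue / sizeof queue[0];

    printf("SCAN total head movement: %d cylinders\n",
           scan_head_movement(143, 4999, queue, n));   /* prints 9769 */
    return 0;
}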

Example 1 Suppose that a disk drive has 5000 cylinders, numbered


0 to 4999. The drive is currently serving a request at cylinder 143,
and the previous request was at cylinder 125. The queue of pending
requests in FIFO order is: 86, 1470, 913, 1774, 948, 1509, 1022,
1750, 130.
Starting from the current head position, what is the total distance
(in cylinders) that the disk arm moves to satisfy all the pending
requests, for each of the following disk scheduling algorithms?
(a) FCFS
(b) SSTF
(c) SCAN
(d) LOOK
(e) C-SCAN
(f) C-LOOK
Solution
(a) FCFS algorithm:
Initially, the head is serving the request at cylinder 143. From here,
the head moves to cylinder 86 since it is the first request in the queue.
After serving the request at 86, the head moves to cylinder 1470,
then to 913, 1774, 948, and so on, and finally, to 130. Figure 10.7
illustrates how the pending requests are scheduled according to the
FCFS algorithm.

Fig. 10.7 Using FCFS Algorithm

The total distance moved in serving all the pending requests can
be calculated as:
(143 – 86) + (1470 – 86) + (1470 – 913) + (1774 – 913) + (1774
– 948) + (1509 – 948) + (1509 – 1022) + (1750 – 1022) + (1750
– 130)
⇒ 57 + 1384 + 557 + 861 + 826 + 561 + 487 + 728 + 1620
⇒ 7081 cylinders
(b) SSTF algorithm
First of all, the head serves the cylinder 130 as it is closest to its
current position (which is 143). From there, it moves to the cylinder
86, to 913, 948, 1022, 1470, 1509, 1750, and finally, to the cylinder
1774. Figure 10.8 illustrates how the pending requests are scheduled
according to SSTF algorithm.

Fig. 10.8 Using SSTF Algorithm

The total distance moved in serving all the pending requests can
be calculated as:
(143 – 86) + (1774 – 86)
⇒ 57 + 1688
⇒ 1745 cylinders
(c) SCAN algorithm
As the head is currently serving the request at 143 and previously it
was serving at 125, it is clear that the head is moving towards
cylinder 4999. While moving, the head serves the requests at
cylinders which fall on the way, that is, 913, 948, 1022, 1470, 1509,
1750 and 1774 in this order. Then, upon reaching the end, that is, at
cylinder 4999, the head reverses its direction and serves the requests
at cylinders 130 and 86. Figure 10.9 illustrates how the pending
requests are scheduled according to SCAN algorithm.

Fig. 10.9 Using SCAN Algorithm

The total distance moved in serving all the pending requests can be
calculated as:
(4999 – 143) + (4999 – 86)
⇒ 4856 + 4913
⇒ 9769 cylinders
(d) LOOK algorithm
In LOOK algorithm, the head serves the requests in the same manner
as in SCAN algorithm except when it reaches cylinder 1774, it
reverses its direction instead of going to the end of the disk. Figure
10.10 illustrates how the pending requests are scheduled according
to the LOOK algorithm.
The total distance moved in serving all the pending requests can
be calculated as:
(1774 – 143) + (1774 – 86)
⇒ 1631 + 1688
⇒ 3319 cylinders

Fig. 10.10 Using LOOK Algorithm

(e) C-SCAN algorithm


Proceeding like SCAN algorithm, the head serves the requests at
cylinders 913, 948, 1022, 1470, 1509, 1750 and 1774 in the same
order. Then, upon reaching the end, that is, at cylinder 4999, the
head reverses its direction and moves to the other end, that is, to
cylinder 0 without serving any request on the way. Then, it serves the
requests at cylinders 86 and 130. Figure 10.11 illustrates how the
pending requests are scheduled according to C-SCAN algorithm.
Fig. 10.11 Using C-SCAN Algorithm

The total distance moved in serving all the pending requests can
be calculated as:
(4999 – 143) + 4999 + 130
⇒ 4856 + 5129
⇒ 9985 cylinders
(f) C-LOOK algorithm
In C-LOOK algorithm, the head serves the requests in the same
manner as in LOOK algorithm. The only difference is that when it
reverses its direction on reaching cylinder 1774, it first serves the
request at cylinder 86 and then at 130. Figure 10.12 illustrates how
the pending requests are scheduled according to C-LOOK algorithm.
Fig. 10.12 Using C-LOOK Algorithm

The total distance moved in serving all the pending requests can
be calculated as:
(913 – 143) + (948 – 913) + (1022 – 948) + (1470 – 1022) +
(1509 – 1470) + (1750 – 1509) + (1774 – 1750) + (1774 – 86) +
(130 – 86)
⇒ 770 + 35 + 74 + 448 + 39 + 241 + 24 + 1688 + 44
⇒ 3363 cylinders
Thus, we see that in this example the SSTF algorithm proves
fastest as the head needs to move only 1745 cylinders, as against
7081, 9769, 3319, 9985, and 3363 in other cases.

10.4 DISK MANAGEMENT


Providing disk management services also comes under the
responsibilities of the operating systems. In this section, we will
discuss disk formatting and recovery from bad sectors.
10.4.1 Disk Formatting
When a disk is manufactured, it is just a stack of some platters of
magnetic material on which data can be stored. At that time, there is
no information on the disk. Before the disk can be used for storing
data, all its platters must be divided into sectors (that disk controller
can read and write) using some software. This process is called low-
level (or physical) formatting, which is usually performed by the
manufacturer.
During this process, a special data structure for each sector is
written to the disk, which typically consists of a preamble, a data
portion, and an error-correcting code (ECC). The preamble begins
with a certain bit pattern to indicate the start of a new sector and also
contains information such as the sector number and cylinder number. The
size of the data portion determines the maximum amount of data that
each sector can hold. It is possible to choose the size of the data
portion among 256, 512, and 1024 bytes, but usually this size is 512
bytes. The ECC plays its role during each read and write. When
some data is written to a sector by the disk controller, it calculates a
code from all the bytes of the data being written to the sector and
updates the ECC with that code. Now, whenever that sector is read,
the disk controller recalculates the code from the data that is read
from the sector and compares it with the code stored in ECC. Any
mismatch in the values indicates that the data of the sector is
destroyed. The ECC not only helps in detecting that some bits have been
destroyed; in fact, if only a few bits are destroyed, it enables the
disk controller to identify the destroyed bits and calculate their correct
values.
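The detect-on-read role of the ECC can be illustrated with a deliberately simplified sketch. A plain XOR checksum is used here only to show the idea; a real controller stores a correcting code (for example, a Hamming or Reed-Solomon code) that can also repair a small number of damaged bits.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Simplified stand-in for the ECC: an XOR checksum over the sector data. */
uint8_t compute_code(const uint8_t *data) {
    uint8_t code = 0;
    for (int i = 0; i < SECTOR_SIZE; i++)
        code ^= data[i];
    return code;
}

int main(void) {
    uint8_t sector[SECTOR_SIZE];
    memset(sector, 0xAB, sizeof sector);

    uint8_t stored_code = compute_code(sector);   /* written with the sector */

    sector[100] ^= 0x01;                          /* one bit gets corrupted  */

    if (compute_code(sector) != stored_code)      /* recomputed on each read */
        printf("read failure: sector data is damaged\n");
    else
        printf("sector read successfully\n");
    return 0;
}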
After low-level formatting, the disk is partitioned into one or more
groups of cylinders. The operating system treats each partition as a
logically separate disk. On most computers, some boot code and a
partition table are stored in sector 0. The partition table records the starting
sector and the size of each partition on the disk.
The last step is logical (or high-level) formatting of each
partition of the disk. During this step, the operating system stores
initial file-system data structures and a boot block on the disk. The
file-system data structures include an empty directory, storage space
administration (free list or bitmap), etc. After logical formatting, the
disk can be used to boot the system and store the data.

10.4.2 Boot Block


It is well known that a computer needs the operating system to be in
the main memory for its operation. However, when a computer is
switched on or rebooted, there is nothing in the memory. Thus, it
requires an initial program called bootstrap program which finds the
kernel of the operating system on the disk and loads it into the
memory, thus starting the computer. The bootstrap program initializes
the CPU registers, main memory contents and the device controllers
and then, begins the operating-system execution.
Mostly, the bootstrap program is stored in the read-only memory
(ROM) of the computer. The reason behind this is that the location of
the ROM is fixed and it needs no initialization. Thus, the processor
can directly start executing it on powering up or resetting the
computer. Moreover, the ROM is not prone to viruses, as it is read
only memory. However, the only problem with this arrangement is that
if we need to change the bootstrap code, the ROM hardware chips
would also need to be changed. To avoid this problem, only a small
bootstrap loader program is stored in ROM, while the entire bootstrap
program is stored in certain fixed areas on the disk, called boot
blocks. Thus, any changes can be made easily to the bootstrap
program without requiring any changes in the ROM chips.
When the computer is started, the code contained in the bootstrap
loader in ROM informs the disk controller to read the boot blocks from
the disk and load the bootstrap program in the memory. The
bootstrap program then loads the operating system in the memory
and starts executing it.
Note: A disk containing boot partition is referred to as a boot disk or
a system disk.
10.4.3 Management of Bad Sectors
Due to manufacturing defects, some sectors of a disk drive may
malfunction during low-level formatting. Some sectors may also
become bad during read or write operations with the disk. This is
because the read/write head moves just at a distance of few
microinches from the surface of the disk platters and if the head
becomes misaligned or a tiny dust particle comes between the
surface and the head, it may touch or even scratch the surface. This
is termed as head crash, and it may damage one or more sectors.
For these reasons, disk drives are manufactured with some spare
sectors that replace bad sectors when needed.
There are several ways of handling bad sectors. On some simple
disks, bad sectors need to be handled manually by using, for
instance, format command or chkdsk command of MS-DOS.
However, in modern disks with advanced disk controller, other
schemes are also possible.
A simplest scheme is logical replacement of bad sector with one
of the spare sectors in the disk. This scheme is known as sector
sparing or forwarding. To understand this, suppose that the disk
controller reads a sector and finds a mismatch in calculated and
stored ECC. It reports to the operating system that the sector is bad.
The next time the system is rebooted, the disk controller is asked to
replace the sector with a spare sector. From then on, whenever a
request for that sector arrives, the disk controller translates the
request into the address of the spare sector chosen for replacement.
To manage multiple bad sectors using this scheme, the disk controller
maintains a list of bad sectors and translates the requests to access
bad sectors into the addresses of the corresponding spare sectors.
This list is usually initialized during low-level formatting and is
updated regularly throughout the life of the disk.
Note: Usually, the data of the bad sector is lost, thus, replacement of
bad sector is not totally an automatic process. It requires the data to
be restored manually from the backup media.
Though the scheme discussed above is simple, the disk
scheduling algorithm followed by the operating system for
optimization may become less efficient or even worse. To understand
this concept, suppose that the sector 29 becomes bad and the disk
controller finds the first spare sector following the sector 278. Now,
every request to access the sector 29 is redirected by the disk
controller to the sector that follows 278. If the operating system
schedules the requests to access the sectors 29, 35, and 275 in that
order while the disk head is at sector 27, then the disk head first
moves to the corresponding spare sector, then back to sector 35 and
finally to sector 275. Though the operating system schedules the
requests expecting a total head movement of 248 sectors, the actual
total head movement is 736 sectors. To
overcome this problem, most disks are manufactured with a few
spare sectors in each cylinder and the disk controller attempts to
replace a bad sector with a spare sector in the same cylinder.
There is an alternative to the sector sparing scheme which is
known as sector slipping. In this scheme, instead of logically
replacing the bad sector with a spare sector, all the sectors following
the bad sector are shifted down one place, making the sector
following the bad sector free. The contents of the bad sector are then
copied to this free sector, leaving the bad sector unused. For
example, considering the same case as discussed above, sector 278
is copied into the spare sector, then 277 is copied into 278 and so on
until the sector 30 (following the bad sector) is copied to sector 31.
After that, sector 29 (bad sector) is mapped into the space freed by
sector 30.
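The translation performed under sector sparing can be sketched as a small remapping table consulted on every request. The table layout and the choice of sector 279 as the spare are illustrative assumptions based on the example above.

#include <stdio.h>

#define MAX_REMAPS 16

/* Hypothetical remapping table kept by the disk controller. */
struct remap { int bad_sector; int spare_sector; };

static struct remap table[MAX_REMAPS];
static int remap_count = 0;

void remap_sector(int bad, int spare) {
    table[remap_count].bad_sector = bad;
    table[remap_count].spare_sector = spare;
    remap_count++;
}

/* Every incoming request is translated before it reaches the platter. */
int translate(int requested_sector) {
    for (int i = 0; i < remap_count; i++)
        if (table[i].bad_sector == requested_sector)
            return table[i].spare_sector;
    return requested_sector;
}

int main(void) {
    remap_sector(29, 279);    /* sector 29 went bad; spare assumed to be 279 */

    printf("request for sector 29 goes to sector %d\n", translate(29));
    printf("request for sector 35 goes to sector %d\n", translate(35));
    return 0;
}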

10.5 SWAP-SPACE MANAGEMENT


As discussed in Chapter 8, the virtual memory uses some amount of
disk space known as swap-space as an extension of the main
memory. Swap-space is used in different ways by different operating
systems depending upon the memory management algorithms. For
example, the operating system may use this space to:
• hold the image of a process (including its code and data
segments) in the case of systems implementing swapping.
• hold the pages swapped out of the main memory in the case of
paging systems.
Therefore, the amount of disk space required to serve as swap-
space may vary from a few megabytes to gigabytes. Though swap-
space enables the operating system to use more memory than is
available, excessive use of swap-space degrades the system
performance. This is because accessing disk is much slower than
accessing the main memory. Therefore, the swap-space should be
designed and implemented in such a way that it provides best
throughput for virtual memory systems.
Swap-space can reside with a normal file system and in this case
it is simply a large file within the file system. Thus, file-system
routines can be used to create it, name it, and to allocate its space.
This is a simple approach to implement swap-space but inefficient as
it requires extra disk accesses while navigating through the directory
structure and disk-allocation structures.
As an alternative, swap-space can reside on a separate disk
partition on which no file system or directory structure is placed. A
swap-space storage manager is used to allocate and de-allocate the
swap-space. Since data in the swap-space resides for a much shorter
time and the swap-space is accessed much more frequently, the storage
manager mainly focuses on speed rather than storage efficiency.
The problem with this approach is that a fixed amount of swap-space
is created during disk partitioning and increasing the swap-space
requires re-partitioning of the disk resulting in deletion of the other file
system partitions. These partitions then need to be restored from
other backup media.

10.6 RAID STRUCTURE


The technology of semiconductor memory has been advancing at a
much higher rate than the technology of secondary storage. The
performance and capacity of semiconductor memory is much
superior to secondary storage. To match this growth in semiconductor
memory, a significant development is required in the technology of
secondary storage. A major advancement in this area is represented
by the development of RAID (Redundant Arrays of Independent
Disks). The basic idea behind RAID is to have a large array of small
independent disks. The presence of multiple disks in the system
improves the overall transfer rates, if the disks are operated in
parallel. Parallelizing the operation of multiple disks allows multiple
I/O to be serviced in parallel. This setup also offers opportunities for
improving the reliability of data storage, because data can be stored
redundantly on multiple disks. Thus, failure of one disk does not lead
to the loss of data. In other words, this large array of independent
disks acts as a single logical disk with improved performance and
reliability.
Note: Originally, RAID stood for Redundant Array of Inexpensive
Disks, since an array of cheap, smaller-capacity disks was used as
an alternative to large, expensive disks. In those days, the cost per bit of
data of a small disk was less than that of a large disk.

10.6.1 Improving Performance and Reliability


In order to improve disk performance, a concept called data striping
is used which utilizes parallelism. Data striping distributes the data
transparently among N disks, which makes them appear as a single
large, fast disk. Striping of data across multiple disks improves the
transfer rate as well, since operations are carried out in parallel. Data
striping also balances the load among multiple disks.
In the simplest form, data striping splits each byte of data into bits
and stores them across different disks. This splitting of each byte into
bits is known as bit-level data striping. With 8 bits per byte, an
array of eight disks (or a factor or multiple of eight) is treated as
one large logical disk. In general, bit i of each byte is written to the
ith disk. However, if an array of only two disks is used, all odd-
numbered bits go to the first disk and even-numbered bits to the
second disk. Since each I/O request is accomplished with the use of
all disks in the array, the transfer rate becomes N times that of a single disk,
where N represents the number of disks in the array.
Alternatively, blocks of a file can be striped across multiple disks.
This is known as block-level striping. Logical blocks of a file are
assigned to multiple disks. Large requests for accessing multiple
blocks can be carried out in parallel, thus improving data transfer
rate. However, transfer rate for the request of a single block is the
same as that of one disk. Note that the disks not participating in the
request are free to carry out other operations.
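One common way of laying out blocks under block-level striping is shown in the sketch below: logical block i is placed on disk (i mod N) at position (i div N). The exact layout varies between implementations; this mapping is used purely for illustration.

#include <stdio.h>

#define NUM_DISKS 4

/* Block-level striping: logical block i of the array is placed on
 * disk (i mod N) at position (i / N) on that disk.                  */
void locate(int logical_block, int *disk, int *block_on_disk) {
    *disk = logical_block % NUM_DISKS;
    *block_on_disk = logical_block / NUM_DISKS;
}

int main(void) {
    for (int lb = 0; lb < 8; lb++) {
        int d, b;
        locate(lb, &d, &b);
        printf("logical block %d -> disk %d, block %d\n", lb, d, b);
    }
    return 0;
}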
Having an array of N disks in a system improves the system
performance but lowers the overall reliability of the storage system.
The chance of failure of at least one disk out of a total of N disks is
much higher than that of a specific single disk. Assume that the mean
time to failure (MTTF) of a disk is about 150,000 hours (about
17 years). Then, for an array of 100 disks, the MTTF of some disk is
only 150,000/100 = 1500 hours (about 62 days). With such a short MTTF
of a disk, maintaining one copy of data in an array of N disks might
result in the loss of significant information. Thus, some solutions must
be employed to increase the reliability of such storage system. The
most acceptable solution is to have redundant information. Normally,
the redundant information is not needed; however, in case of disk
failure it can be used to restore the lost information of the failed disk.
One simple technique to keep redundant information is mirroring
(also termed as shadowing). In this technique, the data is
redundantly stored on two physical disks. In this way every disk is
duplicated, and all the data has two copies. Thus, every write
operation is carried on both the disks. During the read operation, the
data can be retrieved from any disk. If one disk fails, the second disk
can be used until the first disk gets repaired. If the second disk also
fails before the first disk is fully repaired, the data is lost. However,
occurrence of such event is very rare. The mean time to failure of the
mirrored disk depends on two factors.
1. Mean time to failure of the independent disks and,
2. Mean time to repair a disk. It is the time taken, on an average, to restore
the failed disk or to replace it, if required.
Suppose the failures of the two disks are independent of each other.
Further, assume that the mean time to repair of a disk is 15 hours and
the mean time to failure of a single disk is 150,000 hours. Then the
mean time to data loss in the mirrored disk system is (150,000)² /
(2 × 15) = 7.5 × 10⁸ hours, or about 85,616 years.
An alternate solution to increase the reliability is storing the error-
correcting codes, such as parity bits and hamming codes. Such
additional information is needed only in case of recovering the data of
the failed disk. Error correcting codes are maintained in a separate
disk called check disk. The parity bits corresponding to each bit of N
disks are stored in the check disk.
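The use of a parity check disk can be sketched as follows: the parity block is the bitwise XOR of the corresponding blocks on the data disks, and the contents of any one failed disk can be rebuilt by XOR-ing the surviving blocks with the parity. The stripe size and data values below are arbitrary.

#include <stdio.h>
#include <stdint.h>

#define NUM_DATA_DISKS 4
#define STRIPE_BYTES 8

/* Compute the parity block as the bitwise XOR of the data blocks. */
void compute_parity(uint8_t data[NUM_DATA_DISKS][STRIPE_BYTES],
                    uint8_t parity[STRIPE_BYTES]) {
    for (int b = 0; b < STRIPE_BYTES; b++) {
        parity[b] = 0;
        for (int d = 0; d < NUM_DATA_DISKS; d++)
            parity[b] ^= data[d][b];
    }
}

/* Rebuild a failed disk by XOR-ing the surviving disks with the parity. */
void reconstruct(uint8_t data[NUM_DATA_DISKS][STRIPE_BYTES],
                 const uint8_t parity[STRIPE_BYTES], int failed) {
    for (int b = 0; b < STRIPE_BYTES; b++) {
        uint8_t value = parity[b];
        for (int d = 0; d < NUM_DATA_DISKS; d++)
            if (d != failed)
                value ^= data[d][b];
        data[failed][b] = value;
    }
}

int main(void) {
    uint8_t data[NUM_DATA_DISKS][STRIPE_BYTES] = {
        {1, 2, 3, 4, 5, 6, 7, 8},
        {9, 8, 7, 6, 5, 4, 3, 2},
        {3, 1, 0, 1, 0, 1, 0, 1},
        {7, 7, 7, 7, 7, 7, 7, 7},
    };
    uint8_t parity[STRIPE_BYTES];

    compute_parity(data, parity);

    uint8_t lost = data[2][0];          /* remember a byte for checking */
    data[2][0] = 0;                     /* pretend disk 2 has failed    */
    reconstruct(data, parity, 2);

    printf("recovered byte: %d (was %d)\n", data[2][0], lost);
    return 0;
}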

10.6.2 RAID Levels


Data striping and mirroring techniques improve the performance and
reliability of a disk, respectively. However, mirroring is expensive and
striping does not improve reliability. Thus, several RAID
organizations, referred to as RAID levels, have been proposed which
aim at providing redundancy at lower cost by using the combination
of these two techniques. These levels have different cost–
performance trade-offs. RAID levels are classified into seven levels
(from level 0 to level 6), as shown in Figure 10.13, and are discussed
hereunder. To understand all the RAID levels consider a disk array
consisting of four disks.
• RAID level 0: This level uses block-level data striping but does
not maintain any redundant information. Thus, the write operation
has the best performance with level 0, as only one copy of the
data is maintained and no redundant information needs to be
updated. However, RAID level 0 does not have the best read
performance among all the RAID levels, since systems with
redundant information can schedule disk access based on the
shortest expected seek time and rotational delay. In addition,
RAID level 0 is not fault tolerant because the failure of just one
drive will result in loss of the data. However, the absence of
redundant information ensures 100 per cent space utilization for
RAID level 0 systems.
• RAID level 1: This is the most expensive system, as it maintains
duplicate or redundant copy of the data using mirroring. Thus, two
identical copies of the data are maintained on two different disks.
Every write operation needs to update both the disks, thus, the
performance of RAID level 1 system degrades while writing.
However, performance while read operation is improved by
scheduling request to the disk with the shortest expected access
time and rotational delay. With two identical copies of the data,
RAID level 1 system utilizes only 50 per cent of the disk space.
• RAID level 2: This level is known as error-correcting-code
(ECC) organization. Two most popular error detecting and
correcting codes are parity bits and Hamming codes. In memory
system, each byte is associated with a parity bit. The parity bit is
set to 0 if the number of bits in the byte that are set to 1 is even,
otherwise the parity bit is set to 1. If any one bit in the byte gets
changed, then parity of that byte does not match the stored parity
bit. In this way, use of parity bit detects all 1-bit errors in the
memory system. Hamming code has the ability to detect the
damaged bit. It stores two or more extra bits to find the damaged
bit and by complementing the value of damaged bit, the original
data can be recovered. RAID level 2 requires three redundant
disks to store error detecting and correcting information for four
original disks. Thus, the effective space utilization is only about 57
per cent in this case. However, space utilization increases with
the number of data disks because check disks grow
logarithmically with the number of data disks.
• RAID level 3: As discussed, RAID level 2 uses check disks to
hold information to detect the failed disk. However, disk
controllers can easily detect the failed disk and hence, check
disks need not contain the information to detect the failed disk.
RAID level 3 maintains only single check disk with parity bit
information for error detection as well as for correction. This level
is also named as bit-interleaved parity organization.
• Performances of RAID level 2 and RAID level 3 are very similar
but the latter has the lowest possible overheads for reliability. So,
in practice, level 2 is not used. In addition, level 3 has two
benefits over level 1. Whereas level 1 maintains one mirror disk
for every disk, level 3 requires only one parity disk for multiple
disks, thus, increasing the effective space utilization. In addition,
level 3 distributes the data over multiple disks, with N-way striping
of data, which makes the transfer rate for reading or writing a
single block by N times faster than level 1. Since every disk has
to participate in every I/O operation, RAID level 3 supports lower
number of I/O operations per second than RAID level 1.
• RAID level 4: Like RAID level 0, RAID level 4 uses block-level striping.
It maintains a parity block on a separate disk for each corresponding
block from the N other disks. This level is also known as
block-interleaved parity organization. To restore a block of the failed
disk, the corresponding blocks from the other disks and the parity block
are used. A request to retrieve data from one block is processed by only
one disk, leaving the remaining disks free to handle other requests.
Writing a single block involves one data disk and the check disk. Since
the parity block must be updated with each write operation, only one write
operation can be processed at a particular point of time.
Fig. 10.13 Representing RAID Levels

With four data disks, RAID level 4 requires just one check disk, so the
effective space utilization in our example of four data disks is 80 per
cent. Since only one check disk is needed to hold the parity information
regardless of the number of data disks, the effective space utilization
increases with the number of data disks.
• RAID level 5: Instead of placing data across N disks and parity
information on one separate disk, this level distributes the
block-interleaved parity and data among all the N+1 disks. Such
distribution is advantageous in processing read/write requests. All disks
can participate in processing read requests, unlike RAID level 4, where
the dedicated check disk never participates in read requests. So level 5
can satisfy more read requests in a given amount of time. Since the
bottleneck of a single check disk has been eliminated, several write
requests can also be processed in parallel. RAID level 5 has the best
performance among all the RAID levels with redundancy. In our example of
four data disks, a RAID level 5 system has five disks in all; thus, the
effective space utilization for level 5 is the same as for levels 3 and 4.
(A small sketch showing how block-interleaved parity is computed and used
for recovery is given after this list.)
• RAID level 6: RAID level 6 is an extension of RAID level 5 and
applies P + Q redundancy scheme using Reed-Solomon codes.
Reed-Solomon codes enable RAID level 6 to recover from up to
two simultaneous disk failures. RAID level 6 requires two check
disks; however, like RAID level 5, redundant information is
distributed across all disks using block-level striping.
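To make the role of parity in the block-interleaved levels concrete, the
following small C program is an illustrative sketch only (real RAID
implementations live in the disk controller or device driver and handle
far more detail; the disk count, block size, and sample contents below are
invented for the example). It computes a parity block as the bitwise XOR
of the corresponding data blocks and then rebuilds the block of a single
failed disk by XOR-ing the parity block with the surviving blocks.

#include <stdio.h>

#define NUM_DISKS  4    /* data disks in this hypothetical array        */
#define BLOCK_SIZE 8    /* bytes per block, kept tiny for illustration  */

/* Compute the parity block as the XOR of all data blocks. */
void compute_parity(unsigned char data[NUM_DISKS][BLOCK_SIZE],
                    unsigned char parity[BLOCK_SIZE])
{
    for (int i = 0; i < BLOCK_SIZE; i++) {
        parity[i] = 0;
        for (int d = 0; d < NUM_DISKS; d++)
            parity[i] ^= data[d][i];
    }
}

/* Rebuild the block of the failed disk by XOR-ing the parity block
   with the corresponding blocks of all surviving disks. */
void rebuild_block(unsigned char data[NUM_DISKS][BLOCK_SIZE],
                   unsigned char parity[BLOCK_SIZE],
                   int failed, unsigned char rebuilt[BLOCK_SIZE])
{
    for (int i = 0; i < BLOCK_SIZE; i++) {
        rebuilt[i] = parity[i];
        for (int d = 0; d < NUM_DISKS; d++)
            if (d != failed)
                rebuilt[i] ^= data[d][i];
    }
}

int main(void)
{
    unsigned char data[NUM_DISKS][BLOCK_SIZE] = {
        "DISK-0.", "DISK-1.", "DISK-2.", "DISK-3."
    };
    unsigned char parity[BLOCK_SIZE], rebuilt[BLOCK_SIZE];

    compute_parity(data, parity);
    rebuild_block(data, parity, 2, rebuilt);   /* pretend disk 2 failed */
    printf("Rebuilt block of disk 2: %.8s\n", (char *)rebuilt);
    return 0;
}

This is exactly the relationship exploited by the block-interleaved parity
levels: every update to a data block also requires the corresponding
parity block to be updated, which is why the single dedicated check disk
of RAID level 4 becomes a write bottleneck, while RAID level 5 avoids it
by spreading the parity blocks across all disks.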

10.7 DISK ATTACHMENT


The disk can be attached with the computer system either through
local I/O ports on the host computer or through a network connection.
In the former case, disk storage is referred to as host-attached
storage, while in the latter case, it is referred to as network-attached
storage.

10.7.1 Host-attached Storage


A host-attached storage system is connected directly to the network
server. This storage is accessed only through local I/O ports, which
are available in many technologies. Normal desktop systems often
use IDE or ATA bus architecture whereas high technology systems
such as servers work on SCSI and fiber channel (FC) architectures.
In the SCSI bus architecture, a ribbon cable consisting of a large number
of conductors serves as the physical medium. The SCSI protocol is capable
of supporting up to 16 devices, including one controller card and 15
storage devices. The controller card in the host is the SCSI initiator and
the other devices are SCSI targets. The SCSI protocol can address up to
eight logical units in each SCSI target.
Fiber channel (FC) architecture uses optical fiber as its physical
medium and thus provides high speed data transfer between storage
devices. It defines a high speed protocol which was originally
developed for high-end workstations, large storage media, and high
performance desktop applications. Two variants of this architecture
include a switched fabric that provides 24-bit address space and an
arbitrated loop (FC-AL) that can address 126 devices. The fiber
channel allows maximum flexibility in I/O communications with large
address space and storage devices.
Note: Various storage devices, which are in use as host-attached
storage, include hard disk, CD, DVD, optical disk, magnetic disk, and
pen drive.

10.7.2 Network-attached Storage


A network-attached storage (NAS) describes a storage system
designed to separate storage resources from network and application
servers, in order to simplify storage management and improve the
reliability, performance, and efficiency of the network. It commonly
supports NFS (Network File System) and CIFS (Common Internet
File System). NAS is ideal for storing local databases and for keeping
the backup of workstation data.
NAS enables all the computers in the network to access the storage with
the same ease as local host-attached storage. However, it is less
efficient and delivers lower performance than host-attached storage.

10.8 STABLE STORAGE


As discussed earlier, disks may sometimes make errors that damage good
sectors, or even the entire drive may fail. Though, to some extent, RAID
provides protection against good sectors becoming bad or against drive
failure, it cannot provide protection against a system crash during a disk
write, which results in inconsistency on the disk.
Ideally, a disk should always work error free. However, practically, this
cannot be achieved. The only achievable thing is a disk subsystem termed
stable storage, which ensures that whenever a write command is issued to
it, the disk performs it either completely or not at all. To ensure this,
the stable storage system maintains two physical blocks for each logical
block, and a write operation is performed in the following steps.
1. Data is written to the first physical block.
2. After the write to the first physical block has completed successfully,
the data is written to the second physical block.
3. After the write to the second physical block has completed successfully,
the operation is declared to be complete.
During recovery, the recovery procedure checks both the physical blocks,
and one of the following cases may arise.
• Both blocks contain no detectable error. In this case, no further action
is required.
• One of the two blocks contains a detectable error. In this case, the
contents of the erroneous block are replaced with those of the other
block.
• Neither of the two blocks contains a detectable error but they differ in
their content. In this case, the contents of the first block are replaced
with those of the second block.
In this way, the recovery procedure guarantees that a write operation on
stable storage is either performed completely or not performed at all. A
small illustrative sketch of this write-and-recover procedure is given
below.
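The following C program is a simplified, in-memory simulation of the idea.
The two physical copies, the 'bad' flags standing in for detectable errors,
and the block size are all artificial choices made for illustration; a
real implementation would sit in the disk subsystem and rely on checksums
or the drive's own error reporting.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 16   /* tiny block size, for illustration only */

/* Two physical copies of one logical block, simulated in memory.
   The 'bad' flags stand in for detectable errors (e.g. bad checksums). */
static unsigned char copy[2][BLOCK_SIZE];
static bool bad[2] = { false, false };

static bool read_block(int which, unsigned char *buf)
{
    if (bad[which]) return false;            /* detectable error */
    memcpy(buf, copy[which], BLOCK_SIZE);
    return true;
}

static void write_block(int which, const unsigned char *buf)
{
    memcpy(copy[which], buf, BLOCK_SIZE);
    bad[which] = false;
}

/* Stable write: write the first copy, then the second. */
static void stable_write(const unsigned char *data)
{
    write_block(0, data);    /* step 1 */
    write_block(1, data);    /* step 2; step 3 is 'declare complete' */
}

/* Recovery after a crash: apply the three cases described in the text. */
static void stable_recover(void)
{
    unsigned char b0[BLOCK_SIZE], b1[BLOCK_SIZE];
    bool ok0 = read_block(0, b0);
    bool ok1 = read_block(1, b1);

    if (ok0 && ok1) {
        if (memcmp(b0, b1, BLOCK_SIZE) != 0)
            write_block(0, b1);              /* replace first with second */
    } else if (ok0 && !ok1) {
        write_block(1, b0);                  /* repair the erroneous copy */
    } else if (!ok0 && ok1) {
        write_block(0, b1);
    }
}

int main(void)
{
    unsigned char data[BLOCK_SIZE] = "new contents";
    stable_write(data);
    bad[1] = true;           /* pretend the second copy was corrupted */
    stable_recover();        /* recovery repairs it from the first copy */
    printf("copy 1 after recovery: %s\n", (char *)copy[1]);
    return 0;
}

Running the sketch shows that even when one copy is marked bad after a
write, the recovery step restores both copies to the same, consistent
contents.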

10.9 TERTIARY STORAGE


Tertiary storage, also known as tertiary memory, is built from
inexpensive disks and tape drives that use removable media. Due to
relatively low speeds of tertiary storage systems, they are primarily
used for storing data that is to be accessed less frequently. In this
section, we will discuss various tertiary storage devices.

10.9.1 Removable Disks


Removable disks are one kind of tertiary storage. An example of
removable magnetic disk is the floppy disk. A floppy disk is a round,
flat piece of Mylar plastic coated with ferric oxide (a rust-like
substance containing tiny particles capable of holding a magnetic
field) and encased in a protective plastic cover (disk jacket). Common
floppy disks can hold only about 1 MB of data. Due to limited storage
capacity, the floppy disks have become outdated. Nowadays, other
kinds of removable disks including optical disks and magneto-optical
disks are in use.

Optical Disks
An optical disk is a flat, circular, plastic disk coated with a material on
which bits may be stored in the form of highly reflective areas and
significantly less reflective areas, from which the stored data may be
read when illuminated with a narrow-beam source, such as a laser
diode. The optical disk storage system consists of a rotating disk
coated with a thin layer of metal (aluminium, gold, or silver) that acts
as a reflective surface and a laser beam, which is used as a
read/write head for recording data onto the disk. Compact disk (CD)
and digital versatile disk (DVD) are the two forms of optical disks.

Compact Disk (CD)


A CD is a shiny, silver coloured metal disk of 12 cm in diameter. It is
available in various formats: CD-ROM (Compact Disk-Read Only
Memory), CD-R (Compact Disk-Recordable), and CD-RW (Compact
Disk-ReWritable) disks. A CD-ROM disk comes with prerecorded data that can
be read but not altered. CD-R is a type of WORM (Write Once-Read Many)
disk that allows you to record your own data. Once written, the data on a
CD-R can be read but cannot be altered. A CD-RW disk is a rewritable
version of CD-R; that is, it allows data to be written, erased, and
rewritten several times.
A CD is made up of three coatings, namely, polycarbonate plastic,
aluminium, and an acrylic coating to protect the disk from external
scratches and dust. The polycarbonate plastic is stamped with
millions of tiny indentations (pits). A light is beamed from a
semiconductor laser through the bottom of the polycarbonate layer, and the
aluminium coating reflects the light back. Since the CD is read through
the bottom of the disk, each pit appears as an elevated bump to the
reading light beam. The light striking the land areas (the areas without
bumps) is reflected normally and detected by a photodiode. As the disk
rotates at a speed between 200 and 500 rpm, the light reflected from the
pits and lands passes through a prism and onto a photosensor. Light
reflected from a pit is 180 degrees out of phase with the light from the
lands, and the differences in intensity are measured by the photoelectric
cells, which convert them into corresponding electrical pulses.
The entire surface of a new CD-R disk is reflective; the laser can
shine through the dye and reflect off the gold layer. Hence, for a CD-
R disk to work there must be a way for a laser to create a non-
reflective area on the disk. Therefore, it has an extra layer that the
laser can modify. This extra layer is a greenish dye. When you write
some data to a CD-R, the writing laser (which is much more powerful
than the reading laser) heats up the dye layer and changes its
transparency. The change in the dye creates an equivalent of a non-
reflective bump. The decomposition of the dye in the pit area through
the heat of the laser is irreversible (permanent). Therefore, once a
section of a CD-R is written on, it cannot be erased or rewritten.
However, both CD and CD-R drives can read the modified dye as a
bump later on.
In contrast to the CD-R disk, a CD-RW disk is erasable and rewritable
because it uses a phase-changing material on its recording layer, usually
an alloy of silver, tellurium, indium, and antimony. The phase-changing
material changes its state when heated to a high temperature (above its
melting point) and can be converted back to its original state when heated
to a temperature slightly below its melting point.
In a CD-RW disk, the recording layer initially has a polycrystalline
structure. While writing to the disk, the laser heats up the selected
areas above the melting point, which melts the crystals into non-
crystalline amorphous phase. These areas have lower reflectance
than the remaining crystalline areas. This difference in reflectance
helps in reading the recorded data as in the case of CD-R disk.
The process of erasing data on a CD-RW disk is called annealing.
During this process, the area on the layer that has been changed to
the amorphous phase (during writing) is converted back to its original
crystalline state by heating to a temperature slightly below the melting
point of phase-changing material.

Digital Versatile Disk (DVD)


DVD, initially called digital video disk, is a high-capacity data storage
medium. At a first glance, a DVD can easily be mistaken for a CD as
both are plastic disks 120 mm in diameter and 1.2 mm thick and both
rely on lasers to read data. However, the DVD’s seven-fold increase
in data capacity over the CD has been largely achieved by tightening
up the tolerances throughout the predecessor system. Like CDs,
DVDs are also available in different formats: DVD-ROM, DVD-R, and
DVD-RW.
In DVDs, the tracks are placed closer together, thereby allowing
more tracks per disk. The DVD’s track pitch (the distance between
two tracks) is 0.74 micron, which is less than half of that of a CD,
which is 1.6 microns. The pits, in which the data is stored, are also a
lot smaller, thus allowing more pits per track. The minimum pit length
of a single layer DVD is 0.4 micron as compared to 0.834 micron for a
CD. With the number of pits having a direct bearing on capacity
levels, the DVD’s reduced track pitch and pit size alone give DVDs
four times storage capacity than CDs.

Magneto-optical Disk
As implied by the name, these disks use a hybrid of magnetic and
optical technologies. A magneto-optical disk writes magnetically (with
thermal assist) and reads optically using the laser beam. A magneto-
optical disk drive is so designed that an inserted disk will be exposed
to a magnet on the label side and to the light (laser beam) on the
opposite side. The disks, which come in 3½-inch and 5¼-inch
formats, have a special alloy layer that has the property of reflecting
laser light at slightly different angles depending on which way it is
magnetized, and data can be stored on it as north and south
magnetic spots, just like on a hard disk.
While a hard disk can be magnetized at any temperature, the
magnetic coating used on the magneto-optical media is designed to
be extremely stable at room temperature, making the data
unchangeable unless the disk is heated to above a temperature level
called the Curie point (usually around 200º C). Instead of heating the
whole disk, magneto-optical drives use a laser to target and heat only
specific regions of the magnetic particles. This accurate technique
enables magneto-optical media to pack in a lot more information than
the other magnetic devices. Once heated, the magnetic particles can
easily have their direction changed by a magnetic field generated by
the read/write head. Information is read using a less powerful laser,
making use of the Kerr effect, where the polarity of the reflected light
is altered depending on the orientation of the magnetic particles.
Where the laser/magnetic head has not touched the disk, the spot
represents a ‘0’, and the spots where the disk has been heated up
and magnetically written will be seen as data ‘1’. However, this is a
‘two-pass’ process, which coupled with the tendency for magneto-
optical heads to be heavy, resulted in early implementations that were
relatively slow. Nevertheless, magneto-optical disks can offer very
high capacity and cheap media as well as top archival properties,
often being rated with an average life of 30 years, which is far longer
than any magnetic media.

10.9.2 Magnetic Tapes


Magnetic tapes appear similar to the tapes used in music cassettes.
These are plastic tapes with magnetic coating on them. The data is
stored in the form of tiny segments of magnetized and demagnetized
portions on the surface of the material. Magnetized portion of the
surface refers to the bit value ‘1’, whereas the demagnetized portion
refers to the bit value ‘0’. Magnetic tapes are available in different
sizes, but the major difference between different magnetic tape units
is the speed at which the tape is moved past the read/write head and
the tape’s recording density. The amount of data or the number of
binary digits that can be stored on a linear inch of tape is the
recording density of the tape.
Magnetic tapes are very durable and can be erased as well as
reused. They are a cheap and reliable storage medium for organizing
archives and taking backups. However, they are not suitable for data
files that need to be revised or updated often because data on them
is stored in a sequential manner. In this situation, the user will need to
advance or rewind the tape every time to the position where the
requested data starts. Tapes are also slow due to the nature of the
medium. If the tape stretches too much, it becomes unusable for data
storage and data may be lost. The tape now has a limited role because the
disk has proved to be a superior storage medium. Today, the primary role
of the tape drive is limited to
backing up or duplicating the data stored on the hard disk to protect
the system against data loss during power failures or computer
malfunctions.

LET US SUMMARIZE
1. A magnetic disk is the most commonly used secondary storage medium. It
offers high storage capacity and reliability. Data is represented as
magnetized spots on a disk. A magnetized spot represents 1 and the
absence of a magnetized spot represents 0.
2. A magnetic disk consists of plate/platter, which is made up of metal or
glass material, and its surface is covered with magnetic material to store
data on its surface.
3. Disk surface of a platter is divided into imaginary tracks and sectors.
Tracks are concentric circles where the data is stored, and are numbered
from the outermost to the innermost ring, starting with zero. A sector is
just like an arc that forms an angle at the center. It is the smallest unit of
information that can be transferred to/from the disk.
4. A disk contains one read/write head for each surface of a platter, which is
used to store and retrieve data from the surface of the platter. All the
heads are attached to a single assembly called a disk arm.
5. Transfer of data between the memory and the disk drive is handled by a
disk controller, which interfaces the disk drive to the computer system.
Some common interfaces used for disk drives on personal computers and
workstations are SCSI (small-computer-system-interface; pronounced
“scuzzy”), ATA (AT attachment) and SATA (serial ATA).
6. The process of accessing data comprises three steps, namely, seek,
rotate, and data transfer. The combined time (seek time, latency time, and
data transfer time) is known as the access time of the disk. Specifically, it
can be described as the period of time that elapses between a request for
information from the disk or memory and the information arriving at the
requesting device.
7. Reliability of the disk is measured in terms of the mean time to failure
(MTTF). It is the time period for which the system can run continuously
without any failure.
8. Several algorithms have been developed for disk scheduling, which are
first-come, first served (FCFS), shortest seek time first (SSTF), SCAN,
LOOK, C-SCAN and C-LOOK algorithms.
9. Before the disk can be used for storing data, all its platters must be
divided into sectors (that disk controller can read and write) using some
software. This process is called low-level (or physical) formatting, which is
usually performed by the manufacturer.
10. After physical formatting, logical (or high-level) formatting is to be
performed for each partition of the disk. During logical formatting, the
operating system stores initial file-system data structures and a boot block
on the disk. After logical formatting, the disk can be used to boot the
system and store the data.
11. Due to manufacturing defects, some sectors of a disk drive may
malfunction during low-level formatting. Some sectors may also become
bad during read or write operations with the disk due to head crash.
12. There are several ways of handling bad sectors. On some simple disks,
bad sectors need to be handled manually by using, for instance, format
command or chkdsk command of MS-DOS. However, in modern disks
with advanced disk controller, other schemes including sector sparing and
sector slipping can be used.
13. Swap-space is used in different ways by different operating systems
depending upon the memory management algorithms. The amount of disk
space required to serve as swap-space may vary from a few megabytes
to the level of gigabytes.
14. A major advancement in secondary storage technology is represented by
the development of RAID (Redundant Arrays of Independent Disks). The
basic idea behind the RAID is to have a large array of small independent
disks. The presence of multiple disks in the system improves the overall
transfer rates, if the disks are operated in parallel.
15. In order to improve the performance of a disk, a concept called data
striping is used which utilizes parallelism. Data striping distributes the data
transparently among N disks, which make them appear as a single large,
fast disk.
16. Several kinds of RAID organization, referred to as RAID levels, have been
proposed which aim at providing redundancy at low cost. These levels
have different cost–performance trade-offs. The RAID levels are classified
into seven levels (from level 0 to level 6).
17. The disk of a computer system contains bulk of data which can be
accessed by the system either directly through I/O ports (host-attached
storage) or through a remote system connected via a network (network-
attached storage).
18. Ideally, a disk should always work without producing any errors. However,
practically, this cannot be achieved. The only achievable thing is a disk
subsystem called stable storage, which ensures that whenever a write is
issued to the disk, it is performed either completely or not at all.
19. Tertiary storage, also known as tertiary memory, is built from inexpensive
disks and tape drives that use removable media. Due to relatively low
speeds of tertiary storage systems, they are primarily used for storing
data that is to be accessed less frequently. Some examples of tertiary
storage devices include floppy disk, optical disk, magneto-optical disk,
and magnetic tape.
EXERCISES

Fill in the Blanks


1. Disk surface of a platter is divided into imaginary _____________ and
_____________.
2. Reliability of a disk is measured in terms of _____________.
3. _____________ involves logical replacement of a bad sector with one of
the spare sectors in the disk.
4. In order to improve the performance of the disk, a concept called
_____________ is used which utilizes parallelism.
5. In RAID, one simple technique to keep redundant information is
_____________.

Multiple Choice Questions


1. The time taken to position read/write heads on specific track is known as
_____________.
(a) Rotational delay
(b) Seek time
(c) Data transfer time
(d) Access time
2. Which of the following RAID levels uses block-level striping?
(a) Level 0
(b) Level 1
(c) Level 3
(d) Level 6
3. Which of the following is not a tertiary storage device?
(a) Magnetic tape
(b) Optical disk
(c) Floppy disk
(d) Hard disk
4. Which of the following techniques is used for handling bad sectors?
(a) Forwarding
(b) Sector sparing
(c) Both (a) and (b)
(d) None of these
5. Which of the following disk scheduling algorithms suffers from starvation
problem?
(a) FCFS
(b) SSTF
(c) SCAN
(d) LOOK

State True or False


1. SSTF algorithm suggests operating system to select the request for
cylinder which is closest to the current head position.
2. After physical formatting, the disk can be used to boot the system and
store the data.
3. The controller card in the host is SCSI initiator and the other devices are
SCSI targets.
4. Due to relatively low speeds of tertiary storage systems, they are primarily
used for storing data that is to be accessed less frequently.
5. Excessive use of swap space degrades the system performance.

Descriptive Questions
1. Give hardware description and various features of a magnetic disk. How
do you measure its performance?
2. How does LOOK algorithm differ from SCAN algorithm?
3. In which ways can the swap-space be used by the operating system?
4. Explain why SSTF scheduling tends to favour middle cylinders over the
innermost and outermost cylinders.
5. Consider a disk drive having 200 cylinders, numbered from 0 to 199. The
head is currently positioned at cylinder 53 and moving toward the cylinder
199. The queue of pending I/O requests is: 98, 183, 37, 122, 14, 124, 65,
67.
Starting from the current head position, what is the total head
movement (in cylinders) to service the pending requests for each
of the following disk-scheduling algorithms?
(a) FCFS
(b) SSTF
(c) SCAN
(d) LOOK
(e) C-SCAN
(f) C-LOOK
6. Compare and contrast the sector sparing and sector slipping techniques
for managing bad sectors.
7. Define the following:
(a) Disk latency
(b) Seek time
(c) Head crash
(d) MTTF
8. Define RAID. What is the need of having RAID technology?
9. How can the reliability and performance of disk be improved using RAID?
Explain different RAID levels.
10. How does stable storage ensure consistency on the disk during a failure?
11. Write short notes on the following:
(a) Tertiary storage
(b) Swap-space management
(c) Disk formatting
(d) Disk attachment
chapter 11

File Systems

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ Understand the concept of file.
⟡ Discuss the aspects related to files such as file attributes, operations,
structures, access, and so on.
⟡ Discuss various types of directory structures.
⟡ Explain file-system mounting and unmounting.
⟡ Discuss the concept of record blocking.
⟡ Understand the concept of file sharing, and issues related to it.
⟡ Explain the protection mechanisms required for protecting files in
multi-user environment.

11.1 INTRODUCTION
Computer applications require large amounts of data to be stored, in
a way that it can be used as and when required. For this, secondary
storage devices such as magnetic disks, magnetic tapes and optical
discs are used. The storage of data on the secondary storage
devices makes the data persistent, that is, the data is permanently
stored and can survive system failures and reboots. In addition, a
user can access the data on these devices as per his/her
requirement.
The data on the disks are stored in the form of files. To store and
retrieve files on the disk, the operating system provides a mechanism
called file system, which is primarily responsible for the management
and organization of various files in a system. The file system consists
of two main parts, namely, a collection of files and a directory
structure. The directory structure is responsible for providing
information about all the files in the system. In this chapter, we will
discuss various aspects related to the file system.

11.2 FILES: BASIC CONCEPT


The operating system makes data storage and retrieval easier by
providing an abstract view of storage devices by hiding their internal
structure so that the users can directly access the data (on physical
devices) without exactly knowing where and how the data is actually
stored. The operating system defines a logical storage unit known as
a file, and all the data is stored in the form of files.
A file is a collection of related data stored as a named unit on the
secondary storage. It can store different types of data like text,
graphic, database, executable code, sound, and videos, and on the
basis of the data, a file can be categorized as data file, graphic file,
database file, executable file, sound file, video file, etc. Moreover, the
structure of a file is based on the type of the file. For example, a
graphic file is an organized collection of pixels, a database file is a
collection of tables and records, and a batch file is a collection of
commands.
Each file is associated with some attributes, such as its name,
size, type, location, date and time, etc. In this section, we will discuss
the properties possessed by a file, the operations performed on them,
various types of files that can be stored on a computer, and so on.
Note: From a user’s point of view, it is not possible to write data
directly to a storage device unless the data is placed within a file.

11.2.1 File Attributes


A file in a system is identified by its name. The file name helps a user
to locate a specific file in the system. Different operating systems
follow different file naming conventions. However, most operating
systems accept a file name as a string of characters, or numbers or
some special symbols as well. For instance, names such as alice,
tom, 3546, !hello and table2-1 are all valid file names. Note that
some operating systems distinguish the upper and lower case
characters in the file names. For instance, in UNIX the file names
Alice, alice, ALICE refer to three different files, whereas in DOS and
Windows they refer to the same file.
Apart from the file name, some additional information (also known
as file attributes) is also associated with each file. This information
helps the file system to manage a file within the system. The file
attributes related to a file may vary in different operating systems.
Some of the common file attributes are as follows.
• Name: Helps to identify and locate a file in the system.
• Size: Stores information about the current size of the file (in bytes,
words, or blocks).
• Type: Helps the operating system to recognize and use the
recommended program to open a particular file type. For
instance, to open an mpeg (multimedia) file, the operating system
uses a media player.
• Identifier: A unique tag, usually a number that helps the file
system to recognize the file within the file system.
• Location: A pointer that stores location information of the device
and location of the file on that device.
• Date and Time: Stores information related to a file, such as
creation, last modification and last use. Such information may be
useful in cases of protection, security, monitoring, etc.
• Protection: Stores information about the access permissions
(read, write, execute) of different users. For example, it may
specify who can access the file and which operations can be
performed on a file by a user.
Figure 11.1 shows the list of some attributes that MS-DOS attaches to
a file.

Fig. 11.1 File Attributes in MS-DOS

The information related to a file is stored as a directory entry in the
directory structure. The directory entry includes the file’s name and the
unique identifier. The identifier in turn locates the other file
attributes.
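On POSIX systems such as UNIX and Linux, a program can query many of these
attributes through the stat() system call. The short C program below is a
minimal sketch (error handling is kept to the bare minimum, and the
attribute fields shown are only a subset of what stat() returns); it
prints the size, the permission bits, and the last modification time of a
file named on the command line.

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

int main(int argc, char *argv[])
{
    struct stat sb;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <filename>\n", argv[0]);
        return 1;
    }
    if (stat(argv[1], &sb) == -1) {   /* fill sb with the file's attributes */
        perror("stat");
        return 1;
    }

    printf("Size          : %lld bytes\n", (long long)sb.st_size);
    printf("Protection    : %o (octal permission bits)\n",
           (unsigned int)(sb.st_mode & 0777));
    printf("Last modified : %s", ctime(&sb.st_mtime));
    return 0;
}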

11.2.2 File Operations


File operations are the functions that can be performed on a file. The
operating system handles the file operations through the use of
system calls. The various operations that can be performed on a file
are: create, write, read, seek, delete, open, append, rename and
close a file.
• Create a file: To bring a file into existence, the create system call
is used. When this system call is used, the operating system
searches for free space in the file system and allocates it to the
file. In addition, the operating system makes a directory entry to
record the name, location and other pieces of information about
the file.
• Open a file: To open a file, the open system call is used which
accepts the file name and the access mode (read, write, execute)
as parameters and returns a pointer to the entry in the open-file
table (a table in the main memory that stores information about
the files that are open at a particular time). The operating system
searches the directory entry table for the file name and checks if
the access permission in the directory entry matches the request.
If that access mode is allowed, it then copies the directory entry of
the file to the open-file table.
• Write to a file: To store data into a file, the write system call is
used which accepts the file name and the data to be written to the
file as parameters. The operating system searches the directory
entry to locate the file and writes the data to the specified position
in the file and also updates the write pointer to the location where
the next write operation is to take place.
• Read a file: To retrieve data from a file, the read system call is
used which accepts the file name, amount of data to be read and
a read pointer to point to the position from where the data is to be
read as parameters. The operating system searches the specified
file using the directory entry, performs the read operation and
updates the pointer to the new location. Note that since a process
may be only reading or writing a file at a time, a single pointer
called current position pointer can be used for both reading and
writing. Every time a read or write operation is performed, this
pointer must be updated.
• Seek file: To position the pointer to a specific position in a file, the
seek system call is used. Once the pointer is positioned, data can
be read from and written to that position.
• Close a file: When all the operations on a file are completed, it
must be closed using the close system call. The operating system
searches and erases the file entry from the open-file table to
make space for new file entries. Some systems automatically
close a file when the process that has opened the file terminates.
• Delete a file: When a file is not required, the delete system call is
used. The operating system searches the file name in the
directory listing. Having found the associated entry, it releases all
space allocated to the file (that can now be used by other files) by
erasing its corresponding directory entry.
• Append a file: To add data at the end of an existing file, append
system call is used. This system call works similar to the write
system call, except that it positions the pointer to the end of the
file and then performs the write operation.
• Rename file: To change the name of an existing file, rename
system call is used. This system call changes the existing entry
for the file name in the directory to a new file name.
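As a concrete illustration of how an application exercises these
operations, the following C fragment uses the POSIX system calls open(),
write(), lseek(), read(), and close(), with unlink() playing the role of
delete. It is a minimal sketch: the file name demo.txt is arbitrary, and a
real program would check the return value of every call.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[32];

    /* Create (and open) a file for reading and writing. */
    int fd = open("demo.txt", O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); return 1; }

    /* Write data; the file pointer advances past the written bytes. */
    write(fd, "hello, file system\n", 19);

    /* Seek back to the beginning, then read the data just written. */
    lseek(fd, 0, SEEK_SET);
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    buf[n > 0 ? n : 0] = '\0';
    printf("read back: %s", buf);

    /* Append by seeking to the end before writing. */
    lseek(fd, 0, SEEK_END);
    write(fd, "appended line\n", 14);

    close(fd);            /* close the file */
    unlink("demo.txt");   /* delete the file */
    return 0;
}

Note how the append operation is obtained simply by seeking to the end of
the file before writing, exactly as described above.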

11.2.3 File Types


We can store different types of files in a computer. The operating
system can handle a file in a reasonable way only if it recognizes and
supports that file type. A user request to open an executable file with
a text editor will only produce garbage if the operating system has not
been told that it is an executable file.
The most common technique to implement a file type is by
providing extension to a file. The file name is divided into two parts,
with the two parts separated by a period (‘.’) symbol, where the first
part is the name and the second part after the period is the file
extension. A file extension is generally one to three characters long; it
indicates the type of the file and the operations (read, write, execute)
that can be performed on that file.
name Itlesl.doc, Itlesl is the name and .doc is the file extension. The
extension .doc indicates that Itlesl.doc is a document file and should
be opened with an editor. Similarly, a file with .exe or .com extension is
an executable file. Table 11.1 lists various file types, their extensions,
and their meaning.
File extensions help the operating system to know about the
application program that has created the file. For instance, the file
with .txt extension will be opened with a text editor and the file with
.mp3 extension will be opened with a music player supporting the .mp3
files. Note that the operating system automatically opens the
application program (for the known file types) whenever a user
double clicks the file icon.
Table 11.1 File Types and Extensions
Some operating systems, such as UNIX, support the use of double
extension to a file name. For example, the file name file1.c.z is a
valid file name, where .c reveals that file1 is a C language file and .z
reveals that the file is compressed using some zip program. A file
extension can be system defined or user defined.
Another way to implement the file type is the use of magic
number. A magic number is a sequence of bits, placed at the
beginning of a file to indicate roughly the type of file. The UNIX
system makes use of the magic number to recognize the file type.
However, not all its files have magic numbers. To help its users to
determine the type of contents of the file, it allows file-name-
extension hints.
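As a small illustration of the magic-number idea, the following C program
reads the first four bytes of a file and checks whether they match the
well-known magic number of an ELF executable (the byte 0x7f followed by
the characters 'E', 'L', 'F'), which is how UNIX-like systems recognize
such binaries regardless of the file name. The check shown is deliberately
simple; real systems consult a whole database of magic numbers.

#include <stdio.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <filename>\n", argv[0]);
        return 1;
    }

    FILE *fp = fopen(argv[1], "rb");
    if (fp == NULL) { perror("fopen"); return 1; }

    unsigned char magic[4] = {0};
    fread(magic, 1, 4, fp);       /* read the first four bytes */
    fclose(fp);

    if (magic[0] == 0x7f && magic[1] == 'E' &&
        magic[2] == 'L' && magic[3] == 'F')
        printf("%s looks like an ELF executable\n", argv[1]);
    else
        printf("%s: magic number not recognized by this simple check\n",
               argv[1]);
    return 0;
}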
11.2.4 File Structure
The file structure refers to the internal structure of the file, that is, how
a file is internally stored in the system. The most common file
structures recognized and enforced by different operating systems
are as follows.
• Byte sequence: In this file structure, each file is made up of a
sequence of 8-bit bytes [see Figure 11.2 (a)] having no fixed
structure. The operating system does not attach any meaning to
the file. It is the responsibility of the application program to include
a code to interpret the input file into an appropriate structure. This
type of file structure provides flexibility to the user programs as
they can store any type of data in the files and name these files in
any way as per their convenience. UNIX operating systems
support this type of file structure.
• Record sequence: In this file structure, a file consists of a
sequence of fixed length records, where arbitrary number of
records can be read from or written to a file. The records cannot
be inserted or deleted in the middle of a file. In this system, the
read operation returns one record and the write operation
appends or overwrites one record. CP/M operating system
supports this type of scheme.
Fig. 11.2 File Structures

• Tree structure: In this file structure, a file consists of a tree of
disk blocks, where each block holds a number of records of varied lengths.
Each record contains a key field at a fixed position. The records are
searched on the key value and new records can be inserted anywhere in the
file structure. This type of file structure is used on mainframe systems,
where it is called ISAM (Indexed Sequential Access Method).
Regardless of the file structure used, all disk I/O operations take
place in terms of blocks (physical records), where all blocks are of
equal size and the size of a block is generally determined by the size
of the sector. Since disk space to a file is allocated in blocks, some
portion of the last block in a file is generally wasted. For instance, if
each block is of 512 bytes, then a file of 3150 bytes would be
allocated seven blocks, and the last 434 bytes of the seventh block will
be wasted. This wastage of bytes, caused by keeping everything in units of
blocks (instead of bytes), is called internal fragmentation. Note that all
file systems suffer from internal fragmentation, and with larger block
sizes there is more internal fragmentation.
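The figures quoted above follow from a simple ceiling division, as the
short C calculation below shows (the numbers are just the ones used in the
example).

#include <stdio.h>

int main(void)
{
    long block_size = 512;     /* bytes per block  */
    long file_size  = 3150;    /* bytes in the file */

    long blocks = (file_size + block_size - 1) / block_size;  /* ceiling  */
    long wasted = blocks * block_size - file_size;  /* internal fragmentation */

    printf("blocks allocated: %ld\n", blocks);   /* 7   */
    printf("bytes wasted    : %ld\n", wasted);   /* 434 */
    return 0;
}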

11.2.5 File Access


The information stored in the file can be accessed in one of the two
ways: sequential access or direct access.

Sequential Access
When the information in the file is accessed in the order of one record
after the other, it is called sequential access. It is the easiest file
access method. Compilers, multimedia applications, sound files and
editors are the most common examples of the programs using
sequential access.
The most frequent and common operations performed on a file
are read and write. In the case of read operation, the record at the
location pointed by the file pointer is read and the file pointer is then
advanced to the next record. Similarly, in the case of write operation,
the record is written to the end of the file and the pointer is advanced
to the end of the new record.

Direct Access
With the advent of disks as a storage medium, large amounts of data
can be stored on them. Sequential access of this data would be very
lengthy and a slow process. To overcome this problem, the data on
the disk is stored as blocks of data with index numbers which helps to
read and write data on the disk in any order (known as random or
direct access).
Under direct access, a file is viewed as a sequence of blocks (or
records) which are numbered. The records of a file can be read or
written in any order using this number. For instance, it is possible to
read block 20, then write block 4, and then read block 13. The block
number is a number given by the user. This number is relative to the
beginning of the file. This relative number internally has an actual
absolute disk address. For example, the record number 10 can have
the actual address 12546 and block number 11 can have the actual
address 3450. The relative address is internally mapped to the
absolute disk address by the file system. The user gives the relative
block number for accessing the data without knowing the actual disk
address. Depending on the system, this relative number for a file
starts with either 0 or 1.
In direct access, the system calls for read and write operations
are modified to include the block number as a parameter. For
instance, to perform the read or write operation on a file, the user
gives read n or write n (n is the block number) rather than read next
or write next system calls used in sequential access.
Most applications with large databases require direct access
method for immediate access to large amounts of information. For
example, in a railway reservation system, if a customer requests to
check the status for reservation of the ticket, the system must be able
to access the record of that customer directly without having the need
to access all other customers’ records.
Note that for accessing the files, an operating system may support
either sequential access or direct access, or both. Some systems
require a file to be defined as sequential or direct when it is created;
so that it can be accessed in the way it is declared.
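On a system that exposes files as byte sequences, a direct-access read of
block n can be realized by converting the relative block number into a
byte offset, as the following C sketch shows. The file name records.dat
and the block size of 512 bytes are assumptions made for the example; in
systems with true direct-access files, this mapping is hidden inside the
file system.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BLOCK_SIZE 512   /* assumed logical block size for this example */

/* Read relative block number n (counting from 0) of an open file. */
ssize_t read_block_n(int fd, long n, unsigned char *buf)
{
    off_t offset = (off_t)n * BLOCK_SIZE;   /* relative block -> byte offset */
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, BLOCK_SIZE);
}

int main(void)
{
    unsigned char buf[BLOCK_SIZE];
    int fd = open("records.dat", O_RDONLY);   /* hypothetical data file */
    if (fd == -1) { perror("open"); return 1; }

    /* Blocks can be read in any order, e.g. block 20, then block 13. */
    if (read_block_n(fd, 20, buf) >= 0 && read_block_n(fd, 13, buf) >= 0)
        printf("blocks 20 and 13 read in arbitrary order\n");

    close(fd);
    return 0;
}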

11.3 DIRECTORIES
As stated earlier, a computer stores numerous data on the disk. To
manage this data, the disk is divided into one or more partitions (also
known as volumes) and each partition contains information about the
files stored in it. This information is stored in a directory (also known
as device directory). In simplest terms, a directory is a flat file that
stores information about files and subdirectories.
In this section, we will discuss some most commonly used
directory structures, including single-level, two-level, and hierarchical
directory as well various operations that can be performed over
directories.

11.3.1 Single-level Directory System


Single-level directory is the simplest directory structure. There is only
one directory that holds all the files. Sometimes, this directory is
referred to as root directory. Figure 11.3 shows a single-level
directory structure having five files. In this figure, the box represents
directory and circles represent files.

Fig. 11.3 Single-level Directory Structure

The main drawback of this system is that no two files can have
the same name. For instance, if one user (say, jojo) creates a file
with name file1 and then another user (say, abc) also creates a file
with the same name, the file created by the user abc will overwrite the
file created by the user jojo. Thus, all the files must have unique
names in a single-level directory structure. With the increase in the
number of files and users on a system, it becomes very difficult to
have unique names for all the files.

11.3.2 Two-level Directory System


In a two-level directory structure, a separate directory known as user
file directory (UFD) is created for each user. Whenever a new UFD
is created, an entry is added to the master file directory (MFD)
which is at the highest level in this structure (see Figure 11.4). When
a user refers to a particular file, first, the MFD is searched for the
UFD entry of that user and then the file is searched in the UFD.
Unlike single-level directory structure, in a two-level directory
system the file names should be unique only within a directory. That
is, there may be files with same name in different directories. Thus,
there will not be any problem of name-collision in this directory
structure but the disadvantage is that the users are not allowed to
access files of other users. If a user wants to access a file of other
user(s), special permissions will be required from the administrator. In
addition, to access other users’ files, the user must know the other
user’s name and the desired file name. Note that different systems
use different syntax for file naming in directories. For instance, in MS-
DOS, to access the file in the sm directory, the user gives //comp/sm,
where // refers to the root, comp is the user name, sm is the directory.

Fig. 11.4 Two-level Directory Structure

In some situations, a user might need to access files other than its
own files. One such situation might occur with system files. The user
might want to use system programs like compilers, assemblers,
loaders, or other utility programs. In such a case, to copy all the files
in every user directory would require a lot of space and thus, would
not be feasible. One possible solution to this is to make a special user
directory and copy system files into it. Now, whenever a filename is
given, it is first searched in the local UFD. If not found then the file is
searched in the special user directory that contains system files.
11.3.3 Hierarchical Directory System
The hierarchical directory, also known as tree of directory or tree-
structured directory, allows users to have subdirectories under their
directories, thus making the file system more logical and organized
for the user. For instance, a user may have directory furniture, which
stores files related to the types of furniture, say wooden, steel, cane,
etc. Further, he wants to define a subdirectory which states the kind
of furniture available under each type, say sofa, bed, table, chair, etc.
Under this system, the user has the flexibility to define, group and
organize directories and subdirectories according to his requirements.

Fig. 11.5 Hierarchical Directory Structure

Hierarchical directory structure has the root directory at the


highest level, which is the parent directory for all directories and
subdirectories. The root directory generally consists of system library
files. All files or directories at lower levels are called child directories
and a directory with no files or subdirectory is called a leaf. Every file
in the system has a unique path name. A path name is the path from
the root, through all the subdirectories, to a specified file. Figure 11.5
shows the hierarchical directory structure having different levels of
directories, subdirectories and related files.
In this structure, the major concern is the deletion of files. If a
directory is empty it can simply be deleted, however, if the directory
contains subdirectories and files, they need to be handled first. Some
systems, for example MS-DOS, require a directory to be completely
empty before a delete operation can be performed on them. The user
needs to delete all the files, subdirectories, files in subdirectories
before performing the delete operation on a directory. Some systems,
for example UNIX, are flexible as they allow users to delete a
complete directory structure containing files and subdirectory with a
single rm command. Though it is easy for a user to handle delete
operation on directory under the UNIX system, it increases the risk of
accidental deletion of files.
Note: MS-DOS, Windows, and UNIX are some of the systems using
hierarchical directory structure.

Pathnames
Under hierarchical directory system, a user can access files of other
users in addition to its own files. To access such files, the user needs
to specify either the absolute path name or the relative path name.
The absolute path name begins at the root and follows a path down
to the specified file, whereas the relative path name defines a path
from the current working directory. For instance, to access a file under
directory D1, using absolute path name, the user will give the path
\\bin\D8\D1\filename. On the other hand, if the user’s current
working directory is \\bin\D8, the relative path name will be
D1\filename.

11.3.4 Directory Operations


Different operations that can be performed on a directory are as
follows:
• Create a file: New files can be created and added to a directory
by adding a directory entry in it.
• Search a file: Whenever a file is required to be searched, its
corresponding entry is searched in the directory.
• List a directory: All the files in the directory, along with the
contents of their directory entries, are listed.
• Rename a file: A file can be renamed; for example, a user might need to
rename a file when its contents change. When a file is renamed, its
position within the directory may also change.
• Delete a file: When a file is no longer required, it can be deleted
from the directory.
• Traverse the file system: Every directory and every file within a
directory structure can be accessed.
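The 'list a directory' operation, for instance, is available to C programs
on UNIX-like systems through the opendir(), readdir(), and closedir()
library calls, as in the following minimal sketch (it simply prints the
name stored in each directory entry).

#include <dirent.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    const char *path = (argc > 1) ? argv[1] : ".";   /* default: current directory */

    DIR *dp = opendir(path);
    if (dp == NULL) { perror("opendir"); return 1; }

    struct dirent *entry;
    while ((entry = readdir(dp)) != NULL)   /* one directory entry per iteration */
        printf("%s\n", entry->d_name);

    closedir(dp);
    return 0;
}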

11.4 FILE-SYSTEM MOUNTING


A disk is partitioned into many logical disks (partitions); and on each
partition there exists a file system. To access files of a file system, it is
required to mount the file system. Mounting a file system means
attaching the file system to the directory structure of the system.
To implement mounting, it is required to specify the root of the file
system to be mounted and the mount point to the operating system.
The mount point specifies the location within the file structure at which
the file system is to be attached. Typically, mount points are empty
directories in the file system hierarchy which are meant for mounting
purposes only. The mount operation is carried out on the file system by
the command mount(<Filesystem_name>, <mount_point_name>). Once the file
system is mounted, a file (say, myfile) with its relative directory path
(say, mypath) in Filesystem_name can be accessed using the pathname
<mount_point_name>/mypath/myfile.
To understand file-system mounting, let us consider the file system shown
in Figure 11.6, where (a) shows the file system before mounting and (b)
shows the file system after the mount operation mount(department,
company/manager) has been performed. Now, the employee file can be
accessed through the pathname company/manager/marketing/employee.

Fig. 11.6 Mounting of a File System

The effect of mounting lasts until the file system is unmounted.


Unmounting a file system means detaching a file system from the
system’s directory structure. The unmount operation is carried out by
command unmount (<Filesystem_name>, <mount_point_name>). For
example, if the unmount operation unmount (department,
company/manager) is performed on the file system shown in Figure
11.6(b), the file system will be restored to the file system shown in
Figure 11.6(a). Note that the files of the mounted file system must be
closed in order to carry out the unmount operation successfully.
Note: A file system is unmounted automatically whenever a system is
rebooted.
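On Linux, for example, the same operations are exposed to privileged
programs through the mount() and umount() system calls (and to users
through the mount and umount shell commands). The fragment below is a
hedged sketch: the device name /dev/sdb1, the file-system type ext4, and
the mount point /mnt/department are invented for the example, and the
program must run with root privileges.

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    /* Attach the file system on /dev/sdb1 to the directory /mnt/department. */
    if (mount("/dev/sdb1", "/mnt/department", "ext4", 0, NULL) == -1) {
        perror("mount");
        return 1;
    }
    printf("file system mounted at /mnt/department\n");

    /* ... its files are now reachable as /mnt/department/<path>/<file> ... */

    /* Detach the file system again (all its files must be closed first). */
    if (umount("/mnt/department") == -1) {
        perror("umount");
        return 1;
    }
    printf("file system unmounted\n");
    return 0;
}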

11.5 RECORD BLOCKING


Whenever a user or an application performs an operation on a file, it is
performed at the record level, whereas I/O is performed on a block basis.
Thus, for performing I/O, the records must be organized
as blocks. That is, the records of a file must be blocked when they
are written onto the secondary storage, and when they are read, they
must be unblocked before presenting them to the user. To implement
record blocking, two issues must be considered:
• Fixed-length or variable-length blocks: Most of the systems
support fixed-length blocks in order to simplify I/O operations,
buffer allocation in the main memory, and organization of blocks
on the secondary storage.
• Block size: The larger the size of the block, the more records can be
transferred in a single I/O operation. This improves the processing speed
of sequential access, as using larger blocks reduces the number of I/O
operations. However, if the records are
being accessed randomly with no particular locality of reference,
then larger blocks are not much useful as they would result in
unnecessary transfer of unused records. However, if the
frequency of sequential access is combined with the potential of
locality of reference, then larger blocks prove faster. Another
major concern of using larger blocks is that they make buffer
management a difficult task because they require large I/O
buffers in the main memory to accommodate their large size.
Three methods of record blocking are used depending on the size
of the block (see Figure 11.7). These methods are:
• Fixed blocking: In this type of blocking, fixed-length records are used
and an integral number of records is kept in each block. This may lead to
some unused space at the end of each block (internal fragmentation), as
the short calculation after Figure 11.7 illustrates. Fixed blocking is
mainly used for sequential files with fixed-length records.
• Variable-length spanned blocking: This type of blocking
accommodates variable-length records into blocks. In this
approach, the last record in a block may span to the next block if
the length of the record is larger than the space left in the current
block. Thus, this approach does not waste space, rather it
efficiently utilizes storage space. However, it is difficult to
implement as two I/O operations are required for the records
spanning two blocks.
• Variable-length unspanned blocking: This blocking uses
variable-length records without spanning due to which a lot of
space is left unused in most of the blocks as the remaining
portion of a block cannot hold the next record if the size of that
record is larger than the remaining space. In this technique, a lot
of space is wasted and the record size is limited to the block size.
Fig. 11.7 Record Blocking Methods
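For fixed blocking, the number of records that fit in a block and the
space left unused follow from a simple integer division; the small C
calculation below illustrates this with a block size and record size
chosen purely for the example.

#include <stdio.h>

int main(void)
{
    int block_size  = 512;   /* bytes per block (assumed)     */
    int record_size = 120;   /* bytes per fixed-length record */

    int records_per_block = block_size / record_size;   /* 4  */
    int unused_per_block  = block_size % record_size;   /* 32 */

    printf("records per block     : %d\n", records_per_block);
    printf("unused bytes per block: %d (internal fragmentation)\n",
           unused_per_block);
    return 0;
}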

11.6 FILE SHARING


In a multi-user environment, where multiple users collaboratively work
to achieve the same computing goal, file sharing is a desirable
feature that must be provided by a multi-user operating system. File
sharing allows a number of people to access the same file
simultaneously. File sharing can be viewed as part of file systems and
their management. In this section, we will first discuss how multiple
users are allowed to share files in a system where single file system
is used. Then, we will discuss how file sharing can be extended in an
environment that involves multiple file systems including remote file
systems.

11.6.1 File Sharing among Multiple Users


In a system where multiple users are allowed to share files, file
naming and file protection are the essential issues that need to be
addressed. There are mainly two ways in which files can be shared
among multiple users. First, the system by default allows the users to
share the files of other users, and second, the owner of a file explicitly
grants access rights to the other users. The owner is the user who
has the most control over the file or the directory. He or she can
perform all the operations on the file, like reading, writing, renaming,
changing the file attributes, grant access, and so on. The other users
to whom the owner grants access are termed as group members.
It is not necessary that all the users belonging to a particular
group are given the same privileges and access rights. For example,
some members of the group may have right to only read the contents
of the file, while some other members may have the right to read as
well as modify the contents of the file. Therefore, it is entirely at the
discretion of the owner which rights to grant to which member.
Each user of the system is assigned a unique user ID. The
system maintains a list of all its user names and associated user IDs.
Whenever a user logs into the system, his corresponding user ID is
determined from the list, which is then associated with all his
processes and threads until the user logs out. The owner (user) of
one file may belong to one or more groups associated with other files.
Thus, to implement group functionality, a system-wide list of group
names and group ID is maintained.
To implement file sharing, the system must associate each file
with its corresponding user ID and group ID. Whenever a user wants
to perform an operation on a file, the user ID is compared to the
owner attribute to determine whether the requesting user is the owner
of the file. Similarly, the group IDs can be compared. Depending on
the result of the comparison, the requested operation on the file is
granted or denied.
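The comparison described above can be pictured with the following C
sketch. The structure fields, the rights flags, and the owner/group/other
split are assumptions chosen to mirror common UNIX-style systems; they are
not a description of any particular implementation.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-file metadata kept by the file system. */
struct file_info {
    int owner_uid;      /* user ID of the file's owner       */
    int group_gid;      /* group ID associated with the file */
    int owner_rights;   /* rights granted to the owner       */
    int group_rights;   /* rights granted to group members   */
    int other_rights;   /* rights granted to everyone else   */
};

#define RIGHT_READ   0x1
#define RIGHT_WRITE  0x2
#define RIGHT_EXEC   0x4

/* Decide whether a requesting user may perform the requested operation. */
bool access_allowed(const struct file_info *f,
                    int uid, int gid, int requested)
{
    int granted;
    if (uid == f->owner_uid)
        granted = f->owner_rights;     /* requester is the owner      */
    else if (gid == f->group_gid)
        granted = f->group_rights;     /* requester is a group member */
    else
        granted = f->other_rights;     /* any other user              */

    return (granted & requested) == requested;
}

int main(void)
{
    struct file_info f = { 100, 20, RIGHT_READ | RIGHT_WRITE, RIGHT_READ, 0 };
    printf("owner write: %d\n", access_allowed(&f, 100, 20, RIGHT_WRITE)); /* 1 */
    printf("group write: %d\n", access_allowed(&f, 101, 20, RIGHT_WRITE)); /* 0 */
    return 0;
}

In a real system the rights values would come from the file's protection
attribute, which is discussed further in Section 11.7.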

11.6.2 File Sharing in Remote File Systems


Remote file systems allow a computer to mount one or more file
systems from one or more remote machines. Thus, in a networked
environment, where file sharing is possible between remote systems,
more sophisticated file sharing methods are needed. With the
advancements in network and file technology, file sharing methods
have also been changed. Traditionally, the files were transferred
manually from one machine to another via programs like ftp. Then,
distributed file systems (DFS) came into existence, which allowed users to
view remote directories from their local machines. The
third method, which is the World Wide Web, allows users to gain
access to remote files using a browser. ftp is still needed to transfer
files from one machine to another.
ftp allows both anonymous as well as authenticated access. In
anonymous access, the user is allowed access without having any
account on the remote system. In authenticated access, the user
must be authenticated before accessing a file. WWW makes use of
anonymous file transfer, whereas DFS requires an authenticated
access.
DFS requires a much tighter integration between the machine
providing the files and the machine accessing the remote files. The
former is termed as the server, whereas the latter is termed as a
client. The server specifies which file(s) are available to exactly
which clients. There could be many-to-many relationship between the
servers and clients. That is, one server may provide files to multiple
clients, and one client can access files from multiple servers. Thus, a
given machine can be both server to other clients, and a client of
other servers.
When a client sends a file operation request to a server, the ID of
the requesting user is also sent along with the request. The server
performs the standard access checks to determine whether the user
has the permission to access the file in the desired mode. If the user
does not have the access rights, the file access is denied; otherwise,
a file handle is returned to the client, which then can perform the
desired operation on the file. Once the access is complete, the client
closes the file.

11.6.3 Consistency Semantics


As long as a file is being shared among multiple users for reading
only, no consistency issue arises. The issue only arises when a write
operation is performed on a file which is being accessed by other
users as well. The characterization of the system that specifies the
semantics of multiple users accessing a shared file simultaneously is
known as consistency semantics. These semantics specify when
the modifications done by one user should be made visible to the
other users accessing the file.
For example, in UNIX file system, the modifications done in an
open file are made visible immediately to other users who are
accessing this file. On the other hand, in Andrew File System (AFS)—
a file system designed for distributed computing environment—the
modifications done in an open file are not immediately visible to the
other users accessing this file.

11.7 PROTECTION
The information stored in a system needs to be protected from
physical damage and unauthorized access. A file system can be
damaged due to various reasons, such as a system breakdown, theft,
fire, lightning or any other extreme condition that is unavoidable and
uncertain. It is very difficult to restore the data back in such
conditions. In some cases, when the physical damage is irreversible,
the data can be lost permanently. Though physical damage to a
system is unavoidable, measures can be taken to safeguard and
protect the data.
In a single-user system, protection can be provided by keeping a backup
copy of the information on the disk, either on the disk itself or on some other
removable storage medium, such as magnetic tapes and compact
discs. If the original data on the disk is erased or overwritten
accidentally, or becomes inaccessible because of its malfunctioning,
the backup copy can be used to restore the lost or damaged data.
Apart from protecting the files from physical damage, the files in a
system also need a protection mechanism to control improper
access.

11.7.1 Types of Access


In a single-user system or in a system where users are not allowed to
access the files of other users, there is no need for a protection
mechanism. However, in a multiuser system where some user can
access files of other users, the system is prone to improper access,
and hence a protection mechanism is mandatory. The access rights
define the types of operations that a user is allowed to perform on a file. The
different access rights that can be assigned to a particular user for a
particular file are as follows.
• Read: Allow reading from the file.
• Write: Allow writing or rewriting the file.
• Execute: Allow running the program or application.
• Append: Allow writing new information at the end of the file.
• Copy: Allow creating a new copy of the file.
• Rename: Allow renaming a file.
• Edit: Allow adding and deleting information from the file.
• Delete: Allow deleting the file and releasing the space.
There are many protection mechanisms, each having its own
advantages and disadvantages. The choice among them depends on the
needs and size of the organization: a small organization needs a
different protection mechanism than a large organization in which a
large number of people share files.
11.7.2 Access Control
To protect the files from improper accesses, the access control
mechanism can follow either of the two approaches.

Password
A password can be assigned to each file and only the users
authorized to access the file are given the password. This scheme
protects the file from unauthorized access. The main drawback of this
approach is the large number of passwords, one per file, which are
practically very difficult to remember. However, if only one password is
used for accessing all the files, then once that password is known, all
the files become accessible. To balance the number of passwords in a
system, some systems follow a scheme in which a user can associate a
password with a subdirectory. This scheme allows a user to access all
the files under a subdirectory with a single password. However, this
scheme is also not very safe. To overcome the drawbacks of these
schemes, protection must be provided at a finer level by using multiple
passwords.

Access Control List

Fig. 11.8 A Sample Access Control List


It is an alternative method of recording access rights in a computer
system in which access to a file is provided on the basis of identity of
the user. An access-control list (ACL) is associated with each file
and directory, which stores user names and the type of access
allowed to each user. When a user tries to access a file, the ACL is
searched for that particular file. If that user is listed for the requested
access, the access is allowed. Otherwise, access to the file is denied.
Figure 11.8 shows a sample access control list for five files and four
users. It is clear from the figure that user A has access to File 1, File
2 and File 5, user B has access to File 1, File 2 and File 3 and
user C has access to File 2, File 3 and File 4.
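
An ACL of this kind can be pictured as a per-file table that maps user names to the operations they are permitted (a minimal sketch only; the file and user names loosely follow Figure 11.8, and real systems keep ACLs in a compact on-disk form):

# Sketch of an access-control-list check; the entries are illustrative.

acl = {
    "File 1": {"A": {"read"}, "B": {"read", "write"}},
    "File 2": {"A": {"read"}, "B": {"read"}, "C": {"read", "write"}},
}

def check_acl(filename, user, operation):
    """Allow the operation only if the user is listed for it in the file's ACL."""
    entry = acl.get(filename, {})
    return operation in entry.get(user, set())

print(check_acl("File 1", "B", "write"))   # True
print(check_acl("File 2", "D", "read"))    # False: user D is not listed
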
This system of access control is effective, but if all users want to
read a file, the ACL for this file should list all users with read
permission. The main drawback of this system is that making such a
list would be a tedious job when the number of users is not known.
Moreover, the list needs to be dynamic in nature as the number of
users will keep on changing, complicating the space management.
To resolve the problems associated with ACL, a restricted version
of the access list can be used in which the length of the access list is
shortened by classifying the users of the system into the following
three categories.
• Owner: The user who created the file.
• Group: A set of users who need similar access permission for
sharing the file is a group, or work group.
• Universe: All the other users in the system form the universe.
Access permissions are assigned based on the category of a
user. The owner of the file has full access to the file and can perform
all file operations (read, write and execute), whereas a group user
can read and write the file but cannot execute or delete it. However,
the member of the universe can only read it and is not allowed to
perform any other operations on it.
The UNIX operating system uses the above discussed method to
achieve protection, where the users are divided into the three
categories, and the access permissions for each file are set with the
help of three fields: one for the file owner, one for the file’s group
and one for all other users. Each field is a collection of three bits,
written as rwx, where r controls read access, w controls write access
and x controls execution. Thus, when all three bits of a field are set
(rwx), the corresponding user has full permission on the file; if only
r-- is set, the permission is only to read from the file; and when rw-
is set, the user can read and write but cannot execute the file. The
scheme thus requires a total of nine bits to store the protection
information. The permissions for a file can be set either by the
administrator or by the file owner.
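
The nine bits can be pictured as three rwx triples packed into a single value, in the spirit of the familiar octal chmod notation (a sketch assuming the conventional bit layout; it is not taken from any particular UNIX implementation):

# Sketch: decode a 9-bit UNIX-style mode into owner/group/other rwx strings.

def mode_to_string(mode):
    """Turn a permission value such as 0o754 into 'rwxr-xr--'."""
    out = []
    for shift in (6, 3, 0):                  # owner, group, others
        triple = (mode >> shift) & 0b111
        out.append("r" if triple & 0b100 else "-")
        out.append("w" if triple & 0b010 else "-")
        out.append("x" if triple & 0b001 else "-")
    return "".join(out)

print(mode_to_string(0o754))   # rwxr-xr--
print(mode_to_string(0o640))   # rw-r-----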

LET US SUMMARIZE
1. Computer applications require large amounts of data to be stored, in a
way that it can be used as and when required.
2. Storing data on the secondary storage devices makes the data persistent,
that is, the data is permanently stored and can survive system failures
and reboots.
3. The data on the disks are stored in the form of files. To store and retrieve
files on the disk, the operating system provides a mechanism called the
file system, which is primarily responsible for the management and
organization of various files in a system.
4. A file is a collection of related data stored as a named unit on the
secondary storage.
5. Each file is associated with some attributes such as its name, size, type,
location, date and time, etc. These are known as file attributes. This
information helps the file system to manage a file within the system.
6. File operations are the functions that can be performed on a file. The
operating system handles the file operations through the use of system
calls.
7. Various operations that can be performed on a file are: create, write, read,
seek, delete, open, append, rename and close.
8. The most common technique to implement a file type is by providing
extension to a file. The file name is divided into two parts, with the two
parts separated by a period (‘.’) symbol, where the first part is the name
and the second part after the period is the file extension.
9. Another way to implement the file type is the use of magic number. A
magic number is a sequence of bits, placed at the beginning of a file to
indicate roughly the type of the file.
10. File structure refers to the internal structure of a file, that is, how a file is
internally stored in the system.
11. The most common file structures recognized and used by different
operating systems are byte sequence, record sequence and tree
structure.
12. The information stored in a file can be accessed in one of the two ways:
sequential access, or direct access.
13. When the information in the file is accessed in the order of one record after
the other, it is called sequential access.
14. When a file is viewed as a sequence of blocks (or records) which are
numbered and can be read or written in any order using this number, it is
called direct access.
15. To manage the data on the disk, the disk is divided into one or more
partitions (also known as volumes) where each partition contains the
information about the files stored in it. This information is stored in a
directory (also known as device directory).
16. Various schemes to define the structure of a directory are: single-level
directory, two-level directory and hierarchical directory.
17. Single-level directory is the simplest directory structure. There is only one
directory that holds all the files. Sometimes, this directory is referred to as
root directory.
18. In a two-level directory structure, a separate directory known as user file
directory (UFD) is created for each user. Whenever a new UFD is created,
an entry is added to the master file directory (MFD) which is at the highest
level in this structure.
19. The hierarchical directory, also known as tree of directory or tree-
structured directory, allows users to have subdirectories under their
directories, thus making the file system more logical and organized for the
user.
20. Mounting a file system means attaching the file system to the directory
structure of the system. The effect of mounting lasts until the file system is
unmounted. Unmounting a file system means detaching a file system from
the system’s directory structure.
21. Whenever a user or an application performs an operation on a file, it is
performed at the record level, whereas I/O is performed on a block
basis. Thus, for performing I/O, the records must be organized as blocks.
22. Three methods of record blocking are used depending on the size of the
block, namely, fixed blocking, variable-length spanned blocking, and
variable-length unspanned blocking.
23. File sharing allows a number of people to access the same file
simultaneously. File sharing can be viewed as part of the file systems and
their management.
24. There are mainly two ways in which files can be shared among multiple
users. First, the system by default allows the users to share the files of
other users, and second, the owner of a file explicitly grants access rights
to other users.
25. The owner is the user who has the most control over the file or the
directory. He or she can perform all the operations on the file. The other
users to whom the owner grants access to his or her file are termed as
group members.
26. Remote file systems allow a computer to mount one or more file systems
from one or more remote machines. Thus, in a networked environment,
where file sharing is possible between remote systems, more
sophisticated file sharing methods are needed.
27. Characterization of the system that specifies the semantics of multiple
users accessing a shared file simultaneously is known as consistency
semantics. These semantics specify when the modifications done by one
user should be made visible to the other users accessing the file.
28. In a single-user system or in a system where users are not allowed to
access the files of other users, there is no need for a protection
mechanism. However, in a multi-user system where some user can
access files of other users, the system is prone to improper access, and
hence a protection mechanism is mandatory.
29. To protect the files from improper accesses, the access control mechanism
can follow either of the two approaches: password and access control list.
30. A password can be assigned to each file and only a user knowing the
password can access the file.
31. An access-control list (ACL) is associated with each file and directory,
which stores user names and the type of access allowed to each user.
When a user tries to access a file, the ACL is searched for that particular
file. If that user is listed for the requested access, the access is allowed.
Otherwise, access to the file is denied.

EXERCISES
Fill in the Blanks
1. To store and retrieve files on the disk, the operating system provides a
mechanism called _____________.
2. The additional information that helps the operating system to manage a
file within the file system is called _____________.
3. The data on the disk is kept as blocks of data with an _____________ to
access data directly in a random order.
4. When the information in the file is accessed in the order of one record
after the other, it is called _____________.
5. _____________ is a file system designed for distributed computing
environment.

Multiple Choice Questions


1. Which of these file attributes helps the operating system to create an
environment for a user to work on a file?
(a) File name
(b) File type
(c) File size
(d) File location
2. Which of these file attributes helps the operating system to position the
pointer to a specific position in a file?
(a) Delete file
(b) Append file
(c) Seek file
(d) Rename file
3. Which of these file types compresses and groups together related files
into a single file for storage?
(a) Archive
(b) Batch
(c) Backup file
(d) Library
4. Which of these directories holds all the files?
(a) Device directory
(b) Root directory
(c) User file directory
(d) Master file directory
5. Which of these applications support sequential access method for
information storage and retrieval?
(a) Compilers
(b) Large databases
(c) Multimedia files
(d) All of these

State True or False


1. File system is a part of the operating system that is primarily responsible
for the management and organization of various files.
2. In Windows, the file system distinguishes upper case and lower case
characters in a file name; that is, it treats ‘comp’ and ‘COMP’ as two different
files.
3. In a two-level directory structure, a separate directory known as user file
directory (UFD) is created for each user.
4. All files or directories at the lower levels are called leaf directories and a
directory with no files or subdirectory is called a child.
5. MS-DOS, Windows, and UNIX are some of the systems that use
hierarchical directory structure.

Descriptive Questions
1. Explain the need for storing data on secondary storage devices.
2. Which system supports double extensions to a file name?
3. What is the difference between absolute path name and relative path
name?
4. Define the role of a file system in organizing and managing different files
in a system.
5. “The operating system gives a logical view of the data to its user”. Justify
this statement.
6. When a user double clicks on a file listed in Windows Explorer, a program
is run and given that file as parameter. List two different ways the
operating system could know which program to run?
7. Some systems simply associate a stream of bytes as a structure for a
file’s data, while others associate many types of structures for it. What are
the related advantages and disadvantages of each system?
8. A program has just read the seventh record; it next wants to read the
fifteenth record. How many records must the program read before reading
the fifteenth record?
(a) with direct access
(b) with sequential access
9. Give an example of an application in which data in a file is accessed in the
following order:
(a) Sequentially
(b) Randomly
10. What do you mean by file-system mounting? How is it performed?
11. What is record blocking? Discuss the three methods of blocking.
12. Explain the relative merits and demerits of using hierarchical directory
structure over single-level and two-level directory structures?
13. Discuss how file sharing can be implemented in a multi-user environment
where:
(a) a single file system is used.
(b) multiple file systems are used.
14. Write short notes on the following.
(a) Path name
(b) Magic number
(c) Consistency semantics
(d) Access control list
chapter 12

Implementation of File System

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ Understand the file system structure.
⟡ Discuss the basic concepts of file system implementation.
⟡ Describe various methods to allocate disk space to files.
⟡ Explain the directory implementation.
⟡ Explore the methods to keep track of free space on the disk.
⟡ Discuss the implementation of shared files.
⟡ Identify the issues related to file system efficiency and performance.
⟡ Understand how to ensure data consistency and recovery in the
event of system failures.
⟡ Understand the concept of log-structured file system.

12.1 INTRODUCTION
In the previous chapter, we have discussed the basic file concepts,
such as how files are named, what operations are allowed on files,
what the directory tree looks like, and other similar issues which help
users to understand the file system. In this chapter, we will discuss
various issues related to file system implementation in which the file
system designers are interested. This involves how files and
directories are implemented and stored, how the disk space is
managed, and how the file system can be made efficient and reliable.

12.2 FILE SYSTEM STRUCTURE


Every operating system imposes a file system that helps to organize,
manage and retrieve data on the disk. The file system resides
permanently on the disk. The design of the file system involves two
key issues. The first issue includes defining a file and its attributes,
operations that can be performed on a file, and the directory
structure. The second issue includes creating data structures and
algorithms for mapping the logical file system onto the secondary
storage devices.

Fig. 12.1 File System Layers


Figure 12.1 shows that the file system is made up of different
layers, where each layer represents a level. Each level uses the
features of the lower levels to create new features that are used by
higher levels. When the user interacts with the file system (through
system commands), the I/O request occurs on a device. The device
driver in turn generates an interrupt to transfer information between
the disk drive and the main memory. Input to the device driver
consists of high-level commands, and its output is low-level,
hardware-specific instructions. The device driver also writes specific
bit pattern to special locations in the I/O controller’s memory which
communicate to the controller the device location it must act on and
what actions to take.
The next component in the file system is the basic file system
that issues generic (general) commands to the appropriate device
driver to read and write physical blocks on the disk. The physical
blocks are referred to by the numeric disk address (for example, drive
1, cylinder 43, track 4, sector 16).
The component at the next level of the file system is the file
organization module that organizes the files. It knows the physical
block address (the actual address), logical block address (the relative
address), allocation method, and the location of files. Using this
information, it translates the logical address into physical address and
helps the basic file system to transfer files. The other function of the
file organization module is to keep track of the free space and provide
this space for allocation when needed.
The logical file system at the next level manages all the
information about a file except the actual data (content of the file). It
manages the directory structure to provide the necessary information
to the file organization module when the file name is given. It
maintains the file structure using the file control block (FCB) that
stores information about a file, such as ownership, permissions, and
the location of the file content. Protection and security are also taken
care of by the logical file system.
Apart from the physical disk drive, there are other removable
devices such as CD-ROM, floppy disk, pen drives and other storage
devices attached to the system. Each of these devices has a
standard file system structure imposed by its manufacturers. For
instance, most CD-ROMs are written in High Sierra format, which is a
standard format agreed upon by CD-ROM manufacturers. The
standard file system for the removable media makes them
interoperable and portable for use on different systems. Apart from
the file systems for removable devices, each operating system has
one (or more) disk-based file system. UNIX system uses the UNIX file
system (UFS) as a base. Windows NT supports disk file system
formats such as FAT, FAT32, and NTFS (or Windows NT file system),
along with CD-ROM, DVD, and floppy-disk file system formats.

12.3 FILE SYSTEM IMPLEMENTATION


In this section, we will discuss various structures and operations that
are used for implementing file system operations.

12.3.1 Operating Structures


There are several on-disk and in-memory structures that are used
to implement a file system. Depending on the operating system and
the file system, these may vary, but the general principles remain the
same. Many of the structures that are used by most of the operating
systems are discussed here. The on-disk structures include:
• Boot control block: It contains enough information that the
system needs to boot the operating system from that partition.
Though not all the partitions in a disk contain a bootable operating
system, every partition starts with a boot block. Having one block
in each partition reserved for a boot block is a good idea because
any partition can have the operating system in future. If the
partition does not contain any operating system, this block can be
empty. In UNIX file system (UFS), this block is referred to as the
boot block and in NTFS (Windows file system), it is called the
partition boot sector.
• Partition control block: It is a table that stores key information
related to the partition, the number and size of blocks in the
partition, free block count and free block pointers, and FCB
pointers and free FCB count. In UFS, the partition control block is
referred to as superblock, while in NTFS, it is called master file
table.
• Directory structure: Each partition has a directory structure with
root directory at the top. The directory structure helps to manage
and organize files in the file system.
• FCB: For each file, an FCB is allocated that stores information
such as file permissions, ownership, size, dates when the file was
created, accessed or written to, and location of the file data
blocks. In UFS, the FCB is called inode (an array of data
structures, one for each file). In NTFS, this information is kept
within the master file table, which uses a relational database
structure, where each row stores information about a file. An
example of the file-system layout is shown in Figure 12.2 (the
layout is based on UNIX file system).
The in-memory structure helps in improving the performance of
the file system. The in-memory structures include:
• In-memory partition table: It stores information about each
mounted partition.
• In-memory directory structure: It stores the information about
the recently accessed directories.
• System-wide open-file table: It contains a copy of the FCB of
each open file and a count of the number of processes that have
the file open.
• Per-process open-file table: It contains a pointer to the
corresponding entry in the system-wide open-file table along with
some related information.
Fig. 12.2 Layout of the File System

When a process needs to create a new file, it calls the logical file
system. In response, the logical file system allocates either a new
FCB or an FCB from the list of free FCBs in case all the FCBs have
already been created at the time of file system creation. After
allocating the FCB, the next step is to add the new file name and FCB
into the appropriate directory. For this, the system loads the desired
directory into the memory, updates it with the required information
and finally, writes it back onto the disk.
After a file has been created, I/O operations can be performed on
it. However, before a process can perform I/O on the file, the file
needs to be opened. For this, the process executes open() system
call that passes the file name to the logical file system. It may so
happen that the given file is already open and is in use by some other
process. To determine this, the given file name is searched in the
system-wide file-open table first. If the file name is found in the table,
an entry is made in the per-process open-file table which points to the
existing system-wide open-file table entry. On the other hand, if the
file name is not found in the system-wide open-file table (that is, the
file is not already open), the file name is searched for in the directory
structure. When the file is found, its FCB is copied into the system-
wide open-file table, and the count is incremented. The value of the
count indicates the number of users who have opened the file
currently. Figure 12.3 shows the in-memory file-system structures
while opening a file.
After updating the system-wide open-file table, an entry is made in
the per-process open-file table. This entry includes a pointer to the
appropriate entry in the system-wide open-file table, a pointer to the
position in the file where the next read or write will occur, and the
mode in which the file is open. The open() call returns a pointer to the
appropriate entry in the per-process file-system table. This pointer is
used to perform all the operations as long as the file is open.
When a process closes the file, the corresponding entry is
removed from the per-process open-file table and the system-wide
entry’s open count is decremented. When the count becomes 0 which
means all the users who opened the file have closed it, the updated
file information is copied back to the disk-based structures and the
entry is removed from the system-wide open-file table.
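
The interplay between the two tables can be sketched as follows (the structures, field names and helper functions are simplified assumptions and omit error handling, locking and the actual disk I/O):

# Sketch of the system-wide and per-process open-file tables described above.

system_wide = {}        # file name -> {"fcb": ..., "open_count": int}
per_process = {}        # (pid, fd) -> {"name": ..., "offset": int, "mode": ...}
next_fd = 0

def open_file(pid, name, mode):
    """Return a per-process handle; copy the FCB in only on the first open."""
    global next_fd
    if name not in system_wide:                     # not open anywhere yet
        system_wide[name] = {"fcb": {"name": name}, "open_count": 0}
    system_wide[name]["open_count"] += 1
    fd = next_fd
    next_fd += 1
    per_process[(pid, fd)] = {"name": name, "offset": 0, "mode": mode}
    return fd

def close_file(pid, fd):
    """Drop the per-process entry; free the system-wide entry on the last close."""
    name = per_process.pop((pid, fd))["name"]
    system_wide[name]["open_count"] -= 1
    if system_wide[name]["open_count"] == 0:
        del system_wide[name]                       # write the FCB back to disk here

fd1 = open_file(1, "notes.txt", "r")
fd2 = open_file(2, "notes.txt", "r")
close_file(1, fd1)
print(system_wide["notes.txt"]["open_count"])       # 1: still open in process 2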

Fig. 12.3 In-memory File Structures in Opening a File

12.3.2 Partitions and Mounting


The layout of a disk may vary greatly from system to system based
on the operating system installed. For example, a disk can be divided
into multiple partitions where each partition may or may not contain a
file system. In other words, a partition can be raw (that is, without any
file system) or cooked. For example, in UFS, the partition holding the
swap space does not contain a file system (that is, a raw partition).
This is because UNIX uses its own disk format. Similarly, certain
database systems use raw disk and format the data as per their
requirements.
Recall from Chapter 10 that one partition on the disk contains the
boot block. The boot partition has no file system and has its own
format. The reason behind this is that when the system boots, no file-
system code is present in the memory and thus, the system cannot
understand the file-system format. The boot information is stored in a
series of contiguous blocks and is loaded into the memory as an
image (boot loader) whose execution begins from a specified location
in the memory. This boot loader in turn understands the file-system
structure and thus, determines and loads the kernel into the memory
in order to begin its execution. The boot loader may also have
information on how to boot a particular operating system.
So far we have discussed the case of systems having only a
single operating system. However, there are systems which can be
dually booted and thus, allow more than one operating system to be
installed on a single system. For example, a disk may have multiple
partitions where each partition contains a different operating system
and even a different file system. In such systems, a way is needed to
know which of the various operating systems is to be loaded. For this,
a boot loader that interprets multiple operating systems as well as
multiple file systems is stored in the boot partition. When the system
boots up, the boot loader loads one of the various operating systems
installed on the disk.
The kernel of the operating system and some system files are
stored in a partition on the disk, called the root partition. Root
partition is always mounted at the boot time, while other volumes may
mount automatically at boot time or can be mounted manually later
depending on the operating system. To let the mounting accomplish
successfully, the operating system assures that the device comprises
a valid file system. To ensure this, the operating system asks the
device driver to read the device directory and then checks its format.
If the format is found different from the expected format, consistency
checking and correction of the partition needs to be performed. In
case of valid file system, the operating system makes an entry in its
in-memory mount table, indicating that the file system has been
mounted. It also records the type of the file system.

12.3.3 Virtual File System (VFS)


As discussed earlier, most systems support multiple disk partitions
with each partition having the same or different file system on it. In
addition, an operating system may also support network (remote) file
systems. In order to facilitate the processes and applications to
interact with different file systems at the same time, the operating
system offers a virtual file system (VFS), which is a software layer
that hides the implementation details of any single file type. As a
result, a user can use the file system of his/her choice irrespective of
the file system implementation.
Figure 12.4 depicts the architecture of the file system
implementation which consists of the following three layers.
• The first layer is file system interface through which processes
and applications request access to files. It is based on the file
descriptors and the system calls, including open(), read(),
write() and close().

• The second layer is VFS layer that resides between the file-
system interface layer and the actual file systems. The VFS layer
is responsible for performing the following two functions.
■ It defines an interface, called VFS interface which separates
the generic operations of the file system from their
implementation and thus, allows different file systems
mounted locally to be accessed in a transparent manner.
■ It also provides a mechanism to identify a file uniquely across
the network rather than only within a single file system. For
this, it assigns a vnode to each file (or directory), which contains a
numerical designator that identifies the file uniquely across the network.
For each active node (that is, open file or directory), the
kernel maintains a vnode data structure.
In a nutshell, the VFS separates local files from remote files,
and the local files are further distinguished based on the type of
their file systems. When a request refers to a local file, the VFS
handles that request by activating operations specific to the
respective local file system, while for handling remote requests, it
invokes the procedures of the NFS protocol.
• The third layer of architecture is the layer which actually
implements the file system type or the remote file-system
protocol.
Fig. 12.4 Architecture of the File System Implementation

Note: VFS is an abstraction that supports a generic file model.
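
The idea behind the VFS interface can be sketched as a set of generic operations that each concrete file system implements in its own way, with a dispatcher that selects the implementation from the mount point (a simplified illustration; the class and mount names are assumptions and a real VFS defines many more operations):

# Sketch of a VFS-style interface: generic operations, concrete implementations.

class FileSystem:
    """Generic operations every mounted file system must provide."""
    def open(self, path): raise NotImplementedError
    def read(self, path): raise NotImplementedError

class LocalFS(FileSystem):
    def __init__(self, files): self.files = files
    def open(self, path): return path in self.files
    def read(self, path): return self.files[path]

class RemoteFS(FileSystem):
    def open(self, path): return True
    def read(self, path):
        return "<would issue an NFS-style remote request for %s>" % path

mounts = {"/home": LocalFS({"/home/a.txt": "hello"}), "/net": RemoteFS()}

def vfs_read(path):
    """Pick the file system from the mount point and dispatch the generic call."""
    for mount, fs in mounts.items():
        if path.startswith(mount):
            return fs.read(path)
    raise FileNotFoundError(path)

print(vfs_read("/home/a.txt"))
print(vfs_read("/net/b.txt"))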

12.4 ALLOCATION METHODS


Every system stores multiple files on the same disk. Thus, an
important function of the file system is to manage the space on the
disk. This includes keeping track of the number of disk blocks
allocated to files and the free blocks available for allocation.
The two main issues related to disk space allocation are:
• Optimum utilization of the available disk space
• Fast accessing of files.
The widely used methods for allocation of disk space to files (that
is, file implementation) include contiguous, linked and indexed. For
discussing these different allocation strategies, a file is considered to
be a sequence of blocks and all I/O operations on a disk occur in
terms of blocks.

12.4.1 Contiguous Allocation


In contiguous allocation, each file is allocated contiguous blocks on
the disk, that is, one after the other (see Figure 12.5). Assuming only
one job is accessing the disk, once the first block, say b, is accessed,
accessing block b+1 requires no head movement normally. Head
movement is required only when the head is currently at the last
sector of a cylinder and moves to the first sector of the next cylinder;
the head movement is only one track. Therefore, the number of seeks
and thus, seek time in accessing contiguously allocated files is
minimal. This improves the overall file system performance.
Fig. 12.5 An Example of Contiguous Allocation

It is relatively simple to implement the file system using the
contiguous allocation method. The directory entry for each file
contains the file name, the disk address of the first block, and the
total size of the file.
Contiguous allocation supports both sequential and direct access
to a file. For sequential access, the file system remembers the disk
address of the last block referenced, and when required, reads the
next block. For direct access to block b of a file that starts at location
L, the block L+b can be accessed immediately.
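
The address computation for direct access is simple arithmetic, as the following sketch shows (block numbering within the file is assumed to start at 0):

# Sketch: with contiguous allocation, direct access is simple arithmetic.

def physical_block(start_block, logical_block):
    """A file starting at disk block L keeps its b-th block at L + b."""
    return start_block + logical_block

print(physical_block(14, 0))   # first block of a file starting at block 14 -> 14
print(physical_block(14, 3))   # its fourth block -> 17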

Contiguous allocation has a significant problem of external
fragmentation. Initially, the disk is free, and each new file can be
allocated contiguous blocks starting from the block where the
previous file ends. When a file is deleted, it leaves behind some free
blocks in the disk. This is not a problem until we have contiguous
blocks to allocate to a new file at the end of the disk. However, with
time, the disk becomes full, and at that time the free blocks are
fragmented throughout the disk. One solution to this problem is via
compaction, which involves moving the blocks on the disk to make
all free space into one contiguous space. Compaction is expensive in
terms of time as it may take hours to compact a large hard disk that
uses contiguous allocation. Moreover, normal operations are generally
not permitted during compaction.
An alternative to expensive compaction is to reuse the space. For
this, we need to maintain a list of holes (an unallocated segment of
contiguous blocks). In addition, we must know the final size of a file at
the time of its creation so that a sufficiently large hole can be
allocated to it. However, determining the file size in advance is
generally difficult, and allocating either too little or too much space to a
file can create problems. If we allocate more space than the file needs,
we end up wasting costly disk space. On the other hand, if we allocate
less space than needed, we may not be able to extend the file, since the
blocks on both sides of the file may be allocated to some other files.
One possibility to extend the space is to terminate the user program
and then the user must restart it with more space. However, restarting
the user program repeatedly may again be costly. Alternatively, the
system may find the larger hole, copy the contents of the file to the
new space, and release the previous space. This can be done
repeatedly as long as the required space is available contiguously in
the disk. Moreover, the user program need not be restarted and the
user is also not informed about this. However, the task is again time-
consuming.

12.4.2 Linked Allocation


The file size generally tends to change (grow or shrink) over time.
The contiguous allocation of such files results in several problems.
Linked list allocation method overcomes all the problems of
contiguous allocation method.
In the linked list allocation method, each file is stored as a linked
list of the disk blocks. The disk blocks are generally scattered
throughout the disk, and each disk block stores the address of the
next block. The directory entry contains the file name and the address
of the first and last blocks of the file (see Figure 12.6).

Fig. 12.6 An Example of Linked List Allocation

This figure shows the linked list allocation for a file. A total of four
disk blocks are allocated to the file. The directory entry indicates that
the file starts at block 12. It then continues at block 9, block 2, and
finally ends at block 5.
The simplicity and straightforwardness of this method makes it
easy to implement. The linked list allocation results in the optimum
utilization of disk space as even a single free block between the used
blocks can be linked and allocated to a file. This method does not
come across with the problem of external fragmentation, thus,
compaction is never required.
The main disadvantages of using linked list allocation are slow
access speed, disk space utilization by pointers, and low reliability of
the system. As this method provides only sequential access to files,
to find the nth block of a file, the search starts at the beginning of the
file and follows the pointers until the nth block is found. For a very large
file, the average turnaround time is high.
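
The sequential nature of this access can be sketched as follows (the dictionary next_of stands in for the pointer stored inside each disk block, and the block numbers follow the example of Figure 12.6):

# Sketch: finding the n-th block of a file under linked allocation.

next_of = {12: 9, 9: 2, 2: 5, 5: None}    # 12 -> 9 -> 2 -> 5 (end of file)

def nth_block(first_block, n):
    """Follow n pointers from the first block; n disk reads are needed."""
    block = first_block
    for _ in range(n):
        block = next_of[block]            # each step costs one disk access
    return block

print(nth_block(12, 0))   # 12
print(nth_block(12, 3))   # 5 (the last block)
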
In linked list allocation, maintaining pointers in each block requires
some disk space. The total disk space required by all the pointers in a
file becomes substantial, thus the requirement of space by each file
increases. The space required by pointers could otherwise be used to
store the information. To overcome this problem, contiguous blocks
are grouped together as a cluster, and allocation to files takes place
as clusters rather than blocks. Clusters allocated to a file are then
linked together. Having a pointer per cluster rather than per block
reduces the total space needed by all the pointers. This approach
also improves the disk throughput as fewer disk seeks are required.
However, this approach may increase internal fragmentation because
having a partially full cluster wastes more space than having a
partially full block.
The linked list allocation is also not very reliable. Since disk blocks
are linked together by pointers, a single damaged pointer may
prevent us from accessing the file blocks that follow the damaged
link. Some operating systems deal with this problem by creating
special files for storing redundant copies of pointers. One copy of the
file is placed in the main memory to provide faster access to disk
blocks. Other redundant pointer files help in safer recovery.

12.4.3 Indexed Allocation


There is one thing common to both linked and indexed allocation, that
is, noncontiguous allocation of disk blocks to the files. However, they
follow different approaches to access the information on the disk.
Linked allocation supports sequential access, whereas indexed
allocation supports sequential as well as direct access.
In indexed allocation, the blocks of a file are scattered all over the
disk in the same manner as they are in linked allocation. However,
here the pointers to the blocks are brought together at one location
known as the index block. Each file has an index block (see Figure
12.7), which is an array of disk-block pointers (addresses). The kth
entry in the index block points to the kth disk block of the file. To read
the kth disk block of a file, the pointer in the kth index block entry is
used to find and read the desired block. The index block serves the
same purpose as a page map table does in the paged memory
systems.
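
A sketch of this lookup (the block numbers are invented for illustration):

# Sketch of indexed allocation: the index block is an array of block pointers,
# so the k-th data block is found with a single lookup once the index is read.

index_block = [19, 7, 33, 4]          # pointers to the file's data blocks

def read_kth_block(index, k):
    """Return the disk address of the k-th block of the file."""
    return index[k]

print(read_kth_block(index_block, 2))   # 33: third data block of the file
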
The main advantage of indexed allocation is the absence of
external fragmentation, since any free blocks on the disk may be
allocated to fulfill a demand for more space. Moreover, the index can
be used to access the blocks in a random manner.
When compared to linked allocation, the pointer overhead in
indexed allocation is comparatively more. This is because with linked
allocation, a file of only two blocks uses a total of 8 bytes for storing
pointers (assuming each pointer requires 4 bytes of space). However,
with indexed allocation, the system must allocate one block (512
bytes) of disk space for storing pointers. This results in the wastage
of 504 bytes of the index block as only 8 bytes are used for storing
the two pointers.
Fig. 12.7 An Example of Indexed Allocation

Clearly, deciding the size of the index block is a major issue
because too large a block may result in the wastage of memory and a
too small index block limits the size of the largest file in the system. If
4 bytes are used to store a pointer to a block, then a block of size 512
bytes can store up to 128 pointers, thus, the largest file in that system
can have 65536 bytes (512 × 128) of information. However, we may
have a file which exceeds the size limit of 65536 bytes. To solve this
problem, multi-level indexes, with two, three, or four levels of
indexes may be used. The two-level indexes, with 128 × 128
addressing is capable of supporting file sizes up to 8 MB and the
three-level indexes with 128 × 128 × 128 addressing can support file
size of up to 1 GB.
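
The arithmetic behind these limits can be verified with a short calculation, assuming 512-byte blocks and 4-byte block pointers as in the text:

# Sketch: maximum file size reachable with one-, two- and three-level
# index blocks, assuming 512-byte blocks and 4-byte block pointers.

BLOCK = 512
POINTERS = BLOCK // 4                  # 128 pointers per index block

for levels in (1, 2, 3):
    max_bytes = (POINTERS ** levels) * BLOCK
    print(levels, "level(s):", max_bytes, "bytes")
# 1 level(s): 65536 bytes       (64 KB)
# 2 level(s): 8388608 bytes     (8 MB)
# 3 level(s): 1073741824 bytes  (1 GB)
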
The performance of the file system can be greatly enhanced by
placing the frequently accessed index blocks in the cache memory.
This reduces the number of disk accesses required to retrieve the
address of the target block.

12.5 IMPLEMENTING DIRECTORIES


Efficiency, performance, and reliability of a file system are directly
related to the directory-management and directory-allocation
algorithms selected for a file system. The most commonly used
directory-management algorithms are linear list and hash table.

12.5.1 Linear List


The linear list method organizes a directory as a collection of fixed
size entries, where each entry contains a (fixed-length) file name, a
fixed structure to store the file attributes, and pointers to the data
blocks (see Figure 10.3). The linear list method stores all the
attributes of the file at one place as a single directory entry and uses
a linear search to locate a directory entry in the list of entries. That
is, each entry (starting from the first entry in the directory) is
examined one by one until the desired entry is found. This is easy to
program; however, with
extremely large directories the search becomes very slow, which is a
major disadvantage of this method. MS-DOS and Windows use this
approach for implementing directories.
Fig. 12.8 An Example of Linear List Directory Entry

When the user sends a request to create a new file, the directory
is searched to check whether any other file has the same name or
not. If no other file has the same name, the memory will be allocated
and an entry for the same would be added at the end of the directory.
To delete a file, the directory is searched for the file name and if the
file is found, the space allocated to it is released. The delete
operation results in free space that can be reused. To reuse this
space, it can be marked with a used-unused bit, a special name can
be assigned to it, such as all-zeros, or it can be linked to a list of free
directory entries.
When performing the file operations, the directory is searched for
a particular file. The search technique applied greatly influences the
time taken to make the search and in turn the performance and
efficiency of the file system. As discussed, with long directories, a
linear search becomes very slow and takes O(n) comparisons to
locate a given entry, where n is the number of all entries in a
directory. To decrease the search time, the list can be sorted and a
binary search can be applied. Applying binary search reduces the
average search time but keeping the list sorted is a bit difficult and
time-consuming, as directory entries have to be moved with every
creation and deletion of file.

12.5.2 Hash Table


A major concern while implementing directories is the search time
required to locate a directory entry corresponding to a file. To
considerably reduce the search time, a more complex data structure
known as a hash table along with the linear list of directory entry is
used.
A hash table is a data structure, with 0 to n-1 table entries, where
n is the total number of entries in the table (see Figure 12.9). It uses a
hash function to compute a hash value (a number between 0 and n-1)
based on the file name. For instance, the file name may be converted
into an integer, and this integer is divided by n; the remainder, which
lies between 0 and n-1, serves as the hash value. Then, the table entry
corresponding to the hash value is checked. If the entry space is unused, then a
pointer to the file entry (in the directory) is placed there. However, if
the entry is already in use, we say a collision has occurred. In such
a situation, a linked list of entries that hash to the same value is
created and the table entry is made to point to the header of the
linked list.

Fig. 12.9 Hash Table Linked to a Directory

To search a file, the same process is followed. The file name is
hashed to locate the hash table entry and then all the entries on the
chain are checked to see if the file name exists. If the name is not
found in the chain, the file is not present in the directory.
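
A sketch of such a hashed directory with chaining (Python's built-in hash() stands in for the file-name hash function, and the slot count is arbitrary):

# Sketch of a hashed directory with chaining on collisions.

N = 8                                   # number of hash-table slots
table = [[] for _ in range(N)]          # each slot holds a chain of entries

def add_entry(name, inode_no):
    table[hash(name) % N].append((name, inode_no))

def lookup(name):
    """Hash the name, then scan only the chain in that slot."""
    for entry_name, inode_no in table[hash(name) % N]:
        if entry_name == name:
            return inode_no
    return None                         # not present in the directory

add_entry("report.txt", 101)
add_entry("notes.txt", 102)
print(lookup("report.txt"))             # 101
print(lookup("missing.txt"))            # None
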
The main disadvantage of this approach is that it creates long
overflow of chains if the hash function is not distributing the values
uniformly. Now, in order to search an entry, a linear search in the long
overflow chain may be required, which increases the access time. In
addition, the administration of hash table becomes a complex
process. Some systems copy the directory entries for frequently
accessed files into the cache memory. This saves the time required to
re-read information from the disk, thus, enabling faster access of the
file.

12.6 SHARED FILES

Fig. 12.10 File System containing a Shared File

Consider a scenario where two or more users are working on the
same project; the users may frequently need to work on files
belonging to that project at the same time. Moreover, changes made
by one user such as creating a new file in the project must be
automatically visible to other users working on the same project.
Thus, it would be convenient to store the common files in a
subdirectory and make this subdirectory appear in the directory of
each user. This implies that the shared file or subdirectory will be
present in two or more directories in the file system. Figure 12.10
shows a file system that contains a file E shared between directories
D4 and D6.

Shared files cannot be implemented using the traditional tree-
structured directory system, as it does not allow directories to share
subdirectories and files with each other. Instead, the tree-structured
system is generalized to form a directed acyclic graph (DAG),
which allows the same file or subdirectory to appear in different
directories at the same time.
Though shared files provide convenience to the users, it causes
some problems too. First, if we are using the directory implementation
method in which the disk block addresses are stored in a directory
entry, then the directory entry of each user sharing the file will contain
a copy of the block addresses of the shared file. Moreover, if one user
adds or deletes some blocks from the shared file, the changes will
appear only in the directory of that user but not in other users’
directories.
This problem can be solved in the following ways.
• The directory entry of each user sharing the file contains only the
address of the inode that stores block addresses and the owner
of the shared file. A count is maintained in the inode of the
shared file to keep track of the number of directory entries
pointing to it. Every time a new user connects to a shared file, the
value of count in the inode is increased by one. Similarly,
disconnecting a user from a shared file decreases the value of
count by one. Now, a problem arises when the owner of the
shared file attempts to remove the file but some user is still
connected to it. If the owner removes the file along with its inode,
the directory entry of the user connected to it will contain either an
invalid pointer, or a pointer to an invalid inode (that is, to an
invalid file) if the inode is later allotted to some other file.
Obviously, the system may check from the count whether any
users are connected to the inode of the shared file, but it cannot
find the directory entries of those users.
• An alternative solution is to use symbolic linking approach. In
this approach, whenever a user (say, X) connects to a file
belonging to another user (say, Y), the system creates a new file
called link and enters this file in X’s directory. The link file stores
the path name (absolute or relative) to the shared file. At the time
a file reference is made, the directory is searched. If the directory
entry is found to be of type link, the link is resolved using the path
name to locate the original file. This approach overcomes the
limitation of the former solution as only the directory entry of the
owner of the shared file stores the pointer to the inode while those
of other users store only the path name. Thus, after the owner
removes the shared file along with inode, any attempt to refer to
that file through symbolic link will simply fail instead of pointing to
some invalid inode. However, creating and using symbolic links
incur extra overhead to the system.
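
The difference between the two approaches can be sketched as follows (the inode table, directory entries and path strings are simplified assumptions):

# Sketch contrasting the two approaches described above.

inodes = {7: {"data": "project plan", "count": 1}}   # shared file's inode

# Approach 1: every directory entry points at the inode; a count is kept.
dir_D4 = {"E": ("inode", 7)}
dir_D6 = {"E": ("inode", 7)}
inodes[7]["count"] = 2            # two directory entries now reference inode 7

# Approach 2 (symbolic linking): only the owner's entry points at the inode;
# the other directory stores a path name that is resolved on each access.
dir_D6_sym = {"E": ("link", "/D4/E")}

def resolve(entry):
    kind, value = entry
    if kind == "inode":
        return inodes.get(value, {}).get("data")     # may dangle if the inode is reused
    return "follow path %s and look it up again" % value

print(resolve(dir_D4["E"]))       # contents reached through the inode
print(resolve(dir_D6_sym["E"]))   # contents reached by resolving the stored path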

12.7 FREE-SPACE MANAGEMENT


Whenever a new file is created, it is allocated some space from the
available free space on the disk. The free space can be either the
hitherto unused space or the space freed by some deleted files. The
file system maintains a free-space list that indicates free blocks on
the disk. To create a file, the free-space list is searched for the
required amount of space; the space is then allocated to the new file
accordingly. The newly allocated space is then removed from the
free-space list. Similarly, when a file is deleted, its space is added to
the free-space list. Various methods used to implement free-space list
are bit vector, linked list, grouping, and counting.

Bit Vector
Bit vector, also known as bit map, is widely used to keep track of the
free blocks on a disk. To track all the free and used blocks on a disk
with total n blocks, a bit map having n bits is required. Each bit in a bit
map represents a disk block, where a 0 in a bit represents an
allocated block and a 1 in a bit represents a free block. Figure 12.11
shows the bit map representation of a disk.
Fig. 12.11 A Bit Map

The bit map method for managing the free-space list is simple.
For instance, if a file requires four free blocks using contiguous
allocation method, free blocks 12, 13, 14, and 15 (the first four free
blocks on the disk that are adjacent to each other) may be allocated.
However, for the same file using linked or indexed allocation, the file
system may use free blocks 2, 4, 6, and 8 for allocation to the file.
The bit map is usually kept in the main memory to optimize the
search for free blocks. However, for systems with larger disks,
keeping the complete bit map in the main memory becomes difficult.
For a 2 GB disk with 512-byte blocks, a bit map of 512 KB would be
needed.
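
A sketch of a bit-map free-space list (the bit pattern is chosen so that it reproduces the example above: blocks 2, 4, 6 and 8 are free, and blocks 12 to 15 form a contiguous free run):

# Sketch of a bit-map free-space list.  Following the text, a 1 bit marks a
# free block and a 0 bit marks an allocated block.

bitmap = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1]

def first_free(count):
    """First 'count' free blocks anywhere on the disk (linked/indexed allocation)."""
    free = [i for i, bit in enumerate(bitmap) if bit == 1]
    return free[:count]

def first_contiguous_free(count):
    """First run of 'count' adjacent free blocks (contiguous allocation)."""
    for start in range(len(bitmap) - count + 1):
        if all(bitmap[start + i] == 1 for i in range(count)):
            return list(range(start, start + count))
    return None

print(first_free(4))              # [2, 4, 6, 8]
print(first_contiguous_free(4))   # [12, 13, 14, 15]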

Linked List
The linked list method creates a linked list of all the free blocks on the
disk. A pointer to the first free block is kept in a special location on the
disk and is cached in the memory. This first block contains a pointer
to the next free block, which contains a pointer to the next free block,
and so on. Figure 12.12 shows the linked list implementation of free
blocks, where block 2 is the first free block on the disk, which points
to block 4, which points to block 5, which points to block 8, which
points to block 9, and so on.
Linked list implementation for managing free-space list requires
additional space. This is because a single entry in linked list requires
more disk space to store a pointer as compared to one bit in bit map
method. In addition, traversing the free-list requires substantial I/O
operations as we have to read each and every block, which takes a
lot of time.

Fig. 12.12 Free-space Management through Linked List

Grouping
Grouping is a modification of the free-list approach in the sense that
instead of having a pointer in each free block to the next free block,
the addresses of the first n free blocks are stored in the first free
block. The first n-1 of these blocks are actually free. The nth block
contains the addresses of the next n free blocks, and so on. A major
advantage of this
approach is that the addresses of many free disk blocks can be found
with only one disk access.

Counting
When contiguous or clustering approach is used, creation or deletion
of a file allocates or de-allocates multiple contiguous blocks.
Therefore, instead of having addresses of all the free blocks, as in
grouping, we can have a pointer to the first free block and a count of
contiguous free blocks that follow the first free block. With this
approach, the size of each entry in the free-space list increases
because an entry now consists of a disk address and a count, rather
than just a disk address. However, the overall list will be shorter, as
the count is usually greater than 1.
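
The counting representation can be sketched as a list of (first free block, run length) pairs (the runs below reuse the free blocks from the earlier examples and are purely illustrative):

# Sketch of the 'counting' representation: free space is kept as
# (first free block, number of contiguous free blocks) pairs.

free_runs = [(2, 1), (4, 1), (6, 1), (8, 1), (12, 4)]

def total_free(runs):
    return sum(count for _, count in runs)

def expand(runs):
    """Turn the compact run list back into individual block numbers."""
    return [start + i for start, count in runs for i in range(count)]

print(total_free(free_runs))   # 8 free blocks in all
print(expand(free_runs))       # [2, 4, 6, 8, 12, 13, 14, 15]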

12.8 EFFICIENCY AND PERFORMANCE


The allocation methods and the techniques of directory management
discussed so far greatly affect the disk efficiency and performance.

12.8.1 Efficiency
The optimum utilization of disk space to store the data in an
organized manner defines the efficiency of a file system. A careful
selection of the disk-allocation and directory-management algorithms
is most important to improve the efficiency of a disk.
Certain operating systems make use of clustering (discussed in
Section 12.4.2) to improve their file-system performance. The size of
clusters depends on the file size. For large files, large clusters are
used, and for small files, small clusters are used. This reduces the
internal fragmentation that otherwise occurs when normal clustering
takes place.
The amount and nature of information kept in the file’s directory
influences the efficiency of the file system. A file’s directory that
stores detailed information about a file is informative but at the same
time it requires more read/write on disks for keeping the information
up to date. Therefore, while designing the file system, due
consideration must be given to the data that should be kept in the
directory.
Other consideration that must be kept in mind while designing the
file system is determining the size of the pointers (to access data
from files). Most systems use either 16-bit or 32-bit pointers. These
pointer sizes limit the file sizes to either 2^16 bytes (64 KB) or 2^32 bytes (4
GB). A system that requires larger files to store data can implement a
64-bit pointer. This pointer size supports files of up to 2^64 bytes. However,
the greater the size of the pointer, the more the requirement of disk
space to store it. This in turn makes allocation and free-space
management algorithms (linked list, indexes, and so on) use up more
disk space.
For better efficiency and performance of a system, various factors,
such as pointer size, length of directory entry, and table size need to
be considered while designing an operating system.

12.8.2 Performance
The system’s read and write operations with memory are much faster
as compared to the read and write operations with the disk. To reduce
this time difference between disk and memory access, various disk
optimization techniques, such as caching, free-behind and read-
ahead are used.
To reduce the disk accesses and improve the system
performance, blocks of data from secondary storage are selectively
brought into the main memory (or cache memory) for faster
accesses. This is termed as caching of disk data.
When a user issues a read request, the file system first searches the cache for the required block. If the block is found, the request is satisfied without accessing the disk. However, if the block is not in the cache, it is first brought into the cache and then copied to the process that requires it. All subsequent requests for the same block can then be satisfied from the cache.
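This lookup logic can be pictured with the following C sketch; the structures, sizes and the trivial direct-mapped replacement policy are assumptions made for illustration, and disk_read is only a stub standing in for the real disk driver:

#include <string.h>

#define BLOCK_SIZE  512
#define CACHE_SLOTS 64

struct cached_block {
    int          valid;                /* does this slot hold a block? */
    unsigned int block_no;             /* which disk block it holds */
    char         data[BLOCK_SIZE];
};

static struct cached_block cache[CACHE_SLOTS];

/* Stub standing in for the real disk driver. */
static void disk_read(unsigned int block_no, char *buf)
{
    (void)block_no;
    memset(buf, 0, BLOCK_SIZE);
}

/* Serve a read request: look in the cache first, and go to the disk only
 * on a miss (a trivial direct-mapped replacement policy is used here). */
void read_block(unsigned int block_no, char *out)
{
    unsigned int slot = block_no % CACHE_SLOTS;
    if (!(cache[slot].valid && cache[slot].block_no == block_no)) {
        disk_read(block_no, cache[slot].data);     /* cache miss: one disk access */
        cache[slot].valid    = 1;
        cache[slot].block_no = block_no;
    }
    memcpy(out, cache[slot].data, BLOCK_SIZE);     /* hit, or freshly loaded */
}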
When a request arrives to read a block from the disk, a disk access is required to transfer the block into main memory. On the assumption that the block may be used again in the future, it is kept in a separate section of main memory. This technique of caching disk blocks in memory is called block cache. In some
other systems, file data is cached as pages (using virtual-memory
techniques) rather than as file system oriented blocks. This technique
is called page cache. Caching the file data using virtual addresses is
more efficient as compared to caching through physical disk blocks.
Therefore, some systems use page cache to cache both process
pages and file data. This is called unified virtual memory.
Now, consider two alternatives for accessing a file on disk: memory-mapped I/O and standard system calls such as read and write. Without a unified buffer cache, the standard system calls have to go through the buffer cache, whereas memory-mapped I/O has to use two caches, the page cache and the buffer cache (see Figure 12.13). Memory-mapped I/O thus requires double caching: first the disk blocks are read from the file system into the buffer cache, and then the contents of the buffer cache are transferred to the page cache. This is because the virtual memory system cannot interface with the buffer cache. Double caching has several disadvantages. First, it wastes memory by storing a copy of the data in both caches. Second, each time the data is updated in the page cache, the data in the buffer cache must also be updated to keep the two caches consistent. This extra movement of data within memory results in wasted CPU and I/O time.
Fig. 12.13 Input/Output without a Unified Buffer Cache

However, with a unified buffer cache, both memory-mapped I/O and the read and write system calls can use the same page cache (see Figure 12.14). This saves the system resources that would otherwise be required for double caching.

Fig. 12.14 Input/Output using a Unified Buffer Cache


The cache has limited space; therefore, once it is full, bringing new pages into the cache requires some existing pages to be replaced. A replacement algorithm is applied to remove a page (writing it back to the disk if it has been modified since it was brought in) and then load the new page into the buffer cache. The most widely used replacement algorithms are least recently used (LRU), first-in-first-out (FIFO) and second chance (discussed in Chapter 8). Of these, LRU is a good general-purpose algorithm for replacing pages. However, LRU should be avoided with sequentially accessed files, since the most recently used pages will be used last or may never be used again. Instead, sequential access can be optimized by techniques known as free-behind and read-ahead.
The free-behind technique removes (frees) a page from the buffer as soon as the next page is requested. Pages that are used once and are unlikely to be used again only waste buffer space; thus, they are good candidates for replacement. Another technique is read-ahead, in which, when a request arrives to read a page, the requested page and several subsequent pages are read and cached in advance, since they are likely to be accessed once the current page has been processed. Bringing the data from the disk in one transfer and caching it saves considerable time.
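The following C sketch, with a purely illustrative in-memory page table, shows how free-behind and read-ahead might cooperate during a sequential scan:

#include <stdio.h>

#define NPAGES     1024
#define READ_AHEAD 4              /* extra pages fetched in advance (illustrative) */

static int in_cache[NPAGES];      /* 1 if the page is currently in the buffer */

static void fetch_page(unsigned p) { in_cache[p] = 1; /* stands in for a disk read */ }
static void free_page(unsigned p)  { in_cache[p] = 0; /* free-behind: drop the page */ }

/* Sequential access to page p: the page just consumed is freed
 * (free-behind), and the next few pages are cached in advance (read-ahead). */
static void sequential_read(unsigned p)
{
    if (p > 0 && in_cache[p - 1])
        free_page(p - 1);                          /* it will not be used again */
    for (unsigned i = 0; i <= READ_AHEAD && p + i < NPAGES; i++)
        if (!in_cache[p + i])
            fetch_page(p + i);                     /* requested page plus look-ahead */
}

int main(void)
{
    for (unsigned p = 0; p < 8; p++)               /* simulate a sequential scan */
        sequential_read(p);
    printf("page 3 still cached: %d, page 11 cached ahead: %d\n",
           in_cache[3], in_cache[11]);
    return 0;
}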

12.9 RECOVERY
As discussed earlier, a computer stores data in the form of files and
directories on the disk and in the main memory. This data is important
for the users who have created it and also for other users who are
using it or might use it in the future. However, a system failure (or
crash) may result in loss of data and in data inconsistency. This
section discusses how a system can be recovered to a previous
consistent state prior to its failure. Data recovery includes creating full
and incremental backups to restore the system to a previous working
state and checking data consistency using a consistency checker.

Backup and Restore


Creating a backup involves recording both data and control data to some other storage device, such as magnetic tape or optical disc. The frequency of the backup depends on the type and nature of the data. If the system stores critical data, backups may be created on a daily basis; otherwise, backups can be created at longer intervals, for instance, every seven or fifteen days.
However, creating full backups daily would lead to significant
copying overhead. To avoid recopying complete information,
organizations take incremental backups in between the two full
backups. An incremental backup includes copying only the changed
information (based on the date and time of last backup) to another
storage device. Figure 12.15 shows an example of full and
incremental backups between two distinct system states. Note that
from day n, the cycle of taking incremental backups starts again.

Fig. 12.15 Full and Incremental Backups between two Distinct System States
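As a small illustration, a backup tool might decide whether a file belongs in an incremental backup by comparing its modification time with the time of the last backup, roughly as in this POSIX C sketch (the function name is hypothetical):

#include <sys/stat.h>
#include <time.h>

/* Include a file in the incremental backup only if it was modified
 * after the last (full or incremental) backup was taken. */
int needs_backup(const char *path, time_t last_backup_time)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;               /* cannot stat the file: skip it (or report an error) */
    return st.st_mtime > last_backup_time;
}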

Consider a situation where a system failure leads to some data loss. To recover this data, the last saved full backup is restored on the system first. Following this, the incremental backups taken after that full backup are restored one by one, in the order in which they were taken. However, some processes may have executed in the time between the last incremental backup and the system failure. These processes are re-executed following the recovery to bring the system to the last working state (that is, the state at the time of the system crash).

Consistency Checking
Consider a situation where, due to some reason (such as a power failure or system crash), the system goes down abruptly. The system is said to be in an inconsistent state when there is a difference between the directory information and the actual data on the disk.
The main reason behind the system’s inconsistent state is the use
of main memory to store directory information. As soon as a file
operation occurs, the corresponding information in the directory is
updated in the main memory. However, directory information on the
disk does not necessarily get updated at the same time.
To overcome this problem, most systems use a special program called a consistency checker, which runs at the time of system boot. It compares the data in the directory structure with the data blocks on the disk and tries to fix any inconsistency it finds.

12.10 LOG-STRUCTURED FILE SYSTEM


Recall Section 12.8.2 where we discussed various techniques to
optimize the disk access operations. These techniques work well in
the case of read operations where data can be read directly from the
caches rather than the disk. However, if the majority of operations are writes, these techniques yield little performance gain. This is
because most of the time is spent in disk head movements rather
than in actually performing the writes. To reduce the disk head
movements, a new kind of file system known as log-structured file
system was introduced.
The log-structured file system maintains a log file that contains
the metadata and data of all the files in the file system. Whenever the
data is modified or new data is written in a file, it is recorded at the
end of the log file. The file system also maintains an index block
corresponding to each file in the log file. An index block contains the
pointers to the disk blocks containing the file data in the log file.
In a log-structured file system, writing data to a file requires little movement of the disk head, as the data is always written at the end of the log file. During a read operation, on the other hand, the head movement depends on whether the data to be read was written recently or is older: recently written data lies near the end of the log and requires little movement, whereas older data may require considerable movement.
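A highly simplified C sketch of this write path is shown below; the in-memory arrays stand in for the on-disk log and index block and are assumptions made purely for illustration:

#include <string.h>

#define BLOCK_SIZE 4096
#define MAX_BLOCKS 1024                 /* capacity of the illustrative log */

/* In-memory stand-ins for the on-disk log and the file's index block. */
static char log_area[MAX_BLOCKS][BLOCK_SIZE];
static int  log_end = 0;                /* next free position at the end of the log */
static int  index_block[MAX_BLOCKS];    /* index_block[k]: log position of file block k */

/* Writing block k of the file appends the new data at the end of the log
 * and redirects the index entry; the old copy of the block simply becomes free. */
int write_file_block(int k, const char *data)
{
    if (k < 0 || k >= MAX_BLOCKS || log_end >= MAX_BLOCKS)
        return -1;                      /* invalid block, or log full (a real LFS would clean) */
    memcpy(log_area[log_end], data, BLOCK_SIZE);
    index_block[k] = log_end;           /* the index now points at the new copy */
    return log_end++;
}

/* Reading block k simply follows the index pointer into the log. */
const char *read_file_block(int k)
{
    return log_area[index_block[k]];
}

Note that every write lands at log_end, so the head barely moves during writes, while a read of an old block may send the head back to an earlier position in the log.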
To understand the log-structured file system, consider Figure 12.16. Here, we assume that the log file contains the data of a single file. Corresponding to this file, there is an index block that points to the data blocks of the file. Further, assume that some modifications are made to block 1 and block 4, and the new data is written to new blocks, say block 5 and block 6, respectively. The file system then creates a new index block in which the pointers that previously referred to block 1 and block 4 now point to block 5 and block 6, respectively. After the modifications have been made, the old index block, block 1 and block 4 become free.
Fig. 12.16 File Updating Process in Log-structured File System

LET US SUMMARIZE
1. Every operating system imposes a file system that helps to organize,
manage and retrieve data on the disk.
2. The design of a file system involves two key issues. The first issue
involves defining a file and its attributes, operations that can be performed
on a file, and the directory structure for organizing files. The second issue
involves creating algorithms and data structures to map the logical file
system onto the physical secondary storage devices.
3. The file system is made up of different layers, where each layer
represents a level.
4. The various file system components are I/O controller, basic file system,
file-organization module and logical file system.
5. The file control block (FCB) stores the information about a file such as
ownership, permissions, and location of the file content.
6. There are several on-disk and in-memory structures that are used to
implement a file system. The on-disk structures include boot control block,
partition control block, directory structure and FCB. The in-memory
structures include in-memory partition table, in-memory directory
structure, system-wide open-file table and per-process open-file table.
7. In order to facilitate the processes and applications to interact with
different file systems at the same time, the operating system offers a
virtual file system (VFS), which is a software layer that hides the
implementation details of any single file type.
8. Every system stores multiple files on the same disk. Thus, an important
function of the file system is to manage the space on the disk. This
includes keeping track of the number of disk blocks allocated to files and
the free blocks available for allocation. Some widely used methods for
allocation of disk space to files (that is, file implementation) include
contiguous, linked and indexed.
9. In contiguous allocation, each file is allocated contiguous blocks on the
disk, that is, one after the other.
10. In the linked list allocation method, each file is stored as a linked list of the
disk blocks. The disk blocks are generally scattered throughout the disk,
and each disk block stores the address of the next block. The directory
entry contains the file name and the address of the first and the last
blocks of the file.
11. In indexed allocation, the blocks of a file are scattered all over the disk in
the same manner as they are in linked allocation. However, here the
pointers to the blocks are brought together at one location known as the
index block.
12. Each file has an index block, which is an array of disk-block pointers
(addresses). The kth entry in the index block points to the kth disk block of
the file.
13. The efficiency, performance, and reliability of a file system are directly
related to the directory-management and directory-allocation algorithms
selected for a file system. The most commonly used directory-
management algorithms are linear list and hash table.
14. The linear list method organizes a directory as a collection of fixed size
entries, where each entry contains a (fixed-length) file name, a fixed
structure to store the file attributes, and pointers to the data blocks.
15. A hash table is a data structure with table entries indexed from 0 to n-1, where n is the total number of entries in the table. It uses a hash function to compute a hash value (a number between 0 and n-1) from the file name.
16. In a scenario where two or more users want to work on the same files at
the same time, it would be convenient to store the common files in a
subdirectory and make this subdirectory appear in the directory of each
user. This implies that the shared file or subdirectory will be present in two
or more directories in the file system.
17. The file system maintains a free-space list that indicates the free blocks on
the disk. To create a file, the free-space list is searched for the required
amount of space, and the space is then allocated to the new file.
18. The various methods used to implement free-space list are bit vector,
linked list, grouping, and counting.
19. Optimum utilization of disk space to store the data in an organized manner
defines the efficiency of a file system. A careful selection of the disk-
allocation and directory-management algorithms is most important to
improve the efficiency of a disk.
20. System’s read and write operations with the memory are much faster as
compared to the read and write operations with the disk. To reduce this
time difference between disk and memory access, various disk
optimization techniques, such as caching, free-behind, and read-ahead
are used.
21. A system failure (or crash) may result in loss of data and in data
inconsistency. The data recovery includes creating full and incremental
backups to restore the system to a previous working state and checking
the data consistency using consistency checker.
22. The log-structured file system was introduced to reduce the movements of
the disk head while accessing the disk. It maintains a log file that contains
the metadata and data of all the files in the file system. Whenever the
data is modified or new data is written in a file, this new data is recorded
at the end of the log file.

EXERCISES
Fill in the Blanks
1. A _____________ stores all the information related to a file.
2. The widely used methods for allocating disk space to files are
_____________, _____________, _____________, and _____________.
3. _____________ is a data structure used along with the linear list of
directory entry that reduces the search time considerably.
4. To implement shared files, the tree-structured system is generalized to
form a _____________.
5. A _____________ stores addresses of all the blocks which are free for
allocation.

Multiple Choice Questions


1. Which of these is a file-system component?
(a) Logical file system
(b) Basic file system
(c) File-organization module
(d) All of these
2. A cluster is defined as a group of _____________.
(a) Non-contiguous blocks
(b) Contiguous blocks
(c) Free blocks
(d) Used blocks
3. Which of the following methods requires minimal memory for storing
information of free blocks?
(a) Bit vector
(b) Linked list
(c) Grouping
(d) Counting
4. What do you call a partition control block in NTFS?
(a) Boot block
(b) Inode
(c) Master File Table
(d) Superblock
5. Which of the following file-system components manages all the
information about a file except the actual data?
(a) Logical file system
(b) Basic file system
(c) File-organization module
(d) None of these

State True or False


1. The user interacts with the system by using system calls.
2. A linear search requires O(n) comparisons.
3. The amount and nature of information kept in the directory influences the
efficiency and performance of the file system.
4. Root partition is mounted at the boot time.
5. Contiguous allocation method does not support direct access to a file.

Descriptive Questions
1. Name the component of the file system that is responsible for transferring
information between the disk drive and the main memory.
2. Explain the role of the each layer in a file system.
3. List the advantages of using linked list and indexed allocation methods
over linear list allocation method.
4. Explain the need for having a standard file-system structure attached to
various devices in a system.
5. Explain various on-disk and in-memory structures that are used for
implementing a file system.
6. Compare various schemes used for the management of free space.
7. Discuss the methods used for directory implementation and compare
them.
8. How does cache help in improving performance?
9. What is the difference between caching with and without the unified buffer
cache?
10. Explain how data on a system can be recovered to a previous working
state without any data inconsistency after a system failure?
11. Write short notes on the following.
(a) Shared files
(b) Virtual file system
(c) Hash table
12. How does the log-structured file system reduce disk head movements?
chapter 13

Protection and Security

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ Understand the need for protection and security.
⟡ Describe the goals and principles of protection.
⟡ Explain the various protection mechanisms, including protection
domain and access control matrix.
⟡ Understand the problem of security.
⟡ Identify the types of security violations, methods used in attempting
security violations and various security measure levels.
⟡ Discuss design principles of security.
⟡ Determine various security threats—caused by humans, natural
calamities and by the use of networks.
⟡ Explore the encryption technique used to ensure security.
⟡ Explore different means to authenticate the user, including password,
smart card and biometric techniques.
⟡ Understand the concept of trusted system.
⟡ Describe the use of firewalls in protecting systems and networks.

13.1 INTRODUCTION
Nowadays, most of the organizations serving domains such as
banking, education, finance and telecommunication rely on the use of
computers for their day-to-day activities. These organizations store
huge amount of data in computers. Since the data is highly valuable,
it is important to protect it from unauthorized access. In addition to
data, the protection of computer resources, such as memory and I/O
devices is also necessary.
Since security of data and computer resources is a major
concern, many organizations restrict the entry of unauthorized
persons inside their premises. For this, they place security guards at
the entrance of the building to allow only authorized persons to enter.
Also, servers, network resources, and other sensitive areas (where
file cabinets are placed) are locked and only some authorized
persons are allowed to enter.
Though the terms security and protection are often used
interchangeably, they have different meanings in computer
environment. Security deals with the threats to information caused
by outsiders (non-users), whereas protection deals with the threats
caused by other users (those who are not authorized to do what they
are doing) of the system. This chapter discusses various aspects
related to protection and security of the system.

13.2 GOALS OF PROTECTION


As the use of computers is becoming all-pervading, the need to
protect it is also increasing day by day. Protection is the major
concern in multiprogramming operating systems, where multiple
processes (or users) are allowed to share the system resources
(such as data files and memory). One of the most important goals of
protection is to ensure that each resource is accessed correctly and
is accessed only by those processes that are allowed to do so. Thus,
any malicious use of the system by users or processes must be
prevented. Some other reasons for which a computer system needs
to be protected are as follows.
• To improve reliability of the system by finding all possible errors at
the interface between component subsystems so that the errant
programs can cause minimal amount of damage. An unprotected
system provides no means to distinguish between authorized and
unauthorized access.
• To ensure that each shared resource is used only in accordance
with system policies, which may be implemented in the design of
the system, or set by the management of the system. Some
policies are even defined by system users in accordance to their
needs to protect their own files and programs. Thus, the
protection system must be flexible enough to accommodate
various kinds of policies.
Note that the policies describing resource usage are not static and
fixed—they vary from application to application, and also change over
time. Thus, providing protection is no longer the sole responsibility of
the operating system designers. It is also the responsibility of the
application programmers to protect their own resources (created
during the course of execution of their application software) against
misuse. To design their own protection software, the programmers
can use the various protection mechanisms provided by the operating
system.

13.3 PRINCIPLES OF PROTECTION


Whenever we start working on a project, we set a guiding principle. If this principle is followed throughout the project, it simplifies decisions and keeps the system consistent and easy to understand. Like other projects, designing a protection system can also follow a guiding principle. A time-tested principle for protection is
the principle of least privilege, which states that each user,
program, or subsystem involved in the project must be given only
those privileges that are required for performing their tasks.
Operating systems designed with this principle in mind ensure that the failure or compromise of any component results in the least amount of damage to the system. In addition, such operating systems
provide the following facilities to the application programmers:
• Provide system calls and services to the programmers to write
their applications with fine-grained access control.
• Provide mechanisms that allow programmers to enable and
disable the privileges as and when required.
• Allow creation of audit trails in order to trace all protection and
security activities on the system.
Now the question arises: how are multiple users of the system managed? Typically, a separate account is created for each user, granting only those privileges that are required for performing his or her task. Alternatively, some systems implement role-based access control (RBAC) to offer this facility.
Though at first glance the principle of least privilege appears to guarantee a secure computing environment for any operating system, this is not true. For example, although Windows 2000 is designed with a complex protection mechanism, it has had various security holes. Solaris 10, on the other hand, is relatively secure, even though its design is based on UNIX, in which protection was given far less emphasis. Thus, in general, the strength of the protection system depends on the complexity of the operating system and the services it provides.

13.4 PROTECTION MECHANISMS


Implementing protection requires policies and mechanisms. The
policy of the organization decides which data should be protected
from whom and the mechanism specifies how this policy is to be
enforced. In the discussion of protection, we focus on mechanism
rather than on policy because policy may change from application to
application.
There are many protection mechanisms used in a system, each
having some advantages and some disadvantages. However, the
kind of mechanism used depends on the need and size of the
organization. Some commonly used protection mechanisms include
protection domain, access control list, and access control matrix. The
access control list has already been discussed in Chapter 11. So,
here we will discuss the remaining two mechanisms.

13.4.1 Protection Domain


A computer system consists of a set of objects that may be accessed
by various processes. An object can be either a hardware object
(such as CPU, memory segment and printer) or software object (such
as file, database, program and semaphore). Each object is referred to
by a unique name and is accessible by the processes using some
pre-defined operations. For example, a process can perform wait()
and signal() operations on a semaphore object.
Since an object is the basic component of a computer system, a
mechanism is required to ensure that a process accesses only those
objects for which it has got permission from the operating system.
Moreover, it must be ensured that a process performs only those
operations on the object which are currently required by it to complete
its task. To facilitate this, the concept of protection domain is used
which specifies such objects.
A domain is a collection of access rights, each of which is a pair <object_name, rights_set>. The object_name is the name of the object, and rights_set is the set of operations that a process is permitted to perform on that object. For example, a domain D with access right <A, [R, W]> specifies that any process in domain D can perform read and write operations on the object A.
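In C-like terms, a domain might be sketched as follows; the structures and limits are illustrative only, and the initializer mirrors the example domain D with access right <A, [R, W]>:

#include <stddef.h>

#define MAX_RIGHTS  8           /* operations per access right (illustrative) */
#define MAX_ENTRIES 8           /* access rights per domain (illustrative) */

/* One access right: the pair <object_name, rights_set>. */
struct access_right {
    const char *object_name;
    const char *rights_set[MAX_RIGHTS];   /* e.g. "read", "write"; NULL-terminated */
};

/* A protection domain is simply a collection of such access rights. */
struct domain {
    const char          *name;
    struct access_right  rights[MAX_ENTRIES];
    int                  nrights;
};

/* The example domain D with the single access right <A, [R, W]>. */
static struct domain D = {
    .name    = "D",
    .rights  = { { "A", { "read", "write", NULL } } },
    .nrights = 1,
};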
A system can specify a number of domains. These domains may
be disjoint or may share access rights with each other. To understand
this, consider Figure 13.1, which shows three domains D1, D2 and D3
with five objects A, B, C, D and E. The domain D1 is disjoint while
domains D2 and D3 share the access right <C,[Print]> and thus,
overlap.
Fig. 13.1 Protection Domain

From this figure, it is clear that a process in domain D1 can only read the object A; however, a process in domain D2 can write as well as execute the object A. In addition, a process executing in either of the domains D2 and D3 can print the object C.
Each process, at a given time, executes in some protection
domain with access to objects specified in the domain and the
specified set of rights on those objects. The association between a
process and domain may be either static or dynamic. In the former
case, a process is allowed to access only a fixed set of objects and
rights during its lifetime, while in the latter case, the process may
switch from one domain to another during its execution (termed as
domain switching).
In general, there are three ways in which a domain can be
realized: user, process, or a procedure. In the first case, each user is
treated as a domain, and the set of objects that can be accessed by a
user depends on his or her identity. Domain switching in this case is
performed when one user logs out and another logs in. In the second
case, each process is treated as a domain, and the set of objects that
can be accessed by a process depends on its identity. Domain
switching in this case is performed when one process sends a
message to another process and waits for the response. In the third
case, each procedure is treated as a domain, and the set of objects
that can be accessed by a procedure would be its local variables.
Domain switching in this case is performed when a procedure is
called.
13.4.2 Access Control Matrix
The access control matrix (or access matrix) is a mechanism that
records the access rights of processes over objects in a computer
system. Like access control list, the access matrix is also employed in
file systems and is consulted by the operating system each time an
access request is issued.
The access matrix stores the authorization information in the form
of rows and columns. The columns of the matrix represent objects
and the rows represent domains. The set of access rights that a
process, executing in a particular domain, has on an object is
represented by an entry in the access matrix at the intersection of the
corresponding row and column. Figure 13.2 illustrates an access
control matrix for three domains: D1, D2 and D3, and four objects: O1, O2,
O3 and O4.

Fig. 13.2 Access Control Matrix

It is clear from the above access control matrix that a process executing in domain D2 can perform the read operation on objects O2 and O4. A process executing in domain D1 has all the rights that a process executing in domain D3 has, in addition to the right to perform the read operation on object O1.
The contents of the access control matrix are decided by the users.
Whenever a user creates an object, he or she decides the access
rights of different domains on this object. Then, a column
corresponding to the newly created object is added to the matrix and
the appropriate entries are made in the rows.
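A toy C sketch of such a matrix, and of the check performed on each access request, is shown below; the entries follow the description of Figure 13.2 given above, and the remaining (empty) entries are assumed:

#include <string.h>

#define NDOMAINS 3
#define NOBJECTS 4

/* Entry [i][j] holds the rights a process in domain Di+1 has on object Oj+1,
 * or "" if it has none. D2's read rights on O2 and O4, and D1 having D3's
 * rights plus read on O1, follow the text; the rest is assumed. */
static const char *access_matrix[NDOMAINS][NOBJECTS] = {
    /*            O1       O2      O3      O4     */
    /* D1 */   { "read",  "",     "read", ""     },
    /* D2 */   { "",      "read", "",     "read" },
    /* D3 */   { "",      "",     "read", ""     },
};

/* Consulted on every access request: is the operation listed in the entry? */
int access_allowed(int domain, int object, const char *operation)
{
    return strstr(access_matrix[domain][object], operation) != NULL;
}

The check mirrors how the operating system consults the matrix each time an access request is issued.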

Domains as Objects
Access matrix can also be used for representing domain switching
among processes. This can be achieved by representing each
domain as an object in the access matrix, and switching among
domains is shown by adding a switch entry at the intersection of the
corresponding row and column. For example, Figure 13.3 shows a
modified access matrix, in which three columns have been added to
represent domains as objects. An entry switch in the row and column
intersection of domains D1 and D2 indicates that domain switching is
possible between them.

Fig. 13.3 Access Control Matrix with Domains as Objects

Protecting Access Matrix Entries


The contents of the access matrix are not fixed or static—they can
also be modified. Since each entry in the access matrix can be
modified individually, it also needs to be protected from unauthorized
access. That is, only the authorized users are allowed to modify the
contents of the access matrix. Allowing controlled change in the
contents of the access matrix requires three additional operations,
namely, copy, owner, and control. The copy and owner rights allow a
process to modify only the column entries in the access matrix;
however, the control right allows a process to modify the row entries.

Copy Right
The copy right allows copying of an access right from one domain to
another. The access right can only be copied within the same column,
that is, for the same object for which the copy right is defined. The
copy right is denoted by an asterisk (*) appended to the access right.
For example, Figure 13.4 (a) shows that the process executing in
domain D1 has the ability to copy the execute operation into any entry
associated with object O3. Figure 13.4 (b) shows a modified version of
access matrix, where the access right execute* has been copied to
domain D3.
Fig. 13.4 Copy Right and its Variations

It is clear from Figure 13.4 (b) that D1 has propagated both the
access right as well as the copy right to D3. There exist two more
variants of this scheme.
• Limited copy: In this case, only the access right is copied (not the
copy right) from one domain to another. For example, in Figure
13.4 (c) only the access right is copied from D1 to D3—D3 cannot
further copy the access right.
• Transfer: In this case, the right is transferred (not copied) from
one domain to another, that is, it is removed from the original
domain. For example, in Figure 13.4 (d) the access right is
transferred from D1 to D3. Note that it is removed from the domain
D1.

Owner Right
The owner right allows a process to add new rights and remove the
existing rights within the same column for which the owner right is
defined. For example, in Figure 13.5 (a), domain D1 is the owner of
object O1, hence it can grant and revoke the access rights to and from
the other domains for the object O1 [as shown in Figure 13.5 (b)].

Fig. 13.5 Access Control Matrix with Owner Rights

Control Right
The control right is applicable only to domain objects. It allows a process executing in one domain to modify the entries of another domain.
Figure 13.6, a process operating in domain D2 has the right to control
any of the rights in domain D3.
Fig. 13.6 Access Control Matrix with Control Right

Implementation of Access Control Matrix


The main drawback of the access control matrix is its huge size. If there are n users and n files, the size of the access matrix will be n × n, which requires a large memory area to hold it. However, the access control information actually contained in the matrix is of much smaller size, since most of its entries are null. Thus, storing the matrix directly as a two-dimensional array, most of which is empty (sparse), would only waste memory. To implement an access control matrix effectively, some other techniques are used. These techniques are discussed in this section.

Global Table
In this technique, a single (global) table is created for all domains and objects. The table comprises a set of ordered triples <domain, object, access-rights-set>. That is, each entry in the table represents the access rights of a process executing in a specific domain on a specific object. Whenever a process in domain Di needs to perform an operation X on an object Oj, the global table is searched for a triple <Di, Oj, Rk> such that X ∈ Rk. If a matching entry is found, the process is allowed to perform the desired operation; otherwise, an exception is raised and access is denied. This is the easiest method of implementing the access control matrix. However, it suffers from some drawbacks. First, the table is large, so it cannot be kept in main memory, and additional I/O is required to access it from secondary storage. Second, it does not support any kind of grouping of objects or domains; thus, if an access right (say, the read operation) is applicable to several domains, a separate entry must be stored for each domain.
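A minimal C sketch of the global-table lookup is given below; the table contents and function name are illustrative only:

#include <string.h>

/* Each entry is an ordered triple <domain, object, access-rights-set>. */
struct triple {
    const char *domain;
    const char *object;
    const char *rights;                /* e.g. "rw" for read and write */
};

/* Illustrative global table; a real system keeps it on secondary storage. */
static const struct triple global_table[] = {
    { "D1", "O1", "r"  },
    { "D2", "O2", "rw" },
};
#define TABLE_SIZE (sizeof(global_table) / sizeof(global_table[0]))

/* Search for <Di, Oj, Rk> with the operation in Rk; deny access otherwise. */
int check_access(const char *domain, const char *object, char op)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        if (strcmp(global_table[i].domain, domain) == 0 &&
            strcmp(global_table[i].object, object) == 0 &&
            strchr(global_table[i].rights, op) != NULL)
            return 1;                  /* the operation is allowed */
    return 0;                          /* no matching entry: access is denied */
}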

Access Lists for Objects


Another technique of implementing the access control matrix is to store each column as an access list for one object. The access list for an object consists of ordered pairs <domain, access-rights-set>, represented as <Di, Rk>, that define all domains with a non-empty set of access rights for that object. The main advantage of this approach is that the empty entries are discarded. Moreover, it allows a default set of access rights to be defined, comprising the operations that all domains may perform on the object. Whenever a process executing in domain Di needs to perform an operation X (X ∈ Rk) on object Oj, the access list for the object Oj is searched. If a matching entry is found, the operation continues; otherwise, the default set is checked. If the operation belongs to the default set, access is allowed; otherwise, an exception is raised.

Capability Lists for Domains


In this technique, rather than creating a list for each object, the
operating system creates a list for each domain in the access matrix.
This list (known as capability list) consists of the objects along with
the operations allowed on those objects for a particular domain. An
object is represented by its physical name or address, which is known
as a capability. Whenever a process needs to perform an operation X on object Oj, it must specify the capability (or pointer) for object Oj as a parameter; simple possession of the capability means that the process is allowed to perform the operation. The main
advantage of capability list is that it is itself a protected object, which
is created by the operating system. It cannot be accessed by a user
process directly, and hence cannot be modified. This implies that if all
the capabilities are secure, the objects they are protecting are also
secure against unauthorized access.

A Lock-Key Scheme
In this technique, each object and domain is associated with a list of
unique bit patterns. The bit patterns associated with the objects are
known as locks, and the bit patterns associated with the domains are
known as keys. If a process executing in domain Di wants to perform
an operation X on object Oj, it is allowed to do so only if Di has a key
that matches one of the locks of Oj. Thus, this mechanism can be
considered as a compromise between the two techniques discussed
earlier (access lists and capability lists).
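A small C sketch of the lock-key check might look as follows; the structures and sizes are assumptions made for illustration:

#include <stdint.h>

#define MAX_PATTERNS 8                 /* bit patterns per object/domain (illustrative) */

/* Locks belong to objects and keys belong to domains; both are bit patterns. */
struct object_locks { uint32_t locks[MAX_PATTERNS]; int nlocks; };
struct domain_keys  { uint32_t keys[MAX_PATTERNS];  int nkeys;  };

/* A process in the domain may access the object only if one of the
 * domain's keys matches one of the object's locks. */
int lock_key_allows(const struct domain_keys *d, const struct object_locks *o)
{
    for (int i = 0; i < d->nkeys; i++)
        for (int j = 0; j < o->nlocks; j++)
            if (d->keys[i] == o->locks[j])
                return 1;
    return 0;
}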

13.5 REVOCATION OF ACCESS RIGHTS


In a dynamic protection system, it may sometimes be necessary to revoke access rights to objects shared by several users. Whenever access rights need to be revoked, several issues must be considered:
• Whether the access rights should be revoked immediately or
should they be delayed.
• Whether the revocation of access right on an object should affect
all the users who are accessing the object, or it should affect only
a subset of users.
• Whether all the access rights associated with an object should be
revoked, or only a subset of access rights should be revoked.
• Whether the access right should be revoked permanently, or whether it can be granted again later.
Revocation of access rights is much easier in case of access list
as compared to capability list. This is because in case of access list,
the access right to be revoked is simply deleted from the list.
However, in case of capability list, since the capabilities are
distributed throughout the system, they need to be searched before
revocation. To implement revocation for capabilities, the following
techniques can be used.
• Re-acquisition: In this technique, capabilities are periodically
deleted from each domain. If the capability that a process needs
to acquire has been deleted, it will try to re-acquire it. This is
known as re-acquisition. However, if the access has been
revoked, the process will be unable to re-acquire the capability.
• Back-pointers: This technique was initially implemented in
MULTICS system. In this approach, a list of pointers is associated
with each object, which points to all capabilities associated with
that object. When an access right on an object needs to be
revoked, the list of pointers associated with that object is
searched, and capabilities are modified as required. This is the
most commonly used scheme for revocation but its
implementation is expensive as the number of pointers is large.
• Indirection: The capabilities associated with an object point
indirectly to the object. Each capability actually directly points to a
unique entry in a global table, which in turn points to the object. At
the time of revocation, the desired entry is searched in the global
table, and deleted from there. The deleted table entries can be
reused for storing other capabilities without any difficulty. This is
because if in future, a process attempts to access an object
(whose access right has been revoked), the capability points to
an illegal (deleted) entry or to a mismatched entry (entry for
another object) in the global table. The main drawback of this
scheme is that it does not allow selective revocation.
• Keys: As discussed in the lock-key scheme, a key is a unique bit
pattern that is associated with each capability. The key is defined
when the capability is generated and the process owning the
capability is not allowed to examine or modify the defined key. In
addition, a master key is associated with each object. When a
capability for an object is created, the current value of the master
key is associated with the capability using the set-key operation.
When the capability is used, its key is compared to the master
key. In case a match is found, the process is allowed to perform
the desired operation; otherwise, an exception is raised and
access is denied. At the time of revocation, the current value of the master key is replaced by a new value using the set-key operation, which invalidates all previous capabilities for this object, as illustrated in the sketch following this list.
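The following C sketch illustrates the master-key idea; the structures and function names are hypothetical:

#include <stdint.h>

struct object_state { uint32_t master_key; };
struct capability   { uint32_t key; /* plus a reference to the object itself */ };

/* Creating a capability copies the object's current master key into it
 * (the set-key operation). */
struct capability create_capability(const struct object_state *obj)
{
    struct capability c = { obj->master_key };
    return c;
}

/* Using a capability: its key must still match the object's master key. */
int capability_valid(const struct capability *c, const struct object_state *obj)
{
    return c->key == obj->master_key;
}

/* Revocation: replacing the master key invalidates every previously
 * created capability for this object in one step. */
void revoke_all(struct object_state *obj, uint32_t new_master_key)
{
    obj->master_key = new_master_key;
}

Replacing the master key is a single assignment, which is why revocation in this scheme is cheap compared with searching for every capability distributed throughout the system.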
Note: In key-based schemes, not all users should be allowed to perform operations on keys, such as defining keys or inserting and deleting them from the lists; only the owner of an object should be allowed to do so.

13.6 SECURITY PROBLEM


Security has always been an overriding concern of humankind.
Today, computer systems are used to store vital information of an
organization, such as payroll or other financial data or data pertaining
to corporate operations. If such information is stolen by some unauthorized user (or lost accidentally), it may slow down the pace of an organization and seriously affect its business. Therefore, it has
become essential to ensure the security of data against unauthorized
access.
Undoubtedly, the protection mechanisms (discussed in Section
13.4) provided by the operating system enable users to protect their
programs and data. But a system is considered secure only if its resources are used and accessed as intended under all circumstances. Though it is not possible to achieve total security, we should use mechanisms that minimize the possibility of a security breach.
In this section, we will first introduce the term intruders and then
discuss various types of security violations, different methods
employed for attempting security attacks and the levels of security
measures.

13.6.1 Intruders
Intruders (sometimes also called adversaries) are the attackers who
attempt to breach the security of a network. They attack on the
privacy of a network in order to get unauthorized access. Intruders
are of three types, namely masquerader, misfeasor and clandestine
user.
• Masquerader is an external user who is not authorized to use the
computer and tries to gain privileges to access some legitimate
user’s account. Masquerading is generally done by either using
stolen IDs and passwords or through bypassing authentication
mechanisms.
• Misfeasor is generally a legitimate user who either accesses
some applications or data without any privileges to access them,
or if he/she has privilege to access them, he/she misuses these
privileges. It is generally an internal user.
• Clandestine user is either an internal or external user who gains
admin access to the system and tries to avoid access control and
auditing information.

13.6.2 Types of Security Violations


There exist many types of security violations which can be broadly
classified into two categories: accidental and intentional (malicious).
Accidental violations may occur due to system error, improper
authorization, or by concurrent use of crucial information with no
proper synchronization mechanism. On the other hand, intentional
security violations may occur due to access by malicious persons,
fraud or theft committed by some insider (such as a disgruntled
employee of an organization). Accidental security violations are easier to protect against than intentional ones.
A few common types of security violations are described as
follows.
• Breach of confidentiality: Confidentiality refers to maintaining
secrecy of system data, that is, information is accessible to only
those users who are authorized to access it. Confidentiality is
breached when secret data, such as credit-card details or identity information, gets captured (or stolen) from the system by some unauthorized user.
• Breach of integrity: Integrity refers to ensuring that the
information cannot be modified by unauthorized users. Integrity is
breached when some unauthorized user modifies the information
that he/she is not allowed to access.
• Breach of availability: Availability refers to ensuring that the
required information is available to authorized users at all times.
Availability is breached when some unauthorized user destroys
the data, making it inaccessible to authorized users. A common
example of this type of security violation is website defacement.
• Theft of service: This type of security violation occurs due to
unauthorized use of resources. For example, an intruder may
install a daemon on the system which acts as a file server.
• Denial of service (DoS): This type of security violation does not damage information or access unauthorized information, but it prevents legitimate use of the system by authorized users.

Methods used in Security Attacks


The intruders adopt certain standard methods while making attempts
to breach the security of system. Some of these methods are
described as follows.
• Masquerading: In computer terms, masquerading is said to
happen when an entity impersonates another entity. By
masquerading, the intruders breach authentication and attempt to
gain more privileges than they are authorized for or gain access
that they would not normally be allowed for. Masquerading is
generally done by using stolen IDs and passwords or through
bypassing the authentication mechanisms.
• Replay attack: This attack involves capturing a copy of valid data
transmission between a sender and receiver and repeating it later
for malicious purposes, bringing about an unauthorized result. The replayed transmission may constitute the entire attack or only a part of it.
• Message modification: Sometimes an entire transmission is replayed as-is, but more often a replay attack is combined with message modification, in which the valid user's transmitted information is replaced with an unauthorized user's data to cause an unauthorized effect.
• Man-in-the-middle attack: In this attack, an intruder comes in the
middle of the communication between two legitimate users and
pretends as the sender to the receiver and as receiver to the
sender.
• Session hijacking: The man-in-the-middle attack is often preceded by session hijacking, in which an intruder intercepts an active communication session between two legitimate users.

Security Measure Levels


There are four levels at which security measures should be applied in
order to protect the system. These levels are described as follows.
• Physical level: Each physical location, such as sites or machine
rooms where computer systems are placed must be physically
secured so that intruders are unable to enter the premises.
• Human level: Users must be authorized carefully so that only
legitimate users can gain access to the system. However, sometimes authorized users may let others use their access, either intentionally for personal benefit or unknowingly when trapped by social-engineering attacks. A common example of such an attack is phishing, in which fraudsters use realistic-looking fake websites or emails to prompt users to enter personal information such as usernames, passwords, social security numbers or credit card details. In this way they obtain the personal information of users and
misuse it.
• Operating-system level: The system itself must be able to protect against accidental failures and other security-breaching attempts. In
order to ensure security at operating-system level, it is important
to maintain the security at the physical and human level. This is
because any loophole in the security at high levels may
compromise low-level security even though strict security
mechanisms have been used at the low level.
• Network level: Nowadays, a huge amount of data is transmitted
via computer networks. These data may travel over shared lines,
private leased lines, dial-up connections or wireless connections
and are prone to be intercepted by intruders. Therefore, it is
important to take care of security attacks over system networks.

13.7 DESIGN PRINCIPLES FOR SECURITY


Designing a secure operating system is a crucial task. The major
concern of the designers is on the internal security mechanisms that
lay the foundation for implementing security policies. Researchers
have identified certain principles that can be followed while designing
a secure system. Some design principles presented by Saltzer and
Schroeder (1975) are as follows.
• Least privilege: This principle states that a process should be
allowed the minimal privileges that are required to accomplish its
task.
• Fail-safe default: This principle states that access rights should
be provided to a process on its explicit requests only and the
default should be ‘no access’.
• Complete mediation: This principle states that each access
request for every object should be checked by an efficient
checking mechanism in order to verify the legality of access.
• User acceptability: This principle states that the mechanism
used for protection should be acceptable to the users and should
be easy to use. Otherwise, the users may feel burdened in
following the mechanism.
• Economy of mechanism: This principle states that the protection
mechanism should be kept simple as it helps in verification and
correct implementation.
• Least common mechanism: This principle states that the amount of mechanism common to, and depended upon by, multiple users should be kept to a minimum.
• Open design: This principle states that the design of the security
mechanism should be open to all and should not depend on
ignorance of intruders. This entails use of cryptographic systems
where the algorithms are made public while the keys are kept
secret.
• Separation of privileges: This principle states that the access to
an object should not depend only on fulfilling a single condition;
rather more than one condition should be fulfilled before granting
access to the object.

13.8 SECURITY THREATS


Security threats continue to evolve around us, constantly taking new forms. Some of them are caused by humans, some by nature (such as floods, earthquakes and fire), and some arise from the use of the Internet, such as viruses, Trojan horses and spyware.
Once an intruder gains access to the systems of an organization, he/she may steal the confidential data, modify it in place, or delete it.
The stolen data or information can then be used for illegal activities
like blackmailing the organization, selling the information to
competitors, etc. Such attacks prove more destructive, especially if
the data deleted by the intruder cannot be recovered by the
organization.
Security can also be affected by natural calamities, such as
earthquakes, floods, wars, and storms. Such disasters are beyond
the control of humans and can result in huge loss of data. The only way to deal with these threats is to maintain timely and proper backups at geographically separate locations.
Different security threats are classified into two broad categories:
program threats and system and network threats. This section
discusses both kinds of security threats.

13.8.1 Program Threats


The common goal of intruders is to write programs that attempt to
breach the security or cause the processes to behave differently from
their expected behaviour and thus, create security breaches. In this
section, we will discuss some commonly used methods by which
programs cause security breaches.

Trap Doors
Trap doors (also known as backdoors) refer to security holes deliberately left in the software by insiders. Sometimes, while programming a system, the programmers embed code into a program to bypass some normal protective mechanism. For example, they can insert code that circumvents the normal login/password authentication procedure of the system, thus providing access to the system. The main characteristic of trap doors is that they are hidden in the software and no one can know about them with certainty.
In computing industry, insertion of trap doors is usually considered
necessary so that the programmers could quickly gain access to the
system in any undesirable error condition or when all other ways of
accessing the system have failed. However, this itself may prove a
potential security threat if a hacker comes to know about it.

Trojan Horses
A Trojan horse is a malicious program that appears to be legitimate and useful but covertly does something unexpected, such as destroying existing programs and files. It does not replicate itself in the computer
system and hence, it is not a virus. However, it usually opens the way
for other malicious programs such as viruses to enter the system. In
addition, it may also allow access to unauthorized users.
Trojan horses spread when users are convinced to open or
download a program because they think it has come from a legitimate
source. They can also be mounted on software that is freely
downloadable. They are usually subtle, especially when they are used for espionage. They can be programmed for self-destruction, leaving no evidence other than the damage they have caused. One famous Trojan horse is a program called Back Orifice, whose name is an unsubtle play on Microsoft's Back Office suite of programs for the NT server. This program allows anybody
to have complete control over the computer or server it occupies.
Another activity relating to the Trojan horse is spyware. Spyware
are small programs that install themselves on computers to gather
data secretly about the computer user without his/her consent and
knowledge and report the collected data to interested users or
parties. The information gathered by the spyware may include e-mail
addresses and passwords, net surfing activities, credit card
information, etc. The spyware often gets automatically installed on
your computer when you download a program from the Internet or
click any option from the pop-up window in the browser.

Logic Bombs
A logic bomb is a program, or a portion of a program, that lies dormant until a specific piece of program logic is activated. The most common activator for a logic bomb is a date. The logic bomb checks the date of
the computer system and does nothing until a pre-programmed date
and time is reached. It could also be programmed to wait for a certain
message from the programmer. When the logic bomb sees the
message, it gets activated and executes the code. A logic bomb can
also be programmed to activate on a wide variety of other variables
such as when a database grows past a certain size or a user's home directory is deleted. For example, a well-known logic bomb is Michelangelo, which has a trigger set for Michelangelo's birthday. On
the given birth date, it causes system crash or data loss or other
unexpected interactions with the existing code.
Viruses
A virus is a program or small code segment that is designed to replicate itself, attach to other programs, and perform unsolicited and malicious actions. It
enters the computer system from external sources, such as CD, pen
drive, or e-mail and executes when the infected program is executed.
Further, as an infected computer gets in contact with an uninfected
computer (for example, through computer networks), the virus may
pass on to the uninfected system and destroy the data.
Just as flowers are attractive to the bees that pollinate them, virus
host programs are deliberately made attractive to victimize the user.
They become destructive as soon as they enter the system or are
programmed to lie dormant until activated by a trigger. The various
types of virus are discussed as follows.
• Boot sector virus: This virus infects the master boot record of a
computer system. It either moves the boot record to another
sector on the disk or replaces it with the infected one. It then
marks that sector as a bad sector on the disk. This type of virus is
very difficult to detect since the boot sector is the first program
that is loaded when a computer starts. In effect, the boot sector
virus takes full control of the infected computer.
• File-infecting virus: This virus infects files with extension .com
and .exe. This type of virus usually resides inside the memory
and infects most of the executable files on the system. The virus
replicates by attaching a copy of itself to an uninfected executable
program. It then modifies the host programs and subsequently,
when the program is executed, it executes along with it. File-
infecting virus can only gain control of the computer if the user or
the operating system executes a file infected with the virus.
• Polymorphic virus: This virus changes its code as it propagates
from one file to another. Therefore, each copy of virus appears
different from others; however, they are functionally similar. This
makes the polymorphic virus difficult to detect like the stealth
virus (discussed below). The variation in copies is achieved by
placing superfluous instructions in the virus code or by interchanging the order of instructions that do not depend on each other. Another, more effective, means to achieve variation is to use encryption. A part of the virus, called the mutation engine, generates a random key that is used to encrypt the remaining portion of the virus. The random key is stored with the virus, while the mutation engine itself changes from copy to copy. At the time the infected
program is executed, the stored key is used by the virus to
decrypt itself. Each time the virus replicates, the random key
changes.
• Stealth virus: This virus attempts to conceal its presence from
the user. It makes use of compression such that the length of the
infected program is exactly the same as that of the uninfected
version. For example, it may keep the intercept logic in some I/O
routines so that when some other program requests for
information from the suspicious portions of the disk using these
routines, it will present the original uninfected version to the
program. Stoned Monkey is one example of stealth virus. This
virus uses ‘read stealth’ capability and if a user executes a disk
editing utility to examine the main boot record, he/she would not
find any evidence of infection.
• Multipartite virus: This virus infects both boot sectors and
executable files, and uses both mechanisms to spread. It is the
hardest of all to detect and remove because it can combine some or all of the stealth
techniques along with polymorphism to prevent detection. For
example, if a user runs an application infected with a multipartite
virus, it activates and infects the hard disk’s master boot record.
Moreover, the next time the computer is started, the virus gets
activated again and starts infecting every program that the user
runs. One-half is an example of a multipartite virus, which exhibits
both stealth and polymorphic behaviour.

13.8.2 System and Network Threats


Program threats attack the programs by finding weak points in the
protection mechanisms of a system. In contrast, system
and network threats cause security breaches by misusing the
resources and user files of the operating system. They create such
situations that the system services and network connections are
misused. In this section, we will discuss two common methods used
to achieve this misuse.
Note: Sometimes, a program threat is used to launch a system and
network threat and vice versa.

Worms
Worms are programs constructed to infiltrate legitimate
data-processing programs and alter or destroy their data. They often
use network connections to spread from one computer system to
another, thus, worms attack systems that are linked through
communication lines. Once active within a system, worms behave like
a virus and perform a number of disruptive actions. To reproduce
themselves, worms make use of network medium, such as:
• Network mail facility, in which a worm can mail a copy of itself to
other systems.
• Remote execution capability, in which a worm can execute a copy
of itself on another system.
• Remote log in capability, whereby a worm can log into a remote
system as a user and then use commands to copy itself from one
system to another.
Both worms and viruses tend to fill computer memory with useless
data, thereby preventing the user from using memory space for legitimate
applications or programs. In addition, they can destroy or modify data
and programs to produce erroneous results as well as halt the
operation of the computer system or network. The worm’s replication
mechanism can access the system by using any of the three methods
given below.
• It employs password cracking, in which it attempts to log into
systems using different passwords such as words from an online
dictionary.
• It exploits a trap door mechanism in mail programs, which permits
it to send commands to a remote system’s command interpreter.
• It exploits a bug in a network information program, which permits it
to access a remote system’s command interpreter.

Denial of Service (DoS)


DoS is a network-based attack whose objective is not to steal the
system resources or access confidential data; rather, it aims to
prevent the legitimate users from accessing information or services
by interrupting the normal use of the system services. DoS attacks
fall under two categories: those that eat up almost all system
resources, preventing legitimate users from doing any useful work,
and those that target the network and disrupt its operation.
The most common type of DoS attack occurs when attackers
mischievously flood a network server or a web server with multiple
false requests for services in order to crash the network. In this
situation, the server is not able to serve the genuine requests. This is
a ‘denial of service’ because the legitimate users cannot use the
network facility.
A variant of DoS attack is Distributed Denial of Service (DDoS)
attack in which numerous computers are used to generate the false
requests for a network. Using numerous computers helps the attacker
to flood the network very quickly.
Note that DoS attack does not damage information or access
restricted areas but it can shut down a website, thereby making it
inaccessible to genuine users. At times, it is difficult for a website to
determine that it has been attacked; for example, a slowdown may be
mistaken for heavy network traffic.
Note: It is usually impossible to prevent DoS attacks. In addition, it is
more difficult to prevent and resolve DDoS attacks than DoS attacks.
13.9 CRYPTOGRAPHY
Cryptography is a means for implementing security in a networked
environment. The term cryptography is derived from the Greek word
kryptos, meaning “hidden”, and literally means “secret writing”. In simple terms, cryptography
is the process of altering messages in a way that their meaning is
hidden from the adversaries who might intercept them. It allows a
sender to disguise a message to prevent it from being read or altered
by the intruder as well as it enables the receiver to recover the
original message from the disguised one.
In data and telecommunications, cryptography is an essential
technique required for communicating over any untrusted medium,
which includes any network, such as the Internet. By using
cryptographic techniques, the sender can first encrypt a message and
then transmit it through the network. The receiver on the other hand,
can decrypt the message and recover its original contents.
Cryptography relies upon two basic components: an algorithm (or
cryptographic methodology) and a key. Algorithms are complex
mathematical formulae and keys are the strings of bits. For two
parties to communicate over a network (the Internet), they must use
the same algorithm (or algorithms that are designed to work
together). In some cases, they must also use the same key.

13.9.1 Encryption
One of the most important parts of cryptography is
encryption which is a means of protecting confidentiality of data in
an insecure environment, such as while transmitting data over an
insecure communication link. It is used in security and protection
mechanisms to protect the information of users and their resources.
Encryption is accomplished by applying an algorithmic
transformation to the data. The original unencrypted data is referred
to as plaintext, while its encrypted form is referred to as ciphertext.
Thus, encryption is defined as the process of transforming plaintext
into ciphertext. The plaintext is transformed to
ciphertext using the encryption algorithm. The ciphertext needs to be
converted back to plaintext using the opposite process of encryption,
called decryption which uses a decryption algorithm to accomplish
the same.
Both encryption and decryption algorithms make use of a key
(usually, a number or set of numbers) to encrypt or decrypt the data,
respectively (see Figure 13.7). The longer the key, the harder it is for an
opponent to decrypt the message.
During encryption, the encryption algorithm (say, E) uses the
encryption key (say, k) to convert the plaintext (say, P) to ciphertext
(say, C), as shown here.

C = E(k)(P)

During decryption, the decryption algorithm (say, D) uses the
decryption key (k) to convert ciphertext back to the plaintext, as shown
here.

P = D(k)(C)

Fig. 13.7 Encryption and Decryption

An encryption algorithm must possess the property that a computer
can decrypt a given ciphertext C to plaintext P if and only if it holds
D(k); no one without D(k) can produce the plaintext from the ciphertext.
Since the ciphertext is transmitted through a network (an insecure
channel), it is prone to interception. Therefore, it is important to
ensure that even if the ciphertext is intercepted, it is computationally
infeasible to derive D(k) from it.
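
To make the roles of E, D and k concrete, here is a minimal Python sketch using a toy XOR cipher; the key and message are hypothetical and the cipher is purely illustrative, not a secure algorithm such as the DES and AES schemes discussed below.

# A toy illustration of C = E(k)(P) and P = D(k)(C).
# XOR with a repeating key is NOT secure; it only shows the roles of E, D and k.

def encrypt(key: bytes, plaintext: bytes) -> bytes:      # E(k)(P) -> C
    return bytes(p ^ key[i % len(key)] for i, p in enumerate(plaintext))

def decrypt(key: bytes, ciphertext: bytes) -> bytes:     # D(k)(C) -> P
    # XOR is its own inverse, so decryption reuses the same transformation.
    return encrypt(key, ciphertext)

k = b"secret-key"                  # hypothetical shared key
P = b"transfer 500 to account 42"  # plaintext
C = encrypt(k, P)                  # ciphertext sent over the insecure channel
assert decrypt(k, C) == P          # only a holder of k recovers the plaintext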
There are two categories of encryption algorithms, namely,
symmetric and asymmetric. Both these categories are discussed in
this section.
Note: Encryption and decryption algorithms are together known as
ciphers. Ciphers need not necessarily be unique for each
communicating pair; rather a single cipher can be used for
communication between multiple pairs.

Symmetric Encryption
The type of encryption in which the same key is used for both
encryption and decryption of data is called symmetric encryption.
Data Encryption Standard (DES) is a well known example of
symmetric encryption algorithm. In 1977, the US government
developed DES, which was widely adopted by the industry for use in
security products. The DES algorithm is parameterized by a 56-bit
encryption key. It has a total of 19 distinct stages and encrypts the
plaintext in blocks of 64 bits, producing 64 bits of ciphertext. The first
stage is independent of the key and performs transposition on the 64-
bit plaintext. The last stage is the exact inverse of the first stage
transposition. The stage preceding the last one exchanges the first 32
bits with the next 32 bits. The remaining 16 stages perform encryption
using the parameterized encryption key. Since the algorithm is a
symmetric-key encryption algorithm, it allows decryption to be done with the
same key as encryption. All the steps of the algorithm are run in the
reverse order to recover the original data.
With increasing speeds of computers, it was feared that a special-
purpose chip could crack DES in under a day by searching the 2^56 possible
keys. Therefore, NIST created a modified version of the DES, called
triple DES (3-DES), with increased key length thereby making the
DES more secure. As the name implies, 3-DES performs the DES
thrice, including two encryptions and one decryption. There are two
implementations of 3-DES: one with two keys, while the other with
three keys. The former version uses two keys (k1 and k2) of 56 bits
each, that is, the key size is 112 bits. During encryption, the plaintext
is encrypted using DES with key k1 in the first stage, then the output
of first stage is decrypted using DES with key k2 in the second stage,
and finally, in the third stage, the output of second stage is encrypted
using DES with key k1 thereby producing the ciphertext. In contrast,
the latter version of 3-DES uses three keys of 56 bits each and a
different key is used for encryption/decryption in each stage. The use
of three different keys further increases the key length to 168 bits,
making the communication more secure.
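
The encrypt–decrypt–encrypt structure of two-key 3-DES can be sketched as follows. The functions des_encrypt and des_decrypt are assumed placeholders for a single-DES block operation, not a real implementation; the sketch only shows how the keys k1 and k2 are applied across the three stages.

# Sketch of two-key triple DES (EDE form). des_encrypt/des_decrypt are
# assumed single-DES primitives operating on one 64-bit block; they are
# placeholders here, not a real DES implementation.

def des_encrypt(block, key):   # placeholder for single-DES encryption
    raise NotImplementedError

def des_decrypt(block, key):   # placeholder for single-DES decryption
    raise NotImplementedError

def triple_des_encrypt(block, k1, k2):
    stage1 = des_encrypt(block, k1)    # first stage: encrypt with k1
    stage2 = des_decrypt(stage1, k2)   # second stage: decrypt with k2
    return des_encrypt(stage2, k1)     # third stage: encrypt with k1 -> ciphertext

def triple_des_decrypt(block, k1, k2):
    # Run the stages in reverse order with the inverse operations.
    stage1 = des_decrypt(block, k1)
    stage2 = des_encrypt(stage1, k2)
    return des_decrypt(stage2, k1)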
After questioning the inadequacy of DES, the NIST adopted a
new symmetric encryption standard called Advanced Encryption
Standard (AES) in 2001. AES supports key lengths of 128, 192, and
256 bits and specifies a block size of 128 bits. Since the key length
is 128 bits, there are 2^128 possible keys. It is estimated that a fast
computer that can crack DES in 1 second would take trillions of years
to crack a 128-bit AES key. The main problem with symmetric algorithms
is that the key must be shared among all the authorized users. This
increases the chance of key becoming known to an intruder.

Asymmetric Encryption
In 1976, Diffie and Hellman introduced a new concept of encryption
called asymmetric encryption (or public-key encryption). It is based on
mathematical functions rather than operations on bit patterns. Unlike
DES and AES, it uses two different keys for encryption and
decryption. These are referred to as public key (used for encryption)
and private key (used for decryption). Each authorized user has a
pair of public key and private key. The public key is known to
everyone, whereas the private key is known to its owner only, thus,
avoiding the weakness of DES. Assume that E and D represent the
public encryption key and the private decryption key, respectively. It
must be ensured that deducing D from E should be extremely difficult.
In addition, the plaintext that is encrypted using the public key Ei
requires the private key Di to decrypt the data.
Now suppose that a user A wants to transfer some information to
user B securely. The user A encrypts the data by using public key of B
and sends the encrypted message to B. On receiving the encrypted
message, B decrypts it by using his private key. Since decryption
process requires private key of user B, which is known only to B, the
information is transferred securely.
In 1978, a group at MIT invented a strong method for asymmetric
encryption. It is known as RSA, the name derived from the initials of
the three discoverers Ron Rivest, Adi Shamir, and Len Adleman. It is
now the most widely accepted asymmetric encryption algorithm; in
fact most of the practically implemented security is based on RSA.
For good security, the algorithm requires keys of at least 1024 bits.
This algorithm is based on a principle from number theory:
determining the prime factors of a large number is
extremely difficult. The algorithm follows the steps given below to
determine the encryption key and decryption key.
1. Take two large distinct prime numbers, say m and n (about 1024 bits).
2. Calculate p= m × n and q=(m–1)×(n–1).
3. Find a number which is relatively prime to q, say D. That number is the
decryption key.
4. Find encryption key E such that E × D=1 mod q.
Using these calculated keys, a block B of plaintext is encrypted as
Te = B^E mod p. To recover the original data, compute B = (Te)^D mod p.
Note that E and p are needed to perform encryption, whereas D and p
are needed to perform decryption. Thus, the public key consists of
(E,p), and the private key consists of (D,p). An important property of
the RSA algorithm is that the roles of E and D can be interchanged. Since
number theory suggests that finding the prime factors of p is very hard,
it is extremely difficult for an intruder to determine the decryption key D
using just E and p, because doing so requires factoring p.
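
A minimal key-generation sketch following these four steps is shown below (in Python). The tiny primes and the particular choice of D are taken from the worked example that follows and are far too small for real use; pow(D, -1, q) is used here simply to compute the modular inverse required in step 4.

from math import gcd

def rsa_generate_keys(m, n, D):
    """Follow the four steps: p = m*n, q = (m-1)*(n-1), pick D coprime to q,
    then find E such that E * D = 1 (mod q)."""
    p = m * n
    q = (m - 1) * (n - 1)
    assert gcd(D, q) == 1, "D must be relatively prime to q"
    E = pow(D, -1, q)          # modular inverse of D modulo q (Python 3.8+)
    return (E, p), (D, p)      # public key (E, p), private key (D, p)

# Using the numbers of the worked example below: m = 11, n = 3, D = 3
public_key, private_key = rsa_generate_keys(11, 3, 3)
print(public_key, private_key)   # -> (7, 33) (3, 33)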
As an example, consider we have to encrypt the plaintext 6 using
RSA encryption algorithm. Suppose we use prime numbers 11 and 3
to compute the public key and private key. Here, we have m = 11 and
n = 3. Thus, p and q can be calculated as:

p = m × n = 11 × 3 = 33
q = (m-1)×(n-1) = (11-1) × (3-1) = 10 × 2 = 20
Let us choose D = 3 (a number relatively prime to 20, that is,
gcd(20, 3) = 1).
Now,
E × D = 1 mod q

⇒ E × 3 = 1 mod 20
⇒ E = 7
As we know, the public key consists of (E,p), and the private key
consists of (D,p). Therefore, the public key is (7, 33) and the private
key is (3, 33).
Thus, the plaintext 6 can be converted to ciphertext using the
public key (7, 33) as shown here.
Te = B^E mod p

⇒ 6^7 mod 33

⇒ 30
On applying the private key to the ciphertext 30 to get original
plaintext, we get:
B = (Te)^D mod p

⇒ (30)^3 mod 33

⇒ 6
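
The arithmetic of this example can be verified with a few lines of Python using built-in modular exponentiation; the numbers are exactly those computed above.

E, D, p = 7, 3, 33           # public exponent, private exponent, modulus from above
B = 6                        # plaintext block

Te = pow(B, E, p)            # Te = B^E mod p
print(Te)                    # -> 30 (the ciphertext)

recovered = pow(Te, D, p)    # B = (Te)^D mod p
print(recovered)             # -> 6 (the original plaintext)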

13.10 USER AUTHENTICATION


The operating system applies various means to provide security, one
of which is by controlling the access to the system. This approach
often raises a few questions such as:
• Who is allowed to login into the system?
• How can a user prove that he/she is a true user of the system?
Some process is required that lets users present their identity
to the system to confirm their correctness. This process of verifying
the identity of a user is termed as authentication. User
authentication can be based on:
• user knowledge (such as a username and password),
• user possession (such as a card or key) and/or
• user attribute (such as fingerprint, retina pattern or iris design).

13.10.1 Passwords
Password is the simplest and most commonly used authentication
scheme. In this scheme, each user is asked to enter a username and
password at the time of logging into the system. The combination of
username and password is then matched against the stored list of
usernames and passwords. If a match is found, the system assumes
that the user is legitimate and allows him/her access to the system;
otherwise the access is denied. Generally, the password is asked for
only once when the user logs into the system, however, this process
can be repeated for each operation when the user tries to access
sensitive data.
Though the password scheme is widely used, it has some
limitations. In this method, the security of the system relies
completely on the password. Thus, password itself needs to be
secured from unauthorized access. Unfortunately, however,
passwords can be easily guessed, accidentally exposed to an
intruder or passed illegally from an authorized user to an
unauthorized one. Moreover, they can be exposed to intruders
through visual or electronic monitoring. In visual monitoring, an
intruder looks over the shoulder of the user and watches the keyboard
while he/she types the password. This activity is referred to as
shoulder surfing. On the other hand, in electronic monitoring (or
network monitoring), one having direct access to the network (in
which the system runs) can see the data being transferred on the
network. Such data may consist of user IDs and passwords also. This
activity is referred to as sniffing.
One simple way to secure the password is to store it in an
encrypted form. The system employs a function (say, f(x)) to
encode (encrypt) all the passwords. Whenever a user attempts to log
into the system, the password entered by him/her is first encrypted
using the same function f(x) and then matched against the stored list
of encrypted passwords.
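
A minimal sketch of this scheme is given below, assuming the encoding function f(x) is a one-way hash (SHA-256 here); the username and passwords are hypothetical, and a production system would typically also add a per-user salt and a deliberately slow hash.

import hashlib

def f(password: str) -> str:
    """One-way encoding function f(x): easy to compute, hard to invert."""
    return hashlib.sha256(password.encode()).hexdigest()

# The stored list keeps only encoded passwords, never the plaintext ones.
stored = {"alice": f("correct horse battery")}

def login(username: str, attempt: str) -> bool:
    # Encode the attempt with the same f and compare against the stored value.
    return username in stored and stored[username] == f(attempt)

print(login("alice", "correct horse battery"))  # True  -> access granted
print(login("alice", "wrong guess"))            # False -> access denied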
The main advantage of encrypted passwords is that even if the
stored encrypted password is seen, the original password cannot be determined from it. Thus,
there is no need to keep the password file secret. However, care
should be taken to ensure that the password would never be
displayed on the screen in its decrypted form.

13.10.2 One-time Passwords


A simple approach to avoid the problems that occur due to password
exposure is to change the password in each session (that is, to use
one-time passwords) instead of using the same password all the
time. The system uses a set of paired passwords where one part of
password pair is provided by the system and another part is to be
supplied by the user at the beginning of each session. In other words,
we can say the system gives a challenge to the user and the user has
to provide the right answer to that challenge in order to access the
system.
To generalize this approach, an algorithm (such as an integer
function, say f) is used as a password. The function f accepts two
inputs: secret and seed and provides the password as output. The
input secret is shared between the user and the system and is never
transmitted over the transmission medium that permits exposure. The
input seed is a random integer or an alphanumeric sequence.
Whenever a session begins, the system selects seed and presents
it to the user as an authentication challenge. In response, the user
applies the function f(secret,seed) and transmits the result of the
function as password to the system. As the system also knows secret
and seed, it applies the same function. In case the result of
computation performed by the system matches the user’s result, the
user is allowed access. This procedure is repeated each time the
user needs to be authenticated and thus, the password generated
during every session is different.
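
The challenge–response exchange can be sketched as follows. Implementing f(secret, seed) with HMAC-SHA256 is only one possible choice, and the secret shown is hypothetical.

import hmac, hashlib, secrets

def f(secret: bytes, seed: bytes) -> str:
    """The shared one-time password function f(secret, seed)."""
    return hmac.new(secret, seed, hashlib.sha256).hexdigest()

shared_secret = b"never-sent-over-the-network"   # known to user and system only

# System side: pick a fresh seed and present it as the challenge.
seed = secrets.token_hex(8).encode()

# User side: compute the one-time password from the secret and the challenge.
one_time_password = f(shared_secret, seed)

# System side: recompute and compare; the password is useless in later sessions
# because a different seed will be chosen next time.
assert hmac.compare_digest(one_time_password, f(shared_secret, seed))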
The main advantage of using one-time passwords is that if any
intruder intercepts the password entered by a user in one session,
he/she will not be able to reuse it in the next session. Thus, any
improper authentication due to password exposure is prevented.

13.10.3 Smart Card


In this method, each user is provided with a smart card that is used
for identification. The smart card has a key stored on an embedded
chip, and the operating system of the smart card ensures that the key
can never be read. Instead, it allows data to be sent to the card for
encryption or decryption using that private key. The smart card is
programmed in such a way that it is extremely difficult to extract
values from it, thus, it is considered as a secure device.

13.10.4 Biometric Techniques


Biometric authentication technologies use the unique characteristics
(or attributes) of an individual to authenticate the person’s identity.
These include physiological attributes (such as fingerprints, hand
geometry, or retinal patterns) or behavioural attributes (such as voice
patterns and hand-written signatures). Biometric authentication
technologies based on these attributes have also been developed for
computer log in applications. Biometric authentication is technically
complex and expensive, and user acceptance can be difficult.
Biometric systems provide an increased level of security for
computer systems, but the technology is still new as compared to
memory tokens or smart tokens. Biometric authentication devices
also have imperfection, which may result from technical difficulties in
measuring and profiling physical attributes as well as from the
somewhat variable nature of physical attributes. These attributes may
change, depending on various conditions. For example, a person’s
speech pattern may change under stressful conditions or when
suffering from a sore throat or cold. Due to their relatively high cost,
biometric systems are typically used with other authentication means
in environments where high security is required.

Fig. 13.8 Biometric Techniques

13.11 TRUSTED SYSTEMS


A computer and its operating system which can be relied upon to a
determined level to implement a given security policy, is referred to as
the trusted system. In other words, the trusted system is defined as
the one the failure of which may cause a specified security policy to
be compromised. Trusted systems are of prime importance in areas
where the system’s resources or the information are required to be
protected on the basis of levels of security defined; that is, where
multilevel security is needed. For example, in military, the information
is classified into various levels, such as unclassified (U), confidential
(C), secret (S) and top secret (TS) and each user is allowed to
access only a certain level of information. In addition to military,
trusted systems are also being prominently used in banking and other
financial operations nowadays.
Central to the trusted systems is the reference monitor that is an
entity residing in the operating system of a computer and has the
responsibility of making all the access control related decisions on the
basis of the defined security levels. The reference monitor is
expected to be tamperproof, always invoked and subject to
independent testing.
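
As a rough illustration, a reference monitor's decision can be reduced to a comparison of security levels. The ordering below follows the military classification mentioned above; the rule that a user's clearance must be at least the object's classification is an assumption made for this sketch, since real multilevel policies are more elaborate.

# Security levels from the text: unclassified < confidential < secret < top secret.
LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def reference_monitor(user_clearance: str, object_classification: str) -> bool:
    """Allow access only if the user's clearance dominates the object's level."""
    return LEVELS[user_clearance] >= LEVELS[object_classification]

print(reference_monitor("S", "C"))    # True  -> a secret-cleared user may read confidential data
print(reference_monitor("C", "TS"))   # False -> access to top-secret data is denied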

13.12 FIREWALLING TO PROTECT SYSTEMS AND NETWORKS
A big problem arises when there is a need to connect a trusted
system to an untrusted network. This can be done with the assistance
of firewall that separates the trusted and untrusted systems. Firewall
is such a mechanism that protects and isolates the internal network
from the outside world. Simply put, a firewall prevents certain outside
connections from entering the network. It traps inbound or outbound
packets, analyzes them, and permits access or discards them.
Basically, a firewall is a router or a group of routers and computers
that filter the traffic and implement access control between an
untrusted network (like Internet) and the more trusted internal
networks.
The most common type of firewall is a network firewall. It divides
the network into separate security domains and controls the network
access between different security domains. The criteria used for
limiting the access may include direction of connection, source or
destination port, or source or destination address. A common
implementation of network firewall specifies the following three
security domains.
• The Internet, which is an untrusted domain.
• The demilitarized zone (DMZ) which is a semi-trusted and semi-
secure network.
• The computers of the organization.
The organization’s computers are allowed to connect to DMZ
computers as well as to the Internet; however, no connections are
allowed from the Internet and DMZ computers to the organization’s
computers. In addition, connections are allowed from the Internet to
DMZ computers.
Note: In some exceptional cases, a restricted access may be allowed
from DMZ to one or more computers of the organization.
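
The three-domain policy just described can be sketched as a simple rule table; the domain names and the way a connection is represented are assumptions made only for illustration.

# Security domains from the text.
INTERNET, DMZ, INTERNAL = "internet", "dmz", "internal"

# Allowed (source domain, destination domain) pairs for new connections.
ALLOWED = {
    (INTERNAL, DMZ),       # the organization's computers may reach DMZ servers
    (INTERNAL, INTERNET),  # and the Internet
    (INTERNET, DMZ),       # outside users may reach only the semi-trusted DMZ
}

def permit(source: str, destination: str) -> bool:
    """Network-firewall decision for a new connection between domains."""
    return (source, destination) in ALLOWED

print(permit(INTERNET, DMZ))       # True
print(permit(INTERNET, INTERNAL))  # False -- blocked by the firewall
print(permit(DMZ, INTERNAL))       # False -- blocked (barring the noted exceptions)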
Besides network firewalls, there are some other types that protect
and secure network connections. These firewalls are described
below.
• Personal firewall: This firewall is a layer of software that is either
added as an application or included within the operating system.
It is designed to control the communication to and from a specific
host.
• Application proxy firewall: This firewall is designed based on its
understanding of protocols that applications use across the
network. It acts as a proxy server that handles the flow of
application-level traffic. For example, an application proxy firewall
can accept requests for FTP connection on behalf of actual FTP
server and can initiate a connection from the requesting host to
the desired FTP server. It can analyze the traffic while forwarding
messages, block any illegal or disallowed commands, and so on.
Some application proxy firewalls are designed only for specific
protocols, such as XML firewall which monitors only the XML
traffic and blocks any malformed or unwanted XML.
• System-call firewall: This firewall is designed to monitor the
execution of system calls and is placed between the application
and the kernel.
Notice that it is important to ensure the security of the firewall itself
so that it does not get compromised. A firewall has some
vulnerabilities that a user must know about before connecting his/her
system to the network.
• Like any other machine, a firewall is also vulnerable to DoS
attacks. These attacks can affect the firewall to such an extent
that it may fail to provide its intended service.
• Spoofing is another attack in this regard. In this attack, an
unauthorized host pretends to be an authorized host by forging that
host's identity. For example, if a firewall allows a connection from a
host (say, A) outside the firewall and uses the IP address of the
host as the authentication criterion, then it is quite possible that
some other host (say, B) may use the IP address of A and send
packets to pass through the firewall.
• As firewalls may allow only certain protocols or connections to
pass through them, attacks that masquerade as permitted protocols or
connections cannot be prevented. For example, a firewall that
allows HTTP connections cannot protect the web server behind it
from buffer-overflow attacks, because it is the contents of the
HTTP connection that carry such an attack.

LET US SUMMARIZE
1. Security deals with threats to information caused by outsiders (non-users),
whereas protection deals with the threats caused by other users (who are
not authorized to do what they are doing) of the system.
2. One of the most important goals of protection is to ensure that each
resource is accessed correctly and only by those processes that are
allowed to do so.
3. Implementing protection requires policies and mechanisms. The policy of
the organization decides which data should be protected from whom and
the mechanism specifies how this policy is to be enforced.
4. Some commonly used protection mechanisms include protection domain,
access control list, and access control matrix.
5. A computer system consists of a set of objects that may be accessed by
the processes. An object can be either a hardware object (such as CPU,
memory segment and printer) or software object (such as file, database,
program and semaphore).
6. A domain is a collection of access rights where each access right is a pair
of <object-name,rights_set>. The object_name is the name of the object
and rights_set is the set of operations that a process is permitted to
perform on the object_name.
7. The association between a process and domain may be either static or
dynamic. In the former case, the process is allowed to access only a fixed
set of objects and rights during its lifetime, while in the latter case, the
process may switch from one domain to another during its execution
(termed as domain switching).
8. Access control matrix (or access matrix) is a mechanism that records the
access rights of processes over objects in a computer system. Like
access control list, it is also employed in file systems and is consulted by
the operating system each time an access request is issued.
9. Allowing controlled change in the contents of the access matrix requires
three additional operations, namely, copy, owner, and control. The copy
and owner rights allow a process to modify only the column entries in the
access matrix; however, the control right allows a process to modify the
row entries.
10. To effectively implement an access control matrix, some techniques are
used. These techniques include global table, access lists for objects,
capability lists for domains, and a lock-key scheme.
11. In a dynamic protection system, it might be required to revoke the access rights
to the objects shared by several users. Revocation of access rights is
much easier in case of access list as compared to capability list.
12. To implement revocation for capabilities, techniques including re-
acquisition, back-pointers, indirection, and keys are used.
13. Undoubtedly, the protection mechanisms provided by the operating system
enable users to protect their programs and data. But a system is
considered secure only if the users make the intended use of and access
to the computer’s resources, in every situation.
14. Intruders (sometimes also called adversaries) are the attackers who
attempt to breach the security of a network. They attack on the privacy of
a network(s) in order to get unauthorized access. Intruders are of three
types, namely, masquerader, misfeasor and clandestine user.
15. Types of security violations may be many, which can be broadly classified
into two categories: accidental and intentional (malicious). Accidental
security violations are easier to protect against than intentional ones.
16. The intruders adopt certain standard methods while making attempts to
breach the security of the system. Some of these methods are
masquerading, replay attack, message modification, man-in-the-middle
attack and session hijacking.
17. There are four levels at which security measures should be applied in
order to protect the system. These include physical, human, operating-
system and network levels.
18. Designing a secure operating system is a crucial task. The major concern
of designers is on the internal security mechanisms that lay the foundation
for implementing security policies.
19. Researchers have identified certain principles that can be followed to
design a secure system. These principles include least privilege, fail-safe
default, complete mediation, user acceptability, economy of mechanism,
least common mechanism, open design and separation of privileges.
20. Security threats continue to evolve around us by finding new ways. Some
of them are caused by humans, some are by nature such as floods,
earthquakes and fire, and some are by the use of Internet such as virus,
Trojan horse, spyware, and so on.
21. Different security threats are classified into two broad categories: program
threats and system and network threats.
22. In simple terms, cryptography is the process of altering messages in a way
that their meaning is hidden from the adversaries who might intercept
them.
23. One of the most important aspects of the parts of cryptography is
encryption which is a means of protecting confidentiality of data in an
insecure environment, such as while transmitting data over an insecure
communication link. There are two categories of encryption algorithms,
namely, symmetric and asymmetric.
24. A process that lets users present their identity to the system to
confirm their correctness is termed authentication. User authentication
can be based on—user knowledge (such as a username and password),
user possession (such as a card or key) and/ or user attribute (such as
fingerprint, retina pattern or iris design).
25. A computer and operating system which can be relied upon to a
determined level to implement a given security policy, is referred to as the
trusted system. In other words, a trusted system is defined as the one the
failure of which may compromise a specified security policy.
26. Firewall is such a mechanism that protects and isolates the internal
network from the outside world. Simply put, a firewall prevents certain
outside connections from entering the network.

EXERCISES
Fill in the Blanks
1. _____________ deals with the threats caused by those users of the
system who are not authorized to do what they are doing.
2. _____________ decides which data should be protected from whom.
3. The person who tries to breach the security and harm a system is referred
to as _____________.
4. The association between a process and domain may be either
_____________ or _____________.
5. _____________ use the unique characteristics (or attributes) of an
individual to authenticate a person’s identity.

Multiple Choice Questions


1. Which of the following access rights allow a process to modify row entries
in the access matrix?
(a) Copy
(b) Owner
(c) Both (a) and (b)
(d) Control
2. Which of the following techniques is not used to implement the access
matrix?
(a) Global table
(b) Access list for domains
(c) Capability lists for domains
(d) Lock-key scheme
3. _____________ are the small programs that install themselves on
computers to gather data secretly about the computer user without his/her
consent and report the collected data to interested users or parties.
(a) Virus
(b) Spyware
(c) Trojan horse
(d) None of these
4. Which of the following security design principles states that each access
request for every object should be checked by a mechanism in order to
verify the legality of access?
(a) Economy of mechanism
(b) Open design
(c) Complete mediation
(d) Least privilege
5. Which of the following viruses changes its code as it propagates from one
file to another?
(a) Polymorphic virus
(b) Stealth virus
(c) File-infecting virus
(d) Multipartite virus

State True or False


1. Protection deals with the threats to information caused by outsiders.
2. Protection domain specifies the objects that a process may access.
3. The owner right allows a process executing in one domain to modify other
domains.
4. Economy of mechanism security principle states that the protection
mechanism should be kept simple as it helps in verification and correct
implementations.
5. Application proxy firewall is a layer of software that is either added as an
application or included within the operating system.

Descriptive Questions
1. What is the difference between security and protection?
2. Differentiate between protection policy and protection mechanism.
3. Define the following terms.
(a) Intruder
(b) Phishing
(c) Authentication
4. Discuss the goals and principles of protection.
5. Which factors can affect the security of a computer system and harm it?
6. What are the levels at which security measures should be applied to
protect the system?
7. Discuss various means of authenticating a user.
8. What is the advantage of storing passwords in encrypted form in computer
systems?
9. Describe protection mechanism illustrating use of protection domain and
access control matrix.
10. Describe various techniques used to implement access control matrix.
11. Describe the use of one-time passwords.
12. What is the importance of design principles for security? Explain some of
these principles.
13. Define encryption. Point out the differences between symmetric and
asymmetric encryption.
14. Write short notes on the following.
(a) Firewalls
(b) Trusted systems
(c) Types of security violations
(d) Methods used in security attacks
chapter 14

Multiprocessor and Distributed Operating Systems

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ Understand the term multiprocessor systems.
⟡ Describe the various interconnection networks used in
multiprocessor systems.
⟡ Describe the architecture of multiprocessor systems.
⟡ Discuss different types of multiprocessor systems.
⟡ Describe the term distributed systems.
⟡ Understand the capabilities of a distributed operating system.
⟡ Describe the various techniques of data distribution over a
distributed system.
⟡ Discuss the various aspects of computer networks that form the
basis for distributed systems.
⟡ Understand the concept of distributed file system.

14.1 INTRODUCTION
Today’s computer applications demand high performance machines
that can process large amounts of data in sophisticated ways. The
applications such as geographical information systems, real-time
decision making, and computer-aided design demand the capability
to manage several hundred gigabytes to terabytes of data. Moreover,
with the evolution of Internet, the number of online users as well as
size of data has also increased. This demand has been the driving
force of the emergence of technologies like parallel processing and
data distribution. The systems based on parallelism and data
distribution are called multiprocessor systems and distributed
systems, respectively. This chapter discusses these systems in brief.

14.2 MULTIPROCESSOR SYSTEMS


As the name suggests, the multiprocessor systems (also known as
parallel systems or tightly coupled systems) consist of multiple
processors in close communication in a sense that they share the
computer bus, system clock, and sometimes even memory and
peripheral devices. There are several advantages of a multiprocessor
system over a uniprocessor system. The main advantage is that it
supports parallel processing by executing several independent
processes simultaneously on different CPUs, thus, achieving an
increased system throughput by getting more work done in less
time. It also provides computation speed-up within an application by
allowing parallel execution of several child processes or threads of
the application.
Another benefit of multiprocessor systems is that they are more
cost-effective than multiple single-processor systems. They are also
more reliable. If one out of N processors fails, then the remaining N-1
processors share the work of the failed processor amongst them,
thereby preventing the failure of the entire system. This feature in
which a system can continue to operate in the event of failure, though
with somewhat reduced capabilities, is called graceful degradation.

14.2.1 Interconnection Networks


Multiprocessor systems consist of several components such as
CPUs, one or more memory units, disks and I/O devices. All of these
components communicate with each other via an interconnection
network. The three commonly used interconnection networks are bus,
crossbar switch and multistage switch. Note that for simplicity we will
discuss the interconnections of processors and memory units only.
We will not include I/O devices and disks in our discussion.

Bus
It is the simplest interconnection network in which all the processors
and one or more memory units are connected to a common bus. The
processors can send data on and receive data from this single
communication bus. However, only one processor can communicate
with the memory at a time. This organization is simple and
economical to implement, however, suitable only for low traffic
densities. At medium or high traffic densities, it becomes slow due to
bus contention. A simple bus organization containing four processors
P0, P1, P2 and P3 and a shared memory M is shown in Figure 14.1 (a).

Crossbar Switch
A crossbar switch uses an N × N matrix organization, wherein N
processors are arranged along one dimension and N memory units
are arranged along the other dimension. Every CPU and a memory
unit are connected via an independent bus. The intersection of each
horizontal and vertical bus is known as a crosspoint. Each
crosspoint is basically an electric switch that can be opened or closed
depending on whether or not the communication is required between
the processor and the memory. If a processor Pi wants to access the
data stored in memory unit Mj, the switch between them is closed,
which connects the bus of Pi to the bus of Mj.
The crossbar switch eliminates the problem of bus contention as N
processors are allowed to communicate with N different memory units
at the same time. However, the contention problem may occur when
more than one processor attempts to access the same memory unit
at the same time. Figure 14.1 (b) shows a crossbar switch
interconnection containing four processors P0, P1, P2 and P3 and four
memory units M0, M1, M2 and M3. It is clear from the figure that to
completely connect N processors to N memory units, N^2 crosspoints
are required. For 1000 CPUs and 1000 memory units, a million
crosspoints are required, which makes the crossbar interconnections
more complex and expensive.

Multistage Switch
A multistage switch lies in between a bus and a crossbar switch in
terms of cost and parallelism. It consists of several stages, each
containing 2 × 2 crossbar switches. A 2 × 2 crossbar switch consists
of two inputs and two outputs. These switches can be connected in
several ways to build a large multistage interconnection network
(MIN). In general, for N processors and N memory units, m = log₂N
stages are required, where each stage has N/2 crossbar switches,
resulting in a total of (N/2)log₂N switches.
Figure 14.1 (c) shows a multistage network having eight
processors and eight memory units. It consists of three stages, with
four switches per stage, resulting in a total of 12 switches. The jth
switch in ith stage is denoted by Sij. This type of interconnection
network is termed as 8 × 8 omega network.
Whenever a CPU attempts to access data from a memory unit,
the path to be followed between them is selected using the address
bits of the memory unit to be accessed. Initially, the leftmost bit of the
memory address is used for routing data from the switch at the first
stage to the switch at the second stage. If the address bit is 0, the
upper output of the switch is selected, and if it is 1, the lower output is
selected. At the second stage, the second bit of the address is used
for routing, and so on. The process continues until a switch in the last
stage is encountered, which selects one of the two memory units.
For example, suppose the CPU P2 wants to communicate with the
memory unit M7. The binary address of this memory unit is 111.
Initially, P2 passes the data to switch S12. Since the leftmost bit of the
memory address is 1, S12 routes the data to second stage switch S22
via its lower output. The switch at second stage checks the second bit
of the memory address. Since it is again 1, S22 passes the data to the
third stage switch, which is S34, via its lower output. This switch
checks the third bit, which is again 1. Now, S34 routes the data to the
desired memory unit via its lower output. Consequently, the data
follows the highlighted path as shown in Figure 14.1 (c).
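
The routing rule just described can be sketched in a few lines: at each stage, the next bit of the destination memory address selects the upper (0) or lower (1) output. The function name and the closing cost comparison are illustrative additions; the route computed for M7 matches the example above.

from math import log2

def omega_route(memory_unit: int, n: int = 8):
    """Return the output ('upper'/'lower') chosen at each stage when routing
    to the given memory unit in an n x n omega network."""
    stages = int(log2(n))                      # number of switch stages
    bits = format(memory_unit, f"0{stages}b")  # memory address in binary
    # Address bit 0 selects the upper output of a switch, 1 the lower output.
    return ["lower" if b == "1" else "upper" for b in bits]

# Routing to memory unit M7 (binary 111), as in the text:
print(omega_route(7))   # ['lower', 'lower', 'lower']

# Cost comparison: (N/2)log2(N) switches versus N^2 crosspoints.
N = 1024
print((N // 2) * int(log2(N)), N ** 2)   # 5120 switches vs 1,048,576 crosspoints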
The main advantage of multistage switches is that the cost of an
N×N multistage network is much lower than that of an N×N crossbar
switch, because the former network requires a total of (N/2)log₂N
switches, which is much lower than N^2 crosspoints. However, unlike a
crossbar switch, it is a blocking network because it cannot process
every set of requests simultaneously. For example, suppose another
CPU, say P3 wants to communicate with the memory unit M6 at the
same time when P2 is communicating with M7. The binary address of
M6 memory unit is 110. Initially, P3 passes the data to switch S12. Since
the leftmost bit of the memory address is 1, S12 routes the data to
second stage switch S22 via its lower output. Since the lower output
line is already busy, this request cannot be processed until P2
releases the communicating line.
Fig. 14.1 Types of Interconnection Networks
14.2.2 Architecture of Multiprocessor Systems
Each CPU in a multiprocessor system is allowed to access its local
memory as well as the memories of other CPUs (non-local
memories). Depending on the speed with which the CPUs can
access the non-local memories, the architecture of multiprocessor
systems can be categorized into two types, namely, uniform memory
access (UMA) and non-uniform memory access (NUMA).

Uniform Memory Access (UMA)


In the UMA architecture, all the processors share the physical
memory uniformly, that is, the time taken to access a memory location
is independent of its position relative to the processor. The main
advantage of having a shared memory is that the CPUs can
communicate with each other by using memory writes at a much
faster rate than any other communication mechanism.
The processors and the shared memory can be connected via a
bus or through any other interconnection network. If bus is used as
an interconnection network, the system supports not more than 16 or
32 CPUs, as the bus becomes a bottleneck when the number of
CPUs increases. Adding more processors to the system will not affect
the performance of the system, because the processors will spend
most of their time in waiting for memory access. On the other hand, if
the crossbar switch is used, the system is scalable at low densities of
traffic. However, as the number of CPUs increases, more switches
will be required, which the makes the network more complex and
expensive.
Several improvements have been made in this architecture to
make it better. One such improvement is to provide a cache to each
CPU, as shown in Figure 14.2 (a). Whenever a CPU refers a word
from the shared memory, the entire block of size 32 bytes or 64 bytes
(depending on the system architecture) is fetched and put into the
local cache of the CPU. This reduces the network traffic as now the
CPUs can satisfy many read requests from their local caches.
Each cache block can be marked as read-only or as read-write. In
case of read-only, the cache block can be present in multiple caches
at the same time. However, in case of read-write, the cache block
may not be present in other caches. If a processor attempts to
perform a write operation on some data that is present in other
caches, the bus hardware generates a signal and puts it on the bus to
inform all other caches about the write operation.

Fig. 14.2 UMA Architecture

Note: The Balance system by Sequent and the VAX 8800 by Digital are examples of the UMA
architecture.
If other caches contain the “clean” copy (same as that of the
memory) of the data to be modified, they simply discard that copy and
allow the processor to fetch the cache block from the memory and
modify it. On the other hand, if any of the caches has the “dirty” or
modified copy (different from that of the memory) of the data, it either
directly transfers it to the processor (that wants to perform write
operation), or it writes it back to the memory before performing the
write operation. In this way, cache coherency can be achieved. The
overhead of maintaining cache coherency increases with increase in
the number of processors. Thus, even with this improvement, UMA
cannot support more than 64 CPUs at a time.
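
The write-invalidate behaviour described above can be sketched as a highly simplified snooping protocol; the dictionary-based cache model below is an assumption for illustration and omits most details of a real hardware coherence protocol.

# Each per-CPU cache maps a block number to a (state, data) pair,
# where state is "clean" or "dirty".
caches = [dict() for _ in range(4)]   # four CPUs, as in Figure 14.2 (a)
memory = {}

def read_block(cpu: int, block: int):
    """A CPU reads a block; a clean copy may live in several caches at once."""
    if block not in caches[cpu]:
        caches[cpu][block] = ("clean", memory.get(block))
    return caches[cpu][block][1]

def write_block(cpu: int, block: int, value):
    """A CPU writes a block; the bus signal lets the other caches react."""
    for other, cache in enumerate(caches):
        if other != cpu and block in cache:
            state, data = cache.pop(block)     # snooping cache discards its copy
            if state == "dirty":
                memory[block] = data           # a modified copy is written back first
    caches[cpu][block] = ("dirty", value)      # the writer now holds the only (dirty) copy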
Another possible design is to let each processor have local private
memories in addition to caches [see Figure 14.2 (b)]. This further
reduces the network traffic as the compiler places all the read-only
data such as program code, constants, and strings, as well as other data
such as stacks and local variables, in the private memories of the
processors. The shared memory is used only for writeable shared
variables. However, this design requires active participation of the
compiler.

Non-uniform Memory Access (NUMA)


The UMA architecture discussed earlier generally supports not more
than 16 or 32 processors if a bus is used as interconnection network.
In case of crossbar or multistage switched multiprocessors,
expensive hardware is required, and they also do not support large
number of CPUs. To support hundreds of CPUs, another architecture
named NUMA is used. In this architecture, the system consists of a
number of nodes, where each node consists of a set of CPUs, a
memory unit and an I/O subsystem connected by a local
interconnection network.
The memory unit associated with each node is said to be local to
the processors in that node. On the other hand, the memory units of
other nodes are said to be non-local. Each node in the system also
consists of a global port on the local interconnection network, which is
connected to a high speed global interconnection network. It can
provide a data transfer rate of more than 1 GB per second. The global ports of all
nodes are used to transfer data between CPUs and non-local
memory units. The NUMA architecture is shown in Figure 14.3.
Fig. 14.3 NUMA Architecture

Unlike UMA, the CPUs in NUMA architecture access the local and
non-local memories with different speed; each CPU can access its
local memory faster than the non-local memories. The remote or non-
local memory can be accessed via LOAD and STORE instructions. This
architecture supports the concept of distributed virtual-memory
architecture, where logically there is a single shared memory, but
physically there are multiple disjoint memory systems.
Like UMA, cache can also be provided to each CPU in each node.
The global ports of each node can also be associated with a cache
for holding data and instructions accessed by the CPUs in that node
from non-local memories. Thus, it is necessary to ensure coherence
between local as well as non-local caches. The system with coherent
caches is known as Cache-Coherent NUMA (CC-NUMA).
Note: The HP AlphaServer and the IBM NUMA-Q are examples
of the NUMA architecture.

14.2.3 Types of Multiprocessor Operating System


In the previous section, we discussed multiprocessor systems in
terms of hardware. In this section, we will focus on multiprocessor
systems in terms of software, especially the operating system used in
multiprocessor systems. There are basically three types of
multiprocessor operating systems, namely, separate supervisors,
master-slave, and symmetric.

Separate Supervisors
In separate supervisor systems, the memory is divided into as many
partitions as there are CPUs, and each partition contains a copy of
the operating system. Thus, each CPU is assigned its own private
memory and its own private copy of the operating system. Since
copying the entire operating system into each partition is not feasible,
a better option is to create copies of only the data and allow the CPUs
to share the operating system code (see Figure 14.4). Consequently,
n CPUs can operate as n independent computers, still sharing a set
of disks and other I/O devices. This makes it better than having n
independent computers.
The main advantage of this scheme is that it allows the memory to
be shared flexibly. That is, if one CPU needs a larger portion of the
memory to run a large program, the operating system can allocate
the required extra memory space to it until the execution of the
program. Once the execution is over, the additional memory is de-
allocated. Another benefit of this scheme is that it allows efficient
communication among executing processes through shared memory.
The main drawback of this approach is that it does not allow
sharing of processes. That is, all the processes of a user who has
logged into CPU1 will execute on CPU1 only. They cannot be
assigned to any other CPU, which sometimes results in an
imbalanced load distribution; for example, another processor, say
CPU2, may be idle while CPU1 is heavily loaded with work. Another
problem arises when caching of disk blocks is allowed. If each
operating system maintains its own cache of disk blocks, then
multiple “dirty” copies of a certain disk block may be present at the
same time in multiple caches, which leads to inconsistent results.
Avoiding caches will definitely eliminate this problem, but it will affect
the system performance considerably.
Master-slave Multiprocessors
In master-slave (or asymmetric) multiprocessing systems, one
processor is different from the other processors in a way that it is
dedicated to execute the operating system and hence, known as
master processor. Other processors, known as slave processors,
are identical. They either wait for instructions from the master
processor to perform any task or have predefined tasks.

Fig. 14.4 Separate Supervisor System

The main advantage of this approach is that only the master
processor maintains a queue of ready processes, and assigns the
processes to the CPUs in such a way that the workload is evenly
distributed among them. The other benefit is that only one buffer
cache is maintained in the master processor, which never produces
inconsistent results.
The main disadvantage of asymmetric systems is that this
approach is workable for only a few slave processors. As the number
of slave processors increases, the master processor becomes
completely overloaded. Moreover, the failure of the master processor
brings the entire system to a halt. Figure 14.5 shows the master-slave
system.

Fig. 14.5 Master-slave System

Symmetric Multiprocessors
In symmetric multiprocessing systems, all the processors perform
identical functions. A single copy of the operating system is kept in
the memory and is shared among all the processors as shown in
Figure 14.6. That is, any processor can execute the operating system
code. Thus, the failure of one CPU does not affect the functioning of
other CPUs.
Though this approach eliminates all the problems associated with
separate supervisors and master-slave systems, it has its own
problems. This approach can result in disaster when two or more
processors execute the OS code at the same time. Imagine a
situation where two or more processors attempt to pick the same
process to execute or claim the same free memory page. This
problem can be resolved by treating the operating system as one big
critical region and associating a mutex variable with it. Now,
whenever a CPU needs to run operating system code, it must first
acquire the mutex, and if the mutex is locked, the CPU should wait
until the mutex becomes free. This approach allows all the CPUs to
execute the operating system code, but in mutually exclusive manner.

Fig. 14.6 Symmetric Multiprocessor System

This approach is also workable only for a few processors. As the
number of processors increases, there will be a long queue of CPUs
waiting to execute the operating system. This problem can be
resolved by allowing the processors to run independent routines of
operating system in parallel. For example, one CPU can execute the
scheduler, another can handle a file system call, and a third can
process a page fault, all at the same time. To implement this, each
independent routine of the operating system can be treated as a
critical region, each associated with its own mutex variable. This
helps in achieving more parallelism.
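
The two locking strategies can be sketched with ordinary thread mutexes standing in for the kernel's mutex variables; the routine names and lock granularity shown are illustrative assumptions.

import threading

# Strategy 1: one big lock -- the whole operating system is a single critical region.
big_kernel_lock = threading.Lock()

def run_os_code(routine):
    with big_kernel_lock:          # only one CPU executes OS code at a time
        routine()

# Strategy 2: one lock per independent OS routine allows more parallelism.
scheduler_lock = threading.Lock()
filesystem_lock = threading.Lock()
paging_lock = threading.Lock()

def handle_file_system_call():
    with filesystem_lock:          # another CPU may run the scheduler concurrently
        pass                       # ... file-system work would go here ...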

14.3 DISTRIBUTED SYSTEMS


A distributed system consists of a set of loosely coupled processors
that do not share memory or system clock, and are connected by a
communication medium. The processors in a distributed system are
referred to by different names such as nodes, computers, sites,
machines or hosts depending on the context in which they are being
referred. For example, the term site is generally used to indicate the
location of a machine, while host or node is used to refer to a computer
system at a particular site.
The main advantages of distributed systems are that they allow
resource sharing, enhance availability and reliability of a resource,
provide computation speed-up and better system performance, and
allow incremental growth of the system. Resource sharing is one of
the major advantages of distributed systems. The users at one site
are allowed to share files and other resources such as laser printers
located at any other site on the network.
A distributed system also improves the availability and reliability
of a resource by keeping multiple copies of a particular resource at
different nodes of the system. For example, a user can keep two or
more copies of a particular file at different nodes so that in case of a
failure of one node, the file can be accessed from another node.
However, it is necessary to maintain the consistency of the various
copies of a file residing at different sites in a distributed system.
Computation speed-up within an application can be achieved by
dividing the application into several child processes in a way that they
can be executed independent of each other, and then distributing
these child processes among the various sites of the system for
parallel execution.
Incremental growth can be achieved by adding new hardware
components without replacing or upgrading the existing ones. This
way, the capabilities of a system can be enhanced at a cost
proportional to the nature and need of the enhancement.
It is important to note that since there is no shared memory in a
distributed system, the internode communication is carried out with
the help of message passing, where nodes send and receive
messages via communication lines. Each user is assigned a unique ID for authentication in the distributed system. The users can use their user IDs to log in to remote systems, run programs there, transfer files from other systems, exchange mail to coordinate their work, and so on. A distributed system provides continued availability of communication even when users move between different sites of the system.

14.3.1 Distributed Operating System


Various capabilities of distributed systems such as resource sharing,
reliability, computation speed-up and incremental growth can be
realized using some hardware and software components. The
hardware components include computer systems, cables, links,
routers and gateways. The software components include the operating system components that handle the creation and scheduling of child processes which need to be distributed among various sites. They also include those OS components that ensure efficient usage of remote resources and reliable communication among various sites.
A network operating system is the earliest form of operating
system used for distributed systems. Network OS is mainly
responsible for providing resource sharing among various systems
connected to the network. Each node has a network operating
system layer that exists between the user processes and the kernel
of the local OS. This layer provides the interface to the user
processes, that is, the user processes interact with the network
operating system layer instead of the kernel of the local OS. If a
process wants to access a local resource, the network OS layer
simply passes the request to the kernel of the local OS, which then
handles this request. On the other hand, if the process wants to
access a remote resource, the network OS layer establishes a
contact with the network OS layer of the node that contains the
desired resource for fulfilling the access request. ARPANET is an example of a network operating system.
Though a network operating system is simple to implement, it has some drawbacks too. The main drawback is that the users are aware of how the resources are actually distributed on the network; thus, it does not support transparent access to distributed resources. The local operating systems have their own identities and behave as independent operating systems. Their functioning is not integrated and their identities are visible to the users. Moreover, the resources are not under the control of the network operating system; therefore, it cannot ensure optimized or balanced utilization of resources. It may thus happen that some resources in a node are overloaded with access requests at some time while other resources of the same kind in other nodes are free. Another problem with a network operating system is that the user process has to explicitly specify the ID of the resource that needs to be accessed, and if that resource fails, the process has to be aborted. Thus, a network OS does not provide fault tolerance. All these issues paved the way for the evolution of the distributed operating system.
A distributed operating system provides an abstract view of the
system by hiding the physical resource distribution from the users. It
provides a uniform interface for resource access regardless of its
location. That is, the users can access the remote resources in the
same way as they access the local resource. The user need not know
the identities and locations of the resources in order to use them. In
addition, resources are under the control of the distributed operating
system; therefore, processes can be migrated from one site to
another to ensure optimized or balanced utilization of resources.
Amoeba and Mach are examples of distributed operating systems.

14.3.2 Storing and Accessing Data in Distributed Systems

In a distributed system, the data is stored across several sites. There
are two ways of distributing data in a distributed system, namely,
partitioning and replication. In partitioning (also known as
fragmentation), the data is divided into several partitions (or
fragments), and each partition can be stored at a different site. Data partitioning is done for several reasons. One reason could be that the data is voluminous. Other reasons could be that some parts of the data are more often accessed at different sites, or originate at different sites. On the other hand, in replication, several identical
copies or replicas of the data are maintained and each replica is
stored at different sites. Generally, replicas are stored at the sites
where they are in high demand. Data replication provides data
reliability and availability, and efficient access. Note that both of these
techniques can also be combined in a way that data can be divided
into several partitions and there may be several replicas of each
partition.
In case data is neither replicated nor partitioned, it is the
responsibility of the operating system to distribute or position the data
in such a way that the total network traffic generated by accessing the
data by various applications is minimal. There are three ways of
accessing the data in a distributed system, namely, data migration,
computation migration and process migration.
• Data migration: In this approach, the required data (such as a
file) is moved to the site where the computation on this data is to
be performed. There are two ways of migrating data—one way is
to transfer the entire file to the site, while another way is to
transfer only those portions of the file that are actually necessary
for the immediate task, and the remaining portions are transferred
as and when required. If only a small portion of a large file is required, the latter approach is preferable; however, if a significant portion of the file is required, the former approach is better (a small decision sketch is given after this list). Note that in both cases, when access to the file is no longer required, the entire file (if it has been modified) or any part of it that has been modified must be sent back to the site where the file is actually stored.
• Computation migration: In this approach, the computation is
moved to the site where the data is stored. This approach is
useful when a user program wants to access several large files
stored at different sites. Transferring these files to the site where
computation is to be performed would increase the network traffic.
Therefore, it would be more efficient to move the computation to
those sites where the files are stored, and return the computation
result to the site where the computation was initiated.
• Process migration: In this approach, the entire process or parts
of it are moved to different sites for execution. It is a logical
extension of computation migration. Process migration is
performed to evenly distribute the workload and for faster
computation, or to reduce network traffic that may result in
accessing a remote resource. Another possible reason for process migration could be the requirement of special hardware or software; in that case, the process can be executed only on the site where the desired hardware and software are available.
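As a rough illustration of the data-migration trade-off described above, the following sketch decides between transferring the whole file and transferring only the needed portion, based on how much of the file the immediate task requires. The function names and the 25 per cent threshold are hypothetical choices made only for illustration, not part of the text.

#include <stdio.h>
#include <stddef.h>

/* Stub transfer routines (hypothetical, for illustration only). */
static void transfer_whole_file(const char *path)
{
    printf("transferring entire file %s\n", path);
}

static void transfer_byte_range(const char *path, size_t offset, size_t length)
{
    printf("transferring %zu bytes of %s starting at offset %zu\n",
           length, path, offset);
}

/* Decide how to migrate data for a task needing 'needed_len' bytes of a
 * file of 'file_size' bytes. The 25% threshold is an arbitrary choice. */
static void migrate_data(const char *path, size_t file_size,
                         size_t needed_offset, size_t needed_len)
{
    if (needed_len * 4 < file_size)
        transfer_byte_range(path, needed_offset, needed_len); /* small portion */
    else
        transfer_whole_file(path);                            /* large portion */
}

int main(void)
{
    migrate_data("/shared/report.dat", 1000000, 0, 4096);    /* fetch only a part */
    migrate_data("/shared/report.dat", 1000000, 0, 800000);  /* fetch the whole file */
    return 0;
}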

14.4 COMPUTER NETWORKS


Computer networks form the basis for distributed systems. A computer network includes both networking hardware and software. The networking hardware deals basically with networking technology and the design of computer networks, while the software deals with implementing communication between a pair of processes. The reliability and throughput of a distributed system are determined by the underlying hardware and software. In this section we will discuss some of the
hardware and software aspects of networking.

14.4.1 Types of Networks


A computer network can be as small as several personal computers
on a small network or as large as the Internet. Depending on the
geographical area they span, computer networks can be classified
into two main categories, namely, local area networks and wide area
networks.

Local Area Networks


A local area network (LAN) is a network restricted to a small area such as an office, a factory or a building. It is a privately owned
network that is confined to an area of a few kilometers. In a LAN, the
computers connected have a network operating system installed in
them. One computer is designated as the file server which stores all
the software that controls the network and the software that can be
shared by the computers attached to the network. The other
computers connected to the file server are called workstations. The
workstations can be less powerful than the file server and may have
additional software on their hard drives. On most LANs, cables are
used to connect the computers. Generally, a LAN offers a bandwidth
of 10 to 100 Mbps. LANs are distinguished from other networks by three main characteristics: their size, topology and transmission technology.

Fig. 14.7 Local Area Network

Wide Area Network (WAN)


A wide area network (WAN) spreads over a large geographical area
like a country or a continent. It is much bigger than a LAN and
interconnects various LANs. This interconnection helps in faster and
more efficient exchange of information at a higher speed and low
cost. These networks use telephone lines, satellite transmission and
other long-range communication technologies to connect the various
networks. For example, a company with offices in New Delhi,
Chennai and Mumbai may connect their individual LANs together
through a WAN. The largest WAN in existence is the Internet.
Fig. 14.8 Wide Area Network

14.4.2 Network Topology


A network topology refers to the way a network is laid out either
physically or logically. The selection of a particular topology is important and depends upon a number of factors such as cost, reliability and flexibility. The various network topologies include bus, ring, star,
tree, mesh, and graph.

Bus/Linear Topology
The bus topology uses a common single cable to connect all the
workstations. Each computer performs its task of sending messages
without the help of the central server. Whenever a message is to be
transmitted on the network, it is passed back and forth along the
cable from one end of the network to the other. However, only one
workstation can transmit a message at a particular time in the bus
topology.
As the message passes through each workstation, the
workstations check the message’s destination address. If the
destination address does not match the workstation’s address, the
bus carries the message to the next station until the message
reaches its desired workstation. Note that the bus has terminators at both ends. The terminator absorbs the message that
reaches the end of the medium. This type of topology is popular
because many computers can be connected to a single central cable.

Fig. 14.9 Bus Topology

Advantages
• It is easy to connect and install.
• The cost of installation is low.
• It can be easily extended.

Disadvantages
• The entire network shuts down if there is a failure in the central
cable.
• Only a single message can travel at a particular time.
• It is difficult to troubleshoot an error.

Ring/Circular Topology
In ring topology, the computers are connected in the form of a ring
without any terminating ends. Every workstation in the ring topology
has exactly two neighbours. The data is accepted from one
workstation and is transmitted to the destination through a ring in the
same direction (clockwise or counter-clockwise) until it reaches its
destination.

Fig. 14.10 Ring Topology

Each node in a ring topology acts as a repeater; that is, each workstation re-transmits the data or message received from its neighbouring workstation. As a result, no signal is lost and separate repeaters are not required. In addition, since the ring topology does not have a terminator that absorbs the message, the source computer needs to remove the message from the network.

Advantages
• It is easy to install.
• The cable length required for installation is not much.
• Every computer is given equal access to the ring.

Disadvantages
• The maximum ring length and the number of nodes are limited.
• A failure in any cable or node breaks the loop and can bring down the entire network.

Star Topology
In star topology, the devices are not directly linked to each other but
are connected through a centralized network component known as
a hub or concentrator. Computers connected to the hub by cable segments send their traffic to the hub, which resends the message either to all the computers or only to the destination computer. The hub acts as a central controller; if a node wants to send data to another node, the hub boosts the message and sends it to the intended node. This topology commonly uses twisted pair cable; however, coaxial cable or optical fibre can also be used.

Fig. 14.11 Star Topology

It is easy to modify a star network and add new computers to it without disturbing the rest of the network. A new line can simply be run from the computer to the central location and plugged into the hub. However, the number of systems that can be added depends upon the capacity of the hub.
Advantages
• Troubleshooting is easy.
• A single node or workstation failure does not affect the rest of the network.
• Fault detection and removal of faulty parts is easier.

Disadvantages
• It is difficult to expand.
• The cost of the hub and the longer cables makes it more expensive than other topologies.
• In case the hub fails, the entire network fails.

Tree Topology
The tree topology combines the characteristics of the bus and star
topologies. It consists of groups of star-configured workstations
connected to a bus backbone cable. Not every node is directly plugged into the central hub. The majority of nodes are connected to a secondary hub, which in turn is connected to the central hub. Each
secondary hub in this topology functions as the originating point of a
branch to which other nodes connect. This topology is commonly
used where a hierarchical flow of data takes place.
Fig. 14.12 Tree Topology

Advantages
• It eliminates network congestion.
• The network can be easily extended.
• The faulty nodes can easily be isolated from the rest of the
network.

Disadvantages
• It uses large cable length.
• It requires a large amount of hardware components and hence, is
expensive.
• Installation and reconfiguration of the network is very difficult.

Mesh Topology
In mesh topology, each workstation is linked to every other
workstation in the network. That is, every node has a dedicated point-
to-point link to every other node. The messages sent on a mesh
network can take any of the several possible paths from the source to
the destination. A fully connected mesh network with n devices has
n(n-1)/2 physical links. For example, if an organization implementing
the topology has 8 nodes, 8(8-1)/2, that is, 28 links are required. In
addition, routers are used to dynamically select the best path to be
used for transmitting the data.

Fig. 14.13 Mesh Topology

The mesh topology is commonly used in large internetworking environments because it provides extensive backup and routing capabilities. This topology is ideal for distributed systems.

Advantages
• The availability of a large number of routes eliminates congestion.
• It is fault tolerant, that is, failure of any route or node does not fail
the entire network.

Disadvantages
• It is expensive as it requires extensive cabling.
• It is difficult to install.
Graph Topology
In a graph topology, the nodes are connected randomly in an arbitrary
fashion. There can be multiple links and all the links may or may not
be connected to all the nodes in the network. However, if all the
nodes are linked through one or more links, the layout is known as a
connected graph.

14.4.3 Switching Techniques


The main aim of networking is sharing of data or messages between
different computers. The data is transferred using switches that are
connected to communication devices directly or indirectly. On a network, switching means routing traffic by setting up temporary connections between two or more network points. This is done by devices, called switches (or exchanges), located at different points on the network. A switch is a device that selects an appropriate path
or circuit to send the data from the source to the destination. In a
switched network, some switches are directly connected to the
communicating devices while others are used for routing or
forwarding information.
Fig. 14.14 Switched Network

Figure 14.14 depicts a switched network in which the communicating computers are labelled 1, 2, 3, etc., and the switches are labelled I, II, III, etc. Each switch is connected either to a communicating device or to any other switch for forwarding
information. The technique of using the switches to route the data is
called a switching technique (also known as connection strategy).
A switching technique basically determines when a connection should
be set up between a pair of processes, and for how long it should be
maintained.
There are three types of switching techniques, namely, circuit
switching, message switching and packet switching.

Circuit Switching
In circuit switching technique, first of all the complete end-to-end
transmission path is established between the source and the
destination computers, and then the message is transmitted through
the path. The main advantage of this technique is that the dedicated
transmission path provides a guaranteed delivery of the message. It
is mostly used for voice communication such as in the Public
Switched Telephone Network (PSTN) in which when a telephone call
is placed, the switching equipment within the telephone system seeks
out a physical path all the way from the computer to the receiver’s
telephone.
In circuit switching, the data is transmitted with no delay (except
for negligible propagation delay). In addition, this technique is simple
and requires no special facilities. Hence, it is well suited for low speed
transmission.

Fig. 14.15 Circuit Switching

Message Switching
In message switching technique, no physical path is established
between the sender and receiver in advance. This technique follows
the store and forward mechanism, where a special device (usually a
computer system with large memory storage) in the network receives
the message from the source computer and stores it in its memory. It
then finds a free route and sends the stored information to the
intended receiver. In this kind of switching, a message is always
delivered to one device where it is stored and then rerouted to its
destination.
Message switching is one of the earliest types of switching
techniques, which was common in the 1960s and 1970s. As delays in
such switching are inherent (time delay in storing and forwarding the
message) and capacity of data storage required is large, this
technique has virtually become obsolete.

Fig. 14.16 Message Switching

Packet Switching
In the packet switching technique, the message is first broken down into smaller units known as packets. Packets are discrete blocks of data of variable (but bounded) length. Apart from data, the packets also
contain a header with the control information such as the destination
address, and priority of the message. The packets are transmitted
from the source to its local Packet Switching Exchange (PSE). The
PSE receives the packet, examines the packet header information
and then passes the packet through a free link over the network. If
the link is not free, the packet is placed in a queue until it becomes
free. The packets travel in different routes to reach the destination. At
the destination, the Packet Assembler and Disassembler (PAD)
puts each packet in order and assembles the packet to retrieve the
information.
The benefit of packet switching is that since packets are short,
they are easily transferred over a communication link. Longer
messages require a series of packets to be sent, but do not require
the link to be dedicated between the transmission of each packet.
This also allows packets belonging to other messages to be sent
between the packets of the original message. Hence, packet
switching provides a much fairer and efficient sharing of the
resources. Due to these characteristics, packet switching is widely
used in data networks like the Internet.

Fig. 14.17 Packet Switching


A comparison of the three switching techniques is listed in Table
14.1.
Table 14.1 Comparison between the Various Switching Techniques

14.4.4 Communication Protocols


A communication protocol (also known as network protocol) is a set
of rules that coordinates the exchange of information. If one computer
is sending information to another and both follow the same protocol,
the message gets through regardless of what types of machines they
are and on what operating systems they are running. As long as
machines have software that can manage the protocol,
communication is possible. The two most popular types of
communication protocols are ISO protocol and TCP/IP protocol.

ISO Protocol
International Standards Organization (ISO) provided an Open
Systems Interconnection (OSI) reference model for communication
between two end users in a network. In 1983, ISO published a
document called ‘The Basic Reference Model for Open Systems
Interconnection’ which visualizes network protocols as a seven-
layered model. The model lays a framework for the design of network
systems that allow for communication across all types of computer
systems. It consists of seven separate but related layers, namely,
Physical, Data Link, Network, Transport, Session, Presentation and
Application.
A layer in the OSI model communicates with two other OSI layers: the one directly above it and the one directly below it. For example, the data link layer in System X communicates with the network layer and the
physical layer. When a message is sent from one machine to another,
it travels down the layers on one machine and then up the layers on
the other machine. This route is illustrated in Figure 14.18.
As the message travels down the first stack, each layer (except
the physical layer) adds header information to it. These headers contain control information that is read and processed by the corresponding layer on the receiving stack. At the receiving stack, the
process happens in reverse. As the message travels up the other
machine, each layer strips off the header added by its peer layer.
The seven layers of the OSI model are listed here.
• Physical layer: It is the lowest layer of the OSI model that defines
the physical characteristics of the network. This layer
communicates with data link layer and regulates transmission of
stream of bits (0s and 1s) over a physical medium such as
cables, and optical fibers. In this layer, bits are converted into
electromagnetic signals before traveling across physical medium.
• Data link layer: It takes the stream of bits from the network layer and groups them into frames. These frames are then transmitted sequentially to the receiver. The data link layer at the receiver's end detects and corrects any errors in the transmitted data that arrives from the physical layer.
Fig. 14.18 Communication between two machines using OSI Model

• Network layer: This layer is responsible for transferring data between devices that are not attached locally. It manages network traffic problems such as routing of data packets. A network-layer device such as a router facilitates routing services in a network. It checks the destination address of the received packet and compares it with the routing table (which comprises network addresses). The router then directs the packet to an appropriate router to reach the destination.
• Transport layer: The transport layer establishes, maintains and
terminates communication between the sender and the receiver.
It manages the end-to-end message delivery in the network.
• Session layer: This layer organizes and synchronizes the exchange of data between the sending and the receiving applications. It keeps the application at one end informed about the status of the application at the other end.
• Presentation layer: This layer is responsible for format and code conversion, such as encoding and decoding data, encrypting and decrypting data, and compressing and decompressing data. The layer ensures that information or data sent from the application layer of one computer system is readable by the application layer of another computer system. The layer packs and unpacks the data.
• Application layer: This layer is the entrance point through which
the programs access the OSI model. It is the “topmost” or the
seventh layer of the OSI model. It provides standardized services such as virtual terminal, file transfer and job transfer operations.

TCP/IP Protocol

Fig. 14.19 TCP/IP Protocol

The transmission control protocol/Internet protocol (TCP/IP) is the most widely adopted protocol over the Internet. It has fewer layers than the ISO protocol, which makes it more efficient. However, it combines several functions in each layer, which makes it more difficult and complex to implement. The various layers in the TCP/IP protocol are listed hereunder (see Figure 14.19).
• Link layer: It corresponds to the hardware, including the device
driver and interface card. The link layer has data packets
associated with it depending on the type of network being used,
such as Token ring or Ethernet.
• Network layer: It manages the movement of packets around the
network. It is responsible for making sure that packets reach their
destinations correctly.
• Transport layer: It provides the mechanism through which two computers exchange data between their software. The two protocols that serve as transport mechanisms are TCP and UDP (a small socket sketch follows this list).
• Application layer: It refers to the networking protocols that are
used to support various services such as FTP, and Telnet.
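As a rough illustration of how an application-layer program sits on top of the transport layer, the sketch below opens a TCP connection using the standard Berkeley sockets API and sends a short message. It is only an example, not part of the text; the host address 127.0.0.1 and port 8080 are arbitrary assumptions.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    /* Create a TCP (SOCK_STREAM) socket: the transport-layer mechanism. */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    /* Address of the remote application (arbitrary example values). */
    struct sockaddr_in server;
    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_port = htons(8080);
    inet_pton(AF_INET, "127.0.0.1", &server.sin_addr);

    /* The network layer (IP) routes the connection to the destination host. */
    if (connect(fd, (struct sockaddr *)&server, sizeof(server)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }

    /* Application-layer data handed to TCP for reliable delivery. */
    const char *msg = "hello\n";
    write(fd, msg, strlen(msg));
    close(fd);
    return 0;
}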

14.5 DISTRIBUTED FILE SYSTEM


As discussed in Chapter 11, the file system is used to store and
retrieve files on the disk. The file system is a part of the operating
system that is primarily responsible for the management and
organization of various files in a system. In a distributed system, the files accessed by users are stored on different sites, which makes it difficult to access those files. The distributed file system (DFS) provides a way by which users can share files that are stored on different sites of a distributed system. In addition, it allows easy access to files on a distributed system, as the users are unaware of the fact that the files are distributed. DFS is an
implementation of the classical time-sharing model of a file system
that allows sharing of files when they are physically dispersed on
different sites of a distributed system.

Terms used in DFS


To understand the basic structure of DFS, we need to define the
terms associated with it.
• Service: It is an application that is used to perform some tasks. It
provides a particular functionality and runs on one or more
machines.
• Client: It is a process that demands services from the server. It invokes a service by using a set of operations that forms the client interface. A client may be connected to many servers, and many clients may be connected to a single server.
• Server: It provides services to clients and runs on a single machine. The clients request services from the server, which responds to their requests. For example, a file server can provide
a number of valuable services, including:
■ Management of shared resources: It manages the files,
databases and other resources that are needed by the client.
This management results in improved efficiency, security, and
availability.
■ Management of backup and recovery: It provides the
backup facility of the client data that require some specialized
skills, and daily attention that a client alone cannot manage.
■ User mobility: It may be advantageous when the system
fails or when we have to access the system from a remote
location. If the files are stored on the server then they can be
accessed any time and anywhere.
■ Diskless workstations: Diskless clients are very useful as
they reduce cost and provide security by preventing users
from copying sensitive system data. The server provides
them space to store the data.

Naming and Transparency


Naming refers to the mapping between physical name and logical
name of a file. Usually, a user refers to a file by a textual name but
these names are mapped to a numerical identifier which in turn is
mapped to disk blocks. In a distributed file system, the file name must additionally be mapped to the site where the file is stored, since the files are spread across the distributed system. The distributed file system provides the user with the
abstraction of a file that hides the details of how and where on the
system the file is stored.
Transparency means hiding the details from the user and
showing them only the required details. Various types of transparency
in distributed file systems are:
• Access transparency: We know that the files shared on the
distributed system are physically dispersed on different distributed
system sites. This sharing should be in such a way that the users
are unaware that the files are distributed and can access them in
the same way as the local files are accessed.
• Location transparency: Shared files should be part of a consistent namespace that encompasses local as well as remote files. The name of a file does not reveal its location.
• Concurrency transparency: All clients have the same view of
the state of the file system. This means that if one process is
modifying a file, the other processes on the same system or
remote systems accessing the files see the modifications in a
coherent manner.
• Failure transparency: The client and client programs should
operate correctly after a server failure.
• Replication transparency: To support scalability, we may wish to
replicate files across multiple servers. Clients should be unaware
of this.
• Migration transparency: Files should be able to move around
without the client’s knowledge.

LET US SUMMARIZE
1. With the evolution of the Internet, the number of online users as well as the size of data has increased. This demand has been the driving force behind the emergence of technologies like parallel processing and data distribution.
The systems based on parallelism and data distribution are called
multiprocessor systems and distributed systems, respectively.
2. Multiprocessor systems (also known as parallel systems or tightly coupled
systems) consist of multiple processors in close communication in a
sense that they share the computer bus, system clock, and sometimes
even memory and peripheral devices.
3. A multiprocessor system provides several advantages over a uniprocessor
system including increased system throughput, faster computation within
an application and graceful degradation.
4. Multiprocessor systems consist of several components such as CPUs,
one or more memory units, disks and I/O devices. All of these
components communicate with each other via an interconnection network.
The three commonly used interconnection networks are bus, crossbar
switch and multistage switch.
5. Bus is the simplest interconnection network in which all the processors
and one or more memory units are connected to a common bus. The
processors can send data on and receive data from this single
communication bus. However, only one processor can communicate with
the memory at a time.
6. A crossbar switch uses an N × N matrix organization, wherein N
processors are arranged along one dimension and N memory units are
arranged along the other dimension. Every CPU and a memory unit are
connected via an independent bus. The intersection of each horizontal
and vertical bus is known as a crosspoint.
7. A multistage switch lies in between a bus and a crossbar switch in terms
of cost and parallelism. It consists of several stages, each containing 2 ×
2 crossbar switches.
8. Each CPU in a multiprocessor system is allowed to access its local
memory as well as the memories of other CPUs (non-local memories).
Depending on the speed with which the CPUs can access the non-local
memories, the architecture of multiprocessor systems can be categorized
into two types, namely, uniform memory access (UMA) and non-uniform
memory access (NUMA).
9. In uniform memory access (UMA) architecture, all the processors share
the physical memory uniformly, that is, the time taken to access a memory
location is independent of its position relative to the processor. Several
improvements have been made in this architecture to make it better. One
such improvement is to provide a cache to each CPU. Another possible
design is to let each processor have local private memories in addition to
caches.
10. In the NUMA architecture, the CPUs access the local and non-local
memories with different speeds; each CPU can access its local memory
faster than the non-local memories.
11. There are basically three types of multiprocessor operating systems,
namely, separate supervisors, master-slave, and symmetric.
12. In separate supervisor systems, the memory is divided into as many
partitions as there are CPUs, and each partition contains a copy of the
operating system. Thus, each CPU is assigned its own private memory
and its own private copy of the operating system.
13. In master-slave (or asymmetric) multiprocessing systems, one processor
is different from the other processors in a way that it is dedicated to
execute the operating system and hence, known as master processor.
Other processors, known as slave processors, are identical. They either
wait for the instructions from the master processor to perform any task or
have predefined tasks.
14. In symmetric multiprocessing systems, all the processors perform identical
functions. A single copy of the operating system is kept in the memory
and is shared among all the processors.
15. A distributed system consists of a set of loosely coupled processors that
do not share memory or system clock, and are connected by a
communication medium.
16. The main advantages of distributed systems are that they allow resource
sharing, enhance availability and reliability of a resource, provide
computation speed-up and better system performance, and allow
incremental growth of the system.
17. A network operating system is the earliest form of operating system used
for distributed systems.
18. A distributed operating system provides an abstract view of the system by
hiding the physical resource distribution from the users. It provides a
uniform interface for resource access regardless of its location.
19. In a distributed system, the data is distributed across several sites. There
are two ways of achieving data distribution, namely, partitioning and
replication.
20. In partitioning (also known as fragmentation), the data is divided into
several partitions (or fragments), and each partition can be stored at
different sites. On the other hand, in replication, several identical copies or
replicas of the data are maintained and each replica is stored at different
sites.
21. There are three ways of accessing the data in a distributed system,
namely, data migration, computation migration and process migration.
22. A computer network can be as small as several personal computers on a
small network or as large as the Internet. Depending on the geographical
area they span, computer networks can be classified into two main
categories, namely, local area networks and wide area networks.
23. A local area network (LAN) is the network restricted to a small area such
as an office or a factory or a building.
24. A wide area network (WAN) spreads over a large geographical area like a
country or a continent. It is much bigger than a LAN and interconnects
various LANs.
25. A network topology refers to the way a network is laid out either physically
or logically. The various network topologies include bus, ring, star, tree,
mesh, and graph.
26. The main aim of networking is transfer of the data or messages between
different computers. The data is transferred using switches that are
connected to communication devices directly or indirectly. A switch is a
device that selects an appropriate path or circuit to send the data from the
source to the destination.
27. The technique of using the switches to route the data is called a switching
technique (also known as connection strategy). There are three types of
switching techniques, namely, circuit switching, message switching and
packet switching.
28. A communication protocol (also known as a network protocol) is a set of
rules that coordinates the exchange of information. The two most popular
types of communication protocols are the ISO protocol and TCP/IP
protocol.
29. The International Standards Organization (ISO) provided an Open
Systems Interconnection (OSI) reference model for communication
between two end users in a network. An OSI model consists of seven
separate but related layers, namely, Physical, Data Link, Network,
Transport, Session, Presentation and Application.
30. The transmission control protocol/Internet protocol (TCP/IP) is the most widely adopted protocol over the Internet. It has fewer layers than the ISO protocol. The various layers in the TCP/IP protocol are Link,
Network, Transport and Application.
31. The distributed file system (DFS) provides a way by which users can share files that are stored on different sites of a distributed system. In addition, it allows easy access to files on a distributed system as the users are unaware of the fact that the files are distributed.
EXERCISES
Fill in the Blanks
1. A _____________ consists of a set of loosely coupled processors that do
not share memory or system clock, and are connected by a
communication medium.
2. A _____________ is the earliest form of operating system used for
distributed systems.
3. There are three ways of accessing the data in a distributed system,
namely, data migration, computation migration and _____________.
4. A _____________ is a device that selects an appropriate path or circuit to
send the data from the source to the destination.
5. The technique of using the switches to route the data is called
_____________.

Multiple Choice Questions


1. The network topology in which devices are not linked to each other and
where hub acts as a central controller is:
(a) Mesh topology
(b) Star topology
(c) Ring topology
(d) Tree topology
2. In which of the following types of multiprocessor systems, the memory is
divided into as many partitions as there are CPUs, and each partition
contains a copy of the operating system?
(a) Master-slave multiprocessors
(b) Symmetric multiprocessors
(c) Both (a) and (b)
(d) Separate supervisors
3. Which of the following is not a type of transparency in distributed file
system?
(a) Access transparency
(b) Process transparency
(c) Concurrency transparency
(d) Migration transparency
4. If there are six nodes in an organization then how many physical links are
required to implement the mesh topology?
(a) 15
(b) 6
(c) 10
(d) 12
5. Which of the following switching techniques follows store and forward
mechanism?
(a) Packet switching
(b) Message switching
(c) Both (a) and (b)
(d) Circuit switching

State True or False


1. The CPUs in NUMA architecture access the local and non-local memories
with same speed.
2. The tree topology combines the characteristics of the bus and ring
topologies.
3. A WAN interconnects various LANs.
4. Naming refers to the mapping between physical name and logical name of
a file.
5. The transmission control protocol/Internet protocol (TCP/IP) is the most
widely adopted protocol over the Internet.

Descriptive Questions
1. What are the advantages of a multiprocessor system over a uniprocessor
system?
2. Discuss the various types of interconnection networks. Why is a multistage interconnection network considered a blocking network? Explain with the help of an example.
3. Explain the UMA architecture of a multiprocessor system. What are the
different variants of UMA architecture? Give suitable diagrams also.
4. Discuss the NUMA architecture of a multiprocessor system with the help
of suitable diagram. How is it different from UMA architecture?
5. What are the various types of multiprocessor operating systems? Discuss
the advantages and disadvantages of each of them.
6. What are the advantages of distributed systems?
7. What are the drawbacks of a network operating system? How does a
distributed operating system overcome these drawbacks?
8. Discuss various techniques of data distribution in distributed systems.
9. Discuss the advantages and disadvantages of ring topology.
10. Differentiate between the following.
(a) Data migration and computation migration
(b) LAN and WAN
(c) Star and tree topology
(d) Circuit switching and packet switching
11. Explain the OSI model in detail. How is TCP/IP model different from OSI
model?
12. Explain the distributed file system and the terms associated with it.
13. Explain the different types of transparency in distributed file system.
chapter 15

Case Study: UNIX

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ Discuss the history of UNIX operating system.
⟡ Understand the role, function and architecture of UNIX kernel.
⟡ Explain how process management is done in UNIX.
⟡ Describe how memory is managed in UNIX.
⟡ Explore the file system and directory structure of UNIX.
⟡ Discuss the I/O system of UNIX.
⟡ Learn shell programming.

15.1 INTRODUCTION
UNIX (officially trademarked as UNIX®) is an open-source operating system, which means that its complete source code is available so that one can customize the operating system. UNIX is available on a wide range of different hardware (that is, it is portable), and has a hierarchical file system, device independence and multi-user operation capabilities. It uses virtual memory and paging to support programs larger than the physical memory. In addition, it has a number of utility programs like the vi editor, the shell (csh) and compilers. As said by Thompson and Ritchie, the designers of UNIX, “The success of UNIX lies not so much in new inventions but rather in the full exploitation of a carefully selected set of fertile ideas, and especially in showing that they can be keys to the implementation of a small and yet powerful operating system.” UNIX, being one of the dominant operating systems on high-end workstations and servers, is also used these days on systems ranging from cell phones to supercomputers. These are the main reasons behind the fast growth and vast development of the UNIX operating system.

15.2 HISTORY OF UNIX


UNIX development was started at AT&T’s Bell Laboratories by Ken Thompson in 1969, who was later joined by Dennis Ritchie and his entire department. One of the researchers at Bell Labs, Brian Kernighan, suggested the name UNICS (UNiplexed Information and Computing System) for the operating system, which was finally changed to UNIX in 1970 and is in use till date.
Initially, UNIX was written in assembler by Thompson for the PDP-7 minicomputer. Later it was moved from the obsolete PDP-7 to the more advanced PDP-11/20, and then to the PDP-11/45 and PDP-11/70. As it was difficult to rewrite the entire system in assembler for each new machine, Thompson decided to rewrite UNIX in a high-level language named ‘B’. Though writing it in a high-level language was a bit of a compromise in terms of speed, the major gain was in terms of portability. As the ‘B’ language had some weaknesses due to its lack of structure, this attempt was not successful. Ritchie then designed a new language, named it ‘C’ (the successor of ‘B’), and wrote the compiler for this new language. Later, in 1973, UNIX was rewritten in the ‘C’ language by Ritchie.
In 1974, a landmark paper about UNIX was published by Ritchie and Thompson, and for the work described in it they were later awarded the prestigious ACM Turing Award in 1984. After it was rewritten in the ‘C’ language, UNIX was used by educational institutions and universities at nominal licensing fees. It became very popular because the OS that came with the PDP-11 was not liked by professors and students. Hence, UNIX became the choice of all because its complete source code was available and they could experiment with it and learn more. Due to this, new ideas and improvements to the system spread rapidly, and in a couple of years a new version (Version 7) came to the market. Soon, UNIX was provided by a variety of vendors and was in widespread use on minicomputers and engineering workstations.
Apart from AT&T, the University of California, Berkeley, was the major contributor to the development of UNIX. Berkeley was not satisfied with many parts of AT&T UNIX and hence created BSD UNIX (Berkeley Software Distribution). Many new features were added to BSD UNIX: the file system was improved, the mail features were made more versatile and a better method for linking files was introduced. With the standard distribution, networking protocol software (TCP/IP) was offered, which made the Internet possible. Others who were eager to join the race were Microsoft and Sun Microsystems. Sun Microsystems developed SunOS earlier, and now their version of UNIX is known as Solaris. Microsoft called their product XENIX and was the first to run UNIX on a PC with 640 KB of memory. Microsoft later sold XENIX to SCO (The Santa Cruz Operation), who now markets the popular commercial brand of UNIX for the desktop called SCO UNIX. Later, AT&T introduced SVR4 by unifying the System V 3.2, BSD, SunOS and XENIX flavors into a single product.
When AT&T took this step, International Business Machines
(IBM), Hewlett-Packard (HP) and Digital Equipment Corporation
(DEC) joined to form an association by the name Open Software
Foundation (OSF) with the motive to create a UNIX-like operating
system. However, each of them had their own brands like AIX by
IBM, HP-UX from HP and Digital UNIX from DEC.
The UNIX operating system has been ported to machines that range from notebooks to supercomputers, unlike Windows, which is mainly restricted to PCs. In 1992, Novell purchased the UNIX business from AT&T. Novell handed over the UNIX trademark to X/OPEN, which is a standards body. Today UNIX is a standard and is no longer considered a product. If a product is X/OPEN-compliant, it can be marketed as UNIX.
The first GUI (Graphical User Interface) for UNIX was introduced by MIT (Massachusetts Institute of Technology) and was named X Window. Two standards are in use these days, namely Open Look and Motif. X Window has almost all the GUI-based features available in Microsoft Windows, such as the use of a mouse to handle icons, scroll bars, buttons, dialogue boxes and menu bars in a single session; many similar activities can also be performed based on the user-interface specification.
UNIX is finally commercial, though it was strongly felt that it was so good that it should be distributed free. In 1991, Linus Torvalds, a Finnish student from the University of Helsinki, created a similar operating system, which is now popular as Linux. He made the entire software available on the Internet, including the complete source code, which motivated many programmers throughout the world to add features and improve the operating system. Linux has various multimedia features and is extremely strong in networking. Linux provides a cost-effective solution to an organization in setting up a website and intranet. More about Linux will be explored in
Chapter 16.

15.3 UNIX KERNEL


While designing UNIX, two concepts were kept in mind: first, the file system, which occupies ‘physical space’, and second, the process, which is supposed to have ‘life’. A block diagram depicting a logical view of the kernel, showing its modules and the internal relationships among them, is displayed in Figure 15.1. There are three partitions or levels shown in the block diagram: user, kernel and hardware. The system call interface is used for communication between the user and the kernel.
There are two major divisions or partitions in the kernel, that is, the file subsystem shown on the left-hand side and the process control subsystem shown on the right-hand side of the block diagram. The file subsystem basically deals with activities related to the management of files: allocating space, controlling access to files, retrieving data for users and administering free space. It uses the buffer cache to access data from files. The buffering mechanism transfers data between the kernel and secondary storage media. It also enables the transfer of data to and from the kernel by interacting with the device drivers of block I/O devices. The file subsystem can directly interact with the device drivers of character devices. Character devices are all the devices other than block devices.
The process control subsystem deals with process scheduling,
inter-process communication, memory management and process
synchronization. The process control subsystem and the file
subsystem communicate when a file has to be loaded into the
memory for execution. The memory management module deals with
memory allocation. When multiple processes are competing for
execution and the operating system is falling short of physical
memory for all the processes, the kernel swaps the process between
the primary memory and the secondary memory. This way all the
processes get a chance to execute. The scheduler module controls
the allocation of CPU to various processes. It uses an allocation
algorithm to allocate the CPU to a process. Depending on the chosen allocation algorithm, it selects a process from the queue and allocates the CPU to it. There are various types of inter-process communication, such as synchronous transfer of messages or asynchronous signaling of events among processes. Devices like disks and terminals may generate interrupts to communicate with the CPU while a process is running. When an interrupt is generated, the kernel suspends the currently running process, services the interrupt by invoking special functions and then resumes the process that was running prior to the interrupt.
Fig. 15.1 Kernel of UNIX

15.4 PROCESS MANAGEMENT


In the UNIX system, the process is the only active entity. Each process runs a single program and has a single thread of control. That is, there exists only one program counter that keeps track of the next instruction to be executed. Since UNIX is a multiprogramming
operating system, multiple processes can execute simultaneously.
There are three types of processes that can be executed in UNIX:
user processes, daemon processes, and kernel processes.
• User processes are responsible for executing user applications.
For each program initiated by the user, kernel creates a main
process (or parent process) which in turn may create several
other processes (or child processes). Parent and child processes
execute in a coordinated and tightly integrated manner, and their
lifetimes are also related.
• Daemon processes are the background processes that are
responsible for controlling the computational environment of the
system. These processes are not under the direct control of a
user. Examples of such processes are network management,
print spooling, etc. These processes exist throughout the life of
the operating system.
• Kernel processes execute the kernel code. Like daemons, these also run in the background. However, they are different from daemons in the sense that they start automatically when the system is booted. In addition, they can invoke kernel
functionalities without having to perform any system call.
To store the control information about the processes, UNIX uses
two data structures: proc structure and u area. The proc data
structure holds information related to process scheduling such as
process id, process state, priority, memory management information,
etc. The u area (user area) contains information related to resource
allocation and signal handling such as PCB, pointer to proc structure,
user and group IDs, terminal attached to the process, CPU usage
information, information related to all open files and current directory,
etc. The proc structure needs to be in the memory all the time, irrespective of whether the process is currently executing or not, whereas the u area needs to be in memory only when the process is executing.

15.4.1 Process Creation and Termination


UNIX allows a process to create multiple new processes by invoking the fork() system call. The forking process is called the parent while the newly created process is termed the child. The fork() system call creates an exact duplicate of the invoking process, including registers, variables and everything else, but the child process has its own private address space. Thus, after the fork, if either process changes any of its variables, the changes are not visible to the other process. However, an exception to this are the files that were open in the parent process before the fork. Such files remain open in the child process after the fork, that is, the open files are shared between the parent and the child process. Thus, changes made to any of the open files by either process are visible to the other.
Since both the parent and child process are identical, to identify which one is the parent and which one is the child, the fork() call returns a zero value to the child process and a non-zero value (the child process’s PID) to the parent process. Each process examines the return value and executes accordingly.
After the fork, the child process continues to execute the same
program as that of its parent. However, in most cases, it needs to
execute a different program from its parent. The child process can
load a different program into its address space by invoking the
execve() system call. Note that a process can invoke execve()
system call to run a new program at any time, not necessarily after
the fork() system call. As a process invokes execve() system call,
the currently executing program is terminated and the new program
begins to execute in the context of the invoking process.
Once the execution of a process is over, it terminates itself by calling the exit(status) system call. The parameter status indicates the termination status of the process. When the kernel receives the exit() call, it closes all open files of the process, deallocates its memory, and destroys the u area of the process. However, the proc structure remains in the memory until the parent of the terminated process explicitly destroys it. The status is saved in the proc structure so that the parent process can query the termination status at any time. The terminated process is dead, but it exists as long as its proc structure is in the memory. Such a process is called a zombie process.
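The following minimal sketch shows the typical fork()/execve()/exit() sequence described above; it is an illustrative program, not taken from the text, and the program being executed (/bin/ls) and its arguments are arbitrary choices.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();                  /* create a child process */

    if (pid < 0) {                       /* fork failed */
        perror("fork");
        exit(1);
    } else if (pid == 0) {               /* zero return value: child process */
        char *argv[] = { "ls", "-l", NULL };
        char *envp[] = { NULL };
        execve("/bin/ls", argv, envp);   /* load a different program */
        perror("execve");                /* reached only if execve fails */
        exit(1);
    } else {                             /* non-zero return value: parent */
        int status;
        waitpid(pid, &status, 0);        /* collect the child's termination
                                            status, removing its zombie entry */
        printf("child %d exited with status %d\n",
               (int)pid, WEXITSTATUS(status));
    }
    return 0;
}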

15.4.2 Inter-process Communication


Generally, a process needs to communicate with another process
either to inform it about occurrence of some event or to exchange
some data or information. The UNIX operating system is empowered
with various inter-process communication facilities, some of which are
discussed in this section.

Signals
A signal is the most basic communication mechanism; it is used to alert a process to the occurrence of some event such as abnormal termination or a floating point exception. It does not carry any information; rather, it simply indicates that an event has occurred. When a process sends a signal to another process, the execution of the receiving process is suspended to handle the signal, as in the case of an interrupt.
UNIX offers a wide range of signals to indicate different events.
The majority of signals are sent from the kernel to user processes
while some can be used by the user processes to communicate with
each other. However, the kernel does not use signals to communicate with a process running in kernel mode; instead, a wait-queue mechanism is used to enable kernel-mode processes to notify each other of incoming asynchronous events. This mechanism allows
several processes to wait for a single event by maintaining a queue
for each event. Whenever a process needs to wait for the completion
of a particular event, it sleeps in the wait queue associated with that
event. After the event has happened, all the processes in the wait
queue are awakened.
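As an illustration of signal-based communication between user processes, the sketch below has the parent install a handler for SIGUSR1, after which the child sends that signal with kill(). This is only an example program, not the book's code; the choice of SIGUSR1 is arbitrary.

#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>

static volatile sig_atomic_t got_signal = 0;

/* Handler invoked when SIGUSR1 is delivered; it only records the event,
 * since a signal carries no data of its own. */
static void handler(int signo)
{
    (void)signo;
    got_signal = 1;
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigaction(SIGUSR1, &sa, NULL);      /* install the signal handler */

    /* Block SIGUSR1 so it cannot arrive before the parent is ready to wait. */
    sigset_t block, oldmask;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &oldmask);

    pid_t pid = fork();
    if (pid == 0) {                     /* child: alert the parent */
        kill(getppid(), SIGUSR1);
        _exit(0);
    }

    while (!got_signal)
        sigsuspend(&oldmask);           /* atomically unblock and wait */

    printf("parent received SIGUSR1 from child\n");
    wait(NULL);
    return 0;
}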

Pipes
A pipe is the standard communication mechanism that enables the transfer of data between processes. It provides a means of one-way communication between related processes. Each pipe has a read end and a write end. The data written at the write end of the pipe can be read from the read end. When the writer process writes to the pipe, a stream of bytes is copied to a shared buffer, whereas at the time of reading, bytes are copied from the shared buffer.
Though both the reader and writer processes may run concurrently,
the access to pipe must be synchronized. UNIX must ensure that only
one process (either writer or reader) is accessing the pipe at a time.
To synchronize the processes, UNIX uses locks and wait queues;
each end of pipe is associated with a wait queue.
Whenever the writer process requests to write to the pipe, UNIX
locks the pipe for it only if there is enough space and the
pipe is not locked for the reader process. Once the writer process
gains access to the pipe, bytes are copied into it. However, if the pipe is
full or locked for the reader process, the writer process sleeps in the
wait queue at the write end and remains there until it is awakened
by the reader. After the data has been written to the pipe, the pipe is
unlocked and any sleeping readers in the wait queue at read end are
awakened. A similar process follows at the time of reading from the
pipe.
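The usual programming pattern for a pipe between related processes is sketched below: the parent creates the pipe before forking, the child writes at the write end and the parent reads at the read end. The message text is only an illustration.

#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fil_des[2];
    char buf[64];

    pipe(fil_des);                     /* fil_des[0]: read end, fil_des[1]: write end */

    if (fork() == 0) {                 /* child acts as the writer */
        close(fil_des[0]);
        const char *msg = "hello through the pipe";
        write(fil_des[1], msg, strlen(msg) + 1);
        close(fil_des[1]);
        _exit(0);
    }

    close(fil_des[1]);                 /* parent acts as the reader */
    read(fil_des[0], buf, sizeof(buf));
    printf("parent read: %s\n", buf);
    close(fil_des[0]);
    wait(NULL);
    return 0;
}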
A variation of pipe that UNIX supports is the named pipe, also called
FIFO. As the name implies, in FIFOs, data written first to the pipe is
read first. They employ the same data structures as pipes and are
handled in the same way. However, unlike pipes, they are
persistent and exist as directory entries. Moreover, unrelated
processes can use them. Any process which wants to use named
pipes must have appropriate access rights. Before starting to use a
FIFO, it needs to be opened and similarly, after use it needs to be
closed. However, UNIX must ensure that a writer process opens the
FIFO before the reader process and that a reader process does not
attempt to read from the FIFO before the writer process has written to it.
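A named pipe can be created and used roughly as in the sketch below; the path /tmp/myfifo and the message are assumptions, and some unrelated reader process would open the same path for reading.

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* Create the FIFO as a directory entry with read/write access for all. */
    mkfifo("/tmp/myfifo", 0666);

    /* Opening for writing blocks until a reader opens the other end. */
    int fd = open("/tmp/myfifo", O_WRONLY);
    if (fd == -1) { perror("open"); return 1; }
    write(fd, "data via FIFO\n", 14);
    close(fd);
    return 0;
}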

Shared Memory
Shared memory is another means of communication that allows
cooperating processes to pass data to each other. This mechanism
enables a memory segment to be shared between two or more
processes. As discussed in Chapter 2, one process creates the
shared memory segment while others can read/write through it by
attaching the shared memory segment to their address
space. Like other mechanisms, UNIX must ensure the
synchronization among communicating processes so that no two
processes access the shared area simultaneously.
Shared memory is a faster means of communication as compared
to other methods; however, it does not provide synchronization
among different processes on its own. For this reason, it has to be used with
some other IPC mechanism that offers synchronization.
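The System V shared-memory interface found on most UNIX systems follows the pattern sketched below; the key and the segment size are arbitrary, and in practice a semaphore or other synchronizing IPC mechanism would be used alongside it, as noted above.

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    /* Create (or locate) a 4 KB shared memory segment identified by a key. */
    int shmid = shmget((key_t)1234, 4096, IPC_CREAT | 0666);
    if (shmid == -1) { perror("shmget"); return 1; }

    /* Attach the segment to this process's address space. */
    char *area = shmat(shmid, NULL, 0);
    if (area == (char *)-1) { perror("shmat"); return 1; }

    strcpy(area, "message placed in shared memory");
    printf("%s\n", area);              /* a cooperating process that attaches the
                                          same segment could read this data */

    shmdt(area);                       /* detach the segment ... */
    shmctl(shmid, IPC_RMID, NULL);     /* ... and mark it for removal */
    return 0;
}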

15.4.3 Process Scheduling


The processes in UNIX are executed in one of three modes at a
particular point of time. These modes are user mode, kernel non-
interruptible mode, and kernel interruptible mode. The processes
running in user mode have lowest priorities, and the processes
executing in kernel non-interruptible mode have the highest priorities.
The processes in kernel non-interruptible mode use many operating system
resources and hence are allowed to run at the highest priority. On the other hand, the
kernel-mode processes which do not require many OS resources are
put in kernel interruptible mode and given medium-level priorities.
UNIX assigns numerical values ranging between 0 and 127 as the
process priorities. The higher the value, the lower is the priority. The
user mode processes are assigned values between 50 and 127, and
kernel mode processes are assigned the values between 0 and 49.
Since UNIX is a time-sharing operating system, it basically follows
the round-robin scheduling algorithm with some variation. Whenever
a process departs the ready queue, either because its
time slice is over or because it is blocked, UNIX dynamically modifies its
priority and performs round-robin only for processes with identical
priority. This results in multilevel adaptive scheduling.
There are two main reasons for dynamic variation of process
priorities, which are discussed as follows.
• The first reason is that when a user process enters into kernel
mode due to an interrupt or a system call, it acquires some kernel
resources and may wait for other resources. Therefore, it gets
blocked. Once it gets all the desired resources, it becomes active.
Now, its priority must be changed. It must be given higher priority
to get it scheduled as soon as possible so that it can release all
the kernel resources and return to the user mode. When the
process gets blocked in the kernel mode, its priority is changed to
a value ranging between 0 and 49 depending on the reason of
blocking. Once it becomes active again, it executes in the kernel
mode with the same priority. Once its execution in the kernel
mode is over, it returns to the user mode and is again assigned
the previous value of the priority, which is between 50 and 127.
• Another reason for dynamically varying the priorities is that round-
robin scheduling does not give fair CPU attention to all the
processes. The CPU time of I/O-bound processes is always
less than that of CPU-bound processes. Thus, UNIX
gives more importance to those processes which have not utilized
CPU time much in the recent time, and gives them higher
priorities. UNIX computes the priority of each process every
second, using the following formula.
Priority = Base + Nice + CPU_usage

where
■ Base is the base priority of user processes; it is the same
for all processes
■ Nice is the priority value which is assigned by the user to its
own process. For this, the user can use nice(priority)
system call. The default is 0, but the value can lie between
-20 and +20. The user is allowed to assign only a positive
value to the parameter priority in the nice() system call.
However, the system administrator is allowed to assign a
negative value (between -20 and -1) to the parameter
priority.

■ CPU_usage is the average number of clock ticks per second


that have been used by the process during the past few
seconds. This value is stored in the CPU usage field of each
running process’s PCB. Its value starts with 0, and with each
clock tick it is incremented by 1. The higher the value of
CPU_usage, the higher will be the recomputed numerical value,
resulting in a lower priority. Note that while recomputing the
priority of a process, UNIX does not consider the total CPU
usage since the start of the process; rather it takes into account
only the recent CPU usage. To implement this policy, it first
divides the value in the CPU_usage field by 2 and stores it
back. It then recomputes the process priority by adding this
new value to the Base and Nice values. In this way, the influence of CPU usage
diminishes as the process waits for the CPU, as the sketch below illustrates.
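The fragment below is a simplified, user-level sketch of this once-per-second recomputation; the structure fields and the constant PUSER are illustrative assumptions, not actual kernel definitions.

#define PUSER 50                /* assumed base priority for user-mode processes */

struct proc_entry {
    int cpu_usage;              /* clock ticks charged to the process recently */
    int nice;                   /* user-settable value, -20 to +20 */
    int priority;               /* recomputed value; a higher number means a lower priority */
};

void recompute_priority(struct proc_entry *p)
{
    p->cpu_usage = p->cpu_usage / 2;                 /* decay the recent CPU usage */
    p->priority  = PUSER + p->nice + p->cpu_usage;   /* Priority = Base + Nice + CPU_usage */
}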

15.4.4 System Calls for Process Management


The system calls under this category include the following.
• fork(): This system call creates a new process (a child process)
which is a logical copy of the parent process. The call
returns the process ID of the child process to the parent, while it
returns 0 to the child process.
• pause(): This system call suspends the execution of the calling
process until it receives a signal.
• exit(status): This system call causes the calling process to
terminate. The value of the status is returned to the parent of
terminated process.
• pipe(fil_des): This system call returns two file descriptors:
fil_des[0] for reading and fil_des[1] for writing. Data through
the pipe is transmitted in first-in-first-out order and cannot be read
twice.
• nice(priority): This system call allows the user of the process to
control the process priority. An ordinary user can give only a positive
value to the parameter priority, which increases the numerical priority
value and hence lowers the process priority; only the system
administrator can raise the priority by supplying a negative value.

15.5 MEMORY MANAGEMENT


UNIX employs a simple and straightforward memory model which not
only increases the program’s portability but also enables UNIX to be
implemented on systems with diverse hardware designs. As we know,
each process is associated with an address space. In UNIX,
the address space of each process comprises the following three
segments.
• Text segment: This segment contains the machine code of
program produced by the compiler or assembler. The main
characteristic of this segment is that it can only be read but
cannot be written to or modified. Thus, the size of text segment
can neither increase nor decrease. Most UNIX systems support
the concept of shared text segment. This is useful in cases
where more than one user runs the same program at the same time. In
such cases, two copies of the text segment of the same program could
be kept in memory, but that would be inefficient. To remove this
inefficiency, only one text segment is kept in memory and the
processes share that text segment. In case of a shared text
segment, the mapping is performed by virtual memory hardware.
• Data segment: This segment provides storage for
the variables, arrays, strings and other data of the program. It is
divided into two parts: one containing initialized data and the other
containing uninitialized data (also referred to as BSS). The
initialized data, as the name implies, contains those variables of
program that must be initialized with some value on starting up
the program’s execution. Note that the uninitialized variables do
not form the part of executable file of the program; rather the
executable file contains only the program’s text followed by the
initialized variables of the program.
The data segment differs from the text segment in the sense that
a program can change the size of its data segment. This is
required because the program variables may change their value
on executing the program or there can be variables in the
program to which memory needs to be allocated dynamically
during execution. To deal with such situations, UNIX allows the
data segments to increase their size on allocating memory and
decrease their size on deallocating memory.
• Stack segment: This is the third segment of the program which
holds the environment variables along with the arguments of
command line which was typed to the shell in order to invoke the
program. On most UNIX machines, the stack segment begins
at (or near) the top of the virtual address space of the program
and grows downwards towards 0. An important characteristic of the
stack segment is that its size cannot be explicitly managed by the
program.
Note: Several versions of UNIX support memory-mapped files,
discussed in Chapter 8.
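The small C program below offers one way to observe these segments on a typical UNIX system: it prints the address of an object residing in each region. The exact addresses, and how the heap relates to the data segment, depend on the particular implementation.

#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;     /* stored in the initialized data segment */
int uninitialized_global;        /* stored in the uninitialized data (BSS) segment */

int main(void)
{
    int local = 7;                           /* lives on the stack segment */
    int *dynamic = malloc(sizeof(int));      /* obtained by growing the data segment (heap) */

    printf("text  (code)         : %p\n", (void *)main);
    printf("data  (initialized)  : %p\n", (void *)&initialized_global);
    printf("bss   (uninitialized): %p\n", (void *)&uninitialized_global);
    printf("heap  (dynamic)      : %p\n", (void *)dynamic);
    printf("stack (local)        : %p\n", (void *)&local);

    free(dynamic);
    return 0;
}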

15.5.1 Implementation of Memory Management


Earlier versions of UNIX system (prior to 3BSD) were based on
swapping; when all the active processes could not be kept in
memory, some of them were moved to the disk in their entirety. This
implies that at a particular moment, the whole process would either
be in memory or on disk. Berkeley added paging to UNIX with 3BSD
in order to support larger programs. Virtually all current
implementations of UNIX support a demand-paged virtual memory
system.

Swapping
When there exist more processes than can be accommodated in
main memory, some of the processes are removed from memory and
kept on the disk. This is what we refer to as swapping, as discussed
in Chapter 7. The module of operating system that handles the
movement of processes between disk and memory is called
swapper. Generally, the swapper needs to move processes from
memory to disk when the kernel runs out of free memory which may
happen on the occurrence of any one of the following events.
• A process invokes brk() system call to increase the size of data
segment.
• A process invokes a fork() system call to create a child process.
• A stack grows beyond the space allocated to it.
One more possible reason for swapping could be a process that
had been on the disk for too long and now, it has to be brought into
the memory but there is no free space in memory.
Whenever a process is to be swapped out of memory to make
room for new process, the swapper first looks for those processes in
memory that are presently blocked and waiting for some event to
occur. If the swapper finds one or more such processes, it evicts one
of them based on certain criteria. For example, one possible
approach is to remove the process with highest value of priority plus
residence time. On the other hand, if swapper does not find any
blocked processes in memory, one of the ready processes is
swapped out based on the same criteria.
The swapper also examines, every few seconds, the
swapped-out processes in order to determine whether any of them is
ready for execution and can be swapped in the memory. If it finds
such a process, then it determines whether it is going to be an easy
or hard swap. The swap is considered easy if there is enough free
space in memory that the chosen process can just be brought into
memory without having to swap out any process from memory. In
contrast, the swap is considered hard if there is no free space in
memory and to swap in the new process, some existing process has
to be swapped out of memory.
Note: In case the swapper finds many processes on the disk that are
ready for execution, it chooses the one to swap into memory that had
been on the disk for the longest time.
The swapper goes on repeating the above process until either of the
following two conditions is met.
• The memory is full of processes which had just been brought into
it and there is no room for any other process.
• There are no processes on the disk which are ready to execute.
To keep track of free space in memory and swap space on the swap
device such as a disk, a linked list of holes is used. Whenever a process
is to be swapped into memory or a swapped-out process is to be
stored on disk, the linked list is searched and a hole is selected
following the first-fit algorithm.
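The first-fit search over such a hole list can be sketched as follows; the structure layout is an assumption chosen purely for illustration.

struct hole {
    unsigned long start;         /* starting address (or block number) of the free area */
    unsigned long size;          /* size of the free area */
    struct hole *next;           /* next hole in the linked list */
};

/* Return the first hole large enough for the request, or NULL if none exists. */
struct hole *first_fit(struct hole *head, unsigned long request)
{
    for (struct hole *h = head; h != NULL; h = h->next)
        if (h->size >= request)
            return h;
    return NULL;
}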

Paging
Here, we discuss the paging system in the 4BSD UNIX system. The basic
idea behind paging in 4BSD is the same as we described in Chapter
7. That is, there is no need to load the entire process in memory for
its execution; rather loading only the user structure and page tables
of process in memory is enough to start its execution. In other words,
a process cannot be scheduled to run until the swapper brings the
user structure and page table of that process in memory. Once the
execution of process begins, the pages of its data, text and stack
segments are brought into memory dynamically as and when needed.
The physical (main) memory in 4BSD system is composed of
three parts [see Figure 15.2 (a)]. The first part stores the kernel, the
second part stores the core map, and the third part is divided into
page frames. The kernel and core map are never paged out of
memory, that is, they always remain in the memory. Each page frame
in memory either contains a data, text, or stack page, or a page table page, or
is on the list of free page frames in memory, maintained by the virtual
memory handler. The information regarding the contents of each
page frame is stored in the core map. The core map contains one
entry per page frame; the core map entry 0 describes page frame 0,
core entry 1 describes page frame 1, and so on.
Furthermore, each core map entry has various fields as shown in
Figure 15.2 (b). The first two fields in core map entry (that is, index of
previous and next entry) are useful when the respective page frame
is on the list of free page frames. The next three fields, including disk
block number, disk device number and block hash code, are used
when the page frame contains information. These items specify the
location on the disk where the page contained in the corresponding
page frame is stored and will be put on paging out. The next three
fields, including index into proc table, text/data/stack and offset within
segment, indicate the process table entry for that page's process, the
segment containing that page and the location of the page within
that segment. The last field contains certain flags which are used by
the paging algorithm.

Fig. 15.2 Physical Memory and Core Map in 4BSD


To provide faster page-in operations, the virtual memory handler
attempts to keep at least 5 percent of the total memory page frames on
the list of free page frames at all times. To achieve this purpose, a process called page
daemon is created. This daemon is activated periodically to
determine whether the number of page frames in the list of free
memory page frames is below 5 percent. If so, the page daemon
takes necessary action to free up more page frames; otherwise, it
sleeps. Using a single threshold at 5 percent causes frequent
activation and deactivation of daemon. To avoid this, some variants of
UNIX use two thresholds: high and low. The daemon is activated
when it finds the number of memory page frames in the free list lower
than the low threshold. On the other hand, if the number of memory
page frames in free list is found greater than the high threshold, the
daemon sleeps.

Page Replacement Algorithm


The page daemon executes the page replacement algorithm when it
discovers that the number of page frames in the list of free page frames is less than
a threshold value. Originally, Berkeley UNIX adopted the basic clock
algorithm for page replacement. However, in modern UNIX systems
with larger memory sizes, each pass of the basic clock algorithm takes
too much time. Therefore, these systems employ the two-handed clock
algorithm, a modified version of basic clock algorithm (discussed in
Chapter 8).
The two-handed clock algorithm is implemented with the help of
two pointers (instead of one pointer in basic clock algorithm). One
pointer is used to clear the reference bits of pages, while the other
pointer is used to check the reference bits. Both pointers are
incremented simultaneously. When the page daemon runs, a page
pointed to by the checking pointer whose reference bit is clear is
chosen as victim, and thus, removed from the memory and added to
the free list.
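A much simplified sketch of one activation of the two-handed clock pass is given below; the bookkeeping in a real kernel is considerably more involved, and the field names here are invented for illustration.

struct frame {
    int referenced;              /* reference bit, set by the hardware when the page is used */
    int free;                    /* non-zero if the frame is already on the free list */
};

/* Both hands were initialized a fixed distance apart and advance together:
   the leading hand clears reference bits, the trailing hand reclaims frames
   whose bit is still clear.  Returns the number of frames freed. */
int two_handed_clock(struct frame frames[], int nframes,
                     int *clear_hand, int *check_hand, int wanted)
{
    int freed = 0;
    for (int scanned = 0; scanned < nframes && freed < wanted; scanned++) {
        frames[*clear_hand].referenced = 0;

        struct frame *victim = &frames[*check_hand];
        if (!victim->referenced && !victim->free) {
            victim->free = 1;    /* page not touched in the meantime: add frame to the free list */
            freed++;
        }
        *clear_hand = (*clear_hand + 1) % nframes;
        *check_hand = (*check_hand + 1) % nframes;
    }
    return freed;
}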
In case the paging rate becomes too high and the number of page
frames in the free list is always found less than the threshold value,
the paging is stopped and the swapper comes into play. The swapper
removes one or more processes from the memory. It first examines
the memory to find those processes which have not been accessed
for 20 seconds or more. If such processes are found, the swapper
swaps out the one that has been idle for the longest. However, if no
such processes are found in memory, the swapper chooses the four
largest processes and swaps out the one which has been in the
memory for the longest time. This procedure is
continued (if required) until sufficient memory has been freed.

15.5.2 System Calls for Memory Management


Most UNIX systems provide us with system calls for memory
management. Some of the most common system calls used for managing
memory are described below; a usage sketch of mmap() and munmap() follows the list.
• brk(end_data_seg): This system call is used to specify the size of
data segment. It sets the highest address of a process’s data
region to end_data_seg. If the new value of end_data_seg is smaller
than the older one, the data segment shrinks; otherwise, it grows.
• mmap(address, len, prot, flags, fd, offset): This system call is
used to map a file in memory. Here, address specifies the memory
location where the file is to be mapped, len specifies the number
of bytes to be mapped, prot specifies the protection used for the
mapped file, such as readable, writeable, executable, etc., flags
indicates whether the file is private or sharable, fd is the file
descriptor of the file being mapped and offset specifies the
position in the file from where the mapping is to be started.
• munmap(address, len): This system call is used to remove those
pages of a mapped file from the memory that fall between the address
and (address + len) memory addresses. The rest of the file remains
mapped.
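The sketch below shows one common way these calls are combined to map an existing file read-only, access its contents as ordinary memory, and then remove the mapping. The file name data.txt is just a placeholder.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.txt", O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }

    struct stat sb;
    fstat(fd, &sb);                        /* the file size gives the mapping length */

    /* Map the whole file read-only; passing NULL lets the kernel choose the address. */
    char *p = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    write(STDOUT_FILENO, p, sb.st_size);   /* file contents accessed as memory */

    munmap(p, sb.st_size);                 /* remove the mapping */
    close(fd);
    return 0;
}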

15.6 FILE AND DIRECTORY MANAGEMENT


File system is the most visible part of any operating system. UNIX
provides a simple but elegant file system which uses only a limited
number of system calls. In this section, we discuss how files are implemented in
UNIX, the directory structure of UNIX and the system calls used for
file and directory management.

15.6.1 UNIX File System


A UNIX file is a stream of zero or more bytes containing arbitrary
data. In the UNIX file system, six types of files are identified, which
are listed in Table 15.1 along with their features.

Table 15.1 Types of Files in UNIX

Type of File Features


Regular (or ordinary)   This is the simplest type of file, containing
                        arbitrary data in zero or more bytes; it has no
                        internal structure and is treated as a stream of
                        characters.
Directory               This file is similar to an ordinary file but contains
                        a collection of file names plus pointers to their
                        respective inodes (discussed next); user programs
                        have only read access to a directory file and can
                        never write into it.
Special                 This file does not contain any data; it provides a
                        mechanism to map physical devices onto file names.
                        A special file is associated with each I/O device.
Named pipe              This file is associated with a pipe, an IPC
                        mechanism between UNIX processes. It buffers the
                        data sent by one of the communicating processes at
                        its input, which is read by another process from
                        the pipe's output.
Link                    This is not an actual file; rather, it is just
                        another file name for an existing file.
Symbolic link           This is a data file that stores the name of the
                        file to which it is linked.
UNIX does not distinguish between different types of files; it treats
all in the same way. It administers all types of files by means of
inodes.

Inode
Each file in UNIX is associated with a data structure known as inode
(index node). The inode of a specific file contains its attributes like
type, size, ownership, group, information related to protection of the
file, and the times of creation, last access and last modification. In addition, it also
contains the disk addresses of the blocks allocated to the file. Some
of the fields contained in an inode are described as follows.
• File owner identifier: File ownership is divided between an
individual owner and a group owner.
• File type: Files can be of different types, such as a normal file, a
directory, a special file, a character/block device file or a pipe.
• File access permission: The file is protected with respect to three
classes of users: the owner, the group owner and
other users. The rights to read, write and execute can
be set individually for each class. Execute permission for a
directory allows that directory to be searched for any filename.
• File access time: It is the time of the last modification and the last
access of the file.
The inodes of all files existing in the file system are stored in an
inode table stored on disk. A file can be opened by just bringing its
inode into the main memory and storing it into the inode table
resident in memory. This implies that the blocks of any file can be
found by just having the inode of that file in the main memory. This
proves to be a major advantage over the FAT scheme since, at any instant, only the
inodes of the open files need to be in main memory, thereby
occupying much less space in main memory than the FAT
scheme (discussed in Chapter 12). Moreover, the amount of space
reserved in main memory increases only with the number of open
files and not with the size of the disk.
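User programs normally reach this per-file information through the stat() system call, which copies the inode attributes of a named file into a structure. The sketch below prints a few of those fields; the default file name file1 is only an assumption.

#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

int main(int argc, char *argv[])
{
    struct stat sb;
    const char *name = (argc > 1) ? argv[1] : "file1";

    if (stat(name, &sb) == -1) { perror("stat"); return 1; }

    printf("inode number    : %lu\n", (unsigned long)sb.st_ino);
    printf("owner uid/gid   : %u/%u\n", (unsigned)sb.st_uid, (unsigned)sb.st_gid);
    printf("size in bytes   : %lld\n", (long long)sb.st_size);
    printf("permission bits : %o\n", sb.st_mode & 0777);
    printf("last modified   : %s", ctime(&sb.st_mtime));
    return 0;
}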
File Allocation
In UNIX, the disk space is allocated to a file in units of blocks and that
too dynamically as and when needed. UNIX adopts indexed
allocation method to keep track of files, where a part of index is
stored in the inode of files. Each inode includes a number of direct
pointers that point to the disk blocks containing data and three
indirect pointers, including single, double and triple. For instance, the
inode in the FreeBSD UNIX system contains 15 pointers, where the first 12
pointers contain addresses of the first 12 blocks allocated to the file,
while the remaining three are indirect pointers (see Figure 15.3). In case the file
size is more than 12 blocks, one or more of the following levels of
indirection are used as per the requirement.
• Single indirection: The 13th pointer in inode contains address of
a disk block, known as single indirect block, which contains
pointers to the succeeding blocks in the file.
• Double indirection: In case the file contains more blocks, the
14th pointer in inode points to a disk block (referred to as double
indirect block) which contains pointers to additional single
indirect blocks. Each of the single indirect blocks in turn contains
pointers to the blocks of file.
• Triple indirection: In case the double indirect block is also not
enough, the 15th address in the inode points to a block (referred
to as triple indirect block), which points to additional double
indirect blocks. Each of the double indirect blocks contains
pointers to single indirect blocks each of which in turn contains
pointers to the blocks of file.
Fig. 15.3 Structure of Inode in FreeBSD UNIX
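To get a feel for the file sizes this scheme can support, assume (purely for illustration) a block size of 4 KB and 4-byte disk-block pointers, so that one indirect block holds 1024 pointers. The 12 direct pointers then cover 12 × 4 KB = 48 KB, the single indirect block adds 1024 × 4 KB = 4 MB, the double indirect block adds 1024 × 1024 × 4 KB = 4 GB, and the triple indirect block adds a further 1024 × 1024 × 1024 × 4 KB = 4 TB, giving a maximum file size of roughly 4 TB under these assumptions.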

15.6.2 UNIX Directory Structure


A directory is a file with special format where the information about
other files is stored by the system. The directory structure is the
logical layout used for storing and managing files, and in UNIX it is
extremely simple. The files may
contain documents, pictures, programs, audio, video, executables, or
help information. For the organization of files and directories, UNIX
uses a hierarchical structure (often referred to as a directory tree) as
shown in Figure 15.4. In this structure, the directory at the topmost
level is named the root directory, designated by a forward slash "/". It
contains references to files and other lower level directories forming a
tree like structure. A directory in the file system tree may have many
children, but it can only have one parent. A file can hold information,
but cannot contain other files, or directories. To describe a specific
location in the file system hierarchy, “path” must be specified. The
path to a location can be defined as an absolute path from the root
anchor point, or as a relative path, starting from the current location.
Fig. 15.4 Typical Tree-like Directory Structure of UNIX

Many users find the UNIX directory structure extremely


convenient. The organization of files in UNIX is entirely different in
comparison to MS-DOS and Windows. In these operating systems,
all the devices like floppy disks and CD-ROMs are mounted at the
highest directory level. Also, in Windows, the files related to a
particular program are mostly stored together, whereas in UNIX files are
stored in specified directories according to their functionality.
For example, suppose a software package named WinRAR (Roshal
Archive) which is used for compression and decompression of data is
installed on our system. Files related to the configuration, help and
executables of WinRAR (Roshal Archive) are all stored together in
Windows, while in UNIX there are separate directories for
configuration, help and executables of any program or software.
One major difference lies in the representation format of files and
their path used by UNIX and Windows. In Windows, the path of a file
is represented as:
Drive:\Folder_Name\Sub_Folder_Name\File_Name.Extension

while in UNIX, the path of file is represented as:


/Folder_Name/Sub_Folder_name/File_Name.Extension

The main points to be focused while representing files, their


paths, and drives in UNIX are:
• Windows uses \ (backslash) while UNIX uses / (forward slash)
while representing the path of files.
• UNIX is case-sensitive, that is, the file XYZ.txt is different from xyz.txt
and Xyz.txt.
• Drives in UNIX are not named as C:, D:, E:, etc., as in Windows;
rather, / is the root partition where all the files, directories,
devices and drives are mounted.
Some of the standard subdirectories used in UNIX and the files
contained in them are:
• /bin: contains User Binaries
• /sbin: contains System Binaries
• /etc: contains Configuration Files
• /dev: contains Device Files
• /proc: contains Process Information
• /var: contains Variable Files
• /tmp: contains Temporary Files
• /usr: contains User Programs
• /home: contains User Directories
• /boot: contains Boot Loader Files
• /lib: contains System Libraries
• /opt: contains Optional add-on Applications
• /mnt: contains Mount Directory
• /media: contains Removable Media Devices
• /srv: contains Service Data

Directory Implementation
In UNIX, directories are implemented using a variation of linear list
method (discussed in Chapter 12). Each directory entry consists of
two fields: the file (or subdirectory) name which can be maximum 14
bytes and the inode number which is an integer of 2 bytes. The inode
number contains the disk address of the inode structure that stores
the file attributes and the address of the file’s data-blocks (see Figure
15.5). With this approach, the size of the directory entry is very small,
and the approach has certain advantages over a plain linear list
of directory entries.

Fig. 15.5 Directory Implementation in UNIX

Whenever a file or subdirectory is accessed, its associated inode
number is used as an index into the inode table. When a directory is
created, it has . and .. entries. The entry . has the inode number for
current directory, while the entry .. has the inode number for the
parent directory on the disk.
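In C, such a directory entry can be pictured roughly as the structure below; the sketch follows the style of early UNIX versions and the identifier names are illustrative.

#define DIRSIZ 14                       /* maximum length of a file or subdirectory name */

struct dir_entry {
    unsigned short d_ino;               /* 2-byte inode number; 0 marks an unused slot */
    char           d_name[DIRSIZ];      /* name, padded with '\0' bytes */
};                                      /* each entry therefore occupies 16 bytes */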

15.6.3 System Calls for File and Directory Management


UNIX provides a number of system calls for managing files and
directories. Some of these system calls are described as follows; a short
usage sketch follows the list.
• creat(file_name, rwx_mode): This system call creates a new file
whose name is specified by file_name and access permission
mode by rwx_mode. If the file with same file name already exists,
creat truncates the existing file.
• open(file_name, rwa_flag, a_mode): This system call opens the
file whose name is specified by file_name. The rwa_flag specifies
that the file is to be opened for reading, writing, appending, etc.
a_mode specifies the file permissions in case the file is being
created. The open system call returns the file descriptor of the file
being opened.
• lseek(fil_des, offset, origin): This system call changes the
position of the read-write pointer of the file identified by the file
descriptor fil_des and returns the new position. The position of the
pointer is expressed as offset bytes relative to the origin, which
could be the beginning, end or current position of the pointer.
• write(fil_des, buf, count): This system call writes count
bytes of data from the buffer buf to the file specified by the file
descriptor fil_des.
• read(fil_des, buf, size): This system call reads the size bytes
from the file identified by file descriptor fil_des into the user
buffer buf. Read also returns the number of bytes it has read
successfully.
• close(fil_des): This system call closes the file which has file
descriptor fil_des, making the file unavailable to the process.
• chdir(file_name): This system call changes the current directory
of the calling process to the directory named file_name.
• link(file_name1, file_name2): This system call sets another
name file_name2 for the already existing file named as
file_name1. The file could then be accessed using both the file
names, be it file_name1 or file_name2.
• unlink(file_name): This system call removes the directory entry
for the file file_name.
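The sketch below strings several of these calls together: it creates a file, writes to it, reopens it, repositions the read-write pointer and reads the data back. The file name demo.txt is a placeholder.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[16];

    int fd = creat("demo.txt", 0644);       /* create (or truncate) the file */
    if (fd == -1) { perror("creat"); return 1; }
    write(fd, "hello\n", 6);
    close(fd);

    fd = open("demo.txt", O_RDONLY);        /* reopen for reading */
    lseek(fd, 0, SEEK_SET);                 /* position the pointer at the beginning */
    ssize_t n = read(fd, buf, sizeof(buf)); /* read() returns the number of bytes read */
    write(STDOUT_FILENO, buf, n);
    close(fd);
    return 0;
}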

15.7 I/O MANAGEMENT


In UNIX, all the I/O devices can be treated as files and can be
accessed through the same system calls (read() and write()) as
used for ordinary files. In order to enable applications to access I/O
devices, UNIX integrates I/O devices into the file system as what are
called special files. Each of the I/O devices is assigned a pathname,
normally under the directory /dev. For example, a printer might be
accessed as /dev/lp. An important characteristic of special files is
that they can be accessed like regular files and no special system
calls are required to open, read, or write these files.
UNIX splits special files into two classes: block special files and
character special files. A block special file corresponds to a block
device (such as hard disk, floppy disk, CD-ROM, DVD, etc.) and
comprises a sequence of fixed-size numbered blocks. It allows
random access, that is, each block in a block special file can be
accessed individually. On the other hand, a character special file
corresponds to a character device that reads/writes stream of
characters such as mouse, keyboard, printer, etc. The character
special files do not support all the functionality provided by regular
files, nor do they need to.
Each I/O device in UNIX is uniquely identified by the combination
of major device number and minor device number. The major device
number is used to identify the driver associated with that device
while the minor device number is used to identify the individual
device in case the driver supports multiple devices. UNIX also
maintains separate hash tables for character and block I/O devices.
Both the hash tables store data structures containing pointers to the
procedures for opening, reading, or writing to the device. Whenever
an application accesses any special file, the file system checks
whether the accessed file is a block special file or character special
file. Then, it identifies the major and minor device numbers
associated with that file. The major device number is used to index
into the appropriate internal hash table (block or character) and the
minor device number is used as a parameter.
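A special file carrying a particular major and minor number is itself created with the mknod() system call, roughly as sketched below; the path and the device numbers are assumptions, and the call normally requires superuser privileges.

#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>      /* makedev() on glibc-based systems */

int main(void)
{
    /* Create a character special file with major number 1 and minor number 3
       (illustrative values only). */
    if (mknod("/tmp/mynull", S_IFCHR | 0666, makedev(1, 3)) == -1) {
        perror("mknod");
        return 1;
    }
    return 0;
}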

Handling Block I/O Devices


Block devices provide the main interface to all the disk devices in a
system. Therefore, the part of I/O system that handles block devices
aims to minimize the number of disk accesses during disk I/O. This
objective is achieved by employing a buffer cache that stores a large
number of most recently accessed disk blocks. Whenever a disk
block is to be read, the file system searches the cache to locate the
desired block. If the block is found then there is no need for a disk
access, which results in better system performance. However, if the
block is not in the cache, it is read from the disk, copied to the cache,
and then copied to wherever it is required. All the successive
requests for the same block can now be satisfied from the cache. The
cache works well in case of disk writes also. Whenever it is required
to write to disk blocks, it also goes to cache and not to the disk. At the
time cache grows above its specified limit, all the dirty blocks
(modified disk blocks) are transferred to the disk. Note that to avoid
any inconsistency, dirty blocks are transferred after every 30 seconds.

Handling Character I/O Devices


Handling character devices is relatively simple. As character devices
input or output a stream of characters, the character device drivers do
not allow random access to fixed-size blocks, and such access would not
make any sense. For instance, it is not meaningful (or even
possible) to access a specified block (say, 230) on a mouse. Thus,
there is no need of buffer cache for character I/O devices. Instead,
such devices use a character-buffering system that keeps small
blocks of characters (usually 28 bytes) in linked lists called C-lists.
As characters arrive from terminals and other character devices, they
are buffered in a chain of these blocks. C-lists are different from
buffer cache in the sense that the C-lists are read only once, that is,
as each character is read it is immediately destroyed; whereas buffer
cache can be read multiple times.
Furthermore, each terminal device is associated with a line
discipline—an interpreter for the data exchanged with that terminal
device. The characters are not passed directly from the C-list to the
process; rather, they pass through the line discipline. It acts as a filter, which
accepts the raw character stream from the terminal drivers,
processes it, and produces the cooked character stream. This
stream is produced after some local line editing, such as handling erased
characters and killed lines. This cooked stream is passed to
the process. In case a user process wants to interact with every
character as it is typed, it can put the line in raw mode so as to bypass the
line discipline. Raw mode is generally used in those cases where no
conversions are required such as sending binary data to other
computers over a serial line and for GUIs.
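On most UNIX systems this switch is made through the terminal attributes, roughly as in the sketch below; cfmakeraw() is a common convenience routine, though not universally available, and a real program would add error checking.

#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(void)
{
    struct termios saved, raw;

    tcgetattr(STDIN_FILENO, &saved);           /* remember the cooked settings */
    raw = saved;
    cfmakeraw(&raw);                           /* disable line editing, echo and signals */
    tcsetattr(STDIN_FILENO, TCSANOW, &raw);

    char c;
    read(STDIN_FILENO, &c, 1);                 /* delivered immediately, no Enter required */
    printf("got character %d\r\n", c);

    tcsetattr(STDIN_FILENO, TCSANOW, &saved);  /* restore cooked mode */
    return 0;
}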

15.8 ELEMENTARY SHELL PROGRAMMING


UNIX users invoke commands by interacting with command-language
interpreter also known as shell. The shell is built outside the kernel
and is written as a user process. The system invokes a copy of shell
for the user as he/she logs in, so that the related interactions could be
handled. There are five common shells namely, the Bourne, Korn,
TC, Bourne Again SHell and C shell with the program names sh, ksh,
tcsh, bash and csh, respectively. Since the Bourne shell is the simplest and
one of the most widely used UNIX shells today, we will discuss
commands with reference to it.
Shell can be used in one of the two ways, interactively or by
writing shell scripts.
• In the interactive mode, a single command or a short string of
commands is provided by user and the result is obtained
immediately.
• In shell scripting, there could be few lines or an entire program
typed in any text editor and executed as a shell script. If there are
many commands which are difficult to remember and they always
follow the same execution sequence, then it is worth preserving
them as a shell script.

15.8.1 Logging in UNIX


Whenever a user starts the UNIX system, he/she is prompted to log
into the system by providing a user id and password. The user id is a name
that is assigned by the administrator to the user. A user can access
the system only if correct user id and password are entered.

15.8.2 Basic Shell Commands


In this section, we will discuss few basic shell commands.
• pwd: It is used to know the present working directory. To know in
which directory you are currently working, type pwd and press
Enter as shown below.

The output indicates that user usr1 is currently logged in and is presently working
at directory dir5 of parent directory dept3.
• cat: Suppose we have a file named file1 and we need to display
its contents on screen. We can do this by executing the cat
command as follows.
$ cat file1
Book
Pen
Eraser
Handbook
Chalk
Duster
Blackboard
Diary

• ls: This command lists all the files in the current directory as


shown below.
$ ls
hello.c
palindrome.c
prime.pas
prog1.cpp
dir2

• cd: This command is used to change to a new directory. For


example, the following command changes the current directory to
mydir.
$ cd mydir

When cd is executed without any argument, it will take you back to


your home directory.
• wc: This command counts the number of lines, words and
characters in the specified file(s). For example, the following
command counts and displays the number of lines, words and
characters, respectively, in the file1.
$ wc file1
8 8 47 file1

In the above output, the first 8 represents the number of lines,
the next 8 represents the number of words and 47 represents the
number of characters in file1. Notice that the name of the file is also
displayed in the output. To count the number of lines, words and
characters individually, we can use the -l, -w and -c switches, respectively.

15.8.3 Standard Input, Output and Error


Keyboard is used as the standard input while terminal display is
normally used for both standard output and error output. However,
we can use symbols > and < to redirect the output to and take input
from disk file.

15.8.4 Re-direction
We use re-direction operators to modify the default input and output
conventions of UNIX commands. For example, the following command
will redirect the output of ls command to the file named
list_of_files.
$ ls > list_of_files
$

If the file list_of_files does not exist, it will be created; if it is


already present, then it will be overwritten with the output of ls
command. Now if we wish to check the contents of file list_of_files,
we can use the cat command as follows.
$ cat list_of_files
hello.c
palindrome.c
prime.pas
prog1.cpp
dir2

For re-directing the input, we use the operator < as shown in the
following command.
$ wc -w < file1
8

Here, the contents of file1 become the input for the command.
The command then counts the number of words in the input contents
and displays it. Notice that the name of the file is not displayed in the
output because the command does not know from where the input
is provided to it.

15.8.5 Wildcards
Wildcard characters are basically used in pattern matching. ‘*’ and ‘?’
characters generate one or more filenames which become part of the
effective command. First the wildcard expressions are resolved and
then the resultant expanded command is interpreted. ‘*’ matches with
any series of characters within the filenames of the directory while ‘?’
matches with a single character. For example to list all the files with
.c extension, we can use the following command.
$ ls *.c
hello.c
palindrome.c

To list the file whose name contains 10 characters and the


extension is .c, we can use the following command.
$ ls ??????????.c
palindrome.c

15.8.6 Filters
A filter is a program that inputs data from standard input, performs some
operation on the data, and outputs the result to the standard output.
Thus, it can be used in a pipeline between two programs. The
commonly used filters in UNIX are sort and grep.
• sort: This command is used to display the contents of a file in
ascending or descending order. By default, the contents of a file
are displayed in ascending order. As an example, consider the
following command that displays the contents of file1 in
ascending order.
$ sort file1
Blackboard
Book
Chalk
Diary
Duster
Eraser
Handbook
Pen
To display the contents in descending order, we can use the -r switch
as shown in the following command.
$ sort -r file1
Pen
Handbook
Eraser
Duster
Diary
Chalk
Book
Blackboard
• grep: This command searches for a specific pattern of characters
in the input. Lines containing the pattern are displayed as output,
if found. In the following example, grep is used to search for lines
containing ook in the filename file1.
$ grep 'ook' file1
Book
Handbook

The pattern in this example is a simple case of a more complex
specification technique known as regular expressions.

15.8.7 Shell Program


A sequence of shell commands stored in a text file forms a shell
program or script. This script is interpreted by shell as if the
commands were entered individually on the command prompt of
terminal.
Suppose, we have a text file file_display which contains a small
shell program to display a message on the screen. We can view the
contents of this file using cat command.
$ cat file_display
echo Welcome to the World of Shell Programming
The echo command is used to display the contents mentioned
after it on the terminal. We can execute the script named file_display
by writing its name at the prompt. When we execute this script, we
will get the following output.
$ file_display
Welcome to the World of Shell Programming

Variables
Like other programming languages, shell also provides the facility to
utilize the variables. Variables are declared simply by assigning value
to them. For example, the following statements declare two variables
and assign value to them.
$ val1=10
$ nam=ABC

Always remember to confirm that there are no spaces on either


side of the equals sign. All the variables are considered as a string of
characters. In the first example, though the value appears to be the
numeric 10, while processing, it will be considered a string
containing two characters, 1 and 0.
If the value to be assigned contains spaces, it must be explicitly
enclosed in single (or double) quotes as shown below.
$ par="The good old lady"

If quotes are missing, only the first word will be considered as the
value of the variable. To get the value of par variable, the operator $ is
placed before the name of the variable par as shown below.
$ echo $par
The good old lady

The $ operator is compulsory to differentiate the string par from


the variable par while echoing it.

Control Facilities
Shell also provides the decision and loop control structures. Though
these facilities can be invoked using UNIX commands interactively,
the main usage of these structures is basically in the context of shell
programming. The for statement has the syntax as mentioned below.
for variable in value_list
do

commands
done

Let us discuss it with the help of an example. Consider the following
script, which iterates over a list of values.
for k in 1 2 3 4 6 7
do
echo The value of k is $k
done

Output
The value of k is 1
The value of k is 2
The value of k is 3
The value of k is 4
The value of k is 6
The value of k is 7

There is a series of character strings (1, 2, 3, 4, 6 and 7)


separated by white space. Each item in the list is assigned to the
variable k one by one and, for each value of k, the echo command is
executed to display the output on the terminal.
As shell programming is a vast topic, only its elementary part is
covered in this chapter. It needs to be explored and
practiced to get a real flavor of it and command over it.

LET US SUMMARIZE
1. UNIX (officially trademarked as UNIX®) operating system is an open
source software that means complete source code is available with it, so
that one can customize the operating system.
2. UNIX development was started at AT&T’s Bell Laboratories by Ken
Thompson in 1969.
3. While designing UNIX, two concepts have been kept in mind. First about
the file system: it occupies ‘physical space’ and the second about the
process: it is supposed to have ‘life’.
4. In UNIX system, the process is the only active entity. Each process runs
as a single program and has single thread of control.
5. There are three types of processes that can be executed in UNIX: user
processes, daemon processes, and kernel processes.
6. UNIX allows a process to create multiple new processes by invoking the
fork() system call. The forking process is called parent while the newly
created process is termed as child.
7. The UNIX operating system is empowered with various interprocess
communication facilities, some of which include signals, pipes and shared
memory.
8. The processes in UNIX are executed in one of three modes at a particular
point of time. These modes are user mode, kernel non-interruptible mode,
and kernel interruptible mode.
9. Since UNIX is a time-sharing operating system, it basically follows the round-robin
scheduling algorithm with some variation, resulting in multilevel adaptive
scheduling.
10. UNIX employs a simple and straightforward memory model which not only
increases the program’s portability but also enables UNIX to be
implemented on systems with diverse hardware designs.
11. Earlier versions of UNIX system (prior to 3BSD) were based on swapping;
when all the active processes could not be kept in memory, some of them
were moved to the disk in their entirety. Berkeley added paging to UNIX
with 3BSD in order to support larger programs. Virtually, all the current
implementations of UNIX support demand-paged virtual memory system.
12. File system is the most visible part of any operating system. UNIX
provides a simple but elegant file system which uses only a limited
number of system calls.
13. A UNIX file is a stream of zero or more bytes containing arbitrary data. In
UNIX file system, the six types of files are identified, which are regular (or
ordinary) file, directory file, special, named pipe, link and symbolic link.
14. Each file in UNIX is associated with a data structure known as inode
(index node). The inode of a specific file contains attributes and the disk
addresses of the blocks allocated to the file.
15. The inodes of all files existing in the file system are stored in an inode
table stored on disk. A file can be opened by just bringing its inode into
the main memory and storing it into the inode table resident in memory.
16. UNIX adopts indexed allocation method to keep track of files, where a part
of index is stored in the inode of files. Each inode includes a number of
direct pointers that point to the disk blocks containing data and three
indirect pointers, including single, double and triple.
17. A directory is a file with special format where the information about other
files is stored by the system. UNIX uses a hierarchical directory structure
(often referred to as directory tree).
18. In UNIX, directories are implemented using a variation of linear list
method. Each directory entry consists of two fields: the file (or
subdirectory) name which can be maximum 14 bytes and the inode
number which is an integer of 2 bytes.
19. In UNIX, all the I/O devices can be treated as files and can be accessed
through the same system calls (read() and write()) as used for ordinary
files.
20. In order to enable applications to access I/O devices, Linux integrates I/O
devices into a file system as what are called special files.
21. UNIX splits special files into two classes: block special files and character
special files.
22. A block special file corresponds to a block device (such as hard disk,
floppy disk, CD-ROM, DVD, etc.) and comprises a sequence of fixed-size
numbered blocks. On the other hand, a character special file corresponds
to a character device that reads/writes stream of characters such as
mouse, keyboard, printer, etc.
23. Each I/O device in UNIX is uniquely identified by the combination of major
device number and minor device number. The major device number is
used to identify the driver associated with that device while the minor
device number is used to identify the individual device in case the driver
supports multiple devices.
24. UNIX users invoke commands by interacting with command-language
interpreter also known as shell. The shell is built outside the kernel and is
written as a user process.
25. There are five common shells namely, the Bourne, Korn, TC, Bourne
Again SHell and C shell with the program names sh, ksh, tcsh, bash and
csh, respectively.
26. Shell can be used in one of the two ways, interactively or by writing shell
scripts.
EXERCISES
Fill in the Blanks
1. The first GUI for UNIX was introduced by _____________.
2. _____________ are the background processes that are responsible for
controlling the computational environment of the system.
3. _____________ system call suspends the execution of the calling process
until it receives a signal.
4. The _____________ segment of the program in memory holds the
environment variables along with the arguments of command line which
was typed to the shell in order to invoke the program.
5. _____________ command searches for a specific pattern of characters in
the input.

Multiple Choice Questions


1. UNIX development was started at AT&T’s Bell Laboratories by Ken
Thompson in _____________.
(a) 1967
(b) 1969
(c) 1964
(d) 1968
2. Which of the following system calls does not fall under the file
management system calls?
(a) munmap()
(b) close()
(c) lseek()
(d) creat()
3. Which of the following is not a shell in UNIX?
(a) Bourne shell
(b) C shell
(c) Korn shell
(d) New shell
4. Which of the following is used as an IPC mechanism in UNIX?
(a) Pipes
(b) Signals
(c) Shared memory
(d) All of these
5. Which of the following system calls removes the directory entry for a
specified file?
(a) chdir()
(b) unlink()
(c) close()
(d) None of these

State True or False


1. UNICS stands for UNiplexed Information and Computing System.
2. Data through the pipe is transmitted in last-in-first-out order and cannot be
read twice.
3. The pwd command is used to change the password.
4. Block devices provide the main interface to all the disk devices in a
system.
5. Special files cannot be accessed like regular files.

Descriptive Questions
1. Write a short note on the history of UNIX.
2. Explain the architecture, role and function of UNIX kernel with the help of
a block diagram.
3. Write short notes on the following.
(a) Process management system calls
(b) Memory management system calls
(c) I/O management in UNIX
4. Describe the types of processes that can be executed in UNIX.
5. Explain the IPC mechanisms used in UNIX.
6. Give the reasons for dynamic variation of process priorities in UNIX.
7. List some standard subdirectories and files contained in them in UNIX.
8. Give the purpose of text, data and stack segment in address space.
9. How memory is managed in UNIX?
10. How files and directories are implemented in UNIX?
11. Explain the ways in which shell can be used.
12. Explain cat, ls and cd shell commands with the help of an example.
13. What is re-direction? Explain with an example how we can redirect input
and output in UNIX.
14. What does ‘*’ and ‘?’ characters stand for? Explain with the help of an
example.
15. What does wc command do? Explain with its switches.
chapter 16

Case Study: Linux

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ List different components of a Linux system.
⟡ Explain how processes and threads are created and terminated in
Linux.
⟡ Discuss how processes are scheduled.
⟡ Explain memory management strategies for physical and virtual
memory.
⟡ Describe the file system in Linux.
⟡ Explore how I/O devices are handled in Linux.

16.1 INTRODUCTION
Linux is a UNIX-like system whose development started in 1991 by
Linus Torvalds—a Finnish student at the University of Helsinki. It is a
multiuser multitasking operating system and its kernel is also
monolithic in nature. It comes with a GPL (GNU Public License) that
was devised by Richard Stallman, the founder of Free Software
Foundation (FSF). According to this license, the users may use, copy,
modify and redistribute the Linux source code and binary code freely
with the restriction that works derived from the Linux kernel may not
be sold or redistributed in binary-only form; the source code has
to be shipped together with the product or made available on
demand.
16.2 THE LINUX SYSTEM
The Linux system consists of three main components, namely, kernel,
system libraries and system utilities (see Figure 16.1).
The Linux kernel is the core of the Linux system as it provides an
environment for the execution of processes. It also provides various
system services to allow protected access to hardware resources.
The code contained in kernel is always executed in the processor’s
privileged mode (also called kernel mode) and as a consequence,
has complete access to all the physical resources of the system. The
Linux kernel does not contain any user-mode code.

Fig. 16.1 Components of the Linux System

The Linux kernel is created as a single, monolithic binary, that is,


the entire kernel code (such as device drivers, file systems and
networking code) and data structures are present in the single
address space. This eliminates the need for context switches when a
user application invokes some operating-system service or when
some hardware interrupt occurs, thereby improving performance.
In spite of being a single, monolithic binary, Linux kernel still provides
some modularity in form of loadable kernel modules. The Linux
kernel is capable of loading (and unloading) any module dynamically
at run time. Note that these modules are independently loadable and
always execute in kernel mode.
The system libraries contain all the operating-system-support
code that need not be executed in the kernel mode. They provide a
standard set of functions using which applications can interact with
the kernel. The applications are not allowed to directly interact with
the kernel; rather they make use of system libraries which in turn call
the desired operating system service. Whenever a user application
requests for some kernel-system-service, it makes call to system
libraries which collect the system call arguments and represent them
in a form necessary to invoke the system call. The system libraries
also offer some other routines that do not correspond to any system
call rather they perform some specific functions such as string
manipulation, mathematical operations and sorting.
The system utilities are a set of user-mode programs with each
program designed to perform an independent, specialized
management task. Some utility programs may need to be executed
only once such as those which are necessary for initializing or
configuring some aspect of the system. Other utility programs (known
as daemons) may need to be executed continuously such as those
for handling user login requests and responding to incoming network
connection requests.

16.3 PROCESS AND THREAD MANAGEMENT


Traditional UNIX systems support only a single thread of execution
per process; however, modern UNIX systems like Linux allow each
process to have multiple kernel-level threads. The information
pertaining to a process or thread is maintained in task_struct data
structure. This data structure contains information regarding the
current state of process, process scheduling, IPC mechanisms, files
opened by the process, virtual address space assigned to the process,
process identifier along with user and group identifiers, and the
registers and stack information (that is, context of process).
Note: Linux employs the same IPC mechanisms as in UNIX.

16.3.1 Creation and Termination of Processes and


Threads
Though Linux supports both processes and threads, it does not
distinguish between the two. Thus, it generally uses the term task to
refer to a flow of control in a program. To create a task, Linux supports the
system call fork() whose functionality is identical to that in UNIX. It
also provides the ability to create a task using the clone() system
call.
Whenever the clone() system call is invoked, multiple arguments (flags) are passed to it that specify the extent of sharing between the parent and child tasks. Some of the flags used in the clone() system call are as follows.
• CLONE_VM: This flag indicates the sharing of same memory space.
• CLONE_FS: This flag indicates the sharing of file-system
information.
• CLONE_FILES: This flag indicates the sharing of set of open files.
• CLONE_SIGHAND: This flag indicates the sharing of signal handlers.
If all of the above flags are set when clone() is invoked, the parent and child tasks share the corresponding resources, as specified by the flags. However, if none of the above flags is set on invocation of clone(), the parent and child tasks share nothing and the clone() call proceeds much like fork(), as illustrated in the sketch below.
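The following fragment is a minimal, hedged sketch (not taken from the text or the Linux sources) of how the glibc clone() wrapper might be invoked with the flags listed above to create a thread-like task that shares its parent's memory, file-system information, open files and signal handlers; the stack size, the child function and the SIGCHLD termination signal are illustrative choices.

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Function executed by the newly created task. */
static int child_fn(void *arg)
{
    printf("child task running, pid = %d\n", getpid());
    return 0;
}

int main(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL) {
        perror("malloc");
        exit(1);
    }

    /* Share memory, file-system information, open files and signal handlers
       with the parent; SIGCHLD lets the parent wait for the child as usual. */
    int pid = clone(child_fn, stack + STACK_SIZE,
                    CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD,
                    NULL);
    if (pid == -1) {
        perror("clone");
        exit(1);
    }

    waitpid(pid, NULL, 0);   /* wait for the child task to terminate */
    free(stack);
    return 0;
}

Passing none of the sharing flags (only SIGCHLD) would make the call behave essentially like fork().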
Unlike UNIX, Linux does not keep the entire context of a task
within the main process data structure; rather, it uses separate data
structures that hold different subcontexts, including file-system
context of process, virtual memory context, file-descriptor table, and
signal handler table. The task_struct data structure contains only
pointers to subcontexts’ data structures. This enables the sharing of a
subcontext among any number of processes just by including a
pointer to that subcontext in their task_struct data structure.
When clone() system call is invoked to create a new task, the
arguments passed specify which subcontexts are to be shared
between the parent and child task and which are to be copied. The
newly created task is always assigned a unique identifier and a new
scheduling context. However, depending on the arguments passed,
the new task may use the same subcontext data structures as used
by the parent task or may create new subcontext data structures
which are initialized with a copy of the parent's subcontext data structures.
To terminate a process, Linux provides the exit_group()
system call, which when invoked terminates a process along with all
its threads.
Note: The fork() system call is just a special case of clone() in
which all the subcontexts of parent are copied to the child and
nothing is shared between the two.

16.3.2 Process Scheduling


Linux has two separate classes of processes: real-time and non-real-time, and the real-time processes are given priority over non-real-time processes. The real-time processes are assigned priorities ranging from 0 to 99 (that is, 100 levels), where 0 denotes the highest priority. On the other hand, the priorities assigned to non-real-time processes range from 100 to 139 (that is, 40 levels), where 100 denotes the highest priority. Thus, Linux supports a total of 140 priority levels.
In Linux, the real-time processes can be scheduled in two ways:
first-come first-served (FCFS) and round robin (RR) within each
priority level (0-99). Accordingly, it provides two real-time scheduling
classes, one for real-time FCFS processes and another for real-time
RR processes. Within each scheduling class, each process has a
priority assigned to it and the CPU is always allocated to the process
having the highest priority. Note that a real-time FCFS process has
higher priority than a real-time RR process and it cannot be
preempted until it terminates, blocks, or voluntarily yields the CPU. On the
other hand, a real-time RR process is associated with a time slice
and thus, is preemptable by the clock.
Though the Linux scheduler always serves a real-time process
before non-real-time processes, the kernel cannot guarantee how
rapidly the process will be scheduled once it becomes ready for
execution. This is because the Linux kernel is non-preemptive; a
process running in kernel mode cannot be preempted, even if a real-
time process with a higher priority is ready to run. Thus, Linux offers
soft real-time scheduling rather than hard.
The non-real-time processes in Linux are scheduled in a time
sharing manner. But the notion of time slice differs from that of
conventional time sharing algorithm. In Linux, the time slice of a
process varies according to its priority; higher priority implies larger
time slice. For instance, the processes with priority 100 may get a
time slice of 800 msec while the processes with priority 139 may get
5 msec. Moreover, a process need not consume its entire time slice at once; it can use it in portions over a period of time.
The Linux scheduler uses a runqueue data structure that
contains the runnable processes. As Linux supports SMP, each
processor has its own runqueue data structure. Further, each
runqueue maintains two arrays: active and expired, which are
indexed from 0 to 139 corresponding to 140 priority levels (see Figure
16.2). The active array contains the processes that have their time
slices remaining and the expired array contains the processes that
have exhausted their time slices. In each of these arrays, the ith
position points to the list of processes with priority i.
At any instant, the scheduler selects the process with the highest priority (say, P) from the active array for execution. If the process P has executed for its entire time slice but has still not finished, it is moved to the expired array. However, if the process gets blocked (waiting for some event) before its time slice expires, it is put back in the active array with its time slice reduced by the amount of CPU time it has already consumed. Once the awaited event has occurred, the blocked process can be resumed. After all the processes in the active array
have exhausted their time slices (that is, active array is empty), the
active and expired arrays are exchanged; active array now becomes
the expired array and vice-versa. This way the processes are
scheduled in Linux.

Fig. 16.2 Active and Expired Arrays
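The following C fragment is a highly simplified, illustrative sketch of the runqueue idea described above; it is not kernel code, and the structure and function names are assumptions made for this example.

#include <stddef.h>

#define NR_PRIO 140                 /* 140 priority levels: 0 (highest) .. 139 */

struct task {
    int prio;                       /* priority of the task */
    int time_slice;                 /* remaining time slice in ticks */
    struct task *next;              /* next task at the same priority */
};

struct prio_array {
    struct task *queue[NR_PRIO];    /* queue[i]: runnable tasks of priority i */
};

struct runqueue {
    struct prio_array arrays[2];
    struct prio_array *active;      /* tasks with time slice remaining */
    struct prio_array *expired;     /* tasks that exhausted their time slice */
};

/* Pick the highest-priority runnable task; swap the arrays if active is empty. */
struct task *pick_next(struct runqueue *rq)
{
    for (int pass = 0; pass < 2; pass++) {
        for (int i = 0; i < NR_PRIO; i++) {
            if (rq->active->queue[i])
                return rq->active->queue[i];
        }
        /* Active array exhausted: exchange the active and expired arrays. */
        struct prio_array *tmp = rq->active;
        rq->active = rq->expired;
        rq->expired = tmp;
    }
    return NULL;                    /* no runnable task: run the idle task */
}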

16.4 MEMORY MANAGEMENT


As we know, memory management is concerned not only with the allocation of physical memory to the requesting processes but also with managing virtual memory. So, whenever we talk about the memory management of any operating system, we need to consider both physical memory and virtual memory.

16.4.1 Physical Memory Management


In Linux, due to several hardware limitations, different regions of physical memory cannot be dealt with in the same manner. Thus, Linux
divides the physical memory into three different zones, which are as
follows.
• ZONE_DMA: refers to physical memory regions used for DMA
purposes.
• ZONE_NORMAL: refers to physical memory regions used for satisfying routine memory space requests; this area is permanently mapped into the kernel's address space.
• ZONE_HIGHMEM: refers to physical memory regions used for pages with high-memory addresses; this area is not permanently mapped into the kernel's address space.
Note: The layout of the memory zones depends upon the
architecture of the system.
For each zone, the kernel maintains a separate page allocator to
manage memory individually. In order to keep track of free pages in
physical memory, the page allocator uses the buddy algorithm. In
this algorithm, if a memory request for a small chunk cannot be
fulfilled by the smallest available chunk, then the available chunk is
divided into two buddies of equal size. If the resulting chunks are still
too large to accommodate the request, one of them is further
subdivided into two equal-size buddies. This process continues until a
chunk of the desired size is obtained. Note that the buddy algorithm
always allocates memory in power-of-2 units. For example, a request for 5 KB will always be allocated an 8 KB block of memory.
To understand the buddy algorithm, suppose the smallest
available memory block is of 32 KB and a request for 4 KB arrives. To
fulfill this request, the block of 32 KB is divided into two blocks of 16
KB each. Since 16 KB is still larger, one block of 16 KB is further
subdivided into two blocks of 8 KB each. This process continues until a block just large enough to accommodate the request (in our case, 4 KB) is available. This whole process is depicted in Figure 16.3.
Fig. 16.3 Buddy Algorithm

The allocator would then allocate one of the 4 KB blocks to the requesting process and keep the remaining blocks of 4 KB, 8 KB, and 16 KB on the free block list. Though this algorithm is simple, it results in internal fragmentation; each memory request is rounded up to a power of 2, which generally leads to the wastage of a large amount of memory.
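As a hedged illustration of this rounding behaviour (the function name and the minimum block size are assumptions, not part of the Linux allocator), the following C helper computes the power-of-2 block size that the buddy algorithm would hand out for a given request.

#include <stddef.h>

/* Return the smallest power-of-2 block (at least min_block bytes) that can
   satisfy the request; the difference is lost to internal fragmentation. */
static size_t buddy_block_size(size_t request, size_t min_block)
{
    size_t size = min_block;
    while (size < request)
        size <<= 1;          /* keep doubling until the request fits */
    return size;
}

/* Example: buddy_block_size(5 * 1024, 4096) == 8192, so a 5 KB request gets
   an 8 KB block and 3 KB is wasted. */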

16.4.2 Virtual Memory Management


The virtual memory system is responsible for creating virtual pages and
managing the transfer of those pages from disk to memory and vice-
versa. In Linux, virtual memory is considered from two different views,
namely, logical view and physical view.
• Logical view: According to this view, the virtual address space of
a process comprises a set of homogeneous, contiguous, page-aligned areas or regions. Each area is described by a vm_area_struct structure, which lists the properties of that area, such as its read/write permissions.
• Physical view: According to this view, the virtual address space
consists of a set of pages. The information related to pages is
stored in a page table. Using the page table, the location of each
page of virtual memory can be determined.

Creating Virtual Address Space


A new virtual address space is created by the kernel when either an
entirely new code is executed by a process using the execve()
system call or a process spawns a child process using the fork()
system call. Whenever a process is created using execve() system
call, a new virtual address space is assigned to that process. On the
other hand, whenever a process is created using fork() system call,
a new virtual address space is created but it is an exact replica of that
of the parent process. The parent process's vm_area_struct descriptors are copied and a new set of page tables is created for the child process. The parent's page tables are copied directly into the child's page tables, thereby allowing both the parent and child process to share the same physical pages in their virtual address spaces.
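The following minimal C sketch (the program being executed and its arguments are illustrative assumptions) shows both paths: fork() gives the child a replica of the parent's virtual address space, while execve() discards that replica and builds a fresh address space from the named executable.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();              /* child starts with a copy of this address space */
    if (pid == 0) {
        /* execve() replaces the copied address space with a new one built
           from /bin/ls (path and arguments are illustrative). */
        char *argv[] = { "ls", "-l", NULL };
        char *envp[] = { NULL };
        execve("/bin/ls", argv, envp);
        perror("execve");            /* reached only if execve() fails */
        return 1;
    }
    waitpid(pid, NULL, 0);           /* parent waits for the child */
    return 0;
}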

Paging
Similar to UNIX, Linux also relies on paging, that is, the transfers
between memory and disk are always done in units of pages. The
page replacement in Linux is also performed using a variation of the
clock algorithm (discussed in Chapter 8). Each page is associated
with an age that indicates how frequently the page is accessed.
Obviously, the pages that are accessed frequently will have a higher age than the less frequently accessed pages.
During each pass of clock, the age of a page is either increased or
decreased depending on its frequency of usage. Whenever a page is
to be replaced, the page replacement algorithm chooses the page
with the least value of age.

16.5 FILE SYSTEM


The earliest file system used in Linux was the Minix file system. It
was restricted by short filenames up to 14 characters and maximum
file size of 64 MB. Therefore, a new, improved file
system, known as extended file system (extfs), was developed.
This file system supported larger filenames and greater file size but it
was slower than the Minix file system. So, the ext file system was redesigned to add the missing features and improve performance, which gave rise to the ext2 file system, also called the second extended file system. The ext2fs has now become the
standard file system of Linux. Apart from ext2fs, Linux offers a variety
of other file systems also.
To support different file systems, the Linux kernel offers virtual file
system (VFS), which hides the differences among the various file
systems from the processes and applications. Here, we discuss the Linux VFS first, followed by the other file systems of Linux.

Virtual File System


The Linux VFS is based on the principles of object-oriented
programming. It defines four object types, namely, inode, file, superblock, and dentry, and each object type is associated with a set of operations. The description of the VFS object types is as follows.
• inode: An inode (a shortened form of index node) object
describes a specific file. VFS defines an inode object
corresponding to each file in the file system. Since devices and
directories in Linux are also treated as files, they have
corresponding inodes also. An inode is maintained on the
physical disk as a data structure that contains pointers to the disk
blocks storing the actual file contents.
• file: The file object describes an open file associated with a
process. Before accessing the contents of an inode object, a
process needs to obtain the file object which points to the inode.
The file object also keeps track of the current position in file
where the read/write operation is being performed. Note that
there can be multiple file objects corresponding to a single inode
object with each file object belonging to a single process.
• superblock: The superblock object describes a set of linked files
that constitute an independent file system. The main task of
superblock object is to provide access to the inodes. Each inode
is identified in VFS by a unique pair of file-system and inode
number. Whenever an inode is to be accessed, the VFS passes
the request to the superblock object which then returns the inode
with that number.
• dentry: The dentry object describes a directory entry which may
comprise the actual file name or the name of the directory in path
name of the file. For example, for a file with the pathname
/bin/include/conio.h, there will be four dentry objects
corresponding to four directory entries /, bin, include, and
conio.h.

16.5.1 Linux ext2 File System


The Linux ext2fs is the most popular on-disk file system in use. It
uses the same mechanism for storing the pointers to data blocks and
finding the data blocks corresponding to a file as used in UNIX BSD
Fast File System (FFS). In this file system, the inodes have a fixed
size and can accommodate only a fixed number of pointer entries. An
inode holds fifteen block pointers, of which the first twelve are ‘direct’ pointers and the remaining three are ‘indirect’ (single, double, and triple indirect) pointers.
Direct pointers point directly to the data blocks, whereas indirect
pointers point to index blocks which further point to data blocks.
Though in Linux the directory files are treated as normal files, they
are interpreted in a different manner. Each data block of a directory
file contains a linked list. Each entry of the linked list stores the file name, the inode number of the inode associated with that file, the length of the entry,
and the information about the group of blocks allocated to the file.
The major difference between ext2fs and FFS lies in their disk allocation policies. The ext2fs performs allocations in small units, with block sizes of 1 KB, 2 KB or 4 KB, in contrast to FFS, where allocation is performed in large blocks of 8 KB each.

16.5.2 Linux ext3 File System


The Linux ext2 file system improved the performance of the file system by allocating blocks of small size, but in the case of sudden system failures and breakdowns, its behaviour was not satisfactory: recovery was slow and the file system could be left in an inconsistent state. These shortcomings of the ext2 file system led to the ext3 file system, which uses the concept of journaling. Journaling refers to
the process of maintaining a log (journal) in which the changes made
to the file system are recorded in a sequential order. The changes
written sequentially help in reducing the overheads due to disk head
movements at the time of random disk accesses as already explained
in log-structured file systems in Chapter 12.

16.5.3 Linux proc File System


The Linux proc (process) file system does not store files persistently; instead, the contents of its files are generated on demand whenever they are read. The basic idea is that for each individual process in the system, a directory is created in the /proc file system. The name of this directory is the decimal number corresponding to the process's PID, such as /proc/345.
Inside this directory are the virtual files (not actually stored on the
disk) that seem to store process-related information like its signal
masks, command line, etc. When a user needs to read these virtual
files, the system retrieves the desired information from the actual
process at that time and returns it.
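As a small, hedged example (the choice of file is illustrative), the following C program reads one such virtual file, /proc/self/status, whose contents the kernel synthesizes from the current state of the calling process at the moment the file is read.

#include <stdio.h>

int main(void)
{
    char line[256];

    /* /proc/self is a symbolic link to /proc/<PID> of the calling process;
       the 'status' file is generated by the kernel on demand. */
    FILE *fp = fopen("/proc/self/status", "r");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }

    while (fgets(line, sizeof(line), fp) != NULL)
        fputs(line, stdout);     /* prints name, state, PID, signal masks, ... */

    fclose(fp);
    return 0;
}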

16.6 I/O MANAGEMENT


Like UNIX, Linux also splits special files into two classes: block
special files and character special files. Each I/O device in Linux is
also uniquely identified by the combination of major device number
and minor device number. However, Linux introduces a few enhancements in the way the block and character I/O devices are handled.
Handling Block I/O Devices
In Linux also, the part of the I/O system that handles block devices aims to minimize the number of disk accesses during
disk I/O. For this, it also employs a buffer cache between the disk
drivers and the file system. However, Linux also aims at minimizing
the latency of repetitive disk head movements. To achieve this
objective, it relies on I/O scheduler that schedules the disk I/O
requests in an order that optimizes the disk access. The basic
scheduler of Linux is the Linus Elevator, which controls the order in which I/O requests are added to and removed from each device queue. A list is
maintained to store the disk I/O requests with the requests arranged
in an increasing order of the address of desired sectors (sorted list).
New I/O requests are added to the list in a sorted order thereby
avoiding repetitive disk head movements. However, this scheduler
may lead to starvation of requests (towards the end of the list) if more and more requests keep getting inserted ahead of them in the sorted list.
Therefore, modern versions of Linux use a different scheduler named
deadline scheduler.
The deadline scheduler works in the same way as elevator
scheduler; however, it avoids starvation by associating a deadline
with each request. It maintains two additional lists: one for read requests and the other for write requests, both of which are ordered by deadline. Each I/O request is inserted in the sorted list as well as in either the read list (in case of a read request) or the write list (in case of a write request). Normally, the requests are scheduled from the sorted
list but in case the deadline of any request from read or write list
expires, the requests are scheduled from the list containing the
expired request. Thus, deadline scheduling ensures that requests are serviced close to their specified deadlines and that no request is starved indefinitely.
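The fragment below is a simplified, illustrative sketch of the dispatch decision just described; it is not the kernel's implementation, and the structure and function names are assumptions made for the example.

struct request {
    unsigned long sector;       /* target sector, used for the sorted list */
    unsigned long deadline;     /* absolute expiry time of the request */
    int is_read;
    struct request *next_sorted, *next_fifo;
};

struct deadline_queues {
    struct request *sorted;     /* all requests, ordered by sector */
    struct request *read_fifo;  /* read requests, ordered by deadline */
    struct request *write_fifo; /* write requests, ordered by deadline */
};

/* Choose the next request to service at time 'now'. */
struct request *dispatch(struct deadline_queues *q, unsigned long now)
{
    /* Serve an expired request first to avoid starvation ... */
    if (q->read_fifo && q->read_fifo->deadline <= now)
        return q->read_fifo;
    if (q->write_fifo && q->write_fifo->deadline <= now)
        return q->write_fifo;

    /* ... otherwise serve in sector order to minimize head movement
       (may be NULL if no requests are pending). */
    return q->sorted;
}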

Handling Character I/O Devices


In Linux, each character device driver implementing a terminal device
is represented in the kernel with the help of a tty_struct data
structure (instead of C-lists). The tty_struct structure provides
buffering for the data stream from the terminal device and feeds that
data to the line discipline. The most common line discipline for the
terminal devices is the tty discipline. This line discipline enables the
user processes to directly interact with the terminal by attaching the
terminal’s input/output with the standard input/output streams of the
user process. Linux also allows a user to put the line in raw mode so
as to bypass the line discipline.

LET US SUMMARIZE
1. Linux is a UNIX-like system whose development started in 1991 by Linus
Torvalds—a Finnish student at the University of Helsinki. It is a multiuser
multitasking operating system and its kernel is also monolithic in nature.
2. The Linux system consists of three main components, namely, kernel,
system libraries and system utilities.
3. The Linux kernel is the core of the Linux system as it provides an
environment for the execution of processes. It also provides various
system services to allow protected access to hardware resources.
4. The system libraries contain all the operating-system-support code that
need not be executed in the kernel mode. They provide a standard set of
functions using which applications can interact with the kernel.
5. The system utilities are a set of user-mode programs with each program
designed to perform an independent, specialized management task.
6. Traditional UNIX systems support only a single thread of execution per
process; however, modern UNIX systems like Linux allow each process to
have multiple kernel-level threads.
7. The information pertaining to a process or thread is maintained in
task_struct data structure.
8. Linux generally uses the term task to refer to a flow of control in a program. To
create a task, Linux supports the system call fork() whose functionality is
identical to that in UNIX. It also provides the ability to create a task using
the clone() system call.
9. To terminate a process, Linux provides the exit_group() system call,
which when invoked terminates a process along with all its threads.
10. Linux has two separate classes of processes: real-time and non-real-time
and the real-time processes are given priority over non-real-time
processes.
11. In Linux, the real-time processes can be scheduled in two ways: first-come
first-served (FCFS) and round robin (RR).
12. The non-real-time processes in Linux are scheduled in a time sharing
manner. But the notion of time slice differs from that of conventional time
sharing algorithm. In Linux, the time slice of a process varies according to
its priority; higher priority implies larger time slice.
13. In Linux, due to several hardware limitations, the physical memory regions cannot be dealt with in the same manner. So, Linux divides the physical
memory into three different zones, which are ZONE_DMA,
ZONE_NORMAL, and ZONE_HIGHMEM.
14. The virtual memory is responsible for creating virtual pages and managing
the transfer of those pages from disk to memory and vice-versa.
15. Linux provides a variety of file systems including ext2fs, ext3fs, and proc
file system. To support these different file systems, the Linux kernel offers
virtual file system (VFS), which hides the differences among the various
file systems from the processes and applications.
16. Like UNIX, Linux also splits special files into two classes: block special
files and character special files. Each I/O device in Linux is also uniquely
identified by the combination of major device number and minor device
number.
17. In addition to minimizing the number of disk accesses during disk I/O,
Linux also aims at minimizing the latency of repetitive disk head
movements. To achieve this objective, it relies on I/O scheduler that
schedules the disk I/O requests in an order that optimizes the disk
access. The basic scheduler of Linux is Linus Elevator that exploits the
order in which I/O requests are added or removed from each device
queue.
18. Modern versions of Linux use a different scheduler named deadline
scheduler. It works in the same way as elevator scheduler; however, it
avoids starvation by associating a deadline with each request.
19. In Linux, each character device driver implementing a terminal device is
represented in the kernel with the help of a tty_struct data structure,
which provides buffering for the data stream from the terminal device and
feeds that data to the line discipline.

EXERCISES
Fill in the Blanks
1. _____________ are a set of user-mode programs with each program
designed to perform an independent, specialized management task.
2. For each zone, the kernel maintains a separate _____________ to
manage memory individually.
3. Each I/O device in Linux is uniquely identified by a combination of
_____________ and _____________.
4. The _____________ contain all the operating-system-support code that
need not be executed in the kernel mode.
5. In Linux, the information pertaining to a process or thread is maintained in
_____________ data structure.

Multiple Choice Questions


1. In which year did the development of Linux start?
(a) 1990
(b) 1989
(c) 1991
(d) 1992
2. Which of the following system calls in Linux allows the creation of a task?
(a) create_task()
(b) fork()
(c) process()
(d) None of these
3. The Linux physical memory is divided into _____________ different memory
zones.
(a) Two
(b) Three
(c) Four
(d) None of these
4. Which of the following VFS object types describes a set of linked files that
constitute an independent file system?
(a) dentry
(b) superblock
(c) inode
(d) file
5. How many priority levels does Linux support?
(a) 99
(b) 139
(c) 100
(d) 140

State True or False


1. The Linux kernel does not contain any user-mode code.
2. Linux employs swapping rather than paging.
3. Linux ext2 file system was restricted by the short filenames up to 14
characters and maximum file size of 64 MB.
4. Linus Elevator is the basic scheduler of Linux.
5. In Linux, the tty_struct structure provides buffering for the data stream
from the terminal device.

Descriptive Questions
1. Explain various components of a Linux system.
2. Describe different object types defined by the virtual file system.
3. What do the active array and expired array in Linux contain?
4. Differentiate between logical and physical view of virtual memory.
5. When does the kernel create a new virtual address space?
6. Suppose we have a block of 512 KB available in physical memory. How
would the buddy system of Linux serve the following memory requests
coming in the shown order?
(a) 120 KB
(b) 60 KB
(c) 30 KB
(d) 80 KB
Illustrate the memory allocations diagrammatically.
7. “Linux offers soft real-time scheduling rather than hard”. Explain.
8. What does the file object in Linux VFS describe?
9. Write short notes on the following.
(a) Linux ext2 file system
(b) Linus Elevator
(c) Deadline scheduler
(d) Journaling
10. What are the objectives of the part of I/O system handling block devices?
11. Explain how character I/O devices are handled in Linux.
12. Explain how processes and threads are created in Linux.
Chapter 17

Case Study: Windows

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
⟡ Understand the structure of Windows 2000 operating system.
⟡ Explain various mechanisms used for communication between
processes.
⟡ Discuss how threads are scheduled.
⟡ Explain how memory is managed in Windows 2000.
⟡ Describe the file system in Windows 2000.
⟡ Explore how I/O devices are handled in Windows 2000.

17.1 INTRODUCTION
Microsoft Windows has been the most popular series of operating systems over the past decade. Windows 95 revolutionized the personal computer
operating system market. Then came Windows 98, Windows ME,
Windows NT, Windows 2000, Windows XP, Windows Vista, Windows
7, and the latest one is Windows 8. In this chapter, we discuss how various operating system concepts are implemented in Windows 2000.
Windows 2000 is a 32-bit preemptive multitasking operating
system for Intel Pentium and later microprocessors. Being the
successor of Windows NT 4.0 and having the user interface of Windows 98, Windows 2000 was originally going to be named Windows NT 5.0; however, in 1999, Microsoft renamed it Windows 2000 so that the users of both Windows 98 and NT could see the neutral name as the next logical step for them. In fact, Windows 2000 is essentially an improved Windows NT with the user interface of Windows 98. It included various features which were previously available only in Windows 98, such as
support for USB bus, plug and play devices and power management.
In addition, it introduced some new features including X.500-based
directory service, support for smart cards and security using
Kerberos.
Keeping pace with previous versions of Windows NT, Windows
2000 was also released in several versions. Microsoft introduced four
versions of Windows 2000: Windows 2000 Professional intended for
desktop use, Windows 2000 Server, Windows 2000 Advanced Server
and Windows 2000 Datacenter Server. Despite minor differences among these versions, the same binary executable file was used for all of them.

17.2 STRUCTURE
Windows 2000 comprises two major parts: the operating system and
the environmental subsystems. The operating system is organized
as a hierarchy of layers, each layer utilizing the services of the layer
underneath it. The main layers include hardware abstraction layer
(HAL), kernel and executive, each of which runs in protected (kernel)
mode. The environmental subsystems are a collection of user-mode processes which enable Windows 2000 to execute programs developed for different operating environments such as Win32, POSIX and OS/2. Each subsystem provides an operating environment for a
single application. However, the main operating environment of
Windows 2000 is the Win32 subsystem and therefore, Windows 2000 facilitates the use of Win32 API (Win32 Application Programming Interface) calls. In this section, we will discuss only the operating
system’s structure.

17.2.1 Hardware Abstraction Layer (HAL)


As the name implies, the role of HAL is to hide the hardware
differences and present the upper layers with abstract hardware
devices. HAL conceals many of the machine dependencies within itself and exports a virtual machine interface which is used by the rest of the
operating system and device drivers. This approach helps to make
Windows 2000 portable as the hardware need not be addressed
directly thereby requiring only minor changes in the kernel and drivers
when being ported to new hardware. Some of the services that HAL
provides include interrupt handling and resetting, accessing device
registers, DMA transfers, bus-independent device addressing and
controlling timers and real-time clock. In addition, HAL provides
support for symmetric multiprocessing.

17.2.2 Kernel
The role of kernel is to provide a higher-level abstraction of the
hardware to the executive and environmental subsystems. Other
responsibilities of kernel include thread scheduling, low-level
processor synchronization, exception handling and recovery after
power failure. The main characteristic of Windows 2000 kernel is that
it is permanently resident in the main memory and its execution is
never preempted.
The kernel of Windows 2000 is object-oriented, that is, it uses a
set of object types to perform its functions. An object type is a
system-defined data type that consists of a set of attributes as well as
a set of methods. Each instance of an object type is referred to as an
object. The kernel supports two classes of objects, namely, control
and dispatcher objects. The attributes of both of these objects contain
the kernel data and the methods of these objects perform the
activities of kernel. The control objects include those objects which
control the system while the dispatcher objects include those
objects which handle dispatching and synchronization in the system.
Table 17.1 lists some kernel objects along with their use.
Fig. 17.1 Structure of Windows 2000 System

Table 17.1 Kernel Objects

17.2.3 Executive
The executive offers a variety of services that can be used by
environmental subsystems. These services are grouped under
several components, some of which include the I/O manager, object manager, process manager, plug and play (PnP) manager, security manager, power manager and virtual memory manager. The description of all these components is as follows.
• I/O manager: The I/O manager includes file system, cache
manager, device drivers and network drivers. Under I/O manager,
the file systems are technically treated as device drivers and in
Windows 2000, two such drivers exist, one for the FAT and
another for NTFS. The I/O manager keeps track of which
installable file systems are loaded. It is also responsible for
controlling the cache manager that deals with caching for the
entire I/O system. Another major responsibility of I/O manager is
to provide generic I/O services to the rest of the system.
Whenever an I/O request arrives, the I/O manager invokes the
appropriate device driver to perform physical I/O, thereby
providing device-independent I/O. In addition, it facilitates one
device driver to call another.
• Object manager: As already described, object is the basic
component that Windows 2000 operating system uses for
performing all its functions. It is the responsibility of object
manager to monitor the usage of all the operating system objects
by keeping track of which processes and threads are accessing
which objects. It provides processes and threads a standard
interface (called handles) to all types of objects. Whenever a
process or thread needs to access some object, it calls open()
method of object manager that in turn generates an object
handle (an identifier unique to the process) and returns it to the
requesting process or thread. The object manager is also
responsible for allocating a part of kernel address space to the
object at the time of its creation and returning the same to the free
list at the time of its termination.
• Process manager: As the name suggests, Windows 2000
process manager is responsible for process and thread
management including their creation, termination and use.
However, it is not aware of process hierarchies; those details are
known only to the specific environmental subsystem to which the
process belongs. When some application, say Win32 application,
needs to create a new process, it invokes the appropriate system
call. As a result, a message is passed to the corresponding
subsystem (in our case, Win32 subsystem) which then calls the
process manager. The process manager, in turn, calls the object
manager for creating a process object and returns the object
handle (corresponding to the newly created process) generated
by the object manager to the subsystem. Once handle to the new
process has been received, the subsystem again calls the
process manager for creating a thread for the new process. The
same process is repeated and a handle to thread is returned to
the subsystem. The subsystem then passes both the handles to
the requesting application.
• Plug and play (PnP) manager: Whenever new hardware is installed in the system or some changes are made to the existing
hardware configuration, there should be some entity that
recognizes these changes and adapts to those changes. Such
entity in operating system is the PnP manager. The PnP manager
automatically recognizes the devices installed in the system and
detects changes (if any) while the system operates. It is also
responsible for locating and loading the appropriate device drivers
to make the devices work. For example, when a USB device is
attached to the system, a message is passed to the PnP
manager, which then finds and loads the appropriate driver.
• Security manager: Windows 2000 employs a security mechanism that conforms to the U.S. Department of Defense’s C2 requirements specified in the Orange Book. This book specifies several rules
such as secure login, privileged access control, address space
protection per process and so on, which the operating system
must follow in order to be classified as secure system. The
security manager is responsible for ensuring that the system
always functions conforming to these rules.
• Power manager: The power manager is responsible for supervising the usage of power in the system, and takes necessary actions to reduce the power consumption and maintain the integrity of information. For example, whenever the monitor has
been idle for a while, the power manager turns it off to save
energy. Similarly, on laptops, when the battery is about to run dry,
the power manager takes appropriate actions which inform open
applications to save their files and get ready for shutdown.
• Virtual memory manager: The virtual memory manager in
Windows 2000 uses the demand-paged management scheme. It
is responsible for allocating and freeing virtual memory, mapping
of virtual addresses to physical address space, enforcing
protection rules to restrict each process to access pages in its
own address space and not of others, and so on. It also facilitates
memory-mapped file I/O with the help of I/O manager.

17.3 PROCESS AND THREAD MANAGEMENT


Like many contemporary operating systems, Windows 2000 also
uses the notion of processes and threads. Each process comprises
one or more threads (the units of execution scheduled by the kernel)
with each thread comprising multiple fibers (the lightweight threads).
Furthermore, the processes to be handled as a unit can be combined
to form a job. The concept of jobs, processes, threads and fibers are
used together to achieve parallelism and manage resources in both
single and multiprocessor environment.
Each job is associated with some quota and resource limit which
indicate information such as maximum number of processes a job
can have, maximum CPU time each process can utilize, maximum
CPU time for all the processes in a job, memory limit of each process,
memory limits of all processes combined, and so on. Further, each
process has a process ID, 4 GB virtual memory address space,
information such as its base priority (the actual priority assigned at
the time of creation) and an affinity for one or more processors, a list
of handles (managed in kernel mode), and an access token to hold
security information. Each thread of a process has its own ID and
state which includes its priority, affinity and accounting information.
Note: The IDs for processes and threads are taken from the same ID space; hence, a given ID value can identify either a process or a thread at a time, but not both.
To create a process, the Win32 CreateProcess() application call is
executed, passing the name of an executable file as a parameter to it.
This executable file determines the initial contents of the process's address space, and the call also creates the process's first thread. Though every process begins with a
single executing thread, additional threads can be created later on
using the Win32 CreateThread() application call. Note that in
Windows 2000, the kernel performs the thread creation and thus,
threads are not implemented strictly in user space.
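The fragment below is a minimal, hedged sketch of these two Win32 calls; the executable being launched, the thread routine and the error handling are illustrative assumptions rather than code from the text.

#include <windows.h>
#include <stdio.h>

/* Thread routine executed by the additional thread created below. */
DWORD WINAPI worker(LPVOID param)
{
    (void)param;
    printf("worker thread running\n");
    return 0;
}

int main(void)
{
    STARTUPINFO si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    char cmd[] = "notepad.exe";      /* illustrative command line */

    /* CreateProcess() builds a new process (and its first thread) from the
       named executable file. */
    if (!CreateProcess(NULL, cmd, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi)) {
        printf("CreateProcess failed: %lu\n", GetLastError());
        return 1;
    }

    /* CreateThread() adds a second thread to the current process. */
    HANDLE hThread = CreateThread(NULL, 0, worker, NULL, 0, NULL);
    if (hThread != NULL) {
        WaitForSingleObject(hThread, INFINITE);
        CloseHandle(hThread);
    }

    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    return 0;
}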

17.3.1 Inter-process Communication (IPC)


Windows 2000 offers a wide variety of mechanisms to let the
processes or threads communicate with each other. Some of the
standard mechanisms include pipes, mailslots, sockets, remote
procedure calls (RPC) and shared memory. Socket and RPC
mechanism have already been discussed in Chapter 2. In addition,
the concept of shared memory in Windows 2000 is the same as that in Linux and thus, is not discussed here.
Note: In this section, we will use the terms process and thread interchangeably.

Pipes
A pipe is a standard communication mechanism that enables the transfer of data between processes. In Windows 2000, pipes are
available in two modes: byte mode and message mode. The byte-
mode pipes work in the same way as pipes in Linux (described in
previous chapter). However, message-mode pipes are a little different as they preserve message boundaries. For example, if a
sender sends a 256 byte message in two writes of 128 bytes each,
the receiver will read it as two separate messages rather than a
single message. Like Linux, Windows 2000 also supports named
pipes. Named pipes are similar to regular pipes in the sense that
they are also available in same two modes (byte and message) but
unlike regular pipes they can be employed for communication in a
networked environment.

Mailslots
A mailslot is a mechanism that enables one-way communication between processes on the same system or on different systems connected via a network. A mailslot is a repository of inter-process messages and
it resides in memory. The process that creates and owns a mailslot is
known as mailslot server while the processes that communicate with
it by putting messages in its mailslot are known as mailslot clients.
Any process that knows the name of the mailslot can write
messages to it. Each incoming message is appended to the mailslot
and kept stored there until read by the mailslot server. Mailslots allow
a mailslot client to broadcast a message to multiple mailslot servers
located anywhere on the network provided all mailslot servers bear
the same name of mailslot.
Whenever a server creates a mailslot, it is provided with a mailslot
handle. A mailslot server can read messages from the mailslot only
using this handle. In addition, any process other than the mailslot server that has obtained the handle to a mailslot can read the messages from it. Note that the same process can act as both
mailslot server and mailslot client. This enables bi-directional
communication between processes using multiple mailslots.
Note: Though mailslots can be employed in a networked
environment, they are unreliable and thus, do not offer guaranteed
delivery of messages.
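For illustration, the following hedged sketch shows both roles in a single program (in practice the mailslot server and client would normally be separate processes, possibly on different machines); the mailslot name sample_slot is an assumed example.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Server side: create (and own) the mailslot. */
    HANDLE slot = CreateMailslot("\\\\.\\mailslot\\sample_slot",
                                 0,                      /* no message size limit */
                                 MAILSLOT_WAIT_FOREVER,  /* blocking reads */
                                 NULL);
    if (slot == INVALID_HANDLE_VALUE) {
        printf("CreateMailslot failed: %lu\n", GetLastError());
        return 1;
    }

    /* Client side: any process that knows the name can open it for writing. */
    HANDLE client = CreateFile("\\\\.\\mailslot\\sample_slot", GENERIC_WRITE,
                               FILE_SHARE_READ, NULL, OPEN_EXISTING,
                               FILE_ATTRIBUTE_NORMAL, NULL);
    DWORD written, bytesRead;
    const char msg[] = "hello via mailslot";
    WriteFile(client, msg, sizeof(msg), &written, NULL);

    /* Back on the server side: messages stay queued until read via the handle. */
    char buf[128];
    if (ReadFile(slot, buf, sizeof(buf), &bytesRead, NULL))
        printf("server received: %s\n", buf);

    CloseHandle(client);
    CloseHandle(slot);
    return 0;
}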

17.3.2 Scheduling
In Windows 2000, the entities the operating system schedules are the
threads, not the processes. Accordingly, each thread has a state
while processes do not. A thread, at a time, can be in one of the six
possible states: ready, standby, running, waiting, transition and
terminated. Some of these states have already been described in
Chapter 2, while rests are described as follows.
• Standby: A ready thread is said to be in ‘standby’ state if it is the
next one to be executed.
• Transition: A newly created thread is said to be in ‘transition’
state while it is waiting for the resources required for its execution.
Every thread continues to switch among various states during its
lifetime. Figure 17.2 shows the thread state transition diagram.
The Windows 2000 scheduler selects the threads regardless of
which process they belong to. It does not even know which processes
own which threads. To determine the order in which ready threads
are to be executed, the scheduler uses 32 priority levels, numbered
from 0 to 31, where higher number denotes the higher priority. The 32
priority levels are divided into two classes: real-time and variable. The
threads with priorities from 16 to 31 belong to the real-time class while
from 0 to 15 belong to variable class. Note that the priorities 16 to 31
are kept reserved by the system and cannot be assigned to user
threads.
Fig. 17.2 Thread State Transition Diagram

To use scheduling priorities, the system maintains an array with 32 entries (indexed from 0 to 31) with each entry corresponding to a
specific priority level. In addition, it maintains a queue at each priority
level to hold the ready threads having the corresponding priority.
Each array entry points to the head of its respective queue (see
Figure 17.3).
To start with, the scheduler traverses through all the queues,
starting from highest to lowest, until it finds a nonempty queue. Once
a queue having any ready thread(s) has been found, the thread at the
head of the queue is scheduled for one time quantum. When the time
quantum of the thread expires but the thread has still not finished, it is moved to the end of the queue at its priority level and the thread at the
head of the queue is scheduled next. Note that while traversing
through the queues, if the scheduler finds a ready thread having
particular processor affinity but the desired processor is unavailable,
then it skips the thread and continues searching. In case no ready
thread is found, the scheduler runs a special thread called idle
thread.
Under certain specific conditions, the priority of a variable-class thread can be raised above its base priority or lowered back down towards it. However, it must be noted that the current priority of a variable-class thread is never lower than its base priority and never greater than 15. In contrast, the priorities of real-time class threads are
never changed.
Fig. 17.3 Priority Levels

17.4 MEMORY MANAGEMENT


Memory management in Windows 2000 primarily involves managing
the virtual memory. The part of operating system that is responsible
for managing the virtual memory is called virtual memory (VM)
manager. Unlike scheduler, the memory manager in Windows 2000
deals entirely with processes and not with threads. This is because
the memory is allocated to a process and not to a thread.
Each user process in Windows 2000 is assigned a virtual address space and since the VM manager uses 32-bit addresses, the virtual address space of each process is 4 GB (2^32 bytes) long. Out of 4 GB,
the lower 2 GB of address space store the code and data of a
process while the upper 2 GB are reserved for the operating system.
Thus, a process is allowed to use only 2 GB (lower) of its address
space. However, certain configurations of Windows 2000 enable a
process to use 3 GB of its address space by keeping only 1 GB
reserved for the operating system.

17.4.1 Paging
Windows 2000 supports demand-paged memory management
scheme to manage the memory. Theoretically, page sizes can be of
any power of two up to 64 KB; however, on the Pentium, the page size is fixed at 4 KB, while on the Itanium, it can be 8 KB or 16 KB. The VM manager of Windows 2000 uses a page size of 4 KB.
Note: Windows 2000 does not support segmentation.
The virtual memory of each process is divided into pages of 4 KB
each and also, the physical memory is divided into page frames of 4
KB each. Windows 2000 uses two-level paging in which page table
itself is also paged. Each process has a page directory (higher level
page table) that holds 1024 page directory entries where each entry
is of size 4 bytes (32 bits). Each page directory entry (PDE) points to
a page table. Furthermore, each page table of a process holds 1024
page table entries where each entry is of size 4 bytes. Each page
table entry (PTE) points to a page frame in the physical memory.
Figure 17.4 shows the virtual memory layout of a process.
Fig. 17.4 Virtual Memory Layout of a Process

Note: The maximum size of all page tables of a process can never exceed 4
MB.

The 32-bit virtual address of each process is divided into three parts: p1 of 10 bits, p2 of 10 bits and a page offset d of 12 bits. To map a
given virtual address onto byte address in physical memory, the
virtual address is sent to MMU. The MMU uses p1 as an index to the
page directory and selects the corresponding PDE from it. The
selected PDE points to a specific page table. The MMU uses p2 to
select a PTE from the specified page table. The selected PTE points
to a specific page frame in physical memory. The page offset d is
used to point to the specific byte in that page frame. Figure 17.5
shows the virtual-to-physical address translation in Windows 2000.
Note that the leftmost 20 bits of each 32-bit PTE specify the page frame in physical memory, the next three are reserved for the operating system's use and the rest of the bits specify whether the page is valid, dirty, cacheable, accessed, read only, write through or kernel mode.
The MMU concatenates 20 bits of page frame from PTE with 12 bit
page offset (rightmost 12 bits of virtual address) to create a pointer to
the specific byte address in physical memory.

Fig. 17.5 Virtual-to-Physical Address Translation
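To make the 10-10-12 split concrete, here is a small, hedged C sketch (the virtual address value is arbitrary, and the page-directory and page-table lookups themselves are omitted):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t vaddr = 0x12345678;             /* an arbitrary virtual address */

    uint32_t p1 = (vaddr >> 22) & 0x3FF;     /* top 10 bits: page-directory index */
    uint32_t p2 = (vaddr >> 12) & 0x3FF;     /* next 10 bits: page-table index */
    uint32_t d  = vaddr & 0xFFF;             /* low 12 bits: offset within the page */

    /* The MMU would use p1 to select a PDE, p2 to select a PTE, and then
       concatenate the PTE's 20-bit page-frame number with d. */
    printf("p1 = %u, p2 = %u, offset = %u\n", p1, p2, d);
    return 0;
}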

17.4.2 Handling Page Faults


Windows 2000 employs the concept of working set. Each process
has a working set that specifies the pages in active use by that
process. The working set of each process is characterized by two
parameters: lower bound and upper bound, which describe the
minimum and maximum number of pages respectively that a process
can have in memory at a time. However, in exceptional cases, a
process may have less than minimum or more than maximum pages
in memory at a time.
Note: By default, the lower bound of a process lies in the range of 20
to 50 pages while the upper bound lies in the range of 45 to 345
pages depending on the available amount of RAM. However, these
defaults may be changed by the administrator.
To handle page faults, two dedicated kernel threads, balance set
manager and working set manager, work in conjunction with each
other as follows. The balance set manager keeps track of free
frames on the free list and the working set manager keeps track of
the working set of processes. When a process causes page fault, the
working set manager examines the current working set size of the
process. If it is smaller than the lower bound, the balance set
manager checks whether there are enough free frames in the free
list. If yes, the desired page is brought into the memory. However, if
the balance set manager determines that there are not enough free frames, then some pages must be removed from the in-memory
processes. To do this, the working set manager needs to examine the
working set of processes and recover more pages.
First of all, the working set manager determines the order in which
the in-memory processes should be examined. Generally, the large
processes which have been idle for a long time are considered first,
then the small active processes and finally, the foreground processes
are considered for examination. Once the order has been specified,
the working set manager starts examining the processes in that order.
During inspection, a process having current working set size greater
than the lower bound is chosen as victim and thus, one or more
pages are taken from it. However, if the current working set size of a
process is found smaller than the lower bound or if the process has
caused more than a certain number of page faults since the last
inspection, the process is skipped and the working set manager
continues with the next process. Note that the total number of pages
to be removed from a process depends on various parameters such
as size of available RAM, how the current working set size is
compared with lower and upper bound of process, and so on.
17.5 FILE SYSTEM
On a system running Windows 2000, one of three file systems,
namely, FAT16, FAT32 and NTFS (New Technology File System) can
be used. However, NTFS supersedes the FAT file systems and has
become the standard file system of Windows 2000 because of
several improvements. Some of the major improvements in NTFS
over FAT file systems are as follows.
• It includes features like data recovery, file compression, large files
and file systems, encryption, etc.
• It provides greater control over security and access of data within
the file system.
• It supports large drives or partitions.
• It provides improved performance, reliability and efficient storage
using advanced data structure.

17.5.1 NTFS Physical Structure


The basic entity of NTFS is volume. An NTFS volume can be a
logical partition of the disk or the entire disk. It is organized as a
sequence of clusters where a cluster is a collection of contiguous disk
sectors; the number of disk sectors in a cluster is a power of 2. A cluster is the smallest unit of disk space that can be allocated to a file. The size
of a cluster for a volume varies from 512 bytes to 64 KB depending
on the size of volume. For example, the default cluster size for a 2
GB volume is 2 KB. Each cluster starting from the beginning of the
disk to the end is assigned a number known as logical cluster
number (LCN). NTFS uses these logical cluster numbers instead of
actual disk addresses while allocating space to files.
In NTFS, a file is considered to be a structured object consisting
of a set of attributes which are nothing but independent byte streams.
Some standard attributes like name, timestamp, and the date of
creation are defined for all the files. Note that user data is also
considered as an attribute and is stored in data attributes.
In order to keep track of information regarding each file on
volume, NTFS maintains a master file table (MFT). It is created in
addition to the boot sector and some system files when a volume is
formatted using NTFS. The MFT is itself a file that contains at least
one record for each file. Each MFT record consists of a sequence of
(attribute header, value) pairs. The attribute header identifies the
attribute and indicates the length of the value. If the value of attribute
is short enough to fit in the MFT record, it is stored in the MFT record
and is called resident attribute. On the other hand, if the value of
attribute is too long, it is placed on one or more contiguous extents on
the volume and a pointer to each extent is stored in the MFT record.
Such attribute is known as nonresident attribute. Note that there
may be a case when a file is extremely large or has many attributes. In such a case, two or more MFT records are required; the first one is known as the base record, which points to the other MFT records. Some of the attributes along with their description are listed
in Table 17.2.

Table 17.2 Some attributes in MFT Records

Standard information: Contains information like flag bits and timestamps.
File name: Contains the file name in Unicode.
Attribute list: Lists the locations of additional MFT records.
Object ID: Represents the file identifier, unique to the volume.
Volume name: Contains the name of the volume; used in the $Volume metadata file.
Volume information: Contains the version of the volume; used in the $Volume metadata file.
Index root: Used to implement directories.
Index allocation: Used to implement very large directories.
Data: Contains stream data.
NTFS associates each file with a unique ID known as the file reference. It is 64 bits long, of which 48 bits represent the file number and the remaining 16 bits represent the sequence number. The file number gives the record number in the MFT containing that file's entry, and the sequence number counts the number of times that MFT entry has been used.
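As a small, hedged illustration of this layout (the helper names are assumptions, and the sketch places the 48-bit file number in the low-order bits of the 64-bit value):

#include <stdint.h>

/* Pack and unpack a 64-bit file reference: 48 bits hold the MFT record
   number, 16 bits hold the sequence number (illustrative helpers). */
static uint64_t make_file_reference(uint64_t file_number, uint16_t sequence)
{
    return (file_number & 0xFFFFFFFFFFFFULL) | ((uint64_t)sequence << 48);
}

static uint64_t file_number_of(uint64_t ref) { return ref & 0xFFFFFFFFFFFFULL; }
static uint16_t sequence_of(uint64_t ref)    { return (uint16_t)(ref >> 48); }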

17.5.2 Metadata Files


In NTFS, the internal information about the data for a volume is
stored in special system files known as metadata files. The MFT is
also one of the metadata files; it not only stores information about
itself but also about other metadata files. The first 16 MFT records
are reserved for metadata files, the first one being the record for MFT
itself. The second record is a mirror copy of the MFT that contains the first 16 entries of the original MFT file. It is used for recovery in case the original MFT file gets corrupted.
Metadata files are represented in the MFT using a dollar sign ($)
at the beginning of the name. Some of the metadata files other than
MFT along with their file names and description are listed in Table
17.3.

17.5.3 Directory Implementation


Like in MS DOS and UNIX, the file system is organized as a
hierarchy of directories; a directory can contain other directories.
NTFS implements each directory using a data structure called B+
tree; an index of the file names of that directory is stored in B+ tree.
B+ tree not only makes insertion of new names in the directory at
appropriate place easier but also facilitates efficient search of a file in
a directory. This is because, in a B+ tree, the length of each path from the root of the tree to a leaf is the same.
Table 17.3 Some Metadata Files in MFT
17.6 I/O MANAGEMENT

A computer system consists of various I/O devices such as keyboard, mouse, scanner, monitor, printer, web camera, and several other
devices. Windows 2000 has been designed with a general framework
to which new devices can easily be attached as and when required.
The I/O manager in Windows 2000 is closely connected to the plug
and play (PnP) manager which automatically recognizes new
hardware installed on the computer, and makes appropriate changes in the hardware configuration.
An important feature of Windows 2000 is that it supports both
synchronous as well as asynchronous I/O. The synchronous I/O
causes the process (that requires I/O) to remain blocked until the I/O
operation is completed. On the other hand, in asynchronous I/O, the
process need not wait for I/O completion; rather it can continue its
execution in parallel with I/O. After the I/O operation has completed,
some signal or interrupt is generated, and the results are provided to
the process. The support for asynchronous I/O is especially important
on servers.
Another interesting feature of Windows 2000 is that it supports
dynamic disks. These disks may span multiple partitions and even
multiple disks. These disks can be reconfigured on the fly, that is,
there is no need to reboot the system after configuring these disks. In
this way, the logical drives are no longer constrained to be on a single
partition or even a single disk. This allows spanning of single file
system to multiple drives in a transparent way.

Implementation of I/O
The I/O manager provides a general framework in which different I/O
devices can operate. This framework basically consists of two parts:
one is a set of loaded device drivers that contains the device-specific
code required for communicating with the devices, and another is
device-independent code that is required for certain aspects of I/O
such as uniform interfacing for device drivers, buffering, and error
handling.
The device drivers should be written in such a way that they conform to the Windows Driver Model defined by Microsoft, which ensures that the device drivers are compatible with the rest of Windows 2000. Microsoft has also provided a toolkit that helps driver writers produce conformant drivers. The drivers must meet certain
requirements to conform to the Windows Driver Model. Some of
these requirements are given below.
• The drivers must be able to handle incoming I/O requests that
arrive in the form of a standardized packet called an I/O Request
Packet (IRP).
• The drivers must be object based, like the rest of Windows 2000, in
the sense that they must provide a set of procedures that the rest of
the system can call. In addition, they must be able to correctly
deal with other Windows 2000 objects.
• The drivers must completely support the plug and play feature, that
is, they must allow devices to be added or removed as and when
required.
• The drivers must be configurable, that is, they must not contain
any built-in assumptions about which I/O ports and interrupt lines
certain devices use. For example, a printer driver must not have
the address of the printer port hard coded into it.
• The drivers must permit power management, wherever required.
Power management is required to reduce the power consumption
of some devices or of the entire system when the system is idle.
Windows 2000 provides several options to conserve power. One is
turning off the monitor and the hard disks automatically when the
system has been idle for a short period. Another option is to put
the system in standby mode when the user is away from the
system for a while. A third option is to put the system in
hibernation mode when the system is idle for a long period such
as overnight. Both standby and hibernation modes put the entire
system in a low-power state. The system must also wake up when
told to do so.
• The drivers must be capable of being used on a multiprocessor
system because Windows 2000 was basically designed for use
on multiprocessors. This implies that the driver must function
correctly even if the driver code is executed concurrently by two
or more processors.
• The drivers must be portable across Windows 98 and Windows
2000. That is, the drivers must work not only on Windows 2000
but also on Windows 98.
As we have discussed, in Linux the major device number is used
to identify the driver associated with a device. In Windows 2000, a
different scheme is followed to identify the driver associated with
each device. Whenever the system is booted, or whenever a new
plug-and-play device is attached to the system, Windows 2000
automatically detects the device and calls the PnP manager. The PnP
manager finds out the manufacturer and the model number of the
device and, using this information, looks up a certain directory on
the hard disk to locate the driver. If the driver is not available in that
directory, it prompts the user to insert a floppy disk or CD-ROM that
contains the required driver. Once the driver is located, it is loaded
into memory.
As stated earlier, the drivers must be object based in the sense
that they must provide a set of procedures that the rest of the system
can call to obtain their services. The two basic procedures that a
driver must provide are DriverEntry and AddDevice. Whenever a
driver is loaded into memory, a driver object is created for it. Once
the driver is loaded, the DriverEntry procedure is called, which
initializes the driver. During initialization, it may create some tables
and data structures, and it fills in some of the fields of the driver
object. These fields basically include pointers to all the other
procedures that the driver must supply. The driver objects are stored
in a special directory, \??.
In addition to the driver object, a device object is also created for
the device controlled by that driver; the device object points to the
driver object. The device object is used to locate the desired driver
object in the directory \??. Once the driver object is located, its
procedures can be called easily.
The AddDevice procedure is called by the PnP manager once for
each device to be added. Once the device has been added, the driver
is called with the first IRP for setting up the interrupt vector and
initializing the hardware.
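A bare-bones sketch of such a driver is given below. It follows the usual
WDM entry points, but the names prefixed with My are hypothetical, the
PnP and power IRPs are omitted, and building it requires the Windows
Driver Kit; treat it as an outline of the idea rather than a complete
driver.

#include <wdm.h>

/* Dispatch routine: every I/O request reaches the driver as an IRP. */
NTSTATUS MyDispatchCreateClose(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    UNREFERENCED_PARAMETER(DeviceObject);
    Irp->IoStatus.Status = STATUS_SUCCESS;    /* report success for open/close */
    Irp->IoStatus.Information = 0;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);  /* hand the IRP back to the I/O manager */
    return STATUS_SUCCESS;
}

/* Called by the PnP manager once for each device this driver controls. */
NTSTATUS MyAddDevice(PDRIVER_OBJECT DriverObject, PDEVICE_OBJECT PhysicalDeviceObject)
{
    PDEVICE_OBJECT deviceObject = NULL;
    NTSTATUS status = IoCreateDevice(DriverObject, 0, NULL, FILE_DEVICE_UNKNOWN,
                                     0, FALSE, &deviceObject);
    if (NT_SUCCESS(status)) {
        /* Join the device stack built by the PnP manager (the returned
           lower-device pointer is ignored here for brevity). */
        IoAttachDeviceToDeviceStack(deviceObject, PhysicalDeviceObject);
        deviceObject->Flags &= ~DO_DEVICE_INITIALIZING;
    }
    return status;
}

/* Called once when the driver is loaded; fills the driver object's fields
   with pointers to the other procedures the driver supplies. */
NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    UNREFERENCED_PARAMETER(RegistryPath);
    DriverObject->DriverExtension->AddDevice = MyAddDevice;
    DriverObject->MajorFunction[IRP_MJ_CREATE] = MyDispatchCreateClose;
    DriverObject->MajorFunction[IRP_MJ_CLOSE]  = MyDispatchCreateClose;
    return STATUS_SUCCESS;
}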
Windows 2000 allows a driver either to do all the work by itself
(as the printer driver does) or to divide the work among multiple
drivers. In the former case, the driver is said to be monolithic; in
the latter case, the drivers are said to be stacked, which means
that a request may pass through a sequence of drivers, each doing a
part of the work (see Figure 17.6).
Fig. 17.6 Stacked Drivers

Stacked drivers can be used to separate the complex bus-
management part from the functional work of actually controlling the
device. This helps driver writers to write only the device-specific
code without having to know the bus-controlling part. Filter drivers
can also be inserted at the top of the stack; these perform
transformations on the data being transferred to and from the device.
For example, a filter driver could compress the data while writing it
onto the disk and decompress it while reading. Note that both the
application programs and the true device drivers are unaware of the
presence of filter drivers; the filter driver performs the data
transformations automatically.
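The layering behind stacked and filter drivers can be pictured with a
small user-mode analogy in C; it is purely illustrative and uses none of
the real Windows driver interfaces. Each 'driver' exposes the same write
operation and holds a pointer to the driver beneath it, and the filter
transforms the data before passing it down, while the caller remains
unaware of the extra layer.

#include <stdio.h>
#include <string.h>

/* Each 'driver' in the stack exposes the same write interface and keeps
   a pointer to the next driver below it. */
struct driver {
    void (*write)(struct driver *self, const char *data, size_t len);
    struct driver *lower;
};

/* Bottom of the stack: stands in for the real device driver. */
static void device_write(struct driver *self, const char *data, size_t len)
{
    (void)self;
    printf("device receives %zu bytes: %.*s\n", len, (int)len, data);
}

/* Filter driver: transforms the data (upper-casing it here as a stand-in
   for compression or encryption) and passes it to the driver below. */
static void filter_write(struct driver *self, const char *data, size_t len)
{
    char buf[128];
    if (len > sizeof(buf)) len = sizeof(buf);
    for (size_t i = 0; i < len; i++)
        buf[i] = (data[i] >= 'a' && data[i] <= 'z') ? data[i] - 32 : data[i];
    self->lower->write(self->lower, buf, len);
}

int main(void)
{
    struct driver device = { device_write, NULL };
    struct driver filter = { filter_write, &device };

    /* The application writes through the top of the stack and is unaware
       that a filter sits between it and the device driver. */
    const char *msg = "hello, disk";
    filter.write(&filter, msg, strlen(msg));
    return 0;
}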

LET US SUMMARIZE
1. Microsoft Windows has been the most popular series of operating systems
over the past decade. Windows 95 revolutionized the personal computer
operating system market. Then came Windows 98, Windows ME, Windows NT,
Windows 2000, Windows XP, Windows Vista, Windows 7, and the latest
one, Windows 8.
2. Windows 2000 is a 32-bit preemptive multitasking operating system for
Intel Pentium and later microprocessors.
3. Windows 2000 comprises two major parts: the operating system and the
environmental subsystems.
4. The Windows 2000 operating system is organized as a hierarchy of
layers, each layer utilizing the services of the layer underneath it. The
main layers include hardware abstraction layer (HAL), kernel and
executive, each of which runs in protected (kernel) mode.
5. The environmental subsystems are a collection of user-mode processes
which enable Windows 2000 to execute programs developed for other
operating systems such as Win32, POSIX and OS/2.
6. Each environmental subsystem provides an operating environment for a
single application. However, the main operating environment of Windows
2000 is the Win32 subsystem and therefore, Windows 2000 facilitates the
use of Win32 API (Win32 Application Programming Interface) calls.
7. The role of the HAL is to hide hardware differences and present the upper
layers with abstract hardware devices. The HAL conceals many of the
machine dependencies and exports a virtual machine interface which is
used by the rest of the operating system and the device drivers.
8. The role of the kernel is to provide a higher-level abstraction of the hardware
to the executive and environmental subsystems. Other responsibilities of the
kernel include thread scheduling, low-level processor synchronization,
exception handling and recovery after power failure.
9. The executive offers a variety of services that can be used by
environmental subsystems. These services are grouped under several
components, some of which include the I/O manager, object manager, process
manager, plug and play (PnP) manager, security manager, power manager
and virtual memory manager.
10. In Windows 2000, each process comprises one or more threads (the units
of execution scheduled by the kernel), and each thread may comprise
multiple fibers (lightweight threads). Furthermore, processes that are to be
handled as a unit can be combined to form a job.
11. Windows 2000 offers a wide variety of mechanisms to let the processes or
threads communicate with each other. Some of the standard mechanisms
include pipes, mailslots, sockets, remote procedure calls (RPC) and
shared memory.
12. In Windows 2000, the entities the operating system schedules are the
threads, not the processes. Accordingly, each thread has a state while
processes do not. A thread, at a time, can be in one of the six possible
states: ready, standby, running, waiting, transition and terminated.
13. To determine the order in which ready threads are to be executed, the
scheduler uses 32 priority levels, numbered from 0 to 31, where a higher
number denotes a higher priority.
14. The 32 priority levels are divided into two classes: real-time and variable.
The threads with priorities from 16 to 31 belong to the real-time class while
those from 0 to 15 belong to the variable class. Note that the priorities 16 to
31 are reserved by the system and cannot be assigned to user threads.
15. Memory management in Windows 2000 primarily involves managing the
virtual memory. The part of operating system that is responsible for
managing the virtual memory is called virtual memory (VM) manager.
16. Unlike scheduler, the memory manager in Windows 2000 deals entirely
with processes and not with threads. This is because the memory is
allocated to a process and not to a thread.
17. Each user process in Windows 2000 is assigned a virtual address space
and since the VM manager uses 32-bit addresses, the virtual address space
of each process is 4 GB (2^32 bytes) long. Out of the 4 GB, the lower 2 GB of
address space store the code and data of a process while the upper 2 GB
are reserved for the operating system.
18. Windows 2000 uses a demand-paged memory management scheme to
manage memory. Theoretically, page sizes can be any power of two up to
64 KB; however, on the Pentium the page size is fixed at 4 KB, while on the
Itanium it can be 8 KB or 16 KB. The VM manager of Windows 2000
supports a page size of 4 KB.
19. Windows 2000 employs two dedicated kernel threads, balance set
manager and working set manager that work in conjunction with each
other to handle page faults.
20. The balance set manager keeps track of free frames on the free list and
the working set manager keeps track of the working set of processes.
21. On a system running Windows 2000, one of three file systems, namely,
FAT16, FAT32 and NTFS (New Technology File System) can be used.
However, NTFS supersedes the FAT file systems and has become the
standard file system of Windows 2000 because of several improvements.
22. The basic entity of NTFS is the volume. An NTFS volume can be a logical
partition of the disk or the entire disk. It is organized as a sequence of
clusters, where a cluster is a collection of contiguous disk sectors.
23. The cluster is the smallest unit of disk space that can be allocated to a file.
The size of a cluster for a volume varies from 512 bytes to 64 KB
depending on the size of the volume. Each cluster, starting from the beginning
of the disk to the end, is assigned a number known as the logical cluster
number (LCN).
24. In order to keep track of information regarding each file on volume, NTFS
maintains a master file table (MFT) that contains at least one record for
each file.
25. In NTFS, the internal information about the data for a volume is stored in
special system files known as metadata files.
26. Windows 2000 has been designed with a general framework to which new
devices can easily be attached as and when required. The I/O manager in
Windows 2000 is closely connected to the plug and play (PnP) manager,
which automatically recognizes new hardware installed on the computer
and makes appropriate changes in the hardware configuration.
27. The I/O manager provides a general framework in which different I/O
devices can operate. This framework basically consists of two parts: one
is a set of loaded device drivers that contain the device-specific code
required for communicating with the devices, and the other is the device-
independent code required for certain aspects of I/O such as
uniform interfacing for device drivers, buffering, and error handling.
28. Windows 2000 allows a driver either to do all the work by itself or to
divide the work among multiple drivers. In the former case, the driver is
said to be monolithic; in the latter case, the drivers are said to be
stacked, which means that a request may pass through a sequence of
drivers, each doing a part of the work.

EXERCISES
Fill in the Blanks
1. The main layers in Windows 2000 operating system include
_____________, kernel and _____________.
2. An _____________ is a system-defined data type that consists of a set of
attributes as well as a set of methods.
3. In Windows 2000, pipes are available in two modes: _____________ and
_____________.
4. Each page directory entry points to a _____________.
5. _____________ may span multiple partitions and even multiple disks.

Multiple Choice Questions


1. Which of the following is the main operating environment of Windows
2000?
(a) Win32 subsystem
(b) POSIX subsystem
(c) OS/2 subsystem
(d) None of these
2. A newly created thread is said to be in _____________ state while it is
waiting for the resources required for its execution.
(a) Standby
(b) Running
(c) Ready
(d) Transition
3. Which of the following is not an attribute of MFT?
(a) Volume information
(b) Boot sector
(c) Data
(d) Index root
4. Which of the following is not a component of executive?
(a) I/O manager
(b) Thread manager
(c) Object manager
(d) Power manager
5. What is the size of virtual address space of each user process in Windows
2000?
(a) 2 GB
(b) 1 GB
(c) 4 GB
(d) 3 GB

State True or False


1. The kernel offers a variety of services that can be used by environmental
subsystems.
2. Only the process that has obtained a mailslot handle can read messages
from the mailslot.
3. The virtual address space of each process is 4 GB long.
4. The executive offers a variety of services that can be used by
environmental subsystems.
5. In NTFS, each cluster starting from the beginning of the disk to the end is
assigned a number known as physical cluster number.

Descriptive Questions
1. Explain in brief the structure of Windows 2000 operating system.
2. How is a virtual address mapped onto a physical address in Windows 2000?
3. List the various possible states a thread can be in at a time. Explain thread
state transitions with the help of a diagram.
4. Define the following terms.
(a) Mailslot client
(b) Mailslot server
(c) Resident attribute
5. How are threads scheduled in Windows 2000?
6. What do filter drivers perform?
7. What do the bits of each PTE indicate?
8. What is the role of plug and play manager?
9. Write short notes on the following.
(a) Mailslots
(b) Socket
(c) MFT
(d) Metadata files
10. List some requirements that a conformant driver must meet.
11. How are page faults handled in Windows 2000?
12. Discuss the basic procedures that a driver in Windows 2000 must provide.
Glossary

Access control list (ACL): a method of recording access rights in a
computer system—a list is associated with each object, such as a file,
that stores the user names (or processes) which can access the object
and the type of access allowed to each user (or process)
Access control matrix: a protection mechanism used in file system
that records the access rights of processes over objects in a
computer system
Address binding: the mapping from addresses associated with a
program to memory addresses
Aging: a process of gradually increasing the priority of a low-priority
process with an increase in its waiting time
Asymmetric multiprocessing systems: the multiprocessor
systems in which all processors are different and each of them
performs a specific task
Asynchronous I/O: an alternative to nonblocking I/O where the
invoking process need not wait for I/O completion, rather it can
continue its execution
Asynchronous thread cancellation: a type of thread cancellation
in which the target thread is terminated immediately after any thread
indicates its cancellation
Authentication: the process of verifying the identity of a user
Background processes: the system processes that are not related
with any user but still perform some specific function
Bad sectors: the sectors of a disk drive that do not read back the
correct value just written to them
Balance set manager: a kernel thread in Windows 2000 dedicated
to handle page faults—it keeps track of free frames on the free list—
works in conjunction with working set manager
Balanced utilization: the percentage of time all the system
resources are busy
Banker’s algorithm: an algorithm that ensures deadlock avoidance
Bare machine: a computer having no operating system
Basic file system: a component of the file system that issues
generic (general) commands to the appropriate device driver to read
and write physical blocks on the disk
Batch interface: a type of user interface in which several
commands and directives to control those commands are collected
into files which are then executed
Belady’s anomaly: a situation in which increasing the number of
page frames would result in more page faults
Best fit: a partition selection algorithm in which the operating system
scans the free-storage list and allocates the smallest hole whose
size is larger than or equal to the size of the process
Binary semaphore: a semaphore whose integer value can range
only between 0 and 1
Biometric authentication: uses the unique characteristics (or
attributes) of an individual to authenticate the person’s identity
Bit vector: a method to implement free-space list in which a bit map
of having number of bits equal to the number of blocks on the disk is
used—a ‘1’ on bit map indicates that the corresponding block is free,
while a ‘0’ indicates that the block is allocated
Bit-level striping: the data striping technique that splits each byte of
data into bits and stores them across different disks
Block cache: the technique of caching disk blocks in memory
Block device: a device that stores data in fixed-size blocks with
each block having a specific address—the data transfer to/from a
block device is performed in units of blocks
Blocking I/O system call: the system call which causes the
invoking process to remain blocked until the call is completed
Block-level striping: the data striping technique in which blocks of a
file are striped across multiple disks
Boot block: a fixed area on the disk which stores the entire
bootstrap program
Bootstrap program: the initial program that runs on the system
when the system boots up
Buffer: a region of memory used for holding streams of data during
data transfer between an application and a device or between two
devices
Bus: the simplest interconnection network in which all the
processors and one or more memory units are connected—
processors can send data on and receive data from this single
communication bus
Busy waiting: a situation in which processes execute a loop of code
repeatedly while waiting for an event to occur—results in wastage of
CPU cycles
Byte sequence: a type of file structure in which each file is made up
of sequence of 8-bit bytes having no fixed structure
Cache hit time: the time taken to access the data from cache in
case of a cache hit
Cache hit: a situation that occurs when the required data is found in
cache
Cache memory: a small, high-speed memory that aims to speed up
the memory access operation—stores the frequently accessed data
and instructions
Cache miss time penalty: the time taken in fetching the required
block of data from the main memory in case of a cache miss
Cache miss: a situation that occurs when the required data is not
found in cache
Cache: an area of very high speed memory, which is used for
holding copies of data—provides a faster and an efficient means of
accessing data
Caching: a technique in which the blocks of data from secondary
storage are selectively brought into main memory (or cache memory)
for faster accesses
Character device: a device that accepts and produces a stream of
characters—the data transfer to/from them is performed in units of
bytes
Child process: the process spawned by some other process—also
called sub process
C-LOOK scheduling: a variant of LOOK disk scheduling algorithm
in which the head scans through the disk in both directions, but
services the requests in one direction only—provides a uniform wait-
time
Clustered system: A system with multiple CPUs—two or more
individual systems (called nodes) are grouped together to form a
cluster that can share storage and are closely linked via high-speed
local area network (LAN)
Command-line interface: a type of user interface in which users
interact with the operating system by typing commands
Communication protocol: a set of rules that coordinates the
exchange of information—also known as a network protocol—two
most popular types of communication protocols are the ISO protocol
and TCP/IP protocol
Compaction: the technique of reforming the storage by relocating
(or shuffling) some or all portions of the memory in order to place all
the free holes together at one end of memory to make one large hole
Computation migration: an approach used for accessing data in
distributed system in which the computation is moved to the site
where the data is stored.
Concurrency: implies the simultaneous execution of multiple
processes
Concurrent processes: the processes that coexist in the memory at
some time
Consistency semantics: the characterization of the system that
specifies the semantics of multiple users accessing a shared file
simultaneously
Context switch: the mechanism of saving and restoring the context
while transferring the control of CPU from one process to another
Context: the portion of the process control block including the
process state, memory management information, and CPU
scheduling information— also called state information of a process
Contiguous allocation: an allocation method in which each file is
allocated contiguous blocks on the disk, that is, one after the other
Contiguous memory allocation: a memory allocation approach in
which each process is allocated a single contiguous part of the
memory
Control objects: the objects which control the system
Control synchronization: a kind of synchronization that is needed
when cooperating processes have to coordinate their execution with
respect to one another
Cooperating processes: the processes that need to exchange data
or information with each other—also called interacting processes
Counting semaphore: a semaphore whose integer value can range
over an unrestricted domain—also called general semaphore
CPU burst: the time period elapsed in processing before performing
the next I/O operation
CPU scheduling: the decision to select one of the multiple jobs in
the main memory for allocating CPU
CPU utilization: the percentage of time the CPU is busy in
executing processes
CPU-bound process: a process that involves higher computation
than I/O operations thereby demanding more use of CPU than I/O
devices during its life time—the speed of execution is governed by
CPU
Critical region: the portion of the code of a process in which it
accesses or changes the shared data—also known as critical section
Cross point: an electric switch that can be opened or closed
depending on whether communication is required between the
processor and memory
Crossbar switch: an interconnection network that uses an N x N
matrix organization, wherein N processors are arranged along one
dimension and N memory units are arranged along the other
dimension—every CPU and a memory unit are connected via an
independent bus
Cryptography: the process of altering messages in a way that their
meaning is hidden from the adversaries who might intercept them
C-SCAN scheduling: a variant of SCAN disk scheduling algorithm
in which the head scans through the disk in both directions, but
services the requests in one direction only—provides a uniform wait-
time
Cycle stealing: a mechanism in which CPU has to wait for
accessing bus and main memory when DMA controller acquires the
bus for transferring data
Data access synchronization: a kind of synchronization that is
needed when cooperating processes access shared data
Data migration: an approach used for accessing data in distributed
system in which the required data (such as a file) is moved to the
site where the computation on this data is to be performed
Data striping: a concept used to improve the performance of the
disk that distributes the data transparently among N disks, which
make them appear as a single large, fast disk
Deadlock avoidance: a technique that never allows allocation of a
resource to a process if it leads to a deadlock
Deadlock detection: a technique that periodically examines the
occurrence of deadlock in the system using some algorithm
Deadlock prevention: a technique that ensures that at least one of
the four necessary conditions for deadlock occurrence does not hold
Deadlock recovery: a methodology used for the recovery of the
system from deadlock and continue with processing
Deadlock: a situation where a set of processes is in a simultaneous
wait state and each of them is waiting for the release of a resource
held exclusively by one of the waiting processes in the set
Deadly embrace: a situation that occurs when two processes are
inadvertently waiting for the resources held by each other
Deferred thread cancellation: a type of thread cancellation in which
the target thread gets the opportunity to terminate itself in an orderly
manner
Degree of multiprogramming: the number of processes competing
to get the system resources in multiprogramming environment
Demand paging: a technique in which a page is loaded into the
memory only when it is needed during program execution—pages
that are never accessed are never loaded into the memory
Demand segmentation: a technique to implement virtual memory in
which a user program is divided into segments and the segments of
variable sizes are brought into the memory only when needed
Denial of service (DoS): a type of security violation that does not
damage information or access the unauthorized information but
prevents the legitimate use of the system for authorized users
Deterministic modeling: the simplest and direct method used to
compare the performance of scheduling algorithms—takes into
account the pre-specified system workload and measures the
performance of each scheduling algorithm for that workload
Device controller: an electronic component that can control one or
more identical devices depending on the type of device controller—
also known as an adapter
Device drivers: the kernel modules that encapsulate the differences
among the I/O devices
Device queue: a queue associated with each I/O device in the
system in which the processes that need to perform I/O are kept
Direct access: a file access method in which the data on the disk is
stored as blocks of data with index numbers which helps to read and
write data on the disk in any order
Direct memory access (DMA): a scheme in which the CPU assigns
the task of transferring data to DMA controller and continues with
other tasks—DMA controller can access the system bus
independent of CPU so it transfers the data on its own
Directed acyclic graph (DAG): a generalization of tree-structured
directory system which allows same file or subdirectory to appear in
different directories at the same time
Directory: a file with special format where the information about
other files is stored by the system
Disk access time: the period of time that elapses between a
request for information from disk and the information arriving at the
requesting device—combination of seek time, rotational delay, and
data transfer time
Disk scheduling: the decision to select which of the available disk
I/O requests should be serviced next—requests should be selected
in an order that reduces the seek time significantly
Dispatch latency: the amount of time required by the dispatcher to
suspend execution of one process and resume execution of another
process—low dispatch latency implies faster start of process
execution
Dispatcher objects: the objects which handle dispatching and
synchronization in the system
Dispatcher: the module of the operating system that sets up the
execution of the process selected by the scheduler on CPU
Distributed file system (DFS): a file system that provides a way by
which users can share files stored on different sites of a distributed
system
Distributed operating system: provides an abstract view of the
system by hiding the physical resource distribution from the users—
provides a uniform interface for resource access regardless of its
location
Distributed system: consists of a set of loosely coupled processors
that do not share memory or system clock, and are connected by a
communication medium
Double buffering: a mechanism that allows sharing of two buffers
between producer and consumer thereby relaxing the timing
requirements between them
Encryption: a means of protecting confidentiality of data in an
insecure environment, such as while transmitting data over an
insecure communication link
Entry section: the portion of the code of the process in which it
requests for permission to enter its critical section and sets some
variables to signal its entrance
Environmental subsystems: the collection of user-mode processes
which enable Windows 2000 to execute programs developed for
other operating systems such as Win32, POSIX and OS/2
Exit section: the portion of the code of the process in which it sets
some variables to signal the exit from the critical section
External fragmentation: the phenomenon resulting in the wastage
of memory, which is not a part of any partition—also known as
checkerboarding
Fairness: the degree to which each process is getting an equal
chance to execute
File attributes: the additional information (apart from file name)
associated with each file that helps the file system to manage a file
within the system
File control block (FCB): a component of logical file system that
stores information about a file such as ownership, permissions, and
location of the file content—in UNIX file system (UFS), it is known as
i-node (index node) and in NTFS this information is maintained in
master file table
File extension: the second part in the name of the file after the
period that indicates the type of the file and the operations (read,
write, execute) that can be performed on that file—generally one to
three characters long
File operations: the functions that can be performed on a file—
examples are create, write, read, seek, delete, open, append,
rename and close a file
File structure: the internal structure of the file that describes how a
file is internally stored in the system
File subsystem: basically deals with the activities related to
management of files, allocating space, controlling access to files,
retrieving data for users and administering free space
File system: a mechanism provided by the operating system that is
primarily responsible for the management and organization of
various files in a system
File: a collection of related data stored as a named unit on the
secondary storage
File-organization module: a component of the file system that
organizes the files—it knows the physical block address (the actual
address) and logical block address (the relative address), allocation
method, and location of a file
Filter: a program that inputs data from standard input, perform some
operation on the data, and outputs the result to the standard output
Firewall: a mechanism that protects and isolates the internal network from the outside
world
First fit: a partition selection algorithm in which the operating system scans the free-
storage list and allocates the first hole that is large enough to accommodate the process
First-come first-served (FCFS) scheduling: a scheduling
algorithm which allows the requests to be executed in the order of
their arrival in the system—the request that comes first is serviced
first
First-in first-out (FIFO): a page replacement algorithm in which the
first page loaded into the memory is the first page to be replaced—
pages are replaced in the order in which they are loaded into the
memory
Fixed blocking: a method of record blocking in which fixed-length
records are used and an integral number of records is kept in each
block
Foreground processes: the system processes that involve user
interaction
Free-space list: a list maintained by the file system to keep track of
the free blocks on the disk
Graceful degradation: a feature in which a system can continue to
operate in the event of failure, though with somewhat reduced
capabilities
Graphical user interface (GUI): a type of user interface in which
users interact with the system with a pointing device, such as a
mouse
Hackers: the programmers who break into others' computer
systems in order to steal, damage or change the information as they
want
Hard real-time systems: the real-time systems in which a process
must be accomplished within the specified deadlines; otherwise,
undesirable results may be produced
Hash table: a data structure, with 0 to n–1 table entries, where n is
the total number of entries in the table—uses a hash function to
compute a hash value (a number between 0 and n–1) based on the file
name.
Hashed page table: a page table in which each entry contains a
linked list of elements hashing to the same location—uses hash
value as the virtual page number
Hierarchical directory: a type of directory structure that allows
users to have subdirectories under their directories, thus making the
file system more logical and organized for the user— also known as
tree of directory or tree-structured directory
Hierarchical page table: a page table where a hierarchy of page
tables with several levels is maintained
Highest response ratio next (HRRN) scheduling: a non-
preemptive scheduling algorithm that schedules the processes
according to their response ratio—the process having the highest
value of response ratio among all the ready processes is scheduled
first
High-level disk formatting: the process of storing initial file-system
data structures and a boot block on each partition of the disk by the
operating system—also known as logical formatting
Hit ratio: the number of cache hits divided by the total number of
CPU references—always lies in the closed interval between 0 and 1
Host-attached storage: the disk storage connected directly to the
network server and accessed only through local I/O ports
I/O burst: the time period elapsed in performing I/O before the next CPU burst
I/O-bound process: a process that involves a lot of I/O operations
as compared to computation during its life time—the speed of
execution is governed by the I/O device
Independent processes: the concurrent processes that do not
share any kind of information or data with each other—also called
competitors
Indexed allocation: an allocation method in which the blocks of a
file are scattered all over the disk in the same manner as they are in
linked allocation, however, the pointers to the blocks are brought
together at one location known as the index block
I-nodes: an allocation method in which each file is associated with a
data structure called i-node (index node) that stores the attributes of
file and the addresses of disk blocks allocated to the file
Interface: a standardized set of functions through which a device
can be accessed
Internal fragmentation: the phenomenon which results in the
wastage of memory within the partition
Interprocess communication (IPC): a facility provided by the
operating system that allows the processes running on a single
system to communicate with each other
Interrupt handler: a predefined location in the kernel’s address
space, which contains the starting address of the service routine for
the interrupt
Interrupt service routine (ISR): the part of the operating system
which executes the appropriate code segment to deal with the
interrupt
Interrupt vector: a table that contains the addresses of the interrupt
handlers for the various devices—indexed by a unique device
number
Interrupt: the mechanism of informing the CPU about completion of
a task (rather than CPU informing the completion of task)
Intruder: the person who tries to breach the security of a system
and cause harm
Inverted page table: a page table which contains one entry for each
page frame of main memory—each entry consists of the virtual
address of the page stored in that page frame along with the
information about the process that owns that page
Job queue: a scheduling queue on a mass storage device such as
hard disk in which the processes entering the system for execution
are kept—also called input queue
Job scheduling: the decision to select among several jobs in the
job pool to be loaded into the main memory
Journaling: refers to the process of maintaining a log (journal) in
which the changes made to the file system are recorded in a
sequential order
Kernel-level threads: the threads implemented by the kernel—the
kernel is responsible for creating, scheduling, and managing threads
in the kernel space
Least recently used (LRU): a page replacement algorithm in which
the page that has not been referenced for the longest time is replaced—
uses the recent past behavior of the program to predict the near future
Limit register: a hardware register that holds the range of logical
addresses—each logical address of a program is checked against
this register to ensure that the program does not attempt to access
the memory address outside the allocated partition
Line discipline: an interpreter associated with each terminal device
in Linux for the data exchanged with that terminal device
Linear list: a directory-management algorithm that organizes a
directory as a collection of fixed size entries, where each entry
contains a (fixed-length) file name, a fixed structure to store the file
attributes, and pointers to the data blocks— stores file’s all attributes
at one place as a single directory entry and uses a linear search to
search a directory entry from the list of entries
Linked allocation: an allocation method in which each file is stored
as a linked list of disk blocks—the disk blocks are generally
scattered throughout the disk, and each disk block stores the
address of the next block
Linux kernel: the core of the Linux system that provides an
environment for the execution of processes—also provides various
system services to allow protected access to hardware resources
Load balancing: the process to keep the workload evenly
distributed among multiple processors in multiprocessor scheduling
Local area network (LAN): a privately owned network that is
confined to an area of few kilometers
Locality: the set of pages that are actively used together
Logic bomb: a program or portion of a program, which lies dormant
until a specific part of program logic is activated
Logical address space: the set of all logical addresses used by a
user program
Logical address: the address generated by the CPU
Logical file system: a component of the file system that manages
all the information about a file except the actual data (content of the
file)
Log-structured file system: a file system that maintains a log file
that contains the metadata and data of all the files in the file system
—whenever the data is modified or new data is written in a file, this
new data is recorded at the end of the log file
Long-term scheduler: a scheduler that selects a process from the
job queue and loads it into the main memory for execution—also
known as job scheduler or admission scheduler
LOOK scheduling: a modification of SCAN disk scheduling
algorithm in which the head does not necessarily reach the end of
disk, instead when there are no more requests in the direction in
which the head is moving it reverses its direction
Low-level disk formatting: the process of dividing all the platters of
the disk into sectors using some software—also known as physical
formatting of disk
Magic number: a sequence of bits placed at the start of a file to
roughly indicate the type of file—generally used by the UNIX system
to recognize the file type
Magnetic disk: a secondary storage medium that offers high
storage capacity and reliability
Mailbox: a repository of interprocess messages— also known as
port
Mailslots: a mechanism that enables one-way communication
between processes on same system or different systems connected
via network
Man-in-the-middle attack: a security attack in which an intruder
comes in the middle of the communication between two legitimate
users and pretends as the sender to the receiver and as receiver to
the sender
Masquerading: a security attack which is said to happen when an
entity impersonates another entity
Master processor: a processor in asymmetric multiprocessing
systems that controls the entire system
Master-slave (or asymmetric) multiprocessing systems: a type
of multiprocessing systems in which one processor is different from
the other processors in a way that it is dedicated to execute the
operating system and hence, known as master processor—other
processors, known as slave processors, are identical
Mean time to failure (MTTF): the amount of time for which the
system can run continuously without any failure—a measure of disk
reliability
Mean time to repair: the time taken, on an average, to restore the
failed disk or to replace it
Medium-term scheduler: a scheduler that selects a process among the partially executed
or unexecuted swapped-out processes and swaps it in the main memory—also
known as swapper
Memory access time: the time it takes to transfer a character from
memory to or from the processor
Memory management unit (MMU): a hardware device that
performs the run time address binding
Memory-mapped file: a technique that binds a file to a portion of
the virtual address space of a process—allows treating disk access
as memory access
Message passing systems: a communication model for providing
IPC in which the cooperating processes communicate by sending
and receiving messages from each other—implemented with the
help of operating system calls
Miss ratio: the number of cache misses divided by the total
number of CPU references—obtained by subtracting the hit ratio from 1
Mobile operating system: an operating system specifically
designed for mobile devices including cell phones, PDAs, tablet PCs,
smart phones, and other hand-held devices
Mode bit: a bit associated with the computer hardware that indicates
the current mode of operation—the value “1” indicates the user
mode and “0” indicates the monitor mode
Monitor mode: a mode of execution reserved for the kernel—also
known as supervisor mode, system mode, kernel mode, or privileged
mode
Monitor: a programming language construct that encapsulates
shared data (or variables), procedures or functions that access these
variables, and initialization code within an abstract data type—
provides mutually exclusive access to critical sections
Multilevel feedback queue scheduling: an improved version of
multilevel queue scheduling algorithm in which processes are not
permanently assigned to queues; instead they are allowed to move
between the queues—also known as multilevel adaptive scheduling
Multilevel queue scheduling: a scheduling algorithm designed for
the environments where the processes can be categorized into
different groups on the basis of their different response time
requirements or different scheduling needs
Multiprocessor systems: systems consisting of multiple processors
which share the computer bus and even the system clock, memory,
and peripheral devices—also called parallel systems or tightly-
coupled systems
Multiprogramming: the execution of multiple jobs in an interleaved
manner
Multistage switch: consists of several stages, each containing 2x2
crossbar switches that can be connected in several ways to build a
large multistage interconnection network (MIN)
Multithreaded process: a process with multiple threads of control
Multithreading: a mode of operation that allows a process to have
multiple threads of control within the same address space—multiple
threads can execute in parallel thereby enabling the process to
perform multiple tasks at a time
Mutual exclusion: a condition in which only one process out of
several cooperating processes is allowed to manipulate the shared
data at one time
Network firewall: the most common type of firewall which divides
the network into separate security domains and controls the network
access between different security domains
Network operating system: the earliest form of operating system
used for distributed systems—is mainly responsible for providing
resource sharing among various systems connected to the network
Network topology: refers to the way a network is laid out either
physically or logically—various network topologies include bus, ring,
star, tree, mesh, and graph
Network-attached storage (NAS): a storage system designed to
separate storage resources from network and application servers in
order to simplify storage management and improve the reliability,
performance, and efficiency of the network
Non-blocking I/O system call: the system call which does not
suspend the execution of the invoking process for a long period,
rather it returns quickly with a value which indicates the number of
bytes that have been transferred
Non-preemptive scheduling: the scheduling in which once the
CPU is allocated to a process, it cannot be taken back until the
process voluntarily releases it or the process terminates—also
known as cooperative or voluntary scheduling
Non-uniform memory access (NUMA) architecture: a
multiprocessor architecture in which the system consists of a number
of nodes, where each node consists of a set of CPUs, a memory unit
and an I/O subsystem connected by a local interconnection network
Object type: a system-defined data type that consists of a set of
attributes as well as a set of methods—each instance of an object
type is referred to as an object—in Windows 2000, the kernel
supports two classes of objects, namely, control and dispatcher
objects
Operating system: a program that acts as an interface between the
computer users and the computer hardware—usually called the
kernel
Optimal page replacement: a page replacement algorithm in which
the page to be referenced in the most distant future is replaced—
requires prior knowledge of which page will be referenced next
Overlaying: a memory management scheme that allows a process
to execute irrespective of the system having insufficient physical
memory
Overlays: the small independent parts of a program—no two
overlays are required to be in main memory at the same time
Page fault frequency (PFF): an approach to prevent thrashing that
takes page-fault rate of a process into account—provides an idea of
when to increase or decrease the frame allocation
Page fault: an interrupt initiated by the MMU when the referenced
page is not found in the main memory—also known as missing page
interrupt
Page frame: a fixed-size contiguous block of physical memory
Page table: a mapping table used to perform address translation—
operating system maintains a page table for each process to keep
track of which page frame is allocated to which page
Page: a fixed-size block of logical memory having same size as that
of a page frame
Page-table base register (PTBR): a register that contains the
pointer to page table
Paging: a memory management scheme that allows a program to
be stored noncontiguously in the physical memory
Parent process: the process that spawns a new process
Partitioning: a technique to distribute data in distributed systems in
which the data is divided into several partitions (or fragments), and
each partition can be stored at different sites—also known as
fragmentation
Password: the simplest and most commonly used authentication
scheme in which each user is asked to enter a username and
password at the time of logging in into the system
Personal computer operating system: the most widely known
operating system aimed at providing a good interface to a single
user
Phishing: a form of threat that attempts to steal the sensitive data
(financial or personal) with the help of fraudulent e-mails and
messages
Physical address space: the set of all physical addresses used by
the program
Physical address: the actual address of data in main memory
Pipe: the standard communication mechanism used in Linux that
enables the transfer of data between processes—provides a means
of one-way communication between related processes
Port: a connection point through which a device is attached with the
computer system
Power on self test (POST): a set of PROM (programmable read
only memory) resident firmware programs that execute
independently of the operating system
Precedence graph: a directed graph that is used to implement
control synchronization among cooperating processes
Preemptive kernel: the kernel which allows the preemption of a
process running in kernel mode
Preemptive scheduling: the scheduling in which the CPU can be
forcibly taken back from the currently running process before its
completion and allocated to some other process
Priority inversion: a problem that occurs when a high-priority
process has to wait for the resources currently being accessed by
some low-priority process or a chain of low-priority processes
Priority-based scheduling: a scheduling algorithm in which each
process is assigned a priority and the higher priority processes are
scheduled before the lower priority processes— may be either
preemptive or non-preemptive
Privileged instructions: the machine instructions that are allowed
to be executed only in the monitor mode
Process control block (PCB): a data structure created by the
operating system for representing a process—stores descriptive
information pertaining to a process such as its state, program
counter, memory management information, information about its
scheduling, allocated resources, and accounting information that is
required to control and manage a particular process
Process control subsystem: deals with process scheduling, inter-
process communication, memory management and process
synchronization
Process migration: an approach used for accessing data in
distributed system in which the entire process or parts of it are
moved to different sites for execution—a logical extension of
computation migration
Process scheduling: the procedure of determining the next process
to be executed on the CPU
Process spawning: the task of creating a new process on the
request of some another process
Process state transition: the change in state of a process—caused
by the occurrence of some event in the system
Process state: a variable associated with each process that
indicates the nature of the current activity of a process—a process may
be in new, ready, running, waiting, or terminated state at a time
Process table: a structurally organized table maintained by the
operating system to keep track of all the processes in the system—includes
an entry for each process
Process: a program under execution—an executing set of machine
instructions
Processor affinity: an effort to make a process to run on the same
processor it was executed last time
Program counter: a register that contains the address of the
instruction to be executed next
Programmable interval timer: the hardware device used to trigger
the operations and to measure the elapsed time
Protection: deals with the threats caused by those users of the
system who are not authorized to do what they are doing
Pthreads: the thread extensions of POSIX standard (IEEE 1003.1c)
that can be implemented either in the kernel space or in the user
space as per the operating system’s designer choice.
Pull migration: a load balancing technique in which the idle
processor itself pulls a waiting process from a busy processor
Push migration: a load balancing technique in which the load is
balanced by periodically checking the load of each processor and
shifting the processes from the ready queues of overloaded
processors to that of less overloaded or idle processors
Queuing analysis: the performance evaluation using the queuing
theory
Quick fit: a partition selection algorithm which finds a hole of the
right size very quickly
Race condition: a situation where several processes sharing some
data execute concurrently and the result of execution depends on
the order in which the shared data is accessed by the processes
Random access memory (RAM): the only storage area that can be
directly accessed by the CPU—volatile in nature, that is, it loses its
contents when power supply is switched off
Ready queue: a scheduling queue in the main memory in which the
processes that are ready for the execution and need CPU are kept
Real-time systems: the systems in which the correctness of the
computations depends not only on the logical result of the computation but
also on the time at which the output is generated—have well-defined
and fixed time constraints
Record sequence: a type of file structure in which a file consists of
a sequence of fixed-length records, where an arbitrary number of
records can be read from or written to a file—records cannot be
inserted or deleted in the middle of a file
Redundant arrays of independent disks (RAID): a technique to
improve the performance and reliability of secondary storage—the
basic idea behind RAID is to have a large array of small independent
disks
Relocation register: a hardware register which contains the starting
address of the partition into which the process is to be loaded
Remote method invocation (RMI): a Java-based approach that
facilitates remote communication between programs written in the
Java programming language
Remote procedure call (RPC): a communication mechanism that
allows a process to call a procedure on a remote system connected
via network
Replay attack: a security attack which involves capturing a copy of
valid data transmission between a sender and receiver and
repeating it later for malicious reasons, bringing out an unauthorized
result
Replication: a technique to distribute data in distributed systems in
which several identical copies or replicas of the data are maintained
and each replica is stored at different sites
Resource allocation graph: a directed graph used to depict a
deadlock
Response time: the time elapsed between the user initiates a
request and the system starts responding to this request
Rotational delay: the time for which the read/write head has to wait
for the desired sector to come under it—also known as rotational latency
Round robin (RR) scheduling: a preemptive scheduling algorithm
that considers all the processes equally important and allocates the
CPU to each ready process in turn for a fixed time slice
Runqueue: a data structure used by the Linux scheduler to store
the runnable processes
Safe sequence: a sequence of process execution such that each
and every process executes till its completion
Safe state: the state of the system when allocation of resources to
the processes does not lead to a deadlock
SCAN scheduling: a disk scheduling algorithm in which the head
moves across the disk, servicing the requests at the cylinders that
come in its way
Scheduler: the module of operating system that makes the
scheduling decision
Scheduling algorithm: the algorithm used by the scheduler to carry
out the selection of a process for execution
Second chance page replacement: a refinement over FIFO page
replacement algorithm in which the page that is both the oldest as
well as unused is replaced instead of the oldest page that may be
heavily used—also referred to as clock algorithm
Secondary storage: a nonvolatile memory that can hold large
amount of data permanently
Sector slipping: a scheme used for managing bad sector in which
all the sectors following the bad sector are shifted down one place,
making the sector following the bad sector free
Sector sparing: a scheme used for managing bad sector in which
bad sector is logically replaced with one of the spare sectors in the
disk
Sector: the smallest unit of information that can be transferred
to/from the disk
Security: deals with the threats to information caused by the
outsiders (non-users)
Seek time: the time taken in positioning the read/ write head on
specific track on the disk platter
Segment base: a field in the segment table that specifies the
starting address of the segment in physical memory
Segment limit: a field in the segment table that specifies the length
of the segment
Segment table: a table maintained by the operating system to keep
track of each segment— each entry in segment table contains two
fields, namely, segment base and segment limit
Segmentation: a memory management scheme in which the entire
logical address space is considered as a collection of segments with
each segment having a number and a length
Semaphore: an integer variable that is used to provide a general-
purpose solution to critical section problem—accessed by only two
atomic operations namely wait and signal
Separate supervisor systems: a type of multiprocessor operating
systems in which the memory is divided into as many partitions as
there are CPUs, and each partition contains a copy of the operating
system
Sequential access: a file access method in which the information in
the file is accessed in order, one record after the other
Session hijacking: a security attack preceding the man-in-the-
middle attack in which an intruder intercepts the active
communication session between two legitimate users
Shared memory systems: a communication model for providing
IPC in which a part of memory is shared among the cooperating
processes—the processes can exchange data or information by
writing to and reading from this shared memory
Shared memory: a means of communication that allows
cooperating processes to pass data to each other—enables a memory
segment to be shared between two or more processes
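A minimal sketch using the System V shared memory calls is given below; in practice a second cooperating process would attach the same segment to read the data written here:

/* Create a shared segment, attach it, write into it and remove it. */
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    int shmid = shmget(IPC_PRIVATE, 1024, IPC_CREAT | 0600);
    if (shmid == -1)
        return 1;

    char *addr = shmat(shmid, NULL, 0);   /* attach to our address space */
    if (addr == (char *)-1)
        return 1;

    strcpy(addr, "hello from shared memory");
    printf("%s\n", addr);

    shmdt(addr);                          /* detach the segment */
    shmctl(shmid, IPC_RMID, NULL);        /* mark it for removal */
    return 0;
}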
Shell program: a sequence of shell commands stored in a text file
Shell: the portion of the operating system that acts as an interface
between the user and the operating system
Shortest job first (SJF) scheduling: a non-preemptive scheduling
algorithm that schedules the processes according to the length of
CPU burst they require—the process having the shortest CPU burst
is scheduled first
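For illustration, a small C sketch that orders hypothetical CPU bursts (all processes assumed to arrive at time 0) and computes the average waiting time under SJF:

/* SJF: run the shortest burst first; each process waits for the sum
   of the bursts scheduled before it. */
#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

int main(void)
{
    int burst[] = {6, 8, 7, 3};
    int n = 4, wait = 0, total_wait = 0;

    qsort(burst, n, sizeof(int), cmp);     /* shortest burst first */
    for (int i = 0; i < n; i++) {
        total_wait += wait;                /* waiting time of this process */
        wait += burst[i];
    }
    printf("average waiting time = %.2f\n", (double)total_wait / n);
    return 0;
}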
Shortest remaining time next (SRTN) scheduling: a preemptive
version of SJF scheduling algorithm that schedules the processes
according to the length of remaining CPU burst of the processes—
the process having the shortest remaining processing time is
scheduled first
Shortest seek time first (SSTF) scheduling: a disk scheduling
algorithm that suggests that the operating system select the request
for the cylinder closest to the current head position
Short-term scheduler: a scheduler that selects a process from the
ready queue and allocates CPU to it—also known as CPU scheduler
or process scheduler
Signal: the most basic communication mechanism in Linux that is
used to alert a process to the occurrence of some event such as
abnormal termination or floating point exception
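A brief C sketch of installing a handler with sigaction and then raising a signal in the same process (SIGUSR1 is used here purely for illustration):

/* Install a handler and deliver SIGUSR1 to ourselves. */
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>

static void handler(int sig)
{
    /* printf is used only for illustration; real handlers should stick
       to async-signal-safe calls */
    printf("received signal %d\n", sig);
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = handler;
    sigaction(SIGUSR1, &sa, NULL);   /* install the handler */

    kill(getpid(), SIGUSR1);         /* alert this very process */
    return 0;
}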
Simulations: the method of evaluating algorithms that mimic the
dynamic behaviour of a real computer system over time
Slave processors: the processors in asymmetric multiprocessing
systems that either wait for the master’s instructions to perform any
task or have predefined tasks
Smart card: an authentication method in which each user is
provided with a card that is used for identification
Socket: an end-point of the communication path between two
processes—each of the communicating processes creates a socket,
and these sockets must be connected to enable the communication
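For illustration, a minimal client-side sketch in C; the address 127.0.0.1 and port 8080 are assumed here, and a matching server socket must already be listening on them:

/* Create a socket and connect it to a listening peer. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);   /* our end-point */

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        printf("connected\n");
    else
        perror("connect");

    close(fd);
    return 0;
}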
Soft real-time systems: the real-time systems in which the
requirements are less strict; it is not mandatory to meet the deadline
—a real-time process always gets the priority over other tasks, and
retains the priority until its completion
SPOOL: an acronym for Simultaneous Peripheral Operation On-line
Spooling: refers to storing jobs in a buffer so that CPU can be
efficiently utilized
Spyware: the small programs that install themselves on computers
to gather data secretly about the computer user without his/her
consent and report the collected data to interested users or parties
Stable storage: a disk subsystem which ensures that whenever a
write command is issued to it, the disk performs it either completely
or not at all.
Starvation: a condition in which execution of a process is delayed
for an indefinite period of time because other processes are always
given preference
STREAMS: a UNIX System V mechanism that enables
asynchronous I/O between a user and a device—provides a full-
duplex (two-way communication) connection between a user
process and the device driver of the I/O device
Super user: a user having special privileges
Swap space: an area on a secondary storage device on which all
the pages of the executing process are copied—used as an
extension to main memory
Swapping: the task of temporarily switching a process in and out of
main memory
Switch: a device that selects an appropriate path or circuit to send
the data from the source to the destination
Switching technique: determines when a connection should be set
up between a pair of processes, and for how long it should be
maintained—there are three types of switching techniques, namely,
circuit switching, message switching and packet switching
Switching: routing traffic by setting up temporary connections
between two or more network points
Symmetric multiprocessing
systems: a type of multiprocessing systems in which all the
processors perform identical functions—a single copy of the
operating system is kept in the memory and is shared among all the
processors—all processors are peers and no master-slave
relationship exists between them
Synchronization: the coordination of the activities of two or more
processes, based on some condition, in order to avoid any
inconsistency
System call: a request made by user programs to the operating
system for the management of device, directory, file, memory,
process, security, and inter-process communication
System libraries: contain all the operating-system-support code that
need not be executed in the kernel mode
System process: a process executing the system’s code
System utilities: a set of user-mode programs with each program
designed to perform an independent, specialized management task
Target thread: the thread that is to be cancelled before its
termination
Tertiary storage: the storage built from inexpensive disks and tape
drives that use removable media—also known as tertiary memory
Thrashing: a phenomenon in which the system is mostly busy in
performing paging (page-out, page-in) rather than executing the
processes— results in poor system performance as no productive
work is being performed
Thread cancellation: the procedure of terminating a thread before it
completes its execution
Thread library: a library which provides the programmers with an
application programming interface (API) for thread creation and
management
Thread: the fundamental unit of CPU utilization—has its own ID,
stack, set of registers, and program counter
Throughput: the total number of processes that a system can
execute per unit of time—depends on the average length of the
processes to be executed
Time slice: the fixed amount of CPU time that is given to each
process for its execution in timesharing systems—also known as
time quantum
Time-sharing systems: an extension of multiprogrammed systems
in which multiple users are allowed to interact with the system
through their terminals—also called multitasking systems
Tracks: the concentric circles on the magnetic disk where the data is
stored—numbered from the outermost to the innermost ring, starting
with zero
Translation look-aside buffer (TLB): a special hardware device
present inside the MMU that contains a limited number of page table
entries
Transparency: a method of hiding the details from the user and
showing them only the required details
Trap doors: the security holes purposely left in the software by
insiders—also known as backdoors
Tree structure: a type of file structure in which a file consists of a
tree of disk blocks, where each block holds a number of records of
varied lengths
Trojan horse: a program designed to damage files by allowing a
hacker to access the system—it enters a computer system through an
e-mail or through free programs that have been downloaded from
the Internet
Trusted system: a computer and its operating system which can be
relied upon to a determined level to implement a given security
policy
Turnaround time: the amount of time that elapses from the creation
to the termination of a process—the difference between the time a
process enters the system and the time it exits from the system
Uniform memory access (UMA) architecture: a multiprocessor
architecture in which all the processors share the physical memory
uniformly, that is, time taken to access a memory location is
independent of its position relative to the processor
UNIX: an open source operating system that is available on a wide
range of different hardware
Unsafe state: the state of the system when there exists no safe
sequence of process execution—may lead to a deadlock
User mode: a mode of execution in which execution is being done
on behalf of the user
User process: a process executing the user’s code
User-level threads: the threads implemented by a thread library
associated with the code of a process—the thread library provides
support for creating, scheduling, and managing threads in the user
space without any involvement from the kernel
Variable-length spanned blocking: a method of record
blocking which accommodates variable-length records into blocks—
the last record in a block may span to the next block if the length of
the record is larger than the space left in the current block
Variable-length unspanned blocking: a method of record
blocking that uses variable-length records without spanning
Virtual file system (VFS): a file system offered by Linux that hides
the differences among the various file systems from the processes
and applications—it is a layer of software between the process and
file system
Virtual machine operating system (VMOS): the operating system
that creates several virtual machines by partitioning the resources of
the real machine
Virtual machine: the identical copy of the bare hardware including
CPU, disks, I/O devices, interrupts, and so on—allows each user to
run the operating system or software packages of his own choice on a
single machine, thereby creating an illusion that each user has his
own machine
Virtual memory: a memory management technique which gives the
illusion that the system has a much larger memory than is actually
available
Virus: a program designed to replicate, attach to other programs,
and perform unsolicited and malicious actions—it executes when an
infected program is executed
Wait-for graph: a graph that shows the dependency of a process on
another process for the resource allocation
Waiting time: the time used up by a process while waiting in the
ready queue—does not take into account the execution time or time
consumed for I/O
Wide area network (WAN): a network that spreads over a large
geographical area like a country or a continent—much bigger than a
LAN and interconnects various LANs
Wildcard characters: the characters, such as ‘*’ and ‘?’, used in
pattern matching—they generate one or more filenames which become
part of the effective command
Working set manager: a kernel thread in Windows 2000 dedicated
to handling page faults—it keeps track of the working set of processes
Working set: the set of pages that a process has referenced in the
latest n page references—helps the operating system to decide how
many frames should be allocated to a process
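A small C sketch that derives the working set from a hypothetical page reference string and window size n:

/* The working set is the set of distinct pages referenced in the
   latest n references. */
#include <stdio.h>

int main(void)
{
    int refs[] = {1, 2, 1, 3, 4, 3, 3, 2};    /* reference string */
    int total = 8, n = 4;                     /* window: latest 4 refs */
    int seen[10] = {0}, size = 0;

    for (int i = total - n; i < total; i++) {
        if (!seen[refs[i]]) {                 /* count each page once */
            seen[refs[i]] = 1;
            size++;
        }
    }
    printf("working set size = %d\n", size);  /* pages {4, 3, 2} */
    return 0;
}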
Worms: the malicious programs constructed to infiltrate into the
legitimate data processing programs and alter or destroy the data
Worst fit: a partition selection algorithm in which the operating
system scans the free-storage list and allocates the largest hole to
the process
Zombie process: a process that no longer exists but whose PCB has
not yet been removed from the process table
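A brief C sketch that creates a short-lived zombie: the child exits at once, but its entry lingers in the process table until the delayed parent reaps it with waitpid:

/* Child terminates immediately; it stays a zombie while the parent
   sleeps, and disappears once the parent calls waitpid. */
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0)
        exit(0);                 /* child: terminate at once */

    sleep(5);                    /* child is a zombie during this delay */
    waitpid(pid, NULL, 0);       /* parent reaps it; the PCB is freed */
    return 0;
}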