
INTRODUCTION

TO OPERATING
SYSTEMS

Prof. Chester Rebeiro


Computer Science & Engineering
Indian Institute of Technology, Madras
INDEX

S.No Topic Page No.


Week 1
1 Intro to the Course 01
2 Introduction to OS 03
3 PC Hardware 16
4 From Programs to Processes 29
5 Sharing the CPU 43
Week 2
6 Introduction 60
7 Virtual Memory 72
8 MMU Mapping 82
9 Segmentation 95
10 Memory Management in xv6 108
11 PC Booting 127
Week 3
12 Week 3 Introduction 138
13 Create Execute and Exit from Processes 150
14 System Calls for Process Management 173
Week 4
15 Interrupts 198
16 Interrupt Handling 212
17 Software Interrupts and System calls 228
18 CPU Context switching 239
Week 5
19 CPU Scheduling 253
20 Priority Based Scheduling Algorithms 277
21 Multi-Processor Scheduling 290
22 Scheduling in Linux 298
23 Completely Fair Scheduling 316
Week 6
24 Inter Process Communication 327
25 Synchronization 342
26 Software solutions for critical sections 352
27 Bakery Algorithm 367
28 Hardware Locks 381
29 Mutexes 398
30 Semaphores 405
Week 7
31 Dining Philosophers Problem 418
32 Deadlocks 429
33 Dealing with Deadlocks 451
34 Threads Part 1 465
35 Threads Part 2 481
Week 8
36 Operating system security 490
37 Information Flow policies 512
38 Buffer Overflows 527
39 Preventing Buffer Overflow Attacks 546
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture – 01
Introduction to the Course

Hello, and welcome to the course on an Introduction to Operating Systems. This is an 8-week NPTEL course, targeted mostly at undergraduate Computer Science and Engineering students; Electrical and Electronics students as well as MSc Computer Science students may also benefit from this course.

The prerequisite for the course is a very good and strong understanding of the C programming language. In particular, we will be using a lot of pointers, especially function pointers, as well as data structures such as linked lists and trees. Also important is a good understanding of Computer Organization and Architecture, especially the way memory is organized and managed in a computer system.

This course will be graded; there will be a series of assignments every week and an end-semester examination as well. So, why do we have to study operating systems? The OS, or Operating System, plays a very crucial role in any computer and forms one of the most essential parts of the system. It interfaces between the computer hardware and the applications that run on it. At its heart, the OS is just another software program that we write; however, it differs quite considerably from standard programs.

For instance, in operating system code you will find a lot of assembly, which is not typical of the standard code that we write. In addition, many parts of the operating system code communicate directly with devices, and by devices we mean the hardware devices like the keyboard, the network card and so on. In standard programs or applications that we write, we do not communicate directly with the hardware devices.

Another important aspect of operating system code is that time plays a very crucial role in the operating system; much of what the OS does depends on timing.

The operating system in general is event-driven and is triggered to execute only when events occur. As a consequence of this, debugging an operating system becomes extremely difficult. Additionally, a large part of the operating system code is very specific to the microprocessor it runs on; a lot of features of the processor are utilized in the operating system.

So, in this course we will study operating systems keeping the Intel x86 processor in mind. Thus, a lot of the operating system concepts that we study in this course will pertain directly to the Intel processor family. That being said, the concepts that we study in this course are applicable to most other processors as well.

In this course we will look into various aspects of the operating system. Essentially, we will analyze the various intricacies in operating system design as well as the trade-offs with respect to performance and so on. So, welcome to the course.

Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 01
Lecture - 01
Operating Systems (An Introduction)

Hello and welcome to the first week's lectures for the course An Introduction to Operating Systems. This week we will be building a platform for the course upon which the following weeks' lectures will depend. In this particular lecture, we will give a very brief introduction to the OS. In particular, we will look at where the operating system actually fits in the entire computer, and we will also see what the essential role of the operating system in the computer is.

(Refer Slide Time: 00:57)

So, when we look at computer systems, we can think of them in different layers. Right at the bottom layer are millions and billions of Transistors. These Transistors are typically CMOS Transistors, and they are composed together to build several digital logic Gates, such as the AND, OR and XOR gates. Also included in this particular layer are various things like Memory cells, Flip Flops, Registers and so on.

Now, all these basic VLSI units are then organized into various forms to build things like the Memory (RAM), the Decode unit, the Instruction fetch unit and so on. This Organization (in the slide above), which is the third layer in the computer system, is what we actually see as the Hardware. This is the Physical Hardware that we actually purchase from the store. Now, on this Computer Hardware, we could execute several different Applications. For instance, we could have Office applications, Internet Explorer and so on.

Now, sitting between these Applications and the Hardware is the Operating System. The Operating System essentially manages both the Applications that execute on the Computer and the way the Resources of the system are utilized. So, let us see in more detail how the Operating System is used in the entire scheme.

(Refer Slide Time: 02:51)

To look at it broadly, the Operating System is used for two things. It first provides
Hardware Abstraction and second manages Resources in the system. The Hardware
Abstraction essentially is used in order to turn the Hardware into something that
Software Applications can utilize easily, while the Resource Management is required
because of the limited resources that are present in the Computer.

So, we will look at these two uses of the Operating System in more detail.

(Refer Slide Time: 03:31)

Let us start with this very simple program. This is a program written in the C language, and essentially it is going to print the string “Hello World” on to the monitor. The string “Hello World” is stored in memory and is pointed to by the pointer str[ ]. Now, printf (mentioned in the above slide) is passed this pointer str, and this results in the string being printed on to the monitor. The question which arises is: how exactly is the string displayed on to the monitor? What is the process involved in getting the string, which is stored in memory, displayed on to the monitor?
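For reference, a minimal sketch of the kind of program being described (the exact code on the slide is not reproduced in this transcript) could look like the following:

#include <stdio.h>

int main(void)
{
    char str[] = "Hello World";   /* the string is stored somewhere in memory */
    printf("%s\n", str);          /* printf is passed a pointer to that string */
    return 0;
}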

(Refer Slide Time: 04:25)

So, in fact, when we look at the entire scheme, the string “Hello World” would be present in some memory location in the Main Memory, which is also called the RAM. Certain Instructions in the Processor would read the String byte by byte from the Main Memory into Registers and then copy it on to something known as a Video Buffer. Along with copying the String “Hello World” to the Video Buffer, other attributes are added; for instance, things like the color of the string to be displayed, the x and y coordinates on the monitor where the “Hello World” string needs to be displayed, and other monitor-specific attributes such as the depth and so on.

Now, this string which is copied to the Video Buffer is then read by the Graphics Card, which would then display it on the Monitor. So, this is the entire process of displaying the string “Hello World” on a Monitor. You would see that doing this is not trivial; it is in fact quite complex as well as tedious. Imagine that every program that you write would be required to do all of this: knowing where in memory the “Hello World” string is stored, how to actually display it on the Monitor, how to compute the coordinates, how to specify the color, the depth and other attributes, and how exactly to pass this information to the Graphics Card and so on.
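As a rough illustration of what such low-level display code might look like, here is a sketch based on the standard legacy VGA text-mode convention (this is not code from the lecture, and it can only run as bare-metal or kernel code, not as an ordinary user program):

#include <stdint.h>

/* The legacy VGA text-mode buffer conventionally starts at physical address
 * 0xB8000. Each character cell is 2 bytes: the ASCII character and an
 * attribute byte that encodes the foreground/background colour. */
static void vga_puts(const char *s, int row, int col, uint8_t attr)
{
    volatile uint16_t *video = (volatile uint16_t *)0xB8000;
    int pos = row * 80 + col;                 /* 80 columns per text row */

    while (*s)
        video[pos++] = (uint16_t)((attr << 8) | (uint8_t)*s++);
}

/* Example: vga_puts("Hello World", 0, 0, 0x07); writes the string at the
 * top-left corner in light grey on black. */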

Another aspect is that this is extremely hardware dependent. If any of these things change, for example if the Processor changes or if the Graphics Card or Monitor changes, then it is quite likely that the Program will not work. For example, if the Monitor changes, then there may be certain attributes which need to be specified differently for the new Monitor; or if the Graphics Card changes, then perhaps the way the coordinates, the depth and the color are set for this string “Hello World” would differ. Therefore, without the Operating System, every programmer would need to know the nitty-gritty details about the Hardware. Essentially, he would need to know what Hardware is used in the System and also how the various aspects of the Hardware fit with each other.

(Refer Slide Time: 07:15)

So, Operating Systems essentially provide Abstraction. They sit in between the Applications that we write and the Hardware, and abstract all the nitty-gritty details of the Hardware from the Applications. A simple statement such as printf(“%s”, str) would eventually result in something known as a system call, which would trigger the Operating System to execute. The OS will then manage how the String str will be displayed on the Monitor. Everything, such as how the color needs to be set, how the x and y coordinates need to be set and so on, would be done internally by the Operating System. So, from an Application perspective and from a programmer's perspective, all these details are abstracted out.

So, as you can see, this would make writing programs extremely easy; the programmer need not know the nitty-gritty details about the Hardware any more. A second advantage is Reusable functionality: essentially, Applications can reuse the Operating System functionality. What we mean by this is that since the OS abstracts the Hardware details from the Applications, all Applications that execute in the system can just reuse the Operating System's features.

For example, every Application that uses printf will be invoking the Operating System, and the OS will then take care of communicating with the Hardware. There is a single module in the Operating System which handles all printf's from all Applications executing on the System. A third advantage is Portability. Essentially, what this means is that when we write a program which uses something like a printf statement, we do not really bother about what Hardware it runs on. It could run on a Desktop for instance, or a Server, or a Laptop, or, if compiled appropriately, also on several embedded devices.

So, the underlying Operating System would then distinguish between the various Hardware that is present and will ensure that printf is executed appropriately depending on the Hardware. What we achieve with Portability is that, essentially, Applications need not change even though the underlying Hardware changes.

(Refer Slide Time: 10:12)

The second use of the Operating System is as a Resource Manager. On the Desktops and Laptops that we use today, there are several applications which run almost concurrently. For instance, we could be using a web browser to browse the internet and, almost at the same time, compiling some of the programs that we have written and executing them, or we could be using Skype or a Powerpoint application at almost the same time.

Now, the fact is that we could have several applications running on a system, but the underlying hardware is constrained. Essentially, we have just one set of hardware which needs to cater to several applications almost concurrently. So, the Operating System ensures that it is feasible for multiple applications to share the same hardware.

(Refer Slide Time: 11:15)

Now, within this Computer hardware, there are several components which the Operating System manages: for example, the CPU, the memory, the network, secondary storage devices like the hard disk, the monitors and so on. All these components within your computer are available in restricted amounts. You may typically have around 2, 4 or 8 CPUs present in your system, the memory may be restricted to 4 or 8 GB, and you typically have one network card, one or two hard disks, and so on.

So, the Operating System needs to manage all these various devices and components present in the system and share them among several applications almost concurrently. With the help of the Operating System, it becomes possible for multiple applications to share these limited resources, and the OS is also built in such a way that applications are protected from each other. Essentially, the Operating System is designed in such a way that every component in the system is adequately utilized.

(Refer Slide Time: 12:35)

To take an example let us consider the CPU. So, systems typically would have one or
two CPUs and multiple applications executing on that CPU. So, how does the Operating
System share the single CPU among multiple applications? So, one way which was
typically done in the earlier operating systems around the late 70’s and early 80’s was to
allow one application to execute on the CPU till it completes and then, start the next
application. For example, Application 1 is made to execute on the processor and after
Application 1 completes its execution, only then Application 2 is made to start.

Now, this scheduling of the various applications on the processor is managed by the OS. This scheme of sequentially executing one App after the other completes, although very simple to implement in the operating system, is not the most efficient way to do things. As we will see in a later video, there are issues with having Applications execute in this particular way, and we will see how modern day Operating Systems manage to utilize the processor in a much more efficient manner.

(Refer Slide Time: 14:13)

Another important component in the computer system that needs to be shared among the various applications almost concurrently is the Main Memory. In order to execute, each of these applications needs to be present in the Main Memory, that is, the RAM of the system. The Operating System needs to manage how this limited RAM resource is shared among the various applications executing on the system.

(Refer Slide Time: 14:46)

Now, what makes this difficult for the Operating System is that, while sharing the limited hardware resources among the various applications executing on the computer system, the sharing should be done in such a way that applications are isolated from each other. The OS should ensure that each application runs in a sandboxed environment; that is, Application 1 should not know anything about Application 2, Application 2 should not know anything about the other applications running on the system, and so on.

So, why is this isolation required? To take an example of why isolation is required, let us
consider that Application 1 is a web browser in which you are doing a banking
transaction. For example, you are entering your passwords and credit card details in the
web browser which is executing as Application 1.

On the other hand, Application 2 may be gaming software, and let us assume that it has a virus, that is, it is a malicious application. Now, assume that we do not have Isolation. Application 2 would be able to determine what Application 1 is doing; in other words, the malicious application would be able to determine what is happening in the web browser and may therefore be able to steal sensitive information such as your passwords and your credit card numbers. Therefore, the Operating System should ensure that all these applications are isolated from each other and, as we can see, this is not a very easy thing to do.

(Refer Slide Time: 16:45)

Operating systems are ubiquitous. Almost every smart device that we use today has some form of Operating System present in it. This particular slide (mentioned above) shows the various classifications of operating systems depending on the type of application they are intended for. Needless to say, each of these operating systems is designed keeping the Application in mind.

For example, embedded OSs such as the Contiki operating system, or Contiki OS, are designed for memory-constrained environments such as the wireless sensor nodes used in the Internet of Things. Operating systems like Contiki OS are designed with power consumption kept in mind; they manage the various resources and also abstract the hardware in such a way that the power consumed by the entire system is kept to a minimum. A second class of Operating Systems which many of you may be familiar with is the mobile operating system, or mobile OS.

Examples of these are Android, iOS, Ubuntu Touch and Windows Touch. These operating systems, like the embedded OSs, are designed to ensure that the power consumed by the device is kept to a minimum; for example, they are designed so that the battery charge of your mobile phone lasts for the maximum amount of time. So, in this aspect, mobile OSs like the ones we mentioned over here are quite similar to embedded operating systems like the Contiki OS.

However, mobile operating systems, unlike the embedded OSs, also need to take care of certain special devices, for instance monitors or LCD screens and keypads. In other words, the mobile OS also has user interaction, which typically is not present in an embedded OS.

A third type of operating system is the RTOS, or Real Time Operating System. Examples of these are QNX, VxWorks and RT Linux. These operating systems are used in mission-critical applications where time is very important. For example, in several mission-critical applications like rockets, automobiles or nuclear plants, a delay in performing a particular operation by the computer system would be catastrophic. So, these RTOSs are designed in such a way that every critical operation on the system is guaranteed to complete within the specified time.

Another classification of operating systems is those used in Secure Environments. Examples of these are SELinux and seL4. These operating systems are especially utilized for applications where security is extremely critical.

These, for example, could be web servers that host banking software and so on. The other classes of operating systems which you are quite familiar with are those used for Servers and Desktops, such as the Redhat, Ubuntu and Windows Server OSs, while desktop operating systems are, for example, Mac, Windows and Ubuntu. While these two classes of operating systems have several features which are similar, there may be certain differences in the way the OS manages the various applications running on them.

(Refer Slide Time: 20:54)

The operating system that we will be studying in this course is the xv6 OS, which was designed by MIT specifically for teaching purposes. The xv6 OS is small, well documented and easy to understand. Further, xv6 is designed to look similar to UNIX (version 6). What this means is that the way xv6 is designed reflects how various UNIX-like operating systems, such as Linux, actually work. Therefore, understanding xv6 will give you a nice insight into other modern day operating systems such as Linux.

Thank you.

Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 01
Lecture – 02
PC Hardware

Hello. A lot of how the operating system works depends on the underlying hardware. Essentially, as we have seen in the previous video, the Operating System is in charge of managing the hardware. So, before we go any further into how an operating system works and how the OS functions, let us get a brief background on PC hardware. Essentially, we will look at how computer hardware is designed.

(Refer Slide Time: 00:50)

So, we know that the heart of any system is the CPU, and the CPU is interfaced to several devices such as the VGA card, a hard disk, a keyboard, RAM, a mouse and so on. Now, in order for all these devices to work with a single processor, what is required are addresses.

(Refer Slide Time: 01:19)

Essentially, each device in the system would have a unique address, and it is ensured that no two devices in the system have the same address. For instance, we would have the hard disk, which is present in the address range 0x1f0 to 0x1f7. What this means is that when the processor sends out an address on its address Bus, the hard disk would identify that the address is within this particular range and therefore it will respond.

All other devices will then ignore that particular transfer on the processor Bus, because the address does not fall in their address ranges. For example, if the processor puts out the address 0x1f2, then only the hard disk will respond, because 0x1f2 falls within the hard disk's address range. If you look at something else like the mouse, which has the address range 0x60 to 0x6f, it is not going to respond to the processor's request.

(Refer Slide Time: 02:47)

Now in systems, there are 3 types of addressing which are generally used. One is called
the Memory Addressing, next is the IO Addresses, and the third is the Memory Mapped
IO Addresses. So, we will look at each of these Types of Addresses more in detail.

(Refer Slide Time: 03:10)

So, let us start with the memory addresses. Most systems would have a large number of such memory addresses, and they correspond to addressing the RAM. Each memory unit in the RAM is given a unique address. For instance, the RAM in a typical Intel system with a 32-bit processor could be of size at most 2^32 bytes; that is, such a system could have at most 4 GB of RAM. Each and every memory unit in this RAM has a unique address. This memory unit could be as small as a single byte, or in some systems it could be a word, that is, 16 bits or 32 bits.

Now, how is this RAM used? Essentially, the RAM is configured in the particular way shown in the figure above, and this configuration is specifically for IBM PC compatible Intel machines. This RAM, as I mentioned, has addresses to access each part of it, and it could be as large as 4 GB. The addresses from 0x0 to 640 KB (640 KB is represented in hexadecimal as 0xA0000) are known as the low memory. This particular memory was used in legacy computers like the 8086, 80286 and so on.

So, the old operating systems like MS DOS would use this particular low memory. Above this low memory, from 640 KB to 768 KB, all addresses pertain to the VGA display. Again, this is a legacy issue, where this particular area in the memory corresponds to the VGA display. This means that any ASCII characters placed in this particular area would be picked up by the video card and displayed on the screen.

The region from 768 KB to 960 KB was reserved for the 16-bit expansion ROMs; these were also legacy aspects present in legacy computers, and pertain to the devices that are used. Essentially, in the previous generations of computers, from the early 80's to the early 90's, the devices attached to the computer could have what were known as expansion ROMs (ROMs are read only memories), and these ROMs could be addressed within this particular memory region. Now, what is important for us, and what is still used in the present Intel systems in our desktops and laptops, is this particular region from 960 KB to 1 MB. This is where the BIOS resides.

As many of you would know, the BIOS is the basic input output system; it is a read only memory which is present in your system, and it ensures that your system boots correctly and that the operating system gets loaded. The area of the RAM below 960 KB is typically not used in modern day systems, while the area from 960 KB to 1 MB is used by the BIOS and only during the booting of the system. What is actually used in modern day systems is the RAM above this 1 MB region, that is, starting from what is known as the extended memory. This extended memory extends as far as the amount of RAM present in the system.

For instance, if you have 4 GB, 8 GB or 16 GB of RAM, then this extended memory will extend up to that particular limit. The addresses above this RAM area will generally be unused. As we will see in later classes of this course, the operating system loads itself starting from this 1 MB region, and we will also see how the BIOS is used to boot the operating system. Now, as we know, this RAM is also used to hold various parts of the applications that we execute. For instance, the code segment (that is, the instructions of the program that you execute), the heap, the stack as well as the operating system reside in this particular memory; essentially, all of them reside in this extended memory part.

(Refer Slide Time: 09:41)

The next type of addressing is the IO addresses. Essentially, in legacy computers like the 8086, the 8088 and the 80286, there was a separate address space for devices, into which IO devices like keyboards, mice and hard disks, as well as other programmable devices like a programmable interrupt controller, could be mapped. Unlike the memory addressing that we have just seen, the IO addresses range only from 0 to 2^16 - 1. So, this is roughly 64 KB, and the IBM PC standard defines what certain addresses are used for.

The IBM PC standard is a standard which was followed to develop systems, especially desktop systems and certain servers, and this standard specifies which devices should be present at which IO addresses. For instance, it specifies that the keyboard should be present between 0x60 and 0x6F, that is, the IO addresses 0x60 to 0x6F.

In a similar way, the DMA controller would be present from 0xC0 to 0xDF, and other devices like a SCSI or hard disk (the primary hard disk present in your system) would be present from 0x1F0 to 0x1F7. In this way, several devices had very specific addresses in the system. The use of having such a specification is that we obtain compatibility. Even now, when you boot your laptop or your desktop PC, which is an Intel or an AMD system, the BIOS will expect several things: for example, it will expect that there is a keyboard connected to your system, and that the keyboard is accessible from the address range 0x60 to 0x6F. Similarly, it would expect other things like a programmable interrupt controller to be present at the IO addresses 0x20 to 0x3F.
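As an illustration, kernel code typically reaches these IO addresses through the x86 in and out instructions. The following is a sketch of the usual helper functions (similar in spirit to those found in teaching kernels such as xv6; this is not code from the lecture, and it only makes sense in kernel or bare-metal code):

#include <stdint.h>

/* Read one byte from an IO port (e.g. inb(0x60) reads a scancode from the
 * keyboard controller's data port). */
static inline uint8_t inb(uint16_t port)
{
    uint8_t data;
    __asm__ volatile ("inb %1, %0" : "=a"(data) : "d"(port));
    return data;
}

/* Write one byte to an IO port (e.g. commands to the disk controller at
 * ports 0x1F0 to 0x1F7 are issued this way). */
static inline void outb(uint16_t port, uint8_t data)
{
    __asm__ volatile ("outb %0, %1" : : "a"(data), "d"(port));
}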

Essentially, what is maintained even in modern systems is backward compatibility with the old computers that we used. Even though a lot of the memory addressing specified over here is legacy and was used mostly in the early 80's and 90's, Intel and IBM maintained this backward compatibility, and a lot of the systems that we use for our laptops and desktops still follow this particular layout. Now, one problem which was noticed with this type of IO addressing was the limited address range that was permitted. For instance, over here we had at most 64 KB of addresses. This means that the number of devices that could be connected to the system through the IO addressing was limited.

(Refer Slide Time: 13:53)

In order to extend this, what was used is known as memory mapped IO. With memory mapped IO, hardware devices are mapped into regions of the memory address space itself. As we have seen before, the RAM, or the random access memory of the system, is addressed from location 0 and extends upwards until the end of the RAM.

Typically, in the systems of the early and late 90's, we would have something like 256 MB, 512 MB or at most 1 GB of RAM, so the address space above this was unused. What devices would do is map a certain region in this unused part of the address space to themselves. What this means is that when the processor generates an address in this upper region of memory (marked with the red circle), that is, a high address, then that particular access would be directed to the specific device and not to the RAM.

(Refer Slide Time: 15:13)

So, we have seen various address ranges that have been allocated to RAM such as the
low memory region, the location where the BIOS should be present, the location where
various devices like the keyboard, hard disk and so on should be present. Now who
decides these particular address ranges? Essentially these address ranges have been
decided by standards and a lot is impacted by legacy computers of the 80’s and 90’s. So,
one very famous standard was the IBM PC standard. Essentially, this particular standard was fixed for all PCs; that is, all personal computers right from the 80's onwards would be compatible with the IBM PC standard.

Again, even many of the systems that we use today, that is, our Intel and AMD laptops as well as desktops, are backward compatible with this IBM PC standard. The reason for this IBM PC standard, or for that matter why a standard was required at all, was to ensure that the BIOS and the operating system are portable across platforms.

For instance, if I were to write an operating system today, my operating system would know exactly where the keyboard should be present, irrespective of which IBM PC compatible system it is going to be loaded on. So, I could make a lot of assumptions about where various devices are present in the system. Having such a standard essentially ensures that the software that we write, especially the software that interacts with the hardware such as the BIOS and the operating system, will be portable across several different platforms.

The other way in which addresses are decided upon is by something known as plug and
play. So, plug and play devices do not have any fixed address location as we have seen
for things like the keyboard or VGA, memory and so on. But rather, when the system
boots and when the BIOS begins to execute, the BIOS will then detect the presence of
some particular hardware in the system. For instance, the BIOS would detect that the
system has a network card present; it would then ask the network card how much address space it requires and what type it requires, that is, whether it is IO addresses or memory-mapped addresses. Based on this, the BIOS will then allocate a portion of the address space for that particular network card. The allocation is fixed for the duration of each boot of the PC, but the location in the address map could vary from one boot to another.

(Refer Slide Time: 18:38)

So, this particular slide here shows how a typical PC is organized. We may have multiple processors present in the system; this could be 1 processor, 2, 3 or 4 and so on. It is also possible that each of these processors has multiple cores inside it, and each core has 2 or 4 threads. Now, all these processors are connected to each other through what is known as the front side Bus, so the front side Bus is always shared among all the processors. An important device which connects to the front side Bus is known as the North Bridge, or the chip set. The North Bridge interfaces with the memory through what is known as the memory Bus, and it also interfaces with the PCI Bus.

So, on the PCI Bus you could have several devices like the Ethernet controller, USB
controller. And the USB controller could have many USB devices, which you see at the
front or the back of your desktop. Now as you would notice over here, the USB devices
are connected in a tree like structure.

Now, the PCI Bus also has a hierarchical structure. For instance, in this case, PCI Bus 0 is the closest to the north bridge, and PCI Bus 1 is connected to the north bridge through PCI Bus 0; the interface between Bus 0 and Bus 1 is through a PCI-to-PCI Bridge. In addition to this, there is what is known as the south bridge; the south bridge interfaces with the PCI Bus, or it could have a special protocol or connection to the north bridge, which is known as the DMI Bus. To the south bridge, various legacy devices like the PS/2 devices, keyboard, mouse, PC speaker and so on are connected.

(Refer Slide Time: 21:03)

So, let us start with how the x86, or the Intel x86 processor, evolved. It all started with the 8088 processor. This was a 16-bit microprocessor which had a 20-bit external address Bus and therefore could address up to 1 MB of memory. How did we get this 1 MB? Essentially, 2^20 bytes is 1 MB. The registers within the 8088 were 16 bits and could be divided into various types, such as the general purpose registers, that is AX, BX, CX and DX. We had pointer registers, which were used to point to data such as strings stored in memory; these were the base pointer, the source index, the destination index and the stack pointer. As we know, the base pointer is used to point to a frame present in the stack, whereas the stack pointer is used to point to the bottom of the stack.

Then we have the instruction pointer, which points to the instruction that is being executed, and several segment registers such as the code segment (CS), stack segment (SS), DS and ES registers. Now, in order to load or store an instruction or data in memory, what the 8088 processor would do is take the segment base, that is, one of the segment registers which forms the base, left shift it by four bits and add an offset.

For instance, in order to fetch an instruction from memory into the processor, the CS register would be used. The CS register would be the base for the code segment; it would be left shifted by 4 bits, and the IP, the instruction pointer, would be added to it. In this way, even though the registers used, CS and IP, are each only 16 bits, by shifting the segment by 4 bits and adding the IP it was possible to address 2^20 memory locations.
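A small sketch (not from the lecture) that works out this segment:offset calculation:

#include <stdint.h>
#include <stdio.h>

/* Real-mode 8088 address translation: physical = (segment << 4) + offset.
 * Both inputs are 16 bits, yet the result spans a 20-bit (1 MB) range. */
static uint32_t real_mode_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

int main(void)
{
    /* Example values: CS = 0xF000, IP = 0xFFF0 gives 0xFFFF0. */
    printf("0x%05X\n", (unsigned)real_mode_addr(0xF000, 0xFFF0));
    return 0;
}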

(Refer Slide Time: 23:38)

One big step in the Intel systems came when the 80386 was introduced in 1985; these were 32-bit microprocessors. This particular processor had a 32-bit external address Bus and could therefore address 2^32 memory addresses, that is, 4 GB of memory. What this means is that a system could have up to 4 GB of RAM present in it. Now, the same set of registers which were present in the 8088 are also present here, except that they are now extended from 16 bits to 32 bits, and because of this extension the registers are called EAX, EBX, ECX, EDX, EBP, ESI, EDI and so on.

There were also a lot more features, like the protected operating mode as well as virtual memory and segmentation schemes. One thing you would notice is that while all the other registers were extended to 32 bits, the segment registers continued to be 16 bits. Now, Intel ensured that even though we switched from a 16-bit processor to a 32-bit processor, backward compatibility with the old 8088 systems was still maintained. This meant that any software which was written for an 8088 or 8086 processor would straight away run on an 80386 without much of a problem.

To ensure this, even though the registers were extended from 16 bits to 32 bits, programs could still access the registers as if only 16 bits were available. For example, AX, BX, CX and DX would access the lower half of the corresponding extended registers, that is, bits 0 to 15. This particular feature of the 80386 processor was carried forward to the 486, the Pentium and so on.

(Refer Slide Time: 26:10)

Now, more recently, in 2003, there was the next big step in the x86 evolution. This was when the AMD K8 was introduced. The K8 moved from a 32-bit processor to a 64-bit processor. As a result, the registers which were known as the extended registers, and were 32 bits wide, were now extended further to 64 bits. So, the register which was previously called EAX on the 32-bit platforms like the 80386 is now called the RAX register, which essentially is 64 bits wide, and in the same way the other registers were also extended to 64 bits. In spite of extending from 32 to 64 bits, the companies Intel as well as AMD ensured backward compatibility; that is, a program written for the 8088 would still, under certain circumstances, be able to run on the 64-bit platforms as well.

Thank you.

Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 01
Lecture - 03
From Programs to Processes

Hello. In today's class we will be having a very brief introduction to an operating system concept called Processes.

(Refer Slide Time: 00:23)

So, the topics which we will cover are: from Programs to Processes, Memory Maps, System Calls, Files and, essentially, the structure of the Operating System.

(Refer Slide Time: 00:37)

Consider this particular program written in C. This program prints "Hello World" on to the screen. In order to compile and run this program, we first need to use a compiler such as gcc and specify the C source file name, hello.c in this case, and what we get is an executable, in this case called a.out. This executable is stored on the hard disk. In order to run this particular program, we issue a command like ./a.out, and this results in a process being created in the RAM. So, this process is essentially the a.out program under execution, which is present in the RAM.

(Refer Slide Time: 01:18)

To define it formally, a process is a program under execution, which executes from RAM and essentially comprises various sections, such as the Executable instructions, Stack, Heap and also a hidden section known as the State. This state is maintained by the operating system and contains various things like the registers, the list of open files, the list of related processes, etc.

So, in today's class we will look in more detail at what this particular process contains and how it is managed.

(Refer Slide Time: 01:55)

So, let us take a very simple example. Here we have a program, and this is the process that is created when the program is executed. The process has various sections: for example, it has the Text, Data, Heap and the Stack. Various parts of this program, when executed, get mapped into the various sections of the process. For instance, the instructions involved in the function main will get mapped into the text section of the process. Similarly, the instructions of other functions, such as the fact() function (mentioned in the above slide), will also get mapped into the text section.

Now, the global data and also the static data get mapped into the data section of the process. This section is actually divided into two parts, called the initialized and uninitialized sections. The third section is the heap: any dynamically allocated memory, such as m (mentioned in the above slide), which is dynamically allocated using malloc, gets created in the heap. The final section is called the stack, which contains all the local variables, such as n and m, and also information about function invocations. For example, in this case we have a recursive function which is getting invoked, and all this information is present in the stack.

Now, the memory map of a process, comprising all of these sections, has a maximum limit called MAX SIZE. Typically, at least in the processes used in typical operating systems these days, this MAX SIZE is fixed by the OS. For instance, in a 32-bit Linux operating system, the MAX SIZE of every process is fixed at 0xc0000000. In the xv6 operating system, which we are looking at in this course, the MAX SIZE of a process is fixed at the address 0x80000000.
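A hedged reconstruction of the kind of program on the slide (the exact code is not reproduced in the transcript, so the names other than fact, n and m are illustrative), annotated with the section each item lands in:

#include <stdio.h>
#include <stdlib.h>

int g = 10;                         /* global data -> data section  */

int fact(int n)                     /* instructions -> text section */
{
    /* each recursive call pushes a new frame on the stack */
    return (n <= 1) ? 1 : n * fact(n - 1);
}

int main(void)
{
    int n = 5;                      /* local variable -> stack      */
    int *m = malloc(sizeof *m);     /* malloc'd memory -> heap      */
    if (m == NULL)
        return 1;
    *m = fact(n) + g;
    printf("%d\n", *m);
    free(m);
    return 0;
}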

(Refer Slide Time: 04:07)

So, what we have seen is that every process, that is, a program under execution, gets mapped into an area which starts at 0 and ends at MAX SIZE. What is present beyond this MAX SIZE of the process? Typically, the Kernel or the operating system gets mapped to the memory region from MAX SIZE to the maximum limit. Everything, like the Text (i.e. the instructions of the operating system), the operating system data, the OS heap and also the device memory, gets mapped into this upper region.

Typically, any user program can access any part of the lower region, i.e. the green region (mentioned in the above slide). There would not be any problem in reading data from any of these user space regions or even writing data to parts of these user space regions. However, the process cannot access any data present in the Kernel memory, that is, beyond the MAX SIZE limit. The Kernel or the operating system, which executes from this upper region, can access data from any part of the address space; that is, it can execute or access data in this kernel region as well as in the user space region.

Now, what happens when we actually have multiple processes running in the same system? Each process would have its own memory map, having its own instructions, data, heap and stack, and the kernel component is also present beyond MAX SIZE. So, what you see is that every process in the system would have the kernel starting at MAX SIZE and extending beyond. This kernel part is going to be the same for every process that is executing in the system, while the region below MAX SIZE is going to vary from one process to another.

What does this mean? It means that when you execute one process and then execute another process, the region above MAX SIZE is going to be the same, while the region below MAX SIZE changes from one process to another. Now, we mentioned that user programs will not be able to access any data in the kernel space. In that case, how does a user program actually invoke the operating system?

(Refer Slide Time: 06:40)

There are special invocation functions which the operating system supports; these are known as System Calls. System calls are a set of functions which the OS supports, and a User Process can invoke any of the system calls to get information or to access hardware or other resources within the Kernel.

What happens when a system call is invoked is that the process, which is generally running in user mode, gets shifted to something known as kernel mode or privileged mode, which allows the kernel or the operating system to actually execute. When the system call completes execution, the user process resumes its execution from where it stopped.

(Refer Slide Time: 07:25)

Let us take the example of the printf statement. printf is in fact a library call; it is present in libc, and it results in a particular function in User space known as write() being invoked. The printf() function thus invokes a system call called write, with the parameter STDOUT. STDOUT is a special parameter which essentially tells the operating system that the string provided by printf should be displayed on the standard output, that is, the screen.

The write system call causes a trap to be triggered, and this trap results in something known as a Trap Handler in the Kernel space being executed. The Trap Handler would then invoke the function corresponding to the write system call. This write system call will then be responsible for actually printing the string str on the screen. After the write system call completes, execution is transferred back to the user space and the process continues to execute.
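On a POSIX system, this path can also be exercised directly by calling write on file descriptor 1 (standard output); a minimal sketch, not taken from the lecture:

#include <unistd.h>

int main(void)
{
    const char str[] = "Hello World\n";

    /* This is essentially what printf ends up doing: a write system call on
     * the standard output, which traps into the kernel. */
    write(STDOUT_FILENO, str, sizeof str - 1);
    return 0;
}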

(Refer Slide Time: 08:40)

What is the difference between a System Call and a Standard Function Call or Procedure Call? One important difference is that when we want to invoke a function in a program or in a process, we use an instruction such as the CALL instruction, which is a standard x86 assembly instruction; this results in the function getting called, and after that function completes it returns back to the calling function.

In order to invoke a system call, however, we use a TRAP instruction such as int 0x80. Here, int stands for interrupt, or software interrupt, and it results in the system shifting from the user mode of operation to the kernel space. The trap instruction causes the kernel to be invoked and causes instructions in the kernel to then be executed. However, when we use a standard function call or procedure call using the CALL instruction, there is no change or shift from user space to kernel space; the execution continues to remain in the user space as it was before.

Another very subtle difference between a system call and a standard procedure or function call is that, for a function call, the destination address, that is, the destination function which is invoked, can be at a relocatable address; it could change every time the program is compiled and so on. However, with a system call, when a trap instruction is used, the hardware, or rather the processor, decides where the next instruction in the kernel space gets invoked. This is going to be fixed irrespective of what program is running, what operating system is running, and so on.
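For illustration, on 32-bit Linux the write system call can be invoked directly with int 0x80. This is a sketch assuming the 32-bit Linux system-call convention (where system-call number 4 is write); it must be built as 32-bit code, and xv6 uses a different trap number and convention:

int main(void)
{
    const char msg[] = "Hello World\n";
    long ret;

    /* eax = system-call number, ebx/ecx/edx = arguments; int 0x80 traps
     * into the kernel, which runs its handler and returns the result in eax. */
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "a"(4),               /* write            */
                        "b"(1),               /* fd 1 = stdout    */
                        "c"(msg),             /* buffer           */
                        "d"(sizeof msg - 1)   /* length           */
                      : "memory");
    return 0;
}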

(Refer Slide Time: 10:30)

One crucial aspect when actually designing operating systems is: what system calls should the operating system support? We have seen that the only way a user process can invoke a particular functionality in the operating system is through the system call interface. So, the question now is, if a person is designing an operating system, what interface should the system calls support?

One obvious approach is that the system call interface should have several sophisticated features, so that a user process can very easily interface with or invoke the important functionalities in the operating system. However, a different approach is to have a very simple system call interface and abstract whatever is necessary from the operating system. We will see a particular example of this in the next slide.

(Refer Slide Time: 11:29)

Let us take the example of the system calls that an OS supports for accessing files. As we know, files are data which is persistent across reboots; this data is stored on the hard disk and can be read, written or accessed using functions such as fopen, close, read, write and so on. Every time we do a file open, the hard disk needs to be accessed, so the process would need to invoke a system call into the kernel, and the kernel should take care of accessing the hard disk, the hard disk buffers or any other storage medium to open the file and return a pointer back to the process.

Now, the question is: how does an operating system designer decide which system calls should be provided or supported in order to access files? Some of the obvious things are that there should be system calls to open a file, read or write a file, modify the creation date, set permissions and so on. The operating system could also support more complicated or more sophisticated operations, such as being able to seek to a particular offset within a file, being able to link between files and so on. These are the essential requirements that the system calls for handling files should support.

On the other hand, the operating system should also be able to hide some details about the file. For instance, details which should not be exposed through system calls are things such as the storage medium where the file is stored, for instance whether the file is stored on a USB drive, a hard disk or a CD-ROM. This is abstracted out by the operating system, and the user process would not easily be able to know where the file is stored. Another aspect which is abstracted out by the operating system is the exact location of the file on the storage medium.
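A small sketch (assuming the POSIX interface; the file name is just illustrative, not something from the lecture) of the kind of file system calls being discussed:

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[128];

    int fd = open("notes.txt", O_RDONLY);    /* system call: open the file   */
    if (fd < 0)
        return 1;

    ssize_t n = read(fd, buf, sizeof buf);   /* system call: read from it    */
    if (n > 0)
        write(STDOUT_FILENO, buf, n);        /* system call: write to stdout */

    close(fd);                               /* system call: close the file  */
    return 0;
}

Note that nowhere does the program say which storage medium the file lives on or where on that medium it is located; those details are abstracted away by the operating system.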

(Refer Slide Time: 13:40)

We will now look at what a typical Operating System structure looks like. Suppose we consider the Operating System as the big green block shown; it has several modules built internally. For example, it would have a memory management module which manages all the memory in the system; it would have a CPU scheduling block; there is also the file system management module, which controls how file systems, such as those present on a hard disk or a CD-ROM, are managed. Then you have a networking stack which manages the TCP/IP network, and you have something known as the inter-process communication module, which takes care of processes communicating with each other.

Two important things which have not been mentioned yet are the System Call Interface, which allows user processes to access features or functionalities within each of these modules, and the Device Drivers, which take care of communicating with the hardware devices and other hardware resources within the system. So, these essentially are all the different modules that an operating system supports.

(Refer Slide Time: 14:52)

In a Monolithic structure of an operating system, all the various modules of the OS are present in a single addressable kernel space. What this means is that this is just one large chunk of code; you could think of it as one large program in which all these modules are present. Therefore, calling any function from, say, the memory management module to the networking stack would just be a simple function call. Similarly, a call from the networking stack to a device driver would be another function call.

This is essentially the advantage of having such a monolithic structure: there is direct communication between one module and another. On the other hand, the Kernel space becomes very large, and is therefore difficult to maintain and likely to have more bugs. Typical operating systems such as Linux, xv6 and MS-DOS use a monolithic structure. To take an example, the Linux operating system, or the Linux Kernel, has around 10 million lines of code; all these 10 million lines of code comprise the entire Linux Kernel, which is present in this single kernel space.

(Refer Slide Time: 16:14)

Another common structure of the operating system is known as the Microkernel structure, where the kernel is highly modular and every component of the kernel has its own address space. It is like having each of these components as independent processes, with a very small microkernel which actually runs in the kernel space and which is in charge of managing communication between each of these processes, and also communication between user processes and the operating system processes.

The advantage here is that the microkernel is extremely small; ideally, it is small enough to fit into the L1 cache of the system itself, so typically it would be quite fast. However, the drawback is that you now cannot have direct calls from, say, the file management module to a device driver. Unlike the monolithic kernel, where you could make a direct function invocation from the file management module to a device driver function, here every invocation of that form must go through a communication channel known as an IPC or Inter Process Communication channel.

With this we end today's lecture; from the next lecture onwards, we will look more at CPU sharing.

Thank you.

Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 01
Lecture - 04
Sharing the CPU

Hello. In the introductory video, we had seen that one of the most important roles of an operating system is resource management. Essentially, since the system that the OS runs on has a limited amount of hardware resources, the operating system is in charge of utilizing these resources in an efficient way.

Among all the resources present in the system, probably the most critical is the CPU. Most systems have very few CPUs; the earlier systems, especially those of the 1990s and early 2000s, had a single processor, while systems these days are equipped with 4, 8 or 16 CPUs. It is important that the operating system ensures that the CPUs present in the system are adequately utilized. So, we will see how these CPUs are managed by the OS.

(Refer Slide Time: 01:32)

In systems these days, we have several applications that are running concurrently. We had seen that we could have something like an Office application, Internet Explorer, a Skype application and a Powerpoint application, and all of them need to share the single CPU. So, how is this possible? Essentially, how does the operating system decide who should run, or rather, which application should run on the CPU?

One very simple thing that the operating system can do is to put one application onto the CPU and let it run till it completes. For instance, a very primitive operating system such as the MS DOS operating system would start an application and make it execute on the CPU until it completes. When that application completes, the next application, that is App 2, would execute on the CPU till it completes; then application 3 would start and execute on the CPU till it ends, and then application 4 would start. While this is a very easy way to manage CPU time, it is not very efficient, and we will see why.

(Refer Slide Time: 03:10)

Let us say application 1 is executing on the CPU and after some time it waits for a particular operation to be done by the user. For instance, application 1 is waiting for an event like a scanf; essentially, it is waiting for the user to input something through the keyboard. From this time onwards, until the user actually presses a key, the CPU does nothing but wait idly. Essentially, it is wasting time; these are the idle cycles of the CPU. Only when the event occurs will application 1 continue to execute.

Thus, we see that even though the scheme of executing one application after the other completes is very simple, it does not utilize CPU time adequately or efficiently, because every time an application waits for an event, the CPU is idle until that event actually occurs. And this, as we see, reduces performance with respect to the utilization of the CPU.

(Refer Slide Time: 04:46)

A better way to do this is something known as Multiprogramming. In an operating system that supports multiprogramming, application 1 will continue to execute until it requires an external event. When a function like scanf gets executed in App 1, that application becomes blocked; essentially, it is preempted from the processor, and another application, in this case App 2, executes on the CPU. After a while, when the user inputs something through the keyboard, the event is obtained, and as a result of this application 1 is put into a queue.

At a later point in time, application 1 will again be given the CPU and will execute from
where it had stopped. Essentially it had stopped during the processing of the scanf, and it
will continue to execute from there. It has now got the input
character which the user pressed on the keyboard, and it will continue to execute from
this point onwards. Therefore, we have seen that we are preventing the CPU from being
idle. Essentially, by executing another application on the CPU, we have prevented the
CPU from running idly and have therefore increased performance. However, there is a
problem with this as well.

(Refer Slide Time: 06:41)

Now, consider this particular scenario where application 1 runs for some time and then
gets blocked. However, application 2 has something like this (mentioned in slide) present
in its code; essentially this is an infinite loop. So, while(1) would continue to execute
infinitely. As a result of this, app 2 will never stop; it will keep executing and holding
onto the CPU.

Due to this, other applications such as app 3, app 4, as well as app 1 once it
has obtained its event, will never be able to run. The only way out of this
would be to forcefully stop application 2 or reboot the system. In earlier operating
systems like MS DOS, this used to happen quite frequently. As a result, it was often
required that the system be restarted because of one application hanging the entire
system.

46
(Refer Slide Time: 07:55)

More recent operating systems support what is known as Multitasking or Time Sharing.
Essentially, in this case, the CPU time is split into slices; each slice is known as a
time slice or a time quantum. In each time slice, a process would run; the process will
continue to execute until one of two things happens.

First, it will execute until its time slice completes, in which case another process is given
the CPU. So, for instance, over here process 1 executes until its time slice completes, and
then process 2 is given the CPU and process 2 would execute. Then when process 2
completes its time slice, process 3 will be given the CPU and will continue to execute.
In this way all processes are sharing the CPU. After some time a process could
continue to execute from where it had stopped. For instance, process 1 had executed till
its time slice completed, and after some time, based on some decision by the operating system,
process 1 would be given the CPU again to execute. At this point in time process 1 will
continue to execute from where it had stopped.

From a user's perspective, this delay as a result of multitasking, that is, as a result of other
processes being executed, or said another way, the delay in the intermittent
execution of a single process, is hardly noticeable because the time slice is very small.

The other way that a process stops executing is when it needs to wait for an event, as
we have seen in the previous slide, or when it terminates.

Either way, it results in another process being offered the CPU, and the new process will
execute. So, the advantage of multitasking is that from a user's perspective, it gives the
impression that all the applications are running concurrently, and there is no starvation. Also,
from a system perspective, the overall performance is improved.
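To make the idea of time slicing concrete, here is a small illustrative sketch in C. It is only a toy user-space simulation, not how any real kernel implements scheduling: each "application" is just a counter of remaining work, and one loop iteration stands in for one time quantum handed out in round-robin order.

#include <stdio.h>

/* Toy simulation of round-robin time slicing (illustrative only). */
int main(void)
{
    int remaining[3] = {3, 5, 2};   /* work left for App 1, 2, 3, in time slices */
    int done = 0;

    while (done < 3) {
        for (int i = 0; i < 3; i++) {
            if (remaining[i] == 0)
                continue;                    /* this app has already terminated */
            printf("time slice given to App %d\n", i + 1);
            remaining[i]--;                  /* it ran for one quantum           */
            if (remaining[i] == 0) {
                printf("App %d completed\n", i + 1);
                done++;
            }
        }
    }
    return 0;
}

Each application makes progress a little at a time, which is exactly the impression of concurrency described above.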

(Refer Slide Time: 10:52)

So, the CPU was one example of a resource which the operating system manages.
Essentially, the operating system decided which of the several processes would be given a
chance to execute on the CPU. Along with the CPU, the system would have several
other resources, and the operating system would need to share these resources among the
various users, i.e. among the various applications. For instance, the printers which are
connected to the system, the keyboard, RAM, disks, the files on the disk, as well as
networks. All these are resources in the system, and the operating system should manage
these resources and ensure that all applications have a fair usage of these various
resources.

48
(Refer Slide Time: 11:54)

So, what would happen in the case of Multiprocessors? Essentially, where the system has
more than a single processor, such a system would look something like this. There
would be several CPU chips and each chip would have several cores. For
instance, in this case there are two CPU chips, and each CPU has two cores; and
further it is possible that each core runs multiple threads. This is achieved by a
technique known as simultaneous multithreading; in the Intel nomenclature this is known
as Hyper-Threading. So in this particular system, which has two chips, two
cores per chip and two hardware threads per core, the operating system has to manage
all these resources.

49
(Refer Slide Time: 13:07)

So essentially, the operating system has to ensure that all computing environments in
the system are adequately used. In such a case, we would obtain
parallelism, in the sense that suppose we have 2 processors over here and let us assume
that each processor has a single core. It would mean that the operating
system could then schedule two processes, or rather two applications, to execute
simultaneously on the processors. So, one application will execute on one processor,
while the other application executes on the second processor. In addition to this, time
slicing is possible on each processor, i.e. each processor's time could be split into time
quanta or time slices as we have seen before and shared among the various applications.

So in this case, for example, each processor has a time slice which completes
periodically; and at the start of each new time slice, the operating system running on that
processor would ensure that an application gets scheduled to execute on that CPU.
Similarly, for the second processor there are also time slices, and at the end of a time slice
the application being executed will be preempted from the processor and the OS will
select another application to execute. The operating system should be designed in
such a way as to ensure that an application, unless programmed to do so, does not execute
simultaneously at the same instant on both processors.

50
(Refer Slide Time: 15:13)

Now, one huge issue arises in a multitasking environment, where the time of each processor is
split into quanta or time slices allowing different processes or different applications to
run concurrently, and also when there are multiple processors present in the system,
allowing two or more processes or applications to execute in parallel. The issue that
arises here is when two processes or two applications simultaneously request
access to some resource.

For instance, let us say this resource is a particular file, or it could even be a device like a
printer, and we have two applications - application 2 and application 5 - that want to write or
print something to this resource. So, both application 2 and application 5 want to use
this particular resource. This results in what is known as a race condition, and it
is quite a significant issue when we are studying operating systems or when we are
actually looking deep into operating systems.

So essentially, in order to avoid such an issue, the operating system needs to
synchronize the two applications. It should ensure that when one process
requests a particular resource, like a file stored on the hard disk or the printer, another
application, app 5, will not be given permission to write to or access that particular resource.
Only when application 2 completes using the resource will application 5 be allowed to use
that resource; said another way, the operating system will ensure that there is serialized
access to this resource. It will ensure that not more than one application is using this
resource at a particular instant of time.

(Refer Slide Time: 17:46)

Essentially, operating systems solve these problems by using a technique known as
Synchronization. Now, you could think of synchronization as a kind of lock which
is associated with a resource. So, when an application wants to access that resource, it
should first get the lock; essentially it should first acquire the lock. For instance, we
have application X which wants to use the resource; in such a case application X should
lock the resource. Locking the resource would mean that no other application, such as
App Y, or application 5 in this case, would be able to use this resource.

Now, after application 2 completes using the resource, it will unlock the resource. And
during that time, if we have a second application which has also requested the
resource, the second application would have to wait. So, when application 2 completes, it
will unlock the resource; and this is a signal to application 5 that it can then use the
resource. In order for application 5 to use the resource, it will first lock the resource,
ensuring that no other application can use the resource; and at the end of its usage, it
will unlock the resource. Unlocking will allow other applications to use this shared
resource.
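The same lock/unlock idea can be seen in user-space code using POSIX threads. The sketch below is only illustrative: the two threads stand in for applications 2 and 5, and the mutex stands in for the lock on the shared resource (the file or printer in the example above).

#include <pthread.h>
#include <stdio.h>

/* The mutex plays the role of the lock associated with the shared resource. */
pthread_mutex_t resource_lock = PTHREAD_MUTEX_INITIALIZER;

void *use_resource(void *arg)
{
    const char *name = (const char *)arg;

    pthread_mutex_lock(&resource_lock);     /* acquire: others must now wait   */
    printf("%s is using the shared resource\n", name);
    pthread_mutex_unlock(&resource_lock);   /* release: a signal to the waiter */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, use_resource, (void *)"App 2");
    pthread_create(&t2, NULL, use_resource, (void *)"App 5");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Whichever thread acquires the mutex first uses the resource; the other is blocked until the unlock, so access to the resource is serialized.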

(Refer Slide Time: 19:27)

Now, one decision that the operating system needs to make, especially in a multitasking
or multiprogrammed system, is the decision about which application should run
next. As we have seen, in a multitasking environment application 1 would execute till
its time slice completes, and the operating system then decides which process or which
application should run next. This decision is made by an entity within the operating system
called the Scheduler.

Essentially, the scheduler needs to be fair, which
means that it should be designed in such a way that every process or every application
running on the system gets a fair share of the CPU. Also, in some systems
the scheduler should be designed so as to prioritize some
applications over others. What we mean by this is that we could have some
applications which are far more important than other applications, and these applications
need to be prioritized. In other words, these high priority applications should be given
more CPU time to execute, and it should also be ensured that these high priority applications
do not wait too long to execute on the CPU.

53
To take an example, let us say we have three applications - applications 1, 3 and 4 - which
are low priority applications. For instance, they could be executing a compiler or doing
some scientific operations like computing the primality of a number, and so on. These
could be considered lower priority applications. Process 2, let us assume, is a high
priority application, in the sense that it could control some parameter in, say, a robot, or it
could acquire some information about the environment, such as the temperature or
humidity, or the state of a particular value.

Now, since this application is of a higher priority, the operating system should
ensure that application 2 is given more CPU time to execute, and also that this
application does not wait too long before it gets the CPU. So, all these
decisions - based on the number of processors, the number of CPU cores and the
number of threads in each core - about which process should execute on which CPU
and on which thread of the CPU, are made by the scheduler, which executes in the operating
system.

(Refer Slide Time: 23:00)

Another very important requirement when we speak about operating systems is the
requirement for Isolation. Essentially, this arises from the fact that we would have
multiple applications that execute either concurrently or in a time sliced manner in the
system. Now these applications could be from different users. It could also happen that
some of these applications are malicious, i.e. an application may be a virus or a Trojan
which has managed to find its way into the system and is managing to execute on the CPU.
Therefore, isolation is required to protect one application from another, so that the
data in one application is not visible to another application.

Another requirement for isolation is with respect to the kernel, i.e. the operating
system. Systems are designed in such a way that processes or applications that
execute on the system do not directly access the various resources.

For instance, an application running on the system will not be given direct access to a
device such as the printer. If the application wants to use the printer, it needs to go
through the operating system, i.e. it would need to make a system call, which in turn would
trigger the operating system to execute the device driver for the printer. The
application would then send the document or text file which needs to be printed to the
operating system, which then transfers it to the printer.

(Refer Slide Time: 25:08)

So, in order to achieve this, most processors and most operating systems have this ring
like structure. Essentially, in the Intel platforms, i.e. the Intel processors, there are 4 rings -
ring 0, ring 1, ring 2, and ring 3. Now, ring 0 is where the operating system executes. So,
this is the most privileged mode of operation. In ring 0, the operating system or the
kernel which executes here can do quite a few things, like managing the various
resources, directly communicating with various devices, and also having control over
the various applications that are executing. In modern operating systems like Linux,
which run on Intel processors, the third ring, that is ring 3, is used by the user applications.
This is the least privileged ring and it ensures that applications that execute in
this ring do not have access to the kernel, i.e. an application running in this ring under
normal circumstances will not be able to view or determine anything about the operating
system.

So, whenever you run an application on your system, such as, let us say, a web browser or an
Office application, the application would be started as a user application in ring 3, and
in order to use a resource such as the network, or to display something on the monitor,
these applications would need to invoke the kernel through a system call. Therefore,
the kernel is isolated from all the user applications.

In addition to this, the operating systems as well as the processors are designed in such a
way that each application that runs in user space is isolated from the others. What
this means is that if we have, say, a web browser running here and also an Office
application, the operating system along with the processor will
ensure that the web browser has no information about the other application; that is, it
does not know what the Office application is doing. And the Office application will not
know what the web browser is currently executing. For instance, the Office application
will not know what web pages are currently being browsed.

So, as you can see, this is useful to isolate different users as well. If you have two
different users who are sharing the system, user 1 will not know what user 2 is
executing unless he makes use of certain kernel functions. And the only way to use those
kernel functions is through system calls, or, in certain cases like Linux, through
certain functionalities which are exposed by the kernel. The user will then be able
to use those functionalities of the kernel to learn about the other process.
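As a small concrete illustration of this isolation, the program below runs in ring 3 and cannot drive the display hardware itself; it asks the kernel to do so through the standard POSIX write() system call on file descriptor 1 (standard output). This is just one example of a system call, not a description of any particular kernel's internals.

#include <unistd.h>
#include <string.h>

int main(void)
{
    const char *msg = "hello from user space\n";
    /* write() traps into the kernel (ring 0), which talks to the device */
    write(1, msg, strlen(msg));
    return 0;
}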

56
(Refer Slide Time: 29:09)

Another important aspect when we are talking about operating systems is the need for
Security. Essentially, security features are added to operating systems to ensure that only
authorized users are able to use the system. If, for instance, I do not have a user
account on a particular server, then the operating system will ensure that I will not be able
to log in to that particular server, and I will not be able to run any application on that
server. How is security achieved in the system? It is achieved by several features that operating
systems support; for instance, by a technique known as access control, which will
ensure that files that are created by a particular user are not visible to another user.

For instance, suppose I log in to a system as a guest, i.e. through a guest account. The
operating system will then know about this and determine what files I am allowed to
execute and what files I will be able to access. So, one way to achieve security is by
using something known as Access Control. Access control mechanisms in the
operating system manage access to the various resources. For instance, if I log in to
a system using a guest account, then the operating system will detect this and determine
what resources and what files I can access.

For instance, with a normal guest account, I may not be able to access, let us say, the USB
drive, while on the other hand I may have been given read-only access to certain files
present on the hard disk. Other files which are system related or belong to the
operating system may not even be given read or write permission, so I will not be able to
view those files at all. Another technique which is used by the operating system in order
to achieve security is user passwords and cryptography. Passwords ensure that unauthorized
people are not allowed to use the system. Modern laptops also use techniques like
biometrics or fingerprint scans to filter the people who are given access to the system.

(Refer Slide Time: 32:00)

So this particular slide shows how access control is typically implemented. Essentially,
we have a matrix here (mentioned in the slide above) and each row
corresponds to a user. So, Ann, Bob and Carl are three users of the system, while the
columns show the resources, i.e. file 1 which is stored on the hard disk, file 2, file 3 and
program 1. Depending on the user who has logged in, authenticated by their password,
the operating system grants different permissions for each file.

For example, the user Ann has the read and write permission for file 1 as well as file 2
and can execute program 1. However, Ann has no permission at all for file 3. In a similar
way, Bob, another user of the system, can only read file 1; he cannot write to file 1, but he
can read as well as write to file 3. Based on the permissions set in the operating system,
Bob does not have access to program 1, that is, he cannot execute program 1.

58
Similarly, the third user Carl has neither read nor write access to file 1 and
file 3, and has access only to file 2; however, Carl can execute program 1 as well as
read program 1.
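One simple way to picture this matrix in code is shown below. This is only a toy encoding of the slide's example in C, with permissions as bit flags; real operating systems store such information very differently (for example as permission bits or access control lists attached to each file).

#include <stdio.h>

#define P_READ  1
#define P_WRITE 2
#define P_EXEC  4

/* rows = users (Ann, Bob, Carl), columns = file 1, file 2, file 3, program 1 */
int acm[3][4] = {
    {P_READ | P_WRITE, P_READ | P_WRITE, 0,                P_EXEC},          /* Ann  */
    {P_READ,           0,                P_READ | P_WRITE, 0},               /* Bob  */
    {0,                P_READ,           0,                P_READ | P_EXEC}, /* Carl */
};

int allowed(int user, int object, int perm)
{
    return (acm[user][object] & perm) != 0;   /* is the requested permission bit set? */
}

int main(void)
{
    printf("Can Bob write file 1? %s\n", allowed(1, 0, P_WRITE) ? "yes" : "no");
    printf("Can Carl execute program 1? %s\n", allowed(2, 3, P_EXEC) ? "yes" : "no");
    return 0;
}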

(Refer Slide Time: 33:46)

So let us say now that we have our system, comprising the processor, the various resources
and the operating system which manages these various resources and also manages the
applications that execute on the system. How do we assess the security of the system?
There are two ways in which this is generally done: by mathematical
analysis, or by manual or semi-automated verification techniques.

Many systems, especially those which are used in critical applications such as the
military or other defense related applications, go through rigorous security assessment to
ensure that these systems, including the operating system as well as the hardware and
applications, have a sufficient level of security, so that they can be used for such critical
applications. So, this is more of a testing and validation related aspect of the system.

Thank you.

59
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 02
Lecture - 05
Memory Management

Hello. In a previous video (the Lecture 4 video) we saw how the operating system has
to manage the CPU, because it is probably the most important resource in the system.
Arguably, the second most important resource, or rather the second most important
hardware resource in the system, is the Memory.

In this video, we will look at how the operating system manages the memory.
Essentially, we will see how the operating system manages the RAM present in the
system.

(Refer Slide Time: 00:56)

So let us review this particular slide (mentioned above) again. We have seen that when
we take up a particular program and this could be any program written in any language
and when we compile it, we will get what is known as an Executable.

60
Now, the executable will be stored on the hard disk and whenever a user runs this
program or executes this program (mentioned in above slide), it creates what is known as
a Process. This process is created by the operating system and it executes in RAM.
Essentially, what the operating system does is that the executable stored on the
hard disk is loaded into the RAM, and then execution is passed to this
particular program, and the program will then execute on the CPU.

As we have seen, this particular process (mentioned in above slide), which is present in
memory, comprises several segments, such as the text segment which contains the
executable instructions, the Stack, the Heap, and also, as we have seen, some hidden metadata
maintained by the operating system, such as registers, the list of open files and related
processes. All these together constitute the process. Now, what is important for us is this
process (mentioned in above slide), and it is present in the RAM.

(Refer Slide Time: 02:45)

So as we know, the RAM or Random Access Memory, also called the Main Memory, is
a limited resource in the system. Each system would have something like 4, 8, 16 or
32 GB of RAM. At the same time, we may have multiple processes executing almost
simultaneously. Like we have seen in the previous video (the Lecture 4 video), these
processes may execute in a time sliced manner, and if there are multiple processors
present in the system, then these processes could also execute in parallel.

However, in all these cases, these processes and their corresponding memory maps should
be present in the RAM. Essentially, the memory map corresponding to process 1, the
memory map corresponding to process 2, the memory map of process 3 and the memory
map of process 4 should be present in the RAM, in order that these 4 processes execute
in parallel or in a multitasked environment.

Now, what we are going to see is how the operating system manages this resource
(mentioned in above slide) or in other words the RAM such that it would allow multiple
processes to execute almost simultaneously in the system.

(Refer Slide Time: 04:25)

So, one of the most primitive ways of managing memory, used especially by the older
operating systems, is what is known as the Single Contiguous Model. Essentially, in
this we have a RAM over here (mentioned in above slide), which is the RAM of the
system, and what is ensured by the operating system is that this RAM is
occupied by one process at a time. Essentially, at any particular instant there is only one
process and its memory map present in the RAM. Only after this process
completes executing will the next process be loaded into the RAM.

(Refer Slide Time: 05:21)

So, the drawbacks are quite obvious.

(Refer Slide Time: 05:25)

63
The first is that we are forced to have a very sequential execution. Only when one process
completes can the second process occupy the RAM, and so on. Another
limitation of this particular model, that is the single contiguous model, is that the size of
the process is limited by the RAM size. For instance, let us say we have a RAM of,
say, 12 KB. While this seems to be a very small amount of RAM, this size of RAM is
quite common in embedded systems. So, given this RAM of 12 KB, and let us say our
process size is 100 KB, then it is quite obvious that this process cannot execute using
this RAM. Essentially, the RAM is not sufficient to hold the entire process.

(Refer Slide Time: 06:35)

The next model that we will see, which is a slight improvement over the single
contiguous model, is what is known as the Partition Model. Essentially, in this model, at any
instant of time we could have multiple processes that occupy the RAM simultaneously.
For instance, in this particular case we have two processes, the blue process and the
green process (mentioned in above slide), that occupy the RAM, and therefore the
processor could execute both of these processes either in parallel or in a
time sliced manner.

Now, in order to manage such partitions, the operating system would require something
known as a Partition Table. Typically, this partition table (mentioned in slide above)
would also be present in the RAM; it will be present in an area which is not shown over
here. The partition table would have the base address of a process, the size of the
process and a process identifier. For instance, the blue process has a base
address of 0x0, indicating that it starts from the 0th location in the RAM, and this process
has a size of 120 KB. So, 120 KB means up to this point.

There is also a flag known as the usage flag, which mentions whether this particular area
in RAM is in use or free. For instance, let us take process 1, that is the green
one shown over here (mentioned in above slide). This process 1 starts at the
memory location 120K, that is this point, and has a size of 60K. So, it extends up to 180K
and this area is also in use. Now, there is another entry in the partition table which is
marked as free. This starts at 180K and extends for a size of 30K. So, this
corresponds to this white area over here (mentioned in above slide). The operating
system could possibly use this free memory to run perhaps a third process and
therefore would be able to have three processes present in the RAM at the same time.
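A partition table entry of this kind could be pictured in C as below. The field names and the process identifiers are only illustrative; they are not taken from any real operating system.

struct partition_entry {
    unsigned int base;    /* start address of the region in RAM, e.g. 0x0 */
    unsigned int size;    /* size of the region, e.g. 120 KB              */
    int          pid;     /* identifier of the process occupying it       */
    int          in_use;  /* usage flag: 1 = in use, 0 = free             */
};

/* The table from the slide: one process at address 0 of size 120K,
 * process 1 at 120K of size 60K, and a free region of 30K at 180K. */
struct partition_entry partition_table[] = {
    { 0,          120 * 1024,  0, 1 },
    { 120 * 1024,  60 * 1024,  1, 1 },
    { 180 * 1024,  30 * 1024, -1, 0 },
};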

(Refer Slide Time: 09:34)

So, let us say a new process with the identifier 5 has just been created. The operating
system would look into the partition table and create an entry for this new
process. Now, since this process requires 20 KB of RAM, the
operating system will allocate 20 KB of RAM for that process. This allocation is done
from the free space and, as we see over here (mentioned in above slide), the 3rd process
is now present in the RAM while the free space reduces in size. So, we now have
10 KB of free space.

Similarly, when a process completes execution, for instance when process 1 completes
its execution, the area in RAM that it holds will be deallocated. Consequently, the
corresponding entry in the partition table will be marked free. This would lead to a free
memory of 60K in the RAM, which corresponds to this particular area in the RAM
(mentioned in above slide) which was used by process 1. While this technique of
partitioning the RAM, and thereby allowing multiple processes to be allocated in the RAM,
is quite easy to do, it could lead to something known as Fragmentation.

(Refer Slide Time: 11:18)

So, let us say this is how our RAM looks, and as we have seen we have a total of 70 KB of
RAM which is free, i.e. we have 60 KB plus another 10 KB; that is 70 KB of free RAM
memory. However, in spite of having 70 KB of free RAM, a new process, such as a
process with id 6 which has a size of 65 KB, cannot start even though 70 KB of free RAM is
available, and the reason for this is that the 70 KB of free memory is not in contiguous
locations.

66
For instance, we have 60 KB of free memory present here and 10 KB of free memory
present here (mentioned in above slide). Individually, each of these blocks of free
memory is not sufficient to start a new process of 65 KB. This is what is termed
fragmentation, and it could result in what is known as under-utilization of the RAM.

(Refer Slide Time: 12:37)

Now, let us assume that when a new process arrives, there is a sufficient amount of free
space present in the RAM. Next, the operating system has to decide
which of these free blocks the new process should be loaded into. For instance,
over here, there is a new process which has just arrived and we have 3 blocks of free
memory: this one, this one and this one (mentioned in above slide). So, which of these
three blocks of free memory in the RAM should the new process be loaded into?

There could be several algorithms to achieve this. One of the simplest algorithms
is something known as First Fit. With the First fit algorithm, what the operating
system does is scan the free blocks in RAM starting from the top and use the first
available free block which is at least as large as the process's requirements. So, for
instance, over here the RAM is scanned from the top and it is seen that the first free block
is too small for the process to fit in. Therefore, it is not allocated there.

67
The next free block is large enough to hold this process. Therefore, the process is
allocated here (mentioned in above slide). While this First fit allocation algorithm is
extremely easy to perform from an OS perspective, it could make fragmentation worse.
Essentially, with First fit what we are doing is breaking the
free blocks into smaller and smaller chunks. Individually, each chunk may then not be
able to fit a single process, thus making the fragmentation worse.

The other algorithm that we will see is known as the Best fit algorithm, and this performs
considerably better with respect to fragmentation. Essentially, it will not fragment the
memory as much as the first fit algorithm. What happens here is that when a new
process enters or is created, the operating system will scan through all free blocks that
are available and choose the block which is the best fit for that process. In this
particular case, as we have seen earlier as well, this particular free block is too small
(mentioned in above slide) to cater to this new process, this free block is too large, while
the 3rd free block is just right.

So, the operating system would allocate this process into this free block as shown over
here (mentioned in above slide). While the best fit algorithm efficiently utilizes the
available RAM, reducing the fragmentation issue, there may be a performance hit.
Essentially, it may result in a deterioration of performance. The reason is that now
the operating system has to scan through every free block that is available and needs to
make a decision about which free block is best for the new process, and this takes
some time.
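The two allocation policies can be sketched as two small C functions operating on the illustrative partition_entry table shown earlier. Again, this is only a sketch of the idea, not code from any real operating system: each function returns the index of the chosen free entry, or -1 if no free block is large enough.

int first_fit(struct partition_entry *t, int n, unsigned int need)
{
    for (int i = 0; i < n; i++)                  /* scan from the top ...           */
        if (!t[i].in_use && t[i].size >= need)   /* ... take the first large enough */
            return i;
    return -1;
}

int best_fit(struct partition_entry *t, int n, unsigned int need)
{
    int best = -1;
    for (int i = 0; i < n; i++)                  /* examine every free block         */
        if (!t[i].in_use && t[i].size >= need)
            if (best == -1 || t[i].size < t[best].size)
                best = i;                        /* remember the tightest fit so far */
    return best;
}

First fit stops at the first block that is big enough, while best fit must look at every free block before deciding, which is where the extra cost comes from.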

68
(Refer Slide Time: 16:28)

Another issue with the partitioning model is the case of deallocation. Essentially,
deallocation would occur when a process completes its execution or is terminated and
has to be removed from the RAM in order to allow a new process or another process to
execute from the RAM.

Deallocation would require a new free block to be created, and would result in
the free flag being set for the corresponding block in the partition table. What the
operating system now has to do is detect that there are indeed three contiguous free
blocks present in the RAM (mentioned in above slide) and be able to merge these three
blocks into one single block.

69
(Refer Slide Time: 17:34)

The advantage of merging them into one large free block is that this larger free block
could now cater to a larger process, thereby reducing the effect of fragmentation.
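Merging can also be sketched over the same illustrative partition table: assuming the entries are kept sorted by base address, a single pass can coalesce runs of adjacent free entries into one larger free entry. This is only a sketch of the idea.

int merge_free_blocks(struct partition_entry *t, int n)
{
    int w = 0;                                   /* write index of the compacted table */
    for (int r = 0; r < n; r++) {
        if (w > 0 && !t[w - 1].in_use && !t[r].in_use &&
            t[w - 1].base + t[w - 1].size == t[r].base) {
            t[w - 1].size += t[r].size;          /* adjacent free blocks: coalesce     */
        } else {
            t[w++] = t[r];                       /* keep this entry as it is           */
        }
    }
    return w;                                    /* new number of entries in the table */
}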

(Refer Slide Time: 17:51)

Thus, we have seen that the major limitation of the single contiguous model as well as
the partition model is that the entire memory map of the process needed to be present in
RAM during its entire execution. All allocations for the entire process needed to be in
contiguous memory, and these issues led to fragmentation, a limit on the
size of the process depending on the RAM size, and also performance
degradation due to bookkeeping and the management of partitions. Luckily for
us, modern day operating systems do much better in managing memory. Most modern
day operating systems use two concepts, namely virtual memory and segmentation.

So in the next few videos, we will look into these memory management concepts, and
we will also see how the Intel x86 processors manage memory.

Thank you.

71
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 02
Lecture – 06
Virtual Memory

Hello. In this video, we will look at Virtual Memory; which is by far the most commonly
used memory management technique in systems these days.

(Refer Slide Time: 00:31)

So, in a virtual memory system, the entire RAM present in the system is split
into equal sized partitions called Page frames. Typically, page frames would be of 4
kilobytes each. In this RAM, for instance, we have 14 page frames, which are
numbered 1 to 14, and each of these page frames has the same size.

In older Intel processors, all pages were fixed at 4KB, but in more recent
processors we could have pages or page frames of a larger size. In a similar
way, the process which executes on the CPU, or rather the memory map corresponding to
the process, is also split into equal sized blocks. Now, the process is split in such a way
that the block size of the process, that is this size (mentioned in above slide), is equal to
the page frame size; that is, the size of each block in the process is equal to the page frame
size.

72
Now, because we have split the RAM like this (mentioned in above slide), and the
process's memory in a similar way, what we can then do is allocate blocks of a process
to page frames in the RAM. Additionally, the operating system maintains a table.
This table is also present in the RAM and not shown over here, and what it contains is
the mapping from the process blocks to the page frames. For instance, over here the
mapping is very simple: block 1 of the process gets mapped to page frame number 1,
block 2 gets mapped to page frame 2, block 3 gets mapped to page frame 3, and so on.

Now this (mentioned in above slide), I would like to highlight again, is a per process
page table. Per process means that every process running on the system will have a
similar page table as shown over here.

(Refer Slide Time: 03:23)

Now, because we have such a table which provides the mapping from the blocks of a
process to the corresponding page frames, we could have any kind
of mapping that we choose. For instance, now we have block 1 of the process present in
page frame 14, that is over here (mentioned in above slide), block 2 of the process present
in page frame 2, block 3 of the process in page frame 13, and so on.

Essentially, what we are able to achieve with this particular process page table is that the
blocks of the process need not be in contiguous page frames in the RAM. However, the
overhead that we incur is that every time a particular memory location needs to be
accessed in the process's memory map, the CPU needs to look up the process's
page table, obtain the corresponding page frame number for that particular address, and
only then can the RAM be addressed or accessed. For instance, let us say that the
program executing in the CPU accesses a memory location inside block 3 of the
process. This could be an instruction fetch, or a load or store of some data.

Now, when such an operation is executed, a memory management unit present in the
processor would intercept this access. It is then going to look up the page table
and find that the page frame corresponding to block number 3 is 13. Then,
it is going to generate something known as a physical address, which will then address
the 13th page frame in RAM. As a result, every memory access has the additional
overhead of looking into the page table before the access to RAM can be made
possible.

This lookup into the page table is the extra overhead, and typically this overhead is
partially mitigated by using something known as a TLB cache, or a Translation
Lookaside Buffer cache. We will not go into details about the TLB cache. But, for our
understanding with respect to the course we are studying, we need to remember that
every load or store, and every instruction fetched by the process during
its execution, would need to look up the page table and only then can the RAM
be accessed.

(Refer Slide Time: 06:29)

74
Now, what we mentioned was that every process executing in the system would have its
own process page table (mentioned in above slide). Thus, if you have a second process,
that is process 2, process 2 will also have an associated page table; process 3 will also
have its associated page table. As we have seen, the memory
associated with these processes is present in the user space region of the
memory. However, the process page tables are present in the kernel's space, and
therefore any program you write in user space will not be able to determine what the
process page table mappings are.

Now, since we have multiple processes executing in the system, each process could have
various page frame mappings. For instance, (mentioned in above slide) process 1 has a
mapping of all the blue page frames in this particular RAM, while process 2 has all the
yellow page frames, and the third process, process 3, has all the orange page frames.

Now, what we can see is that we are capable of sharing the RAM among multiple processes
simultaneously. Now, assume that we have a system with a single CPU. This means that
at a particular point in time, only a single process will execute. This process will
continue to execute until its time slice completes.

So, during the execution of the process, that is when, for instance, process 1 is executing
on the CPU, this particular process page table (mentioned in above slide) will be the
active page table. Therefore, any instruction fetch, or any data which is loaded or stored,
will look up this particular process page table to get the corresponding page frame in
RAM. When there is a context switch from process 1 to, say, process 2, it would be this
(mentioned in above slide) process's page table which becomes active.

So, any address that is accessed in process 2 will get the corresponding page frame
number from this (mentioned in above slide) active page table. Similarly, when process
3 gets executed on the CPU, this (mentioned in above slide) process's page table will be
the active page table. So, what we can see is that it is not possible for process 2 to access
any of the page frames corresponding to process 1 or process 3.

75
(Refer Slide Time: 09:37)

The next thing we will look at is whether we really need to load all blocks into memory
before the process starts to execute. In the earlier video we had seen that when a
process executes, the entire process was loaded into RAM. So, now we ask ourselves
whether such loading is actually required, and the answer is no. Essentially, what we notice
about programs is that not all parts of a program are accessed simultaneously.

In other words, there is a locality aspect in the way a particular program executes. What
this means is that if some instructions are executed in, let us say, this particular block
(mentioned in above slide), it is highly likely that the next few instructions that
follow would also be present in this block. Once execution moves to another
block, say block 3, it is quite likely that, because of the locality of
the program, there would be quite a few instructions executed from that block, and so on.

In fact, it is quite often the case that there may be several parts of the
process memory which are not even accessed at all; that is, they are neither executed
nor loaded or stored from memory. So, what the virtual memory concept does is take
advantage of this locality aspect during execution by using a concept known as
Demand Paging.

76
(Refer Slide Time: 11:43)

In a demand paging scheme, what would happen is that on a secondary
storage device like a hard disk, there would be a particular space allocated as the Swap
Space. In this swap space (mentioned in above slide), all blocks of the executable will
be present. For instance, if the process has 6 blocks, 1 to 6, then all the 6 blocks will be
present in the swap space, and only on demand will blocks be loaded from the swap
space into the RAM.

Now, let us assume that there is an instruction executing in page frame 8, which
corresponds to block 6 of the process (mentioned in above slide), and it has resulted in a
load or a store to a particular memory location in block number 5. What happens is
that the processor is going to look up the process page table entry corresponding to block
number 5 and see that the present bit is set to 0. This would result in something
known as a Page fault trap, also called the Page fault interrupt.

Now, the page fault interrupt would then trigger the operating system to execute. The
operating system will detect that this was a page fault interrupt, it will
determine the cause of this page fault, and it will load the corresponding page
from the disk into RAM. Consequently, it will also modify the process's page table.

For instance, over here the page frame number (a value of 1 in this case) is entered into
the table and the present bit is set to 1. Now, all later accesses to this particular block 5
will not cause a page fault, because the present bit is set to 1. However, the first accesses
to blocks 2, 3 and 4 would result in page faults, which would cause the operating system
to execute and load the corresponding blocks into the RAM. Consequently, the page table
for that particular process will be updated.
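The behaviour of the present bit can be illustrated with a toy C simulation of demand paging. Nothing here is a real OS structure: the "page fault" is just a branch on the present bit, and "loading from swap" is only a printf.

#include <stdio.h>

#define NBLOCKS 6

struct pte { int frame; int present; };

struct pte page_table[NBLOCKS];     /* present bits all start at 0 */
int next_free_frame = 0;

int access_block(int block)
{
    if (!page_table[block].present) {
        /* page fault: pretend to copy the block from the swap space into a frame */
        printf("page fault on block %d: loading it from swap\n", block);
        page_table[block].frame   = next_free_frame++;
        page_table[block].present = 1;
    }
    return page_table[block].frame;  /* frame number used to form the physical address */
}

int main(void)
{
    access_block(5);   /* first access to block 5 faults        */
    access_block(5);   /* second access finds present = 1       */
    access_block(4);   /* first access to another block faults  */
    return 0;
}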

(Refer Slide Time: 14:10)

So now we have reached a state where every page frame in the RAM is occupied by some
block. These blocks could be from process 1, process 2 or process 3.
Now, what happens if, let us say, there is a memory access to process 1's 3rd block
(mentioned in above slide)? As we can see, since the present bit is set to 0 in the
process page table, a page fault interrupt occurs, and it
triggers the operating system to execute.

Now, the question is what will the operating system do. In order to cater to this
particular memory access to the 3rd block, the operating system would need to remove
one of these pages. Essentially, it needs to clear one of these page frames in order to
make way for the new block. So, the obvious decision that the operating system
needs to make is which of these blocks it should remove in order to make way for the
new block.

This choice is based on some algorithm present in the operating system. These are
known as Page Replacement Algorithms, and there are quite a few choices,
such as First In First Out, Least Recently Used or Least Frequently Used. Based on
the decision that the replacement algorithm makes, the OS would swap out a block from
the RAM and replace it with the third block that was recently accessed.

Consequently, the present bit corresponding to the block that has been removed will be
set to 0. We had removed, for instance, block number 1 from the RAM to the
swap, and therefore the present bit corresponding to block 1 will be set to 0
(mentioned in above slide). Now, block 3, which has just got loaded into that particular
page frame, will have its page frame number set and the present bit set to 1. This
process of removing a block from the RAM and replacing it by another is known as
swap out and swap in, respectively.

The process of moving a block from the RAM to the disk is known as swap out,
while the process of loading a block from the disk to the RAM is known as
swap in. Now, during the swap out, that is, storing block 1 to the hard disk, all
the changes that occurred in block 1 during the execution of the process (mentioned in
above slide) will get updated in the swap space. Thus, the
block which eventually gets stored in the swap space will have the latest view of the
data. The next thing that we are going to ask is whether the swap out, that is, the
copying of block 1 from the RAM to the disk, is actually required.

(Refer Slide Time: 18:12)

Let us assume that block 1, which was present over here (mentioned in above slide) in
the 14th frame, had no data that was changed. For instance, it could correspond to
instructions or read only data, and therefore none of this data actually changes. In such
a case, the swap out process of copying the contents of this page frame back to the disk
will not be required. On the other hand, suppose we had data in block number 1 which
was modified; this would indicate that the copy of block 1 in the RAM is different
from that on the disk, and therefore the entire page frame should be swapped out and
written back to the disk.

So, how will the operating system know whether a swap out is required or not? This is
done by an additional bit present in the page table, known as the D-bit or the
dirty bit. The dirty bit essentially indicates whether the contents of a block in the
RAM have been modified with respect to its contents on the disk, or in the swap space.

If the dirty bit is set to 1 (mentioned in above slide), it indicates that there is indeed a
difference between the contents in the RAM and the corresponding contents in the swap
space, and therefore during a page fault the contents of the page frame will be written
back to the swap space. If the dirty bit has a value of 0, then the swap out will not
be required; it is only necessary to swap in the new block and replace the contents of the
corresponding page frame.
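The role of the dirty bit in the replacement step can be sketched as below. This is only an illustrative fragment: the swap helpers are just printfs standing in for real disk transfers, and the page table entry layout is made up for the example.

#include <stdio.h>

struct pte { int frame; int present; int dirty; };

static void write_to_swap(int frame)             { printf("swap out frame %d\n", frame); }
static void read_from_swap(int block, int frame) { printf("swap in block %d into frame %d\n", block, frame); }

void replace(struct pte *victim, struct pte *incoming, int incoming_block)
{
    if (victim->dirty)                  /* RAM copy differs from the swap copy   */
        write_to_swap(victim->frame);   /* only then is a swap out needed        */
    victim->present = 0;                /* the victim block is no longer in RAM  */

    read_from_swap(incoming_block, victim->frame);
    incoming->frame   = victim->frame;  /* the new block reuses the freed frame  */
    incoming->present = 1;
    incoming->dirty   = 0;              /* the fresh copy matches the disk       */
}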

(Refer Slide Time: 20:19)

In addition to the p and d bits, that is the present and dirty bits, the page table for a
process has some additional bits known as the protection bits. These protection bits
determine various attributes of a particular block. For instance, the
operating system could mark various blocks as executable, which means that the data
present in that block corresponds to executable code,
and therefore the CPU could execute it. Other bits could tell whether the block
is read only, in which case modifying any data in that block will lead to a fault or an
error. Bits over here would also determine whether a particular block corresponds to
operating system code or regular user code. So, this additional information is
provided by the protection bits.
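Putting the pieces together, one page table entry could be pictured as a C bit-field like the one below. The exact layout is purely illustrative; the real x86 page table entry format is different.

struct page_table_entry {
    unsigned int frame    : 20;  /* page frame number in RAM                  */
    unsigned int present  : 1;   /* p bit: is the block currently in a frame? */
    unsigned int dirty    : 1;   /* d bit: modified since it was loaded?      */
    unsigned int writable : 1;   /* 0 = read only, 1 = read/write             */
    unsigned int execute  : 1;   /* does the block hold executable code?      */
    unsigned int user     : 1;   /* 0 = kernel only, 1 = user accessible      */
};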

Thank you.

81
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 02
Lecture – 07
Virtual Memory (MMU and Mapping)

Hello. In this video, we will discuss how the virtual memory mapping takes place
between the Virtual Memory and the RAM.

(Refer Slide Time: 00:32)

So essentially, let us start with this by now very familiar slide: when you write a
program and compile it, you get the executable a.out. And, when you execute this
particular a.out executable, the operating system will create a process for
you. Now, the process is represented by what is known as the virtual address space. The
virtual address space is a contiguous address space starting from 0 and extending up
to an address defined by something known as MAX SIZE. This MAX SIZE comes
from the xv6 nomenclature.

Now, within this contiguous virtual address space for the process are various sections
like the Text, Data, Heap and the Stack. All these sections are created by the operating
system. Essentially, the operating system will need to know where the heap should start,
where the stack should start, and so on. So, how does the operating system know all this
information? This information is given to the OS through a.out. Essentially, what
happens is that when the program is compiled and linked, the compiler puts a lot of
this information into the executable a.out.

So, when it is executed, the operating system will read out this information and then
determine where in the virtual address space these various segments should be present.
Now, when this process is executing, all the instructions corresponding to 'printf', for
instance, get executed and all addresses corresponding to 'str' are accessed (mentioned in
above slide). It may be noted that all these addresses and all these instructions are
mapped into this particular virtual address space. For instance, if I were to print the
address of "Hello World", essentially the address of this 'str', then the result would be the
virtual address with respect to this particular mapping.

What we will see next is how this virtual address gets mapped into the main
memory of the system. So, let us say that the processor wants to access this particular
string 'str', or at least the base address of this string, which as we know would contain the
letter 'H'.

(Refer Slide Time: 03:41)

So, what would happen is that the processor will put out a virtual address on its bus. Let
us say this virtual address is 'v', and this virtual address could be in the range 0
<= v < MAX SIZE. MAX SIZE is the largest address that a user space process could
have in its virtual address space. Said another way, the virtual address 'v' would be
some address in the virtual address space of the process.

(Refer Slide Time: 04:19)

Next, another unit known as the MMU or the Memory Management Unit - this entire
thing (mentioned in above slide) would typically be in the same package, jointly called
the CPU or processor - would then take the virtual address and convert it to a
physical address 'p', which will then be used to access the RAM.

What we will see next is how this virtual address gets converted to the corresponding
physical address. It may be noted that the virtual address corresponds to the virtual
address map, while the physical address corresponds to one physical location in the
main memory or the RAM.

84
(Refer Slide Time: 05:09)

So, when a process begins to execute, the operating system would create a page table for
that process in RAM. This is the page table (mentioned in above slide). As we
have seen in the previous video, the page table holds the mapping from the virtual blocks,
or the virtual address space of the process, to the physical page frames. And we have
other bits like the present bit, the dirty bit and the protection bits, which are not shown here.

Now in the memory management unit, there is a register known as the Page table pointer
register or PTPR. So, in the Intel systems this PTPR or page table pointer register is
known as the CR3 register. And, we will see more details of this in a future video. So,
this PTPR register present in the MMU will have a pointer to the process page table. So
let us see how these things are used to provide the mapping from the virtual address
space to the physical space.

85
(Refer Slide Time: 06:18)

So, what happens is that the virtual address which is sent out by the processor core
comprises two parts. It has a table index, formed from a few of the more
significant bits of the address, and an offset. When the MMU obtains this virtual
address, it is going to look up the process page table at the entry given by the
table index (mentioned in above slide). The PTPR holds the base address of
this particular process page table, and the higher bits in the virtual address
give the offset into the process page table. From here, the corresponding page
frame number is taken, and it forms the first part of the physical address.

Now, the second part of the physical address is directly taken from the offset. In this
way we create what is known as the Physical Address, and it is this physical
address that is used to access or address the RAM, or main memory.

86
(Refer Slide Time: 07:40)

So let us see how this MMU mapping works for a 32-bit system. In a 32-bit
processor, the virtual address space could be at most 2^32 bytes, that is, 4GB. The virtual
address is of 32 bits, as shown over here (mentioned in above slide).

(Refer Slide Time: 08:03)

And this is typically divided into two parts, the table index and the offset, which are of 20
bits and 12 bits respectively. How did we get this split? By the fact that
each page frame, as we had mentioned in the earlier video, is of size 4KB. Thus, to access
any offset within a page frame would require 12 bits: 12 bits because 2^12 is 4096, which
is 4KB. Therefore, we have an offset of 12 bits, which essentially gives
the offset within a page. The remaining bits, that is 32 - 12, which is 20 bits, would be
used for the table index. Thus, the process page table, which is indexed by
these 20 bits, would have 2^20 entries.

Now, assuming that each entry is of 4 bytes, the entire size of this page table is
4MB. And what is essentially required is that this page table be contiguous, that is,
we need these 4 MB of process page table to be in contiguous memory locations.

Why do we need it to be contiguous? Essentially because the table index is added to the
process page table pointer in order to get the entry for a particular block and the
corresponding page frame, and therefore the table needs to be in contiguous memory.
Thus, we see that each process that executes on the system will have the additional
overhead of a 4MB process page table. While 4 MB is not a very large space with today's
RAM sizes of 32 and 64GB, the requirement that it be contiguous
would be an issue.
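The index/offset arithmetic can be seen in a few lines of C. The page table contents below are fabricated purely for illustration; only the bit manipulation mirrors the 20-bit/12-bit split described above.

#include <stdio.h>
#include <stdint.h>

static uint32_t frame_of[1 << 20];       /* 2^20 entries of 4 bytes: the 4MB table */

int main(void)
{
    uint32_t v = 0x08048A10;             /* some virtual address                */

    uint32_t index  = v >> 12;           /* table index: upper 20 bits          */
    uint32_t offset = v & 0xFFF;         /* offset within the page: low 12 bits */

    frame_of[index] = 13;                /* pretend this block maps to frame 13 */
    uint32_t phys = (frame_of[index] << 12) | offset;

    printf("virtual 0x%08X maps to physical 0x%08X\n", v, phys);
    return 0;
}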

(Refer Slide Time: 10:36)

So, what some systems do, especially the Intel systems, is to have a 2-level page
translation. Essentially, instead of having just a table index and an offset in the virtual address,
we now have 3 components in the virtual address. We have a directory entry which is of
10 bits, a table entry which is also of 10 bits, and the offset which as usual is of 12 bits.

88
Now, what happens is that the directory entry of the virtual address
is used to index into something known as a Page directory. This page directory is
pointed to by a page directory pointer register present in the MMU. The contents of
this entry in the page directory point to a particular page table. This (mentioned
in above slide) page table is of 4KB, and the directory is also of 4KB.

Next, the table entry is used to index into this (mentioned in above slide) particular
4KB table, and from there we obtain the first part of the physical address. As usual,
the second part of the physical address is taken from the offset. So, how does this scheme
actually help us?

Now, what we see is that the number of page tables that can be present, that is the
number of these (mentioned in above slide), is 2^10 or 1024. How do we get 1024? We
have a 10-bit directory entry over here, and 2^10 is 1024. Thus, the page
directory is of 4KB, having 1024, that is 2^10, entries. Since each entry points to a
different table, we can have at most 2^10 different page tables; each entry in the directory
would point to a different page table. Now, how is the scheme actually helping us?

So, in total, although 4MB of page tables is still required, the advantage
we get is that they need not be contiguous. We just need chunks of 4KB which
need to be contiguous, and this is easily obtained because each frame in the RAM is of
4KB. So, the 2-level page translation allows us to have non-contiguous page tables.
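The 10/10/12 split of the virtual address can likewise be illustrated with a few lines of C. The frame number used at the end is made up; the point is only the extraction of the directory entry, table entry and offset.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t v = 0x08048A10;                   /* some virtual address        */

    uint32_t dir_index   = v >> 22;            /* directory entry: bits 31-22 */
    uint32_t table_index = (v >> 12) & 0x3FF;  /* table entry: bits 21-12     */
    uint32_t offset      = v & 0xFFF;          /* offset: bits 11-0           */

    printf("directory entry %u, table entry %u, offset 0x%03X\n",
           dir_index, table_index, offset);

    /* Suppose the two-level walk found page frame 13 for this block: */
    uint32_t phys = (13u << 12) | offset;
    printf("physical address 0x%08X\n", phys);
    return 0;
}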

89
(Refer Slide Time: 13:48)

Next, we will look at the sequence of steps that occur during the virtual memory
mapping. So, let us start with the particular program (mentioned in above slide) and this
is its virtual address space with different colors corresponding to each section of the
program. Also, we have the various blocks and as we have mentioned, each of these
blocks is of 4KB.

Then, as we have seen, there is also a process page table present in the RAM, and there is the RAM itself, which is again split into equal-sized blocks known as page frames. Now, there are actually three entities involved in the working of virtual memory: the operating system, which is the software component of this (mentioned in above slide), while the processor and the MMU, the memory management unit, are the hardware components.

So let us say that we have started to execute this program (mentioned in above slide).
And, this particular line i.e int main() in the program is the first instruction being
executed. Now, let us see what is the sequence of steps that occur as the program
executes. So, essentially when the user runs this particular program, it triggers the
operating system to create this particular process page table (mentioned in above slide) in
RAM. So, ideally this particular page table will have the present bits set to all 0 and the
other parts of the page table may or may not be empty at this particular point in time.

90
Third, the RAM may or may not contain any of the process blocks corresponding to the
newly created process.

Next, what happens is that the operating system would transfer control into the main
function of the program.

(Refer Slide Time: 15:50)

So this main function in the C code (mentioned in above slide) would probably point to a
particular instruction in the virtual address space. So, this would result in the processor
sending a particular virtual address corresponding to the main function. So, this virtual
address would be sent on to the bus and this virtual address is then intercepted by the
memory management unit.

91
(Refer Slide Time: 16:27)

The MMU would then look into the page table of the process and determine that the corresponding page or block is not present in the RAM. This is determined by a 0 in the present bit (mentioned in above slide). As a result, the MMU is going to cause a page fault to occur, and when a page fault occurs, it triggers the operating system to execute.

(Refer Slide Time: 17:12)

92
So then, the operating system is going to execute and determine the cause of the page fault. Essentially, there is a sequence of steps that the operating system performs whenever a page fault occurs. First, it will check if the page referred to is legal, and only if it is legal will it continue. Then, it will identify the page's location on the disk corresponding to this particular block (mentioned in above slide). Then, it will identify a page frame in the RAM that needs to be used and, if that frame is occupied, a swap out may be required.

If the dirty bit is set, the page which was previously present in that frame is written back into the swap space. This is followed by the block corresponding to the faulting virtual address being loaded from the disk into the RAM. This is essentially done by what is known as a DMA transfer from the hard disk to the RAM, and the operating system just needs to trigger this DMA transfer to be initiated. It also updates the page table: first of all, the present bit is set to 1, and the page frame corresponding to block number 2 will be set to 11.
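
The sequence just described can be summarised in C-like form. This is only a sketch of the steps from the slide; the helper functions (is_legal, find_on_disk, pick_victim_frame, and so on) are hypothetical names, not xv6 routines:

/* Hypothetical helpers -- declarations only, to show the flow. */
int  is_legal(unsigned vaddr);
long find_on_disk(unsigned vaddr);            /* disk location of the faulting block */
int  pick_victim_frame(void);                 /* page frame in RAM to (re)use        */
int  frame_is_dirty(int frame);
void write_frame_to_swap(int frame);
void dma_load(long disk_loc, int frame);      /* disk -> RAM transfer                */
void update_page_table(unsigned vaddr, int frame);  /* set present bit, frame number */
void kill_process(void);

void handle_page_fault(unsigned vaddr)
{
    if (!is_legal(vaddr)) {            /* 1. illegal reference: do not continue     */
        kill_process();
        return;
    }
    long disk_loc = find_on_disk(vaddr);   /* 2. locate the block on disk           */
    int  frame    = pick_victim_frame();   /* 3. choose a page frame to use         */

    if (frame_is_dirty(frame))             /* 4. swap out the old page if modified  */
        write_frame_to_swap(frame);

    dma_load(disk_loc, frame);             /* 5. load the block from disk into RAM  */
    update_page_table(vaddr, frame);       /* 6. present bit = 1, record the frame  */
    /* The faulting instruction is then reissued by the processor. */
}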

(Refer Slide Time: 19:02)

Then, after the page gets loaded into RAM, control comes back to the CPU or the processor, and the processor would reissue the instruction. This instruction again results in a virtual address being sent on the processor bus, and it would cause the MMU to convert that virtual address into the corresponding physical address.

93
(Refer Slide Time: 19:30)

Now the MMU is going to look into the process page table and see that the present bit is set to 1, which means that the block is loaded into a page frame in the RAM. It would then be able to convert the virtual address to the physical address and cause the instruction to be loaded from the RAM into the processor. In this way, the processor executes an instruction.

So this is the sequence of steps that occurs with the virtual addressing scheme. Essentially, it involves not just the operating system; there is an interworking between the operating system, the processor and the memory management unit in order to achieve virtual memory, in other words, in order to make virtual memory work.

Thank you.

94
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 02
Lecture – 08
Segmentation

Hello. In the previous video we had looked at memory management, especially focusing on a concept known as virtual memory. In this video we will look at another important concept for memory management, which is known as segmentation.

(Refer Slide Time: 00:33)

So, when we look at programs in general, they can be split into logical modules; for instance, global data, stack, heap, functions, classes, namespaces, and so on.

95
(Refer Slide Time: 00:47)

Virtual memory, on the other hand, does not split programs into logical modules; instead, it splits programs into fixed-sized blocks. While this works in general, it is not a very logical split. For instance, we may have a few instructions of a function in one block while the rest of the instructions of that function fall in a totally different block.

(Refer Slide Time: 01:17)

96
Segmentation on the other hand, achieves a more logical split of the program. So, we
could define segments to vary in size from a few bytes to up to 4GB and we could define
segments to be in a more logical order. For instance, we could have each function within
our program to be in a different segment as shown in this particular slide (mentioned in
above slide).

(Refer Slide Time: 01:39)

A very common usage of segmentation is to split the program into various segments such
as the text segment, data segment, heap segment and the stack segment. So, this is known
as the Logical view of the program.

97
(Refer Slide Time: 01:55)

Now, let us look at how the address mapping is done from the logical view to the physical view in segmentation. Essentially, we would have a segment descriptor table which is stored in memory. Each row in the segment descriptor table pertains to one particular segment. For instance, the data segment 2 (mentioned in above slide) is at offset 2 in the segment descriptor table, and the entry at that offset specifies the base address in RAM and the limit of the segment.

Now, in order to perform the mapping, the processor would have at least 3 registers: the segment selector, the offset register and the pointer to the descriptor table. As the name suggests, the pointer to the descriptor table is a pointer to the memory location which holds the descriptor table. The segment selector, on the other hand, is an offset into the descriptor table. The memory management unit in the processor would look up this particular offset (mentioned in above slide) and pick the base value 3000. This base value is then added to the contents of the offset register, also called the effective address, and the resulting address will correspond to some address in the RAM.

98
(Refer Slide Time: 03:16)

Let us look at another view of this mapping scheme. The logical address comprises two parts: a segment selector as well as an offset address, also known as the effective address. The segment selector in an Intel 32-bit processor is of 16 bits, while the effective address or offset register is of 32 bits. The contents of the segment selector are used as an offset into the descriptor table. The memory management unit would then pick up the base address from this particular offset (mentioned in above slide) and add it to the effective address to obtain what is known as the linear address.

99
(Refer Slide Time: 03:58)

Let us look at the mapping done in segmentation with an example. Let us say the register
containing the pointer to the descriptor table has a value of 3000. So, this means that at
an address 3000 in RAM (mentioned in above slide) there is this descriptor table which
contains the mapping for the various segments. Now let us say that the segment register
has a value 1, so this means we are trying to use the segment at offset 1 in the descriptor
table.

The memory management unit would then take the base address corresponding to this
offset, which is 1000 in this case (mentioned in above slide) and use the offset register
which has a value of 100 to get what is known as the linear address that is 1100. Now,
the segment register along with the offset register form what is known as the Logical
Address.
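
The base-plus-offset arithmetic in this example can be mimicked with a few lines of C; this is only an illustration of the idea, and the table contents below are the made-up values from the example, not real hardware state:

#include <stdio.h>

struct segdesc_example { unsigned base; unsigned limit; };

int main(void)
{
    /* descriptor table assumed to sit at address 3000 in RAM; entry 1 has base 1000 */
    struct segdesc_example descriptor_table[] = {
        { 3000, 0x1000 },   /* entry 0 (values are illustrative)  */
        { 1000, 0x2000 },   /* entry 1 : the one selected below   */
    };

    unsigned segment_selector = 1;     /* offset into the descriptor table */
    unsigned offset_register  = 100;   /* effective address                */

    struct segdesc_example d = descriptor_table[segment_selector];
    if (offset_register > d.limit) {
        printf("segmentation violation\n");
        return 1;
    }
    unsigned linear = d.base + offset_register;   /* 1000 + 100 = 1100 */
    printf("linear address = %u\n", linear);
    return 0;
}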

100
(Refer Slide Time: 04:51)

So, one of the biggest problems with segmentation is Fragmentation. Let us look at this
particular example (mentioned in above slide); we have 70KB of space which is free in
the RAM. However, the free memory is not in contiguous locations; we have 60
kilobytes of free space in one chunk, another 10 kilobytes of free space in another chunk.
So, this cannot be used to allocate a new segment which is of 65 kilobytes. So, even
though there is 70 kilobytes of free memory available, the memory is not in contiguous
locations and therefore, cannot be used. Fragmentation is one of the biggest limitations
of segmentation; however, fragmentation is much less an issue with virtual memory.

101
(Refer Slide Time: 05:41)

We will now look at how Intel x86 systems make use of both segmentation as well as paging, so that the advantages of both are obtained. In an x86 system (mentioned in above slide), the CPU generates a logical address comprising a segment plus an offset. This is sent to a segmentation unit which then generates a linear address. The paging unit is essentially the virtual memory management; it takes the linear address and generates the physical address. So, let us see how the segmentation unit is designed in x86 systems.

102
(Refer Slide Time: 06:16)

x86 systems have 2 types of descriptor tables. So, one is known as the Local Descriptor
Table, while the other is known as the Global Descriptor Table which we will be
presenting over here.

The global descriptor table is stored in memory and has the format shown over here (mentioned in above slide). Essentially, its first entry is a null (all-zero) descriptor, followed by the segment descriptors. This global descriptor table is pointed to by a register known as the Global Descriptor Table Register or GDTR. The GDTR is a 48-bit register having the following format: the least significant 16 bits contain the size of the GDT, while the upper 32 bits contain the base address, that is, the pointer to the GDT (mentioned in above slide). So, let us look at what the contents of a segment descriptor are.

103
(Refer Slide Time: 07:06)

The segment descriptor contains 3 parts: it has a base address, it has a limit and it has access rights (mentioned in above slide). The base address and the limit can take values from 0 to 4GB, while the access rights are bits which specify various access policies, such as execute, read, write, or the privilege level, for that particular segment.

(Refer Slide Time: 07:30)

104
Next, let us look at the segment and offset registers in Intel 32-bit machines. The segment selector registers are 16-bit registers which point to offsets in the GDT, while the offset registers are 32-bit registers. Quite often, the segment selectors are coupled with corresponding offset registers. For instance, in order to access the code segment we use the CS register, which is the segment selector for the code segment, and the corresponding EIP register, which is the offset register known as the instruction pointer.

In order to access the Data segment we have several segment registers such as the DS,
ES, FS and GS. In order to access the Stack segment we have the SS register which holds
the segment selector, and the SP register which holds the stack pointer. All these segment
selector registers and offset registers along with the GDTR and the GDT table present in
memory are used to convert the logical address to a corresponding linear address.

Next we will look at the paging unit which essentially manages the virtual memory
mapping in the x86 system. So, the paging unit takes a linear address and converts that to
an equivalent physical address which is then used to address the physical memory or the
RAM.

(Refer Slide Time: 08:58)

The paging unit in the x86 system comprises a 2-level page translation. It takes a 32-bit linear address which is split into 3 parts: the most significant 10 bits are known as the directory entry, followed by 10 bits for the table index, and finally the least significant 12 bits are the offset (mentioned in above slide). The directory entry points to a particular offset in the page directory. The page directory is a special table which is present in the RAM, and it is pointed to by the CR3 register. The contents of the page directory entry point to a particular page table, and an offset within that page table is taken from the table index. The contents of that page table entry, along with the 12-bit offset, are then used to form the physical address.

So I have two questions for you. One is: how many such page tables can be present in a 32-bit Intel system? The second question is: what is the maximum size of a process's address space? Given that each process has such a linear-to-physical address mapping, I want you to work out what the maximum size of a process's address space would be.

(Refer Slide Time: 10:18)

This particular slide shows the full address translation in an x86 system. The CPU puts out a logical address comprising a segment selector and an offset. The segment selector is an index into the global descriptor table, selecting something known as a segment descriptor. The segment descriptor, along with the offset, then creates what is known as the linear address, and this entire space is known as the linear address map for the process. The linear address comprises 3 components: the directory entry, the table index and the offset. The directory entry indexes into the page directory, and its contents are then used to select a page table. The table index is used as an offset within that page table, and the contents of that entry, along with the final 12 bits of the linear address, are then used to obtain the final physical address which is used to read or write data in the RAM.

With this set of videos we have looked at memory management schemes such as virtual memory and segmentation, and we have seen how Intel manages address translation in 32-bit systems.

Thank you.

107
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 02
Lecture - 09
Memory Management in xv6

Hello. In previous videos we had seen memory management schemes in processors, namely virtual memory and segmentation, and we had also seen how memory is managed in x86 systems.

(Refer Slide Time: 00:30)

In this video we will look at memory management in xv6. xv6 is an operating system targeted at x86 platforms, therefore the video corresponding to memory management in x86 processors would be important. We will be referring to a lot of xv6 source code in this particular video; for reference, you could look at the xv6 source code booklet (revision 8), which can be downloaded from this particular (mentioned in above slide) website.

108
(Refer Slide Time: 01:05)

Now, just to recall, in an x86 system there are two levels of memory translation. The CPU puts out a logical address which comprises a segment selector + an offset. Then there is a segmentation unit which converts the logical address into a linear address, and a paging unit which converts the linear address to the physical address; only then is the physical memory or the RAM actually accessed.

(Refer Slide Time: 01:36)

109
To get a full view of this memory management in xv6, we had seen this particular diagram (mentioned above), and in particular, we had seen the segmentation unit followed by the paging unit. In the segmentation unit, especially, we had seen that there is a segment selector, which is a register such as the stack segment, code segment, or the DS, ES, FS segment, and this segment selector indexes into a table known as the GDT, that is, the Global Descriptor Table.

It is in this GDT that the segment descriptors are present. A segment descriptor, when combined with the offset, gives you what is known as the linear address. Let us look more closely at how segmentation is handled in the xv6 operating system. First of all, we will look more into the segment descriptor.

(Refer Slide Time: 02:38)

Each segment descriptor in an x86 system has 64 bits, which can be viewed as 2 words of 32 bits each. The segment descriptor contains several attributes of the segment; some of the important attributes are shown over here (mentioned in above slide). One important attribute is the segment base, or the base address of the segment. This base address is of 32 bits and is stored in 3 parts: bits 0 to 15 over here, bits 16 to 23 over here, and bits 24 to 31 present over here.

110
The segment limit contains the limit of the segment. This is of 20 bits and is stored in 2 parts: the 16 least significant bits are present over here, while the 4 most significant bits are present here (mentioned in above image).

Another important attribute in the segment descriptor is the privilege level. The privilege level is of 2 bits and can have values from 0 to 3. User processes, which have the least privilege, are given a value of 3, while operating system code, which has the highest privilege, has a value of 0. Another attribute in the segment descriptor is the segment type. This could take values such as STA_X for an executable segment, STA_R for a readable segment, and STA_W for a writeable segment. In addition to this, you could actually combine these attributes for a particular segment. For instance, you could have a segment which is executable as well as readable, so you would specify that the segment has the type STA_X as well as STA_R (mentioned in above slide).

(Refer Slide Time: 04:38)

In xv6, this segment descriptor is represented by a structure in the operating system code. If you look at mmu.h, you will actually see this particular structure (mentioned in above slide). As you can see, all the attributes of the segment descriptor are represented in this structure. For instance, the base address, which is of 32 bits and split into 3 parts, is also represented by 3 fields here: you have 16 bits over here, followed by 8 bits, and the most significant 8 bits present over here. In addition to this, there is a macro in xv6 known as SEG which takes 4 parameters: the type, the base (that is, the base address), the segment limit and the dpl. This is a helper macro which is used to create a segment descriptor.

A typical usage of the SEG macro is as follows (mentioned in above slide). This particular usage of SEG creates a segment which has a base address of 0 and a limit of 2^32 - 1, that is, 0xFFFFFFFF. It has a user privilege level, specified by DPL_USER, and it is of type STA_W, that is, the segment is writeable.

(Refer Slide Time: 06:00)

xv6 does not make much use of segmentation; it only creates 4 segments. These are the kernel code, kernel data, user code and user data segments. All these segments have a base address of 0 and a limit of 4GB. The code segments, that is, the kernel code and the user code, are of type executable and readable, that is X, R (mentioned in above slide). The data segments, that is, the kernel data and the user data, are of type writeable, that is W. The kernel code and data have a DPL value of 0, which indicates the highest privilege level, while the user code and data have a DPL value of 3, indicating the lowest privilege level. Next we will see how these 4 segments are created in xv6.

112
(Refer Slide Time: 06:54)

When it comes to segmentation, it is required to have a GDT present in memory. The declaration struct segdesc gdt[NSEGS], where NSEGS is the number of segments, is an array comprising the GDT. If you look up line number 2308 of the kernel code, you will see this declaration. This particular array is used to store the GDT table, and it is filled by this code (mentioned in above slide). We create the 4 segments as follows: the kernel code segment, the kernel data segment, the user code segment and the user data segment. These segments are created according to the rules we have seen (in slide time 6:00).

Finally, we need to have the GDT register pointing to this particular GDT table, and that is done by invoking the function lgdt, or load GDT. lgdt is a function present at line number 512; it takes two parameters: a pointer to the segment descriptors, which is essentially a pointer to this GDT table (c->gdt), and the size of the GDT table, which we pass here as sizeof(c->gdt). Essentially, it is just going to execute the lgdt instruction (mentioned in above slide), which fills the GDT register in the MMU with the address of the GDT.

113
(Refer Slide Time: 08:38)

Next, we will move on from segmentation to the paging unit, that is, how virtual memory is managed in the xv6 operating system. As we have seen before, in x86 systems virtual addressing is managed by 2 levels of page translation. As a result, the linear address or virtual address is split into 3 parts: a directory entry which comprises 10 bits, a table entry comprising 10 bits, and an offset comprising 12 bits. The directory entry is used to index into a page directory, which is present in memory.

In x86 systems, this page directory is pointed to by a register known as the CR3 register. The contents of the page directory entry then point to a page table. The second part of the linear address, that is, the table index, is used as an offset into that page table to get the corresponding page table entry; the contents of this page table entry are then combined with the offset to get the corresponding physical address.

(Refer Slide Time: 09:58)

114
Now we will see how this virtual addressing scheme is managed in xv6. As xv6 begins to boot, the operating system kernel code and data get copied into the lower regions of RAM, as shown in this pink shaded region (mentioned in above slide). As the OS continues to boot, page directories and page tables are created such that the entire RAM gets mapped into the higher region of the logical address space; that is, the 0th location of RAM gets mapped to what is defined as KERNBASE, which is 0x80000000. Similarly, every address in the RAM would have a correspondingly mapped address in the logical space.

Why is such a one-to-one mapping created between the kernel's logical address space and the physical RAM? The reason is that with such a map the kernel can easily convert from a virtual address to the corresponding physical address. That is, given any virtual address in the kernel space, say somewhere over here (mentioned in above slide), the operating system or the kernel can easily obtain the corresponding physical address using a macro known as V2P, the virtual-to-physical memory conversion macro.

Similarly, converting from a physical address to the corresponding virtual address is also very simple: any physical address in the RAM can be converted to the corresponding logical or virtual address by a macro known as the physical-to-virtual memory macro, shortened as P2V. The important ingredient in these 2 macros is KERNBASE.

(Refer Slide Time: 12:01)

Let us see these two macros in more detail. If you look at these two macros, present at line number 212 in the kernel source code, the macro P2V simply takes an address 'a' (mentioned in above slide), which is a physical address, and converts it to the virtual address by adding KERNBASE. Similarly, the macro V2P takes a virtual address 'a' and converts it to a physical address by subtracting KERNBASE. As you can see, converting from virtual to physical and vice versa becomes very simple from the operating system's perspective.
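
From memory, the two xv6 macros in memlayout.h look roughly like the following; the small main() is added here only to demonstrate the arithmetic in user space and is of course not part of the kernel:

#include <stdio.h>

typedef unsigned int uint;

#define KERNBASE 0x80000000u                  /* first kernel virtual address */
#define V2P(a) (((uint)(a)) - KERNBASE)       /* kernel virtual -> physical   */
#define P2V(a) (((uint)(a)) + KERNBASE)       /* physical -> kernel virtual   */

int main(void)
{
    uint va = 0x80124345u;
    uint pa = 0x00124345u;
    printf("V2P(0x%08x) = 0x%08x\n", va, V2P(va));   /* subtract KERNBASE */
    printf("P2V(0x%08x) = 0x%08x\n", pa, P2V(pa));   /* add KERNBASE      */
    return 0;
}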

116
(Refer Slide Time: 12:47)

Now let us see in more detail how the xv6 operating system creates the mapping between the physical address space and the logical address space. This particular figure (mentioned in above slide) shows the physical RAM. As you can see, the physical RAM is divided into several regions. The region from 0 to 640KB is known as the base memory; the region from 640KB to 1MB, that is 0x100000, is known as the I/O space; while the region from 1MB to PHYSTOP is the extended memory. PHYSTOP is defined as a macro which signifies the maximum amount of RAM present in the system; it could be something like 2GB, 4GB or 16GB and could vary from system to system. The region above this, which extends up to 4GB, is used by memory mapped devices.

Now, in order to create mappings for this particular physical RAM, the operating system
would create page directories and page tables. So, in order to create this mapping, the
operating system code defines several macros. So, some of the important ones are shown
over here (mentioned in above slide). So, we have already seen about KERNBASE
which is defined to be equal to 0x80000000 and there are other macros such as the
KERNLINK, PHYSTOP and EXTMEM.

117
Now the KERNBASE determines the virtual address at which the kernel’s space starts.
So, if you actually look at this logical address space, you will see that the KERNBASE is
defined at this particular location (mentioned in above slide). The region of memory
below this KERNBASE location is user space and this is the region where user processes
execute; while the region above this KERNBASE that is from KERNBASE up to the
maximum of 4 Gigs is where the operating system is present as well as where it manages
memory and interacts with devices.

Another important macro is KERNLINK, which is KERNBASE + 1MB. As you see over here (in above slide), this particular point, which is KERNBASE + 1MB, denotes KERNLINK. This location is important because it is the location where the kernel code and data are present. Starting at KERNLINK there is the kernel text, that is, the kernel code; and after the kernel code, the following locations comprise the kernel data, that is, the global data present in the xv6 operating system. The end of the kernel code and data is specified by the symbol 'end'. The region from 'end' to the location 0xFE000000 is the free memory. This free memory is used for various things in the operating system, such as for the OS heap as well as for allocating pages for user processes.

Now, in order to create a map of this kernel space onto the physical RAM, a structure known as kmap is defined. The kmap structure, which is present at line number 1823, contains four elements (mentioned in above slide): *virt, which is a pointer to the virtual address; the physical start address, which indicates the physical address that gets mapped to that virtual address; the physical end, that is, the end of that region; and of course the permissions. xv6 defines 4 such regions: the I/O space, the kernel text and read-only data, the kernel data and memory regions, and the device space. These four regions are mapped into this particular kernel space.

For example, the I/O space starts from the virtual address KERNBASE and gets mapped to the physical addresses 0 up to EXTMEM. This is the 1MB region starting from 0 to the end of the I/O space, that is, the 1MB mark. This is known as the I/O space and is typically not used by the xv6 operating system.

118
The region from KERNLINK onwards is used to store the kernel text + rodata, that is, the kernel code and read-only data. The KERNLINK region gets mapped from the start of the extended memory, in the sense that KERNLINK maps onto the extended memory; it is the start of the extended memory in physical RAM that contains the kernel code and read-only data. Then we have a region known as the data region, which gets mapped onto the free memory part, and finally there is the device space, which is above the location 0xFE000000.
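
As a sketch of the idea (the structure mirrors what is described above, but the exact value of PHYSTOP and the physical address where the kernel data ends are illustrative placeholders, since they depend on the machine and on the size of the kernel image), the table of regions could look like this:

#include <stdio.h>

typedef unsigned int uint;

#define EXTMEM    0x100000u      /* start of extended memory (1MB)               */
#define KERNBASE  0x80000000u    /* kernel virtual addresses start here          */
#define KERNLINK  (KERNBASE + EXTMEM)
#define DEVSPACE  0xFE000000u    /* memory-mapped devices live above this        */
#define PHYSTOP   0x0E000000u    /* illustrative top of physical RAM             */
#define DATA_PA   0x00400000u    /* illustrative physical address of 'end'       */

struct kmap_entry {
    uint virt;          /* virtual start address        */
    uint phys_start;    /* mapped physical start        */
    uint phys_end;      /* mapped physical end          */
    int  writable;      /* simplified permission flag   */
};

static struct kmap_entry kmap_sketch[] = {
    { KERNBASE,           0,        EXTMEM,      1 },  /* I/O space                 */
    { KERNLINK,           EXTMEM,   DATA_PA,     0 },  /* kernel text + rodata      */
    { KERNBASE + DATA_PA, DATA_PA,  PHYSTOP,     1 },  /* kernel data + free memory */
    { DEVSPACE,           DEVSPACE, 0xFFFFFFFFu, 1 },  /* device space              */
};

int main(void)
{
    for (unsigned i = 0; i < sizeof(kmap_sketch)/sizeof(kmap_sketch[0]); i++)
        printf("virt 0x%08x -> phys [0x%08x, 0x%08x) %s\n",
               kmap_sketch[i].virt, kmap_sketch[i].phys_start,
               kmap_sketch[i].phys_end, kmap_sketch[i].writable ? "RW" : "RO");
    return 0;
}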

(Refer Slide Time: 18:38)

Let us see the sequence of steps involved in creating the mapping from the xv6 kernel space to the RAM. There are four steps involved. First, paging has to be enabled. By default, when the system is turned on, paging is disabled; in order to turn it on, a particular bit known as the paging enable bit, which is present in the CR0 register of the x86 processor, has to be set to 1.

119
(Refer Slide Time: 19:25)

Another step is to create the page directories and the page tables. In order to create and
fill the page directories, the function walk page directory or walkpgdir is invoked by the
operating system. So, if you look up the source code, you will see this walkpgdir()
function present in line number 1754.

(Refer Slide Time: 19:45)

120
The walk page directory function creates the page table entry corresponding to a virtual address. Essentially, it is going to create an entry in this particular page directory. Secondly, if it finds that the page table corresponding to that page directory entry is not present, it creates a page table for it in RAM. This page table (mentioned in above slide), as we have seen, will be of 1 page, that is, of 4KB: there are 1024 entries and each entry is of 32 bits, so in all the page table is 4KB and fits in exactly one page.

In a similar way, other page tables are created whenever required. The walkpgdir function uses several macros, such as the PDX macro, which, given a virtual address, extracts the page directory index, that is, the upper 10 bits of the linear address; the PTE_ADDR macro, which extracts the address stored in a page directory (or page table) entry; and the PTX macro, which takes the virtual address and gives you the page table index.
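
From memory, the PDX and PTX macros in xv6's mmu.h are essentially the following bit manipulations; the tiny user-space driver below is added only for illustration:

#include <stdio.h>

typedef unsigned int uint;

#define PDXSHIFT 22                                   /* offset of directory index */
#define PTXSHIFT 12                                   /* offset of table index     */
#define PDX(va)  ((((uint)(va)) >> PDXSHIFT) & 0x3FF) /* upper 10 bits             */
#define PTX(va)  ((((uint)(va)) >> PTXSHIFT) & 0x3FF) /* middle 10 bits            */

int main(void)
{
    uint va = 0x80104f2cu;                 /* an arbitrary kernel virtual address */
    printf("PDX = %u, PTX = %u, offset = %u\n",
           PDX(va), PTX(va), (uint)(va & 0xFFF));
    return 0;
}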

(Refer Slide Time: 21:06)

After creating the page directory, the next step is to fill in the page tables. This is done by the mappages function, which is present at line number 1779 of the xv6 source code.

121
(Refer Slide Time: 21:17)

What mappages does is fill this particular page table with the mapping from virtual addresses to physical addresses. These page table entries (mentioned above in red circle) contain, as we have seen, the physical address mapping, the permissions and also the present bit. As we have seen, an entry in the page table is then used along with the offset to create the corresponding physical address.
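
The entries that mappages fills in are simply integers with the frame address in the upper bits and flag bits in the lower bits. A minimal sketch of how such an entry is composed (the flag values follow the x86 page table format; the frame number here is made up):

#include <stdio.h>

typedef unsigned int uint;

#define PTE_P 0x001    /* present         */
#define PTE_W 0x002    /* writeable       */
#define PTE_U 0x004    /* user-accessible */

int main(void)
{
    uint frame_number = 11;                 /* page frame chosen for the block */
    uint pa  = frame_number << 12;          /* physical address of that frame  */
    uint pte = pa | PTE_W | PTE_P;          /* entry: frame address + flags    */

    printf("pte = 0x%08x (frame %u, present=%u, writeable=%u)\n",
           pte, pte >> 12, pte & PTE_P, (pte & PTE_W) >> 1);
    return 0;
}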

(Refer Slide Time: 21:47)

122
So, once we have created the page directories and the page tables, the final step is to load
the CR3 register. The CR3 register is another register in the x86 processor which
contains the pointer to the page directory.
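
For reference, xv6 wraps the actual register load in a small inline-assembly helper in x86.h; from memory it looks roughly like the sketch below. It can only be executed at privilege level 0 (kernel mode), so it is shown purely for reading:

typedef unsigned int uint;

/* Roughly the xv6 helper: move the physical address of the page
 * directory into CR3. Executable only at privilege level 0.      */
static inline void lcr3(uint val)
{
    asm volatile("movl %0,%%cr3" : : "r" (val));
}

/* The kernel would then do something like:  lcr3(V2P(kpgdir));   */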

(Refer Slide Time: 22:05)

In other words, we have this CR3 register over here (mentioned in above slide), which is present in the processor's MMU, and it points to the memory location of the page directory. Now that we have seen how the xv6 code enables paging and creates page directories as well as page tables, let us see how the xv6 operating system allocates memory for various purposes. These purposes could range from allocating pages for user processes to memory for the operating system's own use.

123
(Refer Slide Time: 22:45)

We had seen that the xv6 code and read-only data get loaded into the lower region of RAM and extend upwards. From the end of the code and read-only data up to PHYSTOP, that is, the physical end of the RAM, is the free memory. This free memory is utilized by the OS for several purposes, such as for allocating pages to user processes or for internal operating system bookkeeping requirements. What we will see now is how this free memory is managed by the operating system.

Essentially, this free memory could be one large chunk, and it is split into pages at a 4-kilobyte granularity. Thus we would have several pages present in this free memory region. Whenever required, that is, on demand, a page is allocated to a particular calling function and used for its requirements. After usage, the page is freed.

The next thing that we need to think about is how the xv6 OS determines which page is to be allocated and how pages should be freed. In order to do this, the xv6 code maintains a linked list of free pages. As you can see here (figure in above slide), this particular figure has pages which are either blue or yellow. This entire region corresponds to the free memory region in the RAM; the blue pages are the ones utilized by either a user process or by the operating system itself, while the yellow pages are the ones which are free.

All the free pages are linked together using a linked list which is pointed to by a pointer known as the freelist. The freelist points to the head of the linked list, and all the free pages are chained together through this list. In order to allocate a page, the kalloc function is used; the kalloc function essentially removes a free page from the list and hands that page out for use. In order to free a page, the kfree function is used; essentially, the kfree function adds the freed page back into the list.

(Refer Slide Time: 25:15)

The linked list is as shown over here (mentioned in above slide), where you have the freelist pointer which points to the head of the list, and then there are pointers to the consecutive free pages. One thing which is different from a standard linked list is that there is no separate memory to store the pointer to the next page. Note that each of these pages is of 4KB, and a free 4KB page is not being used for any other data; thus the 0th location within the page itself contains the pointer to the next page.

In a similar way, the 0th location in that page would contain the pointer to the next page, and so on; thus we create the list. The freelist points to the first or head node of the list, the 0th location of that node points to the next free page, and so on until we reach the end of the list. With that, we come to the end of how xv6 manages memory. It is not the best management scheme possible, but it is representative of what several operating systems actually do to manage memory.
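
To see the trick of storing the "next" pointer inside the free page itself, here is a small user-space simulation of a kalloc/kfree-style free list. It is simplified: it carves pages out of a malloc'd arena and omits the locking and the scrubbing of freed pages that a real kernel would do:

#include <stdio.h>
#include <stdlib.h>

#define PGSIZE 4096
#define NPAGES 8

struct run { struct run *next; };     /* lives in the 0th bytes of a free page */
static struct run *freelist = NULL;

static void kfree_sim(void *page)     /* put a page back on the free list      */
{
    struct run *r = (struct run *)page;
    r->next  = freelist;
    freelist = r;
}

static void *kalloc_sim(void)         /* take the head page off the free list  */
{
    struct run *r = freelist;
    if (r)
        freelist = r->next;
    return r;
}

int main(void)
{
    char *arena = malloc((size_t)NPAGES * PGSIZE);   /* stands in for free RAM   */
    for (int i = 0; i < NPAGES; i++)
        kfree_sim(arena + (size_t)i * PGSIZE);       /* initially all pages free */

    void *a = kalloc_sim();
    void *b = kalloc_sim();
    printf("allocated pages at %p and %p\n", a, b);

    kfree_sim(a);                                    /* page a goes back to the head */
    printf("next allocation reuses %p\n", kalloc_sim());
    free(arena);
    return 0;
}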

Thank you.

126
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 02
Lecture - 10
PC Booting

Hello. In this video, we will look at how a PC boots, right from the time you turn it on to the time the operating system executes. This particular video is especially applicable to Intel and AMD based platforms. One big thing which we should remember when we are talking about the Intel platforms that we typically use in desktops or laptops is the concept of backward compatibility.

As we have seen in a previous lecture, Intel maintains backward compatibility. It ensures, even today, that code which was developed for an Intel processor 20 or 30 years back will still execute on the Intel processors used today. Because of this backward compatibility, a lot of what we do while loading an operating system reflects what was done 30 years back. Essentially, the things that happened on a system around 1995 or 1997, when the 386-based processors were used, are still done today on the latest i7 processor from Intel. So, we will look at how a PC boots.

127
(Refer Slide Time: 01:50)

Now, we all know that in order to start a computer, we need to press the reset button or the start button present on the desktop. What actually happens internally is that when you press this button, it sends a signal to the CPU. The signal, for instance, would be an electrical pulse which gets created when you press the start button or the reset button. This pulse is sent to a specific pin on the CPU known as the reset pin, and when the CPU receives this reset signal, it starts booting. So let us see the various steps involved when the CPU starts to boot.

128
(Refer Slide Time: 02:46)

So, we had seen the power-on reset. When the power-on reset comes and the CPU detects it, every register present inside the CPU is initialized to 0, except for 2 registers: the code segment (CS) and the instruction pointer (IP). When the reset occurs, the code segment is set to the value 0xf000 and the instruction pointer is set to 0xfff0. If you recollect how an 8088 or 8086 processor computes its address, it takes the code segment register, in this case 0xf000, shifts it left by 4 bits, and adds the instruction pointer.

As a result, the physical address of the first instruction to be executed will be 0xffff0. Now, if you look at the memory map of the RAM, which we had covered in the previous videos, you would see that the memory address 0xffff0 pertains to the BIOS area. In fact, this memory location 0xffff0 is just 16 bytes below the 1MB mark: 0x100000, that is 1MB, is this particular address, and the first physical address put on the address bus by the CPU is 0xffff0, which is 16 bytes below the 1MB mark. As a result, if you want to boot your system, it should be ensured that at the memory location 0xffff0 there is a valid instruction present. So, this point over here (mentioned in above slide) is where the first instruction is present.
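
The (segment << 4) + offset arithmetic can be checked with a couple of lines of C; this is a trivial illustration, not BIOS code:

#include <stdio.h>

int main(void)
{
    unsigned cs = 0xf000, ip = 0xfff0;
    unsigned phys = (cs << 4) + ip;                   /* real-mode address formation */
    printf("first fetch address = 0x%05x\n", phys);   /* prints 0xffff0              */
    return 0;
}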

129
Another thing that happens is that as soon as the power-on reset occurs, the processor is set to what is known as real mode. In real mode, the processor is in a backward-compatibility mode with the 8088 or the 8086. Recollect that the 8088 or 8086 processor could address at most 1MB, that is, at most 0x100000, and this is shown as the green region in the RAM. There are other features of real mode: there is no protection, no privilege levels, direct access to all memory and no multitasking. The first thing that the instruction at the location 0xffff0 should do is jump to another location. Essentially, it jumps to a location in the BIOS, and this jump triggers the BIOS to start executing.

(Refer Slide Time: 06:23)

Next, the BIOS ROM (mentioned in red circle) begins to execute. As we know, the BIOS is the basic input output system; it is a read-only memory, though these days it is often in the form of a Flash or EEPROM, and you would actually notice a particular chip (mentioned in above slide) present on your system. Some systems also display the BIOS name while booting up; for example, this particular chip is the AMIBIOS, and its name may get displayed when the system boots up.

The BIOS, present in this particular area of the RAM, will begin to execute code in real mode. The BIOS does the following: first, it performs a power-on self test, where it checks the system for correctness and ensures that all parts of the system are working properly. Next, it initializes the video card and all other devices connected to the system; then, optionally, it may display a BIOS screen on the monitor. Note that we have initialized the video card, and therefore the monitor is activated and capable of displaying things, so the BIOS screen can now be shown on the screen.

Then it performs what is known as a memory test, and some BIOSes also determine what type of memory is used and the amount of memory present in the system. After the memory test, some parameters are set; for example, these correspond to DRAM parameters, and the BIOS will ensure that various requirements of the DRAM, such as the frequency at which the DRAM capacitors are refreshed, are set adequately.

Then plug and play devices are configured, in the sense that all devices which are plug and play are queried; the BIOS determines how much memory is required for each of these devices, and these devices are then allocated memory in the system. After that, the BIOS will assign resources to DMA channels and the various IRQs, that is, the interrupt requests. From our perspective, what is important is the next step, where the BIOS identifies the boot device, that is, the device which most likely holds the operating system. It reads sector 0 from that boot device into the memory location 0x7c00.

Note that 0x7c00 is a memory location in the low memory region of the RAM. What the BIOS does is copy sector 0, which is typically of 512 bytes, from the boot device, which is typically the hard disk, into the memory location 0x7c00. So, at the location 0x7c00 we would have 512 bytes of code which helps in booting the operating system. The BIOS then causes a jump to 0x7c00, which means that the code present at location 0x7c00 in the low memory of the RAM will begin to execute.

131
(Refer Slide Time: 10:21)

Now this memory, present at 0x7c00 and copied from sector 0 of the hard disk into the RAM, is known as the MBR or Master Boot Record. This particular code is of 512 bytes, out of which 446 bytes are instructions containing bootable code, that is, code about how to boot the system. There are 64 bytes which hold information about the various partitions present on the disk; essentially, these 64 bytes are divided into 16 bytes per partition. Then there are 2 bytes of a signature which is used to identify whether this is in fact MBR code.
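
The 512-byte layout can be written down as a C structure; this is a sketch of the classic MBR format (the field names are mine, and the 0xAA55 value used as the signature is the standard boot-sector marker):

#include <stdint.h>
#include <stdio.h>

#pragma pack(push, 1)                 /* no padding: layout must be exactly 512 bytes */
struct mbr_partition {
    uint8_t  status;                  /* bootable flag                                */
    uint8_t  chs_first[3];            /* CHS address of first sector                  */
    uint8_t  type;                    /* partition type                               */
    uint8_t  chs_last[3];             /* CHS address of last sector                   */
    uint32_t lba_first;               /* LBA of first sector                          */
    uint32_t sector_count;            /* number of sectors                            */
};                                    /* 16 bytes per partition                       */

struct mbr {
    uint8_t              bootcode[446];   /* boot instructions                        */
    struct mbr_partition part[4];         /* 4 x 16 = 64 bytes of partition table     */
    uint16_t             signature;       /* 0xAA55                                   */
};
#pragma pack(pop)

int main(void)
{
    printf("sizeof(struct mbr) = %zu\n", sizeof(struct mbr));   /* expect 512 */
    return 0;
}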

This code begins to execute from the location 0x7c00 in the RAM. What it typically does is look into the partition table which is present and try to boot the operating system. Essentially, in order to do this, it first loads what is known as the boot loader of the operating system. Each operating system has its own boot loader; for instance, Linux has its own boot loader, Windows has its own boot loader, and so on. Optionally, the MBR code may directly load the operating system by itself. We will now see what happens in the boot loader.

132
(Refer Slide Time: 12:04)

After the MBR executes, the boot loader executes. The main job of the boot loader is to load the operating system. Optionally, as in some systems that we see today, it may give the user an option to select which operating system to load. The other jobs done by the boot loader are to disable interrupts, set up the GDT, switch from real mode to protected mode, and read the operating system from the disk into the RAM. These are the things done by the xv6 boot loader; there may be slight variations when we go from the boot loader of one operating system to that of another.

Sometimes we may not have this MBR code present at all. In such a case, the boot loader itself is present in sector 0 of the hard disk, and the BIOS will load the boot loader into the location 0x7c00 and jump to it; essentially, we are skipping the MBR execution. Once the boot loader executes, sets up the processor and the GDT, and switches from real mode to protected mode, it loads the operating system from the disk.

133
The protected mode is a 32-bit mode, where essentially we extend the memory region that can be accessed from 1MB to the entire region of 4 gigabytes. We will not go into more detail about how this protected mode is activated and so on.

(Refer Slide Time: 14:13)

Once the boot loader loads the operating system, it transfers control into the operating system. The operating system then does several things: it sets up virtual memory, which includes setting up page directories, page tables and so on; it initializes interrupt vectors, the IDT (interrupt descriptor table) and other aspects pertaining to interrupts; then it initializes various devices present in the system, like timers, monitors, hard disks, consoles, file systems and so on.

Then it may also initialize the other processors if they are present, and finally it starts up the first user process. In a future video, we will see what this first user process is. This particular user process (last point in above slide) is the first process that executes in user space. You may recollect that all of this (all points mentioned in square box) executes in the operating system, essentially in kernel space, and it is only at this particular point during the boot-up sequence that a user process starts to execute.

134
After this, what is expected is that this first user process will spawn various jobs, such as user processes and various daemons, and one of its jobs is to create a shell. This shell would then be used by the user to run various programs, commands and so on.

(Refer Slide Time: 16:05)

We will now look at systems which have multiple processors present in them. As we have seen in a previous video, in the Intel type of architecture with multiple processors, all processors share a front side bus, and on the front side bus there is a chipset, or north bridge, which interfaces with the memory bus. Essentially, in this Intel type of architecture we have memory symmetry, which means that all processors in the system share the same memory space.

Essentially in order to access a particular DRAM location, all processors would need to
send the same address to the DRAM, and the advantage of having such a symmetric
view of the memory is that we can have a common operating system code which could
execute in any of these processors. Similarly, there is what is known as the I/O
symmetry. Essentially what this means is that all processors share the same I/O
subsystem; essentially all processors can receive interrupts from any I/O device.

135
(Refer Slide Time: 17:30)

Now, in order to boot a multiprocessor system, what is generally done is that one
processor is designated as the ‘Boot Processor’ or the BSP. So, this designation is done
either by setting a particular signal in the hardware or by the BIOS itself, and all other
processors are designated as ‘Application Processors’. So, when the system is powered
on, it is only the boot processor which begins to execute. So, the BIOS will execute in
the boot processor and that is the BSP, and the BSP then learns about the system
configuration.

It determines how many other APs, that is, application processors, are present in the system. After it does all the required initialization, it triggers the booting of the application processors; this triggering is done by something known as the startup IPI, or startup inter-processor interrupt.

This is a signal from the BSP, that is, the boot processor, to the application processors. When the application processors see this signal, they begin to boot, and of course they identify that they are not the main BSP but rather application processors, so they skip various steps such as initializing the various devices present in the system. In this video we have seen how the CPU boots, right from the time the power-on reset is provided to the processor to the point when the operating system begins to execute and spawns the first user process.

In the later part of this course we would see various aspects about how the operating
system manages memory and manages different processes which are running in the
system.

Thank you.

137
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 03
Lecture - 11
Operating Systems (Processes)

Hello and welcome to this video. In this video we will look at processes, which is possibly the most crucial part of operating systems. A process, as we know, is a program in execution. We will see today how operating systems manage processes.

(Refer Slide Time: 00:38)

Let us start with the now famous example of printing "Hello world" onto a screen. When compiled with $ gcc hello.c, it creates an executable a.out. When a.out is executed, a process is created. Part of this process will be in the RAM, and it is identified by a virtual address map. The virtual address map is a sequence of contiguous addressable memory locations starting from 0 up to a limit of MAX_SIZE. Within this virtual address map we have various parts of the process, including the instructions, global and static data, the heap, as well as the stack.

138
(Refer Slide Time: 01:28)

As we have seen in an earlier video, the virtual address space or virtual address map of a process is divided into equally sized blocks, and typically the size of each block is 4KB. Again, each process would also have a process page table in memory which maps each block of the process into a corresponding page frame. The RAM, as we have seen, is divided into page frames of size 4KB, similar to the block size, and these page frames contain the actual code and data of the process which is being executed.

We have seen these in a previous video. But the question which we need to ask is: where does the operating system, or the kernel, reside in this entire scheme? As we know, the kernel is just another piece of software and has to be present in the RAM to execute. Thus, in most operating systems, such as Linux as well as the operating system which we are studying, that is xv6, the kernel resides in the lower part of the memory, starting from page frames 1, 2, 3, and so on.

Just like everything else, the kernel too is divided into page frames of equal size. Now, since we are using virtual addressing in the system, the page frames corresponding to the kernel are mapped into the virtual address space of the process. The kernel code and data are present above MAX_SIZE and below a limit known as Max Limit. Again, in the process's page table there are entries corresponding to this map; for instance, blocks 7, 8, 9, and 10 correspond to the blocks that have the kernel code and data, and the page table tells us that they are mapped into page frames 1, 2, 3, and 4.

(Refer Slide Time: 03:42)

Now, we could divide this particular virtual address space into two components. One is the user space, which corresponds to the blue area containing the user process's code, data, and other segments such as the stack and heap. Then there is the kernel space, which corresponds to the kernel code, data and other aspects of the kernel.

So, the MAX_SIZE defines the boundary between the user space and the kernel space. A
user program can only access any code or data present in this user space. The user
program cannot access anything in the kernel space. On the other hand, the Kernel can
access code as well as data in both the kernel space as well as the user space. So, this
prevents the user space programs from maliciously modifying data or modifying kernel
structures.

140
(Refer Slide Time: 04:51)

Another thing to notice is that there is a contiguous mapping between the kernel addresses in the virtual space of the process and the corresponding physical frames into which the kernel gets mapped. For instance, the kernel blocks 7, 8, 9, and 10 get mapped into the contiguous page frames 1, 2, 3, and 4. Why is this contiguous mapping actually used? The most important reason is that, given this contiguous mapping, it is easy for the kernel to make conversions from virtual address to physical address and vice versa.

For instance, to convert from virtual address in the kernel space to the corresponding
physical address in the page frames of the kernel a simple subtraction by MAX_SIZE
would do the trick. For instance, in xv6 where the MAX_SIZE is defined as
0x80000000, a virtual address of 0x80124345 can be converted to the corresponding
physical address by subtracting the MAX_SIZE. So, the physical address would be
simply written as 0x00124345.

Similarly, a physical address corresponding to the kernel code and data in the kernel page frames can be converted to the corresponding virtual address in the kernel space by adding MAX_SIZE. For example, in this case the physical address 0x00124345 can be converted to the corresponding virtual address in the kernel space by adding MAX_SIZE to get 0x80124345.

(Refer Slide Time: 06:52)

So, what happens when we have multiple processes in the system? The kernel space is mapped identically in the virtual address space of every process. For instance, above MAX_SIZE and below Max Limit the kernel space is present in all processes. Similarly, the page table of each process also has an identical mapping between the kernel blocks and the corresponding page frames that the kernel occupies, as can be seen (in above mentioned slide) in these few entries as well as these few entries.

One thing to be noticed is that although the virtual address space of each process has its own entries for the kernel, all processes eventually map their kernel space onto the same page frames in the RAM. What this means is that we have just a single copy of the kernel present in the RAM, even though there can be multiple identical entries, one set in each process's page table, corresponding to the kernel code and data.

142
(Refer Slide Time: 08:12)

Now that we have seen where the kernel exists in the RAM as well as where it gets mapped to in the virtual address space of each process, we will look at what metadata the kernel keeps corresponding to each process that runs in the system.

Each process in the system has 3 pieces of metadata: the process control block, a kernel stack for that user process, and the corresponding page table for that user process. Each process that runs in the system will have these three structures, which are unique to that process. We have already seen that the page table maps the virtual address space of the user process to the corresponding page frames that the process occupies. Now we will look at the other 2 pieces of metadata.

143
(Refer Slide Time: 09:06)

We have learnt that corresponding to each process there are various segments, and one important segment is the stack of the process. This process stack, present in the user space (mentioned in above slide), would have information such as the local variables and also information about function calls; we will now call this the user space stack. In addition to the user space stack, each process will also have something known as the kernel space stack, or the kernel stack, for that process. This kernel stack is used when the kernel executes in the context of a process. For instance, when the process executes a system call, it results in some kernel code executing, and this kernel code would use the kernel stack for its local variables as well as function calls.

Also, this kernel stack is used for many other important purposes, such as storing the context
of a process. In addition to the standard uses of a stack, such as for local and auto
variables as well as for function calls, the kernel stack plays a crucial role in storing the
context of a process, which allows the process to restart after periods of time. So,
why do we have two separate stacks? Why do we have a user stack for the process as
well as a kernel stack? The advantage we achieve is that the kernel can execute
even if the user stack is corrupted. Attacks that target the stack, such as buffer
overflow attacks, will not affect the kernel in such a case.

144
So, let us look at some of the important components in the PCB. This particular
structure is taken from the xv6 operating system's PCB, which is defined as struct proc.
Some of the important elements of this particular structure are 'sz', which is the
size of the process memory; 'pgdir', which is a pointer to the page directory for the
process; and 'kstack', which is a pointer to the kernel stack which we described a few
slides earlier.

And there are other aspects such as a list of files that are opened by the process, the
current working directory of the process, and the executable name; for instance, a.out in
our example. We will look at some of these other parameters in the forthcoming
slides.

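As a reference, an abridged sketch of this structure is reproduced below, paraphrased from
xv6's proc.h (consult the source booklet for the exact listing and field comments).

struct proc {
  uint sz;                     // size of process memory (bytes)
  pde_t* pgdir;                // page table (page directory)
  char *kstack;                // bottom of kernel stack for this process
  enum procstate state;        // process state
  int pid;                     // process ID
  struct proc *parent;         // parent process
  struct trapframe *tf;        // trap frame for the current system call
  struct context *context;     // saved context; swtch() here to run the process
  void *chan;                  // if non-zero, sleeping on chan
  int killed;                  // if non-zero, the process has been killed
  struct file *ofile[NOFILE];  // open files
  struct inode *cwd;           // current working directory
  char name[16];               // executable name (for debugging)
};
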
(Refer Slide Time: 12:04)

So an important entry in the PCB corresponding to each process is the PID or Process
Identifier. So this is an identifier for the process essentially defined as an integer and
each process would have a unique PID. So typically, the number would be incremented
sequentially in such a manner that when a process is created it gets a unique number.

145
(Refer Slide Time: 12:35)

Another very important aspect in the PCB is the state of the process. From the time a
process is created to the time it exits, it moves through several states, such as the NEW,
READY, BLOCKED, or RUNNING state. xv6 calls these states by different
names: NEW is called EMBRYO, which means that a new process is
currently being created; READY is known as RUNNABLE, which means it is
ready to run; and the BLOCKED state is known as SLEEPING, which essentially means blocked
for an I/O.

So, how and when does a process actually go from one state to another? So when a new
process is created it is initially in the state known as NEW. When it is ready to run the
state is moved to what is known as the READY state, and when it finally runs on the
processor it gets shifted to the RUNNING state. After running for a while the process gets
preempted from the processor in order to allow other processes to run, and in such a case
it goes back from the running state to the ready state.

Now, suppose during the execution of the process there is some I/O operation that is
required. For instance, the process could invoke scanf(), which requires the user to
enter something through the keyboard. In such a case the process would be moved from
the running state to the blocked state.

146
So, the process will remain in the blocked state until the event occurs, for instance when
the user enters something through the keyboard. When this event occurs the process
moves from the blocked state back to the ready state. This movement from one
state to another, from ready to running, from running back to ready, or from running to
blocked and then ready, keeps going on through the entire life cycle of the process. At the
end, when the process exits or gets terminated, it goes to what is known as an EXIT state,
which is not shown in this diagram. You could look up the xv6 code proc.h,
which will give you more information about the various states. So, what is this ready
state?

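For reference, the state names as they appear in xv6's proc.h are sketched below (paraphrased
from the source); the lecture's NEW, READY, and BLOCKED map to EMBRYO, RUNNABLE, and
SLEEPING respectively.

// Process states declared in xv6's proc.h
enum procstate { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };
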
(Refer Slide Time: 15:19)

Operating systems maintain a queue of processes which are all in the ready state. When
an event such as a timer interrupt occurs, a module within the operating system known as
the CPU scheduler gets triggered. This CPU scheduler then scans through the
queue of ready processes and selects one, which then gets executed on the processor. The
selected process then changes its state from ready to running. The running process
would continue to run until the next timer interrupt occurs, and the entire cycle repeats
itself.

147
(Refer Slide Time: 16:03)

Other entries in the PCB are pointers to what are known as the trapframe and the context.
The trapframe as well as the context are part of the kernel stack, and as seen in this figure
(mentioned in above slide) they hold a lot of information about the current state of the
running process: for instance, the stack segment, the stack pointer, the
flags register, the code segment, the instruction pointer, and so on. This particular
trapframe and context are used when a process is restarted after a context switch.

148
(Refer Slide Time: 16:43)

So, how are these various PCBs stored in xv6? In xv6, a structure known as ptable is
defined. This structure has an array of struct proc; remember that struct proc is
actually the PCB structure in xv6. The array has NPROC entries, where NPROC is
defined as 64. Each process created in xv6 will have an entry in this
particular array. You can get more information about this particular structure by
looking at the xv6 code proc.c and the structure ptable. Also, param.h is the file in xv6
which defines what NPROC is.

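A sketch of the relevant declarations, paraphrased from xv6's proc.c and param.h, is shown
below.

#define NPROC 64               // maximum number of processes (param.h)

struct {
  struct spinlock lock;        // lock protecting the process table
  struct proc proc[NPROC];     // one struct proc (PCB) slot per process
} ptable;                      // declared in proc.c
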
So, this gives us a brief introduction to how processes are managed in the operating
system. In the next video, we will look at how a process gets created, executes, and exits
from the system.

Thank you.

149
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 03
Lecture – 12
Create, Execute, and Exit from a Process

Hello, and welcome to this video. In this video, we will look at how to Create, Execute
and Exit from a Process.

(Refer Slide Time: 00:27)

In order to create a process, operating systems use a technique called Cloning. When a
process, for instance process 1 over here, invokes a system call called fork, it results in the
kernel being triggered. The kernel then executes the fork system call in the kernel space
and creates what is known as a child process; here it is referred to as process 2. The child
process is an exact duplicate of the parent process. So, let us see how the fork system call
works with an example.

150
(Refer Slide Time: 01:04)

Let us say that we have written this particular program (mentioned in above image)
which invokes the fork system call ‘fork()’. The return from the fork system call is p and
p is defined as an integer, and it can take values of -1, 0, or something > 0. So, when this
particular program gets executed, the fork system call would trigger the operating system
to execute, and the OS would then create an exact replica of the parent process. So, this
would be the parent process (mentioned in above image) and the exact replica of this
process is known as the child process. The only difference between the parent process,
and the child process is that p in the parent process is a value which is the child’s PID.
So, this value is typically > 0; however, in the child process the value of p = 0.

So, when the OS completes its execution, both the parent as well as the newly formed
child would return from the fork system call with their different values of p. In the parent
process, the child's PID is > 0; therefore, the parent process would execute the statements
under the condition if (p > 0). However, in the child process, since the value of p = 0, the
else if part, that is, the green statements, would be executed (mentioned in above image).
Now, if by chance the fork system call fails, a value of -1 would be returned, which would
result in the printf Error being printed onto the screen.

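Since the program itself appears only on the slide, a minimal sketch of it is given below; the
exact variable names are assumptions, but the structure follows the description above.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int p = fork();                              /* clone the calling process */

    if (p > 0)
        printf("Parent: child PID = %d\n", p);   /* executed by the parent */
    else if (p == 0)
        printf("Child: fork returned 0\n");      /* executed by the child  */
    else
        printf("Error\n");                       /* fork failed, p == -1   */

    return 0;
}
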
151
(Refer Slide Time: 02:48)

Now, let us see how this fork system call works inside the operating system. So, as we
have mentioned before, each process has 3 pieces of metadata stored in the kernel: one is the
Page Table, the second is the Kernel Stack, and the third is an entry in the proc structure,
which is the process control block (PCB). So, when the fork system call executes, a copy of
the page table is made (that is, the Page Table in orange). This (mentioned in above image)
corresponds to the page table of the new child that is being created.

Similarly, the kernel stack of the parent process is duplicated as the child's kernel stack.
Further, a new PCB entry is created corresponding to the child. In order to do this, first an
unused PID needs to be found by the operating system. Then the state of the newly formed
process, that is, the state of the child process, is set to NEW. Then several pointers are set
in the PCB: the page table pointer will point to this particular page table (the child's page
table), and the kernel stack pointer would point to this newly formed copy (the child's Kernel
Stack). Other information such as the files opened, the size of the process, the current
working directory 'cwd', etc., are also copied from the parent's PCB into the child's PCB.

152
(Refer Slide Time: 04:32)

Finally just before returning, the operating system would set the state of the newly
created child to READY. So, at the start of execution of the fork, the child’s state is set to
NEW while towards the end of execution of the fork, the OS would set the state to
READY. So, why do we need this intermediate state NEW? What does NEW signify and
what does READY signify? The NEW state indicates that the chosen PID
has already been taken, and the process is currently being created but is not ready to run.
On the other hand, when the state is set to READY it means that the various metadata
within the kernel have been initialized, and the process can be moved into the ready
queue and is ready to be executed.

153
(Refer Slide Time: 05:31)

So, with respect to the ready queue the new process would have an entry in the ready
queue and whenever the CPU scheduler gets triggered it may pick up this particular
process and change its state from READY to RUNNING, and the new process will
begin to get executed within the processor.

(Refer Slide Time: 05:57)

Now one important difference between the parent process and the child process with
respect to the fork is that, in the parent process fork returns the child’s PID; while in the
child process the fork system call would return 0. So, how does the operating system

154
achieve this difference? Essentially, the return value of the fork system call is stored as part
of the kernel stack. So, when fork is executed, the operating system would
set the return value in the kernel stack of the parent process to the child's PID.

Further, in the kernel stack of the child process the return value is set to 0; thus, when
the system call returns in each of these cases, the parent process as well as
the child process will get different return values.

(Refer Slide Time: 06:59)

Now, one of the primary aspects while invoking fork is the duplication of the page table.
So, we have the parent page table over here (mentioned in image in blue), and when fork
is invoked a duplicate of this page table is created for the child (in orange). So, what does
this mean to have a duplicate page table? So, when we actually look at this particular
figure, we see that we have two page tables which are exact replicas of each other
and have exactly the same entries. So, what this means is that, both the parent as well as
the child page table are pointing to the same page frames (mentioned in above image) in
the RAM. So let us look at this in more detail.

155
(Refer Slide Time: 07:46)

So we have the parent process with its virtual memory map, and the corresponding child
process with its own virtual memory map; each of these processes has its own
page table. The child process's page table is an exact replica of the parent process's
page table. What this means is that block 1 in the parent as well as block 1 in the child
point to the same page frame, that is, 6.

Similarly, block 2 in the parent as well as block 2 in the child point to the same page
frame, 13, as seen in the parent's page table as well as in the child process's page table.
So, essentially what we are achieving over here (mentioned in above image) is that we
have two virtual memory maps, one for the parent process and one for the child.
However, in RAM we have just one copy of the code and data corresponding to the
parent process; both the parent as well as the child page tables point to the same page
frames in RAM.

156
(Refer Slide Time: 08:58)

So let us look at this with an example. Suppose, we have written this following program
which invokes the fork and fork returns a pid; and in the parent process, we sleep for
some time and then invoke printf which prints the value of i (mentioned in above image).
So, 'i' is defined as an integer which has the value of 23. Similarly, in the child process,
which executes the else part, we again print the value of i. Now, since the child is an exact
replica of the parent, the value of i in the parent as well as the child would be the same.
Thus, when we execute this program, both the child as well as the parent print the value
of i as 23.

(Refer Slide Time: 09:49)

157
Now, let us modify this particular example and see what happens when we add this
particular line (mentioned in above image). What we have done over here is that we
have incremented i (i = i + 1) only in the child. Note that in the parent process there is a
sleep(1), which means that we are giving sufficient time for this i = i + 1 to be executed
by the child, and only then will the printf of the parent be executed.

So, what do you think would be the output of this particular program? One would expect
that the child would print the value of 24, since we have incremented i (that is, 23 + 1), and
that the parent would also print the value of 24. However, when we execute this program,
the child has the updated value of 24 while the parent still has the old value of 23. Now,
given that both the parent and child point to the same page frames in the RAM, how is this
possible? We will look into why this phenomenon occurs.

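A minimal sketch of this modified program is given below (the exact code is on the slide);
the sleep(1) in the parent gives the child time to update its copy of i first.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int i = 23;
    int pid = fork();

    if (pid > 0) {                        /* parent */
        sleep(1);                         /* give the child time to run first */
        printf("Parent: i = %d\n", i);    /* still prints 23                  */
    } else if (pid == 0) {                /* child */
        i = i + 1;                        /* modifies only the child's copy   */
        printf("Child: i = %d\n", i);     /* prints 24                        */
    }
    return 0;
}
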
(Refer Slide Time: 11:04)

So, this phenomenon occurs due to a technique known as Copy on Write that is
implemented in the operating systems. So, when a fork system call is executed in the OS,
all parent pages are initially marked as shared. The shared bit is present in both the
parent and child page tables. When data in any of the shared pages changes, the OS
intercepts and makes a copy of that page. Thus, the parent and child will have different
copies of that page; note that this page is highlighted over here (mentioned in above

158
image), which means that only that page would be different in the parent and child
while all other pages would still remain the same.

Let us see how Copy on Write works with our example. Let us say that the value of i is
stored in this particular page frame (mentioned in above image), which is pointed to by
the parent page table as well as the child page table. When the child process executes and
increments the value of i to i + 1, that is, the value of 23 is incremented to 24 and the new
value is written back, the OS would intercept the write and create a new page for
the child (mentioned in below image).

(Refer Slide Time: 12:25)

So, the i of the child process would be present here (mentioned in above image in orange
color) and it would have the new updated value of 24 while the original i value of the
parent would be present over here (mentioned in above image in green color), and would
have the old value of i that is 23. Further, the corresponding page entry in the child’s
page table would then be updated. We will see the advantage of COW (Copy on Write)
in a later slide. Now that we have seen how to clone a program, we shall look at how to
execute a completely new program.

159
(Refer Slide Time: 13:05)

Executing a new program comprises 2 steps: first, a fork() system call needs to be
invoked, which results in a child process being created which is an exact replica of the
parent process; and then an exec() system call needs to be invoked, which causes the new
program to be executed. So, let us take this particular small C code (mentioned in above
slide). What we see is that initially a fork system call is invoked, which returns
a PID value and creates a copy of the parent process called the child process.

Now, in the child process the PID has a value of 0; therefore, execution will enter this else
part. In the child process, we invoke the system call exec, in this case a variant of
exec known as execlp(), and pass several arguments, of which the most important one of
course is the executable file name. In our case, we are trying to execute the file a.out. So,
when this exec system call is executed, it triggers the operating system functionality.
What the OS does is find on the hard disk the exact location of the a.out executable;
it then loads on demand the pages required to execute a.out.

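A sketch of this fork-then-exec pattern is shown below, using execlp() as in the lecture; the
path ./a.out and the argument list are assumptions made for illustration.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int pid = fork();

    if (pid == 0) {
        /* child: replace this process image with the a.out executable */
        execlp("./a.out", "a.out", (char *)0);
        perror("execlp");              /* reached only if exec fails */
    }
    /* parent: continues executing this program */
    return 0;
}
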
160
(Refer Slide Time: 14:28)

So let us actually look at this more pictorially. What happens when we invoke fork? Fork,
as we know, would create a child process which is an exact replica of the parent. As we
have seen, the child has its own page table, but all entries in the page
table identically map to the same page frames as the parent. Now let us see what happens
when the exec system call gets invoked. When the exec system call gets invoked, the
operating system would find the program or the executable location on the hard disk and
load blocks from the hard disk into page frames on demand.

Thus, for instance, when we start to execute the new child process, it would load the first
block of the new child; and then, on demand, whenever required, new
blocks would be loaded into the RAM. Subsequently, the child process's page table
would also be updated as required. So, we see 2 things occur: first, we notice that
whenever the child program that is being executed has new
functionality, a new page frame gets allocated to the child process (mentioned in above
image in orange color); however, the functionality which is common to both the parent
as well as the child still shares the common pages (mentioned in above image in blue
color).

161
(Refer Slide Time: 15:54)

So let us see what the advantage of Copy on Write is. As we know, most programs are
written with a large amount of common functionality. For instance, many
programs use printf or scanf, or standard library function
calls like fread(), fscanf(), and fprintf(), and so on. Since a new process is created from a
parent by first cloning the parent and then executing the new program, a lot of the
functionality of the parent will also be present in the child process. Now, since pages are
replaced only on demand in the child process, the common functionality or the common
code which is present in both the parent as well as the child process is still shared
between the two processes.

For instance, printf() which is present in the parent process and the printf() which is
present in the child process, still point to exactly the same page frame in the RAM or in
the physical memory. So, what this means is that although you may have like 100
different processes running on your system, and all these processes may use a common
function such as printf(), in RAM as such there would be only one copy of printf()
present. All processes would then use their page table to point to this particular page
frame, which contains the printf().

162
(Refer Slide Time: 17:21)

So we have seen that creating a new process first requires a fork, which is then followed
by an exec system call. Every process in the system is created in such a
way. Therefore, we get a tree-like structure where you have a root node known as init.d,
and every subsequent node represents a process running in the system and is created
from a previous process. For instance, the process compiz (mentioned in above image,
last circle) is the child process of gnome-session. Gnome-session in turn is a child
process of init and so on.

So, eventually we reach the root process, which in this case is known as init.d. If you are
interested, you can execute the command pstree
from your bash prompt, which lists all the processes in your system in a tree-like
structure. So, we have seen that every process in the system that is executing has a
parent process. What about the first process? It is the only process in the system
which does not have a parent. So, who creates this process?

163
(Refer Slide Time: 18:33)

In xv6, the first process is present in initcode.S and is unlike the other
processes, because this particular process is created by the kernel itself during the
booting process. In Unix operating systems, the first process is present in
/sbin/init. When you turn on your system and the operating system starts to boot, it
initializes various components in your system and finally creates the first user process,
which is then executed. This is sometimes known as the super parent, and its main task is
to fork other processes. Typically, in Linux, the first process would start several
scripts present in /etc/init.d.

(Refer Slide Time: 19:29)

164
Now let us look at the code init.c, which is part of xv6. Essentially, this particular
code forks a process and creates a shell (sh). This is a snippet of the code; in particular,
I would like you to notice this for loop, for( ; ; ), which is an infinite for loop.
It runs infinitely; its only task is to fork a process, creating a child process; and in the
child process (inside the if condition), it runs exec("sh", argv);, which executes the shell
with some arguments. Then it waits until that particular forked process completes, and this
continues forever.

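An abridged sketch of this loop, paraphrased from the xv6 init.c listing, is shown below; see
the source booklet for the exact code, which also opens the console and prints diagnostic
messages.

// Abridged sketch of xv6's init.c (paraphrased from the source booklet)
char *argv[] = { "sh", 0 };

int
main(void)
{
  int pid;

  for(;;){
    pid = fork();               // create a child process
    if(pid == 0){
      exec("sh", argv);         // child: run the shell
      exit();                   // reached only if exec fails
    }
    while(wait() != pid)        // parent: wait until the shell exits,
      ;                         // reaping any orphans adopted by init
  }
}
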
(Refer Slide Time: 20:16)

Now that we have seen how processes are cloned, how a new program is executed. Let
us see how to terminate a process.

165
(Refer Slide Time: 20:27)

A process gets terminated by something known as an exit call. For instance, let us go
back to our example (mentioned in above image). In the child process, we exec the
executable a.out, which then gets executed, and this is followed by an
exit(0) (mentioned in red circle). This exit is invoked in the child process, and the
parameter 0 is a status which is passed on to the parent process. This particular way of
terminating a process is known as voluntary termination.

(Refer Slide Time: 21:03)

166
In addition to the voluntary termination, there is also something known as an Involuntary
termination. In such a termination, the process is terminated forcefully. So, a system call
which actually does that is the kill system call, kill(pid, signal), which takes two
parameters: pid, that is, the pid of the process which needs to be killed, and a signal. A
signal is an asynchronous message which is sent from the operating system or by another
process running in the system. There are various types of signals such as SIGTERM,
SIGQUIT, SIGINT and SIGHUP. When a process receives a signal such as SIGQUIT,
the process is going to terminate irrespective of what operation is being done.

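As a small illustration, a user process can send such a signal with the kill() system call as
sketched below; the helper function and the idea of passing SIGTERM versus SIGKILL are
assumptions made for illustration, not part of the lecture's slide.

#include <signal.h>
#include <sys/types.h>

void terminate(pid_t target)
{
    kill(target, SIGTERM);    /* ask the process identified by target to terminate */
    /* kill(target, SIGKILL) would instead terminate it forcefully */
}
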
(Refer Slide Time: 21:48)

So, another important system call with respect to process termination is the Wait system
call. The wait system call is invoked by the parent; it causes the parent to go to a blocked
state until one of its children exits. If the parent does not have any children running then
a -1 is returned. So, let us go back to our example over here (mentioned in above image),
where the parent has forked a child process; and in the parent process, it obtains a PID
which is equal to the child process’s PID.

So, this would result in the parent actually executing the statement wait(); and it would
cause the parent process to be blocked until the child process has exited. When the child
process exits i.e exit(0), it would cause the parent process to wake up and the wait
function to return with a return value of pid, which is the child process’s pid. So, in order
to obtain the return status of the child that is in this particular example 0, the parent

167
process can invoke a variant of wait, i.e., wait(&status). In this particular variant, a
pointer to status is passed, in which the operating system will put the exit status of the
child process.

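A minimal sketch combining these pieces is shown below; WEXITSTATUS is the standard POSIX
macro for extracting the exit status from the value filled in by wait().

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int pid = fork();

    if (pid == 0) {
        exit(0);                              /* child exits with status 0           */
    } else if (pid > 0) {
        int status;
        int cpid = wait(&status);             /* parent blocks until the child exits */
        printf("child %d exited with status %d\n", cpid, WEXITSTATUS(status));
    }
    return 0;
}
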
(Refer Slide Time: 23:15)

When a process terminates, it becomes what is known as a Zombie or a defunct process.

What is so special about a zombie is that the particular process is no longer executing;
however, the process control block (PCB) in the operating system will still continue to
exist. So, why do we have this concept of zombies in operating systems? Zombies are
present so that the parent process can read the child's exit status through the wait system
call. When a particular program exits, its exit status is stored in the PCB present in the
operating system. When the wait system call is invoked by the parent process, the
PCB of the exiting child process is read and its exit status is taken from
there. Once the wait system call has been invoked by the parent, the zombie
entry present in the OS is removed. This is known as reaping the process.

So, what happens if the parent does not read the child’s status? In such a case, it will
result in a resource leak, and the zombie entries in the operating system will continue
to exist indefinitely. These are eventually removed by a special process in the OS known
as the Reaper process. The reaper process periodically runs and recovers any such
zombie process states present in the OS.

168
(Refer Slide Time: 24:45)

When a parent process terminates before its child, it results in what is known as an
Orphan. For instance, let us consider this as the process tree (mentioned in above image)
and in particular this is a process with a parent over here (crossed circle). Now suppose
this particular parent exits while the child continues to execute, then the child is known
as an orphan (circle below crossed circle). In such a case, the first process, that is, the
/sbin/init process, will adopt the orphan child.

(Refer Slide Time: 25:20)

169
There are 2 types of orphans: one is the unintentional orphan, which occurs when the
parent crashes; the other is the intentional orphan, sometimes called a daemon. Intentional
orphans, or daemons, are processes which become detached from the user
session and run in the background. These are typically used to run background
services.

(Refer Slide Time: 25:42)

So now let us look at the exit() system call from an operating system perspective. First,
we notice that /sbin/init, that is, the first process in the system, can never exit. All
other processes can exit either by invoking the exit system call or
involuntarily via a signal. When such a process exits, the operating system will do the
following operations. First, it decrements the usage count of all open files; if
the usage count happens to be 0, then the file is closed. Then it wakes up the parent;
that is, if the parent is currently in the sleeping state, the parent is made
runnable. Remember that the runnable state is also known as the ready state.

So, why do we need to wake up the parent? We need to wake up the parent, because the
parent may be waiting for the child due to the wait system call. So, we need to wake up
the parent, so that the parent could continue running. Then for all the children that the
exiting process has, the operating system will make the init process adopt all the
children.

170
And lastly, it sets the exiting process to the zombie state. Note that certain
aspects of the process, such as the page directory and the kernel stack, are not de-allocated
during the exit. These metadata in the operating system are de-allocated later by the parent
process, allowing the parent to debug crashed children. That is, suppose the particular
process has crashed; the page directory and the kernel stack will continue to be present in
the operating system, allowing the parent process to read the contents of the
crashed child's page directory and kernel stack, and thus allowing debugging to happen.

(Refer Slide Time: 27:48)

Now, let us look at the internals of the wait() system call; this particular flowchart
is done with respect to the xv6 operating system. When a parent process in the user
space invokes the wait system call, the following operations in green (mentioned in
above image) occur in the operating system. First, there is a loop, i.e., "if p is a child",
which iterates through every process in the ptable and checks whether the process p is a
child of the current process. If it is not a child, then we take the next process in the ptable;
however, if it happens to be a child, then we do an additional check to find out if it is a
zombie, i.e., "if p is a zombie".

If the child is not a zombie, then the parent process will sleep; however, if the child is
a zombie, then we de-allocate the kernel stack and free the page directory, and the wait
system call will return the pid of p. So, in this video, we have seen about the

171
process, how it is created, how it exits, and about system calls such as the wait(), fork(),
exec(), and exit() system calls.

Thank you.

172
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 03
Lecture – 13
System Calls for Process Management in xv6

In a previous video, we had seen several important system calls related to process
management: the fork(), exec(), wait(), and
exit() system calls. In this video, we will look at how these system calls are implemented
within the operating system.

(Refer Slide Time: 00:38)

In particular, we will see how the system calls are implemented in the xv6 OS.
We will be referring to the xv6 source code, which can be downloaded in the form of a
booklet from this particular website (mentioned in above image). Please download
revision number 8, so that it matches with this particular video.

173
(Refer Slide Time: 01:00)

So let us start with the fork system call. In the previous video, we had seen that when the
fork() system call gets invoked it creates a child process. The return value of fork that is
pid over here will have a value of 0 in the child process; as a result, these green lines are
what is executed exclusively by the child process (mentioned in above image). In
the parent process, the value returned by fork will be greater than 0; essentially, the value
returned by fork will be the child process's pid value. Thus, these purple lines are what is
going to be executed by the parent process.

(Refer Slide Time: 01:42)

174
Inside the operating system, the fork essentially creates a new process control block and
fills it. Recollect that in xv6, the process control block or PCB is defined by a struct proc
as shown over here (mentioned in above slide). So, essentially what fork does is that it is
going to fill up the various elements of the struct proc corresponding to the new child
process created. Recollect also that there is a ptable that is defined in the xv6 operating
system. The ptable essentially contains an array of procs; the size of the array is NPROC,
which is the total number of processes that can run at a single time in the xv6 OS.
So, every process is allocated one entry in this particular ptable.

(Refer Slide Time: 02:34)

The implementation of fork in xv6 is as shown over here. So, this can be seen in the
source code (booklet downloaded from website) listing in line number 2554. So the first
thing you would notice over here is that we are declaring a pointer called 'np', which is
defined as struct proc *np;. This is a pointer corresponding to the new process that is
being created, or rather, the new child process.

Now, the first step that fork does is to invoke allocproc(). What the allocproc function
does is parse through the ptable and find a proc structure which is
unused; once this proc structure is found, it is going to set the state to EMBRYO.
Recollect that in xv6, EMBRYO means a new process which is not yet ready to be
executed; the pid for the new child process is also set here. Other things which are

175
done in allocproc are the allocation of a kernel stack and filling the kernel stack of the new
child process with things like the trapframe pointer, the trapret, as well as the context.

(Refer Slide Time: 03:52)

The next step in the fork implementation is the call to a function named copyuvm().
Essentially, what the copyuvm function does is copy the page directory from
the parent process to the child process. copyuvm takes two parameters: the
parent page directory, which is represented by proc->pgdir, and the parent
size, which is represented by proc->sz; what is returned by this function is a pointer to the
new process's page directory, i.e., np->pgdir.

We will see this particular function in detail in a later slide, but for now notice that
if this function fails then everything is reverted. First, the kernel stack which was
allocated previously in allocproc gets freed, i.e., kfree(np->kstack), and np->kstack is
set to 0; the state is set back to UNUSED, i.e., np->state = UNUSED;.
Remember that allocproc had set the state to EMBRYO; now, over here, the state is set to
UNUSED. Then fork returns -1. This -1 is sent back to the user process,
that is, the process which had invoked fork.

176
(Refer Slide Time: 05:15)

The next step in the fork implementation is to copy some of the parameters of the parent
onto the child; of these steps the most important one is the third one, wherein the entire
trapframe of the parent process is copied onto the trapframe of the child process.
Recollect that a trapframe is created
whenever a hardware interrupt occurs or a system call gets invoked.

(Refer Slide Time: 05:46)

If we go back to pid = fork();, when the fork system call gets invoked, it triggers the
operating system to execute, and it also creates a trapframe for this process in

177
the kernel stack. The trapframe is used so that when the fork system call completes
executing in the operating system, it will return back to this point, i.e., pid.

By copying the entire trapframe of the parent process to the child process, it is ensured
that the child process also continues to execute from this point, i.e., pid = fork(); thus,
when fork returns in the child process, the child process will execute from this point, i.e.,
if (pid > 0). Also, recollect that the difference between the return values of the parent
process and the child process is that the pid value in the child process is 0, while in
the parent process it has a value that is > 0. We will next see how this is achieved.

(Refer Slide Time: 06:49)

So essentially in the fork implementation, we see that in the trapframe of the new process
the value of eax is set to 0 (statement marked with blue arrow in above image). So, this
will ensure that in the child process, the return value of fork is set to 0. We will see later
how the return value in the parent process is set to the child’s pid.

178
(Refer Slide Time: 07:17)

In this part of the fork implementation (statements mentioned in blue parenthesis), other
things are copied from the parent process onto the child process. These include the
executable name, cwd, that is, the current working directory, and copies of the file pointers
from the parent.

(Refer Slide Time: 07:33)

Now that the proc structure for the child process is completely filled, the state is
switched from EMBRYO to RUNNABLE. Setting the state to runnable, i.e., np
-> state = RUNNABLE, implies that the scheduler could actually select this child process

179
and allocate the CPU to it. Thus, the child process would be able to run and execute code
on the CPU.

(Refer Slide Time: 07:58)

The return value from the fork implementation is pid; the value of pid is set over here, i.e.,
pid = np->pid (mentioned in above image). Essentially, pid is np->pid, that is, the
child process's pid value. This pid value, i.e., return pid;, goes as the return value of fork
in the parent process. Thus, in the parent process in user space, fork would return
with the pid value of the child process, while, as we have seen in the line np->tf->eax = 0;
over here, in the child process fork would return with the value of 0.

180
(Refer Slide Time: 08:35)

When it comes to the CPU registers, all register values in the child process are exactly
identical to those of the parent process except for 2 registers. As we have seen before, one
is the %eax register. In the child process the eax register is set to 0, i.e., %eax = 0, so
that when fork returns, it returns with the value of 0 in the child process. The other thing
which is changed in the child process is %eip, the instruction pointer. The
instruction pointer is set to forkret, i.e., %eip = forkret, which is a function that is
exclusively executed by the child process and not by the parent. forkret is a function
in the xv6 code.

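Putting the pieces discussed above together, an abridged sketch of the fork implementation is
reproduced below, paraphrased from xv6's proc.c (the source booklet, listing around line
2554, has the exact code with full error handling and the ptable locking).

int
fork(void)
{
  int i, pid;
  struct proc *np;

  if((np = allocproc()) == 0)           // find an UNUSED slot, set EMBRYO,
    return -1;                          // allocate kernel stack, assign pid

  // Duplicate the parent's page directory for the child.
  if((np->pgdir = copyuvm(proc->pgdir, proc->sz)) == 0){
    kfree(np->kstack);                  // revert everything on failure
    np->kstack = 0;
    np->state = UNUSED;
    return -1;
  }
  np->sz = proc->sz;
  np->parent = proc;
  *np->tf = *proc->tf;                  // copy the parent's trapframe

  np->tf->eax = 0;                      // fork() returns 0 in the child

  for(i = 0; i < NOFILE; i++)           // duplicate open file pointers
    if(proc->ofile[i])
      np->ofile[i] = filedup(proc->ofile[i]);
  np->cwd = idup(proc->cwd);            // same current working directory

  safestrcpy(np->name, proc->name, sizeof(proc->name));
  pid = np->pid;

  np->state = RUNNABLE;                 // the scheduler may now pick the child

  return pid;                           // fork() returns the child's pid in the parent
}
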
(Refer Slide Time: 09:25)

181
Now, we will recall the exit system call internals. When an exit system call gets
executed, these are the 6 things (mentioned in above image) which occur inside the
operating system. First, there is a decrement in the usage count of all the
open files; if the usage count goes to 0, then the file is closed. Second, there is a drop
in the reference count for all in-memory inodes.

Third, there is a wakeup signal sent to the parent process; essentially, if the parent's state is
sleeping then the parent is made runnable. Why is this needed? Essentially, this is needed
because the parent may be sleeping due to a wait system call; therefore, making it
runnable ensures that the wait system call becomes unblocked. The fourth point involves
init; recollect that init is the first process ever created by
the OS. The init process is made to adopt all the children of the exiting process, and
the exiting process is set to a state called the ZOMBIE state.

This setting of the state to ZOMBIE is used so that the parent process can determine
that one of its child processes is exiting. We will see more on this in the wait system
call. And lastly, it is going to force a context switch to the scheduler.

(Refer Slide Time: 11:00)

We will next see the wait system call. Recollect that when the wait() system call is
invoked in the parent, it is going to be blocked until one of its child processes exits. If
no child exits, then it will continue to be blocked. On the other hand, if the parent process

182
has no children at all, then it will return with -1. Now we will see the internals of the wait
system call.

(Refer Slide Time: 11:30)

This is the implementation of the wait system call in the xv6 operating system. The
listing is obtained from the proc.c file of the xv6 source code. Essentially, you would
see that it has an infinite loop, for( ; ; ), which starts from the 4th line (mentioned in above
image) and ends at the second-to-last closing brace. Within this particular infinite loop,
there is an inner for loop, which starts from the 6th line and ends with its closing brace.
Essentially, this inner for loop parses through the ptable. Recollect that the ptable is an
array of procs, and each and every process which is in some state in the xv6 OS has an
entry in the ptable. So, by parsing through all elements of the ptable, this particular loop
will be able to check every process that is present in the xv6 OS at this particular instant.

The first check that it does is to find out whether the current proc is the parent of this
particular entry in the ptable, which is p (mentioned in above image). That is, it is going to
find out if the current proc, which has invoked wait, is the parent of p, i.e., if (p->parent !=
proc). If it is not the parent, then it just continues; otherwise, it comes over here, i.e.,
havekids = 1. At this particular point in the implementation, we are sure that p
is a child of the process which has invoked wait.

The next check is to determine the state of p, i.e., if (p->state == ZOMBIE). If the state
happens to be ZOMBIE, it indicates that the child process has exited, and therefore, it

183
will enter this particular if condition and do various freeing, such as freeing
the kernel stack, i.e., kfree(p->kstack), setting the state to unused, i.e., p->state = UNUSED,
setting the pid to 0, i.e., p->pid = 0, and so on. The return is at this point, i.e., return pid;
this breaks out of the infinite loop and results in wait exiting, and the return value
would be pid, which is essentially the child process's pid value.

(Refer Slide Time: 14:00)

You may have noticed 2 things: first, the kernel stack of the exiting child is cleared, or
rather freed, at this particular point, i.e., kfree(p->kstack); second, the page directory
corresponding to the exiting child is freed at this particular statement, i.e., freevm(p->pgdir).
Deferring the freeing of the child process's stack as well as its page directory to the
parent's wait, rather than freeing them in exit, allows the parent process to peek into
the exited child's state. This enables better debugging facilities for the child process.

For instance, if the child happens to have crashed, then the parent process could look up
the stack as well as the page directory, and therefore the physical pages of the child
process, and would be able to get information about why the child process has crashed.

184
(Refer Slide Time: 14:54)

Next, if during the entire for loop we have found no children for the particular process,
then we just return -1. So, wait will return to the user process, but with a value of
-1.

(Refer Slide Time: 15:10)

Execution comes to this particular line, that is, the sleep (mentioned in above image),
when we have a child process which is not in the ZOMBIE state. In such a case, the
wait call is going to sleep until it is woken up by an exiting child.

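For reference, an abridged sketch of the wait implementation walked through above is given
below, paraphrased from xv6's proc.c (the actual listing additionally acquires and releases
the ptable lock and checks whether the calling process has been killed).

int
wait(void)
{
  struct proc *p;
  int havekids, pid;

  for(;;){
    havekids = 0;
    for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
      if(p->parent != proc)             // skip processes that are not our children
        continue;
      havekids = 1;
      if(p->state == ZOMBIE){           // an exited child: reap it
        pid = p->pid;
        kfree(p->kstack);               // free the child's kernel stack
        p->kstack = 0;
        freevm(p->pgdir);               // free the child's page directory
        p->state = UNUSED;
        p->pid = 0;
        p->parent = 0;
        p->name[0] = 0;
        return pid;                     // return the exited child's pid
      }
    }

    if(!havekids)                       // no children at all: nothing to wait for
      return -1;

    sleep(proc, &ptable.lock);          // block until a child calls exit()
  }
}
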
185
(Refer Slide Time: 15:28)

So we will now look at the internals of the exec system call. Recollect that the exec
system call would load a program into memory and then execute it. In this particular
code snippet (mentioned in above image), the parent process would invoke the fork
system call which would result in a child process being created. The child process in this
particular example would execute the exec system call; and as a result, it would cause a
new program to be executed in the system. In this particular example, the new program is
the 'ls' program, which lists all the files in the current directory.

Now, 'ls' is an executable. It has a particular format known as the ELF format, or
Executable and Linkable Format. This particular format is what is interpreted internally by
the exec system call within the operating system, and this format is essentially understood
and used to load 'ls' from the hard disk into the RAM.

186
(Refer Slide Time: 16:41)

Let us see what actually is present in the ELF format. So, every time we actually compile
or link a program, it creates an ELF executable or an ELF object. For instance, with the
example that we took, that is /bin/ls, this is an ELF executable. It has a format as
shown over here (mentioned in above image in blue box); at least this is part of the
format, and more details about the format can be obtained from these particular
references (mentioned below in above image). What we will do now is go
through some important components of this ELF executable, and see how this is going to
be useful for us, that is, from an OS perspective.

(Refer Slide Time: 17:19)

187
So, we will start with the ELF header. The ELF header contains various parameters and
these are just a few of the parameters which are present (mentioned in above image). The
ELF header starts with an Identifier. So, this identifier essentially is a magic number
which is used to identify whether this file is indeed an ELF file. Second, there is the Type,
which tells the type of this ELF file; the type would have a value such as executable,
relocatable object, shared object, core file, and so on.

Another important entry is the Machine details. This tells whether the ELF executable
or the ELF object can run on a certain machine. For instance, the machine details could
have values such as i386, x86_64, ARM, MIPS, and so on. Then we have an Entry value
in the ELF header; this entry value tells the virtual address at which the program should
begin to execute. Besides this, we have a pointer to the program headers, the number of
program headers present, a pointer to the section headers, and the number of section
headers present. We will see more about this in later slides.

(Refer Slide Time: 18:44)

So let us start with an example. Let us take our hello
world program written in C, and compile it with the -c option. When we say
$gcc hello.c -c, it creates the object file hello.o. This is an ELF object. We then use the
utility readelf with the option -h on the object file hello.o; the -h option prints
the ELF header. So, this is the ELF header (mentioned in above image). It has a magic

188
number which essentially is the identifier and it is used to distinguish this particular
object file from any other file. So, it is used to say that this particular object file is indeed
an ELF file.

Another thing which we have seen is the Type field of the ELF header, which could be,
for instance, a relocatable object and so on. In this particular example (mentioned in
above image), since we used hello.o, that is, an object file, it is a relocatable
object file, and you can actually see it over here as REL, which means a relocatable
object file.

Another thing is the machine details, specified over here as Machine (mentioned in above
image); in this case it is AMD x86-64, which indicates that
this object file is for x86-64 machines. Then we have the Entry point, which in this case
is 0. Other things we have seen are the Start of the program headers, which is 0 (bytes
into the file) and corresponds to the pointer to program headers (refer ELF header image),
and the Number of program headers, which in this particular object file is 0. The other two
aspects are the pointer to the section headers and the number of section headers (refer
ELF header image): the Start of section headers is
368, and the Number of section headers is 13. In this way, we can see the
various contents of the ELF header.

(Refer Slide Time: 20:49)

Now, we will look in more detail at the Section Headers. In order to get a listing
of the section headers you could use $readelf -S hello.o, which will print all the section

189
headers present in the relocatable object hello.o. This is actually shown over here
(mentioned in above image in square box). We will not go into too much detail about
this because not much of it is applicable for us, but we will give an explanation of the
various columns. The 2nd column over here is the name of the
sections, while the 3rd column gives you the type of the sections.

So, the type could be one of these (refer first square in above image): PROGBITS,
which is information defined by the program; SYMTAB, which is a symbol
table; NULL or NOBITS, essentially a section which occupies no bits; and RELA,
which is a relocation table. Then we have the address (refer second square in above
image), which is the virtual address where the section should be loaded. In this
particular case it is all 0's because it is a .o file, that is, an object file, and it can be
relocated. Then here you have the offset and the size of the sections (refer third square in
above image), while here (refer last square in above image) you have the table entry size:
if there is a table you have a non-zero value; however, if no table is present you have a
0 value.

(Refer Slide Time: 22:11)

Next, we will look at the Program Header contents. The program header also has several
parameters, such as the type of the program header, the offset, the virtual address,
essentially the virtual address where the segment needs to be loaded, and a physical
address, which is essentially ignored.

190
(Refer Slide Time: 22:34)

In order to get the program headers for our hello world program, we use readelf -l
hello. Note that we are giving the executable over here and not the object. These
are the parameters which are printed related to the program headers: essentially we
have the type of the header, the offset, the virtual address where it needs to be loaded,
the physical address, the memory size, and the flags.

(Refer Slide Time: 23:05)

Now that we have some idea about ELF executable files and ELF objects, essentially
how they are stored and the various components in ELF images, we are equipped to

191
see how the exec system call is implemented in the xv6 OS. This is actually listed over
here (refer above image), and you could also look it up in the xv6 source code in the
file exec.c. Also shown in the slide is the virtual memory map. Recollect
that the exec system call gets invoked by the child process.

First, there is a parent process which forks, and in the child process the exec system
call gets invoked. When the child process is created by forking, a virtual
memory map is created for the child process. As we have seen, the top
region of this virtual memory map comprises the kernel code, while the lower
half comprises the user code and data. Also, we have seen that during the
process of forking a kernel stack gets created. This kernel stack is
specific to this child process (refer above image).

Now, we will see how this virtual memory map changes as the exec() function executes.
To start with, let us look at the parameters which exec takes (refer above image).
In this case there are 2 parameters: the path and argv. The path
specifies the path to the executable; in our example we have used /bin/ls, so this
would be specified in the path, while argv contains the parameters which you are passing
to the particular program. For instance, ls could take arguments such as ls -l
or ls -t, and so on; the -l and -t are arguments which are passed over here.

(Refer Slide Time: 25:16)

192
The first thing that is done is that we get a pointer to the inode of the executable (refer
above image, if condition). An inode is metadata which has information about where
the particular executable is stored on the secondary storage device, such as the hard disk.
For instance, in our particular case, where we are using /bin/ls, this particular
function namei() would return the inode for the ls executable.

Next, we use the function readi(), which reads the ELF header from
that inode, that is, from the secondary storage device. So, we are reading
the ELF header from this particular inode, i.e., ip. The ELF header, just mentioned as elf
over here, is defined as an ELF header structure (check the ELF header condition in above
image). We then look into this ELF header and check the identifier; that is, we check the
magic number and verify whether this magic number is indeed correct
(refer above image).

(Refer Slide Time: 26:23)

This is essentially a sanity check: the magic number for ELF should have this
particular value, that is, \x7fELF (mentioned in above image). Here,
7f is a hexadecimal value, while ELF are the letters 'E', 'L', 'F'.

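As a small user-space illustration (this is not the xv6 code itself), the same sanity check can
be performed by reading the first four bytes of a file and comparing them against the magic
number, as sketched below.

#include <stdio.h>
#include <string.h>

int is_elf(const char *path)
{
    unsigned char ident[4];
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;
    size_t n = fread(ident, 1, 4, f);    /* read the identification bytes */
    fclose(f);
    /* the magic number: 0x7f followed by the letters 'E', 'L', 'F' */
    return n == 4 && memcmp(ident, "\x7f" "ELF", 4) == 0;
}

int main(void)
{
    printf("/bin/ls is %san ELF file\n", is_elf("/bin/ls") ? "" : "not ");
    return 0;
}
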
193
(Refer Slide Time: 26:46)

The next step is a call to setupkvm() (refer above image), where we set up the kernel-side
page tables again. This essentially may not be required, since we have already done it
during the fork process, but it is done in xv6 nonetheless.

(Refer Slide Time: 27:07)

So this particular slide shows a continuation of the exec implementation (mentioned in
above image). The next thing is that we continue to read from the inode. We read various
things like the program headers, and we begin to load code and data from the ELF image,
which is present on the hard disk, into RAM; consequently, we are actually filling up

194
the virtual address space corresponding to the code and data. So, this is done over here
(refer all if conditions in above image).

I will not go into more detail about how the various functions are used.
But the basic idea is that we go to the hard disk, look into the inode
corresponding to that particular executable, and load the code and the read-only data
into the physical memory map; we are also creating page table and page directory
entries corresponding to that code and data. Therefore, you
get this mapping present over here (refer above image).

(Refer Slide Time: 28:12)

The next step in the exec implementation is to create the stack for the user process.
This stack (mentioned in above image), as opposed to the kernel stack, is used by the code
for storing local variables as well as for function calls. In order to create this stack,
the exec implementation allocates 2 contiguous pages; one is used for the
stack while the other is used as a guard page. The guard page is made
inaccessible; essentially this is used to protect against stack overflows. What this
means is that as we keep using the stack, the stack size keeps increasing and would
eventually hit the guard page; as a result, we would know that a stack overflow has
occurred.

195
(Refer Slide Time: 29:07)

The next step in the exec implementation is to fill the user stack. So, essentially we have
created the stack over here (refer above image) and now we are actually filling the stack.
So we fill the stack with command line arguments. So, we know that any program that
we write could take command line arguments and these arguments are actually filled into
the stack.

So, we have command line arguments 0 to N, followed by a null termination string,
and then we have pointers to these arguments, like pointer to argument 0, pointer to
argument 1, and so on (mentioned in above image). These pointers to the arguments form
the argv of your program. You know that the main function takes argc and argv; these
pointers form the argv of your program's input, and after argv comes argc,
which is the number of such parameters, followed by 0xffffffff, which is a dummy
return location for main.

Now, we have seen that we have created the code for the user process and
the data for the user process; these two have been taken from the secondary
storage device and loaded into physical RAM. Also, a page directory and page table
entries have been created, and thus we are able to see them in the virtual memory map.
We have also created a stack for this user process and filled the stack with
command line arguments, creating the argc and argv. The only thing left to do is to
actually start executing the process.

196
(Refer Slide Time: 30:42)

So this is done by filling the trapframe of the current process with the ELF entry.
Essentially, tf->eip, that is the instruction pointer in the trapframe, is stored with
elf.entry, i.e., tf->eip = elf.entry, which essentially is a pointer to the main function
of the user program. Similarly, the stack pointer in the trapframe is set to sp, i.e.,
tf->esp = sp. Now when this particular exec system call returns to the user process, the
trapframe gets restored into the registers; as a result, the eip gets the value of the main
address while the stack pointer esp gets the value of the user space stack, and therefore
execution will start from the main program. The stack pointer will have the pointer to the
various arguments for argc and argv, which will then be used in the main program.
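
In code, this last step boils down to two assignments on the saved user context. Here is a
minimal sketch, with a heavily simplified trapframe that shows only the two fields exec
touches (the names mirror the description above, not the full xv6 structure):

    #include <stdio.h>

    /* Simplified stand-in for a trapframe; only the two fields set by exec are shown. */
    struct trapframe { unsigned int eip, esp; };

    static void start_new_image(struct trapframe *tf,
                                unsigned int elf_entry, unsigned int sp)
    {
        tf->eip = elf_entry;  /* user execution resumes at the ELF entry point */
        tf->esp = sp;         /* user stack pointer points at the freshly built stack */
    }

    int main(void)
    {
        struct trapframe tf = {0};
        start_new_image(&tf, 0x1000, 0x2FF0);  /* hypothetical entry point and sp */
        printf("eip=%#x esp=%#x\n", tf.eip, tf.esp);
        return 0;
    }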

So with this, we come to the end of this video lecture. We have seen in quite some detail
how the xv6 operating system implements various system calls related to process
management, such as the fork, wait, exec and exit system calls. In particular, with the
exec system call, we have seen how the various user space sections, such as the code, data
and the stack, get created, how the stack gets filled with command line arguments, and how
execution of the user space process gets initiated.

Thank you.

197
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 04
Lecture – 14
Interrupts

Hello and welcome to this video. In today's video we will look at interrupts, which form
a crucial part of all modern day operating systems. So, unlike normal software or normal
programs that we write, operating systems are event based; that is, the operating system
executes only when an event occurs. To see what this means, let us look at the slides.

(Refer Slide Time: 00:43)

So suppose you have a user process that is running, so as we have seen before user
process runs in user space and in the Intel nomenclature this is in ring 3 (mentioned in
above image). Now this user process continues to execute on the processor until an event
occurs. So, when this event occurs it would trigger the operating system to execute. So,
this triggering of the operating system will also result in a change in the privilege level.
So the system would no longer be executing in the user space, but rather it will be in
privilege level 0 or that is executing in the Kernel space.

So the operating system would essentially execute and service this particular event (refer

198
above image) and at the end of that execution, the control is fed back to user space and
the process will continue to execute. So the User space process which would execute
after the OS completes could be the same process that is User process 1 or some other
User Process, in this example it is User process 2 (mentioned in above image).

(Refer Slide Time: 02:01)

Let us look at how events are classified. So, various literature categorizes events in
different ways, but what we will do is we will follow the categorization of events based
on this particular book called the Art of Assembly Language Programming which can be
downloaded from this website (mentioned in above image). So, in this book events are
classified into 3 different types; these are Hardware Interrupts, Traps and Exceptions.

So Hardware Interrupts, sometimes just called Interrupts, are raised by external hardware
devices. For example, the network card when it receives a packet could possibly raise an
interrupt, or other devices such as the keyboard, mouse or a USB device when plugged in
could raise a hardware interrupt. These hardware interrupts are asynchronous and can occur
at any time.

Besides hardware interrupts there are Traps and Exceptions. Traps are sometimes known as
software interrupts; they are raised by user programs in order to invoke some operating
system functionality. For instance, if a user program wants to print something on the
monitor, it would invoke a trap, which essentially would be a system call to the operating
system, and the OS will then take care of writing the particular text or writing

199
the particular data on to the screen.

The third type of event is known as Exceptions. These events are generated automatically
by the processor itself as a result of an illegal instruction. There are 2 types of
exceptions: Faults and Aborts. A very common example of a fault is a page fault. Faults are
essentially exceptions from which the processor could recover. For instance, when a page
fault occurs while a process is executing, it would result in the operating system
executing and loading the required page from the swap space into the RAM. On the other
hand, an exception of the form abort would be very difficult to recover from, such as a
divide by 0 exception.

(Refer Slide Time: 04:38)

So when a divide by 0 exception occurs in a program, typically the program would be
terminated. Essentially the operating system has no way to recover from such a divide by
0 exception. This particular slide (mentioned in above image) shows the classification of
events into Exceptions, Traps and Interrupts. Exceptions are further classified into Faults
and Aborts. We will now take a specific case of interrupts, that is, Hardware Interrupts.

(Refer Slide Time: 04:58)

200
So, let us look at Hardware Interrupts. In general, processors today have a dedicated pin
on the IC known as the Interrupt pin. This pin is often just called the INT pin, or in some
processors the INTR pin. Devices such as the keyboard will be connected to the processor
through the INT pin. When a particular key is pressed on the keyboard it would result in an
interrupt being generated to the processor.

Now, let us see how this particular interrupt takes place (mentioned in above image) and
what happens in the processor. So, the processor typically would be executing a program
and would be executing some instructions. Now, when a key is pressed, an interrupt is
generated to the processor and that would result in a switch in the processor to what is
known as the Interrupt Handler Routine.

(Refer Slide Time: 05:58)

201
So, in this particular case since it is the keyboard which has resulted in the interrupt, the
keyboard interrupt handler routine would be invoked. The processor would then begin to
execute this keyboard handler routine until an instruction such as IRET is encountered.
When the IRET instruction gets executed, the context is switched back to the program which
was originally being run. So, in this way we see that interrupts could occur at any time
during the program's execution; it would result in a new context being executed, and at
the end of that execution the processor goes back to the original context.

(Refer Slide Time: 06:41)

So typically systems do not have just one device connected to the processor; there could
be several devices. For instance, systems could have timers, USB drives, keyboards,

202
mouse, network cards and so on. However, as we have seen previously the processor just
has a single interrupt pin. So how is it possible then that several devices share the single
interrupt pin? In order to achieve this, a special hardware is used in systems. So, this is
known as the Interrupt Controller (mentioned in above image). The job of the Interrupt
Controller is to ensure that the single pin of the interrupt is shared between multiple
devices. So, the interrupt controller would receive interrupts from each of these devices
and then channelize that interrupt to the INT pin of the processor.

The processor would then communicate with the interrupt controller to determine which
of these devices had actually generated the interrupt. As a result, the processor would
then execute the corresponding interrupt handler routine. For instance, if the timer had
resulted in the interrupt then the timer interrupt handler routine would be invoked. On the
other hand, if a USB device had resulted in the interrupt, the USB interrupt handler
routine would be invoked and so on (refer above image). Thus, the interrupt handler
routine invoked is going to be very specific to the device that caused the interrupt to occur.

(Refer Slide Time: 08:15)

So, one commonly used interrupt controller is known as a Programmable Interrupt
Controller. It is numbered as 8259, and pictorially this is how it gets connected. The 8259
has two sides, an input side and an output side. The output is connected to the INT pin of
the CPU; there is also an INTA pin, which is an interrupt acknowledge pin (mentioned in
above image).

203
On the other side we have 8 IRQ lines. IRQ stands here for Interrupt Requests. So, on the
input side there are up to 8 devices that can be connected to the 8259. These devices are
labeled device 0 to device 7, all these devices could independently request an interrupt
from the CPU. The 8259 would then channelize that interrupt through the INT pin of the
CPU. The CPU would acknowledge the interrupt through the INTA pin and also
determine which of these 8 devices had requested the interrupt.

Now, what would happen if two devices request the interrupt at exactly the same time?
In such a case, the 8259 would use some priority encoding algorithm to determine which of
these devices should be given the privilege to request for the interrupt first. Another
feature of the 8259 is that it can be cascaded to support more than 8 devices, thus more
than 8 devices could cause interrupts to the CPU.

(Refer Slide Time: 09:52)

So in Legacy computers typically there are two 8259 controllers present - one is
configured as the master, while the other is configured as the slave (first 8259 in above
image is master and below that is slave 8259). The slave 8259 controller is connected to
one of the input channels of the master 8259. So, if any of these devices connected to the
slave 8259, request an interrupt, this interrupt is channelized to the master 8259 and the
master 8259 would then channelize this interrupt to the CPU.

So, one limitation of this legacy configuration of the 8259s is the limited number of IRQs.
Each 8259, as we have seen, can support only 8 devices; therefore, if you have a large

204
number of devices in your system then you would require several 8259 controllers to be
present. Another major limitation of this configuration is that support for multi-processor
and multi-core platforms is difficult. Essentially, as we have seen over here (refer above
image), there is only one CPU that is connected to the master 8259 programmable interrupt
controller. This particular INT signal cannot be routed to another CPU, which may result in
some problems.

(Refer Slide Time: 11:14)

In current systems, the 8259 programmable interrupt controller is replaced by something
known as the APIC or Advanced Programmable Interrupt Controller. The configuration for
the APICs in modern systems is as shown over here (mentioned in above image). So, each CPU
in a multi-core or multiprocessor system would have a local APIC as shown here. So,
Processor 1 has a local APIC, Processor 2 has its own APIC, Processor 3 has its own local
APIC and so on.

In addition to this, there is something known as an I/O APIC which is present in the
System Chip Set (mentioned in above image). Now, external devices such as keyboards,
mouse, network cards, and so on, would request interrupts through the I/O APIC which is
then channelized to one of the local APICs in each CPU. Thus interrupts can be
distributed and prioritized between CPUs. Also the local APICs as well as the I/O APICs
communicate through interrupt messages and IPIs that is Inter Processor Interrupts.

205
(Refer Slide Time: 12:24)

So the local APIC receives interrupts from the I/O APIC and routes it to the
corresponding local CPU. The local APIC can also receive some local interrupts such as
interrupts from the thermal sensor which is present on the CPU, an internal timer and so
on. The local APIC can also send and receive IPIs, that is, Inter Processor Interrupts,
which allow interrupts between processors, so that processors can perform system wide
functions like booting, load distribution, and so on. The I/O APIC is present
in the System Chip Set which is sometimes known as North Bridge. This particular APIC
is used to route external interrupts into the local APIC.

(Refer Slide Time: 13:12)

206
So, we have seen so far that we have multiple devices which are connected to an
interrupt controller which could be either an APIC or in legacy systems based on the
8259 controller. When any of these devices request an interrupt, the interrupt is
channelized to the processor and it would result in the corresponding interrupt handler
routine to be invoked.

How does the processor know the location of the Interrupt Handler Routine? The interrupt
handler routine, just like any other software code, is also executed from RAM. So, the
question that we are posing here is how the processor knows the starting location, or the
address, of the interrupt handler routine.

(Refer Slide Time: 14:13)

So in order to achieve this, what happens is the following (refer above image): each of
the devices that are connected to the interrupt controller also has a dedicated number
known as the IRQ number. So, when a device requests an interrupt and the INT pin in the
processor gets asserted, the processor would then obtain the IRQ number from the interrupt
controller. In this way the processor determines which of these devices has requested the
interrupt.

207
(Refer Slide Time: 14:43)

We will now see how the processor uses the IRQ number (mentioned in above image). In the
memory of the system, a table known as the interrupt descriptor table or IDT is stored.
This IDT table is pointed to by a register in the processor known as the IDTR, that is,
the interrupt descriptor table register. Each entry in this descriptor table contains
information about where in memory the corresponding interrupt handler routine is present.
So, the processor would use the IRQ number to look into the interrupt descriptor table;
from this descriptor the corresponding location of the interrupt handler routine is
obtained, and thereafter the processor can execute instructions from this handler routine.

208
(Refer Slide Time: 15:36)

So, what are the contents of the interrupt descriptor table? Since x86 systems also have
segmentation, each interrupt descriptor contains a segment selector as well as an offset.
The segment selector, as we have seen, is 16 bits, while the offset is 32 bits. So, bits 0
to 15 are here and bits 16 to 31 are present over here (mentioned in above image).

There are other aspects in the interrupt descriptor such as the Present bit and the
Descriptor Privilege Level.
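
The following is a rough C sketch of one 8-byte interrupt gate descriptor, with field
widths following the description above; the struct and field names are illustrative, not
taken verbatim from any particular kernel:

    #include <stdint.h>
    #include <stdio.h>

    /* Approximate layout of one 8-byte interrupt-gate descriptor in the IDT. */
    struct idt_gate {
        uint16_t offset_low;    /* bits 0..15 of the handler offset */
        uint16_t selector;      /* 16-bit code-segment selector (into GDT/LDT) */
        uint8_t  reserved;      /* unused */
        uint8_t  type_dpl_p;    /* gate type, Descriptor Privilege Level, Present bit */
        uint16_t offset_high;   /* bits 16..31 of the handler offset */
    };

    /* Reassemble the 32-bit handler offset from the two halves. */
    static uint32_t gate_offset(const struct idt_gate *g)
    {
        return ((uint32_t)g->offset_high << 16) | g->offset_low;
    }

    int main(void)
    {
        /* 0x8E = Present, DPL 0, 32-bit interrupt gate (a common encoding) */
        struct idt_gate g = { .offset_low = 0x5678, .selector = 0x08,
                              .reserved = 0, .type_dpl_p = 0x8E,
                              .offset_high = 0x1234 };
        printf("handler offset = %#x\n", (unsigned)gate_offset(&g));
        return 0;
    }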

(Refer Slide Time: 16:11)

209
So let us see how these Interrupt Descriptors are used in reality. The processor would
obtain the IRQ number, or the interrupt vector, from the programmable interrupt controller
or the APIC, and as we have seen this interrupt vector is used to index into the IDT table;
the corresponding interrupt or trap gate contains the segment selector as well as the
offset. The segment selector is then used to index into the GDT or the LDT table, from
which the base address of the code segment is obtained. The offset is then added to the
base address to obtain the final address where the interrupt procedure is present.

(Refer Slide Time: 17:01)

So the Intel x86 systems support 256 different types of events. Some of these events are
used internally for faults and aborts, while others can be configured for hardware
interrupts as well as for software interrupts.

So, this particular table (refer above image) gives some of these interrupts and
exceptions that are supported by Intel x86 systems. In particular, note that interrupts
with vector numbers 0 to 31 are internal to the processor; of these, vector numbers 0 to 20
are used for various faults or aborts. The interrupt vector numbers from 32 to 255 can be
user defined. Some of these user defined interrupts are configured as hardware interrupts,
while others are used as software interrupts.

So in the next video, we will look at how interrupts are handled in the CPU as well as in
the operating system.

210
Thank you.

211
Introduction To Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 4
Lecture – 15
Interrupt Handling

Hello. In the previous video we had seen how interrupts can be requested by an external
device, resulting in an interrupt service routine being executed.

So, in this video we will look in more detail at how interrupts are handled. Interrupt
Handling is a fairly involved process which involves both the CPU as well as the operating
system. So, we will go step by step and see the various stages in interrupt handling.

(Refer Slide Time: 00:50)

So let us start with the first step. Let us say that a device asserts an interrupt line;
as we know, this would result in the interrupt controller channelizing that request into
the INT pin of the processor. So, what happens next? In the processor, called CPU over here
(mentioned in above image), (1st point) the CPU would sense that the interrupt line or the
INT line is asserted, and it would obtain the IRQ number from the PIC, that is, the
Programmable Interrupt Controller. Then, (2nd point) the processor would switch to the
kernel stack and also switch the privilege level if required.

212
(Refer Slide Time: 01:31)

In the 3rd step, the current execution of the program is stopped and the program state is
saved. In x86, the program state comprises several registers, such as the SS register,
that is the Stack Segment, the stack pointer, the flags register, the code segment and the
instruction pointer.

So all these are saved onto the stack, and then the processor jumps to the interrupt
handler (4th point). In the interrupt handler, we have a top half of the interrupt handler
(5th point), where important things are done, like responding to the interrupt, further
saving of program state, and scheduling of something known as the bottom half of the
interrupt handler; this is followed by the IRET, which is the Return from Interrupt. The
CPU then executes the return from interrupt (6th point), and after some time the bottom
half of the interrupt handler runs (7th point). This bottom half of the interrupt handler
is essentially known as the work horse of the interrupt and does most of the difficult or
time consuming jobs.

So, one thing to notice is that each of these stages could be done either by the CPU
automatically, that is, the processor hardware itself does it automatically; these are the
black boxes (Refer Slide Time: 00:50). The yellow boxes, on the other hand, are done in
software, essentially by the operating system. So, you see that some of these steps are
done automatically by the CPU, while others, such as the interrupt handler, are done in
software by the operating system. What we will see next is each of these stages in more
detail.

213
(Refer Slide Time: 03:18)

So, after the current instruction completes executing, the CPU senses the INT line and
then when it determines that a particular interrupt has been requested, it obtains the IRQ
number of that interrupt from the PIC (point 1st in above image).

(Refer Slide Time: 03:37)

Then it would switch to the kernel stack if necessary, and also change the privilege level
to ring 0 that is it would allow the kernel code to start executing (point 2 nd in above
image).

214
(Refer Slide Time: 03:46)

Let us look at the stacks first. As we have seen before, each process has two stacks
(refer above image). One stack, known as the User Stack (highlighted above in blue color),
is visible in the user space and is typically used to store various auto variables and for
function calls by the instructions in the user program.

On the other hand, there is a hidden kernel stack corresponding to each process
(highlighted above in purple color). When the processor detects the interrupt, the context
changes from the user stack to the kernel stack. Henceforth, the kernel stack is going to
be used to store auto variables as well as for function calls of the code that executes in
the kernel.

215
(Refer Slide Time: 04:33)

So, why do we switch stacks? Essentially, stacks are switched because the OS cannot trust
the user process stacks. The user process stack may be corrupted, and we do not want the
kernel to also get corrupted for this reason. Another reason is that user processes cannot
access the kernel stack; for instance, if the user process is a malicious virus, it has no
access to the kernel stack and therefore cannot modify or change anything in the kernel.

The second question is, how is the stack actually switched from the user stack to the
kernel stack? This is done by something known as the task segment descriptor, and
essentially what it does is change the stack segment and the stack pointer of the
processor. The information about the new stack segment and stack pointer is obtained from
the task state segment. Another thing that occurs during the process of switching stacks
is that the privilege level changes from low to high, that is, from ring 3, which is where
the user processes run, to ring 0, where the kernel runs. All these things are done
automatically by the CPU.

216
(Refer Slide Time: 05:54)

So, after changing stack and raising privilege level, the basic program state is saved
(point 3rd in above image).

(Refer Slide Time: 06:04)

So, what this means is that, suppose we have a program which is being executed in user
space and an interrupt occurs, then the state of that process is saved. Why do we need to
save the state of that process? It is required so that the process can resume after the
interrupt servicing is completed.

217
(Refer Slide Time: 06:27)

How is the program state saved? In order to save the program state, we use the kernel
stack, which we have just switched to when the interrupt occurred. So in the kernel stack
we would save various registers such as SS, ESP, EFLAGS, CS, EIP and an Error code if
required.

(Refer Slide Time: 06:50)

Next, we will see how the processor jumps to the interrupt handler (point 4 th in above
image).

218
(Refer Slide Time: 06:54)

So, we have seen this before that the processor would use the IRQ number, or the
interrupt vector to index into the IDT table from where we would obtain the segment
selector as well as an offset (refer above image). The segment selector is used to obtain
the base address of the code segment, while the offset is added to this base address to
obtain the location of the interrupt procedure. So the processor could then begin to
execute this interrupt procedure.

(Refer Slide Time: 07:25)

The next step is to actually execute the interrupt handler, return from the interrupt, and

219
also execute the bottom half of the interrupt handler (point 5th, 6th and 7th in above
image).

(Refer Slide Time: 07:34)

So let us see what a typical interrupt handler does (refer above image). A typical
interrupt handler has 3 parts: first, it would save some additional information about the
process being interrupted. Then, it would process the interrupt, which is going to be very
specific to the type of interrupt. For instance, if it is a keyboard interrupt, then the
code executed over here would be very specific to the keyboard. On the other hand, if it
is a timer interrupt, then the code would be very specific to timers.

After the processing of the interrupt is done, the CPU restores the original context and
returns back to the user process. The first and the third parts are typically written in
assembly language (point 1st and 3rd in above image), while the processing of the interrupt
is typically written in a higher level language like C (point 2nd in above image).

220
(Refer Slide Time: 08:27)

So let us see the additional information that is stored. We have seen in an earlier slide
(Refer Slide Time: 06:27) that the program state is saved on the kernel stack, and note
that this is done automatically by the CPU; a few registers such as SS, ESP, EFLAGS, CS,
EIP and an Error code may be saved. When the interrupt handler begins to execute,
additional information is also stored on the kernel stack. So, we have a part of this
(mentioned in above image in orange color) which is stored by the hardware, that is, by the
CPU automatically, and the remaining part which is done in software, that is, by the
operating system (mentioned in purple color).

This additional part which is done by the operating system has more registers which are
saved. For example, the segment registers such as DS, ES, and so on, and also the general
purpose registers such as EAX, ECX, and so on. Similarly, the other registers like ESI, EDI
and ESP are stored. This pattern of registers stored onto the kernel stack is known as the
Trapframe, and it plays a crucial role during the return from the interrupt, so that the
user process which had been executing before can continue to execute from where it had
stopped.
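
As a rough illustration, a simplified trapframe layout could be declared as below; the
field set loosely follows the description above (and the spirit of xv6's struct trapframe),
but it is a sketch rather than the exact kernel definition:

    #include <stdint.h>
    #include <stdio.h>

    /* A simplified trapframe sketch.  The lower group is pushed by the operating
     * system's handler code in software; the upper group is pushed automatically
     * by the CPU when the interrupt is taken. */
    struct trapframe {
        /* saved in software by the interrupt handler (general purpose + segments) */
        uint32_t edi, esi, ebp, ebx, edx, ecx, eax;
        uint32_t ds, es;
        uint32_t trapno;              /* which interrupt/trap this was */

        /* saved automatically by the CPU */
        uint32_t err;                 /* error code (or a dummy value) */
        uint32_t eip, cs, eflags;
        uint32_t esp, ss;             /* only pushed on a user-to-kernel crossing */
    };

    int main(void)
    {
        printf("sketch trapframe size: %zu bytes\n", sizeof(struct trapframe));
        return 0;
    }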

221
(Refer Slide Time: 09:48)

One important aspect when writing an interrupt handler is the Interrupt Latency. So, what
is this Interrupt Latency? So let us say that we have a processor which is executing a user
process that is User process 1 (mentioned in above image) and after some time an
interrupt occurs. So, when this interrupt occurs there are a series of tasks that happen
between the CPU as well as the operating system. First, the interrupt has to be detected
by the PIC and then forwarded to the CPU, then the CPU detects the interrupt and then it
has to do various things like change context from user space to kernel space, and then
save various registers of the user space program and only then, it would be able to run
the interrupt handler.

So, the time difference between when the interrupt is actually triggered and when the
interrupt handler executes is known as the Interrupt Latency (difference between two red
arrows in above image). This Interrupt Latency can be significant as well as important,
especially in systems such as real time systems. For instance, consider the operating
system used in a car: you do not want a large latency when an interrupt occurs in order to
release the air bag present in some modern cars. When an accident occurs, you would require
that the air bags open up immediately and extremely quickly. Therefore, in such a case you
would require the interrupt latency to be as small as possible. What affects this Interrupt
Latency?

222
(Refer Slide Time: 11:25)

So the Interrupt Latency could vary between a Minimum and a Maximum in a system.
The Minimum Interrupt Latency is due to the delays within the interrupt controller. So
the system would typically not be able to get interrupt latency which is less than this
minimum latency specified by the controller.

On the other hand, the Maximum Interrupt Latency is due to the way the operating system is
designed. Some operating systems, for instance, would disable interrupts when doing
important jobs such as handling another interrupt or doing some atomic operations. During
this period, if a new interrupt occurs, that interrupt would have to wait until the
previous interrupt completes or the atomic operation completes. As a result, you will get
an increase in the interrupt latency.

223
(Refer Slide Time: 12:20)

So, one way to reduce interrupt latency is by not disabling interrupts. However, this
could result in what is known as Nested Interrupts and is depicted in this particular figure
(refer above image). So let us say the kernel code is executing in the CPU and after
sometime an interrupt occurs due to which the interrupt handler begins to execute. So,
while the interrupt handler executes, a second interrupt of a higher priority occurs and
this would lead to the second interrupt handler being executed.

After the second interrupt handler completes, the first interrupt handler continues from
where it had stopped just before the interrupt occurred, that is it had stopped at this
particular point (second interrupt arrow in above image) and it will now continue
executing from this particular point (refer above image). So, after this interrupt handler
finishes executing, the original kernel code will continue to execute.

So, as we see, the system becomes more responsive, in the sense that when a new interrupt
of a higher priority comes, the latency incurred is much smaller. However, the limitation
is that nested interrupts make designing the operating system far more difficult, and
validating such an operating system will also be more tedious. Therefore, as far as
possible, we would like to design interrupt handlers to be extremely small so that such
nested interrupts are highly unlikely. For instance, if we design this
interrupt handler 1 (mentioned in above image) in such a way that it would have
completed its execution at this point itself (where red arrow points), then there would be

224
no need to actually nest the second interrupt.

(Refer Slide Time: 14:10)

One way to actually achieve small interrupt handlers is to design them in such a way that
only the crucial and critical operations are performed in the interrupt handler. All other
non-critical actions are deferred to later.

(Refer Slide Time: 14:29)

In Linux, this is achieved by having a top half interrupt handler and a bottom half
interrupt handler. The top half interrupt handler gets executed first, and does the

225
minimum amount of work which is critical and then returns from the interrupt. So the
critical work involved is the saving of registers, unmasking of other interrupts, triggering
the bottom half of the interrupt handler to execute and restoring registers and returning to
previous context.

Some time later, the bottom half interrupt handler then executes. The bottom half interrupt
handler would typically fetch some data which the top half interrupt handler has sent to
it, through, say, a queue, and it will process that particular data. So, unlike the top
half interrupt handler, the bottom half interrupt handler can be interrupted.

(Refer Slide Time: 15:21)

So let us take an example of interrupt handling in xv6. In particular we will see about
interrupt handling with respect to the keyboard. So, we have seen this particular figure
before (refer above image), and we have seen that the keyboard is connected to IRQ 1 of
the master 8259. So, when a key is pressed, it results in this particular line 1 (red in
circle) being asserted, and the master 8259 will then transfer the interrupt to the CPU
through the INT pin. The CPU would then detect that a key has been pressed and determine
the corresponding interrupt vector, which would result in the function consoleintr() being
executed.

226
(Refer Slide Time: 16:08)

Now, in consoleintr, which is present in console.c in the xv6 code, what is first done is
that the function communicates with the keyboard, using the keyboard driver present in
kbd.c, to determine which key has been pressed. If it was a special key, then there is
special servicing done for special characters pressed on the keyboard, and then the data is
pushed into a circular buffer. The circular buffer is as shown over here (mentioned in
above image) - it has got a read pointer and a write pointer. So consoleintr will push the
data at the memory pointed to by the write pointer and then increment W. At a later point,
other functions in the operating system would then read data using the read pointer and
therefore be able to determine which keys have been pressed.
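
The circular buffer idea can be sketched in a few lines of C. The code below is
illustrative (the buffer size and function names are made up, not xv6's exact console
code): the interrupt handler would call the put function, and the rest of the OS would
later call the get function.

    #include <stdio.h>

    #define BUFSIZE 128

    struct inbuf {
        char buf[BUFSIZE];
        unsigned int r;   /* next index to read  */
        unsigned int w;   /* next index to write */
    };

    /* Called from the interrupt handler: store one keystroke if space remains. */
    static int inbuf_put(struct inbuf *b, char c)
    {
        if (b->w - b->r >= BUFSIZE)      /* buffer full, drop the keystroke */
            return -1;
        b->buf[b->w++ % BUFSIZE] = c;
        return 0;
    }

    /* Called later by the rest of the OS: fetch one keystroke if available. */
    static int inbuf_get(struct inbuf *b, char *c)
    {
        if (b->r == b->w)                /* buffer empty */
            return -1;
        *c = b->buf[b->r++ % BUFSIZE];
        return 0;
    }

    int main(void)
    {
        struct inbuf b = { .r = 0, .w = 0 };
        inbuf_put(&b, 'x');
        char c;
        if (inbuf_get(&b, &c) == 0)
            printf("read back: %c\n", c);
        return 0;
    }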

So, this was a brief introduction to interrupts: how they work and how they are handled by
the CPU and the operating system.

In the next video we will look at software interrupts and how they are used to implement
system calls in the operating systems.

227
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 04
Lecture – 16
Software Interrupts and System calls

In this video, we look at an important type of interrupt known as Software Interrupts, and
their application in system calls.

(Refer Slide Time: 00:28)

In the previous video, we had looked at hardware interrupts. We had seen how a device
such as a keyboard or a network card could assert a particular signal in the CPU and this
would cause the CPU to asynchronously execute an interrupt handler corresponding to
the device. So, as we have seen in the previous videos, this device (refer above image)
would typically send a signal to the CPU through an intermediate device such as a PIC or
a programmable interrupt controller.

In much the same way, we have what is known as a Software Interrupt. However,
unlike having an external device which causes the interrupt, here an instruction in the
program would trigger the interrupt. In this particular case, for example, an instruction
such as INT (refer above image) would cause the interrupt to occur and the operating
system to execute. So, here the instruction is INT x, so x here is the interrupt number. It

228
typically has a value less than 256, and it is used to specify or distinguish between
software interrupts.

(Refer Slide Time: 01:54)

So, where is the software interrupt used? So, software interrupts are used to implement
system calls. So, as we know a user process (mentioned in above image) could invoke a
system call to perform some Kernel operation. For example, it could be to read a file or
to write a file, to print something to a monitor, or to send a packet through the network
and so on. More specifically, all operating systems implement system calls through one
particular software interrupt.

For example, in the Linux operating system, the software interrupt 128 is used to specify
system calls. Therefore, in a Linux OS, if an INT 128 is executed in the user process, it
would lead to an interrupt and cause the kernel or the operating system to execute, and
thereafter the OS would execute code depending on the interrupt. In xv6, the software
interrupt used to implement system calls is 64, so an instruction like INT 64 in the user
process (refer above image) would be used to implement a system call.

229
(Refer Slide Time: 03:19)

So to take an example, let us consider that our application has a printf statement present
in it. So, printf would print this string ‘str’ (mentioned in above image) to the standard
output, which typically is the monitor. Now printf is a function present in the library libc
and it would cause the libc function to be invoked. Now in the libc function there is a call
to the write system call with the specifier STDOUT i.e write(STDOUT). So, the
STDOUT here is the file descriptor; and it is a special file descriptor which is meant for
the standard output or the monitor.

So in the write function, it would invoke INT 64 in xv6 or INT 128 in Linux and cause a
software interrupt to occur. So, the software interrupt as we know would cause the
transformation from the User space to the Kernel space and it would result in the
operating system executing. The OS would then determine that the interrupt was in fact
due to a system call and then it would determine what system call it was from; in this
case, it was from a write system call and it was a write to the STDOUT - the standard
output.

So the operating system would then invoke the handler for the write system call, and this
handler (second pink box in above image) would take care of communicating with the
various devices such as the video card to display the string onto the monitor. So, after
this handler completes execution, the IRET instruction is executed which would result in

230
a transformation back from Kernel space to the User space and the program will continue
to execute.

(Refer Slide Time: 05:31)

Now, typically operating systems support several different types of system calls. So, this
particular table over here (refer above image) shows the various system calls supported
by xv6. So, in a previous video we had seen some of them. For example, we had seen the
fork(), exit(), wait() so on. And you are also familiar with several types of system calls
such as open(), read(), write(), close(), change directory chdir(), make directory mkdir()
and so on. So, each of these system calls would be executed by having a software interrupt
such as INT 64 (since this is xv6, the number is 64). Each time any of these system calls
is invoked by a user process, it would trigger the operating system to execute.

Now, the next obvious question one would ask is, from the OS perspective, how does the OS
distinguish between the various system calls? We mentioned that all the system calls use
INT 64 for xv6 or INT 128 for Linux. So how does the OS determine whether the system call
was with respect to fork(), wait(), sleep(), exit(), and so on? Essentially, this
distinguisher comes from the user process itself.

231
(Refer Slide Time: 07:09)

What happens is that, before the INT 64 instruction, the user process will move a system
call number into the eax register. For example, the instruction mov x, %eax will move the
system call number x into the eax register. Each system call in the operating system has a
unique system call number. The operating system, when triggered by the INT instruction,
would look up the eax register and then determine which system call was invoked. For
example, in xv6, if we look at these particular header files (mentioned in above image), we
would see the various system call numbers defined; each system call is given a specific
number, for example SYS_fork is 1, SYS_exit is 2 and so on.

Now when the OS gets triggered due to the INT 64 instruction getting executed, the OS will
determine the system call using these system call numbers and then invoke the corresponding
system call handler. Each of the system calls has a corresponding system call handler. This
is shown over here (refer above image), corresponding to each of the system call numbers,
SYS_fork which is 1, SYS_exit which is 2, and so on.

So, these are system call functions (mentioned above in the System call handler list)
present in the operating system which get triggered based on the type of the system call.
For example, if eax had a value of 11, the operating system would look into the eax
register and determine that this corresponds to the getpid system call being invoked. It
would then look into this particular table (mentioned above in the System call handler
list), see that the getpid system call is handled by the function sys_getpid, and therefore
invoke this sys_getpid function.
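
To make the dispatch idea concrete, here is a toy, self-contained C sketch of a system call
table indexed by the number found in eax; the handler bodies are made up for illustration,
and only SYS_fork = 1, SYS_exit = 2 and getpid = 11 follow the values mentioned above:

    #include <stdio.h>

    #define SYS_fork   1
    #define SYS_exit   2
    #define SYS_getpid 11

    static int sys_fork(void)   { return 42; }   /* pretend child pid */
    static int sys_exit(void)   { return 0;  }
    static int sys_getpid(void) { return 7;  }   /* pretend current pid */

    /* Dispatch table: system call number -> handler function. */
    static int (*syscalls[])(void) = {
        [SYS_fork]   = sys_fork,
        [SYS_exit]   = sys_exit,
        [SYS_getpid] = sys_getpid,
    };

    #define NELEM(x) (sizeof(x) / sizeof((x)[0]))

    /* 'eax' stands in for the value the trapframe would hold after INT 64. */
    static int dispatch(int eax)
    {
        if (eax > 0 && eax < (int)NELEM(syscalls) && syscalls[eax])
            return syscalls[eax]();   /* result would be written back into eax */
        return -1;                    /* unknown system call */
    }

    int main(void)
    {
        printf("getpid -> %d\n", dispatch(SYS_getpid));
        printf("bogus  -> %d\n", dispatch(200));
        return 0;
    }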

(Refer Slide Time: 09:36)

Now, let us look at the typical prototype of a system call. So, a typical system call is as
shown over here (refer above image). So, of course, it has a system call name that is a
function name and then it is passed some resource descriptor and parameters and
typically would return an integer. So, the resource descriptor specifies what operating
system resource is the target here. For example, it could be a file or a device; and as we
have seen in the previous slides it could also specify a particular monitor. For example, if
the resource descriptor is STDOUT then the resource in use here is the monitor.

So, some system calls also do not specify this resource descriptor; in such a case, the
system call is meant for that resource itself. For example, if we use the sleep system call,
we only specify the time and no specific descriptor such as the file, a device and so on.
This means that the current process wants to sleep for that given interval time.

The next part is the parameters. These parameters are specific to the system call. For
example, if we invoke read, write, open or close or any other system call, the parameters
specified here are going to be very specific to each of these system calls. For example,
the write system call (mentioned in above image) has the parameter *buf that is a void
pointer and the count. So, the open or the close or any other system call would have

233
different set of parameters. So, essentially these parameters are very specific to the type
of the system call.

The return type is typically int or integer, and sometimes it is void. 'int' is typically
used because in this way the operating system is able to send the completion status of the
system call, whether it executed successfully or failed and so on. Sometimes the return
value is also used to convey specific information about the system call. For example, in
write (mentioned in above image) the return type is ssize_t, which in fact is typedef'd to
an integer, and it specifies the number of bytes that have been written to the file
specified by int fd. So, the return type could also vary depending on the type of system call.

The next thing we will look at is how these parameters, that is, the resource descriptor
and the parameters passed to the system call, are sent to the kernel. Note that system
calls are invoked very differently from a standard function call. In a function call, as we
know, the call instruction would be used; the call would specify an address, which is where
the function resides, and the various parameters for the call are passed through the local
stack.

Similarly, the int return which is returned from the function call would be returned
through the eax register. So, system calls on the other hand work very differently from
function calls. So, as we have seen system calls invokes the kernel by the INT 64
instruction as in the case of xv6. So, how are the parameters such as the resource
descriptor and the other parameters passed to the kernel?

234
(Refer Slide Time: 13:49)

So essentially, there are three ways of doing so. The first is pass by registers, which is
typically done in Linux; the second way is by passing through the user mode stack, which is
done in xv6; and the third way is by passing through a designated memory region. In this
third case, what is done is that in the user process itself, a designated region, most
likely in the heap, is used to save the various parameters that need to be passed to the
system call; and the address of this region in the heap is passed through the registers. We
will look at the other two cases, that is, pass by registers and pass via the user mode
stack, in more detail.

(Refer Slide Time: 14:44)

235
Now, Pass by Registers, which is used by Linux system calls, uses the registers present in
the processor to pass parameters to the kernel. We have already seen an example of this, of
how the %eax register is used to pass the system call number from the user process to the
operating system. In a similar way, other registers such as %ebx, %ecx, %esi, %edi, and
%ebp are used to pass the various parameters of the system call from the user process to
the kernel. If the system call has more than 6 arguments, then a pointer to a block
structure containing the argument list is passed to the kernel.
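
As a concrete illustration of pass by registers, the sketch below issues the write system
call directly through int 0x80, placing the call number in eax and the three arguments in
ebx, ecx and edx. It assumes a 32-bit x86 Linux build (for example, compiled with
gcc -m32), where the write system call number in the 32-bit table is 4:

    /* Minimal pass-by-registers example for 32-bit x86 Linux. */
    int main(void)
    {
        static const char msg[] = "hello via int 0x80\n";
        long ret;
        long nr_write = 4;   /* write's number in the 32-bit x86 syscall table */

        __asm__ volatile("int $0x80"
                         : "=a"(ret)                   /* return value comes back in eax */
                         : "0"(nr_write),              /* system call number goes in eax */
                           "b"((long)1),               /* arg 1: fd = stdout             */
                           "c"(msg),                   /* arg 2: buffer                  */
                           "d"((long)(sizeof(msg) - 1))/* arg 3: byte count              */
                         : "memory");
        return ret < 0;
    }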

(Refer Slide Time: 15:41)

Now, let us look at the second case, that is, Pass via the User Mode Stack, and this is
what is done in xv6. In this approach, before the int 64 instruction (mentioned above in
the User process box), the various parameters of the system call are pushed onto the stack.
For example, if the system call had 3 parameters, param 1, param 2, and param 3, these
three parameters are pushed onto the user space stack and then the system call number (the
number before int 64), as we have seen, is moved into the %eax register. So, this here is
the user space stack of the user process containing the 3 parameters.

Now, when the INT instruction is executed, as we know, it triggers an interrupt causing the
switch from the user space into the kernel space. Also, as a result of this interrupt, as
we have seen, there is a switch in the stack from the user space stack to the kernel space
stack. As we have seen in the previous video, this kernel

236
stack is used to create what is known as the trapframe. This trapframe is shown over here
(mentioned in above image). As we have seen in the previous video, some of these entries in
the trapframe are pushed onto the stack automatically by the CPU. In particular, the
registers specified in capital letters are all pushed onto the kernel stack, that is, onto
the trapframe, by the CPU.

Now, the SS and ESP here (in the trapframe box) are important for us. These are the stack
segment and the stack pointer, and these registers correspond to the user space stack. As
we know, just before the INT 64 was executed, the stack pointer was pointing to this
particular location, and therefore the contents of ESP will also point to this location in
the user space stack. In this way, the kernel uses the SS and the ESP from the trapframe to
determine the various parameters for the system call.
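
A toy sketch of how the kernel side could read those parameters is shown below. The array
stands in for user memory, and the offsets are simplified; xv6's real argint() additionally
skips one extra word for the return address pushed when the user-level library wrapper was
called:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static uint8_t user_mem[64];   /* stands in for the user address space */

    /* Safely read a 4-byte integer from "user memory". */
    static int fetchint(uint32_t addr, int32_t *ip)
    {
        if (addr + 4 > sizeof(user_mem))
            return -1;             /* would be an invalid user address */
        memcpy(ip, user_mem + addr, 4);
        return 0;
    }

    /* Fetch the n-th word sitting on the user stack at the time of INT 64. */
    static int argint(uint32_t saved_esp, int n, int32_t *ip)
    {
        return fetchint(saved_esp + 4 * (uint32_t)n, ip);
    }

    int main(void)
    {
        /* pretend the user pushed the parameters 10, 20, 30 before INT 64 */
        uint32_t esp = 16;
        int32_t params[3] = { 10, 20, 30 };
        memcpy(user_mem + esp, params, sizeof(params));

        for (int n = 0; n < 3; n++) {
            int32_t v;
            if (argint(esp, n, &v) == 0)
                printf("parameter %d = %d\n", n, (int)v);
        }
        return 0;
    }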

(Refer Slide Time: 18:22)

The next thing we will look at is how the return value is passed from the system call back
to the user process. Recall that the entire reason for creating this trapframe in the
kernel stack of the process is that when the interrupt or the system call completes its
execution, the entire state in the trapframe is restored back into the corresponding CPU
registers; as a result, the user process continues to execute from where it had stopped,
with its context restored with the help of the trapframe.

237
Now, in order to return a value from the system call, which is executing in kernel space,
back to the user space, what is done is that the eax register in the trapframe is modified.
Essentially, we had seen that the eax register, because of this particular instruction
(mov sysnum, %eax), would contain the system call number.

Now this system call number is overwritten by the return value of the system call. This
could be a negative number like -1 or a positive number, as we have seen in the earlier
slide. Now when the system call completes its execution and the context is transferred back
to the user process, the entire trapframe, including the new value of eax, is restored back
into the registers of the CPU. The process continues to execute from this particular
instruction (after int 64) with the new value of eax, which contains the return value from
the system call.

Thank you.

238
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 04
Lecture – 17
CPU Context Switching

Hello. In this video we will look at CPU Context Switching. We will see how the operating
system enables multiple processes to share a single CPU.

(Refer Slide Time: 00:30)

So let us consider this particular figure (refer above image), where we have a single CPU
in the system which is shared among multiple processes. The operating system, by a feature
known as Multi Tasking, ensures that this CPU is fairly shared among the various processes.
So, in a multi tasking environment, or rather in a multi tasking enabled
operating system, the OS would allow one process to execute for some time, and then
there is a context switch. So, during this context switch, the process 1 would be stopped
and a new process, in this case (as per above example) process 2 will begin to execute.

Now the operating system ensures that when process 1 stops executing, its state is saved
in such a way that after sometime it can be scheduled back into the CPU and can
continue executing from where it had stopped. So, as a result in a multi tasking

239
environment such as this (mentioned in above image), processes execute for brief
intervals of time known as Time slices. So, for instance process 4 executes in this time
slice (between process 1 and 2) then there is a period where it does not execute and then
it continues executing in this time slice (between process 3 and 3) and so on. So, in this
video what we are going to see is how the operating system ensures that a context switch
occurs.

(Refer Slide Time: 02:00)

To begin with, we will see what triggers a context switch in an operating system.

(Refer Slide Time: 02:07)

240
To answer this particular question, we need to go back to the Process State diagram. In a
previous video on processes, we had seen that from the time a process gets created to the
time it exits, it goes through several different states, and this is represented by the
process state diagram (refer above image). What we will see now is how these various states
trigger a context switch to occur.

(Refer Slide Time: 02:31)

Let us see the 1st scenario of how a context switch gets triggered. Consider this
particular program (refer above image), which essentially invokes scanf. Therefore, it
requires the user to input something through the keyboard.

Let us say this function (scanf()) is executing as a process in the system, and it is
currently in the running state, which means that it is currently holding the CPU and
executing on it. When the scanf() function gets invoked, the process now needs to be
blocked until a user inputs data through the keyboard. So, as a result, the state of the
process changes from the running state into the blocked state. The process will remain in
the blocked state until the user has input data into the console.

So, in order to utilize time effectively, what the operating system is going to do is
trigger a context switch, so that another process can move into the running state and have
the processor.

241
(Refer Slide Time: 04:25)

Let us take the 2nd scenario where a context switch can occur. Say we are considering this
particular program (refer above image), where there is a printf and then there is an exit;
as we have seen before in a previous video, exit is a system call which results in the
process going into the zombie state.

At the end of the exit system call, the operating system will trigger a context switch.
This ensures that the current process, which is just exiting, will not have the CPU
resources anymore; rather, a new process will be assigned the CPU. Therefore, a new process
will come from the ready state into the running state.

242
(Refer Slide Time: 04:31)

Now, let us look at the 3rd scenario that can occur. If a process is in the running state,
and an event such as a hardware interrupt occurs, then it could lead to a context switch.
For instance, if a process is currently executing, or currently holding the CPU, and an
interrupt occurs, it moves from the running state into the ready state. The interrupt would
cause an interrupt handler to execute, and at the end of the handler a new process may be
executed in the CPU; that is, a new process will move from the ready state into the running
state.

(Refer Slide Time: 05:14)

243
A very popular interrupt in this context is the Timer interrupt. All systems have a timer
within them. This timer (refer clock timer in above image) is configured to send interrupts
periodically to the CPU. The period could be anything from 10 milliseconds to 100
milliseconds, and may vary from system to system. When the timer interrupt occurs, the OS
gets triggered and it causes the CPU scheduler to execute.

Now the CPU scheduler looks at the queue of processes (mentioned in above image) which are
in the ready state, and then based on some algorithm will choose a particular process. This
process would then be moved from the ready state into the running state, and it would be
this process that executes in the CPU until the next interrupt occurs. In this way, every
100 milliseconds for instance, the scheduler would pick a new process to execute and that
process would then hold the CPU for the time slice of 100 milliseconds.
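
As a toy illustration of what the scheduler does at each timer tick, the sketch below scans
a small process table in round-robin fashion and picks the next READY entry; the table,
states and policy are simplified for illustration and are not xv6's actual scheduler:

    #include <stdio.h>

    enum pstate { UNUSED, READY, RUNNING };

    struct proc { int pid; enum pstate state; };

    #define NPROC 5
    static struct proc ptable[NPROC] = {
        {1, READY}, {2, UNUSED}, {3, READY}, {4, RUNNING}, {5, READY},
    };

    /* Scan circularly, starting after the last scheduled slot, and return the
     * index of the next READY process (or -1 if none is ready). */
    static int pick_next(int last)
    {
        for (int i = 1; i <= NPROC; i++) {
            int idx = (last + i) % NPROC;
            if (ptable[idx].state == READY)
                return idx;
        }
        return -1;
    }

    int main(void)
    {
        int cur = 3;                       /* slot of the currently RUNNING process */
        int next = pick_next(cur);
        if (next >= 0) {
            ptable[cur].state  = READY;    /* preempted by the timer interrupt */
            ptable[next].state = RUNNING;  /* chosen process gets the CPU */
            printf("switching from pid %d to pid %d\n",
                   ptable[cur].pid, ptable[next].pid);
        }
        return 0;
    }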

(Refer Slide Time: 06:22)

So now let us look in more detail at how a context switch occurs. In a previous video we
had seen that corresponding to each process in the system, the kernel stores some metadata.
Essentially there are three pieces of metadata for each process: the process control block,
the kernel stack, and the page tables corresponding to that process.

We have also seen in an earlier video that when an interrupt occurs, the context of that
process is stored in the kernel stack, in what is known as the trapframe. This context
(mentioned in above figure in kernel block) would allow the process to restart

244
execution from where it had stopped. In this particular figure, for instance (refer above
image), each process has its own kernel stack, which holds the trapframe when an interrupt
occurred, and its own associated page tables. Similarly, process 3 has its own trapframe
and page tables, and so do process 1 and process 2.

(Refer Slide Time: 07:31)

In addition to this, the CPU scheduler has a context which is also stored in a separate
stack. So, this is known as a Scheduler context and it is used while doing a context
switch.

(Refer Slide Time: 07:45)

245
So let us now walk through how a context switch occurs. Let us assume that the
user process 3 is currently executing in the CPU (refer above image) that is the user
process 3 currently holds the CPU. Also as a result of this it is the process 3's page table
which is currently active, that is for every instruction fetch, every memory load or
memory store it is the process 3’s page table which does the translation from the logical
address into the physical address.

Now, when an interrupt occurs we have seen that there is a switch from the user mode
into the kernel mode.

(Refer Slide Time: 08:26)

Therefore the kernel begins to execute again, and it is the kernel which will now hold the
CPU. All instructions that are being executed in the CPU, thus belong to the kernel code.
Also, as a result of the interrupt, as we have seen before, the entire context of process 3
gets stored in the kernel stack of that process. This context (green stack in above image)
is stored in a structure known as the trapframe, and this trapframe, as we have seen
before, has a sufficient amount of information to allow process 3 to restart executing from
where it had stopped. The next thing that happens is that the kernel determines that the
interrupt that occurred was due to the timer, and it invokes the scheduler.

246
(Refer Slide Time: 09:19)

The scheduler would then switch from the kernel stack of the user process to its own stack,
the scheduler stack (red stack in above image); it thereby obtains the scheduler context.
The scheduler then chooses from the ready list the next process to be executed in the CPU,
and then there is a switch to the new process's kernel context. Thus we are moving from the
scheduler's context to the new process. In this case it is process 4 that has been selected
by the scheduler, and we switch to process 4's page tables and process 4's kernel stack
(mentioned in above image in yellow color).

Now, recall that each of these stacks has a trapframe. Therefore, process 4 too, based on
its previous execution, has a trapframe which contains the entire context of where
process 4 had stopped executing.

(Refer Slide Time: 10:23)

Thus, when the timer interrupt returns with the iret instruction, the contents of
process 4's trapframe get restored. As a result, process 4 will continue to execute. Also,
we have switched to process 4's page table. Now as process 4
executes, its instruction fetches, memory loads and memory stores are
translated by process 4's page tables.

In this way, every interrupt that occurs causes a switch to the kernel, and if it is a
timer interrupt that requires a new process to execute, the new process is
selected by the scheduler and its context is restored. Also, as we have seen, the
new process's page table is then made active.

(Refer Slide Time: 11:21)

Now, let us look at how timers actually interrupt processors. As we have mentioned before,
all systems have a timer present in them. In legacy systems this is the PIT or
Programmable Interval Timer, which is a dedicated chip present on the motherboard. In
modern or current-day systems, timers are present in the LAPIC. In both
these devices there is a counter present internally, and this counter can be programmed to
start counting down from a particular number.

For example, the counter is loaded with a large number (10000000 in the above image), and every clock cycle
this counter is decremented until it reaches 0. At the end of the count, that is
when it reaches 0, the counter is loaded back with the original value (10000000) and the
process continues. Essentially, the counter keeps going from this large value down to 0 and
then keeps cycling through these values. Now, when the counter reaches 0, an
interrupt is asserted to the CPU. Thus at periodic intervals the CPU receives
interrupts from the timer. These interrupts are then used to decide on the next process
that is going to execute in the CPU.
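
As an illustration of how such a countdown timer is programmed, here is a small sketch for the legacy PIT. It assumes an outb() port-output helper of the kind kernels such as xv6 provide; the constants are the well-known 8254 ports and frequency, but treat the snippet as a sketch under those assumptions rather than production code.

#include <stdint.h>

/* Assumed port-output helper (kernels usually provide this as inline asm). */
extern void outb(uint16_t port, uint8_t data);

#define PIT_FREQ_HZ   1193182u   /* input clock of the legacy 8254 PIT      */
#define PIT_CMD_PORT  0x43       /* mode/command register                   */
#define PIT_CH0_PORT  0x40       /* channel 0 data port                     */

void timer_init(uint32_t ticks_per_second)
{
    /* The counter is loaded with (input clock / desired tick rate); it
     * counts down to 0, raises an interrupt, and is reloaded automatically. */
    uint16_t divisor = (uint16_t)(PIT_FREQ_HZ / ticks_per_second);

    outb(PIT_CMD_PORT, 0x36);                   /* ch0, lo/hi byte, square wave */
    outb(PIT_CH0_PORT, divisor & 0xFF);         /* low byte of the count        */
    outb(PIT_CH0_PORT, (divisor >> 8) & 0xFF);  /* high byte of the count       */
}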

(Refer Slide Time: 12:42)

This figure (refer above image) depicts the ideal view of multitasking. In this
ideal view, a process continues to execute until its time slice completes, and when its
time slice completes, another process executes in the CPU. A more realistic view of
multitasking, as we can also infer from the previous slides, is depicted in the figure at
the bottom.

In this more realistic view, the process continues to execute as usual until its time
slice completes, but then, instead of the next process immediately being context switched
into the CPU, the kernel executes; the kernel handles the interrupt, performs various
jobs, and also executes the scheduler, where a new process is chosen. Only after all this
occurs will the next process (process 2 in above image) execute in the CPU. The time
difference between when the first process is preempted from the CPU and when the second
process starts to execute is the context switch overhead, and this overhead could be
significant.

(Refer Slide Time: 14:04)

The factors affecting the context switching overhead can be classified as either direct
factors or indirect factors. Three examples of direct factors are mentioned in the above
image, and they are quite straightforward and easy to understand. For instance, the timer
interrupt latency, or for that matter any interrupt latency, adds to the context switching
overhead. There is also the time taken for saving and restoring the context of the various
processes. In addition, overheads are caused by the scheduler, which needs to
choose the next process to execute in the CPU.

The indirect factors are more subtle and more difficult to understand. Three examples of
indirect factors are listed. The first is that the TLB needs to be reloaded. The TLB is
the translation lookaside buffer, a cache which stores recently used page mappings. When
we switch from one process to another, the page table changes from that of the earlier
process to that of the new process, that is, a new set of page tables becomes active.

As a result, the TLB needs to be flushed, and refilling the TLB takes time and results in
overheads. Another aspect is the loss of cache locality. As we know, cache memories work
on the principle of locality, and when there is a context switch this locality is lost, in
the sense that we are no longer executing the same instructions as we were before the
context switch. As a result, the cache memories need to be reloaded again; this can incur
several cache misses and adds to the overhead.

Another aspect is that every time an interrupt occurs, the processor pipeline needs to be
flushed; this also adds to the overhead. Thus, context switching can incur significant
overheads and degrade performance quite significantly. As a result, designers should
carefully decide how, and more importantly when, context switching is done in order to
achieve the best performance of their system.

With this we will end this particular video on context switching in operating systems. We
have seen various aspects of context switching; essentially, we have seen some details
about how context switching occurs in operating systems, and we have also seen the
overheads that are incurred due to context switching.

Thank you.

Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 05
Lecture – 18
CPU Scheduling

Hello. In this lecture, we will look at CPU Scheduling Algorithms.

(Refer Slide Time: 00:22)

We have seen that in an operating system the scheduler chooses a particular process from
the ready queue, and that process is assigned to run on the processor. The question that
we are going to analyze now is: how should this scheduler (mentioned in above image)
choose the next process? Essentially, how should the scheduler choose a process to run on
the CPU from the existing queue of ready processes?

(Refer Slide Time: 00:53)

To analyze this, we first look at the execution phases of a process. Any program has two
phases of execution: one is when it is actually executing instructions, which is known as
a CPU burst (shown in the above image as blue boxes), while the other is when it is
blocked on I/O or not doing any operation; in this case the CPU is idle (shown in the
above image as grey boxes).

Thus, over time, a particular process would have some amount of CPU burst in which it
executes instructions on the processor, then some idle time in which it is waiting for an
I/O operation, then another burst of CPU time, and so on. There is always this
interleaving between CPU bursts and idle time, that is, waiting for an operation.

(Refer Slide Time: 01:51)

Based on these phases of execution (the CPU working and CPU idle cases), one could
classify processes into two types: I/O bound processes and CPU bound processes. Why do we
make this distinction between I/O bound and CPU bound processes? Essentially, from a
scheduling perspective, we would like to give I/O bound processes a higher priority with
which they are allocated the CPU. In other words, we want I/O bound processes to wait
less time for the CPU compared to CPU bound processes. Why is this required? We can see
it with an example.

Suppose we are using a word processor such as Notepad or Vim, and we give this I/O bound
process a very low priority. Now, suppose a user presses a key; because the process has a
low priority, it does not get the CPU very often and therefore it would take some time
before the key pressed by the user appears on the screen. This may be quite uncomfortable
for the user. Therefore, we would like to give an I/O bound process such as the word
processor a higher priority with which it gets the CPU, so that the user's interaction
with the system becomes more comfortable.

On the other hand, if you look at CPU bound processes, we could give them a lower
priority. For instance, take a CPU bound application such as 'gcc' compiling a program,
and suppose you are compiling a large program which takes 5 minutes. It will not affect
the user much if the time taken to compile that program increases from 5 minutes to, say,
5.5 minutes. Thus, a CPU bound process could work with a lower priority. This
classification between I/O bound and CPU bound is not rigid; a process could be an I/O
bound process at one time, and after some time it could behave like a CPU bound process.

To take an example of a process which behaves both like an I/O bound and a CPU bound
process, consider Microsoft Excel. When we are entering data into the various cells in
Excel, it acts as an I/O bound process: it has small CPU bursts and long periods of I/O
wait. On the other hand, when we are computing some statistic on the data entered, Excel
behaves like a CPU bound process, where a large portion of the time is CPU activity spent
operating on that data.

(Refer Slide Time: 04:38)

Now, let us come back to the question of how the CPU scheduler should choose, from the
queue of ready processes, the next process to execute in the CPU. There could be several
ways in which the scheduler could make this choice; essentially, there could be several
CPU scheduling algorithms which would look into the queue and make a particular decision.

(Refer Slide Time: 05:05)

In order to compare these various scheduling algorithms, operating systems textbooks and
operating systems research define several scheduling criteria. These criteria can be used
to compare various scheduling algorithms and see the advantages and disadvantages of each
of them. Let us go through these scheduling criteria one by one. The first criterion is
CPU utilization: the scheduling algorithm should be designed so as to maximize CPU
utilization, or in other words, the CPU should be idle for as little time as possible.

The next criterion we will look at is throughput: scheduling algorithms should try to
complete as many processes as possible per unit time. A third criterion is turnaround
time, and this criterion is looked at from a single process's perspective: turnaround time
is defined as the time taken for a single process from start to completion.

The fourth criterion is response time. This is defined as the time taken from the point
when the process enters the ready queue to the point when the process goes into the
running state, that is, the time from the instant the process enters the ready queue to
the time the CPU begins to execute instructions corresponding to that process. Another
criterion is the waiting time. This criterion is based on the time spent by a process in
the ready queue. As we know, processes ready to run are present in the ready queue, and it
is desirable that they do not wait too long there. So, scheduling algorithms could be
designed in such a way that the waiting time, or the average waiting time, in the ready
queue is minimized.

The final criterion we will see is fairness: the scheduler should ensure that each process
is given a fair share of the CPU based on some particular policy. It should not be the
case that some process, for instance, takes say 90 percent of the CPU while all other
processes together get just around 10 percent. All these criteria need to be considered
while designing a scheduling algorithm for an operating system.

A single scheduling algorithm will not be able to cater to all these criteria efficiently
at the same time. Therefore, scheduling algorithms are designed to meet a subset of these
criteria. For instance, if you consider a real-time operating system, the scheduling
algorithm for that system would be designed to have minimum response time; other factors
such as CPU utilization and throughput may be of secondary importance.

On the other hand, a desktop operating system like Linux will be designed for fairness, so
that all applications running in the system are given a fair share of the CPU; criteria
such as response time may be less important from that perspective. We will now look at
several scheduling algorithms, starting from the simplest one, the first come first serve
scheduling algorithm, and moving to more complex scheduling algorithms as we proceed.

(Refer Slide Time: 08:42)

So let us look at the First Come First Serve or FCFS scheduling algorithm. The basic
scheme in this case is that the first process that requests the CPU is allocated the CPU;
in other words, the first process which enters the ready queue is allocated the CPU. This
is a non-preemptive scheduling algorithm, which means that a process, once allocated the
CPU, will continue to execute on the CPU until its burst completes.

(Refer Slide Time: 09:19)

Let us see this with an example. Say we have a system with 4 processes (1st column); the
processes are labeled P 1, P 2, P 3, P 4, and they have an arrival time, given in the 2nd
column (mentioned in above image). The arrival time is defined as the time when the
process enters the ready queue. For this very simple example, we will consider that all
processes enter the ready queue at the same time, that is, at the 0th time instant. The
3rd column is the CPU burst time, which gives the amount of CPU burst for each process.

For instance, P 1 has a CPU burst of 7 cycles and P 2 has a burst of 4 cycles. Thus, in
this particular table (mentioned in above image) we have 4 processes, which all enter the
ready queue simultaneously at time instant 0, and they have different CPU burst times:
P 1 has 7 cycles, P 2 4 cycles, P 3 2 cycles and P 4 5 cycles.

Now, we will see how these four processes get scheduled onto the CPU, or how these four
processes get allocated the CPU. Since all of these processes arrive at the same time, the
scheduler does not actually have a choice to make, so it would pick a particular ordering,
say at random. For instance, let us say the scheduler picks P 1 to run; P 1 runs for
7 cycles, and when it completes, the scheduler picks P 2 and P 2 runs for 4 cycles. After
P 2 completes its burst, P 3 executes in the CPU for 2 cycles; and then finally, P 4 is
scheduled to execute for 5 CPU cycles (mentioned in above image). This is represented by a
Gantt chart.

A Gantt chart is a horizontal bar chart developed as a production tool in 1917 by Henry L.
Gantt, an American engineer and social scientist. Essentially, in a Gantt chart we have
several blocks, and each block represents a cycle of execution (mentioned in above image).
For instance, P 1 executes for 7 cycles, so it has 7 blocks – 1, 2, 3, 4, 5, 6, 7 (yellow
blocks); P 2 then executes for 4 cycles, so it is given 4 blocks (blue blocks); then P 3
executes for 2 cycles, so it is given 2 blocks (green blocks); and finally, P 4 executes
for 5 cycles, so it is given 5 blocks (red blocks).

We can compute the average waiting time for this particular case (mentioned in above FCFS
example image). Process P 1 enters the ready queue at instant 0 and immediately gets to
execute in the CPU; thus, it does not have to wait at all (waiting time is 0). The second
process P 2 also arrives at instant 0 but gets to execute only after P 1 completes, that
is, only after 7 cycles; thus, the wait time for process P 2 is 7. Similarly, process P 3,
which also enters in the 0th cycle, gets to execute only after processes P 1 and P 2
complete; in other words, it has to wait 11 cycles, and the fourth process in a similar
way needs to wait for 13 cycles. Therefore, the average waiting time in this particular
case is (0+7+11+13)/4 = 7.75 cycles.

Now, we can look at another scheduling criterion, the average response time, which in this
case is the same as the average waiting time. The response time is the time at which a
particular process begins executing in the CPU minus the time it actually enters the ready
queue. For instance, P 2 enters the ready queue at instant 0 but begins to execute in the
CPU only after 7 cycles; therefore, the response time for process P 2 is 7. Similarly,
process P 3 has a response time of 11 because it has waited 11 cycles to actually begin
executing in the CPU (mentioned in above image). So the average response time is 7.75,
just like the average waiting time.
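
The computation above can be captured in a few lines of C. The following sketch simply replays the FCFS example (all arrivals at time 0, bursts 7, 4, 2 and 5) and prints the average waiting time, which here equals the average response time.

#include <stdio.h>

int main(void)
{
    int burst[] = {7, 4, 2, 5};          /* CPU bursts of P 1..P 4            */
    int n = 4, start = 0;
    double total_wait = 0;

    for (int i = 0; i < n; i++) {
        /* With FCFS and all arrivals at time 0, the waiting time of a process
         * is simply the time at which it first gets the CPU; so is its
         * response time.                                                     */
        total_wait += start;
        start += burst[i];
    }
    printf("average waiting/response time = %.2f\n", total_wait / n);  /* 7.75 */
    return 0;
}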

(Refer Slide Time: 14:07)

Now, one characteristic of the FCFS scheduling algorithm is that the order of scheduling
matters. In the previous slide (slide mentioned at 14:26) we had assumed that P 1
executes, then P 2, then P 3 and then P 4, and we got an average waiting time and average
response time of 7.75 cycles. Now, suppose we just change the ordering, and let us say the
ordering is now as follows: P 2 executes, then P 3, then P 4 and finally P 1 (mentioned in
above slide).

In such a case, if we compute the average waiting time, we see that it gets reduced from
7.75 cycles to 5.25 cycles. Similarly, if you compute the average response time, you will
see that it is also 5.25 cycles. In a similar way, you could compute the other criteria
mentioned here (in the slide at 15:05), and you would be able to see the difference
between the two orderings.

(Refer Slide Time: 15:13)

Another characteristic of the FCFS scheduling algorithm is the convoy effect: all
processes wait for one big process to get off the CPU. For instance, in this case, even
though all processes enter the ready queue at the same time, all processes have to wait
for P 1 to complete; only then can they be scheduled. So, if we have a large process, for
instance if process P 1, instead of a burst time of 7, takes a burst time of say 100, all
the other processes P 2, P 3 and P 4 would wait for 100 clock cycles before they get to
execute in the CPU. This is a major drawback of the FCFS scheduling scheme.

(Refer Slide Time: 16:10)

However, the FCFS scheduling algorithm has several advantages. First, it is extremely
simple: the scheduling decision can be made very quickly, so the time taken by the
scheduling algorithm is small and the delays incurred while switching contexts are small.
Another advantage of the FCFS scheduling algorithm is that it is fair. As long as no
process hogs the CPU, every process will eventually run; in other words, as long as every
process terminates at some point, every other process in the ready queue will eventually
get to execute in the CPU.

The disadvantage of the FCFS scheduler, as we have seen, is that the waiting time depends
on the arrival order; we saw this in the example in the previous slides. Another
disadvantage is that short processes are stuck in the ready queue waiting for long
processes to complete; this is the convoy effect that we have just looked at (in the
previous slide).

(Refer Slide Time: 17:19)

Now, let us look at another scheduling algorithm known as the Shortest Job First
scheduling algorithm. In this scheduling algorithm, the job or process with the shortest
CPU burst time is scheduled before the others. If there is more than one process with the
same CPU burst time, then standard FCFS scheduling is used among them. There are two
variants of the shortest job first scheduling algorithm.

The first is the no-preemption variant, while the second is shortest job first with
preemption. In SJF with no preemption, the process continues to execute in the CPU until
its CPU burst completes. In the second variant, with preemption, the running process may
get preempted when a new process arrives in the ready queue. We will see more on this
later, but first we will start with the shortest job first variant with no preemption.

(Refer Slide Time: 18:21)

Let us take the same example as in the previous case: we have four processes P 1 to P 4,
and all of them arrive at instant 0, that is, at instant 0 these four processes P 1 to P 4
enter the ready queue. Each of these processes has a different CPU burst time, namely 7,
4, 2 and 1 respectively (mentioned in above image). At the first instant, the CPU
scheduler will look at the various burst times and find the one which is minimum.

In this case (mentioned in above image) we see that P 4 has the minimum CPU burst time
(burst time 1), so it is scheduled first. Process P 4 runs until it completes, which in
this case takes 1 cycle. Then, among the remaining three, we see that P 3 has the lowest
CPU burst time, so process P 3 gets scheduled and executes till it completes its burst
(burst time 2). Then P 2 gets scheduled because it has a burst time of 4, while P 1 has a
higher burst time of 7. Finally, P 1 gets to execute till completion. The average wait
time, if we compute it, is 2.75, and the average response time is also 2.75, as in the
wait time case.

(Refer Slide Time: 19:43)

Now let us look at another example of shortest job first without preemption (mentioned in
above image). We take the same four processes P 1 to P 4, each with the same burst time as
before, that is 7, 4, 2 and 1 respectively. However, they arrive at different instants,
that is, the instant at which they enter the ready queue is different: P 1 enters at the
0th instant, P 2 at the 2nd instant, P 3 at the 4th instant and P 4 at the 7th instant.
There is a slight modification in the Gantt chart: in addition to showing which process is
executing in the CPU, it also shows the order in which processes arrive. It shows that P 1
arrives first, then P 2 at the 2nd instant, then P 3 and finally P 4 at the 7th instant
(refer above slide image).

Now, when the scheduler begins to execute at the starting point of the Gantt chart, the
only process that has arrived at that point is P 1; therefore, it schedules P 1 to
execute. P 1 executes for its entire burst of 7 cycles, and then at that cycle (after 7
blocks) the scheduler executes again, and this time it has three processes to choose from:
P 2, P 3 and P 4 have all arrived in the ready queue.

Out of them, P 4 has the shortest burst time; therefore, it is chosen for execution. P 4
executes in the CPU, then P 3, because P 3 has a shorter burst time than P 2, and finally
P 2 gets executed. If we compute the average wait time, we see that it is 3 cycles
(mentioned in above image).

(Refer Slide Time: 21:37)

The advantage of the shortest job first scheduling algorithm is that it is optimal: it
always gives the minimum average waiting time, and as a result the average response time
also decreases. The main disadvantage of the SJF scheduling algorithm is that it is not
practical; essentially, it is very difficult to predict what the burst time will be.
Another drawback of the SJF scheduling algorithm is that some jobs may get starved; if a
process has an extremely long CPU burst time, it may never get a chance to execute in the
CPU.

(Refer Slide Time: 22:19)

Now, we will look at the shortest job first (SJF) scheduling algorithm with preemption,
also called shortest remaining time first. The basic idea in this algorithm is that if a
new process arrives in the ready queue and has a shorter burst time than the remaining
burst of the current process, then there is a context switch and the new process gets
scheduled onto the CPU. This further reduces the average waiting time as well as the
average response time. However, as with shortest job first without preemption, it is not
practical. Let us understand this better with an example.

(Refer Slide Time: 23:02)

Let us take the same example of 4 processes with burst times 7, 4, 2 and 1, and arrival
times 0, 2, 4 and 7, and develop the Gantt chart as time progresses. At instant 0, the
only process present is P 1, and therefore the scheduler has no choice but to schedule P 1
onto the CPU. Thus P 1 executes for 2 clock cycles.

After the 2nd clock cycle, process P 2 has entered the ready queue. P 2 has a burst of 4
cycles, whereas P 1 has a remaining burst of 5 cycles; what we mean by this is that out of
its CPU burst of 7 cycles, P 1 has completed 2, so what remains is 5. The scheduler sees
that P 1 has 5 remaining cycles, which is greater than P 2's burst of 4 cycles. Therefore,
it does a context switch and schedules P 2 to run. P 2 runs for 2 clock cycles, and then
P 3 arrives at the 4th clock instant (mentioned in above image).

At this instant, the scheduler finds that P 3 has a burst time of 2 cycles, while P 2 has
a remaining burst time of 2 cycles; we get 2 because out of the 4-cycle burst time of P 2,
it has completed 2, so 2 more cycles remain. Since P 2, the process currently running on
the CPU, has a 2-cycle remaining burst, and P 3, the new process, also has a 2-cycle
burst, there is no preemption and P 2 continues to execute (refer above image).

After P 2 completes, P 3 executes for 2 cycles, and then P 4 enters, which, as you can
verify, does not cause any preemption. After P 3 completes, the scheduler decides whether
to run P 4 or P 1. We see that P 4 has a burst of 1 cycle while P 1 has a remaining burst
of 5 cycles, so the scheduler chooses P 4 over P 1. P 4 runs on the CPU, and after it
completes, P 1 executes for its remaining burst time (refer above image Gantt chart).

If you compute the average wait time, you see that it reduces to 2.5, while the average
response time reduces considerably to 0.75 (mentioned in above image). However, as
mentioned before, just like shortest job first, this scheduling algorithm is not feasible
to implement in practice, because it is very difficult to identify the burst time of a
process, and even more difficult to identify its remaining burst time.
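
The preemption rule used in this example can be written down explicitly. The helper below is a hedged sketch: a newly arrived process preempts the running one only when its burst is strictly smaller than the running process's remaining burst, which is why the tie between P 2 and P 3 above caused no preemption.

/* Hedged sketch of the shortest-remaining-time-first preemption test.
 * Returns non-zero if the newly arrived process should preempt the one
 * currently running; a tie causes no preemption.                        */
int should_preempt(int new_burst, int running_remaining_burst)
{
    return new_burst < running_remaining_burst;
}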

(Refer Slide Time: 26:08)

Now we will look at another scheduling algorithm known as the Round Robin scheduling
algorithm. With round robin scheduling, a process runs for a time slice, and when the time
slice completes, it is moved back to the ready queue. Round robin is a preemptive
scheduling algorithm, and in order to implement it we need to configure the timer in the
system to interrupt the CPU periodically. At every timer interrupt, the kernel preempts
the current process and chooses another process to execute in the CPU.
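
The bookkeeping behind round robin can be sketched as a FIFO of process ids plus a routine called when the time slice expires. Everything below is illustrative (the queue layout and the function names are assumptions), not the API of any real kernel.

/* Hedged sketch of round robin bookkeeping: a circular buffer acts as the
 * FIFO ready queue, and round_robin_tick() is called from the timer
 * interrupt handler once the running process's time slice has expired.   */
#define QMAX 64

static int queue[QMAX], head = 0, tail = 0;

static void enqueue(int pid) { queue[tail] = pid; tail = (tail + 1) % QMAX; }
static int  dequeue(void)    { int p = queue[head]; head = (head + 1) % QMAX; return p; }
static int  empty(void)      { return head == tail; }

int round_robin_tick(int current_pid)
{
    if (empty())
        return current_pid;   /* nothing else is ready: keep running the same process */
    enqueue(current_pid);     /* preempted process goes to the tail of the FIFO        */
    return dequeue();         /* next process to run comes from the head of the FIFO   */
}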

(Refer Slide Time: 26:50)

Let us discuss the round robin scheduling algorithm with an example. One difference with
respect to the other scheduling algorithms we have seen so far is the notion of a time
slice. This is the Gantt chart (refer above image). Notice that periodically, in this case
with a period equal to 2, that is with a time slice of 2, a timer interrupt occurs; the
timer interrupt results in the scheduler being run and potentially another process being
scheduled into the CPU. A data structure which is very useful in implementing the round
robin scheduling algorithm is a FIFO queue.

This FIFO stores the processes that need to be executed next on the CPU. For example, in
this particular case (mentioned in above slide image), P 2 is at the head of the FIFO, so
it is the next process which gets executed on the CPU; P 2 gets executed over here (next
to P 1 in the Gantt chart). We will still consider the 4 processes as before, P 1 to P 4,
and we will assume that all of them arrive at instant 0 and go into the FIFO of the ready
queue, with burst times of 5, 4, 2 and 3 respectively.

Let us say, for discussion, that the scheduler starts off with the order P 1, P 2, P 3,
P 4 and first chooses to execute P 1. P 1 executes for 2 cycles, then an interrupt occurs
leading to a context switch, and the process at the head of the FIFO, in this case P 2
(mentioned in above image), gets scheduled into the CPU, while P 1 gets pushed into the
FIFO. P 2 then executes for 2 cycles until the next timer interrupt, when its time slice
of 2 cycles completes, and it gets pushed into the FIFO. P 2 is now at the tail of the
FIFO, while P 3, which is at the head of the FIFO, gets scheduled to run. In this way,
every two cycles a new process may get scheduled onto the CPU and execute.

Let us now compute the average waiting time. In this case, we will see that the average
waiting time and the average response time are different. What is the average waiting time
for P 1? P 1 executes 2 cycles here (the first P 1 block in the Gantt chart), 2 cycles
here (the second P 1 block), and completes execution here (the last P 1 block); it waits
in the ready queue for the remaining cycles. The number of cycles it waits is 1, 2, 3, 4,
5, 6, 7, 8, 9 (the P 2, P 3, P 4 blocks, then the P 2, P 4 blocks), so P 1 waits for 9
cycles. P 2 waits for 8 cycles – 1, 2, 3, 4, 5, 6, 7, 8; P 3 waits for 4 cycles and P 4
for 10 cycles. So the average waiting time is 7.75 cycles.

Now let us compute the average response time. As we defined before, the response time is
the latency from the time the process enters the ready queue to the time it begins to
execute in the CPU. P 1, for instance, has a response time of 0 because it enters the
ready queue, or the FIFO, and gets executed immediately. P 2, on the other hand, enters in
the 0th cycle but gets to execute only after 2 cycles, so it has a response time of 2
(refer round robin scheduling slide).

Similarly, P 3 enters at 0 but executes only at this instant (after P 1 and P 2 in the
Gantt chart), therefore it has a response time of 4, and P 4 has a response time of 6.
Therefore, the average response time is 3. The number of context switches that occur is 7:
1, 2, 3, 4, 5, 6, and the 7th context switch occurs over here (last block, P 4 and P 1),
because process P 4 exits and P 1 gets switched into the CPU to continue executing.

(Refer Slide Time: 31:42)

Now let us take another, more complex example of round robin scheduling, where the arrival
times are not the same. P 1 arrives at the 0th instant, P 2 at the 2nd, and P 3 and P 4 at
the 3rd and 9th respectively. We can similarly draw the Gantt chart and the states of the
FIFO for this case. To start with, at instant 0 the only process which has arrived is P 1,
and therefore P 1 executes for 2 cycles. At this particular point (first pink line in
above slide Gantt chart), when the timer interrupt occurs, no other process is present
yet, therefore P 1 continues to execute for another time slice of 2 cycles.

However, during this time slice, two processes have entered the ready queue: processes P 2
and P 3. P 2 arrives at this interval (3rd cycle) while P 3 arrives at this interval (4th
cycle), and they get added to the FIFO. At the completion of the second time slice, there
is a context switch and P 2 gets scheduled onto the CPU to execute, while P 1, which was
executing, goes into the FIFO. P 2 executes for 2 cycles, then P 3 executes for 2 cycles,
then P 1 executes for 2 cycles, and by that time P 4 has arrived and been added to the
FIFO.

Now we have three processes P 1, P 2 and P 4, and these get scheduled to run for the
remaining time. The average waiting time in this case is 4.75, while the average response
time is 2. How is the average waiting time 4.75? Process P 1 has waited for 7 cycles
before it completes, that is 1, 2, 3, 4, 5, 6 and 7; process P 2 has waited for 6 cycles,
that is 1, 2, 3, 4, 5, 6; process P 3 has waited for 3 cycles, and process P 4 has waited
for 3 cycles (refer above slide). The average response time can be verified to be equal to
2, and the number of context switches is, as before, 7.

(Refer Slide Time: 34:15)

Now let us take the same example as before, but with a time slice of one, that is, we
reduce the time slice from 2 to 1. We see that at every instant there is a timer
interrupt, and potentially a context switch can occur. If we compute the average waiting
time and average response time for this case, we see that the average response time in
particular has reduced. This is because once a process has entered the queue, it has to
wait fewer cycles before it gets scheduled onto the CPU.

On the other hand, the number of context switches has increased from 7 to 11. Since the
timer interrupts are more frequent, it is more likely that a context switch will occur.

(Refer Slide Time: 35:10)

Now, if we take the same example but with a time slice of 5 instead of 1, and compute the
average waiting time and average response time (refer above image), we see that the
response time increases considerably to 2.75, while the number of context switches reduces
quite a bit, to 4. We also see that the scheduling begins to behave more and more like
first come first serve; the response time is bad because, due to the large time slice, the
scheduling behaves like first come first serve, which we know has a bad response time.
From all the examples that we have seen so far, we can conclude that the duration of the
time slice is very critical: it affects both the response time and the number of context
switches.

(Refer Slide Time: 36:07)

Essentially, if we have a very short time slice or quantum, the advantage is that
processes need not wait too long in the ready queue before they get scheduled onto the
CPU; this means that the response time of a process will be very good, that is, it will
have a low response time.

On the other hand, a short time slice is bad because there would be very frequent context
switches, and as we have seen before, context switches can have considerable overheads;
therefore, it degrades the performance of the system. A long time slice or quantum has the
drawback that processes no longer appear to execute concurrently; it appears more like a
first come first serve type of scheduling, and this in turn may degrade system
performance. Typically, in modern day operating systems the time slice duration is kept at
anything from 10 milliseconds to 100 milliseconds. xv6, for instance, programs its timer
to interrupt every 10 milliseconds.

(Refer Slide Time: 37:17)

The advantages of the round robin scheduling algorithm are as follows. The algorithm is
fair, because each process gets a fair chance to run on the CPU. The average wait time is
low, especially when the burst times vary, and the response time is very good. On the
other hand, the drawbacks of the round robin scheduling algorithm are as follows: there is
an increased number of context switches, and as we have seen before, context switching has
considerable overheads. The second drawback is that the average wait time is high,
especially when the burst times have equal lengths.

(Refer Slide Time: 37:56)

The xv6 scheduling policy is a variant of the round robin scheduling policy; the source
code is shown over here (refer above slide). Essentially, the xv6 scheduler walks through
the ptable array; we have seen ptable before, which is an array of procs. The scheduler
walks through this array, finds the next process that is RUNNABLE, and invokes the switch;
this is where the context switch is invoked (near the switchkvm(); call in the above slide
code). Every time the scheduler executes, the next process in this array (mentioned in
circle in above slide) that is runnable gets scheduled to execute.
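
For reference, a simplified sketch of the loop being described is given below. It is paraphrased from memory of the x86 xv6 scheduler in proc.c, so variable names and details may differ slightly between xv6 revisions, and it relies on xv6's own declarations; consult the actual source for the exact code.

/* Hedged sketch of the xv6 scheduler loop (not an exact copy of proc.c). */
void scheduler(void)
{
    struct proc *p;

    for (;;) {
        sti();                                   /* enable interrupts on this CPU   */
        acquire(&ptable.lock);
        for (p = ptable.proc; p < &ptable.proc[NPROC]; p++) {
            if (p->state != RUNNABLE)
                continue;                        /* skip processes that cannot run  */

            proc = p;                            /* this CPU now runs process p     */
            switchuvm(p);                        /* activate p's page table + stack */
            p->state = RUNNING;
            swtch(&cpu->scheduler, p->context);  /* context switch into process p   */
            switchkvm();                         /* back in the scheduler: kernel
                                                    page table is active again      */
            proc = 0;
        }
        release(&ptable.lock);
    }
}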

So next, we will see scheduling algorithms which are based on priority. So, these are the
priority based scheduling algorithms.

Thank you.

Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 05
Lecture – 19
Priority Based Scheduling Algorithms

So far we have seen some scheduling algorithms like First Come First Serve, and preemptive
scheduling algorithms like Round Robin.

So in this particular video, we will look at a class of algorithms for scheduling which are
known as the Priority Based Scheduling Algorithms. So we will start this lecture with a
motivating example.

(Refer Slide Time: 00:40)

Let us look again at this particular example (mentioned in above slide), which we took in
the last video. We had seen the round robin scheduling algorithm, where we took four
processes P1 to P4; these processes arrived at different times 0, 2, 3 and 9, and had
different burst times of 7, 4, 2 and 1 cycles respectively.

Now we will add a bit more about the processes involved. Let us assume that P2 is a
critical process, while P1, P3 and P4 are less critical. What is a critical process? Let
us take the example of an operating system that runs on a microprocessor present in a car:
whenever the brakes are pressed, process P2 executes and performs several tasks related to
braking of the car, while processes P1, P3 and P4 may be less critical tasks. With respect
to the car again, for instance, one of the processes P1, P3 and P4 may be the music
player, while another could be controlling the AC or the heater, and so on.

Let us look at this Gantt chart again. We see that P2 arrives in the second cycle and
takes a considerable number of cycles to complete executing. The main reason for this is
that this is a round robin scheduling scheme and the other processes get a fair share of
the CPU. P2 first executes here (1st blue block in the Gantt chart), then P1 and P3
execute, then P2 executes again and again (2nd and 3rd blue blocks), and finally completes
its burst at this particular time (4th blue block).

Now, you will see that there is an obvious problem with this scheme. Since P2 is a
critical process, such as one controlling the braking of the car, it has taken a really
long time to complete its execution, and this could lead to catastrophic accidents.

(Refer Slide Time: 02:49)

What one would have expected is something like this: whenever a critical process such as
P2 arrives in the ready queue, it should get priority and should be able to execute
continuously, irrespective of the fact that there are other processes with lower priority
present in the queue.

So this is what we would like to have in a more realistic situation. So scheduling
algorithms which take care of such priorities among processes are known as Priority
Based Scheduling Algorithms, and this is what we are going to study in this particular
video lecture.

(Refer Slide Time: 03:28)

In a priority based scheduling algorithm, each process is assigned a priority number. A
priority number is nothing but a number within a particular range, and this range is
predefined by the operating system. For instance, if the priority range is between 0 and
255, every process that executes would be assigned a priority number between 0 and 255. A
small priority number, that is one close to 0, would mean that the process is a high
priority process or a high priority task. On the other hand, a large priority number would
mean that the process has a low priority.

This is just the nomenclature which Linux follows; other operating systems may take a
different approach, for instance using a higher number to indicate a high priority process
and a lower number to indicate a low priority process. For this lecture we will go with
the Linux nomenclature, where a small priority number means a high priority process and
vice versa.

In a priority based scheduling algorithm, even though there may be several tasks present
in the ready queue, the scheduling algorithm will pick the task with the highest priority.
The advantage of priority based scheduling algorithms is quite easy to see: in an
operating system which supports priority based scheduling, tasks or processes can be given
relative importance. High priority processes get preference to execute in the CPU compared
to low priority processes.

On the other hand, the main drawback of a priority based scheduling algorithm is that it
could lead to what is known as starvation. A starved process is one which is not able to
execute at all in the CPU, although it has been in the ready queue for a long time. It is
quite easy to see that starvation mainly occurs for low priority processes.

(Refer Slide Time: 05:39)

So let us see how starvation occurs with an example. We take the same four processes P1 to
P4, arriving at different times 0, 2, 3 and 9, and having different burst times 8, 4, 2
and 1. Also, let us assume that the priorities of P1, P2 and P3 are higher than that of
P4, in other words P4 is the lowest priority process. We will also assume that every 15
cycles this particular sequence repeats; that is, after the 15th cycle of the processor,
P1 arrives again, then P2, P3 and similarly P4, and this sequence of processes arriving
into the ready state continues infinitely.

What we see, if we look at the Gantt chart, is that P1, P2 and P3 are scheduled depending
on their priority; however, the low priority task never gets scheduled within the 15
cycles. Additionally, since P1, P2, P3 and P4 arrive again in the ready queue after 15
cycles, once again P1, P2 and P3 get scheduled while P4 never does. As a result, the
lowest priority process gets starved.

Essentially, even though it is in the ready queue and just waiting for the CPU to be
allocated to it, it never gets a chance to execute in the CPU because there are always
higher priority processes which are arriving in the ready queue.

(Refer Slide Time: 07:28)

Let us see how priority based scheduling algorithms deal with starvation. Essentially, in
operating systems which have a priority based scheduling algorithm, the scheduler
dynamically adjusts the priority of a process to ensure that all processes eventually
execute in the CPU. What this means is that, in our example, the priority of process P4
would gradually increase over a period of time until it has a priority which is greater
than or equal to the priorities of processes P1, P2 and P3; at that time process P4 will
get a chance to execute in the CPU.

Thus we see that by elevating process P4 over time, the starvation that would otherwise
have occurred, since P4 is a low priority process, is eliminated. There are several
techniques by which a priority based scheduling algorithm could deal with starvation. We
give one particular example here, in which an operating system with a priority based
scheduling algorithm ensures that every process, even a low priority process, will
eventually execute in the CPU. Let us say that in this operating system, when a process
starts, it is given a base priority. This base priority will differ from process to
process: a high priority process will be given a priority number indicating high priority,
while a low priority process will be given a priority number indicating low priority.

Now, after every time slot or time slice, the priority of a process is increased by a
fixed value: all processes in the ready queue have their priority increased by a fixed
value, except for the process that is currently executing in the CPU. What we would see is
that, over a period of time, all processes in the ready queue gradually have their
priority increased. Even a low priority process would have its priority increased up to
the point that it is high enough for the process to be scheduled onto the CPU.

After it is scheduled and executes in the CPU, at the end of its time slice its priority
is reset back to its base priority. Thus a process which starts off with a base priority
of 250 will gradually have its priority increased to the point that it begins to execute,
and after it executes, its priority is reset to 250. Thus, starvation is avoided.
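
The aging scheme just described can be sketched as follows. The structure and the AGING_STEP value are illustrative assumptions; the only point is that every waiting task's priority number moves toward 0 (higher priority, in this lecture's Linux-style convention) each time slice, while the task that just ran is reset to its base priority.

/* Hedged sketch of priority aging.  Remember: a smaller number means a
 * higher priority, so "increasing the priority" means decreasing the
 * priority number toward 0.                                             */
#define AGING_STEP 5

struct task {
    int priority;       /* current (dynamic) priority, 0 = highest        */
    int base_priority;  /* priority assigned when the task started        */
};

void age_ready_tasks(struct task tasks[], int n, int running_index)
{
    for (int i = 0; i < n; i++) {
        if (i == running_index) {
            tasks[i].priority = tasks[i].base_priority;  /* reset after running */
        } else if (tasks[i].priority > 0) {
            tasks[i].priority -= AGING_STEP;             /* raise priority       */
            if (tasks[i].priority < 0)
                tasks[i].priority = 0;
        }
    }
}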

(Refer Slide Time: 10:20)

Based on this, there are two types of priorities for a process. There is the static
priority, which is typically set at the start of execution; it can be set by the user who
starts the application, and if the user does not set a priority then a default value is
taken.

The second type of priority is known as the dynamic priority, where the scheduler can
change the process priority during execution in order to achieve some scheduling goal. For
example, the scheduler could decide to decrease the priority of a process in order to give
another process a chance to execute. Another example, as we have seen before, is to
increase the priority of I/O bound processes, since I/O bound processes typically require
a faster response time.

(Refer Slide Time: 11:19)

One disadvantage of having a fixed range of numbers from which a process obtains its
priority arises when there are a large number of processes present in the system. In such
a case, there may be several processes executing with exactly the same priority. For
instance, let us say that the priority range is from 0 to 255 and there are maybe over ten
thousand processes running in the system. Then, for each priority number, there may be
several processes executing with that priority number.

This is depicted in this particular figure (mentioned in above slide image), where we have
four processes arriving at different times and having different burst times, but all of
them have the same priority. In such a case, the scheduling would begin to behave like a
non-priority based scheduling algorithm; for instance, it would begin to behave like a
round robin scheduling algorithm.

(Refer Slide Time: 12:22)

In order to handle this, modern operating systems have multilevel queues (mentioned in
above slide): instead of assigning only a single priority number to each process,
processes are assigned a priority class. There could be several priority classes within
the operating system, and these priority classes range from high priority to low priority.
For instance, there could be a real-time priority class, a system priority class, an
interactive priority class, a batch priority class and so on.

All real-time processes would be present in the real-time priority class. Each of these
priority classes has a ready queue, and when a real-time process is ready to execute, it
gets added to this particular ready queue, that is, the ready queue present in the
real-time priority class.

Similarly, if an interactive process is ready to execute, it gets added to the ready queue
present in the interactive priority class. Now, at the end of a time slice, when the
scheduler executes, it picks the highest priority class which has at least one ready
process. For instance, the scheduler will scan through the priority classes starting from
the highest priority to the lowest priority, and it will select the priority class which
has at least one process in its ready queue. If there is exactly one process in that ready
queue, then we do not have a problem; that process is going to execute in the CPU.

However, if there are multiple processes in the ready queue chosen by the scheduler, then
a second scheduling algorithm is used to choose from this set of processes. For instance,
let us say the scheduler has decided to choose the interactive priority class, because
there were no processes ready in the real-time and system priority classes.

Further, let us assume that there are 4 processes present in the interactive class. A
second scheduling algorithm, such as round robin or FIFO, will then execute to determine
which of these four processes is assigned the CPU. Thus, in multilevel queues there are
two scheduling algorithms involved: the first is a priority based scheduling algorithm,
which chooses one of the priority classes, while the second is a non-priority based
algorithm, which chooses among the processes in that priority class. The second scheduling
algorithm would typically be round robin, while for high priority processes it is
typically something like first come first serve.

(Refer Slide Time: 15:36)

Further, in schedulers that support multilevel queues, the scheduler could adjust the time
slice based on the priority class that is picked. I/O bound processes are typically
assigned a higher priority class and a longer time slice. The higher priority ensures that
I/O bound processes are serviced by the CPU as soon as possible, and the longer time slice
ensures that the burst of the I/O bound process completes before the time slice does.

CPU bound processes, on the other hand, are assigned a lower priority class and given
shorter time slices. The main drawback of a multilevel queue based scheduling algorithm is
that the class of a process must be assigned a priori; that is, we would need to know
beforehand whether a process is CPU bound or I/O bound. This is not easy to do: first, it
is very difficult to identify whether a process is indeed an I/O bound process or a CPU
bound process; second, a single process may act like an I/O bound process at one time and
like a CPU bound process at another.

That is, at one time it may have short CPU bursts and long periods of idle time, while at
other times it may have long CPU bursts and short I/O waits, like a CPU bound process.
Therefore, deciding a priori whether a particular process is I/O bound and should be given
a higher priority is difficult.

(Refer Slide Time: 17:24)

One way to mitigate this limitation is to use schedulers with multilevel feedback queues.
In such schedulers, processes are dynamically moved between priority classes depending on
their CPU and I/O activity. These schedulers work on the basic observation that a CPU
bound process is likely to consume its entire time slice, because a CPU bound process has
long CPU bursts and is therefore quite likely to use up the whole time slice.

On the other hand, an I/O bound process has very short CPU bursts and is therefore not
likely to complete its entire time slice. If you take this particular example (mentioned
in above slide), we can say that processes 1 and 4 have completed their time slice;
therefore these processes are likely to be CPU bound. On the other hand, process 2 has a
very short CPU burst; therefore process 2 is likely to be an I/O bound process rather than
a CPU bound process.

(Refer Slide Time: 18:41)

The basic idea of a multilevel feedback queue based scheduling algorithm is the following:
all processes start with the highest priority, or a base priority. If a process finishes
its time slice, that is, if it executes until its time slice completes, then the
assumption is made that it is a CPU bound process, and it is moved to the next lower
priority class.

On the other hand, if the process does not complete its time slice, it is assumed to be an
I/O bound process, and the scheduler would either move it to a higher priority class or
keep it in the same priority class. This, as you can see, is a dynamic way of changing the
priority class of a process. However, the drawback is that we could still have starvation,
and techniques then need to be implemented to deal with it.
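
The feedback rule can be captured in a small helper like the one below. The number of classes and the promote-by-one choice are assumptions for illustration; class 0 is the highest priority, matching the convention used in this lecture.

/* Hedged sketch of the feedback rule: demote a process that used its whole
 * slice (treated as CPU bound), promote or keep one that blocked early
 * (treated as I/O bound).  Class 0 is the highest priority class.         */
#define LOWEST_CLASS 3

int adjust_class(int current_class, int used_full_slice)
{
    if (used_full_slice)
        return current_class < LOWEST_CLASS ? current_class + 1 : current_class;
    else
        return current_class > 0 ? current_class - 1 : 0;   /* promote or keep */
}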

(Refer Slide Time: 19:46)

Another drawback of this scheduling algorithm is that a malicious process could game the
system and always remain in the high priority class. Let us see how this works with an
example (mentioned in above slide). Assume the malicious process knows the time slice of
the system, that is, it knows when the time slice completes and when a context switch
occurs. Let us say it has a loop such as the one shown in the slide (and sketched below):
essentially, the malicious process does some work for most of the time slice and then
sleeps till the end of the time slice.
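
A sketch of such a malicious loop is shown below. The time slice length, the margin, and the busy_work_for() helper are all hypothetical assumptions; the point is only that the process burns CPU for most of the slice and then voluntarily blocks just before the slice would complete.

#include <unistd.h>

#define TIME_SLICE_US 10000   /* assumed: a 10 ms time slice                  */
#define MARGIN_US       500   /* stop working just before the slice ends      */

extern void busy_work_for(long microseconds);   /* hypothetical CPU-burning helper */

void malicious_loop(void)
{
    for (;;) {
        busy_work_for(TIME_SLICE_US - MARGIN_US);  /* use up most of the slice    */
        usleep(MARGIN_US);                         /* block before the slice ends,
                                                      so the scheduler never sees
                                                      a completed time slice      */
    }
}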

As we know, when a process goes to sleep, it goes from the running state to the blocked
state; therefore, the sleep forces a context switch to occur, and a new process executes
for the remainder of the time slice. Thus, for instance over here (refer above slide),
process 4, which is the malicious process, runs for most of the time slice and then, just
before the time slice completes, goes to sleep, thus forcing another process to execute.

The other process, process 1, will now execute, but only for a small duration. At the end
of the time slice, the scheduler sees that process 1 has completed the time slice, so it
assumes that process 1 is a CPU bound process and moves it to a lower priority class.

On the other hand, process 4, which blocked on a sleep, remains in the high priority
class. Thus, process 4 is able to game the system: it continues to execute from the high
priority class, while another process such as process 1 is moved to a lower priority
class.

Thank you.

Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 05
Lecture – 20
Multiprocessor Scheduling

Hello. So far we have seen how scheduling algorithms are designed for a single CPU: the
scheduling algorithm chooses a process to execute on the CPU. In this particular video, we
will look at multiprocessor scheduling algorithms.

(Refer Slide Time: 00:35)

Essentially, we will see how, if we have multiple processors in the system, a scheduling
algorithm could schedule processes onto these various CPUs.

(Refer Slide Time: 00:45)

One very simple scheme for a multiprocessor scheduler is to have a dedicated CPU that runs
the scheduler. This scheduler decides for everyone (CPU 3 is the scheduler mentioned in
the above slide); essentially, it decides which process should run on which CPU.
Implementing this scheme is very simple: the scheduler has a local queue of processes
which are ready to run, and it uses some mechanism to decide which of these processes
should be scheduled onto which CPU. The limitation, as one would expect, is performance
degradation.

Since all CPUs wait for this scheduling CPU to tell them which process to execute next,
and this could happen, for instance, every 10 milliseconds, there could be significant
overheads due to scheduling.

291
(Refer Slide Time: 01:37)

Another multiprocessor scheduling scheme is the symmetrical scheduling scheme. Here,
instead of a single CPU which decides for all processors in the system, each CPU runs its
own scheduler, independent of the others (mentioned in above slide). Therefore, each
CPU, at the end of a time slice, decides which process it is going to execute next. There
are two variants of the symmetrical scheduling scheme - one with a Global queue and
another with Per CPU queues. We will look at each of these variants next.

292
(Refer Slide Time: 02:16)

In the symmetrical scheduling scheme with global queues, there is exactly one queue of
ready processes, which is shared among all processors (refer above slide). The scheduler
on each CPU needs to look up this global queue to decide which process to run next.

For instance, CPU 0 will need to look up this global queue and decide on a particular
process to run next. That process will then change from the ready state to the running
state. The advantage of the scheme is that there is good CPU utilization and fairness to
all processes. The drawback of this particular scheme comes from the fact that a single
queue is shared among the various processors in the system. Thus, we could reach a state
where two processors query the queue at exactly the same time and pick exactly the same
process to execute. Thus, a single process may execute at the same instant of time on two
different CPUs.

In order to prevent such issues, it is required that access to this particular queue is
serialized. That is, if CPU 0 wants to access the queue, no other CPU should be able to
access that queue during that particular time. This ensures that only one CPU can choose
a process at a particular instant of time.

293
Now, this mechanism of serializing access to the global queue is achieved by a technique
known as Locking. While this locking mechanism would work, it is not a good idea
because it will not scale. Consider the fact that instead of 4 processors in the system there
are now 64 processors, and all these processors have serialized access to this global
queue. Thus, when CPU 0 is accessing this global queue, all other processors have to
wait, and this could lead to a considerable amount of time being spent on scheduling.
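A minimal sketch of such a lock-protected global queue, written in the style of xv6's spinlock primitives (acquire/release); dequeue_runnable is an assumed helper, and none of this is actual kernel code:

    struct spinlock { unsigned int locked; };   /* simplified; xv6's has more fields */
    struct proc;                                /* process control block, omitted    */

    void acquire(struct spinlock *lk);          /* xv6-style spin-lock primitives    */
    void release(struct spinlock *lk);
    struct proc *dequeue_runnable(void);        /* assumed helper                    */

    struct {
        struct spinlock lock;                   /* serializes access from all CPUs   */
        /* ... queue of RUNNABLE processes ... */
    } global_rq;

    struct proc *pick_next(void)
    {
        struct proc *p;
        acquire(&global_rq.lock);               /* every other CPU now spins here    */
        p = dequeue_runnable();                 /* pick one ready process            */
        release(&global_rq.lock);
        return p;                               /* NULL if nothing is runnable       */
    }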

Another disadvantage comes from the fact that processor affinity is not easily achieved.
So, what is Processor Affinity? Processor Affinity is an option given to users of the
system to choose which processor they want their process to execute on. For instance, a
user may decide that he wants his process to execute only on CPU 0 and on no other
CPU. In this particular scheme, it is more difficult to implement processor affinity. This
scheme is used in Linux 2.4 kernels as well as in the xv6 operating system.
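As an aside, on Linux a user program can express processor affinity through the sched_setaffinity system call; a small example, assuming we simply want to pin the calling process to CPU 0:

    #define _GNU_SOURCE
    #include <sched.h>

    int pin_to_cpu0(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);                                 /* allow only CPU 0     */
        return sched_setaffinity(0, sizeof(set), &set);   /* pid 0 = this process */
    }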

(Refer Slide Time: 05:12)

The second variant of symmetrical scheduling is where there is a separate queue for
every CPU, that is, each CPU has its own queue of ready processes. Thus, each CPU
needs to look at only its own queue to decide which process it needs to execute next.

Now, this particular strategy uses a static partitioning of the processes, that is at the start

294
of execution of a process, either the user or the operating system decides on which CPU
the process will execute, and the process is then placed in that CPU's queue (refer above
slide). As you can see, the advantage is that the scheme is easy to implement and
scalable. The scalability comes from the fact that there is no locking mechanism or
serialization of accesses to the CPU queues.

Another advantage is that the choice is local: CPU 0 is concerned only with its own queue
and not with any other queue present in the system. Therefore, the choice is made very
quickly and there is minimal performance degradation.
The drawback of the scheme is that it could lead to load imbalance. Load imbalance
occurs when some CPU queues have a lot of processes to execute, or rather, some CPUs
have a lot of processes in the ready state while other CPUs, such as CPU 3, have very few
processes in the ready state (mentioned in above image). Thus, CPU 2 is doing a lot more
work compared to CPU 3.

(Refer Slide Time: 07:29)

295
A third way to do symmetrical scheduling is a Hybrid Approach. This approach is used in
Linux kernels from 2.6 onwards. Essentially, this approach uses both local queues as well
as a global queue (mentioned in above slide). The local queues are the queues associated
with each CPU; essentially, each CPU has its own local queue, while the global queue is
shared among the CPUs. The global queue is used to ensure that load balancing is
maintained, that is, it ensures that each CPU gets a fair share of the processes to execute.
The local queues ensure locality; by having the local queue, the performance degradation
of the system is minimized.

(Refer Slide Time: 08:23)

Besides the global queue, there are two more techniques to achieve Load Balancing - one
is Push Migration and the other is Pull Migration. With Push Migration, a special task
periodically monitors the load of all processors and redistributes work whenever it finds
an imbalance among the processors. With Pull Migration, an idle processor pulls a task
from a busy processor and starts to execute it.

Thus, migrating processes from busy processors to less busy processors achieves load
balancing. That being said, process migration should be done with caution. Migrating a
process from one CPU to another is expensive: whenever a process migrates from one
CPU to another, all memory state related to the new CPU

296
needs to be repopulated; by this we mean the cache memory associated with that CPU
needs to load the migrating process's instructions and data. Similarly, the TLB needs to
be flushed, and so on. This could lead to significant overheads. Thus, process migration
should be done only if required.

With this we will end this video, which looked at Multiprocessor Scheduling Schemes
and Load Balancing among the different CPUs.

Thank you.

297
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 05
Lecture - 21
Scheduling in Linux (O(n) and O(1) Scheduler)

Hello. In this lecture we will look at Scheduling in the Linux operating system.
Essentially, we will look at how the schedulers in the Linux kernel evolved over time.
The reference for this particular lecture is the book Understanding the Linux Kernel,
3rd Edition, by Daniel Bovet and Marco Cesati.

(Refer Slide Time: 00:38)

Linux classifies processes into 2 types: real time processes and normal processes.
Essentially, a process could be either a real time or a normal process. Real time processes
are those which have very strict deadlines; for instance, there could be processes
involving control of a robot, data acquisition systems, or EIC controllers. Real time
processes should never miss a deadline and should never be blocked by a low priority
task.

The other type of processes are the non real time processes, generally called normal

298
processes. Normal processes are of two types: Interactive or Batch. An interactive
process constantly interacts with the user, and therefore spends a lot of time waiting for
key presses and mouse operations.

So typically, you could consider these as I/O bound processes. When an input is received,
the process should typically wake up within 50 to 150 milliseconds. If it gets delayed by
more than 150 milliseconds, then the system begins to look sluggish, in the sense that a
user will not feel very comfortable using the system. On the other hand, batch processes
closely resemble CPU bound processes: they do not require any user interaction and
often run in the background. For instance, gcc compilation or scientific workloads like
MATLAB fall into this type of process.

(Refer Slide Time: 02:15)

Once a process is specified as a real time process, it is always considered a real time
process. In the sense that once a user specifies that a process is a real time process, it is
always going to be a real time process, even if it does nothing but sleep.

299
(Refer Slide Time: 02:38)

On the other hand, a process which is non real time, that is, a normal process, can behave
either as an interactive process at one point in time or as a batch process at other points
in time. Essentially, a normal process could behave as an interactive process or a batch
process, and this behavior could vary from time to time. So, in order to distinguish
between interactive and batch processes, Linux uses sophisticated heuristics based on the
past behavior of the process to decide whether a given process should be considered a
batch or an interactive process. We will look at this in more detail.

300
(Refer Slide Time: 03:22)

So, looking at it from a historical perspective, over the years from Linux 2.4 to 2.6 the
scheduler adopted was known as the O(n) scheduler. Then, from Linux 2.6 to 2.6.22, the
scheduler was called the O(1) scheduler, and the scheduler currently incorporated is
known as the CFS scheduler (Linux 2.6.23 onwards).

Now, we will look at each of these schedulers in the following lecture.

301
(Refer Slide Time: 03:54)

So let us start with the O(n) scheduler (refer above slide). O(n) scheduler means that the
scheduler looks into the queue of ready processes, which could be of size at most 'n',
examines each of these processes, and then makes a choice about the process that needs
to execute. Essentially, at every context switch, the O(n) scheduler will scan the list of
runnable processes, compute priorities, and select the best process to run. Scanning the
list of runnable processes is an O(n) job, since we assume that the ready queue has up to
'n' processes present.

So, this obviously is not scalable. It is not scalable because as 'n' increases, that is, as
more and more processes or tasks enter the ready queue, the time to make a context
switch increases. Essentially the context switch overheads increase, and this is not
something that is wanted.

These scalability issues with the O(n) scheduler were actually noticed when Java was
introduced, roughly in the early 2000s, essentially due to the fact that the JVM - the Java
Virtual Machine - spawns multiple tasks, and each of these tasks is present in the ready
queue. So, the scheduler would have to scan through this ready queue in order to make
its choice. Another limitation of the O(n) scheduler was that it used a single global run
queue for SMP systems. Essentially, when you had multiprocessor systems, as we have

302
seen in the previous lecture, a single global run queue was used. So, this again as we
have seen is not scalable.

(Refer Slide Time: 05:48)

So the next choice was something known as the O(1) scheduler. The O(1) scheduler
makes a scheduling decision in constant time, irrespective of the number of jobs present
in the ready queue. Essentially, it takes a constant time to pick the next process to execute
on the CPU. This quite obviously scales, even with a large number of processes present
in the ready queue. In the O(1) scheduler, processes are divided into two types: real time
processes and normal processes, as we have seen. The real time processes are given
priorities from 0 to 99, with 0 being the highest priority and 99 being the lowest priority
of a real time task.

Normal processes, on the other hand, are given priorities between 100 and 139. So, 100 is
the highest priority that a normal process could have, while 139 is the lowest priority
that any process could have, and as we have seen before, normal processes could be
either interactive or batch. We will see later how the scheduler distinguishes between
interactive and batch processes.

303
(Refer Slide Time: 07:00)

Now, we will talk about scheduling of normal processes, that is, the non real time
processes. The scheduling algorithm used is very similar to the multilevel feedback
queue, with a slight variation. In this particular scheduler there are 40 priority classes,
labeled from 100 to 139, and as we have discussed before, the lowest priority corresponds
to class 139 while the highest priority for a normal task is 100 (mentioned in above
slide).

Corresponding to each of these priority classes there is a ready queue. Every process or
task present in the ready queue corresponding to a particular priority class has equal
priority to be executed on the CPU. On the other hand, a process present in the 100th
priority class has a higher priority to execute than a process present in 102, that is, a
process present in the ready queue corresponding to 102 (mentioned in above slide).
Now, based on these priority classes, the scheduler maintains 2 such sets of queues: one
is known as the Active Run queues and the other is known as the Expired Run queues.

304
(Refer Slide Time: 08:26)

When a context switch occurs, the scheduler scans the active run queues, starting from
the 100th run queue and going up to 139. It picks the first run queue which is non-empty
(refer above slide).

So for instance over here (mentioned in above slide), it starts from 100 and picks this
first particular process (circled in black), and this process is executed on the CPU. At the
end of the time slice, this process is put into the expired run queue. Gradually, you will
see that as time proceeds and time slices complete, the scheduler keeps picking processes
from the active run queues, executing them, and putting them into the expired run
queues. After a while we reach a point where all the processes in the active run queues
have been processed; essentially, none of these priority classes has any processes left in
it.

On the other hand, the expired run queues are now filled up, because the scheduler has
been adding processes to each priority class in the expired run queues.

305
(Refer Slide Time: 09:36)

When this happens (as mentioned in the above paragraph), the scheduler switches the
active run queues and the expired run queues. The set on the right (the Expired Run
queues mentioned in above slide) now becomes the active set, while the set on the left
(the Active Run queues) becomes the expired run queues.

Now, for subsequent time slice completions, the scheduler picks tasks from this particular
set of queues (the right hand side queues mentioned in above slide) and executes them.
After execution, these tasks are moved to the set over here (the left hand side queues),
which is now the expired run queue. In this way there is a toggling between the two sets
of queues: all tasks present in one set are executed and moved to the other set, then all
tasks there are executed and, after executing, moved back to the previous set, and so on.
This process goes on continuously. The main reason for having such a technique in the
Linux kernel's scheduler is that it prevents starvation: it ensures that every process gets a
turn to execute on the CPU.
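A minimal sketch of this active/expired toggle; the structure names are illustrative (loosely modeled on the 2.6 kernel's prio_array idea) and the per-queue list handling is omitted:

    #define NUM_PRIO      140
    #define BITMAP_WORDS  ((NUM_PRIO + 31) / 32)

    struct task;                              /* details omitted                 */

    struct prio_array {
        unsigned int  bitmap[BITMAP_WORDS];   /* bit i set => queue i non-empty  */
        struct task  *queue[NUM_PRIO];        /* head of ready list per priority */
    };

    struct runqueue {
        struct prio_array *active;            /* tasks that still have time left */
        struct prio_array *expired;           /* tasks that used up their slice  */
        struct prio_array  arrays[2];
    };

    /* When every active queue is empty, only the two pointers are swapped;
     * no task is copied, so the switch is cheap and no process can starve. */
    void switch_arrays(struct runqueue *rq)
    {
        struct prio_array *tmp = rq->active;
        rq->active  = rq->expired;
        rq->expired = tmp;
    }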

306
(Refer Slide Time: 10:39)

Recall that we call this particular scheduler an O(1) scheduler. Essentially, this means it
is a constant time scheduler, irrespective of the number of processes present in the ready
queues.

Now, let us analyze how this is constant time. If we recall, there are 2 steps involved in
scheduling a process: first, find the lowest numbered queue with at least 1 task present in
it; second, choose the first task from that ready queue. Now, how are these two steps
constant time? It is quite obvious that the second step is constant time: we are choosing
from the head of the queue, and that is done in a fixed amount of time irrespective of the
number of processes. The next question is, how is step 1 constant time? We know that
the scheduler scans through all the priority queues starting from 100 and going up to 139
and chooses the lowest numbered queue which has at least one task; essentially, it
chooses the lowest non-empty queue.

So, this does not seem to be constant time. How does the scheduler actually implement
this, so that the time taken to find the lowest non-empty queue is independent of the
number of processes? In order to do this, the Linux scheduler makes use of two things.
The first is a bitmap: a 40 bit bitmap which

307
records which run queues are non-empty. Essentially, this bitmap stores a zero for a class
which is empty and a one for a class which has entries. Second, it uses a special
instruction known as find-first-bit-set. For example, there is the bsfl instruction on Intel
(mentioned in above image), which looks at this 40 bit bitmap and returns the lowest
index which is non zero. Essentially, this gives us the lowest numbered set bit in the
bitmap, and this corresponds to the lowest non-empty priority queue.
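A small sketch of the same idea in C, with the 40-bit bitmap simplified to a single 64-bit word (an assumption made only for brevity); __builtin_ffsll is a GCC/Clang intrinsic that compiles down to a find-first-set instruction such as bsf on x86:

    #include <stdint.h>

    /* bit i of 'bitmap' is set when priority class (100 + i) is non-empty */
    int lowest_nonempty_class(uint64_t bitmap)
    {
        if (bitmap == 0)
            return -1;                                /* no runnable normal task */
        return 100 + __builtin_ffsll(bitmap) - 1;     /* ffsll is 1-based        */
    }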

(Refer Slide Time: 13:16)

Now, let us look more closely at the priorities of the O(1) scheduler. We have seen that
priorities 0 to 99 are meant for real time tasks, while priority 100 is the highest priority
that can be given to a normal process and priority 139 is the lowest priority that a normal
process could have. These are the ranges of priorities for a normal process, that is, a
normal process could have anything from a priority of 100 to a priority of 139. Priority
120 is known as the base priority and is taken by default. So, whenever you start a
program, for example an a.out program, it is given the priority 120. You can change this
base priority by using the command called nice. The nice command is used to change
the default priority of the process from 120 to something else.

308
The command would look something like this (mentioned in above slide): $nice –n N
./a.out, where N can take a value from -20 to +19. Essentially, the value that we specify
here (in place of N) gets added to the base priority. So, for example, if I say $nice –n +19
./a.out, the process that I am starting will have a priority of 120 + 19, that is, 139; this is
the lowest priority it could have. On the other hand, you could also specify $nice –n -20
./a.out, which would give the process a priority of 100; this is the highest priority the
process could get, considering that it is a normal process. These are static priorities that
the process gets at the start of execution.

(Refer Slide Time: 15:19)

Besides the static priority, the scheduler also sets something known as the Dynamic
priority. This dynamic priority is based on heuristics and is set according to whether the
process acts like a batch process or an interactive process; essentially, it is set based on
whether the process does more I/O operations or is more CPU bound.

So, the scheduler uses a heuristic, which is shown here (mentioned in above slide), to
compute the dynamic priority. Essentially, it takes the static priority which we have set,
which is either 120 by default or given at the start of execution based on the nice value,
subtracts something known as the bonus, and adds 5. So,

309
it takes the minimum of (static priority - bonus + 5) and 139, i.e. MIN(static priority -
bonus + 5, 139), and then takes the maximum of 100 and this value, i.e. MAX(100,
MIN(static priority - bonus + 5, 139)). For instance, let us say we start the process with
the default static priority, that is 120, and we give a bonus of, say, 3; then we have
120 - 3 + 5, that is 122. So, MIN(122, 139) is 122, and MAX(100, 122) is 122. So,
the dynamic priority for this particular example is 122.
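The same formula, written out as a small helper (illustrative only):

    static int imax(int a, int b) { return a > b ? a : b; }
    static int imin(int a, int b) { return a < b ? a : b; }

    int dynamic_priority(int static_prio, int bonus)      /* bonus in 0..10 */
    {
        return imax(100, imin(static_prio - bonus + 5, 139));
    }
    /* dynamic_priority(120, 3) == 122;  dynamic_priority(120, 8) == 117   */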

Now, the crucial thing is this bonus. How is this particular bonus set? Essentially, the
bonus has a value between 0 and 10. If the bonus is less than 5, it implies there is less
interaction with the user; thus the process is assumed to be more of a CPU bound process,
and the dynamic priority is therefore decreased, that is, it moves towards 139, the priority
of a low priority task (refer above slide). If the bonus is greater than 5, on the other hand,
it implies there is more interaction with the user, that is, the process acts more like an
I/O bound process and behaves more like an interactive process. The dynamic priority is
therefore increased, that is, it moves towards 100 (refer above slide).

You can take two examples. For instance, we have seen that a bonus of 3 resulted in a
dynamic priority of 122 (from the example given in the above paragraph), implying that
it is more of a CPU bound process and is therefore given a lower priority in the system.
On the other hand, if you take a bonus of 8 and compute the same thing, you will see that
the dynamic priority value actually reduces, giving the process a higher priority. The
crucial question now is how we actually set the value of the bonus.

310
(Refer Slide Time: 18:11)

The bonus in the scheduler is determined by the average sleep time of a process. The
assumption here is that I/O bound processes will sleep, or in other words block, more,
and therefore should get a higher priority. Therefore, I/O bound processes will have a
higher value of bonus.

On the other hand, CPU bound processes will sleep less and therefore should get a lower
priority. If we look at this table (refer above slide), we see that if the average sleep time
for a process is greater than 200 but less than 300 milliseconds, it is given a bonus of 2.
This is just a heuristic, followed as a formula in the kernel. On the other hand, if the
process has an average sleep time greater than or equal to 800 milliseconds but smaller
than 900 milliseconds, we give it a larger bonus of 8. Thus, processes with a high value
of bonus, that is, those processes which sleep more, are considered interactive processes
and are given a dynamic priority which is closer to 100.

On the other hand, processes which sleep less are assumed to be CPU bound processes
and are given a low bonus; as a result, the dynamic priority is lowered, that is, it goes
towards 139.

311
(Refer Slide Time: 19:40)

So, how do these dynamic priorities affect the scheduling algorithm? Essentially, if you
come back to the active and expired run queues (mentioned in above slide), the dynamic
priority determines which queue the particular process will be placed in. For instance,
when we choose a particular process to execute, say from the 100th priority class, then
after it executes, its average sleep time is used to compute the bonus, which is thereafter
used to compute the dynamic priority, and this process (the process which has just
executed) is placed in the corresponding priority class.

As a result, based on the average sleep time, when the process is placed back in the
expired run queue, the class in which it is placed depends on its dynamic priority.

312
(Refer Slide Time: 20:28)

Besides the dynamic priority, there is also the time slice, which is adjusted by the
scheduler. Interactive processes have high priorities, but they are more I/O bound and are
likely to have short CPU burst times. These interactive processes are given the largest
time slices to ensure that they complete their bursts without being preempted. In order to
set the time slice, more heuristics are used, such as this one (the if condition statement in
above slide image): if the priority is less than 120, that is, it is an interactive process, then
the time slice is given as (140 – priority) * 20 milliseconds.

However, if it is more of a batch process, that is, one with more CPU activity, then the
time slice is set by the lower statement (the else statement in above slide image), that is,
time slice = (140 – priority) * 5 milliseconds. As a result of this, if you look at this
particular table (refer above slide), you see that a process with a static priority of 100,
obtained by a nice value of -20, is treated as an interactive process and is given a large
time slice of 800 milliseconds. On the other hand, a process with a static priority of 139,
that is, 120 plus a nice value of 19 (120 + 19), is given the lowest or smallest time slice
of 5 milliseconds.

So, thus we see that based on the priority of the process the corresponding time slice

313
given to that process is fixed.
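The time-slice heuristic from the slide, written as a small helper (values in milliseconds):

    int time_slice_ms(int static_prio)
    {
        if (static_prio < 120)                     /* interactive-leaning  */
            return (140 - static_prio) * 20;
        else                                       /* batch-leaning        */
            return (140 - static_prio) * 5;
    }
    /* time_slice_ms(100) == 800 ms,  time_slice_ms(139) == 5 ms           */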

(Refer Slide Time: 22:15)

Summarizing the O(1) scheduler: it is a multilevel feedback queue with 40 priority
classes. The base priority is set to 120 but is modifiable by the use of the nice command.
The dynamic priority is set by heuristics based on the process's average sleep time. The
time slice interval for each process is set based on the dynamic priority.

314
(Refer Slide Time: 22:39)

The limitations of the O(1) scheduler are as follows. The O(1) scheduler uses a lot of
complex heuristics to distinguish between interactive and non-interactive processes.
There is a dependency between the time slice and the priority, and the priority and time
slice values are not uniform. That is, if you compare processes with a priority near 100 to
processes with a priority near 139 and compute the time slices in these two ranges, you
see that the change is not uniform.

If you look back at the table over here (refer slide time 20:28), you see that going from a
priority of 100 to 110 changes the time quantum by 200 milliseconds; however, going
from 130 to 139 changes the time slice by just 45 milliseconds. With this we will end this
particular video lecture.

In the next lecture we will look at the CFS scheduler, which is the latest or current
scheduler used in Linux kernels.

Thank you.

315
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 05
Lecture – 22
Completely Fair Scheduling

In this video lecture, we will look at the Completely Fair Scheduler. So, the completely
fair scheduler or the CFS scheduler is the default scheduler used in Linux kernels in the
latest versions.

(Refer Slide Time: 00:29)

The CFS scheduler has been incorporated in the Linux kernel since version 2.6.23 and
has been used as the scheduling algorithm since 2007. It was based on the Rotating
Staircase Deadline Scheduler by Con Kolivas. The advantage of the CFS scheduler,
compared to the O(1) scheduler in particular, is that no heuristics are used and there is
very elegant handling of I/O bound and CPU bound processes.

Essentially, interactive and non-interactive (batch) processes fit very easily into this
particular scheduler. We will now see a brief overview of the CFS scheduler.

316
Now, as the name suggests, the CFS scheduler or completely fair scheduler aims at
dividing the processor time, or CPU time, fairly or equally among the processes.

(Refer Slide Time: 01:34)

In other words, if there are N processes present in the system, or present in the ready
queue and waiting to be scheduled, then each process should receive (100/N)% of the
CPU time. This is the Ideal Fairness. Let us take a small theoretical example of this ideal
fair scheduling.

Let us consider four processes A, B, C and D, having burst times of 8 milliseconds, 4
milliseconds, 16 milliseconds and 4 milliseconds respectively (refer above slide). What
we will do is divide time into quanta of 4 millisecond slices, and we will now see how
ideal fair scheduling should take place. In ideal fair scheduling, at the end of, say, this 4
millisecond epoch, all processes which are in the ready queue should have executed for
the same number of clock cycles.

For instance, if we look at this first epoch (first cycle), it is 4 milliseconds long and we
have four processes present in the ready queue; therefore, each process should get 4
divided by 4, that is, 1 millisecond of processor time (refer above Ideal fairness
317
block). Therefore, A, B, C and D will each execute for 1 millisecond. In a similar way,
for the 2nd cycle, there are four processes again, and therefore these four processes get
an equal share of the slice, so each process executes for 1 millisecond again. Therefore,
in all, A has executed for 2 milliseconds, B for 2, C for 2, and D for 2 milliseconds.
Similarly for cycles 3 and 4 (same as the first and second cycles), so at the end of the 4th
epoch we see that processes B and D have completed (the 4 ms bursts of both processes
are over). So, what happens next?

(Refer Slide Time: 03:32)

Now, after B and D complete, we have two processes present in the ready queue, A and
C, and the time quantum remains 4 milliseconds. So now each process gets 4 (time slice)
divided by 2 (remaining processes), that is, 2 milliseconds of processor time. Therefore,
process A executes for 2 milliseconds and, similarly, process C executes for 2
milliseconds. For the next epoch, A executes for 2 more milliseconds and C executes for
2 more milliseconds. So both have now executed for 8 milliseconds in total, and as a
result A has completed executing.

318
(Refer Slide Time: 04:16)

In the last part, we see that only C is present in the ready queue; since it is the only
process present, it is given the entire slot of 4 milliseconds. So C executes for 4
milliseconds, followed by the final slot where it executes for another 4 milliseconds to
complete its burst time. What you see in this ideal scheduling is that in each epoch or
slot, the scheduler tries to divide the time equally among the processes, so that
asymptotically all processes execute for the same amount of time on the CPU. You see
that all processes have executed for 4 milliseconds here, and at the end of the following
epochs the remaining processes have executed for 6 milliseconds, then 8 milliseconds,
and so on. How is this ideal fair scheduling incorporated in the CFS scheduler?

319
(Refer Slide Time: 05:21)

This is done by what is known as Virtual Runtimes. In each process's PCB, that is, in
each process's process control block, there is an entry known as the vruntime or virtual
run time. At every scheduling point, if a process has run for t milliseconds, then its
vruntime is incremented by t. The vruntime of a process will therefore increase
monotonically.
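A minimal sketch of this bookkeeping (the field and function names are illustrative, not the actual kernel layout):

    struct pcb_sketch {
        unsigned long long vruntime;     /* virtual run time, only grows   */
        /* ... other PCB fields ... */
    };

    void account(struct pcb_sketch *p, unsigned long long t_ms)
    {
        p->vruntime += t_ms;             /* ran for t ms => vruntime += t  */
    }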

(Refer Slide Time: 05:51)

320
Now, the basic CFS idea is: whenever a context switch is required, always choose the
task which has the lowest vruntime. This is maintained by a variable called
min_vruntime, which is a pointer to the task having the lowest virtual run time. Then the
dynamic time slice for this particular process is computed, and the high resolution timer
is programmed with this time slice.

The process then begins to execute on the CPU. When an interrupt occurs, a context
switch will again take place if there is another task with a smaller vruntime. So, you see
that the process which was selected to run over here (from the first point mentioned in
above slide) will continue to run until there is another task with a lower vruntime.

(Refer Slide Time: 06:51)

Now, in order to manage these various tasks with various run times, the CFS scheduler,
quite unlike the schedulers we have seen so far, does not use a ready queue; instead, it
uses a red black tree data structure (refer above slide). In this red black tree, or rb tree,
data structure, each node in the tree represents a runnable task. Nodes are ordered
according to their virtual run time: nodes on the left have a lower vruntime compared to
nodes on the right of the tree. That is, if you look at the tree (rb tree in above slide), each

321
node is a task, and each node has a number written on it (inside the circle) which is the
virtual run time for that particular task. You see that each task on the left (left side of the
tree) has a lower virtual run time compared to the tasks on the right.

Now, the left most node of this rb tree is the task which has the lowest vruntime, or
lowest virtual run time. So, in this particular case (mentioned in above slide), it is this
particular node (the node with vruntime 2) which corresponds to the task having the
lowest virtual run time; therefore, the scheduler should pick up this task to run next.

In order to find this task, two ways are possible: one is to traverse the tree, going left
until you reach a leaf; the other is to directly maintain a pointer, like min_vruntime,
which points to the left most node of the tree. So, whenever the scheduler needs to make
a context switch, it just needs to look at where min_vruntime points and pick out that
task. This, quite naturally, will be the task with the lowest virtual run time.
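A sketch of this O(1) pick using a cached leftmost pointer (the structures are illustrative, not the real kernel's rb-tree code):

    struct cfs_task_sketch {
        unsigned long long vruntime;
        struct cfs_task_sketch *left, *right, *parent;   /* rb-tree links  */
    };

    struct cfs_rq_sketch {
        struct cfs_task_sketch *root;
        struct cfs_task_sketch *leftmost;    /* task with minimum vruntime */
    };

    struct cfs_task_sketch *pick_next(struct cfs_rq_sketch *rq)
    {
        return rq->leftmost;                 /* O(1): no tree traversal    */
    }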

(Refer Slide Time: 08:48)

So, this choice of the lowest vruntime can be done in O(1) and is therefore independent
of the number of processes present in the rb tree. At the end of the time slice, if the
currently executing process is still runnable, that is, it has not blocked on I/O

322
and has not exited, then its new virtual run time is computed based on the amount of
time it has executed on the CPU. It is then inserted back into the tree at a position
corresponding to its virtual run time. In other words, a process is picked out from the left
most part of the tree because it has the lowest virtual run time, and it then executes on
the CPU for some time, say t milliseconds.

At the end of its time slice, its virtual run time is incremented by t, and it is inserted
again into the tree. Now it will not go to the left of the tree, but will rather be inserted
somewhere in the middle or towards the right (right side of the tree). Thus, as virtual run
times increase, a process moves from the left towards the right. This ensures that every
process gets a chance to execute: at one point or another, every process is going to have
the minimum virtual run time in the tree and will therefore get executed; thus starvation
is avoided.

(Refer Slide Time: 10:24)

So, why choose the red black tree, or rather, why did the Linux kernel choose the red
black tree for the CFS scheduler? One obvious reason is that the rb tree is self balancing:
no path in the tree will be more than twice as long as any other path. Due to this, all
operations are O(log n); thus inserting or deleting tasks from the tree is quick and can be
done very efficiently.

323
(Refer Slide Time: 10:55)

Now, how are priorities implemented in the CFS scheduler? Essentially, CFS does not
use any exclusive priority based queues as we saw in the O(1) scheduler; rather, it uses
priorities only to weight the virtual run time.

For instance, if a process has run for t ms, then the virtual run time is incremented by t
multiplied by a weight based on the nice value of the process, i.e. vruntime += t * (weight
based on nice of process), essentially based on the static priority of the process. A lower
priority implies that virtual time moves at a faster rate compared to that of a high priority
task. Essentially, what we are doing is providing a weight for the time that the process
executes; that is, we are either accelerating or decelerating the rate at which its virtual
time advances. This weight is used to implement priorities in the CFS scheduling
algorithm.
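A sketch of this weighting (the weight values themselves are an assumption here; the real kernel derives them from a nice-to-weight table). The point is only the direction: a weight greater than 1 for a low-priority, nicer task makes its virtual time advance faster, while a weight less than 1 for a high-priority task makes it advance slower:

    void account_weighted(unsigned long long *vruntime,
                          unsigned long long t_ms, double weight)
    {
        /* weight > 1: low priority, vruntime grows faster
         * weight < 1: high priority, vruntime grows slower */
        *vruntime += (unsigned long long)(t_ms * weight);
    }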

324
(Refer Slide Time: 12:01)

Next, we will look at how the CFS scheduler distinguishes between an I/O bound and a
CPU bound process. This distinction is made very efficiently. It is based on the fact that
I/O bound processes have very small CPU bursts, and therefore their vruntime does not
increase very significantly. As a result, such a process more often than not appears in the
left part of the rb tree, and therefore it gets to execute more often than other processes.
This is because, as we mentioned, as time progresses each process in the CFS scheduler
is picked from the left most node, executes, and is then placed further to the right;
therefore, in general, every process moves towards the right part of the rb tree in the
scheduler.

Now, for an I/O bound process, since the vruntime does not change much, or increments
only by a small margin, it does not move to the extreme right but rather stays towards the
left part of the tree. Thus, very soon it will find itself as the process with the lowest
vruntime and will get a chance to execute again on the CPU. As a second effect, due to
the small virtual run time of the I/O bound process, it is given a larger time slice to
execute on the CPU. Thus we see that I/O bound and CPU bound processes are very well
distinguished, quite inherently, by the CFS algorithm.

325
(Refer Slide Time: 13:50)

When a new process gets created, it is added to the red black tree. It starts with an initial
vruntime equal to min_vruntime and therefore gets placed towards the left most part of
the tree, which ensures that it gets to execute very quickly. As it executes, depending on
the amount of time it runs and whether it behaves as an interactive or a CPU bound
process, its position within the rb tree will vary. This was a brief introduction to the CFS
scheduler, which is the default scheduler in current versions of the Linux kernel.

Thank you.

326
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 06
Lecture – 23
Inter Process Communication

In this video we will look at Inter Process Communication. Essentially, when we write
large applications, it is often quite useful to structure them as separate processes. In order
to have efficient communication between these processes, we use inter process
communication. In this particular video, we will see a brief introduction to IPCs, that is,
Inter Process Communication, and a few examples of the same.

(Refer Slide Time: 00:50)

We have seen this Virtual Memory View of a process. We said that when a process is
executing, it issues virtual addresses which get mapped through this virtual memory map.
We have also seen that there is an MMU which uses page tables stored in memory to
convert these virtual addresses into physical addresses, and that each process that
executes on the CPU has its own virtual memory map.

327
(Refer Slide Time: 01:23)

The use of a virtual memory map provides some level of abstraction. Essentially, the
executing process knows only its virtual addresses, and with those virtual addresses it
can only access the user space of its virtual memory map. We have seen that the MMU
then translates these addresses into the corresponding physical addresses. So, the
executing process has no way to determine what the corresponding physical addresses
for its memory accesses are. For instance, if you declare an integer 'i' in your program,
you can print out the virtual address corresponding to 'i', for example with
printf("%p", &i). However, it is not possible for you, in user space, to determine what
the corresponding physical address for that particular variable 'i' is.

The second thing the virtual memory map provides is that the executing process has
access only to its own virtual memory map. If there is another process, it has its own
page table, and in user space there is no way for the executing process to determine
anything about the other process.

328
(Refer Slide Time: 02:42)

So, when there is a context switch and a new process executes, it is the second process's
virtual address map, or virtual memory map, which is then used. This process has no way
to determine any other process's virtual address map. The thing to conclude from these 2
slides (the Virtual memory view mentioned above) is that a process has no way to
determine any information or data about another process. Given this, how does one
process communicate with another process?

329
(Refer Slide Time: 03:15)

So in order to do this, there is a mechanism known as Inter Process Communication.


Essentially, with IPCs or inter process communication, two processes are able to send
and receive data between themselves. The advantage of IPC is that processes can be
written to be modular: each process is meant to do a single job, and processes can then
communicate with each other through IPCs.

For example, let us say we have a data acquisition and control system. We could write
one process which acquires data from the external world, such as the temperature,
pressure, or speed, and then sends this collected data to a second process, which analyzes
the data and determines some parameters. These parameters could be sent to a third
process, which then actuates some external device; for instance, it could open or close a
valve, or adjust the temperature of the room, and so on. Thus, we see that we are able to
achieve a modular structure in our application.

Each process's job is to focus on only a single aspect, and the communication between
the processes is achieved by inter process communication. In typical operating systems,
there are 3 common ways in which IPCs are implemented: Shared memory, Message
passing, and Signals. Let us look at each of these.

330
(Refer Slide Time: 05:00)

With shared memory, we have one process which creates an area in RAM which is then
used by another process. Essentially, the communication between process 1 and process 2
happens through this shared memory (green colored box mentioned in above slide). Both
processes can access the shared memory like regular working memory, so they can read
or write to this shared memory independently of each other. The advantage of this is that
the communication is extremely fast; no system calls are involved in the actual data
transfer. For example, you could define an array in the shared memory and fill it, and it
can then be read by the other process.

The limitation of the shared memory approach is that it is highly prone to error; it
requires the two processes to be synchronized. So, we will take a small example of how
shared memory is implemented in Linux.

331
(Refer Slide Time: 06:05)

Essentially, for shared memory in the Linux operating system, three system calls are
used. The first is 'shmget', or shared memory get, which takes three parameters: a key, a
size, and flags. This system call creates a shared memory segment and returns an ID of
the segment, that is, 'shmid', the shared memory ID of the segment. The parameter key is
a unique identifier for that shared memory segment, while size is the size of the shared
memory; this is typically rounded up to the page size, that is, 4 kilobytes. This is how a
shared memory segment gets created by a particular process (mentioned in above slide).

Now, another process can attach to this shared memory through this (mentioned in above
slide) system call, shared memory attach - 'shmat'. This call takes the shared memory ID
(shmid), an address (addr), and flags. Essentially, the system call attaches the shared
memory to the address space of the calling process, and the value returned is a pointer to
the shared memory in that address space. We will understand more of this through an
example. The opposite of shared memory attach is shared memory detach, where a
process detaches the shared memory from its user space; the detach system call, shmdt,
takes the address at which the segment was attached.

332
(Refer Slide Time: 07:40)

Let us see an example of shared memory (refer above image). Let us say we have written
this particular program called server.c, and we create a shared memory segment in it. We
define a key of 5678; this is some arbitrary value for the key (defined in above program),
but the requirement is that this key uniquely identifies the shared memory. We use this
key to invoke the function shmget (mentioned in the if condition in above program); we
pass it the key and SHMSIZE, which is defined here (before main() in above program) as
the size of the shared memory we want to create. Note that although we specified this
size as only 27 bytes, it will get extended to a page; that is, a page of 4 kilobytes will be
created corresponding to it.

The third parameter is the permissions: we give IPC_CREAT, meaning that we are
creating this particular shared memory, together with the permissions (0666), that is, read
and write permissions. Of course, if this function (shmget) fails, then it enters over here
(inside the if condition) and exits. Otherwise, if it executes successfully, we get a valid
shared memory ID.

The next part is to invoke the shared memory attach - shmat (second if condition in
above program), providing this particular ID (shmid), and we get back a pointer to the
shared memory. So we have char *shm, which is a pointer to the

333
shared memory; the shared memory attach returns a pointer to this shared memory.

In case it (the second if condition) returns -1, it is due to an error and we exit. At this
point we have obtained a pointer to the shared memory, and we can use it just like a
standard pointer in a C program. For instance, over here (s = shm) we have copied the
pointer into the variable s, which is defined as char *s, and we have written the alphabets
from a to z into the memory (inside the for loop). Finally, we have ended it with a null
terminator, and then we go into a loop, continuously sleeping (the while condition in
above program); we will come back to these two statements (mentioned inside the while
loop) later on.

Now, let us look at the client side of this code (the client.c program mentioned in above
slide). We invoke shmget (the function in the first if condition) and give it the key, the
same 5678; the shared memory size (SHMSIZE), 27 as before; and the permissions
(0666). Note that we do not need to give IPC_CREAT over here (IPC_CREAT is given in
the server.c program but not on the client side) because the shared memory region is
already created. Then we attach to this particular shared memory segment (the shmat
function in the second if condition) as before (as in the server.c program): we invoke the
shared memory attach function 'shmat' and pass it the shared memory ID (shmid). We
then get a pointer 'shm' (declared as char *shm) to this shared memory location.

Essentially, both the server and the client are now pointing to the same physical memory
page, and we can read data from this shared memory region (the for condition mentioned
in the client.c program). Remember that the server has put the values a to z, that is, a, b,
c, d up to z (inside the server.c program), while in this case (the client side program) we
read the values a to z through the pointer s and print them on the screen. Once we have
completed reading all the data from the shared memory, we put a star into the first
location of the shared memory (*shm = '*';).

Now recall that over here (the while loop mentioned in the server.c program), in the server we

334
have put a while loop which continuously loops until the first character in the shared
memory is a '*'. So when we obtain a star (written by the client side program), it means
that the client has completed reading all the data, and therefore the server can exit. In a
similar way the client can exit as well. So, you see that we have these 2 processes: one a
server and one a client.

These two processes each have their own virtual address space, but by the use of this
shared memory (shm), which is created by shared memory get (shmget) and shared
memory attach (shmat), we have been able to create a shared memory region which is
common to the server and the client. We then obtained a pointer to that shared memory
region, that is, s or shm (s = shm); the server can put data into the shared memory region
while the client reads it, and the vice versa is also possible.

Finally, note that synchronization is required between the server and the client, and this
synchronization requirement has to be explicitly programmed into the server and client
modules.

(Refer Slide Time: 12:54)

Next, we will look at Message Passing. Unlike the shared memory that we have just seen,

335
where the shared memory is created as part of user space, in message passing the shared
memory is created in the kernel (refer above slide). Essentially, we then require system
calls, such as send and receive, in order to communicate between the two processes. If
process 2 wants to send data to process 1, it invokes the send system call. This causes the
kernel to execute, and it results in the data being written into the shared memory. When
process 1 invokes receive, the data from the shared memory is read by process 1.

The advantage of this particular message passing is that the sharing is explicit.
Essentially, both process 1 and process 2 require the kernel's support to transfer data
between each other. The limitation is that it is slow: each send or receive call involves
marshalling or demarshalling of information, and as we know, a system call in general
has significant overheads. Therefore, message passing is quite slow compared to shared
memory. However, it is less error prone than shared memory because the kernel manages
the sharing, and can therefore handle the synchronization between process 1 and
process 2.

(Refer Slide Time: 14:27)

Another very common application of message passing is the use of Pipes. Pipes are used
between parent and child processes only. Essentially, you can only communicate
336
data from a parent process to a child process and vice versa. Another aspect of pipes is
that they are unidirectional. When a pipe is created, there are generally two file
descriptors associated with it: fd[0], which is used to read from the pipe, and fd[1], which
is used to write to the pipe. This reflects the unidirectional nature of this particular IPC
(refer above slide): fd[1] is exclusively used to write to the pipe buffer, while fd[0] is
used to read from the pipe buffer.

(Refer Slide Time: 15:14)

In order to obtain two-way communication between the parent and the child, we need
two pipes: pipe 0 and pipe 1 (refer above slide image). When pipe 0 is opened, it creates
two file descriptors: one to write into pipe 0 and the other to read from pipe 0. Similarly,
there are two file descriptors to write to and read from pipe 1. Likewise, there are two
pairs of file descriptors for the pipes in the child process.

Note that some of these descriptors are not required, so we can close the extra file
descriptors to obtain something like this (second figure in above slide image). For the
parent to send information to the child, the parent writes to pipe 0 while the child reads
from pipe 0. For the child to send information to the parent, the child writes to pipe 1
while the parent will

337
read from pipe 1.

(Refer Slide Time: 16:23)

Let us see an example of the use of pipes (refer above slide). This again is a standard
Linux example: we create a parent and a child process so that there is unidirectional
communication, in this case from the child to the parent. Let us look at the program
(mentioned in above slide). First we invoke the system call pipe and give it pipefd (i.e.
pipe(pipefd);). pipefd is defined over here (the variable below main()) as an array of two
elements, and then we invoke the system call fork (mentioned inside the switch
condition). As we have studied, the fork system call creates a child process which is an
exact duplicate of the parent process. In the parent process, the value of pid, that is, the
value fork returns, is the child's pid, which is a value greater than 0. On the other hand,
in the child process, the value returned by fork is 0.

Let us see what happens in the child process first. In the child process, because the pid
value is 0, execution comes over here (case 0), and the first thing we do is close
pipefd[0]. Closing pipefd[0] means that we are closing the read end of the pipe in the
child (compare the first figure in slide time 15:14), while pipefd[1],

338
that is, the write file descriptor, is still open. Since the value of pid returned by fork in
the child is 0, this particular code is executed (case 0 in above program).

In this code (the case 0 set of statements), first we close pipefd[0], which means the file
descriptor corresponding to the read end is closed. Second, we call fdopen on file
descriptor pipefd[1], opening it in write mode ("w"), and we can then use fprintf(out,
"Hello World\n");. Essentially, what we are doing is writing "hello world" into the pipe
through its write descriptor. If you go back to the earlier figure (refer slide 15:14, second
figure), what we are doing is writing "hello world" into pipe 1.

Now, the parent process executes over here (the default set of statements) because fork
returns a value greater than 0. We close the write end and open the read end using fdopen
(refer to the default section in the program). We then use fscanf to read whatever has
been pushed into the pipe, in this case "hello world", and print it to the screen.

Essentially, what we have implemented in the program is this lower part: the child opens
the pipe in write mode and puts "hello world" into it, while the parent opens the pipe in
read mode and reads the string "hello world" from it. Thus, the string "hello world" has
been transferred from the child to the parent. You can try to implement this program and
execute it on a Linux system; a compact sketch along these lines is given below.
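A minimal sketch of the parent/child pipe example described above (the fork error case is omitted, and the exact code on the slide may differ):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int pipefd[2];
        pipe(pipefd);            /* pipefd[0]: read end, pipefd[1]: write end */

        switch (fork()) {
        case 0: {                /* child: write "Hello World" into the pipe  */
            close(pipefd[0]);
            FILE *out = fdopen(pipefd[1], "w");
            fprintf(out, "Hello World\n");
            fclose(out);
            return 0;
        }
        default: {               /* parent: read the string and print it      */
            close(pipefd[1]);
            FILE *in = fdopen(pipefd[0], "r");
            char buf[64];
            fgets(buf, sizeof buf, in);
            printf("%s", buf);
            fclose(in);
            return 0;
        }
        }
    }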

339
(Refer Slide Time: 19:58)

Besides the message passing, shared memory and pipes that we have studied so far,
another way of inter process communication is what is known as Signals. Signals
are asynchronous unidirectional communication between processes. The operating
system defines some predefined signals, and these signals can be sent from the
operating system to a process, or from one process to another. Signals are essentially
small integers; each of these (refer above slide second point) integers has
a predefined meaning. For instance, 9 would mean to kill a process, while 11 would
mean a segmentation fault, and so on.

So in order to send a signal to a process we could, for instance, invoke kill with the
pid and the particular signal number, which is an integer (i.e. kill(pid,
signum)). In order that the process handles that signal, it has to invoke this particular
function called signal, which takes the signal number, for instance 9 or 11, and then the
handler (i.e. signal(signum, handler)). Here, handler is a pointer to the signal handler for
that particular number. As a result we could send signals from one process to another,
and depending on the type of the signal the corresponding handler would be executed.
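
As a small, hypothetical illustration of this pattern using the actual Linux calls signal() and kill() (the handler and the use of SIGUSR1 are just for demonstration):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Executed asynchronously when SIGUSR1 is delivered; only
 * async-signal-safe calls such as write() should be used here. */
static void handler(int signum)
{
    write(1, "got signal\n", 11);
}

int main(void)
{
    pid_t pid = fork();

    if (pid == 0) {                  /* child: register the handler and wait */
        signal(SIGUSR1, handler);    /* signal(signum, handler) */
        pause();                     /* block until some signal arrives */
        exit(0);
    }

    sleep(1);                        /* crude: give the child time to install the handler */
    kill(pid, SIGUSR1);              /* kill(pid, signum): send the signal to the child */
    wait(NULL);
    return 0;
}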

In this video we had seen a brief introduction to IPCs. Essentially, we had seen different
types of IPCs like message passing, shared memory, pipes and asynchronous transfers

340
using signals. These IPCs are used extensively in systems to communicate between
processes. As a result, we are able to build applications which are
extremely modular, composed of small programs, with each program focusing on only a certain
aspect of the entire system.

Thank you.

341
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 06
Lecture – 24
Synchronization

Hello. In the last video we had looked at IPCs or Inter Process Communication. We
had seen that with IPCs we could build applications comprising several processes and
thereby achieve modularity. And as a result of having modular processes, with each
process doing a specific job, we are able to have very efficient and easy to understand
applications.

One consequence of having this modular approach and the use of IPCs is the
requirement for Synchronization. In this video we will look at what synchronization is,
what issues arise with it, and how they can be solved.

(Refer Slide Time: 01:03)

Let us explain what synchronization is with this motivating scenario (refer above slide).
Let us say we have two programs; program 0 and program 1, and a shared variable which
is defined as counter (defined in above slide image). It is a global shared variable int

342
counter = 5;. In program 0 we are incrementing the counter by 1, while in program 1
we are decrementing the counter by 1. Now let us see what happens
when we execute these programs, say on a single core processor.

So, program 0 would execute for some time,
then there will be a context switch, then program 1 (shown in blue color in the slide above) will execute, then program 0
again, and so on, assuming round robin scheduling and assuming that
no other process is present. Now, the question is: what would be the value of counter?

(Refer Slide Time: 02:03)

One would expect that the counter value is 5 because, let us say program 0 executes first
so it is going to increment the value of counter, so counter becomes 6 over here (in
program 0). And then program 1 executes and it decrements the value of the counter and
since the counter is shared and has the value of 6, it gets decremented back to 5. On the
other hand, if program 1 executes first and then program 0, then counter--
(mentioned in program 1 in above slide) would cause the counter to reduce from 5 to 4
and then program 0 executes, this particular line (counter++ in program 0) causing it to
increment the counter from 4 back to 5.

Thus, one would expect that the result at the end of both these programs would be 5 for

343
the value of the counter. But now what we will show is that, we can also obtain the
values of 4 and 6 for the value of counter. So let us see this more in detail.

(Refer Slide Time: 03:08)

So, let us look more deeply into how these instructions (mentioned in above two
programs) are executing. Let us see what happens when we are actually incrementing
this counter. Essentially, we are looking at what happens when
we do counter++ in terms of the assembly instructions (mentioned below
program 0 in above slide). First, the value of counter, which is stored in memory, is
loaded into a register, say R1, and then R1 is incremented by 1. That is the second line,
R1 = R1 + 1, and then R1 is written back into memory, that is, back into the value
of counter which is stored in memory.

In terms of the numbers, the contents of
counter is 5, and that is loaded into R1 (i.e. R1 <- counter); then 5 is incremented, so now
R1 contains 6 (R1 <- R1 + 1); and the value of R1 is written back into counter (i.e. counter
<- R1). So this value of 6 is written back into the memory location specified by counter,
and therefore counter now holds 6.

Now, suppose there is a context switch, the same thing happens again. So value of

344
counter is loaded into R2 (R2 <- counter), so R2 has a value of 6; then we are
decrementing R2 by 1 (R2 <- R2 – 1), so R2 becomes 6 - 1, that is 5. And then the
register R2 is stored back into the memory location counter (counter <- R2). Therefore,
R2 is 5; therefore the value of 5 is stored back into the counter. So at the end of these two
programs executing the value of counter is 5.

So now let us look at the other two scenarios, when we get the value of counter as 4
and when we get the value of counter as 6. So, let us look at this scenario (refer above
assembly instructions where counter value is 6), so Program 0 executes and loads the
value of counter into R1 (R1 <- counter); therefore recollect that R1 has the value of 5.
Now there is a context switch over here (first dotted lines) and process 1 executes. Now
the counter over here still has the value of 5, and it is loaded into the register R2
(R2 <- counter). Then R2 is decremented by 1 (R2 <- R2 - 1), so R2 now has 4. And the
value of 4 is stored back into counter (counter <- R2). So now, counter in memory has
the value of 4.

Now there is a context switch again and recollect that, when there is a context switch,
program 0 continues from wherever it had stopped. Essentially, the context that was
stored in the kernel is restored back into process 0, allowing it to continue from where it
stopped. Therefore, we see that the value of R1 at this particular point was 5 (when R1 <-
counter), and after the context switch we have R1 back at 5 again over here, so R1 is
incremented by 1 (R1 <- R1 + 1) to get 6 and the value of 6 is stored into counter
(counter <- R1). Therefore, at the end of this execution we get the counter value equal to
6.

Now let us look at the third case that is when we get the counter value equal to 4. So this
is exactly the opposite to the second case in which the program 1 is executed first. So
essentially the counter which has a value of 5 gets loaded into R2 (R2 <- counter),
therefore R2 has 5. Now there is a context switch causing program 0 to execute, and the
value of counter is loaded into R1 and incremented by 1 (R1 <- counter, R1 <- R1 + 1), so
that is 6, and 6 is written back into the counter, so counter now has the value of 6. Then
there is a context switch again causing program 1 to execute from where it had stopped.
So, we notice that R2 had the value of 5. Now, R2 is reduced by 1, so that makes it 4 (R2

345
<- R2 – 1). And the value of 4 is stored into the memory location corresponding to
counter (counter <- R2). Thus, at the end of the execution in this case the value of
counter is 4.
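
As an aside, the same lost-update behaviour can be reproduced with a small threaded program; the sketch below is not from the slides, uses two POSIX threads instead of two separate programs, and repeats the operations in a loop so that the bad interleaving becomes easy to observe.

#include <pthread.h>
#include <stdio.h>

#define N 1000000

int counter = 5;                    /* shared variable, as on the slide */

void *program0(void *arg)           /* repeatedly does counter++ */
{
    for (int i = 0; i < N; i++)
        counter++;                  /* load, add 1, store back: not atomic */
    return NULL;
}

void *program1(void *arg)           /* repeatedly does counter-- */
{
    for (int i = 0; i < N; i++)
        counter--;
    return NULL;
}

int main(void)
{
    pthread_t t0, t1;
    pthread_create(&t0, NULL, program0, NULL);
    pthread_create(&t1, NULL, program1, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    /* Without a race the answer would be 5; lost updates usually
     * make it something else when compiled with -pthread and run. */
    printf("counter = %d\n", counter);
    return 0;
}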

So, this was an example of the issues that would occur when we have a shared memory.
So even though there was a very simple operation of incrementing a counter in one place
while decrementing the counter in another place we had seen that the result could be
different, depending on how the instructions get executed and when the context switches
occur. We will define this scenario more formally through what is known as Race
Conditions.

(Refer Slide Time: 07:43)

A Race Condition is a situation where several processes access and manipulate the same
data. So this part of the process which accesses common or the shared data is known as a
Critical Section. The outcome of a race condition would depend on the order in which
the accesses to that data take place. As we have seen in the previous example, depending
on which program executes first, as well as on when the context switches occur,
the result would vary. Race conditions can be prevented by what is known as
synchronization. Essentially, with synchronization we would ensure that only one
process at a time would manipulate the critical data.

346
So coming back to our example, what is required is that we mark this area of
instructions which access or manipulate the shared data as a critical section (refer
above slide). Then we would have some additional techniques to ensure that no more
than one process could execute in a critical section at a given time. We will see this in
more detail in the videos that follow.

(Refer Slide Time: 09:09)

Race conditions not only occur in single core systems, but also in multicore systems.
It is quite easy to deduce that, because each processor in a
multicore system is executing simultaneously, it is likely that the shared variable could be
accessed by both programs at exactly the same time. Therefore, race conditions in multi
core systems are even more pronounced compared to single core systems.

347
(Refer Slide Time: 09:42)

Now, let us look at how solutions for the critical section problem can be obtained. Any
solution for the critical section problem should satisfy the following requirements. These
are: 1. Mutual Exclusion, 2. Progress, and 3. No starvation or bounded wait. For Mutual
Exclusion, the critical section solution should ensure that not more than one process is in
the critical section at a given time.

Progress should ensure that when no process is in the critical section, any process that
requests entry into that critical section must be permitted without any delay. Bounded
wait or no starvation means that there is an upper bound on the number of times a
process enters the critical section while another is waiting. Essentially, it means that a
process should not wait infinitely long in order to gain access into the critical section.

348
(Refer Slide Time: 10:44)

All critical section solutions use techniques known as locking and unlocking in order to
solve the critical section problem (refer above slide image). Essentially, in the solution
we would have something like this (lock variable), which is also a shared variable. We
define something known as a lock and call it L (i.e. lock_t L;). Before
entering into the critical section the program should invoke lock(L), and while exiting
from the critical section unlock(L) should be invoked. Similarly, every program
which uses the same critical section should lock and unlock before entering and exiting
the critical section respectively.

Now what lock does is that it acquires the lock (L) exclusively. So, after lock completes
its execution that is after the function lock (L) completes execution, it is ensured that
exactly one process, in this case (refer above two programs) only this process enters into
the critical section and is present in the critical section. When unlock is invoked, the
exclusive access to the lock is released; this permits other processes to access the critical
section. Now the locking and unlocking should be designed in such a way that the three
requirements of the critical section solution are satisfied, that is mutual exclusion,
progress and bounded wait.
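
The generic pattern looks roughly like the sketch below; a POSIX mutex is used here purely as a concrete stand-in for the abstract lock_t on the slide.

#include <pthread.h>

int counter = 5;                                  /* shared data */
pthread_mutex_t L = PTHREAD_MUTEX_INITIALIZER;    /* plays the role of lock_t L */

void *program0(void *arg)
{
    pthread_mutex_lock(&L);       /* lock(L): gain exclusive access */
    counter++;                    /* critical section */
    pthread_mutex_unlock(&L);     /* unlock(L): let other processes enter */
    return NULL;
}

void *program1(void *arg)
{
    pthread_mutex_lock(&L);
    counter--;                    /* critical section */
    pthread_mutex_unlock(&L);
    return NULL;
}

int main(void)
{
    pthread_t t0, t1;
    pthread_create(&t0, NULL, program0, NULL);
    pthread_create(&t1, NULL, program1, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return 0;
}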

So we had already seen mutual exclusion and let us see progress. So, Progress means that

349
let us say program 0 is not in the critical section and let us say program 1 has come here
and it has requested the lock. Progress means that, since no other program is in the
critical section, it should be given the lock immediately. So, program 1 should get
exclusive access to the lock.

Bounded wait means this particular scenario. Let us say program 0 is present in the
critical section while program 1 has requested the lock. There is a limit on the
amount of time that program 1 has to wait before it gets access to the critical section;
that is, the solution should ensure that program 0 unlocks L, and only then will program 1
enter into the critical section. So, there is a bound on the amount of time that program 1 waits, if it
requests the lock while program 0 is already in the critical section.

(Refer Slide Time: 13:26)

The use of lock and unlock constructs in a program will ensure that the critical section is
atomic. So when do we need to use these locking mechanisms? Essentially, single
instructions by themselves are atomic. An instruction such as add %eax, %ebx is an
atomic instruction and does not require explicit locking. However, when you have a
sequence of multiple instructions and you need to make them
atomic, then explicit locking and unlocking is required. So each piece of code in the
operating system must be checked to see whether it needs to be made atomic or not. Essentially,

350
things involving the interrupt handlers need to be checked to be made atomic.

So in this particular video, we had seen the requirement for locking and unlocking of
instructions. So in the next video, we are going to see how such locking and unlocking
mechanisms are implemented in systems.

Thank you.

351
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 06
Lecture – 25
How to Implement Locking (Software Solutions)

Hello. In the previous video, we had seen an introduction to critical sections. We
had seen that in order to solve the critical section problem, there were 3 requirements: the first
is mutual exclusion, the second progress and the third bounded wait. In this video, we will see some
software solutions for the critical section problem.

(Refer Slide Time: 00:43)

So let us start with the simplest solution, that is, disabling interrupts. This
solution is only applicable for single core systems. In multicore systems, essentially
because an advanced programmable interrupt controller is present,
interrupts are routed to several processors. Therefore this solution will not work on
multicore systems. On the other hand, for single core processors or single core systems,
disabling interrupts will work. Essentially, we had seen that if interrupts are disabled then
context switching is prevented, and preventing context switching would mean that
the critical section would not be pre-empted during its execution.

352
Let us look at how interrupts can be used to solve the critical section problem (refer
above slide). So, we have the two processes, process 1 and process 2 and they have this
critical section which is shown in red. In this critical section, both processes access or
modify the same shared data. In order to ensure that the critical section problem is solved
and that both processes do not end up in the critical section at the same time, we disable
interrupts before entering into the critical section.

On the other hand, while leaving the critical section the interrupts are enabled
again. Disabling interrupts, as we have seen, will prevent the process from getting pre-
empted at the end of its time slice. Essentially, no timer interrupt
would occur, so process 1 would not be pre-empted. Therefore, we see that
this is a very simple solution, and process 1 will continue to execute without any
interrupts occurring, as a result of the disabling of interrupts.

Similarly, when process 2 enters into this critical section, it disables interrupts; and on
leaving the critical section, it enables interrupts again (mentioned in above slide). So, this
application of disabling and enabling interrupts is similar to the locking mechanism
before entering the critical section and the unlocking mechanism at the exit of the critical
section. The locking will ensure that only one process enters or executes in the critical
section at a given time. While the unlocking mechanism that is when interrupts are
enabled would then allow other processes the chance to enter or execute in the critical
section.
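
For kernel code only, the idea can be sketched as follows; the cli()/sti() wrappers here are assumed helpers around the privileged x86 instructions that clear and set the interrupt-enable flag (xv6 has similar inline functions), and, as discussed next, this only helps on a single core.

/* Kernel-only sketch: cli and sti are privileged x86 instructions
 * and will fault if executed from user space. */
static inline void cli(void) { asm volatile("cli"); }   /* disable interrupts */
static inline void sti(void) { asm volatile("sti"); }   /* enable interrupts */

int shared_data;                 /* data manipulated in the critical section */

void update_shared_data(void)
{
    cli();                       /* no timer interrupt, hence no preemption */
    shared_data++;               /* critical section */
    sti();                       /* interrupts, and context switches, allowed again */
}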

So, this method of solving the critical section problem is simple. It just requires a single
instruction in order to disable or enable interrupts, thus preventing context switches
from happening. However, the limitation of using interrupts is that it requires a higher
privilege level. So, normal application programs which run in user space will not be able
to disable and enable interrupts in such a way. Therefore, this solution is only applicable
for code that runs in the kernel.

So it is only the kernel or the operating system that is allowed to disable interrupts and
enable interrupts again. Therefore, this solution is only applicable for operating system
code. And as we have seen before, this disabling and enabling of interrupts is not
suited for multicore systems, essentially because when interrupts are disabled on a
multicore system, it would mean that interrupts corresponding to that core alone are

353
disabled. On the other hand, interrupts are still allowed on the other cores which are
running. Thus, on a multicore system, when disable interrupts is invoked, interrupts to
only that core are disabled. Other processors, such as the
processor which is running this process 2 (mentioned in above slide) will not be affected
by the disabling of interrupts.

Therefore, this solution is not suited for multicore systems. So, what we will do next is
we will try to actually build our own solution for the critical section problem. So, we will
start with a very simple solution, we will see its drawbacks, and then we will gradually
modify that particular solution until we reach a point where we have solved the critical
section problem. So let us look at this.

(Refer Slide Time: 05:47).

So let us start with our first attempt. So let us say we have this critical section in process
1 and this critical section in process 2 (refer above slide) and both these critical sections
are manipulating the same data. In addition to this shared data, we also define another
shared variable known as turn, and set the value of turn to 1 (i.e. turn = 1;). What happens here?
Let us look into it in more detail. When process 1 begins to execute it sees that turn = 1,
therefore this while loop immediately exits (because while condition is false) and the
critical section gets executed.

Now during the time when process 1 is in the critical section, when process 2 arrives at
this particular point, it sees that turn = 1, and therefore it will continue to loop in this

354
particular while loop (because while condition is true). The only time that it is capable of
exiting from this while loop is when the 1st process sets turn to 2, so when the value of
turn is set to 2, and remember that turn is a shared variable, therefore, this while loop in
process 2 will terminate. And process 2 will then enter into the critical section. At the
exit of the critical section, process 2 will set the value of turn to 1 (i.e turn = 1;). So,
setting the value of turn to 1 would mean that process 1 can then enter into the critical
section.
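
Written out as C, this first attempt might look like the sketch below (volatile keeps the compiler from optimizing the busy-wait away; real code would also need memory barriers):

volatile int turn = 1;        /* shared: whose turn it is to enter */

void process1(void)
{
    while (turn == 2)         /* lock: spin while it is process 2's turn */
        ;
    /* critical section */
    turn = 2;                 /* unlock: hand the turn to process 2 */
}

void process2(void)
{
    while (turn == 1)         /* spin until process 1 hands over the turn */
        ;
    /* critical section */
    turn = 1;                 /* hand the turn back to process 1 */
}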

So, essentially when we look at the locking and unlocking mechanism to solve this
critical section problem, we see that this while loop present over here (second while in
process) is the locking mechanism, while the unlocking mechanism is the third line that
is setting the value of turn to 2 (i.e turn = 2 in process 1). The second observation that we
make is that this algorithm or this solution for the critical section problem requires that
process 1 executes the critical section first, and then, because turn is set to 2 over here (last
line in process 1) after process 1 completes its critical section execution, process 2
should execute. Once process 2 completes its execution, process 1 will execute, and so
on.

So essentially there is an alternating behaviour for the critical section (refer last bullet
point in above slide image). First, process 1 needs to execute the critical section because
the value of turn is equal to 1 (turn = 1;). Then when it completes it sets turn to 2 (turn =
2;), so process 2 will execute. And process 2 will set the value of turn to 1 (turn = 1;) at
the end of its critical section which will result in process 1 executing in the critical
section and so on.

So, we see that there is an alternating behaviour between process 1 and process 2. Quite
naturally because the value of turn is shared between both the processes and as we have
seen only one process could satisfy this while condition, and break out from this while
loop. Therefore, exactly one process would enter into the critical section at a given time.
Therefore, this solution achieves mutual exclusion. However, there are two limitations of
this solution; one is busy waiting. While process 1 is executing in this critical
section, process 2 would endlessly execute in this particular while loop.

Note that process 2 is in the ready state or the running state, not in the blocked state,
and therefore it would consume power as well as time, because it needs to execute on the

355
CPU periodically in order to check the value of turn. Another limitation, as we
have seen before, is that this solution for the critical section problem requires that
process 1 and process 2 alternate their access to the critical section. What it means is
that first process 1 should execute then process 2, process 1, process 2 and so on. So, this
creates a problem especially with the progress requirement of the critical section
solution.

So, the progress requirement stated that if no process is in the critical section,
and a process requests the critical section, then it should be offered the lock;
essentially, it should be able to execute in the critical section. We see that this solution
will not satisfy the progress requirement. For instance (refer above slide image), let us say
that process 2 begins to execute before process 1. Process 2 comes here (check while
condition) and executes while (turn == 1), and since process 1 has
not yet started, turn remains 1 and process 2 keeps spinning. If the progress condition were to be met, process 2 should enter
into the critical section immediately. But this is not so; therefore this is not a good solution for the
critical section problem.

(Refer Slide Time: 11:16)

The main drawback or the main problem with our first attempt to solve the critical
section problem was that it used a common turn flag that was modified by both
processes. This turn flag was set to 2 in process 1 and set to 1 in process 2 (last line in
both the processes); and this forced the two processes to alternate execution in the critical

356
section. So, one possible improvement we could make
in our second attempt is to not have a single shared flag; instead we will have 2
flags, one for each process.

(Refer Slide Time: 12:02)

So let us see the second attempt to solve the critical section problem. So, this is the
second attempt (refer above slide image). So, unlike the previous attempt we now have 2
flags present. These 2 flags are also shared among the two processes; that means that this
flag p2_inside can be accessed from both process 1 as well as process 2. However, it is
process 2 which actually changes or modifies this flag; process 2 changes it from false
to true or from true to false, while process 1 only reads the status of this flag; it only
determines whether the flag is true or false.

Similarly, the second flag is p1_inside. Now these flags p2_inside and p1_inside are used
during the locking and unlocking of the critical section. Essentially before process 1
enters into the critical section, it does two things. First, it needs to check that the second
process, that is p 2, is not inside the critical section, and this is the case when p2_inside =
false. If p2_inside = true, it would mean that the second process is executing in the
critical section. Then it sets its own flag, that is p1_inside = true, and executes in the
critical section (process 1 critical section), while at the exit of the critical section it sets
p1_inside = false.

357
Similarly, process 2 would first execute this while loop (check while condition) until
p1_inside = false (i.e. as set by process 1 while exiting); a value of p1_inside equal to
false would mean that process p 1 has exited from the critical section and is no longer in
it. Once this condition is obtained, the while loop completes and
execution comes over here, where process 2 sets p2_inside = true, indicating that it is
inside the critical section; and at the end of the critical section, just before exiting, it sets
p2_inside = false.
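
A sketch of this second attempt in C (again with volatile shared flags; not the slide's exact code):

volatile int p1_inside = 0;    /* true while process 1 is in the critical section */
volatile int p2_inside = 0;    /* true while process 2 is in the critical section */

void process1(void)
{
    while (p2_inside)          /* wait until process 2 has left */
        ;
    p1_inside = 1;             /* set only AFTER the wait loop -- the weakness */
    /* critical section */
    p1_inside = 0;
}

void process2(void)
{
    while (p1_inside)          /* wait until process 1 has left */
        ;
    p2_inside = 1;
    /* critical section */
    p2_inside = 0;
}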

Now this second solution does not require that the two processes alternate execution in
the critical section. So, for instance, since the initial values of p1_inside and p2_inside
are false, suppose process 2 executes first, so it sees p1_inside = false, so it breaks from
this (while) loop and enters into the critical section. Similarly, if process 1 does not start
executing, it will keep entering into the critical section without requiring process 1 to
execute. However, the problem with this particular solution is that it does not guarantee
mutual exclusion; rather, under some circumstances it is possible that process 1 as well as
process 2 are present in the critical section at the same instant of time.

(Refer Slide Time: 15:45).

So let us see when this mutual exclusion does not hold for our second attempt. This
particular table here (refer above slide image) shows the execution of various statements
in the CPU with respect to time. That means that this particular while loop (while loop in
the first row of the table) executes, and since it is in blue colour it

358
corresponds to the process 1. Then while (p1_inside == true) (third row in the table) and
p2_inside = true (fourth row in the table), so this is in the green colour, so it corresponds
to process 2. And p1_inside = true (sixth row in the table) again corresponds to process
1.

So, this is how the CPU would execute instructions corresponding to the two processes
and we have like several context switches that occur during the execution and these are
the status of the various flags as the execution progresses (second and third column in the
table).

So let us start with process 1. We have initially set p1_inside and p2_inside to both be
false, indicating that both processes are not in the critical section. Therefore, this
particular while loop (while loop mentioned in first row) will complete, essentially
because p2_inside is false. Now suppose a context switch occurs, resulting in
process p 2 executing; it will also execute its while statement, and since p1_inside is
false (mentioned in third row of the table), the while loop will break. It then sets
p2_inside = true (fourth row of the table).

Now immediately after setting this suppose there is a context switch that occurs and
process p 1 will continue to execute and it sets p1_inside = true (mentioned in last row).
Essentially, due to the context switch, it will continue to execute from where it had
stopped, that is, just after completion of the while loop (first while
loop in the table), and then it continues with the next instruction over here (last row in the
table), that is, setting p1_inside = true. So here we have a state where p2_inside
= true (mentioned in fourth row), indicating that p 2 is in the critical section, as well as
p1_inside = true, indicating that p 1 is also in the critical section. Therefore, we have not
been able to achieve mutual exclusion.

359
(Refer Slide Time: 18:15).

So, the main problem with this second attempt to solve the critical section problem was
that we had two flags, p1_inside and p2_inside. However, they were set after we break
from the while loop (second while condition in both processes); that is, only after this
particular while loop completes execution are we setting
p1_inside = true and p2_inside = true. Essentially, what this means is that p1_inside and
p2_inside are set to true only within the critical section, and this is what created the
problem. So let us see if we can actually change this ordering a bit.

(Refer Slide Time: 19:03)

360
So, we will start with the third attempt at solving the critical
section problem. What we will do is just a simple change: we will still have two
flags, but instead of p1_inside we will now have p1_wants_to_enter the critical section,
and p2_wants_to_enter the critical section. Essentially, each of these flags is set to
true when the corresponding process wants to enter into the critical section. So, the
minor change we make here with respect to the second attempt is that process 1 first
shows its intention to enter into the critical section by setting p1_wants_to_enter = true.

And only then will it try to determine if process p 2 is in the critical section or not. So,
essentially it sets p1_wants_to_enter to true and then it executes in this while loop
until p2_wants_to_enter is false. When p2_wants_to_enter becomes false, indicating
that the second process has completed execution in the critical section, then process 1 will
enter into the critical section.

At the end, after the critical section has completed execution p 1 sets the flag
p1_wants_to_enter = false, so this setting of false will allow process 2 to enter into the
critical section. Note that the only difference of this, compared to our previous
attempt, was that we moved the setting of the flag from inside the critical section, that is after the while
loop, to before the while loop. What we see is that this particular solution achieves
mutual exclusion, unlike the previous attempt where we obtained a condition where
mutual exclusion was not achieved and we had two processes in the critical section at the
same time.

In this solution, the mutual exclusion is achieved; essentially, it is guaranteed that when
process 1 is in the critical section, process 2 is not in the critical section and vice versa.
What is not guaranteed is Progress. Essentially, we could obtain a state where both
processes are endlessly waiting, endlessly executing in their while loops; we will see
how this occurs. Such a state is known as a deadlocked state.
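
In code, the third attempt can be sketched as follows; the only change from the second attempt is that each process raises its flag before waiting.

volatile int p1_wants_to_enter = 0;    /* process 1's intention flag */
volatile int p2_wants_to_enter = 0;    /* process 2's intention flag */

void process1(void)
{
    p1_wants_to_enter = 1;             /* announce intention BEFORE waiting */
    while (p2_wants_to_enter)          /* wait until process 2 is done */
        ;
    /* critical section */
    p1_wants_to_enter = 0;
}

void process2(void)
{
    p2_wants_to_enter = 1;
    while (p1_wants_to_enter)          /* if both flags are up, both spin forever */
        ;
    /* critical section */
    p2_wants_to_enter = 0;
}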

361
(Refer Slide Time: 21:54)

So let’s see the deadlock situation in this particular case. So let us look in this table (refer
above slide), which shows how the CPU executes instructions with respect to time, and
also this table shows the various values of the flags which are present. Let us start with
process 1 executing and the CPU executes this particular statement (first row statement)
and it sets the value of p1_wants_to_enter = true. Then there is a context switch, which
results in process 2 executing which sets the flag p2_wants_to_enter = true.

Now you see that both processes have set their respective flags to true (as mentioned
above). Now when both processes enter into this while loop (second while loop in above
program), we see that for both process 1 and process 2 the flag being checked is
true, which means that both processes will continue to execute their while loops endlessly.
Essentially, process 1 is waiting for process 2 to set its flag to false, while process 2 is
waiting for process 1 to set its flag to false. So, essentially we have reached
a state where each process is waiting for the other process to do something.
This is what is known as a deadlocked situation, and it can lead to a considerable amount
of problems in operating systems.

362
(Refer Slide Time: 23:39).

So, to give a clearer view of what a deadlock is, we have this particular figure here
(mentioned in above slide), where we have process 1 (blue colour) and process 2 (green
colour) present here. Now process 1 is in the while loop and it is looping continuously
waiting for process 2 to set this flag p2_wants_to_enter = false. Similarly, process 2 is
waiting in the while loop or looping continuously and waiting for this flag
p1_wants_to_enter = false. However, the problem here is that process 1 needs to change
the value of this flag (p1_wants_to_enter) because this flag can be only changed by
process 1; similarly process 2 needs to change the value of this flag (p2_wants_to_enter)
since this flag can be only changed by process 2.

And therefore, we see that there is a tie; process 1 is waiting for process 2 to change the
value of its flag (p2_wants_to_enter); however, process 2 cannot change the value of its
flag because it is waiting for process 1 to change the value of the flag
(p1_wants_to_enter). So, we get some kind of a cycle present over here (refer above
slide image). So, this cycle will result in both process 1 and process 2 waiting for each
other for an endless amount of time, so this is known as a deadlock. So, in a later video,
we will look more into details about how deadlocks are caused and how they can be
prevented and avoided.

363
(Refer Slide Time: 25:18).

So let us see what the problem was with the third attempt. Essentially we had the process
setting its flag and waiting for the other process's flag to be set to false. And we have
seen that because both processes may enter into the while loop and wait for each other,
we would have a deadlock situation. The next thing to do is, whenever we have
such a deadlock situation, to find a way to actually come out of the
deadlock. Essentially, we need to ensure that one and exactly one of these two processes
will break from the while loop and enter into the critical section.

(Refer Slide Time: 26:03)

364
So, this solution was proposed by Peterson, and it is what is known as Peterson's solution.
It essentially has another variable known as favored (refer above slide image); this
is a globally defined variable which is shared among the two processes. Essentially, what
the favored variable does is that whenever a deadlock situation occurs, it favors one of the
processes, and that process would then enter into the critical section, while the other
process continues to wait until the favored process sets its flag back to false.

So, essentially, in Peterson's solution each process sets the value of favored to the other process, so that it
favors the other process to enter into the critical section. Favored is used to break
the tie when both p 1 and p 2 want to enter into the critical section, that is, when both p 1
and p 2 are waiting in, or rather executing in, this while loop. The
reason this particular solution works is that there are exactly two
values that favored can take, it can take only 1 or 2, and therefore only one of these
while loops will actually break or complete execution.

(Refer Slide Time: 27:31)

So let us see this in more detail. So, we have seen the two processes p 1 and p 2, and
everything is as in the third attempt: p1_wants_to_enter is set to true,
then there is a while loop, then the critical section, and then p1_wants_to_enter is set to
false (refer above image process 1). Now we also have favored added here in the
Peterson solution; favored is set to 2 over here, and favored is set to 1 in the second
process (mentioned in above slide). So, in all other cases this particular Peterson's

365
solution behaves like our third attempt to create the critical section solution. However the
only difference comes when both processes enter into this while loop at the same time. In
such a case, the value of favored will determine which of the two processes enters into the
critical section.

So let us say process 2 is favored and it enters into the critical section, and at the end of
its critical section it sets p2_wants_to_enter to false. So, remember that during this entire
period p 1 is still waiting or still executing in the while loop. Now when process 2 sets its
flag to false, it immediately causes this while loop to break (process 1 while loop), as a
result process 1 will enter into the critical section. So, you see that only when process 2
exits from the critical section will process 1 enter into the critical section, thus
achieving mutual exclusion. Also, we have seen that the bounded wait requirement is satisfied,
as well as the progress requirement. Peterson's solution works very efficiently
for two processes.
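
Putting these pieces together, Peterson's solution for two processes can be sketched as below (volatile shared variables; on modern hardware with relaxed memory ordering, explicit memory barriers would also be required):

volatile int p1_wants_to_enter = 0;
volatile int p2_wants_to_enter = 0;
volatile int favored = 1;              /* used only to break ties */

void process1(void)
{
    p1_wants_to_enter = 1;
    favored = 2;                                 /* favor the other process */
    while (p2_wants_to_enter && favored == 2)    /* spin only if P2 wants in AND is favored */
        ;
    /* critical section */
    p1_wants_to_enter = 0;
}

void process2(void)
{
    p2_wants_to_enter = 1;
    favored = 1;                                 /* favor process 1 */
    while (p1_wants_to_enter && favored == 1)
        ;
    /* critical section */
    p2_wants_to_enter = 0;
}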

In the next video, we will see the bakery algorithm which is used to solve the critical
section problem when there are multiple processes in the system. So, the bakery
algorithm solves the critical section problem when the number of processes is greater
than two.

Thank you.

366
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 06
Lecture – 26
The Bakery Algorithm for Synchronization

In this video, we will look at the Bakery Algorithm, which is a software solution for the
critical section problem. Compared to Peterson's solution, which we have seen
previously, this algorithm is better suited for a larger number of processes. Essentially, when
the number of processes is greater than 2, the bakery algorithm
works efficiently.

(Refer Slide Time: 00:43)

So, the bakery algorithm was invented by Leslie Lamport, and you can get more
information from this particular website present here (mentioned in above image). So,
the essential aspect of the bakery algorithm is the inspiration from bakeries and banks. In
some bakeries, what we see is that when we enter the bakery, we are given a particular
token for instance in a bakery, we would be given a token with a number present over
here (mentioned in token in above image).

So in this particular case, we have a token number 196 which is present. Now we need to
wait for some time until the token number 196 is called out. So, we have a display over

367
here (mentioned in above slide) which periodically would set a token number, and when
the token number of 196 is displayed then you are able to get your food from the bakery
and you could eat. So, essentially when we look at this from synchronization aspect, we
see that we are trying to synchronize the usage of a particular counter. So all people who
have such a token should wait until their number is called, then sequentially each person
depending on when the number is called goes into the counter, and is able to collect
whatever he or she wants and for instance eat.

(Refer Slide Time: 02:16)

So, we will see how the bakery algorithm is used to solve the critical section problem.
So, we will start with the simplified analysis of the bakery algorithm. So, this particular
algorithm (mentioned in above slide) is used to solve the critical section problem when
there are N processes involved and all these N processes access the same critical section.
Now there is also a global data which is shared among these N processes and this data is
known as num. The num array is of size N; essentially each process has a
particular index in this num array.

So, for instance process 0 would have a flag corresponding to num[0], process 1 has
num[1], process 2 has num[2] and so on. Secondly, at the start of execution, this value of
num is all set to 0s. Now in order to enter a critical section, a process would first need to
invoke the lock call with the value of i (i.e lock(i)). So, i here is the process number for
example, a process id. So, we have N processes so, the value of i could be from 0 to N -

368
1. And at the exit of the critical section, the function unlock i is invoked (i.e. unlock(i))
where the value of num[i] is set to 0.

So let us see what lock and unlock actually do internally. When a process
invokes lock(i), where i is its number, its corresponding num value, num[i], is set to
the maximum value plus one; essentially this particular function MAX (mentioned in above
image) is going to look at all the numbers corresponding to all of the processes, get
the maximum from that, and add one to it. So, num[i] is going to get the highest
number present in the shared num array. Second, there is a for loop which
scans through all processes, i.e. for(p=0;p<N;++p).

And within this for loop (as mentioned earlier), there is a while loop which checks two
things: it checks that num[p] != 0 and num[p] < num[i]. Essentially, this particular
while loop will break when either num[p] is 0 or this particular condition (i.e. num[p] <
num[i]) is false, that is, when num[i] <= num[p]. So, essentially, process i will enter
into the critical section only when it has the lowest nonzero value in the num array.
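
A sketch of this simplified version is given below; the doorway (taking MAX of all tokens and adding one) is written here as a plain loop, so it is atomic only under the assumption discussed next.

#define N 5                    /* number of processes */

volatile int num[N];           /* num[i] == 0 means process i does not want the lock */

/* Doorway: compute the largest existing token.
 * The simplified analysis ASSUMES that num[i] = max_token() + 1
 * below executes atomically. */
static int max_token(void)
{
    int max = 0;
    for (int p = 0; p < N; p++)
        if (num[p] > max)
            max = num[p];
    return max;
}

void lock(int i)
{
    num[i] = max_token() + 1;                      /* take the next token */
    for (int p = 0; p < N; p++)
        while (num[p] != 0 && num[p] < num[i])     /* wait for every smaller nonzero token */
            ;
}

void unlock(int i)
{
    num[i] = 0;                                    /* give up the token */
}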

(Refer Slide Time: 05:25)

So let us look at this with an example (refer above slide image). Let us assume that
we have five processes P 1 to P 5, and these are the num values (initially 0). The num
array also has 5 elements, and each of these elements corresponds to a process. So
num[0] corresponds to process P 1, and num[1] corresponds to P 2, num[2] corresponds
to P 3 and so on. Now let us also assume that all these processes almost simultaneously

369
invoke the lock(i) function, essentially all these processes want to enter into the critical
section almost simultaneously. Then what would happen?

(Refer Slide Time: 06:11)

Let us say process P 3 begins to execute. So, it comes here (red box in above image) and
the CPU finds that the MAX of all the numbers which are present is 0, so the num
entry corresponding to P 3 is set to 1.

(Refer Slide Time: 06:29)

Then let us say P 4 executes and it gets a value of num[4] equal to 2 (mentioned in above
slide below P4).

370
(Refer Slide Time: 06:39)

Then P 5 executes, and P 2 executes and they get corresponding values of num as 3 and 4
(mentioned in above image). Now, let us say that we come into this part of the loop (i.e
for loop), and we see that the process with the lowest nonzero value of num would enter
into the critical section. So in this particular case, we scan through all these particular
values of num and we see that P 3 has the lowest value. Therefore, P 3 needs to execute
in the critical section.

(Refer Slide Time: 07:10)

371
So, P 3 executes in the critical section and at the end of the critical section it sets the
corresponding value to 0 (i.e. num[i] = 0). The other processes are meanwhile still waiting in
this particular loop.

(Refer Slide Time: 07:25)

So, the next lowest number which corresponds to the process P 4 and has a value of 2
would get to execute in the critical section. Therefore, process P 4 enters into the critical
section and at the end of it the number corresponding to process P 4 is set to 0. And then
process P 5 executes and then process P 2 executes (mentioned in above image and
setting value of num to 0).

372
(Refer Slide Time: 07:44)

So, process P 2 would execute, because it is the only nonzero number which is present.
So, at the end of the P 2 execution, we get all values of num which are back to 0.

(Refer Slide Time: 08:06)

So, one requirement or one assumption that we made over here (blue box statement in
above image) is that this particular assignment of MAX needs to be atomic; essentially
this is required to ensure that no two processes get exactly the same token.

373
(Refer Slide Time: 08:23)

Essentially, it means that when a particular process is executing this particular statement
(blue box statement in above image), that is, finding the MAX of all these numbers and
adding 1 to it, no context switch can occur. This entire statement executes as a single
entity. So, the reason why we make this particular assumption is that we need to ensure
that no two processes get the same number. So let us see what would happen if we
actually have two processes having the same number, essentially what would happen if
this doorway (refer above slide image) or this statement which is known as the doorway
is not atomic. So, we will take our example of the five processes and we will look with
respect to this particular example.

374
(Refer Slide Time: 09:19)

So, as usual, let us say process P 3 invokes lock first, and it obtains the number 1 because
all other numbers are 0, so the maximum is 0 and adding 1 gives 1.

(Refer Slide Time: 09:34)

Then let us assume that process P 4 and P 5 simultaneously execute MAX, resulting in
both of them getting the value of 2 (refer above slide image).

375
(Refer Slide Time: 09:45)

And then of course, we have the process p 2 which gets the value of 3. Now, what would
happen in the second part of this lock (blue box statement)? So, we would see as usual
process P 3 is going to execute first, because it has the lowest number. And once it exits
from the critical section, P 3 is going to set its corresponding number to 0; therefore this
number (1 corresponding to P 3 as mentioned above) is set to 0.
Next, the two smallest numbers, corresponding to P 4 and P 5, are equal (i.e.
2). As a result of this, process P 4 as well as process P 5 enter into the
critical section simultaneously.

And thus we do not achieve mutual exclusion; therefore it is required that this MAX
operation is atomic. This will ensure that no two processes get the same value for
num, and thereby it will ensure that the critical section is executed exclusively by one
process at any given instant of time. Next, what we are going to look at is the relaxation
of this particular assumption; we are going to look at what is known as the original
bakery algorithm, where we do not require this statement to be atomic.

376
(Refer Slide Time: 11:05)

The original bakery algorithm is as follows (refer above slide image). In addition to the
shared array num, which is present as before, we also have a shared array called
choosing. This is a Boolean array, so each element can have the value true or false, and the
length of this array is N; that is, each process has a particular element in
choosing. Essentially, choosing is set to true before the process
invokes MAX, and after this MAX function is invoked and 1 is added, the
process sets its choosing value back to false (3 statements in blue box mentioned in
above image).

So, there are also some minor changes in the second part of the algorithm. First, we have
a statement called while(choosing[p]);, which is present here (mentioned above as blue
arrow). This particular statement ensures that process p is not at the doorway,
that is, it ensures that process p is not currently being assigned a new number
through MAX, in other words that it is not choosing a new value of num.

377
(Refer Slide Time: 12:26)

Secondly, the condition check in the second part, the while loop, changes over here
(blue arrow mentioned in above slide): we now compare the tuple
(num[p], p) with (num[i], i) using a less-than on tuples. What this condition check means is
written over here (mentioned in above slide image): (a, b) <
(c, d) is the same as (a < c) or ((a == c) and (b < d)).

This complex-looking check is used to break the tie when two processes
have the same num value. As we have seen before, if two processes are given the same
value of num, then this condition (mentioned above) is used to resolve the
issue and ensure that only one of the two processes enters the critical
section. When num[p] == num[i], that is, both numbers have the same value, we need
to favour one of the processes. In such a case, we favour the process with the smaller
process id.
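
The full algorithm can then be sketched as follows: choosing[p] marks a process that is currently at the doorway, and the tuple comparison favors the smaller process id when tokens are equal (again a sketch with volatile shared arrays, ignoring memory-ordering details).

#define N 5

volatile int choosing[N];    /* true while process p is picking its token */
volatile int num[N];         /* 0 means process p does not want the lock */

void lock(int i)
{
    int max, p;

    choosing[i] = 1;                     /* entering the doorway */
    max = 0;
    for (p = 0; p < N; p++)              /* MAX no longer needs to be atomic */
        if (num[p] > max)
            max = num[p];
    num[i] = max + 1;
    choosing[i] = 0;                     /* done choosing */

    for (p = 0; p < N; p++) {
        while (choosing[p])              /* wait while p is still at the doorway */
            ;
        /* wait while p holds a smaller token, or the same token with a smaller id */
        while (num[p] != 0 &&
               (num[p] < num[i] || (num[p] == num[i] && p < i)))
            ;
    }
}

void unlock(int i)
{
    num[i] = 0;
}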

378
(Refer Slide Time: 13:44)

So let us see this with the same example as we have seen before. So let us look at this example
again and let us say as usual process P 3 executes this MAX first and is given the
smallest number, then P 4 and P 5 happen to get the same number of 2 and then process P
2 gets the next highest value of 3. Now, let us look at the second part of this locking
(refer above slide image). The first process to execute in the critical section is quite
obviously P 3, because it has the lowest number, so P 3 executes. And at the end, it
will set its num value back to 0. Now the next process to execute could be either P 4 or P
5.

So, how do we choose between these two processes? We have seen that both num[p] and
num[i] are 2 in such a case, therefore, in order to favour one of the processes, we look at
the second part, that is p and i (only p and i mentioned in while condition). Based on
this we favour the process which has the lower process id, and therefore process P 4 executes
in the critical section. So, after P 4 executes its value is set to 0, and then quite naturally
P 5 executes.

And after P 5 executes, as usual, P 2 will execute. Thus we see that the addition of the choosing
Boolean array, as well as the more complex conditional check, removes
the need for an atomic MAX operation. So, this is the original bakery
algorithm which was proposed by Leslie Lamport; and it efficiently helps to solve the
critical section problem when the number of processes is greater than 2.

379
Thank you.

380
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 06
Lecture – 27
How to implement Locking (Hardware Solutions and Usage)

Hello. In this particular video we will look at Hardware Solutions to Solve the Critical
Section Problem.

So in the previous video we had looked at Software Solutions to Solve the Critical
Section Problem. However, in practice these solutions are not very efficient. So in
practice, both in the operating system as well as in applications, hardware techniques are
used to efficiently solve the critical section problem. We will start this video with a
small motivating example.

(Refer Slide Time: 00:52)

So let us start with this particular example (refer above slide image). Let us say we have
two processes, process 1 and process 2, and both have a similar critical section
that accesses the same shared data. Also, let us say that we have this shared variable
called lock (mentioned in above slide), which is shared between process 1 and
process 2. Now what each of these processes does is two things: first, in order to lock the
critical section, the process would first execute this while loop until lock becomes 0 (i.e

381
while(lock != 0)). When lock becomes 0 then the process would enter here (below while
condition) and set lock to 1 and then execute the critical section. At the end of the critical
section the process would set lock to 0 in order to unlock the section. The question that I
would pose over here is: does this particular scheme achieve mutual exclusion? The
answer is no.

Essentially, we have seen such things in the previous video as well. Due to context
switching between the two processes we would sometimes reach a state where both
processes execute in the critical section. Thus, we will not be able to achieve mutual
exclusion. So, for instance if we actually look at this (refer above slide 3rd block of
statements), we see that lock has an initial value of 0. Then suppose process P1 executes
in the CPU, and it executes while(lock != 0); since lock is 0 over here
(before the while condition), this particular while loop will break. However, a
context switch (2nd dotted line in 3rd block) occurs before the process could set lock
equal to 1.

Now, this context switch results in process P2 executing. Now P2 will also see the value
of lock equal to 0, and therefore break from this (while condition of P2) while loop and
then set lock to 1. Now again there is a context switch (3rd dotted line) and
process P1 continues to execute from where it stopped. It had already seen the value of
lock as 0 before the context switch, because its saved context is restored
to what it was when the switch occurred. Therefore, process P1 still behaves as if
lock were 0: it sets lock equal to 1 (i.e. lock = 1) and enters into the
critical section. Thus we have reached a state where both processes are in the critical
section.
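
Written out, the broken scheme is just the following; the comment marks the window in which the context switch described above lets both processes in.

volatile int lock = 0;        /* 0: free, 1: held */

void broken_enter_and_exit(void)
{
    while (lock != 0)         /* wait until the lock looks free */
        ;
    /* <-- a context switch here lets the other process also see lock == 0 */
    lock = 1;                 /* too late: the check and the set are not atomic */
    /* critical section */
    lock = 0;                 /* unlock */
}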

382
(Refer Slide Time: 03:42)

Now, the main reason that this particular scheme failed was that as we have written this
(set of statements) in software, we were unable to make these two statements atomic.
That is, we are unable to ensure that context switches do not occur in between these
two statements. Said another way, we are unable to make these two statements execute as
a single unit. And thus, due to this reason we were unable to achieve mutual exclusion.
Now most processors such as the x86 Intel processors have dedicated instructions that
will ensure that these statements are executed atomically.

Let us take an example of such an atomic instruction through which we could
implement this set of statements (refer above slide image) in an atomic manner. We
will take a very general instruction first and then analyse it specifically for Intel x86
processors.

383
(Refer Slide Time: 04:51)

The first instruction that we would see today is the Test and Set Instruction. Essentially if
we were to write this instruction in C, it would look something like this (mentioned in
above image). It is a function which takes a pointer to a memory location. In this
function (i.e. test_and_set), first the contents of that memory location are stored into a
variable called previous, i.e. prev; then that memory is set to 1 (i.e. *L = 1); and the value
returned is the previous value. In other words, this function would return the previous
contents of the memory and set that memory to a value of 1.
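
In C-like pseudocode, the function just described would be the following; remember that on real hardware all of this is one atomic instruction, so the code is only a description of its effect.

/* Effect of the test-and-set instruction, written as C pseudocode. */
int test_and_set(int *L)
{
    int prev = *L;    /* remember the previous contents of the memory location */
    *L = 1;           /* set the memory location to 1 */
    return prev;      /* return what was there before */
}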

Now, when we look at this (above mentioned function) from a hardware perspective and
essentially from the processor perspective, this entire function is an atomic function.
Essentially, we would have one instruction that would do all of this in one shot, that is,
one instruction that would perform all of these operations atomically.
So let us look at this particular diagram to demonstrate how this thing actually works.

So let us say that we have the processor and the memory location pointed to by L is over
here. So, what would happen when this test and set gets executed, is the following
(mentioned in above slide image): first, the previous contents of the memory get loaded
into a register present in the processor, and then the memory content is set to 1. So let us
look at it another time; first the memory contents pointed to by L is loaded into a register
present in the processor and then the contents of that memory location is set to 1.

384
(Refer Slide Time: 06:46)

Now, the atomicity of the test and set instruction can be defined or explained as this
statement here (mentioned in above image in box), which states that if two CPUs execute
test and set at exactly the same time, the hardware, that is the processor, ensures that the
test and set of one of these processors does both its operations, that is, reading the contents
of the memory into a register and setting that particular memory location; both these
steps are done before the other processor's test and set starts executing.

In other words, it is not possible for two processes to execute the test and set instruction
and for both processes to read the previous value of the memory location. So, essentially, the
scenario shown here (mentioned in above image) is not correct. That is, it is not possible for
both processes to simultaneously read the contents of the memory location (i.e 0
mentioned above) and set the value of memory (i.e setting 1).

385
(Refer Slide Time: 07:56)

However, what actually is guaranteed by the processor hardware is that when processes
run on each processor and both processes execute the test and set instruction at exactly
the same time, the hardware will ensure that one process completes its entire instruction
before the next process could execute its instruction. Therefore, in such a case one
process would read the value of 0 while the other process would read the value of 1
(mentioned in above image). Now we will see how this particular instruction is used to
solve the critical section problem.

(Refer Slide Time: 08:30)

386
So this particular snippet over here (the while condition mentioned in the above image) shows how a process could use the test and set atomic instruction, which the hardware supports, in order to solve the critical section problem. Essentially, in order to lock the critical section we have a while loop (the second while condition in the above snippet) which invokes test and set with the lock, that is, with the memory location (&lock). This while loop executes continuously until the value returned by test and set is 0; when the value returned by the test and set instruction is 0, the critical section is entered, and at the end of the critical section the value of lock is set back to 0.

When there are multiple processes having such a code snippet, the hardware guarantees that the test and set of exactly one process returns a value of 0. All other processes will loop continuously in this particular while loop. When the first process has completed executing the critical section, the value of lock is set to 0. Setting the value of lock to 0 results in exactly one other process obtaining a value of 0 from the return of test and set. Therefore, exactly one other process would then enter into the critical section.

So the first invocation of test and set will read a 0, set the lock to 1 and return. The second test and set invocation will then see the lock value as 1 and will loop continuously until the lock becomes 0. The value of the lock becomes 0 only when the first process unlocks it, that is, explicitly sets the value of lock to 0. So in this way we see that with a little help from the hardware it is feasible to solve the critical section problem very efficiently.
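For concreteness, a sketch of such a test-and-set based lock is given below (the function names enter/leave_critical_section are mine, not from the slide; lock is the shared variable, initially 0, and test_and_set is the atomic operation sketched earlier):

    int lock = 0;                        /* 0: free, 1: held */

    void enter_critical_section(void)
    {
        while (test_and_set(&lock) != 0)
            ;                            /* spin until we are the one that read 0 */
        /* critical section follows */
    }

    void leave_critical_section(void)
    {
        lock = 0;                        /* unlock: exactly one waiter gets through */
    }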

387
(Refer Slide Time: 10:41)

Now, Intel systems do not support the test and set instruction. Instead, they support an instruction known as exchange: xchg. An xchg instruction is represented over here, and it is also an atomic instruction. The xchg instruction takes two parameters: a memory address and an integer value. What is done is that the contents of that memory location are stored in previous, then the value of v, that is the value passed to xchg, is stored into that memory location, and the previous value is returned (mentioned in the above slide image).

So essentially, what this achieves is that we are able to exchange a register variable (i.e. int v) with a memory location. To take an example, look at this (mentioned in the above image): once an xchg instruction is executed by a process running on this processor, it will exchange a register value with that of a memory location, and this entire thing is done atomically.
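In C-like pseudocode, the semantics of xchg as described on the slide could be sketched as follows (the signature here is just an assumption for illustration; the hardware performs all of this as one atomic instruction):

    /* Atomic on real hardware; shown in C only to convey the semantics. */
    int xchg(volatile int *addr, int v)
    {
        int prev = *addr;   /* old contents of the memory location      */
        *addr = v;          /* store the supplied value into the memory */
        return prev;        /* return the old contents                  */
    }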

388
(Refer Slide Time: 11:49)

So even if another process executes xchg at exactly the same time, the Intel hardware ensures that the two operations are done one after the other. First, one process executes and completes its xchg instruction, and only then does the second process execute and exchange its data.

389
(Refer Slide Time: 12:14)

So this particular slide (mentioned above) shows how the xchg instruction on Intel hardware is used to solve the critical section problem. We have two primitive functions called Acquire and Release: acquire is used to lock a critical section, while release is used to unlock it. These acquire and release functions are passed a pointer to a memory location known as locked. In acquire, inside a loop, xchg is invoked on locked. So xchg is present over here (the function mentioned in the above image) and it is passed two parameters: the address of the memory location, in this case the address of locked, and the value which we want to set (i.e. xchg(addr, value)). First, the value, in this case 1, is moved into the eax register of the Intel processor, and then the xchg instruction is invoked.

As a result, the data which was stored in memory gets loaded into the eax register, and the value 1 which was loaded into the eax register gets stored into memory. This eax register (mentioned as a note in the above slide image) is what is returned back to acquire. The loop (the loop mentioned inside the acquire function) breaks when the value returned by xchg, that is the contents of the eax register, is 0. In order to release the lock, we simply set the locked value to 0 (mentioned in the release function).
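A simplified sketch, modelled loosely on xv6's xchg() and acquire()/release() (the real xv6 functions do extra bookkeeping such as disabling interrupts, which is omitted here; the inline-assembly constraints tie the new value and the result to the eax register, matching the description above):

    static inline unsigned int xchg(volatile unsigned int *addr, unsigned int newval)
    {
        unsigned int result;
        /* lock-prefixed xchgl makes the exchange atomic on x86 */
        asm volatile("lock; xchgl %0, %1"
                     : "+m" (*addr), "=a" (result)
                     : "1" (newval)
                     : "cc");
        return result;
    }

    void acquire(volatile unsigned int *locked)
    {
        /* keep writing 1; the old value comes back in eax.
         * An old value of 0 means the lock was free and is now ours. */
        while (xchg(locked, 1) != 0)
            ;
    }

    void release(volatile unsigned int *locked)
    {
        *locked = 0;        /* hand the lock back */
    }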

390
Let us see how it works. So first, the memory location over here would be 0 and when
the process invokes xchg instruction that 0 value is pushed into the eax register, and the
value of 1 which was present in the eax register comes into the memory.

(Refer Slide Time: 14:29)

As a result, the memory has a value of 1 while the eax register in the processor has a value of 0 (mentioned in the above slide image), and this is what is used to break from the while loop and enter into the critical section. In order to release the lock, we simply set the value of locked to 0. Now when a second process invokes acquire, it will continue to loop in the while loop until the value of this memory location is set to 0 by the first process.

391
(Refer Slide Time: 15:08)

We see over here (as mentioned above) that in order to release the lock we simply set the value in memory to 0, and then another process would be able to obtain the value of 0 and enter the critical section.

(Refer Slide Time: 15:23)

So this instruction like the xchg is used to build higher level constructs which are used in
various critical section problems. So, we will look at Spinlock, and Mutex in this video
and Semaphore in a later video.

392
(Refer Slide Time: 15:39)

A Spinlock is what we have seen before: it has two functions, acquire and release. Acquire is used to gain access to the critical section, essentially to lock it, while the release function is used to unlock the critical section. In order to use this we could have two processes, process 1 and process 2. In order to enter the critical section a process has to acquire the lock, and this acquire function continuously loops until the value returned by xchg is 0. When the value returned by xchg is 0, the loop breaks and process 1 enters into the critical section. Releasing the lock is done just by setting the value of locked to 0.

So, one process will hold the lock at a given time while the other process waits in this particular loop, continuously invoking xchg until it obtains 0. When the first process releases the lock by setting the value to 0, this causes the second process to break from the loop (mentioned above in the acquire function) and enter into the critical section. You can see more details about the way spinlocks are implemented in xv6 by looking into the two files spinlock.c and spinlock.h (mentioned above).

393
(Refer Slide Time: 17:13)

We will now look at some of the issues that come up with the use of the xchg instruction. As we have seen, the xchg instruction is the most crucial part of constructs such as the spinlocks we saw in the previous slide; therefore, it is important to understand the issues related to the xchg instruction. Essentially, this particular form of the xchg instruction (i.e. xchg %eax, X) exchanges data between the register eax and a memory location X. It should be ensured that no compiler optimizations are performed on this variable X.

One common optimization is to keep the value of X in a register. In such a case, we would simply be exchanging data between the register eax and another register, which does not serve the purpose. Therefore, we should either write this particular loop in assembly or use the keyword volatile.
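For instance (reusing the xchg() sketch from earlier; the variable name X follows the slide), declaring the shared variable volatile is one way to stop the compiler from caching it in a register:

    /* volatile forces every access to X to be a real memory access, so the
     * atomic exchange really operates on memory and not on a register copy. */
    volatile unsigned int X = 0;

    void lock_X(void)
    {
        while (xchg(&X, 1) != 0)   /* xchg() as sketched earlier */
            ;
    }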

Another requirement while using the xchg instruction is to ensure that memory operations are not reordered; the CPU should not reorder memory loads and stores. In order to achieve this we use serializing instructions, which force instructions not to be reordered. Luckily for us, the xchg instruction by default already implements serialization, so there is nothing more we need to take care of here.

394
(Refer Slide Time: 18:46)

However, what we need to look at more closely is the effect of cache memories. Recollect that the xchg instruction exchanges data between a register and a memory location. Now, each CPU present in the system could have its own private cache; for instance, CPU 0 has its own private L1 cache and similarly CPU 1 has its own L1 cache (mentioned in the above slide image). It should be ensured that the value of X (mentioned in the xchg instruction) is not cached privately in each of these CPUs; rather, each and every execution of the xchg instruction should actually go to memory to load or store the value of X.

Secondly, it should also be ensured that when one CPU is reading and writing this X (mentioned above in the memory box), that is, when one CPU is executing the xchg instruction, no other CPU can modify the value of X, as this would break the atomicity. In order to do this the CPU asserts what is known as a lock line, to inform all other CPUs in the system that a locked memory access is going to take place.

As a result of this locking, all xchg instructions which read and write data end up reading and writing that data from memory, and this can lead to tremendous performance hits.

395
(Refer Slide Time: 20:30)

Therefore, in order to make a better, more efficient acquire, we will look at a small tweak to this particular acquire function (mentioned in the above slide). Originally, we had a loop over here (mentioned above on the original side), and in this loop we would continuously keep invoking xchg and checking whether it returns a value of 0.

Note that each of these xchg instructions has a huge performance overhead, because it requires that the locked variable, that is the value of X, goes all the way to memory to be read and written. Essentially, caching is not possible and therefore the overheads are huge.

On the other hand, if we make a minor change to this acquire function, as shown over here (mentioned above on the better-way side), it improves performance quite significantly. We now have two loops: the outer one is the regular xchg loop, which as you know results in a bus transaction and has huge overheads, while the inner loop simply spins checking the value of the memory location locked.

This inner read (i.e. the second while condition mentioned above) is cacheable and therefore does not incur much performance overhead, while the outer while loop (i.e. the first while condition) has significant overheads but is not invoked too often. Most of the time it is this cached memory location (i.e. the second while condition mentioned above) that is being read.

396
Now the cache coherency protocol will ensure that when another process changes the value of locked, this process sees the value of locked change from 1 to 0; it then exits the inner while loop and goes back to the outer while loop. The xchg instruction then exchanges the locked value with the register value, setting it to 1, and if the value returned is 0 the process breaks from the while loop.
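A sketch of this "test-and-test-and-set" style acquire (same xchg() sketch as before; the structure of the two loops is the point here):

    void acquire(volatile unsigned int *locked)
    {
        /* outer loop: expensive bus-locked exchange */
        while (xchg(locked, 1) != 0) {
            /* inner loop: plain reads, served from the local cache until the
             * coherency protocol shows the lock being released */
            while (*locked != 0)
                ;
        }
    }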

Thank you.

397
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 06
Lecture – 28
Mutexes

Hello. In this video we will look at Mutexes, which are a construct used to solve the critical section problem.

(Refer Slide Time: 00:25)

We will start from where we stopped in the last video, with Spinlocks. Essentially, the main characteristic of spinlocks is that they use busy waiting. That is, we had seen that in order to obtain a lock there was a while loop, and in that while loop the xchg instruction was continuously invoked; the while loop would only exit when the xchg instruction returned a 0.

This busy waiting is not ideally what is required. Essentially, busy waiting causes CPU cycles to be wasted, leading to performance degradation.

So, where would we actually use Spinlocks? Spinlocks are useful when we have short
critical sections and we know that we do not have to waste too much time in waiting. So

398
for instance, if we just want to increment a shared counter, or access an array element, then a spinlock would be preferred. Essentially, we assume these operations will not have too much overhead. Therefore, even if another process is accessing the counter, we are certain that it is not going to spend too much time incrementing the counter.

Therefore, the waiting process will not have to waste too many cycles waiting to enter the critical section. However, spinlocks are not useful when the period of waiting is unpredictable or will take a very long time. For instance, consider a page fault that results in a page of memory being loaded from the hard disk into main memory. This would take a considerably long time, and you do not want your process to be wasting CPU cycles during this entire operation. In such a case, we use a different construct called a Mutex.

(Refer Slide Time: 02:43)

This slide (mentioned in the above image) shows how a mutex is typically implemented. It again relies considerably on the xchg instruction, and just like the spinlock, we have lock and unlock functions and a memory location which is shared between all processes.

Now, in order to enter the critical section a process needs to invoke lock, and in this lock function we have a while loop (as mentioned above). As in the spinlock case, the xchg instruction is invoked in the while loop, and this instruction would either

399
return a value of 0 or something not equal to 0, typically 1. If the value is equal to 0, then we break from this loop, and that process has acquired the lock and executes in the critical section. However, if xchg returns a value which is not 0, then we go into the else part and execute the function called sleep() (mentioned in the lock function in the above image).

Now, this sleep function causes the process to go from the running state into the blocked state. Essentially, the process is waiting for a particular event to occur; until this event occurs the process will not get any CPU time. The event which sleep is waiting for is the wake up event. When another process invokes wake up, it results in the sleeping process being woken up from the blocked state and put onto the ready queue. Now, if it is lucky, when it executes the xchg instruction again it gets 0 and enters into the critical section.

On the other hand, if it is unlucky, it would execute the xchg instruction, get something which is non-zero, and go back to sleep. It would continue to sleep until woken up by another process. So essentially, we see over here (in the mutex lock function) that instead of doing busy waiting, as was done in spinlocks, with mutexes we put the entire process into a sleep state. The process continues to be in the sleep state until it is woken up; when it is woken up, it tries the lock again, and if it acquires the lock it enters into the critical section.
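A sketch of the mutex described on this slide; sleep() and wakeup() here stand for operating-system services that block the caller and wake all blocked waiters (they are not the C library functions of the same name), and xchg() is the atomic exchange from before:

    extern unsigned int xchg(volatile unsigned int *addr, unsigned int v);
    extern void sleep(void);     /* assumed: block the calling process        */
    extern void wakeup(void);    /* assumed: wake every process sleeping here */

    volatile unsigned int locked = 0;

    void lock(void)
    {
        while (1) {
            if (xchg(&locked, 1) == 0)
                break;           /* got the lock; enter the critical section */
            else
                sleep();         /* blocked until somebody calls wakeup()    */
        }
    }

    void unlock(void)
    {
        locked = 0;
        wakeup();                /* wake all waiters; only one will win the lock */
    }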

(Refer Slide Time: 05:34)

400
One issue with this mutex is what is known as the Thundering Herd Problem. The thundering herd problem occurs when we have a large number of processes. Let us assume each of these processes uses the same critical section, invokes lock in order to enter the critical section, and at the end of the critical section invokes unlock, which then wakes up the other processes waiting for the critical section. With a large number of processes, it could happen that several processes are in the sleep state while one process is in the critical section.

When that process executes unlock, it invokes wake up. This results in all the sleeping processes being woken up. All these processes then go from the blocked state into the ready queue, and the scheduler sequentially executes each of them. Each process then continues its while loop and executes the xchg instruction. Out of all these processes, because of the atomic nature of the xchg instruction, only one process acquires the lock, and all other processes go back into the sleep state. This repeats every time.

So every time an unlock is invoked by a process that has just completed its critical section, it will invoke wake up (i.e. wakeup() inside the unlock() function), and this results in all processes waiting on that mutex being woken up; all except one go back to sleep. There is exactly one process which gains the lock and enters into the critical section. As a result, whenever a wake up is invoked, several context switches occur so that all the processes can execute and retry the xchg instruction. And, by the way xchg is implemented in the hardware, exactly one of them will enter the critical section while all the others go back to sleep.

So, the issue over here and why it is called a Thundering Herd Problem is every time
there is a wake up, there is a huge avalanche of context switching that occurs because a
large number of processes are entering into the ready queue and this could lead to
starvation.

401
(Refer Slide Time: 08:19)

One solution to the Thundering Herd Problem is to modify the way mutexes are implemented by incorporating queues. In this implementation of a mutex, whenever lock is invoked and the xchg instruction returns something which is non-zero, the process gets added into a queue and then goes to sleep.

Now, when a process invokes unlock, a sleeping process is removed from the queue and a wake up specifically for only that process is invoked. So, unlike the previous case where all processes are woken up, in this case there is exactly one process which is woken up. This process P, which is specified here (in unlock(), i.e. wakeup(P)), wakes up from sleep, and since it is the only process which has woken up, it would typically go into the while loop, execute the exchange, and most likely get the lock and execute the critical section.

Similarly, when it unlocks, it would pick out the next process which is waiting in the
queue and it will wake up only that process. Now, this second process would then enter
into the critical section.
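A sketch of such a queue-based mutex; struct proc, the queue helpers and the per-process sleep()/wakeup(p) calls are all assumed OS-level primitives, named here only for illustration:

    struct proc;                                  /* process descriptor (assumed) */
    extern struct proc *current;                  /* currently running process    */
    extern unsigned int xchg(volatile unsigned int *addr, unsigned int v);
    extern void queue_add(struct proc *p);        /* assumed wait-queue helpers   */
    extern struct proc *queue_remove(void);
    extern int  queue_empty(void);
    extern void sleep(void);                      /* block the calling process    */
    extern void wakeup(struct proc *p);           /* wake exactly this process    */

    volatile unsigned int locked = 0;

    void lock(void)
    {
        while (xchg(&locked, 1) != 0) {
            queue_add(current);                   /* register interest in the lock */
            sleep();                              /* woken individually by unlock  */
        }
    }

    void unlock(void)
    {
        locked = 0;
        if (!queue_empty())
            wakeup(queue_remove());               /* wake exactly one waiter */
    }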

402
(Refer Slide Time: 09:42)

When we talk about synchronization primitives such as spinlocks and mutexes, it is important to also consider the case when a priority-based scheduling algorithm is used in the operating system. So let us consider this particular scenario.

Let us say we have a high priority task and a low priority task which share the same data and have a critical section. Now, let us say that the low priority task is executing in the critical section, and at this particular time the high priority task requests the lock in order to enter the critical section. So the scenario we are facing is that the low priority task is executing in the critical section, while during this time the high priority task invokes something like a lock and wants to enter the critical section.

Now, the dilemma we are facing here is that we have a high priority task which is waiting for a low priority task to complete. This is known as the Priority Inversion Problem: essentially, a task which is important and given a high priority is waiting for a lower priority task to complete its execution. And if you look at the particular link present here (mentioned in the above slide image), you will see quite an interesting case where such a priority inversion problem occurred, on the Mars Pathfinder mission.

One possible solution for the priority inversion problem is known as Priority Inheritance. Essentially, in this solution, whenever a low priority task is executing in the critical section and a high priority task requests that critical section, what happens is

403
that the low priority task is escalated to a high priority. Essentially, the priority of the low priority task becomes equal to that of the high priority task. The low priority task then executes with this high priority until it releases the critical section. This ensures that the high priority task gets to execute relatively quickly.

Thank you.

404
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 06
Lecture – 29
Semaphores

Hello. In this video, we will look at another synchronization primitive known as Semaphores. As usual, we will start with a motivating example and then show an application of semaphores.

(Refer Slide Time: 00:35)

So let us start with a very popular example known as the Producer-Consumer Problem. This is also known as the Bounded Buffer Problem. Essentially, what we have here are two processes; one process is known as the Producer and the other is known as the Consumer. The producer and the consumer share a bounded buffer. This is a normal buffer and it has a size of N, that is, N data elements can be stored in it.

So in this particular case for example, there are 6 elements that can be stored in the
buffer. Now the producer produces data. So for instance, it could be a data acquisition

405
module which collects data such as the temperature, pressure and so on, so this data is
pushed into the buffer. Now on the other side, the consumer takes from the buffer and
then processes the data. So for example, this consumer process could perhaps compute
some analytics on the producer data.

So everything would work quite well: the producer produces data and puts it in the buffer, and on the other side the consumer takes data from the buffer and begins to consume it (mentioned in the above slide image), for instance computes something with the data. Trouble occurs when, for instance, the consumer is very slow compared to the producer. In such a case the producer produces data at a faster rate than the consumer consumes it, and very quickly the buffer becomes full.

So, what does the producer do next? Another problem could occur where the consumer is very fast compared to the producer; it could very quickly consume all the data in the buffer, resulting in a buffer which is empty. What should the consumer do next? This requires a synchronization mechanism between the producer and the consumer. Essentially, when the buffer is empty the consumer should wait until the producer fills data into the buffer. Similarly, when the buffer is full the producer has to wait until the consumer takes data out of the buffer.

(Refer Slide Time: 03:25)

406
So this is the general producer and consumer code (mentioned in the above slide), and we are trying to solve the producer-consumer problem by using mutexes. Essentially, we are using 3 mutexes: empty, full, and a mutex just called mutex (as mentioned above).

The producer code (mentioned in the above slide image) essentially produces an item, inserts the item into the buffer and increments the count, count++, while the consumer code removes an item, decrements the count and then consumes the item. In order to take care of the troublesome situations, that is when the buffer is full or the buffer is empty, we use the mutexes empty and full. Essentially, before inserting the item the producer checks if count == N, that is, it checks if the buffer is full. If the buffer is full, it goes to sleep on the mutex called empty (i.e. sleep(empty)).

On the other side, the consumer checks if count == N - 1, which means that it has just removed one element from a full buffer. If so, it is going to wake up empty (i.e. wakeup(empty)). This wakeup is a signal to the producer to wake up from its sleep; the producer can then insert the item into the buffer and increment the count. On the other hand, if the consumer finds that the buffer is empty, that is, count equals 0 (if count == 0), then it is going to sleep on the mutex called full (i.e. sleep(full)). It will block on this mutex until it gets a wakeup from the producer.

Essentially, if the producer finds, after it has inserted the item and incremented count, that there is exactly 1 item present in the buffer, then it sends a wakeup signal to the consumer, wakeup(full). This wakeup(full) causes the consumer to unblock, puts it back into the ready queue, and allows the consumer to execute, remove that item and then consume it. After it removes the item, the count goes back to 0. Now, in addition to the empty and full mutexes, there is also a third mutex which is used. This mutex is essentially used to protect or synchronize access to the buffer. Before inserting an item and incrementing the count, the producer needs to lock the mutex, and unlocking is done after the item is pushed and the count incremented (as mentioned in the above slide).

407
On the other side, the consumer similarly locks the mutex before the buffer is accessed to remove an item and before the count is decremented. Essentially, you notice that count and the buffer are shared between these two processes, the producer and the consumer, and therefore this mutex helps synchronize access to the buffer and to the count value. So this solution, with the 3 mutexes, seems to work fine. However, while this scheme seems fine, we will show that under a certain condition the producer and the consumer will block infinitely without making any progress.
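Putting the description above into a sketch (produce_item, insert_item, remove_item, consume_item and the sleep/wakeup and lock/unlock primitives are assumed helpers; this version still has the lost-wakeup flaw discussed next):

    #define N 6                                     /* buffer size, as in the slide  */
    extern int  produce_item(void);
    extern void consume_item(int item);
    extern void insert_item(int item);              /* buffer helpers (assumed)      */
    extern int  remove_item(void);
    extern void lock(int *m), unlock(int *m);
    extern void sleep(int *m), wakeup(int *m);      /* block on / wake m (assumed)   */

    int count = 0;
    int mutex, empty, full;                         /* the "three mutexes" of the slide */

    void producer(void)
    {
        while (1) {
            int item = produce_item();
            if (count == N) sleep(&empty);          /* buffer full: wait               */
            lock(&mutex);
            insert_item(item);
            count++;
            unlock(&mutex);
            if (count == 1) wakeup(&full);          /* buffer was empty: wake consumer */
        }
    }

    void consumer(void)
    {
        while (1) {
            if (count == 0) sleep(&full);           /* buffer empty: wait              */
            lock(&mutex);
            int item = remove_item();
            count--;
            unlock(&mutex);
            if (count == N - 1) wakeup(&empty);     /* buffer was full: wake producer  */
            consume_item(item);
        }
    }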

That condition is based on the fact that this particular line, if (count == 0), actually comprises two steps which are not atomic. The first step is that the count value stored in memory is loaded into a register in the processor, and the second step is when the register value is checked to be 0 or not. So let us look at the problem that could occur because this particular statement is non-atomic.

(Refer Slide Time: 08:12)

So let us say that the consumer starts executing first, and it starts executing with an empty buffer. It reads the value of count from a memory location into a register; since we are assuming that this is the initial state, the value of count loaded into the register is 0. Let us then say that a context switch occurred and the producer executed. Now the producer produces an item, increments the count to 1,

408
and then inserts the item into the buffer. Now after it executes, let us say that there is a context switch again; as a result, the consumer continues to execute from where it had stopped, that is, from this point (point 3 in green, as mentioned above).

Now, we know that it has already loaded the register previously with the value of 0. It is now going to test whether count equals 0, which is true in this case, and therefore the consumer is going to wait (i.e. the last 2 green lines in the above image). Essentially, it is going to wait till it receives a signal from the producer. However, the actual value of count is 1, because the producer has pushed an item into the buffer, and thus we see that a lost wakeup occurs: the consumer has missed the wakeup signal which the producer sent. Now there is nothing stopping the producer from pushing more items into the buffer.

So eventually the entire buffer is full, and the producer then waits for the consumer to remove some item. However, this will not occur, because the consumer itself is waiting. Thus, we have a producer waiting for the consumer to remove an item, while the consumer is also waiting because it has missed the wakeup. We eventually reach a state where both producer and consumer wait infinitely. So we see that using 3 mutexes does not solve the producer-consumer problem.

(Refer Slide Time: 10:39)

409
Let us look at another primitive known as Semaphores. A semaphore is another synchronization primitive; it was proposed by Dijkstra in 1965. Semaphores are implemented with 2 functions called down and up, which we assume are atomic (mentioned in the above slide image). These are the two functions, and the requirement is that both of them are atomic. There is a shared memory location which is termed S, and in the down function the while loop tests whether S <= 0.

As long as S has a value which is less than or equal to 0, this particular loop (i.e. while(*S <= 0)) keeps executing. When S takes a value which is greater than 0, the loop breaks and the value of that memory location S is decremented by 1. In the up function, which is also atomic, the value of the memory location S is incremented by 1 (as mentioned above). The down and up functions are sometimes called the P and V operations respectively, from their Dutch names. We could also have two different variants of semaphores: a blocking semaphore and a non-blocking semaphore. A non-blocking semaphore is shown over here (mentioned in the above image in the box); essentially it is a while loop which results in busy waiting, much like a spinlock.

On the other hand, we can make a small modification and have a blocking semaphore, where this particular statement (i.e. while(*S <= 0)) results in the process going to the blocked state, while a signal from up wakes the process up. If the value of S is initially set to 1, then a blocking semaphore is similar to a mutex, while a non-blocking semaphore is similar to a spinlock. Now let us see how we could use semaphores to solve the producer-consumer problem.
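A sketch of the two operations (non-blocking variant; in the blocking variant the busy-wait is replaced by putting the process to sleep until an up() wakes it). On real systems the kernel or the hardware guarantees that each operation executes atomically; plain C here only conveys the semantics:

    void down(int *S)            /* the "P" operation */
    {
        while (*S <= 0)
            ;                    /* wait until the semaphore becomes positive */
        *S = *S - 1;             /* then take one unit                        */
    }

    void up(int *S)              /* the "V" operation */
    {
        *S = *S + 1;             /* release one unit */
    }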

410
(Refer Slide Time: 12:55)

In order to solve the problem, we require two semaphores; one is known as full. When I say two semaphores, it means two memory locations of the kind we specified by S earlier. So we have two memory locations, or two semaphores, full and empty. Full is given the initial value 0, while empty is given the initial value N, where N is the size of the buffer (mentioned in the above slide). The semaphore full indicates the number of filled slots in the buffer, while the semaphore empty indicates the number of empty slots in the buffer.

In this particular case, where N equals 6, full has a value of 4 because there are 4 filled slots, and empty has a value of 2 because there are 2 empty slots (mentioned in the above slide image). The initial state, just before the start of execution of the producer and consumer, has full equal to 0 and empty equal to N, because there are no data items in the buffer; essentially, because the buffer is empty.

411
(Refer Slide Time: 14:13)

Let us see how the semaphores are used. Looking at the producer: the producer produces an item, and we will take this particular example (mentioned in the above slide image) where full is 4 and empty is 2. When the item is produced, the producer invokes down on the empty semaphore (i.e. down(empty)). The down, as we have seen, decrements the empty semaphore, so empty goes from 2 to 1. This is an atomic operation.

(Refer Slide Time: 14:52)

412
Then the next step in the producer is to insert an item, so a new item gets inserted into
the buffer. Then there is an up(full) (as mentioned above) that is the semaphore full will
get a value of 5.

(Refer Slide Time: 15:08)

Similarly, in the consumer part is as follows (mentioned in above slide image). First
there is the down(full), so the value of full will go from 5 to 4 as seen here.

413
(Refer Slide Time: 15:25)

Then an item is removed from the buffer and up is invoked on empty, so the value of empty becomes 2, and then the consumer consumes the particular item (as mentioned in the above slide).

(Refer Slide Time: 15:41)

Now let us see both the producer and consumer and the case of a full buffer (as

414
mentioned above). Let us assume that the buffer is full; in such a case, full has the value 6, which is equal to N, while empty has a value of 0, indicating that there are no empty slots in the buffer, and a full of 6 indicates that there are 6 filled slots in the buffer. The producer, as usual, produces an item and then does down on empty (mentioned by the producer() arrow). Now, empty has a value of 0, so if you go back to the down function of the semaphore, this causes the while statement to keep executing continuously. The producer is therefore blocked, waiting on this particular down.

After a while, when the consumer begins to execute, it executes down(full); as a result, it is going to consume one element, so it decrements the value of full, as we see over here (mentioned above), and full goes to 5. It then removes an item and does up(empty).

(Refer Slide Time: 17:00)

So now empty is set to 1, and then of course the consumer consumes the item. Setting empty to 1 causes the loop in the producer's down (i.e. down(empty) in producer()) to break; the producer then decrements empty back to 0, inserts the item into the buffer, and full is set to 6 yet again.

415
(Refer Slide Time: 17:31)

So in this way, the semaphores full and empty are used to solve the producer-consumer
problem when the buffer is full.

(Refer Slide Time: 17:49)

A similar analysis can be made when the buffer is empty. In such a case the values of full
and empty are 0 and 6 respectively.

416
(Refer Slide Time: 18:03)

One thing we have not taken care of so far in the producer and consumer code is that we are not synchronizing access to this particular buffer. As a result, it could be possible that the producer is inserting an item into the buffer and, at exactly the same time, the consumer is removing an item from the buffer. In order to prevent such a thing from occurring we use a mutex to synchronize access to the buffer.

So before accessing this particular buffer, the producer as well as the consumer needs to lock the mutex, i.e. lock(mutex), and after accessing the buffer unlock(mutex) needs to be invoked. As a result of this mutex, we can be guaranteed that only one of these two processes is executing in this particular critical section, that is, accessing the buffer, at any given instant of time.
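Pulling everything together, a sketch of the complete producer-consumer solution discussed in this lecture (down/up are the semaphore operations; lock/unlock, the buffer helpers and the item routines are assumed as before):

    #define N 6
    extern void down(int *S), up(int *S);        /* semaphore operations        */
    extern void lock(int *m), unlock(int *m);    /* mutex primitives (assumed)  */
    extern int  produce_item(void);
    extern void insert_item(int item), consume_item(int item);
    extern int  remove_item(void);

    int full  = 0;      /* number of filled slots     */
    int empty = N;      /* number of empty slots      */
    int mutex;          /* protects the buffer itself */

    void producer(void)
    {
        while (1) {
            int item = produce_item();
            down(&empty);          /* wait for an empty slot */
            lock(&mutex);
            insert_item(item);
            unlock(&mutex);
            up(&full);             /* one more filled slot   */
        }
    }

    void consumer(void)
    {
        while (1) {
            down(&full);           /* wait for a filled slot */
            lock(&mutex);
            int item = remove_item();
            unlock(&mutex);
            up(&empty);            /* one more empty slot    */
            consume_item(item);
        }
    }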

So, now there are several other variants of the producer-consumer problem, and there are
various schemes in which semaphores could be utilized to solve these various problems.
But, for this particular course we will stop with this particular example.

Thank you.

417
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 07
Lecture – 30
Dining Philosophers Problem

Hello. In the previous video, we talked about semaphores. We saw how semaphores could be used to solve synchronization problems in the producer-consumer example that we had taken.

In this video, we will look at another problem where semaphores are useful. This is the dining philosophers' problem, and it is a classic example of the use of semaphores.

(Refer Slide Time: 00:54)

So let us start with the problem. Let us say we have five philosophers P 1, P 2, P 3, P 4, P 5 who are sitting around a table. In front of them there are five plates, one for each of the philosophers, and five forks 1, 2, 3, 4, 5 (mentioned in the above slide image). Each of these philosophers can do just one of two things: each philosopher can either think or eat.

In other words, if a philosopher is not thinking then he is eating, and vice versa. Now, in order to eat, a philosopher needs to hold both forks, that is, the fork on his left and the

418
one on the right. For instance, if P 1 wants to eat then he needs to have fork 1 and fork 2; these are the two forks closest to him. Similarly, if P 3 wants to eat then forks 3 and 4 are required. Now, the problem we are trying to solve is to develop an algorithm where no philosopher starves, that is, every philosopher should eventually get a chance to eat.

(Refer Slide Time: 02:30)

Let us start with a very naive solution to this particular problem. Say we have the solution over here (mentioned above in a box), where we define N as 5, corresponding to the number of philosophers, and we have a function for the philosopher, i.e. void philosopher(int i). This function takes an integer value 'i', and 'i' can take the values 1 to 5, corresponding to each philosopher, that is P 1, P 2, P 3, P 4 or P 5.

In the function, we have an infinite loop (as mentioned above) where the i-th philosopher thinks for some time, and then after some time he begins to feel hungry. He will take the fork on his right, then take the fork on his left, then eat for some time, and after that he will put down the left fork and then put down the right fork, and this continues in a loop infinitely.
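A sketch of this naive attempt (think, eat, take_fork and put_fork are assumed helpers; the RIGHT/LEFT macros encode an assumed fork numbering in which philosopher i has fork i on the right and fork i+1, wrapping around, on the left):

    #define N 5
    #define RIGHT(i) (i)                 /* fork i on the right                 */
    #define LEFT(i)  (((i) % N) + 1)     /* fork i+1 (wrapping) on the left     */

    extern void think(void), eat(void);
    extern void take_fork(int f), put_fork(int f);

    void philosopher(int i)              /* i = 1 .. N */
    {
        while (1) {
            think();                     /* think until hungry      */
            take_fork(RIGHT(i));         /* pick up the right fork  */
            take_fork(LEFT(i));          /* pick up the left fork   */
            eat();                       /* eat for some time       */
            put_fork(LEFT(i));           /* put the forks back down */
            put_fork(RIGHT(i));
        }
    }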

So, for instance, philosopher P 1 will think for some time, then feel hungry, then pick up the right fork, that is fork number 1, then pick up the left fork, that is fork 2, then eat for some time and put down the forks, first 2 and then 1. This seems like a very simple and easy solution to the problem, but as we will see, there are certain

419
issues that could crop up. The issues arise because each of these philosophers executes this function independently: thinking, feeling hungry and eating all happen independently of the others.

So let us consider this particular scenario (based on the above slide image). Let us say that philosophers P 1 and P 3 have a higher priority, that is, whenever they request a fork to be picked up, the system will always ensure that they are given the forks. What would happen in such a case? We would get a situation where P 1 eats whenever he wants and P 3 eats whenever he wants, while the other philosophers P 2, P 4 and P 5, which have a lower priority in picking up forks, are not able to eat. For instance, philosopher P 2 can pick up neither the right fork nor the left fork, and therefore P 2 cannot eat.

In a similar manner, P 4 and P 5 have just one fork between them which they could possibly pick up; the other fork in each case would, with high probability, be given to the philosophers P 1 and P 3. Thus P 2, P 4 and P 5 will starve, and this is not the ideal solution for our problem.

(Refer Slide Time: 05:56)

So let us see another possible issue that could take place. Let us say that, by some chance, all the philosophers pick up their right forks simultaneously (mentioned in the above slide image). So we have philosopher P 1 picking up his right fork, philosopher P 2 picking up his right fork, and P 3, P 4 and P 5 picking up their right forks respectively.

420
Now, in order to eat, each of the philosophers has to pick up his left fork, and this could lead to starvation. Essentially, P 1 is waiting for P 2 to put down his fork so that he can pick it up; then P 2 is waiting for P 3 to put down his fork; P 3 is waiting for P 4; P 4 is waiting for P 5; and P 5 is waiting for P 1. Essentially, we see that every philosopher is waiting for another philosopher, thus creating a chain. And this waiting will go on infinitely, leading to starvation, which in this case we call a deadlock.

(Refer Slide Time: 07:19)

To define a deadlock more formally: a deadlock is a situation where programs continue to run indefinitely without making any progress, and each program is waiting for an event that another program or process can cause. You see that in this case each of the philosophers is waiting for a particular event, putting down a fork, which another philosopher should do. There is a circular wait present, and this leads to a deadlock and thereby starvation.

421
(Refer Slide Time: 08:07)

So let us look at another attempt to solve this particular problem. We have the same function over here (mentioned in the above slide), and this time the philosopher takes the right fork, i.e. take_fork(R), then determines whether the left fork is available (the if condition mentioned above). If the left fork is available, the philosopher takes the left fork, i.e. take_fork(L), eats for some time, i.e. eat(), then puts down both forks, the right as well as the left (i.e. put_fork(R) and put_fork(L)), and the loop continues as usual.

However, if the left fork is not available, then we go to the else part (mentioned in the above slide image) and the philosopher puts back the right fork, i.e. put_fork(R). Essentially, the fork which was picked up initially, the right fork, is put back onto the table if the philosopher finds that the left fork is not available; this allows another philosopher to possibly eat. After this is done there is a sleep for some fixed interval T, i.e. sleep(T), before the philosopher tries again.
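A sketch of this second attempt, reusing the helpers and macros from the previous sketch (fork_available(), sleep_for() and the fixed delay T are assumptions used only to mirror the slide's description):

    extern int  fork_available(int f);   /* assumed: is this fork on the table? */
    extern void sleep_for(int t);        /* assumed: wait for t time units      */
    #define T 1

    void philosopher(int i)
    {
        while (1) {
            think();
            take_fork(RIGHT(i));
            if (fork_available(LEFT(i))) {
                take_fork(LEFT(i));
                eat();
                put_fork(RIGHT(i));
                put_fork(LEFT(i));
            } else {
                put_fork(RIGHT(i));      /* give way to a neighbour */
                sleep_for(T);            /* a fixed delay can starve under lock-step
                                            timing; a random delay reduces the risk */
            }
        }
    }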

Let us see what the issue is with this particular case (mentioned in the above slide). Consider a scenario where all philosophers start at exactly the same time, run simultaneously, and think for exactly the same amount of time. This could lead to a situation where the philosophers all pick up their right forks simultaneously, then find out that their left forks are not available, so they put down their forks simultaneously, then they sleep for some time and then they repeat the process. So, you see (mentioned in the above slide

422
image) that the five philosophers are again starved: they will continuously just be picking up their right forks and putting them back onto the table. This solution is also not going to serve our purpose, since we have the philosophers starving again.

(Refer Slide Time: 10:26)

A slightly better solution for this case (as mentioned above) is one where, instead of sleeping for a fixed time, the philosopher puts down the right fork and sleeps for some random amount of time (i.e. sleep(random_time)). While this does not guarantee that starvation will not occur, it reduces the possibility of starvation. Such a solution is used in protocols like Ethernet.

423
(Refer Slide Time: 10:57)

Let us look at a third attempt to solve this particular problem. This particular solution uses a mutex (as mentioned in the above image). Essentially, before taking the right or the left fork, the philosopher needs to lock a mutex, and the mutex is unlocked only after eating, when the forks have been put back onto the table. So there is a lock of the mutex over here before picking up the forks, i.e. lock(mutex), and an unlock of the mutex after the forks are put down onto the table, i.e. unlock(mutex). This solution essentially ensures that starvation will not occur; it prevents deadlocks.

However, the problem here (in the above slide) is that, because we are using a mutex, at most one philosopher can be inside this critical section (i.e. between the lock and unlock of the mutex). In other words, at most one philosopher can eat at any particular instant. So, while this solution works, it is not the most efficient solution, and we would want something which does much better than this.
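The third attempt, sketched with a single global mutex (lock/unlock and the fork helpers as in the earlier sketches):

    extern void lock(int *m), unlock(int *m);
    int mutex;

    void philosopher(int i)
    {
        while (1) {
            think();
            lock(&mutex);                /* only one philosopher past this point */
            take_fork(RIGHT(i));
            take_fork(LEFT(i));
            eat();
            put_fork(LEFT(i));
            put_fork(RIGHT(i));
            unlock(&mutex);              /* deadlock-free, but no concurrency    */
        }
    }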

424
(Refer Slide Time: 12:20)

So let us look at our 4th attempt, this one using semaphores. Let us say that we have N semaphores, s[1] to s[N], one semaphore per philosopher, and all these semaphores are initialized to 0. In addition, a philosopher can be in one of 3 states: hungry, eating or thinking. Over a period of time, each philosopher moves between these states. For instance, when the philosopher is thinking, the state is thinking; then the philosopher becomes hungry, so he goes to the hungry state, then eating, and then back to thinking, and this process continues for eternity.

The general idea of the solution we will see here (as mentioned in the above slide) is that a philosopher can only move to the eating state if neither neighbour is eating. That is, a philosopher can eat only if his left neighbour as well as his right neighbour is not eating. In order to implement this particular solution, we have 4 functions. First is the philosopher function, i.e. void philosopher(int i), which is an infinite loop and corresponds to philosopher 'i'. The philosopher thinks, then takes the forks, then eats for some time and puts down the forks, and this repeats continuously.

Now, in the take forks function (mentioned in the above slide), first the state of the philosopher is set to hungry, and then the function called test is invoked, i.e. test(i). What test does is check whether the state of the philosopher is hungry as well as whether the state of the philosopher

425
to the left as well as to the right is not the eating state (refer to the above slide image for the test function). If this is indeed true, then the philosopher can eat. And at the end, after eating, the forks are put down, i.e. put_forks(), and the state of the philosopher goes back to thinking.
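A sketch of this semaphore-based solution, following the structure described in the lecture (philosophers and semaphores are indexed 1..N to match the slides; the LEFT/RIGHT macros, lock/unlock and down/up are assumptions that mirror the description):

    #define N        5
    #define THINKING 0
    #define HUNGRY   1
    #define EATING   2
    #define LEFT(i)  ((((i) + N - 2) % N) + 1)   /* e.g. LEFT(4)  == 3 */
    #define RIGHT(i) (((i) % N) + 1)             /* e.g. RIGHT(3) == 4 */

    extern void lock(int *m), unlock(int *m);    /* mutex protecting state[] */
    extern void down(int *S), up(int *S);        /* semaphore operations     */
    extern void think(void), eat(void);

    int state[N + 1];        /* all THINKING (0) initially           */
    int s[N + 1];            /* one semaphore per philosopher, all 0 */
    int mutex;

    void test(int i)
    {
        if (state[i] == HUNGRY &&
            state[LEFT(i)] != EATING && state[RIGHT(i)] != EATING) {
            state[i] = EATING;
            up(&s[i]);               /* let philosopher i proceed */
        }
    }

    void take_forks(int i)
    {
        lock(&mutex);
        state[i] = HUNGRY;
        test(i);                     /* try to grab both forks               */
        unlock(&mutex);
        down(&s[i]);                 /* blocks here if a neighbour is eating */
    }

    void put_forks(int i)
    {
        lock(&mutex);
        state[i] = THINKING;
        test(LEFT(i));               /* a waiting left neighbour may now eat  */
        test(RIGHT(i));              /* a waiting right neighbour may now eat */
        unlock(&mutex);
    }

    void philosopher(int i)
    {
        while (1) {
            think();
            take_forks(i);
            eat();
            put_forks(i);
        }
    }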

(Refer Slide Time: 15:06)

So let us look at these particular functions in more detail (mentioned in the above slide image). Let us say that these are the five philosophers P 1, P 2, P 3, P 4, and P 5, and let us say that initially all of them are in the thinking state; this is represented by the T over here (in the slide image). As we know, the initial values of the semaphores are all 0. So let us say all the philosophers are in the thinking state, and then philosopher 3 goes from the thinking state to take forks; as a result, the take forks function for philosopher 3, i.e. take_forks(3), gets invoked, and the state of the philosopher changes from thinking to hungry, i.e. state[3] = HUNGRY.

Then the function test is invoked, i.e. test(3), and this condition is checked (the if condition mentioned in the test function): the state of philosopher 3 is hungry, so this condition is true; the state of the left philosopher is not eating, because P 2 is in the thinking state; and the right philosopher is also not eating, because P 4 is in the thinking state. As a result, the condition evaluates to true, the state of philosopher 3 is set to eating, and the corresponding semaphore is incremented from 0 to 1. The test function then returns, the mutex gets unlocked, and then there is the down.

426
Down, as we know, checks the value of s[i]; if this value is less than or equal to 0 it blocks (or loops infinitely), and if the value is greater than 0, which is the case over here (as mentioned above), then it just decrements the value of s[i]. Here s[i] had the value of 1, so the down does not block, but rather just decrements the value of s[i] to 0. So s[i], the corresponding semaphore, has a value of 0, and philosopher 3 can go to the eating state and consume his food.

(Refer Slide Time: 17:35)

Now, let us see another situation, where philosopher 4 moves from the thinking state to take forks. Take forks gets invoked for philosopher 4, i.e. take_forks(4); there is a lock of the mutex, the state is set to hungry for philosopher 4, i.e. state[4] = HUNGRY, and test(i) is invoked. In the test(i) function, we see that the first condition is met for philosopher 4, because P 4 is indeed in a hungry state; however, the philosopher on the left is in the eating state. Therefore, the entire if condition (mentioned above in the test function) evaluates to false, and execution does not enter the if body but rather skips it. Then we go back to the unlock and the down.

What we see now is that the semaphore value corresponding to P 4 is 0. So, when down is invoked (mentioned inside the take_forks function), as we know, it leads to the process getting blocked. Thus philosopher P 4 gets blocked, and P 4 will continue to be blocked as long as P 3 is eating. Then, after a while, P 3 decides to put down the forks (i.e. the put_forks function is executed) and sets its state back to the thinking

427
state. It then invokes test with the left philosopher, i.e. test(LEFT), which here corresponds to P 2; this case is not really interesting, so we will not look at it. What is interesting is the test on the right, i.e. test(RIGHT), and the right here corresponds to P 4, so test gets invoked for P 4.

So, remember that test(RIGHT) is invoked with i having the value of 4 (the function test(4) is invoked from test(RIGHT)). We see that the state of 'i' is hungry, because P 4 is hungry, and the left and right neighbours are not eating; therefore, the condition (the if condition mentioned above in the test function) evaluates to true and execution enters the if body. Consequently, the state of philosopher 4 is set to eating and the semaphore value for P 4 is set to 1.

Now, setting the semaphore to 1 causes the wakeup to occur. P 4, which was blocked on the semaphore, wakes up, and as a result of the down (i.e. down(s[i])) the value of the semaphore gets decremented to 0. Thus philosopher P 4 wakes up and starts to eat.

Thus, we see that the semaphores have efficiently ensured that the different philosophers can share the five forks which they have in common, and that every philosopher eventually gets to eat. There are other variants of, and solutions to, the dining philosophers' problem which are very interesting to read, but we will not go into them in this particular course.

Thank you.

428
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 07
Lecture – 31
Deadlocks

Hello. In the previous video, we saw the dining philosophers' problem, and we saw that the initial naive solutions we had for the problem could result in something known as deadlocks. In this video, we will look at deadlocks in more detail, and at how they are handled in the system.

(Refer Slide Time: 00:39)

So let us say we have two processes A and B, and we have two resources R 1 and R 2 (as mentioned in the above slide). These resources could be anything in the system that exists in a limited quantity. For example, a resource could be something as small as a file stored on the disk, or it could be a printer which is used to print, or a plotter, a scanner and so on.

So, essentially the arrow from R 1 to A indicates that A is currently holding the resource. For instance, if R 1 is a file, that means A has currently opened the file exclusively and is doing some operations on the file. In a similar way, the resource R 2 is held by B. So if

429
it is a printer, for instance, if R 2 is a printer, it means that B is currently using the printer
to print some particular document.

(Refer Slide Time: 01:52)

Now, consider the particular scenario (as mentioned in the above slide) where process A holds the resource R 1 and process B holds the resource R 2; but at the same time, process A is requesting to use R 2. So essentially, process A is waiting for R 2 to be obtained, and process B is waiting for the resource R 1 to be obtained. To take an example, process A has opened and is using a particular file which is stored on the disk.

And at the same time, for instance, it wants to print the file to the resource R 2, which we assumed was a printer. In a similar way, process B is currently holding this resource (i.e. the printer), that is, it is using this particular resource (R 2), and it wants to open and use this particular resource R 1 (i.e. the file).

What we see over here (as discussed above) is a scenario called a Deadlock. Essentially, a deadlock is a state in the system where each process in the deadlock is waiting for an event that another process in that set can cause. For instance, over here (as mentioned in the above slide), process A is waiting for the resource R 2 which is held by B; B in turn is waiting for R 1 which is held by A. We have a set of two processes, A and B, and each process in the set is waiting for the other process to do a particular

430
thing. For example, A is waiting for B to release this particular resource R 2, while B is
also waiting for A to release the resource R 1.

Deadlocks like this are a very critical situation that can occur in systems. When deadlocks occur, they could lead to processes A and B, in this case, waiting for an infinite time, continuously waiting without doing any useful work. So such deadlocks should be analyzed thoroughly. In this particular video, we will see how such deadlocks are handled in systems. Now, in order to study deadlocks, we use graphs like this (mentioned in the below slide), known as Resource Allocation Graphs.

(Refer Slide Time: 04:55)

Resource allocation graphs are directed graphs used to model the various resource allocations in the system, and thereby to determine whether a deadlock has occurred, whether a deadlock is potentially going to occur, and so on. In this directed graph, we represent resources by squares; for instance, R 1 and R 2 are resources and they are represented by the squares shown (mentioned in the below slide). In a similar way, circles as shown over here (as mentioned in the below slide) are used to represent processes.

431
(Refer Slide Time: 05:37)

And as we have seen before, an arrow from a process to a resource, that is, directed from the process to the resource, indicates that a request has been made for that resource. For example, over here the arrow from A to R 1 indicates that A is requesting resource R 1; similarly B, in this case, is requesting resource R 2 (mentioned in the above slide). These requests are made to the operating system, and if possible the operating system will then allocate that resource to the corresponding process.

(Refer Slide Time: 06:22)

432
When that happens, the graph will look like this (as mentioned in the above slide); essentially, the direction of the arrow has changed. Now the arrow goes from R 1 to A, indicating that A holds resource R 1. Similarly, the arrow from R 2 to B indicates that B holds resource R 2. There are four conditions required in order for a deadlock to occur, and we will now look at each of these conditions.

(Refer Slide Time: 06:59)

The first is Mutual Exclusion. What we mean by this is that each resource in the system is either available or currently assigned to exactly one process. For instance, over here (refer to the above slide image) we have resource R 1 which is free (first figure), so it is not assigned to any particular process; this is fine. This is also fine (second figure), where the resource is allocated to exactly one process. But in order for deadlocks to happen, this kind of scenario (third figure) should not be possible, that is, the resource cannot be shared between the two processes A and B.

433
(Refer Slide Time: 07:48)

The next condition for a deadlock is Hold and wait (refer to the above slide image); that is, a
process holding a resource can request another resource. For example, in this case
(figure mentioned above) the resource R1 is held by process A, and while holding R1, A
is also requesting another resource R2; so it is essentially holding R1 and waiting for
R2.

(Refer Slide Time: 08:22)

The third condition for a deadlock to happen is No preemption (refer to the above slide image).
Essentially, it should not be the case that resources which the operating system previously

434
granted to a particular process are forcibly taken away from that process. That is, the OS
or another entity in the system cannot forcibly remove a resource which has been
allocated to a particular process. Instead, processes should explicitly release resources
by themselves; that is, a process releases a resource only when it wants to.

(Refer Slide Time: 09:18)

So, a fourth requirement is the Circular wait (refer to the above slide image). What this means
is that there is a circular chain of 2 or more processes, each of which is waiting for a
resource held by the next member of the chain. So, we see over here (mentioned in the above
circular figure) that we have a circular chain, and there is a wait because
process A is waiting for process B to release resource R2, and process B is in turn
waiting for process A to release resource R1. So, we have a circular wait condition
over here.

435
(Refer Slide Time: 10:00)

So, these 4 conditions: mutual exclusion, hold and wait, no preemption and circular wait
must all be present in the system in order that a deadlock could occur. If, for instance, we
were able to build a system where one of these conditions was not present, for
example a system where a process cannot hold a particular resource
and wait for another resource at the same time, then such a system would never have any
deadlocks.

On the other hand, suppose a system has been developed where all of these things are
possible (as mentioned in the above slide); that is, there is mutual exclusion when using
resources, a process can hold a resource while waiting for another one, once resources are
allocated they cannot be forcefully preempted, and circular waits are
allowed. Then deadlocks could potentially occur. Note that having all these conditions does not
imply that a deadlock has occurred. It only implies that there is a probability of a deadlock
occurring in the future.
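
To make these four conditions concrete, here is a small, hypothetical sketch (not from the
lecture slides) in which two pthread mutexes, lock_r1 and lock_r2, play the roles of the
resources R1 and R2. All four conditions hold, so with an unlucky interleaving the two
threads deadlock; with other interleavings the program completes normally:

/* Hypothetical sketch: two threads acquiring two mutexes in opposite
 * order. This satisfies mutual exclusion, hold and wait, no preemption,
 * and (on an unlucky interleaving) circular wait -- i.e., it may deadlock.
 * Compile with: gcc deadlock.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

pthread_mutex_t lock_r1 = PTHREAD_MUTEX_INITIALIZER;  /* resource R1 */
pthread_mutex_t lock_r2 = PTHREAD_MUTEX_INITIALIZER;  /* resource R2 */

void *process_a(void *arg)
{
    pthread_mutex_lock(&lock_r1);       /* A holds R1 */
    sleep(1);                           /* makes the bad interleaving likely */
    pthread_mutex_lock(&lock_r2);       /* A waits for R2 (held by B) */
    printf("A got both resources\n");
    pthread_mutex_unlock(&lock_r2);
    pthread_mutex_unlock(&lock_r1);
    return NULL;
}

void *process_b(void *arg)
{
    pthread_mutex_lock(&lock_r2);       /* B holds R2 */
    sleep(1);
    pthread_mutex_lock(&lock_r1);       /* B waits for R1 (held by A) */
    printf("B got both resources\n");
    pthread_mutex_unlock(&lock_r1);
    pthread_mutex_unlock(&lock_r2);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, process_a, NULL);
    pthread_create(&b, NULL, process_b, NULL);
    pthread_join(a, NULL);              /* never returns if the deadlock occurs */
    pthread_join(b, NULL);
    return 0;
}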

436
(Refer Slide Time: 11:33)

So, this being said, “a deadlock in a system is a chance event”. Essentially, it depends on
several factors, such as the way resources are requested by processes, the way allocations
are made for these resources, the way de-allocations are made by the operating system,
and so on. Only if a certain order of requests and allocations happens will a deadlock
occur; a small variation in the requests and allocations may cause the
deadlock to not occur.

So let us see some examples of this (mentioned in slide time: 11:33). So let us say we
have 3 processes in the system A, B and C, and there are 3 resources as well R, S and T.
So, each process could request and release resources at sometime during its execution.
So obviously, a release can be made by a process only after the request is made. So, these
request and release of resources are given to the operating system at various time
instance, depending on how A, B and C get scheduled and how they are executed.

So let us consider this particular sequence: A requests R, then B requests S, and C
requests T. Then A requests S, B requests T, and C requests R. This is one potential
order in which requests could occur. We can use our resource allocation
graphs to view this (mentioned in slide time 11:33). We see that A requests R and the
operating system will then allocate the resource R to A; then, corresponding to B requesting
S, S will be allocated to B. Then C requests T and the OS will allocate T
to C.

437
Then A requests S, so there is a line like this (arrow pointing from A to S); B requests T,
so there is a line over here (arrow pointing from B to T); and C requests R. So, all
four conditions that we have seen in the previous slide have been met. For instance,
the circular wait you see is achieved here. R is held by A, while A requests S; S is held
by B, and at the same time B requests T. T is held by C, while simultaneously C is
requesting R. So, you see that each process is waiting for another process in this set
to release a particular resource. So, we have a deadlocked scenario over here.

(Refer Slide Time: 14:55)

Now, we will see that if the same requests and allocations are done in a slightly different
manner, then the deadlock will not occur. For example, R is allocated to A (fig. (l) as
above), and then T gets allocated to C (fig. (m) as above); then A requests S and S gets
allocated to A (fig. (n) as above). Then C requests R, so we have this over here
(mentioned above in fig. (o)). Now, A releases R, so there is no longer an edge over here (fig.
(p) as above); then A releases S, and therefore R can be allocated to
C (fig. (q) as above).

So, you see that depending on the requests and releases, we are able to achieve a
situation where each and every request or release can be serviced by the operating
system. So, in such a scenario, we have not obtained the deadlock.

438
(Refer Slide Time: 16:07)

Now, we have seen that deadlock is indeed a probabilistic event and could occur with
some probability. One way to reduce the probability is by having multiple instances of
resources present in the system. We had seen that this was a deadlocked state (mentioned in the above
slide, left side), essentially because A is waiting for resource R2 to be released by B, and
B in turn is waiting for resource R1. One way this can be solved is by having
multiple resources. While this particular solution will not always work, and essentially
depends on the type of resources, it may help to some extent.

For instance, if we have 2 resources of exactly the same type, then both A’s request as
well as B’s request could be satisfied; that is, if we have 2 instances of R1, or in
other words a duplicate of the resource R1, then one instance can be given to A and the other
to B. Similarly, a duplicate of the resource R2 would allow R2 to be given to A and B
simultaneously (mentioned in the above image, right side). What this means is that, for
example, we could have 2 printers present, and A can be allocated one printer while B
is allocated the other printer. While this does not completely eradicate deadlocks, it
may reduce the likelihood that deadlocks occur.

439
(Refer Slide Time: 18:00)

Now, the next question is: in a system which could have deadlocks, should deadlocks be
handled? This is a debatable question; essentially it is not an easy thing to answer,
because the cost of having prevention mechanisms or of detecting deadlocks is extremely
high, and it imposes huge overheads on the operating system.

The other approach, known as the ostrich algorithm, is to completely ignore the fact that
deadlocks could occur and run the system without any prevention or deadlock
detection mechanisms. So, the choice between having some deadlock prevention or detection
mechanism and just ignoring the entire aspect of deadlocks needs to be made
during the OS design time, or rather the system design time. Various things need to be
discussed before a decision can be made, such as: what is the probability that a deadlock
occurs? Is it likely that a deadlock will occur every week, or every month, or once in 5
years, and so on?

Second what is the consequence of a deadlock? Essentially, how critical a deadlock could
be? For instance, if a deadlock occurs on my desktop, I could simply reboot the system
and it’s not going to affect me much. On the other hand if a deadlock occurs, say in a
safety critical application like a spacecraft or a rocket kind of scenario, then the
consequence could be disastrous. Therefore, we need to argue about these two aspects,
essentially if the probability that a deadlock occurs is very frequent then probably the OS
would require some measures in order to handle the deadlock.

440
On the other hand, if deadlocks occur very sporadically, maybe on average once in 5
years or so, then you may not need to handle deadlocks in the operating
system at all. So now let us assume that we need some mechanism in our operating system to
handle deadlocks. What can we do about this?

(Refer Slide Time: 20:47)

Essentially, there are three ways that deadlocks can be handled. One is by detection and
recovery, second by avoidance and third by prevention.

(Refer Slide Time: 21:06)

441
So let us look at the first case, that is, detection and recovery; and first, how are deadlocks
detected? Essentially, for the operating system to detect deadlocks, it needs to know the
current resource allocation in the system, that is, which process holds which
resources, and it also needs to know the current pending requests of the processes, that is,
which process is waiting for which resources. The OS will then use this
information to detect if the system is in a deadlocked state.

(Refer Slide Time: 21:46)

So, the way the detection could work is by finding cycles in the resource allocation
graphs. For instance, if this is the resource allocation graph (mentioned in above slide
image) for the various resources in the system, the OS will detect a cycle present. For
example, over here (processes inside orange box), we have a cycle between the processes
D, E and G then the OS will say that these 3 processes are indeed in the deadlock state.

442
(Refer Slide Time: 22:19)

Now, on the other hand, there could be a request as shown over here (mentioned in the above
slide with a blue cross), where resource S is requested simultaneously by A, D,
F and C. We see that this is not a deadlocked state, because there is some sequence of
allocations of S to all these processes. For instance, one possible allocation order for S is
that S is first allocated to A; A will use the resource S for some time, and after
it completes using S, S can then be allocated to C, after which S is allocated to F, and
then to D. So, essentially the allocation of S could be sequential among these 4 processes.
So, this will not cause a deadlock.

(Refer Slide Time: 23:18)

443
However, the presence of a cycle in the resource allocation graph will indicate that a
deadlock is present in the system.

(Refer Slide Time: 23:28)

This technique of finding cycles in the resource allocation graph works well for
systems where there is only one instance of each resource type. If we had systems
with multiple instances of each resource type, then another algorithm would be
required.
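
For the single-instance case, cycle detection can be sketched as follows (a hypothetical
illustration, not code from the lecture): processes and resources are nodes of a directed
graph, and a depth-first search looks for a back edge, i.e., a cycle:

/* Hypothetical sketch: detect a cycle in a resource allocation graph
 * using depth-first search (valid only when each resource has a single
 * instance). edge[u][v] = 1 means there is a directed edge u -> v,
 * i.e., either a request edge (process -> resource) or an assignment
 * edge (resource -> process).
 */
#include <stdio.h>

#define N 4                 /* nodes: 0 = A, 1 = B, 2 = R1, 3 = R2 */

int edge[N][N];
int color[N];               /* 0 = unvisited, 1 = on DFS stack, 2 = done */

int dfs(int u)
{
    color[u] = 1;
    for (int v = 0; v < N; v++) {
        if (!edge[u][v]) continue;
        if (color[v] == 1) return 1;            /* back edge => cycle */
        if (color[v] == 0 && dfs(v)) return 1;
    }
    color[u] = 2;
    return 0;
}

int main(void)
{
    /* The deadlocked example: R1 -> A, A -> R2, R2 -> B, B -> R1 */
    edge[2][0] = 1;   /* R1 assigned to A */
    edge[0][3] = 1;   /* A requests R2    */
    edge[3][1] = 1;   /* R2 assigned to B */
    edge[1][2] = 1;   /* B requests R1    */

    int deadlock = 0;
    for (int u = 0; u < N && !deadlock; u++)
        if (color[u] == 0)
            deadlock = dfs(u);

    printf(deadlock ? "cycle found: deadlock\n" : "no cycle: no deadlock\n");
    return 0;
}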

(Refer Slide Time: 23:45)

444
So let us give an example of how that would work (mentioned in slide time 23:45). Let
us say that in a system we have 4 resource types: tape drives, plotters, scanners and CD-ROMs.
Further, there are 4 tape drives, 2 plotters, 3 scanners and 1 CD-ROM. Then let us also
say that we have 3 processes executing in the system, P1, P2 and P3, and this is the current
allocation matrix (refer to the above slide). Row 1 is with respect to process P1. This
means that process P1 does not have any tape drives, does not have any plotters, is
allocated 1 scanner and no CD-ROMs. Similarly, process P2 is allocated 2 tape drives and
1 CD-ROM (2nd row of the matrix), and process P3 is allocated 1 plotter and 2 scanners (3rd
row of the matrix).

Now, this particular vector A gives the resources available (i.e., A = (2, 1, 0, 0)); essentially,
out of the 4 tape drives that we have, if you look into the corresponding column
in the current allocation matrix (refer to the above slide image), we see that 2 are
used. So, what remains is 4 - 2, that is, 2 tape drives are free and available to be allocated.
Similarly, if you look at plotters, out of the 2 plotters 1 is allocated and 1 is free. Out
of the 3 scanners, we see that all 3 scanners are allocated, so it is 0 over here (i.e., in the
resource available vector). Similarly, there are no CD-ROMs available to be allocated.

Now, in addition to this we have a request matrix (mentioned in the above slide); which process
is waiting for what is represented in this matrix. For example, process P1
requests 2 more tape drives and 1 CD-ROM (1st row of the request matrix), process P2
requires 1 tape drive and 1 scanner (2nd row of the matrix), and process P3 requires 2 tape
drives, 1 plotter and 1 scanner (3rd row of the matrix). Now, the goal of having such
representations of the resource allocations and requests is to determine if there exists a
sequence of allocations so that all requests can be met. If such a sequence
exists, then there is no deadlock present. Let us see if we can satisfy the requests made by
the 3 processes.

445
(Refer Slide Time: 26:52)

So let us take process P1 and see (refer to the request matrix in the above slide) whether the
request by process P1 can be met. It requires 2 tape drives, and we see that 2 tape drives are
available, so this is fine; and 1 CD-ROM is requested, but no CD-ROMs are
available. So, process P1 cannot be allocated all its
resources, and therefore it cannot continue to execute.

(Refer Slide Time: 27:17)

Now let us look at process P2. Process P2 requires 1 tape drive, which can be
allocated to it because it is available; and it requires 1 scanner, but 0 are available. So,

446
similarly, process P2's request cannot be satisfied either, and process P2 will also need to wait.
Now let us see the 3rd case, process P3: there is a request for 2 tape drives which can be met, 1
request for a plotter which can be met, and 1 request for a scanner which cannot be met
(mentioned in the above slide image).

So, process P3's request also cannot be satisfied, and P3 will also wait. So, P1, P2 and
P3 are all waiting on their requests, and therefore the state is a deadlock, because none of
the requests by any of these processes can be met and therefore all the processes will
need to keep waiting.

(Refer Slide Time: 28:34)

Now look at another example where there is a small change (mentioned in the above slide
image): we have just reduced the number of scanners required by process P3
(mentioned above in the 3rd row of the request matrix). In such a case, we see that both the
tape drives requested by process P3 can be met by the system, the single plotter can
also be met, and there are no scanners and no CD-ROMs required. Therefore,
all the requests of process P3 can be allocated, so P3 can run to completion; once it
completes and releases its resources, the requests of P2 and then P1 can also be met in turn,
and we do not have a deadlock over here. So, in this way, deadlocks can be detected by the
operating system, based on the current allocation matrix and the request matrix.
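
The detection just described can be sketched as follows (again a hypothetical illustration,
not the code of any real operating system): repeatedly look for a process whose entire
request row can be satisfied from the available vector, assume it runs to completion and
returns what it holds, and mark it finished; any process that can never be marked finished
is deadlocked. The matrices below encode the tape drive, plotter, scanner and CD-ROM example:

/* Hypothetical sketch of the matrix-based deadlock detection algorithm.
 * P processes, R resource types. alloc[i] is what process i holds,
 * request[i] is what it is still waiting for, avail is the free vector.
 */
#include <stdio.h>

#define P 3
#define R 4   /* tape drives, plotters, scanners, CD-ROMs */

int alloc[P][R]   = { {0,0,1,0}, {2,0,0,1}, {0,1,2,0} };
int request[P][R] = { {2,0,0,1}, {1,0,1,0}, {2,1,1,0} };
/* change request[2] to {2,1,0,0} for the second example, where no
 * deadlock is found */
int avail[R]      = { 2,1,0,0 };

int can_run(int i)                    /* is request[i] <= avail ? */
{
    for (int j = 0; j < R; j++)
        if (request[i][j] > avail[j]) return 0;
    return 1;
}

int main(void)
{
    int finished[P] = {0};
    int progress = 1;

    while (progress) {                /* keep running any satisfiable process */
        progress = 0;
        for (int i = 0; i < P; i++) {
            if (!finished[i] && can_run(i)) {
                for (int j = 0; j < R; j++)    /* it completes and releases */
                    avail[j] += alloc[i][j];
                finished[i] = 1;
                progress = 1;
            }
        }
    }

    for (int i = 0; i < P; i++)
        if (!finished[i])
            printf("process P%d is deadlocked\n", i + 1);
    return 0;
}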

447
(Refer Slide Time: 29:32)

So, once a deadlock is detected, what should the operating system do next? There
are various things that the OS could do. One thing is that it could raise an alarm,
that is, tell the users and the administrator that a deadlock has indeed been detected.

(Refer Slide Time: 29:51)

A second way is to force a preemption; that is, a particular resource is forcibly
taken away from one process and given to another process. For instance, say R2
is a printer and it is currently held by B (refer to the above slide image). What could be
done is that the printer is forcibly taken away from B and allocated to

448
A for some time, and thus the deadlock is broken, as shown over here. So, B
no longer has the resource R2; instead, R2 is given to A.

(Refer Slide Time: 30:42)

A third method is to use a technique known as rollback. With rollback, both
processes A and B are checkpointed as they execute (mentioned in the above slide
image). By checkpointing we mean that the state of the process gets stored onto the
disk: the process executes for some time, and then its entire state gets
stored onto the disk. Storing the state of the process on the disk allows the
process to later resume execution from the point where it was checkpointed, that is, from the
checkpointed state. As time progresses, more checkpoints are taken periodically, as shown over here
(dotted lines in the above image).

Now, let us say a deadlock is detected after some time (mentioned in the above slide image).
What could happen then is that the system rolls back to the last non-deadlocked state,
that is, over here (the last dotted lines). The rollback is performed by loading the
states of processes A and B, in this particular example, back to the last known checkpointed state, as
shown over here (last dotted lines in the above image).

Now process A and B will continue to execute from this state (last dotted lines in above
image) and the deadlock may not occur again. Essentially we have seen that since
deadlock is a probabilistic event by modifying the ordering in which the allocations are
made we could prevent this particular deadlock.

449
A fourth way is to kill processes. Essentially, if processes A and B are in a deadlocked state,
killing one of them would break the deadlock. Typically, the less
important or lower-priority process would be killed.

Thank you.

450
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 07
Lecture - 32
Dealing with Deadlocks

So there are 3 ways to handle Deadlocks: Detection and Recovery from
Deadlocks, Deadlock Avoidance and Deadlock Prevention.

In this video, we will have a look at Deadlock Avoidance and Prevention. By avoidance
we mean that the system will never go into a state which could potentially create a
deadlock situation. To get an idea of how avoidance algorithms work, we will
take a very simple example.

(Refer Slide Time: 00:53)

(Refer Slide Time: 00:56)

451
So, let us say we want to go from this point A (the map’s blue dot in the above slide image)
to this point B (the red mark), and as usual there are multiple routes which could
take us there. This is one particular route (the gray route) and this is another route
(the purple route). Now we need to make a choice, or rather the right choice, that will
take us to place B in a safe way. For example, if we go through this
route (the purple route), it may be dangerous or may have some blocks on the
road.

On the other hand, going through a third way may have a similar situation. So, out of the
multiple ways in which we can go from one point to another, we need to make a choice of
which path to take. Deadlock avoidance algorithms work in a very similar way.
Let us take a particular example with two processes.
Let us take a particular example with two processes.

452
(Refer Slide Time: 02:09)

Process 1 and process 2, and we will represent the state of the system by this particular
graph (refer to the above slide image). On the x axis are the instructions executed by
process 1, and the y axis corresponds to the instructions executed by process 2; both axes
are with respect to time. Thus, process 1 and process 2 begin
to execute from the origin, that is, this point over here (arrow mentioned at the bottom of the
image), and they complete execution at the other end, that is, the point over here (arrow
mentioned on top). Now, as these processes execute in the system, the state of the system
will change, and the state changes depending on how these two processes execute with
respect to each other.

So let us say this state is represented by each of these blocks over here. Now, this line
over here indicates the execution path. For example, over here (blue line indicated in the
above graph) the line is horizontal, indicating that instructions corresponding to process
1 get executed. So, as a result of process 1 executing while process 2 is not executing, we
get the state of the system over here, which is shown by this (first square) small square;
then the state shifts to the one shown over here (second square); and then at this
particular point (first vertical blue line in the graph) there is a context switch and process 2
executes.

453
So as a result of process 2 being executed, the state of the system gets shifted to this
point. Then process 1 executes again and the state changes, then process 2 get executed,
process 1 and so on (refer blue graph line). And at the end we will have both process 1 as
well as process 2 reaching this particular point (arrow on top in above slide); essentially
both processes have completed execution.

Now, you see, depending on how the operating system schedules, and how various resources
are allocated or de-allocated to process 1 and process 2, this particular execution
trace of process 1 with respect to process 2 would vary. For example, in one
case process 1 executes continuously till it completes and then
process 2 executes till it completes; alternatively, process
1 executes a few instructions, then process 2 executes a few instructions, then process
1, process 2, process 1 and so on, till both of them complete.

(Refer Slide Time: 05:08)

Now, let us also state that process 1 and process 2 requires a resource R1 during its
execution. So, process 1 requires the resource R1 at this particular time. So, this
particular time over here which is shown at this point (horizontal R1 arrow mentioned in
above slide image), so, these instructions in process 1 requires the use of resource R1.
So, this has been highlighted by this purple block over here (refer slide time 05:08).

454
Now, in a similar way during this time from this point to this point (vertical R1 arrow in
above slide), process 2 requires the use of resource 1, so this has been highlighted over
here (black square in above slide graph). Now this intersecting square is the region where
both process 1 as well as process 2, require resource 1. So, this intersecting square
represented by the intersection of these two regions corresponds to the instructions in
both processes that request resource R1.

(Refer Slide Time: 06:15)

Similarly, let us say that there is another resource R2 which is required by these particular
instructions in process 2 (vertical R2 arrow in the above slide) and these particular
instructions in process 1 (horizontal R2 arrow in the above slide); correspondingly we
obtain an intersecting square as shown here (blue square in the above slide graph), and this
square represents the part where both processes P1 and P2 request resource R2. Now,
let us see the intersecting part of the R1 and R2 resources. If we
intersect this graph (refer slide time 05:08) with this graph (refer slide time 06:15), we
get the points where both processes request R1, that is, this purple part here (refer to the below
slide image); this part (3 blue blocks) is where both processes require R2; and the
intersecting area is where both processes require both resources R1 and R2 (red block in the
below slide image).

455
(Refer Slide Time: 06:58)

Now, this is the area which may potentially cause a deadlock (red block in the above image).
We call this an unsafe state. An unsafe state is not a deadlock state, but rather it is a
state which may potentially lead to a deadlock. Thus, if the OS were to schedule
process 1 and process 2 in such a way that the execution path ends up in this unsafe state,
then we may potentially have a deadlock. On the other hand, if the OS schedules the execution
of process 1 and process 2 in such a way that this unsafe state is avoided, then the
deadlock is avoided. So, this is the essential technique used to avoid deadlocks in
systems.

456
(Refer Slide Time: 08:33)

So the next question is - is there an algorithm that can always avoid deadlocks by
conservatively making the right choice? That is can we ensure that the system never
reaches an unsafe state by making some choices in the way processes execute or the way
resources are allocated and so on.

So, one way that we can build such an algorithm is by using what is known as a Banker's
algorithm. So, let us see an example with this.

457
(Refer Slide Time: 09:12)

Now, consider a banker with 3 clients A, B and C (refer above slide image). Each client
has certain credit limits and the sum of all these credit limits totals to 20 units. Now the
banker knows that the maximum credits of 20 units will not be used all at once so he
keeps only 10 units. To see what this means, let us say we consider this table (mentioned
in above slide image) with the 3 clients A, B and C and each client has in his account
some units. So, for instance A has got 3 units, B has got 2 units and C has got 2 units, so
the total of 7. And since, the banker only keeps 10 units so the remaining 3 units are free
and not allocated to any of the clients.

On the other hand, the maximum credits for each client is 9, 4 and 7 respectively (refer
slide time 09:12). So, you note that 9 + 4 +7 is the maximum credit limits which totals to
20 units. And this maximum credit limits is declared by the client in advance. Now,
given this particular table the banker should be able to allocate the maximum credits to
users in such a way that no unsafe state is reached. So, let us see this with examples.

458
(Refer Slide Time: 11:07)

So, the problem we are trying to solve here is whether these allocations maximum of 9, 4
and 7 can be made to A, B and C in such a way that the state is safe or rather such an
allocation can be made. So, let say we start with 2 units being allocated to B (mentioned
in above slide image). So, we have 3 units in the free store, out of these 3 units - 2 are
allocated to B. Therefore B has now 2 + 2 that is 4 units, and the free store now has 3 - 2
that is 1 unit free (refer table “Allocate 2 units to B” in above image). Now, after B
completes all the 4 units are returned to the free store (refer table “B completes”), so 4 +
1 will be 5 units present in the free store and these 5 units can then be allocated to C. So,
C will get its maximum credit limit of 5 + 2 that is 7 units as seen over here (refer table
“Allocate 5 to C”). So, the amount present in the free store is 0 units.

Now, after C completes, all 7 units of C are returned to the free store (refer to the table
“C completes”). Now the banker can allocate the resources required by A, that is, 6
more units, so that A reaches its maximum credit limit
of 9 units. So, we have 9 units allocated to A and 1 unit which is free (refer to the
corresponding table in the slide). Thus we see that with this particular starting
state of the system, where there are 3 units free and the maximum credit limits of the clients
are 9, 4 and 7 units (refer to the first table in the above slide), there is a schedule which is
possible so that all requests, up to the

459
maximum requests by the clients, can be fulfilled. We call such a state a Safe State,
and the banker could go ahead and make such allocations.

(Refer Slide Time: 13:35)

Now, let us look at another example where the state we start with, is that A has 4 units
with it, B has 2 and C has 2, so a total of 8 units are present with A, B and C respectively
(refer above slide image). And the number of free units is 2 and the maximum credit
limit of each client is as before 9, 4 and 7 respectively. Now, we see that irrespective of
how the banker tries to allocate these remaining units to each of its clients, there is no
schedule which is feasible that could fulfill these requirements.

For example suppose B were allocated 2 more units and you see that there are only 2
units free, so B is the only client which can be serviced. Therefore, B gets 4 units while
there are 0 free units present (refer table “Allocate 2 units to B”). Now after B completes
we have 4 units free (refer table “B completes”), but you see that neither A which
requires 5 more units, nor C which requires 5 more units as well can be serviced with 4
units which is present in the free store. Therefore, we see that this state is an Unsafe State
because there is no scheduling order which can cater to these requirements (process
requirements 9, 4 and 7). So, this is an unsafe state and the banker should ensure that
such a state is not reached at any point.

460
(Refer Slide Time: 15:33)

So, a similar mechanism can also be implemented in the operating system, where,
corresponding to each request, the operating system determines whether the system would
still be in a safe state if the request were granted. If it would indeed remain in a safe state,
indicating that there is a schedule of the resources so that all requests can be granted, then
the request is allocated. Otherwise the request is postponed until later.
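
For the single-resource banker of this example, the safety check can be sketched as follows
(a hypothetical illustration; a real implementation, like the detection algorithm seen
earlier, would track a vector of resource types). The function looks for some order in
which every client can reach its declared maximum and return all its units; if no such
order exists, the state is unsafe and the request that would lead to it should be postponed:

/* Hypothetical sketch of the banker's safety check for a single
 * resource type (the "units of credit" example from the lecture).
 */
#include <stdio.h>

#define N 3   /* clients A, B, C */

int is_safe(int has[N], int max[N], int free_units)
{
    int done[N] = {0};
    int progress = 1;

    while (progress) {
        progress = 0;
        for (int i = 0; i < N; i++) {
            /* client i can finish if its remaining need fits in free_units */
            if (!done[i] && max[i] - has[i] <= free_units) {
                /* it is granted its need, runs, and returns max[i] units;
                 * the net effect is that the free pool grows by has[i] */
                free_units += has[i];
                done[i] = 1;
                progress = 1;
            }
        }
    }
    for (int i = 0; i < N; i++)
        if (!done[i]) return 0;       /* someone can never finish: unsafe */
    return 1;                         /* safe */
}

int main(void)
{
    int max[N]  = {9, 4, 7};          /* declared credit limits */
    int has1[N] = {3, 2, 2};          /* first example: 3 units free  */
    int has2[N] = {4, 2, 2};          /* second example: 2 units free */

    printf("state 1 is %s\n", is_safe(has1, max, 3) ? "safe" : "unsafe");
    printf("state 2 is %s\n", is_safe(has2, max, 2) ? "safe" : "unsafe");
    return 0;
}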

(Refer Slide Time: 16:08)

461
The third way to handle deadlocks is what is known as Deadlock Prevention. Essentially,
we prevent deadlocks by designing systems where one of the 4 conditions cannot be satisfied
(conditions mentioned in the above slide image). We have seen these 4 conditions, and these
are the conditions which are essential for a deadlock to occur. By preventing one of
these conditions from holding in the system, we can therefore prevent deadlocks. For
example, if we design a system where the hold and wait condition cannot be satisfied, that is,
a process cannot hold a resource while waiting for another resource, then such a system
will not have deadlocks.

So, let us see various ways in which we can actually prevent each of these conditions
from holding (refer to the below slide for all the prevention techniques).

(Refer Slide Time: 17:10)

First, preventing Mutual Exclusion: in practice this is not feasible; we will not be able to
always prevent mutual exclusion. For example, take the case of a printer resource: it
needs to print on behalf of one particular process at a time, so it cannot be
simultaneously shared between two processes unless it has a spool present in it. However,
this being said, the operating system can ensure that the resources are optimally
allocated.

462
Next, let us see Hold and wait. One way to prevent this is to require all processes to
request all their resources before they start executing. This obviously is not an optimal
usage of the resources, and it may also not be feasible, since the resource requirements
may not be known at the start of execution.

Now, let us look at the third condition, no preemption. Here, resources can effectively be
pre-empted by virtualizing them (for example, by printer spools). With a printer spool, we
could have several processes sending documents to be printed at exactly the same time;
all these documents are spooled or buffered in the printer, and eventually each
document is printed one after the other.

The final condition we can target is the Circular wait. One way to prevent this is to
disallow a process from holding one resource and requesting another one at the
same time. A second way to prevent the circular wait is by ordering requests, in
either a sequential or a hierarchical order.

(Refer Slide Time: 19:20)

So let us look at the hierarchical order of resources. Here resources are grouped into
levels that is they are prioritized by some numeric value. A process may only request
resources at a higher level than any resource it currently holds, and resources may be

463
released in any order. For example, let us say we have semaphores s1, s2 and s3 with
priorities in increasing order that is s3 is the highest priority, while s1 is the lowest
priority (mentioned in above slide image). So, let us say a process makes this particular
sequence of semaphore requests S1, S2 and S3. So, we see that an ordering is followed
that is S2 is greater than S1, and S3 is greater than S2 (i.e down(S1); down(S2);
down(S3)). So, this kind of allocation should be allowed.

However, in this case where we have the first S1 then S3 and then S2 (i.e down(S1);
down(S3); down(S2)), so in this particular case the allocation is not allowed because the
ordering is not maintained.
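
A minimal, hypothetical sketch of this ordering discipline, using POSIX semaphores for
s1, s2 and s3, is shown below; any code that needs more than one of them acquires them
strictly in increasing level, which makes a circular wait impossible:

/* Hypothetical sketch: acquiring semaphores only in increasing level
 * order (s1 < s2 < s3) so that a circular wait can never form.
 * Compile with: gcc ordering.c -lpthread
 */
#include <semaphore.h>
#include <stdio.h>

sem_t s1, s2, s3;          /* level 1, level 2, level 3 */

void allowed_sequence(void)
{
    sem_wait(&s1);         /* down(S1): level 1           */
    sem_wait(&s2);         /* down(S2): level 2 > 1, ok   */
    sem_wait(&s3);         /* down(S3): level 3 > 2, ok   */
    /* ... use the resources ... */
    sem_post(&s3);         /* releases may be in any order */
    sem_post(&s2);
    sem_post(&s1);
}

/* down(S1); down(S3); down(S2) would violate the discipline: after
 * holding s3 (level 3), requesting s2 (level 2) breaks the "only
 * request higher levels" rule, so the code must not be written that way. */

int main(void)
{
    sem_init(&s1, 0, 1);
    sem_init(&s2, 0, 1);
    sem_init(&s3, 0, 1);
    allowed_sequence();
    printf("acquired and released s1, s2, s3 in level order\n");
    return 0;
}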

Thank you.

464
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 07
Lecture – 33
Threads (Light Weight Processes) Part 1

Consider this particular problem. If it takes one man 10 days to do a job, then how many
days will it take 10 men to do the same job? We have all done such problems during
our school days, and the answer is quite simple to compute: it would take 10
men 1 day to do the job.

Now, the job is done much more quickly because the 10 men are working in parallel. Each
man just needs to do one-tenth of the entire job, and the entire job is completed
within a single day. This very same concept of parallelization is applied extensively in
computer systems. Essentially, parallelization in computer systems is used quite often
to improve the performance of applications. What is done is that if there is a large job,
we parallelize it in the system, so that different computing entities each
do a small portion of the larger job; these computing entities execute in parallel, and
all together the job completes much
more quickly.

So, these days parallelization is extensively supported by computer hardware. In
fact, there is hardware, such as GPUs or Graphics
Processing Units, which is capable of performing thousands of tasks in parallel. Each
of these tasks is a very small part of a larger job, but the fact that all these tasks are
done in parallel achieves a much shorter time to complete the larger job.

Now, one important construct when doing parallelization is threading, or threads. Threads
are essentially execution entities, very much like the processes which we have studied in the
previous videos. However, threads are extremely lightweight processes, and they are
essentially used to make parallelization much easier.

In this video we will look at threads and essentially provide an introduction to
them. The video will cover how threads are created and destroyed, how they differ
from processes, and how different operating systems support threads in different ways.

465
(Refer Slide Time: 03:12)

So let us start this particular video with an example. Let us consider this particular
program (refer to the program in the above slide image): we have a function here called
addall() which is invoked from main() and returns an unsigned long. This function
addall() just sums up, or adds up, the first 10 million positive integers; the sum is
returned over here (inside the main() function, to the variable sum) and printed in the main
function. Obviously, this particular program could be written in a much
simpler way, without having to iterate through every number from 0 to 10
million. However, for the sake of understanding threads, let us go with this particular
program.
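
The exact listing is on the slide; a close approximation of it (with variable names of my
own choosing) is:

/* A sketch approximating the sequential program on the slide:
 * add the first 10 million numbers in a single loop.
 */
#include <stdio.h>

#define MAX 10000000UL

unsigned long addall(void)
{
    unsigned long sum = 0, i = 0;
    while (i < MAX) {        /* iterate over all 10 million numbers */
        sum += i;
        i++;
    }
    return sum;
}

int main(void)
{
    unsigned long sum = addall();
    printf("sum = %lu\n", sum);
    return 0;
}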

So, what we have seen in the earlier videos is that after we write this program, we use a
compiler such as gcc which creates an executable, and then that executable is run to
create a process. So, we will eventually have a process which has its own memory space,
part of which may be in the RAM (refer to the above slide image). Now let us see what
would happen if this process is executed on a system with, say, 4 different processors.
What happens is that the scheduler in the operating system is going to pick up this
process and assign it to one of these processors. Once assigned, this process
will execute on that processor; rarely, the process may migrate to
other processors, but let us not get concerned with that.

Now, what we observe here is that even though we have 4 processors present in the

466
system, this entire program (refer slide time 03:12), which would take considerably long
to execute, just runs on a single processor, leaving the other 3 processors
idle or doing some other tasks. Thus the execution time of this entire program will be
quite large, because the sum is found sequentially by iterating through all numbers from
0 to 10 million in a sequential manner, as done in this particular while loop. So, given that
we have hardware like this with 4 processors, this is not a very good or efficient way
to write this particular program. What can we do better, to make this program
parallelizable and make it execute on all these 4 processors?

(Refer Slide Time: 06:07)

So, what we will do is that we will take 10 million numbers and divide it by 4 to get 4
quarters of 2.5 million each. So, then we would create 4 processes and each process will
loop through a quarter of these numbers; that means, the first process would loop through
numbers from 0 to 2.5 million and find the sum of all these numbers, process 2 will find
the sum of the numbers from 2.5 million to 5 million, a process 3 would find the sum of
numbers from 5 million to 7.5 million and process 4 will find the sum of numbers from
7.5 million to 10 million.

So, if we are able to create 4 processes in such a way, then when we execute these 4
processes it will be something like this (mentioned in the above slide image). Each of
these 4 processes executes on a different processor in the system, and each of these
processes just does the smaller job of finding the sum of 2.5 million numbers instead of

467
finding the sum of the entire range of 10 million numbers.

Further, since each process is scheduled on its own processor, we get
parallelization. Thus, what one would expect is that the entire program of finding the sum
of the 10 million numbers can be completed 4 times faster. So, this seems to solve our
problem of parallelization, and it will use the processors present in the
system more effectively. Some of the properties of this particular approach are as
follows. In order to create these 4 processes (mentioned in slide time: 06:07),
we would require 4 fork system calls; each fork system call that we invoke
would create one of these processes.

Further, another property that we notice of this particular model of speeding up, or
achieving parallelization, is that these processes are isolated from each other.
That is, each process has its own set of instructions, data, heap and stack, and these
segments in each process are isolated from those of the other processes among these 4.

Next, due to this isolation, we require inter-process communication techniques, as we have
seen in a previous video, in order to communicate from one process to another. We
have seen that IPC can be achieved in several ways. One way is by having send
and receive system calls, where the operating system helps in creating a channel through
which IPC is achieved. The other way is shared memory, in which case a
shared memory region is created in one of these processes (the 4 processes mentioned in the above
slide), again by the use of system calls, and this shared memory can then be used for IPC
between the processes.

Further, there are various stages in the life of a process, such as creation, destruction,
file operations and so on, and all these process management activities are done through
system calls. And, as the last point, as we have mentioned, each process has its own
isolated memory map comprising instructions, data, stack and heap. What we
notice from these properties is that there is a considerable number of
system calls involved in solving the problem in this particular way, that is, by having
multiple processes to perform the parallelization.
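
As an illustration, and not the code from the slides, the process-based approach just
described might look roughly like the following sketch, which uses a pipe as the IPC
channel for the four partial sums:

/* Hypothetical sketch: parallelising the sum with 4 processes created
 * by fork(), using a pipe as the IPC channel for the partial results.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define MAX   10000000UL
#define NPROC 4

int main(void)
{
    int fd[2];
    if (pipe(fd) < 0) { perror("pipe"); exit(1); }

    for (int p = 0; p < NPROC; p++) {
        if (fork() == 0) {                    /* child p */
            unsigned long start = p * (MAX / NPROC);
            unsigned long end   = start + (MAX / NPROC);
            unsigned long partial = 0;
            for (unsigned long i = start; i < end; i++)
                partial += i;
            write(fd[1], &partial, sizeof(partial));   /* send result */
            exit(0);
        }
    }

    unsigned long total = 0, partial;
    for (int p = 0; p < NPROC; p++) {         /* parent collects 4 results */
        read(fd[0], &partial, sizeof(partial));
        total += partial;
    }
    while (wait(NULL) > 0)                    /* reap the children */
        ;
    printf("sum = %lu\n", total);
    return 0;
}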

And as we have seen in an earlier video, these system calls have considerable overheads.
Every time one of these processes invokes a system call, it results in the operating
system being triggered, due to a software trap, and the operating system

468
then determines which system call was invoked, and the corresponding system call handler
is executed. As we have seen in an earlier video, the overhead of this entire process can be
quite considerable.

Another overhead comes from the fact that a large portion of these 4 processes is
similar. For example, all 4 processes would have the same set of instructions which
they operate upon, and there could also be the same array and the same
global data which they are accessing. Therefore, what we see is that there is a lot of
duplication of instructions and data, which happens due to having multiple processes
that execute a job in parallel. So, is there a way we can do better than this? Is there a
solution where the amount of overhead involved in parallelization
can be reduced? To achieve this, what is used is the concept of Threading.

(Refer Slide Time: 12:02)

So, with threading, like before, we divide the entire range of 10 million into 4 parts of 2.5
million each. But instead of creating 4 processes as we have done earlier, we will
create only one process, and within this process we create execution entities which we call
threads. Each of these threads (mentioned in the above slide, one process with 4 threads),
quite like the previous idea of using parallel processes, would have a loop
which does a quarter of the work; that is, each of these threads would find the sum of 2.5
million numbers.

So, pictorially this is what it is going to look like (refer slide time 12:02). We have 1

469
process, and this 1 process would have one set of instructions, global data,
a heap and other segments, and within this process we have 4 execution contexts. These
4 execution contexts are the 4 threads, and they execute within this particular process. And
each of these threads is typically executed on a different processor. For
example, the red thread goes to processor 1 and executes
there, the blue thread goes to processor 2, and green and
yellow go to processors 3 and 4 respectively.

Thus we are able to achieve parallelization by using threads. This parallelization is
quite similar to the process parallelization that we saw in the previous
slide; however, it has lower overheads. The properties of this particular thread model
of parallelization are as follows. Since we have created only one process over here,
what we require is just one fork system call to create this process. Within this
particular process, in order to create the 4 threads, we may use a library call such as
pthread_create(), which may or may not require system calls.

Further, a major difference with respect to the process parallelization from the previous
slide (refer slide time 06:07) is that these threads are not isolated from each other. So,
what this means is that since all these 4 threads execute in the same process space, so
these threads could access the processes segment such as the heap segment and the
global data. Also it is possible that the instructions or the text segment in this process is
also shared among the various threads. Now what is different or what distinguishes the
different threads is that each of these threads will have its own stack and having its own
stack will allow it to have its own execution context. So, each of these threads will be
able to use the stack to create local variables as well as to store information about
function calls and so on.

Further management of these threads that is for example, waiting for another thread or
exiting a thread, may not require to have system calls. So, what we will see now is these
threads in more detail.

470
(Refer Slide Time: 15:53)

So let us look at it with this particular figure (refer above slide). So, we have this single
process over here (complete box mentioned above) and within this particular process we
have multiple threads (3 threads as given in above slide). So, these are the execution
context of this process. So, these execution contexts all share the same code, and have
access to the same global data and heap of the process, and also they will have access to
the same files which are opened by the process (given on top of the box). So, what
distinguishes one thread from the other is that each thread has its own stack which it uses
for local variables and for function calls.

So these threads are, as mentioned, execution contexts, and each of these threads could be
scheduled to run on one of the processors present in the system. It is also
possible that these threads are scheduled onto the same processor, at different
time instants. Now, each thread executes almost independently of the others;
therefore, the registers shown over here (mentioned in the above slide in the process box) signify
the execution context of a thread. So, as we have seen, each thread has its own execution
context.

So the execution context of this thread (first thread in above process box) comprises of a
set of registers like the EIP and the ESP register. Similarly the execution context of this
thread (second thread) comprises of the stack and registers values. So, these register
values would help in the stopping and restarting of threads after or during a context

471
switch.

Now, what is the major advantage of having threads over processes?

(Refer Slide Time: 17:52)

So this is actually seen from this particular slide (refer to the above slide); essentially, the most
important difference comes from the fact that threads are extremely lightweight compared
to processes. To take an example, let us have a look at this particular table (mentioned in the
above slide image), which shows the time required to create 50,000 processes using the
fork system call, and compares this with creating 50,000 threads on exactly the same
machines. Let us take any of these machines as an example, say the first one
(first row of the table). If you look at this particular column (second column),
the time taken to create 50,000 processes using the
fork system call is 8.1. On the other hand, creating 50,000 threads on exactly the same machine
only requires a time of 0.9.

So, we see that creating threads is much more efficient than creating processes. In
a similar way, if we analyze other operations as well, we would find that
creating and managing threads is much lighter than managing processes. Another
important advantage of threads is that they allow efficient communication between the
execution entities. For example, we have seen that since each process is isolated
from the others, the only way processes can communicate is by IPC, and
as we have seen, IPC involves quite a few system calls. For example, in the message

472
passing IPC, where we use send and receive, each send and receive invocation is a system
call and has significant overheads.

Similarly for shared memory IPCs, creating and managing the shared memories is again
done by system calls, again has overheads. On the other hand communication between
two threads in a single process can be done extremely easily, the reason for this is that
the global data of the process is shared among the various threads and this means that
each thread has access to the global data in the process. Therefore, communication
between threads in a single process can be easily achieved via the shared global data or
similarly via the shared heap of the process.

Another big advantage of threads is that they result in efficient context switching. When
we switch from one process to another, the context switching time is considerable,
because the process state is quite large. Also, the TLB needs to be flushed, the
page tables for the new process become active, and entries from the new page table are then
loaded into the TLB, and so on. On the other hand, switching from one thread to another,
first of all, requires saving a smaller context, because the context of a thread is
much smaller than that of a process. Therefore, a lesser amount of context needs to
be saved when switching between threads, and thread context switching is therefore
faster.

Another reason why context switching is faster with threads is that we are not changing
page tables. Essentially, since the threads execute within the same process, and each
process is associated with a set of page directories and page tables, when switching from
one thread to another these page tables and page directories remain the same.
Therefore, there are no overheads of making another page table active; the
locality is maintained, and cached entries in the TLB will still be valid.
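
If you wish to reproduce a comparison like the one in the table on your own machine, a
rough, hypothetical micro-benchmark along the following lines can be used (the absolute
numbers will of course depend on the machine and will not match the table):

/* Hypothetical micro-benchmark sketch: compare the time to create (and
 * reap) N processes with fork() versus N threads with pthread_create().
 * Compile with: gcc bench.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define N 10000

void *noop(void *arg) { return NULL; }

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    double t0 = now();
    for (int i = 0; i < N; i++) {
        pid_t pid = fork();
        if (pid == 0) _exit(0);          /* child does nothing */
        waitpid(pid, NULL, 0);
    }
    printf("%d processes: %.2f s\n", N, now() - t0);

    t0 = now();
    for (int i = 0; i < N; i++) {
        pthread_t t;
        pthread_create(&t, NULL, noop, NULL);
        pthread_join(t, NULL);
    }
    printf("%d threads:   %.2f s\n", N, now() - t0);
    return 0;
}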

473
(Refer Slide Time: 22:27)

So this particular slide (mentioned in above slide image) re-iterates a lot of differences
between processes and threads. So, essentially a process is an isolated execution entity
which has its own code, heap, stack and other segments. On the other hand, a thread does
not have its own data segment and heap; essentially, it uses the process's
data segment and heap. So, all threads within a particular process use the same
data segment and heap. Another difference is that a process by itself is a complete entity.

You could consider a process as having a single thread, that is, a single execution
context, and it can survive on its own. On the other hand, a thread cannot live on its own;
it needs to be attached to a process. When a process terminates, all the
threads that are present in that process also terminate. In other words, without a
process a thread cannot execute. On the other hand, a process may have just a single
thread, and that is sufficient for the process to execute.

So, why do we say a single thread? This means that the single execution context of the process
forms one thread. Threads within a process share the same code and files, as we have
seen in the earlier slide (refer slide time 15:53), but each thread has its own stack, and this
stack is reclaimed when the thread dies. It is possible that when a thread terminates,
the rest of the process continues to execute. For example, if a process has, say,
10 threads and one of the threads terminates, then the remaining nine threads will
continue to execute.

474
On the other hand if a process terminates, then all threads within that process will also
die or will also terminate.

(Refer Slide Time: 24:50)

So, how do we create and manage threads? Now there are several libraries to do so, but
what we will look in this particular video is a very popular library known as the pthread
library. So, we will see some of the very critical or very important functions in the
pthread library which are used to create and manage threads. So, let us see how we could
create a thread in a process. So, in the pthread library a thread is created by using this
function pthread_create(), which takes several arguments. So, it takes like 4 arguments
and these 4 arguments are as shown over here (mentioned in above slide image).

So the 1st argument which is a pointer to pthread_t is the thread identifier or TID (refer
above slide). So, this is very similar to the process identifier or PID which we have
studied in the process video. The 2nd argument to pthread create is the attribute.

So through this attribute you could specify several properties of the thread that you have
been creating. So, what pthread_create() function would do is that when it is invoked, it
is going to create a thread context or rather a new thread context in the process and it is
going to start executing that new context from this function specified over here. So this
3rd parameter specified as start_routine is a pointer to a function, which begins to execute
in the new thread context.

475
So this function (3rd argument) will start to execute in a different thread. The last
argument for pthread_create is arg which is a pointer to the arguments to this particular
start routine function (3rd argument). So, in the next couple of slides, we will see
examples of how to use pthread create and also examples of the other functions in
pthread.

So now, that we have created a pthread and let us say it has done its job. The next thing is
to actually destroy a thread and this is done by this function called pthread_exit(). So,
pthread_exit() will also pass a pointer to the return value, in order to pass the return
status of the thread (mentioned in above slide image). So, this is in many ways similar to
the exit system call that we have seen in processes.

(Refer Slide Time: 27:28)

Now, another important pthread library function is pthread_join(). Through this, a
process or a thread can wait for a specific thread to
complete. This is much like the wait system call that we have seen with respect to
processes. In pthread_join(), what is specified is the TID of the thread to wait for, that
is, the thread identifier of that thread, and what is obtained over here is the exit status of the thread
(refer to the above slide for the arguments of pthread_join()).

This pthread_join() will block the calling thread until the thread specified by the TID
passed over here (first argument of the function) exits; when the thread exits,
pthread_join() will wake up and it will be able to read the exit status of the exited thread

476
specified by the TID. So, given these particular 3 functions, pthread_create(),
pthread_exit(), and pthread_join(), let us see how we could use pthreads in order
to parallelize our program of finding the sum of the first 10 million positive integers.
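
For reference, these three functions are declared in <pthread.h> with the following
prototypes:

#include <pthread.h>

int  pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                    void *(*start_routine)(void *), void *arg);
void pthread_exit(void *retval);
int  pthread_join(pthread_t thread, void **retval);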

(Refer Slide Time: 28:48)

So, this is the code to do that (mentioned in above slide image). So, what this particular
code does is that it creates 4 threads and each thread finds the sum of a different 2.5
million set of numbers. And these 4 threads execute in parallel and therefore, very
quickly find the different 2.5 million numbers and then we add the results to get the sum
of the first 10 million numbers.

So let us see this example in more detail (refer to the code in the above slide). First of all, we
define 4 variables of type pthread_t; these are t1, t2, t3 and t4 (mentioned
inside main()). These are going to be used to create the 4 threads, and they will
contain the thread identifiers, or TIDs. Now, as we have seen in the earlier slide, creating a
thread is done by the function call pthread_create(). Each of these pthread_create()
invocations (mentioned inside main()) passes a pointer to one of t1, t2,
t3 and t4, which then gets filled with the thread identifier, or TID, of the newly
created thread.

The second parameter, that is, the attributes of the thread, is specified as NULL, while the
third parameter is a function pointer, which specifies the function that the thread should
execute. Over here, the function pointer is a pointer to the function thread_fn, which
is defined over here (above the while condition, i.e., void *thread_fn(void *arg)), and it takes
an argument void *arg. The last parameter passed to pthread_create() is the
argument which will be passed in arg of this particular function (i.e.,
*thread_fn()).

So, in this way, when pthread_create() is invoked, it creates a thread context in the
process, and this thread context, or execution context, will begin to execute from this
particular function (i.e., void *thread_fn() mentioned in the code). The argument passed
to this particular thread is as specified in the last argument of pthread_create(). In
this way we invoke pthread_create() 4 times, thus creating 4 threads within the single
process. All 4 threads have the same starting function, that is, thread_fn, and the
thread IDs are filled into the first parameters, that is, t1, t2, t3, and t4 respectively.

Now each of these 4 threads which will begin executing from this thread function (i.e
*thread_fn()) which is like, you could think of like the main of the thread, would see a
different value of arg. For example, thread 1 would have a arg value of 0, thread 2 would
have a arg value of 1, thread 3 would have an arg value of 2 and thread 4 would have an
arg value of 3. So, using this arg value which is then type casted to ID and then it is
actually used to determine the start.

So, start (a variable defined in the above slide) is then used to determine which 2.5 million numbers that particular thread should add. For instance, if arg in a thread is 0, then the value of ID after typecasting is 0 and the start value would also be 0. So, this thread with ID 0 would add the first 2.5 million numbers, that is, the numbers from 0 to 2.5 million.

Now, in addition to this, a global array called sum is defined (at the top in the above slide). This is an unsigned long array of 4 elements, and it is used to accumulate the partial sum for each thread. So, sum[0] would contain the sum of the numbers from 0 to 2.5 million, which is filled by the thread with ID 0, that is, the t1 thread.

Similarly, sum[1] will have the sum of the numbers from 2.5 million to 5 million and is filled by thread t2. In a similar way, sum[2] and sum[3] hold the corresponding 2.5 million partial sums filled by t3 and t4. While this is going on, the summation loop, which takes a considerably long time, is being executed by the 4 threads independently.

478
What the main thread does after creating the 4 threads is invoke join (mentioned inside main() after pthread_create()). It invokes 4 joins corresponding to the 4 threads, and each join call blocks until the corresponding thread specified by the TID exits. Thus when thread t1 exits from this particular function (i.e. *thread_fn()), the pthread_join() on t1 wakes up and completes executing. In this way, when all 4 join functions have completed, it indicates that all 4 threads have completed their execution.

Then, after the 4 threads have exited, the elements of sum, which contain the partial sums of the 2.5 million numbers, are taken and added to get the final result, which is printed using printf(). This final sum is the sum of the first 10 million numbers. So we have seen that the large job of adding the numbers from 0 to 10 million is broken down into 4 parts, and each part is computed by a thread to get a partial result, which is then added up.

Now, in order to compile and run this program, we use something like gcc threads.c, and we also specify -lpthread, the pthread library, which needs to be linked into the executable (i.e. gcc threads.c -lpthread). gcc would produce an a.out executable, which is then executed.

So, you can actually try out this program on a Linux system and determine the amount of speed-up that you obtain by having different threads do the job, compared to different processes and also to a sequential program where only one process executes the job.

479
(Refer Slide Time: 36:04)

Besides pthreads, there are several other libraries which you could use to create multithreaded programs. For example, the Windows threads library is used for Microsoft Windows based applications, Boost provides a thread library for C++, and there is also another library known as Linux threads, and so on. If you Google for thread libraries you will find a large list of such libraries.

Thank you.

480
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 07
Lecture - 34
Threads (Light Weight Processes) Part 2

So, now that we have seen how threads are created and how they can be used to solve large jobs through parallelization, we will look at how threads are managed in systems. Essentially, as we have seen, threads are execution contexts, and therefore we need some entity which manages the thread resources and decides which thread should execute, which CPU should be used, and so on.

(Refer Slide Time: 00:45)

In order to do this, the two strategies that are available are known as user threads and kernel threads. User threads are threads where the thread management is done by a user-level thread library; typically, the kernel does not know anything about the threads running. With kernel threads, the threads are directly supported by the kernel, and these are sometimes known as light weight processes.

So, we will look at user threads and kernel threads in more detail taking one at a time.

481
(Refer Slide Time: 01:20)

So let us start with user threads. With user threads, as shown over here (mentioned in above slide image), we have the kernel space, where the operating system or the kernel executes, and we have the user space, which has different processes executing. Each of these processes could have multiple threads running. In addition to this, we have a run time system (the rectangle inside the process circle), which manages the several threads in that particular process. Also required is a thread table, which is stored as part of this run time system. Note that this thread table contains information which is local to only the threads in this particular process.

Another process would have its own run time system and its own thread table. Notice that the number of entries in the thread table is equal to the number of threads that are executing. Also note that the thread table, which is sometimes known as the TCB or thread control block, is different from the process control block which is stored in the kernel space. Besides the fact that the thread table will have far fewer entries compared to the process control block in the kernel space, it also resides in user space.

So, the advantage of user level threads is that they are extremely fast and really light weight. Essentially this is because there are no system calls required to manage threads. The run time system present over here (mentioned in slide time 1:20) does all the thread management, all the context switching between the threads, and so on.

As such, the kernel would not have any knowledge that a particular process has multiple threads. The second advantage is that this particular mechanism of supporting threads is useful on operating systems that do not support threading. Another important advantage of user level threads is that switching between threads is extremely fast, and the reason it is fast is that there is no switch from user mode to protected mode and back to user mode again.

Rather, switching between threads just requires a switch within user mode itself, from one thread to another. The drawbacks, however, are also several. One important drawback is that there is a lack of coordination between the operating system kernel and the threads. This is because the OS is not aware that a process is multithreaded, and it has no indication of how many threads are present in a particular process.

To take an example of what problems this could cause, consider a system which has 2 processes, and one of these processes has 100 threads. The kernel does not know about this, because it is unaware of the several threads that are present in a process, and therefore, when it does context switching, it is unaware that one process has far more threads than the other. So, it would allocate the same time slice interval to the process with 100 threads as to the process with the single thread.

So, what happens is that, because the scheduler in the kernel is unaware of the number of threads that are executing, scheduling decisions cannot be made to favour processes with a larger number of threads. Another drawback with respect to scheduling arises when the threads are in different states. For example, let us say a particular process has three threads, and one of these threads is in the runnable state while the other two threads are in the blocked state. The operating system is unaware of this. So, what should the decision be? Since one of the threads is runnable and can execute on the processor, should the process be considered a runnable process, or, since two of its threads are blocked, should it be considered a blocked process and not be scheduled on the processor?

A third issue occurs with respect to system calls. Suppose system calls are blocking, in the sense that when one of these threads invokes a system call, then all other threads need to wait until that first thread completes its system call invocation. Thus, in order to support this user level thread model, the OS should preferably support non-blocking system calls.

(Refer Slide Time: 06:35)

So now let us look at kernel level threads. As before, the process, which runs in user space, could have multiple threads, with each thread being a separate or independent execution unit, and the management of these thread resources is done in the kernel space. Along with the process control block, which is shown here as a process table, the kernel also maintains a thread control block in the kernel space (mentioned in above slide). This is quite unlike user level threads, where the thread table is maintained in user space; here the thread table, or TCB (thread control block), is maintained in the kernel space. Therefore, the kernel is aware of the number of threads that a process executes and could make a lot of decisions based on this fact.

484
For example, the scheduler could make smarter decisions about how much time slice should be allocated to a process, depending on the number of threads the process is running. For instance, the scheduler could decide to give processes with a larger number of threads more time to execute. Another advantage is that, since threads are managed by the kernel, blocking on system calls is not required. Essentially, when a thread executes a system call, the other threads in that particular process can continue their execution.

They do not have to block until the thread (mentioned in above slide image) which invoked the system call has completed its invocation. The drawback of this kernel level thread model is that it is slow. Essentially, managing threads involves kernel invocations, and since these kernel invocations involve system calls, it is considerably slower. There are also overheads with respect to the operating system, because the kernel needs to manage and schedule threads as well as processes.

This means that, in addition to the metadata present in the kernel about the process, more metadata needs to be present for each thread executing in the process, and therefore the overheads in the kernel could be significant. So, when we actually design an operating system, or for that matter an entire system which supports threads, there are several aspects which need to be taken care of, which may lead to a lot of complexity. This particular slide (refer below slide image) highlights some of those issues.

485
(Refer Slide Time: 09:15)

For example, what should the system do when a thread invokes a fork? As we know, fork is a system call which causes the operating system to execute and results in the duplication of the process: a new process is created which is an exact copy of the invoking process. Now, what would happen if a thread invokes the system call fork? There are several options that one could think of. What should the OS do? Should all the threads that are executing in the process be duplicated; in other words, should a new process be created which also has the same number of threads executing in the same states?

This is easier said than done; essentially, it could also create a lot of synchronization issues. For example, consider the case where we have a process with two threads: one thread is executing a critical operation, say in a critical section which is accessing a critical resource, while the other thread invokes the fork system call. What would happen if the OS duplicates the entire process along with all its threads? As we have mentioned, the second thread in the new process would also be in the critical section, and as we have seen in earlier videos, this could be catastrophic; essentially, it could change the output of the program. Another approach that one could follow while designing or managing this particular aspect is to duplicate only the caller thread.

486
So, even though the process may have 10 different threads, if one of these threads invokes the fork system call, the new process created will only duplicate that thread. The new process will not have the remaining 9 threads, but only that one thread. Another thing to think about is what should happen when there is a segmentation fault in a thread: should the operating system terminate just that single thread, or should the entire process be terminated? Making these choices is not easy, and operating system designers need to make critical decisions about how to manage these aspects while designing the operating system.

(Refer Slide Time: 11:46)

So let us see one typical application of the use of threads. You could take, for example, a network card where packets keep coming in through the network, and these packets have to be serviced, let us say, through threads. So let us say we have a loop over here (mentioned in above slide) which keeps waiting for an event to occur, for example a packet on the network, and when an event occurs, it spawns or creates a thread which services that event and then terminates. If, during the servicing of that event, another event occurs, then a new thread is created. In this way we could have several events being serviced simultaneously.
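A hedged sketch of this thread-per-event pattern is given below; wait_for_event() and handle() are placeholder stubs standing in for a real event source and a real service routine, not actual APIs.

    #include <pthread.h>
    #include <unistd.h>

    /* Placeholder stubs standing in for a real event source and handler. */
    static void *wait_for_event(void) { sleep(1); return "packet"; }
    static void  handle(void *ev)     { (void)ev; /* service the event here */ }

    static void *service(void *ev)
    {
        handle(ev);            /* service the event, then the thread terminates */
        return NULL;
    }

    int main(void)
    {
        for (;;) {
            void *ev = wait_for_event();           /* e.g. a packet arriving          */
            pthread_t t;
            pthread_create(&t, NULL, service, ev); /* one short-lived thread per event */
            pthread_detach(t);                     /* its resources are freed on exit  */
        }
    }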

487
This approach is scalable because it can service multiple events simultaneously. The drawback of this particular model is the overhead: creating and terminating threads, though relatively light weight, is still an overhead which could reduce performance and affect the entire system. So, what applications typically do is use a technique known as thread pools.

(Refer Slide Time: 13:10)

In this particular technique, what the application does on creation is to create a pool of threads. For example, it could create, say, 50 or 100 different threads, and these threads would typically be in a blocked state.

Now, there would be a main loop which keeps waiting for an event to occur (mentioned in above slide image), and when an event occurs, one of these threads which is in the blocked state is woken up; that thread then services the event and goes back to the blocked state. In this way, if there are 50 threads in the pool which are already created, then 50 events can be serviced simultaneously without any overheads. If a 51st event occurs while all 50 threads in the thread pool are busy servicing events, then the 51st event will need to wait. In this way we have eliminated the overheads of creating and destroying threads whenever an event occurs; a sketch of this pattern follows below.
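A hedged sketch of such a pool, using pthreads with a mutex and a condition variable (the queue is a simplified fixed-size ring buffer, and the worker count, names and event representation are illustrative assumptions, not from the lecture):

    #include <pthread.h>
    #include <stdio.h>

    #define NWORKERS 4
    #define QSIZE    16

    static int queue[QSIZE], head, tail, count;
    static pthread_mutex_t lock     = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

    static void *worker(void *arg)                /* created once, lives forever */
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (count == 0)                    /* block until an event arrives */
                pthread_cond_wait(&nonempty, &lock);
            int ev = queue[head];
            head = (head + 1) % QSIZE;
            count--;
            pthread_mutex_unlock(&lock);

            printf("servicing event %d\n", ev);   /* service the event */
        }
        return NULL;
    }

    static void submit(int ev)                    /* called by the main event loop */
    {
        pthread_mutex_lock(&lock);
        if (count < QSIZE) {                      /* a real pool would make the caller wait here */
            queue[tail] = ev;
            tail = (tail + 1) % QSIZE;
            count++;
            pthread_cond_signal(&nonempty);       /* wake one blocked worker */
        }
        pthread_mutex_unlock(&lock);
    }

    int main(void)
    {
        pthread_t t[NWORKERS];
        int i, ev;

        for (i = 0; i < NWORKERS; i++)            /* create the pool up front */
            pthread_create(&t[i], NULL, worker, NULL);

        for (ev = 0; ev < 20; ev++)               /* stand-in for "wait for an event" */
            submit(ev);

        pthread_join(t[0], NULL);                 /* workers never exit in this sketch */
        return 0;
    }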

488
So now, the only requirement is to pick a thread from the thread pool, which then services the event. Another important aspect of thread pools is the number of threads in the pool. This is critical for every application and is very application dependent. For example, if you have a thread pool with very few threads, maybe 4 or 5, then it could service only 4 or 5 events simultaneously.

If more events occur, then the events would have to be queued until the threads have completed their servicing, and therefore performance is affected. On the other hand, if we have a large number of threads in the thread pool, for example 1000 threads, while events do not occur that often, then a large number of threads are simply wasting resources sitting idle in the pool. Therefore, the number of threads in the thread pool is a critical choice that an application designer will have to make.

So, with this we have given a brief introduction to threads: we have seen the difference between threads and processes, how threads can be used to reduce execution time and improve the performance of applications, a brief introduction to how threads are managed in operating systems, and the various uses of threads.

Thank you.

489
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 08
Lecture – 35
Operating System Security

Hello. In this video we will look at operating system security. Security has become extremely important in the current world because systems are always connected, and as a result it becomes very easy for malicious programs to enter a system. Therefore, operating systems should be designed so as to prevent, as much as possible, any loss of information due to such malicious intrusions.

So, we will see a brief introduction to Operating Systems Security in today’s video.

(Refer Slide Time: 01:00)

So, whenever we design a system, we need to have some security goals, and these goals are the secrecy, integrity and availability of the system (mentioned in above slide image). Secrecy, or confidentiality, means that certain objects should not be visible to particular processes. For instance, depending on the privilege of a process and also on the privilege of the user running that process, certain objects like files stored on the disk or certain network sockets should not be readable by that process. Essentially, what secrecy or confidentiality achieves is that unauthorized disclosure of certain objects, like files and sockets, is prevented.

Integrity, on the other hand, is about preventing unauthorized modifications. In other words, a malicious user, or a user without sufficient privileges, should not be able to write to a particular object, such as a file or a socket. To take an example, a normal user in the system should not be able to modify system related files.

Availability means that a particular user or a particular process is limited in its use of the system resources. In other words, one particular user should not be able to hog the entire system's resources.

If availability is not ensured in the system, it could lead to what is known as denial of service attacks, wherein some malicious program running on the system prevents any other program from having access to certain hardware resources like the CPU, the RAM or the disk. Availability in a system largely depends on how the operating system is designed.

For instance, we had seen how CPU schedulers can be designed in order to ensure fair
scheduling that is every process in the system gets a fair share of the CPU depending on
its priority. Now, in order to achieve secrecy and integrity, what operating systems
generally do is to have access control. So, we will be looking at access control
techniques in this particular video.

491
(Refer Slide Time: 04:04)

In order to achieve access control, every access to an object, such as a file, a socket, or a hardware device like a printer or a monitor, should be controlled. Essentially, if a process wants to get access to a particular file or any other object in the system, it has to go through an access control mechanism, and all and only authorized accesses can take place. In other words, if a process is authorized to access an object then it should indeed be given access to that object, and no unauthorized access should be allowed. So, what do we mean by access? By access we mean an operation on the object which a particular process intends to perform.

These operations could be one of read, write, execute, create or delete. Such an access should be permitted if and only if the particular process is authorized to make that access; in other words, only if a process is authorized to execute a particular program, or to read or write a particular file, should the operation be permitted.

492
(Refer Slide Time: 05:39)

In order to develop an access control system, there are three components: the security policy, the security model and the security mechanism (mentioned in above slide image). A security policy is a set of high level rules that define the access control. A security model is a formal representation of the access control security policy and its working; essentially, it is a formal, mathematically represented model of the security policy. This model is used to help in proving various things about the system; for instance, you could use the security model to prove that a particular system is secure.

A security mechanism is the low level hardware or software that provides the functional implementation of the security policy and the model. In other words, you can think of the security policy as the high level rules that define the security features that need to be supported by the system, the security model as a mathematical, slightly lower level representation, and the mechanism as the actual implementation of the policies. So, let us look at each of these in more detail.

493
(Refer Slide Time: 07:10)

A security policy is essentially a scheme for enforcing some policies within the system. How do we decide what policies a system should have? This is determined by what threats are present in the system as well as by how the system is designed. Developing a security policy is not easy; it requires a lot of brainstorming to identify the various threats and attacks that are possible on the system, and once these attacks are known, policies are created in order to prevent them. So, what does a security policy actually look like?

A security policy would essentially be a set of statements. These statements should be extremely succinct and precise and should state the goals of the policy. These goals should be agreed upon either by the entire community, by the entire set of people who are developing the system, or by the top management, and this forms the basis for the security model, which is the formal mathematical representation of the policy.

494
(Refer Slide Time: 08:42)

So let us look at how a security policy should not be written. What we are seeing now is that security policies are not necessarily for computer systems; they could also be for organizations (as mentioned in above slide image). This particular security policy is for a company called Megacorp Incorporated. Their security policy is extremely short, with just four statements, and it reads as follows: this policy is approved by the management; all staff shall obey the security policy; data shall be available only to those with a 'need-to-know'; all breaches of this policy shall be reported at once to security.

So what are the issues with this particular security policy? If we take a closer look at this policy, we see that there are a lot of flaws in it. Essentially, the policy is not complete and cannot be used to build a security mechanism. For instance, consider the first statement, 'this policy is approved by the management'. Typically, the approval of the policy should not be part of the policy document itself.

Second, 'all staff shall obey' (mentioned in above slide). What do we mean by this? Who enforces that the staff obey? Is it merely a moral requirement on the staff to obey this policy, or is there actually a unit which enforces that all staff in the organization obey it? Third, the 'need-to-know': how is 'need-to-know' defined? Who determines who should know what information? This is not specified and is not obvious from this particular policy statement.

Finally, 'all breaches of this policy shall be reported at once to security'. How are breaches detected? The security policy does not say anything about this. Whose duty is it to report them? This too is not mentioned here (in the above security policy). So, as you see, even though this particular policy is very short and very terse, incorporating just four points, it still leads to a lot of ambiguity, and such ambiguity would lead to weak security for that company. So, when we write a security policy for a company, it should be complete in all aspects.

(Refer Slide Time: 11:36)

Now, let us look at the security model (mentioned in above slide image). Why do we need to have a security model at all? Why can we not take the security policy and directly implement security mechanisms from it? Why do we need a security model in this hierarchy? Essentially, by having a security model, we are able to express the security policy in a lower level and more formal construct. In this way, any gaps in the security policy can be detected, and secondly, we can use tools and techniques in order to argue or prove that the system is indeed secure with respect to the security policy that was defined.

496
(Refer Slide Time: 12:31)

The last aspect is the security mechanism (as mentioned in above slide image). The security mechanism, as we have seen, deals with the implementation of the security policy. Now, it is important that the implementation of the security mechanism is bug free. Why is it extremely important that there are no bugs? Because if there are bugs in the implementation of the mechanism, it would be possible for attackers to exploit those bugs and gain unauthorized access to the system.

Second, the implementation of the security mechanism should be the trusted base. Essentially, this is the core on which the security of the entire system depends; therefore, if it itself is buggy and incomplete, it will not create a secure system. The properties of the security mechanism implementation are that it should be tamper proof (1st property) and non-bypassable (2nd property); by non-bypassable, we mean that all accesses into the system should first be evaluated by the security mechanism. Then it should form a security kernel (3rd property), which means that the entire implementation of the security policy should be confined to a limited part of the system and should not be scattered across various parts of the system.

Having a security kernel, where everything is confined to a small part of the entire system, makes it easy to test, debug and verify in terms of security. Finally, it should also be small (4th property); a small size for the security mechanism will ensure that it can undergo rigorous verification.

497
(Refer Slide Time: 14:35)

Now, there are 3 techniques for access control. They are the DAC, MAC and RBAC. So,
DAC is the Discretionary Access Control, MAC is Mandatory Access Control and
RBAC is Role-based Access Control. So, we will be looking at the DAC and MAC in
this particular lecture.

(Refer Slide Time: 15:00)

Discretionary Access Control, or DAC (mentioned in above slide image), is an access control which is based on the identity of the requester. There would be a set of access rules specifying what requesters are (or are not) allowed to do. These access rules define essentially what objects the requester, also known as the subject, could access, and which objects could be read, written to, executed, created or deleted, and so on.

Now, the privileges for these various objects are granted as well as revoked by the system administrator. Users who have a particular privilege can also pass on that privilege to other users. A very common example of a DAC, that is, a discretionary access control system, is the Access Matrix Model.

(Refer Slide Time: 16:05)

The Access Matrix Model was designed by Butler Lampson in 1971. It comprises subjects and objects, and in addition it has a table of this form (mentioned in above slide image), where each cell holds some particular actions. Subjects are active elements in the system, such as users of the system, or processes or programs that are requesting information.

Objects, on the other hand, are passive elements which, for instance, store information. The actions are specified in each cell of the matrix. For instance, the subject Ann has execute permission for program 1, she has read and write permissions for file 2, and she has read and write permissions for file 1 as well as being the owner of file 1. If you look more carefully, Ann does not have any permission on file 3: she can neither read it nor write it, nor can she read or write program 1 (mentioned in above slide image).

499
(Refer Slide Time: 17:27)

In order to represent this access matrix model formally, we define a matrix A indexed by Xsi, a subject, and Xoj, an object (mentioned in above slide image). A cell A[Xsi, Xoj] corresponds to the subject si, for instance Bob, and the object oj, for instance file 2 (refer above slide image). We can then assign rights to each cell of this matrix.

For example, the rights could be taken from a set R = {r1, r2, …, rk}, and each cell A[Xsi, Xoj] is a subset of R. The rights in this example (as mentioned in above slide image) are own, read, write, execute, and so on, and these would be in the set R; corresponding to each cell in the matrix we define what the rights are. So, a cell defines what rights the subject si has on the object oj. In addition to this, there is something known as the primitive operations. This is represented by O and is a set of 6 operations, specified here (mentioned in box in above slide image).

These are, for instance, enter r into some location in the matrix, or delete r from that location, where r is taken from the generic rights (as mentioned in above slide image), that is, it is an element from the set R. Similarly, the other operations are to create a subject Xs, create an object Xo, destroy a subject Xs, or destroy an object Xo (refer slide time 17:27).

These are the only primitive operations that can be done on the access matrix (as mentioned in above slide). As we see, each of these primitive operations either modifies the contents of a particular cell in the matrix, or it deletes an entire row or column via the destroy subject and destroy object operations, or it creates a new row or column, that is, creating a subject or creating an object. Once we have defined such primitive operations, we can define something a bit more complex, based on these primitive operations, which are known as commands.

(Refer Slide Time: 20:17)

To recollect, we have the access matrix specified by A (i.e. A[Xsi, Xoj]), we have the generic rights r1 to rk (i.e. R = {r1, r2, …, rk}), and we have the 6 primitive operations specified here (mentioned in above slide image in box). We can use all of these to create complex commands (mentioned in above slide image). A typical command would look like α(X1, X2, …, Xn); it is called, say, α, it takes several parameters X1 to Xn, and it has a set of rules. For instance, if r1 is in the matrix cell A[Xsi, Xoj], and r2, r3, and so on, then perform the following operations (i.e. op1, op2, etc.). So, what we are saying is that based on these low level primitive operations, the generic rights and the access matrix, we can have more complicated commands on the access matrix.

501
(Refer Slide Time: 21:19)

Let us take a few examples of such commands, for instance CREATE(process, file). Here, process is the subject, and what the command means is that this process (first argument) wants to create this file (second argument). The command works like this (first command mentioned in above slide image): using the primitive operations, first create the object file, and then enter own into (process, file), where (process, file) corresponds to a cell in the access matrix. If this command were not present, it would not be possible for a process to create a file. So, this is a very simple example.

We could have something a bit more complicated, like the example CONFER_r(owner, friend, file). Here, owner and friend are subjects and file is the object (second command mentioned in above slide image). What this command intends to do on the access matrix is to confer the right 'r' (i.e. the subscript of CONFER_r) on that particular object to a friend. That is, the owner (1st argument) wants to confer the right 'r' to the friend (2nd argument) with respect to the file (3rd argument). So, what goes on in this command? Essentially, we first test whether the owner is indeed the owner of that file: if the own right is present in the matrix cell corresponding to the subject owner and the object file (i.e. own in (owner, file)), then enter the right 'r' into the (friend, file) cell of the matrix (i.e. enter r into (friend, file)).

502
Similarly, we can have another example where we remove a right from an exfriend. We pass this command three parameters: the owner, the exfriend and the file (i.e. REVOKE_r(owner, exfriend, file)). Essentially, what we want to do is revoke the particular right 'r' from the exfriend, which is another subject. The command looks like this: if own is in (owner, file), that is, if owner is indeed the owner of the file, and the right 'r' is present in (exfriend, file), then delete 'r' from (exfriend, file). Therefore, we can remove the right 'r', which could be a read, write or execute right, on the file with respect to the exfriend. Essentially, the exfriend subject will no longer have the right 'r' on that particular file.

(Refer Slide Time: 24:24)

So, a protection system can be represented by such a matrix (first matrix mentioned in above slide image), and we can apply commands to it, like the couple of commands we saw in the examples on the previous slide (refer slide time 21:19); with each command we are modifying the access matrix. This command, for instance (i.e. command 1), has modified this cell of the access matrix (second matrix mentioned in image). Similarly, if you run another command (i.e. command 2), another part of the access matrix, or multiple cells in it, will be modified (third matrix mentioned above). Based on this, we define what is known as a leaky state: a state of the access matrix is said to leak right 'r' if there exists a command that adds the right 'r' into an entry in the access matrix that did not previously contain 'r'.

503
What this means is that we have a state of the matrix (the two matrices mentioned in the leaky state section), we run a command, and as a result the right 'r' gets entered into the matrix (i.e. the second matrix). We then say that this particular state of the matrix leaks r. Now, a leaky state does not always have to be bad; it is left to us to determine whether leaking 'r' into a particular cell of the matrix is good or bad with respect to the security of the system.

(Refer Slide Time: 26:03)

Let us now define when a system is said to be safe. We have 2 definitions of safety. First, a system is safe if access to an object without the owner's concurrence is impossible. Second, a user should be able to tell whether giving away a right to a particular subject, with respect to an object, would lead to further leakage of that right.

504
(Refer Slide Time: 26:43)

Let us see what this means in the formal model. Suppose a subject 's' plans to give another subject s prime, i.e. s', the right r to object o, with r entered in A[s', o], that is, with a new right added to that particular cell of the matrix. Is it possible that r could subsequently be entered somewhere else in the matrix A? If such a thing is possible, then the system is said to be unsafe. Essentially, a system is unsafe if any operation or command run on the access matrix could result in that particular right being conferred or transferred to someone else, or to somewhere else in the matrix.

(Refer Slide Time: 27:47)

505
We will now look at an example of an unsafe state. Consider these two commands (mentioned in above slide image). The first is CONFER execute: the subject S wants to confer on the subject S prime the execute right for O. This command, CONFER_execute(S, S', O), states that if a subject is the owner of the object O, then it can give the right 'x', that is, the right to execute object O, to another subject S prime (i.e. enter x in A[S', O]).

Now, let us say our system also has a command called MODIFY_RIGHT(S, O), which states the following (mentioned in above slide image): if a particular subject has the execute right 'x' with respect to an object, then it can also enter the write right for that object (i.e. w in A[S, O]). In other words, what MODIFY_RIGHT allows is that if a particular subject can execute an object, then it can modify its rights in order to change the object as well; essentially, it can write into that object.

Let us look at a particular scenario to see how this example shows an unsafe state (mentioned in above slide image). Say Bob creates an application object; he wants it to be executed by all others but not modified by them. The system is obviously unsafe due to the presence of MODIFY_RIGHT in the protection system: Alice, another person who has the execute right for that object or application, could invoke MODIFY_RIGHT to get modification rights on that application, and once Alice gets the right to modify the application, there is nothing stopping Alice from actually changing it.

What we see is that Bob has given the execute permission on a particular object, and the system's access control mechanism is built in such a way that, because the right 'x' is present for that object, the write right 'w' can also be obtained for it. Essentially, if a subject has the execute right for an object, that right can be transformed so that the write right 'w' also becomes associated with that object for that subject. Therefore, this particular state of the system is an unsafe state.
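A hedged sketch of this Bob and Alice scenario, using the same toy bit-flag encoding as the earlier access matrix sketch (the subject and object numbering are assumptions for illustration):

    #include <stdio.h>

    enum { OWN = 1, READ = 2, WRITE = 4, EXEC = 8 };

    unsigned A[2][1];                 /* two subjects (Bob = 0, Alice = 1), one object */

    void confer_x(int owner, int frnd, int o)   /* CONFER_execute(owner, friend, O) */
    {
        if (A[owner][o] & OWN)
            A[frnd][o] |= EXEC;
    }

    void modify_right(int s, int o)             /* MODIFY_RIGHT(S, O) */
    {
        if (A[s][o] & EXEC)
            A[s][o] |= WRITE;         /* execute leaks write: the unsafe step */
    }

    int main(void)
    {
        A[0][0] = OWN | EXEC;         /* Bob owns the application             */
        confer_x(0, 1, 0);            /* Bob lets Alice execute it            */
        modify_right(1, 0);           /* ...and Alice now gains write as well */
        printf("Alice can write: %d\n", (A[1][0] & WRITE) != 0);
        return 0;
    }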

506
(Refer Slide Time: 30:48)

So how do we actually implement the access matrix model? One way, as we have seen, is the matrix representation itself. However, the issue with this representation is that it is too large as well as too sparse, so it is not a very efficient representation to use.

Another technique is to use something known as an authorization table, as shown over here (mentioned in above slide image). This kind of table is used in databases; each row comprises a user, a right and an object. For instance, a row says that Ann has read and write on the object file 1 (mentioned in above image).

So, for every right that Ann holds there will be a separate row in the table. The problem with this implementation (the authorization table implementation) is that one may need to search the entire table in order to identify whether a particular user has a particular right on a particular object. So, it again has performance issues with respect to how long it takes to determine a particular right for a subject and an object.

507
(Refer Slide Time: 32:14)

Another technique for implementing the access matrix model is by using capabilities. Here (as mentioned in above slide image), each user is associated with a capability list, indicating, for each object, the permitted operations. For instance, Ann has the listed rights for file 1, these rights for file 2, and these rights for program 1 (refer above slide image). Similarly, for every user we have such a list, and this is the capability list for that user with respect to each of these objects.

The advantage of such a model, or such an implementation, comes in a distributed system scenario, since it prevents repeated authentication of a subject. Essentially, once the subject authenticates itself in the distributed environment, it can obtain this entire list and the system will know its rights for each object present. However, the limitation is that it is vulnerable to forgery: if a particular right or list is copied by an attacker, it can then be misused.

508
(Refer Slide Time: 33:34)

The third way of implementing the access matrix model is by what is known as an ACL, or Access Control List (mentioned in above slide image). Here, each object is associated with a list indicating the operations that each subject can perform on it; essentially, this is the opposite of the previous approach. Corresponding to each object we have a list, and each node in the list has a subject and the corresponding operations (i.e. read, write, execute). The advantage we get with this approach is that it is easy to represent with small bit vectors. So, let us look at how UNIX actually implements this ACL.

(Refer Slide Time: 34:19)

509
We see that every file in a UNIX system has 9 bits associated with it (mentioned in above slide image), which allow the authorization for each file to be specified. There are 3 bits which specify read, write and execute on the file for the owner of the file; then we have 3 bits, r, w, x, for the group; and for the rest of the world, that is, for the others, we have 3 more bits (the authorization for each file as mentioned in above slide image). Thus, in a UNIX system, which uses the ACL implementation of the access matrix, there are 9 bits involved, and these bits give permissions to the various users of that file.
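As a hedged illustration, the small program below reads these 9 bits for a file using the standard POSIX stat() call and prints them in the familiar rwxrwxrwx form (the default path chosen is just an example):

    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char *argv[])
    {
        struct stat st;
        const char *path = (argc > 1) ? argv[1] : "/etc/passwd";   /* any existing file */

        if (stat(path, &st) != 0) {
            perror("stat");
            return 1;
        }

        /* owner, group and others: 3 bits each */
        printf("%c%c%c%c%c%c%c%c%c  %s\n",
               (st.st_mode & S_IRUSR) ? 'r' : '-',
               (st.st_mode & S_IWUSR) ? 'w' : '-',
               (st.st_mode & S_IXUSR) ? 'x' : '-',
               (st.st_mode & S_IRGRP) ? 'r' : '-',
               (st.st_mode & S_IWGRP) ? 'w' : '-',
               (st.st_mode & S_IXGRP) ? 'x' : '-',
               (st.st_mode & S_IROTH) ? 'r' : '-',
               (st.st_mode & S_IWOTH) ? 'w' : '-',
               (st.st_mode & S_IXOTH) ? 'x' : '-',
               path);
        return 0;
    }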

(Refer Slide Time: 35:13)

Now, a vulnerability of the discretionary policy is that it is subject to Trojan horse attacks. By a Trojan horse we mean a small piece of malicious code which is present inside a larger, non-malicious program. The issue is that because the Trojan is joined with a much larger program, much like a virus, the Trojan inherits all of the user's or the process's privileges.

For instance, if you have a Trojan which is attached to, let us say, a program which has super user privileges, or which needs to execute as root, then the Trojan itself would inherit all the root permissions. The reason this happens is that the Trojan present in the process sends requests to the operating system on the valid user's behalf. From the OS perspective, the operating system will not be able to distinguish whether a request is from the valid process started by a valid user or whether the request is coming from a Trojan. Therefore, it becomes difficult to detect.

(Refer Slide Time: 36:51)

Another drawback of the discretionary policy is that it is not concerned with information flow; anyone with access to information can also propagate that information. For instance, if we have one person over here (mentioned in above slide image in the middle) who is able to read a particular file, there is nothing stopping this person from making a copy of the file and transferring it to a hundred different people. As such, discretionary policies do not take care of the flow of information from one person or one user to another; they are only capable of preventing access to information, but not the flow of information.

On the other hand, in order to consider flow of information also, we need to use what is
known as information flow policies which restrict how information flows between
subjects and objects.

So, we will look at Information Flow Policies in the next video.

Thank you.

511
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week - 08
Lecture - 36
Information Flow Policies

Hello. In the previous video we looked at access control techniques, and in particular at DAC, the Discretionary Access Control. We saw that DAC is very efficient in giving certain access rights to a particular subject with respect to a set of objects. However, we also saw that the drawback of DAC is that it is incapable of preventing information flow.

So, in this video, what we are going to see is information flow techniques and the MAC
that is the Mandatory Access Control.

(Refer Slide Time: 01:03)

In information flow policies, every object in the system is assigned to a particular security class. The system is divided into a fixed number of security classes, and each security class is given a particular category.

512
For example, you may have a security class which is high, a security class which is low, and so on. Each object in the system is assigned to one of the security classes. Similarly, each subject in the system, that is, the entities which actually operate on or access these objects, is also given a particular security class (mentioned in above slide image). Next, we will define how information flows between the classes.

Note that, with respect to information flow, we are not concerned with how information from one object flows to another object or to a subject; rather, the concern is about classes. What we are going to see is how information from one class flows to another class.

Formally, this is represented by the following triple (refer slide time 01:03). This is a 3-tuple: it contains SC, the set of security classes; the arrow operator, which is the flow relation; and the join operator, which is the join relation. An expression like B -> A implies that information from B can flow to A. Similarly, C -> B -> A shows that information from C flows to B, and information from B flows to A; in other words, information from C can also flow to A (mentioned in above slide image). Based on this, there is another way of representing the relation, using the <= symbol, which means that A dominates B and B dominates C.

Now, let us look at the join relation (refer slide time 01:03). Essentially, the join relation is used to determine how to label information after information from 2 classes is combined. For instance, let us say we are creating a new object, say a new file, and this file is created by concatenating an object present in security class C and an object in security class B.

Now, the question that arises is: to which class should the new file that we have just created belong? Should it be in security class C, security class B, or security class A? In order to write this formally, the join operator is used (mentioned in above slide image). Join is defined as a function between 2 classes: it takes a security class and another security class and gives us the resulting security class, i.e. SC X SC -> SC.

513
We will see and understand more about this with an example. What we see is that for the system, these security classes are fixed. During design time itself, we would say, for instance, that our system has just the security classes A, B and C; the flow relation is also fixed at design time. So, we can design various scenarios saying that information can flow from C to B and from B to A, and so on.

So this too is a design time construct, and the join operation is also fixed at design time. What changes over time is the position of the objects. For instance, we may have an object which is present in security class A, and after a while this object becomes less important or becomes public domain and can be moved to security class C. To take an example from real life, some top secret documents would initially be categorized as highly secure documents, but over a period of time they become public information, and therefore those documents could then go to a lower security class.

So, you see that while the security classes are fixed as well as the flow of information is
fixed among the security classes, the objects as such could move between security
classes over a period of time.

(Refer Slide Time: 06:35)

514
Let us take some examples of information flow. We start with a very trivial case, which is also the most secure example of information flow, essentially because it does not allow any information flow between classes. Take this example (mentioned in above slide image) of a system where the security classes are defined by the set SC, which has 'n' security classes. Out of these, the security class A1 is the lowest while security class An is the highest. Then we need to define the flow operator; in this case it is Ai flows to Ai.

In other words, for every value of i from 1 to 'n', information can only flow within the class; it is not possible for information to flow from one class to another. The third requirement is to define the join operator (mentioned in above slide image). In this particular case, it is quite trivial to see that if you combine information from one class with other information from the same class, the result is a new object which also belongs to the same class. Now let us look at a less stringent example, which is less secure than the previous case, and which allows information to flow only from low to high and not anywhere else.

The security classes are defined as in the previous, trivial case: we have 'n' security classes, A1 is the lowest and An is the highest, and information can only flow from Aj to Ai where j <= i (mentioned in above slide in second case). What this means is that information can flow only from a lower class to a higher class, but the opposite direction, from high to low, is not possible. Also, when defining the join operator, that is, when we take information from a lower class Aj and combine it with information from a higher class Ai, we get new information which also belongs to the higher class, that is, the Ai class (3rd point mentioned in above slide).

To repeat: if we take some information from a low class and combine it with information in a higher class, for instance A2, then the new object that we create will also be in the A2 security class. Think of what would happen if, instead of Ai over here (in the 3rd point), we had Aj. I leave that to you to think about.

515
(Refer Slide Time: 09:34)

Now let us come to the Mandatory Access Control mechanism. Here (mentioned in above slide image), the access mechanism or policies are based on regulations set by a centralized authority. The most common form is the MLS, or Multilevel Security, policy. In this policy we have several access classes: unclassified, confidential, secret and top secret. Every object in the system gets a classification level, that is, every object is classified as one of these 4 classes. Similarly, every subject in the system gets a clearance level; the clearance levels are also unclassified, confidential, secret and top secret, one of these 4.

Now, a subject with clearance X can only access objects classified at X and below X, and not vice-versa; that is, information can only flow upwards and cannot flow downwards. To take an example, suppose we have a subject, that is, a user, who has clearance level secret. This user can access all objects which are classified as secret, confidential or unclassified, but will not be able to access any top secret objects.

516
In other words, information from a top secret object cannot flow to a user with clearance secret, while the upward direction of information flow is possible. Now, there are 2 types of MAC control techniques.

(Refer Slide Time: 11:36)

The first we will see is the Bell-LaPadula model. This was developed in 1974 and is a formal model for access control. It gives four access modes: read, write, append and execute. And it has 2 MAC properties, namely no read up, which is also known as the SS property or simple security property, and no write down, which is the star property (mentioned in above slide image). So, let us look at what these two properties are.

517
(Refer Slide Time: 12:09)

The first property, no read up, as seen here (mentioned in above slide image), means that if we have a user with clearance confidential, then that user can read all objects which are confidential as well as all objects which are unclassified. However, she cannot read any object which is classified as secret or top secret. So, this particular mechanism does not allow reading up.

518
(Refer Slide Time: 12:46)

On the other hand, the second property states that there is no write down. In other words, this particular user with clearance confidential is allowed to write to or modify an object which is classified as confidential, and she can also modify objects classified as secret or top secret. She is able to write upwards, but what is not possible is writing downwards (mentioned in above slide image).

So, she is not allowed to change or modify any object which is unclassified. Now, this may seem very strange, so let us see why such a mechanism is present.

519
(Refer Slide Time: 13:35)

Let us consider this particular scenario (mentioned in above slide image), where we have a confidential process, that is, a process with clearance confidential, which is executing. As we know, it could read data from an object which is also classified as confidential. Now let us assume that a Trojan horse is present (mentioned above in red box). As we have seen, when a Trojan executes, it inherits all the clearance levels of the process.

So in this case the Trojan is also going to get clearance confidential. Now assume that write down was allowed; in such a case it would not be very difficult for the Trojan to read a particular file or object which is marked confidential and write it down to an unclassified object, that is, it could be made public domain information. You see that this causes a flow of information from a high confidentiality level to a low one, essentially through that particular Trojan (as mentioned above). Therefore, this particular model does not permit write down.
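A hedged sketch of the two Bell-LaPadula checks, with the four MLS levels encoded as an enum whose higher values dominate lower ones (the function names are illustrative):

    #include <stdio.h>

    enum level { UNCLASSIFIED, CONFIDENTIAL, SECRET, TOP_SECRET };

    /* Simple security property (no read up): the subject's clearance must
     * dominate the object's classification. */
    static int may_read(enum level subj, enum level obj)  { return subj >= obj; }

    /* Star property (no write down): the object's classification must
     * dominate the subject's clearance. */
    static int may_write(enum level subj, enum level obj) { return subj <= obj; }

    int main(void)
    {
        printf("%d\n", may_read(CONFIDENTIAL, SECRET));        /* 0: no read up    */
        printf("%d\n", may_write(CONFIDENTIAL, SECRET));       /* 1: write up ok   */
        printf("%d\n", may_write(CONFIDENTIAL, UNCLASSIFIED)); /* 0: no write down */
        return 0;
    }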

520
(Refer Slide Time: 14:58)

Now let us look at the limitations of the Bell-LaPadula model. What the Bell-LaPadula model does not prevent is a user with clearance confidential writing data upwards: this particular user could essentially modify a secret or top secret document (mentioned in above slide image). In other words, the BLP model does not address the integrity issues that are present. So, in order to cater to integrity, another model is used.

521
(Refer Slide Time: 15:46)

So, this is the Biba model (mentioned in above slide image). The Biba model is the Bell-
LaPadula model turned upside down. While the Bell-LaPadula model focuses only on
confidentiality and ensures that no information flows from a high level to a low level,

the Biba model on the other hand ignores confidentiality altogether and deals only with
integrity. So, the main goal of the Biba model is to prevent unauthorized users from
making modifications to a particular document. It also prevents authorized users from
making improper modifications to a document. The Biba model is incorporated, for instance,
in the Microsoft Windows Vista operating system.

522
(Refer Slide Time: 16:38)

So, what the Biba model defines is that, first of all, the user can read and write any
object within the same security class; the user can write to an object in a lower
security class; and the user can read from an object in a higher security class. This
particular object (mentioned object in top class) can be read because it is in a higher
security class, this particular object (object in lower class) can be modified because it is
in a lower security class, while objects in the same security class can be both read and
written. Please follow the arrows in this particular case (mentioned in above slide
image). What the model does not allow is reading from a lower security class; thus,
essentially, a user cannot read from a lower security class and cannot write to a higher
security class.

So, you see this is exactly the opposite of what the Bell-LaPadula model tells us. The
properties of the Biba model are that there is no write up, which is the star integrity
property, and no read down, which is the simple integrity property (mentioned in
above slide image). So, with respect to the security classes, this is the low integrity class
(low level class) and this is the high integrity class (high level class).
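
Continuing the sketch from the Bell-LaPadula section (and reusing the same hypothetical level_t enum, now read as integrity levels), the Biba checks are simply the mirror image:

    #include <stdbool.h>

    /* Simple integrity property: no read down. */
    bool biba_can_read(level_t subject, level_t object)  { return object >= subject; }

    /* Star integrity property: no write up. */
    bool biba_can_write(level_t subject, level_t object) { return object <= subject; }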

523
(Refer Slide Time: 18:05)

So, why does the Biba model not support read down? Why can a user with a
particular security class not read from an object in a lower security class? Why is this
(low to high level) direction of information flow not permitted? The reasoning
behind it is that a higher integrity object, such as this one (object present in middle in
above slide), may then be modified based on a lower integrity document.

For instance, this particular user (mentioned in above slide) with a particular
security class is capable of writing to an object present in the same security
class. Now, if she is also able to read from a lower security class, so that there is a flow of
information upwards, it means she can be influenced by this
information (information present in the lower security class object), or she can copy this
information into the object in the higher security class. So, this means a
lower integrity object is affecting a higher integrity object, and therefore the Biba model
prevents such a flow of information.

524
(Refer Slide Time: 19:24)

So, to take an example, let us consider the hierarchy in the military, where you have a general
right at the top, then the captains, and then the privates who are right at the bottom of the
hierarchy (mentioned in above slide).

Now, the Biba model allows read up, meaning a document which is prepared by the
general can be read by all, that is, a document created by the general can
be read by the captains as well as the privates. However, no read down is permitted, that is,
a document written or modified by the privates at the lower end of the hierarchy should
not affect the general's decisions.

525
(Refer Slide Time: 20:08)

So, we have seen mechanisms by which information flow can be restricted, that is, we can
prevent information flow between security classes. However, due to bugs or other flaws
in the design, it may still be possible for information to actually flow between security
classes, in spite of having robust access control measures such as the ones we just
discussed.

So, what we will do in the next video is look at techniques by which information can
actually flow in spite of having such mechanisms in place. We will look at control flow
hacking, essentially buffer overflows, and how they could be used to allow an
unauthorized user to gain information from a system.

Thank you.

526
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 08
Lecture – 37
Operating System Security (Buffer Overflows)

Hello. In this video, we will talk about Buffer Overflows. Essentially, a buffer overflow
is a vulnerability in the system; it is not restricted to the operating system, but
could be present in any application that runs in the system. A buffer overflow is a
vulnerability that allows malicious applications to enter the system even though
they do not have valid access. Essentially, it allows unauthorized access into the
system.

So let us look at Buffer Overflows in this particular lecture.

(Refer Slide Time: 01:02)

So, when we look at how an unauthorized user or attacker could gain access into the
system, we see that it is through flaws that are present. There are two types of
flaws that a system can have (mentioned in above slide image). One, it could have bugs
present in the application or, in this particular case, the operating system; or it could have
flaws due to the human factor.

527

For instance, when we browse the internet we see many web pages opening and prompting
us to click on particular things, which may take us to a malicious website; as a result,
malicious applications could be downloaded into the system.

The other one, which pertains more to the operating system, is when there are bugs in
the operating system code. Modern day operating systems, especially the ones that
we typically use on desktops and servers, are extremely large pieces of code. For
instance, the current Linux kernel has over ten million lines of code; all of this code is
written by programmers and will have numerous bugs. These bugs are
not very easy to detect; however, if an attacker decides to look, he could find such a bug
and then exploit it to gain access into the OS.

And as you know, once the attacker gains access into the OS, he will be able to do
various things: he will be able to execute various components of the operating system
code, he could control all the resources present in the system, he could also control
which users execute in the system, and so on. Thus, unauthorized access through a bug
in the operating system is a very critical issue.

(Refer Slide Time: 03:20)

528
So there are a number of bugs that an attacker can exploit in order to gain unauthorized
access into the operating system; here is a list of some of them (mentioned in above
slide image). There could be buffer overflows in the stack of the program or of the
OS, or in the heap; there is something known as return-to-libc attacks; there are
double frees, which essentially occur when a single memory location that was dynamically
allocated, through something like malloc, gets freed more than once; there are integer
overflow bugs; and there are format string bugs.

So there are essentially numerous different ways that bugs can be exploited by an
attacker to enter into the system. So, what we will be seeing today are the bugs in the
stack and something known as a Return-to-libc attack which essentially is a variant of
the buffer overflow attack in the stack.

(Refer Slide Time: 04:22)

So, in order to understand how buffer overflows in the stack work, we first need to
know how the stack is managed. So let us see how the user stack of a process is
managed.
529
(Refer Slide Time: 04:38)

So let us take this very simple example (as mentioned in above slide), which has
two functions: main(), and another function which main invokes with
parameters 1, 2 and 3 (i.e function(1, 2, 3)). This function just allocates two buffers,
buffer1 of 5 bytes and buffer2 of 10 bytes. As we know, when we execute this program,
the operating system creates a process comprising various things like
the instruction area containing the text, that is, the various instructions of this particular
program, the data section, the heap, as well as the stack.
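
For reference, a minimal sketch of the kind of program described here (the exact code is on the slide; this reconstruction only follows the description of the 5 and 10 byte buffers and the call with parameters 1, 2 and 3):

    void function(int a, int b, int c) {
        char buffer1[5];      /* 5-byte local array  */
        char buffer2[10];     /* 10-byte local array */
    }

    int main(void) {
        function(1, 2, 3);    /* pushes 3, 2, 1, then the return address */
        return 0;
    }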

The stack in particular is used for passing parameters from one function to another, and it
is also used to store local variables. So let us see how the stack is used in this particular
example. Let us say that this is the stack (mentioned in above slide image) and that it
corresponds to the point when the main function is executing. Now, when main() wants to
invoke this function over here, that is function(1, 2, 3), it begins to push things onto the
stack. Let us see what is pushed onto the stack.

So, first the 3 parameters 1, 2 and 3 which are passed from main to function (mentioned
in program) are pushed onto the stack. Then the return address is pushed onto the stack.
This return address (as mentioned in stack) will point to the instruction that follows the
function invocation (i.e. the call to function mentioned in main()).

530

As we know, in order to invoke a function on an x86 based processor, the instruction that
is used is call. So the return address will point to the next instruction following the call.

So, after the return address (mentioned in stack), something known as the previous frame
pointer is pushed onto the stack; this frame pointer points to the frame corresponding to the
main() function. So, this is the frame which is used when function is executing
(green part of stack as mentioned in above slide image), while this frame (blue part of
stack) is used when main is executing. Now after the previous frame pointer is pushed, the
local variables which are defined in function are allocated. In this case, we have two
character arrays: one of size 5 and the other of size 10 bytes, i.e
buffer1[5] and buffer2[10].

Besides all of this, we have 2 CPU registers which are used to manage this stack:
one is the frame pointer, which is typically the register bp in Intel x86, and the
other is the stack pointer or sp in the x86 nomenclature. Now the frame pointer
points to the current function's frame (green part frame). So, it actually points to this
particular thing (green part frame of stack) corresponding to the frame for function().
Now after this function (mentioned in program) completes its execution and returns, the
previous frame pointer is loaded into the register bp; therefore the frame pointer will then
point over here, that is, to the frame corresponding to the function main (blue part frame).
The stack pointer, on the other hand, points to the bottom of the stack.

531
(Refer Slide Time: 08:37)

Now, let us look at this in more detail. Let us say that this is the stack (mentioned in
above slide image); this is the address of the various stack locations (first column in
stack) and this is the data stored at that particular address (second column in stack). Let
us assume that the top of the stack is at address 1000 and that addresses decrease downwards.
So, this is the stack (stack mentioned in above image in green) corresponding to the
function when it is invoked. We first see that the parameters passed to the function are
pushed onto the stack; these are the parameters 3, 2 and 1 (mentioned in stored data in
stack above). We note that each of these parameters, since they are defined as integers in
this function, is given 4 bytes. So the integer 'a', which is passed to function, would start
at the address location 997 and occupy 4 bytes: 997, 998, 999 and 1000. Similarly, the
second and third parameters also take 4 bytes (refer in above image).

The return address for this function (function in program), essentially the point to which
the function has to return, is also given 4 bytes (i.e address 988 to 985), while the
base pointer, since it is a 32 bit system, is also given 4 bytes (i.e address 984 to 981).
Then we have buffer1[5], which is allocated as a local of the function and is given 5
bytes, 976 to 980, and then buffer2[10] is allocated 10 bytes. So, these two arrays are the
locals of the function. The base pointer points to this particular location (i.e address 984
to 981) and the stack pointer points over here (mentioned in slide image) to address
number 964.

532

(Refer Slide Time: 10:42)

Now, let us look at some very simple aspects. What would happen if we execute this
particular line, printf("%x", buffer2)? As we know, buffer2 corresponds
to the address of this particular array, so this printf statement would print the
address of buffer2. If we look at the stack, we see that the start address of buffer2 is
966; therefore, this printf will print 966.

Now what happens if we do something like this: printf("%x", &buffer2[10])? We
know that buffer2 is of 10 bytes and has indexes from 0 to 9; now &buffer2[10] is
966 (i.e. the start address) plus 10, which is 976. So, what is going to be printed over here is 976
(i.e the address of buffer2[10]). Now it so happens that 976 is outside the region of buffer2;
in fact, 976 is in buffer1 (refer slide time 10:42). Therefore, we are now printing an
address which is outside buffer2, and this is what is known as a
Buffer Overflow.
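
As a sketch, the two printf calls discussed above can be added to function() from the earlier example; %p is used here as the portable way to print an address (the slide uses %x on a 32-bit system):

    #include <stdio.h>

    void function(int a, int b, int c) {
        char buffer1[5];
        char buffer2[10];
        printf("%p\n", (void *) buffer2);        /* start address of buffer2             */
        printf("%p\n", (void *) &buffer2[10]);   /* 10 bytes further: already outside it */
    }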

Essentially, we have defined a buffer of 10 bytes, but we are accessing data which is
outside the buffer2 area: the 10th, 11th, 12th and so on bytes. This is known as a buffer
overflow.

533

Now, what we will see next is how this buffer overflow can be exploited by an attacker,
and how an attacker could then force a system to execute his own code.

(Refer Slide Time: 12:39)

Now, one important thing from the attacker's perspective is the return address. If the
attacker could somehow fill buffer2 in such a way that he causes a buffer
overflow and modifies this particular return address (address mentioned in stack), then let
us see what would happen. So let us say he makes a statement that stores some arbitrary
memory location starting at buffer2[19]. What the attacker is doing is forcing buffer2 to
overflow, and he is overflowing it in such a way that the return address which was stored
on the stack is replaced with the value he filled in (i.e
address 988 to 985 is replaced with the arbitrary location).
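
As a sketch only, assuming the hypothetical addresses used in the lecture (buffer2 starting at 966, saved return address at 985 to 988), the overwrite corresponds to a 4-byte store beginning at buffer2[19]:

    #include <stdint.h>

    void function(int a, int b, int c) {
        char buffer1[5];
        char buffer2[10];
        /* 966 + 19 = 985, which is where the saved return address begins in the
           example layout; the stored value is an arbitrary attacker-chosen address. */
        *(uint32_t *)(buffer2 + 19) = 0xdeadbeef;   /* hypothetical address */
    }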

After the function completes executing, it would look into this location and, instead of
getting the valid return address, it would get this arbitrary location; it would then go
to this arbitrary location and start to execute code. So, instead
of returning to the main function as would be expected in a normal program, since the
attacker has changed the return address to some arbitrary location, the processor would
fetch and execute the instructions at that arbitrary location.

534

So, now, it looks quite obvious what the attacker could do in order to create an
attack.

(Refer Slide Time: 14:19)

So essentially, what the attacker is going to do is replace the valid return
address present in the stack with a pointer to attack code (i.e the attacker's code pointer).
Therefore, when the function returns, instead of taking the
standard return address it would pick the attacker's code pointer (as mentioned in stack in
place of valid return address); and as a result, the attacker's code begins to
execute. So now, we will see how the attacker could change the return address
to his own code pointer.

535
(Refer Slide Time: 14:56)

In order to do this, what we assume is that the attacker has access to this particular buffer
(buffer mentioned above in green stack). This means that the attacker is able to
fill this particular buffer as required. For instance, this buffer could be passed
through a system call: if the operating system requires that a character
string be passed through a system call by the user, then the attacker is able to
fill that character string and pass it to the kernel. So, what the attacker is going to do is
create a very specific string that contains the exploit code, and also force the
operating system, or for that matter any application, to execute this exploit code. How
he is going to do it is as follows.

So, in the buffer, the attacker will do two things: first he will put the exploit code, that is,
the code which the attacker wants to execute, in the lowermost region of the buffer, and
then begin to overflow the buffer (mentioned in above slide image). Essentially, what he
is going to put is the address location BA, where BA is the address of this exploit
code. It is assumed that a smart attacker will be able to determine what the address
of the buffer is, and he will overflow the buffer with BA. He keeps overflowing
the buffer with BA, and when this happens, at some point the return address
present in the stack is changed from the valid return address to BA.

536
Thus, when the function returns, what the CPU is going to see is the address BA in the
return address location. It is going to take this address BA and start executing code
from there. Since BA corresponds to the address where the exploit code is present, the
CPU would then begin to execute the exploit code; in this way, the attacker forces
the CPU, or the processor, to execute the exploit code.

(Refer Slide Time: 17:33)

So, now we will see how one particular attack code is created, and how an attacker can
force an application or an operating system to execute that exploit code. So let us take a
very simple example of the exploit code, which is shown over here (mentioned in above
slide). Essentially this exploit code, which we call shellcode, does nothing but
execute a particular shell; the shell is specified by /bin/sh.
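
A reconstruction of the kind of exploit code shown on the slide might look as follows (a sketch; the slide's exact code may differ slightly):

    #include <unistd.h>

    int main(void) {
        char *name[2];
        name[0] = "/bin/sh";            /* the shell to execute                  */
        name[1] = NULL;                 /* the argument array must end with NULL */
        execve(name[0], name, NULL);
        return 0;
    }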

And this particular function execve is invoked in order to execute this shell. The
parameters are name[0], which is the executable name, and name[1], which is NULL and
terminates the argument array. So, we will see how this particular code can
be forced to be executed by an unauthorized attacker. The first question that needs to
be asked is: how does the attacker manage to put this code onto the stack?

537
(Refer Slide Time: 18:48)

The first step in doing so is that the attacker needs to obtain the binary data
corresponding to this particular program (first program in above slide image). In order to
do this, the attacker re-writes this program in assembly code. This
assembly code, as we see here (assembly code mentioned in above image), does
exactly the same thing as this program (first basic program).

The next thing the attacker would do is compile this particular assembly code
and get what is known as the object dump (file pointed by assembly code in above
image). The object dump is obtained by running this particular command (first
command mentioned above). So, he first compiles this particular code to get
shellcode.o, which is the object file, and then he runs the command
objdump --disassemble-all shellcode.o to get this particular file (file
pointed by assembly code in above image).

Now, what is important for us over here is this particular column, the second column
(mentioned in above file). The numbers that you see over here, the hexadecimal
numbers, are in fact the machine code for this program (first program). So the numbers
like eb 1e, 5e, 89 76 08, and so on correspond to the machine code of this particular
program. In other words, if the attacker manages to put this machine code onto the stack
and is able to force execution to this machine code, then the attacker would be
able to execute the shell as required.

538

So the machine code is shown over here (second point mentioned in slide time 18:48),
and one thing which is required for this particular attack is to replace all the 0's present
in this machine code with other instructions, so that there are no zero bytes present;
this is needed because the string copy that will later be used to inject the code stops
copying at the first zero byte.

(Refer Slide Time: 20:48)

The next thing is to scan the entire application in order to find a location which can be
exploited for a buffer overflow. Essentially, the requirement for a buffer overflow is
that the attacker finds in the application code a statement such as strcpy(buffer,
large_string), where buffer is a small array defined locally on the stack, while the
large string is a much larger array (mentioned in above slide image). As we know, the
way the function strcpy() works is that the large string gets copied to buffer,
and this copying continues byte by byte until a '\0' is found in large
string, at which point strcpy completes executing and returns.
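
The vulnerable pattern therefore looks something like the following sketch (the 16-byte size and the function name are just placeholders):

    #include <string.h>

    void vulnerable(const char *large_string) {
        char buffer[16];               /* small array, allocated on the stack     */
        strcpy(buffer, large_string);  /* copies until '\0', with no bounds check */
    }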

So let us assume that the attacker has found such a case where we have the buffer, a large
string and a string copy, and the buffer is a small array defined on the stack.

539

How does the attacker make use of this?

(Refer Slide Time: 22:07)

So, what the attacker would then do is create something known as the shellcode
array, which essentially is the code that he wants to execute. He creates the shellcode array
comprising all the assembly opcodes or machine code bytes which he wants to execute, and
he places this shellcode in the first part of the large string. So, if you look at
the large string array, which is a very large string of 128 bytes in this case, it gets the
shellcode in its first part (mentioned in above slide image).

540
(Refer Slide Time: 22:46)

Then he computes what the address of the buffer should be, and fills the remaining part
of the large string with the buffer address (as mentioned in above slide image).

(Refer Slide Time: 23:00)

Now he needs to force the string copy to execute with this buffer and the large string
which he has just created, as shown over here (i.e strcpy(buffer, large_string)).

541

As a result of this string copy being executed, there is a stack frame created for strcpy(),
and, as we know, strcpy() will continue to copy bytes from the large string to the
buffer until it finds the '\0'. In this way, the large string gets copied onto the stack.

So first the shellcode gets copied, and then the buffer address keeps overflowing
onto the stack beyond the buffer, and this goes on until a '\0' is found (stack shown in above slide
image). So, when strcpy() completes and the function eventually returns, instead of getting the
valid return address, it gets what is known as the buffer address. We
know that the buffer address points to this particular location (i.e shellcode location), and
therefore, the CPU would be forced to return to the location pointed to by the buffer
address and execute the shellcode. As a result, the attacker would be able to execute the
shellcode, which in this particular example was the exploit.

(Refer Slide Time: 24:22)

So let us look at the entire thing all together. We have the shellcode, which is the code
the attacker wants to execute (shellcode mentioned in above slide image). Over
here (in above slide), we are just defining it as a global array, but in reality it could
be entered in various ways: through something like a scanf(), or it could come in through the
network card, where we pass a packet of a particular format containing the exploit;
there are various other ways of passing in the shellcode.

542

Next, let us assume that somewhere in the application there is this particular code (code
mentioned in above slide inside main()). We have the large string, and we have a short
string which is buffer, a locally defined array which therefore gets
created on the stack.

So, what we first do is somehow manage to fill the large string with the
address of buffer. If you recollect, these are the BA parts which are present (first for
loop instruction). Then we copy the shellcode onto the large string (second for loop
instruction). So, we have created the large string in the format that we require: in the
first part of this large string is the shellcode, and then it is followed by the buffer
address. Then, if there is a function like strcpy() which copies the large string into buffer,
it will result in a buffer overflow, and instead of the function returning to the
point soon after the string copy, it will execute the shellcode and cause the shell
specified by this command, i.e. /bin/sh, to be executed. So, if we actually try this, if we
compile with gcc overflow1.c and run ./a.out, instead of just doing the strcpy and exiting,
this particular program creates a new shell due to the exploit code that is executing.
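
For reference, a sketch of what such an overflow1.c might look like, assuming the 32-bit layout used in the lecture; the shellcode bytes are deliberately left as a placeholder, the buffer sizes are illustrative, and modern protections (NX, ASLR, canaries) would stop this from working:

    #include <string.h>

    char shellcode[] = "...";   /* placeholder: the machine code bytes from the objdump step */

    char large_string[128];

    int main(void) {
        char buffer[96];        /* the small stack buffer that will be overflowed */
        long *long_ptr = (long *) large_string;
        unsigned int i;

        /* 1. Fill large_string with the (guessed) address of buffer, so that the
              bytes that land on the saved return address point back into buffer.  */
        for (i = 0; i < sizeof(large_string) / sizeof(long); i++)
            long_ptr[i] = (long) buffer;

        /* 2. Place the shellcode at the beginning of large_string. */
        for (i = 0; i < strlen(shellcode); i++)
            large_string[i] = shellcode[i];

        /* 3. The overflow: large_string is larger than buffer. */
        strcpy(buffer, large_string);
        return 0;
    }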

543
(Refer Slide Time: 26:44)

So, buffer overflows are an extremely well known bug and have been heavily exploited by
various malware and viruses over more than a decade. One of the first viruses that
actually used a buffer overflow was the worm called CODERED, which was released on
July 13th 2001. This created massive chaos all over the world, and the red spots actually
show how the virus spread across the world in less than a day, in fact within a few hours.
So, we see that this particular virus, which used a buffer overflow, infected roughly
359000 computers by July 19th 2001.

544
(Refer Slide Time: 27:36)

So essentially, the application targeted by this particular worm was Microsoft's IIS web
server, and the string which was executed is as shown over here (mentioned in above
slide image). This string was actually the exploit code which was executed, and the
result was something like this being displayed in the web browser.

Thank you.

545
Introduction to Operating Systems
Prof. Chester Rebeiro
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Week – 08
Lecture – 38
Preventing Buffer Overflow Attacks

Hello. In the previous video, we had seen how an attacker, essentially an unauthorized user
of the system, could cause his own code to execute in the system by exploiting what is
known as a buffer overflow bug. We had created such a buffer overflow bug and shown
how the attacker could create his exploit code and run that exploit code, which created a
shell in the system.

This particular example (mentioned in below slide image) was an application based
example in user space, but very similar exploits can be written against the
operating system. So, what we will be seeing in this video are techniques by which this
buffer overflow vulnerability is overcome, and also how the attack has evolved over
the years into more powerful attacks in order to overcome these protections.

(Refer Slide Time: 01:32)

546
The first and most obvious way to prevent the buffer overflow attack which occurs in
the stack is by making the stack pages non-executable. What we have seen is that the attacker
would force the CPU to execute exploit code which is also present in the stack. For
instance, over here (inside main() in above image), the buffer which is defined on the
stack also contains the exploit code, and the string copy was executed in such a way that,
after the string copy completed its execution, it would cause the exploit code, that is the shell
code present on the stack, to be executed.

So, one obvious way to prevent this attack is to make the stack pages non-executable,
and this is what is done in systems that are used these days. If you actually run
this particular program on a modern Intel system, then instead of the exploit code
executing, you would get a segmentation fault. This is caused because the program
is trying to execute instructions on the stack.

(Refer Slide Time: 02:58)

In Intel machines, an NX bit is present in the page tables to mark the stack pages as non-
executable. While this works for most programs, that is, for most programs it is an
added benefit, the problem is that some programs, even though they may not be malicious,
need to execute code from the stack in order to function properly. Therefore, setting the
NX bit is not always beneficial for all programs.

547

(Refer Slide Time: 03:37)

So the next thing we are going to ask ourselves is: if we make the stack non-
executable, will it completely prevent buffer overflow attacks? In fact, it does not.
Over the years, buffer overflow attacks have evolved into something known as return-to-
libc attacks, which can be used even on systems which have a non-executable stack.

548
(Refer Slide Time: 04:13)

So let us see in brief how a return-to-libc attack works (refer above slide image).
Essentially, what we had done when trying to overflow the buffer in the stack was to
place the exploit code on the stack and replace the valid return address
with the address of the buffer. Now this does not work on modern day systems,
because the stack is set to non-executable. So let us look at how return-to-libc works in
spite of the non-executable stack, that is, in spite of having the NX bit set.

549
(Refer Slide Time: 04:56)

So, what return-to-libc does is that instead of forcing the return address to branch to a
location within the stack, it forces the return address to go to some other location
containing some other function. Let us say this function is F1 (mentioned in above slide
image). What is filled onto the stack through a buffer overflow is a pointer to F1, i.e
F1ptr; therefore, when the function completes executing, the return address taken from
here is F1ptr, and the CPU is forced to execute the function F1.

Now, the next question is: what is this function F1 (blue box in above slide)? One
thing is certain, this function cannot be the attacker's own exploit code. It has to
be some valid function which is already present in the code segment and which has
permission to execute. So, what would this function F1 be? That is point number 1; and
point number 2 is, how will an attacker use a normal function which is present in the
program to do something malicious, such as running exploit code?

550
(Refer Slide Time: 06:23)

There are various ways in which the function F1 can be chosen, but what we will
see today is F1 implemented using the libc function called system().
Now, system() is a function present in the library called libc, and what it does is
take a character pointer to a string which names an executable. So the string
which is passed to system() contains an executable name (i.e /bin/bash as mentioned in above
slide), and when system() runs it would essentially execute this particular program (i.e bash).

So, in this particular example, i.e. system("/bin/bash"), what system() would do is
execute the bash shell, thereby creating a bash shell. Now libc is a library which
is used by most programs; even a normal "hello world" program that you write
would quite likely use the libc library. What this means is that in your process's
address space, the function system() is present. So, what the attacker needs to do now
is to identify where in the process's address space the system() function resides.
Next, he needs to somehow pass an argument to this system() function in order to run a
program.

So, suppose we continue the example we saw in the previous
video, where the attacker creates an exploit which executes a shell. In this case also, if
the attacker wants to execute the same shell, then in addition to finding the address of the
system() function in the memory space, the attacker somehow needs to pass the
string /bin/sh as a parameter to system(). This means that the attacker
also needs to find an address that points to the string /bin/sh.

551

(Refer Slide Time: 08:32)

So, what is typically done is to have a stack frame something like this (mentioned in above
slide image), where the valid return address is replaced by
F1ptr, which is a pointer to the system() function present in libc; if your
program uses the dynamically linked library libc, then such an address will be valid.
Second, the attacker needs to pass an argument to the system() function,
which essentially is an executable name.

So, essentially, somewhere in your address space (blue box mentioned in above image),
the attacker needs to find the string /bin/bash or /bin/sh being present,
and fill the stack with a pointer (i.e Shell ptr) to this particular string. Essentially,
it requires two things: one, the return address in the stack is modified with the
address of the system() function which is present in libc, and two, the parameter passed to
system() is present on the stack; in this case it is the Shell ptr which points to the
string /bin/bash.
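
A conceptual sketch of the payload layout, assuming the 32-bit calling convention used throughout the lecture; all addresses and the padding length here are made up and would have to be discovered for the real target:

    #include <string.h>
    #include <stdint.h>

    #define PAD_LEN      32            /* hypothetical distance from buffer to saved return address */
    #define SYSTEM_ADDR  0xb7e42da0u   /* made-up address of system() in libc                        */
    #define BINSH_ADDR   0xb7f63a24u   /* made-up address of a "/bin/sh" string                      */

    void build_payload(unsigned char *payload) {
        memset(payload, 'A', PAD_LEN);                       /* filler up to the saved return address */
        *(uint32_t *)(payload + PAD_LEN)     = SYSTEM_ADDR;  /* overwrites the saved return address   */
        *(uint32_t *)(payload + PAD_LEN + 4) = 0xdeadbeef;   /* return address for system() itself    */
        *(uint32_t *)(payload + PAD_LEN + 8) = BINSH_ADDR;   /* the argument passed to system()       */
    }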

552
Once this is done, when the function returns, instead of returning to the valid address,
it is going to cause the CPU to execute the system() function. During this process, the
pointer to the shell /bin/bash would be picked up as the argument to system(), and it
would result in a shell being spawned. From the shell, the attacker could then
spawn various other things and run his own programs. So, in this way, the return-to-libc
attack works in spite of having a stack which is non-executable. Note that we are not
executing any of these instructions (mentioned in the buffer stack); we are just reading
and writing to the stack, while the real execution occurs in the code segment itself, by the
function system().

(Refer Slide Time: 10:51)

Now, the limitation of return-to-libc is that it is extremely difficult to execute
arbitrary code. We have seen one example of how the attacker could execute a shell,
but the amount that an attacker could do with the return-to-libc type of attack is very
limited. Therefore, over a period of time, the attacks have evolved into something
stronger.

553
(Refer Slide Time: 12:24)

More recently, there is something known as Return Oriented Programming
attacks. This is one of the most powerful known attacks that utilize buffer
overflows, and it is also applicable to systems which have a non-executable stack. The
return oriented programming attack, or ROP attack for short, was described by
Hovav Shacham at Stanford University, and it allows arbitrary computation or
arbitrary code to be injected into the program in spite of a non-executable
stack. So let us see with a very small example how this works.

554
(Refer Slide Time: 12:13)

So let us say that the code that the attacker wants to execute is given by this set of
assembly lines. Essentially, if the attacker manages to execute these assembly instructions,
then his job is done; he will be able to run whatever exploit he wants. Now, what the ROP
attack does is use something known as a Gadget.

Gadgets essentially mean splitting this particular code, or the payload as it is called,
into small components which are known as gadgets. It has been shown that a variety
of different payloads can be executed just by using gadgets, and therefore a variety of
different exploit codes can be run by the attacker. So let us see
what a gadget is.

555
(Refer Slide Time: 13:12)

So essentially a gadget is some useful instruction, and a useful instruction here
means one of the instructions in the payload that needs to be executed as part
of the exploit, followed by a 'ret', a return (mentioned in above slide image). ret, as
you know, in the Intel instruction set means the return from a particular function. So
that, very simply, is what a gadget is. Now, corresponding to the payload that he wants
to execute, the attacker needs to scan the entire binary
code of the executable in order to find such useful gadgets.

556
(Refer Slide Time: 14:00)

So, for instance, if we take this particular payload (refer above slide image), the attacker
may find that somewhere in the program's binary, that is, among all the instructions that
are present, there is this particular gadget (i.e. movl %esi, 0x8(%esi); ret) which
does almost the same thing as what is required: this particular instruction in the
payload is movl %esi, 0x8(%esi), and this exact same instruction is present over here (G1 in
program binary mentioned above).

Similarly, the attacker would scan the other parts of the program
binary in order to find more such gadgets. So the second line in the payload is present in
gadget 2 (refer slide time 14:00); the third line is present here (above G1), and so on.
Essentially, what he would do is force this payload to execute by using gadgets.
What he is going to overflow the stack with is a chain of gadgets; essentially, the stack
is going to contain G1, then G2, G3 and G4.

So these gadgets, or rather the addresses of these gadgets, are organized in such a way that
where the first valid return address would have been, the address of gadget 1 is found
instead, and as a result the instruction corresponding to gadget 1 (i.e. movl %esi,
0x8(%esi)) is executed. Then there is a return, and the stack is
arranged in such a way that gadget 2 will then execute, then gadget 3, gadget 4 and so
on.

557

So, although the instructions are not in contiguous locations, what the attacker
has managed to do is execute all of these instructions (payload
instructions). In this way, he is able to execute his payload, and it has been shown that
a large set of such gadgets is available in typical programs.
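
Conceptually, the overflowed portion of the stack is just a list of gadget addresses, so that each gadget's final ret pops the address of the next one; a sketch with made-up addresses:

    #include <stdint.h>

    /* Hypothetical addresses of the four gadgets found in the program binary. */
    uint32_t rop_chain[] = {
        0x08049a10,   /* G1: movl %esi, 0x8(%esi); ret */
        0x08049b34,   /* G2: ...; ret                  */
        0x08049c02,   /* G3: ...; ret                  */
        0x08048f7c    /* G4: ...; ret                  */
    };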

(Refer Slide Time: 16:22)

So, another precaution against buffer overflows is to use a programming language such as
Java, which automatically checks array bounds. This will ensure that no array is
accessed outside its limits. Another way is to use more secure libraries. For example, the
C11 specification Annex K specifies more secure library functions to be used (mentioned in above
slide image): for example, gets_s, strcpy_s, strncpy_s. All these have the same
functionality as the standard functions that we use, but these functions are safer and
would prevent buffer overflows.
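
As a sketch of how the Annex K interface differs, note that Annex K is optional and many C libraries (glibc, for instance) do not provide it, so a fallback path is shown as well:

    #define __STDC_WANT_LIB_EXT1__ 1   /* request the optional Annex K functions */
    #include <string.h>

    void copy_name(const char *src) {
        char dst[16];
    #ifdef __STDC_LIB_EXT1__
        /* Fails (invoking the constraint handler) instead of overflowing
           when src does not fit into dst.                                 */
        strcpy_s(dst, sizeof dst, src);
    #else
        /* Fallback where Annex K is unavailable: bound the copy explicitly. */
        strncpy(dst, src, sizeof dst - 1);
        dst[sizeof dst - 1] = '\0';
    #endif
    }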

558
(Refer Slide Time: 17:14)

Another popular way to prevent buffer overflows is by the use of canaries. A canary is
a known pseudo random value which is placed onto the stack in order to detect buffer
overflows. What is done is that at the start of the function a canary is inserted onto the
stack; some canary value is pushed onto the stack as shown over here (inserted canary in
function mentioned in above image). So, in addition to the parameters, the return
address and the frame pointer, we now have a canary also (stack mentioned in above slide).
And just before returning or leaving the function, the canary is checked to find out
whether it has been modified.
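
Conceptually, what the compiler inserts is something like the following sketch (this is not real gcc output; __stack_chk_guard and __stack_chk_fail are the symbol names gcc and glibc happen to use, shown only to illustrate the idea):

    #include <string.h>

    extern long __stack_chk_guard;     /* the process-wide canary value     */
    void __stack_chk_fail(void);       /* aborts: "stack smashing detected" */

    void f(const char *input) {
        long canary = __stack_chk_guard;   /* placed between the locals and the saved registers */
        char buffer[64];

        strcpy(buffer, input);             /* the potentially overflowing operation */

        if (canary != __stack_chk_guard)   /* checked just before returning */
            __stack_chk_fail();
    }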

Now, if a buffer overflow occurs, then, as we know, it would modify the contents of the
stack, and as a result the canary value would be changed; therefore, we would be able to
detect the buffer overflow through the change in the canary value and perhaps stop the
program. These days, in recent versions of the gcc compiler, such canaries are inserted
by default. This being said, the entire use of the canary is evaded if the canary value is
known: if the attacker manages to learn what canary value is used, then he can craft the
overflow so that the canary is rewritten with the same value and is not changed at all;
therefore, its use is limited.

559
(Refer Slide Time: 18:56)

Another way to prevent this particular attack is something known as Address Space
Layout Randomization or ASLR. This countermeasure uses the fact that
the attacker needs to know specific locations in the code. In the return-to-libc attack, for
instance, the attacker needed to know where the function F1, or in our example the
function system(), was located in the program's address space.

Now, if we have ASLR enabled, then the layout of the address space is randomized;
therefore, the attacker would find it difficult to determine where exactly the function
that needs to be exploited is present. In other words, the attacker would find it difficult to
find out where the function system() is present in the address space. Thus it makes
the attack much more difficult.

Thank you.

560
THIS BOOK IS NOT FOR SALE
NOR COMMERCIAL USE

PH : (044) 2257 5905/08 nptel.ac.in swayam.gov.in
