
February 2011

Master of Computer Application (MCA) – Semester 5
MC0085 – Advanced Operating Systems
(Distributed Systems) – 4 Credits
(Book ID: B0967)
Assignment Set – 1 (60 Marks)

Answer All Questions Each Question Carries FIFTEEN Marks

1. Explain the following:
A) Features of a Message Passing System
B) Buffering
2. Discuss the implementation of RPC Mechanism in detail.
3. Explain the following:
A) Distributed Shared Memory Systems
B) Memory Consistency models
4. Describe the Clock Synchronization Algorithms and Distributed Algorithms in the context of
Synchronization.

With Lots of Luck
February 2011
Master of Computer Application (MCA) – Semester 5
MC0085 – Advanced Operating Systems
(Distributed systems) – 4 Credits
(Book ID: B0967)
Assignment Set – 2 (60 Marks)

Answer all Questions Each Question carries FIFTEEN Marks

1. Describe the following:

A) Physical Clocks

B) Lamport Clocks
C) Mutual Exclusion

2. What are the Issues in Load-Sharing Algorithms? Discuss in detail.
3. Describe:
A) Stateful Versus Stateless Servers
B) Replication
C) Caching

4. Describe the following:
A) Process Migration
B) Threads

Assignment Set – 1(ANSWER-1a)

Message passing is the paradigm of communication where messages are sent from a sender to one or more recipients. Forms of
messages include (remote) method invocation, signals, and data packets. When designing a message passing system several
choices are made:

Whether messages are transferred reliably
Whether messages are guaranteed to be delivered in order
Whether messages are passed one-to-one, one-to-many (multicasting or broadcasting), or many-to-one (client–server).
Whether communication is synchronous or asynchronous.

Prominent theoretical foundations of concurrent computation, such as the Actor model and the process calculi, are based on
message passing. Implementations of concurrent systems that use message passing can either have message passing as an
integral part of the language, or as a series of library calls from the language. Examples of the former include many distributed
object systems. Examples of the latter include microkernel operating systems, which pass messages between the kernel and one or
more server blocks, and the Message Passing Interface used in high-performance computing.
Message passing systems and models
Distributed object and remote method invocation systems like ONC RPC, CORBA, Java RMI, DCOM, SOAP, .NET Remoting,
CTOS, QNX Neutrino RTOS, OpenBinder, D-Bus and similar are message passing systems.

Message passing systems have been called "shared nothing" systems because the message passing abstraction hides underlying
state changes that may be used in the implementation of sending messages.

Message passing model based programming languages typically define messaging as the (usually asynchronous) sending
(usually by copy) of a data item to a communication endpoint (Actor, process, thread, socket, etc.). Such messaging is used in
Web Services by SOAP. This concept is the higher-level version of a datagram except that messages can be larger than a
packet and can optionally be made reliable, durable, secure, and/or transacted.

Messages are also commonly used in the same sense as a means of interprocess communication; the other common technique
being streams or pipes, in which data are sent as a sequence of elementary data items instead (the higher-level version of a
virtual circuit).
Examples of message passing style
Actor model implementation
Amorphous computing
Flow-based programming
SOAP (protocol)
Synchronous versus asynchronous message passing
Synchronous message passing systems require the sender and receiver to wait for each other to transfer the message. That is,
the sender will not continue until the receiver has received the message.

Synchronous communication has two advantages. The first advantage is that reasoning about the program can be simplified in
that there is a synchronisation point between sender and receiver on message transfer. The second advantage is that no
buffering is required. The message can always be stored on the receiving side, because the sender will not continue until the
receiver is ready.

Asynchronous message passing systems deliver a message from sender to receiver, without waiting for the receiver to be ready.
The advantage of asynchronous communication is that the sender and receiver can overlap their computation because they do
not wait for each other.

Synchronous communication can be built on top of asynchronous communication by ensuring that the sender always waits for
an acknowledgement message from the receiver before continuing.

The buffer required in asynchronous communication can cause problems when it is full. A decision has to be made whether to
block the sender or whether to discard future messages. If the sender is blocked, it may lead to an unexpected deadlock. If
messages are dropped, then communication is no longer reliable.
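
The full-buffer decision described above can be sketched in a few lines. This is only an illustration built on Python's standard `queue` module; the `Mailbox` class, its `capacity` and `on_full` parameters, and the `dropped` counter are hypothetical names chosen for this sketch, not part of any particular message passing system.

```python
import queue


class Mailbox:
    """Bounded asynchronous mailbox (hypothetical helper for illustration)."""

    def __init__(self, capacity, on_full="drop"):
        self._q = queue.Queue(maxsize=capacity)
        self.on_full = on_full      # "block" or "drop"
        self.dropped = 0

    def send(self, msg, timeout=None):
        if self.on_full == "block":
            # The sender waits for free space; if nobody ever receives,
            # this is where an unexpected deadlock can appear.
            self._q.put(msg, timeout=timeout)
            return True
        try:
            self._q.put_nowait(msg)
            return True
        except queue.Full:
            # The message is discarded, so delivery is no longer reliable.
            self.dropped += 1
            return False

    def receive(self):
        return self._q.get_nowait()


box = Mailbox(capacity=2, on_full="drop")
results = [box.send(i) for i in range(3)]   # third send finds the buffer full
```

With `on_full="block"` the same third `send` would wait instead of returning, which is the blocking alternative mentioned in the text.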

Assignment Set – 1(ANSWER-1b)
In computer science, a buffer is a region of memory used to temporarily hold data while it is being moved from one place to
another. Typically, the data is stored in a buffer as it is retrieved from an input device (such as a mouse) or just before it is sent
to an output device (such as speakers). However, a buffer may also be used when moving data between processes within a
computer. This is comparable to buffers in telecommunication. Buffers can be implemented in either hardware or software, but
the vast majority of buffers are implemented in software. Buffers are typically used when there is a difference between the rate
at which data is received and the rate at which it can be processed, or in the case that these rates are variable, for example in a
printer spooler or in online video streaming.

A buffer often adjusts timing by implementing a queue (or FIFO) algorithm in memory, simultaneously writing data into the
queue at one rate and reading it at another rate.
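
As a rough illustration of a FIFO buffer decoupling two rates, the sketch below uses `collections.deque`. The `produce` and `consume` helpers and the burst sizes are assumptions made for the example only.

```python
from collections import deque

buffer = deque()  # FIFO queue acting as the buffer


def produce(burst):
    """Writer side: append a burst of items at the tail."""
    buffer.extend(burst)


def consume(n):
    """Reader side: take up to n items from the head, oldest first."""
    return [buffer.popleft() for _ in range(min(n, len(buffer)))]


produce([1, 2, 3, 4, 5])   # fast producer writes five items at once
first = consume(2)         # slow consumer reads two items per step
produce([6])
second = consume(2)
```

Items leave in the order they arrived, and whatever the consumer has not yet read simply waits in the queue.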
Applications
Buffers are often used in conjunction with I/O to hardware, such as disk drives, sending or receiving data to or from a network,
or playing sound on a speaker. A line to a rollercoaster in an amusement park shares many similarities. People who ride the
coaster come in at an unknown and often variable pace, but the roller coaster will be able to load people in bursts (as a coaster
arrives and is loaded). The queue area acts as a buffer: a temporary space where those wishing to ride wait until the ride is
available. Buffers are usually used in a FIFO (first in, first out) method, outputting data in the order it arrived.
Buffer versus cache
A cache often also acts as a buffer, and vice versa. However, cache operates on the premise that the same data will be read from
it multiple times, that written data will soon be read, or that there is a good chance of multiple reads or writes to combine to
form a single larger block. Its sole purpose is to reduce accesses to the underlying slower storage. Cache is also usually an
abstraction layer that is designed to be invisible.

A 'Disk Cache' or 'File Cache' keeps statistics on the data contained within it and commits data within a time-out period in
write-back modes. A buffer does none of this.

A buffer is primarily used for input, output, and sometimes very temporary storage of data that is either en route between other
media or data that may be modified in a non-sequential manner before it is written (or read) in a sequential manner.

Good examples include:

The BUFFERS command/statement in CONFIG.SYS of DOS.
The buffer between a serial port (UART) and a MODEM. The COM port speed may be 38400 bps while the MODEM may
only have a 14400 bps carrier.
The integrated buffer on a Hard Disk Drive, Printer or other piece of hardware.
The Framebuffer on a video card

Assignment Set – 1(ANSWER-2)
In computer science, a remote procedure call (RPC) is a form of inter-process communication that allows a computer program to
cause a subroutine or procedure to execute in another address space (commonly on another computer on a shared network)
without the programmer explicitly coding the details for this remote interaction. That is, the programmer writes essentially the
same code whether the subroutine is local to the executing program, or remote. When the software in question uses object-
oriented principles, RPC is called remote invocation or remote method invocation.

Note that there are many different (often incompatible) technologies commonly used to accomplish this.
Message passing
An RPC is initiated by the client, which sends a request message to a known remote server to execute a specified procedure
with supplied parameters. The remote server sends a response to the client, and the application continues its process. There are
many variations and subtleties in various implementations, resulting in a variety of different (incompatible) RPC protocols.
While the server is processing the call, the client is blocked (it waits until the server has finished processing before resuming
execution).

An important difference between remote procedure calls and local calls is that remote calls can fail because of unpredictable
network problems. Also, callers generally must deal with such failures without knowing whether the remote procedure was
actually invoked. Idempotent procedures (those that have no additional effects if called more than once) are easily handled, but
enough difficulties remain that code to call remote procedures is often confined to carefully written low-level subsystems.
Sequence of events during an RPC
1. The client calls the client stub. The call is a local procedure call, with parameters pushed on to the stack in the normal way.
2. The client stub packs the parameters into a message and makes a system call to send the message. Packing the parameters is
called marshalling.
3. The kernel sends the message from the client machine to the server machine.
4. The kernel on the server machine passes the incoming packets to the server stub.
5. Finally, the server stub calls the server procedure. The reply traces the same steps in the reverse direction.
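
The sequence above can be mimicked inside a single process. The sketch below is not a real RPC library: JSON stands in for the marshalling format, the `transport` function stands in for the kernels and the network, and `add`, `PROCEDURES`, `client_stub` and `server_stub` are hypothetical names chosen for illustration.

```python
import json


def add(a, b):
    """The server procedure that the client wants to run remotely."""
    return a + b


PROCEDURES = {"add": add}  # procedures the server is willing to execute


def server_stub(request_bytes):
    # Unpack the request, call the real procedure, pack the result.
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def transport(request_bytes):
    # Stand-in for the kernel-to-kernel message transfer (assumption:
    # a real system would send these bytes over a network).
    return server_stub(request_bytes)


def client_stub(proc, *args):
    # Marshal the parameters into a message, send it, unpack the reply.
    request = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
    reply = transport(request)
    return json.loads(reply.decode("utf-8"))["result"]


total = client_stub("add", 2, 3)  # looks like a local call to the caller
```

The caller only sees `client_stub("add", 2, 3)`; everything between marshalling and the reply is hidden, which is the point of the stub arrangement.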

Assignment Set – 1(ANSWER-3a)

Distributed Shared Memory (DSM), also known as a distributed global address space (DGAS), is a concept in computer
science that refers to a wide class of software and hardware implementations, in which each node of a cluster has access to
shared memory in addition to each node's non-shared private memory.

Software DSM systems can be implemented in an operating system, or as a programming library. Software DSM systems
implemented in the operating system can be thought of as extensions of the underlying virtual memory architecture. Such
systems are transparent to the developer, which means that the underlying distributed memory is completely hidden from the
users. In contrast, software DSM systems implemented at the library or language level are not transparent, and developers
usually have to program differently. However, these systems offer a more portable approach to DSM system implementation.

Software DSM systems also have the flexibility to organize the shared memory region in different ways. The page based
approach organizes shared memory into pages of fixed size. In contrast, the object based approach organizes the shared
memory region as an abstract space for storing shareable objects of variable sizes. Another commonly seen implementation
uses a tuple space, in which the unit of sharing is a tuple.

Shared memory architecture may involve separating memory into shared parts distributed amongst nodes and main memory; or
distributing all memory between nodes. A coherence protocol, chosen in accordance with a consistency model, maintains
memory coherence.
Examples of such systems include:

Kerrighed
OpenSSI
MOSIX
Terracotta
TreadMarks
DIPC
Assignment Set – 1(ANSWER-3b)
Memory consistency model
A memory consistency model defines the order in which memory operations will appear to execute, and therefore what value a
read can return. It affects both ease of programming and performance.
An implementation of a memory consistency model is often stricter than the model requires. For example, sequential consistency (SC) allows the
possibility of a read returning a value that hasn’t been written yet (see example discussed under 3.2 Sequential Consistency).
Clearly, no implementation will ever exhibit an execution with such a history. In general, it is often simpler to implement a
slightly stricter model than its definition would require. This is especially true for hardware realizations of shared memories
[AHJ91, GLL+90].
The memory consistency model of a shared-memory multiprocessor provides a formal specification of how the memory system
will appear to the programmer, eliminating the gap between the behavior expected by the programmer and the actual behavior
supported by a system. Effectively, the consistency model places restrictions on the values that can be returned by a read in a
shared-memory program execution. Intuitively, a read should return the value of the “last” write to the same memory location.
In uniprocessors, “last” is precisely defined by program order, i.e., the order in which memory operations appear in the
program. This is not the case in multiprocessors. For example, in Figure 1, the write and read of the Data field within a record
are not related by program order because they reside on two different processors. Nevertheless, an intuitive extension of the
uniprocessor model can be applied to the multiprocessor case. This model is called sequential consistency. Informally,
sequential consistency requires that all memory operations appear to execute one at a time, and the operations of a single
processor appear to execute in the order described by that processor’s program. Referring back to the program in Figure 1, this
model ensures that the reads of the data field within a dequeued record will return the new values written by processor P1.
Sequential consistency provides a simple and intuitive programming model. However, it disallows many hardware and
compiler optimizations that are possible in uniprocessors by enforcing a strict order among shared memory operations. For this
reason, a number of more relaxed memory consistency models have been proposed, including some that are supported by
commercially available architectures such as Digital Alpha, SPARC V8 and V9, and IBM PowerPC. Unfortunately, there has
been a vast variety of relaxed consistency models proposed in the literature that differ from one another in subtle but important
ways. Furthermore, the complex and non-uniform terminology that is used to describe these models makes it difficult to
understand and compare them. This variety and complexity also often leads to misconceptions about relaxed memory
consistency models.
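
The informal definition of sequential consistency can be checked mechanically on a tiny program. The sketch below does not reproduce Figure 1; it assumes a different, commonly used two-processor test (each processor writes one variable and then reads the other) and enumerates every interleaving that keeps each processor's program order. The names `P1`, `P2`, `run` and `outcomes` are illustrative.

```python
from itertools import permutations

# Each processor's program, in program order (assumed example, not Figure 1).
P1 = [("write", "x", 1), ("read", "y", "r1")]
P2 = [("write", "y", 1), ("read", "x", "r2")]


def run(schedule):
    """Execute one interleaving; schedule lists which processor steps next."""
    memory = {"x": 0, "y": 0}
    regs = {}
    pc = {1: 0, 2: 0}                  # next instruction index per processor
    programs = {1: P1, 2: P2}
    for p in schedule:
        op, loc, arg = programs[p][pc[p]]
        pc[p] += 1
        if op == "write":
            memory[loc] = arg
        else:
            regs[arg] = memory[loc]
    return regs["r1"], regs["r2"]


# Every ordering of two steps from each processor: one operation at a time,
# with each processor's own operations kept in program order.
schedules = set(permutations([1, 1, 2, 2]))
outcomes = {run(s) for s in schedules}
```

Under these assumptions the result `(0, 0)` never appears, because some write must come first in any single global order. Relaxed models that let a read bypass an earlier write can produce it, which is the kind of optimization sequential consistency rules out.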

Assignment Set – 1(ANSWER-4)
Clock Synchronization Algorithms
Clock synchronization algorithms may be broadly classified as Centralized and Distributed:
Centralized Algorithms
In centralized clock synchronization algorithms, one node has a real-time receiver. This node is called the time server node, and its
clock time is regarded as correct and used as the reference time. The goal of these algorithms is to keep the clocks of all other
nodes synchronized with the clock time of the time server node. Depending on the role of the time server node, centralized
clock synchronization algorithms are again of two types: Passive Time Server and Active Time Server.
1. Passive Time Server Centralized Algorithm: In this method each node periodically sends a message (“time = ?”) to the time
server. When the time server receives the message, it quickly responds with a message (“time = T”), where T is the current time
in the clock of the time server node. Assume that when the client node sends the “time = ?” message, its clock time is T0, and
when it receives the “time = T” message, its clock time is T1. Since T0 and T1 are measured using the same clock, in the
absence of any other information, the best estimate of the time required for the propagation of the message “time = T” from the
time server node to the client’s node is (T1-T0)/2. Therefore, when the reply is received at the client’s node, its clock is
readjusted to T + (T1-T0)/2.
2. Active Time Server Centralized Algorithm: In this approach, the time server periodically
broadcasts its clock time (“time = T”). The other nodes receive the broadcast message and use the clock time in the message for
correcting their own clocks. Each node has a priori knowledge of the approximate time (Ta) required for the propagation of the
message “time = T” from the time server node to its own node. Therefore, when a broadcast message is received at a node, the
node’s clock is readjusted to the time T+Ta. A major drawback of this method is that it is not fault tolerant. If the broadcast
message reaches a node too late due to some communication fault, the clock of that node will be readjusted to an incorrect
value. Another disadvantage of this approach is that it requires a broadcast facility to be supported by the network.
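
The two readjustment rules above reduce to one line of arithmetic each. The sketch below simply restates them; the function names and the sample numbers are assumptions for illustration.

```python
def passive_adjust(server_time, t0, t1):
    """Passive time server: T + (T1 - T0)/2, where T0 and T1 are the
    client's own clock readings at send and at receipt of the reply."""
    return server_time + (t1 - t0) / 2


def active_adjust(server_time, ta):
    """Active time server: T + Ta, with Ta the propagation time that the
    node is assumed to know in advance."""
    return server_time + ta


# Round trip took 0.5 time units on the client's clock, so about 0.25
# is attributed to the reply's propagation.
new_clock = passive_adjust(server_time=100.0, t0=10.0, t1=10.5)
```

Note that the passive estimate assumes the request and the reply take roughly equal time; when they do not, the error is at most half the round trip.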
Another active time server algorithm that overcomes the drawbacks of the above algorithm is the Berkeley algorithm,
proposed by Gusella and Zatti for internal synchronization of the clocks of a group of computers running Berkeley UNIX. In
this algorithm, the time server periodically sends a message (“time = ?”) to all the computers in the group. On receiving this
message, each computer sends back its clock value to the time server. The time server has a priori knowledge of the
approximate time required for the propagation of a message from each node to its own node. Based on this knowledge, it first
readjusts the clock values of the reply messages. It then takes a fault-tolerant average of the clock values of all the computers
(including its own). To take the fault-tolerant average, the time server chooses a subset of all clock values that do not differ
from one another by more than a specified amount, and the average is taken only for the clock values in this subset. This
approach eliminates readings from unreliable clocks whose clock values could have a significant adverse effect if an ordinary
average was taken. The calculated average is the current time to which all the clocks should be readjusted. The time server
readjusts its own clock to this value. Instead of sending the calculated current time back to other computers, the time server
sends the amount by which each individual computer’s clock requires adjustment. This can be a positive or negative value and
is calculated based on the knowledge the time server has about the approximate time required for the propagation of a message
from each node to its own node.
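
One possible reading of the fault-tolerant average is sketched below. The text does not say how the subset is chosen, so this sketch assumes the largest group of readings whose spread stays within a limit; `berkeley_adjustments`, `max_spread` and the sample clock values are hypothetical, and the readings are assumed to be already corrected for propagation delay.

```python
def berkeley_adjustments(clocks, max_spread):
    """Return the per-node correction (positive or negative).

    clocks: dict of node name -> clock value, the server's own included.
    max_spread: largest allowed difference between any two readings that
    take part in the average (an assumed selection rule).
    """
    values = sorted(clocks.values())
    best = []
    # Try each reading as the smallest member of a group and keep the
    # largest group whose members all lie within max_spread of it.
    for i in range(len(values)):
        group = [v for v in values[i:] if v - values[i] <= max_spread]
        if len(group) > len(best):
            best = group
    average = sum(best) / len(best)
    # Each node is sent the amount to adjust by, not the new time.
    return {node: average - value for node, value in clocks.items()}


adjustments = berkeley_adjustments(
    {"server": 180.0, "a": 185.0, "b": 175.0, "c": 300.0}, max_spread=20.0
)
```

Here node `c` is far from the others, so it is left out of the average but still receives a (large) correction toward the agreed time.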
Centralized clock synchronization algorithms suffer from two major drawbacks:
1. They are subject to single-point failure. If the time server node fails, the clock synchronization operation cannot be
performed. This makes the system unreliable. Ideally, a distributed system should be more reliable than its individual nodes. If
one goes down, the rest should continue to function correctly.
2. From a scalability point of view, it is generally not acceptable to get all the time requests serviced by a single time
server. In a large system, such a solution puts a heavy burden on that one process.
Distributed Algorithms
We know that externally synchronized clocks are also internally synchronized. That is, if each node’s clock is independently
synchronized with real time, all the clocks of the system remain mutually synchronized. Therefore, a simple method for clock
synchronization may be to equip each node of the system with a real-time receiver so that each node’s clock can be
independently synchronized with real time. Multiple real-time clocks (one for each node) are normally used for this purpose.
Theoretically, internal synchronization of clocks is not required in this approach. However, in practice, due to the inherent
inaccuracy of real-time clocks, different real-time clocks produce different times. Therefore, internal synchronization is
normally performed for better accuracy. One of the following two approaches is used for internal synchronization in this case.
1. Global Averaging Distributed Algorithms: In this approach, the clock process at each node broadcasts its local clock
time in the form of a special “resync” message when its local time equals T0+iR for some integer i, where T0 is a fixed time in
the past agreed upon by all nodes and R is a system parameter that depends on such factors as the total number of nodes in the
system, the maximum allowable drift rate, and so on. That is, a resync message is broadcast from each node at the beginning of
every fixed-length resynchronization interval. However, since the clocks of different nodes run at slightly different rates, these
broadcasts will not happen simultaneously from all nodes. After broadcasting the clock value, the clock process of a node waits
for time T, where T is a parameter to be determined by the algorithm. During this waiting period, the clock process collects the
resync messages broadcast by the other nodes and records, for each message, the time according to its own clock at which the
message was received. At the end of the waiting period, the clock process estimates
the skew of its clock with respect to each of the other nodes on the basis of the times at which it received resync messages. It
then computes a fault-tolerant average of the estimated skews and uses it to correct the local clock before the start of the next
resynchronization interval.
The global averaging algorithms differ mainly in the manner in which the fault-tolerant average of the estimated
skews is calculated. Two commonly used algorithms are:
a) The simplest algorithm is to take the average of the estimated
skews and use it as the correction for the local clock. However, to limit the impact of faulty clocks on the average value, the
estimated skew with respect to each node is compared against a threshold, and skews greater than the threshold are set to zero
before computing the average of the estimated skews.
b) In another algorithm, each node limits the impact of faulty clocks by
first discarding the m highest and m lowest estimated skews and then calculating the average of the remaining skews, which is
then used as the correction for the local clock. The value of m is usually decided based on the total number of clocks (nodes).
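
The two fault-tolerant averages can be compared on the same list of estimated skews. This is a minimal sketch: the function names and the sample skews are invented, and the threshold test is assumed to apply to the magnitude of a skew, which the text does not state explicitly.

```python
def threshold_average(skews, threshold):
    """Set skews beyond the threshold to zero, then average all of them.
    Assumption: "greater than the threshold" is read as |skew| > threshold."""
    limited = [s if abs(s) <= threshold else 0.0 for s in skews]
    return sum(limited) / len(limited)


def trimmed_average(skews, m):
    """Discard the m highest and m lowest skews, then average the rest.
    Requires len(skews) > 2 * m so that something remains."""
    ordered = sorted(skews)
    kept = ordered[m:len(ordered) - m]
    return sum(kept) / len(kept)


skews = [1.0, -1.0, 3.0, 50.0, -3.0]   # 50.0 plays the faulty clock

correction_a = threshold_average(skews, threshold=10.0)
correction_b = trimmed_average(skews, m=1)
```

Both variants keep the faulty reading from dominating the correction, but they do not give the same answer: the first still counts the faulty node as a zero in the divisor, while the second drops it along with the lowest honest reading.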

Assignment Set – 2(ANSWER-1a)
Let us introduce a physical time coordinate into our space-time picture, and let $C_i(t)$ denote the reading of the clock $C_i$ at physical time $t$. For mathematical convenience, we assume that the clocks run continuously rather than in discrete "ticks." (A discrete clock can be thought of as a continuous one in which there is an error of up to ½ "tick" in reading it.) More precisely, we assume that $C_i(t)$ is a continuous, differentiable function of $t$ except for isolated jump discontinuities where the clock is reset. Then $dC_i(t)/dt$ represents the rate at which the clock is running at time $t$.
In order for the clock $C_i$ to be a true physical clock, it must run at approximately the correct rate. That is, we must have $dC_i(t)/dt \approx 1$ for all $t$. More precisely, we will assume that the following condition is satisfied:
PC1. There exists a constant $\kappa \ll 1$ such that for all $i$: $|dC_i(t)/dt - 1| < \kappa$.
For typical crystal controlled clocks, $\kappa \leq 10^{-6}$.
It is not enough for the clocks individually to run at approximately the correct rate. They must be synchronized so that $C_i(t) \approx C_j(t)$ for all $i$, $j$, and $t$. More precisely, there must be a sufficiently small constant $\epsilon$ so that the following condition holds:
PC2. For all $i$, $j$: $|C_i(t) - C_j(t)| < \epsilon$.
If we consider vertical distance in Figure 2 to represent physical time, then PC2 states that the variation in height of a single tick line is less than $\epsilon$.
Since two different clocks will never run at exactly the same rate, they will tend to drift further and further apart. We must therefore devise an algorithm to insure that PC2 always holds. First, however, let us examine how small $\kappa$ and $\epsilon$ must be to prevent anomalous behavior. We must insure that the system $\mathcal{S}$ of relevant physical events satisfies the Strong Clock Condition. We assume that our clocks satisfy the ordinary Clock Condition, so we need only require that the Strong Clock Condition holds when $a$ and $b$ are events in $\mathcal{S}$ with $a \not\rightarrow b$. Hence, we need only consider events occurring in different processes.
Let $\mu$ be a number such that if event $a$ occurs at physical time $t$ and event $b$ in another process satisfies $a \boldsymbol{\rightarrow} b$, then $b$ occurs later than physical time $t + \mu$. In other words, $\mu$ is less than the shortest transmission time for interprocess messages. We can always choose $\mu$ equal to the shortest distance between processes divided by the speed of light. However, depending upon how messages in $\mathcal{S}$ are transmitted, $\mu$ could be significantly larger.
To avoid anomalous behavior, we must make sure that for any $i$, $j$, and $t$: $C_i(t + \mu) - C_j(t) > 0$. Combining this with PC1 and PC2 allows us to relate the required smallness of $\kappa$ and $\epsilon$ to the value of $\mu$ as follows. We assume that when a clock is reset, it is always set forward and never back. (Setting it back could cause C1 to be violated.) PC1 then implies that $C_i(t + \mu) - C_i(t) > (1 - \kappa)\mu$. Using PC2, it is then easy to deduce that $C_i(t + \mu) - C_j(t) > 0$ if the following inequality holds:
$\epsilon / (1 - \kappa) \leq \mu$.
This inequality together with PC1 and PC2 implies that anomalous behavior is impossible.
We now describe our algorithm for insuring that PC2 holds. Let $m$ be a message which is sent at physical time $t$ and received at time $t'$. We define $\nu_m = t' - t$ to be the total delay of the message $m$. This delay will, of course, not be known to the process which receives $m$. However, we assume that the receiving process knows some minimum delay $\mu_m \geq 0$ such that $\mu_m \leq \nu_m$. We call $\xi_m = \nu_m - \mu_m$ the unpredictable delay of the message.
We now specialize rules IR1 and IR2 for our physical clocks as follows:
IR1'. For each $i$, if $P_i$ does not receive a message at physical time $t$, then $C_i$ is differentiable at $t$ and $dC_i(t)/dt > 0$.
IR2'. (a) If $P_i$ sends a message $m$ at physical time $t$, then $m$ contains a timestamp $T_m = C_i(t)$. (b) Upon receiving a message $m$ at time $t'$, process $P_j$ sets $C_j(t')$ equal to $\max(C_j(t' - 0),\ T_m + \mu_m)$, where $C_j(t' - 0) = \lim_{\delta \to 0} C_j(t' - |\delta|)$.
Although the rules are formally specified in terms of the physical time parameter, a process only needs to know its own clock reading and the timestamps of messages it receives. For mathematical convenience, we are assuming that each event occurs at a precise instant of physical time, and different events in the same process occur at different times. These rules are then specializations of rules IR1 and IR2, so our system of clocks satisfies the Clock Condition. The fact that real events have a finite duration causes no difficulty in implementing the algorithm. The only real concern in the implementation is making sure that the discrete clock ticks are frequent enough so C1 is maintained.
We now show that this clock synchronizing algorithm can be used to satisfy condition PC2. We assume that the system of processes is described by a directed graph in which an arc from process $P_i$ to process $P_j$ represents a communication line over which messages are sent directly from $P_i$ to $P_j$. We say that a message is sent over this arc every $\tau$ seconds if for any $t$, $P_i$ sends at least one message to $P_j$ between physical times $t$ and $t + \tau$. The diameter of the directed graph is the smallest number $d$ such that for any pair of distinct processes $P_j$, $P_k$, there is a path from $P_j$ to $P_k$ having at most $d$ arcs.
In addition to establishing PC2, the following theorem bounds the length of time it can take the clocks to become synchronized when the system is first started.
THEOREM. Assume a strongly connected graph of processes with diameter $d$ which always obeys rules IR1' and IR2'. Assume that for any message $m$, $\mu_m \leq \mu$ for some constant $\mu$, and that for all $t \geq t_0$: (a) PC1 holds. (b) There are constants $\tau$ and $\xi$ such that every $\tau$ seconds a message with an unpredictable delay less than $\xi$ is sent over every arc. Then PC2 is satisfied with $\epsilon \approx d(2\kappa\tau + \xi)$ for all $t \gtrsim t_0 + \tau d$, where the approximations assume $\mu + \xi \ll \tau$.
The proof of this theorem is surprisingly difficult, and is given in the Appendix. There has been a great deal of work done on the problem of synchronizing physical clocks. We refer the reader to [4] for an introduction.
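
Two pieces of the passage above are simple enough to state as code: the receive rule IR2'(b), which sets the receiver's clock to the larger of its current reading and the timestamp plus the known minimum delay, and the condition that the bound on clock disagreement divided by (1 minus the drift bound) must not exceed the minimum transmission time. The sketch below is only an illustration; `on_receive`, `anomaly_free` and the sample numbers are hypothetical.

```python
def on_receive(local_clock, timestamp, min_delay):
    """Rule IR2'(b): the clock is never set back, and it is advanced to
    at least the message timestamp plus the known minimum delay."""
    return max(local_clock, timestamp + min_delay)


def anomaly_free(eps, kappa, mu):
    """Check the inequality eps / (1 - kappa) <= mu.

    eps:   bound on the difference between any two clocks (PC2)
    kappa: bound on the drift rate of each clock (PC1)
    mu:    lower bound on interprocess transmission time
    """
    return eps / (1 - kappa) <= mu


ahead = on_receive(local_clock=10.0, timestamp=12.0, min_delay=0.5)
unchanged = on_receive(local_clock=20.0, timestamp=12.0, min_delay=0.5)
```

A receiver whose clock is behind jumps forward to `timestamp + min_delay`; a receiver that is already ahead leaves its clock alone, which is why clocks are only ever set forward.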

Assignment Set – 2(ANSWER-1b)
In a distributed system, it is not possible in practice to synchronize time across entities (typically thought of as processes)
within the system; hence, the entities can use the concept of a logical clock based on the events through which they
communicate.
If two entities do not exchange any messages, then they probably do not need to share a common clock; events occurring on
those entities are termed concurrent events.
Among the processes on the same local machine we can order the events based on the local clock of the system.
When two entities communicate by message passing, then the send event is said to 'happen before' the receive event, and the
logical order can be established among the events.
A distributed system is said to have partial order if a partial order relationship can be established among the events in the system.
If 'totality', i.e., a causal relationship among all events in the system, can be established, then the system is said to have total order.
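
The passage does not spell out how a logical clock is maintained, so the sketch below assumes the usual Lamport counter rules: increment before each local event or send, and on receipt take the maximum of the local counter and the message timestamp, then increment. The `LamportClock` class and its method names are illustrative.

```python
class LamportClock:
    """Minimal logical clock for one process (illustrative sketch)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """A local event advances the counter by one."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the new value travels with the message."""
        return self.tick()

    def receive(self, timestamp):
        """The receive event is ordered after the send event."""
        self.time = max(self.time, timestamp) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()                     # local event on p, counter becomes 1
stamp = p.send()             # send event on p, counter becomes 2
q_time = q.receive(stamp)    # receive event on q, counter becomes 3
```

Because the receive value always exceeds the timestamp it carries, the send event 'happens before' the receive event in the numbering as well as in the definition.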

Assignment Set – 2(ANSWER-1c)
Mutual exclusion (often abbreviated to mutex) algorithms are used in concurrent programming to avoid the simultaneous use of
a common resource, such as a global variable, by pieces of computer code called critical sections. A critical section is a piece of
code in which a process or thread accesses a common resource. The critical section by itself is not a mechanism or algorithm
for mutual exclusion. A program, process, or thread can have the critical section in it without any mechanism or algorithm
which implements mutual exclusion.

Examples of such resources are fine-grained flags, counters or queues, used to communicate between code that runs
concurrently, such as an application and its interrupt handlers. The synchronization of access to those resources is an acute
problem because a thread can be stopped or started at any time.

To illustrate: suppose a section of code is altering a piece of data over several program steps, when another thread, perhaps
triggered by some unpredictable event, starts executing. If this second thread reads from the same piece of data, the data, which
is in the process of being overwritten, is in an inconsistent and unpredictable state. If the second thread tries overwriting that
data, the ensuing state will probably be unrecoverable. These shared data being accessed by critical sections of code must,
therefore, be protected, so that other processes which read from or write to the chunk of data are excluded from running.

A mutex is also a common name for a program object that negotiates mutual exclusion among threads, also called a lock.
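
A lock protecting a critical section can be sketched with Python's standard `threading.Lock`. The `Account` class, the number of threads and the iteration count are assumptions made for the example.

```python
import threading


class Account:
    """Shared resource whose update is a critical section."""

    def __init__(self):
        self.balance = 0
        self._lock = threading.Lock()   # the mutex object

    def deposit(self, amount):
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            self.balance += amount


def worker(account, times):
    for _ in range(times):
        account.deposit(1)


account = Account()
threads = [threading.Thread(target=worker, args=(account, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock in place every deposit is applied exactly once; without it, two threads could read the same old balance and one update could be lost.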

Assignment Set – 2(ANSWER-2)
Several researchers believe that load balancing, with its implication of attempting to equalize workload on all the nodes of the
system, is not an appropriate objective. This is because the overhead involved in gathering the state information to achieve this
objective is normally very large, especially in distributed systems having a large number of nodes. In fact, for the proper
utilization of resources of a distributed system, it is not required to balance the load on all the nodes. It is necessary and
sufficient to prevent the nodes from being idle while some other nodes have more than two processes. This rectification is
called the Dynamic Load Sharing instead of Dynamic Load Balancing.
The design of a load-sharing algorithm requires that proper decisions be made regarding the load estimation policy, process
transfer policy, state information exchange policy, priority assignment policy, and migration limiting policy. It is simpler to
decide about most of these policies in the case of load sharing, because load-sharing algorithms do not attempt to balance the
average workload of all the nodes of the system. Rather, they only attempt to ensure that no node is idle while some other node
is heavily loaded. The priority assignment policies and the migration limiting policies for load-sharing algorithms are the same
as those for load-balancing algorithms.
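
The difference from load balancing can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: load is estimated as the number of processes on a node, the state of every node is known centrally, and the function name share_load and the threshold parameter are invented for this example.

```python
def share_load(loads, threshold=2):
    """Move processes only from nodes above the threshold to idle nodes.

    loads maps a node name to its number of processes. The function
    returns the new loads and the list of (source, destination) transfers.
    """
    loads = dict(loads)  # work on a copy, leave the caller's data alone
    transfers = []
    while True:
        idle = [n for n, count in loads.items() if count == 0]
        busy = [n for n, count in loads.items() if count > threshold]
        if not idle or not busy:
            break  # nothing to fix: no idle node, or no overloaded node
        source = max(busy, key=lambda n: loads[n])
        destination = idle[0]
        loads[source] -= 1
        loads[destination] += 1
        transfers.append((source, destination))
    return loads, transfers
```

For the loads {"a": 5, "b": 0, "c": 1} a single transfer from a to b is made, and the result {"a": 4, "b": 1, "c": 1} is left unbalanced on purpose, since no node is idle any more.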

Assignment Set – 2(ANSWER-3a)
Stateful Versus Stateless Servers
The file servers that implement a distributed file service can be stateless or stateful. Stateless file servers do not store any
session state. This means that every client request is treated independently, and not as a part of a new or existing session.

Stateful servers, on the other hand, do store session state. They may, therefore, keep track of which clients have opened which
files, current read and write pointers for files, which files have been locked by which clients, etc.
The main advantage of stateless servers is that they can easily recover from failure. Because there is no state that must be
restored, a failed server can simply restart after a crash and immediately provide services to clients as though nothing
happened. Furthermore, if clients crash, the server is not stuck with abandoned open or locked files. Another benefit is that
the server implementation remains simple because it does not have to implement the state accounting associated with opening,
closing, and locking of files.
The main advantage of stateful servers, on the other hand, is that they can provide better performance for clients. Because
clients do not have to provide full file information every time they perform an operation, the size of messages to and from the
server can be significantly decreased. Likewise, the server can use its knowledge of access patterns to perform read-ahead
and other optimizations. Stateful servers can also offer clients extra services, such as file locking and remembering read and
write positions.
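
The contrast shows up directly in the shape of the read request. The sketch below is a simplified in-memory model, not a real file service; the class names and method signatures are assumptions made for this illustration.

```python
class StatelessFileServer:
    """Every request carries all the information needed to serve it."""

    def __init__(self, files):
        self.files = files  # file name -> bytes

    def read(self, name, offset, count):
        return self.files[name][offset:offset + count]


class StatefulFileServer:
    """The server remembers open files and their read positions."""

    def __init__(self, files):
        self.files = files
        self.sessions = {}  # handle -> [file name, current position]
        self._next_handle = 0

    def open(self, name):
        handle = self._next_handle
        self._next_handle += 1
        self.sessions[handle] = [name, 0]
        return handle

    def read(self, handle, count):
        name, position = self.sessions[handle]
        data = self.files[name][position:position + count]
        self.sessions[handle][1] = position + len(data)  # advance pointer
        return data

    def close(self, handle):
        del self.sessions[handle]
```

If the stateful server restarts, its sessions table is lost and every open handle becomes invalid, whereas the stateless server has nothing to restore. In exchange, the stateful read request is smaller because it carries only a handle and a count.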

Assignment Set – 2(ANSWER-3b)

Replication
The main approach to improving the performance and fault tolerance of a DFS is to replicate its content. A replicating DFS
maintains multiple copies of files on different servers. This can prevent data loss, protect a system against down time of a
single server, and distribute the overall workload. There are three approaches to replication in a DFS:
1. Explicit replication: The client explicitly writes files to multiple servers. This approach requires explicit support from the
client and does not provide transparency.
2. Lazy file replication: The server automatically copies files to other servers after the files are written. The remote copies
are brought up to date only when the server sends the files to them. How often this happens is up to the implementation and
affects the consistency of the file state.
3. Group file replication: Write requests are sent simultaneously to a group of servers. This keeps all the replicas up to date
and allows clients to read consistent file state from any replica.
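
The difference between lazy and group replication can be sketched as follows. This is an in-memory illustration only; the names Replica, LazyPrimary, propagate and group_write are invented for the example, and real systems must also handle failures and concurrent writers.

```python
class Replica:
    """A server holding copies of files."""

    def __init__(self):
        self.files = {}  # file name -> contents


class LazyPrimary:
    """Lazy replication: writes land here first and are copied later."""

    def __init__(self, replicas):
        self.files = {}
        self.replicas = replicas
        self.pending = set()  # names written but not yet propagated

    def write(self, name, data):
        self.files[name] = data
        self.pending.add(name)

    def propagate(self):
        # Called at some later time chosen by the implementation.
        for name in self.pending:
            for replica in self.replicas:
                replica.files[name] = self.files[name]
        self.pending.clear()


def group_write(replicas, name, data):
    """Group replication: the write goes to every replica at once."""
    for replica in replicas:
        replica.files[name] = data
```

Between write and propagate, a client reading from a replica under the lazy scheme sees stale or missing data; under the group scheme every replica already holds the new contents when the write returns.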

Assignment Set – 2(ANSWER-3c)

Caching
Besides replication, caching is often used to improve the performance of a DFS. In a DFS, caching involves storing either a
whole file, or the results of file service operations. Caching can be performed at two locations: at the server and at the client.
Server-side caching makes use of file caching provided by the host operating system. This is transparent to the server and helps
to improve the server’s performance by reducing costly disk accesses.
Client-side caching comes in two flavours: on-disk caching, and in-memory caching. On-disk caching involves the creation of
(temporary) files on the client’s disk. These can either be complete files (as in the upload/download model) or they can contain
partial file state, attributes, etc. In-memory caching stores the results of requests in the client-machine’s memory. This can be
process-local (in the client process), in the kernel, or in a separate dedicated caching process. The issue of cache consistency in
DFS has obvious parallels to the consistency issue in shared memory systems, but there are other tradeoffs (for example, disk
access delays come into play, the granularity of sharing is different, sizes are different, etc.). Furthermore, because
write-through caches are too expensive to be useful, the consistency of caches will be weakened. This makes it impossible to
implement strict Unix semantics. Approaches used in DFS caches include delayed writes, where writes are not propagated to the
server immediately but in the background later on, and write-on-close, where the server receives updates only after the file is
closed. Adding a delay to write-on-close has the benefit of avoiding superfluous writes if a file is deleted shortly after it has
been closed.
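
A write-on-close client cache can be sketched as below. The Server and WriteOnCloseCache classes are hypothetical and model only a single client; the writes_received counter exists just to make the saved traffic visible.

```python
class Server:
    """A toy file server that counts how many updates it receives."""

    def __init__(self):
        self.files = {}
        self.writes_received = 0

    def store(self, name, data):
        self.files[name] = data
        self.writes_received += 1


class WriteOnCloseCache:
    """Client-side cache that sends a file to the server only on close."""

    def __init__(self, server):
        self.server = server
        self.dirty = {}  # file name -> data not yet sent to the server

    def write(self, name, data):
        self.dirty[name] = data  # stays local for now

    def read(self, name):
        if name in self.dirty:
            return self.dirty[name]
        return self.server.files[name]

    def close(self, name):
        if name in self.dirty:
            self.server.store(name, self.dirty.pop(name))
```

Three successive writes followed by one close reach the server as a single update. Until the close, other clients reading from the server still see the old contents, which is the weakened consistency described above.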

Assignment Set – 2(ANSWER-4a)
Migration of a process is a complex activity that involves proper handling of several sub-activities in order to meet
the requirements of a good process migration mechanism. The four major sub-activities involved in process
migration are as follows:
1. Freezing the process on its source node and restarting it on its destination node.
2. Transferring the process's address space from its source node to its destination node.
3. Forwarding messages meant for the migrant process.
4. Handling communication between cooperating processes that have been separated as a result of process
migration.
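
The first three sub-activities can be sketched with a toy model in Python. Everything here (the Process and Node classes, the forwarding table, the migrate function) is an assumption made for illustration; a real mechanism works on kernel-level process state and must also deal with the fourth sub-activity, which this sketch does not model beyond message forwarding.

```python
class Process:
    def __init__(self, pid, memory):
        self.pid = pid
        self.memory = memory  # stands in for the address space
        self.frozen = False
        self.inbox = []


class Node:
    def __init__(self, name):
        self.name = name
        self.processes = {}   # pid -> Process running on this node
        self.forwarding = {}  # pid -> Node the process has moved to

    def deliver(self, pid, message):
        if pid in self.processes:
            self.processes[pid].inbox.append(message)
        elif pid in self.forwarding:
            # Sub-activity 3: pass the message on to the new location.
            self.forwarding[pid].deliver(pid, message)
        else:
            raise KeyError(pid)


def migrate(pid, source, destination):
    process = source.processes[pid]
    process.frozen = True                        # 1. freeze on the source
    moved = Process(pid, dict(process.memory))   # 2. copy the address space
    moved.inbox = list(process.inbox)            #    and any queued messages
    destination.processes[pid] = moved
    del source.processes[pid]
    source.forwarding[pid] = destination         # 3. leave a forwarding entry
    moved.frozen = False                         # 1. restart on the destination
    return moved
```

After migrate returns, a message sent to the old node is still delivered, because the source node keeps a forwarding entry pointing to the destination.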

Assignment Set – 2(ANSWER-4b)
In computer science, a thread of execution is the smallest unit of processing that can be scheduled by an operating system. It
generally results from a fork of a computer program into two or more concurrently running tasks. The implementation of
threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process.
Multiple threads can exist within the same process and share resources such as memory, while different processes do not share
these resources. In particular, the threads of a process share the latter's instructions (its code) and its context (the values that its
variables reference at any given moment). To give an analogy, multiple threads in a process are like multiple cooks reading off
the same cookbook and following its instructions, not necessarily from the same page.
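
The cookbook analogy can be made concrete with a short Python sketch. The function run_cooks and its variables are invented for this illustration: every thread reads the same list of steps and writes to the same log, because both live in the memory of the one process.

```python
import threading


def run_cooks(n_cooks=3):
    cookbook = ["chop", "boil", "serve"]  # shared, read by every thread
    log = []                              # shared, written by every thread
    log_lock = threading.Lock()           # protects appends to the log

    def cook(cook_id):
        # Each cook follows the same instructions at its own pace.
        for step in cookbook:
            with log_lock:
                log.append((cook_id, step))

    threads = [threading.Thread(target=cook, args=(i,)) for i in range(n_cooks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

The order in which entries from different cooks interleave in the log may vary from run to run, but each cook's own steps always appear in cookbook order.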

On a single processor, multithreading generally occurs by time-division multiplexing (as in multitasking): the processor
switches between different threads. This context switching generally happens frequently enough that the user perceives the
threads or tasks as running at the same time. On a multiprocessor or multi-core system, the threads or tasks will actually run at
the same time, with each processor or core running a particular thread or task.

Many modern operating systems directly support both time-sliced and multiprocessor threading with a process scheduler. The
kernel of an operating system allows programmers to manipulate threads via the system call interface. Some implementations
are called kernel threads; a lightweight process (LWP) is a specific type of kernel thread that shares the same state and
information.

Programs can have user-space threads when threading with timers, signals, or other methods to interrupt their own execution,
performing a sort of ad-hoc time-slicing.
