Spring 2012
Master of Computer Application (MCA) – Semester V
MC0085 – Advanced Operating Systems (Distributed Systems) – 4 Credits (Book ID: B 0967)
Assignment Set – 1

1. Describe the following:
o Distributed Computing Systems
o Distributed Computing System Models

Answer:
Distributed Computing Systems
Over the past two decades, advancements in microelectronic technology have resulted in the availability of fast, inexpensive processors, and advancements in communication technology have resulted in the availability of cost-effective and highly efficient computer networks. The advancements in these two technologies favour the use of interconnected, multiple processors in place of a single, high-speed processor. Computer architectures consisting of interconnected, multiple processors are basically of two types:

In tightly coupled systems, there is a single system-wide primary memory (address space) that is shared by all the processors (Fig. 1.1). If any processor writes, for example, the value 100 to the memory location x, any other processor subsequently reading from location x will get the value 100. Therefore, in these systems, any communication between the processors usually takes place through the shared memory.

In loosely coupled systems, the processors do not share memory, and each processor has its own local memory (Fig. 1.2). If a processor writes the value 100 to the memory location x, this write operation will only change the contents of its local memory and will not affect the contents of the memory of any other processor. Hence, if another processor reads the memory location x, it will get whatever value was there before in that location of its own local memory. In these systems, all physical communication between the processors is done by passing messages across the network that interconnects the processors.

Usually, tightly coupled systems are referred to as parallel processing systems, and loosely coupled systems are referred to as distributed computing systems, or simply distributed systems. In contrast to the tightly coupled systems, the processors of distributed computing systems can be located far from each other to cover a wider geographical area. Furthermore, in tightly coupled systems, the number of processors that can be usefully deployed is usually small and limited by the bandwidth of the shared memory. This is not the case with distributed computing systems that are more freely expandable and can have an almost unlimited number of processors.

Hence, a distributed computing system is basically a collection of processors interconnected by a communication network in which each processor has its own local memory and other peripherals, and the communication between any two processors of the system takes place by message passing over the communication network. For a particular processor, its own resources are local, whereas the other processors and their resources are remote. Together, a processor and its resources are usually referred to as a node or site or machine of the distributed computing system.
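The difference between writing to local memory and communicating by message passing can be illustrated with a small sketch. This is only a toy model under stated assumptions: the `Node` class, its `inbox` queue (standing in for the network), and the method names are hypothetical and not taken from the course text.

```python
from queue import Queue

class Node:
    """One processor with private local memory and a network inbox (hypothetical names)."""
    def __init__(self, name):
        self.name = name
        self.memory = {}      # local memory: invisible to other nodes
        self.inbox = Queue()  # stands in for the network link to this node

    def write(self, location, value):
        self.memory[location] = value  # affects only this node's memory

    def send(self, other, location):
        # Explicit message passing: ship a copy of one location's value.
        other.inbox.put((location, self.memory[location]))

    def receive(self):
        location, value = self.inbox.get()
        self.memory[location] = value

a, b = Node("A"), Node("B")
b.write("x", 7)          # B's own earlier value of x
a.write("x", 100)        # does not change B's copy
before = b.memory["x"]   # B still sees its old local value
a.send(b, "x")
b.receive()
after = b.memory["x"]    # B sees 100 only after a message carried it
```

In a tightly coupled system the `send` and `receive` steps would be unnecessary, because both processors would read the same memory location.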

Distributed Computing System Models

Distributed computing system models can be broadly classified into five categories. They are:
o Minicomputer model
o Workstation model
o Workstation-server model
o Processor-pool model
o Hybrid model

Minicomputer Model :-

The minicomputer model (Fig. 1.3) is a simple extension of the centralized time-sharing system. A distributed computing system based on this model consists of a few minicomputers (they may be large supercomputers as well) interconnected by a communication network. Each minicomputer usually has multiple users simultaneously logged on to it. For this, several interactive terminals are connected to each minicomputer. Each user is logged on to one specific minicomputer, with remote access to other minicomputers. The network allows a user to access remote resources that are available on some machine other than the one on to which the user is currently logged. The minicomputer model may be used when resource sharing (such as sharing of information databases of different types, with each type of database located on a different machine) with remote users is desired. The early ARPAnet is an example of a distributed computing system based on the minicomputer model.

Workstation Model:-

A distributed computing system based on the workstation model (Fig. 1.4) consists of several workstations interconnected by a communication network. An organization may have several workstations located throughout a building or campus, each workstation equipped with its own disk and serving as a single-user computer. It has been often found that in such an environment, at any one time a significant proportion of the workstations are idle (not being used), resulting in the waste of large amounts of CPU time. Therefore, the idea of the workstation model is to interconnect all these workstations by a high-speed LAN so that idle workstations may be used to process jobs of users who are logged onto other workstations and do not have sufficient processing power at their own workstations to get their jobs processed efficiently.

In this model, a user logs onto one of the workstations called his or her "home" workstation and submits jobs for execution. When the system finds that the user's workstation does not have sufficient processing power for executing the processes of the submitted jobs efficiently, it transfers one or more of the processes from the user's workstation to some other workstation that is currently idle and gets the process executed there, and finally the result of execution is returned to the user's workstation.

Workstation – Server Model:-

The workstation model is a network of personal workstations, each with its own disk and a local file system. A workstation with its own local disk is usually called a diskful workstation, and a workstation without a local disk is called a diskless workstation. With the proliferation of high-speed networks, diskless workstations have become more popular in network environments than diskful workstations, making the workstation-server model more popular than the workstation model for building distributed computing systems.

A distributed computing system based on the workstation-server model (Fig. 1.5) consists of a few minicomputers and several workstations (most of which are diskless, but a few of which may be diskful) interconnected by a communication network.

In this model, a user logs onto a workstation called his or her home workstation. Normal computation activities required by the user's processes are performed at the user's home workstation, but requests for services provided by special servers (such as a file server or a database server) are sent to a server providing that type of service that performs the user's requested activity and returns the result of request processing to the user's workstation. For better overall system performance, the local disk of a diskful workstation is normally used for such purposes as storage of temporary files, storage of unshared files, storage of shared files that are rarely changed, paging activity in virtual-memory management, and caching of remotely accessed data.

Processor – Pool Model:-

The processor-pool model is based on the observation that most of the time a user does not need any computing power, but once in a while the user may need a very large amount of computing power for a short time (e.g., when recompiling a program consisting of a large number of files after changing a basic shared declaration). Therefore, unlike the workstation-server model, in which a processor is allocated to each user, in the processor-pool model the processors are pooled together to be shared by the users as needed. The pool of processors consists of a large number of microcomputers and minicomputers attached to the network. Each processor in the pool has its own memory to load and run a system program or an application program of the distributed computing system.

In the pure processor-pool model (Fig. 1.6), the processors in the pool have no terminals attached directly to them, and users access the system from terminals that are attached to the network via special devices. These terminals are either small diskless workstations or graphic terminals, such as X terminals. A special server (called a run server) manages and allocates the processors in the pool to different users on a demand basis. When a user submits a job for computation, an appropriate number of processors are temporarily assigned to his or her job by the run server. For example, if the user's computation job is the compilation of a program having n segments, in which each of the segments can be compiled independently to produce separate relocatable object files, n processors from the pool can be allocated to this job to compile all the n segments in parallel. When the computation is completed, the processors are returned to the pool for use by other users.

In the processor-pool model there is no concept of a home machine. That is, a user does not log onto a particular machine but to the system as a whole. This is in contrast to other models in which each user has a home machine (e.g., a workstation or minicomputer) onto which he or she logs and on which he or she runs most programs by default.
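The role of the run server described above can be sketched as follows. This is a minimal illustration, not the allocation scheme of any real system: the `RunServer` class, its method names, and the policy of granting fewer processors when the pool runs short are all assumptions made for the example.

```python
class RunServer:
    """Hypothetical run server that lends pooled processors to jobs."""
    def __init__(self, processor_ids):
        self.free = list(processor_ids)
        self.assigned = {}  # job name -> processors currently on loan

    def allocate(self, job, wanted):
        # Grant as many processors as are free, up to the number wanted.
        count = min(wanted, len(self.free))
        granted = [self.free.pop() for _ in range(count)]
        self.assigned[job] = granted
        return granted

    def release(self, job):
        # Job finished: return its processors to the pool.
        self.free.extend(self.assigned.pop(job, []))

pool = RunServer(range(8))
cpus = pool.allocate("compile", wanted=5)  # one processor per program segment
free_during = len(pool.free)
pool.release("compile")
free_after = len(pool.free)
```

Here a compilation job with five independent segments borrows five of the eight pooled processors and returns them when it completes.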

Amoeba proposed by Mullender et al. in 1990 is an example of distributed computing systems based on the processor-pool model.

Hybrid Model:-

Out of the four models described above, the workstation-server model is the most widely used model for building distributed computing systems. This is because a large number of computer users only perform simple interactive tasks such as editing jobs, sending electronic mail, and executing small programs. The workstation-server model is ideal for such simple usage. However, in a working environment that has groups of users who often perform jobs needing massive computation, the processor-pool model is more attractive and suitable. To combine the advantages of both the workstation-server and processor-pool models, a hybrid model may be used to build a distributed computing system. The hybrid model is based on the workstation-server model but with the addition of a pool of processors. The processors in the pool can be allocated dynamically for computations that are too large for workstations or that require several computers concurrently for efficient execution. In addition to efficient execution of computation-intensive jobs, the hybrid model gives guaranteed response to interactive jobs by allowing them to be processed on the local workstations of the users. However, the hybrid model is more expensive to implement than the workstation-server model or the processor-pool model.

2. Describe the following with respect to Remote Procedure Calls:
o The RPC Model
o STUB Generation

Answer:
The RPC Model

The RPC mechanism is an extension of the normal procedure call mechanism. It enables a call to be made to a procedure that does not reside in the address space of the calling process. The called procedure may be on a remote machine or on the same machine. The caller and callee have separate address spaces, so the called procedure has no access to the caller’s environment.

Implementation of RPC Mechanism
To achieve the goal of semantic transparency, the implementation of RPC is based on the concept of stubs. Stubs provide a perfectly normal local procedure call abstraction by concealing from programs the interface to the underlying RPC system. A separate stub procedure is associated with each of the client and the server. To hide the existence and functional details of the underlying network, an RPC communication package (called the RPC runtime) is used on both the client and server sides.

Thus the implementation of an RPC mechanism involves the following five elements:
1. The client
2. The client stub
3. The RPC runtime
4. The server stub
5. The server

The job of each of these elements is described below:

1. Client: To invoke a remote procedure, a client makes a perfectly local call that invokes the corresponding procedure in the client stub.

2. Client Stub: The client stub is responsible for performing the following tasks:
o On receipt of a call request from the client, it packs the specification of the target procedure and the arguments into a message and asks the local runtime system to send it to the server stub.
o On receipt of the result of procedure execution, it unpacks the result and passes it to the client.

3. RPCRuntime: The RPC runtime handles the transmission of messages across the network between the client and server machines. It is responsible for retransmissions, acknowledgements, and encryption. On the client side, it receives the call request from the client stub and sends it to the server machine. It also receives the reply message (the result of procedure execution) from the server machine and passes it to the client stub. On the server side, it receives the result of the procedure execution from the server stub and sends it to the client machine. It also receives the request message from the client machine and passes it to the server stub.

4. Server Stub: The functions of the server stub are similar to those of the client stub. It performs the following two tasks:
o On receipt of a call request message from the local RPCRuntime, it unpacks it and makes a perfectly local call to invoke the appropriate procedure in the server.
o It packs the result of the procedure execution received from the server into a message, and asks the local RPCRuntime to send it to the client stub.

5. Server: On receiving the call request from the server stub, the server executes the appropriate procedure and returns the result to the server stub.
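The division of work among the five elements can be sketched in a few lines. This is a single-process illustration under stated assumptions, not a real RPC system: JSON is used as the message format, the function names are hypothetical, and `rpc_runtime_send` simply calls the server stub directly where a real runtime would transmit the message over a network.

```python
import json

def add(a, b):
    # The procedure that lives in the server.
    return a + b

PROCEDURES = {"add": add}

def server_stub(request_bytes):
    # Unpack the call request, make a local call, pack the result.
    request = json.loads(request_bytes)
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode()

def rpc_runtime_send(request_bytes):
    # Stand-in for the RPC runtime: a real one would transmit the message
    # over the network and handle retransmissions and acknowledgements.
    return server_stub(request_bytes)

def client_stub_add(a, b):
    # Pack the procedure name and arguments, hand the message to the
    # runtime, then unpack the reply for the caller.
    request = json.dumps({"proc": "add", "args": [a, b]}).encode()
    reply = rpc_runtime_send(request)
    return json.loads(reply)["result"]

total = client_stub_add(2, 3)  # to the client this looks like a local call
```

The client never sees the message format or the transport; it only calls `client_stub_add`, which is the transparency the stubs are meant to provide.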

STUB Generation

The stubs can be generated in the following two ways:

Manual Stub Generation: The RPC implementer provides a set of translation functions from which a user can construct his or her own stubs. It is simple to implement and can handle complex parameters.

Automatic Stub Generation: This is the most commonly used technique for stub generation. It uses an Interface Definition Language (IDL) for defining the interface between the client and server. An interface definition is mainly a list of procedure names supported by the interface, together with the types of their arguments and results, which helps the client and server to perform compile-time type checking and generate appropriate calling sequences. An interface definition also contains information to indicate whether each argument is an input, an output, or both. This helps avoid unnecessary copying: an input argument needs to be copied only from client to server, and an output argument only from server to client. It also contains information about type definitions, enumerated types, and defined constants, so the clients do not have to store this information.

A server program that implements procedures in an interface is said to export the interface. A client program that calls the procedures is said to import the interface. When writing a distributed application, a programmer first writes the interface definition using IDL, and can then write a server program that exports the interface and a client program that imports the interface.

The interface definition is processed using an IDL compiler (the IDL compiler in Sun RPC is called rpcgen) to generate components that can be combined with both client and server programs, without making changes to the existing compilers. In particular, an IDL compiler generates a client stub procedure and a server stub procedure for each procedure in the interface. It generates the appropriate marshaling and un-marshaling operations in each stub procedure. It also generates a header file that supports the data types in the interface definition, to be included in the source files of both client and server. The client stubs are compiled and linked with the client program, and the server stubs are compiled and linked with the server program.
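The idea of deriving stubs mechanically from an interface definition can be sketched as below. This is not how rpcgen works internally: a real IDL compiler emits source code at build time, whereas this sketch builds stub functions at run time from a hypothetical dictionary-based interface definition. The names `INTERFACE`, `make_client_stub`, and `make_server_stub` are assumptions made for the example.

```python
import json

# Hypothetical interface definition: procedure name -> argument names.
INTERFACE = {"add": ["a", "b"], "negate": ["a"]}

def make_client_stub(proc, arg_names, send):
    def stub(*args):
        # Check the call against the interface before sending anything.
        if len(args) != len(arg_names):
            raise TypeError(f"{proc} expects {len(arg_names)} argument(s)")
        reply = send(json.dumps({"proc": proc, "args": list(args)}))  # marshal
        return json.loads(reply)["result"]                            # unmarshal
    return stub

def make_server_stub(implementation):
    def dispatch(message):
        call = json.loads(message)                                    # unmarshal
        result = implementation[call["proc"]](*call["args"])
        return json.dumps({"result": result})                         # marshal
    return dispatch

# The server exports the interface; the client imports it.
server = make_server_stub({"add": lambda a, b: a + b, "negate": lambda a: -a})
client = {p: make_client_stub(p, names, server) for p, names in INTERFACE.items()}
```

Calling `client["add"](4, 5)` goes through the generated marshaling code on both sides, and a call with the wrong number of arguments is rejected on the client before any message is sent.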

3. Describe the following:
o Distributed Shared Memory Systems (DSM)
o DSM – Design & Implementation issues

Answer:
Distributed Shared Memory Systems (DSM)

This is also called DSVM (Distributed Shared Virtual Memory). It is a loosely coupled distributed-memory system that implements a software layer on top of the message passing system to provide a shared memory abstraction for programmers. The software layer can be implemented in the OS kernel or in runtime library routines with proper kernel support. It is an abstraction that integrates the local memory of different machines in a network environment into a single logical entity shared by cooperating processes executing on multiple sites. Shared memory exists only virtually.

DSM Systems: A comparison between message passing and tightly coupled multiprocessor systems

o DSM provides a simpler abstraction than the message passing model. It relieves the programmer of the burden of explicitly using communication primitives in programs.
o In message passing systems, passing complex data structures between two different processes is difficult. Moreover, passing data structures containing pointers is generally expensive in the message passing model.
o Distributed Shared Memory takes advantage of the locality of reference exhibited by programs and improves efficiency.
o Distributed Shared Memory systems are cheaper to build than tightly coupled multiprocessor systems.
o The large physical memory available facilitates running programs requiring large memory efficiently.
o DSM can scale well when compared to tightly coupled multiprocessor systems.
o A message passing system allows processes to communicate with each other while being protected from one another by having private address spaces, whereas in DSM one process can cause another to fail by erroneously altering data.
o When message passing is used between heterogeneous computers, marshaling of data takes care of differences in data representation. With DSM, it is less clear how memory can be shared between computers with different integer representations.
o DSM can be made persistent, i.e. processes communicating via DSM may execute with non-overlapping lifetimes. A process can leave information in an agreed location for another process. Processes communicating via message passing must execute at the same time.

Which is better, message passing or Distributed Shared Memory? Distributed Shared Memory appears to be a promising tool if it can be implemented efficiently.

As shown in the above figure, the DSM provides a virtual address space shared among processes on loosely coupled processors. DSM is basically an abstraction that integrates the local memory of different machines in a network environment into a single logical entity shared by cooperating processes executing on multiple sites. The shared memory itself exists only virtually. Application programs can use it in the same way as traditional virtual memory, except that processes using it can run on different machines in parallel.

Architectural Components:
o Each node in a distributed system consists of one or more CPUs and a memory unit.
o The nodes are connected by a communication network.
o A simple message-passing system allows processes on different nodes to exchange messages with each other.
o The DSM abstraction presents a single large shared memory space to the processors of all nodes.
o The shared memory of DSM exists only virtually.
o A memory map manager running at each node maps the local memory onto the shared virtual memory.
o To facilitate this mapping, the shared-memory space is partitioned into blocks.
o Data caching is used to reduce network latency.
o When a memory block accessed by a process is not resident in local memory, a block fault is generated and control goes to the OS. The OS gets this block from the remote node and maps it into the application’s address space, and the faulting instruction is restarted.

Thus data keeps migrating from one node to another node but no communication is visible to the user processes. Network traffic is highly reduced if applications show a high degree of locality of data accesses. Variations of this general approach are used for different implementations depending on whether the DSM allows replication and/or migration of shared memory.
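The block-fault handling described above can be sketched as a toy simulation. The assumptions are substantial and deliberate: the block size, the `DsmNode` class, and the shared `directory` dictionary (which stands in for the messages a real system would exchange to locate and fetch a block) are all hypothetical, and only the migration variant without replication is shown.

```python
BLOCK_SIZE = 4  # words per block (assumed for illustration)

class DsmNode:
    def __init__(self, name, directory):
        self.name = name
        self.local = {}             # block number -> list of words held here
        self.directory = directory  # shared map: block number -> owning node
        self.faults = 0

    def read(self, address):
        block, offset = divmod(address, BLOCK_SIZE)
        if block not in self.local:                     # block fault
            self.faults += 1
            owner = self.directory[block]
            self.local[block] = owner.local.pop(block)  # migrate the block here
            self.directory[block] = self
        return self.local[block][offset]

directory = {}
n1, n2 = DsmNode("n1", directory), DsmNode("n2", directory)
n1.local[0] = [10, 11, 12, 13]
directory[0] = n1
first = n2.read(2)   # fault: block 0 migrates from n1 to n2
second = n2.read(3)  # same block, so no further fault
```

The second read is served locally, which is why applications with good locality of reference generate little network traffic.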

DSM – Design and Implementation Issues

The important issues involved in the design and implementation of DSM systems are as follows:

Granularity: It refers to the block size of the DSM system, i.e. to the unit of sharing and the unit of data transfer across the network when a network block fault occurs. Possible units are a few words, a page, or a few pages.

Structure of Shared Memory Space: The structure refers to the layout of the shared data in memory. It is dependent on the type of applications that the DSM system is intended to support.

Memory coherence and access synchronization: Coherence (consistency) refers to the memory coherence problem, which deals with the consistency of shared data that lies in the main memory of two or more nodes. Synchronization refers to the synchronization of concurrent access to shared data using synchronization primitives such as semaphores.

Data Location and Access: A DSM system must implement mechanisms to locate data blocks in order to service network data block faults and to meet the requirements of the memory coherence semantics being used.

Block Replacement Policy: If the local memory of a node is full, a cache miss at that node implies not only a fetch of the accessed data block from a remote node but also a replacement, i.e. a data block of the local memory must be replaced by the new data block. Therefore a block replacement policy is also necessary in the design of a DSM system.

Thrashing: In a DSM system, data blocks migrate between nodes on demand. If two nodes compete for write access to a single data item, the corresponding data block may be transferred back and forth at such a high rate that no real work can get done. A DSM system must use a policy to avoid this situation (known as thrashing).

Heterogeneity: DSM systems built for homogeneous systems need not address the heterogeneity issue. However, if the underlying system environment is heterogeneous, the DSM system must be designed to take care of heterogeneity so that it functions properly with machines having different architectures.
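As an illustration of the block replacement issue, the sketch below keeps a fixed number of blocks in a node's local memory and evicts one when a new block must be fetched. The text does not name a particular policy; least-recently-used (LRU) is assumed here purely as an example, and the `BlockCache` class and `fetch_remote` callable are hypothetical.

```python
from collections import OrderedDict

class BlockCache:
    """Local memory of one node holding at most `capacity` blocks (LRU assumed)."""
    def __init__(self, capacity, fetch_remote):
        self.capacity = capacity
        self.fetch_remote = fetch_remote  # callable: block number -> data
        self.blocks = OrderedDict()       # ordered from least to most recently used
        self.evicted = []

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)               # mark as most recently used
            return self.blocks[block]
        if len(self.blocks) >= self.capacity:            # local memory is full
            victim, _ = self.blocks.popitem(last=False)  # drop least recently used
            self.evicted.append(victim)
        self.blocks[block] = self.fetch_remote(block)    # fetch from the remote node
        return self.blocks[block]

cache = BlockCache(capacity=2, fetch_remote=lambda b: f"data-{b}")
for b in (1, 2, 1, 3):
    cache.access(b)
```

After this access sequence block 2 has been replaced, because block 1 was touched more recently when block 3 arrived.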

4. Discuss the clock synchronization algorithms.

Answer:
Clock Synchronization Algorithms

Clock synchronization algorithms may be broadly classified as centralized and distributed.

Centralized Algorithms

In centralized clock synchronization algorithms one node has a real-time receiver. This node is called the time server node, and its clock time is regarded as correct and used as the reference time. The goal of these algorithms is to keep the clocks of all other nodes synchronized with the clock time of the time server node. Depending on the role of the time server node, centralized clock synchronization algorithms are again of two types: Passive Time Server and Active Time Server.

1. Passive Time Server Centralized Algorithm: In this method each node periodically sends a message (“time = ?”) to the time server. When the time server receives the message, it quickly responds with a message (“time = T”), where T is the current time in the clock of the time server node. Assume that when the client node sends the “time = ?” message, its clock time is T0, and when it receives the “time = T” message, its clock time is T1. Since T0 and T1 are measured using the same clock, in the absence of any other information, the best estimate of the time required for the propagation of the message “time = T” from the time server node to the client’s node is (T1 - T0)/2. Therefore, when the reply is received at the client’s node, its clock is readjusted to T + (T1 - T0)/2.

2. Active Time Server Centralized Algorithm: In this approach, the time server periodically broadcasts its clock time (“time = T”). The other nodes receive the broadcast message and use the clock time in the message for correcting their own clocks. Each node has a priori knowledge of the approximate time (Ta) required for the propagation of the message “time = T” from the time server node to its own node. Therefore, when a broadcast message is received at a node, the node’s clock is readjusted to the time T + Ta. A major drawback of this method is that it is not fault tolerant. If the broadcast message reaches a node too late due to some communication fault, the clock of that node will be readjusted to an incorrect value. Another disadvantage of this approach is that it requires a broadcast facility to be supported by the network.

Another active time server algorithm that overcomes the drawbacks of the above algorithm is the Berkeley algorithm, proposed by Gusella and Zatti for internal synchronization of the clocks of a group of computers running Berkeley UNIX. In this algorithm, the time server periodically sends a message (“time = ?”) to all the computers in the group. On receiving this message, each computer sends back its clock value to the time server. The time server has a priori knowledge of the approximate time required for the propagation of a message from each node to its own node. Based on this knowledge, it first readjusts the clock values of the reply messages. It then takes a fault-tolerant average of the clock values of all the computers (including its own). To take the fault-tolerant average, the time server chooses a subset of all clock values that do not differ from one another by more than a specified amount, and the average is taken only for the clock values in this subset. This approach eliminates readings from unreliable clocks whose clock values could have a significant adverse effect if an ordinary average was taken. The calculated average is the current time to which all the clocks should be readjusted. The time server readjusts its own clock to this value. Instead of sending the calculated current time back to the other computers, the time server sends the amount by which each individual computer’s clock requires adjustment. This can be a positive or negative value and is calculated based on the knowledge the time server has about the approximate time required for the propagation of a message from each node to its own node.

Centralized clock synchronization algorithms suffer from two major drawbacks:
1. They are subject to single-point failure. If the time server node fails, the clock synchronization operation cannot be performed. This makes the system unreliable. Ideally, a distributed system should be more reliable than its individual nodes. If one goes down, the rest should continue to function correctly.
2. From a scalability point of view it is generally not acceptable to get all the time requests serviced by a single time server. In a large system, such a solution puts a heavy burden on that one process.

Distributed algorithms overcome these drawbacks.
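The passive time server adjustment and the fault-tolerant averaging step of the Berkeley algorithm can be written out as a short sketch. Several simplifications are assumed: the function names are hypothetical, the subset is chosen as the largest group of readings lying within `max_spread` of its smallest member (the text only says "a subset"), and the server's correction of replies for propagation delay is left out.

```python
def passive_adjust(t0, t1, server_time):
    # The client clock read t0 at send and t1 at receive; assume the reply
    # took half of the round trip, giving T + (T1 - T0)/2.
    return server_time + (t1 - t0) / 2

def fault_tolerant_average(clock_values, max_spread):
    # Keep the largest group of readings that lie within max_spread of
    # each other, then average only that group.
    values = sorted(clock_values)
    best = []
    for i, low in enumerate(values):
        group = [v for v in values[i:] if v - low <= max_spread]
        if len(group) > len(best):
            best = group
    return sum(best) / len(best)

def berkeley_adjustments(clock_values, max_spread):
    # The server sends each machine the amount to adjust by, not the time.
    target = fault_tolerant_average(clock_values.values(), max_spread)
    return {name: target - value for name, value in clock_values.items()}

new_time = passive_adjust(t0=100.0, t1=100.5, server_time=200.0)
deltas = berkeley_adjustments(
    {"server": 300.0, "a": 301.0, "b": 299.0, "c": 350.0}, max_spread=5.0
)
```

In the example, machine `c` is far from the others, so its reading is excluded from the average, and it receives a large negative adjustment while the other machines move by at most one unit.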

5. Discuss the following with respect to Resource Management in Distributed Systems:
o Load – Balancing Approach
o Load – Sharing Approach

Answer:
Load-Balancing Approach

The scheduling algorithms that use this approach are known as Load-Balancing or Load-Leveling Algorithms. These algorithms are based on the intuition that for better resource utilization, it is desirable for the load in a distributed system to be balanced evenly. Thus a load balancing algorithm tries to balance the total system load by transparently transferring the workload from heavily loaded nodes to lightly loaded nodes in an attempt to ensure good overall performance relative to some specific metric of system performance.

We can have the following categories of load balancing algorithms:
1. Static: These algorithms ignore the current state of the system. For example, if a node is heavily loaded, it picks a task at random and transfers it to a random node. These algorithms are simpler to implement but performance may not be good.
2. Dynamic: These algorithms use the current state information for load balancing. There is an overhead involved in collecting state information periodically, but they perform better than static algorithms.
3. Deterministic: Algorithms in this class use the processor and process characteristics to allocate processes to nodes.
4. Probabilistic: Algorithms in this class use information regarding static attributes of the system such as the number of nodes, processing capability, etc.
5. Centralized: System state information is collected by a single node. This node makes all scheduling decisions.
6. Distributed: This is the most desired approach. Each node is equally responsible for making scheduling decisions based on the local state and the state information received from other sites.
7. Cooperative: A distributed dynamic scheduling algorithm. In these algorithms, the distributed entities cooperate with each other to make scheduling decisions. Therefore they are more complex and involve larger overhead than non-cooperative ones. But the stability of a cooperative algorithm is better than that of a non-cooperative one.
8. Non-cooperative: A distributed dynamic scheduling algorithm. In these algorithms, individual entities act as autonomous entities and make scheduling decisions independently of the actions of other entities.

Load Estimation Policy: This policy makes an effort to measure the load at a particular node in a distributed system according to a set of criteria.

None of these criteria alone fully captures the load at a node; other parameters, such as the resource demands of the processes, the architecture and speed of the processor, and the total remaining execution time of the processes, should be taken into consideration as well.

Process Transfer Policy: The strategy of load balancing algorithms is based on the idea of transferring some processes from the heavily loaded nodes to lightly loaded nodes. To facilitate this, it is necessary to devise a policy to decide whether a node is lightly or heavily loaded. The threshold value of a node is the limiting value of its workload and is used to decide whether a node is lightly or heavily loaded. The threshold value of a node may be determined by any of the following methods:
1. Static Policy: Each node has a predefined threshold value. If the number of processes exceeds the predefined threshold value, a process is transferred. This can cause process thrashing under heavy load, thus causing instability.
2. Dynamic Policy: In this method, the threshold value is dynamically calculated. It is increased under heavy load and decreased under light load. Thus process thrashing does not occur.
3. High-low Policy: Each node has two threshold values, high and low. Thus, the state of a node can be overloaded, under-loaded or normal, depending on whether the number of processes is greater than high, less than low, or otherwise.

Location Policies: Once a decision has been made through the transfer policy to transfer a process from a node, the next step is to select the destination node for that process’ execution. This selection is made by the location policy of a scheduling algorithm. The main location policies proposed are as follows:
1. Threshold: A random node is polled to check its state, and the task is transferred if the node will not be overloaded; polling is continued until a suitable node is found or a threshold number of nodes have been polled. Experiments show that polling 3 to 5 nodes performs as well as polling a large number of nodes, such as 20. This also gives a substantial performance improvement over no load balancing at all.
2. Shortest: A predetermined number of nodes are polled and the node with minimum load among these is picked for the task transfer; if that node is overloaded, the task is executed locally.
3. Bidding: In this method, each node acts as a manager (the one who tries to transfer a task) and as a contractor (the one that is able to accept a new task). The manager broadcasts a request-for-bids to all the nodes. A contractor returns bids (a quoted price based on the processor capability, memory size, resource availability, etc). The manager chooses the best bidder for transferring the task. Problems that could arise as a result of concurrent broadcasts by two or more managers need to be addressed.
4. Pairing: This approach tries to reduce the variance in load between pairs of nodes. In this approach, two nodes that differ greatly in load are paired with each other so they can exchange tasks. Each node asks a randomly picked node if it will pair with it. After a pairing is formed, one or more processes are transferred from the heavily loaded node to the lightly loaded node.

State Information Exchange Policies: The dynamic policies require frequent exchange of state information among the nodes of the system. In fact, a dynamic load-balancing algorithm faces a transmission dilemma because of the two opposing impacts the transmission of a message has on the overall performance of the system. On one hand, transmission improves the ability of the algorithm to balance the load. On the other hand, it raises the expected queuing time of messages because of the increase in the utilization of the communication channel. Thus proper selection of the state information exchange policy is essential. The proposed load balancing algorithms use one of the following policies for the purpose:
1. Periodic Broadcast: Each node broadcasts its state information periodically, say every t time units. It does not scale well and causes heavy network traffic. It may also result in fruitless messages.

2. Broadcast When State Changes: A node broadcasts its state only when the state changes, for example from normal to low or from normal to high. This avoids fruitless messages.
3. On-Demand Exchange: A node broadcasts a state information request when its state changes from the normal load region to the high or low load region. Upon receiving this request, other nodes send their current state information to the requesting node. If the requesting node includes its own state information in the request, only those nodes that can cooperate with it need to reply.
4. Exchange by Polling: State information is exchanged only with a polled node. Polling stops after a predetermined number of polls or when a suitable partner is found, whichever happens first.
Priority Assignment Policies: One of the following priority assignment rules may be used to assign priorities to local and remote processes (i.e. processes that have migrated from other nodes):
i) Selfish: Local processes are given higher priority than remote processes. This approach yields the worst response time of the three policies. Processes that arrive at heavily loaded nodes are likely to be transferred and hence will execute as low-priority processes, so this rule favors processes that arrive at lightly loaded nodes.
ii) Altruistic: Remote processes are given higher priority than local processes.
iii) Intermediate: If local processes outnumber remote processes at a node, local processes get higher priority; otherwise, remote processes get higher priority. Its overall response time is close to that of the altruistic policy over a wide range of loads.
Migration-Limiting Policies: This policy decides the total number of times a process should be allowed to migrate.
Uncontrolled: A remote process is treated like a local process, so there is no limit on the number of times it can migrate.
Controlled: Most systems use a controlled policy to overcome the instability problem. Migrating a partially executed process is expensive, so many systems limit the number of migrations to 1. For long-running processes, it might be beneficial to migrate more than once.
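The high-low transfer policy and the threshold location policy described in this answer can be illustrated with a minimal Python sketch. The names Node, classify and threshold_location, the use of the process count as the load measure, and the choice to poll distinct nodes are assumptions made for illustration only; they do not come from any particular system.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    load: int  # number of processes currently on the node (assumed load measure)


def classify(node, low, high):
    """High-low policy: overloaded above `high`, underloaded below `low`."""
    if node.load > high:
        return "overloaded"
    if node.load < low:
        return "underloaded"
    return "normal"


def threshold_location(candidates, high, probe_limit, rng=random):
    """Threshold location policy: poll random nodes, at most `probe_limit`.

    Returns the first polled node that would not be overloaded after
    accepting one more process, or None (meaning: execute locally).
    """
    # Assumption: each node is polled at most once per search.
    polled = rng.sample(candidates, min(probe_limit, len(candidates)))
    for node in polled:
        if node.load + 1 <= high:
            return node
    return None
```

A caller would first apply classify to the local node and invoke threshold_location only when the result is "overloaded". Keeping probe_limit small (the text reports that 3 to 5 polls is enough) bounds the number of polling messages per transfer.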

Load Sharing Approach
Several researchers believe that load balancing, with its implication of attempting to equalize the workload on all the nodes of the system, is not an appropriate objective. This is because the overhead involved in gathering the state information needed to achieve this objective is normally very large, especially in distributed systems having a large number of nodes. In fact, for proper utilization of the resources of a distributed system, it is not necessary to balance the load on all the nodes. It is necessary and sufficient to prevent nodes from being idle while some other nodes have more than two processes. This approach is called dynamic load sharing rather than dynamic load balancing.
Issues in Load-Sharing Algorithms: The design of a load-sharing algorithm requires that proper decisions be made regarding the load estimation policy, process transfer policy, location policy, state information exchange policy, priority assignment policy, and migration-limiting policy. It is simpler to decide about most of these policies in the case of load sharing, because load-sharing algorithms do not attempt to balance the average workload of all the nodes of the system. Rather, they only attempt to ensure that no node is idle while another node is heavily loaded. The priority assignment policies and the migration-limiting policies for load-sharing algorithms are the same as those for load-balancing algorithms.
Load Estimation Policies: Load-sharing algorithms attempt to ensure that no node is idle while processes wait for service at some other node. In general, the following two approaches are used for estimation:

Process Transfer Policies: Load-sharing algorithms are interested only in the busy or idle state of a node, and most of them employ the all-or-nothing strategy described below.
All-or-Nothing Strategy: This strategy uses a single-threshold policy with a threshold value of 1. A node becomes a candidate to accept tasks from remote nodes only when it becomes idle, and it becomes a candidate for transferring a task as soon as it has more than one task. Under this approach, an idle node is not able to acquire a task immediately, which wastes processing power. To avoid this, the threshold value can be set to 2 instead of 1.
Location Policies: The location policy decides the sender node or the receiver node of a process that is to be moved within the system for load sharing. Depending on the type of node that takes the initiative to search globally for a suitable node for the process, location policies are of the following types:
1. Sender-Initiated Policy: Under this policy, heavily loaded nodes search for lightly loaded nodes to which tasks may be transferred. The search can be done by sending a broadcast message or by probing randomly picked nodes. Only freshly arrived tasks are transferred, so no preemptive task transfers occur.

2. Receiver-Initiated Location Policy: Under this policy, lightly loaded nodes search for heavily loaded nodes from which tasks may be transferred. The search can be done by sending a broadcast message or by probing randomly picked nodes. Task transfers may have to be preemptive, because a sender may not have any freshly arrived tasks. Under high system loads, a receiver will quickly find a sender; under low system loads, it is acceptable for processors to handle some additional control messages.

3. Symmetrically Initiated Location Policy: Under this approach, both senders and receivers search for receivers and senders respectively.
State Information Exchange Policies: Since it is not necessary to equalize the load at all nodes under load sharing, state information is exchanged only when the state of a node changes. The following policies are used:
1. Broadcast When State Changes: A node broadcasts a state information request message when it becomes under-loaded or overloaded. In the sender-initiated approach, a node broadcasts this message only when it is overloaded. In the receiver-initiated approach, a node broadcasts this message only when it is under-loaded.
2. Poll When State Changes: When a node's state changes, it polls other nodes and exchanges state information with the polled nodes. Under the sender-initiated policy, the sender polls to find a suitable receiver; under the receiver-initiated policy, the receiver polls to find a suitable sender.
The Above-Average Algorithm by Krueger and Finkel (a dynamic load balancing algorithm) tries to maintain the load at each node within an acceptable range of the system average. Its transfer policy is a threshold policy that uses two adaptive thresholds, the upper threshold and the lower threshold.
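The idea of two adaptive thresholds placed around the system average can be sketched as follows. This is only an illustration and not the published algorithm: the function names, the fixed half_width of the acceptable range, and the assumption that every load is known at one place (a real node would only have an estimate of the average) are all hypothetical.

```python
def adaptive_thresholds(loads, half_width=1.0):
    """Return (lower, upper) thresholds centred on the current average load.

    `half_width` is an assumed tuning parameter for the acceptable range.
    The thresholds adapt because they move whenever the average moves.
    """
    average = sum(loads) / len(loads)
    return average - half_width, average + half_width


def role(load, lower, upper):
    """Decide whether a node should send work, receive work, or do neither."""
    if load > upper:
        return "sender"
    if load < lower:
        return "receiver"
    return "neither"
```

With loads of 2, 4, 9 and 5 processes the average is 5, so with the assumed half_width of 1 the node with 9 processes is a sender, the node with 2 processes is a receiver, and the other two nodes stay as they are.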


6. Discuss the following with respect to File Systems: o Stateful Vs Stateless Servers o Caching

Stateful Vs Stateless Servers
The file servers that implement a distributed file service can be stateless or stateful. Stateless file servers do not store any session state. This means that every client request is treated independently, and not as part of a new or existing session. Stateful servers, on the other hand, do store session state. They may, therefore, keep track of which clients have opened which files, the current read and write pointers for files, which files have been locked by which clients, etc.
The main advantage of stateless servers is that they can easily recover from failure. Because there is no state that must be restored, a failed server can simply restart after a crash and immediately provide services to clients as though nothing had happened. Furthermore, if clients crash, the server is not stuck with abandoned open or locked files. Another benefit is that the server implementation remains simple, because it does not have to implement the state accounting associated with opening, closing, and locking files.
The main advantage of stateful servers, on the other hand, is that they can provide better performance for clients. Because clients do not have to provide full file information every time they perform an operation, the size of messages to and from the server can be significantly decreased. Likewise, the server can use its knowledge of access patterns to perform read-ahead and other optimisations. Stateful servers can also offer clients extra services such as file locking, and can remember read and write positions.
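The difference in request shape between the two kinds of server can be seen in a small Python sketch. The class and method names are hypothetical, and an in-memory dictionary stands in for the disk; the sketch only illustrates the trade-off described above.

```python
class StatelessServer:
    """Every request carries the full file information; nothing is remembered."""

    def __init__(self, files):
        self.files = files  # path -> bytes (stand-in for the disk)

    def read(self, path, offset, nbytes):
        return self.files[path][offset:offset + nbytes]


class StatefulServer:
    """Keeps per-session state, so requests can be shorter."""

    def __init__(self, files):
        self.files = files
        self.sessions = {}  # handle -> [path, current read offset]
        self.next_handle = 0

    def open(self, path):
        handle = self.next_handle
        self.next_handle += 1
        self.sessions[handle] = [path, 0]
        return handle

    def read(self, handle, nbytes):
        path, offset = self.sessions[handle]
        data = self.files[path][offset:offset + nbytes]
        self.sessions[handle][1] = offset + len(data)  # remember the position
        return data

    def crash_and_restart(self):
        # Session state lives only in memory, so a crash loses it.
        self.sessions.clear()
```

After crash_and_restart, a read on an old handle fails on the stateful server, whereas the same read repeated against the stateless server still succeeds because the request itself names the file and the offset.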

Caching
Besides replication, caching is often used to improve the performance of a DFS. In a DFS, caching involves storing either a whole file or the results of file service operations. Caching can be performed at two locations: at the server and at the client. Server-side caching makes use of the file caching provided by the host operating system. This is transparent to the server and helps to improve the server’s performance by reducing costly disk accesses.

Client-side caching comes in two flavours: on-disk caching and in-memory caching. On-disk caching involves the creation of (temporary) files on the client’s disk. These can either be complete files (as in the upload/download model) or contain partial file state, attributes, etc. In-memory caching stores the results of requests in the client machine’s memory. This can be process-local (in the client process), in the kernel, or in a separate dedicated caching process.
The issue of cache consistency in a DFS has obvious parallels to the consistency issue in shared memory systems, but there are other tradeoffs (for example, disk access delays come into play, the granularity of sharing is different, sizes are different, etc.). Furthermore, because write-through caches are too expensive to be useful, the consistency of caches will be weakened. This makes implementing Unix semantics impossible. Approaches used in DFS caches include delayed writes, where writes are not propagated to the server immediately but in the background later on, and write-on-close, where the server receives updates only after the file is closed. Adding a delay to write-on-close has the benefit of avoiding superfluous writes if a file is deleted shortly after it has been closed.
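Write-on-close with an added delay can be sketched in Python as below. The classes Server and DelayedWriteOnCloseCache, the explicit `now` timestamps, and the tick method that stands in for a background flusher are assumptions for illustration; the sketch also ignores removing a copy that already exists on the server.

```python
class Server:
    """Stand-in for the file server; counts how many writes reach it."""

    def __init__(self):
        self.files = {}
        self.writes = 0

    def store(self, path, data):
        self.files[path] = data
        self.writes += 1


class DelayedWriteOnCloseCache:
    def __init__(self, server, delay):
        self.server = server
        self.delay = delay
        self.open_files = {}  # path -> data written while the file is open
        self.pending = {}     # path -> (data, time at which to flush)

    def write(self, path, data):
        self.open_files[path] = data  # stays on the client for now

    def close(self, path, now):
        # Write-on-close: schedule the update instead of sending it at once.
        data = self.open_files.pop(path)
        self.pending[path] = (data, now + self.delay)

    def delete(self, path):
        # Deleting before the flush cancels the superfluous write.
        self.open_files.pop(path, None)
        self.pending.pop(path, None)

    def tick(self, now):
        # Called periodically; flushes every update whose delay has expired.
        for path, (data, due) in list(self.pending.items()):
            if now >= due:
                self.server.store(path, data)
                del self.pending[path]
```

A temporary file that is written, closed and deleted within the delay never generates server traffic, while a file that survives the delay is written exactly once.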
