
Distributed File System

Why use files?
1. Permanent storage of information
2. Sharing of information

What is a file system? A subsystem of the OS that performs file management activities such as storing, naming, sharing and protecting files.

What is a DFS? A Distributed File System (DFS) is simply the classical model of a file system (as discussed above) distributed across multiple machines. Its purpose is to promote the sharing of dispersed files.

A DFS makes it convenient to use files in a distributed environment. It is more complex than a conventional file system because the users and the storage devices are physically dispersed.

Advantages of using a DFS
1. Remote information sharing: A DFS allows a file to be transparently accessed by processes on any node, irrespective of the file's location.
2. User mobility: A user can work on different nodes at different times without having to physically relocate the secondary storage devices.
3. Availability: A DFS normally keeps multiple copies of a file to enhance availability in case of temporary failures. Both the existence of multiple copies and their locations are hidden from the clients.
4. Diskless workstations: Disk drives are more expensive than other parts of a workstation. A DFS, with its transparent remote file access capability, allows the use of diskless workstations in a system.

Services provided by a DFS:
1. Storage service: Deals with the allocation and management of space on the secondary storage device used to store the files of the file system. Since magnetic disks are normally used, this service is also called the disk service.
2. True file service: Concerned with operations on individual files, such as accessing, modifying, creating and deleting them.
3. Name service: Provides a mapping between text names for files and the corresponding file IDs. File IDs are difficult for human users to remember, so most file systems use directories to perform the mapping.

Two commonly used criteria for file modeling are structure and modifiability.
1. Unstructured and structured files
Unstructured files: The contents of a file appear to the file server as an uninterpreted sequence of bytes.
Structured files: The contents of a file appear to the file server as an ordered sequence of records. Rarely used nowadays.
2. Mutable and immutable files
Mutable: An update performed on a file overwrites its old contents to produce the new contents.
Immutable: Rather than updating the same file, a new version of the file is created each time a change is made, and the old version is retained unchanged.
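The difference between the mutable and immutable models can be sketched in a few lines of Python. This is only an illustration: the class names MutableFile and ImmutableFileHistory are hypothetical and do not come from any real file system.

```python
class MutableFile:
    """Mutable model: an update overwrites the old contents in place."""

    def __init__(self, data: bytes = b"") -> None:
        self.data = data

    def update(self, new_data: bytes) -> None:
        self.data = new_data  # the old contents are no longer available


class ImmutableFileHistory:
    """Immutable model: each change creates a new version; old ones stay."""

    def __init__(self, data: bytes = b"") -> None:
        self.versions = [data]  # version 0 holds the initial contents

    def update(self, new_data: bytes) -> int:
        self.versions.append(new_data)  # earlier versions are left untouched
        return len(self.versions) - 1   # number of the version just created

    def read(self, version: int = -1) -> bytes:
        return self.versions[version]   # default: the latest version


m = MutableFile(b"draft")
m.update(b"final")

h = ImmutableFileHistory(b"draft")
new_version = h.update(b"final")
```

After the update, m holds only the new contents, while h can still return the original contents through h.read(0).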

The manner in which a client's request to access a file is serviced depends on the file accessing model.
1. Remote service model: The client's request for file access is delivered to the server's node, the server performs the access, and the result is returned to the client as messages across the network. Every remote file access therefore results in network traffic.
2. Data-caching model: Attempts to reduce network traffic by taking advantage of the locality found in file accesses. In this model, if the data needed to satisfy the client's access request is not present locally, it is copied from the server's node to the client's node and cached there. The client's request is then processed on the client's node itself using the cached data, so repeated accesses to the same data can be handled locally.

This greatly reduces network traffic. The cost is overhead on write operations: in addition to modifying the local copy, the changes must also be made to the original file at the server node.
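As a rough illustration of the two accessing models, the sketch below counts how many requests reach the server for the same sequence of reads. All names (FileServer, RemoteServiceClient, CachingClient, count_requests) are hypothetical, the network is simulated by a method call, and writes are left out.

```python
class FileServer:
    """Hypothetical server holding the master copies, keyed by block id."""

    def __init__(self, blocks: dict) -> None:
        self.blocks = dict(blocks)
        self.requests = 0  # each read() call stands for one network round trip

    def read(self, block_id: str) -> bytes:
        self.requests += 1
        return self.blocks[block_id]


class RemoteServiceClient:
    """Remote service model: every access is sent to the server."""

    def __init__(self, server: FileServer) -> None:
        self.server = server

    def read(self, block_id: str) -> bytes:
        return self.server.read(block_id)


class CachingClient:
    """Data-caching model: fetch on a miss, serve repeated reads locally."""

    def __init__(self, server: FileServer) -> None:
        self.server = server
        self.cache = {}

    def read(self, block_id: str) -> bytes:
        if block_id not in self.cache:       # miss: copy data from the server
            self.cache[block_id] = self.server.read(block_id)
        return self.cache[block_id]          # hit: no network traffic


def count_requests(client_cls, accesses):
    """Replay the accesses with a fresh server and return its request count."""
    server = FileServer({"a": b"A", "b": b"B"})
    client = client_cls(server)
    for block_id in accesses:
        client.read(block_id)
    return server.requests


accesses = ["a", "a", "b", "a", "b"]
remote_requests = count_requests(RemoteServiceClient, accesses)
cached_requests = count_requests(CachingClient, accesses)
```

With this access pattern the remote service client contacts the server once per read, while the caching client contacts it only once per distinct block.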

Accessing Remote Files

Remote service model
File access: at the server
Merits: a simple implementation
Demerits: communication overhead

Data-caching model
File access: at a client that has cached a copy of the file
Merits: reduced network traffic
Demerits: cache consistency problem

Unit of Data Transfer

File
Merits: simple, less communication overhead, and immune to server failures
Demerits: the client requires a large storage space

Block
Merits: the client does not require a large storage space
Demerits: more network traffic/overhead

Byte
Merits: maximum flexibility
Demerits: difficult cache management to handle variable-length data

Record
Merits: handles structured and indexed files
Demerits: more network traffic; more overhead to reconstruct a file

A file caching scheme for a DFS should address the following key decisions:
1. Cache location
2. Modification propagation
3. Cache validation

1. Cache location
Refers to the place where the cached data is stored. Assuming the original location of a file is on its server's disk, there are three possible cache locations:
a. Server's main memory: Before a remote client can access a file, the file must first be transferred from the server's disk to the server's main memory, and then across the network from the server's main memory to the client's main memory. Without a cache, the total cost = one disk access cost + one network access cost. A cache in the server's main memory eliminates the disk access cost on a cache hit.

Adv: Easy to keep the original file and the cached data consistent, since both reside on the same node.
Disadv: A cache in the server's main memory still involves a network access for each file access operation by a remote client.
b. Client's disk: The cache is kept on the client's disk. This eliminates the network access cost on a cache hit but still requires a disk access.
Adv:
1. Modifications to the cached data are not lost in a crash, since the cache resides on disk.
2. More data can be cached than in a main memory cache.
3. Supports disconnected operation (the client does not need to contact the server) when client disk caching is combined with the file-level transfer model.

c. Client's main memory: The cache is kept in the client's main memory. This eliminates both the network access cost and the disk access cost on a cache hit, and permits workstations to be diskless. It contributes to scalability and reliability, since on a cache hit the access request can be serviced locally without contacting the server. It is not preferable when a large cache size is required.

2. Modification propagation
When the cache resides on the client's node, a file may be cached on multiple nodes simultaneously. When the caches of all nodes contain exactly the same copies of the data, the caches are said to be consistent. They can become inconsistent when the file data is changed by one client and the corresponding data cached at other nodes is not changed. What is needed is consistency. The approaches to maintaining it depend on the schemes used for the following cache design issues:
1. When to propagate modifications made to cached data to the corresponding file server.
2. How to verify the validity of cached data.

Modification propagation schemes
1. Write-through scheme
When a cache entry is modified, the new value is immediately sent to the server to update the master copy of the file.
Adv: Since every modification is immediately propagated to the server, the risk of updated data being lost when a client crashes is very low.
Disadv: Poor write performance, since each write access has to wait until the information is written to the master copy at the server.
Suitable when the ratio of read to write accesses is large.

2. Delayed-write scheme
Write-through increases the network traffic for writes. To reduce it, use the delayed-write scheme: when a cache entry is modified, the new value is written only to the cache. Some time later, all updated cache entries corresponding to a file are sent to the server. Suitable when the ratio of read to write accesses is low.
When are the modifications sent to the file server?
1. Write on ejection from cache: Modified data is sent to the server when the cache replacement policy decides to eject it from the client's cache.
2. Periodic write: The cache is scanned at regular intervals, and any cached data that has been modified since the last scan is sent to the server.

3. Write on close: Modified data is written back to the server when the file is closed. Best for files that are open for long periods and frequently modified.
Adv of the delayed-write scheme:
1. Write accesses complete quickly, since the new value is written only to the cache of the client performing the write.
2. Modified data may be deleted before it is time to send it to the server, so not every modification needs to be sent, which can give a major performance gain.
3. Gathering all file updates and sending them to the server together is more efficient than sending each update separately.
Disadv: If the client crashes, updates that have not yet been sent to the server are lost.
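The trade-off between write-through and delayed write can be made concrete with a small simulation. The sketch below is a simplified illustration under stated assumptions: Server, WriteThroughCache and WriteOnCloseCache are hypothetical names, the delayed-write variant shown is write on close, and one call to Server.write stands for one network message.

```python
class Server:
    def __init__(self) -> None:
        self.master = {}    # master copy: block id -> data
        self.messages = 0   # number of update messages received

    def write(self, updates: dict) -> None:
        self.messages += 1
        self.master.update(updates)


class WriteThroughCache:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache = {}

    def write(self, block_id: str, data: bytes) -> None:
        self.cache[block_id] = data
        self.server.write({block_id: data})  # propagate immediately

    def close(self) -> None:
        pass                                  # nothing left to flush


class WriteOnCloseCache:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache = {}
        self.dirty = set()                    # blocks modified since last flush

    def write(self, block_id: str, data: bytes) -> None:
        self.cache[block_id] = data           # only the local cache is updated
        self.dirty.add(block_id)

    def close(self) -> None:
        if self.dirty:                        # send all updates in one message
            self.server.write({b: self.cache[b] for b in self.dirty})
            self.dirty.clear()


def run(cache_cls) -> Server:
    server = Server()
    cache = cache_cls(server)
    cache.write("b0", b"v1")
    cache.write("b0", b"v2")  # overwrites v1 before it is ever sent (delayed)
    cache.write("b1", b"x")
    cache.close()
    return server


wt = run(WriteThroughCache)
wc = run(WriteOnCloseCache)
```

Both runs leave the same master copy at the server, but write-through needs one message per write while write on close needs a single message. The sketch does not model the client crash that would lose the unsent updates.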

3. Cache validation schemes
The modification propagation policy only specifies when the master copy of a file at the server node is updated. It says nothing about when the copies of the file residing in the caches of other nodes are updated. A client therefore needs to verify whether the data in its cache is consistent with the master copy, i.e. whether its cache contains valid data.
Approaches to verify the validity of cached data:
1. Client-initiated approach
2. Server-initiated approach

1. Client-initiated approach
The client contacts the server and checks whether its locally cached data is consistent with the master copy. The following approaches can be used:
1. Checking before every access: The server must be contacted on every access to check validity, which defeats the purpose of caching.
2. Periodic checking: A check is initiated at fixed intervals of time.
3. Check on file open: A client's cache entry is validated only when the client opens the corresponding file for use.

The validity check compares the time of last modification of the cached version of the data with that of the server's master copy. If the two are the same, the cached data is up to date; otherwise the current version of the data is fetched from the server.
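A check-on-open validation based on modification times might look like the sketch below. It is a minimal illustration, not a real protocol: Server and Client are hypothetical classes, and a logical counter stands in for the time of last modification.

```python
class Server:
    def __init__(self) -> None:
        self.files = {}  # name -> (data, time of last modification)
        self.clock = 0   # logical clock standing in for a real timestamp

    def write(self, name: str, data: bytes) -> None:
        self.clock += 1
        self.files[name] = (data, self.clock)

    def mtime(self, name: str) -> int:
        return self.files[name][1]

    def fetch(self, name: str):
        return self.files[name]


class Client:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache = {}   # name -> (data, mtime of the cached version)
        self.fetches = 0  # how often the file had to be (re)fetched

    def open(self, name: str) -> bytes:
        cached = self.cache.get(name)
        # Validate on open: compare cached mtime with the master copy's mtime.
        if cached is None or cached[1] != self.server.mtime(name):
            self.cache[name] = self.server.fetch(name)
            self.fetches += 1
        return self.cache[name][0]


server = Server()
server.write("notes.txt", b"v1")
client = Client(server)
first = client.open("notes.txt")    # not cached yet: fetched
second = client.open("notes.txt")   # times match: served from the cache
server.write("notes.txt", b"v2")    # the master copy changes
third = client.open("notes.txt")    # times differ: current version fetched
```

Note that even a valid cache entry costs one small request to the server to compare modification times; only the transfer of the data is avoided.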

2. Server-initiated approach
The client informs the file server when opening a file, indicating whether the file is being opened for reading, writing or both. The file server keeps track of which client has which file open and in what mode. In this way the server monitors the file usage modes of the different clients and reacts whenever it detects a potential for inconsistency, which occurs when two or more clients try to open a file in conflicting modes.
E.g.: If a file is open for reading, other clients may be allowed to open it for reading, but not for writing. Similarly, a new client should not be allowed to open a file in any mode if the file is already open for writing.

When a client closes a file, it sends a notification to the server along with any modifications made to the file. On receiving the notification, the server updates its record.
Disadv:
1. Violates the traditional client-server model, in which servers simply respond to service requests initiated by clients.
2. Requires file servers to be stateful, and a stateful server loses all its volatile state in a crash.
3. A check-on-open client-initiated approach must still be used along with the server-initiated approach.
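The bookkeeping a server needs for the server-initiated approach can be sketched as follows. This is an assumption-laden illustration: OpenTable is a hypothetical class, only the modes "r" and "w" are modeled (opening for both is treated as "w"), and conflicting opens are simply refused.

```python
class OpenTable:
    """Server-side record of which client has which file open, and how."""

    def __init__(self) -> None:
        self.opens = {}  # file name -> {client id: "r" or "w"}

    def open(self, client: str, name: str, mode: str) -> bool:
        holders = self.opens.setdefault(name, {})
        others = {m for c, m in holders.items() if c != client}
        if "w" in others:             # already open for writing elsewhere
            return False
        if mode == "w" and others:    # open for reading elsewhere
            return False
        holders[client] = mode
        return True

    def close(self, client: str, name: str) -> None:
        # The close notification lets the server update its record.
        self.opens.get(name, {}).pop(client, None)


table = OpenTable()
r1 = table.open("c1", "f", "r")        # first reader
r2 = table.open("c2", "f", "r")        # second reader is allowed
w3 = table.open("c3", "f", "w")        # writer conflicts with the readers
table.close("c1", "f")
table.close("c2", "f")
w3_retry = table.open("c3", "f", "w")  # no readers left: writer allowed
r1_again = table.open("c1", "f", "r")  # file is open for writing: refused
```

The table is exactly the volatile state mentioned in the disadvantages: if the server crashes, the contents of opens are lost.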

A good DFS should have high availability. How? Use replication.
A replicated file is a file that has multiple copies, with each copy located on a separate file server.
Difference between replication and caching:
1. A replica is associated with a server, whereas a cached copy is normally associated with a client.
2. The existence of a cached copy depends primarily on the locality in file access patterns, whereas the existence of a replica normally depends on availability and performance requirements.
3. Compared to a cached copy, a replica is more persistent, accurate, complete and available.
4. A cached copy is contingent upon a replica: it is useful only if it is periodically revalidated against a replica.

Multicopy update problem
Multiple copies of the same file exist on different file servers. The problem is keeping them mutually consistent.
Approaches used:
1. Read-only replication: Allows replication of immutable files only. Since immutable files are used only in read-only mode and mutable files cannot be replicated, the multicopy update problem does not arise.

2. Read-any-write-all protocol: A read operation on a replicated file is performed by reading any copy of the file, and a write operation by writing to all copies of the file. Locking is used to carry out writes: before any copy is updated, all copies are locked; then they are updated; finally the locks are released.
3. Available-copies protocol: The problem with the read-any-write-all protocol is that a write operation cannot be performed if any one of the servers holding a copy of the file is down at the time of the write. The available-copies protocol relaxes this restriction and allows write operations to be carried out even in this condition, by writing to all the copies that are currently available.

When a server recovers from a failure, it brings itself up to date by copying from other servers before accepting any user request.
Primary-copy protocol: One copy is designated as the primary copy; all others are secondary copies. When a replicated file is to be updated, the change is sent to the primary copy, which makes the change locally and then sends commands to the secondaries. Reads can be done from any copy, primary or secondary. To guard against the primary crashing before it has had a chance to instruct all the secondaries, the update should be written to stable storage before the primary copy is changed. When a server reboots after a crash, a check can be made to see whether any updates were in progress at the time of the crash. If so, they can still be carried out after the reboot, and all secondaries will be updated.
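The primary-copy update path, including recovery after a crash of the primary, can be sketched as below. The sketch makes several simplifying assumptions: Replica and Primary are hypothetical classes, the pending attribute stands in for stable storage, and the crash is simulated with a flag rather than a real failure.

```python
class Replica:
    def __init__(self) -> None:
        self.data = None

    def apply(self, data: bytes) -> None:
        self.data = data


class Primary(Replica):
    def __init__(self, secondaries: list) -> None:
        super().__init__()
        self.secondaries = secondaries
        self.pending = None  # stands in for the record kept on stable storage

    def update(self, data: bytes, crash_before_propagation: bool = False) -> None:
        self.pending = data           # 1. record the update on stable storage
        self.apply(data)              # 2. change the primary copy
        if crash_before_propagation:
            return                    # simulated crash: secondaries not told
        self._propagate()             # 3. instruct the secondaries

    def recover(self) -> None:
        """After a reboot, redo any update that was still in progress."""
        if self.pending is not None:
            self.apply(self.pending)
            self._propagate()

    def _propagate(self) -> None:
        for s in self.secondaries:
            s.apply(self.pending)
        self.pending = None           # the update is complete everywhere


secondaries = [Replica(), Replica()]
primary = Primary(secondaries)
primary.update(b"v1")
primary.update(b"v2", crash_before_propagation=True)
stale = [s.data for s in secondaries]  # secondaries still hold the old value
primary.recover()
fresh = [s.data for s in secondaries]  # the logged update has been redone
```

Between the crash and the recovery, reads from a secondary return the old value, which is why the record on stable storage is needed to finish the update.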

A quorum-based protocol works as follows: Assume there are n copies of a replicated file F. To read the file, a minimum of r copies must be consulted. This set of r copies is called the read quorum. Similarly, to perform a write operation on the file, a minimum of w copies must be written. This set of w copies is called the write quorum. The restriction on the choice of r and w is that r + w > n. This restriction guarantees a non-empty intersection between every read quorum and every write quorum, i.e. every pair of read and write operations has at least one copy of the file in common, so that there is at least one up-to-date copy in every read quorum.

Since the quorum protocol does not require that write operations be executed on all copies of a replicated file, some copies will be obsolete, and it therefore becomes necessary to identify a current (up-to-date) copy in a quorum. This is achieved by associating a version number attribute with each copy. The version number of a copy is updated every time the copy is modified. The copy with the largest version number in a quorum is current. The new version number assigned to each copy is one more than the version number associated with the current copy.

A read is executed as follows:
1. Retrieve a read quorum (any r copies) of F.
2. Of the r copies retrieved, select the copy with the largest version number.
3. Perform the read operation on the selected copy.
A write is executed as follows:
1. Retrieve a write quorum (any w copies) of F.
2. Of the w copies retrieved, get the version number of the copy with the largest version number.
3. Increment the version number.
4. Write the new value and the new version number to all w copies of the write quorum.
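The read and write steps above can be sketched directly in Python. This is an illustrative sketch, not a production protocol: Copy and QuorumFile are hypothetical names, quorum members are picked at random, locking and failures are ignored, and the values n = 5, r = 3, w = 3 in the usage lines are assumptions chosen so that any two write quorums also overlap.

```python
import random


class Copy:
    def __init__(self) -> None:
        self.version = 0
        self.value = None


class QuorumFile:
    def __init__(self, n: int, r: int, w: int) -> None:
        if r + w <= n:
            raise ValueError("need r + w > n so that quorums intersect")
        self.copies = [Copy() for _ in range(n)]
        self.r, self.w = r, w

    def _quorum(self, size: int) -> list:
        # Any `size` distinct copies form a quorum; pick them at random.
        return random.sample(self.copies, size)

    def read(self):
        quorum = self._quorum(self.r)
        current = max(quorum, key=lambda c: c.version)  # largest version wins
        return current.value

    def write(self, value) -> int:
        quorum = self._quorum(self.w)
        new_version = max(c.version for c in quorum) + 1
        for c in quorum:                                # update all w copies
            c.version, c.value = new_version, value
        return new_version


f = QuorumFile(n=5, r=3, w=3)
f.write("a")
f.write("b")
results = {f.read() for _ in range(50)}  # repeated reads with random quorums
```

Whichever copies are picked, every read quorum overlaps the last write quorum, so every read returns the value of the latest write even though some copies are obsolete.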

Example: Let there be 8 copies of the replicated file (n = 8), and let r = 4 and w = 5, so the condition r + w > n is satisfied. Suppose a write operation is performed on the write quorum comprising copies 3, 4, 5, 6 and 8. All these copies get the new value and the new version number. Any subsequent read operation requires a read quorum of 4 copies because r = 4. Since only 3 copies (1, 2 and 7) lie outside the write quorum, the read quorum must contain at least one copy from the previous write quorum.

Figure: read quorum = copies 1, 2, 7 and 3; write quorum = copies 3, 4, 5, 6 and 8. In this example copy 3 is the common copy.

When the version numbers of the copies in the read quorum are compared, the version number of copy 3 is found to be larger than those of the other copies in the read quorum. Hence the read operation is performed using copy 3.
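The claim that every read quorum must overlap the write quorum can be checked by brute force for this example. The short Python check below enumerates all possible read quorums of size r = 4 out of the 8 copies; the variable names are chosen for illustration only.

```python
from itertools import combinations

n, r, w = 8, 4, 5
copies = range(1, n + 1)
write_quorum = {3, 4, 5, 6, 8}

# Size of the overlap with the write quorum, for every possible read quorum.
overlaps = [len(write_quorum & set(rq)) for rq in combinations(copies, r)]
min_overlap = min(overlaps)

# The read quorum used in the example above.
example_overlap = write_quorum & {1, 2, 7, 3}
```

The smallest overlap over all read quorums is r + w - n = 1, and for the read quorum of the example the single common copy is copy 3.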

A distributed transaction is a transaction that spans multiple servers/nodes.
Example: A transaction involving an airline reservation. The user's reservation request could go through three servers:
1. A server that manages seat booking, reducing the number of seats available upon booking.
2. A server that charges the user's credit card upon booking.
3. A server that increases the number of meals loaded on the flight.

Hence, if the transaction is completed, it is completed on all the servers; otherwise it is completed on none.

General goal: We want an operation to be performed by all group members (servers), or by none at all.
Two-phase multiserver commit protocol (distributed commit): One server is elected coordinator of the transaction T. The other servers participating in the transaction are called workers (also cohorts or participants). The general protocol for committing a distributed transaction has two phases:
1. Preparation phase
2. Commitment phase

When the client of a distributed transaction makes an end_transaction request, the coordinator and the workers in the transaction have tentative values in their logs, describing the operations that affect their own files. The coordinator is responsible for deciding whether the transaction should be committed or aborted: if any server is unable to commit, the whole transaction must be aborted.
Preparation phase:
1. The coordinator makes an entry in its log that it is starting the commit protocol.
2. It then sends a vote_request message to all the workers, telling them to prepare to commit.
3. When a worker gets the message, it checks whether it is ready to commit. If so, it makes an entry in its log and replies with a vote_commit message. Otherwise it replies with a vote_abort message.

Commitment phase: At this point the coordinator has received either a vote_commit or a vote_abort message from each worker.
1. If all the workers are ready to commit, the coordinator sends a Global_commit message to all workers and makes an entry in its log indicating that the transaction has been committed. The transaction is now effectively completed, so the coordinator can report success to the client. If, on the other hand, any worker replied vote_abort, the coordinator sends a Global_abort message to all workers and the transaction is aborted. The coordinator reports failure to the client.

2. When a worker receives a Global_commit message, it makes a committed entry in its log and sends an ACK reply to the coordinator.
3. When the coordinator has received an ACK reply from all the workers, the transaction is considered complete. The coordinator keeps resending the Global_commit message until it has received an ACK reply from all the workers.
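The two phases can be sketched as a pair of Python classes. This is a simplified sketch under stated assumptions: Worker and Coordinator are hypothetical classes, messages are modeled as direct method calls, the can_commit flag is an invented stand-in for the worker's local readiness check, and crashes, timeouts and the resending of Global_commit are not modeled.

```python
class Worker:
    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit  # hypothetical: is this worker ready?
        self.log = []

    def on_vote_request(self) -> str:
        if self.can_commit:
            self.log.append("ready")  # log entry made before voting to commit
            return "vote_commit"
        return "vote_abort"

    def on_decision(self, decision: str) -> str:
        self.log.append("committed" if decision == "Global_commit" else "aborted")
        return "ACK"


class Coordinator:
    def __init__(self, workers: list) -> None:
        self.workers = workers
        self.log = []

    def end_transaction(self) -> bool:
        # Preparation phase: log the start, then collect one vote per worker.
        self.log.append("start_commit")
        votes = [w.on_vote_request() for w in self.workers]

        # Commitment phase: commit only if every worker voted to commit.
        commit = all(v == "vote_commit" for v in votes)
        decision = "Global_commit" if commit else "Global_abort"
        self.log.append("committed" if commit else "aborted")
        acks = [w.on_decision(decision) for w in self.workers]
        if all(a == "ACK" for a in acks):
            self.log.append("complete")
        return commit  # True: report success to the client; False: failure


# Airline reservation example: seats, payment and meals servers.
ok_coordinator = Coordinator([Worker("seats"), Worker("payment"), Worker("meals")])
booked = ok_coordinator.end_transaction()

workers = [Worker("seats"), Worker("payment", can_commit=False), Worker("meals")]
failed = Coordinator(workers).end_transaction()
final_states = [w.log[-1] for w in workers]
```

In the second run the payment server cannot commit, so all three servers end in the aborted state, including the two that voted to commit.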

Figure: (a) The finite state machine for the coordinator in 2PC. (b) The finite state machine for a participant.