
PRESENTATION

ON
DISTRIBUTED FILE SYSTEM
INSTITUTE OF ENGINEERING AND TECHNOLOGY BUNDELKHAND UNIVERSITY

Submitted By:
Abhishek Gaur, Akanksha Singh, Akanksha Singh Maurya, Arjun Yadav, Saumya Katiyar
CHAPTER 12
DISTRIBUTED FILE SYSTEM
INTRODUCTION

 A distributed file system (DFS) is a method of storing and accessing files based on a client/server architecture. In a distributed file system, one or more central servers store files that can be accessed, with proper authorization rights, by any number of remote clients in the network.
 In a distributed file system, multiple clients share files provided by a shared file service.
 A DFS makes it possible to restrict access to the file system depending on access lists or capabilities on both the servers and the clients, depending on how the protocol is designed.
FILE SYSTEM MODULES

A typical layered module structure for the implementation of a non-distributed file system. Each layer depends only on the layer below it. The implementation of a distributed file service requires all of the components shown here, with additional components to deal with client-server communication and with the distributed naming and location of files.
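
In the figure this slide refers to, the layers (following the usual textbook presentation of a non-distributed file system) are, from top to bottom:
 Directory module: relates file names to file IDs
 File module: relates file IDs to particular files
 Access control module: checks permission for the operation requested
 File access module: reads or writes file data or attributes
 Block module: accesses and allocates disk blocks
 Device module: performs disk I/O and buffering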
FILE ATTRIBUTE RECORD STRUCTURE
Files contain both data and attributes. The data consists of a sequence of data items (bytes), accessible by operations to read and write any portion of the sequence. The attributes are held as a single record containing information such as the file's length, timestamps, file type, owner's identity and access control list.

The shaded attributes are managed by the file system and are not normally updatable by user programs.
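
As a concrete illustration, the attribute record described above could be declared as follows; this is a minimal C sketch whose field set follows the slide's description, with illustrative names:

#include <stdint.h>
#include <time.h>

struct file_attributes {
    /* Shaded attributes: managed by the file system, not normally
       updatable by user programs */
    uint64_t length;            /* file length in bytes                */
    time_t   creation_time;     /* creation timestamp                  */
    time_t   read_time;         /* timestamp of last read              */
    time_t   write_time;        /* timestamp of last write             */
    time_t   attribute_time;    /* timestamp of last attribute change  */
    uint32_t reference_count;   /* number of directory references      */
    uint32_t file_type;         /* e.g. regular file or directory      */

    /* Updatable through user programs, subject to access checks */
    uint32_t owner;             /* owning user's identity              */
    /* an access control list would complete the record */
};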
UNIX FILE SYSTEM OPERATIONS
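
The table for this slide lists the standard UNIX file operations; the following minimal, runnable C example exercises each of them (file names are illustrative):

#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    char buf[32];
    struct stat st;

    int fd = open("demo.txt", O_CREAT | O_RDWR, 0644); /* open or create       */
    write(fd, "hello", 5);                             /* write bytes          */
    lseek(fd, 0, SEEK_SET);                            /* move the r/w pointer */
    read(fd, buf, sizeof buf);                         /* read bytes back      */
    close(fd);                                         /* release descriptor   */
    stat("demo.txt", &st);                             /* fetch attributes     */
    link("demo.txt", "demo2.txt");                     /* add a second name    */
    unlink("demo2.txt");                               /* remove a name        */
    return 0;
}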
DISTRIBUTED FILE SYSTEM
REQUIREMENTS
 Transparency
 Concurrency
 Replication
 Heterogeneity
 Fault Tolerance
 Consistency
 Security
 Efficiency

TRANSPARENCY:
Access transparency: Client programs should be unaware of the distribution of files.
Location transparency: Client programs should see a uniform file name space. Files should be able to be relocated without changing their path names.
Mobility transparency: Neither client programs nor system administration tables in the client nodes should need to be changed when files are moved, whether automatically or by the system administrator.
Performance transparency: Client programs should continue to perform well while the load on the service varies within a specified range.
Scaling transparency: Increases in the size of storage and network should be transparent.
CONCURRENCY PROPERTIES:
 Isolation
 File-level or record-level locking
 Other forms of concurrency control to minimise contention

REPLICATION PROPERTIES:
 File service maintains multiple identical copies of files
o Load-sharing between servers makes service more scalable
o Local access has better response (lower latency)
o Fault tolerance
 Full replication is difficult to implement.
 Caching (of all or part of a file) gives most of the benefits (except fault tolerance).
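
As an illustration of the record-level locking mentioned above, a UNIX process can lock part of a shared file with POSIX fcntl(); a minimal sketch, where the 100-byte record size is an assumption:

#include <fcntl.h>
#include <unistd.h>

/* Acquire an exclusive lock on bytes 0..99 of the file (the "record"),
   blocking until no other process holds a conflicting lock. */
int lock_first_record(int fd)
{
    struct flock fl = {
        .l_type   = F_WRLCK,    /* exclusive write lock          */
        .l_whence = SEEK_SET,   /* offsets from start of file    */
        .l_start  = 0,          /* first byte of the record      */
        .l_len    = 100,        /* length of the record in bytes */
    };
    return fcntl(fd, F_SETLKW, &fl);   /* F_SETLKW waits for the lock */
}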
HETEROGENEITY PROPERTIES:
 Service can be accessed by clients running on (almost) any OS or hardware platform.
 Design must be compatible with the file systems of different OSes.
 Service interfaces must be open - precise specifications of APIs are published.

CONSISTENCY:
 UNIX offers one-copy update semantics for operations on local files - caching is completely transparent.
 Difficult to achieve the same for distributed file systems while maintaining good performance and scalability.
FAULT TOLERANCE:
Service must continue to operate even when clients make errors or crash.
o at-most-once semantics
o at-least-once semantics - requires idempotent operations
Service must resume after a server machine crashes.
If the service is replicated, it can continue to operate even during a server crash.
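
At-least-once semantics is safe here because the operations are idempotent: each request names its absolute file position, so replaying a duplicated request changes nothing. A sketch of the contrast (illustrative signatures, not an actual protocol):

/* Idempotent: the offset travels with the request, so executing a
   duplicate of the same request leaves the file unchanged. */
long read_at(int file_id, long offset, long count, char *buf);

/* NOT idempotent: a hidden per-client cursor advances on every call,
   so a retried request would silently skip data. Stateless servers
   avoid this style of interface. */
long read_next(int file_id, long count, char *buf);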

EFFICIENCY:
The goal for a distributed file system is usually performance comparable to that of a local file system.
FILE SERVICE ARCHITECTURE
A file service architecture has three divisions of responsibility:
 Flat file service: Concerned with implementing operations on the contents of files. UFIDs (Unique File Identifiers) are used to refer to files. UFIDs also differentiate between directories and files.
 Directory service: For mapping between text names of files and their UFIDs.
 Client module: A client module runs in each client computer. It integrates and extends the operations of the flat file service and the directory service under a single interface.
FLAT FILE SERVICE OPERATIONS
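
The table for this slide gives the flat file service operations of the abstract model; a C-style rendering of the usual operation set (types and names here are illustrative):

#include <stdint.h>

typedef uint64_t ufid_t;      /* UFID: Unique File IDentifier          */
struct file_attributes;       /* attribute record, as sketched earlier */

long   read_file (ufid_t f, long pos, long n, char *buf);        /* read up to n items from pos   */
long   write_file(ufid_t f, long pos, long n, const char *buf);  /* write n items starting at pos */
ufid_t create_file(void);                                        /* new zero-length file -> UFID  */
int    delete_file(ufid_t f);                                    /* remove the file               */
int    get_attributes(ufid_t f, struct file_attributes *attr);
int    set_attributes(ufid_t f, const struct file_attributes *attr);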
DIRECTORY SERVICE OPERATIONS
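
Similarly, the directory service operations for this slide's table can be sketched in the same style (illustrative signatures; ufid_t as in the flat file service sketch above):

/* A directory is itself a file identified by a UFID; the service maps
   text names within it to UFIDs. */
ufid_t lookup  (ufid_t dir, const char *name);               /* name -> UFID           */
int    add_name(ufid_t dir, const char *name, ufid_t file);  /* enter a new mapping    */
int    un_name (ufid_t dir, const char *name);               /* remove a mapping       */
long   get_names(ufid_t dir, const char *pattern,
                 char *names, long size);                    /* names matching pattern */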
FILE GROUPS
 A collection of files that can be located on any server or moved between servers while maintaining the same names. A file cannot change its group.
o Similar to a UNIX file system
 Helps with distributing the load of file serving between several servers.
o File groups have identifiers which are unique throughout the system (and hence, for an open system, they must be globally unique).
o Used to refer to file groups and files
 To construct a globally unique ID we use some unique attribute of the machine on which it is created, e.g. its IP address, even though the file group may move subsequently.
File Group ID
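
A sketch of such a globally unique file group identifier; the address-plus-date layout follows the usual textbook presentation, and the exact field widths are an assumption:

#include <stdint.h>

struct file_group_id {
    uint32_t ip_address;   /* IPv4 address of the creating host       */
    uint16_t date;         /* creation date; keeps the ID unique even
                              after the group migrates to other hosts */
};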
CASE STUDY : SUN NFS
 An industry standard for file sharing on local networks since the
1980s.
 An open standard with clear and simple interfaces
 Supports many of the design requirements already mentioned:
-transparency -heterogeneity
-efficiency -fault tolerance
 Limited achievement of:
-Concurrency -replication
-Consistency -security
NFS ARCHITECTURE
NFS ARCHITECTURE: DOES THE IMPLEMENTATION HAVE TO BE IN THE SYSTEM KERNEL?
 No:
there are examples of NFS clients and servers that run at application-level
as libraries or processes (e.g. early Windows and MacOS
implementations, current Pocket PC, etc.)
 But, for a Unix implementation there are advantages
 Binary code compatible - no need to recompile applications
o Standard system calls that access remote files can be routed through the NFS client module by the kernel
o Shared cache of recently-used blocks at the client
o Kernel-level server can access i-nodes and file blocks directly
o But a privileged (root) application program could do almost the same
o Security of the encryption key used for authentication
NFS SERVER OPERATIONS (NFS VERSION 3 PROTOCOL,
SIMPLIFIED)
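
The table for this slide lists the NFS v3 server operations; a simplified C-style summary of the main procedures (the operation set is the protocol's, but these signatures are illustrative, not the RPC wire format):

#include <stdint.h>

typedef struct { uint8_t data[64]; } nfs_fh;   /* opaque NFS file handle */
struct nfs_attr;                               /* file attribute record  */

nfs_fh nfs_lookup (nfs_fh dir, const char *name);        /* name -> file handle        */
nfs_fh nfs_create (nfs_fh dir, const char *name);        /* create file in directory   */
int    nfs_remove (nfs_fh dir, const char *name);        /* remove file from directory */
int    nfs_getattr(nfs_fh fh, struct nfs_attr *attr);    /* fetch attributes           */
int    nfs_setattr(nfs_fh fh, const struct nfs_attr *attr);
long   nfs_read   (nfs_fh fh, long offset, long count, char *buf);
long   nfs_write  (nfs_fh fh, long offset, long count, const char *buf);
int    nfs_rename (nfs_fh dir, const char *name, nfs_fh todir, const char *toname);
int    nfs_mkdir  (nfs_fh dir, const char *name);
int    nfs_rmdir  (nfs_fh dir, const char *name);
long   nfs_readdir(nfs_fh dir, long cookie, char *entries, long size);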
NFS ACCESS CONTROL AND AUTHENTICATION

 Stateless server - the user's identity and access rights must be checked by the server on each request.
- In a local file system they are checked only on open()
 Every client request is accompanied by the userID and groupID
 Server is exposed to imposter attacks unless the userID and groupID are protected by encryption
 Kerberos has been integrated with NFS to provide a stronger and more comprehensive security solution
MOUNT SERVICE
Mount operation:
mount(remotehost, remotedirectory, localdirectory)
 Server maintains a table of clients who have mounted file systems at that server
 Each client maintains a table of mounted file systems holding:
<IP address, port number, file handle>
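
On a UNIX client, for example, the mount operation above corresponds to a command such as:
mount -t nfs server1:/export/people /usr/students
(assuming the host name server1 for the Server 1 shown in the figure that follows).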
LOCAL AND REMOTE FILESYSTEMS ACCESSIBLE ON AN NFS
CLIENT

Note: The file system mounted at /usr/students in the client is actually the
subtree located at /export/people in Server 1; the file system mounted at
/usr/staff in the client is actually the subtree located at /nfs/users in Server 2.
AUTOMOUNTER
 NFS client catches attempts to access 'empty' mount points and routes them to the Automounter
- Automounter has a table of mount points and multiple candidate servers for each
- It sends a probe message to each candidate server and then uses the mount service to mount the file system at the first server to respond
 Keeps the mount table small
 Provides a simple form of replication for read-only file systems
E.g. if there are several servers with identical copies of /usr/lib then each server will have a chance of being mounted at some clients.
KERBERIZED NFS
 Kerberos protocol is too costly to apply on each file access request
 Kerberos is used in the mount service:
- to authenticate the user's identity
- the user's UserID and GroupID are stored at the server with the client's IP address
 For each file request:
- the UserID and GroupID sent must match those stored at the server
- IP addresses must also match
 This approach has some problems:
- can't accommodate multiple users sharing the same client computer
- all remote filestores must be mounted each time a user logs in
NFS OPTIMIZATION- SERVER CACHING
 Similar to UNIX file caching for local files:
- pages (blocks) from disk are held in a main memory buffer
cache until the space is required for newer pages. Read-ahead
and delayed-write optimizations.
- For local files, writes are deferred to next sync event (30
second intervals)
-Works well in local context, where files are always accessed
through the local cache, but in the remote case it doesn't offer
necessary synchronization guarantees to clients.
NFS OPTIMIZATION- SERVER CACHING(CONT.)

 NFS v3 servers offer two strategies for updating the disk:
- Write-through: altered pages are written to disk as soon as they are received at the server. When a write() RPC returns, the NFS client knows that the page is on the disk (non pre-emptive).
- Delayed commit: pages are held only in the cache until a commit() call is received for the relevant file. This is the default mode used by NFS v3 clients. A commit() is issued by the client whenever a file is closed (pre-emptive).
NFS OPTIMIZATION- CLIENT CACHING
 Server caching does nothing to reduce RPC traffic between client and server
- Further optimization is essential to reduce server load
- The NFS client module caches the results of read, write, getattr, lookup and readdir operations
- Synchronization of file contents (one-copy semantics) is not guaranteed when two or more clients are sharing the same file
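
NFS clients conventionally bound this inconsistency with a timestamp-based freshness check: a cache entry validated within the last t seconds is trusted, otherwise the file's modification time is compared with the server's. A minimal sketch (names illustrative; t is typically a few seconds for files, longer for directories):

#include <stdbool.h>
#include <time.h>

/* now       - current time
   tc        - when this entry was last validated against the server
   t         - freshness interval in seconds
   tm_client - modification time recorded when the entry was cached
   tm_server - modification time just obtained from the server       */
bool cache_entry_valid(time_t now, time_t tc, double t,
                       time_t tm_client, time_t tm_server)
{
    if (difftime(now, tc) < t)      /* validated recently: trust it      */
        return true;
    return tm_client == tm_server;  /* otherwise compare with the server */
}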
NFS SUMMARY
 An excellent example of a simple, robust, high-performance
distributed service.
 Achievement of transparencies:
Access: Excellent; the API is the UNIX system call interface for both local and remote files.
Location: Not guaranteed but normally achieved; naming of file systems is controlled by client mount operations, but transparency can be ensured by an appropriate system configuration.
 Achievement of transparencies (continued):
Concurrency: Limited but adequate for most purposes; when read-write files are shared concurrently between clients, consistency is not perfect.
Replication: Limited to read-only file systems; for writable files, the SUN Network Information Service (NIS) runs over NFS and is used to replicate essential system files.
Failure: Limited but effective; service is suspended if a server fails. Recovery from failures is aided by the simple stateless design.
Mobility: Hardly achieved; relocation of files is not possible, relocation of file systems is possible, but requires updates to client configurations.
Performance: Good; multiprocessor servers achieve very high performance, but for a single file system it is not possible to go beyond the throughput of a multiprocessor server.
Scaling: Good; file systems (file groups) may be subdivided and allocated to separate servers. Ultimately, the performance limit is determined by the load on the server holding the most heavily-used file system (file group).
CASE STUDY:AFS(ANDREW FILE SYSTEM)
 Like NFS, AFS provides transparent access to remote shared files for
UNIX programs running on workstations.
 AFS is compatible with NFS.
 AFS differs markedly from NFS in its design and implementation.
 AFS is designed to perform well with larger numbers of active users
than other distributed file systems. The key strategy for achieving
scalability is the caching of whole files in client nodes.
 AFS has two unusual design characteristics:
- Whole-file serving: the entire contents of directories and files are transmitted to client computers by AFS servers.
- Whole-file caching: once a copy of a file or a chunk has been transferred to a client computer, it is stored in a cache on the local disk.
OPERATION OF AFS(ANDREW FILE SYSTEM)

1. When a user process in a client computer issues an open system call for a file in the shared file space and there is not a current copy of the file in the local cache, the server holding the file is located and is sent a request for a copy of the file.
2. The copy is stored in the local UNIX file system in the client computer.
3. Subsequent read, write and other operations on the file by processes in the client computer are applied to the local copy.
4. When the client issues a close, if the local copy has been updated its contents are sent back to the server, which updates the file's contents and timestamps. The copy on the client's local disk is retained in case it is needed by other user processes on the same workstation.
AFS(ANDREW FILE SYSTEM) IMPLEMENTATION

 How does AFS gain control when an open or close system call
referring to a file in the shared file space is issued by a client?

 How is the server holding the required file located?

 What space is allocated for cached files in workstations?

 How does AFS ensure that the cached copies of files are up-to-
date when files may be updated by several clients?
DISTRIBUTION OF PROCESSES IN THE ANDREW FILE SYSTEM
 AFS is implemented as two software components that exist
as UNIX processes called Vice and Venus.

- Vice is the server software that runs as a user-level UNIX process in each server computer.
- Venus is a user-level process that runs in each client computer and corresponds to the client module in our abstract model.
FILE NAME SPACE SEEN BY CLIENTS OF AFS
SYSTEM CALL INTERCEPTION IN AFS
CACHE CONSISTENCY
 When Vice supplies a copy of a file to a Venus process it also provides a
callback promise
 Callback promises are stored with the cached files on the workstation
disks and have two states: valid or cancelled.
 Whenever Venus handles an open on behalf of a client, it checks the
cache.
 If the required file is found in the cache, then its token is checked.
 If its value is cancelled, then a fresh copy of the file must be fetched from the Vice server.
 If the token is valid, then the cached copy can be opened and used without reference to Vice.
 When a workstation is restarted after a failure or a shutdown, cached callback promises can no longer be trusted, so Venus validates each cached file with the server (by sending the file's modification timestamp) before its first use.
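
A sketch of the open-path check Venus performs, as described above (data layout and helper are illustrative, not the actual AFS code):

enum callback_state { CANCELLED = 0, VALID = 1 };

struct cached_file {
    enum callback_state token;   /* state of the callback promise      */
    char local_path[256];        /* copy in the local UNIX file system */
};

/* Hypothetical helper: fetches a fresh copy plus a new callback
   promise from the Vice server and returns the local path. */
const char *fetch_from_vice(struct cached_file *cf);

const char *afs_open(struct cached_file *cf)
{
    if (cf->token == CANCELLED)        /* promise cancelled: refetch    */
        return fetch_from_vice(cf);
    return cf->local_path;             /* valid promise: use the cached
                                          copy without contacting Vice */
}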
IMPLEMENTATION OF FILE SYSTEM CALLS IN AFS
THE MAIN COMPONENTS OF THE VICE SERVICE INTERFACE
ENHANCEMENTS AND FURTHER DEVELOPMENTS

 NFS enhancements
 AFS enhancements
 Improvements in storage organization
 New design approaches
NFS ENHANCEMENTS

 Achieving one-copy update semantics (Spritely NFS, NQNFS):
Include an open() operation and maintain tables of open files at servers, which are used to prevent multiple writers and to generate callbacks to clients notifying them of updates. Performance was improved by a reduction in getattr() traffic.
 WebNFS:
The advent of Java and applets led to WebNFS. The NFS server implements a web-like service on a well-known port. Requests use a 'public file handle' and a pathname-capable variant of lookup(). This enables applications to access NFS servers directly, e.g. to read a portion of a large file.
AFS ENHANCEMENTS

 The design of the DCE/DFS distributed file system goes beyond AFS.
 Improvements in disk storage organization:
- RAID: improves performance and reliability by striping data redundantly across several disk drives
- Log-structured file storage: updated pages are stored contiguously in memory and committed to disk in large contiguous blocks (~1 Mbyte). File maps are modified whenever an update occurs. Garbage collection is used to recover disk space.
IMPROVEMENTS IN STORAGE ORGANIZATION

 Redundant Arrays of Inexpensive Disks (RAID): This is a mode of storage in which data blocks are segmented into fixed-size chunks and stored in 'stripes' across several disks, along with redundant error-correcting codes that enable the data blocks to be reconstructed completely and operation to continue normally in the event of disk failures.
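
To make the redundancy concrete: in the common single-parity RAID schemes, the error-correcting code for a stripe is the XOR of its data chunks, so any one lost chunk can be rebuilt by XOR-ing the survivors. A minimal sketch:

#include <stddef.h>
#include <stdint.h>

/* Compute the parity chunk of one stripe: parity = chunk0 ^ chunk1 ^ ...
   Reconstruction after a single disk failure runs the same loop over
   the surviving chunks plus the parity chunk. */
void compute_parity(const uint8_t *const chunks[], size_t nchunks,
                    size_t chunk_size, uint8_t *parity)
{
    for (size_t i = 0; i < chunk_size; i++) {
        uint8_t p = 0;
        for (size_t c = 0; c < nchunks; c++)
            p ^= chunks[c][i];
        parity[i] = p;
    }
}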
 Log-structured file storage (LFS): Like Spritely NFS, this technique originated in the Sprite distributed operating system project at Berkeley. The authors observed that as larger amounts of main memory became available for caching in file servers, an increased level of cache hits resulted in excellent read performance, but write performance remained mediocre.
NEW DESIGN APPROACHES

 Distribute file data across several servers:
- Exploits high-speed networks (ATM, Gigabit Ethernet)
- Layered approach; the lowest level is like a 'distributed virtual disk'
- Achieves scalability even for a single heavily-used file
 'Serverless' architecture:
- Exploits processing and disk resources in all available network nodes
- Service is distributed at the level of individual files
 Examples:
- xFS (serverless): an experimental implementation demonstrated a substantial performance gain over NFS and AFS
- Frangipani (a highly scalable distributed file system): performance similar to local UNIX file access
- Tiger Video File System
- Peer-to-peer systems: Napster, OceanStore (UCB), Farsite (MSR), Publius (AT&T research)
 Replicated read-write files:
- High availability
- Disconnected working: re-integration after disconnection is a major problem if conflicting updates have occurred
- Examples: Bayou system, Coda system
SUMMARY
 Sun NFS is an excellent example of a distributed service designed
to meet many important design requirements
 Effective client caching can produce file service performance
equal to or better than local file systems
 Consistency versus update semantics versus fault tolerance
remains an issue
 Most client and server failures can be masked
 Superior scalability can be achieved with whole-file serving
(Andrew FS) or the distributed virtual disk approach
THANK
YOU
