Requirements For Distributed File Systems

06-06798 Distributed Systems
Lecture 7: Distributed File Systems

Overview

• Requirements for distributed file systems
– transparency, performance, fault-tolerance, ...
• Design issues
– possible options, architectures
– file sharing, concurrent updates
– caching
• Example
– Sun NFS
Distributed Systems 1 Distributed Systems 2
Characteristics of file systems

• Operations on files (=data + attributes)
– create/delete
– query/modify attributes
– open/close
– read/write
– access control
• Storage organisation
– directory structure (hierarchical, pathnames)
– metadata (file management information)
• file attributes
• directory structure info, etc

File attributes

File length
Creation timestamp
Read timestamp
Write timestamp
Attribute timestamp
Reference count
User controlled: Owner, File type, Access control list
Distributed file system requirements

• Transparency (clients unaware of the distributed nature)
– access transparency (client unaware of distribution of files, same interface for local/remote files)
– location transparency (uniform file name space from any client workstation)
– mobility transparency (files can be moved from one server to another without affecting client)
– performance transparency (client performance not affected by load on service)
– scaling transparency (expansion possible if numbers of clients increase)

Distributed file system requirements (continued)

– Concurrent file updates (changes by one client do not affect another)
– File replication (for load sharing, fault-tolerance)
– Heterogeneity (interface platform-independent)
– Fault-tolerance (continues to operate in the face of client and server failures)
– Consistency (one-copy-update semantics or slight variations)
– Security (access control)
– Efficiency (performance comparable to conventional file systems)
File Service Design Options

• Stateful
– server holds information on open files, current position, file locks
– open before access, close after
– better performance - shorter message, read-ahead possible
– server failure - lose state
– client failure - tables fill up
– can provide file locks

• Stateless
– no state information held by server
– file operations idempotent, must contain all information needed (longer message)
– simpler file server design
– can recover easily from client or server crash
– locking requires extra lock server to hold state
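
The difference between the stateful and stateless options can be made concrete with a small sketch. Everything below is illustrative: the class names, method names and the in-memory dictionary standing in for the disk are assumptions, not part of any real file service. The point is only that a stateless read carries the file name and offset in every request, so a duplicated request returns the same data, while a stateful read depends on the position the server remembers.

```python
class StatefulServer:
    """Hypothetical server that keeps an open-file table (handle -> name, position)."""

    def __init__(self, files):
        self.files = files        # name -> bytes, stands in for the disk
        self.open_files = {}      # state that is lost if the server crashes
        self.next_handle = 0

    def open(self, name):
        handle = self.next_handle
        self.next_handle += 1
        self.open_files[handle] = [name, 0]
        return handle

    def read(self, handle, count):
        # Short message: only a handle and a count are needed.
        name, pos = self.open_files[handle]
        data = self.files[name][pos:pos + count]
        self.open_files[handle][1] = pos + len(data)
        return data

    def close(self, handle):
        # If a client crashes without closing, the table entry stays behind.
        del self.open_files[handle]


class StatelessServer:
    """Hypothetical server that holds no per-client state."""

    def __init__(self, files):
        self.files = files

    def read(self, name, offset, count):
        # Longer message: every request names the file and the offset.
        return self.files[name][offset:offset + count]


files = {"notes.txt": b"distributed file systems"}

stateful = StatefulServer(files)
h = stateful.open("notes.txt")
first = stateful.read(h, 11)
retry = stateful.read(h, 11)      # a duplicated request advances the position again

stateless = StatelessServer(files)
a = stateless.read("notes.txt", 0, 11)
b = stateless.read("notes.txt", 0, 11)   # a duplicated request is harmless
```

In the stateful case `first` and `retry` differ, which is why retransmission needs care; in the stateless case `a` and `b` are identical.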
File server architecture

[Figure: File server architecture. Client computer: two application programs on top of the client module (API: knows open files, positions...). Server computer: directory service (text names to UFIDs) and flat file service (UFIDs, operations on contents). The client module reaches both services via RPC.]

File Service Architecture

Components (for openness):
• Flat file service
– operations on file contents
– unique file identifiers (UFIDs)
– translates UFIDs to file locations
• Directory service
– mapping between text names and UFIDs
• Client module
– API for file access, one per client computer
– holds state: open files, positions
– knows network location of flat file & directory server
Flat file service RPC interface

• Used by client modules, not user programs
– FileId (UFID) uniquely identifies file
– invalid if file not present or inappropriate access
– Read/Write; Create/Delete; Get/SetAttributes
• No open/close! (unlike UNIX)
– access immediate with FileId
– Read/Write identify starting point
• Improved fault-tolerance
– operations idempotent except Create, can be repeated (at-least-once RPC semantics)
– stateless service

Access control

• In UNIX file system
– access rights are checked against the access mode (read, write, execute) in open
– user identity checked at login time, cannot be tampered with
• In distributed systems
– access rights must be checked at server
• RPC unprotected
• forging identity possible, a security risk
– user id typically passed with every request (e.g. Sun NFS)
– stateless
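
A minimal sketch of a flat file service in the spirit of the interface above may help show why Read and Write can be retried safely while Create cannot. The class, the exception name and the in-memory storage are assumptions made for illustration; the slide only names the operations (Read/Write, Create/Delete, Get/SetAttributes) and does not fix their signatures.

```python
import itertools


class BadFileId(Exception):
    """Raised when a FileId does not refer to an existing file (hypothetical name)."""


class FlatFileService:
    """In-memory stand-in for a flat file service keyed by UFID."""

    def __init__(self):
        self._files = {}                    # ufid -> bytearray
        self._ids = itertools.count(1)      # assumption: UFIDs are small integers

    def _get(self, ufid):
        try:
            return self._files[ufid]
        except KeyError:
            raise BadFileId(ufid) from None

    def create(self):
        # Not idempotent: every call makes a new file with a fresh UFID.
        ufid = next(self._ids)
        self._files[ufid] = bytearray()
        return ufid

    def read(self, ufid, start, count):
        # The starting point is part of the request, so no open/close is needed.
        return bytes(self._get(ufid)[start:start + count])

    def write(self, ufid, start, data):
        # Idempotent: writing the same data at the same position twice
        # leaves the file exactly as one write would.
        content = self._get(ufid)
        if len(content) < start:
            content.extend(b"\0" * (start - len(content)))
        content[start:start + len(data)] = data

    def delete(self, ufid):
        # Repeating a delete leaves the same state: the file is gone.
        self._files.pop(ufid, None)


service = FlatFileService()
f = service.create()
for _ in range(3):                 # simulate a request retransmitted by the RPC layer
    service.write(f, 0, b"hello")
content = service.read(f, 0, 5)
g = service.create()               # a repeated Create yields a second, different file
```

With at-least-once RPC semantics the client may resend a request after a timeout; the loop above mimics that for Write and the file still contains a single copy of the data.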
Directory structure

• Hierarchical
– tree-like, pathnames from root
– (in UNIX) several names per file (link operation)
• Naming system
– implemented by client module, using directory service
– root has well-known UFID
– locate file following path from root

File names

Text name (=directory pathname+file name)
• hostname:local name
– not mobility transparent
• uniform name structure (same name space for all clients)
• remote mount (e.g. Sun NFS)
– remote directory inserted into local directory
– relies on clients maintaining consistent naming conventions across all clients
• all clients must implement same local tree
• must mount remote directory into the same local directory
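
The remote-mount scheme can be pictured as a per-client mount table that maps a local directory to a remote directory on some server. The sketch below is an illustration only: the function name, the table layout and the tuple it returns are assumptions, and the two entries model a client that mounts /export/people from one server at /usr/students and /nfs/users from another at /usr/staff.

```python
# Hypothetical mount table: local mount point -> (server, remote directory).
MOUNTS = {
    "/usr/students": ("Server 1", "/export/people"),
    "/usr/staff": ("Server 2", "/nfs/users"),
}


def translate(path, mounts=MOUNTS):
    """Return (location, path at that location) for a local pathname."""
    best = None
    for mount_point in mounts:
        # Match whole path components only, and prefer the longest mount point.
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return ("local", path)
    server, remote_root = mounts[best]
    return (server, remote_root + path[len(best):])


staff_file = translate("/usr/staff/ann")
student_file = translate("/usr/students/jon")
kernel = translate("/vmunix")
```

Because the table lives on each client, two clients see the same names only if they mount the same remote directories at the same local mount points, which is the consistency convention the slide refers to.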
Remote mount

[Figure: Remote mount. Server 1 (root) holds export, ..., with export containing people (big, jon, bob, ...). The client (root) holds ..., vmunix and usr, with usr containing students, x and staff. Server 2 (root) holds nfs, with nfs containing users (jim, ann, jane, joe). students and staff on the client are remote mount points.]

Note: The file system mounted at /usr/students in the client is actually the sub-tree located at /export/people in Server 1;
the file system mounted at /usr/staff in the client is actually the sub-tree located at /nfs/users in Server 2.

Directory service

• Directory
– conventional file (client of the flat file service)
– mapping from text names to UFIDs
• Operations
– require FileId, machine readable UFID as parameter
– locate file (LookUp)
– add/delete file (AddName/UnName)
– match file names to regular expression (GetNames)
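
The directory service operations (LookUp, AddName, UnName, GetNames) and the way a client module follows a path from the root can be sketched as follows. This is a simplified illustration: directories are kept in an in-memory dictionary rather than as files in the flat file service, and `ROOT_UFID`, `make_dir`, `resolve` and the integer UFIDs are hypothetical names and values chosen for the example.

```python
import re

ROOT_UFID = 0   # assumption: the well-known UFID of the root directory


class DirectoryService:
    """Maps text names to UFIDs, one table per directory UFID."""

    def __init__(self):
        self.dirs = {ROOT_UFID: {}}
        self.next_ufid = 1

    def make_dir(self, parent, name):
        # Helper for the example: allocate a UFID for a new, empty directory.
        ufid = self.next_ufid
        self.next_ufid += 1
        self.dirs[ufid] = {}
        self.add_name(parent, name, ufid)
        return ufid

    def add_name(self, dir_ufid, name, ufid):
        if name in self.dirs[dir_ufid]:
            raise FileExistsError(name)
        self.dirs[dir_ufid][name] = ufid

    def un_name(self, dir_ufid, name):
        if name not in self.dirs[dir_ufid]:
            raise FileNotFoundError(name)
        del self.dirs[dir_ufid][name]

    def lookup(self, dir_ufid, name):
        try:
            return self.dirs[dir_ufid][name]
        except KeyError:
            raise FileNotFoundError(name) from None

    def get_names(self, dir_ufid, pattern):
        # Names in the directory that match the regular expression in full.
        return sorted(n for n in self.dirs[dir_ufid] if re.fullmatch(pattern, n))


def resolve(service, path):
    """Client-module side: one lookup per path component, starting at the root."""
    ufid = ROOT_UFID
    for part in (p for p in path.split("/") if p):
        ufid = service.lookup(ufid, part)
    return ufid


ds = DirectoryService()
usr = ds.make_dir(ROOT_UFID, "usr")
staff = ds.make_dir(usr, "staff")
for ufid, name in enumerate(["jim", "ann", "jane", "joe"], start=100):
    ds.add_name(staff, name, ufid)

ann = resolve(ds, "/usr/staff/ann")
j_names = ds.get_names(staff, r"j.*")
```

Each component of the path costs one LookUp, so a deep pathname translates into several requests unless the client caches intermediate results.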
File sharing

Multiple clients share the same file for read/write access.
• One-copy update semantics
– every read sees the effect of all previous writes
– a write is immediately visible to clients who have the file open for reading
• Problems!
– caching: maintaining consistency between several copies difficult to achieve
– serialise access by using file locks (affects performance)
– trade-off between consistency and performance

Example: Sun NFS (1985)

• Structure of flat file & client & directory service
• NFS protocol
– RPC based, OS independent (originally UNIX)
• NFS server
– stateless (no open/close)
– no locks or concurrency control
– no replication with updates
• Virtual file system, remote mount
• Access control (user id with each request)
– security loophole (modify RPC to impersonate user…)
• Client and server caching
NFS architecture

[Figure: NFS architecture. Client computer: application programs issue UNIX system calls to the UNIX kernel, where the virtual file system routes local requests to the UNIX file system or another file system, and remote requests to the NFS client. Server computer: UNIX kernel with a virtual file system, the NFS server and the UNIX file system. NFS client and NFS server communicate using the NFS protocol over RPC (UDP or TCP).]

File identifier (FileId)

Simple Solution: FileId = Server address (IP address.socket) + Index (i-node number)
– i-node (number identifying file within file system)
– file migration requires finding and changing all FileIds
– UNIX reuses i-node numbers after file deleted (i-node gen. no)

NFS file handle

Virtual file system uses i-node if local, file handle if remote.
File handle: File system identifier | i-node no. | i-node gener. no.
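
The role of the i-node generation number in the file handle can be illustrated with a short sketch. Real NFS file handles are opaque byte strings whose layout is chosen by the server; the named tuple, the `InodeTable` class and the `StaleHandle` exception below are assumptions used only to show how a reused i-node number is told apart from the deleted file that previously owned it.

```python
from typing import NamedTuple


class FileHandle(NamedTuple):
    fs_id: int        # file system identifier
    inode: int        # i-node number within that file system
    generation: int   # i-node generation number


class StaleHandle(Exception):
    """Raised when a handle refers to a file that no longer exists (hypothetical name)."""


class InodeTable:
    """Toy server-side table; the caller picks the i-node number to force reuse."""

    def __init__(self, fs_id):
        self.fs_id = fs_id
        self.generation = {}   # inode -> generation of its latest use
        self.data = {}         # inode -> contents, for live files only

    def create(self, inode, data):
        # Each reuse of an i-node number bumps its generation.
        gen = self.generation.get(inode, 0) + 1
        self.generation[inode] = gen
        self.data[inode] = data
        return FileHandle(self.fs_id, inode, gen)

    def _check(self, handle):
        if (handle.fs_id != self.fs_id
                or handle.inode not in self.data
                or self.generation[handle.inode] != handle.generation):
            raise StaleHandle(handle)

    def read(self, handle):
        self._check(handle)
        return self.data[handle.inode]

    def delete(self, handle):
        self._check(handle)
        del self.data[handle.inode]


table = InodeTable(fs_id=1)
old = table.create(7, b"old file")
table.delete(old)
new = table.create(7, b"new file")     # same i-node number, new generation

try:
    table.read(old)
    old_result = "read succeeded"
except StaleHandle:
    old_result = "stale"
```

Without the generation field, `old` and `new` would be equal and a client holding the old handle would silently read the new file.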
Caching in NFS

• Indispensable for performance
• Caching
– retains recently used data (file pages, directories, file attributes) in cache
– updates data in cache for speed
– block size typically 8kbytes
• Server caching
– cache in server memory (UNIX kernel)
• Client caching
– cache in client memory, local disk

Server caching

• Store data in server memory
• Read-ahead: anticipate which pages to read
• Delayed write
– update in cache; write to disk periodically (UNIX sync to synchronise cache) or when space needed
– which contents seen by users depends on timing
• Write through
– cache and write to disk (reliable, poor performance)
• Write on close
– write to disk only when commit received (fast but problems with files open for a long time)
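
The trade-off between delayed write and write through can be seen in a small block cache sketch. The class below is an assumption made for illustration, with a dictionary standing in for the disk and a counter for disk writes; it is not how any particular NFS server is implemented.

```python
class BlockCache:
    """Toy server-side block cache with a selectable write policy."""

    def __init__(self, disk, write_through):
        self.disk = disk                  # block number -> bytes, stands in for the disk
        self.write_through = write_through
        self.cache = {}
        self.dirty = set()                # blocks changed in cache but not yet on disk
        self.disk_writes = 0

    def read(self, block_no):
        if block_no not in self.cache:
            self.cache[block_no] = self.disk[block_no]
        return self.cache[block_no]

    def write(self, block_no, data):
        self.cache[block_no] = data
        if self.write_through:
            self._write_disk(block_no)    # reliable, but one disk write per update
        else:
            self.dirty.add(block_no)      # delayed write: disk is updated at sync

    def sync(self):
        # Plays the role of the periodic sync that flushes dirty blocks.
        for block_no in sorted(self.dirty):
            self._write_disk(block_no)
        self.dirty.clear()

    def _write_disk(self, block_no):
        self.disk[block_no] = self.cache[block_no]
        self.disk_writes += 1


delayed = BlockCache({0: b"v0"}, write_through=False)
for version in (b"v1", b"v2", b"v3"):
    delayed.write(0, version)
on_disk_before_sync = delayed.disk[0]     # a crash here would lose the updates
delayed.sync()

through = BlockCache({0: b"v0"}, write_through=True)
for version in (b"v1", b"v2", b"v3"):
    through.write(0, version)
```

Three updates cost one disk write under delayed write and three under write through; the price of the saving is the window before `sync` in which the disk still holds the old block.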
Client caching

• Potential consistency problems!
– different versions, portions of files, check if copy still valid
• Timestamp method
– tag with latest time of validity check and modification time
– copy valid if time since last check less than freshness interval, or modification time on server the same
– choose freshness interval adaptively
• Reads
– perform validity check, if not valid, request data from server, optimisations
• Writes
– After modification, marked as dirty and flushed
• Not truly one-copy update semantics...

Summary

• File service
– crucial to the running of a distributed system
– performance, consistency and easy recovery essential
• Design issues
– separate flat file service from directory service and client module
– stateless for performance and fault-tolerance
– caching for performance
– concurrent updates difficult with caching
– approximation of one-copy update semantics
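
The timestamp method for client caching can be written as a short validity check: a cached copy is used if the time since the last check is below the freshness interval, or if the modification time recorded with the copy still matches the server's. The sketch below caches a single file and passes the current time in explicitly so that the behaviour is easy to follow; the class names, `get_mtime` and the numeric times are assumptions for illustration.

```python
from dataclasses import dataclass


class Server:
    """Toy server holding one file and its modification time."""

    def __init__(self, data, mtime):
        self.data, self.mtime = data, mtime

    def update(self, data, mtime):
        self.data, self.mtime = data, mtime

    def get_mtime(self):
        return self.mtime

    def read(self):
        return self.data, self.mtime


@dataclass
class CacheEntry:
    data: bytes
    t_checked: float    # time of the latest validity check
    t_modified: float   # modification time of the copy, as reported by the server


class ClientCache:
    def __init__(self, server, freshness):
        self.server = server
        self.freshness = freshness      # freshness interval, fixed in this sketch
        self.entry = None

    def read(self, now):
        e = self.entry
        if e is not None:
            if now - e.t_checked < self.freshness:
                return e.data           # assumed fresh, no server contact
            if self.server.get_mtime() == e.t_modified:
                e.t_checked = now       # still valid, restart the interval
                return e.data
        data, mtime = self.server.read()   # missing or invalid: fetch from server
        self.entry = CacheEntry(data, now, mtime)
        return data


server = Server(b"v1", mtime=10.0)
cache = ClientCache(server, freshness=3.0)

r1 = cache.read(now=100.0)        # first read fetches from the server
server.update(b"v2", mtime=101.0)
r2 = cache.read(now=101.0)        # within the freshness interval: stale data returned
r3 = cache.read(now=104.0)        # interval expired, mtime differs: refetched
```

The read at time 101 returns the old contents even though the server has already been updated, which is the sense in which this scheme only approximates one-copy update semantics.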
