
UNIT-3 DISTRIBUTED FILE SYSTEM DESIGN

A distributed file system has two reasonably distinct components: the true file service and the directory service. The former is concerned with operations on individual files, such as reading, writing and appending, whereas the latter is concerned with creating and managing directories, adding and deleting files from directories, and so on.

THE FILE SERVICE INTERFACE

A file can be structured as a sequence of records, with operating system calls to read or write a particular record. A record can usually be specified by either its record number or the value of some field. A file can also have attributes; the file service provides primitives to read and write them. File services can be split into two types depending on whether they support an upload/download model or a remote access model. In the upload/download model the file service provides only two major operations: read file and write file.
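The difference is easiest to see in the shape of the client interface each model implies. The following is a minimal sketch in C; the function names and signatures are illustrative assumptions, not taken from any particular system.

#include <stddef.h>

/* Upload/download model: the service moves whole files only. */
int download_file(const char *remote_name, const char *local_name);  /* read file  */
int upload_file(const char *local_name, const char *remote_name);    /* write file */

/* Remote access model: every operation is carried out at the server. */
int  remote_open(const char *name, int mode);
long remote_read(int handle, void *buf, size_t nbytes, long offset);
long remote_write(int handle, const void *buf, size_t nbytes, long offset);
int  remote_close(int handle);

In the first model all the work happens on the client after a whole-file transfer; in the second, the file stays on the server and only the requested bytes cross the network.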

THE DIRECTORY SERVICE INTERFACE

It provides operations for creating and deleting directories, naming and renaming files, and moving them from one directory to another.

Some systems divide file names into two parts: a primary part and a secondary part. The primary part is the file name proper and the secondary part is the extension. All distributed file systems allow subdirectories, making it possible for users to group related files together.

THE DIRECTORY SERVICE INTERFACE

Subdirectories can contain their own subdirectories, and so on, leading to a tree of directories, often called a hierarchical file system.

In a transparent distributed file system, all clients have the same view of the file system.

Naming transparency

The first form is location transparency, which means that the path name gives no hint as to where the file is located. A path like /server1/dir1/dir2/x tells everyone that x is located on server1, but it does not tell where that server is located. The server is free to move anywhere it wants to in the network without the path name having to be changed; thus this file system has location transparency. However, the system might well like to move x to server2 automatically, which would change the path name. A distributed system that embeds machine or server names in path names is clearly not location independent.

Naming transparency

A system based on remote mounting is not location independent either, since it is not possible to move a file from one file group to another and still be able to use the old path name. There are three common approaches to file and directory naming in a distributed system:
1. Machine + path naming, such as /machine/path or machine:path.
2. Mounting remote file systems onto the local file hierarchy.
3. A single name space that looks the same on all machines.

Semantics of file sharing

When two or more users share the same file, it is necessary to define the semantics of reading and writing precisely to avoid problems. In single-processor systems that permit processes to share files, such as UNIX, the semantics normally state that when a READ operation follows a WRITE operation, the READ returns the value just written. Similarly, when two WRITEs happen in quick succession, followed by a READ, the value read is the value stored by the last write. The system enforces an absolute time ordering on all operations and always returns the most recent value.
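A minimal single-machine illustration of these semantics, using ordinary UNIX calls (the file name is chosen arbitrarily):

#include <fcntl.h>
#include <unistd.h>

int main(void) {
    char c;
    int fd = open("/tmp/shared", O_RDWR | O_CREAT, 0644);
    write(fd, "A", 1);          /* the most recent write stores 'A' */
    lseek(fd, 0, SEEK_SET);
    read(fd, &c, 1);            /* UNIX semantics: c is guaranteed to be 'A' */
    close(fd);
    return 0;
}

In a distributed system with client-side caching, a READ issued on another machine may not see the new value immediately, which is exactly why the sharing semantics must be redefined there.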


SYSTEM STRUCTURE

Systems exist in which clients and servers are fundamentally different machines, in terms of either hardware or software. The servers may even run a different version of the operating system from the clients.

SYSTEM STRUCTURE

There are two types of servers:
1. Stateless servers
2. Stateful servers
A stateless server means that when a client sends a request, the server carries out the request, sends the reply, and then removes from its internal tables all information about the request; no client information is kept between requests. A stateful server, by contrast, maintains state information about its clients between requests.
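The difference shows up directly in what a read request must carry. The following is a hedged sketch; the message layouts are invented for illustration, not taken from any real protocol.

#include <stddef.h>

/* Stateless server: each request is self-contained, so the client must
   identify the file and the offset every time. */
struct stateless_read_req {
    char   file_name[256];   /* full file identification in every request */
    long   offset;           /* where to read from */
    size_t nbytes;           /* how much to read */
};

/* Stateful server: the server remembers the open file and its position,
   so a short handle from an earlier open request is enough. */
struct stateful_read_req {
    int    handle;           /* index into the server's open-file tables */
    size_t nbytes;           /* server supplies the offset itself */
};

The stateless request is longer, but it can be handled by any server at any time, even after a crash and reboot; the stateful one is shorter, but it is meaningless if the server loses its tables.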

DISTRIBUTED SHARED MEMORY

Multiple-CPU computer systems fall into two types: those that have shared memory and those that do not. Distributed shared memory (DSM) is a technique for making multicomputers easier to program by simulating shared memory on them.

On-chip memory: self-contained chips containing a CPU and all of memory also exist. Such chips are produced by the millions and are widely used in cars, toys, and so on. In this design, the CPU portion of the chip has address and data lines that connect directly to the memory portion.

DISTRIBUTED SHARED MEMORY

An extension of this design is to have multiple CPUs directly sharing the same memory.

DISTRIBUTED SHARED MEMORY

(a) A shared-memory multiprocessor. (b) A message-passing multicomputer. (c) A wide-area distributed system.

UMA (Uniform Memory Access) Multiprocessors with Bus-Based Architectures

Three bus-based multiprocessors. (a) Without caching. (b) With caching. (c) With caching and private memories.

Bus-Based Multiprocessors

(a) A multiprocessor: several CPUs and a memory connected by a common bus. (b) A multiprocessor with caching: each CPU has a private cache between it and the bus.

Ring-Based Multiprocessors: Memnet

Protocol

Read
When the CPU wants to read a word from shared memory, the memory address to be read is passed to the Memnet device, which checks the block table to see if the block is present. If so, the request is satisfied immediately. If not, the Memnet device waits until it captures the circulating token and then puts a request packet onto the ring. As the packet passes around the ring, each Memnet device along the way checks whether it has the block needed.

If one does, it puts the block in the dummy field and modifies the packet header to inhibit subsequent machines from doing so. If the requesting machine has no free space in its cache to hold the incoming block, it picks a cached block at random and sends it home to make space. Blocks whose Home bit is set are never chosen, because they are already home.

Write

If the block containing the word to be written is present and is the only copy in the system (i.e., the Exclusive bit is set), the word is just written locally. If the needed block is present but it is not the only copy, an invalidation packet is first sent around the ring to force all other machines to discard their copies of the block about to be written. When the invalidation packet arrives back at the sender, the Exclusive bit is set for that block and the write proceeds locally.

If the block is not present, a packet is sent out that combines a read request and an invalidation request. The first machine that has the block copies it into the packet and discards its own copy. All subsequent machines just discard the block from their caches. When the packet comes back to the sender, it is stored there and written.
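Putting the write cases together, the decision logic at the Memnet device might look roughly like the sketch below. The block-table fields follow the text, but the helper functions are invented for illustration.

/* Helpers assumed to exist in the Memnet device; declarations only. */
void write_local(int block, int word, int value);
void send_invalidation(int block);
void send_read_and_invalidate(int block);
void wait_for_packet_return(int block);

typedef struct {
    int valid;        /* block is present on this machine */
    int exclusive;    /* Exclusive bit: this is the only copy in the system */
    int home;         /* Home bit: this machine is the block's home */
} block_entry;

void memnet_write(block_entry *tbl, int block, int word, int value) {
    if (tbl[block].valid && tbl[block].exclusive) {
        write_local(block, word, value);      /* only copy: just write locally */
    } else if (tbl[block].valid) {
        send_invalidation(block);             /* force others to discard copies */
        wait_for_packet_return(block);        /* invalidation came back around */
        tbl[block].exclusive = 1;
        write_local(block, word, value);
    } else {
        send_read_and_invalidate(block);      /* combined read + invalidate */
        wait_for_packet_return(block);        /* block arrives in the packet */
        tbl[block].valid = 1;
        tbl[block].exclusive = 1;
        write_local(block, word, value);
    }
}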

Switched Multiprocessors
Bus-based and ring-based multiprocessors work fine for small systems (up to around 64 CPUs), but they do not scale well to systems with hundreds or thousands of CPUs. Two approaches can be taken to attack the problem of not enough bandwidth:
1. Reduce the amount of communication (e.g., caching).
2. Increase the communication capacity (e.g., changing the topology).

Switched Multiprocessors
One method is to build the system as a hierarchy. Continue to put some number of CPUs on a single bus, but treat each bus with its CPUs as a cluster, and connect the clusters using an intercluster bus. As long as most CPUs communicate primarily within their own cluster, there will be relatively little intercluster traffic. If still more bandwidth is needed, collect a bus, tree, or grid of clusters together into a supercluster, and break the system into multiple superclusters.


Switched Multiprocessors
The best-known machine of this type is DASH (Directory Architecture for Shared memory). Each cluster has a directory that keeps track of which clusters currently have copies of its blocks. Since each cluster owns 1M memory blocks, it has 1M entries in its directory, one per block. Each cluster in DASH is connected to an interface that allows the cluster to communicate with other clusters. The interfaces are connected by intercluster links in a rectangular grid.

Switched Multiprocessors
Each cache block can be in one of the following three states:
1. UNCACHED — the only copy of the block is in this (home) cluster's memory.
2. CLEAN — the home memory is up to date; the block may be cached in several clusters.
3. DIRTY — the home memory is out of date; exactly one cache holds the block.
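A minimal sketch of the per-block bookkeeping a DASH-like home cluster might keep; the layout and names are assumptions for illustration, with the set of copy holders represented as a bitmap.

#include <stdint.h>

typedef enum {        /* the three block states described above */
    UNCACHED,         /* only copy is in the home cluster's memory */
    CLEAN,            /* memory up to date; copies may exist in several caches */
    DIRTY             /* memory stale; exactly one cache holds the block */
} block_state;

typedef struct {
    block_state state;
    uint32_t    copies;   /* bitmap: which clusters currently hold a copy */
} dir_entry;

/* On a write, every cluster whose bit is set (other than the writer's own)
   must be sent an invalidation before the write may proceed. */
uint32_t clusters_to_invalidate(const dir_entry *e, int writer_cluster) {
    return e->copies & ~(UINT32_C(1) << writer_cluster);
}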

Consistency Models
Strict Consistency

Any read to a memory location x returns the value stored by the most recent write operation to x.

P1: W(x)1
P2:        R(x)1

Strictly consistent.

P1: W(x)1
P2:        R(x)0  R(x)1

Not strictly consistent.

Sequential Consistency

The result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.

Both of the following are correct under sequential consistency:

P1: W(x)1
P2:        R(x)0  R(x)1

P1: W(x)1
P2:        R(x)1  R(x)1

All processes see all shared accesses in the same order.
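For a modern point of reference, C11's default atomic ordering, memory_order_seq_cst, provides exactly this model for atomic operations: all threads observe one common interleaving. A small sketch:

#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

atomic_int x = 0, y = 0;

int writer(void *arg) {
    (void)arg;
    atomic_store(&x, 1);    /* seq_cst store */
    atomic_store(&y, 1);    /* seq_cst store, after x in the global order */
    return 0;
}

int reader(void *arg) {
    (void)arg;
    /* Under sequential consistency, seeing y == 1 implies x == 1, because
       every thread observes the same global order of stores. */
    if (atomic_load(&y) == 1 && atomic_load(&x) == 0)
        printf("impossible under sequential consistency\n");
    return 0;
}

int main(void) {
    thrd_t t1, t2;
    thrd_create(&t1, writer, NULL);
    thrd_create(&t2, reader, NULL);
    thrd_join(t1, NULL);
    thrd_join(t2, NULL);
    return 0;
}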

Causal Consistency

Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.
P1: W(x)1                       W(x)3
P2:        R(x)1  W(x)2
P3:        R(x)1                       R(x)3  R(x)2
P4:        R(x)1         R(x)2                R(x)3

This sequence is allowed with causally consistent memory, but not with sequentially consistent memory or strictly consistent memory.

P1: W(x)1
P2:        R(x)1  W(x)2
P3:                      R(x)2  R(x)1
P4:                      R(x)1  R(x)2

A violation of causal memory (incorrect).

P1: W(x)1
P2:        W(x)2
P3:               R(x)2  R(x)1
P4:               R(x)1  R(x)2

A correct sequence of events in causal memory. All processes see all causally-related shared accesses in the same order.

In the first sequence, W(x)2 potentially depends on W(x)1, because the 2 may be the result of a computation involving the value read by R(x)1. The two writes are causally related, so all processes must see them in the same order; the sequence is therefore incorrect. In the second sequence the read has been removed, so W(x)1 and W(x)2 are now concurrent writes, and the sequence is correct.

PRAM Consistency (Pipelined RAM)

Writes done by a single process are received by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.
P1: W(x)1
P2:        R(x)1  W(x)2
P3:                      R(x)1  R(x)2
P4:                      R(x)2  R(x)1

A valid sequence of events for PRAM consistency but not for the above stronger models.

Processor consistency

Processor consistency is PRAM consistency plus memory coherence. That is, for every memory location x, there must be global agreement about the order of writes to x. Writes to different locations need not be seen in the same order by different processes.

WEAK Consistency
PRAM consistency and processor consistency can give better performance than the stronger models. A better solution still is to introduce a synchronization variable and use it to synchronize memory: when a synchronization completes, all writes done on that machine are propagated outward and all writes done on other machines are brought in.

WEAK Consistency
In other words, when a synchronization is done, all of shared memory is synchronized. Weak consistency has three properties:
1. Accesses to synchronization variables are sequentially consistent.
2. No access to a synchronization variable is allowed to be performed until all previous writes have completed everywhere.
3. No data access (read or write) is allowed to be performed until all previous accesses to synchronization variables have been performed.

WEAK Consistency
(a) Process P1 does two writes to an ordinary variable and then synchronizes (indicated by the letter S). If P2 and P3 have not yet synchronized, no guarantees are given about what they see, so this sequence of events is valid. (b) Process P2 has synchronized, which means its memory is brought up to date, so it may not read the stale value; that sequence of events is invalid.
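The synchronization variable behaves somewhat like an atomic flag in modern shared-memory programming. The following C11 sketch is only an analogy for property 2 of the model (all earlier writes complete before the synchronization becomes visible), not a DSM implementation:

#include <stdatomic.h>

int data1, data2;                 /* ordinary shared variables */
atomic_int sync_var;              /* plays the role of S */

void writer_proc(void) {
    data1 = 1;                    /* W(x)1 */
    data2 = 2;                    /* W(x)2 */
    atomic_store(&sync_var, 1);   /* S: all earlier writes are pushed out */
}

void reader_proc(void) {
    if (atomic_load(&sync_var) == 1) {   /* S on the reader's side */
        /* Having synchronized, the reader is guaranteed to see both writes. */
        int v1 = data1, v2 = data2;
        (void)v1; (void)v2;
    }
}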


Release Consistency

Release consistency provides acquire and release accesses. Acquire accesses are used to tell the memory system that a critical region is about to be entered. Release accesses say that a critical region has just been exited.
P1: Acq(L)  W(x)1  W(x)2  Rel(L)
P2:                               Acq(L)  R(x)2  Rel(L)
P3:                               R(x)1

A valid event sequence for release consistency. P3 does not do an acquire, so the result it sees is not guaranteed.
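Ordinary lock-based code follows the same acquire/release pattern. Here is a sketch of the figure's three processes using POSIX threads, as an analogy for the model rather than a DSM implementation:

#include <pthread.h>

int x;                                    /* ordinary shared variable */
pthread_mutex_t L = PTHREAD_MUTEX_INITIALIZER;

void p1(void) {
    pthread_mutex_lock(&L);     /* Acq(L): entering the critical region */
    x = 1;                      /* W(x)1 */
    x = 2;                      /* W(x)2 */
    pthread_mutex_unlock(&L);   /* Rel(L): writes pushed out before release */
}

void p2(void) {
    pthread_mutex_lock(&L);     /* Acq(L): pulls in p1's released writes */
    int v = x;                  /* R(x)2: guaranteed once the lock is held */
    (void)v;
    pthread_mutex_unlock(&L);
}

void p3(void) {
    int v = x;                  /* no acquire, so the value seen is not guaranteed */
    (void)v;
}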

COMPARISON OF CONSISTENCY MODELS

Strict, sequential, causal, PRAM, and processor consistency do not use synchronization operations; they differ in how restrictive they are, with strict consistency the most restrictive and processor consistency the least. Weak and release consistency use synchronization operations and shift part of the burden to the programmer.

NUMA Multiprocessors

Like a traditional UMA (Uniform Memory Access) multiprocessor, a NUMA machine has a single virtual address space that is visible to all CPUs. When any CPU writes a value to location a, a subsequent read of a by a different processor will return the value just written. The difference between UMA and NUMA machines lies in the fact that on a NUMA machine, access to a remote memory is much slower than access to a local memory.

NUMA Multiprocessors
In this design, each CPU is coupled directly to one memory; each of the small squares in the figure represents a CPU+memory pair, and the CPUs on the right-hand side of the figure are the same as those on the left. The CPUs are wired up via eight switches, each having four input ports and four output ports. Local memory requests are handled directly; remote requests are turned into request packets and sent to the appropriate memory via the switching network.

PROPERTIES OF NUMA Multiprocessors


NUMA machines have three key properties that are of concern to us:
1. Access to remote memory is possible.
2. Access to remote memory is slower than access to local memory.
3. Remote access times are not hidden by caching.

NUMA Algorithms
To allow mistakes to be corrected and to allow the system to adapt to changes in reference patterns, NUMA systems usually have a daemon process, called the page scanner, running in the background. Periodically (typically every 4 seconds), the page scanner gathers usage statistics about local and remote references, which are maintained with help from the hardware. Every n times it runs, the page scanner reevaluates its earlier decisions to copy pages or map them to remote memories.

NUMA Algorithms
If the usage statistics indicate that a page is in the wrong place, the page scanner unmaps the page so that the next reference causes a page fault, allowing a new placement decision to be made. If a page is moved too often within a short interval, the page scanner can mark the page as frozen, which inhibits further movement until some specified event happens (e.g., some number of seconds has elapsed).
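A rough sketch of the scanner's main loop under the stated policy; the helper functions and the page count are assumptions, standing in for the hardware-maintained statistics:

#include <unistd.h>

#define SCAN_INTERVAL 4   /* seconds between scans, as in the text */

/* Hypothetical helpers; a real scanner would consult hardware usage counters. */
extern int  page_in_wrong_place(int page);   /* usage statistics say move it */
extern int  moved_too_often(int page);       /* ping-ponging within a short interval */
extern void unmap_page(int page);            /* next reference will page-fault */
extern void freeze_page(int page);           /* inhibit movement for a while */
extern int  npages;

void page_scanner(void) {
    for (;;) {
        sleep(SCAN_INTERVAL);
        for (int page = 0; page < npages; page++) {
            if (moved_too_often(page))
                freeze_page(page);           /* stop the thrashing */
            else if (page_in_wrong_place(page))
                unmap_page(page);            /* force a new placement decision */
        }
    }
}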
