Shark: Scaling File Servers via Cooperative Caching

Siddhartha Annapureddy, Michael J. Freedman, David Mazières
Presented by: Syed Jibranuddin

Motivation

• Replicating an execution environment on each machine before launching a distributed application wastes resources and complicates debugging.
• Multiple clients copying the same files: manually replicating data and serving separate copies to many clients leads to inconsistencies.

Introduction
• Shark is a distributed file system designed for large-scale, wide-area deployment.
• It is scalable and provides a novel cooperative-caching mechanism that reduces the load on the origin file server.

Challenges
Scalability:

• What if a large program (tens of MBs) is executed from a file server by many clients at once? Because of limited server bandwidth, the server may deliver unacceptable performance.
• Shark's model is similar to P2P file systems: clients fetch data from nearby clients instead of relying on the server alone.

How Shark Works?

• Shark clients find nearby copies of data by using a distributed index.
• A client avoids transferring a file (or chunks of it) from the server if the same data can be fetched from a nearby client.
• Shark is compatible with existing backup and restore procedures.
• For shared read workloads, Shark greatly reduces server load.

Design Protocol
1. Files are divided into chunks using the Rabin fingerprint algorithm (a chunking sketch follows this list).
2. Shark interacts with the local host via the existing NFSv3 protocol and runs in user space.
3. When the first client reads a particular file, it fetches the file from the server and registers itself as a replica proxy for the file's chunks in the distributed index.
4. When a second client wants to access the same file, it:
   • discovers proxies for the file's chunks by querying the distributed index;
   • establishes secure channels to the proxy(s) and downloads the chunks in parallel.
5. After fetching, the client registers itself as a replica proxy for these chunks.
6. The distributed index exposes two APIs: put and get.

Secure data sharing:
1. Data is encrypted by the sender and can be decrypted only by a client with the appropriate read permissions.
2. Clients cannot download large amounts of data without proper read authorization.
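To make the chunking step concrete, here is a minimal sketch of content-defined chunking in the style of a Rabin fingerprint. This is an illustration only: a simple rolling polynomial hash stands in for a true Rabin fingerprint, and the window size, break mask, and chunk-size bounds are assumed values, not Shark's actual parameters.

# Content-defined chunking sketch (toy rolling hash, assumed parameters).
WINDOW = 48            # bytes covered by the rolling hash (assumption)
BREAK_MASK = 0x1FFF    # break when low 13 bits match -> ~8 KB average chunks
MIN_CHUNK = 2 * 1024   # avoid pathologically small chunks
MAX_CHUNK = 64 * 1024  # force a break eventually
PRIME = 1048583        # modulus for the toy rolling hash

def chunk_boundaries(data: bytes):
    """Yield (offset, size) pairs for content-defined chunks of data."""
    pow_w = pow(256, WINDOW, PRIME)   # weight of the byte leaving the window
    start, fp = 0, 0
    for i, b in enumerate(data):
        fp = (fp * 256 + b) % PRIME                       # byte enters window
        if i >= WINDOW:
            fp = (fp - data[i - WINDOW] * pow_w) % PRIME  # byte leaves window
        size = i + 1 - start
        if ((fp & BREAK_MASK) == BREAK_MASK and size >= MIN_CHUNK) \
                or size >= MAX_CHUNK:
            yield start, size                             # boundary found
            start = i + 1
    if start < len(data):
        yield start, len(data) - start                    # trailing chunk

Because boundaries depend only on local content, an insertion in the middle of a file shifts only nearby boundaries, so most chunks (and hence their tokens) stay identical across file versions and across clients.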

Secure Data Sharing
• Cross-file-system sharing: Shark uses a token generated by the file server as a shared secret between client and proxy.
• The client can verify the integrity of the received data.
• For a sender (proxy) client to authorize a requesting client, the requester must demonstrate knowledge of the token.
• Once authorized, the receiving client has established its read permission for the file (see the token sketch below).
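The token mechanism can be sketched as follows, using the slides' notation T_F = HMAC_r(F). The challenge-response framing and all function names here are illustrative assumptions; the paper's exact keying and wire protocol may differ.

# Token-based authorization sketch (notation T_F = HMAC_r(F) from the slides).
import hashlib, hmac, os

def derive_token(r: bytes, content: bytes) -> bytes:
    """Token for a file or chunk: an HMAC over the content, keyed with r."""
    return hmac.new(r, content, hashlib.sha1).digest()

def respond(token: bytes, challenge: bytes) -> bytes:
    """Prove knowledge of the token without sending it in the clear."""
    return hmac.new(token, challenge, hashlib.sha1).digest()

# Proxy side: only a requester that already holds the token (i.e., one the
# file server authorized) can answer a fresh challenge correctly.
token = derive_token(b"r-value-from-server", b"...file contents...")
challenge = os.urandom(16)
assert hmac.compare_digest(respond(token, challenge),
                           respond(token, challenge))
# The token can also key the encryption of the transferred data, so it is
# decryptable only by clients with read permission, as the slide states.

Assuming the client also learns r from the server along with the token, it can recompute the HMAC over the received data and compare it against the token, which is how the integrity check on the previous slide can work.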

File Consistency
Shark uses two standard network file system techniques:
• Leases
• AFS-style whole-file caching
• A client first checks its own cache; if the file is not cached or is stale, it fetches it from the file server or from replica proxies.
• When a client makes a read RPC to the file server, it receives a read lease on that file.
• In Shark, the default lease duration is 5 minutes (see the sketch below).
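Below is a minimal sketch of this client-side check. The cache structure and helper functions (fetch_attrs, fetch_data) are illustrative assumptions, not Shark's actual interfaces; only the 5-minute default lease comes from the slide.

# Lease + whole-file caching sketch (illustrative names).
import time

LEASE_SECS = 5 * 60   # Shark's default read-lease duration

class CacheEntry:
    def __init__(self, token, data):
        self.token = token                       # whole-file token T_F
        self.data = data
        self.lease_expiry = time.time() + LEASE_SECS

def read_file(fh, cache, fetch_attrs, fetch_data):
    """Serve reads from cache while the lease holds; otherwise revalidate."""
    entry = cache.get(fh)
    if entry and time.time() < entry.lease_expiry:
        return entry.data                        # lease valid: no server RPC
    token = fetch_attrs(fh)                      # read RPC grants a new lease
    if entry and entry.token == token:
        entry.lease_expiry = time.time() + LEASE_SECS
        return entry.data                        # cached whole file still fresh
    data = fetch_data(fh)                        # fetch from proxies or server
    cache[fh] = CacheEntry(token, data)
    return data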

Cooperative Caching
• When a file is not cached, or its lease has expired, the client (C1) sends a GETTOK(fh, offset, count) RPC to the server.
• The server issues a series of read calls to its kernel NFS server, computes the file token T_F = tok(F) = HMAC_r(F), splits the data into chunks, computes a token for each chunk, and caches the tokens for future reference.
• The GETTOK reply carries: (1) the file attributes, (2) the file token, and (3) a list of chunk descriptors (chunktok1, offset1, size1), (chunktok2, offset2, size2), (chunktok3, offset3, size3), ...
• Using the file token, the client determines whether its local cache is up to date.
• If the local cache is not the latest, the client creates K threads and fetches K chunks in parallel (e.g., threads t1, t2, t3 requesting chunks F1, F2, F3 from other clients C2, C3, C4).
• After fetching a chunk Fi, the client calls put() to announce itself as a proxy for Fi (a sketch of the parallel fetch follows).

Distributed Indexing
• Shark uses a global distributed index shared by all Shark clients.
• The system maps opaque keys onto nodes by hashing each value to a key ID.
• Assigning IDs to nodes allows lookups in O(log n) hops.
• Shark stores only small records describing which clients store which data.
• Shark uses Coral as its distributed index.
• Coral provides a distributed sloppy hash table (DSHT); a toy sketch of its put/get interface follows.
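To illustrate how clients use the index, here is a toy stand-in for the put/get interface. A real DSHT spreads these records across nodes with O(log n) routing and favors nearby proxies; this single-process version (all names illustrative) only shows the shape of the interface.

# Toy stand-in for a Coral-style put/get index (illustrative, single-process).
import hashlib

class ToySloppyIndex:
    def __init__(self):
        self.table = {}

    @staticmethod
    def key_id(chunk_token: bytes) -> bytes:
        # Opaque keys are hashed onto the ID space; Coral routes to the
        # responsible nodes in O(log n) hops.
        return hashlib.sha1(chunk_token).digest()

    def put(self, chunk_token: bytes, proxy_addr: str) -> None:
        """Announce that proxy_addr caches the chunk named by chunk_token."""
        self.table.setdefault(self.key_id(chunk_token), set()).add(proxy_addr)

    def get(self, chunk_token: bytes, limit: int = 4) -> list:
        """Return up to `limit` proxies; 'sloppy' means any subset will do."""
        return list(self.table.get(self.key_id(chunk_token), set()))[:limit]

index = ToySloppyIndex()
index.put(b"chunk-token-1", "192.0.2.5:4000")
print(index.get(b"chunk-token-1"))    # -> ['192.0.2.5:4000']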

Implementation
Shark consists of three main components:
• Server-side daemon
• Client-side daemon
• Coral daemon
All three are implemented in C++.
The client-side daemon is the biggest component of Shark; it handles user requests.

Evaluation
• Shark is evaluated against NFSv3 and SFS.
• Read tests are performed both within the controlled Emulab LAN environment and in the wide area on the PlanetLab v3.0 test-bed.
• For the local-area micro-benchmarks, local machines at NYU are used as Shark clients.
• In the chunking micro-benchmark, the server required 0.9 seconds to compute chunks for a 10 MB random file and 3.6 seconds for a 40 MB random file.

Conclusion
• Current networked file systems like NFS are not scalable, forcing users to resort to inconvenient tools like rsync.
• Shark offers a file system that scales to hundreds of clients.
• It provides a locality-aware cooperative cache.
• It supports cross-file-system sharing, enabling novel applications.
Questions
