COMMUNICATIONS OF THE ACM
February 2003/Vol. 46, No. 2
Looking Up Data in P2P Systems

By Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica

Designing and implementing a robust distributed system composed of inexpensive computers in unrelated administrative domains.

The main challenge in P2P computing is to design and implement a robust and scalable distributed system composed of inexpensive, individually unreliable computers in unrelated administrative domains. A typical P2P system might include computers at homes, schools, and businesses, and can grow to several million concurrent participants.

P2P systems are attractive for several reasons:

• The barriers to starting and growing such systems are low, since they usually don’t require any special administrative or financial arrangements, unlike centralized facilities;
• P2P systems offer a way to aggregate and make use of the tremendous computation and storage resources on computers across the Internet; and
• The decentralized and distributed nature of P2P systems gives them the potential to be robust to faults or intentional attacks, making them ideal for long-term storage as well as for lengthy computations.

P2P computing raises many interesting research problems in distributed systems. In this article we will look at one of them, the lookup problem. How do you find any given data item in a large P2P system in a scalable manner, without any centralized servers or hierarchy? This problem is at the heart of any P2P system. It is not addressed well by most popular systems currently in use, and it provides a good example of how the challenges of designing P2P systems can be addressed.

The recent algorithms developed by several research groups for the lookup problem present a simple and general interface, a distributed hash table (DHT). Data items are inserted in a DHT and found by specifying a unique key for that data. To implement a DHT, the underlying algorithm must be able to determine which node is responsible for storing the data associated with any given key. To solve this problem, each node maintains information (the IP address) about a small number of other nodes (“neighbors”) in the system, forming an overlay network and routing messages in the overlay to store and retrieve keys.

One might believe from recent news items that P2P systems are mainly used for illegal music-swapping and little else, but this would be a rather hasty conclusion. The DHT abstraction appears to provide a general-purpose interface for location-independent naming upon which a variety of applications can be built. Furthermore, distributed applications that make use of such an infrastructure inherit robustness, ease-of-operation, and scaling properties. A significant amount of research effort is now being devoted to investigating these ideas (Project IRIS, a multi-institution, large-scale effort, is one example; see www.project-iris.net).
The Lookup Problem
The lookup problem is simple to state: Given a data item X stored at some dynamic set of nodes in the system, find it. This problem is an important one in many distributed systems, and is the critical common problem in P2P systems.

One approach is to maintain a central database that maps a file name to the locations of servers that store the file. Napster (www.napster.com) adopted this approach for song titles, but this approach has inherent scalability and resilience problems: the database is a central point of failure.

The traditional approach to achieving scalability is to use hierarchy. The Internet’s Domain Name System (DNS) does this for name lookups. Searches start at the top of the hierarchy and, by following forwarding references from node to node, traverse a single path down to the node containing the desired data. The disadvantage of this approach is that failure or removal of the root or a node sufficiently high in the hierarchy can be catastrophic, and the nodes higher in the tree take a larger fraction of the load than the leaf nodes.

These approaches are all examples of structured lookups, where each node has a well-defined set of information about other nodes in the system. The advantage of structured lookup methods is that one can usually make guarantees that data can be reliably found in the system once it is stored.

To overcome the resilience problems of these schemes, some P2P systems developed the notion of symmetric lookup algorithms. Unlike the hierarchy, no node is more important than any other node as far as the lookup process is concerned, and each node is typically involved in only a small fraction of the search paths in the system. These schemes allow the nodes to self-organize into an efficient overlay structure.

At one end of the symmetric lookup spectrum, the consumer broadcasts a message to all its neighbors with a request for X. When a node receives such a request, it checks its local database. If it contains X, it responds with the item. Otherwise, it forwards the request to its neighbors, which execute the same protocol. Gnutella (gnutella.wego.com) has a protocol in this style with some mechanisms to avoid request loops. However, this “broadcast” approach doesn’t scale well because of the bandwidth consumed by broadcast messages and the compute cycles consumed by the many nodes that must handle these messages. In fact, the day after Napster was shut down, reports indicate the Gnutella network collapsed under the load created by a large number of users who migrated to it for sharing music.

One approach to handling such scaling problems is to add “superpeers” in a hierarchical structure, as is done in FastTrack’s P2P platform (www.fasttrack.nu) and popularized by applications like KaZaA (www.kazaa.com). However, this comes at the expense of resilience to failures of superpeers near the top of the hierarchy. Furthermore, this approach does not provide guarantees on object retrieval.

Freenet [1] uses an innovative symmetric lookup strategy. Here, queries are forwarded from node to node until the desired object is found, based on unstructured routing tables dynamically built up using caching. But a key Freenet objective, anonymity, creates some challenges for the system. To provide anonymity, Freenet avoids associating a document with any predictable server, or forming a predictable topology among servers. As a result, unpopular documents may simply disappear from the system, since no server has the responsibility for maintaining replicas. Furthermore, a search may often need to visit a large fraction of nodes in the system, and no guarantees are possible.

The recent crop of P2P algorithms, including CAN [8], Chord [11], Kademlia [6], Pastry [9], Tapestry [2], and Viceroy [5], are both structured and symmetric, unlike all the other systems mentioned here. This allows them to offer guarantees while simultaneously not being vulnerable to individual node failures. They all implement the DHT abstraction.

The rest of this article discusses these recent algorithms, highlighting design points and trade-offs. These algorithms incorporate techniques to scale well to large numbers of nodes, to locate keys with low latency, to handle dynamic node arrivals and departures, to ease the maintenance of per-node routing tables, and to balance the distribution of keys evenly among the participating nodes.
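To make the cost of the broadcast-style lookup described in this section concrete, here is a minimal Python sketch of flooding over a small overlay. It is not Gnutella's actual protocol: the function name `flood_lookup`, the hop limit `ttl`, and the six-node ring are all assumptions made for illustration, and the "seen" set stands in for whatever loop-avoidance mechanism a real system uses.

```python
from collections import deque

def flood_lookup(neighbors, stores, start, item, ttl):
    """Flood a request for `item` outward from `start`.

    neighbors: dict mapping node -> list of neighbor nodes (the overlay)
    stores:    dict mapping node -> set of items held locally
    ttl:       maximum number of hops a request may travel (an assumption)
    Returns (node holding the item or None, number of messages sent).
    """
    seen = {start}                    # nodes that already handled the request
    frontier = deque([(start, ttl)])  # breadth-first order models hop-by-hop spread
    messages = 0
    while frontier:
        node, hops_left = frontier.popleft()
        if item in stores.get(node, set()):
            return node, messages     # the node checks its local database first
        if hops_left == 0:
            continue                  # request expires here
        for peer in neighbors[node]:
            messages += 1             # every forwarded copy costs bandwidth
            if peer not in seen:      # duplicates are dropped on receipt
                seen.add(peer)
                frontier.append((peer, hops_left - 1))
    return None, messages

# Hypothetical overlay: six nodes in a ring, item "X" stored only at node 3.
ring = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}
holdings = {3: {"X"}}
found = flood_lookup(ring, holdings, start=0, item="X", ttl=3)
missed = flood_lookup(ring, holdings, start=0, item="X", ttl=2)
```

Even in this tiny overlay, the successful search sends ten messages to reach a node three hops away, and the search with a smaller hop limit fails although the item exists, which illustrates both the bandwidth cost and the lack of guarantees.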
A Distributed Hash Table

A hash-table interface is an attractive foundation for a distributed lookup algorithm because it places few constraints on the structure of keys or the values they name. The main requirements are that data be identified using unique numeric keys, and that nodes be willing to store keys for each other. The values could be actual data items (file blocks), or could be pointers to where the data items are currently stored.

A DHT implements just one operation: lookup(key) yields the network location of the node currently responsible for the given key. A simple distributed storage application might use this interface as follows. To publish a file under a particular unique name, the publisher would convert the name to a numeric key using an ordinary hash function such as SHA-1, then call lookup(key). The publisher would then send the file to be stored at the node(s) responsible for the key. A consumer wishing to read that file would later obtain its name, convert it to a key, call lookup(key), and ask the resulting node for a copy of the file.

To implement DHTs, lookup algorithms have to address the following issues:
Mapping keys to nodes in a load-balanced way. In general, all keys and nodes are identified using an m-bit number or identifier (ID). Each key is stored at one or more nodes whose IDs are “close” to the key in the ID space.

Forwarding a lookup for a key to an appropriate node. Any node that receives a query for a key identifier must be able to forward it to a node whose ID is “closer” to that key. This rule will guarantee that the query eventually arrives at the closest node.

Distance function. The two previous issues allude to the “closeness” of keys to nodes and nodes to each other; this is a common notion whose definition depends on the scheme. In Chord, the closeness is the numeric difference between two IDs; in Pastry and Tapestry, it is the number of common prefix bits; in Kademlia, it is the bit-wise exclusive-or (XOR) of the two IDs. In all the schemes, each forwarding step reduces the distance between the current node handling the query and the sought key.

Building routing tables adaptively. To forward lookup messages, each node must know about some other nodes. This information is maintained in routing tables, which must adapt correctly to asynchronous and concurrent node joins and failures.
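The first three issues above can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not code from any of the systems named: the function names, the 4-bit toy ID space, and the routing tables are hypothetical, and the Chord-style measure is written as a clockwise difference on a ring, which is one reading of "numeric difference".

```python
import hashlib

M = 160  # ID width in bits; matches the SHA-1 output size

def key_for(name: str) -> int:
    """Convert a name to an m-bit numeric key with SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")

def ring_distance(a: int, b: int, m: int = M) -> int:
    """Chord-style: numeric difference going clockwise from a to b (assumed ring)."""
    return (b - a) % (1 << m)

def xor_distance(a: int, b: int) -> int:
    """Kademlia-style: bit-wise XOR of the two IDs, read as a number."""
    return a ^ b

def common_prefix_bits(a: int, b: int, m: int = M) -> int:
    """Pastry/Tapestry-style closeness: leading bits shared (larger is closer)."""
    return m - (a ^ b).bit_length()

def greedy_route(tables, start, key, distance):
    """Forward to a strictly closer known node until no neighbor is closer."""
    path, current = [start], start
    while True:
        best = min(tables[current], key=lambda n: distance(n, key), default=current)
        if distance(best, key) >= distance(current, key):
            return path  # local minimum; the globally closest node only if
                         # the routing tables are built well enough
        current = best
        path.append(current)

# Hypothetical 4-bit system: five nodes, each knowing two others.
tables = {1: [4, 9], 4: [1, 12], 9: [12, 14], 12: [14, 9], 14: [12, 1]}
path = greedy_route(tables, start=1, key=13, distance=xor_distance)
closest = min(tables, key=lambda n: xor_distance(n, 13))
```

Because every hop strictly reduces the distance to the key, the walk must terminate; in this toy system it ends at node 12, which is also the node a global view would pick as closest to key 13.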
Routing in One Dimension

A key difference in the algorithms is the data structure that they use as a routing table to provide O(log N) lookups. Chord maintains a data structure that resembles a skiplist. Each node in Kademlia, Pastry, and Tapestry maintains a tree-like data structure. Viceroy maintains a butterfly data structure, which requires information about only a constant number of other nodes, while still providing O(log N) lookup. A recent variant of Chord uses de Bruijn graphs, which requires each node to know only about two other nodes, while also providing O(log N) lookup. We illustrate the issues in routing using Chord and Pastry’s data structures.
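The claim that two neighbors per node can still yield logarithmic lookups can be checked on an idealized de Bruijn graph. The sketch below is not the Chord variant itself: it assumes every one of the 2^bits IDs is occupied by a node, which a real, sparsely populated system cannot assume, and the function names are hypothetical.

```python
def de_bruijn_neighbors(node: int, bits: int) -> tuple:
    """The only two nodes this node knows: shift left, append 0 or 1."""
    mask = (1 << bits) - 1
    shifted = (node << 1) & mask   # drop the top bit, make room at the bottom
    return (shifted, shifted | 1)

def de_bruijn_route(src: int, dst: int, bits: int) -> list:
    """Route by shifting in the destination's bits, most significant first."""
    path, current = [src], src
    for i in range(bits - 1, -1, -1):
        next_bit = (dst >> i) & 1
        current = de_bruijn_neighbors(current, bits)[next_bit]
        path.append(current)
    return path                    # always `bits` hops in this naive form

# With bits = 3 there are N = 8 nodes, so each route takes log2(8) = 3 hops.
route = de_bruijn_route(src=6, dst=3, bits=3)
```

Each hop replaces one bit of the current ID with one bit of the destination, so after log2 N hops the current ID equals the destination regardless of where the route started.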
Chord: Skiplist-like routing
Each node in Chord [11] has a finger table containing the IP address of a node halfway around the ID space from it, a quarter-of-the-way, an eighth-of-the-way, and so forth, in powers of two, in a structure that resembles a skiplist data structure (see Figure 1). A node forwards a query for a key to the node in its
Figure 1. A structure resembling a skiplist data structure. (The figure shows lookup(54) for key K54 on a ring of nodes N1, N8, N14, N21, N38, N42, N48, N51, and N56.)