CAN [8], Chord [11], Kademlia [6], Pastry [9], Tapestry [2], and Viceroy [5] are both structured and symmetric, unlike all the other systems mentioned here. This allows them to offer guarantees without being vulnerable to individual node failures. They all implement the DHT abstraction.
The rest of this article discusses these recent algorithms, highlighting design points and trade-offs. These algorithms incorporate techniques to scale well to large numbers of nodes, to locate keys with low latency, to handle dynamic node arrivals and departures, to ease the maintenance of per-node routing tables, and to balance the distribution of keys evenly among the participating nodes.
A Distributed Hash Table
A hash-table interface is an attractive foundation for a distributed lookup algorithm because it places few constraints on the structure of keys or the values they name. The main requirements are that data be identified using unique numeric keys, and that nodes be willing to store keys for each other. The values could be actual data items (file blocks), or could be pointers to where the data items are currently stored.
A DHT implements just one operation: lookup(key) yields the network location of the node currently responsible for the given key. A simple distributed storage application might use this interface as follows. To publish a file under a particular unique name, the publisher would convert the name to a numeric key using an ordinary hash function such as SHA-1, then call lookup(key). The publisher would then send the file to be stored at the node(s) responsible for the key. A consumer wishing to read that file would later obtain its name, convert it to a key, call lookup(key), and ask the resulting node for a copy of the file.
To implement DHTs, lookup algorithms have to address the following issues:
Mapping keys to nodes in a load-balanced way. In general, all keys and nodes are identified using an m-bit number or identifier (ID). Each key is stored at one or more nodes whose IDs are “close” to the key in the ID space.
Forwarding a lookup for a key to an appropriate node. Any node that receives a query for a key identifier s must be able to forward it to a node whose ID is “closer” to s. This rule will guarantee that the query eventually arrives at the closest node.
Distance function. The two previous issues allude to the “closeness” of keys to nodes and nodes to each other; this is a common notion whose definition depends on the scheme. In Chord, the closeness is the numeric difference between two IDs; in Pastry and Tapestry, it is the number of common prefix bits; in Kademlia, it is the bit-wise exclusive-or (XOR) of the two IDs. In all the schemes, each forwarding step reduces the distance between the current node handling the query and the sought key.
Building routing tables adaptively. To forward lookup messages, each node must know about some other nodes. This information is maintained in routing tables, which must adapt correctly to asynchronous and concurrent node joins and failures.
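The notions of key and closeness listed above can be made concrete with a small sketch. It is illustrative only and not taken from any of the cited systems: the 6-bit ID width, the function names (key_for, chord_distance, common_prefix_bits, xor_distance, next_hop), and the choice to measure Chord's numeric difference clockwise with wrap-around are all assumptions made for this example.

```python
import hashlib

M = 6  # assumed ID width in bits (64 possible IDs), small for readability


def key_for(name: str, m: int = M) -> int:
    """Hash a name with SHA-1 and keep the top m bits as a numeric key."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()  # 160-bit digest
    return int.from_bytes(digest, "big") >> (160 - m)


def chord_distance(a: int, b: int, m: int = M) -> int:
    """Numeric difference from a to b, measured clockwise with wrap-around
    (the wrap-around direction is an assumption of this sketch)."""
    return (b - a) % (1 << m)


def common_prefix_bits(a: int, b: int, m: int = M) -> int:
    """Leading bits shared by a and b (Pastry/Tapestry style).
    This is a similarity: a larger value means the IDs are closer."""
    return m - (a ^ b).bit_length()


def xor_distance(a: int, b: int) -> int:
    """Kademlia style: the XOR of the two IDs, read as an integer."""
    return a ^ b


def next_hop(current, known, key, distance):
    """Pick the known node closest to key, but only if it is strictly
    closer than the current node; otherwise return None."""
    best = min(known, key=lambda n: distance(n, key))
    return best if distance(best, key) < distance(current, key) else None


if __name__ == "__main__":
    k = key_for("report.pdf")  # hypothetical file name
    print("key:", k)
    print("XOR next hop from 1 toward 54:", next_hop(1, [8, 21, 48], 54, xor_distance))
```

Returning None from next_hop when no known node is strictly closer mirrors the forwarding rule in the text: every hop must reduce the distance to the key, so a query cannot loop.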
Routing in One Dimension
A key difference in the algorithms is the data structure that they use as a routing table to provide O(log N) lookups. Chord maintains a data structure that resembles a skiplist. Each node in Kademlia, Pastry, and Tapestry maintains a tree-like data structure. Viceroy maintains a butterfly data structure, which requires information about only a constant number of other nodes, while still providing O(log N) lookup. A recent variant of Chord uses de Bruijn graphs, which require each node to know about only two other nodes, while also providing O(log N) lookup. We illustrate the issues in routing using Chord and Pastry’s data structures.
Chord: Skiplist-like routing
Each node in Chord [11] has a finger table containing the IP address of a node halfway around the ID space from it, a quarter-of-the-way, an eighth-of-the-way, and so forth, in powers of two, in a structure that resembles a skiplist data structure (see Figure 1). A node forwards a query for key k to the node in its
COMMUNICATIONS OF THE ACM, February 2003/Vol. 46, No. 2, p. 45
Figure 1. A structure resembling a skiplist data structure. (Diagram labels: nodes N1, N8, N14, N21, N38, N42, N48, N51, and N56; key K54; query lookup(54).)
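As a rough illustration of the finger table described above, the following sketch builds one for the node IDs that appear in Figure 1. Several details are assumptions made for the example: a 6-bit ID space, node IDs standing in for the IP addresses a real table would hold, and the helper names successor and finger_table. It also assumes that the entry for each power-of-two offset is the first node at or after that point on the ring.

```python
from bisect import bisect_left

M = 6  # assumed ID width: 2**6 = 64 positions on the ring
NODES = sorted([1, 8, 14, 21, 38, 42, 48, 51, 56])  # node IDs from Figure 1


def successor(ident, nodes=NODES):
    """First node at or clockwise after ident, wrapping past the top ID."""
    i = bisect_left(nodes, ident)
    return nodes[i % len(nodes)]


def finger_table(n, m=M, nodes=NODES):
    """(offset, node) pairs for offsets 2**(m-1) (halfway around),
    2**(m-2) (a quarter of the way), and so on down to 1."""
    size = 1 << m
    return [
        (1 << i, successor((n + (1 << i)) % size, nodes))
        for i in reversed(range(m))
    ]


if __name__ == "__main__":
    for offset, node in finger_table(8):
        print(f"N8 + {offset:2d} -> N{node}")
    print("node responsible for K54:", successor(54))
```

With only m entries per node, the table stays small while still reaching halfway around the ring in a single hop, which is what gives the structure its skiplist-like character.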