• Scalability
• Flexibility / extensibility
• Security / reliability
• and more…
•2002:
• Average hard disk size: 100 Gbyte
•2004:
• Average processing power (clock frequency) of personal computers: ~ 3 GHz
Personal computers now have capabilities comparable to servers in
the 1990s
Driving Forces Behind Peer-to-Peer Networks (2)
•Development of communication networks
•Early 1990s: private users start to connect to the Internet via 56 kbps modems
•1997/1998:
• first broadband connections for residential users become available
• cable modem with up to 10 Mbps
•1999:
• Introduction of DSL and ADSL connections
• Data rates of up to 8.5 Mbps via common telephone connections become
available.
• The deregulation of the telephone market shows first effects with significantly
reduced tariffs, due to increased competition on the last mile.
Bandwidth is plentiful and cheap!
Problems of Today's Internet: Flexibility
• Resistance to censorship
• Great demand from the users' point of view
• "Small demand" from the agencies' point of view
• Central server systems can be easily shut down or rendered harmless
Departure from the End-to-End Principle
NAT
Result of the Developments: De-Centralization
•Conclusion
• Centralized systems
• … do not scale infinitely
• … entail single points of failure (reliability problems)
• … are easy to attack
• … but are easy to realize
• and will thus continue to be used (up to the scalability limit)
•A first definition
P2P is a class of applications where each node is at the same time a
client and a server. Because they operate in an environment of
unstable connectivity and unpredictable IP addresses, P2P nodes
must operate outside the DNS system and have significant or total
autonomy from central servers.
Definition of Peer-to-Peer Systems (2)
[Figure: a distributed system of peers, addressed by IP addresses (e.g., 12.5.7.31, 89.11.20.15, 95.7.6.10, 86.8.10.18, 7.31.10.25) and domain names (peer-to-peer.info, berkeley.edu, planet-lab.org)]
Types of Peer-to-Peer Systems
•Client-Server systems
• Classical role-based systems
•Examples
• WWW
[Figure: clients (C) connected to a central server (S); peers (P) connected directly to each other]
Architectures of Peer-to-Peer Networks
•Client-Server
 1. Server is the central entity and only provider of service and content. Network managed by the server.
 2. Server as the higher performance system.
 3. Clients as the lower performance systems.
 Examples: WWW, SMTP
•Peer-to-Peer
 1. Resources are shared between the peers.
 2. Resources can be accessed directly from other peers.
 3. Peer is provider (server) and requestor (client): the servent concept.
 •Unstructured P2P – Centralized P2P
  1. All features of Peer-to-Peer included
  2. Central entity is necessary to provide the service
  3. Central entity is some kind of index/group database
  Example: Napster
 •Unstructured P2P – Pure P2P
  1. All features of Peer-to-Peer included
  2. Any terminal entity can be removed without loss of functionality
  3. No central entities
  Examples: Gnutella 0.4, Freenet
 •Unstructured P2P – Hybrid P2P
  1. All features of Peer-to-Peer included
  2. Any terminal entity can be removed without loss of functionality
  3. Dynamic central entities
  Examples: Gnutella 0.6, JXTA
 •Structured P2P – DHT-based
  1. All features of Peer-to-Peer included
  2. Any terminal entity can be removed without loss of functionality
  3. No central entities
  4. Connections in the overlay are “fixed”
  Examples: Chord, CAN
Peer-to-Peer Overlay
•Peer-to-Peer networks
• De-centralized self-organizing systems with de-centralized usage of
resources
•Additional reasons
• Success of “practical” P2P approaches (publicity through copyright
discussions)
• Emergence of innovative types of services (ICQ, file sharing, Skype, etc.)
• Interesting research areas:
• Quality in peer-to-peer networks
• De-centralized self-organization
• Location-based routing, content-based routing
Distributed Hash Tables
•Essential challenge in (most) peer-to-peer systems:
• Location of a data item among the distributed systems:
• Where shall the item be stored?
• How does a requester find the location of an item?
• Allow peer nodes to join and leave the system anytime.
• Scalability: keep the complexity for communication and
storage scalable.
• Robustness and resilience in case of faults and frequent
changes
[Figure: communication overhead vs. node state per node]
• Flooding: communication overhead O(N), node state O(1); bottleneck: communication overhead
• Central server: communication overhead O(1), node state O(N)
• Scalable solution between the two extremes: O(log N)?
Distributed Indexing (2)
• Communication overhead vs. node state
[Figure: communication overhead vs. node state, with the three approaches placed on it]
• Flooding: communication overhead O(N); bottlenecks: communication overhead, false negatives
• Central server: communication overhead O(1); bottlenecks: memory, CPU, network, availability
• Distributed hash table: O(log N); scalability, no false negatives, resistant against changes, failures, attacks; allows short-time users

                            Node state    Communication overhead
  Central server            O(N)          O(1)
  Flooding search           O(1)          O(N²)
  Distributed hash tables   O(log N)      O(log N)
From Classic Hash Tables to Distributed Hash Tables
•Classic Hash Table
•Searching is easy and efficient. However, adding a new peer node changes the hash function!
[Figure: a hash function f maps data items to buckets 0…6 of a classic hash table, e.g., f(23) = 1, f(1) = 4]
•Distributed Hash Table
•Each peer is responsible for a subset of the data range; that subset is computed by the hash function.
•If we search for data, the query is submitted to the same hash function.
[Figure: the hash function maps data items directly to the responsible peers]
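A minimal sketch (not from the slides) of the contrast described above, assuming a small identifier space and a truncated SHA-1 hash; the peer names and parameters are illustrative:

```python
# Minimal sketch: classic modulo hashing vs. consistent hashing on a circle.
import hashlib

M = 16                                  # identifier space [0, 2^M), assumed size
RING_SIZE = 2 ** M

def h(value: str) -> int:
    """Hash a string into the identifier space (truncated SHA-1)."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % RING_SIZE

def classic_bucket(key: str, num_peers: int) -> int:
    # Classic hash table: bucket = h(key) mod N.
    # Adding a peer changes N, so almost every key moves to another bucket.
    return h(key) % num_peers

def responsible_peer(key: str, peer_ids: list) -> int:
    # Consistent hashing: a key belongs to its successor on the circle,
    # so only one range of keys moves when a peer joins or leaves.
    k = h(key)
    ring = sorted(peer_ids)
    return next((pid for pid in ring if pid >= k), ring[0])  # wrap around

peers = [h(f"peer-{i}") for i in range(4)]
print(classic_bucket("my data", 4), classic_bucket("my data", 5))  # likely differ
print(responsible_peer("my data", peers))
```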
Insertion into a Distributed Hash Table
• Peers are hashed to a specific area
• Documents are also hashed to a specific area
• Each peer is responsible for its area
[Figure: identifier circle from 0 to 2^m - 1 (often, the address space is visualized as a circle); H(Node X) = 2906, H(Node Y) = 3485; data item “D”: H(“D”) = 3107]
Association of Address Space with Nodes
•Each node is responsible for a part of the value range
• Sometimes with redundancy
• Continuous adaptation
• Real (underlay) and logical (overlay) topology are uncorrelated
[Figure: identifier circle with nodes 611, 709, 1008, 1622, 2011, 2207, 2906, 3485; node 3485 is responsible for data items in the range 2907 to 3485 (in case of a Chord DHT), i.e., it manages keys 2907-3485]
[Figure: lookup example – Key = H_SHA-1(“D”) = 3107; the request (3107, (ip, port)) is submitted at an arbitrary initial node and routed along the ring to the responsible node 3485, which knows the storage location 134.2.11.68 of item “D”]
Association of Data with IDs – Indirect Storage
•Indirect storage
• Nodes in a DHT store tuples (key, value)
• Key = Hash("my data") = 2313
• Value is then the storage address of the content: (IP, Port) = (134.2.11.140, 4711)
• More flexible, but requires one more step to reach the content.
[Figure: ring with nodes 709, 1008, 1622, 2011, 2207; the node responsible for the key stores a pointer to item D at 134.2.11.68]
Node Arrival
1. Calculation of node ID
2. New node contacts DHT via arbitrary node.
3. A particular hash range is assigned to the node.
4. The key/value pairs of this hash range are stored on
the new node (usually with redundancy).
5. The node is integrated into the routing environment.
[Figure: new node with ID 3485 (134.2.11.68) joins the ring of nodes 611, 709, 1008, 1622, 2011, 2207, 2906]
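A hedged sketch of these five arrival steps, using plain Python dictionaries as stand-ins for the real routing and storage structures; names such as ring, find_successor and join are illustrative, not an actual DHT API:

```python
# Hedged sketch of the arrival steps above; `ring` maps node IDs to the
# key/value pairs each node stores. Redundancy and routing are omitted.

def in_range(k: int, lo: int, hi: int) -> bool:
    """True if key k lies in the circular interval (lo, hi]."""
    return lo < k <= hi if lo < hi else (k > lo or k <= hi)

def find_successor(ring: dict, key: int) -> int:
    """ID of the node currently responsible for `key` (steps 2 and 3)."""
    ids = sorted(ring)
    return next((nid for nid in ids if nid >= key), ids[0])

def join(ring: dict, new_id: int) -> None:
    succ = find_successor(ring, new_id)                    # contact via arbitrary node
    pred = max((nid for nid in ring if nid < new_id), default=max(ring))
    store = ring[succ]
    # Step 4: the new node takes over the keys of its hash range
    # (real systems usually keep redundant copies; omitted here).
    handover = {k: v for k, v in store.items() if in_range(k, pred, new_id)}
    for k in handover:
        del store[k]
    ring[new_id] = handover                                # step 5: part of the ring

ring = {611: {3400: "x"}, 709: {650: "z"}, 2906: {2800: "y"}}
join(ring, 3485)                    # new node 3485 takes over key 3400
print(sorted(ring), ring[3485])     # [611, 709, 2906, 3485] {3400: 'x'}
```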
Node Failure / Node Departure
•Failure of a node
• Use of redundant storage of the key/value pairs (if
a node fails)
• Use of redundant/alternative routing paths
• Key/value usually still retrievable as long as at
least one copy remains.
•Departure of a node
• New partitioning of the hash range to neighbor
nodes
• Copy the key/value pairs to the neighbor nodes
• Remove the departing node from the routing
environment.
DHT Interfaces
•Generic interface of Distributed Hash Tables
• Provisioning of information
• publish(key,value)
• Requesting of information (search for content)
• lookup(key)
• Reply
• value
•DHT approaches are then interchangeable (implementing the
same interface).
[Figure: a distributed application calls publish(key, value) and lookup(key) on the Distributed Hash Table layer (CAN, Chord, Pastry, …), which returns the value]
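A minimal sketch of this generic interface in Python; the abstract class mirrors the publish/lookup/value operations above, while the trivial in-memory backend only stands in for a real CAN/Chord/Pastry implementation:

```python
# Minimal sketch of the generic DHT interface (publish / lookup / value).
from abc import ABC, abstractmethod
from typing import Any, Optional

class DistributedHashTable(ABC):
    @abstractmethod
    def publish(self, key: int, value: Any) -> None:
        """Provisioning of information."""

    @abstractmethod
    def lookup(self, key: int) -> Optional[Any]:
        """Requesting of information; the reply is the stored value."""

class LocalDHT(DistributedHashTable):
    """Trivial single-node backend; a real one routes to remote peers."""
    def __init__(self) -> None:
        self._store = {}

    def publish(self, key: int, value: Any) -> None:
        self._store[key] = value

    def lookup(self, key: int) -> Optional[Any]:
        return self._store.get(key)

# Indirect storage: the published value is the storage address of the content.
dht: DistributedHashTable = LocalDHT()
dht.publish(3107, ("134.2.11.140", 4711))
print(dht.lookup(3107))
```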
Chord
[Figure: Chord ring over the identifier space 0…7 (legend: identifier, node, key); keys 1, 2 and 6 are stored at their successors: successor(1) = 1, successor(2) = 3, successor(6) = 0]
Chord Topology (2)
•Topology determined by links
between nodes
• Link: knowledge about another node
• Stored in a routing table on each node
•Simplest topology
• circular linked list
• each node has a link to the next node (clockwise)
•Principle of consistent (distributed) hashing
• Initial idea: balance load among nodes by using a hash function to map nodes/data into the linear address space.
[Figure: ring of identifiers 0…7]
Chord Routing (1)
•Primitive routing in distributed hashing
• Forward query for key k to the next node until
successor(k) is found
• Return result to the source of the query
•Advantages
• Simple
• Little node state needed
•Disadvantages
• Poor lookup efficiency: on average O(N) hops per query
[Figure: ring 0…7 with nodes 0, 1 and 3 and their routing tables; a query for key 6 is forwarded node by node until successor(6) = 0 is reached]
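A minimal sketch of the primitive routing described above, assuming each node only knows its successor; the small ring with nodes 0, 1 and 3 matches the example figure:

```python
# Minimal sketch of primitive successor routing: every node knows only its
# successor, so the query walks around the ring; hops grow linearly with N.

def in_range(k: int, lo: int, hi: int) -> bool:
    """True if k lies in the circular interval (lo, hi]."""
    return lo < k <= hi if lo < hi else (k > lo or k <= hi)

def primitive_lookup(nodes: list, start: int, key: int) -> tuple:
    ring = sorted(nodes)
    succ = {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}
    current, hops = start, 0
    while not in_range(key, current, succ[current]):
        current = succ[current]              # forward query to the next node
        hops += 1
    return succ[current], hops               # responsible node, hop count

print(primitive_lookup([0, 1, 3], start=0, key=6))   # -> (0, 2): successor(6) = 0
```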
[Figure: Chord routing with finger tables; each node n keeps entries (i, target = n + 2^i, link = successor(target)); example rings with node IDs between 1 and 60 illustrate how lookup(44) is resolved at node 45 = successor(44)]
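A hedged sketch of finger-table routing; the ring and the looked-up key below are illustrative values taken from the original Chord paper example (not from the figure above), and the identifier size m = 6 is an assumption:

```python
# Hedged sketch of finger-table routing with the Chord-paper example ring.
M = 6                                        # identifier space [0, 2^M)

def in_range(k, lo, hi):                     # k in (lo, hi] on the circle
    return lo < k <= hi if lo < hi else (k > lo or k <= hi)

def strictly_between(k, lo, hi):             # k in (lo, hi) on the circle
    return lo < k < hi if lo < hi else (k > lo or k < hi)

def successor(ring, key):
    return next((n for n in ring if n >= key % 2 ** M), ring[0])

def fingers(ring, n):
    """Finger i of node n points to successor((n + 2^i) mod 2^M)."""
    return [successor(ring, (n + 2 ** i) % 2 ** M) for i in range(M)]

def lookup(ring, start, key):
    ring, node, hops = sorted(ring), start, 0
    while not in_range(key, node, successor(ring, node + 1)):
        # forward to the closest preceding finger (successor as fallback)
        preceding = [f for f in fingers(ring, node) if strictly_between(f, node, key)]
        node = preceding[-1] if preceding else successor(ring, node + 1)
        hops += 1
    return successor(ring, node + 1), hops

ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(lookup(ring, start=8, key=54))         # -> (56, 2): N8 -> N42 -> N51, key at N56
```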
Chord Self Organization (1)
•Handle a changing network environment
• Arrival of new nodes
• Departure of participating nodes
• Failure of nodes
• Maintenance of fingers and successors, in particular in case of failures
• Trade-off: maintenance traffic vs. correctness and timeliness
[Figure: ring with nodes 13, 14, 19, 23, 26, 30, 33, 37, 39, 42, 44, 45 and their finger links]
Chord Self Organization (4)
•Successor failure during routing
• Last step of routing can return a failed node to the
source of the query
-> all queries for the successor fail
• Store n successors in a successor list
• successor[0] fails -> use successor[1], etc.
• routing fails only if n consecutive nodes fail
simultaneously.
[Figure: ring 0…7 with nodes 0, 3 and 6; finger tables (i, n + 2^i, succ.) and the keys each node is responsible for]
Chord: Node Join (2)
•Examples for choosing new node IDs
• random ID: equal distribution assumed
• hash IP address and port
• place new nodes based on
• load of the existing nodes
• geographic location
• etc.
•Retrieval of existing node IDs
• Controlled flooding
• DNS aliases (e.g., entrypoint.chord.org)
• Published through the Web
• etc.
[Figure: a new node (ID = rand() = 6?) resolves entrypoint.chord.org via DNS to 182.84.10.23 and contacts the ring 0…7 through that node]
Chord: Node Join (3)
•Construction of the finger table
• iterate over finger table rows
• for each row: query entry point for successor
• use standard Chord routing on entry point
•Construction of the successor list
• add immediate successor from the finger table
• request successor list from this successor
[Figure: node 6 joins the ring 0…7 (existing nodes 0 and 3); it queries succ(7), succ(0) and succ(2) and obtains the finger table (i, n + 2^i, succ.) = (0, 7, 0), (1, 0, 0), (2, 2, 3) and the successor list (0, 1)]
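A minimal sketch of the finger-table construction described above; resolve_successor stands in for "query the entry point for the successor", which in a real system would use standard Chord routing:

```python
# Minimal sketch of building a joining node's finger table row by row.
M = 3                                        # identifier space 0..7, as in the figure

def resolve_successor(existing_nodes: list, key: int) -> int:
    """Placeholder for asking the entry point for successor(key)."""
    ring = sorted(existing_nodes)
    return next((n for n in ring if n >= key), ring[0])

def build_finger_table(existing_nodes: list, n: int) -> list:
    """Rows (i, n + 2^i, succ.), one query to the entry point per row."""
    rows = []
    for i in range(M):
        target = (n + 2 ** i) % 2 ** M
        rows.append((i, target, resolve_successor(existing_nodes, target)))
    return rows

# Node 6 joins a ring that already contains nodes 0 and 3:
print(build_finger_table([0, 3], 6))
# -> [(0, 7, 0), (1, 0, 0), (2, 2, 3)], as in the finger table shown above
```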
Chord: Node Join (4)
•Update of finger pointers: Example
• Node 82 joins (example for i = 3)
• Finger entries that point to node 86 may now have to point to the new node 82
• Candidates for updates: the nodes (counter-clockwise) whose 2^i-th finger entry has to point to 82
• For each i: find the predecessor t of key (s - 2^i), i.e., route to s - 2^i
• If t's 2^i-finger points to a node beyond 82: change t's 2^i-finger to 82, set t to the predecessor of t, and repeat
• ELSE continue with 2^(i+1)
•O(log² N) for looking up and updating the finger entries.
[Figure: node 82 joins the ring; for i = 3 the 2^3-fingers that previously pointed to node 86 are changed to point to 82]
Conclusions for Chord
•Complexity
• Messages per lookup: O(log N)
• Memory per node: O(log N)
• Messages per management action (join/leave/fail): O(log²
N)
•Advantages
• Theoretical models and proofs exist about the complexity
• Simple and flexible
•Disadvantages
• No notion of node proximity and proximity-based routing
optimizations
• Chord rings may become disjoint (partitioned) in realistic
settings
•Many improvements have been published by now
• e.g., provisions for proximity, bi-directional links, load
balancing, etc.
CAN: Content-Addressable Network
•An early and successful algorithm
•Simple and elegant
• Intuitive to understand and implement
• Many improvements and optimizations exist
• Published by Sylvia Ratnasamy et al. in 2001
•Main responsibilities
• CAN is a distributed system that maps keys to values.
• CAN uses distributed hashing.
• Keys are hashed into a D-dimensional space
• Interface:
• insert(key, value)
• retrieve(key)
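A hedged sketch of the key-to-coordinates mapping behind this interface; using one truncated SHA-1 hash per dimension is an illustrative choice, not CAN's prescribed hash function:

```python
# Hedged sketch: hash a key into the D-dimensional CAN coordinate space.
import hashlib

D = 2                                        # number of dimensions

def hash_to_point(key: str, dims: int = D) -> tuple:
    """Map a key to a point in [0, 1)^D, one hash per dimension."""
    coords = []
    for i in range(dims):
        digest = hashlib.sha1(f"{i}:{key}".encode()).digest()
        coords.append(int.from_bytes(digest[:8], "big") / 2 ** 64)
    return tuple(coords)

store = {}                                   # stand-in for the responsible nodes

def insert(key: str, value) -> None:
    store[hash_to_point(key)] = value        # stored at the node owning the point

def retrieve(key: str):
    return store.get(hash_to_point(key))

insert("movie.avi", ("134.2.11.68", 4711))   # value: storage address (illustrative)
print(hash_to_point("movie.avi"), retrieve("movie.avi"))
```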
CAN Overview (1)
•Basic idea
[Figure: a table of (K, V) pairs distributed over the participating nodes; one node calls insert(K1, V1), another node calls retrieve(K1)]
CAN Overview (2)
•Solution
•Virtual Cartesian coordinate space
•Entire space is partitioned amongst all the nodes. Every
node “owns” a zone in the overall space.
•Abstraction
• can store data at “points” in the space
• can route from one “point” to another
[Figure: two-dimensional coordinate space [0, 1] × [0, 1]; H("movie.avi") = (0.7, 0.2) maps the file to a point in the space]
CAN Overview (3)
•An overlay node manages one partition (rectangle) of the value space.
• Example: node 4 manages all values with x in [0.5, 1] and y in [0, 0.25]
• Adjacent partitions are called "neighbors":
• Nodes 6, 2 and 4 are neighbors of node 5
• "Wrap around" on the DHT borders: node 3 is also a neighbor of node 5
• Expected number of neighbors: O(2D), independent of the size of the CAN network!
[Figure: coordinate space [0, 1] × [0, 1] partitioned into the zones of nodes 1–9]
CAN Setup
[Figure: state of the system at time t – peers and their zones in the coordinate space, with a point Q(x, y)]
CAN Routing (2)
•Each node manages a rectangle with ratio 1:1, 1:2 or 2:1 (if D = 2)
•Example
• Dimension D = 2, x = 0…8, y = 0…8
• Node n1 is the first node and thus manages the entire space
• Node n2 joins the CAN network: the existing zone is split and one half is handed over to n2
[Figure: coordinate space 0…8 × 0…8 partitioned among nodes n1…n5]
CAN Routing (3)
•Data location is associated with coordinates derived from the key.
•A (key, value) pair is stored at the node responsible for the respective section.
•A query for a key is always forwarded via neighbors:
• Entry point at some known node, e.g., n1
• Lookup for key k4
[Figure: coordinate space 0…8 × 0…8 with nodes n1…n5 and keys k1…k4; the lookup for k4, entering at n1, is forwarded across neighboring zones to the responsible node]
CAN Routing (4)
•Path selection in CAN
• Routing along the shortest path in the D-dimensional space
• Details: the distance to the target decreases continuously
• Effort: O((D/4) · N^(1/D)) hops
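A hedged sketch of this greedy path selection, simplified to zone centre points (real CAN forwards toward the zone that contains the target); the four zones and their neighbor links are made-up illustration values:

```python
# Hedged sketch of greedy CAN-style forwarding toward a target point.
import math

centres = {                                  # centre of each node's zone
    "n1": (0.25, 0.75), "n2": (0.75, 0.75),
    "n3": (0.25, 0.25), "n4": (0.75, 0.25),
}
links = {                                    # adjacent zones ("neighbors")
    "n1": ["n2", "n3"], "n2": ["n1", "n4"],
    "n3": ["n1", "n4"], "n4": ["n2", "n3"],
}

def route(start: str, target: tuple) -> list:
    """Forward to the neighbor closest to the target; distance shrinks each hop."""
    path, current = [start], start
    while True:
        nxt = min(links[current], key=lambda n: math.dist(centres[n], target))
        if math.dist(centres[nxt], target) >= math.dist(centres[current], target):
            return path                      # no neighbor is closer: arrived
        path.append(nxt)
        current = nxt

print(route("n1", (0.7, 0.2)))               # e.g. lookup for H("movie.avi") = (0.7, 0.2)
```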
CAN: A Simple Example (1)
[Figure: example CAN coordinate space being partitioned as the first nodes join]
CAN: A Simple Example (2)
•node I::insert(K, V)
• (1) a = hx(K), b = hy(K)
• (2) route(K, V) -> (a, b)
•node J::retrieve(K)
• (1) a = hx(K), b = hy(K)
• (2) route "retrieve(K)" to (a, b)
[Figure: node I inserts (K1, V1) at point (a, b); node J retrieves it by routing to the same point]
Partitioning of CAN Ranges (1)
Partitioning of CAN Ranges (2)
•Partitioning is performed according to some rules
• Strict sequencing of value range partitioning
• According to the order of the D dimensions
• For example: x, y, z, x, y, z, ... if D = 3
•Partitioning tree
• Reflects the "history" of the partitioning process
• Important for fusion of ranges in the case of exit or failure of nodes
[Figure: binary partitioning tree (alternating X and Y splits) whose leaves are the zones of nodes A, B, C, D, E, F]
Structure of a CAN – Example (1)
•Insertion of nodes A, …, D
[Figure: step-by-step splitting of the coordinate space and the corresponding partitioning tree; the resulting zones carry binary labels, e.g., B (00), D (010), C (011), A (1)]
Structure of a CAN – Example (2)
•Insertion of nodes E, F, G
[Figure: further splits of the coordinate space and the partitioning tree; resulting labels include B (00), D (010), C (0110), E (0111), A (100), G (101), F (11)]
Other Improvements for CAN
•Routing metrics
• measure the delay between neighbors
• choose the neighbors with the shortest delay
•Overlapping regions
• k nodes jointly manage one zone
• more redundancy
• shorter routing paths because of the smaller number of zones
•Equal (uniform) partitioning of regions
• Target zone test during the join procedure: are there "large" neighbors in the proximity that are better suited for partitioning?
Conclusions for CAN
•CAN is a peer-to-peer system based on a DHT.
•It operates with D dimensions. The number of
dimensions determines the efficiency.
•Access to a key in O((D/4) · N^(1/D)) hops