Sameh El-Ansary
Nile University
5/18/2009
P2P: Why should we care?
[Figure: Breakdown of UW TCP bandwidth into HTTP components (May 2002). Over a week of measurements, P2P traffic (hundreds of Mbps) dwarfs WWW and Akamai traffic; non-HTTP TCP and total TCP shown for comparison.]
Outline
What is P2P?
What problems come with it?
Evolution (Discovery Related)
1st Generation:
Napster
2nd Generation: Flooding-Based systems
Gnutella
3rd Generation: Distributed Hash Tables
Chord, DKS, Pastry, etc.
Applications
What is Peer-To-Peer Computing? (1/3)
What is Peer-To-Peer Computing? (2/3)
What is Peer-To-Peer Computing? (3/3)
Peer-to-Peer Research Issues
Discovery:
Where are things?
Content Distribution:
How fast can we get things?
NAT/Firewalls
Jumping over them
Legal issues
Copyright laws
Security
Anonymity
…
…
Let us see how it all started…
Users storing data items on their machines
Other users are interested in this data
Problem: How does a user know which other user(s) in the world have the data item(s) that he desires?
[Figure: a user asks "Where is foo.mp3?"; files such as Hello.mp3, foo.mp3, Britney.mp3, and bye.mp3 are scattered across machines such as Hope.sics.se and x.imit.kth.se.]
S. El-Ansary & S. Haridi, 2G1526, Lecture 01
1st Generation of P2P systems
(Central Directory + Distributed Storage)
Solution: build a central directory (Napster)
[Figure: the central directory maps file names to the peers storing them, e.g. bye.mp3 → {x.imit.kth.se}, britney.mp3 → {hope.sics.se}, hello.mp3 → {hope.sics.se, x.imit.kth.se}, foo.mp3 → {x.imit.kth.se}; the actual data transfer happens directly between the peers.]
Basic Operations in Napster
Join
Connect to the central server (Napster)
Leave/Fail
Simply disconnect
Server detects failure, removes your data from the
directory
Share (Publish/Insert)
Inform the server about what you have
Search (Query)
Ask the central server and it returns a list of hits
Download
Download directly from peers using the hits provided by the server
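The operations above can be sketched as a toy central directory (illustrative Python only; this is not Napster's actual protocol or data structures):

```python
# Toy sketch of a Napster-style central directory (illustrative only;
# not the actual Napster protocol or wire format).

class CentralDirectory:
    def __init__(self):
        self.index = {}   # file name -> set of peer addresses

    def join(self, peer, files):
        """A peer connects and publishes the files it shares."""
        for name in files:
            self.index.setdefault(name, set()).add(peer)

    def leave(self, peer):
        """On disconnect/failure, the server removes the peer from every entry."""
        for peers in self.index.values():
            peers.discard(peer)

    def search(self, name):
        """Return the peers that claim to have the file (the 'hits')."""
        return sorted(self.index.get(name, set()))

directory = CentralDirectory()
directory.join("hope.sics.se", ["hello.mp3", "britney.mp3"])
directory.join("x.imit.kth.se", ["hello.mp3", "foo.mp3", "bye.mp3"])
print(directory.search("hello.mp3"))   # ['hope.sics.se', 'x.imit.kth.se']
directory.leave("hope.sics.se")
print(directory.search("hello.mp3"))   # ['x.imit.kth.se']
```

Note how every operation touches the single server: this is exactly the scalability bottleneck and single point of failure discussed next.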
The End of Napster
Since users of Napster stored copyrighted
material, the service was stopped for legal
reasons
But…. a second generation appeared…
Discussion
Scalability: the central server is a bottleneck
Failure: the central server is a single point of failure
2nd Generation (Random Overlay Networks:
Distributed Directory + Distributed Storage)
Main representatives:
Gnutella
Freenet
[Figure: every node in the random overlay provides both storage and directory functions.]
Gnutella Protocol Messages
Broadcast Messages
Ping: initiating message (“I’m here”)
Query: search pattern and TTL (time-to-live)
Back-Propagated Messages
Pong: reply to a ping, contains information about the peer
Query response: contains information about the computer
that has the needed file
Node-to-Node Messages
GET: return the requested file
PUSH: push the file to me
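The message types above can be written down as plain data classes (the field names are my own illustration; the real Gnutella wire format differs):

```python
# Sketch of the Gnutella message types (field names are illustrative;
# the actual wire format is different).
from dataclasses import dataclass, replace

@dataclass
class Ping:              # broadcast: "I'm here"
    ttl: int

@dataclass
class Query:             # broadcast: search pattern + time-to-live
    pattern: str
    ttl: int

@dataclass
class Pong:              # back-propagated: info about the replying peer
    peer_info: str

@dataclass
class QueryResponse:     # back-propagated: who has the requested file
    pattern: str
    holder: str

def forward(msg):
    """Re-flood a broadcast message only while its TTL is positive."""
    if msg.ttl <= 0:
        return None
    return replace(msg, ttl=msg.ttl - 1)

q = forward(Query("foo.mp3", ttl=3))
print(q)   # Query(pattern='foo.mp3', ttl=2)
```

The TTL decrement on every forward is what keeps a flood from circulating forever.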
Gnutella Search Mechanism
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
• Neighbors forward message
• Nodes that have file A initiate a reply message
• Query reply message is back-propagated
• File download
[Figure: example overlay of nodes 1–7; the query for A floods out from node 2, nodes 5 and 7 reply (A:5, A:7), the replies propagate back along the query path, and node 2 downloads A.]
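The flooding steps above can be simulated in a few lines (the topology and node numbers loosely follow the slides' example; everything else is a simplification):

```python
# Minimal simulation of a Gnutella-style flooding search (heavily
# simplified; topology loosely follows the slides' example).
from collections import deque

def flood_search(graph, files, start, wanted, ttl=7):
    """Breadth-first flood of a Query; returns nodes that reply with a hit."""
    seen = {start}
    queue = deque([(start, ttl)])
    hits = []
    while queue:
        node, t = queue.popleft()
        if wanted in files.get(node, set()):
            hits.append(node)        # this node back-propagates a reply
        if t == 0:
            continue                 # TTL exhausted: stop forwarding
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, t - 1))
    return sorted(hits)

# Node 2 searches for file "A", which nodes 5 and 7 hold.
graph = {1: [2, 4], 2: [1, 3], 3: [2, 5, 6], 4: [1, 7],
         5: [3], 6: [3], 7: [4]}
files = {5: {"A"}, 7: {"A"}}
print(flood_search(graph, files, start=2, wanted="A"))   # [5, 7]
```

A small TTL sharply limits how far the query reaches, which is Gnutella's trade-off between coverage and traffic.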
What did we cover so far?
What is P2P?
What problems come with it?
Evolution (Discovery Related)
1st Generation: Napster
2nd Generation: Flooding-Based systems (Gnutella)
Will cover today:
3rd Generation: Distributed Hash Tables
Applications
Key idea in DHTs
1st, 2nd Generation: Each data item is stored on the machine of its creator/downloader
3rd Generation (DHTs): The id of a data item determines the machine on which it is going to be stored
Simplest example: Consistent Hashing using a Ring
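A minimal sketch of this key idea, assuming SHA-1 as the hash function and a 16-position id space matching the slides' ring size (both are illustrative choices):

```python
# Minimal sketch of the DHT key idea: the hash of a data item's name, not
# its owner, decides where it is stored. SHA-1 and the 16-id space are
# illustrative assumptions, not mandated by any particular DHT.
import hashlib

M = 4                                  # id space of 2**4 = 16 positions

def ring_id(name: str) -> int:
    """Map an arbitrary name onto the identifier ring [0, 2**M)."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** M)

# Every participant computes the same id for the same name, so any node
# can tell where "plan.doc" must live without asking a central directory.
print(ring_id("plan.doc") == ring_id("plan.doc"))   # True
print(0 <= ring_id("plan.doc") < 16)                # True
```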
Consistent Hashing using a Ring (1/6)
[Figure: an identifier ring; ids are arranged clockwise.]
Consistent Hashing using a Ring (4/6)
[Figure: the 16-id ring with node positions marked and document ids 12 and 14 to be placed.]
Consistent Hashing using a Ring (5/6)
- The policy is: a doc with id y would be stored at Succ(y)
- So, node 0 gets to store doc 14, node 3 gets to store doc 2, and node 12 gets to store doc 12
- But how can we do this?
- Simple: if the successor pointers are already there, the two operations, get and put, would simply be done by following them sequentially
- From any node, you can do: put(hash(plan.doc), plan.doc)
- From any node, you can also do: get(hash(plan.doc))
[Figure: ring of ids 0–15; docs 2, 12, and 14 are stored at their successor nodes 3, 12, and 0.]
Consistent Hashing using a Ring (6/6)
// ask node n to find the successor of id
procedure n.findSuccessor(id)
  if id ∈ (n, successor] then
    return successor
  else // forward the query around the circle
    return successor.findSuccessor(id)

(a, b] is the segment of the ring moving clockwise from, but not including, a until, and including, b
n.foo(.) denotes an RPC of foo(.) to node n
n.bar denotes an RPC to fetch the value of the variable bar in node n
We call the process of finding the successor of an id a LOOKUP
Consistent Hashing using a Ring (6/6)
// ask node n to store value under id
procedure n.put(id, value)
  s := findSuccessor(id)
  s.store(id, value)

// ask node n to retrieve the value stored under id
procedure n.get(id)
  s := findSuccessor(id)
  return s.retrieve(id)
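The put/get pseudocode above can be made runnable over a toy ring with pre-wired successor pointers (node positions 0, 3, and 12 follow the earlier example; the RPC layer is omitted, so these are plain method calls):

```python
# Runnable sketch of the put/get pseudocode (illustrative: no RPCs,
# node positions 0, 3, 12 as in the slides' ring example).

def in_interval(key, a, b):
    """key in (a, b] on the ring, handling wrap-around."""
    if a < b:
        return a < key <= b
    return key > a or key <= b

class Node:
    def __init__(self, nid):
        self.id = nid
        self.successor = None    # wired up below
        self.store = {}          # id -> value kept at this node

    def find_successor(self, key):
        # Follow successor pointers until key falls in (n, successor].
        if in_interval(key, self.id, self.successor.id):
            return self.successor
        return self.successor.find_successor(key)

    def put(self, key, value):
        self.find_successor(key).store[key] = value

    def get(self, key):
        return self.find_successor(key).store.get(key)

# Wire the ring 0 -> 3 -> 12 -> 0 with correct successor pointers.
n0, n3, n12 = Node(0), Node(3), Node(12)
n0.successor, n3.successor, n12.successor = n3, n12, n0

n3.put(14, "doc-14")     # doc 14 is stored at succ(14) = node 0
print(n0.store)          # {14: 'doc-14'}
print(n12.get(14))       # doc-14
```

Following only successor pointers costs O(N) hops; the routing-table slides later reduce this to O(log N).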
Handling data - Joins (1/2)
[Figure: node 9 joins the ring; docs 8 and 9, currently stored at node 12, now fall under the new node's responsibility.]
Handling data - Joins (2/2)
- Node 9 takes over docs 9 and 8 from its successor, node 12
- Note that we assumed that the successor pointers were correct after the join and the data was handled accordingly
[Figure: ring of ids 0–15 after the join; node 9 stores docs 8 and 9, node 12 keeps doc 12.]
Handling data - Leaves (1/2)
- Node 12 can decide to leave gracefully
[Figure: node 12 on the ring announces that it is leaving.]
Handling data - Leaves (2/2)
- Node 12 hands over its data to its successor, which is node 14
- Note that we assumed that the successor pointers were correct after the leave and the data was handled accordingly
[Figure: ring after the leave; node 14 now stores doc 12.]
Agenda
Handling data
Joins, Leaves
Handling successor pointers
Joins, Leaves,
Scalability
Routing table: reducing the cost from O(N) to O(log N)
Failures (for all the above)
Handling Successors
Everything depends on successor pointers, so we had better have them right all the time!
Chord, a system authored at MIT, was one of the leading DHTs that provide a mechanism for doing that.
In Chord, in addition to the successor pointer, every node has a predecessor pointer as well.
Handling Successors - Chord Algorithm
[Figure: a node with successor and predecessor pointers; the predecessor is initially nil.]
Handling Successors - Chord Algo - Join (1/2)
- 9 can join through any other node, take 0 for example
- 9 will set its predecessor to nil
- 9 will ask 0 to find succ(9), which will be 12
- 9 will set its successor to 12
[Figure: ring of ids 0–15 before node 9 joins.]
Handling Successors - Chord Algo - Join (2/2)
[Figure: after the join, node 9's successor points to 12 and its predecessor is nil.]
Handling Successors - Chord Algo - Stabilization (1/3)
[Figure: node 9 has joined with successor 12; stabilization has not run yet.]
Handling Successors - Chord Algo - Stabilization (2/3)
[Figure: node 9 runs stabilization and notifies 12; node 9's predecessor is still nil.]
Handling Successors - Chord Algo - Stabilization (3/3)
Finally, when 7 runs its stabilization, it will discover from 12 that pred(12) is now 9, so it sets its successor to 9 and notifies 9.
[Figure: the ring pointers around nodes 7, 9, and 12 are now consistent.]
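The join and stabilization rounds above can be sketched as follows (no RPCs or failures; node ids 0, 7, 12 with node 9 joining follow the slides' example):

```python
# Sketch of Chord's join + stabilize/notify (simplified: no RPCs, no
# failures; ids 0, 7, 12 with 9 joining follow the slides' example).

def between(k, a, b, incl_right=False):
    """k in (a, b) on the ring, optionally including b."""
    if a < b:
        return a < k < b or (incl_right and k == b)
    return k > a or k < b or (incl_right and k == b)

class Node:
    def __init__(self, nid):
        self.id = nid
        self.successor = self
        self.predecessor = None

    def join(self, known):
        self.predecessor = None
        self.successor = known.find_successor(self.id)

    def find_successor(self, key):
        if between(key, self.id, self.successor.id, incl_right=True):
            return self.successor
        return self.successor.find_successor(key)

    def stabilize(self):
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x          # someone slipped in between us
        self.successor.notify(self)

    def notify(self, n):
        if self.predecessor is None or between(n.id, self.predecessor.id, self.id):
            self.predecessor = n

# Ring 0 -> 7 -> 12 -> 0 with correct predecessors; node 9 joins via 0.
n0, n7, n12 = Node(0), Node(7), Node(12)
n0.successor, n7.successor, n12.successor = n7, n12, n0
n0.predecessor, n7.predecessor, n12.predecessor = n12, n0, n7

n9 = Node(9)
n9.join(n0)                   # succ(9) = 12, predecessor = nil
for node in (n9, n7, n9):     # a few stabilization rounds
    node.stabilize()

print(n7.successor.id, n9.predecessor.id, n12.predecessor.id)   # 9 7 9
```

After the rounds, 7's successor is 9 and 12's predecessor is 9, exactly as the slides describe.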
Handling Successors - Chord Algo - Leaves (3/3)
Agenda
Handling data
Joins, Leaves
Handling successor pointers
Joins
Scalability
Routing table: reducing the cost from O(N) to O(log N)
Failures (for all the above)
Chord – Routing (1/7)
[Figure: get(15) issued; with only successor pointers, the lookup starts at the requesting node on the 16-id ring.]
Chord – Routing (2/7)
[Figure: the lookup for 15 travels hop by hop along successor pointers around the ring.]
Chord – Routing (3/7)
- Routing table size: M, where N = 2^M
- Every node n knows successor(n + 2^(i-1)), for i = 1..M
- Routing entries = log2(N)
- log2(N) hops from any node to any other node
[Figure: get(15) resolved on the 16-id ring using finger pointers instead of walking successors.]
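A sketch of building such a finger table for the 16-id ring (the node set is my own illustrative choice, not taken from the slides):

```python
# Sketch of the finger-table rule above: node n keeps
# successor(n + 2**(i-1)) for i = 1..M on a 16-id ring.
# The node set is an illustrative assumption.
from bisect import bisect_left

M = 4
RING = 2 ** M
NODES = sorted([0, 1, 3, 7, 9, 11, 12, 14])   # 8 nodes on the ring

def successor(key):
    """First node id clockwise from key (wrapping past 15 back to 0)."""
    i = bisect_left(NODES, key % RING)
    return NODES[i] if i < len(NODES) else NODES[0]

def finger_table(n):
    return [successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]

# Node 1's fingers target ids 2, 3, 5, 9:
print(finger_table(1))   # [3, 3, 7, 9]
```

Each node thus keeps only M = log2(N) entries, yet its longest finger always jumps half-way across the id space.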
Chord – Routing (4/7)
- From node 1, only 2 hops to node 0, where item 15 is stored
- For an id space of 16, the maximum is log2(16) = 4 hops between any two nodes
- In fact, if nodes are uniformly distributed, the maximum is log2(# of nodes), i.e. log2(8) hops between any two nodes
- The average complexity is: ½ log(# of nodes)
[Figure: the get(15) lookup from node 1 resolved using fingers.]
Chord – Routing (5/7)
Pseudo code findSuccessor(.)
// ask node n to find the successor of id
procedure n.findSuccessor(id)
  if id ∈ (n, successor] then
    return successor
  else
    n' := closestPrecedingNode(id)
    return n'.findSuccessor(id)
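A runnable sketch of this routing pseudocode (no RPCs; the 16-id ring and the node set 0, 3, 7, 12 are illustrative assumptions):

```python
# Runnable sketch of findSuccessor with closestPrecedingNode (no RPCs;
# the 16-id ring and node set are illustrative).
from bisect import bisect_left

M, RING = 4, 2 ** 4
IDS = [0, 3, 7, 12]

def between(k, a, b, incl_right=False):
    """k in (a, b) on the ring, optionally including b."""
    if a < b:
        return a < k < b or (incl_right and k == b)
    return k > a or k < b or (incl_right and k == b)

class Node:
    def __init__(self, nid):
        self.id = nid
        self.successor = None
        self.fingers = []

    def find_successor(self, key):
        if between(key, self.id, self.successor.id, incl_right=True):
            return self.successor
        return self.closest_preceding_node(key).find_successor(key)

    def closest_preceding_node(self, key):
        for f in reversed(self.fingers):       # longest finger first
            if between(f.id, self.id, key):
                return f
        return self.successor

def succ_id(key):
    i = bisect_left(IDS, key % RING)
    return IDS[i] if i < len(IDS) else IDS[0]

# Wire up successors and fingers for every node.
nodes = {i: Node(i) for i in IDS}
for n in nodes.values():
    n.successor = nodes[succ_id(n.id + 1)]
    n.fingers = [nodes[succ_id(n.id + 2 ** (i - 1))] for i in range(1, M + 1)]

print(nodes[3].find_successor(15).id)   # 0: item 15 lives at node 0
```

From node 3 the lookup for 15 takes the longest finger straight to node 12 and finishes in two hops, illustrating the halving behaviour discussed next.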
Chord – Routing (6/7)
Log(# of ids) or Log(# of nodes)?
Upon using the longest finger, you exclude half of the space, if nodes are uniformly distributed in the id space.
[Figure: fingers of node n at n+1, n+2, n+4, n+8, …; the longest finger jumps half-way across the ring, so the example lookups take 2 and 4 hops.]
Example - finger stabilization (2/5)
- Current situation: succ(N48) is N60
- Succ(N21.Finger_j.start) = Succ(53) = N21.Finger_j.node = N60
[Figure: finger j of N21 starts at id 53 and points to node N60.]
Example - finger stabilization (3/5)
- New node N56 joins and stabilizes its successor pointer
- Finger j of node N21 is now wrong
- N21 eventually tries to fix finger j by looking up 53, but the lookup stops at N48 and nothing changes
[Figure: N48 still points past the new node N56, so the lookup for 53 returns the stale answer.]
Example - finger stabilization (4/5)
- N48 will eventually stabilize its successor
- This means the ring is correct now
[Figure: N48's successor now points to N56.]
Example - finger stabilization (5/5)
- When N21 tries to fix finger j again, this time the response from N48 will be correct and N21 corrects the finger
[Figure: finger j of N21 now points to N56.]
Agenda
Handling data
Joins, Leaves
Handling successor pointers
Joins, Leaves,
Scalability
Routing table: reducing the cost from O(N) to O(log N)
Failures (for all the above)
Handling Failures –
Replication of Successors & Data
- Evidently, the failure of one successor pointer means total collapse
- Solution: a node has a "successors list" of size r containing its immediate r successors
- How big should r be? log(N) or a large constant should be ok
- Enables stabilization to handle failures
- Similarly for data: each node replicates its content on the nodes in its successors list
- Works perfectly with the sequential search
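A sketch of the successors-list bookkeeping (r = 3 and the node ids are illustrative; N21/N26/N32 echo the failure example in the following slides):

```python
# Sketch of the successors-list idea: keep r successors so one failure
# does not break the ring. r and the node ids are illustrative.

def update_successor_list(s, s_list, r):
    """Adopt s as first successor and copy its list, dropping the last entry."""
    return ([s] + s_list)[:r]

def first_alive(successor_list, alive):
    """Sequential search: the first non-failed entry becomes the successor."""
    for s in successor_list:
        if s in alive:
            return s
    raise RuntimeError("all r successors failed at once")

r = 3
succ_list_n21 = [26, 32, 38]        # N21's r immediate successors
alive = {21, 32, 38}                # N26 has failed
print(first_alive(succ_list_n21, alive))             # 32
print(update_successor_list(26, [32, 38, 40], r))    # [26, 32, 38]
```

The ring only breaks if all r successors fail simultaneously, which r = log(N) makes very unlikely.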
Handling Failures - Ring (1/5)
Maintaining the ring:
- Each node maintains a successor list of length r
- If a node's immediate successor does not respond, it uses the second entry in its successor list
- updateSuccessorList copies a successor list from s: removing the last entry, and prepending s

Join a Chord containing node n'
procedure n.join(n')
  predecessor := nil
  s := n'.findSuccessor(n)
  updateSuccessorList(s.successorList)
Handling Failures - Ring (2/5)
Check whether predecessor has failed
procedure n.checkPredecessor()
  if predecessor has failed then
    predecessor := nil
Handling Failures - Ring (3/5)
procedure n.stabilize()
  s := find first alive node in successorList
  x := s.predecessor
  if x ≠ nil and x ∈ (n, s) then s := x end
  updateSuccessorList(s.successorList)
  s.notify(n)

procedure n.notify(n')
  if predecessor = nil or n' ∈ (predecessor, n) then
    predecessor := n'
Failure – Ring (4/5)
Example – node failure (N26)
[Figure: initially, suc(N21,1) = N26, suc(N21,2) = N32, suc(N26,1) = N32, and pred(N32) = N26.]
Failure – Ring (5/5)
Example - node failure (N26)
- After N21 performs stabilize(): suc(N21,1) becomes N32, but N21.notify(N32) has no effect, since pred(N32) is still N26
- After N32.checkPredecessor(): pred(N32) becomes nil, so a later notify from N21 can set it
Failure – Lookups (3/5)
Example - finger stabilization: N56 fails
[Figure: finger j of N21 points at the failed node N56.]
Failure - Lookups (4/5)
Example - finger stabilization: N56 fails
- Successor list stabilization corrects the ring
Failure - Lookups (5/5)
Example - finger stabilization: N56 fails
- Successor list stabilization corrects the ring
- N21.fixFingers() for finger j calls N21.findSuccessor(53), which returns from node N48
[Figure: finger j of N21 is repaired.]
What should the VALUE in put(key, value) be?
If we were storing the files themselves, replication can be costly and access time might be high; however, it might be needed in certain applications like distributed file systems.
For file sharing and other applications, it would be more suitable to store addresses of files rather than the files themselves; however, stale addresses could be a problem. One solution is to let the values expire and have the publisher refresh them periodically.
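One way to sketch the expiring-address idea (the TTL value and the API names are illustrative assumptions):

```python
# Sketch of "store addresses, let them expire": each value carries a
# publish timestamp and the publisher must refresh it before the TTL
# runs out. TTL and API names are illustrative.
import time

TTL = 3600.0    # seconds a published address stays valid

class ExpiringStore:
    def __init__(self):
        self.data = {}                       # key -> (value, publish_time)

    def put(self, key, value, now=None):
        self.data[key] = (value, now if now is not None else time.time())

    def get(self, key, now=None):
        now = now if now is not None else time.time()
        entry = self.data.get(key)
        if entry is None or now - entry[1] > TTL:
            return None                      # stale address has expired
        return entry[0]

s = ExpiringStore()
s.put("foo.mp3", "x.imit.kth.se", now=0.0)
print(s.get("foo.mp3", now=100.0))      # x.imit.kth.se (still fresh)
print(s.get("foo.mp3", now=7200.0))     # None (publisher never refreshed)
```

A peer that goes offline simply stops refreshing, and its stale addresses disappear on their own.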
Applications
File sharing:
eMule and BitTorrent are among the many popular file-sharing applications currently using DHTs
Decentralized directories of all kinds; work has been done on using DHTs to replace:
DNS (MIT)
SIP discovery (Cisco)
Trackers in BitTorrent
Decentralized file systems:
Past & Farsite (Microsoft)
Oceanstore (Berkeley)
Keso (KTH)
SICS, KTH
Distributed backup
Decentralized command & control for military apps