
Peer-to-Peer Computing

Sameh El-Ansary
Nile University

P2P: Why should we care?
[Figure: UW campus TCP bandwidth (Mbps) over one week, broken down into WWW, Akamai, P2P, and non-HTTP TCP traffic]
Breakdown of UW TCP bandwidth into HTTP Components (May 2002)

• WWW = 14% of TCP traffic; P2P = 43% of TCP traffic


• P2P dominates WWW in bandwidth consumed!!
Source: Hank Levy. See http://www.cs.washington.edu/research/networking/websys/pubs/osdi_2002/osdi.pdf
Not interested yet? What about…?
 “In September 2005, eBay acquired Skype
for €2.5 billion in cash and stock, plus an
additional 1.5 billion in rewards if goals are
met by 2008” ….
 The message is:
 P2P is a vital topic in today’s information
systems and definitely an essential part of
any distributed systems course.

Outline
 What is P2P?
 What problems come with it?
 Evolution (Discovery Related)
 1st Generation:
 Napster
 2nd Generation: Flooding-Based systems
 Gnutella
 3rd Generation: Distributed Hash Tables
 Chord, DKS, Pastry etc…
 Applications
What is Peer-To-Peer Computing? (1/3)

Oram (First book on P2P):



P2P is a class of applications that:
 Takes advantage of resources (storage, CPU, etc.) available at the edges of the Internet.
 Because accessing these decentralized resources means
operating in an environment of unstable connectivity and
unpredictable IP addresses, P2P nodes must operate
outside the DNS system and have significant or total
autonomy from central servers.

What is Peer-To-Peer Computing? (2/3)

 P2P Working Group (A Standardization Effort):


P2P computing is:
 The sharing of computer resources and services
by direct exchange between systems.
 Peer-to-peer computing takes advantage of
existing computing power and networking
connectivity, allowing economical clients to
leverage their collective power to benefit the
entire enterprise.

What is Peer-To-Peer Computing? (3/3)

 Our view: P2P computing is distributed computing with the following desirable properties:
 Resource Sharing
 Dual client/server role
 Decentralization/Autonomy
 Scalability
 Robustness/Self-Organization

Peer-to-Peer Research Issues
 Discovery:
 Where are things?
 Content Distribution:
 How fast can we get things?
 NAT/Firewalls
 Jumping over them
 Legal issues
 Copyright laws
 Security
 Anonymity
 …
 …
Let us see how it all started…
 Users storing data items on their machines
 Other users are interested in this data
 Problem: How does a user know which other user(s) in the world have the data item(s) that they desire?

[Diagram: two machines, Hope.sics.se and x.imit.kth.se, each storing mp3 files (Hello.mp3, Britney.mp3, bye.mp3, foo.mp3, …); a user asks "Where is foo.mp3?"]
1st Generation of P2P systems
(Central Directory + Distributed Storage)

Solution: build a central directory (Napster).
[Diagram: a central directory maps file names to the machines that hold them, e.g. hello.mp3 {hope.sics.se, x.imit.kth.se}, britney.mp3 {hope.sics.se}, foo.mp3 {x.imit.kth.se}, bye.mp3 {x.imit.kth.se}; the actual data transfer happens directly between the peers Hope.sics.se and x.imit.kth.se]
Basic Operations in Napster
 Join
 Connect to the central server (Napster)
 Leave/Fail
 Simply disconnect
 Server detects failure, removes your data from the
directory
 Share (Publish/Insert)
 Inform the server about what you have
 Search (Query)
 Ask the central server and it returns a list of hits
 Download
 Directly download from other peers using the hits provided by the server
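
To make these operations concrete, here is a minimal, hypothetical sketch of a Napster-style central directory in Python. The class and method names are ours and this is not the real Napster protocol; the download itself still happens directly between peers.

class Directory:
    def __init__(self):
        self.index = {}                    # file name -> set of peer addresses

    def publish(self, peer, files):        # Share: tell the server what you have
        for f in files:
            self.index.setdefault(f, set()).add(peer)

    def search(self, name):                # Query: the server returns a list of hits
        return sorted(self.index.get(name, set()))

    def leave(self, peer):                 # Leave/Fail: the server drops your entries
        for peers in self.index.values():
            peers.discard(peer)

directory = Directory()
directory.publish("hope.sics.se", ["hello.mp3", "britney.mp3", "bye.mp3"])
directory.publish("x.imit.kth.se", ["foo.mp3", "hello.mp3"])
print(directory.search("hello.mp3"))       # -> ['hope.sics.se', 'x.imit.kth.se']
# The actual file transfer then happens directly between the two peers.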
The End of Napster
 Since users of Napster stored copyrighted
material, the service was stopped for legal
reasons
 But…. a second generation appeared…
 Discussion
 Scalability
 Failure

2nd Generation: Random Overlay Networks
(Distributed Directory + Distributed Storage)
Main representatives:
Gnutella
Freenet
[Diagram: a random overlay in which every node provides both storage and a part of the directory]
Gnutella Protocol Messages
 Broadcast Messages
 Ping: initiating message (“I’m here”)
 Query: search pattern and TTL (time-to-live)
 Back-Propagated Messages
 Pong: reply to a ping, contains information about the peer
 Query response: contains information about the computer
that has the needed file
 Node-to-Node Messages
 GET: return the requested file
 PUSH: push the file to me

Gnutella Search Mechanism
[Diagram: seven nodes (1-7) in a random overlay; a query for file A floods outward from node 2, and replies from the nodes holding A travel back along the reverse path]
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
• Neighbors forward message
• Nodes that have file A initiate a reply message
• Query reply message is back-propagated
• File download
• Note: file transfer between clients behind firewalls is not possible; if only one client, X, is behind a firewall, Y can request that X push the file to Y
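
As a rough sketch of the flooding search above (our own simplified model with a made-up topology, not the actual Gnutella wire protocol): each query carries a TTL, every node forwards it to all its neighbors, and hits are returned along the reverse path.

class Node:
    def __init__(self, node_id, files=()):
        self.id, self.files, self.neighbors = node_id, set(files), []

def flood_query(node, filename, ttl, visited=None):
    # Toy flooding: hits are back-propagated via the return values.
    visited = set() if visited is None else visited
    if node.id in visited:
        return []                              # already saw this query: drop it
    visited.add(node.id)
    hits = [node.id] if filename in node.files else []
    if ttl > 0:
        for neighbor in node.neighbors:        # forward to all neighbors
            hits += flood_query(neighbor, filename, ttl - 1, visited)
    return hits

# Tiny overlay loosely mirroring the slides: node 2 searches, 5 and 7 have file A.
nodes = {i: Node(i) for i in range(1, 8)}
nodes[5].files.add("A"); nodes[7].files.add("A")
edges = [(2, 1), (2, 3), (1, 4), (3, 6), (4, 7), (6, 5), (4, 5)]
for a, b in edges:
    nodes[a].neighbors.append(nodes[b]); nodes[b].neighbors.append(nodes[a])
print(flood_query(nodes[2], "A", ttl=4))       # hits from nodes 7 and 5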
Gnutella Summary
 Uses a flooding-based algorithm
 Simple
 Robust
 Created too much control traffic
 Actually became a problem for ISPs
 Low guarantees
 The problem motivated academic research to create
a better solution
 Discussion:
 How do we know the first nodes to connect to?
 Can we reduce this traffic without drastic changes?
What did we cover so far?

 What is P2P?
 What problems come with it?
 Evolution (Discovery Related)
 1st Generation:
 Napster
 2nd Generation: Flooding-Based systems
 Gnutella
 3rd Generation: Distributed Hash Tables (will cover today)
 Chord, DKS, Pastry etc…
 Applications
Key idea in DHTs
1st, 2nd Generation: Each data item is stored on the machine of its creator/downloader.
Third Generation (DHTs): The id of a data item determines the machine on which it is going to be stored.
Simplest Example: Consistent Hashing using a Ring.
Consistent Hashing using a Ring (1/6)

[Diagram: four machines (123.55.17.20, hi.boo.com, foo.kth.se, course.ericsson.com), some of which have data they are willing to share (Plan.doc, Sip.ppt, cash.xls)]
- Four machines, some of which have data they are willing to share
- In the 1st and 2nd generations, those data items would be stored at their creators
- That is not the case any more in DHTs
Consistent Hashing using a Ring (2/6)
[Diagram: the four machines and their hashed ids: hash(123.55.17.20) = 3, hash(hi.boo.com) = 12, hash(foo.kth.se) = 0, hash(course.ericsson.com) = 7; the documents hash to ids 2, 12 and 14]
- Each machine will have an id based on the hash of its IP address
- SHA-1 or MD5 could be used for this hashing
- Naturally, all of them are using the same hashing function
- Data will also have ids using the same function: hash(Plan.doc), hash(sip.ppt), hash(cash.xls)
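
A small sketch of how such ids could be computed (our own illustration using Python's hashlib; SHA-1 is truncated to an m-bit ring, so the resulting values will not match the example ids shown on this slide):

import hashlib

M = 4                                   # 4-bit ids: a ring of 16 positions (0-15)

def ring_id(name, m=M):
    # Hash a machine name or a file name onto the circular id space.
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

for name in ["123.55.17.20", "hi.boo.com", "foo.kth.se", "Plan.doc", "cash.xls"]:
    print(name, "->", ring_id(name))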
Consistent Hashing using a Ring (3/6)
- In the following example, we are using a hashing function with a range of 0-15, i.e. with a maximum of 16 nodes
- We treat this range as a circular id space
- Succ(x) is the first node on the ring with id greater than or equal to x, where x is the id of a node or a document
- Every node n has one successor pointer to Succ(n)
- Thus, the nodes are forming a ring
- Q: how can we build this ring? Failures, joins etc…
[Diagram: the circular id space 0-15]
Consistent Hashing using a Ring (4/6)

- Using this ring, we can decide which document is stored at which node
- Initially, node 0 stored doc 2 and node 7 stored docs 9 and 14
- However, using a DHT scheme for storing files, this will not be the case
[Diagram: the ring 0-15 with nodes 0, 3, 7 and 12; the documents still sit at their creators]
Consistent Hashing using a Ring (5/6)
- The policy is: a doc with id y would be stored at Succ(y)
- So, node 0 gets to store doc 14, node 3 gets to store doc 2, and node 12 gets to store doc 12
- But how can we do this?
- Simple: if the successor pointers are already there, the two operations, get and put, would simply be done by following them sequentially
- From any node, you can do: put( hash(plan.doc), plan.doc )
- From any node, you can also do: get( hash(plan.doc) )
[Diagram: the ring with each doc now stored at the successor of its id]
Consistent Hashing using a Ring
(6/6)
// ask node n to find the successor of id
procedure n.findSuccessor(id)
if id ∈ (n, successor] then
return successor
else // forward the query around the circle
return successor.findSuccessor(id)

 (a, b] is the segment of the ring moving clockwise from, but not including, a until and including b
 n.foo(.) denotes an RPC of foo(.) to node n
 n.bar denotes an RPC to fetch the value of the variable bar in node n
 We call the process of finding the successor of an id a LOOKUP

Consistent Hashing using a Ring
(6/6)
// store or retrieve a value at the successor of its id
procedure n.put(id, value)
s := findSuccessor(id)
s.store(id, value)

procedure n.get(id)
s := findSuccessor(id)
return s.retrieve(id)

 PUT and GET are nothing but lookups!!
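
A minimal executable sketch of the last two slides, assuming nothing beyond what they describe: a ring of nodes with only successor pointers, an O(N) sequential lookup, and put/get built on top of it. The in_interval helper implements the circular (a, b] test; all names are ours.

def in_interval(x, a, b, m=4):
    # True if x lies in the circular interval (a, b] on a 2**m ring.
    x, a, b = x % 2**m, a % 2**m, b % 2**m
    if a == b:
        return True                           # single-node ring: everything matches
    return (a < x <= b) if a < b else (x > a or x <= b)

class RingNode:
    def __init__(self, node_id):
        self.id, self.successor, self.store = node_id, self, {}

    def find_successor(self, key):            # O(N): walk the successor pointers
        if in_interval(key, self.id, self.successor.id):
            return self.successor
        return self.successor.find_successor(key)

    def put(self, key, value):                # put = lookup, then store
        self.find_successor(key).store[key] = value

    def get(self, key):                       # get = lookup, then retrieve
        return self.find_successor(key).store.get(key)

# Build a small ring 0 -> 3 -> 12 -> 0 by hand and store a document.
n0, n3, n12 = RingNode(0), RingNode(3), RingNode(12)
n0.successor, n3.successor, n12.successor = n3, n12, n0
n3.put(14, "plan.doc")                        # key 14 wraps around to node 0
print(n3.get(14))                             # -> plan.doc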


Consistent Hashing using a Ring - Discussion
 We are basically done
 But….
 What about joins and failures?
 Nodes come and go as they wish
 How good or bad is this solution:
 How efficient is this solution? O(N)? Redundant
traffic?
 Should I lose my doc because some kid decided to
shut down his machine and he happened to store
my file? What about storing addresses of files
instead of files?
 What did we gain compared to Gnutella? Increased
guarantees and determinism?
 So actually we just started..
Agenda
 Handling data
 Joins, Leaves
 Handling successor pointers
 Joins, Leaves
 Scalability
 Routing table reducing the cost from O(N) to
O(logN)
 Failures (for all the above)

Handling data - Joins (1/2)

- A node with id 9 can come at any time…
[Diagram: the ring 0-15 with nodes 0, 3, 7 and 12 holding docs 2, 8, 9, 12 and 14; node 9 is about to join]
Handling data - Joins (2/2)
- Node 9 takes over docs 8 and 9 from its successor, node 12
- Note that we assumed that the successor pointers were correct after the join and the data was handled accordingly
[Diagram: node 9 is now on the ring and holds docs 8 and 9]
Handling data - Leaves (1/2)
- Node 12 can decide to leave gracefully
[Diagram: the ring before node 12 leaves]
Handling data - Leaves (2/2)
- Node 12 hands over its data to its successor, which is node 14
- Note that we assumed that the successor pointers were correct after the leave and the data was handled accordingly
[Diagram: node 12 has left the ring and its data is now at its successor]
Agenda
 Handling data
 Joins, Leaves
 Handling successor pointers
 Joins, Leaves,
 Scalability
 Routing table reducing the cost from O(N) to
O(logN)
 Failures (for all the above)

Handling Successors
 Everything depends on successor pointers, so we had better have them right all the time!
 Chord, a system authored at MIT, was one of the leading DHTs that provides a mechanism for doing that.
 In Chord, in addition to the successor pointer,
every node has a predecessor pointer as well

Handling Successors - Chord Algorithm

Handling Successors - Chord Algo - Join
[Diagram: node 9 joins the ring; its predecessor is nil and its successor becomes 12]
- 9 can join through any other node, take 0 for example
- 9 will set its predecessor to nil
- 9 will ask 0 to find succ(9), which will be 12
- 9 will set its successor to 12
Handling Successors - Chord Algo - Stabilization
(1/3)
[Diagram: node 9 is on the ring with successor 12; 12's predecessor still points to 7]
9 runs Stabilize( ), i.e.:
 First, 9 asks 12: who is your pred?
 Second, 9 notifies 12 that it thinks it is the pred of 12
Handling Successors - Chord Algo - Stabilization (2/3)

[Diagram: the same ring]
 12 gets 9's message, so 12 knows that 9 is now its pred instead of 7
Handling Successors - Chord Algo - Stabilization
(3/3)
[Diagram: the same ring]
 Finally, when 7 runs its stabilization, it will discover from 12 that pred(12) is now 9, so it sets its successor to 9 and notifies 9. [DONE]
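
The join and stabilization walk-through above can be captured in a compact sketch (our rendering of the idea; Chord's published pseudocode differs in details such as RPCs and failure handling). The small example at the bottom reproduces the sequence on these slides.

def between(x, a, b, inclusive=False, m=4):
    # Circular interval test on a 2**m ring: (a, b), or (a, b] if inclusive.
    x, a, b = x % 2**m, a % 2**m, b % 2**m
    if a == b:
        return inclusive or x != a
    if a < b:
        return a < x < b or (inclusive and x == b)
    return x > a or x < b or (inclusive and x == b)

class ChordNode:
    def __init__(self, node_id):
        self.id, self.successor, self.predecessor = node_id, self, None

    def find_successor(self, key):
        if between(key, self.id, self.successor.id, inclusive=True):
            return self.successor
        return self.successor.find_successor(key)

    def join(self, known):                      # join via any known node
        self.predecessor = None
        self.successor = known.find_successor(self.id)

    def stabilize(self):
        x = self.successor.predecessor          # ask successor: who is your pred?
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x                  # someone slipped in between us
        self.successor.notify(self)             # "I think I am your predecessor"

    def notify(self, candidate):
        if self.predecessor is None or between(
                candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

# Mirror the slides: ring 0 -> 7 -> 12 -> 0, then node 9 joins through node 0.
n0, n7, n12 = ChordNode(0), ChordNode(7), ChordNode(12)
n0.successor, n7.successor, n12.successor = n7, n12, n0
n0.predecessor, n7.predecessor, n12.predecessor = n12, n0, n7
n9 = ChordNode(9)
n9.join(n0)             # succ(9) = 12, predecessor = nil
n9.stabilize()          # 12 learns that 9 is now its predecessor (instead of 7)
n7.stabilize()          # 7 discovers pred(12) = 9 and adopts 9 as its successor
print(n7.successor.id, n12.predecessor.id, n9.predecessor.id)   # 9 9 7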
Handling Successors - Chord Algo - Leaves
 In fact, leaves and failures are treated the same by this algorithm, so we will cover them when we come to failures
Agenda
 Handling data
 Joins, Leaves
 Handling successor pointers
 Joins
 Scalability
 Routing table reducing the cost from O(N) to
O(logN)
 Failures (for all the above)

Chord – Routing (1/7 - 3/7)
[Diagram: a Get(15) request issued at node 1 is forwarded along the fingers until it reaches node 0, where item 15 is stored]
 Routing table size: M, where N = 2^M
 Every node n knows successor(n + 2^(i-1)), for i = 1..M
 Routing entries = log2(N)
 log2(N) hops from any node to any other node
Chord – Routing (4/7)
[Diagram: the Get(15) lookup path from node 1 to node 0]
 From node 1, only 2 hops to node 0 where item 15 is stored
 For an id space of 16 ids, the maximum is log2(16) = 4 hops between any two nodes
 In fact, if nodes are uniformly distributed, the maximum is log2(# of nodes), i.e. log2(8) = 3 hops between any two nodes
 The average complexity is: ½ log2(# of nodes)
Chord – Routing (5/7)
Pseudo code for findSuccessor(.)
// ask node n to find the successor of id
procedure n.findSuccessor(id)
if id ∈ (n, successor] then
return successor
else
n’ := closestPrecedingNode(id)
return n’.findSuccessor(id)

// search locally for the highest predecessor of id
procedure closestPrecedingNode(id)
for i = m downto 1 do
if finger[i] ∈ (n, id) then return finger[i]
end
return n
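
A sketch of finger-table routing matching the pseudocode above. It is illustrative only: the fingers are filled from a global view of the ring rather than by fixFingers (shown on a later slide), and the names are ours.

M = 4                                            # 16-position id space

def between(x, a, b, inclusive=False, m=M):
    # Circular interval test: (a, b), or (a, b] if inclusive.
    x, a, b = x % 2**m, a % 2**m, b % 2**m
    if a == b:
        return inclusive or x != a
    if a < b:
        return a < x < b or (inclusive and x == b)
    return x > a or x < b or (inclusive and x == b)

class FingerNode:
    def __init__(self, node_id, m=M):
        self.id, self.m = node_id, m
        self.successor = self
        self.finger = [self] * m                 # finger[i] should be succ(id + 2**i)

    def closest_preceding_node(self, key):
        for f in reversed(self.finger):          # try the longest finger first
            if between(f.id, self.id, key):
                return f
        return self

    def find_successor(self, key):
        if between(key, self.id, self.successor.id, inclusive=True):
            return self.successor
        return self.closest_preceding_node(key).find_successor(key)

# Build nodes 0, 3, 7, 12 and fill their fingers from a global view.
ids = [0, 3, 7, 12]
nodes = {i: FingerNode(i) for i in ids}
def succ_of(x):                                  # first node id >= x (with wrap-around)
    x %= 2 ** M
    return min((i for i in ids if i >= x), default=min(ids))
for n in nodes.values():
    n.successor = nodes[succ_of(n.id + 1)]
    n.finger = [nodes[succ_of(n.id + 2 ** i)] for i in range(n.m)]

print(nodes[3].find_successor(15).id)            # -> 0, reached via the finger to 12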

Chord – Routing (6/7)
Log(# of ids) or Log(# of nodes)?
Routing table of a Chord node n: every node should know the first node in the 1/2, 1/4, 1/8, …, 1/2^M fractions of the identifier space starting from n, i.e. Succ(n+1), Succ(n+2), Succ(n+4), Succ(n+8), …, Succ(n+2^(M-1)).
Upon using the longest finger, you exclude half of the id space; if nodes are uniformly distributed in the space, you have also excluded half of the nodes.
Chord – Routing (7/7)
Average Lookup Length
[Plot: distribution of the number of hops per lookup, with markers at 2 hops and 4 hops]
 The # of hops to reach different keys varies between 1 and log2(N); on average, it is ½ log2(N)
Handling Join/Leaves For Fingers
Finger Stabilization (1/5)
 Periodically refresh finger table entries, and
store the index of the next finger to fix
 This is also the initialization procedure for the
finger table
 Local variable next initially 0
procedure n.fixFingers()
next := next + 1
if next > m then next := 1
finger[next] := findSuccessor(n + 2^(next-1))
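
Continuing the finger-table sketch from the routing slides, fixFingers could look roughly like this fragment. It is not standalone: it assumes a node object with m, finger and find_successor as in that sketch, and the returned function should be invoked periodically on each node.

import itertools

def make_fix_fingers(node):
    # Returns a function that refreshes one finger per call, cycling
    # round-robin through the table; call it periodically on each node.
    counter = itertools.cycle(range(node.m))
    def fix_fingers():
        i = next(counter)
        start = (node.id + 2 ** i) % (2 ** node.m)
        node.finger[i] = node.find_successor(start)
    return fix_fingers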

Example
finger stabilization (2/5)
 Current situation: succ(N48) is N60
 Succ(N21.Finger_j.start) = Succ(53) = N60 = N21.Finger_j.node
[Diagram: nodes N21 … N60 on the id line; Finger_j.start = 53 and Finger_j.node = N60]
Example
finger stabilization (3/5)
 New node N56 joins and stabilizes its successor pointer
 Finger j of node N21 is now wrong
 N21 eventually tries to fix finger j by looking up 53; the lookup stops at N48, which still answers N60, so nothing changes
[Diagram: N56 has joined between 53 and N60 on the id line]
Example
finger stabilization (4/5)
 N48 will eventually stabilize its successor
 This means the ring is correct now
[Diagram: the id line with N56 now properly linked between N48 and N60]
Example
finger stabilization (5/5)
 When N21 tries to fix Finger j again, this time the response from N48 will be correct and N21 corrects the finger
[Diagram: N21.Finger_j.node now points to N56]
Agenda
 Handling data
 Joins, Leaves
 Handling successor pointers
 Joins, Leaves,
 Scalability
 Routing table reducing the cost from O(N) to
O(logN)
 Failures (for all the above)

Handling Failures –
Replication of Successors & Data
 Evidently, the failure of one successor pointer means total collapse
 Solution: a node has a “successors list” of size r containing the immediate r successors
 How big should r be? log(N) or a large constant should be ok
 Enables stabilization to handle failures
 Similarly for data: each node replicates its content on the nodes in its successors list
 Works perfectly with the sequential search
Handling Failures- Ring (1/5)
 Maintaining the ring
 Each node maintains a successor list of length r
 If a node’s immediate successor does not respond, it uses
the second entry in its successor list
 updateSuccessorList copies a successor list from s:
removing last entry, and prepending s
 Join a Chord ring containing node n’
procedure n.join(n’)
predecessor := nil
s := n’.findSuccessor(n)
updateSuccessorList(s.successorList)

Handling Failures- Ring (2/5)
 Check whether predecessor has failed
procedure n.checkPredecessor()
if predecessor has failed then
predecessor := nil

Handling Failures- Ring (3/5)
procedure n.stabilize()
s := find first alive node in successorList
x := s.predecessor
if x is not nil and x ∈ (n, s) then s := x end
updateSuccessorList(s.successorList)
s.notify(n)

procedure n.notify(n’)
if predecessor = nil or n’ ∈ (predecessor, n) then
predecessor := n’
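
A hedged sketch of the failure-tolerant stabilization above. It is our simplification: a failed node is modelled by an alive flag rather than a failed RPC, and the node ids are chosen to mirror the example on the following slides.

M, R = 6, 3                                       # id bits, successor-list length

def between(x, a, b, m=M):
    # Circular open interval (a, b).
    x, a, b = x % 2**m, a % 2**m, b % 2**m
    if a == b:
        return x != a
    return (a < x < b) if a < b else (x > a or x < b)

class FTNode:
    def __init__(self, node_id):
        self.id, self.alive = node_id, True
        self.predecessor = None
        self.successor_list = [self] * R          # the immediate r successors

    def first_alive_successor(self):
        return next(s for s in self.successor_list if s.alive)

    def update_successor_list(self, s):
        # Prepend s, copy its list, and drop the last entry.
        self.successor_list = ([s] + s.successor_list)[:R]

    def stabilize(self):
        s = self.first_alive_successor()
        x = s.predecessor
        if x is not None and x.alive and between(x.id, self.id, s.id):
            s = x
        self.update_successor_list(s)
        s.notify(self)

    def notify(self, candidate):
        if self.predecessor is None or between(
                candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

    def check_predecessor(self):
        if self.predecessor is not None and not self.predecessor.alive:
            self.predecessor = None

# Mirror the example on the next slides: N21 -> N26 -> N32, then N26 fails.
n21, n26, n32 = FTNode(21), FTNode(26), FTNode(32)
n21.successor_list = [n26, n32, n21]
n26.successor_list = [n32, n21, n26]
n32.successor_list = [n21, n26, n32]
n21.predecessor, n26.predecessor, n32.predecessor = n32, n21, n26
n26.alive = False
n21.stabilize()           # skips the dead N26; N21.notify(N32) has no effect yet
n32.check_predecessor()   # N32 notices that its predecessor N26 has failed
n21.stabilize()           # now N32 accepts N21 as its predecessor
print(n21.first_alive_successor().id, n32.predecessor.id)    # 32 21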

Failure – Ring (4/5)
Example – Node failure (N26)
 Initially:
[Diagram: N21 → N26 → N32 with suc(N21,1) = N26, suc(N21,2) = N32, suc(N26,1) = N32, pred(N32) = N26]
 After N21 performed stabilize(), before N21.notify(N32):
[Diagram: suc(N21,1) = N32; pred(N32) still points to the failed N26]
Failure – Ring (5/5)
Example - Node failure (N26)
 After N21 performed stabilize(), before N21.notify(N32)
 N21.notify(N32) has no effect
[Diagram: suc(N21,1) = N32; pred(N32) still points to the failed N26]
 After N32.checkPredecessor()
[Diagram: suc(N21,1) = N32; pred(N32) = nil]
 Next, N21.stabilize() fixes N32’s predecessor
Failure – Lookups (1/5)
// ask node n to find the successor of id
procedure n.findSuccessor(id)
if id ∈ (n, successor] then
return successor
else
n’ := closestPrecedingNode(id)
try
return n’.findSuccessor(id)
catch failure of n’ then
mark n’ in finger[.] as failed
return n.findSuccessor(id)

// search locally for the highest predecessor of id
procedure closestPrecedingNode(id)
for i = m downto 1 do
if finger[i].node is alive and finger[i] ∈ (n, id) then return finger[i]
end
return n
Failure – Lookups (2/5)
 As long as the ring is correct, findSuccessor always finds the successor of an identifier

Failure – Lookups (3/5)
Example- finger stabilization
 N56 fails

[Diagram: the id line N21 … N60; finger j of N21 points at the failed N56]
Failure - Lookups(4/5)
Example - finger stabilization
 N56 fails
 Successor list stabilization corrects the ring

[Diagram: the ring is repaired around the failed N56 using the successor lists]
Failure - Lookups(5/5)
Example - finger stabilization
 N56 fails
 Successor list stabilization corrects the ring
 N21.fixFingers() for finger j calls N21.findSuccessor(53)
 Returns from node N48

[Diagram: finger j of N21 is re-pointed past the failed N56]
What should the VALUE in
put(key,value) be?
 If we were storing files, replication can be costly and access time might be high; however, it might be needed in certain applications like distributed file systems
 For file sharing and other applications, it would be more suitable to store addresses of files rather than the files themselves; however, stale addresses could be a problem. One solution is to let the values expire and have the publisher refresh them periodically.
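
One possible way to realize the expire-and-refresh idea (a sketch; the TTL value, class and method names are ours, chosen only for illustration):

import time

TTL_SECONDS = 3600                            # example lifetime for a stored address

class ExpiringStore:
    def __init__(self):
        self.items = {}                       # key -> (value, expiry timestamp)

    def put(self, key, value, ttl=TTL_SECONDS):
        self.items[key] = (value, time.time() + ttl)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.time() > expires:             # stale entry: drop it
            del self.items[key]
            return None
        return value

store = ExpiringStore()
store.put(14, ["hope.sics.se"])               # store addresses, not the file itself
print(store.get(14))                          # the publisher must re-put before expiry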
Applications
 FileSharing:
 eMule and BitTorrent are two of many popular file-sharing applications which currently use DHTs
 Decentralized directories of all kinds; work has been done on using DHTs to replace:
 DNS, MIT
 SIP discovery, Cisco
 Trackers in Bittorrent
 Decentralized file systems
 Past & Farsite, Microsoft
 Oceanstore, Berkeley
 Keso, KTH
 SICS, KTH
 Distributed backup
 Decentralized command & control for military apps