Peer-to-Peer Computing

Sameh El-Ansary
Nile University

5/18/2009


P2P: Why Should We Care?

[Figure: breakdown of UW TCP bandwidth into HTTP components (May 2002): Mbps over one week, split into non-HTTP TCP, P2P, WWW, and Akamai traffic.]

• WWW = 14% of TCP traffic; P2P = 43% of TCP traffic
• P2P dominates WWW in bandwidth consumed!!

Source: Hank Levy. See http://www.cs.washington.edu/research/networking/websys/pubs/osdi_2002/osdi.pdf


Not interested yet? .. What about…?

“In September 2005, eBay acquired Skype for €2.5 billion in cash and stock, plus an additional 1.5 billion in rewards if goals are met by 2008.”

The message is: P2P is a vital topic in today’s information systems and definitely an essential part of any distributed systems course.


Outline

• What is P2P? What problems come with it?
• Evolution (discovery-related):
  - 1st Generation: Napster
  - 2nd Generation: flooding-based systems (Gnutella)
  - 3rd Generation: Distributed Hash Tables (Chord, DKS, Pastry, etc.)
• Applications


What is Peer-To-Peer Computing? (1/3)

Oram (first book on P2P): P2P is a class of applications that:
• Takes advantage of resources (storage, CPU, etc.) available at the edges of the Internet.
• Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, P2P nodes must operate outside the DNS system and have significant or total autonomy from central servers.


What is Peer-To-Peer Computing? (2/3)

P2P Working Group (A Standardization Effort): P2P computing is:

The sharing of computer resources and services by direct exchange between systems. Peer-to-peer computing takes advantage of existing computing power and networking connectivity, allowing economical clients to leverage their collective power to benefit the entire enterprise.


What is Peer-To-Peer Computing? (3/3)

Our view: P2P computing is distributed computing with the following desirable properties:
• Resource sharing
• Dual client/server role
• Decentralization/autonomy
• Scalability
• Robustness/self-organization


Peer-to-Peer Research Issues

• Discovery: where are things?
• Content distribution: how fast can we get things?
• NAT/firewalls: jumping over them
• Legal issues: copyright laws
• Security
• Anonymity
• ...


Let us see how it all started..

• Users store data items on their machines
• Other users are interested in this data
• Problem: how does a user know which other user(s) in the world have the data item(s) that he desires?

[Figure: peer hope.sics.se stores Hello.mp3, foo.mp3, Britney.mp3; peer x.imit.kth.se stores bye.mp3, Hello.mp3; a querying user asks "Where is foo.mp3?"]

1st Generation of P2P Systems (Central Directory + Distributed Storage)

Solution: build a central directory (Napster).

[Figure: the central directory maps each file to the peers holding it (bye.mp3: x.imit.kth.se; britney.mp3: hope.sics.se; hello.mp3: hope.sics.se, x.imit.kth.se; foo.mp3: hope.sics.se), while the actual data transfer happens directly between the peers.]

Basic Operations in Napster

• Join: connect to the central server (Napster)
• Leave/Fail: simply disconnect; the server detects the failure and removes your data from the directory
• Share (Publish/Insert): inform the server about what you have
• Search (Query): ask the central server; it returns a list of hits
• Download: download directly from other peers using the hits provided by the server
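A minimal sketch of this central-directory idea in Python (the CentralDirectory class and its publish/search/leave methods are hypothetical illustrations, not Napster's actual protocol):

from collections import defaultdict

# Minimal sketch of a Napster-style central directory (hypothetical names,
# not the real Napster protocol).
class CentralDirectory:
    def __init__(self):
        self.index = defaultdict(set)      # file name -> set of peer addresses

    def publish(self, peer, files):
        # Share: a joining peer tells the server what it has
        for name in files:
            self.index[name].add(peer)

    def search(self, name):
        # Query: the server returns the list of hits
        return sorted(self.index.get(name, set()))

    def leave(self, peer):
        # Leave/Fail: the server removes the peer's entries from the directory
        for peers in self.index.values():
            peers.discard(peer)

directory = CentralDirectory()
directory.publish("hope.sics.se", ["hello.mp3", "foo.mp3", "britney.mp3"])
directory.publish("x.imit.kth.se", ["bye.mp3", "hello.mp3"])
print(directory.search("hello.mp3"))       # ['hope.sics.se', 'x.imit.kth.se']
# The download itself then happens directly between the two peers.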


The End of Napster

• Since users of Napster stored copyrighted material, the service was shut down for legal reasons
• But… a second generation appeared
• Discussion:
  - Scalability
  - Failure


2nd Generation (Random Overlay Networks: Distributed Directory + Distributed Storage)

Main representatives: Gnutella, Freenet

[Figure: an overlay of peers connected at random, each node combining Storage + Directory.]

Gnutella Protocol Messages

Broadcast messages:
• Ping: initiating message ("I'm here")
• Query: search pattern and TTL (time-to-live)

Back-propagated messages:
• Pong: reply to a ping, contains information about the peer
• Query response: contains information about the computer that has the needed file

Node-to-node messages:
• GET: return the requested file
• PUSH: push the file to me

Gnutella Search Mechanism

[Figure: an overlay of nodes 1-7; node 2 searches for file A, which is stored at nodes 5 and 7.]

Steps:
• Node 2 initiates a search for file A
• It sends the query message to all its neighbors
• Neighbors forward the message
• Nodes that have file A initiate a reply message
• The query reply message is back-propagated along the query path
• File download
• Note: file transfer between clients that are both behind firewalls is not possible; if only one client, X, is behind a firewall, Y can request that X push the file to Y
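A small Python sketch of this TTL-limited flooding (the Node class, flood_search function, and the overlay topology below are illustrative assumptions, and the shared visited set is a simplification of Gnutella's per-node duplicate detection):

from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    files: set = field(default_factory=set)
    neighbors: list = field(default_factory=list)

def flood_search(node, filename, ttl, visited=None):
    # Return the ids of nodes holding `filename` reachable within `ttl` hops
    if visited is None:
        visited = set()
    visited.add(node.node_id)
    hits = {node.node_id} if filename in node.files else set()
    if ttl > 0:
        for neighbor in node.neighbors:
            if neighbor.node_id not in visited:
                hits |= flood_search(neighbor, filename, ttl - 1, visited)
    return hits

# Rebuild the example: node 2 searches for file "A", held by nodes 5 and 7.
nodes = {i: Node(i) for i in range(1, 8)}
nodes[5].files.add("A")
nodes[7].files.add("A")
for a, b in [(1, 2), (2, 3), (2, 4), (3, 5), (4, 6), (4, 7)]:
    nodes[a].neighbors.append(nodes[b])
    nodes[b].neighbors.append(nodes[a])
print(flood_search(nodes[2], "A", ttl=3))  # {5, 7}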


Gnutella Summary

• Uses a flooding-based algorithm
• Simple
• Robust
• Created too much control traffic
  - Actually became a problem for ISPs
• Low guarantees
• The problem motivated academic research to create a better solution
• Discussion:
  - How do we know the first nodes?
  - Can we reduce this traffic without drastic changes?


What did we cover so far?

• What is P2P? What problems come with it?
• Evolution (discovery-related):
  - 1st Generation: Napster
  - 2nd Generation: flooding-based systems (Gnutella)
  - 3rd Generation: Distributed Hash Tables (Chord, DKS, Pastry, etc.) <- will cover today
• Applications


Key Idea in DHTs

• 1st and 2nd generation: each data item is stored on the machine of its creator/downloader
• Third generation (DHTs): the id of a data item determines the machine on which it is going to be stored
• Simplest example: consistent hashing using a ring

Consistent Hashing using a Ring (1/6)

• Four machines (123.55.17.20, hi.boo.com, foo.kth.se, course.ericsson.com), some of which have data they are willing to share (Plan.doc, sip.ppt, cash.xls)
• In the 1st and 2nd generations, those data items would be stored at their creators
• That is no longer the case in DHTs

Consistent Hashing using a Ring (2/6)
• Each machine gets an id by hashing its address: hash(foo.kth.se) = 0, hash(123.55.17.20) = 3, hash(course.ericsson.com) = 7, hash(hi.boo.com) = 12
• SHA-1 or MD5 could be used for this hashing
• Naturally, all machines use the same hash function
• Data items also get ids using the same function: hash(Plan.doc) = 2, hash(sip.ppt) = 12, hash(cash.xls) = 14
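A quick Python sketch of this hashing step, assuming a 4-bit identifier space (16 ids) as in the example; the concrete ids above come from the lecture's illustration, and SHA-1 will generally yield different values:

import hashlib

M = 4                                      # 4-bit id space: ids 0..15

def ring_id(name):
    # Hash a node address or a file name onto the circular id space
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

for name in ["foo.kth.se", "123.55.17.20", "course.ericsson.com", "hi.boo.com",
             "Plan.doc", "sip.ppt", "cash.xls"]:
    print(name, "->", ring_id(name))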

Consistent Hashing using a Ring (3/6)
• In the following example, we use a hash function with a range of 0-15, i.e. with a maximum of 16 nodes
• We treat this range as a circular id space
• Succ(x) is the first node on the ring with an id greater than or equal to x, where x is the id of a node or a document
• Every node n has one successor pointer to Succ(n)
• Thus, the nodes form a ring
• Q: how can we build this ring and maintain it under failures, joins, etc.?

Consistent Hashing using a Ring (4/6)
• Using this ring, we can decide which document is stored at which node
• Initially, node 0 stored doc 2, and node 7 stored docs 12 and 14
• However, using a DHT scheme for storing files, this will no longer be the case

Consistent Hashing using a Ring (5/6)

• The policy is: a doc with id y is stored at Succ(y)
• So node 0 gets to store doc 14, node 3 stores doc 2, and node 12 stores doc 12
• But how can we do this?
• Simple: if the successor pointers are already there, the two operations get and put are done by following them sequentially
• From any node, you can do: put(hash(plan.doc), plan.doc)
• From any node, you can also do: get(hash(plan.doc))

Consistent Hashing using a Ring (6/6)

// ask node n to find the successor of id
procedure n.findSuccessor(id)
  if id ∈ (n, successor] then
    return successor
  else // forward the query around the circle
    return successor.findSuccessor(id)

• (a, b] is the segment of the ring moving clockwise from, but not including, a until, and including, b
• n.foo(.) denotes an RPC of foo(.) to node n
• n.bar denotes an RPC to fetch the value of the variable bar from node n
• We call the process of finding the successor of an id a LOOKUP


Consistent Hashing using a Ring: PUT and GET

procedure n.put(id, value)
  s := findSuccessor(id)
  s.store(id, value)

procedure n.get(id)
  s := findSuccessor(id)
  return s.retrieve(id)

PUT and GET are nothing but lookups!!
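A small Python sketch of this sequential version: each node knows only its successor, and put/get follow successor pointers around the ring until they reach succ(id). The RingNode class and the hard-coded example ring are assumptions for illustration:

class RingNode:
    """Sketch of a ring node with only a successor pointer (O(N)-hop lookups)."""

    def __init__(self, node_id, m=4):
        self.node_id = node_id
        self.size = 2 ** m
        self.successor = self              # fixed up when the ring is built
        self.store = {}

    def _in_range(self, x, a, b):
        # True if x lies in the circular interval (a, b]
        a, b, x = a % self.size, b % self.size, x % self.size
        if a < b:
            return a < x <= b
        return x > a or x <= b             # interval wraps around 0

    def find_successor(self, key):
        if self._in_range(key, self.node_id, self.successor.node_id):
            return self.successor
        return self.successor.find_successor(key)   # forward around the circle

    def put(self, key, value):
        self.find_successor(key).store[key] = value

    def get(self, key):
        return self.find_successor(key).store.get(key)

# Build the example ring 0 -> 3 -> 7 -> 12 -> 0 and store doc id 2 (plan.doc).
nodes = {i: RingNode(i) for i in (0, 3, 7, 12)}
nodes[0].successor, nodes[3].successor = nodes[3], nodes[7]
nodes[7].successor, nodes[12].successor = nodes[12], nodes[0]
nodes[7].put(2, "plan.doc")
print(nodes[12].get(2))                    # 'plan.doc', stored at succ(2) = node 3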

Consistent Hashing using a Ring: Discussion

• We are basically done… but:
• What about joins and failures?
  - Nodes come and go as they wish
• How good or bad is this solution?
  - How efficient is it? O(N) hops? Redundant traffic?
  - Should I lose my doc because some kid decided to shut down his machine and he happened to store my file?
  - What about storing addresses of files instead of the files themselves?
  - What did we gain compared to Gnutella? Increased guarantees and determinism?

So actually, we just started..

Agenda

• Handling data: joins, leaves
• Handling successor pointers: joins, leaves
• Scalability: a routing table reducing the lookup cost from O(N) to O(log N)
• Failures (for all of the above)

Handling data - Joins (1/2)

• A node with id 9 can join at any time

[Figure: a ring of nodes on the 0-15 id space; a new node with id 9 appears between nodes 7 and 12, where node 12 currently stores docs 8 and 9.]

Handling data - Joins (2/2)

• Node 9 takes over docs 8 and 9 from its successor, node 12
• Note that we assumed the successor pointers were already correct after the join, and the data was handled accordingly

[Figure: after the join, node 9 stores docs 8 and 9.]

Handling data - Leaves (1/2)

• Node 12 can decide to leave gracefully

[Figure: node 12 announces that it is leaving the ring.]

Handling data - Leaves (2/2)

• Node 12 hands over its data to its successor, which is node 14
• Note that we assumed the successor pointers were correct after the leave, and the data was handled accordingly

[Figure: node 12 has handed its data over to node 14 and left the ring.]

Agenda

• Handling data: joins, leaves
• Handling successor pointers: joins, leaves
• Scalability: a routing table reducing the lookup cost from O(N) to O(log N)
• Failures (for all of the above)

Handling Successors

• Everything depends on successor pointers, so we had better keep them correct all the time!
• Chord, a system developed at MIT, was one of the leading DHTs to provide a mechanism for doing exactly that
• In Chord, in addition to the successor pointer, every node has a predecessor pointer as well

Handling Successors - Chord Algorithm

[The join and stabilization steps of the algorithm are walked through on the following slides; a newly joining node starts with its predecessor set to nil.]

Handling Successors - Chord Algo - Join

• Node 9 can join through any other node; take node 0, for example
• Node 9 sets its predecessor to nil
• Node 9 asks node 0 to find succ(9), which will be node 12
• Node 9 sets its successor to node 12

[Figure: node 9 attaches to the ring with successor 12 and predecessor nil; the existing ring pointers are unchanged so far.]

Handling Successors - Chord Algo - Stabilization (1/3)

• Node 9 runs stabilize():
  - First, 9 asks its successor 12: "who is your predecessor?"
  - Second, 9 notifies 12 that it thinks it is the predecessor of 12

Handling Successors - Chord Algo - Stabilization (2/3)

• Node 12 gets 9's message, so 12 now knows that 9 is its predecessor instead of 7

Handling Successors - Chord Algo - Stabilization (3/3)

• Finally, when node 7 runs its stabilization, it discovers from 12 that pred(12) is now 9, so it sets its successor to 9 and notifies 9. [Done]
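A compact Python sketch of this stabilize/notify interplay (simplified: no failures, direct object references instead of RPCs; the StabNode class is a hypothetical illustration, not Chord's code):

class StabNode:
    def __init__(self, node_id, m=4):
        self.node_id = node_id
        self.size = 2 ** m
        self.successor = self
        self.predecessor = None

    def _between(self, x, a, b):
        # True if x lies strictly between a and b on the circular id space
        a, b, x = a % self.size, b % self.size, x % self.size
        if a < b:
            return a < x < b
        return x > a or x < b              # wraps around 0 (or a == b: whole ring)

    def stabilize(self):
        x = self.successor.predecessor
        if x is not None and self._between(x.node_id, self.node_id,
                                           self.successor.node_id):
            self.successor = x             # someone slipped in between us and our successor
        self.successor.notify(self)

    def notify(self, candidate):
        if self.predecessor is None or self._between(
                candidate.node_id, self.predecessor.node_id, self.node_id):
            self.predecessor = candidate

# Replay the example: 7 -> 12 is on the ring; node 9 has joined with successor 12.
n7, n9, n12 = StabNode(7), StabNode(9), StabNode(12)
n7.successor, n12.predecessor = n12, n7
n9.successor = n12                         # as set by join(); predecessor still None
n9.stabilize()                             # 9 notifies 12; 12 sets its predecessor to 9
n7.stabilize()                             # 7 learns pred(12) = 9, adopts 9 as successor
print(n7.successor.node_id, n12.predecessor.node_id)   # 9 9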

Handling Successors - Chord Algo - Leaves

• In fact, leaves and failures are treated the same way by this algorithm, so we will handle them when we come to failures

Agenda

• Handling data: joins, leaves
• Handling successor pointers: joins, leaves
• Scalability: a routing table reducing the lookup cost from O(N) to O(log N)
• Failures (for all of the above)

Chord - Routing (1/7 - 3/7)

• Routing table size: M, where N = 2^M is the size of the identifier space
• Every node n knows successor(n + 2^(i-1)), for i = 1..M
• Routing entries = log2(N)
• log2(N) hops from any node to any other node

[Figure: node 1 issues get(15); the query is forwarded along finger pointers around a 16-id ring until it reaches the node responsible for id 15.]

Chord - Routing (4/7)

• From node 1, only 2 hops to node 0, where item 15 is stored
• For an id space of 16, the maximum is log2(16) = 4 hops between any two nodes
• In fact, if nodes are uniformly distributed, the maximum is log2(# of nodes), i.e. log2(8) hops between any two nodes
• The average complexity is (1/2) log2(# of nodes)

Chord - Routing (5/7): Pseudocode for findSuccessor(.)

// ask node n to find the successor of id
procedure n.findSuccessor(id)
  if id ∈ (n, successor] then
    return successor
  else
    n' := closestPrecedingNode(id)
    return n'.findSuccessor(id)

// search locally for the highest predecessor of id
procedure n.closestPrecedingNode(id)
  for i = m downto 1 do
    if finger[i] ∈ (n, id) then
      return finger[i]
  end
  return n
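A Python sketch of the same idea, extending the RingNode class from the earlier sketch with a finger table (the FingerNode class and the offline build_fingers helper are assumptions for illustration; real Chord builds and repairs fingers incrementally):

class FingerNode(RingNode):
    """RingNode plus a finger table, giving O(log N)-hop lookups."""

    def __init__(self, node_id, m=4):
        super().__init__(node_id, m)
        self.m = m
        self.finger = [self] * m           # finger[i] ~ successor(n + 2**i)

    def build_fingers(self, all_nodes):
        # Offline construction from a global node list (for illustration only)
        ids = sorted(n.node_id for n in all_nodes)
        by_id = {n.node_id: n for n in all_nodes}
        for i in range(self.m):
            start = (self.node_id + 2 ** i) % self.size
            succ_id = next((x for x in ids if x >= start), ids[0])
            self.finger[i] = by_id[succ_id]
        self.successor = self.finger[0]

    def _in_open(self, x, a, b):
        # True if x lies strictly inside the circular interval (a, b)
        a, b, x = a % self.size, b % self.size, x % self.size
        if a < b:
            return a < x < b
        if a > b:
            return x > a or x < b
        return x != a                      # (a, a) covers the whole ring except a

    def closest_preceding_node(self, key):
        for f in reversed(self.finger):    # scan fingers from farthest to nearest
            if self._in_open(f.node_id, self.node_id, key):
                return f
        return self

    def find_successor(self, key):
        if self._in_range(key, self.node_id, self.successor.node_id):
            return self.successor
        return self.closest_preceding_node(key).find_successor(key)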


Chord - Routing (6/7): Log(# of ids) or Log(# of nodes)?

• Routing table of a Chord node n: every node should know the first node in 1/2, 1/4, ..., 1/2^log(N) of the identifier space, starting from n, i.e. Succ(n+1), Succ(n+2), Succ(n+4), ..., Succ(n+N/2)
• Upon using the longest finger, you exclude half of the id space; if nodes are uniformly distributed in the space, you have also excluded half of the nodes

[Figure: the fingers of node n at n+1, n+2, n+4, n+8, ... partition the id space up to n+N-1 into segments of 1/2, 1/4, 1/8, 1/16, ... of the ring.]

Chord - Routing (7/7): Average Lookup Length

• The number of hops to reach different keys varies between 1 and log(N); on average it is (1/2) log(N)

[Figure: example lookup lengths on the ring, e.g. some keys reached in 2 hops, others in 4.]
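A rough way to check these hop counts on the small example ring, reusing the FingerNode sketch above (lookup_hops is a hypothetical helper that counts forwarding steps):

def lookup_hops(node, key):
    # Follow closest-preceding-node hops until the key falls under node's successor
    hops = 0
    while not node._in_range(key, node.node_id, node.successor.node_id):
        node = node.closest_preceding_node(key)
        hops += 1
    return hops + 1                        # final hop to the successor holding the key

ring = [FingerNode(i) for i in (0, 3, 7, 12)]
for n in ring:
    n.build_fingers(ring)
lengths = [lookup_hops(n, k) for n in ring for k in range(16)]
print(max(lengths), sum(lengths) / len(lengths))   # worst-case and average hop count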

Handling Joins/Leaves for Fingers: Finger Stabilization (1/5)

• Periodically refresh finger table entries, and store the index of the next finger to fix
• This is also the initialization procedure for the finger table
• The local variable next is initially 0

procedure n.fixFingers()
  next := next + 1
  if next > m then
    next := 1
  finger[next] := findSuccessor(n + 2^(next-1))
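A direct Python transcription of this round-robin refresh against the FingerNode sketch above (fix_fingers is a hypothetical helper; the next counter is kept on the node):

def fix_fingers(node):
    # Called periodically; repairs one finger entry per call, round-robin
    node.next = getattr(node, "next", 0) + 1
    if node.next > node.m:
        node.next = 1
    start = (node.node_id + 2 ** (node.next - 1)) % node.size
    node.finger[node.next - 1] = node.find_successor(start)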


Example Finger Stabilization (2/5)

• Current situation: succ(N48) is N60
• Succ(N21.Finger_j.start) = Succ(53) = N21.Finger_j.node = N60

[Figure: ring with nodes N21, N26, N32, N48, N60; finger j of N21 starts at id 53 and points to N60.]

Example Finger Stabilization (3/5)

• New node N56 joins and stabilizes its successor pointer
• Finger j of node N21 is now wrong
• N21 eventually tries to fix finger j by looking up id 53; the lookup stops at N48, which still answers N60, so nothing changes

Example Finger Stabilization (4/5)

• N48 will eventually stabilize its successor pointer (to N56)
• This means the ring is correct now

[Figure: N48 now points to N56; finger j of N21 still points to N60.]

Example Finger Stabilization (5/5)

• When N21 tries to fix finger j again, this time the response from N48 will be correct and N21 corrects the finger

[Figure: finger j of N21 now points to N56.]

Agenda

• Handling data: joins, leaves
• Handling successor pointers: joins, leaves
• Scalability: a routing table reducing the lookup cost from O(N) to O(log N)
• Failures (for all of the above)

Handling Failures - Replication of Successors & Data

• Evidently, the failure of a single successor pointer means total collapse of the ring
• Solution: each node has a "successor list" of size r containing its r immediate successors
  - How big should r be? log(N) or a large constant should be ok
  - This enables stabilization to handle failures
• Similarly for data: each node replicates its content on the nodes in its successor list
  - This works perfectly with the sequential search
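A small Python sketch of maintaining and using such a successor list (hypothetical helpers mirroring updateSuccessorList from the next slides: prepend the new successor and drop the last entry):

from types import SimpleNamespace

R = 3                                      # successor-list length: log(N) or a constant

def update_successor_list(node, s):
    # Adopt s as the immediate successor; copy s's list, prepending s and dropping the tail
    node.successor_list = [s] + list(s.successor_list)[:R - 1]
    node.successor = node.successor_list[0]

def first_alive_successor(node, alive):
    # If the immediate successor has failed, fall back to the next entry in the list
    for s in node.successor_list:
        if s in alive:
            return s
    raise RuntimeError("all r successors failed at once")

# Tiny usage example with plain records standing in for nodes.
n32 = SimpleNamespace(node_id=32, successor=None, successor_list=[])
n26 = SimpleNamespace(node_id=26, successor=None, successor_list=[n32])
n21 = SimpleNamespace(node_id=21, successor=None, successor_list=[])
update_successor_list(n21, n26)            # n21's list becomes [n26, n32]
print(first_alive_successor(n21, alive={n32}).node_id)   # 32, since n26 has failed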

Handling Failures - Ring (1/5)

Maintaining the ring:
• Each node maintains a successor list of length r
• If a node's immediate successor does not respond, it uses the second entry in its successor list
• updateSuccessorList copies a successor list from s, removing the last entry and prepending s

Join a Chord ring containing node n':
procedure n.join(n')
  predecessor := nil
  s := n'.findSuccessor(n)
  updateSuccessorList(s.successorList)

Handling Failures - Ring (2/5)

Check whether the predecessor has failed:
procedure n.checkPredecessor()
  if predecessor has failed then
    predecessor := nil

Handling Failures - Ring (3/5)

procedure n.stabilize()
  s := first alive node in successorList
  x := s.predecessor
  if x ≠ nil and x ∈ (n, s) then
    s := x
  end
  updateSuccessorList(s.successorList)
  s.notify(n)

procedure n.notify(n')
  if predecessor = nil or n' ∈ (predecessor, n) then
    predecessor := n'

Failure - Ring (4/5): Example - Node Failure (N26)

• Initially: suc(N21,1) = N26, suc(N21,2) = N32, suc(N26,1) = N32, pred(N32) = N26
• N26 fails
• After N21 performs stabilize(), and before N21.notify(N32): suc(N21,1) = N32, while pred(N32) still points to the failed N26

Failure - Ring (5/5): Example - Node Failure (N26)

• After N21 performed stabilize(), and before N21.notify(N32): suc(N21,1) = N32
• N21.notify(N32) has no effect, since pred(N32) is still N26
• After N32.checkPredecessor(): pred(N32) = nil
• The next N21.stabilize() fixes N32's predecessor

Failure - Lookups (1/5)

// ask node n to find the successor of id
procedure n.findSuccessor(id)
  if id ∈ (n, successor] then
    return successor
  else
    n' := closestPrecedingNode(id)
    return try n'.findSuccessor(id)
           catch failure of n' then
             mark n' in finger[.] as failed
             n.findSuccessor(id)

// search locally for the highest predecessor of id
procedure n.closestPrecedingNode(id)
  for i = m downto 1 do
    if finger[i].node is alive and finger[i] ∈ (n, id) then
      return finger[i]
  end
  return n

Failure – Lookups (2/5)

As long as the ring is correct, findSuccessor always finds the successor of an identifier


Failure - Lookups (3/5): Example - Finger Stabilization

• N56 fails

[Figure: ring N21, N26, N32, N48, N56, N60; finger j of N21 (start id 53) points to the failed N56.]

Failure - Lookups (4/5): Example - Finger Stabilization

• N56 fails
• Successor-list stabilization corrects the ring

[Figure: N48's successor pointer now skips the failed N56 and points to N60.]

Failure - Lookups (5/5): Example - Finger Stabilization

• N56 fails
• Successor-list stabilization corrects the ring
• N21.fixFingers() for finger j calls N21.findSuccessor(53)
• The answer returns from node N48; finger j of N21 now points to N60

What should the VALUE in put(key, value) be?

• If we store the files themselves, replication can be costly and access time might be high; however, this might be needed in certain applications, such as distributed file systems
• For file sharing and other applications, it is more suitable to store addresses of files rather than the files themselves; however, stale addresses could then be a problem
• One solution is to let the values expire and have the publisher refresh them periodically
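A tiny Python sketch of this expire-and-refresh idea (the ExpiringStore class and the 300-second TTL are arbitrary illustrations):

import time

class ExpiringStore:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.items = {}                    # key -> (value, expiry time)

    def put(self, key, value):
        self.items[key] = (value, time.time() + self.ttl)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if time.time() > expiry:           # stale address: drop it silently
            del self.items[key]
            return None
        return value

# The publisher of plan.doc would re-put (hash(plan.doc) -> its own address) every
# few minutes; if the publisher goes offline, the stale address disappears by itself.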

Applications

• File sharing: eMule and BitTorrent are among the many popular file-sharing applications currently using DHTs
• Decentralized directories of all kinds; work has been done on using DHTs to replace:
  - DNS (MIT)
  - SIP discovery (Cisco)
  - Trackers in BitTorrent
• Decentralized file systems:
  - Past & Farsite (Microsoft)
  - OceanStore (Berkeley)
  - Keso (KTH)
• Distributed backup (SICS, KTH)
• Decentralized command & control for military apps