Professional Documents
Culture Documents
in Peer-to-Peer Networks
Introduction
• Churn is defined by the dynamics of peer
participation in peer-to-peer networks.
• Not easy to find a model that accurately
describes this behavior.
• Useful for design and evaluation of peer-to-
peer networks.
Study
Overall goal is to draw some conclusions from
measuring the arrival and departure time of peers
in three deployed peer-to-peer networks:
• Gnutella – unstructured file system
• BitTorrent – content distribution system
• Kad – distributed hash table (DHT)
Tools used:
• Cruiser – distributed peer-to-peer crawler to
collect centralized activity logs.
• Activity logs from BitTorrrent trackers.
Measurement collections
Pitfalls
• Missing data – measurement should be continuous for
the collected data to be meaningful.
• Biased peer selection – closed population versus all the
peers in the network.
• Handling long sessions – measurement window length
to capture long sessions.
• False negatives
• Handling brief events
• NAT devices
• Dynamic addresses
Group-Level Characterization
Distribution of Inter-Arrival Time
Weibull distribution:
• k=0.79 Debian
• k=0.53 Red Hat
• k=0.62 FlatOut
Distribution of Session Length
Weibull distribution:
• k=0.38, λ=42.4 Debian
• k=0.34, λ=21.3 Red Hat
• k=0.59, λ=41.9 FlatOut
Summary
• Most sessions are short (minutes).
• Some sessions are very long (days or weeks).
• Data is better described by Weibull or log-
normal distributions rather than exponential
distributions.
Lingering after download completion
• The availability of individual peers across two consecutive days are strongly
correlated.Furthermore, most observed peers appear only once per day.
Design Implications
The results suggest two reliable strategies for identifying long-
lived peers as follows:
• Randomly selecting a set of peers that are all active at the
same time is likely to capture long-lived peers, since the
distribution of uptime among peers up at any given
moment is weighted more heavily towards long-lived peers.
• Observed peers must be weighted by the number of times
they are observed. The key point is that observations made
over time become skewed towards the larger number of
short-lived peers if the observations are not weighted back
in favor of the long-lived peers.
Design Implications
Low cost bootstrapping mechanism that does not require a central
bootstrapping node:
• Since 20% - 30% of peers at any moment have an uptime longer
than one day, each peer can select and cache IP addresses of
several long-lived peers using one of the strategies described
above.
• To connect to the system at any later time, each peer can contact
cached long-lived peers to locate a participating peer in the system.
Maintaining a sufficiently large cache of long-lived peers ensures
that each peer can always successfully bootstrap to the system
without the need to contact a well-known centralized address.
• In contrast, selecting cached peers based on a first-in first-out
strategy, tends to generate a list of short-lived peers that are
unlikely to be available.
Design Implications
The strong correlation in the length of
consecutive sessions and availability implies that
a peer-to-peer application can roughly estimate
the duration of its session length (or availability
per day) based on its last observed behavior. The
advantage of this approach is that it does not
require the application to wait in order to
predict that the user may have a long session.