
Omer C. Semerci
Student ID: 903064978
5.1 [P2P -22] Tracing a Large-Scale Peer-to-Peer System: An Hour in the Life of Gnutella
Problems
Markatos conducted a study of protocol traffic in Gnutella, a popular large-scale P2P system, aimed at caching query/response traffic to reduce network load. Network traffic in Gnutella is bursty and contains a significant amount of temporal locality, meaning the same queries are submitted several times (between 2.5 and 5 times). He took measurements from Gnutella clients located in Greece, Norway, and Rochester for one hour and analyzed the effects on network traffic. Unsurprisingly, since Gnutella is an overlay network on top of the network connecting the peers, geographic location has no direct effect on the number of queries each peer receives. The main problem was that 40% of the queries were submitted more than once, and this fraction was nearly identical across all clients. Additionally, cached data becomes stale frequently because of the high network traffic between peers. Therefore, the caching period must be long enough to improve performance yet short enough to avoid sending stale responses.
Another problem is that a Gnutella peer may see the same query request several times from different peers with different TTLs, and these requests do not necessarily produce the same results. This may require a more advanced caching mechanism than the one proposed as the number of peers increases.
New Idea and Strengths
The proposed idea is to introduce a basic caching mechanism that reduces query traffic by as much as a factor of two. Caching in P2P systems differs from web caching, where content is provided by a single well-defined server. In P2P systems, caching is achieved by composing results from content provided by different peers. The proposed approach works as follows: when a client receives a query, it checks the cache for a matching query text and TTL; if both match, it returns the response from the cache. Even if the query text matches, if the TTL differs, the client forwards the new query to the peer, and when it receives the results it combines them with the locally cached ones and returns the combined result. Even when such queries are not full hits, they improve performance by obtaining a substantial part of the responses from the cache. They also decrease network traffic, since they need to be forwarded to only one peer instead of being flooded. Normally the cache hit rate would be the measure of performance, but a partial correspondence is not a hit, so instead of the hit rate the author measures the reduction in network traffic achieved by caching. Partial correspondence refers to the case where only a portion of the response is located in the cache.
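The lookup-and-merge behavior described above can be sketched in Python. This is a minimal illustration, not the paper's implementation: the class and method names (GnutellaCache, lookup, partial_lookup) and the expiry policy are my own assumptions.

```python
import time

# Illustrative sketch of the caching scheme described above.
# All names here are hypothetical; the paper does not specify them.

CACHE_PERIOD = 5 * 60  # seconds; the paper evaluates 5- and 30-minute windows


class GnutellaCache:
    def __init__(self, period=CACHE_PERIOD):
        self.period = period
        # key: (query_text, ttl) -> (results, timestamp stored)
        self.entries = {}

    def lookup(self, text, ttl, now=None):
        """Return cached results on an exact (text, TTL) match, else None."""
        now = time.time() if now is None else now
        entry = self.entries.get((text, ttl))
        if entry is None:
            return None
        results, stored_at = entry
        if now - stored_at > self.period:  # stale: evict and miss
            del self.entries[(text, ttl)]
            return None
        return results

    def partial_lookup(self, text, now=None):
        """Collect fresh cached results for the same text under any TTL.

        This models a partial correspondence: part of the answer comes
        from the cache, and the query is forwarded to a single peer
        (instead of being flooded) to obtain the remainder.
        """
        now = time.time() if now is None else now
        merged = []
        for (t, _ttl), (results, stored_at) in list(self.entries.items()):
            if t == text and now - stored_at <= self.period:
                merged.extend(results)
        return merged

    def store(self, text, ttl, results, now=None):
        now = time.time() if now is None else now
        self.entries[(text, ttl)] = (results, now)
```

An exact (text, TTL) hit is answered entirely from the cache; a partial correspondence returns the cached portion while the query itself is forwarded to a single peer for the rest, which is what reduces flooding traffic even without a full hit.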
The results show the strength of the idea. Caching for 5 minutes achieved a 30% reduction in network traffic, whereas caching for 30 minutes achieved a 50% reduction. Additionally, caching requires at most 3 MB of memory, including the memory needed to store the query responses themselves and the metadata needed to organize these responses into appropriate and efficient structures.
Weaknesses and Extensions
Four important conclusions follow from this research: P2P systems have bursty traffic; they exhibit a significant amount of locality; caching, even in a very basic form, yields an important performance boost; and caching requires only a small amount of memory. From my point of view, the caching mechanism should also be tested with more than just three peers.
