Professional Documents
Culture Documents
Peer Networks
Po-Chiang Lin
*
, Tsung-Nan Lin
*
, and Hsinping Wang
*
*
Graduate Institute of Communication Engineering
- 2 '
0
m - 1
2 -
1
1 G (1)
and
Algorithm: The pseudo-code of dynamic search DS
Input: query source s, queried resource f, transmission
probability p
Output: the location information of f
DS(s, f, p)
/* the operation of s */
h 0
if (h <= n)
h h + 1
s choose p portion of its neighbors
m
i
carring h visits these chosen neighbors
elseif (h > n)
h h + 1
m
i
carring h visits one neighbor of s
/* the operation of r */
foreach (r)
if (r has the location information of f)
r returns the information to s
m
i
stops
elseif (h > TTL)
m
i
stops
elseif (h <= n)
h h + 1
r choose p portion of its neighbors
m
i
carring h visits these chosen neighbors
elseif (h > n)
h h + 1
m
i
carring h visits one neighbor of r
( )
( )
- 3
m
1 G
1
1 G
- 3
'
0
'
1
(2)
assuming 2 < < 3, where is the power-law exponent.
B. Performance Metrics
Success Rate (SR)
Success rate (SR) is the probability that the query is success, i.e.,
there is at least one query hit. Assume that the queried resources are
uniformly distributed in the network with replication ratio R, and then
SR can be calculated as
( )
C
R - 1 - 1 SR = (3)
where R is the replication ratio and C is the coverage. This formula
shows that the SR highly depends on the coverage of the search
algorithms. Following we use (3) to obtain an important metric
guaranteed search time.
Guaranteed Search Time (GST)
To represent the capability of one search algorithm to find the
queried resource with a given probability, we define the guaranteed
search time (GST) as the search time it takes to guarantee the query
success with success rate requirement SR
req
. GST represents the hop
count that a search is successful with probabilistic guarantee. Using
Equation (3), GST is obtained when the coverage C is equal to
( )
( )
req R - 1
SR - 1 log . For the MBFS search algorithms, this situation
occurs when
( ) ( ) ( ) ( ) ( ) ( )
( ) ( ) ( )
( )
( )
req R - 1
1 - GST
'
1
'
0
GST
2
'
1
'
0
3 '
1
'
0
2 '
0
SR - 1 log
1 G 1 G p
1 G 1 G p 1 G 1 G p 1 G p
MBFS
MBFS
=
+ +
+ +
(4)
Thus the GST for MBFS is
( )
( ) ( )
( )
( )
( )
|
|
|
.
|
\
|
+
1
1 G p
SR - 1 log 1 - 1 G p
log
GST
'
0
req R - 1
'
1
1 G p
MBFS
'
1
(5)
The GST of flooding is analogue to that of MBFS with probability p =
1. The calculation of RW depends on the number of walkers k. When k
is set as 1, the GST for RW is obviously
( )
( )
req R - 1
SR - 1 log . When k
is larger than 1, assume that
( ) ( ) ( ) ( ) ( ) ( )
t
'
1
'
0
1 - t
'
1
'
0
1 G 1 G k 1 G 1 G
(6)
i.e., k is equal to or larger than the average number of the t
th
neighbors
of the query source, and assume that the effect of redundant paths can
be neglected, and then follow the same reason in (4), the GST for RW
is
( )
( ) ( ) ( ) ( )
t
k
1 G 1 G - SR - 1 log
GST
1 - t
0 i
i
'
1
'
0 req R - 1
RW
+
=
=
(7)
Now we consider the GST for DS. When the hop count h of query
message is smaller than or equal to the decision threshold n, the GST
of DS is equal to (5). When h is larger than n, the GST for DS is
( )
( )
( ) ( ) ( )
( ) ( )
( ) ( ) ( ) ( )
( )
( )
( ) ( ) ( )
1 -
1 G 1 G p
SR - 1 log
n
1 G p 1 - 1 G p
1 - 1 G p
-
1 G 1 G p
SR - 1 log
n
GST
1 - n
'
1
'
0
n
req R - 1
1 - n
'
1
'
1
n
'
1
1 - n
'
1
'
0
n
req R - 1
DS
+
+
=
(8)
We compare the GTS for DS and for RW with 1 walker. The
improvement ratio is
( )
( )
( ) ( )
n
'
1
'
0
'
1
RW
DS RW
1 G p
1
1 G
1 G
- 1
GST
GST - GST
(9)
In Equation (9), the last term on the right would significantly affect the
performance improvement. The GTS of DS would be exponentially
decreased with n, which can be expressed as O(1/n). Larger p would
also affect the performance, but the effect is slow when compared with
n. The extreme case of n is that it is set as the TTL limitation, i.e., DS
performs as flooding or MBFS. In this case the GST would be the
shortest. However, it would generate a huge amount of query
messages. The tradeoff between the search performance and the cost
should be taken into consideration. In the following paragraphs, we
further analyze the number of query hits and the number of query
messages, and further combine these metrics into a unified metric,
search efficiency.
Query Hits (QH)
The number of query hits highly depends on the coverage, i.e., the
number of total visited nodes. Assume that the queried resources are
uniformly distributed with replication ratio R in the network, and the
coverage is C; hence the number of query hits is RC. The coverage C
can be regarded as the summation of the coverage at each hop.
Therefore, we first analyze the coverage C
h
at the h
th
hop. Let V
h
be
the event that a vertex is visited at the h
th
hop. Suppose the probability
that the vertex i is visited at the h
th
hop is P
i
(V
h
). When the hop count
h = 1, C
h
is the expectation of the vertices that are visited at the first
hop. When the hop count h is larger than 1, the calculation of C
h
should preclude the event that the vertex has been visited in the
previous hop. Therefore, the coverage C
h
at the h
th
hop can be written
as
( )
( ) [ ] ( )
=
=
= =
=
2 h for , V P V P - 1
1 h for , V P
C
N
1 i
1 - h
1 j
h i j i
N
1 i
h i
h
(10)
where N is the total number of vertices in the network.
Following we analyze the visiting probability P
i
(V
h
) for flooding,
MBFS, RW, and DS, respectively. First we consider the flooding and
MBFS case. The visiting probability P
i
(V
h
) of flooding or MBFS is
( )
( )
( ) [ ]
=
=
2 h for , 1 G p p - 1 - 1
1 h for , 1 G p p
V P
1 - h
C
'
1 i
'
0 i
h i
(11)
where p
i
is the probability that vertex i is to be reached by certain edge.
Reference [14] shows that p
i
can be written as
=
=
N
1 i
1
1
i
i m
i m
p
(12)
where is the power-law exponent and m is the maximum degree.
When considering RW, we first calculate the probability that a
vertex i is the candidate of RW.
( )
( )
( ) [ ]
=
=
2 h for , 1 G p - 1 - 1
1 h for , 1 G p
R P
1 - h
C
'
1 i
'
0 i
h i
(13)
Then, the average number of candidates of RW at hop h is
( )
=
=
N
1 i
h i h
R P r (14)
Hence, the probability that vertex i is visited at hop h for RW is
( ) ( )
(
(
|
.
|
\
|
=
k
h
h i h i
r
1
- 1 - 1 R P V P (15)
where k is the number of walkers.
The calculation of visiting probability P
i
(V
h
) for DS depends on the
relation between h and n. When h n, P
i
(V
h
) is given by equation
(11). When h n, equation (13) to (15) are used to get P
i
(V
h
), where
the k in (15) is set as C
n
, i.e., the coverage at the n
th
hop. Therefore, the
visiting probability P
i
(V
h
) of DS is given by
( )
( )
( ) [ ]
( )
>
(
(
|
.
|
\
|
=
=
n h for ,
r
1
- 1 - 1 R P
n h 2 for , 1 G p p - 1 - 1
1 h for , 1 G p p
R P
n
1 - h
C
h
h i
C
'
1 i
'
0 i
h i
(16)
Query Messages (QM)
Now we analyze the number of query messages. When considering
the flooding and MBFS case, the query message e
h
generated at hop h
is given by
( )
( )
=
=
2 h for , C 1 G p
1 h for , 1 G p
e
1 - h
'
1
'
0
h
(17)
When considering the RW case, the number of query messages for
each hop is keep fixed as k, i.e., the number of walkers. Therefore, the
total number of query messages for RW is TTL k .
The calculation of query messages for DS depends on h and n. The
query messages e
h
generated at hop h for DS can be written as
( )
( )
>
=
=
n h for , C
n h 2 for , C 1 G p
1 h for , 1 G p
e
n
1 - h
'
1
'
0
h
(18)
Search Efficiency (SE)
Search efficiency (SE) is proposed as a unified performance metric
for search algorithms [12]. It is defined as
( )
R
SR
QM
h h QH
SE
TTL
1 h
=
=
(19)
where QH(h) / h means the query hits in the h
th
hop weighted by the
hop count, QM is the total number of query messages generated during
the query, SR is the probability that the query is success, i.e., there is at
least one query hit, and R means the replication ratio of this system.
Assume that the object is uniformly distributed in the network. Then
the query hit at the h
th
hop is equal to the multiplication of the coverage
at the h
th
hop and the replication rate R. Therefore, (19) can be written
as
( )
R
R - 1 - 1
e
h R C
SE
TTL
1 h
h
C
TTL
1 h
h
TTL
1 h
h
=
=
=
(20)
where C
h
is the coverage at the h
th
hop, e
h
is the query messages
generated at the h
th
hop, and R is the replication ratio.
IV. NUMERICAL RESULTS
In this section we show the numerical results of the performance
analysis described in section III. From these results the effects of the
parameters n and p are shown. The number of nodes N is set as 10000.
The power-law exponent is set as 2.1, which is analogue to the real-
world situation [15].
Figure 3 illustrates that how the decision threshold n would affect
the system performance. Due to the page limit, we only show the
result when p is set as 1. The case of n = 1 is analogue to the RW
search with k equal to the number of first neighbors, which is about
3.55 in this case. Moreover, the case that n = 7 is equal to the flooding
search. As this figure shows, the DS with n = 7 sends the query
messages aggressively in the first three hops and gets good search
efficiency. But the performance degrades rapidly as the hop increases.
This is because that the cost grows exponentially with the path length
between the query source and the target. On the contrary, the SE of
RW is better than that of flooding when the hop is 5 to 7. When n is
set as 2, DS gets the best SE for almost all hop counts. This figure
shows that the choice of parameter n can help DS to takes advantage of
different contexts under which each search algorithm performs well.
In order to obtain the best (n, p) combination, we illustrate the (n, p,
SE) results in Figure 4. When p is large (0.7 ~ 1), set n = 2 would get
the best SE. Moreover, the best n value increases as the p decreases.
For example, when p is set as 0.2, the best n would be 5. This is
because that when p is small, n should be increased to expand the
coverage. On the contrary, when p is large, n should be decreased to
limit the growth of query messages. Therefore, the parameters n and p
provide the tradeoff between search performance and cost. This figure
shows that the best SE is obtained when (n, p) is set as (2, 1).
Figure 3. SE vs. hop count when p is set as 1 and n is changed from
1 to 7. When n is set as 2, DS gets the best SE for almost all hop
counts.
We show the numerical results of GST in . Replication ratio R is set
as 0.01 in this case. Similar results can be obtained when R is set as
other values. The numbers of walkers k for RW are set as 1 and 32.
The decision thresholds n are set as 2, 3 and 7, and p is set as 1. TTL
is set as 7 in this case, thus the DS with n = 7 is equal to flooding.
From this figure DS with large n always gets the short GST because it
always covers more vertices. On the contrary, RW with k = 1 always
gets the longest GST since its coverage is only incremental by one at
each hop. When k is set as 32, its coverage is enlarged and the GST
can be improved. However, DS still performs better than RW with 32
walkers even when n is set as only 2. Note that when n is set as 3, DS
performs as well as that with n = 7, i.e., flooding, while not generating
so many query messages.
In summary, DS with n = 2 and p = 1 would get the best SE and
significantly improve the GST. While increasing n to 3, although SE is
a little degraded, the shortest GST is obtained.
V. CONCLUSION
In this paper we propose the DS algorithm which is a generalization
of flooding, MBFS, and RW. The DS algorithm overcomes the
different disadvantages of flooding and RW, and takes advantage of
different contexts under which each search algorithm performs well.
The operation of DS resembles flooding or MBFS for the short-term
search, and RW for the long-term search. We analyze the performance
of DS based on some metrics including the average search time,
guaranteed search time, number of query hits, number of query
messages, success rate, and search efficiency. The main objective is to
obtain the effects of the parameters of DS. Numerical results show that
proper setting of the parameters of DS can obtain short guaranteed
search time and provide a good tradeoff between the search
performance and the cost.
REFERENCES
[1] Clip2 Distributed Search Services, The Gnutella protocol
specification v0.4, http://dss.clip2.com
[2] V. Kalogeraki, D. Gunopulos, and D. Zeinalipour-Yazti, A
local search mechanism for peer-to-peer networks, in
Proceedings of the 2002 ACM CIKM, November 2002, pp.
300307.
[3] B. Yang and H. Garcia-Molina, Improving search in peer-to-
peer networks, ICDCS, July 2002, pp. 514.
[4] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, Search and
replication in unstructured peer-to-peer networks, in Proc. of
ICS02, June 2002, pp. 8495.
[5] Z. Zhuang, Y. Liu, L. Xiao, and L. M. Ni, Hybrid periodical
flooding in unstructured peer-to-peer networks, in
Proceedings of ICPP03, October 2003, pp. 171178.
[6] D. Tsoumakos and N. Roussopoulos, Adaptive probabilistic
search for peer-to-peer networks, in Proceedings of the 3rd
International Conference on Peer-to-Peer Computing
(P2P03), September 2003, pp. 102109.
[7] Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, and S.
Shenker, Making gnutella-like p2p systems scalable, in
Proceedings of the ACM SIGCOMM, August 2003, pp. 407
418.
[8] A. Crespo, H. Garcia-Molina, Routing Indices for peer-to-
peer systems, ICDCS, 2002.
[9] R. A. Ferreira, M. K. Ramanathan, A. Awan, A. Grama, S.
Jagannathan, Search with probabilistic guarantees in
unstructured peer-to-peer networks, Proc. of P2P05.
[10] N. Sarshar, P. O. Boykin, V. P. Roychowdhury, Percolation
search in power law networks: making unstructured peer-to-
peer networks scalable, Proc of P2P04.
[11] M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Random
graphs with arbitrary degree distribution and their
applications. Phys. Rev. E, 64:026118, 2001.
[12] H. Wang, T. Lin, On efficiency in searching networks,
INFOCOM, March 2005.
[13] L. A. Adamic, R. M. Lukose, B. A. Huberman, Local search
in unstructured networks, Handbook of graphs and networks,
WILEY-VCH GmbH & Co. KGaA, Weinheim, 2003, pp.295-
317.
[14] W. Aiello, F. Chung, and L. Lu. A random graph model for
massive graphs. Proceedings of the thirty-second annual ACM
symposium on Theory of Computing, pages 171-180, 2000.
[15] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A.
Huberman. Search in power-law networks. Phys. Rev. E,
64:046135, 2001.
Figure 4. The effects of the parameters (n, p) on the SE. The best SE
is obtained when (n, p) is set as (2, 1).
Figure 5. GST vs. SR requirement. R is set as 1%. k for RW are
set as 1 and 32. The n of DS are set as 2, 3 and 7, and p is set as 1.
TTL is set as 7, thus the DS with n = 7 is equal to flooding.