You are on page 1of 5

Dynamic Search Algorithm in Unstructured Peer-to-

Peer Networks

Po-Chiang Lin
*
, Tsung-Nan Lin
*
, and Hsinping Wang
*

*
Graduate Institute of Communication Engineering

Department of Electrical Engineering


National Taiwan University
Taipei, 10617 Taiwan
{d94942014, tsungnan, r91942041}@ntu.edu.tw

Abstract-Flooding and random walk (RW) are the two typical
search algorithms in unstructured peer-to-peer networks. The
flooding algorithm searches the network aggressively. It covers
the most nodes but generates a large number of query messages.
Hence it is considered to be not scalable. This cost issue is
especially serious when the queried resource locates far from the
query source. On the contrary, RW searches the network
conservatively. It only generates a fixed amount of query messages
at each hop, but it may take particularly longer search time to find
the queries resource. We propose the dynamic search algorithm
(DS) which is a generalization of flooding, modified breadth first
search (MBFS), and RW. This search algorithm takes advantage
of different contexts under which each previous search algorithm
performs well. The operation of DS resembles flooding or MBFS
for the short-term search, and RW for the long-term search. We
analyze the performance of DS based on the power-law random
graph model and adopt some performance metrics including the
guaranteed search time, query hits, query messages, success rate,
and a unified metric, search efficiency. The main objective is to
obtain the effects of the parameters of DS. Numerical results show
that proper setting of the parameters of DS can obtain the short
guaranteed search time and provide a good tradeoff between the
search performance and the cost.

I. INTRODUCTION

The design of search algorithms is critical to the performance of
unstructured peer-to-peer (P2P) networks. In the unstructured P2P
networks, each node does not have the global information about the
whole topology and the location of queried resources. Therefore, it
depends on the search algorithms to help locating the queried resource
and routing the message to the target node.
Related works about the search issue in unstructured P2P networks
can be classified into two categories: breadth first search (BFS)-based
methods, and depth first search (DFS)-based methods. Flooding,
which belongs to the BFS-based methods, is the default search
algorithm for Gnutella network [1]. In this method, the query source
sends the query message to all of its neighbors. When a node receives
a query message, it first checks if it has the queried resource. If yes, it
sends a response back to the query source to indicate the query hit.
Otherwise, it sends the query message to all of its neighbors, except the
one the query message comes from. The drawback of flooding is the
query cost. It produces considerable query messages even when the
resource distribution is scarce. The search is especially inefficient
when the target is far from the query source because the number of
query messages would grow exponentially with hop count. Figure 1
shows a simple network scenario. The link degree of each vertex in
this graph is 4. If the network grows unlimited from the query source,
the number of query messages at each hop would be 4, 12, 36,,
respectively. If the queried resource locates at on of the 3
rd
neighbors,
it takes 4 + 12 + 36 = 52 query messages to find just one resource.
Some BFS-based methods try to modify the flooding to improve the
efficiency, including modified-BFS (MBFS) [2], directed BFS [3],
expanding ring [4], and random periodical flooding (RPF) [5].
However, these methods still generate the large amount of query
messages when the queried resource locates far from the source. On
the other hand, random walk (RW) is an efficient search algorithm
which belongs to the DFS-based methods. In RW, the query source
just sends one query message (walker) to one of its neighbors. If this
neighbor does not own the queried resource, it keeps on sending the
walker to one of its neighbors, except the one the query message comes
from. Hence the search cost is reduced. The main drawback of RW is
that the search time to find the target is usually long. Since RW only
visits one node for each hop, the coverage of RW grows linearly with
hop count, which is slow compared with the exponential growth of
flooding. Moreover, the success rate of each query by RW is also low
due to the same coverage issue. Increasing the number of walkers
might help to improve the search time and success rate, but the effect is
limited due to the link degree and the redundant path in the network.
As the example shown in Figure 1, RW can only visit 12 vertices of
second neighbors even when the number of walkers k is set as 32.
Therefore, the search is inefficient because 32 walkers only visit 12
vertices at the second hop. Some research works are designed to
provide additional information about the more possible direction and
route the walkers with different link weights based on these directional
information, such as APS [6], biased RW [7] and routing index (RI) [8].
Some other works focus on replicating the reference pointer to the
queried resources in order to improve the search time [9] [10].
In this paper we propose the dynamic search algorithm (DS) which
is a generalization of flooding, MBFS, and RW. The DS algorithm
overcomes the different disadvantages of flooding and RW, and takes
advantage of different contexts under which each search algorithm
performs well. The operation of DS resembles flooding or MBFS for
the short-term search, and RW for the long-term search. In order to
analyze the performance of DS, we apply Newmans power-law
random graph as the network topology and adopt the generation
functions to model the link degree distribution [11]. We adopt some
performance metrics including the guaranteed search time, number of
query hits, number of query messages, success rate, and search
efficiency [12]. The main objective is to obtain the effects of the
parameters of DS. Numerical results show that proper setting of the
parameters of DS can obtain short guaranteed search time and provide
a good tradeoff between search performance and cost.
The rest of this paper is organized as follows. We provide a detailed
description of the DS algorithm in Section II. The performance
analyses based on the power-law random graph are given in Section III.
The numerical results of these performance analyses are given in
Section IV. Finally, we provide some concluding remarks in Section V.

Figure 1. A simple network topology with link degree 4 of each vertex.

II. DYNAMIC SEARCH ALGORITHM

In this section we provide a detailed description of DS. DS is a
generalization of flooding, MBFS, and RW. There are two phases in
DS algorithm, and each phase has different searching strategy. The
choice of search strategy at each phase depends on the hop count h of
the query messages and the decision threshold n of DS.
Phase 1. When h n:
At this phase, the DS acts as flooding or MBFS. The number of
neighbors that the query source sends the query messages to depends
on the pre-defined transmission probability p. If the link degree of this
query source is d, it would only send the query messages to pd
neighbors. When p is equal to 1, the DS resembles flooding. Otherwise
it operates as MBFS with transmission probability p.
Phase 2. When h n:
At this phase, the operation of DS switches to RW. Each node
which receives the query message would send the query message to
one of its neighbors if it does not have the queried resource. Assume
that the nodes visited by DS at hop h = n is the coverage c
n
, then the
operation of DS at that time can be regarded as RW with c
n
walkers.
However, there are some differences between DS and RW when we
consider the whole operation. Consider the simple scenario as Figure 1
shows. Assume that the decision threshold n is set as 2. Therefore,
when h > 2, DS performs the same as RW with c
2
= 12 walkers. Let us
consider a RW search with k = 12 walkers. At the first hop, the
walkers only visit 4 nodes, but the cost is 12 walkers. This shows the
significant difference between DS and RW.
Suppose that s is the query source, r is the vertex which receives the
query message, f is the queried resource, m
i
is the i
th
query message,
and TTL is the time-to-live limitation. Figure 2 shows the pseudo-code
of DS.

Figure 2. The pseudo-code of DS
In short, DS is designed to perform aggressively in short-term search,
and conservatively in long-term search. Obviously the parameters n
and p would affect the performance of DS. In the next section we
analysis the performance of DS based on some metrics, and show the
effects of parameters n and p.

III. PERFORMANCE ANALYSIS

In this section we present the performance analysis of DS. We apply
Newmans power-law random graph as the network topology, adopt
the generating functions to model the link degree distribution [11], and
analyze the DS based on some performance metrics, including the
guaranteed search time, query hits, query messages, success rate, and
search efficiency.
A. Network Model
We use the idea of generating function, proposed in [11], to derive
the desired terms of a random graph. First, the generating function for
link degree distribution is denoted by G
0
(x) (see [11, 13] for complete
definition). Then, the average degree of a randomly chosen vertex, i.e.
the average number of first neighbors, is z
1
= ( ) 1 G
'
0
and the number of
second neighbors ( ) ( ) 1 G 1 G z
'
1
'
0 2
= .
By [13], the following approximations are obtained.
( ) ( )

- 2 '
0
m - 1
2 -
1
1 G (1)
and
Algorithm: The pseudo-code of dynamic search DS
Input: query source s, queried resource f, transmission
probability p
Output: the location information of f
DS(s, f, p)
/* the operation of s */
h 0
if (h <= n)
h h + 1
s choose p portion of its neighbors
m
i
carring h visits these chosen neighbors
elseif (h > n)
h h + 1
m
i
carring h visits one neighbor of s
/* the operation of r */
foreach (r)
if (r has the location information of f)
r returns the information to s
m
i
stops
elseif (h > TTL)
m
i
stops
elseif (h <= n)
h h + 1
r choose p portion of its neighbors
m
i
carring h visits these chosen neighbors
elseif (h > n)
h h + 1
m
i
carring h visits one neighbor of r
( )
( )

- 3
m

1 G
1
1 G
- 3
'
0
'
1
(2)
assuming 2 < < 3, where is the power-law exponent.
B. Performance Metrics
Success Rate (SR)
Success rate (SR) is the probability that the query is success, i.e.,
there is at least one query hit. Assume that the queried resources are
uniformly distributed in the network with replication ratio R, and then
SR can be calculated as
( )
C
R - 1 - 1 SR = (3)
where R is the replication ratio and C is the coverage. This formula
shows that the SR highly depends on the coverage of the search
algorithms. Following we use (3) to obtain an important metric
guaranteed search time.
Guaranteed Search Time (GST)
To represent the capability of one search algorithm to find the
queried resource with a given probability, we define the guaranteed
search time (GST) as the search time it takes to guarantee the query
success with success rate requirement SR
req
. GST represents the hop
count that a search is successful with probabilistic guarantee. Using
Equation (3), GST is obtained when the coverage C is equal to
( )
( )
req R - 1
SR - 1 log . For the MBFS search algorithms, this situation
occurs when
( ) ( ) ( ) ( ) ( ) ( )
( ) ( ) ( )
( )
( )
req R - 1
1 - GST
'
1
'
0
GST
2
'
1
'
0
3 '
1
'
0
2 '
0
SR - 1 log
1 G 1 G p
1 G 1 G p 1 G 1 G p 1 G p
MBFS
MBFS
=
+ +
+ +

(4)
Thus the GST for MBFS is
( )
( ) ( )
( )
( )
( )
|
|
|
.
|

\
|
+

1
1 G p
SR - 1 log 1 - 1 G p
log
GST
'
0
req R - 1
'
1
1 G p
MBFS
'
1
(5)
The GST of flooding is analogue to that of MBFS with probability p =
1. The calculation of RW depends on the number of walkers k. When k
is set as 1, the GST for RW is obviously
( )
( )
req R - 1
SR - 1 log . When k
is larger than 1, assume that
( ) ( ) ( ) ( ) ( ) ( )
t
'
1
'
0
1 - t
'
1
'
0
1 G 1 G k 1 G 1 G
(6)
i.e., k is equal to or larger than the average number of the t
th
neighbors
of the query source, and assume that the effect of redundant paths can
be neglected, and then follow the same reason in (4), the GST for RW
is
( )
( ) ( ) ( ) ( )
t
k
1 G 1 G - SR - 1 log
GST
1 - t
0 i
i
'
1
'
0 req R - 1
RW
+

=

=

(7)
Now we consider the GST for DS. When the hop count h of query
message is smaller than or equal to the decision threshold n, the GST
of DS is equal to (5). When h is larger than n, the GST for DS is
( )
( )
( ) ( ) ( )
( ) ( )
( ) ( ) ( ) ( )
( )
( )
( ) ( ) ( )
1 -
1 G 1 G p
SR - 1 log
n
1 G p 1 - 1 G p
1 - 1 G p
-
1 G 1 G p
SR - 1 log
n
GST
1 - n
'
1
'
0
n
req R - 1
1 - n
'
1
'
1
n
'
1
1 - n
'
1
'
0
n
req R - 1
DS

+


+
=

(8)
We compare the GTS for DS and for RW with 1 walker. The
improvement ratio is
( )
( )
( ) ( )
n
'
1
'
0
'
1
RW
DS RW
1 G p
1

1 G
1 G
- 1
GST
GST - GST


(9)
In Equation (9), the last term on the right would significantly affect the
performance improvement. The GTS of DS would be exponentially
decreased with n, which can be expressed as O(1/n). Larger p would
also affect the performance, but the effect is slow when compared with
n. The extreme case of n is that it is set as the TTL limitation, i.e., DS
performs as flooding or MBFS. In this case the GST would be the
shortest. However, it would generate a huge amount of query
messages. The tradeoff between the search performance and the cost
should be taken into consideration. In the following paragraphs, we
further analyze the number of query hits and the number of query
messages, and further combine these metrics into a unified metric,
search efficiency.
Query Hits (QH)
The number of query hits highly depends on the coverage, i.e., the
number of total visited nodes. Assume that the queried resources are
uniformly distributed with replication ratio R in the network, and the
coverage is C; hence the number of query hits is RC. The coverage C
can be regarded as the summation of the coverage at each hop.
Therefore, we first analyze the coverage C
h
at the h
th
hop. Let V
h
be
the event that a vertex is visited at the h
th
hop. Suppose the probability
that the vertex i is visited at the h
th
hop is P
i
(V
h
). When the hop count
h = 1, C
h
is the expectation of the vertices that are visited at the first
hop. When the hop count h is larger than 1, the calculation of C
h

should preclude the event that the vertex has been visited in the
previous hop. Therefore, the coverage C
h
at the h
th
hop can be written
as
( )
( ) [ ] ( )


=
=

= =
=
2 h for , V P V P - 1
1 h for , V P
C
N
1 i
1 - h
1 j
h i j i
N
1 i
h i
h
(10)
where N is the total number of vertices in the network.
Following we analyze the visiting probability P
i
(V
h
) for flooding,
MBFS, RW, and DS, respectively. First we consider the flooding and
MBFS case. The visiting probability P
i
(V
h
) of flooding or MBFS is
( )
( )
( ) [ ]


=
=
2 h for , 1 G p p - 1 - 1
1 h for , 1 G p p
V P
1 - h
C
'
1 i
'
0 i
h i
(11)
where p
i
is the probability that vertex i is to be reached by certain edge.
Reference [14] shows that p
i
can be written as

=
=
N
1 i
1
1
i
i m
i m
p


(12)
where is the power-law exponent and m is the maximum degree.
When considering RW, we first calculate the probability that a
vertex i is the candidate of RW.
( )
( )
( ) [ ]


=
=
2 h for , 1 G p - 1 - 1
1 h for , 1 G p
R P
1 - h
C
'
1 i
'
0 i
h i
(13)
Then, the average number of candidates of RW at hop h is
( )

=
=
N
1 i
h i h
R P r (14)
Hence, the probability that vertex i is visited at hop h for RW is
( ) ( )
(
(

|
.
|

\
|
=
k
h
h i h i
r
1
- 1 - 1 R P V P (15)
where k is the number of walkers.
The calculation of visiting probability P
i
(V
h
) for DS depends on the
relation between h and n. When h n, P
i
(V
h
) is given by equation
(11). When h n, equation (13) to (15) are used to get P
i
(V
h
), where
the k in (15) is set as C
n
, i.e., the coverage at the n
th
hop. Therefore, the
visiting probability P
i
(V
h
) of DS is given by
( )
( )
( ) [ ]
( )

>
(
(

|
.
|

\
|


=
=
n h for ,
r
1
- 1 - 1 R P
n h 2 for , 1 G p p - 1 - 1
1 h for , 1 G p p
R P
n
1 - h
C
h
h i
C
'
1 i
'
0 i
h i
(16)
Query Messages (QM)
Now we analyze the number of query messages. When considering
the flooding and MBFS case, the query message e
h
generated at hop h
is given by
( )
( )


=
=
2 h for , C 1 G p
1 h for , 1 G p
e
1 - h
'
1
'
0
h
(17)
When considering the RW case, the number of query messages for
each hop is keep fixed as k, i.e., the number of walkers. Therefore, the
total number of query messages for RW is TTL k .
The calculation of query messages for DS depends on h and n. The
query messages e
h
generated at hop h for DS can be written as
( )
( )

>

=
=
n h for , C
n h 2 for , C 1 G p
1 h for , 1 G p
e
n
1 - h
'
1
'
0
h
(18)
Search Efficiency (SE)
Search efficiency (SE) is proposed as a unified performance metric
for search algorithms [12]. It is defined as
( )
R
SR

QM
h h QH
SE
TTL
1 h
=

=

(19)
where QH(h) / h means the query hits in the h
th
hop weighted by the
hop count, QM is the total number of query messages generated during
the query, SR is the probability that the query is success, i.e., there is at
least one query hit, and R means the replication ratio of this system.
Assume that the object is uniformly distributed in the network. Then
the query hit at the h
th
hop is equal to the multiplication of the coverage
at the h
th
hop and the replication rate R. Therefore, (19) can be written
as
( )
R
R - 1 - 1

e
h R C
SE
TTL
1 h
h
C
TTL
1 h
h
TTL
1 h
h

=

=
=
(20)
where C
h
is the coverage at the h
th
hop, e
h
is the query messages
generated at the h
th
hop, and R is the replication ratio.

IV. NUMERICAL RESULTS

In this section we show the numerical results of the performance
analysis described in section III. From these results the effects of the
parameters n and p are shown. The number of nodes N is set as 10000.
The power-law exponent is set as 2.1, which is analogue to the real-
world situation [15].
Figure 3 illustrates that how the decision threshold n would affect
the system performance. Due to the page limit, we only show the
result when p is set as 1. The case of n = 1 is analogue to the RW
search with k equal to the number of first neighbors, which is about
3.55 in this case. Moreover, the case that n = 7 is equal to the flooding
search. As this figure shows, the DS with n = 7 sends the query
messages aggressively in the first three hops and gets good search
efficiency. But the performance degrades rapidly as the hop increases.
This is because that the cost grows exponentially with the path length
between the query source and the target. On the contrary, the SE of
RW is better than that of flooding when the hop is 5 to 7. When n is
set as 2, DS gets the best SE for almost all hop counts. This figure
shows that the choice of parameter n can help DS to takes advantage of
different contexts under which each search algorithm performs well.
In order to obtain the best (n, p) combination, we illustrate the (n, p,
SE) results in Figure 4. When p is large (0.7 ~ 1), set n = 2 would get
the best SE. Moreover, the best n value increases as the p decreases.
For example, when p is set as 0.2, the best n would be 5. This is
because that when p is small, n should be increased to expand the
coverage. On the contrary, when p is large, n should be decreased to
limit the growth of query messages. Therefore, the parameters n and p
provide the tradeoff between search performance and cost. This figure
shows that the best SE is obtained when (n, p) is set as (2, 1).
Figure 3. SE vs. hop count when p is set as 1 and n is changed from
1 to 7. When n is set as 2, DS gets the best SE for almost all hop
counts.
We show the numerical results of GST in . Replication ratio R is set
as 0.01 in this case. Similar results can be obtained when R is set as
other values. The numbers of walkers k for RW are set as 1 and 32.
The decision thresholds n are set as 2, 3 and 7, and p is set as 1. TTL
is set as 7 in this case, thus the DS with n = 7 is equal to flooding.
From this figure DS with large n always gets the short GST because it
always covers more vertices. On the contrary, RW with k = 1 always
gets the longest GST since its coverage is only incremental by one at
each hop. When k is set as 32, its coverage is enlarged and the GST
can be improved. However, DS still performs better than RW with 32
walkers even when n is set as only 2. Note that when n is set as 3, DS
performs as well as that with n = 7, i.e., flooding, while not generating
so many query messages.
In summary, DS with n = 2 and p = 1 would get the best SE and
significantly improve the GST. While increasing n to 3, although SE is
a little degraded, the shortest GST is obtained.

V. CONCLUSION

In this paper we propose the DS algorithm which is a generalization
of flooding, MBFS, and RW. The DS algorithm overcomes the
different disadvantages of flooding and RW, and takes advantage of
different contexts under which each search algorithm performs well.
The operation of DS resembles flooding or MBFS for the short-term
search, and RW for the long-term search. We analyze the performance
of DS based on some metrics including the average search time,
guaranteed search time, number of query hits, number of query
messages, success rate, and search efficiency. The main objective is to
obtain the effects of the parameters of DS. Numerical results show that
proper setting of the parameters of DS can obtain short guaranteed
search time and provide a good tradeoff between the search
performance and the cost.
REFERENCES
[1] Clip2 Distributed Search Services, The Gnutella protocol
specification v0.4, http://dss.clip2.com
[2] V. Kalogeraki, D. Gunopulos, and D. Zeinalipour-Yazti, A
local search mechanism for peer-to-peer networks, in
Proceedings of the 2002 ACM CIKM, November 2002, pp.
300307.
[3] B. Yang and H. Garcia-Molina, Improving search in peer-to-
peer networks, ICDCS, July 2002, pp. 514.
[4] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, Search and
replication in unstructured peer-to-peer networks, in Proc. of
ICS02, June 2002, pp. 8495.
[5] Z. Zhuang, Y. Liu, L. Xiao, and L. M. Ni, Hybrid periodical
flooding in unstructured peer-to-peer networks, in
Proceedings of ICPP03, October 2003, pp. 171178.
[6] D. Tsoumakos and N. Roussopoulos, Adaptive probabilistic
search for peer-to-peer networks, in Proceedings of the 3rd
International Conference on Peer-to-Peer Computing
(P2P03), September 2003, pp. 102109.
[7] Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, and S.
Shenker, Making gnutella-like p2p systems scalable, in
Proceedings of the ACM SIGCOMM, August 2003, pp. 407
418.
[8] A. Crespo, H. Garcia-Molina, Routing Indices for peer-to-
peer systems, ICDCS, 2002.
[9] R. A. Ferreira, M. K. Ramanathan, A. Awan, A. Grama, S.
Jagannathan, Search with probabilistic guarantees in
unstructured peer-to-peer networks, Proc. of P2P05.
[10] N. Sarshar, P. O. Boykin, V. P. Roychowdhury, Percolation
search in power law networks: making unstructured peer-to-
peer networks scalable, Proc of P2P04.
[11] M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Random
graphs with arbitrary degree distribution and their
applications. Phys. Rev. E, 64:026118, 2001.
[12] H. Wang, T. Lin, On efficiency in searching networks,
INFOCOM, March 2005.
[13] L. A. Adamic, R. M. Lukose, B. A. Huberman, Local search
in unstructured networks, Handbook of graphs and networks,
WILEY-VCH GmbH & Co. KGaA, Weinheim, 2003, pp.295-
317.
[14] W. Aiello, F. Chung, and L. Lu. A random graph model for
massive graphs. Proceedings of the thirty-second annual ACM
symposium on Theory of Computing, pages 171-180, 2000.
[15] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A.
Huberman. Search in power-law networks. Phys. Rev. E,
64:046135, 2001.

Figure 4. The effects of the parameters (n, p) on the SE. The best SE
is obtained when (n, p) is set as (2, 1).
Figure 5. GST vs. SR requirement. R is set as 1%. k for RW are
set as 1 and 32. The n of DS are set as 2, 3 and 7, and p is set as 1.
TTL is set as 7, thus the DS with n = 7 is equal to flooding.

You might also like