Node i is “on” at time t if and only if

$$ \mathrm{begin\_of\_session}_i < t < \mathrm{begin\_of\_session}_i + \mathrm{session\_length}_i \qquad (2) $$

The n nodes are not completely independent: they are grouped into several crowds according to the time zone of their location on the planet. Nodes in the same crowd behave much more alike than nodes in different crowds. Equation (1) is therefore rewritten as equation (3):

$$ \mathrm{Alive}(t) = \sum_{z=0}^{23} \sum_{i=0}^{n(z)} \mathrm{partici}(z_i, t) \qquad (3) $$

$$ N = \sum_{z=0}^{23} n(z) \qquad (4) $$

The nodes are explicitly divided into 24 crowds, and n(z), z = 0, 1, 2, ..., 23, describes the uneven distribution of the peer population. Nodes in the same time zone are “correlated” because they follow the same statistical characteristics. Apart from those shared characteristics, they act independently as individuals.
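The grouping in equations (2) to (4) can be illustrated with a minimal Python sketch. It assumes that each term of the sum in (3) is simply the on/off condition of (2); the `Node` class, its field names, and the sample values are hypothetical and serve only as an illustration. Sessions that wrap past midnight are not handled.

```python
from dataclasses import dataclass


@dataclass
class Node:
    zone: int      # time zone index z, 0..23
    begin: float   # begin_of_session, in hours
    length: float  # session_length, in hours


def is_on(node, t):
    """Condition (2): the node is on strictly inside its session."""
    return node.begin < t < node.begin + node.length


def alive(nodes, t):
    """Count live nodes per time zone, then sum over the 24 zones as in (3)."""
    per_zone = [0] * 24
    for node in nodes:
        if is_on(node, t):
            per_zone[node.zone] += 1
    return sum(per_zone), per_zone


# Hypothetical sample: two nodes in zone 0 and one node in zone 8.
nodes = [Node(0, 9.0, 1.0), Node(0, 10.0, 2.0), Node(8, 9.5, 1.0)]
total, per_zone = alive(nodes, 9.75)
```

Keeping the per-zone counts separate, rather than only the total, is what lets the uneven distribution n(z) show up in the result.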
B. Uneven distribution of peers
As has been pointed out, the availability of an individual host in a P2P network is governed not only by failures but, more importantly, by user decisions to connect to or disconnect from the network. Those decisions are made during the working time of the P2P users. Most people begin to work in the morning and stop at some time in the afternoon, but “morning” falls anywhere from hour 0 to hour 23 (GMT) for people in different time zones. That is why we study how many networked computers there are in each time zone. Although the result is not accurate, we count the Internet users or Internet-connected computers for a list of more than 200 countries, found by searching the Internet with Google, and group them according to their time zones.
Figure 1 shows the uneven distribution of potential peers across the world. As expected, the ocean areas are nearly blank: the Pacific Ocean corresponds to the time zones East-11 to West-9, and the Atlantic Ocean corresponds roughly to West-1 to West-2. The remaining areas are covered by continents and therefore contain many Internet-connected computers. Moreover, the Internet-connected computers fall into three clusters, with peaks located in Eastern Asia, Western Europe and America. At first glance, the number of nodes changes sharply from one time zone to the next. In practice the change is much smoother, because users do not join and leave strictly according to their time zone. This is explained in the next section.
[Figure 1. Internet users per time zone (vertical axis in units of 10^8).]
$$ f_z(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x - tp)^2}{2\sigma^2}}, \quad \text{where } z = 0, 1, \ldots, 23 \qquad (5) $$

$$ f_z(x) = w_1 \frac{1}{\sqrt{2\pi}\,\sigma_1} e^{-\frac{(x - tp_1)^2}{2\sigma_1^2}} + w_2 \frac{1}{\sqrt{2\pi}\,\sigma_2} e^{-\frac{(x - tp_2)^2}{2\sigma_2^2}} \qquad (6) $$

where w_1 = w_2 = 0.5 and tp_1 = 10:30; that is, half of the users begin their session in the morning and half in the afternoon. If we fix the session time to 1 hour for simplicity, we can see the expected fluctuation of the potential peer number over the 24 hours of a day, which reaches a peak of 3.1x10^8 and a bottom of 0.5x10^8, as Figure 2(a) shows. In practice, not all the nodes would begin a session, so this number should be reduced by multiplying it by a positive factor less than 1.
We can also rewrite equation (6) as equation (7) so that it covers the nighttime. It is reasonable to set w_1 = w_2 = 0.4, w_3 = 0.2, tp_1 = 10:00 am, tp_2 = 3:00 pm and tp_3 = 9:00 pm. The potential peer numbers over the 24 hours of a day can then be drawn as Figure 2(b).

$$ f_z(x) = w_1 \frac{1}{\sqrt{2\pi}\,\sigma_1} e^{-\frac{(x - tp_1)^2}{2\sigma_1^2}} + w_2 \frac{1}{\sqrt{2\pi}\,\sigma_2} e^{-\frac{(x - tp_2)^2}{2\sigma_2^2}} + w_3 \frac{1}{\sqrt{2\pi}\,\sigma_3} e^{-\frac{(x - tp_3)^2}{2\sigma_3^2}}, \quad z = 0, 1, \ldots, 23, \quad w_1 + w_2 + w_3 = 1 \qquad (7) $$

Figure 2(c) stands for the nighttime-only case, where we set tp_3 = 9:00 pm.
[Figure 2, panels (a) to (c): online nodes (in units of 10^8) versus Greenwich Mean Time, 0 to 25 hours.]
Figure 2. Peer numbers at GMT-0 to GMT-23 for various kinds of applications: (a) applications used from morning to afternoon; (b) applications used throughout the whole day; (c) entertainment applications used at night only.
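The weighted sum of Gaussian terms used for the session begin times can be sketched in Python as follows. The weights and peak times are those quoted in the text for the whole-day case; the common standard deviation of 1.5 hours is an assumption, since the text gives no value, and the function names are hypothetical.

```python
import math


def gaussian(x, mean, sigma):
    """Normal density with the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def begin_density(x, weights, peaks, sigmas):
    """Weighted sum of Gaussian terms; the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * gaussian(x, tp, s) for w, tp, s in zip(weights, peaks, sigmas))


# Whole-day case: w1 = w2 = 0.4, w3 = 0.2; peaks at 10:00, 15:00 and 21:00 local time.
WEIGHTS = [0.4, 0.4, 0.2]
PEAKS = [10.0, 15.0, 21.0]
SIGMAS = [1.5, 1.5, 1.5]  # assumed; not given in the text

morning = begin_density(10.0, WEIGHTS, PEAKS, SIGMAS)
small_hours = begin_density(3.0, WEIGHTS, PEAKS, SIGMAS)
```

Because the weights sum to 1, the mixture is itself a probability density, so multiplying it by the peer population of a time zone gives the expected number of session starts per hour in that zone.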
III. EXPERIMENTAL EVALUATIONS
According to reported measurements [4], inter-arrival times obey the exponential (or Weibull) distribution. We therefore run a simulation to evaluate our model. Considering two systems with 5000 and 10000 nodes respectively, we record the session begin times and measure the inter-arrival times. The nodes are distributed over the 24 time zones in proportion to the distribution of Figure 1. The measured intervals are counted and drawn in Figure 3, and we fit an exponential curve to each case. The fitted curve for Figure 3(a) is y=453.53*exp(-0.099*x), and the fitted curve for Figure 3(b) is y=2052.7*exp(-0.214*x). Both cases are consistent with the exponential distribution and strongly support our model.
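A simulation of this kind can be sketched in Python under stated assumptions. The per-zone weights in `ZONE_WEIGHTS` are hypothetical placeholders and not the data of Figure 1, the standard deviation is assumed, and zone z is taken to mean GMT+z. The exponential fit is done by least squares on the logarithm of the histogram counts, which is one common way to fit y = a*exp(-b*x) and not necessarily the procedure used for Figure 3.

```python
import math
import random

# Hypothetical relative peer population for zones GMT+0 .. GMT+23 (not the paper's data).
ZONE_WEIGHTS = [8, 9, 4, 3, 2, 3, 2, 3, 12, 6, 1, 1,
                1, 0.5, 0.5, 0.5, 2, 3, 6, 8, 2, 1, 0.5, 1]
WEIGHTS = [0.4, 0.4, 0.2]         # whole-day case from the text
PEAKS_HOURS = [10.0, 15.0, 21.0]  # 10:00 am, 3:00 pm, 9:00 pm local time
SIGMA_HOURS = 1.5                 # assumed; not given in the text


def sample_begin_times(n_nodes, rng):
    """Return sorted session begin times in seconds of the GMT day, one per node."""
    zones = rng.choices(range(24), weights=ZONE_WEIGHTS, k=n_nodes)
    times = []
    for z in zones:
        peak = rng.choices(PEAKS_HOURS, weights=WEIGHTS, k=1)[0]
        local_hours = rng.gauss(peak, SIGMA_HOURS)
        gmt_hours = (local_hours - z) % 24.0  # zone z is assumed to be GMT+z
        times.append(gmt_hours * 3600.0)
    return sorted(times)


def inter_arrival_times(sorted_times):
    """Gaps between consecutive session begin times."""
    return [b - a for a, b in zip(sorted_times, sorted_times[1:])]


def fit_exponential(gaps, bin_width=1.0, max_gap=50.0):
    """Fit y = a * exp(-b * x) to the gap histogram via least squares on log(count)."""
    n_bins = int(max_gap / bin_width)
    counts = [0] * n_bins
    for g in gaps:
        k = int(g / bin_width)
        if k < n_bins:
            counts[k] += 1
    points = [((k + 0.5) * bin_width, math.log(c)) for k, c in enumerate(counts) if c > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope


if __name__ == "__main__":
    for n in (5000, 10000):
        gaps = inter_arrival_times(sample_begin_times(n, random.Random(1)))
        a, b = fit_exponential(gaps)
        print(f"{n} nodes: y = {a:.1f} * exp(-{b:.3f} * x)")
```

The fitted coefficients depend on the assumed zone weights and standard deviation, so they are not expected to reproduce the values reported for Figure 3.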
[Figure 3, panels (a) 5000 nodes and (b) 10000 nodes: number of intervals versus interval (seconds), 0 to 50, showing measured values and the fitted curve.]
Figure 3. Inter-arrival time distribution of a globally distributed system: (a) 5000 nodes; (b) 10000 nodes.
We do not run a simulation to verify the session-length part of our model, because the model adopts an exponential session length distribution, which is consistent with existing models and measurements. Moreover, other session length distributions are easy to adopt under the GCP framework.
IV. POTENTIAL APPLICATIONS OF GCP
The new model, named GCP, could affect several areas of P2P systems: routing maintenance, neighbor selection, data replication and search methods. If the metrics of the GCP model can be rebuilt for a given P2P application from observations of its system logs, then the peer population becomes somewhat predictable. Many DHT-based storage systems adjust their repair effort according to the urgency of the damage. The urgency is evaluated by counting the existing resources and comparing the count with a limit: the more damage occurs and the fewer resources remain, the more effort is made to repair. As Figure 2 shows, the system has to work harder on repair when the resources (nodes) decrease. As a result, most systems accumulate excessive redundant replicas as time moves forward from any valley of Figure 1: because most systems maintain enough replicas at all times to ensure adequate availability, from that time on the replicas reintegrated into the system increase as more nodes rejoin. Knowing when the number of nodes will increase makes it possible to avoid the excessive redundant data and to reduce bandwidth and storage usage. As time moves forward from any peak of the peer population, most systems are busy copying replicas because nodes depart rapidly, which may harm system availability. GCP makes it possible to prepare for that decrease in population and to maintain higher availability.
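How a predicted population trend could feed into repair effort can be sketched as follows. This is a hypothetical policy written only to illustrate the idea above, not a mechanism taken from an existing DHT storage system; the function name, its parameters, and the rule of subtracting the predicted trend from the replica deficit are all assumptions.

```python
def repair_effort(live_replicas, min_replicas, predicted_now, predicted_next):
    """Return a repair effort in [0, 1] from the replica deficit and the predicted trend.

    Hypothetical rule: effort grows with the fraction of missing replicas and with a
    predicted drop in population, and shrinks when the population is predicted to rise,
    because departed nodes are then expected to rejoin with their replicas.
    """
    deficit = max(0.0, (min_replicas - live_replicas) / min_replicas)
    trend = (predicted_next - predicted_now) / max(predicted_now, 1)
    return min(1.0, max(0.0, deficit - trend))


# Enough replicas, but the population is predicted to fall by 30%: prepare in advance.
before_drop = repair_effort(3, 3, 100, 70)
# Half of the replicas are missing and the population is stable: repair at half effort.
stable = repair_effort(2, 4, 100, 100)
# Half of the replicas are missing, but the population is predicted to double: wait.
before_rise = repair_effort(2, 4, 100, 200)
```

The predicted populations would come from a population curve like those of Figure 2, rebuilt from system logs; how to rebuild it is left open here.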
GCP can be used to predict the fluctuation of the peer population and to achieve more accurate control than approaches that use historical measurements as the prediction. The cases of routing maintenance, neighbor selection and search methods are similar to the case of replicas.
V. CONCLUSIONS AND FUTURE WORK
By combining the uneven geographical distribution of nodes with their cyclic behavior, we provide a new model. It gives more information than models that treat the whole peer population as a black box. It is consistent with the measurements taken in previous work and may hold for real applications. We also provide an application example for replication management in DHT-based storage systems.
Much work remains to be done. The GCP model is coarse and should be refined. User behavior can be classified into several kinds instead of being modeled simply by a Gaussian, for example by distinguishing servers that are permanently online from home PCs that happen to use P2P applications. Beyond that refinement, we should design a mechanism and a real-time algorithm capable of rebuilding the GCP metrics from the run-time records or logs of a P2P system. The most important work is to apply GCP in practice for performance optimization.
ACKNOWLEDGMENT
This work was supported in part by the Fundamental Research Program of Guangdong Province (Grant No. 2006B36430001), the Foundation of Shenzhen City (Grant No. SG200810220145A and JC200903120069A), and the National Science Foundation of China (Grant No. 60602066). The work was also supported by the State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications) under Project number SKLNST-2009-1-8.
REFERENCES
[1] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, "Handling Churn in a DHT," in Proceedings of the 2004 USENIX Annual Technical Conference (USENIX'04), Boston, MA, USA, Jun. 2004, pp. 127–140.
[2] B. Chun, F. Dabek, A. Haeberlen, E. Sit, H. Weatherspoon, M. F. Kaashoek, J. Kubiatowicz, and R. Morris, "Efficient replica maintenance for distributed storage systems," in Proceedings of the 3rd Conference on Networked Systems Design & Implementation - Volume 3, San Jose, CA, May 8-10, 2006, pp. 45–58.
[3] R. Bhagwan, K. Tati, Y. C. Cheng, S. Savage, and G. M. Voelker, "Total recall: System support for automated availability management," in Proceedings of Symposium on Networked Systems Design and Implementation (NSDI), March 2004, pp. 337–350. doi=10.1.1.10.9775
[4] D. Stutzbach and R. Rejaie, "Understanding churn in peer-to-peer networks," Internet Measurement Conference 2006, pp. 189–202.
[5] J. Chu, K. Labonte, and B. N. Levine, "Availability and locality measurements of peer-to-peer file systems," in Proc. of ITCom: Scalability and Traffic Control in IP Networks, 2002. [Online]. Available: http://eprints.kfupm.edu.sa/27741/
[6] O. Herrera and T. Znati, "Modeling Churn in P2P Networks," 40th Annual Simulation Symposium (ANSS'07), 2007, pp. 33–40. [Online]. Available: http://doi.ieeecomputersociety.org/10.1109/ANSS.2007.28