Final Report for A Survey of Load Balancing of Peer-to-Peer Networks

Yunhui Fu*

Abstract Distributed hash table-based indexing has been widely used in many applications of structured P2P networks. Most of these applications assume that the data are evenly mapped to the nodes in the system. In real networks, however, the nodes differ significantly, so proper load balancing mechanisms are needed to limit the impact on system performance. In this paper, we focus on load balancing algorithms for structured P2P overlay networks and survey a number of algorithms with respect to important criteria. We also compare the P2P simulators used in this project.

1 Introduction
P2P is one solution for building distributed systems. Starting in the late 1990s, the emergence of P2P applications (such as Napster and Gnutella) and their influence on the Internet led academics to start exploring P2P networks, which became one of the fastest growing and most popular topics in both applications and academic research. The concept of the P2P framework also introduces some interesting and important study methods. The main idea of P2P is self-organization without a central infrastructure, in contrast to the client-server mechanism of traditional Internet applications. P2P networks have some advantages over traditional client-server networks, such as scaling to any number of nodes and having no single point of failure. At the same time, they face some new challenges. The principal challenge of P2P systems is to be self-organizing while achieving high service quality. The key is to find and locate the data efficiently. There are two principal approaches to this problem: unstructured and structured P2P systems. Unstructured P2P networks are formed by nodes that are connected arbitrarily, so searching for a desired file requires flooding a query through the entire network. In a structured P2P network, by contrast, the desired file can be found efficiently
* School of Computing, Clemson University, yfu@clemson.edu

Table 1: The attributes of P2P overlay networks

| Category | Type | Attributes | Examples |
|---|---|---|---|
| Unstructured | | Central servers are needed; there is a centralized database for indexing the data. | Napster |
| Unstructured | pure P2P | The system remains usable after removing any node in the system; no central server. | Gnutella 0.4, Freenet |
| Unstructured | hybrid P2P | The system remains usable after removing any node in the system; dynamic central server. | Gnutella 0.6, JXTA |
| Structured | DHT-based | The system remains usable after removing any node in the system; no central server; the connections in the overlay network are fixed. | Chord, CAN |

by some type of data structure. Distributed hash table-based (DHT) indexing is used in structured P2P networks. The complexity of DHT lookup algorithms can be O(log N), which is similar to the complexity of commonly used indexing and searching algorithms. Although the DHT algorithms used in structured P2P networks are, in theory, more powerful than the search in unstructured P2P networks, they assume that the data are evenly distributed among the nodes. Because the nodes in a P2P network differ in capabilities such as storage, CPU power, and bandwidth, the load differs significantly from one node to another. A properly designed load balancing mechanism is therefore needed for the DHT algorithm to retain its O(log N) complexity. During the last decade, many researchers have investigated the properties of P2P networks in all their aspects, and the theory of P2P networks has matured. The advances in this research area benefit not only file sharing on the Internet but also other distributed applications. Load balancing is a key to improving the performance of the system. Evaluating these algorithms and their characteristics therefore lays the foundation for applying load balancing algorithms to other applications.
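To illustrate the O(log N) lookup cost mentioned above, the following minimal Python sketch routes lookups on a Chord-like ring [3] using finger tables and counts the forwarding hops. The 16-bit identifier space and all function names are assumptions chosen for this example, and the fingers are computed from a global node list for brevity, which a real DHT node could not do.

```python
import bisect
import math
import random

M = 16            # identifier bits (assumed for this example)
RING = 1 << M     # size of the identifier space


def successor(nodes, ident):
    """First node clockwise from ident; nodes is a sorted list of node IDs."""
    i = bisect.bisect_left(nodes, ident % RING)
    return nodes[i % len(nodes)]


def between(x, a, b, right_closed=False):
    """True if x lies in the ring interval (a, b), or (a, b] if right_closed."""
    if right_closed and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b   # interval wraps past zero (also covers a == b)


def lookup(nodes, start, key):
    """Route from node start to the node responsible for key.

    Returns (owner, hops), where hops counts node-to-node forwards.
    """
    node, hops = start, 0
    while True:
        succ = successor(nodes, node + 1)
        if between(key, node, succ, right_closed=True):
            return succ, hops
        # Forward to the farthest finger that still precedes the key.
        for i in reversed(range(M)):
            finger = successor(nodes, node + (1 << i))
            if between(finger, node, key):
                node = finger
                break
        else:
            node = succ     # defensive fallback; not expected to be reached
        hops += 1


if __name__ == "__main__":
    rng = random.Random(7)
    nodes = sorted(rng.sample(range(RING), 256))
    hops = [lookup(nodes, rng.choice(nodes), rng.randrange(RING))[1]
            for _ in range(1000)]
    print("mean hops:", sum(hops) / len(hops), "log2(N):", math.log2(len(nodes)))
```

In a run of this sketch the mean hop count should stay on the order of log2(N), but this bound says nothing about how many keys each node stores, which is the load balancing problem discussed in the following sections.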

2 Uneven Load
Structured P2P overlay networks need load balancing to make full use of the resources of each node in the network and to work effectively. There are two strategies for implementing load balancing. The first is to distribute the tasks evenly, so that every node takes the same load. The other is to distribute the load according to the capability of each node in the overlay network. In a real P2P overlay network, there are significant differences between nodes in processing power, storage capacity, bandwidth, and network connectivity.

A resource object can be stored on an arbitrary node in a structured P2P overlay network, and different systems use different distribution policies. The common method is to distribute the resource index according to some relationship between the object ID and the node ID. Once the objects are allocated, their locations are fixed and they can be located quickly. If a resource is very popular, a large number of requests will try to access it, and many query messages will flock to the node that stores it, so a hotspot emerges. A hotspot may cause serious problems for the node: its load increases, and it may eventually fail to serve requests. The hotspot problem is caused by the imbalance of user queries. The studies [1, 2] show that the distribution of queries in P2P overlay networks is very similar to that of HTTP requests, namely a Zipf-like distribution.
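The effect of a Zipf-like query distribution on node load can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the exponent alpha = 1.0, the numbers of objects, nodes, and queries, and the simple hash-modulo placement (a stand-in for a real DHT mapping) are all chosen for this example and are not taken from [1, 2].

```python
import hashlib
import random
from collections import Counter


def node_for(object_id, num_nodes):
    """Place an object on a node by hashing its ID (stand-in for a DHT mapping)."""
    digest = hashlib.sha1(str(object_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def simulate_queries(num_objects=1000, num_nodes=50, num_queries=100_000,
                     alpha=1.0, seed=1):
    """Return the number of queries each node receives under Zipf-like popularity."""
    rng = random.Random(seed)
    # The object with popularity rank r is queried with weight 1 / r**alpha.
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]
    queried = rng.choices(range(num_objects), weights=weights, k=num_queries)
    load = Counter(node_for(obj, num_nodes) for obj in queried)
    return [load.get(n, 0) for n in range(num_nodes)]


if __name__ == "__main__":
    load = simulate_queries()
    mean = sum(load) / len(load)
    print("mean load:", mean, "max load:", max(load))
```

Even though the objects themselves are spread over the nodes by the hash, the node that happens to store the most popular object receives a load far above the mean, which is the hotspot effect described above.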

3 The algorithms for load balancing


Several P2P load balancing algorithms have emerged in recent years, such as virtual servers [3, 4], the multiple hash strategy [5, 6, 7], the data replication strategy [8, 9], the caching strategy [1], and the adjustable routing strategy [1, 10]. In addition, other researchers have proposed a global load balancing policy based on histograms [11].

3.0.1 Virtual Servers
Virtual servers [3, 4] can be viewed as the nodes of a structured P2P network. Each physical node may host one or more virtual servers in the lower network layer, and each virtual server manages a block of the ID space. A physical node can therefore manage multiple blocks of the ID space and act as several standalone logical nodes. The ID blocks managed by one physical node may be contiguous or non-contiguous. If a physical node has a heavy workload, one or more of its virtual servers are transferred to a node with a light workload. The transfer of a virtual server can be handled by the Join and Leave operations of structured P2P networks.

The main idea of the virtual server strategy is to transfer virtual servers from busy nodes to idle nodes. A virtual server suitable for transfer meets the following conditions:
1. The receiving node will not become a busy node after accepting the virtual server;
2. The virtual server released is the lightest one that relieves the overload of the sending node;
3. If releasing no single virtual server can relieve the overload, the virtual server with the heaviest load is selected for transfer.
Under the third condition the receiving node must still not become overloaded when it accepts the virtual server, so that the chance of relieving the overloaded node increases in the next round.
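The three conditions above can be read as a selection rule. The following Python sketch is one possible interpretation, not the implementation from [3, 4]: the function name, the representation of loads as plain numbers, and the per-node target loads (the threshold above which a node counts as busy) are assumptions made for this example.

```python
from typing import List, Optional


def pick_virtual_server(heavy_vs: List[float], heavy_target: float,
                        light_load: float, light_target: float) -> Optional[int]:
    """Return the index of the virtual server to move, or None if none qualifies.

    heavy_vs     loads of the virtual servers on the overloaded node
    heavy_target load at or below which the sending node is no longer busy
    light_load   current load of the receiving node
    light_target load at or below which the receiving node is not busy
    """
    heavy_load = sum(heavy_vs)
    # Condition 1: the receiver must not become busy after the transfer.
    fits = [i for i, v in enumerate(heavy_vs) if light_load + v <= light_target]
    if not fits:
        return None
    # Condition 2: prefer the lightest server whose release relieves the sender.
    relieving = [i for i in fits if heavy_load - heavy_vs[i] <= heavy_target]
    if relieving:
        return min(relieving, key=lambda i: heavy_vs[i])
    # Condition 3: otherwise move the heaviest server the receiver can accept.
    return max(fits, key=lambda i: heavy_vs[i])


if __name__ == "__main__":
    # Sender carries 5 + 3 + 8 = 16 with a target of 10; only moving the
    # server with load 8 brings it under the target.
    print(pick_virtual_server([5, 3, 8], 10, light_load=2, light_target=10))
```

Returning None when no virtual server fits keeps condition 1 strict; a caller would then try another lightly loaded node in the next round.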

Three transfer schemes can be used:
1. One to one: Two nodes are selected at random, and virtual servers are transferred from the overloaded node to the lightly loaded one. Each lightly loaded node selects one node at random; the transfer occurs if that node is overloaded and the transfer complies with the three rules above.
2. One to many: An overloaded node is allowed to transfer its virtual servers to several lightly loaded nodes. The virtual server to be transferred is the lightest one found by applying the above rules to the set of lightly loaded nodes.
3. Many to many: This mechanism matches more than one overloaded node to multiple lightly loaded nodes.
The virtual server strategy is easy to implement with the Join and Leave operations of DHT nodes. However, the system overhead is relatively large, and the strategy does not take the imbalance of user queries into account.

3.0.2 Multiple Hash Strategy
In this strategy [5, 6, 7], each node maps a resource onto the DHT ring using several hash functions h1, h2, ..., hd. Each time an object x is to be stored, the node computes the hash values h1(x), h2(x), ..., hd(x) of the object's key, and these values are the node IDs of the candidate nodes. The node then stores the resource on the candidate node with the lightest load.

There are two query methods. The first recomputes all of the hash functions to query a resource object, which is found after querying all the candidate nodes. These queries can be processed in parallel, but they consume a lot of bandwidth and thus cost more time than other classic methods. The second method uses pointer redirection: all the other candidate nodes maintain a pointer to the lightly loaded node that stores the resource object. When a resource object x is queried and a pointer to x exists, there is no need to calculate all of the hash functions, and the request is sent directly to the node that stores x.
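A minimal Python sketch of the multiple hash strategy with the first query method (asking every candidate node) is given below. It is an illustration under assumptions: the class name, the use of salted SHA-1 digests as the hash functions h1, ..., hd, the modulo mapping of hash values to node indices, and the number of stored items as the load measure are all chosen for this example. The pointer redirection of the second query method is not modeled.

```python
import hashlib
from collections import defaultdict


class MultiHashStore:
    """Toy store that places each key on the least loaded of d candidate nodes."""

    def __init__(self, num_nodes, d=2):
        self.num_nodes = num_nodes
        self.d = d
        self.items = defaultdict(dict)   # node index -> {key: value}

    def candidates(self, key):
        """Candidate nodes h_1(key), ..., h_d(key), derived from salted SHA-1."""
        result = []
        for i in range(self.d):
            digest = hashlib.sha1(f"{i}:{key}".encode()).digest()
            result.append(int.from_bytes(digest[:8], "big") % self.num_nodes)
        return result

    def put(self, key, value):
        cands = self.candidates(key)
        # Overwrite in place if the key is already stored on a candidate.
        for node in cands:
            if key in self.items[node]:
                self.items[node][key] = value
                return node
        node = min(cands, key=lambda n: len(self.items[n]))
        self.items[node][key] = value
        return node

    def get(self, key):
        # First query method: ask every candidate node for the key.
        for node in self.candidates(key):
            if key in self.items[node]:
                return self.items[node][key]
        return None

    def loads(self):
        return [len(self.items[n]) for n in range(self.num_nodes)]


if __name__ == "__main__":
    for d in (1, 2):
        store = MultiHashStore(num_nodes=100, d=d)
        for k in range(10_000):
            store.put(f"doc-{k}", k)
        print("d =", d, "max load:", max(store.loads()))
```

With d = 1 the sketch degenerates to ordinary single-hash placement, which makes it easy to compare the maximum load with and without the additional choices.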
The overhead of the multiple hash strategy is not very heavy for the network, but the strategy does not take into account the influence of system dynamics or the imbalance of user queries caused by hot spots.

3.0.3 Data Replication Strategy
The data replication strategy [8, 9] can be used to handle hot spot problems. When a hot spot emerges, other nodes receive a copy of the hot spot node's resource. This reduces the load of the hot spot node, and thus balances the load and increases system utilization. Although it can handle the hot spot problem, it cannot resolve load balancing under Zipf queries.
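The replication idea can be sketched as follows. This is a deliberately simplified, hypothetical policy rather than the schemes of [8, 9]: the function name, the fixed request threshold that triggers a new copy, and the round-robin spreading of requests over the current holders are assumptions made for this example.

```python
from collections import Counter


def serve_requests(num_requests, owner, spare_nodes, hot_threshold):
    """Serve requests for one popular object and return the load per node.

    Assumed policy: each time a holder has served another hot_threshold
    requests, the object is copied to one more spare node. Requests are
    spread over the current holders in round-robin order.
    """
    load = Counter()
    holders = [owner]
    spares = list(spare_nodes)
    for i in range(num_requests):
        node = holders[i % len(holders)]
        load[node] += 1
        # Hot spot detected on this holder: create one more replica.
        if spares and load[node] % hot_threshold == 0:
            holders.append(spares.pop(0))
    return load


if __name__ == "__main__":
    print(serve_requests(1000, "owner", [], 100))
    print(serve_requests(1000, "owner", ["a", "b", "c"], 100))
```

Without spare nodes the owner serves every request; with replicas the same requests are shared among several holders, at the cost of the extra copies.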

3.0.4 Caching Strategy
The caching strategy [1] is also used to handle hot spot problems, and it takes the unbalanced nature of resource queries into account. The main idea is that each node analyzes the access information of hot spot resources and stores a copy of the resource on one of the nodes along the query path; the copy is stored only temporarily. Caching hot spot resources reduces the load of overloaded nodes and thereby balances the load. The caching strategy can handle the hot spot problem as well as load balancing under Zipf queries, but it does not consider how the underlying physical network affects the load balancing algorithm in practice.

3.0.5 Adjustable Routing Strategy
The adjustable routing strategy [1, 10] relieves the message load of overloaded nodes through appropriate changes to the routing tables of the structured P2P network. This method considers the effect of the unbalanced query load on the nodes, but it likewise does not take into account how the underlying physical network affects the load balancing algorithm in practice.

4 Simulators
To study the details of the various P2P networks, a handy and powerful simulator is needed. A number of useful simulators have been used for research purposes. Some simulators are extensible and can be used for general purposes, while others are not. Some of them simulate only the lower-layer physical network and are not suitable for P2P overlay networks. Few of them were designed for structured or unstructured P2P networks. Many researchers have tried to design and implement a common, extensible P2P simulator that supports various existing P2P protocols.

NS-2
NS-2 is a packet-level simulator based on discrete events, commonly used for simulating packet-switching networks and protocols. C++ is used to define the various lower-layer protocols, and OTcl is used by the user to define the nodes and links to be simulated. NS-2 supports IP, TCP, UDP, and many routing protocols. There is a Gnutella module for NS-2, and a basic implementation of the BitTorrent protocol is available. The latest version was released on November 4, 2011.

Figure 1: Screenshot of NS2 Nam

PeerSim
PeerSim is a simulator for P2P overlay networks written in Java that supports both structured and unstructured P2P networks. It has two simulation engines, one cycle-based and one event-driven. The event-driven engine is more precise than the cycle-based one, while the cycle-based engine can support larger networks. Various protocols have been implemented as plugins, such as a bandwidth management protocol, a fault-tolerant FSM, Pastry, Chord, Skipnet, BitTorrent, Aggregation, SG-1, Peer sampling service, T-Man, PdProtocol, and Slacer. The latest version was released on July 23, 2011.

Overlay Weaver
Overlay Weaver was developed to simulate structured overlay networks only. It integrates protocols such as Chord, Kademlia, Pastry, Tapestry, and Koorde. Scenarios are defined in a simple script file. The software is written in Java, and the latest version was released on December 2, 2011.

Figure 2: Screenshot of Overlay Weaver

Oversim
Oversim runs on Unix-like systems and is implemented on top of OMNeT++, which serves as its network layer. It also supports both structured and unstructured P2P networks. Implemented protocols include Chord, Pastry, Bamboo, Koorde, Broose, Kademlia, GIA, Vast, and Publish-Subscribe for MMOGs. The latest version was released on November 3, 2010.

PlanetSim
PlanetSim is an event-based overlay network simulator that supports both structured and unstructured P2P networks. Its architecture is well designed and comprises network, overlay, and application layers, which makes it very simple for the user to extend. The latest version is available on the PlanetSim SourceForge page and was released on September 8, 2008.

P2Psim
P2Psim is a discrete event simulator for evaluating structured P2P protocols. The supported protocols include Chord, Accordion, Koorde, Kelips, Tapestry, and Kademlia. The latest version was released on April 18, 2005.

General Purpose P2P simulator (GPS)
GPS is an application-layer simulator written in Java. It supports both structured and unstructured overlay networks. GPS appears to include a full implementation of the BitTorrent protocol and can simulate that protocol more precisely than the other simulators. The latest version was released on September 25, 2005.

NeuroGrid
NeuroGrid is written in Java and can simulate structured and unstructured protocols such as Freenet, Gnutella, and NeuroGrid. The project can be found on the NeuroGrid SourceForge page; the latest version was released on March 4, 2004.

Figure 3: Screenshot of Oversim

Figure 4: Screenshot of GPS

5 Comparison of Load Balancing Algorithms


A complete Chord ring can be used to study and compare load balancing algorithms. The study [12] made its comparison with this method, focusing on the distribution of documents among the nodes in a DHT. It suggested that the number of nodes in the Chord distributed hash table should be 4096. Each test case has to be run several times to confirm the results. The number of documents in the system ranges from 100,000 to 1,000,000, and the keys of the documents and nodes are generated randomly. The address space of the Chord ring has a size of 22 bits, so it can store and manage 2^22 = 4,194,304 documents and/or nodes.
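The setup described above (4096 nodes, a 22-bit address space, randomly generated keys) can be approximated with a short Python sketch for the case without load balancing. This is not the simulator used in [12]; the function name, the random seed, and the choice of 100,000 documents from the stated range are assumptions made for this example.

```python
import bisect
import random
from collections import Counter

BITS = 22
RING = 1 << BITS   # 2^22 = 4,194,304 addresses


def chord_distribution(num_nodes=4096, num_docs=100_000, seed=0):
    """Return the number of documents per node on a Chord ring without balancing."""
    rng = random.Random(seed)
    # Random, distinct node IDs, kept sorted so successors can be found by bisection.
    nodes = sorted(rng.sample(range(RING), num_nodes))
    counts = Counter()
    for _ in range(num_docs):
        key = rng.randrange(RING)
        # A document belongs to the first node clockwise from its key.
        idx = bisect.bisect_left(nodes, key) % num_nodes
        counts[nodes[idx]] += 1
    return [counts.get(n, 0) for n in nodes]


if __name__ == "__main__":
    load = chord_distribution()
    mean = sum(load) / len(load)
    print("mean:", mean, "max:", max(load), "empty nodes:", load.count(0))
```

Because the node IDs are random, the arcs of the ring owned by the nodes differ widely in length, so a run of this sketch is expected to show both empty nodes and nodes carrying several times the mean load.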

Figure 5: Simulation results comparing different approaches for load balancing in Chord [12].

Figure 5a shows the document distribution in Chord when there is no load balancing. Some nodes do not manage any documents even when there is a huge number of documents in the DHT, so some other nodes may have to endure a load ten times the average. Figure 5b shows that the power of two choices is better than Chord without load balancing, although there are still some nodes without any documents. Chord with virtual servers balances the load more effectively (Figure 5c); the drawbacks are that each node has to manage many virtual servers and that the data of those virtual servers must be stored in the memory of the physical node. The best result is shown in Figure 5d, which uses the heat dispersion algorithm. Each node manages a certain amount of data, and the fluctuations are the smallest. Documents are transferred only between neighboring nodes.

6 Conclusions
Load balancing in P2P networks is very interesting because its research results can be applied to other distributed applications. We introduced the current state of research on load balancing in structured P2P networks, including several commonly used methods such as virtual servers, the multiple hash strategy, the data replication strategy, the caching strategy, and the adjustable routing strategy. We also collected and compared some P2P network simulators for use in our further research.

References
[1] S. Bianchi, S. Serbu, P. Felber, and P. Kropf. Adaptive load balancing for DHT lookups. In Computer Communications and Networks, 2006. ICCCN 2006. Proceedings. 15th International Conference on, pages 411-418, Oct. 2006.
[2] S. B. Handurukande, A.-M. Kermarrec, F. Le Fessant, L. Massoulié, and S. Patarin. Peer sharing behaviour in the eDonkey network, and implications for the design of server-less file sharing systems. SIGOPS Oper. Syst. Rev., 40:359-371, April 2006.
[3] Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, and Hari Balakrishnan. Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans. Netw., 11:17-32, February 2003.
[4] Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica. Wide-area cooperative storage with CFS, 2001.
[5] John Byers, Jeffrey Considine, and Michael Mitzenmacher. Simple load balancing for distributed hash tables. pages 80-87, 2002.
[6] Jonathan Ledlie and Margo I. Seltzer. Distributed, secure load balancing with skew, heterogeneity and churn. In INFOCOM'05, pages 1419-1430, 2005.
[7] Ye-In Chang, Hue-Ling Chen, Sih-Ning Li, and Hung-Ze Liu. A dynamic hashing approach to supporting load balance in P2P systems. In Distributed Computing Systems Workshops, 2008. ICDCS '08. 28th International Conference on, pages 429-434, June 2008.
[8] Qin Lv, Pei Cao, Edith Cohen, Kai Li, and Scott Shenker. Search and replication in unstructured peer-to-peer networks. pages 84-95. ACM Press, 2002.
[9] Zhiyong Xu and L. Bhuyan. Effective load balancing in P2P systems. In Cluster Computing and the Grid, 2006. CCGRID 06. Sixth IEEE International Symposium on, volume 1, pages 81-88, May 2006.
[10] A. Datta, R. Schmidt, and K. Aberer. Query-load balancing in structured overlays. In Cluster Computing and the Grid, 2007. CCGRID 2007. Seventh IEEE International Symposium on, pages 453-460, May 2007.
[11] Quang Hieu Vu, Beng Chin Ooi, M. Rinard, and Kian-Lee Tan. Histogram-based global load balancing in structured peer-to-peer systems. Knowledge and Data Engineering, IEEE Transactions on, 21(4):595-608, April 2009.
[12] Simon Rieche, Leo Petrak, and Klaus Wehrle. A thermal-dissipation-based approach for balancing data load in distributed hash tables. In Proc. of 29th Annual IEEE Conference on Local Computer Networks (LCN), 2004.
