Calculating Rank of Nodes in Decentralised Systems from Random Walks and Network Parameters

Sunantha Sodsee, Phayung Meesad, Mario Kubek, Herwig Unger
King Mongkut's University of Technology North Bangkok, Thailand
Fernuniversität in Hagen, Germany
Email: sunantha.sodsee@fernuni-hagen.de
Abstract—To use the structure of networks for identifying the importance of nodes in peer-to-peer networks, a distributed link-based ranking of nodes is presented. Its aim is to calculate the nodes' PageRank by utilising the local link knowledge of neighbouring nodes rather than the entire network structure. To this end, an algorithm is presented that determines an extended PageRank, called NodeRank, by distributed random walks and supports dynamic P2P networks. It takes into account not only the probabilities of nodes to be visited by a set of random walkers but also network parameters such as the available bandwidth. NodeRanks calculated this way are then applied for content distribution purposes. The algorithm is validated by numerical simulations. The results show that the nodes best suited to host sharable contents in the community are the ones with high NodeRanks, which also offer high-bandwidth connectivity.
Index Terms—Peer-to-peer systems, PageRank, NodeRank, random walks, network parameters, content distribution.
I. INTRODUCTION
At present, the amount of data available in the World Wide Web (WWW) is growing rapidly. To ease searching for information, several web search engines were designed, which determine the relevance of keywords characterising the content of web pages and return all matching results to the querying users (or nodes), as an ordinary index-based keyword search method does. Usually, there are more results than users expect and are able to handle. As a consequence, query results need to be ranked so that searchers can access lists of search results ordered by keyword relevance.

In particular, the search engine Google is based on keywords. To improve its search quality, a link analysis algorithm called PageRank [1] is used to define a rank of any page by considering the page's linkage. The importance of a web page is assumed to correlate with the importance of the pages pointing to it. Another link-based algorithm is Hyperlink-Induced Topic Search (HITS) [2]. It maintains a hub and an authority score for each page, both computed from the linkage relationship of pages. Both PageRank and HITS are able to determine ranks of keyword relevance, but they are iterative algorithms. These algorithms require centralised servers, since they process knowledge on the entire Internet. Consequently, they cannot be applied in decentralised systems like peer-to-peer (P2P) networks.

Because of its higher fault tolerance, autonomy, resource aggregation and dynamism, the content-based presentation of information in P2P networks has more benefits than the traditional client-server model. One of the crucial criteria for the use of the P2P paradigm is the search effectiveness made possible. The usually employed search method based on flooding [4] works by broadcasting query messages hop-by-hop across networks. This approach is simple, but not efficient in terms of network bandwidth utilisation. Another method, search based on distributed hash tables (DHT) [3], is efficient in terms of network bandwidth, but causes considerable overhead with respect to index files. DHT does not adapt to dynamic networks and dynamic content stored in nodes. Exhibiting fault tolerance, self-organisation and low overhead associated with node creation and removal, conducting random walks is a popular alternative to flooding [5]. Many search approaches in distributed search systems seek to optimise search performance. The objective of a search mechanism is to successfully return desired information to a querying user. In order to meet this goal, several approaches, e.g. [5], [6], were proposed. Most of them, however, base search on content only.

Due to the efficiency of [1] in the most-used search engine, the link analysis algorithm PageRank for determining the importance of nodes has become a significant technique integrated into distributed search systems, as it is not only sensible to apply it in centralised systems for improving query results, but it can also be of use in distributed systems. [7], [8] and [9] proposed distributed PageRank computations. The work in [7] is based on iterative aggregation-disaggregation methods: each node calculates a PageRank vector for its local nodes by using links within sites, and the local PageRank is updated by communicating with a given coordinator. In [8] and [9], nodes compute their PageRank locally by communicating with linked nodes. Moreover, in [9] each node exchanges its PageRank with the nodes it links to and those linking to it, and only parts of the linked nodes need to be contacted. Nevertheless, the mentioned works do not employ any network parameters in defining PageRank, which could be of advantage to reduce user access times.
Herein, the first contribution of this paper is to introduce an improved notion of PageRank applied in P2P networks which works in a distributed manner. When conducting searches, not only matching content but also content accessibility is considered, which influences the rank calculations presented. Therefore, a distributed algorithm based on random walks is proposed which takes network parameters, of which bandwidth is the most important one, into consideration when calculating ranks; the resulting rank is called NodeRank. This novel NodeRank determination will be described in Sec. III, after the state of the art has been outlined in Sec. II. The second contribution is to enhance the search performance in hybrid P2P systems. The presented NodeRank formula can be applied not only to support information retrieval but also content distribution, in order to find the most suitable locations for contents to be distributed. Contents will be distributed by artificial random walkers, whose behaviour is based on a modified ant-based clustering algorithm, picking contents from specific nodes and placing them at the most suitable locations according to the presented NodeRank definition. Its details will be presented in Sec. IV.
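To make the random-walk idea concrete before the formal treatment in Sec. III, the following minimal Python sketch lets several walkers traverse a small graph and estimates per-node visit frequencies. It is purely illustrative: the toy topology, bandwidth values, walker counts and the bandwidth-biased next-hop choice are assumptions for the example, not the paper's NodeRank definition.

```python
import random
from collections import Counter

# Hypothetical toy topology: node -> neighbours, plus a per-node bandwidth value.
graph = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"], "D": ["B"]}
bandwidth = {"A": 10.0, "B": 100.0, "C": 50.0, "D": 5.0}

def random_walk_visits(graph, bandwidth, walkers=50, steps=200, seed=1):
    """Estimate visit frequencies of several walkers that prefer high-bandwidth neighbours."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(walkers):
        node = rng.choice(list(graph))          # each walker starts at a random node
        for _ in range(steps):
            visits[node] += 1
            neighbours = graph[node]
            weights = [bandwidth[n] for n in neighbours]
            node = rng.choices(neighbours, weights=weights, k=1)[0]
    total = sum(visits.values())
    return {n: count / total for n, count in visits.items()}

print(random_walk_visits(graph, bandwidth))
```

Nodes that are both well linked and reachable over high-bandwidth connections accumulate the highest visit frequencies in such a sketch, which is the intuition behind combining link structure and network parameters into one rank.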
II. STATE OF THE ART

In this section, the background of P2P systems is presented first. Then, ant-based clustering algorithms are introduced. Later, the PageRank formula according to [1] is described. Finally, the simulation tool P2PNetSim used in this work is presented.
A. P2P Systems

Currently, most of the traffic growth in the Internet is caused by P2P applications. The P2P paradigm allows a group of computer users (employing the same networking software) to connect with each other to share resources. Peers make their resources, such as processing power, disk storage, network bandwidth and files, directly available to other peers. They behave in a distributed manner without a central server. As peers can act as both server and client, they are also called servents, which differs from the traditional client-server model. In addition, P2P systems are adaptive network structures whose nodes can join and leave them autonomously. Self-organisation, fault tolerance, load-balancing mechanisms and the ability to use large amounts of resources constitute further advantages of P2P systems.
1) System Architectures: At present, there are three major architectures for P2P systems, viz. unstructured, hybrid and structured ones.

In unstructured P2P systems such as Gnutella [4], a node queries its neighbours (and the network) by flooding with broadcasts. Unstructuredness supports the dynamicity of networks and allows nodes to be added or removed at any time. These systems have no central index, but they are scalable, because flooding is limited by the messages' time-to-live (TTL). Moreover, they allow for keyword search, but cannot guarantee a certain search performance.

Cluster-based hybrid P2P systems, or hybrid P2P systems for short, are a combination of fully centralised and pure P2P systems. Clustering reflects the small-world concept [15], because similar things are kept close together and long-distance links are added. The concept allows fast access to locations in searching. The most popular example is KaZaA [13]. It includes features from both the centralised server model and the P2P model. To cluster nodes, certain criteria are used. Nodes with high storage and computing capacities are selected as super nodes. The normal nodes (clients) are connected to the super nodes. The super nodes communicate with each other via inter-cluster networks, whereas clients within the same cluster are connected to their central super node. The super nodes carry out query routing, indexing and data search on behalf of the less powerful nodes. Hybrid P2P systems provide better scalability than centralised systems and show lower transmission latency (i.e. shorter network paths) than unstructured P2P systems.

In structured P2P systems, peers or resources are placed at specified locations based on specific topological criteria and algorithmic aspects facilitating search. They typically use distributed hash table-based indexing [3]. Structured P2P systems have the form of self-organising overlay networks, and support node insertion and route look-up in a bounded number of hops. Chord [10], CAN [11] and Pastry [12] are examples of such systems. Their features are load balancing, fault tolerance, scalability, availability and decentralisation.
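To illustrate the hybrid architecture just described, the sketch below shows super nodes indexing their clients' content and answering keyword queries on their behalf via inter-cluster links. Class and method names are hypothetical and not taken from KaZaA or this paper.

```python
class SuperNode:
    """Indexes the content of attached clients and answers queries for them."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.index = {}      # keyword -> set of client ids holding matching content
        self.peers = []      # other super nodes reachable via the inter-cluster network

    def register_client(self, client_id, keywords):
        for kw in keywords:
            self.index.setdefault(kw, set()).add(client_id)

    def query(self, keyword, ttl=2):
        # Answer from the local cluster first, then ask neighbouring super nodes.
        hits = set(self.index.get(keyword, set()))
        if ttl > 0:
            for peer in self.peers:
                hits |= peer.query(keyword, ttl - 1)
        return hits

# Usage: two clusters connected via their super nodes.
s1, s2 = SuperNode("S1"), SuperNode("S2")
s1.peers.append(s2)
s1.register_client("c1", ["music"])
s2.register_client("c7", ["music", "video"])
print(s1.query("music"))   # {'c1', 'c7'}
```

The TTL on inter-cluster queries plays the same role as in flooding: it bounds how far a request travels through the super-node overlay.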
2) Search Methods: Generally, three kinds of content search methods are supported in P2P systems. First, when searching with a specific keyword, the query message from the requesting node is repeatedly routed and forwarded to other nodes in order to look for the desired information. Secondly, in advertisement-based search [14], each node advertises its content by delivering advertisements and selectively storing interesting advertisements received from other nodes. Each node can locate the nodes holding certain content by looking up its local advertisement repository. Thus, it can obtain such content by a one-hop search with modest search cost. Finally, in cluster-based search, nodes are grouped into clusters according to the similarity of their contents. When a client submits a query to a server, it is transmitted to all nodes whose addresses are kept by the server, and which may be able to provide resources satisfying the query's search criteria.

In this paper, cluster-based P2P systems are considered in the example application, which combines the advantages of both the centralised server model and distributed systems to enhance search performance.
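The advertisement-based scheme can be pictured with a minimal sketch in which each node selectively stores received advertisements and then resolves a keyword to content holders in a single hop. Names and the selection criterion are illustrative assumptions, not the exact mechanism of [14].

```python
class Node:
    """Node with local content keywords and a repository of stored advertisements."""

    def __init__(self, node_id, keywords):
        self.node_id = node_id
        self.keywords = set(keywords)   # keywords describing local content
        self.ad_repository = {}         # keyword -> set of node ids that advertised it

    def receive_advertisement(self, sender_id, keywords, interesting):
        # Selectively store only advertisements considered interesting.
        for kw in keywords:
            if kw in interesting:
                self.ad_repository.setdefault(kw, set()).add(sender_id)

    def locate(self, keyword):
        # One-hop lookup: consult the local repository instead of flooding the network.
        return self.ad_repository.get(keyword, set())

n = Node("n1", ["reports"])
n.receive_advertisement("n5", ["music", "video"], interesting={"music"})
print(n.locate("music"))   # {'n5'}
```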
B. Ant-based Clustering Methods

In distributed search systems, data clustering is an established technique for improving quality not only in information retrieval but also in the distribution of contents. Clustering algorithms, in particular ant-based ones, are self-organising methods (there is no central control) and also work efficiently in distributed systems.

Natural ants are social insects. They use stigmergy [16] as an indirect way of coordinating themselves and their actions. This gives rise to a form of self-organisation, producing intelligent structures without any plans, controls or direct communication between the ants. Imitating the behaviour of ant societies to solve optimisation problems was first proposed by Dorigo [17].

In addition, ants can help each other to form piles of items such as corpses, larvae or grains of sand by means of
stigmergy. Initially, ants deposit items at random locations. When other ants visit these locations and perceive the deposited items, they are stimulated to deposit items next to them. This example corresponds to cluster building in distributed computer networks.

In 1990, Deneubourg et al. [18] first proposed a clustering and sorting algorithm mimicking ant behaviour. The algorithm is based on the corpse clustering and larval sorting of ants. In this context, clusters are collections of items piled up by ants, and sorting is performed by ants distinguishing items and placing them at certain locations according to item attributes. According to [18], isolated items should be placed at locations with similar items of matching type, or taken away otherwise. Thus, ants can pick up, carry and deposit items depending on associated probabilities. Moreover, ants may have the ability to remember the types of items seen within particular durations, and they move randomly on spatial grids.

A few years later, Lumer and Faieta [19] proposed several modifications to the work above for application in data analysis. One of their ideas is a similarity definition: they use a distance, such as the Euclidean one, to identify similarity or dissimilarity between items, evaluated over a local neighbourhood area on which an ant is centred. Another idea suggested for ant behaviour is to assume short-term memory: an ant can remember the last m items picked up and the locations where they have been placed.

The above-mentioned contributions pioneered the area of ant-based clustering. At present, the well-known ant-based clustering algorithms are being generalised, e.g. in Merelo [20].
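The pick and drop behaviour underlying these algorithms is commonly modelled with threshold functions of the locally perceived fraction f of similar items. The small sketch below uses Deneubourg-style probabilities; the constants k1 and k2 and their values are illustrative choices, not taken from [18] or [19].

```python
def pick_probability(f, k1=0.1):
    """Probability that an unladen ant picks up an item, given the perceived
    fraction f of similar items in its neighbourhood (Deneubourg-style)."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.15):
    """Probability that a laden ant drops its item at the current location."""
    return (f / (k2 + f)) ** 2

# Items surrounded by few similar neighbours are likely to be picked up,
# while items inside dense clusters of similar items are likely to be dropped.
for f in (0.0, 0.1, 0.5, 0.9):
    print(f, round(pick_probability(f), 3), round(drop_probability(f), 3))
```

This opposite dependence on f is what makes piles of similar items emerge and grow without any central coordination.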
C. The PageRank Algorithm
As good locations of clusters can improve search performance in hybrid P2P architectures, ranking algorithms can be applied to find suitable locations.

Herein, the PageRank (PR) algorithm, introduced by Brin and Page [1], is presented; it is well known, efficient and supports networks of large sizes. Based on link analysis, it is a method to rank the importance of pages based on their incoming links. The basic idea of PageRank is that a page's rank correlates with the number of incoming links from other, more important pages. In addition, a page linked to by an important page is also important [7]. Most popular search engines, such as Google, employ the PageRank algorithm to rank search results.

PageRank is further based on user behaviour: a user visits a web page following a hyperlink with a certain probability η, or jumps randomly to a page with probability 1 − η. The rank of a page correlates with the number of visiting users.

Classically, for the PageRank calculation the whole network graph needs to be considered. Let i represent a web page, and B_i be the set of pages pointing to page i. Further, let the users follow links with a certain probability η (often called the damping factor) and jump to random pages with probability 1 − η. Then, with the out-degree |j| of page j, the PageRank PR_i of page i is defined as

    PR_i = (1 − η) + η · Σ_{j ∈ B_i} PR_j / |j| .    (1)
Fig. 1. P2PNetSim: simulation tool for large P2P networks
The damping factor η is empirically determined to be 90%.
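As a quick numerical illustration of Eq. (1), the following sketch evaluates PR_i iteratively with η = 0.9; the toy link graph and the fixed iteration count are assumptions made only for this example.

```python
# Iterative evaluation of Eq. (1) on a toy link graph (illustrative values).
eta = 0.9                                   # damping factor
out_links = {                               # page -> pages it links to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}
pages = list(out_links)
incoming = {p: [q for q in pages if p in out_links[q]] for p in pages}   # B_i

pr = {p: 1.0 for p in pages}                # initial ranks
for _ in range(50):                         # fixed number of sweeps for the sketch
    pr = {
        i: (1 - eta) + eta * sum(pr[j] / len(out_links[j]) for j in incoming[i])
        for i in pages
    }

print(pr)   # page C collects links from both A and B and ends up ranked highest
```

Each sweep recomputes all ranks from the previous ones; in practice the iteration is stopped once the ranks change by less than a small tolerance.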
 D. The Simulation Tool P2PNetSim
The modified PageRank calculation presented here will be considered in a general setting. In order to carry out experiments, the conditions of real networks are simulated using the artificial environment of the distributed network simulator P2PNetSim [21]. This tool was developed because neither network simulators nor other existing simulation tools are able to investigate, in decentralised systems, processes that are programmed on the application level but executed in real TCP/IP-based network systems. This means a network simulator was needed that is capable of

- simulating a TCP/IP network with an IP address space, limited bandwidth and latencies, giving developers the possibility to structure the nodes into subnets as in existing IPv4 networks,
- building up any underlying hardware structure and establishing variable, time-dependent background traffic,
- setting up an initial small-world structure in the peer neighbourhood warehouses, and
- setting up peer structures, allowing the programmer to concentrate on programming P2P functionality and to use libraries of standard P2P functions like broadcasts.

Fig. 1 presents the simulation window of P2PNetSim. The simulator allows large-scale networks to be simulated and analysed on cluster computers, i.e. up to 2 million peers can be simulated on up to 256 computers. The behaviour of all nodes can be implemented in Java and then be distributed over the nodes of the simulated network.

At start-up, an interconnection of the peers according to the small-world concept is established in order to simulate the typical physical structure of computers connected to the Internet. P2PNetSim can be used through its graphical user interface (GUI), which allows simulations to be set up, run and controlled. For this task, one or more simulators can be set up. Each simulator takes care of one class A IP subnet and all peers within this subnet. Each simulator is bound to a so-called simulation node, which is a simulator's execution engine. Simulation nodes reside on different machines and, therefore, work in parallel. Communication between peers within one subnet is confined to the corresponding simulation node. This