(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 2, February 2011
damping factor (α) is eminently empirical, and in most cases the value of α can be taken as 0.85 [1]. Page Rank is the stationary state of a Markov chain [2, 7]. The chain is obtained by perturbing the transition matrix induced by a web graph with a damping factor that spreads uniformly over the rank. The behavior of Page Rank with respect to changes in α is useful in link-spam detection [3]. The mathematical analysis of Page Rank with change in α shows that, contrary to popular belief, for real-world graphs values of α
close to 1 do not give a more meaningful ranking [2, 21].

The order of displayed web pages is computed by the search engine Google as the Page Rank vector, whose entries are the Page Ranks of the web pages [4]. The Page Rank vector is the stationary distribution of a stochastic matrix, the Google matrix. The Google matrix in turn is a convex combination of two stochastic matrices: one matrix represents the link structure of the web graph, and a second, rank-one matrix mimics the random behavior of web surfers and can also be used to fight web spamming. As a consequence, Page Rank depends mainly on the link structure of the web graph, not on the contents of the web pages. Also, the Page Rank of the first vertex, the root of the graph, follows the power law [10]. However, the power undergoes a phase transition as parameters of the model vary.

Link-based ranking algorithms rank web pages by using the dominant eigenvector of certain matrices, such as the co-citation matrix or its variations [17]. Distributed page ranking on top of structured peer-to-peer networks is needed because the size of the web grows at a remarkable speed and centralized page ranking is not scalable [5].

Page ranking can use propagation rates that depend on the types of the links and the user's specific set of interests [6]. Page filtering can be decided based on link types combined with some other information relevant to links. For ranking, a profile containing a set of ranking rules to be followed in the task can be specified to reflect the user's specific interests
[20]. Similarities of content between hyperlinked pages are useful for producing a better global ranking of web pages [19].

III. CHALLENGES

The primary focus of a Web Information Retrieval Support System (WIRSS) is to address the aspects of search that consider the specific needs and goals of the individuals conducting web searches [15]. The major goal is to provide high-quality search results over a rapidly growing World Wide Web. Google employs a number of techniques to improve search quality, including page rank, anchor text, and proximity information. Decentralized content publishing is the main reason for the explosive growth of the web. For any user query there are many documents that can be retrieved by a search engine, and every document owner wants to improve the ranking of their document. Commercial search engines have to maintain the integrity of their search results, which is one reason the details of their efforts are not publicly available. Democratization of content creation on the web generates new challenges in WIRSS. This raises the question of the integrity of web pages. In a simplistic approach, one might argue that only some publishers are trustworthy and others are not. One more challenge is that fast crawling technology is needed to gather the web objects and keep them up to date.

IV. WEB_OBJECT_RANK ALGORITHM AND IMPLEMENTATION

The Page Rank of a web object can be defined as the fraction of time that the surfer spends, on average, on that object.
The probability that the random surfer visits a web page is its Page Rank [1]. Evidently, web objects that are hyperlinked by many other pages are visited more often. The random surfer gets bored and restarts from another random web object with a probability termed the moister factor (m). The probability that the surfer follows a randomly chosen outlink is (1-m).

The Markov chain is a discrete-time stochastic process: a process that occurs in a series of time-steps, in each of which a random choice is made [7]. There is one state corresponding to each web object. Hence, a Markov chain consists of N states if there are N web objects in the collection. A Markov chain is characterized by an N × N probability transition matrix P, each of whose entries is in the interval [0, 1]; the entries in each row of P add up to 1. The Markov property states that each entry Pij is a transition probability that depends only on the current state i. A Markov chain's probability distribution over its states may be viewed as a probability vector: a vector all of whose entries are in the interval [0, 1] and add up to 1. The problem of computing bounds on the conditional steady-state probability vector of a subset of states in finite, discrete-time Markov chains is considered in [7, 14].
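The random-surfer chain described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this work: the function names, the default m = 0.15 (chosen to correspond to a damping factor of 0.85), and the uniform restart from objects without outlinks are all assumptions.

```python
def transition_matrix(out_links, m=0.15):
    """Build the N x N row-stochastic matrix P of the random surfer.

    out_links[i] lists the objects that object i links to.
    m is the restart probability (the moister factor); 0.15 is an assumed default.
    """
    n = len(out_links)
    P = [[0.0] * n for _ in range(n)]
    for i, targets in enumerate(out_links):
        if not targets:
            # Assumption: with no outlinks, the surfer restarts uniformly at random.
            for j in range(n):
                P[i][j] = 1.0 / n
            continue
        for j in range(n):
            P[i][j] = m / n  # restart at a random object with probability m
        for j in targets:
            P[i][j] += (1.0 - m) / len(targets)  # follow a random outlink
    return P


def stationary_vector(P, tol=1e-12, max_iter=1000):
    """Power iteration: repeat x <- xP until the probability vector stops changing."""
    n = len(P)
    x = [1.0 / n] * n  # start from the uniform probability vector
    for _ in range(max_iter):
        nxt = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical three-object graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
P = transition_matrix([[1, 2], [2], [0]])
rank = stationary_vector(P)
```

Each row of P sums to 1 and the entries of rank sum to 1, matching the definitions of the probability transition matrix and the probability vector. In this small graph, object 2, which receives links from both other objects, ends up with the largest value.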
A. Web_Object_Rank Algorithm: Features
The features of the Object Rank algorithm are as follows:
Query-independent algorithm (assigns a value to every document independent of the query).
Content-independent algorithm.
Concerned with the static quality of a web page.
The Object Rank value can be computed offline using only the web graph.
Object Rank is based upon the linking structure of the whole web.
Object Rank does not rank a website as a whole; it is determined for each web page individually.
The Object Ranks of the web pages Ti which link to page A do not influence the rank of page A uniformly.
The more outbound links there are on a page T, the less page A benefits from a link to it on T.
Object Rank is a model of the user's behavior.
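The two features concerning linking pages can be made concrete with a small numeric sketch. It assumes that a page T passes the share rank(T) / outdegree(T), scaled by (1-m), to each page it links to. This excerpt does not state that formula, so both the formula and the name link_contribution are assumptions made for illustration only.

```python
def link_contribution(rank_t, out_degree_t, m=0.15):
    """Rank that page T passes to one page it links to (assumed formula)."""
    if out_degree_t <= 0:
        raise ValueError("page T must have at least one outbound link")
    return (1.0 - m) * rank_t / out_degree_t


# Two hypothetical pages with the same rank but different numbers of outbound links.
few = link_contribution(rank_t=0.2, out_degree_t=2)
many = link_contribution(rank_t=0.2, out_degree_t=10)
```

Under this assumption, the page with two outbound links passes five times as much rank to page A as the page with ten, which is the sense in which linking pages do not influence A uniformly.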
B. Web_Object_Rank Algorithm: Assumptions
If there are multiple links between two web objects,only a single edge is placed.
No self-loops are allowed.
The edges could be weighted, but we assume thatno weight is assigned to edges in the graph.
Links within the same web site are removed.
Isolated nodes are removed from the graph.
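The assumptions above amount to a preprocessing pass over the raw links. The following is a minimal sketch under stated assumptions: links are directed (source URL, target URL) pairs, and two URLs belong to the same web site when their host names match. The helper names site_of and build_graph are hypothetical.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Assumed notion of 'web site': the lower-cased host name of the URL."""
    return urlsplit(url).netloc.lower()


def build_graph(links):
    """Turn raw (source_url, target_url) pairs into an unweighted graph.

    Returns a dict mapping each remaining node to the set of nodes it links to.
    """
    edges = set()
    for src, dst in links:
        if src == dst:
            continue  # no self-loops
        if site_of(src) == site_of(dst):
            continue  # links within the same web site are removed
        edges.add((src, dst))  # a set keeps a single, unweighted edge per pair

    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    # Only endpoints of surviving edges were added, so isolated nodes are absent.
    return graph
```

Storing edges in a set handles the single-edge and no-weight assumptions together, and building the node set from the surviving edges means isolated nodes never enter the graph.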
http://sites.google.com/site/ijcsis/ ISSN 1947-5500