
ME 2011

Introduction
Hyperlink analysis
HITS Algorithm
Example
Page Ranking
Example

Goal of Information retrieval

Classification of rankers

Content based

Connectivity based

Web page authors add hyperlinks for different reasons, and these links can carry valuable information:

As navigational aids.

To point to high-quality pages that might be on the same topic as the page containing the hyperlink.

Hyperlink analysis algorithms make either one or both of the following simplifying assumptions:
Assumption 1. A hyperlink from page A to page B is a recommendation of page B by the author of page A.
Assumption 2. If page A and page B are connected by a hyperlink, then they might be on the same topic.
The power of hyperlink analysis comes from the fact that it uses the content of other pages to rank the current page.

Link Analysis Algorithms

HITS algorithm
PageRank algorithm

HITS ALGORITHM

Developed by Jon Kleinberg.

Uses link structure to rank pages.

Determines two values for a page: authority and hub.

AUTHORITY

PROVIDES IMPORTANT AND TRUSTWORTHY INFORMATION

HUBS

Links to authorities

Indegree: the number of incoming links to a given node, used to measure authoritativeness.
Outdegree: the number of outgoing links from a given node, used here to measure hubness.

HITS algorithm tries to determine good hubs and authorities. Given a user query, the algorithm iteratively computes hub and authority scores for each node in the neighborhood graph, and then ranks the nodes by those scores. A document that points to many others is a good hub, and a document that many documents point to is a good authority.

1) Initialize: for all p, hub(p) = 1, auth(p) = 1
2) Authority Update Rule: For each page p, update auth(p) to be the sum of the hub scores of all pages that point to it.
3) Hub Update Rule: For each page p, update hub(p) to be the sum of the authority scores of all pages that it points to.
4) Normalize: the hub and auth vectors
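The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name hits, the dictionary-based graph format, the fixed iteration count, and the choice of Euclidean (L2) normalization are all assumptions, since the slides do not say which norm is used or when to stop.

```python
from math import sqrt

def normalize(scores):
    """Scale scores so their Euclidean norm is 1 (assumed normalization)."""
    norm = sqrt(sum(value * value for value in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: value / norm for page, value in scores.items()}

def hits(links, iterations=20):
    """links maps each page to the list of pages it points to (hypothetical format)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)

    # Reverse index: for every page, the pages that point to it.
    incoming = {page: [] for page in pages}
    for source, targets in links.items():
        for target in targets:
            incoming[target].append(source)

    # Step 1: every hub and authority score starts at 1.
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Step 2: authority = sum of hub scores of pages pointing to the page.
        auth = {p: sum(hub[q] for q in incoming[p]) for p in pages}
        # Step 3: hub = sum of (updated) authority scores of pages it points to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Step 4: normalize both vectors.
        auth = normalize(auth)
        hub = normalize(hub)
    return hub, auth

# Small illustrative graph: pages 2, 3, 4 point to page 1; page 1 points to 5, 6, 7.
example_links = {2: [1], 3: [1], 4: [1], 1: [5, 6, 7]}
hub_scores, auth_scores = hits(example_links)
best_authority = max(auth_scores, key=auth_scores.get)
print(best_authority)
```

Normalizing after every round keeps the scores bounded; only the relative order of the scores matters for ranking.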

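Example (first iteration, starting from all scores equal to 1). The sums below imply a graph in which pages 2, 3, and 4 link to page 1, and page 1 links to pages 5, 6, and 7.

a(1) = h(2) + h(3) + h(4) = 1 + 1 + 1 = 3

a(2) = a(3) = a(4) = 0

a(5) = a(6) = a(7) = 1

h(1) = a(5) + a(6) + a(7) = 1 + 1 + 1 = 3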

Let A be the adjacency matrix of the neighborhood graph, with A[i][j] = 1 when page i links to page j. Denote the authority weight vector by v and the hub weight vector by u. The update rules then become v = A^T u and u = A v.
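The matrix view can be checked with a short pure-Python sketch. It assumes the usual convention that A[i][j] = 1 when page i links to page j, so that the authority update is v = A^T u and the hub update is u = A v. The edge list is an assumption inferred from the worked example (pages 2, 3, 4 link to page 1; page 1 links to pages 5, 6, 7), and the helper names mat_vec and transpose are hypothetical.

```python
# Pages are numbered 1..7; list index i corresponds to page i + 1.
N = 7
edges = [(2, 1), (3, 1), (4, 1), (1, 5), (1, 6), (1, 7)]  # assumed (source, target) pairs

# Build the adjacency matrix A.
A = [[0] * N for _ in range(N)]
for source, target in edges:
    A[source - 1][target - 1] = 1

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def transpose(matrix):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(column) for column in zip(*matrix)]

u = [1] * N                   # hub weights, all initialized to 1
v = mat_vec(transpose(A), u)  # authority update: v = A^T u
u = mat_vec(A, v)             # hub update: u = A v (uses the new v)

print(v)  # [3, 0, 0, 0, 1, 1, 1]
print(u)  # [3, 3, 3, 3, 0, 0, 0]
```

The first entries of v and u reproduce a(1) = 3 and h(1) = 3 from the example; normalization is left out here so the raw sums stay visible.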

Drawbacks of HITS: hub and authority scores are relatively easy to manipulate, and the results can suffer from topic drift.

Ranking is the process of ordering the returned documents by decreasing relevance, so that the best answers appear at the top.

Why is page ranking needed?

Problems in Web Page Ranking


Huge size of the Web.
Exponential increase in size.
Unstructured nature of web pages.
No control over content.

Basic Concepts
-> Web pages with a high value will be ranked higher.

-> PageRank can only be increased or improved by getting quality links from other web pages.
-> Among the pages on the same topic on the World Wide Web, some page or pages must be more important than the others.

From the linking structure of document, it can be interpreted as:

PageRank of Page 1 is higher than Page 3.


PageRank of Page 3 is higher than Page 2.

VOTE: When Page 1 links to Page 2, Page 1 casts a vote for Page 2.

BACKLINKS: When Page 1 links to Page 2, Page 2 has a backlink from Page 1.
INTERNAL LINKS: Links from web pages within your website.
OUTGOING LINKS: Links to other web pages within a web site.

INBOUND LINK

OUTBOUND LINK

DANGLING LINK
The average PageRank over all pages is always one. Inbound links increase the PageRank value of a page. Through an outbound link, a page loses a portion of its PageRank to the linked page.

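PR[Page A] = (1 - d) + d * (PR[Page 1] / Q[Page 1] + ... + PR[Page n] / Q[Page n])

where
PR[Page A] = PageRank of Page A
PR[Page 1] = PageRank of Page 1, a page that links to Page A (Page A has a backlink from Page 1)
Q[Page 1] = number of outgoing links of Page 1
d = damping factor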
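The symbols above can be tied together with a small iterative sketch. It assumes the commonly quoted form of the formula, PR[A] = (1 - d) + d * (sum of PR[i] / Q[i] over all pages i that link to A), which is the form consistent with an average PageRank of one. The function name pagerank, the three-page graph, the value d = 0.85, and the iteration count are illustrative assumptions, and the sketch assumes every page has at least one outgoing link (no dangling links).

```python
def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links to (hypothetical format).

    Assumes every page has at least one outgoing link.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    pr = {page: 1.0 for page in pages}  # start every page at PageRank 1

    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each backlinking page q passes on PR[q] / Q[q],
            # where Q[q] is the number of outgoing links of q.
            backlink_sum = sum(
                pr[q] / len(links[q])
                for q in pages
                if page in links.get(q, [])
            )
            new_pr[page] = (1 - d) + d * backlink_sum
        pr = new_pr
    return pr

# Hypothetical three-page site: A links to B and C, B links to C, C links to A.
example = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(example)
average = sum(ranks.values()) / len(ranks)
for page, value in sorted(ranks.items()):
    print(page, round(value, 3))
```

In this small graph, page C collects votes from both A and B and ends up with the highest value, and the average over the three pages stays at one.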

Page A has two backlinks: a backlink from Page 1, which has a PageRank value of 4, and a backlink from Page 2, which has a PageRank value of 2. Page 1 has two outbound links and Page 2 has only one outbound link.
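The example can be worked through numerically. The sketch below assumes the formula PR[A] = (1 - d) + d * (PR[1] / Q[1] + PR[2] / Q[2]) and a damping factor of d = 0.85; the slide does not state a value for d, so that number and the helper name pagerank_of are assumptions made only for illustration.

```python
def pagerank_of(backlinks, d):
    """backlinks: list of (PageRank of linking page, number of its outgoing links)."""
    return (1 - d) + d * sum(pr / q for pr, q in backlinks)

d = 0.85  # assumed damping factor, not given in the slides

# Page 1: PageRank 4, two outbound links. Page 2: PageRank 2, one outbound link.
pr_a = pagerank_of([(4, 2), (2, 1)], d)
print(round(pr_a, 2))  # 3.55
```

Page 1 contributes 4 / 2 = 2 and Page 2 contributes 2 / 1 = 2, so under these assumptions PR[Page A] = 0.15 + 0.85 * 4 = 3.55.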

THANK YOU
