Experiment No. 9: Aim: - Theory
2021-2022
Dhruv Jain
60004190030
TE COMPS A
Experiment No. 9
Aim: - Implementation of the HITS algorithm.
Theory: -
Hyperlink-Induced Topic Search (HITS) is a link analysis algorithm, developed by Jon Kleinberg,
that rates webpages. It analyzes the link structure of the web to discover and rank the webpages
relevant to a particular search.
HITS uses hubs and authorities to define a recursive relationship between webpages. Before
looking at the algorithm itself, it helps to define these terms.
Given a query to a search engine, the set of highly relevant web pages is called the Root set.
These pages are potential Authorities.
Pages that are not very relevant themselves but point to pages in the Root set are called Hubs.
Thus, an Authority is a page that many hubs link to, whereas a Hub is a page that links to many
authorities.
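To make the two roles concrete, the following minimal sketch counts in-links and out-links on a small hypothetical graph (the page names P1 to P4 and their links are invented for illustration). A page with many in-links is a candidate authority and a page with many out-links is a candidate hub; these raw counts are only a first approximation of the scores HITS computes.

```python
# Hypothetical toy graph: each key links to the pages in its list.
links = {
    "P1": ["P3", "P4"],
    "P2": ["P3", "P4"],
    "P3": [],
    "P4": ["P3"],
}

# Out-link count: how many pages each page points to (hub-like behaviour).
out_links = {page: len(targets) for page, targets in links.items()}

# In-link count: how many pages point to each page (authority-like behaviour).
in_links = {page: 0 for page in links}
for targets in links.values():
    for target in targets:
        in_links[target] += 1

print("out-links:", out_links)
print("in-links:", in_links)
```

Here P3 is pointed to by every other page, so it looks like an authority, while P1 and P2 only point outward, so they look like hubs.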
Algorithm
-> Let the number of iterations be k.
-> Each node is assigned a Hub score = 1 and an Authority score = 1.
-> Repeat k times:
   Hub update: Each node's Hub score = Σ (Authority score of each node it points to).
   Authority update: Each node's Authority score = Σ (Hub score of each node pointing to it).
   Normalize the scores: divide each Hub score by the sum of all Hub scores, and each
   Authority score by the sum of all Authority scores.
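The steps above can also be written in matrix form. The sketch below is one possible NumPy formulation, not the code used in this experiment: the function name `hits` and the 3-page example graph are assumptions made for illustration, and the graph is assumed to contain at least one link so that the normalizing sums are non-zero. Both updates read the scores from the previous iteration.

```python
import numpy as np

def hits(adj, k):
    """Run k HITS iterations on adjacency matrix adj (adj[i, j] = 1 if page i links to page j)."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    hub = np.ones(n)    # every page starts with Hub score 1
    auth = np.ones(n)   # every page starts with Authority score 1
    for _ in range(k):
        # Both updates use the scores from the previous iteration.
        new_hub = adj @ auth      # sum of authority scores of the pages each page points to
        new_auth = adj.T @ hub    # sum of hub scores of the pages pointing to each page
        # Normalize so that each score vector sums to 1.
        hub = new_hub / new_hub.sum()
        auth = new_auth / new_auth.sum()
    return hub, auth

# Hypothetical 3-page graph: page 0 -> 1, page 0 -> 2, page 1 -> 2.
adj = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
hub, auth = hits(adj, k=1)
print("hub: ", hub)
print("auth:", auth)
```

After one iteration the hub scores are proportional to the out-link counts (2, 1, 0) and the authority scores to the in-link counts (0, 1, 2), because every page started with a score of 1.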
Code: -
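import numpy as np

# Adjacency matrix rebuilt from the edge listing in the Output section
# (graph[i][j] == 1 means page i links to page j); the original definition is not shown.
graph = np.array([
    [0, 0, 0, 1, 0, 0, 0, 0],  # A
    [0, 0, 1, 0, 1, 0, 0, 0],  # B
    [1, 0, 0, 0, 0, 0, 0, 0],  # C
    [0, 1, 1, 0, 0, 0, 0, 0],  # D
    [0, 1, 1, 1, 0, 1, 0, 0],  # E
    [0, 0, 1, 0, 0, 0, 0, 1],  # F
    [1, 0, 1, 0, 0, 0, 0, 0],  # G
    [1, 0, 0, 0, 0, 0, 0, 0],  # H
])
n = len(graph)
epochs = 6  # assumed: the Output section shows results up to iteration 6
auth_score = {chr(65 + j): 1 for j in range(n)}
hub_score = {chr(65 + j): 1 for j in range(n)}

def outgoing(page, old_auth):
    # Hub update: sum the authority scores of the pages that `page` links to.
    count = 0
    for temp, i in enumerate(graph[page]):
        if i == 1:
            count += old_auth[chr(65 + temp)]
    return count

def incoming(page, old_hub):
    # Authority update: sum the hub scores of the pages that link to `page`.
    count = 0
    for temp, i in enumerate(graph[:, page]):
        if i == 1:
            count += old_hub[chr(65 + temp)]
    return count

def normalize(scores):
    # Divide each score by the total and round to 2 decimal places.
    total = sum(scores.values())
    return {k: round(v / total, 2) for k, v in scores.items()}

for i in range(n):
    for j in range(n):
        print(f"{chr(65 + i)}-->{chr(65 + j)}:{graph[i][j]}")

for i in range(epochs):
    # Both updates read the scores from the previous iteration.
    old_auth = auth_score.copy()
    old_hub = hub_score.copy()
    for j in range(n):
        auth_score[chr(65 + j)] = incoming(j, old_hub)
        hub_score[chr(65 + j)] = outgoing(j, old_auth)
    print(i + 1, ")Auth score:-", auth_score)
    print(i + 1, ")Hub score:-", hub_score)
    auth_score, hub_score = normalize(auth_score), normalize(hub_score)
    print(i + 1, ")normalized Auth:-", auth_score)
    print(i + 1, ")normalized hub:-", hub_score)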
Output: -
A-->A:0
A-->B:0
A-->C:0
A-->D:1
A-->E:0
A-->F:0
A-->G:0
A-->H:0
B-->A:0
B-->B:0
B-->C:1
B-->D:0
B-->E:1
B-->F:0
B-->G:0
B-->H:0
C-->A:1
C-->B:0
C-->C:0
C-->D:0
C-->E:0
C-->F:0
C-->G:0
C-->H:0
D-->A:0
D-->B:1
D-->C:1
D-->D:0
D-->E:0
D-->F:0
D-->G:0
D-->H:0
E-->A:0
E-->B:1
E-->C:1
E-->D:1
E-->E:0
E-->F:1
E-->G:0
E-->H:0
F-->A:0
F-->B:0
F-->C:1
F-->D:0
F-->E:0
F-->F:0
F-->G:0
F-->H:1
G-->A:1
G-->B:0
G-->C:1
G-->D:0
G-->E:0
G-->F:0
G-->G:0
G-->H:0
H-->A:1
H-->B:0
H-->C:0
H-->D:0
H-->E:0
H-->F:0
H-->G:0
H-->H:0
1 )Auth score:- {'A': 3, 'B': 2, 'C': 5, 'D': 2, 'E': 1, 'F': 1, 'G': 0, 'H': 1}
1 )Hub score:- {'A': 1, 'B': 2, 'C': 1, 'D': 2, 'E': 4, 'F': 2, 'G': 2, 'H': 1}
1 )normalized Auth:- {'A': 0.2, 'B': 0.13, 'C': 0.33, 'D': 0.13, 'E': 0.07, 'F': 0.07, 'G': 0.0, 'H':
0.07}
1 )normalized hub:- {'A': 0.07, 'B': 0.13, 'C': 0.07, 'D': 0.13, 'E': 0.27, 'F': 0.13, 'G': 0.13, 'H':
0.07}
2 )Auth score:- {'A': 0.27, 'B': 0.4, 'C': 0.79, 'D': 0.34, 'E': 0.13, 'F': 0.27, 'G': 0, 'H': 0.13}
2 )Hub score:- {'A': 0.13, 'B': 0.4, 'C': 0.2, 'D': 0.46, 'E': 0.6600000000000001, 'F': 0.4, 'G':
0.53, 'H': 0.2}
2 )normalized Auth:- {'A': 0.12, 'B': 0.17, 'C': 0.34, 'D': 0.15, 'E': 0.06, 'F': 0.12, 'G': 0.0, 'H':
0.06}
2 )normalized hub:- {'A': 0.04, 'B': 0.13, 'C': 0.07, 'D': 0.15, 'E': 0.22, 'F': 0.13, 'G': 0.18, 'H':
0.07}
3 )Auth score:- {'A': 0.32, 'B': 0.37, 'C': 0.81, 'D': 0.26, 'E': 0.13, 'F': 0.22, 'G': 0, 'H':0.13}
3 )Hub score:- {'A': 0.15, 'B': 0.4, 'C': 0.12, 'D': 0.51, 'E': 0.78, 'F': 0.4, 'G': 0.46, 'H':0.12}
3 )normalized Auth:- {'A': 0.14, 'B': 0.17, 'C': 0.36, 'D': 0.12, 'E': 0.06, 'F': 0.1, 'G': 0.0, 'H':
0.06}
3 )normalized hub:- {'A': 0.05, 'B': 0.14, 'C': 0.04, 'D': 0.17, 'E': 0.27, 'F': 0.14, 'G': 0.16, 'H':
0.04}
5 )Auth score:- {'A': 0.27, 'B': 0.43, 'C': 0.8800000000000001, 'D': 0.29, 'E': 0.14, 'F': 0.25,
'G': 0, 'H': 0.14}
5 )Hub score:- {'A': 0.13, 'B': 0.42, 'C': 0.1, 'D': 0.54, 'E': 0.78, 'F': 0.42, 'G':
0.45999999999999996, 'H': 0.1}
5 )normalized Auth:- {'A': 0.11, 'B': 0.18, 'C': 0.37, 'D': 0.12, 'E': 0.06, 'F': 0.1, 'G': 0.0, 'H':
0.06}
5 )normalized hub:- {'A': 0.04, 'B': 0.14, 'C': 0.03, 'D': 0.18, 'E': 0.26, 'F': 0.14, 'G': 0.16, 'H':
0.03}
6 )Auth score:- {'A': 0.22, 'B': 0.44, 'C': 0.8800000000000001, 'D': 0.3, 'E': 0.14, 'F': 0.26,
'G': 0, 'H': 0.14}
6 )Hub score:- {'A': 0.12, 'B': 0.43, 'C': 0.11, 'D': 0.55, 'E': 0.77, 'F': 0.43, 'G': 0.48, 'H': 0.11}
6 )normalized Auth:- {'A': 0.09, 'B': 0.18, 'C': 0.37, 'D': 0.13, 'E': 0.06, 'F': 0.11, 'G': 0.0, 'H':
0.06}
6 )normalized hub:- {'A': 0.04, 'B': 0.14, 'C': 0.04, 'D': 0.18, 'E': 0.26, 'F': 0.14, 'G': 0.16, 'H':
0.04}
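To read a ranking off these numbers, the pages can be sorted by score. The short sketch below does this for the normalized scores of iteration 6 shown above; the helper name `rank` is an assumption introduced for illustration and is not part of the experiment's code.

```python
# Normalized scores copied from iteration 6 of the output above.
auth = {'A': 0.09, 'B': 0.18, 'C': 0.37, 'D': 0.13, 'E': 0.06, 'F': 0.11, 'G': 0.0, 'H': 0.06}
hub = {'A': 0.04, 'B': 0.14, 'C': 0.04, 'D': 0.18, 'E': 0.26, 'F': 0.14, 'G': 0.16, 'H': 0.04}

def rank(scores):
    """Return page labels sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

top_authorities = rank(auth)[:3]
top_hubs = rank(hub)[:3]
print("Top authorities:", top_authorities)
print("Top hubs:", top_hubs)
```

Page C comes out as the strongest authority and page E as the strongest hub, which agrees with the graph: five pages link to C, and E links to four pages.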
Conclusion: - Hyperlink-Induced Topic Search (HITS) is a link analysis algorithm that discovers
and ranks the webpages relevant to a particular search. The idea behind the algorithm is that a
good website should link to other relevant sites and, in turn, be linked to by other important
sites.