A.Y. 2021-2022

Dhruv Jain
60004190030
TE COMPS A
Experiment No. 9
Aim: - Implementation of the HITS algorithm.

Theory: -
Hyperlink-Induced Topic Search (HITS) is a link analysis algorithm that rates webpages,
developed by Jon Kleinberg. It uses the link structure of the web to discover and rank the
webpages relevant to a particular search.
HITS uses hubs and authorities to define a recursive relationship between webpages. Before
working through the algorithm, two terms need to be defined:

• Given a query to a search engine, the set of highly relevant web pages is called the Root
set. These pages are potential Authorities.
• Pages that are not very relevant themselves but point to pages in the Root set are called
Hubs. Thus, an Authority is a page that many hubs link to, whereas a Hub is a page that links
to many authorities.

Algorithm
-> Let the number of iterations be k.
-> Each node is assigned a Hub score = 1 and an Authority score = 1.
-> Repeat k times:

• Hub update: each node's Hub score = Σ (Authority score of each node it points to).
• Authority update: each node's Authority score = Σ (Hub score of each node pointing to it).
• Normalize the scores: divide each Hub score by the sum of all Hub scores, and each
Authority score by the sum of all Authority scores.
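
The loop above can also be written compactly with an adjacency matrix. The following is a minimal sketch, not the listing used in this experiment: the function name `hits`, the matrix `adj` (with `adj[i][j] = 1` when page i links to page j) and the 3-page toy graph are assumptions made for illustration. It also assumes that both updates read the scores from the previous iteration and that normalization divides by the sum.

```python
import numpy as np

def hits(adj, k):
    """Run k HITS iterations on a 0/1 adjacency matrix (adj[i][j] = 1 if i links to j)."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    hub = np.ones(n)
    auth = np.ones(n)
    for _ in range(k):
        # Assumption: both updates use the scores from the previous iteration.
        new_hub = adj @ auth      # sum of authority scores of the pages each node points to
        new_auth = adj.T @ hub    # sum of hub scores of the pages pointing to each node
        # Normalize each vector by its sum, leaving an all-zero vector untouched.
        hub_total = new_hub.sum()
        auth_total = new_auth.sum()
        hub = new_hub / hub_total if hub_total > 0 else new_hub
        auth = new_auth / auth_total if auth_total > 0 else new_auth
    return hub, auth

# Hypothetical toy graph: A -> B, A -> C, B -> C
toy = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
hub, auth = hits(toy, k=1)
print("hub :", hub)
print("auth:", auth)
```

Starting from all-ones scores, the first iteration makes each hub score proportional to the page's out-degree and each authority score proportional to its in-degree; later iterations reweight them.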

Code: -
import numpy as np

n = int(input("Enter number of pages:- "))
epochs = int(input("Enter number of iterations:- "))

# Adjacency matrix: graph[i][j] = 1 if page i links to page j
graph = np.zeros((n, n), dtype=int)

print("Enter 1 if link is present 0 otherwise:- \n")

for i in range(n):
    for j in range(n):
        graph[i][j] = int(input(f"{chr(65+i)}-->{chr(65+j)}: "))

auth_score = {}  # driven by in-links
hub_score = {}   # driven by out-links

# Initialization: every page starts with both scores equal to 1
for i in range(n):
    auth_score[chr(65+i)] = 1
    hub_score[chr(65+i)] = 1

def outgoing(page, old_auth):
    # Hub update: sum the authority scores of the pages this page links to
    count = 0
    for target, link in enumerate(graph[page]):
        if link == 1:
            count += old_auth[chr(65+target)]
    return count

def incoming(page, old_hub):
    # Authority update: sum the hub scores of the pages linking to this page
    count = 0
    for source, link in enumerate(graph[:, page]):
        if link == 1:
            count += old_hub[chr(65+source)]
    return count

def normalize(scores):
    # Divide by the total and round to 2 decimal places
    total = sum(scores.values())
    return {k: round(v / total, 2) for k, v in scores.items()}

for i in range(epochs):
    # Both updates read the scores from the previous iteration
    old_auth = auth_score.copy()
    old_hub = hub_score.copy()
    for j in range(n):
        auth_score[chr(65+j)] = incoming(j, old_hub)
        hub_score[chr(65+j)] = outgoing(j, old_auth)

    print(i+1, ")Auth score:- ", auth_score)
    auth_score = normalize(auth_score)
    print(i+1, ")Hub score:- ", hub_score)
    print(i+1, ")normalized Auth:- ", auth_score)
    hub_score = normalize(hub_score)
    print(i+1, ")normalized hub:-", hub_score, "\n")

Input Graph: -
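
The links entered at the prompts in the Output section can also be written as an edge list. A minimal sketch, assuming the edge list is transcribed correctly from that run; the names `pages`, `edges` and `adj` are illustrative and not part of the original listing:

```python
import numpy as np

pages = "ABCDEFGH"
# Links read from the prompts that were answered with 1 in the recorded run.
edges = [("A", "D"),
         ("B", "C"), ("B", "E"),
         ("C", "A"),
         ("D", "B"), ("D", "C"),
         ("E", "B"), ("E", "C"), ("E", "D"), ("E", "F"),
         ("F", "C"), ("F", "H"),
         ("G", "A"), ("G", "C"),
         ("H", "A")]

# adj[i][j] = 1 if page i links to page j
adj = np.zeros((len(pages), len(pages)), dtype=int)
for src, dst in edges:
    adj[pages.index(src), pages.index(dst)] = 1

print(adj)
print("in-links :", dict(zip(pages, adj.sum(axis=0).tolist())))
print("out-links:", dict(zip(pages, adj.sum(axis=1).tolist())))
```

Because every score starts at 1, the column sums (in-links) and row sums (out-links) equal the unnormalized Auth and Hub scores printed for iteration 1.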

Output: -

PS F:\SEM 5\DMW> python .\hitsAlgorithm.py


Enter number of pages:- 8
Enter number of iterations:- 6
Enter 1 if link is present 0 otherwise:-

A-->A:0
A-->B:0
A-->C:0
A-->D:1
A-->E:0
A-->F:0
A-->G:0
A-->H:0
B-->A:0
B-->B:0
B-->C:1
B-->D:0
B-->E:1
B-->F:0
B-->G:0
B-->H:0
C-->A:1
C-->B:0
C-->C:0
C-->D:0
C-->E:0
C-->F:0
C-->G:0
C-->H:0
D-->A:0
D-->B:1
D-->C:1
D-->D:0
D-->E:0
D-->F:0
D-->G:0
D-->H:0
E-->A:0
E-->B:1
E-->C:1
E-->D:1
E-->E:0
E-->F:1
E-->G:0
E-->H:0
F-->A:0
F-->B:0
F-->C:1
F-->D:0
F-->E:0
F-->F:0
F-->G:0
F-->H:1
G-->A:1
G-->B:0
G-->C:1
G-->D:0
G-->E:0
G-->F:0
G-->G:0
G-->H:0
H-->A:1
H-->B:0
H-->C:0
H-->D:0
H-->E:0
H-->F:0
H-->G:0
H-->H:0
1 )Auth score:- {'A': 3, 'B': 2, 'C': 5, 'D': 2, 'E': 1, 'F': 1, 'G': 0, 'H': 1}
1 )Hub score:- {'A': 1, 'B': 2, 'C': 1, 'D': 2, 'E': 4, 'F': 2, 'G': 2, 'H': 1}
1 )normalized Auth:- {'A': 0.2, 'B': 0.13, 'C': 0.33, 'D': 0.13, 'E': 0.07, 'F': 0.07, 'G': 0.0, 'H': 0.07}
1 )normalized hub:- {'A': 0.07, 'B': 0.13, 'C': 0.07, 'D': 0.13, 'E': 0.27, 'F': 0.13, 'G': 0.13, 'H': 0.07}

2 )Auth score:- {'A': 0.27, 'B': 0.4, 'C': 0.79, 'D': 0.34, 'E': 0.13, 'F': 0.27, 'G': 0, 'H': 0.13}
2 )Hub score:- {'A': 0.13, 'B': 0.4, 'C': 0.2, 'D': 0.46, 'E': 0.6600000000000001, 'F': 0.4, 'G': 0.53, 'H': 0.2}
2 )normalized Auth:- {'A': 0.12, 'B': 0.17, 'C': 0.34, 'D': 0.15, 'E': 0.06, 'F': 0.12, 'G': 0.0, 'H': 0.06}
2 )normalized hub:- {'A': 0.04, 'B': 0.13, 'C': 0.07, 'D': 0.15, 'E': 0.22, 'F': 0.13, 'G': 0.18, 'H': 0.07}

3 )Auth score:- {'A': 0.32, 'B': 0.37, 'C': 0.81, 'D': 0.26, 'E': 0.13, 'F': 0.22, 'G': 0, 'H': 0.13}
3 )Hub score:- {'A': 0.15, 'B': 0.4, 'C': 0.12, 'D': 0.51, 'E': 0.78, 'F': 0.4, 'G': 0.46, 'H': 0.12}
3 )normalized Auth:- {'A': 0.14, 'B': 0.17, 'C': 0.36, 'D': 0.12, 'E': 0.06, 'F': 0.1, 'G': 0.0, 'H': 0.06}
3 )normalized hub:- {'A': 0.05, 'B': 0.14, 'C': 0.04, 'D': 0.17, 'E': 0.27, 'F': 0.14, 'G': 0.16, 'H': 0.04}

4 )Auth score:- {'A': 0.24000000000000002, 'B': 0.44000000000000006, 'C': 0.8800000000000001, 'D': 0.32, 'E': 0.14, 'F': 0.27, 'G': 0, 'H': 0.14}
4 )Hub score:- {'A': 0.12, 'B': 0.42, 'C': 0.14, 'D': 0.53, 'E': 0.75, 'F': 0.42, 'G': 0.5, 'H': 0.14}
4 )normalized Auth:- {'A': 0.1, 'B': 0.18, 'C': 0.36, 'D': 0.13, 'E': 0.06, 'F': 0.11, 'G': 0.0, 'H': 0.06}
4 )normalized hub:- {'A': 0.04, 'B': 0.14, 'C': 0.05, 'D': 0.18, 'E': 0.25, 'F': 0.14, 'G': 0.17, 'H': 0.05}

5 )Auth score:- {'A': 0.27, 'B': 0.43, 'C': 0.8800000000000001, 'D': 0.29, 'E': 0.14, 'F': 0.25, 'G': 0, 'H': 0.14}
5 )Hub score:- {'A': 0.13, 'B': 0.42, 'C': 0.1, 'D': 0.54, 'E': 0.78, 'F': 0.42, 'G': 0.45999999999999996, 'H': 0.1}
5 )normalized Auth:- {'A': 0.11, 'B': 0.18, 'C': 0.37, 'D': 0.12, 'E': 0.06, 'F': 0.1, 'G': 0.0, 'H': 0.06}
5 )normalized hub:- {'A': 0.04, 'B': 0.14, 'C': 0.03, 'D': 0.18, 'E': 0.26, 'F': 0.14, 'G': 0.16, 'H': 0.03}

6 )Auth score:- {'A': 0.22, 'B': 0.44, 'C': 0.8800000000000001, 'D': 0.3, 'E': 0.14, 'F': 0.26, 'G': 0, 'H': 0.14}
6 )Hub score:- {'A': 0.12, 'B': 0.43, 'C': 0.11, 'D': 0.55, 'E': 0.77, 'F': 0.43, 'G': 0.48, 'H': 0.11}
6 )normalized Auth:- {'A': 0.09, 'B': 0.18, 'C': 0.37, 'D': 0.13, 'E': 0.06, 'F': 0.11, 'G': 0.0, 'H': 0.06}
6 )normalized hub:- {'A': 0.04, 'B': 0.14, 'C': 0.04, 'D': 0.18, 'E': 0.26, 'F': 0.14, 'G': 0.16, 'H': 0.04}
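
To read a ranking off the final iteration, the normalized scores can be sorted. A small sketch using the iteration-6 values printed above; the names `final_auth`, `final_hub` and `rank` are illustrative and not part of the original listing:

```python
# Normalized scores copied from iteration 6 of the recorded run.
final_auth = {'A': 0.09, 'B': 0.18, 'C': 0.37, 'D': 0.13,
              'E': 0.06, 'F': 0.11, 'G': 0.0, 'H': 0.06}
final_hub = {'A': 0.04, 'B': 0.14, 'C': 0.04, 'D': 0.18,
             'E': 0.26, 'F': 0.14, 'G': 0.16, 'H': 0.04}

def rank(scores):
    """Return page names from highest to lowest score (ties keep their input order)."""
    return sorted(scores, key=scores.get, reverse=True)

auth_rank = rank(final_auth)
hub_rank = rank(final_hub)
print("Authority ranking:", auth_rank)
print("Hub ranking:", hub_rank)
```

Page C comes out as the strongest authority and page E as the strongest hub, which fits the input graph: C has the most in-links (five) and E the most out-links (four).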

Conclusion: - Hyperlink-Induced Topic Search (HITS) is a link analysis algorithm that discovers
and ranks the webpages relevant to a particular search. It rests on the idea that a good page
should both link to other relevant pages (acting as a hub) and be linked to by other important
pages (acting as an authority). In the run above, page C ends with the highest authority score
(0.37) and page E with the highest hub score (0.26) after six iterations.