
Mining of Massive Datasets

Leskovec, Rajaraman, and Ullman


Stanford University
Idea: Links as votes
Page is more important if it has more links
In-coming links? Out-going links?
Think of in-links as votes:
www.stanford.edu has 23,400 in-links
www.joe-schmoe.com has 1 in-link

Are all in-links equal?


Links from important pages count more
Recursive question!
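
As a baseline for the "links as votes" idea, the sketch below counts in-links per page, giving every link one equal vote. The graph and the page names in `links` are hypothetical, and the helper `in_link_counts` is an illustrative name, not something from the slides.

```python
from collections import Counter

# Hypothetical toy link graph: each page maps to the pages it links to.
links = {
    "a.example": ["stanford.example", "b.example"],
    "b.example": ["stanford.example"],
    "c.example": ["stanford.example", "joe.example"],
    "stanford.example": [],
    "joe.example": [],
}


def in_link_counts(links):
    """Count in-links per page: every incoming link is one equal vote."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in targets:
            counts[target] += 1
    return counts


votes = in_link_counts(links)
```

This simple count treats a link from an obscure page the same as a link from a prominent one, which is the weakness the question about equal in-links points at.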

J. Leskovec, A. Rajaraman, J. Ullman (Stanford University) Mining of Massive Datasets 16


[Figure: example graph with an importance score on each node. B = 38.4, C = 34.3, E = 8.1, D = 3.9, F = 3.9, A = 3.3; five unlabeled nodes score 1.6 each.]



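Each link’s vote is proportional to the importance of its source page

If page j with importance $r_j$ has n out-links, each link gets $r_j / n$ votes

Page j’s own importance is the sum of the votes on its in-links

[Figure: page i (3 out-links) sends $r_i/3$ to page j, and page k (4 out-links) sends $r_k/4$ to page j; page j passes $r_j/3$ along each of its 3 out-links.]

$r_j = r_i/3 + r_k/4$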
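
The vote-splitting rule can be checked on concrete numbers. This is a minimal sketch: the importance values for pages i and k are made up for illustration, and `importance_from_in_links` is a hypothetical helper name.

```python
from fractions import Fraction


def importance_from_in_links(in_links):
    """Sum r / n over (importance, out_link_count) pairs, one pair per in-link."""
    return sum(Fraction(r) / n for r, n in in_links)


# Assumed values: page i has importance 3/10 and 3 out-links,
# page k has importance 2/5 and 4 out-links.
r_i, r_k = Fraction(3, 10), Fraction(2, 5)

# r_j = r_i/3 + r_k/4
r_j = importance_from_in_links([(r_i, 3), (r_k, 4)])
```

With these assumed values each in-link carries 1/10, so page j ends up with importance 1/5.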



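A “vote” from an important page is worth more
A page is important if it is pointed to by other important pages
Define a “rank” $r_j$ for page j:

$$r_j = \sum_{i \to j} \frac{r_i}{d_i}$$

where $d_i$ is the out-degree of node i

[Figure: "The web in 1839", a three-page graph with pages y, a, and m. y links to itself and to a (y/2 each); a links to y and to m (a/2 each); m links to a (m).]

“Flow” equations:
$r_y = r_y/2 + r_a/2$
$r_a = r_y/2 + r_m$
$r_m = r_a/2$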
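
The flow equations for the three-page graph can be evaluated mechanically. The sketch below assumes the edges y to {y, a}, a to {y, m}, and m to {a}, as read off the flow equations; `flow_step` and `satisfies_flow` are illustrative names, and the candidate rank vectors are chosen only for the check.

```python
from fractions import Fraction

# Out-links of the three-page example, read off the flow equations.
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}


def flow_step(ranks, out_links):
    """Evaluate the right-hand side of the flow equations:
    every page splits its rank evenly over its out-links."""
    new = {page: Fraction(0) for page in out_links}
    for i, targets in out_links.items():
        share = ranks[i] / len(targets)  # r_i / d_i
        for j in targets:
            new[j] += share
    return new


def satisfies_flow(ranks, out_links):
    """True if the ranks equal the votes flowing into each page."""
    return flow_step(ranks, out_links) == ranks


candidate = {"y": Fraction(2), "a": Fraction(2), "m": Fraction(1)}
uniform = {"y": Fraction(1), "a": Fraction(1), "m": Fraction(1)}
```

Here `satisfies_flow(candidate, out_links)` holds, and so does any rescaling of `candidate`, while the uniform assignment fails because page a would receive 3/2.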
Flow equations:
$r_y = r_y/2 + r_a/2$
$r_a = r_y/2 + r_m$
$r_m = r_a/2$
3 equations, 3 unknowns, no constants
No unique solution
All solutions equivalent modulo the scale factor
Additional constraint forces uniqueness:
$r_y + r_a + r_m = 1$
Solution: $r_y = 2/5$, $r_a = 2/5$, $r_m = 1/5$
Gaussian elimination method works for
small examples, but we need a better
method for large web-size graphs
We need a new formulation!
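
For a graph this small, the elimination approach can be written out directly. The sketch below makes two assumptions: the additional constraint is that the three ranks sum to 1, and that constraint replaces the third flow equation, which is redundant given the first two. `gaussian_solve` is an illustrative helper using exact fractions, not a routine from the slides.

```python
from fractions import Fraction


def gaussian_solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact rational arithmetic.
    Raises StopIteration if no non-zero pivot exists (singular system)."""
    n = len(a)
    # Augmented matrix [a | b] with Fraction entries.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick a row at or below the diagonal with a non-zero entry in this column.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot becomes 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


half = Fraction(1, 2)
# Unknowns in order (r_y, r_a, r_m).
a = [
    [half, -half, 0],  # r_y - r_y/2 - r_a/2 = 0
    [-half, 1, -1],    # r_a - r_y/2 - r_m = 0
    [1, 1, 1],         # assumed constraint: r_y + r_a + r_m = 1
]
b = [0, 0, 1]
r_y, r_a, r_m = gaussian_solve(a, b)
```

Elimination on n unknowns takes on the order of n^3 arithmetic operations, which is workable for three pages but not for a web-size graph.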