
Extrapolation Methods for Accelerating

PageRank Computations

Sepandar D. Kamvar
Taher H. Haveliwala
Christopher D. Manning
Gene H. Golub

Stanford University
Motivation

• Problem: Speed up PageRank

• Motivation:
  • Personalization
  • "Freshness"

[Screenshots: for the query "Giants", one ranking returns "The Official Site of the New York Giants" first; another returns "The Official Site of the San Francisco Giants" first.]

Note: PageRank computations don't get faster as computers do.
Outline

• Definition of PageRank
• Computation of PageRank
• Convergence Properties
• Outline of Our Approach
• Empirical Results
Link Counts

[Diagram: Taher's Home Page is linked by 2 unimportant pages (DB Pub Server, CS361); Sep's Home Page is linked by 2 important pages (Yahoo!, CNN).]
Definition of PageRank

• The importance of a page is given by the importance of the pages that link to it:

    x_i = Σ_{j ∈ B_i} (1/N_j) x_j

  where x_i is the importance of page i, B_i is the set of pages j that link to page i, and N_j is the number of outlinks from page j.
Definition of PageRank

[Diagram: DB Pub Server, CNN, and Yahoo! each have rank 0.1. DB Pub Server splits its rank over two outlinks (1/2 each); CNN and Yahoo! each have a single outlink. Taher's page receives rank 0.05; Sep's page receives rank 0.25.]
PageRank Diagram

[Diagram: three nodes, each with rank 0.333]

Initialize all nodes to rank x_i^{(0)} = 1/n.
PageRank Diagram

[Diagram: ranks propagate along edges carrying 0.167, 0.333, 0.167, 0.333]

Propagate ranks across links (multiplying by link weights).
PageRank Diagram

[Diagram: node ranks after one iteration: 0.5, 0.333, 0.167]

    x_i^{(1)} = Σ_{j ∈ B_i} (1/N_j) x_j^{(0)}
PageRank Diagram

[Diagram: ranks propagate again along edges carrying 0.167, 0.5, 0.167, 0.167]
PageRank Diagram

[Diagram: node ranks after two iterations: 0.333, 0.5, 0.167]

    x_i^{(2)} = Σ_{j ∈ B_i} (1/N_j) x_j^{(1)}
PageRank Diagram

[Diagram: node ranks converge to 0.4, 0.4, 0.2]

After a while…

    x_i = Σ_{j ∈ B_i} (1/N_j) x_j
Computing PageRank

• Initialize: x_i^{(0)} = 1/n

• Repeat until convergence:

    x_i^{(k+1)} = Σ_{j ∈ B_i} (1/N_j) x_j^{(k)}

  where x_i is the importance of page i, B_i is the set of pages j that link to page i, and N_j is the number of outlinks from page j.
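This update is easy to sketch over a tiny link graph. The three-node graph below is hypothetical (node names are made up), chosen so that the iterates match the numbers in the diagrams: 0.333 each, then 0.5/0.333/0.167, converging to 0.4/0.4/0.2.

```python
# Sketch of x_i^(k+1) = sum_{j in B_i} x_j^(k) / N_j over a tiny
# hypothetical graph, stored as out-link lists.
out_links = {
    "a": ["b", "c"],   # a has N_a = 2 outlinks
    "b": ["c"],
    "c": ["a"],
}
x = {page: 1 / len(out_links) for page in out_links}  # x^(0) = 1/n

for _ in range(50):  # repeat until convergence
    x_next = {page: 0.0 for page in x}
    for j, targets in out_links.items():
        share = x[j] / len(targets)   # x_j / N_j
        for i in targets:
            x_next[i] += share        # page i collects from each j in B_i
    x = x_next

print({p: round(r, 3) for p, r in x.items()})
# converges to {'a': 0.4, 'b': 0.2, 'c': 0.4}
```

The total rank stays 1 at every step, since each page redistributes all of its rank over its outlinks.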
Matrix Notation

    x_i = Σ_{j ∈ B_i} (1/N_j) x_j

In matrix notation, one update is x = P^T x, where P is the row-stochastic link matrix (P_{ji} = 1/N_j if page j links to page i, else 0).

[Worked example: P^T times the vector (.1, .3, .2, .3, .1)^T gives back the same vector.]
Matrix Notation

Find x that satisfies:

    x = P^T x

[Same worked example: P^T times the vector (.1, .3, .2, .3, .1)^T gives back the same vector.]
Power Method

• Initialize: x^{(0)} = [1/n … 1/n]^T

• Repeat until convergence:

    x^{(k+1)} = P^T x^{(k)}
A side note

• PageRank doesn't actually use P^T. Instead, it uses A = cP^T + (1-c)E^T.

• So the PageRank problem is really:

    Find x that satisfies: x = Ax

  not:

    Find x that satisfies: x = P^T x
Power Method

• And the algorithm is really . . .

• Initialize: x^{(0)} = [1/n … 1/n]^T

• Repeat until convergence:

    x^{(k+1)} = A x^{(k)}
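A minimal sketch of this loop, assuming a small made-up link matrix P, the common teleport value c = 0.85, and a uniform E. The real matrices are web-scale and sparse; the dense arrays here are purely illustrative.

```python
import numpy as np

# Power method on A = c*P^T + (1-c)*E^T, with illustrative stand-ins.
n = 4
P = np.array([[0,   1/2, 1/2, 0  ],   # row-stochastic link matrix:
              [0,   0,   1,   0  ],   # row j spreads 1/N_j over the
              [1/3, 1/3, 0,   1/3],   # pages that j links to
              [0,   0,   1,   0  ]])
c = 0.85
E = np.full((n, n), 1 / n)            # uniform "teleport" matrix
A = c * P.T + (1 - c) * E.T           # column-stochastic

x = np.full(n, 1 / n)                 # x^(0) = [1/n ... 1/n]^T
for _ in range(200):                  # repeat until convergence
    x_next = A @ x
    if np.abs(x_next - x).sum() < 1e-12:
        x = x_next
        break
    x = x_next

print(x.round(3))                     # fixed point: x = A x, sums to 1
```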
Outline

• Definition of PageRank
• Computation of PageRank
• Convergence Properties
• Outline of Our Approach
• Empirical Results
Power Method

Express x^{(0)} in terms of the eigenvectors of A.

[Bar chart: components of x^{(0)} along u_1 … u_5, with coefficients α_1 … α_5]
Power Method

    x^{(1)}

[Bar chart: components α_1, α_2λ_2, α_3λ_3, α_4λ_4, α_5λ_5 along u_1 … u_5]
Power Method

    x^{(2)}

[Bar chart: components α_1, α_2λ_2², α_3λ_3², α_4λ_4², α_5λ_5² along u_1 … u_5]
Power Method

    x^{(k)}

[Bar chart: components α_1, α_2λ_2^k, α_3λ_3^k, α_4λ_4^k, α_5λ_5^k along u_1 … u_5]
Power Method

    x^{(∞)}

[Bar chart: only the u_1 component remains; all other components have decayed to 0]
Why does it work?

• Imagine our n × n matrix A has n distinct eigenvectors u_i:

    A u_i = λ_i u_i

• Then you can write any n-dimensional vector as a linear combination of the eigenvectors of A:

    x^{(0)} = u_1 + α_2 u_2 + … + α_n u_n

[Bar chart: components of x^{(0)} along u_1 … u_5, with coefficients α_1 … α_5]
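This expansion is easy to check numerically: the coefficients come from one linear solve against the eigenvector matrix. The 2×2 matrix below is an arbitrary illustration, not the Google matrix.

```python
import numpy as np

# Expand a vector in the eigenvector basis of a diagonalizable A:
# solving U @ alpha = x0 gives the coefficients alpha_i.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, U = np.linalg.eig(A)      # columns of U are eigenvectors u_i
x0 = np.array([1.0, 3.0])
alpha = np.linalg.solve(U, x0)     # x0 = sum_i alpha_i * u_i

print(np.allclose(U @ alpha, x0))  # True: x0 recovered from the expansion
```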
Why does it work?

• From the last slide: x^{(0)} = u_1 + α_2 u_2 + … + α_n u_n

• To get the first iterate, multiply x^{(0)} by A:

    x^{(1)} = A x^{(0)} = A u_1 + α_2 A u_2 + … + α_n A u_n
            = λ_1 u_1 + α_2 λ_2 u_2 + … + α_n λ_n u_n

• The first eigenvalue is 1:  λ_1 = 1;  1 > |λ_2| ≥ … ≥ |λ_n|

• Therefore:

    x^{(1)} = u_1 + α_2 λ_2 u_2 + … + α_n λ_n u_n

  where all the eigenvalues λ_2, …, λ_n are less than 1 in magnitude.
Power Method

    x^{(0)} = u_1 + α_2 u_2 + … + α_n u_n
    x^{(1)} = u_1 + α_2 λ_2 u_2 + … + α_n λ_n u_n
    x^{(2)} = u_1 + α_2 λ_2² u_2 + … + α_n λ_n² u_n

[Bar charts: the components along u_2 … u_5 shrink by a factor of λ_i each iteration]
Convergence

    x^{(k)} = u_1 + α_2 λ_2^k u_2 + … + α_n λ_n^k u_n

• The smaller λ_2, the faster the convergence of the Power Method.

[Bar chart: components α_2 λ_2^k, …, α_n λ_n^k decay geometrically]
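The rate can be observed directly. In the sketch below, a small arbitrary column-stochastic matrix (not the Google matrix) has second eigenvalue 0.4, and the ratio of successive power-method errors settles at exactly that value.

```python
import numpy as np

# The power-method error shrinks by a factor of |lambda_2| per step.
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.7, 0.3],
              [0.2, 0.1, 0.4]])    # column-stochastic toy matrix

w, V = np.linalg.eig(A)
u1 = np.real(V[:, np.argmax(np.real(w))])
u1 = u1 / u1.sum()                 # dominant eigenvector, sums to 1
lam2 = np.sort(np.abs(w))[-2]      # second-largest eigenvalue magnitude

x = np.full(3, 1 / 3)
errs = []
for _ in range(20):
    x = A @ x
    errs.append(np.abs(x - u1).sum())

ratios = [errs[k + 1] / errs[k] for k in range(10, 19)]
print(round(lam2, 3), [round(r, 3) for r in ratios])
# the error ratios approach |lambda_2| = 0.4
```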
Our Approach

Estimate the components of the current iterate in the directions of the second two eigenvectors, and eliminate them.

[Bar chart: the u_2 and u_3 components are removed, leaving u_1]
Why this approach?

• For traditional problems:
  • A is smaller, often dense.
  • λ_2 is often close to 1, making the power method slow.

• In our problem:
  • A is huge and sparse.
  • More importantly, λ_2 is small¹.

• Therefore, the Power Method is actually much faster than other methods.

¹ "The Second Eigenvalue of the Google Matrix" (dbpubs.stanford.edu/pub/2003-20)
Using Successive Iterates

[Sequence of diagrams: the iterates x^{(0)}, x^{(1)}, x^{(2)} approach u_1 step by step; a suitable combination of these successive iterates jumps directly to x′ = u_1.]
How do we do this?

• Assume x^{(k)} can be written as a linear combination of the first three eigenvectors (u_1, u_2, u_3) of A.

• Compute an approximation to the {u_2, u_3} component, and subtract it from x^{(k)} to get x^{(k)}′.
Assume

• Assume x^{(k)} can be represented by the first 3 eigenvectors of A:

    x^{(k)}   = u_1 + α_2 u_2 + α_3 u_3
    x^{(k+1)} = A x^{(k)} = u_1 + α_2 λ_2 u_2 + α_3 λ_3 u_3
    x^{(k+2)} = u_1 + α_2 λ_2² u_2 + α_3 λ_3² u_3
    x^{(k+3)} = u_1 + α_2 λ_2³ u_2 + α_3 λ_3³ u_3
Linear Combination

• Let's take some linear combination of these 3 iterates:

    β_1 x^{(k+1)} + β_2 x^{(k+2)} + β_3 x^{(k+3)}
      = β_1 (u_1 + α_2 λ_2 u_2 + α_3 λ_3 u_3)
      + β_2 (u_1 + α_2 λ_2² u_2 + α_3 λ_3² u_3)
      + β_3 (u_1 + α_2 λ_2³ u_2 + α_3 λ_3³ u_3)
Rearranging Terms

• We can rearrange the terms to get:

    β_1 x^{(k+1)} + β_2 x^{(k+2)} + β_3 x^{(k+3)}
      = (β_1 + β_2 + β_3) u_1
      + α_2 (β_1 λ_2 + β_2 λ_2² + β_3 λ_2³) u_2
      + α_3 (β_1 λ_3 + β_2 λ_3² + β_3 λ_3³) u_3

Goal: Find β_1, β_2, β_3 so that the coefficients of u_2 and u_3 are 0, and the coefficient of u_1 is 1.
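If λ_2 and λ_3 were known, finding the β's would be a 3×3 linear solve, and the combination would recover u_1 exactly. The sketch below does exactly that on a made-up matrix with known eigenpairs; in practice the eigenvalues are unknown, and the paper's Quadratic Extrapolation instead estimates the right combination from the iterates themselves.

```python
import numpy as np

# Hypothetical 3x3 example with known eigenvalues 1, 0.8, 0.5.
lams = np.array([1.0, 0.8, 0.5])
U = np.array([[1.0,  1.0,  0.0],
              [1.0, -1.0,  1.0],
              [1.0,  0.0, -1.0]])          # columns = u_1, u_2, u_3
A = U @ np.diag(lams) @ np.linalg.inv(U)

x = U @ np.array([1.0, 0.7, 0.4])          # x^(k) = u1 + 0.7*u2 + 0.4*u3
iterates = []
for _ in range(3):                         # x^(k+1), x^(k+2), x^(k+3)
    x = A @ x
    iterates.append(x.copy())

# Solve for beta: coefficients of u2, u3 are 0, coefficient of u1 is 1:
#   b1*l + b2*l^2 + b3*l^3 = 0   for l = lambda_2, lambda_3
#   b1 + b2 + b3 = 1             (since lambda_1 = 1)
M = np.array([[lams[1], lams[1]**2, lams[1]**3],
              [lams[2], lams[2]**2, lams[2]**3],
              [1.0,     1.0,        1.0     ]])
beta = np.linalg.solve(M, np.array([0.0, 0.0, 1.0]))

x_prime = sum(b * it for b, it in zip(beta, iterates))
print(np.allclose(x_prime, U[:, 0]))       # True: x' = u_1 exactly
```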
Summary

• We make an assumption about the current iterate.

• We solve for the dominant eigenvector as a linear combination of the next three iterates.

• We use a few iterations of the Power Method to "clean it up".
Outline

• Definition of PageRank
• Computation of PageRank
• Convergence Properties
• Outline of Our Approach
• Empirical Results
Results

Quadratic Extrapolation speeds up convergence.
Extrapolation was used only 5 times!
Results

Extrapolation dramatically speeds up convergence for high values of c (c = .99).
Take-home message

• Speeds up PageRank by a fair amount, but not by enough for true Personalized PageRank.

• The ideas are useful for further speedup algorithms.

• Quadratic Extrapolation can be used for a whole class of problems.
The End

• Paper available at http://dbpubs.stanford.edu/pub/2003-16
