
Fast Random Walk with Restart and Its Applications

Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan

1
ICDM 2006, Dec. 18-22, Hong Kong
Motivating Questions
• Q: How to measure the relevance?
• A: Random walk with restart

• Q: How to do it efficiently?
• A: This talk tries to answer!

2
Random walk with restart
[Figure: example graph of 12 numbered nodes]
3
Random walk with restart
[Figure: the example graph, each node colored and labeled with its RWR score for query node 4]
Nearby nodes, higher scores
More red, more relevant

Ranking vector (query: Node 4)
Node      Score
Node 1    0.13
Node 2    0.10
Node 3    0.13
Node 4    0.22
Node 5    0.13
Node 6    0.05
Node 7    0.05
Node 8    0.08
Node 9    0.04
Node 10   0.03
Node 11   0.04
Node 12   0.02
4
Automatic Image Caption
• Q: Given captioned images, e.g. {Sea, Sun, Sky, Wave} and {Cat, Forest, Grass, Tiger}, which keywords {?, ?, ?} should caption a new image?
• A: RWR! [Pan KDD2004]


5
[Figure: three-layer graph of Region, Image (including the Test Image), and Keyword nodes; keywords: Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass]
6
[Figure: Region / Image / Keyword graph with the Test Image captioned {Grass, Forest, Cat, Tiger}; keywords: Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass]
7
Neighborhood Formation

• Q: Which conferences are most related to ICDM?
• A: RWR! [Sun ICDM2005]

[Figure: bipartite graph of Conference and Author nodes]
8
NF: example

9
Center-Piece Subgraph (CePS)
• Q: [Figure: Original Graph and its CePS. Black: query nodes]
• A: RWR! [Tong KDD 2006]
10


CePS: Example

11
Other Applications
• Content-based Image Retrieval [He]
• Personalized PageRank [Jeh], [Widom], [Haveliwala]
• Anomaly Detection (for nodes and links) [Sun]
• Link Prediction [Getoor], [Jensen]
• Semi-supervised Learning [Zhu], [Zhou]
•…

12
Roadmap
• Background
– RWR: Definitions
– RWR: Algorithms
• Basic Idea
• FastRWR
– Pre-Compute Stage
– On-Line Stage
• Experimental Results
• Conclusion

13
Computing RWR

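$$\vec{r}_i = c\,\tilde{W}\,\vec{r}_i + (1 - c)\,\vec{e}_i$$

• $\vec{r}_i$: ranking vector (n x 1)
• $\tilde{W}$: adjacency matrix (n x n)
• $1 - c$: restart probability
• $\vec{e}_i$: starting vector (n x 1)

[Figure: the example graph]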
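
The later slides refer to a matrix $Q^{-1}$. A short derivation shows where it can come from, assuming the RWR equation has the form $\vec{r}_i = c\,\tilde{W}\,\vec{r}_i + (1 - c)\,\vec{e}_i$, with $c$ the probability of following an edge and $1 - c$ the restart probability. The definition $Q = I - c\,\tilde{W}$ is part of that assumption.

```latex
\begin{align*}
\vec{r}_i &= c\,\tilde{W}\,\vec{r}_i + (1 - c)\,\vec{e}_i \\
(I - c\,\tilde{W})\,\vec{r}_i &= (1 - c)\,\vec{e}_i \\
\vec{r}_i &= (1 - c)\,Q^{-1}\,\vec{e}_i, \qquad Q = I - c\,\tilde{W}
\end{align*}
```

Under this reading, answering query $i$ amounts to recovering column $i$ of $Q^{-1}$ and scaling it by $1 - c$.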
14
Beyond RWR: Maxwell Equation for Web! [Chakrabarti]

[Figure: diagram of related methods]
• SM Learning [Zhou, Zhu]
• RL in CBIR [He]
• P-PageRank [Haveliwala]
• RWR [Pan, Sun]
• PageRank [Haveliwala]

Fast RWR Finds the Root Solution


15
• Q: Given query i, how to solve it?

16
On the Fly:

[Figure: the example graph annotated with RWR scores]

No pre-computation / light storage

Slow on-line response: O(mE), for m iterations over E edges
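
A minimal sketch of the on-the-fly approach, assuming the iteration r <- c*W*r + (1-c)*e_i on a column-normalized adjacency matrix. The function names, the 6-node graph, c = 0.9, and the stopping rule are all illustrative assumptions, not taken from the slides.

```python
import numpy as np

def column_normalize(A):
    """Scale each column to sum to 1 (assumes every node has at least one edge)."""
    return A / A.sum(axis=0, keepdims=True)

def rwr_on_the_fly(W, i, c=0.9, tol=1e-12, max_iter=1000):
    """Iterate r <- c*W*r + (1-c)*e_i until the L1 change drops below tol."""
    n = W.shape[0]
    e = np.zeros(n)
    e[i] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (W @ r) + (1 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Hypothetical 6-node graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

W = column_normalize(A)
r = rwr_on_the_fly(W, i=0)
print(np.round(r, 3))
```

Each pass costs one sparse matrix-vector product in a real implementation, which is where the O(mE) on-line cost comes from; nothing is stored between queries.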


17
PreCompute

[Figure: the example graph annotated with RWR scores, next to the pre-computed matrix R]

[Haveliwala]
18
PreCompute:

[Figure: the example graph annotated with RWR scores]

Fast on-line response

Heavy pre-computation/storage cost


$O(n^3)$ pre-computation, $O(n^2)$ storage
19
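
A minimal sketch of the pre-compute approach, assuming the closed form r_i = (1-c) * Q^{-1} * e_i with Q = I - c*W. The function names, the 6-node graph, and c = 0.9 are illustrative assumptions.

```python
import numpy as np

def precompute_q_inverse(W, c=0.9):
    """Off-line: invert Q = I - c*W once (cubic time, quadratic storage)."""
    n = W.shape[0]
    return np.linalg.inv(np.eye(n) - c * W)

def rwr_lookup(Q_inv, i, c=0.9):
    """On-line: the ranking vector is a scaled column of the stored inverse."""
    return (1 - c) * Q_inv[:, i]

# Hypothetical 6-node graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
W = A / A.sum(axis=0, keepdims=True)   # column normalization (an assumption)

Q_inv = precompute_q_inverse(W)
r = rwr_lookup(Q_inv, i=0)
print(np.round(r, 3))
```

The query is a single column read, but the dense n x n inverse has to be computed and kept, which is the cost the slide points at.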
Q: How to Balance?

On-line
Off-line

20
Basic Idea
[Figure: the basic idea illustrated on the example graph]
• Find Community
• Fix the remaining
• Combine
22
Pre-computational stage
• Q: How to efficiently compute and store $Q^{-1}$?
• A: A few small matrix inversions instead of ONE BIG one

23
On-Line Query Stage
• Q: How to efficiently recover one column of $Q^{-1}$?
• A: A few matrix-vector multiplications instead of MANY

24
Pre-compute Stage

• p1: B_Lin Decomposition
– p1.1 partition
– p1.2 low-rank approximation
• p2: Q matrices
– p2.1 computing $Q_{1,i}^{-1}$ (for each partition)
– p2.2 computing $\tilde{\Lambda}$ (for concept space)

26
P1.1: partition
[Figure: the example graph split into partitions]

$\tilde{W} = \tilde{W}_1 + \tilde{W}_2$: within-partition links ($\tilde{W}_1$) plus cross-partition links ($\tilde{W}_2$)
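
A minimal sketch of the split into within-partition and cross-partition links. It assumes partition labels are already available from some graph partitioner; the helper name `split_by_partition`, the labels, and the 6-node graph are illustrative assumptions.

```python
import numpy as np

def split_by_partition(W, labels):
    """Return (W1, W2): within-partition and cross-partition parts of W."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]   # True where both ends share a partition
    W1 = np.where(same, W, 0.0)
    W2 = W - W1
    return W1, W2

# Hypothetical 6-node graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
W = A / A.sum(axis=0, keepdims=True)   # column normalization (an assumption)

labels = [0, 0, 0, 1, 1, 1]            # assumed output of a partitioner
W1, W2 = split_by_partition(W, labels)
print(np.count_nonzero(W1), np.count_nonzero(W2))
```

With nodes ordered by partition, W1 is block-diagonal and W2 holds only the bridge entries, so W2 is much sparser than W1 when the communities are well separated.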


27
P1.1: block-diagonal
[Figure: with nodes ordered by partition, the within-partition matrix $\tilde{W}_1$ is block-diagonal]

28
P1.2: LRA for $\tilde{W}_2$

$$\tilde{W}_2 \approx U S V, \qquad |S| \ll |\tilde{W}_2|$$
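
A minimal sketch of a low-rank approximation of the cross-partition matrix. Truncated SVD is used here as one possible choice; the slides do not say which low-rank method is intended, and the helper name `low_rank_approx` and the example matrix are illustrative assumptions.

```python
import numpy as np

def low_rank_approx(W2, t):
    """Rank-t truncated SVD: W2 ~= U @ S @ V, with S a t-by-t diagonal matrix."""
    U_full, s, Vt = np.linalg.svd(W2, full_matrices=False)
    return U_full[:, :t], np.diag(s[:t]), Vt[:t, :]

# Hypothetical cross-partition matrix: one bridge between nodes 2 and 3.
W2 = np.zeros((6, 6))
W2[2, 3] = W2[3, 2] = 1.0 / 3.0

U, S, V = low_rank_approx(W2, t=2)
err = np.linalg.norm(W2 - U @ S @ V)
print(S.shape, err)
```

Here the matrix has rank 2, so t = 2 reproduces it up to rounding; on a real graph a smaller t trades accuracy for a smaller S.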
29

30
p2.1 Computing $Q_1^{-1}$

31
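Comparing $Q_1^{-1}$ and $Q^{-1}$
• Computing Time
– 100,000 nodes; 100 partitions
– Computing $Q_1^{-1}$ is 10,000x faster!
• Storage Cost
– 100x saving!

$$Q_1^{-1} = \mathrm{blockdiag}\left(Q_{1,1}^{-1},\; Q_{1,2}^{-1},\; \ldots,\; Q_{1,k}^{-1}\right)$$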
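
The two savings can be checked with a back-of-the-envelope count, assuming dense inversion costs $O(n^3)$ time and $O(n^2)$ storage, and that the $k$ partitions have equal size $n/k$ (both are simplifying assumptions).

```latex
\begin{align*}
\text{time ratio} &= \frac{n^3}{k\,(n/k)^3} = k^2 = 100^2 = 10{,}000 \\
\text{storage ratio} &= \frac{n^2}{k\,(n/k)^2} = k = 100
\end{align*}
```

With $n = 100{,}000$ and $k = 100$, each block is only $1{,}000 \times 1{,}000$, which is why a few small inversions beat one big one.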

32
[Figure: block-diagonal approximation plus an unknown correction term, shown in green]

• Q: How to fix the green portions?

33
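p2.2 Computing $\tilde{\Lambda}$:

$$\tilde{\Lambda} = \left(S^{-1} - c\,V\,Q_1^{-1}\,U\right)^{-1}, \qquad Q_1^{-1} = \mathrm{blockdiag}\left(Q_{1,1}^{-1},\; Q_{1,2}^{-1},\; \ldots,\; Q_{1,k}^{-1}\right)$$

[Figure: the example graph]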
34
We have:

Communities Bridges

SM Lemma says:
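
A sketch of how the two statements on this slide can be written out, assuming "SM Lemma" refers to the Sherman-Morrison (Woodbury) identity, that $Q = I - c\,\tilde{W}$, and that $\tilde{W} \approx \tilde{W}_1 + U S V$ with $\tilde{W}_1$ the within-partition (community) part and $U S V$ the low-rank cross-partition (bridge) part.

```latex
\begin{align*}
Q &= I - c\,\tilde{W}
   \approx \underbrace{(I - c\,\tilde{W}_1)}_{\text{communities: } Q_1}
   \;-\; \underbrace{c\,U S V}_{\text{bridges}} \\
Q^{-1} &\approx Q_1^{-1} + c\,Q_1^{-1}\,U\,\tilde{\Lambda}\,V\,Q_1^{-1},
\qquad \tilde{\Lambda} = \left(S^{-1} - c\,V\,Q_1^{-1}\,U\right)^{-1}
\end{align*}
```

The second line follows from the identity $(A + UCV)^{-1} = A^{-1} - A^{-1}U\,(C^{-1} + V A^{-1} U)^{-1}\,V A^{-1}$ with $A = Q_1$ and $C = -c\,S$. Only $Q_1^{-1}$ (block-diagonal), $U$, $V$, and the small matrix $\tilde{\Lambda}$ need to be stored.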
35
On-Line Stage
• Q: How to combine the pre-computation with the query to get the result?

[Figure: Pre-Computation + Query → Result]

• A: SM lemma
37
On-Line Query Stage

q1:
q2:
q3:
q4:
q5:
q6:
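
A minimal end-to-end sketch of how six on-line steps could reuse the pre-computed matrices, assuming the query formula r_i = (1-c) * (Q1^{-1} e_i + c * Q1^{-1} U Lambda V Q1^{-1} e_i). The function names, the q1 to q6 labels in the comments, the truncated SVD, the column normalization, the partition labels, and the 6-node graph are all illustrative assumptions.

```python
import numpy as np

def precompute(W, labels, t, c=0.9):
    """Off-line stage: per-partition inverses, low-rank factors, and Lambda."""
    labels = np.asarray(labels)
    n = W.shape[0]
    same = labels[:, None] == labels[None, :]
    W1 = np.where(same, W, 0.0)              # within-partition links
    W2 = W - W1                              # cross-partition links
    Q1_inv = np.zeros((n, n))
    for p in np.unique(labels):              # one small inversion per partition
        idx = np.flatnonzero(labels == p)
        block = np.eye(len(idx)) - c * W1[np.ix_(idx, idx)]
        Q1_inv[np.ix_(idx, idx)] = np.linalg.inv(block)
    U_full, s, Vt = np.linalg.svd(W2, full_matrices=False)
    U, S, V = U_full[:, :t], np.diag(s[:t]), Vt[:t, :]
    Lam = np.linalg.inv(np.linalg.inv(S) - c * V @ Q1_inv @ U)
    return Q1_inv, U, Lam, V

def query(i, Q1_inv, U, Lam, V, c=0.9):
    """On-line stage: a handful of matrix-vector products."""
    r0 = Q1_inv[:, i]                # q1: Q1^{-1} e_i (a column lookup)
    r = V @ r0                       # q2: project onto the low-rank space
    r = Lam @ r                      # q3: apply the small t-by-t matrix
    r = U @ r                        # q4: map back to node space
    r = Q1_inv @ r                   # q5: apply the block-diagonal inverse
    return (1 - c) * (r0 + c * r)    # q6: combine and scale

# Hypothetical 6-node graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
W = A / A.sum(axis=0, keepdims=True)

params = precompute(W, labels=[0, 0, 0, 1, 1, 1], t=2)
r_fast = query(0, *params)

# Reference answer from the full linear system, for comparison only.
e0 = np.zeros(6)
e0[0] = 1.0
r_exact = 0.1 * np.linalg.solve(np.eye(6) - 0.9 * W, e0)
print(np.abs(r_fast - r_exact).max())
```

In this toy case the cross-partition matrix has rank 2, so t = 2 loses nothing and the two answers agree up to rounding. On a real graph t is much smaller than the rank, and the result is an approximation.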
38
39
Experimental Setup
• Dataset
– DBLP/authorship
– Author-Paper
– 315k nodes
– 1,800k edges
• Approx. Quality: Relative Accuracy
• Application: Center-Piece Subgraph

41
Query Time vs. Pre-Compute Time

[Figure: Log Query Time plotted against Log Pre-compute Time]

• Quality: 90%+
• On-line:
– Up to 150x speedup
• Pre-computation:
– Two orders of magnitude saving

42
Query Time vs. Pre-Storage

[Figure: Log Query Time plotted against Log Storage]

• Quality: 90%+
• On-line:
– Up to 150x speedup
• Pre-storage:
– Three orders of magnitude saving

43
Conclusion
• FastRWR
– Reasonable quality preservation (90%+)
– Up to 150x speed-up in query time
– Orders of magnitude saving in pre-computation and storage

• More in the paper
– The variant of FastRWR and theoretical justification
– Implementation details
  • normalization, low-rank approximation, sparse
– More experiments
  • Other datasets, other applications

45
Q&A

Thank you!

htong@cs.cmu.edu
www.cs.cmu.edu/~htong

46
