Icdm06 Tong

Fast Random Walk with
Restart and Its Applications
Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan
ICDM 2006, Dec. 18-22, Hong Kong
Motivating Questions
• Q: How to measure the relevance?
• A: Random walk with restart
• Q: How to do it efficiently?
• A: This talk tries to answer!
Random walk with restart
[Figure: example graph with 12 nodes]
Random walk with restart
[Figure: the example graph, each node annotated with its RWR score with respect to Node 4]
Ranking vector (Node 4):
Node 1    0.13
Node 2    0.10
Node 3    0.13
Node 4    0.22
Node 5    0.13
Node 6    0.05
Node 7    0.05
Node 8    0.08
Node 9    0.04
Node 10   0.03
Node 11   0.04
Node 12   0.02
• Nearby nodes, higher scores
• More red, more relevant
Automatic Image Caption
• Q: Given captioned images such as {Sea, Sun, Sky, Wave} and {Cat, Forest, Grass, Tiger}, …, what caption {?, ?, ?} fits a new image?
• A: RWR! [Pan KDD2004]
[Figure: graph with Region, Image, and Keyword nodes; the keywords are Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass; the Test Image is linked to its regions]
[Figure: the same graph with the caption found for the Test Image: {Grass, Forest, Cat, Tiger}]
Neighborhood Formulation
• Q: What is the most related conference to ICDM?
• A: RWR! [Sun ICDM2005]
[Figure: bipartite graph of Conference and Author nodes]
NF: example
[Figure: example neighborhood formulation result]
Center-Piece Subgraph(CePS)
• Q: [Figure: Original Graph and its CePS; black: query nodes]
• A: RWR! [Tong KDD 2006]
CePS: Example
[Figure: example center-piece subgraph]
Other Applications
• Content-based Image Retrieval [He]
• Personalized PageRank [Jeh], [Widom], [Haveliwala]
• Anomaly Detection (for node; link) [Sun]
• Link Prediction [Getoor], [Jensen]
• Semi-supervised Learning [Zhu], [Zhou]
• …
Roadmap
• Background
– RWR: Definitions
– RWR: Algorithms
• Basic Idea
• FastRWR
– Pre-Compute Stage
– On-Line Stage
• Experimental Results
• Conclusion
Computing RWR
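$\vec{r}_i = c\,\tilde{W}\,\vec{r}_i + (1 - c)\,\vec{e}_i$
• $\vec{r}_i$: ranking vector (n x 1)
• $\tilde{W}$: adjacency matrix (n x n)
• $1 - c$: restart probability
• $\vec{e}_i$: starting vector (n x 1)
[Figure: the example graph]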
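The later slides refer to a matrix $Q^{-1}$. A short derivation, assuming the usual RWR fixed-point equation $\vec{r}_i = c\,\tilde{W}\,\vec{r}_i + (1-c)\,\vec{e}_i$ and the definition $Q = I - c\,\tilde{W}$ (the name $Q$ is taken from the later slides; the definition is an assumption here), shows why one column of $Q^{-1}$ is all a query needs:

```latex
\begin{aligned}
\vec{r}_i &= c\,\tilde{W}\,\vec{r}_i + (1-c)\,\vec{e}_i \\
(I - c\,\tilde{W})\,\vec{r}_i &= (1-c)\,\vec{e}_i \\
\vec{r}_i &= (1-c)\,Q^{-1}\,\vec{e}_i, \qquad Q = I - c\,\tilde{W}
\end{aligned}
```

Since $\vec{e}_i$ is a unit vector, $Q^{-1}\vec{e}_i$ is simply the $i$-th column of $Q^{-1}$.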
Beyond RWR: Maxwell Equation for Web! [Chakrabarti]
• SM Learning [Zhou, Zhu]
• RL in CBIR [He]
• P-PageRank [Haveliwala]
• RWR [Pan, Sun]
• PageRank [Haveliwala]
Fast RWR Finds the Root Solution
• Q: Given query i, how to solve it?
OntheFly:
0.04 11 0.03
0.10 99 00 11
0.13 22 0.08 2
2
1 1 88 11 0.02
3
3 0.13 11
0.04
44
5 5 660.05
0.13
77
0.05
No pre-computation/ light storage
Slow on-line response O(mE)

17
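A minimal sketch of the on-the-fly approach, assuming it means iterating the RWR equation until convergence. Each iteration touches every edge once, which would give the O(mE) cost if m is read as the number of iterations and E as the number of edges. The function name `rwr_on_the_fly`, the column normalization, and the default `c=0.9` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def rwr_on_the_fly(W, i, c=0.9, tol=1e-10, max_iter=1000):
    """Iterate r <- c * W_norm @ r + (1 - c) * e_i until r stops changing."""
    n = W.shape[0]
    # Assumed normalization: scale each column to sum to 1.
    col_sums = W.sum(axis=0).astype(float)
    col_sums[col_sums == 0] = 1.0  # leave isolated nodes as zero columns
    W_norm = W / col_sums
    e = np.zeros(n)
    e[i] = 1.0  # starting vector for query node i
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (W_norm @ r) + (1 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r
```

Nothing is stored between queries, but every query pays for the full iteration.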
PreCompute:
[Figure: ranking vectors for all nodes of the example graph, pre-computed and stored as a matrix R] [Haveliwala]
• Fast on-line response
• Heavy pre-computation/storage cost: $O(n^3)$ pre-computation, $O(n^2)$ storage
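A minimal sketch of the pre-compute approach, assuming it means inverting $Q = I - c\,\tilde{W}$ once and answering each query by reading one column. The names `precompute_all` and `query` are hypothetical, and `W_norm` is assumed to be already normalized.

```python
import numpy as np

def precompute_all(W_norm, c=0.9):
    """Off-line: one dense n x n inversion, O(n^3) time and O(n^2) storage."""
    n = W_norm.shape[0]
    Q_inv = np.linalg.inv(np.eye(n) - c * W_norm)
    return (1 - c) * Q_inv  # column i is the ranking vector for query i

def query(R, i):
    """On-line: a single column lookup."""
    return R[:, i]
```

The dense inverse is what makes this impractical for large graphs: a graph with 100,000 nodes would need a 100,000 x 100,000 matrix.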
Q: How to Balance?
[Figure: trade-off between on-line cost and off-line cost]
Basic Idea
[Figure: the example graph processed in three steps]
• Find Community
• Fix the remaining
• Combine
Pre-computational stage
• Q: Efficiently compute and store $Q^{-1}$
• A: A few small matrix inversions instead of ONE BIG one
On-Line Query Stage
• Q: Efficiently recover one column of $Q^{-1}$
• A: A few matrix-vector multiplications instead of MANY
Pre-compute Stage
• P1: B_Lin Decomposition
– P1.1 partition
– P1.2 low-rank approximation
• P2: Q matrices
– P2.1 computing $Q_1^{-1}$ (for each partition)
– P2.2 computing $\tilde{\Lambda}$ (for concept space)
P1.1: partition
[Figure: the example graph split as $\tilde{W} = \tilde{W}_1 + \tilde{W}_2$: within-partition links ($\tilde{W}_1$) plus cross-partition links ($\tilde{W}_2$)]
P1.1: block-diagonal
[Figure: with nodes ordered by partition, the within-partition links form a block-diagonal matrix]
P1.2: LRA for $\tilde{W}_2$
[Figure: the cross-partition links of the example graph]
$\tilde{W}_2 \approx U S V$, with $|S| \ll |\tilde{W}_2|$
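One way to obtain a factorization of the form $U S V$ is a truncated SVD. This is a swapped-in technique for illustration; the slides do not say which low-rank method the authors use. The function name `low_rank_approx` and the rank parameter `t` are hypothetical.

```python
import numpy as np

def low_rank_approx(W2, t):
    """Return U (n x t), S (t x t), V (t x n) with U @ S @ V approximating W2."""
    U, s, Vt = np.linalg.svd(W2, full_matrices=False)
    # Keep only the t largest singular values and their vectors.
    return U[:, :t], np.diag(s[:t]), Vt[:t, :]
```

When `t` is much smaller than `n`, storing the three factors is far cheaper than storing `W2` densely, which is the point of $|S| \ll |\tilde{W}_2|$.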
P2.1: Computing $Q_1^{-1}$
$Q_1^{-1} = \mathrm{diag}(Q_{1,1}^{-1}, Q_{1,2}^{-1}, \dots, Q_{1,k}^{-1})$
Comparing $Q_1^{-1}$ and $Q^{-1}$
• Computing Time
– 100,000 nodes; 100 partitions
– Computing $Q_1^{-1}$ is 10,000x faster!
• Storage Cost
– 100x saving!
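A minimal sketch of why per-partition inversion helps, assuming a block-diagonal within-partition matrix and cubic-cost dense inversion. The names `invert_blockwise` and `savings` are hypothetical. For 100,000 nodes in 100 equal partitions, the cost model below gives a 10,000x saving in time and 100x in storage.

```python
import numpy as np

def invert_blockwise(W1, partitions, c=0.9):
    """Invert I - c*W1 one partition at a time; W1 must be block-diagonal
    with respect to `partitions` (a list of index arrays)."""
    n = W1.shape[0]
    Q1_inv = np.zeros((n, n))
    for idx in partitions:
        block = W1[np.ix_(idx, idx)]
        Q1_inv[np.ix_(idx, idx)] = np.linalg.inv(np.eye(len(idx)) - c * block)
    return Q1_inv

def savings(n, k):
    """Time and storage ratios of one n x n inversion vs. k blocks of size n/k,
    assuming O(size^3) time and size^2 storage per dense inverse."""
    size = n // k
    return n**3 // (k * size**3), n**2 // (k * size**2)
```

In general the time ratio is k squared and the storage ratio is k, so finer partitions save more, at the price of more cross-partition links to repair afterwards.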
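[Figure: the approximation shown as a block-diagonal part plus a low-rank part; the matching correction to the inverse is marked "?"]
• Q: How to fix the green portions?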
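P2.2: Computing $\tilde{\Lambda}$
$\tilde{\Lambda} = (S^{-1} - c\,V Q_1^{-1} U)^{-1}$, where $Q_1^{-1} = \mathrm{diag}(Q_{1,1}^{-1}, Q_{1,2}^{-1}, \dots, Q_{1,k}^{-1})$
[Figure: the example graph]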
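We have: $\tilde{W} \approx \tilde{W}_1 + U S V$ (communities: $\tilde{W}_1$; bridges: $U S V$)
SM Lemma says: $Q^{-1} \approx Q_1^{-1} + c\,Q_1^{-1} U \tilde{\Lambda} V Q_1^{-1}$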
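For reference, a sketch of how the correction term follows, assuming "SM Lemma" refers to the Sherman-Morrison-Woodbury identity. Apply it with $A = Q_1 = I - c\,\tilde{W}_1$, $X = U$, $C = -c\,S$, $Y = V$ (the letters $A$, $X$, $C$, $Y$ are placeholders introduced here):

```latex
\begin{aligned}
(A + X C Y)^{-1} &= A^{-1} - A^{-1} X \,(C^{-1} + Y A^{-1} X)^{-1}\, Y A^{-1} \\
Q = I - c\,\tilde{W} &\approx Q_1 - c\,U S V \\
Q^{-1} &\approx Q_1^{-1} + c\,Q_1^{-1} U \tilde{\Lambda} V Q_1^{-1},
\qquad \tilde{\Lambda} = (S^{-1} - c\,V Q_1^{-1} U)^{-1}
\end{aligned}
```

The only inverse taken over the full problem is $\tilde{\Lambda}$, whose size equals the rank of the low-rank approximation rather than the number of nodes.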
On-Line Stage
• Q: Pre-Computation + Query: how to obtain the Result?
• A: SM lemma
On-Line Query Stage
[Figure: six on-line steps, q1 through q6, applied to the example query]
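The two stages can be sketched end to end as follows. This is an illustration under stated assumptions, not the authors' implementation: partition labels are given rather than computed, the low-rank step is a truncated SVD, matrices are dense, and the names `blin_precompute` and `blin_query` are hypothetical.

```python
import numpy as np

def blin_precompute(W_norm, labels, t, c=0.9):
    """Off-line: per-partition inverses, low-rank factors, and Lambda."""
    n = W_norm.shape[0]
    same = labels[:, None] == labels[None, :]
    W1 = np.where(same, W_norm, 0.0)  # within-partition links
    W2 = W_norm - W1                  # cross-partition links
    Q1_inv = np.zeros((n, n))
    for p in np.unique(labels):       # a few small inversions
        idx = np.flatnonzero(labels == p)
        block = W1[np.ix_(idx, idx)]
        Q1_inv[np.ix_(idx, idx)] = np.linalg.inv(np.eye(len(idx)) - c * block)
    # Rank-t approximation W2 ~ U S V (the top t singular values must be nonzero).
    U, s, Vt = np.linalg.svd(W2, full_matrices=False)
    U, S_inv, V = U[:, :t], np.diag(1.0 / s[:t]), Vt[:t, :]
    Lam = np.linalg.inv(S_inv - c * V @ Q1_inv @ U)  # t x t inversion
    return Q1_inv, U, Lam, V

def blin_query(pre, i, c=0.9):
    """On-line: a handful of matrix-vector products."""
    Q1_inv, U, Lam, V = pre
    q = Q1_inv[:, i]  # Q1^{-1} e_i
    return (1 - c) * (q + c * (Q1_inv @ (U @ (Lam @ (V @ q)))))
```

When the cross-partition matrix has rank at most `t`, the result matches the exact ranking vector; otherwise the truncation is the source of the approximation error.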
Experimental Setup
• Dataset
– DBLP/authorship
– Author-Paper
– 315k nodes
– 1,800k edges
• Approx. Quality: Relative Accuracy
• Application: Center-Piece Subgraph
Query Time vs. Pre-Compute Time
• Quality: 90%+
• On-line:
– Up to 150x speedup
• Pre-computation:
– Two orders of magnitude saving
[Plot: Log Query Time vs. Log Pre-compute Time]
Query Time vs. Pre-Storage
• Quality: 90%+
• On-line:
– Up to 150x speedup
• Pre-storage:
– Three orders of magnitude saving
[Plot: Log Query Time vs. Log Storage]
Conclusion
• FastRWR
– Reasonable quality preservation (90%+)
– Up to 150x speed-up in query time
– Orders of magnitude saving in pre-computation and storage
• More in the paper
– The variant of FastRWR and theoretical justification
– Implementation details
• normalization, low-rank approximation, sparse
– More experiments
• Other datasets, other applications
Q&A
Thank you!
htong@cs.cmu.edu
www.cs.cmu.edu/~htong