Fast Random Walk With Restart and Its Applications: Hanghang Tong
ICDM 2006
Motivating Questions
Q: How to measure the relevance?
A: Random walk with restart
Q: How to do it efficiently?
A: That is what this talk tries to answer!
[Figure: 12-node example graph; relevance of every node with respect to query Node 4]
Ranking vector r4:
Node 1: 0.13, Node 2: 0.10, Node 3: 0.13, Node 4: 0.22, Node 5: 0.13, Node 6: 0.05,
Node 7: 0.05, Node 8: 0.08, Node 9: 0.04, Node 10: 0.03, Node 11: 0.04, Node 12: 0.02
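Automatic Image Captioning [Pan KDD2004]
[Figure: an image with the caption { Cat, Forest, Grass, Tiger }, and a test image whose caption {?, ?, ?} is unknown]
Q: How do we caption the test image?
A: RWR!
[Figure: graph linking images, their regions, and keywords (Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass); RWR from the test image yields the caption {Grass, Forest, Cat, Tiger}]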
Neighborhood Formulation (NF)
[Figure: conference-author graph]
NF: example
Center-Piece Subgraph (CePS)
[Figure: original graph with the query nodes Q shown in black, and the resulting CePS]
CePS: Example
[Figure: CePS example]
Other Applications
Content-based Image Retrieval [He]
Personalized PageRank [Jeh], [Widom], [Haveliwala]
Anomaly Detection (for nodes and links) [Sun]
Link Prediction [Getoor], [Jensen]
Semi-supervised Learning [Zhu], [Zhou]
Roadmap
Background
RWR: Definitions
RWR: Algorithms
Basic Idea
FastRWR
Pre-Compute Stage
On-Line Stage
Experimental Results
Conclusion
Computing RWR
$\vec{r}_i = c\,\tilde{W}\,\vec{r}_i + (1-c)\,\vec{e}_i$
$\vec{r}_i$: n×1 ranking vector; $\tilde{W}$: n×n normalized adjacency matrix; $\vec{e}_i$: n×1 starting vector (1 at query node i, 0 elsewhere); $1-c$: restart probability.
[Figure: the equation instantiated on the 12-node example graph with c = 0.9, i.e. restart probability 0.1]
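Rearranging the equation gives the closed form that pre-computation relies on. A short derivation, assuming $I - c\,\tilde{W}$ is invertible (true for $0 \le c < 1$ when, as assumed here, every column of $\tilde{W}$ sums to at most 1):

```latex
\begin{aligned}
\vec{r}_i &= c\,\tilde{W}\,\vec{r}_i + (1-c)\,\vec{e}_i \\
(I - c\,\tilde{W})\,\vec{r}_i &= (1-c)\,\vec{e}_i \\
\vec{r}_i &= (1-c)\,Q^{-1}\,\vec{e}_i, \qquad Q = I - c\,\tilde{W}
\end{aligned}
```

So $\vec{r}_i$ is $(1-c)$ times the i-th column of $Q^{-1}$; once $Q^{-1}$ is known, every query is a column look-up.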
[Figure: RWR and related methods: PageRank and personalized PageRank (P-PageRank) [Haveliwala], semi-supervised (SM) learning [Zhou, Zhu], RL in CBIR [He], [Chakrabarti], [Pan, Sun]]
OnTheFly:
Solve the equation iteratively, $\vec{r}_i \leftarrow c\,\tilde{W}\,\vec{r}_i + (1-c)\,\vec{e}_i$, until convergence
[Figure: successive iterates on the 12-node example graph converging to the ranking vector of Node 4]
Cost per query: O(mE) (m: number of iterations; E: number of edges)
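A minimal pure-Python sketch of OnTheFly, under stated assumptions: an undirected, unweighted graph stored as neighbour lists, a column-normalized $\tilde{W}$ ($\tilde{W}_{uv} = 1/\deg(v)$ for each edge), c = 0.9, and a hypothetical 6-node graph rather than the slides' 12-node one. The function name `rwr_onthefly` is illustrative. Each sweep touches every edge once, so m sweeps cost O(mE).

```python
def rwr_onthefly(neighbors, i, c=0.9, tol=1e-10, max_iter=1000):
    """Iterate r <- c*W*r + (1-c)*e_i until no entry changes by more than tol.

    neighbors[u] lists the nodes adjacent to u in an undirected graph with
    nodes 0..n-1; W is column-normalized, i.e. W[u][v] = 1/deg(v) per edge.
    """
    n = len(neighbors)
    deg = [len(neighbors[v]) for v in range(n)]
    e = [1.0 if v == i else 0.0 for v in range(n)]
    r = e[:]  # start from the restart vector
    for _ in range(max_iter):
        # (W r)[u] = sum of r[v] / deg(v) over the neighbours v of u: O(E) per sweep
        new = [c * sum(r[v] / deg[v] for v in neighbors[u]) + (1 - c) * e[u]
               for u in range(n)]
        change = max(abs(a - b) for a, b in zip(new, r))
        r = new
        if change < tol:
            break
    return r


# Hypothetical example: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
nbrs = [[] for _ in range(6)]
for u, v in edges:
    nbrs[u].append(v)
    nbrs[v].append(u)

r0 = rwr_onthefly(nbrs, 0)
print([round(x, 3) for x in r0])
```

The query node 0 gets the highest score, and nodes 4 and 5, which are symmetric with respect to node 0, get equal scores.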
PreCompute
Pre-compute the ranking vectors $\vec{r}_1, \ldots, \vec{r}_{12}$ of all nodes and store them as the columns of a matrix R; answering a query for node i then means reading column i of R. [Haveliwala]
[Figure: R for the 12-node example graph]
PreCompute:
$\vec{r}_i = (1-c)\,Q^{-1}\,\vec{e}_i$, with $Q = I - c\,\tilde{W}$
[Figure: $Q^{-1}$ for the 12-node example graph, with $\vec{r}_4$ read off from it]
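The PreCompute idea can be sketched the same way. This assumes a column-normalized $\tilde{W}$ and a hypothetical 6-node graph (two triangles joined by the edge 2-3), and uses plain Gauss-Jordan elimination as the inversion routine; `precompute` and `query` are hypothetical names. The off-line step costs O(n^3) time and O(n^2) storage, while a query is a column read.

```python
def column_normalized(n, edges):
    """Dense W with W[u][v] = 1/deg(v) for every undirected edge (u, v)."""
    w = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        w[u][v] = w[v][u] = 1.0
    for col in range(n):
        d = sum(w[row][col] for row in range(n))
        if d:
            for row in range(n):
                w[row][col] /= d
    return w


def invert(m):
    """Gauss-Jordan inversion with partial pivoting: O(n^3) time."""
    n = len(m)
    a = [[float(x) for x in row] + [1.0 if j == i else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def precompute(w, c=0.9):
    """Off-line: R = (1-c) * Q^{-1} with Q = I - c*W (dense n x n storage)."""
    n = len(w)
    q = [[(1.0 if r == k else 0.0) - c * w[r][k] for k in range(n)] for r in range(n)]
    return [[(1 - c) * x for x in row] for row in invert(q)]


def query(R, i):
    """On-line: the ranking vector r_i is simply column i of R."""
    return [row[i] for row in R]


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
W = column_normalized(6, edges)
R = precompute(W)
r0 = query(R, 0)
print([round(x, 3) for x in r0])
```

Every column of R satisfies the RWR equation, so each query is answered without any iteration; the price is the inversion and the dense matrix.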
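Q: How to balance off-line and on-line cost?
OnTheFly: no off-line cost, but O(mE) work per query
PreCompute: fast on-line look-up, but an expensive off-line inversion and a dense n×n matrix to store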
Basic Idea
Find communities, then combine.
[Figure: the 12-node example graph split into communities; the within-community scores for Node 4 are combined, together with the links between communities, into the final ranking]
Pre-computational stage
Q: How to efficiently compute and store $Q^{-1}$?
A: A few small matrix inversions instead of ONE BIG one
[Figure: $\vec{r}_i$ obtained from $Q^{-1}$ and the starting vector $\vec{e}_i$]
Pre-compute Stage
p1: B_Lin Decomposition
P1.1 partition
P1.2 low-rank approximation
p2: Q matrices
P2.1 computing $Q_1^{-1}$ (for each partition)
P2.2 computing $\tilde{\Lambda}$ (for concept space)
P1.1: partition
[Figure: the 12-node example graph before and after partitioning, distinguishing within-partition links from cross-partition links]
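P1.1: block-diagonal
Keeping only the within-partition links gives the block-diagonal matrix $\tilde{W}_1$; the cross-partition links form $\tilde{W}_2$, so $\tilde{W} = \tilde{W}_1 + \tilde{W}_2$.
[Figure: the partitioned 12-node example graph]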
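Splitting $\tilde{W}$ into a within-partition part $\tilde{W}_1$ and a cross-partition part $\tilde{W}_2$ is mechanical once every node has a partition label. A sketch, assuming the labels are already given (how they are computed is not shown) and using a hypothetical 6-node graph with a column-normalized $\tilde{W}$; `split_by_partition` is an illustrative name.

```python
def column_normalized(n, edges):
    """Dense W with W[u][v] = 1/deg(v) for every undirected edge (u, v)."""
    w = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        w[u][v] = w[v][u] = 1.0
    for col in range(n):
        d = sum(w[row][col] for row in range(n))
        if d:
            for row in range(n):
                w[row][col] /= d
    return w


def split_by_partition(w, part):
    """Return (W1, W2): within-partition and cross-partition links, W = W1 + W2.

    part[v] is the partition label of node v; with nodes ordered by label,
    W1 is block-diagonal.
    """
    n = len(w)
    w1 = [[w[r][k] if part[r] == part[k] else 0.0 for k in range(n)] for r in range(n)]
    w2 = [[w[r][k] if part[r] != part[k] else 0.0 for k in range(n)] for r in range(n)]
    return w1, w2


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
part = [0, 0, 0, 1, 1, 1]  # hypothetical labels, e.g. from a graph partitioner
W = column_normalized(6, edges)
W1, W2 = split_by_partition(W, part)
cross = [(r, k) for r in range(6) for k in range(6) if W2[r][k] != 0.0]
print(cross)  # [(2, 3), (3, 2)]
```

Only the bridge survives in $\tilde{W}_2$, which is why it is a good candidate for a low-rank factorization.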
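P1.2: low-rank approximation
$\tilde{W}_2 \approx U\,S\,V$, with $|S| \ll |\tilde{W}_2|$
[Figure: the cross-partition links of the 12-node example graph]
Together: $\tilde{W} \approx \tilde{W}_1 + U\,S\,V$ ($\tilde{W}_1$: within-partition links)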
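The slides do not say how $U$, $S$ and $V$ are obtained; one common choice, assumed here, is a truncated SVD. The sketch computes it in pure Python by power iteration with deflation (a simple stand-in for a proper sparse SVD routine) and applies it to the cross-partition part of a hypothetical 6-node graph whose only bridge is the edge 2-3. `truncated_svd` is an illustrative name.

```python
import math
import random


def matvec(a, x):
    return [sum(p * q for p, q in zip(row, x)) for row in a]


def truncated_svd(a, k, iters=100, seed=0):
    """Rank-k factorization a ~= U S V by power iteration with deflation.

    Returns U (n x k'), S (k' x k' diagonal) and V (k' x m, rows = right
    singular vectors), where k' < k if the remainder becomes numerically zero.
    """
    rng = random.Random(seed)
    a = [row[:] for row in a]
    n_rows, n_cols = len(a), len(a[0])
    us, sigmas, vs = [], [], []
    for _ in range(k):
        at = [list(col) for col in zip(*a)]
        v = [rng.uniform(-1.0, 1.0) for _ in range(n_cols)]  # random start vector
        for _ in range(iters):
            v = matvec(at, matvec(a, v))  # one power step on a^T a
            norm = math.sqrt(sum(x * x for x in v))
            if norm < 1e-15:
                break
            v = [x / norm for x in v]
        av = matvec(a, v)
        sigma = math.sqrt(sum(x * x for x in av))
        if sigma < 1e-12:
            break  # nothing significant left
        u = [x / sigma for x in av]
        us.append(u)
        sigmas.append(sigma)
        vs.append(v)
        # deflate: subtract the rank-1 piece sigma * u v^T just found
        a = [[a[r][c] - sigma * u[r] * v[c] for c in range(n_cols)]
             for r in range(n_rows)]
    U = [[u[r] for u in us] for r in range(n_rows)]
    S = [[s if j == l else 0.0 for l in range(len(sigmas))]
         for j, s in enumerate(sigmas)]
    return U, S, vs


# Cross-partition part of a hypothetical 6-node graph: only the bridge 2-3,
# whose endpoints both have degree 3, so W2[2][3] = W2[3][2] = 1/3.
n = 6
W2 = [[0.0] * n for _ in range(n)]
W2[2][3] = W2[3][2] = 1.0 / 3.0
U, S, V = truncated_svd(W2, k=2)
approx = [[sum(U[r][j] * S[j][j] * V[j][c] for j in range(len(S)))
           for c in range(n)] for r in range(n)]
err = max(abs(approx[r][c] - W2[r][c]) for r in range(n) for c in range(n))
print(len(S), err < 1e-9)  # 2 True
```

Here $S$ is 2 × 2 while $\tilde{W}_2$ is 6 × 6; choosing k below the rank of $\tilde{W}_2$ would give a smaller, approximate factorization instead of an exact one.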
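p2.1 Computing $Q_1^{-1}$
$Q_1 = I - c\,\tilde{W}_1$ (within-partition links only) is block-diagonal, so $Q_1^{-1} = \mathrm{diag}(Q_{1,1}^{-1}, Q_{1,2}^{-1}, \ldots, Q_{1,k}^{-1})$: one small inversion per partition.
Comparing $Q^{-1}$ and $Q_1^{-1}$ (100,000 nodes; 100 partitions):
Computing time: computing $Q_1^{-1}$ is 10,000x faster!
Storage cost: 100x saving!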
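The block-diagonal trick and the slide's numbers can both be checked with a short sketch. It assumes $Q_1 = I - c\,\tilde{W}_1$ with $\tilde{W}_1$ holding only the within-partition links of a hypothetical 6-node graph, uses Gauss-Jordan elimination for each small inversion, and models cost as O(n^3) inversion time and O(n^2) dense storage; `block_diag_inverse` is an illustrative name.

```python
def invert(m):
    """Gauss-Jordan inversion with partial pivoting: O(n^3) time."""
    n = len(m)
    a = [[float(x) for x in row] + [1.0 if j == i else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def block_diag_inverse(q1, blocks):
    """Invert a block-diagonal matrix by inverting each diagonal block on its own.

    blocks is a list of index lists, one per partition (hypothetical input format).
    """
    n = len(q1)
    out = [[0.0] * n for _ in range(n)]
    for idx in blocks:
        sub_inv = invert([[q1[r][k] for k in idx] for r in idx])
        for a, r in enumerate(idx):
            for b, k in enumerate(idx):
                out[r][k] = sub_inv[a][b]
    return out


c = 0.9
# W1 of a hypothetical graph: triangles {0,1,2} and {3,4,5}; deg(2) = deg(3) = 3
# because of the (removed) bridge 2-3, and W1[u][v] = 1/deg(v).
deg = [2, 2, 3, 3, 2, 2]
W1 = [[0.0] * 6 for _ in range(6)]
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W1[u][v], W1[v][u] = 1.0 / deg[v], 1.0 / deg[u]
Q1 = [[(1.0 if r == k else 0.0) - c * W1[r][k] for k in range(6)] for r in range(6)]

Q1_inv = block_diag_inverse(Q1, [[0, 1, 2], [3, 4, 5]])
full = invert(Q1)
print(max(abs(Q1_inv[r][k] - full[r][k]) for r in range(6) for k in range(6)) < 1e-12)  # True

# Cost model behind the slide: n = 100,000 nodes, k = 100 partitions.
n, k = 100_000, 100
time_ratio = n ** 3 / (k * (n // k) ** 3)     # one n x n inversion vs k of size n/k
storage_ratio = n ** 2 / (k * (n // k) ** 2)  # dense Q^{-1} vs the k diagonal blocks
print(time_ratio, storage_ratio)  # 10000.0 100.0
```

In general, k equal partitions cut inversion time by a factor of k^2 and storage by a factor of k.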
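p2.2 Computing $\tilde{\Lambda}$:
$\tilde{\Lambda} = (S^{-1} - c\,V\,Q_1^{-1}\,U)^{-1}$
[Figure: $\tilde{\Lambda}$ built from $Q_{1,1}^{-1}, \ldots, Q_{1,k}^{-1}$, U and V on the partitioned example graph]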
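We have: $Q \approx Q_1 - c\,U\,S\,V$ (communities: $Q_1 = I - c\,\tilde{W}_1$; bridges: $U\,S\,V$)
SM Lemma says:
$Q^{-1} \approx Q_1^{-1} + c\,Q_1^{-1}\,U\,\tilde{\Lambda}\,V\,Q_1^{-1}$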
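The Sherman-Morrison(-Woodbury) identity that the slide calls the SM lemma can be checked numerically. In this sketch the hypothetical 6-node graph has a single cross-partition link (the bridge 2-3), so $\tilde{W}_2 = U S V$ is written down exactly by hand with $S = \mathrm{diag}(1/3, 1/3)$; with a truncated low-rank factorization the two sides would agree only approximately. All helper names are illustrative.

```python
def invert(m):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices)."""
    n = len(m)
    a = [[float(x) for x in row] + [1.0 if j == i else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def lin(a, b, beta):
    """a + beta * b for equally sized matrices."""
    return [[x + beta * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def identity(n):
    return [[1.0 if r == k else 0.0 for k in range(n)] for r in range(n)]


c, n = 0.9, 6
# Hypothetical graph: triangles {0,1,2} and {3,4,5} plus the bridge 2-3.
deg = [2, 2, 3, 3, 2, 2]
W1 = [[0.0] * n for _ in range(n)]  # within-partition links, W1[u][v] = 1/deg(v)
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W1[u][v], W1[v][u] = 1.0 / deg[v], 1.0 / deg[u]

# Exact factorization of the bridge: W2 = U S V with W2[2][3] = W2[3][2] = 1/3.
U = [[1.0 if r == 2 else 0.0, 1.0 if r == 3 else 0.0] for r in range(n)]  # n x 2
S = [[1.0 / 3.0, 0.0], [0.0, 1.0 / 3.0]]                                    # 2 x 2
V = [[1.0 if k == 3 else 0.0 for k in range(n)],
     [1.0 if k == 2 else 0.0 for k in range(n)]]                              # 2 x n
W = lin(W1, matmul(matmul(U, S), V), 1.0)

Q1_inv = invert(lin(identity(n), W1, -c))                      # Q1 = I - c W1
lam = invert(lin(invert(S), matmul(matmul(V, Q1_inv), U), -c))  # (S^-1 - c V Q1^-1 U)^-1
sm = lin(Q1_inv, matmul(matmul(matmul(matmul(Q1_inv, U), lam), V), Q1_inv), c)
direct = invert(lin(identity(n), W, -c))                        # Q = I - c W
diff = max(abs(x - y) for ra, rb in zip(sm, direct) for x, y in zip(ra, rb))
print(diff < 1e-9)  # True
```

Only the 2 × 2 matrix lam involves the bridges; everything else comes from the block-diagonal $Q_1^{-1}$.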
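On-Line Stage
Q: Given the query vector $\vec{e}_i$ and the pre-computed matrices, what is the result $\vec{r}_i$?
A (SM lemma): $\vec{r}_i = (1-c)\,(Q_1^{-1}\vec{e}_i + c\,Q_1^{-1}\,U\,\tilde{\Lambda}\,V\,Q_1^{-1}\vec{e}_i)$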
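q1: $\vec{r}_0 = Q_1^{-1}\,\vec{e}_i$
q2: $\vec{r}_i = V\,\vec{r}_0$
q3: $\vec{r}_i = \tilde{\Lambda}\,\vec{r}_i$
q4: $\vec{r}_i = U\,\vec{r}_i$
q5: $\vec{r}_i = Q_1^{-1}\,\vec{r}_i$
q6: $\vec{r}_i = (1-c)\,(\vec{r}_0 + c\,\vec{r}_i)$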
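The six on-line steps can be sketched as follows, under the same assumptions as a toy example: a hypothetical 6-node graph, an exact hand-written factorization $U S V$ of its single bridge 2-3, c = 0.9, and Gauss-Jordan elimination for the off-line inversions. `online_query` is an illustrative name; it only performs matrix-vector products with the pre-computed $Q_1^{-1}$, $U$, $\tilde{\Lambda}$ and $V$.

```python
def invert(m):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices)."""
    n = len(m)
    a = [[float(x) for x in row] + [1.0 if j == i else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matvec(a, x):
    return [sum(p * q for p, q in zip(row, x)) for row in a]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def online_query(i, Q1_inv, U, lam, V, c=0.9):
    """On-line steps: r_i = (1-c)(Q1^-1 e_i + c Q1^-1 U lam V Q1^-1 e_i)."""
    n = len(Q1_inv)
    e = [1.0 if k == i else 0.0 for k in range(n)]
    r0 = matvec(Q1_inv, e)                                 # q1: r0 = Q1^-1 e_i
    r = matvec(V, r0)                                      # q2: r = V r0
    r = matvec(lam, r)                                     # q3: r = lam r
    r = matvec(U, r)                                       # q4: r = U r
    r = matvec(Q1_inv, r)                                  # q5: r = Q1^-1 r
    return [(1 - c) * (a + c * b) for a, b in zip(r0, r)]  # q6: (1-c)(r0 + c r)


c, n = 0.9, 6
deg = [2, 2, 3, 3, 2, 2]
W1 = [[0.0] * n for _ in range(n)]  # within-partition links, W1[u][v] = 1/deg(v)
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W1[u][v], W1[v][u] = 1.0 / deg[v], 1.0 / deg[u]
U = [[1.0 if r == 2 else 0.0, 1.0 if r == 3 else 0.0] for r in range(n)]
V = [[1.0 if k == 3 else 0.0 for k in range(n)],
     [1.0 if k == 2 else 0.0 for k in range(n)]]
S_inv = [[3.0, 0.0], [0.0, 3.0]]  # S = diag(1/3, 1/3): the bridge 2-3

# Off-line: Q1^-1 (block-diagonal) and the small 2 x 2 matrix lam
Q1_inv = invert([[(1.0 if r == k else 0.0) - c * W1[r][k] for k in range(n)]
                 for r in range(n)])
VQU = matmul(matmul(V, Q1_inv), U)
lam = invert([[S_inv[a][b] - c * VQU[a][b] for b in range(2)] for a in range(2)])

# Reference answer: invert Q = I - c W directly, with W = W1 + U S V
W = [row[:] for row in W1]
W[2][3] = W[3][2] = 1.0 / 3.0
Q_inv = invert([[(1.0 if r == k else 0.0) - c * W[r][k] for k in range(n)]
                for r in range(n)])

r_fast = online_query(0, Q1_inv, U, lam, V, c)
r_exact = [(1 - c) * row[0] for row in Q_inv]
print(max(abs(a - b) for a, b in zip(r_fast, r_exact)) < 1e-9)  # True
```

Because the bridge is factored exactly here, the result matches the directly inverted $Q$; with a truncated $U S V$ it would only approximate it.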
Experimental Setup
Dataset: DBLP/authorship (author-paper graph), 315k nodes, 1,800k edges
Quality: 90%+
On-line: up to 150x speedup
Pre-computation: two orders of magnitude saving
Pre-storage: three orders of magnitude saving
[Figure: storage cost on a log scale]
Conclusion
FastRWR
Reasonable quality preservation (90%+)
Up to 150x speed-up in query time
Orders of magnitude saving in pre-computation and storage
More experiments
Other datasets, other applications
Q&A
Thank you!
htong@cs.cmu.edu
www.cs.cmu.edu/~htong