C.E. Leiserson, T.B. Schardl (MIT CSAIL) A Work-Efficient Parallel BFS Algorithm SPAA ’10 1 / 26
Breadth-first search (BFS)
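[Figure: example graph with source v0; each vertex is labeled with its distance from v0 (0, 1, 2, 3, or 4), or ∞ if it cannot be reached.]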
Serial BFS
Summary of results
[Figure: speedup (y-axis, 0 to 8) versus number of processors (x-axis, 0 to 8) for three curves: PBFS + comp, PBFS, and Serial BFS.]
Outline
3 Empirical Results
4 Theoretical Results
The DAG Model of Computation
Modeling Reducers
Theoretical Analysis of PBFS
Strategy for parallelizing breadth-first search
[Figure, built up over several animation frames: the example graph with its vertices grouped into layers V0, V1, V2, V3, V4, and V∞ by distance from the source. Initially only the source in V0 has distance 0 and every other vertex has distance ∞. The layers are then discovered one at a time, assigning distances 1, 2, 3, and 4 in turn; vertices that are never reached keep distance ∞ and form V∞.]
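The layer-by-layer strategy pictured on this slide can be sketched as follows. This is a minimal serial illustration, not the authors' PBFS code: the function name `layered_bfs`, the adjacency-list dictionary, and the small example graph are all assumptions made for the example. The inner loop over one layer is the part a parallel implementation would run concurrently.

```python
def layered_bfs(adj, source):
    """Return (dist, layers), where layers[d] lists the vertices at distance d."""
    inf = float("inf")
    dist = {v: inf for v in adj}
    dist[source] = 0
    layers = [[source]]
    while layers[-1]:
        next_layer = []
        # PBFS would process the vertices of one layer in parallel;
        # this sketch walks them serially.
        for u in layers[-1]:
            for v in adj[u]:
                if dist[v] == inf:
                    dist[v] = dist[u] + 1
                    next_layer.append(v)
        layers.append(next_layer)
    layers.pop()  # drop the trailing empty layer
    return dist, layers


# Hypothetical graph: "z" is unreachable and keeps distance infinity.
adj = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
    "z": [],
}
dist, layers = layered_bfs(adj, "a")
```

Each pass of the `while` loop consumes layer V_d and produces layer V_(d+1), so the number of passes equals the number of nonempty layers.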
Storing a layer of the graph
Processing a layer
Cilk++ reducers
Cilk++ supports a type of parallel data structure, called a reducer.
     1  x = 10         1  x = 10          1  x = 10
     2  x++            2  x++             2  x++
     3  x += 3         3  x += 3          3  x += 3
     4  x += -2        4  x += -2            x' = 0
     5  x += 6         5  x += 6          4  x' += -2
     6  x--               x' = 0          5  x' += 6
     7  x += 4         6  x'--            6  x'--
     8  x += 3         7  x' += 4            x'' = 0
     9  x++            8  x' += 3         7  x'' += 4
    10  x += -9        9  x'++            8  x'' += 3
                      10  x' += -9        9  x''++
                          x += x'        10  x'' += -9
                                             x += x'
                                             x += x''
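The three columns above show the same ten statements run on one, two, and three views of x. A small simulation can confirm that every split yields the same final value. This is only a sketch of the idea, not the Cilk++ runtime or its reducer API: `run_with_views` and `steal_points` are hypothetical names, and the positions where new views start are fixed by hand here rather than decided by a scheduler.

```python
# Statements 2 to 10 from the slide, written as signed increments.
updates = [+1, +3, -2, +6, -1, +4, +3, +1, -9]


def run_with_views(initial, updates, steal_points):
    """Apply updates in order, opening a fresh view (identity 0) at each
    index in steal_points, then combine the views left to right with +."""
    views = [initial]
    for i, delta in enumerate(updates):
        if i in steal_points:
            views.append(0)  # a new view starts at the identity of +
        views[-1] += delta
    total = views[0]
    for view in views[1:]:
        total += view  # the reduce step, as in x += x'
    return total, views


serial = run_with_views(10, updates, set())        # left column
two_views = run_with_views(10, updates, {4})       # middle column
three_views = run_with_views(10, updates, {2, 5})  # right column
```

All three calls return a total of 16. The intermediate views differ (for example 18 and -2 in the middle column), but because addition is associative the reduced result does not depend on where the views were split.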
The bag data structure
[Figure: two trees with roots x and y are combined (+) into a single tree that contains both x and y.]
The bag data structure: BAG-INSERT
[Figure: illustration of BAG-INSERT as a sequence of "+" and "=" steps.]
The bag data structure: BAG-SPLIT
[Figure: illustration of BAG-SPLIT, in which one bag is divided into two.]
The bag data structure: BAG-UNION
[Figure: illustration of BAG-UNION, in which two bags are combined into one.]
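Since the figures for these slides did not survive, the following sketch shows one way the three bag operations can fit together. It assumes the bag is an array of "pennants" (trees of exactly 2^k nodes whose root has a single child), so that insertion behaves like incrementing a binary counter, union like binary addition, and split like a right shift. All class and function names here are illustrative assumptions, not the authors' code.

```python
class Node:
    def __init__(self, value):
        self.value, self.left, self.right = value, None, None


def pennant_union(x, y):
    """Merge two pennants of size 2^k into one of size 2^(k+1) in O(1)."""
    y.right = x.left
    x.left = y
    return x


def pennant_split(x):
    """Undo pennant_union: x keeps one half, the other half is returned."""
    y = x.left
    x.left = y.right
    y.right = None
    return y


class Bag:
    def __init__(self):
        self.spine = []  # spine[k] is None or a pennant of size 2^k

    def __len__(self):
        return sum(1 << k for k, p in enumerate(self.spine) if p is not None)

    def items(self):
        stack = [p for p in self.spine if p is not None]
        while stack:
            node = stack.pop()
            yield node.value
            stack.extend(c for c in (node.left, node.right) if c is not None)

    def insert(self, value):  # like incrementing a binary counter
        carry, k = Node(value), 0
        while k < len(self.spine) and self.spine[k] is not None:
            carry = pennant_union(self.spine[k], carry)
            self.spine[k] = None
            k += 1
        if k == len(self.spine):
            self.spine.append(None)
        self.spine[k] = carry

    def union(self, other):  # like adding two binary numbers; empties other
        width = max(len(self.spine), len(other.spine))
        a = self.spine + [None] * (width - len(self.spine))
        b = other.spine + [None] * (width - len(other.spine))
        result, carry = [], None
        for k in range(width):
            present = [p for p in (a[k], b[k], carry) if p is not None]
            if len(present) >= 2:
                carry = pennant_union(present.pop(), present.pop())
            else:
                carry = None
            result.append(present[0] if present else None)
        if carry is not None:
            result.append(carry)
        self.spine, other.spine = result, []

    def split(self):  # like a right shift: move about half into a new bag
        other = Bag()
        if not self.spine:
            return other
        leftover, halves = self.spine[0], self.spine[1:]
        self.spine = []
        for p in halves:
            other.spine.append(None if p is None else pennant_split(p))
            self.spine.append(p)
        if leftover is not None:
            self.insert(leftover.value)
        return other


bag = Bag()
for v in range(13):
    bag.insert(v)
other = bag.split()
sizes = (len(bag), len(other))  # the two halves differ by at most one
bag.union(other)
```

Because each pennant operation only rewires two pointers, union and split touch one spine slot per bit of the bag size, which is what makes them cheap enough to use inside a reducer.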
Empirical Results
Name        Description                          |V|      |E|       D    Parallelism  PBFS T_1 / SERIAL BFS T_1  PBFS T_1/T_8
Kkt_power   Optimal power flow, nonlinear opt.   2.05 M   12.76 M   31   104.09       0.705                      6.102
Freescale1  Circuit simulation                   3.43 M   17.1 M    128  153.06       1.120                      5.145
Cage14      DNA electrophoresis                  1.51 M   27.1 M    43   246.35       1.060                      5.442
Wikipedia   Links between Wikipedia pages        2.4 M    41.9 M    460  179.02       0.804                      6.833
Grid3D200   3D 7-point finite-diff mesh          8 M      55.8 M    598  79.27        0.747                      4.902
RMat23      Scale-free graph model               2.3 M    77.9 M    8    93.22        0.835                      6.794
Cage15      DNA electrophoresis                  5.15 M   99.2 M    50   675.22       1.058                      5.486
Nlpkkt160   Nonlinear optimization               8.35 M   225.4 M   163  331.57       1.138                      6.096
(The Spy Plot column contained images and is omitted.)
The dag model of computation
[Figure: a computation dag of 19 numbered strands, with edges labeled spawn, continuation, and sync.]
Modeling reducers
[Figure: the dag of 19 strands shown twice: on the left as the user dag, and on the right as the performance dag with three REDUCE strands (R) inserted.]
For a computation A:
  Consider the user dag User(A), a dag in which runtime-system operations on reducers are not represented. This dag models the computation as the user understands it.
  Insert the REDUCE strands performed by the runtime system into User(A), each before the sync strand that requires its completion, to get a performance dag Perf(A).
  A delay-sequence argument proves that the performance of A is $T_P(A) \le W(\mathrm{Perf}(A))/P + O(S(\mathrm{Perf}(A)))$.
  We want a performance bound in terms of User(A).
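To make the quantities in the bound concrete, here is a small sketch that computes work W (total strand cost) and span S (cost of the longest path) for a dag, then evaluates the W/P + S form of the bound. The six-strand dag, the strand costs, the REDUCE cost of 2, and the choice of constant 1 inside the O(S) term are all assumptions made for illustration.

```python
from functools import lru_cache


def work_and_span(cost, succ):
    """cost maps strand -> time; succ maps strand -> list of successors."""
    work = sum(cost.values())

    @lru_cache(maxsize=None)
    def longest_from(u):
        tails = (longest_from(v) for v in succ.get(u, []))
        return cost[u] + max(tails, default=0)

    span = max(longest_from(u) for u in cost)
    return work, span


def greedy_bound(work, span, processors):
    """W/P + S, taking the hidden constant in O(S) to be 1."""
    return work / processors + span


# Hypothetical user dag: a spawns b and continues to c; both join at f.
user_cost = {s: 1 for s in "abcdef"}
user_succ = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": ["f"], "e": ["f"]}

# Performance dag: one REDUCE strand R of cost 2 placed before the sync f.
perf_cost = dict(user_cost, R=2)
perf_succ = dict(user_succ, e=["R"], R=["f"])

user = work_and_span(user_cost, user_succ)
perf = work_and_span(perf_cost, perf_succ)
bound_on_two = greedy_bound(*perf, processors=2)
```

In this toy case the single REDUCE strand raises the work from 6 to 8 and the span from 4 to 6, which is why a bound stated on Perf(A) has to be translated back to User(A).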
Analysis of PBFS
PBFS's user dag has $O(V + E)$ work and $O(D \lg(V/D))$ span for bounded-degree input graphs.
The worst-case cost of any BAG-UNION in PBFS is $O(\lg(V/D))$.
Relating the user and performance dags for PBFS, we have $S(\mathrm{PBFS}) = O(D \lg^2(V/D))$ and $W(\mathrm{PBFS}) = O(V + E) + O(P D \lg^3(V/D))$.
Consequently, we have $T_P(\mathrm{PBFS}) = O((V + E)/P + D \lg^3(V/D))$.
If $O((V + E)/P) \gg O(D \lg^3(V/D))$, then we expect linear speedup from PBFS. We define the effective parallelism of PBFS to be $O\left(\frac{V + E}{D \lg^3(V/D)}\right) \approx O(E/D)$.
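The effective-parallelism expression can be evaluated numerically to get a feel for when the work term dominates. The sketch below simply drops the constants hidden by the O-notation, so its outputs are indicative only; the function names are assumptions, and the second call plugs in the Kkt_power figures from the empirical-results table.

```python
import math


def running_time_terms(V, E, D, P):
    """Return the two terms of the bound: (V + E)/P and D * lg^3(V/D)."""
    return (V + E) / P, D * math.log2(V / D) ** 3


def effective_parallelism(V, E, D):
    """(V + E) / (D * lg^3(V/D)), ignoring constant factors."""
    return (V + E) / (D * math.log2(V / D) ** 3)


# Hypothetical sizes chosen so that lg(V/D) is exactly 10.
toy = effective_parallelism(V=2048, E=7952, D=2)

# Kkt_power from the table: |V| = 2.05 M, |E| = 12.76 M, D = 31.
kkt = effective_parallelism(V=2.05e6, E=12.76e6, D=31)
work_term, span_term = running_time_terms(2.05e6, 12.76e6, 31, P=8)
```

Linear speedup is expected only while `work_term` stays well above `span_term`, that is, while P is small compared with the effective parallelism.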
Conclusion
Thank you
Questions?
Optimizing the bag data structure
We can improve the real-world efficiency of the bag by storing an array
of data at each node.
Relating the user and performance dags
Lemma
Consider a computation A, and let τ be the worst-case running time of any REDUCE or CREATE-IDENTITY operation in A. We have $S(\mathrm{Perf}(A)) = O(\tau \cdot S(\mathrm{User}(A)))$ in expectation.
Proof.
Every successful steal may force a CREATE-IDENTITY operation.
Every successful steal may force a REDUCE operation.
[Figure: the user dag (left) and the performance dag (right), the latter with three strands labeled I and three labeled R inserted, presumably CREATE-IDENTITY and REDUCE strands.]
Proof cont’d.
Consider a critical path p in Perf(A).
This path p corresponds to some path q in User(A), which has
length at most S(User(A)).
Since in the worst case every node in q corresponds to a steal, and each steal adds at most O(τ) to the path, the length of p is $O(\tau \cdot S(\mathrm{User}(A)))$ in expectation.
Lemma
Consider a computation A, and let τ be the worst-case running time of any REDUCE or CREATE-IDENTITY operation in A. We have $W(\mathrm{Perf}(A)) = W(\mathrm{User}(A)) + O(\tau^2 P \cdot S(\mathrm{User}(A)))$ in expectation.
Proof.
The computation A contains all of the strands in User(A).
At most $O(P \cdot S(\mathrm{Perf}(A)))$ steals occur during A's execution.
Consequently, REDUCE and CREATE-IDENTITY strands contribute $O(\tau P \cdot S(\mathrm{Perf}(A)))$ additional work to User(A).
From the previous lemma, we have $S(\mathrm{Perf}(A)) = O(\tau \cdot S(\mathrm{User}(A)))$.
Therefore, we have $W(\mathrm{Perf}(A)) = W(\mathrm{User}(A)) + O(\tau^2 P \cdot S(\mathrm{User}(A)))$ in expectation.
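As a check, the two lemmas can be instantiated for PBFS to recover the bounds stated on the "Analysis of PBFS" slide. The derivation below takes τ = O(lg(V/D)) from the stated worst-case cost of BAG-UNION, and assumes that CREATE-IDENTITY costs no more than that.

```latex
\begin{align*}
\tau &= O(\lg(V/D)), \qquad
W(\mathrm{User}) = O(V + E), \qquad
S(\mathrm{User}) = O(D \lg(V/D)) \\
S(\mathrm{Perf}) &= O(\tau \cdot S(\mathrm{User}))
  = O(\lg(V/D) \cdot D \lg(V/D))
  = O(D \lg^2(V/D)) \\
W(\mathrm{Perf}) &= W(\mathrm{User}) + O(\tau^2 P \cdot S(\mathrm{User}))
  = O(V + E) + O(P D \lg^3(V/D)) \\
T_P &\le \frac{W(\mathrm{Perf})}{P} + O(S(\mathrm{Perf}))
  = O\!\left(\frac{V + E}{P} + D \lg^3(V/D) + D \lg^2(V/D)\right)
  = O\!\left(\frac{V + E}{P} + D \lg^3(V/D)\right)
\end{align*}
```

The last step absorbs the $D \lg^2(V/D)$ span term into the larger $D \lg^3(V/D)$ term that comes from dividing the extra work by P.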