Ulrich Meyer
Max-Planck-Institut für Informatik (MPII),
Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany,
www.uli-meyer.de
Partially supported by the Future and Emerging Technologies programme of the EU under contract number IST-1999-14186 (ALCOM-FT) and the Center of Excellence programme of the EU under contract number ICAI-CT-2000-70025. Parts of this work were done while the author was visiting the Computer and Automation Research Institute of the Hungarian Academy of Sciences, Center of Excellence, MTA SZTAKI, Budapest.

Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS'02)
1530-2075/02 $17.00 © 2002 IEEE

Abstract

We study the average-case complexity of the parallel single-source shortest-path (SSSP) problem, assuming arbitrary directed graphs with $n$ nodes, $m$ edges, and independent random edge weights uniformly distributed in $[0, 1]$. We provide a new bucket-based parallel SSSP algorithm that runs in $T = O(\log^2 n \cdot \min_i \{2^i \cdot L \cdot \log n + |V_i|\})$ average-case time using $O(n + m + T)$ work on a PRAM, where $L$ denotes the maximum shortest-path weight and $|V_i|$ is the number of graph vertices with in-degree at least $2^i$. All previous algorithms required either more time or more work. The minimum performance gain is a logarithmic-factor improvement; on certain graph classes, accelerations by factors of more than $n^{0.4}$ can be achieved. The algorithm can also be adapted to distributed memory machines.

1 Introduction

The single-source shortest-path problem (SSSP) is a fundamental and well-studied combinatorial optimization problem with many practical and theoretical applications [1]. Let $G = (V, E)$ be a directed graph with $|V| = n$ nodes and $|E| = m$ edges, let $s$ be a distinguished vertex of the graph, and let $c$ be a function assigning a nonnegative real-valued weight to each edge of $G$. The objective of SSSP is to compute, for each vertex $v$ reachable from $s$, the weight of a minimum-weight ("shortest distance") path from $s$ to $v$, denoted by $\mathrm{dist}(s, v)$ and abbreviated $\mathrm{dist}(v)$; the weight of a path is the sum of the weights of its edges.

Assuming independent random edge weights is a standard setting for the average-case analysis of graph algorithms; see [14] for many examples. The uniform edge weight distribution is mostly chosen in order to keep the proofs simple. Frequently, the obtained results also hold asymptotically in the more general situation of random edge weights that are independent and bounded, and whose common distribution function $F$ has a finite, nonnegative derivative at 0. This is also true for our algorithm.

The parallel random access machine (PRAM) [19] is one of the most widely studied abstract models of a parallel computer. A PRAM consists of $P$ independent processors (processing units, PUs) and a shared memory, which these processors can synchronously access in unit time. The strongest model (CRCW) supports concurrent read and write access to single memory cells. The performance of PRAM algorithms is usually described by two parameters: time (assuming an unlimited number of available PUs) and work (the total number of operations needed). Even though the strict PRAM model is only implemented on a few experimental parallel machines like the SB-PRAM [13], it is valuable for highlighting the main ideas of a parallel algorithm without tedious details caused by a particular architecture. Other models like BSP [30] view a parallel computer as a collection of sequential processors, each one having its own local memory, so-called distributed memory machines (DMMs). The PUs are interconnected by a network that allows them to communicate by sending and receiving messages. For ease of exposition, we focus on the PRAM model and only sketch how our SSSP algorithm can be converted to DMMs.

1.1 Previous Work.

The classical sequential SSSP result is Dijkstra's algorithm [11]; implemented with Fibonacci heaps, it solves SSSP on arbitrary directed graphs with nonnegative edge weights in $O(n \log n + m)$ time. A number of faster algorithms have been developed on the more powerful RAM (random access machine) model; see [29] for an overview. In particular, Thorup [29] has given the first $O(n + m)$ worst-case time RAM algorithm for undirected graphs with integer or float edge weights. The average-case analysis of shortest-path algorithms mainly focused on the All-Pairs Shortest Paths (APSP) problem for the complete graph with random edge weights. Recently, the first linear $O(n + m)$ average-case time algorithms for arbitrary directed graphs with random edge weights have been given [16, 24].

So far there is no parallel PRAM SSSP algorithm with $O(n \log n + m)$ work and worst-case sublinear running time for arbitrary digraphs with nonnegative edge weights. The $O(n \log n + m)$ work solution by Driscoll et al. [12] has running time $O(n \log n)$.
An $O(n)$ time algorithm requiring $O(m \log n)$ work was presented by Brodal et al. [7]. All faster known algorithms require more work; e.g., the approach by Han et al. [18] needs $O(\log^2 n)$ time and $O(n^3 \cdot (\log \log n / \log n)^{1/3})$ work. The algorithm of Klein and Subramanian [20] takes $O(\sqrt{n} \cdot L \cdot \log n \cdot \log^* n)$ time and $O(\sqrt{n} \cdot m \cdot L \cdot \log n)$ work, where $L$ is the maximum shortest-path weight, i.e., $L := \max\{\mathrm{dist}(v) : v \in V, \mathrm{dist}(v) < \infty\}$. Similar results have been obtained by Cohen [9] and by Shi and Spencer [28]. Most of these algorithms can be modified to run on the weakest PRAM model without concurrent read/write capability.

Parallel shortest-path problems on random graphs [6], where each of the possible edges is present with a certain probability, have been studied intensively [8, 10, 15, 17, 25, 26, 27]. Under the assumption of independent random edge weights uniformly distributed in the interval $[0, 1]$, the fastest work-efficient parallel SSSP algorithm for random graphs [25, 26] requires $O(\log^2 n)$ time and linear work on average; additionally, $O(d \cdot L \cdot \log n + \log^2 n)$ time and $O(n + m + d \cdot L \cdot \log n)$ work on average suffice for arbitrary directed graphs with random edge weights, where $d$ denotes the maximum node degree in the graph and $L$ is defined as above.

For arbitrary graphs with large maximum degree $d$, the algorithms of [25, 26] perform poorly: $\Omega(d)$ time is needed. If the number of high-degree nodes is rather small, then the running time can be considerably improved [23]: let $|V_i|$ denote the number of graph vertices with in-degree at least $2^i$; then SSSP can be solved in $T' := O(\log^3 n \cdot \min_i \{2^i \cdot L \cdot \log n + |V_i|\})$ time on average. However, that algorithm needs a non-linear number of $O(n \log n + m + T')$ operations.

1.2 New Result.

We provide an improved parallel SSSP algorithm that applies different step-widths on disjoint node subsets at the same time and utilizes a new split-free bucket data structure. It achieves average-case running time $T = O(\log^2 n \cdot \min_i \{2^i \cdot L \cdot \log n + |V_i|\})$ using $O(n + m + T)$ operations on a CRCW PRAM, where $L$ and $|V_i|$ are defined as in Section 1.1.

For sparse graphs, this means a logarithmic-factor improvement in both running time and work compared to the superlinear-work algorithm from [23]. Furthermore, the node degrees of many huge but sparse graphs with small diameter (e.g., WWW and telephone call graphs) follow a power law, i.e., the number of nodes with in-degree $x$ is proportional to $x^{-\beta}$ for some constant $\beta$. For these graphs, the new approach is faster by a polynomial factor in $n$ than the best previous algorithm with linear average-case work ([25, 26]); on certain graph classes, the acceleration exceeds $n^{0.4}$.

The rest of the paper is organized as follows: in Section 2, we briefly review basic facts and techniques for average-case efficient parallel and sequential shortest-path algorithms. Then, in Section 3, we present our new parallel SSSP algorithm. Finally, in Section 4, we give some concluding remarks.

2 Preliminaries

The classical sequential SSSP approach is Dijkstra's algorithm [11]. It maintains a partition of the node set $V$ into settled, queued, and unreached nodes. For each node $v$ it keeps a tentative distance $\mathrm{tent}(v)$. If $v$ is unreached, then $\mathrm{tent}(v) = \infty$; otherwise, $\mathrm{tent}(v)$ refers to the weight of the lightest path from $s$ to $v$ found so far. Hence, $\mathrm{tent}(v) \ge \mathrm{dist}(v)$. Settled nodes satisfy $\mathrm{tent}(v) = \mathrm{dist}(v)$. Initially, $s$ is queued ($s \in Q$), $\mathrm{tent}(s) = 0$, and all other nodes are unreached. In each iteration, the queued node $v$ with smallest tentative distance is scanned: $v$ is removed from the queue, and all edges $(v, w) \in E$ are relaxed, i.e., $\mathrm{tent}(w)$ is set to $\min\{\mathrm{tent}(w), \mathrm{tent}(v) + c(v, w)\}$. If $w$ was unreached ($\mathrm{tent}(w) = \infty$), it is now queued. It is well known that $\mathrm{tent}(v) = \mathrm{dist}(v)$ when $v$ is selected from the queue as the node with smallest tentative distance. Hence, $v$ is settled and will never re-enter the queue. Therefore, Dijkstra's approach is a so-called label-setting method. Alternatively, label-correcting variants may scan nodes from the queue for which $\mathrm{tent}(v) > \mathrm{dist}(v)$ and hence have to rescan those nodes until they are finally settled.

Label-correcting SSSP algorithms are natural candidates for parallelization: several queued nodes may be scanned concurrently in one round. However, finding both provably good criteria to select nodes for scanning and data structures that efficiently support these strategies remains a difficult task.

2.1 Buckets of Fixed Width.

The sequential SSSP algorithm of [25], called $\Delta$-stepping, and its parallelizations [25, 26] are label-correcting approaches that work in phases: if $M$ denotes the smallest tentative distance in the queue data structure at the beginning of a phase, then they scan queued nodes $v$ with tentative distance $\mathrm{tent}(v) \le M + \Delta$ in parallel. The parameter $\Delta$ is called the step-width. The queue is implemented by a linear array $B$ of buckets such that a queued node $v$ is kept in bucket $B[\lfloor \mathrm{tent}(v) / \Delta \rfloor]$, i.e., bucket $B[i]$ holds the queued nodes with tentative distances in the range $[i \cdot \Delta, (i+1) \cdot \Delta)$. Let $\mathrm{cur}$ denote the largest bucket index such that $B[0], \ldots, B[\mathrm{cur} - 1]$ are empty. Then a phase scans all nodes from the current bucket $B[\mathrm{cur}]$ in parallel but first relaxes only the light edges (having weight at most $\Delta$) emanating from these nodes. Any node $v$ that is scanned from $B[\mathrm{cur}]$ while $\mathrm{tent}(v) > \mathrm{dist}(v)$ is eventually reinserted into $B[\mathrm{cur}]$. Non-light edges out of $v$ are relaxed only after $B[\mathrm{cur}]$ finally remains empty. By then, $v$ is surely settled; its non-light edges are relaxed using the final distance value for $v$. When $B[\mathrm{cur}]$ becomes empty after a phase, the algorithm sequentially searches for the next nonempty current bucket. Since every bucket test costs at least one parallel step and the buckets are inspected one after another, for maximum shortest-path weight $L$ at least $\Omega(L / \Delta)$ time is required. In order to obtain a reasonable parallel time bound, $\Delta$ should not be chosen too small. On the other hand, $\Delta$ must not be taken too large in order to avoid a high work bound due to numerous node rescans.
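To make the phase structure of Section 2.1 concrete, here is a small sequential Python simulation of $\Delta$-stepping. It is a sketch under our own assumptions, not the implementation of [25, 26]: the bucket array is a dictionary of sets keyed by bucket index, one parallel phase is simulated by first collecting all light-edge relaxation requests of the current bucket and then applying them, and the names `delta_stepping` and `dijkstra` are hypothetical. The label-setting reference uses a binary heap (Python's `heapq`) instead of the Fibonacci heaps mentioned in Section 1.1, so it runs in $O((n + m) \log n)$ time.

```python
import heapq
import math
from collections import defaultdict


def dijkstra(n, edges, s):
    """Label-setting reference: always scan the queued node with minimum tent(v)."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
    tent = [math.inf] * n
    tent[s] = 0.0
    heap = [(0.0, s)]
    settled = [False] * n
    while heap:
        d, u = heapq.heappop(heap)
        if settled[u]:
            continue  # outdated heap entry
        settled[u] = True  # from now on tent(u) == dist(u)
        for v, w in adj[u]:
            if d + w < tent[v]:  # relax edge (u, v)
                tent[v] = d + w
                heapq.heappush(heap, (tent[v], v))
    return tent


def delta_stepping(n, edges, s, delta):
    """Sequential simulation of Delta-stepping; returns (tent, number of node scans)."""
    light, heavy = defaultdict(list), defaultdict(list)
    for u, v, w in edges:
        (light if w <= delta else heavy)[u].append((v, w))
    tent = [math.inf] * n
    buckets = defaultdict(set)  # B[i] holds queued v with tent(v) in [i*delta, (i+1)*delta)
    scans = 0

    def relax(v, d):
        if d < tent[v]:
            if tent[v] < math.inf:
                buckets[int(tent[v] // delta)].discard(v)  # leave the old bucket
            tent[v] = d
            buckets[int(d // delta)].add(v)

    relax(s, 0.0)
    while True:
        nonempty = [i for i, b in buckets.items() if b]
        if not nonempty:
            return tent, scans
        cur = min(nonempty)  # next nonempty current bucket
        removed = set()
        while buckets[cur]:
            # One phase: empty B[cur], then apply all light-edge requests at once
            # (mimicking the parallel phase); improved nodes may re-enter B[cur].
            frontier, buckets[cur] = buckets[cur], set()
            removed |= frontier
            scans += len(frontier)
            requests = [(v, tent[u] + w) for u in frontier for v, w in light[u]]
            for v, d in requests:
                relax(v, d)
        # B[cur] stays empty: its nodes are settled, so relax their heavy edges once.
        for u in removed:
            for v, w in heavy[u]:
                relax(v, tent[u] + w)
```

When `delta` is smaller than every edge weight, all edges are heavy, every scanned node is already settled, and the scan counter equals the number of reachable nodes; a very large `delta` makes all edges light and turns the inner loop into a Bellman-Ford-like label-correcting method whose scan counter grows with the number of rescans. This mirrors the trade-off between parallel time and work described above.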
A path $\langle v_k, \ldots, v_1, v_0 \rangle$ is called degree-weight balanced (dwb) if $c(v_{j+1}, v_j) \le 2^{-\lceil \log d_j \rceil - 1}$ for all $0 \le j < k$, where $d_j$ denotes the in-degree of node $v_j$.

Lemma 2 For each node $v$, the number of rescans during the execution of PIS-SP is bounded by the number of simple dwb paths into $v$.

Proof: Let $\mathrm{tent}_j(v)$ and $Q_j$ denote the value of $\mathrm{tent}(v)$ and the set of nodes [...] at the beginning of phase $j$, respectively. [...]

[...] the concatenation of $P'$ and $(v', v)$. As required, $P$ is a simple dwb path of at most $j$ edges, where the nodes are scanned in proper order and the equations for the tentative distances hold. Furthermore, $P$ is different from any other path $\tilde{P}$ constructed for some other rescan: when constructing $\tilde{P}$, we either considered another edge $(v'', v) \neq (v', v)$, but then the subpaths of $P$ and $\tilde{P}$ end in different nodes; or we considered different rescans of the same node [...]
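The dwb definition and Lemma 2 can be explored on small graphs with the following brute-force Python counter. It is a sketch based on our reading of the definition (base-2 logarithm; $d_j$ is the in-degree of the head $v_j$ of the edge, with self-loops counted in the in-degree); these readings, and the helper names `dwb_threshold` and `count_simple_dwb_paths`, are assumptions rather than code from the paper. The enumeration is exponential in the worst case and is meant only for toy inputs.

```python
def dwb_threshold(d):
    """Largest weight allowed on an edge whose head has in-degree d >= 1:
    2^(-ceil(log2 d) - 1). For integers d >= 1, (d - 1).bit_length() == ceil(log2 d)."""
    return 2.0 ** (-(d - 1).bit_length() - 1)


def count_simple_dwb_paths(n, edges, target):
    """Count simple dwb paths into `target`, keyed by their number of edges."""
    in_edges = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v, w in edges:
        indeg[v] += 1  # assumption: self-loops count toward the in-degree
        in_edges[v].append((u, w))
    counts = {}

    def extend(head, on_path, length):
        # Prepend an edge (u, head). The dwb condition is checked edge by edge,
        # so a path that violates it at some edge is pruned together with all
        # of its extensions.
        for u, w in in_edges[head]:
            if u in on_path or w > dwb_threshold(indeg[head]):
                continue  # not simple (includes self-loops) or not dwb
            counts[length + 1] = counts.get(length + 1, 0) + 1
            on_path.add(u)
            extend(u, on_path, length + 1)
            on_path.remove(u)

    extend(target, {target}, 0)
    return counts
```

Summing the returned counts over all lengths gives, by Lemma 2, an upper bound on the number of times PIS-SP rescans `target` for that particular weight assignment.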
Lemma 3 For random edge weights uniformly drawn from $[0, 1]$, PIS-SP rescans each node at most $O(1)$ times on average.

Proof: Let $\#P_l$ denote the number of simple dwb paths of $l$ edges into an arbitrary node $v_0$ of a graph $G$. We first show $E[\#P_l] \le 2^{-l}$. The argument is by induction: let $d_0$ denote the in-degree of node $v_0$. Excluding self-loops, there is a set $S_1$ of at most $d_0$ edges into node $v_0$; hence

$$E[\#P_1] \le d_0 \cdot 2^{-\lceil \log d_0 \rceil - 1} \le d_0 \cdot 2^{-\log d_0 - 1} = 1/2.$$

Now consider the set $S_l = \{P_1, \ldots, P_{|S_l|}\}$ of all simple paths with $l$ edges into node $v_0$ in $G$. By assumption, $E[\#P_l] \le 2^{-l}$. For each $P = \langle v_l, \ldots, v_0 \rangle \in S_l$ there are at most $d_l$ edges into $v_l$ such that the concatenation with $P$ results in a simple path of $l + 1$ edges into $v_0$. In particular, for each such path $\langle v_{l+1}, v_l, \ldots, v_0 \rangle$, the weight of the newly attached edge $(v_{l+1}, v_l)$ is independent of the weights on $\langle v_l, \ldots, v_0 \rangle$. Therefore,

$$\Pr[\langle v_{l+1}, v_l, \ldots, v_0 \rangle \text{ is dwb}] \le 2^{-\lceil \log d_l \rceil - 1} \cdot \Pr[\langle v_l, \ldots, v_0 \rangle \text{ is dwb}].$$

By linearity of expectation, $E[\#P_{l+1}]$ is bounded from above by

$$\sum_{P \in S_l} d_l \cdot 2^{-\lceil \log d_l \rceil - 1} \cdot \Pr[P \text{ is dwb}] \le \frac{1}{2} \sum_{P \in S_l} \Pr[P \text{ is dwb}] = \frac{1}{2} \, E[\#P_l] \le 2^{-(l+1)},$$

where $d_l$ is the in-degree of the first node $v_l$ of $P$. By Lemma 2, $v_0$ is rescanned at most $\sum_{l \ge 1} \#P_l$ times; therefore, the average-case number of rescans of $v_0$ is at most

$$\sum_{l \ge 1} E[\#P_l] \le \sum_{l \ge 1} 2^{-l} = 1.$$

During the $j$-th phase of the $B$-chunk, PIS-SP scans all nodes from the current buckets [...]. Let $Q_j$ denote the set of these nodes. Observe that (1) holds if [...], i.e., PIS-SP has advanced at least one current bucket from [...] according to the cyclical ordering. So, let us assume that the current buckets from [...] in phase $j$ are those of phase [...]. But then there must be at least one node $v \in Q_j$ with [...]. In particular, there must be a simple path into $v$ of total weight less than [...], all of whose nodes have in-degree less than $2^B$. Into any node of $Q_j$ there are fewer than [...] such paths. As shown in [25], the sum of $k$ independent random edge weights (uniformly distributed in $[0, 1]$) is at most $x$ with probability at most $x^k / k!$. Hence, the probability that such a path exists into any node of $Q_j$ is polynomially small in $n$. That proves (1). Therefore, after [...] $B$-chunks, the current buckets have been advanced so far that, with high probability, no node remains in the bucket structure. This accounts for at most another $O(2^B \cdot L \cdot \log n)$ $B$-phases with high probability; as PIS-SP requires at most [...] phases in the worst case, it needs at most $O(1)$ additional $B$-phases on average.

Altogether, the algorithm runs in $O((2^B \cdot L \cdot \log n + |V_B|) \cdot \log n)$ phases on average. Since this analysis holds for any integer $B$, the average-case bound for all phases can be restated as $O(\log n \cdot \min_i \{2^i \cdot L \cdot \log n + |V_i|\})$. Note once more that the algorithm itself does not have to find an optimal compromise between $B$-phases and $\bar{B}$-phases; they are just a theoretical concept of the analysis.
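The phase bound involves a minimum over $i$ that the algorithm never computes; the choice of $B$ only enters the analysis. The following Python helper is a hypothetical illustration (the name `best_tradeoff` is ours) that ignores the constants hidden in the $O(\cdot)$ notation, assumes base-2 logarithms and $n \ge 1$, and evaluates $2^i \cdot L \cdot \log n + |V_i|$ for a given in-degree sequence, returning the minimizing $i$.

```python
import math


def best_tradeoff(indegrees, L):
    """Return (i, value) minimizing 2^i * L * log2(n) + |V_i| over integers i >= 0,
    where |V_i| is the number of nodes with in-degree >= 2^i (O-constants ignored)."""
    n = len(indegrees)  # assumes n >= 1
    log_n = math.log2(n)
    best = None
    i = 0
    while True:
        v_i = sum(1 for d in indegrees if d >= 2 ** i)
        value = 2 ** i * L * log_n + v_i
        if best is None or value < best[1]:
            best = (i, value)
        if v_i == 0:
            return best  # for larger i only the first term grows
        i += 1
```

For example, with six nodes of in-degree 1, two nodes of in-degree 8, and $L = 1$, `best_tradeoff([1] * 6 + [8, 8], 1)` returns `(1, 8.0)`: choosing $i = 1$ pays $|V_1| = 2$ for the two high-degree nodes while keeping the $2^i$ factor small.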