
Dijkstra's algorithm

Center for High-Performance Computing & KTH School of Computer Science and Communication

Salvator Gkalea
Department ICT/ES, KTH
salvator@kth.se

Dionysios Zelios
Department of Physics, Stockholm University
dionisis.zel@gmail.com

Date: October 2014

Contents

Abstract
Introduction
Dijkstra's Parallelization
Simulation results and analysis
References

Abstract

Dijkstra's algorithm is a graph search algorithm that solves the single-source shortest path problem for a graph with non-negative edge path costs, producing a shortest path tree. This algorithm is often used in routing and as a subroutine in other graph algorithms. In this project we investigate the parallelization of this algorithm and its speedup against the sequential one. Message Passing Interface (MPI) is used for this purpose, in order to parallelize the single-source shortest path algorithm.

Introduction

Dijkstra's algorithm determines the shortest path from a single source vertex to every other vertex. It can also be used for finding the costs of shortest paths from a single vertex to a single destination vertex, by stopping the algorithm once the shortest path to the destination vertex has been determined. For instance, if the vertices of the graph represent cities and edge path costs represent driving distances between pairs of cities connected by a direct road, Dijkstra's algorithm can be used to find the shortest route between one city and all other cities.

A pseudo-code representation of Dijkstra's sequential single-source shortest-paths algorithm is given below. It proceeds as follows:

    1.  function Dijkstra(V, E, w, s)
    2.  begin
    3.      VT := {s};
    4.      for all v in (V - VT) do
    5.          if (s, v) exists then set l[v] := w(s, v);
    6.          else set l[v] := infinity;
            end for
    7.      while VT != V do
    8.      begin
    9.          find a vertex u such that l[u] := min{ l[v] | v in (V - VT) };
    10.         VT := VT U {u};
    11.         for all v in (V - VT) do
    12.             l[v] := min{ l[v], l[u] + w(u, v) };
    13.     end while
    14. end Dijkstra

Here V denotes the vertices, E the edges, w the edge weights, and s the source vertex. The pseudo-code above represents the procedure in which the shortest path from vertex s to every other vertex v is computed. For every vertex v in (V - VT), the algorithm stores in l[v] the minimum cost to reach v from s. In lines 4-6 the procedure initialises the l[] array with the weight of the direct edge from s to each vertex (or infinity when no such edge exists). In the remaining lines, the algorithm finds the vertex u closest to s, based on the weight, among all those not yet examined, and then updates the minimum distances (the l[] array).

The main idea for parallelizing this problem is that every process discovers, among the vertices it owns, the node that is closest to the source. A reduction then determines the overall closest node, and the selection of this node is broadcast to all the other processes. Finally, each process updates its distance array (the l[] array) locally.

This parallel computation can be achieved with a few basic MPI communication functions. These functions are described below and are used to parallelize the single-source shortest-paths algorithm.

    MPI_Init        Initializes the parallel environment
    MPI_Finalize    Ends the parallel environment, releases system resources
    MPI_Comm_size   Returns the number of parallel computing units
    MPI_Comm_rank   Returns the current computing unit's identification number
    MPI_Reduce      Reduces values on all processes to a single value
    MPI_Bcast       Broadcasts a message from the process with rank "root" to all other processes of the communicator
    MPI_Gather      Gathers together values from a group of processes

Dijkstra's Parallelization

The pseudo-code above can be parallelized based on line 9, in which the minimum distance among all the nodes is computed, or based on line 11, where the algorithm computes the minimum overall distance and stores it in the local minimum distance array.

The classic approach is to break the data to be computed into segments. Each node is then responsible for processing and computing one segment of the data; at the end, the results from all the segments are gathered at one node. The C code below demonstrates this logic.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int numV,        // number of vertices
    *todo,       // vertices still to be analysed (1 = not settled yet)
    numNodes,    // number of MPI processes
    chunk,       // number of vertices handled by every process
    start, end,  // first and last vertex owned by this process
    myNode;      // rank (ID) of this process

unsigned maxInt,        // value used as "infinity"
         *graph,        // chunk x numV block of edge weights
         *minDistance;  // current minimum distances

int localMin[2],   // [0]: min distance for this process, [1]: vertex of that min
    globalMin[2];  // [0]: global min distance, [1]: vertex of that min

double T1, T2;  // start and end time

// Generation of the graph: every process creates the rows of the
// adjacency matrix for the vertices it owns.
void init(int ac, char **av) {
    int i, j;
    numV = atoi(av[1]);
    MPI_Init(&ac, &av);
    MPI_Comm_size(MPI_COMM_WORLD, &numNodes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myNode);
    chunk = numV / numNodes;  // numV is assumed divisible by numNodes
    start = myNode * chunk;
    end = start + chunk - 1;
    maxInt = ((unsigned) -1) >> 1;  // large enough, cannot overflow when a weight is added
    graph = malloc(chunk * numV * sizeof(unsigned));
    minDistance = malloc(numV * sizeof(unsigned));
    todo = malloc(numV * sizeof(int));
    srand(myNode + 1);
    for (i = 0; i < chunk; i++)       // local row i is global vertex start + i
        for (j = 0; j < numV; j++)
            graph[i * numV + j] = (start + i == j) ? 0 : rand() % 21 + 1;
    for (i = 0; i < numV; i++) {
        todo[i] = 1;
        minDistance[i] = maxInt;
    }
    minDistance[0] = 0;  // vertex 0 is the source
}

int main(int ac, char **av) {
    int i, step;

    // Initialization process
    init(ac, av);
    if (myNode == 0)
        T1 = MPI_Wtime();

    for (step = 0; step < numV; step++) {
        // Find the local minimum distance among the vertices owned by this process
        localMin[0] = maxInt;
        localMin[1] = 0;
        for (i = start; i <= end; i++)
            if (todo[i] && minDistance[i] < (unsigned) localMin[0]) {
                localMin[0] = minDistance[i];
                localMin[1] = i;
            }
        // Reduce the local minima to the global minimum on node 0 ...
        MPI_Reduce(localMin, globalMin, 1, MPI_2INT, MPI_MINLOC, 0, MPI_COMM_WORLD);
        // ... and broadcast the selected vertex to all other processes
        MPI_Bcast(globalMin, 1, MPI_2INT, 0, MPI_COMM_WORLD);
        // Mark that vertex as settled
        todo[globalMin[1]] = 0;
        // Update the local part of the minimum distance array
        for (i = 0; i < chunk; i++)
            if (globalMin[0] + graph[i * numV + globalMin[1]] < minDistance[start + i])
                minDistance[start + i] = globalMin[0] + graph[i * numV + globalMin[1]];
    }

    // Collect the distances owned by each process on node 0; the root's own
    // chunk is already in place, so it uses MPI_IN_PLACE
    if (myNode == 0)
        MPI_Gather(MPI_IN_PLACE, chunk, MPI_UNSIGNED,
                   minDistance, chunk, MPI_UNSIGNED, 0, MPI_COMM_WORLD);
    else
        MPI_Gather(minDistance + start, chunk, MPI_UNSIGNED,
                   NULL, 0, MPI_UNSIGNED, 0, MPI_COMM_WORLD);

    if (myNode == 0) {
        T2 = MPI_Wtime();
        /* Optional check of the result, disabled in timing runs:
        for (i = 1; i < numV; i++)
            printf("node[%d] = %u\n", i, minDistance[i]);
        */
        printf("elapsed: %f\n", T2 - T1);
    }
    MPI_Finalize();
    return 0;
}

In each step of the main loop, every node finds the minimum distance among its own vertices. Node 0 reduces (MPI_Reduce) these local minima to a single global minimum and then broadcasts (MPI_Bcast) the global minimum distance to all the other nodes, which act as receivers, waiting for the global minimum distance from node 0.

As mentioned before, each node is responsible for processing a segment of the whole data. At the end, the results from all the nodes are gathered (MPI_Gather) at node 0, so that the complete minimum distance array is available there.

Simulation results and analysis

The experiments have been conducted on Milner with the Intel compiler.

Time measurements

First, we run our algorithm for different mesh sizes. Each time we vary the number of cores and measure the time the algorithm needs to run. The measurements are shown in the tables below and were executed as

    aprun -n Nodes ./Dijkstra.out mesh

where Nodes = 1..32 and mesh is the size of the graph (e.g. mesh = 4*10^6 is the number of vertices).

mesh = 4*10^6

    Nodes       1          2          4          8          16         32
    Time (sec)  0.009083   0.010213   0.007118   0.008694   0.015567   0.034436

mesh = 64*10^6

    Nodes       1          2          4          8          16         32
    Time (sec)  0.14132    0.150809   0.092254   0.062077   0.105193   0.1638634

mesh = 256*10^6

    Nodes       1          2          4          8          16         32
    Time (sec)  0.564806   0.597349   0.352130   0.202681   0.189242   0.200911

mesh = 900*10^6

    Nodes       1          2          4          8          16         32
    Time (sec)  1.984156   2.077290   1.213774   0.874406   0.85756    1.056482

For a large network, the running time decreases as the number of cores increases, until it reaches its smallest value; beyond that point the running time increases again because of the communication latency. For a middle-sized network the reduction in running time is not as obvious. For a small network, the communication latency outweighs the benefit of using more cores.

Speed up measurements

In our next step, we want to investigate the speed up of our algorithm. To do that, we divided the base time on one node by the time needed when executing on multiple nodes. The measurements are given below:

mesh = 4*10^6

    Nodes    1   2           4          8          16          32
    Speedup  1   0.889356    1.276060   1.044743   0.583477    0.263764

mesh = 64*10^6

    Nodes    1   2           4          8          16          32
    Speedup  1   0.9370793   1.531857   2.27652    1.3434353   0.862425

mesh = 256*10^6

    Nodes    1   2           4          8          16          32
    Speedup  1   0.945520    1.60397    2.78667    2.98457     2.81122

mesh = 900*10^6

    Nodes    1   2           4          8          16          32
    Speedup  1   0.9551      1.63469    2.269147   2.3137      1.8780

For every network size, the speed up increases with the number of cores until it reaches its maximum value; after that, the speed up decreases because the communication latency outweighs the benefit of using more cores. As the network size increases, the number of cores needed to reach the maximum speed up also increases.

Cost measurements

Finally, we want to investigate the cost of the number of cores that we have used. To do that, we multiplied the number of cores by the corresponding execution time. The measurements are presented below:

mesh = 4*10^6

    Nodes       1         2          4          8          16         32
    Cost (sec)  1.87807   0.020426   0.028472   0.069552   0.249072   1.101952

mesh = 64*10^6

    Nodes       1          2          4          8          16         32
    Cost (sec)  0.14132    0.301618   0.369016   0.496616   1.683088   5.2436288

mesh = 256*10^6

    Nodes       1          2          4          8          16         32
    Cost (sec)  0.564806   1.194698   1.40852    1.621448   3.027872   6.429152

mesh = 900*10^6

    Nodes       1          2         4          8          16         32
    Cost (sec)  1.984156   4.15458   4.855096   6.995248   13.72096   36.97687

The cost increases because the speed up (i.e. the benefit of a reduced running time) cannot outweigh the cost of using more cores.

References

1) Wikipedia: http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
2) Lecture notes by Erwin Laure: http://agenda.albanova.se/conferenceDisplay.py?confId=4384
3) Han, Xiao Gang, Qin Lei Sun, and Jiang Wei Fan. "Parallel Dijkstra's Algorithm Based on Multi-Core and MPI." Applied Mechanics and Materials 441 (2014): 750-753.
4) http://www.mpich.org/
5) http://www.open-mpi.org/
6) Crauser, Andreas, et al. "A parallelization of Dijkstra's shortest path algorithm." Mathematical Foundations of Computer Science 1998. Springer Berlin Heidelberg, 1998. 722-731.
7) Meyer, Ulrich, and Peter Sanders. "Delta-stepping: A parallel single source shortest path algorithm." Algorithms - ESA '98. Springer Berlin Heidelberg, 1998. 393-404.
8) http://www.inf.ed.ac.uk/publications/thesis/online/IM040172.pdf
