
Parallelization of Shortest Path Finder between

Two Nodes Using Dijkstra's Algorithm

TEAM MEMBERS
Rohith Krishna - 18BCE0537
Sai Srujith D - 18BCE0560

SUBMITTED TO
Dr. PREETHA EVANGELINE D
School of Computer Science and Engineering,
VIT UNIVERSITY

MONTH OF SUBMISSION
OCTOBER-2020
S.NO TABLE OF CONTENTS

1 ABSTRACT

2 INTRODUCTION

3 LITERATURE SURVEY

4 PROPOSED WORK

5 RESULT

6 CODE AND SCREENSHOTS

7 REFERENCES
1. ABSTRACT

The project deals with the implementation of Dijkstra's algorithm, a single
source shortest path algorithm. The algorithm is implemented using parallel
programming concepts for a faster solution. For graphs with non-negative edge
weights, Dijkstra's algorithm overcomes the drawbacks of the Floyd-Warshall and
Bellman-Ford algorithms, both of which perform substantially more work on this
problem.

For parallel programming, the project is implemented using MPI with the C
programming language. The purpose of developing this project is to find the
shortest paths from a source node to all other nodes in a graph. This project
can be applied to airline systems, transportation services, courier services,
networking, etc.

From the results it is observed that the parallel algorithm is considerably
effective for large graph sizes, and that the MPI implementation outperforms
the serial implementation of Dijkstra's algorithm.

The primary goal of this project is to implement Dijkstra's algorithm in
parallel with the help of MPI programming to find the shortest path between two
nodes, and to compare the serially implemented Dijkstra algorithm with the
parallel implementation.
2. INTRODUCTION

Finding the shortest path between two objects or all objects in a graph is a
common task in solving many day-to-day and scientific problems. Shortest path
algorithms find application in many fields such as social networks,
bioinformatics, aviation, routing protocols, Google Maps, etc. Shortest path
problems can be classified into two types: single source shortest paths and
all pair shortest paths. There are many different algorithms for finding the
all pair shortest paths; two of them are the Floyd-Warshall algorithm and
Johnson's algorithm (which itself uses Dijkstra's algorithm as a subroutine).

The all pair shortest path algorithm known as the Floyd-Warshall algorithm was
developed in 1962 and follows the methodology of dynamic programming. It is
used for graph analysis and finds the shortest path lengths between all pairs
of vertices in a weighted directed graph, whose edges may have negative or
positive weights (provided there are no negative cycles).

Dijkstra's algorithm, used in this project, is instead a greedy single source
algorithm: it computes the shortest paths from one source vertex to every other
vertex and requires all edge weights to be non-negative. In its basic form the
algorithm returns only the shortest path lengths; to recover the actual paths
with the names of the nodes, a predecessor array must also be maintained, as is
done in the code of Section 6.
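
To make the representation concrete, a weighted directed graph can be stored as
an adjacency matrix with a large sentinel value for missing edges, as in the
short sketch below. The 4-vertex graph and the INF value here are illustrative,
not taken from the project.

#include <stdio.h>

#define INF 1000000   /* sentinel weight meaning "no edge" */
#define N 4

/* adj[i][j] holds the weight of the directed edge i -> j,
 * INF if the edge is absent, and 0 on the diagonal. */
int adj[N][N] = {
    {0,   5,   INF, 10 },
    {INF, 0,   3,   INF},
    {INF, INF, 0,   1  },
    {INF, INF, INF, 0  }
};

int main(void) {
    printf("weight of edge 0->1 is %d\n", adj[0][1]);
    return 0;
}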
3. LITERATURE SURVEY

The computational complexity of graph algorithms is a function of the number of
vertices, the number of edges, or both (West 2000). As the number of vertices
or edges increases, the computational complexity increases.

The research community has attempted to devise several efficient algorithms to
reduce such computational complexity. In general, two approaches are dominant:
(1) improving the serial implementations of the algorithms; and (2) formulating
parallel implementations of the algorithms.

In the first approach, researchers have adopted techniques such as graph
partitioning, hierarchical network decomposition, and different data structures
to improve the speed of the Dijkstra algorithm (Gilbert and Zmijewski 1987;
Hendrickson and Leland 1995; Romeijn and Smith 1999; Möhring et al. 2005).

In the graph partitioning approach, a given graph is divided into multiple pieces
of the same size in such a way that there are few connections between the pieces
(Hendrickson and Leland 1995; Möhring et al. 2005).

For example, Möhring et al. (2005) used different data structures, such as
grids, k-dimensional (k-d) trees, and quadtrees, to perform graph partitioning.

A speedup of up to 500 was achieved by using a suitable data structure on
certain graphs. Unfortunately, graph partitioning is itself an NP-complete
problem (West 2000).

Thus, one has to rely on certain heuristics that cannot be guaranteed to work in
every situation. Romeijn and Smith (1999) developed a hierarchical network
decomposition algorithm to partition a network into multiple pieces, which could
be independently solved to yield approximate shortest paths for a large-scale
network.

Their approach achieved a log(n) savings in computation time compared to a
traditional implementation of the Dijkstra algorithm, where n represents the
number of vertices in a given network.

Awari studied two sequential algorithms, Dijkstra's and Floyd's, which compute
the single source shortest paths and the all pair shortest paths of a graph,
and converted the serial implementations into parallel ones using OpenMP and
MPI. The parallel versions reduce the number of iterations executed per
processor and hence take less time, evaluated in terms of speedup factor,
efficiency, and cost [Parallelization of Shortest Path Algorithm Using OpenMP
and MPI, Rajashri Awari, Dept. of Information Technology, Yeshwantrao Chavan
College of Engineering, Nagpur, India].

Based on hierarchical networks, Mensah et al. [8] propose a parallel
approximate shortest path algorithm which is efficient and maintains high
approximation accuracy on large-scale networks. The algorithm condenses central
nodes and their neighbours into super nodes to iteratively construct
higher-level networks until the scale of the top level meets a set threshold
value. The algorithm approximates the distances of the shortest paths in the
original network by means of the super nodes in the higher-level network.
4. PROPOSED WORK

Dijkstra's algorithm:

Dijkstra's algorithm is a single source shortest path finder. It is used here
mainly to overcome the drawbacks of Floyd's and the Bellman-Ford algorithm: it
requires that no negative weights be present in the graph (graphs with negative
weights call for Bellman-Ford instead), and whereas in the Floyd-Warshall
algorithm every node of the graph is visited to compute the shortest paths
between all pairs, Dijkstra's algorithm performs only the work needed for a
single source. Dijkstra's algorithm is an example of a greedy algorithm, while
Floyd-Warshall is an example of dynamic programming.

Existing Model:

Sequential algorithm

for k = 0 to N-1
    for i = 0 to N-1
        for j = 0 to N-1
            Iij(k+1) = min(Iij(k), Iik(k) + Ikj(k))
        Endfor
    Endfor
Endfor

The sequential pseudo-code given above requires N³ comparisons. For each value
of k, the count of intermediary nodes allowed between nodes i and j, the
algorithm computes the distance between node i and j passing through those
intermediary nodes and compares it with the best distance found so far between
i and j.

It then keeps the minimum of the two distances calculated above; once all
values of k have been considered, this is the shortest distance between node i
and j. The time complexity of the above algorithm is Θ(N³).

The space complexity of the algorithm is Θ(N²). The algorithm requires the
adjacency matrix as its input.
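
For concreteness, the sequential triple loop can be rendered in C as the short
sketch below. This is an illustrative sketch, not the project's code; the
function name and the flattened row-major matrix dist are assumptions.

/* Sequential all pair shortest paths: for every intermediate vertex k,
 * try to improve the i -> j distance by routing through k.
 * dist is an n x n adjacency matrix stored row-major and updated in
 * place: Theta(N^3) comparisons, Theta(N^2) space. */
void all_pairs_serial(int dist[], int n) {
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (dist[i*n + k] + dist[k*n + j] < dist[i*n + j])
                    dist[i*n + j] = dist[i*n + k] + dist[k*n + j];
}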

Dijkstra's algorithm, by contrast, incrementally improves an estimate of the
shortest path between two nodes until the path length is minimal.

In this project we have implemented the parallel version of the shortest path
algorithm in MPI.

From the results we found that the parallel version gave speedup benefits over
the sequential one, but these benefits are more observable for large datasets.

Proposed system: Parallel algorithm

for k = 0 to N-1
    for i = local_i_start to local_i_end
        for j = 0 to N-1
            Iij(k+1) = min(Iij(k), Iik(k) + Ikj(k))
        Endfor
    Endfor
Endfor
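
A sketch of how this pseudo-code maps onto MPI follows, assuming each of the p
processes owns loc_n = n/p consecutive rows of the distance matrix (matching
the local_i range above) and the same headers as the listing in Section 6. The
names all_pairs_mpi, row_k, and owner are assumptions of this sketch, not part
of the project code.

/* Parallel all pair shortest path sketch: each process holds its
 * loc_n = n/p rows of the distance matrix in loc_dist (row-major).
 * At step k the owner of row k broadcasts that row; every process
 * then relaxes its own rows through intermediate vertex k. */
void all_pairs_mpi(int loc_dist[], int n, int loc_n,
                   int my_rank, MPI_Comm comm) {
    int *row_k = malloc(n * sizeof(int));
    for (int k = 0; k < n; k++) {
        int owner = k / loc_n;                 /* rank holding row k */
        if (my_rank == owner)
            memcpy(row_k, &loc_dist[(k % loc_n) * n], n * sizeof(int));
        MPI_Bcast(row_k, n, MPI_INT, owner, comm);
        for (int i = 0; i < loc_n; i++)        /* local_i range */
            for (int j = 0; j < n; j++)
                if (loc_dist[i*n + k] + row_k[j] < loc_dist[i*n + j])
                    loc_dist[i*n + j] = loc_dist[i*n + k] + row_k[j];
    }
    free(row_k);
}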

The all pair formulation compares all possible paths through the graph between
each pair of vertices. Consider a graph G(V, E), where V is the set of vertices
and E is the set of edges. To compute the minimum path between each pair of
nodes, Wk is computed for k ranging from 1 to |V|; the shortest distance from i
to j through k is obtained as W[i, j] = min(W[i, j], W[i, k] + W[k, j]).

The Dijkstra implementation of this project, in contrast, works from a single
source:

INPUT: A graph G(V, E), where V is a set of vertices and E is a set of weighted
edges between these vertices, together with a source vertex from V.

OUTPUT: The distances of the shortest paths between the source vertex and every
vertex in G.
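
For reference, the serial baseline compared against in Section 5 can be
sketched as a minimal single source Dijkstra in C. This is an illustrative
sketch, not the project's exact serial code; it reuses the INFINITY sentinel of
the Section 6 listing, and the names dijkstra_serial, known, and min_dist are
assumptions.

#include <stdlib.h>

#define INFINITY 1000000

/* Serial Dijkstra from source vertex 0. mat is the n x n adjacency
 * matrix (row-major); dist[v] receives the shortest distance 0 -> v
 * and pred[v] the predecessor of v on that path. O(n^2) time. */
void dijkstra_serial(const int mat[], int dist[], int pred[], int n) {
    int *known = calloc(n, sizeof(int));
    for (int v = 0; v < n; v++) { dist[v] = mat[v]; pred[v] = 0; }
    known[0] = 1;
    for (int i = 1; i < n; i++) {
        int u = -1, min_dist = INFINITY;
        for (int v = 0; v < n; v++)        /* closest unknown vertex */
            if (!known[v] && dist[v] < min_dist) {
                u = v;
                min_dist = dist[v];
            }
        if (u < 0) break;                  /* remaining vertices unreachable */
        known[u] = 1;
        for (int v = 0; v < n; v++)        /* relax edges leaving u */
            if (!known[v] && dist[u] + mat[u*n + v] < dist[v]) {
                dist[v] = dist[u] + mat[u*n + v];
                pred[v] = u;
            }
    }
    free(known);
}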

MODULES AND DESCRIPTION

Function: Read_n

Purpose: Read in the number of rows in the matrix on process 0 and broadcast

this value to the other processes

Input arguments: my_rank: the calling process' rank

comm: Communicator containing all calling processes

Return value: n: the number of rows in the matrix

Function: Build_blk_col_type

Purpose: Build an MPI_Datatype that represents a block column of a matrix

Input arguments: n: number of rows in the matrix and the block column

loc_n = n/p: the number of columns in the block column

Return value: blk_col_mpi_t: MPI_Datatype that represents a block column
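
As a worked illustration of this datatype (the 4 x 4 layout below is
hypothetical, with n = 4, p = 2 and hence loc_n = 2):

/* Process 0 owns columns 0-1, process 1 owns columns 2-3:
 *
 *   a00 a01 | a02 a03
 *   a10 a11 | a12 a13
 *   a20 a21 | a22 a23
 *   a30 a31 | a32 a33
 *
 * MPI_Type_vector(n, loc_n, n, MPI_INT, ...) selects loc_n ints out of
 * every row of n ints (one block column). Resizing the extent of that
 * type to loc_n ints lets MPI_Scatter step from one block column to
 * the next when sending from the full matrix on process 0. */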

Function: Read_matrix
Purpose: Read in an nxn matrix of ints on process 0, and distribute it among
the processes so that each process gets a block column with n rows and n/p
columns

Input arguments: n: the number of rows in the matrix and the submatrices
loc_n = n/p: the number of columns in the submatrices

blk_col_mpi_t: the MPI_Datatype used on process 0

my_rank: the caller's rank in comm

comm: Communicator consisting of all the processes

Out arg: loc_mat: the calling process' submatrix (needs to be allocated by the
caller)

Function: Print_local_matrix

Purpose: Store a process' submatrix as a string and print the string. Printing
as a string reduces the chance that another process' output will interrupt the
output from the calling process.

Input arguments: loc_mat: the calling process' submatrix

n: the number of rows in the submatrix

loc_n: the number of cols in the submatrix

my_rank: the calling process' rank

Function: Print_matrix
Purpose: Print the matrix that's been distributed among the processes.

Input arguments: loc_mat: the calling process' submatrix

n: number of rows in the matrix and the submatrices

loc_n: the number of cols in the submatrix

blk_col_mpi_t: MPI_Datatype used on process 0 to receive a process' submatrix

my_rank: the calling process' rank

comm: Communicator consisting of all the processes

Function: Dijkstra

Purpose: Apply Dijkstra's algorithm to the matrix mat

Input arguments: mat: sub matrix

n: the number of vertices

loc_n: size of loc arrays

my_rank: rank of process

comm: MPI communicator

In/Output arguments: loc_dist: loc dist array

loc_pred: loc pred array


Function: Initialize_matrix

Purpose: Initialize loc_dist, known, and loc_pred arrays

Input arguments: mat: matrix

loc_dist: loc distance array

loc_pred: loc pred array

known: known array

Function: Find_min_dist

Purpose: Find the vertex u with minimum distance to 0 (dist[u]) among the
vertices whose distance to 0 is not known.

Input arguments: loc_dist: loc distance array

loc_known: loc known array

loc_n: size of loc arrays

Return value: loc_u: the local vertex with minimum distance

Function: Global_vertex

Purpose: Convert local vertex to global vertex

Input arguments: loc_u: local vertex

loc_n: local number of vertices

my_rank: rank of process

Output arguments: global_u: global vertex
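
As a worked example of this mapping (the values are hypothetical): with
loc_n = 2 local vertices per process, local vertex 1 on process 3 corresponds
to global vertex global_u = 1 + 3 * 2 = 7.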


Function: Print_dists

Purpose: Print the length of the shortest path from 0 to each vertex

Input arguments: n: the number of vertices

dist: distances from 0 to each vertex; dist[v] is the length of the shortest
path 0->v

Function: Print_paths

Purpose: Print the shortest path from 0 to each vertex

Input arguments: n: the number of vertices

pred: list of predecessors; pred[v] = u if u precedes v on the shortest path
0->v
5. RESULT

Matrix Chosen for Comparison

[ [0 1 3 2 3 2 4 3],
  [1 0 3 3 4 2 1 3],
  [2 3 0 6 5 7 2 1],
  [2 1 1 0 4 3 2 7],
  [2 2 3 4 0 6 1 2],
  [1 1 3 2 5 0 2 3],
  [3 4 5 2 1 2 0 2],
  [2 3 1 6 3 2 1 0] ]

Serial execution of Dijkstra's algorithm for the given matrix

Execution time: 8.237 seconds

Parallel execution of Dijkstra's algorithm

Execution time: 0.00019 seconds

From the above execution times, and since performance is inversely proportional
to time of execution, the parallel execution of Dijkstra's algorithm is roughly
8.237 / 0.00019 ≈ 4 × 10⁴ times faster than the serial execution for this
input.
6. CODE AND OUTPUT SCREENSHOTS

CODE:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

#define MAX_STRING 10000
#define INFINITY 1000000

int Read_n(int my_rank, MPI_Comm comm);
MPI_Datatype Build_blk_col_type(int n, int loc_n);
void Read_matrix(int loc_mat[], int n, int loc_n,
                 MPI_Datatype blk_col_mpi_t, int my_rank, MPI_Comm comm);
void Print_local_matrix(int loc_mat[], int n, int loc_n, int my_rank);
void Print_matrix(int loc_mat[], int n, int loc_n,
                  MPI_Datatype blk_col_mpi_t, int my_rank, MPI_Comm comm);
void Dijkstra(int mat[], int dist[], int pred[], int n, int loc_n, int my_rank,
              MPI_Comm comm);
void Initialize_matrix(int mat[], int loc_dist[], int loc_pred[], int known[],
                       int loc_n, int my_rank);
int Find_min_dist(int loc_dist[], int known[], int loc_n, int my_rank,
                  MPI_Comm comm);
int Global_vertex(int loc_u, int loc_n, int my_rank);
void Print_dists(int loc_dist[], int n, int loc_n, int my_rank, MPI_Comm comm);
void Print_paths(int loc_pred[], int n, int loc_n, int my_rank, MPI_Comm comm);
int main(int argc, char* argv[]) {
    int *loc_mat, *loc_dist, *loc_pred;
    int n, loc_n, p, my_rank;
    MPI_Comm comm;
    MPI_Datatype blk_col_mpi_t;

    MPI_Init(&argc, &argv);
    comm = MPI_COMM_WORLD;
    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);

    n = Read_n(my_rank, comm);
    loc_n = n/p;    /* columns per process; n must be divisible by p */
    loc_mat  = malloc(n*loc_n*sizeof(int));
    loc_dist = malloc(n*loc_n*sizeof(int));
    loc_pred = malloc(n*loc_n*sizeof(int));
    blk_col_mpi_t = Build_blk_col_type(n, loc_n);

    Read_matrix(loc_mat, n, loc_n, blk_col_mpi_t, my_rank, comm);

    double start = MPI_Wtime();
    Dijkstra(loc_mat, loc_dist, loc_pred, n, loc_n, my_rank, comm);
    double end = MPI_Wtime();

    Print_dists(loc_dist, n, loc_n, my_rank, comm);
    Print_paths(loc_pred, n, loc_n, my_rank, comm);

    free(loc_mat);
    free(loc_dist);
    free(loc_pred);
    MPI_Type_free(&blk_col_mpi_t);
    printf("time of execution=%f\n", end-start);
    MPI_Finalize();
    return 0;
}

int Read_n(int my_rank, MPI_Comm comm) {
    int n;

    if (my_rank == 0) {
        printf("Enter number of vertices in the matrix: \n");
        scanf("%d", &n);    /* only process 0 reads from stdin */
    }
    MPI_Bcast(&n, 1, MPI_INT, 0, comm);
    return n;
}
MPI_Datatype Build_blk_col_type(int n, int loc_n) {
    MPI_Aint lb, extent;
    MPI_Datatype block_mpi_t;
    MPI_Datatype first_bc_mpi_t;
    MPI_Datatype blk_col_mpi_t;

    /* A block of loc_n consecutive ints, used only to obtain its extent. */
    MPI_Type_contiguous(loc_n, MPI_INT, &block_mpi_t);
    MPI_Type_get_extent(block_mpi_t, &lb, &extent);

    /* n blocks of loc_n ints, one per row, separated by a stride of n ints. */
    MPI_Type_vector(n, loc_n, n, MPI_INT, &first_bc_mpi_t);
    /* Shrink the extent to loc_n ints so consecutive block columns abut. */
    MPI_Type_create_resized(first_bc_mpi_t, lb, extent, &blk_col_mpi_t);
    MPI_Type_commit(&blk_col_mpi_t);

    MPI_Type_free(&block_mpi_t);
    MPI_Type_free(&first_bc_mpi_t);
    return blk_col_mpi_t;
}

void Read_matrix(int loc_mat[], int n, int loc_n,
                 MPI_Datatype blk_col_mpi_t, int my_rank, MPI_Comm comm) {
    int* mat = NULL;
    int i, j;

    if (my_rank == 0) {
        mat = malloc(n*n*sizeof(int));
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                scanf("%d", &mat[i*n + j]);
    }
    /* Every process receives one block column of n x loc_n ints. */
    MPI_Scatter(mat, 1, blk_col_mpi_t,
                loc_mat, n*loc_n, MPI_INT, 0, comm);
    if (my_rank == 0) free(mat);
}

void Print_local_matrix(int loc_mat[], int n, int loc_n, int my_rank) {
    char temp[MAX_STRING];
    char *cp = temp;
    int i, j;

    sprintf(cp, "Proc %d >\n", my_rank);
    cp = temp + strlen(temp);
    for (i = 0; i < n; i++) {
        for (j = 0; j < loc_n; j++) {
            if (loc_mat[i*loc_n + j] == INFINITY)
                sprintf(cp, " i ");
            else
                sprintf(cp, "%2d ", loc_mat[i*loc_n + j]);
            cp = temp + strlen(temp);
        }
        sprintf(cp, "\n");
        cp = temp + strlen(temp);
    }
    printf("%s\n", temp);
}

void Print_matrix(int loc_mat[], int n, int loc_n,
                  MPI_Datatype blk_col_mpi_t, int my_rank, MPI_Comm comm) {
    int* mat = NULL;
    int i, j;

    if (my_rank == 0) mat = malloc(n*n*sizeof(int));
    /* Gather the block columns back into the full matrix on process 0. */
    MPI_Gather(loc_mat, n*loc_n, MPI_INT,
               mat, 1, blk_col_mpi_t, 0, comm);

    if (my_rank == 0) {
        for (i = 0; i < n; i++) {
            for (j = 0; j < n; j++)
                if (mat[i*n + j] == INFINITY)
                    printf(" i ");
                else
                    printf("%2d ", mat[i*n + j]);
            printf("\n");
        }
        free(mat);
    }
}
void Initialize_matrix(int mat[], int loc_dist[], int loc_pred[], int known[],
                       int loc_n, int my_rank) {
    /* Row 0 of the local block column holds the edge weights from the
     * source vertex 0 to each locally owned vertex. */
    for (int v = 0; v < loc_n; v++) {
        loc_dist[v] = mat[0*loc_n + v];
        loc_pred[v] = 0;
        known[v] = 0;
    }
    /* The source (global vertex 0, owned by process 0) starts out known. */
    if (my_rank == 0)
        known[0] = 1;
}

void Dijkstra(int mat[], int loc_dist[], int loc_pred[], int n, int loc_n,
              int my_rank, MPI_Comm comm) {
    int i, u, *known, new_dist;
    int loc_u, loc_v;

    known = malloc(loc_n*sizeof(int));
    Initialize_matrix(mat, loc_dist, loc_pred, known, loc_n, my_rank);

    for (i = 1; i < n; i++) {
        loc_u = Find_min_dist(loc_dist, known, loc_n, my_rank, comm);

        int my_min[2], glbl_min[2];
        int g_min_dist;
        if (loc_u < INFINITY) {
            my_min[0] = loc_dist[loc_u];
            my_min[1] = Global_vertex(loc_u, loc_n, my_rank);
        } else {
            my_min[0] = INFINITY;
            my_min[1] = INFINITY;
        }

        /* Find the globally closest unknown vertex u and its distance. */
        MPI_Allreduce(my_min, glbl_min, 1, MPI_2INT, MPI_MINLOC, comm);
        u = glbl_min[1];
        g_min_dist = glbl_min[0];

        /* The owner of u marks it as known. */
        if (u/loc_n == my_rank) {
            loc_u = u % loc_n;
            known[loc_u] = 1;
        }

        /* Every process relaxes the edges from u into its own columns. */
        for (loc_v = 0; loc_v < loc_n; loc_v++)
            if (!known[loc_v]) {
                new_dist = g_min_dist + mat[u*loc_n + loc_v];
                if (new_dist < loc_dist[loc_v]) {
                    loc_dist[loc_v] = new_dist;
                    loc_pred[loc_v] = u;
                }
            }
    }
    free(known);
}

int Find_min_dist(int loc_dist[], int loc_known[], int loc_n, int my_rank,
                  MPI_Comm comm) {
    int loc_v, loc_u;
    int loc_min_dist = INFINITY;

    loc_u = INFINITY;   /* sentinel: no unknown local vertex found */
    for (loc_v = 0; loc_v < loc_n; loc_v++)
        if (!loc_known[loc_v])
            if (loc_dist[loc_v] < loc_min_dist) {
                loc_u = loc_v;
                loc_min_dist = loc_dist[loc_v];
            }
    return loc_u;
}

int Global_vertex(int loc_u, int loc_n, int my_rank) {
    /* Process my_rank owns global vertices
     * my_rank*loc_n .. my_rank*loc_n + loc_n - 1. */
    int global_u = loc_u + my_rank*loc_n;
    return global_u;
}

void Print_dists(int loc_dist[], int n, int loc_n, int my_rank, MPI_Comm comm) {
    int v;
    int* dist = NULL;

    if (my_rank == 0)
        dist = malloc(n*sizeof(int));
    /* The gather must be called by every process, not just process 0. */
    MPI_Gather(loc_dist, loc_n, MPI_INT, dist, loc_n, MPI_INT, 0, comm);

    if (my_rank == 0) {
        printf("The distance from 0 to each vertex is:\n");
        printf("  v    dist 0->v\n");
        printf("----   ---------\n");
        for (v = 1; v < n; v++)
            printf("%3d    %4d\n", v, dist[v]);
        printf("\n");
        free(dist);
    }
}
void Print_paths(int loc_pred[], int n, int loc_n, int my_rank, MPI_Comm comm) {
    int v, w, *path, count, i;
    int* pred = NULL;

    if (my_rank == 0)
        pred = malloc(n*sizeof(int));
    /* The gather must be called by every process, not just process 0. */
    MPI_Gather(loc_pred, loc_n, MPI_INT, pred, loc_n, MPI_INT, 0, comm);

    if (my_rank == 0) {
        path = malloc(n*sizeof(int));
        printf("The shortest path from 0 to each vertex is:\n");
        printf("  v    Path 0->v\n");
        printf("----   ---------\n");
        for (v = 1; v < n; v++) {
            printf("%3d:   ", v);
            count = 0;
            w = v;
            while (w != 0) {    /* walk predecessors back to the source */
                path[count] = w;
                count++;
                w = pred[w];
            }
            printf("0 ");
            for (i = count-1; i >= 0; i--)
                printf("%d ", path[i]);
            printf("\n");
        }
        free(path);
        free(pred);
    }
}
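
Assuming a standard MPI installation with the usual mpicc and mpiexec wrapper
commands, the program might be compiled and run along these lines (the file
name dijkstra.c and the process count 4 are illustrative; the number of
vertices entered must be divisible by the number of processes, since
loc_n = n/p):

mpicc -o dijkstra dijkstra.c
mpiexec -n 4 ./dijkstra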
OUTPUT SCREENSHOT

Parallel algorithm executed for a graph with 8 vertices and their respective
unit weights; the time of execution is shown at the end of the output.
7. REFERENCES

[1] Kairanbay Magzhan and Hajar Mat Jani, "A review and evaluation of shortest
path algorithms."

[2] Olaf Schenk, Matthias Christen, and Helmar Burkhart, J. Parallel Distrib.
Comput.

[3] http://www.nvidia.com/object/what-is-gpu-computing.html

[4] Efficient multi-GPU algorithm for all pair shortest path; G. Venkataraman,
S. Sahni, and S. Mukhopadhyaya, "A blocked all-pairs shortest-paths algorithm,"
J. Exp. Algorithmics.

[5] R. Seidel, "On the all-pairs-shortest-path problem in unweighted undirected
graphs," Journal of Computer and System Sciences.

[6] U. Bondhugula, A. Devulapalli, J. Fernando, P. Wyckoff, and P. Sadayappan,
"Parallel FPGA-based all-pairs shortest-paths in a directed graph," Parallel
and Distributed Processing Symposium, International.

[7] Rajashri Awari, "Parallelization of Shortest Path Algorithm Using OpenMP
and MPI," Dept. of Information Technology, Yeshwantrao Chavan College of
Engineering, Nagpur, India.

[8] Dennis Nii Ayeh Mensah, Hui Gao, and Liang Wei Yang, "Approximation
Algorithm for Shortest Path in Large Social Networks."
