University of Dar es Salaam
What makes a problem graph-like?
• There are two components to a graph
Nodes and edges
• In graph-like problems, these components have natural correspondences to
problem elements
Entities are nodes and interactions between entities are edges
• Most complex systems are graph-like
Friendship Network - FaceBook
Transportation Networks – Road/Flight/Railway
Internet
Definition: Graph
• G is an ordered triple G:=(V, E, f)
• V is a set of nodes, points, or vertices
• E is a set, whose elements are known as edges or lines
• f is a function that maps each element of E to an unordered pair of vertices in V
• Vertex
• Basic Element, drawn as a node or a dot
• Vertex set of G is usually denoted by V(G), or V
• Edge
• A set of two elements, drawn as a line connecting two vertices, called end vertices, or endpoints
• An edge of a graph that joins a node to itself is called a loop or self-loop
• Some graphs may have a pair of nodes joined by more than one edge; such edges are called multiple or parallel edges
• The edge set of G is usually denoted by E(G), or E
• Simple graph - a graph without multiple edges or self-loops
• Multigraph - a graph with some multiple edges but no self-loops
• Pseudograph - a graph with some loops and multiple edges
Example: A Simple Graph with Six Nodes & Seven Edges
• V:={1,2,3,4,5,6}
• E:={{1,2},{1,5},{2,3},{2,5},{3,4},{4,5},{4,6}}
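The example graph can be held in memory as a map from each node to the set of its neighbours. The following is a minimal sketch; the helper name build_adjacency is an assumption made for illustration and is not part of the slides.

```python
# Example graph from the slide: six nodes, seven undirected edges.
V = {1, 2, 3, 4, 5, 6}
E = [{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}]


def build_adjacency(vertices, edges):
    """Map each vertex to the set of its neighbours (hypothetical helper)."""
    adj = {v: set() for v in vertices}
    for edge in edges:
        u, w = tuple(edge)      # an undirected edge is an unordered pair
        adj[u].add(w)
        adj[w].add(u)
    return adj


adj = build_adjacency(V, E)
degree = {v: len(adj[v]) for v in V}   # number of edges incident on each node
```

Because the graph is simple, the degree of a node is just the size of its neighbour set.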
Directed Graph (Digraph)
• Edges have directions
• An edge is an ordered pair of nodes
Weighted Graphs
• A graph in which each edge has an associated weight, usually given by
a weight function w: E → R
Some Graph Structures and Structural Metrics
• Connectivity
A graph is connected if you can get from any node to any other by following a sequence
of edges, i.e., any two nodes are connected by a path
• Strong Connectedness
A directed graph is strongly connected if there is a directed path from any node to any
other node
• Components
Every disconnected graph can be split up into a number of connected components
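Splitting an undirected graph into its connected components can be sketched as follows. The function name and the small test graph are assumptions chosen for illustration.

```python
def connected_components(adj):
    """adj: dict mapping each node to a list of neighbours (undirected graph)."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue                    # already placed in an earlier component
        seen.add(start)
        stack, comp = [start], []
        while stack:                    # explore everything reachable from start
            u = stack.pop()
            comp.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(sorted(comp))
    return components


# Hypothetical disconnected graph: {1, 2}, {3} and {4, 5} are separate pieces.
example = {1: [2], 2: [1], 3: [], 4: [5], 5: [4]}
parts = connected_components(example)
```

A graph is connected exactly when this procedure returns a single component.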
Degree of a Graph Node
• Number of edges incident on a node
• For directed Graphs (digraph) : degree = indeg + outdeg
In-degree: Number of edges entering
Out-degree: Number of edges leaving
outdeg(1)=2
indeg(1)=0
outdeg(2)=2
indeg(2)=2
outdeg(3)=1
indeg(3)=4
The degree of 5 is 3
Degree: Simple Facts
• If G is a graph with m edges, then
∑_{v ∈ V} deg(v) = 2m = 2|E|
• If G is a digraph then
∑_{v ∈ V} indeg(v) = ∑_{v ∈ V} outdeg(v) = |E|
Edge List (source, target, weight)
1 2 1.2
2 4 0.2
4 5 0.3
4 1 0.5
5 4 0.5
6 3 1.5
Note: Dense graphs are often represented by an adjacency matrix and sparse graphs
by an adjacency list
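The edge list can be turned into either representation with a few lines of code. This sketch assumes each row reads (source, target, weight) and that the edges are directed, since both 4 → 5 and 5 → 4 appear with different weights; the variable names are illustrative.

```python
# Directed, weighted edge list from the slide: (source, target, weight).
edges = [(1, 2, 1.2), (2, 4, 0.2), (4, 5, 0.3),
         (4, 1, 0.5), (5, 4, 0.5), (6, 3, 1.5)]
n = 6

# Adjacency list: suits sparse graphs, space O(|V| + |E|).
adj_list = {v: [] for v in range(1, n + 1)}
for u, v, w in edges:
    adj_list[u].append((v, w))

# Adjacency matrix: suits dense graphs, space O(|V|^2).
# Assumption: 0.0 stands for "no edge" in this sketch.
adj_matrix = [[0.0] * n for _ in range(n)]
for u, v, w in edges:
    adj_matrix[u - 1][v - 1] = w

# In-degree and out-degree of every node.
outdeg = {v: len(adj_list[v]) for v in adj_list}
indeg = {v: 0 for v in adj_list}
for _, v, _ in edges:
    indeg[v] += 1
```

Summing either degree table gives |E|, which matches the digraph identity on the "Degree: Simple Facts" slide.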
Topological Distance
• A shortest path is a path with the fewest edges connecting two nodes
• The number of edges in the shortest path connecting p and q is the
topological distance between the two nodes, d(p,q)
Distance Matrix: a |V| x |V| matrix D = (d_ij)
such that d_ij is the topological distance between i and j
    1 2 3 4 5 6
1   0 1 2 2 1 3
2   1 0 1 2 1 3
3   2 1 0 1 2 2
4   2 2 1 0 1 1
5   1 1 2 1 0 2
6   3 3 2 1 2 0
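The distance matrix above can be reproduced by running a breadth-first search from every node of the six-node example graph. This is a minimal sketch; the function name distance_matrix is an assumption.

```python
from collections import deque

NODES = [1, 2, 3, 4, 5, 6]
EDGES = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]


def distance_matrix(nodes, edges):
    """Return D with D[i][j] = topological distance between nodes[i] and nodes[j]."""
    adj = {v: [] for v in nodes}
    for u, w in edges:
        adj[u].append(w)
        adj[w].append(u)
    D = []
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # BFS reaches nodes in order of distance
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        # Unreachable nodes (none in this graph) would get an infinite distance.
        D.append([dist.get(t, float("inf")) for t in nodes])
    return D


D = distance_matrix(NODES, EDGES)
```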
Graph Traversal (Search)
• Breadth First Search (BFS)
Visits all neighbors of a vertex before exploring the outgoing edges of those
neighbors
The BFS algorithm traverses a graph level by level and uses a queue to
remember the next vertex from which to continue the search when a dead end occurs in an
iteration
• Depth First Search (DFS)
Follows the outgoing edges of the most recently visited vertex before any remaining outgoing edges
of its predecessor, i.e., the deepest vertices are searched first.
The DFS algorithm starts from the root or an arbitrary node, marks the node,
moves to an adjacent unmarked node, and continues this loop until there is no unmarked
adjacent node. It then backtracks, checks for other unmarked nodes, and traverses them.
Finally, it prints the nodes on the path
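The DFS description can be sketched recursively. The graph used here is the six-node example from earlier in the deck, with neighbours listed in increasing order; the function name dfs_order is an assumption.

```python
def dfs_order(adj, start):
    """Return the nodes in the order in which DFS marks them."""
    marked = []
    seen = set()

    def visit(u):
        seen.add(u)            # mark the node
        marked.append(u)
        for w in adj[u]:       # move to an adjacent unmarked node
            if w not in seen:
                visit(w)
        # Returning from visit(u) is the backtracking step.

    visit(start)
    return marked


# Six-node example graph as adjacency lists (neighbours in increasing order).
adj = {1: [2, 5], 2: [1, 3, 5], 3: [2, 4], 4: [3, 5, 6], 5: [1, 2, 4], 6: [4]}
order = dfs_order(adj, 1)
```

Starting from node 1, the search goes as deep as possible (1, 2, 3, 4, 5) before backtracking to node 4 to reach node 6.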
Breadth First Search (BFS)
• Example
BFS algorithm traverses from A to B to C first then to D to E
and F, lastly to G. It employs the following steps:
• Step 1
Visit the adjacent unvisited vertex
Mark it as visited
Process it
Insert it in a queue
• Step 2
If no adjacent vertex is found, remove the first vertex from
the queue
• Step 3
Repeat Step 1 and Step 2 until the queue is empty
The BFS Algorithm
Algorithm BFS
Input: A directed or undirected graph G = (V, E)
Output: Numbering of the vertices in BFS order
bfn ← 1 // initialize breadth-first number
for each vertex v ∈ V
    mark v unvisited
end for
for each vertex v ∈ V
    if v is marked unvisited then bfs(v) // v is a starting vertex
end for
The BFS Algorithm
Procedure bfs(v) // v is the starting vertex; Q is a queue
Q ← {v} // insert v into the queue
mark v visited
while Q ≠ {}
    v ← dequeue(Q) // v is the current vertex
    for each edge (v, w) ∈ E
        if w is marked unvisited then
            enqueue(w, Q)
            mark w visited
            bfn ← bfn + 1
        end if
    end for
end while
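A runnable sketch of the BFS algorithm and its bfs(v) procedure follows. It is an interpretation rather than a literal transcription: the pseudocode never stores bfn, so here each vertex is given its breadth-first number at the moment it is marked visited. The test graph is the six-node example plus an isolated vertex 7, which is an assumption added to exercise the outer loop.

```python
from collections import deque


def bfs_numbering(adj):
    """Return {vertex: breadth-first number} for every vertex of the graph."""
    bfn = {}                       # doubles as the "visited" marking
    counter = 0
    for start in adj:              # outer loop: restart in every unvisited component
        if start in bfn:
            continue
        counter += 1
        bfn[start] = counter
        queue = deque([start])
        while queue:
            v = queue.popleft()    # v is the current vertex
            for w in adj[v]:
                if w not in bfn:   # w is still unvisited
                    counter += 1
                    bfn[w] = counter
                    queue.append(w)
    return bfn


adj = {1: [2, 5], 2: [1, 3, 5], 3: [2, 4], 4: [3, 5, 6],
       5: [1, 2, 4], 6: [4], 7: []}
numbers = bfs_numbering(adj)
```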
The BFS Algorithm
The queue contents during BFS traversal
that starts from vertex a. Assume that
we choose to visit adjacent vertices in
alphabetical order
Dijkstra Algorithm
1. Z ← ∅ // The set 'Z' of vertices that have been visited is initially empty
2. Q ← V // The queue 'Q' initially contains all the vertices
3. dist[S] ← 0 // The distance to the source vertex is set to 0
4. Π[S] ← NIL // The predecessor of the source vertex is set to NIL
5. for all v ∈ V - {S} // For all other vertices
6.     do dist[v] ← ∞ // All other distances are set to ∞
7.        Π[v] ← NIL // The predecessor of all other vertices is set to NIL
8. while Q ≠ ∅ // The loop runs until the queue is empty
9.     do u ← min-distance(Q, dist) // A vertex with the least distance is selected and removed from Q
10.       Z ← Z ∪ {u} // Vertex 'u' is added to the set 'Z' of visited vertices
11.       for all v ∈ neighbors[u] // For all the neighboring vertices of vertex 'u'
12.           do if dist[v] > dist[u] + w(u,v) // If a shorter path is discovered
13.               then dist[v] ← dist[u] + w(u,v); Π[v] ← u // The shorter distance and the new predecessor are recorded
14. return dist
Source: https://www.gatevidyalay.com/dijkstras-algorithm-shortest-path-algorithm/
Dijkstra Algorithm
1. The first step (lines 1-2) defines two sets:
A set Z of all vertices that are included in the shortest path tree; initially empty
A set Q of all vertices that are yet to be included in the shortest path tree;
initially contains all the vertices of the given graph
2. In the second step (lines 3-7), two variables are defined for each vertex of the
given graph:
Π[v], which denotes the predecessor of vertex ‘v’
d[v], which denotes the shortest path estimate of vertex ‘v’ from the source vertex (written dist[v] in the pseudocode)
Initially, the values of these variables are set as follows:
The value of variable ‘Π’ for each vertex is set to NIL, i.e., Π[v] = NIL
The value of variable ‘d’ for the source vertex is set to 0, i.e., d[S] = 0
The value of variable ‘d’ for the remaining vertices is set to ∞, i.e., d[v] = ∞
Dijkstra Algorithm
3. In the third step (lines 8-13), the following procedure is
repeated until all the vertices of the graph are
processed:
Among the unprocessed vertices, a vertex with the minimum value
of variable ‘d’ is chosen
Its outgoing edges are relaxed
After relaxing the edges of that vertex, the sets created in the
first step are updated
Edge Relaxation:
Consider an edge (a,b) of weight w
Here, d[a] and d[b] denote the shortest path estimates of
vertices a and b, respectively, from the source vertex ‘S’
If d[a] + w < d[b], then d[b] = d[a] + w and Π[b] = a
This is called edge relaxation
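The three steps and the relaxation rule can be sketched with Python's heapq module acting as the priority queue. This is one possible implementation, not the slides' own code: instead of a decrease-key operation it pushes a new heap entry and skips stale ones. The graph and its weights are hypothetical, since the slide's figure is not available.

```python
import heapq
import math


def dijkstra(graph, source):
    """graph: {u: [(v, weight), ...]} with non-negative weights."""
    dist = {v: math.inf for v in graph}   # shortest path estimates d[v]
    pred = {v: None for v in graph}       # predecessors Π[v]
    dist[source] = 0
    visited = set()                       # the set Z
    heap = [(0, source)]
    while heap:
        _, u = heapq.heappop(heap)        # unprocessed vertex with minimum d
        if u in visited:
            continue                      # stale entry left over from an earlier push
        visited.add(u)
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:     # edge relaxation
                dist[v] = dist[u] + w
                pred[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, pred


# Hypothetical weighted digraph.
graph = {
    "S": [("a", 1), ("b", 5)],
    "a": [("b", 2), ("c", 2), ("d", 1)],
    "b": [("d", 2)],
    "c": [("d", 3), ("e", 1)],
    "d": [("e", 2)],
    "e": [],
}
dist, pred = dijkstra(graph, "S")
```

Following pred backwards from any vertex recovers a shortest path to it, for example e ← d ← a ← S.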
Time Complexity Analysis
• Case 1:
Valid when
The given graph G is represented as an adjacency matrix A
Priority queue Q is represented as an unordered list
Here:
A[i,j] stores the information about edge (i,j)
Time taken for selecting i with the smallest dist is O(|V|)
For each neighbour of i, time taken for updating dist[j] is O(1) and there will
be maximum |V| neighbours
Time taken for each iteration of the loop is O(|V|) and one vertex is deleted
from Q
Thus, the total time complexity becomes O(|V|²)
Time Complexity Analysis
• Case 2:
Valid when
The given graph G is represented as an adjacency list
Priority queue Q is represented as a binary heap
Here:
With the adjacency list representation, all vertices of the graph can be
traversed using BFS in O(|V|+|E|) time
In a min heap, operations like extract-min and decrease-key
take O(log|V|) time
So, the overall time complexity becomes O(|E|+|V|) x O(log|V|), which is
O((|E| + |V|) x log|V|) = O(|E| log|V|) for a connected graph, where |E| ≥ |V| − 1
This time complexity can be reduced to O(|E|+|V| log|V|) using a Fibonacci heap
Dijkstra Algorithm : Example
Step 2 iterates n−1 times, because there are n−1 vertices (edges) that have to
be added to the tree
The efficiency of the algorithm is determined by how efficiently one can find
a qualifying w
Finding a Spanning Tree..
Edge-centric Algorithm
1. Start with the collection of singleton trees (each with exactly one node)
2. As long as there is more than one tree, connect two trees together with
an edge in the graph
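The edge-centric algorithm can be sketched with a union-find structure that keeps track of which tree each node currently belongs to. Union-find is an implementation choice made here, not something the slides prescribe, and the function name is an assumption.

```python
def spanning_tree(nodes, edges):
    """Return the edges of a spanning tree (or forest) of an undirected graph."""
    parent = {v: v for v in nodes}     # step 1: every node is a singleton tree

    def find(v):
        """Return the representative of the tree that contains v."""
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving keeps the trees shallow
            v = parent[v]
        return v

    tree = []
    for u, w in edges:                 # step 2: join two different trees
        root_u, root_w = find(u), find(w)
        if root_u != root_w:
            parent[root_u] = root_w
            tree.append((u, w))
    return tree


nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
tree = spanning_tree(nodes, edges)
```

On a connected graph with n nodes the result always has n − 1 edges; which edges are chosen depends on the order in which they are examined.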
Network Flow
Figure: Water flowing through a pipe-work system. The values on the pipes are the capacities of water that they can carry
• Suppose we turn on the tap, so that water flows along
the path SACT
What is the maximum flow along this path?
The maximum flow is governed by the minimum capacity along
the path, in this case 2
• Now consider the path SBDT
What is the maximum flow along this path?
The minimum capacity along this path is 4, hence the maximum
flow is 4
• We can increase the flow if we can find a path from S to T with no
saturated edges
The flow can then be increased by the minimum excess capacity along that path
Consider the path SBCT
Network Flow
• Is there another path from S to T that consists only of edges that are not saturated?
• Yes, the path SABCDT. What is the minimum excess capacity along this route?
• The minimum excess capacity is 1 along BC, so we can increase the flow by 1
Network Flow
• Now we have a total flow of 9 out of the source S and into the sink T
• Need a way of finding out whether this is the maximum possible flow!
The Network Flow Problem
• A type of network optimization problem
• Arises in many different contexts
Networks: routing as many packets as possible on a given network
Transportation: sending as many trucks as possible, where roads have limits on the number of
trucks per unit time
Bridges: destroying (?!) some bridges to disconnect s from t, while minimizing the cost of
destroying the bridges
• Settings:
Given a directed graph G = (V, E), where each edge e is associated with its capacity c(e) > 0, and
two special nodes source s and sink t (s ≠ t)
• Problem:
Maximize the total amount of flow from s to t subject to two constraints:
Flow on edge e doesn’t exceed c(e)
For every node v ≠ s, t, incoming flow is equal to outgoing flow
The Network Flow Problem
Alternate Formulation: Minimum Cut
• We want to remove some edges from the graph such that after removing the
edges, there is no path from s to t
The cost of removing e is equal to its capacity c(e)
The minimum cut problem is to find a cut with minimum total cost
• Theorem:
(maximum flow) = (minimum cut)
The Network Flow Problem
• Back edges:
We don’t need to maintain the amount of flow on each edge but work with capacity
values directly
If f amount of flow goes through u → v, then:
Decrease c(u → v) by f
Increase c(v → u) by f
Why do we need to do this?
Sending flow in both directions is equivalent to cancelling flow
Ford-Fulkerson Algorithm : Pseudo-code
• Set f_total = 0
• Repeat until there is no path from s to t along edges with positive capacity:
Run DFS from s to find a flow path to t
Let f be the minimum capacity value on the path
Add f to f_total
For each edge u → v on the path:
    Decrease c(u → v) by f
    Increase c(v → u) by f
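The pseudo-code can be sketched as below. It works directly on residual capacities, as the back-edge slide suggests, and assumes integer capacities. The network is hypothetical, and the function name max_flow is an assumption.

```python
from collections import defaultdict


def max_flow(capacity, s, t):
    """capacity: {(u, v): c}. Returns (total flow, residual capacities)."""
    residual = defaultdict(int)
    adj = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)              # back edge v -> u starts with capacity 0

    def dfs(u, limit, seen):
        """Push up to `limit` units from u to t; return the amount pushed."""
        if u == t:
            return limit
        seen.add(u)
        for v in adj[u]:
            if v not in seen and residual[(u, v)] > 0:
                pushed = dfs(v, min(limit, residual[(u, v)]), seen)
                if pushed > 0:
                    residual[(u, v)] -= pushed   # decrease c(u -> v) by f
                    residual[(v, u)] += pushed   # increase c(v -> u) by f
                    return pushed
        return 0

    total = 0
    while True:
        f = dfs(s, float("inf"), set())
        if f == 0:                 # no path with positive capacity is left
            return total, residual
        total += f


# Hypothetical network with integer capacities.
capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
            ("a", "t"): 2, ("b", "t"): 3}
flow_value, _ = max_flow(capacity, "s", "t")
```

The order in which paths are found may differ between runs, but the final flow value is always the same.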
Ford-Fulkerson Algorithm : Analysis
• Assumption:
capacities are integer-valued
• Finding a flow path takes Θ(n + m) time, where n = |V| and m = |E|
• We send at least 1 unit of flow through the path
If the max-flow is f⋆, the time complexity is O((n + m)f⋆)
“Bad” in that it depends on the output of the algorithm
Nonetheless, easy to code and works well in practice!!
Computing Min-Cut
• Run Ford-Fulkerson, then let A be the set of nodes reachable from s in the
residual graph
• Separate the nodes in A from the others,
i.e., V − A
Cut edges go from A to V − A in the original
graph
Look at the original graph and find the cut:
Why isn’t b → c cut?
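A minimal sketch of reading a minimum cut off a residual graph is given below. It assumes that A is the set of nodes reachable from s through edges with positive residual capacity once a maximum flow has been found. The network and its residual capacities are hypothetical and were worked out by hand for a maximum flow of value 6.

```python
def min_cut(capacity, residual, s):
    """capacity: original {(u, v): c}; residual: {(u, v): remaining capacity}."""
    # A = nodes reachable from s along edges that still have residual capacity.
    A = {s}
    stack = [s]
    while stack:
        u = stack.pop()
        for (x, y), r in residual.items():
            if x == u and r > 0 and y not in A:
                A.add(y)
                stack.append(y)
    # Cut edges are the ORIGINAL edges that go from A to V - A.
    cut = [(u, v) for (u, v) in capacity if u in A and v not in A]
    return A, cut, sum(capacity[e] for e in cut)


# Hypothetical network and the residual capacities left after a maximum flow
# (s->a carries 4, s->b 2, a->b 3, a->t 1, b->t 5).
capacity = {("s", "a"): 10, ("s", "b"): 2, ("a", "b"): 3,
            ("a", "t"): 1, ("b", "t"): 6}
residual = {("s", "a"): 6, ("a", "s"): 4, ("s", "b"): 0, ("b", "s"): 2,
            ("a", "b"): 0, ("b", "a"): 3, ("a", "t"): 0, ("t", "a"): 1,
            ("b", "t"): 1, ("t", "b"): 5}
A, cut, cut_capacity = min_cut(capacity, residual, "s")
```

The edge b → t still has spare capacity, yet it is not a cut edge because b is not in A; only edges leaving A are counted.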
Bipartite Matching
• A Bipartite Graph G = (V, E) is a graph in which the vertex set V
can be divided into two disjoint subsets X and Y such that every
edge e ∈ E has one end point in X and the other end point in Y
• A matching M is a subset of edges such that each node in V
appears in at most one edge in M
We are interested in matchings of large size
• Maximal Matching:
A matching to which no more edges can be added without increasing
the degree of one of the nodes to two - it is a local maximum
• Maximum Matching:
A matching with the largest possible number of edges - it is globally
optimal
Bipartite Matching
• Goal:
Find a maximum matching in a graph
• Note:
A maximal matching can be found very easily
Keep adding edges to a matching until no more can be added
It can be shown that for any maximal matching M, |M| ≥ ½ |M*|, where
M* is a maximum matching
Therefore one can easily construct a “2-approximation” to a maximum matching
Bipartite Matching and Network Flow
• The problem of finding a maximum matching can be
reduced to maximum flow in the following manner:
Let G(V,E) be the bipartite graph where V is divided
into X and Y
Construct a directed graph G’(V’,E’), in which V’
contains all the nodes of V along with a source node s
and a sink node t
For every edge in E, we add a directed edge in E’ from
X to Y
Finally add a directed edge from s to all nodes in X
and from all nodes of Y to t
Each edge is given unit capacity
Bipartite Matching and Network Flow
• Let f be an integral flow of G’ of value k
Make the following observations:
1. There is no node in X which has more than one outgoing edge where there is a flow
2. There is no node in Y which has more than one incoming edge where there is a flow
3. The number of edges between X and Y which carry flow is k
• By these observations:
It is straightforward to conclude that the set of edges carrying flow in f forms a matching
of size k for the graph G
Likewise, given a matching of size k in G, we can construct a flow of size k in G’
Therefore, solving for maximum flow in G’ gives a maximum matching in G
Note that we used the fact that when edge capacities are integral, Ford-Fulkerson
produces an integral flow
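The reduction can be sketched end to end: build G’ with unit capacities, push flow along augmenting paths, and read the matching off the X → Y edges that carry flow. The node names "source" and "sink", the function name, and the example graph are assumptions; X and Y are assumed to be disjoint.

```python
def max_bipartite_matching(X, Y, edges):
    """Return a maximum matching of the bipartite graph as a list of (x, y) pairs."""
    s, t = "source", "sink"    # assumed not to clash with names in X or Y
    cap = {}
    adj = {}

    def add_edge(u, v):
        cap[(u, v)] = 1        # every edge of G' has unit capacity
        cap[(v, u)] = 0        # back edge used to cancel flow
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    for x in X:
        add_edge(s, x)
    for x, y in edges:
        add_edge(x, y)
    for y in Y:
        add_edge(y, t)

    def augment(u, seen):
        """Try to push one unit of flow from u to t."""
        if u == t:
            return True
        seen.add(u)
        for v in adj.get(u, []):
            if v not in seen and cap[(u, v)] > 0 and augment(v, seen):
                cap[(u, v)] -= 1
                cap[(v, u)] += 1
                return True
        return False

    while augment(s, set()):
        pass
    # An X -> Y edge carries flow exactly when its residual capacity is 0.
    return [(x, y) for x, y in edges if cap[(x, y)] == 0]


X = ["x1", "x2", "x3"]
Y = ["y1", "y2", "y3"]
edges = [("x1", "y1"), ("x1", "y2"), ("x2", "y1"), ("x3", "y3")]
matching = max_bipartite_matching(X, Y, edges)
```

In this example the first path matches x1 with y1; the second path reroutes x1 to y2 through a back edge so that x2 can take y1, which a purely greedy maximal matching would miss.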
Bipartite Matching : Example
• Scenario:
n students and d dorms
Each student wants to live in one of the dorms of their choice
Each dorm can accommodate at most one student (?!)
A more reasonable variant of this problem: dorm j can accommodate cj students
Make an edge with capacity cj from dorm j to the sink
• Problem:
Find an assignment that maximizes the number of students who get housing
Flow Network Construction
Greedy Approach
getOptimal(item_arr[], int n)
1) Initialize empty result: result = {}
2) while (not all items have been considered)
    i = SelectAnItem() // make a greedy choice to select an item
    if (feasible(i)) // if i is feasible, add i to the result
        result = result ∪ {i}
3) return result
• Characteristics
1. There is an ordered list of resources (profit, cost, value, etc.)
2. At each step, the maximum or minimum of the resources (max profit, max value, min cost, etc.) is
taken
3. For example, in the fractional knapsack problem, the item with the maximum value/weight ratio is taken
first, as far as the available capacity allows
• Applications
Finding an optimal solution (Activity Selection, Fractional Knapsack, Job Sequencing, Huffman Coding)
Finding a close-to-optimal solution for NP-Hard problems like TSP (Travelling Salesman Problem)
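The fractional knapsack problem mentioned above is a compact instance of the greedy pattern. This is a minimal sketch; the function name and the item values are assumptions chosen for illustration.

```python
def fractional_knapsack(items, capacity):
    """items: list of (value, weight) pairs; returns the best achievable total value."""
    total = 0.0
    # Greedy choice: consider items in decreasing order of value per unit weight.
    ranked = sorted(items, key=lambda item: item[0] / item[1], reverse=True)
    for value, weight in ranked:
        if capacity <= 0:
            break
        take = min(weight, capacity)     # feasibility: never exceed the capacity
        total += value * take / weight   # a fraction of an item may be taken
        capacity -= take
    return total


# Hypothetical items given as (value, weight) pairs.
items = [(60, 10), (100, 20), (120, 30)]
best = fractional_knapsack(items, 50)
```

With capacity 50, the first two items are taken whole and two thirds of the third item fills the remaining space.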
Advantages and Disadvantages of Greedy Approach
• Advantages
Greedy approach is easy to design and implement
Analyzing the run time for greedy algorithms will generally be much easier than for
other techniques (e.g. Divide and conquer)
Typically have lower time complexity
Greedy algorithms can be used for optimization, or for finding close-to-optimal
solutions to NP-Hard problems
• Disadvantages
Have to work much harder to understand correctness issues
Even with the correct algorithm, it is hard to prove why it is correct - Proving that a greedy
algorithm is correct is more of an art than a science (It involves a lot of creativity)
A locally optimal solution may not be globally optimal
Standard Greedy Algorithms
• In addition to the input, a randomized algorithm uses a source of pseudo-random numbers
• During execution, it makes random choices depending on the random numbers
• The behaviour (output) can vary if the algorithm is run multiple times on the same input
Advantage and Disadvantage of Randomized Algorithms
• Advantage
The algorithm is usually simple and easy to implement
The algorithm is fast with very high probability, and/or
It produces optimum output with very high probability
• Disadvantage
There is a finite probability of getting incorrect answer, however, the probability of
getting a wrong answer can be made arbitrarily small by the repeated employment of
randomness
Analysis of running time or probability of getting a correct answer is usually difficult
Getting truly random numbers is Impossible!!
One needs to depend on pseudo random numbers - So, the result highly depends on the quality of
the random numbers
Quick Sort
The Problem: Deterministic Quick Sort
• Given an array A[1 . . . n] containing n (comparable) elements, sort them in increasing/decreasing order
Randomized Quick Sort
• A Useful Concept - The Central Splitter
It is an index s such that the number of elements less (resp. greater) than A[s] is at
least n/4
• The algorithm randomly chooses a key, and checks whether it is a central
splitter or not
• If it is a central splitter, then the array is split with that key as was done in
the QSORT algorithm
• It can be shown that the expected number of trials needed to get a central
splitter is constant
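The retry-until-central-splitter idea can be sketched as follows. One deliberate deviation from the definition above: the sketch accepts a key when neither side holds more than 3n/4 elements, which is nearly the same condition for distinct keys but cannot stall when the array contains duplicates. The function name rand_qsort is an assumption.

```python
import random


def rand_qsort(a):
    """Return a sorted copy of list a using randomly chosen central splitters."""
    n = len(a)
    if n <= 1:
        return list(a)
    while True:
        pivot = random.choice(a)                 # randomly choose a key
        less = [x for x in a if x < pivot]
        greater = [x for x in a if x > pivot]
        # Accept the key only if it splits the array evenly enough.
        if max(len(less), len(greater)) <= 3 * n / 4:
            break
    equal = [x for x in a if x == pivot]
    return rand_qsort(less) + equal + rand_qsort(greater)


data = [random.randrange(100) for _ in range(200)]
result = rand_qsort(data)
```

The median always passes the acceptance test, so the loop ends with probability 1, and each recursive call works on at most three quarters of the elements.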
Analysis of RandQSORT
• Fact:
Step 2 needs O(q − p) time
• Question:
How many times is Step 2 executed before a central splitter is found?
• Result:
The probability that a randomly chosen element is a central splitter is ½
Therefore, the expected number of times Step 2 needs to be repeated to get a
central splitter is 2
• Time Complexity
Worst case size of each partition in the jth level of recursion is n × (3/4)^j
Number of levels of recursion = log_{4/3} n = O(log n)
Recurrence Relation of the time complexity:
Types of Randomized Algorithms
• Las Vegas:
A randomized algorithm that always returns a correct result, but the running
time may vary between executions
Example: Randomized QUICKSORT Algorithm
• Monte Carlo:
A randomized algorithm that terminates in polynomial time, but might
produce an erroneous result
Example: Randomized MINCUT Algorithm
Example:
Consider the problem of finding an ‘a’ in an array of n elements
Input:
An array of n≥2 elements, in which half are ‘a’s and the other half are ‘b’s
Output:
Find an ‘a’ in the array
Las Vegas algorithm
findingA_LV(array A, n)
begin
repeat
Randomly select one element out of n elements
until 'a' is found
end
Las Vegas algorithm : Analysis
• The algorithm succeeds with probability 1
• The number of iterations varies and can be arbitrarily large, but the
expected number of iterations is: ∑_{i ≥ 1} i / 2^i = 2
• Since this is constant, the expected run time over many calls is Θ(1)
Monte Carlo algorithm
findingA_MC(array A, n, k)
begin
i=0
repeat
Randomly select one element out of n elements
i=i+1
until i=k or 'a' is found
end
Monte Carlo algorithm: Analysis
• This algorithm does not guarantee success, but the run time is bounded
The number of iterations is always less than or equal to k
Taking k to be constant, the run time (expected and absolute) is Θ(1)
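Both variants of the ‘a’-finding example can be sketched side by side. The function names follow the pseudocode; returning the index of the element found, and None on failure, is an assumption made for this sketch.

```python
import random


def finding_a_lv(A):
    """Las Vegas: always returns an index holding 'a'; the number of trials varies."""
    while True:
        i = random.randrange(len(A))
        if A[i] == "a":
            return i


def finding_a_mc(A, k):
    """Monte Carlo: at most k trials; returns None if no 'a' was drawn."""
    for _ in range(k):
        i = random.randrange(len(A))
        if A[i] == "a":
            return i
    return None


A = list("abab" * 5)          # half of the elements are 'a', half are 'b'
i_lv = finding_a_lv(A)
i_mc = finding_a_mc(A, 10)
```

With half of the entries equal to ‘a’, the Monte Carlo version fails with probability (1/2)^k, so even a small k makes failure unlikely.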
Randomized Algorithms: Applications
• Randomized algorithms are particularly useful when faced with a
malicious "adversary" or attacker who deliberately tries to feed a
bad input to the algorithm such as in the Prisoner's dilemma
Table 2
Exercises
7. The goal of this exercise is to show an application of flows to organize the
presentation defences of Final Year Projects (FYP) of some students. Assume
that the students {S1, · · · , Sn} have to present their work to some panellists at
the end of their projects. There are q panellists P = {P1, · · · , Pq}. Each student Si
has a project Qi , i ≤ n. For any project, each panellist is either a specialist of the
subject or not. That is, for any i ≤ n, P is partitioned into Spi and NSpi , respectively
the subset of the panellists that are specialist of the project Qi , and the
panellists that are not. Finally, each panellist Pj , j ≤ q, can attend at most aj
defences. Each student Si must present his work to x panellists: y of them are
specialists of Qi and z = x − y of them are not. Use a flow-model to organize the
juries (i.e., which panellist will attend which presentation).
Exercises
8. Figure 2 shows a flow network on which an s-t flow has been computed. The capacity of each
edge appears as a label next to the edge, and the numbers in boxes give the amount of flow
sent on each edge (edges without boxed numbers, specifically the four edges of
capacity 3, have no flow being sent on them).
a) What is the value of this flow?
b) Is this a maximum (s,t) flow in this graph?
c) Find a minimum s-t cut in this flow network, and also
state its capacity.
Figure 2