
UNIT 5

Complexity Analysis

Algorithms are an essential aspect of data structures: data structures are implemented using
algorithms. An algorithm is a procedure that you can write as a C function or program, or in any
other language. An algorithm states explicitly how the data will be manipulated.

Algorithm Efficiency
Some algorithms are more efficient than others. We would prefer to choose an efficient algorithm,
so it would be nice to have metrics for comparing algorithm efficiency.

The complexity of an algorithm is a function describing the efficiency of the algorithm in terms
of the amount of data the algorithm must process. Usually there are natural units for the domain
and range of this function. There are two main complexity measures of the efficiency of an
algorithm:

 Time complexity is a function describing the amount of time an algorithm takes in terms
of the amount of input to the algorithm. "Time" can mean the number of memory
accesses performed, the number of comparisons between integers, the number of times
some inner loop is executed, or some other natural unit related to the amount of real time
the algorithm will take. We try to keep this idea of time separate from "wall clock" time,
since many factors unrelated to the algorithm itself can affect the real time (like the
language used, type of computing hardware, proficiency of the programmer, optimization
in the compiler, etc.). It turns out that, if we choose the units wisely, all of the other stuff
doesn't matter and we can get an independent measure of the efficiency of the algorithm.

 Space complexity is a function describing the amount of memory (space) an algorithm
takes in terms of the amount of input to the algorithm. We often speak of "extra" memory
needed, not counting the memory needed to store the input itself. Again, we use natural
(but fixed-length) units to measure this. We can use bytes, but it's easier to use, say, the
number of integers used, the number of fixed-sized structures, etc. In the end, the function
we come up with will be independent of the actual number of bytes needed to represent
the unit. Space complexity is sometimes ignored because the space used is minimal and/or
obvious, but sometimes it becomes as important an issue as time.

Asymptotic analysis of an algorithm refers to defining mathematical bounds on its run-time
performance. Using asymptotic analysis, we can conclude the best case, average case, and worst
case scenarios of an algorithm.
Asymptotic analysis refers to computing the running time of any operation in mathematical
units of computation.

Usually, the time required by an algorithm falls under three types −


 Best Case − Minimum time required for program execution.
 Average Case − Average time required for program execution.
 Worst Case − Maximum time required for program execution.
Asymptotic Notations
Following are the commonly used asymptotic notations to calculate the running time
complexity of an algorithm.

 Ο Notation
 Ω Notation
 θ Notation

Big Oh Notation, Ο
The notation Ο(n) is the formal way to express the upper bound of an algorithm's running time.
It measures the worst case time complexity or the longest amount of time an algorithm can
possibly take to complete.

For example, for a function f(n):

Ο(f(n)) = { g(n) : there exist constants c > 0 and n0 such that g(n) ≤ c·f(n) for all n > n0 }

Omega Notation, Ω

The notation Ω(n) is the formal way to express the lower bound of an algorithm's running time.
It measures the best case time complexity, or the minimum amount of time an algorithm can
possibly take to complete.
For example, for a function f(n):

Ω(f(n)) = { g(n) : there exist constants c > 0 and n0 such that g(n) ≥ c·f(n) for all n > n0 }

Theta Notation, θ
The notation θ(n) is the formal way to express both the lower bound and the upper bound of an
algorithm's running time. It is represented as follows −

θ(f(n)) = { g(n) : g(n) = Ο(f(n)) and g(n) = Ω(f(n)) }
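
For example, take g(n) = 3n + 2 and f(n) = n. Since 3n + 2 ≤ 5n for all n ≥ 1 (c = 5, n0 = 1), we
have 3n + 2 = Ο(n); and since 3n + 2 ≥ n for all n ≥ 1 (c = 1, n0 = 1), we also have
3n + 2 = Ω(n). Both bounds hold, so 3n + 2 = θ(n).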

Common Asymptotic Notations


Following is a list of some common asymptotic notations −

constant − Ο(1)
logarithmic − Ο(log n)
linear − Ο(n)
n log n − Ο(n log n)
quadratic − Ο(n^2)
cubic − Ο(n^3)
polynomial − n^Ο(1)
exponential − 2^Ο(n)
Recursion

The process in which a function calls itself directly or indirectly is called recursion, and the
corresponding function is called a recursive function. Using a recursive algorithm, certain
problems can be solved quite easily. Examples of such problems are Towers of Hanoi
(TOH), Inorder/Preorder/Postorder Tree Traversals, DFS of a Graph, etc.

What is the base condition in recursion?

In a recursive program, the solution to the base case is provided, and the solution of the bigger
problem is expressed in terms of smaller problems.

int fact(int n)
{
    if (n <= 1)   // base case
        return 1;
    else
        return n * fact(n - 1);
}
In the above example, the base case n <= 1 is defined, and a larger value of n can be solved
by converting it to a smaller one until the base case is reached.
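
As a quick illustration (a minimal sketch; the main() driver is our addition), fact can be
exercised like this. Note that fact(n) makes n nested calls, so it takes O(n) time and O(n)
stack space:

#include <stdio.h>

int fact(int n)
{
    if (n <= 1)          // base case
        return 1;
    return n * fact(n - 1);
}

int main(void)
{
    printf("%d\n", fact(5));   /* prints 120 = 5*4*3*2*1 */
    return 0;
}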

A function fun is called direct recursive if it calls the same function fun. A function fun is called
indirect recursive if it calls another function, say fun_new, and fun_new calls fun directly or
indirectly. The difference between direct and indirect recursion is illustrated in the examples below.

// An example of direct recursion
void directRecFun()
{
    // Some code....

    directRecFun();

    // Some code...
}

// An example of indirect recursion
void indirectRecFun1()
{
    // Some code...

    indirectRecFun2();

    // Some code...
}

void indirectRecFun2()
{
    // Some code...

    indirectRecFun1();

    // Some code...
}

Examples of time and space complexity


1. What is the time and space complexity of the following code?

int a = 0, b = 0;
for (i = 0; i < N; i++) {
    a = a + rand();
}
for (j = 0; j < M; j++) {
    b = b + rand();
}

Answer: O(N + M) time, O(1) space.

Explanation: The first loop is O(N) and the second loop is O(M). Since we don't know which is
bigger, we say this is O(N + M). This can also be written as O(max(N, M)).
Since no additional space is being utilized, the space complexity is constant, O(1).

2. What is the time complexity of the following code?

int a = 0;
for (i = 0; i < N; i++) {
    for (j = N; j > i; j--) {
        a = a + i + j;
    }
}

Answer: O(N^2)

Explanation: The inner statement runs a total of
N + (N − 1) + (N − 2) + … + 1
= N * (N + 1) / 2
= 1/2 * N^2 + 1/2 * N
times, which is O(N^2).

3. What is the time complexity of the following code?

int i, j, k = 0;
for (i = n / 2; i <= n; i++) {
    for (j = 2; j <= n; j = j * 2) {
        k = k + n / 2;
    }
}

Answer: O(n log n)

Explanation: Notice that j keeps doubling until it is greater than n. The number of times we can
double a number until it exceeds n is log(n). For example:
for n = 16: j = 2, 4, 8, 16
for n = 32: j = 2, 4, 8, 16, 32
So j runs for O(log n) steps, while i runs for n/2 steps.
Total steps = O(n/2 * log n) = O(n log n).
4. What is the time complexity of the following code?

int a = 0, i = N;
while (i > 0) {
    a += i;
    i /= 2;
}

Answer: O(log N)

Explanation: i is halved on every iteration, so we have to find the smallest x such that
N / 2^x < 1, which gives x = log(N) iterations.
Sorting Algorithms
A Sorting Algorithm is used to rearrange the elements of a given array or list according to a
comparison operator on the elements. The comparison operator is used to decide the
new order of elements in the respective data structure.

HeapSort
Heap sort is a comparison based sorting technique based on the Binary Heap data
structure. It is similar to selection sort, where we first find the maximum element and
place it at the end; we repeat the same process for the remaining elements.
A Binary Heap is a Complete Binary Tree where items are stored in a special order such that the
value in a parent node is greater (or smaller) than the values in its two children nodes. The
former is called a max heap and the latter a min heap. The heap can be represented by a
binary tree or an array.

Why array based representation for Binary Heap?


Since a Binary Heap is a Complete Binary Tree, it can be easily represented as an array,
and the array based representation is space efficient. If the parent node is stored at index i,
the left child can be found at index 2*i + 1 and the right child at index 2*i + 2 (assuming the
indexing starts at 0).
Heap Sort Algorithm for sorting in increasing order:
1. Build a max heap from the input data.
2. At this point, the largest item is stored at the root of the heap. Replace it with the last
item of the heap followed by reducing the size of heap by 1. Finally, heapify the root of
tree.
3. Repeat above steps while size of heap is greater than 1.
How to build the heap?
The heapify procedure can be applied to a node only if its children nodes are already heapified,
so heapification must be performed in bottom-up order, as in the sketch below.
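
Below is a minimal sketch of heap sort in C along these lines (our own illustration; the function
and variable names are assumptions, not from the original text):

#include <stdio.h>

// Sift the subtree rooted at index i down until it satisfies the
// max-heap property; n is the current size of the heap.
void heapify(int arr[], int n, int i)
{
    int largest = i;
    int left = 2 * i + 1;
    int right = 2 * i + 2;

    if (left < n && arr[left] > arr[largest])
        largest = left;
    if (right < n && arr[right] > arr[largest])
        largest = right;

    if (largest != i) {
        int tmp = arr[i];
        arr[i] = arr[largest];
        arr[largest] = tmp;
        heapify(arr, n, largest);   // continue sifting down
    }
}

void heapSort(int arr[], int n)
{
    // Step 1: build a max heap bottom-up, starting from the last non-leaf.
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(arr, n, i);

    // Steps 2-3: repeatedly move the root (the maximum) to the end,
    // shrink the heap by one, and heapify the new root.
    for (int i = n - 1; i > 0; i--) {
        int tmp = arr[0];
        arr[0] = arr[i];
        arr[i] = tmp;
        heapify(arr, i, 0);
    }
}

int main(void)
{
    int arr[] = {12, 11, 13, 5, 6, 7};
    int n = sizeof(arr) / sizeof(arr[0]);
    heapSort(arr, n);
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);      /* prints 5 6 7 11 12 13 */
    printf("\n");
    return 0;
}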
Searching Algorithms
Searching Algorithms are designed to check for an element or retrieve an element from
any data structure where it is stored. Based on the type of search operation, these
algorithms are generally classified into two categories:
1. Sequential Search: In this, the list or array is traversed sequentially and every element is
checked. For example: Linear Search.

2. Interval Search: These algorithms are specifically designed for searching in sorted data
structures. These types of searching algorithms are much more efficient than Linear Search,
as they repeatedly target the center of the search structure and divide the search space in
half. For example: Binary Search.

Linear Search to find the element “20” in a given list of numbers

Binary Search to find the element “23” in a given list of numbers
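
As an illustration of the interval search idea, here is a minimal iterative binary search sketch in
C (our own code; the function name and parameters are assumptions). Searching for 23 in the
sorted list {2, 5, 8, 12, 16, 23, 38, 56, 72, 91} returns index 5:

#include <stdio.h>

// Returns the index of key in the sorted array arr[0..n-1], or -1
// if key is not present.
int binarySearch(const int arr[], int n, int key)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   // avoids overflow of (low + high)
        if (arr[mid] == key)
            return mid;
        else if (arr[mid] < key)
            low = mid + 1;                  // key lies in the right half
        else
            high = mid - 1;                 // key lies in the left half
    }
    return -1;
}

int main(void)
{
    int arr[] = {2, 5, 8, 12, 16, 23, 38, 56, 72, 91};
    int n = sizeof(arr) / sizeof(arr[0]);
    printf("%d\n", binarySearch(arr, n, 23));   /* prints 5 */
    return 0;
}

Each comparison halves the search space, so binary search runs in O(log n) time, versus O(n)
for linear search.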


Greedy Algorithms

An algorithm is designed to achieve the optimum solution for a given problem. In the greedy
algorithm approach, decisions are made from the given solution domain. Being greedy, the
closest solution that seems to provide an optimum solution is chosen.

Greedy algorithms try to find a localized optimum solution, which may eventually lead to a
globally optimized solution. However, in general, greedy algorithms do not provide globally
optimized solutions.

Examples
Most networking algorithms use the greedy approach. Here is a list of a few of them −

 Travelling Salesman Problem


 Prim's Minimal Spanning Tree Algorithm
 Kruskal's Minimal Spanning Tree Algorithm
 Dijkstra's Shortest Path Algorithm
 Graph - Map Coloring
 Graph - Vertex Cover
 Knapsack Problem
 Job Scheduling Problem
There are lots of similar problems that use the greedy approach to find an optimum solution.
1. Greedy Algorithm (Minimum Spanning Tree)

Kruskal’s Minimum Spanning Tree Algorithm


1. Sort all the edges in non-decreasing order of their weight.
2. Pick the smallest edge. Check if it forms a cycle with the spanning tree formed so far. If a
cycle is not formed, include this edge; else, discard it.
3. Repeat step 2 until there are (V-1) edges in the spanning tree.

The algorithm is a Greedy Algorithm. The Greedy Choice is to pick the smallest weight edge
that does not cause a cycle in the MST constructed so far. Let us understand it with an example:
Consider the below input graph.

The graph contains 9 vertices and 14 edges, so the minimum spanning tree formed will
have (9 − 1) = 8 edges.
After sorting:

Weight   Src   Dest
  1       7     6
  2       8     2
  2       6     5
  4       0     1
  4       2     5
  6       8     6
  7       2     3
  7       7     8
  8       0     7
  8       1     2
  9       3     4
 10       5     4
 11       1     7
 14       3     5
Now pick all edges one by one from the sorted list of edges:
1. Pick edge 7-6: No cycle is formed, include it.

2. Pick edge 8-2: No cycle is formed, include it.

3. Pick edge 6-5: No cycle is formed, include it.

4. Pick edge 0-1: No cycle is formed, include it.

5. Pick edge 2-5: No cycle is formed, include it.

6. Pick edge 8-6: Since including this edge results in cycle, discard it.
7. Pick edge 2-3: No cycle is formed, include it.

8. Pick edge 7-8: Since including this edge results in cycle, discard it.
9. Pick edge 0-7: No cycle is formed, include it.

10. Pick edge 1-2: Since including this edge results in cycle, discard it.
11. Pick edge 3-4: No cycle is formed, include it.

Since the number of edges included equals (V – 1), the algorithm stops here.
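
A compact sketch of this procedure in C follows (our own illustration; the Edge type, function
names, and the edge list built from the sorted table above are assumptions; the cycle check uses
the simple parent[] based union-find described at the end of this unit). Note that qsort may
break ties between equal-weight edges differently from the walkthrough above, but the edge set
it prints is still a minimum spanning tree:

#include <stdio.h>
#include <stdlib.h>

typedef struct { int src, dest, weight; } Edge;

// Follow parent pointers to the root of i's subset.
int findRoot(int parent[], int i)
{
    while (parent[i] != -1)
        i = parent[i];
    return i;
}

int compareEdges(const void *a, const void *b)
{
    return ((const Edge *)a)->weight - ((const Edge *)b)->weight;
}

// Kruskal: prints the (V-1) edges of a minimum spanning tree.
void kruskalMST(Edge edges[], int E, int V)
{
    int *parent = malloc(V * sizeof(int));
    for (int i = 0; i < V; i++)
        parent[i] = -1;                 // every vertex is its own subset

    qsort(edges, E, sizeof(Edge), compareEdges);   // step 1: sort by weight

    int taken = 0;
    for (int i = 0; i < E && taken < V - 1; i++) {
        int x = findRoot(parent, edges[i].src);
        int y = findRoot(parent, edges[i].dest);
        if (x != y) {                   // different subsets: no cycle
            printf("%d - %d (weight %d)\n",
                   edges[i].src, edges[i].dest, edges[i].weight);
            parent[x] = y;              // union the two subsets
            taken++;
        }                               // same subset: edge would form a cycle
    }
    free(parent);
}

int main(void)
{
    Edge edges[] = {
        {7, 6, 1}, {8, 2, 2}, {6, 5, 2}, {0, 1, 4}, {2, 5, 4},
        {8, 6, 6}, {2, 3, 7}, {7, 8, 7}, {0, 7, 8}, {1, 2, 8},
        {3, 4, 9}, {5, 4, 10}, {1, 7, 11}, {3, 5, 14}
    };
    kruskalMST(edges, 14, 9);           // prints the 8 MST edges
    return 0;
}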
2. Divide And Conquer Approach (Merge Sort)

In the divide and conquer approach, the problem at hand is divided into smaller sub-problems,
and then each sub-problem is solved independently. When we keep on dividing the sub-problems
into even smaller sub-problems, we may eventually reach a stage where no more division is
possible. Those "atomic" smallest possible sub-problems are solved. The solutions of
all sub-problems are finally merged in order to obtain the solution of the original problem.

Merge Sort is a Divide and Conquer algorithm. It divides the input array into two halves, calls
itself for the two halves, and then merges the two sorted halves. The merge() function is used for
merging the two halves. The call merge(arr, l, m, r) is the key process: it assumes that arr[l..m]
and arr[m+1..r] are sorted and merges the two sorted sub-arrays into one. See the C
implementation below for details.

MergeSort(arr[], l, r)

If r > l

1. Find the middle point to divide the array into two halves:

middle m = (l+r)/2

2. Call mergeSort for first half:

Call mergeSort(arr, l, m)

3. Call mergeSort for second half:

Call mergeSort(arr, m+1, r)

4. Merge the two halves sorted in step 2 and 3:

Call merge(arr, l, m, r)

Consider the example array {38, 27, 43, 3, 9, 82, 10}. If we take a closer look at the process,
we can see that the array is recursively divided into two halves till the size becomes 1. Once the
size becomes 1, the merge process comes into action and starts merging arrays back till the
complete array is merged.
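
The text above refers to a C implementation; here is a standard sketch (the variable names and
the main() driver are ours), using the same example array:

#include <stdio.h>
#include <string.h>

// Merge the two sorted sub-arrays arr[l..m] and arr[m+1..r].
void merge(int arr[], int l, int m, int r)
{
    int n1 = m - l + 1, n2 = r - m;
    int L[n1], R[n2];                   // temporary copies (C99 VLAs)
    memcpy(L, arr + l, n1 * sizeof(int));
    memcpy(R, arr + m + 1, n2 * sizeof(int));

    int i = 0, j = 0, k = l;
    while (i < n1 && j < n2)            // take the smaller head element
        arr[k++] = (L[i] <= R[j]) ? L[i++] : R[j++];
    while (i < n1) arr[k++] = L[i++];   // copy any leftovers
    while (j < n2) arr[k++] = R[j++];
}

void mergeSort(int arr[], int l, int r)
{
    if (r > l) {
        int m = (l + r) / 2;            // middle point
        mergeSort(arr, l, m);           // sort first half
        mergeSort(arr, m + 1, r);       // sort second half
        merge(arr, l, m, r);            // merge the sorted halves
    }
}

int main(void)
{
    int arr[] = {38, 27, 43, 3, 9, 82, 10};
    int n = sizeof(arr) / sizeof(arr[0]);
    mergeSort(arr, 0, n - 1);
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);          /* prints 3 9 10 27 38 43 82 */
    printf("\n");
    return 0;
}

Merge sort runs in O(n log n) time in all cases and uses O(n) extra space for the temporary
arrays.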
3. Dynamic Programming (Shortest Path Algorithm)
Bellman–Ford Algorithm

Given a graph and a source vertex src in the graph, find the shortest paths from src to all
vertices in the given graph. The graph may contain negative weight edges. Dijkstra's algorithm
doesn't work for graphs with negative weight edges, but Bellman-Ford works for such graphs.
Bellman-Ford is also simpler than Dijkstra and suits distributed systems well. However, the
time complexity of Bellman-Ford is O(VE), which is more than Dijkstra's.

Algorithm
Following are the detailed steps.
Input: Graph and a source vertex src
Output: Shortest distance to all vertices from src. If there is a negative weight cycle, then the
shortest distances are not calculated and the negative weight cycle is reported.
1) This step initializes distances from the source to all vertices as infinite and the distance to
the source itself as 0. Create an array dist[] of size |V| with all values as infinite except
dist[src], where src is the source vertex.
2) This step calculates the shortest distances. Do the following |V|−1 times, where |V| is the
number of vertices in the given graph.
   a) For each edge u-v: if dist[v] > dist[u] + weight of edge uv, then update
      dist[v] = dist[u] + weight of edge uv
3) This step reports if there is a negative weight cycle in the graph. For each edge u-v:
   if dist[v] > dist[u] + weight of edge uv, then "Graph contains negative weight cycle".
The idea of step 3 is that step 2 guarantees the shortest distances if the graph doesn't contain a
negative weight cycle. If we iterate through all edges one more time and get a shorter path for
any vertex, then there is a negative weight cycle.
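
A minimal sketch of these steps in C (our own illustration; the Edge type and the small example
graph in main() are assumptions, not the example graph described below):

#include <stdio.h>
#include <limits.h>

typedef struct { int src, dest, weight; } Edge;

// Bellman-Ford: prints the shortest distances from src, or reports
// a negative weight cycle. V = number of vertices, E = number of edges.
void bellmanFord(Edge edges[], int V, int E, int src)
{
    int dist[V];
    for (int i = 0; i < V; i++)
        dist[i] = INT_MAX;              // step 1: initialize to "infinite"
    dist[src] = 0;

    // Step 2: relax every edge |V| - 1 times.
    for (int i = 1; i <= V - 1; i++) {
        for (int j = 0; j < E; j++) {
            int u = edges[j].src, v = edges[j].dest, w = edges[j].weight;
            if (dist[u] != INT_MAX && dist[u] + w < dist[v])
                dist[v] = dist[u] + w;
        }
    }

    // Step 3: one more pass; any further improvement means a cycle.
    for (int j = 0; j < E; j++) {
        int u = edges[j].src, v = edges[j].dest, w = edges[j].weight;
        if (dist[u] != INT_MAX && dist[u] + w < dist[v]) {
            printf("Graph contains negative weight cycle\n");
            return;
        }
    }

    for (int i = 0; i < V; i++)
        printf("vertex %d: distance %d\n", i, dist[i]);
}

int main(void)
{
    // A small example graph with a negative edge but no negative cycle.
    Edge edges[] = { {0, 1, -1}, {0, 2, 4}, {1, 2, 3}, {1, 3, 2},
                     {1, 4, 2}, {3, 2, 5}, {3, 1, 1}, {4, 3, -3} };
    bellmanFord(edges, 5, 8, 0);        // distances: 0 -1 2 -2 1
    return 0;
}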

Example:

Let the given source vertex be 0. Initialize all distances as infinite, except the distance to
the source itself. The total number of vertices in the graph is 5, so all edges must be processed
4 times.
Let all edges be processed in the following order: (B,E), (D,B), (B,D), (A,B), (A,C), (D,C),
(B,C), (E,D). We get the following distances when all edges are processed the first time. The
first row shows the initial distances. The second row shows the distances when edges (B,E),
(D,B), (B,D) and (A,B) are processed. The third row shows the distances when (A,C) is
processed. The fourth row shows the distances when (D,C), (B,C) and (E,D) are processed.

The first iteration guarantees all shortest paths which are at most 1 edge long. We get the
following distances when all edges are processed a second time (the last row shows the final
values).

The second iteration guarantees all shortest paths which are at most 2 edges long. The
algorithm processes all edges 2 more times. The distances are minimized after the second
iteration, so the third and fourth iterations don't update the distances.
4. Backtracking Algorithms

Backtracking is finding the solution of a problem whereby the solution depends on the previous
steps taken. For example, in a maze problem, the solution depends on all the steps you take one-
by-one. If any of those steps is wrong, then it will not lead us to the solution. In a maze problem,
we first choose a path and continue moving along it. But once we understand that the particular
path is incorrect, then we just come back and change it. This is what backtracking basically is.

In backtracking, we first take a step and then we see if this step taken is correct or not i.e.,
whether it will give a correct answer or not. And if it doesn’t, then we just come back and change
our first step. In general, this is accomplished by recursion. Thus, in backtracking, we first start
with a partial sub-solution of the problem (which may or may not lead us to the solution) and
then check if we can proceed further with this sub-solution or not. If not, then we just come back
and change it.

Thus, the general steps of backtracking are:

 start with a sub-solution

 check if this sub-solution will lead to the solution or not

 If not, then come back and change the sub-solution and continue again

N queens on NxN chessboard

One of the most common examples of backtracking is arranging N queens on an NxN
chessboard such that no queen can strike down any other queen. A queen can attack horizontally,
vertically, or diagonally. The solution to this problem is attempted in a similar way: we first
place the first queen anywhere arbitrarily and then place the next queen in any of the safe places.
We continue this process until the number of unplaced queens becomes zero (a solution is found)
or no safe place is left. If no safe place is left, then we change the position of the previously
placed queen.

The above picture shows an NxN chessboard and we have to place N queens on it. So, we will
start by placing the first queen.
Now, the second step is to place the second queen in a safe position and then the third queen.

Now, you can see that there is no safe place where we can put the last queen. So, we will just
change the position of the previous queen. And this is backtracking.
Also, there is no other position where we can place the third queen so we will go back one more
step and change the position of the second queen.

And now we will place the third queen again in a safe position until we find a solution.

We will continue this process and finally, we will get the solution as shown below.

Now that you have understood backtracking, let us code the above problem of placing N
queens on an NxN chessboard using the backtracking method.
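
Here is a minimal version of that code in C for N = 4 (our own illustration; the function names
are assumptions):

#include <stdio.h>
#include <stdbool.h>

#define N 4   // board size; 4 is the smallest size beyond N = 1 with a solution

// A queen is to be placed at board[row][col]. Only columns to the left
// are occupied so far, so we check the row, the upper-left diagonal,
// and the lower-left diagonal.
bool isSafe(int board[N][N], int row, int col)
{
    for (int i = 0; i < col; i++)
        if (board[row][i]) return false;
    for (int i = row, j = col; i >= 0 && j >= 0; i--, j--)
        if (board[i][j]) return false;
    for (int i = row, j = col; i < N && j >= 0; i++, j--)
        if (board[i][j]) return false;
    return true;
}

// Place queens column by column; backtrack when no row is safe.
bool solveNQueens(int board[N][N], int col)
{
    if (col >= N) return true;          // all queens placed: solution found
    for (int row = 0; row < N; row++) {
        if (isSafe(board, row, col)) {
            board[row][col] = 1;        // tentative placement
            if (solveNQueens(board, col + 1))
                return true;
            board[row][col] = 0;        // backtrack: undo and try the next row
        }
    }
    return false;                       // no safe row in this column
}

int main(void)
{
    int board[N][N] = {0};
    if (solveNQueens(board, 0)) {
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++)
                printf("%d ", board[i][j]);
            printf("\n");
        }
    } else {
        printf("No solution exists\n");
    }
    return 0;
}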
Topological Sorting (UNIT 4 graphs)
Topological sorting for Directed Acyclic Graph (DAG) is a linear ordering of vertices such that
for every directed edge uv, vertex u comes before v in the ordering. Topological Sorting for a
graph is not possible if the graph is not a DAG.
For example, a topological sorting of the following graph is “5 4 2 3 1 0”. There can be more
than one topological sorting for a graph. For example, another topological sorting of the
following graph is “4 5 2 3 1 0”. The first vertex in topological sorting is always a vertex with in-
degree as 0 (a vertex with no incoming edges).

In topological sorting, we use a temporary stack. We don't print a vertex immediately; we first
recursively call topological sorting for all its adjacent vertices, and then push it onto the stack.
Finally, print the contents of the stack. Note that a vertex is pushed onto the stack only when all
of its adjacent vertices (and their adjacent vertices, and so on) are already in the stack.
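
A minimal sketch of this in C (our own illustration, using an adjacency matrix for brevity; the
edges in main() assume the standard six-vertex example graph with edges 5->2, 5->0, 4->0,
4->1, 2->3 and 3->1, which matches the orderings quoted above):

#include <stdio.h>
#include <stdbool.h>

#define V 6   // number of vertices in the example graph

// Visit all vertices adjacent to v first, then push v onto the stack.
void topologicalSortUtil(bool adj[V][V], int v, bool visited[],
                         int stack[], int *top)
{
    visited[v] = true;
    for (int u = 0; u < V; u++)
        if (adj[v][u] && !visited[u])
            topologicalSortUtil(adj, u, visited, stack, top);
    stack[++(*top)] = v;                // pushed only after all descendants
}

void topologicalSort(bool adj[V][V])
{
    bool visited[V] = {false};
    int stack[V], top = -1;

    for (int v = 0; v < V; v++)
        if (!visited[v])
            topologicalSortUtil(adj, v, visited, stack, &top);

    while (top >= 0)
        printf("%d ", stack[top--]);    // pop to print the ordering
    printf("\n");
}

int main(void)
{
    bool adj[V][V] = {{false}};
    adj[5][2] = adj[5][0] = adj[4][0] = adj[4][1] = true;
    adj[2][3] = adj[3][1] = true;
    topologicalSort(adj);               /* prints 5 4 2 3 1 0 */
    return 0;
}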

Watch this video: https://youtu.be/Q9PIxaNGnig


Disjoint Set (Or Union-Find) (Detect Cycle in an
Undirected Graph) UNIT 3
A disjoint-set data structure is a data structure that keeps track of a set of elements partitioned
into a number of disjoint (non-overlapping) subsets. A union-find algorithm is an algorithm that
performs two useful operations on such a data structure:
Find: Determine which subset a particular element is in. This can be used for determining if two
elements are in the same subset.
Union: Join two subsets into a single subset.
In this section, we will discuss an application of the Disjoint Set Data Structure: checking
whether a given graph contains a cycle or not.
The Union-Find Algorithm can be used to check whether an undirected graph contains a cycle
or not. (Cycle detection can also be done with DFS; this is an alternative method based on
Union-Find.) Note that this method assumes that the graph doesn't contain any self-loops.
We can keep track of the subsets in a 1D array, let’s call it parent[].
Let us consider the following graph:

For each edge, make subsets using both the vertices of the edge. If both the vertices are in the
same subset, a cycle is found.
Initially, all slots of the parent array are initialized to -1 (meaning there is only one item in
every subset).

Index:   0   1   2
parent: -1  -1  -1

Now process all edges one by one.
Edge 0-1: Find the subsets in which vertices 0 and 1 are. Since they are in different subsets, we
take the union of them. For taking the union, either make node 0 the parent of node 1 or
vice versa.

Index:   0   1   2      <----- 1 is made parent of 0 (1 is now representative of subset {0, 1})
parent:  1  -1  -1

Edge 1-2: 1 is in subset 1 and 2 is in subset 2, so take the union.

Index:   0   1   2      <----- 2 is made parent of 1 (2 is now representative of subset {0, 1, 2})
parent:  1   2  -1

Edge 0-2: 0 is in subset 2 and 2 is also in subset 2. Hence, including this edge forms a cycle.
How is the subset of 0 the same as that of 2? Follow the parent pointers:
0 -> 1 -> 2 (1 is the parent of 0, and 2 is the parent of 1)
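
A minimal sketch of this cycle check in C (our own illustration; the Edge type and function
names are assumptions, and the edges in main() are the three edges of the example graph above):

#include <stdio.h>

#define V 3   // vertices of the example graph
#define E 3   // edges: 0-1, 1-2, 0-2

typedef struct { int src, dest; } Edge;

// Find: follow parent pointers until we reach a root (parent == -1).
int find(int parent[], int i)
{
    while (parent[i] != -1)
        i = parent[i];
    return i;
}

int main(void)
{
    Edge edges[E] = { {0, 1}, {1, 2}, {0, 2} };
    int parent[V] = { -1, -1, -1 };     // every vertex starts in its own subset

    for (int i = 0; i < E; i++) {
        int x = find(parent, edges[i].src);
        int y = find(parent, edges[i].dest);
        if (x == y) {                   // same subset: this edge closes a cycle
            printf("Graph contains cycle\n");
            return 0;
        }
        parent[x] = y;                  // Union: merge the two subsets
    }
    printf("Graph doesn't contain cycle\n");
    return 0;
}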

https://youtu.be/mHz-mx-8lJ8
