Mentor
Ritik Mandal
Abstract
We start with a description of important data structures, which are ways to store and arrange data in computer memory, learning sorting algorithms along the way. We then move on to algorithms which work on different data structures and provide efficient ways to deal with information.
Contents
Mentor ................................................................................................................................................... 1
Asymptotic Function ................................................................................................................................ 3
Recursion Trees ......................................................................................................................................... 3
Master Method ......................................................................................................................................... 3
Insertion Sort.............................................................................................................................................. 4
Merge Sort ................................................................................................................................................ 5
Bubble Sort................................................................................................................................................ 5
Binary Heap .............................................................................................................................................. 5
Heapsort ................................................................................................................................................... 7
Priority Queue ........................................................................................................................................... 7
Quicksort ................................................................................................................................................... 8
Abstract Data Type .............................................................................................................................. 8
Linked Lists ................................................................................................................................................. 8
Stack ....................................................................................................................................................... 11
Infix, Postfix and Prefix ............................................................................................................................ 12
Postfix and Prefix evaluation with Stack ............................................................................................... 13
Infix to Postfix using Stack ...................................................................................................................... 13
Queue ..................................................................................................................................................... 14
Hashing ................................................................................................................................................... 15
Direct Access Tables .......................................................................................................................... 15
Hash Table ........................................................................................................................................... 15
Collision................................................................................................................................................ 16
Graph ...................................................................................................................................................... 17
Graph Representation: ...................................................................................................................... 18
Adjacency Matrix ............................................................................................................................... 18
Adjacency List .................................................................................................................................... 18
Edge List .............................................................................................................................................. 19
Depth First Search .................................................................................................................................. 19
Breadth First Search ............................................................................................................................... 19
Shortest Path Algorithms .................................................................................................................... 20
Bellman – Ford Algorithm ....................................................................................................................... 20
Shortest Path Faster Algorithm .............................................................................................................. 21
Dijkstra’s Algorithm ................................................................................................................................. 21
Floyd – Warshall Algorithm..................................................................................................................... 22
Complete Search Algorithms ................................................................................................................ 23
Backtracking .......................................................................................................................................... 23
Greedy Algorithms ................................................................................................................................. 24
Coin Problem ...................................................................................................................................... 24
Data Compression ................................................................................................................................. 24
Huffman Coding .................................................................................................................................... 25
Dynamic Programming ......................................................................................................................... 25
Coin Problem ...................................................................................................................................... 25
Longest Increasing Subsequence ..................................................................................................... 26
Paths in a Grid ..................................................................................................................................... 26
Knapsack Problems ............................................................................................................................ 27
Binary Indexed Trees – Fenwick Tree .................................................................................................... 27
Segment Trees ........................................................................................................................................ 29
Tree .......................................................................................................................................................... 30
Tree Traversal .......................................................................................................................................... 30
Diameter of Tree Algorithm ................................................................................................................... 30
Spanning Trees .................................................................................................................................... 31
Kruskal’s Algorithm ................................................................................................................................. 31
Union-Find Structure ............................................................................................................................... 31
Prim’s Algorithm ...................................................................................................................................... 32
Sliding Window .................................................................................................................................... 32
Bibliography ............................................................................................................................................ 33
Asymptotic Function
Θ-Notation:
Θ(g(n)) is the set of functions f(n) that are sandwiched between two constant multiples of g(n): there exist positive constants c1, c2 and n0 such that 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0. We then write f(n) ∈ Θ(g(n)), and g(n) under these conditions is considered an asymptotically tight bound for f(n).
O-Notation:
This is known as the asymptotic upper bound, as c·g(n) bounds f(n) from above for all sufficiently large n. It is generally used to give the complexity of the worst case.
Ω-Notation:
Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0 }.
This is known as the asymptotic lower bound: c·g(n) works as a lower bound for f(n). It is generally used to give the time complexity of the best case.
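A worked instance of these definitions, using f(n) = 3n² + 2n:

```latex
3n^2 \le 3n^2 + 2n \le 5n^2 \quad \text{for all } n \ge 1,
\text{so with } c_1 = 3,\ c_2 = 5,\ n_0 = 1:\quad f(n) \in \Theta(n^2),
\text{and therefore also } f(n) \in O(n^2) \text{ and } f(n) \in \Omega(n^2).
```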
Recursion Trees
A recursion tree is useful for visualizing what happens when a recurrence is iterated. It diagrams the tree of recursive calls and the amount of work done at each call. We use it to make a guess about the time complexity of the program; to verify the guess rigorously, we use the Master Theorem.
Example for the recurrence T(n) = 2T(n/2) + n².
The root does n² work, the next level does 2·(n/2)² = n²/2, and each further level halves again, so the total work is a geometric series bounded by a constant times n². Hence T(n) = O(n²).
Master Method
A general method for solving recurrences for time complexity. It is not valid for all recurrences, but is useful for recursive relations of the form T(n) = aT(n/b) + f(n), with a ≥ 1, b > 1, and f(n) an asymptotically positive function.
Master Theorem:
Let a ≥ 1 and b > 1 be constants, let f(n) be a function, and let T(n) be defined on the nonnegative integers by the recurrence T(n) = aT(n/b) + f(n),
where we interpret n/b to mean either floor(n/b) or ceil(n/b). Then T(n) has the following asymptotic bounds:
1. If f(n) = O(n^(log_b a − ε)) for some constant ε > 0, then T(n) = Θ(n^(log_b a)).
2. If f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) · log n) = Θ(f(n) log n).
3. If f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and if a·f(n/b) ≤ c·f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).
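As a worked check, the theorem can be applied to the two recurrences seen so far:

```latex
\text{Merge sort: } T(n)=2T(n/2)+n:\quad a=2,\ b=2,\ n^{\log_b a}=n,\ f(n)=\Theta(n)
\ \Rightarrow\ \text{case 2}\ \Rightarrow\ T(n)=\Theta(n\log n).

\text{Recursion-tree example: } T(n)=2T(n/2)+n^2:\quad f(n)=n^2=\Omega(n^{1+\epsilon}),
\quad 2f(n/2)=\tfrac{n^2}{2}\le\tfrac12 f(n)\ \Rightarrow\ \text{case 3}\ \Rightarrow\ T(n)=\Theta(n^2).
```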
Insertion Sort
Creates a sorted array out of the initial array by reading each element and placing it in sorted position among the already-read elements of the array. Simple, but slow on large inputs, so it is generally used only for small or nearly sorted arrays.
C++ Code:
for(int i=1;i<n;i++){
    //a[0..i-1] is already sorted; insert a[i] into it
    int key=a[i], j=i-1;
    //shift elements larger than key one step to the right
    while(j>=0 && a[j]>key){ a[j+1]=a[j]; j--; }
    a[j+1]=key;
}
Idea:
The elements we have gone through so far are already sorted, so we just have to find the first position where the newly read element fits and insert it there.
Time Complexity:
Best Case O(n)
Average Case O(n²)
Worst Case O(n²)
Space Complexity: O(1)
Merge Sort
Merge Sort is one of the fastest sorting algorithms and a type of Divide & Conquer algorithm. It works by breaking the array into two halves, sorting each half recursively with merge sort, and finally merging the two sorted halves to yield the sorted array.
Idea:
The bigger problem of sorting the whole array is similar to the sub-problem of sorting each half.
Time Complexity:
Best Case O(n log n)
Average Case O(n log n)
Worst Case O(n log n)
Bubble Sort
A very basic sorting algorithm. It goes through the entire array again and again until the array is completely sorted, each time swapping adjacent elements that are in the wrong order.
C++ Code:
for(int j=0;j<len;j++){
    int swaps=0; //counter for swaps
    for(int i=0;i<len-1;i++){
        //swap if not in right order
        if(a[i]>a[i+1]) {swap(a[i],a[i+1]); swaps++;}
    }
    //if no elements were swapped, array is already sorted
    if(swaps==0) break;
}
Idea:
For any two real numbers a, b either a > b or a ≤ b; repeatedly fixing out-of-order adjacent pairs therefore eventually sorts the whole array.
Time Complexity:
Best Case O(n)
Average Case O(n²)
Worst Case O(n²)
Space Complexity: O(1)
Binary Heap
A binary-tree data structure that is a Complete Tree: all levels are completely filled except possibly the last. A binary heap is either a Min Heap or a Max Heap.
Min Heap:
The key at the root node is less than or equal to all other keys in the binary heap, and this holds recursively for every node.
Max Heap:
The key at the root node is greater than or equal to all other keys in the binary heap, and this holds recursively for every node.
Binary Heap Representation:
Generally represented as an array, with A[0] at the root node. For the i-th node A[i]:
A[(i-1)/2] Parent Node
A[2*i+1] Left Child Node
A[2*i+2] Right Child Node
Heapify:
When the root of a binary heap is removed, it is replaced with the last element, thus maintaining the shape property of the heap. Then, among this root node and its two children, the maximum (in the case of a max heap) is swapped with the root. This continues recursively down the affected subtree, restoring the order property of the heap.
If any key is changed, order is restored in the same way: if the key decreased (in a max heap), we sift it down through the subtree rooted at it; if it increased, we sift it up through its chain of parent nodes.
When a key is inserted, it is added as the last element of the heap, thus maintaining the shape property. It is then swapped upwards with its parent as long as it is larger than the parent (in a max heap), until the heap regains its order.
C++ Code:
//root of a min heap is its minimum element
void getMini(int a[MAXSIZE]){ cout<<a[0]; }
Heapsort
One of the fastest sorting algorithms, based on the binary max-heap data structure. It uses the fact that the root node of a max heap is the largest element: the root is exchanged with the current last element of the array, the heap size is reduced by one, and the new root is heapified. This continues until one element is left. Before starting this procedure, the input array is first built into a max heap.
Time Complexity:
Best Case 𝑂(𝑛 log 𝑛)
Average Case 𝑂(𝑛 log 𝑛)
Worst Case 𝑂(𝑛 log 𝑛)
Space Complexity: O(1)
Priority Queue
A priority queue is a data structure that stores priorities (comparable values). It supports inserting new priorities and removing/returning the highest priority. A priority queue can be implemented using different data structures such as an array or a heap.
We use a heap because it is more efficient than the alternatives: deleting or inserting an element takes O(log n) time, compared to O(n) time for an array.
Implementation:
Quicksort
One of the fastest and most used sorting algorithms. A type of Divide and Conquer algorithm. It picks a pivot element and partitions the given array around the pivot, so that smaller elements come before it and larger elements after it. It then quicksorts each part thus created. It works recursively, yet unlike merge sort no stitching of the two halves is required.
Time Complexity:
Best Case 𝑂(𝑛𝑙𝑜𝑔𝑛)
Average Case 𝑂(𝑛𝑙𝑜𝑔𝑛)
Worst Case 𝑂(𝑛2 )
Linked Lists
A linear data structure in which elements are not stored in contiguous memory like an array; instead it consists of nodes, where each node contains a data field and a reference (link/pointer) to the next node in the list. To remember a linked list we only need the address of the head node; the rest can be reached by following the links.
Class Node:
class Node{
public:
int data;
Node* next;
};
Array vs Linked List:

Cost of accessing an element:
- Array: O(1), as the address of the j-th element is the base address + j * size of each element.
- Linked List: O(n), as we have to traverse through the elements, each one in the worst case.

Memory requirements:
- Array: fixed size; unused memory remains; memory may not be available as one large block.
- Linked List: no unused memory, but extra memory for pointer variables; memory available as multiple small blocks suffices.

Cost of inserting an element:
- Array: at beginning O(n); at end O(1) if the array is not full, else O(n); at middle O(n).
- Linked List: at beginning O(1); at end O(n); at middle O(n).

Cost of deleting an element:
- Array: at beginning O(n); at end O(1); at middle O(n).
- Linked List: at beginning O(1); at end O(n); at middle O(n).

Ease of use: arrays are better; linked lists not as good.
Insertion of Nodes:
This is done by first reaching the node after which we wish to add the new node, then changing the pointer of the previous node to point to the location of this new node, and setting the next pointer of the new node to contain the address of the following node.
//adds new node at beginning of list
void push(Node** head_ref, int new_data){
Node* new_node = new Node();
new_node->data = new_data;
new_node->next = (*head_ref);
(*head_ref) = new_node;
}
//adds new node at end of list
void append(Node** head_ref, int new_data){
    Node* new_node = new Node();
    Node* last = (*head_ref);
    new_node->data = new_data;
    new_node->next = NULL;
    //if list is empty, the new node becomes the head
    if(*head_ref == NULL){ *head_ref = new_node; return; }
    //otherwise traverse to the last node and link it
    while(last->next != NULL) last = last->next;
    last->next = new_node;
}
Deletion of Nodes:
To delete the node with some given data 'key', we first find that node and then shift the pointer of the previous node to point to the address of the node after it. We can design similar code to delete the node at some given position. The code below is for a node with a given 'key' value.
//delete node with 'key' data
void deleteNode(Node** head_ref, int key){
    Node *temp = *head_ref, *prev = NULL;
    if(temp==NULL) return;
    if(temp->data==key){ *head_ref=temp->next; delete temp; return; }
    while(temp!=NULL && temp->data!=key){ prev=temp; temp=temp->next; }
    if(temp==NULL) return; //key not found
    prev->next = temp->next; //unlink the node
    delete temp;
}
Final Comments:
As we have seen, in some cases linked lists work quite efficiently by using memory strategically, which makes them important data structures. They are used frequently for operations such as merging two lists or swapping two elements; these operations are performed quite easily by just changing the next-node addresses of the involved nodes. Following the implementations above, we can define other operations like reversing, searching, et cetera.
Stack
A stack is a linear data structure which follows a particular order in which operations are performed. The order is Last In First Out (LIFO), equivalently First In Last Out (FILO).
Main Operations:
1. Push – Adds an item to the stack if the stack is not full, else an Overflow condition. Time Complexity O(1).
2. Pop – Removes the most recently added item from the stack. If the stack is empty, an Underflow condition. Time Complexity O(1).
3. Peek / Top – Returns the top element of the stack. Time Complexity O(1).
4. isEmpty – Returns true if the stack is empty, else false. Time Complexity O(1).
Stacks can be implemented using: 1. Array – simpler, as no pointers are involved, but memory is fixed. 2. Linked List – better in that there is no memory restriction.
class Stack{
    int a[MAXSIZE];
    int top = -1; //index of current top element
public:
    int pop();
    bool isEmpty();
    int peek();
};
int Stack::pop(){
    if(top<0) {cout<<"Stack Underflow"; return 0;}
    else return a[top--]; //return top element, then shrink
}
bool Stack::isEmpty(){
    return (top<0);
}
int Stack::peek(){
    if(top<0) {cout<<"Stack Empty"; return 0;}
    else return a[top];
}
Stack Implementation using Linked List:
struct Node{
    int data;
    Node* link;
};
Node* top = NULL; //top of the stack
bool isEmpty(){ return top==NULL; }
void push(int data){
    Node* temp = new Node();
    temp->data = data;
    temp->link = top; //new node points to old top
    top = temp;
}
int peek(){
    if(!isEmpty()) return top->data;
    else exit(1);
}
void pop(){
    if(isEmpty()){
        cout<<"Empty Stack"<<endl;
        exit(1);
    }
    Node* temp = top;
    top = top->link;
    delete temp; //free the removed node
}
Final Comments:
Stacks are very useful in infix to postfix or infix to prefix conversion, and are also used in backtracking-based algorithms.
Infix, Postfix and Prefix
Prefix Expression:
<Operator><Operand><Operand>. This is much faster to evaluate than an infix expression, as it saves the time and space needed for associativity rules and parentheses: prefix notation is unambiguous without them.
Postfix Expression:
<Operand><Operand><Operator>. Has similar time and space complexity to a prefix expression, but is more widely used because the algorithms involving it are more intuitive and thus easier to implement.
Prefix Evaluation:
Works exactly like postfix evaluation, except that the expression is scanned from right to left.
Infix to Postfix using Stack
Pseudo Code:
//assumption: this doesn't handle any error condition or parentheses
for i <- 0 to length(exp)-1 {
    if (exp[i] is operand)
        res <- res + exp[i];
    else {
        //pop operators of higher precedence first
        while(!s.empty() && HasHigherPrec(s.top(), exp[i])){
            res <- res + s.top();
            s.pop();
        }
        s.push(exp[i]);
    }
}
//pop any remaining operators
while (!s.empty()){
    res <- res + s.top();
    s.pop();
}
return res;
Queue
A queue is a First In First Out (FIFO) data structure. Theoretically, a queue can be considered a list which is open from both sides, unlike a stack: we enter the queue from one side and remove elements from the other side.
A queue should perform these basic operations:
1. Enqueue(x) or Push(x): Increases the size of the queue by adding x at one side (the rear). O(1) time complexity.
2. Dequeue() or Pop(): Decreases the size of the queue by removing the element at the other side (the front). O(1) time complexity.
Array Implementation:
We keep a front and a rear index into an array. To dequeue, we increase the front index by 1. To enqueue, we add an element at the rear end of the queue and increase the rear index. This assumes front index ≤ rear index.
//front and rear are passed by reference so the updates persist
void Enqueue(int queue[], int val, int n, int &front, int &rear){
    if(rear==n-1) cout<<"Queue Overflow"<<endl;
    else{
        if(front==-1) front=0;
        cout<<"Insert element in queue: "<<val<<endl;
        rear++; queue[rear]=val;
    }
}
Linked List Implementation:
Overcomes the limited-space problem of the array. We keep pointers to both the front and the rear end of the queue, so that adding or removing an element always takes constant time.
void Dequeue(){
    struct Node* temp=front;
    if(front==NULL) return; //queue empty
    if(front==rear) front=rear=NULL; //last element removed
    else front=front->next;
    free(temp);
}
Hashing
Hashing is an efficient way of storing data. As an example, suppose we wanted to store a directory with information about particular phone numbers. We could create a hash map such that when a given phone number is entered, it returns the address of the location where the data related to it is stored.
Advantages:
O(1) time complexity for tasks like adding, deleting and searching a phone number or any other key-based input.
Disadvantages:
A direct access table would have to be of size |U| (the size of the key universe) while only a few keys might be used, so a large amount of memory would be wasted.
Moreover, the memory on a computer might not be large enough to store a table of size |U| at all.
Hash Table
A hash table is generally used when the set of keys actually being used is much smaller than |U|. We store the element for key k at the slot h(k), where h is known as the Hash Function:
h: U → {0, 1, ..., m − 1}
The table size m is generally much smaller than |U|.
Collision
When two keys hash to the same slot, the result is a collision. This happens because m < |U|. We minimize collisions by choosing a hash function that spreads the keys as randomly as possible.
Chaining
The simplest resolution for collisions. We place all the elements that hash to the same slot into a linked list. This is good enough, but some parts of the table may remain unused, and if a linked list gets long, searching or deleting an element may take considerable time.
Performance of Chaining
Given m slots with n keys to be inserted, we use the load factor α = n/m, which gives the average length of a linked list, assuming each key is equally likely to be hashed to any slot of the table.
1. Time to search O(1 + α)
2. Time to delete O(1 + α)
3. Time to insert O(1)
For storing chains (of length l) we could use:
1. Linked Lists – not cache friendly, with O(l) search and delete time complexity.
2. Dynamic Sized Arrays – cache friendly; time complexities same as linked lists.
3. Self-Balancing BSTs – not cache friendly; insert, search and delete in O(log l).
Open Addressing
All elements are stored in the hash table itself, so the number of slots must be at least the number of keys stored. Done by
(a) Linear Probing
(b) Quadratic Probing
(c) Double Hashing
Linear Probing
We simply probe linearly for the next slot: if after applying the hash function we get some index h(k) and that slot is occupied, we keep moving to the next slot until an empty one is found.
Quadratic Probing
Unlike linear probing, where we go to the next slot if the hashed-to slot is filled, here we go to slot h(k) + i² in the i-th iteration.
Double Hashing
We have another hash function hash2(x), and if the hashed-to slot is occupied we go to slot hash(x) + i·hash2(x) in the i-th iteration.
Graph
A graph is a non-linear data structure consisting of nodes and edges. Nodes are also referred to as vertices, and edges are the lines or arcs that connect any two nodes in the graph.
An edge is represented by an ordered pair (u, v), referring to an edge from node u to node v. Edges can be bidirectional or unidirectional. Edges can also carry weights, and are then known as weighted edges.
Path
A path is an ordered set of edges to reach from node a to node b. The length of a path is generally the number of edges in it, or the sum of the weights on its edges. A path is a cycle if its first and last nodes are the same. A path is simple if each node appears at most once in it.
Connectivity
A graph is connected if for any two nodes there exists a path between them. When a graph is not connected, the connected parts of the graph are called components.
Tree
A connected graph with n vertices and n − 1 edges, placed such that there is a unique path from any vertex u to any vertex v.
Graph Colouring
Each node is assigned a colour such that no two adjacent nodes have the same colour.
Bipartite Graph: A graph is bipartite if it is possible to colour it using two colours.
Bipartite Condition
A graph has no cycle containing an odd number of edges ⇔ the graph is bipartite.
Graph Representation:
1. Adjacency Matrix
2. Adjacency List
3. Edge List
Adjacency Matrix
We have a matrix A of size v × v, where v represents the number of nodes in the graph and element a[i][j] of A represents the weight or cost of edge (i, j). 0 can be used to represent edges which don't exist, while giving existing edges some non-zero cost.
Pros:
1. Easier to implement.
2. Adding or removing an edge in O(1) time.
Cons:
1. Consumes O(v²) space even if the graph is sparse.
2. Adding a vertex takes O(v²) time.
Adjacency List
Each node u is assigned an adjacency list that contains the nodes to which there is an edge from u. The most popular representation of graphs.
We use an array of vectors, which for each node holds the numbers of the nodes connected to it. In a bidirectional graph, each edge appears in both endpoints' lists to signify that both directions are possible. In the case of weighted edges, each list element is a pair (b, w), where b is the node this edge leads to and w is the weight of the edge.
vector<int> adj[N]; //N large enough, non-weighted graph
vector< pair<int, int> > adj[N]; //weighted graph
adj[1].push_back({2,5});
adj[2].push_back({3,7});
adj[2].push_back({4,6});
adj[3].push_back({4,5});
adj[4].push_back({1,2});
Pros:
1. Saves space: O(|V| + |E|). Only in the worst case is the space O(v²).
2. Adding a vertex is much easier.
Cons:
1. Queries like whether there is an edge from vertex u to vertex v are not efficient, taking O(V) time.
Edge List
This method focuses on the edges themselves and suits algorithms that don't need the edges starting at a particular node. We create a vector of tuples (a, b, w), each representing an edge of weight w from vertex a to vertex b.
Depth First Search
DFS explores the graph by always following a single path as deep as possible, backtracking when there are no unvisited nodes left to move to.
Implementation:
vector<int> adj[N]; // non-weighted graph
bool visited[N]; //initially initialized to false for all
In the case of a disconnected graph, after completing DFS for a given random starting node, start again with some other unvisited node, and continue until all nodes are visited.
Breadth First Search
Implementation:
queue<int> q; //queue of nodes in order of increasing distance
//from some starting node; the node at the front is processed next,
//and new nodes are appended to the end of the queue
bool visited[N];
int distance[N];
BFS is generally used where distances from a given start node are needed, as it computes the distance from the starting node to every reachable node. In the case of a disconnected graph, for each component we need to start again from some unvisited node after the queue empties.
Applications:
1. Checking for cycles: if during traversal we reach a node all of whose neighbours are already visited (other than the one we came from), we have just completed a cycle; alternatively, if a component has c nodes and more than c − 1 edges, it must contain a cycle.
2. Bipartiteness checking: traverse the graph giving the starting node some colour, all its adjacent nodes the opposite colour, and so on; if at some point an adjacent node has the same colour as the current node, the graph is not bipartite.
Bellman–Ford Algorithm
Idea:
A process repeats (n − 1) times, where in each round it goes through all the edges and tries to reduce the distances. An array distance stores all the current distances from the starting node. We repeat (n − 1) times because in the worst case each shortest path can contain at most (n − 1) edges. The graph must not contain negative cycles, as then a length could be reduced infinitely by repeating the cycle.
Implementation:
#define INF 100000
//uses edge list of tuples <a,b,w>: edge from a to b with weight w
for (int i=1;i<=n;i++) distance[i]=INF;
distance[x]=0; //starting node
for(int i=1;i<=n-1;i++){
    for(auto e : edges){
        //iterate through all edges, relaxing each one
        int a,b,w;
        tie(a,b,w)=e;
        distance[b]=min(distance[b],distance[a]+w);
    }
}
Remark: In practice the distances often stop changing before (n − 1) rounds, so we can make the algorithm faster by stopping as soon as a full round causes no reduction. To detect a negative cycle we run for n rounds: if the last round still reduces some length, there is a negative cycle.
Dijkstra’s Algorithm
A far more efficient algorithm for calculating the shortest distance of each node from some starting node, provided the graph has no negative-weight edges.
Idea:
Based on the fact that with no negative-weight edges, the distance of node b from the starting node is at least as large as the distance of node a, assuming the fastest way to reach node b is starting node → node a → node b. Thus once the node with the smallest tentative distance is processed, its distance is final.
Implementation:
The graph is stored as an adjacency list. We push negated distances because a priority queue by default returns the maximum element, but we need the minimum-distance node.
priority_queue< pair<int,int> > q; //(-distance, node)
for(int i=1;i<=n;i++) distance[i]=INF;
distance[x]=0; q.push({0,x}); //starting node
while(!q.empty()){
    int a=q.top().second; q.pop();
    if(processed[a]) continue;
    processed[a]=true;
    for(auto u : adj[a]){
        int b=u.first, w=u.second;
        if(distance[a]+w<distance[b]){
            distance[b]=distance[a]+w;
            q.push({-distance[b],b});
        }
    }
}
Floyd–Warshall Algorithm
Finds the shortest paths between all pairs of nodes in a single run. It maintains a distance matrix, initially built from the adjacency matrix, which it improves by allowing each node in turn as an intermediate node.
Implementation:
//constructing the initial distance matrix from the adjacency matrix
for(int i=1; i<=n; i++){
    for(int j=1;j<=n;j++){
        if(i==j) distance[i][j]=0;
        else if(adj[i][j]) distance[i][j]=adj[i][j];
        else distance[i][j]=INF;
    }
}
//allow each node k as an intermediate node in turn
for(int k=1;k<=n;k++)
    for(int i=1;i<=n;i++)
        for(int j=1;j<=n;j++)
            distance[i][j]=min(distance[i][j], distance[i][k]+distance[k][j]);
Backtracking
A type of complete search method where we begin with an empty solution and extend the solution step by step. The search recursively goes through all the different ways in which a solution can be constructed, one piece at a time.
Many problems first attacked with backtracking can be solved much more efficiently using greedy algorithms or dynamic programming. Unlike plain recursion, where we call a function until we reach a base case, in backtracking we use recursion to explore every possible partial solution, killing off those along the way which cannot lead to a valid solution.
N-Queens Problem:
A classic case of backtracking: find the number of ways to place n queens on an n×n chessboard such that no two queens attack each other.
We solve the n = 4 case. In step 1 we create all 4 possible subcases of placing a queen in row 1, then go on to place further queens row by row, discarding invalid partial solutions,
and so on.
Implementation:
We can't place the next queen on a column or diagonal that is already attacked. To check this quickly, each column and each of the two diagonal directions is assigned an index: for a square in column x of row y, the two diagonals have indices x + y and x − y + n − 1.
Code:
int n, count=0;
bool column[N], diag1[2*N], diag2[2*N]; //occupied columns and diagonals
void search(int y){
    if(y==n) {count++; return;} //all n rows filled: one valid placement
    for(int x=0;x<n;x++){
        //skip squares already attacked by a previous queen
        if(column[x] || diag1[x+y] || diag2[x-y+n-1]) continue;
        column[x] = diag1[x+y] = diag2[x-y+n-1] = 1; //place queen at (x,y)
        search(y+1);
        column[x] = diag1[x+y] = diag2[x-y+n-1] = 0; //backtrack: remove queen
    }
}
Explanation:
1. Start at the topmost row.
2. If all queens are placed, count this placement as a solution.
3. Try all columns in the current row, and do this for each row:
(a) If the queen can be safely placed, mark it as part of the solution and recursively move on to the next row.
(b) If placing this queen leads to a solution, record it.
(c) Otherwise, unmark this square and go back to (a) to try other columns.
4. If all columns have been tried without success, backtrack to the previous row.
Greedy Algorithms
A greedy algorithm constructs a solution to a problem by just making the best choice at any given moment. The challenge is constructing the solution such that the local optimum leads to the global optimum.
Coin Problem
Given the set of coins {1, 2, 5, 10, 20, 50, 100, 200}, each available in infinite supply, find a way to make the sum 520 with the minimum number of coins.
Answer: 200 + 200 + 100 + 20. The greedy method (always take the largest coin that fits) in fact always works for this coin set. In the general case, for some other set of coins, it may fail.
Counter-example: with coins {1, 3, 4}, to make 6 the greedy method gives 4 + 1 + 1, but the answer is 3 + 3. Such cases are handled by dynamic programming.
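The greedy method above can be sketched as follows (the function name greedyCoins is ours, chosen for illustration): repeatedly take the largest coin that still fits into the remaining sum.

```cpp
#include <algorithm>
#include <vector>

// Greedy coin selection: repeatedly take the largest coin that still fits.
// This works for the coin set above, but not for arbitrary sets like {1,3,4}.
int greedyCoins(int x, std::vector<int> coins) {
    std::sort(coins.rbegin(), coins.rend()); // largest coin first
    int count = 0;
    for (int c : coins) {
        while (x >= c) { x -= c; count++; }
    }
    return count;
}
```

For 520 with the coin set above this returns 4 (the coins 200, 200, 100, 20), while for 6 with {1, 3, 4} it returns the suboptimal 3 (the coins 4, 1, 1).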
Data Compression
Data compression assigns codewords to the characters of a string so that the space consumed is smaller. A binary code assigns each character a codeword consisting of bits. We want to be able to reconstruct the original string from the codewords, and to design the codewords so that the coded string takes minimum space; thus we assign shorter codewords to characters that appear more often in the string.
Huffman Coding
Huffman coding is a greedy algorithm that constructs an optimal code for compressing a given string. The algorithm creates a binary tree based on the frequency of each character in the string, and each character's codeword is read by moving from the root node to the corresponding leaf: each move to the right corresponds to '1' and each move to the left to '0'.
Initially each character is assigned a node with a weight equal to the frequency of its appearance in the string. Then at each step, the two nodes with minimum weights are combined into a new node whose weight equals the sum of the weights of its children.
Example:
The string AABACDACA has weights A – 5, B – 1, C – 2, D – 1. Building the tree and reading off the root-to-leaf paths gives the codewords: A receives a 1-bit codeword, C a 2-bit codeword, and B and D 3-bit codewords, so the most frequent character gets the shortest code.
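The construction above can be sketched as follows (the names Node, huffman and collect are ours, chosen for illustration): a priority queue repeatedly combines the two lightest nodes, and codewords are read off the finished tree.

```cpp
#include <map>
#include <queue>
#include <string>
#include <vector>

// Minimal Huffman sketch: build the tree from character frequencies,
// then read each codeword on the path from the root to the leaf
// (left edge = '0', right edge = '1').
struct Node {
    int weight;
    char ch;          // meaningful only for leaves
    Node *left = nullptr, *right = nullptr;
};

struct Cmp { // min-heap ordering: smallest weight on top
    bool operator()(Node* a, Node* b) const { return a->weight > b->weight; }
};

void collect(Node* n, std::string code, std::map<char, std::string>& out) {
    if (!n->left) { out[n->ch] = code.empty() ? "0" : code; return; }
    collect(n->left,  code + "0", out);
    collect(n->right, code + "1", out);
}

std::map<char, std::string> huffman(const std::map<char, int>& freq) {
    std::priority_queue<Node*, std::vector<Node*>, Cmp> pq;
    for (auto [c, w] : freq) pq.push(new Node{w, c});
    while (pq.size() > 1) {
        Node* a = pq.top(); pq.pop();
        Node* b = pq.top(); pq.pop();
        pq.push(new Node{a->weight + b->weight, 0, a, b}); // combine lightest pair
    }
    std::map<char, std::string> codes;
    collect(pq.top(), "", codes);
    return codes;
}
```

For the frequencies of AABACDACA this produces codeword lengths 1 for A, 2 for C, and 3 for B and D, whatever way ties are broken.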
Dynamic Programming
Dynamic programming combines the correctness of complete search with the efficiency of greedy algorithms. It can be used when the problem can be divided into overlapping subproblems that can be solved independently.
Uses:
Coin Problem
The coin problem is not always solved correctly by greedy algorithms, so we use dynamic programming instead. This uses a recursive function that goes through all possibilities to form the sum, like brute force, but it is more efficient because it calculates the answer to each subproblem only once (memoization).
Idea:
solve(x) = min(solve(x − c) + 1) over all c in the set of coins.
This recursive formulation can be used directly, but it is inefficient because many values are calculated repeatedly. Thus, once solve(x) has been computed, we store it in an array so that it is not solved again later.
Implementation 1:
// ready[x] tells if we have already calculated the value of solve(x);
// if we have, we return that value,
// else we calculate it and update ready[x] and value[x]
int solve(int x) {
    if (x < 0) return INF;
    if (x == 0) return 0;
    if (ready[x]) return value[x];
    int best = INF;
    for (auto c : coins) best = min(best, solve(x - c) + 1);
    value[x] = best;
    ready[x] = true;
    return best;
}
Implementation 2:
Based on the same idea, we can calculate the values of solve(x) for all x iteratively:
value[0] = 0;
for (int x = 1; x <= n; x++) {
    value[x] = INF;
    for (auto c : coins) {
        if (x - c >= 0) value[x] = min(value[x], value[x-c] + 1);
    }
}
Longest Increasing Subsequence
Given an array, find the length of the longest subsequence whose elements are in strictly increasing order. Here length[k] is the length of the longest increasing subsequence ending at position k.
Implementation:
for (int k = 0; k < n; k++) {
    length[k] = 1;
    for (int i = 0; i < k; i++) {
        if (array[i] < array[k]) length[k] = max(length[k], length[i] + 1);
    }
}
Paths in a Grid
Given an n×n grid with a value in each square, find a path from the top-left to the bottom-right square such that the sum of the values on the squares it traverses is maximum, where we can only move down or right.
Solution:
sum(y, x) = max(sum(y, x − 1), sum(y − 1, x)) + value[y][x]
Rows and columns are numbered from 1 to n, and sum(y, x) = 0 when y = 0 or x = 0, which is the base case. The values can then be computed iteratively.
Implementation:
for (int y = 1; y <= n; y++) {
    for (int x = 1; x <= n; x++) {
        sum[y][x] = max(sum[y][x-1], sum[y-1][x]) + value[y][x];
    }
}
Knapsack Problems
Knapsack refers to problems where a set of objects is given and we have to find subsets with some property.
Example Problem:
Given a set of numbers, find all sums that can be constructed by taking a subset.
This reduces to the recurrence
possible(x, k) = possible(x − w[k], k − 1) or possible(x, k − 1)
where possible(x, k) tells whether a sum of x is possible using the first k elements of the sequence: the sum either involves the weight at the k-th index or it does not. The base case is
possible(x, 0) = true if x = 0, false otherwise.
Implementation 1:
// W is the sum of all elements in the set
possible[0][0] = true;
for (int k = 1; k <= n; k++) {
    for (int x = 0; x <= W; x++) {
        if (x - w[k] >= 0) possible[x][k] |= possible[x-w[k]][k-1];
        possible[x][k] |= possible[x][k-1];
    }
}
Fenwick Tree (Binary Indexed Tree)
Overview:
Let f be the sum function. Given an array A[0 … N−1], a Fenwick tree is an array T[0 … N−1] where each element is the sum of the elements of A in some range [g(i), i]:
T_i = A[g(i)] + A[g(i)+1] + … + A[i],
where g satisfies 0 ≤ g(i) ≤ i.
There are many ways to choose g(i). If we choose g(i) = 0 (zero-based indexing), any prefix sum can be returned in O(1) time, but updates are slow.
Fenwick's Algorithm:
Fenwick's genius is in the definition of g(i): replace all trailing 1 bits in the binary representation of i with 0 bits, that is, g(i) = i & (i + 1). In one-indexed form this is T_k = A[k+1−p(k)] + … + A[k], where p(k) is the largest power of 2 that divides k.
Using this tree, any sum_q(1, k) can be calculated in O(log n) time, because the range [1, k] can always be divided into O(log n) ranges whose sums are stored in the tree. Since each element belongs to O(log n) ranges in the binary indexed tree, it suffices to update O(log n) values in the tree.
Implementation:
// calculates sum_q(1,k)
int sum(int k) {
    int s = 0;
    while (k >= 1) {
        s += tree[k];
        k -= k & -k;
    }
    return s;
}
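The matching update operation can be sketched as follows (add is our name for it, and N is an assumed maximum size); it mirrors sum, but moves upward by adding the lowest set bit instead of removing it.

```cpp
const int N = 100;
int tree[N + 1]; // 1-indexed binary indexed tree, initially all zero

// add: increase A[k] by x; every tree range containing k is updated.
// k & -k is the lowest set bit of k, i.e. p(k) from the text.
void add(int k, int x) {
    while (k <= N) {
        tree[k] += x;
        k += k & -k;
    }
}

// sum_q(1,k), repeated here so the sketch is self-contained
int sum(int k) {
    int s = 0;
    while (k >= 1) {
        s += tree[k];
        k -= k & -k;
    }
    return s;
}
```

Both operations touch O(log n) tree entries, which is what makes the structure efficient for mixed updates and prefix-sum queries.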
Segment Trees
A segment tree supports both range queries and point updates. Range queries may answer maximum, minimum or sum queries in O(log n) time. It has a higher memory requirement than a Fenwick tree.
Implementation:
Assume the array size n is a power of 2. An array of size 2n is stored to represent the tree: tree[1] is the top node, tree[2] and tree[3] are its children, and so on, while tree[n] to tree[2n − 1] represent the actual array. The parent of tree[k] is tree[⌊k/2⌋], and its children are tree[2k] and tree[2k + 1].
// calculates sum_q(a,b)
int sum(int a, int b) {
    a += n; b += n;
    int s = 0;
    while (a <= b) {
        if (a % 2 == 1) s += tree[a++]; // a is a right child: take it, move right
        if (b % 2 == 0) s += tree[b--]; // b is a left child: take it, move left
        a /= 2; b /= 2;
    }
    return s;
}
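A point update can be sketched as follows (assuming 0-indexed positions and the array layout described above); after writing the new value into the leaf, each ancestor on the way to the root is recomputed from its two children.

```cpp
const int n = 8;  // array size, assumed a power of 2 as in the text
int tree[2 * n];  // tree[n..2n-1] is the array itself, initially zero

// Point update: set position k (0-indexed) to value u,
// then recompute every ancestor up to the root.
void update(int k, int u) {
    k += n;
    tree[k] = u;
    for (k /= 2; k >= 1; k /= 2)
        tree[k] = tree[2*k] + tree[2*k + 1];
}

// sum_q(a,b), repeated here so the sketch is self-contained
int sum(int a, int b) {
    a += n; b += n;
    int s = 0;
    while (a <= b) {
        if (a % 2 == 1) s += tree[a++];
        if (b % 2 == 0) s += tree[b--];
        a /= 2; b /= 2;
    }
    return s;
}
```

Both operations walk one root-to-leaf path, so each runs in O(log n) time.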
Other Queries:
Define the internal nodes of the segment tree so that each node contains the query operation (for example minimum or maximum) applied over its range, whose size is a power of 2. Then, for any asked range, the answer can be calculated in O(log n) time, in the same way the sum is calculated.
Tree
A tree is a connected, acyclic graph containing n nodes and n − 1 edges. There is a unique path between any two nodes of a tree; removing any edge divides the tree into two components, and adding any edge creates a cycle.
Tree Traversal
Trees are traversed by a general depth-first search started at an arbitrary node. The dfs function is given the starting node and the previous node (if any) so that the search does not revisit the node it came from.
Implementation:
// adj[s] is a vector containing all nodes connected by a direct edge to s
void dfs(int s, int e) {
    for (auto u : adj[s]) if (u != e) dfs(u, s);
}
Diameter – The maximum length of a path between any two nodes. There may be multiple maximum-length paths.
Two-DFS Algorithm:
First choose an arbitrary node a in the tree and find the farthest node b from a, then find the farthest node c from b. The diameter of the tree is the distance between b and c.
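The two-DFS idea can be sketched as follows (farthest and diameter are our names; adjTree is an adjacency list like the one used in the traversal code).

```cpp
#include <utility>
#include <vector>

// Adjacency list of the tree, nodes numbered from 1.
std::vector<int> adjTree[100];

// Returns (distance, node) of the farthest node from s,
// never walking back to the previous node e.
std::pair<int,int> farthest(int s, int e) {
    std::pair<int,int> best = {0, s};
    for (int u : adjTree[s]) {
        if (u == e) continue;
        auto [d, v] = farthest(u, s);
        best = std::max(best, std::make_pair(d + 1, v));
    }
    return best;
}

int diameter(int root) {
    int b = farthest(root, 0).second; // farthest node from an arbitrary node
    return farthest(b, 0).first;      // distance to the farthest node from b
}
```

Each call to farthest is a single DFS, so the whole computation is O(n).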
Spanning Trees
A spanning tree of a graph consists of all the nodes of the graph and a subset of its edges such that there is a path between any two nodes.
Weight of a spanning tree: the sum of its edge weights. A minimum spanning tree is a spanning tree of minimum weight.
Kruskal's Algorithm
Sort all the edges in order of increasing weight and initially set up a forest with just the nodes and no edges. Then keep adding the next smallest edge as long as the tree remains acyclic. Kruskal's algorithm is based on a greedy strategy.
Implementation:
// edge list representation
// sort the m edges in O(m log m) time
for (...) {
    if (!same(a,b)) unite(a,b);
}
// same(a,b) tells if a and b are in the same component
// unite(a,b) joins the components containing a and b
same(a, b) and unite(a, b) can be implemented using the Union-Find data structure in O(log n) time.
Union-Find Structure
This maintains a collection of disjoint sets. Two O(log n) operations are supported: the unite operation joins two sets, and the find operation finds the representative of the set containing a given element.
Implementation:
Using arrays: the array link contains, for each element, the next element in the chain, or the element itself if it is a representative; the array size gives the size of each set.
// initially each element is a separate set
for (int i = 1; i <= n; i++) link[i] = i;
for (int i = 1; i <= n; i++) size[i] = 1;
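The find, same and unite operations described above can be sketched as follows (we call the size array sz here to avoid clashing with std::size).

```cpp
#include <algorithm>

const int N = 100;
int link[N + 1]; // link[i] = next element in the chain (or i itself)
int sz[N + 1];   // sz[i] = size of the set whose representative is i

// find: follow the chain until an element points to itself.
int find(int x) {
    while (x != link[x]) x = link[x];
    return x;
}

bool same(int a, int b) { return find(a) == find(b); }

// unite: attach the smaller set under the larger one (union by size),
// which keeps every chain O(log n) long.
void unite(int a, int b) {
    a = find(a); b = find(b);
    if (sz[a] < sz[b]) std::swap(a, b);
    sz[a] += sz[b];
    link[b] = a;
}
```

Union by size is what guarantees the O(log n) bound: an element's chain only grows when its set is merged into one at least as large, which can happen at most O(log n) times.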
Prim's Algorithm
Prim's algorithm first chooses an arbitrary node, then repeatedly chooses the minimum-weight edge that adds a new node to the tree. When all nodes have been added, the result is a minimum spanning tree.
Implementation:
The algorithm is implemented with a priority queue that contains all nodes which can be connected to the current component using a single edge, ordered by increasing edge weight.
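A sketch of this priority-queue implementation (the names adjW, inTree and primMST are ours): stale queue entries for nodes already in the tree are simply skipped when they surface.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

const int MAXN = 100;
// adjW[u] holds (weight, neighbour) pairs; nodes are numbered 1..n.
std::vector<std::pair<int,int>> adjW[MAXN];
bool inTree[MAXN];

int primMST(int start) {
    // min-heap of (edge weight, node) candidates
    std::priority_queue<std::pair<int,int>,
                        std::vector<std::pair<int,int>>,
                        std::greater<>> pq;
    int total = 0;
    pq.push({0, start});
    while (!pq.empty()) {
        auto [w, u] = pq.top(); pq.pop();
        if (inTree[u]) continue;  // stale entry: u was added earlier
        inTree[u] = true;
        total += w;               // weight of the edge that added u
        for (auto [wt, v] : adjW[u])
            if (!inTree[v]) pq.push({wt, v});
    }
    return total;                 // weight of the minimum spanning tree
}
```

Every edge is pushed at most twice, giving O(m log m) time overall.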
Sliding Window
A sliding window is a constant-size subarray that moves from left to right through the array. At each window position, we need to calculate some information about the elements inside the window, such as the minimum or the sum.
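As one worked example of window information, the minimum of each window can be maintained with a deque of candidate indices (a standard technique; windowMinima is a name we chose for illustration).

```cpp
#include <deque>
#include <vector>

// Sliding window minimum: the deque keeps indices of elements that could
// still become the minimum, in increasing order of value; the front is
// always the minimum of the current window of size k.
std::vector<int> windowMinima(const std::vector<int>& a, int k) {
    std::deque<int> q;        // candidate indices, front = current minimum
    std::vector<int> res;
    for (int i = 0; i < (int)a.size(); i++) {
        // drop candidates that are at least as large as the new element
        while (!q.empty() && a[q.back()] >= a[i]) q.pop_back();
        q.push_back(i);
        if (q.front() <= i - k) q.pop_front(); // front index left the window
        if (i >= k - 1) res.push_back(a[q.front()]);
    }
    return res;
}
```

Each index is pushed and popped at most once, so all window minima are found in O(n) total time.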
Bibliography
With the last topic finished, this ends my journey for the Summer of Science project based on Data Structures and Algorithms. I would like to thank my mentor, Ritik Mandal, who was very supportive in whatever way we wanted to make our report and did not impose any form of restriction on us, which perfectly suited my working style.
I took references from:
1. GeeksforGeeks
2. Introduction to Algorithms – Thomas H. Cormen et al.
3. Competitive Programmer's Handbook – Antti Laaksonen
4. YouTube Channel – MyCodeSchool