
DATA STRUCTURES AND ALGORITHMS

Amritaansh Narain Yadav


200100022
Undergraduate, Department of Mechanical Engineering
Indian Institute of Technology Bombay

Mentor
Ritik Mandal

Abstract
We start with a description of important data structures, which are ways to store
and arrange data in computer memory, learning sorting algorithms along the way.
We then move on to algorithms which work on these data structures and provide
efficient ways to deal with information.

Contents
Mentor
Asymptotic Function
Recursion Trees
Master Method
Insertion Sort
Merge Sort
Bubble Sort
Binary Heap
Heapsort
Priority Queue
Quicksort
Abstract Data Type
Linked Lists
Stack
Infix, Postfix and Prefix
Postfix and Prefix evaluation with Stack
Infix to Postfix using Stack
Queue
Hashing
Direct Access Tables
Hash Table
Collision
Graph
Graph Representation
Adjacency Matrix
Adjacency List
Edge List
Depth First Search
Breadth First Search
Shortest Path Algorithms
Bellman–Ford Algorithm
Shortest Path Faster Algorithm
Dijkstra's Algorithm
Floyd–Warshall Algorithm
Complete Search Algorithms
Backtracking
Greedy Algorithms
Coin Problem
Data Compression
Huffman Coding
Dynamic Programming
Coin Problem
Longest Increasing Subsequence
Paths in a Grid
Knapsack Problems
Binary Indexed Trees – Fenwick Tree
Segment Trees
Tree
Tree Traversal
Diameter of Tree Algorithm
Spanning Trees
Kruskal's Algorithm
Union-Find Structure
Prim's Algorithm
Sliding Window
Bibliography

Asymptotic Function
Θ-Notation:

Θ(g(n)) = { f(n) : there exist positive constants c1, c2 and n0 such that 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0 }

Θ gives a g(n) such that f(n) is sandwiched between two constant multiples of g(n); thus we say
f(n) ∈ Θ(g(n)). Under these conditions g(n) is considered an asymptotically tight bound for f(n).

O-Notation:

O(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0 }

This is known as the asymptotic upper bound, since c·g(n) bounds f(n) from above.
It is generally considered to give the complexity of the worst case.

Ω-Notation:

Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0 }

This is known as the asymptotic lower bound, since c·g(n) bounds f(n) from below.
It is generally considered to give the time complexity of the best case.

Recursion Trees
A recursion tree is useful for visualizing what happens when a recurrence is iterated. It diagrams
the tree of recursive calls and the amount of work done at each call. We use it to make a guess
about the time complexity of the program. To get a rigorous answer, we use the Master Theorem.

Example: for the recurrence T(n) = 2T(n/2) + n², the work done at depth k of the tree is n²/2^k,
a geometrically decreasing series, so the total work is at most a constant times n²; hence T(n) = O(n²).

Master Method
A general method for solving the time complexity of recurrences. It is not valid for all
recurrences, but it is useful for recursive relations of the form T(n) = aT(n/b) + f(n), where
a ≥ 1, b > 1 and f(n) is an asymptotically positive function.

Master Theorem:
Let a ≥ 1 and b > 1 be constants, let f(n) be a function, and let T(n) be defined on the
nonnegative integers by the recurrence T(n) = aT(n/b) + f(n), where we interpret n/b to mean
either floor(n/b) or ceil(n/b). Then T(n) has the following asymptotic bounds:

1. If f(n) = O(n^(log_b a − ε)) for some constant ε > 0, then T(n) = Θ(n^(log_b a)).
2. If f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) · log n) = Θ(f(n) log n).
3. If f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and if a·f(n/b) ≤ c·f(n) for some constant
c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).

Meaning and Intuition:

When f(n) is polynomially smaller than n^(log_b a), T(n) is bounded by n^(log_b a); when f(n) is
polynomially larger than n^(log_b a), T(n) is bounded by f(n). Note that cases 1 and 3 require f(n)
to be not merely smaller or larger, but smaller or larger by a polynomial factor, precisely a factor
of n^ε.
If f(n) and n^(log_b a) have similar sizes, T(n) is bounded by either of these functions multiplied by
log n. These are the only cases the Master theorem covers.
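
As a quick check of these cases: for the merge sort recurrence T(n) = 2T(n/2) + n we have a = 2,
b = 2, so n^(log_b a) = n; since f(n) = n = Θ(n^(log_b a)), case 2 gives T(n) = Θ(n log n). For
T(n) = 2T(n/2) + n², f(n) = n² = Ω(n^(1+ε)) with ε = 1, and 2·f(n/2) = n²/2 ≤ (1/2)·n², so case 3
gives T(n) = Θ(n²), agreeing with the recursion-tree estimate above.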

Insertion Sort
Creates a sorted array out of the initial array by reading each element and placing it in sorted
position among the already-read elements. Slow on large inputs and rarely used in practice.

C++ Code:
for(int i=1;i<n;i++){
    //take the next unsorted element
    int key=a[i];
    int j=i-1;
    //shift the already-sorted elements larger than key one step right
    while(j>=0 && a[j]>key){
        a[j+1]=a[j];
        j--;
    }
    //insert key at its sorted position
    a[j+1]=key;
}

Idea:
The idea is that the elements we have already gone through are sorted, so we only have to find
the first element larger than the element just read and insert it at that position.

Time Complexity:
Best Case 𝑂(𝑛)
Average Case 𝑂(𝑛2 )
Worst Case 𝑂(𝑛2 )

Space Complexity: 𝑂(1)

Merge Sort
Merge Sort is one of the fastest sorting algorithms and a divide-and-conquer algorithm. It
works by breaking the array into two halves, sorting each half recursively with merge sort, and
finally merging the two sorted halves to yield the sorted array.

Idea:
The bigger problem of sorting is similar to the sub-problem of sorting the half array.
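
A minimal sketch of this scheme, sorting a[l..r] in place with a scratch buffer:

//merge sort on a[l..r] (inclusive); buf is a scratch array of the same size
void mergeSort(int a[], int buf[], int l, int r){
    if(l >= r) return;
    int m = (l + r) / 2;
    mergeSort(a, buf, l, m);     //sort left half
    mergeSort(a, buf, m+1, r);   //sort right half
    //merge the two sorted halves into buf, then copy back
    int i = l, j = m+1, k = l;
    while(i <= m && j <= r) buf[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while(i <= m) buf[k++] = a[i++];
    while(j <= r) buf[k++] = a[j++];
    for(int t = l; t <= r; t++) a[t] = buf[t];
}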

Time Complexity:
Best Case 𝑂(𝑛 𝑙𝑜𝑔𝑛)
Average Case 𝑂(𝑛 𝑙𝑜𝑔𝑛)
Worst Case 𝑂(𝑛 𝑙𝑜𝑔𝑛)

Space Complexity: 𝑂(𝑛)

Bubble Sort
It goes through the entire array again and again until the array is completely sorted, each time
swapping adjacent numbers that are in the wrong order. A very basic sorting algorithm.

C++ Code:
for(int j=0;j<len;j++){
int swaps=0;//counter for swaps
for(int i=0;i<len-1;i++){
//swap if not in right order
if(a[i]>a[i+1]) {swap(a[i],a[i+1]); swaps++;}
}
//if no elements swapped. Array already sorted.
if(swaps==0) break;
}

Idea:
The idea follows from the fact that any two real numbers a, b satisfy either a > b or a ≤ b, so
repeatedly swapping out-of-order neighbours eventually sorts the array.

Time Complexity:
Best Case 𝑂(𝑛)
Average Case 𝑂(𝑛2 )
Worst Case 𝑂(𝑛2 )

Space Complexity: 𝑂(1)

Binary Heap
A binary-tree data structure that is a complete tree, i.e., all levels are completely filled except
possibly the last. A binary heap is either a min heap or a max heap.

Min Heap:
Key at root node is less than all other keys in Binary Heap. This is recursively true for all nodes.

Max Heap:
Key at root node is more than all other keys in the binary heap. This is recursively true for all nodes.

Binary Heap Representation:
Generally represented as array. A[0] at root node. For ith node A[i]:
A[(i-1)/2] Parent Node
A[2*i+1] Left Child Node
A[2*i+2] Right Child Node

Operations on Min Heap:

1. getMini(): Returns the minimum element. Time complexity 𝑂(1).
2. extractMin(): Removes the minimum element from the set. Time complexity 𝑂(log 𝑛).
3. decreaseKey(): Decreases the value of a key. Worst-case time complexity 𝑂(log 𝑛).
4. insert(): Inserts a new key. Worst-case time complexity 𝑂(log 𝑛).
5. delete(): Deletes a key. Worst-case time complexity 𝑂(log 𝑛).

Heapify:
When the root of a binary heap is removed, it is replaced with the last element, which maintains
the shape property of the heap. Then, among this root node and its two children, the maximum
(in the case of a max heap) is swapped with the root node. This continues recursively down the
affected subtree, restoring the order property of the heap.
If any key is changed, we re-heapify: if it decreased, we sift it through the subtree it is the root
of; if it increased, we sift it through the subtree in which it is a child node.
When a key is inserted it is added as the last element of the heap, maintaining the shape
property. It is then repeatedly swapped with its parent while it violates the order property,
moving upwards until the heap order is restored.
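
A minimal sketch of the heapify helper called by the code below, for a min heap under the
array layout above. Note that rebuilding the whole array this way costs 𝑂(𝑛); the 𝑂(log 𝑛)
bounds quoted above come from sifting along only a single root-to-leaf path.

//sift-down for a min heap: assumes the subtrees of node i already
//satisfy the heap order, and sinks a[i] to its correct place
void heapifyAt(int a[], int size, int i){
    int smallest = i;
    int l = 2*i + 1, r = 2*i + 2;
    if(l < size && a[l] < a[smallest]) smallest = l;
    if(r < size && a[r] < a[smallest]) smallest = r;
    if(smallest != i){
        swap(a[i], a[smallest]);
        heapifyAt(a, size, smallest);
    }
}

//rebuilds heap order over the whole array
void heapify(int a[], int size){
    for(int i = size/2 - 1; i >= 0; i--) heapifyAt(a, size, i);
}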

C++ Code:
void getMini(int a[MAXSIZE]){ cout<<a[0]; }

//removes the root and places the last element at the root,
//then heapifies the structure
void extractMin(int a[MAXSIZE], int& size){
    if(size>1){
        a[0]=a[size-1]; size--;
        heapify(a,size);
    }
    else size--;
}

//decreases the key at the given index, then restores heap order
void decreaseKey(int a[MAXSIZE], int size, int key){
    a[key]--;
    heapify(a,size);
}

//adds the new element at the first free position of the last level
void insert_heap(int a[MAXSIZE], int& size, int element){
    a[size]=element;
    size++;
    heapify(a,size);
}

//places the last element of the heap at this key
//then heapifies the tree
void delete_heap(int a[MAXSIZE], int& size, int key){
    a[key]=a[size-1];
    size--;
    heapify(a,size);
}

Heapsort
One of the fastest sorting algorithms, based on the binary max heap data structure. Since the
root of a max heap is the largest element, it is exchanged with the current last element of the
array; the heap is then shrunk by one and re-heapified, and this continues until one element is
left. Before this procedure starts, the input array is built into a max heap.
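
A minimal sketch, using a max-heap sift-down (the mirror image of the min-heap version above):

//sift-down for a max heap on a[0..size-1]
void siftDownMax(int a[], int size, int i){
    int largest = i, l = 2*i + 1, r = 2*i + 2;
    if(l < size && a[l] > a[largest]) largest = l;
    if(r < size && a[r] > a[largest]) largest = r;
    if(largest != i){
        swap(a[i], a[largest]);
        siftDownMax(a, size, largest);
    }
}

void heapsort(int a[], int n){
    //build a max heap in O(n)
    for(int i = n/2 - 1; i >= 0; i--) siftDownMax(a, n, i);
    //repeatedly move the maximum to the end and shrink the heap
    for(int end = n - 1; end > 0; end--){
        swap(a[0], a[end]);
        siftDownMax(a, end, 0);
    }
}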

Time Complexity:
Best Case 𝑂(𝑛 log 𝑛)
Average Case 𝑂(𝑛 log 𝑛)
Worst Case 𝑂(𝑛 log 𝑛)

Space Complexity: 𝑂(1)

Priority Queue
Priority Queue is a data structure that stores priorities (comparable values). It supports inserting
new priorities and removing/returning the highest priority. A priority queue can be implemented
using different data structures such as an array or a heap.

We use a heap because it is more efficient than the other data structures for this purpose:
deleting or inserting an element can be done in 𝑂(log 𝑛) time, compared to 𝑂(𝑛) time for an array.

Implementation:
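A minimal usage sketch of the heap-backed priority queue available in the C++ standard library:

#include <iostream>
#include <queue>
#include <vector>
using namespace std;

int main(){
    //max-priority queue: top() returns the largest element
    priority_queue<int> pq;
    pq.push(3); pq.push(10); pq.push(7);  //O(log n) per insert
    cout << pq.top() << endl;             //prints 10
    pq.pop();                             //removes 10 in O(log n)
    //for a min-priority queue:
    //priority_queue<int, vector<int>, greater<int>> minpq;
    return 0;
}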

Quicksort
One of the fastest and most used sorting algorithms, and a divide-and-conquer algorithm. It
picks a pivot element and partitions the given array around the pivot, then quicksorts each part
thus created. It works recursively, yet unlike merge sort no stitching of the two halves is required.

C++ Code:
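A minimal sketch using the Lomuto partition scheme (taking the last element as the pivot is one
common choice, assumed here):

//partitions a[l..r] around the pivot a[r]; returns the pivot's final index
int partition(int a[], int l, int r){
    int pivot = a[r], i = l - 1;
    for(int j = l; j < r; j++)
        if(a[j] < pivot) swap(a[++i], a[j]);
    swap(a[i+1], a[r]);
    return i + 1;
}

void quicksort(int a[], int l, int r){
    if(l >= r) return;
    int p = partition(a, l, r);
    quicksort(a, l, p - 1);  //sort elements smaller than the pivot
    quicksort(a, p + 1, r);  //sort elements larger than the pivot
}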

Time Complexity:
Best Case 𝑂(𝑛𝑙𝑜𝑔𝑛)
Average Case 𝑂(𝑛𝑙𝑜𝑔𝑛)
Worst Case 𝑂(𝑛2 )

Space Complexity: 𝑂(𝑙𝑜𝑔𝑛)

Why is quicksort better than merge sort for sorting arrays?

Quicksort is better than merge sort for arrays because of how memory allocation works for
arrays. Both have the same average time complexity, but quicksort is an in-place sorting
algorithm while merge sort is not, so merge sort carries an extra 𝑂(𝑛) space cost which slows it
down. For linked lists, merge sort is used instead: an element can be inserted in the middle of a
linked list in 𝑂(1) time and 𝑂(1) space, so no extra space is required, whereas quicksort needs a
lot of direct access to the 𝑗th element, which is expensive in linked lists.

Abstract Data Type


A type or class of objects whose behaviour is defined by a set of values and a set of operations,
without specifying its implementation.

Linked Lists
A linear data structure in which elements are not stored in contiguous memory like an array;
instead it consists of nodes, where each node consists of a data field and a reference
(link/pointer) to the next node in the list. To remember a linked list we only need the address of
the head node; the rest is linked from there.

Class Node:
class Node{
public:
int data;
Node* next;
};

Array vs Linked List:

Cost of accessing an element:
- Array: 𝑂(1), as the 𝑗th element's address is the home node address + 𝑗 * (size of each element).
- Linked List: 𝑂(𝑛), as we will have to traverse through each element in the worst case.

Memory requirements:
- Array: fixed size; unused memory remains; memory may not be available as one large block.
- Linked List: no unused memory, but extra memory for pointer variables; memory available as multiple small blocks.

Cost of inserting an element:
- Array: at beginning 𝑂(𝑛); at end 𝑂(1) if array not full, else 𝑂(𝑛); at middle 𝑂(𝑛).
- Linked List: at beginning 𝑂(1); at end 𝑂(𝑛); at middle 𝑂(𝑛).

Cost of deleting an element:
- Array: at beginning 𝑂(𝑛); at end 𝑂(1); at middle 𝑂(𝑛).
- Linked List: at beginning 𝑂(1); at end 𝑂(𝑛); at middle 𝑂(𝑛).

Ease of use: array is better; linked list not as good.

Insertion of Nodes:
This is done by first reaching the node after which we wish to add the new node, then changing
the next pointer of the previous node to point to the location of this new node and setting the
next pointer of the new node to contain the address of the following node.
//adds new node at beginning of list
void push(Node** head_ref, int new_data){
Node* new_node = new Node();
new_node->data = new_data;
new_node->next = (*head_ref);
(*head_ref) = new_node;
}

//adds new node at the node we are given pointer of


void insertAfter(Node* prev_node, int new_data){
if(prev_node == NULL) {
cout<<"Given previous node cannot be NULL";
return;
}
Node* new_node = new Node();
new_node->data=new_data;
new_node->next=prev_node->next;
prev_node->next=new_node;
}

//adds new node at end of list
void append(Node** head_ref, int new_data){
Node* new_node = new Node();
Node* last=(*head_ref);
new_node->data = new_data;
new_node->next = NULL;

//if no nodes already it will add new node


if(*head_ref == NULL){
*head_ref = new_node;
return;
}

//does not stop until we reach last node


while(last->next!=NULL){
last=last->next;
}
last->next = new_node;
return;
}

Deletion of Nodes:
To delete the node with some given data ('key'), we first find that node and then shift the pointer
of the previous node to point to the node after it. A similar code can be written to delete the
node at some given position. The code below deletes the node with a given 'key' value.
//delete node with 'key' data
void deleteNode(Node** head_ref, int key){
    Node *temp = *head_ref;
    Node *prev = NULL;

    if(temp==NULL) return;

    //head node holds the key
    if(temp->data==key){
        *head_ref = temp->next;
        delete temp;
        return;
    }

    //find the node holding the key, tracking the previous node
    while(temp!=NULL && temp->data!=key){
        prev = temp;
        temp = temp->next;
    }

    //key not present in the list
    if(temp==NULL) return;

    prev->next = temp->next;
    delete temp;
}

Final Comments:
Since linked lists in some cases use memory strategically and work quite efficiently, they are
important data structures. They are used frequently for merging two lists, swapping two
elements, and similar operations, which are performed easily by just changing the next-node
addresses of the involved nodes. Following the implementations above, we can define other
operations like reversing, searching, et cetera.

Doubly Linked List:


Unlike a singly linked list, each node has some data, a pointer to the previous node and a
pointer to the next node. This makes it possible to traverse the list in both directions. Time
complexity is similar to a singly linked list, but more space is required.

Circular Linked List:


A linked list in which no node has a null next pointer, because the list is circular. These are of
two types: 1. Singly circular linked list – each node has a pointer only to the node on one side.
2. Doubly circular linked list – each node has pointers to both the nodes on its right and left.

Stack
Stack is a linear data structure which follows a particular order in which operations are to be
performed. The order is Last In First Out (LIFO) or First In Last Out (FILO).

Main Operations:

1. Push – Adds an item to the stack; if the stack is full, an overflow condition occurs. Time
complexity 𝑂(1).
2. Pop – Removes the most recently added item from the stack; if the stack is empty, an
underflow condition occurs. Time complexity 𝑂(1).
3. Peek / Top – Returns the top element of the stack. Time complexity 𝑂(1).
4. isEmpty – Returns true if the stack is empty, else false. Time complexity 𝑂(1).

Stacks can be implemented using: 1. Array – simpler as no pointers are involved, but memory is
fixed. 2. Linked list – no memory restriction.

Array Implementation of Stack:


class Stack{
int top;
public:
int a[MAX];
Stack() {top = -1;} //constructor
bool push(int x);
int pop();
int peek();
bool isEmpty();
};

int Stack::pop(){
    if(top<0) {cout<<"Stack Underflow"; return 0;}
    else return a[top--]; //return the top element, then shrink
}

bool Stack::isEmpty(){
return (top<0);
}

int Stack::peek(){
if(top<0) {cout<<"Stack Empty"; return 0;}
else return a[top];
}

bool Stack::push(int x){
    if(top>=MAX-1) {cout<<"Stack overflow"; return false;}
    else {top++; a[top]=x; return true;}
}

Linked List Implementation of Stack:
struct Node{
int data;
Node* link;
};

struct Node* top;

void push(int new_data){


struct Node* temp;
temp = new Node();
if(!temp){
cout<<"Stack Overflow"<<endl;
exit(1);
}
temp->data = new_data;
temp->link = top;
top = temp;
}

bool isEmpty(){ return top==NULL; }

int peek(){
if(!isEmpty()) return top->data;
else exit(1);
}

void pop(){
    Node* temp;
    if(isEmpty()){
        cout<<"Empty Stack"<<endl;
        exit(1);
    }
    temp = top;
    top = top->link;
    temp->link = NULL;
    delete temp; //allocated with new, so use delete rather than free
}

Example for checking a regular bracket sequence:

In a regular bracket sequence the last opened bracket must be closed first. This idea is
implemented using a stack as follows. We go through each element of the entered array,
pushing opening brackets onto a stack while simultaneously reading the closing brackets. If a
closing bracket is the counterpart of the top of the stack, we pop that element from the stack.
This continues until we reach the end of the array. If in the end no elements remain on the stack,
it was a regular bracket sequence; otherwise it was not. A sketch follows.
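
A minimal sketch of this check for the three common bracket types:

#include <iostream>
#include <stack>
#include <string>
using namespace std;

//returns true if s is a regular bracket sequence over (), [], {}
bool isRegular(const string& s){
    stack<char> st;
    for(char c : s){
        if(c=='(' || c=='[' || c=='{') st.push(c);
        else{
            if(st.empty()) return false;
            char open = st.top();
            if((c==')' && open=='(') ||
               (c==']' && open=='[') ||
               (c=='}' && open=='{')) st.pop();
            else return false;
        }
    }
    return st.empty(); //nothing left open
}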

Final Comments:
Stacks are very useful in infix-to-postfix and infix-to-prefix conversion, and are also used in
backtracking-based algorithms.

Infix, Postfix and Prefix


Infix Expression:
<Operand><Operator><Operand> format. This is the usual way of writing expressions, where
operands are generally numbers and the operators are '+', '−', '/' and '*'. However, evaluating
expressions in infix format involves resolving parentheses first, since some operators have higher
precedence than others; thus it requires more memory and time to parse.

Prefix Expression:
<Operator><Operand><Operand> format. This is much faster to evaluate than an infix expression,
as it saves the time and space spent on associativity rules and parentheses.

Postfix Expression:
<Operand><Operand><Operator> format. Has time and space complexity similar to a prefix
expression, but is used more widely because the algorithms involving it are more intuitive and
thus easier to implement.

Postfix and Prefix evaluation with Stack


Postfix evaluation:
We parse the expression from left to right, and as soon as we encounter an operator we apply it
to the last two operands and replace this part of the expression with the result. We repeat this
until no operator remains.
Since we go from left to right and the operands consumed first are the ones seen most recently,
i.e., the Last In First Out principle, we use a stack.

Pseudo Code for Postfix evaluation:


//assumption that this doesn't handle any error conditon
EvaluatePostfix(exp){
create stack S;
for i<-0 to length(exp)-1 {
if(exp[i] is operand) Push(exp[i]);
else{
int op2 = Pop();
int op1 = Pop();
int res = Perform(exp[i], op1, op2);
Push(res);
}
}
}

Prefix Evaluation:
Works completely similarly to postfix evaluation, except that this time the evaluation is done
from right to left.

Infix to Postfix using Stack


We parse the expression from left to right, pushing operators onto a stack as we meet them and
appending operands directly to the postfix expression being built. The moment we encounter an
operator with lower precedence than the top element of the stack, we pop (and append) from
the stack and then push this new operator. This continues until we reach the end of the infix
expression, at which point we pop the remaining elements of the stack one by one.

Pseudo Code:
//assumption that this doesn't handle any error conditon
for i<-0 to length(exp)-1{
if (exp[i] is operand)
res <- res + exp[i];
else
while(!s.empty() && HasHigherPrec(s.top(), exp[i])){
res<-res+s.top();
s.pop();
}
s.push(exp[i]);
}
while (!s.empty())
{
res <- res + s.top();
s.pop();
}
return res;

Queue
Queue is a First In First Out data structure. Theoretically, a queue can be considered a list which
is open from both sides, unlike a stack. We enter the queue from one side and remove elements
from the other side.
Queue should perform these basic operations:

1. Enqueue(x) or Push(x): Increases the size of the queue by adding x at one side. 𝑂(1) time
complexity.
2. Dequeue() or Pop(): Decreases the size of the queue by removing the element at the other
side. 𝑂(1) time complexity.

Array Implementation:
We use rear and front indices for the queue inside an array. When we want to dequeue, we
increase the front index by 1. When we want to enqueue, we add an element at the rear end of
the queue. This assumes front index ≤ rear index.
void Enqueue(int queue[], int val, int n, int& front, int& rear){
    if(rear==n-1) cout<<"Queue Overflow"<<endl;
    else{
        if(front==-1) front=0;
        cout<<"Insert element in queue: "<<val<<endl;
        rear++; queue[rear]=val;
    }
}

void Dequeue(int queue[], int& front, int& rear){
    if(front==-1 || front>rear) cout<<"Queue Underflow"<<endl;
    else{
        cout<<"Element deleted from queue: "<<queue[front]<<endl;
        front++;
    }
}

Linked List Implementation:
Overcomes the limited-space problem of the array version. We keep pointers to the front and
rear ends of the queue so that adding or removing an element always takes constant time.

void Enqueue(int x){


struct Node* temp=(struct Node*)malloc(sizeof(struct Node));
temp->data=x;
temp->next=NULL;
if(front==NULL && rear==NULL){ front=rear=temp; return; }
rear->next=temp;
rear=temp;
}

void Dequeue(){
struct Node* temp=front;
if(front==NULL) return;
if(front==rear) front=rear=NULL;
else front=front->next;
free(temp);
}

Hashing
Hashing is an efficient way of storing data. As an example, suppose we wanted to store a
directory with information about particular phone numbers. We could create a hash map such
that when a given phone number is entered, it returns the address of the location where the
data related to it is stored.

Direct Access Tables


In the above phone-number example, assume we create an array so large that the index equal
to the phone number contains the address of the location where that number's data is stored.
In this sense, it is considered a DIRECT ACCESS table. Indexes where nothing is stored are
considered NIL.

Advantages:
𝑂(1) time complexity for tasks like adding, deleting and searching a phone number or any other
index-based input.

Disadvantages:
The declared array has size |U| while only a few keys might be used, so a large amount of
memory is wasted. Moreover, the computer's memory might not be large enough to store a
size-|U| table at all.

Hash Table
A hash table is generally used when the set of keys actually in use is much smaller than |U|. A
key 𝑘 is stored in the slot ℎ(𝑘), where ℎ is known as the hash function:
ℎ : 𝑈 → {0, 1, ..., 𝑚 − 1}
with table size 𝑚 generally much smaller than |U|.

Collision
When two keys hash to the same slot, a collision results. This can happen because 𝑚 < |𝑈|. We
minimize collisions by making the hash behave randomly.

Chaining
The simplest resolution for collisions. We place all the elements that hash to the same slot into a
linked list. This is good enough, but some parts of the table may remain unused, and if a linked
list gets long, searching or deleting an element may take considerable time.

Performance of Chaining
Given 𝑚 slots with 𝑛 keys to be inserted, we use the load factor α = 𝑛/𝑚, which gives the
average length of a linked list, assuming each key is equally likely to be hashed to any slot of
the table.
1. Time to search 𝑂(α)
2. Time to delete 𝑂(α)
3. Time to insert 𝑂(1)
For storing chains of length 𝑙 we could use (a minimal chaining sketch follows the list):
1. Linked lists – not cache friendly; search and delete in 𝑂(𝑙), insert in 𝑂(1).
2. Dynamic sized arrays – cache friendly; time complexity same as linked lists.
3. Self-balancing BSTs – not cache friendly; insert, search and delete in 𝑂(log 𝑙).
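
A minimal sketch of a chained hash table for integers, assuming a simple modular hash
(the table size and hash function are illustrative choices):

#include <iostream>
#include <list>
#include <vector>
using namespace std;

struct ChainedHashTable {
    int m;                    //number of slots
    vector<list<int>> table;  //one chain per slot
    ChainedHashTable(int slots) : m(slots), table(slots) {}

    int hash(int key) const { return ((key % m) + m) % m; }

    void insert(int key) { table[hash(key)].push_front(key); } //O(1)

    bool search(int key) const {                               //O(alpha) expected
        for(int x : table[hash(key)]) if(x == key) return true;
        return false;
    }

    void remove(int key) { table[hash(key)].remove(key); }     //O(alpha) expected
};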

Open Addressing
All elements are stored in the hash table itself, so the table size must be at least the number of
keys to be stored. Collisions are resolved by:
(a) Linear Probing
(b) Quadratic Probing
(c) Double Hashing

Linear Probing
We simply probe linearly for the next slot: if the slot at index ℎ(𝑘) given by the hash function is
occupied, we place the element in the next empty slot.

Challenges in Linear Probing:

1. Primary clustering – many consecutive elements form groups, and it then starts to take time
to find the next empty slot or to search for an element.
2. Secondary clustering – two records have the same collision chain if their initial position is the
same.

Quadratic Probing
Unlike linear probing, where we go to the next empty slot if the hashed slot is filled, here we go
to slot ℎ(𝑘) + 𝑖² in the 𝑖th iteration.

Double Hashing
We have another hash function ℎ𝑎𝑠ℎ2(𝑥); if the hashed slot is occupied, we go to
ℎ𝑎𝑠ℎ(𝑥) + 𝑖·ℎ𝑎𝑠ℎ2(𝑥) in the 𝑖th iteration.

Graph
A graph is a non-linear data structure consisting of nodes and edges. Nodes are also referred to
as vertices, and edges are the lines or arcs that connect any two nodes in the graph.

An edge is represented by an ordered pair (𝑢, 𝑣); edge (𝑢, 𝑣) refers to an edge from node 𝑢 to
node 𝑣. Edges can be bidirectional or unidirectional, and can also be weighted, in which case
they are known as weighted edges.

Path
A path is an ordered set of edges to reach from node a to node b. The length of a path is
generally the number of edges in it, or the sum of the weights on its edges. A path is a cycle if
the first and last nodes are the same, and simple if each node appears at most once in it.

Connectivity
A graph is connected if for any two nodes there exists a path between them. When a graph is
not connected, the connected parts of the graph are called components; for example, a graph
may split into the components {1,2,3}, {4,5,6,7}, {8}.

Tree
A graph with 𝑛 vertices and 𝑛 − 1 edges placed such that there is a unique path from any vertex
𝑢 to any vertex 𝑣.

Neighbours and Degrees


If two vertices have an edge connecting them, they are neighbours, or adjacent.
Degree – the degree of a node is the number of neighbours it has.

Sum of degrees in a graph = 2𝑚, where 𝑚 is the number of edges.

A regular graph has the same degree 𝑑 for each node. A complete graph is a regular graph
which has an edge connecting every pair of nodes.
In the case of a directed graph, each node has an indegree and an outdegree.

Graph Colouring
Each node is assigned a colour such that no two adjacent nodes have the same colour.
Bipartite Graph: a graph is bipartite if it is possible to colour it using two colours.

Bipartite Condition
A graph has no cycle containing an odd number of edges ⇔ the graph is bipartite.

Graph Representation:
1. Adjacency Matrix
2. Adjacency List
3. Edge List

Adjacency Matrix
We have a matrix 𝐴 of size 𝑣 × 𝑣, where 𝑣 is the number of nodes in the graph and the element
𝑎[𝑖][𝑗] of 𝐴 represents the weight or cost of the edge (𝑖, 𝑗). 0 can be used to represent edges
which don't exist, while giving existing edges some non-zero cost.

Pros:
1. Easier to implement
2. Adding or Removing edge in 𝑂(1) time.

Cons:
1. Consumes more space 𝑂(𝑣 2 ) even if the graph is sparse.
2. Adding vertex takes 𝑂(𝑣 2 ) time.

Adjacency List
Each node 𝑢 is assigned an adjacency list that contains the nodes to which there is an edge
from 𝑢. This is the most popular representation of graphs.

We use an array of vectors which, for each node, contains the numbers of the nodes connected
to it. In a bidirectional graph an edge appears in the lists of both its endpoints. In the case of
weighted edges, each vector element is a pair (𝑏, 𝑤), where 𝑏 is the node this edge leads to
and 𝑤 is the weight of the edge.
vector<int> adj[N];              //unweighted graph, N large enough
vector< pair<int,int> > adj[N];  //weighted graph (alternative)

adj[1].push_back({2,5});
adj[2].push_back({3,7});
adj[2].push_back({4,6});
adj[3].push_back({4,5});
adj[4].push_back({1,2});

This creates a graph with the edges 1→2 (weight 5), 2→3 (7), 2→4 (6), 3→4 (5) and 4→1 (2).

Pros:
1. Saves space: 𝑂(|𝑉| + |𝐸|); only in the worst case is the space 𝑂(𝑣²).
2. Adding a vertex is much easier.

Cons:
1. Queries such as whether there is an edge from vertex 𝑢 to vertex 𝑣 are not efficient; they
take 𝑂(𝑉) time.

Edge List
This method focuses on the edges and suits algorithms which don't need the edges starting at a
particular node. We create a vector of tuples, where the tuple (𝑎, 𝑏, 𝑤) represents an edge of
weight 𝑤 from vertex 𝑎 to vertex 𝑏.

The code below creates the same graph as above.


vector< tuple<int, int, int> > edges;
edges.push_back({1,2,5});
edges.push_back({2,3,7});
edges.push_back({2,4,6});
edges.push_back({3,4,5});
edges.push_back({4,1,2});

Depth First Search


A method of graph traversal in which the algorithm begins at a starting node and proceeds to
all other nodes that are reachable from the starting node using the edges of the graph. It
follows a single path as long as it finds new nodes; after that it returns to the previous node with
unvisited neighbours and starts to cover other parts of the graph. The algorithm keeps track of
visited nodes, so each node is visited only once. The starting node can be chosen arbitrarily.

Implementation:
vector<int> adj[N]; //unweighted graph
bool visited[N];    //initially false for all nodes

void dfs(int s){
    if(visited[s]) return;
    visited[s] = true;
    for(auto u : adj[s]) dfs(u);
}

Time Complexity 𝑂(𝑉 + 𝐸)


Space Complexity 𝑂(𝑉)

In the case of a disconnected graph, after completing DFS from a given starting node, start
again from some other unvisited node and continue until all nodes are visited.

Breadth First Search


Visits nodes in increasing order of their distance from the starting node: first all nodes at
distance 1 from the starting node, then nodes at distance 2, and so on. Since graphs may
contain cycles, we also keep a record of visited nodes.

Implementation:
queue<int> q; //queue of nodes in order of increasing
              //distance from the starting node;
              //the node at the front is processed next,
              //new nodes are appended to the back
bool visited[N];
int distance[N];

Now starting BFS at some starting node 𝑥:


visited[x]=true;
distance[x]=0;
q.push(x);
while(!q.empty()){
int s=q.front();
q.pop();
//process node s
for(auto u : adj[s]){
if(visited[u]) continue;
visited[u]=true;
distance[u]=distance[s]+1;
q.push(u);
}
}

Time Complexity 𝑂(𝑉 + 𝐸)

BFS is generally used where distances from a given source node are needed, as it stores
distances from the starting node. In the case of a disconnected graph we need to restart, for
each component, from some unvisited node after the queue empties.

Applications:
1. Checking for cycles: if during traversal we meet an already-visited node through a new edge,
we have just completed a cycle. Also, if a component has c nodes and more than c − 1 edges, it
must contain a cycle.
2. Bipartiteness checking: traverse the graph, giving the starting node some colour, all its
adjacent nodes the opposite colour, and so on; if we ever notice that some adjacent node has
the same colour as the current node, the graph is not bipartite.

Shortest Path Algorithms


These algorithms help in finding the shortest distance between any two nodes of a graph. This is
easy in unweighted graphs using BFS, but the following algorithms work on weighted graphs.

Bellman – Ford Algorithm


Finds the shortest path from a starting node to all other nodes of the graph, provided the graph
has no negative-sum cycles; if it has one, the algorithm will detect it. Initially the distance to the
starting node is zero and to all other nodes infinite; as shorter paths are found, the distances
keep decreasing.

Idea:
A process repeats (𝑛 − 1) times; in each round it goes through all the edges and tries to reduce
the distances. An array distance is created which stores all the distances. We repeat (𝑛 − 1)
times because in the worst case a shortest path can contain at most (𝑛 − 1) edges. We don't
want negative cycles, as then the length could be reduced infinitely by repeating the cycle.

Implementation:
#define INF 100000
//uses edge list which is tuple <a,b,w> from a to b with weight w
for (int i=1;i<=n;i++) distance[i]=INF;
distance[x]=0; //starting node
for(int i=1;i<=n-1;i++){
for(auto e : edges){
//iterate through all edges
int a,b,w;
tie(a,b,w)=e;
distance[b]=min(distance[b],distance[a]+w);
}
}

Time Complexity 𝑂(𝑛𝐸), as we run through all E edges (𝑛 − 1) times.

Remark: In practice all shortest distances are often final before (𝑛 − 1) rounds, so the algorithm
can be made faster by stopping as soon as a round changes nothing. To detect a negative
cycle we run for 𝑛 rounds; if the last round still reduces some length, there is a negative cycle.

Shortest Path Faster Algorithm

A faster variant of the Bellman–Ford algorithm. In Bellman–Ford every vertex is used to relax its
adjacent vertices in every round; in SPFA we instead keep a queue of vertices, and a vertex is
added to it only when it has been relaxed. The process repeats until no more vertices can be
relaxed. Though faster in practice than Bellman–Ford, its worst-case time complexity is the same
as its predecessor's. A sketch follows.
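
A minimal sketch, assuming the weighted adjacency list adj[] of (node, weight) pairs used in the
Dijkstra section below; inqueue[] is an illustrative helper name:

bool inqueue[N]; //is the vertex already waiting in the queue?

for(int i=1;i<=n;i++) distance[i]=INF;
distance[x]=0; //starting node x
queue<int> q;
q.push(x); inqueue[x]=true;
while(!q.empty()){
    int a=q.front(); q.pop(); inqueue[a]=false;
    for(auto u : adj[a]){
        int b=u.first, w=u.second;
        if(distance[a]+w<distance[b]){
            distance[b]=distance[a]+w; //b was relaxed,
            if(!inqueue[b]){ q.push(b); inqueue[b]=true; } //so enqueue it
        }
    }
}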

Dijkstra’s Algorithm
A far more efficient algorithm for calculating the shortest distance of each node from some
starting node, provided the graph has no negative-weight edges.

Idea:
Based on the fact that there are no negative-weight edges: if starting node → node a → node b
is the fastest way to reach node b, then the distance of node b from the starting node is at least
the distance of node a from the starting node. Hence the smallest unprocessed tentative
distance is already final.

Implementation:
The graph is stored as an adjacency list. We push the negative of the distance, because the
priority queue by default returns its maximum element while we need the minimum-distance node.
priority_queue< pair<int,int> > q;

for(int i=1;i<=n;i++) distance[i]=INF;


distance[x]=0; //starting node x
q.push({0,x});

while(!q.empty()){
    int a=q.top().second; q.pop(); //node with the smallest distance
    if(processed[a]) continue;
    processed[a]=true;
    for(auto u : adj[a]){
        int b = u.first, w = u.second;
        if(distance[a]+w<distance[b]){
            distance[b]=distance[a]+w;
            q.push({-distance[b],b});
        }
    }
}

Time Complexity 𝑂(𝐸 𝑙𝑜𝑔𝑉)

Floyd – Warshall Algorithm


The graph is stored as an adjacency matrix. Initially, only the weights between nodes that are
directly connected (or a node and itself) are written in the matrix. Then in every round the
algorithm selects a new node that can act as an intermediate node, and all distances that can
be reduced using this node are reduced. This continues until every node has been appointed as
the intermediate node.

Implementation:
//constructing adjacency matrix
for(int i=1; i<=n; i++){
for(int j=1;j<=n;j++){
if(i==j) distance[i][j]=0;
else if(adj[i][j]) distance[i][j]=adj[i][j];
else distance[i][j]=INF;
}
}

//k-th node is the intermediate node in each round


for(int k=1;k<=n;k++){
for(int i=1;i<=n;i++){
for(int j=1;j<=n;j++)
distance[i][j]=min(distance[i][j],distance[i][k]+distance[k][j]);
}
}

Time Complexity 𝑂(𝑛3 )


Easy to implement but quite slow, hence used only on small graphs.
Complete Search Algorithms
This is practically a brute-force method: we create all possible solutions to a problem using
brute force, then select the best solution or count the number of solutions. These algorithms take
a lot of time and are improved upon by dynamic programming and greedy algorithms.

Backtracking
A type of complete search where we begin with an empty solution and extend it step by step.
The search recursively goes through all the different ways in which a solution can be
constructed, one piece at a time.

Types of Backtracking problems:


1. Decision Problem – We search for a feasible solution.
2. Optimization Problem – We search for best solution.
3. Enumeration Problem – We find all feasible solutions.

Almost every backtracking problem can be solved much more efficiently using greedy
algorithms or dynamic programming. Unlike plain recursion, where we call the function until we
reach a base case, in backtracking we use recursion to explore every possible solution, killing
along the way the branches that cannot lead to one.

N-Queens Problem:
Classic case of backtracking.
Find the number of ways to place n queens on an n×n chessboard such that no two queens
attack each other.

We solve the n = 4 case: in step 1 we create all 4 possible subcases of placing a queen in row 1,
then go on placing more queens and discarding invalid partial solutions, and so on.

Implementation:
We can't place the next queen on a column or diagonal that already holds a queen. To check
this, each column and each diagonal is assigned an index, as used in the code below: column x,
one set of diagonals x + y, the other x − y + n − 1.
Code:
void search(int y){
if(y==n) {count++; return;}

for(int x=0;x<n;x++){
if(column[x] || diag1[x+y] || diag2[x-y+n-1]) continue;
column[x] = diag1[x+y] = diag2[x-y+n-1] = 1;
search(y+1);
column[x] = diag1[x+y] = diag2[x-y+n-1] = 0;
}
}

Explanation:
1. Start at the topmost row.
2. If all queens are placed, return true.
3. Try all columns in the current row. For each column:
(a) If the queen can be safely placed, mark the column and diagonals as part of the solution
and recursively check whether this leads to a full solution.
(b) If placing this queen leads to a solution, return true.
(c) If not, unmark it and go back to (a) to try the next column.
4. If all columns have been tried and nothing worked, return false.

Greedy Algorithms
Constructs a solution to a problem by always making the choice that looks best at the moment.
The challenge is constructing the solution such that the local optimum leads to the global
optimum.

Coin Problem
Given the set of coins {1, 2, 5, 10, 20, 50, 100, 200}, each available in infinite number, find a way
to make the sum 520 with the minimum number of coins.

Answer: 200 + 200 + 100 + 20. The greedy method of always taking the largest coin that fits
actually always works for this coin set. In the general case of some other given set of coins, this
greedy method may not work.
Counter-example: with coins {1, 3, 4}, to create 6 greedy gives the solution 4 + 1 + 1; the answer
however is 3 + 3. These problems are handled by dynamic programming.

Data Compression
This is a way of assigning codewords to the letters of a string (as an example) such that less
space is consumed. A binary code assigns to each character of the string a codeword that
consists of bits. We want to be able to regenerate the original string from the codewords, and
we design the codewords so that the space taken by the coded string is minimum; thus we
assign shorter codewords to characters which appear more often in the string.

Huffman Coding
Huffman Coding is a greedy algorithm that constructs an optimal code for compressing a given
string. The algorithm creates a binary tree based on the frequencies of the characters in the
string, and each character's codeword is read by moving from the root node to the node
corresponding to that character. Each move to the right corresponds to '1' and each move to
the left to '0'.

Initially each character is assigned a node and a weight (the frequency of its appearance in the
string). Then at each step the two nodes with minimum weights are combined to give a new
node with weight equal to the sum of the weights of its children.

Example:
The string 𝐴𝐴𝐵𝐴𝐶𝐷𝐴𝐶𝐴 has the weights A-5, B-1, C-2, D-1. Repeatedly combining the two
lightest nodes yields a tree whose root-to-leaf paths give, for example, the codewords A = 0,
C = 10, B = 110, D = 111 (the tree figures are omitted). These codewords can then be used to
reconstruct the string 𝐴𝐴𝐵𝐴𝐶𝐷𝐴𝐶𝐴.
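
A minimal sketch of the tree construction using a min-priority queue of nodes ordered by weight;
the node structure and names are illustrative choices:

#include <iostream>
#include <queue>
#include <string>
#include <vector>
using namespace std;

struct HNode {
    int freq; char ch;          //ch is meaningful only at leaves
    HNode *left = nullptr, *right = nullptr;
};
struct Cmp { bool operator()(HNode* a, HNode* b){ return a->freq > b->freq; } };

//print the codeword of every leaf: left = '0', right = '1'
void printCodes(HNode* node, string code){
    if(!node->left && !node->right){ cout << node->ch << ": " << code << "\n"; return; }
    printCodes(node->left,  code + "0");
    printCodes(node->right, code + "1");
}

HNode* buildHuffman(const vector<pair<char,int>>& freqs){
    priority_queue<HNode*, vector<HNode*>, Cmp> pq;
    for(auto& p : freqs) pq.push(new HNode{p.second, p.first});
    while(pq.size() > 1){
        //combine the two minimum-weight nodes into a new node
        HNode* a = pq.top(); pq.pop();
        HNode* b = pq.top(); pq.pop();
        pq.push(new HNode{a->freq + b->freq, 0, a, b});
    }
    return pq.top(); //root of the Huffman tree
}

//usage: printCodes(buildHuffman({{'A',5},{'B',1},{'C',2},{'D',1}}), "");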

Dynamic Programming
Combines ideas from both greedy algorithms and complete search. It is possible to use dynamic
programming when the problem can be divided into overlapping subproblems which can be
solved independently.
Uses:

1. Finding an optimal solution : Find a solution as large or as small as possible.


2. Counting number of solutions : Calculate total number of possible solutions.

Coin Problem
This doesn’t end up being solved each time we use greedy algorithms so we use dynamics
programming. This uses recursive function that goes through all possibilities to form sum like brute
force but this is more efficient cause it calculates answer to each subproblem only once -
Memoization.

Idea:
𝑠𝑜𝑙𝑣𝑒(𝑥) = min(𝑠𝑜𝑙𝑣𝑒(𝑥 − 𝑐) + 1) over all 𝑐 in the set of coins, with 𝑠𝑜𝑙𝑣𝑒(0) = 0. This recursive
solution works, but it is not efficient on its own because many values are calculated repeatedly.
Thus, once 𝑠𝑜𝑙𝑣𝑒(𝑥) has been computed, we store it in an array so that it is not solved again
later.

Implementation 1:
//ready[x] tells if we have already calculated solve(x);
//if we have, return the stored value,
//else compute it and record value[x], ready[x]
int solve(int x){
    if(x<0) return INF;
    if(x==0) return 0;
    if(ready[x]) return value[x];
    int best = INF;
    for(auto c : coins) best=min(best,solve(x-c)+1);
    value[x]=best;
    ready[x]=true;
    return best;
}

Implementation 2:
Based on calculating the values of solve(x) iteratively for all x.
value[0]=0;
for(int x=1;x<=n;x++){
value[x]=INF;
for(auto c : coins){
if(x-c>=0) value[x]=min(value[x],value[x-c]+1);
}
}

Longest Increasing Subsequence


Find the maximum length of an increasing subsequence of a given array. We create an array
𝑙𝑒𝑛𝑔𝑡ℎ[𝑁] of size 𝑁, in which 𝑙𝑒𝑛𝑔𝑡ℎ[𝑘] tells the length of the longest increasing subsequence
whose last element is at index k of the given array. In other words,
𝑙𝑒𝑛𝑔𝑡ℎ[𝑘] = 𝑙𝑒𝑛𝑔𝑡ℎ[𝑖] + 1, with 𝑙𝑒𝑛𝑔𝑡ℎ[𝑖] as large as possible over 𝑖 < 𝑘 with 𝑎𝑟𝑟𝑎𝑦[𝑖] < 𝑎𝑟𝑟𝑎𝑦[𝑘].
The base case is that if no such i exists, length[k] = 1. Since the construction of 𝑙𝑒𝑛𝑔𝑡ℎ[𝑘]
requires the values for indices less than k, we use DP.

Implementation:
for(int k=0;k<n;k++){
length[k]=1;
for(int i=0;i<k;i++){
if(array[i]<array[k]) length[k]=max(length[k],length[i]+1);
}
}

Paths in a Grid
Given an 𝑛×𝑛 square grid with a value in each square, we have to find a path from the top-left
to the bottom-right square such that the sum of the values on the squares it traverses is
maximum, where we can only move down or right.
Solution:
𝑠𝑢𝑚(𝑦, 𝑥) = max(𝑠𝑢𝑚(𝑦, 𝑥 − 1), 𝑠𝑢𝑚(𝑦 − 1, 𝑥)) + 𝑣𝑎𝑙𝑢𝑒[𝑦][𝑥]
Rows and columns are numbered from 1 to n; sum(y,x) = 0 when y = 0 or x = 0 is assigned as the
base case. We can then compute the values iteratively.

Implementation:
for(int y=1;y<=n;y++){
for(int x=1;x<=n;x++){
sum[y][x] = max(sum[y][x-1],sum[y-1][x]) + value[y][x];
}
}

Time Complexity : 𝑂(𝑛2 )

Knapsack Problems
Knapsack refers to problems where a set of objects is given and we have to find subsets with
some property.

Example Problem:
Given a set of numbers, find all sums which can be constructed by taking a subset.
This can be simplified to 𝑝𝑜𝑠𝑠𝑖𝑏𝑙𝑒(𝑥, 𝑘) = 𝑝𝑜𝑠𝑠𝑖𝑏𝑙𝑒(𝑥 − 𝑤[𝑘], 𝑘 − 1) or 𝑝𝑜𝑠𝑠𝑖𝑏𝑙𝑒(𝑥, 𝑘 − 1), where
𝑝𝑜𝑠𝑠𝑖𝑏𝑙𝑒(𝑥, 𝑘) tells whether a sum of x is possible with the first k elements of the sequence; the
subsets are broken down into those which involve the weight at the 𝑘th index and those which
don't. The base case is
𝑝𝑜𝑠𝑠𝑖𝑏𝑙𝑒(𝑥, 0) = True if 𝑥 = 0; False if 𝑥 ≠ 0.

Implementation 1:
possible[0][0] = true;
for(int k=1;k<=n;k++){
//W is sum of all elements in set
for(int x=0;x<=W;x++){
if(x-w[k]>=0) possible[x][k] |= possible[x-w[k]][k-1];
possible[x][k] |= possible[x][k-1];
}
}

Binary Indexed Trees – Fenwick Tree


Let 𝑓 be some reversible function and A an array of integers of length N. The Fenwick tree is a
data structure which calculates the value of the function 𝑓 over a given range [𝑙, 𝑟] in 𝑂(log 𝑛)
time, updates a value in the same time, and requires 𝑂(𝑁) memory.

Overview:
Let 𝑓 be the sum function and A[0 ... N−1] the given array. The Fenwick tree is an array
T[0 ... N−1] in which each element is the sum of the elements of A in some range [𝑔(𝑖), 𝑖]:

T_i = A[g(i)] + A[g(i)+1] + ... + A[i],

where 𝑔 satisfies 0 ≤ 𝑔(𝑖) ≤ 𝑖.

There can be many ways to choose 𝑔(𝑖): if we choose 𝑔(𝑖) = 0 (zero-based indexing), the tree
answers range-sum queries in 𝑂(1) time but updates are slow.

Fenwick's Algorithm:
Fenwick's genius is in the definition of 𝑔(𝑖): replace all trailing 1 bits in the binary representation
of 𝑖 with 0 bits, i.e., 𝒈(𝒊) = 𝒊 & (𝒊 + 𝟏). In an easier, one-based form,
T_k = A[k − p(k) + 1] + ... + A[k], where 𝑝(𝑘) is the maximum power of 2 which divides 𝑘.

Using this tree, any 𝑠𝑢𝑚𝑞(1, 𝑘) can be calculated in 𝑂(log 𝑛) time, because the range [1, 𝑘] can
always be divided into 𝑂(log 𝑛) ranges whose sums are stored in the tree. Since each element
belongs to 𝑂(log 𝑛) ranges of the binary indexed tree, it suffices to update 𝑂(log 𝑛) values in
the tree.

Implementation:

Use 𝑝(𝑘) = 𝑘 & −𝑘.

//calculates sum_q(1,k)
int sum(int k){
    int s=0;
    while(k>=1){
        s+=tree[k];
        k-=k&-k;
    }
    return s;
}

//increases array value at position k by x


void add(int k, int x){
while(k<=n){
tree[k]+=x;
k+=k&-k;
}
}

Segment Trees
Segment trees support processing range queries and update queries. Range queries may
answer maximum, minimum or sum queries in 𝑂(log 𝑛) time. Memory requirement is higher than
for a Fenwick tree.

Structure to deal with sum queries:

Zero-based indexing is used throughout.
Nodes on the bottom level are the elements of the array; each node on a level above is the sum
of two adjacent nodes below it (the figure of an example segment tree is omitted).

Each internal tree node corresponds to an array range whose size is a power of two.
Any range [𝑎, 𝑏] can be divided into 𝑂(log 𝑛) such ranges.
When an array element is updated, all the nodes above it which have that element in their
range are updated; thus 𝑂(log 𝑛) nodes are updated.

Implementation:
If the array size 𝑛 is a power of 2, an array of size 2𝑛 is stored to represent the tree. Tree[1]
represents the top node, Tree[2] and Tree[3] its children, and so on; Tree[𝑛] to Tree[2𝑛 − 1]
represent the actual array. The parent of Tree[k] is Tree[⌊𝑘/2⌋], and its children are Tree[2𝑘]
and Tree[2𝑘 + 1].

//calculates sum_q(a,b)
int sum(int a, int b){
a+=n; b+=n;
int s=0;
while(a<=b) {
if(a%2==1) s+=tree[a++];
if(b%2==0) s+=tree[b--];
a/=2; b/=2;
}
return s;
}

//increases value of array at position k by x


void add(int k, int x){
k+=n;
tree[k]+=x;
for(k/=2 ; k>=1 ; k/=2 ) tree[k] = tree[2*k]+tree[2*k+1];
}

Other Queries:
Define the internal nodes of the segment tree such that, for each range whose size is a power
of 2, the node contains the result of the query operation over that range. Then for any asked
range the answer can be calculated in 𝑂(log 𝑛) time, the same way the sum is calculated. For
example, a segment tree for minimum queries stores the minimum of each range instead of the
sum.

Tree
A tree is a connected, acyclic graph that contains n nodes and n-1 edges. Removing any edge
from a tree divides it into two components, adding any edge to a tree creates a cycle, and
there is a unique path between any two nodes of a tree.

Leaves – nodes with only one neighbour.

Rooted Tree – one of the nodes is appointed the root of the tree and has no parent node; all
other nodes are placed underneath the root. The structure of a rooted tree is recursive: each
node of the tree acts as the root of the subtree of its children.

Tree Traversal
A general depth-first search started at an arbitrary node. The dfs function is given the starting
node and the previous node (if any) so that it doesn't revisit the node it just came from.

Implementation:
//adj[s] is the vector containing all nodes connected by a direct edge to s.
void dfs(int s, int e) {
for (auto u : adj[s]) if (u != e) dfs(u, s);
}

Diameter – the maximum length of a path between any two nodes. There may be multiple
maximum-length paths.

Diameter of Tree Algorithm


Algorithm 1:
Every path in a rooted tree has a highest point: the highest node that belongs to the path. Thus
we can calculate for each node the length of the longest path whose highest point is that node.
One of those paths is the diameter of the tree.
For each node 𝑥 we calculate:
1. 𝑡𝑜𝐿𝑒𝑎𝑓(𝑥): maximum length of a path from 𝑥 down to any leaf.
2. 𝑚𝑎𝑥𝐿𝑒𝑛𝑔𝑡ℎ(𝑥): maximum length of a path whose highest point is 𝑥.
Both are calculated in 𝑂(𝑛) time with DP.
To calculate 𝑡𝑜𝐿𝑒𝑎𝑓(𝑥), we go through the children of 𝑥, choose the child 𝑐 with maximum
𝑡𝑜𝐿𝑒𝑎𝑓(𝑐), and add 1 to this.
To calculate 𝑚𝑎𝑥𝐿𝑒𝑛𝑔𝑡ℎ(𝑥), we choose two distinct children 𝑎, 𝑏 such that 𝑡𝑜𝐿𝑒𝑎𝑓(𝑎) + 𝑡𝑜𝐿𝑒𝑎𝑓(𝑏)
is maximum and add 2 to this sum.

Algorithm 2:
Uses 2 DFS runs. First choose an arbitrary node 𝑎 in the tree and find the farthest node 𝑏 from 𝑎;
then find the farthest node 𝑐 from 𝑏. The diameter of the tree is the distance between 𝑏 and 𝑐.
A sketch follows.
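
A minimal sketch of Algorithm 2 in the style of the traversal code above (dist[] and dfsDist are
illustrative names; nodes are assumed to be numbered 1..n):

int dist[N]; //distance from the current DFS source

void dfsDist(int s, int e, int d){
    dist[s] = d;
    for(auto u : adj[s]) if(u != e) dfsDist(u, s, d + 1);
}

int diameter(int n){
    dfsDist(1, 0, 0);  //DFS from arbitrary node a = 1
    int b = 1;
    for(int i = 1; i <= n; i++) if(dist[i] > dist[b]) b = i;
    dfsDist(b, 0, 0);  //DFS from the farthest node b
    int c = 1;
    for(int i = 1; i <= n; i++) if(dist[i] > dist[c]) c = i;
    return dist[c];    //distance between b and c
}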

Spanning Trees
A spanning tree of a graph consists of all the nodes of the graph and some of its edges, such
that there is a path between any two nodes.
Weight of a spanning tree: the sum of its edge weights.

Minimum Spanning Tree: Spanning Tree with minimum weight.


Maximum Spanning Tree: Spanning Tree with maximum weight.

Kruskal’s Algorithm
Sort all the edges in order of increasing weight and initially set up a forest with just the nodes
and no edges. Then keep adding the next smallest edge as long as the tree remains acyclic.
Kruskal's algorithm is based on a greedy strategy.

Implementation:
//edge list representation
//sort m edges in O(mlogm) time
for(...){
if(!same(a,b)) unite(a,b);
}
//same(a,b) tells if a,b are in the same component
//unite(a,b) joins the components containing a and b

𝑠𝑎𝑚𝑒(𝑎, 𝑏) and 𝑢𝑛𝑖𝑡𝑒(𝑎, 𝑏) can be implemented using the Union-Find data structure in 𝑂(log 𝑛) time.

Time Complexity 𝑂(𝑚 log 𝑛)

Union-Find Structure
This maintains a collection of disjoint sets. Two 𝑂(log 𝑛) operations are supported: the 𝑢𝑛𝑖𝑡𝑒
operation joins two sets, and the 𝑓𝑖𝑛𝑑 operation finds the representative of the set containing a
given element.

Structure:

Each set is stored as a chain of elements pointing towards its representative (in the omitted
figure, the representatives of the three sets are 4, 5 and 2).

Two sets are joined by joining their representative elements; generally the representative of the
smaller set is connected to the representative of the larger set.

Implementation:
Using arrays: the array 𝑙𝑖𝑛𝑘 contains, for each element, the next element in the chain, or the
element itself if it is a representative; the array 𝑠𝑖𝑧𝑒 stores, for each representative, the size of
its set.
//initially each element is a separate set
for(int i=1;i<=n;i++) link[i]=i;
for(int i=1;i<=n;i++) size[i]=1;

int find(int x){


while(x!=link[x]) x=link[x];
return x;
}

bool same(int a, int b){


return find(a)==find(b);
}

void unite(int a,int b){


a=find(a);
b=find(b);
if(size[a]<size[b]) swap(a,b);
size[a]+=size[b];
link[b]=a;
}

Prim’s Algorithm
First chooses an arbitrary node, then always chooses the minimum-weight edge that adds a new
node to the tree. Finally, when all nodes have been added, the result is a minimum spanning tree.
Implementation is done with a priority queue.

Implementation:
The priority queue should contain all nodes that can be connected to the current component by
a single edge, ordered by increasing edge weight. A sketch follows.
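
A minimal sketch, assuming the weighted adjacency list adj[] of (node, weight) pairs used
earlier; the queue stores (-weight, node) so the smallest weight comes out first:

priority_queue< pair<int,int> > q; //(-edge weight, node)
bool intree[N];
long long total = 0;               //weight of the spanning tree

q.push({0, 1});                    //start from node 1 (arbitrary)
while(!q.empty()){
    int w = -q.top().first, a = q.top().second; q.pop();
    if(intree[a]) continue;        //already connected
    intree[a] = true;
    total += w;
    for(auto e : adj[a])           //e = (neighbour, weight)
        if(!intree[e.first]) q.push({-e.second, e.first});
}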

Time Complexity 𝑂(𝑛 + 𝑚 log 𝑚 )

Sliding Window
A sliding window is a constant-size sub-array that moves from left to right through the array. At
each window position we need to calculate some information about the elements inside the
window.
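
As an illustration, a sketch computing the sum of every window of size k in 𝑂(𝑛) total time, by
adding the entering element and removing the leaving one:

//prints the sum of each length-k window of a[0..n-1]
void windowSums(int a[], int n, int k){
    long long s = 0;
    for(int i = 0; i < n; i++){
        s += a[i];              //element entering the window
        if(i >= k) s -= a[i-k]; //element leaving the window
        if(i >= k-1) cout << s << " ";
    }
    cout << endl;
}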

Bibliography
With the last topic finished, this ends my journey for the Summer of Science project based on
Data Structures and Algorithms. I would like to thank my mentor, Ritik Mandal, who was very
supportive of whatever way we wanted to shape our report and did not impose any form of
restriction on us, which perfectly suited my working style.
I took references from:

1. GeeksforGeeks
2. Introduction to Algorithms – Thomas H. Cormen
3. Competitive Programmer's Handbook – Antti Laaksonen
4. YouTube channel – MyCodeSchool
