
Module I: Overview of Data Structures

At the end of this module, you will learn:

 Concept of Data Structures,
 Overview of algorithmic analysis, and
 Various other approaches.

Unit Outcomes:

At the end of the unit, you will learn:
 Concept of Stack Data Structures
 Queue Data Structure and Algorithm
 Linked List Data Structure and Algorithm
 Doubly Linked List – Data Structure and Algorithm
 Stack and Queue Applications
 Analysis and Efficiency of Algorithms
 Concept of Asymptotic Notations
 Complexity in Time
 Analyzing Recursive Programs Using Various Strategies
 Divide and conquer recurrence equations and their solutions
 Review of various sorting techniques and their time complexity

1.1 Overview of Data Structures

1.1.1 Stack Data Structure


A stack is an abstract data type with a predefined (limited) capacity. It is a basic
data structure that allows elements to be added and removed in a specific order. Every time an
element is added, it goes on top of the stack, and, much like a pile of physical objects, only
the element at the top of the stack can be removed.

Stack's Basic Characteristics

The stack is an ordered list of elements of the same type.

A stack follows the LIFO (Last In, First Out) principle, equivalently described as FILO
(First In, Last Out).

The function push() is used to insert new elements into the stack, and the function pop() is
used to delete elements from the stack. Both insertion and deletion are allowed only at one
end of the stack, called the Top.

When the stack is completely full, it is said to be in an overflow state, and when it is
empty, it is said to be in an underflow state.

Stack's Applications

Reversing a word is the simplest use of a stack. You push the word onto a stack, letter by
letter, and then pop the letters off the stack. Other uses exist as well, such as parsing and
expression conversion (infix to postfix, postfix to prefix, etc.).

Implementation of Stack Data Structure

A stack can easily be implemented using an array or a linked list. Arrays are fast but
limited in size, while a linked list is not limited in size but incurs overhead to allocate,
link, unlink, and deallocate nodes. We are going to implement a stack using an array here.
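Before moving on, a minimal sketch of an array-based stack in C++ is given below for
illustration; the class name ArrayStack and the fixed capacity MAX are assumptions made here,
not part of the original material.

#include<iostream>
using namespace std;

#define MAX 10   // assumed fixed capacity for this sketch

class ArrayStack
{
    int a[MAX];
    int top;   // index of the last pushed element, -1 when the stack is empty
public:
    ArrayStack() { top = -1; }
    bool isEmpty() { return top == -1; }
    bool isFull()  { return top == MAX - 1; }
    void push(int x)
    {
        if(isFull()) { cout << "Stack overflow" << endl; return; }
        a[++top] = x;            // store the element and move top up
    }
    int pop()
    {
        if(isEmpty()) { cout << "Stack underflow" << endl; return -1; }
        return a[top--];         // return the top element and move top down
    }
    int peek() { return a[top]; }   // read the top element without removing it
};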


Fundamental Activities

Stack operations can include initialization, use, and then de-initialization of the stack.
Apart from these essentials, a stack is used through the following two primary operations:

push() − pushing (storing) an element onto the stack.

pop() − removing (accessing) an element from the stack.

To use a stack effectively, we also need to check the state of the stack. The following
functions are provided on stacks for this purpose:

peek() − get the top data element of the stack, without removing it.

isFull() − check whether the stack is full.

isEmpty() − check whether the stack is empty.

We always retain a pointer to the last PUSHed element on the stack. Since this pointer
always represents the top of the stack, it is called top. The top pointer gives the stack's
top value without removing it.
1.1.2 Queue Data Structure and Algorithm

Like the stack, a queue is an abstract data type, or a linear data structure, in which a new
element is added at one end, called REAR (also called the tail), and an existing element is
removed from the other end, called FRONT (also called the head).

This makes the queue a FIFO (First In, First Out) data structure, which ensures that the
element inserted first will be removed first.

This is exactly how a queue works in the real world. If you go to the movie ticket counter to
purchase tickets and you are first in the queue, you will be the first one to get the
tickets. The same is the case for the queue data structure: the data that is entered first
will exit the queue first.

The process by which an element is inserted into the queue is called Enqueue, and the process
by which an element is removed from the queue is called Dequeue.

The figure below shows the queue operations:


We will try to explain the fundamental operations associated with queues here.

Enqueue() − to add (store) an element to the queue.

Dequeue() − to remove (access) an element from the queue.

Queue's Basic Features

1. Like a stack, the queue is an ordered list of elements of related data types.

2. A queue is a FIFO (First In, First Out) structure.

3. Once a new element is inserted into the queue, all the elements inserted before it must be
removed before the new element itself can be removed.

4. The peek() function is typically used to return, without dequeuing, the value of the first
element.

Queue Applications

A queue, as the name indicates, is used whenever we need to handle a group of objects in an
order in which the one that comes in first also gets out first, while the others wait for
their turn, as in the following scenarios:

1. Serving requests on a single shared resource, such as a printer, CPU task scheduling, etc.

2. In real-life situations, call centre telephone networks use queues to keep people calling
them in order until a service representative is free.

3. Interrupt handling in real-time systems. The interrupts are handled in the same order as
they arrive, i.e., first come, first served.



Queue Data Structure Implementation

An array, a stack, or a linked list may be used to implement the queue. The simplest way to
implement a queue is with an array.

Initially, the head (FRONT) and the tail (REAR) of the queue both point to the first index of
the array (with array indexing starting at 0). As we add elements to the queue, the tail
keeps moving forward, always pointing to the position where the next element will be
inserted, while the head stays at the first index.


When we remove an element from the queue, we can follow two possible approaches (marked [A]
and [B] in the diagram above). In approach [A], we delete the element at the head position
and then shift all the other elements one position forward.

In approach [B], we remove the element from the head position and then move the head to the
next position.

Approach [A] has the overhead of shifting the elements one position forward each time we
remove the first element.

Approach [B] has no such overhead, but every time we move the head one position ahead after
removing the first element, the usable size of the queue is reduced by one slot.

Algorithm for the ENQUEUE operation

1. Check whether or not the queue is full.

2. If the queue is full, print an overflow error and exit.

3. If the queue is not full, increment the tail and add the element to the queue.

Algorithm for the DEQUEUE operation

1. Check whether or not the queue is empty.

2. If the queue is empty, print an underflow error and exit.

3. If the queue is not empty, return the head element and increment the head.

/* Below program is written in C++ language */

#include<iostream>

using namespace std;

#define SIZE 10

class Queue
{
    int a[SIZE];
    int rear;    //same as tail
    int front;   //same as head

public:
    Queue()
    {
        rear = front = -1;
    }
    //declaring enqueue, dequeue and display functions
    void enqueue(int x);
    int dequeue();
    void display();
};

// function enqueue - to add data to queue
void Queue :: enqueue(int x)
{
    if(front == -1)
    {
        front++;        // first element: head now points to index 0
    }
    if(rear == SIZE-1)
    {
        cout << "Queue is full";
    }
    else
    {
        a[++rear] = x;
    }
}

// function dequeue - to remove data from queue
int Queue :: dequeue()
{
    return a[front++];   // following approach [B], explained above
}

// function to display the queue elements
void Queue :: display()
{
    int i;
    for(i = front; i <= rear; i++)
    {
        cout << a[i] << endl;
    }
}

// the main function
int main()
{
    Queue q;

    q.enqueue(10);
    q.enqueue(100);
    q.enqueue(1000);
    q.enqueue(1001);
    q.enqueue(1002);
    q.dequeue();
    q.enqueue(1003);
    q.dequeue();
    q.dequeue();
    q.enqueue(1004);

    q.display();

    return 0;
}

To implement approach [A], you simply need to change the dequeue method: return a[0] (the
first element) and add a for loop that shifts all the remaining elements forward by one
position.

for(i = 0; i < rear; i++)   // shifting all the other elements one position forward
{
    a[i] = a[i+1];
}
rear--;
1.1.3 Linked List Data Structure and Algorithm

Introduction to Linked Lists

The linked list is a very widely used linear data structure consisting of a set of nodes in a
sequence. Each node contains its data and the address of the next node, thus creating a
chain-like structure. Linked lists are also used to build trees and graphs.

Advantages of Linked Lists

1. They are dynamic in nature and allocate memory when needed.

2. Insertion and deletion operations can be executed quickly.

3. Stacks and queues can be implemented conveniently.

4. A linked list reduces access time.

Drawbacks of Linked Lists

1. Memory is wasted because pointers require extra storage space.

2. No element can be accessed randomly; each node must be accessed sequentially.

3. Reverse traversal of a singly linked list is hard.

Applications of Linked Lists

1. Linked lists are used to implement stacks, queues, graphs, etc.

2. Linked lists let you insert elements at the beginning and end of the list.

3. With linked lists, we do not need to know the size in advance.

Types of Linked Lists

There are three different linked list implementations available:

1. Singly Linked List.

2. Doubly Linked List.

3. Circular Linked List.

Let us get to know more about them and how they differ from each other.
Singly Linked List

Singly linked lists contain nodes that have a data part as well as an address part, i.e.
next, which points to the next node in the chain of nodes.

Insertion, deletion, and traversal are the operations we can perform on singly linked lists.

Doubly Linked List

In a doubly linked list, each node contains a data part and two addresses, one for the
previous node and one for the next node.

Circular Linked List

In a circular linked list, the last node of the list holds the address of the first node,
thus creating a circular chain.
Fundamental Operations

The basic operations supported by a list are the following:

 Insertion − adds an element at the beginning of the list.
 Deletion − deletes an element at the beginning of the list.
 Display − displays the complete list.
 Search − searches for an element using the given key.
 Delete − deletes an element using the given key.
Operation Insertion

Adding a new node to a linked list is a multi-step activity, which we shall learn here with
diagrams. First, create a node using the same structure and find the location where it has to
be inserted.

Imagine we are inserting node B (NewNode) between node A (LeftNode) and node C (RightNode).
Point B.next at C:

NewNode.next −> RightNode;

Now the node on the left should point to the new node:

LeftNode.next −> NewNode;

This places the new node in the middle of the two. The updated list should look like the
figure below.

Similar steps should be taken if the node is being inserted at the beginning of the list.
While inserting it at the end, the second last node of the list should point to the new node,
and the new node should point to NULL.
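A minimal C++ sketch of inserting a node at the beginning of a singly linked list is shown
below; the struct name Node and the function name insertFirst are illustrative assumptions,
not part of the original text.

struct Node
{
    int data;
    Node* next;
};

// insert a new node at the beginning of the list; head is passed by reference
void insertFirst(Node*& head, int value)
{
    Node* newNode = new Node;   // create the node
    newNode->data = value;
    newNode->next = head;       // point it at the old first node (or NULL for an empty list)
    head = newNode;             // the new node becomes the head
}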

Operation Deletion

Deletion is also a process involving more than one step, which we will understand with a
pictorial representation. First, locate the target node to be deleted using a search
algorithm.

The left (previous) node of the target node should now point to the next node of the target
node:

LeftNode.next −> TargetNode.next;

This removes the link that was pointing to the target node. Next, we remove what the target
node is pointing to, using the following code:

TargetNode.next −> NULL;

If we need to use the deleted node, we can keep it in memory; otherwise, we can simply
deallocate its memory and wipe off the target node entirely.
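A minimal C++ sketch of this deletion, reusing the hypothetical Node structure from the
previous sketch, might look as follows.

#include <cstddef>   // for NULL

// delete the first node whose data equals key; head is passed by reference
void deleteNode(Node*& head, int key)
{
    Node* current = head;
    Node* previous = NULL;
    while(current != NULL && current->data != key)   // search for the target node
    {
        previous = current;
        current = current->next;
    }
    if(current == NULL) return;                      // key not found
    if(previous == NULL)
        head = current->next;                        // target was the first node
    else
        previous->next = current->next;              // unlink the target node
    delete current;                                  // deallocate its memory
}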

Operation Reverse

This operation is a thorough one. We need to make the head node point to the last node and
reverse the entire linked list.

First, we traverse to the end of the list. The last node should be pointing to NULL. Now, we
shall make it point to its previous node instead.

We have to make sure that the last node is not lost. So, we keep a temp node, which looks
like a head node pointing to the last node. Now, one by one, we make all the nodes on the
left side point to their previous nodes.

Except for the node (first node) pointed to by the head node, all nodes should point to their
predecessor, making that predecessor their new successor. The first node will point to NULL.

Using the temp node, we make the head node point to the new first node.

The linked list is now reversed.
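An iterative C++ sketch of the reversal described above is shown below, again assuming the
hypothetical Node structure introduced earlier.

#include <cstddef>   // for NULL

// reverse a singly linked list iteratively; returns the new head
Node* reverseList(Node* head)
{
    Node* previous = NULL;
    Node* current = head;
    while(current != NULL)
    {
        Node* next = current->next;   // remember the rest of the list
        current->next = previous;     // make the current node point to its predecessor
        previous = current;           // advance previous and current
        current = next;
    }
    return previous;                  // previous now points to the new first node
}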

1.1.4 Doubly Linked List – Data Structure and Algorithm

The doubly linked list is a form of linked list in which each node carries two links apart
from storing its data. The first link points to the previous node in the list, while the
second link points to the next node in the list. The first node of the list has its previous
link pointing to NULL; likewise, the last node of the list has its next link pointing to
NULL.

The relevant terms for understanding the notion of a doubly linked list are below.

Link − Each link of a linked list can store a data item called an element.

Next − Each link in a linked list includes a link named next to the next link.

Prev − Each link in a linked list includes a link named prev to the previous link.

Linked List − A linked list contains the connection link to the first link, called First, and
to the last link, called Last.

Representation of Doubly Linked List:

As per the representation above, the essential points to be considered are below.

 The doubly linked list has link fields named first and last.
 Each link carries a data field (or fields) and two link fields called next and prev.
 Each link is connected to its next link using its next link.
 Each link is connected to its previous link using its prev link.
 The last link carries a NULL link to mark the end of the list.

Fundamental Activities

The basic operations provided by a doubly linked list are the following:

 Insertion − adds an element at the beginning of the list.
 Deletion − deletes an element from the beginning of the list.
 Insert Last − adds an element at the end of the list.
 Delete Last − deletes an element from the end of the list.
 Insert After − adds an element after an item in the list.
 Delete − deletes an element from the list using a key.
 Display Forward − displays the complete list in a forward manner.
 Display Backward − displays the complete list in a backward manner.

Operation Insertion

The following code shows the insertion operation at the beginning of a doubly linked list.

//insert link at the first location
void insertFirst(int key, int data) {

   //create a link
   struct node *link = (struct node*) malloc(sizeof(struct node));
   link->key = key;
   link->data = data;

   if(isEmpty()) {
      //make it the last link
      last = link;
   } else {
      //update first prev link
      head->prev = link;
   }

   //point it to old first link
   link->next = head;

   //point first to new first link
   head = link;
}

Operation Deletion

The following code shows the deletion operation at the beginning of a doubly linked list.

//delete first item
struct node* deleteFirst() {

   //save reference to first link
   struct node *tempLink = head;

   //if only one link
   if(head->next == NULL) {
      last = NULL;
   } else {
      head->next->prev = NULL;
   }

   head = head->next;

   //return the deleted link
   return tempLink;
}

Insertion at the End of the List

//insert link at the last location
void insertLast(int key, int data) {

   //create a link
   struct node *link = (struct node*) malloc(sizeof(struct node));
   link->key = key;
   link->data = data;

   if(isEmpty()) {
      //make it the last link
      last = link;
   } else {
      //make link a new last link
      last->next = link;

      //mark old last node as prev of new link
      link->prev = last;
   }

   //point last to new last node
   last = link;
}

1.1.5 Applications

Stack's Applications

Reversing a word is the simplest use of a stack. You push the word onto a stack, letter by
letter, and then pop the letters off the stack. Other uses exist as well, such as parsing and
expression conversion (infix to postfix, postfix to prefix, etc.).

Queue Applications

A queue, as the name indicates, is used whenever we need to handle a group of objects in an
order in which the one that comes in first also gets out first, while the others wait for
their turn, as in the following scenarios:

1. Serving requests on a single shared resource, such as a printer, CPU task scheduling, etc.

2. In real-life situations, call centre telephone networks use queues to keep people calling
them in order until a service representative is free.

3. Interrupt handling in real-time systems. The interrupts are handled in the same order as
they arrive, i.e., first come, first served.

1.1.6 Analysis and Efficiency of Algorithms

An algorithm's efficiency can be evaluated at two separate stages, before and after
implementation. These are the following:

A Priori Analysis − This is the theoretical analysis of an algorithm. The efficiency of an
algorithm is measured by assuming that all other factors, such as processor speed, are
constant and have no effect on the implementation.

A Posteriori Analysis − This is the empirical analysis of an algorithm. The chosen algorithm
is implemented in a programming language and then executed on the target computer machine.
In this analysis, actual statistics are collected, such as the running time and space
required.

Here we shall concentrate on a priori analysis. The study of algorithms deals with the
execution or running time of the various operations involved. The running time of an
operation can be described as the number of machine instructions executed per operation.

Complexity of Algorithms

If X is an algorithm and n is the size of the input data, the time and space used by
algorithm X are the two key factors that determine X's efficiency.

Time Factor: Time is measured by counting the number of key operations, such as comparisons
in a sorting algorithm.

Space Factor: Space is measured by counting the maximum memory space required by the
algorithm.

The complexity of an algorithm, f(n), gives the algorithm's required running time and/or
storage space in terms of n, the size of the input data.

Complexity in Space

The space complexity of an algorithm reflects the amount of memory space needed by the
algorithm throughout its life cycle. The space needed by an algorithm is equal to the sum of
the following two components:

A fixed part is the space needed to store certain data and variables that are independent of
the size of the problem. For example, simple variables and constants used, program size, etc.

A variable part is the space required by variables whose size depends on the size of the
problem. For example, dynamic memory allocation, recursion stack space, etc.

The space complexity S(P) of any algorithm P is S(P) = C + SP(I), where C is the fixed part
and SP(I) is the variable part of the algorithm, which depends on the characteristics of
instance I.

The following brief example helps to illustrate the concept.

Algorithm: SUM(A, B)

Step 1 − Start

Step 2 − C ← A + B + 10

Step 3 − Stop

Here we have three variables, A, B, and C, and one constant. Hence S(P) = 1 + 3. Now, the
actual space depends on the data types of the given variables and constants and will be
multiplied accordingly.

Analysis of Algorithm

In the theoretical analysis of algorithms, it is usual to estimate the complexity function in
an asymptotic sense, that is, for arbitrarily large input. Donald Knuth coined the term
"analysis of algorithms."

Algorithm analysis is a vital part of computational complexity theory, since it provides a
theoretical estimate of how much time and resources an algorithm would take to solve a given
problem. Most algorithms are designed to work with inputs of arbitrary length. Algorithm
analysis is the determination of the amount of time and space resources required to execute
it.

Usually, an algorithm's efficiency or running time is expressed as a function relating the
input length to the number of steps (time complexity) or the volume of memory (space
complexity).

The Importance of Analysis

In this chapter, we will discuss why it is important to analyze algorithms and how to choose
the best algorithm for a given problem, since one computational problem can be solved by a
variety of algorithms.

By considering an algorithm for a particular problem, we can begin to develop pattern
recognition, so that similar types of problems can be solved with the help of this algorithm.

Algorithms are often quite different from one another, though they may have the same purpose.
We know, for instance, that a set of numbers can be sorted using different algorithms. The
number of comparisons performed by one algorithm may vary from that of others for the same
input. Hence, the time complexity of those algorithms will differ. At the same time, we need
to calculate the memory space required by each algorithm.
Algorithm analysis is the process of analyzing the problem-solving capability of an algorithm
in terms of the time and size required (the size of memory for storage during execution).
However, the most important factor in algorithm analysis is the required time or efficiency.
We usually perform the following kinds of analysis:

Worst case − the maximum number of steps taken on any instance of size n.

Best case − the minimum number of steps taken on any instance of size n.

Average case − the average number of steps taken on any instance of size n.

Amortized − a sequence of operations applied to an input of size n, averaged over time.

To solve a problem, we need to consider time as well as space complexity, as the program may
run on a system where memory is limited but adequate space is available, or vice versa. If we
compare bubble sort and merge sort in this sense, bubble sort needs no additional memory,
whereas merge sort requires additional space. Though bubble sort has a higher time complexity
than merge sort, it may be appropriate to use it if the program must run in a
memory-constrained environment.

Analysis Method – Asymptotic Analysis

Asymptotic analysis of an algorithm refers to defining the mathematical framing of its
run-time performance. Using asymptotic analysis, we can very well conclude the best case,
average case, and worst case scenario of an algorithm.

Asymptotic analysis is input bound, i.e., if there is no input to the algorithm, it is
concluded to work in constant time. All factors other than the input are considered constant.

Asymptotic analysis refers to computing the running time of any operation in mathematical
units of computation. For example, the running time of one operation may be computed as f(n)
and of another operation as g(n²). This means that as n increases, the running time of the
first operation will grow linearly, while the running time of the second operation will grow
quadratically. Similarly, the running times of both operations will be nearly the same if n
is significantly small.

The time required by an algorithm typically falls under one of three categories:

 Best Case − the minimum time required for program execution.
 Average Case − the average time required for program execution.
 Worst Case − the maximum time required for program execution.

1.1.7 Asymptotic Notations

The asymptotic notations widely used to measure an algorithm's running time complexity are as
follows:

Ο Notation

Ω Notation

θ Notation

Big Oh Notation, Ο

The notation Ο(n) is the formal way to express the upper bound of an algorithm's running
time. It measures the worst-case time complexity, or the longest amount of time an algorithm
can possibly take to complete.

For example, for a function f(n):

Ο(f(n)) = { g(n) : there exist c > 0 and n0 such that g(n) ≤ c·f(n) for all n > n0 }

Omega Notation, Ω

The notation Ω(n) is the formal way of expressing the lower bound of an algorithm's running
time. It measures the best-case time complexity, or the shortest amount of time an algorithm
can take to complete.

For a function f(n):

Ω(f(n)) = { g(n) : there exist c > 0 and n0 such that g(n) ≥ c·f(n) for all n > n0 }

Theta Notation, θ

The notation θ(n) is the formal way to express both the lower bound and the upper bound of an
algorithm's running time. It is represented as follows:

θ(f(n)) = { g(n) : g(n) = Ο(f(n)) and g(n) = Ω(f(n)) for all n > n0 }

Asymptotic Notations Used Frequently

A list of some common asymptotic notations is given below.

constant − Ο(1)
logarithmic − Ο(log n)
linear − Ο(n)
n log n − Ο(n log n)
quadratic − Ο(n²)
cubic − Ο(n³)
polynomial − n^Ο(1)
exponential − 2^Ο(n)
1.1.8 Complexity in Time

The time complexity of an algorithm reflects the amount of time required for the algorithm to
run to completion. Time requirements can be defined as a numerical function T(n), where T(n)
can be measured as the number of steps, provided each step consumes constant time.

For example, adding two n-bit integers takes n steps. Consequently, the total computational
time is T(n) = c * n, where c is the time taken for the addition of two bits. Here, we
observe that T(n) grows linearly as the input size increases.
1.1.9 Analyzing Recursive Programs Using Various Strategies

Many algorithms are inherently recursive. When we analyze them, we obtain a recurrence
relation for the time complexity: we express the running time for an input of size n as a
function of n and of the running time for smaller inputs. In Merge Sort, for example, to sort
a given array we divide it into two halves and recursively repeat the process for each half.
Finally, we merge the results. The time complexity of Merge Sort can be written as
T(n) = 2T(n/2) + cn. Many other algorithms, such as Binary Search and the Tower of Hanoi, are
also recursive.
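As an illustration of how a recurrence arises from code, a minimal recursive binary search in
C++ is sketched below; the function name binarySearch is an assumption made here. Each call
does constant work and makes one recursive call on half the range, giving the recurrence
T(n) = T(n/2) + c.

// recursive binary search on a sorted array; returns the index of key or -1
// recurrence: T(n) = T(n/2) + c, which solves to O(log n)
int binarySearch(int a[], int low, int high, int key)
{
    if(low > high) return -1;             // empty range: key not present
    int mid = (low + high) / 2;
    if(a[mid] == key) return mid;
    if(key < a[mid])
        return binarySearch(a, low, mid - 1, key);    // search the left half
    return binarySearch(a, mid + 1, high, key);       // search the right half
}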

Three methods of solving recurrences are mainly available.

1. Substitution Method: We make an informed guess at the solution and then use mathematical
induction to prove whether or not our guess is correct.

   Consider, for instance, the recurrence T(n) = 2T(n/2) + n.

   We guess the solution to be T(n) = O(n log n). Now we use induction to prove our guess.

   We need to show that T(n) ≤ c·n·log n. We can assume that this holds for values smaller
   than n. Then:

   T(n) = 2T(n/2) + n
        ≤ 2·c·(n/2)·log(n/2) + n
        = c·n·log n − c·n·log 2 + n
        = c·n·log n − c·n + n
        ≤ c·n·log n

2. Recurrence Tree Method: In this method, we draw a recurrence tree and calculate the time
taken by every level of the tree. Finally, we sum the work done at all levels. To draw the
recurrence tree, we start with the given recurrence and keep expanding until a pattern
emerges among the levels. Typically, the pattern is an arithmetic or geometric series.

   The figure below shows an example of a recurrence tree for a recurrence of the form
   T(n) = T(n/4) + T(n/2) + cn².

   To find the value of T(n), we add up the sums of the tree nodes level by level. Summing
   the tree level by level, we get the series

   T(n) = c(n² + 5(n²)/16 + 25(n²)/256 + ...)

   The series above is a geometric progression with ratio 5/16.

   We can bound the infinite series from above to get an upper bound.

   The sum is given as c(n²)/(1 − 5/16), which is O(n²).

3. Master Method: The Master Method is a direct way to get the solution. It works only for
the following type of recurrence, or for recurrences that can be transformed into this type:

   T(n) = aT(n/b) + f(n), where a ≥ 1 and b > 1.

   There are the following three cases:

   If f(n) = Θ(n^c) where c < log_b(a), then T(n) = Θ(n^log_b(a))

   If f(n) = Θ(n^c) where c = log_b(a), then T(n) = Θ(n^c · log n)

   If f(n) = Θ(n^c) where c > log_b(a), then T(n) = Θ(f(n))

   How does this work?

   The master method is mainly derived from the recurrence tree method. If we draw the
   recurrence tree of T(n) = aT(n/b) + f(n), we can see that the work done at the root is
   f(n), the work done at all the leaves is Θ(n^c) where c = log_b(a), and the height of the
   recurrence tree is log_b(n).


Unit 1.2 DIVIDE AND CONQUER PARADIGM

1.2.1 Divide and conquer recurrence equations and their solutions

In the divide and conquer approach, the problem at hand is divided into smaller sub-problems,
and then each sub-problem is solved independently. If we keep dividing the sub-problems into
even smaller sub-problems, we eventually reach a stage where no more division is possible.
The smallest, "atomic" sub-problems (fractions) are solved. Finally, the solutions of all
sub-problems are combined to obtain the solution of the original problem.

The divide-and-conquer strategy can be broken down into three steps.

Divide

The first step is breaking the problem down into smaller sub-problems. Sub-problems should
represent a part of the original problem. In general, this step uses a recursive approach to
divide the problem until no sub-problem is further divisible. At this stage, sub-problems
become atomic in nature, but they still represent some part of the larger problem.

Conquer/Solve

This step involves solving the many smaller sub-problems. Generally, at this level, the
problems are considered "solved" on their own.

Combine/Merge

When the smaller sub-problems are solved, this stage recursively combines them until they
formulate a solution to the original problem. This algorithmic approach works recursively,
and the conquer and merge steps are so close that they often appear as one.

The following computer algorithms are based on the divide and conquer programming approach:

 Merge Sort
 Quick Sort
 Binary Search
 Strassen's Matrix Multiplication
 Closest Pair (of points)

There are several approaches to solving any computer problem, but the ones mentioned above
are good examples of the divide and conquer strategy.
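To make the three steps concrete, a small C++ sketch that finds the maximum of an array by
divide and conquer is given below; the function name findMax is an assumption made here for
illustration only.

#include <algorithm>   // for std::max
using namespace std;

// divide: split the range in half; conquer: solve each half recursively;
// combine: the answer is the larger of the two partial results
int findMax(int a[], int low, int high)
{
    if(low == high) return a[low];              // atomic sub-problem: a single element
    int mid = (low + high) / 2;
    int leftMax  = findMax(a, low, mid);        // conquer the left half
    int rightMax = findMax(a, mid + 1, high);   // conquer the right half
    return max(leftMax, rightMax);              // combine the two solutions
}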


Dynamic Programming Approach

The dynamic programming approach is similar to divide and conquer in breaking down the
problem into smaller and smaller possible sub-problems. But, unlike divide and conquer, these
sub-problems are not solved independently. Instead, the results of these smaller sub-problems
are remembered and reused for similar or overlapping sub-problems.

Dynamic programming is used where we have problems that can be divided into similar
sub-problems so that their results can be reused. Mostly, these algorithms are used for
optimization. Before solving the in-hand sub-problem, the dynamic algorithm will try to
examine the results of the previously solved sub-problems. The solutions of sub-problems are
combined to achieve the best solution.

So we can say that:

 The problem should be divisible into smaller, overlapping sub-problems.
 An optimum solution can be achieved by using the optimum solutions of smaller sub-problems.
 Dynamic algorithms use memoization.

Comparison

In contrast to greedy algorithms, where local optimization is addressed, dynamic algorithms
are motivated by the overall optimization of the problem.

In contrast to divide and conquer algorithms, where solutions are combined to achieve an
overall solution, dynamic algorithms use the output of a smaller sub-problem and then try to
optimize a bigger sub-problem. Dynamic algorithms use memoization to remember the output of
already solved sub-problems.

Examples

The following computer problems can be solved using the dynamic programming approach:

Fibonacci number series
Knapsack problem
Tower of Hanoi
All-pairs shortest path by Floyd-Warshall
Shortest path by Dijkstra
Project scheduling

Dynamic programming can be used in both a top-down and a bottom-up manner. And, of course,
most of the time, referring to the previous solution output is cheaper than recomputing in
terms of CPU cycles.
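A minimal top-down C++ sketch of memoization applied to the Fibonacci series is shown below;
the function name fib and the use of a vector as the memo table are assumptions made here for
illustration.

#include <vector>
using namespace std;

// top-down Fibonacci with memoization; memo[i] caches fib(i) once it is computed
long long fib(int n, vector<long long>& memo)
{
    if(n <= 1) return n;
    if(memo[n] != -1) return memo[n];                 // reuse a stored sub-problem result
    memo[n] = fib(n - 1, memo) + fib(n - 2, memo);    // solve and remember the sub-problems
    return memo[n];
}

// usage: vector<long long> memo(n + 1, -1); then call fib(n, memo);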


We calculate the total work done in the recurrence tree. If the work done at the leaves is
polynomially larger, then the leaves are the dominant part, and our result becomes the work
done at the leaves (Case 1). If the work done at the leaves and at the root is asymptotically
the same, then our result becomes the height multiplied by the work done at any level
(Case 2). If the work done at the root is asymptotically larger, then our result becomes the
work done at the root (Case 3).

Examples of some standard algorithms whose time complexity can be evaluated using the Master
Method:

Merge Sort: T(n) = 2T(n/2) + Θ(n). It falls in case 2 as c is 1 and log_b(a) is also 1. So
the solution is Θ(n log n).

Binary Search: T(n) = T(n/2) + Θ(1). It also falls in case 2 as c is 0 and log_b(a) is also
0. So the solution is Θ(log n).

Observations:

1) It is not necessary that a recurrence of the form T(n) = aT(n/b) + f(n) can be solved
using the Master Theorem. The three cases given above have some gaps between them. For
example, the recurrence T(n) = 2T(n/2) + n/log n cannot be solved using the master method.

2) Case 2 can be extended to f(n) = Θ(n^c · log^k n): if f(n) = Θ(n^c · log^k n) for some
constant k ≥ 0 and c = log_b(a), then T(n) = Θ(n^c · log^(k+1) n).

1.2.2 Review of various sorting techniques and their time complexity

Sorting Techniques – Data Structure

Sorting refers to arranging data in a particular format. A sorting algorithm specifies the
way data should be arranged in a particular order. The most common orders are numerical and
lexicographical.

The importance of sorting lies in the fact that, if data is stored in a sorted manner,
searching the data can be optimized to a very high degree. Sorting is also used to represent
data in more readable formats. Some examples of sorting in real-life situations are below:

Telephone Directory − The telephone directory stores individuals' telephone numbers sorted by
their names, so that names can be searched easily.

Dictionary − The dictionary stores words in alphabetical order so that searching for any
particular word becomes easy.

In-place Sorting and Not-in-place Sorting

Sorting algorithms may require some extra space for comparisons and for temporary storage of
a few data elements. Algorithms that do not require any extra space sort the data in-place,
i.e. within the array itself. This is called in-place sorting. Bubble sort is an example of
in-place sorting.

In some sorting algorithms, however, the program requires space that is more than or equal to
the number of elements being sorted. Sorting that uses equal or more space is called
not-in-place sorting. Merge sort is an example of not-in-place sorting.

Stable and Not Stable Sorting

If a sorting algorithm, after sorting the contents, does not change the sequence in which
similar content appears, it is called stable sorting.

If a sorting algorithm, after sorting the contents, changes the sequence in which similar
content appears, it is called unstable sorting.

The stability of an algorithm matters when we wish to preserve the sequence of the original
elements, as in a tuple, for example.
Adaptive and Non-Adaptive Sorting Algorithms

A sorting algorithm is said to be adaptive if it takes advantage of elements that are already
sorted in the list being sorted. That is, while sorting, adaptive algorithms take into
account the fact that certain elements in the source list are already sorted and try to avoid
re-ordering them.

A non-adaptive algorithm is one that does not take into account the elements that are already
sorted. It tries to force every single element to be re-ordered to confirm their sortedness.

Term Meanings

Some terms are generally coined while discussing sorting techniques; here is a brief
introduction to them.
Increasing Order

A sequence of values is said to be in increasing order if each successive element is greater
than the previous one. For instance, 1, 3, 4, 6, 8, 9 are in increasing order, since every
next element is greater than the previous one.

Decreasing Order

A sequence of values is said to be in decreasing order if each successive element is less
than the previous one. For instance, 9, 8, 6, 4, 3, 1 are in decreasing order, since every
next element is less than the previous one.

Non-Increasing Order

A sequence of values is said to be in non-increasing order if each successive element is less
than or equal to its previous element. This order occurs when the sequence contains duplicate
values. For example, 9, 8, 6, 3, 3, 1 are in non-increasing order, since every next element
is less than or equal to (in the case of 3) but not greater than any previous element.

Non-Decreasing Order

A sequence of values is said to be in non-decreasing order if each successive element is
greater than or equal to its previous element. This order occurs when the sequence contains
duplicate values. For example, 1, 3, 3, 6, 8, 9 are in non-decreasing order, since every next
element is greater than or equal to (in the case of 3) but not less than the previous one.
Bubble Sort Algorithm

Bubble sort is a simple sorting algorithm. This sorting algorithm is a comparison-based
algorithm in which each pair of adjacent elements is compared and the elements are swapped if
they are not in order. This algorithm is not suitable for large data sets, as its average and
worst-case complexity are O(n²), where n is the number of items.

How does bubble sort work?

We take an unsorted array for our example. Bubble sort takes Ο(n²) time, so we keep the
example short and precise.

Bubble sort starts with the very first two elements, comparing them to see which one is
greater.

In this case, value 33 is greater than 14, so the two are already in sorted positions.
Next, we compare 33 with 27.

We find that 27 is less than 33, so these two values must be swapped.

The new array should look like this:

Next, we compare 33 and 35. We find that both are already in sorted positions.

Then we move to the next two values, 35 and 10.

We know that 10 is smaller than 35. Hence they are not sorted.

We swap these values and find that we have reached the end of the array. After one iteration,
the array should look like this:

To be precise, we are now showing how the array should look after each iteration. After the
second iteration, it should look like this:

Notice that at the end of each iteration, at least one value moves to the end.

And when no swap is required, bubble sort learns that the array is completely sorted.

Now we should look into some practical aspects of bubble sort.

The Algorithm

We assume list is an array of n elements. We further assume that the swap function swaps the
values of the given array elements.

begin BubbleSort(list)

   for all elements of list
      if list[i] > list[i+1]
         swap(list[i], list[i+1])
      end if
   end for

   return list

end BubbleSort

Pseudocode

We observe in the algorithm that Bubble Sort compares each pair of array elements unless the
whole array is completely sorted in ascending order. This may cause a few complexity issues,
for example, what if the array needs no more swapping because all the elements are already in
ascending order?

To ease out the issue, we use a swapped flag variable, which will help us see whether any
swap has happened or not. If no swap has occurred, i.e. the array requires no more processing
to be sorted, it will come out of the loop. The pseudocode of the Bubble Sort algorithm is as
follows:

procedure bubbleSort( list : array of items )

   loop = list.count;

   for i = 0 to loop-1 do:
      swapped = false

      for j = 0 to loop-1 do:

         /* compare the adjacent elements */
         if list[j] > list[j+1] then
            /* swap them */
            swap( list[j], list[j+1] )
            swapped = true
         end if

      end for

      /* if no number was swapped, it means the
         array is sorted now; break the loop. */
      if(not swapped) then
         break
      end if

   end for

   return list

end procedure

Implementation

One more issue we did not address in our original algorithm and its improvised pseudocode is
that, after every iteration, the highest values settle at the end of the array. Hence, the
next iteration need not include the elements that are already sorted. For this purpose, in
our implementation we restrict the inner loop to avoid the already sorted values.
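A minimal C++ sketch of such an implementation is given below; the function name bubbleSort
and the plain int array interface are assumptions made here for illustration.

#include <algorithm>   // for std::swap
using namespace std;

// bubble sort with a swapped flag; the inner loop shrinks by i on every pass,
// since the largest remaining value settles at the end of each iteration
void bubbleSort(int list[], int n)
{
    for(int i = 0; i < n - 1; i++)
    {
        bool swapped = false;
        for(int j = 0; j < n - 1 - i; j++)    // skip the already sorted tail
        {
            if(list[j] > list[j + 1])
            {
                swap(list[j], list[j + 1]);
                swapped = true;
            }
        }
        if(!swapped) break;                   // no swap means the array is already sorted
    }
}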

Insertion Sort Algorithm

This is an in-place comparison-based sorting algorithm. Here, a sub-list is maintained which
is always sorted. For example, the lower part of an array is maintained to be sorted. An
element which is to be 'inserted' into this sorted sub-list has to find its appropriate place
and then be inserted there. Hence the name, insertion sort.

The array is searched sequentially and unsorted items are moved and inserted into the sorted
sub-list (in the same array). This algorithm is not suitable for large data sets, as its
average and worst-case complexity are O(n²), where n is the number of items.
How does insertion sort work?

We take an unsorted array for our example.

Insertion sort compares the first two elements.

It finds that both 14 and 33 are already in ascending order. For now, 14 is in the sorted
sub-list.

Insertion sort moves ahead and compares 33 with 27.

It finds that 33 is not in the correct position.

It swaps 33 with 27. It also checks all the elements of the sorted sub-list. Here we see that
the sorted sub-list has only one element, 14, and 27 is greater than 14. Hence, the sorted
sub-list remains sorted after the swap.

By now we have 14 and 27 in the sorted sub-list. Next, it compares 33 with 10.

These values are not in sorted order.

So we swap them.

However, swapping makes 27 and 10 unsorted.

Hence, we swap them too.

Again we find 14 and 10 in an unsorted order.

We swap them once more. By the end of the third iteration, we have a sorted sub-list of four
items.

This process goes on until all the unsorted values are covered in the sorted sub-list. Now we
shall see some programming aspects of insertion sort.

Algorithm

Now that we have a bigger picture of how this sorting technique works, we can derive simple
steps by which to achieve insertion sort.

Step 1 − If it is the first element, it is already sorted. Return 1.

Step 2 − Pick the next element.

Step 3 − Compare it with all elements in the sorted sub-list.

Step 4 − Shift all the elements in the sorted sub-list that are greater than the value to be
sorted.

Step 5 − Insert the value.

Step 6 − Repeat until the list is sorted.

Pseudocode

procedure insertionSort( A : array of items )

   int holePosition
   int valueToInsert

   for i = 1 to length(A) inclusive do:

      /* select the value to be inserted */
      valueToInsert = A[i]
      holePosition = i

      /* locate the hole position for the element to be inserted */
      while holePosition > 0 and A[holePosition-1] > valueToInsert do:
         A[holePosition] = A[holePosition-1]
         holePosition = holePosition - 1
      end while

      /* insert the value at the hole position */
      A[holePosition] = valueToInsert

   end for

end procedure
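A C++ rendering of the pseudocode above might look like the following sketch; the function
name insertionSort is an assumption made here.

// insertion sort following the pseudocode above
void insertionSort(int A[], int n)
{
    for(int i = 1; i < n; i++)
    {
        int valueToInsert = A[i];
        int holePosition = i;
        // shift the larger sorted elements one slot to the right
        while(holePosition > 0 && A[holePosition - 1] > valueToInsert)
        {
            A[holePosition] = A[holePosition - 1];
            holePosition--;
        }
        A[holePosition] = valueToInsert;   // drop the value into the hole
    }
}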

Selection Sort Algorithm

Selection sort is a simple sorting algorithm. This sorting algorithm is an in-place
comparison-based algorithm in which the list is divided into two parts, the sorted part at
the left end and the unsorted part at the right end. Initially, the sorted part is empty and
the unsorted part contains the entire list.

The smallest element in the unsorted array is selected and swapped with the leftmost element,
so that that element becomes a part of the sorted array. This process continues, moving the
boundary of the unsorted array one element to the right.

This algorithm is not suitable for large data sets, as its average and worst-case
complexities are O(n²), where n is the number of items.

How does selection sort work?

Consider the array depicted below as an example.

For the first position in the sorted list, the whole list is scanned sequentially. In the
first position, where 14 is currently stored, we search the whole list and find that 10 is
the lowest value.

So we replace 14 with 10. After one iteration, 10, which happens to be the minimum value in
the list, appears at the first position of the sorted list.

For the second position, where 33 resides, we start scanning the rest of the list linearly.

We find that 14 is the second lowest value in the list and that it should appear at the
second position. We swap these values.

After two iterations, the two least values are positioned at the beginning in a sorted
manner.

The same process is applied to the rest of the items in the array.

A pictorial depiction of the entire sorting process is shown below.

Algorithm

Step 1 − Set MIN to location 0.

Step 2 − Search for the minimum element in the list.

Step 3 − Swap it with the value at location MIN.

Step 4 − Increment MIN to point to the next element.

Step 5 − Repeat until the list is sorted.

Pseudocode

procedure selectionSort
   list : array of items
   n    : size of list

   for i = 1 to n - 1
      /* set the current element as minimum */
      min = i

      /* check for an element smaller than the minimum */
      for j = i+1 to n
         if list[j] < list[min] then
            min = j
         end if
      end for

      /* swap the minimum element with the current element */
      if min != i then
         swap list[min] and list[i]
      end if
   end for

end procedure
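A C++ rendering of the selection sort pseudocode might look like the sketch below; the
function name selectionSort is an assumption made here.

#include <algorithm>   // for std::swap
using namespace std;

// selection sort: repeatedly pick the minimum of the unsorted part
// and swap it with the first unsorted element
void selectionSort(int list[], int n)
{
    for(int i = 0; i < n - 1; i++)
    {
        int min = i;                          // assume the current element is the minimum
        for(int j = i + 1; j < n; j++)
        {
            if(list[j] < list[min]) min = j;  // remember a smaller element
        }
        if(min != i) swap(list[min], list[i]);
    }
}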


Merge Sort Algorithm

Merge sort is a sorting technique based on the divide and conquer strategy. It is one of the
most respected algorithms, with a worst-case time complexity of O(n log n).

Merge sort first divides the array into equal halves and then combines them in a sorted
manner.

How does merge sort work?

To understand merge sort, consider the following unsorted array:

We know that merge sort first divides the whole array iteratively into equal halves until the
atomic values are reached. We see here that an array of 8 items is divided into two arrays of
size 4.

This does not change the sequence of appearance of the items in the original array. Now we
divide these two arrays into halves.

We further divide these arrays and achieve atomic values, which can no longer be divided.

Now we combine them in exactly the same manner as they were broken down. Please note the
color codes given to these lists.

We first compare the elements of each pair of lists, then combine them into another list in
sorted order. We see that 14 and 33 are in sorted positions. We compare 27 and 10 and, in the
target list of two values, we put 10 first, followed by 27. We change the order of 19 and 35,
whereas 42 and 44 are placed sequentially.

In the next iteration of the combining phase, we compare lists of two data values and merge
them into lists of four data values, placing all elements in sorted order.

After the final merging, the list should look like this:

Now we should learn some programming aspects of merge sort.

Algorithm

Merge sort keeps dividing the list into equal halves until it can no longer be divided. By
definition, if there is only one element in the list, it is sorted. Merge sort then joins the
smaller sorted lists together, keeping the new list sorted as well.

Step 1 − If there is only one element in the list, it is already sorted; return.

Step 2 − Divide the list recursively into two halves until it can no longer be divided.

Step 3 − Merge the smaller lists into a new list in sorted order.

Pseudocode

We shall now see the pseudocode for the merge sort functions. Our algorithm points out two
main functions − divide and merge.

Merge sort works with recursion, and we shall see our implementation in the same way.
procedure mergesort( var a as array )
   if ( n == 1 ) return a

   var l1 as array = a[0] ... a[n/2]
   var l2 as array = a[n/2+1] ... a[n]

   l1 = mergesort( l1 )
   l2 = mergesort( l2 )

   return merge( l1, l2 )
end procedure

procedure merge( var a as array, var b as array )

   var c as array

   while ( a and b have elements )
      if ( a[0] > b[0] )
         add b[0] to the end of c
         remove b[0] from b
      else
         add a[0] to the end of c
         remove a[0] from a
      end if
   end while

   while ( a has elements )
      add a[0] to the end of c
      remove a[0] from a
   end while

   while ( b has elements )
      add b[0] to the end of c
      remove b[0] from b
   end while

   return c

end procedure

The Complexity of Sorting Algorithms

The complexity of a sorting algorithm measures the running time of a function in which 'n'
elements are to be sorted. The choice of which sorting method is suitable for a problem
depends on several dependency configurations for different problems. The most notable of
these considerations are:

The amount of time spent by the programmer in writing a particular sorting program

The amount of machine time required to run the program

The amount of memory required to run the program

The Efficiency of Sorting Algorithms

The standard approach for estimating the time required to sort an array of 'n' elements by a
particular method is to analyze the method and count the number of comparisons (or exchanges)
it requires. Most sorting techniques are data sensitive, so their metrics depend on the order
in which the elements appear in the input sequence.

Various sorting methods are investigated in several cases, which are labelled as follows:

Best case

Worst case

Average case

Hence, the result of these cases is often a formula giving the average time required for a
particular sort of size 'n'. Most sorting methods have time requirements ranging from
O(n log n) to O(n²).


Time Complexity Analysis

We have discussed the best, average, and worst-case complexity of various sorting techniques,
with the scenarios in which they occur.

Comparison-based sorting

In comparison-based sorting, elements of an array are compared with each other to find the
sorted array.

Bubble sort and Insertion sort −

Average and worst-case time complexity: n². The best-case time complexity is n, which occurs
when the list is already sorted. The worst case occurs when the list is sorted in reverse
order.

Selection sort −

Best, average, and worst-case time complexity: n², which is independent of the data
distribution.

Merge sort −

Best, average, and worst-case time complexity: n log n, which is independent of the data
distribution.

Heap sort −

Best, average, and worst-case time complexity: n log n, which is independent of the data
distribution.

Quick sort −

It is a divide and conquer approach with the recurrence:

T(n) = T(k) + T(n-k-1) + cn


Worst case: when the array is sorted or reverse sorted, the partition algorithm divides the
array into two sub-arrays with 0 and n-1 elements. Therefore,

T(n) = T(0) + T(n-1) + cn

Solving this gives T(n) = O(n²).

Best case and average case: on average, the partition algorithm divides the array into two
sub-arrays of equal size. Therefore,

T(n) = 2T(n/2) + cn

Solving this gives T(n) = O(n log n).
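For reference, a minimal C++ sketch of a Lomuto-style partition step that gives rise to the
recurrence above is shown below; the function names partition and quickSort are assumptions
made here, and other partition schemes are possible.

#include <algorithm>   // for std::swap
using namespace std;

// Lomuto-style partition: places the pivot (last element) at its final position
// and returns that index, splitting the range into parts of size k and n-k-1
int partition(int a[], int low, int high)
{
    int pivot = a[high];
    int i = low - 1;
    for(int j = low; j < high; j++)
    {
        if(a[j] < pivot) swap(a[++i], a[j]);
    }
    swap(a[i + 1], a[high]);
    return i + 1;
}

void quickSort(int a[], int low, int high)
{
    if(low < high)
    {
        int p = partition(a, low, high);   // divide around the pivot
        quickSort(a, low, p - 1);          // conquer the left part
        quickSort(a, p + 1, high);         // conquer the right part
    }
}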
Non-comparison-based sorting

In non-comparison-based sorting, the elements of the array are not compared with each other
to find the sorted array.

Radix sort − best, average, and worst-case time complexity: nk, where k is the maximum number
of digits in the array elements.

Counting sort − best, average, and worst-case time complexity: n+k, where k is the size of
the count array.

Bucket sort − best and average time complexity: n+k, where k is the number of buckets.
Worst-case time complexity: n², if all elements belong to the same bucket.

In-place/Outplace technique −

A sorting technique is in-place if it does not require any extra memory to sort the array.

Among the comparison-based techniques discussed, only merge sort is an outplace technique,
because it needs an extra array to merge the sorted subarrays.

All the non-comparison-based techniques discussed are outplace techniques. Counting sort uses
a counting array, and bucket sort uses a hash table.

Online/Offline technique −

A sorting technique is called online if it can accept new data while the sorting is underway,
i.e., complete data is not required to start the sorting operation.

Among the comparison-based techniques discussed, only insertion sort qualifies for this,
because of the underlying algorithm it uses: it processes the array (not just the elements)
from left to right, and new elements added to the right do not affect the ongoing operation.

Stable/Unstable technique −

A sorting technique is said to be stable if it does not change the order of elements with the
same value.

Among the comparison-based techniques, bubble sort, insertion sort, and merge sort are stable
techniques. Selection sort is unstable, as the order of elements with the same value may
change. Consider, for example, the sequence 4, 4, 1, 3.

In the first iteration, the minimum element found is 1, and it is swapped with the 4 at
position 0. Hence, the order of the first 4 with respect to the second 4 changes. Similarly,
quick sort and heap sort are also unstable.

Among the non-comparison-based techniques, counting sort and bucket sort are stable sorting
techniques, whereas the stability of radix sort depends on the underlying algorithm used for
sorting.
Exercise:

ty
A. Choose the correct answers:

1. Minimum number of fields in each node of a doubly linked list is ____

si
(A) 5

(B) 3

er
(C) 6

(D) All of the above

Answer: B

iv
2. The elements of a linked list are stored

(A) In a structure
Un
(B) In an array

(C) Anywhere the computer has space for them

(D) In contiguous memory locations

Answer: C
ity

3. A parentheses checker program would be best implemented using

(A) List
Am

(B) Queue

(C) Stack

(D) Any of the above

Answer: C

4. To perform level-order traversal on a binary tree, which of the following data structure
)

will be required?
(c

(A) Hash table


(B) Queue

ty
(C) Binary search tree

(D) Stack

Answer: B

si
5. What is the best-case time complexity of deleting a node in a Singly Linked list?

(A) O (n3)

er
(B) O (n2)

(C) O (nlog)

iv
(D) O (1)

Answer: D
Un
6. Minimum number of queues to implement stack is ___________

(A) 5

(B) 3

(C) 1
ity

(D) 7

Answer: C

7. A circular linked list can be used for


Am

(A) Stack

(B) Queue

(C) Both Stack & Queue

(D) Neither Stack or Queue


)

Answer: C
(c

9. If a problem can be solved by combining optimal solutions to non-overlapping problems,

the strategy is called _____________


(A) Dynamic programming

ty
(B) Greedy

(C) Divide and conquer

(D) Recursion

si
Answer: C

10. _______________operations can include initialization, use, and then de-initialization of

er
the stack.

(A) Stack

iv
(B) Queue

(C) Both Stack & Queue


Un
(D) Neither Stack nor Queue

Answer: A
Module II:
Key learning outcomes
At the end of this module, you will learn:

1. Trees & Basic Graph Algorithms
2. Trees: Basic Terminology & Its Application
3. Binary Trees, Binary Search Trees
4. Red-Black Trees
5. AVL Trees and B-Tree
6. Spanning trees: Prim's and Kruskal's algorithms
7. Graph terminology, representations, traversals
8. Basic graph algorithms: Depth-first search
9. Breadth-first search and its analysis
10. Single-source and all-pairs shortest path problems

Module III:
Key learning outcomes
At the end of this module, you will learn:
11. Concept of Dynamic and Greedy Approach
12. Huffman codes and fractional knapsack problem
13. Greedy Programming: Activity Selection Problem
14. Huffman codes and fractional knapsack problem

2.1: Trees & Basic Graph Algorithms

Unit 2.1.1: Trees: Basic Terminology & Its Application


Unit Outcomes:
At the end of the unit, you will learn
1. The basic concept of a tree
2. Tree terminology & its applications

Tree

A tree is a collection of elements called nodes. Every node contains some value or element. We will use the term node, rather than vertex, when discussing binary trees. Nodes are the primary building block of any tree structure: a node stores the actual data along with links to other nodes.

Tree Terminology
path

A path is a sequence of edges between nodes.

root

The root is the special node from which all other nodes "descend".

Each tree has a single root node.

parent of node n

The parent of node n is the unique node with an edge to node n which is the first node on the path from n to the root.

Note:

The root is the only node with no parent.

Except for the root, every node has exactly one parent.

child of node n

A child of node n is a node for which node n is the next node on the path to the root node.

More formally, if the last edge on the path from root r of tree T to node x is (y, x), then node y is the parent of node x and node x is a child of node y.

Note:

Each node may have 0 or more children.

siblings Nodes with the same parent are siblings.

ancestor of node n

Any node y on the (unique) path from root r to node n is an ancestor of node n.

Each node is an ancestor of itself.

proper ancestor of n

A proper ancestor of n is any node y such that node y is an ancestor of node n and y is not the same node as n.

descendant of n

Any node y for which n is an ancestor of y.

Each node is a descendant of itself.

proper descendant of n

A proper descendant of n is any node y for which n is an ancestor of y and y is not the same node as n.

subtrees of node n

Subtrees of node n are the trees whose roots are the children of n.

degree of node n

The degree of node n is the number of children node n has.

leaf node

A leaf node is a node with no children.

external node

An external node is a node with no children (same as a leaf node).

internal node

An internal node is a non-leaf node.

depth of node n The depth of node n is the length of the path from the root to node n.

level d Level d is the set of nodes at depth d.

height of tree T

The height of tree T is the largest depth of any node in T.

We usually speak of height in relation to trees and depth in relation to nodes, although for full trees the terms are sometimes used interchangeably.

Ordered tree

An ordered tree is a tree in which the children of every node are ordered.

Full k-ary tree

A full k-ary tree is a tree in which all leaves have the same depth and all internal nodes have degree k. (The Cormen et al. algorithms text calls this complete.)

Binary tree

A binary tree is a tree in which every node has two children, possibly missing, named the left child and the right child. A binary tree is more than an ordered tree of degree 2: in an ordered tree of degree 2 there is no distinguishing of a sole child as left or right.

Example

Terminology
In a tree data structure, we use the following terminology...

1. Root

In a tree data structure, the first node is called the root node. Every tree must have a root node; we can say that the root node is the origin of the tree data structure. In any tree, there must be exactly one root node; we never have multiple root nodes in a tree.

A root node is very much like any other node, in that it is part of a data structure that comprises at least one field with links to other nodes and contains a data field; it just happens to be the first node. In that sense, any node can be a root node with respect to itself and its children if that part of the tree is considered on its own.
2. Edge

In a tree data structure, the connecting link between any two nodes is called an EDGE. In a tree with 'N' nodes, there will be a maximum of 'N-1' edges.

3. Parent

In a tree data structure, the node which is a predecessor of any node is called a PARENT node. In simple words, the node which has a branch from it to some other node is known as a parent node. A parent node can also be defined as "the node which has a child or children".

4. Child

In a tree data structure, the node which is a descendant of some node is called a CHILD node. In simple words, a node that has a branch leading to it from its parent is a child node. In a tree, every node except the root is a child node.

5. Siblings

In a tree data structure, nodes that belong to the same parent are called SIBLINGS. In simple words, nodes with the same parent are sibling nodes.

6. Leaf

In a tree data structure, a node which does not have a child is called a LEAF node. In simple words, a leaf is a node with no children.

In a tree data structure, leaf nodes are also called External Nodes. An external node is likewise a node with no children. In a tree, a leaf node is also called a 'Terminal' node.

7. Internal Nodes

In a tree data structure, a node which has at least one child is called an INTERNAL node. In simple words, an internal node is a node with at least one child.

In a tree data structure, nodes other than leaf nodes are called internal nodes. The root node is also said to be an internal node if the tree has more than one node. Internal nodes are also called 'Non-Terminal' nodes.

8. Degree

In a tree data structure, the total number of children of a node is called the DEGREE of that node. In simple words, the degree of a node is the total number of children it has. The highest degree of a node among all the nodes in a tree is called the 'Degree of the Tree'.

9. Level

In a tree data structure, the root node is said to be at Level 0, the children of the root node are at Level 1, the children of the nodes at Level 1 are at Level 2, and so on. In simple words, each generation from top to bottom is called a level, and the level count starts at '0' and increases by one at each step.

10. Height

In a tree data structure, the total number of edges from a leaf node to a particular node along the longest path is called the HEIGHT of that node. In a tree, the height of the root node is said to be the height of the tree. In a tree, the height of every leaf node is '0'.

11. Depth

In a tree data structure, the total number of edges from the root node to a particular node is called the DEPTH of that node. In a tree, the total number of edges from the root node to a leaf node along the longest path is said to be the depth of the tree. In simple words, the largest depth of any leaf node in a tree is the depth of that tree. In a tree, the depth of the root node is '0'.

12. Path

In a tree data structure, the sequence of nodes and edges from one node to another node is called the PATH between those two nodes. The length of a path is the total number of nodes on that path. In the example below, the path A - B - E - J has length 4.

13. Sub Tree

In a tree data structure, each child of a node forms a subtree recursively. Every child node forms a subtree on its parent node.
Application of Trees:
Trees are hierarchical structures having a single root node.

Some applications of trees are:

XML parsers use tree algorithms.

Decision-based algorithms used in machine learning work on tree structures.

Databases also use tree data structures for indexing.

The Domain Name Server (DNS) also uses tree structures.

The file explorer ("My Computer") of a mobile device or any PC.

BSTs are used in computer graphics.

Posting questions on websites like Quora, where the comments are children of the questions.
Exercise
What is a tree?
Write about the applications of trees.
What is the root?
Unit 2.1.2: Binary Trees, Binary Search Trees, Red-Black Trees,
AVL Trees and B Tree

Unit Outcomes:
At the end of the unit, you will learn
1. Binary Tree Data Structure
2. Binary Search Tree
3. Red-Black Trees
4. AVL Trees
5. B-Tree

What is a Binary Tree Data Structure?
A binary tree is a tree-type non-linear data structure with a maximum of two children for each parent. Each node in a binary tree has a left and a right reference along with the data element. The node at the top of the hierarchy of a tree is known as the root node. The nodes that hold other sub-nodes are the parent nodes.

A parent node has two child nodes: the left child and the right child. Hashing, routing data for network traffic, data compression, building binary heaps, and binary search trees are some of the applications that use a binary tree.

Terminologies associated with Binary Trees and Types of Binary Trees

Node: It represents an endpoint in a tree.
Root: A tree's topmost node.
Parent: Every node (other than the root) in a tree that has at least one sub-node of its own is a parent node.
Child: A node that descends directly from a parent node, moving away from the root, is a child node.
Leaf Node: These are external nodes; they are the nodes that have no children.
Internal Node: As the name suggests, these are internal nodes with at least one child.
Depth of a Tree: The number of edges from the tree's node to the root.
Height of a Tree: The number of edges from the node to the deepest leaf. The tree height is also taken as the root height.

Now that you are familiar with the terminology related to the binary tree and the types of binary trees, it is time to understand the binary tree components.

Binary Tree Components

There are three binary tree components, and every binary tree node has these three components associated with it. It is essential for developers to understand these three components:

Data element
Pointer to the left subtree
Pointer to the right subtree

These three components represent a node. The data lives in the middle. The left pointer points to the child node on its left, forming the left subtree; the right pointer points to the child node on its right, forming the right subtree.
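A minimal Python sketch of such a node follows; the class name Node and the field names are illustrative, not taken from the text.

class Node:
    # One binary tree node: a data element plus left and right child links.
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

# A tiny tree: 1 is the root, 2 its left child, 3 its right child.
root = Node(1)
root.left = Node(2)
root.right = Node(3)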

Kinds of Binary Trees

There are different kinds of binary trees, and each of these binary tree types has unique characteristics. Here is each of the binary tree types in detail:

1. Full Binary Tree
It is a special kind of binary tree in which every node has either zero or two children. It means that all the nodes in that binary tree either have two child nodes or are themselves leaf (external) nodes.

In other words, a full binary tree is a binary tree in which every node except the external nodes has two children. When a node holds a single child, such a binary tree is not a full binary tree. Here, the number of leaf nodes is equal to the number of internal nodes plus one. The condition is L = I + 1, where L is the number of leaf nodes and I is the number of internal nodes.

Here is the structure of a full binary tree:

2. Complete Binary Tree

A complete binary tree is another specific kind of binary tree in which all the levels are filled with nodes, except possibly the lowest level of the tree. Also, in the last or lowest level of this binary tree, every node should reside as far to the left as possible. Here is the structure of a complete binary tree:

3. Perfect Binary Tree

A binary tree is said to be 'perfect' if all the internal nodes have exactly two children, and every external or leaf node is at the same level or depth within the tree. A perfect binary tree of height 'h' has 2^h - 1 nodes. Here is the structure of a perfect binary tree:

4. Balanced Binary Tree

A binary tree is said to be 'balanced' if the tree height is O(log N), where 'N' is the number of nodes. In a balanced binary tree, the heights of the left and right subtrees of every node differ by at most one. An AVL tree and a Red-Black tree are common examples of data structures that can produce a balanced binary search tree. Here is an example of a balanced binary tree:

5. Degenerate Binary Tree

A binary tree is said to be a degenerate (or pathological) binary tree if every internal node has only a single child. Such trees are similar to a linked list performance-wise. Here is an example of a degenerate binary tree:

Advantages of a Binary Tree

Trees are so valuable and so frequently used because they have some strong benefits:
1. Trees show structural relationships in the data.
2. Trees are used to represent hierarchies.
3. Trees provide efficient insertion and searching.
4. Trees are very flexible, allowing subtrees to be moved around with minimum effort.

In addition, for binary trees:
1. The search operation in a binary tree is faster compared with other trees.
2. Just two traversals are enough to produce the elements in sorted order.
3. It is easy to obtain the maximum and minimum elements.
4. Graph traversal also uses binary trees.
5. Converting between postfix and prefix expressions is possible using binary trees.

The binary tree is one of the most widely used trees in data structures. Each of the binary tree types has its own unique features, and these data structures have specific uses in applied computer science.
What is a Binary Search Tree?
A BST is a data structure made up of nodes, like linked lists. These nodes are either null or have references (links) to other nodes. These 'other' nodes are child nodes, called the left node and the right node. Nodes have values, and these values determine where a node is placed within the BST.

Similarly to a linked list, every node is referenced by only one other node, its parent (except the root node). So we can say that every node in a BST is in itself a BST, because further down the tree we reach another node and that node has a left and a right. Then, depending on which way we go, that node has a left and a right, and so on.

1. The left node is always smaller than its parent.

2. The right node is always greater than its parent.

3. A BST is considered balanced if every level of the tree is fully filled except for the last level. On the last level, the tree is filled left to right.

4. A perfect BST is one in which it is both complete and full (all child nodes are on the same level and every node has a left and a right child node).

Why do we need a Binary Search Tree?

The two main factors that make a binary search tree an ideal solution for many real problems are speed and accuracy.

Because the binary search tree is a branch-like structure with parent-child relations, the algorithm knows in which region of the tree an element should be searched. This decreases the number of key-value comparisons the program needs to make to find the desired element.

Moreover, if the element to be searched is greater or smaller than the parent node, the node knows which side of the tree to search. The reason is that the left subtree is always smaller than the parent node, and the right subtree has values always equal to or greater than the parent node.

BSTs are regularly used to implement complex searches, game logic, auto-complete features, and graphics.

The structure efficiently supports operations like search, insert, and delete.
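As a hedged sketch of these rules in action, the following Python code inserts and searches keys using the smaller-to-the-left, larger-to-the-right convention described above; the class and function names are illustrative.

class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # Smaller keys go into the left subtree, larger or equal keys to the right.
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    # Each comparison discards a whole subtree, giving O(h) search time.
    if root is None or root.key == key:
        return root
    if key < root.key:
        return search(root.left, key)
    return search(root.right, key)

root = None
for k in [8, 3, 10, 1, 6]:
    root = insert(root, k)
print(search(root, 6) is not None)  # True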

Red-Black Tree
Introduction:
A red-black tree is a kind of self-balancing binary search tree where every node has an extra bit, and that bit is often interpreted as the colour (red or black). These colours are used to ensure that the tree remains balanced during insertions and deletions. Although the balance of the tree is not perfect, it is good enough to reduce the search time and keep it around O(log n), where n is the total number of elements in the tree. This tree was invented in 1972 by Rudolf Bayer.

It should be noted that since every node requires only 1 bit of space to store the colour information, these trees show an identical memory footprint to the classic (uncoloured) binary search tree.

Rules That Every Red-Black Tree Follows:

1. Every node has a colour, either red or black.

2. The root of the tree is always black.

3. There are no two adjacent red nodes (a red node cannot have a red parent or red child).

4. Every path from a node (including the root) to any of its descendant NULL nodes has the same number of black nodes.

Why Red-Black Trees?

Most of the BST operations (e.g., search, max, min, insert, delete, etc.) take O(h) time, where h is the height of the BST. The cost of these operations may become O(n) for a skewed binary tree. If we make sure that the height of the tree remains O(log n) after every insertion and deletion, then we can guarantee an upper bound of O(log n) for all of these operations. The height of a Red-Black tree is always O(log n), where n is the number of nodes in the tree.

Sr. No. Algorithm Time Complexity

1. Search O(log n)

2. Insert O(log n)

3. Delete O(log n)

"n" is the total number of elements in the red-black tree.

How does a Red-Black Tree ensure balance?

A simple example to understand balancing is that a chain of 3 nodes is not possible in a Red-Black tree. We can try any combination of colours and see that all of them violate the Red-Black tree properties.

A chain of 3 nodes is not possible in Red-Black Trees.

Following are NOT Red-Black Trees
30 30 30
/\ / \ / \
20 NIL 20 NIL 20 NIL


/\ /\ / \
10 NIL 10 NIL 10 NIL


Violates Violates Violates
Property 4. Property 4 Property 3

Following are different possible Red-Black Trees with the above 3 keys
20 20
/ \ / \
10 30 10 30
/ \ / \ / \ / \
NIL NIL NIL NIL NIL NIL NIL NIL

ty
Interesting points about Red-Black Trees:
The black height of a red-black tree is the number of black nodes on a path from the root node to a leaf node. Leaf nodes are also counted as black nodes. Thus, a red-black tree of height h has black height >= h/2.

The height of a red-black tree with n nodes is h <= 2 log2(n + 1).

All leaves (NIL) are black.

The black depth of a node is defined as the number of black nodes from the root to that node, i.e., the number of black ancestors.

Every red-black tree is a special case of a binary search tree.

Black Height of a Red-Black Tree:

Black height is the number of black nodes on a path from the root to a leaf. Leaf nodes are also counted as black nodes. From properties 3 and 4 above, we can deduce that a Red-Black Tree of height h has black height >= h/2.

Every Red-Black Tree with n nodes has height <= 2 log2(n+1).

This can be proved using the following facts:

For a general binary tree, let k be the minimum number of nodes on all root-to-NULL paths; then n >= 2^k - 1 (for example, if k is 3, n is at least 7). This expression can also be written as k <= log2(n+1).

From property 4 of Red-Black trees and the above claim, we can say that in a Red-Black Tree with n nodes, there is a root-to-leaf path with at most log2(n+1) black nodes.

From property 3 of Red-Black trees, we can claim that the number of black nodes in a Red-Black tree is at least ⌊ n/2 ⌋, where n is the total number of nodes.

From the above points, we can conclude that a Red-Black Tree with n nodes has height <= 2 log2(n+1).

Search Operation in a Red-Black Tree:

Since every red-black tree is a special case of a binary search tree, the search algorithm of a red-black tree is the same as that of a binary search tree.

Algorithm:
searchElement (tree, val)

Step 1:
If tree -> data = val OR tree = NULL
    Return tree
Else
    If val < tree -> data
        Return searchElement (tree -> left, val)
    Else
        Return searchElement (tree -> right, val)
    [ End of if ]
[ End of if ]

Step 2: END

AVL Tree
An AVL tree is a height-balanced binary search tree. That means an AVL tree is also a binary search tree, but it is a balanced tree. A binary tree is said to be balanced if the difference between the heights of the left and right subtrees of every node in the tree is either -1, 0, or +1. In other words, a binary tree is said to be balanced if the heights of the left and right children of every node differ by either -1, 0, or +1. In an AVL tree, every node maintains extra information known as the balance factor.

The AVL tree was introduced in the year 1962 by G.M. Adelson-Velsky and E.M. Landis.

An AVL tree is defined as follows...

An AVL tree is a balanced binary search tree. In an AVL tree, the balance factor of every node is either -1, 0 or +1.

The balance factor of a node is the difference between the heights of the left and right subtrees of that node. The balance factor of a node can be calculated either as height of left subtree - height of right subtree, or as height of right subtree - height of left subtree. In the following explanation, we calculate it as follows...

Balance factor = heightOfLeftSubtree - heightOfRightSubtree


Example of AVL Tree

The above tree is a binary search tree and every node is satisfying balance factor condition. So this
tree is said to be an AVL tree.
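A small Python sketch of how the balance factor could be computed, assuming node objects with left and right attributes (for example, the BSTNode sketch shown earlier); this is only an illustration of the definition above.

def height(node):
    # An empty subtree is taken to have height 0 here.
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    # balanceFactor = heightOfLeftSubtree - heightOfRightSubtree;
    # in an AVL tree this value must be -1, 0 or +1 at every node.
    return height(node.left) - height(node.right)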

AVL Tree Rotations

In an AVL tree, after performing operations like insertion and deletion we need to check the balance factor of every node in the tree. If every node satisfies the balance factor condition, we conclude the operation; otherwise we must make the tree balanced. Whenever the tree becomes imbalanced due to any operation, we use rotation operations to make the tree balanced.

Rotation operations are used to make the tree balanced.

There are four rotations and they are classified into two types.
Single Left Rotation (LL Rotation)
In LL Rotation, every node moves one position to left from the current position. To understand LL
Rotation, let us consider the following insertion operation in AVL Tree...

Single Right Rotation (RR Rotation)


(c

In RR Rotation, every node moves one position to right from the current position. To understand
RR Rotation, let us consider the following insertion operation in AVL Tree...
Left Right Rotation (LR Rotation)

er
The LR Rotation is a sequence of single left rotation followed by a single right rotation. In LR
Rotation, at first, every node moves one position to the left and one position to right from the
current position. To understand LR Rotation, let us consider the following insertion operation in
AVL Tree...


Right Left Rotation (RL Rotation)


Am

The RL Rotation is sequence of single right rotation followed by single left rotation. In RL Rotation,
at first every node moves one position to right and one position to left from the current position.
To understand RL Rotation, let us consider the following insertion operation in AVL Tree...
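All four rotation cases are built out of single rotations. As a rough illustration, here is a minimal Python sketch of one single left rotation (the mirror-image right rotation simply swaps left and right); the function name and node fields are assumptions made for the example.

def rotate_left(x):
    # x's right child y becomes the new root of this subtree;
    # y's old left subtree is re-attached as x's right subtree.
    y = x.right
    x.right = y.left
    y.left = x
    return y  # the caller links the returned node in place of x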
Operations on an AVL Tree
The following operations are performed on AVL tree...

iv
Search
Un
Insertion
Deletion
Search Operation in AVL Tree
In an AVL tree, the search operation is performed with O(log n) time complexity. The search
operation in the AVL tree is similar to the search operation in a Binary search tree. We use the
following steps to search an element in AVL tree...
ity

Step 1 - Read the search element from the user.


Step 2 - Compare the search element with the value of root node in the tree.
Am

Step 3 - If both are matched, then display "Given node is found!!!" and terminate the function
Step 4 - If both are not matched, then check whether search element is smaller or larger than that
node value.
Step 5 - If search element is smaller, then continue the search process in left subtree.
Step 6 - If search element is larger, then continue the search process in right subtree.
Step 7 - Repeat the same until we find the exact element or until the search element is compared
)

with the leaf node.


(c

Step 8 - If we reach to the node having the value equal to the search value, then display "Element
is found" and terminate the function.
Step 9 - If we reach to the leaf node and if it is also not matched with the search element, then
display "Element is not found" and terminate the function.
Insertion Operation in AVL Tree
In an AVL tree, the insertion operation is performed with O(log n) time complexity. In AVL Tree, a

ty
new node is always inserted as a leaf node. The insertion operation is performed as follows...

Step 1 - Insert the new element into the tree using Binary Search Tree insertion logic.

si
Step 2 - After insertion, check the Balance Factor of every node.
Step 3 - If the Balance Factor of every node is 0 or 1 or -1 then go for next operation.

er
Step 4 - If the Balance Factor of any node is other than 0 or 1 or -1 then that tree is said to be
imbalanced. In this case, perform suitable Rotation to make it balanced and go for next operation.
Example: Construct an AVL Tree by inserting numbers from 1 to 8.

Deletion Operation in AVL Tree
The deletion operation in AVL Tree is similar to deletion operation in BST. But after every deletion
operation, we need to check with the Balance Factor condition. If the tree is balanced after
deletion go for next operation otherwise perform suitable rotation to make the tree Balanced.

B-Tree

A B-Tree is a self-balancing search tree. In most of the other self-balancing search trees (like AVL and Red-Black Trees), it is assumed that everything is in main memory. To understand the use of B-Trees, we must think of the huge amount of data that cannot fit in main memory. When the number of keys is high, the data is read from disk in blocks. Disk access time is very high compared with main memory access time. The main idea of using B-Trees is to reduce the number of disk accesses. Most of the tree operations (search, insert, delete, max, min, etc.) require O(h) disk accesses, where h is the height of the tree. A B-tree is a fat tree: the height of B-Trees is kept low by putting the maximum possible number of keys in a B-Tree node. Generally, the B-Tree node size is kept equal to the disk block size. Since the height of the B-tree is low, total disk accesses for most of the operations are reduced significantly compared with balanced binary search trees like the AVL Tree, Red-Black Tree, etc.

Time Complexity of B-Tree:


Un
Sr. No. Algorithm Time Complexity

1. Search O(log n)

2. Insert O(log n)
ity

3. Delete O(log n)
“n” is the total number of elements in the B-tree.
Properties of B-Tree:

All leaves are at the same level.

A B-Tree is characterized by the term minimum degree 't'. The value of t depends on the disk block size.

Every node except the root must contain at least (ceiling)([t-1]/2) keys. The root may contain a minimum of 1 key.

All nodes (including the root) may contain at most t - 1 keys.

The number of children of a node is equal to the number of keys in it plus 1.

All keys of a node are sorted in increasing order. The child between two keys k1 and k2 contains all keys in the range from k1 to k2.

A B-Tree grows and shrinks from the root, unlike a binary search tree. Binary search trees grow downward and also shrink from downward.

Like other balanced binary search trees, the time complexity to search, insert, and delete is O(log n).

Following is an example of a B-Tree of minimum order 5. Note that in practical B-Trees, the value of the minimum order is much more than 5.
We can see in the above diagram that all the leaf nodes are at the same level and all non-leaf nodes have no empty subtree and have one key fewer than the number of their children.

In this example, we can see that our search was reduced simply by limiting the places where the key containing the value could be present. Likewise, in the above example, if we have to look for 180, the control will stop at step 2 because the program will find that the key 180 is present in the current node. Similarly, if it has to search for 90, then since 90 < 100 it will automatically go to the left subtree, and the control flow will proceed in the same way as shown in the above example.
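The search just described can be sketched in Python as follows, assuming a node object with a sorted list keys, a list children, and an is_leaf flag; these attribute names are illustrative assumptions, not taken from the text.

def btree_search(node, key):
    # Find the first key in this node that is >= the search key.
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        return node, i                     # key found in the current node
    if node.is_leaf:
        return None                        # reached a leaf without finding the key
    return btree_search(node.children[i], key)  # descend into the i-th child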

Exercise
1. What is the Binary Tree Data Structure?
2. What is a Binary Search Tree?
3. What are Red-Black Trees?
4. What are AVL Trees?
5. What is a B-Tree?
6. Write the search operation in a Red-Black Tree.
7. How does a Red-Black Tree ensure balance?
8. Why do we need a Binary Search Tree?
Unit 2.1.3: Spanning trees: Prim’s and Kruskal’s algorithm
Unit Outcomes :
1. About Spanning Tree
Un
2. Prim’s algorithm
3. Kruskal’s algorithm
4. Properties of spanning-tree

Spanning Tree
A spanning tree is a subset of graph G which has all the vertices covered with the minimum possible number of edges. Hence, a spanning tree does not have cycles and it cannot be disconnected. From this definition, we can conclude that every connected and undirected graph G has at least one spanning tree. A disconnected graph does not have any spanning tree, as it cannot be spanned to all its vertices.

We found three spanning trees from one complete graph. A complete undirected graph can have a maximum of n^(n-2) spanning trees, where n is the number of nodes. In the above example, n is 3, hence 3^(3-2) = 3 spanning trees are possible.

General Properties of Spanning Tree
We now understand that one graph can have more than one spanning tree. Following are a few properties of a spanning tree connected to graph G −

1. A connected graph G can have more than one spanning tree.
2. All possible spanning trees of graph G have the same number of edges and vertices.
3. The spanning tree does not have any cycles (loops).
4. Removing one edge from the spanning tree will make the graph disconnected, i.e., the spanning tree is minimally connected.
5. Adding one edge to the spanning tree will create a circuit or loop, i.e., the spanning tree is maximally acyclic.

Mathematical Properties of Spanning Tree
1. The spanning tree has n-1 edges, where n is the number of nodes (vertices).
2. From a complete graph, by removing at most e - n + 1 edges, we can construct a spanning tree.
3. A complete graph can have a maximum of n^(n-2) spanning trees.

Hence, we can conclude that spanning trees are a subset of the connected graph G, and disconnected graphs do not have a spanning tree.
Minimum Spanning-Tree Algorithm

We shall learn about the two most important spanning tree algorithms here −
1. Kruskal's Algorithm
2. Prim's Algorithm

Kruskal's Spanning Tree Algorithm

Kruskal's algorithm to find the minimum cost spanning tree uses the greedy approach. This algorithm treats the graph as a forest and every node it has as an individual tree. A tree connects to another one only if it has the least cost among all the available options and does not violate MST properties.
To understand Kruskal's algorithm let us consider the following example −
Step 1 - Remove all loops and Parallel Edges

Remove all loops and parallel edges from the given graph.

In the case of parallel edges, keep the one that has the least cost associated and remove all others.

Step 2 - Arrange all edges in their increasing order of weight

The next step is to create a set of edges and weights, and arrange them in ascending order of weightage (cost).


(c

Step 3 - Add the edge which has the least weightage

Now we start adding edges to the graph, beginning from the one which has the least weight. Throughout, we shall keep checking that the spanning properties remain intact. In case, by adding one edge, the spanning tree property does not hold, then we shall consider not including the edge in the graph.

The least cost is 2 and the edges involved are B,D and D,T. We add them. Adding them does not violate spanning tree properties, so we continue to our next edge selection.
The next cost is 3, and the associated edges are A,C and C,D. We add them again −

The next cost in the table is 4, and we observe that adding it will create a circuit in the graph. −
ity

We ignore it. In the process, we shall ignore/avoid all edges that create a circuit.
Am

We observe that edges with costs 5 and 6 also create circuits. We ignore them and move on.
Now we are left with only one node to be added. Between the two least-cost edges available 7 and 8,
we shall add the edge with cost 7.

By adding edge S,A we have included all the nodes of the graph and we now have a minimum cost spanning tree.
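A compact Python sketch of Kruskal's algorithm follows, using a union-find structure to detect and skip the edges that would create a circuit. The edge format (weight, u, v) and the vertex numbering 0..n-1 are assumptions made for the example.

def kruskal(n, edges):
    # edges: list of (weight, u, v) tuples for an undirected graph on vertices 0..n-1.
    parent = list(range(n))

    def find(x):
        # Find the representative of x's component, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):          # consider edges in increasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                       # different components: no circuit is created
            parent[ru] = rv
            mst.append((u, v, w))
    return mst

print(kruskal(4, [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]))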

Prim's Spanning Tree Algorithm

Prim's algorithm to find the minimum cost spanning tree (like Kruskal's algorithm) uses the greedy approach. Prim's algorithm shares a similarity with the shortest path first algorithm.
Prim's algorithm, in contrast with Kruskal's algorithm, treats the nodes as a single tree and keeps adding new nodes to the spanning tree from the given graph.
To contrast with Kruskal's algorithm and to understand Prim's algorithm better, we shall use the same example −

Step 1 - Remove all loops and parallel edges


Remove all loops and parallel edges from the given graph. In the case of parallel edges, keep the one
that has the least cost associated and remove all others.

Step 2 - Choose any arbitrary node as the root node

In this case, we choose the S node as the root node of Prim's spanning tree. This node is arbitrarily chosen, so any node can be the root node. One may wonder why any node can be a root node. The answer is that in the spanning tree all the nodes of a graph are included, and because the graph is connected there must be at least one edge which will join it to the rest of the tree.
Step 3 - Check outgoing edges and select the one with less cost

After choosing the root node S, we see that S, A, and S, C are two edges with weights 7 and 8,
respectively. We choose the edge S, A as it is lesser than the other.
Now, the tree S-7-A is treated as one node and we check for all edges going out from it. We select the one which has the lowest cost and include it in the tree.
After this step, the S-7-A-3-C tree is formed. Now we'll again treat it as a node and check all the edges again. However, we will choose only the least cost edge. In this case, C-3-D is the new edge, which is less than the other edges' costs 8, 6, 4, and so on.

After adding node D to the spanning tree, we now have two edges going out of it having the same cost, i.e., D-2-T and D-2-B. Thus, we can add either one. But the next step will again yield edge 2 as the least cost. Hence, we are showing a spanning tree with both edges included, and so on.
We may find that the output spanning tree of the same graph using two different algorithms is the
same.
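For comparison, here is a minimal Python sketch of Prim's algorithm using a priority queue (heapq) of edges leaving the tree built so far; the adjacency-list format adj[u] = [(weight, v), ...] and the default start vertex are assumptions made for the example.

import heapq

def prim(adj, start=0):
    # adj: adjacency list of an undirected graph, adj[u] = list of (weight, v) pairs.
    visited = {start}
    heap = list(adj[start])
    heapq.heapify(heap)
    mst = []
    while heap and len(visited) < len(adj):
        w, v = heapq.heappop(heap)         # cheapest edge leaving the current tree
        if v in visited:
            continue                       # this edge would close a cycle; skip it
        visited.add(v)
        mst.append((v, w))
        for edge in adj[v]:
            if edge[1] not in visited:
                heapq.heappush(heap, edge)
    return mst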
Exercise

1. Write about the spanning tree.
2. What is Prim's algorithm?
3. What is Kruskal's algorithm?
4. What are the properties of a spanning tree?

Unit 2.1.3 : Graphs: Terminology, representations, traversals


Unit Outcomes :
At the end of the unit, you will learn
1. Graphs terminology

2. Graphs representation
)

3. About Traversals
(c

Graphs Terminology

The next big step, graphs, can represent more than 3 dimensions. Graphs are used to represent many data
structures ranging from airline routes to program code.

Illustrate: airlines and branching in programs

Types of graphs:
1. Hierarchical or dependence graphs
2. Maps, schematic or geographical graphs
3. Trees are graphs

Terminology

ty
A graph consists of:

1. A set, V, of vertices (nodes)


2. A collection, E, of pairs of vertices from V called edges (arcs)

si
Edges, also called arcs, are represented by (u, v) and are either:

Directed if the pairs are ordered (u, v)


u the origin
v the destination

er
Undirected if the pairs are unordered

Then a graph can be:

A directed graph (digraph) if all the edges are directed

iv
Undirected graph (graph) if all the edges are undirected
Mixed graph if edges are both directed or undirected

Illustrate terms on graphs


Un
The end-vertices of an edge are the endpoints of the edge.

Two vertices are adjacent if they are endpoints of the same edge.

An edge is an incident on a vertex if the vertex is an endpoint of the edge.

Outgoing edges of a vertex are directed edges that the vertex is the origin.
Incoming edges of a vertex are directed edges that the vertex is the destination.

The degree of a vertex, v, denoted deg(v) is the number of incident edges.


ity

Out-degree, outdeg(v), is the number of outgoing edges.


In-degree, indeg(v), is the number of incoming edges.

Parallel edges or multiple edges are edges of the same type and end-vertices
Self-loop is an edge with the end vertices the same vertex
Simple graphs have no parallel edges or self-loops
Am

Graph Representation
Graphs are mathematical structures that represent pairwise relationships between objects. A graph is a flow structure that represents the relationship between various objects. It can be visualized by using the following two basic components:

Nodes: These are the most important components of any graph. Nodes are entities whose relationships are expressed using edges. If a graph contains 2 nodes and an undirected edge between them, then it expresses a bi-directional relationship between the nodes.

Edges: Edges are the components that are used to represent the relationships between various nodes in a graph. An edge between two nodes expresses a one-way or two-way relationship between the nodes.

Types of nodes

Root node: The root node is the ancestor of all other nodes in a graph. It does not have any ancestors. Each graph consists of exactly one root node. Generally, you should start traversing a graph from the root node.

Leaf nodes: In a graph, leaf nodes are the nodes that do not have any successors. These nodes only have ancestor nodes. They can have any number of incoming edges, but they will not have any outgoing edges.
Types of graphs

Undirected: An undirected graph is a graph in which all the edges are bi-directional, i.e., the edges do not point in a particular direction.

Directed: A directed graph is a graph in which all the edges are uni-directional, i.e., the edges point in a single direction.
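As a small illustration, the two usual ways to store such a graph in code are an adjacency list and an adjacency matrix; the concrete dictionaries and matrix below are made-up examples, not data from the text.

# Adjacency-list representation of the directed graph 0 -> 1, 0 -> 2, 1 -> 2.
adj_list = {0: [1, 2], 1: [2], 2: []}

# The same graph as an adjacency matrix: adj_matrix[u][v] == 1 iff edge (u, v) exists.
adj_matrix = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

# For an undirected graph each edge is stored in both directions,
# e.g. an edge {0, 1} appears both as 0 -> 1 and as 1 -> 0.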

Graph traversals
We often want to solve problems that are expressible in terms of a traversal or search over a graph. Examples
include:

1. Finding all reachable nodes (for garbage collection)


)

2. Finding the best reachable node (single-player game search) or the minmax best reachable node (two-
player game search)
(c

3. Finding the best path through a graph (for routing and map directions)
4. Determining whether a graph is a DAG.
5. Topologically sorting a graph.
The goal of a graph traversal, generally, is to find all nodes reachable from a given set of root nodes. In an
undirected graph, we follow all edges; in a directed graph we follow only out-edges.

Tricolor algorithm

ty
Abstractly, graph traversal can be expressed in terms of the tricolor algorithm due to Dijkstra and others. In this
algorithm, graph nodes are assigned one of three colors that can change over time:

si
1. White nodes are undiscovered nodes that have not been seen yet in the current traversal and may even
be unreachable.
2. Black nodes are nodes that are reachable and that the algorithm is done with.

er
3. Gray nodes are nodes that have been discovered but that the algorithm is not done with yet. These
nodes are on a frontier between white and black.
The progress of the algorithm is depicted by the following figure. Initially, there are no black nodes and the
roots are gray. As the algorithm progresses, white nodes turn into gray nodes and gray nodes turn into black


The algorithm maintains a key invariant at all times: there are no edges from white nodes to black nodes. This is
true initially, and because it is true at the end, we know that any remaining white nodes cannot be reached
from the black nodes.
Am

The algorithm pseudo-code is as follows:

1. Color all nodes white, except for the root nodes, which are colored gray.
2. While some gray node n exists:
1. color some white successors of n gray.
2. if n has no white successors, optionally color n black.
This algorithm is abstract enough to describe many different graph traversals. It allows the particular
)

implementation to choose the node n from among the gray nodes; it allows choosing which and how many
(c

white successors to color gray, and it allows delaying the coloring of gray nodes black. We say that such an
algorithm is nondeterministic because its behavior is not fully defined. However, as long as it does some work
on each gray node that it picks, any implementation that can be described in terms of this algorithm will finish.
Further, because the black-white invariant is maintained, it must reach all reachable nodes in the graph.
One value of defining graph search in terms of the tricolor algorithm is that the tricolor algorithm works even
when gray nodes are worked on concurrently, as long as the black-white invariant is maintained. Thinking about
this invariant therefore helps us ensure that whatever graph traversal we choose will work when parallelized,
which is increasingly important.

ty
Topological sort
One of the most useful algorithms on graphs is topological sort, in which the nodes of an acyclic graph are

si
placed in an order consistent with the edges of the graph. This is useful when you need to order a set of
elements where some elements have no ordering constraint relative to other elements.

For example, suppose you have a set of tasks to perform, but some tasks have to be done before other tasks

er
can start. In what order should you perform the tasks? This problem can be solved by representing the tasks as
a node in a graph, where there is an edge from task 1 to task 2 if task 1 must be done before task 2. Then a
topological sort of the graph will give an ordering in which task 1 precedes task 2. To topologically sort a graph,
it cannot have cycles. For example, if you were making lasagna, you might need to carry out tasks described by


There is some flexibility about what order to do things in, but clearly, we need to make the sauce before we
assemble the lasagna. A topological sort will find some ordering that obeys this and the other ordering
constraints. Of course, it is impossible to topologically sort a graph with a cycle in it.
Am

The key observation is that a node finishes (is marked black) after all of its descendants have been marked
black. Therefore, a node that is marked black later must come earlier when topologically sorted.
A postorder traversal generates nodes in the reverse of a topological sort:

Algorithm:

Perform a depth-first search over the entire graph, starting anew with an unvisited node if previous starting nodes
did not visit every node. As each node is finished (colored black), put it on the head of an initially empty list. This
takes time linear in the size of the graph: O(|V| + |E|).
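A minimal Python sketch of this algorithm: a depth-first search that appends each node to a list as it finishes and then reverses that postorder. The dictionary-based adjacency format is an assumption, and the graph is assumed to be acyclic.

def topological_sort(adj):
    # adj: dict mapping each node to the list of its successors.
    order = []
    visited = set()

    def dfs(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                dfs(v)
        order.append(u)            # a node finishes only after all its descendants

    for u in adj:
        if u not in visited:       # start anew from every still-unvisited node
            dfs(u)
    return list(reversed(order))   # reverse postorder is a topological order

# Example: B depends on A, C depends on A and B.
print(topological_sort({'A': ['B', 'C'], 'B': ['C'], 'C': []}))  # ['A', 'B', 'C']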
)

For example, in the traversal example above, nodes are marked black in the order C, E, D, B, A. Reversing this,
(c

we get the ordering A, B, D, E, C. This is a topological sort of graph. Similarly, in the lasagna example, assuming
that we choose successors top-down, nodes are marked black in the order bake, assemble lasagna, make the
sauce, fry sausage, boil pasta, grate cheese. So the reverse of this ordering gives us a recipe for successfully
making lasagna, even though successful cooks are likely to do things more in parallel!
Detecting cycles
Since a node finishes after its descendants, a cycle involves a gray node pointing to one of its gray ancestors
that haven't finished yet. If one of a node's successors is gray, there must be a cycle. To detect cycles in graphs,

ty
therefore, we choose an arbitrary white node and run DFS. If that completes and there are still white nodes left
over, we choose another white node arbitrarily and repeat. Eventually, all nodes are colored black. If at any
time we follow an edge to a gray node, there is a cycle in the graph. Therefore, cycles can be detected in
O(|V+E|) time.

si
Edge classifications
We can classify the various edges of the graph based on the color of the node reached when the algorithm

er
follows the edge. Here is the expanded (A–G) graph with the edges colored to show their classification.

Note that the classification of edges depends on what trees are constructed, and therefore depends on what
node we start from and in what order the algorithm happens to select successors to visit.

When the destination node of a followed edge is white, the algorithm performs a recursive call. These edges
ity

are called tree edges, shown as solid black arrows. The graph looks different in this picture because the nodes
have been moved to make all the tree edges go downward. We have already seen that tree edges show the
precise sequence of recursive calls performed during the traversal.

When the destination of the followed edge is gray, it is a back edge, shown in red. Because there is only a single
Am

path of gray nodes, a back edge is looping back to an earlier gray node, creating a cycle. A graph has a cycle if
and only if it contains a back edge when traversed from some node.

When the destination of the followed edge is colored black, it is a forward edge or across the edge. It is a cross
edge if it goes between one tree and another in the forest; otherwise, it is a forward edge.

Detecting cycles
)

It is often useful to know whether a graph has cycles. To detect whether a graph has cycles, we perform a
(c

depth-first search of the entire graph. If a back edge is found during any traversal, the graph contains a cycle. If
all nodes have been visited and no back edge has been found, the graph is acyclic.
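A minimal Python sketch of this check, colouring nodes white, gray, and black and reporting a cycle whenever a followed edge reaches a gray node; the adjacency format is an assumption made for the example.

WHITE, GRAY, BLACK = 0, 1, 2

def has_cycle(adj):
    # adj: dict mapping each node to the list of its successors.
    color = {u: WHITE for u in adj}

    def dfs(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY:               # back edge to a gray ancestor: cycle
                return True
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and dfs(u) for u in adj)

print(has_cycle({'A': ['B'], 'B': ['C'], 'C': ['A']}))  # True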

Connected components
Graphs need not be connected, although we have been drawing connected graphs thus far. A graph is
connected if there is a path between every two nodes. However, it is entirely possible to have a graph in which
there is no path from one node to another node, even following edges backward. For connectedness, we don't
care which direction the edges go in, so we might as well consider an undirected graph. A connected

ty
component is a subset S such that for every two adjacent vertices v and v', either v and v' are both in S or
neither one is.

For example, the following undirected graph has three connected components:

The connected components problem is to determine how many connected components make up a graph, and
to make it possible to find, for each node in the graph, which component it belongs to. This can be a useful way
to solve problems. For example, suppose that different components correspond to different jobs that need to
Un
be done, and there is an edge between two components if they need to be done on the same day. Then to find
out what is the maximum number of days that can be used to carry all the jobs, we need to count the
components.

Algorithm:

Perform a depth-first search over the graph. As each traversal starts, create a new component. All nodes reached
during the traversal belong to that component. The number of traversals done during the depth-first search is the
ity

number of components. Note that if the graph is directed, the DFS needs to follow both in- and out-edges.
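A minimal Python sketch of this algorithm for an undirected graph stored as an adjacency list (each edge listed in both directions, an assumption of the example); every fresh traversal start discovers exactly one component.

def connected_components(adj):
    # adj: undirected adjacency list, e.g. {0: [1], 1: [0], 2: []}.
    seen = set()
    components = []

    def dfs(u, comp):
        seen.add(u)
        comp.append(u)
        for v in adj[u]:
            if v not in seen:
                dfs(v, comp)

    for u in adj:
        if u not in seen:          # each new traversal start creates a new component
            comp = []
            dfs(u, comp)
            components.append(comp)
    return components

print(len(connected_components({0: [1], 1: [0], 2: []})))  # 2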

For directed graphs, it is usually more useful to define strongly connected components. A strongly connected
component (SCC) is a maximal subset of vertices such that every vertex in the set is reachable from every other.
All cycles in a graph are part of the same strongly connected component, which means that every graph can be
viewed as a DAG composed of SCCs. There is a simple and efficient algorithm due to Kosaraju that uses depth-
Am

first search twice:

1. Topologically sort the nodes using DFS. The SCCs will appear in sequence.
2. Now traverse the transposed graph, but pick new (white) roots in topological order. Each new subtraversal
reaches a distinct SCC.
For example, consider the following graph, which is not a DAG:
)
(c
Running a depth-first traversal in which we happen to choose children left-to-right, and extracting the nodes in
reverse postorder, we obtain the ordering 1, 4, 5, 6, 2, 3, 7. Notice that the SCCs occur sequentially within this
ordering. The job of the second part of the algorithm is to identify the boundaries.

ty
In the second phase, we start with 1 and find it has no predecessors, so {1} is the first SCC. We then start with 4
and find that 5 and 6 are reachable via backward edges, so the second SCC is {4,5,6}. Starting from 2, we
discover {2,3}, and the final SCC is {7}. The resulting DAG of SCCs is the following:

Notice that the SCCs are also topologically sorted by this algorithm.

iv
One inconvenience of this algorithm is that it requires being able to walk edges backward. Tarjan's algorithm for
identifying strongly connected components is only slightly more complicated, yet performs just one depth-first
traversal of the graph and only in a forward direction.
Un
Exercise

1. Write about Graphs terminology?


2. What is Graphs representation?

3. Write about About Traversals?


ity

Unit 2.1.5: basic graph algorithms: Depth-first search and Breadth-first search
and its analysis
Unit Outcomes :
Am

At the end of the unit, you will learn


1. Depth-first search
2. Breadth-first search

Depth-first search
)

What if we were to replace the FIFO queue with a LIFO stack? In that case, we get a completely different order
of traversal. Assuming that successors are pushed onto the stack in reverse alphabetic order, the successive
(c

stack states look like this:

BDE
CDE

DE

ty
With a stack, the search will proceed from a given node as far as it can before backtracking and considering
other nodes on the stack. For example, node E had to wait until all nodes reachable from B and D were
considered. This is a depth-first search.

si
A more standard way of writing depth-first search is as a recursive function, using the program stack as the
stack above. We start with every node white except the starting node and apply the function DFS to the starting
node:

er
DFS(Vertex v) {
    mark v visited
    set color of v to gray
    for each successor v' of v {
        if v' not yet visited {
            DFS(v')
        }
    }
    set color of v to black
}

You can think of this as a person walking through the graph following arrows and never visiting a node twice
except when backtracking when a dead-end is reached. Running this code on the graph above yields the
Am

following graph colorings in sequence, which are reminiscent of but a bit different from what we saw with the
stack-based version:

Notice that at any given time there is a single path of gray nodes leading from the starting node and leading to
the current node v. This path corresponds to the stack in the earlier implementation, although the nodes end
up being visited in a different order because the recursive algorithm only marks one successor gray at a time.
The sequence of calls to DFS forms a tree. This is called the call tree of the program, and in fact, any program
has a call tree. In this case, the call tree is a subgraph of the original graph:
The algorithm maintains an amount of state that is proportional to the size of this path from the root. This
makes DFS rather different from BFS, where the amount of state (the queue size) corresponds to the size of the
perimeter of nodes at a distance k from the starting node. In both algorithms, the amount of state can be
O(|V|). For DFS this happens when searching a linked list. For BFS this happens when searching a graph with a
lot of branching, such as a binary tree, because there are 2^k nodes at a distance k from the root. On a balanced
binary tree, DFS maintains a state proportional to the height of the tree, or O(log |V|). Often the graphs that
we want to search are more like trees than linked lists, and so DFS tends to require less state.

There can be at most |V| calls to DFS. And the body of the loop on successors can be executed at most
|E| times. So the asymptotic performance of DFS is O(|V| + |E|), just like for breadth-first search.
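To make the recursive procedure concrete, here is a minimal C++ sketch of the same depth-first search over an adjacency-list graph. The vertex labels, the adj array, and the visited flags are illustrative assumptions, not part of the notes above.

#include <iostream>
#include <vector>
using namespace std;

const int N = 7;                 // number of vertices (assumed example)
vector<int> adj[N];              // adjacency list of the graph
bool visited[N] = {false};

void DFS(int v) {
    visited[v] = true;           // "mark v visited" / color it gray
    for (int w : adj[v])         // for each successor of v
        if (!visited[w])
            DFS(w);
    cout << v << " ";            // v turns black here: printed in finishing order
}

int main() {
    adj[0] = {1, 3, 4};          // example edges (assumed)
    adj[1] = {2, 3};
    adj[3] = {2};
    adj[4] = {2};
    DFS(0);                      // overall O(|V| + |E|)
    return 0;
}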
If we want to search the whole graph, then a single recursive traversal may not suffice. If we had started a
traversal with node C, we would miss all the rest of the nodes in the graph. To do a depth-first search of an
entire graph, we call DFS on an arbitrary unvisited node and repeat until every node has been visited. For
example, consider the original graph expanded with two new nodes F and G:

DFS starting at A will not search all the nodes. Suppose we next choose F to start from. Then we will reach all
nodes. Instead of constructing just one tree that is a subgraph of the original graph, we get a forest of two
trees:
Breadth-first search
Breadth-first search (BFS) is a graph traversal algorithm that explores nodes in the order of their distance from
the roots, where distance is defined as the minimum path length from a root to the node. Its pseudo-code looks

like this:

// let root be the source node
frontier = new Queue()
mark root visited (set root.distance = 0)
frontier.push(root)
while frontier not empty {
    Vertex v = frontier.pop()
    for each successor v' of v {
        if v' unvisited {
            frontier.push(v')
            mark v' visited (v'.distance = v.distance + 1)
        }
    }
}

Here the white nodes are those not marked as visited, the gray nodes are those marked as visited and that are
in the frontier, and the black nodes are visited nodes no longer in the frontier. Rather than having a visited flag, we
can keep track of a node's distance in the field v.distance. When a new node is discovered, its distance is set to
be one greater than its predecessor v.
When frontier is a first-in, first-out (FIFO) queue, we get breadth-first search. All the nodes on the queue have a
minimum path length within one of each other. In general, there is a set of nodes to be popped off, at some
distance k from the source, and another set of elements, later on, the queue, at distance k+1. Every time a new
node is pushed onto the queue, it is at distance k+1 until all the nodes at distance k are gone, and k then goes
up by one. Therefore newly pushed nodes are always at a distance at least as great as any other gray node.
Suppose that we run this algorithm on the following graph, assuming that successors are visited in alphabetic

order from any given node:

In that case, the following sequence of nodes passes through the queue, where each node is annotated by its

minimum distance from the source node A. Note that we're pushing onto the right of the queue and popping
from the left.

A0 B1 D1 E1 C2
Nodes are popped in distance order: A, B, D, E, C. This is very useful when we are trying to find the shortest
path through the graph to something. When a queue is used in this way, it is known as a worklist; it keeps track
of work left to be done.
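As a concrete counterpart, here is a small C++ sketch of this worklist idea using std::queue; the example graph and the distance array are assumptions added for illustration.

#include <iostream>
#include <queue>
#include <vector>
using namespace std;

int main() {
    const int N = 5;                     // vertices 0..4 standing for A..E (assumed example)
    vector<vector<int>> adj(N);
    adj[0] = {1, 3, 4};                  // A -> B, D, E
    adj[1] = {2};                        // B -> C
    adj[3] = {2};                        // D -> C

    vector<int> distance(N, -1);         // -1 means "white" / unvisited
    queue<int> frontier;                 // the worklist
    distance[0] = 0;                     // mark the source visited
    frontier.push(0);

    while (!frontier.empty()) {
        int v = frontier.front(); frontier.pop();
        cout << v << " (dist " << distance[v] << ")\n";   // popped in distance order
        for (int w : adj[v])
            if (distance[w] == -1) {     // unvisited successor
                distance[w] = distance[v] + 1;
                frontier.push(w);
            }
    }
    return 0;
}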

Exercise
1. Write about Depth-first search?

2. Write about Breadth-first search?



Unit 2.1.6: single source and all-pair shortest path problem


Unit Outcomes :
At the end of the unit, you will learn
1. single-source Problem
2. all-pair shortest path problem

single source and all-pair shortest path problem


The all-pair shortest path algorithm, also known as the Floyd-Warshall algorithm, is used to find the
shortest paths between every pair of vertices in a given weighted graph. As a result, the algorithm
produces a matrix that represents the minimum distance from any node to all other nodes in the graph.

At first, the output matrix is the same as the given cost matrix of the graph. After that, the output
matrix is updated by considering every vertex k as an intermediate vertex.

The time complexity of this algorithm is O(V^3), where V is the number of vertices in the graph.

Input − The cost matrix of the graph:

0  3  6  ∞  ∞  ∞  ∞
3  0  2  1  ∞  ∞  ∞
6  2  0  1  4  2  ∞
∞  1  1  0  2  ∞  4
∞  ∞  4  2  0  2  1
∞  ∞  2  ∞  2  0  1
∞  ∞  ∞  4  1  1  0

Output − Matrix of all-pair shortest paths:

0  3  5  4  6  7  7
3  0  2  1  3  4  4
5  2  0  1  3  2  3
4  1  1  0  2  3  3
6  3  3  2  0  2  1
7  4  2  3  2  0  1
7  4  3  3  1  1  0
Algorithm
floydWarshal(cost)

Input − The cost matrix of the given graph.
Output − Matrix of shortest path distances between every pair of vertices.

Begin
   for k := 0 to n-1, do
      for i := 0 to n-1, do
         for j := 0 to n-1, do
            if cost[i,k] + cost[k,j] < cost[i,j], then
               cost[i,j] := cost[i,k] + cost[k,j]
         done
      done
   done
   display the current cost matrix
End

Example

#include<iostream>
#include<iomanip>
#define NODE 7
#define INF 999

using namespace std;

//Cost matrix of the graph
int costMat[NODE][NODE] = {
   {0, 3, 6, INF, INF, INF, INF},
   {3, 0, 2, 1, INF, INF, INF},
   {6, 2, 0, 1, 4, 2, INF},
   {INF, 1, 1, 0, 2, INF, 4},
   {INF, INF, 4, 2, 0, 2, 1},
   {INF, INF, 2, INF, 2, 0, 1},
   {INF, INF, INF, 4, 1, 1, 0}
};

void floydWarshal(){
   int cost[NODE][NODE];   //defined to store shortest distance from any node to any node
   for(int i = 0; i<NODE; i++)
      for(int j = 0; j<NODE; j++)
         cost[i][j] = costMat[i][j];   //copy costMatrix to the new matrix
   for(int k = 0; k<NODE; k++){
      for(int i = 0; i<NODE; i++)
         for(int j = 0; j<NODE; j++)
            if(cost[i][k]+cost[k][j] < cost[i][j])
               cost[i][j] = cost[i][k]+cost[k][j];
   }
   cout << "The matrix:" << endl;
   for(int i = 0; i<NODE; i++){
      for(int j = 0; j<NODE; j++)
         cout << setw(3) << cost[i][j];
      cout << endl;
   }
}

int main(){
   floydWarshal();
   return 0;
}
Output

The matrix:
  0  3  5  4  6  7  7
  3  0  2  1  3  4  4
  5  2  0  1  3  2  3
  4  1  1  0  2  3  3
  6  3  3  2  0  2  1
  7  4  2  3  2  0  1
  7  4  3  3  1  1  0

Single-Source Shortest Paths

The single-source shortest path algorithm (for arbitrary edge weights, positive or negative), also known as
the Bellman-Ford algorithm, is used to find the minimum distance from the source vertex to every other
vertex. The main difference between this algorithm and Dijkstra's algorithm is that Dijkstra's algorithm
cannot handle negative edge weights, whereas Bellman-Ford handles them without any problem.

The Bellman-Ford algorithm finds the distances in a bottom-up manner. At first, it finds those distances
which have only one edge in the path. After that, it increases the path length to find all possible solutions.

Input − The cost matrix of the graph:

0   6   ∞   7   ∞
∞   0   5   8  -4
∞  -2   0   ∞   ∞
∞   ∞  -3   0   9
2   ∞   7   ∞   0

Output − Source Vertex: 2
Vert:  0   1   2   3   4
Dist: -4  -2   0   3  -6
Pred:  4   2  -1   0   1
The graph has no negative edge cycle
Algorithm
bellmanFord(dist, pred, source)
Input − Distance list, predecessor list, and the source vertex.
Output − True, when a negative cycle is found.

Begin
   iCount := 1
   maxEdge := n * (n - 1) / 2   //n is the number of vertices
   for all vertices v of the graph, do
      dist[v] := ∞
      pred[v] := ϕ
   done
   dist[source] := 0
   eCount := number of edges present in the graph
   create edge list named edgeList
   while iCount < n, do
      for i := 0 to eCount, do
         if dist[edgeList[i].v] > dist[edgeList[i].u] + (cost[u,v] for edge i), then
            dist[edgeList[i].v] := dist[edgeList[i].u] + (cost[u,v] for edge i)
            pred[edgeList[i].v] := edgeList[i].u
      done
      iCount := iCount + 1
   done
   for all edges i in the edge list, do
      if dist[edgeList[i].v] > dist[edgeList[i].u] + (cost[u,v] for edge i), then
         return true
   done
   return false
End
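The pseudo-code above can be turned into a short C++ sketch. The edge list below encodes the same example graph as the cost matrix above (source vertex 2); the struct name and array sizes are assumptions made for the sketch.

#include <iostream>
#include <vector>
#include <climits>
using namespace std;

struct Edge { int u, v, cost; };

int main() {
    int n = 5;                                   // number of vertices
    vector<Edge> edges = {{0,1,6},{0,3,7},{1,2,5},{1,3,8},{1,4,-4},
                          {2,1,-2},{3,2,-3},{3,4,9},{4,0,2},{4,2,7}};
    int source = 2;
    vector<long long> dist(n, LLONG_MAX/2);      // "infinity"
    vector<int> pred(n, -1);
    dist[source] = 0;

    for (int i = 1; i < n; i++)                  // relax every edge n-1 times
        for (auto &e : edges)
            if (dist[e.u] + e.cost < dist[e.v]) {
                dist[e.v] = dist[e.u] + e.cost;
                pred[e.v] = e.u;
            }

    bool negCycle = false;                       // one extra pass detects a negative cycle
    for (auto &e : edges)
        if (dist[e.u] + e.cost < dist[e.v]) negCycle = true;

    for (int v = 0; v < n; v++)
        cout << "Vert " << v << "  Dist " << dist[v] << "  Pred " << pred[v] << "\n";
    cout << (negCycle ? "Negative edge cycle found" : "The graph has no negative edge cycle") << "\n";
    return 0;
}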

iv
Exercise
1. What is the single-source problem?
Un
2. Write about the all-pair shortest path problem.
ity
) Am
(c
Module III:

3.1: Dynamic Programming


Key learning outcomes

At the end of this module, you will learn:

 Concept of Dynamic and Greedy Approach
 Greedy Programming: Activity Selection Problem
 Huffman codes and fractional knapsack problem

er
Unit 3.1.1: Concept of Dynamic and Greedy Approach
Unit Outcomes :

iv
At the end of the unit, you will learn
 Dynamic Approach
 Greedy Approach
Dynamic programming approach

The dynamic programming approach is similar to divide and conquer in breaking down the problem into
smaller and still smaller possible sub-problems. But unlike divide and conquer, these sub-problems are not
solved independently. Rather, the results of these smaller sub-problems are remembered and reused for
similar or overlapping sub-problems.

Dynamic programming is used where we have problems that can be divided into similar sub-problems so
that their results can be re-used. Mostly, these algorithms are used for optimization. Before solving the
sub-problem at hand, the dynamic algorithm will try to examine the results of the previously solved
sub-problems. The solutions of the sub-problems are combined to achieve the best solution.
So we can say that −

The problem should be able to be divided into smaller overlapping sub-problems.

An optimal solution can be achieved by using optimal solutions of smaller sub-problems.

Dynamic algorithms use memoization.

Comparison

In contrast to greedy algorithms, where local optimization is addressed, dynamic algorithms are motivated
by overall optimization of the problem.

In contrast to divide and conquer algorithms, where solutions are combined to achieve an overall solution,
a dynamic algorithm uses the output of a smaller sub-problem and then tries to optimize a bigger
sub-problem. Dynamic algorithms use memoization to remember the output of already solved sub-problems.

Example

The following computer problems can be solved using the dynamic programming approach −

 Fibonacci number series
 Knapsack problem
 Tower of Hanoi
 All pair shortest path by Floyd-Warshall
 Shortest path by Dijkstra
 Project scheduling

Dynamic programming can be used in both top-down and bottom-up ways. Moreover, most of the time,
referring to the previously computed solution is cheaper than recomputing it in terms of CPU cycles.
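As a tiny illustration of the top-down (memoized) style just described, here is a C++ sketch for Fibonacci numbers; the array size is an assumption made for the example.

#include <iostream>
using namespace std;

long long memo[100];                  // memo[i] == 0 means "not computed yet" (assumed limit of 100)

long long fib(int n) {
    if (n <= 1) return n;             // base cases
    if (memo[n] != 0) return memo[n]; // reuse an already solved sub-problem
    return memo[n] = fib(n - 1) + fib(n - 2);
}

int main() {
    cout << fib(50) << endl;          // only a linear number of distinct sub-problems is solved
    return 0;
}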

Greedy Algorithms

An algorithm is designed to achieve an optimal solution for a given problem. In the greedy algorithm
approach, decisions are made from the given solution domain. Being greedy, the closest solution that
seems to provide an optimal solution is chosen.

Greedy is an algorithmic paradigm that builds up a solution piece by piece, always choosing the next piece
that offers the most apparent and immediate benefit. So the problems where choosing locally optimal also
leads to the global solution are the best fit for Greedy.

Greedy algorithms try to find a localized optimal solution, which may eventually lead to a globally
optimized solution. However, in general, greedy algorithms do not provide globally optimized solutions.

Counting Coins
This problem is to count up to a desired value by choosing the least possible number of coins, and the
greedy approach forces the algorithm to pick the largest possible coin. If we are given coins of ₹ 1,
2, 5, and 10 and we are asked to count ₹ 18, then the greedy procedure will be −
 1 − Select one ₹ 10 coin, the remaining count is 8
 2 − Then select one ₹ 5 coin, the remaining count is 3
 3 − Then select one ₹ 2 coin, the remaining count is 1
 4 − And finally, the selection of one ₹ 1 coin solves the problem

This seems to have worked well; for this count we need to pick only 4 coins. However, if we slightly
change the problem, the same approach may not be able to produce the same optimal result.

For a currency system where we have coins of value 1, 7, and 10, counting coins for value 18 will still be
optimal, but for a count like 15, it may use more coins than needed. For instance, the greedy approach
will use 10 + 1 + 1 + 1 + 1 + 1, a total of 6 coins, whereas the same problem could be solved by using
only 3 coins (7 + 7 + 1).

Hence, we may conclude that the greedy approach picks an immediately optimized solution and may fail
where global optimization is a major concern.
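A minimal C++ sketch of the greedy coin counting described above; the denominations and the target value are taken from the example.

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> coins = {10, 5, 2, 1};   // denominations, largest first
    int amount = 18;                      // value to count (from the example)
    for (int c : coins)
        while (amount >= c) {             // greedily take the largest coin that still fits
            cout << "take " << c << "\n";
            amount -= c;
        }
    // For denominations {10, 7, 1} and amount 15 this same loop uses 6 coins,
    // while the optimal answer uses only 3 (7 + 7 + 1).
    return 0;
}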

Examples

Most networking algorithms use the greedy approach. Here is a list of a few of them −

 Travelling Salesman Problem
 Prim's Minimal Spanning Tree Algorithm
 Kruskal's Minimal Spanning Tree Algorithm
 Dijkstra's Shortest Path Algorithm
 Graph - Map Coloring
 Graph - Vertex Cover
 Knapsack Problem
 Job Scheduling Problem
There are lots of similar problems that use the greedy approach to find an optimum solution.
ity

Exercise
1. What is a Dynamic Approach?
2. What is Greedy Approach?
Am

Unit 3.1.2: Comparison between Dynamic and Greedy Approach to solve a


problem
Unit Outcomes :
At the end of the unit, you will learn
)

 Comparison between Dynamic and Greedy Approach


(c

A Greedy algorithm is an algorithmic paradigm that builds up a solution piece by piece, always choosing
the next piece that offers the most obvious and immediate benefit. So the problems where choosing
locally optimal also leads to a global solution are the best fit for Greedy.
For instance, consider the Fractional Knapsack Problem. The locally optimal strategy is to choose the
item that has the maximum value vs. weight ratio. This strategy also leads to a globally optimal solution
because we are allowed to take fractions of an item.

Dynamic programming is basically an optimization over plain recursion. Wherever we see a recursive
solution that has repeated calls for the same inputs, we can optimize it using Dynamic Programming. The
idea is to simply store the results of subproblems so that we do not have to re-compute them when needed
later. This simple optimization reduces time complexities from exponential to polynomial. For instance,
if we write a simple recursive solution for Fibonacci Numbers, we get exponential time complexity, and
if we optimize it by storing solutions of subproblems, the time complexity reduces to linear.

Difference Between Greedy Method and Dynamic Programming

Feasibility: In a greedy algorithm, we make whatever choice seems best at the moment in the hope that it
will lead to the global optimal solution. In dynamic programming, we make a decision at each step
considering the current problem and the solutions to previously solved subproblems to calculate the
optimal solution.

Optimality: In the greedy method, there is sometimes no guarantee of getting an optimal solution. Dynamic
programming is guaranteed to generate an optimal solution, as it generally considers all possible cases
and then chooses the best.

Recursion: A greedy method follows the problem-solving heuristic of making the locally optimal choice at
each stage. Dynamic programming is an algorithmic technique that is usually based on a recurrent formula
that uses some previously calculated states.

Memoization: The greedy method is more efficient in terms of memory, as it never looks back or revises
previous choices. Dynamic programming requires a DP table for memoization, which increases its memory
complexity.

Time complexity: Greedy methods are generally faster; for example, Dijkstra's shortest path algorithm
takes O(E log V + V log V) time. Dynamic programming is generally slower; for example, the Bellman-Ford
algorithm takes O(VE) time.

Fashion: The greedy method computes its solution by making its choices in a serial forward fashion, never
looking back or revising previous choices. Dynamic programming computes its solution bottom-up or
top-down by synthesizing it from smaller optimal sub-solutions.

Example: Fractional knapsack (greedy method) vs. 0/1 knapsack problem (dynamic programming).

Comparison Chart

Basic: The greedy method generates a single decision sequence; dynamic programming may generate many
decision sequences.
Reliability: The greedy method is less reliable; dynamic programming is highly reliable.
Follows: The greedy method follows a top-down approach; dynamic programming follows a bottom-up approach.
Solutions: The greedy method contains a particular set of the feasible set of solutions; dynamic
programming has no special set of feasible solutions.
Efficiency: The greedy method is more efficient; dynamic programming is less efficient.
Overlapping subproblems: The greedy method cannot handle them; dynamic programming chooses the optimal
solution to the subproblem.
Example: Fractional knapsack, shortest path (greedy method); 0/1 Knapsack (dynamic programming).
Key Differences Between Greedy Method and Dynamic Programming

The greedy method produces a single decision sequence, while in dynamic programming many decision
sequences may be generated.

The dynamic programming approach is more reliable than the greedy approach.

The greedy method follows a top-down approach. As against this, dynamic programming is based on a
bottom-up strategy.

A greedy algorithm works with a particular set of the feasible set of solutions, where local choices of
the subproblem lead to the optimal solution. Conversely, in dynamic programming one solution is chosen
from all the candidate subproblems that are capable of giving an optimal solution.

Dynamic programming can be less efficient and needlessly costly compared to a greedy algorithm.

The greedy method does not deal with overlapping subproblems, whereas the dynamic programming approach
effectively handles overlapping subproblems.

Implementations of the greedy method include the fractional knapsack, shortest path algorithms, etcetera.
The 0/1 knapsack is one of the examples of dynamic programming.

Conclusion

The greedy method and dynamic programming are both used to find the optimal solution to a problem from a
set of feasible solutions. The main difference between them is that the greedy method never reconsiders
its selections, whereas dynamic programming does, and dynamic programming also guarantees an optimal
solution to the problem.

Exercise

1. Comparison between Dynamic and Greedy Approach


Am

Unit 3.1.3: Dynamic Programming: Matrix Chain Multiplication


Unit Outcomes :
At the end of the unit, you will learn
)

 Dynamic Programming
 Matrix Chain Multiplication
(c

Dynamic Programming
Dynamic Programming is principally an optimization over plain recursion. Wherever we see a recursive
solution that has repeated calls for the same inputs, we can streamline it using Dynamic Programming.
The idea is to simply store the results of subproblems, so we don't need to re-compute them when needed
later. This simple optimization reduces time complexities from exponential to polynomial. For instance,
if we write a straightforward recursive answer for Fibonacci Numbers, we get exponential time complexity,
and if we optimize it by storing solutions of subproblems, the time complexity reduces to linear.

Introduction

Dynamic programming (ordinarily referred to as DP) is a very powerful technique to solve a particular
class of problems. It demands an elegant formulation of the approach and simple reasoning, and the coding
part is easy. The idea is very basic: if you have solved a problem with the given input, then save the
result for future reference, so as to avoid solving the same problem again.. now 'Remember your Past' :).
If the given problem can be broken up into smaller sub-problems, and these smaller subproblems are in
turn divided into still smaller ones, and in this process you notice some overlapping subproblems, it is
a big hint for DP. Likewise, the optimal answers to the subproblems contribute to the optimal solution
of the given problem (referred to as the Optimal Substructure Property).
There are two ways of doing this.

1.) Top-Down: Start solving the given problem by breaking it down. If you see that the problem has
already been solved, simply return the saved answer. If it has not been solved, solve it and save the
answer. This is usually easy to think of and very intuitive. This is referred to as Memoization.

2.) Bottom-Up: Analyze the problem and see the order in which the sub-problems are solved, and start
solving from the trivial subproblem, up towards the given problem. In this process, it is guaranteed
that the subproblems are solved before solving the problem. This is referred to as Dynamic Programming.

Note that divide and conquer is a slightly different technique. In that, we partition the problem into
non-overlapping subproblems and solve them independently, as in mergesort and quicksort.

If you are keen on seeing visualizations related to Dynamic Programming, give this a try.
Complementary to Dynamic Programming are Greedy Algorithms, which make a decision once and for all every
time they need to make a choice, in such a way that it leads to a near-optimal solution. A Dynamic
Programming solution is based on the principle of Mathematical Induction; greedy algorithms require other
kinds of proof.

Matrix Chain Multiplication

Given a sequence of matrices, find the most efficient way to multiply these matrices together. The
problem is not actually to perform the multiplications, but merely to decide in which order to perform
the multiplications.

We have many options to multiply a chain of matrices because matrix multiplication is associative. In
other words, no matter how we parenthesize the product, the result will be the same. For instance, if we
had four matrices A, B, C, and D, we would have:

(ABC)D = (AB)(CD) = A(BCD) = ....
However, the order in which we parenthesize the product affects the number of simple arithmetic
operations needed to compute the product, or the efficiency. For instance, suppose A is a 10 × 30 matrix,
B is a 30 × 5 matrix, and C is a 5 × 60 matrix. Then,
(AB)C = (10×30×5) + (10×5×60) = 1500 + 3000 = 4500 operations
A(BC) = (30×5×60) + (10×30×60) = 9000 + 18000 = 27000 operations.

Clearly the first parenthesization requires fewer operations.

Given an array p[] which represents the chain of matrices such that the ith matrix Ai is of dimension
p[i-1] x p[i], we need to write a function MatrixChainOrder() that should return the minimum number of
multiplications needed to multiply the chain.

Input: p[] = {40, 20, 30, 10, 30}
Output: 26000
There are 4 matrices of dimensions 40x20, 20x30, 30x10 and 10x30.
Let the input 4 matrices be A, B, C, and D. The minimum number of
multiplications is obtained by putting parentheses in the following way:
(A(BC))D --> 20*30*10 + 40*20*10 + 40*10*30

Input: p[] = {10, 20, 30, 40, 30}
Output: 30000
There are 4 matrices of dimensions 10x20, 20x30, 30x40 and 40x30.
Let the input 4 matrices be A, B, C, and D. The minimum number of
multiplications is obtained by putting parentheses in the following way:
((AB)C)D --> 10*20*30 + 10*30*40 + 10*40*30

Input: p[] = {10, 20, 30}
Output: 6000
There are only two matrices of dimensions 10x20 and 20x30. So there
is only one way to multiply the matrices, the cost of which is 10*20*30

1) Optimal Substructure:

A simple solution is to place parentheses at all possible places, calculate the cost for each placement,
and return the minimum value. In a chain of matrices of size n, we can place the first set of parentheses
in n-1 different ways. For instance, if the given chain is of 4 matrices, let the chain be ABCD; then
there are 3 ways to place the first set of parentheses on the outer side: (A)(BCD), (AB)(CD), and
(ABC)(D). So when we place a set of parentheses, we divide the problem into subproblems of smaller size.
Hence, the problem has the optimal substructure property and can be easily solved using recursion.
Minimum number of multiplications needed to multiply a chain of size n = minimum over all n-1 placements
(these placements create subproblems of smaller size)

2) Overlapping Subproblems

Following is a recursive implementation that simply follows the above optimal substructure property.
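The notes refer to a recursive implementation but do not list it, so the following is a minimal C++ sketch along those lines (it recomputes overlapping subproblems, which is exactly what a memoized or bottom-up DP later avoids); the function and array names follow the problem statement above.

#include <iostream>
#include <climits>
using namespace std;

// Minimum multiplications to compute A[i]..A[j], where A[k] has dimension p[k-1] x p[k]
int MatrixChainOrder(int p[], int i, int j) {
    if (i == j) return 0;                        // a single matrix needs no multiplication
    int best = INT_MAX;
    for (int k = i; k < j; k++) {                // try every position for the outer parentheses
        int cost = MatrixChainOrder(p, i, k)
                 + MatrixChainOrder(p, k + 1, j)
                 + p[i - 1] * p[k] * p[j];
        best = min(best, cost);
    }
    return best;
}

int main() {
    int p[] = {40, 20, 30, 10, 30};              // example from above
    int n = sizeof(p) / sizeof(p[0]);
    cout << MatrixChainOrder(p, 1, n - 1) << endl;   // prints 26000
    return 0;
}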
Am

Exercise:

1. What is Dynamic Programming


2. What is Matrix Chain Multiplication
)
(c

Unit 3.1.4: Longest common subsequence and 0/1 Knapsack Problem


Unit Outcomes :
At the end of the unit, you will learn
 Longest common subsequence
 0/1 Knapsack Problem

Longest common subsequence

If a set of sequences is given, the longest common subsequence problem is to find a common subsequence
of all the sequences that is of maximal length.

The longest common subsequence problem is a classic computer science problem, the basis of data
comparison programs such as the diff utility, and has applications in bioinformatics. It is also widely
used by revision control systems, like SVN and Git, for reconciling multiple changes made to a
revision-controlled collection of files.

This is a good illustration of the technique of dynamic programming, which is the following simple idea:
start with a recursive algorithm for the problem, which may be inefficient because it calls itself
repeatedly on a small number of subproblems. Simply remember the solution to each subproblem the first
time you compute it, and after that look it up instead of recomputing it. The overall time bound then
becomes (typically) proportional to the number of distinct subproblems rather than the larger number of
recursive calls. We saw this idea briefly in the first lecture.

As we'll see, there are two ways of doing dynamic programming, top-down and bottom-up. The top-down
(memoizing) method is closer to the original recursive algorithm, so easier to understand, but the
bottom-up method is usually slightly more efficient.
Subsequence testing
Before we define the longest common subsequence problem, let's start with an easy warmup.
Suppose you're given a short string (pattern) and a long string (text), as in the string matching
problem. But now you want to know whether the letters of the pattern appear in order (although possibly
separated) in the text. If they do, we say that the pattern is a subsequence of the text.

For instance, is "nano" a subsequence of "nematode knowledge"? Yes, and in only one way. The easiest way
to see this is to capitalize the subsequence: "NemAtode kNOwledge".

In general, we can test this by using a finite state machine. Draw circles and arrows as before,
corresponding to partial subsequences (prefixes of the pattern), but now there is no need for
backtracking.
Equivalently, it is easy to write code or pseudo-code for this:

int subseq(char *P, char *T)
{
    while (*T != '\0')
        if (*P == *T++ && *++P == '\0')
            return 1;    /* the whole pattern was matched in order */
    return 0;
}

Longest common subsequence problem
What if the pattern doesn't occur in the text? It makes sense to find the longest subsequence that occurs
both in the pattern and in the text. This is the longest common subsequence problem. Since the pattern
and text have symmetric roles, from now on we won't give them different names but call them strings A and
B. We'll use m to denote the length of A and n to denote the length of B.

Note that the automata-theoretic method above doesn't solve the problem - instead it gives the longest
prefix of A that is a subsequence of B. But the longest common subsequence of A and B isn't always a
prefix of A.

Why might we want to solve the longest common subsequence problem? There are several motivating
applications.

Molecular biology. DNA sequences (genes) can be represented as sequences of four letters ACGT,
corresponding to the four submolecules forming DNA. When biologists find a new sequence, they typically
want to know what other sequences it is most similar to. One way of computing how similar two sequences
are is to find the length of their longest common subsequence.

File comparison. The Unix program "diff" is used to compare two different versions of the same file, to
determine what changes have been made to the file. It works by finding the longest common subsequence of
the lines of the two files; any line in the subsequence has not been changed, so what it displays is the
remaining set of lines that have changed. In this instance of the problem, we should think of each line
of a file as being a single complicated character in a string.

Screen redisplay. Many text editors like "emacs" display part of a file on the screen, updating the
screen image as the file is changed. For slow dial-in terminals, these programs want to send the terminal
as few characters as possible to make it update its display correctly. It is possible to view the
computation of the minimum length sequence of characters needed to update the terminal as a kind of
common subsequence problem (the common subsequence tells you the parts of the display that are already
correct and don't need to be changed).

(As an aside, it is common to define a similar longest common substring problem, asking for the longest
substring that appears in two input strings. This problem can be solved in linear time using a data
structure known as the suffix tree, but the solution is extremely complicated.)

Recursive solution
So we want to solve the longest common subsequence problem by dynamic programming.
To do this, we first need a recursive solution. The dynamic programming idea doesn't tell us how to find
this; it just gives us a way of making the solution more efficient once we have it.

Let's start with some simple observations about the LCS problem. If we have two strings, say
"nematode knowledge" and "empty bottle", we can represent a subsequence as a way of writing the two so
that certain letters line up:

nematode kno w ledge
|| | | | ||
empty bottle

If we draw lines connecting the letters in the first string to the corresponding letters in the second,
no two lines cross (the top and bottom endpoints occur in the same order, the order of the letters in the
subsequence). Conversely, any set of lines drawn like this, without crossings, represents a subsequence.

From this we can observe the following simple fact: if the two strings start with the same letter, it's
always safe to choose that starting letter as the first character of the subsequence. This is because, if
you have some other subsequence, represented as a collection of lines as drawn above, you can "push" the
leftmost line to the start of the two strings, without causing any other crossings, and get a
representation of an equally long subsequence that starts this way.

On the other hand, suppose that, as in the example above, the two first characters differ. Then it is not
possible for both of them to be part of a common subsequence - one or the other (or possibly both) must
be removed.

Finally, observe that once we've decided what to do with the first characters of the strings, the
remaining subproblem is again the longest common subsequence problem, on two shorter strings. Therefore
we can solve it recursively.

Rather than finding the subsequence itself, it turns out to be more efficient to find the length of the
longest subsequence. Then, in the case where the first characters differ, we can determine which
subproblem gives the correct solution by solving both and taking the maximum of the resulting subsequence
lengths. Once we turn this into a dynamic program we will see how to get the solution itself.

These observations give us the following, inefficient, recursive algorithm.

Recursive LCS:

int max(int a, int b) { return (a > b) ? a : b; }

int lcs_length(char *A, char *B)
{
    if (*A == '\0' || *B == '\0') return 0;
    else if (*A == *B) return 1 + lcs_length(A+1, B+1);
    else return max(lcs_length(A+1, B), lcs_length(A, B+1));
}
This is a correct solution but it's very time-consuming. For example, if the two strings have no matching
characters, so the last line always gets executed, the time bounds are binomial coefficients, which (if m=n)
are close to 2^n.
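A bottom-up dynamic-programming version is sketched below in C++; it fills an (m+1) x (n+1) table of subproblem lengths, so the running time drops to O(mn). The table layout and the sample strings are assumptions made for the sketch.

#include <cstring>
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int lcs_length(const char *A, const char *B) {
    int m = strlen(A), n = strlen(B);
    // L[i][j] = length of the LCS of the first i characters of A and the first j characters of B
    vector<vector<int>> L(m + 1, vector<int>(n + 1, 0));
    for (int i = 1; i <= m; i++)
        for (int j = 1; j <= n; j++)
            if (A[i - 1] == B[j - 1])
                L[i][j] = 1 + L[i - 1][j - 1];
            else
                L[i][j] = max(L[i - 1][j], L[i][j - 1]);
    return L[m][n];
}

int main() {
    cout << lcs_length("nematode knowledge", "empty bottle") << endl;   // prints 7
    return 0;
}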

0/1 Knapsack Problem

In the 0–1 Knapsack problem, we are given a set of items, each with a weight and a value, and we need to
determine the number of each item to include in a collection so that the total weight is less than or
equal to a given limit and the total value is as large as possible.

Please note that the items are indivisible; we can either take an item or not (0-1 property). For example,

Input:

value = [ 20, 5, 10, 40, 15, 25 ]


weight = [ 1, 2, 3, 8, 7, 4 ]
int W = 10

Output: Knapsack value is 60


)

value = 20 + 40 = 60
weight = 1 + 8 = 9 < W
(c

Knapsack Problem:
The knapsack simply means a bag, a bag of given capacity.
We want to pack n items in the luggage.

The ith item is worth vi dollars and weighs wi pounds.

Take as valuable a load as possible, but the load cannot exceed W pounds.

vi, wi, W are integers.

1. weight ≤ capacity W
2. Value ← Max

Input:

o Knapsack of capacity
o List (Array) of weights and their corresponding values.

Output: To maximize profit within the weight capacity.

In the knapsack problem, we have to pack the knapsack with maximum value in such a manner that the total
weight of the items should not be greater than the capacity of the knapsack.

The knapsack problem can be further divided into two parts:

1. Fractional Knapsack: the fractional knapsack problem can be solved by the Greedy Strategy, whereas the
0/1 problem cannot; the 0/1 problem is solved with the Dynamic Programming approach instead.


0/1 Knapsack Problem:

An item cannot be broken, which means the thief should take the item as a whole or should leave it. That
is why it is called the 0/1 knapsack problem.

Each item is either taken or not taken.

We cannot take a fractional amount of an item or take an item more than once.

It cannot be solved by the Greedy Approach, because greedy choices do not ensure that the knapsack is
filled to capacity in the best possible way.

The Greedy Approach doesn't guarantee an optimal solution here.

Example of 0/1 Knapsack Problem:

Example: The maximum weight the knapsack can hold, W, is 11. There are five items to choose from. Their
weights and values are presented in the following table:

The [i, j] entry here will be V[i, j], the best value obtainable using the first "i" rows of items if the
maximum capacity were j. We begin by initialization and the first row.

V[i, j] = max {V[i - 1, j], vi + V[i - 1, j - wi]}
Un
ity
Am

The value of V [3, 7] was computed as follows:

V [3, 7] = max {V [3 - 1, 7], v3 + V [3 - 1, 7 - w3]


= max {V [2, 7], 18 + V [2, 7 - 5]}
= max {7, 18 + 6}
= 24
)
(c
Finally, the output is:

ty
si
The maximum value of items in the knapsack is 40 (the bottom-right entry). The dynamic programming approach can
now be coded as the following algorithm:

Algorithm of Knapsack Problem
KNAPSACK (n, W)

1. for w = 0 to W
2.    do V[0, w] ← 0
3. for i = 1 to n
4.    do V[i, 0] ← 0
5. for i = 1 to n
6.    do for w = 1 to W
7.       do if (wi ≤ w and vi + V[i-1, w - wi] > V[i-1, w])
8.          then V[i, w] ← vi + V[i-1, w - wi]
9.          else V[i, w] ← V[i-1, w]
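The same table-filling idea can be written as a short C++ sketch. The weights and values below come from the earlier Input/Output example (W = 10, answer 60), not from the five-item table; the variable names are assumptions made for the sketch.

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main() {
    vector<int> value  = {20, 5, 10, 40, 15, 25};
    vector<int> weight = {1, 2, 3, 8, 7, 4};
    int W = 10, n = value.size();

    // V[i][w] = best value using the first i items with capacity w
    vector<vector<int>> V(n + 1, vector<int>(W + 1, 0));
    for (int i = 1; i <= n; i++)
        for (int w = 0; w <= W; w++) {
            V[i][w] = V[i - 1][w];                        // skip item i
            if (weight[i - 1] <= w)
                V[i][w] = max(V[i][w],                    // or take item i
                              value[i - 1] + V[i - 1][w - weight[i - 1]]);
        }
    cout << "Knapsack value is " << V[n][W] << endl;      // prints 60
    return 0;
}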
ity

Exercise

 Write about Longest common subsequence?


 What is the 0/1 Knapsack Problem?
Am

Unit 3.1.5: Greedy Programming: Activity Selection Problem


Unit Outcomes :
At the end of the unit, you will learn

 Activity Selection Problem


 Steps for Activity Selection Problem
)
(c

Activity Selection Problem

Greedy is an algorithmic paradigm that builds up a solution piece by piece, always choosing the next
piece that offers the most obvious and immediate benefit. Greedy algorithms are used for optimization
problems. An optimization problem can be solved using Greedy if the problem has the following property:
at every step, we can make a choice that looks best at the moment, and we get the optimal solution of the
complete problem.

If a Greedy Algorithm can solve a problem, it generally becomes the best method to solve that problem, as
the Greedy algorithm is in general more efficient than other techniques like Dynamic Programming.
However, the Greedy algorithm cannot always be applied. For instance, the Fractional Knapsack problem can
be solved using Greedy, but 0-1 Knapsack cannot be solved using Greedy.

What is the Activity Selection Problem?
Suppose you have n activities with their start and finish times; the objective is to find a solution set
having the maximum number of non-conflicting activities that can be executed in a single period, assuming
that only a single person or machine is available for execution.

A few points to note here:

It might not be possible to complete all the activities, since their timings can clash.

Two activities, say i and j, are said to be non-conflicting if si >= fj or sj >= fi, where si and sj
denote the start times of activities i and j respectively, and fi and fj refer to the finishing times of
activities i and j respectively.

The greedy approach can be used to find the solution, since we want to maximize the count of activities
that can be executed. This approach will greedily choose an activity with the earliest finish time at
every step, thus yielding an optimal solution.
Algorithm
The algorithm of Activity Selection is as follows:

Activity-Selection(Activity, start, finish)
    Sort Activity by finish times stored in finish
    Selected = {Activity[1]}
    n = Activity.length
    j = 1
    for i = 2 to n:
        if start[i] ≥ finish[j]:
            Selected = Selected U {Activity[i]}
            j = i
    return Selected
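A short C++ sketch of this greedy selection is given below; it sorts activity indices by finish time and then scans once. The start/finish arrays are the ones used in the example that follows; the index-sorting approach is an implementation choice for the sketch.

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main() {
    vector<int> start  = {1, 3, 2, 0, 5, 8, 11};
    vector<int> finish = {3, 4, 5, 7, 9, 10, 12};
    int n = start.size();

    vector<int> idx(n);
    for (int i = 0; i < n; i++) idx[i] = i;
    sort(idx.begin(), idx.end(),                 // sort activity indices by finish time
         [&](int a, int b) { return finish[a] < finish[b]; });

    int last = idx[0];                           // always keep the earliest-finishing activity
    cout << "selected " << last << "\n";
    for (int k = 1; k < n; k++) {
        int i = idx[k];
        if (start[i] >= finish[last]) {          // non-conflicting with the last selection
            cout << "selected " << i << "\n";
            last = i;
        }
    }
    return 0;                                    // selects activities 0, 1, 4, 6
}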

Complexity

Time Complexity:
When activities are already sorted by their finish time: O(N)
When activities are not sorted by their finish time, the time complexity is O(N log N) due to the cost of sorting

Example

In this example, we take the start and finish time of activities as follows:
start = [1, 3, 2, 0, 5, 8, 11]
finish = [3, 4, 5, 7, 9, 10, 12]
Sorted by their finish time, activity 0 gets selected. As activity 1 has starting time that is equal to the finish time of
activity 0, it gets selected. Activities 2 and 3 have a smaller starting time than the finish time of activity 1, so they get
rejected. Based on similar comparisons, activities 4 and 6 also get selected, whereas activity 5 gets rejected. In this
ity

example, in all the activities 0, 1, 4, and 6 get selected, while others get rejected.

Steps for Activity Selection Problem


Following are the steps we will be following to solve the activity selection problem,
Step 1: Sort the given activities in ascending order according to their finishing time.
Am

Step 2: Select the first activity from sorted array act[] and add it to sol[] array.
Step 3: Repeat steps 4 and 5 for the remaining activities in act[].
Step 4: If the start time of the currently selected activity is greater than or equal to the finish time of the
previously selected activity, then add it to the sol[] array.
Step 5: Select the next activity in the act[] array.
Step 6: Print the sol[] array.
)
(c
Activity Selection Problem Example
Let's try to trace the steps of the above algorithm using an example:
In the table below, we have 6 activities with the corresponding start and end time, the objective is to

ty
compute an execution schedule having a maximum number of non-conflicting activities:

Start Time (s) Finish Time (f) Activity Name

si
5 9 a1

er
1 2 a2

iv
3 4 a3
Un
0 6 a4

5 7 a5
ity

8 9 a6

A possible solution would be:


Step 1: Sort the given activities in ascending order according to their finishing time.
Am

The table after we have sorted it:

Start Time (s) Finish Time (f) Activity Name

1 2 a2
)
(c

3 4 a3
0 6 a4

ty
5 7 a5

si
5 9 a1

er
8 9 a6

Step 2: Select the first activity from sorted array act[] and add it to the sol[] array, thus sol = {a2}.
Step 3: Repeat steps 4 and 5 for the remaining activities in act[].

iv
Step 4: If the start time of the currently selected activity is greater than or equal to the finish time of
the previously selected activity, then add it to sol[].
Step 5: Select the next activity in act[]
Un
For the data given in the above table,

A. Select activity a3. Since the start time of a3 is greater than the finish time of a2 (i.e. s(a3) >
f(a2)), we add a3 to the solution set. Thus sol = {a2, a3}.

B. Select a4. Since s(a4) < f(a3), it is not added to the solution set.
ity

C. Select a5. Since s(a5) > f(a3), a5 gets added to solution set. Thus sol = {a2, a3, a5}
D. Select a1. Since s(a1) < f(a5), a1 is not added to the solution set.
E. Select a6. a6 is added to the solution set since s(a6) > f(a5). Thus sol = {a2, a3, a5, a6}.
Am

Step 6: At last, print the array sol[]


Hence, the execution schedule of the maximum number of non-conflicting activities will be:

(1,2)

(3,4)

(5,7)
)

(8,9)
(c

Exercise
 What is the Activity Selection Problem?
 Write the Steps for Activity Selection Problem?
Unit 3.1.6: Huffman codes and fractional knapsack problem
Unit Outcomes :

ty
At the end of the unit, you will learn

 Huffman codes
 Fractional knapsack problem

si
Greedy algorithm – and Huffman code

Two primary properties:

1. Greedy-choice property: At every choice point, make the choice that is best at the moment. We usually
show that if we make a greedy choice, only one subproblem remains (in contrast to dynamic programming,
where we need to solve many subproblems to make a choice).

2. Optimal substructure: This was also a hallmark of dynamic programming. In the greedy algorithm, we can
show that having made the greedy choice, a combination of the optimal solution to the remaining
subproblem and the greedy choice gives an optimal solution to the original problem. (Note: this is
assuming that the greedy choice leads to an optimal solution; not every greedy choice does so.)

Greedy versus dynamic:

- both dynamic programming and greedy algorithms use optimal substructure

- however, when we have a dynamic programming solution to a problem, greedy sometimes does and sometimes
does not guarantee the optimal solution (when not optimal, we can often prove it by contradiction: find an
example in which the greedy choice does not lead to an optimal solution)

- there could be subtle differences between problems that fit into the two approaches

Example: two knapsack problems.

a. 0-1 knapsack problem

- n items worth vi dollars each, and weighing wi pounds each.

- a thief robbing a store (or someone packing for a trip…) can carry at most W pounds in the knapsack and
wants to take the most valuable items.

- It's called 0-1 because each item can either be taken whole or not taken at all.

b. Fractional knapsack problem:

- same as above, except that the thief (or trip packer) can now take fractions of items.

Here the fractional knapsack problem (b) has a greedy strategy that is optimal, but the 0-1 problem
(a) doesn't!

We show the figure in the book, and afterwards give a brief explanation.

Explanation:

a. 0-1: Consider taking items in a greedy manner based on the highest value per pound. In the 0-1
knapsack problem, this would lead to a contradiction, since we would then take the 10-pound item first,
with value per pound equal to 6 (non-optimal). Note: we can take each item at most once.

To get the optimal solution for the 0-1 problem, we must compare the solution or subproblem that includes
the 10-pound item with the solution or subproblem that excludes it. There are many overlapping
subproblems we must examine.

b. Fractional: The fractional problem has an optimal greedy solution of filling with the highest value
per pound first until the knapsack is full. This certainly works, since we can take fractions of items
and thus continue until the knapsack is completely full. See panel (c) in the figure.


An example showing that the greedy strategy does not work for the 0-1 knapsack problem. (a) The thief
must select a subset of the three items shown whose weight must not exceed 50 pounds. (b) The optimal
subset includes items 2 and 3. Any solution with item 1 is suboptimal, even though item 1 has the
greatest value per pound. (c) For the fractional knapsack problem, taking the items in order of greatest
value per pound yields an optimal solution.

The problem formulated in this way gives rise to many overlapping subproblems—a hallmark of dynamic
programming, and indeed, as Exercise 16.2-2 asks you to show, we can use dynamic programming to solve the
0-1 problem.
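A minimal C++ sketch of the fractional (greedy) strategy: sort by value per pound and fill until the capacity is reached. The three items ($60/10 lb, $100/20 lb, $120/30 lb) and the 50-pound capacity are assumed numbers in the spirit of the figure described above, where item 1 has value per pound 6 yet the optimal 0-1 choice is items 2 and 3.

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

struct Item { double value, weight; };

int main() {
    vector<Item> items = {{60, 10}, {100, 20}, {120, 30}};   // assumed example items
    double W = 50, total = 0;

    sort(items.begin(), items.end(),             // greatest value per pound first
         [](const Item &a, const Item &b) { return a.value / a.weight > b.value / b.weight; });

    for (auto &it : items) {
        if (W <= 0) break;
        double take = min(it.weight, W);         // take the whole item, or the fraction that fits
        total += it.value * (take / it.weight);
        W -= take;
    }
    cout << "fractional knapsack value = " << total << endl;   // 60 + 100 + (20/30)*120 = 240
    return 0;
}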

Exercises

16.2-1
Prove that the fractional knapsack problem has the greedy-choice property.

16.2-2
Give a dynamic-programming solution to the 0-1 knapsack problem that runs in O(nW) time, where n is the
number of items and W is the maximum weight of items that the thief can put in his knapsack.

16.2-3
Suppose that in a 0-1 knapsack problem, the order of the items when sorted by increasing weight is the
same as their order when sorted by decreasing value. Give an efficient algorithm to find an optimal
solution to this variant of the knapsack problem, and argue that your algorithm is correct.

16.2-4
Professor Gekko has always dreamed of inline skating across North Dakota. He plans to cross the state on
highway U.S. 2, which runs from Grand Forks, on the eastern border with Minnesota, to Williston, near the
western border with Montana.

Additional note: These are well-known problems you hear about in interviews today; they could appear for
various optimization problems…


Huffman code

A useful application for greedy algorithms is compression—storing images or words with the least number
of bits.

1. Example of coding letters (inefficiently):

A -> 00 ("code word")
B -> 01
C -> 10
D -> 11

AABABACA is coded by:

0000010001001000

This is inefficient; some characters may appear more often than others (for example, if "A" is more
frequent than "D" in the English language or in a course book that you are coding and wish to compress).
But in the approach above, all characters are represented with the same number of bits.

2. More efficient: if some characters appear more frequently, we can code them with a shorter length in
bits. Suppose A appears more frequently, and then B.

A -> 0 (frequent)
B -> 10
C -> 110
D -> 111 (less frequent)

AABABACA is coded by:

001001001100

We represented the same sequence with fewer bits = compression. This is a variable-length code.

This is, for example, applicable to the English language ("a" more frequent than "q"). This kind of
approach of coding more frequent characters with fewer bits is very useful in compression, and similar
approaches also apply to the compression of images.

Prefix codes: We consider only codes in which no codeword is a prefix of another one (= a beginning of
the other one).

[so we're really only considering prefix-free codes, although "prefix codes" is the term that is used]

Prefix codes are useful because, as we'll see, it is easier to decode (go from 001001001100 to the
characters).

Trees corresponding to the coding examples

A few notes on the trees:

- Later: how to build the trees and the frequency nodes.

- We can easily use the tree for decoding – keep going down until reaching a leaf as you encounter
101 (b) 0 (a) 1101 (f) (in the variable-length code).

- The variable-length code is a full binary tree (two children for each node until we reach the leaves).
The fixed-length code is not.

Main greedy approach for building the Huffman tree: Begin with a set of leaves, and each time identify
the two least frequent objects to merge.

When we merge the two objects, the result is now an object whose frequency is the sum of the frequencies
of the merged objects.


Example of building a Huffman code tree:

The steps of Huffman's algorithm for the given frequencies are shown in Figure 16.3. Each part shows the
contents of the queue sorted into increasing order by frequency. At each step, the two trees with the
lowest frequencies are merged. Leaves are shown as rectangles containing a character and its frequency.
Internal nodes are shown as circles containing the sum of the frequencies of their children. An edge
connecting an internal node with its children is labeled 0 if it is an edge to a left child and 1 if it
is an edge to a right child. The codeword for a letter is the sequence of labels on the edges connecting
the root to the leaf for that letter. (a) The initial set of n = 6 nodes, one for each letter.
(b)–(e) Intermediate stages. (f) The final tree.


The main idea of the Huffman pseudo-code: Here C is the set of characters. Repeatedly extract the two
lowest frequencies from a priority queue Q (e.g., a heap) and merge them as a new node in the queue. At
the end, return the one node left in the queue, which is the optimal tree.

Run time of Huffman code:

The for loop runs n-1 times: O(n)

Each extract-min requires O(log n)

Total: O(n log n)

(the queue also needs to be initialized with the characters and their frequencies, but this does not
change the overall run time)
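A compact C++ sketch of this greedy merging with a priority queue is shown below; the six character frequencies are an assumed example input, and the node/comparator names are choices made for the sketch.

#include <iostream>
#include <queue>
#include <string>
#include <vector>
using namespace std;

struct Node {
    int freq; char ch; Node *left, *right;
    Node(int f, char c, Node* l = nullptr, Node* r = nullptr)
        : freq(f), ch(c), left(l), right(r) {}
};
struct Cmp { bool operator()(Node* a, Node* b) const { return a->freq > b->freq; } };

void printCodes(Node* t, const string& code) {
    if (!t->left && !t->right) { cout << t->ch << ": " << code << "\n"; return; }
    printCodes(t->left, code + "0");     // edge to a left child is labeled 0
    printCodes(t->right, code + "1");    // edge to a right child is labeled 1
}

int main() {
    int freq[] = {45, 13, 12, 16, 9, 5};            // frequencies of a..f (assumed example)
    priority_queue<Node*, vector<Node*>, Cmp> Q;
    for (int i = 0; i < 6; i++) Q.push(new Node(freq[i], 'a' + i));

    while (Q.size() > 1) {                          // n-1 merges in total
        Node* x = Q.top(); Q.pop();                 // two extract-mins: O(log n) each
        Node* y = Q.top(); Q.pop();
        Q.push(new Node(x->freq + y->freq, '*', x, y));
    }
    printCodes(Q.top(), "");                        // overall O(n log n)
    return 0;
}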

Exercise

1. What are Huffman codes?


2. Write about the Fractional knapsack problem?
)
(c
Module – IV: String Matching Algorithms

ty
Key Learning Outcomes
At the end of this module, you will learn:

 Concept of String Matching Algorithms

si
 Concept of Exact String-Matching Algorithms
 Concept of Brute Force algorithm
 Concept of Naive Bayes
 What is Rabin-Karp Algorithm

r
 Concept of Knuth-Morris-Pratt Algorithm

ve
Concept of Simon's Algorithm
 Come to Know about Reverse Colussi algorithm.
 Approximation Algorithms: Vertex Cover
 What is the Purpose of Set Cover Problem and its Uses?

ni
U
Unit Outcomes :
At the end of the unit, you will learn
 Concept of String Matching Algorithms

ity

Concept of Exact String-Matching Algorithms


 Concept of Brute Force algorithm
 Concept of Naive Bayes
 What is Rabin-Karp Algorithm
m

4.1 String Matching Algorithms

String Matching Algorithms: String matching algorithms have significantly influenced computer science and
play an essential role in various real-world problems. They help in performing time-efficient tasks in
many domains. These algorithms are useful when searching for a string within another string. String
matching is also used in database schemas and network systems.

Let us take a look at a few string matching algorithms before proceeding to their applications in the
real world.
(c

4.1.1 Exact String-Matching Algorithms - String matching is a very important subject in the wider domain
of text processing. String-matching algorithms are basic components used in implementations of practical
software existing under most operating systems. Moreover, they emphasize programming methods that serve
as paradigms in other fields of computer science (system or software design). Finally, they also play an
important role in theoretical computer science by providing challenging problems.

Although data are memorized in various ways, text remains the main form used to exchange information.
This is particularly evident in literature or linguistics, where data are composed of huge corpora and
dictionaries. This applies as well to computer science, where a large amount of data is stored in linear
files. And this is also the case, for instance, in molecular biology, because biological molecules can
often be approximated as sequences of nucleotides or amino acids. Furthermore, the quantity of available
data in these fields tends to double regularly. This is the reason why algorithms should be efficient
even if the speed and storage capacity of computers increase regularly.

String matching consists in finding one, or more generally all, the occurrences of a string (more
generally called a pattern) in a text. All the algorithms in this book output all occurrences of the
pattern in the text. The pattern is denoted by x = x[0 .. m-1]; its length is equal to m. The text is
denoted by y = y[0 .. n-1]; its length is equal to n. Both strings are built over a finite set of
characters called an alphabet, denoted by Sigma, whose size is equal to sigma.

ve
Applications require two sorts of arrangements relying upon which string, the example or the content, is
given first. Calculations dependent on the utilization of automata or combinatorial properties of strings
are regularly actualized to preprocess the example and take care of the principal sort of issue. The idea
of lists acknowledged by trees or automata is utilized in the second sort of arrangements.
They check the content with the assistance of a window which size is by and large equivalent to m. They

ni
initially adjust the left finishes off the window and the content, at that point contrast the characters of
the window and the characters of the example - this particular work is called an endeavor - and after an
entire match of the example or after a confound they move the window to one side. They rehash a
similar technique until the correct finish of the window goes past the correct finish of the content. This
U
instrument is normally called the sliding window component. We partner each endeavor with the
position j in the content when the window is situated on y[j .. j+m-1].

The Brute Force algorithm finds all events of x in y in time O(mn). The numerous enhancements of the
beast power strategy can be grouped relying upon the request they played out the correlations between
design characters and text characters et each endeavor. Four classes emerge: the most common
ity

approach to play out the correlations is from left to right, which is the understanding course; playing out
the examinations from option to left by and large prompts the best calculations by and by; the best
hypothetical limits are arrived at when correlations are done in a particular request; at long last there
exist a few calculations for which the request wherein the examinations are done isn't significant (such
as the savage power calculation).
m

4.1.2 From left to right

Hashing gives a basic technique that stays away from the quadratic number of character examinations in
)A

most down-to-earth circumstances, and that runs in direct time under sensible probabilistic suspicions. It
has been presented by Harrison and later completely investigated by Karp and Rabin

Accepting that the example length is no longer than the memory-word size of the machine, the Shift Or
calculation is a proficient calculation to tackle the specific string-coordinating issue and it adjusts
effectively to a wide scope of estimated string-coordinating issues.
(c

The first linear-time string-matching algorithm is due to Morris and Pratt; it was later improved by Knuth, Morris, and Pratt. The search behaves like a recognition process by automaton, and a character of the text is compared to a character of the pattern no more than log_Phi(m+1) times (Phi is the golden ratio). Hancart proved that the delay of a related algorithm discovered by Simon makes at most 1+log2(m) comparisons per text character. These three algorithms perform at most 2n-1 text character comparisons in the worst case.

The search with a Deterministic Finite Automaton performs exactly n text character comparisons, but it requires extra space in O(m·sigma). The Forward Dawg Matching algorithm performs exactly the same number of text character comparisons using the suffix automaton of the pattern.

The Apostolico-Crochemore algorithm is a simple algorithm that performs 3/2·n text character comparisons in the worst case.

The Not So Naive algorithm is a very simple algorithm with a quadratic worst-case time complexity, but it requires a preprocessing phase in constant time and space and is slightly sub-linear in the average case.

4.1.3 From right to left
The Boyer-Moore algorithm is considered the most efficient string-matching algorithm in usual applications. A simplified version of it (or the entire algorithm) is often implemented in text editors for the "search" and "substitute" commands. Cole proved that the maximum number of character comparisons is tightly bounded by 3n after the preprocessing for non-periodic patterns. The algorithm has quadratic worst-case behavior for periodic patterns.

Several variants of the Boyer-Moore algorithm avoid its quadratic behavior. The most efficient solutions in terms of the number of symbol comparisons have been designed by Apostolico and Giancarlo, Crochemore et al. (Turbo BM), and Colussi (Reverse Colussi).

Empirical results show that the variants of the Boyer-Moore algorithm designed by Sunday (Quick Search) and the algorithms based on the suffix automaton by Crochemore et al. (Reverse Factor and Turbo Reverse Factor) are the most efficient in practice.

The Zhu-Takaoka and Berry-Ravindran algorithms are variants of the Boyer-Moore algorithm which require extra space in O(sigma^2).

4.1.4 In a specific order
The two first linear, optimal-space string-matching algorithms are due to Galil-Seiferas and Crochemore-Perrin (the Two Way algorithm). They partition the pattern into two parts: they first search for the right part of the pattern from left to right, and then, if no mismatch occurs, they search for the left part.

The algorithms of Colussi and Galil-Giancarlo partition the set of pattern positions into two subsets. They first search for the pattern characters whose positions are in the first subset from left to right, and then, if no mismatch occurs, they search for the remaining characters from left to right. The Colussi algorithm is an improvement over the Knuth-Morris-Pratt algorithm and performs at most 3/2·n text character comparisons in the worst case. The Galil-Giancarlo algorithm improves on the Colussi algorithm in one special case, which enables it to perform at most 4/3·n text character comparisons in the worst case.

Sunday's Optimal Mismatch and Maximal Shift algorithms sort the pattern positions according to their character frequency and their leading shift, respectively.

The Skip Search, KMP Skip Search and Alpha Skip Search algorithms by Charras et al. use buckets to determine starting positions of the pattern in the text.

4.1.4 In any order
The Horspool algorithm is a variant of the Boyer-Moore algorithm; it uses only one of its shift functions, and the order in which the text character comparisons are performed is irrelevant. This is also true for all the other variants such as the Quick Search of Sunday, Tuned Boyer-Moore of Hume and Sunday, the Smith algorithm, and the Raita algorithm. (A short C sketch of the Horspool variant is given below.)
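To make the shift-function idea concrete, here is a minimal, hedged C sketch of a Horspool-style search. It assumes an alphabet of unsigned char values, memcmp from <string.h>, and the OUTPUT macro used by the other C listings in this chapter; it is an illustration, not a reference implementation.

#define ASIZE 256   /* assumed alphabet size (unsigned char values) */

void HORSPOOL(char *x, int m, char *y, int n) {
   int i, j, bmBc[ASIZE];

   /* Preprocessing: shift for each character, default is the pattern length */
   for (i = 0; i < ASIZE; ++i)
      bmBc[i] = m;
   for (i = 0; i < m - 1; ++i)
      bmBc[(unsigned char)x[i]] = m - i - 1;

   /* Searching */
   j = 0;
   while (j <= n - m) {
      unsigned char c = y[j + m - 1];
      if (x[m - 1] == c && memcmp(x, y + j, m - 1) == 0)
         OUTPUT(j);
      j += bmBc[c];   /* shift using only the last character of the window */
   }
}

Because the window is shifted using only the character aligned with the last pattern position, the comparisons inside the window can indeed be performed in any order, as stated above.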

Definitions

We will consider practical searches. We will assume that the alphabet is the set of ASCII codes or any subset of it. The algorithms are presented in the C programming language; thus for a word w of length ell the characters are w[0], ..., w[ell-1], and w[ell] contains the special end character (null character) that cannot occur anywhere inside any word except at its end. The two words, the pattern and the text, reside in main memory.

Let us introduce some definitions.

A word u is a prefix of a word w if there exists a word v (possibly empty) such that w = uv.
A word v is a suffix of a word w if there exists a word u (possibly empty) such that w = uv.
A word z is a substring, subword, or factor of a word w if there exist two words u and v (possibly empty) such that w = uzv.

An integer p is a period of a word w if, for every i with 0 ≤ i < ell-p, w[i] = w[i+p]. The smallest period of w is called the period; it is denoted by per(w).
A word w of length ell is periodic if the length of its smallest period is smaller than or equal to ell/2; otherwise it is non-periodic.

A word w is primitive if it cannot be written as a power of another word: there exist no word z and no integer k such that w = z^k.
A word z is a border of a word w if there exist two words u and v such that w = uz = zv; z is both a prefix and a suffix of w. Note that in this case |u| = |v| is a period of w.

The reverse of a word w of length ell, denoted by w^R, is the mirror image of w: w^R = w[ell-1]w[ell-2] ... w[1]w[0].
A Deterministic Finite Automaton (DFA) A is a quadruple (Q, q0, T, E) where:

 Q is a finite set of states;
 q0 in Q is the initial state;
 T, a subset of Q, is the set of terminal states;
 E, a subset of (Q × Sigma × Q), is the set of transitions.
For each exact string-matching algorithm presented in this book we first give its main features, then we explain how it works before giving its C code. After that, we show its behaviour on a typical example where x=GCAGAGAG and y=GCATCGCAGAGAGTATACAGTACG. Finally, we give a list of references where the reader will find more detailed presentations and proofs of the algorithm. At each attempt, matches are shown in light grey while mismatches are shown in dark grey. A number indicates the order in which the character comparisons are done, except for the algorithms using automata, where the number represents the state reached after the character evaluation.

Implementation
In this book, we use C strings and assume that a string of length m contains an (m+1)-th symbol (usually assigned the value '\0'). However, some algorithm implementations may crash when accessing a position just past the actual last character of a string. (Thanks to David B. Trout from Software Development Laboratories for pointing out this particular issue.)

In this book, we will use classical tools. One of them is a linked list of integers. It will be defined in C as follows:

struct _cell {
   int element;
   struct _cell *next;
};

typedef struct _cell *List;

Another important structure is the automaton, and specifically the suffix automaton. Basically, automata are directed graphs. We will use the following interface to manipulate automata:
Graph newGraph(int v, int e);
Graph newAutomaton(int v, int e);
Graph newSuffixAutomaton(int v, int e);
int newVertex(Graph g);
int getInitial(Graph g);
boolean isTerminal(Graph g, int v);
void setTerminal(Graph g, int v);
int getTarget(Graph g, int v, unsigned char c);
void setTarget(Graph g, int v, unsigned char c, int t);
int getSuffixLink(Graph g, int v);
void setSuffixLink(Graph g, int v, int s);
int getLength(Graph g, int v);
void setLength(Graph g, int v, int ell);
int getPosition(Graph g, int v);
void setPosition(Graph g, int v, int p);
int getShift(Graph g, int v, unsigned char c);
void setShift(Graph g, int v, unsigned char c, int s);
void copyVertex(Graph g, int target, int source);

A possible implementation of this interface follows.

struct _graph {
   int vertexNumber,
       edgeNumber,
       vertexCounter,
       initial,
       *terminal,
       *target,
       *suffixLink,
       *length,
       *position,
       *shift;
};

typedef struct _graph *Graph;

typedef int boolean;

#define UNDEFINED -1

/* returns a new data structure for
   a graph with v vertices and e edges */
Graph newGraph(int v, int e) {
   Graph g;

   g = (Graph)calloc(1, sizeof(struct _graph));
   if (g == NULL)
      error("newGraph");
   g->vertexNumber  = v;
   g->edgeNumber    = e;
   g->initial       = 0;
   g->vertexCounter = 1;
   return(g);
}

/* returns a new data structure for
   an automaton with v vertices and e edges */
Graph newAutomaton(int v, int e) {
   Graph aut;

   aut = newGraph(v, e);
   aut->target = (int *)calloc(e, sizeof(int));
   if (aut->target == NULL)
      error("newAutomaton");
   aut->terminal = (int *)calloc(v, sizeof(int));
   if (aut->terminal == NULL)
      error("newAutomaton");
   return(aut);
}

/* returns a new data structure for
   a suffix automaton with v vertices and e edges */
Graph newSuffixAutomaton(int v, int e) {
   Graph aut;

   aut = newAutomaton(v, e);
   memset(aut->target, UNDEFINED, e*sizeof(int));
   aut->suffixLink = (int *)calloc(v, sizeof(int));
   if (aut->suffixLink == NULL)
      error("newSuffixAutomaton");
   aut->length = (int *)calloc(v, sizeof(int));
   if (aut->length == NULL)
      error("newSuffixAutomaton");
   aut->position = (int *)calloc(v, sizeof(int));
   if (aut->position == NULL)
      error("newSuffixAutomaton");
   aut->shift = (int *)calloc(e, sizeof(int));
   if (aut->shift == NULL)
      error("newSuffixAutomaton");
   return(aut);
}

/* returns a new data structure for
   a trie with v vertices and e edges */
Graph newTrie(int v, int e) {
   Graph aut;

   aut = newAutomaton(v, e);
   memset(aut->target, UNDEFINED, e*sizeof(int));
   aut->suffixLink = (int *)calloc(v, sizeof(int));
   if (aut->suffixLink == NULL)
      error("newTrie");
   aut->length = (int *)calloc(v, sizeof(int));
   if (aut->length == NULL)
      error("newTrie");
   aut->position = (int *)calloc(v, sizeof(int));
   if (aut->position == NULL)
      error("newTrie");
   aut->shift = (int *)calloc(e, sizeof(int));
   if (aut->shift == NULL)
      error("newTrie");
   return(aut);
}

/* returns a new vertex for graph g */
int newVertex(Graph g) {
   if (g != NULL && g->vertexCounter <= g->vertexNumber)
      return(g->vertexCounter++);
   error("newVertex");
}

/* returns the initial vertex of graph g */
int getInitial(Graph g) {
   if (g != NULL)
      return(g->initial);
   error("getInitial");
}

/* returns true if vertex v is terminal in graph g */
boolean isTerminal(Graph g, int v) {
   if (g != NULL && g->terminal != NULL && v < g->vertexNumber)
      return(g->terminal[v]);
   error("isTerminal");
}

/* set vertex v to be terminal in graph g */
void setTerminal(Graph g, int v) {
   if (g != NULL && g->terminal != NULL && v < g->vertexNumber)
      g->terminal[v] = 1;
   else
      error("setTerminal");
}

/* returns the target of the edge from vertex v
   labelled by character c in graph g */
int getTarget(Graph g, int v, unsigned char c) {
   if (g != NULL && g->target != NULL &&
       v < g->vertexNumber && v*c < g->edgeNumber)
      return(g->target[v*(g->edgeNumber/g->vertexNumber) + c]);
   error("getTarget");
}

/* add the edge from vertex v to vertex t
   labelled by character c in graph g */
void setTarget(Graph g, int v, unsigned char c, int t) {
   if (g != NULL && g->target != NULL &&
       v < g->vertexNumber &&
       v*c <= g->edgeNumber && t < g->vertexNumber)
      g->target[v*(g->edgeNumber/g->vertexNumber) + c] = t;
   else
      error("setTarget");
}

/* returns the suffix link of vertex v in graph g */
int getSuffixLink(Graph g, int v) {
   if (g != NULL && g->suffixLink != NULL && v < g->vertexNumber)
      return(g->suffixLink[v]);
   error("getSuffixLink");
}

/* set the suffix link of vertex v to vertex s in graph g */
void setSuffixLink(Graph g, int v, int s) {
   if (g != NULL && g->suffixLink != NULL &&
       v < g->vertexNumber && s < g->vertexNumber)
      g->suffixLink[v] = s;
   else
      error("setSuffixLink");
}

/* returns the length of vertex v in graph g */
int getLength(Graph g, int v) {
   if (g != NULL && g->length != NULL && v < g->vertexNumber)
      return(g->length[v]);
   error("getLength");
}

/* set the length of vertex v to integer ell in graph g */
void setLength(Graph g, int v, int ell) {
   if (g != NULL && g->length != NULL && v < g->vertexNumber)
      g->length[v] = ell;
   else
      error("setLength");
}

/* returns the position of vertex v in graph g */
int getPosition(Graph g, int v) {
   if (g != NULL && g->position != NULL && v < g->vertexNumber)
      return(g->position[v]);
   error("getPosition");
}

/* set the position of vertex v to integer p in graph g */
void setPosition(Graph g, int v, int p) {
   if (g != NULL && g->position != NULL && v < g->vertexNumber)
      g->position[v] = p;
   else
      error("setPosition");
}

/* returns the shift of the edge from vertex v
   labelled by character c in graph g */
int getShift(Graph g, int v, unsigned char c) {
   if (g != NULL && g->shift != NULL &&
       v < g->vertexNumber && v*c < g->edgeNumber)
      return(g->shift[v*(g->edgeNumber/g->vertexNumber) + c]);
   error("getShift");
}

/* set the shift of the edge from vertex v
   labelled by character c to integer s in graph g */
void setShift(Graph g, int v, unsigned char c, int s) {
   if (g != NULL && g->shift != NULL &&
       v < g->vertexNumber && v*c <= g->edgeNumber)
      g->shift[v*(g->edgeNumber/g->vertexNumber) + c] = s;
   else
      error("setShift");
}

/* copies all the characteristics of vertex source
   to vertex target in graph g */
void copyVertex(Graph g, int target, int source) {
   if (g != NULL && target < g->vertexNumber &&
       source < g->vertexNumber) {
      if (g->target != NULL)
         memcpy(g->target + target*(g->edgeNumber/g->vertexNumber),
                g->target + source*(g->edgeNumber/g->vertexNumber),
                (g->edgeNumber/g->vertexNumber)*sizeof(int));
      if (g->shift != NULL)
         memcpy(g->shift + target*(g->edgeNumber/g->vertexNumber),
                g->shift + source*(g->edgeNumber/g->vertexNumber),
                (g->edgeNumber/g->vertexNumber)*sizeof(int));
      if (g->terminal != NULL)
         g->terminal[target] = g->terminal[source];
      if (g->suffixLink != NULL)
         g->suffixLink[target] = g->suffixLink[source];
      if (g->length != NULL)
         g->length[target] = g->length[source];
      if (g->position != NULL)
         g->position[target] = g->position[source];
   }
   else
      error("copyVertex");
}

4.1.5. Brute Force algorithm



Main features
 no preprocessing phase;
 constant extra space required;
 always shifts the window by exactly 1 position to the right;
 comparisons can be done in any order;
 searching phase in O(mn) time complexity;
 2n expected text character comparisons.

Description
The brute force algorithm consists of checking, at all positions in the text between 0 and n-m, whether an occurrence of the pattern starts there. Then, after each attempt, it shifts the pattern by exactly one position to the right.
The brute force algorithm requires no preprocessing phase and constant extra space in addition to the pattern and the text. During the searching phase the text character comparisons can be done in any order. The time complexity of this searching phase is O(mn) (when searching for a^(m-1)b in a^n, for instance). The expected number of text character comparisons is 2n.

void BF(char *x, int m, char *y, int n) {
   int i, j;

   /* Searching */
   for (j = 0; j <= n - m; ++j) {
      for (i = 0; i < m && x[i] == y[i + j]; ++i);
      if (i >= m)
         OUTPUT(j);
   }
}
This algorithm can be rewritten to give a more efficient algorithm in practice as follows:

#define EOS '\0'

void BF(char *x, int m, char *y, int n) {
   char *yb;

   /* Searching */
   for (yb = y; *y != EOS; ++y)
      if (memcmp(x, y, m) == 0)
         OUTPUT(y - yb);
}

4.1.6 Naive Bayes

Naive Bayes is a probabilistic algorithm that is commonly used for classification problems. Naive Bayes is simple and intuitive, and yet it performs surprisingly well in many cases. For example, the spam filters used by email applications are built on Naive Bayes. In this section, we explain the reasoning behind Naive Bayes and build a spam-filter-style classifier in Python. (For simplicity, we focus on binary classification problems.)
Theory
Basic Idea
To make classifications, we need to use X to predict Y. In other words, given a data point X = (x1, x2, …, xn), what is the probability of Y being y? This can be rewritten as the conditional probability P(Y = y | X = (x1, x2, …, xn)).

This is the fundamental idea of Naive Bayes; the rest of the algorithm is really about how best to calculate the conditional probability above.
(a) Bayes Theorem

So far, Mr. Bayes has not contributed to the algorithm. Now is his chance to hit one out of the ballpark. According to Bayes' Theorem:

P(Y|X) = P(X|Y) · P(Y) / P(X)

This is a fairly straightforward transformation, but it bridges the gap between what we want to compute and what we can compute. We cannot get P(Y|X) directly, but we can get P(X|Y) and P(Y) from the training data. Here's an example:

[Training data table: Outlook, Temperature, Humidity, Windy → Play]

In this case, X = (Outlook, Temperature, Humidity, Windy) and Y = Play. P(X|Y) and P(Y) can be estimated from the table:

Naive Bayes Assumption and Why

In theory it is not hard to find P(X|Y). In practice, however, it becomes much harder as the number of features grows.

[Figure: 7 parameters are needed for a 2-feature binary dataset]

Having this many parameters in the model is impractical. To solve this problem, a naive assumption is made: we pretend all features are independent, so that P(X|Y) factorizes as P(x1|Y) · P(x2|Y) · … · P(xn|Y). Now, with the help of this naive assumption (naive because features are rarely independent), we can make classifications with many fewer parameters.

[Figure: Naive Bayes needs fewer parameters (4 in this case)]

This is significant. We reduced the number of parameters from exponential to linear, which means that Naive Bayes handles high-dimensional data well.

Categorical and Continuous Features

Categorical Data
For categorical features, the estimation of P(Xi|Y) is easy: count how often each feature value occurs within a class and divide by the class size.

However, one problem is that if some feature value never shows up (perhaps due to a lack of data), its estimated probability will be zero, which makes the whole posterior probability zero. One common way to fix this issue is the Laplace estimator: add imaginary samples (usually one of each value) to every class.
Continuous Data

For continuous features, there are essentially two options: discretization and continuous Naive Bayes.

Discretization works by breaking the data into categorical values. The simplest discretization is uniform binning, which creates bins with a fixed range. There are, of course, smarter and more complicated methods, such as recursive minimal-entropy partitioning or SOM-based partitioning.

[Figure: discretizing a continuous feature for Naive Bayes]

The second option is using known distributions. If a feature is continuous, the Naive Bayes algorithm can be written in terms of f, the probability density function of the feature given the class.

For example, if we plot the data and see a bell-curve-like distribution, it is reasonable to assume that the feature is normally distributed. The first step is then calculating the mean and variance of the feature for a given label y (with the variance adjusted by the degrees of freedom). Now we can evaluate the probability density f(x).

There are, of course, other distributions. Although these methods vary in form, the core idea behind them is the same: assume the feature satisfies a certain distribution, estimate the parameters of the distribution, and then obtain the probability density function.

Strength and Weakness

Strength
1. Although the naive assumption is rarely true, the algorithm performs surprisingly well in many cases.
2. It handles high-dimensional data well, is simple to parallelize, and handles large data well.
3. It performs better than more complicated models when the data set is small.

Weakness
1. The estimated probability is often inaccurate because of the naive assumption, so it is not ideal for regression use or probability estimation.
2. When data is abundant, other more complicated models tend to outperform Naive Bayes.

Summary
Naive Bayes uses basic probability theory and makes the naive assumption that all features are independent. Despite its simplicity (some may say oversimplification), Naive Bayes gives a good performance in many applications.

Now that you understand how Naive Bayes works, it is time to try it in real projects!
4.1.7 Rabin-Karp Algorithm
The Rabin-Karp algorithm is a string-searching algorithm that uses hashing to find patterns in strings. A string is an abstract data type that consists of a sequence of characters. Letters, words, sentences, and more can be represented as strings.

String matching is a very important application of computer science. If you've ever searched through a document for a particular word, you have benefited from string-matching technology. String matching can also be used to detect plagiarism by comparing strings in document A with strings in document B.

Here are examples of strings in Python:

string = 'abcdefghijklmnopqrstuvwxyz'
sport = 'basketball'
food = 'pizza'

The Rabin-Karp algorithm uses hash functions and the rolling hash technique. A hash function is a function that maps one thing to a value. Specifically, hashing can map data of arbitrary size to a value of fixed size.
Hashing - Hashing is a way to associate values, using a hash function to map an input to an output. The purpose of a hash is to take a large piece of data and be able to represent it in a smaller form. Hashing is incredibly useful but has a few drawbacks, most notably collisions. The pigeonhole principle tells us that we cannot place x balls into x−1 buckets without having at least one bucket with two balls in it. With hashing, we can think of the inputs to the function as the balls and the outputs of the function as the buckets: some inputs must inevitably share an output value, and this is known as a collision.

A rolling hash allows an algorithm to calculate a hash value without rehashing the whole string. For instance, while searching for a word in a text, as the algorithm shifts one letter to the right (as in the shifting for brute force), rather than recomputing the hash of text[1:3] from scratch as it shifts from text[0:2], the algorithm can use a rolling hash to perform an operation on the old hash to get the new hash.
m

Example

Suppose you have a hash function that maps each letter in the alphabet to a unique prime number, prime_a, ..., prime_z. You are searching a text T of length n for a word W of length m. Suppose the text is the string "abcdefghijklmnopqrstuvwxyz" and the word is "fgh". How might we use a rolling hash to search for "fgh"?
At each stage, we compare three letters of the text to the three letters of the word. At the first step, we can multiply prime_a × prime_b × prime_c to get the hash of the string "abc".

Similarly, the hash value of "fgh" is prime_f × prime_g × prime_h. Comparing the hash value of "abc" with that of "fgh", we see that the numbers do not match.

To move on to the next iteration, we need to calculate the hash value of "bcd". How can we do this? We could compute prime_b × prime_c × prime_d, but then we would be repeating work we have already done! Instead, we can divide the old hash by prime_a and multiply it by prime_d.
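As a small worked illustration (the specific prime assignments here are an assumption, not part of the original text), take prime_a = 2, prime_b = 3, prime_c = 5, prime_d = 7, prime_f = 13, prime_g = 17 and prime_h = 19. Then hash("abc") = 2·3·5 = 30 while hash("fgh") = 13·17·19 = 4199, so the window "abc" cannot be a match. To roll to "bcd" we divide out prime_a and multiply by prime_d: 30 / 2 · 7 = 105 = 3·5·7, i.e. one division and one multiplication instead of recomputing the whole product.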

Rabin-Karp's rolling hash implementation uses only simple multiplications and additions.

All the operations are done modulo a prime number p to avoid dealing with huge numbers. For readability and simplicity, the modulo p is left out of the formulas below, but the calculations still hold when the modulo p is applied.
The Rabin-Karp Rolling Hash

H = c1·a^(k−1) + c2·a^(k−2) + c3·a^(k−3) + ⋯ + ck·a^0

where a is a constant, c1 … ck are the input characters, and k is the number of characters in the string we are comparing (this is the length of the word).

At each step, there are a constant number of operations (one subtraction, one multiplication, and one addition), and these are done in constant time, O(1). When checking whether a substring of the text is equal to the word, the algorithm just needs to see whether the hash values are equal; this can also be done in constant time. Because the algorithm performs O(1) operations on each character of the text and there are n characters in the text, this step takes O(n) time.
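Pulling the pieces together, here is a minimal, hedged C sketch of Rabin-Karp matching in the style of the other listings in this chapter. The base a = 256 and the small prime modulus q = 101 are assumptions chosen for illustration, memcmp comes from <string.h>, and OUTPUT is assumed to be defined as before; this is a sketch, not a definitive implementation.

void RK(char *x, int m, char *y, int n) {
   const int a = 256;   /* assumed base */
   const int q = 101;   /* assumed small prime modulus, for illustration only */
   int i, j, hx = 0, hy = 0, h = 1;

   if (m > n)
      return;

   /* h = a^(m-1) mod q: the weight of the leading window character */
   for (i = 0; i < m - 1; ++i)
      h = (h * a) % q;

   /* initial hashes of the pattern and of the first text window */
   for (i = 0; i < m; ++i) {
      hx = (a * hx + x[i]) % q;
      hy = (a * hy + y[i]) % q;
   }

   for (j = 0; j <= n - m; ++j) {
      /* equal hashes: verify character by character to rule out a collision */
      if (hx == hy && memcmp(x, y + j, m) == 0)
         OUTPUT(j);
      /* roll the hash: remove y[j], append y[j+m] */
      if (j < n - m) {
         hy = (a * (hy - y[j] * h) + y[j + m]) % q;
         if (hy < 0)
            hy += q;
      }
   }
}

The memcmp verification is what guards against hash collisions; because collisions are rare for a reasonable modulus, the expected running time remains linear in n + m.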
Unit Outcomes:

At the end of the unit, you will learn:
 Concept of the Knuth-Morris-Pratt Algorithm
 Concept of Simon's Algorithm
 Come to know about the Reverse Colussi algorithm
 Approximation Algorithms: Vertex Cover
 What is the purpose of the Set Cover Problem and its uses?
4.2.1 Knuth-Morris-Pratt Algorithm

Overview

We next describe a more efficient algorithm, published by Donald E. Knuth, James H. Morris, and Vaughan R. Pratt in 1977: "Fast Pattern Matching in Strings," SIAM Journal on Computing, 6(2): 323-350. To illustrate the ideas of the algorithm, we consider the following example:
T = xyxxyxyxyyxyxyxyyxyxyxxy

and

P = xyxyyxyxyxx

At a high level, the KMP algorithm is similar to the naive algorithm: it considers shifts in order from 1 to n−m, and determines whether the pattern matches at that shift. The difference is that the KMP algorithm uses information gathered from partial matches of the pattern and text to skip over shifts that are guaranteed not to result in a match.
Suppose that, beginning with the pattern aligned under the text at the leftmost end, we repeatedly "slide" the pattern to the right and attempt to match it with the text.

Let us look at some examples of how sliding can be done. The text and pattern are reproduced in Figure 1, with numbering, to make it easier to follow.

[Figure 1: the text T and pattern P, with positions numbered]
1. Consider the situation when P[1, . . . , 3] has been successfully matched with T[1, . . . , 3]. We then find a mismatch: P[4] ≠ T[4]. Based on our knowledge that P[1, . . . , 3] = T[1, . . . , 3], and ignoring symbols of the pattern and text after position 3, what can we deduce about where a potential match may be? In this case, the algorithm slides the pattern 2 positions to the right so that P[1] is aligned with T[3]. The next comparison is between P[2] and T[4].

2. Since P[2] ≠ T[4], the pattern slides to the right again, so the next comparison is between P[1] and T[4].

3. At a later point, P[1, . . . , 10] is matched with T[6, . . . , 15]. Then a mismatch is found: P[11] ≠ T[16]. Based on the fact that we know T[6, . . . , 15] = P[1, . . . , 10] (and ignoring symbols of the pattern after position 10 and symbols of the text after position 15), we can tell that the only possible shift that may result in a match is 12. Therefore, we slide the pattern right, and next ask whether P[1, . . . , 11] = T[13, . . . , 23]. Thus, the next comparisons done are P[4] == T[16], P[5] == T[17], P[6] == T[18], and so on, as long as matches are found.

Sliding rule

We now make precise exactly how to implement the sliding rule. The following notation is useful. Let S = s1s2 . . . sk be a string. Each string of the form s1 . . . si, 1 ≤ i ≤ k, is called a prefix of S. Additionally, we define the empty string (containing no symbols) to be a prefix of S. A prefix S′ of S is a proper prefix if S′ ≠ S. Likewise, each string of the form si . . . sk, 1 ≤ i ≤ k, is called a suffix of S. Additionally, the empty string (containing no symbols) is a suffix of S. A suffix S′ of S is a proper suffix if S′ ≠ S. Suppose that P[1, . . . , q] is matched with the text T[i−q+1, . . . , i] and a mismatch then occurs: P[q+1] ≠ T[i+1]. Then slide the pattern right so that the longest possible proper prefix of P[1, . . . , q] that is also a suffix of P[1, . . . , q] is now aligned with the text, with the last symbol of this prefix aligned at T[i]. If π(q) is the number such that P[1, . . . , π(q)] is the longest proper prefix that is also a suffix of P[1, . . . , q], then the pattern slides so that

P[1, . . . , π(q)] is aligned with T[i − π(q) + 1, . . . , i].

The KMP algorithm precomputes the values π(q) and stores them in a table π[1, . . . , m]. We will explain later how this is done. Some of the values π(q) for our example are given in Table 1. Can you figure out what the remaining values are?
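As a hedged illustration of the definition (this short pattern is an example chosen here, not the missing Table 1): for the pattern "xyxyy" we get π(1) = 0, π(2) = 0, π(3) = 1 (the prefix "x" is also a suffix of "xyx"), π(4) = 2 (the prefix "xy" is also a suffix of "xyxy"), and π(5) = 0 (no non-empty proper prefix of "xyxyy" is also a suffix).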

In summary, a "step" of the KMP algorithm makes progress in one of two ways. Before the step, suppose that P[1, . . . , q] is already matched with T[i − q + 1, . . . , i].

• If P[q + 1] = T[i + 1], the length of the match is extended, unless q + 1 = m, in which case we have found a complete match of the pattern in the text.

• If P[q + 1] ≠ T[i + 1], the pattern slides to the right.

Either way, progress is made. The algorithm repeats such steps of progress until the end of the text is reached. Pseudocode for the KMP algorithm is given in Algorithm 1.

Running Time

Each time through the loop, either we increment i or we slide the pattern right. Each of these events can happen at most n times; hence the main loop executes at most 2n times. The cost of each iteration of the loop is O(1). Therefore, the running time is O(n), assuming that the values π(q) have already been computed.

Computing the values π(q)

We now describe how to compute the table π[1, . . . , m]. Note that π(1) is always equal to 0. Suppose that we have computed π[1, . . . , i] and we want to compute π(i + 1). Initially we know P[1, . . . , π(i)] is the longest proper prefix of P[1, . . . , i] that is also a suffix of P[1, . . . , i]. Let q = π(i). If P[i + 1] = P[q + 1], it must be that π(i + 1) = q + 1. Otherwise, we set q = π(q) and repeat the test. We continue until q = 0, at which point we simply set π(i + 1) = 0. Some pseudocode is given in Algorithm 2; you should fill in the rest as an exercise. Algorithm 2 runs in linear time for much the same reason as the KMP algorithm does. Consequently, the whole KMP algorithm runs in time O(n + m), which is better than the straightforward quadratic-time algorithm.
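Since the pseudocode figures (Algorithms 1 and 2) are not reproduced here, the following hedged C sketch shows one way the two pieces fit together: computing the prefix-function values (the π(q) table, stored 0-indexed in pi[]) and then scanning the text. It follows the logic described above and uses the same conventions as the earlier C listings (the OUTPUT macro, plus <stdlib.h> for malloc/free), but it is an illustration rather than the authors' reference code.

void KMP(char *x, int m, char *y, int n) {
   int i, q;
   int *pi = (int *)malloc(m * sizeof(int));  /* pi[i] = length of the longest proper
                                                 prefix of x[0..i] that is also a suffix */
   if (pi == NULL)
      return;

   /* Preprocessing (Algorithm 2): compute pi[] in O(m) */
   pi[0] = 0;
   q = 0;
   for (i = 1; i < m; ++i) {
      while (q > 0 && x[q] != x[i])
         q = pi[q - 1];
      if (x[q] == x[i])
         ++q;
      pi[i] = q;
   }

   /* Searching (Algorithm 1): scan the text in O(n) */
   q = 0;                     /* number of pattern characters currently matched */
   for (i = 0; i < n; ++i) {
      while (q > 0 && x[q] != y[i])
         q = pi[q - 1];       /* slide the pattern using the prefix function */
      if (x[q] == y[i])
         ++q;
      if (q == m) {           /* full match ending at text position i */
         OUTPUT(i - m + 1);
         q = pi[q - 1];
      }
   }
   free(pi);
}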
[Table 1 (values of π(q) for the example) and the pseudocode figures for Algorithms 1 and 2 appear here in the original document.]
1. Construct a pattern of length 10, over the alphabet {x, y}, such that the number of iterations of the while
loop of Algorithm 2, when i = 10, is as large as possible.

2. Suppose that the pattern P and the text T are strings over an alphabet of size 2. In this case, if you know
that P[q + 1] ≠ T[i + 1], it is possible to tell by looking only at the pattern (specifically at P[q + 1]), what is
the symbol of the text at position i + 1. Such knowledge could be used (in some cases) to increase the
(c

amount by which the pattern slides, thus speeding up the algorithm. How might you change the algorithm
to take advantage of this? How does this affect the amount of memory needed by the algorithm?

26
ty
r si
ve
ni
U
ity
m
)A

Appendix: a review of Big-Oh notation Table 2 summarizes big-oh notation, which we use to describe the
asymptotic running time of algorithms.
[Table 2: a summary of big-oh notation]

4.2.2 Simon's Algorithm


Simon's algorithm, first presented in Reference [1], was the first quantum algorithm to show an exponential speed-up versus the best classical algorithm in solving a specific problem. This inspired the quantum algorithms based on the quantum Fourier transform, which is used in the most famous quantum algorithm: Shor's factoring algorithm.
a. Simon's Problem
We are given an unknown blackbox function f which is guaranteed to be either one-to-one (1:1) or two-to-one (2:1), where one-to-one and two-to-one functions have the following properties:

 one-to-one: maps exactly one unique output to each input. An example with a function that takes 4 inputs is:
f(1)→1, f(2)→2, f(3)→3, f(4)→4

 two-to-one: maps exactly two inputs to every unique output. An example with a function that takes 4 inputs is:
f(1)→1, f(2)→2, f(3)→1, f(4)→2
This two-to-one mapping is determined by a hidden bitstring b, where:

given x1, x2: if f(x1) = f(x2), it is guaranteed that x1 ⊕ x2 = b.

Given this blackbox f, how quickly can we determine whether f is one-to-one or two-to-one? Then, if f turns out to be two-to-one, how quickly can we determine b? As it turns out, both cases boil down to the same problem of finding b, where a bitstring b = 000... signifies a one-to-one f.

b. Simon's Algorithm

Classical Solution
Classically, if we want to know what b is with 100% certainty for a given f, we have to check up to 2^(n−1) + 1 inputs, where n is the number of bits in the input. This means checking just over 50% of all the possible inputs until we find two cases of the same output. Much like the Deutsch-Jozsa problem, if we get lucky, we could solve the problem with our first two tries. But if we happen to get an f that is one-to-one, or get really unlucky with an f that is two-to-one, then we are stuck with the full 2^(n−1) + 1. There are known algorithms that have a lower bound of Ω(2^(n/2)), but generally the complexity grows exponentially with n.

Quantum Solution

The quantum circuit that implements Simon's algorithm is shown below.
[Figure: the quantum circuit implementing Simon's algorithm]
where the query function Qf acts on two quantum registers as:

|x⟩|a⟩ → |x⟩|a ⊕ f(x)⟩

In the specific case that the second register is in the state |0⟩ = |00…0⟩ we have:

|x⟩|0⟩ → |x⟩|f(x)⟩

The algorithm involves the following steps.

1. Two n-qubit input registers are initialized to the zero state:
|ψ1⟩ = |0⟩^⊗n |0⟩^⊗n

2. Apply a Hadamard transform to the first register:
|ψ2⟩ = (1/√(2^n)) Σ_{x∈{0,1}^n} |x⟩ |0⟩^⊗n

3. Apply the query function Qf:
|ψ3⟩ = (1/√(2^n)) Σ_{x∈{0,1}^n} |x⟩ |f(x)⟩

4. Measure the second register. A certain value of f(x) will be observed. Because of the setting of the problem, the observed value f(x) could correspond to two possible inputs: x and y = x ⊕ b. Therefore the first register becomes:
|ψ4⟩ = (1/√2)(|x⟩ + |y⟩)
where we omitted the second register since it has been measured.

5. Apply Hadamard on the first register:
|ψ5⟩ = (1/√(2^(n+1))) Σ_{z∈{0,1}^n} [(−1)^(x·z) + (−1)^(y·z)] |z⟩

6. Measuring the first register will give an output only if:
(−1)^(x·z) = (−1)^(y·z)
which means:
x·z = y·z
x·z = (x ⊕ b)·z
x·z = x·z ⊕ b·z
b·z = 0 (mod 2)

A string z will be measured whose inner product with b is 0. Thus, repeating the algorithm ≈ n times, we will be able to obtain n different values of z, and the following system of equations can be written:

b·z1 = 0
b·z2 = 0
⋮
b·zn = 0

from which b can be determined, for example by Gaussian elimination.
Thus, in this particular problem, the quantum algorithm performs exponentially fewer steps than the classical one. Admittedly, it may be hard to envision a direct application of this algorithm (although it inspired the most famous algorithm, devised by Shor), but it represents the first proof that there can be an exponential speed-up in solving a specific problem by using a quantum computer rather than a classical one.

2. Example

Let's look at the example of Simon's algorithm for 2 qubits with the secret string b = 11, so that f(x) = f(y) if y = x ⊕ b. The quantum circuit to solve the problem is:
[Figure: the quantum circuit for Simon's algorithm with b = 11]
1. Two 2-qubit input registers are initialized to the zero state:
|ψ1⟩ = |00⟩₁|00⟩₂

2. Apply Hadamard gates to the qubits in the first register:
|ψ2⟩ = ½(|00⟩₁ + |01⟩₁ + |10⟩₁ + |11⟩₁)|00⟩₂

3. For the string b = 11, the query function can be implemented as Qf = CX_{1a,2a} CX_{1a,2b} CX_{1b,2a} CX_{1b,2b} (as seen in the circuit diagram above):
|ψ3⟩ = ½(|00⟩₁|0⊕0⊕0, 0⊕0⊕0⟩₂ + |01⟩₁|0⊕0⊕1, 0⊕0⊕1⟩₂ + |10⟩₁|0⊕1⊕0, 0⊕1⊕0⟩₂ + |11⟩₁|0⊕1⊕1, 0⊕1⊕1⟩₂)
Thus:
|ψ3⟩ = ½(|00⟩₁|00⟩₂ + |01⟩₁|11⟩₂ + |10⟩₁|11⟩₂ + |11⟩₁|00⟩₂)

4. We measure the second register. With 50% probability, we will see either |00⟩₂ or |11⟩₂. For the sake of the example, let us assume that we see |11⟩₂. The state of the system is then:
|ψ4⟩ = (1/√2)(|01⟩₁ + |10⟩₁)
where we omitted the second register since it has been measured.

5. Apply Hadamard on the first register:
|ψ5⟩ = (1/(2√2))[(|0⟩+|1⟩)⊗(|0⟩−|1⟩) + (|0⟩−|1⟩)⊗(|0⟩+|1⟩)]
     = (1/(2√2))[|00⟩−|01⟩+|10⟩−|11⟩ + |00⟩+|01⟩−|10⟩−|11⟩]
     = (1/√2)(|00⟩ − |11⟩)

6. Measuring the first register will give either |00⟩ or |11⟩ with equal probability.

7. If we see |11⟩, then b·11 = 0, which tells us that b ≠ 01 or 10, and the two remaining potential solutions are b = 00 or b = 11. Note that b = 00 will always be a trivial solution to our simultaneous equations. If we repeat steps 1-6 many times, we would only measure |00⟩ or |11⟩, since b·11 = 0 and b·00 = 0 are the only equations consistent with b = 11. We can verify b = 11 by picking a random input xi and checking f(xi) = f(xi ⊕ b). For example: 01 ⊕ b = 10, and f(01) = f(10) = 11.
Qiskit Implementation
We now implement Simon's algorithm for an example with 3 qubits and b = 110.
# importing Qiskit
from qiskit import IBMQ, BasicAer
from qiskit.providers.ibmq import least_busy
from qiskit import QuantumCircuit, execute

# import basic plot tools
from qiskit.visualization import plot_histogram
from qiskit_textbook.tools import simon_oracle

The function simon_oracle (imported above) creates a Simon oracle for the bitstring b. It is given here without explanation.

In Qiskit, measurements are only allowed at the end of the quantum circuit. In the case of Simon's algorithm, we do not care about the output of the second register, and will only measure the first register.
b = '110'
n = len(b)
simon_circuit = QuantumCircuit(n*2, n)

# Apply Hadamard gates before querying the oracle
simon_circuit.h(range(n))

# Apply barrier for visual separation
simon_circuit.barrier()

simon_circuit += simon_oracle(b)

# Apply barrier for visual separation
simon_circuit.barrier()

# Apply Hadamard gates to the input register
simon_circuit.h(range(n))

# Measure qubits
simon_circuit.measure(range(n), range(n))
simon_circuit.draw()
Table 1: the output of simon_circuit.draw()
a. Experiment with Simulators
We can run the above circuit on the simulator.

# use local simulator
backend = BasicAer.get_backend('qasm_simulator')
shots = 1024
results = execute(simon_circuit, backend=backend, shots=shots).result()
counts = results.get_counts()
plot_histogram(counts)
[Figure: measurement histogram from the simulator]
Since we know b already, we can verify these results do satisfy b·z = 0 (mod 2):

# Calculate the dot product of the results
def bdotz(b, z):
    accum = 0
    for i in range(len(b)):
        accum += int(b[i]) * int(z[i])
    return (accum % 2)

for z in counts:
    print('{}.{} = {} (mod 2)'.format(b, z, bdotz(b, z)))


110.000 = 0 (mod 2)
110.001 = 0 (mod 2)
110.111 = 0 (mod 2)
110.110 = 0 (mod 2)

Using these results, we can recover the value of b = 110 by solving this set of simultaneous equations. For example, say we first measured 001; this tells us:

b·001 = 0
(b2·0) + (b1·0) + (b0·1) = 0
b0 = 0

If we next measured 111, we have:

b·111 = 0
(b2·1) + (b1·1) + (b0·1) = 0
(b2·1) + (b1·1) = 0

which tells us either:

b2 = b1 = 0, b = 000

or

b2 = b1 = 1, b = 110

of which b = 110 is the non-trivial solution to our simultaneous equations. We can solve these problems in general using Gaussian elimination, which has a run time of O(n³).

b. Experiment with Real Devices

The circuit in section 3a uses 2n = 6 qubits, while at the time of writing many IBM Quantum devices only have 5 qubits. We will run the same code, but instead using b = 11 as in the example in section 2, requiring only 4 qubits.

b = '11'
n = len(b)
simon_circuit_2 = QuantumCircuit(n*2, n)

# Apply Hadamard gates before querying the oracle
simon_circuit_2.h(range(n))

# Query oracle
simon_circuit_2 += simon_oracle(b)

# Apply Hadamard gates to the input register
simon_circuit_2.h(range(n))

# Measure qubits
simon_circuit_2.measure(range(n), range(n))
simon_circuit_2.draw()

Table 2: the output of simon_circuit_2.draw()
This circuit is slightly different from the circuit shown in Table 1. The outputs are different, but the input collisions are the same, i.e. both have the property that f(x) = f(x ⊕ 11).
# Load our saved IBMQ accounts and get the least busy backend device with less than or equal to 5 qubits
IBMQ.load_account()
provider = IBMQ.get_provider(hub='ibm-q')
backend = least_busy(provider.backends(filters=lambda x: x.configuration().n_qubits >= n and
                     not x.configuration().simulator and x.status().operational == True))
print("least busy backend: ", backend)

# Execute and monitor the job
from qiskit.tools.monitor import job_monitor
shots = 1024
job = execute(simon_circuit_2, backend=backend, shots=shots, optimization_level=3)
job_monitor(job, interval=2)
# Get results and plot counts
device_counts = job.result().get_counts()
plot_histogram(device_counts)

least busy backend: ibmq_burlington
Job Status: job has successfully run

[Figure: measurement histogram from the real device]

# Calculate the dot product of the results
def bdotz(b, z):
    accum = 0
    for i in range(len(b)):
        accum += int(b[i]) * int(z[i])
    return (accum % 2)

print('b = ' + b)
for z in device_counts:
    print('{}.{} = {} (mod 2) ({:.1f}%)'.format(b, z, bdotz(b, z), device_counts[z]*100/shots))
b = 11
11.00 = 0 (mod 2) (49.2%)
11.11 = 0 (mod 2) (32.0%)
11.10 = 1 (mod 2) (9.1%)
11.01 = 1 (mod 2) (9.7%)

As we can see, the most significant results are those for which b·z = 0 (mod 2). The other results are erroneous but have a lower probability of occurring. Assuming we are unlikely to measure the erroneous results, we can then use a classical computer to recover the value of b by solving the linear system of equations. For this n = 2 case, b = 11.
The above example and implementation of Simon's algorithm are for specific values of b. To extend the problem to other secret bit strings, we need to discuss the Simon query function, or oracle, in more detail.

The Simon algorithm deals with finding a hidden bitstring b ∈ {0,1}^n from an oracle f_b that satisfies f_b(x) = f_b(y) if and only if y = x ⊕ b for all x ∈ {0,1}^n. Here, ⊕ is the bitwise XOR operation. Thus, if b = 0…0, i.e. the all-zero bitstring, then f_b is a 1-to-1 (or permutation) function. Otherwise, if b ≠ 0…0, then f_b is a 2-to-1 function.

In the algorithm, the oracle receives |x⟩|0⟩ as input. With regard to a predetermined b, the oracle writes its output to the second register so that it transforms the input to |x⟩|f_b(x)⟩ such that f(x) = f(x ⊕ b) for all x ∈ {0,1}^n.
Such a black-box function can be realized by the following procedures.

 Copy the content of the first register to the second register: |x⟩|0⟩ → |x⟩|x⟩

 (Creating a 1-to-1 or 2-to-1 mapping) If b is not all-zero, then there is a least index j such that b_j = 1. If x_j = 0, XOR the second register with b; otherwise, do not change the second register: |x⟩|x⟩ → |x⟩|x ⊕ b⟩ if x_j = 0 for the least index j

 (Creating a random permutation) Randomly permute and flip the qubits of the second register: |x⟩|y⟩ → |x⟩|f_b(y)⟩
4.2.3 Reverse Colussi algorithm

The Reverse Colussi Algorithm is an algorithm that solves the string matching problem, and it is in the spirit of the original Colussi Algorithm.

The Main Points of the Reverse Colussi Algorithm
It changes the bad character rule from matching one character to matching a pair of characters.

The Reverse Colussi algorithm partitions the pattern positions into special positions and non-special positions. A special position permits a smaller number of jumps. The Reverse Colussi Algorithm processes the special positions first.

Note that the Colussi Algorithm does not consider all of the positions where the prefix function takes the value -1. That this can be done follows from the following fact: a position where the prefix function takes -1 permits the largest number of steps to move. Accordingly, the Colussi Algorithm examines all positions which permit a smaller number of steps of the shift, which is a safe operation.

In the Reverse Colussi Algorithm, we define some positions which are special and some positions that are not special. Special positions permit a smaller number of steps to move than non-special positions. Therefore, in the Reverse Colussi Algorithm, we examine the special positions first.

Ti is the i-th character in T (1 ≤ i ≤ n). Pj is the j-th character in P (1 ≤ j ≤ m).
The bad character rule is similar to Rule 2-1, the Character Matching Rule.

Rule 2-1: Character Matching Rule (A Special Version of Rule 2)

For any character x in T, find the nearest x in P which is to the left of x in T.

4.2.4 Approximation Algorithms: Vertex Cover


There are several optimization problems, such as Minimum Spanning Tree (MST), Min-Cut, and Maximum Matching, which can be solved exactly and efficiently in polynomial time. However, many practical, important optimization problems are NP-Hard, for which we are unlikely to find an algorithm that solves the problem exactly in polynomial time. Examples of standard NP-Hard problems, with short descriptions, are the following:

• Traveling Salesman Problem (TSP) - find a minimum-cost tour through all cities
• Vertex Cover - find a minimum set of vertices that covers all the edges in the graph (we will describe this in more detail)
• Max Clique
• Set Cover - find a smallest collection of sets that covers every element
• Shortest Superstring - given a set of strings, find the shortest string that contains all the given strings
These are NP-Hard problems, i.e., if we could solve any of these problems in polynomial time, then P = NP. An example of a problem that is not known to be either in P or NP-Hard: given 2 graphs on n vertices, are they the same up to a permutation of the vertices? This is called Graph Isomorphism. Currently, there is no known polynomial-time exact algorithm for NP-Hard problems. Nevertheless, it may be possible to find a near-optimal solution in polynomial time.

An algorithm that runs in polynomial time and yields a solution close to the optimal solution is called an approximation algorithm. We will investigate polynomial-time approximation algorithms for several NP-Hard problems.
Definition: Let P be a minimization problem, and let I be an instance of P. Let A be an algorithm that finds a feasible solution for instances of P. Let A(I) be the cost of the solution returned by A for instance I, and OPT(I) be the cost of the optimal (minimum) solution for I. Then A is said to be an α-approximation algorithm for P if

∀I, A(I) / OPT(I) ≤ α, where α ≥ 1.

Notice that since this is a minimization problem, A(I) ≥ OPT(I). Accordingly, a 1-approximation algorithm produces an optimal solution, while an approximation algorithm with a large α may return a solution that is much worse than optimal. So the smaller α is, the better the quality of the approximation the algorithm produces.

For instance size n, the most common approximation classes are:
α = O(n^c) for c < 1, e.g. Clique.
α = O(log n), e.g. Set Cover.
α = O(1), e.g. Vertex Cover.
α = 1 + ε, ∀ε > 0: this is called a Polynomial-Time Approximation Scheme (PTAS), e.g. certain scheduling problems.
α = 1 + ε in time polynomial in (n, 1/ε): this is called a Fully Polynomial-Time Approximation Scheme (FPTAS), e.g. Knapsack, Subset Sum.
Approximation Algorithm for Vertex Cover

Given a graph G = (V, E), find a minimum subset C ⊆ V such that C "covers" all edges in E, i.e., every edge ∈ E is incident to at least one vertex in C.
Figure 1: An instance of the Vertex Cover problem. An optimal vertex cover is {b, c, e, i, g}.

Algorithm 1: Approx-Vertex-Cover(G)
1  C ← ∅
2  while E ≠ ∅
3      pick any {u, v} ∈ E
4      C ← C ∪ {u, v}
5      delete all edges incident to either u or v
6  return C
As it turns out, this is the best approximation algorithm known for vertex cover. It is an open problem to either do better or to prove that this is a lower bound. Observation: the set of edges picked by this algorithm is a matching; no 2 edges touch each other (the edges are disjoint). It is a maximal matching. We can then give the following alternative description of the algorithm: find a maximal matching M and return the set of end-points of all edges ∈ M. (A sketch of this matching-based formulation appears below.)
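The following hedged C sketch illustrates the maximal-matching formulation just described. The edge-list representation, the array sizes, and the small example graph are assumptions made for the illustration, not part of the original notes.

#include <stdio.h>

/* Approx-Vertex-Cover on an edge list: greedily build a maximal matching
   and return the endpoints of the matched edges as the cover. */
int approx_vertex_cover(int nv, int ne, const int eu[], const int ev[],
                        int in_cover[] /* out: nv flags */) {
    int i, size = 0;
    for (i = 0; i < nv; ++i)
        in_cover[i] = 0;
    for (i = 0; i < ne; ++i) {
        int u = eu[i], v = ev[i];
        /* edge still uncovered: it joins the matching, add both endpoints */
        if (!in_cover[u] && !in_cover[v]) {
            in_cover[u] = in_cover[v] = 1;
            size += 2;
        }
    }
    return size;   /* at most 2 * OPT, by Claim 2 below */
}

int main(void) {
    /* small example graph: the path 0-1-2-3 */
    int eu[] = {0, 1, 2}, ev[] = {1, 2, 3}, cov[4];
    int k = approx_vertex_cover(4, 3, eu, ev, cov);
    printf("cover size = %d\n", k);   /* prints 4; the optimum {1,2} has size 2 */
    return 0;
}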
Analysis of Approximation Algorithm for VC

Claim 1: This algorithm gives a vertex cover. Proof: Every edge ∈ M is covered. If an edge e ∉ M were not covered, then M ∪ {e} would be a matching, which contradicts the maximality of M.

Claim 2: This vertex cover has size ≤ 2 × minimum size (optimal solution). Proof:

Figure 2: Another instance of Vertex Cover and its optimal cover shown in blue squares.

The optimum vertex cover must cover every edge in M. So it must include at least one of the endpoints of each edge ∈ M, where no 2 edges in M share an endpoint. Hence, the optimum vertex cover must have size

OPT(I) ≥ |M|

But the algorithm A returns a vertex cover of size 2|M|, so for every instance I we have

A(I) = 2|M| ≤ 2 × OPT(I)

implying that A is a 2-approximation algorithm.

We know that the optimal solution is intractable to compute (otherwise we could presumably devise an algorithm to find it). Therefore, we cannot make a direct comparison between algorithm A's solution and the optimal solution. However, we can prove Claim 2 by making indirect comparisons of A's solution and the optimal solution with the size of the maximal matching, |M|. We often use this technique in approximation proofs for NP-Hard problems, as you will see later on.

But is α = 2 a tight bound for this algorithm? Is it possible that this algorithm can do better than 2-approximation? We can show that 2-approximation is a tight bound by a tight example. Tight Example: Consider a complete bipartite graph of n black nodes on one side and n red nodes on the other side, denoted Kn,n. Notice that the size of any maximal matching of this graph equals n:

|M| = n

[Figure: the complete bipartite graph Kn,n]

so the Approx-Vertex-Cover(G) algorithm returns a cover of size 2n:

A(Kn,n) = 2n

But clearly the optimal solution is n:

OPT(Kn,n) = n

Note that a tight example needs to have arbitrarily large size to demonstrate tightness of the analysis; otherwise we could simply use brute force for small graphs and A for large ones to get an algorithm that evades the tight bound. Here, the example shows that this algorithm gives a 2-approximation no matter what the size n is.

Last time: α-approximation algorithms

Definition: For a minimization (or maximization) problem P, A is an α-approximation algorithm if for every instance I of P, A(I)/OPT(I) ≤ α (or OPT(I)/A(I) ≤ α).

Last time we saw a 2-approximation for Vertex Cover [CLRS 35.1]. Today we will see a 2-approximation for the Traveling Salesman Problem (TSP) [CLRS 35.2].

Example: A salesman wants to visit each of n cities exactly once, minimizing total distance traveled, and returning to the starting point.

Traveling Salesman Problem (TSP). Input: a complete, undirected graph G = (V, E), with edge weights (costs) w : E → R+, and where |V| = n. Output: a tour (a cycle that visits all n vertices exactly once each, returning to the starting vertex) of minimum cost.
Inapproximability Result for General TSP
Theorem: For any constant k, it is NP-hard to approximate TSP to a factor of k.

Proof: Recall that Hamiltonian Cycle (HC) is NP-complete (Sipser). The definition of HC is as follows. Input: an undirected (not necessarily complete) graph G = (V, E). Output: YES if G has a Hamiltonian cycle (or tour, as defined above), NO otherwise. Suppose A is a k-approximation algorithm for TSP. We will use A to solve HC in polynomial time, thus implying P = NP.

Figure 4: Example of the construction of G′ from G for the HC-to-TSP-approximation reduction.
Given the input G = (V, E) to HC, we modify it to construct the graph G′ = (V′, E′) and weight function w as input to A as follows (Figure 4). Let all edges of G have weight 1. Complete the resulting graph, letting all new edges have weight L for some large constant L. The algorithm for HC is then:

Algorithm 2: HC-Reduction(G)
1  Construct G′ as described above.
2  if A(G′) returns a 'small' cost tour (≤ kn) then
3      return YES
4  if A(G′) returns a 'large' cost tour (≥ L) then
5      return NO

It then remains to choose our constant L > kn, to ensure that the two cases are differentiated.

Approximation Algorithm for Metric TSP

Definition. A metric space is a pair (S, d), where S is a set and d : S × S → R+ is a distance function that satisfies, for all u, v, w ∈ S, the following conditions:

1. d(u, v) = 0 if and only if u = v
2. d(u, v) = d(v, u)
3. d(u, v) + d(v, w) ≥ d(u, w) (triangle inequality)

For a complete graph G = (V, E) with cost c : E → R+, we say "the costs form a metric space" if (V, ĉ) is a metric space, where ĉ(u, v) := c({u, v}). Given this restriction (in particular, the addition of the triangle-inequality condition), we have the following simple approximation algorithm A for TSP: compute a minimum spanning tree of G, take a pre-order walk of the tree, and output the vertices in the order in which they are first visited (shortcutting repeated vertices).
Analysis of Approximation Algorithm for Metric TSP

On an instance I of TSP, let us compare A(I) to OPT(I) via the intermediate value MST(I) (the weight of the MST).

Claim: Comparing A(I) to MST(I): A(I) ≤ 2 × MST(I). Proof: Let σ be a full walk along the MST in pre-order (that is, we revisit vertices as we backtrack through them). In Figure 5, σ would be the path along all the arrows, wrapping around the entire MST, namely 1 → 2 → 1 → 3 → 4 → 5 → 4 → 6 → 4 → · · · → 1. It is clear that cost(σ) = 2 × MST(I). Now, the tour output by A is a subsequence of the full walk σ, so by the triangle inequality: A(I) ≤ cost(σ) = 2 × MST(I), proving our claim.

Claim: Comparing OPT(I) to MST(I): OPT(I) ≥ MST(I). Proof: Let σ* be an optimum tour, that is, cost(σ*) = OPT(I). Deleting an edge from σ* results in a spanning tree T, whose cost by definition satisfies cost(T) ≥ MST(I). Hence, OPT(I) = cost(σ*) ≥ cost(T) ≥ MST(I), as required.

Combining these 2 claims, we get: A(I) ≤ 2 × MST(I) ≤ 2 × OPT(I). Hence, A is a 2-approximation algorithm for (Metric) TSP.
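As a concrete illustration of the MST-plus-pre-order-walk construction analysed above, here is a hedged C sketch. The fixed MAXN bound, the choice of Prim's algorithm for the MST, and the small 4-point example are assumptions made for the illustration; it is a sketch, not the notes' reference algorithm.

#include <stdio.h>

#define MAXN 16        /* small fixed bound, assumed for the illustration */
#define INF  1e18

/* Emit vertices of the MST in pre-order starting from u. */
static void preorder(int u, int n, const int parent[], int tour[], int *len) {
    int v;
    tour[(*len)++] = u;
    for (v = 0; v < n; ++v)
        if (v != u && parent[v] == u)
            preorder(v, n, parent, tour, len);
}

/* 2-approximation for metric TSP: Prim's MST, then a pre-order walk.
   w is a symmetric cost matrix assumed to satisfy the triangle inequality. */
double tsp_2approx(int n, double w[MAXN][MAXN], int tour[]) {
    int parent[MAXN], in_tree[MAXN], i, j, len = 0;
    double key[MAXN], total = 0.0;

    for (i = 0; i < n; ++i) { key[i] = INF; parent[i] = -1; in_tree[i] = 0; }
    key[0] = 0.0;
    for (i = 0; i < n; ++i) {               /* Prim's algorithm rooted at vertex 0 */
        int u = -1;
        for (j = 0; j < n; ++j)
            if (!in_tree[j] && (u < 0 || key[j] < key[u]))
                u = j;
        in_tree[u] = 1;
        for (j = 0; j < n; ++j)
            if (!in_tree[j] && w[u][j] < key[j]) { key[j] = w[u][j]; parent[j] = u; }
    }

    preorder(0, n, parent, tour, &len);     /* tour = pre-order walk, shortcutting repeats */

    for (i = 0; i < n; ++i)
        total += w[tour[i]][tour[(i + 1) % n]];
    return total;                           /* at most 2 * OPT by the two claims above */
}

int main(void) {
    /* four points on a unit square: distances form a metric */
    double w[MAXN][MAXN] = {
        {0, 1, 1.4142, 1},
        {1, 0, 1, 1.4142},
        {1.4142, 1, 0, 1},
        {1, 1.4142, 1, 0}
    };
    int tour[MAXN], i;
    double c = tsp_2approx(4, w, tour);
    for (i = 0; i < 4; ++i) printf("%d ", tour[i]);
    printf("\ncost = %.4f\n", c);           /* here the walk happens to give the optimum 4.0 */
    return 0;
}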

Concluding Remarks ni
U
It is possible (and relatively easy) to improve the approximation factor to 3/2 for Metric TSP. Note
that in the original wording of the problem, with the salesman touring cities, the cost (distance)
function is even more structured than just a metric. Here, we have Euclidean distance, and as it
turns out, this further restriction allows us to get a PTAS, although this is a more difficult
algorithm.

4.2.5 Set Cover Problem


What is the set cover problem?
Idea: "You must select a minimum number [of any size] of these sets so that the sets you have
picked contain all the elements that are contained in any of the sets in the input" (Wikipedia).
Moreover, you want to minimize the total cost of the sets you pick.
Input:
 Ground elements, or universe, U = {u1, ..., un}
 Subsets S1, ..., Sk ⊆ U
 Costs c1, ..., ck
Goal: Find a set I ⊆ {1, 2, ..., k} that minimizes Σ_{i∈I} ci, such that ∪_{i∈I} Si = U.

(Note: in the un-weighted Set Cover Problem, cj = 1 for all j.)

Why is it useful?
Set Cover was one of Karp's NP-complete problems, shown to be so in 1972. Other applications include edge covering and vertex cover.

An interesting example (Wikipedia): IBM used set cover to find computer viruses. The elements were 5000 known viruses; the sets were 9000 substrings of 20 or more consecutive bytes from viruses, not found in 'good' code. A set cover of 180 substrings was found, so it suffices to search for these 180 substrings to verify the existence of the known computer viruses.

Another example: suppose General Motors needs to buy a certain amount of varied supplies, and suppliers offer various deals for different combinations of materials (Supplier A: 2 tons of steel + 500 tiles for $x; Supplier B: 1 ton of steel + 2000 tiles for $y; etc.). You could use set covering to find the best way to get all the materials while minimizing cost.

How can we solve it?

Greedy Method
Let C represent the set of elements covered so far.
Let the cost-effectiveness of a set S, written α, be the average cost per newly covered element: α = c(S) / |S − C|.

Algorithm
1. C ← ∅
2. While C ≠ U do
   Find the set S whose cost-effectiveness is smallest.
   Let α = c(S) / |S − C|.
   For each e ∈ S − C, set price(e) = α.
   C ← C ∪ S.
3. Output the picked sets.
[Figure: a worked set-cover example comparing the greedy choice with an optimal cover.]
(Note: the greedy algorithm is not optimal. In the example shown, an optimal solution would have chosen the sets Y and Z for a cost of 22.)
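The greedy procedure above translates directly into code. Below is a minimal Python sketch (the function name greedy_set_cover and the data layout are my own choices, and it assumes the union of the input sets equals the universe) that repeatedly picks the most cost-effective set:

# Sketch of the greedy set-cover algorithm described above.
def greedy_set_cover(universe, sets, costs):
    # universe: set of elements; sets: list of sets covering the universe; costs: positive costs
    covered = set()
    picked, price = [], {}
    while covered != universe:
        # choose the set minimizing cost per newly covered element (cost-effectiveness alpha)
        i = min((j for j in range(len(sets)) if sets[j] - covered),
                key=lambda j: costs[j] / len(sets[j] - covered))
        alpha = costs[i] / len(sets[i] - covered)
        for e in sets[i] - covered:
            price[e] = alpha
        covered |= sets[i]
        picked.append(i)
    return picked, price

# Example: greedy_set_cover({1, 2, 3}, [{1, 2}, {2, 3}, {3}], [1.0, 1.0, 0.5])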
Theorem: The greedy algorithm is a factor Hn approximation algorithm for the minimum set
cover problem, where

Hn = 1 + 1/2 + ... + 1/n ≈ ln n = O(log n).

Proof:

(i) We know that the cost of the greedy algorithm equals Σ_{e∈U} price(e) = c(S1) + c(S2) + ... + c(Sm), because of the way in which we distribute the costs of the picked sets over the elements they newly cover.

(ii) We will show that price(ek) ≤ OPT / (n − k + 1), where ek is the kth element covered. Say the optimal sets are O1, O2, ..., Op, so OPT = c(O1) + c(O2) + ... + c(Op). Now, assume the greedy algorithm has covered the elements in C so far. Then the uncovered elements U − C are contained in the union of the optimal sets restricted to the uncovered elements, so
|U − C| ≤ |O1 ∩ (U − C)| + |O2 ∩ (U − C)| + ... + |Op ∩ (U − C)|.
In the greedy algorithm, we select a set with cost-effectiveness α, where α ≤ c(Oi) / |Oi ∩ (U − C)| for every i = 1, ..., p with Oi ∩ (U − C) ≠ ∅. We know this because the greedy algorithm always chooses the set with the smallest cost-effectiveness, which is at most the cost-effectiveness of any set the optimal solution uses. Combining the two displayed inequalities gives α ≤ OPT / |U − C| ≤ OPT / (n − k + 1) at the moment the kth element is covered, so price(ek) ≤ OPT / (n − k + 1); summing over k = 1, ..., n bounds the greedy cost by OPT · Hn.

Exercise

Q1. What is the Set Cover algorithm?

Q2. Why is the Karp algorithm useful?

Q3. Write the algorithm for Metric TSP.

Q4. What is the Vertex Cover algorithm?

Q5. What is the Reverse Colussi algorithm?
Module V: NP-Completeness
Key learning outcomes

At the end of this module, you will learn:

 Concept of NP-Completeness
 What is Reduction
 Concept of Polynomial-Time Algorithms
 Concept of Clique
 Concept of NP-Complete Problems
 Proofs of NP-Completeness
 SAT and 3-SAT
 The SUBSET-SUM Problem

5.1.1 NP-Completeness

Can all computational problems be solved by a computer?
There are computational problems that cannot be solved by any algorithm, even given unlimited time. One example is the Turing Halting Problem: given a program and an input, decide whether the program will eventually halt when run with that input, or will run forever. Alan Turing proved that a general algorithm solving the halting problem for all possible program-input pairs cannot exist. A key ingredient of the proof is that the Turing machine was used as the mathematical definition of a computer and a program.

The status of NP-complete problems is another story of frustration: NP-complete problems are problems whose status is unknown. No polynomial-time algorithm has yet been found for any NP-complete problem, nor has anyone been able to prove that no polynomial-time algorithm exists for any of them. The fascinating part is that if any one NP-complete problem can be solved in polynomial time, then all of them can.

What are NP, P, NP-complete, and NP-Hard problems?

P is the set of problems that can be solved by a deterministic Turing machine in polynomial time.

NP is the set of decision problems that can be solved by a non-deterministic Turing machine in polynomial time. P is a subset of NP (any problem that can be solved by a deterministic machine in polynomial time can also be solved by a non-deterministic machine in polynomial time).

Informally, NP is the set of decision problems that can be solved in polynomial time by a "lucky algorithm", a magical algorithm that always makes the right guess among the given set of choices.

A decision problem L is NP-complete if:
1) L is in NP (any given answer for L can be checked quickly, but no efficient solution is known), and
2) every problem in NP is reducible to L in polynomial time (reduction is defined below).

A problem is NP-hard if it satisfies property 2 above; it does not have to satisfy property 1. Accordingly, the set of NP-complete problems is a subset of the set of NP-hard problems.
Decision vs Optimization Problems

NP-completeness applies to the domain of decision problems. It was set up this way because it is easier to compare the difficulty of decision problems than that of optimization problems. In reality, however, being able to solve a decision problem in polynomial time will often allow us to solve the corresponding optimization problem in polynomial time (using a polynomial number of calls to the decision problem). Thus, discussing the difficulty of decision problems is often really equivalent to discussing the difficulty of optimization problems.

For example, minimum vertex cover is an optimization problem. The corresponding decision problem is: given an undirected graph G and an integer k, is there a vertex cover of size k?
What is Reduction?

Let L1 and L2 be two decision problems. Suppose algorithm A2 solves L2; that is, if y is an input for L2, algorithm A2 will answer Yes or No depending on whether y belongs to L2. The idea is to find a transformation from L1 to L2 so that algorithm A2 can be used as part of an algorithm A1 that solves L1.

Learning about reductions in general is valuable. For instance, if we have library functions that solve a certain problem and we can reduce a new problem to one of the solved problems, we save a great deal of time. Consider the problem of finding the minimum-product path in a given directed graph, where the product of a path is the multiplication of the weights of the edges along the path. If we have code for Dijkstra's algorithm to find the shortest path, we can take the logarithm of all edge weights and use Dijkstra's algorithm to find the minimum-product path, instead of writing new code for this new problem.
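The log-transform idea can be sketched in a few lines of Python. This is an illustrative sketch only (the function name and graph representation are my own), and it assumes all edge weights are at least 1 so their logarithms are non-negative, as Dijkstra's algorithm requires.

import heapq
from math import log, exp

# Sketch: reduce "minimum-product path" to shortest path by taking logs of edge weights.
def min_product_path_cost(graph, source, target):
    # graph: dict mapping u -> list of (v, weight) pairs, all weights >= 1
    log_graph = {u: [(v, log(w)) for v, w in nbrs] for u, nbrs in graph.items()}
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:                                   # plain Dijkstra on the log-weights
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        if u == target:
            return exp(d)                         # exp(d) = product of weights on the best path
        for v, w in log_graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return None                                   # target unreachable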
How to prove that a given problem is NP-complete?

From the definition of NP-completeness, it seems difficult to prove that a problem L is NP-complete: by definition, it requires us to show that every problem in NP is polynomial-time reducible to L. Fortunately, there is an alternate way to prove it. The idea is to take a known NP-complete problem and reduce it to L. If a polynomial-time reduction is possible, we can prove that L is NP-complete by transitivity of reductions (if an NP-complete problem is reducible to L in polynomial time, then every problem in NP is reducible to L in polynomial time).
5.1.2 What was the first problem proved to be NP-Complete?

There must be some first NP-complete problem, proved to be so directly from the definition of NP-completeness. SAT (the Boolean satisfiability problem) is that first NP-complete problem, proved by Cook (see the CLRS book for the proof).

It is always useful to know about NP-completeness, even for engineers. Suppose you are asked to write an efficient algorithm to solve an important problem for your company. After a lot of thought, you can only come up with an exponential-time approach, which is impractical. If you do not know about NP-completeness, you can only say that you were unable to come up with an efficient algorithm. If you do know about NP-completeness and can prove that the problem is NP-complete, you can confidently say that a polynomial-time solution is unlikely to exist: if a polynomial-time solution were possible, it would settle a major open problem of computer science that many researchers have been pursuing for decades.
Polynomial Time Algorithms

A computational problem instance has an input and an objective it needs to compute on that input. An algorithm is a procedure to compute the objective function. The algorithm is said to run in polynomial time, or equivalently the algorithm is a polynomial-time algorithm, if there exists a polynomial p() such that the time taken by the algorithm to solve each instance of the problem is upper bounded by p(size(input)), the polynomial evaluated at the size of the input.

To measure the time taken, we typically count the number of steps the procedure takes to solve the problem, assuming each step takes unit time. The meaning of a step depends on the model of computation; however, we will not go into the details of this and content ourselves with the assumption that an arithmetic step or a comparison step takes one unit of time. Thus, for example, whenever an algorithm requires the addition of two numbers, we will assume this can be done in unit time, independent of the values of the numbers.

The size of the input is more subtle. It consists of two parts. The first is the number of data items that must be provided in the input. For instance, in a scheduling problem, the input needs to give the processing times of all of the jobs on the single machine; hence the number of input items is n, the number of jobs. The second is the size of each input value. The size of an input value pj, a processing time, depends on how this data is encoded. For example, the value pj, assuming it is an integer, could be stored as a bit array of pj 1's, in which case the size is the value pj, or it can be stored as a binary numeral, as it usually is, in which case the size is log2 pj. The first method is known as the unary representation of the input, the second as the binary representation. The default input encoding is assumed to be the binary representation. The size of a computational problem instance X is generally denoted |X|.

Thus, returning to the single-machine scheduling problem, the size of the input is at least n + log2 pmax. An algorithm for 1 || ΣCj is polynomial time if the running time is bounded by p(n + log2 pmax), where p() does not depend on the particular instance.

An algorithm that runs in time polynomial in the input size when the input is represented in unary is known as a pseudo-polynomial time algorithm. Pseudo-polynomial time algorithms are not polynomial time. An algorithm that runs in time polynomial in the number of data items, regardless of the size of the actual numbers, is known as a strongly polynomial-time algorithm. Recall that the algorithm SPT, which sequences the jobs in increasing order of processing time, runs in time O(n log n) and is therefore strongly polynomial time. (Note that we assume here that the time taken to compare two numbers is one unit regardless of whether the numbers are huge.)

Polynomial Time Reductions

We have already looked at reductions among scheduling problems. The same idea can be extended to any computational problem.

Definition 1.1. A computational problem X is polynomial-time reducible to a computational problem Y if, given an algorithm for problem Y with time complexity T() (note that T might or might not be a polynomial), we can solve an instance of X in time p(|X|) + q(|X|)·T(|X|) for some polynomials p and q. Note that the input to the time function T is the size of X. We denote this as X ≤P Y.

Typically, a polynomial-time reduction proceeds in the following way: given an instance of X, we construct an instance of Y and argue that, given the solution to this instance of Y, one can recover, in polynomial time, the solution to the instance of X. We give an example.

Example 1.2. Consider the following two problems.
Hamiltonian Circuit Problem (HCP): Given a graph G on n vertices and m edges, is there a Hamiltonian circuit in G?
Traveling Salesman Problem (TSP): Given a graph G on n vertices (cities) with distance dij between any two cities i and j, find a tour of minimum distance.
Note that HCP ≤P TSP. To see this, given an instance of HCP, construct the instance of TSP by defining the distance between vertices i and j as follows: dij = 1 if (i, j) is an edge, and dij = 2 otherwise. It is now not hard to see that there is a Hamiltonian cycle in the graph iff the optimal tour has cost n.

Why are polynomial-time reductions useful? Well, if X ≤P Y and we have a polynomial-time algorithm for Y, then we have a polynomial-time algorithm for X as well. Hence, these reductions help in classifying problems into "easy" and "hard" problems. Furthermore, if there is reason to believe that the problem X does not have any polynomial-time algorithm, then Y cannot have a polynomial-time algorithm either. This is where reductions are most useful.

Unfortunately, there is not a single problem for which we can currently say there is no polynomial-time algorithm. However, as we see in the following section, there is a classification of problems into "easy" and "hard" problems with the property that if any of the hard problems has a polynomial-time algorithm, then they all do. This is probably the most convincing reason why computer scientists believe that there can be no polynomial-time algorithm for any of them. Pretty shaky? That is unfortunately the situation!

P, NP, and all that Jazz

Before going into the classification of problems, we remark on two different kinds of problems we have seen.

Decision vs Optimization Problems
Look at the example above. The problem HCP was a "yes/no" question: it asks whether a graph has a Hamiltonian cycle or not. The answer can be given in a single bit. Such problems are called decision problems. The problem TSP, however, was an optimization problem: it asked for the minimum-distance tour. We will concern ourselves with decision problems in this part (although all problems in this course are optimization questions). Every optimization problem X has a decision version X′. For example, the decision version of TSP takes the input of TSP together with an additional number B, and the question is: "Given an input of TSP, is there a tour of distance at most B?". Note that this is a decision problem. Moreover, TSP′ ≤P TSP, for if we have an algorithm for the optimization problem, we can certainly solve the decision problem. The converse need not hold in general; nonetheless, in most cases, the solution to the decision problem also yields an answer which can be used for the optimization problem. Lastly, note that since X′ ≤P X, X is harder than X′.

The complexity classes

A computational (decision or optimization) problem is said to be in the complexity class P if there exists a polynomial-time algorithm to solve it.

A decision problem X is said to be in NP if for each "yes" instance there exists a polynomial-sized "certificate" which can be "verified" in polynomial time. Let us elaborate. A "yes" instance of the problem X is one for which the answer is "yes". For example, a graph that has a Hamiltonian circuit is a yes-instance for HCP. A "certificate" for the yes-instance is evidence that the instance is indeed a yes-instance. For example, given a graph that has a Hamiltonian circuit, the circuit itself is the certificate. The size of the certificate is required to be polynomial in the size of the input. A "verifier" is simply a polynomial-time algorithm that takes as input the yes-instance and the certificate. If for every yes-instance there exists a certificate, and if the yes-instance can be verified in polynomial time in the size of the input, then the problem is in NP. Since, given a candidate Hamiltonian circuit of the graph, we can verify that it is Hamiltonian by simply checking the presence of the various edges claimed by the circuit, we get that the problem HCP is in NP.

Let us look at another example. Consider the problem COMPOSITE, which, given as input a natural number N, outputs yes if the number is composite and no if the number is prime. (What is the size of the input?) We claim that COMPOSITE ∈ NP. Why? Because for each yes-instance, that is, a composite number N, we can give a certificate, namely the factors of the number, say a and b, and this can be checked to be a proper certificate simply by multiplying the two numbers and checking whether the product equals N or not.
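A minimal sketch of such a verifier in Python (the function name is my own): the certificate is a pair of claimed factors, and verification is a single multiplication.

# Sketch of a polynomial-time verifier for COMPOSITE.
# Certificate: two integers a, b with 1 < a, b < N claimed to satisfy a * b = N.
def verify_composite(N, certificate):
    a, b = certificate
    return 1 < a < N and 1 < b < N and a * b == N

# verify_composite(91, (7, 13)) -> True;  verify_composite(13, (1, 13)) -> False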
Remark 2.1. The P in P stands for "polynomial time". However, NP does not mean "not polynomial"; it stands for non-deterministic polynomial time. This is because any problem in NP can be solved in polynomial time if one could "non-deterministically guess" the polynomial-sized certificate for each yes-instance.
We now give a rigorous definition of the classes. Any decision problem X has two kinds of instances, yes-instances and no-instances. A certificate or a proof, y, of an instance is a string over {0, 1} which may depend on x; y is said to have polynomial size if |y| = poly(|x|). A verifier V takes as input an instance x and a proof y, and outputs yes or no.

Definition 2.2. A decision problem X ∈ P if there exists a polynomial-time algorithm A such that for every yes-instance x of X, A(x) = yes; and for every no-instance x of X, we have A(x) = no.

A decision problem X ∈ NP if there exists a polynomial-time verifier V such that:
for each yes-instance x of X, there exists a poly-sized proof y such that V(x, y) = yes; and
for each no-instance x of X, and for every proof y, we have V(x, y) = no.

What is the connection between the classes P and NP? Well, P ⊆ NP.

Claim: P ⊆ NP.
Proof. Let X be any decision problem in P (optimization problems can be handled similarly). Let A be an algorithm that solves the problem, and let x be a yes-instance of X. As a verifier, one can simply run A on x to check that it is a yes-instance. Hence, the certificate is empty and the verifier is just the algorithm A.

Are all decision problems in NP? All the problems we have looked at so far are in NP. But that does not imply that all problems lie in this class, or at least they are not all known to lie in this class. Let us turn the COMPOSITE problem on its head to get the PRIMES problem: given a natural number N, decide whether N is a prime number or not. Can you now find a good certificate for yes-instances? That is, given a prime number N, can you give some additional evidence that N is prime which can be checked in polynomial time? It turns out that one can; however, the answer is not simple, and one needs to use algebra to produce this certificate.
U
NP-completeness
It is protected to say that most specialists accept that P = NP. Be that as it may, numerous
specialists likewise accept we don't have a notion of a thought how to tackle this issue. What
specialists have done anyway is they can distinguish the "most difficult" issues in NP.
ity

Definition: A problem Y is called NP-hard if any problem X ∈ NP can be reduced to Y, that is,
X ≤P Y. Y is called NP-complete if Y ∈ NP.
In this manner, NP-complete issues are the most difficult issues in NP, on the off chance that one
can be tackled in polynomial time, so can any remaining issues in NP. Deduced there is no
motivation to accept that there will be any NP-complete issue. Notwithstanding, Stephen Cook
m

from UofT showed that there is in any event one. The issue is CIRCUIT SAT, which asks given a
circuit (made of doors and wires) with n boolean ( 0, 1 ) information sources and 1 boolean yield,
is there a task of the information sources which makes the yield 1?
)A

(S. Cook, 1971). CIRCUIT SAT ∈ NP.


Presently comes the excellence of decreases. Given an NP-complete issue Y, if we can
diminish it to another issue Z NP, Z additionally is NP-finished. This is because the decreases
are transitive; X P Y, Y P Z suggests X P Z. Along these lines when one issue ends up being NP-
finished, decreases can be utilized to demonstrate many are. Not long after Cook's paper was
distributed, Richard Karp from Berkeley distributed the possibility of decreases and gave a
(c

rundown of 21 NP-complete issues. Presently there are many NP-complete issues.

We now give a list of some useful NP-complete problems which will be helpful for proving NP-completeness of some scheduling problems.
1. HCP and TSP are NP-complete.
2. 3-Partition: Given 3n numbers a1, ..., a3n and a target B, with B/4 < ai < B/2 and Σ ai = nB, can we partition the numbers into n sets S1, ..., Sn, each of cardinality 3, such that each set has numbers summing to exactly B?
3. Vertex Cover: Given a graph G and a number B, are there at most B vertices such that each edge in G is adjacent to at least one of these vertices?

In the following lecture or two, we will see a list of scheduling problems that are NP-complete.

The basic pattern we will use to prove NP-completeness of a problem X is the following:
1. Show X ∈ NP (this is typically easy).
2. Pick an appropriate NP-complete problem Y.
3. Reduce Y ≤P X.


To reduce a problem X to a problem Y, the following steps are typically used.
Come up with a procedure which, for an arbitrary instance x of X, produces an instance y of Y in polynomial time such that
– YES case: if x is a yes-instance of X, then y is a yes-instance of Y;
– NO case: if x is a no-instance of X, then y is a no-instance of Y.

In this way, if Y can be solved in polynomial time, one can use the above procedure to convert an instance of X into an instance of Y, and check whether the Y-instance is a yes or a no using the algorithm for Y.

Claim 2.9. The knapsack problem is NP-complete.

Proof. The decision version of the knapsack problem asks whether there exists a feasible subset of items with profit at least P, where P is an input parameter. We leave checking that Knapsack is in NP as an exercise. We reduce Partition to the knapsack problem. Given an instance of the partition problem a1, ..., an with total sum B, construct a knapsack instance with n items, where item i has weight ai and profit ai as well. Let the capacity of the knapsack be B/2 and the value of P be B/2 too.

If the instance of partition is a yes-instance, the instance of knapsack is also a yes-instance, because the set of numbers that sums to B/2 gives a profit of B/2 as well. If the instance of partition is a no-instance, any set of numbers that sums to at most B/2 must sum to strictly less. Thus, any feasible solution of the knapsack instance must have total weight, and hence total profit, strictly less than P. Hence, it is a no-instance of the knapsack problem.

This completes the proof. In fact, it proves that the special case of knapsack in which the weight of an item equals its profit is also NP-complete.
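As a small, hedged illustration of the reduction in this proof (the helper name is mine, and the decision version of knapsack is assumed to take weights, profits, a capacity and a target profit):

# Sketch of the Partition-to-Knapsack reduction used in Claim 2.9.
def partition_to_knapsack(numbers):
    B = sum(numbers)
    weights = list(numbers)      # item i has weight a_i ...
    profits = list(numbers)      # ... and profit a_i as well
    capacity = B / 2             # knapsack capacity B/2
    target_profit = B / 2        # required profit P = B/2
    return weights, profits, capacity, target_profit

# The partition instance is a yes-instance iff the produced knapsack instance is.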

5.1.3 NP-Completeness

• We have been discussing complexity theory
– classification of problems according to their difficulty
• We introduced the classes P, NP, and EXP
EXP = {Decision problems solvable in exponential time}
P = {Decision problems solvable in polynomial time}
NP = {Decision problems whose YES answers can be verified in polynomial time}
• A major open question in theoretical computer science is whether P = NP or not.
• We also introduced the notion of polynomial-time reductions
X ≤P Y : problem X can be solved using an algorithm for problem Y (the instance of problem Y to solve can be computed in polynomial time from the instance of problem X).
• We then introduced the class of NP-complete problems NPC
A problem Y is in NPC if
a) Y ∈ NP
b) X ≤P Y for all X ∈ NP
and discussed how the problems in NPC are the hardest problems in NP and the key to
resolving the P = NP question.
– If one problem Y ∈ NPC is in P then P = NP.
– If one problem Y ∈ NPC is not in P then NPC ∩ P = ∅.
– By now a lot of problems have been proved NP-complete.
– We think the world looks like this, but we do not know:
[Figure: the conjectured picture, with P and NPC drawn as disjoint regions inside NP]

– If someone found a polynomial-time solution to a problem in NPC our world would "collapse", and a lot of smart people have tried hard to solve NPC problems efficiently.
– We regard Y ∈ NPC as strong evidence for Y being hard!

NP-Complete Problems

• The following lemma helps us to prove a problem NP-complete using another NP-complete problem.
Lemma: If Y ∈ NP and X ≤P Y for some X ∈ NPC, then Y ∈ NPC.
– To prove Y ∈ NPC we just need to prove Y ∈ NP (often easy) and reduce a problem in NPC to Y (no lower bound proof needed!).

• Finding the first problem in NPC is somewhat difficult and requires quite a lot of formalism.

– The first problem proven to be in NPC was SAT:
Given a boolean formula, is there an assignment of true and false to the variables that makes the formula true?
– For example: can ((x1 ⇒ x2) ∨ ¬((¬x1 ⇔ x3) ∨ x4)) ∧ ¬x2 be satisfied?
• Last time we discussed what seems to be an easier problem, 3SAT: given a formula in 3-CNF, is it satisfiable?
– A formula is in 3-CNF (conjunctive normal form) if it consists of an AND of 'clauses', each of which is the OR of 3 'literals' (a variable or the negation of a variable).
– Example: (x1 ∨ ¬x2 ∨ ¬x3) ∧ (¬x1 ∨ x2 ∨ x3) ∧ (x1 ∨ x2 ∨ x3)
• We proved that 3SAT is in NPC, that is, that it is as hard as general SAT:
– 3SAT ∈ NP
– SAT ≤P 3SAT (we showed how to transform a general formula into 3-CNF in polynomial time.)

CLIQUE

• NP-complete problems arise in many domains
– Many important graph problems are in NPC.

• CLIQUE: Given a graph G = (V, E), decide if there is a subset V′ ⊆ V of size k such that there is an edge between every pair of vertices in V′.
– This is the decision version of the problem of finding a maximum clique.
– Example: a clique of size 4 is shown in the figure.
– For constant k, CLIQUE can be solved in polynomial time by testing each of the (|V| choose k) subsets of size k,
– but this would take exponential time for k = Θ(|V|).
• CLIQUE is indeed hard:
Theorem: CLIQUE ∈ NPC.
Proof:
- CLIQUE ∈ NP: Given a subset V′, we can easily check in polynomial time that |V′| = k and that V′ is a clique.
- 3SAT ≤P CLIQUE (somewhat surprising, since formulas seem to have little to do with graphs):

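The figure illustrating this reduction is not reproduced here. As a hedged sketch of the standard construction (my own wording and code, not taken verbatim from the text): create one vertex per literal occurrence, and connect two vertices iff they come from different clauses and are not negations of each other; the formula is satisfiable iff the resulting graph has a clique of size equal to the number of clauses.

# Sketch of the standard 3SAT-to-CLIQUE construction (assumed, since the original figure is omitted).
# A formula is a list of clauses; each clause is a list of literals, e.g. 1, -2, 3 for x1 OR not-x2 OR x3.
def sat_to_clique(clauses):
    vertices = [(i, lit) for i, clause in enumerate(clauses) for lit in clause]
    edges = set()
    for a, (i, lit_a) in enumerate(vertices):
        for b, (j, lit_b) in enumerate(vertices):
            if b <= a:
                continue
            # connect literals from different clauses that do not contradict each other
            if i != j and lit_a != -lit_b:
                edges.add((a, b))
    return vertices, edges, len(clauses)   # ask for a clique of size = number of clauses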
5.1.4 NP-Complete Problems
In this section, we will show that certain classical algorithmic problems are NP-complete.

This part is heavily inspired by Lewis and Papadimitriou's excellent treatment.

To study the complexity of these problems in terms of resource (time or space) bounded Turing machines (or RAM programs), it is essential to be able to encode instances of a problem P as strings in a language LP.

Then, an instance of the problem P has a solution iff the corresponding string belongs to the language LP.

This implies that our problems must have a yes-no answer, which is not the usual form of optimization problems, where what is required is to find some optimal solution, that is, a solution minimizing or maximizing some objective (cost) function F.

For instance, the standard formulation of the traveling salesman problem asks for a tour (of the cities) of minimal cost.

Fortunately, there is a trick to reformulate an optimization problem as a yes-no problem, which is to explicitly incorporate a budget (or cost) term B into the problem, and instead of asking whether some objective function F attains a minimum or a maximum, we ask whether there is a solution w such that F(w) ≤ B in the case of a minimization problem, or F(w) ≥ B in the case of a maximization problem.

The problems that we will consider are:
1. Exact Cover

2. Hamiltonian Cycle for directed graphs

3. Hamiltonian Cycle for undirected graphs

4. The Traveling Salesman Problem

5. Independent Set

6. Clique

7. Node Cover

8. Knapsack, also called Subset Sum

9. Inequivalence of ∗-free Regular Expressions

10. The 0-1 integer programming problem

We begin by describing each of these problems.
(1) Exact Cover

We are given a finite nonempty set U = {u1, ..., un} (the universe), and a family F = {S1, ..., Sm} of m ≥ 1 nonempty subsets of U.

The question is whether there is an exact cover, that is, a subfamily C ⊆ F of subsets in F such that the sets in C are pairwise disjoint and their union is equal to U.
For instance, let
U = {u1, u2, u3, u4, u5, u6}, and let F be the family
F = {{u1, u3}, {u2, u3, u6}, {u1, u5}, {u2, u3, u4}, {u5, u6}, {u2, u4}}.
The subfamily
C = {{u1, u3}, {u5, u6}, {u2, u4}}
is an exact cover.

It is easy to see that Exact Cover is in NP. To prove that it is NP-complete, we reduce the Satisfiability Problem to it. This means that we provide a method running in polynomial time that converts every instance of the Satisfiability Problem to an instance of Exact Cover, such that the first problem has a solution iff the converted problem has a solution.

(2) Hamiltonian Cycle (for Directed Graphs)

Recall that a directed graph G is a pair G = (V, E), where E ⊆ V × V. Elements of V are called nodes (or vertices). A pair (u, v) ∈ E is called an edge of G.

We will restrict ourselves to simple graphs, that is, graphs without edges of the form (u, u); equivalently, G = (V, E) is a simple graph if, whenever (u, v) ∈ E, we have u ≠ v.

Given any two nodes u, v ∈ V, a path from u to v is any sequence of n + 1 edges (n ≥ 0)
(u, v1), (v1, v2), ..., (vn, v).
(If n = 0, a path from u to v is simply a single edge (u, v).)

We will restrict our attention to finite graphs, that is, graphs (V, E) where V is a finite set.

Definition: Given a directed graph G, a Hamiltonian cycle is a cycle that passes through all the nodes exactly once (note that some edges may not be traversed at all).

Hamiltonian Cycle Problem (for Directed Graphs): Given a directed graph G, is there a Hamiltonian cycle in G?

Is there a Hamiltonian cycle in the directed graph D shown in the figure? Finding a Hamiltonian cycle in this graph does not appear to be so easy! A solution is shown in the figure.

It is easy to see that Hamiltonian Cycle (for Directed Graphs) is in NP. To prove that it is NP-complete, we reduce Exact Cover to it.

This means that we provide a method running in polynomial time that converts each instance of Exact Cover to an instance of Hamiltonian Cycle (for Directed Graphs) such that the first problem has a solution iff the converted problem has a solution. This is perhaps the hardest reduction.
(3) Hamiltonian Cycle (for Undirected Graphs)

Recall that an undirected graph G is a pair G = (V, E), where E is a set of subsets {u, v} of V consisting of exactly two distinct elements. Elements of V are called nodes (or vertices). A pair {u, v} ∈ E is called an edge of G.

Given any two nodes u, v ∈ V, a path from u to v is any sequence of n nodes (n ≥ 2)
u = u1, u2, ..., un = v
such that {ui, ui+1} ∈ E for i = 1, ..., n − 1. (If n = 2, a path from u to v is simply a single edge {u, v}.)

An undirected graph G is connected if for every pair (u, v) ∈ V × V there is a path from u to v. A closed path, or cycle, is a path from some node u to itself.

Definition: Given an undirected graph G, a Hamiltonian cycle is a cycle that passes through all the nodes exactly once (note that some edges may not be traversed at all).

To prove that it is NP-complete, we reduce Hamiltonian Cycle (for Directed Graphs) to it.

This means that we provide a method running in polynomial time that converts each instance of Hamiltonian Cycle (for Directed Graphs) to an instance of Hamiltonian Cycle (for Undirected Graphs) such that the first problem has a solution iff the converted problem has a solution. This is an easy reduction.

(4) Traveling Salesman Problem

We are given a set {c1, c2, ..., cn} of n ≥ 2 cities, and an n × n matrix D = (dij) of nonnegative integers, where dij is the distance (or cost) of traveling from city ci to city cj. We assume that dii = 0 and dij = dji for all i, j, so the matrix D is symmetric and has a zero diagonal.

Traveling Salesman Problem: Given an n × n matrix D = (dij) as above and some integer B ≥ 0 (the budget of the traveling salesman), find a permutation π of {1, 2, ..., n} such that
c(π) = dπ(1)π(2) + dπ(2)π(3) + ··· + dπ(n−1)π(n) + dπ(n)π(1) ≤ B.
The quantity c(π) is the cost of the trip specified by π.

The Traveling Salesman Problem has been stated in terms of a budget so that it has a yes or no answer, which allows us to convert it into a language. A minimal solution corresponds to the smallest feasible value of B.

The Traveling Salesman Problem is in NP. To show that it is NP-complete, we reduce the Hamiltonian Cycle Problem (Undirected Graphs) to it. This means that we provide a method running in polynomial time that converts every instance of the Hamiltonian Cycle Problem (Undirected Graphs) to an instance of the Traveling Salesman Problem such that the first problem has a solution iff the converted problem has a solution.
m

(5) Independent Set

The problem is this: Given an undirected graph G = (V, E) and an integer K ≥ 2, is there a set C of nodes with |C| ≥ K such that for all vi, vj ∈ C, there is no edge {vi, vj} ∈ E?

A maximal independent set with 3 nodes is shown in the figure.

The problem Independent Set is in NP. To show that it is NP-complete, we reduce Exact 3-Satisfiability to it. This means that we provide a method running in polynomial time that converts each instance of Exact 3-Satisfiability to an instance of Independent Set such that the first problem has a solution iff the converted problem has a solution.
U
(6) Clique

The problem is this: Given an undirected graph G = (V, E) and an integer K ≥ 2, is there a set C of nodes with |C| ≥ K such that for all vi, vj ∈ C, there is some edge {vi, vj} ∈ E?

Equivalently, does G contain a complete subgraph with at least K nodes?

A maximal solution corresponds to the largest feasible value of K.

The problem Clique is clearly in NP. To show it is NP-complete, we reduce Independent Set to it. This means that we provide a method running in polynomial time that converts each instance of Independent Set to an instance of Clique such that the first problem has a solution iff the converted problem has a solution.
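One standard way to realize this reduction (a sketch of my own, not spelled out in the text) is to take the complement graph: C is an independent set of size K in G iff C is a clique of size K in the graph with exactly the missing edges.

from itertools import combinations

# Sketch: reduce Independent Set to Clique by taking the complement graph.
def independent_set_to_clique(vertices, edges):
    edge_set = {frozenset(e) for e in edges}
    complement = [(u, v) for u, v in combinations(vertices, 2)
                  if frozenset((u, v)) not in edge_set]
    return vertices, complement   # ask for a clique of size K in this graph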
(7) Node Cover

The problem is this: Given an undirected graph G = (V, E) and an integer B ≥ 2, is there a set C of nodes with |C| ≤ B such that C covers all edges in G, which means that for every edge {vi, vj} ∈ E, either vi ∈ C or vj ∈ C?

A minimal solution corresponds to the smallest feasible value of B.

The problem Node Cover is obviously in NP. To show that it is NP-complete, we reduce Independent Set to it. This means that we provide a method running in polynomial time that converts every instance of Independent Set to an instance of Node Cover such that the first problem has a solution iff the converted problem has a solution.

The Node Cover problem has the following interesting interpretation:

think of the nodes of the graph as rooms of a museum (or art gallery, etc.), and each edge as a straight
corridor that joins two rooms.

Then Node Cover may be useful in assigning as few as possible guards to the rooms so that all corridors
can be seen by a guard.

(8) Knapsack (also called Subset-sum)

The problem is this: Given a finite nonempty set S = {a1, a2, ..., an} of nonnegative integers, and some integer K ≥ 0, all represented in binary, is there a nonempty subset I ⊆ {1, 2, ..., n} such that
Σ_{i∈I} ai = K?

A "concrete" realization of this problem is that of a hiker who is trying to fill her/his backpack to its maximum capacity with items of varying weights or values.

It is easy to see that the Knapsack Problem is in NP.

To show that it is NP-complete, we reduce Exact Cover to it.

This means that we provide a method running in polynomial time that converts every instance of Exact Cover to an instance of the Knapsack Problem such that the first problem has a solution iff the converted problem has a solution.
Remark: The 0-1 Knapsack Problem is defined as the following problem.

Given a set of n items, numbered from 1 to n, each with a weight wi ∈ N and a value vi ∈ N, and given a maximum capacity W ∈ N and a budget B ∈ N, is there a set of n variables x1, ..., xn with xi ∈ {0, 1} such that
Σ_{i=1}^{n} wi xi ≤ W and Σ_{i=1}^{n} vi xi ≥ B?

Informally, the problem is to pick items to include in the knapsack so that the sum of the values exceeds a given minimum B (the goal is to maximize this sum), and the sum of the weights is less than or equal to the capacity W of the knapsack.

A maximal solution corresponds to the largest feasible value of B.

The Knapsack Problem as we defined it (which is how Lewis and Papadimitriou define it) is the special case where vi = wi = ai for i = 1, ..., n and W = B = K. For this reason, it is also called the Subset Sum Problem.

The Knapsack (Subset Sum) Problem reduces to the 0-1 Knapsack Problem, and thus the 0-1 Knapsack Problem is also NP-complete.
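Although Subset Sum is NP-complete, it admits the classical pseudo-polynomial dynamic program, running in O(nK) time, which is polynomial only when K is written in unary; this echoes the earlier discussion of pseudo-polynomial time. A minimal Python sketch (my own naming), assuming K ≥ 1 so the required subset is nonempty:

# Sketch of the O(n*K) pseudo-polynomial dynamic program for Subset Sum.
def subset_sum(numbers, K):
    reachable = [True] + [False] * K      # reachable[s] == some subset sums to s
    for a in numbers:
        for s in range(K, a - 1, -1):     # iterate downwards so each item is used at most once
            if reachable[s - a]:
                reachable[s] = True
    return reachable[K]

# subset_sum([3, 34, 4, 12, 5, 2], 9) -> True (4 + 5)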

(9) Inequivalence of ∗-free Regular Expressions

Recall that the problem of deciding the equivalence R1 ≅ R2 of two regular expressions R1 and R2 is the problem of deciding whether R1 and R2 define the same language, that is, L[R1] = L[R2].

Is this problem in NP?

To show that the equivalence problem for regular expressions is in NP, we would have to be able to somehow check in polynomial time that two expressions define the same language, but this is still an open problem.

What might be easier is to decide whether two regular expressions R1 and R2 are inequivalent. For this, we just have to find a string w such that either w ∈ L[R1] − L[R2] or w ∈ L[R2] − L[R1]. The problem is that even if we can guess such a string w, we still have to check in polynomial time that w ∈ (L[R1] − L[R2]) ∪ (L[R2] − L[R1]), and this requires a bound on the length of w which is polynomial in the sizes of R1 and R2. Again, this is an open problem.

To obtain a problem in NP we have to consider a restricted type of regular expression, and it turns out that ∗-free regular expressions are the right candidate.

A ∗-free regular expression is a regular expression that is built up from the atomic expressions using only + and ·, but not ∗. For example,
R = ((a + b)aa(a + b) + aba(a + b)b)
is such an expression.

It is easy to see that if R is a ∗-free regular expression, then for every string w ∈ L[R] we have |w| ≤ |R|. In particular, L[R] is finite.

The above observation shows that if R1 and R2 are ∗-free and if there is a string w ∈ (L[R1] − L[R2]) ∪ (L[R2] − L[R1]), then |w| ≤ |R1 + R2|, so we can indeed check this in polynomial time. It follows that the inequivalence problem for ∗-free regular expressions is in NP.

To show that it is NP-complete, we reduce the Satisfiability Problem to it. This means that we provide a method running in polynomial time that converts every instance of the Satisfiability Problem to an instance of Inequivalence of ∗-free Regular Expressions such that the first problem has a solution iff the converted problem has a solution.
(10) 0-1 Integer Programming Problem

Let A be any p × q matrix with integer coefficients and let b ∈ Z^p be any vector with integer coefficients.

The 0-1 integer programming problem is to decide whether the system of p linear equations in q variables
a11 x1 + ··· + a1q xq = b1
...
ai1 x1 + ··· + aiq xq = bi
...
ap1 x1 + ··· + apq xq = bp
with aij, bi ∈ Z has any solution x ∈ {0, 1}^q, that is, with xi ∈ {0, 1}.

Immediately, the 0-1 integer programming problem is in NP. To prove that it is NP-complete, we reduce the bounded tiling problem to it.

This means that we provide a method running in polynomial time that converts every instance of the bounded tiling problem to an instance of the 0-1 integer programming problem such that the first problem has a solution iff the converted problem has a solution.

Proofs of NP-Completeness

Exact Cover

To prove that Exact Cover is NP-complete, we reduce the Satisfiability Problem to it:

Satisfiability Problem ≤P Exact Cover

Given a set F = {C1, ..., Cℓ} of ℓ clauses constructed from n propositional variables x1, ..., xn, we must construct in polynomial time an instance τ(F) = (U, F) of Exact Cover such that F is satisfiable iff τ(F) has a solution.

Example. If

F = {C1 = (x1 ∨ ¬x2), C2 = (¬x1 ∨ x2 ∨ x3), C3 = (x2), C4 = (¬x2 ∨ ¬x3)},


then the universe U is given by

U = {x1, x2, x3, C1, C2, C3, C4, p11, p12, p21, p22, p23, p31, p41, p42},

and the family F consists of the subsets

{p11}, {p12}, {p21}, {p22}, {p23}, {p31}, {p41}, {p42},
T1,F = {x1, p11}, T1,T = {x1, p21},
T2,F = {x2, p22, p31}, T2,T = {x2, p12, p41},
T3,F = {x3, p23}, T3,T = {x3, p42},
{C1, p11}, {C1, p12}, {C2, p21}, {C2, p22}, {C2, p23},
{C3, p31}, {C4, p41}, {C4, p42}.

It is easy to check that the set consisting of the following subsets is an exact cover:

T1,T = {x1, p21}, T2,T = {x2, p12, p41}, T3,F = {x3, p23},
{C1, p11}, {C2, p22}, {C3, p31}, {C4, p42}.

The general method to construct (U, F) from F = {C1, ..., Cℓ} proceeds as follows. Say

Cj = (Lj1 ∨ ··· ∨ Ljmj)

is the jth clause in F, where Ljk denotes the kth literal in Cj and mj ≥ 1. The universe of τ(F) is the set

U = {xi | 1 ≤ i ≤ n} ∪ {Cj | 1 ≤ j ≤ ℓ} ∪ {pjk | 1 ≤ j ≤ ℓ, 1 ≤ k ≤ mj},

where in the third set, pjk corresponds to the kth literal in Cj.

The following subsets are included in F:

(a) There is a set {pjk} for every pjk.

(b) For every boolean variable xi, the following two sets are in F:

Ti,T = {xi} ∪ {pjk | Ljk = ¬xi},

which contains xi and all negative occurrences of xi, and

Ti,F = {xi} ∪ {pjk | Ljk = xi},

which contains xi and all its positive occurrences. Note carefully that Ti,T involves negative occurrences of xi whereas Ti,F involves positive occurrences of xi.

(c) For every clause Cj, the mj sets {Cj, pjk} are in F.

It remains to prove that F is satisfiable iff τ(F) has a solution.

We claim that if v is a truth assignment that satisfies F, then we can build an exact cover C as follows:

For each xi, we put the subset Ti,T in C iff v(xi) = T; else we put the subset Ti,F in C iff v(xi) = F. Also, for every clause Cj, we put some subset {Cj, pjk} in C for a literal Ljk which is made true by v. By construction of Ti,T and Ti,F, this pjk is not in any set selected so far. Since by hypothesis F is satisfiable, such a literal exists for every clause. Having covered all xi and Cj, we put a set {pjk} in C for every remaining pjk which has not yet been covered by the sets already in C.

Going back to the example above, the truth assignment v(x1) = T, v(x2) = T, v(x3) = F satisfies

F = {C1 = (x1 ∨ ¬x2), C2 = (¬x1 ∨ x2 ∨ x3), C3 = (x2), C4 = (¬x2 ∨ ¬x3)},

so we put

T1,T = {x1, p21}, T2,T = {x2, p12, p41}, T3,F = {x3, p23},
{C1, p11}, {C2, p22}, {C3, p31}, {C4, p42}

in C.

Conversely, if C is an exact cover of τ(F), we define a truth assignment as follows:

For every xi, if Ti,T is in C, then we set v(xi) = T; else if Ti,F is in C, then we set v(xi) = F.

Example. Given the exact cover

T1,T = {x1, p21}, T2,T = {x2, p12, p41}, T3,F = {x3, p23},
{C1, p11}, {C2, p22}, {C3, p31}, {C4, p42},

we get the satisfying assignment v(x1) = T, v(x2) = T, v(x3) = F.

If we now consider the proposition in CNF given by

F2 = {C1 = (x1 ∨ ¬x2), C2 = (¬x1 ∨ x2 ∨ x3), C3 = (x2), C4 = (¬x2 ∨ ¬x3 ∨ x4)},

where we have added the boolean variable x4 to clause C4, then U also contains x4 and p43, so we need to add the following subsets to F:

T4,F = {x4, p43}, T4,T = {x4}, {C4, p43}, {p43}.

The truth assignment v(x1) = T, v(x2) = T, v(x3) = F, v(x4) = T satisfies F2, so an exact cover C is

T1,T = {x1, p21}, T2,T = {x2, p12, p41}, T3,F = {x3, p23}, T4,T = {x4},
{C1, p11}, {C2, p22}, {C3, p31}, {C4, p42}, {p43}.

Observe that this time, because the truth assignment v makes both literals corresponding to p42 and p43 true, and since we picked p42 to form the subset {C4, p42}, we need to add the singleton {p43} to cover all elements of U.
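The construction of τ(F) can be written out compactly in code. The following Python fragment is a hedged sketch of steps (a)–(c) above (the function name and the encoding of literals as +i for xi and −i for ¬xi are my own choices):

# Sketch of the SAT-to-Exact-Cover construction tau(F) described above.
# clauses: list of clauses, each a list of literals; literal +i means x_i, -i means not x_i.
def sat_to_exact_cover(clauses, n):
    U = ({("x", i) for i in range(1, n + 1)}
         | {("C", j) for j in range(len(clauses))}
         | {("p", j, k) for j, cl in enumerate(clauses) for k in range(len(cl))})
    family = []
    # (a) a singleton {p_jk} for every literal occurrence
    family += [frozenset({("p", j, k)}) for j, cl in enumerate(clauses) for k in range(len(cl))]
    # (b) T_{i,T} with the negative occurrences of x_i, T_{i,F} with the positive ones
    for i in range(1, n + 1):
        neg = {("p", j, k) for j, cl in enumerate(clauses) for k, lit in enumerate(cl) if lit == -i}
        pos = {("p", j, k) for j, cl in enumerate(clauses) for k, lit in enumerate(cl) if lit == i}
        family.append(frozenset({("x", i)} | neg))   # T_{i,T}
        family.append(frozenset({("x", i)} | pos))   # T_{i,F}
    # (c) the pairs {C_j, p_jk}
    for j, cl in enumerate(clauses):
        family += [frozenset({("C", j), ("p", j, k)}) for k in range(len(cl))]
    return U, family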

5.1.5 NP-Hard

What is the class NP?

Class P consists of all polynomial-time solvable decision problems. What is the class NP? There are two common misunderstandings:

(1) NP is the class of problems that cannot be solved in polynomial time.

(2) A decision problem belongs to the class NP if its answer can be checked in polynomial time.

Misunderstanding (1) comes from misreading the abbreviation NP as "Not Polynomial-time solvable". NP problems are polynomial-time solvable, but in a wider sense of computation, namely non-deterministic computation; that is, NP is the class of all decision problems solvable in non-deterministic polynomial time. Thus, NP is the abbreviation of "Nondeterministic Polynomial-time". In non-deterministic computation, there may be some guess steps.


Let us look at an example. Consider the Hamiltonian Cycle problem:

Given a graph G = (V, E), does G contain a Hamiltonian cycle? Here, a Hamiltonian cycle is a cycle passing through each vertex exactly once. The following is a nondeterministic algorithm for the Hamiltonian Cycle problem:

input: a graph G = (V, E).
step 1: guess a permutation of all vertices.
step 2: check if the guessed permutation gives a Hamiltonian cycle; if yes, then accept the input.

Note that in step 2, if the outcome of the check is no, then we cannot draw any conclusion, and hence the nondeterministic computation gets stuck. However, a nondeterministic algorithm is considered to solve a decision problem correctly if there exists a guessed result leading to a correct yes-answer. For example, in the above algorithm, if the input graph contains a Hamiltonian cycle, then there exists a guessed permutation which gives a Hamiltonian cycle and hence gives a yes-answer. Therefore, it is a nondeterministic algorithm that solves the Hamiltonian Cycle problem.
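The deterministic "check" phase of step 2 is easy to code. Here is a minimal Python sketch (the function name and graph representation are my own) that verifies whether a guessed permutation of the vertices forms a Hamiltonian cycle of G:

# Sketch of the polynomial-time check in step 2.
def is_hamiltonian_cycle(vertices, edges, permutation):
    if sorted(permutation) != sorted(vertices):          # must use every vertex exactly once
        return False
    edge_set = {frozenset(e) for e in edges}
    m = len(permutation)
    # every consecutive pair (wrapping around) must be an edge of G
    return all(frozenset((permutation[i], permutation[(i + 1) % m])) in edge_set
               for i in range(m))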


Why is (2) wrong? This is because not only the checking step but also the guessing step is required to be polynomial-time computable. How do we estimate guessing time? Let us explain this, starting from what a legal guess is. A legal guess is a guess from a pool whose size is independent of the input size. For example, in the above algorithm, the guess in step 1 is not legal, because the number of permutations of n vertices is n!, which depends on the input size.

What is the running time of step 1? It is the number of legal guesses spent in the implementation of the guess in step 1. To implement the guess in step 1, we may encode each vertex into a binary code of length ⌈log2 n⌉. Then each permutation of n vertices is encoded into a binary code of length O(n log n). Now, guessing a permutation can be implemented by O(n log n) legal guesses, each of which chooses either 0 or 1. Therefore, the running time of step 1 is O(n log n).

Why is the pool size for a legal guess restricted to be bounded by a constant independent of the input size? To answer this, we first need to explain the model of computation.

The computation model for the study of computational complexity is the Turing machine (TM), as shown in Fig. 10.1, which consists of three parts: a tape, a head, and a finite control. Each TM can be described by the following parameters: an alphabet Σ of input symbols, an alphabet Γ of tape symbols, a finite set Q of states in the finite control, a transition function δ, and an initial state. There exist two types of Turing machines, the deterministic TM and the nondeterministic TM. They differ in the transition function of the finite control. In the deterministic TM, the transition function δ is a mapping from Q × Γ to Q × Γ × {R, L}. A transition δ(q, a) = (p, b, R) (respectively δ(q, a) = (p, b, L)) means that when the machine in state q reads symbol a, it changes state to p, changes the symbol to b, and moves the head to the right (respectively left), as shown in Fig. 10.2. However, in the nondeterministic TM, the transition function δ is a mapping from Q × Γ to 2^(Q×Γ×{R,L}). That is, for any (q, a) ∈ Q × Γ, δ(q, a) is a subset of Q × Γ × {R, L}. Every member of this subset can be used as the rule to guide the transition. When this subset contains at least two members, the transition corresponds to a legal guess. The size of the subset is at most |Q| · |Γ| · 2, which is a constant independent of the input size. In many cases, the guessing step is easily implemented by a polynomial number of legal guesses. However, there are some exceptions; one of them is the following.

Integer Programming: Given an m × n integer matrix A and an m-dimensional integer vector b, determine whether there exists an n-dimensional integer vector x such that Ax ≥ b.

To prove that Integer Programming is in NP, we may guess an n-dimensional integer vector x and check whether x satisfies Ax ≥ b. However, we need to make sure that the guessing can be done in nondeterministic polynomial time. That is, we need to show that if the problem has a solution, then there is a solution of polynomial size; otherwise, our guess cannot find it. This is not an easy job. We organize the proof into the following three lemmas.

Let α denote the maximum absolute value of the elements in A and b. Denote q = max(m, n).

Lemma 10.1.1. If B is a square submatrix of A, then |det B| ≤ (αq)^q.
Proof: Let k be the order of B. Then |det B| ≤ k! α^k ≤ k^k α^k ≤ q^q α^q = (qα)^q. ∎

Lemma 10.1.2. If rank(A) = r < n, then there exists a nonzero vector z such that Az = 0 and every component of z is at most (αq)^q in absolute value.
Proof: Without loss of generality, assume that the upper-left r × r submatrix B is nonsingular. Set x_{r+1} = ··· = x_{n−1} = 0 and x_n = −1. Apply Cramer's rule to the system of equations
B(x_1, ..., x_r)^T = (a_{1n}, ..., a_{rn})^T,
where a_{ij} is the element of A in the ith row and jth column. Then we obtain x_i = det B_i / det B, where B_i is a submatrix of A. By Lemma 10.1.1, |det B_i| ≤ (αq)^q. Now, set z_1 = det B_1, ..., z_r = det B_r, z_{r+1} = ··· = z_{n−1} = 0, and z_n = det B. Then Az = 0.

Lemma 10.1.3. If Ax ≥ b has an integer solution, then it has an integer solution whose components have absolute value not exceeding 2(αq)^{2q+1}.

Proof: Let a_i denote the ith row of A and b_i the ith component of b. Suppose that Ax ≥ b has an integer solution. Among the integer solutions, choose a solution x such that the following set has the maximum number of elements:
A_x = {a_i | b_i ≤ a_i x ≤ b_i + (αq)^{q+1}} ∪ {e_i | |x_i| ≤ (αq)^q},
where e_i = (0, ..., 0, 1, 0, ..., 0) with the 1 in the ith position. We first prove that the rank of A_x is n.

Otherwise, suppose that the rank of A_x is less than n. Then we can find a nonzero integer vector z such that dz = 0 for every d ∈ A_x, and each component of z does not exceed (αq)^q. Note that e_k ∈ A_x implies that the kth component z_k of z is zero, since 0 = e_k z = z_k. If z_k ≠ 0, then e_k ∉ A_x, so |x_k| > (αq)^q. Set y = x + z or y = x − z, whichever makes |y_k| < |x_k|. Then for every e_i ∈ A_x, y_i = x_i, so e_i ∈ A_y; and for every a_i ∈ A_x, a_i y = a_i x, so a_i ∈ A_y. Thus, A_y contains A_x. Moreover, for a_i ∉ A_x, a_i y ≥ a_i x − |a_i z| ≥ b_i + (αq)^{q+1} − nα(αq)^q ≥ b_i. Thus, y is an integer solution of Ax ≥ b. By the maximality of A_x, A_y = A_x. This means that we could decrease the absolute value of the kth component again and again; however, it cannot be decreased forever, so eventually a contradiction appears.

Thus, A_x must have rank n. Now, choose n linearly independent vectors d_1, ..., d_n from A_x. Denote c_i = d_i x. Then |c_i| ≤ α + (αq)^{q+1}. Applying Cramer's rule to the system of equations d_i x = c_i, i = 1, 2, ..., n, we obtain a representation of x through the c_i's: x_i = det D_i / det D, where D is a square submatrix of (A^T, I)^T and D_i is the square matrix obtained from D by replacing the ith column by the vector (c_1, ..., c_n)^T. Note that the determinant of any submatrix of (A^T, I)^T equals the determinant of a submatrix of A. By Laplace expansion, it is easy to see that |x_i| ≤ |det D_i| ≤ (αq)^q (|c_1| + ··· + |c_n|) ≤ (αq)^q · n(α + (αq)^{q+1}) ≤ 2(αq)^{2q+1}. ∎

By Lemma 10.1.3, it is enough to guess a solution x whose total size is at most n log2(2(αq)^{2q+1}) = O(q^2 (log2 q + log2 α)). Note that the input A and b has total length at least β = Σ_{i=1}^{m} Σ_{j=1}^{n} log2 |a_{ij}| + Σ_{j=1}^{m} log2 |b_j| ≥ mn + log2 α ≥ q + log2 α, so the size of the guess is polynomial in the input length. Hence Integer Programming is in NP.

Theorem 10.1.4. Integer Programming is in NP.

Proof: It follows immediately from Lemma 10.1.3. ∎

The definition of the class NP involves three concepts: non-deterministic computation, polynomial time, and decision problems. The first two concepts have been explained above. Next, we explain what a decision problem is.

A problem is called a decision problem if its answer is "Yes" or "No". For example, the Hamiltonian Cycle problem is a decision problem, while combinatorial optimization problems are not decision problems. However, every combinatorial optimization problem can be transformed into a decision version. For example, consider the Traveling Salesman problem: given n cities and a distance table between the n cities, find the shortest Hamiltonian tour, where a Hamiltonian tour is a Hamiltonian cycle in the complete graph on the n cities.

Its decision version is as follows: given n cities, a distance table between the n cities, and an integer K > 0, is there a Hamiltonian tour with total distance at most K?

If the Traveling Salesman problem can be solved in polynomial time, then so can its decision version. Conversely, if its decision version can be solved in polynomial time, then we may solve the Traveling Salesman problem in polynomial time in the following way. Let dmin and dmax be the smallest and the largest distance between two cities. Let a = n·dmin and b = n·dmax. Set K = ⌈(a + b)/2⌉. Determine whether there is a tour with total distance at most K by solving the decision version. If the answer is Yes, then set b ← K; else set a ← K. Repeat this process until |b − a| ≤ 1. Then compute the exact optimal objective function value by solving the decision version twice, with K = a and K = b, respectively. In this way, if the decision version can be solved in polynomial time p(n), then the Traveling Salesman problem can be solved in polynomial time O(log(n·dmax)) · p(n).
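This back-and-forth between the optimization and decision versions can be sketched as follows, assuming a hypothetical polynomial-time oracle tsp_decision(n, dist, K) for the decision version (the code and names are illustrative only, not a real library API):

from math import ceil

# Sketch of computing the TSP optimum with a decision-version oracle, as described above.
def tsp_optimal_value(n, dist, tsp_decision):
    dmin = min(dist[i][j] for i in range(n) for j in range(n) if i != j)
    dmax = max(dist[i][j] for i in range(n) for j in range(n) if i != j)
    a, b = n * dmin, n * dmax
    while b - a > 1:                       # binary search over the budget K
        K = ceil((a + b) / 2)
        if tsp_decision(n, dist, K):
            b = K                          # a tour of cost <= K exists
        else:
            a = K
    # the optimum is a or b; one more oracle call pins it down
    return a if tsp_decision(n, dist, a) else b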
NP-Completeness
Essential facts about NP-completeness:

• Any NP-complete problem can be solved by a simple but exponentially slow algorithm.
• We don't have polynomial-time solutions to any NP-complete problem.
• We can prove that either all NP-complete problems have polynomial-time solutions, or none of them do.
• It is generally believed that none of them do. But proving this would require solving the P versus NP problem, one of the best-known unsolved problems in mathematics, let alone theoretical computer science.

NP-completeness is a property of decision problems. It can be used to prove that other types of problems are NP-hard, which means that (unless P = NP) they have no polynomial-time solutions. These include search, optimization, and approximation problems.

Definition: The class P is the set of decision problems for which there exists an algorithm solving them in O(n^k) time for some constant k.

Definition: A formal language A is in NP if there exists another language B in P such that for any string x, x is in A iff there exists a string y, with |y| = |x|^O(1), such that (x, y) is in B.

An equivalent, more algorithmic definition of NP is as follows. An NP-procedure consists of a guess phase and a verification phase. Given an input x, the guessing phase chooses an arbitrary string y such that |y| = |x|^O(1). The verification phase is an algorithm that takes both x and y as input and returns a bit. We say that x is in the language of the NP-procedure iff the procedure can guess a y making the verification phase output “true”.

There is an obvious deterministic decision procedure for any NP language – simply cycle through all possible strings y and see whether the verification procedure accepts (x, y). The problem is that there are 2^(n^O(1)) possible y’s, of course. Our question is whether there is a better way to decide membership in the NP language, one that might run in polynomial time.

Proving a language to be in NP is generally simple. We need to define a string to be guessed and define a
polytime verification procedure that accepts the input and a guess iff the guess proves that the input is in
the language.
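For example, to show that the Hamiltonian Cycle language is in NP, the guess can be an ordering of the vertices and the verification phase a linear-time check. A minimal sketch in Python, with input conventions of our own choosing, might be:

    def verify_hamiltonian_cycle(n, edges, cycle):
        """Verification phase for Hamiltonian Cycle: accept (x, y) iff the
        guessed ordering `cycle` visits all n vertices exactly once and every
        consecutive pair (including the wrap-around) is an edge of the graph."""
        edge_set = {frozenset(e) for e in edges}
        if sorted(cycle) != list(range(n)):        # must be a permutation of 0..n-1
            return False
        return all(frozenset((cycle[i], cycle[(i + 1) % n])) in edge_set
                   for i in range(n))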

We need to determine relationships among the languages in NP, using the notion of reduction. We want to
show that if problem B is in P, then so is problem A. One way to do this is to describe an algorithm for A
that would run in polynomial time if it were allowed to make calls to a hypothetical poly-time algorithm to
decide membership in B. This is called a Cook reduction. For defining NP-completeness, though, we will
need a slightly different notion of reduction.

Let A and B be two formal languages, possibly over different alphabets. A Karp reduction from A to B is a
poly-time computable function f such that for any string x, x ∈ A if and only if f(x) ∈ B. If such a reduction
exists we say that A is poly-time reducible to B and write “A ≤p B”. We also sometimes read this as “A is no harder than B”.

Two languages A and B are said to be p-equivalent if both A ≤p B and B ≤p A. The relation ≤p is a partial
order on the equivalence classes. We are interested in the maximal equivalence class in NP:

Definition: A language B is NP-complete if (1) it is in NP, and (2) for any language A in NP, A ≤p B. Thus the
NP-complete languages, if they exist, are the hardest languages in NP. It should be easy to see that they form an equivalence class and that if any NP-complete language is also in P, it follows that P = NP.

If a language is NP-complete, then we have strong evidence that it is not in P. We can use the NP-completeness of a language to talk about non-decision problems, even though by definition these cannot be NP-complete.

Definition: A problem X (with boolean output or otherwise) is said to be NP-hard if there is a Cook reduction from some NP-complete problem B to X. That is, there is a poly-time algorithm, with access to a hypothetical poly-time algorithm for X, that decides B.

It should be clear that if X is NP-hard and there is a poly-time algorithm for X, then P = NP.
(c

Our general methodology will be to develop a library of NP-complete problems. Once we have one
problem, there is a straightforward way to get more:

Lemma: Let B be an NP-complete language and A be a language. If we prove that (1) A ∈ NP and (2) B ≤p A,

then we may conclude that A is NP-complete.

But how do we start this process? In CMPSCI 601, we prove the Cook-Levin Theorem, that the language SAT is NP-complete. Last time, we gave an unconditional proof that a particular generic NP language is NP-complete. We’ll still need the Cook-Levin Theorem because SAT is a much more convenient starting place to build up a library of NP-complete languages.

5.1.6 SAT and 3-SAT
Remember that a Boolean formula is defined to be satisfiable if there is at least one setting of its input variables that makes it true. The language SAT is the set of satisfiable Boolean formulas.

Recall that a boolean formula is in conjunctive normal form (CNF) if it is the AND of zero or more clauses,
each of which is the OR of zero or more literals. A formula is in 3-CNF if it is in CNF and has at most three literals in any clause. The language CNF-SAT is the set of CNF formulas that are satisfiable, and the
language 3-SAT is the set of 3-CNF formulas that are satisfiable.

Note that 3-SAT is a subset of CNF-SAT, which is a subset of SAT. In general, if A ⊆ B, we can’t be certain
that A ≤p B. Although the identity function maps elements of A to elements of B, we can’t be sure that it doesn’t map a non-element of A to an element of B. But here we know more – we can easily test the
formula to see whether it is in CNF or 3-CNF. To reduce 3-SAT to SAT, for example, we map a formula ϕ to
itself if it is in 3-CNF, and to 0 if it is not. A similar reduction works to show A ≤p B whenever A is such an
identifiable special case of B – that is, when A = B ∩ C and C ∈ P.
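As a small illustration, here is a sketch of that special-case reduction in Python, assuming a formula is given as a list of clauses, each clause a list of nonzero integer literals (DIMACS-style); the helper names and the particular unsatisfiable formula standing in for “0” are our own choices. In this representation every formula is in CNF, so the sketch shows 3-SAT ≤p CNF-SAT (and hence ≤p SAT).

    def is_3cnf(clauses):
        """A CNF formula (list of clauses) is in 3-CNF if every clause has at
        most three literals; this syntactic test runs in linear time."""
        return all(len(clause) <= 3 for clause in clauses)

    def reduce_3sat_to_cnfsat(clauses):
        """Karp reduction: map a 3-CNF formula to itself, and anything else to
        a fixed unsatisfiable CNF formula (playing the role of "0" in the text)."""
        UNSATISFIABLE = [[1], [-1]]          # (x1) AND (NOT x1)
        return clauses if is_3cnf(clauses) else UNSATISFIABLE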
The Cook-Levin Theorem tells us that SAT is NP-complete, essentially by mapping an instance of the
generic NP problem to a formula that says that a particular string is a witness for a particular NP-procedure
on a particular input. (Recall that if A is an NP language defined so that x ∈ A iff ∃y : (x, y) ∈ B, we call y
variously a witness, proof, or certificate of x’s membership in A.)

We’d like to show that CNF-SAT and 3-SAT are also NP-complete. It’s clear that they are in NP, but the easy
special case reduction does not suffice to show them NP-complete. We can reduce 3-SAT to SAT, but what
we need is to reduce the known NP-complete language, SAT, to the language we want to show to be NP-
complete, 3-SAT.

On HW#4 I’ll have you work through the general reduction from SAT to 3-SAT. Here, I’ll present the easier
reduction from CNF-SAT to 3-SAT. (The proof of the Cook-Levin Theorem given in CMPSCI 601 shows
directly that CNF-SAT is NP-complete.)

Let’s now see how to reduce CNF-SAT to 3-SAT. We need a function f that takes a CNF formula ϕ and produces a new formula f(ϕ) such that f(ϕ) is in 3-CNF and the two formulas are either both
satisfiable or both unsatisfiable. If we could make ϕ and f(ϕ) equivalent, this would do, but there is no
reason to think that an arbitrary CNF formula will even have a 3-CNF equivalent form. (Every formula can
be translated into CNF, but not necessarily into 3-CNF.)

Instead, we will make f(ϕ) have a different meaning from ϕ, and even a different set of variables. We will
add variables to f(ϕ) in such a way that a satisfying setting of both old and new variables of f(ϕ) will exist if and only if there is a satisfying setting of the old variables alone in ϕ. The old variables will be set the same way in
each formula.

Because ϕ is in CNF, we know that it is the AND of clauses, which we may name (ℓ11 ∨ . . . ∨ ℓ1k1), (ℓ21 ∨ . . . ∨ ℓ2k2), . . ., (ℓm1 ∨ . . . ∨ ℓmkm), where the ℓ’s are literals. For each of these clauses in ϕ, we will make one or more 3-CNF clauses in f(ϕ), possibly including new variables, so that the one clause in ϕ will be satisfied iff all the corresponding clauses in f(ϕ) are satisfied.

So let’s consider a single clause ℓ1 ∨ . . . ∨ ℓk in ϕ. If k ≤ 3, we can simply copy the clause over to f(ϕ), because it is already suitable for a 3-CNF formula. What if k = 4? We can add one extra variable and make two clauses: (ℓ1 ∨ ℓ2 ∨ x1) and (¬x1 ∨ ℓ3 ∨ ℓ4). It’s not too hard to see that both of these clauses can be satisfied together iff at least one of the ℓ’s is true. If ℓ1 or ℓ2 is true, we can afford to make x1 false, and if ℓ3 or ℓ4 is true, we can make x1 true.

The general construction for k > 4 is similar. We have k − 2 clauses and k − 3 new variables. The clauses are (ℓ1 ∨ ℓ2 ∨ x1), (¬x1 ∨ ℓ3 ∨ x2), (¬x2 ∨ ℓ4 ∨ x3), and so on until we reach (¬xk−4 ∨ ℓk−2 ∨ xk−3) and finally (¬xk−3 ∨ ℓk−1 ∨ ℓk).

If we satisfy the original clause with some ℓi, this satisfies one of the new clauses, and we can satisfy the
others by making all the xi’s before it true and all those after it false. Conversely, if we satisfy all the new
clauses, we cannot have done it only with xi’s because there are more clauses than xi’s and each xi only
appears at most once as true and at most once as false, and so can satisfy at most one clause.

Since this reduction is easily computable in polynomial time, it shows that CNF-SAT ≤p 3-SAT, and thus
(with the quoted result that CNF-SAT is NP-complete) that 3-SAT is NP-complete.
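A compact sketch of this clause-splitting step, assuming clauses are lists of nonzero integer literals and fresh variables are numbered from next_var onward (the names are ours), might look like this:

    def clause_to_3cnf(clause, next_var):
        """Split one CNF clause (a list of nonzero integer literals) into an
        equisatisfiable chain of clauses with at most three literals each,
        following the construction in the text.  next_var is the first unused
        variable index; returns (new_clauses, new_next_var)."""
        k = len(clause)
        if k <= 3:
            return [clause], next_var                       # already fine for 3-CNF
        x = next_var
        new_clauses = [[clause[0], clause[1], x]]           # (l1 v l2 v x1)
        for i in range(2, k - 2):
            new_clauses.append([-x, clause[i], x + 1])      # (~x_j v l_{i+1} v x_{j+1})
            x += 1
        new_clauses.append([-x, clause[k - 2], clause[k - 1]])  # (~x_{k-3} v l_{k-1} v l_k)
        return new_clauses, x + 1

For k = 4 this produces exactly the two clauses from the text; in general it produces k − 2 clauses and k − 3 new variables.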
3-SAT is often the most convenient problem to reduce to something else, but other variants of SAT are also
sometimes useful. One we’ll use later is not-all-equal-SAT or NAE-SAT. Here the input is a formula in 3-CNF,
but the formula is “satisfied” only if there is both a true literal and a false literal in each clause.

Let’s prove that NAE-SAT is NP-complete. Is it in NP? Yes, if we guess a satisfying assignment it is easy (in
linear time) to check the input formula and verify that there is a true literal and a false literal in each
clause. So we need to reduce a known NP-complete problem to NAE-SAT – we’ll choose 3-SAT itself. Again we’ll transform each old clause into the AND of some new clauses, in this case three of them.

Given the clause ℓ1 ∨ ℓ2 ∨ ℓ3, we introduce two new variables x and y that appear only in the new clauses for this clause, and a single new variable α that appears several times. The three new clauses are (ℓ1 ∨ ℓ2 ∨ x) ∧ (¬x ∨ ℓ3 ∨ y) ∧ (x ∨ y ∨ α).

We must show that the three new clauses are jointly NAE-satisfiable if and only if the original clause is satisfiable in the ordinary way. First, we assume that the old clause is satisfied and show that we can choose values for x, y, and α to NAE-satisfy the new clauses. We make α true (for all the clauses in the formula) and consider the seven possibilities for the values of ℓ1, ℓ2, and ℓ3. In each of the seven cases, we can set x and y, not both true, to NAE-satisfy the three new clauses – we’ll check this on the board.
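Rather than checking the seven cases by hand, a few lines of Python can enumerate them (the helper nae and the variable names are ours, not the text’s):

    from itertools import product

    def nae(*lits):
        """Not-all-equal: true iff the clause has both a true and a false literal."""
        return any(lits) and not all(lits)

    # With alpha fixed to True, check that for every assignment satisfying the
    # original clause (l1 v l2 v l3) there exist x, y NAE-satisfying all three
    # new clauses (l1 v l2 v x), (~x v l3 v y), (x v y v alpha).
    alpha = True
    for l1, l2, l3 in product([False, True], repeat=3):
        if not (l1 or l2 or l3):
            continue                      # original clause not satisfied; skip
        assert any(nae(l1, l2, x) and nae(not x, l3, y) and nae(x, y, alpha)
                   for x in (False, True) for y in (False, True)), (l1, l2, l3)
    print("all seven satisfying cases can be NAE-satisfied")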

Now assume that the three new clauses are NAE-satisfied – we will show that at least one of the ℓi’s is true. We may assume that α is true, because if the new formula is NAE-satisfied with α false we can just negate every variable in the formula and get a setting that NAE-satisfies all the clauses but has α true.

If α is true, then either x or y must be false. If x is false, then either ℓ1 or ℓ2 must be true. If y is false and x is true, then ℓ3 must be true. So one of the three ℓ’s must be true, and the original clause is satisfied ordinarily.

5.1.7 The SUBSET-SUM Problem
For our final problem today, we revisit the SUBSET-SUM problem – the input is a set of numbers {a1, . . . , an} and a target number t, and we ask whether there is a subset of the numbers that adds up to exactly t.

Using dynamic programming, we showed that we could decide this language in time that is polynomial in n
and s, the sum of all the ai.
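A minimal sketch of that dynamic program, assuming the ai and t are non-negative integers, is:

    def subset_sum(nums, t):
        """Pseudo-polynomial dynamic program for SUBSET-SUM: O(n * s) time,
        where s is the sum of the (non-negative) inputs.  This is polynomial
        only when the numbers are written in unary."""
        reachable = {0}                      # subset sums reachable so far
        for a in nums:
            reachable |= {r + a for r in reachable if r + a <= t}
        return t in reachable

For instance, subset_sum([3, 34, 4, 12, 5, 2], 9) returns True because 4 + 5 = 9.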

Now we allow the numbers to get larger, so that they might be n bits long. The problem is still in NP because we can guess a subset by guessing a bit vector, add the numbers in the chosen subset, and verify that we get t.

But it’s no longer clear that we are in P, and we will now see that the general problem is NP-complete.

We reduce 3-SAT to SUBSET-SUM (with large numbers). We first assume that every clause in our input
formula has exactly three literals – we can just repeat literals in the same clause to make this true. Our numbers will be represented in decimal notation, with a column for each of the v variables and a column
for each clause in the formula.

We’ll create an item ai for each of the 2v literals. This item will have a 1 in the column for its variable, a 1 in
the column of each clause where the literal appears, and zeroes everywhere else. We also have two items
for each clause, each with a 1 in the column for that clause and zeroes everywhere else. The target number
has a 1 for each variable column and a 3 for each clause column.
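Here is a sketch of this construction in Python, again with clauses as lists of nonzero integer literals over variables 1..v (the function and variable names are our own). Note that each clause column receives at most five 1’s – three from literal items and two from the extra items – so decimal digits never carry, which is what makes the column-by-column reading valid.

    def three_sat_to_subset_sum(v, clauses):
        """Build the SUBSET-SUM instance from a 3-CNF formula with v variables.
        Numbers are read in decimal: one digit per variable, then one per clause."""
        m = len(clauses)

        def as_number(var_digits, clause_digits):
            return int("".join(map(str, var_digits + clause_digits)))

        items = []
        for i in range(1, v + 1):
            for lit in (i, -i):                               # one item per literal
                var_digits = [1 if j == i else 0 for j in range(1, v + 1)]
                clause_digits = [1 if lit in c else 0 for c in clauses]
                items.append(as_number(var_digits, clause_digits))
        for j in range(m):                                    # two extra items per clause
            slack = as_number([0] * v, [1 if i == j else 0 for i in range(m)])
            items += [slack, slack]
        target = as_number([1] * v, [3] * m)
        return items, target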

We now have to prove that there is a subset summing to the target iff the formula is satisfiable. If there is
a satisfying assignment, we choose the item for each literal in that assignment. This has one 1 in each
variable column, and somewhere from one to three 1’s in each clause column. Using extra items as
needed, we can reach the target.

Conversely, if we reach the target we must have chosen, for each variable column, exactly one item with a 1 in that column, so we have picked one literal per variable, forming an assignment. Since we have three 1’s in each clause column and at most
two came from the extra items, we must have at least one 1 in each clause column from our assignment,
making it a satisfying assignment.

Given a problem with numerical parameters, we say that it is pseudo-polynomial if it becomes polynomial
when those parameters are given in unary. If it is NP-complete with parameters given in unary, we say that
it is strongly NP-complete. The SUBSET-SUM problem is pseudo-polynomial, but all our graph problems are
strongly NP-complete. Recall that KNAPSACK is similar to SUBSET-SUM but has a value for each item as well as its weight. We are asked whether there is a set of items with total value at least a given value and total weight at most a given
weight. Since SUBSET-SUM is an identifiable special case of KNAPSACK (where weight and value are both
equal), we know that SUBSET-SUM ≤p KNAPSACK. Since KNAPSACK (as a decision problem) is in NP, it is
NP-complete. The associated optimization problem is thus NP-hard.

Exercise

Q1. Explain the complexity classes.


Q2. Explain Knapsack


Q4. Explain Clique.

si
Q5. What is the Traveling Salesman Problem?
