UNIT V
For example: searching for an element in a sorted list takes O(log n), and sorting n elements takes O(n log n).
The second group consists of problems that can be solved in non-deterministic
polynomial time.
For example: Knapsack problem O(2^(n/2)) and Travelling Salesperson problem O(n^2 · 2^n).
• Any problem for which the answer is either yes or no is called a decision problem. The algorithm for a decision problem is called a decision algorithm.
• Any problem that involves the identification of an optimal cost (minimum or maximum) is called an optimization problem. The algorithm for an optimization problem is called an optimization algorithm.
• Definition of P - Problems that can be solved in polynomial time. (“P” stands for
polynomial).
Examples - Searching of key element, Sorting of elements, All pair shortest path.
• Definition of NP - It stands for “non-deterministic polynomial time”. Note
that NP does not stand for “non-polynomial”.
Examples - Travelling salesperson problem, Graph coloring problem, Knapsack
problem, Hamiltonian circuit problems
• The NP class problems can be further categorized into NP-complete and NP hard
problems.
All NP-complete problems are NP-hard, but not all NP-hard problems are NP-complete.
The NP class problems are the decision problems that can be solved by non deterministic
polynomial algorithms.
The problems that can be solved in polynomial time are called tractable problems, and the problems that require super-polynomial time are called intractable problems. All deterministic polynomial-time algorithms are tractable, while the nondeterministic polynomial problems are intractable.
Satisfiability Problem:
The satisfiability problem concerns Boolean formulas, which are constructed using the following literals and operations:
1. A literal is either a variable or the negation of a variable.
2. Literals are connected with the operators ∨ (or), ∧ (and), ⇒ (implies), ⇔ (if and only if).
3. Parentheses.
The satisfiability problem is to determine whether a Boolean formula is true for some assignment of truth values to the variables. In general, formulas are expressed in Conjunctive Normal Form (CNF).
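As a small sketch of the problem just described (not part of the original notes, names are illustrative), the following brute-force checker tries every truth assignment against a CNF formula. Each clause is a list of literals; a positive integer i means variable x_i and -i means its negation:

```python
from itertools import product

def is_satisfiable(clauses, num_vars):
    """Brute-force SAT check: try all 2^num_vars assignments."""
    for assignment in product([False, True], repeat=num_vars):
        # A CNF formula is true iff every clause has at least one true literal.
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False

# (x1 ∨ ¬x2) ∧ (¬x1 ∨ x2): satisfiable, e.g. x1 = x2 = True
print(is_satisfiable([[1, -2], [-1, 2]], 2))   # True
# x1 ∧ ¬x1: unsatisfiable
print(is_satisfiable([[1], [-1]], 1))          # False
```

The exponential loop over all assignments is exactly why satisfiability is the canonical NP problem: verifying one assignment is cheap, but no polynomial-time way to find one is known.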
Types of Problems:
• Tractable
• Intractable
• Decision
• Optimization
Tractable: Problems that can be solved in a reasonable (polynomial) time.
Intractable: Some problems are intractable: as they grow large, we are unable to solve them in reasonable time.
Decision Problem:
• Any problem for which the answer is either yes or no is called a decision problem. The algorithm for a decision problem is called a decision algorithm.
• Example: Sum of subsets problem.
Reducibility:
A problem Q1 can be reduced to Q2 if any instance of Q1 can be easily rephrased as an instance of Q2. If a solution to the problem Q2 provides a solution to the problem Q1, then these are said to be reducible problems.
Let L1 and L2 be two problems. L1 reduces to L2 iff there is a way to solve L1 by a deterministic polynomial-time algorithm that uses, as a subroutine, a deterministic algorithm solving L2 in polynomial time; this is denoted L1 ∝ L2.
If we have a polynomial time algorithm for L2 then we can solve L1 in polynomial time.
Two problems L1 and L2 are said to be polynomially equivalent iff L1 ∝ L2 and L2 ∝ L1.
Example: Let P1 be the problem of selection and P2 be the problem of sorting. Let the
input have n numbers. If the numbers are sorted in array A[ ] the ith smallest element of
the input can be obtained as A[i]. Thus P1 reduces to P2 in O(1) time.
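The selection-to-sorting reduction in the example above can be sketched directly (a minimal illustration, not from the notes):

```python
def select_ith_smallest(numbers, i):
    """Reduce selection (P1) to sorting (P2): sort, then index in O(1)."""
    a = sorted(numbers)   # solve the sorting problem P2
    return a[i - 1]       # the i-th smallest is now A[i] (1-based)

print(select_ith_smallest([7, 2, 9, 4, 1], 3))  # 4
```

Any polynomial-time sorting routine thus yields a polynomial-time selection routine, which is exactly what L1 ∝ L2 asserts.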
Class P:
P: the class of decision problems that are solvable in O(p(n)) time, where p(n) is a polynomial of the problem's input size n.
Examples:
• searching
• element uniqueness
• graph connectivity
• graph acyclicity
• primality testing
Class NP
NP (nondeterministic polynomial): the class of decision problems whose proposed solutions can be verified in polynomial time, i.e., problems solvable by a nondeterministic polynomial algorithm.
A nondeterministic polynomial algorithm is an abstract two-stage procedure that:
1. nondeterministically guesses a candidate solution (a certificate), and
2. deterministically verifies, in polynomial time, whether that guess is a valid solution.
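The verification stage is the key idea: given a guessed certificate, checking it takes only polynomial time even though finding it may not. A minimal sketch for the sum-of-subsets decision problem (names and inputs are illustrative, not from the notes):

```python
def verify_subset_sum(numbers, target, certificate):
    """Polynomial-time verification stage: check a guessed subset."""
    # The certificate is a proposed subset of `numbers`; verifying it
    # needs only membership tests and one summation.
    return all(x in numbers for x in certificate) and sum(certificate) == target

print(verify_subset_sum([3, 34, 4, 12, 5, 2], 9, [4, 5]))  # True
print(verify_subset_sum([3, 34, 4, 12, 5, 2], 9, [3, 2]))  # False
```

Finding a valid certificate may require trying exponentially many subsets, but each check above runs in polynomial time, which is what places the problem in NP.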
Normally the decision problems are NP-complete, while the optimization problems are NP-hard. However, if L1 is a decision problem and L2 is an optimization problem, then it is possible that L1 ∝ L2.
Example: Knapsack decision problem can be reduced to knapsack optimization problem.
There are some NP-hard problems that are not NP-Complete.
String Matching
A String Matching Algorithm is also called a "String Searching Algorithm." This is a vital class of string algorithms: the task is to find a place where one or several strings (patterns) occur within a larger string (the text).
Given a string T[0......n-1], its substrings are represented as T[i......j] for some 0 ≤ i ≤ j ≤ n-1: the string formed by the characters of T from index i to index j, inclusive. This means that a string is a substring of itself (take i = 0 and j = n-1).
A proper substring of the string T[0......n-1] is a substring T[i......j] other than the whole string. That is, we must have either i > 0 or j < n-1.
Plagiarism Detection:
The documents to be compared are decomposed into string tokens and compared using
string matching algorithms. Thus, these algorithms are used to detect similarities between
them and declare if the work is plagiarized or original.
Digital Forensics: String matching algorithms are used to locate specific text strings of
interest in the digital forensic text, which are useful for the investigation.
Spelling Checker: Trie is built based on a predefined set of patterns. Then, this trie is used for
string matching. The text is taken as input, and if any such pattern occurs, it is shown by
reaching the acceptance state.
Search engines or content search in large databases: To categorize and organize data
efficiently, string matching algorithms are used. Categorization is done based on the search
keywords. Thus, string matching algorithms make it easier for one to find the information they
are searching for.
The naïve approach tests all possible placements of the pattern P[1......m] relative to the text T[1......n]. We try each shift s = 0, 1, ..., n-m successively, and for each shift s we compare T[s+1......s+m] to P[1......m].
The naïve algorithm finds all valid shifts using a loop that checks the condition P[1......m] = T[s+1......s+m] for each of the n-m+1 possible values of s.
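The loop just described can be sketched as follows (0-based indices; a minimal illustration, not the notes' own pseudocode):

```python
def naive_match(T, P):
    """Try every shift s = 0..n-m and compare P to T[s : s+m]."""
    n, m = len(T), len(P)
    shifts = []
    for s in range(n - m + 1):
        if T[s:s + m] == P:      # character-by-character comparison
            shifts.append(s)
    return shifts

print(naive_match("acaabc", "aab"))  # [2]
```

Each of the n-m+1 shifts costs up to m comparisons, giving the O((n-m+1)·m) worst-case time that motivates the faster algorithms below.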
Example:
The operation of the naive string matcher for the pattern P = aab and the text T = acaabc. We
can imagine the pattern P as a template that we slide next to the text. (a)–(d) The four successive
alignments tried by the naive string matcher. In each part, vertical lines connect corresponding
regions found to match (shown shaded), and a jagged line connects the first mismatched
character found, if any. The algorithm finds one occurrence of the pattern, at shift s = 2, shown
in part (c).
If the hash values are unequal, the algorithm will determine the hash value for next M-
character sequence.
If the hash values are equal, the algorithm will analyze the pattern and the M-character
sequence.
In this way, there is only one comparison per text subsequence, and character matching is
only required when the hash values match.
Algorithm:
Example:
For string matching with working modulus q = 11, how many spurious hits does the Rabin-Karp matcher encounter in the text T = 31415926535?
T = 31415926535.......
P = 26
Here T.length = 11, so q = 11
and P mod q = 26 mod 11 = 4.
Now find the exact matches of P mod q...
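The worked example above can be checked with a short sketch. Real implementations update the hash in O(1) per shift with a rolling hash; for clarity this version simply recomputes each window's value (names are illustrative):

```python
def rabin_karp_hits(T, P, q):
    """Return (genuine shifts, spurious shifts) for numeric text T, pattern P."""
    m = len(P)
    p_hash = int(P) % q
    genuine, spurious = [], []
    for s in range(len(T) - m + 1):
        window = T[s:s + m]
        if int(window) % q == p_hash:                 # hash values match
            (genuine if window == P else spurious).append(s)
    return genuine, spurious

genuine, spurious = rabin_karp_hits("31415926535", "26", 11)
print(genuine)        # [6]  -> the true occurrence of 26
print(len(spurious))  # 3   -> 15, 59 and 92 also hash to 4 (spurious hits)
```

So the matcher encounters 3 spurious hits: the windows 15, 59 and 92 all satisfy x mod 11 = 4 without actually equaling 26.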
Knuth, Morris and Pratt introduced a linear-time algorithm for the string matching problem. A matching time of O(n) is achieved by avoiding comparisons with elements of 'S' that have previously been involved in a comparison with some element of the pattern 'p' to be matched, i.e., backtracking on the string 'S' never occurs.
1. The Prefix Function (Π): The prefix function Π for a pattern encapsulates knowledge about how the pattern matches against shifts of itself. This information can be used to avoid useless shifts of the pattern 'p'. In other words, it enables avoiding backtracking on the string 'S'.
2. The KMP Matcher: With the string 'S', pattern 'p' and prefix function 'Π' as inputs, it finds the occurrences of 'p' in 'S' and returns the shifts of 'p' at which occurrences are found.
In the above pseudocode for calculating the prefix function, the for loop from step 4 to step 10 runs 'm' times. Steps 1 to 3 take constant time. Hence the running time of computing the prefix function is O(m).
The KMP matcher, with the pattern 'p', the string 'S' and the prefix function 'Π' as input, finds matches of p in S. The following pseudocode computes the matching component of the KMP algorithm:
The for loop beginning in step 5 runs 'n' times, i.e., as long as the length of the string 'S'. Since steps 1 to 4 take constant time, the running time is dominated by this for loop. Thus the running time of the matching function is O(n).
Let us execute the KMP Algorithm to find whether 'P' occurs in 'T.'
For 'p' the prefix function Π was computed previously and is as follows:
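The two KMP components described above can be sketched in Python (0-based indices, unlike the 1-based pseudocode conventions; a minimal illustration, not the notes' own code):

```python
def compute_prefix_function(p):
    """Π[q] = length of the longest proper prefix of p that is a suffix of p[:q+1]."""
    m = len(p)
    pi = [0] * m
    k = 0
    for q in range(1, m):                 # O(m) overall
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]                 # fall back using Π itself
        if p[k] == p[q]:
            k += 1
        pi[q] = k
    return pi

def kmp_match(s, p):
    """Return all shifts at which p occurs in s, in O(n + m) time."""
    pi = compute_prefix_function(p)
    q, shifts = 0, []
    for i in range(len(s)):               # O(n): i never moves backwards
        while q > 0 and p[q] != s[i]:
            q = pi[q - 1]                 # slide the pattern, never re-scan s
        if p[q] == s[i]:
            q += 1
        if q == len(p):
            shifts.append(i - len(p) + 1)
            q = pi[q - 1]
    return shifts

print(compute_prefix_function("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
print(kmp_match("acaabc", "aab"))          # [2]
```

Note how the while loops only ever decrease q, so the total work is linear even though individual iterations may fall back several times.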
Tries can be used to perform prefix queries for information retrieval. Prefix queries
search for the longest prefix of a given string X that matches a prefix of some string in
the trie.
For example, the standard trie over the alphabet Σ ={a, b} for the set {aabab, abaab, babbb,
bbaaa, bbab}
An internal node can have 1 to d children, where d is the size of the alphabet. Our example is essentially a binary tree.
We can implement a trie with an ordered tree by storing the character associated with an
edge at the child node below it.
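A standard trie and the longest-prefix query described above can be sketched with nested dictionaries (a minimal hypothetical layout, not the ordered-tree implementation just mentioned; '$' is an assumed end-of-word marker):

```python
def build_trie(words):
    """Standard trie as nested dicts; '$' marks the end of an inserted word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node['$'] = True
    return root

def longest_prefix(trie, x):
    """Longest prefix of x that matches a path from the root of the trie."""
    node, matched = trie, 0
    for ch in x:
        if ch not in node:
            break
        node = node[ch]
        matched += 1
    return x[:matched]

trie = build_trie(["aabab", "abaab", "babbb", "bbaaa", "bbab"])
print(longest_prefix(trie, "abba"))  # "ab"
print(longest_prefix(trie, "bbab"))  # "bbab"
```

This is the query behind the spelling-checker and search-engine applications mentioned earlier: walk down from the root, one character per level, until the trie runs out of matching edges.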
Compressed Tries:
A compressed trie is like a standard trie but ensures that each internal node has degree at least 2. Chains of single-child nodes are compressed into a single edge.
A critical node is a node v such that v is labeled with a string from S, v has at least 2 children, or v is the root.
To convert a standard trie to a compressed trie, we replace each chain of nodes (v0, v1, ..., vk), k ≥ 2, in which the intermediate nodes have exactly one child each, by a single edge (v0, vk) whose label is the concatenation of the labels along the chain.
A suffix tree is a compressed trie of all the suffixes of a given string. Suffix trees help in solving a lot of string-related problems like pattern matching, finding distinct substrings of a given string, finding the longest palindrome, etc. In this tutorial the following points will be covered:
Compressed Trie
Suffix Tree Construction (Brute Force)
Brief description of Ukkonen's Algorithm
Before going to suffix tree, let's first try to understand what a compressed trie is.
As might be clear from the images shown above, in a compressed trie, edges that lead to a node having a single child are combined together to form a single edge, and their edge labels are concatenated. So each internal node in a compressed trie has at least two children. Also, it has at most s leaves, where s is the number of strings inserted in the compressed trie. Now both facts, each internal node having at least two children and there being at most s leaves, imply that there are at most 2s - 1 nodes in the trie. So the space complexity of a compressed trie is O(s) nodes (storing edge labels as references into the strings), as compared to a number of nodes proportional to the total length of the strings in a normal trie.
So that is one reason why to use compressed tries over normal tries.
Before going to the construction of suffix trees, there is one more thing that should be understood: the implicit suffix tree. In an implicit suffix tree there are at most n leaves, while in a normal (explicit) suffix tree there are exactly n leaves, where n is the length of the string. The reason for at most n leaves is that one suffix may be a prefix of another suffix. The following example will make it clear. Consider the string
Implicit Suffix Tree for the above string is shown in image below:
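The brute-force construction mentioned above can be sketched by inserting every suffix into a trie. For simplicity this builds an uncompressed suffix trie (compressing single-child chains would give the suffix tree); appending '$' guarantees that no suffix is a prefix of another, matching the implicit-vs-explicit distinction just described:

```python
def build_suffix_trie(s):
    """Insert every suffix of s + '$' into a nested-dict trie: O(n^2) work."""
    s += '$'
    root = {}
    for i in range(len(s)):          # n+1 suffixes
        node = root
        for ch in s[i:]:             # each insertion walks one suffix
            node = node.setdefault(ch, {})
    return root

def is_substring(trie, p):
    """Every substring of s is a prefix of some suffix, so just walk down."""
    node = trie
    for ch in p:
        if ch not in node:
            return False
        node = node[ch]
    return True

trie = build_suffix_trie("banana")
print(is_substring(trie, "nan"))   # True
print(is_substring(trie, "nab"))   # False
```

This quadratic construction is the baseline that Ukkonen's algorithm improves to linear time.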