Design and Analysis of Algorithms

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT 1
DESIGN AND ANALYSIS OF ALGORITHMS
UNIT I Introduction: Algorithm Definition, Algorithm Specification, performance Analysis, Randomized

Algorithms. Sets & Disjoint set union: introduction, union and find operations. Basic Traversal & Search
Techniques: Techniques for Graphs, connected components and Spanning Trees, Bi-connected components
and DFS.
INTRODUCTION :
WHAT IS AN ALGORITHM?
The word algorithm comes from the name of a Persian author, Abu Ja'far Mohammed ibn Musa al
Khowarizmi(c.825A.D.),who wrote a textbook on mathematics. "algorithm" has come to refer to a method that
can be used by a computer for the solution of a problem.This is what makes algorithm different from words
such as process, technique,or method
ALGORITHM DEFINITION
An algorithm is a finite set of instructions that, if followed, accomplishes a particular task.In addition,all
algorithms must satisfy the following criteria:
1. Input. Zero or more quantities are externally supplied.
2. Output. At least one quantity is produced.
3. Definiteness. Each instruction is clear and unambiguous.
4. Finiteness. If we trace out the instructions of an algorithm,then for all cases,the algorithm terminates after a
finite number of steps.
5. Effectiveness. Every instruction must be very basic so that it can be carried out, in principle,by a person
using only pencil and paper.It is not enough that each operation be definite as in criterion3; it also must be
feasible
Differences between Algorithm and Program
Issues or study of Algorithm:

How to device or design an algorithm creating and algorithm.
How to express an algorithm definiteness.
How to analysis an algorithm time and space complexity.
How to validate an algorithm fitness.
Testing the algorithm checking for error.
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 1
The study of Algorithms includes many important and active areas of research. There are four distinct areas of
study one can identify
How to device algorithms-

Creating an algorithm is an art which many never fully automated. A major goal is to study various
design techniques that have proven to be useful. By mastering these design strategies, it will become easier for
you to device new and useful algorithms. some of techniques may already be familiar, and some have been
found to be useful. Dynamic programming is one technique. Some of the techniques are especially useful in
fields other than computer science such as operations research and electrical engineering.
How to validate algorithms:

Once an algorithm is devised, it is necessary to show that it computes the correct answer for all
possible legal inputs. We refer to this process as algorithm validation. The algorithm need not as yet be
expressed as a program. The purpose of validation is to assure us that this algorithm will work correctly
independently. Once the validity of the method has been shown, a program can be written and a second phase
begins. This phase is referred to as program proving or sometimes as program verification.
A proof of correctness requires that the solution be stated in two forms. One form is usually as a
program which is annotated by a set of assertions about the input and output variables of the program. These
assertions are often expressed in the predicate calculus. The second form is called a specification, and this may
also be expressed in the predicate calculus. A complete proof of program correctness requires that each
statement of a programming language be precisely defined and all basic operations be proved correct.
How to analyze algorithms:

As an algorithm is executed, it uses the computer's central processing unit (CPU) to perform operations
and its memory to hold the program and data. Analysis of algorithms or performance analysis refers to the task
of determining how much computing time and storage algorithms replace.we analyze the algorithm based on
time and space complexity.The amount of time need to run the
algorithm is called time complexity.The amount of memory need to run the algorithm is called space
complexity
How to test a program:

Testing a program consists of two phases
1. Debugging
2. Profiling
Debugging: It is the process of executing programs on sample data sets to determine whether faulty
results occur and, if so to correct them. However, as E. Dijkstra has pointed out, “debugging can only point to
the presence of errors, but not to the absence".
Profiling: Profiling or performance measurement is the process of executing a correct program on data
sets and measuring the time and space it takes to compute the results.
Advantages of Algorithms:
1. It is a step-wise representation of a solution to a given problem, which makes it easy to understand.
2. An algorithm uses a definite procedure.
3. It is not dependent on any programming language, so it is easy to understand for anyone even without
programming knowledge.
4. Every step in an algorithm has its own logical sequence so it is easy to debug.
5. By using algorithm, the problem is broken down into smaller pieces or steps hence, it is easier for
programmer to convert it into an actual program.

Disadvantages of Algorithms:
1. Algorithms is Time consuming.
2. Difficult to show Branching and Looping in Algorithms.
3. Big tasks are difficult to put in Algorithms.
ALGORITHM SPECIFICATION
Algorithm can be described in three ways.
1. Natural language like English:
When this way is used care should be taken, we should ensure that each & every statement is definite.
2. Graphic representation called flowchart:
This method will work well when the algorithm is small& simple.
3. Pseudo-code Method:
This method describe algorithms as program, which resembles language like Pascal & algol.
Pseudo-Code Conventions for expressing algorithms:

1. Comments begin with // and continue until the end of line.
2. Blocks are indicated with matching braces {and}.
3. An identifier begins with a letter. The data types of variables are not explicitly declared.
Compound data types can be formed with records. Here is an example,
node = record
{
data type – 1 data-1;
.
.
.
data type – n data – n; node * link;
}
Here link is a pointer to the record type node. Individual data items of a record can be accessed with  and
period.
4. Assignment of values to variables is done using the assignment statement.
<Variable>:= <expression>;
5. There are two Boolean values true and false.
Logical Operators and, or, not
Relational Operators <, <=,>,>=, =, !=
6. Elements of multidimensional arrays are accessed using[ and ]. For example,if A is a two dimensional
array, the (i,j)th element of the array is denoted as -A[i,j].Array indices start at zero.
7. The following looping statements are employed.
for, while and repeat-until
While Loop:
while < condition > do
{ As long as (condition)is true,the statements get
<statement-1> executed.When (condition) becomes false, the
. loop is exited.The value of (condition) is
. evaluated at the top of the loop
.
.
<statement-n>
} Here value l,value 2, and step are arithmetic
expressions.A variable of type integer or real or a
For Loop: numerical constant is a simple form of an arithmetic
for variable: = value 1 to value 2 step step do expression.The clause "step step" is optional and taken
{ as +1if it does not occur,step could either be positive
<statement-1> or negative
.
.
.
<statement-n>
}
repeat-until:
repeat The statements are executed as long as {condition) is false.The
<statement-1> value of {condition)is computed after executing the statements. The
. instruction break;can be used within any of the above looping
. instructions to force exit. In case of nested loops,break;results in the
. exit of the innermost loop that it is a part of. A return statement
<statement-n> within any of the above also will result in exiting the loops.A return
until<condition> statement results in the exit of the function itself
8. A conditional statement has the following forms.

if <condition> then <statement>
if <condition> then <statement-1> else <statement-1>
Case statement: A case statement is interpreted as follows. If (condition1)

case is true,(statement1) gets executed and the case statement is
{ exited.If (statement1)is false,(condition2) is evaluated. If
: <condition-1> : <statement-1> (condition2) is true,(statement2) gets executed and the
. case statement exited,and soon.If none of the conditions
. (condition1),..., (condition n) are true,(statement n+1)is
. executed and the case statement is exited.The else clause
: <condition-n> : <statement-n> is optional
: else : <statement-n+1>
}
9. Input and output are done using the instructions read & write. .No format is used to specify the size of input
or output quantities
10. There is only one type of procedure: Algorithm, the heading takes the form,
Algorithm Name (Parameter lists)
Where Name is the name of the procedure and (parameter list) is a listing of the procedure parameters.The
body has one or more (simple or compound)statements enclosed within braces{and }.An algorithm may or may
not return any values
As an example,the following algorithm finds and returns the maximum of n given number
Recursive Algorithms:
-->A Recursive function is a function that is defined in terms of itself.
-->Similarly, an algorithm is said to be recursive if the same algorithm is invoked in the body.
-->An algorithm that calls itself is Direct Recursive.
-->Algorithm “A” is said to be Indirect Recursive if it calls another algorithm which in turns calls “A”.

TOWERS OF HANOI
Towers of Hanoi is a problem in which there will be some disks which of decreasing sizes and were stacked
on the tower in decreasing order of size bottom to top. Besides this there are two other towers (B and C) in
which one tower will be act as destination tower and other act as intermediate tower. In this problem we have to
move the disks from source tower to the destination tower. The conditions included during this problem are:
1.Only one disk should be moved at a time.
2.No larger disks should be kept on the smaller disks.
The recurrence relation capturing the optimal time of the Tower of Hanoi problem with n discs is. T(n) = 2T(n – 1) + 1
Where n is the number of disks.
Step 1: Move the smaller disk which is present at the top of the tower A to C.
Step 2: Then move the next smallest disk present at the top of the tower A to B.
Step 3: Now move the smallest disk present at tower C to tower B
Step 4: Now move the largest disk present at tower A to tower C
Step 5: Move the disk smallest disk present at the top of the tower B to tower A.
Step 6: Move the disk present at tower B to tower C.
Step 7: Move the smallest disk present at tower A to tower C In this way disks are moved from source tower to
destination tower.
PERFORMANCE ANALYSIS:
1. Space Complexity : The space complexity of an algorithm is the amount of memory it needs to run to compilation.
2. Time Complexity : The time complexity of an algorithm is the amount of computer time it needs to run to compilation.
Space Complexity:
The Space needed by each of these algorithms is seen to be the sum of the following component.
1. A fixed part that is independent of the characteristics (eg: number, size)of the inputs and outputs. The part typically
includes the instruction space (ie. Space for the code), space for simple variable and fixed- size component variables
space for constants, and so on.

2. A variable part that consists of the space needed by component variables whose size is dependent on the particular
problem instance being solved, the space needed by referenced variables (to the extent that is depends on instance
characteristics), and the recursion stack space.
The space requirement s(p) of any algorithm p may therefore be written as,
S(P) = c+ Sp(Instance characteristics)
Where “c”is a constant.
Time Complexity:
The time T(p) taken by a program P is the sum of the compile time and the run time(execution time)
The compile time does not depend on the instance characteristics.
T(P)=Compile time + Execution time
T(P)=tp(Execution time)
Step Count
For algorithm heading -> 0
For comments -> 0
For Braces -> 0
Expressions -> 0
For any loop, if condition check is in form of i<=n then time taken is [(top index+1)-bottom index]+1
if condition check is in form of i<n then time taken is [top index+-bottom index]+1
Example1:
Algorithm sum(a,n)
{
s= 0.0;
count = count+1;
for i=1 to n do
{
count =count+1;
s=s+a[i];
count=count+1;
}
count=count+1;
count=count+1;
return s;
}

If the count is zero to start with, then it will be 2n+3 on termination. So each invocation of sum execute a total of 2n+3
steps
Amortized Analysis:
->Amortized analysis means finding average running time per operation over a worst case sequence of operations.
->Suppose a sequence I1,I2,D1,I3,I4,I5,I6,D2,I7 of insert and delete operations is performed on a set.
->Assume that the actual cost of each of the seven inserts is one and for delete operations D1 and D2 have an
actual cost of 8 and 10 so the total cost of sequence of operations is 25.
->In amortized scheme we charge some of the actual cost of an operation to other operations.
->This reduce the charge cost of some operations and increases the cost of other operations.
->The amortized cost of an operation is the total cost charge to it.
->The only requirement is that the some of the amortized complexities of all operations in any sequence of
operations be greater than or equal to their some of actual complexities.
->Let us assume the ‘amortized cost’ is greater than ‘actual cost’ I.e amortized(i) >= actual(i) (or)
amortized(i) - actual(i) >= 0
->In the above equations, amortized(i) and actual(i) denotes amortized and actual complexities of ith operation

-> The sum of amortized complexities of all operations in any sequence of operations be greater than or equal
to their sum of the actual complexities I.e Σ amortized(i) >= Σ actual(i)
-> Relative to actual and amortized cost of each operation in any sequence we define a potential function p(i) is
as follows p(i) = amortized(i) - actual(i) + p(i-1) ---->②
-> In the above equation 2, the ith operation causes the potential function to change by the differences between
amortized cost and actual cost for all 1 <= i <= n I.e Σ1<= i <= n p(i) = Σ1 <= i <= n (amortized(i) - actual(i) + p(i-1))
Expand Σ1<= i <= n p(i)-p(i-1) = p(1)+p(2)+p(3)+…..+p(n)-p(0)-p(1)-p(2)-……..-p(n-1) = p(n)-p(0)
Therefore p(n)-p(o) = Σ1 <= i <= n (amortized(i) - actual(i) )
So, under the assumption of p(0)=0, the potential p(i) is the amount by which ith operation has been over
changed
There are three main techniques used for amortized analysis:

The aggregate method, where the total running time for a sequence of operations is analyzed. We determine
an upper bound T(n) on the total sequence of n operations. The cost of each will be then be T(n)/n
The accounting (or banker's) method, where we impose an extra charge on inexpensive operations and use it
to pay for expensive operations later on.
The potential (or physicist's) method, in which we derive a potential function characterizing the amount of
extra work we can do in each step. This potential either increases or decreases with each successive operation,
but cannot be negative.


Accounting Method
Potential Method

Asymptotic Notations
->The best algorithm can be measured by the efficiency of that algorithm.
->The efficiency of an algorithm is measured by computing time complexity.
->The asymptotic notations are used to find the time complexity of an algorithm.
->Asymptotic notations gives fastest possible, slowest possible time and average time of the algorithm.
The basic asymptotic notations are
Big-oh(O)
Omega(Ω)
Theta(Θ).
Little-oh(o)
Little Omega (ω)
1 BIG-OH(O) NOTATION:
->It is denoted by 'O'.
->It is used to find the upper bound time of an algorithm , that means the maximum time taken by the algorithm.
Definition : Let f(n),g(n) are two non-negative functions. If there exists two positive constants c ,n0 . such that
c>0 and for all n>=n0 if f(n)<=c*g(n) then we say that f(n)=O(g(n))

example : consider f(n)=2n+3 and g(n)=n^2

Sol :f(n)<=c*g(n)
let us assuming as c=1, then f(n)<=g(n)
if n=1, 2n+3<=n^2 = 2(1)+3<=1^2 =>5<=1 (false)
If n=2, n+3<=n^2=2(2)+3<=2^2= 7<=4 (false)
If n=3, 2n+3<=n^2= 2(3)+3<=3^2=9<=9 (true)
If n=4, 2n+3<=n^2=>2(4)+3<=4^2=11<=16 (true)
if n=5, 2n+3<=n^2=2(5)+3<=5^2=13<=25 (true)
If n=6, 2n+3<=n^2=2(6)+3<=6^2=15<=36 (true)
.:n>=3, f(n)=O(n^2) i.e, f(n)=O(g(n))
2 . OMEGA (Ω) NOTATION:

->It is denoted by ' Ω'.
->It is used to find the lower bound time of an algorithm, that means the minimum time taken by an algorithm.
Definition : Let f(n),g(n) are two non-negative functions. If there exists two positive constants c,n0.such that
c>0 and for all n>=n0.if f(n)>=c*g(n) then we say that f(n)=Ω(g(n))
Example : consider f(n)=2n+5, g(n)=2n

Sol : Let us assume as c=1
If n=1: 2n+5>=2n => 2(1)+5>=2(1) => 7>=2 (true)
if n=2: 2n+3>=2n=> 2(2)+5>=2(2)=> 9>=4 (true)
if n=3: 2n+3>=2n=> 2(3)+5>=2(3)=> 11>=6(true)
for all .:n>=1, f(n)=Ω(n) i.e , f(n)=Ω(g(n))
3. THETA (Θ) NOTATION:

->It is denoted by the symbol called as (Θ).
->It is used to find the time in-between lower bound time and upper bound time of an algorithm.
Definition : Let f(n),g(n) are two non-negative functions. If there exists positive constants c1,c2,n0.such that
c1>0,c2>0 and for all n>=n0.if c1*g(n)<=f(n)<=c2*g(n) then we say that f(n)=Θ(g(n)).
Example : consider f(n)=2n+5, g(n)=n
Sol :c1*g(n)<=f(n)<=c2*g(n)
let us assuming as c1=3 then c1*g(n)=3n
if n=1,3n<=2n+5=>3(1)<=2(1)+5=>3<=7 (true)
If n=2,3n<=2n+5=>3(2)<=2(2)+5=>6<=9 (true)
If n=3,3n<=2n+5=>3(3)<=2(3)+5=>9<=11 (true)
let us assuming as c2=4 c2*g(n)=4n

if n=1,2n+5<=4n=>2(1)+5<=4(1)=>7<=4 (false)
If n=2,2n+5<=4n=>2(2)+5<=4(2)=>9<=8 (false)
If n=3,2n+5<=4n=>2(3)+5<=4(3)=>11<=12 (true)
If n=4,2n+5<=4n=>2(4)+5<=4(4)=>13<=16 (true)
for all .:n>=3 f(n)=Θ(n) f(n)= Θ (g(n))
4. LITTLE O NOTATIONS :
-> It is denoted by ‘o’
-> Little o notation is used to describe an upper bound that cannot be tight. In other words, loose upper bound
of f(n).
-> Let f(n) and g(n) are the functions that map positive real numbers. We can say that the function f(n) is o(g(n))
if for any real positive constant c, there exists an integer constant n0 ≤ 1 such that f(n) > 0.
Mathematical Relation of Little o notation

Using mathematical relation, we can say that f(n) = o(g(n)) means,
Example on little o asymptotic notation

If f(n) = n2 and g(n) = n3 then check whether f(n) = o(g(n)) or not.
5. LITTLE OMEGA NOTATION :

Definition : Let f(n) and g(n) be functions that map positive integers to positive real numbers. We say that f(n)
is ω(g(n)) (or f(n) ∈ ω(g(n))) if for any real constant c > 0, there exists an integer constant n0 ≥ 1 such that
f(n) > c * g(n) ≥ 0 for every integer n ≥ n0.
f(n) has a higher growth rate than g(n) so main difference between Big Omega (Ω) and little omega (ω) lies in
their definitions.In the case of Big Omega f(n)=Ω(g(n)) and the bound is 0<=cg(n)<=f(n), but in case of little
omega, it is true for 0<=c*g(n)<f(n).
The relationship between Big Omega (Ω) and Little Omega (ω) is similar to that of Big-Ο and Little o except
that now we are looking at the lower bounds. Little Omega (ω) is a rough estimate of the order of the growth
whereas Big Omega (Ω) may represent exact order of growth. We use ω notation to denote a lower bound that
is not asymptotically tight.
And, f(n) ∈ ω(g(n)) if and only if g(n) ∈ ο((f(n)).
In mathematical relation,
if f(n) ∈ ω(g(n)) then,
lim f(n)/g(n) = ∞
n→∞
Example:
Prove that 4n + 6 ∈ ω(1);
the little omega(ο) running time can be proven by applying limit formula given below.
if lim f(n)/g(n) = ∞ then functions f(n) is ω(g(n))
n→∞
here,we have functions f(n)=4n+6 and g(n)=1
lim (4n+6)/(1) = ∞
n→∞
and,also for any c we can get n0 for this inequality 0 <= c*g(n) < f(n), 0 <= c*1 < 4n+6
Hence proved.
RANDOMIZED ALGORITHMS
Basics of Probability Theory
->The probability of an event E is defined to be |E|/|S| where S is the sample space
->Probability theory has the goal of characterizing the outcomes of natural or conceptual "experiments”.
->Examples of such experiments include tossing a coin ten times, rolling a die three times, playing a lottery,
gambling, picking a ball from an urn containing white and red balls, and soon.
->Each possible outcome of an experiment is called a sample point and the set of all possible out comes is
known as the sample space S.
->In this An event E is a subset of the sample space consists of n sample points, then there are2^n possible
events.
Deterministic algorithm:
->Deterministic algorithm is an algorithm in which for given particular input it will always produce the same
output.
->The goal of a deterministic algorithm is to always solve a problem correctly and quickly (in polynomial time).
Randomized algorithms:
->In addition to input, the algorithm uses random numbers to make random choices during execution.
->For the same input there can be different output, i.e. its behavior changes with different runs.
->These algorithms depend upon Random Number Generators.
->Randomized algorithms rely on the statistical properties of random numbers. example of a randomized
algorithm is quick sort.
->It tries to achieve good performance in the average case.
->Randomized algorithm is also called as probabilistic algorithm.
->To overcome the computation problem of exploitation of running time of a deterministic algorithm,
randomized algorithm is used.
When randomized algorithms are used?

->They are used whenever we are presented with time or memory constraints, and when an average case
solution is acceptable.
Will see randomized algorithms in the following fields:
Number theoretic algorithms – Primality Testing
Data Structures – Searching, sorting, computational geometry
Algebraic Identities – Verification of polynomial and matrix identities
Mathematical Programming – Faster linear programming algorithms
Graph algorithms – Finding minimum spanning trees, shortest paths, minimum cuts
Counting and enumeration – Counting combinational structures
Parallel and distributed computing – Deadlock avoidance, distributed consensus
Quick sort: The worst case for quick sort is to come across a list that is already mostly sorted (the time
complexity shoots up to O(n2).
Randomized Quick sort avoids this by shuffling the positions of the elements in the list, and on average is
able to sort in O(nlogn) time.
Types of Randomized algorithms
1.Las Vegas algorithm:

->A Las Vegas algorithm is a randomized algorithm that always produces a correct result for the same input or
simply doesn’t find one.
->The execution time of a Las Vegas algorithm depends on the output of the randomizer, but it cannot
guarantee a time constraint.
->It might terminate fast, and if not, it might run for a longer period of time. In general the execution time of a
Las Vegas algorithms characterized as a random variable.
2.Monte Carlo algorithm:

->In contrast to Las Vegas algorithms, whose outputs might differ from run to run (for the same input).
->Monte Carlo algorithm is a probabilistic algorithm which, depending on the output of the randomizer, has a
slight probability of producing an incorrect result or failing to produce a result altogether.
->This probability of incorrectness is known and is typically small.
Example for randomized algorithm

->The problem of finding ‘a’ in an array of n elements:
->In a given array of n elements let half elements are ‘a’ and that of half are ‘b’. One approach of the approach
is to look at each element of the array and this is an expensive one and will take n/2 operations, if the array
were ordered as ‘b’s first followed by ‘a’s.
->With this approach we cannot guarantee that the algorithm will complete quickly.
Using randomized algorithm, if we look randomly then we can find ‘a’ quickly with high probability.
Identifying the Repeated Element

->Consider an array a[ ] of n numbers that has n/2 distinct elements and n/2 copies of another elements The
problem is to identify the repeated element.
->Any deterministic algorithm for solving this problem will need at least (n/2)+ 2 time steps in the worst case
->In contrast there is a simple and elegant randomized Las Vegas algorithm that takes only O(logn) time. It
randomly picks two array elements and checks whether they come from two different cells and have the
same value. If they do,the repeated element has been found. If not, this basic step of sampling is repeated as
many times as it takes to identify the repeated element
->The algorithm returns the array index of one of the copies of the repeated element

Primality Testing :
->Any integer greater than one is said to be a prime if its only divisors are 1 and the integer itself
->Given an integer ‘n’, the problem of deciding whether ‘n’ is a prime is known as primality testing. It has a
number of applications including cryptology
-> Assuming that it takes Θ(1)time to determine whether one integer divides another, the naive primality testing
algorithm has a run time of 0(^/n). The input size for this problem is [(logn+ 1)],since n can be represented in
binary form with these many bits.Thus the run time of this simple algorithm is exponential in the input
size(notice that i/n = 22logn). We can devise a Monte Carlo randomized algorithm for primality testing that
runs in time 0((logn)2). The output of this algorithm is correctwith high probability. If the input is prime,the
algorithm never gives an incorrect answer. However,if the input number is composite(i.e.,non prime),then there
is a small probability that the answer may be incorrect.Algorithms of this kind are said to have one-sided error
Advantages:
->The algorithm is usually simple and easy to implement.
->The algorithm is fast with very high probability, and/or it produces optimum output with very high
probability.
Disadvantages:
->The use of a randomizer within an algorithm doesn’t always result in better performance.
->Small probability of error cannot be tolerated in some critical applications such as nuclear reactor.

SETS & DISJOINT SET UNION
INTRODUCTION
->Set: It is a collection of elements

E. g. S= {1,2,3,….;…n}
->Disjoint Set:
If two sets A,B are said to be disjoint if A ∩ B =null i.e; there is no common element in both A and B.
E.g. S1 = {1,7,8,9}, S2 = {2,5,10}, S3= {3,4,6}
->Each set is represented as a tree
->Each node points towards its parent
The operations we wish to perform on these sets are:

1.Disjoint set union : If Si and Sj are two disjoint sets,then their union Si U Sj= all elements x such that x is in Si or
Sj.Thus, SiUS2 = {1,7, 8,9,2,5,10}. Since we have assumed that all sets are disjoint, we can assume that following the
union of Si and Sj,the sets Si and Sj do not exist independently;that is,they are replaced by Si U Sj in the collection of sets.
2.Find(i) : Given the element i,find the set containing i. Thus, 4 is in setS3,and 9 is in setS\\.
UNION AND FIND OPERATION

->To obtain the union of ,Si and S2
->Since we have linked the nodes from children to parent,we simply make one of the trees a sub tree of the
other.S1 U S2 could then have one of the representations of Figure2.1
To obtain the union of two sets,all that has to be done is to set the parent field of one of the roots to the other
root.This can be accomplished easily if, with each set name,we keep a pointer to the root of the tree
representing that set.If, in addition,each root has a pointer to the set name,then to determine which set an
element is currently in, we follow parent links to the root of its tree and use the pointer to the set name.The data
representation for S1,S2,and S3 may then take the form shown in Figure2.19

->In presenting the union and find algorithms,we ignore the set names and identify sets just by the roots of the
trees representing them
-> FindPointer is a function that takes a set name and determines the root of the tree that represents it. This is
done by an examination of the [set name,pointer] table.
-> Since the set elements are numbered 1through n, we represent the tree nodes using an array p[1 :n], where n
is the maximum number of element
-> The ith element of this array represents the tree node that contains element i. This array element gives the
parent pointer of the corresponding tree node.Figure2.20 shows this representation of the sets S1,S2, and S3
of Figure2.17.Notice that root nodes have a parent of -1
->We can now implement Find(i)by following the indices,starting at i until we reach a node with parent value
-1.For example,Find(6) starts at 6 and then moves to 6'sparent,3.Since p[3] is negative,we have reached the
root.
-> The operation Union(i,j)is equally simple.We pass in two trees with roots i and j. Adopting the convention
that the first tree becomes a sub tree of the second,the statement p[i] :=j; accomplishes the union.

Weighted Union
->If the number of nodes in the tree with root i is less than the number in the tree with root j, then make j the
parent of i; otherwise make i the parent of j.
Collapsing Rule :
->To find the root, we replace the parent pointer of the given node with a pointer to the root.
-> In fact, we can go one step further and replace the parent pointer of every node along the search path to
the root. This is called a collapsing find operation.

BASIC TRAVERSAL & SEARCH TECHNIQUES
The techniques to be discussed in this chapter are divided into two categories. The first category includes
techniques applicable only to binary trees.As described,these techniques involve examining every node in the
given data object instance.Hence,these techniques are referred to as traversal methods. The second category
includes techniques applicable to graphs(and hence also to trees and binary trees).These may not examine all
vertices and so are referred to only as search methods.During a traversal or search the fields of a node may be
used several times.It may be necessary to distinguish certain uses of the fields of a node.During these uses,the
node is said to be visited. Visiting a node may involve printing out its data field, evaluating the operation
specified by the node in the case of a binary tree representing an expression,setting a mark bit to one or
zero,and so on. Since we are describing traversals and searches of trees and graphs independently of their
application,we use the term "visited" rather than the term for the specific function performed on the node at this
time.
TECHNIQUES FOR GRAPHS :

-> Traversing the graph G = (V, E) means visiting all the vertices of the graph.
-> There are two standard methods by using which, we can traverse the graphs.
1. Breadth First Search (BFS)
2. Depth First Search (DFS)
Breadth First Search (BFS)

-> When we use the BFS algorithm for the traversal in a graph, we can consider any node as a root node.
->Breadth first search is a graph traversal algorithm that starts traversing the graph from root node and explores
all the neighboring nodes.
->Then, it selects the nearest node and explore all the unexplored nodes. The algorithm follows the same
process for each of the nearest node until it finds the goal.
->It is also known as level order traversal.
->The Queue data structure is used for the Breadth First Search traversal.





BFS ALGORITHM
Depth first search (DFS)

->A depth first search(DFS) the exploration of a vertex is suspended as soon as a new vertex is reached then
exploration of the new vertex begins.
->Depth first search (DFS) algorithm starts with the initial node of the graph G, and then goes to deeper and
deeper until we find the goal node or the node which has no children.
->The algorithm, then backtracks from the dead end towards the most recent node that is yet to be completely
unexplored.
->The data structure which is being used in DFS is stack.
->The process is similar to BFS algorithm.
-> In DFS, the edges that leads to an already visited node are called back edges.
DFS ALGORITHM





CONNECTED COMPONENTS AND SPANNING TREES :
Connected graph:
->A connected graph is a graph in which a path always exists from a vertex to any other vertex.
->A graph is connected if we can reach any vertex from any other vertex by following edges in either direction.
Connected Component Definition:
->A connected component or simply component of an undirected graph is a sub graph in which each pair of
nodes is connected with each other via a path.

SPANNING TREES
->If we have a graph containing V vertices and E edges, then the graph can be represented as: G(V, E)
->If we create the spanning tree from the above graph, then the spanning tree would have the same number of
vertices as the graph, but the edges are not equal.
->The edges in the spanning tree would be equal to the number of vertices in the graph minus 1.



BICONNECTED COMPONENTS AND DFS
Articulation point:
->A vertex v in a connected graph G is an articulation point if and only if the deletion of vertex v together with
all edges incident to v disconnects the graph into two or more non empty components.
->Articulation points are sometimes called cut vertices.
Biconnected Graph:
A graph is said to be Biconnected if:
1) It is connected, i.e. it is possible to reach every vertex from every other vertex, by a simple path.
2) Even after removing any vertex the graph remains connected.
Note that once the edges(vj, vj+i) of

line 6 (Algorithm6.8)are added,
vertex a is no longer an articulation
point Hence following the addition of
the edges corresponding to all
articulation points,G has no
articulation points and so is
biconnected
Problem of identifying the articulation points and biconnected components of a connected graph G with n > 2
vertices.The problem is efficiently solved by considering a depth first spanning tree of G. Figure6.9(a)and (b)
shows a depth first spanning tree of the graph of Figure6.6(a).In each figure there is a number out side each
vertex.These numbers correspond to the order in which a depth first search visits these vertices and are referred
to as the depth first numbers (dfns) of the vertex. Thus,dfn[l]= 1,dfn[4] = 2,dfn[6] = 8,and soon.In
Figure6.9(b)solid edges form the depth first spanning tree. These edges are called tree edges. Broken
edges(i.e.,all the remaining edges) are called back edge

->Initially, the algorithm chooses any random vertex to start the algorithm and marks its status as visited. The
next step is to calculate the depth of the selected vertex. The depth of each vertex is the order in which they
are visited.
->Next, we need to calculate the lowest discovery number. This is equal to the depth of the vertex reachable
from any vertex by considering one back edge in the DFS tree. An edge is a back edge if is an ancestor of
edge but not part of the DFS tree. But the edge is a part of the original graph.
->Let’s say we want to calculate the lowest discovery number for vertex 4. The neighbor vertex is 3, but it
doesn’t have any back edge. So the search goes on and reaches vertex 2 . Vertex 2 has a back edge, and it
connects to vertex 1. The lowest discovery number of is equal to the depth of 1. Therefore the lowest
discovery number of is 1 .
->L[u] = min {dfn[u], min {L[w} | w is a child of u},min {dfn[w}| (u,w) is a back edge}}
L[1: 10]= {1,1,1,1,6,8,6,6,5,4}
->After calculating the depth and lowest discovery number for all the vertices, the algorithms take a pair of
vertices and checks whether its an articulation point or not.
->Let’s take two vertices and . We’re further considering is a parent, and is a child vertex. If the lowest
discovery number of is greater than or equal to the depth of , then is an articulation point.
->There’s one special case when the root is an articulation point. The root is an articulation point if and only if
it has more then one child in the tree. That’s why we put a checking condition in our algorithm to identify the
root of the tree.
->Checking condition for articulation point: L[w] >dfn[u]
L[5] >=dfn[2] 6>=6 (True)
L[10] >=dfn[3] 4>=3 (True)
L[6] >=dfn[5] 8>=7 (True)
Hence, the vertices 2,3,5 are articulation points in the graph.
->Time Complexity
The algorithm mainly uses the DFS search with some additional steps. So basically, the time complexity of
this algorithm is equal to the time complexity of the DFS algorithm.
In case the given graph uses the adjacency list, the time complexity of the DFS algorithm would be (O+E) .
Therefore, the overall time complexity of this algorithm is(O+E) .
Applications
->The concept of articulation points is very crucial in a network. The presence of articulation points in a
connected network is an indication of high risk and vulnerability. If a connected network has an articulation
point, then a single point failure would disconnect the whole network.
->In computer science, one popular problem is to find out the maximum amount of flow passing from a source
vertex to a sink vertex. This problem is known as the max-flow min-cut theorem. The concept of articulation
point is used here to find the solution.
Algorithm to find Bi-connected components in the graph
Graph
FIG1
-> So simply check if the given graph has any articulation point or not. If it has no articulation point then it is
Biconnected otherwise not.
->Clearly for vertex 4 and its child 1, low[1]≥disc[4], so that means 4 is an articulation point. The algorithm
returns false as soon as it discovers that 4 is an articulation point and will not go on to check low[] for vertices
0, 2 and 3.

->First it finds visited[0] is false so it starts with vertex 0 and first discovers the edge 0-3 and pushes it in the
stack.
->Then with 3 it finds the edge 3-2 and pushes that in stack.
->With 2 it finds the edge 2-4 and pushes that in stack.

->For 4 it first finds edge 4-1 and pushes that in stack.
->It then discovers the fact that low[1]≥disc[4] i.e. discovers that 4 is an articulation point. So all the edges
inserted after the edge 4-1 along with edge 4-1 will form first biconnected component. So it pops and print
the edges till last edge is 4-1 and then prints that too and pop it from the stack
Then it discovers the edge 4-5 and pushes that in stack
->For 5 it discovers the back edge 5-2 and pushes that in stack
-> After that no more edge is connected to 5 so it goes back to 4. For 4 also no more edge is connected and
also low[5]≱ disc[4].
->Then it goes back to 2 and discovers that low[4]≥disc[2], that means 2 is an articulation point. That means all
the edges inserted after the edge 4-2 along with the edge 4-2 will form the second biconnected component.
So it print and pop all the edges till last edge is 4-2 and then print and pop that too.
->Then it goes to 3 and discovers that 3 is an articulation point as low[2]≥disc[3] so it print and pops till last
edge is 3-2 and the print and pop that too. That will form the third biconnected component.
->Now finally it discovers that for edge 0-3 also low[3]≥disc[0] so it pops it from the stack and print it as the
fourth biconnected component.
Then it checks visited[] value of other vertices and as for all vertices it is true so the algorithm terminates.
So ultimately the algorithm discovers all the 4 biconnected components shown
->Time complexity of the algorithm is same as that of DFS. If V is the number of vertices and E is the number
of edges then complexity is O(V+E).

Design and Analysis of Algorithms

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Design and Analysis of Algorithms

Uploaded by

Copyright:

Available Formats

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT 1

DESIGN AND ANALYSIS OF ALGORITHMS

UNIT I Introduction: Algorithm Definition, Algorithm Specification, performance Analysis, Randomized

Differences between Algorithm and Program

Issues or study of Algorithm:

How to device algorithms-

How to validate algorithms:

How to analyze algorithms:

How to test a program:

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 2

Pseudo-Code Conventions for expressing algorithms:

8. A conditional statement has the following forms.

Case statement: A case statement is interpreted as follows. If (condition1)

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 4

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 5

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 6

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 7

There are three main techniques used for amortized analysis:

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 8

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 9

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 10

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 11

example : consider f(n)=2n+3 and g(n)=n^2

2 . OMEGA (Ω) NOTATION:

Example : consider f(n)=2n+5, g(n)=2n

3. THETA (Θ) NOTATION:

let us assuming as c2=4 c2*g(n)=4n

Mathematical Relation of Little o notation

Example on little o asymptotic notation

5. LITTLE OMEGA NOTATION :

When randomized algorithms are used?

Types of Randomized algorithms

1.Las Vegas algorithm:

2.Monte Carlo algorithm:

Example for randomized algorithm

Identifying the Repeated Element

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 15

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 16

->Set: It is a collection of elements

The operations we wish to perform on these sets are:

UNION AND FIND OPERATION

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 17

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 18

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 19

TECHNIQUES FOR GRAPHS :

Breadth First Search (BFS)

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 20

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 21

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 22

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 23

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 24

Depth first search (DFS)

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 26

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 27

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 28

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 29

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 30

CONNECTED COMPONENTS AND SPANNING TREES :

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 31

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 32

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 33

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 34