
Design and Analysis of Algorithms

• An algorithm is a set of steps of operations to solve
a problem by performing calculation, data processing,
and automated reasoning tasks.
• An algorithm is an efficient method that can be
expressed within a finite amount of time and space.
• An algorithm is the best way to represent the
solution of a particular problem in a simple and
efficient way.
• If we have an algorithm for a specific problem,
then we can implement it in any programming
language, meaning that the algorithm is
independent of any particular programming
language.
Algorithm Design

• The most important aspect of algorithm design is creating
an algorithm that solves a problem efficiently,
using minimum time and space.
• To solve a problem, different approaches can be followed.
Some of them can be efficient with respect to time
consumption, whereas other approaches may be memory
efficient.
• However, one has to keep in mind that time
consumption and memory usage often cannot be optimized
simultaneously. If we require an algorithm to run in less
time, we typically have to invest more memory, and if we
require an algorithm to run with less memory, we need to
allow more time.
Problem Development Steps

The following steps are involved in solving computational
problems:
• Problem definition
• Development of a model
• Specification of an Algorithm
• Designing an Algorithm
• Checking the correctness of an Algorithm
• Analysis of an Algorithm
• Implementation of an Algorithm
• Program testing
• Documentation
Characteristics of Algorithms

The main characteristics of algorithms are as
follows:
• Algorithms must have a unique name
• Algorithms should have an explicitly defined set of
inputs and outputs
• Algorithms are well-ordered, with unambiguous
operations
• Algorithms halt in a finite amount of time; an
algorithm must not run forever, i.e., it must end at
some point
Analysis of algorithms
• Algorithm analysis is an important part of
computational complexity theory, which provides
theoretical estimation for the required resources of an
algorithm to solve a specific computational problem.
• Most algorithms are designed to work with inputs of
arbitrary length. Analysis of algorithms is the
determination of the amount of time and space
resources required to execute it.
• Usually, the efficiency or running time of an algorithm
is stated as a function relating the input length to the
number of steps, known as time complexity, or volume
of memory, known as space complexity.
• Analysis of an algorithm is the process of analyzing its problem-
solving capability in terms of the time and the size of memory
required for storage during implementation.
• However, the main concern of analysis of algorithms is the required
time or performance. Generally, we perform the following types of
analysis −
• Worst-case − The maximum number of steps taken on any instance
of size n.
• Best-case − The minimum number of steps taken on any instance of
size n.
• Average case − The average number of steps taken over all instances
of size n.
• Amortized − The cost of a sequence of operations applied to an input
of size n, averaged over the sequence. (A small illustration follows.)
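
To make these cases concrete, here is a minimal sketch (my illustration, not
from the slides) using linear search:

    # Linear search: return the index of target in arr, or -1 if absent.
    def linear_search(arr, target):
        for i, value in enumerate(arr):
            if value == target:   # best case: target is arr[0], one step
                return i
        return -1                 # worst case: target absent, n steps

    # Best case: O(1); worst case: O(n); average case: about n/2
    # comparisons for a uniformly random position, which is still O(n).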
Proving Correctness
Proof by:
• Counterexample (indirect proof)
• Induction (direct proof)
• Loop Invariant

Other approaches:
• proof by cases/enumeration
• proof by chain
• proof by contradiction
• proof by contrapositive
• For any algorithm, we must prove that it always returns
the desired output for all legal instances of the
problem. For sorting, this means the algorithm must
work even if the input is already sorted or contains
repeated elements.
Proof by Counterexample
• Searching for counterexamples is the best way to
disprove a claim.
• Identify a case for which the claim is NOT true.
• If a proof seems hard or tricky, sometimes a
counterexample works.
• Sometimes a counterexample is easy to see and
can shortcut a proof.
• If a counterexample is hard to find, a proof might be
easier.
Proof by Induction
• Failure to find a counterexample to a given
algorithm does not mean “it is obvious" that the
algorithm is correct.
• Mathematical induction is a very useful method
for proving the correctness of recursive
algorithms.
• 1. Prove base case
• 2. Assume true for arbitrary value n
• 3. Prove true for case n + 1
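
For example (a standard illustration, not from the slides), to prove that
1 + 2 + ... + n = n(n + 1)/2 for all n ≥ 1:
1. Base case: for n = 1, both sides equal 1.
2. Assume the formula holds for an arbitrary n.
3. Then 1 + 2 + ... + n + (n + 1) = n(n + 1)/2 + (n + 1) = (n + 1)(n + 2)/2,
which is the formula with n + 1 in place of n.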
Proof by Loop Invariant

• Built on proof by induction.
• Useful for algorithms that loop.
Formally: Find loop invariant, then prove:
1. Define a Loop Invariant
2. Initialization
3. Maintenance
4. Termination
Informally:
1. Find p, a loop invariant
2. Show the base case for p
3. Use induction to show the rest.
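
A minimal sketch (my example, not from the slides) of a loop invariant
proving a running-sum computation correct:

    # Invariant p: before iteration i, total == arr[0] + ... + arr[i-1].
    def array_sum(arr):
        total = 0
        # Initialization: before the first iteration, total is 0, the empty sum.
        for i in range(len(arr)):
            # Maintenance: if p holds here, adding arr[i] re-establishes p
            # for the next iteration.
            total += arr[i]
        # Termination: the loop ends with i == len(arr), so p says total is
        # the sum of the whole array.
        return total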
Proof by Counterexample
• Used to prove statements false, or algorithms
either incorrect or non-optimal
Examples: Counterexample
Prove or disprove: ⌊x + y⌋ = ⌊x⌋ + ⌊y⌋
Proof by counterexample: x = 1/2 and y = 1/2, since
⌊1/2 + 1/2⌋ = 1 but ⌊1/2⌋ + ⌊1/2⌋ = 0 + 0 = 0.
Prove or disprove: "Every positive integer is the
sum of two squares of integers."
Proof by counterexample: 3 (the only squares not
exceeding 3 are 0 and 1, and no two of these sum to 3).
Greedy Algorithms
• An algorithm that selects the best choice at each step,
instead of considering all sequences of steps that may
lead to an optimal solution.
• It's usually straightforward to find a greedy algorithm
that is feasible, but hard to find a greedy algorithm that
is optimal.
• Either prove the solution optimal, or find a
counterexample for which the algorithm yields a non-
optimal solution.
• An algorithm can be greedy even if it doesn't produce
an optimal solution.
Fractional Knapsack Problem - Greedy Approach
Greedy Approach
• For the fractional knapsack problem, the greedy
approach yields an optimal (maximum-profit) solution.
• Each greedy choice must be:
• Feasible: It has to satisfy the problem
constraints
• Locally Optimal: It has to be the best local
choice
• Irrevocable: Once made, it cannot be changed
in subsequent steps of the algorithm.
Steps for the fractional knapsack problem
Step 1:
For each product, compute its profit / weight
ratio.
Step 2:
• Starting from the product with the highest
ratio, put the products into the knapsack,
taking only a fraction of the last product if it
does not fit entirely (see the sketch below).
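
A minimal sketch of this procedure (my illustration; the function name and
item format are assumptions):

    # Greedy fractional knapsack: items are (profit, weight) pairs.
    def fractional_knapsack(items, capacity):
        # Step 1: sort by profit/weight ratio, highest first.
        items = sorted(items, key=lambda pw: pw[0] / pw[1], reverse=True)
        total = 0.0
        for profit, weight in items:
            if capacity >= weight:      # the whole product fits
                total += profit
                capacity -= weight
            else:                       # take only the fraction that fits
                total += profit * (capacity / weight)
                break
        return total

    # Example: capacity 50 with items (60, 10), (100, 20), (120, 30)
    print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))  # 240.0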
Huffman Coding
• Huffman codes provide a method of encoding data
efficiently.
• When characters are coded using standard codes
like ASCII or Unicode, each character is
represented by a fixed-length code word of bits
(e.g., 8 or 16 bits per character).
• Fixed-length codes are popular; however, they
may not be the most efficient from
the perspective of minimizing the total quantity
of data.
Huffman coding
• It is used for reducing the size of data
• Data compression algorithm
• Lossless compression - without loss of
information
• Reduces the cost of storing and transmitting data
Idea
Consider the paths from the root to each of
the leaves A, B, C, D:

A : 0
B : 10
C : 110
D : 111

[Figure: the binary code tree, with left edges labelled 0 and right
edges labelled 1; A is the root's left child, B sits one level deeper,
and C and D are the deepest leaves.]
Observe:
1. This is a prefix code, since each of the
leaves has a path ending in it, without
continuation.
2. If the tree is full then we are not
“wasting” bits.
3. If we make sure that the more frequent
symbols are closer to the root, then they
will have a smaller code.

[Figure: the same code tree repeated.]
Greedy Algorithm:

1. Consider all pairs: <frequency, symbol>.

2. Choose the two lowest frequencies, and
make them siblings, with the new root
having the combined frequency.

3. Iterate.
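
A minimal sketch of this greedy procedure using a binary min-heap (my
illustration; the exact bits assigned depend on tie-breaking, so they may
differ from the run below, but the code lengths and total cost match):

    import heapq

    # Build Huffman codes from a frequency table.
    def huffman_codes(freq):
        # Heap entries: (frequency, tie_breaker, tree); a tree is either a
        # symbol or a (left, right) pair. The tie_breaker keeps comparisons
        # on integers when frequencies are equal.
        heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)    # two lowest frequencies...
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, count, (left, right)))  # ...merged
            count += 1
        codes = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):
                walk(tree[0], prefix + "0")      # left edge labelled 0
                walk(tree[1], prefix + "1")      # right edge labelled 1
            else:
                codes[tree] = prefix or "0"      # single-symbol edge case
            return codes
        return walk(heap[0][2], "")

    # The example below: six symbols with frequencies 10..60.
    print(huffman_codes({"A": 10, "B": 20, "C": 30, "D": 40, "E": 50, "F": 60}))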
Greedy Algorithm Example:

Alphabet: A, B, C, D, E, F

Frequency table:
A   B   C   D   E   F
10  20  30  40  50  60

Total File Length: 210

Algorithm Run:
• Start: A 10, B 20, C 30, D 40, E 50, F 60
• Merge A 10 and B 20 into X 30 (children A, B):
  X 30, C 30, D 40, E 50, F 60
• Merge X 30 and C 30 into Y 60 (children X, C):
  D 40, E 50, Y 60, F 60
• Merge D 40 and E 50 into Z 90 (children D, E):
  Y 60, F 60, Z 90
• Merge Y 60 and F 60 into W 120 (children Y, F):
  Z 90, W 120
• Merge Z 90 and W 120 into the root V 210,
  with left child Z and right child W.
The Huffman encoding (reading the 0/1 edge labels from the root
V 210 down to each leaf):
A: 1000
B: 1001
C: 101
D: 00
E: 01
F: 11

[Figure: the final tree rooted at V 210, with subtrees Z 90 (children
D 40, E 50) and W 120 (children Y 60 and F 60, where Y's children are
X 30 and C 30, and X's children are A 10 and B 20).]

File Size: 10x4 + 20x4 + 30x3 + 40x2 + 50x2 + 60x2 =
40 + 80 + 90 + 80 + 100 + 120 = 510 bits
Note the savings:

The Huffman code:
Requires 510 bits for the file.

Fixed-length code:
Needs 3 bits per character for 6 characters.
The file has 210 characters.
Total: 630 bits for the file.


Time complexity:

O(n log n), where n is the number of unique characters. If there are n
nodes, extractMin() is called 2(n - 1) times, and extractMin() takes O(log n)
time as it calls minHeapify().
So the overall complexity is O(n log n).
If the input array is sorted, there exists a linear-time algorithm.

Applications of Huffman Coding:

• They are used for transmitting fax and text.
• They are used by conventional compression formats like PKZIP, GZIP,
etc.
• Multimedia codecs like JPEG, PNG, and MP3 use Huffman encoding (or,
more precisely, prefix codes).
• It is useful in cases where there is a series of frequently occurring
characters.
Karatsuba faster integer multiplication
algorithm
• To multiply two n-bit integers, the
Karatsuba algorithm uses the divide and conquer
technique and takes O(n^log2 3) ≈ O(n^1.585) bit
operations. It performs the multiplication
by replacing one of the four half-size
multiplications with subtraction and addition
operations, which are less costly.
• Note: Karatsuba problem is attached in
separate PDF file in VTOP(please check it)
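
As a rough sketch of the idea (my illustration, not the version in the
attached PDF):

    # Karatsuba multiplication of nonnegative integers (decimal splitting).
    def karatsuba(x, y):
        if x < 10 or y < 10:                 # base case: a single-digit factor
            return x * y
        m = max(len(str(x)), len(str(y))) // 2
        high_x, low_x = divmod(x, 10 ** m)   # split each number around 10^m
        high_y, low_y = divmod(y, 10 ** m)
        a = karatsuba(high_x, high_y)        # product of the high parts
        c = karatsuba(low_x, low_y)          # product of the low parts
        # One multiplication replaces two:
        # (hx + lx)(hy + ly) - a - c == hx*ly + lx*hy
        b = karatsuba(high_x + low_x, high_y + low_y) - a - c
        return a * 10 ** (2 * m) + b * 10 ** m + c

    print(karatsuba(1234, 5678))  # 7006652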
Maximum Subarray Sum using Divide
and Conquer algorithm
• Given a one-dimensional array that may contain both positive
and negative integers, find the contiguous
subarray of numbers with the largest sum.
• Using the Divide and Conquer approach, we can find the
maximum subarray sum in O(n log n) time:
• Divide the given array into two halves
• Return the maximum of the following three
– Maximum subarray sum in the left half (make a
recursive call)
– Maximum subarray sum in the right half (make a
recursive call)
– Maximum subarray sum such that the subarray
crosses the midpoint
• We can find the crossing sum in linear time. The
idea is simple: find the maximum sum starting from the
midpoint and ending at some point to its left, then find
the maximum sum starting from mid + 1 and ending at
some point to its right, and add the two.
Example:

• Input array = [-6, -2, 8, 3, 4, -2]; the halves are
[-6, -2, 8] and [3, 4, -2].
• Maximum contiguous subarray sum on the left side,
LS = 8.
• Maximum contiguous subarray sum on the right side,
RS = 7.
• LSS (left-side sum ending at mid) = Max{8, 8-2, 8-2-6} = 8
• RSS (right-side sum starting at mid+1) = Max{3, 3+4, 3+4-2} = 7
• Cross sum CS = LSS + RSS = 8 + 7 = 15; the
midpoint-crossing subarray sum is 15.
• Therefore select max{LS, RS, CS}, i.e. max{8, 7, 15} = 15.
• Hence the maximum subarray sum is 15.
Time analysis
• Find-Max-Cross-Subarray: O(n) time
• Two recursive calls on input size n/2
• Thus:
T(n) = 2T(n/2) + O(n)
T(n) = O(n log n)
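
A compact sketch of this recursion (my illustration; function names are
assumptions):

    # Maximum subarray sum by divide and conquer, O(n log n).
    def max_crossing_sum(arr, lo, mid, hi):
        left_sum, total = float("-inf"), 0
        for i in range(mid, lo - 1, -1):     # extend left from the midpoint
            total += arr[i]
            left_sum = max(left_sum, total)
        right_sum, total = float("-inf"), 0
        for i in range(mid + 1, hi + 1):     # extend right from mid + 1
            total += arr[i]
            right_sum = max(right_sum, total)
        return left_sum + right_sum

    def max_subarray(arr, lo, hi):
        if lo == hi:
            return arr[lo]
        mid = (lo + hi) // 2
        return max(max_subarray(arr, lo, mid),          # left half
                   max_subarray(arr, mid + 1, hi),      # right half
                   max_crossing_sum(arr, lo, mid, hi))  # crosses the midpoint

    print(max_subarray([-6, -2, 8, 3, 4, -2], 0, 5))  # 15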
Dynamic Programming
• Dynamic Programming is a technique of breaking down
a problem into sub problems, solving these sub
problems once, and storing their solutions.
• The main use of dynamic programming is to solve
optimization problems: problems where we are trying
to find the minimum or the maximum solution.
• With Dynamic Programming, what we are ultimately hoping
to achieve is a significantly faster computation time at
the expense of a modest increase in the space used.
Approaches
• There are essentially two ways you can go
about storing solutions to problems:
• Memoization (top-down approach)
• Tabulation (bottom-up approach)
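
A minimal sketch contrasting the two approaches (the classic Fibonacci
example, my illustration):

    from functools import lru_cache

    # Memoization (top-down): recurse, caching each subproblem's answer.
    @lru_cache(maxsize=None)
    def fib_memo(n):
        if n < 2:
            return n
        return fib_memo(n - 1) + fib_memo(n - 2)

    # Tabulation (bottom-up): fill a table from the smallest subproblems up.
    def fib_tab(n):
        table = [0, 1] + [0] * max(0, n - 1)
        for i in range(2, n + 1):
            table[i] = table[i - 1] + table[i - 2]
        return table[n]

    print(fib_memo(30), fib_tab(30))  # 832040 832040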
Assembly line Scheduling using
Dynamic Programming
• In a manufacturing industry, products are produced
using assembly lines. Multiple lines
work together to produce a usable product,
and the completed product exits at the
end of a line.
• To make a product in the optimal time, we
should choose the best assembly line
at each station so that the company can make
the product in the least total time.
Problem Statement

• The main goal of solving the Assembly Line
Scheduling problem is to determine which
stations to choose from line 1 and which to
choose from line 2 in order to minimize total
assembly time.
• A dynamic programming (dp) array stores the partial
results calculated for each station in an
assembly line, and a line-number table records
which line each optimized partial solution comes
from (the line-number table is for explanation only).
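
A minimal sketch of the tabulation (my illustration; the parameter names
follow the standard formulation, and the example numbers are the classic
textbook instance, not this slide's figure):

    # Assembly line scheduling with two lines and n stations.
    # a[i][j]: processing time at station j of line i
    # t[i][j]: time to transfer to the other line after station j of line i
    # e[i], x[i]: entry and exit times of line i
    def assembly_line(a, t, e, x):
        n = len(a[0])
        dp = [[0] * n for _ in range(2)]   # dp[i][j]: fastest time through
        dp[0][0], dp[1][0] = e[0] + a[0][0], e[1] + a[1][0]
        for j in range(1, n):
            dp[0][j] = min(dp[0][j - 1], dp[1][j - 1] + t[1][j - 1]) + a[0][j]
            dp[1][j] = min(dp[1][j - 1], dp[0][j - 1] + t[0][j - 1]) + a[1][j]
        return min(dp[0][n - 1] + x[0], dp[1][n - 1] + x[1])

    a = [[7, 9, 3, 4, 8, 4], [8, 5, 6, 4, 5, 7]]
    t = [[2, 3, 1, 3, 4], [2, 1, 2, 2, 1]]
    print(assembly_line(a, t, e=[2, 4], x=[3, 2]))  # 38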
Time Complexity

• As the dynamic programming tabulation array
is used and n iterations are done to store
the optimal times in the array, the
time complexity of the above Dynamic
Programming implementation of assembly
line scheduling is O(n).
• We store the optimal time taken to pass
through each station in an array, so the
space complexity is O(n).
Matrix chain multiplication using
dynamic programming
• Matrix chain multiplication (or the matrix
chain ordering problem) is an optimization
problem concerning the most efficient way to
multiply a given sequence of matrices. The
problem is not actually to perform the
multiplications, but merely to decide the
sequence of the matrix multiplications
involved. The problem may be solved using
dynamic programming.
Example of Matrix Chain
Multiplication

• Example: We are given the dimension sequence {4, 10, 3,
12, 20, 7}. The matrices have sizes 4 x 10,
10 x 3, 3 x 12, 12 x 20, 20 x 7. We need to
compute M[i, j], 1 ≤ i ≤ j ≤ 5. We know M[i, i] =
0 for all i.
• Let us proceed by working away from the
diagonal. We first compute the optimal solution
for the product of 2 matrices.
• Here p0 to p5 are the dimensions, and matrix Mi
has size p(i-1) x p(i).
• In Dynamic Programming, the table is initialized
with '0' on the main diagonal and then filled
diagonal by diagonal.
• We have to work out all the split combinations, but only
the minimum-cost combination is taken into
consideration.
• Calculation of the product of 2 matrices:
1. m(1, 2) = M1 x M2 = (4 x 10) (10 x 3) = 4 x 10 x 3 = 120
2. m(2, 3) = M2 x M3 = (10 x 3) (3 x 12) = 10 x 3 x 12 = 360
3. m(3, 4) = M3 x M4 = (3 x 12) (12 x 20) = 3 x 12 x 20 = 720
4. m(4, 5) = M4 x M5 = (12 x 20) (20 x 7) = 12 x 20 x 7 = 1680
• We initialize the diagonal elements with equal i, j
values to '0'.
• After that, the second diagonal is worked out and we
get all the values corresponding to it.
• The third diagonal is then solved in the
same way.
• Now the product of 3 matrices:
• M [1, 3] = M1 M2 M3
• There are two cases by which we can solve this
multiplication: (M1 x M2) x M3 and M1 x (M2 x M3).
• After solving both cases, we choose the one with the
minimum cost:
(M1 x M2) x M3 = 120 + 4 x 3 x 12 = 264
M1 x (M2 x M3) = 360 + 4 x 10 x 12 = 840

• M [1, 3] = 264
• Comparing both outputs, 264 is the minimum, so we insert 264
into the table, and (M1 x M2) x M3 is the combination
chosen for the output.
• M [2, 4] = M2 M3 M4
• There are two cases by which we can solve this
multiplication: (M2 x M3) x M4 and M2 x (M3 x M4).
• After solving both cases, we choose the one with the
minimum cost:
(M2 x M3) x M4 = 360 + 10 x 12 x 20 = 2760
M2 x (M3 x M4) = 720 + 10 x 3 x 20 = 1320

• M [2, 4] = 1320
• Comparing both outputs, 1320 is the minimum, so we insert 1320
into the table, and M2 x (M3 x M4) is the combination
chosen for the output.
• M [3, 5] = M3 M4 M5
• There are two cases by which we can solve
this multiplication: (M3 x M4) x M5 and
M3 x (M4 x M5).
• After solving both cases, we choose the one with the
minimum cost:
(M3 x M4) x M5 = 720 + 3 x 20 x 7 = 1140
M3 x (M4 x M5) = 1680 + 3 x 12 x 7 = 1932
• Comparing both outputs, 1140 is the minimum,
so we insert M [3, 5] = 1140 into the table, and
(M3 x M4) x M5 is the combination chosen for
the output.
• Now the product of 4 matrices:
• M [1, 4] = M1 M2 M3 M4
• There are three cases by which we can solve
this multiplication:
• (M1 x M2 x M3) x M4 = 264 + 4 x 12 x 20 = 1224
• M1 x (M2 x M3 x M4) = 1320 + 4 x 10 x 20 = 2120
• (M1 x M2) x (M3 x M4) = 120 + 720 + 4 x 3 x 20 = 1080
• After solving these cases, we choose the one
with the minimum cost.
Comparing the outputs of the different cases, '1080' is the minimum, so we
insert M [1, 4] = 1080 into the table, and (M1 x M2) x (M3 x M4) is the
combination taken for the output.
• M [2, 5] = M2 M3 M4 M5
• There are three cases by which we can solve
this multiplication:
• (M2 x M3 x M4) x M5 = 1320 + 10 x 20 x 7 = 2720
• M2 x (M3 x M4 x M5) = 1140 + 10 x 3 x 7 = 1350
• (M2 x M3) x (M4 x M5) = 360 + 1680 + 10 x 12 x 7 = 2880
• After solving these cases, we choose the one with the
minimum cost: M [2, 5] = 1350, using M2 x (M3 x M4 x M5).
• Now the product of 5 matrices:
• M [1, 5] = M1 M2 M3 M4 M5
• There are four cases (one per split point) by which we can
solve this multiplication:
• (M1 x M2 x M3 x M4) x M5 = 1080 + 4 x 20 x 7 = 1640
• M1 x (M2 x M3 x M4 x M5) = 1350 + 4 x 10 x 7 = 1630
• (M1 x M2 x M3) x (M4 x M5) = 264 + 1680 + 4 x 12 x 7 = 2280
• (M1 x M2) x (M3 x M4 x M5) = 120 + 1140 + 4 x 3 x 7 = 1344
• After solving these cases, we choose the one with the
minimum cost: M [1, 5] = 1344, using (M1 x M2) x (M3 x M4 x M5).
So the optimal parenthesization is (M1 M2) ((M3 M4) M5), with a
total of 1344 scalar multiplications (see the sketch below).
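
A minimal sketch of the tabulation just carried out (my illustration, not
from the slides):

    # Matrix chain order: p[i-1] x p[i] is the size of matrix Mi.
    def matrix_chain_order(p):
        n = len(p) - 1                      # number of matrices
        # M[i][j]: minimum scalar multiplications for Mi..Mj (1-indexed)
        M = [[0] * (n + 1) for _ in range(n + 1)]
        for length in range(2, n + 1):      # work away from the diagonal
            for i in range(1, n - length + 2):
                j = i + length - 1
                M[i][j] = min(M[i][k] + M[k + 1][j] + p[i - 1] * p[k] * p[j]
                              for k in range(i, j))   # try every split point
        return M[1][n]

    print(matrix_chain_order([4, 10, 3, 12, 20, 7]))  # 1344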
Longest common subsequence

• A longest common subsequence (LCS) is the
longest subsequence common to all sequences in
a set of sequences (often just two sequences).
• The problem of computing longest common
subsequences is a classic computer science
problem, the basis of data comparison programs
such as the diff utility, and has applications in
computational linguistics and bioinformatics.
• It is also widely used by revision control systems
such as Git for reconciling multiple changes made
to a revision-controlled collection of files.
• Let's understand subsequences through an example.
• Suppose we have a string 'w':
• W1 = abcd
• The following are some of the subsequences that can be created from the
above string:
• ab
• bd
• ac
• ad
• acd
• bcd
• These are valid subsequences, as all the characters in a subsequence
appear in increasing order of their positions in the original string.
• If we write ca or da, it would be a wrong subsequence, as the characters
do not appear in increasing order of position.
• The total number of subsequences possible is 2^n, where n is
the number of characters in the string. In the above string, the value of 'n' is
4, so the total number of subsequences is 16.
• W1 = abcd
• W2 = bcd
• By simply looking at both strings W1 and
W2, we can say that bcd is the longest
common subsequence.
• If the strings are long, it is not practical to
enumerate the subsequences of both
strings and compare them to find the longest
common subsequence.
• Instead, we find the LCS using dynamic
programming with the help of a table.
• Consider two strings:
• X= a b a a b a
• Y= b a b b a b
(a, b)
• For index i=1, j=1
• Since the two characters are different, we take
the maximum of the cell above and the cell to the
left. Both contain the same value, i.e., 0, so we put
0 in (a, b). Supposing we take the 0 value from
string 'X', we point the arrow towards 'a', as shown
in the table.
• (a, a)
• For index i=1, j=2
• Both characters are the same, so the value
is calculated by adding 1 to the upper
diagonal value. Here, the upper diagonal value is
0, so the value of this entry is (1 + 0) = 1.
Since we are using the upper diagonal value,
the arrow points diagonally.
• (a, b)
• For index i=1, j=3, the characters differ again, and
the remaining entries are filled in the same way.
• Once all the entries in the table are filled, we trace back
from the last cell, which holds the value 4. This cell
points left to a cell that also contains 4; from there a
diagonal move records the character 'a' (the LCS is
recovered in reverse).
• That cell points diagonally upwards to a cell whose value
is 3; therefore the next character is 'b', and the LCS so far
becomes 'ba'. The next cell has the value 2 and points left.
The cell after that also has the value 2 and points upwards;
therefore the next character is 'a', and the LCS becomes
'aba'.
• The next cell has the value 1 and points upwards. We then
reach the cell (b, b), whose value points diagonally
upwards; therefore the next character is 'b'. The final
longest common subsequence is 'baba'.
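
A minimal sketch of this table construction and traceback (my illustration;
tie-breaking may recover a different LCS of the same length):

    # Longest common subsequence via a DP table plus traceback.
    def lcs(X, Y):
        m, n = len(X), len(Y)
        # dp[i][j]: LCS length of X[:i] and Y[:j]
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if X[i - 1] == Y[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1   # 1 + upper diagonal
                else:
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
        out, i, j = [], m, n                # trace back from the last cell
        while i > 0 and j > 0:
            if X[i - 1] == Y[j - 1]:
                out.append(X[i - 1]); i -= 1; j -= 1
            elif dp[i - 1][j] >= dp[i][j - 1]:
                i -= 1
            else:
                j -= 1
        return "".join(reversed(out))       # characters were found in reverse

    print(lcs("abaaba", "babbab"))  # a length-4 LCS, e.g. 'abba' or 'baba'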
N-Queens Problem

• The N-Queens problem is to place n queens on
an n x n chessboard in such a manner that no two
queens attack each other by being in the same row,
column, or diagonal.
• It can be seen that for n = 1 the problem has a
trivial solution, and no solution exists for n = 2 or
n = 3. So first we will consider the 4-queens
problem and then generalize it to the n-queens
problem.
• Take a 4 x 4 chessboard and number its rows
and columns 1 through 4.
• We have to place 4 queens q1, q2, q3,
and q4 on the chessboard such that no two
queens attack each other. Under this condition,
each queen must be placed on a different row,
i.e., we put queen "i" on row "i".
• Now, we place queen q1 in the first acceptable position, (1, 1).
Next, we place queen q2 so that the two queens do not attack
each other. We find that placing q2 in columns 1 or 2 leads to a
dead end.
• Thus the first acceptable position for q2 is column 3, i.e. (2, 3), but
then no position is left for placing queen q3 safely. So we backtrack
one step and place queen q2 at (2, 4), the next possible
position.
• Then we obtain the position (3, 2) for placing q3. But
this position also leads to a dead end, and no place is found where
q4 can be placed safely.
• Then we have to backtrack all the way to q1 and move it to (1, 2);
then all the other queens can be placed safely, with q2 at (2, 4),
q3 at (3, 1), and q4 at (4, 3). That is, we get the solution (2, 4, 1, 3).
• This is one possible solution to the 4-queens problem. For other
solutions, the whole method is repeated for other partial
solutions. Another solution for the 4-queens problem is (3, 1, 4, 2)
(see the sketch below).
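
A minimal backtracking sketch of this procedure (my illustration;
board[i] holds the column of the queen on row i+1):

    # N-Queens by backtracking over rows.
    def solve_n_queens(n, board=()):
        row = len(board)
        if row == n:
            return [board]                   # all n queens placed safely
        solutions = []
        for col in range(1, n + 1):
            # Safe if no earlier queen shares this column or a diagonal.
            if all(c != col and abs(c - col) != row - r
                   for r, c in enumerate(board)):
                solutions += solve_n_queens(n, board + (col,))
        return solutions    # backtracking happens as each loop resumes

    print(solve_n_queens(4))  # [(2, 4, 1, 3), (3, 1, 4, 2)]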
Subset Sum Problem

• The subset sum problem is the problem of finding
a subset such that the sum of its elements equals
a given number.
• The backtracking approach generates all
subsets in the worst case but, in general,
performs better than the plain recursive
approach to the subset sum problem.
• The problem is to find a subset of a given set
S = {s1, s2, s3, ..., sn} of n positive integers
whose sum is equal to a given positive
integer "m".
Let us consider S={1,3,4,5} and m=8
The possible solutions are
{1,3,4} and {3,5}
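
A minimal backtracking sketch with pruning (my illustration):

    # Subset sum by backtracking; pruning is valid as elements are positive.
    def subset_sum(s, m, chosen=(), start=0):
        total = sum(chosen)
        if total == m:
            return [chosen]                 # found a subset summing to m
        solutions = []
        for i in range(start, len(s)):
            if total + s[i] <= m:           # prune branches that overshoot
                solutions += subset_sum(s, m, chosen + (s[i],), i + 1)
        return solutions

    print(subset_sum([1, 3, 4, 5], 8))  # [(1, 3, 4), (3, 5)]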
Graph Colouring
• The backtracking approach to solving the
graph colouring problem can be to assign the
colours one after the other to the various
vertices.
• If the current colour assignment does not
violate the constraint, add it to the
solution; otherwise, backtrack by returning false.
Graph Colouring
• The goal is to color vertices in a graph G={V,E}
so that no 2 adjacent vertices have the same
color. Partial 3-coloring problem means only 3
colors are considered.
• Direct approach builds the tree of ALL
possibilities in exponential time.
• Partial 3-coloring (3 colors) is solved by the following method:
• Color the first vertex with the 1st color, then color the next vertex with a
color and check whether the two vertices are adjacent. If not, the coloring is
legal; proceed to the next vertex. If they are adjacent and the color is the
same, the coloring is illegal; try the next color for the second vertex. If all
colors have been tried and every coloring is illegal, backtrack and try the
next color for the previous vertex, and so on.
• Note: sometimes a solution is impossible.

• Backtracking prunes the exponential O(3^n) tree of possibilities, reducing
the work to roughly O(n) on average, although the worst case remains
exponential (see the sketch below).
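
A minimal backtracking sketch of 3-coloring (my illustration; the
adjacency-list input format is an assumption):

    # 3-coloring by backtracking over an adjacency-list graph.
    def three_color(adj, colors=None, v=0):
        n = len(adj)
        if colors is None:
            colors = [0] * n              # 0 = uncolored; colors are 1..3
        if v == n:
            return colors                 # every vertex colored legally
        for c in (1, 2, 3):
            # Legal if no already-colored neighbour uses color c.
            if all(colors[u] != c for u in adj[v]):
                colors[v] = c
                result = three_color(adj, colors, v + 1)
                if result:
                    return result
                colors[v] = 0             # undo and try the next color
        return None                       # all colors illegal: backtrack

    # A 4-cycle 0-1-2-3-0: prints a legal coloring such as [1, 2, 1, 2].
    print(three_color([[1, 3], [0, 2], [1, 3], [0, 2]]))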
