
Unit 4

Dynamic Programming
Dynamic programming is a technique in computer programming that efficiently solves a class of problems that have overlapping subproblems and the optimal substructure property.
If a problem can be divided into subproblems, which in turn are divided into smaller subproblems, and if these subproblems overlap, then their solutions can be saved and reused instead of being recomputed. This saves computation time. This method of solving a problem is referred to as dynamic programming.

Such problems involve repeatedly calculating the value of the same subproblems to find the optimum solution.

Dynamic Programming Example

Let's find the Fibonacci sequence up to the 5th term. A Fibonacci series is a sequence of numbers in which each number is the sum of the two preceding ones. For example, 0, 1, 1, 2, 3. Here, each number is the sum of the two preceding numbers.
Algorithm

Let F(n) denote the nth term, with F(0) = 0 and F(1) = 1.

1. If n <= 1, return n.

2. Else, return F(n - 1) + F(n - 2), the sum of the two preceding terms.


We are calculating the Fibonacci sequence up to the 5th term.

1. The first term is 0.

2. The second term is 1.

3. The third term is the sum of 0 (from step 1) and 1 (from step 2), which is 1.

4. The fourth term is the sum of the third term (from step 3) and second term
(from step 2) i.e.  1 + 1 = 2.

5. The fifth term is the sum of the fourth term (from step 4) and third term
(from step 3) i.e.  2 + 1 = 3.

Hence, we have the sequence 0, 1, 1, 2, 3. Here, we have used the results
of the previous steps as shown below. This is called the dynamic
programming approach.

F(0) = 0

F(1) = 1

F(2) = F(1) + F(0)

F(3) = F(2) + F(1)

F(4) = F(3) + F(2)
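Written as a short bottom-up Python loop (nothing beyond the recurrence above is assumed), this is:

f = [0, 1]                          # F(0) and F(1)
for i in range(2, 5):
    f.append(f[i - 1] + f[i - 2])   # F(i) = F(i-1) + F(i-2)
print(f)                            # [0, 1, 1, 2, 3]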

How Dynamic Programming Works

Dynamic programming works by storing the results of subproblems so that when their solutions are required, they are at hand and we do not need to recalculate them.

This technique of storing the values of subproblems is called memoization. By saving the values in an array or map, we avoid recomputing subproblems we have already come across.

# Memoized Fibonacci in Python: store each computed value in a map
m = {0: 0, 1: 1}

def fib(n):
    if n not in m:
        m[n] = fib(n - 1) + fib(n - 2)
    return m[n]
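For example, fib(4) returns 3 (the 5th term computed above), and each F(i) is computed only once, because later calls simply look its value up in m.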

Longest Common Subsequence


The longest common subsequence (LCS) is defined as the longest
subsequence that is common to all the given sequences, provided that the
elements of the subsequence are not required to occupy consecutive
positions within the original sequences.

If  S1  and  S2  are the two given sequences then,  Z  is the common
subsequence of  S1  and  S2  if  Z  is a subsequence of both  S1  and  S2 .
Furthermore,  Z  must be a strictly increasing sequence of the indices of
both  S1  and  S2 .
In a strictly increasing sequence, the indices of the elements chosen from
the original sequences must be in ascending order in  Z .
If

S1 = {B, C, D, A, A, C, D}

Then, {A, D, B} cannot be a subsequence of S1, as the order of the elements is not the same (i.e. the indices are not strictly increasing).
Let us understand LCS with an example.

If

S1 = {B, C, D, A, A, C, D}

S2 = {A, C, D, B, A, C}

Then, the common subsequences are {B, C}, {C, D, A, C}, {D, A, C}, {A, A, C}, {A, C}, {C, D}, ...

Among these subsequences, {C, D, A, C} is the longest common subsequence. We are going to find this longest common subsequence using dynamic programming.

Before proceeding further, if you are not already familiar with dynamic programming, please go through the dynamic programming section above.

Using Dynamic Programming to find the LCS

Let us take two sequences, X (the first sequence) and Y (the second sequence).
The following steps are followed for finding the longest common
subsequence.

1. Create a table of dimension (n+1) × (m+1), where n and m are the lengths of X and Y respectively. The first row and the first column are filled with zeros.

2. Fill each cell of the table using the following logic.

3. If the characters corresponding to the current row and the current column match, fill the current cell by adding one to the diagonal element. Point an arrow to the diagonal cell.

4. Else, fill the current cell with the maximum of the value in the previous row (the cell above) and the value in the previous column (the cell to the left). Point an arrow to the cell with the maximum value. If they are equal, point to either of them.



5. Step 2 is repeated until the table is filled.



6. The value in the last row and the last column is the length of the longest common subsequence (the bottom-right corner of the table).

7. In order to find the longest common subsequence, start from the last element and follow the direction of the arrows. The elements corresponding to the diagonal arrows form the longest common subsequence.

Thus, the longest common subsequence is  CA .


How is a dynamic programming algorithm more efficient than the recursive algorithm while solving an LCS problem?

The method of dynamic programming reduces the number of function calls. It stores the result of each function call so that it can be used in future calls without the need for redundant calls.

In the above dynamic programming algorithm, the results obtained from each comparison between elements of X and elements of Y are stored in a table so that they can be used in future computations.

So, the time taken by the dynamic approach is the time taken to fill the table, i.e. O(mn). The recursive algorithm, in contrast, has a complexity of O(2^max(m, n)).
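What follows is a short Python sketch of the table-filling procedure described above. The function name lcs and the use of strings for the two sequences are only choices for this illustration; the backtracking loop recovers the subsequence by re-checking for matches instead of storing explicit arrows.

# Longest common subsequence by dynamic programming
def lcs(X, Y):
    n, m = len(X), len(Y)
    # (n+1) x (m+1) table, first row and first column filled with zeros
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if X[i - 1] == Y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1        # match: diagonal + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # walk back from the bottom-right corner to recover the subsequence
    result = []
    i, j = n, m
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            result.append(X[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return ''.join(reversed(result))

print(lcs('BCDAACD', 'ACDBAC'))   # CDAC, the longest common subsequence of length 4

Running it on the sequences S1 and S2 from the earlier example prints CDAC, matching the subsequence {C, D, A, C} found above.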
Greedy Algorithm
A greedy algorithm is an approach for solving a problem by selecting the
best option available at the moment. It doesn't worry whether the current
best result will bring the overall optimal result.

The algorithm never reverses an earlier decision, even if the choice is wrong. It works in a top-down approach.

This algorithm may not produce the best result for all problems, because it always goes for the locally best choice in the hope of producing the globally best result.

However, we can determine whether a greedy algorithm can be used for a problem by checking that the problem has the following properties:

1. Greedy Choice Property


If an optimal solution to the problem can be found by choosing the best
choice at each step without reconsidering the previous steps once chosen,
the problem can be solved using a greedy approach. This property is called
greedy choice property.

2. Optimal Substructure
If the optimal overall solution to the problem corresponds to the optimal
solution to its subproblems, then the problem can be solved using a greedy
approach. This property is called optimal substructure.
Advantages of Greedy Approach

- The algorithm is easier to describe.

- This algorithm can perform better than other algorithms (but not in all cases).

Drawback of Greedy Approach

As mentioned earlier, the greedy algorithm doesn't always produce the optimal solution. This is the major disadvantage of the algorithm.

For example, suppose we want to find the longest path in the graph below
from root to leaf. Let's use the greedy algorithm here.

Apply the greedy approach to this tree to find the longest route.

Greedy Approach
1. Let's start with the root node 20. The weight of the right child is 3 and the weight of the left child is 2.

2. Our problem is to find the largest path. The best choice at the moment is 3, so the greedy algorithm chooses 3.

3. Finally, the only child of 3 has weight 1. This gives us the final result 20 + 3 + 1 = 24.

However, it is not the optimal solution. There is another path that carries
more weight ( 20 + 2 + 10 = 32 ) as shown in the image below.

Longest path

Therefore, greedy algorithms do not always give an optimal/feasible solution.
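To make this concrete, here is a small Python sketch of the example tree above. The node values 20, 2, 3, 10 and 1 are taken from the figure; the tuple representation and the function names are only for illustration.

# Each node is (value, list of children); weights taken from the example above
tree = (20, [(2, [(10, [])]),      # left branch: 2 -> 10
             (3, [(1, [])])])      # right branch: 3 -> 1

def greedy_path(node):
    value, children = node
    total = value
    while children:
        # always pick the child with the largest immediate value
        value, children = max(children, key=lambda c: c[0])
        total += value
    return total

def best_path(node):
    # exhaustively try every root-to-leaf path
    value, children = node
    if not children:
        return value
    return value + max(best_path(child) for child in children)

print(greedy_path(tree))   # 24  (20 + 3 + 1)
print(best_path(tree))     # 32  (20 + 2 + 10)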

Greedy Algorithm

1. To begin with, the solution set (containing answers) is empty.

2. At each step, an item is added to the solution set until a solution is reached.

3. If the solution set is feasible, the current item is kept.

4. Else, the item is rejected and never considered again.


Huffman Coding
Huffman Coding is a technique of compressing data to reduce its size
without losing any of the details. It was first developed by David Huffman.

Huffman coding is generally useful for compressing data in which some characters occur frequently.

How Huffman Coding Works

Suppose the string BCAADDDCCACACAC (the string used in the Python example below) is to be sent over a network.

Each character occupies 8 bits. There are a total of 15 characters in the above string. Thus, a total of 8 * 15 = 120 bits are required to send this string.
Using the Huffman Coding technique, we can compress the string to a
smaller size.

Huffman coding first creates a tree using the frequencies of the characters and then generates a code for each character.

Once the data is encoded, it has to be decoded. Decoding is done using the same tree.

Huffman coding prevents any ambiguity in the decoding process using the concept of prefix codes, i.e. the code associated with a character must not be a prefix of the code of any other character. The tree created in the steps below helps in maintaining this property.
Huffman coding is done with the help of the following steps.

1. Calculate the frequency of each character in the string.


2. Sort the characters in increasing order of frequency. These are stored in a priority queue Q.

3. Make each unique character a leaf node.

4. Create an empty node z. Assign the minimum frequency to the left child of z and the second minimum frequency to the right child of z. Set the value of z to the sum of these two minimum frequencies.


5. Remove these two minimum frequencies from Q and add the sum into the list of frequencies (* denotes an internal node in the figure above).
6. Insert node  z  into the tree.

7. Repeat steps 3 to 5 for all the characters.

8. For each non-leaf node, assign 0 to the left edge and 1 to the right edge.


For sending the above string over a network, we have to send the tree as well as the compressed code. The total size is given by the table below.

Character       | Frequency | Code | Size
A               | 5         | 11   | 5 * 2 = 10
B               | 1         | 100  | 1 * 3 = 3
C               | 6         | 0    | 6 * 1 = 6
D               | 3         | 101  | 3 * 3 = 9
4 * 8 = 32 bits | 15 bits   |      | 28 bits

Without encoding, the total size of the string was 120 bits. After encoding, the size is reduced to 32 + 15 + 28 = 75 bits (32 bits for the characters, 15 bits for the frequencies, and 28 bits for the encoded string).
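As a quick check of this arithmetic, here is a throwaway Python snippet using the frequencies and codes from the table above:

freqs = {'A': 5, 'B': 1, 'C': 6, 'D': 3}
codes = {'A': '11', 'B': '100', 'C': '0', 'D': '101'}
encoded_bits = sum(freqs[c] * len(codes[c]) for c in freqs)
print(encoded_bits)                # 28 bits for the encoded string
print(8 * sum(freqs.values()))     # 120 bits without encoding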
Decoding the code

For decoding the code, we can take the code and traverse through the tree
to find the character.

Suppose 101 is to be decoded. We can traverse from the root as in the figure below.

Decoding

Huffman Coding Algorithm

create a priority queue Q consisting of each unique character
sort them in ascending order of their frequencies
for all the unique characters:
    create a newNode
    extract minimum value from Q and assign it to leftChild of newNode
    extract minimum value from Q and assign it to rightChild of newNode
    calculate the sum of these two minimum values and assign it to the value of newNode
    insert this newNode into the tree
return rootNode

Python Example

# Huffman Coding in python

string = 'BCAADDDCCACACAC'


# Creating tree nodes
class NodeTree(object):

    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

    def children(self):
        return (self.left, self.right)

    def nodes(self):
        return (self.left, self.right)

    def __str__(self):
        return '%s_%s' % (self.left, self.right)


# Main function implementing huffman coding
def huffman_code_tree(node, left=True, binString=''):
    if type(node) is str:
        return {node: binString}
    (l, r) = node.children()
    d = dict()
    d.update(huffman_code_tree(l, True, binString + '0'))
    d.update(huffman_code_tree(r, False, binString + '1'))
    return d


# Calculating frequency
freq = {}
for c in string:
    if c in freq:
        freq[c] += 1
    else:
        freq[c] = 1

freq = sorted(freq.items(), key=lambda x: x[1], reverse=True)

nodes = freq

# Build the tree: repeatedly merge the two nodes with the lowest frequencies
while len(nodes) > 1:
    (key1, c1) = nodes[-1]
    (key2, c2) = nodes[-2]
    nodes = nodes[:-2]
    node = NodeTree(key1, key2)
    nodes.append((node, c1 + c2))

    # keep the list sorted so the two smallest frequencies stay at the end
    nodes = sorted(nodes, key=lambda x: x[1], reverse=True)

huffmanCode = huffman_code_tree(nodes[0][0])

print(' Char | Huffman code ')
print('----------------------')
for (char, frequency) in freq:
    print(' %-4r |%12s' % (char, huffmanCode[char]))
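The script above only builds the codes. As a rough sketch of the decoding step described earlier (assuming the tree built by the script, whose root is nodes[0][0] and whose leaves are plain characters), decoding walks the tree bit by bit:

# Decoding sketch: follow 0 to the left child and 1 to the right child,
# emitting a character each time a leaf (a plain string) is reached
def huffman_decode(bits, root):
    decoded = []
    node = root
    for bit in bits:
        node = node.children()[0] if bit == '0' else node.children()[1]
        if type(node) is str:       # reached a leaf character
            decoded.append(node)
            node = root             # restart from the root for the next code
    return ''.join(decoded)

# For example, encoding the original string and decoding it back:
encoded = ''.join(huffmanCode[c] for c in string)
print(huffman_decode(encoded, nodes[0][0]))   # BCAADDDCCACACAC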

Huffman Coding Complexity

The time complexity for encoding each unique character based on its frequency is O(n log n).

Extracting the minimum frequency from the priority queue takes place 2*(n-1) times, and each extraction costs O(log n). Thus the overall complexity is O(n log n).
Huffman Coding Applications

- Huffman coding is used in conventional compression formats like GZIP, BZIP2, PKZIP, etc.

- For text and fax transmissions.
