You are on page 1of 24

Dynamic Programming - III

Longest Common Sequence

Instructor: Ashok Singh Sairam


Lecture Plan
• Longest Common Sequence
 Motivation and Problem Definition
 Brute Force Approach
 Dynamic Programming Approach

MA512: Data Structures and Algorithms


2
Longest Common Subsequence (LCS)
• We have two strings X and Y and we want to find
the longest possible string that is a subsequence
of both X and Y.
• When we say that Z is a subsequence of X, we
mean we can derive Z from X by removing some of
its elements

MA512: Data Structures and Algorithms


3
Ex: LCS
• X = A, B, C, B, D, A, B X = A, B, C, B, D, A, B
• Y = B, D, C, A, B, A Y = B, D, C, A, B, A

• B, C, B, A and B, D, A, B are longest


common subsequences of X and Y (length = 4)

• B, C, A, however is not a LCS of X and Y

MA512: Data Structures and Algorithms


4
Practical example of LCS
• DNA string: A sequence of symbols A, C, G, T
 S= ACCGGTCGAGCTTCGAATT

• DNA analysis: two DNA string comparison

MA512: Data Structures and Algorithms


5
Practical example of LCS
• DNA strand has millions of base elements and even a
simple gene has many thousands of elements.
 Efficiency in comparison is important

• Exact match of gene unlikely, so researchers look for


common subsequences of gene and the DNA strand
 If subsequence is most of length of gene sequence,

strand is regarded as containing the gene.

MA512: Data Structures and Algorithms


6
Problem Defintion
• Given two sequences
X = x1, x2, …, xm
Y = y1, y2, …, yn
find a maximum length common subsequence
(LCS) of X and Y

MA512: Data Structures and Algorithms


7
Brute-Force Solution
• For every subsequence of X, check whether it’s a
subsequence of Y

• There are 2m subsequences of X to check


• Each subsequence takes (n) time to check
 scan Y for first letter, scan for second, and so on

• Running time: (n2m)

MA512: Data Structures and Algorithms


8
Notations
• Given a sequence X = x1, x2, …, xm we define the
i-th prefix of X, for i = 0, 1, 2, …, m

Xi = x1, x2, …, xi

• c[i, j] = the length of an LCS of the sequences


Xi = x1, x2, …, xi and Yj = y1, y2, …, yj

MA512: Data Structures and Algorithms


9
LCS DP: Step 1
• Characterize optimal substructure

MA512: Data Structures and Algorithms


10
What the theorem says?
• Case 1: If 𝑥𝑚 = 𝑦𝑛 find LCS of 𝑥𝑚−1 and 𝑦𝑛−1 ,
then append 𝑥𝑚
 Include one element into the common sequence (Z) and
solve the resulting subproblem
 One subproblem

X = A, B, D, E
Y = Z, B, E

MA512: Data Structures and Algorithms


11
What the theorem says? – contd.
• Case2: If 𝑥𝑚 ≠ 𝑦𝑛 find LCS of 𝑥𝑚−1 and 𝑦𝑛 and LCS
of 𝑥𝑚 and 𝑦𝑛−1 , take which one is longer
 Exclude an element from a string and solve the resulting
subproblem
 Results in two subproblems
X = A, B, D, G
Y = Z, B, D

• find a LCS of Xi-1 and Yj: Xi-1 = A, B, D and Yj = Z, B, D


• find a LCS of Xi and Yj-1: Xi = A, B, D, G and Yj = Z, B

MA512: Data Structures and Algorithms


12
Overlapping substructure
• Overlapping substructure
 Both LCS of 𝑥𝑚−1 and 𝑦𝑛 and
LCS of 𝑥𝑚 and 𝑦𝑛−1 ,
will need to solve LCS of 𝑥𝑚−1 and 𝑦𝑛−1

MA512: Data Structures and Algorithms


13
LCS DP: Step 2 – Recursive solution
• Recall c[i, j] = Length of LCS of sequences Xi and Yj
• Case 1: xi = yj
c[i, j] = c[i - 1, j - 1] + 1
• Case 2: xi  yj
c[i, j] = max { c[i - 1, j], c[i, j-1] }
• Recursive formula

MA512: Data Structures and Algorithms


14
Overlapping Subproblems
• To find a LCS of X and Y
 we may need to find the LCS between Xm and Yn-1 and
that of Xm-1 and Y
 Both the above subproblems has the subproblem of
finding the LCS of Xm-1 and Yn-1

• Subproblems share subsubproblems

MA512: Data Structures and Algorithms


15
LCS DP: Step 3 – Computing length
of the LCS

j
• Store the value of yj: y1 y2 yn
the c[i,j] in a table 0 xi 0 0 0 0 0 0
 Compute the 1 x1 first
0
entries in a row-
2 x2 0 second
major order i
0
0
m xm 0
0 1 2 n
MA512: Data Structures and Algorithms
16
Additional information
A matrix b[i, j]:
• For a subproblem [i, j] it 0 1 2 3 n
tells us what choice was b & c: yj: A C D F
made to obtain the 0 xi 0 0 0 0 0 0
optimal value 1 A 0
• If xi = yj
2 B
b[i, j] = “ ” 0 c[i-1,j]
i
• Else, if c[i - 1, j] ≥ c[i, j-1] 3 C 0 c[i,j-1]
b[i, j] = “  ” 0
else m D 0
b[i, j] = “  ”
j

MA512: Data Structures and Algorithms


17
LCS-LENGTH(X, Y)
1. for i ← 1 to m
The length of the LCS if one of the sequences
2. do c[i, 0] ← 0 is empty is zero
3. for j ← 0 to n
4. do c[0, j] ← 0
5. for i ← 1 to m
6. do for j ← 1 to n
7. do if xi = yj
Case 1: xi = yj
8. then c[i, j] ← c[i - 1, j - 1] + 1
9. b[i, j ] ← “ ”
10. else if c[i - 1, j] ≥ c[i, j - 1]
11. then c[i, j] ← c[i - 1, j]
12. b[i, j] ← “↑”
Case 2: xi  yj
13. else c[i, j] ← c[i, j - 1]
14. b[i, j] ← “←”
15. return c and b
Running time: (mn)
MA512: Data Structures and Algorithms
18
0 if i = 0 or j = 0
Example c[i, j] = c[i-1, j-1] + 1 if xi = yj
max(c[i, j-1], c[i-1, j]) if xi  yj
X = A, B, C, B, D, A
Y = B, D, C, A, B, A 0 1 2 3 4 5 6
yj B D C A B A
If xi = yj 0 xi 0 0 0 0 0 0 0
b[i, j] = “ ” 1 A   
0 0 0 0 1 1 1
Else if c[i - 1, j] ≥ c[i, j-1] 
2 B 0 1 1 1 1 2 2
b[i, j] = “  ”
3 C    
else 0 1 1 2 2 2 2
4 B   
b[i, j] = “  ” 0 1 1 2 2 3 3
5 D     
0 1 2 2 2 3 3
   
6 A 0 1 2 2 3 3 4
   
7 B 0 1 2 2 3 4 4

19
4. Constructing a LCS
• Start at b[m, n] and
follow the arrows 0 1 2 3 4 5 6
• When we encounter yj B D C A B A
a “ “ in b[i, j]  0 xi 0 0 0 0 0 0 0
xi = yj is an 1 A 0

0

0

0 1 1 1
element of the LCS 2 B 
0 1 1 1 1 2 2
3 C    
0 1 1 2 2 2 2
4 B   
0 1 1 2 2 3 3
5 D     
0 1 2 2 2 3 3
   
6 A 0 1 2 2 3 3 4
   
7 B 0 1 2 2 3 4 4
20
PRINT-LCS(b, X, i, j)
1. if i = 0 or j = 0 Running time: (m + n)
2. then return
3. if b[i, j] = “ ”
4. then PRINT-LCS(b, X, i - 1, j - 1)
5. print xi
6. elseif b[i, j] = “↑”
7. then PRINT-LCS(b, X, i - 1, j)
8. else PRINT-LCS(b, X, i, j - 1)
Initial call: PRINT-LCS(b, X, length[X], length[Y])
21
Improving the Code
• What can we say about how each entry c[i, j] is
computed?
 It depends on c[i -1, j - 1], c[i - 1, j], and c[i, j - 1]
 Eliminate table b and compute in O(1) which of the
three values was used to compute c[i, j]
 We save (mn) space from table b
 However, we do not asymptotically decrease the
auxiliary space requirements: still need table c

MA512: Data Structures and Algorithms


22
Improving the Code – contd.
• If we only need the length of the LCS
 LCS-LENGTH works only on two rows of c at a time

• The row being computed and the previous row

 We can reduce the asymptotic space requirements by


storing only these two rows

MA512: Data Structures and Algorithms


23
Exercise

MA512: Data Structures and Algorithms


24

You might also like