You are on page 1of 30

Longest Common

Subsequence Using
Dynamic Programming

Submitted To: Submitted By:


Mrs. Shano Solanki Swati Nautiyal
(Assistant Professor CSE) Roll No.162420
ME-CSE(R)
Contents
Difference in substring and subsequence
Longest common subsequence
LCS with brute force method
LCS with dynamic programming
Recursion tree of LCS
LCS example
Analysis of LCS
Applications
References
Substring and Subsequence
A substring of a string S is another string S that occurs
in S and all the letters are contiguous in S
E.g. Amanpreet
substring1 : Aman substring2 : preet

A subsequence of a string S is another string S that


occurs in S and all the letters need not to be contiguou
s in S
E.g Amanpreet
subsequence1 : Ant subsequence2 : mnet
Longest Common Subsequence
The Longest Common Subsequence (LCS) problem is
as follows. We are given two strings: string A of length x
and string B of length y. We have to find the longest
common subsequence: the longest sequence of
characters that appear left-to-right in both strings.
Example, A= KASHMIR
B= CHANDIGARH
Longest Common Subsequence
The Longest Common Subsequence (LCS) problem is
as follows. We are given two strings: string A of length x
and string B of length y. We have to find the longest
common subsequence: the longest sequence of
characters that appear left-to-right in both strings.
Example, A= KASHMIR
B= CHANDIGARH

LCS has 3 length and string is HIR


Brute Force Method
Given two strings X of length m and Y of length n, find a longest subse
quence common to both X and Y

STEP1 : Find all subsequences of X.

STEP2: For each subsequence, find whether it is a subsequence of


Y.

STEP3: Find the longest common subsequence from available s


ubsequences
Brute Force Method
Given two strings X of length m and Y of length n, find a longest subse
quence common to both X and Y

STEP1 : Find all subsequences of X. 2m


STEP2: For each subsequence, find whether it is a subsequence of

Y. n*2m
STEP3: Find the longest common subsequence from available s
ubsequences.

T.C= O(n2m)
To improve time complexity, we use dynamic programming
Dynamic Programing
Optimal substructure
We have two strings
X= { x1,x2,x3,,xn}

Y= {y1,y2,y3,.,ym}
First compare xn and ym. If they matched, find the subsequence in
the remaining string and then append the xn with it.
If xn ym,
Remove xn from X and find LCS from x1 to xn-1 and y1 to ym
Remove ym from Y and find LCS from x1 to xn and y1 to ym-1

In each step, we reduce the size of the problem into the subproblem
s. It is optimal substructure.
Cont.
Recursive Equation

X= { x1,x2,x3,,xn}

Y= {y1,y2,y3,.,ym}
C[I,j] is length of LCS in X and Y

0
c[I,j] =
{ ; i=0 or j=0
1+c[i-1,j-1] ; I,j>0 and xi=yi
max(c[i-1,j],c[I,j-1]) ; I,j>0 and xiyi
Recursion Tree Of LCS
BEST CASE
X={A,A,A,A}
Y={A,A,A,A}
{0
c[I,j] =
; i=0 or j=0
1+c[i-1,j-1] ; I,j>0 and xi=yi
max(c[i-1,j],c[I,j-1]); I,j>0 and xiyi

C(4,4) =4
1+C(3,3) =4

1+C(2,2) =3

1+C(1,1) =2

1+C(0,0) =1
LCS = 4
T.C. = O(n) 0
Cont.
WORST CASE
X={A,A,A}
Y={B,B,B}
0
c[I,j] =
{ ; i=0 or j=0
1+c[i-1,j-1] ; I,j>0 and xi=yi
max(c[i-1,j],c[I,j-1]); I,j>0 and xiyi

C(3,3)

C(3,2) C(2,3)

C(3,1) C(2,2) C(2,2) C(1,3)

C(3,0) C(2,1) C(2,1) C(1,2) C(2,1) C(1,2)

C(2,0) C(1,1) C(2,0) C(1,1) C(1,1) C(0,2) C(2,0) C(1,1)

C(1,0) C(0,1) C(1,0) C(0,1) C(1,0) C(0,1) C(1,0) C(0,1)

No of nodes O(2+)
As here, the overlapping problem exits, we can apply the dynamic programming.
There are 3*3 unique subproblems. So we compute them once and save in table for
further refrence so n*n memory space required.
Cont.

0 1 - - m
00 01 02 0m
0
(m+1)*(n+1)
=O(m*n)
10 11 12 1m
1
Every element is
20 21 22 2m depend on
2 diagonal, left or
above element

C(2,2)
-
n0 n1 n2 nm
n C(2,1) C(1,2)

We can compute either row wise or column wise


Algorithm
Algorithm LCS(X,Y ):
Input: Strings X and Y with m and n elements, respectively
Output: For i = 0,,m; j = 0,...,n, the length C[i, j] of a longest string that is
a subsequence of both the strings.
for i =0 to m
c[i,0] = 0
for j =0 to n
c[0,j] = 0
for i =0 to m
for j =0 to n do
if xi = yj then
c[i, j] = c[i-1, j-1] + 1
else
L[i, j] = max { c[i-1, j] , c[i, j-1]}
return c
Example
X={B,D,C,A,B,A}
Y={A,B,C,B,D,A,B}

0 B D C A B A
for i =1 to m
0 c[i,0] = 0
for j =0 to n
A c[0,j] = 0

B
Cont.
X={B,D,C,A,B,A}
Y={A,B,C,B,D,A,B}

0 B D C A B A
if xi = for
yj i =1 to m
0 0 0 0 0 0 0 0 then c[i,0] = 0
c[i, j] for j =0 to
= c[i-1, n
j-1]+1
A 0 else c[0,j] = 0
L[i, j]=max{c[i-1,j], c[i, j-1]
B 0

C 0

B 0

D 0

A 0

B 0
Cont.
X={B,D,C,A,B,A}
Y={A,B,C,B,D,A,B}

0 B D C A B A
if xi = yj
0 0 0 0 0 0 0 0 then
c[i, j] = c[i-1, j-1]+1
A 0 0 0 0 else
L[i, j]=max{c[i-1,j], c[i, j-1]
B 0

C 0

B 0

D 0

A 0

B 0
Cont.
X={B,D,C,A,B,A}
Y={A,B,C,B,D,A,B}

0 B D C A B A
if xi = yj
0 0 0 0 0 0 0 0 then
c[i, j] = c[i-1, j-1]+1
A 0 0 0 0 0+1 else
L[i, j]=max{c[i-1,j], c[i, j-1]
B 0

C 0

B 0

D 0

A 0

B 0
Cont.
X={B,D,C,A,B,A}
Y={A,B,C,B,D,A,B}

0 B D C A B A
if xi = yj
0 0 0 0 0 0 0 0 then
c[i, j] = c[i-1, j-1]+1
A 0 0 0 0 1 1 1 else
L[i, j]=max{c[i-1,j], c[i, j-1]
B 0 1 1 1 1 2 2

C 0 1 1 2 2 2 2

B 0 1 1 2 2 3 3

D 0 1 2 2 2 3 3

A 0 1 2 2 3 3 4

B 0 1 2 2 3 4 4
Cont.
X={B,D,C,A,B,A}
Y={A,B,C,B,D,A,B}

0 B D C A B A

0 0 0 0 0 0 0 0

A 0 0 0 0 1 1 1

B 0 1 1 1 1 2 2

C 0 1 1 2 2 2 2

B 0 1 1 2 2 3 3

D 0 1 2 2 2 3 3

A 0 1 2 2 3 3 4

B 0 1 2 2 3 4 4
Cont.
X={B,D,C,A,B,A}
Y={A,B,C,B,D,A,B}

0 B D C A B A

0 0 0 0 0 0 0 0

A 0 0 0 0 1 1 1

B 0 1 1 1 1 2 2

C 0 1 1 2 2 2 2

B 0 1 1 2 2 3 3

D 0 1 2 2 2 3 3

A 0 1 2 2 3 3 4

B 0 1 2 2 3 4 4
Cont.
X={B,D,C,A,B,A}
Y={A,B,C,B,D,A,B}

0 B D C A B A

0 0 0 0 0 0 0 0

A 0 0 0 0 1 1 1

B 0 1 1 1 1 2 2
Subsequence=BCBA
C 0 1 1 2 2 2 2

B 0 1 1 2 2 3 3

D 0 1 2 2 2 3 3

A 0 1 2 2 3 3 4

B 0 1 2 2 3 4 4
Cont.
X={B,D,C,A,B,A}
Y={A,B,C,B,D,A,B}

0 B D C A B A

0 0 0 0 0 0 0 0

A 0 0 0 0 1 1 1

B 0 1 1 1 1 2 2

C 0 1 1 2 2 2 2

B 0 1 1 2 2 3 3

D 0 1 2 2 2 3 3

A 0 1 2 2 3 3 4

B 0 1 2 2 3 4 4
Cont.
X={B,D,C,A,B,A}
Y={A,B,C,B,D,A,B}

0 B D C A B A

0 0 0 0 0 0 0 0

A 0 0 0 0 1 1 1 Subsequence = BCAB
B 0 1 1 1 1 2 2

C 0 1 1 2 2 2 2

B 0 1 1 2 2 3 3

D 0 1 2 2 2 3 3

A 0 1 2 2 3 3 4

B 0 1 2 2 3 4 4
Cont.
X={B,D,C,A,B,A}
Y={A,B,C,B,D,A,B}

0 B D C A B A

0 0 0 0 0 0 0 0

A 0 0 0 0 1 1 1

B 0 1 1 1 1 2 2

C 0 1 1 2 2 2 2

B 0 1 1 2 2 3 3

D 0 1 2 2 2 3 3

A 0 1 2 2 3 3 4

B 0 1 2 2 3 4 4
Cont.
X={B,D,C,A,B,A}
Y={A,B,C,B,D,A,B}

0 B D C A B A

0 0 0 0 0 0 0 0
Subsequence = BDAB
A 0 0 0 0 1 1 1

B 0 1 1 1 1 2 2

C 0 1 1 2 2 2 2

B 0 1 1 2 2 3 3

D 0 1 2 2 2 3 3

A 0 1 2 2 3 3 4

B 0 1 2 2 3 4 4
Cont
We get three longest common subsequences
BCBA
BCAB
BDAB
Length of longest common subsequence is 4
Analysis Of LCS
We have two nested loops
The outer one iterates n times
The inner one iterates m times
A constant amount of work is done inside each
iteration of the inner loop
Thus, the total running time is O(nm)

Space complexity is also O(nm) for n*m table


Application
DNA matching
DNA comprises of {A,C,G,T}.
DNA1= AGCCTCAGT
DNA2=ATCCT
DNA3=AGTAGC
DNA 1 and DNA 3 are more similar.
Edit Distance
The Edit Distance is defined as the minimum number of edits neede
d to transform one string into the other.
REFERENCES
Textbook Introduction to Algorithm by Coreman
http://www.perlmonks.org/?node_id=652798
https://www.ics.uci.edu/~eppstein/161/960229.html
https://
en.wikipedia.org/wiki/Longest_common_subsequence_problem
http://www.slideshare.net/ShahariarRabby1/longest-common-sub
sequence-lcs?qid=49affff4-9c19-4957-bdb9-7877801a569e&v=&b=&
from_search=1
THANK YOU

You might also like