
Longest Common Subsequence (LCS)

Let Σ be an alphabet, and let A and B be two strings over Σ of lengths m and n respectively,
i.e. A = a1,a2,...,am and B = b1,b2,...,bn. We have to find the longest common subsequence
between them.
Now suppose we define L[i,j] as
L[i,j] = length of the longest common subsequence between the strings a1,a2,...,ai and b1,b2,...,bj.
From the problem we can see that L[i,j] can be defined in terms of the previous entries
L[i-1,j-1], L[i-1,j] and L[i,j-1] as below.
L[i,j] = 0 if i = 0 or j = 0
L[i,j] = L[i-1,j-1]+1 if ai = bj
L[i,j] = Max(L[i,j-1], L[i-1,j]) otherwise.
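The recurrence can be transcribed almost literally into a top-down routine that caches each L[i,j] the first time it is computed. The following is only a sketch: the names lcs_rec and lcs_length_memo, the flat row-major cache, and the use of -1 as a "not yet computed" mark are assumptions made for illustration and do not come from these notes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: evaluates L[i,j] for the prefixes a[0..i-1] and b[0..j-1].
   memo is an (m+1) x (n+1) row-major table; -1 means "not yet computed". */
static int lcs_rec(const char *a, const char *b, int i, int j, int n, int *memo)
{
    if (i == 0 || j == 0)
        return 0;                               /* base case: an empty prefix */
    int *slot = &memo[i * (n + 1) + j];
    if (*slot >= 0)
        return *slot;                           /* already computed */
    if (a[i - 1] == b[j - 1]) {
        *slot = lcs_rec(a, b, i - 1, j - 1, n, memo) + 1;
    } else {
        int up = lcs_rec(a, b, i - 1, j, n, memo);
        int left = lcs_rec(a, b, i, j - 1, n, memo);
        *slot = up > left ? up : left;
    }
    return *slot;
}

/* Returns L[m,n], or -1 if the cache cannot be allocated. */
int lcs_length_memo(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    int m = (int)strlen(a), n = (int)strlen(b);
    size_t cells = (size_t)(m + 1) * (size_t)(n + 1);
    int *memo = malloc(cells * sizeof *memo);
    if (memo == NULL)
        return -1;
    for (size_t k = 0; k < cells; k++)
        memo[k] = -1;
    int result = lcs_rec(a, b, m, n, n, memo);
    free(memo);
    return result;
}
```

Calling lcs_length_memo("ABCBDAB", "BDCABA") evaluates L[7,6] for those two strings. Each cell is computed at most once, so the work is bounded by the size of the table.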
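Let the matrix C[m×n] contain the lengths of the longest common subsequences, so that C[i,j]
is the length for a1,a2,...,ai and b1,b2,...,bj (with an extra row 0 and column 0 set to 0),
and let the matrix D[m×n] record, for each entry, which of the three cases produced it, so
that the longest common subsequence can be traced back. If D[i,j] = "↖" then ai or bj
(both are equal) is part of the longest common subsequence.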
The algorithm is given below.
Method
Input: Two arrays A[m] and B[n].
Output: Two m×n matrices C and D (C also has a row 0 and a column 0).
LCS
m ← length[A];
n ← length[B];
for i ← 0 upto m
    C[i,0] ← 0;
endfor
for j ← 0 upto n
    C[0,j] ← 0;
endfor
for i ← 1 upto m
    for j ← 1 upto n
        if (ai = bj)
            C[i,j] ← C[i-1,j-1]+1;
            D[i,j] ← "↖";
        else if (C[i-1,j] ≥ C[i,j-1])
            C[i,j] ← C[i-1,j];
            D[i,j] ← "↑";
        else
            C[i,j] ← C[i,j-1];
            D[i,j] ← "←";
        endif
    endfor
endfor
return C and D
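The table-filling method above could be written in C roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: the type lcs_table, the helper lcs_at, and the enum values LCS_DIAG, LCS_UP and LCS_LEFT (standing in for the three direction marks stored in D) are hypothetical names, and strings are 0-indexed, so ai corresponds to a[i - 1].

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Direction marks stored in D (names are assumptions of this sketch). */
enum lcs_dir { LCS_NONE = 0, LCS_DIAG, LCS_UP, LCS_LEFT };

typedef struct {
    size_t m, n;        /* lengths of A and B */
    int *c;             /* (m+1) x (n+1) lengths, row-major */
    unsigned char *d;   /* (m+1) x (n+1) direction marks, row-major */
} lcs_table;

/* Offset of cell [i,j] in the row-major arrays. */
static size_t lcs_at(const lcs_table *t, size_t i, size_t j)
{
    return i * (t->n + 1) + j;
}

/* Fills t from a and b; returns the LCS length, or -1 on allocation failure. */
int lcs_table_build(lcs_table *t, const char *a, const char *b)
{
    assert(t != NULL && a != NULL && b != NULL);
    t->m = strlen(a);
    t->n = strlen(b);
    size_t cells = (t->m + 1) * (t->n + 1);
    t->c = calloc(cells, sizeof *t->c);     /* calloc zeroes row 0 and column 0 */
    t->d = calloc(cells, sizeof *t->d);
    if (t->c == NULL || t->d == NULL) {
        free(t->c);
        free(t->d);
        t->c = NULL;
        t->d = NULL;
        return -1;
    }
    for (size_t i = 1; i <= t->m; i++) {
        for (size_t j = 1; j <= t->n; j++) {
            size_t here = lcs_at(t, i, j);
            int up = t->c[lcs_at(t, i - 1, j)];
            int left = t->c[lcs_at(t, i, j - 1)];
            if (a[i - 1] == b[j - 1]) {
                t->c[here] = t->c[lcs_at(t, i - 1, j - 1)] + 1;
                t->d[here] = LCS_DIAG;
            } else if (up >= left) {        /* ties go upward, as in the method */
                t->c[here] = up;
                t->d[here] = LCS_UP;
            } else {
                t->c[here] = left;
                t->d[here] = LCS_LEFT;
            }
        }
    }
    return t->c[lcs_at(t, t->m, t->n)];
}

/* Releases the two arrays owned by t. */
void lcs_table_free(lcs_table *t)
{
    free(t->c);
    free(t->d);
    t->c = NULL;
    t->d = NULL;
}
```

A caller would declare an lcs_table, pass its address to lcs_table_build together with the two strings, read the entries through lcs_at, and finally call lcs_table_free.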
4.1 Construction Of Longest Common Subsequence:
From the D[m×n] matrix we can get the LCS as below.
Method
Input: The D[m×n] matrix output by the LCS algorithm, the string A, and the indices i and j
(initially i = m and j = n).
Output: Longest common subsequence.
PrintLCS(D,A,i,j)
if (i = 0 or j = 0)
    then return;
endif
if (D[i,j] = "↖")
    then PrintLCS(D,A,i-1,j-1);
         print ai;
else if (D[i,j] = "↑")
    then PrintLCS(D,A,i-1,j);
    else PrintLCS(D,A,i,j-1);
endif
endif
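As an illustration, the trace through D can also be done with a loop instead of recursion: walk from [m,n] back towards row 0 or column 0 and fill the output buffer from its last position. In the sketch below the function name lcs_string and the letters 'D', 'U' and 'L' used as direction marks are assumptions of this example; it rebuilds C and D itself so that it is self-contained, and the caller is expected to free the returned string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated longest common subsequence of a and b,
   or NULL on allocation failure. The caller frees the result. */
char *lcs_string(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t m = strlen(a), n = strlen(b), w = n + 1;   /* w = row width */
    int *c = calloc((m + 1) * w, sizeof *c);          /* lengths, zeroed */
    char *d = calloc((m + 1) * w, 1);                 /* direction marks */
    if (c == NULL || d == NULL) {
        free(c);
        free(d);
        return NULL;
    }
    for (size_t i = 1; i <= m; i++) {
        for (size_t j = 1; j <= n; j++) {
            if (a[i - 1] == b[j - 1]) {
                c[i * w + j] = c[(i - 1) * w + (j - 1)] + 1;
                d[i * w + j] = 'D';                   /* diagonal */
            } else if (c[(i - 1) * w + j] >= c[i * w + (j - 1)]) {
                c[i * w + j] = c[(i - 1) * w + j];
                d[i * w + j] = 'U';                   /* up */
            } else {
                c[i * w + j] = c[i * w + (j - 1)];
                d[i * w + j] = 'L';                   /* left */
            }
        }
    }
    size_t len = (size_t)c[m * w + n];
    char *out = malloc(len + 1);
    if (out != NULL) {
        size_t i = m, j = n, k = len;
        out[len] = '\0';
        while (i > 0 && j > 0) {                      /* same stop rule as PrintLCS */
            char dir = d[i * w + j];
            if (dir == 'D') {
                out[--k] = a[i - 1];                  /* ai is part of the LCS */
                i--;
                j--;
            } else if (dir == 'U') {
                i--;
            } else {
                j--;
            }
        }
    }
    free(c);
    free(d);
    return out;
}
```

Because the walk visits the matched characters last to first, writing them from the end of the buffer gives the same left-to-right order that the recursive method prints.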
4.2 Complexity analysis
4.2.1 Time Complexity
From the above LCS algorithm it is clear that the time complexity is Θ(m*n), as
there are two nested for loops of length m and n.
4.2.2 Space Complexity
As we need to store the m×n matrices C and D, the space complexity is also Θ(m*n).
But we can reduce the space needed for storing the m×n matrix C. To compute the ith
row (or column, whichever is smaller) we only need to keep the current row and the
(i-1)st row. Thus, when only the length of the LCS is required, we can improve the space
complexity to Θ(Min(m,n)). Tracing back the subsequence itself still uses the matrix D.
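A minimal sketch of the reduced-space idea, assuming only the length is wanted: keep two rows whose size is that of the shorter string plus one, and swap them after each step. The function name lcs_length_two_rows is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns the LCS length using two rows of min(m,n)+1 ints,
   or -1 on allocation failure. */
int lcs_length_two_rows(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t m = strlen(a), n = strlen(b);
    if (n > m) {                        /* make b the shorter string */
        const char *ts = a; a = b; b = ts;
        size_t tl = m; m = n; n = tl;
    }
    int *prev = calloc(n + 1, sizeof *prev);   /* row i-1, starts as row 0 */
    int *curr = calloc(n + 1, sizeof *curr);   /* row i */
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }
    for (size_t i = 1; i <= m; i++) {
        curr[0] = 0;                    /* column 0 is always 0 */
        for (size_t j = 1; j <= n; j++) {
            if (a[i - 1] == b[j - 1])
                curr[j] = prev[j - 1] + 1;
            else
                curr[j] = prev[j] >= curr[j - 1] ? prev[j] : curr[j - 1];
        }
        int *tmp = prev; prev = curr; curr = tmp;   /* row i becomes row i-1 */
    }
    int result = prev[n];               /* prev holds the last computed row */
    free(prev);
    free(curr);
    return result;
}
```

The time is still proportional to m*n; only the storage shrinks, and the direction information needed to print the subsequence is no longer available.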

Longest common subsequence problem

Why might we want to solve the longest common subsequence problem? There are
several motivating applications.

a) Molecular biology.

DNA sequences (genes) can be represented as sequences of the four letters A, C, G and T
(A = adenine, C = cytosine, G = guanine and T = thymine), corresponding to the four
submolecules forming DNA. When biologists find a new sequence, they typically want
to know what other sequences it is most similar to. One way of computing how similar
two sequences are is to find the length of their longest common subsequence.

b) File comparison.

The Unix program "diff" is used to compare two different versions of the same file, to
determine what changes have been made to the file. It works by finding a longest
common subsequence of the lines of the two files; any line in the subsequence has not
been changed, so what it displays is the remaining set of lines that have changed. In this
instance of the problem we should think of each line of a file as being a single
complicated character in a string.

c) Screen redisplay.

Many text editors like "emacs" display part of a file on the screen, updating the screen
image as the file is changed. For slow dial-in terminals, these programs want to send the
terminal as few characters as possible to cause it to update its display correctly. It is
possible to view the computation of the minimum length sequence of characters needed
to update the terminal as being a sort of common subsequence problem (the common
subsequence tells you the parts of the display that are already correct and don't need to be
changed).

Brute-force method:

We first solve the LCS problem with a brute-force method. If we have two strings, say
"subsequence" and "opsubset", we can represent a common subsequence as a way of writing
the two so that certain letters line up:

  subsequence
  |||||
opsubset

If we draw lines connecting the letters in the first string to the corresponding letters in the
second, no two lines cross (the top and bottom endpoints occur in the same order, the
order of the letters in the subsequence). Conversely, any set of lines drawn like this,
without crossings, represents a subsequence.

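From this we can observe a simple fact: if the two strings start with the same letter, it is
always safe to choose that letter as the first character of the common subsequence.
On the other hand, suppose that, like the example above, the two first characters differ.
Then it is not possible for both of them to be part of a common subsequence: one or the
other (or maybe both) will have to be removed.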

Finally, observe that once we've decided what to do with the first characters of the
strings, the remaining subproblem is again a longest common subsequence problem, on
two shorter strings. Therefore we can solve it recursively.

These observations give us the following, very inefficient, recursive algorithm.

Recursive LCS:

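/* max is not a standard C function, so define a small helper. */
static int max(int a, int b) { return a > b ? a : b; }

int lcs_length(const char *A, const char *B)
{
    if (*A == '\0' || *B == '\0') return 0;   /* one string is exhausted */
    else if (*A == *B) return 1 + lcs_length(A+1, B+1);
    else return max(lcs_length(A+1, B), lcs_length(A, B+1));
}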

This is a correct solution, but it is very time consuming. For example, if the two strings
have no matching characters, so that the last line always gets executed, the number of
recursive calls grows like the binomial coefficient C(m+n, m), which (if m = n) is about
4^n/sqrt(πn), i.e. exponential in n.
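To see the growth concretely, one can count the calls made by the recursive routine. The sketch below is an instrumented copy of the recursion; the counter lcs_calls and the wrapper count_lcs_calls are hypothetical names added for this illustration. For two strings with no matching characters the count satisfies T(m,n) = 1 + T(m-1,n) + T(m,n-1) with T = 1 when either string is empty, which works out to 2*C(m+n, m) - 1.

```c
#include <assert.h>
#include <stddef.h>

static unsigned long lcs_calls;   /* hypothetical counter, reset by the wrapper */

/* Same recursion as lcs_length, with one increment per call. */
static int lcs_length_counted(const char *A, const char *B)
{
    lcs_calls++;
    if (*A == '\0' || *B == '\0')
        return 0;
    if (*A == *B)
        return 1 + lcs_length_counted(A + 1, B + 1);
    int skip_a = lcs_length_counted(A + 1, B);
    int skip_b = lcs_length_counted(A, B + 1);
    return skip_a > skip_b ? skip_a : skip_b;
}

/* Returns how many calls the recursion makes for A and B. */
unsigned long count_lcs_calls(const char *A, const char *B)
{
    assert(A != NULL && B != NULL);
    lcs_calls = 0;
    (void)lcs_length_counted(A, B);
    return lcs_calls;
}
```

For instance, count_lcs_calls("aaa", "bbb") gives 39 = 2*C(6,3) - 1, while two identical strings of length 3 need only 4 calls because every step takes the matching branch.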
