You are on page 1of 6

The 25th Workshop on Combinatorial Mathematics and Computation Theory

Fast Algorithms for the Constrained Longest Increasing


Subsequence Problems

I-Hsuan Yang1 and Yi-Ching Chen2


1
Department of Computer Science
Stanford University, Stanford, CA 94305, USA
ihyang@cs.stanford.edu
2
Department of Computer Science and Information Engineering
National Taiwan University, Taipei, Taiwan 106
d94010@csie.ntu.edu.tw

Abstract formatics [8, 10, 18, 21]. For example, MUM-


mer [8, 9, 14], a large-scale DNA sequence align-
Let a1 , a2 , . . . , an  be a sequence of compara- ment tool, extracts the longest possible set of
ble elements. In this paper, we study two con- matches found in the MUM (maximal unique
strained versions of the longest increasing subse- match) alignment by solving an LIS problem.
quence (LIS) problem. The first problem is the When extending the tool to align three or more
range-constrained longest increasing subsequence sequences, Yang et al. [20] formulated the prob-
(RLIS) problem. Given 0 < LI ≤ UI < n and 0 ≤ lem as the longest common increasing subsequence
LV ≤ UV , the objective of the RLIS problem is to (LCIS) problem which combines the LIS prob-
deliver a maximum-length increasing subsequence lem with the longest common subsequence (LCS)
ai1 , ai2 , . . . , ail  satisfying LI ≤ ik+1 − ik ≤ UI problem. Further results in this line of investiga-
and LV ≤ aik+1 − aik ≤ UV for all 1 ≤ k < l. We tion can be found in [5, 6, 12].
give an O(n log(UI −LI ))-time and O(n)-space al- Albert et al. [2] defined a problem called
gorithm for solving the RLIS problem. The second LISW, which is to find the LIS in sliding windows
problem is the slope-constrained longest increasing over a sequence of n elements. Chen et al. [7]
subsequence (SLIS) problem. Given a nonnegative solved the LISW problem by maintaining a canon-
slope m, the objective of the SLIS problem is to ical antichain partition in windows. While deal-
obtain a maximum-length increasing subsequence ing with the special case in which the input se-
ai −aik
ai1 , ai2 , . . . , ail  satisfying ik+1
k+1 −ik
≥ m for all quence is a permutation of 1, 2, . . . , n, Bespamy-
1 ≤ k < l. Our algorithm for the SLIS problem atnikh and Segal [4] gave an algorithm running
runs in O(n log r) time and O(n) space, where r in O(n log log n) time by utilizing a data structure
is the length of an SLIS. called van Emde Boas (vEB) tree [19].
In some situations, it might be desirable to im-
pose some constraints between two consecutive el-
1 Introduction ements in the resulting subsequence. For exam-
ple, we might require two consecutive MUMs or
The longest increasing sequence (LIS) problem HSPs (high-scoring segment pairs) not too close
is a very classical problem in computer science [15]. or too far away [8, 10]. Therefore, in this pa-
Fredman [11] showed a lower bound Ω(n log n) of per, we define two constrained versions of the LIS
the LIS problem in the decision tree model. Knuth problem, namely, the range-constrained longest in-
[13] and Schensted [16] gave an optimal time algo- creasing subsequence (RLIS) problem and slope-
rithm for this problem where the input is an arbi- constrained longest increasing subsequence (SLIS)
trary sequence of n numbers. The expected length problem, respectively. Let A be a sequence
of an LIS
√ of a random
√ permutation has been shown a1 , a2 , . . . , an  whose elements are comparable,
to be 2 n − o( n) [3]. for example, a sequence of real numbers. An LIS of
Recently, researchers have found several new the sequence A is a subsequence ai1 , ai2 , . . . , ail 
applications related to the LIS problem in bioin- of A with maximum length, where i1 < i2 < . . . <

-226-
The 25th Workshop on Combinatorial Mathematics and Computation Theory

il and ai1 < ai2 < . . . < ail . The RLIS and SLIS in the subsequence. In this section, we define that
problems are formally defined as follows. ai ≺R aj if and only if LV ≤ aj − ai ≤ UV and
LI ≤ j − i ≤ UI .
The RLIS Problem:
Input: a sequence of comparable elements
2.1 The Algorithm
a1 , a2 , . . . , an  and 0 < LI ≤ UI < n,
0 ≤ LV ≤ UV
The Rank of element ai , denoted by Ranki , is
Output: an RLIS ai1 , ai2 , . . . , ail , which is a
defined as the length of an RLIS ending at ai and
maximum-length increasing subse-
computed by the following lemma.
quence satisfying LI ≤ ik+1 − ik ≤
UI and LV ≤ aik+1 − aik ≤ UV for all Lemma 1: If aj ⊀R ai for all j, let Ranki = 1,
1≤k<l and max1≤j<i {Rankj |aj ≺R ai } + 1 otherwise.
An element can be plotted on a 2-D plane us-
Our algorithm utilizes a data structure called
ing its index as the x-ordinate and value as the
the dynamic RMQ [17] described as follows. Let
y-ordinate. Consequently, the sequence A would
S be a set of items which is initially an empty set
become n points on a 2-D plane and any pair of
and maintained by an AVL tree [1]. Each item
points makes a slope.
(v, r) of S includes two given values v and r, and
The SLIS Problem: can be inserted to or deleted from S by the key v
Input: a sequence of comparable elements at any time. For any pair (p, q), the maximum r
a1 , a2 , . . . , an  and a nonnegative slope among the items in S whose v value is within the
boundary m range [p, q] can be queried by this data structure.
Output: an SLIS ai1 , ai2 , . . . , ail , which is a All operations run in O(log |S|) time, where |S| is
maximum-length increasing subse- the cardinality of S at the operation time.
quence such that the slope between In the RLIS problem, let v and r be the value
two consecutive points is no less and Rank of ai , respectively. In other words, each
ai −aik
than the input ratio, i.e., ik+1 k+1 −ik
≥ item in S has the form (ai , Ranki ). We define
m for all 1 ≤ k < l three operations specific to this problem, in which
all of them run in O(log |S|) time by the dynamic
Notice that the inputs of the RLIS and SLIS RMQ datastructure.
problems are constrained on 0 < LI ≤ UI < n,
0 ≤ LV ≤ UV , and m ≥ 0, respectively. If LV = 0 1. Insert(ai , Ranki ) — inserts the item
or m = 0, the output subsequence would be a non- (ai , Ranki ) into S.
decreasing subsequence. Moreover, our algorithm
for the RLIS problem still works if LV < 0, and 2. Delete(ai ) — deletes the item (ai , Ranki )
the output would be another related subsequence from S.
where any two consecutive elements may decrease 3. Query(p, q) — returns the item (ai , Ranki )
by at most −LV . with maximum Rank among all elements in
The rest of this paper is organized as follows. S, where p ≤ ai ≤ q.
Section 2 gives an O(n log(UI −LI )) time and O(n)
space algorithm for the RLIS problem. In Section The strategy of this algorithm is using the dy-
3, an O(n log r) time and O(n) space algorithm is namic programming to parse each element ai from
proposed for solving the SLIS problem, where r i = 1 to n. At the ith iteration, S maintains the
is the length of an SLIS. Concluding remarks are set of items (ak , Rankk ) where the index k is in
given in Section 4. the range [i − UI , i − LI ], and Ranki is computed.
Then, any element in S dominates ai if and only if
its value is in the range [ai − UV , ai − LV ]. Query-
2 The RLIS Problem ing in the dynamic RMQ datastructure with the
range [ai − UV , ai − LV ] obtains the element which
The input of the RLIS problem contains an ar- dominates ai with maximum Rank.
ray A = a1 , a2 , . . . , an , 0 < LI ≤ UI < n, and The pseudo code for the RLIS problem is given
0 ≤ LV ≤ UV . For any two elements ai and aj in Figure 1. At the first step of for loop, in or-
in A, ai is said to dominate aj , or said to be a der to fit in with the constraint of LI and UI , it
dominating element of aj , denoted by ai ≺ aj , if adjusts the set S by deleting ai−UI −1 and insert-
and only if aj can be the successive element of ai ing (ai−LI , Ranki−LI ). Secondly, querying the

-227-
The 25th Workshop on Combinatorial Mathematics and Computation Theory

By controlling the index in the set S and query-


Algorithm RLIS ing the value range in the dynamic RMQ data
Input: A = a1 , a2 , . . . , an , UI , LI , UV , LV structure, the for loop in Algorithm RLIS main-
Output: an RLIS tains Lemma 2 to get the correct Rank of all el-
for i ← 1 to n do ements. The following theorem analyzes the time
if i − UI − 1 ≥ 1 then Delete(ai−UI −1 ); and space complexities of Algorithm RLIS.
if i − LI ≥ 1 thenInsert(ai−LI , Ranki−LI );
(ak , Rankk ) ← Query(ai − UV , ai − LV )) Theorem 3: Algorithm RLIS takes O(n log(UI −
if ((ak , Rankk ) = N U LL then Ranki ← 1; LI )) time and O(n) space where n is the size of
else Ranki ← Rankk + 1; input and (UI − LI ) < n.
Make backtracking link (i → k);
end for Proof: In each round of the for loop, each
Find j s.t. Rankj = max1≤i≤n Ranki ; operation of Delete, Insert and Query is per-
Output Rankj and the sequence in reverse order formed once which runs in O(log|S|) time. The
by backtracking from j; cardinality of S is not bigger than UI − LI
because S only keeps the elements whose indices
are in the range [i − UI , i − LI ]. Thus, the total
Figure 1: An algorithm for the RLIS problem. time complexity is O(n log(UI − LI )). The space
complexity used by the dynamic RMQ is O(|S|).
The total space complexity is O(|S| + n) = O(n).
range [ai − UV , ai − LV ] returns the element which
dominates element ai with maximum Rank. If it
returns N U LL, ai will be the starting sequence
with Rank = 1. Otherwise, Ranki equals to the
Rank of returned element plus one, and we make
a backtracking link from i to the index of returned 3 The SLIS Problem
element.
After the loop, the maximum value Rankj in The input of this problem contains an array
{Ranki |1 ≤ i ≤ n} will be the length of an RLIS. A = a1 , a2 , . . . , an  and a nonnegative ratio m.
Tracing back through the backtracking links from For any two elements ai and aj in A, we define
a −a
j obtains the reverse order of the RLIS of the input that ai ≺S aj if and only if jj−i i ≥ m and j > i
sequence A. in this section. A main observation is that the
property of transitivity holds for the relation ≺S .
2.2 Correctness and Efficiency That is, if a ≺S b and b ≺S c, we have a ≺S c.
Note that this property does not hold for the re-
lation ≺R in Section 2 because of two constraints
Lemma 2: Ranki computed by Lemma 1 is the
LI and LV . The objective of the SLIS problem is
length of an RLIS ending with ai .
to find a maximum-length increasing subsequence
ai −aik
ai1 , ai2 , . . . , ail  satisfying ik+1
k+1 −ik
≥ m for all
Proof: We prove it by induction on i. If
1 ≤ k < l.
i = 1 or there does not exist any element that
dominates ai , the subsequence only contains
ai , and this lemma is true. Assume that the
3.1 The Algorithm
lemma holds when i = 2 to k. Now we con-
sider the condition i = k + 1. Let aj have the In this problem, the Rank of element ai is de-
maximum Rank value and dominate ak+1 . If fined as the length of an SLIS ending at ai and
Rankk+1 > Rankj + 1, the Rank value of ak+1 ’s computed by the following lemma.
preceding element in the subsequence will be equal
Lemma 4: If aj ⊀S ai for all j, let Ranki = 1,
to Rankk+1 − 1 > Rankj , and it also dominates
and max1≤j<i {Rankj |aj ≺S ai } + 1 otherwise.
ak+1 , a contradiction. If Rankk+1 < Rankj + 1,
it is easy to see that aj ⊀R ak+1 because We partition the whole sequence into some
Rankj ≥ Rankk+1 , also a contradiction. Then groups R1 , R2 , . . . , Rk according to the value of
we have Rankk+1 = Rankj + 1. By the induction Ranki . In other words, ai ∈ Rk if and only if
hypothesis, the lemma holds. Ranki = k. The pseudo code for the SLIS prob-
lem is given in Algorithm SLIS. The strategy of

-228-
The 25th Workshop on Combinatorial Mathematics and Computation Theory

that dominates element ai . We show that c is the


Algorithm SLIS critical point if c also dominates ai . Suppose each
Input: A = a1 , a2 , . . . , an , m element is represented on the plane which has been
Output: an SLIS described above. It forms a triangle with three
for i ← 1 to n do vertices a, c, and ai . Note that the index order
Find max{j|Rj ≺S ai }; // use binary search will be a, c and ai since c is the last element in Rk
if j = N U LL then R1 ← ai ; and ai is the current element under processing.
else Rj+1 ← ai ; Let the slopes of aai , ac and cai be represented
Make backtracking link (i ← Rj ’s index); by maai , mac and mcai , respectively. Since
end for a ≺S ai , we have maai ≥ m. Assume that
Find max{j|Rj = N U LL}; c ⊀S ai . Then we have mcai < m. Because maai
Output j and the sequence in reverse order by is the convex combination of mac and mcai , it
backtracking from j; follows that mac is greater than m. In other
words, a ≺S c and c should not have been inserted
into Rk . This is a contradiction. Therefore, we
Figure 2: An algorithm for the SLIS problem.
conclude that c ≺S ai .

this algorithm is also using dynamic programming For any element a ∈ Rk , there must exist an
to parse each element ai from i = 1 to n, and element a ∈ Rk−1 such that a ≺S a. At the ith
Ranki is computed at the ith iteration. At the iteration, a ≺S ai implies that a ≺S ai by the
ith iteration, let Rk be any partition of the ele- property of transitivity. By Lemma 5, we know
ment set {a1 , a2 , . . . , ai−1 }. An element c ∈ Rk is that the critical point c in Rk−1 dominates ai , i.e.,
called a critical point in Rk if an element in Rk c ≺S ai . Thus, the following lemma holds.
dominates ai implies that c also dominates ai .
When we want to find out if there exists any Lemma 6: For any element a ∈ Rk at the ith
dominating element in a partition Rk , it is suffi- iteration, if a ≺S ai , then the critical point of
cient to check only the critical point in Rk . We Rk−1 also dominates ai .
shall prove that for any partition Rk , the last in-
By the property of transitivity and Lemma 6,
serted element is the critical point. Thus, only the
we have Corollary 7.
element with maximum index in each partition Rk
should be recorded in our algorithm. Corollary 7: For any element a ∈ Rk at the ith
At the ith iteration of the algorithm, suppose iteration, if a ≺S ai , then for all critical points in
that the critical point which dominates ai with Rj with j < i dominates ai .
maximum Rank is in the partition of Rk . We then
have Ranki = k + 1, thus ai is inserted into Rk+1 . By Lemma 5, it is obvious to know that Al-
Similarly, if there does not exist any dominating gorithm SLIS implements the Lemma 6. By
element of ai , then ai ∈ R1 and Ranki = 1. the Corollary 7, performing a binary search takes
We shall also show that for any element a ∈ Rk O(log |R|) time for each element, where |R| is the
at the ith iteration, if a ≺S ai , then we have number of Rank partitions and does not greater
c ≺S ai for all critical points c in Rj with j < k. than the output size r. It takes O(n) space
By this property, the algorithm uses binary search to record all the critical points and backtracking
to find the dominating critical point with maxi- links. Therefore, we have the following theorem.
mum Rank. The remaining steps of the algorithm
such as making backtracking links and outputting Theorem 8 : Algorithm SLIS delivers an SLIS
length and sequence are the same with Section 2. of an input sequence in O(n log r) time and O(n)
space, where n is the input size and r is the output
3.2 Correctness and Efficiency size.

Lemma 5 : For any partition Rk , the last in-


serted element (with maximum index) is the crit- 4 Concluding Remarks
ical point.
We investigate two constrained versions of the
Proof: At the ith iteration, let c be the last in- LIS problem. By using the dynamic RMQ data
serted element in Rk and let a ∈ Rk be an element structure, we present an algorithm that solves the

-229-
The 25th Workshop on Combinatorial Mathematics and Computation Theory

RLIS problem in O(n log(UI −LI )) time and O(n) [8] A.L. Delcher, S. Kasif, R.D. Fleischmann, J.
space, where UI and LI are the upper and lower Peterson, O. white, and S.L. Salzberg. Align-
bounds of differences on indices, respectively. An- ment of whole genomes. Nucleic Acids Res.,
other algorithm presented in this paper utilizes the 27(11):2369–2376, 1999.
concept of “critical point” to solve the SLIS prob-
lem in O(n log r) time and O(n) space, where r is [9] A.L. Delcher, A. Phillippy, J. Carlton, and
the output length. S.L. Salzberg. Fast algorithms for large-scale
genome alignment and comparison. Nucleic
Acknowledgments. We thank our advisor Prof. Acids Res., 30(11):2478–2483, 2002.
Kun-Mao Chao for valuable comments. I-Hsuan
Yang and Yi-Ching Chen were supported in part [10] L. Florea, G. Hartzell, Z. Zhang, G.M. Rubin,
by NSC grants 95-2221-E-002078, 95-2221-E-002- and W. Miller. A computer program for align-
126-MY3, and 96-2221-E-002-034 from the Na- ing a cDNA sequence with a genomic DNA
tional Science Council, Taiwan. sequence. Genome Res., 8(9), 967–974, 1998.

[11] M.L. Fredman. On computing the length


of longest increasing subsequences. Discrete
References Math., 11:29–35, 1975.

[1] G.M. Adelson-Velskil and E.M. Landis. So- [12] I. Katriel and M. Kutz. A faster algorithm
viet Math. (Dokl.) 3, 1259–1263, 1962. for computing a longest common increasing
subsequence. Research Report MPI-I-2005-
[2] M.H. Albert, A. Golynski, A.M. Hamel, A.
1-007, Max-Planck-Institut für Informatik,
López-Ortiz, S.S. Rao, and M.A. Safari.
Stuhlsatzenhausweg 85, 66123 Saarbrücken,
Longest increasing subsequences in sliding
German, March 2005.
windows. Theor. Comput. Sci., 321:405–414,
2004. [13] D.E. Knuth. The Art of Computer Program-
[3] D. Aldous and P. Diaconis. Longest increas- ming. Vol 3, Sorting and searching. Addison-
ing subsequences: From patience sorting to Wesley, 1998.
the Baik-Deift-Johansson theorem. Bulletin
[14] S. Kurtz, A. Phillippy, A.L. Delcher, M.
(New Series) of the American Mathematical
Smoot, M. Shumway, C. Antonescu, and
Society, 36(4):413–432, 1999.
S.L. Salzberg. Versatile and open software for
[4] S. Bespamyatnikh and M. Segal. Enumerat- comparing large genomes. Genome Biology.,
ing longest increasing subsequences and pa- 5:R12, 2004.
tience sorting. Inf. Process. Lett., 76:7–11,
2000. [15] U. Manber. Introduction to algorithms – A
creative approach. Addison-Wesly, 1989.
[5] G.S. Brodal, K. Kaligosi, I. Katriel, and
M. Kutz. Faster algorithms for comput- [16] C. Schensted. Longest increasing and decreas-
ing longest common increasing subsequences. ing subsequences. Canad. J. Math., 13:179–
BRICS-RS-05-37, December 2005. 191, 1961.

[6] W.T. Chan, Y. Zang, S.P.Y Fung, D. Ye, [17] T. Shibuya and I. Kurochkin. Match chain-
and H. Zhu. Efficient algorithms for finding ing algorithms for cDNA mapping. In Proc.
a longest common increasing subsequence. In 3rd International Workshop on Algorithms in
Proc. 16th Annual International Symposium Bioinformatics (WABI 2003), 2812:462–475,
on Algorithms and Computation (ISAAC 2003.
2005), 665–674, 2005.
[18] T.F. Smith and M.S. Waterman. Comparison
[7] E. Chen, H. Yuan, and L. Yang. Longest of biosequences. Adv. in Appl. Math., 2:482–
increasing subsequences in windows based 489, 1981.
on canonical antichain partition. In Proc.
16th Annual International Symposium on Al- [19] P. van Emde Boas. Preserving order in a for-
gorithms and Computation (ISAAC 2005), est in less than logarithmic time and linear
1153–1162, 2005. space. Inf. Process. Lett., 6(3):80–82, 1977.

-230-
The 25th Workshop on Combinatorial Mathematics and Computation Theory

[20] I.H. Yang, C.P. Huang, and K.M. Chao.


A fast algorithm for computing a longest
common increasing subsequence. Inf. Process.
Lett., 93:249–253, 2005.

[21] H. Zhang. Alignment of blast high-scoring


segment pairs based on the longest increas-
ing subsequence algorithm. Bioinformatics,
19(11):1391–1396, 2003.

-231-

You might also like