Sorting Genomes by SBR

A FASTER AND SIMPLER ALGORITHM FOR SORTING SIGNED
PERMUTATIONS BY REVERSALS
HAIM KAPLAN†, RON SHAMIR‡, AND ROBERT E. TARJAN§

Abstract. We give a quadratic-time algorithm for finding the minimum number of reversals needed to sort a signed permutation. Our algorithm is faster than the previous algorithm of Hannenhalli and Pevzner and its faster implementation by Berman and Hannenhalli. The algorithm is conceptually simple and does not require special data structures. Our study also considerably simplifies the combinatorial structures used by the analysis.

AMS (MOS) subject classification: 62P10 68P10

Key words: sorting permutations, reversal distance, computational molecular biology.
1. Introduction. In this paper we study the problem of sorting signed permutations by reversals. A signed permutation is a permutation $\pi = (\pi_1, \ldots, \pi_n)$ on the integers $\{1, \ldots, n\}$, where each number is also assigned a sign of plus or minus. A reversal, $\rho(i, j)$, on $\pi$ transforms $\pi$ to
$$\pi' = \pi\rho(i, j) = (\pi_1, \ldots, \pi_{i-1}, -\pi_j, -\pi_{j-1}, \ldots, -\pi_i, \pi_{j+1}, \ldots, \pi_n).$$
The minimum number of reversals needed to transform one permutation to another is called the reversal distance between them. The problem of sorting signed permutations by reversals is to find, for a given signed permutation $\pi$, a sequence of reversals of minimum length that transforms $\pi$ to the identity permutation $(+1, +2, \ldots, +n)$.
The motivation for studying the problem arises in molecular biology: Concurrent with the fast progress of the Human Genome Project, genetic and DNA data on many model organisms are accumulating rapidly, and consequently the ability to compare genomes of different species has grown dramatically. One of the best ways of checking similarity between genomes on a large scale is to compare the order of appearance of identical genes in the two species. In the Thirties, Dobzhansky and Sturtevant [7] had already studied the notion of inversions in chromosomes of Drosophila. In the late Eighties, Jeffrey Palmer demonstrated that different species may have essentially the same genes, but the gene orders may differ between species. Taking an abstract perspective, the genes along a chromosome can be thought of as points along a line. Numbers identify the particular genes; and, as genes have directionality, signs correspond to their direction. Palmer and others have shown that the difference in order may be explained by a small number of reversals [17, 18, 19, 20, 12]. These reversals correspond to evolutionary changes during the history of the two genomes, so the number of reversals reflects the evolutionary distance between the species. Hence, given two such permutations, their reversal distance measures their evolutionary distance.

A preliminary version of this paper was presented at the Eighth ACM-SIAM Symposium on Discrete Algorithms [13].
† AT&T Labs Research, 180 Park Ave, Florham Park, NJ 07932 USA. hkl@research.att.com
‡ Department of Computer Science, Sackler Faculty of Exact Sciences, Tel Aviv University, Tel-Aviv 69978 ISRAEL. Research supported in part by a grant from the Ministry of Science and the Arts, Israel, and by US Department of Energy, grant No. DE-FG03-94ER61913/A000. shamir@math.tau.ac.il
§ Department of Computer Science, Princeton University, Princeton, NJ 08544 USA and InterTrust Technologies Corporation, Sunnyvale, CA 94086 USA. Research at Princeton University partially supported by the NSF, Grants CCR-8920505 and CCR-9626862, and the Office of Naval Research, Contract No. N00014-91-J-1463. ret@cs.princeton.edu

Mathematical analysis of genome rearrangement problems was initiated by Sankoff [22, 21]. Kececioglu and Sankoff [16] gave the first constant-factor polynomial approximation algorithm for the problem and conjectured that the problem is NP-hard. Bafna and Pevzner [3], and more recently Christie [6], improved the approximation factor, and additional studies have revealed the rich combinatorial structure of rearrangement problems [15, 14, 2, 9, 10]. Quite recently, Caprara [5] has established that sorting unsigned permutations is NP-hard, using some of the combinatorial tools developed by Bafna and Pevzner [3].
In 1995, Hannenhalli and Pevzner [11] showed that the problem of sorting a signed permutation by reversals is polynomial. They proved a duality theorem that equates the reversal distance with the sum of three combinatorial parameters (see Theorem 2.3 below). Based on this theorem, they proved that sorting signed permutations by reversals can be done in $O(n^4)$ time. More recently, Berman and Hannenhalli [4] described a faster implementation that finds a minimum sequence of reversals in $O(n^2 \alpha(n))$ time, where $\alpha$ is the inverse of Ackermann's function [1] (see also [23]).
In this study we give an $O(n^2)$ algorithm for sorting a signed permutation of $n$ elements, thereby improving upon the previous best known bound [4]. In fact, if the reversal distance is $r$, our algorithm requires $O(r \cdot n + n\alpha(n))$ time. In addition to giving a better time bound, our work considerably simplifies both the algorithm and the combinatorial structure needed for the analysis, as follows:
• The basic object we work with is an implicit representation of the overlap graph, to be defined later, in contrast with the interleaving graph in [11] and [4]. The overlap graph is combinatorially simpler than the interleaving graph. As a result, it is easier to produce a representation for the overlap graph from the input, and to maintain it while searching for reversals.
• As a consequence of our ability to work with the overlap graph we need not perform any "padding transformations", nor do we have to work with "simple permutations" as in [11] and [4].
• We deal with the unoriented and oriented parts of the permutation separately, which makes the algorithm much simpler.
• The notion of a hurdle, one of the combinatorial entities defined by [11] for the duality theorem, is simplified and is handled in a more symmetric manner.
• The search for the next reversal is much simpler, and requires no special data structures. Our algorithm computes connected components only once, and any simple implementation of it suffices to obtain the quadratic time bound. In contrast, in [4] a logarithmic number of connected component computations may be performed per reversal, using the union-find data structure.
The paper is organized as follows: Section 2 gives the necessary preliminaries. Section 3 gives an overview of our algorithm. Sections 4 and 5 give the details of our algorithm. We summarize our results and suggest some further research in Section 6.
2. Preliminaries. This section gives the basic background, primarily the theory of Hannenhalli and Pevzner, on which we base our algorithm. The reader may find it helpful to refer to Figure 2.1, in which the main definitions are illustrated. We start with some definitions for unsigned permutations. Let $\pi = (\pi_1, \ldots, \pi_n)$ denote a permutation of $\{1, \ldots, n\}$. Augment $\pi$ to a permutation on $n + 2$ vertices by adding $\pi_0 = 0$ and $\pi_{n+1} = n + 1$ to it. A pair $(\pi_i, \pi_{i+1})$, $0 \le i \le n$, is called a gap. Gaps are classified into two types. A gap $(\pi_i, \pi_{i+1})$ is a breakpoint of $\pi$ if and only if $|\pi_i - \pi_{i+1}| > 1$; otherwise, it is an adjacency of $\pi$. We denote by $b(\pi)$ the number of breakpoints in $\pi$.
A reversal, $\rho(i, j)$, on a permutation $\pi$ transforms $\pi$ to
$$\pi' = \pi\rho(i, j) = (\pi_1, \ldots, \pi_{i-1}, \pi_j, \pi_{j-1}, \ldots, \pi_i, \pi_{j+1}, \ldots, \pi_n).$$
We say that a reversal $\rho(i, j)$ acts on the gaps $(\pi_{i-1}, \pi_i)$ and $(\pi_j, \pi_{j+1})$.
Fig. 2.1. a) The breakpoint graph, $B(\pi)$, of the permutation $\pi = (4, -3, 1, -5, -2, 7, 6)$. Black edges are solid; gray edges are dashed; oriented edges are bold. b) $B(\pi)$ decomposes into two disjoint alternating cycles. c) The overlap graph, $OV(\pi)$. Black vertices correspond to oriented edges.
2.1. The breakpoint graph. The breakpoint graph $B(\pi)$ of a permutation $\pi = (\pi_1, \ldots, \pi_n)$ is an edge-colored graph on $n + 2$ vertices $\{\pi_0, \pi_1, \ldots, \pi_{n+1}\} = \{0, 1, \ldots, n + 1\}$. We join vertices $\pi_i$ and $\pi_j$ by a black edge if $(\pi_i, \pi_j)$ is a breakpoint in $\pi$ and by a gray edge if $(i, j)$ is a breakpoint in $\pi^{-1}$.
We define a one-to-one mapping $u$ from the set of signed permutations of order $n$ into the set of unsigned permutations of order $2n$ as follows. Let $\pi$ be a signed permutation. To obtain $u(\pi)$, replace each positive element $x$ in $\pi$ by $2x - 1, 2x$ and each negative element $-x$ by $2x, 2x - 1$. For any signed permutation $\pi$, let $B(\pi) = B(u(\pi))$. Note that in $B(\pi)$ every vertex is either isolated or incident to exactly one black edge and one gray edge. Therefore, there is a unique decomposition of $B(\pi)$ into cycles. The edges of each cycle alternate between gray and black. Call a reversal $\rho(i, j)$ such that $i$ is odd and $j$ even an even reversal. The reversal $\rho(2i + 1, 2j)$ on $u(\pi)$ mimics the reversal $\rho(i + 1, j)$ on $\pi$. Thus, sorting $\pi$ by reversals is equivalent to sorting the unsigned permutation $u(\pi)$ by even reversals. Henceforth we will consider the latter problem, and by a reversal we will always mean an even reversal. Let $b(\pi) = b(u(\pi))$ and let $c(\pi)$ be the number of cycles in $B(\pi)$.
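The mapping $u$ and the correspondence between signed reversals and even reversals can be checked on the running example. The following Python sketch is an illustration only; the function names are assumptions introduced here, and the sequence returned by `unsigned_image` is already augmented with 0 and $2n + 1$ so that list indices coincide with positions.

```python
def unsigned_image(pi):
    """Map a signed permutation to u(pi), framed by 0 and 2n + 1."""
    out = [0]
    for x in pi:
        if x > 0:
            out += [2 * x - 1, 2 * x]      # positive x -> 2x-1, 2x
        else:
            out += [-2 * x, -2 * x - 1]    # negative -x -> 2x, 2x-1
    out.append(2 * len(pi) + 1)
    return out


def count_breakpoints(seq):
    """Number of gaps whose two entries differ by more than 1."""
    return sum(1 for a, b in zip(seq, seq[1:]) if abs(a - b) > 1)


def signed_reversal(pi, i, j):
    """rho(i, j) on a signed permutation (1-based, inclusive): reverse and negate."""
    return pi[:i - 1] + [-x for x in reversed(pi[i - 1:j])] + pi[j:]


def unsigned_reversal(seq, i, j):
    """rho(i, j) on an augmented unsigned sequence: reverse positions i..j."""
    return seq[:i] + seq[i:j + 1][::-1] + seq[j + 1:]


pi = [4, -3, 1, -5, -2, 7, 6]
u = unsigned_image(pi)
# The even reversal rho(2i + 1, 2j) with i = 3, j = 5 mimics rho(4, 5) on pi.
same = unsigned_reversal(u, 7, 10) == unsigned_image(signed_reversal(pi, 4, 5))
```

Here `same` is true, and `count_breakpoints(u)` gives the value $b(\pi)$ for the example permutation.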
Figure 2.1(a) shows the breakpoint graph of the permutation $\pi = (4, -3, 1, -5, -2, 7, 6)$. It has eight breakpoints and decomposes into two alternating cycles, i.e., $b(\pi) = 8$ and $c(\pi) = 2$. The two cycles are shown in Figure 2.1(b). Figure 2.2(a) shows the breakpoint graph of $\pi' = (4, -3, 1, 2, 5, 7, 6)$, which has seven breakpoints and decomposes into two cycles.
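The cycle counts quoted for the two example permutations can be reproduced with a direct traversal of the breakpoint graph. The sketch below is a naive illustration under the definitions above, not the representation used by the algorithm; `unsigned_image` and `count_cycles` are names assumed here.

```python
def unsigned_image(pi):
    """u(pi) framed by 0 and 2n + 1 (positive x -> 2x-1, 2x; negative -x -> 2x, 2x-1)."""
    out = [0]
    for x in pi:
        out += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    return out + [2 * len(pi) + 1]


def count_cycles(pi):
    """Count the alternating cycles of B(pi) = B(u(pi))."""
    u = unsigned_image(pi)
    pos = {v: p for p, v in enumerate(u)}
    black, gray = {}, {}
    for a, b in zip(u, u[1:]):
        if abs(a - b) > 1:                 # breakpoint of u(pi): black edge
            black[a], black[b] = b, a
    for v in range(0, len(u), 2):          # values 2k and 2k + 1
        if abs(pos[v] - pos[v + 1]) > 1:   # not adjacent in u(pi): gray edge
            gray[v], gray[v + 1] = v + 1, v
    seen, cycles = set(), 0
    for start in black:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:               # follow black, then gray, until we return
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = gray[w]
    return cycles
```

Every vertex that has a black edge also has a gray edge, so the alternating walk always closes up at its starting vertex.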
For an arbitrary reversal $\rho$ on a permutation $\pi$, define $\Delta b(\pi, \rho) = b(\pi\rho) - b(\pi)$ and $\Delta c(\pi, \rho) = c(\pi\rho) - c(\pi)$. When the reversal $\rho$ and the permutation $\pi$ are clear from the context, we will abbreviate $\Delta b(\pi, \rho)$ by $\Delta b$ and $\Delta c(\pi, \rho)$ by $\Delta c$. As Bafna and Pevzner [3] observed, the following values are taken by $\Delta b$ and $\Delta c$ depending on the types of the gaps $\rho(i, j)$ acts on:
1. Two adjacencies: $\Delta c = 1$ and $\Delta b = 2$.
2. A breakpoint and an adjacency: $\Delta c = 0$ and $\Delta b = 1$.
3. Two breakpoints each belonging to a different cycle: $\Delta b = 0$, $\Delta c = -1$.
4. Two breakpoints of the same cycle $C$:
   a. $(\pi_i, \pi_{j+1})$ and $(\pi_{i-1}, \pi_j)$ are gray edges: $\Delta c = -1$, $\Delta b = -2$.
   b. Exactly one of $(\pi_i, \pi_{j+1})$ and $(\pi_{i-1}, \pi_j)$ is a gray edge: $\Delta c = 0$, $\Delta b = -1$.
   c. Neither $(\pi_i, \pi_{j+1})$ nor $(\pi_{i-1}, \pi_j)$ is a gray edge, and when breaking $C$ at $i$ and $j$ vertices $i - 1$ and $j + 1$ end up in the same path: $\Delta b = 0$, $\Delta c = 0$.
   d. Neither $(\pi_i, \pi_{j+1})$ nor $(\pi_{i-1}, \pi_j)$ is a gray edge, and when breaking $C$ at $i$ and $j$ vertices $i - 1$ and $j + 1$ end up in different paths: $\Delta b = 0$, $\Delta c = 1$.
Call a reversal proper if $\Delta b - \Delta c = -1$, i.e., it is either of type 4a, 4b, or 4d.
We say that a reversal $\rho$ acts on a gray edge $e$ if it acts on the breakpoints which correspond to the black edges incident with $e$. A gray edge is oriented if a reversal acting on it is proper, otherwise it is unoriented. Notice that a gray edge $(\pi_k, \pi_l)$ is oriented if and only if $k + l$ is even. For example, the gray edge $(0, 1)$ in the graph of Figure 2.1(a) is unoriented, while the gray edge $(7, 6)$ is oriented.
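The parity test for orientation needs only the positions of the two endpoints. The sketch below assumes a position array `pos` playing the role of the inverse permutation; the function names and the choice to key each gray edge by its value pair $(2k, 2k + 1)$ are illustrative assumptions.

```python
def unsigned_image(pi):
    """u(pi) framed by 0 and 2n + 1."""
    out = [0]
    for x in pi:
        out += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    return out + [2 * len(pi) + 1]


def gray_edges(pi):
    """Map each gray edge (2k, 2k+1) to (left position, right position, oriented)."""
    u = unsigned_image(pi)
    pos = [0] * len(u)                     # pos[v] = position of value v in u(pi)
    for p, v in enumerate(u):
        pos[v] = p
    edges = {}
    for v in range(0, len(u), 2):
        p, q = sorted((pos[v], pos[v + 1]))
        if q - p > 1:                      # endpoints not adjacent: a gray edge exists
            edges[(v, v + 1)] = (p, q, (p + q) % 2 == 0)
    return edges


edges = gray_edges([4, -3, 1, -5, -2, 7, 6])
oriented = {e for e, (_, _, is_oriented) in edges.items() if is_oriented}
```

For the permutation of Figure 2.1 this marks the edge written $(7, 6)$ in the text, stored here under the key `(6, 7)`, as oriented and the edge $(0, 1)$ as unoriented.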
2.2. The overlap graph. Two intervals on the real line overlap if their intersection is nonempty but neither properly contains the other. A graph $G$ is an interval overlap graph if one can assign an interval to each vertex such that two vertices are adjacent if and only if the corresponding intervals overlap (see, e.g., [8]). For a permutation $\pi$, we associate with a gray edge $(\pi_i, \pi_j)$ the interval $[i, j]$. The overlap graph of a permutation $\pi$, denoted $OV(\pi)$, is the interval overlap graph of the gray edges of $B(\pi)$. Namely, the vertex set of $OV(\pi)$ is the set of gray edges in $B(\pi)$, and two vertices are connected if the intervals associated with their gray edges overlap. We shall identify a vertex in $OV(\pi)$ with the edge it represents and with its interval in the representation. Thus, the endpoints of a gray edge are actually the endpoints of the interval representing the corresponding vertex in $OV(\pi)$. Note that all the endpoints of intervals in this representation are distinct integers. A connected component of $OV(\pi)$ that contains an oriented edge is called an oriented component; otherwise, it is called an unoriented component.
Figure 2.1(c) shows the interval overlap graph for $\pi = (4, -3, 1, -5, -2, 7, 6)$. It has only one oriented component. Figure 2.2(b) shows the overlap graph of the permutation $\pi' = (4, -3, 1, 2, 5, 7, 6)$, which has two connected components, one oriented and the other unoriented.
Fig. 2.2. a) The breakpoint graph of $\pi' = (4, -3, 1, 2, 5, 7, 6)$. $\pi'$ was obtained from $\pi$ of Figure 2.1 by the reversal $\rho(7, 10)$; or, equivalently, by the reversal defined by the gray edge $(2, 3)$. b) The overlap graph of $\pi'$.
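For small examples, the components of the overlap graph and their orientation can be computed by brute force. The sketch below builds the graph explicitly in quadratic time, which the algorithm of this paper deliberately avoids; it is meant only to make the definitions concrete, and all names are assumptions.

```python
def unsigned_image(pi):
    """u(pi) framed by 0 and 2n + 1."""
    out = [0]
    for x in pi:
        out += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    return out + [2 * len(pi) + 1]


def gray_intervals(pi):
    """Map each gray edge (2k, 2k+1) to its interval (left, right) of positions."""
    u = unsigned_image(pi)
    pos = {v: p for p, v in enumerate(u)}
    out = {}
    for v in range(0, len(u), 2):
        p, q = sorted((pos[v], pos[v + 1]))
        if q - p > 1:
            out[(v, v + 1)] = (p, q)
    return out


def overlap(a, b):
    """Intervals with distinct endpoints overlap iff l1 < l2 < r1 < r2 after sorting."""
    (l1, r1), (l2, r2) = sorted((a, b))
    return l1 < l2 < r1 < r2


def components(pi):
    """Connected components of OV(pi) as (set of gray edges, is_oriented) pairs."""
    iv = gray_intervals(pi)
    seen, result = set(), []
    for s in iv:
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], set()
        while stack:                       # depth-first search over overlapping intervals
            e = stack.pop()
            comp.add(e)
            for f in iv:
                if f not in seen and overlap(iv[e], iv[f]):
                    seen.add(f)
                    stack.append(f)
        # A component is oriented if some edge has endpoints of equal parity.
        result.append((comp, any(sum(iv[e]) % 2 == 0 for e in comp)))
    return result
```

Applied to the permutations of Figures 2.1 and 2.2, `components` returns one oriented component for the former and one oriented plus one unoriented component for the latter.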
2.3. The connected components of the overlap graph. Let $X$ be a set of gray edges in $B(\pi)$. Define $\min(X) = \min\{i \mid (\pi_i, \pi_j) \in X\}$, $\max(X) = \max\{j \mid (\pi_i, \pi_j) \in X\}$ and $span(X) = [\min(X), \max(X)]$. Equivalently, one can look at the interval overlap representation of $OV(\pi)$ mentioned above and define the span of a set of vertices $X$ as the minimum interval which contains all the intervals of vertices in $X$.
The major object our algorithm will work with is $OV(\pi)$, though for efficiency considerations we will avoid generating it explicitly. In contrast, Pevzner and Hannenhalli worked with the interleaving graph $H_\pi$, whose vertices are the alternating cycles of $B(\pi)$. Two cycles $C_1$ and $C_2$ are connected by an edge in $H_\pi$ iff there exists a gray edge $e_1 \in C_1$ and a gray edge $e_2 \in C_2$ that overlap.
The following lemma and its corollary imply that the partition imposed by the connected components of $OV(\pi)$ on the set of gray edges is identical to the one imposed by the connected components of $H_\pi$:
Lemma 2.1. If $M$ is a set of gray edges in $B(\pi)$ that corresponds to a connected component in $OV(\pi)$ then $\min(M)$ is even and $\max(M)$ is odd.
Proof. Assume $\min(M)$ is odd. Then $\pi_{\min(M)} + 1$ and $\pi_{\min(M)} - 1$ must both be in $span(M)$ (i.e., there exist $l_1, l_2 \in span(M)$ such that $\pi_{l_1} = \pi_{\min(M)} + 1$ and $\pi_{l_2} = \pi_{\min(M)} - 1$). Thus $\pi_{\min(M)}$ is neither the maximum nor the minimum element in the set $\{\pi_i \mid i \in span(M)\}$. Hence, either the maximum element or the minimum element in $span(M)$ is $\pi_j$ for some $\min(M) < j < \max(M)$. By the definition of $B(\pi)$ there must be a gray edge $(\pi_j, \pi_l)$ for some $l \notin span(M)$, contradicting the fact that $M$ is a connected component in $OV(\pi)$. The proof that $\max(M)$ is odd is similar.
As an illustration of Lemma 2.1, consider Figure 2.2(a). Let $M_1 = \{(0, 1), (4, 5), (8, 9), (6, 7)\}$ and $M_2 = \{(10, 11), (12, 13), (14, 15)\}$. Then $span(M_1) = [0, 9]$ and $span(M_2) = [10, 15]$.
Corollary 2.2. Every connected component of $OV(\pi)$ corresponds to the set of gray edges of a union of cycles.
Proof. Assume by contradiction that $C$ is a cycle whose gray edges belong to at least two connected components in $OV(\pi)$. Assume $M_1$ and $M_2$ are two of these components such that there are two consecutive gray edges $e_1 \in M_1$ and $e_2 \in M_2$ along $C$. Since the spans of different connected components in $OV(\pi)$ cannot overlap there are two different cases to consider.
1. $span(M_1) \subset span(M_2)$ (the case $span(M_2) \subset span(M_1)$ is symmetric). Since $e_1$ and $e_2$ are in different components they cannot overlap. Thus, either the right endpoint of $e_1$ is even and equals $\max(M_1)$ or the left endpoint of $e_1$ is odd and equals $\min(M_1)$. In both cases we have a contradiction to Lemma 2.1.
2. $span(M_1)$ and $span(M_2)$ are disjoint intervals. W.l.o.g. assume that $\max(M_1) < \min(M_2)$. The right endpoint of $e_1$ is even and equals $\max(M_1)$, which contradicts Lemma 2.1.
Note that in particular Corollary 2.2 implies that an overlap graph cannot contain isolated vertices.
2.4. Hurdles. Let $\pi_{i_1}, \pi_{i_2}, \ldots, \pi_{i_k}$ be the subsequence of $0, \pi_1, \ldots, \pi_n, n + 1$ consisting of those elements incident with gray edges that occur in unoriented components of $OV(\pi)$. Order $\pi_{i_1}, \pi_{i_2}, \ldots, \pi_{i_k}$ on a circle $CR$ such that $\pi_{i_j}$ follows $\pi_{i_{j-1}}$ for $2 \le j \le k$ and $\pi_{i_1}$ follows $\pi_{i_k}$. Let $M$ be an unoriented connected component in $OV(\pi)$. Let $E(M) \subseteq \{\pi_{i_1}, \pi_{i_2}, \ldots, \pi_{i_k}\}$ be the set of endpoints of the edges in $M$. An unoriented component $M$ is a hurdle if the elements of $E(M)$ occur consecutively on $CR$.
This definition of a hurdle is different from the one given by Hannenhalli and Pevzner [11]. It is simpler in the sense that minimal hurdles and the maximal one do not have to be treated in different ways. Using Corollary 2.2 above, one can prove that the hurdles as we have defined them are identical to the ones defined by Hannenhalli and Pevzner. Let $h(\pi)$ denote the number of hurdles in a permutation $\pi$.
A hurdle is simple if when one deletes it from $OV(\pi)$ no other unoriented component becomes a hurdle, and it is a super hurdle otherwise. A fortress is a permutation with an odd number of hurdles all of which are super hurdles.
The following theorem was proved by Hannenhalli and Pevzner.
Theorem 2.3. [11] The minimum number of reversals required to sort a permutation $\pi$ is $b(\pi) - c(\pi) + h(\pi)$, unless $\pi$ is a fortress, in which case exactly one additional reversal is necessary and sufficient.
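The circular definition of a hurdle, the simple/super distinction, and the distance formula of Theorem 2.3 can be sketched as follows. This is a naive illustration, not the linear-time procedure of Section 5: each unoriented component is assumed to be given as a set of endpoint positions, and the function names are introduced here for the example.

```python
def hurdles(unoriented):
    """Indices of components whose endpoints form one cyclic run on the circle CR."""
    labels = [k for _, k in sorted((p, k) for k, comp in enumerate(unoriented) for p in comp)]
    runs = [0] * len(unoriented)
    for i, k in enumerate(labels):
        if labels[i - 1] != k:             # a run of component k starts here (index -1 wraps)
            runs[k] += 1
    return {k for k, n in enumerate(runs) if n <= 1}


def classify(unoriented):
    """Split the hurdles into (simple hurdles, super hurdles) by deleting each in turn."""
    hs = hurdles(unoriented)
    simple, super_ = set(), set()
    for h in hs:
        keep = [k for k in range(len(unoriented)) if k != h]
        after = {keep[k] for k in hurdles([unoriented[k] for k in keep])}
        # h is simple if its deletion turns no other component into a hurdle.
        (simple if after <= hs else super_).add(h)
    return simple, super_


def reversal_distance(b, c, unoriented):
    """b - c + h, plus 1 for a fortress (odd number of hurdles, all of them super)."""
    simple, super_ = classify(unoriented)
    h = len(simple) + len(super_)
    fortress = h % 2 == 1 and not simple
    return b - c + h + (1 if fortress else 0)
```

For the permutation $\pi'$ of Figure 2.2 the single unoriented component has endpoint positions 10 through 15, so `reversal_distance(7, 2, [set(range(10, 16))])` evaluates $7 - 2 + 1$.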
3. Overview of our algorithm. Denote by $d(\pi)$ the reversal distance of $\pi$, i.e., $d(\pi) = b(\pi) - c(\pi) + h(\pi) + 1$ if $\pi$ is a fortress and $d(\pi) = b(\pi) - c(\pi) + h(\pi)$ otherwise.
Following the theory developed in [11], it turns out that given a permutation $\pi$ with $h(\pi) > 0$ one can perform $t = \lceil h(\pi)/2 \rceil$ reversals and transform $\pi$ into a permutation $\pi'$ such that $h(\pi') = 0$ and $d(\pi') = d(\pi) - t$. If $OV(\pi)$ has unoriented components then our algorithm first finds $t$ such reversals that transform $\pi$ into a $\pi'$ which has only oriented components.
Our method of "clearing the hurdles" uses the theory developed by Hannenhalli and Pevzner. In Section 5 we describe an efficient implementation of this process which uses the implicit representation of the overlap graph $OV(\pi)$. Our implementation runs in $O(n)$ time assuming $OV(\pi)$ is already partitioned into its connected components. Recently, Berman and Hannenhalli [4] gave an $O(n\alpha(n))$ algorithm for computing the connected components of an interval overlap graph given implicitly by its representation. Using their algorithm we can clear the hurdles from a permutation in $O(n\alpha(n))$ time.
The overlap graph of $\pi'$, $OV(\pi')$, has only oriented components. In Section 4 we prove that in the neighborhood of any oriented gray edge $e$ there is an oriented gray edge $e'$ ($e'$ could be the same as $e$) such that a reversal acting on $e'$ does not create new hurdles. Call such a reversal a safe reversal. We develop an efficient algorithm to locate a safe reversal in a permutation with at least one oriented gray edge. Our algorithm uses only an implicit representation of the overlap graph and runs in $O(n)$ time.
The second stage of our algorithm repeatedly finds a safe reversal and performs it as long as $OV(\pi)$ is not empty. Clearly the overall complexity is $O(r \cdot n + n\alpha(n))$, where $r$ is the number of reversals required to sort $\pi'$.
3.1. Representing the overlap graph. We assume that the input is given as a sequence of $n$ signed integers representing the signed permutation. First its unsigned image $\pi$ under the mapping $u$ is constructed as described in Section 2.1 and stored in an array. We also construct an array representing $\pi^{-1}$. It is straightforward to verify that with these two arrays we can determine for each element in $\pi$ whether it is a left or a right endpoint of a gray edge in constant time. In case the element is an endpoint of a gray edge we can also find the other endpoint and check whether the edge is oriented in constant time.
Thus the arrays $\pi$ and $\pi^{-1}$ comprise a representation of $OV(\pi)$. Our algorithm will maintain these two arrays while carrying out the reversals that it finds. The time to update the arrays is proportional to the length of the interval being reversed, which is $O(n)$. We shall give a high-level presentation of our algorithm and use primitives like "Scan the oriented gray edges in increasing left endpoint order". It is easy to see how to implement these primitives using the arrays $\pi$ and $\pi^{-1}$; we shall omit the details.
It is easy to produce a list of the intervals in the representation of $OV(\pi)$ sorted by either left or right endpoint from the arrays $\pi$ and $\pi^{-1}$. It is also possible to maintain them without increasing the asymptotic time bound of the algorithm. In practice it may be faster to maintain such lists instead of, or in addition to, $\pi$ and $\pi^{-1}$.
4. Eliminating oriented components. First we introduce some notation. Recall that the vertices of $OV(\pi)$ are the gray edges of $B(\pi)$. In order to avoid confusion we will usually refer to them as vertices of $OV(\pi)$. Hence a vertex of $OV(\pi)$ is oriented if the corresponding gray edge is oriented and it is unoriented otherwise. Let $e$ be a vertex in $OV(\pi)$. Denote by $r(e)$ the reversal acting on the gray edge corresponding to $e$. Denote by $N(e)$ the set of neighbors of $e$ in $OV(\pi)$ including $e$ itself. Denote by $ON(e)$ the subset of $N(e)$ containing the oriented vertices and by $UN(e)$ the subset of $N(e)$ containing the unoriented vertices.
In this section we prove that if an oriented vertex $e$ exists in $OV(\pi)$ then there exists an oriented vertex $f \in ON(e)$ such that $r(f)$ is proper and safe. We also describe an algorithm that finds a proper safe reversal in a permutation that contains at least one oriented edge.
We start with the following useful observation:
Observation 4.1. Let $e$ be a vertex in $OV(\pi)$ and let $\pi' = \pi r(e)$. $OV(\pi')$ could be obtained from $OV(\pi)$ by the following operations. 1) Complement the graph induced by $OV(\pi)$ on $N(e) - \{e\}$, and flip the orientation of every vertex in $N(e) - \{e\}$. 2) If $e$ is oriented in $OV(\pi)$ then remove it from $OV(\pi)$. 3) If there exists an oriented edge $e'$ in $OV(\pi)$ with $r(e) = r(e')$ then remove $e'$ from $OV(\pi)$.
Note that if $e$ is an oriented vertex in a component $M$ of $OV(\pi)$, $M - \{e\}$ may split into several components in $OV(\pi')$. (Compare Figures 2.1(c) and 2.2(b).) Denote these components by $M'_1(e), \ldots, M'_k(e)$, where $k \ge 1$. We will refer to $M'_i(e)$ simply as $M'_i$ whenever $e$ is clear from the context.
Let $C$ be a clique of oriented vertices in $OV(\pi)$. We say that $C$ is happy if for every oriented vertex $e \notin C$ and every vertex $f \in C$ such that $(e, f) \in E(OV(\pi))$ there exists an oriented vertex $g \notin C$ such that $(g, e) \in E(OV(\pi))$ and $(g, f) \notin E(OV(\pi))$. For example, in the overlap graph shown in Figure 2.1(c) $\{(2, 3), (10, 11)\}$ and $\{(6, 7)\}$ are happy cliques, but $\{(2, 3), (10, 11), (8, 9)\}$ is not. Our first theorem claims that one of the vertices in any happy clique defines a safe proper reversal.
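The definition of a happy clique can be tested by brute force on the overlap graph of Figure 2.1(c). In the sketch below the table `INTERVALS` lists, for each gray edge of $\pi = (4, -3, 1, -5, -2, 7, 6)$, the positions of its endpoints in $u(\pi)$ and its orientation; these values were computed from the definitions of Section 2 and, like the function names, are part of this illustration rather than of the original text.

```python
# gray edge -> (left position, right position, oriented) for pi = (4, -3, 1, -5, -2, 7, 6)
INTERVALS = {
    (0, 1): (0, 5, False), (2, 3): (6, 10, True),
    (4, 5): (4, 9, False), (6, 7): (1, 3, True),
    (8, 9): (2, 8, True), (10, 11): (7, 13, True),
    (12, 13): (11, 14, False), (14, 15): (12, 15, False),
}


def adjacent(e, f):
    """Two vertices of OV(pi) are adjacent iff their intervals overlap."""
    (l1, r1), (l2, r2) = sorted((INTERVALS[e][:2], INTERVALS[f][:2]))
    return l1 < l2 < r1 < r2


def is_happy(clique):
    """Check the happy-clique definition directly (brute force)."""
    oriented = [e for e in INTERVALS if INTERVALS[e][2]]
    c = set(clique)
    if not c <= set(oriented):
        return False
    if not all(adjacent(e, f) for e in c for f in c if e != f):
        return False                       # not a clique
    outside = [e for e in oriented if e not in c]
    for e in outside:
        for f in c:
            # Every oriented neighbor e of f needs an oriented witness g outside C
            # that is adjacent to e but not to f.
            if adjacent(e, f) and not any(
                adjacent(g, e) and not adjacent(g, f) for g in outside
            ):
                return False
    return True
```

The three sets mentioned in the example above can be passed to `is_happy` to confirm the stated classification.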
Theorem 4.1. Let $C$ be a happy clique and let $e$ be a vertex in $C$ such that $|UN(e')| \le |UN(e)|$ for every $e' \in C$. Then the reversal $r(e)$ is safe.
Proof. Let $\pi' = \pi r(e)$ and assume by contradiction that $M'_i(e)$ is unoriented for some $1 \le i \le k$. Clearly $N(e) \cap M'_i \ne \emptyset$.
Assume there exists $y \in N(e) \cap M'_i$ such that $y \notin C$. Clearly $y$ must be oriented in $OV(\pi)$ and since $C$ is happy it must also have an oriented neighbor $y'$ such that $(y', e) \notin E(OV(\pi))$. Since $y'$ is not adjacent to $e$ in $OV(\pi)$ it stays oriented and adjacent to $y$ in $OV(\pi')$, in contradiction with the assumption that $M'_i$ is unoriented. Hence we may assume that $N(e) \cap M'_i \subseteq C$.
Let $y \in N(e) \cap M'_i$ and let $z \in UN(e)$. Vertex $z$ is oriented in $OV(\pi')$ and if it is adjacent to $y$ in $OV(\pi')$ we obtain a contradiction. Hence, $z$ and $y$ are not adjacent in $OV(\pi')$, so they must be adjacent in $OV(\pi)$. Hence we obtain that $UN(e) \subseteq UN(y)$ in $OV(\pi)$. Corollary 2.2 implies that component $M'_i$ cannot contain $y$ alone. Thus $y$ must have a neighbor $x$ in $M'_i$. Since $N(e) \cap M'_i \subseteq C$, vertex $x$ is not adjacent to $e$ in $OV(\pi)$. Thus we obtain that $(x, y) \in OV(\pi)$, $(x, e) \notin OV(\pi)$, and $x$ is unoriented in $OV(\pi)$. Since we have already proved that $UN(e) \subseteq UN(y)$, this implies that $UN(e) \subset UN(y)$, in contradiction with the choice of $e$.
For example, Theorem 4.1 implies that the reversal defined by the gray edge $(10, 11)$ is a safe proper reversal for the permutation of Figure 2.1(a), since it corresponds to the vertex with maximum unoriented degree in the happy clique $\{(2, 3), (10, 11)\}$. On the other hand, the reversal defined by $(2, 3)$ creates a new unoriented component, as it yields the permutation shown in Figure 2.2.
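The contrast between the two reversals in this example can be reproduced directly. The sketch below applies the reversal acting on an oriented gray edge to $u(\pi)$ and then recomputes the overlap graph from scratch; the rule used to pick the reversed interval (from the parity of the endpoint positions) is derived here from the definitions in Section 2, and all names are assumptions of this illustration.

```python
def unsigned_image(pi):
    """u(pi) framed by 0 and 2n + 1."""
    out = [0]
    for x in pi:
        out += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    return out + [2 * len(pi) + 1]


def gray_intervals(u):
    """Map each gray edge (2k, 2k+1) of the sequence u to its (left, right) positions."""
    pos = {v: p for p, v in enumerate(u)}
    out = {}
    for v in range(0, len(u), 2):
        p, q = sorted((pos[v], pos[v + 1]))
        if q - p > 1:
            out[(v, v + 1)] = (p, q)
    return out


def reverse_on_gray_edge(u, edge):
    """Apply r(e) for an oriented gray edge e; black edges join positions 2t and 2t + 1."""
    p, q = gray_intervals(u)[edge]
    assert (p + q) % 2 == 0, "edge must be oriented"
    i, j = (p + 1, q) if p % 2 == 0 else (p, q - 1)
    return u[:i] + u[i:j + 1][::-1] + u[j + 1:]


def has_unoriented_component(u):
    """True if some connected component of the overlap graph has no oriented edge."""
    iv = gray_intervals(u)
    seen = set()
    for s in iv:
        if s in seen:
            continue
        seen.add(s)
        stack, oriented = [s], False
        while stack:
            e = stack.pop()
            oriented = oriented or sum(iv[e]) % 2 == 0
            for f in iv:
                if f not in seen:
                    (l1, r1), (l2, r2) = sorted((iv[e], iv[f]))
                    if l1 < l2 < r1 < r2:
                        seen.add(f)
                        stack.append(f)
        if not oriented:
            return True
    return False


u = unsigned_image([4, -3, 1, -5, -2, 7, 6])
after_safe = reverse_on_gray_edge(u, (10, 11))
after_unsafe = reverse_on_gray_edge(u, (2, 3))
```

Here `after_unsafe` equals $u(\pi')$ for the permutation $\pi'$ of Figure 2.2, and only that sequence has an unoriented component.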
The following theorem proves that a happy clique exists in the neighborhood of any oriented edge.
Theorem 4.2. Let $e$ be an oriented vertex in $OV(\pi)$. There exists an oriented vertex $f \in ON(e)$ such that for $\pi' = \pi r(f)$, all the components in $OV(\pi')$ are oriented.
Proof. By Theorem 4.1 it suffices to show that there exists a happy clique $C$ in $ON(e)$.
Let $Ext(e) = \{x \in ON(e) \mid \text{there exists } y \in ON(x) \text{ such that } y \notin ON(e)\}$. That is, $Ext(e)$ contains all oriented neighbors of $e$ which have oriented neighbors outside of $ON(e)$.
Case 1: $Ext(e) = ON(e) - \{e\}$. Set $C = \{e\}$.
Case 2: $Ext(e) \subset ON(e) - \{e\}$. Let $D_0 = ON(e) - Ext(e)$. For $j \ge 0$, while $D_j$ is not a clique let $K_j$ be a maximal clique in $D_j$ and define $D_{j+1} = D_j - K_j$. Let $D_k$, $k \ge 0$, be the final clique and set $C = D_k$.
It is straightforward to verify that in each of the two cases $C$ is indeed a happy clique.
In the next section we describe an algorithm that will find an oriented edge $e$ such that $r(e)$ is safe given the representation of $OV(\pi)$ described in Section 3.1. The algorithm first finds a happy clique $C$ and then searches for the vertex with maximum unoriented degree in $C$. According to Theorem 4.1 this vertex defines a safe reversal. Even though Theorem 4.2 guarantees the existence of a happy clique in the neighborhood of any fixed oriented vertex, our algorithm does not search in one particular such neighborhood. We will prove that the algorithm is guaranteed to find a happy clique assuming that there exists at least one oriented edge. Therefore the algorithm provides an alternative proof to a weaker version of Theorem 4.2 that only claims the existence of a happy clique somewhere in the graph.
4.1. Finding a happy clique. In this section we give an algorithm that locates a happy clique in $OV(\pi)$. Let $e_1, \ldots, e_k$ be the oriented vertices in $OV(\pi)$ in increasing left endpoint order. The algorithm traverses the oriented vertices in $OV(\pi)$ according to this order. Let $L(e)$ and $R(e)$ be the left and right endpoints, respectively, of vertex $e$ in the realization of $OV(\pi)$. After traversing $e_1, \ldots, e_i$, $1 \le i \le k$, the algorithm maintains a happy clique $C_i$ in the subgraph of $OV(\pi)$ induced by these vertices. Assume $|C_i| = j$, $j \le i$, and let $e_{i_1}, \ldots, e_{i_j}$ be the vertices in $C_i$ where $i_1 < i_2 < \ldots < i_j$. The vertices of $C_i$ are maintained in a linked list ordered in increasing left endpoint order. If there exists an interval that contains all the intervals in $C_i$ then the algorithm maintains a minimal such interval $t_i$. The clique $C_i$ and the vertex $t_i$ (if it exists) satisfy the following invariant.
Invariant 4.1.
1) Every vertex $e_l \notin C_i$, $l \le i$, such that $L(e_{i_1}) < L(e_l)$ must be adjacent to $t_i$, i.e., $R(e_l) > R(t_i)$.
2) Every vertex $e_l \notin C_i$, $L(e_l) < L(e_{i_1})$, that is adjacent to a vertex in $C_i$ is either adjacent to an interval $e_p$ such that $R(e_p) < L(e_{i_1})$ or adjacent to $t_i$.
The fact that $C_i$ is happy in the subgraph induced by $e_1, \ldots, e_i$ follows from this invariant. We initialize the algorithm by setting $C_1 = \{e_1\}$. Initially, $t_1$ is not defined. Let the current interval be $e_{i+1}$. If $R(e_{i_j}) < L(e_{i+1})$ then $C_i$ is guaranteed to be happy in $OV(\pi)$ since all remaining oriented vertices are not adjacent to $C_i$. Hence the algorithm stops and returns $C_i$ as the answer. See Figure 4.1(a).
We now assume that $L(e_{i+1}) \le R(e_{i_j})$ and show how to obtain $C_{i+1}$ and $t_{i+1}$. We have to consider the following cases.
Case 1. The interval $t_i$ is defined and $R(t_i) < R(e_{i+1})$. Continue with $C_{i+1} = C_i$ and $t_{i+1} = t_i$. See Figure 4.1(b).
Case 2. The interval $t_i$ is not defined or $R(e_{i+1}) \le R(t_i)$.
a) $R(e_{i_j}) < R(e_{i+1})$ and $L(e_{i+1}) \le R(e_{i_1})$. $C_{i+1}$ is obtained by adding $e_{i+1}$ to $C_i$ and $t_{i+1} = t_i$. See Figure 4.1(c).
b) $R(e_{i_j}) < R(e_{i+1})$ and $L(e_{i+1}) > R(e_{i_1})$. The clique $C_{i+1}$ consists of $e_{i+1}$ alone and $t_{i+1} = t_i$. See Figure 4.1(d).
c) $R(e_{i+1}) < R(e_{i_j})$. As in the previous case $C_{i+1} = \{e_{i+1}\}$. In this case $t_{i+1}$ is set to $e_{i_j}$, the last interval in $C_i$. See Figure 4.1(e).
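The scan described above can be sketched in a few lines. The code below follows the case analysis as reconstructed in this section, with the clique kept as a Python list in place of a linked list and intervals given as `(left, right)` pairs; it is an illustrative sketch under those assumptions, and the function name is introduced here.

```python
def find_happy_clique(oriented):
    """One left-to-right scan over the oriented intervals; returns the final clique."""
    vs = sorted(oriented)                  # increasing left endpoint order
    clique = [vs[0]]                       # C_1 = {e_1}
    t = None                               # covering interval t_i, initially undefined
    for e in vs[1:]:
        left, right = e
        first, last = clique[0], clique[-1]
        if last[1] < left:
            break                          # no remaining interval is adjacent to the clique
        if t is not None and t[1] < right:
            continue                       # Case 1: keep C and t unchanged
        if last[1] < right:
            if left <= first[1]:
                clique.append(e)           # Case 2a: e overlaps every clique member
            else:
                clique = [e]               # Case 2b: restart the clique at e
        else:
            t = last                       # Case 2c: last clique member now covers e
            clique = [e]
    return clique
```

On the four oriented intervals of Figure 2.1(c), namely $[1, 3]$, $[2, 8]$, $[6, 10]$ and $[7, 13]$, the scan ends with the intervals of the gray edges $(2, 3)$ and $(10, 11)$, the happy clique used in the example following Theorem 4.1.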
The following theorem proves that the algorithm above produces a happy clique.
Theorem 4.3. Let $C_l$ be the current clique when the algorithm stops. Then $C_l$ is a happy clique in $OV(\pi)$.
Proof. A straightforward induction on the number of oriented vertices traversed by the algorithm proves that $C_l$ and $t_l$ satisfy Invariant 4.1.
The algorithm stops either when $R(e_{i_j}) < L(e_{l+1})$ or when $l$ is equal to the number of oriented vertices. In either case since $C_l$ is happy in the subgraph induced by $e_1, \ldots, e_l$ it must be happy in $OV(\pi)$.
The running time of the algorithm is proportional to the number of oriented vertices traversed since a constant amount of work is performed for each such vertex.
Fig. 4.1. The various cases of the algorithm for finding a happy clique. The topmost interval is always $t_i$. The three thick intervals comprise $C_i$. The dotted interval corresponds to $e_{i+1}$.
4.2. Searching the happy clique. After locating a happy clique $C$ in $OV(\pi)$ we need to search it for a vertex with a maximum number of unoriented neighbors. In this section we give an algorithm that performs this task.
Let $e_1, \ldots, e_j$ be the intervals in $C$ ordered in increasing left endpoint order. Clearly, $L(1) < L(2) < \ldots < L(j) < R(1) < R(2) < \ldots < R(j)$. Thus the endpoints of the $j$ vertices in $C$ partition the line into $2j + 1$ disjoint intervals $I_0, \ldots, I_{2j}$, where $I_0 = (-\infty, L(1)]$, $I_l = (L(l), L(l + 1)]$ for $1 \le l < j$, $I_j = (L(j), R(1)]$, $I_l = (R(l - j), R(l - j + 1)]$ for $j < l < 2j$ and $I_{2j} = (R(j), \infty)$. The algorithm consists of the following three stages.
Stage 1: Let $e$ be an unoriented vertex that has a non-empty intersection with the interval $[L(1), R(j)]$. Mark each of $e$'s endpoints with the index of the interval that contains it.
Stage 2: Let $o$ be an array of $j$ counters, each corresponding to a vertex in $C$. The intention is to assign values to $o$ such that the sum $\sum_{i=1}^{l} o[i]$ is the unoriented degree of the vertex $e_l \in C$. The counters are initialized to zero. For each unoriented vertex $e$ that overlaps with the interval $[L(1), R(j)]$ we change at most four of the counters as follows. Let $I_l$ and $I_r$ be the intervals in which $L(e)$ and $R(e)$ occur, respectively. We may assume $l < r$ as otherwise $e$ is not adjacent to any vertex in $C$ and we can ignore it. We continue according to one of the following cases.
Case 1: $r \le j$. All the vertices from $e_{l+1}$ to $e_r$ are adjacent to $e$: we increment $o[l + 1]$ and decrement $o[r + 1]$ (if $r < j$).
Case 2: $j \le l$. All the vertices from $e_{l-j+1}$ to $e_{r-j}$ are adjacent to $e$: we increment $o[l - j + 1]$ and decrement $o[r - j + 1]$ (if $r < 2j$).
Case 3: $l < j$ and $j < r$. Let $m = \min\{l, r - j\}$. If $m > 0$ then all the vertices from $e_1$ to $e_m$ are adjacent to $e$: we increment $o[1]$ and decrement $o[m + 1]$. Similarly let $M = \max\{l, r - j\}$. If $M < j$ then the vertices from $e_{M+1}$ to $e_j$ are adjacent to $e$: we increment the counter $o[M + 1]$. (The vertices strictly between $e_m$ and $e_{M+1}$ are either contained in $e$ or disjoint from it, so they are not adjacent to $e$.)
Stage 3: Compute the index $f$, $1 \le f \le j$, that maximizes $\sum_{i=1}^{f} o[i]$. Return $e_f$.
The following theorem summarizes the result of this section. We omit the proof, which is straightforward.
Theorem 4.4. Given a clique $C$, the vertex $e_f \in C$ computed by the algorithm above has maximum unoriented degree among the vertices in $C$.
The complexity of the algorithm is proportional to the size of $C$ plus the number of unoriented vertices in $OV(\pi)$, and hence is $O(n)$.
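The three stages amount to a difference-array computation over the clique. The sketch below is one possible rendering, assuming intervals are given as `(left, right)` pairs with distinct endpoints; it locates the interval index $I_l$ by binary search rather than by the marking pass of Stage 1 (which adds a logarithmic factor), and in Case 3 it uses the index $M + 1$ for the upper range. The function names are assumptions of this illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def unoriented_degrees(clique, unoriented):
    """Unoriented degree of each clique vertex e_1..e_j via the counters o[1..j]."""
    j = len(clique)
    # L(1) < ... < L(j) < R(1) < ... < R(j) for a clique sorted by left endpoint.
    points = [l for l, _ in clique] + [r for _, r in clique]
    o = [0] * (j + 2)                      # o[j + 1] absorbs decrements past the end
    for left, right in unoriented:
        l = bisect_left(points, left)      # L(e) lies in I_l
        r = bisect_left(points, right)     # R(e) lies in I_r
        if l >= r:
            continue                       # e is adjacent to no vertex of the clique
        if r <= j:                         # Case 1: adjacent to e_{l+1}..e_r
            o[l + 1] += 1
            o[r + 1] -= 1
        elif l >= j:                       # Case 2: adjacent to e_{l-j+1}..e_{r-j}
            o[l - j + 1] += 1
            o[r - j + 1] -= 1
        else:                              # Case 3: adjacent to e_1..e_m and e_{M+1}..e_j
            m, big = min(l, r - j), max(l, r - j)
            if m > 0:
                o[1] += 1
                o[m + 1] -= 1
            if big < j:
                o[big + 1] += 1
    return list(accumulate(o[1:j + 1]))    # prefix sums give the degrees


def max_unoriented_degree(clique, unoriented):
    """Return (f, degree) with 1-based index f of a vertex of maximum unoriented degree."""
    degrees = unoriented_degrees(clique, unoriented)
    f = max(range(len(degrees)), key=degrees.__getitem__)
    return f + 1, degrees[f]
```

For the happy clique of Figure 2.1(c), with clique intervals $[6, 10]$ and $[7, 13]$ and unoriented intervals $[0, 5]$, $[4, 9]$, $[11, 14]$ and $[12, 15]$, the second clique vertex (the gray edge $(10, 11)$) is selected, in agreement with the example following Theorem 4.1.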
5. Clearing the hurdles. In case there are unoriented components in $OV(\pi)$,
there exists a sequence $r_1, \ldots, r_t$ of $t$ reversals that transform $\pi$ into $\pi'$ such that
$d(\pi') = d(\pi) - t$, where $t = \lceil h(\pi)/2 \rceil$. In this section we summarize the characterization
given by Hannenhalli and Pevzner for these $t$ reversals and outline how to find them using
our implicit representation of $OV(\pi)$.
We will use the following definitions. A reversal merges hurdles $H_1$ and $H_2$ if it
acts on two breakpoints, one incident with a gray edge in $H_1$ and the other incident
with a gray edge in $H_2$. Recall the circle $CR$ defined in Section 2, in which the
endpoints of the edges in the unoriented components of $OV(\pi)$ are ordered consistently
with their order in $\pi$. Two hurdles $H_1$ and $H_2$ are consecutive if their sets of endpoints
$E(H_1)$ and $E(H_2)$ occur consecutively on $CR$, i.e., there is no hurdle $H$ such that
$E(H)$ separates $E(H_1)$ and $E(H_2)$ on $CR$.
The following lemmas were essentially proved by Hannenhalli and Pevzner though
stated differently in their paper.
Lemma 5.1 ([11]). Let $\pi$ be a permutation with an even number, say $2k$, of
hurdles. Any sequence of $k - 1$ reversals each of which merges two non-consecutive
hurdles followed by a reversal merging the remaining two hurdles will transform $\pi$ into
$\pi'$ such that $d(\pi') = d(\pi) - k$ and $\pi'$ has only oriented components.
Lemma 5.2 ([11]). Let $\pi$ be a permutation with an odd number, say $2k + 1$, of
hurdles. If at least one hurdle $H$ is simple then a reversal acting on two breakpoints
incident with edges in $H$ transforms $\pi$ into $\pi'$ with $2k$ hurdles such that $d(\pi') =
d(\pi) - 1$. If $\pi$ is a fortress then a sequence of $k - 1$ reversals merging pairs of
non-consecutive hurdles followed by two additional merges of pairs of consecutive hurdles
(one merges two original hurdles and the next merges a hurdle created by the first and
the last original hurdle) will transform $\pi$ into $\pi'$ such that $d(\pi') = d(\pi) - (k + 1)$ and
$\pi'$ has only oriented components.
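As a quick consistency check, the reversal counts in Lemmas 5.1 and 5.2 can be compared with $t = \lceil h(\pi)/2 \rceil$. The case split below is a worked restatement of the two lemmas, not an additional result; for the simple-hurdle case it assumes that Lemma 5.1 is applied to the $2k$ hurdles that remain.

```latex
\begin{align*}
h(\pi) = 2k:\quad
  & t = (k-1) + 1 = k = \lceil 2k/2 \rceil, \\
h(\pi) = 2k+1,\ \text{some hurdle simple}:\quad
  & t = 1 + k = k+1 = \lceil (2k+1)/2 \rceil, \\
h(\pi) = 2k+1,\ \pi \text{ a fortress}:\quad
  & t = (k-1) + 2 = k+1 = \lceil (2k+1)/2 \rceil .
\end{align*}
```

In every case each of the $t$ reversals lowers the distance by exactly one, so $d(\pi') = d(\pi) - t$.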
We now outline how to turn these lemmas into an algorithm that finds a particular
sequence of reversals $r_1, \ldots, r_t$ with the properties described above. First $OV(\pi)$ is
decomposed into connected components as described in [4]. One then has to identify
those unoriented components that are hurdles. This task can be done by traversing the
endpoints of the circle $CR$, counting the number of elements in each run of consecutive
endpoints belonging to the same component. If a run contains all endpoints of a
particular unoriented component $M$ then $M$ is a hurdle.
In a similar fashion one can check for each hurdle whether it is a simple hurdle
or a super hurdle. While traversing the circle, a list of the hurdles in the order they
occur on $CR$ is created. At the next stage this list is used to identify the correct hurdles
to merge.
We assume that given an endpoint one can locate its connected component in
constant time. It is easy to verify that the data can be maintained so that this is
possible.
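The run-counting test for hurdles described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `find_hurdles` is a hypothetical name, and the input is assumed to be the list of component identifiers of the endpoints on $CR$, in circular order, restricted to unoriented components.

```python
from collections import Counter


def find_hurdles(labels):
    """Return the components whose endpoints form one circular run.

    labels[i] is the (hypothetical) component identifier of the i-th
    endpoint on the circle CR; the list is treated as circular.
    Hurdles are returned in the order their runs occur on the circle.
    """
    labels = list(labels)
    n = len(labels)
    if n == 0:
        return []
    total = Counter(labels)          # number of endpoints per component
    if len(total) == 1:
        return [labels[0]]           # a single unoriented component
    # Rotate so that position 0 is the start of a run; this way a run
    # that wraps around the end of the list is not split in two.
    start = next(i for i in range(n) if labels[i] != labels[i - 1])
    rotated = labels[start:] + labels[:start]
    hurdles = []
    i = 0
    while i < n:
        k = i
        while k < n and rotated[k] == rotated[i]:
            k += 1
        if k - i == total[rotated[i]]:   # the run holds every endpoint
            hurdles.append(rotated[i])
        i = k
    return hurdles
```

The traversal touches each endpoint a constant number of times, in line with the linear time bound, and the returned list is the circular order of hurdles used in the merging stage.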
Theorem 5.3. Given $OV(\pi)$ decomposed into its connected components, the
algorithm outlined above finds $t$ reversals such that when we apply them to $\pi$ we obtain
a $\pi'$ which is hurdle-free and has $d(\pi') = d(\pi) - t$. The algorithm can be implemented
to run in $O(n)$ time.
Proof. Correctness follows from Lemmas 5.1 and 5.2. The time bound is achieved
if we always merge hurdles that are separated by a single hurdle. If the $i$th merge
merged hurdles $H_1$ and $H_2$ that are separated by $H$, then $H$ should be merged in the
$(i + 1)$st merge. Carrying out the merges this way guarantees that the span of each
hurdle $H$ overlaps at most two merging reversals, the second of which eliminates $H$.
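The merging order used in the proof can be illustrated on hurdle labels alone. The sketch below covers only the even case of Lemma 5.1 and works on an abstract circular list; `merge_schedule_even` is a hypothetical name, and the sketch does not compute the reversals themselves or touch the permutation.

```python
def merge_schedule_even(hurdles):
    """Pair up an even number of hurdles given in their circular order on CR.

    Every pair except the last consists of two hurdles separated by a
    single hurdle, and the hurdle that was skipped over is merged in
    the very next step, as in the proof of Theorem 5.3.
    """
    if len(hurdles) < 2 or len(hurdles) % 2 != 0:
        raise ValueError("expected an even, positive number of hurdles")
    remaining = list(hurdles)
    schedule = []
    while len(remaining) > 2:
        first, skipped, second = remaining[0], remaining[1], remaining[2]
        schedule.append((first, second))
        # Drop the two merged hurdles; the skipped hurdle moves to the
        # front so that it takes part in the next merge.
        remaining = [skipped] + remaining[3:]
    schedule.append((remaining[0], remaining[1]))
    return schedule
```

With at least four hurdles left on the circle, the hurdles at positions 0 and 2 are never consecutive, so the first $k - 1$ pairs satisfy the condition of Lemma 5.1.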
6. Summary. Figure 6.1 gives a schematic description of the algorithm.

algorithm Signed Reversals($\pi$);
/* $\pi$ is a signed permutation */
1. Compute the connected components of $OV(\pi)$.
2. Clear the hurdles.
3. while $\pi$ is not sorted do: /* iteration */
   begin
     a. find a happy clique $C$ in $OV(\pi)$.
     b. find a vertex $e_f \in C$ with maximum unoriented
        degree, and perform a safe reversal on $e_f$;
     c. update $\pi$ and the representation of $OV(\pi)$.
   end
4. output the sequence of reversals.

Fig. 6.1. An algorithm for sorting signed permutations.
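The elementary operation inside the loop, applying a reversal to a signed permutation, together with a way to check an output sequence on small inputs, can be sketched as follows. This is not the algorithm of Fig. 6.1: the function names are hypothetical, and `brute_force_distance` is an exponential-time breadth-first search intended only as a test oracle for tiny $n$.

```python
from collections import deque


def apply_reversal(perm, i, j):
    """Apply the reversal rho(i, j) to a signed permutation.

    perm is a tuple of nonzero integers; i and j are 1-based, inclusive
    positions. The segment is reversed and every sign in it is flipped.
    """
    return perm[:i - 1] + tuple(-x for x in reversed(perm[i - 1:j])) + perm[j:]


def sorts(perm, reversals):
    """Check that a sequence of reversals turns perm into (+1, ..., +n)."""
    current = tuple(perm)
    for i, j in reversals:
        current = apply_reversal(current, i, j)
    return current == tuple(range(1, len(current) + 1))


def brute_force_distance(perm):
    """Reversal distance by breadth-first search over all signed permutations.

    Exponential in n; only meant for validating results on tiny inputs.
    """
    start = tuple(perm)
    n = len(start)
    target = tuple(range(1, n + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for i in range(1, n + 1):
            for j in range(i, n + 1):
                nxt = apply_reversal(cur, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    raise AssertionError("the identity is always reachable")
```

A sequence produced by any implementation can then be validated by checking that `sorts` accepts it and that its length equals `brute_force_distance` on the same small input.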
Theorem 6.1. Algorithm Signed Reversals finds the reversal distance $r$ in
$O(n\alpha(n) + r \cdot n)$ time, and in particular in $O(n^2)$ time.
Proof. The correctness of the algorithm follows from Theorem 2.3, Theorem 4.1
and Lemmas 5.1 and 5.2.
Step 1 takes $O(n\alpha(n))$ time by the algorithm of Berman and Hannenhalli [4].
Step 2 takes $O(n)$ time by Theorem 5.3. Step 3 takes $O(n)$ time per reversal, by the
discussion in Section 4.
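The step from the first bound to the quadratic bound can be written out as follows. The derivation assumes the bound $r \le n + 1$ on the reversal distance of a signed permutation of $n$ elements, which is not stated in this section, and uses $\alpha(n) \le n$.

```latex
\begin{align*}
\underbrace{O(n\alpha(n))}_{\text{Step 1}}
  + \underbrace{O(n)}_{\text{Step 2}}
  + \underbrace{r \cdot O(n)}_{\text{Step 3}}
  &= O(n\alpha(n) + r \cdot n) \\
  &\subseteq O\bigl(n\alpha(n) + (n+1)\,n\bigr)
   = O(n^2).
\end{align*}
```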
It is an intriguing open question whether a faster algorithm for sorting signed
permutations by reversals exists. It certainly might be the case that one can find an
optimal sequence of reversals faster. To date, no nontrivial lower bound is known for
this problem.
Acknowledgments. We thank Donald Knuth, Sridhar Hannenhalli, Pavel Pevzner,
and Itsik Pe'er for their comments on a preliminary version of this paper.
REFERENCES
[1] W. Ackermann, Zum Hilbertschen Aufbau der reellen Zahlen, Math. Ann., 99 (1928), pp. 118-133.
[2] V. Bafna and P. Pevzner, Sorting permutations by transpositions, in Proceedings of the 6th
Annual Symposium on Discrete Algorithms, ACM Press, Jan. 1995, pp. 614-623.
[3] V. Bafna and P. A. Pevzner, Genome rearrangements and sorting by reversals, SIAM Journal
on Computing, 25 (1996), pp. 272-289. A preliminary version appeared in Proc. 34th IEEE
Symp. of the Foundations of Computer Science, pages 148-157, 1994.
[4] P. Berman and S. Hannenhalli, Fast sorting by reversals, in Proc. Combinatorial Pattern
Matching (CPM), 1996, pp. 168-185. LNCS 1075.
[5] A. Caprara, Sorting by reversals is difficult, in Proceedings of the First International
Conference on Computational Molecular Biology (RECOMB), ACM Press, 1997, pp. 75-83.
[6] D. A. Christie, A 3/2-approximation algorithm for sorting by reversals, in Proc. ninth annual
ACM-SIAM Symp. on Discrete Algorithms (SODA 98), ACM Press, 1998, pp. 244-252.
[7] T. Dobzhansky and A. H. Sturtevant, Inversions in the chromosomes of Drosophila
pseudoobscura, Genetics, 23 (1938), pp. 28-64.
[8] M. C. Golumbic, Algorithmic Graph Theory and Perfect Graphs, Academic Press, New York,
1980.
[9] S. Hannenhalli, Polynomial algorithm for computing translocation distance between genomes,
Disc. Appl. Math., 71 (1996), pp. 137-151.
[10] S. Hannenhalli and P. Pevzner, Transforming men into mice (polynomial algorithm for
genomic distance problems), in Proc. IEEE Symp. of the Foundations of Computer Science,
1995, pp. 581-592.
[11] S. Hannenhalli and P. A. Pevzner, Transforming cabbage into turnip (polynomial algorithm
for sorting signed permutations by reversals), in Proceedings of the Twenty-Seventh Annual
ACM Symposium on Theory of Computing, Las Vegas, Nevada, 29 May-1 June 1995,
pp. 178-189.
[12] S. B. Hoot and J. D. Palmer, Structural rearrangements, including parallel inversions, within
the chloroplast genome of Anemone and related genera, J. Molecular Evolution, 38 (1994),
pp. 274-281.
[13] H. Kaplan, R. Shamir, and R. E. Tarjan, Faster and simpler algorithm for sorting signed
permutations by reversals, in Proc. 8th ACM-SIAM Symposium on Discrete Algorithms,
ACM-SIAM, 1997, pp. 344-351. Also in Proc. RECOMB 97, page 163.
[14] J. Kececioglu and R. Ravi, Physical mapping of chromosomes using unique probes, in Proc.
sixth annual ACM-SIAM Symp. on Discrete Algorithms (SODA 95), ACM Press, 1995,
pp. 604-613.
[15] J. Kececioglu and D. Sankoff, Efficient bounds for oriented chromosome inversion distance,
in Proc. of 5th Ann. Symp. on Combinatorial Pattern Matching, Springer, 1994,
pp. 307-325. LNCS 807.
[16] J. Kececioglu and D. Sankoff, Exact and approximation algorithms for sorting by reversals,
with application to genome rearrangement, Algorithmica, 13 (1995), pp. 180-210. A preliminary
version appeared in Proc. CPM93, Springer, Berlin, 1993, pages 87-105.
[17] J. D. Palmer and L. A. Herbon, Tricircular mitochondrial genomes of Brassica and
Raphanus: reversal of repeat configurations by inversion, Nucleic Acids Research, 14
(1986), pp. 9755-9764.
[18] J. D. Palmer and L. A. Herbon, Unicircular structure of the Brassica hirta mitochondrial
genome, Current Genetics, 11 (1987), pp. 565-570.
[19] J. D. Palmer and L. A. Herbon, Plant mitochondrial DNA evolves rapidly in structure, but
slowly in sequence, J. Molecular Evolution, 28 (1988), pp. 87-97.
[20] J. D. Palmer, B. Osorio, and W. Thompson, Evolutionary significance of inversions in
legume chloroplast DNAs, Current Genetics, 14 (1988), pp. 65-74.
[21] D. Sankoff, Edit distance for genome comparison based on non-local operations, Lecture Notes
in Computer Science, 644 (1992), pp. 121-135.
[22] D. Sankoff, R. Cedergren, and Y. Abel, Genomic divergence through gene rearrangement,
Methods in Enzymology, 183 (1990), pp. 428-438.
[23] R. E. Tarjan, Efficiency of a good but not linear set union algorithm, J. ACM, 22 (1975),
pp. 215-225.
