PERMUTATIONS BY REVERSALS
AND ROBERT E. TARJAN§
Abstract. We give a quadratic-time algorithm for finding the minimum number of reversals needed to sort a signed permutation. Our algorithm is faster than the previous algorithm of Hannenhalli and Pevzner and its faster implementation by Berman and Hannenhalli. The algorithm is conceptually simple and does not require special data structures. Our study also considerably simplifies the combinatorial structures used by the analysis.
A preliminary version of this paper was presented at the Eighth ACM-SIAM Symposium on Discrete Algorithms [13].
† AT&T Labs Research, 180 Park Ave, Florham Park, NJ 07932 USA. hkl@research.att.com
‡ Department of Computer Science, Sackler Faculty of Exact Sciences, Tel Aviv University, Tel-Aviv 69978 ISRAEL. Research supported in part by a grant from the Ministry of Science and the Arts, Israel, and by US Department of Energy, grant No. DE-FG03-94ER61913/A000. shamir@math.tau.ac.il
§ Department of Computer Science, Princeton University, Princeton, NJ 08544 USA and InterTrust Technologies Corporation, Sunnyvale, CA 94086 USA. Research at Princeton University partially supported by the NSF, Grants CCR-8920505 and CCR-9626862, and the Office of Naval Research, Contract No. N00014-91-J-1463. ret@cs.princeton.edu

1. Introduction. In this paper we study the problem of sorting signed permutations by reversals. A signed permutation is a permutation $\pi = (\pi_1, \ldots, \pi_n)$ on the integers $\{1, \ldots, n\}$, where each number is also assigned a sign of plus or minus. A reversal, $\rho(i, j)$, on $\pi$ transforms $\pi$ to

$$\pi' = \pi\rho(i, j) = (\pi_1, \ldots, \pi_{i-1}, -\pi_j, -\pi_{j-1}, \ldots, -\pi_i, \pi_{j+1}, \ldots, \pi_n).$$

The minimum number of reversals needed to transform one permutation to another is called the reversal distance between them. The problem of sorting signed permutations by reversals is to find, for a given signed permutation $\pi$, a sequence of reversals of minimum length that transforms $\pi$ to the identity permutation $(+1, +2, \ldots, +n)$.

The motivation for studying the problem arises in molecular biology: Concurrent with the fast progress of the Human Genome Project, genetic and DNA data on many model organisms are accumulating rapidly, and consequently the ability to compare genomes of different species has grown dramatically. One of the best ways of checking similarity between genomes on a large scale is to compare the order of appearance of identical genes in the two species. In the Thirties, Dobzhansky and Sturtevant [7] had already studied the notion of inversions in chromosomes of drosophila. In the late Eighties, Jeffrey Palmer demonstrated that different species may have essentially the same genes, but the gene orders may differ between species. Taking an abstract perspective, the genes along a chromosome can be thought of as points along a line. Numbers identify the particular genes; and, as genes have directionality, signs correspond to their direction. Palmer and others have shown that the difference in order may be explained by a small number of reversals [17, 18, 19, 20, 12]. These reversals correspond to evolutionary changes during the history of the two genomes, so the number of reversals reflects the evolutionary distance between the species. Hence, given two such permutations, their reversal distance measures their evolutionary distance.
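The reversal operation and the reversal distance defined at the start of this section can be made concrete with a small sketch. The function names below are hypothetical, and the distance routine is a plain exhaustive breadth-first search, not the algorithm of this paper; it is exponential in $n$ and is meant only as a reference oracle for very small inputs.

```python
from collections import deque


def signed_reversal(perm, i, j):
    """Apply rho(i, j), 1-indexed and inclusive: reverse perm[i..j] and flip its signs."""
    if not 1 <= i <= j <= len(perm):
        raise ValueError("need 1 <= i <= j <= n")
    middle = tuple(-x for x in reversed(perm[i - 1:j]))
    return tuple(perm[:i - 1]) + middle + tuple(perm[j:])


def reversal_distance_bruteforce(perm):
    """Exhaustive BFS over signed permutations (illustration only, tiny n)."""
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for i in range(1, len(cur) + 1):
            for j in range(i, len(cur) + 1):
                nxt = signed_reversal(cur, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    raise AssertionError("every signed permutation can be sorted by reversals")
```

For instance, `signed_reversal((1, -3, -2, 4), 2, 3)` yields the identity, so that permutation has reversal distance 1, while the unsigned-looking input `(2, 1)` needs three reversals.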
Mathematical analysis of genome rearrangement problems was initiated by Sankoff [22, 21]. Kececioglu and Sankoff [16] gave the first constant-factor polynomial approximation algorithm for the problem and conjectured that the problem is NP-hard. Bafna and Pevzner [3], and more recently Christie [6], improved the approximation factor, and additional studies have revealed the rich combinatorial structure of rearrangement problems [15, 14, 2, 9, 10]. Quite recently, Caprara [5] has established that sorting unsigned permutations is NP-hard, using some of the combinatorial tools developed by Bafna and Pevzner [3].
In 1995, Hannenhalli and Pevzner [11] showed that the problem of sorting a signed permutation by reversals is polynomial. They proved a duality theorem that equates the reversal distance with the sum of three combinatorial parameters (see Theorem 2.3 below). Based on this theorem, they proved that sorting signed permutations by reversals can be done in $O(n^4)$ time. More recently, Berman and Hannenhalli [4] described a faster implementation that finds a minimum sequence of reversals in $O(n^2\alpha(n))$ time, where $\alpha$ is the inverse of Ackermann's function [1] (see also [23]).
In this study we give an $O(n^2)$ algorithm for sorting a signed permutation of $n$ elements, thereby improving upon the previous best known bound [4]. In fact, if the reversal distance is $r$, our algorithm requires $O(r \cdot n + n\alpha(n))$ time. In addition to giving a better time bound, our work considerably simplifies both the algorithm and the combinatorial structure needed for the analysis, as follows:
• The basic object we work with is an implicit representation of the overlap graph, to be defined later, in contrast with the interleaving graph in [11] and [4]. The overlap graph is combinatorially simpler than the interleaving graph. As a result, it is easier to produce a representation for the overlap graph from the input, and to maintain it while searching for reversals.
• As a consequence of our ability to work with the overlap graph we need not perform any "padding transformations", nor do we have to work with "simple permutations" as in [11] and [4].
• We deal with the unoriented and oriented parts of the permutation separately, which makes the algorithm much simpler.
• The notion of a hurdle, one of the combinatorial entities defined by [11] for the duality theorem, is simplified and is handled in a more symmetric manner.
• The search for the next reversal is much simpler, and requires no special data structures. Our algorithm computes connected components only once, and any simple implementation of it suffices to obtain the quadratic time bound. In contrast, in [4] a logarithmic number of connected component computations may be performed per reversal, using the union-find data structure.
The paper is organized as follows: Section 2 gives the necessary preliminaries. Section 3 gives an overview of our algorithm. Sections 4 and 5 give the details of our algorithm. We summarize our results and suggest some further research in Section 6.
2. Preliminaries. This section gives the basic background, primarily the theory of Hannenhalli and Pevzner, on which we base our algorithm. The reader may find it helpful to refer to Figure 2.1, in which the main definitions are illustrated. We start with some definitions for unsigned permutations. Let $\pi = (\pi_1, \ldots, \pi_n)$ denote a permutation of $\{1, \ldots, n\}$. Augment $\pi$ to a permutation on $n + 2$ vertices by adding $\pi_0 = 0$ and $\pi_{n+1} = n + 1$ to it. A pair $(\pi_i, \pi_{i+1})$, $0 \le i \le n$, is called a gap. Gaps are classified into two types. A gap $(\pi_i, \pi_{i+1})$ is a breakpoint of $\pi$ if and only if [...] breakpoints in $\pi$.

A reversal, $\rho(i, j)$, on a permutation $\pi$ transforms $\pi$ to

$$\pi' = \pi\rho(i, j) = (\pi_1, \ldots, \pi_{i-1}, \pi_j, \pi_{j-1}, \ldots, \pi_i, \pi_{j+1}, \ldots, \pi_n).$$

We say that a reversal $\rho(i, j)$ acts on the gaps $(\pi_{i-1}, \pi_i)$ and $(\pi_j, \pi_{j+1})$.
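To make the bookkeeping of gaps concrete, here is a minimal sketch of the augmented permutation and of the two gaps a reversal acts on. The helper names are hypothetical and the permutation used in the usage note is an arbitrary example, not one taken from the paper.

```python
def augment(perm):
    """Return (pi_0, ..., pi_{n+1}) with pi_0 = 0 and pi_{n+1} = n + 1."""
    return [0] + list(perm) + [len(perm) + 1]


def gaps(aug):
    """All gaps (pi_i, pi_{i+1}) for 0 <= i <= n."""
    return [(aug[i], aug[i + 1]) for i in range(len(aug) - 1)]


def reversal(aug, i, j):
    """Unsigned rho(i, j) on an augmented permutation, 1 <= i <= j <= n."""
    n = len(aug) - 2
    if not 1 <= i <= j <= n:
        raise ValueError("need 1 <= i <= j <= n")
    return aug[:i] + aug[i:j + 1][::-1] + aug[j + 1:]


def gaps_acted_on(aug, i, j):
    """The two gaps (pi_{i-1}, pi_i) and (pi_j, pi_{j+1}) touched by rho(i, j)."""
    return [(aug[i - 1], aug[i]), (aug[j], aug[j + 1])]
```

With `aug = augment((3, 1, 2))`, the reversal $\rho(1, 2)$ acts on the gaps `(0, 3)` and `(1, 2)` and produces `[0, 1, 3, 2, 4]`.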
Fig. 2.2. a) The breakpoint graph of $\pi' = (4, -3, 1, 2, 5, 7, 6)$. $\pi'$ was obtained from $\pi$ of Figure 2.1 by the reversal $\rho(7, 10)$; or, equivalently, by the reversal defined by the gray edge $(2, 3)$. b) The overlap graph of $\pi'$.
2.3. The connected components of the overlap graph. Let $X$ be a set of gray edges in $B(\pi)$. Define $\min(X) = \min\{i \mid (\pi_i, \pi_j) \in X\}$, $\max(X) = \max\{j \mid (\pi_i, \pi_j) \in X\}$ and $\mathrm{span}(X) = [\min(X), \max(X)]$. Equivalently, one can look at the interval overlap representation of $OV(\pi)$ mentioned above and define the span of a set of vertices $X$ as the minimum interval which contains all the intervals of vertices in $X$.

The major object our algorithm will work with is $OV(\pi)$, though for efficiency considerations we will avoid generating it explicitly. In contrast, Pevzner and Hannenhalli worked with the interleaving graph $H_\pi$, whose vertices are the alternating cycles of $B(\pi)$. Two cycles $C_1$ and $C_2$ are connected by an edge in $H_\pi$ iff there exists a gray edge $e_1 \in C_1$ and a gray edge $e_2 \in C_2$ that overlap.

The following lemma and its corollary imply that the partition imposed by the connected components of $OV(\pi)$ on the set of gray edges is identical to the one imposed by the connected components of $H_\pi$:

Lemma 2.1. If $M$ is a set of gray edges in $B(\pi)$ that corresponds to a connected component in $OV(\pi)$ then $\min(M)$ is even and $\max(M)$ is odd.

Proof. Assume $\min(M)$ is odd. Then $\pi_{\min(M)} + 1$ and $\pi_{\min(M)} - 1$ must both be in $\mathrm{span}(M)$ (i.e. there exist $l_1, l_2 \in \mathrm{span}(M)$ such that $\pi_{l_1} = \pi_{\min(M)} + 1$ and $\pi_{l_2} = \pi_{\min(M)} - 1$). Thus $\pi_{\min(M)}$ is neither the maximum nor the minimum element in the set $\{\pi_i \mid i \in \mathrm{span}(M)\}$. Hence, either the maximum element or the minimum element in $\mathrm{span}(M)$ is $\pi_j$ for some $\min(M) < j < \max(M)$. By the definition of $B(\pi)$ there must be a gray edge $(\pi_j, \pi_l)$ for some $l \notin \mathrm{span}(M)$, contradicting the fact that $M$ is a connected component in $OV(\pi)$. The proof that $\max(M)$ is odd is similar.

As an illustration of Lemma 2.1, consider Figure 2.2(a). Let $M_1 = \{(0, 1), (4, 5), (8, 9), (6, 7)\}$ and $M_2 = \{(10, 11), (12, 13), (14, 15)\}$. Then $\mathrm{span}(M_1) = [0, 9]$ and $\mathrm{span}(M_2) = [10, 15]$.

Corollary 2.2. Every connected component of $OV(\pi)$ corresponds to the set of gray edges of a union of cycles.

Proof. Assume by contradiction that $C$ is a cycle whose gray edges belong to at least two connected components in $OV(\pi)$. Assume $M_1$ and $M_2$ are two of these components such that there are two consecutive gray edges $e_1 \in M_1$ and $e_2 \in M_2$ along $C$. Since the spans of different connected components in $OV(\pi)$ cannot overlap there are two different cases to consider.
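The spans quoted in the illustration of Lemma 2.1 can be checked mechanically. The sketch below rests on several assumptions that this excerpt does not spell out: the standard mapping $u$ that sends $+x$ to $(2x-1, 2x)$ and $-x$ to $(2x, 2x-1)$, gray edges joining the values $2k$ and $2k+1$ whenever they are not at adjacent positions, and two vertices of the overlap graph being adjacent when their position intervals intersect without nesting. The permutation is taken to be $\pi' = (4, -3, 1, 2, 5, 7, 6)$ from Figure 2.2, with the signs treated as an assumption; all function names are hypothetical.

```python
def unsigned_image(signed):
    """Assumed mapping u: +x -> (2x-1, 2x), -x -> (2x, 2x-1), framed by 0 and 2n+1."""
    u = [0]
    for x in signed:
        a = abs(x)
        u += [2 * a - 1, 2 * a] if x > 0 else [2 * a, 2 * a - 1]
    return u + [2 * len(signed) + 1]


def gray_intervals(u):
    """Position interval of each gray edge (2k, 2k+1) with non-adjacent endpoints."""
    pos = {v: p for p, v in enumerate(u)}
    out = {}
    for k in range(len(u) // 2):
        a, b = sorted((pos[2 * k], pos[2 * k + 1]))
        if b - a > 1:
            out[(2 * k, 2 * k + 1)] = (a, b)
    return out


def overlap(p, q):
    """True if the intervals intersect and neither contains the other."""
    return p[0] < q[0] < p[1] < q[1] or q[0] < p[0] < q[1] < p[1]


def components(iv):
    """Connected components of the overlap graph (quadratic, for illustration)."""
    seen, comps = set(), []
    for start in iv:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = [], [start]
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in iv:
                if w not in seen and overlap(iv[v], iv[w]):
                    seen.add(w)
                    stack.append(w)
        comps.append(sorted(comp))
    return comps


def span(comp, iv):
    """Smallest interval containing every interval of the component."""
    return (min(iv[e][0] for e in comp), max(iv[e][1] for e in comp))


iv = gray_intervals(unsigned_image((4, -3, 1, 2, 5, 7, 6)))
comps = components(iv)
spans = sorted(span(c, iv) for c in comps)
```

Under these assumptions the two components are exactly $M_1$ and $M_2$, with spans $[0, 9]$ and $[10, 15]$; each span starts at an even position and ends at an odd one, as Lemma 2.1 requires.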
2.4. Hurdles.
3. Overview of our algorithm. Following the theory developed in [11], it turns out that given a permutation $\pi$ with $h(\pi) > 0$ one can perform $t = \lceil h(\pi)/2 \rceil$ reversals and transform $\pi$ into a permutation $\pi'$ such that $h(\pi') = 0$ and $d(\pi') = d(\pi) - t$. If $OV(\pi)$ has unoriented components then our algorithm first finds $t$ such reversals that transform $\pi$ into a $\pi'$ which has only oriented components.

Our method of "clearing the hurdles" uses the theory developed by Hannenhalli and Pevzner. In Section 5 we describe an efficient implementation of this process which uses the implicit representation of the overlap graph $OV(\pi)$. Our implementation runs in $O(n)$ time assuming $OV(\pi)$ is already partitioned into its connected components. Recently, Berman and Hannenhalli [4] gave an $O(n\alpha(n))$ algorithm for computing the connected components of an interval overlap graph given implicitly by its representation. Using their algorithm we can clear the hurdles from a permutation in $O(n\alpha(n))$ time.

The overlap graph of $\pi'$, $OV(\pi')$, has only oriented components. In Section 4 we prove that in the neighborhood of any oriented gray edge $e$ there is an oriented gray edge $e_1$ ($e_1$ could be the same as $e$) such that a reversal acting on $e_1$ does not create new hurdles. Call such a reversal a safe reversal. We develop an efficient algorithm to locate a safe reversal in a permutation with at least one oriented gray edge. Our algorithm uses only an implicit representation of the overlap graph and runs in $O(n)$ time.

The second stage of our algorithm repeatedly finds a safe reversal and performs it as long as $OV(\pi)$ is not empty. Clearly the overall complexity is $O(r \cdot n + n\alpha(n))$, where $r$ is the number of reversals required to sort $\pi'$.
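As a rough sanity check, this bound implies the quadratic bound stated in the abstract. The derivation below assumes the standard fact, not derived in this section, that a signed permutation on $n$ elements has reversal distance at most $n + 1$:

```latex
r \le n + 1
\;\Longrightarrow\;
O\bigl(r \cdot n + n\,\alpha(n)\bigr)
\subseteq O\bigl((n + 1)\,n + n\,\alpha(n)\bigr)
= O(n^2),
\qquad \text{since } \alpha(n) = O(n).
```

The first term dominates: the $n\alpha(n)$ cost of computing connected components is paid once, while each of the at most $n + 1$ reversals costs $O(n)$.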
3.1. Representing the overlap graph. We assume that the input is given as a sequence of $n$ signed integers representing a signed permutation. First the permutation $\pi$, the image of the input under the mapping $u$, is constructed as described in Section 2.1 and stored in an array. We also construct an array representing $\pi^{-1}$. It is straightforward to verify that with these two arrays we can determine for each element in $\pi$ whether it is a left or a right endpoint of a gray edge in constant time. In case the element is an endpoint of a gray edge we can also find the other endpoint and check whether the edge is oriented in constant time.

Thus the arrays $\pi$ and $\pi^{-1}$ comprise a representation of $OV(\pi)$. Our algorithm will maintain these two arrays while carrying out the reversals that it finds. The time to update the arrays is proportional to the length of the interval being reversed, which is $O(n)$. We shall give a high-level presentation of our algorithm and use primitives like "Scan the oriented gray edges in increasing left endpoint order". It is easy to see how to implement these primitives using the arrays $\pi$ and $\pi^{-1}$; we shall omit the details.

It is easy to produce a list of the intervals in the representation of $OV(\pi)$ sorted by either left or right endpoint from the arrays $\pi$ and $\pi^{-1}$. It is also possible to maintain them without increasing the asymptotic time bound of the algorithm. In practice it may be faster to maintain such lists instead of, or in addition to, $\pi$ and $\pi^{-1}$.

First we introduce some notation. Recall that the vertices of $OV(\pi)$ are the gray edges of $B(\pi)$. In order to avoid confusion we will usually refer to them as vertices of $OV(\pi)$. Hence a vertex of $OV(\pi)$ is oriented if the corresponding gray edge is oriented and it is unoriented otherwise. Let $e$ be a vertex in $OV(\pi)$. Denote by $r(e)$ the reversal acting on the gray edge corresponding to $e$. Denote by $N(e)$ the set of neighbors of $e$ in $OV(\pi)$ including $e$ itself. Denote by $ON(e)$ the subset of $N(e)$ containing the oriented vertices and by $UN(e)$ the subset of $N(e)$ containing the unoriented vertices.

In this section we prove that if an oriented vertex $e$ exists in $OV(\pi)$ then there exists an oriented vertex $f \in ON(e)$ such that $r(f)$ is proper and safe. We also describe an algorithm that finds a proper safe reversal in a permutation that contains at least one oriented edge.

We start with the following useful observation:

Observation 4.1. Let $e$ be a vertex in $OV(\pi)$ and let $\pi' = \pi\, r(e)$. $OV(\pi')$ could be obtained from $OV(\pi)$ by the following operations. 1) Complement the graph induced by $OV(\pi)$ on $N(e) - \{e\}$, and flip the orientation of every vertex in $N(e) - \{e\}$. 2) If $e$ is oriented in $OV(\pi)$ then remove it from $OV(\pi)$. 3) If there exists an oriented edge $e'$ in $OV(\pi)$ with $r(e) = r(e')$ then remove $e'$ from $OV(\pi)$.

Note that if $e$ is an oriented vertex in a component $M$ of $OV(\pi)$, $M - \{e\}$ may split into several components in $OV(\pi')$. (Compare Figures 2.1(c) and 2.2(b).) Denote these components by $M'_1(e), \ldots, M'_k(e)$, where $k \ge 1$. We will refer to $M'_i(e)$ simply as $M'_i$ whenever $e$ is clear from the context.

Let $C$ be a clique of oriented vertices in $OV(\pi)$. We say that $C$ is happy if for every oriented vertex $e \notin C$ and every vertex $f \in C$ such that $(e, f) \in E(OV(\pi))$ there exists an oriented vertex $g \notin C$ such that $(g, e) \in E(OV(\pi))$ and $(g, f) \notin E(OV(\pi))$.
For example, in the overlap graph shown in Figure 2.1(c), $\{(2, 3), (10, 11)\}$ and $\{(6, 7)\}$ are happy cliques, but $\{(2, 3), (10, 11), (8, 9)\}$ is not. Our first theorem claims that one of the vertices in any happy clique defines a safe proper reversal.
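The happy-clique example can be reproduced from the two-array representation of Section 3.1. The following is a minimal sketch under stated assumptions rather than the authors' implementation: the mapping $u$ is taken to send $+x$ to $(2x-1, 2x)$ and $-x$ to $(2x, 2x-1)$, a gray edge is taken to be oriented when the positions of its endpoints have an even sum, and the permutation of Figure 2.1, which this excerpt does not print, is taken to be $(4, -3, 1, -5, -2, 7, 6)$, obtained by undoing $\rho(7, 10)$ on the image of the permutation of Figure 2.2. The class and function names are hypothetical.

```python
def unsigned_image(signed):
    """Assumed mapping u: +x -> (2x-1, 2x), -x -> (2x, 2x-1), framed by 0 and 2n+1."""
    u = [0]
    for x in signed:
        a = abs(x)
        u += [2 * a - 1, 2 * a] if x > 0 else [2 * a, 2 * a - 1]
    return u + [2 * len(signed) + 1]


class ImplicitOverlapGraph:
    """The arrays pi and pi^{-1}; each query below takes constant time."""

    def __init__(self, signed):
        self.pi = unsigned_image(signed)
        self.inv = [0] * len(self.pi)
        for p, v in enumerate(self.pi):
            self.inv[v] = p

    def partner(self, p):
        """Position of the other endpoint of the gray edge at position p, else None."""
        v = self.pi[p]
        q = self.inv[v + 1 if v % 2 == 0 else v - 1]  # gray edges join 2k and 2k+1
        return None if abs(p - q) == 1 else q

    def is_left_endpoint(self, p):
        q = self.partner(p)
        return q is not None and p < q

    def is_oriented(self, p):
        """Assumed parity test; call only on positions that are gray-edge endpoints."""
        return (p + self.partner(p)) % 2 == 0

    def vertices(self):
        """Gray edge (2k, 2k+1) -> ((left, right), oriented), in left endpoint order."""
        out = {}
        for p in range(len(self.pi)):
            if self.is_left_endpoint(p):
                k2 = self.pi[p] - self.pi[p] % 2
                out[(k2, k2 + 1)] = ((p, self.partner(p)), self.is_oriented(p))
        return out


def overlap(p, q):
    return p[0] < q[0] < p[1] < q[1] or q[0] < p[0] < q[1] < p[1]


def is_happy(clique, vs):
    """Direct check of the happy-clique definition (brute force)."""
    def adj(a, b):
        return a != b and overlap(vs[a][0], vs[b][0])
    members = set(clique)
    if not all(vs[c][1] for c in members):
        return False
    if not all(adj(a, b) for a in members for b in members if a != b):
        return False
    outside = [v for v in vs if vs[v][1] and v not in members]
    for e in outside:
        for f in members:
            if adj(e, f) and not any(adj(g, e) and not adj(g, f) for g in outside):
                return False
    return True


vs = ImplicitOverlapGraph((4, -3, 1, -5, -2, 7, 6)).vertices()
```

Under these assumptions the oriented vertices are $(2,3)$, $(6,7)$, $(8,9)$ and $(10,11)$, and `is_happy` agrees with the three claims made in the example above.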
Theorem 4.1. Let $C$ be a happy clique and let $e$ be a vertex in $C$ such that $|UN(e')| \le |UN(e)|$ for every $e' \in C$. Then the reversal $r(e)$ is safe.

Proof. Let $\pi' = \pi\, r(e)$ and assume by contradiction that $M'_i(e)$ is unoriented for some $1 \le i \le k$. Clearly $N(e) \cap M'_i \ne \emptyset$.

Assume there exists $y \in N(e) \cap M'_i$ such that $y \notin C$. Clearly $y$ must be oriented in $OV(\pi)$ and since $C$ is happy it must also have an oriented neighbor $y'$ such that $(y', e) \notin E(OV(\pi))$. Since $y'$ is not adjacent to $e$ in $OV(\pi)$ it stays oriented and adjacent to $y$ in $OV(\pi')$, in contradiction with the assumption that $M'_i$ is unoriented. Hence we may assume that $N(e) \cap M'_i \subseteq C$.

Let $y \in N(e) \cap M'_i$ and let $z \in UN(e)$. Vertex $z$ is oriented in $OV(\pi')$ and if it is adjacent to $y$ in $OV(\pi')$ we obtain a contradiction. Hence, $z$ and $y$ are not adjacent in $OV(\pi')$, so they must be adjacent in $OV(\pi)$. Hence we obtain that $UN(e) \subseteq UN(y)$ in $OV(\pi)$. Corollary 2.2 implies that component $M'_i$ cannot contain $y$ alone. Thus $y$ must have a neighbor $x$ in $M'_i$. Since $N(e) \cap M'_i \subseteq C$, vertex $x$ is not adjacent to $e$ in $OV(\pi)$. Thus we obtain that $(x, y) \in E(OV(\pi))$, $(x, e) \notin E(OV(\pi))$, and $x$ is unoriented in $OV(\pi)$. Since we have already proved that $UN(e) \subseteq UN(y)$, this implies that $UN(e) \subset UN(y)$, in contradiction with the choice of $e$.
For example, Theorem 4.1 implies that the reversal defined by the gray edge $(10, 11)$ is a safe proper reversal for the permutation of Figure 2.1(a), since it corresponds to the vertex with maximum unoriented degree in the happy clique $\{(2, 3), (10, 11)\}$. On the other hand, the reversal defined by $(2, 3)$ creates a new unoriented component, as it yields the permutation shown in Figure 2.2.
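This example can be replayed by brute force. The sketch below is an illustration under assumptions, not the paper's linear-time procedure: it assumes the mapping $u$ with $+x \mapsto (2x-1, 2x)$ and $-x \mapsto (2x, 2x-1)$, the parity test for orientation (endpoint positions with an even sum), and the permutation $(4, -3, 1, -5, -2, 7, 6)$ for Figure 2.1, which is recovered here by undoing $\rho(7, 10)$ and is not printed in this excerpt. The function names are hypothetical.

```python
def unsigned_image(signed):
    u = [0]
    for x in signed:
        a = abs(x)
        u += [2 * a - 1, 2 * a] if x > 0 else [2 * a, 2 * a - 1]
    return u + [2 * len(signed) + 1]


def vertices(u):
    """Gray edge -> ((left, right) positions, oriented?) for non-adjacent endpoints."""
    pos = {v: p for p, v in enumerate(u)}
    out = {}
    for k in range(len(u) // 2):
        a, b = sorted((pos[2 * k], pos[2 * k + 1]))
        if b - a > 1:
            out[(2 * k, 2 * k + 1)] = ((a, b), (a + b) % 2 == 0)
    return out


def overlap(p, q):
    return p[0] < q[0] < p[1] < q[1] or q[0] < p[0] < q[1] < p[1]


def unoriented_degree(e, vs):
    """Number of unoriented vertices adjacent to e, i.e. |UN(e)| for oriented e."""
    return sum(1 for iv, ori in vs.values() if not ori and overlap(vs[e][0], iv))


def reverse_on_edge(u, e):
    """Reversal that brings the endpoints of the oriented gray edge e together."""
    (a, b), ori = vertices(u)[e]
    if not ori:
        raise ValueError("edge must be oriented")
    i, j = (a + 1, b) if a % 2 == 0 else (a, b - 1)
    return u[:i] + u[i:j + 1][::-1] + u[j + 1:]


def unoriented_components(u):
    """Components of the overlap graph that contain no oriented vertex."""
    vs = vertices(u)
    seen, result = set(), []
    for start in vs:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = [], [start]
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in vs:
                if w not in seen and overlap(vs[v][0], vs[w][0]):
                    seen.add(w)
                    stack.append(w)
        if not any(vs[v][1] for v in comp):
            result.append(sorted(comp))
    return result


u = unsigned_image((4, -3, 1, -5, -2, 7, 6))
vs = vertices(u)
clique = [(2, 3), (10, 11)]
degrees = {e: unoriented_degree(e, vs) for e in clique}
best = max(clique, key=degrees.get)
after_best = reverse_on_edge(u, best)
after_other = reverse_on_edge(u, (2, 3))
```

Under these assumptions $(10, 11)$ has three unoriented neighbors against one for $(2, 3)$; reversing on $(10, 11)$ leaves no unoriented component, while reversing on $(2, 3)$ yields the permutation of Figure 2.2 together with the unoriented component $\{(10, 11), (12, 13), (14, 15)\}$.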
The following theorem proves that a happy clique exists in the neighborhood of any oriented edge.

Theorem 4.2. Let $e$ be an oriented vertex in $OV(\pi)$. There exists an oriented vertex $f \in ON(e)$ such that for $\pi' = \pi\, r(f)$, all the components in $OV(\pi')$ are oriented.

Proof. By Theorem 4.1 it suffices to show that there exists a happy clique $C$ in $ON(e)$.

Let $Ext(e) = \{x \in ON(e) \mid \text{there exists } y \in ON(x) \text{ such that } y \notin ON(e)\}$. That is, $Ext(e)$ contains all oriented neighbors of $e$ which have oriented neighbors outside of $ON(e)$.

Case 1: $Ext(e) = ON(e) - \{e\}$. Set $C = \{e\}$.

Case 2: $Ext(e) \subset ON(e) - \{e\}$. Let $D_0 = ON(e) - Ext(e)$. For $j \ge 0$, while $D_j$ is not a clique let $K_j$ be a maximal clique in $D_j$ and define $D_{j+1} = D_j - K_j$. Let $D_k$, $k \ge 0$, be the final clique and set $C = D_k$.

It is straightforward to verify that in each of the two cases $C$ is indeed a happy clique.
In the next section we describe an algorithm that will find an oriented edge $e$ such that $r(e)$ is safe given the representation of $OV(\pi)$ described in Section 3.1. The algorithm first finds a happy clique $C$ and then searches for the vertex with maximum unoriented degree in $C$. According to Theorem 4.1 this vertex defines a safe reversal.

Even though Theorem 4.2 guarantees the existence of a happy clique in the neighborhood of any fixed oriented vertex, our algorithm does not search in one particular such neighborhood. We will prove that the algorithm is guaranteed to find a happy clique assuming that there exists at least one oriented edge. Therefore the algorithm provides an alternative proof to a weaker version of Theorem 4.2 that only claims the existence of a happy clique somewhere in the graph.
4.1. Finding a happy clique. In this section we give an algorithm that locates a happy clique in $OV(\pi)$. Let $e_1, \ldots, e_k$ be the oriented vertices in $OV(\pi)$ in increasing left endpoint order. The algorithm traverses the oriented vertices in $OV(\pi)$ according to this order. Let $L(e)$ and $R(e)$ be the left and right endpoints, respectively, of vertex $e$ in the realization of $OV(\pi)$. After traversing $e_1, \ldots, e_i$, $1 \le i \le k$, the algorithm maintains a happy clique $C_i$ in the subgraph of $OV(\pi)$ induced by these vertices. Assume $|C_i| = j$, $j \le i$, and let $e_{i_1}, \ldots, e_{i_j}$ be the vertices in $C_i$ where $i_1 < i_2 < \cdots < i_j$. The vertices of $C_i$ are maintained in a linked list ordered in increasing left endpoint order. If there exists an interval that contains all the intervals in $C_i$ then the algorithm maintains a minimal such interval $t_i$. The clique $C_i$ and the vertex $t_i$ (if it exists) satisfy the following invariant.

Invariant 4.1.
1) Every vertex $e_l \notin C_i$, $l \le i$, such that $L(e_{i_1}) < L(e_l)$ must be adjacent to $t_i$, i.e., $R(e_l) > R(t_i)$.
2) Every vertex $e_l \notin C_i$, $L(e_l) < L(e_{i_1})$, that is adjacent to a vertex in $C_i$ is either adjacent to an interval $e_p$ such that $R(e_p) < L(e_{i_1})$ or adjacent to $t_i$.

The fact that $C_i$ is happy in the subgraph induced by $e_1, \ldots, e_i$ follows from this invariant. We initialize the algorithm by setting $C_1 = \{e_1\}$. Initially, $t_1$ is not defined. Let the current interval be $e_{i+1}$. If $R(e_{i_j}) < L(e_{i+1})$ then $C_i$ is guaranteed to be happy in $OV(\pi)$ since all remaining oriented vertices are not adjacent to $C_i$. Hence the algorithm stops and returns $C_i$ as the answer. See Figure 4.1(a).

We now assume that $L(e_{i+1}) \le R(e_{i_j})$ and show how to obtain $C_{i+1}$ and $t_{i+1}$. We have to consider the following cases.

Case 1. The interval $t_i$ is defined and $R(t_i) < R(e_{i+1})$. Continue with $C_{i+1} = C_i$ and $t_{i+1} = t_i$. See Figure 4.1(b).

Case 2. The interval $t_i$ is not defined or $R(e_{i+1}) \le R(t_i)$.
a) $R(e_{i_j}) < R(e_{i+1})$ and $L(e_{i+1}) \le R(e_{i_1})$. $C_{i+1}$ is obtained by adding $e_{i+1}$ to $C_i$ and $t_{i+1} = t_i$. See Figure 4.1(c).
b) $R(e_{i_j}) < R(e_{i+1})$ and $L(e_{i+1}) > R(e_{i_1})$. The clique $C_{i+1}$ consists of $e_{i+1}$ alone and $t_{i+1} = t_i$. See Figure 4.1(d).
c) $R(e_{i+1}) < R(e_{i_j})$. As in the previous case $C_{i+1} = \{e_{i+1}\}$. In this case $t_{i+1}$ is set to $e_{i_j}$, the last interval in $C_i$. See Figure 4.1(e).
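The case analysis of this section translates into a single left-to-right scan. The sketch below is one reading of it, with the subscripts taken as $e_{i+1}$ for the current interval, $e_{i_1}$ for the first clique member and $e_{i_j}$ for the last; a Python list stands in for the linked list, and the function name is hypothetical. The four intervals in the usage note are an assumption: they are the oriented vertices $(6,7)$, $(8,9)$, $(2,3)$ and $(10,11)$ of the running example, recomputed by hand for the permutation $(4, -3, 1, -5, -2, 7, 6)$.

```python
def find_happy_clique(intervals):
    """Scan oriented intervals (L, R) by left endpoint; return the clique found."""
    order = sorted(intervals)          # e_1, ..., e_k in increasing left endpoint order
    if not order:
        return []
    clique = [order[0]]                # C_1 = {e_1}
    t = None                           # t_1 is not defined
    for left, right in order[1:]:      # current interval e_{i+1}
        first, last = clique[0], clique[-1]
        if last[1] < left:             # R(e_{i_j}) < L(e_{i+1}): stop, C_i is the answer
            break
        if t is not None and t[1] < right:
            continue                   # Case 1: keep C_i and t_i
        if last[1] < right:
            if left <= first[1]:
                clique.append((left, right))   # Case 2a: e_{i+1} joins the clique
            else:
                clique = [(left, right)]       # Case 2b: restart, t unchanged
        else:
            t = last                           # Case 2c: t becomes the last interval
            clique = [(left, right)]
    return clique
```

On the intervals `[(1, 3), (2, 8), (6, 10), (7, 13)]` the scan first builds `[(1, 3), (2, 8)]`, restarts at `(6, 10)` by Case 2b, and ends with `[(6, 10), (7, 13)]`, which corresponds to the happy clique $\{(2, 3), (10, 11)\}$ named in the text. Each interval is handled in constant time, matching the linear running time claimed for the scan.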
The following theorem proves that the algorithm above produces a happy clique.

Theorem 4.3. Let $C_l$ be the current clique when the algorithm stops. Then $C_l$ is a happy clique in $OV(\pi)$.

Proof. A straightforward induction on the number of oriented vertices traversed by the algorithm proves that $C_l$ and $t_l$ satisfy Invariant 4.1. The algorithm stops either when $R(e_{l_j}) < L(e_{l+1})$ or when $l$ is equal to the number of oriented vertices. In either case since $C_l$ is happy in the subgraph induced by $e_1, \ldots, e_l$ it must be happy in $OV(\pi)$.

The running time of the algorithm is proportional to the number of oriented vertices traversed since a constant amount of work is performed for each such vertex.

Fig. 4.1. The various cases of the algorithm for finding a happy clique. The topmost interval is always $t_i$. The three thick intervals comprise $C_i$. The dotted interval corresponds to $e_{i+1}$.

After locating a happy clique $C$ in $OV(\pi)$ we need to search it for a vertex with a maximum number of unoriented neighbors. In this section we give an algorithm that performs this task.
The complexity of the algorithm is proportional to the size of $C$ plus the number of unoriented vertices in $OV(\pi)$, and hence is $O(n)$.
5. Clearing the hurdles. In case there are unoriented components in $OV(\pi)$, there exists a sequence $r_1, \ldots, r_t$ of $t$ reversals that transform $\pi$ into $\pi'$ such that $d(\pi') = d(\pi) - t$, where $t = \lceil h(\pi)/2 \rceil$. In this section we summarize the characterization given by Hannenhalli and Pevzner for these $t$ reversals and outline how to find them using our implicit representation of $OV(\pi)$.

We will use the following definitions. A reversal merges hurdles $H_1$ and $H_2$ if it acts on two breakpoints, one incident with a gray edge in $H_1$ and the other incident with a gray edge in $H_2$. Recall the circle $CR$ defined in Section 2, in which the endpoints of the edges in the unoriented components of $OV(\pi)$ are ordered consistently with their order in $\pi$. Two hurdles $H_1$ and $H_2$ are consecutive if their sets of endpoints $E(H_1)$ and $E(H_2)$ occur consecutively on $CR$, i.e., there is no hurdle $H$ such that $E(H)$ separates $E(H_1)$ and $E(H_2)$ on $CR$.

The following lemmas were essentially proved by Hannenhalli and Pevzner though stated differently in their paper.

Lemma 5.1 ([11]). Let $\pi$ be a permutation with an even number, say $2k$, of hurdles. Any sequence of $k - 1$ reversals each of which merges two non-consecutive hurdles followed by a reversal merging the remaining two hurdles will transform $\pi$ into $\pi'$ such that $d(\pi') = d(\pi) - k$ and $\pi'$ has only oriented components.

Lemma 5.2 ([11]). Let $\pi$ be a permutation with an odd number, say $2k + 1$, of hurdles. If at least one hurdle $H$ is simple then a reversal acting on two breakpoints incident with edges in $H$ transforms $\pi$ into $\pi'$ with $2k$ hurdles such that $d(\pi') = d(\pi) - 1$. If $\pi$ is a fortress then a sequence of $k - 1$ reversals merging pairs of non-consecutive hurdles followed by two additional merges of pairs of consecutive hurdles (one merges two original hurdles and the next merges a hurdle created by the first and the last original hurdle) will transform $\pi$ into $\pi'$ such that $d(\pi') = d(\pi) - (k + 1)$ and $\pi'$ has only oriented components.
We now outline how to turn these lemmas into an algorithm that finds a particular sequence of reversals $r_1, \ldots, r_t$ with the properties described above. First $OV(\pi)$ is decomposed into connected components as described in [4]. One then has to identify those unoriented components that are hurdles. This task can be done by traversing the endpoints of the circle $CR$, counting the number of elements in each run of consecutive endpoints belonging to the same component. If a run contains all endpoints of a particular unoriented component $M$ then $M$ is a hurdle.
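The run-counting test described above can be sketched as follows. The input format is an assumption made for illustration: a list giving, for each endpoint on the circle $CR$ in circular order, the label of the unoriented component it belongs to. The function name is hypothetical, and the sketch implements only the contiguity test stated in this paragraph; the definition of a hurdle itself is not reproduced in this excerpt.

```python
from collections import Counter


def components_with_contiguous_endpoints(labels):
    """Return the labels whose endpoints form a single run on the circle."""
    labels = list(labels)
    n = len(labels)
    if n == 0:
        return set()
    total = Counter(labels)
    if len(total) == 1:
        return set(total)              # a single component occupies the whole circle
    # Rotate so that index 0 starts a run; then no run wraps around the end.
    start = next(i for i in range(n) if labels[i] != labels[i - 1])
    rotated = labels[start:] + labels[:start]
    found = set()
    run_label, run_len = rotated[0], 0
    for lab in rotated + [None]:       # the sentinel closes the final run
        if lab == run_label:
            run_len += 1
        else:
            if run_len == total[run_label]:
                found.add(run_label)   # this run holds every endpoint of the component
            run_label, run_len = lab, 1
    return found
```

For the circular sequence `['A', 'B', 'B', 'A']` both components pass the test, because the two `A` endpoints are neighbors across the wrap-around; for `['A', 'B', 'A', 'B']` neither does. The scan visits each endpoint once, so it runs in time linear in the number of endpoints.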
In a similar fashion one can check for each hurdle whether it is a simple hurdle or a super hurdle. While traversing the cycle, a list of the hurdles in the order they occur on $CR$ is created. At the next stage this list is used to identify the correct hurdles to merge.

We assume that given an endpoint one can locate its connected component in constant time. It is easy to verify that the data can be maintained so that this is possible.
Theorem 5.3. Given $OV(\pi)$ decomposed into its connected components, the algorithm outlined above finds $t$ reversals such that when we apply them to $\pi$ we obtain a $\pi'$ which is hurdle-free and has $d(\pi') = d(\pi) - t$. The algorithm can be implemented to run in $O(n)$ time.

Proof. Correctness follows from Lemmas 5.1 and 5.2. The time bound is achieved if we always merge hurdles that are separated by a single hurdle. If the $i$th merge merged hurdles $H_1$ and $H_2$ that are separated by $H$, then $H$ should be merged in the $(i+1)$st merge. Carrying out the merges this way guarantees that the span of each hurdle $H$ overlaps at most two merging reversals, the second of which eliminates $H$.
6. Summary.

algorithm Signed Reversals($\pi$);
/* $\pi$ is a signed permutation */
begin
  1. Compute the connected components of $OV(\pi)$.
  2. Clear the hurdles.
  3. while $\pi$ is not sorted do:
     /* iteration */
     begin
       a. find a happy clique $C$ in $OV(\pi)$.
       b. find a vertex $e_f \in C$ with maximum unoriented degree, and perform a safe reversal on $e_f$;
       c. update $\pi$ and the representation of $OV(\pi)$.
     end
end

Fig. 6.1

Theorem 6.1. Algorithm Signed Reversals [...]
Proof. The correctness of the algorithm follows from Theorem 2.3, Theorem 4.1 and Lemmas 5.1 and 5.2. Step 1 takes $O(n\alpha(n))$ time by the algorithm of Berman and Hannenhalli [4]. Step 2 takes $O(n)$ time by Theorem 5.3. Step 3 takes $O(n)$ time per reversal, by the discussion in Section 4.

It is an intriguing open question whether a faster algorithm for sorting signed permutations by reversals exists. It certainly might be the case that one can find an optimal sequence of reversals faster. To date, no nontrivial lower bound is known for this problem.
Acknowledgments. We thank Donald Knuth, Sridhar Hannenhalli, Pavel Pevzner, and Itsik Pe'er for their comments on a preliminary version of this paper.
REFERENCES
[1] W. Ackermann, Zum hilbertschen aufbau der reellen zahlen, Math. Ann., 99 (1928), pp. 118-133.
[2] V. Bafna and P. A. Pevzner, Sorting permutations by transpositions, in Proceedings of the 6th Annual Symposium on Discrete Algorithms, ACM Press, Jan. 1995, pp. 614-623.
[3] V. Bafna and P. A. Pevzner, Genome rearrangements and sorting by reversals, SIAM Journal on Computing, 25 (1996), pp. 272-289. A preliminary version appeared in Proc. 34th IEEE Symp. of the Foundations of Computer Science, pages 148-157, 1994.
[4] P. Berman and S. Hannenhalli, Fast sorting by reversals, in Proc. Combinatorial Pattern Matching (CPM), 1996, pp. 168-185. LNCS 1075.
[5] A. Caprara, Sorting by reversals is difficult, in Proceedings of the First International Conference on Computational Molecular Biology (RECOMB), ACM Press, 1997, pp. 75-83.
[6] D. A. Christie, A 3/2-approximation algorithm for sorting by reversals, in Proc. ninth annual ACM-SIAM Symp. on Discrete Algorithms (SODA 98), ACM Press, 1998, pp. 244-252.
[7] T. Dobzhansky and A. H. Sturtevant, Inversions in the chromosomes of drosophila pseudoobscura, Genetics, 23 (1938), pp. 28-64.
[8] M. C. Golumbic, Algorithmic Graph Theory and Perfect Graphs, Academic Press, New York, 1980.