5 views

Uploaded by tanmay100

Research Paper on Genome matching and sorting by SBR,SBT algorithms.

save

You are on page 1of 13

PERMUTATIONS BY REVERSALS

HAIM KAPLANy , RON SHAMIRz ,

AND

ROBERT E. TARJANx

**Abstra
t. We give a quadrati
-time algorithm for nding the minimum number of reversals
**

needed to sort a signed permutation. Our algorithm is faster than the previous algorithm of Hannenhalli and Pevzner and its faster implementation of Berman and Hannenhalli. The algorithm

is
on
eptually simple and does not require spe
ial data stru
tures. Our study also
onsiderably

simplies the
ombinatorial stru
tures used by the analysis.

**AMS (MOS) subje
t
lassi
ation: 62P10 68P10
**

Key words: sorting permutations, reversal distan
e,
omputational mole
ular bi-

ology.

In this paper we study the problem of sorting signed permutations by reversals. A signed permutation is a permutation = ( ; : : : ; n) on the

integers f1; : : : ; ng, where ea
h number is also assigned a sign of plus or minus. A

reversal, (i; j ), on transforms to

0 = (i; j ) =

( ; : : : ; i ; j ; j ; : : : ; i ; j ; : : : ; n ):

The minimum number of reversals needed to transform one permutation to another is
alled the reversal distan
e between them. The problem of sorting signed permutations by reversals is to nd, for a given signed permutation , a sequen
e of reversals of minimum length that transforms to the identity permutation (+1; +2; : : : ; +n).

The motivation to studying the problem arises in mole
ular biology: Con
urrent

with the fast progress of the Human Genome Proje
t, geneti
and DNA data on many

model organisms is a
umulating rapidly, and
onsequently the ability to
ompare

genomes of dierent spe
ies has grown dramati
ally. One of the best ways of
he
king

similarity between genomes on a large s
ale is to
ompare the order of appearan
e

of identi
al genes in the two spe
ies. In the Thirties, Dobzhansky and Sturtevant [7℄

had already studied the notion of inversions in
hromosomes of drosophila. In the

late Eighties, Jerey Palmer demonstrated that dierent spe
ies may have essentially

the same genes, but the gene orders may dier between spe
ies. Taking an abstra
t

perspe
tive, the genes along a
hromosome
an be thought of as points along a line.

Numbers identify the parti
ular genes; and, as genes have dire
tionality, signs
orrespond to their dire
tion. Palmer and others have shown that the dieren
e in order

may be explained by a small number of reversals [17, 18, 19, 20, 12℄. These reversals

orrespond to evolutionary
hanges during the history of the two genomes, so the num1. Introdu
tion.

1

1

1

1

A

+1

**preliminary version of this paper was presented at the Eighth ACM-SIAM Symposium on
**

Dis
rete Algorithms [13℄.

y AT&T-labs resear
h, 180 Park Ave, Florham Park, NJ 07932 USA. hklresear
h.att.
om

z Department of Computer S
ien
e, Sa
kler Fa
ulty of Exa
t S
ien
es, Tel Aviv University,

Tel-Aviv 69978 ISRAEL. Resear
h supported in part by a grant from the Ministry of S
ien
e

and the Arts, Israel, and by US Department of Energy, grant No. DE-FG03-94ER61913/A000.

shamirmath.tau.a
.il

x Department of Computer S
ien
e, Prin
eton University, Prin
eton, NJ 08544 USA and InterTrust

Te
hnologies Corporation, Sunnyvale, CA 94086 USA. Resear
h at Prin
eton University partially

supported by the NSF, Grants CCR-8920505 and CCR-9626862, and the OÆ
e of Naval Resear
h,

Contra
t No. N00014-91-J-1463. ret
s.prin
eton.edu

1

**ber of reversals re
e
ts the evolutionary distan
e between the spe
ies. Hen
e, given
**

two su
h permutations, their reversal distan
e measures their evolutionary distan
e.

Mathemati
al analysis of genome rearrangement problems was initiated by Sanko

[22, 21℄. Ke
e
ioglu and Sanko [16℄ gave the rst
onstant-fa
tor polynomial approximation algorithm for the problem and
onje
tured that the problem is NP-hard.

Bafna and Pevzner [3℄, and more re
ently Christie [6℄ improved the approximation

fa
tor, and additional studies have revealed the ri
h
ombinatorial stru
ture of rearrangement problems [15, 14, 2, 9, 10℄. Quite re
ently, Caprara [5℄ has established

that sorting unsigned permutations is NP-hard, using some of the
ombinatorial tools

developed by Bafna and Pevzner [3℄.

In 1995, Hannenhalli and Pevzner [11℄ showed that the problem of sorting a signed

permutation by reversals is polynomial. They proved a duality theorem that equates

the reversal distan
e with the sum of three
ombinatorial parameters (see Theorem 2.3

below). Based on this theorem, they proved that sorting signed permutations by

reversals
an be done in O(n ) time. More re
ently, Berman and Hannenhalli [4℄

des
ribed a faster implementation that nds a minimum sequen
e of reversals in

O(n (n)) time, where is the inverse of A
kerman's fun
tion [1℄ (see also [23℄).

In this study we give an O(n ) algorithm for sorting a signed permutation of n

elements, thereby improving upon the previous best known bound [4℄. In fa
t, if the

reversal distan
e is r, our algorithm requires O(r n + n(n)) time. In addition to

giving a better time bound, our work
onsiderably simplies both the algorithm and

ombinatorial stru
ture needed for the analysis, as follows:

The basi
obje
t we work with is an impli
it representation of the overlap graph,

to be dened later, in
ontrast with the interleaving graph in [11℄ and [4℄. The

overlap graph is
ombinatorially simpler than the interleaving graph. As a result,

it is easier to produ
e a representation for the overlap graph from the input, and

to maintain it while sear
hing for reversals.

As a
onsequen
e of our ability to work with the overlap graph we need not perform

any \padding transformations", nor do we have to work with \simple permutations"

as in [11℄ and [4℄.

We deal with the unoriented and oriented parts of the permutation separately,

whi
h makes the algorithm mu
h simpler.

The notion of a hurdle, one of the
ombinatorial entities dened by [11℄ for the

duality theorem, is simplied and is handled in a more symmetri
manner.

The sear
h for the next reversal is mu
h simpler, and requires no spe
ial data

stru
tures. Our algorithm
omputes
onne
ted
omponents only on
e, and any

simple implementation of it suÆ
es to obtain the quadrati
time bound. In
ontrast, in [4℄ a logarithmi
number of
onne
ted
omponent
omputations may be

performed per reversal, using the union-nd data stru
ture.

The paper is organized as follows: Se
tion 2 gives the ne
essary preliminaries. Se
tion

3 gives an overview of our algorithm. Se
tions 4 and 5 give the details of our algorithm.

We summarize our results and suggest some further resear
h in Se
tion 6.

This se
tion gives the basi
ba
kground, primarily the theory

of Hannenhalli and Pevzner, on whi
h we base our algorithm. The reader may nd

it helpful to refer to Figure 2.1, in whi
h the main denitions are illustrated. We

start with some denitions for unsigned permutations. Let = ( ; : : : ; n) denote a

permutation of f1; : : : ; ng. Augment to a permutation on n + 2 verti
es by adding

= 0 and n = n + 1 to it. A pair (i ; i ), 0 i n is
alled a gap. Gaps

are
lassied into two types. A gap (i ; i ) is a breakpoint of if and only if

4

2

2

2. Preliminaries.

1

0

+1

+1

+1

2

**ji i j > 1; otherwise, it is an adja
en
y of . We denote by b() the number of
**

+1

breakpoints in .

A reversal, (i; j ), on a permutation transforms to

0 = (i; j ) =

( ; : : : ; i ; j ; j ; : : : ; i ; j ; : : : ; n ):

We say that a reversal (i; j ) a
ts on the gaps (i ; i ) and (j ; j ).

1

1

1

+1

1

a

0

7

8

6

1

b

5

1

2

10

9

4

14

0

+1

3

13

14

7

11

12

4

6

10

13

8

2

12

15

15

5

9

11

3

12,13

2,3

4,5

0,1

14,15

10,11

8,9

6,7

c

**Fig. 2.1. a) The breakpoint graph, B ( ), of the permutation = (4;
**

3; 1; 5; 2; 7; 6). Bla
k

edges are solid; gray edges are dashed; oriented edges are bold. b) B () de
omposes into two disjoint

alternating
y
les.
) The overlap graph, OV (). Bla
k verti
es
orrespond to oriented edges.

**The breakpoint graph B() of a permutation
**

= ( ; : : : ; n) is an edge-
olored graph on n + 2 verti
es f ; ; : : : ; n g =

f0; 1; : : : ; n + 1g. We join verti
es i and j by a bla
k edge if (i ; j ) is a breakpoint

in and by a gray edge if (i; j ) is a breakpoint in .

We dene a one-to-one mapping u from the set of signed permutations of order

n into the set of unsigned permutations of order 2n as follows. Let be a signed

permutation. To obtain u(), repla
e ea
h positive element x in by 2x 1; 2x

and ea
h negative element x by 2x; 2x 1. For any signed permutation , let

B () = B (u()). Note that in B () every vertex is either isolated or in
ident to

exa
tly one bla
k edge and one gray edge. Therefore, there is a unique de
omposition

of B() into
y
les. The edges of ea
h
y
le alternate between gray and bla
k. Call a

reversal (i; j ) su
h that i is odd and j even an even reversal. The reversal (2i +1; 2j )

**2.1. The breakpoint graph.
**

1

0

1

3

1

+1

**on u() mimi
s the reversal (i+1; j ) on . Thus, sorting by reversals is equivalent to
**

sorting the unsigned permutation u() by even reversals. Hen
eforth we will
onsider

the latter problem, and by a reversal we will always mean an even reversal. Let

b() = b(u()) and let
() be the number of
y
les in B ().

Figure 2.1(a) shows the breakpoint graph of the permutation = (4; 3; 1; 5; 2; 7; 6).

It has eight breakpoints and de
omposes into two alternating
y
les, i.e. b() = 8, and

() = 2. The two
y
les are shown in Figure 2.1(b). Figure 2.2(a) shows the breakpoint graph of 0 = (4; 3; 1; 2; 5; 7; 6), whi
h has seven breakpoints and de
omposes

into two
y
les.

For an arbitrary reversal on a permutation , dene b(; ) = b() b()

and
(; ) =
()
(). When the reversal and the permutation are
lear

from the
ontext, we will abbreviate b(; ) by b and
(; ) by
. As Bafna

and Pevzner [3℄ observed, the following values are taken by b and
depending on

the types of the gaps (i; j ) a
ts on:

1. Two adja
en
ies:
= 1 and b = 2.

2. A breakpoint and an adja
en
y:
= 0 and b = 1.

3. Two breakpoints ea
h belonging to a dierent
y
le: b = 0,
= 1.

4. Two breakpoints of the same
y
le C :

a. (i ; j ) and (i ; j ) are gray edges:
= 1, b = 2.

b. Exa
tly one of (i ; j ) and (i ; j ) is a gray edge:
= 0, b = 1.

. Neither (i ; j ) nor (i ; j ) is a gray edge, and when breaking C at i and

j verti
es i 1 and j + 1 end up in the same path: b = 0,
= 0.

d. Neither (i ; j ) nor (i ; j ) is a gray edge, and when breaking C at i and

j verti
es i 1 and j + 1 end up in dierent paths: b = 0,
= 1.

Call a reversal proper if b
= 1, i.e. it is either of type 4a, 4b, or 4d.

We say that a reversal a
ts on a gray edge e if it a
ts on the breakpoints whi
h

orrespond to the bla
k edges in
ident with e. A gray edge is oriented if a reversal

a
ting on it is proper, otherwise it is unoriented. Noti
e that a gray edge (k ; l ) is

oriented if and only if k + l is even. For example, the gray edge (0; 1) in the graph of

Figure 2.1(a) is unoriented, while the gray edge (7; 6) is oriented.

Two intervals on the real line overlap if their interse
tion is nonempty but neither properly
ontains the other. A graph G is an interval

overlap graph if one
an assign an interval to ea
h vertex su
h that two verti
es are

adja
ent if and only if the
orresponding intervals overlap (see, e.g., [8℄). For a permutation , we asso
iate with a gray edge (i ; j ) the interval [i; j ℄. The overlap graph

of a permutation , denoted OV (), is the interval overlap graph of the gray edges

of B(). Namely, the vertex set of OV () is the set of gray edges in B(), and two

verti
es are
onne
ted if the intervals asso
iated with their gray edges overlap. We

shall identify a vertex in OV () with the edge it represents and with its interval in the

representation. Thus, the endpoints of a gray edge are a
tually the endpoints of the

interval representing the
orresponding vertex in OV (). Note that all the endpoints

of intervals in this representation are distin
t integers. A
onne
ted
omponent of

OV () that
ontains an oriented edge is
alled an oriented
omponent; otherwise, it

is
alled an unoriented
omponent.

Figure 2.1(
) shows the interval overlap graph for = (4; 3; 1; 5; 2; 7; 6). It

has only one0 oriented
omponent. Figure 2.2(b) shows the overlap graph of the permutation = (4; 3; 1; 2; 5; 7; 6), whi
h has two
onne
ted
omponents, one oriented

and the other unoriented.

+1

1

+1

1

+1

1

+1

1

2.2. The overlap graph.

4

a

0

8

7

6

5

1

2

3

4

12,13

9

10

13

4,5

0,1

8,9

6,7

14

11

12

15

b

14,15

10,11

Fig. 2.2.

a) The breakpoint graph of 0 = (4; 3; 1; 2; 5; 7; 6). 0 was obtained from of

Figure 2.1 by the reversal (7; 10); or, equivalently, by the reversal dened by the gray edge (2; 3).

b) The overlap graph of 0 .

Let X be a set of

gray edges in B(). Dene min(X ) = minfi j (i ; j ) 2 X g, max(X ) = maxfj j (i ; j ) 2

X g and span(X ) = [min(X ); max(X )℄. Equivalently, one
an look at the interval overlap representation of OV () mentioned above and dene the span of a set of verti
es

X as the minimum interval whi
h
ontains all the intervals of verti
es in X .

The major obje
t our algorithm will work with is OV (), though for eÆ
ien
y

onsiderations we will avoid generating it expli
itly. In
ontrast, Pevzner and Hannenhalli worked with the interleaving graph H , whose verti
es are the alternating

y
les of B(). Two
y
les C and C are
onne
ted by an edge in H i there exists

a gray edge e 2 C and a gray edge e 2 C that overlap.

The following lemma and its
orollary imply that the partition imposed by the

onne
ted
omponents of OV () on the set of gray edges is identi
al to the one

imposed by the
onne
ted
omponents of H :

Lemma 2.1. If M is a set of gray edges in B ( ) that
orresponds to a
onne
ted

omponent in OV ( ) then min(M ) is even and max(M ) is odd.

Proof. Assume min(M ) is odd. Then M + 1 and M

1 must both

be in span(M ) (i.e. there exist l ; l 2 span(M ) su
h that l1 = M + 1 and

l2 = M 1). Thus M is neither the maximum nor the minimum element

in the set fi j i 2 span(M )g. Hen
e, either the maximum element or the minimum

element in span(M ) is j for some min(M ) < j < max(M ). By the denition of B()

there must be a gray edge (j ; l ) for some l 62 span(M ),
ontradi
ting the fa
t that

M is a
onne
ted
omponent in OV (). The proof that max(M ) is odd is similar.

As an illustration of Lemma 2.1,
onsider Figure 2.2(a). Let M = f(0; 1); (4; 5); (8; 9); (6; 7)g

and M = f(10; 11); (12; 13); (14; 15)g. Then span(M ) = [0; 9℄ and span(M ) =

[10; 15℄.

Corollary 2.2. Every
onne
ted
omponent of OV ( )
orresponds to the set of

gray edges of a union of
y
les.

Proof. Assume by
ontradi
tion that C is a
y
le whose gray edges belong to

at least two
onne
ted
omponents in OV (). Assume M and M are two of these

omponents su
h that there are two
onse
utive gray edges e 2 M and e 2 M

along C . Sin
e the spans of dierent
onne
ted
omponents in OV ()
annot overlap

there are two dierent
ases to
onsider.

2.3. The
onne
ted
omponents of the overlap graph.

1

1

2

1

2

2

min(

1

min(

)

min(

)

min(

)

2

min(

)

)

1

2

1

2

1

2

1

5

1

2

2

**1. span(M ) span(M ) (the
ase span(M ) span(M ) is symmetri
). Sin
e
**

e and e are in dierent
omponents they
annot overlap. Thus, either the right

endpoint of e is even and equals max(M ) or the left endpoint of e is odd and

equals min(M ). In both
ases we have a
ontradi
tion to Lemma 2.1.

2. span(M ) and span(M ) are disjoint intervals. W.l.o.g. assume that max(M ) <

min(M ). The right endpoint of e is even and equals max(M ), whi
h
ontradi
ts

Lemma 2.1.

Note that in parti
ular Corollary 2.2 implies that an overlap graph
annot
ontain

isolated verti
es.

Let i1 ; i2 ; : : : ; i be the subsequen
e of 0; ; : : : ; n ; n +1
onsisting of those elements in
ident with gray edges that o
ur in unoriented
omponents of OV (). Order i1 ; i2 ; : : : ; i on a
ir
le CR su
h that i follows i 1 for

2 j k and i1 follows i . Let M be an unoriented
onne
ted
omponent in

OV (). Let E (M ) fi1 ; i2 ; : : : ; i g be the set of endpoints of the edges in M . An

unoriented
omponent M is a hurdle if the elements of E (M ) o
ur
onse
utively on

CR.

This denition of a hurdle is dierent from the one given by Hannenhalli and

Pevzner [11℄. It is simpler in the sense that minimal hurdles and the maximal one do

not have to be treated in dierent ways. Using Corollary 2.2 above, one
an prove that

the hurdles as we have dened them are identi
al to the ones dened by Hannenhalli

and Pevzner. Let h() denote the number of hurdles in a permutation .

A hurdle is simple if when one deletes it from OV () no other unoriented
omponent be
omes a hurdle, and it is a super hurdle otherwise. A fortress is a permutation

with an odd number of hurdles all of whi
h are super hurdles.

The following theorem was proved by Hannenhalli and Pevzner.

Theorem 2.3. [11℄ The minimum number of reversals required to sort a permutation is b( )
( ) + h( ), unless is a fortress, in whi
h
ase exa
tly one

2

1

1

1

2

2

2

2

2

2

2

1

1

2

1

1

2.4. Hurdles.

1

k

j

j

k

k

k

additional reversal is ne essary and suÆ ient.

**3. Overview of our algorithm. Denote by d( ) the reversal distan
e of , i.e.,
**

d() = b()
()+ h()+1 if is a fortress and d() = b()
()+ h() otherwise.

**Following the theory developed in [11℄, it turns out that given a permutation
**

with h() >0 0 one
an perform

t = dh()=2e reversals and transform into a

permutation su
h that h(0 ) = 0 and d(0 ) = d() t. If OV () has unoriented

omponents then our algorithm rst nds t su
h reversals that transform into a 0

whi
h has only oriented
omponents.

Our method of \
learing the hurdles" uses the theory developed by Hannenhalli

and Pevzner. In Se
tion 5 we des
ribe an eÆ
ient implementation of this pro
ess

whi
h uses the impli
it representation of the overlap graph OV (). Our implementation runs in O(n) time assuming OV () is already partitioned into its
onne
ted

omponents. Re
ently, Berman and Hannenhalli [4℄ gave an O(n(n)) algorithm for

omputing the
onne
ted
omponents of an interval overlap graph given impli
itly by

its representation. Using their algorithm we
an
lear the hurdles from a permutation

in O(n(n)) time.

The overlap graph of 0 , OV (0 ), has only oriented
omponents. In Se
tion 4 we

prove that in the neighborhood of any oriented gray edge e there is an oriented gray

edge e (e
ould be the same as e) su
h that a reversal a
ting on e does not
reate

new hurdles. Call su
h a reversal a safe reversal. We develop an eÆ
ient algorithm

to lo
ate a safe reversal in a permutation with at least one oriented gray edge. Our

1

1

1

6

**algorithm uses only an impli
it representation of the overlap graph and runs in O(n)
**

time.

The se
ond stage of our algorithm repeatedly nds a safe reversal and performs

it as long as OV () is not empty. Clearly the overall
omplexity is O(r n + n(n)),

where r is the number of reversals required to sort 0.

We assume that the input is given as

a sequen
e of n signed integers representing . First the permutation = u( ) is

onstru
ted as des
ribed in Se
tion 2.1 and stored in an array. We also
onstru
t an

array representing . It is straightforward to verify that with these two arrays we

an determine for ea
h element in whether it is a left or a right endpoint of a gray

edge in
onstant time. In
ase the element is an endpoint of a gray edge we
an also

nd the other endpoint and
he
k whether the edge is oriented in
onstant time.

Thus the arrays and
omprise a representation of OV (). Our algorithm

will maintain these two arrays while
arrying out the reversals that it nds. The time

to update the arrays is proportional to the length of the interval being reversed, whi
h

is O(n). We shall give a high-level presentation of our algorithm and use primitives

like \S
an the oriented gray edges in in
reasing left endpoint order". It is easy to

see how to implement these primitives using the arrays and ; we shall omit the

details.

It is easy to produ
e a list of the intervals in the representation of OV () sorted

by either left or right endpoint from the arrays and . It is also possible to

maintain them without in
reasing the asymptoti
time bound of the algorithm. In

pra
ti
e it may be faster to maintain su
h lists instead of, or in addition to and

.

First we introdu
e some notation. Re
all that the verti
es of OV () are the gray edges of B(). In order to avoid
onfusion

we will usually refer to them as verti
es of OV (). Hen
e a vertex of OV () is oriented

if the
orresponding gray edge is oriented and it is unoriented otherwise. Let e be a

vertex in OV (). Denote by r(e) the reversal a
ting on the gray edge
orresponding

to e. Denote by N (e) the set of neighbors of e in OV () in
luding e itself. Denote by

ON (e) the subset of N (e)
ontaining the oriented verti
es and by UN (e) the subset

of N (e)
ontaining the unoriented verti
es.

In this se
tion we prove that if an oriented vertex e exists in OV () then there

exists an oriented vertex f 2 ON (e) su
h that r(f ) is proper and safe. We also

des
ribe an algorithm that nds a proper safe reversal in a permutation that
ontains

at least one oriented edge.

We start with the following useful observation:

Observation 4.1. Let e be a vertex in OV ( ) and let 0 = r(e). OV ( 0 )
ould

be obtained from OV ( ) by the following operations. 1) Complement the graph indu
ed

by OV ( ) on N (e) feg, and
ip the orientation of every vertex in N (e) feg. 2)

If e is oriented in OV ( ) then remove it from OV ( ). 3) If there exists an oriented

edge e0 in OV ( ) with r(e) = r(e0 ) then remove e0 from OV ( ).

Note that if e is an oriented vertex in a
omponent M of OV (), M feg may

split into several
omponents in OV (0 ). (Compare gures 2.1(
) and 2.2(b).) Denote

these
omponents by M 0 (e); : : : ; Mk0 (e), where k 1. We will refer to Mi0(e) simply

as Mi0 whenever e is
lear from the
ontext.

Let C be a
lique of oriented verti
es in OV (). We say that C is happy if for

every oriented vertex e 62 C and every vertex f 2 C su
h that (e; f ) 2 E (OV ()) there

exists an oriented vertex g 62 C su
h that (g; e) 2 E (OV ()) and (g; f ) 62 E (OV ()).

3.1. Representing the overlap graph.

0

0

1

1

1

1

1

4. Eliminating oriented omponents.

1

7

**For example, in the overlap graph shown in Figure 2.1(
) f(2; 3); (10; 11)g and f(6; 7)g
**

are happy
liques, but f(2; 3); (10; 11); (8; 9)g is not. Our rst theorem
laims that one

of verti
es in any happy
lique denes a safe proper reversal.

Theorem 4.1. Let C be a happy
lique and let e be a vertex in C su
h that

jUN (e0 )j jUN (e)j for every e0 2 C . Then the reversal r(e) is safe.

Proof. Let 0 = r(e) and assume by
ontradi
tion that Mi0 (e) is unoriented for

some 1 i k. Clearly N (e) \ Mi0 6= ;.

Assume there exists y 2 N (e) \ Mi0 su
h that y 62 C . Clearly y must be oriented

in OV () and sin
e C is happy it must also have an oriented neighbor y0 su
h that

(y0; e) 62 E (OV ()). Sin
e

y0 is not adja
ent to e in OV () it stays oriented and

0

adja
ent to y in OV ( ), in
ontradi
tion with the assumption that Mi0 is unoriented.

Hen
e we may assume that N (e) \ Mi0 C .

Let y 2 N (e) \ Mi0 and let z 2 UN (e). Vertex z is oriented in OV (0 ) and if it is

adja
ent to y in OV (0 ) we obtain a
ontradi
tion. Hen
e, z and y are not adja
ent in

OV (0 ), so they must be adja
ent in OV (). Hen
e we obtain that UN (e) UN (y)

in OV (). Corollary 2.2 implies that
omponent Mi0
annot
ontain y alone. Thus y

must have a neighbor x in Mi0. Sin
e N (e) \ Mi0 C , vertex x is not adja
ent to e

in OV (). Thus we obtain that (x; y) 2 OV (), (x; e) 62 OV (), and x is unoriented

in OV (). Sin
e we have already proved that UN (e) UN (y), this implies that

UN (e) UN (y), in
ontradi
tion with the
hoi
e of e.

For example Theorem 4.1 implies that the reversal dened by the gray edge

(10; 11) is a safe proper reversal for the permutation of Figure 2.1 (a), sin
e it

orresponds to the vertex with maximum unoriented degree in the happy
lique

f(2; 3); (10; 11)g. On the other hand, the reversal dened by (2; 3)
reates a new

unoriented
omponent, as it yields the permutation shown in Figure 2.2.

The following theorem proves that a happy
lique exists in the neighborhood of

any oriented edge.

Theorem 4.2. Let e be an oriented vertex in OV ( ). There exists an oriented

vertex f 2 ON (e) su
h that for 0 = r(f ), all the
omponents in OV ( 0 ) are oriented.

Proof. By Theorem 4.1 it suÆ
es to show that there exists a happy
lique C in

ON (e).

Let Ext(e) = fx 2 ON (e) j there exists y 2 ON (x) su
h that y 62 ON (e)g.

That is, Ext(e)
ontains all oriented neighbors of e whi
h have oriented neighbors

outside of ON (e).

Case 1: Ext(e) = ON (e) feg. Set C = feg.

Case 2: Ext(e) ON (e) feg. Let D = ON (e) Ext(e). For j 0, while Dj is

not a
lique let K j be a maximal
lique in Dj and dene Dj = Dj K j . Let Dk ,

k 0 be the nal
lique and set C = Dk .

It is straightforward to verify that in ea
h of the two
ases C is indeed a happy
lique.

0

+1

**In the next se
tion we des
ribe an algorithm that will nd an oriented edge e
**

su
h that r(e) is safe given the representation of OV () des
ribed in Se
tion 3.1. The

algorithm rst nds a happy
lique C and then sear
hes for the vertex with maximum

unoriented degree in C . A
ording to Theorem 4.1 this vertex denes a safe reversal.

Even though Theorem 4.2 guarantees the existen
e of a happy
lique in the neighborhood of any xed oriented vertex, our algorithm does not sear
h in one parti
ular

su
h neighborhood. We will prove that the algorithm is guaranteed to nd a happy

lique assuming that there exists at least one oriented edge. Therefore the algorithm

8

**provides an alternative proof to a weaker version of Theorem 4.2 that only
laims the
**

existen
e of a happy
lique somewhere in the graph.

In this se
tion we give an algorithm that

lo
ates a happy
lique in OV (). Let e ; : : : ; ek be the oriented verti
es in OV () in

in
reasing left endpoint order. The algorithm traverses the oriented verti
es in OV ()

a
ording to this order. Let L(e) and R(e) be the left and right endpoints, respe
tively,

of vertex e in the realization of OV (). After traversing e ; : : : ; ei, 1 i k, the

algorithm maintains a happy
lique Ci in the subgraph of OV () indu
ed by these

verti
es. Assume jCi j = j , j i and let ei1 ; : : : ; ei be the verti
es in Ci where

i < i < : : : < ij . The verti
es of Ci are maintained in a linked list ordered in

in
reasing left endpoint order. If there exists an interval that
ontains all the intervals

in Ci then the algorithm maintains a minimal su
h interval ti . The
lique Ci and the

vertex ti (if exists) satisfy the following invariant.

Invariant 4.1.

1) Every vertex el 62 Ci , l i, su
h that L(ei1 ) < L(el ) must be adja
ent to ti , i.e.,

R(el ) > R(ti ).

2) Every vertex el 62 Ci , L(el ) < L(ei1 ) that is adja
ent to a vertex in Ci is either

adja
ent to an interval ep su
h that R(ep ) < L(ei1 ) or adja
ent to ti .

The fa
t that Ci is happy in the subgraph indu
ed by e ; : : : ; ei follows from

this invariant. We initialize the algorithm by setting C = fe g. Initially, t is not

dened. Let the
urrent interval be ei . If R(ei ) < L(ei ) then Ci is guaranteed

to be happy in OV () sin
e all remaining oriented verti
es are not adja
ent to Ci .

Hen
e the algorithm stops and returns Ci as the answer. See Figure 4.1(a).

We now assume that L(ei ) R(ei ) and show how to obtain Ci and ti .

We have to
onsider the following
ases.

Case 1. The interval ti is dened and R(ti ) < R(ei ). Continue with Ci = Ci and

ti = ti . See Figure 4.1(b).

Case 2. The interval ti is not dened or R(ei ) R(ti ).

a) R(ei ) < R(ei ) and L(ei ) R(ei1 ). Ci is obtained by adding ei to Ci and

ti = ti . See Figure 4.1(
).

b) R(ei ) < R(ei ) and L(ei ) > R(ei1 ). The
lique Ci
onsists of ei alone and

ti = ti . See Figure 4.1(d).

) R(ei ) < R(ei ). As in the previous
ase Ci = fei g. In this
ase ti is set

to ei , the last interval in Ci . See Figure 4.1(e).

4.1. Finding a happy
lique.

1

1

j

1

2

1

1

+1

+1

1

1

+1

j

+1

j

+1

+1

+1

+1

+1

j

+1

+1

j

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

j

+1

+1

j

**The following theorem proves that the algorithm above produ
es a happy
lique.
**

Let Cl be the
urrent
lique when the algorithm stops. Then Cl

is a happy
lique in OV ( ).

Proof. A straightforward indu
tion on the number of oriented verti
es traversed

by the algorithm proves that Cl and tl satisfy Invariant 4.1.

The algorithm stops either when R(ei ) < L(el ) or when l is equal to the

number of oriented verti
es. In either
ase sin
e Cl is happy in the subgraph indu
ed

by e ; : : : ; el it must be happy in OV ().

The running time of the algorithm is proportional to the number of oriented

verti
es traversed sin
e a
onstant amount of work is performed for ea
h su
h vertex.

After lo
ating a happy
lique C in OV ()

we need to sear
h it for a vertex with a maximum number of unoriented neighbors.

In this se
tion we give an algorithm that performs this task.

Theorem 4.3.

j

1

4.2. Sear hing the happy lique.

9

+1

a

b

c

d

e

Fig. 4.1. The various
ases of the algorithm for nding a happy
lique. The topmost interval

is always ti . The three thi
k intervals
omprise Ci . The dotted interval
orresponds to ei+1 .

**Let e ; : : : ; ej be the intervals in C ordered in in
reasing left endpoint order.
**

Clearly, L(1) < L(2) < : : : < L(j ) < R(1) < R(2) < : : : < R(j ). Thus the endpoints

of the j verti
es in C partition the line into 2j + 1 disjoint intervals I ; : : : ; I j , where

I = ( 1; L(1)℄, Il = (L(l); L(l + 1)℄ for 1 l < j , Ij = (L(j ); R(1)℄, Il = (R(l

j ); R(l j + 1)℄ for j < l < 2j and I j = (R(j ); 1). The algorithm
onsists of the

following three stages.

Stage 1: Let e be an unoriented vertex that has a non-empty interse
tion with the

interval [L(1); R(j )℄. Mark ea
h of e's endpoints with the index of the interval that

ontains it.

to a vertex in C . The

Stage 2: Let o be an array of j
ounters, ea
h
orresponding

P

l

intention is to assign values to o su
h that the sum i o[i℄ is the unoriented degree

of the vertex el 2 C . The
ounters are initialized to zero. For ea
h unoriented vertex

e that overlaps with the interval [L(1); R(j )℄ we
hange at most four of the
ounters

as follows. Let Il and Ir be the intervals in whi
h L(e) and R(e) o
ur, respe
tively.

We may assume l < r as otherwise e is not adja
ent to any vertex in C and we
an

ignore it. We
ontinue a
ording to one of the following
ases.

Case 1: r j . All the verti
es from el to er are adja
ent to e: we in
rement o[l +1℄

and de
rement o[r + 1℄ (if r < j ).

Case 2: j l. All the verti
es from el j to er j are adja
ent to e: we in
rement

o[l j + 1℄ and de
rement o[r j + 1℄ (if r < 2j ).

Case 3: l < j and j < r. Let m = minfl; r j g. If m > 0 then all the verti
es from

e to em are adja
ent to e: we in
rement o[1℄ and de
rement o[m + 1℄. Similarly let

M = maxfl; r j g. If M < j then the verti
es from el to ej are adja
ent to e: we

in
rement the
ounter o[l + 1℄.P

Stage 3: Compute f = maxlf li o[i℄j1 l j g. Return ef .

The following theorem summarizes the result of this se
tion. We omit the proof,

whi
h is straightforward.

1

0

2

0

2

=1

+1

+1

1

+1

=1

**Theorem 4.4. Given a
lique C , the vertex ef 2 C
omputed by the algorithm
**

above has maximum unoriented degree among the verti
es in C .

10

**The
omplexity of the algorithm is proportional to the size of C plus the number
**

of unoriented verti
es in OV (), and hen
e is O(n).

In
ase there are unoriented
omponents in OV (),

there exists a sequen
e r ; : : : rt of t reversals that transform into 0 su
h that d(0 ) =

d() t, where t = dh()=2e. In this se
tion we summarize the
hara
terization given

by Hannenhalli and Pevzner for these t reversals and outline how to nd them using

our impli
it representation of OV ().

We will use the following denitions. A reversal merges hurdles H and H if it

a
ts on two breakpoints, one in
ident with a gray edge in H and the other in
ident

with a gray edge in H . Re
all the
ir
le CR dened in Se
tion 2, in whi
h the

endpoints of the edges in the unoriented
omponents of OV () are ordered
onsistently

with their order in . Two hurdles H and H are
onse
utive if their sets of endpoints

E (H ) and E (H ) o
ur
onse
utively on CR, i.e., there is no hurdle H su
h that

E (H ) separates E (H ) and E (H ) on CR.

The following lemmas were essentially proved by Hannenhalli and Pevzner though

stated dierently in their paper.

Lemma 5.1 ([11℄). Let be a permutation with an even number, say 2k , of

hurdles. Any sequen
e of k 1 reversals ea
h of whi
h merges two non-
onse
utive

hurdles followed by a reversal merging the remaining two hurdles will transform into

0 su
h that d(0 ) = d() k and 0 has only oriented
omponents.

Lemma 5.2 ([11℄). Let be a permutation with an odd number, say 2k + 1, of

hurdles. If at least one hurdle H is simple then a reversal a
ting on two breakpoints

in
ident with edges in H transforms into 0 with 2k hurdles su
h that d( 0 ) =

d() 1. If is a fortress then a sequen
e of k 1 reversals merging pairs of non5. Clearing the hurdles.

1

1

2

1

2

1

1

2

2

1

2

**onse
utive hurdles followed by two additional merges of pairs of
onse
utive hurdles
**

(one merges two original hurdles and the next merges a hurdle
reated by the rst and

the last original hurdle) will transform into 0 su
h that d( 0 ) = d( ) (k + 1) and

0 has only oriented
omponents.

**We now outline how to turn these lemmas into an algorithm that nds a parti
ular
**

sequen
e of reversals r ; : : : ; rt with the properties des
ribed above. First OV () is

de
omposed into
onne
ted
omponents as des
ribed in [4℄. One then has to identify

those unoriented
omponents that are hurdles. This task
an be done by traversing the

endpoints of the
ir
le CR,
ounting the number of elements in ea
h run of
onse
utive

endpoints belonging to the same
omponent. If a run
ontains all endpoints of a

parti
ular unoriented
omponent M then M is an hurdle.

In a similar fashion one
an
he
k for ea
h hurdle whether it is a simple hurdle

or a super hurdle. While traversing the
y
le, a list of the hurdles in the order they

o
ur on CR is
reated. At the next stage this list is used to identify
orre
t hurdles

to merge.

We assume that given an endpoint one
an lo
ate its
onne
ted
omponent in

onstant time. It is easy to verify that the data
an be maintained so that this is

possible.

Theorem 5.3. Given OV ( ) de
omposed into its
onne
ted
omponents, the

algorithm outlined above nds t reversals su
h that when we apply them to we obtain

a 0 whi
h is hurdle-free and has d( 0 ) = d( ) t. The algorithm
an be implemented

to run in O(n) time.

Proof. Corre
tness follows from Lemma 5.1 and 5.2. The time bound is a
hieved

if we always merge hurdles that are separated by a single hurdle. If the ith merge

merged hurdles H and H that are separated by H , then H should be merged in the

1

1

2

11

**i + 1st merge. Carrying out the merges this way guarantees that the span of ea
h
**

hurdle H overlaps at most two merging reversals, the se
ond of whi
h eliminates H .

6. Summary.

Figure 6.1 gives a s hemati des ription of the algorithm.

Signed Reversals();

/* is a signed permutation */

1. Compute the
onne
ted
omponents of OV ( ).

2. Clear the hurdles.

3. while is not sorted do :

/* iteration */

algorithm

begin

a. nd a happy
lique C in OV ( ).

b. nd a vertex ef 2 C with maximum unoriented

degree, and perform a safe reversal on ef ;

. update and the representation of OV ( ).

end

end

**4. output the sequen
e of reversals.
**

. An algorithm for sorting signed permutations

Fig. 6.1

Theorem 6.1.

Algorithm

Signed Reversals

nds the reversal distan e r in

O(n(n) + r n) time, and in parti ular in O(n2 ) time.

**Proof. The
orre
tness of the algorithm follows from Theorem 2.3, Theorem 4.1
**

and Lemmas 5.1 and 5.2.

Step 1 takes O(n(n)) time by the algorithm of Berman and Hannenhalli [4℄.

Step 2 takes O(n) time by Theorem 5.3. Step 3 takes O(n) time per reversal, by the

dis
ussion in Se
tion 4.

It is an intriguing open question whether a faster algorithm for sorting signed

permutations by reversals exists. It
ertainly might be the
ase that one
an nd an

optimal sequen
e of reversals faster. To date, no nontrivial lower bound is known for

this problem.

We thank Donald Knuth, Sridhar Hannenhalli, Pavel Pevzner,

and Itsik Pe'er for their
omments on a preliminary version of this paper.

A
knowledgments.

REFERENCES

[1℄

[2℄

[3℄

[4℄

[5℄

[6℄

[7℄

**, Zum hilbertshen aufbau der reelen zahlen, Math. Ann., 99 (1928), pp. 118{133.
**

, Sorting permutations by transpositions, in Pro
eedings of the 6th

Annual Symposium on Dis
rete Algorithms, ACM Press, Jan. 1995, pp. 614{623.

V. Bafna and P. A. Pevzner, Genome rearragements and sorting by reversals, SIAM Journal

on Computing, 25 (1996), pp. 272{289. A preliminary version appeared in Pro
. 34th IEEE

Symp. of the Foundations of Computer S
ien
e, pages 148{157, 1994.

P. Berman and S. Hannenhalli, Fast sorting by reversals, in Pro
. Combinatorial Pattern

Mat
hing (CPM), 1996, pp. 168{185. LNCS 1075.

A. Caprara, Sorting by reversals is diÆ
ult, in Pro
eedings of the First International Conferen
e on Computational Mole
ular Biology (RECOMB), ACM Press, 1997, pp. 75{83.

D. A. Christie, A 3/2-approximation algorithm for sorting by reversals, in Pro
. ninth annual

ACM-SIAM Symp. on Dis
rete Algorithms (SODA 98), ACM Press, 1998, pp. 244{252.

T. Dobzhansky and A. H. Sturtevant, Inversions in the
hromosomes of drosophila pseudoobs
ura, Geneti
s, 23 (1938), pp. 28{64.

W. A
kermann

V. Bafna and P. Pevzner

12

[8℄

[9℄

[10℄

[11℄

[12℄

[13℄

[14℄

[15℄

[16℄

[17℄

[18℄

[19℄

[20℄

[21℄

[22℄

[23℄

, Algorithmi Graph Theory and Perfe t Graphs, A ademi Press, New York,

M. C. Golumbi

1980.

**, Polynomial algorithm for
omputing translo
ation distan
e between genomes,
**

Dis
. Appl. Math., 71 (1996), pp. 137{151.

S. Hannenhalli and P. Pevzner, Transforming men into mi
e (polynomial algorithm for

genomi
distan
e problems, in Pro
. IEEE Symp. of the Foundations of Computer S
ien
e,

1995, pp. 581{592.

S. Hannenhalli and P. A. Pevzner, Transforming
abbage into turnip (polynomial algorithm

for sorting signed permutations by reversals), in Pro
eedings of the Twenty-Seventh Annual

ACM Symposium on Theory of Computing, Las Vegas, Nevada, 29 May{1 June 1995,

pp. 178{189.

S. B. Hoot and J. D. Palmer, Stru
tural rearrangements, in
luding parallel inversions, within

the
hloroplast genome of Anemone and related genera, J. Mole
ular Evooution, 38 (1994),

pp. 274{281.

H. Kaplan, R. Shamir, and R. E. Tarjan, Faster and simpler algorithm for sorting signed

permutations by reversals, in Pro
. 8th ACM-SIAM Symposium on Dis
rete Algorithms,

ACM-SIAM, 1997, pp. 344{351. Also in Pro
. RECOMB 97, page 163.

J. Ke
e
ioglu and R. Ravi, Physi
al mapping of
hromosomes using unique probes, in Pro
.

sixth annual ACM-SIAM Symp. on Dis
rete Algorithms (SODA 95), ACM Press, 1995,

pp. 604{613.

J. Ke
e
ioglu and D. Sankoff, EÆ
ient bounds for oriented
hromosome inversion distan
e,

in Pro
. of 5th Ann. Symp. on Combinatorial Pattern Mat
hing, Springer, 1994, pp. 307{

325. LNCS 807.

, Exa
t and approximation algorithms for sorting by reversals, with appli
ation to genome

rearrangement, Algorithmi
a, 13 (1995), pp. 180{210. A preliminary version appeared in

Pro
. CPM93, Springer, Berlin, 1993, pages 87{105.

J. D. Palmer and L. A. Herbon, Tri
ir
ular mito
hondrial genomes of Brassi
a and

Raphanus: reversal of repeat
ongurations by inversion, Nu
lei
A
ids Resear
h, 14

(1986), pp. 9755{9764.

, Uni
ir
ular stru
ture of the Brassi
a hirta mito
hondrial genome, Current Geneti
s, 11

(1987), pp. 565{570.

, Plant mito
hondrial DNA evolves rapidly in stru
ture, but slowly in sequen
e, J. Mole
ular Evolution, 28 (1988), pp. 87{97.

J. D. Palmer, B. Osorio, and W. Thompson, Evolutionalry signi
an
e of inversions in

legume
horloplast DNAs, Current Geneti
s, 14 (1988), pp. 65{74.

D. Sankoff, Edit distan
e for genome
omparison based on non-lo
al operations, Le
ture Notes

in Computer S
ien
e, 644 (1992), pp. 121{135.

D. Sankoff, R. Cedergren, and Y. Abel, Genomi
divergen
e through gene rearrangement.,

Methods in Enzymology, 183 (1990), pp. 428{438.

R. E. Tarjan, EÆ
ien
y of a good but not linear set union algorithm, J. ACM, 22 (1979),

pp. 215{225.

S. Hannenhalli

13

- Split Block Domination in GraphsUploaded byInternational Journal of Research in Engineering and Technology
- GATE Data Structure & Algorithm BookUploaded byMims12
- graphic organizerUploaded byapi-429808857
- The collaboration network of the Brazilian Symposium on DatabasesUploaded byAlefe
- merge_sortedUploaded byipjeaanvbnr
- AlgorithmAnalysis1.pptUploaded byNuwan Dhanushka
- Graph ProblemUploaded byGerald Carson
- Cl451midsem14 AnswersUploaded byRahul Malhotra
- mth303_20180103_wed_15449Uploaded byTANISH GUPTA
- Intro Graph AlgorithmsUploaded byThahir Shah
- NPcompleteUploaded bypoja_sinha
- 15_chapter6Uploaded byKampa Lavanya
- Imc2015 Day1 SolutionsUploaded byIván Adrián
- CS 135 mp - Chua,Ralph (2009-10151)Uploaded byRalph Jacob Ang Chua
- On A New Multicomputer Interconnection Topology For Massively Parallel SystemsUploaded byijdps
- Excel Practice Toolkit_2007.xlsUploaded bypgdm mho
- On the Theory of Colorful GraphsUploaded byhumejias
- The EOQ FormulaUploaded byAjay Nikte
- Chemical Graph Theory - TrinajsticUploaded byAnonymous RZMJQ6lW
- ugc net exam Daa pdf fileUploaded byRamesh Kalia
- codility - how to get ascender element in a array - Stack Overflow.htm.txtUploaded byprasuresh
- Ce 31342345Uploaded byIJMER
- CBSE 2014 Question Paper for Class 12 Computer Science - DelhiUploaded byaglasem
- IJSET_2014_218Uploaded byInnovative Research Publications
- UNIT 6Uploaded bybadboy_rockss
- 10.1.1.21.8541Uploaded bymohsenmirzay
- Lecture 3 Arrays OperationsUploaded byMuhammad Muzammil
- Time Response of Second Order Systems - GATE Study Material in PDFUploaded byAtul Choudhary
- NSCL ROOT User ManualUploaded byFiore Carpino
- Operation on FileUploaded byShalabh Saigal

- Monetary PolicyUploaded bytanmay100
- iitjee2011paper2Uploaded bytanmay100
- Narayana aieee 2010 solutionsUploaded bytanmay100
- IIT JEE 2011 PAPER-2 NARYANAUploaded bytanmay100
- 25340192 Impact of Globalization on Indian EconomyUploaded bytanmay100
- sardar patel- iron man of indiaUploaded bytanmay100
- IIT JEE 2011 PAPER-2 FIITJEEUploaded bytanmay100
- AIEEE 2011 Solutions - resonance kotaUploaded bytanmay100
- Airtel DocumentUploaded bytanmay100
- My Reminiscences of Sardar Patel by v ShankarUploaded bytanmay100
- PatelUploaded bytanmay100
- Topic 4_curved Surf & BuoyancyUploaded bytanmay100
- partition of india by panigrahiUploaded bytanmay100
- maulana abul kalam azad by rc modyUploaded bytanmay100
- my memories of sardar patel by rc modyUploaded bytanmay100
- v.p. menon -the forgotten architect of modern indiaUploaded bytanmay100
- IIT JEE 2011 paper-1 FIITJEEUploaded bytanmay100
- my memories of jinnah by rc modyUploaded bytanmay100