Estimating Genome Reversal Distance by Genetic Algorithm
Andy Auyeung
Oklahoma State University
Math Science 219
Stillwater, OK 74078
1 (405) 744-5668
wingha@cs.okstate.edu
Ajith Abraham
Oklahoma State University
North Hall 328
Tulsa, OK 74106
1 (918) 594-8188
aa@cs.okstate.edu
Abstract- Sorting by reversals is an important problem in inferring the evolutionary relationship between two genomes. The problem of sorting unsigned permutations has been proven to be NP-hard, and the best guaranteed error bound is achieved by a 3/2-approximation algorithm. However, the problem of sorting signed permutations can be solved efficiently. Fast algorithms have been developed both for finding the sorting sequence and for finding the reversal distance of a signed permutation. In this paper, we present a way to view the problem of sorting an unsigned permutation as one of sorting a signed permutation. The problem can then be seen as a search for an optimal signed permutation among all $2^n$ corresponding signed permutations. We use a genetic algorithm to conduct the search. Our experimental results show that the proposed method outperforms the 3/2-approximation algorithm.
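The reduction from unsigned to signed sorting can be checked by brute force on small inputs. The sketch below is not the authors' genetic algorithm. It is an exhaustive illustration that assumes the reversal convention $\rho(i, j)$ from the Introduction: the block $\pi[i] \cdots \pi[j-1]$ is reversed, and in the signed case every element in that block is also negated. The sketch computes exact reversal distances by breadth-first search, which is feasible only for very small $n$. It then confirms that the unsigned distance equals the minimum signed distance over all $2^n$ sign assignments. The function names `apply_reversal`, `reversal_distance`, `signings`, and `best_signing` are hypothetical.

```python
from collections import deque
from itertools import product


def apply_reversal(perm, i, j, signed=False):
    """Apply rho(i, j) using 1-based indices, assuming 1 <= i < j <= n + 1.
    The block perm[i] ... perm[j-1] is reversed and, if signed, negated."""
    block = perm[i - 1:j - 1][::-1]
    if signed:
        block = tuple(-x for x in block)
    return perm[:i - 1] + block + perm[j - 1:]


def reversal_distance(perm, signed=False):
    """Exact reversal distance to the identity by breadth-first search over
    every permutation reachable by reversals. Exponential: tiny n only."""
    perm = tuple(perm)
    n = len(perm)
    target = tuple(range(1, n + 1))  # 1 2 ... n, or +1 +2 ... +n
    dist = {perm: 0}
    queue = deque([perm])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for i in range(1, n + 1):
            # Signed: a one-element reversal (j = i + 1) flips a sign.
            # Unsigned: a one-element reversal is a no-op, so skip it.
            start = i + 1 if signed else i + 2
            for j in range(start, n + 2):
                nxt = apply_reversal(cur, i, j, signed)
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    raise ValueError("identity permutation not reachable")


def signings(perm):
    """Yield all 2^n signed versions of an unsigned permutation."""
    for signs in product((1, -1), repeat=len(perm)):
        yield tuple(s * x for s, x in zip(signs, perm))


def best_signing(perm):
    """Return (distance, signed permutation) minimising the signed reversal
    distance over all 2^n signings. This search is exhaustive here."""
    return min((reversal_distance(p, signed=True), p) for p in signings(perm))


if __name__ == "__main__":
    pi = (3, 1, 2)  # hypothetical example permutation
    print(reversal_distance(pi))  # prints 2
    print(best_signing(pi)[0])    # prints 2
```

The approach in the abstract would replace the exhaustive `min` over `signings(perm)` with a genetic-algorithm search. It would also replace the breadth-first search with one of the fast signed-distance algorithms cited in the Introduction.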
1 Introduction
Genome rearrangement is a mechanism that occurs in mitochondrial genomes (Russell 2002). The gene order in a mitochondrial genome is constantly undergoing rearrangement. Therefore, by estimating the rearrangement distance between two genomes, the relationship between them can also be estimated (Pevzner 2001). Reversal is the most commonly observed mechanism by which genomes are rearranged. Figure 1 shows the estimated transformation from Tobacco to Lobelia fervens by reversals (Bafna and Pevzner 1996). There are two variations of this problem: signed permutations and unsigned permutations. For unsigned permutations, a genome is modeled as a permutation $\pi$ of order $n$ (i.e. a permutation of $\{1, 2, \ldots, n\}$), where $n$ is the number of gene blocks in the genome. Let the permutation
$\pi = \pi[1]\,\pi[2] \cdots \pi[n]$. The reversal operation $\rho(i, j)$ rearranges $\pi$ into $\pi[1] \cdots \pi[i-1]\,\pi[j-1] \cdots \pi[i]\,\pi[j] \cdots \pi[n]$. For a signed permutation
$\pi'$, each $\pi'[k]$ has either a positive or a negative sign. Each reversal operation $\rho(i, j)$ not only rearranges $\pi'$ but also negates the sign of $\pi'[k]$ for $i \le k < j$. The problem of estimating the reversal distance between two genomes is formulated as sorting a permutation by reversal operations. That is, given $\pi$ (or $\pi'$), we want to find a sorting sequence that uses the minimum number of reversals to sort $\pi$ (or $\pi'$) into the identity permutation (i.e. the permutation $1\,2 \cdots n$ for unsigned permutations, and $+1\,+2 \cdots +n$ for signed permutations). We call this minimum number of reversals the reversal distance.

Figure 1. Transformation from Tobacco to Lobelia fervens by reversals (Bafna and Pevzner 1996).

The problem of sorting a signed permutation can be solved in
$O(n^2)$ time (Kaplan et al. 1997). The problem of finding the reversal distance of a signed permutation can be solved in $O(n)$ time (Bader et al. 2001). However, both sorting an unsigned permutation and finding its reversal distance have been proven to be NP-hard (Caprara 1997). Consequently, heuristic solutions with bounded error have been proposed (Bafna and Pevzner 1996; Kececioglu and Sankoff 1995). The lowest guaranteed error bound so far is achieved by the 3/2-approximation algorithm (Christie 1998). This algorithm relies on the following fact: for any cycle decomposition of the breakpoint graph that maximizes the number of 2-cycles, there exists a sorting sequence whose length is at most 3/2 times that of an optimal sorting sequence.

In this paper, we propose a genetic algorithm for sorting unsigned permutations by reversals. Our method does not provide a guaranteed error bound. However, our experiments show that it finds better solutions than the 3/2-