Published by: viveksahu87 on Sep 19, 2009
Copyright:Attribution Non-commercial

 Estimating Genome Reversal Distance by Genetic Algorithm
Andy Auyeung
Oklahoma State University, Math Science 219, Stillwater, OK 74078
1 (405) 744–5668, wingha@cs.okstate.edu
Ajith Abraham
Oklahoma State University, North Hall 328, Tulsa, OK 74106
1 (918) 594–8188, aa@cs.okstate.edu
Abstract- Sorting by reversals is an important problem in inferring the evolutionary relationship between two genomes. The problem of sorting unsigned permutations has been proven to be NP-hard. The best guaranteed error bound is that of the 3/2-approximation algorithm. However, the problem of sorting signed permutations can be solved easily. Fast algorithms have been developed both for finding the sorting sequence and for finding the reversal distance of a signed permutation. In this paper, we present a way to view the problem of sorting an unsigned permutation as one of sorting a signed permutation. The problem can then be seen as searching for an optimal signed permutation among all $2^n$ corresponding signed permutations. We use a genetic algorithm to conduct the search. Our experimental results show that the proposed method outperforms the 3/2-approximation algorithm.
1 Introduction
Genome rearrangement is a mechanism that occurs in mitochondrial genomes (Russell 2002). The gene order in a mitochondrial genome is constantly undergoing rearrangement. Therefore, by estimating the rearrangement distance between two genomes, the relationship between them can also be estimated (Pevzner 2001). Reversal is the most commonly observed mechanism by which genomes are rearranged. Figure 1 shows the estimated transformation from Tobacco to Lobelia fervens by reversals (Bafna and Pevzner 1996).

There are two variations of this problem, signed permutation and unsigned permutation. For unsigned permutation, a genome is modeled as a permutation $\pi$ of order $n$ (i.e. a permutation of $\{1, 2, \ldots, n\}$), where $n$ is the number of gene blocks in the genome. Let the permutation be $\pi = \pi[1]\,\pi[2] \ldots \pi[n]$. The reversal operation $\rho(i, j)$ rearranges $\pi$ into $\pi[1] \ldots \pi[i-1]\,\pi[j-1] \ldots \pi[i]\,\pi[j] \ldots \pi[n]$. For a signed permutation $\pi'$, each $\pi'[k]$ has either a positive or a negative sign. Each reversal operation $\rho(i, j)$ not only rearranges $\pi'$ but also negates the sign of $\pi'[k]$ for $i \le k < j$.

The problem of estimating the reversal distance between two genomes is formulated as sorting a permutation by reversal operations. That is, given $\pi$ (or $\pi'$), we want to find a sorting sequence that uses the minimum number of reversals to sort $\pi$ (or $\pi'$) into the identity permutation (i.e. the permutation $1\ 2 \ldots n$ for unsigned permutation, and $+1\ +2 \ldots +n$ for signed permutation). We call this minimum number of reversals the reversal distance.

Figure 1. Transformation from Tobacco to Lobelia fervens by reversals (Bafna and Pevzner 1996).

The problem of sorting a signed permutation can be solved in $O(n^2)$ time (Kaplan et al. 1997). The problem of finding the reversal distance of a signed permutation can be solved in $O(n)$ time (Bader et al. 2001). However, both sorting and finding the reversal distance of an unsigned permutation have been proven to be NP-hard (Caprara 1997). So, error-bounded heuristic solutions have been proposed (Bafna and Pevzner 1996; Kececioglu and Sankoff 1995). The lowest guaranteed error bound thus far is that of the 3/2-approximation algorithm (Christie 1998). The 3/2-approximation algorithm uses the fact that any cycle decomposition of the breakpoint graph that maximizes the number of 2-cycles admits a sorting sequence with length at most 3/2 of that of the optimal sorting sequence.

In this paper, we propose a genetic algorithm for sorting unsigned permutations by reversals. Our method does not provide a guaranteed error bound. However, our experiment shows that it finds better solutions than the 3/2-approximation algorithm. Also, so far, all heuristic algorithms for this problem find the solution in a constructive manner. We would like to show an alternative approach in which this problem is solved in an inductive manner.

The rest of the paper is organized as follows. In Section 2, some background material on sorting permutations by reversals and the concept of the genetic algorithm are presented. In Section 3, the proposed method is explained. In Section 4, the experimental setup and results are shown. In Section 5, observations from the experiment are discussed. Finally, in Section 6, some concluding remarks are made.
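The two reversal operations defined in this section can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the indexing follows the reading that $\rho(i, j)$ is 1-based and reverses positions $i$ through $j-1$.

```python
def reverse_unsigned(perm, i, j):
    """Apply rho(i, j) to an unsigned permutation (1-based, reverses pi[i..j-1])."""
    perm = list(perm)
    perm[i - 1:j - 1] = perm[i - 1:j - 1][::-1]
    return perm


def reverse_signed(perm, i, j):
    """Apply rho(i, j) to a signed permutation: reverse pi[i..j-1] and negate each sign."""
    perm = list(perm)
    perm[i - 1:j - 1] = [-g for g in reversed(perm[i - 1:j - 1])]
    return perm


print(reverse_unsigned([4, 1, 3, 2], 1, 5))    # [2, 3, 1, 4]
print(reverse_signed([+4, -1, +3, -2], 2, 4))  # [4, -3, 1, -2]
```

Applying the same reversal twice restores the original permutation in both variants, which is why the reversal distance is symmetric.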
2 Reviews
2.1 Breakpoint Graph
An unsigned permutation can be modeled by a breakpoint graph. For each gene (a number in the permutation), we create a node. The idea of the breakpoint graph is to mark both the desired and the actual relationships between these nodes in the permutation. For each pair of nodes, we draw a black edge between them if they are adjacent in the permutation, and we draw a red edge between them if they are adjacent in the identity permutation. In order to model the orientation, we extend the unsigned permutation with a 0 at the front and an $n+1$ at the end. An example is shown in Figure 2. It has been shown that, given a cycle decomposition of the breakpoint graph, any reversal can change the number of cycles by at most one. It has also been shown that, given a cycle decomposition of the breakpoint graph, the corresponding shortest sorting sequence can be found. Thus, the key problem is to find a cycle decomposition that provides the shortest sorting sequence for the unsigned permutation. However, the problem of finding an optimal cycle decomposition is NP-hard.

On the other hand, sorting a signed permutation can be solved easily. We first create the breakpoint graph as above. Then we split each gene node into two nodes according to its sign (and renumber the nodes accordingly). An example is shown in Figure 3. The advantage is that each node now has exactly one red edge and one black edge associated with it. Thus, there is only one cycle decomposition of this breakpoint graph. Many algorithms have been proposed to find the sorting sequence. The best known time bound is $O(n^2)$, and the best known time bound for finding the reversal distance is $O(n)$.
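As a small illustration of the edge rule described above, the following Python sketch builds the black and red edge sets of the breakpoint graph for an unsigned permutation. The function name and the representation of edges as sorted pairs are assumptions made for this example; the sketch does not compute a cycle decomposition.

```python
def breakpoint_graph(perm):
    """Return (black_edges, red_edges) for an unsigned permutation.

    The permutation is extended with 0 at the front and n+1 at the end.
    Each edge is stored as a pair (a, b) with a < b.
    """
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    # Black edges join nodes that are adjacent in the (extended) permutation.
    black = {tuple(sorted(pair)) for pair in zip(ext, ext[1:])}
    # Red edges join nodes that are adjacent in the identity permutation.
    red = {(v, v + 1) for v in range(n + 1)}
    return black, red


black, red = breakpoint_graph([4, 1, 3, 2])
trivial = black & red      # pairs joined by both a black and a red edge
print(sorted(black))       # [(0, 4), (1, 3), (1, 4), (2, 3), (2, 5)]
print(sorted(trivial))     # [(2, 3)]
```

For the permutation 4 1 3 2 of Figure 2, the only pair joined by both a black and a red edge is (2, 3), since 3 and 2 are already neighbors in the permutation.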
2.2 Genetic Algorithm
A genetic algorithm is a search technique inspired by the natural selection that occurs in evolution. A genetic algorithm works with a population of individuals, each representing a solution to a given problem. The idea is to use an evolutionary model to determine the next search area; that is, the next population is generated by selection, crossover and mutation operations according to the fitness of the solutions in the current population.

The general form of a genetic algorithm can be described as follows. A population of possible solutions is initially generated. The algorithm is divided into generations. In each generation, if the termination condition has not been met, the next population is first determined by selection and crossover. In selection, individuals in the current population are probabilistically selected according to their fitness to move to the next population. In crossover, pairs of individuals are probabilistically selected according to their fitness to generate a new pair of individuals (offspring) by the crossover operator. Once the next population has been determined, the mutation operation is probabilistically applied to individuals in the next population. Finally, we re-label the next population as the current population, symbolizing that the old population has died out, and the fitness of the new individuals is evaluated. In each generation, only fit individuals can produce offspring and survive. Thus, a population of fit solutions is expected when the algorithm terminates, and the best individual is used as the solution to the problem.

Figure 2. Breakpoint graph for unsigned permutation 4 1 3 2.
Figure 3. Breakpoint graph for signed permutation +4 -1 +3 -2.
 
3 Proposed Method
The idea of the proposed method is to view all possible cycle decompositions of the unsigned permutation as the signed permutations that have the same gene order. Except for nodes 0 and $n+1$, every node in the breakpoint graph has degree 2 for red edges and also for black edges. So, a cycle decomposition is used to define the (color-)alternating paths. However, there is a corresponding signed permutation that defines the same alternating paths. Therefore, we can turn our focus to signed permutations instead of cycle decompositions. Define $\mathit{Signed}(\pi)$ to be the set of signed permutations that have the same gene order as $\pi$. For example, when $\pi$ is 2 1, $\mathit{Signed}(\pi)$ is {-2 -1, -2 +1, +2 -1, +2 +1}. Thus, the size of $\mathit{Signed}(\pi)$ is $2^n$. Two observations, stated as Observation 1 and Observation 2, are required for our method.

From the two observations, we can see that the problem can be solved in $O(2^n n)$ time, that is, by finding the reversal distance for all $2^n$ signed permutations $\pi'$. Let $\pi^*$ be the $\pi'$ that has the minimum reversal distance. Then the sorting sequence of $\pi$ is the sorting sequence of $\pi^*$. However, it is not feasible to go through all $2^n$ $\pi'$. Thus, we use a genetic algorithm to find $\pi^*$. There is no guarantee that the genetic algorithm will find $\pi^*$; however, we can expect it to find a $\pi'$ that has a low reversal distance.
4 Experiment
The genetic algorithm is as follows. We set the population size to $n^2$. The initial population consists of randomly generated binary strings representing the signs of the genes in the permutation. However, we apply a heuristic that takes all trivial cycles (i.e. cycles composed of exactly one red edge and one black edge). Thus, the genes of a sorted substring are assigned the same sign (positive for an ascending sorted substring; negative for a descending sorted substring). The fitness is evaluated by the reversal distance of the signed permutation. Single-point crossover and mutation are used with rates 0.5 and 0.3 respectively. The genetic algorithm is terminated when the best reversal distance in the population remains unchanged for three generations.

We conduct the experiment on randomly generated permutations, where each permutation is generated by performing $n$ random swap operations on the identity permutation. Figure 4 shows the comparison between the 3/2-approximation algorithm and the proposed method. (The actual data is shown in Appendix A.) The figure shows the number of reversals required in the sorting sequences found by the two methods. The comparison is on the average solution of ten runs, with $n$ between 10 and 150. We can see that the proposed method produces better solutions than the 3/2-approximation algorithm.

Figure 4. Performance comparison of 3/2-approximation and genetic algorithm.
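A minimal sketch of the search described in this section is given below, using the stated population size, crossover rate, mutation rate and stopping rule. Several details are assumptions and not taken from the paper: selection weights of 1/(1 + distance), mutation that flips one random sign per selected individual, carrying the best individual over unchanged, applying the sorted-run heuristic only at initialization, and a breadth-first table in place of the linear-time distance algorithm (so only small $n$ is practical). All function names are hypothetical.

```python
import random
from collections import deque


def signed_distance_table(n):
    """BFS from the identity over all signed permutations of order n (small n only)."""
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        cur = queue.popleft()
        for i in range(n):
            for j in range(i + 1, n + 1):
                nxt = cur[:i] + tuple(-g for g in reversed(cur[i:j])) + cur[j:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    return dist


def random_permutation(n, rng):
    """Perform n random swap operations on the identity permutation."""
    perm = list(range(1, n + 1))
    for _ in range(n):
        a, b = rng.randrange(n), rng.randrange(n)
        perm[a], perm[b] = perm[b], perm[a]
    return perm


def random_signs(perm, rng):
    """Random sign string; sorted runs share a sign (+ ascending, - descending)."""
    signs = [rng.choice((1, -1)) for _ in perm]
    for k in range(len(perm) - 1):
        if perm[k + 1] == perm[k] + 1:
            signs[k] = signs[k + 1] = 1
        elif perm[k + 1] == perm[k] - 1:
            signs[k] = signs[k + 1] = -1
    return signs


def genetic_search(perm, rng, crossover_rate=0.5, mutation_rate=0.3, patience=3):
    """Search Signed(perm) for a signing with low reversal distance (n >= 1)."""
    n = len(perm)
    table = signed_distance_table(n)

    def fitness(signs):
        return table[tuple(s * g for s, g in zip(signs, perm))]

    population = [random_signs(perm, rng) for _ in range(n * n)]
    best = min(population, key=fitness)
    unchanged = 0
    while unchanged < patience:
        # Selection: a lower reversal distance gives a higher weight (assumption).
        weights = [1.0 / (1 + fitness(ind)) for ind in population]
        parents = rng.choices(population, weights=weights, k=len(population))
        nxt = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            if n > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n)            # single-point crossover
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            nxt += [a, b]
        for ind in nxt:
            if rng.random() < mutation_rate:
                k = rng.randrange(n)
                ind[k] = -ind[k]                      # flip one sign
        population = nxt[:n * n - 1] + [best[:]]      # keep the best (assumption)
        gen_best = min(population, key=fitness)
        if fitness(gen_best) < fitness(best):
            best, unchanged = gen_best, 0
        else:
            unchanged += 1
    return [s * g for s, g in zip(best, perm)], fitness(best)


rng = random.Random(0)
perm = random_permutation(5, rng)
signed, distance = genetic_search(perm, rng)
print(perm, signed, distance)
```

Keeping the best individual makes the stopping rule well defined, since the best distance in the population can then only stay the same or improve. An experiment at the paper's scale would need the distance table replaced by a proper signed reversal distance routine.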
5 Discussion
Sorting an unsigned permutation has a trivial $n-1$ upper bound: simply use one reversal to put one gene block in place. When $n-1$ gene blocks are in place, the $n$-
Observation 1: Each $\pi' \in \mathit{Signed}(\pi)$ yields a valid sorting sequence for $\pi$.
Proof: Let the sorting sequence $\rho$ sort $\pi'$ into the identity (i.e. $\rho\pi' = id$). Because $|\pi'[i]| = \pi[i]$, we have $|\rho\pi'[i]| = \rho\pi[i]$ for all $i$. Thus, $\rho$ can also sort $\pi$ (i.e. $\rho\pi = id$).

Observation 2: There exists $\pi^* \in \mathit{Signed}(\pi)$ that yields an optimal sorting sequence for $\pi$.
Proof: Let $\rho$ be a sorting sequence for $\pi$ that uses the minimum number of reversals. For each $\pi[k]$, let count[k] be the number of times that $\pi[k]$ is included in the reversals of $\rho$, i.e. in the reversals $\rho(i, j)$ with $i \le k < j$. Then $\pi^*$ is defined as follows: $\pi^*[k]$ has a positive sign if count[k] is even; otherwise it has a negative sign. Because $|\pi^*[i]| = \pi[i]$, we have $\rho\pi[i] = |\rho\pi^*[i]|$ for all $i$. However, every $\rho\pi^*[i]$ must be positive by our construction. Thus $\rho$ can also sort $\pi^*$, and the reversal distance for $\pi^*$ is equal to the reversal distance for $\pi$.
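The construction in the proof of Observation 2 can be checked on a small example. The sketch below reads count[k] as the number of reversals that include gene $\pi[k]$ while it is tracked through the sorting sequence; this reading, the function names and the example sorting sequence are assumptions made for illustration.

```python
def derive_signing(perm, reversals):
    """Build pi* from an unsigned sorting sequence, following Observation 2.

    reversals is a list of (i, j) pairs, 1-based, each reversing positions i..j-1.
    """
    current = list(perm)
    count = {g: 0 for g in perm}
    for i, j in reversals:
        for g in current[i - 1:j - 1]:
            count[g] += 1                      # gene g is included in this reversal
        current[i - 1:j - 1] = current[i - 1:j - 1][::-1]
    if current != sorted(perm):
        raise ValueError("the reversals do not sort the permutation")
    # Even count: positive sign; odd count: negative sign.
    return [g if count[g] % 2 == 0 else -g for g in perm]


def apply_signed(perm, reversals):
    """Apply the same reversals to a signed permutation, negating reversed genes."""
    current = list(perm)
    for i, j in reversals:
        current[i - 1:j - 1] = [-g for g in reversed(current[i - 1:j - 1])]
    return current


pi = [4, 1, 3, 2]
rho = [(1, 3), (2, 5)]             # 4 1 3 2 -> 1 4 3 2 -> 1 2 3 4
pi_star = derive_signing(pi, rho)
print(pi_star)                     # [4, -1, -3, -2]
print(apply_signed(pi_star, rho))  # [1, 2, 3, 4]
```

Each gene that is reversed an odd number of times starts with a negative sign and therefore ends positive, so the same two reversals sort both the unsigned permutation and its signed counterpart.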
