
Bioinformatics lecture notes !

Attribution Non-Commercial (BY-NC)


Fall 2011


Alexander Dekhtyar


When genomes of different species were compared, biologists noticed that groups of genes appeared together in the different genomes; however, the global arrangement of these blocks of genes differed from genome to genome. Biologists hypothesize that such rearrangements are due to gene reversals: groups of genes being transferred from one DNA strand to the other, which reverses their order.

Reversal. More formally, let $\pi = \pi_1, \ldots, \pi_i, \ldots, \pi_j, \ldots, \pi_n$ be a permutation of $\{1, \ldots, n\}$. A reversal of positions $i$ and $j$ in $\pi$, denoted $\rho(i, j)$, is a transformation that results in the following permutation: $\pi_1, \ldots, \pi_{i-1}, \pi_j, \pi_{j-1}, \ldots, \pi_{i+1}, \pi_i, \pi_{j+1}, \ldots, \pi_n$.

Example. Consider the permutation $\pi = 1, 2, 3, 4, 5, 6, 7$. The reversal $\rho(3, 6)$ applied to $\pi$ produces the following permutation: $\pi \cdot \rho(3, 6)$: $1, 2, 3, 4, 5, 6, 7 \rightarrow 1, 2, 6, 5, 4, 3, 7$.
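The definition of $\rho(i, j)$ can be tried out directly. The following is a minimal Python sketch; the function name `reversal` and the choice to return a new list are illustrative assumptions, not part of the notes. Positions are 1-based, as in the definition.

```python
def reversal(perm, i, j):
    """Return a copy of perm with positions i..j (1-based, inclusive) reversed."""
    if not 1 <= i <= j <= len(perm):
        raise ValueError("need 1 <= i <= j <= n")
    # Unchanged head, reversed middle segment, unchanged tail.
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]


pi = [1, 2, 3, 4, 5, 6, 7]
print(reversal(pi, 3, 6))  # [1, 2, 6, 5, 4, 3, 7]
```

Applying the same reversal twice restores the original permutation, which is a quick sanity check for any implementation.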

Biological meaning. In a permutation $\pi$, each component $\pi_i$ represents a block of genes that persisted through a variety of genomes. Biologists construct such maps for different organisms, and each map can be represented by some permutation. The key study question here is: what is the sequence of reversals that leads from one genome to another?

Genome reversal problem. Given two permutations $\pi$ and $\sigma$ of $n$ numbers, find a series of reversals $\rho_1, \ldots, \rho_t$ such that:
1. $\pi \cdot \rho_1 \cdots \rho_t = \sigma$;
2. $t$ is minimized.
$t$ is called the reversal distance between $\pi$ and $\sigma$.
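For very small $n$ the reversal distance can be found by exhaustive search. The sketch below uses breadth-first search over all permutations reachable by reversals. It is only an illustration of the definition (the name `reversal_distance` is hypothetical, and this is not the method developed in these notes); the search space grows as $n!$, so it is practical only for short permutations.

```python
from collections import deque


def reversal_distance(pi, sigma):
    """Smallest number of reversals turning pi into sigma (brute-force BFS)."""
    start, goal = tuple(pi), tuple(sigma)
    if sorted(start) != sorted(goal):
        raise ValueError("pi and sigma must contain the same elements")
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return dist[cur]
        n = len(cur)
        # Try every reversal of a segment with at least two elements.
        for i in range(n):
            for j in range(i + 1, n):
                nxt = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)


print(reversal_distance([1, 2, 6, 5, 4, 3, 7], [1, 2, 3, 4, 5, 6, 7]))  # 1
print(reversal_distance([3, 1, 2], [1, 2, 3]))                          # 2
```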

Note. Without loss of generality, we assume that $\sigma = 1, 2, 3, \ldots, n$. The problem of genome reversal to the identity permutation is sometimes called the sorting by reversals problem.

Prefixes. Let $\pi = \pi_1, \ldots, \pi_n$ and let $\pi_1, \ldots, \pi_i = 1, 2, \ldots, i$ (taking the largest such $i$). We call $\pi_1, \ldots, \pi_i$ the prefix of $\pi$ and denote it $prefix(\pi)$.

Idea. Let $prefix(\pi) = \pi_1, \ldots, \pi_i$. Find position $j$ in $\pi$ such that $\pi_j = i + 1$. Apply the reversal $\rho(i + 1, j)$ to $\pi$. $\pi \cdot \rho(i + 1, j)$ has a prefix that is longer by at least one position (possibly more if, e.g., $\pi_{j-1} = i + 2$). Thus, on each step, we approach the sorted permutation monotonically.

Greed. The algorithm that extends $prefix(\pi)$ is greedy, because on each step it goes for the low-hanging fruit of increasing the already sorted part of the permutation.

Algorithm. The pseudocode for the algorithm is below.

Algorithm SimpleReversalSort(π)
begin
    for i ← 1 to n − 1 do
        j ← FindPosition(π, i);
        if j ≠ i then
            Reversal(π, i, j);
            print π;
        end if
        if π = 1, 2, . . . , n then
            return;
        end if
    end for
end
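The pseudocode translates almost line by line into Python. This is a sketch under a few assumptions: the function name `simple_reversal_sort` is illustrative, the list of reversals is returned instead of printing each intermediate permutation, and the early-return check is dropped because the remaining iterations do nothing once the permutation is sorted. The example input is made up for illustration.

```python
def simple_reversal_sort(perm):
    """Greedy prefix-extending sort; returns (sorted list, reversals applied)."""
    pi = list(perm)
    n = len(pi)
    steps = []
    for i in range(1, n):        # i = 1 .. n-1, as in the pseudocode
        j = pi.index(i) + 1      # FindPosition: 1-based position of value i, O(n)
        if j != i:
            # Reversal(pi, i, j): reverse positions i..j (1-based, inclusive).
            pi[i - 1:j] = pi[i - 1:j][::-1]
            steps.append((i, j))
    return pi, steps


print(simple_reversal_sort([3, 4, 2, 1, 5]))
# ([1, 2, 3, 4, 5], [(1, 4), (3, 4)])
```

In this run the first reversal extends the prefix by two positions at once (it brings 1 to the front and leaves 2 right behind it), which is the "possibly more" case mentioned in the idea above.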

Function FindPosition(). FindPosition finds a position $j$ in the permutation (array) $\pi$ such that $\pi_j = i$. Without preprocessing, it can be done in $O(n)$ for each call. To speed up FindPosition() we can create an array $P[1..n]$ such that $\pi_{P[i]} = i$, i.e., $P[i]$ is the position in $\pi$ which contains the number $i$. If an up-to-date version of $P$ is available prior to calling FindPosition, then FindPosition() works in $O(1)$.

Function Reversal(). Reversal($\pi$, $i$, $j$) performs the reversal operation $\rho(i, j)$ on permutation $\pi$. This can be done in $O(n)$ time, with the use of a supplemental variable for value exchange. Because a reversal changes the locations of various values in the permutation, a new array $P$ needs to be computed. This computation, however, can be done in a straightforward manner: each time a new assignment to a position in $\pi$ is made, the appropriate update of the array $P$ is performed. This doubles the number of operations, but the running time of Reversal() remains $O(n)$.

Analysis. A straightforward implementation of SimpleReversalSort() takes $O(n^2)$ time: the outer loop repeats $O(n)$ times, and each iteration involves, in the worst case, an $O(n)$ operation.

SimpleReversalSort is NOT optimal. Consider the following permutation: $\pi = 7, 6, 1, 2, 3, 4, 5$. SimpleReversalSort with $\pi$ as input will produce the following output:

7 6 1 2 3 4 5
1 6 7 2 3 4 5    Reversal(1,3)
1 2 7 6 3 4 5    Reversal(2,4)
1 2 3 6 7 4 5    Reversal(3,5)
1 2 3 4 7 6 5    Reversal(4,6)
1 2 3 4 5 6 7    Reversal(5,7)
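The bookkeeping for the position array $P$ can be sketched as follows. The helper names (`build_positions`, `reversal_in_place`, `simple_reversal_sort_indexed`) are illustrative assumptions; the code keeps the invariant that `P[v]` is the 1-based position of value `v`, updating it on every write into the permutation.

```python
def build_positions(pi):
    """P[v] = 1-based position of value v in pi; index 0 is unused."""
    P = [0] * (len(pi) + 1)
    for pos, v in enumerate(pi, start=1):
        P[v] = pos
    return P


def reversal_in_place(pi, P, i, j):
    """Apply rho(i, j) (1-based) by swapping inward, updating P on each write."""
    lo, hi = i - 1, j - 1
    while lo < hi:
        pi[lo], pi[hi] = pi[hi], pi[lo]
        P[pi[lo]] = lo + 1
        P[pi[hi]] = hi + 1
        lo += 1
        hi -= 1


def simple_reversal_sort_indexed(perm):
    """SimpleReversalSort with O(1) FindPosition; returns (sorted list, count)."""
    pi = list(perm)
    P = build_positions(pi)
    count = 0
    for i in range(1, len(pi)):
        j = P[i]                 # FindPosition in O(1)
        if j != i:
            reversal_in_place(pi, P, i, j)
            count += 1
    return pi, count


print(simple_reversal_sort_indexed([7, 6, 1, 2, 3, 4, 5]))
# ([1, 2, 3, 4, 5, 6, 7], 5)
```

Each reversal is still linear in the length of the reversed segment, so the overall bound stays $O(n^2)$; only the lookup becomes constant time.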

That is, SimpleReversalSort sorts the permutation using five reversals. Yet, the following shows that there is a sequence of reversals that takes fewer steps:

7 6 1 2 3 4 5
7 6 5 4 3 2 1    Reversal(3,7)
1 2 3 4 5 6 7    Reversal(1,7)
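The shorter sequence can be checked mechanically by replaying it. The helper below (`apply_reversals`, a hypothetical name) applies a list of 1-based reversals in order and records every intermediate permutation.

```python
def apply_reversals(perm, steps):
    """Apply each reversal (i, j) in order; return all intermediate permutations."""
    pi = list(perm)
    history = [tuple(pi)]
    for i, j in steps:
        pi[i - 1:j] = pi[i - 1:j][::-1]   # reverse positions i..j, 1-based
        history.append(tuple(pi))
    return history


for state in apply_reversals([7, 6, 1, 2, 3, 4, 5], [(3, 7), (1, 7)]):
    print(*state)
# 7 6 1 2 3 4 5
# 7 6 5 4 3 2 1
# 1 2 3 4 5 6 7
```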

Breakpoints. Let $\pi = \pi_1, \ldots, \pi_n$ be a permutation. Without loss of generality we extend it to $\pi = \pi_0, \pi_1, \ldots, \pi_n, \pi_{n+1}$, where $\pi_0 = 0$ and $\pi_{n+1} = n + 1$. A pair of neighboring elements $\pi_i, \pi_{i+1}$ is called an adjacency if $|\pi_i - \pi_{i+1}| = 1$. Otherwise, the pair $\pi_i, \pi_{i+1}$ is called a breakpoint.

Example. Let $\pi = 6, 5, 2, 3, 1, 4$. $\pi$ has two (2) adjacencies: 6,5 and 2,3. It has three (3) breakpoints: 5,2; 3,1; and 1,4. (In the extended permutation 0, 6, 5, 2, 3, 1, 4, 7, the pairs 0,6 and 4,7 are two further breakpoints.)

Idea. $\pi = 1, 2, \ldots, n$ has no breakpoints and $n - 1$ adjacencies. All adjacencies are increasing. If $\pi$ has $b(\pi)$ breakpoints, then an algorithm that monotonically decreases the number of breakpoints until it reaches 0 will solve the problem.

How bad can it get? Each reversal can eliminate at most two breakpoints, one at each end of the reversed segment. Therefore, the total number of reversals needed to sort $\pi$ satisfies $d(\pi) \geq \frac{b(\pi)}{2}$.

Strips. A strip in a permutation $\pi$ is an interval between two consecutive breakpoints. A strip $\pi_i, \ldots, \pi_j$ is increasing if $\pi_i < \pi_{i+1} < \ldots < \pi_j$. A strip is decreasing otherwise. (Strips of length 1 are both increasing and decreasing.)

Example. Let $\pi = 0, 7, 5, 4, 3, 1, 2, 8$. There is a decreasing strip 5, 4, 3 between the breakpoints 7,5 and 3,1 in $\pi$. Similarly, there is an increasing strip 1, 2 between the breakpoints 3,1 and 2,8.
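Breakpoints and strips are easy to compute in one pass over neighboring pairs. The sketch below uses illustrative function names (`extend`, `breakpoints`, `strips`) and reproduces the two examples above; it is an assumption-level illustration, not code from the notes.

```python
def extend(perm):
    """Add the sentinel elements 0 and n+1 around a permutation of 1..n."""
    return [0] + list(perm) + [len(perm) + 1]


def breakpoints(seq):
    """Neighboring pairs whose values do not differ by exactly 1."""
    return [(a, b) for a, b in zip(seq, seq[1:]) if abs(a - b) != 1]


def strips(seq):
    """Maximal runs of elements with no breakpoint inside them."""
    result = [[seq[0]]]
    for a, b in zip(seq, seq[1:]):
        if abs(a - b) == 1:
            result[-1].append(b)   # adjacency: stay in the current strip
        else:
            result.append([b])     # breakpoint: start a new strip
    return result


print(breakpoints([6, 5, 2, 3, 1, 4]))               # [(5, 2), (3, 1), (1, 4)]
print(len(breakpoints(extend([6, 5, 2, 3, 1, 4]))))  # 5
print(strips([0, 7, 5, 4, 3, 1, 2, 8]))
# [[0], [7], [5, 4, 3], [1, 2], [8]]
```

With five breakpoints in the extended form of 6, 5, 2, 3, 1, 4, the bound $d(\pi) \geq b(\pi)/2$ says that at least three reversals are needed to sort it.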

Theorem. If a permutation $\pi$ has a decreasing strip, then there exists a reversal $\rho$ that decreases the number of breakpoints in $\pi$: $b(\pi \cdot \rho) < b(\pi)$.

Proof. Consider a decreasing strip $\pi_i, \ldots, \pi_j$ in $\pi$ such that $\pi_j = k$ is the smallest number terminating a decreasing strip. The number $k - 1$ must therefore terminate an increasing strip ($k$ is the smallest terminus of a decreasing strip and it is NOT followed by $k - 1$; hence $k - 1$ either is surrounded by two breakpoints somewhere, or is the end of an increasing strip of length at least 2). Let $s$ be the position of $k - 1$ in $\pi$. Let $\rho = \rho(s + 1, j)$ (or $\rho(j + 1, s)$, depending on which position is greater). $\rho$ puts $k$ and $k - 1$ next to each other on the same strip, which eliminates a breakpoint.

Example. Consider $\pi = 0, 4, 3, 7, 6, 2, 1, 5, 8$. Here, the decreasing strip with the smallest terminus is 2, 1, so $k = 1$ (at position 6) and $k - 1 = 0$, at position 0. The total number of breakpoints is 5: (0, 4), (3, 7), (6, 2), (1, 5), (5, 8). We use the reversal $\rho(1, 6)$: $\pi \cdot \rho(1, 6) = 0, 1, 2, 6, 7, 3, 4, 5, 8$. Here, the number of breakpoints is 3: (2, 6), (7, 3) and (5, 8).

Example. Consider $\pi = 0, 5, 4, 6, 7, 1, 2, 3, 8$. This permutation has 4 breakpoints, (0, 5), (4, 6), (7, 1), (3, 8), and only one decreasing strip, 5, 4. Here $k = 4$ is at position 2 and $k - 1 = 3$ is at position 7, so we apply the transformation $\rho(3, 7)$: $\pi \cdot \rho(3, 7) = 0, 5, 4, 3, 2, 1, 7, 6, 8$. Here, the number of breakpoints is 3: (0, 5), (1, 7) and (6, 8).

What if there is no decreasing strip? If $\pi$ has no decreasing strip, then we pick any increasing strip and reverse it. This will create a decreasing strip in $\pi$, and we can apply our theorem.

Lemma. If $\pi$ has no decreasing strips, then reversing any increasing strip does not change the total number of breakpoints.

Example. Recall our permutation $\pi = 0, 4, 3, 7, 6, 2, 1, 5, 8$. We applied the reversal $\rho(1, 6)$ to it to get $\pi \cdot \rho(1, 6) = 0, 1, 2, 6, 7, 3, 4, 5, 8$. This permutation has no decreasing strips. We pick the increasing strip 6, 7 and reverse it using $\rho(3, 4)$: $\pi \cdot \rho(1, 6) \cdot \rho(3, 4) = 0, 1, 2, 7, 6, 3, 4, 5, 8$. There is now a decreasing strip and we can proceed with reversing it:

$\pi \cdot \rho(1, 6) \cdot \rho(3, 4) \cdot \rho(5, 7) = 0, 1, 2, 7, 6, 5, 4, 3, 8$
$\pi \cdot \rho(1, 6) \cdot \rho(3, 4) \cdot \rho(5, 7) \cdot \rho(3, 7) = 0, 1, 2, 3, 4, 5, 6, 7, 8$

Algorithm. The outline of the algorithm is:
1. If $\pi$ has decreasing strips, find the decreasing strip with the smallest terminus $k$, and merge it with $k - 1$.
2. If $\pi$ has no decreasing strips, reverse any increasing strip.
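The two-step outline can be sketched in Python as follows. Several choices here are assumptions made for the illustration and are not fixed by the notes: the function names are hypothetical, positions are counted from 0 on the extended permutation (so that 0 sits at position 0, as in the examples), interior strips of length 1 are treated as decreasing, the strips containing 0 and $n + 1$ are treated as increasing, and when no decreasing strip exists the first interior strip is the one reversed.

```python
def strip_ranges(ext):
    """(start, end) index pairs of the maximal strips of ext."""
    ranges, start = [], 0
    for p in range(len(ext) - 1):
        if abs(ext[p] - ext[p + 1]) != 1:    # breakpoint between p and p+1
            ranges.append((start, p))
            start = p + 1
    ranges.append((start, len(ext) - 1))
    return ranges


def is_decreasing(ext, start, end):
    """Assumed convention: end strips are increasing, interior singletons decreasing."""
    if start == 0 or end == len(ext) - 1:
        return False
    return start == end or ext[start] > ext[start + 1]


def breakpoint_reversal_sort(perm):
    """Return (sorted extended permutation, list of reversals applied)."""
    ext = [0] + list(perm) + [len(perm) + 1]
    steps = []
    while ext != sorted(ext):
        ranges = strip_ranges(ext)
        ends = [e for b, e in ranges if is_decreasing(ext, b, e)]
        if ends:
            j = min(ends, key=lambda e: ext[e])   # position of smallest terminus k
            s = ext.index(ext[j] - 1)             # position of k - 1
            lo, hi = (s + 1, j) if s < j else (j + 1, s)
        else:
            # No decreasing strip: reverse the first interior (increasing) strip.
            lo, hi = next((b, e) for b, e in ranges
                          if b != 0 and e != len(ext) - 1)
        ext[lo:hi + 1] = ext[lo:hi + 1][::-1]
        steps.append((lo, hi))
    return ext, steps


print(breakpoint_reversal_sort([4, 3, 7, 6, 2, 1, 5]))
# ([0, 1, 2, 3, 4, 5, 6, 7, 8], [(1, 6), (3, 4), (5, 7), (3, 7)])
```

On the example permutation 0, 4, 3, 7, 6, 2, 1, 5, 8 the sketch reproduces the four reversals worked out above.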

