
Phylogenetic trees guided process discovery

Introduction to Bioinformatics (236523) winter 2013, Computer Science Department, Technion - Israel Institute of Technology

Eliran Natan, Olga Bramnik

Biological processes are among the basic factors that shape the functionality, appearance, and properties of living organisms. A central problem is detecting whether a given organism performs a certain biological process. An interesting example is photosynthesis, which is probably the most important process that occurs in nature. Using large existing databases and phylogenetics, we show that it is possible to detect whether an organism performs photosynthesis by relying only on the presence of certain proteins in the organism. Moreover, we generalize the method to any other known biological process.

We use the suggested method to identify photosynthetic organisms. For the organism set, we have chosen the following organisms:

We wish to develop a systematic method to determine which biological processes occur in a given organism, based only on the phylogenetic trees of the proteins essential to each process. We examine our method by applying it to photosynthesis. We also want to explore two mysterious half-plant, half-animal creatures: Elysia chlorotica and Acyrthosiphon pisum.

For the leading protein cutoff, we have chosen the following enzymes:

Method

Given a biological process and a set of organisms, the algorithm obtains the subset A, which contains only the organisms that perform the process. The algorithm executes the following major steps:

(1) Compute a 16S rRNA phylogenetic tree for the set of organisms.
(2) Find a cutoff of proteins that have a leading role in the process.
(3) For each protein in the cutoff, compute its phylogenetic tree for the set of organisms.
(4) Compare the trees from (3) with the tree from (1) to compute a gain/loss tree T with root r, using the following two basic rules: For example:

We compute the 16S rRNA tree for all the organisms. Using BLAST, GenBank, and UniProt, we locate the sequence of each enzyme in each organism and summarize the results in the following accession keys table:

For each enzyme, we compute a phylogenetic tree according to the accession keys table. Using the rRNA tree from step (1) and the trees from step (3), and according to steps 4 and 5, we compute the following gain/loss phylogenetic tree:

We execute a BFS scan on the gain/loss tree to get the number of losses and gains over the path from each organism to the root. Then, for each organism o, we calculate the sum of its breakpoints. We obtain the subset A as follows: A = { o | sum of breakpoints of o = 10 }. Thus,

(5) Execute a series of procedures to minimize the number of gain/loss events in T, for example:

(6) Analyze T to obtain the subset A: for every organism o in T, o is in A if and only if the sum of all breakpoints on the path from r to o equals the required value (10 in the photosynthesis application). For example:
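The analysis step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation or their example. The text does not say how breakpoints are scored, so the sketch assumes each gain counts +1 and each loss counts -1, and that every edge of T carries the net value of its events. The function names, the dictionary-based tree encoding, and the toy tree are all hypothetical.

```python
from collections import deque

def breakpoint_sums(children, events, root):
    """BFS from the root, accumulating the net gain/loss value on each path.

    children: dict mapping a node to the list of its child nodes.
    events:   dict mapping an edge (parent, child) to its net breakpoint
              value (ASSUMPTION: +1 per gain, -1 per loss; missing edge = 0).
    """
    totals = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            totals[child] = totals[node] + events.get((node, child), 0)
            queue.append(child)
    return totals

def select_subset(children, events, root, organisms, threshold):
    """Return the organisms whose root-to-leaf breakpoint sum equals threshold."""
    totals = breakpoint_sums(children, events, root)
    return {o for o in organisms if totals[o] == threshold}

# Toy gain/loss tree (hypothetical, not the poster's data):
# three proteins are gained on the edge r -> x, one is lost on x -> leaf_b.
children = {"r": ["x", "leaf_c"], "x": ["leaf_a", "leaf_b"]}
events = {("r", "x"): 3, ("x", "leaf_b"): -1}
organisms = ["leaf_a", "leaf_b", "leaf_c"]

subset = select_subset(children, events, "r", organisms, threshold=3)
```

With a cutoff of three proteins, only `leaf_a` retains all of them on its path from the root, so it is the only organism selected. In the photosynthesis application the threshold would be 10.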

Results and Conclusions

All the organisms that were accepted by the algorithm are photosynthetic.

- photosynthesis, carbon fixation, purple bacteria, algae.
- Pea aphids may be able to harvest energy from the Sun, Mark Brown
- Salamander is world's first photosynthetic vertebrate, Bryan Nelson
- First Known Photosynthetic Animal, Teisha Rowland.

Three photosynthetic organisms were missed. The algorithm had a 100% success rate among the green plants and only a 25% success rate among the purple bacteria. Chlorobium tepidum and Thermomicrobium roseum were missed due to their lack of carbonate dehydratase, and Rhodospirillum rubrum was missed due to its lack of phosphoenolpyruvate carboxylase. These two proteins are needed when a carbon concentrating mechanism (CCM) is used. It is known that several photosynthetic organisms do not use this mechanism. Thus, we suggest removing the CCM proteins from the leading protein cutoff.

Further points of interest: (a) All the mammals lacked the same protein group and scored 7/10. (b) Elysia chlorotica and Acyrthosiphon pisum are the only animals known to perform some kind of photosynthesis. Our research supports the argument that these organisms cannot perform standard photosynthesis as we know it, and probably use a very different pathway for this purpose.
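The effect of removing the CCM proteins from the cutoff can be illustrated with a small Python sketch. It assumes, as a simplification, that an organism is accepted exactly when every protein in the cutoff is present in it. The function name, the placeholder enzymes `enzyme_1` and `enzyme_2`, and the three organisms are hypothetical and do not come from the accession keys table.

```python
def accepted_organisms(presence, cutoff):
    """Return organisms that contain every protein in the cutoff.

    ASSUMPTION: acceptance is modeled as full coverage of the cutoff.
    presence: dict mapping an organism name to the set of proteins found in it.
    """
    required = set(cutoff)
    return {org for org, proteins in presence.items() if required <= set(proteins)}

# The two CCM-related proteins named in the text.
ccm_proteins = {"carbonate dehydratase", "phosphoenolpyruvate carboxylase"}

# Hypothetical cutoff: two placeholder enzymes plus the CCM proteins.
cutoff = ["enzyme_1", "enzyme_2",
          "carbonate dehydratase", "phosphoenolpyruvate carboxylase"]

# Hypothetical presence data (not taken from the poster).
presence = {
    "organism_a": {"enzyme_1", "enzyme_2",
                   "carbonate dehydratase", "phosphoenolpyruvate carboxylase"},
    "organism_b": {"enzyme_1", "enzyme_2", "phosphoenolpyruvate carboxylase"},
    "organism_c": {"enzyme_1", "carbonate dehydratase"},
}

reduced_cutoff = [p for p in cutoff if p not in ccm_proteins]

before = accepted_organisms(presence, cutoff)
after = accepted_organisms(presence, reduced_cutoff)
```

In this toy data, `organism_b` lacks only a CCM protein, so it is rejected under the full cutoff and accepted under the reduced one, while `organism_c` is rejected in both cases because it also lacks a non-CCM enzyme.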