Journal of Bioinformatics and Computational Biology Vol. 10, No. 4 (2012) 1203002 (3 pages) © Imperial College Press

DOI: 10.1142/S0219720012030023

J. Bioinform. Comput. Biol. 2012.10. Downloaded from www.worldscientific.com by 85.74.84.134 on 10/23/12. For personal use only.

A SHORT INTRODUCTION TO SOME RECENT PROGRESS IN PHYLOGENETIC NETWORK RECONSTRUCTION, GENOME MAPPING, GENE EXPRESSION ANALYSIS, MOLECULAR DYNAMIC SIMULATION, AND OTHER PROBLEMS IN BIOINFORMATICS

LIMSOON WONG Managing Editor

A phylogenetic network is a way to describe evolutionary histories that have undergone evolutionary events such as recombination, hybridization, or horizontal gene transfer. The level, k, of a network determines how non-treelike the evolution can be, with level-0 networks being trees. A number of methods for constructing rooted phylogenetic networks from triplets have been proposed in the past [1,2]. In this issue, Gambette et al. [3] discuss how to generalize these methods to construct unrooted phylogenetic networks from quartets. The paper has three main contributions: (1) it gives an $O(n^5(1+\alpha(n,n)))$ time algorithm to compute the set of quartets of a network; (2) it shows that level-1 quartet consistency is NP-hard; and (3) given a set $Q$ of quartets, it shows that $O(n^4)$ time is sufficient to compute the unrooted level-1 network $N$ such that $Q = Q(N)$, if it exists.

Modern DNA sequencers produce enormous amounts of sequence data with relatively short read lengths. A number of fast genome mapping tools, which use the Burrows–Wheeler transform [4] for seed search and dynamic programming for extension, have been developed. Myers proposed an elegant dynamic programming method for this problem that uses bit-parallelism for approximate string matching. However, it comes with the restriction that the query length must be within the word size of the computer. In this issue, Kimura et al. [5] propose a modification of Myers' algorithm that removes the restriction on the query length.

Gene expression analysis is a powerful way to detect the biological signature of a disease [6–8]. In this issue, Han and Dong [9] introduce new ideas to optimize the diversity of decision trees in an ensemble classifier, CABD, for gene expression profile classification. CABD is shown to be superior to other ensemble methods. Moreover, the diversified features produced by CABD are also useful for improving the performance of other classifiers, e.g. SVM.
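As an aside on the bit-parallel dynamic programming mentioned above in connection with read mapping, the following is a minimal Python sketch of the standard bit-vector formulation of Myers' approximate string matching. It is an illustration only and is not the modification proposed by Kimura et al.; the function name `myers_search_scores` is hypothetical. Because Python integers have arbitrary precision, the sketch sidesteps the word-size restriction rather than addressing it the way a fixed-width implementation would have to.

```python
def myers_search_scores(pattern, text):
    """For each text position j, return the smallest edit distance between
    `pattern` and any substring of `text` ending at j (bit-vector DP)."""
    m = len(pattern)
    if m == 0:
        return [0] * len(text)

    # Peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    mask = (1 << m) - 1      # emulates an m-bit machine word
    high = 1 << (m - 1)      # bit for the last pattern row
    pv, mv = mask, 0         # vertical +1 / -1 differences of the DP column
    score = m                # value in the last row of the current column
    scores = []

    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 differences
        mh = pv & xh                    # horizontal -1 differences

        if ph & high:
            score += 1
        elif mh & high:
            score -= 1

        # Row 0 of the search matrix is all zeros, so no bit is shifted in.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        scores.append(score)

    return scores


if __name__ == "__main__":
    # Hypothetical read and reference fragment, for illustration only.
    print(myers_search_scores("ACGT", "TTACGTTT"))
```

Each text character is processed with a constant number of word operations, which is the source of the speed-up over filling the dynamic programming matrix cell by cell.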
In another paper in this issue, Xu [10] describes an approach to identify differentially expressed genes in non-homogeneous time course gene expression experiments. The approach needs no assumption on the distribution of the observations and works even when there are as few as triplicate samples over four or five time points under multiple groups.

The most challenging analysis problems in biology and medicine often involve a complex multi-step process of integrating and extracting information from multiple sources of data, formulating and testing plausible hypotheses, and eventually applying that knowledge to make useful predictions [11,12]. In this issue, Limaye et al. [13] develop the Anvaya workflows environment for automated and streamlined genomic data analysis. Anvaya has a coordinated system comprising several bioinformatics tools and databases, and supports the execution of a set of analysis tools in series or in parallel. It also comes with a set of 11 pre-defined workflows for frequently used pipelines in genome annotation, as well as a nice user interface for defining analysis workflows and monitoring workflow execution status.

Molecular dynamic simulations are increasingly being used on larger systems, being run for longer times, and being applied more often [14,15]. However, analysis of even a single simulation run is complicated and time-consuming. In this issue, Benson and Daggett [16] present a novel algorithm for describing the evolution of protein structures over the course of a simulation. The algorithm uses a graph representation based on chemical groups to classify parts of a protein structure. This representation and the algorithm provide a more detailed treatment of protein dynamics than previous graph methods.

Bioinformaticians often have to work with datasets where the classes of interest are represented by significantly different numbers of examples. In particular, positive samples are usually outnumbered by negative samples by an order of magnitude or more. This often leads to classifiers that have high specificity but low sensitivity.
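The gap between specificity and sensitivity described above can be made concrete with a small Python sketch. The data are hypothetical (10 positives against 100 negatives), the helper names are illustrative, and the geometric mean computed here is the standard unadjusted one, shown only as a familiar baseline.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary labels, with 1 = positive, 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def summarize(y_true, y_pred):
    """Compute sensitivity, specificity, accuracy and the standard geometric mean."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)   # fraction of positives recovered
    specificity = tn / (tn + fp)   # fraction of negatives recovered
    accuracy = (tp + tn) / len(y_true)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "g_mean": g_mean,
    }


# Hypothetical imbalanced data set: 10 positives, 100 negatives.
y_true = [1] * 10 + [0] * 100
# A classifier biased toward the majority class: it finds only 2 of the
# 10 positives but labels 99 of the 100 negatives correctly.
y_pred = [1] * 2 + [0] * 8 + [0] * 99 + [1] * 1

result = summarize(y_true, y_pred)
print(result)
```

In this constructed case accuracy stays above 90% even though four out of five positives are missed, which is why accuracy alone is a poor objective on imbalanced data.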
In this issue, Butawita and Palade [17] introduce the adjusted geometric mean measure, and they show that class imbalance learning methods deliver better results when they optimize their performance using this new measure.

Cellular processes are governed by biological pathways. The modeling and analysis of biological pathways is thus an important problem [18–20]. In the last paper of this issue, Liu and Thiagarajan [21] present a timely and useful tutorial on quantitative models of biological pathways based on ordinary differential equations, with a particular emphasis on parameter estimation and sensitivity analysis problems.

References
1. He Y-J, Huynh TND, Jansson J, Sung W-K, Inferring phylogenetic relationships avoiding forbidden rooted triplets, J Bioinform Comput Biol 4(1):59–75, 2006.
2. van Iersel L, Kelk S, Mnich M, Uniqueness, intractability and exact algorithms: Reflections on level-k phylogenetic networks, J Bioinform Comput Biol 7(4):597–623, 2009.
3. Gambette P, Berry B, Paul C, Quartets and unrooted phylogenetic networks, J Bioinform Comput Biol 10(4):1250004, 2012.
4. Pokrzywa R, New method for yeast identification using Burrows-Wheeler transform, J Bioinform Comput Biol 6(2):403–413, 2008.
5. Kimura K, Koike A, Nakai K, A bit-parallel dynamic programming algorithm suitable for DNA sequence alignment, J Bioinform Comput Biol 10(4):1250002, 2012.
6. Licamele L, Getoor L, A method for the detection of meaningful and reproducible group signatures from gene expression profiles, J Bioinform Comput Biol 9(3):431–451, 2011.
7. Obidat O, Reddy CK, Ranking differential hubs in gene co-expression networks, J Bioinform Comput Biol 10(1):1240002, 2012.
8. Olman V, Hicks C, Wang P, Xu Y, Gene expression data analysis in subtypes of ovarian cancer using covariance analysis, J Bioinform Comput Biol 4(5):999–1014, 2006.
9. Han Q, Dong G, Use attribute behavior diversity to build accurate decision tree committees for microarray data, J Bioinform Comput Biol 10(4):1250005, 2012.
10. Xu J, A new method for nonhomogeneous time course expression analysis, J Bioinform Comput Biol 10(4):1250007, 2012.
11. Kanagasabai R et al., A workflow for mutation extraction and structure annotation, J Bioinform Comput Biol 5(6):1319–1337, 2007.
12. Koh CH, Lin S, Jedd G, Wong L, Sirius PSB: A generic system for analysis of biological sequences, J Bioinform Comput Biol 7(6):973–990, 2009.
13. Limaye B et al., Anvaya: A workflows environment for automated genome analysis, J Bioinform Comput Biol 10(4):1250006, 2012.
14. Terentiev AA et al., Modeling of three dimensional structure of human alpha-fetoprotein complexed with diethylstilbestrol: Docking and molecular dynamics simulation study, J Bioinform Comput Biol 10(2):1241012, 2012.
15. Hwang S et al., Discovery and evaluation of potential sonic hedgehog signaling pathway inhibitors using pharmacophore modeling and molecular dynamic simulations, J Bioinform Comput Biol 9(Suppl. 1):15–35, 2011.
16. Benson NC, Daggett V, A chemical group graph representation for efficient high-throughput analysis of atomistic protein simulations, J Bioinform Comput Biol 10(4):1250008, 2012.
17. Butawita R, Palade V, Adjusted geometric mean: A novel performance measure for imbalanced bioinformatics datasets learning, J Bioinform Comput Biol 10(4):1250003, 2012.
18. Li C et al., Structural modeling and analysis of signaling pathways based on Petri nets, J Bioinform Comput Biol 4(5):1119–1140, 2006.
19. Ramsey S et al., Dizzy: Stochastic simulation of large-scale genetic regulatory networks, J Bioinform Comput Biol 3(2):415–436, 2005.
20. Calcada D et al., Quantitative modeling of the Saccharomyces cerevisiae FLR1 regulatory network using an S-system formalism, J Bioinform Comput Biol 9(5):613–630, 2011.
21. Liu B, Thiagarajan PS, Modeling and analysis of biopathways dynamics, J Bioinform Comput Biol 10(4):1231001, 2012.
