Le Sy Vin Et Al - 2007 - Towards Phylogenomics Reconstruction

Towards Phylogenomic Reconstruction
Le Sy Vinh1 & Andrés Varón1,2 & Daniel Janies3 & Ward C. Wheeler1
1
Division of Invertebrate Zoology, American Mu- mize the total edit cost between genomes. The cost is
seum of Natural History, New York composed of three factors: nucleotide transformation
2
Computer Science Department, The City Univer- costs between loci, indel costs of loci, and rearrange-
sity of New York ment costs between locus orders. This approach is
3
Department of Biomedical Informatics, Ohio State embedded within a direct optimization scheme to re-
University construct phylogenies from whole unaligned genomes.
Performance of this approach is demonstrated in
Corresponding author: our software, POY4, to reconstruct phylogenies from
Le Sy Vinh Coronavirus and Poxvirus genomes.
Division of Invertebrate Zoology, American Museum
of Natural History, Central Park West at 79th Street,
10024 New York, New York 1 Introduction
Voice: 212-769-7582
Fax: 212-769-5277 Understanding evolutionary relationships among
Email: vle@amnh.org species is one of the central objectives in biology. The
evolutionary relationships of species can be presented
Conference: Contributed to the 2007 International by a tree (phylogeny) in which leaves represent ob-
Conference on Bioinformatics & Computational served taxa, internal nodes represent inferred ances-
Biology tors. The rapid development of efficient sequencing
technologies has resulted in a huge amount of genetic
Keyword: phylogenetic reconstruction, genome data. These data enable researchers to study evo-
comparison, chromosome comparison, direct opti- lution at the molecular level. To date, phylogenies
mization are typically reconstructed based on isolated genes [1,
and references therein]. A growing number of avail-
able genome sequences leads us to develop approaches
Abstract to comparative analysis of whole genomes. Evolu-
tionary change of genomes is complex and subject to
Reconstructing phylogenies is one of the primary ob- both small and large scale variations. The small scale
jectives in evolution studies. Efficient software to processes act at nucleotide level, i.e, nucleotide sub-
reconstruct phylogenies based on isolated genes has stitution and indel. The large scale processes act at
existed for decades, yet, phylogenetic reconstructions locus level, i.e., locus rearrangement, locus indel, and
from whole genomes are only beginning. The diver- horizontal gene transfer. Figure 1 shows an example
sification of genome sequencing projects has gener- of locus rearrangement operations in genomes.
ated thousands of whole genomes making phyloge- The typical approach to compare the evolution-
nomic reconstruction a challenging research topic. In ary relationships among species is to analyze nu-
this paper, we present an approach for pairwise align- cleotide transformations among isolated homologous
ment construction which deploys both nucleotide and sequences. Besides, locus orders of genomes also re-
locus (a segment of nucleotides) operations to mini- veal phylogenetic signals, hence, they can be used for
1
Original order of loci g g g g g
1 2 3 4 5
phylogeny reconstruction [2, and references therein].
However, the locus order-based approach is unlikely Loci from position 3 to 5 are inverted g g −g −g4 −g3
1 2 5
applicable to closely related genomes because their Loci from position 3 to 5 are moved g g g g g
3 4 5 1 2
locus orders are usually identical. Moreover, current to position 1
implementations that use locus order to infer phy- Loci from position 3 to 5 are moved −g −g −g3 g g
to position 1, then inverted 5 4 1 2
logeny ignore all nucleotide transformations which
might contain orders of magnitude more informa- Figure 1: Three types of locus rearrangements on a
tion [2, 3, 4]. Thus, combining phylogenetic signals genome
from both nucleotide transformation and locus order
is useful to understand the evolutionary relationships
among species.
Approaches to align genomes have been developed quences, this problem has been investigated [11, and
previously [5, 6, 7, 8, 9]. The first class of approaches references therein]. We are working on genome pair-
considers genomes merely as long sequences. Se- wise alignment at various levels of generality and dif-
quence partitioning strategies are often used to over- ficulty. The first level considers each genome as a long
come limits of time and memory needed to align long annotated sequence (each genome is pre-divided into
sequences. However, this class is not applicable when segments using annotations as boundaries) [4, 12].
locus rearrangements have occurred in genomes. The Alignments between two annotated genomes can be
second class concentrates on detecting conserved ar- constructed using pairwise alignment while allowing
eas among genomes, which are subsequently aligned rearrangements in gene order [12]. However, this ap-
using traditional sequence alignment techniques. Al- proach is not applicable when annotations are not
though this class allows locus rearrangement oper- well defined.
ations, it suffers from two principle shortcomings: To relax the assumptions required by the annota-
large sequence areas might be excluded from align- tion approach, the second level of our approach con-
ments (especially for distantly related genomes) and siders each genome as a single chromosome-genome
one locus might be aligned with several loci. To our (SC-genome). This type of data covers a wide range
knowledge, there does not exist any approach to align of genomes, e.g, viral genomes. In this paper, we pro-
two genomes such that: (1) all loci are determined pose an algorithm for the construction of a compre-
automatically, (2) each locus is either aligned with hensive SC-genome pairwise alignment. In the third
only one hypothetically homologous locus or consid- level, each genome is represented as a set of chromo-
ered as a locus indel, (3) loci are allowed to rear- somes in which locus operations inside chromosomes
range, (4) minimizing the total cost to transform one and between chromosomes are allowed. The pro-
genome into another genome comprising nucleotide posed algorithm for comprehensive SC-genome pair-
transformation costs, locus indel costs, and locus or- wise alignment can be extended to reconstruct a com-
der rearrangement costs. We call this alignment a prehensive genome pairwise alignment.
comprehensive genome pairwise alignment. The rest of paper is organized as follows: sec-
To comprehend the evolutionary changes in tion 2 presents an overview of direct optimization
genomes, we must simultaneously analyze multiple scheme to reconstruct phylogenies from unaligned se-
genomes linked by phylogeny. ‘Direct optimization’ quences. In section 3, we describe an approach to
(DO) simultaneously evaluates sequence homologies reconstruct the comprehensive SC-genome pairwise
and tree topologies to reconstruct phylogenies from alignment which is subsequently embedded within
unaligned sequences [10]. A core task in DO is the the direct optimization scheme to reconstruct phy-
determination of hypothetical ancestor sequences ac- logenies from unaligned genomes. Experiments on
cording to their descendent’s sequences. In other viral genome data sets to demonstrate the ability of
words, the task is the construction of the pairwise our approach are presented in section 4. Discussions
alignment between two sequences. For isolated se- and open problems are addressed in the last section.
2
A B
a
α C
β
b
Figure 2: Topology and ancestor sequences are deter-
x x x x x
mined to minimize the total number of evolutionary 1 2 3 4 5
events
y y y y y
1 2 3 4 5
2 Direct optimization c
Direction optimization (DO) is a heuristic algorithm x x x x x
1 2 3 4 5
to the general NP-hard tree alignment optimiza- y y y y y
5 1 4 3 2
tion problem [13, 10]. DO searches tree topologies
combined with reconstruction of ancestral sequences d
nodes in search of a global optimum of the the total
Figure 3: a: non-rearranged seeds constitute a block,
number of evolutionary events such as mutations and
b: consecutive blocks are connected into a large
insertion deletions but can also include locus level
block, c: blocks are used as anchors to divide genomes
events. The key procedure is the determination of
into loci, d: loci are aligned allowing order rearrange-
hypothetical ancestor sequences based on observed
ments
sequences following the post order travel. For exam-
ple, ancestor sequences in Figure 2 is determined as
follows. The pairwise alignment between A and B
is constructed. As a result, the median sequence of this end, first an algorithm is developed to detect
A and B is inferred from the pairwise alignment and conserved areas (blocks) between two genomes. It is
assigned to ancestor α. Following this, the median not a novel algorithm but rather a consolidation of
sequence of α and C is determined and assigned to existing techniques to handle genomes from different
ancestor β. The tree cost is calculated as the total levels of diversity [15, 6, 16, 9]. Conserved areas serve
number of evolutionary events over all branches. The as anchors to divide each genome into a sequence of
phylogeny with minimum cost is considered as best. separated loci. Loci of the same block are initially
Since number of tree topologies increases combinato- considered homologous. To create the comprehen-
rially with number of sequences, heuristics must be sive SC-genome pairwise alignment, homologies (or
applied to search for the best phylogeny in acceptable indels) of other loci are determined by reconstruct-
time [14]. ing pairwise alignment between two locus sequences
when locus rearrangements are allowed.
3 Comprehensive SC-genome
3.1 Detecting conserved areas
pairwise alignment
To detect conserved areas between two SC-genomes
Here, we present an approach for construction of a X and Y , three concepts, namely segment, seed, and
comprehensive SC-genome pairwise alignment. To block, are introduced to build up the algorithm pro-
3
gressively. To simplify the description of algorithm, related genomes consist of about two seeds per thou-
genome orientations are not considered. However, sand nucleotides. According to the observation, the
this algorithm can be extended to integrate genome default of r is 1000.
orientations into real implementations. Experiments also show that seeds of over fifty nu-
Let (sX , eX ) denote a segment on X starting from cleotides appears rarely, even between closely related
nucleotide sX and ending at nucleotide eX (sX ≤ genomes. That means individual seeds must be con-
eX ). Intuitively, we say segment (sX X
1 , e1 ) is be- nected to construct larger conserved areas (blocks).
fore segment (s2 , e2 ) on X, denoted (sX
X X X
1 , e1 ) < Precisely, a list of seeds (S1 , S2 , . . . , Sn ) can be con-
X X X X
(s2 , e2 ), if e1 < s2 . The distance between two nected into a block b if Si < S(i+1) and Si ↔ S(i+1)
segments is dX X X
12 = s2 − e1 . for i = 1 . . . (n − 1) as illustrated in Figure 3a. Con-
A pair of two identical segments (sX , eX ) and sider two block b1 and b2 , block b1 is before block
(s , eY ) is called a seed between X and Y , denoted
Y
b2 , denoted b1 < b2 , if seeds of b1 are before seeds of
S = (sX , eX , sY , eY ). Seeds expose signals of con- b2 . The block score is calculated as the sum of seed
served areas between X and Y . To eliminate spu- scores and connecting seed scores.
rious signals, seeds whose lengths are smaller than a Due to the ordered property of seeds, the maxi-
predefined seed length threshold ls are excluded from mum score block can be constructed using dynamic
analysis (ls = 9 as default). All seeds between X and programming. We are now ready to present an algo-
Y can be quickly identified using a suffix tree-based rithm to detect conserved areas.
algorithm [17]. The number of seeds depends on two
factors: the divergence between X and Y , and the Detecting conserved areas algorithm:
seed length threshold lb . The score of a ls long seed
S is assigned by (ls × c) where c is a predefined pa- 1. Find the seed list L between X and Y . Set block
rameter (c = 100 as default). Obviously, a better list B ← ∅.
seed is longer and have a higher score.
Consider two seeds S1 = (sX X Y Y
1 , e1 , s1 , e1 ) and 2. Find the maximum score block b. Add b into
X X Y Y
S2 = (s2 , e2 , s2 , e2 ), seed S1 is before seed S2 if block list B. Remove seeds in block b from L. If
segments of S1 are before segments of S2 on both X L is not empty, repeat step 2.
and Y . Distance d12 and shift g12 between S1 and S2
(S1 < S2 ) is measured as d12 = max {dX Y
12 , d12 } and 3. Remove blocks in B whose lengths are smaller
X Y
g12 = |d12 − d12 |, respectively. The shift g12 indicates than a significant block length threshold lb (lb =
the minimum number of nucleotide indels needed to 100 as default). This guarantees that remained
align two segments in between two seeds S1 and S2 . blocks indicate strong and reliable signals of con-
The connecting score of S1 and S2 can be estimated served areas.
approximately as (o + (g12 − 1) × e) where o and e
are two predefined parameters (o = −200, e = −10 as
4. Resolve overlapped blocks. Specifically, if two
default). The default values of c, o and e parameters
blocks in B are overlapped, the smaller score
are determined by experiments such that this score
block is removed.
system can be used as a good optimal criterion for
connecting seeds into larger areas.
Two seeds S1 and S2 are said to be non-rearranged, 5. Connect consecutive blocks in B to create larger
denoted S1 ↔ S2 , if their distance d12 is not greater blocks as illustrated in Figure 3b (blocks b1 and
than a predefined non-rearranged threshold r. In b2 are consecutive if b1 < b2 and @b ∈ B in be-
other words, it is unlikely that rearrangement op- tween).
erations can be occurred in between non-rearranged
seeds if they are connected. Experiments on real data 6. Output blocks in B as hypothetically conserved
shows that conserved areas between two moderately areas between X and Y .
4
3.2 Reconstructing pairwise align- Torovirus Group 1 Group 3 Group 2 Coronaviruses
outgroup
ment with rearrangements
Transmissible gastroenteritis virus
Bat coronavirus BtCoV/279/2005
Murine hepatitis virus strain A59

Porcine epidemic diarrhea virus
Avian infectious bronchitis virus
Murine hepatitis virus strain 2
Human coronavirus OC43

Experiments with a large range of real data show
Human coronavirus 229E

that conserved areas often cover only a part of whole
Bovine coronavirus
SARS coronavirus
genomes. That means homologies (or indels) of
Breda virus
other areas must be determined in order to gather
additional phylogenetic signals. To this end, con-
served blocks are deployed as anchors to partition
X and Y into separated loci as demonstrated in
Figure 3c. Thereafter, X and Y are represented
as two sequences of loci X = (x1 , x2 , . . . , xp ) and
Y = (y1 , y2 , . . . , yq ), respectively. Let λ denote a
locus indel.
Here, we briefly define the pairwise alignment with
rearrangements (PAR) between X and Y (the prob-
lem is fully described in [12]). Let C(xi , yj ) ∈ R+
be the edit cost to transform locus xi into locus yj
for i = 1 . . . p, j = 1 . . . q. Note that C(xi , λ) and Figure 4: Genome-based phylogeny of 11 Coron-
C(λ, yj ) are respective indel costs of xi and yj . Since aviruses where SARS-CoV shares a common ancestor
xi and yj are segments of nucleotides, the edit cost with CoV from bats instead of small carnivors
C(xi , yj ) can be calculated as the minimum num-
ber of nucleotide transformations between xi and yj .
Consider two loci xi and yj of the same conserved proaches which compromise between time complexity
block, they must be aligned together. To guarantee and alignment quality have been proposed [12]. In a
that, all edit costs C(xi , yj0 ) and C(x0i , yj ) are set to nutshell, the alignment Ar presents the comprehen-
infinity, i0 6= i, j 0 6= j. sive pairwise alignment between two SC-genomes X
We denote Yr a permutation of Y , that is, Yr con- and Y .
sists of the same set of loci in Y but in different
orders. Let R(Y, Yr ) be the rearrangement distance
function between Y and its permutation Yr . Typi- 4 Experiments
cally, R(Y, Yr ) is computed as the breakpoint distance
or inversion distance [18, 19]. The rearrangement cost We are focusing on analyzing viral genomes to under-
between Y and Yr is measured as R(Y, Yr ) × cr where stand the origin and prediction of pathogenicity. To
cr is the rearrangement operation cost. our knowledge, there does not exist any software to
Given edit cost matrix C and rearrangement dis- reconstruct phylogenies from unaligned genomes. To
tance function R, we construct the pairwise align- demonstrate the potential of our approach, we ana-
ment Ar allowing locus rearrangements such that lyze Coronavirus genomes and Poxvirus genomes on
minimizing the total cost which is composed of three a 3.2 GHz PC.
factors: edit costs between loci, indel costs of loci Severe Acute Respiratory Syndrome (SARS) is a
and rearrangement costs of locus orders. Figure novel human illness caused by a previously unrec-
3d illustrates the Ar between two sequences X = ognized Coronavirus, SARS CoV, of zoonotic origin
(x1 , x2 , x3 , x4 , x5 ) and Y = (y1 , y2 , y3 , y4 , y5 ). Note [20]. Between November and August 2003, there were
that the PAR problem does not require genomes to 8,422 cases and 916 deaths from SARS (WHO, 2003).
have the same number of loci. Since an exact solution Although more than 219 isolates of SARS CoV have
for PAR problem is likely intractable, heuristic ap- been sequenced very little is known about genome
5
diversity among the family Coronaviridae. Very few
non-SARS Coronaviruses have been sequenced and Variola_Japan
Variola_Ethiopia
the zoonotic potential of the Coronaviridae is not well
understood [21] Variola_Germany
There remain conflicting reports on the animal Camelpox_CMS

Camelpox_M96
reservoir of SARS-CoV. Using small portions of the
CoV genome, Guan et al. (2003) implicate small car-
nivores whereas Li et al. (2005) assert that bats are
Horsepox_MNR76
the animal reservoir of SARS-CoV [22]. Using the en-
tire genome and a diverse sample of Coronaviruses, Cowpox_Brighton
Janies et al. (2007) confirm that small carnivores are

not the reservoir species and bats are the best candi- Cowpox_GRI90
Monkeypox_USA
dates for the reservoir species [23]. Monkeypox_Congo
Cowpox_Germany
The genome of Coronaviruses is comprised of a Monkeypox_Sierra Monkeypox_Liberia
single-stranded, positive-sensed RNA molecule 27-31

kb in length [24]. To increase our understanding
of Coronaviruses and their potential for reasssort-
ment, we have examined a dataset of 11 Coronavirus Figure 5: Genome-based phylogeny of 13 Poxviruses
genomes and a Torovirus outgroup. To search for where five clades are reconstructed: Camelpox, Cow-
the optimal phylogeny, five Wagner trees were built. pox, Horsepox, Monkeypox and Variola.
Subsequently, the best of those was selected and im-
proved further by the subtree pruning and re-grafting
technique and finally considered as the optimal phy- rangements are not frequent between genomes in-
logeny (see Figure 4). The optimal tree found is in side the same group, e.g., zero between Monkey USA
agreement with recently studies that SARS-CoV is and Monkey Sierra, two between Camelpox M96 and
related to group 2 Coronaviruses and shares a com- Camelpox CMS. However, a larger number of rear-
mon ancestor with CoV from bats instead of small rangements occurs between ancestor sequences of dif-
carnivores [22, 23]. Although rearrangements are not ferent groups, e.g., 24 breakpoints between Horse-
found among these 11 Coronaviruses, further experi- pox and Cowpox. The program took approximately
ments with more genomes must be investigated. ten hours indicating the potential of our approach for
To examine our approach with larger genomes, construction of phylogenies from large genomes and
we collected 13 Poxvirus genomes from NCBI more taxa.
(http://www.ncbi.nlm.nih.gov/). The genome sizes
range from 184,900 to 228,250 bp. The pairwise align-
ment between two genomes is constructed in approx- 5 Discussions
imately ten seconds. Interestingly, conserved areas
cover approximately 90% of the whole genomes. The Reconstructing phylogenies from whole genomes re-
average breakpoint distance between two genomes is quires a marriage between genome analysis tech-
four indicating a small number of locus rearrange- niques and phylogenetic reconstruction algorithms.
ments. To search for the optimal tree, five Wag- We present an approach to align whole genomes
ner trees were built. Subsequently, the best of those which incorporates both nucleotide transformations
was improved by nearest neighbor interchanges tech- and locus operations to minimize the cost to trans-
nique. Figure 5 presents the reconstructed phy- form one genome into another genome. The approach
logeny where five clades are constructed: Camelpox, is implemented in a direct optimization scheme to re-
Cowpox, Horsepox, Monkeypox and Variola. Diag- construct phylogenies from unaligned genomes.
nosing the tree shows 132 breakpoints. The rear- The performance of our approach is demonstrated
6
on Coronavirus and Poxvirus genome data sets. The [3] Moret, B. M., Tangy, J., Wangz, L.-S., Tandy-
program reconstructs phylogenies in acceptable time. Warnow. Steps toward accurate reconstructions
Default parameters are determined by experiments of phylogenies from gene-order data. J. Comput.
on real data rather than theoretical establishments. Syst. Sci., 65:508–525, 2002
For example, the seed length threshold lb = 9 is ex-
perimentally considered as the best value for moder- [4] Wheeler, W. C. Chromsomal character opti-
ately related genomes. Although lb can be adjusted, mization. Molecular Phylogenetics and Evolu-
either largely increasing or decreasing lb will decrease tion, 2007. In press
the efficiency of the approach. [5] Delcher, A. L., Kasif, S., Fleishman, R., Pe-
Although the approach is described for single terson, J., White, O., Salzberg, S. L. Align-
chromosome-genomes, we are working on an exten- ment of whole genomes. Nucleic Acids Research,
sion to cope with multiple chromosome-genomes in 27:2369–2376, 1999
which loci operations within chromosomes and be-
tween chromosomes are allowed. [6] Schwartz, S., Kent, W. J., Smit, A., Zhang,
This approach can cope with locus indels and locus Z., Baertsch, R., Hardison, R. C., DavidHaus-
rearrangements, however, the evolution of genomes is sler, Miller, W. Human mouse alignments with
certainly more complicated. Two other kinds of locus blastz. Genome Research, 13:103–107, 2003
operations not considered are horizontal gene trans- [7] Brudno, M., Do, C. B., Cooper, G. M., Kim,
fer and recombination [25, 26]. These processes typ- M. F., Davydov, E., Green, E. D., Sidow, A.,
ically result in additional homoplasy. Our ultimate Batzoglou, S. Lagan and multi-lagan: Efficient
goal is to design an approach incorporating all these tools for large-scale multiple alignment of ge-
operations to reconstruct phylogenies from genomes. nomic dna. Genome Research, 13:721–731, 2003
[8] Brudno, M., Malde, S., Poliakov, A., Do, C. B.,
Couronne, O., Dubchak, I., Batzoglou, S. Glocal
Acknowledgments alignment: finding rearrangements during align-
ment. Bioinformatics, 19:i54–i62, 2003
This paper is based upon work supported by the
U.S. Army Research Laboratory and the U.S. Army [9] Darling, A. C., Mau, B., Blattner, F. R., Perna,
Research Office under grant “Novel analytical and N. T. Mauve: Multiple alignment of conserved
empirical approaches to the origin and prediction of genomic sequence with rearrangements. Genome
pathogenicity” (#W911NF-05-1-0271) and NSF-ITR Research, 14:1394–1403, 2004
grant “Information technology research: Phyloinfor-
matics.” [10] Wheeler, W. C. Optimization alignment: The
end of multiple sequence alignment in phyloge-
netics? Cladistics, 12(1):1–9, 1996
References [11] Waterman, M. S. Introduction to Computational

Biology. Chapman and Hall, London, UK, first
[1] Felsenstein, J. Infering Phylogenies. Sinauer crc press edition, 2000
Associates, Sunderland, Massachusetts, 2004
[12] Vinh, L. S., Varon, A., Wheeler, W. C. Pair-
wise alignment with rearrangements. Genome
[2] Sankoff, D., Nadeau, J. H. Comparative informatics, 17(2):141–151, 2006
Genome: Empirical and Analytical Approaches
to Gene Order Dynamics, Map Alignment and [13] Wang, L., Jiang, T. On the complexity of mul-
the Evolution of Gene Families. Kluwer, first tiple sequence alignment. Journal of Computa-
edition, 2000 tional Biology, 1(4):337–348, 1994
7
[14] Wheeler, W., Aagesen, L., Arango, C. P., [24] Lai, M. Coronavirus: organization, replication
Faivovich, J., Grant, T., Dhaese, C., Janies, D., and expression of genome. Ann. Rev. Microb.,
Smith, W. L., Varon, A., Giribet, G. Dynamic 44:303–333, 1990
homology and phylogenetic systematics: a uni-
fied approach using poy. American Museum of [25] Maddison, W. P. Gene trees in species trees.
Natural History, New York, USA, 2006 Syst. Biol., 46:523–536, 1997
[26] Posada, D., Crandall, K. A., Holmes, E. C. Re-
[15] Altschul, S. F., Gish, W., Miller, W., Myers,
combination in evolutionary genomics. Annual
E. W., Lipman, D. J. Basic local alignment
Review of Genetic, 36:75–97, 2002
search tool. J. Mol. Biol., 215:403–410, 1990
[16] Brudno, M., Chapman, M., Gttgens, B., Bat-

zoglou, S., Morgenstern, B. Fast and sensitive
multiple alignment of large genomic sequences.
BMC Bioinformatics, 4:66, 2003
[17] Gusfield, D. Algorithms on Strings, Trees and

Sequences: Computer Science and Computa-
tional Biology. Cambridge University Press,
New York, USA, 1997
[18] Sankoff, D., Blanchette, M. Multiple genome re-

arrangement and breakpoint phylogeny. J. Com-
put. Biol., 5:555–570, 1998
[19] Hannenhalli, Pevzner, P. A. Transforming cab-

bage into turnip. (polynominal algorithm for
sorting signed permutations by reversals. Pro-
ceedings of the 27th Annual ACM-SIAM Sym-
posium on the Theory of Computing, 178–189.
1995
[20] Ksiazek, et al. A novel coronavirus associated

with severe acute respiratory syndrome. The
New England Journal of Medicine, 348:1953–
1966, 2003
[21] Narvas-Martin, SR., W. Sars: lessons learned

from other coronaviruse. Viral Immunolog,
16:461–474, 2003
[22] Li, et al. Bats are natural reservoirs of sars-like

coronaviruses. Science, 310:676–679, 2005
[23] Janies, D., Habib, F., Alexandrov, B., Pol, D.

Evolution of genomes and host shifts among sars
associated and related coronaviruses. Cladistics,
2007

Le Sy Vin Et Al - 2007 - Towards Phylogenomics Reconstruction

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Le Sy Vin Et Al - 2007 - Towards Phylogenomics Reconstruction

Uploaded by

Copyright:

Available Formats

Towards Phylogenomic Reconstruction

Transmissible gastroenteritis virus

Bat coronavirus BtCoV/279/2005

Murine hepatitis virus strain A59

Murine hepatitis virus strain 2

Human coronavirus OC43

Human coronavirus 229E

There remain conflicting reports on the animal Camelpox_CMS

Janies et al. (2007) confirm that small carnivores are

single-stranded, positive-sensed RNA molecule 27-31

References [11] Waterman, M. S. Introduction to Computational

[16] Brudno, M., Chapman, M., Gttgens, B., Bat-

[17] Gusfield, D. Algorithms on Strings, Trees and

[18] Sankoff, D., Blanchette, M. Multiple genome re-

[19] Hannenhalli, Pevzner, P. A. Transforming cab-

[20] Ksiazek, et al. A novel coronavirus associated

[21] Narvas-Martin, SR., W. Sars: lessons learned

[22] Li, et al. Bats are natural reservoirs of sars-like

[23] Janies, D., Habib, F., Alexandrov, B., Pol, D.

You might also like