• Examination of phylogeny to determine distance to characterized molecules
• Drawing conclusions regarding biological functions not otherwise apparent
• Multiple alignments vs. pairwise homology
[Figure: neighbor-joining guide tree for Hbb_Human, Hbb_Horse, Hba_Human, Hba_Horse and Myg_Whale, and the progressive alignment built by following the guide tree; alpha-helices marked]
1 Hbb_Human  PEEKSAVTALWGKVN--VDEVGG
2 Hbb_Horse  GEEKAAVLALWDKVN--EEEVGG
3 Hba_Human  PADKTNVKAAWGKVGAHAGEYGA
4 Hba_Horse  AADKTNVKAAWSKVGGHAGEYGA
5 Myg_Whale  EHEWQLVLHVWAKVEADVAGHGQ
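A guide tree is built from pairwise distances between the sequences. As a simplified sketch, the code below computes uncorrected p-distances (the proportion of differing residues, skipping columns where either sequence has a gap) for the five aligned sequences above. It assumes sequences 1 to 5 belong to the taxa in the order listed; real aligners usually derive distances from separate pairwise alignments and their own scoring, so their values will differ.

```python
from itertools import combinations

# Aligned sequences from the progressive-alignment figure ('-' marks a gap).
ALIGNMENT = {
    "Hbb_Human": "PEEKSAVTALWGKVN--VDEVGG",
    "Hbb_Horse": "GEEKAAVLALWDKVN--EEEVGG",
    "Hba_Human": "PADKTNVKAAWGKVGAHAGEYGA",
    "Hba_Horse": "AADKTNVKAAWSKVGGHAGEYGA",
    "Myg_Whale": "EHEWQLVLHVWAKVEADVAGHGQ",
}

def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues, ignoring columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(aln: dict) -> dict:
    """Symmetric dict-of-dicts of pairwise p-distances, zero on the diagonal."""
    names = list(aln)
    d = {n: {n: 0.0} for n in names}
    for a, b in combinations(names, 2):
        d[a][b] = d[b][a] = p_distance(aln[a], aln[b])
    return d

if __name__ == "__main__":
    dist = distance_matrix(ALIGNMENT)
    for a, b in combinations(ALIGNMENT, 2):
        print(f"{a:10s} {b:10s} {dist[a][b]:.3f}")
```

The two alpha-globins come out closest (3 differences in 23 columns) and myoglobin is far from all of them, which is the grouping the guide tree reflects.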
Creating multiple alignments
Multiple alignments: low gap penalties
Multiple alignments: truncated sequences
Multiple alignments: non-homologous sequences
Constructing phylogenies
• Stages in constructing phylogenies:
• Phylogenetic methods:
  • Algorithmic
    • Neighbor-joining
    • UPGMA
  • Tree-searching
    • Maximum parsimony
    • Maximum likelihood
    • Bayesian inference
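Algorithmic methods build a single tree directly from a distance matrix. Below is a minimal neighbor-joining sketch, assuming a symmetric matrix with a zero diagonal. At each step it joins the pair of nodes minimizing the standard Q criterion, Q(a, b) = (n - 2) d(a, b) - r(a) - r(b), where r is a node's row sum, then replaces the pair by a new internal node. The function name, tie-breaking (first pair found wins) and Newick output format are choices made for this example.

```python
def neighbor_joining(names, d):
    """Neighbor-joining on a symmetric distance matrix d (list of lists, zero diagonal).
    Returns the unrooted tree as a Newick string with branch lengths."""
    D = {a: {b: d[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(D[a][b] for b in nodes) for a in nodes}  # row sums (net divergence)
        # Find the pair (a, b) with the smallest Q value
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * D[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node u to a and to b
        la = 0.5 * D[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = D[a][b] - la
        u = f"({a}:{la:.3f},{b}:{lb:.3f})"
        # Distances from u to every remaining node
        D[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                D[u][c] = D[c][u] = 0.5 * (D[a][c] + D[b][c] - D[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    return f"({a},{b}:{D[a][b]:.3f});"

if __name__ == "__main__":
    # Additive distances generated by the tree ((A:1,B:2):1,(C:1,D:3))
    names = ["A", "B", "C", "D"]
    matrix = [[0, 3, 3, 5],
              [3, 0, 4, 6],
              [3, 4, 0, 4],
              [5, 6, 4, 0]]
    print(neighbor_joining(names, matrix))
    # expected: ((A:1.000,B:2.000),(C:1.000,D:3.000):1.000);
```

For additive distances like these, NJ recovers the generating tree and its branch lengths. UPGMA would instead assume a molecular clock and produce a rooted tree.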
Principles:
[Figure: example tree with branch lengths; taxa A, B and E labelled]
Disadvantages:
• The method lacks accuracy because there is no attempt to correct for potential
bias (homoplasy).
• The method lacks precision because the outcome is partly contingent on the tree
with which the search process begins.
Maximum parsimony (MP)
Principles:
• Searches through tree topologies in ‘tree-space’ using a ‘hill-climbing’ algorithm.
• Scores trees on their ‘length’, i.e., the number of character state changes required to
explain the distribution of characters on a given tree topology.
• Looks for the most parsimonious tree, i.e., the topology that requires the fewest character-state changes overall.
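Scoring a tree's "length" for a fixed topology can be sketched with Fitch's small-parsimony algorithm, one standard way to count the minimum number of changes per alignment column (the slides do not name an algorithm, so this choice is an assumption). Trees are written as nested tuples, and the alignment and taxon names in the example are hypothetical. The hill-climbing search through tree-space is not shown.

```python
def fitch(tree, states):
    """Fitch's pass on a rooted binary tree of nested tuples for one column.
    Returns (possible states at this node, minimum changes in the subtree)."""
    if isinstance(tree, str):                  # leaf: its observed state
        return {states[tree]}, 0
    left, right = tree
    ls, lc = fitch(left, states)
    rs, rc = fitch(right, states)
    common = ls & rs
    if common:                                 # children can agree: no extra change
        return common, lc + rc
    return ls | rs, lc + rc + 1                # children disagree: one change

def tree_length(tree, alignment):
    """Parsimony length: minimum changes summed over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )

if __name__ == "__main__":
    aln = {"A": "AAT", "B": "AAC", "C": "GGC", "D": "GGT"}   # hypothetical data
    for topology in [(("A", "B"), ("C", "D")),
                     (("A", "C"), ("B", "D")),
                     (("A", "D"), ("B", "C"))]:
        print(topology, tree_length(topology, aln))
    # lengths 4, 6 and 5: ((A,B),(C,D)) is the most parsimonious tree
```

Where the root is placed does not change the Fitch count, so this score also applies to the corresponding unrooted tree.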
Advantages:
• Generally accurate method with few assumptions.
• Phylogenetic hypotheses can be statistically tested by comparing the lengths of different
trees.
• Tree estimation is relatively fast and undemanding.
Disadvantages:
• There are typically several equally short trees, so their consensus topology may be ambiguous.
• There is no explicit model of evolution and so the method is prone to error under certain
circumstances, e.g., long-branch attraction (homoplasy).
Maximum likelihood (ML)
Principles:
• Looks for the tree that, under a given model of evolution, maximizes the likelihood of the
observed data.
• Applies a complex model of DNA or protein sequence evolution that estimates parameters for
specific substitutions and other properties of molecular sequences.
• Phylogenetic estimation within the likelihood framework provides a robust statistical context
in which to evaluate specific hypotheses.
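The likelihood idea can be shown on the smallest possible case: two aligned DNA sequences joined by a single branch of length d (expected substitutions per site) under the Jukes-Cantor (JC69) model, which is an assumption chosen for simplicity. The sketch maximizes the log-likelihood by a brute-force grid search and compares the result with the known closed-form estimate; real ML programs optimize many branch lengths and model parameters numerically over a whole tree.

```python
import math

def jc69_log_likelihood(d: float, n_sites: int, n_diff: int) -> float:
    """Log-likelihood (up to a constant) of two DNA sequences differing at n_diff
    of n_sites, separated by distance d, under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # P(second sequence has the same base at a site)
    p_diff = 0.25 - 0.25 * e   # P(it has one particular different base)
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)

def ml_distance(n_sites: int, n_diff: int, step: float = 1e-4, d_max: float = 5.0) -> float:
    """Maximize the likelihood by grid search over d in (0, d_max]."""
    grid = (i * step for i in range(1, int(d_max / step) + 1))
    return max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))

def jc69_closed_form(n_sites: int, n_diff: int) -> float:
    """Analytical ML estimate; valid when the proportion of differences p < 0.75."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

if __name__ == "__main__":
    # Hypothetical example: 20 differences in 100 aligned sites
    print(f"{ml_distance(100, 20):.3f} {jc69_closed_form(100, 20):.3f}")  # 0.233 0.233
```

The estimate (about 0.233) exceeds the raw proportion of differences (0.20) because the model accounts for multiple substitutions at the same site.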
Disadvantages:
• The complexity of the estimation process means that it is slow and computationally
demanding.
• The hill-climbing search is susceptible to local optima, so it is not guaranteed to return the
optimal (maximum-likelihood) tree.
Bootstrapping a tree
• Statistical estimate of the reliability of groupings
• Sites in the alignment are resampled with replacement to create pseudo-replicate alignments of the original length, and a tree is generated from each
• The process is repeated many times (typically 100-1000 replicates)
• Agreement among the resulting trees is summarized with a majority-rule consensus tree
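The bootstrap loop can be sketched for the simplest case of four taxa, where each tree is just one split (ab|cd). The tree builder `quartet_split` is a hypothetical stand-in: it picks the split with the smallest summed within-pair Hamming distance, which is the pairing neighbor-joining chooses for four taxa. A real analysis would rerun the full NJ, MP or ML method on every pseudo-replicate.

```python
import random
from collections import Counter

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def quartet_split(aln: dict) -> frozenset:
    """Stand-in tree builder for exactly four taxa: return the split ab|cd with the
    smallest d(a,b) + d(c,d). Ties go to the first split listed."""
    a, b, c, d = sorted(aln)
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def cost(split):
        (p, q), (r, s) = split
        return hamming(aln[p], aln[q]) + hamming(aln[r], aln[s])

    best = min(splits, key=cost)
    return frozenset(frozenset(side) for side in best)

def bootstrap_support(aln: dict, replicates: int = 1000, seed: int = 1) -> dict:
    """Fraction of bootstrap replicates that recover each split."""
    rng = random.Random(seed)
    n_sites = len(next(iter(aln.values())))
    counts = Counter()
    for _ in range(replicates):
        # Resample columns with replacement, keeping the original alignment length
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        pseudo = {t: "".join(seq[i] for i in cols) for t, seq in aln.items()}
        counts[quartet_split(pseudo)] += 1
    return {split: n / replicates for split, n in counts.items()}

def majority_rule(support: dict) -> list:
    """Splits present in more than half of the replicates."""
    return [split for split, f in support.items() if f > 0.5]

if __name__ == "__main__":
    aln = {"A": "ACGTACGTAA", "B": "ACGTACGTAG",   # hypothetical alignment
           "C": "ACGAACCTTG", "D": "ACGAACCTTA"}
    support = bootstrap_support(aln, replicates=500)
    for split, f in sorted(support.items(), key=lambda kv: -kv[1]):
        print(" | ".join("".join(sorted(side)) for side in split), f)
```

Support values are read as the percentage of replicates containing a grouping, and the majority-rule consensus keeps only groupings above 50%.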
Bayesian
Principles:
• Based on posterior probabilities: probabilities that are estimated after learning from the data, by combining prior expectations with a model of evolution.
• Uses a Markov chain Monte Carlo (MCMC) process to search through tree-space.
• Selects the tree topology with the highest posterior probability, given the data.
Advantages:
• Intuitive
• Potential for any complex model.
• Provides both parameter estimates (i.e., tree) and their probabilities in a single analysis.
• Many different hypotheses can be evaluated in a single analysis.
• The MCMC algorithm makes integrating over all parameter values computationally practical; because
MCMC sometimes accepts downhill moves, it is able to break out of local optima.
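The core MCMC step can be sketched on a one-parameter toy problem rather than a tree search: random-walk Metropolis sampling of the posterior of a JC69 distance between two DNA sequences. The exponential prior (rate 10), proposal width, chain length and burn-in are arbitrary assumptions for this example. Programs that sample trees add proposals that change topology and branch lengths, but the accept/reject rule is the same idea.

```python
import math
import random

def log_posterior(d, n_sites, n_diff, prior_rate=10.0):
    """Unnormalized log posterior of a JC69 distance d with an exponential prior."""
    if d <= 0.0:
        return float("-inf")
    e = math.exp(-4.0 * d / 3.0)
    p_same, p_diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    if n_diff > 0 and p_diff <= 0.0:          # d so small that p_diff underflows to zero
        return float("-inf")
    log_lik = (n_sites - n_diff) * math.log(p_same)
    if n_diff > 0:
        log_lik += n_diff * math.log(p_diff)
    log_prior = math.log(prior_rate) - prior_rate * d
    return log_lik + log_prior

def metropolis(n_sites, n_diff, n_iter=20000, burn_in=2000, step=0.05, seed=42):
    """Random-walk Metropolis sampler; returns post-burn-in samples and acceptance rate."""
    rng = random.Random(seed)
    d = 0.5                                   # arbitrary starting value
    current = log_posterior(d, n_sites, n_diff)
    samples, accepted = [], 0
    for i in range(n_iter):
        proposal = d + rng.gauss(0.0, step)   # symmetric proposal
        candidate = log_posterior(proposal, n_sites, n_diff)
        # Accept with probability min(1, posterior ratio): uphill moves always,
        # downhill moves sometimes, which is what lets a chain leave local optima.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            d, current = proposal, candidate
            accepted += 1
        if i >= burn_in:                      # discard the burn-in phase
            samples.append(d)
    return samples, accepted / n_iter

if __name__ == "__main__":
    samples, rate = metropolis(100, 20)       # hypothetical: 20 differences in 100 sites
    mean = sum(samples) / len(samples)
    print(f"posterior mean d = {mean:.3f}, acceptance rate = {rate:.2f}")
```

The retained samples approximate the whole posterior distribution, so a credible interval comes from their quantiles in the same run.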
Bayesian
Disadvantages:
• The MCMC must be run long enough for variation in the parameter estimates to smooth out or reach …
[Figure: example tree of taxa labelled Tb93 3, 4, 5, 6, 7, 8, 10, 11 and 13; scale bar 0.1]
Remember
[Figure: trees of eukaryotes and archaea rooted by a bacterial outgroup, with the labels "The root defines common ancestry", "Rooted by outgroup" and "Monophyletic group"]
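Monophyly is only defined once a tree is rooted: a group is monophyletic if it consists of a common ancestor and all of its descendants, i.e., if some node has exactly those taxa below it. A minimal sketch, assuming a rooted tree written as nested tuples; the outgroup, taxon names and topology in the example are hypothetical.

```python
def leaves(tree) -> frozenset:
    """Leaf names below a node; a leaf is a string, an internal node a tuple of children."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clades(tree) -> set:
    """Leaf sets of every node in a rooted tree (each node defines one clade)."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found

def is_monophyletic(tree, group) -> bool:
    """True if some node's descendants are exactly the taxa in group."""
    return frozenset(group) in clades(tree)

if __name__ == "__main__":
    # Hypothetical tree rooted by a bacterial outgroup
    tree = ("bacterium",
            (("archaeon_1", "archaeon_2"),
             ("eukaryote_1", ("eukaryote_2", "eukaryote_3"))))
    print(is_monophyletic(tree, {"eukaryote_1", "eukaryote_2", "eukaryote_3"}))  # True
    print(is_monophyletic(tree, {"archaeon_1", "eukaryote_1"}))                 # False
```

Moving the root to a different branch changes which leaf sets count as clades, which is why the outgroup choice matters for statements about common ancestry.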
Further details
Textbooks:
Software:
PhyML (ML): http://atgc.lirmm.fr/phyml/
PAUP* (NJ, MP, ML): http://paup.csit.fdsu.edu
PHYLIP (NJ, MP, ML): http://evolution.genetics.washington.edu/phylip.html
MrBayes (Bayesian): http://mrbayes.csit.fdsu.edu
SplitsTree (networks): http://www.splitstree.org
FindModel (model testing): http://www.hiv.lanl.gov/content/sequence/findmodel/findmodel.html
Websites:
MultiPhyl (ML via email): http://distributed.cs.nuim.ie/multiphyl.php