
Phylogenetics Basics

Molecular Evolution And Molecular Phylogenetics


What data is used to build trees?
• Traditionally:
• Fossil Records
• Morphological features (e.g. number of
legs, beak shape, etc.)
• Available only for certain species.
• Existing fossil data can be incomplete and their
collection is often limited by abundance,
habitat, geographic range, and other factors.
• The descriptions of morphological traits are
often ambiguous due to multiple genetic
factors. Thus, using fossil records to determine
phylogenetic relationships can often be biased.
• For microorganisms, fossils are essentially
nonexistent, which makes it impossible to study
phylogeny with this approach.
• Today
• Mostly molecular data (e.g. DNA and protein sequences)
• As organisms evolve, the genetic materials accumulate mutations over time causing phenotypic
changes.
• Because genes are the medium for recording the accumulated mutations, they can serve as
molecular fossils.
• Through comparative analysis of the molecular fossils from a number of related organisms, the
evolutionary history of the genes and even the organisms can be revealed.
• Advantages:
• Molecular data are more numerous than fossil records and easier to obtain.
• Sampling bias is much reduced, which helps to fill the gaps in real fossil records.
• More clear-cut and robust phylogenetic trees can be constructed with the molecular data.
Molecular Phylogenetics
• The advent of the genomic era with tremendous amounts of molecular
sequence data has led to the rapid development of molecular
phylogenetics.
• “study of evolutionary relationships of genes and other biological
macromolecules by analyzing mutations at various positions in their
sequences and developing hypotheses about the evolutionary
relatedness of the biomolecules”
Major Assumptions
• Based on the sequence similarity of the molecules, evolutionary
relationships between the organisms can often be inferred.
• To use molecular data to reconstruct evolutionary history
requires making a number of reasonable assumptions.
• The molecular sequences used in phylogenetic construction are
homologous.
• Phylogenetic divergence is assumed to be bifurcating: a parent branch splits into exactly two daughter branches at any given point.
• Each position in a sequence evolved independently.
• The variability among sequences is sufficiently informative for
constructing unambiguous phylogenetic trees.
Terminology
Monophyletic, Paraphyletic, and Polyphyletic
1. Monophyletic taxon
• A group composed of a collection of
organisms, including the most recent
common ancestor of all those
organisms and all the descendants of
that most recent common ancestor.
• A monophyletic taxon is also called
a clade.
• Examples : Mammalia, Aves (birds),
angiosperms, insects, etc.
2. Paraphyletic taxon
• A group composed of a collection of
organisms, including the most recent
common ancestor of all those
organisms.
• Unlike a monophyletic group, a
paraphyletic taxon does not include all
the descendants of the most recent
common ancestor.
• Examples : Traditionally defined
Dinosauria, fish, gymnosperms,
invertebrates, protists, etc.
3. Polyphyletic taxon
• A group composed of a collection of
organisms in which the most recent
common ancestor of all the included
organisms is not included, usually
because the common ancestor lacks the
characteristics of the group.
• Polyphyletic taxa are considered
"unnatural", and usually are reclassified
once they are discovered to be
polyphyletic.
• Examples : marine mammals, bipedal
mammals, flying vertebrates, trees, algae,
etc.
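The definition of a monophyletic taxon can be checked mechanically once a rooted tree is available. The sketch below is illustrative only: it assumes a tree written as nested tuples with strings as leaves, and the helper names `leaves` and `is_monophyletic` as well as the toy four-taxon tree are hypothetical, not part of the slides.

```python
def leaves(node):
    """Return the set of leaf labels below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return frozenset([node])
    result = frozenset()
    for child in node:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some node of the tree has exactly `group` as its set of leaves,
    i.e. the group is an ancestor together with all of its descendants."""
    group = frozenset(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Toy rooted tree (an assumption for illustration only).
toy_tree = ((("crocodile", "bird"), "lizard"), "mammal")

print(is_monophyletic(toy_tree, {"crocodile", "bird"}))    # a clade
print(is_monophyletic(toy_tree, {"crocodile", "lizard"}))  # leaves out "bird"
```

A group that fails this test is either paraphyletic or polyphyletic. Telling those two apart needs the extra information given in the definitions above, namely whether the most recent common ancestor belongs to the group.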
Gene Phylogeny Versus Species Phylogeny

• The traditional objective of a phylogenetic tree is to represent the evolutionary relationship between species.
• A species tree is a phylogenetic tree that represents the evolutionary pathways of a group of species.
• In a species tree, the branching point at an internal node represents a speciation event.
• Speciation: the origin of new species from previously existing ones.
• The evolution of a species is the combined result of evolution by multiple genes in a genome.
[Figure: species tree with leaves Species A to Species E; internal nodes are speciation events]
• A gene tree is a phylogenetic tree constructed from a single gene or protein sequence from each of the species under study.
• In a gene tree, an internal node indicates a gene duplication event.
• A gene phylogeny describes only the evolution of that particular gene or its encoded protein.
• Speciation and gene duplication are two events that may or may not coincide.
• Thus, to obtain a species phylogeny, phylogenetic trees from a variety of gene families need to be constructed to give an overall assessment of the species evolution.
[Figure: gene tree with leaves Gene A to Gene E; internal nodes are labeled as mutation events]
Consider the evolution of alpha-hemoglobins in human, chimp, and rat.

The phylogeny of the species can be inferred from the gene tree if the genes are orthologous.
The two events, mutation and speciation, are not expected to occur at the same time, so a single gene tree does not necessarily represent the species tree.
Counting Trees
• Topology: the branching pattern of a tree.
• There are many possible rooted and unrooted tree topologies for a sizable number of taxa.
• 3 taxa: 3 rooted and 1 unrooted tree topology
• 4 taxa: 15 rooted and 3 unrooted tree topologies
• The number of topologies grows rapidly as the number of taxa increases.
• For labeled bifurcating trees with n leaf nodes, there are
$N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$ total rooted trees (n ≥ 2) and
$N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$ total unrooted trees (n ≥ 3).


• Among labeled bifurcating trees, the number of unrooted trees
with n leaves is equal to the number of rooted trees with n – 1 leaves.
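The counts for labeled bifurcating trees are commonly written as double factorials, (2n - 3)!! rooted and (2n - 5)!! unrooted topologies for n leaves. The sketch below uses that form to reproduce the figures quoted for 3 and 4 taxa. The function names are hypothetical, chosen for illustration.

```python
def odd_double_factorial(k: int) -> int:
    """Product of the odd integers 1 * 3 * 5 * ... * k (returns 1 when k <= 1)."""
    result = 1
    for value in range(3, k + 1, 2):
        result *= value
    return result


def count_rooted_trees(n: int) -> int:
    """Number of labeled, rooted, bifurcating topologies for n leaves: (2n - 3)!!."""
    if n < 2:
        raise ValueError("need at least 2 leaves")
    return odd_double_factorial(2 * n - 3)


def count_unrooted_trees(n: int) -> int:
    """Number of labeled, unrooted, bifurcating topologies for n leaves: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    return odd_double_factorial(2 * n - 5)


for n in (3, 4, 5, 10):
    print(n, count_rooted_trees(n), count_unrooted_trees(n))
```

With 10 taxa there are already more than two million unrooted topologies, which is why exhaustive tree searches quickly become impractical.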
Phylogenetic Analysis
• Four steps to infer a phylogenetic tree based on homologous
sequences
1. Selection of sequences for analysis
2. Determine the evolutionary distances between the sequences and
build distance matrix
3. Tree building
4. Tree evaluation
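Step 2 can be illustrated with a small distance calculation. The sketch below computes the proportion of differing sites (p-distance) for each pair of aligned sequences and applies the Jukes-Cantor correction. The choice of Jukes-Cantor is an assumption made here for illustration, since the slides do not name a substitution model. The function names and the toy alignment are hypothetical.

```python
import math
from itertools import combinations

GAP = "-"


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, skipping columns where either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == GAP or b == GAP:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for nucleotide data: -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment: dict[str, str]) -> dict[frozenset, float]:
    """Corrected distance for every unordered pair of sequence names."""
    return {
        frozenset((a, b)): jukes_cantor(p_distance(alignment[a], alignment[b]))
        for a, b in combinations(alignment, 2)
    }


# Toy alignment (made-up sequences, for illustration only).
toy_alignment = {"x": "ACGT", "y": "ACGA", "z": "ACGT"}
for pair, dist in distance_matrix(toy_alignment).items():
    print(sorted(pair), round(dist, 4))
```

The corrected distance is always at least as large as the raw p-distance, because it accounts for multiple substitutions at the same site.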
1. Selection of sequences for analysis
• DNA:
• Very sensitive, with non-uniform mutation rates
• Higher phylogenetic signal
• Protein:
• Phylogenetic signal less pronounced than in DNA
• Better suited to constructing trees for evolutionarily distant species or genes
• More uniform mutation rates
• More character states
• RNA:
• rRNA is often used for constructing species trees
• rRNA is highly conserved among species
2. Multiple Sequence Alignment
• This is a critical step in the analysis, because in many cases aligning amino acids or nucleotides in a column implies that they share a common ancestor.

• If you misalign a group of sequences you will still be able to produce a tree, but it is unlikely to be biologically meaningful.
• Inspect the alignment to be sure that all sequences are homologous.
• Try different gap opening and gap extension penalties to improve the alignment.

• Use only those columns of the multiple alignment for which you have data for all organisms or sequences.
• Delete the columns for which this is not the case, i.e. columns that contain gaps.
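The column-filtering rule above can be sketched in a few lines. This example assumes the alignment is held as a dictionary from sequence name to aligned string and that gaps are written as "-" or ".". The function name `strip_gap_columns` and the toy sequences are hypothetical.

```python
def strip_gap_columns(alignment: dict[str, str], gap_chars: str = "-.") -> dict[str, str]:
    """Keep only the alignment columns in which every sequence has a residue."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("expected a non-empty alignment with equal-length sequences")
    names = list(alignment)
    # zip(*rows) walks the alignment column by column.
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if not any(ch in gap_chars for ch in col)]
    return {
        name: "".join(col[i] for col in kept)
        for i, name in enumerate(names)
    }


# Toy alignment (made-up sequences, for illustration only).
toy_alignment = {
    "seq1": "AC-GT",
    "seq2": "ACTG-",
    "seq3": "A-TGT",
}
print(strip_gap_columns(toy_alignment))
```

Dropping every gapped column is the strict rule stated above. It can discard a large share of a gappy alignment, so it is worth checking how many columns remain before building a tree.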
3. Tree building

                                    Character-based methods        Non-character-based methods
Methods based on an explicit        Maximum likelihood methods,    Pairwise distance methods
model of evolution                  Bayesian phylogeny
Methods not based on an explicit    Maximum parsimony methods      -
model of evolution
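As one concrete instance of a pairwise distance method, the sketch below implements UPGMA (average-linkage clustering). UPGMA is an assumption chosen here for brevity, since the slides do not name a specific distance algorithm. The sketch returns only the topology, as nested tuples, and omits branch lengths. The input format (a dictionary keyed by `frozenset` pairs) and the toy distances are hypothetical.

```python
from itertools import combinations


def upgma(names, distances):
    """Build a rooted topology by repeatedly joining the two closest clusters.

    names:     list of taxon labels (strings)
    distances: dict mapping frozenset({a, b}) -> distance for every pair of labels
    Returns the tree as nested 2-tuples, e.g. (("A", "B"), "C").
    """
    sizes = {name: 1 for name in names}   # cluster -> number of leaves it holds
    d = dict(distances)                   # working copy, extended as clusters merge
    while len(sizes) > 1:
        # Pick the pair of current clusters with the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance from the new cluster to each remaining cluster is the
        # size-weighted average of the distances from its two parts.
        for other in sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Toy distances (made-up numbers, for illustration only).
toy_names = ["A", "B", "C", "D"]
toy_distances = {
    frozenset(("A", "B")): 2.0,
    frozenset(("C", "D")): 4.0,
    frozenset(("A", "C")): 10.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "C")): 10.0,
    frozenset(("B", "D")): 10.0,
}
print(upgma(toy_names, toy_distances))
```

UPGMA assumes a constant rate of evolution across lineages. Other distance methods, such as neighbor joining, relax that assumption.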
4. Tree evaluation: bootstrapping
• Bootstrap analysis gives a way to judge the strength of support for clades on phylogenetic trees.
• It is used to evaluate the reliability of the inferred tree or, more precisely, the reliability of specific branches.
• Show bootstrap values on the branches of the phylogenetic tree.
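The bootstrap procedure can be sketched as follows: alignment columns are resampled with replacement to create pseudo-replicate alignments, a tree is built from each, and the support for a clade is the percentage of replicates in which it reappears. In this sketch `build_clades` is a hypothetical callback standing in for whatever tree-building method is used. It is assumed to take an alignment and return the set of clades (each a `frozenset` of taxon names) in the resulting tree.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    names = list(alignment)
    length = len(alignment[names[0]])
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in picks) for name in names}


def bootstrap_support(alignment, build_clades, n_replicates=100, seed=0):
    """Percentage of replicates in which each clade of the reference tree recurs.

    build_clades is a hypothetical, user-supplied function:
    alignment -> set of frozensets of taxon names.
    """
    rng = random.Random(seed)
    reference = build_clades(alignment)
    counts = {clade: 0 for clade in reference}
    for _ in range(n_replicates):
        replicate_clades = build_clades(bootstrap_replicate(alignment, rng))
        for clade in reference:
            if clade in replicate_clades:
                counts[clade] += 1
    return {clade: 100.0 * count / n_replicates for clade, count in counts.items()}
```

The values returned here are the bootstrap percentages usually drawn next to the corresponding branches. The number of replicates (100 by default in this sketch) is a tunable assumption.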
