EVOLUTIONARY
PROCESSES I
Phylogenetic Trees
Alignments and Phylogeny: What’s the
connection?
Will the sequence alignment contain traces of the evolutionary history of
these sequences?
Which is larger: the basal rate of DNA mutation or the rate of PAM?
Vocabulary:
species trees
branches or edges
[Figure: rooted tree of taxa A to E, with the following labels]
Terminal nodes: represent the taxa (genes, populations, species, etc.) used to infer the phylogeny
Branches or lineages
Internal nodes or divergence points: represent hypothetical ancestors of the taxa
Ancestral node or root of the tree
Mutation Rates and Bacterial Growth
Even if only a single S. aureus cell were to make its way into your wound, it would take only
10 generations for that single cell to grow into a colony of more than 1,000 (2^10 = 1,024), and
just 10 more generations for it to erupt into a colony of more than 1 million (2^20 = 1,048,576).
For a bacterium that divides about every half hour (which is how quickly S. aureus can grow
in optimal conditions), that is a lot of bacteria in less than 12 hours. S. aureus has about 2.8
million nucleotide base pairs in its genome. At a rate of, say, 10^-10 mutations per nucleotide
base, that amounts to nearly 300 mutations in that population of bacteria within 10 hours!
To better understand the impact of this situation, think of it this way: With a genome size of
2.8 × 10^6 and a mutation rate of 1 mutation per 10^10 base pairs, it would take a single
bacterium 30 hours to grow into a population in which every single base pair in the genome
will have mutated not once, but 30 times! Thus, any individual mutation that could
theoretically occur in the bacteria will have occurred somewhere in that population, in just
over a day.
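The arithmetic for the first 20 generations can be checked with a short sketch. It assumes perfectly synchronous doubling from one founder cell and counts only the mutations expected when every genome in the final population is copied once; the function names are illustrative, not taken from any library.

```python
def population_after(generations: int, founders: int = 1) -> int:
    """Cells present after a number of synchronous doublings."""
    return founders * 2 ** generations


def expected_new_mutations(cells: int, genome_bp: float, rate_per_bp: float) -> float:
    """Expected mutations when `cells` genomes are each copied once."""
    return cells * genome_bp * rate_per_bp


GENOME_BP = 2.8e6          # S. aureus genome size quoted in the text
RATE_PER_BP = 1e-10        # illustrative mutation rate quoted in the text
MINUTES_PER_DIVISION = 30  # "divides about every half hour"

generations = 20
cells = population_after(generations)
hours = generations * MINUTES_PER_DIVISION / 60
mutations = expected_new_mutations(cells, GENOME_BP, RATE_PER_BP)

print(f"{generations} generations take {hours:.0f} h and give {cells:,} cells")
print(f"expected mutations in that population: about {mutations:.0f}")
```

Under these assumptions the population of 1,048,576 cells carries roughly 294 new mutations, in line with the "nearly 300" quoted above.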
Four different types of rooted phylogenetic tree
Note that for additive trees, the branch lengths are proportional to the number of
mutations that have occurred. Seems simple, but it's not. Why?
Three types of trees showing the same evolutionary relationships,
or branching orders, between the taxa
[Figure: the same three taxa (A, B and C) drawn as a cladogram, a phylogram (additive tree) and an ultrametric tree; the figure carries branch-length labels 6, 1, 1, 3 and 1.]
An ultrametric tree has all the properties of an additive tree and, in addition,
assumes the same constant rate of mutation along all branches. This last property
is often referred to as a molecular clock, because one can, in principle, measure
the actual times of evolutionary events from such trees.
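One way to make the clock assumption concrete is the three-point condition: in an ultrametric tree, for any three taxa the two largest of the three pairwise distances are equal. The sketch below tests a distance matrix for that property. The two matrices are made-up illustrations, not data from the figures.

```python
from itertools import combinations


def is_ultrametric(dist, tol=1e-9):
    """Return True if every triple of taxa satisfies the three-point condition."""
    n = len(dist)
    for i, j, k in combinations(range(n), 3):
        # Sort the three pairwise distances; the two largest must match.
        _, middle, largest = sorted((dist[i][j], dist[i][k], dist[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True


# Hypothetical clock-like distances: A and B are 2 apart, both are 6 from C.
clock_like = [
    [0, 2, 6],
    [2, 0, 6],
    [6, 6, 0],
]

# Hypothetical additive but non-clock-like distances: B has evolved faster than A.
not_clock_like = [
    [0, 4, 5],
    [4, 0, 7],
    [5, 7, 0],
]

print(is_ultrametric(clock_like))      # True
print(is_ultrametric(not_clock_like))  # False
```

Both matrices fit an additive tree, but only the first is consistent with equal rates on every lineage.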
Examples of a species tree and a gene tree
(A) A species tree showing the evolutionary relationships between seven eukaryotes, with one
more distantly related to the others (Hydra) used as an outgroup to root the tree. Xenopus is
a frog, Catostomus a fish, Drosophila a fruit fly, and Artemia the brine shrimp.
(B) The gene tree for the Na+–K+ ion pump membrane protein family members found in the
species shown in (A). In some species, e.g., Hydra and Xenopus, only one member of the
family is known, whereas other species, such as humans and chickens, have three
members. The small squares at nodes indicate gene duplications.
[Figure: tree with tips Taxon C, Taxon A, Taxon D and Taxon E.]
There is no meaning to the spacing between the taxa, or to the order in
which they appear from top to bottom.
[Figure: two alternative trees for humans and the great apes.]
Molecular view (time axis 14 MYA to 0; tips: Humans, Chimpanzees, Bonobos, Gorillas, Orangutans):
Mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA
hybridization all show that bonobos and chimpanzees are related more closely to
humans than either is to gorillas.
Pre-molecular view (time axis 15-30 MYA to 0; tips: Gorillas, Chimpanzees, Bonobos, Orangutans, Humans):
The pre-molecular view was that the great apes (chimpanzees, gorillas and
orangutans) formed a clade separate from humans, and that humans
diverged from the apes at least 15-30 MYA.
Did the Florida Dentist infect his patients with HIV?
[Figure: phylogenetic tree whose tip labels include Local control 3, Local control 9, Local control 35 and Patient D.]
Phylogenetic tree building (or inference) methods are aimed at
discovering which of the possible unrooted trees is "correct".
We would like this to be the true biological tree — that is, one
that accurately represents the evolutionary history of the taxa.
However, we must settle for discovering the computationally
correct or optimal tree for the phylogenetic method of choice.
The number of unrooted trees increases in a greater
than exponential manner with number of taxa
[Figure: example unrooted trees for 3, 4, 5 and 6 taxa (A to F).]
# Taxa (N)    # Unrooted trees
3             1
4             3
5             15
6             105
7             945
8             10,395
9             135,135
10            2,027,025
...           ...
30            ≈ 8.69 × 10^36
(2N - 5)!! = # unrooted trees for N taxa
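The double-factorial formula is easy to evaluate directly. The sketch below multiplies the odd numbers up to 2N - 5; the function name is illustrative.

```python
def unrooted_tree_count(n_taxa: int) -> int:
    """Number of unrooted bifurcating trees for n_taxa labelled taxa: (2N - 5)!!."""
    if n_taxa < 3:
        raise ValueError("the formula applies to 3 or more taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2N - 5 (the factor 1 is implicit).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (3, 4, 5, 6, 7, 10, 30):
    print(f"{n:>2} taxa: {unrooted_tree_count(n):.3e} unrooted trees")
```

Because each added taxon multiplies the count by a larger odd number, exhaustive search over all trees becomes impractical well before 30 taxa.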
Condensed trees showing well-supported features are derived by
applying the bootstrap procedure.
[Figure: condensed tree of taxa A to E.]
a. ((A,B,C),(D,E))
b. ((C,(A,B)),(D,E))
c. (C,(A,B),(D,E))
d. ((A,(B,C)),(D,E))
Best DNA alignments require the alignment of
amino acid sequences first
Which is the better
alignment?
(A) The possible transitions (blue) and transversions (red) between the four bases in DNA.
Note that there are twice as many ways of generating a transversion as a transition.
(B) The observed numbers of transitions, transversions, and total mutations in an aligned
set of cytochrome c oxidase subunit 2 (COII) mitochondrial gene sequences from the
mammalian subfamily Bovinae.
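The transition/transversion distinction can be written down in a few lines: a change within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any purine-pyrimidine change is a transversion. The sketch below counts both between two aligned sequences; the example sequences are invented, and columns containing gaps or ambiguity codes are simply skipped.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a: str, b: str) -> str:
    """Classify a pair of bases as identical, transition or transversion."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "identical"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_ts_tv(seq1: str, seq2: str):
    """Count (transitions, transversions) between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        kind = classify_substitution(a, b)
        if kind == "transition":
            transitions += 1
        elif kind == "transversion":
            transversions += 1
    return transitions, transversions


# All 12 possible directed changes: 4 are transitions, 8 are transversions.
changes = [classify_substitution(a, b) for a in "ACGT" for b in "ACGT" if a != b]
print(changes.count("transition"), changes.count("transversion"))  # 4 8

print(count_ts_tv("ACGTACGT", "GCGTTCGA"))  # (1, 2)
```

If substitutions were equally likely in every direction, transversions would be expected to outnumber transitions two to one, which is the baseline the observed counts in panel (B) can be compared against.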
Influence of selective pressure on the observed frequency of
synonymous and non-synonymous mutations
Positive Selection:
If a protein is more effective as a result of a non-synonymous
mutation, under selective pressure the mutation is likely to be
retained. Positive selection will result in a greater number of
non-synonymous mutations than expected.
Negative Selection:
If a mutation decreases the effective role of a protein,
under selective pressure it is likely to be lost. Negative
selection will result in fewer non-synonymous mutations
than expected, implying that change is being strongly
selected against; that is, the sequence is being
conserved.
In this case:
1st position: 3 of 3 possible changes are non-synonymous (N), so it counts as one N site
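The site-counting idea in the note above can be sketched as follows: at each codon position, the fraction of the three possible base changes that alter the amino acid is that position's contribution to the non-synonymous (N) site count, and the remainder goes to the synonymous (S) count. This is a simplified sketch using the standard genetic code; it treats changes to a stop codon as non-synonymous, which is an assumption and not necessarily how any particular program handles them.

```python
from fractions import Fraction

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def nonsynonymous_fraction(codon: str, position: int) -> Fraction:
    """Fraction of the 3 possible changes at `position` that alter the amino acid."""
    codon = codon.upper()
    original = CODON_TABLE[codon]
    changed = 0
    for base in BASES:
        if base == codon[position]:
            continue
        mutant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[mutant] != original:
            changed += 1
    return Fraction(changed, 3)


def site_counts(codon: str):
    """Return (N sites, S sites) for one codon; the two always sum to 3."""
    n_sites = sum(nonsynonymous_fraction(codon, p) for p in range(3))
    return n_sites, 3 - n_sites


for position in range(3):
    print(position + 1, nonsynonymous_fraction("TTT", position))
print(site_counts("TTT"))
```

For the phenylalanine codon TTT, all three changes at the 1st and 2nd positions alter the amino acid (one N site each), while at the 3rd position only two of three do, giving 8/3 N sites and 1/3 S sites in total.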
1. Perform a multiple sequence alignment (MSA). The quality of the MSA is a
critical determinant in the ultimate quality of the phylogenetic tree. Therefore,
every effort should be undertaken to use as much information as possible to
improve the alignment.
a) Structural information can be used to improve an automatic alignment
(e.g. ClustalW), which can then be adjusted manually.
b) Some researchers remove blocks of sequence where the alignments are
ambiguous.
2. To obtain information on DNA sequence evolution, convert the protein MSA to
an MSA of the corresponding DNA sequences, using the amino acid alignment
to guide the DNA sequence alignment. The resultant DNA sequence MSA
can provide information regarding the likely selective forces (positive versus
negative selection).
3. Use the MSA to generate a phylogenetic hypothesis using one of the many
phylogenetic tree generating programs currently available.
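Step 2 above can be sketched as threading each unaligned coding sequence onto its aligned protein: every residue takes the next codon, and every protein gap becomes a three-base gap. This is a minimal sketch, assuming the DNA is in frame, has no trailing stop codon, and translates exactly to the protein; the sequence names and sequences are invented for illustration.

```python
def thread_codons(aligned_protein: str, dna: str, gap: str = "-") -> str:
    """Build a codon alignment row from an aligned protein and its unaligned CDS."""
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(dna) != 3 * residues:
        raise ValueError("DNA length must be 3 x the number of residues")
    # Hand out codons one at a time, in order, as residues are encountered.
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    return "".join(gap * 3 if aa == gap else next(codons) for aa in aligned_protein)


# Hypothetical two-sequence protein alignment and the matching coding sequences.
protein_msa = {"seq1": "MK-L", "seq2": "MKAL"}
coding_dna = {"seq1": "ATGAAACTG", "seq2": "ATGAAAGCTCTG"}

dna_msa = {name: thread_codons(protein_msa[name], coding_dna[name]) for name in protein_msa}
for name, row in dna_msa.items():
    print(name, row)
```

Because gaps are only ever inserted in multiples of three, the reading frame is preserved in every row, which is what makes the DNA alignment usable for comparing synonymous and non-synonymous changes.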
The evolutionary history of a gene that has undergone two separate
duplication events
(A) A species tree is depicted by the pale blue cylinders, with the branch points (nodes)
in the cylinders representing speciation events. In the ancestral species a gene is
present as a single copy and has function a (blue). At some time a gene duplication
event occurs within the genome, producing two identical gene copies, one of which
subsequently evolves a different function, identified as b (red).