CG7 Trees
Based on lectures by
What is phylogenetic analysis and why
should we perform it?
Common Phylogenetic Tree Terminology
[Figure: rooted tree with five terminal nodes labeled A to E]
Terminal nodes: represent the taxa (genes, populations, species, etc.) used to infer the phylogeny.
Branches or lineages.
Internal nodes or divergence points: represent hypothetical ancestors of the taxa.
Ancestral node or root of the tree.
Phylogenetic trees diagram the evolutionary relationships between the taxa
[Figure: tree with tips Taxon A to Taxon E]
There is no meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.
Which species are the closest living relatives of modern humans?
[Figure, left tree: Humans, Chimpanzees, Bonobos, Gorillas, Orangutans; time axis from 14 MYA to 0]
Mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA hybridization all show that bonobos and chimpanzees are related more closely to humans than either is to gorillas.
[Figure, right tree: Gorillas, Chimpanzees, Bonobos, Orangutans, Humans; time axis from 15-30 MYA to 0]
The pre-molecular view was that the great apes (chimpanzees, gorillas and orangutans) formed a clade separate from humans, and that humans diverged from the apes at least 15-30 MYA.
Did the Florida Dentist infect his patients with HIV?
[Figure: phylogeny of HIV sequences with tips including Patient D, Local control 3 and Local control 35]
From Ou et al. (1992) and Page & Holmes (1998)
A few examples of what can be learned
from character analysis using phylogenies
as analytical frameworks:
[Figure: example trees with taxa A to E]
Inferring evolutionary relationships between
the taxa requires rooting the tree:
To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root.
[Figure: unrooted tree of taxa A, B, C and D with the root position marked, next to the resulting rooted tree]
Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.
Now, try it again with the root at another position:
[Figure: the same unrooted tree with the root at a different position, next to the resulting rooted tree]
An unrooted, four-taxon tree theoretically can be rooted in five different places to produce five different rooted trees
[Figure: unrooted tree 1, with A and B on one side of the internal branch and C and D on the other, and its five branches numbered 1 to 5]
[Figure: rooted trees 1a, 1b, 1c, 1d and 1e, one for each root position]
These trees show five different evolutionary relationships among the taxa!
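The count of five rootings can be checked with a short sketch. It assumes the unrooted tree joins A and B at one internal node and C and D at the other; the node names `u` and `v` and all function names are hypothetical, introduced only for this illustration.

```python
from collections import defaultdict

# Assumed unrooted tree 1: A and B meet at internal node "u", C and D at "v".
EDGES = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")]

def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def subtree(adj, node, parent):
    """Newick-style string for the part of the tree hanging below `node`."""
    children = sorted(subtree(adj, n, node) for n in adj[node] if n != parent)
    if not children:
        return node  # a terminal node (taxon)
    return "(" + ",".join(children) + ")"

def root_on_edge(adj, a, b):
    """Place the root in the middle of branch a-b."""
    sides = sorted([subtree(adj, a, b), subtree(adj, b, a)])
    return "(" + ",".join(sides) + ");"

def all_rootings(edges):
    """One rooted tree per branch of the unrooted tree."""
    adj = build_adjacency(edges)
    return [root_on_edge(adj, a, b) for a, b in edges]

if __name__ == "__main__":
    for tree in all_rootings(EDGES):
        print(tree)
```

Sorting the children only gives each rooted tree a canonical spelling, so that different rootings can be compared as strings.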
There are two major ways to root trees:
By outgroup:
Uses taxa (the outgroup) that are known to fall outside of the group of interest (the ingroup). Requires some prior knowledge about the relationships among the taxa. The outgroup can either be species (e.g., birds to root a mammalian tree) or previous gene duplicates (e.g., α-globins to root β-globins).
By midpoint or distance:
Roots the tree at the midway point between the two most distant taxa in the tree, as determined by branch lengths. Assumes that the taxa are evolving in a clock-like manner. This assumption is built into some of the distance-based tree building methods.
[Figure: tree of taxa A, B, C and D with branch lengths 10, 3, 2, 2 and 5]
d (A,D) = 10 + 3 + 5 = 18
Midpoint = 18 / 2 = 9
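The midpoint calculation can be sketched in a few lines. The topology below is an assumption read off the figure labels (A with branch length 10 and B with 2 on one side, an internal branch of 3, then C with 2 and D with 5); the internal node names `u` and `v` and the function names are hypothetical.

```python
# Assumed branch lengths; "u" and "v" are hypothetical internal nodes.
EDGES = {("A", "u"): 10, ("B", "u"): 2, ("u", "v"): 3, ("C", "v"): 2, ("D", "v"): 5}

def adjacency(edges):
    """Symmetric neighbour map with branch lengths."""
    adj = {}
    for (a, b), length in edges.items():
        adj.setdefault(a, {})[b] = length
        adj.setdefault(b, {})[a] = length
    return adj

def path(adj, start, goal, parent=None):
    """Nodes on the unique path from start to goal in a tree."""
    if start == goal:
        return [start]
    for nxt in adj[start]:
        if nxt != parent:
            rest = path(adj, nxt, goal, start)
            if rest:
                return [start] + rest
    return []

def midpoint_root(edges):
    """Return (largest tip-to-tip distance, branch holding the midpoint,
    distance of the midpoint from the first node of that branch)."""
    adj = adjacency(edges)
    leaves = sorted(n for n in adj if len(adj[n]) == 1)
    best = None
    for i, a in enumerate(leaves):
        for b in leaves[i + 1:]:
            nodes = path(adj, a, b)
            dist = sum(adj[x][y] for x, y in zip(nodes, nodes[1:]))
            if best is None or dist > best[0]:
                best = (dist, nodes)
    total, nodes = best
    half = total / 2
    walked = 0
    for x, y in zip(nodes, nodes[1:]):
        if walked + adj[x][y] >= half:
            return total, (x, y), half - walked
        walked += adj[x][y]

if __name__ == "__main__":
    print(midpoint_root(EDGES))
```

Under these assumed lengths the two most distant taxa are A and D (18 apart), so the root falls on the branch leading to A, 9 units from A.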
Each unrooted tree theoretically can be rooted
anywhere along any of its branches
Types of data used in phylogenetic inference:
Character-based methods: Use the aligned characters, such as DNA
or protein sequences, directly during tree inference.
Taxa Characters
Species A ATGGCTATTCTTATAGTACG
Species B ATCGCTAGTCTTATATTACA
Species C TTCACTAGACCTGTGGTCCA
Species D TTGACCAGACCTGTGGTCCG
Species E TTGACCAGTTCTCTAGTTCG
Example 2: Kimura 2-parameter distance
(estimate of the true number of substitutions between taxa)
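As a rough illustration, the Kimura 2-parameter distance can be computed from the proportions of transitions (P) and transversions (Q) between two aligned sequences, using d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)). The sketch below applies it to Species A and Species B from the alignment above; it assumes gap-free sequences of equal length, and the function names are hypothetical.

```python
import math

PURINES = {"A", "G"}

def count_changes(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    transitions = transversions = 0
    for x, y in zip(seq1, seq2):
        if x == y:
            continue
        if (x in PURINES) == (y in PURINES):
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    return transitions, transversions

def p_distance(seq1, seq2):
    """Observed proportion of differing sites."""
    ts, tv = count_changes(seq1, seq2)
    return (ts + tv) / len(seq1)

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter estimate of substitutions per site."""
    ts, tv = count_changes(seq1, seq2)
    p = ts / len(seq1)
    q = tv / len(seq1)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

SPECIES_A = "ATGGCTATTCTTATAGTACG"
SPECIES_B = "ATCGCTAGTCTTATATTACA"

if __name__ == "__main__":
    print(count_changes(SPECIES_A, SPECIES_B))
    print(p_distance(SPECIES_A, SPECIES_B))
    print(k2p_distance(SPECIES_A, SPECIES_B))
```

The corrected distance is larger than the observed proportion of differences because it allows for multiple substitutions at the same site.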
Computational methods for finding optimal trees:
Exact searches become increasingly difficult, and
eventually impossible, as the number of taxa increases:
[Figure: unrooted trees growing from three taxa (A, B, C) to six taxa (A to F)]
(2N - 5)!! = number of unrooted trees for N taxa
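To see how quickly the count grows, the double factorial can be evaluated directly. This is a small sketch; the function name is hypothetical.

```python
def num_unrooted_trees(n_taxa):
    """(2N - 5)!!: product of the odd numbers 1, 3, ..., 2N - 5."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count

if __name__ == "__main__":
    for n in (3, 4, 5, 6, 10, 20):
        print(n, num_unrooted_trees(n))
```

Four taxa give 3 trees, five give 15, six give 105, and ten already give more than two million.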
Heuristic search algorithms are input order dependent and can get stuck in local minima or maxima.
Rerunning heuristic searches using different input orders of taxa can help find global minima or maxima.
[Figure: search landscapes showing a search for the global minimum or maximum, with local and global optima marked]
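The effect of the starting point can be shown on a toy landscape. This is only an analogy: the list of scores is made up, a position in the list stands for a candidate tree, and a different starting position plays the role of a different input order of taxa. All names are hypothetical.

```python
# Made-up scores: position 2 is a local maximum, position 6 the global maximum.
LANDSCAPE = [1, 3, 5, 4, 2, 6, 9, 7]

def hill_climb(scores, start):
    """Move to the better neighbour until no neighbour improves the score."""
    pos = start
    while True:
        neighbours = [i for i in (pos - 1, pos + 1) if 0 <= i < len(scores)]
        best = max(neighbours, key=lambda i: scores[i])
        if scores[best] <= scores[pos]:
            return pos  # stuck: may be only a local maximum
        pos = best

def best_of_restarts(scores, starts):
    """Rerun the search from several starts and keep the best end point."""
    return max((hill_climb(scores, s) for s in starts), key=lambda i: scores[i])

if __name__ == "__main__":
    print(hill_climb(LANDSCAPE, 0))
    print(hill_climb(LANDSCAPE, 4))
    print(best_of_restarts(LANDSCAPE, [0, 4]))
```

A single run from position 0 stops at the local maximum, while the rerun from position 4 reaches the global one.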
Classification of phylogenetic inference methods
                         COMPUTATIONAL METHOD
DATA TYPE     Optimality criterion              Clustering algorithm
Characters    PARSIMONY, MAXIMUM LIKELIHOOD
Distances
Parsimony methods:
Optimality criterion: The most-parsimonious tree is the one that
requires the fewest evolutionary events (e.g., nucleotide
substitutions, amino acid replacements) to explain the sequences.
Advantages:
Are simple, intuitive, and logical (many possible by pencil-and-paper).
Can be used on molecular and non-molecular (e.g., morphological) data.
Can tease apart types of similarity (shared-derived, shared-ancestral, homoplasy)
Can be used for character (can infer the exact substitutions) and rate analysis.
Can be used to infer the sequences of the extinct (hypothetical) ancestors.
Disadvantages:
Are simple, intuitive, and logical (derived from Medieval logic, not statistics!)
Can be fooled by high levels of homoplasy (the same events arising independently).
Can become positively misleading in the Felsenstein Zone.
[See Stewart (1993) for a simple explanation of parsimony analysis, and Swofford
et al. (1996) for a detailed explanation of various parsimony methods.]
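One common way to count the changes a tree requires is Fitch's algorithm, sketched below for rooted binary trees. The slides do not say which counting method they use, so treat this as an assumed illustration; the tree `TREE` is a hypothetical example, not a tree from the lecture, and the alignment is the five-species alignment shown earlier.

```python
def fitch_site(tree, states):
    """Fitch pass for one character.
    tree: a taxon name, or a (left, right) tuple of subtrees.
    Returns (possible ancestral states, minimum number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one extra change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Sum the per-site change counts over all aligned positions."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(length)
    )

ALIGNMENT = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}
TREE = ((("A", "B"), "E"), ("C", "D"))  # hypothetical topology

if __name__ == "__main__":
    print(parsimony_score(TREE, ALIGNMENT))
```

Scoring several candidate topologies this way and keeping the lowest total is the optimality criterion described above.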
Branch and Bound
There are many trees...
THE TSP PROBLEM
(especially adapted to Israel).
A guard has to visit n checkpoints whose locations on a map are known. The problem is to find the shortest path that goes through all points exactly once (no need to come back to the starting point).
[Figure: search tree for five points. The top level lists the starting points 1 2 3 4 5, and each node branches to the points not yet visited.]
THE SHP NAIVE APPROACH
(1,2,3,4,5)
(1,2,3,5,4)
(1,2,4,3,5)
(1,2,4,5,3)
(1,2,5,3,4)
We can go over the list of all visiting orders and find the one giving the best score, which here means the shortest path.
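The naive approach amounts to scoring every permutation. A minimal sketch follows, assuming SHP means the shortest path through all points without returning to the start; the coordinates in `POINTS` are made up for illustration and the function names are hypothetical.

```python
import math
from itertools import permutations

# Hypothetical checkpoint coordinates (not from the lecture).
POINTS = {1: (0, 0), 2: (3, 0), 3: (3, 4), 4: (0, 4), 5: (6, 4)}

def path_length(order, points):
    """Total straight-line length of visiting the points in this order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))

def shortest_path_naive(points):
    """Score every visiting order and keep the shortest one."""
    return min(permutations(points), key=lambda order: path_length(order, points))

if __name__ == "__main__":
    best = shortest_path_naive(POINTS)
    print(best, path_length(best, POINTS))
```

With n points there are n! orders to score (120 for five points), which is why the exhaustive list stops being practical very quickly.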
A TSP GREEDY HEURISTIC
BNB SOLUTION TO SHP
[Figure: the same search tree, partly expanded. Top level: 1 2 3 4 5.]
Shortest path found so far = 15
Score here already 16: no point in expanding the rest of the subtree
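The pruning rule on this slide can be sketched as a depth-first search that abandons a partial path as soon as it is no shorter than the best complete path found so far. The coordinates are made up and the names are hypothetical; the bound is valid here because adding a step can never shorten a path.

```python
import math

# Hypothetical checkpoint coordinates (not from the lecture).
POINTS = {1: (0, 0), 2: (3, 0), 3: (3, 4), 4: (0, 4), 5: (6, 4)}

def shortest_path_bnb(points):
    """Branch and bound over visiting orders.
    Returns the best length, its order, and how many nodes were cut off."""
    best = {"length": math.inf, "order": None, "pruned": 0}

    def extend(order, length, remaining):
        if length >= best["length"]:
            best["pruned"] += 1   # bound: this subtree cannot beat the record
            return
        if not remaining:
            best["length"], best["order"] = length, order  # new record
            return
        for nxt in sorted(remaining):
            step = math.dist(points[order[-1]], points[nxt])
            extend(order + (nxt,), length + step, remaining - {nxt})

    for start in sorted(points):
        extend((start,), 0.0, frozenset(points) - {start})
    return best

if __name__ == "__main__":
    print(shortest_path_bnb(POINTS))
```

The search still returns an optimal path, because a subtree is skipped only when every path in it is at least as long as the current record.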
Back to finding the MP tree
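The MP search tree
[Figure: search tree of partial trees. The root is the tree of taxa 1, 2 and 3; each child adds taxon 4 to one of its branches.]
4 is added to branch 1.
5 is added to branch 2.
A four-taxon tree has 5 branches, so taxon 5 can be added in 5 places.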
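The MP search tree
[Figure: the search tree with the parsimony score of each partial or complete tree]
Three taxa: 30
Four taxa: 43 55 39
Five taxa, below 43: 52 54 52 53 58
Five taxa, below 55: 61 56 59 61 69
Five taxa, below 39: 53 51 42 47 47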
MP-BNB
[Figure sequence: the scored search tree (30; then 43 55 39; then 52 54 52 53 58, 61 56 59 61 69 and 53 51 42 47 47), traversed step by step]
Step 1: Expand the subtree below 43. Its complete trees score 52 54 52 53 58. Best record = 52
Step 2: The partial tree scoring 55 is already worse than the best record, so the subtree below it (61 56 59 61 69) is not expanded.
Step 3: Expand the subtree below 39. Its complete trees score 53 51 42 47 47. Best record = 52 51 42
Result: the MP tree found has score 42.
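The MP-BNB traversal can be replayed on the scores from the slides. The sketch below stores the search tree as nested (score, children) pairs and relies on the fact that adding a taxon never lowers the parsimony score, so a partial tree that is no better than the record can be dropped together with its whole subtree. The data layout and function name are hypothetical.

```python
# Scores from the slides: 3-taxon tree, then 4-taxon trees, then 5-taxon trees.
SEARCH_TREE = (30, [
    (43, [(52, []), (54, []), (52, []), (53, []), (58, [])]),
    (55, [(61, []), (56, []), (59, []), (61, []), (69, [])]),
    (39, [(53, []), (51, []), (42, []), (47, []), (47, [])]),
])

def branch_and_bound(node, best=float("inf"), log=None):
    """Depth-first search that skips any subtree whose score cannot beat `best`.
    Returns the best complete-tree score and a log of records and prunings."""
    score, children = node
    if log is None:
        log = []
    if score >= best:
        log.append(("pruned", score))
        return best, log
    if not children:
        log.append(("record", score))   # a complete tree that beats the record
        return score, log
    for child in children:
        best, log = branch_and_bound(child, best, log)
    return best, log

if __name__ == "__main__":
    best, log = branch_and_bound(SEARCH_TREE)
    print(best)
    print([s for kind, s in log if kind == "record"])
```

The sequence of records matches the slides (52, then 51, then 42), and the five trees below 55 are never scored.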
And Now
Maximum Parsimony is Computationally Intractable