Session 16
Molecular Phylogeny
• Molecular phylogeny is the study of the evolutionary relationships
among organisms or molecules using the techniques of molecular
biology.
• Any phylogenetic tree provides two pieces of information:
• Topology of a tree: defines the relationships of the proteins represented in the
tree.
• The branch lengths: reflect the degree of relatedness of the objects in the
tree.
Topologies and branches of trees
• A phylogenetic tree is a graph composed of branches and nodes.
• A branch (edge) connects two nodes.
• Nodes (taxonomic units) represent sequences, for example the protein sequence of an organism.
• An operational taxonomic unit (OTU) is an extant taxon present at an external
node or leaf.
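The graph view of a tree can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical four-OTU unrooted tree stored as a list of (node, node, branch length) edges. The OTUs are the external nodes (nodes with a single branch), X and Y are internal nodes, and summing branch lengths along the path between two OTUs shows how branch lengths express relatedness. The function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical unrooted tree ((A,B),(C,D)) with internal nodes X and Y.
EDGES = [("A", "X", 1.0), ("B", "X", 1.0), ("X", "Y", 0.5),
         ("C", "Y", 1.5), ("D", "Y", 1.0)]


def build_graph(edges):
    """Adjacency map: node -> {neighbor: branch length}."""
    adj = defaultdict(dict)
    for u, v, length in edges:
        adj[u][v] = length
        adj[v][u] = length
    return adj


def otus(adj):
    """OTUs are the external nodes (leaves), i.e. nodes with exactly one branch."""
    return sorted(node for node, nbrs in adj.items() if len(nbrs) == 1)


def path_length(adj, start, goal):
    """Sum of branch lengths between two nodes (a tree has exactly one path)."""
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nbr, length in adj[node].items():
            if nbr != parent:
                stack.append((nbr, node, dist + length))
    raise KeyError(goal)


tree = build_graph(EDGES)
print(otus(tree))                   # ['A', 'B', 'C', 'D']
print(path_length(tree, "A", "B"))  # 2.0
print(path_length(tree, "A", "C"))  # 3.0
```

A and B, which share the internal node X, are closer to each other than A is to C.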
Rooted and Unrooted tree
Rooted tree: shows ancestry, i.e., the evolutionary relationships between the OTUs and their common ancestor.
Unrooted tree: shows only the relatedness of the OTUs; useful when you are trying to understand the conservation or diversity of the sequences.
An unrooted tree can be converted to a rooted tree, but this requires an outgroup.
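Rooting with an outgroup can be sketched as follows. This assumes one common convention, placing the root at the midpoint of the outgroup's branch, and reuses a hypothetical four-OTU tree in which D is chosen as the outgroup; the node name "root" and the function name are illustrative. Once the root exists, every branch can be given a direction (ancestor to descendant) by a traversal outward from it.

```python
from collections import defaultdict, deque

# Hypothetical unrooted tree ((A,B),(C,D)); D is taken as the outgroup.
EDGES = [("A", "X", 1.0), ("B", "X", 1.0), ("X", "Y", 0.5),
         ("C", "Y", 1.5), ("D", "Y", 1.0)]


def root_with_outgroup(edges, outgroup, root="root"):
    """Place a root at the midpoint of the outgroup's branch and orient all
    branches away from it. Returns {node: [(child, branch length), ...]}."""
    adj = defaultdict(dict)
    for u, v, length in edges:
        adj[u][v] = length
        adj[v][u] = length
    if len(adj[outgroup]) != 1:
        raise ValueError("the outgroup must be an external node (leaf)")
    [(neighbor, length)] = adj[outgroup].items()
    # Split the outgroup's branch in two and attach the root in the middle.
    del adj[outgroup][neighbor], adj[neighbor][outgroup]
    for node in (outgroup, neighbor):
        adj[root][node] = adj[node][root] = length / 2
    # Breadth-first traversal from the root gives every branch a direction.
    children, seen, queue = {}, {root}, deque([root])
    while queue:
        node = queue.popleft()
        children[node] = []
        for nbr, blen in adj[node].items():
            if nbr not in seen:
                seen.add(nbr)
                children[node].append((nbr, blen))
                queue.append(nbr)
    return children


rooted = root_with_outgroup(EDGES, "D")
print(rooted["root"])  # [('D', 0.5), ('Y', 0.5)]
print(rooted["Y"])     # [('X', 0.5), ('C', 1.5)]
```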
Types of rooted trees
• Cladograms: branch lengths have no meaning
• Phylograms: branch lengths represent evolutionary change
• Ultrametric trees: branch lengths represent time, and the distance from the root to every leaf is the same
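The ultrametric property can be checked directly: add up the branch lengths from the root to each leaf and compare. This is a minimal sketch assuming a rooted tree stored as nested tuples (name, branch length to parent, children); both example trees are hypothetical. A cladogram would not be checked this way at all, since its branch lengths carry no meaning.

```python
import math

# A rooted tree as nested tuples: (name, branch length to parent, [children]).
ULTRAMETRIC = ("root", 0.0, [("AB", 1.0, [("A", 2.0, []), ("B", 2.0, [])]),
                             ("C", 3.0, [])])
PHYLOGRAM = ("root", 0.0, [("AB", 1.0, [("A", 1.0, []), ("B", 2.5, [])]),
                           ("C", 3.0, [])])


def root_to_leaf(node, dist=0.0):
    """Return {leaf name: total branch length from the root to that leaf}."""
    name, length, children = node
    dist += length
    if not children:
        return {name: dist}
    out = {}
    for child in children:
        out.update(root_to_leaf(child, dist))
    return out


def is_ultrametric(tree, tol=1e-9):
    """True if every leaf is the same distance from the root."""
    dists = list(root_to_leaf(tree).values())
    return all(math.isclose(d, dists[0], abs_tol=tol) for d in dists)


print(root_to_leaf(PHYLOGRAM))                                 # {'A': 2.0, 'B': 3.5, 'C': 3.0}
print(is_ultrametric(ULTRAMETRIC), is_ultrametric(PHYLOGRAM))  # True False
```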
Enumerating Trees
Branch-and-bound method:
• Provides an exact algorithm for identifying the optimal tree (or trees) without performing
an exhaustive search.
• By considering, in each group, the tree with the shortest total branch length, it is possible to efficiently identify candidates for the optimal tree(s).
• An exhaustive search is not necessary for trees (or subtrees) that already have a worse score than the potential optimal tree.
• The name of the method refers to the bound that is reached once the search process has identified a subtree with a suboptimal score; that part of the search is then abandoned.
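To see why avoiding an exhaustive search matters, the number of possible trees can be computed. This sketch assumes the standard counting result that n OTUs admit (2n - 5)!! unrooted and (2n - 3)!! rooted fully bifurcating trees, where !! is the product of odd numbers; the function names are illustrative.

```python
def count_unrooted_trees(n: int) -> int:
    """Number of unrooted, fully bifurcating trees for n OTUs: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 OTUs")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # 3 * 5 * ... * (2n - 5)
        count *= k
    return count


def count_rooted_trees(n: int) -> int:
    """Rooted trees for n OTUs: (2n - 3)!!, which equals the unrooted count for n + 1."""
    return count_unrooted_trees(n + 1)


for n in (4, 6, 10):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
# 4 3 15
# 6 105 945
# 10 2027025 34459425
```

Already at ten OTUs there are over two million unrooted trees, which is why a method that can discard whole groups of trees without scoring them is useful.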
How does this algorithm work?
• The algorithm identifies the tree with the shortest total branch length for a dataset of sequences (i.e., the most parsimonious tree).
• This search occurs without evaluating all possible trees; instead, it performs a series of rearrangements of the topology.
• Once a tree with a particular score is obtained, that score serves as a bound: all trees (and subtrees) whose score already exceeds it are discarded, because extending them cannot lower their score.
• The algorithm iteratively updates this upper limit on the score and chooses the final tree.
Li et al, 2000
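A compact sketch of branch and bound for parsimony is shown below. It uses the standard stepwise-addition formulation (taxa are added one at a time to every branch of a partial tree) with Fitch parsimony as the score; both choices, the function names and the toy sequences are assumptions for illustration, not the specific procedure in the cited source. The bound is valid because adding a taxon to a tree can never decrease its parsimony length.

```python
from collections import defaultdict
from itertools import count


def _fitch(node, parent, adj, seqs, site):
    """Fitch's pass: (possible states, number of changes) in the subtree below node."""
    kids = [k for k in adj[node] if k != parent]
    if not kids:  # external node: the observed character
        return {seqs[node][site]}, 0
    (s1, c1), (s2, c2) = (_fitch(k, node, adj, seqs, site) for k in kids)
    common = s1 & s2
    return (common, c1 + c2) if common else (s1 | s2, c1 + c2 + 1)


def tree_length(edges, seqs):
    """Parsimony length of an unrooted binary tree given as an edge list."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    root = next(iter(seqs))  # root on a leaf's branch; Fitch length does not depend on the root
    (child,) = adj[root]
    total = 0
    for site in range(len(seqs[root])):
        states, changes = _fitch(child, root, adj, seqs, site)
        total += changes + (seqs[root][site] not in states)
    return total


def branch_and_bound(seqs):
    """Return (best length, list of most parsimonious trees) for >= 3 sequences."""
    taxa = list(seqs)
    best = {"score": float("inf"), "trees": []}
    new_node = count(1)  # ids for internal nodes (0 is the first one)

    def extend(edges, k):
        score = tree_length(edges, {t: seqs[t] for t in taxa[:k]})
        if score > best["score"]:
            return  # bound: adding more taxa can only lengthen this tree
        if k == len(taxa):
            if score < best["score"]:
                best["score"], best["trees"] = score, []
            best["trees"].append(edges)
            return
        for i, (u, v) in enumerate(edges):  # add taxa[k] on every branch
            w = next(new_node)
            rest = edges[:i] + edges[i + 1:]
            extend(rest + [(u, w), (w, v), (w, taxa[k])], k + 1)

    extend([(0, taxa[0]), (0, taxa[1]), (0, taxa[2])], 3)
    return best["score"], best["trees"]


# Hypothetical aligned sequences
SEQS = {"A": "AAGG", "B": "AAGC", "C": "TTGG", "D": "TTGC"}
score, trees = branch_and_bound(SEQS)
print(score, len(trees))  # 4 1
```

The single best tree groups A with B and C with D. Trees tied with the current best are kept, so all equally parsimonious trees are returned.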
A gene tree differs from a species tree in two ways
(1) The divergence of two genes from two species may have predated the speciation event, which causes the branch lengths to be overestimated.
(2) The topology of the gene tree may differ from that of the species tree. As a result, it may be difficult to reconstruct a species tree from a gene tree.
• A molecular clock may be applied to a gene tree in order to date the
time of gene divergence, but it cannot be assumed that this is also the
time that speciation occurred.
Li et al, 2000
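Dating a gene divergence with a molecular clock can be sketched with the standard strict-clock relation T = K / (2r), where K is the number of substitutions per site between two sequences and r is the substitution rate per site per year along each lineage. The formula is supplied here as background and the values below are hypothetical. As noted above, the T obtained is the time of gene divergence, which may predate speciation.

```python
def divergence_time(k: float, rate: float) -> float:
    """Time since two sequences diverged under a strict molecular clock.

    k    -- substitutions per site between the two sequences
    rate -- substitutions per site per year along each lineage
    Both lineages accumulate changes independently, hence the factor of 2.
    """
    return k / (2 * rate)


# Hypothetical values: K = 0.02 substitutions/site, r = 1e-9 per site per year
t_gene = divergence_time(0.02, 1e-9)
print(f"{t_gene:.2e} years")  # 1.00e+07 years
```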
A gene tree is not a reliable estimate of species relationships (even with a molecular clock)
• Reconstructing a phylogenetic tree based on a single protein (or gene) can therefore give results that are complicated to interpret.
• NJ: Neighbor-Joining
Session 17
Distance matrix
• Distance matrices can be generated using different methods.
• MEGA can compute a distance matrix.
• You can also compute it manually.
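As one simple way of computing distances manually, the sketch below uses the p-distance: the proportion of aligned sites at which two sequences differ. This is an assumption chosen for simplicity; it treats gaps as ordinary characters and applies no correction for multiple substitutions (models such as Jukes-Cantor add one). The toy alignment is hypothetical.

```python
from itertools import combinations


def p_distance(s1: str, s2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def distance_matrix(seqs: dict) -> dict:
    """Pairwise p-distances, stored for both orders of each pair."""
    d = {}
    for a, b in combinations(seqs, 2):
        d[(a, b)] = d[(b, a)] = p_distance(seqs[a], seqs[b])
    return d


# Hypothetical toy alignment
SEQS = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACCTTCGA"}
D = distance_matrix(SEQS)
print(D[("s1", "s2")], D[("s1", "s3")], D[("s2", "s3")])  # 0.125 0.375 0.25
```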
     A    B    C    D    E
B    5
C    4    7
D    7   10    7
E    6    9    6    5
F    8   11    8    9    8
Step 1: Calculate the net divergence r of each OTU, i.e., the sum of its distances to all other OTUs:
• r(A) = 5+4+7+6+8 = 30
• r(B) = 5+7+10+9+11 = 42
• r(C) = 4+7+7+6+8 = 32
• r(D) = 7+10+7+5+9 = 38
• r(E) = 6+9+6+5+8 = 34
• r(F) = 8+11+8+9+8 = 44
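Step 1 can be sketched in Python, assuming the matrix is stored as a symmetric dictionary keyed by OTU pairs; the names `LOWER`, `full_matrix` and `net_divergence` are illustrative only.

```python
# The six-OTU example matrix, entered row by row (lower triangle only).
LABELS = ["A", "B", "C", "D", "E", "F"]
LOWER = {"B": [5], "C": [4, 7], "D": [7, 10, 7], "E": [6, 9, 6, 5],
         "F": [8, 11, 8, 9, 8]}


def full_matrix(labels, lower):
    """Expand a lower-triangular matrix into a symmetric dict d[(i, j)]."""
    d = {}
    for i, a in enumerate(labels):
        for j, b in enumerate(labels[:i]):
            d[(a, b)] = d[(b, a)] = lower[a][j]
    return d


def net_divergence(labels, d):
    """r(i): the sum of the distances from OTU i to every other OTU."""
    return {a: sum(d[(a, b)] for b in labels if b != a) for a in labels}


D = full_matrix(LABELS, LOWER)
print(net_divergence(LABELS, D))
# {'A': 30, 'B': 42, 'C': 32, 'D': 38, 'E': 34, 'F': 44}
```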
Step 2: Calculate a modified matrix for each pair of OTUs i and j using the formula:
$$M(ij) = d(ij) - \frac{r(i) + r(j)}{N - 2}$$
where N is the number of OTUs (here N = 6).
• For the pair A,B:
$$M(AB) = d(AB) - \frac{r(A) + r(B)}{N - 2} = 5 - \frac{30 + 42}{4} = -13$$
• Calculate M(AC), M(AD) and M(AE).
Modified matrix
A B C D E
B -13
C -11.5 -11.5
D -10 -10 -10.5
E -10 -10 -10.5 -13
F -10.5 -10.5 -11 -11.5 -11.5
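The modified matrix above can be reproduced with a short sketch of Step 2 (helper names are illustrative, and the matrix is again stored as a symmetric dictionary):

```python
from itertools import combinations

LABELS = ["A", "B", "C", "D", "E", "F"]
LOWER = {"B": [5], "C": [4, 7], "D": [7, 10, 7], "E": [6, 9, 6, 5],
         "F": [8, 11, 8, 9, 8]}


def full_matrix(labels, lower):
    """Expand a lower-triangular matrix into a symmetric dict d[(i, j)]."""
    d = {}
    for i, a in enumerate(labels):
        for j, b in enumerate(labels[:i]):
            d[(a, b)] = d[(b, a)] = lower[a][j]
    return d


def modified_matrix(labels, d):
    """M(ij) = d(ij) - (r(i) + r(j)) / (N - 2) for every pair of OTUs."""
    n = len(labels)
    r = {a: sum(d[(a, b)] for b in labels if b != a) for a in labels}
    return {(a, b): d[(a, b)] - (r[a] + r[b]) / (n - 2)
            for a, b in combinations(labels, 2)}


M = modified_matrix(LABELS, full_matrix(LABELS, LOWER))
for b in LABELS[1:]:  # print the lower triangle row by row
    print(b, [M[(a, b)] for a in LABELS[:LABELS.index(b)]])
# B [-13.0]
# C [-11.5, -11.5]
# D [-10.0, -10.0, -10.5]
# E [-10.0, -10.0, -10.5, -13.0]
# F [-10.5, -10.5, -11.0, -11.5, -11.5]
```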
Step 3: Choose as neighbors the two OTUs for which M(ij) is the smallest.
• These are A and B, and D and E (both have M = -13).
• Take A and B as neighbors and join them at a new node called U.
• Now calculate the branch lengths from the internal node U to the external OTUs A and B.
• The branch length from A to U:
$$S(AU) = \frac{d(AB)}{2} + \frac{r(A) - r(B)}{2(N - 2)} = \frac{5}{2} + \frac{30 - 42}{2 \times 4} = 1$$
• S(BU) = d(AB) - S(AU) = 5 - 1 = 4
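Step 3 can be sketched as follows: find the pair with the smallest M(ij), then split d(ij) between the two new branches. Ties are listed explicitly, and `min()` keeps the first tied pair in label order, which here reproduces the choice of A and B; the tie-breaking rule and helper names are assumptions for illustration.

```python
from itertools import combinations

LABELS = ["A", "B", "C", "D", "E", "F"]
LOWER = {"B": [5], "C": [4, 7], "D": [7, 10, 7], "E": [6, 9, 6, 5],
         "F": [8, 11, 8, 9, 8]}


def full_matrix(labels, lower):
    """Expand a lower-triangular matrix into a symmetric dict d[(i, j)]."""
    d = {}
    for row, a in enumerate(labels):
        for col, b in enumerate(labels[:row]):
            d[(a, b)] = d[(b, a)] = lower[a][col]
    return d


D = full_matrix(LABELS, LOWER)
N = len(LABELS)
R = {a: sum(D[(a, b)] for b in LABELS if b != a) for a in LABELS}
M = {(a, b): D[(a, b)] - (R[a] + R[b]) / (N - 2) for a, b in combinations(LABELS, 2)}

# Step 3: the pair with the smallest M(ij); min() returns the first of tied pairs.
smallest = min(M.values())
ties = [pair for pair, value in M.items() if value == smallest]
i, j = min(M, key=M.get)

# Branch lengths from the new internal node U to i and j.
s_iu = D[(i, j)] / 2 + (R[i] - R[j]) / (2 * (N - 2))
s_ju = D[(i, j)] - s_iu
print(ties)              # [('A', 'B'), ('D', 'E')]
print(i, j, s_iu, s_ju)  # A B 1.0 4.0
```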
Step 4: Define new distances from U to each other
terminal node:
• d(CU) = (d(AC) + d(BC) - d(AB)) / 2 = (4 + 7 - 5) / 2 = 3
• d(DU) = (d(AD) + d(BD) - d(AB)) / 2 = (7 + 10 - 5) / 2 = 6
• d(EU) = (d(AE) + d(BE) - d(AB)) / 2 = (6 + 9 - 5) / 2 = 5
• d(FU) = (d(AF) + d(BF) - d(AB)) / 2 = (8 + 11 - 5) / 2 = 7
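Step 4 in Python is a single dictionary comprehension over the remaining OTUs (a minimal sketch; the helper names are illustrative):

```python
LABELS = ["A", "B", "C", "D", "E", "F"]
LOWER = {"B": [5], "C": [4, 7], "D": [7, 10, 7], "E": [6, 9, 6, 5],
         "F": [8, 11, 8, 9, 8]}


def full_matrix(labels, lower):
    """Expand a lower-triangular matrix into a symmetric dict d[(i, j)]."""
    d = {}
    for row, a in enumerate(labels):
        for col, b in enumerate(labels[:row]):
            d[(a, b)] = d[(b, a)] = lower[a][col]
    return d


def distances_to_new_node(i, j, d, others):
    """d(kU) = (d(ik) + d(jk) - d(ij)) / 2 for each remaining OTU k."""
    return {k: (d[(i, k)] + d[(j, k)] - d[(i, j)]) / 2 for k in others}


D = full_matrix(LABELS, LOWER)
others = [k for k in LABELS if k not in ("A", "B")]
print(distances_to_new_node("A", "B", D, others))
# {'C': 3.0, 'D': 6.0, 'E': 5.0, 'F': 7.0}
```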
N = N - 1 = 5 (A and B have been replaced by U)
The entire procedure is repeated, starting at Step 1, with the new matrix.
• Create a new matrix:
     U    C    D    E
C    3
D    6    7
E    5    6    5
F    7    8    9    8
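Repeating Steps 1 to 4 until only two nodes remain can be sketched as one loop. This is a minimal sketch, not a reference implementation: new internal nodes are given hypothetical names U1, U2, ..., ties are broken by taking the first pair in label order, and the final two nodes are joined by a single branch.

```python
from itertools import combinations

LABELS = ["A", "B", "C", "D", "E", "F"]
LOWER = {"B": [5], "C": [4, 7], "D": [7, 10, 7], "E": [6, 9, 6, 5],
         "F": [8, 11, 8, 9, 8]}


def full_matrix(labels, lower):
    """Expand a lower-triangular matrix into a symmetric dict d[(i, j)]."""
    d = {}
    for row, a in enumerate(labels):
        for col, b in enumerate(labels[:row]):
            d[(a, b)] = d[(b, a)] = lower[a][col]
    return d


def neighbor_joining(labels, d):
    """Repeat Steps 1-4 until two nodes remain; return (parent, child, length) edges."""
    labels, d, edges = list(labels), dict(d), []
    new_id = 0
    while len(labels) > 2:
        n = len(labels)
        # Step 1: net divergence of every node
        r = {a: sum(d[(a, b)] for b in labels if b != a) for a in labels}
        # Step 2: modified matrix
        m = {(a, b): d[(a, b)] - (r[a] + r[b]) / (n - 2)
             for a, b in combinations(labels, 2)}
        # Step 3: join the pair with the smallest M (the first pair wins a tie)
        i, j = min(m, key=m.get)
        s_i = d[(i, j)] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        new_id += 1
        u = f"U{new_id}"
        edges += [(u, i, s_i), (u, j, d[(i, j)] - s_i)]
        # Step 4: distances from the new node to every remaining node
        others = [k for k in labels if k not in (i, j)]
        for k in others:
            d[(u, k)] = d[(k, u)] = (d[(i, k)] + d[(j, k)] - d[(i, j)]) / 2
        labels = others + [u]  # N = N - 1
    a, b = labels
    edges.append((a, b, d[(a, b)]))  # the last two nodes share one branch
    return edges


for parent, child, length in neighbor_joining(LABELS, full_matrix(LABELS, LOWER)):
    print(parent, child, length)
# U1 A 1.0
# U1 B 4.0
# U2 C 2.0
# U2 U1 1.0
# U3 D 3.0
# U3 E 2.0
# U4 F 5.0
# U4 U2 1.0
# U3 U4 1.0
```

The first join (U1 with branches 1 and 4) matches the hand calculation above. Because this example matrix is additive, the path lengths in the resulting tree reproduce the original distances (for example, A to F is 1 + 1 + 1 + 5 = 8).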