
Phylogenetic Tree Construction

maximum parsimony and maximum likelihood

January 11, 2021 SHREEYA SHARMA | SECTION A | B.TECH BIOTECHNOLOGY


phylogenetic trees
Phylogenetic trees illustrate the evolutionary relationships among groups of
organisms, or among a family of related nucleic acid or protein sequences.
APPLICATIONS:
• Tree of life: analyzing the changes that have occurred during the evolution of different organisms
• Phylogenetic relationships among genes can help predict which ones might have similar functions (e.g., ortholog detection)
• Following changes in rapidly evolving species (e.g., HIV)
data used to create trees

morphological features or molecular data

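• Distance-based: the input is a matrix of pairwise distances between the species
• Character-based: the input is the characters themselves, and each character (e.g., each alignment column) is examined separately. Maximum parsimony and maximum likelihood are both character-based.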
maximum parsimony
• Selects the tree that requires the smallest number of changes to explain the
observed differences
• “The best hypothesis is the one requiring the smallest number of
assumptions”
• This corresponds to the tree that has the least amount of homoplasy
maximum parsimony example
Character matrix (rows: taxa A to D; columns: sites 1 to 3)

      1 2 3
A :   A A A
B :   A C A
C :   C A C
D :   C C C

[Figures: the candidate trees for the four taxa, labeled I, II and III]
steps of maximum parsimony
• Construct all possible trees
• Determine the length of each tree
• Find the shortest tree
• If more than one tree has the shortest length, those trees are equally parsimonious
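The steps above can be tried on the four-taxon example matrix. The following is a minimal sketch, not the method used on the slides: it assumes the three possible unrooted topologies for four taxa, and it finds each site's length by brute force over the two internal nodes rather than with a faster algorithm. The names `ALIGNMENT`, `TOPOLOGIES`, `site_length` and `tree_length` are illustrative.

```python
from itertools import product

STATES = "ACGT"

# Character matrix from the example: taxon -> states at sites 1 to 3.
ALIGNMENT = {"A": "AAA", "B": "ACA", "C": "CAC", "D": "CCC"}

# The three unrooted topologies for four taxa, each written as two pairs
# of taxa joined by one internal branch. Their order here is arbitrary and
# is not claimed to match the labels I, II and III on the slides.
TOPOLOGIES = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]


def site_length(a, b, c, d):
    """Minimum number of changes for one site on the tree (a,b)-u-v-(c,d).

    u and v are the two internal nodes; every assignment of states to them
    is tried and the cheapest one is kept.
    """
    return min(
        (a != u) + (b != u) + (u != v) + (c != v) + (d != v)
        for u, v in product(STATES, repeat=2)
    )


def tree_length(topology):
    """Sum the per-site lengths over all sites of the alignment."""
    (a, b), (c, d) = topology
    n_sites = len(ALIGNMENT[a])
    return sum(
        site_length(ALIGNMENT[a][i], ALIGNMENT[b][i],
                    ALIGNMENT[c][i], ALIGNMENT[d][i])
        for i in range(n_sites)
    )


# Steps 1 and 2: build every tree and determine its length.
lengths = {topology: tree_length(topology) for topology in TOPOLOGIES}

# Steps 3 and 4: keep the shortest tree, or all of them if there is a tie.
shortest = min(lengths.values())
best = [topology for topology, n in lengths.items() if n == shortest]

for topology, n in lengths.items():
    print(topology, "length =", n)
print("most parsimonious:", best)
```

On this matrix the tree that groups A with B and C with D needs 4 changes, while the other two need 5 and 6, so there is a single most parsimonious tree.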
maximum parsimony - challenges

• how to find the trees


• how to calculate length
find the possible trees
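Constructing all possible trees is hard because their number grows very quickly with the number of taxa. As a rough illustration, the sketch below counts the unrooted, fully bifurcating trees for n taxa with the standard double-factorial formula (2n - 5)!!. The function name `count_unrooted_trees` is an assumption made for this example.

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees for n_taxa labeled taxa.

    Uses (2n - 5)!! = 3 * 5 * ... * (2n - 5), valid for n_taxa >= 3.
    """
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for factor in range(3, 2 * n_taxa - 4, 2):  # 3, 5, ..., 2n - 5
        count *= factor
    return count


for n in (4, 5, 10):
    print(n, "taxa:", count_unrooted_trees(n), "unrooted trees")
```

Four taxa give 3 trees, five give 15, and ten already give 2,027,025, which is why exhaustive search is only practical for small data sets.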
find tree length - Fitch's Algorithm
• Root the tree at any internal node or branch
• Visit an internal node (x) whose state set (nucleotides etc.) is not yet defined but
whose immediate descendants (y, z) both have defined state sets
• If the state sets of y and z share any states, assign their intersection to x
• If they share no states, assign the union of the two sets to x and add one to the
tree length
• Repeat until the root has been reached
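The bottom-up pass can be sketched in a few lines for a single character. This is an illustrative version under stated assumptions, not a reference implementation: the tree is a nested pair of Python tuples, each leaf is a one-letter state, and a node receives the intersection of its children's sets when they overlap and the union, at a cost of one change, when they do not. The tree used is hypothetical; it was chosen only because it yields the node labels (A,C), (A,G) and (C,A,G) and a length of 3.

```python
def fitch(node):
    """Return (state_set, length) for one character on a rooted binary tree.

    A leaf is a one-letter string; an internal node is a 2-tuple of subtrees.
    """
    if isinstance(node, str):
        return {node}, 0  # a leaf carries its observed state and no changes

    left, right = node
    left_set, left_length = fitch(left)
    right_set, right_length = fitch(right)

    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: keep it, no extra change.
        return shared, left_length + right_length
    # No state in common: keep all candidates and count one change.
    return left_set | right_set, left_length + right_length + 1


# Hypothetical tree with leaf states A, C, C, A, G.
tree = (("A", "C"), ("C", ("A", "G")))
root_set, length = fitch(tree)
print(sorted(root_set), "length =", length)  # ['A', 'C'] length = 3
```

Summing the lengths returned for every character of the alignment gives the length of the whole tree.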
find tree length - Fitch's Algorithm
[Figure: worked example on one tree. Internal nodes are labeled in turn (A,C) with length = 1, (A,G) with length = 2 and (C,A,G) with length = 3; the last node is labeled (A,C) and the length stays at 3.]
maximum likelihood
• choose the tree which makes the data most probable
• The likelihood of a set of data, D, is the probability of the data, given a
hypothesis, θ. The hypothesis will usually come in the form of different
parameters. We denote the likelihood, L, of a set of data, D, as

L = P(D | θ)
maximum likelihood - coin toss

• Data: coin tossed 10 times - 7 heads and 3 tails


• Model: probability of heads = p; probability of tails = (1-p)
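• Probability of getting h heads in n tosses: P(h | n, p) = C(n, h) * p^h * (1-p)^(n-h), where C(n, h) is the number of ways to choose which h of the n tosses are heads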
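The coin example can be checked numerically. The sketch below assumes the usual binomial model for independent tosses and simply evaluates the likelihood of the data (7 heads in 10 tosses) on a grid of candidate values of p. The grid step of 0.01 and the function name `likelihood` are choices made for this illustration.

```python
from math import comb


def likelihood(p, heads=7, tosses=10):
    """Binomial probability of the observed data for a given p."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)


# Candidate values p = 0.00, 0.01, ..., 1.00
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=likelihood)

print("p = 0.5 ->", round(likelihood(0.5), 4))
print("best p  =", best_p, "->", round(likelihood(best_p), 4))
```

The likelihood peaks at p = 0.7, the observed fraction of heads, with a value of about 0.267; a fair coin (p = 0.5) gives about 0.117.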
maximum likelihood

• Likelihood (model) = Probability (Data|Model)


• Maximum likelihood = set of parameter values that give the highest
possible likelihood
maximum likelihood in phylogeny
• Observed data: a multiple sequence alignment

• Model:
A model of how one ancestral sequence has evolved into the three aligned
sequences
maximum likelihood in phylogeny
• Model parameters
⚬ Tree topology and branch lengths
⚬ Nucleotide frequencies (π)
⚬ Nucleotide substitution rates
computing probability of one column

• Assume that there were ancestral states that evolved to give these
nucleotides
• Assume a tree topology, branch lengths and other parameters, and start
computing at any node.
computing probability of one column
• We assumed that the ancestral nucleotides were both A. But we don't know
that for sure
• Redo the computation for all possible combinations of the ancestral
nucleotides
• Probability should be summed up for all the possible combinations
• Since we have 2 internal nodes and 4 nucleotides, there will be 16 possible
combinations
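The sum over the 16 combinations can be written out directly. The sketch below rests on several assumptions that are not in the slides: a rooted tree ((s1, s2), s3) with a root r and one further internal node v, the Jukes-Cantor (JC69) substitution model with equal nucleotide frequencies, and one shared branch length t = 0.1 on every branch. The names `jc69` and `column_likelihood` are illustrative.

```python
from itertools import product
from math import exp

BASES = "ACGT"
PI = {base: 0.25 for base in BASES}  # equal frequencies, as JC69 assumes


def jc69(x, y, t):
    """Probability that x is replaced by y along a branch of length t.

    t is measured in expected substitutions per site.
    """
    decay = exp(-4.0 * t / 3.0)
    if x == y:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def column_likelihood(column, t=0.1):
    """Likelihood of one alignment column (s1, s2, s3) on ((s1, s2), s3).

    r is the root and v is the ancestor of s1 and s2. Both are unknown,
    so the probability is summed over all 4 x 4 = 16 combinations.
    """
    s1, s2, s3 = column
    total = 0.0
    for r, v in product(BASES, repeat=2):
        total += (
            PI[r]
            * jc69(r, v, t)   # root to internal node
            * jc69(v, s1, t)  # internal node to first sequence
            * jc69(v, s2, t)  # internal node to second sequence
            * jc69(r, s3, t)  # root to third sequence
        )
    return total


print("AAA:", column_likelihood("AAA"))
print("ACG:", column_likelihood("ACG"))
```

With short branches a conserved column such as AAA is far more probable than a column with three different nucleotides, and the likelihoods of all 64 possible columns add up to 1.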
computing probability of an entire alignment
• The probabilities of the individual columns are multiplied to give the probability of
the entire alignment
L = L1 * L2 * ... * Ln
• Phylogeny software sums the logarithms of the column likelihoods instead, to
prevent numerical underflow
ln(L) = ln(L1) + ln(L2) + ... + ln(Ln)
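A short numerical check shows why the logarithms are summed. The per-column likelihoods below are made-up values (1000 columns, each with likelihood 1e-5), chosen only to trigger the problem in double-precision floating point.

```python
import math

# Made-up data: 1000 alignment columns, each with likelihood 1e-5.
column_likelihoods = [1e-5] * 1000

# Direct product: the true value, 1e-5000, is too small for a float.
product = 1.0
for value in column_likelihoods:
    product *= value

# Sum of logs: the same quantity on a log scale, well within range.
log_likelihood = sum(math.log(value) for value in column_likelihoods)

print("product of likelihoods:", product)
print("sum of log-likelihoods:", log_likelihood)
```

The product collapses to 0.0, so trees could no longer be compared, while the log-likelihood stays a finite number near -11513.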
maximum likelihood - advantages
• The method is well suited to analyzing a simple data set
containing genetic information.
• When the degree of variance among the genetic data is low, the
maximum likelihood scores are reliable.
• The results generated through maximum likelihood can further confirm the
maximum parsimony result for a particular phylogenetic relationship.
• Maximum likelihood analysis can therefore act as a confirmatory test.
maximum likelihood - disadvantages

• This method is slow and computationally intensive.


• In the absence of a single data set, the error output is high.
• Thus, it also makes reproducibility of the results more difficult
maximum likelihood vs maximum parsimony
• Maximum parsimony uses fewer characters (only the parsimony-informative sites), while
maximum likelihood uses every site in the alignment
• Maximum parsimony has a higher number of branches as compared to
maximum likelihood
• Maximum likelihood is generally considered more reliable than maximum parsimony
