Maximum Parsimony and Likelihood

Phylogenetic Tree Construction
maximum parsimony and maximum likelihood
January 11, 2021 SHREEYA SHARMA | SECTION A | B.TECH BIOTECHNOLOGY

phylogenetic trees
Phylogenetic trees illustrate the evolutionary relationships among groups of
organisms, or among a family of related nucleic acid or protein sequences
APPLICATIONS:
• Tree of life: Analyzing changes that have occurred in evolution of different organisms
• Phylogenetic relationships among genes can help predict which ones might have similar functions (e.g., ortholog
detection)
• Follow changes occurring in rapidly changing species (e.g., HIV)
data used to create trees
morphological features or molecular data
• Distance-based: The input is a matrix of distances between the species

• Character-based: Examine each character separately.
maximum parsimony
• Selects the tree that requires the smallest number of changes to explain the observed
differences
• “The best hypothesis is the one requiring the smallest number of
assumptions”
• This corresponds to the tree that has the least amount of homoplasy
maximum parsimony example
Character matrix (taxa A to D, sites 1 to 3):
    1 2 3
A : A A A
B : A C A
C : C A C
D : C C C
[Figure: candidate trees I, II and III for taxa A to D, each evaluated against the matrix above]
steps of maximum parsimony
• Construct all possible trees
• Determine the length of each tree
• Find the shortest tree
• If more than one tree has the shortest length, those trees are equally parsimonious
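The steps above can be sketched for the example matrix (taxa A to D, three sites). This is a minimal illustration, not the method of any particular package: the topology names, the `merge` helper and `quartet_length` are hypothetical, and the changes per site are counted with the set-merging rule of Fitch's algorithm. For four taxa there are exactly three unrooted topologies, so all of them can be listed by hand.

```python
# Example matrix from the slides: one string of three characters per taxon.
SEQS = {"A": "AAA", "B": "ACA", "C": "CAC", "D": "CCC"}

# The three unrooted topologies for four taxa, written as two pairs of
# sister taxa. The names are illustrative labels only.
TOPOLOGIES = {
    "(A,B),(C,D)": (("A", "B"), ("C", "D")),
    "(A,C),(B,D)": (("A", "C"), ("B", "D")),
    "(A,D),(B,C)": (("A", "D"), ("B", "C")),
}


def merge(s, t):
    """Fitch rule: intersection at no cost, otherwise union at a cost of 1."""
    common = s & t
    return (common, 0) if common else (s | t, 1)


def quartet_length(topology, seqs):
    """Total number of changes the topology needs, summed over all sites."""
    (w, x), (y, z) = topology
    total = 0
    for i in range(len(seqs[w])):
        left, c1 = merge({seqs[w][i]}, {seqs[x][i]})
        right, c2 = merge({seqs[y][i]}, {seqs[z][i]})
        _, c3 = merge(left, right)
        total += c1 + c2 + c3
    return total


# Step 1 and 2: score every tree. Step 3 and 4: keep all trees of minimal length.
lengths = {name: quartet_length(t, SEQS) for name, t in TOPOLOGIES.items()}
best = min(lengths.values())
shortest = [name for name, length in lengths.items() if length == best]

print(lengths)
print(shortest)
```

Under these assumptions the grouping of A with B and C with D needs the fewest changes, because sites 1 and 3 both support it while only site 2 conflicts with it.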
maximum parsimony - challenges
• how to find the trees

• how to calculate length
find the possible trees
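Finding the trees is a challenge because the number of candidates grows very quickly with the number of taxa. The sketch below uses the standard count of unrooted bifurcating trees for n taxa, (2n-5)!! = 3 × 5 × ... × (2n-5); this formula is not stated in the slides, and the function name is hypothetical.

```python
def num_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees for n_taxa labelled leaves."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 5 + 1, 2):
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n))
```

With four taxa there are only three trees, but with ten taxa there are already more than two million, which is why exhaustive search is limited to small data sets and larger ones rely on heuristic searches.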
find tree length - Fitch's Algorithm
• root the tree at any internal node or branch
• Visit an internal node (x) whose state set (nucleotides etc.) is not yet defined but
whose immediate descendants (y, z) both have defined state sets
• If the state sets of y and z share at least one state, assign their intersection to x
• If they share no state, assign the union of the two sets to x and add one to the tree
length
• Repeat until the root has been reached
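[Figure: worked example of Fitch's algorithm. Internal nodes are assigned the state sets (A,C) at length = 1, (A,G) at length = 2 and (C,A,G) at length = 3; the root is assigned (A,C) and the final tree length is 3]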
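A minimal recursive sketch of Fitch's algorithm for a single site is given below. The nested-tuple tree encoding and the function name `fitch` are assumptions made for illustration. The five-leaf tree is hypothetical as well: it was chosen so that the intermediate state sets match the ones listed above, (A,C), (A,G) and (C,A,G), but the tree in the original figure cannot be recovered from the text.

```python
def fitch(node):
    """Return (state_set, length) for the subtree rooted at node.

    A leaf is a one-character string; an internal node is a (left, right) tuple.
    """
    if isinstance(node, str):
        return {node}, 0
    left_set, left_len = fitch(node[0])
    right_set, right_len = fitch(node[1])
    common = left_set & right_set
    if common:
        # Descendants agree on at least one state: no extra change needed.
        return common, left_len + right_len
    # No shared state: take the union and count one change.
    return left_set | right_set, left_len + right_len + 1


# Hypothetical rooted tree for one alignment column.
tree = (("A", "C"), ("C", ("A", "G")))
root_set, length = fitch(tree)
print(sorted(root_set), length)
```

The recursion visits descendants before their parent, so every internal node is processed only once its two children have state sets, which is the order the steps above require.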
maximum likelihood
• choose the tree which makes the data most probable
• The likelihood of a set of data, D, is the probability of the data, given a
hypothesis, θ. The hypothesis will usually come in the form of different
parameters. We denote the likelihood, L, of a set of data, D, as
L = P(D | θ)
maximum likelihood - coin toss
• Data: coin tossed 10 times - 7 heads and 3 tails

• Model: probability of heads = p; probability of tails = (1-p)
• Probability of getting h heads in n tosses (binomial distribution):
P(h | n, p) = [n! / (h! (n-h)!)] × p^h × (1-p)^(n-h)
maximum likelihood
• Likelihood (model) = Probability (Data|Model)

• Maximum likelihood = set of parameter values that give the highest
possible likelihood
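For the coin-toss data above (7 heads in 10 tosses), the maximum likelihood estimate of p can be found numerically. The sketch below assumes a binomial model and a simple grid search over p in steps of 0.01; the function name and the grid are illustrative choices, not part of the slides.

```python
from math import comb


def likelihood(p, heads=7, tosses=10):
    """Binomial probability of the observed data given p = P(heads)."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)


# Evaluate the likelihood on a grid of candidate parameter values
# and keep the value of p that makes the data most probable.
grid = [i / 100 for i in range(101)]
p_hat = max(grid, key=likelihood)

print(p_hat, likelihood(p_hat))
```

The maximum falls at p = 0.7, the observed proportion of heads, which is what the analytical solution h/n gives for this model.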
maximum likelihood in phylogeny
• Observed data: A multiple sequence alignment
• Model:
A model of how one ancestral sequence has evolved into the three observed
sequences
maximum likelihood in phylogeny
• Model parameters
⚬ Tree topology and branch lengths
⚬ Nucleotide frequencies (π)
⚬ Nucleotide-to-nucleotide substitution rates
computing probability of one column
• Assume that there were ancestral states that evolved to give these
nucleotides
• Assume a tree topology, branch lengths and other parameters, and start
computing at any node.
• We assumed that the ancestral nucleotides were both A. But we don't know
that for sure
• Redo the computation for all possible combinations of the ancestral
nucleotides
• Probability should be summed up for all the possible combinations
• Since we have 2 internal nodes and 4 nucleotides, there will be 16 possible
combinations
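The summation over the 16 ancestral combinations can be sketched as follows. Everything specific here is an assumption for illustration: a rooted tree of three sequences in which sequences 1 and 2 share an internal node x and sequence 3 attaches directly to the root r, the Jukes-Cantor (JC69) substitution model with equal nucleotide frequencies of 0.25, and a single hypothetical branch length t used for every branch.

```python
from itertools import product
from math import exp

NUCS = "ACGT"


def jc69(a, b, t):
    """JC69 probability that state a becomes state b along a branch of length t."""
    e = exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def column_likelihood(s1, s2, s3, t=0.1):
    """Likelihood of one alignment column, summed over both ancestral states."""
    total = 0.0
    terms = 0
    for r, x in product(NUCS, repeat=2):  # 4 x 4 = 16 combinations
        total += (
            0.25                # frequency of the root state
            * jc69(r, x, t)     # root -> internal node x
            * jc69(x, s1, t)    # x -> sequence 1
            * jc69(x, s2, t)    # x -> sequence 2
            * jc69(r, s3, t)    # root -> sequence 3
        )
        terms += 1
    return total, terms


value, n_terms = column_likelihood("A", "A", "C")
print(n_terms, value)
```

A useful sanity check on such a sketch is that the likelihoods of all 64 possible columns sum to 1, since they cover every possible outcome under the model.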
computing probability of an entire alignment
• The probabilities of the individual columns are multiplied to give the probability of
the entire alignment
L = L1 × L2 × ... × Ln
• Phylogeny software sums the logs of the column likelihoods instead, to
prevent numerical underflow
ln(L) = ln(L1) + ln(L2) + ... + ln(Ln)
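The underflow problem is easy to reproduce. In the sketch below the per-column likelihoods are hypothetical values (100 columns, each with likelihood 1e-5), chosen only to show the effect in double-precision arithmetic.

```python
import math

# Hypothetical per-column likelihoods for an alignment of 100 columns.
site_likelihoods = [1e-5] * 100

# Direct product: the true value is 1e-500, which is too small for a float.
raw_product = 1.0
for site in site_likelihoods:
    raw_product *= site

# Sum of logs: the same quantity on a log scale, well within float range.
log_likelihood = sum(math.log(site) for site in site_likelihoods)

print(raw_product)
print(log_likelihood)
```

The direct product collapses to zero, so every tree would look equally bad, while the log-likelihood stays finite and can still be compared between trees.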
maximum likelihood - advantages
• The method is very appropriate when analyzing a simple data set
containing genetic information.
• When the degree of variance among the genetic data is lower, the
maximum likelihood scores are reliable.
• The results generated through maximum likelihood can corroborate the
maximum parsimony result for a particular phylogenetic relationship.
• In that sense, maximum likelihood analysis can act as a confirmatory test.
maximum likelihood - disadvantages
• This method is slow and computationally intensive.

• In the absence of a single data set, the error output is high.
• Thus, it also makes reproducibility of the results more difficult
maximum likelihood vs maximum parsimony
• Maximum parsimony uses only the parsimony-informative characters, while maximum
likelihood uses all characters in the alignment
• Maximum parsimony has a higher number of branches as compared to
maximum likelihood
• Maximum likelihood is generally regarded as more reliable than maximum parsimony
