
PHYLOGENETIC INFERENCE

The crucial issue in systematics is that there is a history of the organisms we wish to
classify, but we don't know that history. We must infer the sequence of branches or
evolutionary transformations that have taken place. There is a true phylogeny, which
we may never know; our task is to collect and analyze data to provide the best
estimate of it.

We will work some examples that illustrate the difficulty of this task.

Phenetics: classification based on overall similarity. See fig. 14.4, pg. 378. A matrix
of shared character states is compiled; those taxa with the greatest number of similar
character states are deemed more similar.

Distance (or similarity) matrix derived from morphological measurements, genetic
distance measures, etc. Each cell in the matrix is a value indicating the degree of
difference (or similarity) between the two taxa. These can be clustered
by UPGMA (unweighted pair group method with arithmetic averages). The two most similar
(least distant) taxa are joined to form a group (e.g., taxa 1 & 2); the length of each
branch is half the distance value between the two taxa. The next most similar taxon
(3) is joined to the tree and the distance is calculated as the average of the distance
from taxon 1 to taxon 3 and taxon 2 to taxon 3. At each such step in building a tree,
the number of entries in the matrix is reduced by one and new distance values are
calculated as the average distance from each member of the group just formed to each
taxon outside that group. This process of joining the least distant pair (two taxa, a
taxon and a group, or two groups) is continued until all taxa are joined.
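The clustering steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `upgma`, the four-taxon distance matrix, and the nested-tuple tree format are all assumptions made for the example.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA.

    Returns (tree, joins): tree is a nested tuple of taxon names, and joins
    lists each merge as (subtree, height), where height is half the distance
    between the two groups that were joined.
    """
    # Each cluster key maps to (subtree, number of taxa in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # d[frozenset({i, j})] is the current distance between clusters i and j.
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(names)), 2)}
    joins = []
    next_key = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # the two least distant clusters
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters[i], clusters[j]
        joins.append(((tree_i, tree_j), d[pair] / 2))
        # Distance from the new group to every other cluster: the average over
        # all member pairs, computed here as a size-weighted average.
        new_d = {}
        for k in clusters:
            if k in (i, j):
                continue
            d_ik = d[frozenset((i, k))]
            d_jk = d[frozenset((j, k))]
            new_d[k] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop every entry that involved the two merged clusters.
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        del clusters[i], clusters[j]
        for k, value in new_d.items():
            d[frozenset((next_key, k))] = value
        clusters[next_key] = ((tree_i, tree_j), n_i + n_j)
        next_key += 1
    (tree, _), = clusters.values()
    return tree, joins


# Hypothetical distances among four taxa (symmetric, zero diagonal).
names = ["1", "2", "3", "4"]
dist = [
    [0, 2, 5, 10],
    [2, 0, 7, 9],
    [5, 7, 0, 11],
    [10, 9, 11, 0],
]
tree, joins = upgma(names, dist)
heights = [height for _, height in joins]
```

With these made-up numbers, taxa 1 and 2 join first, taxon 3 joins at the average of its distances to 1 and 2, and taxon 4 joins last at the average of its three distances.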

The tree produced is a Phenogram and is one way to infer relationships. Why might
this tree not reflect phylogeny (true ancestor-descendant relationships)? 1) Variable
evolutionary rates: a faster-evolving taxon will be more different from all others and
appear as an "outgroup". 2) Homoplasy (convergence) will tend to make character
states similar between unrelated taxa and the UPGMA approach will join them.

Cladistics: classification reflects sequence of branching events, not degree of
difference/similarity. See figures 17.6 and 17.7, pages 471-472. Classification is
based on shared derived characters (synapomorphies). Note that relationships are never
based on the absence of characters (e.g., "Invertebrates" makes sense to us, but
refrigerators and pizzas are "invertebrates" because they don't have backbones, yet
they clearly are not related to animals. For that matter, plants are invertebrates!). The
tree produced is a Cladogram and is a hypothesis of relationship. A taxon can evolve at
a different rate, but it will tend to accumulate autapomorphies, which are not
shared with any other taxa and thus affect the branching pattern less (though variable
rates can still lead to incorrect cladograms). How about Homoplasies? They will affect the
hypothesis, since those characters showing convergences (or parallelisms) will
contradict data from other characters.

This brings us to the topic of Parsimony: in constructing cladograms we seek the
branching pattern that requires the fewest evolutionary steps. Example
of marine mammals (chosen since we know that it is an example of a convergence). It
is more parsimonious to evolve fins twice and all the characters that hold mammals
together once, than it is to evolve fins once and all the characters that ally whales with
other mammals twice. We tolerate fins as Homoplasies (=analogies) since doing so is much
more parsimonious than calling all the mammalian characters homoplasies. See fig.
17.13, pg. 485 and work through it.
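The step counting in the marine mammal example can be made concrete with a short sketch. It uses Fitch's counting procedure for unordered characters, which these notes do not name, so treat that choice as an assumption. The four taxa, the 0/1 character codings, and the two candidate trees are likewise hypothetical illustrations.

```python
def fitch_steps(tree, states):
    """Return (possible states at this node, minimum changes below it).

    tree is a taxon name (leaf) or a pair (left, right) of subtrees;
    states maps each taxon name to its observed character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_steps(left, states)
    right_set, right_cost = fitch_steps(right, states)
    common = left_set & right_set
    if common:
        # The two subtrees can agree on a state: no extra change needed.
        return common, left_cost + right_cost
    # No shared state: at least one change on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, characters):
    """Total number of steps a tree requires over all characters."""
    return sum(fitch_steps(tree, states)[1] for states in characters)


# Hypothetical coding: 1 = present, 0 = absent.
fins = {"shark": 1, "whale": 1, "cow": 0, "lizard": 0}
hair = {"shark": 0, "whale": 1, "cow": 1, "lizard": 0}
milk = {"shark": 0, "whale": 1, "cow": 1, "lizard": 0}
ear_bones = {"shark": 0, "whale": 1, "cow": 1, "lizard": 0}
characters = [fins, hair, milk, ear_bones]

# Tree A groups whale with cow; tree B groups whale with shark.
tree_a = ("shark", ("lizard", ("cow", "whale")))
tree_b = (("shark", "whale"), ("lizard", "cow"))

length_a = tree_length(tree_a, characters)  # fins change twice, the rest once
length_b = tree_length(tree_b, characters)  # fins change once, the rest twice
```

Under this coding tree A needs 5 steps and tree B needs 7, so accepting fins as a homoplasy is the more parsimonious choice.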

Parsimony is central to the cladistic method and can be used for both studying
the Polarity (direction of evolution in a transformation series) of characters and the
confidence of hypotheses of relationships. Example: Drosophila chromosome
banding patterns (e.g., chromosomal inversions, figs. 17.16 and 17.17, pg. 491, 494).
Each species has a distinct pattern of bands in its salivary gland chromosomes. The
sequence of bands appears to have been inverted for certain sections of the
chromosome during evolution. One can determine a network of likely evolutionary
steps from one species to another. Big problem: one can start anywhere in the network.
We need to establish where the network begins, i.e., where to Root the tree.

Choose an Outgroup: a taxon (or taxa) known to lie outside the group in
question, and thus assumed to have branched off before the members of the ingroup
diverged from one another. Requires independent information. Once an outgroup is
properly selected, the determination of polarity falls out logically
based on parsimony. The identification of an outgroup can help identify Character
reversals = reversal in a trend of character change. An example is winglessness in
insects: insects evolved from a wingless myriapod ancestor, but there are
groups of derived (i.e., more recently evolved) insects that have no wings (fleas). Wings
have been lost in fleas, and this loss represents a character reversal. The use of an
outgroup is extremely important in phylogenetic inference as it allows you to determine
the "polarity" or direction of evolution, as illustrated with insect wings. Once a
reliable phylogenetic tree has been produced based on a data set of characters properly
rooted with an outgroup, one can use the polarity provided by the outgroup to analyze
the patterns of character evolution in general (how many times does a character
originate during evolution?). See fig. 17.9, pg. 476.

Another means of determining the direction of evolution in a transformation series is
by studying the development of the related taxa. This is not as easy as "Ontogeny
recapitulates phylogeny" once claimed, because developmental stages can be
lost either early or late in development, making the sequence difficult to interpret in
some cases. In general, however, development can provide resolving power in studies of
transformation series (fig. 17.11, pg. 478).

Compatibility methods: go with the tree that is supported by the largest number of
characters. Said another way: the most likely tree is that which is supported by the
greatest number of independent characters (the largest "clique" of characters) in which
there are no homoplasies.
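As a rough sketch of the clique idea, the example below assumes two-state (0/1) characters, for which two characters conflict only when all four state combinations occur among the taxa. The character names, the data, and the brute-force search are illustrative assumptions; the search is only practical for a handful of characters.

```python
from itertools import combinations


def compatible(char_a, char_b):
    """Two binary characters are compatible unless all four pairs of
    states (0,0), (0,1), (1,0), (1,1) occur across the taxa."""
    return len(set(zip(char_a, char_b))) < 4


def largest_clique(characters):
    """Return the names of a largest set of mutually compatible characters.

    Tries subsets from largest to smallest and returns the first that works.
    """
    names = list(characters)
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            if all(compatible(characters[a], characters[b])
                   for a, b in combinations(subset, 2)):
                return list(subset)
    return []


# Hypothetical data: each character lists its state in taxa 1..5.
characters = {
    "c1": (1, 1, 0, 0, 0),
    "c2": (1, 1, 1, 0, 0),
    "c3": (0, 0, 0, 1, 1),
    "c4": (1, 0, 0, 1, 0),  # conflicts with each of the other three
}
clique = largest_clique(characters)
```

Here c1, c2 and c3 can all be explained on one tree without homoplasy, while c4 conflicts with each of them, so the largest clique is c1, c2, c3.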

MOLECULAR SYSTEMATICS

Molecular biology has revolutionized the field of systematics. DNA evolves by
mutations being incorporated in the DNA and fixed in populations. This will lead
to divergence of DNA sequences in different species. Although diverged, we can
refer to two DNA sequences as homologous (just as we would for any morphological
trait such as forelimbs). Nicely demonstrates descent with modification as a
definition of evolution. For this reason, DNA should be an excellent tool for inferring
phylogenies: large number of homologous characters that should be (??) less subject
to convergent evolution than other characters that might lead to a confusion of grade
and clade.

To estimate phylogenies we first must estimate how much sequence divergence there
has been between the various taxa we want to study. Several methods are available.
Direct sequencing: elegant molecular methods are available; it is a lot of work but
provides lots of data. Each nucleotide position is a character and the actual nucleotide
that is present at that site is the character state.

A character can be phylogenetically informative when nucleotide changes are shared
by two or more taxa. A character can be phylogenetically uninformative when all
nucleotides are the same among taxa, or when only a single taxon has a different
nucleotide.

(I = informative site, U = uninformative site)
    I     U       I
A C T C G A C T A G A T
A C T C G T C T A G A T
A C A C G T C T A G A T
A C A C G T C T A C A T
A C A C G T C T A C A T
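The informative/uninformative distinction can be checked mechanically on the five sequences above. The sketch below is an illustration only; the function names are made up, and it reads "shared by two or more taxa" as requiring at least two different nucleotides that each occur in at least two taxa.

```python
from collections import Counter

# The five aligned sequences from the notes, with spaces removed.
alignment = [
    "ACTCGACTAGAT",
    "ACTCGTCTAGAT",
    "ACACGTCTAGAT",
    "ACACGTCTACAT",
    "ACACGTCTACAT",
]


def classify_site(column):
    """Label one alignment column (the nucleotides at one position)."""
    counts = Counter(column)
    if len(counts) == 1:
        return "constant"
    # Nucleotides present in two or more taxa.
    shared = [base for base, n in counts.items() if n >= 2]
    return "informative" if len(shared) >= 2 else "uninformative"


def variable_sites(seqs):
    """Map each variable position (numbered from 1) to its label."""
    return {
        pos: classify_site(column)
        for pos, column in enumerate(zip(*seqs), start=1)
        if len(set(column)) > 1
    }


sites = variable_sites(alignment)
```

For this alignment, positions 3 and 10 come out informative and position 6 uninformative (only the first sequence differs there); every other position is constant.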

A less direct method of determining sequence divergence is by restriction enzyme
mapping. These are enzymes that recognize specific sequences in the DNA and cut
the DNA strands. Depending on the location of restriction recognition sites in the
DNA, DNA fragments of various lengths will be generated by the restriction
enzyme digest, resulting in a restriction fragment pattern. One can also determine
where various restriction enzymes cut a given piece of DNA and draw up
a restriction map of the stretch of DNA. The extent to which two restriction maps (or
restriction fragment patterns) are similar serves as an estimator of sequence similarity
(or difference). Restriction enzymes will recognize only a fraction of the entire DNA
sequence, so one will not know all the differences between two stretches of DNA. The
data serve as an estimate, and because the method usually involves less work it can be
done on many individuals.
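A small sketch shows how a single nucleotide change alters a fragment pattern. It assumes linear DNA and uses the EcoRI recognition sequence GAATTC (cut after the first G). The two short sequences are invented for the example, and the shared-site fraction at the end is just one simple similarity measure, not necessarily the one used in the course.

```python
def cut_positions(seq, site, offset):
    """Positions in a linear sequence where the enzyme cuts.

    site is the recognition sequence; offset is how far into the site
    the cut falls (1 for EcoRI, which cuts G^AATTC).
    """
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, site, offset):
    """Sorted fragment lengths produced by a complete digest."""
    cuts = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return sorted(b - a for a, b in zip(cuts, cuts[1:]))


def shared_site_fraction(sites_a, sites_b):
    """Twice the number of shared sites over the total number of sites."""
    total = len(sites_a) + len(sites_b)
    if total == 0:
        return 1.0
    return 2 * len(set(sites_a) & set(sites_b)) / total


ECORI_SITE, ECORI_OFFSET = "GAATTC", 1

# Hypothetical sequences: species B has lost the second site (A -> G).
seq_a = "AA" + "GAATTC" + "TTTTTT" + "GAATTC" + "AA"
seq_b = "AA" + "GAATTC" + "TTTTTT" + "GAGTTC" + "AA"

sites_a = cut_positions(seq_a, ECORI_SITE, ECORI_OFFSET)
sites_b = cut_positions(seq_b, ECORI_SITE, ECORI_OFFSET)
pattern_a = fragment_lengths(seq_a, ECORI_SITE, ECORI_OFFSET)
pattern_b = fragment_lengths(seq_b, ECORI_SITE, ECORI_OFFSET)
similarity = shared_site_fraction(sites_a, sites_b)
```

Species A yields three fragments and species B two, even though the sequences differ at a single nucleotide; the enzyme reports only changes that fall inside a recognition site.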

DNA-DNA hybridization is another indirect way of obtaining estimates of DNA
sequence divergence between two taxa. DNA strands are melted apart at high
temperature and allowed to "reanneal" in the presence of the DNA of another species.
One species' DNA has been labeled with a radioactive nucleotide. This
forms heteroduplex DNA. The heteroduplex DNA is gradually heated, and the amount
of single-stranded DNA that has "melted" apart is determined by the amount of
radioactive label that is counted in each fraction collected from the various
temperature steps. Very similar DNA will melt at a high temperature and
heteroduplexes between diverged DNAs will melt at a lower temperature. Sequence
divergence is proportional to the reduction in melting temperature relative to
perfectly matched DNA. See fig. 17.19, pg. 497.

Phenetic approaches: DNA hybridization, sequence divergence from sequencing,
restriction patterns or restriction maps. In each case the data would be in the form of
a single number indicating the similarity or difference between the DNAs of each
pair of species in the study.
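For sequence data, the "single number" per pair can be as simple as the proportion of sites that differ. The sketch below builds such a matrix from the five aligned sequences shown earlier in these notes. The function names are illustrative, and no correction for multiple changes at a site is applied, which is an assumption that only suits closely related sequences.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)


def distance_matrix(seqs):
    """Square matrix of pairwise p-distances, in input order."""
    return [[p_distance(a, b) for b in seqs] for a in seqs]


# The five aligned sequences from the notes, with spaces removed.
sequences = [
    "ACTCGACTAGAT",
    "ACTCGTCTAGAT",
    "ACACGTCTAGAT",
    "ACACGTCTACAT",
    "ACACGTCTACAT",
]
matrix = distance_matrix(sequences)
```

The first and fourth sequences differ at 3 of 12 sites (distance 0.25), while the fourth and fifth are identical (distance 0). A matrix like this is the kind of input a clustering method such as UPGMA works from.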

Cladistic approaches: direct sequencing, restriction maps and restriction fragments
(fragments less desirable). In each case one would look for shared derived character
states (nucleotide, restriction recognition site) among taxa. With restriction
sites, shared loss is unreliable as a uniting character because the nucleotide change
could have occurred anywhere in the recognition sequence, so two losses need not be
the same event. Like the refrigerator/pizza/invertebrate example.

Molecular approaches to systematics force us to think about the rates of molecular
evolution. If DNA or proteins evolved at a constant rate in all species, then one could
use estimates of sequence divergence to build very reliable phylogenies. If there were
a molecular clock we could determine the "true phylogeny". Fact is, there is no one
molecular clock.

Different proteins and DNA sequences evolve at different rates. Why?
Different functional constraints. Different proteins do different things and some can
do their structural or functional job with any of several different amino acids at many
of the positions (fibrinopeptides). Other proteins will not function properly with
"any" amino acid changes (histones: two amino acid differences between peas and
cows!). Intron sequences are less constrained than coding exon sequences, and hence
introns tend to diverge faster than exons. Synonymous sites evolve faster than
non-synonymous sites, again due to different functional constraints (i.e., some form of
selection against "incorrect" sequences). Nuclear DNA tends to evolve slower than
mitochondrial DNA (in vertebrates). Unit evolutionary period: time required to
observe a given unit of divergence. A 1% divergence of vertebrate mitochondrial
DNA takes about 250,000 to 500,000 years.
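The unit evolutionary period turns an observed divergence into a rough time range by simple proportion. The sketch below uses the figure quoted above for vertebrate mtDNA. The function name is made up, and the strictly linear scaling is an assumption that only makes sense at low divergence and constant rate.

```python
# Years needed for 1% divergence of vertebrate mtDNA, as quoted in the notes.
UEP_YEARS_PER_PERCENT = (250_000, 500_000)


def time_range_for_divergence(percent_divergence):
    """Rough (low, high) estimate in years for a given percent divergence.

    Assumes divergence accumulates linearly, i.e. a constant rate and no
    multiple changes at the same site.
    """
    low, high = UEP_YEARS_PER_PERCENT
    return percent_divergence * low, percent_divergence * high


# Example: two vertebrate mtDNA sequences that differ at 4% of sites.
estimate = time_range_for_divergence(4)
```

On these assumptions, 4% divergence corresponds to roughly 1 to 2 million years.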

What gene do I use??: Depends on the taxa you are studying and the amount of
divergence among them. Histones are good for "macrosystematics"; fibrinopeptides and
mtDNA are good for "microsystematics" or population level phylogenies. See fig. 17.15,
pg. 489.

Note problems with tree building from data: unequal rates and convergence.

As if the choice of gene/protein were not problem enough: what if different lineages
evolve at different rates? Test for this with the relative rate test. Compare the paths
from two different taxa to a third, more distantly related taxon. If the path lengths
are the same, the two taxa are evolving at the same rate; if not, at different rates.
Extreme rate fluctuations are a problem; slight ones are not, as they would not lead to
regrouping taxa (depending on how "slight" is defined).
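The comparison at the heart of the relative rate test is a subtraction of two distances. The sketch below uses the proportion of differing sites as the distance and three invented 20-base sequences. It reports only the size of the difference and makes no attempt at a statistical test of whether that difference is meaningful.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def relative_rate_difference(seq_a, seq_b, seq_ref):
    """d(A, ref) - d(B, ref), where ref lies outside both A and B.

    A value near zero suggests similar rates in the two lineages; a
    negative value means B has diverged more from the reference than A.
    """
    return p_distance(seq_a, seq_ref) - p_distance(seq_b, seq_ref)


# Hypothetical aligned sequences.
reference = "ACGTACGTACGTACGTACGT"
taxon_a = "TCGTTCGTACGTACGTACGT"   # 2 differences from the reference
taxon_b = "GTGTGCGTGCGTGCGTGCGT"   # 6 differences from the reference

difference = relative_rate_difference(taxon_a, taxon_b, reference)
```

Here taxon B is three times as far from the reference as taxon A, which is the kind of large discrepancy that could mislead a distance-based tree.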

Convergence over long stretches of DNA is unlikely, although it has been reported for
lysozyme. Another kind of "convergence" can occur due to the limited number of
character states in DNA. Back mutations: e.g., A changes to T, T changes to C, and C
changes back to A. The return to the original state could occur in one step or many.
Maximal random divergence: 25% similarity (with four nucleotides, two unrelated
sequences still match at about one site in four by chance).
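The 25% floor can be seen in a small simulation. This is a sketch under stated assumptions: every site is equally likely to mutate, and each mutation picks one of the three other nucleotides with equal probability. The function name and parameter values are invented for the illustration.

```python
import random

BASES = "ACGT"


def similarity_after_mutations(n_sites, n_mutations, seed=0):
    """Fraction of sites still matching the ancestor after random changes.

    Mutations land on random sites, so some sites are hit repeatedly and
    may mutate back to their original nucleotide.
    """
    rng = random.Random(seed)
    ancestor = [rng.choice(BASES) for _ in range(n_sites)]
    descendant = list(ancestor)
    for _ in range(n_mutations):
        pos = rng.randrange(n_sites)
        # Replace the current base with one of the three others.
        others = [base for base in BASES if base != descendant[pos]]
        descendant[pos] = rng.choice(others)
    matches = sum(a == d for a, d in zip(ancestor, descendant))
    return matches / n_sites


few_changes = similarity_after_mutations(10_000, 1_000)
many_changes = similarity_after_mutations(10_000, 50_000)
```

With few mutations the similarity stays high, but after an average of five hits per site it settles near 0.25 rather than falling toward zero. Observed divergence therefore underestimates the true number of changes once sequences are saturated.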

Nonetheless, molecular tools have allowed major leaps in our understanding of
biological diversity: Bacterial evolution: three kingdoms, not five; Endosymbiont
hypothesis, essentially proven; AIDS virus: rapid evolution is good for the virus, bad
for us.
