
EVOLUTION: Lecture 5

Molecular Evolution and Phylogenetics
Robin Allaby
Aims
• To understand why DNA and proteins are such
good tools for studying evolution

Objectives
• To look at Kimura’s argument
• To examine some key rules he established
• To study the basis of the molecular clock
• To study the basis of molecular phylogenetics
• To see how we decide between Neutral Evolution and Natural Selection
Mutation and Substitution
[Figure: mutation and substitution]

To understand and interpret diversity at the molecular level, we need to understand how mutations become incorporated into the population (and become substitutions)
Time for allele fixation/loss

[Figure: time for allele fixation/loss; 4N generations (in diploid)]

• Kimura was (clearly) a great mathematician, and established some fundamental relationships describing the behaviour of alleles in populations. A very important one was that a neutral mutation that does become fixed takes, on average, 4N generations to do so in a diploid population of size N (i.e. to become a substitution)
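The 4N result can be checked with a small simulation. This is a sketch that assumes a standard Wright-Fisher model (non-overlapping generations, with the 2N gene copies of each generation drawn at random from the previous one); the function names and the population size are illustrative choices, not part of the lecture. It compares the mean fixation time of new neutral mutations that happen to fix with the diffusion approximation, which tends to 4N as the starting frequency 1/2N becomes small.

```python
import math
import random


def expected_fixation_time(N, p):
    """Diffusion approximation for the mean time (generations) for a neutral allele
    at starting frequency p to fix, given that it does fix (diploid population size N)."""
    return -4 * N * (1 - p) * math.log(1 - p) / p


def simulate_fixation_times(N, replicates, seed=1):
    """Wright-Fisher simulation: start each replicate with one new mutant copy among
    2N gene copies; record the generation at which it fixes (losses are discarded)."""
    rng = random.Random(seed)
    copies = 2 * N
    times = []
    for _ in range(replicates):
        count, generations = 1, 0
        while 0 < count < copies:
            freq = count / copies
            # Each of the 2N copies in the next generation picks a parent copy at random.
            count = sum(rng.random() < freq for _ in range(copies))
            generations += 1
        if count == copies:
            times.append(generations)
    return times


N = 10
times = simulate_fixation_times(N, replicates=4000)
print(len(times), "fixations; mean time to fixation:", sum(times) / len(times))
print("diffusion approximation:", expected_fixation_time(N, 1 / (2 * N)))
print("4N =", 4 * N)
```

Roughly 1 in 2N replicates should fix, and their mean fixation time should sit close to 4N (slightly below it for such a small N).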
Neutral Theory and the rate of Evolution

Substitution rate (K) = probability of fixation of a new mutation × number of mutations in a generation:

$$K = \frac{1}{2N} \times 2N\mu = \mu$$

• Kimura (1968) showed that the probability that a neutral mutation becomes fixed in a population is equal to its frequency in the population, which is 1/2N when the mutation first arises (one copy among the 2N gene copies of a diploid population).

• The number of new mutations per generation is simply 2Nµ, where µ is the mutation rate per gene copy per generation.

• The 2N terms cancel out, so the rate of evolution equals the mutation rate, regardless of population size.

• In other words, larger populations produce more mutations that take longer to fix, while smaller populations have fewer mutations, but these are fixed more quickly when they occur.

• Therefore evolution should proceed at a constant rate, like a clock: the molecular clock
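Exact fractions make the cancellation in K = (1/2N) × 2Nµ visible: whatever N is, K comes out equal to µ. The mutation rate below is an arbitrary illustrative value, not a measured one.

```python
from fractions import Fraction


def substitution_rate(N, mu):
    """Neutral substitution rate K for a diploid population of N individuals."""
    p_fix = Fraction(1, 2 * N)   # a new mutation starts as 1 of the 2N gene copies
    new_mutations = 2 * N * mu   # 2N copies, each mutating at rate mu per generation
    return p_fix * new_mutations


mu = Fraction(1, 10**8)          # hypothetical mutation rate per copy per generation
for N in (100, 10_000, 1_000_000):
    print(N, substitution_rate(N, mu))   # the same value, mu, for every N
```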
Some consequences of Neutral Theory

• Neutral mutations behave in a very predictable and uniform, clock-like way

• Genomic regions not under selection will usually evolve more quickly, because selective constraint slows evolution in regions under selection. BUT positive natural selection can subject genes to periods of rapid fixation of new mutations.

• Large populations carry more variation than small populations, which we can quantify as θ = 4Nµ

• Genes that show a relatively large divergence between species should show a relatively high level of polymorphism within a species (because of the constant clock)¶

• Molecular characters are, for the most part, unaffected by the conditions of existence, which makes them good characters for phylogenetic reconstruction.

¶ Compare this with Darwin’s prediction about the features that define a species (p. 150; slide 21, Lecture 1)
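To see how θ = 4Nµ scales with population size, the sketch below computes θ for two population sizes and converts it into an expected heterozygosity using θ/(1 + θ), the equilibrium value under the infinite-alleles model (an extra modelling assumption, not stated on the slide). The per-locus mutation rate is a hypothetical value.

```python
def theta(N, mu):
    """Population mutation parameter for a diploid population: theta = 4 N mu."""
    return 4 * N * mu


def expected_heterozygosity(th):
    """Equilibrium heterozygosity under the infinite-alleles model: theta / (1 + theta)."""
    return th / (1 + th)


mu = 1e-6  # hypothetical per-locus mutation rate per generation
for N in (10_000, 1_000_000):
    th = theta(N, mu)
    print(f"N = {N:>9,}: theta = {th:g}, expected heterozygosity = {expected_heterozygosity(th):.3f}")
```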
Molecular Phylogenetics

• Tree based on 38 amino acids from the pol gene of HIV and related SIVs

• Although HIV was only really noticed in humans in the 1980s, this tree shows:
- multiple transmissions into humans
- that, according to the molecular clock, it probably entered humans in the 1930s
Molecular Phylogenetics
Various methods available:

• Distance based trees
• Parsimony based trees
• Maximum Likelihood trees
• Bayesian trees using the coalescent
Phylogenetics begins with multiple alignment

• columns of an alignment are considered as homologous characters

• characters may be phylogenetically informative (group species together)

• an outgroup helps us to determine which character states are plesiomorphic (ancestral) and which are apomorphic (derived); it provides the tree with a ‘root’
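One common way to make "phylogenetically informative" precise, used by parsimony methods, is to call a column parsimony-informative when at least two different states each occur in at least two sequences; a column where only one sequence differs (a singleton) cannot group species together. The sketch below applies that rule to a hypothetical, ungapped toy alignment.

```python
from collections import Counter


def informative_sites(alignment):
    """Return the 0-based columns that are parsimony-informative: at least two
    character states, each present in at least two sequences."""
    seqs = list(alignment.values())
    informative = []
    for col in range(len(seqs[0])):
        counts = Counter(seq[col] for seq in seqs)
        shared_states = [state for state, n in counts.items() if n >= 2]
        if len(shared_states) >= 2:
            informative.append(col)
    return informative


# Hypothetical toy alignment (equal-length, ungapped sequences).
alignment = {
    "human":   "ACGTA",
    "chimp":   "ACGTT",
    "gorilla": "ATGCA",
    "orang":   "ATGCC",
}
print(informative_sites(alignment))  # [1, 3]
```

Column 4 varies but is not informative: A is shared by two sequences, while T and C each occur only once.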
Distance based trees

• count the number of substitutions between species

• BUT over time there is an increasing chance that the same base or amino acid site will have mutated more than once, so we apply a ‘correction factor’ based on models of nucleotide change

• methods that use distances: UPGMA, Neighbor-Joining

• These are computationally quick
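Below is a minimal sketch of UPGMA, the simpler of the two distance methods named above. It assumes a constant molecular clock (all tips equidistant from the root), which is what lets it place a root; Neighbor-Joining relaxes that assumption but takes longer to write out. The taxa and distances are hypothetical, and the tree is returned as a Newick string.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree from a symmetric distance matrix; returns a Newick string.
    Clusters join at a height of half their distance (a clock is assumed)."""
    # Each cluster id maps to (Newick subtree, number of taxa, height of its root).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {frozenset(pair): dist[pair[0]][pair[1]] for pair in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        closest = min(d, key=d.get)  # the pair of clusters with the smallest distance
        i, j = sorted(closest)
        height = d[closest] / 2
        (tree_i, size_i, h_i), (tree_j, size_j, h_j) = clusters.pop(i), clusters.pop(j)
        merged = f"({tree_i}:{height - h_i:g},{tree_j}:{height - h_j:g})"
        # Distance from the new cluster to each remaining one: size-weighted average.
        new_d = {
            k: (size_i * d[frozenset((i, k))] + size_j * d[frozenset((j, k))]) / (size_i + size_j)
            for k in clusters
        }
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        for k, v in new_d.items():
            d[frozenset((next_id, k))] = v
        clusters[next_id] = (merged, size_i + size_j, height)
        next_id += 1
    [(tree, _, _)] = clusters.values()
    return tree + ";"


names = ["A", "B", "C", "D"]  # hypothetical taxa
dist = [
    [0, 2, 4, 6],
    [2, 0, 4, 6],
    [4, 4, 0, 6],
    [6, 6, 6, 0],
]
print(upgma(names, dist))  # (D:3,(C:2,(A:1,B:1):1):1);
```

Each merge needs only a scan of the current distance table, which is why distance methods are quick compared with searching over topologies.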


[Figure: different genes evolve at different rates; saturation of divergence, where multiple hits are now occurring most of the time]
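The simplest model of nucleotide change is Jukes-Cantor, which assumes all substitutions are equally likely and all bases equally frequent; it is shown here as one example of a correction factor, not necessarily the model behind the figure. The corrected distance grows faster than the observed proportion of differences p and becomes undefined as p approaches 0.75, which is the saturation caused by multiple hits.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of aligned sites that differ (the raw, uncorrected distance)."""
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance: estimated substitutions per site."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: divergence is saturated, the correction is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.05, 0.25, 0.5, 0.7):
    print(f"observed p = {p:.2f} -> corrected d = {jukes_cantor(p):.3f}")
```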
Character based methods: parsimony

• parsimony chooses the tree ‘topology’ that is associated with the fewest substitutions

• can be computationally difficult
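To compare topologies, parsimony needs the minimum number of substitutions each tree requires. Fitch's algorithm computes this for one site on a fixed rooted binary tree by passing sets of possible states from the tips towards the root; summing over sites gives the tree's parsimony score. This is a sketch with hypothetical taxa W to Z; a real search would also have to generate and score many candidate topologies, which is where the computational difficulty lies.

```python
def fitch(tree, states):
    """Fitch's algorithm for one site on a rooted binary tree.
    tree: a leaf name (str) or a pair (left, right); states: leaf name -> base.
    Returns (candidate state set at this node, minimum number of changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one substitution is needed on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total number of substitutions the tree needs across all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[col] for name, seq in alignment.items()})[1]
        for col in range(length)
    )


alignment = {"W": "AC", "X": "AC", "Y": "GT", "Z": "GT"}  # hypothetical toy data
tree_1 = (("W", "X"), ("Y", "Z"))
tree_2 = (("W", "Y"), ("X", "Z"))
print(parsimony_score(tree_1, alignment), parsimony_score(tree_2, alignment))  # 2 4
```

Parsimony would choose tree_1, the topology associated with the fewest substitutions.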


Parsimony tree of human origins
• used to support ‘Out of Africa’ hypothesis

• the tree was not without problems, though: there are a large number of possible topologies:

$$\text{rooted tree topologies} = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$

• that’s over 34 million rooted trees (and over 2 million unrooted trees) for just 10 taxa!!

• in this case there were many equally parsimonious trees which did not show a neat separation of Africa and the rest of the world.
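Both counts are easy to check with exact integer arithmetic. The rooted formula is the one on the slide; the unrooted count for n taxa equals the rooted count for n - 1 taxa. For 10 taxa the rooted formula gives over 34 million trees, while the "over 2 million" figure corresponds to unrooted trees.

```python
from math import factorial


def rooted_topologies(n):
    """Number of rooted, bifurcating tree topologies for n labelled taxa (n >= 2)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_topologies(n):
    """Number of unrooted, bifurcating topologies (n >= 3): the rooted count for n - 1 taxa."""
    return rooted_topologies(n - 1)


for n in range(4, 11):
    print(f"{n:>2} taxa: {rooted_topologies(n):>12,} rooted, {unrooted_topologies(n):>10,} unrooted")
```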
Comparing genes between species:
orthologs and paralogs
• many new genes evolve by duplication from old ones

• duplicates may then undergo subfunctionalisation or neofunctionalisation. One likely outcome is the formation of a pseudogene, i.e. total loss of function. For instance, there are a huge number of GPCR pseudogenes in humans generated through duplication.

• orthologs are genes which are separated by speciation events (we should be comparing these to infer species relationships)

• paralogs are genes which are separated by gene duplication events
- inparalogs: duplications subsequent to speciation
- outparalogs: duplications prior to speciation

• instances of ‘interesting bouts’ of evolution across a genome can be examined by looking at the inparalogs.
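The ortholog/paralog distinction can be applied mechanically to a gene tree. The sketch below uses one simple heuristic, the species-overlap rule: a node is labelled a duplication if its two subtrees contain genes from the same species, and otherwise a speciation. Two genes are then orthologs if they meet at a speciation node and paralogs if they meet at a duplication. The gene-naming convention and the toy tree are hypothetical, and this is not presented as the method used in the lecture.

```python
def species(gene):
    """Hypothetical naming convention: '<species>_<copy>', e.g. 'human_a'."""
    return gene.split("_")[0]


def leaves(tree):
    """All gene names below a node (a leaf is a str, an internal node a pair)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def lca(tree, a, b):
    """Smallest subtree of the gene tree that contains both genes a and b."""
    for child in tree:
        if not isinstance(child, str) and {a, b} <= leaves(child):
            return lca(child, a, b)
    return tree


def is_duplication(node):
    """Species-overlap rule: a node is a duplication if its two subtrees share a species."""
    left, right = node
    return bool({species(g) for g in leaves(left)} & {species(g) for g in leaves(right)})


def relationship(tree, a, b):
    return "paralogs" if is_duplication(lca(tree, a, b)) else "orthologs"


# A duplication before the human/chimp split: copies a and b exist in both species.
gene_tree = (("human_a", "chimp_a"), ("human_b", "chimp_b"))
print(relationship(gene_tree, "human_a", "chimp_a"))  # orthologs
print(relationship(gene_tree, "human_a", "chimp_b"))  # paralogs (duplication before speciation)
print(relationship(gene_tree, "human_a", "human_b"))  # paralogs
```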
Differential gene loss and the problem of paralogs
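Using the clock to measure speciation

[Figure: a rooted tree (root, nodes, branches, leaves/tips) of Taxa A to D (e.g. chimp, human) with an outgroup (e.g. orang utan); a fossil calibration point (e.g. 13 Mya) dates the split from the outgroup; Distance = 2µt, because both lineages accumulate substitutions independently after the split]

If the human to orang utan distance is 10 substitutions, then we can work out µ:

$$10 = 2 \times \mu \times 13\ \text{Myr} \quad\Rightarrow\quad \mu = \frac{10}{26\ \text{Myr}} \approx 3.85 \times 10^{-7}\ \text{subs/yr}$$

Now, if there are 4 differences between humans and chimps, we can say that:

$$4 = 2 \times 3.85 \times 10^{-7} \times t \quad\Rightarrow\quad t = \frac{4}{7.7 \times 10^{-7}} \approx 5.2\ \text{Mya}$$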
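The calibration is the same equation, Distance = 2µt, solved twice: first for µ using the fossil-dated outgroup split, then for t using the observed human-chimp differences. The helper function names below are hypothetical; the numbers are the ones from the example.

```python
def rate_from_calibration(distance, split_time):
    """Solve distance = 2 * mu * t for mu, given a fossil-calibrated split time (years)."""
    return distance / (2 * split_time)


def divergence_time(distance, mu):
    """Solve distance = 2 * mu * t for t (years)."""
    return distance / (2 * mu)


mu = rate_from_calibration(distance=10, split_time=13e6)  # human vs orang utan
t = divergence_time(distance=4, mu=mu)                     # human vs chimp
print(f"mu = {mu:.3g} subs/yr, t = {t / 1e6:.2f} Myr")
```

Without rounding µ along the way, t is exactly 4/10 of the 13 Myr calibration, i.e. 5.2 Myr.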
How reliable is the molecular clock?

• Different genes evolve at different rates
- functional constraints
- physiological differences
- organismal differences (e.g. generation time effect)
• Functional constraints can change over time
• Different nucleotides/amino acids evolve at different rates
• Most methods now allow the clock rate to vary, but a constant clock is often still a reasonable approximation.
Genes are not species
An integration of population processes and phylogenetics: coalescence
• coalescence is based on the observation that all copies of a gene ultimately descend from a single ancestral copy

• it extrapolates backwards in time from modern diversity

• the expected time to the most recent common ancestor of k alleles is given by:

$$t = 4N\left(1 - \frac{1}{k}\right) \text{ generations}$$

which converges on 4N as the number of alleles increases.

• coalescence, and coalescent-based trees, allow us to look at population parameters over time

• most often used to estimate the time of the common ancestor

• more recently used with ancient DNA to track population size through time
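The formula for the time to the most recent common ancestor comes from adding up the waiting times while j = k, k - 1, ..., 2 lineages remain. Under the standard coalescent approximation for a diploid population of size N, each pair of lineages coalesces with probability 1/(2N) per generation, so with j lineages the expected wait is 4N/(j(j - 1)). The sum telescopes to 4N(1 - 1/k), which exact fractions confirm below; the value of N is arbitrary.

```python
from fractions import Fraction


def expected_tmrca(N, k):
    """Expected time (generations) back to the MRCA of k gene copies sampled from a
    diploid population of size N, summing the waiting time while j lineages remain."""
    total = Fraction(0)
    for j in range(k, 1, -1):
        # j(j-1)/2 pairs, each coalescing with probability 1/(2N) per generation.
        total += Fraction(4 * N, j * (j - 1))
    return total


N = 100
for k in (2, 5, 10, 50):
    print(k, float(expected_tmrca(N, k)), 4 * N * (1 - 1 / k))
```

Note that the last step (two lineages left) accounts for 2N of the total 4N, so adding more alleles barely pushes the common ancestor further back.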
Estimation of population size over time –
Bayesian Skyline plots

Hofreiter et al 2009 Current Biology 19 R584


How do we decide if a gene is evolving neutrally?
• We usually test whether there is significant deviation from the neutral expectation
• Such deviation may be due to selection, although other processes could often also be at play, e.g. population expansion (non-ideal behaviour)
• E.g. an allele may occur at an anomalously high frequency in the population (relative to all the others)
• An allele may be associated with an extended region of linkage disequilibrium around it (i.e. low diversity)
• We can compare interspecies and intraspecies variation: they should correlate (HKA test)
• The tips and nodes of a tree should balance with the extent of allele sharing in the tree (e.g. Tajima’s D test)
• We could also compare the ratio of synonymous to non-synonymous substitutions: under neutral conditions, 20% of mutations should be synonymous (actually very rarely the case)

Boivin N, Fuller DQ, Dennell R, Allaby R, Petraglia M (2013) Human Dispersal Across Diverse Environments of Asia during the Upper Pleistocene. Quaternary International 300: 32-47.
An example of a test of neutrality: Tajima’s D

• Works on the balance of sites supporting nodes and tips in a tree, and the extent to which alleles are shared among individuals; both can be used to estimate the statistic θ
• Sites which define nodes are segregating sites (S). The average pairwise difference between sequences (π) is the heterozygosity

[Figure: tree of nine sequences, 1 AT, 2 AT, 3 GT, 4 GT, 5 GT, 6 AA, 7 AT, 8 GT, 9 AT; the base sites that define its nodes are segregating sites]

$$\theta = \text{heterozygosity} = \pi$$

$$\theta = \frac{S}{\sum_{i=1}^{n-1} 1/i}, \quad \text{i.e. } \frac{S}{1/1 + 1/2 + \dots + 1/8} \text{ for } n = 9 \text{ sequences}$$

$$D = \pi - \frac{S}{\sum_{i=1}^{n-1} 1/i}$$

(Tajima's published D divides this difference by its standard deviation.)

If D = 0, the data fit neutral expectations; a negative D could indicate selection or population expansion

Tajima F (1989) Genetics 123: 585-595
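The sketch below computes π, S, the S-based estimate of θ and the unscaled difference d for the nine two-site sequences on the slide (1 AT, 2 AT, 3 GT, 4 GT, 5 GT, 6 AA, 7 AT, 8 GT, 9 AT), then scales d by its standard deviation using the constants from Tajima (1989). It is an illustrative implementation for equal-length, ungapped sequences, not a replacement for a tested population-genetics package.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return (pi, S, theta_S, d, D) for equal-length aligned sequences."""
    n = len(seqs)
    pairs = list(combinations(seqs, 2))
    # pi: mean number of differences per pair of sequences
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    # S: number of segregating (variable) sites
    S = sum(len(set(column)) > 1 for column in zip(*seqs))
    if S == 0:
        raise ValueError("no segregating sites: D is undefined")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_s = S / a1
    d = pi - theta_s  # the unscaled difference shown on the slide
    # Normalising constants for the standard deviation of d
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    D = d / sqrt(e1 * S + e2 * S * (S - 1))
    return pi, S, theta_s, d, D


# The nine two-site sequences from the slide, in order 1 to 9.
seqs = ["AT", "AT", "GT", "GT", "GT", "AA", "AT", "GT", "AT"]
pi, S, theta_s, d, D = tajimas_d(seqs)
print(f"pi = {pi:.3f}, S = {S}, theta_S = {theta_s:.3f}, d = {d:.3f}, D = {D:.3f}")
```

Here π = 28/36 and S/a1 = 2/(761/280), so d is small and positive: this toy tree is close to the neutral balance of nodes and tips.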
The classic bottleneck model and mutation load

[Figure: drift strong, selection weak]

Robin G. Allaby, Roselyn Ware and Logan Kistler (2018) A re-evaluation of the domestication bottleneck from archaeogenomic evidence. Evolutionary Applications 12: 29-37
Mutation load in humans

Henn et al 2015 Nature Reviews Genetics 16: 333-343


EVOLUTION: Lecture 6

Models in Evolution

Robin Allaby
Aims
• See how models can be used to simulate evolution and gain
insight

Objectives

• Examine Dawkins’ biomorphs
• Examine the locus of selection
• Selfish gene theory from models derived from kin selection
• Models to explain group selection
A model of embryological development

• Cells divide dichotomously
• ‘genes’ determine the number of divisions
• ‘genes’ determine the shape of the cells
• ‘mimic’ development with a tree structure
• demonstrates the rapidity with which selection can act

Dawkins (1986) The Blind Watchmaker
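A much-simplified sketch of the idea follows. Three hypothetical "genes" (number of divisions, branch angle, length scaling) control a dichotomously branching drawing, a mutation changes one gene by a small step, and cumulative selection keeps the best of each litter. These genes are illustrative and are not Dawkins' actual gene set, and the human breeder's eye is replaced by an assumed fitness criterion (the width of the drawing).

```python
import math
import random


def develop(genes, x=0.0, y=0.0, heading=90.0, length=10.0, depth=None, segments=None):
    """'Embryology': draw one segment, then divide dichotomously into two branches.
    Returns a list of ((x1, y1), (x2, y2)) line segments."""
    if segments is None:
        segments = []
    if depth is None:
        depth = genes["divisions"]
    if depth == 0:
        return segments
    x2 = x + length * math.cos(math.radians(heading))
    y2 = y + length * math.sin(math.radians(heading))
    segments.append(((x, y), (x2, y2)))
    for turn in (-genes["angle"], genes["angle"]):
        develop(genes, x2, y2, heading + turn, length * genes["scale"], depth - 1, segments)
    return segments


def width(segments):
    xs = [x for segment in segments for (x, _) in segment]
    return max(xs) - min(xs)


def mutate(genes, rng):
    """Copy the genes, changing one of them by a small step."""
    child = dict(genes)
    gene = rng.choice(["divisions", "angle", "scale"])
    if gene == "divisions":
        child["divisions"] = min(8, max(1, child["divisions"] + rng.choice((-1, 1))))
    elif gene == "angle":
        child["angle"] += rng.choice((-5, 5))
    else:
        child["scale"] = min(0.95, max(0.3, child["scale"] + rng.choice((-0.05, 0.05))))
    return child


def breed(genes, generations, litter=8, seed=0):
    """Cumulative selection: each generation keep the widest of the parent and its litter."""
    rng = random.Random(seed)
    current = dict(genes)
    for _ in range(generations):
        candidates = [current] + [mutate(current, rng) for _ in range(litter)]
        current = max(candidates, key=lambda g: width(develop(g)))
    return current


start = {"divisions": 2, "angle": 20, "scale": 0.7}
evolved = breed(start, generations=30)
print(start, round(width(develop(start)), 2))
print(evolved, round(width(develop(evolved)), 2))
```

Because each step only has to beat the current form, small mutations accumulate quickly, which is the point the biomorphs make about the rapidity of cumulative selection.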


The Blind Watchmaker Game

http://www.emergentmind.com/biomorphs
What is the locus of selection?
• clade
• species
• population
• individual
• genome
• gene
Group Selection

Williams GC (1966) Adaptation and Natural Selection. Princeton University Press, Princeton.


Altruism through kinship
“I would lay down my life for 2 brothers or 8 cousins” (attributed to J.B.S. Haldane)

rB > C (Hamilton's rule, 1964)

• r is the coefficient of relatedness between actor and recipient
• B is the benefit to the recipient
• C is the cost to the actor
• It follows that altruism is more likely to spread when the benefits to the recipient are great, the cost to the actor is low, and the participants are closely related
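Treating each saved relative as a benefit of one life and the actor's own death as a cost of one life (a simplifying assumption), the quote sits exactly at the break-even point of rB > C: 2 × 1/2 = 8 × 1/8 = 1. Exact fractions avoid rounding noise in the comparison.

```python
from fractions import Fraction


def altruism_favoured(r, benefit, cost):
    """Hamilton's rule: an allele for altruism is favoured when r * B > C."""
    return r * benefit > cost


brother, cousin = Fraction(1, 2), Fraction(1, 8)
life = 1  # cost to the actor: one life (in units of the actor's own fitness)

print(brother * 2, cousin * 8)                            # 1 1: exactly the break-even point
print(altruism_favoured(brother, benefit=2, cost=life))   # False: rB == C, not greater
print(altruism_favoured(brother, benefit=3, cost=life))   # True
print(altruism_favoured(cousin, benefit=9, cost=life))    # True
```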
Coefficient of relatedness

$$r = \sum (0.5)^L$$
summed over all possible paths to a common ancestor, where L is the number of generational links in each path.
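Applied to a few standard relationships in an outbred diploid pedigree (an assumption: inbreeding would add extra paths), the path-counting rule gives the familiar values. Each entry in the list is the number of generational links along one path through a shared ancestor.

```python
def relatedness(path_lengths):
    """r = sum over paths to common ancestors of 0.5 ** L, where L is the number of
    generational links in each path."""
    return sum(0.5 ** L for L in path_lengths)


print(relatedness([1]))      # parent and offspring: 0.5
print(relatedness([2, 2]))   # full siblings, via mother and via father: 0.5
print(relatedness([2]))      # half siblings, one shared parent: 0.25
print(relatedness([4, 4]))   # first cousins, via two shared grandparents: 0.125
```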

Not just relatedness that counts, though:

A mother is equally related to her sister and her daughter (r = 0.5), but would not invest as much in a sister as in a daughter, because the latter requires more care (i.e. there are different values of B and C)

How to determine B and C?
Inclusive fitness concept
• Inclusive fitness
- Direct fitness = personal reproduction
- Indirect fitness = apparently ‘altruistic’ effects on the reproduction of relatives

• Natural selection favouring alleles that increase indirect fitness = Kin Selection

Note that the benefit to the individual is an important component of Hamilton’s rule. So it is NOT saying that there is selection for ‘heroes’, unless the hero benefits.
But what if there were a gene that compelled selfless acts?
Selfish genes, green beards: a gene that compels acts of kindness to other holders of the gene (Richard Dawkins)

Krieger MJB, Ross KG (2002) Identification of a Major Gene Regulating Complex Social Behavior. Science 295(5553): 328-332
Krieger MJB, Ross KG (2005) Molecular Evolutionary Analyses of the Odorant-Binding Protein Gene Gp-9 in Fire Ants and Other Solenopsis Species. Molecular Biology and Evolution 22(10): 2090-2103
Hymenoptera and the selfish gene theory

[Figure: a diploid queen (GG) mates with a haploid drone (G); the drone's male gametes are all identical, so every daughter receives the same paternal G]

Mother-daughter r = 0.5

Sisters r = 0.75 (the mother's gametes are 50% similar, the father's gametes 100% similar)

See The Selfish Gene pp. 174-184
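The 0.75 figure can be reproduced by enumerating which parental allele each of two sisters inherits and asking how likely a gene picked at random from one sister is to be identical by descent in the other. The sketch assumes the queen mated with a single drone and that there is no inbreeding; setting the father to be diploid instead recovers the ordinary sibling value of 0.5.

```python
from fractions import Fraction
from itertools import product


def sister_relatedness(paternal_alleles):
    """Probability that a gene picked at random from one sister is identical by descent
    in the other, averaging over which parental allele each sister inherited.
    paternal_alleles = 1 for a haploid (hymenopteran) father, 2 for a diploid father."""
    maternal = ["m1", "m2"]
    paternal = [f"p{i}" for i in range(1, paternal_alleles + 1)]
    total, cases = Fraction(0), 0
    for m_a, m_b, p_a, p_b in product(maternal, maternal, paternal, paternal):
        # Half of a sister's genes come from the mother and half from the father.
        total += Fraction(1, 2) * (m_a == m_b) + Fraction(1, 2) * (p_a == p_b)
        cases += 1
    return total / cases


print(sister_relatedness(1))  # 3/4 (haplodiploid sisters)
print(sister_relatedness(2))  # 1/2 (diploid full sisters)
```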
The altruism of aggression

• Many animals are equipped with lethal weapons.
• Yet fights rarely end in deaths.
The logic of animal aggression: game theory (J. Maynard Smith and G.R. Price)
Possible actions:
• D: dangerous tactics (use teeth etc.); high risk of injury, gains more
• C: conventional tactics (display, no contact); no risk of injury, gains less
• R: retreat; no risk of injury

Possible reactions:
• play D in response to C = ‘a probe’
• play D in response to D = ‘a retaliation’

Possible strategies (assign probabilities to taking each of the above actions)
• Mouse: never plays D, retreats in reaction to D, plays C otherwise
• Hawk: always plays D, continues to do so until seriously injured or the opponent retreats
• Bully: plays D if making the first move. Plays D in response to C and C in response to D.
• Retaliator: plays C if making the first move, plays C in response to C and D in response to D
• Prober-retaliator: usually plays C, but sometimes D. Reverts to C if the opponent retaliates, but takes advantage and continues with D if the opponent responds to its previous D with C.

Attempting to find an Evolutionarily Stable Strategy (ESS)

Maynard Smith and Price (1973) Nature 246: 15-18


Total aggression is not an ESS

Average payoff to each contestant (columns) against each opponent (rows):

Opponent             Mouse    Hawk     Bully    Retaliator    Prober-retaliator
Mouse                29       80       80       29            56.7
Hawk                 19.5     -19.5    4.9      -22.3         -20.1
Bully                19.5     74.6     41.5     57.1          59.4
Retaliator           29       -18.1    11.9     29            26.9
Prober-retaliator    17.2     -18.9    11.2     23.1          21.9
Total                114.2    98.1     149.5    115.9         144.8

Mouse and bully do better against hawks than hawks do.
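The ESS test can be run directly on the payoffs in the table, read here as columns = contestant receiving the payoff and rows = opponent. A strategy I is an ESS if, for every other strategy J, E(I, I) > E(J, I), or E(I, I) = E(J, I) and E(I, J) > E(J, J). With these rounded averages no strategy passes the strict test: Hawk fails because Mouse and Bully do better against it, and Retaliator fails only because Mouse does exactly as well against Retaliator as Retaliator does.

```python
STRATEGIES = ["Mouse", "Hawk", "Bully", "Retaliator", "Prober-retaliator"]

# Payoff to each contestant against Mouse, Hawk, Bully, Retaliator, Prober-retaliator
# (one column of the table per contestant).
COLUMNS = {
    "Mouse":             [29.0, 19.5, 19.5, 29.0, 17.2],
    "Hawk":              [80.0, -19.5, 74.6, -18.1, -18.9],
    "Bully":             [80.0, 4.9, 41.5, 11.9, 11.2],
    "Retaliator":        [29.0, -22.3, 57.1, 29.0, 23.1],
    "Prober-retaliator": [56.7, -20.1, 59.4, 26.9, 21.9],
}
# E[contestant][opponent] = average payoff to the contestant in a contest with the opponent
E = {me: dict(zip(STRATEGIES, values)) for me, values in COLUMNS.items()}


def is_ess(s):
    """For every other strategy j: E(s, s) > E(j, s), or E(s, s) == E(j, s) and E(s, j) > E(j, j)."""
    for j in STRATEGIES:
        if j == s:
            continue
        if E[j][s] > E[s][s]:
            return False
        if E[j][s] == E[s][s] and E[s][j] <= E[j][j]:
            return False
    return True


def does_better_against(opponent):
    """Strategies that earn more against `opponent` than `opponent` earns against itself."""
    return [j for j in STRATEGIES if E[j][opponent] > E[opponent][opponent]]


for s in STRATEGIES:
    print(f"{s:18} total = {sum(COLUMNS[s]):6.1f}  ESS: {is_ess(s)}")
print("Better against Hawk than Hawk itself:", does_better_against("Hawk"))
```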


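Hawk and Dove Model
B = benefit (let B = 50)
C = cost (let C = 100)

Payoff to the row strategy against the column strategy:

              vs Hawk                    vs Dove
Hawk          (B/2) - (C/2) = -25        B = +50
Dove          0                          B/2 = +25

With the Bourgeois strategy added:

              vs Hawk    vs Dove    vs Bourgeois
Hawk          -25        +50        +12.5
Dove          0          +25        +12.5
Bourgeois     -12.5      +37.5      +25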
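The three-strategy table can be rebuilt from B and C alone. This sketch assumes Maynard Smith's usual definition of Bourgeois (play Hawk when owning the resource, Dove when intruding) and that each contestant is the owner in exactly half of its contests; neither detail is stated on the slide. Under those assumptions the code reproduces the table, finds that Bourgeois is the only pure ESS, and shows that in a Hawk/Dove population Hawks and Doves earn equal payoffs when a fraction B/C of the population plays Hawk.

```python
from fractions import Fraction

STRATEGIES = ["Hawk", "Dove", "Bourgeois"]


def hawk_dove_bourgeois(B, C):
    """Payoff to the first strategy against the second, keyed by (me, other)."""
    half = Fraction(1, 2)
    basic = {
        ("Hawk", "Hawk"): half * B - half * C,
        ("Hawk", "Dove"): B,
        ("Dove", "Hawk"): 0,
        ("Dove", "Dove"): half * B,
    }
    as_owner = {"Hawk": "Hawk", "Dove": "Dove", "Bourgeois": "Hawk"}
    as_intruder = {"Hawk": "Hawk", "Dove": "Dove", "Bourgeois": "Dove"}
    payoff = {}
    for me in STRATEGIES:
        for other in STRATEGIES:
            # Half the time I own the resource (the opponent intrudes), half the reverse.
            when_owner = basic[(as_owner[me], as_intruder[other])]
            when_intruder = basic[(as_intruder[me], as_owner[other])]
            payoff[(me, other)] = half * when_owner + half * when_intruder
    return payoff


def is_ess(s, payoff):
    """For every other j: E(s,s) > E(j,s), or E(s,s) == E(j,s) and E(s,j) > E(j,j)."""
    for j in STRATEGIES:
        if j == s:
            continue
        if payoff[(j, s)] > payoff[(s, s)]:
            return False
        if payoff[(j, s)] == payoff[(s, s)] and payoff[(s, j)] <= payoff[(j, j)]:
            return False
    return True


B, C = 50, 100
payoff = hawk_dove_bourgeois(B, C)
for me in STRATEGIES:
    print(me, [float(payoff[(me, other)]) for other in STRATEGIES])
print("ESS:", [s for s in STRATEGIES if is_ess(s, payoff)])

# Mixed Hawk/Dove population: Hawks and Doves do equally well when the Hawk fraction is B/C.
p = Fraction(B, C)
hawk_gets = p * payoff[("Hawk", "Hawk")] + (1 - p) * payoff[("Hawk", "Dove")]
dove_gets = p * payoff[("Dove", "Hawk")] + (1 - p) * payoff[("Dove", "Dove")]
print(p, hawk_gets, dove_gets)  # 1/2 25/2 25/2
```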
