BIOL 3306 – Evolutionary Biology (Fall 2022)

Sample Problems 3

Ricardo B. R. Azevedo

October 21, 2022

Questions

1. The frequency of a neutral allele in a population in three successive generations is 0.1, 0.15 and 0.05.
What is the probability that the allele will eventually go to fixation?

2. Consider two alleles A1 and A2 such that the fitnesses of the genotypes at that locus are w11 = 1 + s,
w12 = 1 + s/2, and w22 = 1. The probability that an A1 allele appearing by mutation will go to fixation
is:
$$\pi \approx s \tag{1}$$
The probability that a neutral allele will go to fixation is:
$$\pi = \frac{1}{2N} \tag{2}$$
where N is the population size. How strong must the selective coefficient s be so that the probability
of fixation of A1 is greater than that of a neutral allele in a population of 1,000 individuals?

3. In Drosophila melanogaster, an allele at the Cy locus causes a mild curly wing phenotype. The relative
fitnesses of the +/+, +/Cy and Cy/Cy genotypes are 1.2, 1.1 and 1, respectively (where + is the wild
type allele).

(a) We establish 100 experimental populations of fruitflies with the following composition:
• +/+ : 1 female, 1 male
• +/Cy : 2 females, 2 males
• Cy/Cy : 1 female, 1 male
Each population is maintained by randomly picking 4 females and 4 males each generation and
allowing them to produce the following generation. How many populations are expected to fix
the Cy allele eventually?

Equation 1 (above) is an approximation of the following equation:
$$\pi = \frac{1 - e^{-2 N_e s q}}{1 - e^{-2 N_e s}} \tag{3}$$
where q is the frequency of A1 and Ne is the effective population size. Use equation 3, assuming
that Ne is the census population size.
(b) Imagine that a novel + allele appears in a population of 50 Cy/Cy individuals. Compare the
probabilities of fixation of the + allele predicted under equations 1 and 3. Assume that Ne is the
census population size.
(c) Imagine that a novel neutral allele × appears in a population of 50 individuals. Compare the
probabilities of fixation of × predicted under equations 2 and 3. Assume that Ne is the census
population size. (Hint: try successively lower values of s in equation 3.)

4. The following table shows the rates of synonymous (dS) and nonsynonymous (dN) substitution for 6
protein-coding genes based on human-chimpanzee comparisons (Chimpanzee Sequencing and Analysis
Consortium 2005):

Gene                            Length*   dS       dN
Olfactory receptor (family 2)   320       0.0034   0.0141
Ribokinase                      322       0.0031   0.0033
Alcohol dehydrogenase 8         419       0.0030   0.0095
Ring finger protein 137         485       0.0157   0.0038
Keratin 5b                      520       0.0330   0.0079
Lamin B receptor                615       0.0048   0.0052
* Number of amino acids.

(a) Which gene has evolved fastest as a whole? (Assume that there are 2.7 times more nonsynonymous
sites than synonymous sites for each gene.)
(b) For which two genes are mutations that change the amino acid sequence most likely to be deleterious?
(c) For which two genes are mutations that change the amino acid sequence most likely to be beneficial?

5. Consider two linked loci A and B. Alleles A and B are fixed in population 1. Alleles a and b are
fixed in population 2. A new population, M, is created by taking 400 males and 400 females from
population 1, and putting them together with 100 males and 100 females from population 2.

(a) Calculate the frequencies of each single-locus genotype and allele in the M population.
(b) Calculate the linkage disequilibrium coefficient D in population M. Is the M population in linkage
equilibrium?
(c) In the following generation, the absolute value of the linkage disequilibrium coefficient changed
to |D′| = 0.1216, without changing sign. Estimate the recombination frequency between the two
loci, assuming random mating.

(d) Calculate the single-locus diploid genotype, allele and haplotype frequencies in the generation
following the formation of the M population, assuming random mating and the information given
in the previous questions.
(e) How many generations of random mating will be required to reduce linkage disequilibrium to
|D| ≤ 0.01? (Count from the formation of the M population.)
(f) What will the haplotype frequencies be when the M population reaches linkage equilibrium?

Answers

Note: In what follows values are displayed to a maximum of five significant digits, but all calculations have
been carried out with full precision.

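1. The probability that a neutral allele goes to fixation equals its current frequency, so the probability
of fixation is 0.05 (5%). Only the latest allele frequency matters; the earlier values of 0.1 and 0.15 are
irrelevant. In other words, genetic drift has no “memory”.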

2. Selection on a beneficial allele will overcome the effect of genetic drift if (see equations 1 and 2):
$$
\begin{aligned}
\pi_s &> \pi_n \\
s &> \frac{1}{2N} \\
s &> \frac{1}{2000} \\
s &> 0.0005
\end{aligned}
$$
In a population of N = 1000 individuals, a beneficial allele with a selective advantage lower than
s = 0.0005 is effectively neutral.

3. (a) In order to use equation 3 we must calculate q, Ne and s.

In this case + is the beneficial allele. Its initial frequency in each population is q = 0.5.

The effective population size is Ne = 8.

The genotypic fitnesses are:


          +/+     +/Cy      Cy/Cy
Model     1 + s   1 + s/2   1
Fitness   1.2     1.1       1
From this we get s = 0.2.

Applying equation 3 we get:


$$\pi = \frac{1 - e^{-2 \times 8 \times 0.2 \times 0.5}}{1 - e^{-2 \times 8 \times 0.2}} = 0.832$$

Therefore, we expect the Cy allele to go to fixation in 100 × (1 − 0.832) = 16.8 ≈ 17 populations.
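The arithmetic in part (a) can be checked numerically. The following is a minimal Python sketch of equation 3; the function name `fixation_probability` and its parameter names are illustrative choices rather than anything defined in the problem set, and Ne is taken to be the census size of 8, as assumed above.

```python
import math

def fixation_probability(q: float, n_e: float, s: float) -> float:
    """Equation 3: fixation probability of an allele at frequency q
    with selective advantage s in a population of effective size n_e."""
    return (1 - math.exp(-2 * n_e * s * q)) / (1 - math.exp(-2 * n_e * s))

# Part (a): the + allele starts at q = 0.5, with Ne = 8 and s = 0.2.
pi_plus = fixation_probability(q=0.5, n_e=8, s=0.2)

# Each population eventually fixes either + or Cy, so Cy fixes with probability 1 - pi_plus.
expected_cy_fixed = 100 * (1 - pi_plus)

print(round(pi_plus, 3), round(expected_cy_fixed))  # 0.832 17
```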
(b) From equation 1 we get:
$$\pi \approx 0.2$$
The new allele appears at frequency:
$$q = \frac{1}{2N} = 0.01$$
From equation 3, assuming Ne = 50, we get:

$$\pi = \frac{1 - e^{-2 \times 50 \times 0.2 \times 0.01}}{1 - e^{-2 \times 50 \times 0.2}} = 0.181$$
Equation 1 (π ≈ 0.2) provides a reasonable approximation to equation 3 (π = 0.181), overestimating it
by about 10%.
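To see how close the two predictions are, both equations can be evaluated side by side. This is a small sketch under the same assumptions as the answer (Ne equal to the census size of 50, a single new copy of the + allele); the variable names are illustrative.

```python
import math

N = 50             # census size, taken as Ne
s = 0.2            # selective advantage of + over Cy
q0 = 1 / (2 * N)   # frequency of a single new copy in a diploid population

pi_eq1 = s  # equation 1
pi_eq3 = (1 - math.exp(-2 * N * s * q0)) / (1 - math.exp(-2 * N * s))  # equation 3

# How far equation 1 overshoots equation 3, as a fraction of equation 3.
relative_error = (pi_eq1 - pi_eq3) / pi_eq3

print(round(pi_eq3, 3), round(relative_error, 2))  # 0.181 0.1
```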
(c) The probability of fixation of a neutral allele is given by equation 2:
$$\pi \approx \frac{1}{2N} = 0.01$$
We cannot use s = 0 in equation 3 directly because it yields the indeterminate form 0/0, but when we
apply successively smaller values of s with q = 0.01 and Ne = 50 we get the following:

s = 0.01 , π = 0.015741
s = 0.001 , π = 0.010503
s = 0.0001 , π = 0.010050
s = 0.00001 , π = 0.010005

Therefore, equations 2 and 3 make consistent predictions for a neutral allele.
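The limiting behaviour can be reproduced with a short loop. The sketch below rewrites equation 3 with `math.expm1`, which computes e^x − 1 accurately for small x; that substitution is a numerical convenience introduced here, not part of the original answer, and the function name is illustrative.

```python
import math

def fixation_probability(q: float, n_e: float, s: float) -> float:
    """Equation 3 written as expm1(-2*Ne*s*q) / expm1(-2*Ne*s).
    Both numerator and denominator change sign, so the ratio is unchanged."""
    return math.expm1(-2 * n_e * s * q) / math.expm1(-2 * n_e * s)

N = 50
q0 = 1 / (2 * N)  # a single new copy of the allele

# As s shrinks, the prediction of equation 3 approaches 1/(2N) = 0.01.
for s in (1e-2, 1e-3, 1e-4, 1e-5):
    print(f"s = {s:g}, pi = {fixation_probability(q0, N, s):.5g}")
```

The function is still undefined at exactly s = 0, so the neutral case has to be read off as a limit.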

4. (a) To answer this question, we need to estimate the overall rates of substitution of the different
genes. We do this by calculating a weighted average of dN and dS:

$$K \approx \frac{d_S + 2.7 \times d_N}{3.7}$$

Gene                            Length (AA)   dS       dN       K
Olfactory receptor (family 2)   320           0.0034   0.0141   0.011
Ribokinase                      322           0.0031   0.0033   0.003
Alcohol dehydrogenase 8         419           0.0030   0.0095   0.008
Ring finger protein 137         485           0.0157   0.0038   0.007
Keratin 5b                      520           0.0330   0.0079   0.015
Lamin B receptor                615           0.0048   0.0052   0.005
We conclude that the Keratin 5b gene is the fastest evolving one (K ≈ 0.015).
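The weighted average can be computed for all six genes at once. This is a minimal sketch using the dS and dN values from the table; the dictionary layout and the names `overall_rate` and `NONSYN_PER_SYN` are illustrative.

```python
# (dS, dN) for each gene, copied from the table in question 4.
genes = {
    "Olfactory receptor (family 2)": (0.0034, 0.0141),
    "Ribokinase": (0.0031, 0.0033),
    "Alcohol dehydrogenase 8": (0.0030, 0.0095),
    "Ring finger protein 137": (0.0157, 0.0038),
    "Keratin 5b": (0.0330, 0.0079),
    "Lamin B receptor": (0.0048, 0.0052),
}

NONSYN_PER_SYN = 2.7  # nonsynonymous sites per synonymous site (given in the question)

def overall_rate(d_s: float, d_n: float, ratio: float = NONSYN_PER_SYN) -> float:
    """Site-weighted average of the synonymous and nonsynonymous rates."""
    return (d_s + ratio * d_n) / (1 + ratio)

rates = {name: overall_rate(d_s, d_n) for name, (d_s, d_n) in genes.items()}
fastest = max(rates, key=rates.get)

print(fastest, round(rates[fastest], 3))  # Keratin 5b 0.015
```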
(b) To answer this question (and the next), we must calculate the dN/dS ratio of each gene:

Gene                            Length (AA)   dS       dN       dN/dS
Olfactory receptor (family 2)   320           0.0034   0.0141   4.15
Ribokinase                      322           0.0031   0.0033   1.06
Alcohol dehydrogenase 8         419           0.0030   0.0095   3.17
Ring finger protein 137         485           0.0157   0.0038   0.24
Keratin 5b                      520           0.0330   0.0079   0.24
Lamin B receptor                615           0.0048   0.0052   1.08

A gene for which most mutations that change the amino acid sequence are deleterious should
show dN/dS < 1. The Ring finger protein 137 and Keratin 5b genes meet this expectation: they have
the two lowest ratios (both ≈ 0.24).
(c) A gene for which many mutations that change the amino acid sequence are beneficial should show
dN/dS well above 1. The Olfactory receptor and Alcohol dehydrogenase 8 genes meet this expectation:
they have the two highest ratios (4.15 and 3.17).
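The ranking by dN/dS can be reproduced with a few lines of Python. This is an illustrative sketch using the values from the table; picking the two lowest and two highest ratios mirrors the wording of the questions ("which two genes") and is not a statistical test for selection.

```python
# (dS, dN) for each gene, copied from the table in question 4.
genes = {
    "Olfactory receptor (family 2)": (0.0034, 0.0141),
    "Ribokinase": (0.0031, 0.0033),
    "Alcohol dehydrogenase 8": (0.0030, 0.0095),
    "Ring finger protein 137": (0.0157, 0.0038),
    "Keratin 5b": (0.0330, 0.0079),
    "Lamin B receptor": (0.0048, 0.0052),
}

# dN/dS ratio for every gene.
ratio = {name: d_n / d_s for name, (d_s, d_n) in genes.items()}

ranked = sorted(ratio, key=ratio.get)  # ascending dN/dS
lowest_two = ranked[:2]                # amino acid changes mostly deleterious
highest_two = ranked[-2:]              # amino acid changes most likely beneficial

for name in ranked:
    print(f"{name}: {ratio[name]:.2f}")
```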

5. (a) At locus A, the frequencies of AA and aa are 0.8 and 0.2, respectively; the frequencies of alleles
A and a are pA = 0.8 and qa = 0.2, respectively. Similarly for locus B: the frequencies of BB and
bb are 0.8 and 0.2, respectively; the frequencies of alleles B and b are pB = 0.8 and qb = 0.2,
respectively.
(b) There are two genotypes in the M population: AABB and aabb. AABB individuals will only
produce AB gametes, whereas aabb individuals will only produce ab gametes. Therefore, the
haplotype frequencies in M are:
      A     a
B     0.8   0
b     0     0.2
From this we get:
D = 0.8 × 0.2 − 0 × 0 = 0.16
Since D ≠ 0, the M population is not in linkage equilibrium. In fact it shows the highest possible
linkage disequilibrium for those allele frequencies.
Note that if we had organized the table like this:

      a     A
B     0     0.8
b     0.2   0

we would conclude D = −0.16, which would lead to the same conclusion.
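The value of D, and the claim that it is the largest possible for these allele frequencies, can be checked as follows. This is a minimal sketch; the function name is illustrative, and the bound used for positive D, min(pA·qb, qa·pB), is the standard maximum, which the answer above states without giving the formula.

```python
def linkage_disequilibrium(h_AB: float, h_Ab: float, h_aB: float, h_ab: float) -> float:
    """D = h_AB * h_ab - h_Ab * h_aB, from the four haplotype frequencies."""
    return h_AB * h_ab - h_Ab * h_aB

# Haplotype frequencies in population M: only AB and ab gametes exist.
D = linkage_disequilibrium(h_AB=0.8, h_Ab=0.0, h_aB=0.0, h_ab=0.2)

# Largest positive D compatible with allele frequencies pA = pB = 0.8.
p_A, p_B = 0.8, 0.8
D_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)

print(round(D, 4), round(D_max, 4))  # 0.16 0.16
```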


(c) The linkage disequilibrium coefficient in the following generation is given by the following
expression:
$$D' = D(1 - r)$$
where r is the recombination frequency. Substituting, we get:

$$0.1216 = 0.16 \times (1 - r)$$
$$r = 1 - 0.1216/0.16 = 0.24$$

(d) In the following generation, the allele frequencies will remain constant and the single-locus
genotypic frequencies will reach Hardy–Weinberg equilibrium. Therefore, the expected frequencies
at locus A will be:
            AA     Aa     aa
Frequency   0.64   0.32   0.04
Similarly, the expected frequencies at locus B will be:
            BB     Bb     bb
Frequency   0.64   0.32   0.04
The haplotype frequencies in the next generation will be:
      A          a
B     0.8 − rD   0 + rD
b     0 + rD     0.2 − rD
From question 5b we have D = 0.16 and from question 5c we have r = 0.24. Therefore, rD =
0.0384. Substituting in the above table, we get:
      A        a
B     0.7616   0.0384
b     0.0384   0.1616
To confirm these calculations, we can calculate the coefficient of linkage disequilibrium:

$$D = h_{AB} h_{ab} - h_{Ab} h_{aB} = 0.7616 \times 0.1616 - 0.0384 \times 0.0384 = 0.1216$$
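The one-generation update of the haplotype frequencies can be written as a short function. This is a sketch under the assumptions of the answer (random mating, recombination frequency r estimated from the decay of D); the dictionary keys and function names are illustrative.

```python
def disequilibrium(h: dict) -> float:
    """D = h_AB * h_ab - h_Ab * h_aB."""
    return h["AB"] * h["ab"] - h["Ab"] * h["aB"]

def next_generation(h: dict, r: float) -> dict:
    """Haplotype frequencies after one generation of random mating:
    coupling haplotypes (AB, ab) lose rD, repulsion haplotypes (Ab, aB) gain rD."""
    shift = r * disequilibrium(h)
    return {
        "AB": h["AB"] - shift,
        "Ab": h["Ab"] + shift,
        "aB": h["aB"] + shift,
        "ab": h["ab"] - shift,
    }

h0 = {"AB": 0.8, "Ab": 0.0, "aB": 0.0, "ab": 0.2}  # population M
r = 1 - 0.1216 / 0.16                              # from D' = D(1 - r)

h1 = next_generation(h0, r)
d1 = disequilibrium(h1)

print({k: round(v, 4) for k, v in h1.items()}, round(d1, 4))
```

Recomputing D from the updated table, as the last line does, is the same consistency check performed in the answer.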

(e) One way to solve this problem is to apply the formula given in question 5c repeatedly until the
condition is met:

$$
\begin{aligned}
D_1 &= D_0 (1 - r) = 0.16 \times 0.76 = 0.1216 \\
D_2 &= D_1 \times 0.76 = 0.092416 \\
D_3 &= D_2 \times 0.76 = 0.070236 \\
D_4 &= D_3 \times 0.76 = 0.053379 \\
D_5 &= D_4 \times 0.76 = 0.040568 \\
D_6 &= D_5 \times 0.76 = 0.030832 \\
D_7 &= D_6 \times 0.76 = 0.023432 \\
D_8 &= D_7 \times 0.76 = 0.017809 \\
D_9 &= D_8 \times 0.76 = 0.013534 \\
D_{10} &= D_9 \times 0.76 = 0.010286 \\
D_{11} &= D_{10} \times 0.76 = 0.0078175
\end{aligned}
$$

Using this brute-force approach, we conclude that it would require 11 generations of random
mating.

More elegantly, we can infer from the above formulae that:

$$D_n = D_0 (1 - r)^n$$

where Dn is the coefficient of linkage disequilibrium after n generations of random mating and
D0 is the initial value of the coefficient of linkage disequilibrium. If we take logs and solve for
n, we get:

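$$
\begin{aligned}
\ln D_n &= \ln D_0 + n \ln(1 - r) \\
n &= \frac{\ln D_n - \ln D_0}{\ln(1 - r)} = \frac{\ln 0.01 - \ln 0.16}{\ln 0.76} \\
n &= 10.103
\end{aligned}
$$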

Rounding up to the nearest integer, we get n = 11 generations.
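Both routes to the answer, the repeated multiplication and the logarithmic formula, can be compared in code. This is a minimal sketch assuming 0 < r < 1 and an initial D above the target; the function names are illustrative.

```python
import math

def generations_closed_form(d0: float, r: float, target: float) -> int:
    """Smallest integer n with d0 * (1 - r)**n <= target, from the log formula."""
    return math.ceil(math.log(target / d0) / math.log(1 - r))

def generations_brute_force(d0: float, r: float, target: float) -> int:
    """Multiply D by (1 - r) each generation until it drops to the target."""
    n, d = 0, d0
    while d > target:
        d *= 1 - r
        n += 1
    return n

print(generations_closed_form(0.16, 0.24, 0.01),
      generations_brute_force(0.16, 0.24, 0.01))  # 11 11
```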


(f) The M population will be in linkage equilibrium when each gametic frequency equals the product of
the corresponding allele frequencies (fA = fB = 0.8 and fa = fb = 0.2, as calculated in question 5a):
      A              a
B     fA fB = 0.64   fa fB = 0.16
b     fA fb = 0.16   fa fb = 0.04
To confirm these calculations we can calculate the linkage disequilibrium coefficient:

$$D = h_{AB} h_{ab} - h_{Ab} h_{aB} = 0.64 \times 0.04 - 0.16 \times 0.16 = 0$$
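The approach to these equilibrium values can be illustrated by writing each haplotype frequency as the product of allele frequencies plus or minus the remaining disequilibrium. This is a sketch using the values from question 5 (pA = pB = 0.8, D0 = 0.16, r = 0.24); the function name `haplotypes_after` is illustrative.

```python
p_A, p_B = 0.8, 0.8   # allele frequencies, constant under random mating
D0, r = 0.16, 0.24    # initial disequilibrium and recombination frequency

def haplotypes_after(n: int) -> dict:
    """Haplotype frequencies after n generations of random mating,
    using D_n = D0 * (1 - r)**n."""
    d_n = D0 * (1 - r) ** n
    q_a, q_b = 1 - p_A, 1 - p_B
    return {
        "AB": p_A * p_B + d_n,
        "Ab": p_A * q_b - d_n,
        "aB": q_a * p_B - d_n,
        "ab": q_a * q_b + d_n,
    }

# The frequencies move from the values in part (d) toward 0.64, 0.16, 0.16, 0.04.
for n in (1, 11, 50):
    print(n, {k: round(v, 4) for k, v in haplotypes_after(n).items()})
```

Strictly, D only approaches zero asymptotically, so the equilibrium table is the limit of these frequencies rather than a value reached after a finite number of generations.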
