Population Genetics Lecture Notes

Population genetics
Population genetics is a field that could be viewed as the extension of Mendelian genetics
to the population level, rather than a consideration of the gene segregation within a cross
or family. While a single diploid individual can have at most two alleles for some gene,
in a population there can be numerous alleles at various frequencies. In population
genetics, descriptions can be made of the frequencies of various genotypes and alleles in
populations, and/or the levels of genetic variation can be determined.
A population is a collection of organisms of a single species whose individuals interact with each other in some way. A species will therefore typically be broken up into a number of populations.
Population genetics includes both empirical and theoretical studies. It provides the mechanics, or mathematics, underlying the evolutionary process, and it emerged in the 1920s. The main founders include R. A. Fisher, J. B. S. Haldane, and Sewall Wright. The increasing availability of DNA sequence data, faster computers, and advances in mathematics have continued to fuel this field.
Population genetics is also relevant in applied fields, including agriculture; medical genetics, including the genetics of various human diseases; the evolution of bacterial resistance to antibiotics; and the evolution of virulence in pathogens.
Questions in population genetics:
1. What are the levels of genetic variation in populations?
2. How do different mating patterns affect genotype and allele frequencies?
3. What forces are responsible for changes in allele frequencies or in the genetic composition of populations?
In other words, what are the roles of mutation, migration, genetic drift and natural selection in populations?
Genetic variation
Genetic variation is critically important from both theoretical and applied perspectives. In
the absence of genetic variation, evolution cannot occur. Likewise, in order to improve
crop plants or animals for breeding purposes there must be genetic variation present.
One of the first tasks in population genetics is the measurement of variation within populations. The easiest way to approach this is to use single genes that are co-dominant (or show additive inheritance or incomplete dominance), where there is a one-to-one correspondence between genotype and phenotype (for quantitative or polygenic traits, other approaches are necessary).
Among the first loci used to assess levels of variation in human populations were various blood group loci, including the ABO blood group system and the MN blood group polymorphism.
MN blood group polymorphisms
So let's consider an example involving the MN blood group polymorphism. The gene has two codominant alleles in populations, M and N, and with a simple blood test one can determine the genotype of individuals as MM, MN or NN.
So let's imagine you have the following sample from a population:

Phenotype              M                 MN                N                 Total
Genotype               MM                MN                NN
Observed number        180               240               80                500
Genotype proportion    180/500 = 0.36    240/500 = 0.48    80/500 = 0.16
Note that the proportions should add to 1, or you've made a mistake.
We can also determine the frequencies of alleles in the population. Since there is co-dominance, this is very simple: count up the numbers of the various alleles and divide by the total number of alleles counted. Alternatively, it can be done using the proportions of each genotype.
1. Using the numbers of various genotypes observed in the population
p = freq(M) = # of M alleles/total alleles = (2 x 180 + 240)/ (2 x 500) = 0.6
q = freq(N) = # of N alleles/total alleles = (2 x 80 + 240)/ (2 x 500) = 0.4
Note that the allele frequencies must satisfy p + q = 1, or you've made a mistake.
2. Using the proportions of each genotype.
p = freq(MM) + 1/2 x freq(MN) = 0.36 + 0.48/2 = 0.6
q = freq(NN) + 1/2 x freq(MN) = 0.16 + 0.48/2 = 0.4
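As a quick check on the arithmetic, here is a minimal Python sketch of both methods applied to the MN sample above. The variable names are illustrative only; both methods should agree exactly.

```python
# Allele frequencies at a co-dominant two-allele locus (MN blood group sample).
counts = {"MM": 180, "MN": 240, "NN": 80}
n = sum(counts.values())  # 500 individuals, so 2n = 1000 alleles

# Method 1: count alleles directly (homozygotes carry two copies).
p_count = (2 * counts["MM"] + counts["MN"]) / (2 * n)
q_count = (2 * counts["NN"] + counts["MN"]) / (2 * n)

# Method 2: homozygote proportion plus half the heterozygote proportion.
props = {genotype: c / n for genotype, c in counts.items()}
p_prop = props["MM"] + props["MN"] / 2
q_prop = props["NN"] + props["MN"] / 2

print(f"p = {p_count:.2f} (counting), {p_prop:.2f} (proportions)")
print(f"q = {q_count:.2f} (counting), {q_prop:.2f} (proportions)")
```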
So this provides a simple description of the allele frequencies for a particular gene in a particular population. It is also possible to compare populations, as seen below.
Relative to most populations, which have roughly comparable frequencies of the two alleles, the Eskimo population studied has a considerably greater frequency of the M allele (and correspondingly lower frequency of N), while the Australian Aboriginal population has the reverse pattern.
The reasons for these differences are unknown, as is the function of the MN polymorphism, and it is not known whether the genotypes differ functionally. Random processes and the lack of migration between populations (or their degree of isolation) are likely responsible for these differences; at least, this is the null hypothesis.
The ABO blood group system shows co-dominance between only two of its three alleles (the third is recessive), so other estimation methods, with additional assumptions, are needed to determine allele frequencies. Allele frequencies also differ among some human populations.
Here the Sioux population has an unusually low frequency of the IA allele compared with other populations. Again, random genetic drift may well be responsible.
Protein polymorphisms
The application of protein electrophoresis to assess levels of genetic variation in populations began in 1966 and sparked the "find 'em and grind 'em" era. These methods allowed the easy determination of genotype and allele frequencies in numerous organisms and for a large number of genes. That is, one could assess not just model organisms, like fruit flies, but most organisms.
Furthermore, these polymorphisms are commonly co-dominant, so there is a one-to-one correspondence between genotype and phenotype, making allele frequencies easy to determine.
The first studies were carried out in 1966 by Harris on humans, and by Lewontin and Hubby on fruit flies.
See the PowerPoint slides for example gels as a reminder, and for a few other slides on the classical and balance hypotheses.
Using these methods, one can easily determine genotype and allele frequencies at numerous loci.
So imagine you obtain the following data for an esterase locus with three alleles, denoted F, M and S, encoding fast, intermediate and slow migrating forms of the enzyme. (See the PowerPoint slide of the figure below.)
[Figure: Schematic of allozyme gel electrophoresis showing an enzyme segregating for three alleles, F, M and S, in a population sample. Band positions F, M and S are marked; lane genotypes: SS, MS, MM, FS, FM, FF, MS, MM, FS, FM, SS, SS.]
So here are the data collected:

Genotype          FF    FM    FS    MM    MS    SS    Total
Observed number   100   150   50    125   0     75    500
We can easily calculate the proportion of each genotype by dividing the observed numbers by 500.
Allele frequencies are estimated just as with two alleles:
p = freq(F) = (2 x #FF + #FM + #FS)/(2 x 500) = (2 x 100 + 150 + 50)/1000 = 0.4
q = freq(M) = (2 x #MM + #FM + #MS)/(2 x 500) = (2 x 125 + 150 + 0)/1000 = 0.4
r = freq(S) = (2 x #SS + #FS + #MS)/(2 x 500) = (2 x 75 + 50 + 0)/1000 = 0.2
Note that p + q + r = 1, or you've made an error.
In general, for a locus with any number of co-dominant alleles, the equation is:
Freq(allele) = (2 x number homozygous for the allele + number heterozygous for it) / total alleles
or, if you have the proportion of each genotype:
Freq(allele) = proportion homozygous for the allele + 1/2 x proportion heterozygous for it.
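The general counting rule translates directly into a few lines of Python. This sketch assumes each genotype is written as two single-character allele symbols (e.g. "FM"), which is a convention chosen here for illustration, and applies it to the esterase data above.

```python
from collections import Counter


def allele_frequencies(genotype_counts):
    """Allele frequencies from counts of co-dominant diploid genotypes.

    Each genotype key is two single-character allele symbols, e.g. "FM".
    A homozygote contributes two copies of its allele; a heterozygote
    contributes one copy of each of its two alleles.
    """
    allele_counts = Counter()
    for genotype, count in genotype_counts.items():
        for allele in genotype:  # two symbols per diploid genotype
            allele_counts[allele] += count
    total_alleles = 2 * sum(genotype_counts.values())
    return {allele: c / total_alleles for allele, c in allele_counts.items()}


esterase = {"FF": 100, "FM": 150, "FS": 50, "MM": 125, "MS": 0, "SS": 75}
freqs = allele_frequencies(esterase)
print(freqs)
```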
Note that these equations can also be used for any co-dominant marker, including DNA-based markers or polymorphisms such as CAPS, RFLPs, SNPs, etc.
Sequencing DNA directly can, of course, give the finest level of resolution for determining the extent of genetic variation in populations.
RFLP variation in a sample of 58 flies across a 4.5 kb region of the genome of D. pseudoobscura: there are 78 RFLPs and 53 unique haplotypes, illustrating the great extent of genetic variation in natural populations (see the book or PowerPoint slides).
Effect of sexual reproduction on genetic variation.
Darwin was unaware of the correct mechanism of inheritance, and in at least one edition of On the Origin of Species he invoked blending inheritance. Under blending inheritance, sexual reproduction would deplete populations of genetic variation, and hence evolution could not occur.
In the early 1900s, it was shown independently (by Hardy and by Weinberg) that sexual reproduction alone doesn't alter levels of genetic variation, and that it leads to an equilibrium frequency of genotypes in a population after just one generation of random mating.
This has become known as Hardy-Weinberg equilibrium.
One way to consider what genotype frequencies should be under random mating (with respect to a particular locus) is the gamete pool approach.
That is, imagine an organism that sheds all its eggs and all its sperm into the ocean, followed by random union of gametes.
So, let's imagine there are two alleles, A and a, with p = freq(A) and q = freq(a) in the population.
The frequencies lie somewhere between 0 and 1, and of course p + q = 1.
So we can set up a Punnett square to deduce the expected frequency of offspring in the population under random mating:
                   Eggs
                   A (p)       a (q)
Sperm   A (p)      AA: p²      Aa: pq
        a (q)      Aa: pq      aa: q²
So under random mating we expect:
Freq(AA) = p²
Freq(Aa) = 2pq
Freq(aa) = q²
This result holds independently of the initial genotype frequencies in the population. After just one round of random mating, the progeny generation will be at these Hardy-Weinberg equilibrium frequencies.
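A minimal sketch of the gamete-pool argument, assuming an infinitely large population, discrete generations, and no selection, mutation or migration. The starting population (half AA, half aa, no heterozygotes) is hypothetical, chosen to be far from HW proportions.

```python
def random_union_of_gametes(d, h, r):
    """One generation of random union of gametes.

    d, h, r are the parental frequencies of AA, Aa and aa (summing to 1).
    Returns the offspring genotype frequencies (AA, Aa, aa).
    """
    p = d + h / 2  # frequency of A in the gamete pool
    q = r + h / 2  # frequency of a in the gamete pool
    return p * p, 2 * p * q, q * q


# Start far from Hardy-Weinberg proportions: no heterozygotes at all.
gen0 = (0.5, 0.0, 0.5)
gen1 = random_union_of_gametes(*gen0)
gen2 = random_union_of_gametes(*gen1)
print(gen1)  # HW proportions for p = 0.5
print(gen2)  # unchanged by a further round of random mating
```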
Note that the mating system described is, of course, atypical for humans, who don't shed their sperm and eggs into the ocean and allow random fertilization to occur. However, the same result holds for a locus if individuals mate at random in pairs. The demonstration is a little more laborious to derive, but not difficult.
First enumerate all possible matings among the three genotypes (9 of them), and weight each mating by the product of the two parental genotype frequencies. Then determine the fraction of each kind of offspring produced by each mating, weight it by the probability of that mating, and add up the progeny across all matings.
So, imagine you have a population with two alleles, A and a, and genotype frequencies freq(AA) = D, freq(Aa) = H, freq(aa) = R.
Note that p = D + H/2 and q = R + H/2; also D + H + R = 1.
Matings (male x female), their probabilities, and the offspring contributed:
AA x AA: prob D x D; offspring all AA; contributes D² to AA
AA x Aa: prob D x H; offspring 1/2 AA : 1/2 Aa; contributes DH/2 to AA and DH/2 to Aa
AA x aa: prob D x R; offspring all Aa; contributes DR to Aa
etc., through to
aa x aa: prob R x R; offspring all aa; contributes R² to aa
Then add up the proportions of AA offspring, then Aa, and then aa.
You'll have expressions in terms of D, H and R; substitute the expressions for p and q into them.
You should end up with Freq(AA) = p²; Freq(Aa) = 2pq; Freq(aa) = q².
Thus, when individuals mate at random in pairs with respect to some locus, you expect to find Hardy-Weinberg proportions of each genotype.
That is: Freq(AA) = p²; Freq(Aa) = 2pq; Freq(aa) = q².
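The pair-mating bookkeeping can be checked mechanically. This sketch enumerates the nine matings using exact fractions, so the comparison with p², 2pq and q² is exact rather than approximate. The starting values D = 0.3, H = 0.1, R = 0.6 are arbitrary illustrative numbers; any values summing to 1 should give the same conclusion.

```python
from fractions import Fraction
from itertools import product


def offspring_of_pair_matings(d, h, r):
    """Offspring genotype frequencies when individuals mate at random in pairs.

    Enumerates all 9 (male, female) genotype combinations, weights each by the
    product of the parents' frequencies, and sums the Mendelian offspring.
    """
    freq = {"AA": d, "Aa": h, "aa": r}
    # Probability that a parent of each genotype transmits allele A.
    transmit_A = {"AA": 1, "Aa": Fraction(1, 2), "aa": 0}
    totals = {"AA": 0, "Aa": 0, "aa": 0}
    for male, female in product(freq, repeat=2):
        weight = freq[male] * freq[female]
        pm, pf = transmit_A[male], transmit_A[female]
        totals["AA"] += weight * pm * pf
        totals["Aa"] += weight * (pm * (1 - pf) + (1 - pm) * pf)
        totals["aa"] += weight * (1 - pm) * (1 - pf)
    return totals


d, h, r = Fraction(3, 10), Fraction(1, 10), Fraction(6, 10)
p, q = d + h / 2, r + h / 2
result = offspring_of_pair_matings(d, h, r)
print(result["AA"] == p**2, result["Aa"] == 2 * p * q, result["aa"] == q**2)
# prints: True True True
```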
Note that Hardy-Weinberg proportions can be extended to any number of alleles per locus. For any number of alleles, the frequency of a particular homozygote is always the square of that allele's frequency, while the frequency of a heterozygote is always 2 x the product of the frequencies of the two alleles it carries.
So, for three alleles, F, M and S, at frequencies p, q and r, you'd expect:
FF p²; FM 2pq; FS 2pr; MM q²; MS 2qr; SS r²
These are the terms of the expansion of (p + q + r)².
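A short sketch of the multi-allele HW expectation, reading each genotype frequency off the expansion of (p + q + r)². It uses the esterase allele frequencies estimated earlier (p = 0.4, q = 0.4, r = 0.2) purely as example input; the function name is illustrative.

```python
from itertools import combinations_with_replacement


def hardy_weinberg(allele_freqs):
    """Expected HW genotype frequencies for any number of alleles.

    Homozygote ii: p_i ** 2; heterozygote ij: 2 * p_i * p_j.
    These are the terms of the expansion of (p_1 + ... + p_k) ** 2.
    """
    expected = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        pa, pb = allele_freqs[a], allele_freqs[b]
        expected[a + b] = pa * pb if a == b else 2 * pa * pb
    return expected


hw = hardy_weinberg({"F": 0.4, "M": 0.4, "S": 0.2})
for genotype, freq in hw.items():
    print(genotype, round(freq, 4))
```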
HW equilibrium can also be extended to tetrasomic inheritance through the expansion of (p + q)⁴.
Note that random mating and the resulting HW equilibrium don't change allele frequencies, so in a large random-mating population you wouldn't expect allele or genotype frequencies to change (unless some other force is operating).
One consequence of HW equilibrium (and random mating) is that rare alleles occur mostly in heterozygotes. That is, there will be few individuals homozygous for a rare allele. This has important consequences in evolution and in medical genetics: most human genetic disorders are rare, and hence most copies of the alleles for recessive disorders exist, unrecognized, in individuals who are heterozygous for them.
So imagine the frequency of an allele d causing a recessive disease is q = 0.001 (so p = 0.999).
Expected frequencies assuming HW equilibrium are:
DD = p² = 0.998    Dd = 2pq = 0.001998    dd = q² = 0.000001
So here 1 in 1 million have the disease (they are dd), while close to 2000 in a million carry the disease-causing allele as heterozygotes and are normal.
That is, there are about 2000 times as many individuals carrying the allele as heterozygotes as there are individuals with the disease.
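The carrier-to-affected ratio follows directly from HW proportions: 2pq / q² = 2p / q. A minimal sketch, assuming HW equilibrium as in the example above:

```python
def carrier_to_affected_ratio(q):
    """Ratio of heterozygous carriers (2pq) to affected homozygotes (q**2) under HW.

    Algebraically this simplifies to 2p / q, so it grows as the allele gets rarer.
    """
    p = 1 - q
    return (2 * p * q) / (q * q)


q = 0.001
p = 1 - q
print(f"affected (dd): {q**2:.6f}")
print(f"carriers (Dd): {2 * p * q:.6f}")
print(f"carriers per affected: {carrier_to_affected_ratio(q):.0f}")
```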
This is one reason why it isn't easily possible to get rid of rare recessive genetic disorders: you'd have to eliminate far more normal people than diseased ones.
This is also why natural and artificial selection against recessives is slow: most copies of a recessive allele are hidden in heterozygotes, so its phenotype is rarely expressed until its frequency becomes appreciable.
Note that one can determine whether a population is at HW equilibrium with respect to some locus by assaying a random sample from the population and doing a chi-square goodness-of-fit test, as in the example below:
Sample 100 flies for ADH activity:

                  FF    FS    SS    N
Number of flies   22    30    48    100

We first need to estimate allele frequencies:
p = freq(F) = (2 x 22 + 30)/(2 x 100) = 0.37
q = freq(S) = (2 x 48 + 30)/(2 x 100) = 0.63
Check that p + q = 1.
Now set up a table with the observed and expected numbers (based upon HW equilibrium):

                      FF               FS                SS               N
Observed # of flies   22               30                48               100
Expected # of flies   N x p² = 13.69   N x 2pq = 46.62   N x q² = 39.69

Chi-square = 12.7; tabulated chi-square (df = 1, 5% level) = 3.841
df = 3 classes - 1 - 1 independent parameter estimated from the data = 1
Reject HW equilibrium: with respect to ADH, the flies are not at HW equilibrium. There appear to be too few FS and too many FF and SS. Why that is requires further study; it could be inbreeding.
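The same chi-square test can be scripted. This is a minimal standard-library sketch for a two-allele locus; the critical value 3.841 (df = 1, 5% level) is taken from the text rather than computed.

```python
def hw_chi_square(n_ff, n_fs, n_ss):
    """Chi-square goodness-of-fit test of HW proportions at a two-allele locus.

    Returns (chi_square, expected_counts). One allele frequency is estimated
    from the data, so df = 3 classes - 1 - 1 = 1.
    """
    n = n_ff + n_fs + n_ss
    p = (2 * n_ff + n_fs) / (2 * n)
    q = 1 - p
    expected = (n * p * p, n * 2 * p * q, n * q * q)
    observed = (n_ff, n_fs, n_ss)
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_sq, expected


CRITICAL_DF1_5PCT = 3.841  # tabulated chi-square value, df = 1, alpha = 0.05

chi_sq, expected = hw_chi_square(22, 30, 48)
print([round(e, 2) for e in expected])  # expected counts
print(round(chi_sq, 1), chi_sq > CRITICAL_DF1_5PCT)  # prints: 12.7 True
```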
Note that in cases where there is dominance, and you wish to estimate allele frequencies,
you can do this if you assume mating is random with respect to the locus you are
considering.
So imagine flower colour where red = RR and Rr, and white = rr.
Let p = freq(R) and q = freq(r).
Sample a population as follows:

                       Red flowers (RR and Rr)   White (rr)      N
Observed               75                        25              100
Phenotype proportion   75/100 = 0.75             25/100 = 0.25

Assuming HW equilibrium, the genotype frequencies are p² + 2pq + q².
The easy solution is to set q² = 0.25.
Take the square root: q = 0.5.
Since p + q = 1, p = 1 - 0.5 = 0.5.
If we wished, we could also determine the expected frequency of each genotype: RR = p² = 0.25 and Rr = 2pq = 0.5.
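A minimal sketch of the square-root estimate under dominance. It only works because HW proportions are assumed, which is also why the result cannot then be used to test HW. The function name is illustrative.

```python
import math


def allele_freqs_under_dominance(n_dominant_phenotype, n_recessive_phenotype):
    """Estimate p and q when only the recessive homozygote is distinguishable.

    ASSUMES Hardy-Weinberg proportions, so q**2 = proportion of recessive
    phenotypes. Because HW is assumed, it cannot then be tested.
    """
    n = n_dominant_phenotype + n_recessive_phenotype
    q = math.sqrt(n_recessive_phenotype / n)
    p = 1 - q
    return p, q


p, q = allele_freqs_under_dominance(75, 25)  # red vs white flowers
print(p, q)  # allele frequencies
print(p**2, 2 * p * q)  # expected RR and Rr frequencies
```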
Can we test for HW equilibrium?
No. We assumed it to get here, and there are also no degrees of freedom:
df = 2 classes - 1 - 1 parameter estimated = 0
Departures from random mating.
If a species is divided into separate populations, random mating may occur within populations but not necessarily between them. If allele frequencies differ between populations, then HW equilibrium will likely not be observed across the species as a whole, even though it may hold within each population (given random mating within populations).
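A small numerical illustration of this point, using two hypothetical subpopulations of equal size with allele frequencies 0.9 and 0.1 (numbers invented for illustration). Each is at HW internally, yet the pooled sample has far fewer heterozygotes than HW would predict from the pooled allele frequency.

```python
# Two hypothetical subpopulations of equal size, each at HW internally.
subpop_p = [0.9, 0.1]

# Heterozygote frequency within each subpopulation, then pooled with equal weights.
hets_within = [2 * p * (1 - p) for p in subpop_p]
pooled_het = sum(hets_within) / len(subpop_p)

# What HW would predict if the pooled sample were one random-mating population.
pooled_p = sum(subpop_p) / len(subpop_p)
hw_het = 2 * pooled_p * (1 - pooled_p)

print(f"observed heterozygosity in pooled sample: {pooled_het:.2f}")
print(f"HW expectation from pooled allele freq:  {hw_het:.2f}")
```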
Inbreeding.
One form of departure from random mating is inbreeding (or at least, inbreeding in excess of what would occur by chance).
Inbreeding is mating among relatives, or, in its most extreme form, selfing.
The more closely related the mating individuals, the greater the effects on genotype frequencies (the greater the departure from HW equilibrium).
If an entire population were suddenly to undergo nothing but inbreeding, then eventually all loci would become homozygous. The rate at which this occurs depends on the kind of inbreeding, e.g. first-cousin matings, brother-sister matings, selfing, etc. It is fastest for selfing!
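Under complete selfing the decline in heterozygosity is easy to follow: a selfed Aa parent produces 1/4 AA, 1/2 Aa and 1/4 aa, while homozygotes breed true, so the heterozygote frequency halves each generation. A minimal sketch, assuming complete selfing and no selection; the starting value 0.5 is illustrative.

```python
def selfing_heterozygosity(h0, generations):
    """Heterozygote frequency under complete selfing (no selection).

    Only half of a selfed heterozygote's offspring are heterozygous, and
    homozygotes breed true, so the heterozygote frequency halves each generation.
    """
    h = h0
    history = [h]
    for _ in range(generations):
        h = h / 2
        history.append(h)
    return history


print(selfing_heterozygosity(0.5, 5))
```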
Inbreeding depression often results from inbreeding (in normally outbred organisms).
Inbreeding depression is the reduction in fitness of inbred relative to outbred offspring.
In many ways it is the opposite of hybrid vigour.
Inbreeding depression is largely the result of recessive or partially recessive genes that
become homozygous upon inbreeding.
The inbreeding coefficient, F, may be estimated from co-dominant data at one locus (or, preferably, more than one). Inbreeding affects all loci in the genome, as opposed to other departures from random mating.
F = 1 - (obs het / exp het)
where obs het is the observed proportion of heterozygotes at a locus and exp het is the expected proportion assuming HW equilibrium.
When there is random mating, obs het = exp het and F = 0 (no inbreeding).
At the opposite extreme, F = 1 when there are no heterozygotes at all at a locus that has two or more alleles. Normally 0 < F < 1.
Note that you'd expect approximately the same F value at all loci if a population is inbreeding.
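As an illustration, the formula can be applied to the ADH sample from the HW test above (30 of 100 flies heterozygous, p = 0.37). This treats the heterozygote deficit as if it were due to inbreeding, which the text only suggests as one possibility; a single locus gives at best a rough estimate.

```python
def inbreeding_coefficient(obs_het, exp_het):
    """F = 1 - (observed heterozygosity / HW-expected heterozygosity)."""
    return 1 - obs_het / exp_het


# ADH example: 30 of 100 flies were FS, and p = freq(F) = 0.37.
p = 0.37
obs_het = 30 / 100
exp_het = 2 * p * (1 - p)
F = inbreeding_coefficient(obs_het, exp_het)
print(round(F, 3))
```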
To explore the effects of inbreeding, imagine that 1 in 10,000 people have the recessive genetic disease cystic fibrosis. That is, the frequency of the disease-causing allele is q = 0.01, while that of the normal allele is p = 0.99.

Per 10,000 people   Normal    Cystic fibrosis
F = 0               9999      1
F = 0.5             9949.5    50.5
F = 1               9900      100
So with partial inbreeding (F = 0.5), we'd see about 50 times more people with the disease. Now recall that there are numerous recessive genetic disorders; homozygosity would increase for all of them, increasing the frequency of people with numerous other genetic disorders.
Note that if you can estimate F, you can predict the frequency of the various genotypes:
AA = p² + Fpq    Aa = 2pq(1 - F)    aa = q² + Fpq
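The genotype formulas with F reproduce the cystic fibrosis table. A minimal sketch, assuming q = 0.01 as in that example and reporting numbers per 10,000 people:

```python
def genotype_freqs_with_inbreeding(p, f):
    """Genotype frequencies (AA, Aa, aa) given allele frequency p and inbreeding coefficient F."""
    q = 1 - p
    return (p * p + f * p * q, 2 * p * q * (1 - f), q * q + f * p * q)


# Cystic fibrosis example: p = 0.99, q = 0.01, per 10,000 people.
for f in (0.0, 0.5, 1.0):
    aa = genotype_freqs_with_inbreeding(0.99, f)[2]
    print(f"F = {f}: affected = {aa * 10_000:.1f}, unaffected = {(1 - aa) * 10_000:.1f}")
```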
Other deviations from random mating
Assortative mating. This is where, say, individuals with the same phenotype mate with each other more often than expected under random mating. In humans, for example, there is a tendency to assortative mating by height. Note that assortative mating could increase homozygosity of genes for height, but it will not affect all genes in the genome as inbreeding does.
Negative assortative mating is the opposite, with unlike individuals mating. This form of mating can increase heterozygosity for genes involved in the particular trait. Tristylous plant populations exhibit negative assortative mating and thus maintain the three phenotypes at approximately equal frequencies in the population.
Show the image again (i.e. see the PowerPoint slides).
Or diagram a homomorphic self-incompatibility (SI) system, showing the mating advantage of low-frequency genotypes (omitted).
Mutation
Mutation is the ultimate source of genetic variation and can introduce new alleles into
populations. It is, however, a weak force because of its low rate and alone results only in
very slow changes in allele frequency through time.
Rates of mutation vary across organisms and genes, but typically are about 1 mutation in 1 million per gene per generation, or 10⁻⁶, and perhaps range from about 10⁻⁶ to 10⁻⁹ mutations per gene per generation.
Imagine a population is composed of just the A allele at some point in time, and the mutation rate to other alleles is 10⁻⁶.
So in generation t = 0, the frequency of the A allele is p = 1.
In the next generation, 1 in 1 million A alleles mutate to other alleles, so the frequency of the A allele is reduced to p = 1 - 10⁻⁶ = 0.999999.
In the following generation, p = 0.999999 - 0.999999 x 10⁻⁶ ≈ 0.999998, and so on.
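The recursion above compounds to p(t) = p0 (1 - u)^t. This sketch assumes one-way mutation away from A with no back mutation, and uses the two rates shown in the figure (10⁻⁵ and 10⁻⁶); the halving-time helper is an illustrative addition that follows from the same formula.

```python
import math


def frequency_after_mutation(p0, u, generations):
    """Frequency of A after one-way mutation from A to other alleles at rate u.

    Each generation p(t+1) = p(t) * (1 - u), so p(t) = p0 * (1 - u)**t.
    Back mutation to A is ignored in this sketch.
    """
    return p0 * (1 - u) ** generations


def generations_to_halve(u):
    """Generations for p to fall from p0 to p0 / 2 under the same model."""
    return math.log(0.5) / math.log(1 - u)


for u in (1e-5, 1e-6):
    print(f"u = {u:g}: p after 100,000 generations = "
          f"{frequency_after_mutation(1.0, u, 100_000):.3f}, "
          f"halving time = {generations_to_halve(u):,.0f} generations")
```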
[Figure: Allele frequency change due to mutation alone. Frequency of the A allele (0 to 1) plotted against generation (0 to 100,000), with curves for mutation rates of 0.00001 and 0.000001.]
The basic message is plain: while mutation is a critically important process, mutation alone is very slow at changing allele frequencies in populations.
Variation due to recombination
Recombination has a greater capacity to generate genetic variation than mutation. This is
because recombination can shuffle or create new gene combinations due to crossing over
or independent assortment for genes on different chromosomes.
Given the low rate of mutation, when a mutation first occurs it will be unique and will occur in a given chromosomal context. So imagine the following situation, with one gene already polymorphic in the population and a new mutation occurring at the other. Initially, the population is polymorphic at the A locus, with A and a alleles, while it is fixed for the B allele, except for this first new b* mutation.
So with respect to the two loci, the population is initially composed of the chromosomes (or haplotypes) AB and aB. Mutation produces a new haplotype ab*, and the new mutation initially never occurs with the A allele.
Before mutation: AB, aB    →    after mutation of B to b* on an a chromosome: AB, aB, ab*
Imagine there is no crossing over and that, for some reason, the ab* combination increases in frequency.
Then the population will be composed of only the gene combinations AB, aB and ab*. There will be no Ab*.
If we allow recombination to occur between the A and B genes, then over time a random association of alleles will be generated. This state is called linkage equilibrium. The frequency of each combination of alleles at different genes is expected to equal the product of their individual frequencies. That is, whether a chromosome has the A allele is independent of whether it has the B or b* allele.
So at linkage equilibrium:
freq(AB) = freq(A) x freq(B)
freq(Ab*) = freq(A) x freq(b*)
freq(aB) = freq(a) x freq(B)
freq(ab*) = freq(a) x freq(b*)
The original situation is described as linkage disequilibrium, that is, a nonrandom association of alleles. When new mutations occur, they are initially in linkage disequilibrium. The disequilibrium eventually breaks down, provided there is some recombination between the genes. The more recombination there is, the more rapidly the disequilibrium breaks down.
An example of extreme disequilibrium would be if you had only AB and ab chromosomes in the population. You could even set this up in a fruit fly bottle experiment and follow the change in frequency of haplotypes through time. Note that linkage disequilibrium can even occur for genes that are not linked, but of course it will break down more quickly if the genes are not linked.
Note that linkage disequilibrium can also be generated by other forces, such as natural selection, migration, or intermixing of two previously isolated populations. Even drift can generate linkage disequilibrium, which will of course decay.
Show graphs of the decay (see more in the PowerPoint slides).
[Figure: Decay of linkage disequilibrium with r = 0.01. Two-locus (haplotype) frequencies of AB, Ab, aB and ab (0 to 0.6) plotted against generation (1 to about 200).]
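The decay in the figure can be reproduced with a short simulation. This sketch assumes random mating, no selection, and the standard textbook two-locus recursion written in terms of a disequilibrium measure D = x_AB x_ab - x_Ab x_aB (D is introduced here for illustration; it is not defined in the notes). It starts from the extreme case with only AB and ab chromosomes and uses r = 0.01 as in the figure; under this recursion D shrinks by a factor (1 - r) each generation while allele frequencies stay fixed.

```python
def disequilibrium(h):
    """D = x_AB * x_ab - x_Ab * x_aB for a dict of haplotype frequencies."""
    return h["AB"] * h["ab"] - h["Ab"] * h["aB"]


def next_generation(h, r):
    """One generation of recombination at rate r between two loci.

    Moves r * D from the AB and ab haplotypes into Ab and aB.
    Allele frequencies (e.g. x_AB + x_Ab for A) do not change.
    """
    d = disequilibrium(h)
    return {
        "AB": h["AB"] - r * d,
        "ab": h["ab"] - r * d,
        "Ab": h["Ab"] + r * d,
        "aB": h["aB"] + r * d,
    }


# Extreme disequilibrium: only AB and ab chromosomes.
haps = {"AB": 0.5, "Ab": 0.0, "aB": 0.0, "ab": 0.5}
r = 0.01
d0 = disequilibrium(haps)
for generation in range(1, 201):
    haps = next_generation(haps, r)
    if generation % 50 == 0:
        print(generation, {k: round(v, 3) for k, v in haps.items()})
```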
Recently, linkage disequilibrium has been exploited to help localize or identify genes for various human genetic diseases. One can look for an association between various molecular markers and the occurrence of the disease. The closer the marker is to the disease gene, the stronger the linkage disequilibrium tends to be, and this can then be exploited to find the gene.
Variation from migration.
If populations differ in allele frequency, then migration of individuals from one population into another can change allele frequencies. The rate of change will be a function of how much migration occurs, and migration can be more potent than mutation at changing allele frequencies. Migration tends to homogenize allele frequencies across populations if there is bidirectional migration (migrants from population A go to population B and vice versa). The final allele frequencies may end up being the average of the two initial frequencies in the separate populations; population size will also be important in determining the final frequencies.
Migration can also introduce an entirely new allele into a population in which it did not previously exist.
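A minimal sketch of the homogenizing effect, assuming two populations of equal size that exchange a fixed fraction m of individuals each generation, with no drift or selection. The model and the values m = 0.05, p = 0.9 and p = 0.1 are illustrative assumptions, not taken from the notes. With equal sizes, both frequencies converge on the initial average.

```python
def exchange_migrants(p1, p2, m, generations):
    """Symmetric migration between two equal-sized populations.

    Each generation a fraction m of each population is replaced by migrants
    from the other: p1' = (1 - m) p1 + m p2, and likewise for p2.
    """
    history = [(p1, p2)]
    for _ in range(generations):
        p1, p2 = (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1
        history.append((p1, p2))
    return history


traj = exchange_migrants(0.9, 0.1, m=0.05, generations=100)
for gen in (0, 10, 50, 100):
    print(gen, [round(x, 3) for x in traj[gen]])
```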
Natural Selection
Natural selection involves the differential rate of survival and/or reproduction of individuals with different genotypes. It has the capacity to change allele frequencies relatively rapidly (compared with the other evolutionary forces of mutation, migration and drift). It can also, however, maintain genetic variation and/or prevent evolutionary change from occurring at a locus. It is the process responsible for the adaptation of organisms to their environment. The process is very much like the artificial selection practiced by plant and animal breeders, the difference being that breeders have a human-conceived goal, while in nature the particular environment (biotic and abiotic) drives the direction of evolutionary change.
We often refer to Darwinian fitness, which is a relative measure of the probability of survival and reproduction of a genotype or phenotype. That is, it is measured relative to other genotypes or phenotypes in the population.
Relative fitness is a function of the environment an individual is in, so a particular genotype might be favoured in one environment but less favoured in another.
An allele that results in increased offspring production in an environment, compared with other alleles, will increase in frequency.
Different forms of selection.
Frequency-independent selection is selection where the fitness of a genotype/phenotype does not depend upon the composition of the population to which it belongs. That is, fitness is a function of the environment but not of the composition of the population. As an example, the fitness of plants might be a function of their leaf number, which affects the amount of photosynthesis, something that is independent of the other plants in the population.
Frequency-dependent selection is where the frequencies of other genotypes/phenotypes affect fitness. For example, in plants with self-incompatibility systems, rare genotypes at the S-locus have greater relative fitness because they can mate with many other plants, compared with plants carrying more common S-alleles. This process can result in high levels of genetic variation at incompatibility loci, and in its maintenance over long periods of time.
Density-dependent selection is selection where population size or density changes the relative fitness of genotypes/phenotypes. For example, in field mice there could be selection for increasingly aggressive behaviour as density increases (i.e. fight your neighbours to get more food).
Measuring fitness differences
While natural selection acts on the sum total of genotypic/phenotypic characters in the wild, we'll consider cases where variation at a single gene is responsible for fitness differences. Clearly there are many such examples in humans and other organisms when we consider genes with detrimental effects, such as alleles for various genetic disorders, which can severely reduce the fitness of individuals with those genotypes. The alleles causing these disorders are typically selected against, and hence they are typically at low frequency in populations.
The effects of selection can be determined at a single locus by considering the relative
fitness of individuals with various genotypes.
So let's imagine a form of selection that operates some time after zygote formation but before reproductive maturity. This is sometimes referred to as viability selection.
We'll consider a gene with two alleles, A and a, with initial frequencies p and q.
Assume we begin at zygote formation with HW equilibrium genotype frequencies.
Genotype        AA          Aa           aa
Zygote stage    p²          2pq          q²
(viability selection occurs here, and some individuals die as a function of their genotype)
Adults          W_AA p²     W_Aa 2pq     W_aa q²
Here the W's represent the probability of surviving to reproductive age.

Because these W's are likely to be smaller than 1, the genotype proportions would no longer add up to one. We can rescale them so that they do by dividing through by the mean fitness of the population:
W̄ = W_AA p² + W_Aa 2pq + W_aa q²
So after selection, the genotype frequencies will be:
AA = W_AA p² / W̄    Aa = W_Aa 2pq / W̄    aa = W_aa q² / W̄
These now sum to one and can easily be compared with the initial genotype frequencies.
Given these frequencies after selection, we can now also calculate allele frequencies.
After selection, the frequency of A will be p' and that of a will be q':
p' = (p² W_AA + pq W_Aa) / W̄
q' = (q² W_aa + pq W_Aa) / W̄
If selection is constant from one generation to the next, one can then use the new genotype or allele frequencies and the same equations to work out what happens in subsequent generations, each time calculating the new genotype frequencies from those in the previous generation and then calculating mean fitness, allele frequencies, etc.
Note that under frequency independent selection the mean fitness of the population tends
to increase through generations.
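The recursion above can be iterated directly. This is a minimal sketch assuming constant viability fitnesses and HW zygotes every generation; the fitnesses (1.0, 1.0, 0.8) and starting p = 0.1 are hypothetical. It records mean fitness each generation, which should never decrease under this kind of constant, frequency-independent selection at one locus.

```python
def viability_selection(p, w_AA, w_Aa, w_aa):
    """One generation of viability selection starting from HW zygotes.

    Returns (p_next, w_bar) with
    w_bar = w_AA p^2 + w_Aa 2pq + w_aa q^2 and
    p'    = (p^2 w_AA + p q w_Aa) / w_bar.
    """
    q = 1 - p
    w_bar = w_AA * p * p + w_Aa * 2 * p * q + w_aa * q * q
    p_next = (p * p * w_AA + p * q * w_Aa) / w_bar
    return p_next, w_bar


def run(p, fitnesses, generations):
    """Iterate constant selection, recording (p, mean fitness) each generation."""
    trajectory = []
    for _ in range(generations):
        p_next, w_bar = viability_selection(p, *fitnesses)
        trajectory.append((p, w_bar))
        p = p_next
    return trajectory


# Hypothetical fitnesses: A dominant and favoured over aa.
traj = run(0.1, (1.0, 1.0, 0.8), generations=50)
for gen in (0, 10, 25, 49):
    p, w_bar = traj[gen]
    print(gen, round(p, 3), round(w_bar, 3))
```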
Keep in mind that selection is really a relative measure. That is, it is the fitness or in this
case the survivorship of one genotype relative to others that leads to increases or
decreases in allele frequencies.
An example of viability selection in plants.
Cyanogenesis is the ability of plants to release hydrogen cyanide gas, HCN, when they are eaten by herbivores. The genetics is relatively simple, with two alleles, A and a, where AA and Aa plants produce cyanide and aa plants don't.
Further, let's imagine we start the population at HW equilibrium frequencies with p = q = 0.5.
Genotype          AA            Aa            aa
Zygote stage      0.25          0.5           0.25
Prob. surviving   0.5           0.5           0.1
Fitness           W_AA = 0.5    W_Aa = 0.5    W_aa = 0.1

W̄ = W_AA p² + W_Aa 2pq + W_aa q²
W̄ = 0.5 x 0.25 + 0.5 x 0.5 + 0.1 x 0.25 = 0.4

So after selection, the genotype frequencies will be:
AA = W_AA p² / W̄ = 0.5 x 0.25 / 0.4 = 0.3125
Aa = W_Aa 2pq / W̄ = 0.5 x 0.5 / 0.4 = 0.625
aa = W_aa q² / W̄ = 0.1 x 0.25 / 0.4 = 0.0625
Allele frequencies are:
p' = 0.3125 + 0.625/2 = 0.625
q' = 0.0625 + 0.625/2 = 0.375
So there has been an increase in the frequency of the A allele and a decrease in a.
If this process were to continue, the a allele would eventually be eliminated from the population.
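A compact sketch that repeats the hand calculation for the cyanogenesis example and then keeps the same survival probabilities for 50 generations, assuming HW zygotes each generation. Note how the decline of a slows as it becomes rare, because it is increasingly hidden in heterozygotes.

```python
def next_generation(p, w_AA, w_Aa, w_aa):
    """Apply viability selection to HW zygotes; return (p', mean fitness)."""
    q = 1 - p
    w_bar = w_AA * p**2 + w_Aa * 2 * p * q + w_aa * q**2
    return (p**2 * w_AA + p * q * w_Aa) / w_bar, w_bar


FITNESS = (0.5, 0.5, 0.1)  # survival of AA, Aa, aa in the cyanogenesis example

p1, w_bar0 = next_generation(0.5, *FITNESS)
print(round(p1, 4), round(w_bar0, 4))  # 0.625 and 0.4, as in the hand calculation

# Keep selection constant and follow the decline of the a allele.
p = 0.5
q_history = [1 - p]
for _ in range(50):
    p, _w_bar = next_generation(p, *FITNESS)
    q_history.append(1 - p)
print([round(q, 3) for q in q_history[::10]])  # frequency of a every 10 generations
```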
(Show some examples using the Populus simulation of natural selection, or graphs of allele frequency change shown in the text.)
Selection can cause rapid changes in allele frequency, depending upon how different the relative fitnesses are.
Balanced Polymorphism
One cause of balanced polymorphism is overdominance (heterozygote advantage), which occurs when the fitness of the heterozygote is greater than that of either homozygote.
When this occurs, an equilibrium is reached at which both alleles are maintained in the population at frequencies determined by the relative fitnesses (or selection coefficients).
The equilibrium is often described using selection coefficients rather than fitnesses.
So t is the strength of selection against the AA homozygote: W_AA = 1 - t.
s is the strength of selection against the aa homozygote: W_aa = 1 - s.
The relative fitness of the heterozygote is set to 1: W_Aa = 1.
So we have:

Genotype   AA       Aa    aa
Fitness    1 - t    1     1 - s
It can be shown that an equilibrium is reached where:
freq (A) = p = s / (s+t)
So if any perturbation from this equilibrium occurs, the population is driven back towards
the equilibrium point. Thus both alleles are held in the population by selection which
maintains genetic variation at this locus.
The classic example of this is sickle cell disease, where in malarial areas the Ss heterozygotes have superior fitness because they have increased resistance to malaria:
ss individuals have the disease;
SS individuals get malaria.
So in malarial regions, t = 0.15 and s = 1.
Then freq(S) = p(eq) = 1/(1 + 0.15) = 0.87
q(eq) = 1 - p = 0.13
These theoretical predictions are close to the observed allele frequencies in populations in malarial regions of Africa.
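A minimal sketch checking that the overdominant equilibrium is stable, using the sickle-cell values t = 0.15 and s = 1 with S in the role of A. It assumes HW zygotes each generation and starts from two arbitrary frequencies on either side of the predicted equilibrium s/(s + t); both should converge to about 0.87.

```python
def overdominance_step(p, t, s):
    """One generation of selection with fitnesses AA = 1 - t, Aa = 1, aa = 1 - s."""
    q = 1 - p
    w_bar = (1 - t) * p * p + 2 * p * q + (1 - s) * q * q
    return (p * p * (1 - t) + p * q) / w_bar


t, s = 0.15, 1.0  # sickle-cell example: S plays the role of A
p_eq = s / (s + t)  # predicted equilibrium frequency of S

finals = {}
for p0 in (0.5, 0.99):  # start on either side of the equilibrium
    p = p0
    for _ in range(200):
        p = overdominance_step(p, t, s)
    finals[p0] = p
    print(p0, "->", round(p, 3), "predicted", round(p_eq, 3))
```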
Illustrate using Populus (again, use the natural selection simulation).
Note that if the heterozygote has lower fitness than either homozygote, then this yields an
unstable situation and one or the other allele will become fixed in the population.
Mutation-selection balance.
It can be shown that the frequency of a deleterious recessive allele reaches an equilibrium between its mutation rate and selection against the recessive homozygotes:
freq(a) = q = √(u/s)
where u is the mutation rate to the deleterious allele and s is the strength of selection against aa homozygotes. So the equilibrium is strictly a function of the mutation rate to the deleterious allele and the strength of selection against it.
Selection removes the alleles through selection against aa homozygotes, but mutation reintroduces them.
so so let's imagine a recessive allele is harmful with a fitness of Waa = 0.5 (so s= .5 as
well) and u = 10-5
then the equilbrium frequency of a is q = (10-5 / .5)1/2 = 0.0045
a more harmful, in fact, a lethal allele will have Waa = 0 or s = 1
q = (10-5 / 1)1/2 = 0.003 a lower frequency obviously and one could calculate the percent
of homozygous individuals assuming Hweq.
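A short Python sketch of this balance, assuming AA and Aa both have fitness 1, aa has fitness 1 - s, and mutation from A to a occurs at rate u after selection each generation (the ordering of the two steps is an assumption; it makes a negligible difference at these values). The iterated frequency should settle very close to $\sqrt{u/s}$, and the frequency of affected aa individuals is $q^2 \approx u/s$: about 1 in 50,000 for s = 0.5 and 1 in 100,000 for the lethal allele.

```python
import math


def mutation_selection_q(u, s):
    """Approximate equilibrium frequency of a deleterious recessive: q = sqrt(u / s)."""
    return math.sqrt(u / s)


def iterate_q(u, s, q0=0.0, generations=50_000):
    """Iterate selection against aa, then mutation A -> a, once per generation."""
    q = q0
    for _ in range(generations):
        w_bar = 1.0 - s * q * q        # AA and Aa have fitness 1, aa has 1 - s
        q = (q - s * q * q) / w_bar    # frequency of a after selection
        q = q + u * (1.0 - q)          # new a alleles from mutation of A
    return q


u = 1e-5
for s in (0.5, 1.0):
    q_hat = mutation_selection_q(u, s)
    print(f"s = {s}: q = {q_hat:.4f} (iterated {iterate_q(u, s):.4f}), "
          f"aa frequency q^2 = {q_hat ** 2:.1e}")
```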
Random effects (Genetic Drift)
The final force capable of changing allele frequencies acts randomly and is referred to as genetic drift. It occurs in any population of finite size (i.e. in all real populations), and the smaller the population, the more significant the effects of genetic drift.
Use Populus to illustrate random changes in allele frequencies.
Under drift alone in a finite population, all but one allele at a locus will eventually be lost. The time this takes depends on population size: the larger the population, the slower the rate of loss. It isn't possible to predict with certainty which allele will be lost, although the lower the frequency of an allele, the greater the chance it will be lost. In fact, under drift alone the probability that an allele is eventually fixed equals its current frequency.
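A minimal Python sketch of drift alone, assuming an idealized Wright-Fisher population in which each generation's 2N gene copies are drawn at random from the previous generation's allele pool. The model, population sizes, seed and replicate counts are assumptions for illustration, not taken from the notes. The sketch checks two claims above: an allele's chance of eventual fixation is close to its starting frequency, and loss takes longer in larger populations.

```python
import random


def drift_until_absorbed(p0, n_diploid, rng):
    """Neutral Wright-Fisher drift until the allele is fixed or lost.
    Returns (fixed?, number of generations taken)."""
    two_n = 2 * n_diploid
    count = round(p0 * two_n)
    generations = 0
    while 0 < count < two_n:
        p = count / two_n
        # binomial sampling of 2N gene copies for the next generation
        count = sum(1 for _ in range(two_n) if rng.random() < p)
        generations += 1
    return count == two_n, generations


rng = random.Random(42)  # fixed seed so the run is repeatable

# 1. Chance of eventual fixation for an allele starting at 0.25 (N = 10)
runs = [drift_until_absorbed(0.25, 10, rng) for _ in range(2000)]
frac_fixed = sum(fixed for fixed, _ in runs) / len(runs)
print(f"fraction of runs in which the allele was fixed: {frac_fixed:.3f}")

# 2. Mean time until one allele is lost, small vs larger population
mean_time = {}
for n in (5, 25):
    times = [drift_until_absorbed(0.5, n, rng)[1] for _ in range(200)]
    mean_time[n] = sum(times) / len(times)
    print(f"N = {n}: mean generations until fixation or loss = {mean_time[n]:.1f}")
```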
Cheetahs and elephant seals were both driven to near extinction (cheetahs about 10,000 years ago), and both are now largely devoid of genetic variation. This is because they were reduced to very small population sizes, which is when the effects of drift are most marked.
Colonizing populations, such as a species that colonizes an island from only a few founders, often go through a genetic bottleneck (a loss of genetic variation due to drift) and so also have lower levels of genetic variation. This phenomenon is known as the founder effect.
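To put rough numbers on bottlenecks and founder effects, here is a Python sketch using two standard results for an idealized, randomly mating population (assumptions, not stated in the notes): expected heterozygosity declines by a factor of $1 - 1/(2N)$ per generation, and an allele at frequency x is missing from N diploid founders with probability $(1 - x)^{2N}$. All numbers are hypothetical.

```python
def expected_heterozygosity(h0, n_diploid, generations):
    """Expected heterozygosity after t generations of drift at constant size N:
    H_t = H_0 * (1 - 1/(2N))**t (idealized Wright-Fisher population)."""
    return h0 * (1.0 - 1.0 / (2 * n_diploid)) ** generations


def prob_absent_in_founders(freq, n_founders):
    """Chance that an allele at frequency `freq` in the source population is
    carried by none of the 2N gene copies of N diploid founders."""
    return (1.0 - freq) ** (2 * n_founders)


# Hypothetical numbers: source heterozygosity 0.5, 10 generations at size N
for n in (5, 500):
    print(f"N = {n}: expected H after 10 generations = "
          f"{expected_heterozygosity(0.5, n, 10):.3f}")

# An allele at 5% in the source population is missed by 5 founders more often than not
print(f"P(allele absent among 5 founders) = {prob_absent_in_founders(0.05, 5):.2f}")
```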
Note that drift can result in the fixation of mildly detrimental alleles if the population is small enough, and likewise an allele favoured by natural selection can be lost through chance sampling from one generation to the next.
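A Python sketch combining selection and drift, assuming a simple model in which each gene copy of allele A has relative fitness 1 - s (genic selection), followed by Wright-Fisher sampling of 2N gene copies. The parameters (s = 0.05, N = 10 versus N = 50, starting frequency 0.5) and the seed are hypothetical. In the small population the deleterious allele A is fixed in a substantial fraction of runs, which also means the favoured allele a is lost in those runs; in the larger population this is much rarer.

```python
import random


def fixed_with_selection(p0, n_diploid, s, rng):
    """Run until fixation or loss of allele A, which has relative fitness 1 - s
    per gene copy. Returns True if A (the deleterious allele) is fixed."""
    two_n = 2 * n_diploid
    count = round(p0 * two_n)
    while 0 < count < two_n:
        p = count / two_n
        p_sel = p * (1 - s) / (1 - s * p)  # expected freq of A after selection
        # drift: sample 2N gene copies for the next generation
        count = sum(1 for _ in range(two_n) if rng.random() < p_sel)
    return count == two_n


rng = random.Random(7)  # fixed seed for repeatability
s = 0.05
small = sum(fixed_with_selection(0.5, 10, s, rng) for _ in range(300)) / 300
large = sum(fixed_with_selection(0.5, 50, s, rng) for _ in range(100)) / 100
print(f"N = 10: deleterious allele fixed in {small:.0%} of runs")
print(f"N = 50: deleterious allele fixed in {large:.0%} of runs")
```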
The effects of drift on DNA sequence data have become very important because they allow the formulation of a null model to explain DNA sequence diversity within species and divergence between species. If the null model doesn't fit the data, then this provides evidence that natural selection may be responsible for the observed sequence diversity within species or sequence divergence between species.
Note that drift causes a loss of genetic variation, which may be counterbalanced by the reintroduction of genetic variation through mutation. A steady state can even be reached in which the rate of loss by drift is balanced by mutation, leading to an equilibrium level of heterozygosity. This is often known as the neutral model of molecular evolution, and it forms the null hypothesis against which evidence for selection can be tested at the molecular level.
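The steady state between drift and mutation can be sketched in Python under the infinite-alleles model (an assumption: every mutation creates a new allele). The standard result for that model is an equilibrium heterozygosity of $H = 4N\mu/(1 + 4N\mu)$, where N is the population size and $\mu$ the mutation rate per gene copy per generation. Iterating the recurrence for the probability F that two gene copies are identical should approach the same value. The parameter values are hypothetical.

```python
def equilibrium_heterozygosity(n_diploid, mu):
    """Infinite-alleles mutation-drift equilibrium: H = 4N mu / (1 + 4N mu)."""
    theta = 4 * n_diploid * mu
    return theta / (1 + theta)


def iterate_heterozygosity(n_diploid, mu, generations=100_000):
    """Iterate F' = [1/(2N) + (1 - 1/(2N)) F] (1 - mu)^2 starting from F = 1
    (no variation at all) and return H = 1 - F."""
    drift = 1.0 / (2 * n_diploid)  # chance two copies descend from the same parental copy
    no_mut = (1.0 - mu) ** 2       # neither copy mutates this generation
    f = 1.0
    for _ in range(generations):
        f = (drift + (1.0 - drift) * f) * no_mut
    return 1.0 - f


mu = 1e-4
for n in (1_000, 10_000):
    print(f"N = {n}: H (formula) = {equilibrium_heterozygosity(n, mu):.3f}, "
          f"H (iterated) = {iterate_heterozygosity(n, mu):.3f}")
```

Larger populations lose variation to drift more slowly, so they settle at a higher equilibrium heterozygosity for the same mutation rate.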
