Population genetics is a field that could be viewed as the extension of Mendelian genetics
to the population level, rather than a consideration of the gene segregation within a cross
or family. While a single diploid individual can have at most two alleles for some gene,
in a population there can be numerous alleles at various frequencies. In population
genetics, descriptions can be made of the frequencies of various genotypes and alleles in
populations, and/or the levels of genetic variation can be determined.
Population genetics includes both empirical and theoretical studies. It provides the
mechanics or mathematics underlying the evolutionary process, and it emerged in the
1920s. The main founders include R.A. Fisher, J.B.S. Haldane, and Sewall Wright. The
increasing availability of DNA sequence data, faster computers, and advances in
mathematics have continued to fuel this field.
Genetic variation
Genetic variation is critically important from both theoretical and applied perspectives. In
the absence of genetic variation, evolution cannot occur. Likewise, in order to improve
crop plants or animals for breeding purposes there must be genetic variation present.
One of the first tasks in population genetics is the measurement of variation within
populations. The easiest approach is to use single genes that are co-dominant (or show
additive inheritance or incomplete dominance), where there is a one-to-one
correspondence between genotype and phenotype. For quantitative or polygenic traits,
other approaches are necessary.
Among the first loci used to assess levels of variation in human populations were various
blood group loci, including the ABO blood group system and the MN blood group
polymorphism.
So let's consider an example involving the MN blood group polymorphism. The gene has
two codominant alleles in populations, M and N, and with a simple blood test one can
determine the genotype of individuals as MM, MN or NN.
Phenotype M MN N TOTAL
Genotype MM MN NN
We can also determine the frequencies of alleles in the populations. Since there is co-
dominance, this is very simple and can simply be done by counting up the numbers of
various alleles and dividing by the total number of alleles counted. Or, it can be done
using the proportions of each genotype.
So this provides a simple description of the allele frequencies for a particular gene in a
particular population. It is possible to compare populations as seen below.
Relative to most populations, which have roughly comparable frequencies of the two
alleles, the Eskimo population studied has a considerably greater frequency of the M
allele (and a correspondingly lower frequency of the N), while the Australian Aboriginal
population has the reverse pattern.
The reasons for the differences are unknown, and neither the function of the MN
polymorphism nor any functional differences between genotypes are known. Random processes
and the lack of migration between populations (or their degree of isolation) are likely
responsible for these differences; at least, this is the null hypothesis.
The ABO blood group system shows co-dominance of only two of the three alleles, and
so some other methods of estimation of allele frequencies and assumptions need to be
used in determining allele frequencies. Different frequencies also occur among some
human populations.
Here the Sioux population has an unusually low level of the IA allele compared with other
populations. Again, it may well be that random genetic drift is responsible.
Protein polymorphisms
The first studies of protein polymorphisms were carried out in 1966, by Harris on humans
and by Lewontin and Hubby on fruit flies.
See the PowerPoint slides and gels as a reminder, and a few other slides on the classical
and balance hypotheses.
Using these methods, one can easily determine genotype and allele frequencies at
numerous loci.
So imagine you obtain the following data for an esterase locus with three alleles denoted
F, M and S, encoding Fast, Intermediate and Slow migrating forms of the
enzyme. (See PowerPoint slide of the figure below.)
(Figure: gel with band positions F, M and S; the lanes are labelled SS, MS, MM, FS, FM,
FF, MS, MM, FS, FM, SS, SS.)
So here are the data collected.
Genotypes   FF    FM    FS    MM    MS    SS    TOTAL
Obs #s      100   150   50    125   0     75    500
We can easily calculate proportion of each genotype by simply dividing the observed
numbers by 500.
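In general, for a locus with any number of co-dominant alleles, the frequency of an allele
is given by
Freq(allele) = (2 x number of homozygotes for the allele + number of heterozygotes
carrying it) / (total number of alleles),
where the total number of alleles is twice the number of individuals sampled.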
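The counting rule can be sketched in a few lines of Python. This is only an illustration applied to the esterase counts given above; the function name allele_frequencies and the dictionary layout are choices made here, not part of the notes.

```python
from collections import Counter

def allele_frequencies(genotype_counts):
    """Count alleles directly from co-dominant genotype counts.

    genotype_counts maps a two-letter genotype such as "FM" to the number
    of individuals observed with it. A homozygote contributes two copies
    of its allele and a heterozygote one copy of each of its alleles.
    """
    allele_counts = Counter()
    for genotype, n_individuals in genotype_counts.items():
        for allele in genotype:
            allele_counts[allele] += n_individuals
    total_alleles = 2 * sum(genotype_counts.values())
    return {a: c / total_alleles for a, c in allele_counts.items()}

# Esterase data from the table above (500 individuals, 1000 alleles).
esterase = {"FF": 100, "FM": 150, "FS": 50, "MM": 125, "MS": 0, "SS": 75}
freqs = allele_frequencies(esterase)
for allele in sorted(freqs):
    print(allele, freqs[allele])
```

For these data the F and M alleles each account for 400 of the 1000 alleles counted and S for 200.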
Note that these equations can also be used for any co-dominant marker including DNA
based markers or polymorphisms, like CAPs, RFLPs, SNPs etc.
Sequencing DNA directly can of course give the finest level of resolution for
determining the extent of genetic variation in populations.
Darwin was unaware of the correct mechanism of inheritance, and in at least one edition
of On the Origin of Species he invoked blending inheritance. Under blending
inheritance, sexual reproduction will result in a depletion of genetic variation from
populations and hence evolution cannot occur.
In 1908, G.H. Hardy and Wilhelm Weinberg independently showed that sexual reproduction
alone doesn't alter levels of genetic variation and leads to an equilibrium frequency of
genotypes in a population following just one generation of random mating.
This has become known as Hardy-Weinberg equilibrium.
One way to consider what genotype frequencies should be under random mating (with
respect to a particular locus) is to consider the gamete pool approach.
That is, imagine an organism where all eggs are shed into the ocean and also all sperm
are shed and there is subsequent random union of gametes.
So, we can set up a Punnett square to deduce the expected frequency of offspring in the
population under random mating.
                     Eggs (freq)
                     A (p)       a (q)
Sperm    A (p)       AA  p^2     Aa  pq
(freq)   a (q)       Aa  pq      aa  q^2
Freq(AA) = p^2
Freq(Aa) = 2pq
Freq(aa) = q^2
This result holds independent of the initial genotype frequencies in the population. After
just one round of random mating, the progeny generation will be at these Hardy-
Weinberg equilibrium frequencies.
Note that the mating system described is of course atypical for humans, who usually
don't dump their sperm and eggs into the ocean and allow random fertilization to occur.
However, the same result holds for a locus if individuals mate at random in pairs.
The demonstration that this holds is a little more laborious to derive, but not
difficult.
First enumerate all possible matings among the three genotypes (9 of them).
Then weight each mating by its probability, which is the product of the proportions of
the two parental genotypes in the population. Then determine the fraction of each kind
of offspring produced by each mating, weight it by the probability of that mating, and
add up the progeny across all matings.
So, imagine you have a population with two alleles, A and a, and the frequencies
of genotypes are freq(AA) = D; freq(Aa) = H; freq(aa) = R.
Matings                     Probability    Offspring
                                           AA      Aa      aa
Male AA x Female AA         D x D          D^2     0       0
(and so on for the other eight matings)
You should end up with Freq(AA) = p^2; Freq(Aa) = 2pq; Freq(aa) = q^2
Thus when individuals mate at random in pairs with respect to some locus, you expect
to find Hardy-Weinberg proportions of each genotype.
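The bookkeeping over the nine matings can be checked with a short Python sketch. The starting values of D, H and R below are hypothetical and deliberately not in Hardy-Weinberg proportions; the function name next_generation is likewise just a label chosen here.

```python
from itertools import product

# Fractions of AA, Aa and aa offspring from each kind of mating
# (Mendelian segregation). Keys are unordered pairs of parent genotypes.
OFFSPRING = {
    ("AA", "AA"): (1.0, 0.0, 0.0),
    ("AA", "Aa"): (0.5, 0.5, 0.0),
    ("AA", "aa"): (0.0, 1.0, 0.0),
    ("Aa", "Aa"): (0.25, 0.5, 0.25),
    ("Aa", "aa"): (0.0, 0.5, 0.5),
    ("aa", "aa"): (0.0, 0.0, 1.0),
}

def next_generation(D, H, R):
    """Offspring genotype frequencies after random mating in pairs."""
    parents = {"AA": D, "Aa": H, "aa": R}
    totals = [0.0, 0.0, 0.0]
    for male, female in product(parents, repeat=2):  # the 9 matings
        prob = parents[male] * parents[female]       # weight of this mating
        key = (male, female) if (male, female) in OFFSPRING else (female, male)
        for i, fraction in enumerate(OFFSPRING[key]):
            totals[i] += prob * fraction
    return tuple(totals)

D, H, R = 0.5, 0.2, 0.3      # hypothetical starting genotype frequencies
p = D + H / 2                # frequency of A
q = R + H / 2                # frequency of a
AA, Aa, aa = next_generation(D, H, R)
print(AA, Aa, aa)            # compare with p^2, 2pq and q^2
print(p ** 2, 2 * p * q, q ** 2)
```

Feeding the output back into next_generation leaves the frequencies unchanged, which is what is meant by equilibrium after one round of random mating.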
Note that Hardy-Weinberg proportions can be extended to any number of alleles per
locus. So for any number of alleles, the frequency of a particular homozygote will always
be the square of that allele's frequency, while the frequency of a heterozygote will always
be 2 x the product of the two relevant allele frequencies making up the heterozygote.
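As a sketch of the multi-allele rule, the following Python function lists the expected Hardy-Weinberg proportions for any set of allele frequencies. The example uses F = 0.4, M = 0.4 and S = 0.2, which are the values obtained by counting alleles in the esterase data given earlier (400, 400 and 200 copies out of 1000); the function name is an arbitrary choice.

```python
from itertools import combinations_with_replacement

def hardy_weinberg(allele_freqs):
    """Expected genotype frequencies under random mating.

    Homozygotes get the square of the allele frequency; heterozygotes get
    twice the product of the two allele frequencies involved.
    """
    expected = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        if a == b:
            expected[a + b] = allele_freqs[a] ** 2
        else:
            expected[a + b] = 2 * allele_freqs[a] * allele_freqs[b]
    return expected

expected = hardy_weinberg({"F": 0.4, "M": 0.4, "S": 0.2})
for genotype, freq in expected.items():
    # Expected number out of the 500 individuals in the esterase sample.
    print(genotype, round(freq, 4), round(500 * freq, 1))
```

Comparing these expected numbers with the observed ones (for example, no MS individuals were observed) is the starting point for the goodness of fit test described later.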
Note that random mating and the resulting Hardy-Weinberg equilibrium don't cause any
change in allele frequencies, so in a large random mating population you wouldn't expect
allele or genotype frequencies to change (unless some other force is operating).
One consequence of Hardy-Weinberg equilibrium (and random mating) is that rare alleles
will occur most of the time in heterozygotes. That is, there will be few individuals
homozygous for a rare allele. This has important consequences in evolution and in medical
genetics. Human genetic disorders are rare, and hence most of the alleles for recessive
disorders will exist and go unrecognized in individuals who are heterozygous for those
recessive alleles.
Or, put another way, there are about 2000 times as many individuals carrying the allele as
heterozygotes as there are individuals with the disease.
This is one reason why it isn't easily possible to get rid of rare recessive genetic
disorders: most copies of the allele are carried by unaffected heterozygotes, who far
outnumber affected individuals.
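The ratio of carriers to affected individuals under Hardy-Weinberg proportions is 2pq / q^2, which simplifies to 2p / q. The sketch below evaluates it for a few illustrative allele frequencies. These values of q are assumptions made here; q = 0.001 is included because it gives a ratio close to the figure of about 2000 quoted above.

```python
def carrier_to_affected_ratio(q):
    """Heterozygous carriers per affected homozygote at HW proportions."""
    p = 1.0 - q
    carriers = 2 * p * q      # frequency of heterozygotes
    affected = q ** 2         # frequency of recessive homozygotes
    return carriers / affected

for q in (0.1, 0.01, 0.001):  # illustrative allele frequencies
    print(q, round(carrier_to_affected_ratio(q)))
```

The rarer the allele, the larger the share of its copies that sits in heterozygotes.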
This is also why natural and artificial selection against recessives is difficult or lags since
most of the time the phenotype of the recessive allele isn't expressed until its frequency
becomes appreciable.
Note that one can determine whether a population is at Hardy-Weinberg equilibrium with
respect to some locus by assaying a random sample from the population and doing a
chi-square goodness of fit test as in the example below:
check that p + q = 1
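Now set up a table with observed and expected numbers (based upon Hardy-Weinberg
equilibrium).
                   FF          FS           SS          N
Obs # of flies     22          30           48          100
Exp # of flies     N x p^2     N x 2pq      N x q^2
                   = 13.69     = 46.62      = 39.69
Here p = freq(F) = (2 x 22 + 30) / 200 = 0.37 and q = freq(S) = 0.63.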
Reject Hardy-Weinberg equilibrium. With respect to ADH, the flies are not at
Hardy-Weinberg equilibrium. There appear to be too few FS and too many FF and SS. Why
that is requires further study; it could be inbreeding.
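The test can be reproduced with a minimal Python sketch using the ADH counts above. The function name is a choice made here. One degree of freedom is used because there are three genotype classes, less one, less one more for the allele frequency estimated from the data; 3.841 is the standard 5% critical value of chi-square for one degree of freedom.

```python
def hw_chi_square(n_FF, n_FS, n_SS):
    """Chi-square goodness of fit to Hardy-Weinberg for two alleles."""
    n = n_FF + n_FS + n_SS
    p = (2 * n_FF + n_FS) / (2 * n)   # frequency of F by allele counting
    q = 1.0 - p
    observed = (n_FF, n_FS, n_SS)
    expected = (n * p ** 2, n * 2 * p * q, n * q ** 2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, expected, chi2

CRITICAL_5_PERCENT_DF1 = 3.841

p, expected, chi2 = hw_chi_square(22, 30, 48)
print("p =", p)
print("expected =", [round(e, 2) for e in expected])
print("chi-square =", round(chi2, 2))
print("reject HW" if chi2 > CRITICAL_5_PERCENT_DF1 else "do not reject HW")
```

The statistic comes out well above the critical value, in line with the conclusion drawn in the notes.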
Note that in cases where there is dominance, and you wish to estimate allele frequencies,
you can do this if you assume mating is random with respect to the locus you are
considering.
Phenotype   Dominant   Recessive   Total
obs         75         25          100
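With dominance, only the recessive homozygotes can be identified, so the usual approach is to set their proportion equal to q^2 and take the square root. The sketch below reads the counts above as 75 individuals of dominant phenotype and 25 of recessive phenotype out of 100, which is an assumption about how the table was laid out, and it relies on random mating as stated in the text.

```python
import math

def recessive_allele_frequency(n_recessive, n_total):
    """Estimate q from the proportion of recessive homozygotes (q^2)."""
    return math.sqrt(n_recessive / n_total)

q = recessive_allele_frequency(25, 100)
p = 1.0 - q
print("q =", q, "p =", p)
# Expected split of the dominant phenotype class under HW proportions.
print("expected AA:", 100 * p ** 2, "expected Aa:", 100 * 2 * p * q)
```

Unlike the co-dominant case, this estimate cannot be used to test for Hardy-Weinberg equilibrium, since equilibrium was assumed in order to obtain it.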
If a species is divided into separate populations, random mating may occur within
populations but not necessarily between them. If allele frequencies differ between
populations, then Hardy-Weinberg equilibrium will likely not be observed across the
species as a whole, whereas it may hold within each population (given random mating
within populations).
Inbreeding.
One form of departure from random mating is inbreeding (or at least inbreeding in excess
of that which would occur due to chance).
for selfing!
Inbreeding depression often results from inbreeding (in normally outbred organisms).
Inbreeding depression is the reduction in fitness of inbred relative to outbred offspring.
In many ways it is the opposite of hybrid vigour.
Inbreeding depression is largely the result of recessive or partially recessive genes that
become homozygous upon inbreeding.
The inbreeding coefficient, F, may be estimated from co-dominant data at one locus (or
preferably more than one). Inbreeding affects all loci in the genome, unlike other
departures from random mating.
When there is random mating obs het = exp het and F = 0 (no inbreeding)
At the opposite extreme, F = 1 when there are no heterozygotes at all at a locus that has
two or more alleles. Normally 0 < F < 1.
Note that you'd expect an approximately equivalent F value for all loci if a population is
inbreeding.
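A common way to estimate F from co-dominant data is F = 1 - (observed heterozygosity / expected heterozygosity), which matches the two limiting cases described above (F = 0 when observed equals expected, F = 1 when there are no heterozygotes). The sketch below assumes that estimator and, purely as an illustration, applies it to the ADH sample from the Hardy-Weinberg test (30 heterozygotes out of 100 flies, p = 0.37).

```python
def inbreeding_coefficient(n_het_observed, n_total, p):
    """Estimate F as 1 - H_obs / H_exp for a two-allele locus."""
    q = 1.0 - p
    h_obs = n_het_observed / n_total
    h_exp = 2 * p * q                 # expected under random mating
    return 1.0 - h_obs / h_exp

F = inbreeding_coefficient(30, 100, 0.37)
print("F =", round(F, 3))
```

A single locus gives a noisy estimate; if inbreeding is the cause, other loci should give roughly the same F.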
To explore the effects of inbreeding, imagine that 1 in 10,000 people have the recessive
genetic disease cystic fibrosis. That is, q^2 = 0.0001, so the frequency of the
disease-causing allele is q = 0.01 while that of the normal allele is p = 0.99.
              Normal    Cystic fibrosis
with F = 0    9999      1
So with partial inbreeding (F=0.5), we'd see 50 times more people with the disease. Now
recall that there are numerous recessive genetic disorders, and all would increase in
homozygosity resulting in increases in frequency of people with numerous other genetic
disorders.
Note that if you can estimate F, you can predict frequency of various genotypes:
AA             Aa             aa
p^2 + Fpq      2pq(1 - F)     q^2 + Fpq
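These expressions can be checked against the cystic fibrosis example (q = 0.01) with a short Python sketch; the function name is an arbitrary choice.

```python
def genotype_freqs_with_inbreeding(p, F):
    """Frequencies of AA, Aa and aa for allele frequency p and inbreeding F."""
    q = 1.0 - p
    return (p ** 2 + F * p * q, 2 * p * q * (1 - F), q ** 2 + F * p * q)

results = {}
for F in (0.0, 0.5):
    AA, Aa, aa = genotype_freqs_with_inbreeding(0.99, F)
    results[F] = aa
    print("F =", F, "affected per 10,000:", round(aa * 10_000, 1))

print("fold increase:", round(results[0.5] / results[0.0], 1))
```

The three frequencies still sum to one for any F, since the heterozygotes lost (2Fpq) are split equally between the two homozygous classes.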
Assortative mating. This is where, say, individuals with the same phenotype mate with each
other more often than is predicted under random mating. So in humans, there is a tendency
to assortative mating by height. Note that assortative mating could increase homozygosity
of genes for height, but it will not affect all genes in the genome as inbreeding does.
Negative assortative mating is the opposite, with unlike individuals mating. This form of
mating can increase heterozygosity for genes involved in the particular trait. Tristylous
plant populations exhibit negative assortative mating and thus maintain the three
phenotypes in the population at approximately equal frequencies.
Mutation is the ultimate source of genetic variation and can introduce new alleles into
populations. It is, however, a weak force because of its low rate and alone results only in
very slow changes in allele frequency through time.
Rates of mutation vary across organisms and genes, but typically lie in the range of about
10^-5 or 10^-6 mutations per gene per generation and perhaps range from
(Figure: curves for rate = 0.00001 and rate = 0.000001, plotted on a vertical scale from
0 to 1 against Generation, 0 to 100,000.)
The basic message is plain. While mutation is a critically important process, mutation
alone is very very slow at causing allele frequency change in populations.
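How slow this is can be seen with a small calculation. The sketch assumes the simplest model, one-way mutation from A to a at rate u per generation with no back mutation, so that the frequency of A after t generations is p0 x (1 - u)^t. The two rates are those labelled on the figure; the starting frequency of 1 is an assumption made here.

```python
def frequency_after_mutation(p0, u, generations):
    """Frequency of allele A after repeated one-way mutation A -> a."""
    return p0 * (1.0 - u) ** generations

for u in (0.00001, 0.000001):
    for t in (1_000, 10_000, 100_000):
        p_t = frequency_after_mutation(1.0, u, t)
        print("u =", u, "generations =", t, "freq(A) =", round(p_t, 4))
```

Even at the higher rate, tens of thousands of generations are needed before the allele frequency has shifted substantially.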
Variation due to recombination
Recombination has a greater capacity to generate genetic variation than mutation. This is
because recombination can shuffle or create new gene combinations due to crossing over
or independent assortment for genes on different chromosomes.
Given the low rate of mutation, when a mutation first occurs, it will be unique and will
occur in a given chromosomal context. So imagine the following situation, with one gene
already polymorphic in the population and a new mutation occurring at the other.
Initially the population is polymorphic at the A locus, with A and a alleles, while it is
fixed for the B allele, except for this first new b* mutation.
So with respect to the two loci the population is composed initially of the chromosomes
(or haplotypes) AB and aB. Mutation produces a new haplotype ab*, and the new
mutation initially never occurs with the A allele.
A    a                              A    a
          Mutation B to b*
B    B                              B    b*
Imagine there is no crossing over and that for some reason the ab* combination increases
in frequency.
If we allow recombination to occur between the A and B genes, then over time, there will
be a random association of alleles generated. This state is called linkage equilibrium.
The frequency of each combination of alleles at different genes is expected to equal
the product of their individual frequencies. That is, whether a chromosome
has the A allele is independent of whether it has the B or b allele.
So at linkage equilibrium, freq(AB) = freq(A) x freq(B), freq(Ab) = freq(A) x freq(b),
and likewise for aB and ab.
Note that linkage disequilibrium can also be generated by other forces such as natural
selection, migration, or intermixing of two previously isolated populations. Even drift
can generate linkage disequilibrium, which will of course decay through recombination.
(Figure: two-locus frequency, on a scale from 0 to 0.6, of the AB, Ab, aB and ab
haplotypes against Generation, 1 to 196.)
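The decay can be sketched numerically. The usual measure is D = freq(AB) x freq(ab) - freq(Ab) x freq(aB), which is zero at linkage equilibrium, and under random mating D shrinks by a factor (1 - r) each generation, where r is the recombination fraction between the two genes. The starting haplotype frequencies and the values of r below are hypothetical and are not those used for the figure.

```python
def disequilibrium(f_AB, f_Ab, f_aB, f_ab):
    """Linkage disequilibrium D from the four haplotype frequencies."""
    return f_AB * f_ab - f_Ab * f_aB

def decay(D0, r, generations):
    """D after the given number of generations of random mating."""
    return D0 * (1.0 - r) ** generations

def haplotype_freqs(p_A, p_B, D):
    """Haplotype frequencies (AB, Ab, aB, ab) from allele frequencies and D."""
    return (p_A * p_B + D, p_A * (1 - p_B) - D,
            (1 - p_A) * p_B - D, (1 - p_A) * (1 - p_B) + D)

# Hypothetical start: only AB and ab chromosomes, at equal frequency.
D0 = disequilibrium(0.5, 0.0, 0.0, 0.5)
for r in (0.5, 0.05):          # unlinked genes versus closely linked genes
    for t in (1, 10, 50):
        D_t = decay(D0, r, t)
        print("r =", r, "gen =", t, "D =", round(D_t, 4),
              [round(f, 3) for f in haplotype_freqs(0.5, 0.5, D_t)])
```

Tightly linked genes (small r) keep their associations far longer, which is why nearby markers stay associated with a disease allele.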
Recently, the occurrence of linkage disequilibrium has been exploited to help localize or
identify genes for various human genetic diseases. One can look for an association
between various molecular markers and the disease occurrence. The closer the marker to
the disease gene, the stronger the extent of linkage disequilibrium. This can then be
exploited to find the gene.
Variation from migration.
Natural Selection
Natural selection involves the differential rate of survival and/or reproduction of
individuals with different genotypes. It has the capacity to change gene frequencies
relatively rapidly (compared to the other evolutionary forces of mutation, migration and
drift). It can also, however, result in the maintenance of genetic variation and/or
prevent evolutionary change from occurring at a locus. It is the process responsible for
adaptation of organisms to their environment. The process is very much like that of
artificial selection practiced by plant and animal breeders. The difference is that there
is a goal conceived by humans in the breeding of plants and animals, while in nature the
particular environment (biotic and abiotic) drives the direction of evolutionary change.
We often refer to the Darwinian fitness which is a relative measure of the probability of
survival and reproduction of a genotype or phenotype. That is, it is measured relative to
other genotypes or phenotypes in the population.
Density dependent selection is selection where population size or density changes the
relative fitness of genotypes/phenotypes. For example, in field mice there could be
selection for increasingly aggressive behaviour as density increases (i.e. fight your
neighbours to get more food).
While natural selection acts on the sum total of genotypic/phenotypic characters in the
wild, we'll consider cases where variation at a single gene is responsible for fitness
differences. Clearly there are many such examples in humans and other organisms when
we consider genes with detrimental effects, such as alleles for various genetic disorders
which have the capacity to severely reduce the fitness of individuals with those
genotypes. The alleles causing these disorders are typically selected against and hence
they are typically at low frequency in populations.
The effects of selection can be determined at a single locus by considering the relative
fitness of individuals with various genotypes.
So let's imagine we have a form of selection that operates some time after zygote formation
but before reproductive maturity. This is sometimes referred to as viability selection.
We'll consider a gene with two alleles, A and a, with initial frequencies p and q.
Genotype            AA                   Aa                    aa
Zygote stage        p^2                  2pq                   q^2
Viability selection occurs here and some individuals die as a function of their genotype.
Adults              W_AA x p^2           W_Aa x 2pq            W_aa x q^2
Mean fitness:
Wbar = W_AA x p^2 + W_Aa x 2pq + W_aa x q^2
Dividing each adult proportion by Wbar gives the genotype frequencies after selection:
                    AA                   Aa                    aa
                    W_AA x p^2 / Wbar    W_Aa x 2pq / Wbar     W_aa x q^2 / Wbar
These will now sum to one and can easily be compared with the initial genotype
frequencies.
Given these frequencies after selection, we can also now calculate allele frequencies.
p' = p^2 W_AA / Wbar + pq W_Aa / Wbar
q' = q^2 W_aa / Wbar + pq W_Aa / Wbar
If selection is constant from one generation to the next, one can then use the new genotype
or allele frequencies and the same equations to derive what happens in subsequent
generations, each time calculating the new genotype frequencies from those
in the previous generation and then calculating mean fitness, allele frequencies, etc.
Note that under frequency independent selection the mean fitness of the population tends
to increase through generations.
Keep in mind that selection is really a relative measure. That is, it is the fitness or in this
case the survivorship of one genotype relative to others that leads to increases or
decreases in allele frequencies.
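And further let's imagine we start off the population at Hardy-Weinberg frequencies with
p = q = 0.5. Reading the fitnesses off the last row of the table, W_AA = W_Aa = 0.5 and
W_aa = 0.1.
Genotype          AA                   Aa                    aa
Zygote stage      0.25                 0.5                   0.25
Wbar (mean fitness) = W_AA x p^2 + W_Aa x 2pq + W_aa x q^2 = 0.4
After selection   W_AA x p^2 / Wbar    W_Aa x 2pq / Wbar     W_aa x q^2 / Wbar
                  0.5 x 0.25 / 0.4     0.5 x 0.5 / 0.4       0.1 x 0.25 / 0.4
                  = 0.3125             = 0.625               = 0.0625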
So there has been an increase in the frequency of the A allele (from 0.5 to 0.625) and a
decrease in a (from 0.5 to 0.375). If this process were to continue, the a allele would
eventually be eliminated from the population.
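The recursion is easy to iterate by computer. The sketch below uses the fitnesses that can be read from the worked example (0.5, 0.5 and 0.1 for AA, Aa and aa) and the starting frequency p = 0.5; the function name and the choice of 20 generations are arbitrary.

```python
def select(p, w_AA, w_Aa, w_aa):
    """One generation of viability selection; returns (new p, mean fitness)."""
    q = 1.0 - p
    mean_w = w_AA * p ** 2 + w_Aa * 2 * p * q + w_aa * q ** 2
    p_next = (w_AA * p ** 2 + w_Aa * p * q) / mean_w
    return p_next, mean_w

first_p, first_mean_w = select(0.5, 0.5, 0.5, 0.1)
print("after one generation: p =", first_p, "mean fitness =", first_mean_w)

p = 0.5
trajectory = [p]
mean_fitnesses = []
for _ in range(20):
    p, mean_w = select(p, 0.5, 0.5, 0.1)
    trajectory.append(p)
    mean_fitnesses.append(mean_w)
print([round(x, 3) for x in trajectory])
```

The frequency of A rises every generation, quickly at first and then more slowly as the a allele becomes rare and is hidden in heterozygotes, and the mean fitness rises along with it.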
(Show some examples using the Populus program's simulation of natural selection, or the
graphs of allele frequency change shown in the text.)
Selection can cause rapid changes in allele frequency, depending upon how different the
relative fitnesses are.
Balanced Polymorphism
Sometimes called overdominance, the phenomenon can occur when the fitness of the
heterozygote is greater than that of either homozygote.
When this occurs, an equilibrium point is reached where both alleles are maintained in
the population at frequencies determined by the relative fitnesses (or selection
coefficients). The equilibrium is often described using selection coefficients rather
than fitnesses.
W_AA = 1 - t
W_aa = 1 - s
So W_Aa = 1.
So we have
        AA       Aa       aa
        W_AA     W_Aa     W_aa
        1 - t    1        1 - s
At equilibrium, p_eq = s / (s + t) and q_eq = t / (s + t).
So if any perturbation from this equilibrium occurs, the population is driven back towards
the equilibrium point. Thus both alleles are held in the population by selection which
maintains genetic variation at this locus.
The classic example of this is sickle cell disease, where in malarial areas the Ss
heterozygotes have superior fitness because they have increased resistance to malaria.
ss individuals have the disease.
SS individuals get malaria.
q_eq = 1 - p_eq = 0.13
These theoretical predictions are close to the observed frequencies of alleles in
populations in malarial regions of Africa.
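With fitnesses 1 - t, 1 and 1 - s for the two homozygotes and the heterozygote, the standard result is an equilibrium frequency of t / (s + t) for the allele whose homozygote has fitness 1 - s. The sketch below iterates the selection recursion from two different starting points to show that both converge on that value. The coefficients s = 1 and t = 0.15 are hypothetical, chosen here only because they give an equilibrium close to the 0.13 quoted above; the fitness values actually used for the sickle cell calculation are not shown in these notes.

```python
def equilibrium_q(s, t):
    """Predicted equilibrium frequency of allele a (homozygote fitness 1 - s)."""
    return t / (s + t)

def select_once(q, s, t):
    """One generation of selection with heterozygote advantage."""
    p = 1.0 - q
    w_AA, w_Aa, w_aa = 1.0 - t, 1.0, 1.0 - s
    mean_w = w_AA * p ** 2 + w_Aa * 2 * p * q + w_aa * q ** 2
    return (w_aa * q ** 2 + w_Aa * p * q) / mean_w

s, t = 1.0, 0.15                      # hypothetical selection coefficients
final = {}
for q_start in (0.01, 0.5):           # start below and above the equilibrium
    q = q_start
    for _ in range(500):
        q = select_once(q, s, t)
    final[q_start] = q
    print("start", q_start, "-> after 500 generations", round(q, 4))
print("predicted equilibrium:", round(equilibrium_q(s, t), 4))
```

Starting above or below the equilibrium makes no difference to the end point, which is what makes the polymorphism stable.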
Note that if the heterozygote has lower fitness than either homozygote, then this yields an
unstable situation and one or the other allele will become fixed in the population.
Mutation selection balance.
It can be shown that the frequency of a deleterious recessive allele reaches an equilibrium
between its mutation rate and selection against the recessive homozygotes.
So this is strictly a function of the mutation rate to the deleterious allele and the strength
of selection against it.
Selection removes the alleles by selection against aa homozygotes but mutation re-
introduces them.
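So let's imagine a recessive allele is harmful, with a fitness of W_aa = 0.5 (so s = 0.5 as
well) and u = 10^-5. Then q = (u / s)^(1/2) = (10^-5 / 0.5)^(1/2) = 0.0045. If the allele
were instead lethal when homozygous (s = 1), q = (10^-5 / 1)^(1/2) = 0.003, a lower
frequency obviously, and one could calculate the percent of homozygous individuals
assuming Hardy-Weinberg equilibrium.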
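The balance can be evaluated directly from q = (u / s)^(1/2), the expression used in the calculation above. A minimal sketch with the mutation rate and selection coefficients from the example (the function name is a choice made here):

```python
import math

def equilibrium_frequency(u, s):
    """Equilibrium frequency of a deleterious recessive allele."""
    return math.sqrt(u / s)

u = 1e-5
for s in (0.5, 1.0):
    q = equilibrium_frequency(u, s)
    # Percent of affected homozygotes, assuming Hardy-Weinberg proportions.
    print("s =", s, "q =", round(q, 4), "percent aa =", round(100 * q ** 2, 4))
```

At equilibrium the frequency of affected homozygotes, q^2, is simply u / s, so it depends only on the mutation rate and the strength of selection.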
The final force capable of changing allele frequencies is random and is referred to as
genetic drift. It occurs in any population of finite size (i.e. in all real populations).
The smaller the population size, the more significant the effects of genetic drift.
Under drift alone in a finite population, eventually all but one allele at a locus will
be lost. The length of time this takes is a function of population size: the larger the
population, the slower the rate of loss. It isn't possible to predict with certainty which
allele will be lost, although in general the lower the frequency of an allele, the greater
the chance it will be lost.
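Drift is easy to simulate. The sketch below assumes the simple Wright-Fisher sampling scheme, which is not named in these notes: each generation the 2N allele copies of the next generation are drawn at random according to the current allele frequency. The population sizes, starting frequency, number of replicates and random seed are all arbitrary choices.

```python
import random

def wright_fisher(n_individuals, p0, rng, max_generations=100_000):
    """Follow one allele under drift alone until it is lost or fixed.

    Returns the final allele frequency and the number of generations run.
    """
    n_alleles = 2 * n_individuals
    p = p0
    generation = 0
    while 0.0 < p < 1.0 and generation < max_generations:
        # Draw 2N allele copies at random from the current gamete pool.
        copies = sum(1 for _ in range(n_alleles) if rng.random() < p)
        p = copies / n_alleles
        generation += 1
    return p, generation

rng = random.Random(1)
outcomes = {}
for n in (10, 100):
    runs = [wright_fisher(n, 0.5, rng) for _ in range(20)]
    outcomes[n] = runs
    fixed = sum(1 for p_final, _ in runs if p_final == 1.0)
    mean_time = sum(g for _, g in runs) / len(runs)
    print("N =", n, "fixed in", fixed, "of 20 runs; mean generations =", mean_time)
```

Every replicate ends with the allele either lost or fixed, and which of the two happens differs from run to run.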
Elephant seals and cheetahs were driven to near extinction (cheetahs about 10,000 years
ago), and both are largely devoid of genetic variation.
This is because they were driven to very small population sizes, and that is when the
effects of drift are most marked.
Colonizing populations often go through a genetic bottleneck (loss of genetic variation
due to drift) and so also have lower levels of genetic variation, as in species that
colonize an island with only a few founders (a phenomenon known as the founder effect).
Note that drift can result in the fixation of somewhat detrimental alleles if population
size is small enough, and likewise, an allele favoured by natural selection could be lost
due to chance sampling from one generation to the next.
The effects of drift on DNA sequence data have become very important because they
allow the formulation of a null model to explain DNA sequence diversity within species
and between species. If the null model doesn't fit, then this provides evidence that
natural selection may be responsible for sequence diversity or sequence divergence between
species.
Note that drift results in a loss of genetic variation, which may be
counterbalanced by the re-introduction of genetic variation by mutation. A steady state
can even be reached where the rate of loss by drift is balanced by mutation, leading to
an equilibrium level of heterozygosity. This is often known as the neutral model of
molecular evolution and forms the null hypothesis against which evidence for selection
can be obtained at the molecular level.