
J. Math. Biol. (1992) 31: 101-106

Journal of Mathematical Biology
© Springer-Verlag 1992

The multilocus Hardy-Weinberg law

P. Holgate
Department of Mathematics and Statistics, Birkbeck College, London WC1E 7HX, UK

Received June 7, 1991; received in revised form September 3, 1991

Abstract. Work on the genetic algebra of multilocus genetic systems is reviewed,
with particular emphasis on aspects of Hardy-Weinberg theory, including
existence and stability of equilibria, global convergence and disequilibrium
functions. It is pointed out that the non-uniqueness of the disequilibrium functions
does not necessarily invalidate proof of the multilocus Hardy-Weinberg law.

Key words: Population genetics - Genetic algebras - Multilocus equilibria

In the latest of a series of papers on multilocus selection, Karlin and Liberman
(1990) establish global convergence and stability for diallelic systems in which
each locus is overdominant, and multilocus recombination is positive. In the last
section of their paper they discuss neutral selection. The special case of their
results obtained by setting the viabilities equal leads to a simple proof of the
multilocus Hardy-Weinberg law.
Karlin and Liberman point out that the result had been proved earlier by
Geiringer (1944), using induction on the number of loci. They refer to a paper
of Bennett (1954), who studied convergence in multilocus systems in terms of
disequilibrium functions, measuring the departure from independent association
among a set of loci, commenting that he, and later Hill (1974) "appeared not to
realise that these higher order disequilibrium functions are not well defined for
general recombination schemes". They say that "the multilocus case [of the
Hardy-Weinberg law] is hardly mentioned".
There are in fact several studies of nonselective multilocus population genetics
in the literature of genetic algebras. They have covered such questions as the
conditions for global convergence, various ways of handling the multilocus system
and its generalisations, and the role of disequilibrium functions. These notes aim
to bring this work to a wider audience. The theory and applications of nonassociative
algebras in genetics are treated in the monograph of Wörz-Busekros
(1980). Many papers have appeared since then, although most of them have been
concerned with the purely mathematical theory (bibliography by Holgate 1991).
The elements of a genetic algebra are linear combinations with real or
complex coefficients, in symbols representing the genetic types in the system.
Those combinations with non-negative coefficients adding up to one can repre-
sent either the proportions of genetic types in an infinite population, or the
probability distribution of the type of an individual. We define the product of
any two symbols to be the linear combination representing the proportions or
probabilities in the generation arising from the union of those symbols. Thus the
product of two gametes represents the distribution of gametic types among the
output of recombination, while the product of two zygotes represents the
distribution of zygotic types among the offspring produced by mating.
For many nonselective genetic systems, the corresponding algebras are Gonshor
algebras, (see Eqs. (1), (2) and (3) of Gonshor 1960, §2). Gonshor algebras do
not necessarily represent actual or conceivable genetic systems, but I will describe
mathematical properties of the algebras in terms of their genetic interpretation (even
where it is not meaningful for Gonshor algebras that are not genetically realisable).
The principal train roots are the characteristic roots of the recurrence relation
between the sequence of vectors of genetic proportions produced by repeated
backcrossing from a given initial population, which in a Gonshor algebra are the
same for every initial population and are therefore characteristic of the genetic
system that it represents. The behaviour of the sequence of vectors of genetic
proportions under random mating can be deduced from the principal train roots
(see Holgate 1967, Abraham 1980a,b for a detailed study, and Holgate 1989 for
a survey). Gonshor (1960, Theorems 2.1 and 2.2) showed that (i) if none of the
principal train roots of a Gonshor algebra has the value ½, the system admits an
equilibrium population, and (ii) if all the principal train roots $\lambda_i$ other than the
necessary $\lambda_0 = 1$ have $|\lambda_i| \le \tfrac{1}{2}$, the sequence of genotype proportions tends
globally to this equilibrium. Gonshor notes in discussing these theorems that if
r of the principal train roots are equal to ½ and the remainder are smaller in
modulus, there can be an r-parameter family of equilibria. This was further
discussed by Wörz-Busekros (1980, §4), who provided a sufficient condition.
The basic data for the multilocus problem are the set $\{\lambda(I, I^c)\}$ of recombination
probabilities, where $\lambda(I, I^c)$ denotes the probability that the loci of subset
$I$ come from the father, and those of $I^c$ from the mother, where $I$ runs through
all the subsets of the $k$ loci $\{1, 2, \ldots, k\}$ $(= K)$. The first paper to treat the
multilocus problem by genetic algebra did not use the principal train root
approach, but attacked the quadratic operation of random mating directly.
Reiersøl (1962) studied homomorphisms of the $k$-locus algebra onto the $k$
$(k - 1)$-locus factor algebras obtained by marginalising over each locus in turn.
Thus Reiersøl's work is an algebraic development of Geiringer's inductive
procedure, as he notes. He showed that the set $\{P_K\}$ of roots of the recurrence
relation for vectors of genotype proportions under random mating, called the
plenary train roots, for $k$ loci is contained in the union of the sets $\{P_{K-j}\}$ of roots
obtained by deleting the loci one at a time, and of all possible products of pairs
of a member of $\{P_{K-T}\}$ with one of $\{P_T\}$ as $T$ runs through all subsets of $K$
with $|T| \le |K - T|$. Reiersøl only stated that the sequence of vectors of genotype
proportions tends globally to an equilibrium in relation to loci linked to the X
gene of the sex determining locus, but the necessary results for the autosomal
case are present in his paper.
Another way of parameterising the k-locus problem is through the probabil-
ities that crossing over occurs at the subset C of the set of gaps between loci, and
at no others, for every subset C. Consider the case when (i) a subset C of the
gaps admits no crossing over, and (ii) for each gap of the complementary subset
the crossing over fraction is ½. Every crossover distribution is a linear combina-
tion with coefficients adding to one, of the above set of special cases, as C runs
through the subsets of K. The principal train roots of the special cases are easy
to calculate, and those arising in the general distribution are obtained as the
same linear combination of those arising in the basic cases. This approach is
described in (Holgate 1968). If there are $r_i + 1$ alleles at locus $i$, there is an
$r_1 r_2 \cdots r_k$-dimensional family of equilibria (see Theorem 4). The result relating
the dimensionality of the manifold of equilibria to the multiplicity of the
principal train root ½, although applicable in the multilocus algebra, does not
hold in as great generality as stated in Eq. (25) (see Wörz-Busekros 1980, pp.
65-69). The principal train roots are 1, and the numbers $\Lambda(I) = \sum_{J \supseteq I} \lambda(J, J^c)$.
The value $\Lambda(I)$ is the probability that the loci in $I$ come from the father,
irrespective of what happens elsewhere. Then, since clearly $\Lambda(I) \le \tfrac{1}{2}$, any
sequence of vectors of genetic proportions under random mating converges to an
equilibrium, by Gonshor's Theorem 2.2.
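
As a small numerical illustration of the quantities $\Lambda(I)$, the following Python sketch computes them for three loci. The recombination distribution used here (independent crossing over in the two gaps, with hypothetical rates 0.1 and 0.3, and the first locus equally likely to be paternal or maternal) is an assumption made for the example and is not taken from the papers cited; the function names are likewise illustrative.

```python
from itertools import combinations

def all_subsets(loci):
    """Return every subset of `loci` as a frozenset."""
    loci = list(loci)
    return [frozenset(c) for n in range(len(loci) + 1)
            for c in combinations(loci, n)]

def no_interference_lambda(gap_rates):
    """lambda(S, S^c) for loci 1..k on a line, assuming independent gaps.

    gap_rates[g] is the assumed crossover probability between locus g+1
    and locus g+2. The first locus is paternal or maternal with
    probability 1/2 each.
    """
    k = len(gap_rates) + 1
    lam = {}
    for s in all_subsets(range(1, k + 1)):
        p = 0.5
        for g, r in enumerate(gap_rates):
            left, right = g + 1, g + 2
            crossed = (left in s) != (right in s)
            p *= r if crossed else 1.0 - r
        lam[s] = p
    return lam

def train_root(i_set, lam):
    """Lambda(I): probability that all loci of I come from the father."""
    return sum(p for s, p in lam.items() if i_set <= s)

lam = no_interference_lambda([0.1, 0.3])
roots = {i_set: train_root(i_set, lam) for i_set in lam}
```

Under these assumptions the empty set gives the root 1, each single locus gives ½, and every other subset gives a value below ½, which is the situation covered by the convergence argument above.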
Heuch (1972) extended Reiersøl's approach to multiple loci linked to the sex
factor. Theorem 2 explicitly establishes the relevant modification of the Hardy-Weinberg
law for this case. In a later paper, Heuch (1973) dealt with the k-locus
autosomal problem, allowing for mutation at all loci, using a method related to
that of my 1968 paper. Heuch took as the basis of the set of recombination
distributions, the special cases where at each gap between chromosomes, crossing
over was either impossible or compulsory. In §6 Heuch points out the advantages
of his approach. Every actual distribution appears as a convex combination of
the basic distributions, and the calculation of the weighting coefficients is more
straightforward. On the other hand, there is some advantage in having as basis
elements distributions that could conceivably occur in nature. This line of
research was continued by Heuch (1977), where particular emphasis was placed
on the determination of the equilibria of a k-locus system with mutation. The
last paragraph of §4 contains an explicit statement of the multilocus Hardy-Weinberg
law. Here Heuch extended Reiersøl's method for sex-linkage, and
applied it to a certain incompatibility system.
In (Holgate 1979) I introduced a convenient calculus of chromosome recombination.
Let the alleles at each locus be arbitrarily labelled 0, 1 and for each
subset $I$ of the loci $\{1, 2, \ldots, k\}$, denote by $a(I)$ the chromosome containing the
'1' allele at the loci of $I$, and the '0' allele at those of $I^c$. The output of union
between $a(I)$ and $a(J)$ can be written down in terms of the recombination
distribution. We now extend the notation so that $\lambda(I, J)$, for any pair $I, J$ of
subsets of $\{1, 2, \ldots, k\}$, denotes the probability that the loci of $I$ come from the
father, and those of $J$ from the mother. These coefficients are mutually dependent,
and $\lambda(I, J) = 0$ if $I \cap J \neq \emptyset$. We take a new basis in the genetic algebra of
multilocus gametes, given by $c(I) = \sum_{J \subseteq I} (-1)^{|J|} a(J)$, $a(I) = \sum_{J \subseteq I} (-1)^{|J|} c(J)$.
We then have the simple multiplication $c(I)c(J) = \lambda(I, J)\, c(I \cup J)$. Since
$c(\emptyset)c(J) = \lambda(\emptyset, J)\, c(J)$, the principal train roots, which are the eigenvalues of
multiplication by any element representing a population, e.g. $c(\emptyset)$, are the set
$\{\lambda(I, \emptyset)\}$ $(= \{\lambda(I)\}$ for brevity), and the corresponding eigenfunctions are $\{c(I)\}$.
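
The canonical multiplication rule can be checked numerically in the smallest nontrivial case. The sketch below assumes two diallelic loci with a hypothetical recombination fraction r = 0.2, so that $\lambda(\emptyset, K) = \lambda(K, \emptyset) = (1 - r)/2$ and $\lambda(\{1\}, \{2\}) = \lambda(\{2\}, \{1\}) = r/2$; the helper names (`multiply`, `c`, `lam_ext`, `max_error`) are illustrative and do not come from the papers cited.

```python
from itertools import combinations

def all_subsets(loci):
    """Return every subset of `loci` as a frozenset."""
    loci = sorted(loci)
    return [frozenset(s) for n in range(len(loci) + 1)
            for s in combinations(loci, n)]

K = frozenset({1, 2})
r = 0.2  # assumed recombination fraction between the two loci
# lam[S] = lambda(S, S^c): loci in S from the father, the rest from the mother.
lam = {frozenset(): (1 - r) / 2, K: (1 - r) / 2,
       frozenset({1}): r / 2, frozenset({2}): r / 2}

def multiply(u, v):
    """Product of two elements given as {subset: coefficient} in the basis a(I)."""
    out = {}
    for i, ui in u.items():
        for j, vj in v.items():
            for s, p in lam.items():
                child = (i & s) | (j - s)  # '1' alleles inherited from each parent
                out[child] = out.get(child, 0.0) + ui * vj * p
    return out

def c(i_set):
    """Canonical basis element c(I) = sum over J in I of (-1)^|J| a(J)."""
    return {j: (-1.0) ** len(j) for j in all_subsets(i_set)}

def lam_ext(i_set, j_set):
    """lambda(I, J): loci of I from the father and loci of J from the mother."""
    return sum(p for s, p in lam.items() if i_set <= s and not (j_set & s))

def max_error():
    """Largest coefficient discrepancy in c(I)c(J) = lambda(I, J) c(I u J)."""
    worst = 0.0
    for i_set in all_subsets(K):
        for j_set in all_subsets(K):
            lhs = multiply(c(i_set), c(j_set))
            rhs = {t: lam_ext(i_set, j_set) * x
                   for t, x in c(i_set | j_set).items()}
            for t in all_subsets(K):
                worst = max(worst, abs(lhs.get(t, 0.0) - rhs.get(t, 0.0)))
    return worst

train_roots = sorted(lam_ext(i_set, frozenset()) for i_set in all_subsets(K))
```

With these assumed values `max_error()` is zero up to rounding, and `train_roots` contains $(1 - r)/2$, ½, ½ and 1, that is, the values $\lambda(I, \emptyset)$ for the four subsets $I$.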
The first purpose of this communication is to draw attention to genetic
algebra, specifically its treatment of multilocus, multiallele systems, thus out-
lining its suitability for handling complex systems in nonselective genetics, but it
is opportune to announce a number of further results pertinent to this problem.
Let the population $\sum x(I) a(I)$ in respect of the natural basis be denoted by
$\sum y(I) c(I)$ with respect to the canonical basis, so that $y(I) = (-1)^{|I|} \sum_{J \supseteq I} x(J)$.
Proposition 1 The disequilibrium functions of a nonselective, k-locus, diallelic
system are $\{\prod_{t=1}^{s} y(I_t)\}$ $(= \{\sigma(I_1, \ldots, I_s)\}$ say), where the product is taken over
each mutually exclusive, but not necessarily exhaustive set of subsets $I_1, I_2, \ldots, I_s$,
excluding the functions $y(\emptyset)$ and $\{y(i)$, $i$ a single locus$\}$. The corresponding
eigenvalues of the operator taking a population to its offspring generation are
$\{2^s \prod_{t=1}^{s} \lambda(I_t)\}$.
Proof. We represent the population with canonical coordinates $\{y(I) : y(\emptyset) = 1\}$
by the vector of values $\sigma(I_1, \ldots, I_s)$ specified in the proposition. They are
partially ordered by refinement of the partition in the argument. We have
$\{c(\emptyset) + \sum y(I) c(I)\}^2 = \sum y(I) y(J) \lambda(I, J)\, c(I \cup J)$. Thus $\sigma(I_1, \ldots, I_s)$ is
transformed into $2^s \prod_{t=1}^{s} \lambda(I_t)\, \sigma(I_1, \ldots, I_s)$ + terms with more refined arguments.
Thus the squaring operator is equivalent to a linear operator with eigenfunctions
$\{\sigma(I_1, \ldots, I_s)\}$, and eigenvalues $\{2^s \prod_{t=1}^{s} \lambda(I_t)\}$ as required. However the
eigenvalues of $y(\emptyset)$ and of each $y(i)$ are 1.
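
The triangular form used in the proof can be seen concretely for two diallelic loci. The following sketch assumes the standard two-locus model with a hypothetical recombination fraction r = 0.2 and an arbitrary starting population; in this case the "terms with more refined arguments" reduce to the single product $y(1)y(2)$, and the combination $y(12) - y(1)y(2)$ is multiplied exactly by $2\lambda(\{1,2\}) = 1 - r$. The function names are illustrative only.

```python
r = 0.2  # assumed recombination fraction
# Gametes are written as pairs (i, j) of alleles (0 or 1) at loci 1 and 2.
x = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def next_generation(x, r):
    """Gametic frequencies after one round of random mating."""
    new = {g: 0.0 for g in x}
    for (i1, i2), p in x.items():
        for (j1, j2), q in x.items():
            w = p * q
            # Non-recombinant gametes: copy either parental chromosome.
            new[(i1, i2)] += w * (1 - r) / 2
            new[(j1, j2)] += w * (1 - r) / 2
            # Recombinant gametes: one locus from each parent.
            new[(i1, j2)] += w * r / 2
            new[(j1, i2)] += w * r / 2
    return new

def canonical(x):
    """y(I) = (-1)^|I| * (total frequency of gametes carrying '1' on all of I)."""
    y1 = -(x[(1, 0)] + x[(1, 1)])
    y2 = -(x[(0, 1)] + x[(1, 1)])
    y12 = x[(1, 1)]
    return y1, y2, y12

y1, y2, y12 = canonical(x)
z1, z2, z12 = canonical(next_generation(x, r))

# Triangular update: y(12) picks up a multiple of the finer product y(1) y(2).
predicted_z12 = (1 - r) * y12 + r * y1 * y2
sigma_before = y12 - y1 * y2
sigma_after = z12 - z1 * z2
```

Here `z1` and `z2` equal `y1` and `y2` (eigenvalue 1), `z12` agrees with `predicted_z12`, and `sigma_after` equals `(1 - r) * sigma_before`. In natural coordinates `sigma_before` is the classical quantity $x_{00}x_{11} - x_{01}x_{10}$.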
The idea of the diallelic k-locus algebra of (Holgate 1979) can be extended to
the multiallelic case with $r_i + 1$ alleles at locus $i$, $i = 1, \ldots, k$. At any locus $i$ we
can distinguish one allele, say the $t_i$-th, and amalgamate the others. We form the
direct sum of copies of the k-locus diallelic algebra, corresponding to each string
$\tau = t_1, \ldots, t_k$ specifying the choice of distinguished allele at each locus. One
allele at each locus can be ignored, so we need $\prod r_i$ $(= R)$ summands.
Definition. The genetic algebra for k linked loci with $r_i + 1$ alleles at locus $i$, is
the direct sum of $R = \prod r_i$ copies of the diallelic k-locus algebra. The natural
basis elements are the symbols $\sum_{\tau} \oplus\, a_\tau(I_\tau)$, where each $I_\tau$ runs through all the
subsets of $K$. Its elements are linear combinations $\sum x(\tau_1, \ldots, \tau_R) \sum_{\tau} \oplus\, a_\tau(I_\tau)$.
The canonical basis is related to the natural basis by the equations
$c(I_\tau) = \sum_{J_\tau \subseteq I_\tau} (-1)^{|J_\tau|} a(J_\tau)$ for each subscripted string $\tau$. In each direct
summand the product rule is $c(I_\tau)c(J_\tau) = \lambda(I_\tau, J_\tau)\, c(I_\tau \cup J_\tau)$, products corresponding
to different strings $\tau$ being zero.
An individual is represented by the basis element $\sum_{\tau} \oplus\, a_\tau(I_\tau)$ whose i-th
component is the $a_{\tau_i}(I_{\tau_i})$ for which $I_{\tau_i}$ is that set of loci at which the profile of the
individual agrees with the string $\tau_i$. As usual, populations are represented by
those elements with $x(\tau_1, \ldots, \tau_R) \ge 0$, $\sum x(\tau_1, \ldots, \tau_R) = 1$. The coefficient of the
component corresponding to allele $t_i$ at locus $i$, with respect to the canonical
basis, will be denoted by $y(\tau_1, \ldots, \tau_R)$.
Proposition 2 The principal train roots of the multilocus, multiallelic genetic algebra
are those of the diallelic algebra, with the multiplicity of each multiplied by R.
Proof. Each of the direct summands in the algebra is isomorphic to a diallelic
algebra with the same crossover distribution.
Genetic algebras of nonselective genetic systems admit obvious discrete
groups of automorphisms corresponding to permutations of the labels of the
alleles. However, the simplest law of gametic combination, $a_1 a_2 = \tfrac{1}{2}(a_1 + a_2)$,
implies that if $b_1 = \theta a_1 + (1 - \theta) a_2$, $b_2 = \varphi a_1 + (1 - \varphi) a_2$, then $b_1 b_2 = \tfrac{1}{2}(b_1 + b_2)$.
This means that not only can a single locus, $(r + 1)$-allelic population be
described equally well by any $r + 1$ linear combinations of gametic frequencies
that add up to 1, of which $r$ are linearly independent, but we can calculate with
these new coordinates from generation to generation just as if they were gametic
frequencies. The automorphism group $\mathrm{Aut}(k; r_1, \ldots, r_k)$ for the multilocus
situation is the product of the affine groups on spaces of dimensions $r_i + 1$,
$i = 1, \ldots, k$, seen more naturally in terms of the canonical coordinates as
projective groups on the spaces spanned by $y_1, \ldots, y_{r_i}$. Each automorphism is
the product of k linear transformations of which the j-th has the form
$\sum_{t} \theta_{tj}\, y(\tau_1, \ldots, \tau_R)$, and acts on the j-th index of each string $\tau$, while the rest
are held constant (see Peresi 1988). The action of these automorphisms is
extended naturally to the disequilibrium functions. Let $\sigma(I_1, \ldots, I_s)$ be one of
the disequilibrium functions of the genetic algebra for k diallelic loci, and let
$\sigma(I_1, \ldots, I_s : \tau)$ denote the version of $\sigma(I_1, \ldots, I_s)$ obtained when the string $\tau$
specifies the distinguished alleles at each locus.
Proposition 3 The images of the set $\{\sigma(I_1, \ldots, I_s : \tau)\}$ for fixed $\tau$ and all collections
$(I_1, \ldots, I_s)$ of mutually exclusive subsets of loci, under all automorphisms in
$\mathrm{Aut}(k; r_1, \ldots, r_k)$, provide a complete set of disequilibrium functions for the
multiallelic k-locus genetic algebra. For fixed $(I_1, \ldots, I_s)$ the images have the same
eigenvalue $2^s \prod_{t=1}^{s} \lambda(I_t)$.
Proof. For each such function $\sigma(I_1, \ldots, I_s)$ the images are eigenfunctions with
the same eigenvalues because of the automorphism. The total number obtained
in this way is equal to the dimension of the linear space required to linearise the
squaring operator in the direct sum of R copies of the diallelic algebra.
For a generic recombination distribution, all the plenary train roots are
distinct except for the possible multiplicities of 1. In special cases that have been
studied in genetics, equalities can occur. For instance we may have $\lambda(I) = \lambda(I + t)$,
where $I + t$ is the set $\{i + t : i \in I\}$, where $t$ is any integer such that $I + t$ is a subset
of $K$. The nonuniqueness of the disequilibrium functions can be described.
Proposition 4 Let $\sigma_1, \ldots, \sigma_n$ be disequilibrium functions corresponding to a common
eigenvalue $\mu$. Then $\sum \theta_i \sigma_i$ is also a disequilibrium function with eigenvalue $\mu$.
Proof. By linearity.
Thus, while the disequilibrium functions are not uniquely defined, this does
not invalidate the results of Bennett (1954) and Hill (1974). All that they require
is to be able to select one appropriate set of disequilibrium functions. The situation
is like that of a matrix with eigenvalues of geometric multiplicity greater than one, where the truth
of a theorem dependent on the existence of a set of orthogonal eigenvectors is not
necessarily invalidated by the fact that they are not unique. The fact that the
genetic algebras being studied contain subalgebras obtained by setting to zero the
frequencies for specified sets of alleles at any locus, and admit factor algebras
given by projection onto a subset of the loci, ensures that the eigenvalues have
geometric multiplicities equal to their algebraic multiplicities. The referee points
out that the situation considered here parallels the fact that for some systems of
difference or differential equations there may be several Liapounov functions.
Examples. The gametic genetic algebra G for two alleles 0, 1 at each of two loci
is described in (Holgate 1976). The classical disequilibrium function in this case,
in terms of the natural gametic frequencies $\{x_{ij}\}$, is $x_{00}x_{11} - x_{01}x_{10}$, with
eigenvalue $1 - r$, the probability of nonrecombination. For three alleles at each
locus the algebra is the direct sum of four algebras isomorphic to G, in which the
alleles at the first and second loci indicated by the $\tau$-strings 11, 12, 21, 22
respectively, play the roles of 11 in G. When calculating with elements representing
populations, we are working in a 12-dimensional subspace of $G \oplus G \oplus G \oplus G$,
since in each summand the coefficient of the first element relative to the canonical
basis (the sum of coefficients relative to the natural basis) is 1. The set of
disequilibrium functions corresponding to the direct summands can be calculated
in terms of the natural coordinates. Whenever the index '1' occurs it is replaced
by $i_t$, the t-th digit of the $\tau$-string of the relevant component of the direct sum.
A coordinate containing '0's is replaced by a sum of coordinates in which
the indices take all values except $i_t$. Some care is necessary here since the
natural coordinatisation is 'singular', since $\sum x_{i_1 \ldots i_k} = 1$. For the string 11
the disequilibrium function is transformed into
$(x_{00} + x_{02} + x_{20} + x_{22})x_{11} - (x_{01} + x_{21})(x_{10} + x_{12})$.
For the strings 12, 21, 22 we obtain respectively:
$(x_{00} + x_{01} + x_{20} + x_{21})x_{12} - (x_{02} + x_{22})(x_{10} + x_{11})$;
$(x_{00} + x_{02} + x_{10} + x_{12})x_{21} - (x_{01} + x_{11})(x_{20} + x_{22})$;
$(x_{00} + x_{01} + x_{10} + x_{11})x_{22} - (x_{02} + x_{12})(x_{20} + x_{21})$,
the third and fourth functions being written down from the second and first
respectively by interchanging 1 and 2. For three loci each with three alleles, we
have the disequilibrium functions as above for each of the three pairs of loci, the
eigenvalues being the probabilities of nonrecombination between each pair. In
addition, the diallelic situation leads to the further disequilibrium function, which
by Proposition 1 is $y_{111}$ in canonical coordinates. Use of the symbolic calculus with
$y_{ijk} = \eta_i \eta_j \eta_k$, the substitution $\eta = \eta_1 - \eta_0$ and the replacement $\eta_i \eta_j \eta_k = x_{ijk}$ leads
to the expression in natural coordinates $\sum (-1)^{i_1 + i_2 + i_3} x_{i_1 i_2 i_3}$, with each index
taking values 0, 1. We can then compute the eight further functions arising from
three alleles at each of three loci, as before. In canonical terms they are simply
the $y_{i_1 i_2 i_3}$, with each index taking values 1, 2. The common eigenvalue is the
probability of no recombination at all among the three loci.
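
The two-locus, three-allele case can be checked numerically. The sketch below builds the four functions by the replacement rule described above (distinguished alleles $t_1$, $t_2$, all other alleles amalgamated) and applies one generation of random mating using the standard recursion $x'_{ij} = (1 - r)x_{ij} + r p_i q_j$, where $p_i$ and $q_j$ are the marginal allele frequencies. The recombination fraction r = 0.25, the starting population and the function names are assumptions chosen for illustration.

```python
from itertools import product

ALLELES = (0, 1, 2)
r = 0.25  # assumed recombination fraction between the two loci

def next_generation(x):
    """Random mating: x'_ij = (1 - r) x_ij + r p_i q_j (marginal frequencies p, q)."""
    p = {i: sum(x[(i, j)] for j in ALLELES) for i in ALLELES}
    q = {j: sum(x[(i, j)] for i in ALLELES) for j in ALLELES}
    return {(i, j): (1 - r) * x[(i, j)] + r * p[i] * q[j]
            for i, j in product(ALLELES, ALLELES)}

def disequilibrium(x, t1, t2):
    """Two-locus function with allele t1 (locus 1) and t2 (locus 2) distinguished
    and the remaining alleles at each locus amalgamated."""
    others1 = [i for i in ALLELES if i != t1]
    others2 = [j for j in ALLELES if j != t2]
    both_other = sum(x[(i, j)] for i in others1 for j in others2)
    first_only = sum(x[(t1, j)] for j in others2)
    second_only = sum(x[(i, t2)] for i in others1)
    return both_other * x[(t1, t2)] - second_only * first_only

# An arbitrary starting population (weights are illustrative, then normalised).
weights = {(i, j): 1.0 + 3 * i + j + (2.0 if i == j else 0.0)
           for i, j in product(ALLELES, ALLELES)}
total = sum(weights.values())
x0 = {g: w / total for g, w in weights.items()}
x1 = next_generation(x0)

# Ratio of each function after and before mating, for the strings 11, 12, 21, 22.
ratios = {}
for t1, t2 in product((1, 2), (1, 2)):
    before = disequilibrium(x0, t1, t2)
    after = disequilibrium(x1, t1, t2)
    ratios[(t1, t2)] = after / before
```

For this starting population each of the four ratios equals $1 - r$ up to rounding, in agreement with the common eigenvalue stated for the pair of loci.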

References

Abraham, V. M.: Linearising quadratic transformations in genetic algebras. Proc. Lond. Math. Soc.,
III. Ser. 40, 346-363 (1980a)
Abraham, V. M.: The induced linear transformation in a genetic algebra. Proc. Lond. Math. Soc., III.
Ser. 40, 364-384 (1980b)
Bennett, J. H.: On the theory of random mating. Ann. Eugen. 18, 311-317 (1954)
Geiringer, H.: On the probability theory of linkage in Mendelian heredity. Ann. Math. Stat. 15, 25-57
(1944)
Gonshor, H.: Special train algebras arising in genetics. Proc. Edinb. Math. Soc., II. Ser. 12, 41-53
(1960)
Heuch, I.: k loci linked to a sex factor in haploid individuals. Biom. Z. 13, 57-68 (1972)
Heuch, I.: The linear algebra for linked loci with mutation. Math. Biosci. 16, 263-271 (1973)
Heuch, I.: Genetic algebras for systems with linked loci. Math. Biosci. 34, 35-47 (1977)
Hill, W. G.: Disequilibrium among several linked neutral genes in finite populations. I. Theor. Popul.
Biol. 5, 366-392 (1974)
Holgate, P.: Sequences of powers in genetic algebras. J. Lond. Math. Soc. 42, 489-496 (1967)
Holgate, P.: The genetic algebra of k linked loci. Proc. Lond. Math. Soc., III. Ser. 18, 315-327 (1968)
Holgate, P.: Direct products of genetic algebras and Markov chains. J. Math. Biol. 3, 289-295 (1976)
Holgate, P.: Canonical multiplication in the genetic algebra for linked loci. Linear Algebra Appl. 26,
281-287 (1979)
Holgate, P.: Linearisation of quadratic operators in genetic algebras. Cah. Math. 38, 23-33 (1989)
Holgate, P.: Bibliography of genetic algebras and related topics. Typescript (1991)
Karlin, S., Liberman, U.: Global convergence properties in multilocus viability selection models: the
additive model and the Hardy-Weinberg law. J. Math. Biol. 29, 161-176 (1990)
Peresi, L. A.: The derivation algebra of gametic algebra for linked loci. Math. Biosci. 91, 151-156
(1988)
Reiersøl, O.: Genetic algebras studied recursively and by means of differential operators. Math. Scand.
10, 25-44 (1962)
Wörz-Busekros, A.: Algebras in genetics. (Lect. Notes Biomath., vol. 36) Berlin Heidelberg New York:
Springer 1980
