
American Journal of Epidemiology

Copyright 1999 by The Johns Hopkins University School of Hygiene and Public Health
All rights reserved

vol. 149, No. 8


Printed in U.S.A

Biased Tests of Association: Comparisons of Allele Frequencies when Departing from Hardy-Weinberg Proportions

Daniel J. Schaid1,2 and Steven J. Jacobsen

association; bias (epidemiology); case-control studies; chi-square statistic; genes; significance tests

nificant. However, a number of other investigators have failed to reproduce these results (9). Meta-analyses on these studies (10, 11) concluded that the
differences between cases and controls could be
explained by variation in the genetic marker allele frequencies among the various ethnic groups, as well as
sampling error. This demonstrates that the choice of
appropriate controls can be difficult (2-4, 6), due to
potential unmeasured confounding factors, such as
ethnic background of cases and controls.
Another potential source of error is the choice of
analytic method. The distribution of genotypes is
often compared between cases and controls by using
Pearson's chi-square statistic for a 2 x G contingency
table, where G is the number of observed genotypes.
This method can have limited power when G is large,
because of the large number of degrees-of-freedom.
Alternatively, the frequencies of K alleles are often
compared by cross-classifying both alleles of each
person according to their case-control status, creating
a 2 x K table. Frequencies are then compared with
Pearson's chi-square statistic. The use of allele frequencies can be more appealing than comparing
genotype frequencies because the sample size is
twice as large (with each person contributing two
alleles) and the degrees-of-freedom are fewer.
However, the validity of Pearson's chi-square statis-

Association studies of candidate genes with disease have helped to decipher the genetic basis of many
complex diseases. The case-control design provides an
efficient method for assessing these associations.
Unfortunately, many initial genetic associations found
in case-control studies have been difficult to reproduce. This could be due to reporting bias (1), with the
first publication representing an extreme outlier.
Alternatively, bias in choice of cases or controls (2-4),
choice of genetic markers or genotyping errors (5),
unaccounted confounding factors (6), or improper analytic methods could explain the difficulty in replicating
findings. An example of the difficulty in interpreting
multiple case-control association studies is provided
by the controversy over the association of alcoholism and
the dopamine D2 receptor. The first report (7) documented a large odds ratio of 8.7 (p < 0.001); a second
study by the same group of investigators (8) reported a
reduced odds ratio of 3.7 that was still statistically sig-

Received for publication February 17, 1998, and accepted for publication August 12, 1998.
Abbreviation: HWE, Hardy-Weinberg Equilibrium.
1 Department of Health Sciences Research, Mayo Clinic/Mayo Foundation, Rochester, MN.
2 Department of Medical Genetics, Mayo Clinic/Mayo Foundation, Rochester, MN.
Reprint requests to Dr. Daniel J. Schaid, Harwick 7, Mayo Clinic,
200 First Street S.W., Rochester, MN 55905.


Downloaded from aje.oxfordjournals.org by guest on November 7, 2010

Association studies of genetic markers or candidate genes with disease are often conducted using the
traditional case-control design. Cases and controls are sampled from genetically unrelated subjects, and allele
frequencies compared between cases and controls using Pearson's chi-square statistic. An assumption of this
analysis method is that the two alleles within each subject are statistically independent, at least when no
association exists. This is equivalent to assuming that the frequencies of the genotypes in the general population
comply with Hardy-Weinberg Equilibrium proportions, which may not always be the case. However, deviations
from Hardy-Weinberg Equilibrium can inflate the chance of a false-positive association. These results
demonstrate that when comparing the frequencies of two alleles between cases and controls, the chance of a
false-positive association can be substantially increased if homozygotes for the putative high-risk allele are more
common in the general population than predicted by Hardy-Weinberg Equilibrium. In contrast, Pearson's chi-square statistic can be conservative if the frequency of homozygotes for the high-risk allele is less than that
predicted. A statistically valid method that corrects for deviations from Hardy-Weinberg Equilibrium is presented,
so that the chance of a false-positive association is not greater than the acceptable level. Am J Epidemiol
1999;149:706-11.

Biased Association when Departing from Hardy-Weinberg Proportions

Am J Epidemiol

Vol. 149, No. 8, 1999

tify the false-positive rate, and to offer guidelines and a correct analytic method to account for deviations
from HWE when comparing allele frequencies in case-control studies.
STATISTICAL METHODS

Consider comparing the frequencies of two alleles between cases and controls. Although Pearson's chi-square statistic is often used to make this comparison,
it is easier to present statistical properties by use of an
equivalent statistic, based on the normal distribution.
Let $p_d$ denote the estimated frequency of allele A
among diseased cases, and let $p_c$ denote that among
the non-diseased controls, where these estimates are
obtained by simply counting alleles. A statistic to compare $p_d$ and $p_c$ is

$$z = \frac{p_d - p_c}{\sqrt{V}}, \qquad (1)$$

where $V$ is the variance of $(p_d - p_c)$. If $V$ is correctly specified, $z$ has an approximate standard normal distribution. When HWE exists, $V$ can be written as

$$V_{HWE} = p(1-p)\left[\frac{1}{2N_d} + \frac{1}{2N_c}\right], \qquad (2)$$

where $p$ is the underlying allele frequency common to both cases and controls under the null hypothesis
of no association, and $N_d$ and $N_c$ are the number of
cases and controls, respectively. The allele frequency
$p$ can be estimated by pooling cases and controls.
When there are departures from HWE, $V$ can be
written (17) as

$$V_{NonHWE} = \left[p(1-p) + (P_{AA} - p^2)\right]\left[\frac{1}{2N_d} + \frac{1}{2N_c}\right], \qquad (3)$$

where $P_{AA}$ is the frequency of AA homozygotes. This latter variance includes a measure of discrepancy
between the frequency of AA homozygotes and that
predicted by HWE; let $\delta$ denote the discrepancy coefficient, where $\delta = (P_{AA} - p^2)$. Under the null hypothesis
of no association, the relative frequency $P_{AA}$ can be estimated by pooling cases and controls, but it is not clear
if this is the best approach when considering power. As
an alternative to expression 3, one can estimate the
variance of the allele frequency among cases,

$$V_{NonHWE,d} = \frac{p(1-p) + (P_{AA} - p^2)}{2N_d},$$


tic requires the independence of alleles in the general population to maintain the correct false-positive
(Type-I error) rate.
When sampling nonrelated cases and controls, genotypes are independent among people, but alleles within genotypes may or may not be independent.
Statistical independence of alleles is equivalent to the
genotype frequencies complying with Hardy-Weinberg Equilibrium (HWE) proportions. For a simple case, consider two alleles, denoted A and B, where
A is thought to be the high-risk allele, and B represents
all other alleles. If $p$ is the population frequency of
allele A, then the HWE proportions for genotypes AA,
AB, and BB are $p^2$, $2p(1-p)$, and $(1-p)^2$.
Independence of alleles can be tested by comparing the
observed genotype proportions to those expected when
there is HWE (12). Note that even if the general population is in HWE, the expected marker genotype proportions among diseased cases can deviate from HWE
when a true association exists, and the amount of deviation depends on the genetic mechanism. For example,
if a marker allele is associated with a disease because
of a rare dominant disease susceptibility allele, then
HWE is not expected to hold, yet, for association with
a recessive disease susceptibility allele, HWE may
hold among the cases, but with a marker allele frequency greater than in the general population (13, 14).
Hence, testing for HWE should be performed among
only controls, assuming that the disease is rare in the
population.
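As an illustration of testing controls for HWE, the sketch below applies a large-sample Pearson chi-square test with one degree of freedom to two-allele genotype counts. It is not the exact test of reference 12; the function name `hwe_chi_square` and the control counts are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Large-sample chi-square test (1 df) of HWE for a two-allele marker.

    Assumes both alleles are observed, so no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A by allele counting
    expected = (n * p * p, n * 2 * p * (1 - p), n * (1 - p) ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical controls with an excess of both homozygotes (HWE expects 25, 50, 25).
chi2, p_value = hwe_chi_square(30, 40, 30)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

With small samples or rare alleles the chi-square approximation can be poor, which is one reason an exact test may be preferred.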
Deviations from HWE can be caused by multiple
reasons, such as small population variation (random
genetic drift), and the structure of the population. The
latter may include inbreeding, assortative mating,
stratification, or admixture of different ethnic groups.
If a population is composed of a recent admixture of
different ethnic groups that have different frequencies
of marker alleles, then any trait more frequent in one
of these ethnic groups will be positively associated
with any marker allele that is more frequent in that
group, even if the trait and marker locus are not genetically linked. This type of association is an example of
confounding due to ethnic background. As a method
to assess the potential for an admixed population, it
has been proposed that testing for departures from
HWE should be routinely performed for association
studies (15). However, the impact of departures from
HWE on the Type-I error rate has been ignored in
many applications.
When comparing allele frequencies between cases
and controls, ignoring deviations from HWE can alter
the Type-I error rate. Although this has been speculated to occur (16), the magnitude of the problem has not
been explored. The purposes of this paper are to quan-


by using only cases to estimate $p$ and $P_{AA}$, and a similar method for controls to estimate $V_{NonHWE,c}$, and then
add these to compute

$$V_{NonHWE} = V_{NonHWE,d} + V_{NonHWE,c}.$$
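The form of the non-HWE variance can be checked numerically. The sketch below compares the closed form $[p(1-p) + \delta]/(2N)$ for one group with the variance obtained directly from the genotype distribution, treating each person's count of A alleles as an independent draw. The function names are illustrative and are not taken from the paper.

```python
def allele_freq_variance(p, delta, n):
    """Closed-form variance of the allele-counting estimate of p from n people."""
    return (p * (1 - p) + delta) / (2 * n)

def exact_variance(p, delta, n):
    """Same variance derived from the genotype probabilities."""
    p_aa = p * p + delta
    p_ab = 2 * (p * (1 - p) - delta)
    # X = number of A alleles carried by one person (2 for AA, 1 for AB, 0 for BB).
    mean_x = 2 * p_aa + p_ab
    mean_x2 = 4 * p_aa + p_ab
    var_x = mean_x2 - mean_x ** 2
    # The estimate is sum(X) / (2n), with people sampled independently.
    return n * var_x / (2 * n) ** 2

def summed_variance(p_d, delta_d, n_d, p_c, delta_c, n_c):
    """Variance of (p_d - p_c) when cases and controls are estimated separately."""
    return (allele_freq_variance(p_d, delta_d, n_d)
            + allele_freq_variance(p_c, delta_c, n_c))

print(allele_freq_variance(0.3, 0.05, 100), exact_variance(0.3, 0.05, 100))
```

When both groups share the same $p$ and $\delta$, `summed_variance` reduces to expression 3.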

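When HWE is falsely assumed to be true, and $V_{HWE}$

$$P(|z_{HWE}| > z_\alpha) = P\left(|z_{NonHWE}|\sqrt{V_{NonHWE}/V_{HWE}} > z_\alpha\right) = P\left(|z_{NonHWE}| > z_\alpha\sqrt{V_{HWE}/V_{NonHWE}}\right). \qquad (4)$$

So, the square root of the variance ratio, $V_{HWE}/V_{NonHWE}$, which depends on the discrepancy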
coefficient $\delta$, determines the true Type-I error rate. The
effects of departure from HWE on the true Type-I error
rate are considered separately in situations when there
is an excess and a deficit of AA homozygotes.
When there is an excess of AA homozygotes, $\delta > 0$.
Because $P_{AA} = p^2 + \delta$ and $p = P_{AA} + P_{AB}/2$ imply that
$P_{AB} = 2[p(1-p) - \delta]$ and $P_{AB}$ is bounded by 0 and 1, the
maximum value of $\delta$ is $p(1-p)$. At this maximum value,
there are no AB heterozygotes, because $P_{AA} = p$ and $P_{AB} = 0$.
An alternative way to express $\delta$ is as a fraction, $f$, of its
maximal discrepancy value: $\delta = fp(1-p)$. Substituting
this representation into the variance ratio results in

[Figure 1 plot: x-axis, fraction of maximum discrepancy from HWE (0.0 to 1.0); curves for nominal error rates of 0.05 and 0.01, with maxima of 0.17 and 0.07, respectively.]

FIGURE 1. True Type-I error rate as a function of fractional maximum discrepancy from Hardy-Weinberg Equilibrium (HWE) when there is an
excess of AA homozygotes and the assumed Type-I error rate is either 0.01 or 0.05.


is used in expression 1, the Type-I error rate can be either inflated or deflated (i.e., conservative) relative
to the assumed error rate. The Type-I error rate will be
inflated when $V_{HWE}$ is an underestimate of the true variance, which occurs when $\delta > 0$, or, in other words,
when the frequency of AA homozygotes exceeds that
predicted by HWE. In contrast, the Type-I error rate is
deflated when $V_{HWE}$ is greater than $V_{NonHWE}$, which
occurs when the frequency of AA homozygotes is less
than that predicted by HWE.
To examine the amount of deviation of the true
Type-I error rate from that assumed, let $z_\alpha$ be the quantile of a standard normal distribution that gives an
assumed Type-I error rate for a two-sided test of $\alpha$.
Also, let $z_{HWE}$ and $z_{NonHWE}$ be the test statistics using
$V_{HWE}$ and $V_{NonHWE}$, respectively, in expression 1. Note
that if HWE is false, then $z_{HWE}$ does not have a standard
normal distribution, but $z_{NonHWE}$ does. Assuming HWE
to be false, the Type-I error rate when using $z_{HWE}$ can
be evaluated by the following probability calculations:

$$\frac{V_{HWE}}{V_{NonHWE}} = \frac{1}{1+f}, \qquad (5)$$

which is independent of allele frequency. Substituting this variance ratio into expression 4 allows evaluation
of the true Type-I error rate.
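The substitution described above can be sketched numerically. The example below evaluates expression 4 with the variance ratio of expression 5 for an excess of AA homozygotes, using the standard normal distribution from Python's `statistics` module. The function names are illustrative, not the authors'.

```python
from statistics import NormalDist

def true_type1_error(alpha, variance_ratio):
    """Expression 4: P(|z_NonHWE| > z_alpha * sqrt(V_HWE / V_NonHWE))."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    cutoff = z_alpha * variance_ratio ** 0.5
    return 2 * (1 - std_normal.cdf(cutoff))

def ratio_excess(f):
    """Expression 5: V_HWE / V_NonHWE = 1 / (1 + f), with delta = f * p * (1 - p)."""
    return 1 / (1 + f)

for alpha in (0.05, 0.01):
    for f in (0.0, 0.5, 1.0):
        rate = true_type1_error(alpha, ratio_excess(f))
        print(f"alpha = {alpha}, f = {f}: true Type-I error = {rate:.3f}")
```

For an assumed rate of 0.05, the computed rates at f = 0.5 and f = 1.0 round to 0.110 and 0.166, which agree with the large-sample row of table 1.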
When a deficiency of AA homozygotes exists, $\delta < 0$,
and the maximum amount of negative discrepancy is
$-p^2$. At this value, there are no AA homozygotes
because $P_{AA} = 0$ and $P_{AB} = 2p$. After expressing $\delta$ as a
fraction of its maximum negative value, $\delta = -fp^2$, the variance
ratio can be written as

$$\frac{V_{HWE}}{V_{NonHWE}} = \frac{1-p}{1-p(1+f)}, \qquad (6)$$

RESULTS

When the frequency of AA homozygotes exceeds that predicted by HWE, the Type-I error rate can be
inflated, as illustrated in figure 1. Here, the true Type-I error rate is plotted as a function of the fractional
maximum discrepancy, for an assumed Type-I error
rate of either 5 percent or 1 percent. When discrepancy
is at its maximum, the true Type-I error rate can be as
high as 17 percent for an assumed rate of 5 percent,
and as high as 7 percent for an assumed rate of 1 percent.
When the frequency of AA homozygotes is less than
that predicted by HWE, the Type-I error rate can be
deflated, as illustrated in figure 2 for both a common
allele (p = 0.25) and a rare allele (p = 0.05). For a common allele (p = 0.25), the Type-I error rate can be quite
conservative, especially if the assumed error rate is 5
percent. The amount of conservatism is less when the
assumed Type-I error rate is small, as for the assumed
error rate of 1 percent in figure 2. As the allele frequency gets smaller, the amount of negative disequilibrium is also reduced, resulting in a less conservative
Type-I error rate (e.g., when p = 0.05 in figure 2).
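The dependence of the conservative error rate on allele frequency can be sketched by combining expression 4 with the variance ratio for a deficiency of AA homozygotes, $(1-p)/[1-p(1+f)]$. The function names below are illustrative, and the sketch assumes $p \le 0.5$ so that the maximum negative discrepancy $-p^2$ is attainable.

```python
from statistics import NormalDist

def ratio_deficit(p, f):
    """V_HWE / V_NonHWE when delta = -f * p**2 (assumes p <= 0.5)."""
    return (1 - p) / (1 - p * (1 + f))

def deflated_type1_error(alpha, p, f):
    """True Type-I error of z_HWE when AA homozygotes are deficient."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    cutoff = z_alpha * ratio_deficit(p, f) ** 0.5
    return 2 * (1 - std_normal.cdf(cutoff))

for p in (0.25, 0.10, 0.05):
    rates = [deflated_type1_error(0.05, p, f) for f in (0.5, 1.0)]
    print(f"p = {p}: f = 0.5 -> {rates[0]:.3f}, f = 1.0 -> {rates[1]:.3f}")
```

At p = 0.10 the computed rates round to 0.044 and 0.038, in agreement with the large-sample row for deficient homozygotes in table 1.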
The results in figures 1 and 2 are based on expression 4, which assumes that the sample size is large
enough for the normal approximation to be adequate.
To validate the adequacy of this approximation, simulations were performed. The genotypes for an equal
number of cases and controls ($N_d = N_c$ = 50 or 100)
were sampled according to the probabilities $P_{AA} = p^2 + f\delta_{max}$,
$P_{AB} = 2[p(1-p) - f\delta_{max}]$, and $P_{BB} = (1-p)^2 + f\delta_{max}$,
where $p$ = 0.10 and $f$ = 0, 0.5, or 1.0. The maximum
discrepancy, $\delta_{max}$, was $p(1-p)$ for excess AA homozy-

[Figure 2 plot: x-axis, fraction of maximum discrepancy from HWE (0.0 to 1.0); curves for allele frequencies p = 0.05 and p = 0.25 at nominal error rates of 0.05 and 0.01.]

FIGURE 2. True Type-I error rate as a function of fractional maximum discrepancy from Hardy-Weinberg Equilibrium (HWE) when there is a
deficiency of AA homozygotes, the allele A is either common (p = 0.25) or rare (p = 0.05), and the assumed Type-I error rate is either 0.01 or
0.05.

which depends on the allele frequency.
In this simple case, only two alleles have been
assumed. Thus, the comparison can focus on only one
allele frequency. When K alleles are compared simultaneously by applying Pearson's chi-square statistic to
a 2 x K contingency table, this approach can be extended to consider the dependence of alleles under the null
hypothesis.


TABLE 1. Type-I error rates for statistical methods with and without assumptions of Hardy-Weinberg
Equilibrium (HWE)

                                          Fraction of maximum discrepancy (f)†
Frequency of      Sample size*            f = 0.0              f = 0.5              f = 1.0
AA homozygotes    (N_d = N_c)         z_HWE‡  z_NonHWE‡    z_HWE   z_NonHWE     z_HWE   z_NonHWE

Excess            50                  0.033   0.041        0.091   0.050        0.166   0.054
                  100                 0.047   0.057        0.104   0.062        0.147   0.049
                  Large-sample        0.050   0.050        0.110   0.050        0.166   0.050
                  approximation

Deficient         50                  0.040   0.050        0.054   0.052        0.028   0.050
                  100                 0.057   0.061        0.038   0.056        0.035   0.061
                  Large-sample        0.050   0.050        0.044   0.050        0.038   0.050
                  approximation

gotes, and $-p^2$ for deficient AA homozygotes. For each sample size, 1,000 repetitions were sampled; for each
sample, the frequency of allele A was compared
between cases and controls using both $z_{HWE}$ (assuming
HWE) and $z_{NonHWE}$ (correcting the variance for deviations from HWE), with an assumed Type-I error rate of
5 percent. The simulated Type-I error rates are presented in table 1, along with those predicted by expression 4. For these sample sizes, the simulated Type-I
error rates are close to those predicted, suggesting that
expression 4 is a reliable indicator of the magnitude of
false-positive results when HWE does not hold. Also,
the simulations indicate that $z_{NonHWE}$ adequately corrects for deviations from HWE, achieving the assumed
Type-I error rate of 5 percent.
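A simulation of this kind can be sketched as follows. This is not the authors' program: the random number generator, the seed, the function names, and the choice to estimate $p$ and $P_{AA}$ by pooling cases and controls are all assumptions made for illustration, so the resulting rates will differ somewhat from those in table 1.

```python
import random
from statistics import NormalDist

def genotype_probs(p, delta):
    """Probabilities of genotypes (AA, AB, BB) for allele frequency p and discrepancy delta."""
    return (p * p + delta, 2 * (p * (1 - p) - delta), (1 - p) ** 2 + delta)

def sample_counts(rng, probs, n):
    """Draw n genotypes and return the counts [n_AA, n_AB, n_BB]."""
    counts = [0, 0, 0]
    for _ in range(n):
        u = rng.random()
        if u < probs[0]:
            counts[0] += 1
        elif u < probs[0] + probs[1]:
            counts[1] += 1
        else:
            counts[2] += 1
    return counts

def z_statistics(cases, controls):
    """Return (z_HWE, z_NonHWE) with pooled estimates of p and P_AA."""
    n_d, n_c = sum(cases), sum(controls)
    p_d = (2 * cases[0] + cases[1]) / (2 * n_d)
    p_c = (2 * controls[0] + controls[1]) / (2 * n_c)
    p = (2 * (cases[0] + controls[0]) + cases[1] + controls[1]) / (2 * (n_d + n_c))
    p_aa = (cases[0] + controls[0]) / (n_d + n_c)
    scale = 1 / (2 * n_d) + 1 / (2 * n_c)
    v_hwe = p * (1 - p) * scale
    v_non = (p * (1 - p) + (p_aa - p * p)) * scale
    # A degenerate sample (zero variance) is counted as a non-rejection.
    z_hwe = (p_d - p_c) / v_hwe ** 0.5 if v_hwe > 0 else 0.0
    z_non = (p_d - p_c) / v_non ** 0.5 if v_non > 0 else 0.0
    return z_hwe, z_non

def simulate(p, delta, n, reps=1000, alpha=0.05, seed=1):
    """Estimate the Type-I error rates of z_HWE and z_NonHWE under no association."""
    rng = random.Random(seed)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    probs = genotype_probs(p, delta)
    reject_hwe = reject_non = 0
    for _ in range(reps):
        cases = sample_counts(rng, probs, n)
        controls = sample_counts(rng, probs, n)
        z_hwe, z_non = z_statistics(cases, controls)
        reject_hwe += abs(z_hwe) > z_alpha
        reject_non += abs(z_non) > z_alpha
    return reject_hwe / reps, reject_non / reps

p = 0.10
print(simulate(p, 1.0 * p * (1 - p), 100))  # excess AA homozygotes, f = 1.0
print(simulate(p, -1.0 * p * p, 100))       # deficient AA homozygotes, f = 1.0
```

With 1,000 repetitions the simulated rates carry a standard error of roughly 0.01, so agreement with expression 4 should be judged with that in mind.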
To illustrate the difference in statistical significance
when considering departures from HWE, both $z_{HWE}$ and
$z_{NonHWE}$ statistical tests were applied to data recently published on the association of a molecular variant of the
angiotensinogen gene and coronary atherosclerosis. In
the report by Ishigami et al. (18), the molecular variant
of angiotensinogen that exists in exon 2, a thymine-cytosine transition at nucleotide 704, was labeled a,
and alleles which did not have this variant were
labeled A. Among the 160 control subjects, 30 had
genotype AA (18.8 percent), 51 had Aa (31.9 percent),
and 79 had aa (49.4 percent). Among the 82 cases with
coronary atherosclerosis, 6 were AA (7.3 percent), 22
were Aa (26.8 percent), and 54 were aa (65.9 percent).
The frequencies of the a allele were 79 percent among
cases and 65 percent among controls, but there was a
significant excess of homozygotes among the controls
(p = 0.00019). In pooled cases and controls, the
amount of departure was 47 percent of the maximum
departure. When ignoring this departure, $z_{HWE}$ = 3.17,
giving a probability value of p = 0.0015. Taking this
departure into account results in $z_{NonHWE}$ = 2.80 and p = 0.005.
Thus, although both methods of analysis resulted in a statistically significant association for this
example, the strength of significance was less after
appropriately accounting for departures from HWE.
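The two statistics for this example can be recomputed from the genotype counts given above. The sketch below compares the frequency of the a allele between cases and controls, estimating the allele frequency and the homozygote frequency by pooling cases and controls under the null hypothesis. The function name and argument layout are illustrative.

```python
import math

def allele_test(cases, controls):
    """Compare one allele's frequency between cases and controls.

    Each argument is (homozygotes for the allele, heterozygotes, other homozygotes).
    Returns z_HWE, z_NonHWE and their two-sided p-values.
    """
    n_d, n_c = sum(cases), sum(controls)
    p_d = (2 * cases[0] + cases[1]) / (2 * n_d)
    p_c = (2 * controls[0] + controls[1]) / (2 * n_c)
    # Pooled estimates under the null hypothesis of no association.
    p = (2 * (cases[0] + controls[0]) + cases[1] + controls[1]) / (2 * (n_d + n_c))
    p_hom = (cases[0] + controls[0]) / (n_d + n_c)
    scale = 1 / (2 * n_d) + 1 / (2 * n_c)
    v_hwe = p * (1 - p) * scale
    v_non = (p * (1 - p) + (p_hom - p * p)) * scale
    z_hwe = (p_d - p_c) / math.sqrt(v_hwe)
    z_non = (p_d - p_c) / math.sqrt(v_non)

    def two_sided(z):
        return math.erfc(abs(z) / math.sqrt(2))

    return z_hwe, two_sided(z_hwe), z_non, two_sided(z_non)

# Genotype counts (aa, Aa, AA) from the example in the text.
z_hwe, p_hwe, z_non, p_non = allele_test(cases=(54, 22, 6), controls=(79, 51, 30))
print(f"z_HWE = {z_hwe:.2f} (p = {p_hwe:.4f}); z_NonHWE = {z_non:.2f} (p = {p_non:.4f})")
```

Run on these counts, the statistics round to 3.17 and 2.80, matching the values reported in the text.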
DISCUSSION

These results demonstrate that deviations from HWE can shift the true Type-I error rate away from the assumed rate, and that
the direction and size of the shift are predictable. If
AA homozygotes are more common in the general population than predicted by HWE, the chance of a false-positive finding can be greater than assumed (11 percent at one-half of the maximum discrepancy, and up to
17 percent at the maximum, when the assumed chance is 5
percent). Population dynamics that can lead to an
increased frequency of homozygotes include inbreeding and
population stratification, the latter known as Wahlund's principle
in population genetics (19).
If homozygotes occur less frequently than predicted
by HWE, the true Type-I error rate will be less than
assumed, leading to a conservative statistical comparison. For a common allele, the amount of conservatism
can be substantial. However, the amount of conservatism depends on the allele frequency, and, as the
candidate allele becomes more rare, the amount of
conservatism becomes smaller. This finding may be
important for associations of alleles that have a selective heterozygote advantage (20).
In summary, these results demonstrate that the probability of a Type-I error can be underestimated when
comparing frequencies of two alleles between cases
and controls when there is a departure from HWE.
Sasieni (16) recently speculated that the Type-I error
rate would not be correct if HWE is falsely assumed.

* N_d, number of cases; N_c, number of controls. Type-I error rates for sample sizes of 50 and 100 are based on
simulations; large-sample approximation is based on expression 4 in the text.
† f = 0 implies HWE, and f = 1 is the maximum departure from HWE.
‡ z_HWE, statistic assuming HWE; z_NonHWE, statistic with variance corrections for departure from HWE.

Sasieni suggested that the Pearson chi-square statistic should not be used to compare allele frequencies, but
rather only genotype frequencies should be compared;
trends in genetic relative risks could be assessed with
Armitage's trend test (21). The results presented here
quantify the biased Type-I error rate and demonstrate
how to correctly account for dependencies of alleles
where there are departures from HWE.

ACKNOWLEDGMENTS

This research was partially supported by grant no. GM51256 from the National Institutes of Health.

REFERENCES
1. Begg CB, Berlin JA. Publication bias: a problem in interpreting medical data. J R Stat Soc [A] 1988;151:419-45.
2. Wacholder S, McLaughlin JK, Silverman DT, et al. Selection of controls in case-control studies. I. Principles. Am J Epidemiol 1992;135:1019-28.
3. Wacholder S, Silverman DT, McLaughlin JK, et al. Selection of controls in case-control studies. II. Types of controls. Am J Epidemiol 1992;135:1029-41.
4. Wacholder S, Silverman DT, McLaughlin JK, et al. Selection of controls in case-control studies. III. Design options. Am J Epidemiol 1992;135:1042-50.
5. Stefanski LA, Carroll RJ. Covariate measurement error in logistic regression. Ann Stat 1985;13:1335-51.
6. Falk CT, Rubinstein P. Haplotype relative risks: an easy reliable way to construct a proper control sample for risk calculations. Ann Hum Genet 1987;51:227-33.
7. Blum K, Noble EP, Sheridan PJ, et al. Allelic association of human dopamine D2 receptor gene in alcoholism. JAMA 1990;263:2055-60.
8. Blum K, Noble EP, Sheridan PJ, et al. Association of the A1 allele of the D2 dopamine receptor gene with severe alcoholism. Alcohol 1991;8:409-16.
9. Holden C. A cautionary genetic tale: the sobering story of D2. Science 1994;264:1696-7.
10. Gelernter J, Goldman D, Risch N. The A1 allele at the D2 dopamine receptor gene and alcoholism. JAMA 1993;269:1673-7.
11. Pato CN, Macciardi F, Pato MT, et al. Review of the putative association of dopamine D2 receptor and alcoholism: a meta-analysis. Am J Med Genet 1993;48:78-82.
12. Guo SW, Thompson EA. Performing the exact test of Hardy-Weinberg proportion for multiple alleles. Biometrics 1992;48:361-72.
13. Risch N. A general model for disease-marker association. Ann Hum Genet 1983;47:245-52.
14. Thomson G. HLA disease associations: models for the study of complex human genetic disorders. Crit Rev Clin Lab Sci 1995;32:183-219.
15. Tiret L, Cambien F. Departure from Hardy-Weinberg Equilibrium should be systematically tested in studies of association between genetic markers and disease. (Letter). Circulation 1995;92:3364-5.
16. Sasieni PD. From genotypes to genes: doubling the sample size. Biometrics 1997;53:1253-61.
17. Weir BS. Genetic data analysis. Sunderland, MA: Sinauer Associates, Inc, 1990:34.
18. Ishigami T, Umemura S, Iwamoto T, et al. Molecular variant of angiotensinogen gene is associated with coronary atherosclerosis. Circulation 1995;91:951-4.
19. Li CC. First course in population genetics. Pacific Grove, CA: The Boxwood Press, 1976:522.
20. Hartl DL, Clark AG. Principles of population genetics. 2nd ed. Sunderland, MA: Sinauer Associates, Inc, 1989.
21. Armitage P. Tests for linear trends in proportions and frequencies. Biometrics 1955;11:375-86.
