
Genetic Epidemiology 35:606–619 (2011)

Comparison of Statistical Tests for Disease Association With Rare Variants

Saonli Basu and Wei Pan
Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota

In anticipation of the availability of next-generation sequencing data, there is increasing interest in investigating association
between complex traits and rare variants (RVs). In contrast to association studies for common variants (CVs), due to the low
frequencies of RVs, common wisdom suggests that existing statistical tests for CVs might not work, motivating the recent
development of several new tests for analyzing RVs, most of which are based on the idea of pooling/collapsing RVs.
However, there is a lack of evaluations of, and thus guidance on the use of, existing tests. Here we provide a comprehensive
comparison of various statistical tests using simulated data. We consider both independent and correlated rare mutations,
and representative tests for both CVs and RVs. As expected, if there are no or few non-causal (i.e. neutral or non-associated)
RVs in a locus of interest while the effects of causal RVs on the trait are all (or mostly) in the same direction (i.e. either
protective or deleterious, but not both), then the simple pooled association tests (without selecting RVs and their association
directions) and a new test called kernel-based adaptive clustering (KBAC) perform similarly and are most powerful; KBAC
is more robust than simple pooled association tests in the presence of non-causal RVs; however, as the number of non-causal
CVs increases and/or in the presence of opposite association directions, the winners are two methods originally proposed
for CVs and a new test called C-alpha test proposed for RVs, each of which can be regarded as testing on a variance
component in a random-effects model. Interestingly, several methods based on sequential model selection (i.e. selecting
causal RVs and their association directions), including two new methods proposed here, perform robustly and often have
statistical power between those of the above two classes. Genet. Epidemiol. 35:606–619, 2011. © 2011 Wiley Periodicals, Inc.

Key words: C-alpha test; kernel machine regression; logistic regression; model selection; permutation; pooled association
tests; random-effects models; SSU test; Sum test; statistical power

Contract grant sponsor: NIH; Contract grant numbers: R21DK089351; R01HL65462; R01HL105397.
Correspondence to: Wei Pan, Division of Biostatistics, MMC 303, School of Public Health, University of Minnesota, Minneapolis,
MN 55455-0392. E-mail:
Received 30 November 2010; Revised 23 March 2011; Accepted 3 June 2011
Published online 18 July 2011 in Wiley Online Library (
DOI: 10.1002/gepi.20609

INTRODUCTION

Genome-wide association studies (GWASs) have successfully identified thousands of common genetic variants, mainly common single nucleotide variants (SNVs), associated with complex traits, including many common diseases [Hindorff et al., 2010]. However, these identified variants can explain only a small proportion of inheritable phenotypic variance [Maher, 2008], leaving the door open for many more yet-to-be-discovered variants. A popular hypothesis is that many more rare variants (RVs) may contribute to the missing heritability unexplained by discovered common variants (CVs) [Bodmer and Bonilla, 2008; Gorlov et al., 2008; Pritchard, 2001; Pritchard and Cox, 2002]. At the same time, biotechnological advances have made it feasible to re-sequence parts of or whole genomes. In anticipation of the arrival of massive amounts of next-generation sequencing data, the chance of success in detecting association between complex traits and RVs largely depends on statistical analysis strategies for RVs; see two excellent timely reviews [Asimit and Zeggini, 2010; Bansal et al., 2010]. Since frequencies of RVs are very low, even with high penetrance, it will be difficult to detect association with any single RV. Hence, the most popular statistical test for GWAS, based on testing single SNVs, is not expected to perform well. In fact, in light of the significant difference in variant frequencies between RVs and CVs, common wisdom might suggest that many existing methods for CVs would not work either, motivating the development of new statistical tests specifically targeting RVs. The most striking feature of several recently proposed new tests for RVs is the idea of pooling or collapsing: rather than testing individual SNVs one by one (as in GWASs), one would pool or collapse multiple rare SNVs together such that collectively they would have a reasonably high frequency, and then apply a test to the collapsed genotype [Morgenthaler and Thilly, 2007; Li and Leal, 2008; Madsen and Browning, 2009; Price et al., 2010]. Albeit well motivated and shown to perform better than single SNV-based testing, such a pooling strategy has its own limitations. If the RVs to be pooled are associated with the trait in different directions, i.e. some are associated positively while others negatively, the strategy of pooling may weaken or diminish the signal in associated RVs. Furthermore, if many of the RVs are non-causal, i.e. they are not associated with the trait, pooling will inevitably introduce noise into the collapsed


genotype and thus reduce statistical power. Note that the effects of RVs are not always in the same direction: they can be protective or deleterious. For example, some RVs in gene PCSK9 are associated with lower plasma levels of low-density lipoprotein cholesterol (LDL-C) while others are associated with higher levels of LDL-C [Kotowski et al., 2006]. In recognition of these limitations, several methods based on model selection have been proposed recently [Han and Pan, 2010a; Hoffmann et al., 2010; Bhatia et al., 2010; Zhang et al., 2010]. The main idea is to determine whether an RV should be pooled and, if so, what its association direction is. Since these methods are based on either a marginal test or a step-up procedure on each individual RV, the power of selecting an RV and determining its association direction may be limited. Here we propose two new model-selection procedures that improve over the existing pooled association tests while maintaining low computational cost, borrowing the idea of Basu et al. [2010] in linkage analysis.

Very recently several new tests, including a kernel-based adaptive clustering (KBAC) [Liu and Leal, 2010], a C-alpha test [Neale et al., 2011] and a replication-based test (RBT) [Ionita-Laza et al., 2011], specifically designed for RVs and aiming to overcome various weaknesses of the pooled association tests, have appeared. However, no comparison was made among these new tests and model-selection approaches for RVs. More generally, in the current literature, there is no evaluation of the applicability of most existing tests to RVs. Although most existing tests have been proposed for and mainly applied to CVs, some were originally developed for high-dimensional data and thus are likely to be robust to the large number of parameters facing the analysis of RVs, and may have reasonable power for RVs. Goeman's score test [Goeman et al., 2006] and kernel machine regression (KMR) [Liu et al., 2008] are two such examples. Since Goeman's test is permutation-based and is equivalent to a test called the sum of squared score (SSU) test [Pan, 2009], we consider the SSU test here. As will be shown, perhaps surprisingly, both the SSU test and KMR, along with the C-alpha test specifically proposed for RVs [Neale et al., 2011], performed extremely well in certain situations in which the pooled association tests had low power. In summary, given the compelling interest of the scientific community in detecting association between complex traits and RVs, while little is known about the relative performance and merits of various existing and new tests, it is timely to have a comparative evaluation of the tests, an endeavor taken here.

METHODS

To be concrete, we restrict attention to the case-control design with a binary trait, say disease, though many of the methods discussed are based on logistic regression and can be easily extended to generalized linear models (GLMs) for other types of traits. We do not consider adjusting for covariates, such as environmental factors, though again methods based on logistic regression can easily accommodate covariates. We assume that the analysis goal is to detect whether there is any association between the disease and a group of rare SNVs, for example, SNVs in a sliding window or in a functional unit such as a gene. We denote the binary trait $Y_i = 0$ for $n_0$ controls, and $Y_i = 1$ for $n_1 = n - n_0$ cases. The $k$ variants are coded by an additive genetic model: $X_{ij} = 0$, 1 or 2 for the number of copies of the RV (minor allele) for SNV $j$, $j = 1, \dots, k$.

METHODS ORIGINALLY PROPOSED FOR CVS

Logistic regression. Several of the most popular statistical tests are based on logistic regression:

$$\mathrm{Logit}\,\Pr(Y_i = 1) = \beta_0 + \sum_{j=1}^{k} X_{ij}\beta_j. \qquad (1)$$

The null hypothesis to be tested is $H_0$: $\beta = (\beta_1, \dots, \beta_k)' = 0$. Maximum likelihood can be utilized to derive the asymptotically equivalent score test, Wald's test and likelihood ratio test (LRT); here we focus on the score test for its computational simplicity. For model (1), the score vector and its covariance matrix are

$$U = \sum_{i=1}^{n} (Y_i - \bar{Y}) X_i, \qquad V = \bar{Y}(1 - \bar{Y}) \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})',$$

where $X_i = (X_{i1}, \dots, X_{ik})'$, $\bar{Y} = \sum_{i=1}^{n} Y_i/n$ and $\bar{X} = \sum_{i=1}^{n} X_i/n$.

The most popular test for CVs in GWAS is the (univariate) minP (UminP) method, which tests each single SNV one by one and then takes the minimum of their P-values. The corresponding UminP score test statistic is

$$T_{UminP} = \max_{j=1,\dots,k} U_j^2 / V_{jj},$$

where $U_j$ is the $j$th element of $U$ and $V_{jj}$ is the $(j,j)$th element of $V$. An adjustment for multiple testing has to be made. Although the Bonferroni and permutation methods are most commonly used, a better way is to derive the null distribution of $T_{UminP}$, and thus a P-value, based on numerical integration with respect to a multivariate Gaussian density [Conneely and Boehnke, 2007].

A joint test as an alternative to the UminP test is the multivariate score test:

$$T_{Score} = U' V^{-1} U,$$

which has an asymptotic chi-squared distribution with degrees of freedom (DF) $k$. If the DF $k$ is large, the test may not have high power.

Pan [2009] proposed two tests, called the SSU and sum of weighted squared score (SSUw) tests:

$$T_{SSU} = U'U, \qquad T_{SSUw} = U' (\mathrm{Diag}(V))^{-1} U,$$

where $\mathrm{Diag}(V)$ is a diagonal matrix with the diagonal elements of $V$. Under $H_0$, each of the two test statistics has an asymptotic distribution of a mixture of $\chi^2_1$'s, which can be approximated by a scaled and shifted chi-squared distribution [Pan, 2009]. The two tests can be regarded as modified score tests that ignore the non-diagonal elements of $V$, i.e. correlations among the components of $U$, which is known to be advantageous for high-dimensional data [Chen and Qin, 2010]. More importantly, as shown by Pan [2009], the SSU test is equivalent to the permutation-based version of Goeman's test [Goeman et al., 2006], which is derived as a variance component score test for a random-effects (R-E) logistic regression model. Specifically, in model (1), we treat the $\beta_j$'s as random effects.
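
As a minimal sketch of the statistics defined so far, the following plain-Python code computes the score vector $U$, its covariance matrix $V$, and the UminP, SSU and SSUw statistics for a toy case-control data set. The function names and the toy genotypes are hypothetical and are not taken from the paper; the multivariate score statistic is left out because it needs a matrix inverse, monomorphic variants ($V_{jj} = 0$) are skipped as an assumption of this sketch, and no P-values are computed.

```python
def score_vector_and_cov(X, Y):
    """Return U = sum_i (Y_i - Ybar) X_i and
    V = Ybar (1 - Ybar) sum_i (X_i - Xbar)(X_i - Xbar)'."""
    n, k = len(Y), len(X[0])
    ybar = sum(Y) / n
    xbar = [sum(row[j] for row in X) / n for j in range(k)]
    U = [sum((Y[i] - ybar) * X[i][j] for i in range(n)) for j in range(k)]
    V = [[ybar * (1 - ybar)
          * sum((X[i][a] - xbar[a]) * (X[i][b] - xbar[b]) for i in range(n))
          for b in range(k)]
         for a in range(k)]
    return U, V


def score_based_statistics(X, Y):
    """UminP, SSU and SSUw statistics built from U and the diagonal of V."""
    U, V = score_vector_and_cov(X, Y)
    # Assumption of this sketch: variants with zero variance are skipped.
    informative = [j for j in range(len(U)) if V[j][j] > 0]
    per_variant = [U[j] ** 2 / V[j][j] for j in informative]
    return {
        "UminP": max(per_variant),       # max_j U_j^2 / V_jj
        "SSU": sum(u * u for u in U),    # U'U
        "SSUw": sum(per_variant),        # U' Diag(V)^{-1} U
    }


# Hypothetical toy data: 4 cases, 4 controls, 3 rare variants (0/1/2 coding).
Y = [1, 1, 1, 1, 0, 0, 0, 0]
X = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0],
     [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]]
stats = score_based_statistics(X, Y)
print(stats)
```

Because the SSU statistic sums the unscaled $U_j^2$ while SSUw divides each by $V_{jj}$, rescaling a column of `X` changes `stats["SSU"]` but leaves `stats["SSUw"]` and `stats["UminP"]` unchanged.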

If the $\beta_j$'s are drawn from a distribution with $E(\beta) = 0$ and $\mathrm{Cov}(\beta) = \tau\Sigma$, then Goeman's score test on $H_0$: $\tau = 0$ is

$$S = \tfrac{1}{2} U' \Sigma U - \tfrac{1}{2} \mathrm{tr}(\Sigma V), \qquad (2)$$

where $\mathrm{tr}(A)$ is the trace of matrix $A$. Observing that $V$ is invariant to permutations of $Y$, we know that, under permutations, using $S$ is equivalent to using $S_P = U' \Sigma U$, which is equivalent to the SSU, SSUw and score test statistics with $\Sigma = I$, $\Sigma = (\mathrm{Diag}(V))^{-1}$ and $\Sigma = V^{-1}$, respectively. Note that Goeman's test was originally derived to test a large number of parameters for high-dimensional microarray data, though its good performance for lower-dimensional SNV data has been empirically confirmed too [Chapman and Whittaker, 2008; Pan, 2009].

Another test that performed well in certain situations for CVs is the so-called Sum test, as noted by Chapman and Whittaker [2008] and Pan [2009]. The Sum test was motivated to strike a balance between jointly testing multiple SNVs and the resulting DF. The Sum test is based on a key and generally incorrect working assumption that the SNVs are all associated with the trait with a common association strength:

$$\mathrm{Logit}\,\Pr(Y_i = 1) = \beta_{c,0} + \sum_{j=1}^{k} X_{ij}\beta_c, \qquad (3)$$

where $\beta_c$ reflects the common odds ratio (OR) between the trait and each SNV under the working assumption. While utilizing all the SNVs, the Sum test avoids the possibly too large DF, and thus loss of power, of other multivariate tests. It requires testing only a single parameter, with $H_0$: $\beta_c = 0$, by a score test (or its asymptotically equivalent Wald's test or LRT). Pan [2009] pointed out that the weighted score test of Wang and Elston [2007] shares the same spirit as, and thus performs similarly to, the Sum test. Note that in model (3) we regress $Y$ on a new "super-SNV" that is the sum of the genotype values of all the SNVs, hence we call the resulting test the Sum test.

To incorporate prior biological information, one may want to weight SNVs using some suitable weights, e.g. based on their minor allele frequencies (MAFs) [Madsen and Browning, 2009] or their predicted likelihoods of being functional [Price et al., 2010]. It is straightforward to do so in logistic regression: with a set of weights $w = (w_1, \dots, w_k)'$, we can simply weight the codings for SNVs; that is, rather than using $X_i = (X_{i1}, \dots, X_{ik})'$ for subject $i$, we use $X_{i,w} = (w_1 X_{i1}, \dots, w_k X_{ik})'$ in logistic regression model (1). It is easy to see that the UminP, score and SSUw tests are invariant to such weighting, while the SSU and Sum tests do depend on such weighting. In fact, a careful examination of the SSU and Sum test statistics indicates that these two tests treat the $X_{ij}$'s more or less equally across $j$. By the expression of $V$, we see that those variants with larger MAFs tend to have larger variances for their components of the score vector. Hence, without weighting, the SSU and Sum tests essentially give heavier weights to the variants with larger MAFs, implying that they will be sensitive to the presence of non-causal CVs, as will be confirmed.

To overcome the above weakness of the SSU test, a simple strategy is to weight each variant $j$ inversely by its sample standard deviation $SD(X_{1j}, \dots, X_{nj})$, which is equivalent to standardizing each predictor $j$ to have a sample SD = 1. The resulting SSU test is essentially the same as the SSUw test. This point highlights a key difference between the SSU and SSUw tests, illustrating when one of the two is more powerful than the other. For example, if the causal variants tend to have lower MAFs than those of non-causal ones, the SSU test is expected to be less powerful; otherwise, the SSU test is more powerful.

A potential problem with the above weighting scheme (and with the SSUw test) is that, since a causal variant may have a higher MAF in cases but a lower MAF in controls, and thus a higher overall MAF across both cases and controls, the scheme will downweight this causal variant, leading to reduced power. This is a reason that Madsen and Browning [2009] proposed using the MAFs of only controls to construct weights. Specifically, if there are $n_{0j}$ minor alleles for variant $j$ in all the controls, then we can use weights

$$w_j = 1 \Big/ \sqrt{n q_j (1 - q_j)}, \qquad q_j = (n_{0j} + 1)/(2 n_0 + 2). \qquad (4)$$

With such weights, which already use the disease labels, the asymptotic SSU test (and Sum test) would have inflated Type I error rates. Instead, we use standard permutations to calculate P-values, and denote the resulting test wSSU-P.

Logistic KMR and genomic similarity-based methods. Rather than testing the effects of the SNVs parametrically (i.e. linearly in our specified model (1)), one can adopt a nonparametric model:

$$\mathrm{Logit}\,\Pr(Y_i = 1) = \beta_0 + h(X_{i1}, \dots, X_{ik}), \qquad (5)$$

where $h(\cdot)$ is an unknown nonparametric function to be estimated, offering flexibility in modeling the effects of the SNVs on the trait. In a specific approach called KMR [Liu et al., 2008], the form of $h(\cdot)$ is determined by a user-specified positive semi-definite (psd) kernel function $K(X_i, X_j)$, which measures the genomic similarity between the genotypes of subjects $i$ and $j$. Some commonly used kernels include linear, identity-by-state (IBS) and quadratic kernels. By the representer theorem [Kimeldorf and Wahba, 1971], $h_i = h(X_i) = \sum_{j=1}^{n} \gamma_j K(X_i, X_j)$ with some $\gamma_1, \dots, \gamma_n$. To test the null hypothesis of no association between the phenotype and SNVs, one can test $H_0$: $h = (h_1, \dots, h_n)' = 0$. Denote $K$ as the $n \times n$ matrix with $(i,j)$th element $K(X_i, X_j)$, and $\gamma = (\gamma_1, \dots, \gamma_n)'$; then we have $h = K\gamma$. Treating $h$ as subject-specific random effects with mean 0 and covariance matrix $\tau K$, testing $H_0$: $h = 0$ for no SNV effects is equivalent to testing $H_0$: $\tau = 0$. The corresponding variance component score test statistic is

$$Q = (Y - \bar{Y}1)' K (Y - \bar{Y}1),$$

whose asymptotic null distribution is a mixture of $\chi^2_1$'s, which can be approximated by a scaled chi-squared distribution [Wu et al., 2010].

The above logistic KMR can be extended to include other covariates and to other traits, e.g. linear models for quantitative traits [Kwee et al., 2008]. Since the kernel function measures the similarity of two genotypes, KMR is expected to be related to genomic-distance based regression (GDBR) of Wessel and Schork [2006]; see Schaid [2010a,b] for a review on the topic. More specifically, as shown by Pan [2011], both KMR and GDBR are equivalent to the SSU test on $H_0$: $\beta = 0$ in a new logistic regression model:

$$\mathrm{Logit}\,\Pr(Y = 1) = \beta_0 + Z\beta. \qquad (6)$$
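
The variance component statistic $Q$ defined above can be sketched in a few lines of plain Python. The function names and toy data below are hypothetical, and the IBS kernel is written in one common form (the average number of alleles shared identical by state, divided by 2) as an assumption rather than as the exact kernel used in the cited work. With the linear kernel, $K = XX'$ and $U = X'(Y - \bar{Y}1)$, so $Q$ reduces algebraically to $U'U$; the sketch checks this numerically.

```python
def linear_kernel(x, z):
    """Linear kernel: inner product of two genotype vectors."""
    return sum(a * b for a, b in zip(x, z))


def ibs_kernel(x, z):
    """IBS kernel for 0/1/2 codings (assumed form): proportion of alleles
    shared identical by state, averaged over the variants."""
    return sum(2 - abs(a - b) for a, b in zip(x, z)) / (2 * len(x))


def kmr_score_statistic(X, Y, kernel):
    """Q = (Y - Ybar 1)' K (Y - Ybar 1) for a user-supplied kernel function."""
    n = len(Y)
    ybar = sum(Y) / n
    resid = [y - ybar for y in Y]
    return sum(resid[i] * kernel(X[i], X[j]) * resid[j]
               for i in range(n) for j in range(n))


# Hypothetical toy data: 4 cases, 4 controls, 3 rare variants.
Y = [1, 1, 1, 1, 0, 0, 0, 0]
X = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0],
     [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]]

q_linear = kmr_score_statistic(X, Y, linear_kernel)
q_ibs = kmr_score_statistic(X, Y, ibs_kernel)

# SSU statistic U'U computed directly from the score vector, for comparison.
ybar = sum(Y) / len(Y)
U = [sum((Y[i] - ybar) * X[i][j] for i in range(len(Y))) for j in range(len(X[0]))]
ssu = sum(u * u for u in U)
print(q_linear, ssu, q_ibs)
```

Only the kernel argument changes between the two calls, which mirrors the point that KMR variants differ in how genomic similarity is measured rather than in the form of the test statistic.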


In model (6), $Z$ satisfies $K = ZZ'$. Hence, the difference between the SSU test for model (1) and logistic KMR is only in the transformation of SNV codings in model (6), while both tests are actually an SSU test applied to two different regression models. A special case is that, for a linear kernel $K$, we have $K = XX'$; that is, $Z = X$, under which the SSU test and KMR are equivalent.

Empirically it has been found that GDBR and KMR performed very well in detecting disease association with CVs [Lin and Schaid, 2009; Wu et al., 2010; Han and Pan, 2010b]. Although these methods were proposed for and mainly applied to CVs, first Wessel and Schork [2006], and more recently Bansal et al. [2010], commented that GDBR (and thus KMR) could be applied to sequence data to detect association with RVs.

To our knowledge, the above statistical tests originally proposed for CVs have never been applied to RVs. Intuition might argue against their application to RVs. However, as will be shown, perhaps quite surprisingly, some of them performed quite well in our numerical studies. We will offer some explanations in the Discussion.

METHODS FOR RVs

Pooled association tests. The first test specifically designed for RVs is perhaps the cohort allelic sums test (CAST) [Morgenthaler and Thilly, 2007]. CAST works by first collapsing the genotypes across RVs to generate a "super-variant": $X_{i,C} = 1$ if any $X_{ij} > 0$ (i.e. any RV is present), and $X_{i,C} = 0$ otherwise. It then tests the association between the trait and this new $X_{i,C}$. It can be regarded as fitting a logistic regression model

$$\mathrm{Logit}\,\Pr(Y_i = 1) = \beta_{C,0} + X_{i,C}\beta_C, \qquad (7)$$

and testing $H_0$: $\beta_C = 0$. The most striking feature of CAST, like the Sum test, is that it tests a single parameter, giving a low DF and possibly increased power.

As pointed out by Han and Pan [2010a], the CAST is closely related to the Sum test: both test only a single parameter representing some average effect of the multiple SNVs. They differ in their coding of the "super-variant": $X_{i,C} = \bigvee_{j=1}^{k} X_{ij}$ versus $X_{i,S} = \sum_{j=1}^{k} X_{ij}$, similar to the use of a dominant genetic model versus an additive genetic model for the effect of an individual variant. Note that for RVs, we have $X_{i,C} \approx X_{i,S}$. Other codings for the "super-variant" are also possible, as considered by Morris and Zeggini [2010].

Li and Leal [2008] proposed a new test called the Combined Multivariate and Collapsing (CMC) test, which modifies the CAST to improve its performance when both RVs and CVs are present. Specifically, rare mutations with MAFs less than some threshold, say 0.05, are combined into a new group as in the CAST, while each CV (e.g. with MAF > 0.05) forms its own group, and the generalized Hotelling's test [Fan and Knapp, 2003; Xiong et al., 2002] is applied to the multiple groups so formed. Note that the generalized Hotelling's test is closely related to the score test in logistic regression [Clayton et al., 2004]. Hence, for only RVs, the CMC test is essentially the same as the CAST (and the Sum test).

The weighted sum (wSum) test of Madsen and Browning [2009] is also based on the idea of collapsing RVs. It differs from the Sum test in (i) using a weighted sum, instead of a simple sum, of RVs by their MAFs, and (ii) comparing the ranks of the weighted sum, rather than the sums themselves, between the case and control groups. Hence, putting aside the difference in weighting, the wSum test is analogous to the Mann–Whitney–Wilcoxon rank test, while other pooled association tests are analogous to the t-test.

The main advantage of the above pooled association tests is their minimum DF of 1, hence no loss of power due to large DF or multiple testing adjustment. However, as pointed out by Han and Pan [2010a], they all share a common weakness: they suffer from possibly significant power loss if the association directions of the causal variants are opposite. This can be most clearly seen from the Sum test. Generally, the common association parameter $\beta_c$ in (3) can be viewed as a weighted average of the individual $\beta_1, \dots, \beta_k$; see a closed-form expression for $\hat{\beta}_c$ for linear regression given in Pan [2009]. Hence, depending on the signs of $\beta_1, \dots, \beta_k$, $|\beta_c|$ may be very small, leading to loss of power in the Sum test. To overcome this limitation, several methods based on model selection have been proposed, as presented next.

Methods based on model selection. A general model has been proposed by Hoffmann et al. [2010]:

$$\mathrm{Logit}\,\Pr(Y_i = 1) = \beta_{c0} + \sum_{j=1}^{k} \gamma_j X_{ij}\beta_c, \qquad (8)$$

with $\gamma_j = w_j s_j$, where $w_j$ is a weight assigned to SNV $j$, $s_j = 1$ or $-1$ indicating whether the effect of SNV $j$ is positive or negative, and $s_j = 0$ indicating the exclusion of SNV $j$ from the model (i.e. the SNV is unlikely to be associated with the trait). Madsen and Browning [2009] suggested weighting RVs with weights depending on their MAFs. However, how to weight the SNVs appropriately is still debatable and, if needed, it is not difficult to incorporate a weighting scheme into most methods discussed here. Hence, we do not discuss the use of weights and always assume $w_j = 1$ for any test except the wSum test.

The pooled association tests correspond to fixing $s_j = 1$ for all $j$. Several existing model-selection-based methods can be classified into one of two classes:
(1) Choosing $s_j = 1$ or $-1$ in a data-dependent manner. Han and Pan [2010a] proposed an adaptive Sum (aSum) test, in which the value of each $s_j$ is determined based on a univariate test on the marginal association between the trait and SNV $j$ for $j = 1, \dots, k$.
(2) Choosing $s_j = 1$, 0 or $-1$ in a data-dependent manner. A Step-up procedure [Hoffmann et al., 2010] and a covering method (called RareCover) [Bhatia et al., 2010] have been proposed to determine the values of the $s_j$'s, both in a manner of forward variable selection: starting from a null model without any SNV, SNVs are selected one by one based on their statistical significance and then added into the model.

Here we propose two new methods, both of which start from the Sum test with all $s_j = 1$. The main motivation is that, since the individual effect of each RV is hard to detect while the Sum test (or any other pooled association test) has proven useful for RVs, rather than starting from a null model (as in the Step-up and RareCover procedures) or testing marginal association (as in the aSum test), we would like to start from the Sum test and make any necessary adjustment to the values of the $s_j$'s, which may result in higher power.
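
To make the "super-variant" codings concrete, the sketch below builds the CAST coding $X_{i,C}$ and the signed sum $\sum_j s_j X_{ij}$ of model (8) with $w_j = 1$, and evaluates a single-parameter score statistic $U^2/V$ for each. All function names and the toy data are hypothetical. The toy example places one variant only in a control, so that pooling with all $s_j = 1$ partly cancels the signal while flipping that variant's sign recovers it.

```python
def cast_coding(X):
    """CAST super-variant: 1 if the subject carries any rare variant, else 0."""
    return [1 if any(x > 0 for x in row) else 0 for row in X]


def signed_sum_coding(X, s):
    """Super-variant of model (8) with w_j = 1: sum_j s_j X_ij.
    s = (1, ..., 1) gives the Sum test coding; s_j = 0 drops variant j."""
    return [sum(sj * xj for sj, xj in zip(s, row)) for row in X]


def single_parameter_score_stat(z, Y):
    """Score statistic U^2 / V for regressing Y on one pooled predictor z."""
    n = len(Y)
    ybar = sum(Y) / n
    zbar = sum(z) / n
    U = sum((y - ybar) * zi for y, zi in zip(Y, z))
    V = ybar * (1 - ybar) * sum((zi - zbar) ** 2 for zi in z)
    return U * U / V if V > 0 else 0.0


# Hypothetical toy data: variants 1 and 2 appear in cases, variant 3 in a control.
Y = [1, 1, 1, 1, 0, 0, 0, 0]
X = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0],
     [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]]

t_cast = single_parameter_score_stat(cast_coding(X), Y)
t_sum = single_parameter_score_stat(signed_sum_coding(X, [1, 1, 1]), Y)
t_signed = single_parameter_score_stat(signed_sum_coding(X, [1, 1, -1]), Y)
print(t_cast, t_sum, t_signed)
```

Each statistic has one degree of freedom, which is the appeal of pooling; the comparison of `t_sum` with `t_signed` illustrates why the choice of the $s_j$'s matters when association directions differ.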

In the first method, called the Sequential Sum test (Seq-Sum), for each SNV $j$, with $j$ starting at 1 and increasing to $k$, we determine which of two models, the current model with $s_j = 1$ and the other model with $s_j = -1$ (with all other $s_j$'s fixed at their current values in both models), is preferred, based on which model yields the larger (maximized) likelihood; then we increase $j$ by one and repeat the above process until we have tried $j = 1, \dots, k$. In the second method, called the Seq-Sum test with variable selection (Seq-Sum-VS), starting from SNV $j = 1$, we consider three models with $s_j = 1$, 0 and $-1$, respectively, and choose the model with the largest (maximized) likelihood; then we increase $j$ by 1 and repeat the above process until we have tried $j = 1, \dots, k$ sequentially. Hence Seq-Sum considers only the coding of each SNV (i.e. its protective or harmful effect), while Seq-Sum-VS considers selecting both SNVs and their association directions. Note that the two methods consider only a total of $k+1$ and $2k+1$ candidate models, respectively. Due to the nature of their sequential search and dependence on the order of the SNVs, unlike the Step-up and RareCover procedures, it is unlikely that they will select the best model (in terms of the largest maximized likelihood). Nevertheless, there are two possible benefits. One is the obviously reduced computational cost when compared to an exhaustive search over exponentially many (i.e. $2^k$ and $3^k$) models. The second benefit is less obvious: there is also a lower cost for multiple testing adjustment due to a reduced number of model comparisons. Computationally, rather than using the maximized likelihood as the criterion to select models, which requires fitting each model by an iterative algorithm to obtain the maximum likelihood estimates, we adopt a score test, which is computationally much faster. The proposed Seq-Sum test is closely related to a new aSum test of Pan and Shen [2011], which is more flexible while overcoming a weakness of the Seq-Sum method, namely, its dependence on an arbitrary ordering of the SNVs.

In general, it is difficult to analytically derive the null distribution of a test statistic after model selection. For each procedure above, we use permutations to calculate P-values.

Kernel-based adaptive clustering. Liu and Leal [2010] proposed a method called KBAC for RV association testing. KBAC works by grouping/clustering mutation patterns across the variants, and assigning each mutation pattern a kernel-based weight adaptively determined by data. Specifically, suppose that among the cases and controls, we have $M+1$ mutation patterns across all $k$ variants, denoted as $G_0, G_1, \dots, G_M$, where $G_0$ represents the wild type without any mutation. We also assume that there are $n_{1,i}$ cases and $n_{0,i}$ controls with mutation pattern $G_i$; denote $n_{\cdot i} = n_{1,i} + n_{0,i}$. For mutation pattern $G_i$, the risk of having disease is estimated as $R_i = n_{1,i}/n_{\cdot i}$. The KBAC test statistic is

$$T_{KBAC} = \left( \sum_{i=1}^{M} (n_{1,i}/n_1 - n_{0,i}/n_0)\, w_i \right)^2,$$

where the weight $w_i$ is determined by a hyper-geometric kernel:

$$w_i = \int_0^{R_i} k_i^0(r)\, dr = \sum_{r \in \{0/n_{\cdot i},\, 1/n_{\cdot i},\, \dots,\, R_i\}} \frac{C(n_{\cdot i},\, n_{\cdot i} r)\, C(n - n_{\cdot i},\, n_1 - n_{\cdot i} r)}{C(n, n_1)},$$

with $C(a,b)$ as the combination number of choosing $b$ out of $a$. The P-value is calculated by standard permutations (for small samples, while a Monte Carlo approximation is used for large samples).

From the expression of $T_{KBAC}$, we see that its performance may deteriorate in the presence of both protective and harmful causal variants: some positive and negative components $(n_{1,i}/n_1 - n_{0,i}/n_0) w_i$ may cancel each other out in the sum, though the use of the weight $w_i$ may alleviate the problem. A simple modification as shown below may help overcome the problem:

$$T_{mKBAC} = \sum_{i=1}^{M} (n_{1,i}/n_1 - n_{0,i}/n_0)^2\, w_i,$$

though we do not pursue it here. In addition, KBAC includes non-causal variants in forming mutation patterns, which may dramatically increase the number of mutation patterns ($M$) and thus effectively reduce the group sizes $n_{\cdot i}$, leading to loss of power. Nevertheless, the KBAC test is attractive for detecting possible interactions among the variants, though we do not pursue this issue here.

C-alpha test. Neale et al. [2011] proposed using the C-alpha test of Neyman and Scott [1966]. It is based on testing for a common value (i.e. homogeneity) of a set of binomial proportions, not on logistic regression.

For SNV $j$, assume there are $n_j$ subjects with the rare mutation (or minor allele); among those $n_j$ subjects, we have $m_j$ cases with the mutation (and $n_j - m_j$ controls with the mutation). We assume $m_j \sim \mathrm{Bin}(n_j, p_j)$. Under the null hypothesis of no association between the disease and any SNV, we have $p_j = p_0$ for some common $p_0$ for all $j = 1, \dots, k$. For a case-control study as considered here, we have $p_0 = 1 - n_0/n$. The C-alpha test is based on the following:

$$T_C = \sum_{j=1}^{k} T_{C,j} = \sum_{j=1}^{k} \left[ (m_j - n_j p_0)^2 - n_j p_0 (1 - p_0) \right],$$

$$V_C = \sum_{j=1}^{k} \mathrm{Var}(T_{C,j}) = \sum_{j=1}^{k} E\left[ (m_j - n_j p_0)^2 - n_j p_0 (1 - p_0) \right]^2,$$

where

$$\mathrm{Var}(T_{C,j}) = \sum_{u=0}^{n_j} \left[ (u - n_j p_0)^2 - n_j p_0 (1 - p_0) \right]^2 f(u \mid n_j, p_0)$$

and $f(u \mid n_j, p_0) = C(n_j, u)\, p_0^u (1 - p_0)^{n_j - u}$ is the binomial probability $\Pr(U = u)$ for $U \sim \mathrm{Bin}(n_j, p_0)$. If all $m_j$'s are independent, then under the null hypothesis of no association between any SNV and the disease, the test statistic $Z = T_C / \sqrt{V_C}$ has an asymptotic distribution of $N(0,1)$, from which a P-value can be calculated. Alternatively, one can permute the disease labels $Y$, calculate the $Z$'s for the permuted data and thus a P-value. We denote the two versions of the test, using the asymptotic distribution and the permutation distribution, as C-alpha-A and C-alpha-P, respectively.

The C-alpha test treats the SNV-specific mutation rates $p_j$ as a random sample drawn from some common distribution, say $G$. Under $H_0$, the distribution reduces to a point mass at $p_0$. Hence, the C-alpha test can be regarded as testing the variance component of $G$: the variance of the $p_j$'s is 0 under $H_0$. The C-alpha test is a score test for such a homogeneity problem [Zelterman and Chen, 1988], bearing some similarity to variance component testing.
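
The C-alpha quantities $T_C$, $V_C$ and $Z$ given above translate directly into code. The following is a minimal standard-library sketch with a hypothetical function name and made-up carrier counts; the upper-tail normal P-value is an assumption of this sketch, since the text above does not state which tail is used.

```python
from math import comb, erfc, sqrt


def c_alpha(m, n_carriers, p0):
    """Return (T_C, V_C, Z, upper-tail p) for case-carrier counts m[j]
    out of n_carriers[j] carriers, with null case proportion p0."""
    T = sum((mj - nj * p0) ** 2 - nj * p0 * (1 - p0)
            for mj, nj in zip(m, n_carriers))
    V = 0.0
    for nj in n_carriers:
        # Var(T_Cj): expectation over U ~ Bin(nj, p0) of the squared component.
        V += sum(((u - nj * p0) ** 2 - nj * p0 * (1 - p0)) ** 2
                 * comb(nj, u) * p0 ** u * (1 - p0) ** (nj - u)
                 for u in range(nj + 1))
    Z = T / sqrt(V)
    # Assumption: one-sided upper-tail P-value from N(0, 1).
    p_upper = 0.5 * erfc(Z / sqrt(2))
    return T, V, Z, p_upper


# Hypothetical counts for 3 variants: carriers and how many of them are cases.
n_carriers = [4, 4, 6]
m = [4, 0, 3]          # all cases, all controls, evenly split
p0 = 0.5               # equal numbers of cases and controls
T, V, Z, p_upper = c_alpha(m, n_carriers, p0)
print(T, V, Z, p_upper)
```

In this toy example the first variant is carried only by cases and the second only by controls; both contribute positively to `T`, which is how the C-alpha statistic remains sensitive when protective and deleterious variants are mixed.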
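
The sequential search of the Seq-Sum test described earlier in this section can also be sketched briefly. The code below is an illustration under stated assumptions, not the authors' implementation: candidate models are compared by a single-parameter score statistic (following the remark that a score test replaces the maximized likelihood), ties keep the current sign, and the permutation P-value uses the common (count + 1)/(permutations + 1) convention. All names and the toy data are hypothetical.

```python
import random


def score_stat(z, Y):
    """Score statistic U^2 / V for one pooled predictor z."""
    n = len(Y)
    ybar = sum(Y) / n
    zbar = sum(z) / n
    U = sum((y - ybar) * zi for y, zi in zip(Y, z))
    V = ybar * (1 - ybar) * sum((zi - zbar) ** 2 for zi in z)
    return U * U / V if V > 0 else 0.0


def seq_sum(X, Y):
    """Start from all s_j = 1; for j = 1..k try s_j = -1 and keep the flip
    only if it increases the score statistic. Visits k + 1 models."""
    k = len(X[0])
    s = [1] * k

    def stat(signs):
        z = [sum(sj * xj for sj, xj in zip(signs, row)) for row in X]
        return score_stat(z, Y)

    best = stat(s)
    for j in range(k):
        s[j] = -1
        flipped = stat(s)
        if flipped > best:
            best = flipped
        else:
            s[j] = 1      # revert: the current coding is preferred
    return best, s


def permutation_p_value(X, Y, n_perm=200, seed=0):
    """Permute Y and redo the whole sign search on each permuted data set."""
    rng = random.Random(seed)
    observed, _ = seq_sum(X, Y)
    y_perm = list(Y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        permuted, _ = seq_sum(X, y_perm)
        if permuted >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: variant 3 is carried by a control only.
Y = [1, 1, 1, 1, 0, 0, 0, 0]
X = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0],
     [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]]
best_stat, signs = seq_sum(X, Y)
p_value = permutation_p_value(X, Y)
print(best_stat, signs, p_value)
```

Repeating the sign search inside every permutation is what accounts for the model selection step in the null distribution; a Seq-Sum-VS variant would additionally try `s[j] = 0` at each step.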

component testing for a R-E model, under which the SSU and KMR can be formulated. In fact, as shown in the Appendix, the general homogeneity score test of Zelterman and Chen [1988] has the same form as Goeman's test.

Each component of the C-alpha test statistic, $T_{C,j}$, contrasts the sample variance for variant j with its theoretical variance under H0. Since the fourth central moment is

\[
E(m_j - n_j p_0)^4 = 3 (n_j p_0 q_0)^2 + n_j p_0 q_0 (1 - 6 p_0 q_0)
\]

with $q_0 = 1 - p_0$, under H0, we have

\[
\mathrm{Var}(T_{C,j}) = 2 (n_j p_0 q_0)^2 + n_j p_0 q_0 (1 - 6 p_0 q_0),
\]

which is an increasing function of $n_j$. Thus, similar to the SSU test (and KMR), since the C-alpha test statistic is a simple sum of the statistics for the variants, $T_C = \sum_{j=1}^{k} T_{C,j}$, it may be dominated by the variants with large $\mathrm{Var}(T_{C,j})$, e.g. those with high MAFs; it is possible, and even productive, to weight the components suitably with a set of weights $w_j$ to yield a weighted version of the C-alpha test:

\[
T_{C,w} = \sum_{j=1}^{k} w_j T_{C,j}.
\]

As will be shown, similar to the SSU test, the C-alpha test does not perform well in the presence of non-causal CVs, in which case its weighted versions are more powerful. We can use weights $w_j$ as shown in (4), or $w_j = 1/\sqrt{\mathrm{Var}(T_{C,j})}$, and calculate their P-values using permutations; we denote the resulting tests as w1C-alpha-P and w2C-alpha-P respectively. Since $\mathrm{Var}(T_{C,j}) = 0$ and 0.25 for $n_j = 1$ and $n_j = 2$ respectively, we define $w_j = 1/0.5$ for $n_j = 1$, the same as $w_j$ for $n_j = 2$, in the w2C-alpha-P test. As will be shown, the two weighted C-alpha tests did not perform as well as the wSSU-P test, and will be skipped in most of the simulations.

Replication-based test. Ionita-Laza et al. [2011] proposed a new test called RBT. The RBT is similar to a pooled association test but purposefully designed to deal with possibly different association directions. In addition, a new weighting scheme is adopted to improve power. Using the same notation as before, suppose that for variant j there are $n_j$ mutations in cases and $m_j - n_j$ mutations in controls. Define a statistic to measure the enrichment of mutations in cases:

\[
S^{+} = \sum_{j} I(n_j > m_j/2)\, w(n_j, m_j)
\]

with weight

\[
w(n_j, m_j) = -\log \Pr(n_j, m_j) = -\log\{\mathrm{ppois}(m_j - n_j,\, m_j/2)\,[1 - \mathrm{ppois}(n_j - 1,\, m_j/2)]\},
\]

where ppois(a,b) is the cumulative distribution function of a Poisson distribution Pois(b) evaluated at a. Similarly we measure the enrichment of mutations in controls with

\[
S^{-} = \sum_{j} I(n_j < m_j/2)\, w(m_j - n_j, m_j).
\]

The final test statistic is $T_R = \max(S^{+}, S^{-})$. The P-value is calculated by permutations.

Although the RBT was designed to differentiate between protective and harmful variants, it treats and tests the two groups separately, hence may lose power. Furthermore, for a non-causal variant j, it is likely that $n_j \neq m_j/2$, in which case non-causal variant j will be pooled into the test statistic, though its weight $w_j$ may be relatively small; nonetheless, the RBT may lose power in the presence of a large number of non-causal RVs.

A summary. We compare the above tests in several aspects as shown in Table I. We do not include CAST and CoverRare since they are similar to CMC and Step-up, respectively. We note that the wSum test uses permutations to estimate the mean and variance of its asymptotic Normal distribution, and does not need a large number of permutations to reach high statistical significance, which is required by other permutation-based tests. We also note that the CMC was proposed to use the generalized Hotelling's test, which does not accommodate covariates and other types of traits as shown in Table I. However, since Hotelling's test is equivalent to the score test in logistic regression [Clayton et al., 2004], it is easy to generalize the CMC test to accommodate covariates and other types of traits if the score test in a GLM is adopted. Finally, we will loosely call the Sum, CMC and wSum tests the pooled association tests (that do not consider selecting SNVs and their association directions).

SIMULATED DATA
We generated simulated data as in Wang and Elston [2007] and Pan [2009]. Specifically, we simulated k SNVs with the sample size of 500 cases and 500 controls. Each RV had a mutation rate or MAF uniformly distributed between 0.001 and 0.01, while for a CV it was between 0.01 and 0.1. First, we generated a latent vector $Z = (Z_1, \ldots, Z_k)'$ from a multivariate normal distribution with a first-order auto-regressive (AR1) covariance structure: there was a correlation $\mathrm{Corr}(Z_i, Z_j) = \rho^{|i-j|}$ between any two latent components. We used ρ = 0 and ρ = 0.9 to generate (neighboring) SNVs in linkage equilibrium and in linkage disequilibrium (LD) respectively. Second, the latent vector was dichotomized to yield a haplotype with MAFs each randomly selected. Third, we combined two independent haplotypes and obtained genotype data $X_i = (X_{i1}, \ldots, X_{ik})'$. Fourth, the disease status $Y_i$ of subject i was generated from the logistic regression model (1). For the null case, we used β = 0; for non-null cases, we randomly selected 8 non-zero components of β while the remaining ones were all 0. Fifth, as in any case-control design we sampled 500 cases and 500 controls in each dataset.

We considered several simulation set-ups. Throughout the simulations, we fixed the test significance level at α = 0.05 (or α = 0.01 in a few cases), and used 500 permutations for each permutation-based method. The results were based on 1,000 independent replicates for each set-up.

We used the R code of Wu et al. [2010] implementing the KMR methods. We used the linear, IBS and quadratic kernels; since the first two performed similarly across all simulations, we present results for the linear and quadratic kernels. We used the R package thgenetics implementing the Step-up procedure, and a C++/R implementation of KBAC. We implemented all other tests in R. For the CMC test, we used the default cut-off of MAF ≤ 0.05 for RVs, though we explored using the cut-off ≤ 0.01 in a few cases.
TABLE I. A summary of the properties of the tests to be compared: originally proposed to target CVs or RVs (or both), whether pooling over variants, whether sensitive to association directions (+/−), to a large number of non-causal RVs (nRVs) and to a few non-causal CVs (nCVs), requiring permutations for P-value calculations, capability to adjust for other covariates (Cov), applicability to other non-binary traits, whether can be formulated as testing on a variance component in a random-effects (R-E) model, and references for more details

Test          Original target  Pool  Sens to +/−  Sens to nRVs  Sens to nCVs  Permut  Cov   Other traits  R-E  Refs
UminP         CV               No    No           No            No            No      Yes   Yes           No   3
Score         CV               No    No           No            No            No      Yes   Yes           Yes  1
SSU           CV               No    No           No            Yes           No      Yes   Yes           Yes  2
wSSU-P        Both             No    No           No            No            Yes     Yes   Yes           Yes  Here
SSUw          CV               No    No           No            No            No      Yes   Yes           Yes  2
Sum           CV               No    Yes          Yes           Yes           No      Yes   Yes           No   2
KMR           CV               No    No           No            Yes           No      Yes   Yes           Yes  4, 5
CMC           RV               Yes   Yes          Yes           No            No      No    No            No   6
wSum          RV               Yes   Yes          Yes           Some          Some    No    No            No   7
aSum-P        Both             Yes   Some         Yes           Some          Yes     Yes   Yes           No   8
Step-up       RV               Yes   Some         Some          No            Yes     Yes   Yes           No   10
Seq-aSum      Both             Yes   Some         Some          Yes           Yes     Yes   Yes           No   Here
Seq-aSum-VS   Both             Yes   Some         Some          No            Yes     Yes   Yes           No   Here
KBAC          RV               No    Some         Some          Some          Yes     Some  No            No   11
C-alpha-A     RV               No    No           No            Yes           No      No    No            Yes  9
C-alpha-P     RV               No    No           No            Yes           Yes     No    No            Yes  9
RBT           RV               Yes   Some         Yes           No            Yes     No    No            No   12

Refs: 1, Clayton et al. [2004]; 2, Pan [2009]; 3, Conneely and Boehnke [2007]; 4, Kwee et al. [2008]; 5, Wu et al. [2010]; 6, Li and Leal [2008]; 7,
Madsen and Browning [2009]; 8, Han and Pan [2010]; 9, Neale et al. [2011]; 10, Hoffmann et al. [2010]; 11, Liu and Leal [2010]; 12, Ionita-Laza
et al. [2011].
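The null variance of a C-alpha component and the w2C-alpha-P weights derived from it can be checked numerically. The sketch below assumes, as the text describes, that $T_{C,j} = (m_j - n_j p_0)^2 - n_j p_0 q_0$ with $m_j$ binomial with parameters $n_j$ and $p_0$ under H0, so that $\mathrm{Var}(T_{C,j})$ equals the fourth central moment minus the squared variance. The function names are hypothetical, and the default $p_0 = 0.5$ corresponds to the balanced design of 500 cases and 500 controls.

```python
import math


def binomial_central_moment(n, p, order):
    """Brute-force central moment of Binomial(n, p) by summing over the support."""
    mean = n * p
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x) * (x - mean)**order
               for x in range(n + 1))


def calpha_component_variance(n_j, p0=0.5):
    """Closed form: 2 (n p0 q0)^2 + n p0 q0 (1 - 6 p0 q0)."""
    q0 = 1.0 - p0
    npq = n_j * p0 * q0
    return 2.0 * npq**2 + npq * (1.0 - 6.0 * p0 * q0)


def w2_weight(n_j, p0=0.5):
    """Weight 1/sqrt(Var); singletons reuse the n_j = 2 weight, as in the text."""
    return 1.0 / math.sqrt(calpha_component_variance(max(n_j, 2), p0))


for n_j in (1, 2, 5, 10):
    brute = (binomial_central_moment(n_j, 0.5, 4)
             - binomial_central_moment(n_j, 0.5, 2) ** 2)
    print(n_j, calpha_component_variance(n_j), brute, w2_weight(n_j))
```

The closed form and the brute-force value should agree for every $n_j$, and the variance grows with $n_j$, which is why variants with more copies can dominate the unweighted sum.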

RESULTS

INDEPENDENT RVs
We first consider that there is no LD between any two RVs, mimicking the situation where mutations are all completely random and independent of each other. To investigate the possible dependence of performance on the significance level, we used both α = 0.05 and α = 0.01.

Table II shows that all the tests had satisfactory Type I error rates except that the C-alpha-A test might have some inflated Type I error rates at α = 0.01 (as shown by the bold values), suggesting that perhaps a larger sample size is needed for using its asymptotic distribution with a more stringent significance level.

For power comparison, the overall conclusions are the same with either α = 0.05 or α = 0.01. In each scenario the three most powerful tests are highlighted as bold values in each table. First, for the non-null case that the eight causal RVs shared a common OR (Table III), which is ideal for the pooled association tests (Sum, CMC, wSum), the pooled association tests and KBAC were most powerful if there were no or few non-causal RVs (i.e. RVs not associated with the trait). As the number of non-causal RVs increased, the SSU and KMR gradually became the most powerful while the C-alpha test and the model selection approaches also had much improved performance relatively. The KBAC was most powerful except for the case with 64 non-causal RVs. Note that the aSum test maintained power as high as that of the Sum test, while the single SNV-based test, UminP, most commonly used in GWAS, had consistently low power.

For the case that the association strengths of the causal RVs were not constant with possibly opposite directions (Table IV), it is confirmed that the pooled association tests performed similarly and suffered from substantial loss of power. Across all the situations, the SSU, KMR and C-alpha performed similarly and were most powerful. Although the three sequential model selection approaches (Step-up, Seq-aSum, Seq-aSum-VS), the KBAC and the aSum test performed well with no or few non-causal RVs, surprisingly, as the number of non-causal RVs increased, their performance deteriorated more than that of the SSU, KMR and C-alpha tests. Nevertheless, the above procedures did improve over the pooled association tests.

It is noted that the CMC(0.01) test (with MAF ≤ 0.01 as the cut-off for RVs) was less powerful than the default CMC, i.e. CMC(0.05), test in Table III because the former unnecessarily formed a few extra groups for CVs and increased the DF of the test; in contrast, the former performed better than the latter in Table IV, presumably because some causal RVs might have an overall MAF > 0.01 (due to their enrichment in cases) and the CMC(0.01) test grouped these causal RVs into separate groups, avoiding pooling them over with other causal RVs with opposite association directions and thus improving power. Hence, the choice of the MAF threshold for RVs is important for CMC, but it is unclear how to do so generally in practice.

It is noted that since the true model was the main-effects model (1), KMR with a linear kernel corresponded to using the true model, thus it was more powerful than using a quadratic kernel; the small performance difference between using the two kernels demonstrated the robustness of the KMR method. It is also noted that, since all the RVs were independent, the covariance matrix V was nearly diagonal,
TABLE II. Type I error rates at nominal level α based on 1,000 replicates for eight RVs plus a number of non-causal RVs
Left five columns: α = 0.05; right five columns: α = 0.01. Column headings give the # of neutral RVs.

Test 0 4 8 16 32 0 4 8 16 32

UminP 0.027 0.027 0.016 0.011 0.019 0.003 0.001 0.004 0.001 0.002
Score 0.043 0.049 0.040 0.040 0.040 0.006 0.009 0.005 0.005 0.007
SSU 0.044 0.055 0.045 0.037 0.043 0.004 0.013 0.009 0.005 0.011
wSSU-P 0.052 0.051 0.048 0.048 0.046 0.008 0.008 0.014 0.010 0.008
SSUw 0.041 0.049 0.039 0.034 0.040 0.006 0.011 0.005 0.005 0.007
Sum 0.047 0.055 0.041 0.054 0.038 0.012 0.007 0.010 0.010 0.007
KMR (Linear) 0.046 0.056 0.046 0.042 0.047 0.007 0.016 0.011 0.007 0.012
KMR (Quad) 0.046 0.056 0.047 0.039 0.046 0.007 0.016 0.010 0.006 0.011
CMC (0.01) 0.035 0.053 0.044 0.055 0.039 0.008 0.014 0.010 0.011 0.009
CMC 0.048 0.053 0.043 0.056 0.051 0.010 0.009 0.011 0.011 0.007
wSum 0.050 0.057 0.038 0.059 0.056 0.010 0.012 0.011 0.009 0.006
aSum-P 0.058 0.064 0.052 0.063 0.047 0.012 0.011 0.010 0.010 0.011
Step-up 0.046 0.059 0.056 0.051 0.051 0.012 0.011 0.009 0.009 0.010
Seq-aSum 0.044 0.066 0.056 0.055 0.059 0.008 0.013 0.008 0.008 0.013
Seq-aSum-VS 0.050 0.058 0.056 0.051 0.058 0.011 0.018 0.011 0.009 0.013
KBAC 0.058 0.044 0.053 0.054 0.046 0.013 0.007 0.009 0.012 0.009
C-alpha-A 0.045 0.051 0.042 0.036 0.043 0.016 0.030 0.022 0.010 0.014
C-alpha-P 0.050 0.065 0.058 0.051 0.055 0.005 0.016 0.013 0.006 0.012
RBT 0.045 0.045 0.050 0.062 0.044 0.011 0.010 0.011 0.011 0.005

There is no LD among the RVs.
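When reading empirical Type I error rates such as those in Table II, it helps to keep the Monte Carlo error of 1,000 replicates in mind. The sketch below is an illustration added here rather than part of the original analysis; the function names are hypothetical. It computes the binomial standard error of an empirical rejection rate at the nominal level and expresses a few Table II entries as deviations in standard-error units.

```python
import math


def mc_standard_error(alpha, n_rep=1000):
    """Binomial standard error of an empirical rejection rate under the nominal level."""
    return math.sqrt(alpha * (1.0 - alpha) / n_rep)


def z_score(observed_rate, alpha, n_rep=1000):
    """Deviation of an observed rate from the nominal level, in standard errors."""
    return (observed_rate - alpha) / mc_standard_error(alpha, n_rep)


# Entries taken from Table II (rate, nominal level).
examples = [
    ("C-alpha-A, 4 neutral RVs", 0.030, 0.01),
    ("C-alpha-P, 4 neutral RVs", 0.016, 0.01),
    ("UminP, 16 neutral RVs", 0.011, 0.05),
]
for label, rate, alpha in examples:
    print(f"{label}: z = {z_score(rate, alpha):.2f}")
```

With 1,000 replicates the standard error is about 0.007 at α = 0.05 and about 0.003 at α = 0.01, so the C-alpha-A rate of 0.030 at α = 0.01 lies more than six standard errors above the nominal level, while the UminP rates fall well below it, i.e. that test is conservative.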

TABLE III. Empirical power for tests at nominal level α based on 1,000 replicates for an ideal case for eight causal RVs with a common association strength OR = 2 and a number of non-causal RVs
Left six columns: α = 0.05; right six columns: α = 0.01. Column headings give the # of neutral RVs.

Test 0 4 8 16 32 64 0 4 8 16 32 64

UminP 0.441 0.336 0.296 0.222 0.175 0.117 0.142 0.089 0.094 0.050 0.043 0.029
Score 0.746 0.632 0.595 0.471 0.332 0.245 0.496 0.391 0.314 0.221 0.143 0.073
SSU 0.756 0.702 0.694 0.626 0.499 0.423 0.525 0.479 0.448 0.379 0.283 0.205
wSSU-P 0.821 0.732 0.714 0.644 0.514 0.390 0.573 0.471 0.407 0.332 0.222 0.161
SSUw 0.743 0.638 0.593 0.477 0.339 0.268 0.502 0.389 0.316 0.218 0.153 0.082
Sum 0.951 0.875 0.808 0.673 0.484 0.313 0.859 0.709 0.605 0.438 0.248 0.116
KMR (Linear) 0.762 0.711 0.699 0.631 0.509 0.438 0.548 0.500 0.473 0.405 0.308 0.234
KMR (Quad) 0.755 0.707 0.699 0.629 0.501 0.410 0.545 0.497 0.466 0.403 0.299 0.215
CMC (0.01) 0.853 0.761 0.702 0.628 0.484 0.396 0.672 0.524 0.452 0.384 0.268 0.218
CMC 0.938 0.853 0.777 0.616 0.399 0.211 0.831 0.679 0.570 0.383 0.196 0.086
wSum 0.940 0.846 0.782 0.618 0.424 0.267 0.838 0.687 0.568 0.394 0.216 0.114
aSum-P 0.933 0.858 0.780 0.669 0.499 0.313 0.781 0.611 0.534 0.381 0.257 0.125
Step-up 0.859 0.801 0.769 0.679 0.521 0.335 0.712 0.608 0.552 0.431 0.301 0.135
Seq-aSum 0.810 0.705 0.663 0.547 0.407 0.312 0.596 0.470 0.415 0.320 0.190 0.128
Seq-aSum-VS 0.798 0.722 0.692 0.590 0.420 0.344 0.598 0.506 0.452 0.345 0.216 0.141
KBAC 0.960 0.911 0.867 0.779 0.600 0.388 0.858 0.749 0.680 0.529 0.317 0.160
C-alpha-A 0.741 0.687 0.664 0.597 0.460 0.364 0.637 0.580 0.538 0.446 0.320 0.234
C-alpha-P 0.771 0.712 0.688 0.627 0.484 0.378 0.542 0.492 0.459 0.402 0.305 0.219
RBT 0.941 0.849 0.784 0.664 0.463 0.321 0.813 0.667 0.587 0.424 0.238 0.121

There is no LD among the RVs.

and thus the score test and SSUw test performed similarly. Finally, since in the current simulation set-up, causal RVs were randomly chosen with various MAFs, it was not informative to weight the variants according to their MAFs, suggesting why the SSU test outperformed the wSSU-P and SSUw tests.

TABLE IV. Empirical power for tests at nominal level α based on 1,000 replicates for a non-ideal case for eight causal RVs with various association strengths OR = (3,3,2,2,2,1/2,1/2,1/2) and a number of non-causal RVs
Left five columns: α = 0.05; right five columns: α = 0.01. Column headings give the # of neutral RVs.

Test 0 4 8 16 32 0 4 8 16 32

UminP 0.607 0.532 0.481 0.417 0.346 0.318 0.259 0.227 0.204 0.142
Score 0.869 0.772 0.721 0.632 0.483 0.660 0.532 0.480 0.356 0.233
SSU 0.895 0.835 0.815 0.774 0.696 0.723 0.662 0.645 0.583 0.472
wSSU-P 0.861 0.776 0.735 0.685 0.550 0.606 0.510 0.460 0.401 0.258
SSUw 0.867 0.773 0.732 0.633 0.501 0.661 0.550 0.481 0.355 0.238
Sum 0.682 0.566 0.465 0.365 0.258 0.471 0.348 0.257 0.172 0.101
KMR (Linear) 0.897 0.842 0.824 0.783 0.707 0.740 0.678 0.667 0.619 0.495
KMR (Quad) 0.893 0.835 0.815 0.781 0.698 0.734 0.680 0.663 0.608 0.484
CMC (0.01) 0.703 0.669 0.670 0.670 0.590 0.511 0.457 0.470 0.470 0.383
CMC 0.661 0.544 0.456 0.336 0.204 0.461 0.337 0.235 0.157 0.086
wSum 0.659 0.548 0.459 0.335 0.228 0.460 0.336 0.236 0.158 0.093
aSum-P 0.854 0.745 0.684 0.574 0.430 0.670 0.538 0.430 0.315 0.207
Step-up 0.839 0.767 0.724 0.640 0.527 0.652 0.564 0.518 0.413 0.285
Seq-aSum 0.892 0.811 0.757 0.671 0.528 0.752 0.620 0.532 0.438 0.273
Seq-aSum-VS 0.885 0.807 0.768 0.686 0.545 0.729 0.623 0.567 0.448 0.293
KBAC 0.907 0.813 0.763 0.642 0.436 0.737 0.607 0.536 0.399 0.199
C-alpha-A 0.892 0.826 0.802 0.757 0.655 0.824 0.732 0.720 0.653 0.512
C-alpha-P 0.906 0.844 0.823 0.775 0.674 0.735 0.673 0.661 0.612 0.496
RBT 0.810 0.659 0.603 0.482 0.301 0.590 0.429 0.356 0.250 0.125

There is no LD among the RVs.

TABLE V. Type I error (with OR = 1) and power (with eight causal RVs with OR = (3,1/3,2,2,2,1/2,1/2,1/2)) for tests at nominal level α = 0.05 based on 1,000 replicates for eight RVs and a number of other non-causal RVs
Left five columns: OR = 1; right five columns: OR = (3,1/3,2,2,2,1/2,1/2,1/2). Column headings give the # of neutral RVs.

Test 0 4 8 16 32 0 4 8 16 32

UminP 0.033 0.027 0.026 0.016 0.013 0.489 0.479 0.452 0.365 0.318
Score 0.034 0.022 0.025 0.019 0.023 0.599 0.538 0.491 0.380 0.276
SSU 0.040 0.041 0.052 0.044 0.036 0.603 0.624 0.635 0.581 0.574
wSSU-P 0.057 0.043 0.047 0.062 0.053 0.566 0.586 0.609 0.585 0.491
SSUw 0.035 0.042 0.049 0.033 0.034 0.532 0.561 0.574 0.506 0.493
Sum 0.049 0.047 0.059 0.033 0.049 0.342 0.312 0.315 0.258 0.239
KMR (Linear) 0.042 0.045 0.057 0.046 0.043 0.611 0.630 0.644 0.597 0.590
KMR (Quad) 0.038 0.033 0.041 0.030 0.025 0.545 0.563 0.565 0.493 0.474
CMC 0.045 0.053 0.056 0.036 0.060 0.296 0.283 0.189 0.182 0.365
wSum 0.045 0.054 0.056 0.040 0.063 0.369 0.297 0.287 0.191 0.200
aSum-P 0.050 0.046 0.061 0.038 0.053 0.350 0.323 0.325 0.258 0.243
Step-up 0.047 0.060 0.059 0.042 0.050 0.524 0.516 0.532 0.429 0.409
Seq-aSum 0.045 0.062 0.054 0.056 0.055 0.658 0.617 0.596 0.484 0.416
Seq-aSum-VS 0.043 0.056 0.058 0.054 0.049 0.658 0.606 0.577 0.472 0.414
KBAC 0.050 0.054 0.050 0.053 0.049 0.497 0.439 0.426 0.371 0.275
C-alpha-A 0.065 0.076 0.092 0.097 0.110 – – – – –
C-alpha-P 0.050 0.049 0.062 0.057 0.048 0.629 0.650 0.668 0.607 0.598
RBT 0.047 0.039 0.036 0.060 0.056 0.374 0.343 0.386 0.357 0.279

There is no LD among the RVs.

RVs IN LD

We next consider the case where all the RVs, both causal and non-causal ones, were possibly correlated. In this case, if a RV was associated with the disease, so were the other RVs since they were in LD. For the null case (Table V), all the tests except C-alpha-A had their Type I error rates

TABLE VI. Type I error (with OR = 1) and power (with eight causal RVs with OR = (3,1/3,2,2,2,1/2,1/2,1/2)) for tests at nominal level α = 0.05 based on 1,000 replicates for eight RVs and a number of other non-causal RVs
Left five columns: OR = 1; right five columns: OR = (3,1/3,2,2,2,1/2,1/2,1/2). Column headings give the # of neutral RVs.

Test 0 8 16 32 64 0 8 16 32 64

UminP 0.032 0.018 0.021 0.014 0.007 0.506 0.380 0.324 0.288 0.208
Score 0.029 0.029 0.028 0.019 0.021 0.631 0.480 0.373 0.241 0.160
SSU 0.049 0.051 0.035 0.034 0.034 0.642 0.553 0.475 0.444 0.334
wSSU-P 0.045 0.060 0.042 0.050 0.052 0.606 0.494 0.424 0.362 0.269
SSUw 0.045 0.040 0.027 0.015 0.036 0.562 0.450 0.352 0.272 0.187
Sum 0.046 0.059 0.046 0.046 0.046 0.345 0.229 0.159 0.110 0.079
KMR (Linear) 0.051 0.056 0.039 0.040 0.037 0.649 0.568 0.490 0.459 0.356
KMR (Quad) 0.046 0.049 0.022 0.021 0.017 0.572 0.487 0.392 0.331 0.205
CMC 0.046 0.053 0.040 0.050 0.047 0.339 0.235 0.193 0.124 0.111
wSum 0.048 0.052 0.041 0.053 0.048 0.342 0.237 0.199 0.133 0.114
aSum-P 0.052 0.061 0.049 0.046 0.052 0.364 0.239 0.170 0.113 0.081
Step-up 0.057 0.055 0.047 0.048 0.051 0.554 0.449 0.378 0.304 0.213
Seq-aSum 0.051 0.053 0.041 0.046 0.052 0.703 0.584 0.453 0.353 0.249
Seq-aSum-VS 0.053 0.053 0.048 0.041 0.054 0.701 0.572 0.447 0.351 0.258
KBAC 0.048 0.058 0.036 0.053 0.047 0.527 0.388 0.321 0.262 0.180
C-alpha-A 0.076 0.093 0.084 0.092 0.118 – – – – –
C-alpha-P 0.055 0.065 0.043 0.050 0.047 0.669 0.585 0.504 0.472 0.340
RBT 0.057 0.059 0.049 0.042 0.054 0.376 0.285 0.188 0.141 0.097

There is LD among the 8 RVs and among other non-causal RVs, but no LD between the 8 RVs and non-causal RVs.

well controlled. Since the asymptotic distribution of the C-alpha test is derived under the assumption that all the RVs are independent, which was violated here, one has to use its permutational distribution, which appears to work well.

For the non-null case with varying association strengths (Table V), again all the pooled tests suffered from significant power loss, while the SSU, KMR and C-alpha-P tests were most powerful. The three sequential model selection approaches and KBAC performed similarly and better than the aSum test, and all improved over the pooled association tests.

Due to the LD among the RVs, the score test and SSUw test performed differently: when there were no non-causal RVs, the score test was more powerful; however, as the number of non-causal RVs increased, the SSUw test became much more powerful than the score test.

NO LD BETWEEN CAUSAL RVs AND NON-CAUSAL RVs
Now we consider the case where causal RVs were correlated, non-causal RVs were also correlated, but there was no LD between causal and non-causal RVs. For the null case (Table VI), again all the tests except the C-alpha-A had satisfactory Type I error rates. The C-alpha-A did not work because its independence assumption on the RVs was violated.

For power comparison, again due to the presence of opposite association directions, the pooled association tests performed similarly and had the lowest power. With no or few non-causal RVs, Seq-aSum and Seq-aSum-VS performed best, closely followed by the C-alpha-P, KMR and SSU tests; for a larger number of non-causal RVs, the C-alpha-P, KMR and SSU tests were the winners.

INDEPENDENT RVs AND CVs
Finally we considered the case with independent RVs and four non-causal CVs (with MAFs randomly between 0.01 and 0.1). Although the aSum test was proposed by Han and Pan [2010a] to group CVs separately from RVs, as done in CMC, for simplicity here, we did not do such groupings. All the tests had satisfactory Type I error rates (not shown). For power comparison (Table VII), it is most notable that the SSU, KMR and C-alpha tests were all low-powered, due to the undue influence of the CVs, as analyzed before. The performance of the pooled association tests and KBAC also degraded. With CVs, variable selection worked well, as evidenced by the good performance of the Step-up procedure and by the fact that Seq-aSum-VS performed much better than Seq-aSum without variable selection. We also note that the weighted C-alpha tests were much more powerful than the original unweighted C-alpha test; between the two weighted C-alpha tests, the first one with weights inversely proportional to the MAFs performed much better than the second one in the presence of a large number of non-causal RVs, but not so otherwise; neither was more powerful than the wSSU-P test. Overall, the weighted SSU test (wSSU-P) performed best, closely followed by the Step-up procedure, then by the SSUw and score tests.

DISCUSSION

The three pooled association tests (i.e. Sum, CMC and wSum) performed similarly for RVs. They were most powerful when there were no opposite association directions and when there were no or only few non-causal RVs; otherwise, they suffered from a substantial loss of power.
TABLE VII. Empirical power for the tests at nominal level α = 0.05 based on 1,000 replicates with eight causal RVs, four neutral CVs and a number of other neutral RVs
Left five columns: OR = (2,2,2,2,2,2,2,2); right five columns: OR = (3,3,2,2,2,1/2,1/2,1/2). Column headings give the # of neutral RVs.

Test 0 4 8 16 32 0 4 8 16 32

UminP 0.355 0.283 0.269 0.213 0.156 0.518 0.482 0.441 0.412 0.331
Score 0.628 0.580 0.498 0.424 0.348 0.766 0.706 0.629 0.584 0.466
SSU 0.148 0.128 0.134 0.131 0.135 0.225 0.201 0.206 0.203 0.215
wSSU-P 0.777 0.729 0.700 0.589 0.518 0.810 0.764 0.724 0.655 0.582
SSUw 0.634 0.592 0.515 0.429 0.332 0.765 0.704 0.631 0.599 0.489
Sum 0.455 0.438 0.396 0.348 0.299 0.231 0.225 0.195 0.199 0.152
KMR (Linear) 0.158 0.138 0.151 0.145 0.153 0.237 0.216 0.222 0.223 0.234
KMR (Quad) 0.153 0.124 0.136 0.137 0.141 0.219 0.198 0.204 0.201 0.219
CMC 0.575 0.512 0.429 0.309 0.212 0.296 0.254 0.209 0.155 0.124
wSum 0.533 0.508 0.469 0.408 0.346 0.291 0.285 0.249 0.230 0.181
aSum-P 0.467 0.457 0.414 0.355 0.310 0.239 0.245 0.206 0.202 0.158
Step-up 0.776 0.750 0.715 0.610 0.522 0.727 0.712 0.658 0.605 0.499
Seq-aSum 0.368 0.314 0.323 0.300 0.266 0.453 0.410 0.392 0.395 0.342
Seq-aSum-VS 0.550 0.518 0.502 0.450 0.379 0.610 0.617 0.567 0.541 0.471
KBAC 0.554 0.537 0.478 0.446 0.370 0.415 0.402 0.358 0.335 0.270
C-alpha-A 0.106 0.083 0.089 0.088 0.082 0.165 0.154 0.146 0.149 0.160
C-alpha-P 0.165 0.150 0.145 0.139 0.139 0.245 0.233 0.228 0.220 0.225
w1C-alpha-P 0.542 0.527 0.527 0.496 0.474 0.670 0.642 0.632 0.636 0.593
w2C-alpha-P 0.628 0.568 0.476 0.388 0.298 0.773 0.698 0.606 0.563 0.422
RBT 0.826 0.770 0.688 0.592 0.453 0.630 0.581 0.487 0.410 0.321

There is no LD among the CV/RVs.

Perhaps the most surprising and interesting finding is that, overall, in the presence of opposite association directions and non-causal RVs, the SSU, KMR and C-alpha-P performed similarly and best. Although the three methods appear quite different, they share a common feature: all can be regarded as testing on a variance component in a R-E model, thus are robust to a large number of parameters induced by a large group of RVs. This is related to the success story of the class of gene set tests, including both the SSU-equivalent Goeman's test [Goeman et al., 2004] and KMR [Liu et al., 2008], applied to high-dimensional microarray data. Furthermore, the SSU test and KMR are themselves closely related to each other [Pan, 2011], and share some advantages: they can be applied to other GLMs for other types of traits, such as quantitative or survival traits, and to adjust for other covariates, such as environmental factors, which are important as argued by Bansal et al. [2010].

The approaches based on model selection (aSum, Step-up, Seq-aSum, Seq-aSum-VS) improve over the pooled association tests in the presence of opposite association directions. However, in spite of their strong motivation for model selection, their performance might not be as impressive as expected, especially in the presence of a large number of non-causal RVs. A possible explanation lies in the trade-off between the gain and the cost of model selection: in spite of possibly a strong association with the trait, due to its low frequency of any single RV, often there is only minimal power to detect its own association with the trait, rendering it difficult to distinguish whether the RV is or is not associated with the trait, and if so, whether its effect is protective or harmful. As shown by the close performance between Seq-aSum and Seq-aSum-VS, there is only minimal gain or loss in selecting causal RVs. On the other hand, as shown here, when there were both protective and deleterious causal RVs and few non-causal RVs, our newly proposed Seq-aSum and Seq-aSum-VS were or nearly were the most powerful, suggesting their applicability not only to RVs, but also to CVs: in analyzing multiple common SNVs in an LD region, if the untyped causal SNV is in LD with the multiple typed SNVs, the two methods could be powerful. In addition, leveraging on the idea of pooling and thus reduced degrees of freedom, they can be also applied to detect epistasis, as done in He et al. [2010].

Several approaches are not considered here, including penalized regression [e.g. Malo et al., 2008 for CVs, Zhou et al., 2010 for both CVs and RVs] and some nonparametric regression techniques, such as logic regression [Kooperberg et al., 2001] for CVs, and a Bayesian GLM [Yi and Zhi, 2011], largely due to their difficulty in controlling Type I error rates (which is required to make a formal and fair comparison with other statistical tests) and/or associated high computing cost, e.g. in permutation tests, especially if one aims to take account of the uncertainty in choosing optimal penalization or tuning parameters. Penalized regression and logic regression belong to the class of model selection-based approaches. Compared to the four selection methods compared here, penalized regression and logic regression are believed to have some advantages. However, existing penalized regression and logic regression methods do not incorporate the strategy of collapsing RVs or of R-E models, two key elements for the success of the compared methods for RVs; further studies are needed

to evaluate their performance. Another approach is based on haplotype inference [Zhu et al., 2010; Li et al., 2010], which is appealing for its applicability to GWAS for association analysis of more frequent RVs in the MAF range of 0.1–5%.

We note that Price et al. [2010] proposed using multiple thresholds and (possibly predicted) biological functional annotations to group RVs and empirically showed its advantage over using only one group. For simplicity, we have only considered the use of a single group. However, our conclusion should be useful for the case with multiple groups for RVs: a test with high power for a single group is likely to be even more powerful for multiple groups that are appropriately constructed, as shown by Pan and Shen [2011]. As shown by our simulation studies, mixing non-causal CVs (or RVs with relatively higher MAFs) with RVs may degrade the performance of several tests, especially the SSU, KMR, C-alpha and KBAC tests. Hence it is a critical question in practice how to define RVs, to which a test is applied. There are two possible ways. The first way is to use the multiple cut-offs of MAF to define RVs and then combine the test results, as implemented in the multiple threshold test of Price et al. [2010] and in the adaptive tests of Pan and Shen [2011]. Second, as shown here and by other authors [Madsen and Browning, 2009], weighting the variants in a test with suitably chosen weights (e.g. inversely proportional to their MAFs) may improve the performance of the test. We have not investigated these issues extensively and more studies are needed in the future. Finally, the simulation set-ups considered here are similar to Li and Leal [2008], but may still be over-simplified. Although there is no compelling statistical argument for the strong dependence of our conclusions on the simulation set-ups, it would be helpful to consider more practical set-ups, such as using real sequencing data; we did not pursue it here due to lack of publicly available large samples of sequencing data. With a sample size of currently only several hundreds with multiple racial/ethnic groups provided by the 1000 Genomes Project, it is not clear how to best construct simulated data to mimic real data while maintaining the low MAFs of RVs. Although we acknowledge the limitation of our current simulation set-ups, they do illustrate some useful properties of various tests, such as how they perform in the presence of opposite association directions, of non-causal RVs and/or CVs, and of correlated SNVs.

In summary, since there is a large power difference between the pooled association tests and the R-E model-based approaches (SSU, KMR and C-alpha-P) at either of the two extremes (i.e. whether there are opposite association directions), we recommend the use of a test from each class if it is unclear which extreme is likely to hold. We also recommend the use of the KBAC test and a variable selection-based approach, e.g. our newly proposed Seq-aSum-VS test; the former may be able to explore some complex interactions among RVs, while the latter may shed light on which SNVs are associated with the trait and if so, their association directions. The pooled association tests all perform similarly, while for the other class, the SSU and KMR have certain advantages: their known asymptotic distributions avoid the use of computationally demanding permutations, they can be implemented in any GLMs, which implies their applicability to binary, quantitative and other types of traits, and their ability to adjust for other covariates such as environmental factors and to detect environment-gene interactions, their applicability to CVs and/or RVs no matter whether they are in LD or not. We note that the SSU test can be applied to more complex regression models, e.g. with both main effects and some interaction terms; equally, in KMR we can use a kernel that can capture some complex interactions among the SNVs. Of course, as shown earlier, with any given kernel and its decomposition, we can have an SSU test equivalent to KMR. It would be of interest to compare the performance of the SSU/KMR and KBAC in the presence of interactions among RVs.

A potentially useful resource resulting from this work is freely available software: we have implemented most of the compared methods in R; R code will be posted on our web site at

ACKNOWLEDGMENTS

W.P. thanks Drs. Roeder and Witte for sharing their manuscripts, Drs. Wu, Ionita-Laza and Leal and Gao Wang for sharing their computer programs. The authors are grateful to the reviewers for helpful and constructive comments. This research was partially supported by NIH grant R21DK089351; W.P. was also supported by R01HL65462 and R01HL105397.

REFERENCES

Asimit J, Zeggini E. 2010. Rare variant association analysis methods for complex traits. Ann Rev Genet 44:293–308.
Bansal V, Libiger O, Torkamani A, Schork NJ. 2010. Statistical analysis strategies for association studies involving rare variants. Nat Rev Genet 11:773–785.
Basu S, Stephens M, Pankow JS, Thompson EA. 2010. A likelihood-based trait-model-free approach for linkage detection of binary trait. Biometrics 66:205–213.
Bhatia G, Bansal V, Harismendy O, Schork NJ, Topol EJ, Frazer K. 2010. A covering method for detecting genetic associations between rare variants and common phenotypes. PLoS Comput Biol 6:e1000954.
Bodmer W, Bonilla C. 2008. Common and rare variants in multifactorial susceptibility to common diseases. Nat Genet 40:695–701.
Chapman JM, Whittaker J. 2008. Analysis of multiple SNPs in a candidate gene or region. Genet Epidemiol 32:560–566.
Chen SX, Qin Y-L. 2010. A two-sample test for high-dimensional data with applications to gene-set testing. Ann Stat 38:808–835.
Clayton D, Chapman J, Cooper J. 2004. Use of unphased multilocus genotype data in indirect association studies. Genet Epidemiol 27:415–428.
Conneely KN, Boehnke M. 2007. So many correlated tests, so little time! Rapid adjustment of p values for multiple correlated tests. Am J Hum Genet 81:1158–1168.
Fan R, Knapp M. 2003. Genome association studies of complex diseases by case-control designs. Am J Hum Genet 72:850–868.
Goeman JJ, van de Geer S, de Kort F, van Houwelingen HC. 2004. A global test for groups of genes: testing association with a clinical outcome. Bioinformatics 20:93–99.
Goeman JJ, van de Geer S, van Houwelingen HC. 2006. Testing against a high dimensional alternative. J R Stat Soc B 68:477–493.
Gorlov IP, Gorlova OY, Sunyaev SR, Spitz MR, Amos CI. 2008. Shifting paradigm of association studies: value of rare single-nucleotide polymorphisms. Am J Hum Genet 82:100–112.


Han F, Pan W. 2010a. A data-adaptive sum test for disease association with multiple common or rare variants. Hum Hered 70:42–54.
Han F, Pan W. 2010b. Powerful multi-marker association tests: unifying genomic distance-based regression and logistic regression. To appear in Genet Epidemiol. Available from: http://
He H, Oetting WS, Brott MJ, Basu S. 2010. Pair-wise multifactor dimensionality reduction method to detect gene-gene interactions in a case-control study. Hum Hered 69:60–70.
Hindorff LA, Junkins HA, Hall PN, Mehta JP, Manolio TA. 2010. A catalog of published genome-wide association studies. Available from: Accessed October 31, 2010.
Hoffmann TJ, Marini NJ, Witte JS. 2010. Comprehensive approach to analyzing rare genetic variants. PLoS One 5:e13584.
Ionita-Laza I, Buxbaum JD, Laird NM, Lange C. 2011. A new testing strategy to identify rare variants with either risk or protective effect on disease. PLoS Genet 7:e1001289.
Kimeldorf GS, Wahba G. 1971. Some results on Tchebycheffian spline function. J Math Anal Appl 33:82–95.
Kooperberg C, Ruczinski I, LeBlanc ML, Hsu L. 2001. Sequence analysis using logic regression. Genet Epidemiol 21:S626–S631.
Kotowski I, Pertsemlidis A, Luke A, Cooper R, Vega G, Cohen J, Hobbs H. 2006. A spectrum of PCSK9 alleles contributes to plasma levels of low-density lipoprotein cholesterol. Am J Hum Genet
Kwee LC, Liu D, Lin X, Ghosh D, Epstein MP. 2008. A powerful and flexible multilocus association test for quantitative traits. Am J Hum Genet 82:386–397.
Li B, Leal SM. 2008. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet 83:311–321.
Li Y, Byrnes AE, Li M. 2010. To identify associations with rare variants, Just WHaIT: weighted haplotype and imputation-based tests. Am J Hum Genet 87:728–735.
Lin WY, Schaid DJ. 2009. Power comparisons between similarity-based multilocus association methods, logistic regression, and score tests for haplotypes. Genet Epidemiol 33:183–197.
Liu DJ, Leal SM. 2010. A novel adaptive method for the analysis of next-generation sequencing data to detect complex trait associations with rare variants due to gene main effects and interactions. PLoS Genet 6:e1001156.
Liu D, Ghosh D, Lin X. 2008. Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC Bioinformatics 9:292.
Madsen BE, Browning SR. 2009. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet 5:e1000384.
Maher B. 2008. Personal genomes: the case of the missing heritability. Nature 456:18–21.
Malo N, Libiger O, Schork NJ. 2008. Accommodating linkage disequilibrium in genetic-association analyses via ridge regression. Am J Hum Genet 82:375–385.
Morgenthaler S, Thilly WG. 2007. A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). Mutat Res
Morris AP, Zeggini E. 2010. An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genet Epidemiol 34:188–193.
Neale BM, Rivas MA, Voight BF, Altshuler D, Devlin B, Orho-Melander M, Kathiresan S, Purcell SM, Roeder K, Daly MJ. 2011. Testing for an unusual distribution of rare variants. PLoS Genet 7:e1001322.
Neyman J, Scott E. 1966. On the use of c-alpha optimal tests of composite hypothesis. Bull Int Stat Inst 41:477–497.
Pan W. 2009. Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Genet Epidemiol 33:497–507.
Pan W. 2011. Relationship between genomic distance-based regression and kernel machine regression for multi-marker association testing. To appear in Genet Epidemiol. Available from: http://
Pan W, Shen X. 2011. Adaptive tests for association analysis of rare variants. To appear in Genet Epidemiol. Available as Research report 2011-11, Division of Biostatistics, University of Minnesota.
Price AL, Kryukov GV, de Bakker PIW, Purcell SM, Staples J, Wei L-J, Sunyaev SR. 2010. Pooled association tests for rare variants in exon-resequenced studies. Am J Hum Genet 86:832–838.
Pritchard JK. 2001. Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet 69:124–137.
Pritchard JK, Cox NJ. 2002. The allelic architecture of human disease genes: common disease-common variant...or not? Hum Mol Genet 11:2417–2423.
Schaid DJ. 2010a. Genomic similarity and kernel methods I: advancements by building on mathematical and statistical foundations. Hum Hered 70:109–131.
Schaid DJ. 2010b. Genomic similarity and kernel methods II: methods for genomic information. Hum Hered 70:132–140.
Wang T, Elston RC. 2007. Improved power by use of a weighted score test for linkage disequilibrium mapping. Am J Hum Genet
Wessel J, Schork NJ. 2006. Generalized genomic distance-based regression methodology for multilocus association analysis. Am J Hum Genet 79:792–806.
Wu MC, Kraft P, Epstein MP, Taylor DM, Chanock SJ, Hunter DJ, Lin X. 2010. Powerful SNP-set analysis for case-control genome-wide association studies. Am J Hum Genet
Xiong M, Zhao J, Boerwinkle E. 2002. Generalized T² test for genome association studies. Am J Hum Genet 70:1257–1268.
Yi N, Zhi D. 2011. Bayesian analysis of rare variants in genetic association studies. Genet Epidemiol 35:57–69.
Zelterman D, Chen C-F. 1988. Homogeneity tests against central-mixture alternative. JASA 83:179–182.
Zhang L, Pei YF, Li J, Papasian CJ, Deng HW. 2010. Efficient utilization of rare variants for detection of disease-related genomic regions. PLoS One 5:e14288.
Zhou H, Sehl ME, Sinsheimer JS, Lange K. 2010. Association screening of common and rare genetic variants by penalized regression. Bioinformatics 26:2375–2382.
Zhu X, Feng T, Li Y, Lu Q, Elston RC. 2010. Detecting rare variants for complex traits using family and unrelated data. Genet Epidemiol 34:171–187.

BETWEEN GOEMAN’S TEST AND ZELTERMAN AND CHEN’S

We first review Zelterman and Chen’s homogeneity test. Suppose that $y_1, \ldots, y_n$ are independent random variables with respective pdf’s $f_i(y_i \mid \lambda_i)$, conditional on a k-dimensional parameter $\lambda_i$. Under $H_0$, all $\lambda_i$’s are equal to a fixed vector $\lambda_0$. It is assumed that the $\lambda_i$’s are random: $\lambda_i = \lambda_0 + a z$, where $z$ is a k-dimensional random variable with $E(z) = 0$ and $\mathrm{Cov}(z) = \Sigma = (\sigma_{st})$. Under this formulation, testing $H_0$

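is equivalent to testing $H_0'$: $a = 0$. Zelterman and Chen [1988] showed that the score test statistic for $H_0'$ is

$$T_Z = \frac{1}{2} \sum_{i=1}^{n} \sum_{s=1}^{k} \sigma_{ss} \frac{\partial^2 f_i(y_i \mid \lambda_0)}{\partial \lambda_{0s}^2} \frac{1}{f_i(y_i \mid \lambda_0)} + \sum_{i=1}^{n} \sum_{s<t} \sigma_{st} \frac{\partial^2 f_i(y_i \mid \lambda_0)}{\partial \lambda_{0s}\, \partial \lambda_{0t}} \frac{1}{f_i(y_i \mid \lambda_0)},$$

where $\lambda_0 = (\lambda_{01}, \ldots, \lambda_{0k})'$.

For observation $Y_i$, the score vector is $U_{(i)} = (f'_{i,1}/f_i, \ldots, f'_{i,k}/f_i)'$, and

$$U_{(i)}' \Sigma U_{(i)} = \sum_{s,t} \sigma_{st} \frac{f'_{i,s} f'_{i,t}}{f_i^2}.$$

For simplicity we use the notation $f'_{i,s} = \partial f_i / \partial \lambda_{0s}$ and, analogously, $f''_{i,st} = \partial^2 f_i / \partial \lambda_{0s}\, \partial \lambda_{0t}$. On the other hand, we have the (s,t)th element of $V$ as

$$V_{st} = -\sum_{i=1}^{n} \frac{\partial U_{(i),s}}{\partial \lambda_{0t}} = -\sum_{i=1}^{n} \frac{f''_{i,st} f_i - f'_{i,s} f'_{i,t}}{f_i^2}.$$

Since $E(U_{(i)}' \Sigma U_{(j)}) = E(U_{(i)})' \Sigma E(U_{(j)}) = 0$ for $i \neq j$, by ignoring the cross-product terms $U_{(i)}' \Sigma U_{(j)}$, we have Goeman’s test statistic, with $U = \sum_{i=1}^{n} U_{(i)}$,

$$S = \frac{1}{2} U' \Sigma U - \frac{1}{2} \mathrm{tr}(\Sigma V) = \frac{1}{2} \sum_{i=1}^{n} \sum_{s,t} \sigma_{st} \frac{f''_{i,st}}{f_i} = T_Z.$$

Hence, Goeman’s test is equivalent to Zelterman and Chen’s homogeneity test, which covers the C-alpha test as a special case (with $m_i$ as $y_i$ and $f_i$ as $\mathrm{Bin}(n_i, p_i)$). By the equivalence among the permutation-based Goeman’s test, the SSU test and the KMR test with a linear kernel, we know that the SSU test, the KMR test with a linear kernel, and the permutation-based C-alpha test are all equivalent (if a common random variable of interest is modeled). However, in the current context, since the disease status of subject i is treated as the random variable of interest $y_i$ in the SSU test and KMR, while the mutation status of variant i is treated as $y_i$ in the C-alpha test, the three tests are closely related but not exactly equivalent.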
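
The identity between Goeman’s statistic (with cross-product terms dropped) and $T_Z$ can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors’ code: it assumes a logistic model for a binary trait, uses simulated 0/1 variant indicators, and picks an arbitrary $\lambda_0$ and $\Sigma$. All names (`density`, `d2_density`, `X`, `y`, `lam0`, `Sigma`) are hypothetical. $T_Z$ is computed from finite-difference second derivatives of the density, while the Goeman-type statistic uses the analytic logistic score and information, so the two sides are obtained independently.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 40, 3
X = rng.integers(0, 2, size=(n, k)).astype(float)  # hypothetical 0/1 variant indicators
y = rng.integers(0, 2, size=n)                     # hypothetical binary trait values
lam0 = np.array([0.2, -0.1, 0.3])                  # arbitrary null value lambda_0
A = rng.normal(size=(k, k))
Sigma = A @ A.T                                    # arbitrary symmetric Cov(z)


def density(lam, yi, xi):
    """Bernoulli pdf f_i(y_i | lam) with logit P(y_i = 1) = x_i' lam (assumed model)."""
    p = 1.0 / (1.0 + np.exp(-(xi @ lam)))
    return p if yi == 1 else 1.0 - p


def d2_density(yi, xi, s, t, h=1e-4):
    """Central finite-difference estimate of d^2 f_i / (d lam_s d lam_t) at lam0."""
    es, et = np.zeros(k), np.zeros(k)
    es[s], et[t] = h, h
    return (density(lam0 + es + et, yi, xi) - density(lam0 + es - et, yi, xi)
            - density(lam0 - es + et, yi, xi) + density(lam0 - es - et, yi, xi)) / (4 * h * h)


# T_Z as displayed above: diagonal terms weighted by 1/2, plus the s < t terms.
t_z = 0.0
for yi, xi in zip(y, X):
    fi = density(lam0, yi, xi)
    for s in range(k):
        t_z += 0.5 * Sigma[s, s] * d2_density(yi, xi, s, s) / fi
        for t in range(s + 1, k):
            t_z += Sigma[s, t] * d2_density(yi, xi, s, t) / fi

# Goeman-type statistic with the cross-product terms U_(i)' Sigma U_(j), i != j, dropped.
p = 1.0 / (1.0 + np.exp(-(X @ lam0)))
U_i = (y - p)[:, None] * X                        # row i is the score vector U_(i)
V = (X * (p * (1 - p))[:, None]).T @ X            # V = minus the derivative of the total score
quad = np.einsum("is,st,it->", U_i, Sigma, U_i)   # sum_i U_(i)' Sigma U_(i)
s_diag = 0.5 * (quad - np.trace(Sigma @ V))

print(f"T_Z = {t_z:.6f}, S without cross terms = {s_diag:.6f}")
```

The two printed values should agree up to finite-difference error. Replacing `quad` with the full quadratic form in the summed score $U$ would reintroduce the cross-product terms, so the result would in general no longer match $T_Z$ exactly.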
