
Biometrics DOI: 10.1111/biom.12653

Pearson’s Chi-square Test and Rank Correlation Inferences for Clustered Data

Joanna H. Shih1, * and Michael P. Fay2, **


1 Biometric Research Program, National Cancer Institute, 9609 Medical Center Drive, Rm 5W124, Bethesda, Maryland 20892, U.S.A.
2 Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Bethesda, Maryland 20852, U.S.A.
∗ email: jshih@mail.nih.gov
∗∗ email: mfay@niaid.nih.gov

Summary. Pearson’s chi-square test has been widely used in testing for association between two categorical responses.
Spearman rank correlation and Kendall’s tau are often used for measuring and testing association between two continuous
or ordered categorical responses. However, the established statistical properties of these tests are only valid when each pair
of responses are independent, where each sampling unit has only one pair of responses. When each sampling unit consists
of a cluster of paired responses, the assumption of independent pairs is violated. In this article, we apply the within-cluster
resampling technique to U-statistics to form new tests and rank-based correlation estimators for possibly tied clustered data.
We develop large sample properties of the new proposed tests and estimators and evaluate their performance by simulations.
The proposed methods are applied to a data set collected from a PET/CT imaging study for illustration.
Key words: Association; chi-square test; Clustered data; Kendall’s tau; midrank; Spearman rank correlation; U-statistics;
Within-cluster resampling

1. Introduction

Testing for association between two variables is routinely performed in statistical applications. When the variables of interest are categorical, they may be represented by an $R \times C$ contingency table, and Pearson’s $\chi^2$ statistic is widely applied to test the independence of the row and column variables that are jointly multinomial. When both variables of interest are ordinal or continuous, we want measures of association which account for ordinality and often additionally want measures that are invariant to monotonic transformations of the original data. Two such measures are Spearman’s rank correlation and Kendall’s tau. The sample statistics of these two measures may be used to test for independence (Hollander and Wolfe, 1999). When the response pairs are all independent, the validity of the simple permutation test (i.e., permuting one variable all possible ways) and the asymptotic validity of the asymptotic test based on the above mentioned test statistics have been well established. However, when data are clustered, these tests are no longer even asymptotically valid, because within-cluster observations might be correlated, and hence the required assumption of independent response pairs is violated.

To alleviate the problem of dependent samples in estimation, Hoffman et al. (2001) developed the within-cluster resampling (WCR) method. The basic idea as applied to our problem is as follows. If each cluster consisted of only one response pair, then the standard methods for independent pairs of observations can be applied. In WCR, a response pair is randomly selected from each cluster, and a standard statistical method is applied to this independent data set to obtain a parameter estimate. The procedure is repeated many times, and the final parameter estimate is the average of the parameter estimates obtained from the re-sampled data sets. The variance of that average uses a specialized WCR variance formula that accounts for the induced correlation created by the within-cluster resampling.

In the context of testing, Follmann et al. (2003) proposed applying the WCR procedure to p-values. In brief, in each re-sampled data set, a p-value is obtained from a statistical procedure. If the p-value is continuous and exact, then under the null hypothesis, the $i$th p-value, $P_i$, is uniformly distributed over (0,1) and $\Phi^{-1}(P_i) = Z_i$ is standard normal, where $\Phi^{-1}(q)$ is the $q$th quantile of the standard normal distribution. For discrete and/or non-exact p-values, $Z_i$ is only approximately standard normal. In either case, the average of these repeated transformed variables may be treated as a mean zero normal random variable. The final p-value is obtained by comparing this average statistic standardized by its standard error to a standard normal distribution. However, there is one major drawback of this approach. For test statistics which are discrete in nature, such as Pearson’s $\chi^2$ test, the p-value of a re-sampled data set might be equal to 1 and hence the inverse normal score equals $\infty$. Consequently, the average of the repeated transformed variables is not finite. So some adjustment needs to be made to the observations with p-values equal to 1 so that the average of the transformed data would be approximately distributed as a mean zero normal random variable. We use a hypothetical example to illustrate the problem. We generated a simulated data set consisting of 50 clusters each with

© 2017, The International Biometric Society

[Figure 1: two histograms with vertical axis "Frequency". Panel (a): horizontal axis "P-value", from 0.0 to 1.0. Panel (b): horizontal axis "Inverse normal score of p-value", from −4 to 2.]

Figure 1. Histogram of p-values: (a) Histogram of p-values of 10,000 within-cluster re-sampled data; (b) Histogram of inverse normal scores of p-values.
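The difficulty with inverse normal scores of discrete p-values, as described in the Introduction, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `inverse_normal_score` and `mean_score` and the four example p-values are hypothetical and are not taken from the article; the last line simply re-evaluates the expression $\Phi(-0.136/\sqrt{0.105})$ from the summary values quoted in the text.

```python
import math
from statistics import NormalDist


def inverse_normal_score(p):
    """Map a p-value in [0, 1] to Phi^{-1}(p); the endpoints map to -inf / +inf."""
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return NormalDist().inv_cdf(p)


def mean_score(p_values, drop_ones=False):
    """Average inverse normal score, optionally excluding p-values equal to 1."""
    scores = [inverse_normal_score(p) for p in p_values
              if not (drop_ones and p >= 1.0)]
    return sum(scores) / len(scores)


# Hypothetical WCR p-values; the last one equals 1, as can happen with a discrete test.
p_values = [0.12, 0.50, 0.81, 1.0]
print(mean_score(p_values))                  # not finite
print(mean_score(p_values, drop_ones=True))  # finite, but computed from a truncated sample

# Final p-value from the summary values quoted in the text.
final_p = NormalDist().cdf(-0.136 / math.sqrt(0.105))
print(round(final_p, 3))
```

Dropping the infinite scores makes the average finite, but it does so by removing only the largest p-values, which is the source of the bias discussed in the text.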

three pairs of observations, namely $(X_{ij}, Y_{ij})$, $i = 1, \ldots, 50$; $j = 1, 2, 3$. Both $X_i = [X_{i1}, X_{i2}, X_{i3}]$ and $Y_i = [Y_{i1}, Y_{i2}, Y_{i3}]$ were independently drawn from multivariate normal with means 0 and an exchangeable covariance structure with correlation coefficient 0.3. Each $X_{ij}$ and $Y_{ij}$ was converted to a binary response by thresholding both $X$ and $Y$ at 0. The above mentioned WCR procedure on the binary responses was repeated 10,000 times to test the independence of the binary versions of $X$ and $Y$. Of the 10,000 p-values each obtained from the Pearson $\chi^2$ test, 302 were equal to 1.

First, we consider the simple adjustment of excluding the p-values equal to 1, and following the method of Follmann et al. (2003, Section 4.1) on the remaining p-values. The histograms of the 10,000 WCR p-values and of the 9698 finite inverse normal scores are displayed in Figure 1a and b, respectively. Because 302 observations with inverse standard normal transformed values equal to $\infty$ were excluded in Figure 1b, the distribution of the remaining observations becomes truncated and less variable. If one proceeds with the WCR procedure without these 302 observations, the mean and variance of the remaining 9698 transformed observations are equal to −0.136 and 0.895, respectively. Assuming each transformed observation is distributed standard normal, the estimated variance of the average of the 9698 transformed observations using the WCR variance formula to account for within cluster correlations is $1 - 0.895 = 0.105$. The final p-value is $\Phi(-0.136/\sqrt{0.105}) = 0.337$, where $\Phi$ is the standard normal cumulative distribution. However, this final p-value is not likely to be valid, because we have eliminated only the largest WCR p-values, which will decrease the mean of the transformed values and will also change the variance estimate. An alternative to this simple adjustment is to use some finite value for the WCR p-values equal to 1, but it is not clear what finite value would work well. The new test proposed in this article avoids this problem completely, and results in the p-value of 0.270 when applied to that simulated data set.

In this article, we formulate the above mentioned chi-square test and rank-based correlation estimators through a WCR-based U-statistic framework such that the established statistical properties for standard U-statistics of independent and identically distributed (iid) samples can be applied. Then through analytical averaging with respect to the re-sampling distribution given the observed data, new test statistics and estimators for clustered data are formed. Note that WCR analytical averaging has been applied to estimating equations by Williamson et al. (2003) and to rank-sum tests (e.g., the Wilcoxon test) by Datta and Satten (2005). Nevertheless, these approaches do not directly apply to the chi-square test and estimation of rank correlations considered here. Lorenz et al. (2011) addressed marginal association measures for clustered continuous data. They applied WCR analytical averaging to the estimation of Pearson’s correlation coefficient and Kendall’s $\tau$ for continuous clustered data. We expand that work by allowing for both continuous and discrete clustered data using U-statistics based on midranks, and studying WCR-based estimators for Spearman rank correlation, Kendall’s tau, Kendall’s tau b, and Goodman and Kruskal’s gamma.

The rest of the article is organized as follows. In Section 2, we express the chi-square test and four rank-based association measures for iid samples by U-statistics. In Section 3, we generalize these U-statistics to clustered data via WCR analytical averaging and present the asymptotic properties of these new tests and estimators. We evaluate the performance of the proposed new tests with simulations in Section 4, and apply them to a data set collected from a cancer imaging study in Section 5. We conclude the article with some discussion in Section 6.

2. Inference without Clustering

For this section, we let $X_i$ and $Y_i$, $i = 1, \cdots, n$, denote $n$ random samples of variables $X$ and $Y$, respectively. Let $F$ be the joint distribution of any pair $(X_i, Y_i)$, and let $F_X$ and $F_Y$ denote the marginal distributions of $X_i$ and $Y_i$, respectively. In the next subsections, we explore testing and estimating the association between the variables in different ways.

2.1. $\chi^2$ Test for Categorical Data

Consider the null hypothesis that $X$ and $Y$ are independent such that the joint distribution of $X$ and $Y$ is equal to the product of the marginal distributions expressed by

$$H_0: P(X_i \le x, Y_i \le y) = P(X_i \le x)P(Y_i \le y) = F_X(x)F_Y(y). \tag{1}$$

Assume $X_i$ take values on $1, \cdots, R$ and $Y_i$ take values on $1, \cdots, C$. Let the joint distribution of $X_i$ and $Y_i$ be denoted by $p_{jk} = P(X_i = j, Y_i = k)$ and the marginal distribution be denoted by $p_{j+} = P(X_i = j)$ and $p_{+k} = P(Y_i = k)$ for $j = 1, \cdots, R$, and $k = 1, \cdots, C$. The null hypothesis (1) is equivalent to $H_0: p_{jk} = p_{j+}p_{+k}$, $j = 1, \cdots, R-1$, $k = 1, \cdots, C-1$. We can rewrite $H_0$ in terms of covariances, after re-expressing $X_i$ and $Y_i$ as sets of indicator variables. Let $X_i^j = 1$ if $X_i = j$ and 0 otherwise, for $j = 1, \cdots, R$, and $Y_i^k = 1$ if $Y_i = k$ and 0 otherwise, for $k = 1, \cdots, C$. Thus, $p_{jk} = P(X_i^j = 1, Y_i^k = 1)$. The joint distribution of $(X_i^1, \cdots, X_i^R)$ and $(Y_i^1, \cdots, Y_i^C)$ is multinomial with a total of $RC - 1$ non-redundant multinomial joint probabilities $\{p_{jk}\}$. These joint probabilities $\{p_{jk}\}$ can be transformed to $(R-1)$ marginal probabilities $(p_{1+}, \cdots, p_{R-1,+})$ of $X_i$, $(C-1)$ marginal probabilities $(p_{+1}, \cdots, p_{+,C-1})$ of $Y_i$, and $(R-1) \times (C-1)$ covariances $\sigma_{jk} = \mathrm{Cov}(X_i^j, Y_i^k) = p_{jk} - p_{j+}p_{+k}$, for $j = 1, \cdots, R-1$, and $k = 1, \cdots, C-1$. So we can write the null hypothesis of independence as $H_0: \sigma_{jk} = 0$, $j = 1, \cdots, R-1$, $k = 1, \cdots, C-1$. The covariance parameter $\sigma_{jk}$ can be expressed as the covariance functional

$$\mathrm{Cov}(X^j, Y^k) = \int\!\!\int \frac{1}{2}(x_1 - x_2)(y_1 - y_2)\, dF^{jk}(x_1, y_1)\, dF^{jk}(x_2, y_2),$$

where $F^{jk}$ is the joint distribution of $X^j$ and $Y^k$. Then the corresponding U-statistic for the covariance functional is

$$U_{njk} = \binom{n}{2}^{-1} \sum_{(n,2)} \frac{1}{2}(X_{i_1}^j - X_{i_2}^j)(Y_{i_1}^k - Y_{i_2}^k) = \frac{1}{n-1}\sum_{i=1}^{n} (X_i^j - \bar X^j)(Y_i^k - \bar Y^k),$$

where $\sum_{(k,j)}$ denotes summation over all $\binom{k}{j}$ unique sets of $j$ elements, $(i_1, \cdots, i_j)$, of $\{1, \cdots, k\}$, ignoring order. The U-statistic $U_{njk}$ is associated with kernel $\psi^{jk} = \psi[(X_1^j, Y_1^k), (X_2^j, Y_2^k)] = \frac{1}{2}(X_1^j - X_2^j)(Y_1^k - Y_2^k)$ of degree 2 for $j = 1, \cdots, R-1$, $k = 1, \cdots, C-1$. Let $\theta$ denote the expectation of kernel $\psi$, and define $h^{(1)}$ as the residual kernel of the conditional expectation of $\psi$ with degree $k$ given by

$$h^{(1)}(x, y) = \psi_1(x, y) - \theta, \qquad \psi_1(x_1, y_1) = E[\psi\{(x_1, y_1), (X_2, Y_2), \cdots, (X_k, Y_k)\}]. \tag{2}$$

By the H-decomposition of U-statistics (see e.g., Lee (1990)),

$$\sqrt{n}(U_{njk} - \sigma_{jk}) = \frac{2}{\sqrt{n}} \sum_{i=1}^{n} h_{jk}^{(1)}(X_i^j, Y_i^k) + \sqrt{n}(R_n), \tag{3}$$

where $h_{jk}^{(1)}(x, y) = \psi_1^{jk}(x, y) - \sigma_{jk} = \frac{1}{2}[(x - p_{j+})(y - p_{+k}) - \sigma_{jk}]$ and the variance of $\sqrt{n}(R_n)$ is $O(n^{-1})$. Hence by Slutsky’s theorem, $\sqrt{n}(U_{njk} - \sigma_{jk})$ is asymptotically equivalent to $2n^{-1/2}\sum_{i=1}^{n} h_{jk}^{(1)}(X_i^j, Y_i^k)$ which has mean 0 and variance $4E[h_{jk}^{(1)}(X_1^j, Y_1^k)^2]$. Let $U_{n\chi^2}$ be an $(R-1)(C-1) \times 1$ vector with $[j + (k-1)C]$th element equal to $U_{njk}$ for $j = 1, \ldots, R-1$; $k = 1, \ldots, C-1$, and let $\Sigma$ denote the asymptotic variance-covariance of $\sqrt{n}U_{n\chi^2}$. By the multivariate central limit theorem, under $H_0$, $\sqrt{n}U_{n\chi^2}$ is asymptotically multivariate normal with mean 0 and variance–covariance matrix $\Sigma_0$ given by

$$\Sigma_0 = 4\,\mathrm{Cov}[h_{jk}^{(1)}(X_i^j, Y_i^k), h_{j'k'}^{(1)}(X_i^{j'}, Y_i^{k'})] =
\begin{cases}
p_{j+}(1 - p_{j+})\,p_{+k}(1 - p_{+k}) & \text{if } j = j',\ k = k' \\
-p_{j+}(1 - p_{j+})\,p_{+k}\,p_{+k'} & \text{if } j = j',\ k \ne k' \\
-p_{j+}\,p_{j'+}\,p_{+k}(1 - p_{+k}) & \text{if } j \ne j',\ k = k' \\
p_{j+}\,p_{j'+}\,p_{+k}\,p_{+k'} & \text{if } j \ne j',\ k \ne k'.
\end{cases} \tag{4}$$

Each entry is the product $\mathrm{Cov}(X_i^j, X_i^{j'})\,\mathrm{Cov}(Y_i^k, Y_i^{k'})$ of multinomial covariances, so the last case, a product of two negative covariances, is positive.

A consistent estimator $\hat\Sigma_0$ of $\Sigma_0$ is obtained by replacing $p_{j+}$ and $p_{+k}$ by their consistent estimators $\hat p_{j+} = \bar X^j$ and $\hat p_{+k} = \bar Y^k$. Because of the special structure of $\Sigma_0$, under $H_0$, $X^2 = nU_{n\chi^2}^T \hat\Sigma_0^{-1} U_{n\chi^2}$ has an asymptotic chi-squared distribution with $(R-1) \times (C-1)$ degrees of freedom (Bishop et al. (1975), p. 473). In the case of a $2 \times 2$ table,

$$X^2 = \frac{\left[\sum_i (X_i - \hat p_{1+})(Y_i - \hat p_{+1})/(n-1)\right]^2}{\hat p_{1+}(1 - \hat p_{1+})\hat p_{+1}(1 - \hat p_{+1})/n} \approx \frac{\left[\sum_i (X_i - \hat p_{1+})(Y_i - \hat p_{+1})\right]^2}{n\,\hat p_{1+}(1 - \hat p_{1+})\hat p_{+1}(1 - \hat p_{+1})},$$

where the right-hand side of the expression equals the Pearson chi-squared statistic (Bishop et al. (1975)). For $R > 2$ or $C > 2$, $X^2$ is asymptotically equivalent to the Pearson statistic due to the fact that $X^2$ is equal to scalar $(n-1)/n$ times the score test statistic (Lipsitz and Fitzmaurice (1996)) and that the Pearson statistic is asymptotically equivalent to the
likelihood-ratio test statistic (Agresti (1990), p. 49) and hence the score test statistic.

2.2. Spearman Rank Correlation

The Spearman rank correlation is the sample Pearson’s correlation $\rho$ applied to the midranks of the $X$’s and the midranks of the $Y$’s. Let the midrank of $X_i$ out of $\mathbf{X} = \{X_1, \ldots, X_n\}$ be $R_i$, and similarly let the midrank of $Y_i$ out of $\mathbf{Y}$ be $S_i$. Then Spearman’s rank correlation is defined as $r_s = \sum_i (R_i - \bar R)(S_i - \bar S) / \sqrt{\sum_i (R_i - \bar R)^2 \sum_j (S_j - \bar S)^2}$, where $\bar R$ and $\bar S$ are means. To express the Spearman rank correlation as a function of U-statistics, we write the midranks in terms of the empirical distributions $\hat F_X$ and $\hat F_Y$ as $R_i = n\{\frac{1}{2}\hat F_X(X_i) + \frac{1}{2}\hat F_X(X_i-)\} + 1/2 = n\hat{\dot F}_X(X_i) + 1/2$, and similarly $S_i = n\hat{\dot F}_Y(Y_i) + 1/2$, where $\hat{\dot F}_X$ and $\hat{\dot F}_Y$ are defined implicitly. Then it can be shown that

$$r_s \equiv \rho\big(\hat{\dot F}_X(X^*), \hat{\dot F}_Y(Y^*)\big) = \frac{\mathrm{Cov}_{X^*Y^*}\big\{\hat{\dot F}_X(X^*), \hat{\dot F}_Y(Y^*)\big\}}{\sqrt{\mathrm{Var}_{X^*}\big\{\hat{\dot F}_X(X^*)\big\}\,\mathrm{Var}_{Y^*}\big\{\hat{\dot F}_Y(Y^*)\big\}}} = \frac{\dot\sigma_{X^*Y^*}}{\sqrt{\dot\sigma_{X^*}^2 \dot\sigma_{Y^*}^2}}, \tag{5}$$

where $X^* \sim \hat F_X$, $Y^* \sim \hat F_Y$ and the joint distribution of $(X^*, Y^*)$ is the bivariate empirical distribution function. We define the parameter estimated by $r_s$ as

$$\rho_s \equiv \rho\big(\dot F_X(X), \dot F_Y(Y)\big) = \frac{\mathrm{Cov}_{XY}\big\{\dot F_X(X), \dot F_Y(Y)\big\}}{\sqrt{\mathrm{Var}_X\big\{\dot F_X(X)\big\}\,\mathrm{Var}_Y\big\{\dot F_Y(Y)\big\}}} = \frac{\dot\sigma_{XY}}{\sqrt{\dot\sigma_X^2 \dot\sigma_Y^2}}, \tag{6}$$

where $(X, Y) \sim F$, $\dot F_X(x) = \frac{1}{2}F_X(x) + \frac{1}{2}F_X(x-)$, and $\dot F_Y$ is defined similarly. If $F$ is continuous, $F \equiv \dot F$ and $E[F_X(X)] = E[F_Y(Y)] = 1/2$. When $F$ is discrete, we show in the supplementary material that the expected value of $\dot F_X(X)$ also equals 1/2. Consequently, $E_{X^*}[\hat{\dot F}_X(X^*)] = E_{Y^*}[\hat{\dot F}_Y(Y^*)] = 1/2$. The symmetric kernels $(\psi_\rho, \psi_{\rho X}, \psi_{\rho Y})$ for the three parameters $(\dot\sigma_{XY}, \dot\sigma_X^2, \dot\sigma_Y^2)$ all have degree 3 and are given by

$$\psi_\rho[(X_1, Y_1), (X_2, Y_2), (X_3, Y_3)] = \frac{1}{6}\sum_{(3)} K(X_{i_1}, X_{i_2})K(Y_{i_1}, Y_{i_3}) - 1/4, \tag{7}$$

$$\psi_{\rho X}(X_1, X_2, X_3) = \frac{1}{6}\sum_{(3)} K(X_{i_1}, X_{i_2})K(X_{i_1}, X_{i_3}) - 1/4, \tag{8}$$

$$\psi_{\rho Y}(Y_1, Y_2, Y_3) = \frac{1}{6}\sum_{(3)} K(Y_{i_1}, Y_{i_2})K(Y_{i_1}, Y_{i_3}) - 1/4, \tag{9}$$

where $\sum_{(k)}$ denotes summation over all $k!$ permutations $(i_1, \cdots, i_k)$ of $\{1, \cdots, k\}$, and $K(a, b) = \{I[b \le a] + I[b < a]\}/2$ for any $a, b$. The corresponding U-statistics $(U_{n\rho}, U_{n\rho X}, U_{n\rho Y})$ are unbiased for $(\dot\sigma_{XY}, \dot\sigma_X^2, \dot\sigma_Y^2)$, and by algebra $(U_{n\rho}, U_{n\rho X}, U_{n\rho Y}) = (\dot\sigma_{X^*Y^*}, \dot\sigma_{X^*}^2, \dot\sigma_{Y^*}^2) + O_p(1/n)$. It follows that $r_s$ is consistent for $\rho_s$ and has the same asymptotic distribution as $U_{n\rho}/\sqrt{U_{n\rho X}U_{n\rho Y}}$. Hence, the asymptotic distribution of $r_s$ can be derived through $(U_{n\rho}, U_{n\rho X}, U_{n\rho Y})$ which by the H-decomposition theory of U-statistics is established through the residual kernels $\mathbf{h}_\rho^{(1)}(X_i, Y_i) = \{h_\rho^{(1)}(X_i, Y_i), h_{\rho X}^{(1)}(X_i), h_{\rho Y}^{(1)}(Y_i)\}^T$, $i = 1, \cdots, n$, defined according to (2) and given by

$$h_\rho^{(1)}(x, y) = \frac{1}{3}\Big\{1 - \dot F_X(x) - \dot F_Y(y) + \dot F_X(x)\dot F_Y(y) + \int \dot F(x, y)\, dF_X(x) + \int \dot F(x, y)\, dF_Y(y)\Big\} - \frac{1}{4} - \dot\sigma_{XY}, \tag{10}$$

$$h_{\rho X}^{(1)}(x) = \frac{1}{3}\dot F_X(x)^2 + \frac{2}{3}\int \dot F_X(u)K(u, x)\, dF_X(u) - \frac{1}{4} - \dot\sigma_X^2, \tag{11}$$

$$h_{\rho Y}^{(1)}(y) = \frac{1}{3}\dot F_Y(y)^2 + \frac{2}{3}\int \dot F_Y(u)K(u, y)\, dF_Y(u) - \frac{1}{4} - \dot\sigma_Y^2. \tag{12}$$

It follows that $\sqrt{n}(U_{n\rho} - \dot\sigma_{XY}, U_{n\rho X} - \dot\sigma_X^2, U_{n\rho Y} - \dot\sigma_Y^2)^T$ is asymptotically multivariate normal with mean 0 and variance–covariance matrix $\Sigma_\rho = 9E\{\mathbf{h}_\rho^{(1)}(X_1, Y_1)\mathbf{h}_\rho^{(1)}(X_1, Y_1)^T\}$ which can be consistently estimated by its empirical counterpart with $\hat{\dot F}$ plugged in for $\dot F$. Let $\sigma_\rho^2$ denote the asymptotic variance of $\sqrt{n}(r_s - \rho_s)$. The delta method can be used to estimate $\sigma_\rho^2$. Alternatively, if the denominator of $r_s$ is treated as a known quantity, the delta method estimator simplifies to $\hat\sigma_\rho^2 = 9\sum_{i=1}^{n} \hat h_\rho^{(1)}(x_i, y_i)^2 / \{n\dot\sigma_{X^*}^2\dot\sigma_{Y^*}^2\}$. When $X$ and $Y$ are continuous, $\dot\sigma_X^2 = \dot\sigma_Y^2 = 1/12$ and $h_{\rho X}^{(1)}(x) = h_{\rho Y}^{(1)}(y) = 0$, $\forall x, y$, and $h_\rho^{(1)}(x, y)$, under $H_0$, is simplified to $h_{\rho 0}^{(1)}(x, y) = \frac{1}{3}\{F_X(x) - 1/2\}\{F_Y(y) - 1/2\}$ with $E[h_{\rho 0}^{(1)}(X, Y)^2] = 1/(9 \times 12^2)$. Correspondingly, $\sqrt{n}r_s$ converges to a zero-mean normal distribution with variance 1 under $H_0$, and independence of $X$ and $Y$ is rejected at the $\alpha$ significance level when $|\sqrt{n}r_s|$ exceeds $\Phi^{-1}(1 - \alpha/2)$. If either variable is discrete or ties are present, independence of $X$ and $Y$ is rejected when $|\sqrt{n}r_s/\hat\sigma_{\rho 0}|$ exceeds $\Phi^{-1}(1 - \alpha/2)$, where $\hat\sigma_{\rho 0}$ is the plug-in estimator of $\sigma_\rho$ evaluated under $H_0$.

2.3. Kendall’s Tau

Kendall’s concordance test is based on Kendall’s rank correlation (Kendall’s tau) which is defined by $\tau \equiv \tau\{(X_1, Y_1), (X_2, Y_2)\} = P\{(X_1 - X_2)(Y_1 - Y_2) > 0\} - P\{(X_1 - X_2)(Y_1 - Y_2) < 0\}$. The kernel for $\tau$ has degree 2 and equals $\psi_\tau((X_1, Y_1), (X_2, Y_2)) = \mathrm{sign}[(X_1 - X_2)(Y_1 - Y_2)]$. The estimator $\hat\tau$ is the corresponding U-statistic $U_{n\tau} = (\#\text{ of concordant pairs} - \#\text{ of discordant pairs})/\binom{n}{2}$. The residual kernel $h_\tau^{(1)}(x, y) = 1 - 2\dot F_X(x) - 2\dot F_Y(y) + 4\dot F(x, y) - \tau$ which under $H_0$ is given by $h_{\tau 0}^{(1)} = \{1 - 2\dot F_X(x)\}\{1 - 2\dot F_Y(y)\}$. When both $X$ and $Y$ are continuous, under $H_0$, $E[h_{\tau 0}^{(1)}(X, Y)^2] = 1/9$, and $\sqrt{n}\hat\tau$ converges to a zero-mean normal distribution with variance 4/9. Accordingly, independence of $X$ and $Y$ is rejected at the $\alpha$ significance level when $|3\sqrt{n}\hat\tau/2|$ exceeds $\Phi^{-1}(1 - \alpha/2)$. If either variable is discrete or ties are present, independence of $X$ and $Y$ is rejected when $|\sqrt{n}U_{n\tau}/\hat\sigma_{\tau 0}|$ exceeds $\Phi^{-1}(1 - \alpha/2)$, where $\hat\sigma_{\tau 0}$ is the plug-in estimator of $\sigma_\tau$ evaluated under $H_0$.

2.4. Kendall’s Tau b and Goodman and Kruskal’s Gamma

In the presence of ties, the maximal value of $\tau$ is less than one, because tied pairs in either variable are counted in the denominator but do not contribute to the numerator of the estimator. Two alternative measures were proposed to discount tied pairs, namely Kendall’s tau b and Goodman and Kruskal’s gamma (Agresti, 1990). Kendall’s tau b is defined by $\tau_b \equiv \tau_b\{(X_1, Y_1), (X_2, Y_2)\} = \tau/\{\sqrt{\omega_X}\sqrt{\omega_Y}\}$, where $\omega_X = P(X_1 \ne X_2)$ and $\omega_Y = P(Y_1 \ne Y_2)$. Let $X_{(1)} < X_{(2)} \cdots < X_{(R)}$ denote the $R$ distinct values of $X$ and $(n_{1X}, n_{2X}, \cdots, n_{RX})$ denote the number of observations at each of the $R$ distinct values. The kernel for $\omega_X$ has degree two and equals $\psi_{\omega X} = I[X_1 \ne X_2]$ with the corresponding U-statistic $U_{n\omega X} = 1 - \binom{n}{2}^{-1}\sum_{i=1}^{R}\binom{n_{iX}}{2}$. Let $Y_{(1)} < Y_{(2)} \cdots < Y_{(C)}$ denote the $C$ distinct values of $Y$, and $(n_{1Y}, n_{2Y}, \cdots, n_{CY})$, $\psi_{\omega Y}$ and $U_{n\omega Y}$ be similarly defined. The residual kernels are $h_{\omega X}^{(1)}(x) = P(X \ne x) - \omega_X$ and $h_{\omega Y}^{(1)}(y) = P(Y \ne y) - \omega_Y$. Write $\mathbf{h}_{\tau b}^{(1)}(X_i, Y_i)$ for $\{h_\tau^{(1)}(X_i, Y_i), h_{\omega X}^{(1)}(X_i), h_{\omega Y}^{(1)}(Y_i)\}^T$. It follows that $\sqrt{n}(U_{n\tau} - \tau, U_{n\omega X} - \omega_X, U_{n\omega Y} - \omega_Y)^T$ is asymptotically multivariate normal with mean 0 and variance–covariance matrix $\Sigma_{\tau b} = 4E\{\mathbf{h}_{\tau b}^{(1)}(X_1, Y_1)\mathbf{h}_{\tau b}^{(1)}(X_1, Y_1)^T\}$. Correspondingly, $\hat\tau_b = U_{n\tau}/\sqrt{U_{n\omega X}U_{n\omega Y}}$ is a consistent estimator for $\tau_b$, and its variance $\sigma_{\tau b}^2$ can be estimated in a similar fashion as $\sigma_\rho^2$ described in Section 2.2.

Goodman and Kruskal’s gamma is defined by $\gamma \equiv \gamma\{(X_1, Y_1), (X_2, Y_2)\} = \tau/\lambda$, where $\lambda = P(X_1 \ne X_2, Y_1 \ne Y_2)$. The corresponding U-statistic $U_{n\lambda}$ is associated with degree two kernel $\psi_\lambda\{(X_1, Y_1), (X_2, Y_2)\} = \psi_\tau\{(X_1, Y_1), (X_2, Y_2)\}^2$ and residual kernel $h_\lambda^{(1)}(x, y) = P(X \ne x, Y \ne y) - \lambda$. Similar to the U-statistics for $\tau_b$, $\sqrt{n}(U_{n\tau} - \tau, U_{n\lambda} - \lambda)^T$ converges to a multivariate normal distribution with mean 0 and variance–covariance matrix $\Sigma_\gamma = 4E\{\mathbf{h}_\gamma^{(1)}(X_1, Y_1)\mathbf{h}_\gamma^{(1)}(X_1, Y_1)^T\}$, where $\mathbf{h}_\gamma^{(1)}(X_i, Y_i) = \{h_\tau^{(1)}(X_i, Y_i), h_\lambda^{(1)}(X_i, Y_i)\}^T$. Correspondingly, the estimator $\hat\gamma = U_{n\tau}/U_{n\lambda}$ is consistent for $\gamma$ and its variance $\sigma_\gamma^2$ can be estimated in the same way as $\sigma_\rho^2$.

3. Inference with Clustering

Now consider the setting for clustered data. Let $(m_i, \mathbf{X}_i, \mathbf{Y}_i)$ denote the data for cluster $i$ for $i = 1, \cdots, n$, where $\mathbf{X}_i = (X_{i1}, \ldots, X_{im_i})^T$, $\mathbf{Y}_i = (Y_{i1}, \ldots, Y_{im_i})^T$ and $m_i$ is the number of observation pairs in cluster $i$. We assume that the data from different clusters are independent. We are interested in measuring and testing association between $X_{IJ_I}$ and $Y_{IJ_I}$, where $I$ denotes a randomly selected cluster from a discrete uniform distribution on the integers $1, \cdots, n$, and given $I = i$, $J_i$ denotes a random variable having a discrete uniform distribution on the integers $1, \cdots, m_i$. To decrease notational clutter, we define $(X_C, Y_C) \equiv (X_{IJ_I}, Y_{IJ_I})$. For the chi-square test for clustered data, we want to test that $\mathrm{cor}(X_C, Y_C) = 0$. We allow cluster size to be informative. For example, suppose $\mathrm{cor}(X_{ij}, Y_{ij} \mid m_i = 1) = -0.2$ for $i = 1, \cdots, n/2$ and $\mathrm{cor}(X_{ij}, Y_{ij} \mid m_i = 5) = 0.2$ for $i = n/2 + 1, \cdots, n$. Then $\mathrm{cor}(X_C, Y_C) = 0$, since the expected correlation of a randomly sampled response pair from a randomly sampled cluster would be $(1/2)(-0.2) + (1/2)(0.2) = 0$. In this case, $\mathrm{cor}(X_C, Y_C)$ is different from the correlation of a random response pair picked from the sample of all response pairs from all clusters, because of the informative cluster size. To test hypotheses such as $H_0: \mathrm{cor}(X_C, Y_C) = 0$, we apply the WCR procedure to the U-statistics described in Section 2 for iid samples. Through analytical WCR-averaging, we present new test statistics in closed form and interval estimation of rank correlations for clustered data.

3.1. WCR Average of U-Statistics

In this section, we present a general form of the WCR average of the U-statistics considered in Section 2 and show its asymptotic properties. Let $U_n$ denote a U-statistic of degree $k$ for iid samples with associated kernel $\psi\{(X_1, Y_1), \cdots, (X_k, Y_k)\}$. In each within-cluster re-sampled data set, a data point in cluster $i$ is selected at random from the uniform distribution on the integers $1, \cdots, m_i$. Let $U_n^{(q)}$ denote the U-statistic for the $q$-th re-sampled data set and $\bar U_n$ denote the average of $U_n^{(q)}$ from all $M_n = \prod_{i=1}^{n} m_i$ possible re-sampled data sets.

Lemma 1. The WCR average of $U_n^{(q)}$, denoted by $\bar U_n$, is also a U-statistic of degree $k$ based on kernel $\phi[(\mathbf{X}_1, \mathbf{Y}_1), \cdots, (\mathbf{X}_k, \mathbf{Y}_k)]$ for parameter $\theta = E(\psi)$, where

$$\phi[(\mathbf{X}_1, \mathbf{Y}_1), \cdots, (\mathbf{X}_k, \mathbf{Y}_k)] = \left(\prod_{i=1}^{k}\frac{1}{m_i}\right)\sum_{r_1=1}^{m_1}\cdots\sum_{r_k=1}^{m_k}\psi\{(X_{1r_1}, Y_{1r_1}), \cdots, (X_{kr_k}, Y_{kr_k})\}.$$

Lemma 2. Let $h^{(1)}, h^{(2)}, \cdots, h^{(k)}$ denote kernels of degrees $1, 2, \cdots, k$ defined recursively by the equations

$$h^{(1)}(x_1, y_1) = \psi_1(x_1, y_1) - \theta$$

and

$$h^{(c)}\{(x_1, y_1), \cdots, (x_c, y_c)\} = \psi_c\{(x_1, y_1), \cdots, (x_c, y_c)\} - \sum_{j=1}^{c-1}\sum_{(c,j)} h^{(j)}\{(x_{i_1}, y_{i_1}), \cdots, (x_{i_j}, y_{i_j})\} - \theta,$$

for $c = 2, 3, \cdots, k$, where $\psi_c\{(x_1, y_1), \cdots, (x_c, y_c)\} = E[\psi\{(X_{i_1r_1}, Y_{i_1r_1}), \cdots, (X_{i_kr_k}, Y_{i_kr_k})\} \mid (X_{i_lr_l} = x_l, Y_{i_lr_l} = y_l),\ l = 1, \cdots, c]$. Then kernels $\bar h^{(1)}, \cdots, \bar h^{(k)}$ with respect to $\phi$ are given by

$$\bar h^{(c)}\{(\mathbf{x}_1, \mathbf{y}_1), \cdots, (\mathbf{x}_c, \mathbf{y}_c)\} = \sum_{r_1=1}^{m_1}\cdots\sum_{r_c=1}^{m_c}\left(\prod_{j=1}^{c}\frac{1}{m_j}\right) h^{(c)}\{(x_{1r_1}, y_{1r_1}), \cdots, (x_{cr_c}, y_{cr_c})\}.$$
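The statement of Lemma 1 can be checked numerically on a small example. The sketch below is an illustration only, assuming the degree-2 Kendall kernel of Section 2.3 and a hypothetical toy data set; the function names are not from the article. It computes the WCR average in two ways: by enumerating every possible re-sampled data set and averaging $U_n^{(q)}$, and as a U-statistic over clusters with the averaged kernel $\phi$.

```python
from itertools import combinations, product


def sign(v):
    return (v > 0) - (v < 0)


def psi_tau(p1, p2):
    """Degree-2 kernel of Kendall's tau for two (x, y) pairs."""
    return sign((p1[0] - p2[0]) * (p1[1] - p2[1]))


def u_statistic(pairs, kernel):
    """U-statistic of degree 2: average of the kernel over all unordered pairs."""
    vals = [kernel(a, b) for a, b in combinations(pairs, 2)]
    return sum(vals) / len(vals)


def phi(cluster_a, cluster_b, kernel):
    """WCR-averaged kernel: mean of the kernel over all within-cluster selections."""
    total = sum(kernel(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def wcr_average_closed_form(clusters, kernel):
    """U-statistic over clusters with the averaged kernel phi."""
    vals = [phi(a, b, kernel) for a, b in combinations(clusters, 2)]
    return sum(vals) / len(vals)


def wcr_average_brute_force(clusters, kernel):
    """Average of U_n over every re-sampled data set (one pair per cluster)."""
    datasets = list(product(*clusters))
    return sum(u_statistic(d, kernel) for d in datasets) / len(datasets)


# Hypothetical toy data: each cluster is a list of (x, y) pairs.
clusters = [
    [(1, 2), (2, 1)],
    [(3, 3)],
    [(0, 1), (2, 2), (4, 0)],
    [(5, 4), (1, 1)],
]
closed = wcr_average_closed_form(clusters, psi_tau)
brute = wcr_average_brute_force(clusters, psi_tau)
print(closed, brute)  # the two values agree up to floating-point rounding
```

The brute-force route visits all $\prod_i m_i$ re-sampled data sets (12 here), which grows quickly with the number of clusters; the closed form only needs the pairwise cluster averages, which is the practical appeal of analytical averaging.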
The asymptotic distribution of $\bar U_n$ is established in the following theorem.

Theorem 1. Under the regularity condition that $\lim_{n\to\infty}\binom{n}{j}^{-1}\sum_{(n,j)}\mathrm{Var}[\bar h^{(j)}\{(\mathbf{X}_{i_1}, \mathbf{Y}_{i_1}), \cdots, (\mathbf{X}_{i_j}, \mathbf{Y}_{i_j})\}]$ converges to $\bar\sigma_j^2 \ge 0$, for $j = 1, \ldots, k$, $\sqrt{n}(\bar U_n - \theta)$ converges to a normal distribution with mean 0 and variance $k^2\bar\sigma_1^2$.

Proofs of Lemmas 1 and 2 and Theorem 1 are provided in the supplementary material.

3.2. WCR-Based Chi-Square Test

Analogously to Section 2.1, let random variables $X_C$ and $Y_C$ be redefined by a set of indicator random variables as $X_C^j = 1$ if $X_C = j$ and 0 otherwise, for $j = 1, \cdots, R$, and $Y_C^k = 1$ if $Y_C = k$ and 0 otherwise, for $k = 1, \cdots, C$. Thus $p_{jk} = P(X_C^j = 1, Y_C^k = 1)$, $p_{j+} = P(X_C^j = 1)$, $p_{+k} = P(Y_C^k = 1)$, and $\sigma_{jk} = p_{jk} - p_{j+}p_{+k}$. The WCR average of the kernel for the U-statistics associated with the $\chi^2$ test statistic equals

$$\phi[(\mathbf{X}_{i_1}^j, \mathbf{Y}_{i_1}^k), (\mathbf{X}_{i_2}^j, \mathbf{Y}_{i_2}^k)] = \frac{1}{m_{i_1}m_{i_2}}\sum_{r_1=1}^{m_{i_1}}\sum_{r_2=1}^{m_{i_2}}\psi[(X_{i_1r_1}^j, Y_{i_1r_1}^k), (X_{i_2r_2}^j, Y_{i_2r_2}^k)] = \frac{1}{m_{i_1}m_{i_2}}\sum_{r_1=1}^{m_{i_1}}\sum_{r_2=1}^{m_{i_2}}\frac{1}{2}(X_{i_1r_1}^j - X_{i_2r_2}^j)(Y_{i_1r_1}^k - Y_{i_2r_2}^k)$$

for $1 \le i_1 < i_2 \le n$. By algebra, the average of the U-statistic for $\sigma_{jk}$ with respect to the resampling distribution given the original data has the expression

$$\bar U_{njk} = \frac{1}{n-1}\sum_{i=1}^{n}\frac{1}{m_i}\sum_{l=1}^{m_i}(X_{il}^j - \hat p_{j+})(Y_{il}^k - \hat p_{+k}), \tag{13}$$

where $\hat p_{j+} = \frac{1}{n}\sum_{i=1}^{n}\bar X_i^j$, $\hat p_{+k} = \frac{1}{n}\sum_{i=1}^{n}\bar Y_i^k$, $\bar X_i^j = \frac{1}{m_i}\sum_{l=1}^{m_i}X_{il}^j$, $\bar Y_i^k = \frac{1}{m_i}\sum_{l=1}^{m_i}Y_{il}^k$, for $j = 1, \cdots, R-1$, $k = 1, \cdots, C-1$. By Lemma 2, the residual kernel $\bar h_{jk}^{(1)}(\mathbf{x}, \mathbf{y})$ for $\mathbf{x} = (x_1, \cdots, x_m)$ and $\mathbf{y} = (y_1, \cdots, y_m)$ equals

$$\bar h_{jk}^{(1)}(\mathbf{x}, \mathbf{y}) = \frac{1}{m}\sum_{l=1}^{m}h_{jk}^{(1)}(x_l, y_l) = \frac{1}{2m}\sum_{l=1}^{m}(x_l^j - p_{j+})(y_l^k - p_{+k}) - \frac{\sigma_{jk}}{2}.$$

Write $\bar{\mathbf{h}}_i^{(1)}$ for $[\bar h_{11}^{(1)}(\mathbf{X}_i^1, \mathbf{Y}_i^1), \ldots, \bar h_{R-1,C-1}^{(1)}(\mathbf{X}_i^{R-1}, \mathbf{Y}_i^{C-1})]^T$ and $\bar U_{n\chi^2}$ for $[\bar U_{n11}, \cdots, \bar U_{n,R-1,C-1}]^T$. Then $E(\bar{\mathbf{h}}_i^{(1)}) = 0$ and $\mathrm{Var}(\bar{\mathbf{h}}_i^{(1)}) = E(\bar{\mathbf{h}}_i^{(1)}\bar{\mathbf{h}}_i^{(1)T})$. By Theorem 1, under $H_0$, $\sqrt{n}\bar U_{n\chi^2}$ has an asymptotic multivariate normal distribution with mean 0 and variance–covariance $\bar\Sigma = \lim_{n\to\infty}4n^{-1}\sum_{i=1}^{n}\mathrm{Var}(\bar{\mathbf{h}}_i^{(1)})$. A consistent estimator $\hat{\bar\Sigma}$ of $\bar\Sigma$ is obtained by replacing $p_{j+}$, $p_{+k}$, and the $\sigma_{jk}$’s in $\bar\Sigma$ by $\hat p_{j+}$, $\hat p_{+k}$, and 0 for $j = 1, \cdots, R-1$, $k = 1, \cdots, C-1$. Note that since $\hat p_{j+}$ and $\hat p_{+k}$ are the averages of cluster specific proportions rather than the pooled estimators, the condition $\sum_{i=1}^{n}(m_i/\sum_j m_j)^2 \to 0$ as $n \to \infty$, which is required for the pooled estimators to be consistent, is not required here. Because of this consistency, using similar arguments to Section 2.1, it follows that under $H_0$, $n\bar U_{n\chi^2}^T\hat{\bar\Sigma}^{-1}\bar U_{n\chi^2}$ has an asymptotic $\chi^2$ distribution with $(R-1)(C-1)$ degrees of freedom.

3.3. WCR-Based Spearman Rank Correlation

Let $(X_C, Y_C) \sim F_C$ with marginal distributions $F_{X_C}$ and $F_{Y_C}$. Then the parameter of interest is given by equation (6), except with $X$ and $Y$ replaced by $X_C$ and $Y_C$ and the midrank adjusted distributions defined accordingly. In other words, we generalize $\rho_s$ to clustered data using $\rho_s \equiv \rho(\dot F_{X_C}, \dot F_{Y_C})$, where $\dot F_{X_C}(x) = \frac{1}{2}F_{X_C}(x) + \frac{1}{2}F_{X_C}(x-)$ and analogously define $\dot F_{Y_C}(y)$. The WCR averages of the kernels for the three U-statistics associated with the three parameters in $\rho_s$ are given by

$$\phi_\rho\{(\mathbf{X}_1, \mathbf{Y}_1), (\mathbf{X}_2, \mathbf{Y}_2), (\mathbf{X}_3, \mathbf{Y}_3)\} = \sum_{(3)}\frac{1}{6m_{i_1}}\sum_{l=1}^{m_{i_1}}\hat{\dot F}_{i_2X}(X_{i_1l})\hat{\dot F}_{i_3Y}(Y_{i_1l}) - \frac{1}{4}, \tag{14}$$

$$\phi_{\rho X}(\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_3) = \sum_{(3)}\frac{1}{6m_{i_1}}\sum_{l=1}^{m_{i_1}}\hat{\dot F}_{i_2X}(x_{i_1l})\hat{\dot F}_{i_3X}(x_{i_1l}) - 1/4,$$

and $\phi_{\rho Y}(\mathbf{Y}_1, \mathbf{Y}_2, \mathbf{Y}_3)$ is defined similarly, where $\hat{\dot F}_{iX}(x) = m_i^{-1}\sum_{l=1}^{m_i}K(x, x_{il})$ and $\hat{\dot F}_{iY}(y) = m_i^{-1}\sum_{l=1}^{m_i}K(y, y_{il})$ are the midrank adjustments to the empirical distributions of $X$ and $Y$ for cluster $i$. Similarly define the midrank adjustments to the empirical joint distribution of $X$ and $Y$ for cluster $i$ by $\hat{\dot F}_i(x, y) = m_i^{-1}\sum_{l=1}^{m_i}K(x, x_{il})K(y, y_{il})$. Let $\hat{\dot F}_C(x, y) = n^{-1}\sum_i\hat{\dot F}_i(x, y)$, $\hat{\dot F}_{X_C}(x) = n^{-1}\sum_i\hat{\dot F}_{iX}(x)$ and $\hat{\dot F}_{Y_C}(y) = n^{-1}\sum_i\hat{\dot F}_{iY}(y)$ denote the averages of cluster-specific midrank-adjusted joint and marginal empirical distributions of $X$ and $Y$, respectively. Then Pearson’s correlation applied to $\hat{\dot F}_{X_C}(X_C^*)$ and $\hat{\dot F}_{Y_C}(Y_C^*)$ is given by

$$\bar r_s = \frac{\mathrm{Cov}_{X_C^*Y_C^*}\{\hat{\dot F}_{X_C}(X_C^*), \hat{\dot F}_{Y_C}(Y_C^*)\}}{\sqrt{\mathrm{Var}_{X_C^*}\{\hat{\dot F}_{X_C}(X_C^*)\}\,\mathrm{Var}_{Y_C^*}\{\hat{\dot F}_{Y_C}(Y_C^*)\}}} = \frac{n^{-1}\sum_i m_i^{-1}\sum_{l=1}^{m_i}\hat{\dot F}_{X_C}(X_{il})\hat{\dot F}_{Y_C}(Y_{il}) - 1/4}{\sqrt{\left[n^{-1}\sum_i m_i^{-1}\sum_{l=1}^{m_i}\hat{\dot F}_{X_C}(X_{il})^2 - 1/4\right]\left[n^{-1}\sum_i m_i^{-1}\sum_{l=1}^{m_i}\hat{\dot F}_{Y_C}(Y_{il})^2 - 1/4\right]}},$$

where $X_C^* \sim \hat F_{X_C}$, $Y_C^* \sim \hat F_{Y_C}$ and $(X_C^*, Y_C^*) \sim \hat F_C$ with $\hat F_{X_C}$, $\hat F_{Y_C}$ and $\hat F_C$ denoting the averages of cluster-specific empirical distribution functions. Similar to the iid single pair data shown in Section 2.2, it can be shown that $\bar r_s$ is asymptotically equivalent to $\bar U_{n\rho}/\sqrt{\bar U_{n\rho X}\bar U_{n\rho Y}}$, and by Lemma 2 and Theorem 1, $\sqrt{n}(\bar U_{n\rho} - \dot\sigma_{XY}, \bar U_{n\rho X} - \dot\sigma_X^2, \bar U_{n\rho Y} - \dot\sigma_Y^2)$ converges to a multivariate normal distribution with mean 0 and variance–covariance matrix $\bar\Sigma_\rho = \lim_{n\to\infty}n^{-1}\sum_i 9E\{\bar{\mathbf{h}}_\rho^{(1)}(\mathbf{X}_i, \mathbf{Y}_i)\bar{\mathbf{h}}_\rho^{(1)}(\mathbf{X}_i, \mathbf{Y}_i)^T\}$ which can be consistently estimated by its empirical counterpart with $\hat{\dot F}_{X_C}$, $\hat{\dot F}_{Y_C}$, and $\hat{\dot F}_C$ plugged in for $\dot F_{X_C}$, $\dot F_{Y_C}$, and $\dot F_C$, respectively. It follows that $\sqrt{n}(\bar r_s - \rho_s)$ is asymptotically normal. The variance estimator can be obtained in ways similar to those for iid samples described in Section 2.2.
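As a rough illustration of the plug-in form of $\bar r_s$, the sketch below evaluates the cluster-averaged midrank-adjusted distribution functions and then the correlation under the distribution that first draws a cluster uniformly and then a pair within it. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`K`, `averaged_midrank_cdf`, `wcr_spearman`) are hypothetical, the data are toy values, and no variance estimate is computed.

```python
import math


def K(a, b):
    """Midrank comparison kernel K(a, b) = {I[b <= a] + I[b < a]} / 2."""
    return ((b <= a) + (b < a)) / 2


def averaged_midrank_cdf(cluster_values):
    """Return the function x -> n^{-1} sum_i m_i^{-1} sum_l K(x, x_il)."""
    n = len(cluster_values)

    def cdf(x):
        return sum(sum(K(x, v) for v in vals) / len(vals)
                   for vals in cluster_values) / n

    return cdf


def wcr_spearman(clusters):
    """Plug-in WCR-averaged Spearman correlation for clusters of (x, y) pairs."""
    n = len(clusters)
    fx = averaged_midrank_cdf([[x for x, _ in c] for c in clusters])
    fy = averaged_midrank_cdf([[y for _, y in c] for c in clusters])

    def wcr_mean(g):
        # Expectation when a cluster, then a pair within it, is drawn uniformly.
        return sum(sum(g(x, y) for x, y in c) / len(c) for c in clusters) / n

    cov = wcr_mean(lambda x, y: fx(x) * fy(y)) - 0.25
    var_x = wcr_mean(lambda x, y: fx(x) ** 2) - 0.25
    var_y = wcr_mean(lambda x, y: fy(y) ** 2) - 0.25
    return cov / math.sqrt(var_x * var_y)


# With one pair per cluster the estimator reduces to the usual Spearman correlation.
singletons = [[(1, 1)], [(2, 3)], [(3, 2)], [(4, 4)]]
print(wcr_spearman(singletons))

# Hypothetical clustered data with unequal cluster sizes and ties.
clustered = [[(1, 2), (2, 2)], [(3, 1)], [(0, 0), (2, 3), (4, 3)], [(5, 4), (1, 1)]]
print(wcr_spearman(clustered))
```

The subtraction of 1/4 relies on the cluster-averaged midrank-adjusted distribution functions having mean exactly 1/2 under the resampling distribution, which follows from $K(a, b) + K(b, a) = 1$.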
3.4. WCR-Based Kendall’s Tau, Tau b, and Goodman and Kruskal’s Gamma

We define $\tau$, $\tau_b$, $\gamma$ for the cluster situation by substituting $F_C$ for $F$ in their definitions. The WCR average of the kernel for the U-statistics associated with $\tau$ is given by

$$\phi_\tau\{(\mathbf{X}_1, \mathbf{Y}_1), (\mathbf{X}_2, \mathbf{Y}_2)\} = m_1^{-1}m_2^{-1}\sum_{j=1}^{m_1}\sum_{k=1}^{m_2}\mathrm{sign}\{(x_{1j} - x_{2k})(y_{1j} - y_{2k})\}.$$

The WCR average of Kendall’s tau estimator, denoted by $\bar\tau$, is the corresponding U-statistic given by $\bar U_{n\tau} = \binom{n}{2}^{-1}\sum_{(n,2)}\phi_\tau\{(\mathbf{X}_{i_1}, \mathbf{Y}_{i_1}), (\mathbf{X}_{i_2}, \mathbf{Y}_{i_2})\}$. By Lemma 2 and Theorem 1, $\sqrt{n}(\bar\tau - \tau)$ is asymptotically normal with mean 0 and variance $\lim_{n\to\infty}4n^{-1}\sum_{i=1}^{n}E[\bar h_\tau^{(1)}(\mathbf{X}_i, \mathbf{Y}_i)^2]$.

The WCR averages of the kernels associated with $\omega_X$ and $\omega_Y$ in the denominator of $\tau_b$ are given by $\phi_{\omega X}(\mathbf{X}_1, \mathbf{X}_2) = m_1^{-1}m_2^{-1}\sum_{j=1}^{m_1}\sum_{k=1}^{m_2}I[X_{1j} \ne X_{2k}]$ and $\phi_{\omega Y}(\mathbf{Y}_1, \mathbf{Y}_2) = m_1^{-1}m_2^{-1}\sum_{j=1}^{m_1}\sum_{k=1}^{m_2}I[Y_{1j} \ne Y_{2k}]$. It can be shown that the WCR average of Kendall’s tau b estimator, denoted by $\bar\tau_b$, is asymptotically equivalent to $\bar\tau/\sqrt{\bar U_{n\omega X}\bar U_{n\omega Y}}$, where $\bar U_{n\omega X}$ and $\bar U_{n\omega Y}$ are U-statistics associated with kernels $\phi_{\omega X}$ and $\phi_{\omega Y}$, respectively. It follows that $\sqrt{n}(\bar U_{n\tau} - \tau, \bar U_{n\omega X} - \omega_X, \bar U_{n\omega Y} - \omega_Y)^T$ is asymptotically multivariate normal with mean 0 and variance–covariance matrix $\bar\Sigma_{\tau b} = \lim_{n\to\infty}4n^{-1}\sum_i E\{\bar{\mathbf{h}}_{\tau b}^{(1)}(\mathbf{X}_i, \mathbf{Y}_i)\bar{\mathbf{h}}_{\tau b}^{(1)}(\mathbf{X}_i, \mathbf{Y}_i)^T\}$.

The WCR average of the kernel associated with $\lambda \equiv P(X_{C1} \ne X_{C2}, Y_{C1} \ne Y_{C2})$ in the denominator of $\gamma$ is given by $\phi_\lambda\{(\mathbf{X}_1, \mathbf{Y}_1), (\mathbf{X}_2, \mathbf{Y}_2)\} = m_1^{-1}m_2^{-1}\sum_{j=1}^{m_1}\sum_{k=1}^{m_2}\psi_\lambda\{(X_{1j}, Y_{1j}), (X_{2k}, Y_{2k})\}$. As for $\tau_b$, the WCR average of the gamma estimator, denoted by $\bar\gamma$, is asymptotically equivalent to $\bar\tau/\bar U_{n\lambda}$. It follows that $\sqrt{n}(\bar U_{n\tau} - \tau, \bar U_{n\lambda} - \lambda)^T$ is asymptotically multivariate normal with mean 0 and variance–covariance matrix $\bar\Sigma_\gamma = \lim_{n\to\infty}4n^{-1}\sum_i E\{\bar{\mathbf{h}}_\gamma^{(1)}(\mathbf{X}_i, \mathbf{Y}_i)\bar{\mathbf{h}}_\gamma^{(1)}(\mathbf{X}_i, \mathbf{Y}_i)^T\}$. Test procedures and inference for $\tau$, $\tau_b$, and $\gamma$ follow those for

Rank correlations $\rho_s$, $\tau$, $\tau_b$, and $\gamma$ were calculated from the discrete joint distribution and displayed in the title of each figure. For each of the five tests studied in this article, we evaluated the performance of three different tests: naive test, single point per cluster (SPPC) test, and proposed WCR-averaged (WCRA) test. In the naive test, clustering was ignored and all the data points were treated as iid samples. In the SPPC test, the first data point in each cluster was selected and used to calculate the test statistic. Both the naive and SPPC tests are based on the U-statistics of iid samples presented in Section 2. For the rank-based tests and estimators, in addition to making inferences by comparing the standardized statistics to a standard normal distribution, we also compared them to the t-distribution with $n - 2$ degrees of freedom as recommended by Kendall and Stuart (1972) for small and moderate sample size. The simulations were repeated 10,000 times.

Proportions of rejections of the $\chi^2$ test at the 5% level are presented in the top panels of Figure 2. When $X_{ij}$ and $Y_{ij}$ are uncorrelated ($\rho_1 = 0$), the intra-cluster correlation ($\rho_2$) equals 0, and the cluster size is non-informative, all three tests control the type-I error rate well. When $\rho_2 > 0$ and cluster size is not constant (scenarios b and c for $m_i$), the type-I error is inflated for the naive approach, whereas the other two approaches maintain the type-I error rate. Since the SPPC only used one data point in each cluster, it has much lower power than the WCRA approach, even when WCRA is conservative under informative cluster size (scenario c for $m_i$).

For moderate sample size (e.g., $n = 50$), the distributions of rank-based estimators tend to have longer tails than that of a normal distribution. As a result, the type-I error of the Z-test is slightly inflated (results not shown), whereas the t-test controls the type-I error rate well and hence is recommended. In addition, for Spearman rank correlation, tau b, and gamma, as the coverage probability using the standard error based on fixed denominator is more conservative than that based on the delta method (supplementary material), the former is recommended. Simulation results based on the t-test and fixed denominator are presented in the lower two panels of Figure 2 for the pro-
un-clustered paired samples. portion of rejections and Figures 3–5 for rank correlation
estimate, coverage probability of 95% confidence interval, and
4. Simulation Study mean width of 95% confidence interval, respectively. In the
We conducted a simulation study to evaluate the perfor- presence of intra-cluster correlation (ρ2 > 0) and informative
mance of the WCR-averaged tests and estimation of rank cluster size, the naive test for the Spearman rank correlation
correlations presented in Section 3. Clustered paired observa- has inflated type-I error, the naive estimator of ρs is biased
tions (Xij , Yij ) were generated from a mean-zero multivariate under informative cluster size, and the coverage probability of
normal distribution according to the following scenarios: (1) the 95% confidence interval is anti-conservative. Both SPPC
ρ1 = cor(Xij , Yij ) = 0, 0.2, 0.4, or 0.6; (2) ρ2 = cor(Xij , Xij ) = and WCRA control the type-I error well, and their associated
cor(Yij , Yij ) = 0 or 0.3; (3) three types of cluster sizes mi : (a) estimators have little bias. The coverage probability using
two per cluster, (b) mi = 1 in half of the clusters, and five WCRA is slightly conservative when ρs = 0.55 in which case
in the other half of the clusters, (c)mi = 1 if bi = −0.1 and the coverage probability with the standard error based on the
5 if bi = 0.1, where bi = −0.1 in half of the clusters and 0.1 delta method (see supplementary material) is closer to the
in the other half of the clusters, and cor(Xij , Yij ) = ρ1 + bi ; 95%. As expected, WCRA has higher power and shorter width
(4) number of clusters n = 50, 100, 200. For the chi-square of the 95% confidence interval than SPPC. Similar findings are
test, the simulated data were formed into a 2 × 2 by thresh- observed for τ, τb and γ. By treating the denominator of τ̄b ,
olding each variable at 0. For the rank tests, to allow for and γ̄ as fixed, the test statistics for τ, τb , and γ are equal, and
ties, the continuous data generated by the above scenar- it is interesting to observe that the corresponding coverage
ios were discretized into integers 1–9 by the cut-off values probabilities are very close to the 95% nominal level and more
(−∞, −3, −2, −1, −0.5, 0.5, 1, 2, 3, ∞) for both X and Y . The conservative than those based on the delta method (see sup-
bivariate distribution of discretized X and Y were determined plementary material). We also generated random data from
by calculating the joint probability in each of the 9 × 9 cells. multivariate t-distribution and with the within-cluster corre-
8 Biometrics

Figure 2. Proportion of rejections at the 5% significance level based on 10,000 simulations. Solid line: Naive, dashed line:
SPPC, dotted line: WCRA, and horizontal solid line: reference line. Description of the 18 scenarios given in the x-axis label.
This figure appears in color in the electronic version of this article.

lation of repeated measures of the same variable depending established but its routine clinical use for monitoring metas-
|j−k|
on their distance specified as ρ2 for scenarios with cluster tases of prostate cancer patients was not evaluated until
size greater than 2. Similar simulation results were observed recently (Apolo et al., 2016). In that study, the ability of
and presented in the supplementary material. Na18 F PET/CT to detect and monitor bone metastases activ-
In summary, the proposed WCRA testing and estimating ity over time was assessed in 60 patients all of whom had
approach outperforms the naive approach by maintaining the advanced prostate cancer and had received primary definitive
type-I error rate and achieving the proper coverage proba- therapy. Among them, 30 had known bone metastases by con-
bility, and outperforms the SPPC approach by having better ventional imaging (metastatic group), and the other 30 men
power. did not have known bone metastases by conventional imaging
(non-metastatic group) but were at high-risk for metastases
5. Example based on a rising level of prostate-specific antigen (PSA). The
Prostate cancer is the most common noncutaneous malig- purpose of accruing the two groups of patients was to assess
nancy in men in the United States. Although most patients the ability of the new tracer to detect both easy to find lesions
are diagnosed with localized disease and have an excellent such as those found by the conventional imaging and hard
prognosis, some patients will develop metastases. The abil- to detect lesions such as those missed by the conventional
ity of Na18 F PET/CT to detect skeletal metastases had been imaging. All the patients underwent Na18 F PET/CT at base-
Pearson’s Chi-square Test and Rank Correlation Inferences for Clustered Data 9

ρs = 0 ρs = 0.18 ρs = 0.36 ρs = 0.55

0.06

0.6
0.4
Monte Carlo mean of rs

Monte Carlo mean of rs

Monte Carlo mean of rs

Monte Carlo mean of rs


0.20

0.3
0.04

0.4
0.2
0.10
0.02

0.2
0.1
0.00

0.00

0.0

0.0
a b c a b c a b c a b c a b c a b c mi a b c a b c a b c a b c a b c a b c mi a b c a b c a b c a b c a b c a b c mi a b c a b c a b c a b c a b c a b c mi
0 0.3 0 0.3 0 0.3 ρ2 0 0.3 0 0.3 0 0.3 ρ2 0 0.3 0 0.3 0 0.3 ρ2 0 0.3 0 0.3 0 0.3 ρ2
50 100 200 n 50 100 200 n 50 100 200 n 50 100 200 n

τ=0 τ = 0.11 τ = 0.23 τ = 0.36


0.00 0.01 0.02 0.03 0.04

0.15

0.4
Monte Carlo mean of τ

Monte Carlo mean of τ

Monte Carlo mean of τ

Monte Carlo mean of τ


0.20

0.3
0.10

0.2
0.10
0.05

0.1
0.00

0.00

0.0
a b c a b c a b c a b c a b c a b c mi a b c a b c a b c a b c a b c a b c mi a b c a b c a b c a b c a b c a b c mi a b c a b c a b c a b c a b c a b c mi
0 0.3 0 0.3 0 0.3 ρ2 0 0.3 0 0.3 0 0.3 ρ2 0 0.3 0 0.3 0 0.3 ρ2 0 0.3 0 0.3 0 0.3 ρ2
50 100 200 n 50 100 200 n 50 100 200 n 50 100 200 n

τb = 0 τb = 0.15 τb = 0.3 τb = 0.46


0.00 0.05 0.10 0.15 0.20
Monte Carlo mean of τb

Monte Carlo mean of τb

Monte Carlo mean of τb

Monte Carlo mean of τb


0.30
0.04

0.4
0.20
0.02

0.2
0.10
0.00

0.00

a b c a b c a b c a b c a b c a b c mi
0 0.3 0 0.3 0 0.3 ρ2
a b c a b c a b c a b c a b c a b c mi
0 0.3 0 0.3 0 0.3 ρ2
a b c a b c a b c a b c a b c a b c mi
0 0.3 0 0.3 0 0.3 ρ2 0.0 a b c a b c a b c a b c a b c a b c mi
0 0.3 0 0.3 0 0.3 ρ2
50 100 200 n 50 100 200 n 50 100 200 n 50 100 200 n

γ=0 γ = 0.19 γ = 0.39 γ = 0.59


0.06

0.6
0.4
Monte Carlo mean of γ

Monte Carlo mean of γ

Monte Carlo mean of γ

Monte Carlo mean of γ


0.20
0.04

0.3

0.4
0.2
0.10
0.02

0.2
0.1
0.00

0.00

0.0

0.0

a b c a b c a b c a b c a b c a b c mi a b c a b c a b c a b c a b c a b c mi a b c a b c a b c a b c a b c a b c mi a b c a b c a b c a b c a b c a b c mi
0 0.3 0 0.3 0 0.3 ρ2 0 0.3 0 0.3 0 0.3 ρ2 0 0.3 0 0.3 0 0.3 ρ2 0 0.3 0 0.3 0 0.3 ρ2
50 100 200 n 50 100 200 n 50 100 200 n 50 100 200 n

Figure 3. Monte–Carlo means of rank-based estimators (using standard errors based on fixed denominators). Solid line:
Naive, thick gray line: SPPC, dotted line: WCRA, and horizontal solid line: reference line. This figure appears in color in the
electronic version of this article.

line, 6 months and 12 months, and standardized uptake values line and 6-months SUV was a secondary outcome and used to
(SUV) of lesions identified on Na18 F PET/CT at each time illustrate the proposed methodology.
point were recorded. For the new tracer to be useful in moni- Patients with lesions identified in both time points were
toring tumor activity, repeated measures of SUV need to show included in the analysis. A total of 362 lesions from 36 patients
substantial positive correlation. Correlation between the base- were identified with each patient having at least two lesions
[Figure 4 panels: one row per estimator (rs, τ, τb, γ) and one column per true value (ρs = 0, 0.18, 0.36, 0.55; τ = 0, 0.11, 0.23, 0.36; τb = 0, 0.15, 0.3, 0.46; γ = 0, 0.19, 0.39, 0.59). y-axis: coverage probability of the 95% CI (%); x-axis: cluster-size type mi (a, b, c), ρ2 (0, 0.3), and n (50, 100, 200).]

Figure 4. Coverage probability of 95% confidence interval of rank-based estimator (using standard errors based on fixed
denominators). Solid line: Naive, thick gray line: SPPC, dotted line: WCRA, and horizontal solid line: reference line. This
figure appears in color in the electronic version of this article.
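The WCRA intervals whose coverage is summarized in Figure 4 are built around WCR-averaged estimates. As an illustration only, the following Python sketch computes the WCR-averaged Kendall's tau of Section 3.4 as a U-statistic over pairs of clusters. The function names (`phi_tau`, `wcr_tau`) are hypothetical, and the standard error is an assumed plug-in version of the first-order projection term in the asymptotic variance; it is not taken from the authors' R code.

```python
from itertools import combinations
from math import sqrt


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def phi_tau(c1, c2):
    """WCR-averaged tau kernel for two clusters.

    Each cluster is a list of (x, y) pairs. The kernel averages
    sign{(x1j - x2k)(y1j - y2k)} over all m1 * m2 cross-cluster pairs.
    """
    total = sum(sign((x1 - x2) * (y1 - y2)) for x1, y1 in c1 for x2, y2 in c2)
    return total / (len(c1) * len(c2))


def wcr_tau(clusters):
    """Return (tau_bar, se) for a list of at least two clusters."""
    n = len(clusters)
    k = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        k[a][b] = k[b][a] = phi_tau(clusters[a], clusters[b])
    # U-statistic: average kernel value over all unordered cluster pairs.
    tau_bar = sum(k[a][b] for a, b in combinations(range(n), 2)) / (n * (n - 1) / 2)
    # Assumed plug-in for the projection of cluster i: its mean kernel
    # value against the other clusters, centred at tau_bar.
    h1 = [sum(k[i]) / (n - 1) - tau_bar for i in range(n)]
    # var(tau_bar) is approximated by 4 * n**-2 * sum(h1_i ** 2).
    se = sqrt(4.0 * sum(h * h for h in h1)) / n
    return tau_bar, se


if __name__ == "__main__":
    # Hypothetical toy data: three clusters of discretized (x, y) pairs.
    toy = [[(1, 2), (3, 3)], [(2, 1)], [(5, 4), (4, 5), (6, 6)]]
    print(wcr_tau(toy))
```

Because every cross-cluster pair is weighted by the inverse of $m_1 m_2$, each pair of clusters contributes equally to the estimate regardless of cluster size, which is the property that distinguishes the WCRA estimate from the naive pooled one.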
[Figure 5 panels: one row per estimator (rs, τ, τb, γ) and one column per true value (ρs = 0, 0.18, 0.36, 0.55; τ = 0, 0.11, 0.23, 0.36; τb = 0, 0.15, 0.3, 0.46; γ = 0, 0.19, 0.39, 0.59). y-axis: Monte Carlo mean width of the 95% CI; x-axis: cluster-size type mi (a, b, c), ρ2 (0, 0.3), and n (50, 100, 200).]

Figure 5. Mean width of 95% confidence interval of rank-based estimator (using standard errors based on fixed denomina-
tors). Solid line: Naive, dashed line: SPPC, and dotted line: WCRA.
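The scenarios on the x-axes of the simulation figures (cluster-size type a, b, or c; ρ2; n) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the text specifies only cor(Xij, Yij) and the within-cluster correlation ρ2, so the shared-effect construction below, including the cross-correlation it implies between Xij and Yik, is an assumption, and the names `simulate_clusters`, `correlated_pair`, and `discretize` are hypothetical.

```python
import random
from bisect import bisect_right
from math import sqrt

# Interior cut-off values from Section 4; they define nine categories.
CUTS = [-3, -2, -1, -0.5, 0.5, 1, 2, 3]


def discretize(v):
    """Map a continuous value to an integer category 1-9."""
    return bisect_right(CUTS, v) + 1


def correlated_pair(rng, r):
    """Draw two standard normal values with correlation r."""
    u, w = rng.gauss(0, 1), rng.gauss(0, 1)
    return u, r * u + sqrt(1 - r * r) * w


def simulate_clusters(n, rho1, rho2, size_type, rng):
    """Generate n clusters of discretized (x, y) pairs.

    size_type "a": two per cluster; "b": one or five per cluster;
    "c": as "b", but the size is tied to b_i = -0.1 or 0.1, which
    shifts cor(X, Y) to rho1 + b_i (informative cluster size).
    """
    clusters = []
    for i in range(n):
        first_half = i < n // 2
        b = 0.0
        if size_type == "a":
            m = 2
        else:
            m = 1 if first_half else 5
            if size_type == "c":
                b = -0.1 if first_half else 0.1
        r = rho1 + b
        # Assumed construction: a shared cluster effect carries the
        # within-cluster correlation rho2 for both X and Y.
        a_x, a_y = correlated_pair(rng, r)
        pairs = []
        for _ in range(m):
            e_x, e_y = correlated_pair(rng, r)
            x = sqrt(rho2) * a_x + sqrt(1 - rho2) * e_x
            y = sqrt(rho2) * a_y + sqrt(1 - rho2) * e_y
            pairs.append((discretize(x), discretize(y)))
        clusters.append(pairs)
    return clusters


if __name__ == "__main__":
    data = simulate_clusters(50, 0.4, 0.3, "c", random.Random(2016))
    print(len(data), sum(len(c) for c in data))
```

Under this construction each variable has unit variance, two measurements of the same variable in a cluster have correlation ρ2, and a pair (Xij, Yij) has correlation ρ1 + bi, matching items (1) to (3) of the simulation design.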

and a median of 4.5 lesions per patient; one patient had 60 lesions identified (the maximum). Among the 36 patients, 27 belonged to the metastatic group and nine belonged to the non-metastatic group. As SUV measurements (calculated as the mean of the upper 20th percentile of pixel values) were highly skewed, the correlation was measured by the Spearman rank correlation. To explore the effect of cluster size on correlation, these patients were stratified by number of lesions: ≤ 3, (3, 10], and > 10. The scatter plots of SUV displayed in Figure 6 and the estimated Spearman rank correlation in each subgroup suggest that the cluster size is informative. The WCRA estimate of $\rho_s$, with the 95% confidence interval based on the fixed-denominator standard error and the t-distribution, is 0.71 (p < 0.01, 95% CI: (0.52, 0.91)). With the naive and SPPC approaches, the estimates are 0.64 (p < 0.01, 95% CI: (0.56, 0.71)) and 0.59 (p < 0.01, 95% CI: (0.30, 0.88)), respectively. The naive approach led to a smaller correlation than the WCRA approach because the naive rank correlation was dominated by the smaller correlation observed in the subgroup of patients with more than 10 lesions each. The result of the SPPC approach depends on the arbitrary selection of the single observation from each cluster.

As requested by an associate editor, the Spearman rank correlation stratified by number of lesions was estimated separately for the 27 patients in the metastatic group (lesions plotted with open circles in Figure 6) and the nine patients in the non-metastatic group (solid circles). For the former group, there were 11, 6, and 10 patients in the three strata, respectively, and for the latter group, there were four, five, and zero patients. The Spearman rank correlation was 0.86 and 0.46 for the two groups in the first stratum, 0.87 and
[Figure 6 panels: number of lesions per patient ≤ 3; 3 < number of lesions per patient ≤ 10; number of lesions per patient > 10. x-axis: SUVmax at baseline; y-axis: SUVmax at 6 months.]

Figure 6. Scatter plots of SUV at baseline versus 6 months. Each dot is a lesion, and each subject may have multiple lesions. Open circle: patients with known metastases by conventional imaging; filled circle: patients without known metastases by conventional imaging. Naive Spearman correlations, rs, for the three panels are: for number of lesions ≤ 3, open circle 0.86, filled circle 0.46, all circles 0.83; for 3 < number of lesions ≤ 10, open circle 0.87, filled circle 0.13, all circles 0.61; for number of lesions > 10, open circle 0.54.
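The naive per-stratum correlations quoted in the caption of Figure 6 pool all lesions in a stratum and ignore clustering. As a rough illustration, the Python sketch below computes a naive Spearman correlation from midranks (so that ties are handled) after grouping clusters by size. The helper names are hypothetical and the example input is made-up toy data, not the study data.

```python
from math import sqrt


def midranks(values):
    """Rank values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def spearman(pairs):
    """Naive Spearman correlation: Pearson correlation of the midranks.

    Assumes neither variable is constant, otherwise the denominator is zero.
    """
    rx = midranks([x for x, _ in pairs])
    ry = midranks([y for _, y in pairs])
    n = len(pairs)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def stratum(m):
    """Label a cluster of size m with the strata used in Section 5."""
    return "<=3" if m <= 3 else ("(3,10]" if m <= 10 else ">10")


def naive_by_stratum(clusters):
    """Pool the (x, y) pairs of all clusters in a stratum, ignoring clustering."""
    pooled = {}
    for c in clusters:
        pooled.setdefault(stratum(len(c)), []).extend(c)
    return {s: spearman(p) for s, p in pooled.items()}


if __name__ == "__main__":
    # Hypothetical toy clusters of (baseline, 6-month) values.
    toy = [[(10, 12), (20, 18)], [(15, 30), (25, 22), (40, 41), (35, 33)]]
    print(naive_by_stratum(toy))
```

Pooling in this way lets large clusters dominate the estimate within a stratum, which is the behaviour the WCR-averaged estimator is designed to avoid.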

0.13 for the two groups in the second stratum, and 0.54 for the first group in the third stratum. The cluster size also appears informative for both groups. For the metastatic group, the WCRA, naive, and SPPC estimates of $\rho_s$ are 0.79 (95% CI: (0.60, 0.99)), 0.65 (95% CI: (0.58, 0.73)), and 0.68 (95% CI: (0.47, 0.90)), respectively. For the non-metastatic group, the WCRA, naive, and SPPC estimates of $\rho_s$ are 0.49 (95% CI: (−0.28, 1.00)), 0.35 (95% CI: (−0.07, 0.76)), and 0.45 (95% CI: (−0.31, 1.00)), respectively. The correlation estimates are smaller in the non-metastatic group but show a similar pattern to those in the metastatic group. Nevertheless, the sample size in the non-metastatic group is too small to draw a definitive conclusion, and the proposed methodology, which is based on an asymptotic approximation, may not perform well with such a small sample.

6. Conclusion
Repeated measurement data frequently arise in medical research. When the inference of interest is on the association of two repeated variables, conventional methods that require the assumption of iid samples may not be valid because of intra-cluster correlation. In this article, we develop a WCR-averaged $\chi^2$ test, rank-based association estimators, and corresponding test statistics through the framework of U-statistics. To account for ties, we constructed the U-statistics as functionals of the midranks. The simulation study demonstrates that, in the presence of intra-subject correlation, the conventional naive approach is biased, whereas the proposed methods control the type-I error rate, have little bias in estimating the rank correlation, and have higher power than the single-point-per-cluster (SPPC) procedure.

WCR averaging as presented in this work is performed on the U-statistic and is quite general. As such, it can be applied to any estimator or test statistic that can be expressed as a U-statistic. For example, McNemar’s test for paired nominal data may alternatively be expressed as a U-statistic, and a WCR-averaged McNemar’s test is readily available for paired repeated nominal data.

7. Supplementary Materials
Web Appendices, Tables, and Figures referenced in Sections 2–4, and R-code for running the data example, are available with this article at the Biometrics website on Wiley Online Library.

References
Agresti, A. (1990). Categorical Data Analysis. New York, NY: John Wiley and Sons.
Apolo, A. B., Lindenberg, L., Shih, J. H., Mena, E., Kim, J. W., Park, J. C., et al. (2016). Prospective study evaluating Na18F PET/CT in predicting clinical outcomes and survival in advanced prostate cancer. The Journal of Nuclear Medicine 57, 886–892.
Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975). Discrete Multivariate Analysis. Cambridge, Massachusetts: The MIT Press.
Datta, S. and Satten, G. (2005). Rank-sum tests for clustered data. Journal of the American Statistical Association 100, 908–915.
Follmann, D., Proschan, M., and Leifer, E. (2003). Multiple outputation: inference for complex clustered data by averaging analyses from independent data. Biometrics 59, 36–42.
Hoffman, E. B., Sen, P. K., and Weinberg, C. R. (2001). Within-cluster resampling. Biometrika 88, 1121–1134.
Hollander, M. and Wolfe, D. A. (1999). Nonparametric Statistical Methods, 2nd edition. New York, NY: John Wiley and Sons.
Kendall, M. G. and Stuart, A. (1972). The Advanced Theory of Statistics, Volume 2, 3rd edition. London: Charles Griffin and Company Limited.
Lee, A. E. (1990). U-Statistics. New York, NY: CRC Press.
Lipsitz, S. R. and Fitzmaurice, G. M. (1996). The score test for independence in r × c contingency tables with missing data. Biometrics 52, 751–762.
Lorenz, D., Datta, S., and Harkema, S. (2011). Marginal association measures for clustered data. Statistics in Medicine 30, 3181–3191.
Williamson, J. M., Datta, S., and Satten, G. (2003). Marginal analyses of clustered data when cluster size is informative. Biometrics 59, 36–42.

Received March 2016. Revised December 2016. Accepted December 2016.
