(Sici) 1097 0258 (19980715) 17 13 1495 Aid Sim863 3.0

STATISTICS IN MEDICINE
Statist. Med. 17, 1495–1507 (1998)

ON THE COMPARISON OF CORRELATED PROPORTIONS
FOR CLUSTERED DATA
NANCY A. OBUCHOWSKI*
Department of Biostatistics and Epidemiology, The Cleveland Clinic Foundation, 9500 Euclid Avenue, Cleveland,
Ohio 44195-5196, U.S.A.
SUMMARY
McNemar's test is often used to compare two proportions estimated from paired observations. We propose
a method extending this to the case where the observations are sampled in clusters. The proposed method is
simple to implement and makes no assumptions about the correlation structure. We conducted a Monte
Carlo simulation study to compare the size and power of the proposed method with a test developed earlier
by Eliasziw and Donner. In the presence of intracluster correlation, the size of McNemar's test can greatly
exceed the nominal level. The size of Eliasziw and Donner's test is also inflated for some correlation patterns.
The proposed method, on the other hand, is close to the nominal size for a variety of correlation patterns,
although it is slightly less powerful than Eliasziw and Donner's procedure. The proposed method is a good
alternative to Eliasziw and Donner's test when, in practice, little is known about the correlation pattern of
the data. © 1998 John Wiley & Sons, Ltd.
INTRODUCTION
McNemar's test [1] is often used to compare two proportions estimated from paired observations.
We propose a method extending this to the case where the observations are sampled in clusters.
Our method is a direct extension of the work of Rao and Scott [2], who compared independent
groups of clustered binary data. Rao and Scott's method, as well as the proposed extension,
makes no assumptions about the correlation structure and is simple to implement.
In 1991, Eliasziw and Donner [3] considered this same problem of comparing correlated
proportions for clustered data. They developed an adjustment to McNemar's test which involves first
estimating the correlation between discordant pairs within a cluster, then using the estimate of the
correlation to adjust the usual McNemar test statistic. In this paper we compare our method with
the adjusted McNemar test via Monte Carlo simulation. Note that one can also use generalized
estimating equations (GEE) [4, 5] for comparing correlated proportions when the data are sampled
in clusters. There are many applications, however, where a simple test, such as the one proposed
here or the one proposed by Eliasziw and Donner [3], is preferred to the more complicated GEE
approach.
* Correspondence to: Nancy A. Obuchowski, Department of Biostatistics and Epidemiology, The Cleveland Clinic
Foundation, 9500 Euclid Avenue, Cleveland, Ohio 44195-5196, U.S.A. E-mail: nobuchow@bio.ri.ccf.org
CCC 0277-6715/98/131495-13$17.50. Received October 1996; revised September 1997.
© 1998 John Wiley & Sons, Ltd.
This work was motivated by a study in diagnostic radiology where two diagnostic tests,
positron emission tomography (PET) and single photon emission computed tomography
(SPECT), were compared for the detection of hyperparathyroidism [6]. Seventy-two glands in 21
patients were evaluated by both tests. Following the tests, all patients underwent surgery to
determine definitively whether or not hyperparathyroidism was present. The objective of the
study was to compare the sensitivity and specificity of the two tests. To do this, we must account
for not only the correlation in the results of the PET and SPECT tests, but also the correlation
between test results of glands in the same patient, that is, the intracluster correlation.
PROPOSED EXTENSION OF RAO AND SCOTT'S METHOD
We use the notation of Rao and Scott [2] with minor modifications. Suppose we have a random
sample of $m$ clusters from a population, where there are $n_j$ units in the $j$th cluster,
$j = 1, \ldots, m$. Each unit is given $I$ treatments or tests. Let $x_{ij}$ denote the number of units
in the $j$th cluster that respond to treatment $i$, $i = 1, \ldots, I$.
Let $p_i$ denote the probability that a randomly selected unit from the population will respond to
treatment $i$. Thus, $E(X_{ij}) = n_j p_i$. Rao and Scott [2] give $\hat{p}_i$ as an estimator of $p_i$:

$$\hat{p}_i = \frac{\sum_{j=1}^{m} x_{ij}}{\sum_{j=1}^{m} n_j} \tag{1}$$

which is just the overall proportion of responses to treatment $i$. An estimator of the variance of
$\hat{p}_i$, for large $m$, which takes into consideration the intracluster correlation is [2]

$$\widehat{\mathrm{var}}(\hat{p}_i) = m(m-1)^{-1}\,\frac{\sum_{j=1}^{m} (x_{ij} - n_j \hat{p}_i)^2}{\bigl(\sum_{j=1}^{m} n_j\bigr)^2}. \tag{2}$$

Rao and Scott show that $(\hat{p}_i - p_i)/\widehat{\mathrm{var}}(\hat{p}_i)^{1/2}$ is asymptotically $N(0, 1)$ as
$m \to \infty$ (see also Scott and Wu [7]).
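To compare proportions estimated from the same sample of $m$ clusters, we must consider the
correlation between the proportions, in addition to the intracluster correlation. A proposed
estimator of the covariance between $\hat{p}_i$ and $\hat{p}_{i'}$ is

$$\widehat{\mathrm{cov}}(\hat{p}_i, \hat{p}_{i'}) = m(m-1)^{-1}\,\frac{\sum_{j=1}^{m} (x_{ij} - n_j \hat{p}_i)(x_{i'j} - n_j \hat{p}_{i'})}{\bigl(\sum_{j=1}^{m} n_j\bigr)^2}. \tag{3}$$

Then, an estimator of the variance of the difference between $\hat{p}_i$ and $\hat{p}_{i'}$ is

$$\widehat{\mathrm{var}}(\hat{p}_i - \hat{p}_{i'}) = \widehat{\mathrm{var}}(\hat{p}_i) + \widehat{\mathrm{var}}(\hat{p}_{i'}) - 2\,\widehat{\mathrm{cov}}(\hat{p}_i, \hat{p}_{i'}) \tag{4}$$

where $i \neq i'$.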
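Because equations (1) to (4) need only the per-cluster counts, they are simple to compute. The following Python fragment is a minimal sketch of that arithmetic, not the author's code; the function names `cluster_estimates` and `var_difference` and the four-cluster data set are hypothetical and were chosen only so that the numbers are easy to check by hand.

```python
def cluster_estimates(x, n):
    """Return (p_hat, S) for clustered binary data.

    x[i][j] is the number of responders to treatment i in cluster j and
    n[j] is the number of units in cluster j. S[i][k] estimates the
    covariance of p_hat[i] and p_hat[k] as in equation (3); the diagonal
    of S is the variance estimator of equation (2).
    """
    m = len(n)                       # number of clusters
    total = sum(n)                   # total number of units
    p_hat = [sum(row) / total for row in x]            # equation (1)
    # Cluster-level residuals x_ij - n_j * p_hat_i
    resid = [[row[j] - n[j] * p for j in range(m)] for row, p in zip(x, p_hat)]
    scale = m / (m - 1) / total ** 2
    S = [[scale * sum(a * b for a, b in zip(ri, rk)) for rk in resid]
         for ri in resid]
    return p_hat, S


def var_difference(S, i, k):
    """Equation (4): estimated variance of p_hat[i] - p_hat[k]."""
    return S[i][i] + S[k][k] - 2 * S[i][k]


# Hypothetical data: 4 clusters, 2 treatments
n = [2, 3, 1, 4]
x = [[1, 2, 1, 3],    # responders to treatment 1
     [2, 3, 0, 3]]    # responders to treatment 2
p_hat, S = cluster_estimates(x, n)
print(p_hat, S, var_difference(S, 0, 1))
```

Only the cluster totals enter the calculation, which is why the method needs no model for how units within a cluster are correlated.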
We denote a vector of estimated proportions by $\hat{p}_{1\times I} = [\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_I]$ and its estimated
covariance matrix by $S_{I\times I}$, where the $(i, i')$th element of $S$ is given by equation (3). To test the null
hypothesis that the vector of proportions $p_{1\times I}$ equals some null vector $p_{1\times I(o)}$, against the
alternative hypothesis that they are not equal, the test statistic is

$$X^2 = (\hat{p}_{1\times I} - p_{1\times I(o)})\,(S_{I\times I})^{-1}\,(\hat{p}_{I\times 1} - p_{I\times 1(o)}) \tag{5}$$

which, under the null hypothesis, is asymptotically distributed as a chi-square variable with
$I$ degrees of freedom.
Another hypothesis of interest is the homogeneity hypothesis, $H_0$: $p_1 = p_2$, against the
alternative hypothesis that the $p_i$'s are unequal. The test statistic for this hypothesis is

$$X^2 = (\hat{p}_1 - \hat{p}_2)^2 \,/\, \widehat{\mathrm{var}}(\hat{p}_1 - \hat{p}_2)_{(\bar{p})} \tag{6}$$

which, under the null hypothesis, is asymptotically distributed as a chi-square variable with one
degree of freedom. Here, $\bar{p}$ is equal to the pooled proportion, $\bar{p} = (\hat{p}_1 + \hat{p}_2)/2$.
$\widehat{\mathrm{var}}(\hat{p}_1 - \hat{p}_2)_{(\bar{p})}$ is the estimated variance of the difference between $\hat{p}_1$ and $\hat{p}_2$
under the null hypothesis, where the $(i, i')$th element of $S_{2\times 2(\bar{p})}$ is given by

$$\widehat{\mathrm{cov}}(\hat{p}_i, \hat{p}_{i'})_{(\bar{p})} = m(m-1)^{-1}\,\frac{\sum_{j=1}^{m} (x_{ij} - n_j \bar{p})(x_{i'j} - n_j \bar{p})}{\bigl(\sum_{j=1}^{m} n_j\bigr)^2}. \tag{7}$$
It can be shown that when there is exactly one unit per cluster, the test statistic in (6) reduces to
McNemar's test statistic multiplied by $(m-1)/m$, the reciprocal of the finite correction factor
$m/(m-1)$ that appears in the variance estimator.
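The reduction can be checked in a few lines. The derivation below is a sketch under notation that is assumed here rather than taken from the paper: $b$ and $c$ denote the numbers of clusters whose single unit has outcome pair $(1, 0)$ and $(0, 1)$, respectively. With $n_j = 1$ the pooled proportion cancels inside each cluster difference, so only the discordant clusters contribute.

```latex
% b, c: counts of clusters with outcome pairs (1,0) and (0,1); notation assumed for this sketch
\begin{align*}
\hat{p}_1 - \hat{p}_2 &= \frac{1}{m}\sum_{j=1}^{m}(x_{1j} - x_{2j}) = \frac{b - c}{m},\\
\widehat{\mathrm{var}}(\hat{p}_1 - \hat{p}_2)_{(\bar{p})}
  &= \frac{m}{m-1}\,\frac{\sum_{j}\bigl[(x_{1j} - \bar{p}) - (x_{2j} - \bar{p})\bigr]^2}{m^2}
   = \frac{b + c}{m(m-1)},\\
X^2 &= \frac{(b - c)^2}{m^2}\cdot\frac{m(m-1)}{b + c}
     = \frac{m-1}{m}\cdot\frac{(b - c)^2}{b + c}.
\end{align*}
```

The last factor is the usual McNemar statistic without continuity correction. Since $m/(m-1)$ multiplies the variance in the denominator of (6), the statistic itself is scaled by $(m-1)/m$, which approaches 1 as the number of clusters grows.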
ADJUSTED McNEMAR TEST
Eliasziw and Donner [3] developed an adjustment to McNemar's test. It involves first estimating
the correlation among discordant pairs within a cluster, then using the estimate of the correlation
to adjust the usual McNemar test statistic. Their test statistic takes the form $X^2_{AM} = X^2_M / C$, where
AM stands for adjusted McNemar, $X^2_M$ is the usual McNemar test statistic, and $C$ is the required
correction. $C$ is given by $C = 1 + (n - 1)\rho$, where $n$ is, roughly, the average number of
discordant pairs per cluster and $\rho$ is the correlation among discordant pairs within a cluster (see
Eliasziw and Donner [3] for full details).
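To make the form of the adjustment concrete, here is a minimal Python sketch of $X^2_{AM} = X^2_M / C$. It does not reproduce how Eliasziw and Donner estimate the correlation or the average number of discordant pairs; both are simply passed in. The function names are hypothetical. The discordant counts (1 and 7) and the two correlation estimates (1.0 and 0.458) are those reported in the illustrative example of this paper, while `n_disc = 1.5` is an assumption: it is the value implied by the statistics reported there, not a figure stated in the text.

```python
def mcnemar_statistic(b, c):
    """Usual McNemar statistic, without continuity correction, from the
    two discordant-pair counts b and c."""
    return (b - c) ** 2 / (b + c)


def adjusted_mcnemar(b, c, n_disc, rho):
    """Adjusted McNemar statistic X2_AM = X2_M / C, C = 1 + (n_disc - 1) * rho.

    n_disc: (roughly) the average number of discordant pairs per cluster.
    rho:    correlation among discordant pairs within a cluster.
    """
    correction = 1.0 + (n_disc - 1.0) * rho
    return mcnemar_statistic(b, c) / correction


x2_m = mcnemar_statistic(1, 7)
x2_am_hat = adjusted_mcnemar(1, 7, n_disc=1.5, rho=1.0)      # correlation from discordant pairs only
x2_am_tilde = adjusted_mcnemar(1, 7, n_disc=1.5, rho=0.458)  # correlation from all pairs
print(x2_m, x2_am_hat, round(x2_am_tilde, 2))
```

With no within-cluster correlation ($\rho = 0$) the correction is 1 and the ordinary McNemar statistic is recovered; positive correlation shrinks the statistic.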
They considered two estimates of the correlation among discordant pairs: one based on data
from the discordant pairs only (denoted $\hat\rho$), and the other based on data from both discordant and
concordant pairs (denoted $\tilde\rho$). They compared the two versions of their test with Fisher's
one-sample permutation test [8] and a likelihood ratio test in a Monte Carlo simulation study.
They found that the correlation estimated from the data of discordant pairs only (that is, $\hat\rho$) can be
unreliable when the total number of discordant pairs in the sample is small and/or when the
numbers of the two types of discordant pairs (that is, number of $(+,-)$ versus number of
$(-,+)$) differ greatly. Furthermore, the size of the test statistic slightly exceeds the nominal level.
They recommend that one use the correlation estimate based on both discordant and concordant
pairs (that is, $\tilde\rho$) for most applications because of its nominal type I error rate, high power, and
ease of computation. The exception to this recommendation would seem to occur if one expects
that the intracluster correlation is not constant across clusters, because in the derivation of $\tilde\rho$ there
is an assumption of equi-correlation for all clusters.
In the following Monte Carlo simulation study, we compare the size and power of the proposed
method with the size and power of both versions of the adjusted McNemar test, that is, the one
based on $\tilde\rho$ and the one based on $\hat\rho$.
MONTE CARLO SIMULATION STUDY
We conducted a Monte Carlo simulation study to assess the size and power of the test statistic in
(6) and to compare it with McNemar's test without the continuity correction [9] (denoted $X^2_M$) and
Eliasziw and Donner's [3] adjusted McNemar tests (denoted $\tilde{X}^2_{AM}$ and $\hat{X}^2_{AM}$ for the tests based on
$\tilde\rho$ and $\hat\rho$, respectively). We considered several correlation structures and varied the number of
clusters, the number of units per cluster, and the response probability ($p_1 = 0.05$, 0.20 and 0.50).

Figure 1. Diagram illustrating the different correlation structures considered in the Monte Carlo simulation study.
Cluster j has three units labeled A, B, and C. Both treatments were applied to all units within the cluster.

Figure 1 describes the different correlation structures considered in this study. For the first
correlation structure, there was common correlation between treatments 1 and 2 for all units
within a cluster (that is, $r_3 = r_4$), and there was common intracluster correlation (that is, $r_1 = r_2$).
The intracluster correlation was set at 0.0, 0.1, 0.4, or 0.8, and the correlation between treatments
was one-half the intracluster correlation (that is, $r_3 = r_4 = \tfrac{1}{2}r_1 = \tfrac{1}{2}r_2$).
For the second correlation structure, the correlation between treatments for the same unit was
set at 0.5 (that is, $r_3 = 0.5$), and the correlation between treatments for different units within
a cluster was set at one-half of the intracluster correlation (that is, $r_4 = \tfrac{1}{2}r_1 = \tfrac{1}{2}r_2$). The
intracluster correlation was constant (that is, $r_1 = r_2$). This second correlation structure seems
more plausible than the first. In particular, one would expect $r_4$, the correlation between
treatments for different units within a cluster, to be less than $r_3$, the correlation between
treatments for the same unit. Furthermore, $r_4$ is likely small if $r_1$ and $r_2$ are small and is likely
large only if $r_1$ and $r_2$ are large. Therefore, we made $r_4$ a function of $r_1$ and $r_2$, and we arbitrarily set
$r_3$ equal to 0.5. When $r_1$ and $r_2$ are 0.8, $r_4$ is also relatively large (that is, 0.4) but still smaller than $r_3$.
The third correlation structure is a special case. It was motivated by one of Eliasziw
and Donner's [3] examples, where the observations were clustered for only one of the two groups.
In their example, patients and psychiatrists were compared on responses to check-listed items
relating to patient concerns and treatments (see Petryshen [10]). The sample consisted of 135
patients and 29 psychiatrists who each treated from one to eight of the patients. Since each
patient matches with one of the psychiatrists, there are a total of 135 matched pairs. However,
the responses of a psychiatrist may be correlated, thus the 135 matched pairs are clustered
within the 29 psychiatrists. Accordingly, for the third correlation structure the intracluster
correlation was zero for treatment 2 (that is, $r_2 = 0.0$) and $r_1 > 0.0$ (that is, $r_1 = 0.1$, 0.4, or
0.8, $r_2 = 0.0$, and $r_3 = 0.5$, $r_4 = 0.0$ for the first two values of $r_1$ and $r_4 = 0.1$ for $r_1 = 0.8$).
Note that for this correlation structure, the assumption of equi-correlation across clusters may
not be met, thus we may prefer the adjusted McNemar test based on discordant pairs only,
that is, $\hat{X}^2_{AM}$, over the adjusted McNemar test based on both concordant and discordant pairs,
that is, $\tilde{X}^2_{AM}$.
We generated the data using the following strategy. We generated a random vector from
an $(n_j \times 2)$-variate normal distribution with zero mean vector. The first $n_j$ variates represented the
outcomes of $n_j$ units within a cluster given treatment 1; the last $n_j$ variates represented the
outcomes of the same $n_j$ units within the same cluster but given treatment 2. The various
covariance matrices for the $(n_j \times 2)$-variate normal distribution have been described above. If the
simulated value for the $k$th unit within a cluster ($k = 1, \ldots, n_j$) was greater than zero, then we set
the outcome to one; otherwise, we set the outcome for that unit to zero. This resulted in
a response rate of 50 per cent. Similarly, for a response rate of 20 per cent and 5 per cent, the
cut-off values were 0.84 and 1.645. To generate clusters of unequal size (see Tables I–III and V),
units were randomly and independently deleted with probability 5 per cent (for Tables I–III) or
with probability 30 per cent (for Table V).
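The data-generation strategy above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the code used for the study: the function and parameter names are hypothetical, the correlated normals are produced with a hand-written Cholesky factorization, and giving treatment 2 its own cut-off (so that its response rate can differ when power is assessed) is an assumption about a detail the text does not spell out. The four correlation arguments are the intracluster correlations for treatments 1 and 2, the between-treatment correlation for the same unit, and the between-treatment correlation for different units.

```python
import random
from statistics import NormalDist


def latent_covariance(n_units, rho_t1, rho_t2, rho_same_unit, rho_diff_unit):
    """Covariance matrix of the 2 * n_units latent normal variates
    (treatment 1 outcomes first, then treatment 2 outcomes)."""
    size = 2 * n_units
    cov = [[0.0] * size for _ in range(size)]
    for a in range(size):
        for b in range(size):
            treat_a, unit_a = divmod(a, n_units)
            treat_b, unit_b = divmod(b, n_units)
            if a == b:
                cov[a][b] = 1.0
            elif treat_a == treat_b:      # same treatment, different units
                cov[a][b] = rho_t1 if treat_a == 0 else rho_t2
            elif unit_a == unit_b:        # same unit, different treatments
                cov[a][b] = rho_same_unit
            else:                         # different units, different treatments
                cov[a][b] = rho_diff_unit
    return cov


def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (matrix must be positive definite)."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = (matrix[i][i] - partial) ** 0.5
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def simulate_cluster(lower, n_units, p1, p2, rng, drop_prob=0.0):
    """Return binary outcomes (y1, y2) for one cluster under treatments 1 and 2."""
    z = [rng.gauss(0.0, 1.0) for _ in range(2 * n_units)]
    latent = [sum(lower[i][k] * z[k] for k in range(i + 1))
              for i in range(2 * n_units)]
    # Cut-offs: 0 for a 50% rate, about 0.84 for 20%, about 1.645 for 5%
    cut1 = NormalDist().inv_cdf(1.0 - p1)
    cut2 = NormalDist().inv_cdf(1.0 - p2)
    # Delete units independently to obtain clusters of unequal size
    keep = [rng.random() >= drop_prob for _ in range(n_units)]
    y1 = [int(latent[u] > cut1) for u in range(n_units) if keep[u]]
    y2 = [int(latent[n_units + u] > cut2) for u in range(n_units) if keep[u]]
    return y1, y2


rng = random.Random(1998)
lower = cholesky(latent_covariance(5, 0.4, 0.4, 0.5, 0.2))
clusters = [simulate_cluster(lower, 5, 0.20, 0.20, rng, drop_prob=0.05)
            for _ in range(100)]
```

The example call corresponds to one null-hypothesis data set of 100 clusters with intracluster correlation 0.4, same-unit correlation 0.5 and different-unit correlation 0.2, on the latent normal scale.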
We generated 2000 data sets for each assessment. We used a z-test, based on the normal
distribution approximation, to test whether the type I error rate differed from the nominal 5 per
cent level. With a sample size of 2000, we could detect departures of 1 per cent with 53 per cent
power and departures of 2 per cent with 97 per cent power ($\alpha = 0.05$, two-tailed).
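The stated power figures can be approximated with a short calculation. The sketch below assumes a one-sample z-test whose rejection region is built from the standard error under the nominal 5 per cent rate, and it evaluates upward departures (true sizes of 6 and 7 per cent); the paper does not say exactly how its figures were obtained, so this is one plausible reading, and the function name is hypothetical.

```python
from statistics import NormalDist


def size_test_power(true_size, nominal=0.05, n_sets=2000, alpha=0.05):
    """Approximate power of a two-tailed z-test of H0: size = nominal,
    when the empirical size is estimated from n_sets simulated data sets."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    se_null = (nominal * (1.0 - nominal) / n_sets) ** 0.5
    se_true = (true_size * (1.0 - true_size) / n_sets) ** 0.5
    upper = nominal + z_crit * se_null   # reject if the observed rate is above this
    lower = nominal - z_crit * se_null   # ... or below this
    return (1.0 - norm.cdf((upper - true_size) / se_true)
            + norm.cdf((lower - true_size) / se_true))


power_1pt = size_test_power(0.06)   # departure of 1 percentage point
power_2pt = size_test_power(0.07)   # departure of 2 percentage points
print(round(power_1pt, 2), round(power_2pt, 2))
```

Under these assumptions the two values come out close to the 53 and 97 per cent quoted in the text.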
Tables I–III summarize the results of the simulation study for each of the three correlation
structures, respectively. In these tables the number of clusters ($m$) was 100, 50 or 25 and the
number of units per cluster ($n_j$) was $\le 5$ (approximately 97 per cent of clusters have at least
4 units). Table IV summarizes the results for the case where there is a small number of units per
cluster (that is, $n_j = 2$); we used the second correlation structure since it seems most plausible.
Finally, Table V summarizes the results for the case where there are few clusters ($m = 30$, 20 or 10)
but many units per cluster ($n_j \le 20$) and the cluster sizes vary dramatically (approximately 50 per
cent of clusters have 13–15 units, 25 per cent have $> 15$ units, and 25 per cent have $< 13$ units). In
this table we also used the second correlation structure.
Table I summarizes results for the first covariance structure with $\le 5$ units per cluster. In the
presence of only slight intracluster correlation (that is, 0.1), the size of McNemar's test exceeded
the nominal level. At $p_1 = 0.50$, $\tilde{X}^2_{AM}$ and the proposed test never exceeded the nominal level,
whereas $\hat{X}^2_{AM}$ exceeded the nominal level 3 of 12 times. At $p_1 = 0.20$ or 0.05, both $\tilde{X}^2_{AM}$ and the
proposed test have significance levels below 5 per cent, whereas the size of $\hat{X}^2_{AM}$ continued to run
a little high at $p_1 = 0.20$. The two adjusted McNemar tests were generally more powerful than the
proposed test; on average, the gain in power by using the adjusted McNemar tests was less than
3 percentage points.
Table I. Empirical size and power for McNemar's test ($X^2_M$), adjusted McNemar test using the estimate of
correlation based on both concordant and discordant pairs ($\tilde{X}^2_{AM}$), adjusted McNemar test using the
estimate of correlation based on discordant pairs only ($\hat{X}^2_{AM}$), and proposed method (equation (6)).
$\tfrac{1}{2}r_1 = \tfrac{1}{2}r_2 = r_3 = r_4$, $\le 5$ units/cluster

                 Size                                 Power
m    p_1   Corr  X2_M     X~2_AM   X^2_AM   Eq. (6)   X2_M     X~2_AM   X^2_AM   Eq. (6)
100  0.50  0.0   5.7      5.9      5.5      5.1       87.1     87.0     87.0     86.1
           0.1   6.4*     4.8      5.2      5.0       87.5     84.4     85.2     84.1
           0.4   12.4*    5.0      6.1*     5.4       86.4     74.0     76.1     74.2
           0.8   22.8*    5.3      5.2      4.8       84.7     60.1     59.9     58.2
50   0.50  0.0   4.8      5.1      5.5      4.3       59.5     60.2     59.3     56.7
           0.1   6.1*     4.5      5.7      4.3       60.4     56.1     57.4     54.4
           0.4   11.2*    4.7      5.6      4.9       62.5     44.7     47.1     44.2
           0.8   23.4*    5.5      5.7      4.9       65.1     35.7     35.5     32.6
25   0.50  0.0   4.6      5.5      5.8      4.3       34.6     36.1     36.3     29.6
           0.1   6.4*     5.8      6.1*     4.6       37.6     33.2     35.7     28.5
           0.4   11.3*    4.7      6.8*     4.5       39.2     25.8     27.9     22.6
           0.8   23.3*    5.3      5.9      4.1       47.6     20.3     20.3     16.7
100  0.20  0.0   5.0      4.8      5.3      4.9       93.7     94.0     93.6     93.4
           0.1   5.7      4.2      4.7      4.3       93.8     91.8     92.6     92.1
           0.4   11.7*    3.5*     4.8      4.5       92.5     82.9     84.8     83.8
           0.8   22.5*    5.4      5.8      5.2       90.0     70.5     71.9     69.9
50   0.20  0.0   5.4      5.5      5.4      4.6       72.3     73.3     73.2     70.3
           0.1   6.4*     5.0      6.0*     5.0       70.9     64.8     68.2     64.7
           0.4   11.3*    4.0*     5.8      4.8       71.8     53.0     57.4     53.3
           0.8   21.8*    4.7      5.1      4.2       71.7     41.8     43.7     40.3
25   0.20  0.0   5.0      6.1*     6.4*     4.2       41.3     43.4     43.5     37.2
           0.1   6.3*     5.1      6.4*     4.0*      42.6     37.7     41.0     34.3
           0.4   11.2*    3.7*     6.2*     3.2*      46.6     28.9     32.8     27.1
           0.8   22.1*    3.9*     5.3      3.2*      53.5     19.3     24.3     22.4
100  0.05  0.0   4.6      5.0R     5.7      5.0       99.9     99.9     99.9     99.9
           0.1   5.8      5.0R     5.6      5.1       99.9     99.9     99.9     99.9
           0.4   9.0*     3.5*     5.2      4.5       99.9     99.5     99.8     99.6
           0.8   21.8*    4.1      5.0      4.6       99.4     95.0     95.5     95.2
50   0.05  0.0   4.7      4.7R     5.1      4.1       96.6     96.6R    96.7     96.2
           0.1   5.8      5.1R     5.4      4.2       96.0     95.3R    96.2     95.1
           0.4   9.4*     3.0R*    5.1      3.5*      94.4     87.0     89.7     88.4
           0.8   20.7*    3.8R*    5.0S     4.0*      90.5     69.7     72.1S    69.9
25   0.05  0.0   3.8*     4.3R     4.7      3.6*      75.6     74.9R    74.7S    68.4
           0.1   5.1      4.5R     5.3S     4.1       75.2     69.5R    72.8S    65.9
           0.4   7.4*     3.3R*    5.7S     3.1*      74.3     55.5R    60.8S    53.7
           0.8   20.4*    2.1R*    3.7S*    1.9*      72.0     36.7R    36.9S    33.5

Column headings X2_M, X~2_AM and X^2_AM stand for $X^2_M$, $\tilde{X}^2_{AM}$ and $\hat{X}^2_{AM}$. Corr = intracluster
correlation ($r_1$). * indicates that the size is not the nominal level of 5 per cent. R indicates that, for some
of the 2000 data sets, out-of-range estimates of the correlation $\tilde\rho$ were truncated to boundary values (see text). S indicates
that the correlation $\hat\rho$ could not be computed for all 2000 data sets (see text). For assessing power, $p_2 = p_1 + 0.10$. The
maximum standard error of the power estimates is 0.011; the maximum standard error of the size estimates is 0.009.
Eliasziw and Donner's adjustments to McNemar's test involve estimation of the correlation
among discordant pairs within a cluster. We noticed that for rare events (that is, $p_1 = 0.05$) and/or
when the number of units per cluster is small (see Table IV), Eliasziw and Donner's correlation
$\tilde\rho$ was not always in the range $-1.0$ to $+1.0$. We included these cases in the estimates of size and
Table II. Empirical size and power for McNemar's test ($X^2_M$), the adjusted McNemar tests ($\tilde{X}^2_{AM}$ and
$\hat{X}^2_{AM}$), and the proposed method (equation (6)).
$\tfrac{1}{2}r_1 = \tfrac{1}{2}r_2 = r_4$; $r_3 = 0.5$, $\le 5$ units/cluster

                 Size                                 Power
m    p_1   Corr  X2_M     X~2_AM   X^2_AM   Eq. (6)   X2_M     X~2_AM   X^2_AM   Eq. (6)
100  0.50  0.0   5.5      5.8      5.6      5.1       96.2     96.2     96.2     95.8
           0.1   7.5*     6.0*     5.4      5.2       95.5     94.3     93.8     93.3
           0.4   15.3*    6.0*     5.6      5.0       92.0     84.4     82.3     80.7
           0.8   23.6*    6.1*     5.1      4.7       86.6     64.9     61.4     60.1
50   0.50  0.0   4.5      4.6      4.9      4.1       76.2     76.1     76.1     73.9
           0.1   6.9*     5.2      5.2      4.4       76.0     71.3     70.6     67.7
           0.4   14.5*    7.2*     6.4*     5.3       70.4     55.7     52.9     49.4
           0.8   24.5*    6.7*     5.7      5.2       67.1     39.8     35.5     33.6
25   0.50  0.0   5.2      5.9      7.0*     4.6       48.3     49.8     49.9     43.0
           0.1   6.9*     5.6      6.3*     3.9*      48.0     44.3     44.4     37.7
           0.4   15.0*    6.9*     6.1*     4.4       47.3     32.8     31.1     26.5
           0.8   24.9*    6.4*     5.3      4.2       49.7     23.2     20.1     16.8
100  0.20  0.0   5.6      5.6      6.4*     5.9       98.8     98.7     98.7     98.7
           0.1   8.1*     6.4*     6.2*     5.7       98.3     97.5     97.6     97.4
           0.4   12.9*    5.5      5.5      4.7       95.9     91.0     90.6     89.9
           0.8   23.8*    6.8*     5.8      5.0       91.5     75.6     73.0     71.6
50   0.20  0.0   5.3      5.3      5.6      4.6       85.7     85.9     85.9     84.5
           0.1   6.5*     4.9      5.4      4.6       83.6     80.0     79.9     77.6
           0.4   12.4*    5.1      5.3      4.2       79.4     66.3     64.3     61.1
           0.8   22.8*    5.5      5.1      4.4       74.4     47.1     44.6     42.1
25   0.20  0.0   5.2      5.4R     6.6*     4.0*      56.9     57.4     57.6     50.2
           0.1   7.1*     5.4R     6.6*     4.4       56.0     51.7     51.4     46.1
           0.4   12.7*    4.7R     5.5      3.7*      55.5     39.5     38.1     31.9
           0.8   22.9*    5.0      4.6      3.6*      56.4     26.0     23.5S    21.0
100  0.05  0.0   4.8      4.4R     4.8      4.4       100      100R     100      100
           0.1   5.7      4.6R     5.1      4.6       100      100R     100      100
           0.4   9.4*     4.5R     5.3      4.4       100      99.8     99.9     99.8
           0.8   24.5*    4.9      5.1      4.6       99.4     95.7     95.5S    95.2
50   0.05  0.0   4.4      4.4R     4.5      3.9*      96.4     99.4R    99.3S    99.2
           0.1   4.8      3.9R*    4.8      3.7*      99.2     98.8R    98.7S    98.3
           0.4   9.9*     4.0R*    4.8S     4.0*      97.3     93.6     93.2S    92.0
           0.8   22.2*    4.3R     4.9S     3.9*      92.5     73.7     72.2S    76.1
25   0.05  0.0   4.1      4.2R     5.0S     3.4*      85.2     85.3R    83.7S    79.9
           0.1   3.6*     3.9R*    4.6S     2.9*      83.4     79.9R    79.3S    75.8
           0.4   8.3*     4.2R     4.5S     3.3*      79.6     66.0R    63.3S    59.6
           0.8   21.2*    2.9R*    2.3S*    1.5*      74.3     40.6R    33.5S    34.8

For assessing power, $p_2 = p_1 + 0.10$.
power, since the estimates of size and power would likely be biased if we excluded them, but we set
the correlation estimates equal to the relevant boundary value (that is, $-1.0$ or $+1.0$). In the
tables we marked these entries with R. This approach of setting the estimates to the
relevant boundary value had a negligible effect on the results. Also, we cannot compute Eliasziw
Table III. Empirical size and power for McNemar's test ($X^2_M$), the adjusted McNemar tests ($\tilde{X}^2_{AM}$ and
$\hat{X}^2_{AM}$), and the proposed method (equation (6)).
$r_1 > r_2 = 0.0$; $r_3 = 0.5 > r_4$, $\le 5$ units/cluster

                 Size                                 Power
m    p_1   Corr  X2_M     X~2_AM   X^2_AM   Eq. (6)   X2_M     X~2_AM   X^2_AM   Eq. (6)
100  0.50  0.1   7.9*     7.5*     6.0*     5.6       95.3     94.2     93.1     92.6
           0.4   14.4*    10.0*    6.0*     5.4       91.7     87.5     81.8     80.7
           0.8   20.0*    10.6*    5.1      5.0       88.3     79.0     69.0     68.2
50   0.50  0.1   6.6*     5.6      5.6      4.2       74.4     72.2     69.7     67.1
           0.4   12.8*    8.5*     5.8      5.1       69.7     61.5     51.7     49.4
           0.8   19.7*    9.8*     5.2      5.0       68.7     52.3     38.8     37.4
25   0.50  0.1   6.8*     6.4*     6.6*     4.0*      48.1     46.6     44.6     38.2
           0.4   13.6*    9.7*     5.7      4.2       47.4     39.5     30.2     25.9
           0.8   19.0*    10.2*    5.3      4.9       47.3     32.4     20.9     19.4
100  0.20  0.1   7.2*     6.4*     5.7      5.4       98.3     98.1     97.7     97.4
           0.4   13.2*    8.2*     5.9      5.0       95.8     93.3     90.5     90.0
           0.8   19.0*    8.8*     5.4      5.2       93.9     89.2     83.5     83.0
50   0.20  0.1   6.2*     5.8      5.2      4.5       83.1     81.5     79.8     78.0
           0.4   13.7*    8.8*     5.9      4.9       79.5     74.1     66.5     64.5
           0.8   18.3*    8.9R*    5.1      4.8       76.8     65.2     55.1     53.6
25   0.20  0.1   7.0*     5.6R     6.5*     4.1       56.6     54.8     52.0     46.1
           0.4   13.2*    8.3*     5.9      4.4       55.7     50.1     39.7     36.6
           0.8   17.3*    9.3R*    5.8      5.2       56.8     45.9     34.0     31.5
100  0.05  0.1   5.5      5.0R     5.0      4.6       100      100R     100      100
           0.4   10.2*    6.9R*    6.2*     5.8       100      100R     100      100
           0.8   18.8*    9.2R*    7.6*     7.4*      99.9     99.7     99.6     99.5
50   0.05  0.1   4.6      4.5R     4.7      3.7*      99.3     99.2R    98.9     98.7
           0.4   8.4*     5.6R     5.2      4.3       98.6     96.9R    95.5S    95.1
           0.8   17.3*    9.8R*    7.7S*    8.0*      95.5     92.1R    87.1S    87.2
25   0.05  0.1   4.3      4.4R     4.4S     3.5*      84.9     84.3R    82.1S    79.4
           0.4   7.0*     4.9R     4.6S     3.7*      81.2     78.6R    72.0S    70.8
           0.8   17.0*    10.1R*   7.1S*    7.8*      79.8     72.3R    57.3S    63.4

For assessing power, $p_2 = p_1 + 0.10$.
and Donner's correlation $\hat\rho$ when there is only one type of discordant pair (that is, all discordant
pairs in the sample are either $(+,-)$ or $(-,+)$). In the tables we marked these entries with S.
Table II summarizes the results for the second correlation structure with $\le 5$ units per
cluster. For $p_1 = 0.50$, the size of both adjusted McNemar tests exceeded the nominal level:
$\tilde{X}^2_{AM}$ exceeded the nominal level 7 of 12 times; $\hat{X}^2_{AM}$ exceeded the nominal level 4 of 12 times. In
contrast, the size of the proposed test was below the nominal level on one occasion. For $p_1 = 0.20$
or 0.05 and small numbers of clusters, the size of the proposed test was less than nominal. The size
of $\hat{X}^2_{AM}$ tended to run high at $p_1 = 0.20$. For $p_1 = 0.20$ or 0.05, the power of the adjusted
McNemar tests was, on average, less than 3 percentage points greater than that of the proposed test.
Table IV. Empirical size and power for McNemar's test ($X^2_M$), the adjusted McNemar tests ($\tilde{X}^2_{AM}$ and
$\hat{X}^2_{AM}$), and the proposed method (equation (6)).
$\tfrac{1}{2}r_1 = \tfrac{1}{2}r_2 = r_4$; $r_3 = 0.5$, 2 units/cluster

                 Size                                 Power
m    p_1   Corr  X2_M     X~2_AM   X^2_AM   Eq. (6)   X2_M     X~2_AM   X^2_AM   Eq. (6)
100  0.50  0.0   4.7      4.7R     5.0      4.7       67.6     67.4     67.4     66.3
           0.1   5.6      4.9      5.0      4.8       67.1     65.7     66.0     64.8
           0.4   8.4*     5.8      5.5      5.2       65.8     60.4     59.1     58.4
           0.8   11.0*    5.8      5.1      5.0       64.4     50.6     48.9     48.5
50   0.50  0.0   4.5      4.6R     5.2      4.2       39.8     40.1R    40.2     37.6
           0.1   5.4      5.2R     5.3      4.5       39.7     38.9     39.0     36.5
           0.4   7.5*     5.7      5.0      4.8       41.1     34.7     34.3     33.0
           0.8   11.1*    5.6      5.2      5.1       42.5     29.4     27.2     26.4
25   0.50  0.0   4.9      5.1R     5.2      3.8*      23.4     23.3R    23.8     19.7
           0.1   5.7      5.8R     5.5      3.9*      24.3     22.7R    23.3     19.3
           0.4   8.0*     5.8R     5.8      4.2       26.8     22.1R    20.9     18.3
           0.8   11.1*    5.7      5.0      4.2       27.7     17.4     15.9S    14.0
100  0.20  0.0   5.4      5.5R     5.7      5.5       78.3     78.6R    78.6     77.7
           0.1   5.8      5.4R     5.5      5.3       78.5     77.5R    77.0     76.0
           0.4   7.5*     6.0*     5.6      5.5       76.8     73.1     72.1     71.9
           0.8   11.5*    6.0*     5.6      5.4       73.9     63.1     61.2     60.3
50   0.20  0.0   5.0      4.6R     5.0      3.9*      48.7     47.7R    47.8     44.3
           0.1   5.4      4.5R     4.9      3.8*      49.1     46.2R    46.4     43.8
           0.4   7.4*     4.9R     5.3      4.4       49.4     43.0R    42.0     40.5
           0.8   11.6*    5.6      5.1      4.9       50.4     36.4     35.1     34.3
25   0.20  0.0   4.3      4.7R     4.7      4.0*      25.9     28.7R    25.7S    23.3
           0.1   4.6      4.8R     4.8      4.1       26.2     25.7R    25.5S    22.6
           0.4   7.1*     6.0R*    5.3S     4.5       28.6     24.4R    21.9S    19.0
           0.8   9.4*     5.1R     4.1S     3.8*      31.1     21.5     19.0S    17.7
100  0.05  0.0   4.4      4.0R*    4.4      4.4       98.0     97.5R    97.5S    97.2
           0.1   5.1      4.2R     4.7      4.6       97.4     96.9R    97.0S    96.6
           0.4   6.3*     4.5R     5.1      4.5       96.8     95.2R    95.0     94.9
           0.8   9.0*     3.7R*    3.5*     3.4*      95.0     90.9     90.4S    90.3
50   0.05  0.0   3.8*     3.4R*    3.9*     4.0*      79.3     78.3R    78.0S    78.0
           0.1   4.0*     3.6R*    3.8S*    4.1       79.4     78.0R    76.7S    77.1
           0.4   5.4      4.6R     4.1S     4.8       76.7     72.6R    70.9S    70.8
           0.8   9.5*     4.4R     2.7S*    4.1       73.6     62.3R    57.8S    59.6
25   0.05  0.0   4.4      3.9R*    3.6S*    3.5*      46.2     44.6R    40.7S    39.6
           0.1   4.7      4.1R     3.9S*    3.9*      45.6     44.1R    38.8S    38.6
           0.4   5.8      4.1R     3.5S*    3.9*      46.0     40.2R    30.1S    34.2
           0.8   7.9*     1.4R*    0.7S*    5.8       46.3     30.9R    16.1S    25.6

For assessing power, $p_2 = p_1 + 0.10$.
Table III summarizes results for the third correlation structure with $\le 5$ units per cluster.
The size of $\tilde{X}^2_{AM}$ exceeded the nominal level in 19 of 27 simulations; this result was expected since
the correlation was not constant across clusters for this correlation structure. The size of
$\hat{X}^2_{AM}$ exceeded the nominal level in 8 of 27 simulations. In contrast, the size of the proposed test
Table V. Empirical size and power for McNemar's test ($X^2_M$), the adjusted McNemar tests ($\tilde{X}^2_{AM}$ and
$\hat{X}^2_{AM}$), and the proposed method (equation (6)).
$\tfrac{1}{2}r_1 = \tfrac{1}{2}r_2 = r_4$; $r_3 = 0.5$, $\le 20$ units/cluster

                 Size                                 Power
m    p_1   Corr  X2_M     X~2_AM   X^2_AM   Eq. (6)   X2_M     X~2_AM   X^2_AM   Eq. (6)
30   0.50  0.0   5.8      6.9*     7.1*     4.9       94.4     94.9     94.2     91.8
           0.1   14.3*    7.2*     7.1*     5.0       88.1     79.7     78.2     73.7
           0.4   31.5*    8.6*     7.4*     5.1       78.7     52.0     48.8     40.8
           0.8   49.0*    7.6*     6.5*     5.5       74.8     29.5     26.0     22.4
20   0.50  0.0   6.2*     7.4*     8.4*     4.6       82.7     84.2     83.4     76.1
           0.1   13.9*    6.6*     6.7*     4.6       76.1     65.9     64.6     54.1
           0.4   31.2*    8.3*     7.7*     4.6       69.6     39.0     36.3     28.0
           0.8   49.8*    7.1*     6.2*     4.0*      67.7     21.7     19.1     15.1
10   0.50  0.0   5.8      8.2*     10.6*    3.0*      53.6     59.5     59.5     32.8
           0.1   12.8*    8.1*     9.7*     3.1*      52.0     41.0     41.1     21.0
           0.4   29.2*    8.1*     8.7*     2.9*      54.6     24.1     23.6     10.6
           0.8   48.6*    6.9*     8.2*     2.8*      58.9     14.8     12.5S    5.7
30   0.20  0.0   5.3      6.4*     6.9*     4.6       98.2     98.1     97.9     97.0
           0.1   11.0*    5.6      6.6*     4.5       95.0     90.0     90.3     86.9
           0.4   27.1*    7.3*     6.9*     4.6       86.4     61.6     61.0     54.3
           0.8   46.9*    6.5*     6.1*     4.6       80.5     34.2     32.2     27.8
20   0.20  0.0   5.3      6.4*     8.9*     4.8       90.3     91.2     89.6     84.8
           0.1   10.5*    5.6      7.3*     3.6*      85.7     76.9     77.0     68.9
           0.4   28.2*    7.0*     8.2*     3.8*      77.1     47.2     46.1     36.2
           0.8   48.9*    6.4*     6.9*     4.0*      72.6     26.0     24.2     19.0
10   0.20  0.0   5.3      8.2*     11.8*    2.7*      63.5     67.1     68.1     42.9
           0.1   10.4*    7.0*     9.3*     2.7*      62.9     52.4     53.5     29.1
           0.4   27.5*    7.1*     8.8*     2.5*      60.0     28.7     29.3     12.6
           0.8   48.6*    5.2      6.0S*    1.5*      62.5     14.7     15.0S    5.4
30   0.05  0.0   5.4      5.9R     6.8*     4.7       100      100      100      100
           0.1   6.3*     3.9R*    6.3*     4.2       99.9     99.9     99.8     99.8
           0.4   20.7*    3.8R*    6.5*     3.8*      98.8     91.4     92.1     89.5
           0.8   45.3*    2.9R*    5.2S     2.0*      92.7     57.1     56.3S    51.8
20   0.05  0.0   5.5      6.1R*    8.4*     4.2       99.7     99.7     99.3     99.4
           0.1   7.6*     5.1R     7.1*     4.1       98.8     97.9     97.1     95.6
           0.4   21.6*    4.2R     7.1*     3.3*      93.9     76.6     78.4S    68.8
           0.8   46.0*    2.3R*    4.7S     1.2*      86.2     38.3     35.2S    30.3
10   0.05  0.0   4.4      7.8R*    8.9S*    3.4*      90.7     91.1R    85.2S    72.2
           0.1   6.5*     6.2R*    7.6S*    1.7*      86.5     79.1R    76.0S    56.4
           0.4   18.9*    4.8R     6.4S*    1.7*      79.2     45.4R    44.8S    21.1
           0.8   42.5*    1.3R*    4.0S*    5.5       73.4     14.0R    15.1S    3.4

For assessing power, $p_2 = p_1 + 0.10$.
exceeded the nominal level in 3 cases and fell below the nominal level in 4; these results occurred
mainly at $p_1 = 0.05$.
Table IV summarizes the results for a balanced case of exactly two units per cluster. The size of
$\tilde{X}^2_{AM}$ ran slightly above the nominal level at $p_1 = 0.20$, while the proposed test ran below the
nominal level. The size of the proposed test and both of the adjusted McNemar tests ran below
the nominal level at $p_1 = 0.05$. The adjusted McNemar tests offered more power than the
proposed test, by an average of 3 percentage points.
Finally, Table V summarizes the results when the ratio of the number of clusters to the average
cluster size is small and the clusters are of quite variable size. The size of the adjusted McNemar
tests exceeded the nominal level in most of the simulations. In contrast, the size of the proposed
test was at or below the nominal level.
ILLUSTRATIVE EXAMPLE
Table VI summarizes the specificity data for PET and SPECT. Specificity is defined as the
proportion of true negative (TN) test results among glands confirmed not to have
hyperparathyroidism. In 21 patients there were 51 glands confirmed at surgery not to have
hyperparathyroidism. (Note that there was only one patient who had more than one gland with
hyperparathyroidism, so we focus our illustration of these methods on estimating and comparing
the specificities of the two diagnostic tests, rather than the sensitivities.)
From equation (1), the estimated specificity of PET is 40/51, or 0.784; its standard error is
0.0696 (from equation (2)). The estimated specificity of SPECT is 46/51, or 0.902, and its standard
error is 0.0380. Under the hypothesis of homogeneity, the estimated variance of the estimated
difference between the specificities of PET and SPECT is 0.00484. The value of the test statistic in
(6) is 2.88. We compare 2.88 to a chi-square distribution with 1 d.f. The associated p-value is 0.089,
and we conclude that there is marginal evidence that the specificity of SPECT exceeds that of
PET. The 95 per cent confidence interval for the difference in specificities is [−0.242, 0.006].
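The calculation in this paragraph can be retraced from the per-patient summaries in Table VI. The Python sketch below is an illustration rather than the author's code: the helper `cross` is a hypothetical name, the one-degree-of-freedom chi-square tail is obtained from the standard normal distribution, and the confidence interval is assumed to be the usual normal-theory interval built from equation (4).

```python
from statistics import NormalDist

# Per-patient summaries from Table VI: number of glands, and number of
# true negative results by PET and by SPECT (patients 1 to 21)
n       = [3, 3, 3, 1, 3, 4, 3, 2, 2, 1, 3, 2, 3, 2, 2, 3, 3, 3, 2, 1, 2]
x_pet   = [0, 2, 3, 1, 2, 4, 3, 2, 2, 1, 2, 2, 3, 2, 0, 2, 2, 2, 2, 1, 2]
x_spect = [2, 3, 3, 1, 3, 4, 3, 2, 1, 1, 2, 2, 3, 2, 2, 2, 2, 3, 2, 1, 2]

m, total = len(n), sum(n)
scale = m / (m - 1) / total ** 2


def cross(p_a, x_a, p_b, x_b):
    """Scaled sum of products of cluster residuals (equations (2), (3), (7))."""
    return scale * sum((xa - nj * p_a) * (xb - nj * p_b)
                       for nj, xa, xb in zip(n, x_a, x_b))


p_pet, p_spect = sum(x_pet) / total, sum(x_spect) / total    # equation (1)
se_pet = cross(p_pet, x_pet, p_pet, x_pet) ** 0.5            # equation (2)
se_spect = cross(p_spect, x_spect, p_spect, x_spect) ** 0.5

# Homogeneity test, equations (6) and (7), using the pooled proportion
diff = p_pet - p_spect
p_bar = (p_pet + p_spect) / 2
var_null = (cross(p_bar, x_pet, p_bar, x_pet)
            + cross(p_bar, x_spect, p_bar, x_spect)
            - 2 * cross(p_bar, x_pet, p_bar, x_spect))
x2 = diff ** 2 / var_null
p_value = 2 * (1 - NormalDist().cdf(x2 ** 0.5))   # chi-square tail, 1 d.f.

# 95 per cent interval for the difference, using equation (4)
var_diff = (se_pet ** 2 + se_spect ** 2
            - 2 * cross(p_pet, x_pet, p_spect, x_spect))
half_width = NormalDist().inv_cdf(0.975) * var_diff ** 0.5
ci = (diff - half_width, diff + half_width)
print(se_pet, se_spect, var_null, x2, p_value, ci)
```

Run on these data, the sketch agrees with the reported standard errors, null variance and confidence interval. The unrounded statistic comes out near 2.86 (p-value about 0.09) rather than 2.88; working from the rounded values 0.118 and 0.00484 gives 2.88, which may explain the small gap.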
The adjusted McNemar test based on the correlation estimated from discordant pairs only
yields $\hat\rho$ equal to 1.0, and the test statistic is 3.00. We compare 3.00 to a chi-square distribution
with 1 d.f. The associated p-value is 0.084, and we conclude that there is marginal evidence that
the specificities of the two tests differ.
The adjusted McNemar test based on the correlation estimated from both discordant and
concordant pairs yields $\tilde\rho$ equal to 0.458, and the test statistic is 3.66. We compare 3.66 to
a chi-square distribution with 1 d.f. The associated p-value is 0.056, and we conclude that there is
marginal evidence that the specificities of the two tests differ. Note that the two versions of the
adjusted McNemar test [3] require unit-specific data (that is, paired data on each gland as given in
the brackets in Table VI), whereas the proposed test requires only the summary data for each
cluster (that is, $x_{ij}$ and $n_j$).
Finally, if we had ignored the intracluster correlation and performed McNemar's test, the $2 \times 2$
table would be:

                 SPECT
              TN     FP
PET    TN     39      1
       FP      7      4

where TN denotes true negative test results and FP denotes false positive test results. McNemar's
test statistic is 4.5 with associated p-value of 0.034. Thus, based on McNemar's test, we would
reject the null hypothesis.
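As a check, the $2 \times 2$ table and the unadjusted McNemar statistic can be rebuilt from the gland-level results listed in brackets in Table VI. The following Python sketch is illustrative only; the variable names are hypothetical, and the p-value uses the fact that a chi-square variable with 1 d.f. is the square of a standard normal variable.

```python
from collections import Counter
from statistics import NormalDist

# Gland-level results from Table VI (1 = true negative, 0 = false positive),
# one inner list per patient, PET and SPECT in the same gland order.
pet = [[0, 0, 0], [1, 1, 0], [1, 1, 1], [1], [1, 1, 0], [1, 1, 1, 1], [1, 1, 1],
       [1, 1], [1, 1], [1], [1, 1, 0], [1, 1], [1, 1, 1], [1, 1], [0, 0],
       [1, 1, 0], [1, 1, 0], [1, 1, 0], [1, 1], [1], [1, 1]]
spect = [[0, 1, 1], [1, 1, 1], [1, 1, 1], [1], [1, 1, 1], [1, 1, 1, 1], [1, 1, 1],
         [1, 1], [1, 0], [1], [1, 1, 0], [1, 1], [1, 1, 1], [1, 1], [1, 1],
         [1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 1], [1], [1, 1]]

# Pool all glands, ignoring which patient they belong to
table = Counter((a, b) for pa, ps in zip(pet, spect) for a, b in zip(pa, ps))

b = table[(1, 0)]    # PET true negative, SPECT false positive
c = table[(0, 1)]    # PET false positive, SPECT true negative
mcnemar = (b - c) ** 2 / (b + c)
p_value = 2 * (1 - NormalDist().cdf(mcnemar ** 0.5))
print(dict(table), mcnemar, round(p_value, 3))
```

Pooling the glands in this way treats all 51 pairs as independent, which is exactly the assumption that the clustered methods avoid.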
Table VI. Specificity of PET versus SPECT

Patient number   Number of glands ($n_j$)   $x_{\mathrm{PET},j}$   $x_{\mathrm{SPECT},j}$
1                3                          0 [0,0,0]              2 [0,1,1]
2                3                          2 [1,1,0]              3 [1,1,1]
3                3                          3 [1,1,1]              3 [1,1,1]
4                1                          1 [1]                  1 [1]
5                3                          2 [1,1,0]              3 [1,1,1]
6                4                          4 [1,1,1,1]            4 [1,1,1,1]
7                3                          3 [1,1,1]              3 [1,1,1]
8                2                          2 [1,1]                2 [1,1]
9                2                          2 [1,1]                1 [1,0]
10               1                          1 [1]                  1 [1]
11               3                          2 [1,1,0]              2 [1,1,0]
12               2                          2 [1,1]                2 [1,1]
13               3                          3 [1,1,1]              3 [1,1,1]
14               2                          2 [1,1]                2 [1,1]
15               2                          0 [0,0]                2 [1,1]
16               3                          2 [1,1,0]              2 [1,1,0]
17               3                          2 [1,1,0]              2 [1,1,0]
18               3                          2 [1,1,0]              3 [1,1,1]
19               2                          2 [1,1]                2 [1,1]
20               1                          1 [1]                  1 [1]
21               2                          2 [1,1]                2 [1,1]
Total            $\sum_j n_j = 51$          $\sum_j x_{\mathrm{PET},j} = 40$   $\sum_j x_{\mathrm{SPECT},j} = 46$

$x_{ij}$ is the number of true negative test results for patient $j$, test $i$.
The numbers in brackets correspond with the test result for each gland, where
1 indicates that the gland was called negative for hyperparathyroidism (that is, true
negative) and 0 indicates that the gland was called positive (that is, false positive). The
PET test result listed first in the brackets corresponds with the SPECT test result
listed first, etc.
DISCUSSION
In 1991 Eliasziw and Donner [3] reported that for matched pair data sampled in clusters,
McNemar's test can lead to a tenfold increase in the type I error rate. We observed similar
increases. In Eliasziw and Donner's study, they found that their adjusted McNemar test based on
the correlation estimated from discordant pairs only (that is, $\hat{X}^2_{AM}$) yielded significance levels
slightly above nominal. Furthermore, this test statistic cannot be computed when there is only
one type of discordant pair in the sample (that is, $(+,-)$ or $(-,+)$). Our simulation study also
supports these findings. Eliasziw and Donner recommended use of their adjusted McNemar test,
which uses the correlation estimated from both concordant and discordant pairs (that is,
$\tilde{X}^2_{AM}$) for most applications. However, we found that for some correlation patterns, the adjusted
McNemar test can exceed the nominal size. In our study, the size of $\tilde{X}^2_{AM}$ was inflated when the
correlation between treatments of the same unit exceeded the correlation between treatments of
different units in the same cluster (that is, $r_3 > r_4$ in Figure 1). Moreover, we anticipate that this
type of correlation pattern might occur often in practice.
The size of the test proposed here, on the other hand, was close to the nominal level for all
correlation structures considered in this study. The proposed test rarely exceeded the nominal
type I error rate. In samples with few events (that is, few clusters and/or few units and/or low
response rate), the size of the proposed test may be below the nominal level. The power of the
proposed test is generally lower than that of Eliasziw and Donner's tests (that is, in those cases where the
sizes of their tests are at the nominal level, the average loss in power was 1–3 percentage points).
The choice between the adjusted McNemar tests and the test proposed here depends primarily
on the correlation pattern. In practice, when little is known about the correlation pattern, the test
proposed here is a good alternative to the adjusted McNemar tests because it will rarely exceed
the nominal size and the loss in power is small. The proposed test is also simple to implement.
Note that one can also use generalized estimating equations (GEE) [4, 5] for such data. The GEE
approach is more complicated in terms of both the model and computations, but may be more
powerful when the model is specified correctly. In addition, covariates can be included in the
model. Smith and Hadgu [11] describe the application of GEE for estimating sensitivity and
specificity and standard errors for clustered binary data.
ACKNOWLEDGEMENTS
Thanks to two referees whose comments greatly improved this paper, to Michael Lieber
and Dr. Mark Schluchter for constructive comments on the presentation of this work, and
to Dr. Donald Neumann for the use of his data.
REFERENCES
1. Conover, W. J. Practical Nonparametric Statistics, 2nd edn, Wiley, New York, 1980.
2. Rao, J. N. K. and Scott, A. J. A simple method for the analysis of clustered binary data, Biometrics, 48,
577–585 (1992).
3. Eliasziw, M. and Donner, A. Application of the McNemar test to non-independent matched pair data,
Statistics in Medicine, 10, 1981–1991 (1991).
4. Liang, K.-Y. and Zeger, S. L. Longitudinal data analysis using generalized linear models, Biometrika,
73, 13–22 (1986).
5. Zeger, S. L. and Liang, K.-Y. Longitudinal data analysis for discrete and continuous outcomes,
Biometrics, 42, 121–130 (1986).
6. Neumann, D. R., Esselstyn, C. B., MacIntyre, W. J., Go, R. T., Obuchowski, N. A., Chen, E. Q. and
Licata, A. A. Comparison of FDG PET and Sestamibi-SPECT in Primary Hyperparathyroidism,
Journal of Nuclear Medicine, 37, 1809–1815 (1996).
7. Scott, A. J. and Wu, C. F. J. On the asymptotic distribution of ratio and regression estimators, Journal
of the American Statistical Association, 76, 98–102 (1981).
8. Donner, A. Statistical methodology for paired cluster designs, American Journal of Epidemiology, 126,
972–979 (1987).
9. Fleiss, J. L. Statistical Methods for Rates and Proportions, 2nd edn, Wiley, New York, 1981.
10. Petryshen, P. A study of patient-psychiatrist disagreement: contributing variables and consequences,
unpublished doctoral dissertation, The University of Western Ontario, 1988.
11. Smith, P. J. and Hadgu, A. Sensitivity and specificity for correlated observations, Statistics in Medicine,
11, 1503–1509 (1992).
