Sample Size Considerations For Stepped Wedge Designs With Subclusters

Received: 7 May 2021 Revised: 12 October 2021 Accepted: 21 October 2021
DOI: 10.1111/biom.13596
BIOMETRIC METH ODOLOGY
Sample size considerations for stepped wedge designs with

subclusters
Kendra Davis-Plourde1,2,3 Monica Taljaard4,5 Fan Li1,3
1Department of Biostatistics, Yale School

of Public Health, New Haven, Abstract
Connecticut, USA The stepped wedge cluster randomized trial (SW-CRT) is an increasingly popular
2Department of Internal Medicine, Yale design for evaluating health service delivery or policy interventions. An essen-
School of Medicine, New Haven,
Connecticut, USA
tial consideration of this design is the need to account for both within-period
3Center for Methods in Implementation and between-period correlations in sample size calculations. Especially when
and Prevention Science, Yale University, embedded in health care delivery systems, many SW-CRTs may have subclus-
New Haven, Connecticut, USA
ters nested in clusters, within which outcomes are collected longitudinally. How-
4Clinical Epidemiology Program, Ottawa
ever, existing sample size methods that account for between-period correlations
Hospital Research Institute, Ottawa,
Ontario, Canada have not allowed for multiple levels of clustering. We present computationally
5School of Epidemiology and Public efficient sample size procedures that properly differentiate within-period and
Heath, University of Ottawa, Ottawa, between-period intracluster correlation coefficients in SW-CRTs in the presence
Ontario, Canada
of subclusters. We introduce an extended block exchangeable correlation matrix
Correspondence Department of Bio- to characterize the complex dependencies of outcomes within clusters. For Gaus-
statistics, Yale School of Public Health,
sian outcomes, we derive a closed-form sample size expression that depends on
New Haven, CT 06520, USA.
Email: kendra.plourde@yale.edu the correlation structure only through two eigenvalues of the extended block
exchangeable correlation structure. For non-Gaussian outcomes, we present a
First published: 31 October 2021.
generic sample size algorithm based on linearization and elucidate simplifica-
Funding information tions under canonical link functions. For example, we show that the approxi-
National Institute on Aging, Grant/Award mate sample size formula under a logistic linear mixed model depends on three
Number: U54AG063546
eigenvalues of the extended block exchangeable correlation matrix. We provide
an extension to accommodate unequal cluster sizes and validate the proposed
methods via simulations. Finally, we illustrate our methods in two real SW-CRTs
with subclusters.
KEYWORDS
cluster randomized trial, eigenvalues, extended block exchangeable correlation structure, gen-
eralized linear mixed models, power analysis
1 INTRODUCTION
and statistical efficiency (Hemming and Taljaard, 2020).
Stepped wedge cluster randomized trials (SW-CRTs) are In an SW-CRT, one or more randomly allocated clusters
an increasingly popular trial design for a variety of rea- are scheduled to transition from control to intervention at
sons including their potential benefits of increased power prespecified time points until all clusters are exposed to
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium,
provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
© 2021 The Authors. Biometrics published by Wiley Periodicals LLC on behalf of International Biometric Society.
98 wileyonlinelibrary.com/journal/biom Biometrics. 2023;79:98–112.

15410420, 2023, 1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/biom.13596 by University of Hong Kong, Wiley Online Library on [23/01/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
DAVIS-PLOURDE et al. 99
Period 1 Period 2 Period 3 Period 4 Period 5 vidually randomized trial needs to be multiplied to obtain
Cluster 1 the sample size required for a three-level CRT and coin-
Cluster 2 cides with the leading eigenvalue of the nested exchange-
Cluster 3
able correlation structure (Li et al., 2019). The same DE
expression also applies to three-level CRTs with a binary
Cluster 4
outcome (Teerenstra et al., 2010; Liu and Colditz, 2020).
Cluster 5
Frequently, it is assumed that the between-subcluster ICC
Cluster 6
does not exceed the within-subcluster ICC (Teerenstra
Cluster 7 et al., 2010), which suggests, based on the DE, that 𝜌0 = 𝛼0
Cluster 8 would lead to conservative sample size estimates under
parallel randomization.
F I G U R E 1 A schematic illustration of a SW-CRT with eight The design and analysis of SW-CRTs have been con-
clusters and five periods. Each white cell indicates a cluster period ventionally based on linear mixed models (Li et al.,
under the control condition and each gray cell indicates a cluster
2021). Assuming a Gaussian outcome, Hemming et al.
period under the intervention condition. There are in total 𝑆 = 4
(2015) proposed a generic framework for designing cross-
distinct intervention sequences
sectional SW-CRTs and extended the Hussey and Hughes
(2007) linear mixed model by including an additional
the intervention. Figure 1 provides a schematic of a typi- random intercept at the subcluster level. The variance of
cal SW-CRT in which eight clusters are allocated to four the intervention effect estimator was obtained using the
sequences and outcomes are observed within each of five covariance matrix of the generalized least squares estima-
time periods. SW-CRTs can employ either a cross-sectional tor, but no analytical formulas were provided. Teerenstra
or closed-cohort design, depending on whether repeated et al. (2019) extended the Hemming et al. (2015) approach
measurements are taken from the same or different indi- to accommodate closed-cohort SW-CRTs and developed
viduals in each period. closed-form DE expressions. However, these approaches
A comprehensive methodological review of available implicitly assume that the between-period ICC equals the
statistical methods for SW-CRTs is presented in Li et al. within-period ICC, both within and between subclusters.
(2021). While almost all current methods for designing Under this strong assumption, we show in Section 3.1
SW-CRTs have assumed a two-level structure within peri- that the variance of the intervention effect vanishes as
ods (e.g., patients nested in practices), several recent the subcluster size grows indefinitely, which may lead to
trials have utilized data from patients nested within sub- an underestimated sample size when the between-period
clusters, giving rise to three-level structures within peri- ICCs differ from the within-period ICCs. To the best of our
ods. For example, the Lumbar Imaging with Reporting knowledge, explicit sample size formulas that differentiate
of Epidemiology (LIRE) trial was an SW-CRT that evalu- between-period and within-period ICCs in SW-CRTs with
ated the impact of inserting benchmark prevalence data in multiple levels of clustering have not been previously
spinal imaging reports on subsequent health care utiliza- derived. Furthermore, with a binary outcome, Teerenstra
tion, and cross-sectionally sampled patients nested in pri- et al. (2019) considered an approximation based on a
mary care providers, who are nested in practices (Jarvik linear mixed model, which in the case of a single level of
et al., 2015). As another example, the Washington State clustering has already been shown to be inaccurate (Zhou
Expedited Partner Therapy (EPT) SW-CRT randomized et al., 2020). To this end, it is necessary to develop more
local health jurisdictions (LHJ) consisting of clinics and accurate sample size procedures that acknowledge the
cross-sectionally measured chlamydia infection in patients mean-variance relationship with a non-Gaussian outcome
(Golden et al., 2015). In this paper, we present new meth- for multilevel SW-CRTs.
ods for designing SW-CRTs that explicitly account for the To address these issues, we consider a generalized lin-
three-level structure within each period while additionally ear mixed model (GLMM) that characterizes five possible
accounting for the between-period correlation parameters. sources of random variation in SW-CRTs with subclusters,
For the standard parallel-arm CRT design (without while differentiating within-period and between-period
repeated measures), Heo and Leon (2008) and Teerenstra ICCs. With a Gaussian outcome, we describe an extended
et al. (2010) developed the design effect (DE) for three-level block exchangeable correlation structure parameterized by
data with a Gaussian outcome. The DE is expressed as 1 + at most five ICC parameters, depending on whether a
(𝑁 − 1)𝛼0 + 𝑁(𝐾 − 1)𝜌0 , where 𝛼0 and 𝜌0 are the within- cross-sectional or closed-cohort design is assumed at each
subcluster and between-subcluster intracluster correlation level. We show in Section 3 that the variance of the inter-
coefficients (ICCs), 𝐾 and 𝑁 are the number of subclus- vention effect estimator depends on the ICCs only through
ters and subcluster size, respectively. This DE represents two distinct eigenvalues of the extended block exchange-
the amount by which the sample size required for an indi- able correlation matrix. Although a scalar closed-form
100 DAVIS-PLOURDE et al.
variance expression is unavailable for other exponential tion and 0 otherwise), and 𝛿 is the intervention effect
family outcomes, we propose a computationally efficient of interest on the link function scale. We also write
⊤
generic sample size algorithm based on cluster period 𝜽 = (𝛽1 , … , 𝛽𝑇 , 𝛿) . Model (1) includes five random effects
averages. Under the canonical link functions, new strate- to reflect the multilevel structure of the data: 𝑏𝑖 ∼  (0, 𝜎𝑏2 )
gies are provided to circumvent complex numerical inte- is the random cluster effect, 𝑐𝑖𝑘 ∼  (0, 𝜎𝑐2 ) is the ran-
gration. In addition, we also extend our approach to dom subcluster effect, 𝑠𝑖𝑗 ∼  (0, 𝜎𝑠2 ) is the random
accommodate unequal cluster sizes. Our simulation stud- cluster-by-period interaction, and 𝜋𝑖𝑗𝑘 ∼  (0, 𝜎𝜋2 ) is the
ies to validate the proposed sample size methodology are random subcluster-by-period interaction. Furthermore,
presented in Section 4 and applications to two SW-CRTs since design (A) involves a closed-cohort of subjects,
with subclusters are presented in Section 5. Section 6 pro- 𝛾𝑖𝑘𝑙 ∼  (0, 𝜎𝛾2 ) is the random subject-level effect arising
vides concluding remarks. from the repeated outcome measurements for each same
subject. In particular, the random interactions, 𝑠𝑖𝑗 and 𝜋𝑖𝑗𝑘 ,
can represent variations within each cluster and subcluster
2 MODELS AND GENERIC SAMPLE due to time-varying characteristics of the cluster and each
SIZE CONSIDERATIONS subcluster. The inclusion of these random interactions
further allows the within-period ICCs to differ from the
2.1 Three design variants between-period ICCs at each level of clustering, which is
considered critical for accurate power calculation even in
We consider a multilevel SW-CRT with I clusters and T SW-CRTs without subclusters (Taljaard et al., 2016).
periods, with subclusters nested in clusters and subjects Following convention in modeling for SW-CRTs (Li et al.,
nested in subclusters. We consider three possible varia- 2021), we assume all random effects are mutually indepen-
tions of this multilevel stepped wedge design (Figure 2): dent and 𝑌𝑖𝑗𝑘𝑙 is a random realization from a parametric
(A) a closed-cohort design at both the subcluster and sub- distribution with mean 𝜇𝑖𝑗𝑘𝑙 and higher order moments
ject levels, in which case repeated measurements are taken as potential functions of 𝜇𝑖𝑗𝑘𝑙 . Model (1) includes several
on the same subjects in each subcluster over time; (B) a existing models for SW-CRTs as special cases. For exam-
closed-cohort design on the subcluster level but a cross- ple, when 𝜎𝑠2 = 𝜎𝜋2 = 𝜎𝛾2 = 0, model (1) coincides with the
sectional design at the subject level, in which case different model in Hemming et al. (2015) for cross-sectional SW-
subjects in the same subcluster are sampled at each time CRTs; when 𝜎𝑠2 = 𝜎𝜋2 = 0, model (1) becomes the model in
period; (C) a cross-sectional design at both the subcluster Teerenstra et al. (2019) without cluster- or subcluster-by-
and subject level, in which case different subjects within time interactions; when 𝜎𝑐2 = 𝜎𝜋2 = 0, model (1) reduces
different subclusters are sampled in each cluster during to the model in Hooper et al. (2016) for closed-cohort
each period. Our development will include all three vari- SW-CRTs without subclusters. Finally, models for design
ants. (B) or (C) can be obtained by setting 𝜎𝛾2 = 0 or 𝜎𝛾2 = 𝜎𝑐2 = 0
in model (1), due to the absence of repeated assessments
for the same subjects or the same subclusters.
2.2 Statistical model
We consider a GLMM to represent the average secu- 2.3 Generic sample size requirement
lar trend, intervention effect, as well as the three-level
structure within each period. Let 𝑌𝑖𝑗𝑘𝑙 be the outcome In the design phase, the power to detect an effect size 𝛿 ≠ 0
of interest for individual 𝑙 = 1, … , 𝑁𝑖𝑗𝑘 nested in sub- with a two-sided 𝛼-level Wald test is approximately (Har-
cluster 𝑘 = 1, … , 𝐾𝑖𝑗 nested in cluster 𝑖 = 1, … , 𝐼 during rison and Brady, 2004)
period 𝑗 = 1, … , 𝑇. For generality, we assume design (A), ( √ )
where a closed-cohort of subjects in the same set of sub- ˆ
power ≈ 1 − Φ𝑡 𝑡𝛼∕2,DoF ; DoF, |𝛿|∕ var(𝛿) , (2)
clusters are measured in the study during each period.
Designs (B) and (C) will be cast as two special cases.
For design (A), the conditional mean model for 𝜇𝑖𝑗𝑘𝑙 = where 𝑡𝛼∕2,DoF is the upper 𝛼∕2th quantile of the cen-
𝔼(𝑌𝑖𝑗𝑘𝑙 |𝑏𝑖 , 𝑐𝑖𝑘 , 𝑠𝑖𝑗 , 𝜋𝑖𝑗𝑘 , 𝛾𝑖𝑘𝑙 ) is given by tral 𝑡-distribution with specified degrees of freedom
(DoF) and Φ𝑡 (𝑡; DoF, Λ) is the cumulative 𝑡-distribution
𝑔(𝜇𝑖𝑗𝑘𝑙 ) = 𝛽𝑗 + 𝛿𝑋𝑖𝑗 + 𝑏𝑖 + 𝑐𝑖𝑘 + 𝑠𝑖𝑗 + 𝜋𝑖𝑗𝑘 + 𝛾𝑖𝑘𝑙 , (1) function with DoF and noncentrality parameter Λ. The
ˆ is an implicit
variance of the intervention effect, var(𝛿),
where 𝑔 is a link function, 𝛽𝑗 represents the categorical function of the number of clusters (𝐼) and other design
secular trend, 𝑋𝑖𝑗 is the intervention status for cluster parameters. While power expression (2) is asymptotically
𝑖 at period 𝑗 (equal to 1 if exposed under interven- equivalent to its counterpart with a standard normal
(A) Closed-cohort design at both the subcluster and subject levels: repeated measurements are taken on
the same subjects in each subcluster over time.
Cluster i
Period 1 .................................................................... Period T
Subcluster 1 ................ Subcluster Ki Subcluster 1 ................ Subcluster Ki
Indiv. 1 ... Indiv. Ni Indiv. 1 ... Indiv. Ni Indiv. 1 ... Indiv. Ni Indiv. 1 ... Indiv. Ni
(B) Closed-cohort design on the subcluster level but a cross-sectional design at the subject level:
different subjects in the same subcluster are sampled at each time period.
Cluster i
Period 1 .................................................................... Period T
(C) Cross-sectional design at both the subcluster and subject level: different subjects within different
subclusters are sampled in each cluster during each period.
Cluster i
Period 1 .................................................................... Period T
F I G U R E 2 Three design variants for a multilevel SW-CRT with 𝑇 periods, 𝐾𝑖 subclusters per cluster, and 𝑁𝑖 individuals (Indiv.) per
subcluster. Colors denote unique subclusters and individuals
distribution as the DoF approach infinity, we consider the 3 VARIANCE OF THE INTERVENTION
𝑡-distribution with DoF = 𝐼 − 2 since it has been found to EFFECT ESTIMATOR
maintain a valid empirical type I error rate with a limited
number of clusters in SW-CRTs (Ford and Westgate, 3.1 Gaussian outcomes
2020; Li et al., 2021). By Equation (2), the sample size
requirement for the multilevel SW-CRT requires us to With a Gaussian outcome, we consider 𝑔 in model (1)
characterize an expression of var(𝛿)ˆ at the design phase. to be the identity function, and assume 𝑌𝑖𝑗𝑘𝑙 = 𝜇𝑖𝑗𝑘𝑙 +
To do so, we first assume equal cluster and subcluster sizes 𝜖𝑖𝑗𝑘𝑙 , where 𝜖𝑖𝑗𝑘𝑙 ∼  (0, 𝜎𝜖2 ) is an independent residual
such that 𝐾𝑖𝑗 = 𝐾 and 𝑁𝑖𝑗𝑘 = 𝑁. We relax this assumption error. In this linear mixed model, we define 𝜎2 = 𝜎𝑏2 +
in Section 3.3. 𝜎𝑐2 + 𝜎𝑠2 + 𝜎𝜋2 + 𝜎𝛾2 + 𝜎𝜖2 as the total variance. Under design
T A B L E 1 Definition of ICCs: within-period within-subcluster (𝛼0 ), between-period within-subcluster (𝛼1 ), within-subject

auto-correlation (𝛼2 ), within-period between-subcluster (𝜌0 ), and between-period between-subcluster (𝜌1 ) under each design variant: (A)
Closed-cohort design at both the subcluster and subject levels (total variance 𝜎𝐴2 = 𝜎𝑏2 + 𝜎𝑐2 + 𝜎𝑠2 + 𝜎𝜋2 + 𝜎𝛾2 + 𝜎𝜖2 ), (B) Closed-cohort design on
the subcluster level but a cross-sectional design at the subject level (total variance 𝜎𝐵2 = 𝜎𝑏2 + 𝜎𝑐2 + 𝜎𝑠2 + 𝜎𝜋2 + 𝜎𝜖2 ), and (C) Cross-sectional
design at both the subcluster and subject level (total variance 𝜎𝐶2 = 𝜎𝑏2 + 𝜎𝑠2 + 𝜎𝜋2 + 𝜎𝜖2 )
ICC Definition Design (A) Design (B) Design (C)
( ) ( 2 ) ( 2 ) ( 2 )
𝛼0 corr 𝑌𝑖𝑗𝑘𝑙 , 𝑌𝑖𝑗𝑘𝑙′ 𝜎 + 𝜎𝑐2 + 𝜎𝑠2 + 𝜎𝜋2 ∕𝜎𝐴2 𝜎 + 𝜎𝑐2 + 𝜎𝑠2 + 𝜎𝜋2 ∕𝜎𝐵2 𝜎𝑏 + 𝜎𝑠2 + 𝜎𝜋2 ∕𝜎𝐶2
( ) ( 𝑏2 ) ( 𝑏2 )
𝛼1 corr 𝑌𝑖𝑗𝑘𝑙 , 𝑌𝑖𝑗′ 𝑘𝑙′ 𝜎 + 𝜎𝑐2 ∕𝜎𝐴2 𝜎𝑏 + 𝜎𝑐2 ∕𝜎𝐵2 𝜌1
( ) ( 𝑏2 )
𝛼2 corr 𝑌𝑖𝑗𝑘𝑙 , 𝑌𝑖𝑗′ 𝑘𝑙 𝜎 + 𝜎𝑐2 + 𝜎𝛾2 ∕𝜎𝐴2 𝛼1 𝜌1
( ) ( 𝑏2 ) ( 2 ) ( 2 )
𝜌0 corr 𝑌𝑖𝑗𝑘𝑙 , 𝑌𝑖𝑗𝑘′ 𝑙′ 𝜎𝑏 + 𝜎𝑠2 ∕𝜎𝐴2 𝜎𝑏 + 𝜎𝑠2 ∕𝜎𝐵2 𝜎𝑏 + 𝜎𝑠2 ∕𝜎𝐶2
( )
𝜌1 corr 𝑌𝑖𝑗𝑘𝑙 , 𝑌𝑖𝑗′ 𝑘′ 𝑙′ 𝜎𝑏2 ∕𝜎𝐴2 𝜎𝑏2 ∕𝜎𝐵2 𝜎𝑏2 ∕𝜎𝐶2
(A), this linear mixed model induces five different ICCs: 𝑰𝐾 ⊗ {(1 − 𝛼0 )𝑰𝑁 + (𝛼0 − 𝜌0 )𝑱𝑁 } + 𝜌0 𝑱𝐾𝑁 and 𝑰𝐾 ⊗ {(𝛼2 −
(i) the within-period within-subcluster ICC, 𝛼0 ; (ii) the 𝛼1 )𝑰𝑁 + (𝛼1 − 𝜌1 )𝑱𝑁 } + 𝜌1 𝑱𝐾𝑁 , respectively, and, there-
between-period within-subcluster ICC, 𝛼1 ; (iii) the within- fore, the structure (3) resembles the block exchangeable
period between-subcluster ICC, 𝜌0 ; (iv) the between- structure studied in Li et al. (2018). For deriving the vari-
period between-subcluster ICC, 𝜌1 ; and (v) the within- ance of the intervention effect, we first provide the two key
subject autocorrelation, 𝛼2 . We explicitly define the ICCs intermediate results on the induced correlation structure.
as functions of variance components under each design
variant in Table 1. Lemma 1. The induced extended block exchangeable corre-
As the variance components are nonnegative, we implic- lation matrix has at most six unique eigenvalues, which can
itly have 𝛼0 ≥ 𝛼1 ≥ 𝜌1 , 𝛼0 ≥ 𝜌0 ≥ 𝜌1 , and 𝛼2 ≥ 𝛼1 ≥ 𝜌1 be expressed as linear functions of the five ICC parameters:
without further restrictions. To aid in the interpretation of
ICCs, an investigator could also parameterize the between- 𝜆1 = 1 − 𝛼0 − 𝛼2 + 𝛼1
period ICCs based on the within-period ICCs by assum-
ing a particular cluster autocorrelation coefficient (CAC), 𝜆2 = 1 − 𝛼0 − 𝛼2 + 𝛼1 + 𝑁(𝛼0 − 𝛼1 − 𝜌0 + 𝜌1 )
defined as the ratio of between-period to within-period 𝜆3 = 1 − 𝛼0 − 𝛼2 + 𝛼1 + 𝑁{𝛼0 − 𝛼1 + (𝐾 − 1)(𝜌0 − 𝜌1 )}
ICCs (Hooper et al., 2016; Martin et al., 2016). Because
ICCs are conventionally used in designing CRTs and intu- 𝜆4 = 1 − 𝛼0 + (𝑇 − 1)(𝛼2 − 𝛼1 )
itive to understand, we will characterize the variance of
𝜆5 = 1 − 𝛼0 + (𝑇 − 1)(𝛼2 − 𝛼1 )
the intervention effect with these five ICC parameters and
obtain the variances for design (B) or (C) by setting 𝛼2 = + 𝑁{𝛼0 − 𝜌0 + (𝑇 − 1)(𝛼1 − 𝜌1 )}
𝛼1 , or 𝛼2 = 𝛼1 = 𝜌1 . In addition, the variance components
or ICCs should ensure a positive definite correlation struc- 𝜆6 = 1 − 𝛼0 + (𝑇 − 1)(𝛼2 − 𝛼1 )
ture for all outcomes within each cluster. Specifically, if we + 𝑁[𝛼0 + (𝑇 − 1)𝛼1 + (𝐾 − 1){𝜌0 + (𝑇 − 1)𝜌1 }] (4)
⊤ ⊤ ⊤ ⊤ ⊤ ⊤
define 𝒀𝑖 = (𝒀𝑖1 , … .𝒀𝑖𝑇 ) , where 𝒀𝑖𝑗 = (𝒀𝑖𝑗1 , … , 𝒀𝑖𝑗𝐾 ) ,
⊤ with algebraic multiplicities (𝑇 − 1)𝐾(𝑁 − 1), (𝑇 − 1)(𝐾 −
and 𝒀𝑖𝑗𝑘 = (𝑌𝑖𝑗𝑘1 , … , 𝑌𝑖𝑗𝑘𝑁 ) , then the induced correla-
tion structure, 𝑹𝑖 = corr(𝒀𝑖 ), follows an extended block 1), 𝑇 − 1, 𝐾(𝑁 − 1), 𝐾 − 1, and 1, respectively. The set of ICC
exchangeable structure and can be written as a linear com- parameters {𝛼0 , 𝛼1 , 𝛼2 , 𝜌0 , 𝜌1 } for which 𝑹𝑖 is positive defi-
bination of six basis matrices, nite corresponds to the convex open subset characterized by
min{𝜆1 , 𝜆2 , 𝜆3 , 𝜆4 , 𝜆5 , 𝜆6 } > 0.
𝑹𝑖 = (1 − 𝛼0 − 𝛼2 + 𝛼1 )𝑰𝑇𝐾𝑁 + (𝛼0 − 𝜌0 − 𝛼1 + 𝜌1 )𝑰𝑇𝐾
Lemma 2. The extended block exchangeable correlation
⊗ 𝑱𝑁 + (𝜌0 − 𝜌1 )𝑰𝑇 ⊗ 𝑱𝐾𝑁 matrix has a closed-form inverse, which shares the same set
of basis matrices with 𝑹𝑖 , with coefficients determined by the
+ (𝛼2 − 𝛼1 )𝑱𝑇 ⊗ 𝑰𝐾𝑁 + (𝛼1 − 𝜌1 )𝑱𝑇
set of eigenvalues. The inverse is represented as
⊗ 𝑰𝐾 ⊗ 𝑱𝑁 + 𝜌1 𝑱𝑇𝐾𝑁 , (3)
1 𝜆 − 𝜆1 𝜆 − 𝜆3
𝑹−1 = 𝑰 − 2 𝑰 ⊗ 𝑱𝑁 + 2 𝑰 ⊗ 𝑱𝐾𝑁
where 𝑰𝑢 is a 𝑢 × 𝑢 identity matrix and 𝑱𝑢 = 𝟏𝑢 𝟏⊤
is a 𝑢 × 𝑢𝑢
𝑖 𝜆1 𝑇𝐾𝑁 𝑁𝜆1 𝜆2 𝑇𝐾 𝐾𝑁𝜆2 𝜆3 𝑇
matrix of ones. Evidently, the diagonal and off-diagonal ( )
1 1 1
𝐾𝑁 × 𝐾𝑁 blocks of 𝑹𝑖 are block exchangeable given by + − 𝑱 ⊗ 𝑰𝐾𝑁
𝑇 𝜆4 𝜆1 𝑇
( )
𝜆2 − 𝜆1 𝜆5 − 𝜆4
1 are associated with smaller required sample size when
+ − 𝑱 𝑇 ⊗ 𝑰𝐾 ⊗ 𝑱 𝑁
𝑇𝑁𝜆1 𝜆2 𝑁𝜆4 𝜆5 𝜏𝑋 < (𝜆62 − 𝜆32 )∕{𝜆62 + (𝑇 − 1)𝜆32 }.
( )
1 𝜆5 − 𝜆6 𝜆2 − 𝜆3 Theorem 1 reveals that the variance of the intervention
+ − 𝑱𝑇𝐾𝑁 .
𝑇𝐾 𝑁𝜆5 𝜆6 𝑁𝜆2 𝜆3 effect in a linear mixed model is free of the secular trend,
as long as it is adjusted for in the model. The variance (5)
Lemmas 1 and 2 derive the eigenvalues as well as an
also depends on the five ICCs only through two eigenval-
explicit inverse of the extended block exchangeable corre-
ues of the extended block exchangeable correlation matrix,
lation structure (derivation details and eigenvalue expres-
𝜆3 and 𝜆6 . Our results also confirm the different roles of the
sions under each design variant are provided in Web
within-period ICCs and the between-period ICCs. Specifi-
Appendix A and Web Table 1). By parameterizing 𝑹−1 as
𝑖 cally in the design phase, assuming larger values of 𝛼0 and
a function of the unique eigenvalues, the cumbersome
𝜌0 will only lead to a conservative sample size estimate.
expressions on individual ICCs are avoided in deriving an
Likewise, assuming smaller values of between-period ICCs
explicit variance expression of the intervention effect.
or even ignoring them by considering 𝛼2 = 𝛼1 = 𝜌1 = 0
Assuming the variance components are known, the fea-
will lead to a conservative sample size estimate if the con-
sible generalized least squares estimator for 𝜽 is given
∑𝐼 −1 ∑𝐼 straint on 𝜏𝑋 holds. These directional results can guide
by 𝜽ˆ = ( 𝑖=1 𝒁𝑖⊤ 𝑹−1 𝑖
𝒁𝑖 ) ( 𝑖=1 𝒁𝑖⊤ 𝑹−1𝑖
𝒀𝑖 ), where 𝒁𝑖 = decisions on design parameters to avoid an underpowered
(𝑰𝑇 , 𝑿𝑖 ) ⊗ 𝟏𝐾𝑁 is the design matrix for fixed effects and trial in the absence of accurate ICC estimates for an SW-
⊤
𝑿𝑖 = (𝑋𝑖1 , … , 𝑋𝑖𝑇 ) is the vector of cluster-level interven- CRT with subclusters. Finally, while we assume design (A),
tion indicators. As the number of clusters becomes large, variance expressions for design (B) or (C) can be easily
𝜽ˆ follows a multivariate normal distribution with mean 𝜽 obtained by enforcing 𝛼2 = 𝛼1 , or 𝛼2 = 𝛼1 = 𝜌1 in comput-
∑𝐼 −1
and covariance matrix 𝜎2 ( 𝑖=1 𝒁𝑖⊤ 𝑹−1 𝑖
𝒁𝑖 ) , whose (𝑇 + ing the two eigenvalues.
1, 𝑇 + 1)th element corresponds to the variance of the As expected from the generality of model (1), our
intervention effect estimator. Based on Lemma 2, we show variance expression (5) includes a number of variances
in Web Appendix A that a closed-form expression of var(𝛿) ˆ previously derived as special cases. First, we observe
exists, which reveals the impact of various ICC parameters that the variance (5) can be alternatively represented
on sample size and power. We summarize the results in by
Theorem 1.
ˆ = (𝜎2 ∕𝐾𝑁)𝐼𝑇𝜆6 𝜆3
var(𝛿) , (6)
Theorem 1. Assuming known variance components, the (𝑈 2 + 𝐼𝑇𝑈 − 𝑇𝑊 − 𝐼𝑉)𝜆6 − (𝑈 2 − 𝐼𝑉)𝜆3
closed-form variance of the intervention effect estimator
based on a linear mixed model with a Gaussian outcome is where
( 𝑇 )2
ˆ = 𝜎2 𝑇𝜆6 𝜆3 ∑ 𝑇
𝐼 ∑ 𝐼
∑ ∑
var(𝛿) × , 𝑈= 𝑋𝑖𝑗 , 𝑉 = 𝑋𝑖𝑗 ,
𝐼𝐾𝑁𝑡𝑟(𝛀) 𝑇𝜆6 − {1 + (𝑇 − 1)𝜏𝑋 }(𝜆6 − 𝜆3 )
𝑖=1 𝑗=1 𝑖=1 𝑗=1
(5)
and
where
𝑇
( 𝐼 )2
( )( ) ∑ ∑
𝐼
∑ 𝐼
∑ 𝐼
∑ 𝑊= 𝑋𝑖𝑗
𝛀= 𝐼 −1 𝑿𝑖 𝑿𝑖⊤ − 𝐼 −1 𝑿𝑖 𝐼 −1 𝑿𝑖⊤ 𝑗=1 𝑖=1
𝑖=1 𝑖=1 𝑖=1
are typical design constants that only depend on the
is the covariance matrix of the intervention vector under sequence of treatment indicators for each cluster. The con-
−1
a specific design and 𝜏𝑋 = {(𝑇 − 1)𝑡𝑟(𝛀)} {𝟏⊤ 𝑇 𝛀𝟏𝑇 −
nection between the two variance expressions is made
𝑡𝑟(𝛀)} ∈ [−1, 1] is the generalized ICC of the intervention, clear through the observance that 𝟏⊤ −2
𝑇 𝛀𝟏𝑇 = 𝐼 (𝐼𝑉 − 𝑈 )
2
−2
and 𝑡𝑟(𝛀) = 𝐼 (𝐼𝑈 − 𝑊). Using this equivalent variance
which is the ratio of average covariance over the average
variance and measures the similarity between the inter- expression (6), we see that in the absence of subclus-
vention status for each cluster in different periods (Kistner ters, the variance derived in Li et al. (2018) under a
closed-cohort SW-CRT is obtained by setting 𝜌0 = 𝛼0 and
and Muller, 2004). With all other design parameters
𝜌1 = 𝛼1 ; the variance of the Hooper et al. (2016) linear
fixed, larger values of the within-period ICCs, {𝛼0 , 𝜌0 }, are mixed model under a cross-sectional SW-CRT can be
always associated with larger required sample size, whereas obtained by additionally requiring 𝛼2 = 𝛼1 ; whereas the
larger values of the between-period ICCs, {𝛼1 , 𝜌1 , 𝛼2 }, basic Hussey and Hughes (2007) linear mixed model is
obtained by equating all five ICCs. In the presence of 3.2 Non-Gaussian exponential family
subclusters, our variance expression also includes that outcomes
derived in Teerenstra et al. (2019) as a special case when
we assume 𝜌1 = 𝜌0 and 𝛼1 = 𝛼0 . However, we caution
When the outcome is binary or count, Theorem 1 is not
against this simplification assumption, because based on
(6), we observe that lim𝑁→∞ (𝜆3 ∕𝑁) = (𝛼0 − 𝛼1 ) + (𝐾 − directly applicable because the residual variance function
1)(𝜌0 − 𝜌1 ), and lim𝑁→∞ (𝜆6 ∕𝑁) = 𝛼0 + (𝑇 − 1)𝛼1 + (𝐾 − of the outcome is no longer a constant. Specifically, with a
1){𝜌0 + (𝑇 − 1)𝜌1 } = 𝜅(𝛼0 , 𝛼1 , 𝜌0 , 𝜌1 ), which leads to a lim- link function 𝑔, we can rewrite the conditional mean of the
iting variance outcome 𝑌𝑖𝑗𝑘𝑙 from (1) as
ˆ
lim var(𝛿) 𝜇𝑖𝑗𝑘𝑙 = 𝑔−1 (𝜂𝑖𝑗𝑘𝑙 ) = 𝑔−1 (𝛽𝑗 + 𝛿𝑋𝑖𝑗 + 𝑏𝑖
𝑁→∞
+ 𝑐𝑖𝑘 + 𝑠𝑖𝑗 + 𝜋𝑖𝑗𝑘 + 𝛾𝑖𝑘𝑙 ), (7)
(𝜎2 ∕𝐾){(𝛼0 − 𝛼1 ) + (𝐾 − 1)(𝜌0 − 𝜌1 )}𝐼𝜅(𝛼0 , 𝛼1 , 𝜌0 , 𝜌1 )
= .
(𝐼𝑈 − 𝑊)𝜅(𝛼0 , 𝛼1 , 𝜌0 , 𝜌1 ) + (𝑈 2 − 𝐼𝑉){𝛼1 + (𝐾 − 1)𝜌1 }
where the fixed and random effects are defined in Sec-
Therefore, assuming the between-period ICCs are equal to tion 2.2. We further define the conditional variance of the
the within-period ICCs, the limiting variance approaches outcome as 𝜙𝜁(𝜇𝑖𝑗𝑘𝑙 ), where 𝜙 is a common dispersion.
zero and the required number of clusters approaches one Without loss of generality, we assume 𝜙 = 1 but the fol-
as the subcluster size increases indefinitely, similar to lowing procedure applies to arbitrary 𝜙 > 0. For example,
the discussion in Taljaard et al. (2016) without subclus- the variance function of a binary outcome is parameterized
ters. From this perspective, it is reasonable to differenti- as 𝜁(𝜇𝑖𝑗𝑘𝑙 ) = 𝜇𝑖𝑗𝑘𝑙 (1 − 𝜇𝑖𝑗𝑘𝑙 ). To approximate the large-
ate the between-period ICCs from the within-period ICCs sample variance of the maximum likelihood estimator for
ˆ we linearize model (7) by a first-order Taylor expansion
𝛿,
and avoid the risks associated with too few clusters in the
design phase. Furthermore, based on the proposed analyti- about the estimated fixed- and random-effects (Breslow
cal variance (6), we show in Web Appendix B that the most and Clayton, 1993; Amatya and Bhaumik, 2018) such that
efficient SW-CRT with subclusters allocates more clusters
𝒀𝑖𝑗 = 𝝁 ˆ +𝚫
ˆ 𝑖𝑗 𝒁𝑖𝑗 (𝜽 − 𝜽)
ˆ𝑖𝑗 + 𝚫 ˆ 𝑖𝑗 𝟏𝐾𝑁 (𝑏𝑖 − 𝑏ˆ𝑖 )
to the intervention during the second and last period.
Finally, while we have thus far focused on stepped ˆ 𝑖𝑗 (𝑰𝐾 ⊗ 𝟏𝑁 )(𝒄𝑖𝑗 − 𝒄ˆ𝑖𝑗 ) + 𝚫
+𝚫 ˆ 𝑖𝑗 𝟏𝐾𝑁 (𝑠𝑖𝑗 − ˆ
𝑠𝑖𝑗 )
wedge designs with a staggered allocation of interven-
tion to clusters, the derivations of Theorem 1 do not ˆ 𝑖𝑗 (𝑰𝐾 ⊗ 𝟏𝑁 )(𝝅𝑖𝑗 − 𝝅
+𝚫 ˆ 𝑖𝑗 (𝜸𝑖𝑗 − 𝜸ˆ𝑖𝑗 ) + 𝝐𝑖𝑗 , (8)
ˆ𝑖𝑗 ) + 𝚫
involve restrictions on 𝑿𝑖 , and therefore expression (5)
−1
applies more generally to any multiperiod CRT, including where 𝚫𝑖𝑗 = diag(Δ𝑖𝑗11 , … , Δ𝑖𝑗𝐾𝑁 ) = {𝜕𝑔−1 (𝜼𝑖𝑗 )∕𝜕𝜼⊤
𝑖𝑗
} is
the longitudinal parallel-arm and repeated crossover ⊤
a diagonal matrix of derivatives, 𝜼𝑖𝑗 = (𝜂𝑖𝑗11 , … , 𝜂𝑖𝑗𝐾𝑁 ) ,
designs. Clearly, variance (5) reveals that higher effi- ⊤
𝝁𝑖𝑗 = (𝜇𝑖𝑗11 , … , 𝜇𝑖𝑗𝐾𝑁 ) , 𝒁𝑖𝑗 = (𝒆𝑗 , 𝑋𝑖𝑗 ) ⊗ 𝟏𝐾𝑁 (𝒆𝑗 is the
ciency is achieved with a larger total variance of the ⊤ ⊤
intervention status, 𝑡𝑟(𝛀), and a smaller generalized 𝑗th row of 𝑰𝑇 ), 𝒄𝑖𝑗 = (𝑐𝑖1 , … , 𝑐𝑖𝐾 ) , 𝝅𝑖𝑗 = (𝜋𝑖𝑗1 , … , 𝜋𝑖𝑗𝐾 ) ,
⊤ ⊤
ICC of the intervention, 𝜏𝑋 , both of which a specific 𝜸𝑖𝑗 = (𝛾𝑖11 , … , 𝛾𝑖𝐾𝑁 ) , 𝝐𝑖𝑗 = (𝜖𝑖𝑗11 , … , 𝜖𝑖𝑗𝐾𝑁 ) , and
design will implicitly characterize. For example, with var(𝜖𝑖𝑗𝑘𝑙 ) = 𝜁(𝜇𝑖𝑗𝑘𝑙 ). Therefore, if we define the vec-
𝑇 = 5 periods and 𝐼 as a multiple of (𝑇 − 1), a standard tor of pseudo-outcomes as 𝒀𝑖𝑗∗ = 𝚫 ˆ −1 (𝒀𝑖𝑗 − 𝝁 ˆ𝑖𝑗 ) + 𝜼ˆ𝑖𝑗 and
𝑖𝑗
stepped wedge design corresponds to 𝑡𝑟(𝛀) = 0.625 and rearrange the terms in (8), we obtain an approximate linear
𝜏𝑋 = 0.25, a parallel longitudinal design corresponds to mixed model with 𝒀𝑖𝑗∗ = 𝜼𝑖𝑗 + 𝝐𝑖𝑗∗ , with a modified random
𝑡𝑟(𝛀) = 1.25 and 𝜏𝑋 = 1, whereas a repeated crossover residual error 𝝐 ∗ = 𝚫 ˆ −1 𝝐𝑖𝑗 . Define the collection of all
𝑖𝑗 𝑖𝑗
design engenders 𝑡𝑟(𝛀) = 1.25 and 𝜏𝑋 = −0.2. In Web ⊤
Table 2, we show that a repeated crossover design and pseudo-outcomes in cluster 𝑖 as 𝒀𝑖∗ = (𝒀𝑖1
∗⊤ ∗⊤
, … , 𝒀𝑖𝑇 ) , we
a parallel longitudinal design have the same values of show in Web Appendix C that
𝑡𝑟(𝛀), which is generally larger than that under a standard { }
stepped wedge design. Furthermore, the generalized 𝑽𝑖 = cov(𝒀𝑖∗ ) ≈ 𝔼 𝚫−1
𝑖
𝜁(𝝁𝑖 )𝚫−1
𝑖
ICC of the intervention indicator is maximal under a + 𝜎𝜋2 (𝑰𝑇𝐾 ⊗ 𝑱𝑁 ) + 𝜎𝑠2 (𝑰𝑇 ⊗ 𝑱𝐾𝑁 ) + 𝜎𝛾2 (𝑱𝑇 ⊗ 𝑰𝐾𝑁 )
longitudinal parallel-arm design, whereas 𝜏𝑋 ∈ (0, 1) + 𝜎𝑐2 (𝑱𝑇 ⊗ 𝑰𝐾 ⊗ 𝑱𝑁 ) + 𝜎𝑏2 𝑱𝑇𝐾𝑁 , (9)
in a standard stepped wedge design. However, the gen-
eralized ICC, 𝜏𝑋 , is often negative under a repeated with 𝚫𝑖 = ⊕𝑇𝑗=1 𝚫𝑖𝑗 where “⊕” is a block diagonal operator
crossover design. Such insights could facilitate the com- with nonzero matrices along the diagonal and zero val-
parison of relative efficiency for alternative designs with ues elsewhere, and the expectation is taken over the
subclusters. distribution of all the random effects. In general,
T A B L E 2 Estimated required number of clusters 𝐼, subclusters per cluster 𝐾, participants per subcluster 𝑁, periods 𝑇, empirical type I
error (Test Size), empirical power (Empirical), and predicted power (Predicted) obtained from sample size formula for given effect size 𝛿∕𝜎,
within-period ICCs for within- and between-subcluster (𝛼0 , 𝜌0 ), and between-period ICCs for within- and between-subcluster (𝛼1 , 𝜌1 )
assuming a cluster autocorrelation of 0.5, when outcome is Gaussian (n = 1000)
𝜹∕𝝈 (𝜶𝟎 , 𝝆𝟎 ) (𝜶𝟏 , 𝝆𝟏 ) 𝑰 𝑲 𝑵 𝑻 Test Size Empirical Predicted
0.1 (0.03, 0.0075) (0.015, 0.00375) 24 6 15 7 3.6 88.2 85.3
(0.01, 0.0025) (0.005, 0.00125) 30 6 15 4 4.3 82.5 82.2
24 5 10 7 4.4 82.9 81.4
0.2 (0.1, 0.025) (0.05, 0.0125) 24 6 10 4 4.5 85.4 83.3
18 3 12 7 3.9 84.0 81.8
(0.03, 0.0075) (0.015, 0.00375) 18 3 15 4 3.2 81.3 80.0
15 3 10 6 4.0 81.1 80.8
(0.01, 0.0025) (0.005, 0.00125) 12 6 10 4 2.8 82.2 82.6
10 4 10 6 2.7 79.4 80.0
0.25 (0.1, 0.025) (0.05, 0.0125) 21 4 10 4 5.2 83.7 84.6
18 2 10 7 4.6 83.2 83.5
(0.03, 0.0075) (0.015, 0.00375) 15 4 8 4 2.7 82.9 81.4
12 2 10 7 3.5 81.0 80.2
(0.01, 0.0025) (0.005, 0.00125) 24 2 8 4 4.1 83.7 84.3
10 3 9 6 1.8 83.6 83.6
0.35 (0.1, 0.025) (0.05, 0.0125) 12 4 9 4 2.8 84.5 83.2
10 3 8 6 1.9 83.6 82.9
(0.03, 0.0075) (0.015, 0.00375) 9 3 12 4 2.4 84.3 83.5
16 2 5 5 3.0 86.9 84.0
(0.01, 0.0025) (0.005, 0.00125) 9 3 9 4 2.5 82.1 82.9
8 3 7 5 1.7 77.7 80.0
0.4 (0.1, 0.025) (0.05, 0.0125) 18 2 7 4 3.2 87.9 86.2
12 2 8 5 3.5 80.8 82.0
(0.03, 0.0075) (0.015, 0.00375) 9 3 8 4 1.4 80.8 82.5
8 3 7 5 1.1 84.3 83.5
(0.01, 0.0025) (0.005, 0.00125) 15 2 5 4 3.3 81.2 83.3
12 2 5 5 1.8 85.4 85.1
0.5 (0.1, 0.025) (0.05, 0.0125) 12 2 7 4 3.3 82.6 84.7
12 2 4 5 3.2 84.3 82.5
(0.03, 0.0075) (0.015, 0.00375) 9 2 8 4 1.8 87.4 85.4
𝔼{𝚫−1 𝜁(𝝁𝑖 )𝚫−1 } can be computed via numerical inte- { 𝐼 }

𝑖 𝑖 ∑ ( ⊤ ) −1
⊤
gration and depends on the conditional mean of outcomes × 𝑰𝑇 ⊗ 𝟏𝐾𝑁 𝑽𝑖 (𝑰𝑇 ⊗ 𝟏𝐾𝑁 ) −1
𝑖=1
through the secular trend and intervention status, thus
𝑽𝑖 will be cluster specific. The approximate covariance { 𝐼 }]
( )∑
matrix of the estimator for fixed-effects parameters based × 𝑰𝑇⊤ ⊗ 𝟏⊤
𝐾𝑁 𝑽𝑖−1 (𝑿𝑖 ⊗ 𝟏𝐾𝑁 ) −1 .
∑𝐼 −1 𝑖=1
on the pseudo-outcomes is then ( 𝑖=1 𝒁𝑖⊤ 𝑽𝑖−1 𝒁𝑖 ) ,
whose (𝑇 + 1, 𝑇 + 1)th element is In the design stage, computation of the above variance
[ requires us to invert the 𝑇𝐾𝑁 × 𝑇𝐾𝑁 variance matrix,
𝐼
∑ ( ⊤ ) −1 ( ) 𝑽𝑖 , for each cluster, which could require substantial
ˆ =
var(𝛿) 𝑿𝑖 ⊗ 𝟏⊤
𝐾𝑁 𝑽𝑖 𝑿𝑖 ⊗ 𝟏𝐾𝑁
computational time as the subcluster size or the number
𝑖=1
of subclusters become large. However, because the fixed
{ 𝐼 }
∑ ( ⊤ ) −1 ( ) effects in (7) only depends on each cluster period, we
− 𝑿𝑖 ⊗ 𝟏⊤
𝐾𝑁 𝑽 𝑖
𝑰 𝑇 ⊗ 𝟏 𝐾𝑁 provide an equivalent but computationally more efficient
𝑖=1
variance expression in the following lemma.
Lemma 3. The variance of the intervention effect estimator than variance components to consider in the design phase,
in the GLMM (7) is equivalently written as we could use the latent response formulation and define
{ ( 𝐼 ) the total variance as 𝜎2 = 𝜎𝑏2 + 𝜎𝑐2 + 𝜎𝑠2 + 𝜎𝜋2 + 𝜎𝛾2 + 𝜋2 ∕3,
𝐼
∑ ∑ where 𝜋2 ∕3 is the variance of the standard logistic distri-
ˆ =
var(𝛿) ˜ −1 𝑿𝑖
𝑿𝑖⊤ 𝑽 − ˜ −1
𝑿𝑖⊤ 𝑽
𝑖 𝑖 bution (Eldridge et al., 2009). This formulation allows us to
𝑖=1 𝑖=1 ˆ with 𝜎2 and the five ICCs (Table 1).
reparameterize var(𝛿)
( 𝐼
)−1 ( 𝐼 )}
∑ ∑ Specifically, using Lemma 3 the variance matrix under a
× ˜ −1
𝑽 ˜ −1 𝑿𝑖
𝑽 −1 , (10) binary outcome with a logit link can be represented by
𝑖 𝑖
𝑖=1 𝑖=1
(𝜆 − 𝜆1 )𝜎2 (𝜆 − 𝜆3 )𝜎2
˜ −1 ˜ 𝑖 = 1 𝑬𝑖 + 3
𝑽 𝑰𝑇 + 6 𝑱𝑇 , (11)
where 𝑽𝑖 = (𝐾𝑁) 𝑬𝑖 + (𝐾 −1 𝜎𝜋2 + 𝜎𝑠2 )𝑰𝑇 + {𝜎𝑏2 + 𝐾𝑁 𝐾𝑁 𝑇𝐾𝑁
−1 2 −1 2
𝐾 𝜎𝑐 + (𝐾𝑁) 𝜎𝛾 }𝑱𝑇 is the 𝑇 × 𝑇 matrix charac-
with
terizing the covariance of the cluster period means of
∗ −1 { }
the pseudo-outcomes 𝒀 𝑖 = (𝐾𝑁) (𝑰𝑇 ⊗ 𝟏⊤ ∗
𝐾𝑁 )𝒀𝑖 , and (1 − 𝜆1 )𝜎2 {
𝑬𝑖 = diag[𝔼{Δ𝑖111 𝜁(𝜇𝑖111 )Δ𝑖111 }, … , 𝔼{Δ𝑖𝑇11 𝜁(𝜇𝑖𝑇11 )Δ−1
−1 −1 −1
}]. 𝑬𝑖 = 2𝑰𝑇 + 2 exp diag cosh (𝛽1 + 𝑋𝑖1 𝛿), … ,
𝑖𝑇11 2
}
By providing an equivalent variance expression, cosh (𝛽𝑇 + 𝑋𝑖𝑇 𝛿) , (12)
Lemma 3 indicates that one only needs to invert a set
of 𝑇 × 𝑇 matrices to obtain the variance for power cal- and 𝜆1 , 𝜆3 , and 𝜆6 are three eigenvalues of the extended
culation for general exponential family outcomes with block exchangeable correlation matrix defined in
a certain mean-variance structure and alleviates the Lemma 1. Unlike Theorem 1, Equations (11) and (12)
computational burden associated with inverting a series suggest the variance of the intervention effect in a logistic
of 𝑇𝐾𝑁 × 𝑇𝐾𝑁 matrices. While Lemma 3 is easy to linear mixed model depends on the ICCs through an
verify when 𝑽𝑖 has a closed-form inverse (as when the additional eigenvalue 𝜆1 . In Web Figure 1, we study the
variance function 𝜁(𝜇𝑖𝑗𝑘𝑙 ) ∝ 1), it is not trivial for general relationship between the five ICCs and var(𝛿)ˆ and confirm
variance functions and requires us to exploit the block that the observations in Theorem 1 remain valid under
structure of 𝔼{𝚫−1 𝑖
𝜁(𝝁𝑖 )𝚫−1
𝑖
} under a stepped wedge a logistic linear mixed model with the exception of the
design; the detailed proof is provided in Web Appendix C. within-subject autocorrelation, 𝛼2 , for which larger values
Finally, Lemma 3 implies that var(𝛿) ˆ can be equivalently will lead to more conservative sample size estimates. In
considered as that obtained from a linear mixed∗ model Web Appendix D, we provide the expression of 𝑬𝑖 for
for the cluster period mean pseudo-outcomes 𝒀 𝑖 and is a count outcome with canonical log link and gamma
therefore an extension of the results in Li et al. (2021) outcome with a canonical inverse link which can be easily
to more complex correlation structures, under equal computed without complex numerical integration.
subcluster sizes.
From variance expression (10), it is critical to compute
the nonzero diagonal elements of 𝑬𝑖 , where the expecta- 3.3 Extension to unequal cluster sizes
tion involves integration over all the random effects. While
numerical or Monte Carlo integration can be used as a gen- In multilevel SW-CRTs, it may not always be realistic
eral solution, the computation of these terms can be con- to assume equal cluster sizes for power calculation. To
siderably simplified under canonical link functions. For extend our procedure to unequal cluster sizes, we assume
example, with a binary outcome and a canonical logit link, the number of subclusters and the subcluster size only
we can use the property of the Gaussian moment generat- vary across clusters but remain constant within each clus-
ing function to obtain the 𝑗th diagonal element of 𝑬𝑖 as ter over time; namely, 𝐾𝑖𝑗 = 𝐾𝑖 and 𝑁𝑖𝑗𝑘 = 𝑁𝑖 . A similar
assumption was made in previous studies incorporating
{ }
𝔼 Δ−1 𝜁(𝜇 )Δ−1 unequal cluster sizes in sample size calculations for SW-
𝑖𝑗11 𝑖𝑗11 𝑖𝑗11
CRTs (Harrison et al., 2019; Matthews, 2020). This assump-
( 2 )
𝜎𝑏 + 𝜎𝑐2 + 𝜎𝑠2 + 𝜎𝜋2 + 𝜎𝛾2 tion is considered appropriate because the efficiency of
= 2 + 2 exp SW-CRTs is primarily driven by the between-cluster imbal-
2
( ) ance rather than the within-cluster imbalance over time
× cosh 𝛽𝑗 + 𝑋𝑖𝑗 𝛿 , (Tian et al., 2021). For given 𝐼 and 𝑇 and assuming 𝐾𝑖 ∼
𝑓𝐾 (𝐾𝑖 ; 𝐾, CV𝐾 ) and 𝑁𝑖 ∼ 𝑓𝑁 (𝑁𝑖 ; 𝑁, CV𝑁 ), where 𝑓(∙; 𝑎, 𝑏)
where cosh(𝑡) = (𝑒𝑡 + 𝑒−𝑡 )∕2 is the hyperbolic cosine func- represents a valid density or mass function with mean 𝑎
tion. In cases when the ICCs are more intuitive parameters and coefficient of variation (CV) 𝑏, the expected variance
of 𝛿ˆ can be used for sample size determination and is given 𝛿𝑋𝑖𝑗 − 𝑏𝑖 − 𝑐𝑖𝑘 − 𝑠𝑖𝑗 − 𝜋𝑖𝑗𝑘 )}. Following Li et al. (2018), we
by assumed a slightly decreasing secular trend on the logit
scale with a baseline prevalence of 0.7 such that 𝛽1 =
𝐼 𝑗−1
∏ 1∕{1 + exp(−0.7)} and 𝛽𝑗 − 𝛽𝑗+1 = 0.1 × (0.5) for 𝑗 ≥ 1.
ˆ =
var(𝛿) … ˆ
var(𝛿|)
∫ ∫ To specify the variance components, we employ the latent
𝑖=1
{ } response formulation in Section 3.2 by setting the residual
× 𝑓𝐾 (𝐾𝑖 ; 𝐾, CV𝐾 )𝑓𝑁 (𝑁𝑖 ; 𝑁, CV𝑁 )𝑑𝐾𝑖 𝑑𝑁𝑖 , variance as 𝜋2 ∕3 and mapping the three sets of ICCs con-
sidered for Gaussian outcomes to the variance components
(13) on the logit scale (Web Appendix F).
We simulated standard stepped wedge designs such that
where  = {(𝐾𝑖 , 𝑁𝑖 ), 𝑖 = 1, … , 𝐼} represents a specific an equal number of clusters crossed over to treatment at
design with unequal cluster sizes and var(𝛿|) ˆ can be a randomly assigned step. Motivated by a recent system-
efficiently computed based on a modified version of atic review of SW-CRTs (Grayling et al., 2017) and assum-
Lemma 3 provided in Web Appendix E. To obtain var(𝛿), ˆ
ing equal cluster sizes, we varied the number of clusters
one can impose any sensible parametric distributions for (𝐼) from eight to 30, the number of subclusters per cluster
𝑓𝐾 and 𝑓𝑁 and approximate (13) via Monte Carlo integra- (𝐾) from two to six, the number of periods (𝑇) from four
tion. For example, one can assume 𝑓𝐾 and 𝑓𝑁 as gamma to seven, and allowed a maximum of 15 subjects per sub-
distributions with specified mean and CV. A total of 𝑅 sets cluster (𝑁). For Gaussian outcomes, we assumed standard-
of cluster size configurations, {(𝑟) , 𝑟 = 1, … , 𝑅}, are then ized effect sizes, 𝛿∕𝜎 ∈ {0.1, 0.2, 0.25, 0.35, 0.4, 0.5}, and for
drawn from the specified gamma distributions (rounded to binary outcomes, we considered effect sizes on the odds
nearest integer) to obtain var(𝛿) ˆ ≈ 𝑅−1 ∑𝑅 var(𝛿| ˆ (𝑟) ).
𝑟=1 ratio scale, exp(𝛿) ∈ {0.8, 0.75, 0.7, 0.65, 0.6, 0.5}. Each spe-
By averaging over 𝑅 designs (e.g., 𝑅 = 1000), var(𝛿) ˆ cific parameter combination was selected to ensure the
accounts for the anticipated variations in cluster sizes and predicted power is at least 80% based on a two-sided 5%
can be used in (2) for power calculation. Finally, when level Wald test. For a Gaussian outcome, the predicted
CV𝐾 = CV𝑁 = 0, the above procedure coincides with power is based on Equation (2) and Theorem 1; for a binary
that in Sections 3.1 and 3.2 as both 𝑓𝐾 and 𝑓𝑁 degenerate outcome, the predicted power is based on Equation (2),
to a point mass. Additional details are provided in Web Lemma 3, and Equations (11) and (12). The empirical power
Appendix E. of the test is obtained as the proportion correctly reject-
ing 𝐻0 over 1000 simulated SW-CRTs, and the agreement
between the predicted and empirical power was used to
4 SIMULATION STUDY confirm the accuracy of the proposed method. Finally, we
assessed the empirical type I error rate to confirm the
We conducted a simulation study to assess the accu- validity of the Wald test. For this purpose, we simply set
racy of our sample size procedures for a Gaussian out- 𝛿∕𝜎 = 0 with a Gaussian outcome and exp(𝛿) = 1 with
come and a binary outcome. For illustration, we focused a binary outcome. Further, in Web Tables 3 and 4, we
on a closed-cohort design at the subcluster level and a compare the predicted power assuming correctly CAC =
cross-sectional design at the subject level (design (B)). 0.5 with the predicted power assuming equal within-
We generated correlated Gaussian outcomes based on the and between-period ICCs (incorrectly assuming CAC = 1).
linear mixed model, 𝑌𝑖𝑗𝑘𝑙 = 𝛽𝑗 + 𝛿𝑋𝑖𝑗 + 𝑏𝑖 + 𝑐𝑖𝑘 + 𝑠𝑖𝑗 + Our results show that incorrectly assuming equal within-
𝜋𝑖𝑗𝑘 + 𝜖𝑖𝑗𝑘𝑙 , by constraining the total variance components and between-period ICCs typically leads to overly confi-
𝜎2 = 1. Given 𝜎2 , we consider three sets of ICCs (recall dent power predictions. Finally, to assess the accuracy of
that 𝛼2 = 𝛼1 under design (B)), (𝛼0 , 𝛼1 , 𝜌0 , 𝜌1 ) = {(0.1, our approach in Section 3.3 with unequal cluster sizes,
0.05, 0.025, 0.0125), (0.03, 0.015, 0.0075, 0.00375), (0.01, we chose four typical scenarios and draw 𝐾𝑖 and 𝑁𝑖 from
0.005, 0.0025, 0.00125)}, which determines the value of gamma distributions with CV𝐾 ∈ {0, 0.25, 0.5} and CV𝑁 ∈
each variance component. The within-period ICCs were {0, 0.25, 0.5, 0.75, 1.0}.
chosen to mimic commonly reported values in parallel-
arm CRTs, and the between-period ICCs correspond to
a CAC of 0.5. By Theorem 1, var(𝛿) ˆ is invariant to the
period effects, and therefore we only considered a gen- 4.1 Simulation results for Gaussian
tly increasing secular trend with 𝛽1 = 0 and 𝛽𝑗+1 − 𝛽𝑗 = outcomes
𝑗−1
0.1 × (0.5) for 𝑗 ≥ 1. On the other hand, we generated
correlated binary outcomes from a Bernoulli distribution We present the empirical type I error rate and empirical
with 𝑌𝑖𝑗𝑘𝑙 ∼ Bern(𝜇𝑖𝑗𝑘𝑙 ), with 𝜇𝑖𝑗𝑘𝑙 = 1∕{1 + exp(−𝛽𝑗 − and predicted power of the Wald test under each scenario
T A B L E 3 Estimated required number of clusters 𝐼, subclusters per cluster 𝐾, participants per subcluster 𝑁, periods 𝑇, empirical type I
error (Test Size), empirical power (Empirical), and predicted power (Predicted) obtained from sample size formula for given effect size exp(𝛿),
within-period ICCs for within- and between-subcluster (𝛼0 , 𝜌0 ), and between-period ICCs for within- and between-subcluster (𝛼1 , 𝜌1 )
assuming a cluster autocorrelation of 0.5, when outcome is binary with canonical logit link (n = 1000)
exp(𝜹) (𝜶𝟎 , 𝝆𝟎 ) (𝜶𝟏 , 𝝆𝟏 ) 𝑰 𝑲 𝑵 𝑻 Test Size Empirical Predicted
0.8 (0.03, 0.0075) (0.015, 0.00375) 18 6 15 7 4.2 82.2 80.7
(0.01, 0.0025) (0.005, 0.00125) 27 6 15 4 5.2 85.3 84.2
25 4 12 6 2.7 81.2 81.0
0.75 (0.1, 0.025) (0.05, 0.0125) 25 6 15 6 5.2 85.8 82.8
24 5 15 7 4.6 88.1 83.1
(0.03, 0.0075) (0.015, 0.00375) 27 5 12 4 5.1 83.4 80.6
30 3 10 6 5.2 84.0 83.3
(0.01, 0.0025) (0.005, 0.00125) 21 6 10 4 2.8 82.0 80.5
12 4 15 7 2.4 84.6 81.5
0.7 (0.1, 0.025) (0.05, 0.0125) 30 5 14 4 4.4 82.9 82.3
18 4 15 7 4.2 86.7 81.7
(0.03, 0.0075) (0.015, 0.00375) 18 6 10 4 4.8 82.7 80.6
15 3 15 6 4.4 82.7 81.2
(0.01, 0.0025) (0.005, 0.00125) 18 4 12 4 3.6 83.2 82.3
20 2 15 5 3.9 81.5 81.8
0.65 (0.1, 0.025) (0.05, 0.0125) 21 6 12 4 4.1 88.3 83.6
18 3 12 7 3.9 87.6 84.1
(0.03, 0.0075) (0.015, 0.00375) 24 3 10 4 4.6 86.4 85.0
20 2 10 6 3.3 85.7 83.7
(0.01, 0.0025) (0.005, 0.00125) 15 4 10 4 2.5 84.8 82.7
12 3 14 5 3.1 85.4 85.2
0.6 (0.1, 0.025) (0.05, 0.0125) 18 5 10 4 3.6 85.9 82.3
12 3 15 7 4.6 88.1 82.8
(0.03, 0.0075) (0.015, 0.00375) 16 2 12 5 4.6 85.3 83.9
15 2 10 6 3.7 87.1 84.0
(0.01, 0.0025) (0.005, 0.00125) 21 2 10 4 3.7 84.6 85.5
12 3 8 5 2.4 81.6 80.0
0.5 (0.1, 0.025) (0.05, 0.0125) 15 3 10 4 3.0 86.9 83.2
16 2 9 5 3.3 84.7 82.5
(0.03, 0.0075) (0.015, 0.00375) 15 2 9 4 2.6 88.2 84.1
with a Gaussian outcome (Table 2). We used the restricted 4.2 Simulation results for binary
maximum likelihood estimator for model fitting, based on outcomes
which the Wald test is carried out. The empirical type I
error rate was generally conservative, and the empirical We present the empirical type I error rate, and empiri-
and predicted power were in agreement with the largest cal and predicted power of the Wald test under each sce-
difference within 3%. Similar results were found when the nario using the logistic linear mixed model fitted via the
Wald test was computed from (unrestricted) maximum Laplace approximation (Table 3). Similarly, the empiri-
likelihood estimation (Web Table 5). Lastly, in Web Tables cal type I error rate was well controlled under the nomi-
6 and 7 we present the simulation results under unequal nal level, and the empirical power was in agreement with
cluster sizes. The empirical type I error rate was conser- the predicted power, with the differences ranging from
vative overall, and differences between empirical and pre- −0.9% to 5.3%. This suggests that our sample size proce-
dicted power were within −4.4% and 3.5%. dure based on first-order Taylor expansion at most results
in slightly conservative power prediction. Results were based on the closed-form variance expression (5) for sam-
similar when penalized quasi-likelihood (which is compu- ple size determination. The details and illustrative calcula-
tationally more efficient) was used to fit the model (Web tions that reach the same power results in the LIRE trial are
Table 8). Lastly, in Web Tables 9 and 10, we present the provided in Web Appendix G. Further, in Web Appendix
simulation results under unequal cluster sizes. Overall, H we provide sample size determinations for the LIRE
the empirical type I error rate was adequately controlled trial under alternative cluster level designs. Finally, in Web
and differences between empirical and predicted power Appendix I we provide an application to the LIRE trial
were within −6.1% and 8.2%. Overall, these results con- assuming unequal cluster sizes using the expected vari-
firm that the procedure in Section 3.3 sufficiently captures ance expression (13).
the trend of the empirical power under unequal cluster
sizes.
5.2 Washington State EPT trial
In the Washington State EPT study, investigators were

5 APPLICATIONS TO SW-CRTs WITH
interested in evaluating whether an expedited partner
SUBCLUSTERS
therapy, the treatment of sex partners of people with
sexually transmitted infections without medical evalua-
5.1 Lumbar Imaging with Reporting of
tion, would decrease the risk of chlamydia reinfection
Epidemiology trial
(Golden et al., 2015). Since the chlamydia infection sta-
The LIRE trial aimed to evaluate the effect of adding preva- tus was binary, we illustrate the following calculations
lence data to spine imaging reports on subsequent spine- with a logistic linear mixed model. The study included
related health care utilization (Jarvik et al., 2015). The 𝐼 = 24 LHJ that were randomly assigned to interven-
study planned to randomize 𝐼 = 100 practices consisting tion at one of four steps (𝑇 = 5). Each LHJ includes
of a total of 1700 primary care providers (PCPs) over 𝑇 = 6 clinics that provide subject-level outcomes over time. Of
periods; each practice is a cluster and each PCP represents the clinics with repeated measures, over half were sampled
a subcluster. This is a closed-cohort design on the subclus- at each period; thus for simplicity, we assume a closed-
ter level but a cross-sectional design at the subject level cohort design at the clinic level and a cross-sectional
(design (B)). While the number of PCPs per practice var- design at the subject level (design (B)). A total of 219 clin-
ics participated in chlamydia testing, but to be conserva-
ied, we assume 𝐾 = 17 PCPs per practice for illustration.
tive we assume the number of clinics per LHJ is 𝐾 = 5.
The primary outcome was log-transformed spine-related
In the design of this study, investigators aimed to detect
relative value units (RVUs), a continuous composite mea-
a prevalence ratio of 0.7 and assumed a baseline preva-
sure of back pain. Assuming the median and total vari-
lence of 0.05. Because the outcome is rare, we assume the
ance of RVU is approximately 3.56 and 2.5 (Jarvik et al.,
effect size expressed as an odds ratio can be approximated
2020), a 5% reduction in median due to treatment corre-
by the prevalence ratio and obtain the required number of
sponds to a standardized effect size of around −0.1. Based
subjects per clinic (𝑁) to achieve at least 80% power for
on preliminary data, an overall ICC was estimated to be
a two-sided 5% test. Without distinguishing between clus-
0.013 with a 95% confidence interval of (0.00, 0.046). We
ters and subclusters, Li et al. (2021) estimated the within-
therefore assume the within-period within-PCP ICC to be
the upper bound of the preliminary estimates, 𝛼0 = 0.046, period ICC to be 0.007 and the between-period ICC to be
and a slightly smaller within-period between-PCP ICC of 0.004 based on marginal models. We consider these val-
𝜌0 = 0.04. Assuming a CAC of 0.5 further gives us 𝛼1 = ues to be the within-period between-clinic and between-
0.023 and 𝜌1 = 0.02. Based on (2) and our variance expres- period within-clinic ICCs (defined based on the latent
sion (5), we found having 𝑁 = 77 subjects per PCP leads to response formulation in Section 3.2) such that 𝜌0 = 0.007
87.5% power for a two-sided 5% test. and 𝛼1 = 0.004, and set the remaining ICCs to be 𝛼0 =
To assess the sensitivity of our power calculation to ICC 0.008 and 𝜌1 = 0.0035 (corresponding to a CAC of 0.5).
specifications, we looked at power trends for 𝛼0 ∈ (0, 0.1) We assume a slightly decreasing time effect as in our sim-
with various ratios of 𝜌0 ∕𝛼0 across varying levels of CAC ulations and find based on Equation (2), Lemma 3, and
(0.2, 0.5, 0.8). In concordance with our findings in The- Equations (11) and (12) that including 𝑁 = 42 subjects per
orem 1, larger within-period ICCs (𝛼0 , 𝜌0 ) and smaller clinic gives us 89.5% power. As a sensitivity analysis, we
considered a larger decreasing period effect such that 𝛽𝑗 −
between-period ICCs (𝛼1 , 𝜌1 ) correspond to more conser- 𝑗−1
vative power predictions (Figure 3 ), thus we are confident 𝛽𝑗+1 = 1 × (0.5) for 𝑗 ≥ 1, which increased 𝑁 = 139 to
that our ICC specifications likely produced a conservative attain 89.5% power. On the other hand, using a smaller
𝑗−1
power estimate. Alternatively, we have also derived a DE decreasing period effect, 𝛽𝑗 − 𝛽𝑗+1 = 0.01 × (0.5) for
F I G U R E 3 Contour plots illustrating the relationship between ICCs and power in our application study of the LIRE trial. ICCs include:
within-period within-subcluster 𝛼0 ; between-period within-subcluster 𝛼1 ; within-period between-subcluster 𝜌0 ; and between-period
between-subcluster 𝜌1 . Various 𝛼0 specifications are shown on the y-axis and various 𝜌0 specifications are shown on the x-axis as a ratio of 𝛼0 .
Between-period specifications are denoted by the CAC. Darker colors correspond to higher values of power
𝑗 ≥ 1, reduced our required number of subjects per clinic sample size determinations for the EPT study under alter-
to 𝑁 = 37 to achieve 89.3% power. native cluster level designs. Finally, in Web Appendix K we
We assessed the sensitivity of our power calculation provide an application to the EPT study assuming unequal
to ICC specifications by examining the power trends cluster sizes using the expected variance expression (13).
for varying 𝛼0 ∈ (0, 0.05) with various ratios of 𝜌0 ∕𝛼0
across varying levels of CAC (0.2, 0.5, 0.8) and time
𝑗−1
trends (𝛽𝑗 − 𝛽𝑗+1 = {0.01, 0.1, 1} × (0.5) for 𝑗 ≥ 1) (Web 6 CONCLUDING REMARKS
Figure 2). We found that larger within-period ICCs (𝛼0 ,
𝜌0 ) and smaller between-period ICCs (𝛼1 , 𝜌1 ) correspond In this study, we presented new sample size procedures
to more conservative power predictions, thus our current for SW-CRTs with subclusters. With a Gaussian outcome,
ICC specifications likely produced a conservative power we characterized an extended block exchangeable correla-
estimate. Furthermore, in Web Appendix J we provide tion structure induced by the random-effects assumption
of the linear mixed model. The extended block exchange- D A T A AVA I L A B I L I T Y S T A T E M E N T

able correlation structure is parameterized by at most five Data sharing is not applicable to this article as no new data
ICC parameters and intentionally differentiate between were created or analyzed in this paper.
the within-period and between-period ICCs to avoid unre-
alistically small sample size estimates (Taljaard et al., 2016). OPEN RESEARCH BADGES
We derived the variance of the intervention effect estima-
tor, which depends on the ICC parameters only through This article has earned an Open Materials badge for mak-
two eigenvalues of the extended block exchangeable cor- ing publicly available the components of the research
relation matrix. Assuming a GLMM with non-Gaussian methodology needed to reproduce the reported procedure
outcomes, we further presented a generic framework for and analysis. All materials are available at https://github.
approximating the variance of the intervention effect, and com/kldavisplourde/multilevelSWCRT.
specific examples are provided under the canonical link
functions. Of note, while our primary focus is GLMMs, ORCID
our methods can be extended to accommodate general- Kendra Davis-Plourde https://orcid.org/0000-0001-
ized estimating equations; the details are presented in Web 6654-0710
Appendix L. Fan Li https://orcid.org/0000-0001-6183-1893
While our sample size and variance expressions are
derived based on large sample approximation, many SW- REFERENCES
CRTs may have a limited number of clusters. Our simu- Amatya, A. and Bhaumik, D.K. (2018) Sample size determination
lation results suggest that with as few as eight clusters, for multilevel hierarchical designs using generalized linear mixed
a Wald 𝑡-test can preserve adequate type I error rate and models. Biometrics, 74, 673–684.
maintain sufficient power, thus validating our methods. Breslow, N.E. and Clayton, D.G. (1993) Approximate inference in gen-
To sum up, we reiterate two key contributions from this eralized linear mixed models. Journal of the American Statistical
work. First, our variance expressions are computationally Association, 88, 9–25.
convenient, thus obviating the need for simulation-based Eldridge, S.M., Ukoumunne, O.C. and Carlin, J.B. (2009) The intra-
cluster correlation coefficient in cluster randomized trials: a
power calculation; the latter approach can quickly become
review of definitions. International Statistical Review, 77, 378–
impractical in SW-CRTs with subclusters due to the need 394.
for searching across many design parameters, as well as the Ford, W.P. and Westgate, P.M. (2020) Maintaining the validity of infer-
associated computational cost for repeatedly fitting com- ence in small-sample stepped wedge cluster randomized trials
plex multilevel models. Second, we have characterized the with binary outcomes when using generalized estimating equa-
relationship between various ICCs and power in the pres- tions. Statistics in Medicine, 39, 2779–2792.
ence of subclusters and confirmed that assuming smaller Golden, M.R., Kerani, R.P., Stenger, M., Hughes, J.P., Aubin, M.,
Malinski, C., et al. (2015) Uptake and population-level impact of
between-period ICCs typically leads to larger and conser-
expedited partner therapy (EPT) on Chlamydia trachomatis and
vative sample sizes. In the presence of limited external data
Neisseria gonorrhoeae: the Washington state community-level ran-
to inform sample size calculations in SW-CRTs with sub- domized trial of EPT. PLoS Medicine, 12, e1001777.
clusters, it may therefore be prudent to assume smaller Grayling, M.J., Wason, J.M.S. and Mander, A.P. (2017) Stepped wedge
between-period ICC values. cluster randomized controlled trial designs : a review of reporting
quality and design features. Trials, 18, 1–13.
Harrison, D.A. and Brady, A.R. (2004) Sample size and power calcu-
AC K N OW L E D G M E N T S lations using the noncentral t-distribution. The Stata Journal, 4,
This work is supported by the National Institute of 142–153.
Harrison, L.J., Chen, T. and Wang, R. (2019) Power calculation for
Aging (NIA) of the National Institutes of Health (NIH)
cross-sectional stepped wedge cluster randomized trials with vari-
under Award Number U54AG063546, which funds NIA able cluster sizes. Biometrics, 76, 951–962.
Imbedded Pragmatic Alzheimer’s Disease and AD-Related Hemming, K., Lilford, R. and Girling, A.J. (2015) Stepped-wedge clus-
Dementias Clinical Trials Collaboratory (NIA IMPACT ter randomised controlled trials: a generic framework including
Collaboratory). The content is solely the responsibility of parallel and multiple-level designs. Statistics in Medicine, 34, 181–
the authors and does not necessarily represent the offi- 196.
cial views of the NIH. The authors are grateful to Profes- Hemming, K. and Taljaard, M. (2020) Reflection on modern methods:
when is a stepped-wedge cluster randomized trial a good study
sor Jim Hughes for providing data and information from
design choice? International Journal of Epidemiology, 49, 1043–
the Washington State EPT trial. We also thank the asso-
1052.
ciate editor and an anonymous referee for their valuable Heo, M. and Leon, A.C. (2008) Statistical power and sample size
suggestions, which greatly improved the exposition of this requirements for three level hierarchical cluster randomized tri-
work. als. Biometrics, 64, 1256–1262.
Hooper, R., Teerenstra, S., de Hoop, E. and Eldridge, S. (2016) Sample Matthews, J.N. (2020) Highly efficient stepped wedge designs for
size calculation for stepped wedge and other longitudinal cluster clusters of unequal size. Biometrics, 76, 1167–1176.
randomised trials. Statistics in Medicine, 35, 4718–4728. Taljaard, M., Teerenstra, S., Ivers, N.M. and Fergusson, D.A. (2016)
Hussey, M.A. and Hughes, J.P. (2007) Design and analysis of stepped Substantial risks associated with few clusters in cluster random-
wedge cluster randomized trials. Contemporary Clinical Trials, 28, ized and stepped wedge designs. Clinical Trials, 13, 459–463.
182–191. Teerenstra, S., Lu, B., Preisser, J.S., Van Achterberg, T. and Borm, G.F.
Jarvik, J.G., Comstock, B.A., James, K.T., Avins, A.L., Bresnahan, (2010) Sample size considerations for GEE analyses of three-level
B.W., Deyo, R.A., et al. (2015) Lumbar imaging with reporting of cluster randomized trials. Biometrics, 66, 1230–1237.
epidemiology (LIRE)-protocol for a pragmatic cluster randomized Teerenstra, S., Taljaard, M., Haenen, A., Huis, A., Atsma, F., Rodwell,
trial. Contemporary Clinical Trials, 45, 157–163. L., et al. (2019) Sample size calculation for stepped-wedge cluster-
Jarvik, J.G., Meier, E.N., James, K.T., Gold, L.S., Tan, K.W., Kessler, randomized trials with more than two levels of clustering. Clinical
L.G., et al. (2020) The effect of including benchmark prevalence Trials, 16, 225–236.
data of common imaging findings in spine image reports on Tian, Z., Preisser, J., Esserman, D., Turner, E., Rathouz, P. and Li, F.
health care utilization among adults undergoing spine imaging: (2021) Impact of unequal cluster sizes for GEE analyses of stepped
a stepped-wedge randomized clinical trial. JAMA Network Open, wedge cluster randomized trials with binary outcomes. Biometri-
3, e2015713–e2015713. cal Journal https://doi.org/10.1002/bimj.202100112.
Kistner, E.O. and Muller, K.E. (2004) Exact distributions of intraclass Zhou, X., Liao, X., Kunz, L.M., Normand, S.-L.T., Wang, M. and
correlation and Cronbach’s alpha with Gaussian data and general Spiegelman, D. (2020) A maximum likelihood approach to power
covariance. Psychometrika, 69, 459–474. calculations for stepped wedge designs of binary outcomes. Bio-
Li, F., Forbes, A.B., Turner, E.L. and Preisser, J.S. (2019) Power and statistics, 1, 102–121.
sample size requirements for GEE analyses of cluster randomized
crossover trials. Statistics in Medicine, 38, 636–649.
Li, F., Hughes, J.P., Hemming, K., Taljaard, M., Melnick, E.R. and
S U P P O RT I N G I N F O R M AT I O N
Heagerty, P.J. (2021) Mixed-effects models for the design and anal-
ysis of stepped wedge cluster randomized trials: an overview. Sta-
tistical Methods in Medical Research, 30, 612–639. Web Appendices, Tables, and Figures referenced in Sec-
Li, F., Turner, E.L. and Preisser, J.S. (2018) Sample size determinations 3-6 are available with this paper at the Biometrics
tion for GEE analyses of stepped wedge cluster randomized trials. website on Wiley Online Library. R code for predicting
Biometrics, 74, 1450–1458. power and for conducting the simulation and application
Li, F., Yu, H., Rathouz, P.J., Turner, E.L. and Preisser, J.S. (2021) studies described in Sections 2-5 are available as support-
Marginal modeling of cluster period means and intraclass correla-
ing material and at https://github.com/kldavisplourde/
tions in stepped wedge designs with binary outcomes. Biostatistics,
multilevelSWCRT.
39, 3347–3372.
Liu, J. and Colditz, G.A. (2020) Sample size calculation in three-level
cluster randomized trials using generalized estimating equation
models. Statistics in Medicine, 39, 3347–3372.
How to cite this article: Davis-Plourde, K.,
Martin, J., Girling, A., Nirantharakumar, K., Ryan, R., Marshall,
T. and Hemming, K. (2016) Intra-cluster and inter-period cor-
Taljaard, M., & Li, F. (2023) Sample size
relation coefficients for cross-sectional cluster randomised con- considerations for stepped wedge designs with
trolled trials for type-2 diabetes in UK primary care. Trials, 17, subclusters. Biometrics, 79, 98–112.
1–12. https://doi.org/10.1111/biom.13596

Sample Size Considerations For Stepped Wedge Designs With Subclusters

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Sample Size Considerations For Stepped Wedge Designs With Subclusters

Uploaded by

Copyright:

Available Formats

Received: 7 May 2021 Revised: 12 October 2021 Accepted: 21 October 2021

BIOMETRIC METH ODOLOGY

Sample size considerations for stepped wedge designs with

1Department of Biostatistics, Yale School

98 wileyonlinelibrary.com/journal/biom Biometrics. 2023;79:98–112.

Period 1 .................................................................... Period T

Subcluster 1 ................ Subcluster Ki Subcluster 1 ................ Subcluster Ki

Period 1 .................................................................... Period T

Subcluster 1 ................ Subcluster Ki Subcluster 1 ................ Subcluster Ki

Period 1 .................................................................... Period T

Subcluster 1 ................ Subcluster Ki Subcluster 1 ................ Subcluster Ki

T A B L E 1 Definition of ICCs: within-period within-subcluster (𝛼0 ), between-period within-subcluster (𝛼1 ), within-subject

𝔼{𝚫−1 𝜁(𝝁𝑖 )𝚫−1 } can be computed via numerical inte- { 𝐼 }

In the Washington State EPT study, investigators were

of the linear mixed model. The extended block exchange- D A T A AVA I L A B I L I T Y S T A T E M E N T

You might also like