
Journal of Clinical Epidemiology 66 (2013) 752-758

Stepped wedge designs could reduce the required sample size in cluster randomized trials
Willem Woertman (a, 1), Esther de Hoop (a, *, 1), Mirjam Moerbeek (b), Sytse U. Zuidema (c), Debby L. Gerritsen (d), Steven Teerenstra (a)
(a) Department for Health Evidence, Radboud University Medical Centre, 133, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
(b) Department of Methodology and Statistics, Utrecht University, PO Box 80089, 3508 TB, Utrecht, The Netherlands
(c) Department of General Practice, University of Groningen, University Medical Centre Groningen, PO Box 72, 9700 AB, Groningen, The Netherlands
(d) Department of Primary and Community Care, Centre for Family Medicine, Geriatric Care and Public Health, Radboud University Medical Centre, 117, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
Accepted 8 January 2013; Published online 22 March 2013

Abstract
Objective: The stepped wedge design is increasingly being used in cluster randomized trials (CRTs). However, there is not much
information available about the design and analysis strategies for these kinds of trials. Approaches to sample size and power calculations
have been provided, but a simple sample size formula is lacking. Therefore, our aim is to provide a sample size formula for cluster
randomized stepped wedge designs.
Study Design and Setting: We derived a design effect (sample size correction factor) that can be used to estimate the required sample
size for stepped wedge designs. Furthermore, we compared the required sample size for the stepped wedge design with a parallel group and
analysis of covariance (ANCOVA) design.
Results: Our formula corrects for clustering as well as for the design. Apart from the cluster size and intracluster correlation, the design
effect depends on choices of the number of steps, the number of baseline measurements, and the number of measurements between steps.
The stepped wedge design requires a substantially smaller sample size than a parallel group and ANCOVA design.
Conclusion: For CRTs, the stepped wedge design is far more efficient than the parallel group and ANCOVA design in terms of sample
size.
© 2013 Elsevier Inc. Open access under the Elsevier OA license.

Keywords: Cluster randomized trial; Stepped wedge design; Parallel group design; ANCOVA; Sample size; Design effect

Conflict of interest: Neither the article nor any parts of it have been published or submitted before. No external funding has been received and no conflict of interest is present.
1 W.W. and E.de.H. are joint contributors.
* Corresponding author. Tel.: +31-24-3667262; fax: +31-24-3613505.
E-mail address: E.deHoop@ebh.umcn.nl (E. de Hoop).
0895-4356 © 2013 Elsevier Inc. Open access under the Elsevier OA license.
http://dx.doi.org/10.1016/j.jclinepi.2013.01.009

1. Introduction

Randomized controlled trials are considered the gold standard in evaluating health care interventions [1]. However, cluster randomized trials (CRTs) are increasingly being used in the health care setting [2]. In these trials, complete social units, or groups of individuals (such as families, nursing homes [NHs], or general practices), are randomized to different treatments. They are mostly used to prevent contamination and in situations where individual randomization is not possible or not desirable for logistic, financial, or ethical reasons [3].

The most commonly used trial design is the parallel group design in which each cluster is randomized to either an intervention or control condition [4]. Within this design, each cluster receives only one kind of treatment during the study, and usually all clusters start simultaneously. An extension of the parallel group design is the analysis of covariance (ANCOVA) design where a baseline measurement is added to the design and included as a covariate in the analysis [5]. In contrast, in the crossover design, every cluster will receive both the intervention and the control treatment. Yet, the order of the interventions is randomized for each cluster [3,4,6]. However, it is not always possible to conduct a crossover design because it assumes that the carryover effects are absent [3,4,6]. This means that the estimated treatment effects should be independent of the order in which the treatments were assigned. So, the effects of the first treatment should have disappeared by the time the second treatment is started, which may be unrealistic if, for example, the first treatment is the reinforcement of
a hygiene protocol and the second implies falling back to usual care in hospital wards.

What is new?
- Approaches to power calculations for cluster randomized stepped wedge designs have been provided, but a simple sample size formula is lacking. Therefore, we present a sample size formula for these kinds of trials.
- We derived a formula in which, besides the cluster size and intracluster correlation, the number of steps and measurements can be varied.
- The stepped wedge design requires a substantially smaller sample size than a parallel group or analysis of covariance design.

Herein, we will focus on the stepped wedge design, which is a type of crossover design in which (different) clusters switch treatments in only one direction at different time points (steps) [7-10]. Typically, all clusters start in the control condition. Then, the clusters switch to the intervention at consecutive time points, where the time of the switch is randomized for every cluster. Eventually, all clusters will have switched from one condition to the other (see Fig. 1).

Fig. 1. Illustration of the stepped wedge design, where different (groups of) clusters switch from control to treatment at different time points.

The stepped wedge design is especially useful when the intervention is thought to do more good than harm (i.e., when there is no equipoise) [8-10]. In that situation, it is unethical to withhold or withdraw the intervention from a proportion of the subjects as would occur in a parallel group or crossover design, respectively. Besides, it may be impossible to implement the intervention in half of all clusters simultaneously because of practical, logistical, or financial reasons [8-10]. Then, the stepwise treatment implementation of the stepped wedge design offers a solution.

In addition, there are other advantages of the stepped wedge design. First, the clusters act as their own controls because they receive both the control and treatment conditions. Therefore, the intervention effect can be estimated from both between- and within-cluster comparisons. This results in more statistical power and smaller required sample sizes than in a parallel group design [8]. Furthermore, it is possible to control for time with the stepped wedge design [9]. By modeling the effects of time, it is possible to study whether the time spent in the intervention condition influences the effectiveness of the treatment. Finally, recruitment of clusters and/or subjects may be easier within this design because everyone will receive the treatment during the trial.

In this article, we present a relatively simple sample size formula for stepped wedge CRTs. A recent review showed that the stepped wedge design has been increasingly used over the last couple of years [10]. Yet, it was noted that the reporting of stepped wedge CRTs needs to be improved, especially the reporting of sample size and power calculations. Hussey and Hughes [8] provide approaches to sample size and power calculations. However, their approach does not provide a sample size formula. Therefore, we propose a simpler sample size approach using a design effect (sample size correction factor).

In Section 2, we describe a trial in which the stepped wedge design is being used. Throughout the article, we will use this trial as an example. In Section 3, a sample size formula will be presented, and a comparison with the parallel group and ANCOVA design will be made in Section 4. We will conclude with a summary and discussion in Section 5.

2. Example: the Act in Case of Depression study

Depression is a common health problem in NH residents. However, it is often undetected and undertreated. Therefore, the Nijmegen University Network of Nursing Homes developed the Act in Case of Depression (AiD) program [11]. This is a multidisciplinary care program to identify and treat depression, and to monitor the treatment effects. Because the AiD program involves the training and cooperation of nursing staff, physicians, psychologists, and recreational therapists in the NHs, this program is naturally implemented at the unit level (ward) of the NHs.

The AiD study is a CRT using a stepped wedge design that aims to assess the efficacy of the AiD program in NH units. There are two main reasons why a stepped wedge design was chosen. First, the number of available units was small. Therefore, a parallel group design would not have sufficient power to detect a relevant treatment effect (see Section 4). Second, it was impractical to implement the program in half of the participating units simultaneously because of the substantial training effort that was required. Hence, stepwise implementation of the program was preferred. Obviously, a crossover design was impossible for
this trial because the training of all professionals involved could not be undone.

3. Sample size calculations

In this section, we will provide a relatively simple sample size formula for CRTs with a stepped wedge design. This formula is derived from the formulae provided by Hussey and Hughes [8]. Their work is based on a model that comes with the following assumptions that will hold for our formula as well. First, it is assumed that there are random cluster effects, fixed time effects, and absence of cluster by time interactions. That is, the variation of a cluster mean over time is only because of changing subjects over time, and there is no inherent variation at the cluster level over time. Furthermore, the model takes external time trends into account. Yet, these trends are assumed to be equal for all clusters. Second, it is assumed that there is no within-subject correlation over time. This is likely to hold if different subjects are sampled from the clusters at each measurement point. In contrast to Hussey and Hughes [8], we consider a simpler situation where the same number of clusters switches at each step and where the number of measurements after each step is constant as well.

The standard approach for sample size calculations for parallel group CRTs is to calculate the sample size that would be needed if individuals were to be randomized (N_u). Then, this unadjusted sample size is multiplied by the design effect [1 + (n - 1)ρ] to correct for clustering, where n is the number of subjects within a cluster and ρ is the intracluster correlation (ICC) [3]. For an ANCOVA design, the sample size for a clustered parallel group design is multiplied by a factor (1 - r²), where r = nρ / [1 + (n - 1)ρ] (derived from [5] with cluster autocorrelation set to 1 and subject autocorrelation set to 0 to obtain the same model as Hussey and Hughes [8]).

To be able to use a similar approach for stepped wedge designs, we derived the following design effect:

$$DE_{sw} = \frac{1 + \rho(ktn + bn - 1)}{1 + \rho\left(\frac{1}{2}ktn + bn - 1\right)} \cdot \frac{3(1 - \rho)}{2t\left(k - \frac{1}{k}\right)}$$

where k is the number of steps, b is the number of baseline measurements, and t is the number of measurements after each step. Hence, the clusters will be measured b + k·t times each. This design effect corrects for both clustering and the stepped wedge design. See Appendix at www.jclinepi.com for derivations.

Now, the required sample size for the stepped wedge design can be calculated by multiplying the unadjusted sample size by the design effect: N_sw = N_u · DE_sw. Note that formulae for N_u usually result in the number of subjects per treatment arm, whereas the total number of subjects is needed here. The required number of clusters c is calculated by dividing the required sample size N_sw by the cluster size n, and the number of clusters switching treatment at each step is calculated by dividing the number of clusters c by the number of steps k. Obviously, it is not guaranteed that the required number of clusters will be an integer. If not, round this number up to the next integer. The same holds for the number of clusters that should switch at every step. However, it suffices to distribute the clusters as evenly as possible over the steps.

The design effect DE_sw is affected by choices regarding three determinants. To guide the choice of these determinants, we will describe how each of them changes the design effect and hence the sample size. First, it can be seen that the design effect decreases as the number of measurements t after each step increases. The same holds for the number of baseline measurements b and the number of steps k. That is, increasing the number of steps or the number of baseline measurements decreases the design effect. So, increasing one or more of the above-mentioned three determinants decreases the required sample size. However, increasing the cluster size n results in a slightly larger design effect.

Besides the previously described design choices, the design effect also depends on the variation between clusters, that is, the ICC ρ. The ICC cannot be chosen freely but depends on the context of the trial (e.g., population of subjects, type of clusters, and type of outcome measure). Reasonable estimates for the ICC need to be motivated by previous comparable studies, for example, pilot studies or subject-matter knowledge. As the ICC increases, the design effect first slightly increases as well (up to about ρ = 0.05), and then starts decreasing. Figure 2 shows the effect of the determinants on the design effect and hence the sample size. Furthermore, Table 1 shows the required number of clusters for a parallel group, ANCOVA, and stepped wedge design for several effect sizes and numbers of steps.

Various choices for the determinants of the design effect will lead to different study designs, which may differ in the required sample size. For example, suppose the number of clusters is fixed at 12. Then, it is possible to opt for a design with two steps, at each of which six clusters will switch. However, other options are to choose 3, 4, or 6 steps at which 4, 3, or 2 clusters will switch, respectively. The most extreme variant would be a design in which only one cluster switches at each step, which results in 12 steps. Then, the question arises which design will require the smallest number of subjects. If t is fixed, a design with 12 steps will lead to the smallest number of subjects. This is rather straightforward because the total number of measurements is b + 12·t for this option, which is larger than b + 2·t measurements for a design with two steps, for example. Therefore, more information will be available and hence fewer subjects are required.

In general, it can be shown that efficiency in terms of sample size improves if the number of steps increases.
Fig. 2. The design effect for a stepped wedge design as a function of the intracluster correlation for various cluster sizes (n), numbers of steps (k), and measurements after each step (t). The number of baseline measurements (b) equals t. From top to bottom, the long-dashed lines represent k = 2, dotted lines represent k = 3, dashed lines represent k = 5, and solid lines represent k = 10.
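Curves of this kind can be approximated numerically from the design effect formula in Section 3. The following Python sketch is illustrative only: the helper names, the grid of ICC values, and the choice n = 10 and t = 1 are assumptions and do not necessarily match the settings of the published panels. As in the figure, b is set equal to t.

```python
def design_effect_sw(n, rho, k, b, t):
    """Stepped wedge design effect DE_sw from Section 3."""
    first_factor = (1 + rho * (k * t * n + b * n - 1)) / (
        1 + rho * (0.5 * k * t * n + b * n - 1)
    )
    second_factor = 3 * (1 - rho) / (2 * t * (k - 1 / k))
    return first_factor * second_factor


def design_effect_grid(n, t, ks=(2, 3, 5, 10),
                       rhos=(0.0, 0.05, 0.10, 0.15, 0.20)):
    """Return {k: [DE_sw for each rho]} with b set equal to t."""
    return {
        k: [design_effect_sw(n, rho, k, b=t, t=t) for rho in rhos]
        for k in ks
    }


# Hypothetical panel: cluster size n = 10, one measurement after each step.
grid = design_effect_grid(n=10, t=1)
for k, values in grid.items():
    print(f"k = {k:2d}: " + "  ".join(f"{v:.2f}" for v in values))
```

Each printed row corresponds to one line of a panel, with columns running over the assumed ICC grid from 0.00 to 0.20.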

However, the efficiency gain from going from 2 to 3 steps is much larger than the efficiency gain from going from 6 to 12 steps. Moreover, the efficiency gain of increasing the number of steps while keeping the number of clusters and the total number of measurements fixed is modest compared with the efficiency gain from adding clusters or measurements to the design.

Within the presented sample size approach, the number of subjects per cluster is chosen in advance and the required number of clusters is calculated. However, it may be that the number of clusters is known in advance and the question is how many subjects per cluster should be sampled. In this situation, guess a reasonable cluster size (n′), calculate the number of clusters needed (c′), and compare this number with the available number of clusters (c). If c′ < c, the cluster size n′ can be decreased, whereas the cluster size should be increased if c′ > c. This process should be repeated until a cluster size n′ is found for which c′ is close to or equals c, but not larger than c.

Although this method is rather straightforward, it should be noted that, for CRTs in general, it may be difficult or impossible to obtain sufficient power only by increasing cluster sizes, especially if the number of clusters is small [12-16].

Table 1. Total number of clusters needed for a parallel group design (PGD), ANCOVA, or stepped wedge design with several numbers of steps (k) for several effect sizes

Effect size          0.2   0.3   0.4   0.5
PGD                  114    51    29    19
ANCOVA               101    45    26    17
Stepped wedge
  With k = 2          94    42    24    15
  With k = 3          57    25    15     9
  With k = 5          35    16     9     6

Abbreviation: ANCOVA, analysis of covariance.
Note: Cluster size n = 10, intracluster correlation = 0.05, and the number of baseline measurements and measurements after each step b = t = 1.

Example. The AiD program was expected to reduce the prevalence of depression in psychogeriatric NH units from 30% to 19.5%. Uncorrected for clustering and repeated measurements, a total sample size (N_u) of 598 residents would be required to detect this effect (power of 80% and significance level of 0.05). It was expected that the number of participating patients per NH unit (n) would be around 20. Furthermore, the ICC (ρ) was estimated to be at most 0.10 from a pilot study. The total study period was 24
months, and it was expected that at least 4 months would be needed for training and implementation of the AiD program. Therefore, at most, six measurements could be taken. We chose to maximize the number of steps, so k = 5. Hence, one baseline measurement (b = 1) and one measurement after each step (t = 1) had to be taken. This resulted in a design effect of DE_sw = 0.46 and therefore a total sample size of 275 patients. So, the total sample size was reduced by more than 50% in comparison to a study where individuals would be randomized and measured only once (after the intervention). Dividing the total sample size by the cluster size resulted in 14 clusters of which 2.8 should switch at every step. Hence, three clusters would switch simultaneously at four of the steps and two clusters at the remaining step.

4. Comparing stepped wedge with parallel group and ANCOVA design

We presented sample size formulae for the stepped wedge, ANCOVA, and parallel group design in CRTs. All use a design effect approach, where the design effect for a stepped wedge design was presented previously, the design effect for an ANCOVA design is (1 - r²) × [1 + (n - 1)ρ], and the design effect for a parallel group design is [1 + (n - 1)ρ]. Therefore, the efficiency of these designs can be compared by dividing the design effect for a stepped wedge and ANCOVA design by the one for a parallel group design. Figure 3 shows this ratio for several design choices. It can be seen that the stepped wedge design is more efficient than a parallel group design, and its reduction in sample size becomes more pronounced when the ICC, cluster size, and number of steps and measurements increase. However, even in a design with one measurement after each step, a small cluster size, and an ICC of 0.05, the stepped wedge design with three steps already reduces the sample size by more than 40% in comparison to a parallel group design. Besides, it can be seen that the ANCOVA design is also more efficient than the parallel group design, but less efficient than the stepped wedge design.

Example. Previously, we showed that the AiD trial using a stepped wedge design with five steps required 275 patients. For a cluster randomized parallel group design, the design effect is DE_pgd = 1 + (n - 1)ρ = 1 + (20 - 1) × 0.10 = 2.9. Multiplying this design effect by the unadjusted sample size of 598 patients results in a total of 1,735 patients. So,
Fig. 3. Efficiency of an analysis of covariance (solid line) and the stepped wedge design relative to a parallel group cluster randomized trial as a function of the intracluster correlation for various cluster sizes (n), numbers of steps (k), and measurements after each step (t). The number of baseline measurements (b) equals t. From top to bottom, the long-dashed lines represent k = 2, dotted lines represent k = 3, dashed lines represent k = 5, and dot-dashed lines represent k = 10.
for the AiD trial, using a parallel group design would require six times as many patients as the stepped wedge design. In comparison, for an ANCOVA design, the design effect is (1 - 0.69²) × 2.9 = 1.52, so 910 patients would be required. This shows that the ANCOVA design also requires far fewer patients than the parallel group design, but still over three times as many patients as the stepped wedge design.

5. Discussion

We showed that, for CRTs, the stepped wedge design is far more efficient than a parallel group design in terms of sample size. Besides, we also showed that the stepped wedge design is more efficient than an ANCOVA design. This can be explained by the fact that a stepped wedge design involves repeated measurements by definition. Furthermore, all clusters receive both the control and the treatment condition, so they can act as their own control. Therefore, the treatment effect can be estimated from between- and within-cluster comparisons.

It should be noted that the comparison is made with a parallel group design that uses only one measurement and an ANCOVA design that uses two measurements. Hence, it would be fairer to make a comparison with a repeated measurements parallel group or ANCOVA design with the same number of measurements as the corresponding stepped wedge design. However, design effects for such designs (with equal assumptions) are not yet available. It can be expected that a stepped wedge design will still be more efficient than a repeated measurements parallel group design because the latter design only uses between-cluster comparisons, whereas the stepped wedge design uses both between- and within-cluster comparisons to estimate the treatment effect. For a repeated measurements ANCOVA design, the comparison is less straightforward and therefore subject to further research.

In the introduction, we mentioned the crossover design, so one may ask how the efficiency of a stepped wedge design compares to such a design. The crossover design is useful when carryover effects are absent. Then, the crossover design is expected to be more efficient than the simplest stepped wedge design with two steps because within the crossover design two between- and two within-cluster comparisons can be made by using two measurements, whereas within the stepped wedge design only one between- and two within-cluster comparisons can be made by using three measurements. Yet, for larger numbers of steps within the stepped wedge design, this design may become more efficient in terms of sample size. Furthermore, if carryover effects are present, the crossover design gives biased results and will be less efficient than the stepped wedge design. In our example of the AiD study, carryover effects were very likely to be present because the program involved training of professionals. Hence, the stepped wedge design was preferred over a crossover design. More information about the design of crossover CRTs can be found elsewhere [6,17].

Although the stepped wedge design is more efficient in terms of sample size, the properties of this design have some less attractive consequences. First, the inherent repeated measurements lead to a longer study period, which will come with additional costs. Besides, the costs for implementation will be higher because all clusters within this design will receive the intervention during the study eventually. In contrast, in a parallel group design, only half of the clusters will receive the intervention. Furthermore, the analysis of the data is more complex for stepped wedge designs than for parallel group or ANCOVA designs [8,9] because data from a stepped wedge design are correlated because of the clustering of subjects within clusters as well as the repeated measurements. Hence, analysis methods for correlated data are needed, for example, generalized linear mixed models or generalized estimating equations [8].

Another implication of implementing the intervention in all clusters sequentially is that, depending on the type of intervention, learning effects may occur in the trainers or appliers of the intervention. The AiD study, for example, involved training of nurses, physicians, psychologists, and recreational therapists to detect and treat depression. The trainer(s) of this program may become more experienced after training every new enrolling NH unit, which may lead to (small) differences in the program between units. Subsequently, this may have an effect on the estimated treatment effect across clusters. Besides, clusters will differ in the amount of time spent in the treatment condition. Eventually, clusters that switch at the first step will be more experienced with the treatment than clusters that switch at later steps. This may also affect the estimated treatment effect. However, both types of learning effects can be modeled.

An often-mentioned implication of repeated measurements is a higher burden on the respondents. In our sample size formula, we assumed that there would be no within-subject correlation over time. This is most likely to be true in cross-sectional studies, when new subjects are sampled from the clusters at each measurement. Therefore, the burden of repeated measurements on the subjects will be absent. However, not only the subjects but also the clusters are involved. So, there might be burden at the cluster level. In the AiD study, for example, a part of the outcome measures had to be carried out by the unit personnel at each step. Hence, a limited number of measurements is recommended.

In principle, the stepped wedge design is not exclusively applicable to CRTs; it can also be used in individually randomized trials. In fact, the individually randomized trial equals a CRT with only one subject per cluster. The same formulae apply, but here the burden of repeatedly measuring the subject may become a problem. If this is the case, it is recommended to restrict the number of steps.

The cross-sectional nature of our approach has an effect on the statistical power. In general, cohort studies are more powerful than cross-sectional studies [18-20]. Subjects are
measured repeatedly within cohort designs. Therefore, not only within-cluster but also within-subject comparisons can be used to estimate the treatment effect. Hence, the required sample size for a cohort study will be smaller than for a cross-sectional study. A sample size formula that allows for within-subject correlations over time is not available yet. However, using our formula will at least provide a sample size with sufficient power in case of a cohort design.

We stated that when the number of clusters that should switch at each step is not an integer, it suffices to distribute the clusters as evenly as possible over the steps. Yet, this is an approximation, that is, it is not known how the power is affected by the uneven distribution or whether the larger groups of clusters should switch at early steps or at later steps. For example, if seven clusters are required for three steps, then there are two groups of two clusters and one group of three clusters for the three steps. Then, it can be questioned if the group of three clusters should switch at the first, second, or third step. Practical considerations may influence this choice. However, the effect of the uneven distribution on statistical power is a topic for further research.

In conclusion, the usefulness of the stepped wedge design is increasingly being recognized by researchers [10]. However, a simple sample size formula for this design was lacking. We presented a formula in which the number of steps, measurements, and cluster sizes can be varied. Besides, we showed how these choices affect the required sample size. Hence, designing a stepped wedge CRT is simplified.

Appendix

Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jclinepi.2013.01.009.

References

[1] Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. J Clin Epidemiol 2010;63:834-40.
[2] Bland JM. Cluster randomised trials in the medical literature: two bibliometric surveys. BMC Med Res Methodol 2004;4:21.
[3] Donner A, Klar N. Design and analysis of cluster randomization trials in health research. London, UK: Arnold; 2000.
[4] ICH E9 Expert Working Group. Statistical principles for clinical trials. Stat Med 1999;18:1905-42.
[5] Teerenstra S, Eldridge S, Graff M, de Hoop E, Borm GF. A simple sample size formula for analysis of covariance in cluster randomized trials. Stat Med 2012;31:2169-78.
[6] Rietbergen C, Moerbeek M. The design of cluster randomized crossover trials. J Educ Behav Stat 2011;36:472-90.
[7] The Gambia Hepatitis intervention study. Cancer Res 1987;47:5782-7.
[8] Hussey MA, Hughes JP. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials 2007;28:182-91.
[9] Brown CA, Lilford RJ. The stepped wedge trial design: a systematic review. BMC Med Res Methodol 2006;6:54.
[10] Mdege ND, Man MS, Taylor Nee Brown CA, Torgerson DJ. Systematic review of stepped wedge cluster randomized trials shows that design is particularly used to evaluate interventions during routine implementation. J Clin Epidemiol 2011;64:936-48.
[11] Gerritsen DL, Smalbrugge M, Teerenstra S, Leontjevas R, Adang EM, Vernooij-Dassen M, et al. Act in case of depression: the evaluation of a care program to improve the detection and treatment of depression in nursing homes. Study Protocol. BMC Psychiatry 2011;11:91.
[12] Feng ZD, Diehr P, Peterson A, McLerran D. Selected statistical issues in group randomized trials. Annu Rev Public Health 2001;22:167-87.
[13] Flynn TN, Whitley E, Peters TJ. Recruitment strategies in a cluster randomized trial - cost implications. Stat Med 2002;21:397-405.
[14] Murray DM, Varnell SP, Blitstein JL. Design and analysis of group-randomized trials: a review of recent methodological developments. Am J Public Health 2004;94:423-32.
[15] Hemming K, Girling AJ, Sitch AJ, Marsh J, Lilford RJ. Sample size calculations for cluster randomised controlled trials with a fixed number of clusters. BMC Med Res Methodol 2011;11:102.
[16] Campbell MJ. Cluster randomized trials in general (family) practice research. Stat Methods Med Res 2000;9:81-94.
[17] Giraudeau B, Ravaud P, Donner A. Sample size calculation for cluster randomized cross-over trials. Stat Med 2008;27:5578-85.
[18] Duncan GJ, Kalton G. Issues of design and analysis of surveys across time. Int Stat Rev 1987;55:97-117.
[19] Feldman HA, McKinlay SM. Cohort versus cross-sectional design in large field trials: precision, sample size, and a unifying model. Stat Med 1994;13:61-78.
[20] Frison L, Pocock SJ. Repeated measures in clinical trials: analysis using mean summary statistics and its implications for design. Stat Med 1992;11:1685-704.
