Computer Methods and Programs in Biomedicine 64 (2001) 121 – 124

www.elsevier.com/locate/cmpb

Sample size and power calculations in repeated measurement analysis
Chul Ahn *, John E. Overall, Scott Tonidandel
University of Texas Health Science Center at Houston, 6431 Fannin St., MSB 1.112, Houston, TX 77030, USA

Received 8 November 1999; received in revised form 3 March 2000; accepted 20 April 2000

Abstract

Controlled clinical trials in neuropsychopharmacology, as in numerous other clinical research domains, tend to
employ a conventional parallel-groups design with repeated measurements. The hypothesis of primary interest in these
relatively short-term, double-blind trials concerns the difference between patterns or magnitudes of change from
baseline. A simple two-stage approach to the analysis of such data involves calculation of an index or coefficient of
change in stage 1 and testing the significance of difference between group means on the derived measure of change
in stage 2. This article has the aim of introducing formulas and a computer program for sample size and/or power
calculations for such two-stage analyses involving each of three definitions of change, with or without baseline scores
entered as a covariate, in the presence of homogeneous or heterogeneous (autoregressive) patterns of correlation
among the repeated measurements. Empirical adjustments of sample size for the projected dropout rates are also
provided in the computer program. © 2001 Elsevier Science Ireland Ltd. All rights reserved.

Keywords: Repeated measures; Sample size estimate; Power calculation; Dropouts

* Corresponding author.
PII: S0169-2607(00)00095-X

1. Introduction

Controlled clinical trials tend to employ a parallel-groups repeated measurements design in which
individuals are randomly assigned between treatment groups, evaluated at baseline, and then evaluated
at intervals across a treatment period of fixed total duration. The repeated measurements are usually
equally spaced, although not necessarily so. The hypothesis of primary interest in short-term efficacy
trials of perhaps 6–8 weeks duration concerns the difference between treatment groups in patterns and
magnitudes of change from baseline. The analysis of data from the randomized, parallel-groups design
often focuses on the difference between experimental and control groups in the average rate of change
as represented by slopes of regression lines fitted to the mean response patterns.

In several recent articles, we have examined the actual type I error and power provided by alternative
general linear mixed model formulations, including procedures that utilize maximum likelihood and
related solutions to model random effects and error covariance structures of the repeated measurements
[1,2]. The purpose of this article is to make available a computer program for calculation of sample
sizes and power for this most common clinical trials design based on
adaption of more general equations previously provided by [3] and elaborated computationally for a
two-stage generalized least squares solution by [2].

For sample size and power calculations, the computer program to be described considers a two-stage
analysis of the repeated measurements in which an index or coefficient of change is calculated for each
individual in stage 1, and the significance of the difference between group means on the derived
measure of change is evaluated against the within-groups variability of that measure in stage 2 using
analysis of variance (ANOVA) or analysis of covariance (ANCOVA) methods. Stage 1 of the analysis can
specify three different definitions of subject-specific rate of change, based on endpoint change,
ordinary least squares (OLS), or generalized least squares (GLS) regression analysis. Each provides an
estimate of the slope of a regression line relating the repeated measurements to corresponding
assessment times. For the intent-to-treat analysis, the number of available repeated measurements to
which the regression model is fitted is permitted to differ for dropouts and completers.

Dropouts tend to attenuate the power of tests for evaluating differences in patterns of change across
time in a repeated measurements design. Overall et al. [4] used simulation methods to examine the
attenuation of power due to dropouts and concluded that the common practice of increasing the
‘dropout free’ sample size by the anticipated number of dropouts is a useful rule-of-thumb. Such
dropout-adjusted sample sizes are thus also provided by the computer program. It first calculates
sample sizes and/or power, with or without the baseline value entered as a covariate, using all three
definitions of change mentioned above, and then it also provides adjusted sample size estimates aimed
at maintaining the same desired power for an intent-to-treat analysis that includes data for an
anticipated 20 or 30% dropouts.

2. Sample size and power calculations

The power of a test of significance for the difference between means of a normally-distributed
response measure in two treatment groups can be obtained as the area that lies beyond a critical
$Z_\beta$ value in the unit normal curve. The required sample size can be calculated directly from the
power calculation formula. Type I error can be calculated by setting the treatment effect size $\Delta$
equal to zero in the power calculation formula. The formulas presented here for sample size and power
calculations are simplified versions of those in [3], which presents equations for calculating
$Z_\beta$ for tests of the ‘equal change hypothesis’ using simple endpoint change or ordinary least
squares (OLS) regression slopes as dependent variables. An equation for calculating power for tests on
generalized least squares (GLS) regression slopes results from substituting $x' = z'C^{-1}$ into the
calculation of effect size $\Delta$. The general power equation is of the form

$$Z_\beta = \Delta \sqrt{\frac{n}{2}} - Z_\alpha$$

where $Z_\beta$ is the critical Z-score delineating an area under the unit normal curve equal to power,
$Z_\alpha$ is the critical value corresponding to the desired one-sided or two-sided alpha level, $n$
is the sample size per group, and $\Delta$ is the effect size, which depends on the particular
definition of change:

$$\Delta = \frac{x'd}{\sigma \sqrt{x'Cx}}$$

where $x'$ is the contrast vector for endpoint, OLS, or GLS analysis, $d$ is the vector of differences
between group means, $C$ is the within-groups correlation matrix for the repeated measurements, and
$\sigma$ is the within-groups standard deviation, which is assumed to be constant across the repeated
measurements. It is common to covary baseline scores for the simple endpoint analysis, which involves
adjustment of the error variance by the squared correlation $r^2$ between baseline and endpoint in
calculating $\Delta$. The following simplified equations for calculating $Z_\beta$ are obtained by
considering the different $x'$ linear contrasts for endpoint, OLS, or GLS analysis.

Endpoint analysis without baseline covaried:

$$Z_\beta = \frac{\mu_{1t} - \mu_{2t}}{\sigma_w} \sqrt{\frac{n}{4(1 - r)}} - Z_\alpha$$

where $\mu_{1t} - \mu_{2t}$ is the difference between endpoint means only, assuming that the expected value of
the baseline difference in the two treatment groups is zero due to randomization, and $r$ is the
within-groups correlation between baseline and endpoint scores.

Endpoint analysis with baseline covaried:

$$Z_\beta = \frac{\mu_{1t} - \mu_{2t}}{\sigma_w} \sqrt{\frac{n}{2(1 - r^2)}} - Z_\alpha$$

OLS slope difference without baseline covaried:

$$Z_\beta = \frac{x'd}{\sigma_w} \sqrt{\frac{n}{2(x'Cx)}} - Z_\alpha$$

where $x'$ is the mean-corrected vector of linearly increasing time coefficients,
$x' = [-4\ -3\ -2\ -1\ 0\ 1\ 2\ 3\ 4]$ for a total of nine measurements, $d$ is the mean-corrected
vector of postulated linearly increasing differences between group means, and $C$ is the within-groups
correlation matrix among the repeated measurements.

OLS with baseline covaried:

$$Z_\beta = \frac{x'd}{\sigma_w} \sqrt{\frac{n}{2(1 - r_b^2)(x'Cx)}} - Z_\alpha$$

where $r_b = x'Cx_{(1)} / (x'Cx)^{1/2}$ is the correlation between baseline scores and the
time-weighted combination of repeated measurements defined by $x' = [-4\ -3\ -2\ -1\ 0\ 1\ 2\ 3\ 4]$
with $x'_{(1)} = [1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0]$ for a total of nine measurements.

GLS slope differences without baseline covaried:

$$Z_\beta = \frac{z'C^{-1}d}{\sigma_w} \sqrt{\frac{n}{2(z'C^{-1}z)}} - Z_\alpha$$

where $z'$ is the vector of linearly increasing time coefficients as used for OLS calculations,
$z' = [-4\ -3\ -2\ -1\ 0\ 1\ 2\ 3\ 4]$ for a total of nine measurements, and $C^{-1}$ is the inverse
of the within-groups correlation matrix or model thereof.

GLS slope differences with baseline covaried:

$$Z_\beta = \frac{z'C^{-1}d}{\sigma_w} \sqrt{\frac{n}{2(1 - r_c^2)(z'C^{-1}z)}} - Z_\alpha$$

where $r_c = z'z_{(1)} / (z'C^{-1}z)^{1/2}$ is the correlation between baseline scores and the
transformed GLS definition of change, where $z' = [-4\ -3\ -2\ -1\ 0\ 1\ 2\ 3\ 4]$ and
$z'_{(1)} = [1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0]$ for a total of nine measurements.

3. Program description

Power depends on the pattern of treatment effects across time, the within-groups variance which is
assumed to be constant across the different time points, the number of equally-spaced repeated
measurements, and the level and pattern of their intercorrelations, as well as on parameters common to
estimation of sample sizes for simple randomized designs. Herein we describe the requirements for use
of a computer program called POWER.EXE, which has been written in Microsoft Fortran, to perform sample
size and power calculations for comparing differences in rates of change from baseline produced by two
treatments in a controlled clinical trial. The interactive program is easily implemented. It
calculates: (1) the required sample size per group for a desired level of power at a specified alpha
level given a specified minimum (or meaningful) endpoint treatment difference; or (2) the statistical
power against a specified minimum (or meaningful) endpoint treatment difference given a specified alpha
level and sample size per group.

To implement the program POWER.EXE, simply type POWER at the DOS prompt. The program asks whether you
want to compute the required sample size per treatment group (two groups) or the statistical power
provided by a specified sample size. The program next asks whether you want to model the correlational
structure as autoregressive (order 1) or compound symmetry. The program then asks the user to specify
the desired minimal detectable endpoint mean difference and the within-groups standard deviation.

Limitations of the program include the fact that it calculates sample size and power for a two-group
design only, considers fitting linear equations to the patterns of change across equally-spaced
repeated measurements, and assumes that any dropouts will tend to be relatively uniformly distributed
across time in the repeated measurements design.

4. Example

Schizophrenia symptomatology is assessed using the total score from the Brief Psychiatric Rating
Scale (BPRS). Scores have a potential range from 0 to 108, with higher scores indicating more severe
pathology. Assessments are carried out at baseline (week zero) and at weeks 1, 2, 3, 4, 5, and 6. The
primary objective of the study is to compare gradients of change in BPRS total scores from baseline to
week 6 for placebo and a new therapy for schizophrenia. The investigator wants to estimate the sample
size needed to detect a treatment effect of medium magnitude with 80% power. Cohen [5] (p. 26) has
characterized a difference of one-half standard deviation as a treatment effect of ‘medium magnitude.’
The within-groups correlation structure of the repeated measurements is assumed to conform to an
autoregressive (AR(1)) pattern with the (population) baseline-to-endpoint correlation equal to 0.5.
Table 1 shows the results from using the computer program POWER.EXE to estimate the required sample
size.

Table 1

It can be appreciated that power and sample size for the GLS solution approach those for the simple
endpoint difference-score analysis as the correlation structure of the repeated measurements
approaches a true AR(1) pattern and γ approaches 1. The GLS solution approaches the OLS definition of
change when the correlation structure approaches compound symmetry. The endpoint analysis, and
consequently GLS regression, have superior power and thus require smaller sample sizes than OLS
regression in the presence of an autoregressive correlation structure. The OLS and GLS regression
provide superior power in the presence of uniform correlation. An unpublished parametric study examined
the sample size and power requirements for various intermediate levels of serial dependence, and the
conclusion followed that it is better to use the autoregressive (AR(1)) option for all cases where any
meaningful degree of serial dependence among the repeated measurements is present. This is also a
conservative approach, as confirmed by the larger sample sizes that are required for desired power
under autoregressive conditions.
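
The six formulas of Section 2 can be sketched in a few lines of Python. This is an illustrative sketch only. It is not the POWER.EXE program and makes no attempt to reproduce Table 1 or the dropout adjustments. It rests on several assumptions that are not stated in the article: each definition of change is written as a weight vector w (the endpoint contrast, the mean-corrected time codes, or C^{-1}z); the baseline covariate correlation is computed as the correlation between the baseline score and w'y, which reduces algebraically to r, r_b and r_c in the formulas above; under AR(1) the lag-1 correlation is taken to be the baseline-to-endpoint correlation raised to the power 1/(k - 1); and group-mean differences are assumed to grow linearly from zero at baseline. All function and parameter names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

NORMAL = NormalDist()  # unit normal distribution


def corr_matrix(k, r_end, structure="ar1"):
    """Within-groups correlation matrix C for k equally spaced measurements.

    Assumption of this sketch: r_end is the baseline-to-endpoint correlation,
    so under AR(1) the lag-1 correlation is r_end ** (1 / (k - 1)).
    """
    if structure == "ar1":
        rho = r_end ** (1.0 / (k - 1))
        return [[rho ** abs(i - j) for j in range(k)] for i in range(k)]
    if structure == "cs":  # compound symmetry: one common correlation
        return [[1.0 if i == j else r_end for j in range(k)] for i in range(k)]
    raise ValueError(f"unknown structure: {structure}")


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def z_per_sqrt_n(k, delta, sigma, r_end, structure, method, covary):
    """Return (Z_beta + Z_alpha) / sqrt(n) for one definition of change."""
    c = corr_matrix(k, r_end, structure)
    times = [j - (k - 1) / 2 for j in range(k)]  # mean-corrected time codes
    if method == "endpoint":
        w = [-1.0] + [0.0] * (k - 2) + [1.0]
    elif method == "ols":
        w = times
    elif method == "gls":
        w = solve(c, times)  # w = C^{-1} z, so w' = z' C^{-1}
    else:
        raise ValueError(f"unknown method: {method}")
    # Assumed: mean differences grow linearly from 0 at baseline to delta at endpoint.
    d = [delta * j / (k - 1) for j in range(k)]
    cw = [sum(c[i][j] * w[j] for j in range(k)) for i in range(k)]
    w_c_w = sum(wi * cwi for wi, cwi in zip(w, cw))
    w_d = sum(wi * di for wi, di in zip(w, d))
    # Correlation between the baseline score and the weighted combination w'y.
    r_cov = cw[0] / sqrt(w_c_w) if covary else 0.0
    return abs(w_d) / (sigma * sqrt(2.0 * (1.0 - r_cov ** 2) * w_c_w))


def z_alpha(alpha, two_sided=True):
    """Critical value for a one-sided or two-sided alpha level."""
    return NORMAL.inv_cdf(1.0 - (alpha / 2.0 if two_sided else alpha))


def power(n, alpha=0.05, two_sided=True, **design):
    """Power for n subjects per group."""
    return NORMAL.cdf(z_per_sqrt_n(**design) * sqrt(n) - z_alpha(alpha, two_sided))


def sample_size(target_power, alpha=0.05, two_sided=True, **design):
    """Smallest n per group reaching target_power (no dropout adjustment)."""
    total_z = z_alpha(alpha, two_sided) + NORMAL.inv_cdf(target_power)
    return ceil((total_z / z_per_sqrt_n(**design)) ** 2)


if __name__ == "__main__":
    # Section 4 setting: 7 assessments, half-SD endpoint difference, AR(1), r = 0.5.
    base = dict(k=7, delta=0.5, sigma=1.0, r_end=0.5, structure="ar1")
    for method in ("endpoint", "ols", "gls"):
        for covary in (False, True):
            n = sample_size(0.80, method=method, covary=covary, **base)
            print(method, "baseline covaried" if covary else "no covariate", n)
```

Writing all three definitions of change as weight vectors keeps one code path for the six cases. A plain Gauss-Jordan solver is used so that the sketch needs only the standard library. A two-sided alpha of 0.05 is an assumed default, since the example in Section 4 does not state the alpha level.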

Acknowledgements

This work was supported in part by NIH grants MH32457 and MO1RR02588.

References

[1] C. Ahn, S. Tonidandel, J.E. Overall, Issues in use of SAS PROC.MIXED to test the significance of
treatment effects in controlled clinical trials, J. Biopharm. Stat. 10 (2000) 265–286.
[2] J.E. Overall, C. Ahn, C. Shivakumar, Y. Kalburgi, Problematic formulations of SAS PROC.MIXED models
for repeated measurements, J. Biopharm. Stat. 9 (1999) 189–216.
[3] J.E. Overall, S.R. Doyle, Estimating sample sizes for repeated measurements designs, Control. Clin.
Trials 15 (1994) 100–123.
[4] J.E. Overall, G. Shobaki, C. Shivakumar, J. Steele, Adjusting sample size for anticipated dropouts
in clinical trials, Psychopharmacol. Bull. 34 (1998) 25–33.
[5] J. Cohen, Statistical Power Analysis for the Behavioral Sciences, 2nd ed., Lawrence Erlbaum,
Hillsdale, NJ, 1988.
