
Statistics

Sample Size Calculation for Two Independent Groups: A Useful Rule of Thumb
John C Allen Jr, PhD
Duke-NUS Graduate Medical School, Singapore

Introduction
There is a great little book by Gerald van Belle, written “for practitioners of statistical science”, called Statistical Rules of Thumb¹. It is a collection of widely applicable simple rules that capture the essence of key statistical concepts. One of the topics addressed is sample size calculation, in particular the basic problem of calculating a sample size when the aim is to compare means from 2 independent samples.

With the widely available and readily accessible software for computing sample size, both online and commercial, one might question the value of a “rule of thumb” formula for calculating sample size. While a thorough assessment of the conceivable factors and influences that could impact sample size is critical, what is often needed in the early planning stages is an expeditious calculation to allow a gross assessment of resource requirements and feasibility. This is where the statistical rule of thumb (ROT) is often very useful. In discussions between statistician and investigator, the ROT removes much of the algebraic and notational shroud and the problem is reduced to its elemental form. The complex is made simple. The benefit may not lie as much in the ease of calculation as in facilitating communication between researcher and statistician.

2 Independent Samples
A sample size calculation with the intent of comparing means μ₁ and μ₂ from 2 independent groups usually begins by assuming normal distributions and homogeneous variances σ₁² = σ₂² = σ², equal sample sizes n₁ = n₂ = n and a 2-sided test. The false positive rate is set to α and the power for detecting a difference Δ = |μ₁ – μ₂| is set to 1 – β, where β is the false negative rate. Using conventional values of α = 0.05 and β = 0.20, the ROT for calculating n, the per-group sample size, is simply

$$ n = \frac{16}{(\Delta/\sigma)^2} = \frac{16}{\delta^2} = \frac{16}{(\text{effect size})^2} \tag{1} $$

Effect size (δ) is the standardised difference, that is, the absolute difference Δ divided by the standard deviation σ. The ROT is easily obtained from the 2-sample normal approximation sample size formula:

$$ n = \frac{2\,(Z_{\alpha/2} + Z_{1-\beta})^2}{\left(\frac{\mu_1 - \mu_2}{\sigma}\right)^2} = \frac{2\,(1.96 + 0.84)^2}{(\Delta/\sigma)^2} = \frac{15.68}{\delta^2} \approx \frac{16}{\delta^2} $$

A comparison of sample sizes calculated using the ROT versus a commercial software package (PASS©) is given in Table 1 (α = 0.05, 1 – β = 0.80). Overall, the agreement is very good. The total number of subjects required is 2n.

For α = 0.05, the numerators in equation (1) corresponding to 1 – β = 0.90 and 0.95 are 21 and 26, respectively. In addition, equation (1) can be inverted to calculate either the detectable difference Δ or the effect size δ for a given sample size:

$$ \Delta = \frac{4\sigma}{\sqrt{n}} \quad \text{or} \quad \text{effect size} = \delta = \frac{4}{\sqrt{n}} \tag{2} $$

1 Sample
In the 1-sample case where a single mean is compared to a known reference value, for α = 0.05 and 1 – β = 0.80, 0.90 and 0.95, the numerator in equation (1) becomes, respectively, 8, 11 and 13, half the number per group required

138 Proceedings of Singapore Healthcare  Volume 20  Number 2  2011



Table 1. Sample size per group calculated using the Rule of Thumb (ROT) versus commercial software (PASS©).

Effect size             0.20  0.30  0.40  0.50  0.60  0.70  0.80  0.90  1.00
n per group (PASS©)      393   176   100    64    45    33    26    21    17
n per group (ROT)        400   178   100    64    45    33    25    20    16
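
The ROT row of Table 1, and the numerators quoted for other power levels, can be checked with a few lines of Python. This is only a minimal sketch: the function names are illustrative and do not come from the article, and it assumes Python 3.8 or later for `statistics.NormalDist`. It uses exact normal quantiles, so the 80% power numerator comes out near 15.70 rather than the 15.68 obtained from the rounded values 1.96 and 0.84; both round to 16.

```python
import math
from statistics import NormalDist


def rot_numerator(alpha=0.05, power=0.80, groups=2):
    """Numerator of the rule of thumb from the normal approximation.

    groups=2 gives 2 * (z_{1-alpha/2} + z_{power})^2 (two independent groups);
    groups=1 gives half of that (single mean vs. a known reference value).
    """
    z = NormalDist()  # standard normal distribution
    return groups * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2


def n_per_group(effect_size, numerator=16):
    """Equation (1): per-group n, rounded up to a whole subject."""
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(numerator / effect_size ** 2, 9))


def detectable_effect_size(n, numerator=16):
    """Equation (2): effect size detectable with n subjects per group."""
    return math.sqrt(numerator / n)


effect_sizes = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
rot_row = [n_per_group(d) for d in effect_sizes]
print(rot_row)  # [400, 178, 100, 64, 45, 33, 25, 20, 16]

powers = (0.80, 0.90, 0.95)
two_sample = [round(rot_numerator(power=p)) for p in powers]
one_sample = [round(rot_numerator(power=p, groups=1)) for p in powers]
print(two_sample, one_sample)  # [16, 21, 26] [8, 11, 13]

print(detectable_effect_size(64))  # 0.5
```

Rounding up to a whole subject is a choice made here for the sketch; it reproduces the ROT row of Table 1.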

for a 2-sample test. This serves as a reminder that comparing 2 groups requires 4 times as many subjects in total as comparing a single mean to a known reference.

2 Independent Samples with Repeated Measurements
Extending the use of equation (1), suppose we wish to determine n for 2 independent samples for which r repeated measurements will be made on each subject over a period of time. Two analysis approaches can be considered: (A) a comparison of slopes of 2 linear trends; or (B) a comparison of time-averaged differences. Within- and between-subject variance components are σ_w² and σ_b², respectively, with total variance σ² = σ_w² + σ_b². It is assumed that each subject is measured at the same times t_j (j = 1, ..., r), and s_t² = Σ_j (t_j – t̄)² / r is the within-subject variance of the t_j. The only other critical piece of information is an estimate of the intra-class correlation ρ = σ_b² / σ², the common correlation between measurements taken on the same subject, where 0 ≤ ρ ≤ 1.

In approach (A), comparing slopes, the sample size will be labelled n_β(r, ρ). The standardised difference δ = (β₁ – β₂) / σ is a meaningful effect size relative to the difference in slopes. For α = 0.05 and β = 0.20, the sample size n_β(r, ρ) can be obtained by multiplying n in equation (1) by the factor (1 – ρ) / (r s_t²), i.e.

$$ n_\beta(r, \rho) = n \cdot \frac{1 - \rho}{r\, s_t^2} \tag{3} $$

Inspection of (3) indicates that high values of the intra-class correlation ρ (the closer to 1 the better) and more repeated measurements with wider spacing of measurement times will reduce n for detecting an effect of size δ.

In approach (B), comparing time-averaged differences, the sample size will be labelled n_μ̄(r, ρ). The standardised difference δ = Δ / σ is a meaningful effect size in the average response between the 2 groups. For α = 0.05 and β = 0.20, the sample size n_μ̄(r, ρ) can be obtained by multiplying n in equation (1) by the factor [1 + ρ(r – 1)] / r, i.e.

$$ n_{\bar{\mu}}(r, \rho) = n \cdot \frac{1 + \rho\,(r - 1)}{r} \tag{4} $$

Inspection of (4) indicates that low values of the intra-class correlation ρ (the closer to 0 the better) will reduce n for detecting an effect of size δ. Spacing of measurement times is irrelevant.

Repeated measurement studies are likely to have missing follow-up visit data for various reasons (e.g. non-compliance). As a “hedge” against the potential for a substantial loss of power, the proportion of patients anticipated to be lost to follow-up should be taken into account in calculating sample size.

Summary
In calculating sample size for comparing means of independent samples, a good starting point is simply 16 / (effect size)². In the absence of specific information about Δ or σ, effect sizes of δ = 0.30, 0.50 and 0.80 could be regarded as “small”, “medium” and “large”, respectively. In longitudinal studies, the intra-class correlation affects sample size calculation differently, depending on the analysis approach. In applying formulas (3) and (4), respectively, the ROT is as follows: for comparing slopes (within-subject average rate of change over time), high values of the intra-class correlation (ρ close to 1) are desirable as they will result in smaller sample sizes for detecting a given effect size, all else constant; for comparing time-averaged mean differences (averaged across subjects), low values of the intra-class correlation (ρ close to 0) are desirable for minimising sample size. In the absence of any information at all about ρ, the recommendation is to try to choose a value of ρ that will err on the conservative side of whichever approach is being used, (A) or (B).

Reference:
1. Van Belle G. Statistical Rules of Thumb. 1st ed. New York, NY: Wiley Interscience. 2002. 248 p.
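
As a worked illustration of the repeated-measurement multipliers in equations (3) and (4), the following Python sketch applies both factors to a starting value of n = 64 per group (equation (1) with an effect size of 0.5). The function names, the four equally spaced visit times and the value ρ = 0.5 are assumptions chosen for the example; they do not come from the article.

```python
import math


def time_variance(times):
    """s_t^2 = sum_j (t_j - t_bar)^2 / r, the variance of the measurement times."""
    r = len(times)
    t_bar = sum(times) / r
    return sum((t - t_bar) ** 2 for t in times) / r


def slope_factor(rho, times):
    """Equation (3) multiplier for comparing slopes: (1 - rho) / (r * s_t^2)."""
    return (1 - rho) / (len(times) * time_variance(times))


def time_average_factor(rho, r):
    """Equation (4) multiplier for time-averaged differences: [1 + rho (r - 1)] / r."""
    return (1 + rho * (r - 1)) / r


n = 64                # equation (1) with effect size 0.5 (illustrative)
times = [0, 1, 2, 3]  # hypothetical visit times, same for every subject
rho = 0.5             # hypothetical intra-class correlation

n_slopes = math.ceil(n * slope_factor(rho, times))
n_time_avg = math.ceil(n * time_average_factor(rho, len(times)))
print(n_slopes)    # 7
print(n_time_avg)  # 40
```

The two results are not directly comparable, because δ is defined differently in each approach: for slopes it is a difference in rate of change per unit of time, so the multiplier in (3) depends on the time units through s_t². With a single measurement (r = 1) the factor in (4) equals 1 and equation (1) is recovered.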
