
Sample Size and Power

Ketut Suega
Unit Litbang FK Unud/RSUP Sanglah,
Denpasar, Bali
Why is it important to consider sample
size?
 To have a high chance of detecting a clinically
important treatment effect if it exists.
 To ensure appropriate precision of estimates.
 To avoid wasting resources and the time of
participants.
 To avoid misleading conclusions.
When is a sample size calculation not
necessary?

 Truly qualitative research.
 Pilot studies that will be used to inform larger studies (and not to draw conclusions).
TOO FEW SUBJECTS PROVE NOTHING
Sample Size and Power
 The most common reason statisticians get contacted
 Sample size is contingent on design, analysis plan, and outcome
 With the wrong sample size, you will either
 Not be able to make conclusions because the study is “underpowered”
 Waste time and money because your study is larger than it needed to be to
answer the question of interest
 And, with the wrong sample size, you might have problems interpreting your results:
 Did I not find a significant result because the treatment does not work, or
because my sample size is too small?
 Did the treatment REALLY work, or is the effect I saw too small to warrant
further consideration of this treatment?
 This is an issue of CLINICAL versus STATISTICAL significance
Sample Size and Power
 Sample size ALWAYS requires the investigator to
make some assumptions
 How much better do you expect the experimental therapy group to perform than the standard therapy group?
 How much variability do we expect in measurements?
 What would be a clinically relevant improvement?
 The statistician CANNOT tell you what these
numbers should be (unless you provide data)
 It is the responsibility of the clinical investigator to
define these parameters
Sample Size and Power
 Review of power
o Power = the probability of concluding that the new treatment is effective if it truly is effective
o Type I error = the probability of concluding that the new treatment is effective if it truly is NOT effective
o (Type I error = alpha level of the test)
o (Type II error = 1 – power)
 When your study is too small, it is hard to conclude that your treatment is effective
Continuous outcomes
 Easiest to discuss
 Sample size depends on (a formula sketch follows below)
 Δ: the difference to be detected (specified under the alternative hypothesis)
 α: type 1 error
 β: type 2 error
 σ: standard deviation
 r: ratio of the numbers of patients in the two groups (usually r = 1)
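A minimal sketch of how these ingredients combine, assuming the standard normal-approximation formula for comparing two means; neither the formula nor the Python dependencies below are given on the slides:

```python
# Sketch only: per-group sample size for comparing two means,
#   n_per_group = (1 + 1/r) * (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / delta^2
from math import ceil
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80, r=1.0):
    """Size of the experimental group; the comparison group gets r times as many."""
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided test
    z_beta = norm.ppf(power)
    return ceil((1 + 1 / r) * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

# e.g. detect a difference of 5 units when SD = 10, with 80% power and equal groups
print(n_per_group(delta=5, sigma=10))   # about 63 per group
```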
Why bother?
 Sample size calculations are important to ensure that estimates are obtained
with required precision or confidence.
 E.g. a prevalence of 10% from a sample of size 20: the 95% CI is 1% to 31%...
 ...whereas a prevalence of 10% from a sample of size 400 gives a 95% CI of 7% to 13% (see the sketch after this list).
 In studies concerned with detecting an effect, the aim is that
 if an effect deemed to be clinically or biologically important exists, then there is a high chance of it being detected, i.e. that the analysis will be statistically significant.
 If the sample is too small, then even if large differences are observed, it will be impossible to show that these are due to anything more than sampling variation.
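As an illustration of the two prevalence examples above, a small sketch using exact (Clopper-Pearson) intervals; statsmodels is an assumed dependency, not something named on the slides:

```python
# Confidence intervals for an observed 10% prevalence at two sample sizes
from statsmodels.stats.proportion import proportion_confint

for n in (20, 400):
    count = round(0.10 * n)            # 10% observed prevalence
    lo, hi = proportion_confint(count, n, alpha=0.05, method="beta")  # Clopper-Pearson
    print(f"n = {n}: 95% CI {lo:.0%} to {hi:.0%}")
# Output is close to the intervals quoted above: roughly 1% to 32% for n = 20
# and 7% to 13% for n = 400.
```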
Some terminology
 Significance level
 Cut-off point for the p-value, below which the null hypothesis will be rejected and it will be
concluded that there is evidence of an effect. Typically set at 5%.
 One-sided and two-sided tests of significance
 Two-sided tests should be used unless there is a very good reason for doing otherwise.
 Power
 Power is the probability that the null hypothesis will be correctly rejected i.e. rejected when
there is indeed a real difference or association. It can also be thought of as "100 minus the
percentage chance of missing a real effect" - therefore the higher the power, the lower the
chance of missing a real effect. Power is typically set at 80% or 90% but not below 80%.
 Effect size of clinical importance
 This is the smallest difference between the group means or proportions (or odds ratio/relative
risk closest to unity) which would be considered to be clinically or biologically important.
The sample size should be set so that if such a difference exists, then it is very likely that a
statistically significant result would be obtained.
Example (1)
Estimating a single proportion
 Scenario: The prevalence of dysfunctional breathing amongst asthma patients
being treated in general practice is to be assessed using a postal questionnaire
survey
 Required information:
 Primary outcome variable = presence/absence of dysfunctional breathing
 'Best guess' of expected percentage (proportion) = 30% (0.30)
 Desired width of 95% confidence interval = 10% (i.e. +/- 5%, or 25% to 35%)

 Formula for sample size for estimation of a proportion: n = 15.4 * p * (1 - p) / W²
 where n = the required sample size
 p = the expected proportion - here 0.30
 W = total width of the confidence interval - here 0.10
Example (2)
Estimating a single proportion
 Here we have: n = 15.4 * 0.30 * 0.70 / 0.10² = 324

 "A sample of 324 patients with asthma will be required to obtain a


95% confidence interval of +/- 5% around a prevalence estimate
of 30%. To allow for an expected 70% response rate to the
questionnaire, a total of 480 questionnaires will be delivered."

 Note: The formula presented above is based on 'normal approximation methods' and should not be applied when estimating percentages that are close to 0% or 100%. In these circumstances 'exact methods' should be used.
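A short sketch of this worked example using the slide's own formula; the response-rate adjustment shown is a plain division, and the slides round up more conservatively to 480:

```python
# n = 15.4 * p * (1 - p) / W^2, then inflate for the expected response rate
from math import ceil

def n_for_proportion(p, width):
    return ceil(15.4 * p * (1 - p) / width ** 2)

n = n_for_proportion(p=0.30, width=0.10)
print(n)                      # 324 asthma patients in the achieved sample
print(ceil(n / 0.70))         # about 463 questionnaires at a 70% response rate
                              # (the slide rounds up further, to 480)
```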
Which variables should be included in the sample
size calculation?
 The sample size calculation should relate to the study's
primary outcome variable.
 If the study has secondary outcome variables which are
also considered important (as is often the case), the
sample size should also be sufficient for the analyses of
these variables.
Allowing for response rates and other losses
to the sample
 The sample size calculation should relate to the final,
achieved sample.
 Therefore, the initial numbers approached in the
study may need to be increased in accordance with
the expected response rate, loss to follow up, lack of
compliance, and any other predicted reasons for loss
of subjects.
 The link between the initial numbers approached
and the final achieved sample size should be made
explicit.
Consistency with study aims and statistical
analysis
 If the aim is to demonstrate that a new drug is superior to an
existing one then it is important that the sample size is sufficient to
detect a clinically important difference between the two treatments.
 However, sometimes the aim is to demonstrate that two drugs are
equally effective. This type of trial is called an equivalence trial or a
'negative' trial.
 The sample size required to demonstrate equivalence will be larger
than that required to demonstrate a difference.
 The sample size calculation should also be consistent with the
study's proposed method of analysis, since both the sample size and
the analysis depend on the design of the study.
Pitfalls to avoid (1)
 "The throughput of the clinic is around 50 patients a year, of whom 10%
may refuse to take part in the study. Therefore over the 2 years of the
study, the sample size will be 90 patients. "
 Although most studies need to balance feasibility with study power, the
sample size should not be decided on the number of available patients
alone.
 Where the number of available patients is a known limiting factor,
sample size calculations should still be provided, to indicate either
 The power which the study will have to detect the desired difference of
clinical importance, or
 The difference which will be detected when the desired power is applied.
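For the second option above, a hedged sketch of the difference detectable with a fixed, feasibility-limited sample size, using the same normal-approximation approach as earlier; the group size and SD below are illustrative, not from the slide:

```python
# Difference detectable with a fixed per-group size (two-sided test)
from math import sqrt
from scipy.stats import norm

def detectable_difference(n_per_group, sigma, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return (z_alpha + z_beta) * sigma * sqrt(2 / n_per_group)

# e.g. 45 patients per group (90 in total), SD = 10
print(detectable_difference(45, 10))   # about 5.9 units at 80% power
```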
Pitfalls to avoid (2)
 "Sample sizes are not provided because there is no prior
information on which to base them."

 Where prior information on standard deviations is unavailable, sample size calculations can be given in very general terms, i.e. by giving the size of difference that may be detected in terms of a number of standard deviations.
Pitfalls to avoid (3)
 "A previous study in this area recruited 150 subjects and found highly
significant results (p=0.014), and therefore a similar sample size
should be sufficient here."
 Previous studies may have been 'lucky' to find significant results, due
to random sampling variation.
 Calculations of sample size specific to the present, proposed study
should be provided, including details of
 power
 significance level
 primary outcome variable
 effect size of clinical importance for this variable
 standard deviation (if a continuous variable)
 sample size in each group (if comparing groups)
Smallest Worthwhile Effect
[Figure: GB 38.07 s versus USA 38.08 s, a difference of D = 0.01 s]
The purpose of sample size formulae ‘is not to give
an exact number…but rather to subject the study
design to scrutiny, including an assessment of the
validity and reliability of data collection, and to give
an estimate to distinguish whether tens,
hundreds, or thousands of participants are
required’

Williamson et al. (2000) JRSS 163: p. 10


Practical Considerations
 We don’t always have the luxury of finding N
 Often, N fixed by feasibility
 We can then ‘vary’ power or δ
 But sometimes, even that is difficult.
 We don’t always have good guesses for all of the
parameters we need to know…
Not always so easy
 More complex designs require more complex
calculations
 Usually also require more assumptions
 Examples:
 Longitudinal studies
 Cross-over studies
 Correlation of outcomes
 Often, “simulations” are required to get a sample size
estimate.
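As a rough illustration of what "simulation" means here, the sketch below estimates power by repeatedly generating data for a simple two-group comparison; real longitudinal or cross-over designs would need a more elaborate data-generating model, and none of the numbers are from the slides:

```python
# Power by simulation for a two-group comparison of means
import numpy as np
from scipy.stats import ttest_ind

def simulated_power(n_per_group, delta, sigma, alpha=0.05, n_sims=5000, seed=0):
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, sigma, n_per_group)
        treated = rng.normal(delta, sigma, n_per_group)
        _, p_value = ttest_ind(treated, control)
        if p_value < alpha:
            rejections += 1
    return rejections / n_sims

print(simulated_power(n_per_group=63, delta=5, sigma=10))   # close to 0.80
```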
Strategies for Minimizing
Sample Size
 Use continuous variables
 Paired measurements (consider measuring the
change)
 Use more precise variables
 Use unequal group sizes
N = [(c + 1) / (2c)] x n, where c = the number of controls per case
 Use more common outcome
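A small sketch of the unequal-allocation adjustment quoted above, assuming n is the per-group size needed with 1:1 allocation and N is the number of cases needed with c controls per case:

```python
# Number of cases needed when each case is matched to c controls
from math import ceil

def cases_needed(n_equal, c):
    return ceil((c + 1) / (2 * c) * n_equal)

# With 3 controls per case, about 42 cases (plus 126 controls) give roughly
# the same power as 63 cases and 63 controls
print(cases_needed(63, 3))   # 42
```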
Things to Note
 Power is linked to effect size.
 All trials have an infinite number of powers!
 Post-hoc power calculations are pointless.
 Power is conveyed by the confidence interval.
 If secondary outcomes are important, separate sample size calculations should be done for these too.
 The largest size resulting from these calculations should be used, so that the study is powerful enough for all analyses.
Statistical power is the probability of not missing an effect, due to sampling error, when there really is an effect there to be found.

Power is the probability (prob = 1 - β) of correctly rejecting H0 when it really is false.
Power in a nutshell
 Get the biggest sample you can
 Benefits:
 Sample is more representative of the population
 More likely to discover the true relationship
 Reminder: some things are independent, or very nearly so, i.e. ES = 0.
Maximum Power!
 In statistics we want to give ourselves the best chance
to find a significant result if one exists.
 Power represents the probability of finding that
significant result
 p(reject H0|Ho is false)
 As we have discussed, it is directly related to the type II error rate (β)
 Power = 1 - β
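As a quick, hedged sketch of how power itself can be computed (normal approximation for a two-sided, two-sample comparison; the numbers below are illustrative):

```python
# power ≈ Phi( delta / (sigma * sqrt(2/n)) - z_{1-alpha/2} )
from math import sqrt
from scipy.stats import norm

def approx_power(n_per_group, delta, sigma, alpha=0.05):
    z_alpha = norm.ppf(1 - alpha / 2)
    return norm.cdf(delta / (sigma * sqrt(2 / n_per_group)) - z_alpha)

print(approx_power(n_per_group=63, delta=5, sigma=10))   # about 0.80
```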
Two kinds of power analysis
 A priori
 Used when planning your study
 What sample size is needed to obtain a certain level of power?
 Post hoc
 Used when evaluating study
 What chance did you have of significant results?
 Not really useful.
 If you do the power analysis and conduct your analysis accordingly then
you did what you could. To say after, “I would have found a difference
but didn’t have enough power” isn’t going to impress anyone.
An acceptable level of power?
 Why not set power at .99?
 Practicalities
 Cost of increasing power, usually done through increasing
n, can be high
 Notice how for small effects one needs enormous sample
sizes to be able to reject the null
Post hoc power
 If you fail to reject the null hypothesis might want to
know what chance you had of finding a significant result
– defending the failure
 As many point out this is a little dubious
 Better used to help understand (vaguely) the likelihood
of other experiments replicating your results
 But again, your sample size tells you that already
Reminder: Factors affecting Power
 Effect size
 Bigger effects are easier to spot
 Alpha level
 Larger alpha = greater chance for type I error = more
‘liberal’ = less chance for type II error = greater power
 Sample size
 Bigger is better
Summary
 The power of a statistical test is influenced by the size of
the effect and sample size
 Effect size provides a useful tool for examining data when
sample size is small
 The smallest worthwhile effect can also be applied to
determine how many subjects would be required for
statistical significance
 Remember that our choice of data for this analysis was very
subjective in places.
Howell’s general rule
 Look for big effects
or
 Use big samples

 You may now start to understand how little power many of the
studies in psych have considering they are often dealing with small
effects
 Many seem to think that if they use the 'rule of thumb' for a single-sample size (N = 30), which doesn't even hold that often for that case, then power is taken care of too
 By the way, you'd need N = 200 for a single sample and a small effect (d = .20)
 This is clearly not the case
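A quick check of that N = 200 figure, assuming a one-sample test of a small standardised effect at 80% power; the formula is the usual normal approximation, not one given on the slides:

```python
# n ≈ (z_{1-alpha/2} + z_{1-beta})^2 / d^2 for a one-sample test of effect size d
from math import ceil
from scipy.stats import norm

def n_one_sample(d, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return ceil((z_alpha + z_beta) ** 2 / d ** 2)

print(n_one_sample(0.20))   # about 197, in line with the N = 200 quoted above
```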
