A common question in orthodontic research is "how many patients do I need for my study?" The next articles will introduce relevant concepts that will help readers to understand how to appropriately plan the size of a trial.

The objective of a clinical trial is to provide reliable evidence regarding the effect or no effect of a treatment modality. A sufficient number of participants allows the researcher to detect a difference with reasonable precision (good power) if a difference exists, or allows one to be reasonably certain that no difference exists if the results show no difference. Small studies tend to be less convincing and inconclusive because they often have low power. Recruiting more patients than necessary is a waste of resources and even unethical, since more patients than necessary could be exposed to a potentially ineffective therapy. There is a close relationship between power and sample size; usually, as the sample size increases, study power is also expected to increase. Ideally, a balance between study power, a clinically important difference to be detected, trial feasibility, and credibility is required.

What is study power? Power is the probability of observing a difference between treatment groups when a difference exists. A study designed to detect a clinically important difference with, let's say, a power of 80% assumes an 80% chance of observing a difference if there is a difference, and also assumes a 20% chance of missing the difference (false negative) when such a difference exists. Allowing a 20% (power 80%) or a 10% (power 90%) chance of a false negative (type II error, or β) is unavoidable, since a sample calculation with 100% power (type II error approaching zero) would require an infinite number of participants. Type I error, or α (alpha), refers to false-positive results and indicates that we are willing to accept a 5% (α = 0.05) chance of observing a statistically significant difference when no such difference exists between the treatment groups. See Table I for descriptions and relationships of error types and power.

In this article, we will perform a sample calculation for a normally distributed quantitative outcome for a 2-arm trial with 1:1 allocation ratio (2-sided test). Sample calculations are based on assumptions, and we should aim to detect differences between treatment groups, if they exist, that have clinical importance rather than statistical significance.

Before we proceed with the sample calculation, we need to define the following.
- The research question.
- The principal outcome measure of the trial.
- μ1, the anticipated mean response for the standard or control treatment.
- μ2, the anticipated mean response for the alternative treatment and hence the minimum clinically important difference (μ2 - μ1) between treatment arms that we would like to detect.
- The standard deviation (for continuous outcomes only).
- The degree of certainty with which we want to be able to detect the treatment difference (power) and the level of significance (type I error or α).

We will use an example trial to illustrate the process. Pandis et al,1 in a study assessing treatment time to alignment and dental changes between self-ligating and conventional appliances, found that the molar width difference at the end of the follow-up period was 2 mm (SD, 2 mm), a statistically significant finding (Table II). This study was not randomized, and the authors used different wires. Was the 2-mm difference in molar width genuine or was it observed because wires of different shapes were used for the treatment groups? We would like to confirm or refute those findings by adopting a randomized controlled trial design and using exactly the same wire shape and sequence for both treatment groups. As previously explained, to perform the sample calculation, we need to decide what would be a clinically important difference that we want to detect. We can refer to the previous study and assume that a molar width difference of 2 mm between the 2 appliances at a certain time after treatment initiation has clinical importance. Then we can design a randomized controlled trial with 90% power and a 5% level of significance, which

Am J Orthod Dentofacial Orthop 2012;141:519-21
0889-5406/$36.00
Copyright © 2012 by the American Association of Orthodontists.
doi:10.1016/j.ajodo.2011.12.010
520 Statistics and research design
Table I. Types of errors in hypothesis testing at a 5% significance level and 80% power

| Result of significance test | In reality, no difference exists | In reality, a difference exists |
| --- | --- | --- |
| Not significant | 1 - α (= 0.95 or 95%). Correct conclusion, accepting the null hypothesis (Ho) when the Ho is true | β or type II error (= 0.20 or 20%). β = 1 - power. Incorrect conclusion, rejecting the alternative hypothesis (Ha) when the Ha is true |
| Significant | α (= 0.05 or 5%) or type I error. α = level of significance. Incorrect conclusion, rejecting the Ho when the Ho is true | 1 - β (= 1 - 0.20 = 0.8 or 80%). 1 - β = power. Correct conclusion, rejecting the Ho when the Ha is true |
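The quantities discussed in the text (a clinically important difference of 2 mm, an SD of 2 mm, a 5% 2-sided significance level, and 80% or 90% power) can be combined in a short calculation. The following is a minimal sketch, assuming the common normal-approximation formula for comparing 2 means with 1:1 allocation, n per arm = 2(z for 1 - α/2 + z for power)² × SD² / difference². The function names `n_per_group` and `approx_power` are hypothetical, and this is not necessarily the exact form of the calculation used in the article.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def n_per_group(delta, sd, alpha=0.05, power=0.90):
    """Unrounded participants per arm for a 2-arm, 1:1, 2-sided comparison
    of means, using the normal approximation (an assumption of this sketch).

    delta: clinically important difference between the 2 means
    sd:    common standard deviation assumed for both arms
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = STD_NORMAL.inv_cdf(power)           # about 1.28 for 90% power
    return 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2


def approx_power(n, delta, sd, alpha=0.05):
    """Approximate power with n participants per arm.

    Ignores the negligible chance of a significant result in the
    wrong direction.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2 / n)  # SE of the difference in means
    return STD_NORMAL.cdf(abs(delta) / standard_error - z_alpha)


if __name__ == "__main__":
    # Example from the text: difference 2 mm, SD 2 mm, alpha 5%.
    for target_power in (0.80, 0.90):
        n = n_per_group(delta=2, sd=2, power=target_power)
        print(f"power {target_power:.0%}: {n:.2f} -> {ceil(n)} per arm")
    # Power rises with sample size, as the text describes.
    for n in (10, 16, 22, 30):
        print(f"n = {n} per arm: power ~ {approx_power(n, 2, 2):.2f}")
```

Under these assumptions the unrounded result for 90% power is about 21.0 per arm, which rounds up to 22; tabulated or rounded multipliers can give a slightly different figure. The sketch does not allow for dropouts and does not apply a t-distribution correction for small samples.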
April 2012 Vol 141 Issue 4 American Journal of Orthodontics and Dentofacial Orthopedics