
Sample size calculation & software
KWENTI E. TEBIT
Why do sample size calculations?

 Prospective study design with sample size calculation helps to avoid studies that are:
• Too small: leads to equivocal results. An underpowered study may dismiss a potentially beneficial
treatment, or may fail to detect an important relationship.
• Too large: wastes resources.
 Both sample size errors create ethical issues when using humans or animals.
• Too small: you have exposed them to harm with little likelihood of learning anything.
• Too big: you have exposed more of them to harm than was necessary.
Why do sample size calculations?

 Secondary benefit: Makes for better studies. Before you can do a sample size calculation, you will
have to:
• Define the scientific issue you are addressing.
• Translate the issue into research questions or hypotheses.
• Determine what data are needed.
• Formulate the questions or hypotheses in terms of parameters describing the distribution of the
data to be collected.
• Map out the statistical analysis plans
Cont…

 The process of sample size calculation can substantially improve study design. It requires one to
think through:
• definition of the scientific issue
• how the scientific issue is being formulated as an empirical question
• sampling plan
• variables to be collected
• statistical analysis plan
• expected results
 In general, if the details of implementation have been glossed over, this will become obvious
during sample size calculation.
 Recall that the p-value from a hypothesis test can be used to
1. decide whether to reject the null hypothesis (reject if p-value less than α)
2. summarize the evidence against the null
 For the purposes of designing a study we use the first method.
 Typically α = 0.05.
 When we run the study we can also interpret the p-value as evidence against the null.
Definitions:

• A type I error occurs if the null hypothesis is true and it is rejected.


• A type II error occurs if the null hypothesis is false and it is not rejected.

 Table of the possible outcomes of a hypothesis test:

                        H0 is true          H0 is false
  Reject H0             Type I error        Correct decision
  Fail to reject H0     Correct decision    Type II error
Cont…

 What is the probability of a type I error occurring?


 Recall that the cutoff p-value α is the probability of rejecting the null hypothesis when the null
hypothesis is true.
α = Pr(reject H0 | H0 is true)
also
Pr(Type I error) = Pr(reject H0 | H0 is true)
so
α = Pr(Type I error)

 Usually α is set to 0.05.


Cont…

 β = Pr(Type II error) = Pr(fail to reject H0 | H0 is false)


 When designing a study we must choose both α and β.
 We want both to be small, but typically we set α smaller than β:
• A type II error (claiming no difference when there is a difference) is not considered as bad as
a type I error (claiming that there is a difference when there isn't).
 While α is usually set to 0.05, β is usually set to 0.20 or 0.10.
Cont…

 The power of a test is the probability of making the correct decision when the null hypothesis is false.
 Power = Pr(reject H0 | H0 is false)
 That is, the power is the probability of finding an effect when an effect exists.
 Power = Pr(reject H0 | H0 is false) = 1 − Pr(fail to reject H0 | H0 is false) = 1 − β
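To make this concrete, here is a minimal Python sketch that computes power for a two-sided one-sample z-test; the effect size of 0.5 and n of 32 are illustrative assumptions, not values from these slides.

```python
from scipy.stats import norm

def ztest_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test,
    where effect_size is the standardized mean difference (delta/sigma)."""
    z_crit = norm.ppf(1 - alpha / 2)   # critical value for two-sided alpha
    ncp = effect_size * n ** 0.5       # shift of the test statistic under H1
    # Power = probability of landing in either rejection region under H1
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)

print(round(ztest_power(0.5, 32), 2))  # ~0.81 for a medium effect and n = 32
```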
Cont…

 The probabilities α and β refer to what could happen in the study.


 By designing the study with respect to these parameters, we minimize the probability of incorrect
conclusions.
 That is, for a given null and alternative hypothesis of interest, we design a study that has
adequate statistical power to lead to a correct decision.
 Power typically should be 0.8 or above.
Problems with Over- and Under-Powered Studies

Overpowered:
 If the sample size is too large, the study will be able to detect very small differences.
 This is a waste of money and time if the difference is so small it is scientifically or clinically
unimportant.
 If the intervention is risky you have put too many individuals at risk.

Underpowered:
 If the sample size is too small, the study will be unable to detect differences that are scientifically or
clinically important.
 The risk taken by the individuals in the study was unnecessary because the study was unlikely to
detect clinically important effects.
 Also a waste of money and time.
Methods of calculating sample sizes

 There are three main methods of estimating sample sizes: online calculators, software, and manual
calculation.

Online
 There are websites that can be used for calculating sample sizes, hosted by a number of
organisations and open to the public, such as:
 www.surveysystem.com/sscalc.htm
 www.nss.gov.au/nss/home.nsf/pages/Sample+size+calculator
 www.raosoft.com/samplesize.html
 https://www.surveymonkey.com/mp/sample-size-calculator/
 powerandsamplesize.com/
 www.calculator.net (under Math Calculators)
 https://fluidsurveys.com/survey-sample-size-calculator/

Manual computation
 Using formulae and a calculator, you can estimate the sample size required for your study, for
example:
 n = Z² P(1 − P) / e²   (Lorentz formula)
 where Z is the standard normal value for the chosen confidence level, P the expected proportion, and e the desired margin of error.
 There are formulae for case-control studies, comparison of proportions, comparison of means,
diagnostic accuracy, and other descriptive studies (see Hajian-Tilaki K. Journal of Biomedical
Informatics 2014; 48: 193–204).
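As a quick illustration of the Lorentz formula above, here is a minimal Python sketch; the default inputs (Z = 1.96 for 95% confidence, P = 0.5, e = 0.05) are the conventional worst-case assumptions, not values prescribed by these slides.

```python
import math

def lorentz_n(z=1.96, p=0.5, e=0.05):
    """n = Z^2 * P * (1 - P) / e^2, rounded up to the next whole subject."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

print(lorentz_n())  # 385 subjects for 95% confidence, P = 0.5, 5% margin of error
```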
Software
 Statistical software can also be used to calculate sample size, e.g. Epi Info, SPSS, Stata, SAS,
R, Minitab.

 These do not give you control over your type I and type II errors.
 G*Power offers great control over your type I and type II errors and provides a more
accurate approach to estimating your sample size.
Introduction to G*Power

 Many papers published in the scientific literature do not have enough power to support their
conclusions.
 G*Power is an easy-to-use program for performing various types of power analysis.
 G*Power version 3.1.9.2 was written by Franz Faul, Universität Kiel, Germany.
 It is the most widely used software for power analysis.
Types of power analyses

 The two most common types are a priori and post-hoc power analysis.

a priori
 An a priori analysis is done before a study takes place.
 It is the ideal type of power analysis because it provides users with a method to control both
the type I error probability α and the type II error probability β.
 By implication, it also controls the power of the test, that is, the complement of the type II
error probability (1 − β) (i.e., the probability of correctly rejecting H0 when it is in fact false).
 An a priori analysis is used to determine the necessary sample size N of a test given a desired α
level, a desired power level (1 - β), and the size of the effect to be detected
 i.e. a measure of the difference between the H0 and the H1
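To illustrate what an a priori calculation does, here is a minimal Python sketch using the normal-approximation formula for a two-sided comparison of two means; the inputs (d = 0.5, α = 0.05, power = 0.80) are illustrative assumptions. The exact t-based calculation that G*Power performs gives a slightly larger answer (64 per group for these inputs).

```python
from math import ceil
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means with standardized effect size d."""
    z_alpha = norm.ppf(1 - alpha / 2)   # from the desired alpha level
    z_beta = norm.ppf(power)            # from the desired power (1 - beta)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(n_per_group(0.5))  # 63 per group; the exact t-test answer is slightly larger
```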

post-hoc analysis
 A post-hoc analysis is typically performed after a study has been conducted, so that the sample
size N is already a matter of fact. Given N, α, and a specified effect size, this type of analysis returns
the power (1 − β) of the test, or equivalently its type II error probability β.
 Obviously, post-hoc analyses are less ideal than a priori analyses because only α is controlled, not β.
Both β and its complement (1 − β) are assessed, but not controlled, in post-hoc analyses.
 Thus, post-hoc power analyses can be characterized as instruments providing for a critical
evaluation of the (often surprisingly large) error probability β associated with a false decision in
favor of the H0.
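As a post-hoc illustration, the sketch below uses Python's statsmodels package (not G*Power itself) to recover the achieved power of a hypothetical completed study; the per-group n of 50 and observed d of 0.4 are assumed values.

```python
from statsmodels.stats.power import TTestIndPower

# Hypothetical completed study: 50 participants per group, observed d = 0.4
achieved = TTestIndPower().power(effect_size=0.4, nobs1=50, alpha=0.05)
print(f"Achieved power: {achieved:.2f}")  # ~0.51 -- underpowered for this effect
```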
Getting to know G*Power

 When you open G*Power for the first time, it presents three windows: input parameters, output
parameters, and a window showing the distribution plot.
 Above, you have the quick-access toolbar with commands such as File, Edit, View, Tests,
Calculator, etc.
 Below the distribution plot window, you have three drop-down menus: test family, statistical
test, and type of power analysis.
 Below the input and output parameters, you have commands to plot an X-Y graph for a range
of values.
How to use G*Power

 Before you start using G*Power, you have to know


1) The type of study design
2) The endpoint of your study
3) The effect size to use
4) The power you intend to achieve
5) The α level you will be working with
6) The number of groups you will be working with.
Types of studies

1) Observational
 Descriptive studies
 Ecological studies (beware the ecological fallacy)
 Cross-sectional studies
 Case-control studies
 Cohort studies

2) Experimental
 Randomised controlled trials
 Field trials
 Community trials
Effect size

 An effect size is simply an objective and standardized measure of the magnitude of an observed
effect (Field, 2005).
 The fact that the measure is standardized means that we can compare effect sizes across
different studies that have measured different variables, or have used different scales of
measurement.
 The most common measures of effect size are Cohen's d and Pearson's correlation
coefficient, r.
 Others include Hedges' g, Glass's Δ, odds ratios, and risk ratios.
1) Correlation coefficients (r)

 r can be used to express the difference between means or between two groups, and is
constrained to lie between 0 (no effect) and 1 (a perfect effect).
 r is related to t in the t-test: r can be easily obtained from several common test
statistics.
 For example, if a t-test has been used, r is a function of the observed t-value and the
degrees of freedom, df, on which it is based:
 r = √(t² / (t² + df))
 When ANOVA has been used and an F-ratio is the test statistic, then when there is 1
degree of freedom for the effect, the following conversion can be used:
 r = √(F(1, dfR) / (F(1, dfR) + dfR))
 in which F(1, dfR) is simply the F-ratio for the effect (which must have 1 degree of
freedom) and dfR is the degrees of freedom for the error term on which the F-ratio is
based.

 r can also be used to express relationships in categorical data because it is directly
related to the chi-square statistic (again, provided this chi-square statistic has only 1
degree of freedom):
 r = √(χ² / N), where N is the total sample size.
 r can also be calculated from the probability value of a test statistic, via the standard
normal value Z corresponding to that probability: r = Z / √N.

 r = 0.10 (small effect): in this case, the effect explains 1% of the total variance.
 r = 0.30 (medium effect): the effect accounts for 9% of the total variance.
 r = 0.50 (large effect): the effect accounts for 25% of the total variance.
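The conversions above can be collected into a couple of one-line helpers. This is a minimal Python sketch; the test-statistic values in the usage lines are illustrative assumptions.

```python
import math

def r_from_t(t, df):
    """r from a t statistic: r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t ** 2 / (t ** 2 + df))

def r_from_chi2(chi2, n):
    """r from a 1-df chi-square statistic and total sample size N."""
    return math.sqrt(chi2 / n)

print(round(r_from_t(2.5, 48), 2))      # 0.34 -- a medium effect
print(round(r_from_chi2(6.5, 100), 2))  # 0.25 -- between small and medium
```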
2) Cohen's d

 d = (M1 − M2) / SDpooled
 where pooled SD = √((SD1² + SD2²) / 2)

d = 0.8: large (8/10 of a standard deviation unit)

d = 0.5: moderate (1/2 of a standard deviation)

d = 0.2: small (1/5 of a standard deviation)


Exercise

 Among 7th graders in Lowndes County Schools taking the CRCT reading exam (N = 336),
there was a statistically significant difference between the two teaching teams, team 1 (M
= 818.92, SD = 16.11) and team 2 (M = 828.28, SD = 14.09). Compute the effect size.
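One way to work the exercise, as a minimal Python sketch: it applies the equal-weight pooled SD formula from the Cohen's d slide, which implicitly assumes the two teams are of roughly similar size (the exercise gives only the total N = 336).

```python
import math

m1, sd1 = 818.92, 16.11   # team 1
m2, sd2 = 828.28, 14.09   # team 2

# Equal-weight pooled SD; assumes roughly equal team sizes, which the
# exercise does not state (a weighted pool would need each team's n)
pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
d = abs(m2 - m1) / pooled_sd
print(f"Cohen's d = {d:.2f}")  # ~0.62, a medium-to-large effect
```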
