Sample size calculation

Presented by: Dr Aamir Omair

Dr. Emad Masuadi, BSc, MSc, MPhil, Ph.D.


Assistant Professor of Biostatistics
Research Unit, College of Medicine,
King Saud bin Abdulaziz University for Health Sciences
Objectives
At the end of this session participants should be able to:

1. Explain the importance of sample size calculation in research

2. Identify the factors that determine the sample size

3. Identify the relevant software / website for calculating the
required sample size

Emad Masuadi
Importance of sample size calculation
• To ensure that estimates are obtained with the
required accuracy / precision (margin of error).

• The size of the sample influences both the
representativeness of the sample and the
statistical analysis of the data

• Smaller samples are more likely not to be
representative

• Larger samples are more likely to detect a
small difference between different groups
Large sample size effect
[Figure: scatter plot of y (0% to 100%) against x (0 to 200)]
• r is the correlation coefficient
• r = -0.07, p-value = 0.04, n = 770
• R² = 0.0054: only 0.5% of y is explained by x
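The message of the plot can be checked numerically. The sketch below is not part of the original slides: it recovers r from the rounded R², applies the usual t statistic for a correlation coefficient, and assumes a normal approximation to the t distribution (reasonable for n = 770). The function name and the comparison sample of n = 100 are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def correlation_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 (normal approximation to t)."""
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return 2 * (1 - NormalDist().cdf(abs(t)))

r_squared = 0.0054            # R² from the slide, rounded
r = -sqrt(r_squared)          # about -0.07

p_large = correlation_p_value(r, 770)   # sample size from the slide
p_small = correlation_p_value(r, 100)   # hypothetical smaller sample, same r

print(f"n = 770: p = {p_large:.3f}")
print(f"n = 100: p = {p_small:.3f}")
```

The same weak correlation falls below 0.05 with 770 observations but not with 100, even though x explains only about 0.5% of the variation in y in both cases.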
Importance of sample size calculation
Inappropriate sample size can lead to:

• Wrong conclusions
• Poor quality research (Errors)
• Waste of resources (time, money and effort)
• Delay in meeting deadlines

What is Statistics?
• Statistics is a science that deals with collecting and
analyzing data, drawing conclusions, and making
decisions.
• There are two main areas of Statistics:
– Descriptive statistics:
provides tabular and graphical techniques and
numerical measures for describing data.
– Inferential statistics:
provides procedures for analyzing data and
making decisions, using the sample (statistic)
to draw inferences about the population (parameter)
Inferential Statistics
• Estimation: the process of approximating the population
characteristic by a value or range of values e.g.
– Estimate the population mean weight using the sample
mean weight
– Estimate the proportion of obese persons in the
population
• Hypothesis testing: the process of assessing the evidence
against a claim about the population characteristics e.g.
– Determine if the mean weight of males is greater than that
of females
– Determine if obese persons are more likely to be diabetic
as compared to normal weight persons
Factors that influence the sample size
Research question / Objectives
→ Variables in the research question and their type
→ Statistical procedure: hypothesis testing or estimation

• Hypothesis testing, categorical outcome:
– Significance level (5%)
– Power of the test (80%)
– Prevalence difference
• Hypothesis testing, numerical outcome:
– Significance level (5%)
– Power of the test (80%)
– Mean difference and SD
• Estimation, categorical outcome:
– Confidence level (95%)
– Margin of error
– Prevalence
• Estimation, numerical outcome:
– Confidence level (95%)
– Margin of error
– SD
Point and Interval Estimates
• A point estimate is a single number.
• A confidence interval provides additional
information about variability.

[Figure: the confidence interval runs from the Lower Confidence
Limit (LCL) to the Upper Confidence Limit (UCL), with the point
estimate between them; the distance from LCL to UCL is the width
of the confidence interval]
Confidence Interval Estimate
• An interval gives a range of values:
– Takes into consideration variation in sample
statistics from sample to sample
– Based on observation from one sample
– Gives information about closeness to unknown
population parameters
– Stated in terms of level of confidence
• Never 100% sure
• The general formula for confidence intervals is:

Point Estimate ± (Critical Value)(Standard Error)
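As an illustration of the general formula, the sketch below builds a 95% confidence interval for a proportion. It is not from the original slides: the counts (120 obese persons out of 400) are hypothetical, and the function name is an assumption chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Point estimate ± (critical value)(standard error) for a proportion."""
    p_hat = successes / n                                   # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)      # critical value
    se = sqrt(p_hat * (1 - p_hat) / n)                      # standard error
    return p_hat - z * se, p_hat + z * se

# Hypothetical sample: 120 obese persons among 400 sampled
lower, upper = proportion_ci(120, 400)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

The point estimate here is 0.30, and the interval extends by the same amount on either side of it.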
Margin of Error (E)
• Margin of Error (E): the amount added and subtracted
to the point estimate to form the confidence interval
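Confidence interval for the proportion:
$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
Confidence interval for the mean:
$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
In both cases E is the term after the ± sign.
• Data variation, σ or s: E increases as σ or s increases
• Sample size, n: E decreases as n increases
• Level of confidence, 1 − α: E increases as 1 − α increases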
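The three relationships can be seen by changing one input at a time. This is a minimal sketch, not taken from the slides: it uses z in place of t (an approximation that is adequate for larger samples), and the baseline values s = 20 and n = 100 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error_mean(s, n, confidence=0.95):
    """E = z * s / sqrt(n); z stands in for t, which is an approximation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s / sqrt(n)

base = margin_of_error_mean(s=20, n=100)                  # hypothetical baseline
more_spread = margin_of_error_mean(s=30, n=100)           # larger SD
bigger_sample = margin_of_error_mean(s=20, n=400)         # larger n
more_confident = margin_of_error_mean(s=20, n=100, confidence=0.99)

print(f"baseline E          = {base:.2f}")
print(f"larger SD           = {more_spread:.2f}")
print(f"larger sample       = {bigger_sample:.2f}")
print(f"higher confidence   = {more_confident:.2f}")
```

Note that quadrupling the sample size only halves E, because n enters the formula under a square root.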
Common Levels of Confidence

• Commonly used confidence levels are 90%, 95%, and 99%

Confidence Level    Confidence Coefficient, $1 - \alpha$    z value, $z_{\alpha/2}$
80%                 0.80                                    1.28
90%                 0.90                                    1.645
95%                 0.95                                    1.96
98%                 0.98                                    2.33
99%                 0.99                                    2.576
99.8%               0.998                                   3.09
99.9%               0.999                                   3.29
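The z values in the table come from the inverse of the standard normal distribution. The sketch below, which is not part of the slides, reproduces the most commonly used ones with Python's standard library; the helper name `z_critical` is an assumption for this example.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```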
Estimation vs Hypothesis testing

• What is the prevalence of obesity among adult
females in Saudi Arabia?
– Confidence interval (one group) categorical variable
• Is there a difference in the prevalence of obesity
between males and females in Saudi Arabia?
– Testing hypothesis (two groups) categorical variable
• What is the mean BMI of adult females in Saudi
Arabia?
– Confidence interval (one group) numerical variable
• Is there a difference in the mean BMI between
males and females in Saudi Arabia?
– Testing hypothesis (two groups) numerical variable

Estimation / Confidence intervals
• When the primary outcome is categorical
Parameter                                                  Value
Confidence level                                           95%
Prevalence                                                 From the literature review, a pilot study, or 50%
Margin of error (precision; half of the interval width)    Usually 5% or 10%
N (sample size)                                            ?

• When the primary outcome is numerical
Parameter                                                  Value
Confidence level                                           95%
Standard deviation                                         From the literature review, a pilot study, or (Range / 4)
Margin of error (precision; half of the interval width)    e.g. ()
N (sample size)                                            ?
Sample size calculation
• Categorical variables:
– Expected Prevalence / Proportion
should be known

– If Not Known …
• Take 0.5 or 50% as the expected value
• This gives the maximum variance
• So it will give the sample size based on the
maximum variance possible

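The reason 50% is the conservative choice is that the variance of a proportion, p(1 − p), peaks at p = 0.5. The short sketch below is an illustration added to the slides' argument; the function name and the grid of candidate values are assumptions.

```python
def binary_variance(p):
    """Variance of a yes/no outcome with prevalence p."""
    return p * (1 - p)

candidates = [i / 10 for i in range(1, 10)]   # 0.1, 0.2, ..., 0.9
worst_case = max(candidates, key=binary_variance)

for p in candidates:
    print(f"p = {p:.1f}: p(1 - p) = {binary_variance(p):.2f}")
print("largest variance at p =", worst_case)
```

Any other assumed prevalence gives a smaller variance and therefore a smaller required sample, so 50% is a safe default when nothing is known.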
Sample size calculation (confidence interval)

• Categorical variables (Single sample):

$n = \frac{(z_{\alpha/2})^2 \, p \, q}{E^2}$
• $z_{\alpha/2}$ = 1.96 for a two-sided 95% confidence level
• p is the expected population prevalence
• q = 1 − p
• E (margin of error) is half of the interval width
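The single-sample formula for a categorical outcome can be applied directly. The following is a minimal sketch rather than a replacement for the calculators listed later; the function name is an assumption, and the result is rounded up because a fraction of a participant cannot be recruited.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(p, margin, confidence=0.95):
    """n = z^2 * p * q / E^2, rounded up to a whole participant."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    q = 1 - p
    return ceil(z ** 2 * p * q / margin ** 2)

# Unknown prevalence (50%) with the two usual margins of error
print(sample_size_proportion(0.5, 0.05))   # 5% margin
print(sample_size_proportion(0.5, 0.10))   # 10% margin
```

With an unknown prevalence, a 5% margin needs 385 participants and a 10% margin needs 97, which shows how quickly the requirement grows as the margin narrows.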
Sample size calculation
• Numerical variables:
– Standard deviation (SD) or the expected
Variance of the population should be known
– If Not Known …
• Estimate the maximum possible range for
the outcome variable e.g. 110 to 190 mmHg
for systolic blood pressure
• So range is 190 – 110 = 80
• Divide the ‘maximum range’ by 4 to get
estimated value of ‘sd’ e.g. 80 / 4 = 20

[Figure: Normal Distribution Curve]

Sample size calculation (confidence interval)

• Numerical variables (Single sample):

$n = \left( \frac{t_{\alpha/2} \times \sigma}{E} \right)^2$

• $t_{\alpha/2}$ = critical value for a two-sided 95% confidence level
• σ is the expected standard deviation
• E (margin of error) is half of the interval width
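The range-based SD estimate and the single-sample formula for a numerical outcome can be combined as below. This is a sketch under stated assumptions: z is used in place of t (the t value depends on the unknown n), the 5 mmHg margin of error is hypothetical, and the function names are illustrative. The blood pressure range of 110 to 190 mmHg is the example from the slides.

```python
from math import ceil
from statistics import NormalDist

def sd_from_range(low, high):
    """Rule of thumb from the slides: SD is roughly the maximum range / 4."""
    return (high - low) / 4

def sample_size_mean(sd, margin, confidence=0.95):
    """n = (z * sd / E)^2, rounded up; z stands in for t (an assumption)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sd / margin) ** 2)

sd = sd_from_range(110, 190)          # systolic blood pressure example
n = sample_size_mean(sd, margin=5)    # hypothetical margin of 5 mmHg
print(f"estimated SD = {sd}, required n = {n}")
```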
Hypothesis testing
• When the primary outcome is categorical
Parameter                                Value
Significance level                       0.05
Prevalence difference                    From the literature review or a pilot study
Power of the test                        Usually 80%
N (sample size)                          ?

• When the primary outcome is numerical
Parameter                                Value
Significance level                       0.05
Mean difference and standard deviation   From the literature or a pilot study
Power of the test                        Usually 80%
N (sample size)                          ?
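The slides leave the two-group formulas to the calculators. As a rough illustration, the sketch below uses the standard normal-approximation formulas for comparing two independent groups of equal size; these formulas are not from the slides, dedicated software may apply corrections that give slightly different numbers, and all function names and input values are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def _z_sum(alpha, power):
    """z_{alpha/2} + z_{beta} for a two-sided test."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n to detect a difference between two prevalences."""
    z = _z_sum(alpha, power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z ** 2 * variance / (p1 - p2) ** 2)

def n_per_group_means(sd, difference, alpha=0.05, power=0.80):
    """Per-group n to detect a mean difference, assuming a common SD."""
    z = _z_sum(alpha, power)
    return ceil(2 * (z * sd / difference) ** 2)

# Hypothetical inputs: prevalence 30% vs 20%; mean difference 10 with SD 20
print(n_per_group_proportions(0.30, 0.20))
print(n_per_group_means(sd=20, difference=10))
```

Both results are per group, so the total sample is twice the printed value.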
Types of errors in hypothesis testing

                                 Truth in the population
Decision                         Null hypothesis is true            Null hypothesis is false
Fail to reject null hypothesis   Correct decision                   Wrong decision: Type II error (β)
Reject null hypothesis           Wrong decision: Type I error (α)   Correct decision

Type I error: rejecting the null hypothesis when it is actually true

Type II error: failing to reject the null hypothesis when it is actually false
Types of errors
• The investigator chooses α prior to the study
(common values are α = 0.05 or 0.01)
• The probability of committing a type I error (rejecting the
null hypothesis when it is true) is called α (the level of
significance)
• The probability of committing a type II error is called β
• Power is (1 − β)
• If β = 0.20 the power is 0.80:
20% is set as the maximum chance of missing an
association if it exists (incorrectly finding no
association)
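To make the link between sample size and power concrete, the sketch below approximates the power of a two-sided comparison of two means. It is an illustration only: it uses a normal approximation, ignores the negligible probability of rejecting in the wrong direction, and the SD of 20, difference of 10, and group sizes are hypothetical values, as is the function name.

```python
from math import sqrt
from statistics import NormalDist

def power_two_means(n_per_group, sd, difference, alpha=0.05):
    """Approximate power (1 - beta) for comparing two independent means."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    se = sd * sqrt(2 / n_per_group)        # standard error of the difference
    return nd.cdf(abs(difference) / se - z_alpha)

for n in (30, 63, 100):
    print(f"n per group = {n}: power = {power_two_means(n, 20, 10):.2f}")
```

With 63 participants per group the power is just above 0.80, so β is just under 0.20; with only 30 per group a real difference of this size would be missed about half the time.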
Sample size calculators
• Websites:
– http://www.raosoft.com/samplesize.html
– http://www.stat.uiowa.edu/~rlenth/Power/
– http://www.openepi.com/SampleSize/SSPropor.htm
– http://www.powerandsamplesize.com/
• Software: PASS from http://www.ncss.com/
Thank You

Questions?
