Emad Masuadi
Importance of sample size calculation
• To ensure that estimates are obtained with the
required accuracy / precision (margin of error).
Large sample size effect
[Figure: plot with vertical axis 0% to 100% and horizontal axis 0 to 200, annotated with r = -0.07 (r is the correlation coefficient), p-value = 0.04, n = 770, R² = 0.0054]
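The large sample size effect can be checked numerically. The sketch below assumes that the R² on the slide is the square of r (as in a simple linear fit) and uses a normal approximation to the t test for a correlation coefficient; the function name and the comparison sample size of 100 are illustrative choices, not part of the slides.

```python
import math
from statistics import NormalDist

def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: correlation = 0.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2) and approximates the
    t distribution by the standard normal (reasonable for large n).
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return 2 * (1 - NormalDist().cdf(abs(t)))

# r recovered from the R² printed on the slide (about -0.07); assumption: R² = r².
r = -math.sqrt(0.00544374403386072)

p_large = correlation_p_value(r, 770)  # sample size from the slide
p_small = correlation_p_value(r, 100)  # hypothetical smaller sample

print(f"n = 770: p = {p_large:.2f}")
print(f"n = 100: p = {p_small:.2f}")
```

The same very weak correlation is statistically significant at n = 770 but far from significant at n = 100, which is why a significant p-value from a very large sample does not by itself indicate a meaningful association.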
Importance of sample size calculation
Inappropriate sample size can lead to:
• Wrong conclusions
• Poor quality research (Errors)
• Waste of resources (time, money and effort)
• Delay in meeting deadlines
What is Statistics?
• Statistics is a science that deals with collecting and
analyzing data, drawing conclusions, and making
decisions.
• There are two main areas of Statistics:
– Descriptive statistics:
provides tabular and graphical techniques and
numerical measures for describing data.
– Inferential statistics:
provides procedures for analyzing data and
making decisions, using the sample (statistic)
to make inferences about the population (parameter)
Inferential Statistics
• Estimation: the process of approximating the population
characteristic by a value or range of values e.g.
– Estimate the population mean weight using the sample
mean weight
– Estimate the proportion of obese persons in the
population
• Hypothesis testing: the process of assessing the evidence
against a claim about the population characteristics e.g.
– Determine if the mean weight of males is greater than that
of females
– Determine if obese persons are more likely to be diabetic
as compared to normal weight persons
Factors influencing the sample size
• Research question / Objectives
• Variables in the research question and their type
• Statistical procedure:
– Hypothesis testing: prevalence difference, or mean difference and SD
– Estimation: prevalence, or SD
Point and Interval Estimates
• A point estimate is a single number.
• A confidence interval provides additional
information about variability.
[Diagram: Lower Confidence Limit (LCL), Point Estimate, Upper Confidence Limit (UCL); the distance from LCL to UCL is the width of the confidence interval]
Confidence Interval Estimate
• An interval gives a range of values:
– Takes into consideration variation in sample
statistics from sample to sample
– Based on observations from one sample
– Gives information about closeness to unknown
population parameters
– Stated in terms of level of confidence
• Never 100% sure
• The general formula for confidence intervals is:
Point estimate ± (critical value) × (standard error)
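As an illustration of the general form, point estimate ± (critical value) × (standard error), here is a minimal sketch for a proportion. It assumes the usual normal-approximation (Wald) interval; the function name and the counts (60 obese persons out of 200) are hypothetical and not taken from the slides.

```python
import math
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Return (LCL, point estimate, UCL) for a sample proportion."""
    p_hat = successes / n                                # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value
    se = math.sqrt(p_hat * (1 - p_hat) / n)              # standard error
    margin = z * se                                      # half of the interval width
    return p_hat - margin, p_hat, p_hat + margin

# Hypothetical data: 60 obese persons in a sample of 200.
lcl, estimate, ucl = proportion_ci(60, 200)
print(f"estimate = {estimate:.3f}, 95% CI = ({lcl:.3f}, {ucl:.3f})")
```

The margin of error here is half of the interval width, which is the quantity fixed in advance when a sample size is calculated.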
Confidence Level    Confidence Coefficient, 1 − α    z value, z_{α/2}
80%                 .80                              1.28
90%                 .90                              1.645
95%                 .95                              1.96
98%                 .98                              2.33
99%                 .99                              2.576
99.8%               .998                             3.09
99.9%               .999                             3.29
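The z values for any confidence level can be recomputed from the standard normal distribution. A minimal sketch using only the Python standard library (the function name `z_critical` is an illustrative choice):

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical value z_{alpha/2} for a given confidence coefficient."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  z = {z_critical(level):.3f}")
```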
Estimation vs Hypothesis testing
Estimation / Confidence intervals
• When the primary outcome is categorical
Parameter                                                  Value
Confidence level                                           95%
Prevalence                                                 From the literature review or a pilot study, or 50%
Margin of error (precision; half of the interval width)    Usually 5% or 10%
N (sample size)                                            ?
• When the primary outcome is numerical
Parameter                                                  Value
Confidence level                                           95%
Standard deviation                                         From the literature review or a pilot study, or (Range / 4)
Margin of error (precision; half of the interval width)    Chosen in the units of the outcome
N (sample size)                                            ?
Sample size calculation
• Categorical variables:
– Expected Prevalence / Proportion
should be known
– If Not Known …
• Take 0.5 or 50% as the expected value
• This gives the maximum variance
• So it will give the sample size based on the
maximum variance possible
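A quick numerical check of why 50% is the conservative choice: the variance term p(1 − p) is largest at p = 0.5. The grid of candidate values below is an illustrative choice.

```python
def variance_term(p: float) -> float:
    """The p * q term that drives the sample size for a proportion."""
    return p * (1 - p)

# Candidate prevalences from 1% to 99% in steps of 1%.
candidates = [i / 100 for i in range(1, 100)]
worst_case = max(candidates, key=variance_term)

print(worst_case, variance_term(worst_case))
```

Any other expected prevalence gives a smaller p(1 − p), and therefore a smaller required sample size for the same margin of error.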
Sample size calculation (confidence interval)
$$n = \frac{(z_{\alpha/2})^2 \, p \, q}{E^2}$$
• z_{α/2} = 1.96 for a two-sided 95% confidence level
• p is the expected population prevalence
• q = 1 − p
• E (margin of error) is half of the interval width
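The formula n = (z_{α/2})² p q / E² can be sketched in a few lines of Python. The function name and the choice to round up to the next whole participant are assumptions made for this illustration; no finite-population or non-response adjustment is applied.

```python
import math
from statistics import NormalDist

def sample_size_proportion(p: float, margin: float, confidence: float = 0.95) -> int:
    """Sample size for estimating a proportion p within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 for 95%
    q = 1 - p
    return math.ceil(z ** 2 * p * q / margin ** 2)

# Unknown prevalence (take 50%), 95% confidence level.
n_5pct = sample_size_proportion(0.5, 0.05)
n_10pct = sample_size_proportion(0.5, 0.10)
print(n_5pct, n_10pct)
```

Halving the margin of error roughly quadruples the required sample size, because E enters the formula squared.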
Sample size calculation
• Numerical variables:
– Standard deviation (SD) or the expected
Variance of the population should be known
– If Not Known …
• Estimate the maximum possible range for
the outcome variable e.g. 110 to 190 mmHg
for systolic blood pressure
• So range is 190 – 110 = 80
• Divide the maximum range by 4 to get an
estimated value of the SD, e.g. 80 / 4 = 20
Normal Distribution Curve
Sample size calculation (confidence interval)
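$$n = \left(\frac{t_{\alpha/2} \, \sigma}{E}\right)^2$$
• t_{α/2} is the critical value for the chosen confidence level
• σ is the standard deviation
• E (margin of error) is half of the interval width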
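A minimal sketch of the calculation for a numerical outcome, n = (critical value × SD / E)², using the systolic blood pressure range from the earlier slide. Two assumptions are made here: the t critical value is replaced by the z value (a common large-sample approximation, since the t value itself depends on n), and the margin of error of 5 mmHg is a hypothetical choice.

```python
import math
from statistics import NormalDist

def sample_size_mean(sd: float, margin: float, confidence: float = 0.95) -> int:
    """Sample size for estimating a mean within +/- margin (z approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sd / margin) ** 2)

# SD unknown: estimate it as (maximum range) / 4.
sd_estimate = (190 - 110) / 4      # 20 mmHg
n = sample_size_mean(sd_estimate, margin=5)   # hypothetical margin of 5 mmHg
print(sd_estimate, n)
```

Because the z value is slightly smaller than the corresponding t value, this approximation can understate n a little for small samples.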
Hypothesis testing
• When the primary outcome is categorical
Parameter                                   Value
Significance level                          0.05
Prevalence difference                       From the literature review or a pilot study
Power of the test                           Usually 80%
N (sample size)                             ?
• When the primary outcome is numerical
Parameter                                   Value
Significance level                          0.05
Mean difference and standard deviation      From the literature or a pilot study
Power of the test                           Usually 80%
N (sample size)                             ?
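The slides list the inputs but not the formulas for hypothesis testing. As a hedged illustration, the sketch below uses the standard normal-approximation formulas for comparing two independent groups of equal size; these formulas, the function names, and the example inputs (prevalences of 30% vs 20%, a mean difference of 10 with SD 20) are assumptions for this example, not taken from the slides.

```python
import math
from statistics import NormalDist

def _z_sum_squared(alpha: float, power: float) -> float:
    """(z_{alpha/2} + z_beta)^2 for a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2

def n_two_proportions(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size to detect a difference between two prevalences."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(_z_sum_squared(alpha, power) * variance / (p1 - p2) ** 2)

def n_two_means(sd: float, difference: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size to detect a difference between two means."""
    return math.ceil(2 * _z_sum_squared(alpha, power) * sd ** 2 / difference ** 2)

n_categorical = n_two_proportions(0.30, 0.20)   # hypothetical prevalences
n_numerical = n_two_means(sd=20, difference=10)  # hypothetical SD and difference
print(n_categorical, n_numerical)
```

Both results are per group, so the total sample size is twice the value returned. Dedicated calculators may apply continuity or t-distribution corrections and return slightly larger numbers.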
Types of errors in hypothesis testing
Types of errors
• The investigator chooses α prior to the study
(common values are α = 0.05 or 0.01)
• The probability of committing a type I error (rejecting the
null hypothesis when it is true) is called α (the level of
significance)
• The probability of committing a type II error (failing to reject
the null hypothesis when it is false) is called β
• Power is (1 − β)
• If β = 0.20 the power is 0.80:
20% is set as the maximum chance of missing an
association if it exists (incorrectly finding no
association)
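To make the relationship between sample size, β and power concrete, here is a minimal sketch of the approximate power of a two-sided, two-group comparison of means. It assumes a normal approximation with a known common SD and ignores the negligible chance of rejecting in the wrong direction; the function name and the inputs (SD 20, true difference 10) are hypothetical.

```python
import math
from statistics import NormalDist

def power_two_means(n_per_group: int, sd: float, difference: float, alpha: float = 0.05) -> float:
    """Approximate power (1 - beta) to detect a true mean difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * math.sqrt(2 / n_per_group)   # standard error of the difference
    return NormalDist().cdf(abs(difference) / se - z_alpha)

for n in (30, 62, 63):
    power = power_two_means(n, sd=20, difference=10)
    print(f"n per group = {n}: power = {power:.3f}, beta = {1 - power:.3f}")
```

With these inputs, 63 participants per group is the smallest size that keeps β at or below 0.20, while 30 per group would miss a real difference about half of the time.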
Sample size calculators
• Websites:
– http://www.raosoft.com/samplesize.html
– http://www.stat.uiowa.edu/~rlenth/Power/
– http://www.openepi.com/SampleSize/SSPropor.htm
– http://www.powerandsamplesize.com/
• Software: PASS from http://www.ncss.com/
Thank You
Questions?