# SAMPLE SIZE CALCULATION

Melchor V.G. Frias, IV
Clinical Epidemiology Unit, Angelo King Medical Research Center, De La Salle Health Sciences Institute

## Learning objectives

At the end of this session, learners should be able to:

1. Explain the concept and importance of sample size.
2. Explain and apply the concept of hypothesis testing.
3. Apply sample size formulas for descriptive and analytic studies.
4. Identify the requirements for sample size calculation.
5. Apply OpenEpi/Epi Info for sample size calculation for cross-sectional, cohort, case-control and experimental studies.

## Why calculate sample size?

How many subjects are to be included in the sample? Sample size is calculated:

- for planning purposes;
- for the power of the study (a study with low power has little chance of giving a statistically significant difference, even when a real difference exists);
- for meaningful results (with too small a sample, a non-significant result cannot establish that the intervention has no appreciable effect).

How do we calculate sample size?

- using formulas
- using tables of sample sizes
- using statistical calculators (StatCalc of Epi Info, OpenEpi)

Things to know for sample size calculation:

- type of study: descriptive or analytic?
- proportions or means? usual values?
- amount of deviation from the true value? clinically important difference?
- confidence level? power?
- one-tailed or two-tailed hypotheses?

## Hypothesis testing

The first thing to do when given a claim is to write the claim mathematically (if possible) and decide whether the given claim is the null or the alternative hypothesis.

If the given claim contains equality, or a statement of no change from the given or accepted condition, then it is the null hypothesis. Otherwise, if it represents change, it is the alternative hypothesis.

A hypothesis is a statement about the population.

- null hypothesis (H0): equality
- alternative hypothesis (Ha):
  - two-tailed: not equal
  - one-tailed: one is greater than the other

"He's dead, Jim," said Dr. McCoy to Captain Kirk.

Mr. Spock, as the science officer, is put in charge of statistically determining the correctness of Bones' statement and deciding the fate of the crew member (to vaporize or try to revive).

His first step is to arrive at the hypothesis to be tested. Does the statement represent a change in previous condition?

- Yes, there is change; thus it is the alternative hypothesis, H1.
- No, there is no change; therefore it is the null hypothesis, H0.

The correct answer is that there is change: dead represents a change from the accepted state of alive. The null hypothesis always represents no change. Therefore, the hypotheses are:

- H0: Patient is alive.
- H1: Patient is not alive (dead).

H1 false ) Patient is dead (H0 false .H1 true) .Hypotheses testing  Possible states of nature (Based on H0) Patient is alive (H0 true .

Decisions are something that you have control over. You may make a correct decision or an incorrect decision; whether your decision is correct or in error depends on the state of nature.

Possible decisions (based on H0) / conclusions (based on the claim):

- Reject H0 / "Sufficient evidence to say patient is dead"
- Fail to reject H0 / "Insufficient evidence to say patient is dead"

There are four possibilities that can occur, based on the two possible states of nature and the two decisions we can make.

Statisticians will never accept the null hypothesis; we will fail to reject it. In other words, we'll say that it isn't true, or that we don't have enough evidence to say that it isn't, but we'll never say that it is, because someone else might come along with another sample which shows that it isn't, and we don't want to be wrong.

Statistically speaking:

| Decision | H0 true: patient is alive | H0 false: patient is dead |
|---|---|---|
| Reject H0 | Sufficient evidence of death | Sufficient evidence of death |
| Fail to reject H0 | Insufficient evidence of death | Insufficient evidence of death |

In English (or Klingon?):

| Decision | H0 true | H0 false |
|---|---|---|
| Reject H0 | Vaporize a live person | Vaporize a dead person |
| Fail to reject H0 | Try to revive a live person | Try to revive a dead person |

Were you right?

| Decision | H0 true | H0 false |
|---|---|---|
| Reject H0 | Type I error (α) | Correct assessment |
| Fail to reject H0 | Correct assessment | Type II error (β) |

Which of the two errors is more serious, Type I or Type II?

| Decision | H0 true: patient is alive | H0 false: patient is dead |
|---|---|---|
| Reject H0 | Type I error (α): sufficient evidence of death; vaporize a live person | Correct assessment |
| Fail to reject H0 | Correct assessment | Type II error (β): insufficient evidence of death; try to revive a dead person |

The same two kinds of error arise in diagnosis:

| Diagnosis | Disease actually present: yes | Disease actually present: no |
|---|---|---|
| Disease present | Correct diagnosis | Misdiagnosis |
| Disease absent | Missed diagnosis | Correct diagnosis |

and in a court of law:

| Judgment | Assumption of innocence true | Assumption of innocence false |
|---|---|---|
| Pronounced guilty | Serious error in judgment | Correct judgment |
| Pronounced not guilty | Correct judgment | Error in judgment |

Since Type I is (usually) the more serious error, that is the one we concentrate on. We usually pick α to be very small (0.05 or 0.01). Note: α is not a Type I error; α is the probability of committing a Type I error. Likewise, β is the probability of committing a Type II error.

Conclusions are sentence answers which include whether there is enough evidence or not (based on the decision), the level of significance, and whether the original claim is supported or rejected.

Conclusions are based on the original claim, which may be the null or the alternative hypothesis. The decisions are always based on the null hypothesis.

| Decision | Original claim is H0 ("REJECT") | Original claim is H1 ("SUPPORT") |
|---|---|---|
| Reject H0 ("SUFFICIENT") | There is sufficient evidence at the α level of significance to reject the claim that (insert original claim here). | There is sufficient evidence at the α level of significance to support the claim that (insert original claim here). |
| Fail to reject H0 ("INSUFFICIENT") | There is insufficient evidence at the α level of significance to reject the claim that (insert original claim here). | There is insufficient evidence at the α level of significance to support the claim that (insert original claim here). |

## Definitions

Null hypothesis (H0): Statement of zero or no change.

- If the original claim includes equality (≤, =, or ≥), it is the null hypothesis.
- If the original claim does not include equality (<, ≠, >), then the null hypothesis is the complement of the original claim.
- The null hypothesis always includes the equal sign.
- The decision is based on the null hypothesis.

Alternative hypothesis (H1 or Ha): Statement which is true if the null hypothesis is false. The type of test (left-tailed, right-tailed, or two-tailed) is based on the alternative hypothesis.

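One-tailed (sided) test: the alternative hypothesis states a direction (one is greater than the other), so H0 is rejected only for results in one tail of the distribution.

Two-tailed (sided) test: the alternative hypothesis is "not equal", so H0 is rejected for results in either tail.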

Type I error: Rejecting the null hypothesis when it is true (saying false when true). Usually the more serious error.

Type II error: Failing to reject the null hypothesis when it is false (saying true when false).

α (alpha): probability of committing a Type I error; 1 − α is the confidence level.

β (beta): probability of committing a Type II error; 1 − β is the power of the study, its ability to detect a true difference.

Significance level (α): The probability of rejecting the null hypothesis when it is true. α = 0.05 and α = 0.01 are common; if no level of significance is given, use α = 0.05. The level of significance is the complement of the level of confidence in estimation.
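Because α is a long-run probability, a small simulation can make it concrete. The sketch below is an illustration, not part of the original material: it repeatedly draws samples from a population in which H0 is true (mean 0, known SD 1), runs a two-tailed z-test at level α each time, and reports the fraction of tests that wrongly reject H0. The helper name, the sample size of 30, the number of trials and the seed are arbitrary assumptions.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=30, trials=20_000, seed=1):
    """Fraction of simulated two-tailed z-tests that reject a TRUE H0 (mean = 0)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0 is true by construction
        z = (sum(sample) / n) / (1.0 / n ** 0.5)          # known sigma = 1
        if abs(z) > z_crit:
            rejections += 1                               # each rejection is a Type I error
    return rejections / trials

print(type_i_error_rate())  # should be close to alpha = 0.05
```

Lowering α (for example to 0.01) lowers the simulated rejection rate in proportion, which is exactly what "probability of rejecting H0 when it is true" means.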

Confidence level and power. Usual values: α = 0.05, so 1 − α (confidence level) = 0.95; β = 0.20, so 1 − β (power) = 0.80.

The easiest ways to increase power are to:

- increase the sample size;
- increase the difference (or effect size) to be detected;
- use a larger significance level α (e.g., 10% instead of 5%), i.e., accept a lower confidence level.
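The three levers can be checked numerically. The sketch below approximates the power of a two-tailed test comparing two means with `n_per_group` subjects per group, a common SD and a true difference; it is Dobson's two-means sample size formula solved for power, using the normal approximation and ignoring the negligible chance of rejecting in the wrong direction. The helper name and the example numbers are illustrative assumptions.

```python
from statistics import NormalDist

def power_two_means(n_per_group, sd, diff, alpha=0.05):
    """Approximate power of a two-tailed z-test comparing two means (normal approximation)."""
    z = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5   # standard error of the difference in means
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-tailed critical value
    return z.cdf(abs(diff) / se - z_alpha)

base = power_two_means(84, sd=10, diff=5)
print(round(base, 2))                                          # about 0.90
print(power_two_means(120, sd=10, diff=5) > base)              # larger sample size
print(power_two_means(84, sd=10, diff=7) > base)               # larger difference
print(power_two_means(84, sd=10, diff=5, alpha=0.10) > base)   # larger alpha
```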

Decision: A statement based upon the null hypothesis. It is either "reject the null hypothesis" or "fail to reject the null hypothesis". We will never accept the null hypothesis.

Conclusion: A statement which indicates the level of evidence (sufficient or insufficient), at what level of significance, and whether the original claim is rejected (null) or supported (alternative).

## How do we calculate sample size?

A.J. Dobson's formulas (simple random sample):

- descriptive studies
  - population proportion
  - population mean
- analytic studies
  - comparing two proportions
  - comparing two means

## Sample size for descriptive studies

1. Estimation of a population proportion

$$n = \frac{p(100 - p) \times f(1 - \alpha)}{\Delta^2}$$

where n = computed sample size, p = estimate of the proportion (%), Δ = desired half-width of the confidence interval in percentage points (the interval is p ± Δ), and 1 − α = confidence level.

Table 1. Values of f(1 − α) for various confidence levels 100(1 − α)%

| 1 − α | 0.80 | 0.90 | 0.95 | 0.99 |
|---|---|---|---|---|
| f(1 − α)* | 1.642 | 2.706 | 3.842 | 6.635 |

\* f(1 − α) is the square of the upper ½α point of the standard normal distribution.
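The tabulated values can be regenerated from the standard normal quantile function, which also covers confidence levels not in the table. A minimal sketch using Python's standard-library `statistics.NormalDist` (the helper name `f_confidence` is hypothetical). The results agree with Table 1 to within 0.001; for 95% the exact value is about 3.8415, so the table's 3.842 appears to use z = 1.96.

```python
from statistics import NormalDist

def f_confidence(conf_level):
    """f(1 - alpha): the square of the upper alpha/2 point of the standard normal."""
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z

for level in (0.80, 0.90, 0.95, 0.99):
    print(level, round(f_confidence(level), 3))
```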



Example: A researcher wants to estimate the smoking prevalence in high school students. What is the sample size if the smoking prevalence is expected to be 15% and a 95% confidence interval of ±4 percentage points (11-19%) will be used?

$$n = \frac{15(100 - 15) \times 3.842}{4^2} \approx 306$$
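The same calculation as a small function, a sketch that assumes p and Δ are both given in percentage points and computes f(1 − α) from the normal distribution instead of looking it up (the helper name is illustrative). It gives about 306.1; many texts round sample sizes up to be conservative, which would give 307 here, while the slides report 306.

```python
from statistics import NormalDist

def n_proportion(p_percent, half_width, conf_level=0.95):
    """Dobson's formula for estimating a proportion: n = p(100 - p) f(1 - alpha) / delta^2.

    p_percent and half_width are both in percentage points.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return p_percent * (100 - p_percent) * z * z / half_width ** 2

n = n_proportion(15, 4)   # expected prevalence 15%, 95% CI of 11-19%
print(round(n, 1))        # the slides report 306
```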

2. Estimation of a population mean

$$n = \frac{s^2 \times f(1 - \alpha)}{\Delta^2}$$

where n = computed sample size, s = estimate of the standard deviation of the observations, Δ = desired half-width of the confidence interval (the estimate should fall within ±Δ of the true value), and 1 − α = confidence level.




Example: A researcher wants to estimate the mean serum cholesterol level (mg/100 ml) in a group of men. How many men should be included if he wants to be 90% confident that the estimate of the mean will fall within 10 mg/100 ml of the true value, and the standard deviation is estimated to be 40 mg/100 ml?

$$n = \frac{40^2 \times 2.706}{10^2} \approx 43$$
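A matching sketch for the mean, under the same assumptions as before (f(1 − α) computed from the normal distribution; helper name illustrative). It returns about 43.3; the slides report 43, and rounding up would give 44.

```python
from statistics import NormalDist

def n_mean(sd, half_width, conf_level=0.95):
    """Dobson's formula for estimating a mean: n = s^2 f(1 - alpha) / delta^2."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return sd ** 2 * z * z / half_width ** 2

n = n_mean(sd=40, half_width=10, conf_level=0.90)
print(round(n, 1))   # the slides report 43
```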

## Sample size for analytic studies

1. Hypothesis testing between two proportions

$$n = \frac{[p_1(100 - p_1) + p_2(100 - p_2)] \times f(\alpha, \beta)}{(p_1 - p_2)^2}$$

where n = computed sample size in each group, p1, p2 = expected proportions (%) in the two groups, 1 − α = confidence level, and 1 − β = power of the test.

Table 2. Values of f(α, β)*

| Significance level α | Power 1 − β = 0.5 | 0.8 | 0.9 |
|---|---|---|---|
| one-tailed 0.05 | 2.71 | 6.18 | 8.56 |
| one-tailed 0.01 | 5.41 | 10.04 | 13.02 |
| two-tailed 0.05 | 3.84 | 7.85 | 10.51 |
| two-tailed 0.01 | 6.63 | 11.68 | 14.88 |

\* f(α, β) is the square of the sum of the upper-tail β point and the upper-tail α point (for a one-tailed test) or ½α point (for a two-tailed test) of the standard normal distribution.
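Following the footnote's definition, Table 2 can be rebuilt as (z_α + z_β)², where z_α is the upper α point (one-tailed) or ½α point (two-tailed) and z_β is the upper β point. A minimal sketch with a hypothetical helper name; each printed row should match the corresponding table row to two decimals.

```python
from statistics import NormalDist

def f_alpha_beta(alpha, power, two_tailed=True):
    """f(alpha, beta) = (z_alpha + z_beta)^2, as defined in the footnote to Table 2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)   # upper-tail beta point, since power = 1 - beta
    return (z_alpha + z_beta) ** 2

for two_tailed, alpha in ((False, 0.05), (False, 0.01), (True, 0.05), (True, 0.01)):
    row = [round(f_alpha_beta(alpha, pw, two_tailed), 2) for pw in (0.5, 0.8, 0.9)]
    print("two-tailed" if two_tailed else "one-tailed", alpha, row)
```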



Example: A new antibiotic is to be compared to a standard drug with respect to cure rate of urinary tract infection. The new drug will be considered better than the standard drug if it shows a 5% difference from the standard cure rate of 80% (i.e., 85% versus 80%). How many patients are needed if the investigator wants 90% power and 95% confidence?

Because the question is whether the new drug is better, the test is one-tailed, so f(α, β) = 8.56 (one-tailed α = 0.05, power 0.9):

$$n = \frac{80(100 - 80) + 85(100 - 85)}{(80 - 85)^2} \times 8.56 = \frac{2875}{25} \times 8.56 \approx 984$$

patients per group.
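A sketch of the two-proportion formula with f(α, β) computed rather than looked up (helper name illustrative). Using the unrounded f of about 8.564 instead of the table's 8.56 gives roughly 984.8 per group, i.e. about 985 when rounded up, against the slides' 984. A two-tailed test with the same inputs needs noticeably more patients.

```python
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.90, two_tailed=True):
    """Dobson's formula, per group: [p1(100-p1) + p2(100-p2)] f(alpha, beta) / (p1 - p2)^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    f = (z_alpha + z.inv_cdf(power)) ** 2
    return (p1 * (100 - p1) + p2 * (100 - p2)) * f / (p1 - p2) ** 2

n = n_two_proportions(80, 85, alpha=0.05, power=0.90, two_tailed=False)
print(round(n, 1))                                    # the slides report 984 per group
print(round(n_two_proportions(80, 85, two_tailed=True), 1))
```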

2. Hypothesis testing between two means

$$n = \frac{2s^2 \times f(\alpha, \beta)}{\Delta^2}$$

where n = computed sample size in each group, s = estimate of the standard deviation of the observations, assuming it is the same for each group, Δ = the true difference between the means, 1 − α = confidence level, and 1 − β = power.




Example: To determine whether an antihypertensive therapy can reduce the average blood pressure of some group by 5 mmHg when the standard deviation is 10 mmHg, how many patients are needed for a two-tailed test at the 5% significance level and power of 90%? From Table 2, f(α, β) = 10.51:

$$n = \frac{2(10)^2 \times 10.51}{5^2} \approx 84$$

patients per group.
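The two-means formula as a sketch, again computing f(α, β) from the normal distribution (helper name illustrative). It gives about 84.1 per group, in line with the slides.

```python
from statistics import NormalDist

def n_two_means(sd, diff, alpha=0.05, power=0.90, two_tailed=True):
    """Dobson's formula, per group: 2 s^2 f(alpha, beta) / delta^2, with a common SD s."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    f = (z_alpha + z.inv_cdf(power)) ** 2
    return 2 * sd ** 2 * f / diff ** 2

n = n_two_means(sd=10, diff=5)
print(round(n, 1))   # the slides report 84 per group
```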

Sample size calculation using Epi Info 6: the STATCALC program (http://www.cdc.gov/epiinfo/Epi6/ei6.htm).

Example (two means): To compare two antianemia treatment groups in terms of outcome hemoglobin level, what is the sample size needed if the expected mean hemoglobin level after treatment for group A is 132.86 with a standard deviation of 15.34, and the mean hemoglobin level for group B is 127.44 with a standard deviation of 18.23?
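The example does not state α or power, and the two groups have different standard deviations. A rough sketch, assuming the usual values (two-tailed α = 0.05, power 0.80) and replacing 2s² in Dobson's formula with s₁² + s₂² (a common approach for equal group sizes, not necessarily what StatCalc does, so its answer may differ). The helper name is hypothetical.

```python
from statistics import NormalDist

def n_two_means_unequal_sd(sd1, sd2, diff, alpha=0.05, power=0.80):
    """Per-group n = (s1^2 + s2^2) f(alpha, beta) / delta^2 for a two-tailed test.

    With a common SD s this reduces to Dobson's 2 s^2 f / delta^2.
    """
    z = NormalDist()
    f = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return (sd1 ** 2 + sd2 ** 2) * f / diff ** 2

# Assumed design values: two-tailed alpha = 0.05, power = 0.80 (not stated in the example)
n = n_two_means_unequal_sd(15.34, 18.23, diff=132.86 - 127.44)
print(round(n, 1))
```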


Case-control study

Research question: Is there an association between receiving HRT and development of breast CA among women in Dasmarinas, Cavite?

- Odds of exposure among diseased = 175/75 = 2.3
- Odds of exposure among non-diseased = 25/225 = 0.11
- Odds ratio = 21
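The odds ratio above is the ratio of the two exposure odds. A minimal sketch (hypothetical helper name) using the counts from the example: 175 exposed and 75 unexposed cases, 25 exposed and 225 unexposed controls.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from a case-control 2x2 table: exposure odds in cases / in controls."""
    odds_cases = exposed_cases / unexposed_cases           # 175/75, about 2.3
    odds_controls = exposed_controls / unexposed_controls  # 25/225, about 0.11
    return odds_cases / odds_controls

or_hrt = odds_ratio(175, 75, 25, 225)
print(round(or_hrt, 1))   # 21.0
```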

You need an estimate of the percentage of exposure among the controls and either the odds ratio or the percentage of exposure among the cases.
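If only the control exposure p₀ and the odds ratio are known, the implied case exposure follows from p₁/(1 − p₁) = OR · p₀/(1 − p₀). The sketch below converts and then plugs both percentages into Dobson's two-proportion formula as an approximation; it assumes a two-tailed α = 0.05 and power 0.80, and Epi Info's StatCalc may use a different method. Helper names are hypothetical. With the example's data (10% of controls exposed, OR = 21) the implied case exposure is 70%, matching 175 of 250 cases, and an OR this large needs very few subjects.

```python
from statistics import NormalDist

def case_exposure_from_or(control_exposure, odds_ratio):
    """Exposure proportion among cases implied by control exposure p0 and an odds ratio.

    Solves p1 / (1 - p1) = OR * p0 / (1 - p0) for p1.
    """
    p0 = control_exposure
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))

def n_per_group(p1_pct, p2_pct, alpha=0.05, power=0.80):
    """Dobson's two-proportion formula (percent scale), two-tailed test, per group."""
    z = NormalDist()
    f = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    spread = p1_pct * (100 - p1_pct) + p2_pct * (100 - p2_pct)
    return spread * f / (p1_pct - p2_pct) ** 2

p0 = 25 / 250                       # controls exposed to HRT in the example (25 of 250)
p1 = case_exposure_from_or(p0, 21)  # implied exposure among cases
print(round(100 * p1, 1))           # 70.0, i.e. 175 of 250 cases
print(round(n_per_group(100 * p1, 100 * p0), 1))
```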

Cohort study

Research question: Is Hib vaccine associated with the development of leukemia among children in Dasmarinas, Cavite?

- Incidence of disease among exposed = 150/500 = 0.3
- Incidence of disease among unexposed = 400/500 = 0.8
- Relative risk = 0.375
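The relative risk is the ratio of the two incidences. A minimal sketch with a hypothetical helper name, using the example's counts:

```python
def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Relative risk: incidence among the exposed divided by incidence among the unexposed."""
    risk_exposed = cases_exposed / n_exposed        # 150/500 = 0.3 in the Hib example
    risk_unexposed = cases_unexposed / n_unexposed  # 400/500 = 0.8
    return risk_exposed / risk_unexposed

rr = relative_risk(150, 500, 400, 500)
print(round(rr, 3))   # 0.375
```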

You need to know the percentage of the outcome among the unexposed, and either the OR, the RR, or the percentage of the outcome among the exposed.


Calculate sample size: RCT

Example: Efficacy of flubendazole compared to mebendazole in the treatment of trichuriasis among pediatric patients.

Objective: To compare resolution of trichuriasis for pediatric patients given flubendazole and those given mebendazole.

| | (+) resolution | (−) resolution |
|---|---|---|
| Flubendazole group (exposed) | 75% | |
| Mebendazole group (unexposed) | 50% | |


- 50% with resolution in the mebendazole group
- 75% with resolution in the flubendazole group
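As a rough check on a calculator's output, the two-proportion formula can be applied directly to 75% versus 50%. The slide does not give α or power, so this sketch assumes the usual values (two-tailed α = 0.05, power 0.80); calculators such as OpenEpi may use other formulas (for example, with a continuity correction), so their results can differ. The helper name is illustrative. The result is about 55 patients per group after rounding up.

```python
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.80, two_tailed=True):
    """Dobson's formula, per group: [p1(100-p1) + p2(100-p2)] f(alpha, beta) / (p1 - p2)^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    f = (z_alpha + z.inv_cdf(power)) ** 2
    return (p1 * (100 - p1) + p2 * (100 - p2)) * f / (p1 - p2) ** 2

# Assumed: two-tailed alpha = 0.05, power = 0.80; 75% (flubendazole) vs 50% (mebendazole)
n = n_two_proportions(75, 50)
print(round(n, 1))
```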


## General comments on estimation of sample size

- Compute the sample size as early as possible during the design phase (to estimate the resources required and the feasibility of the study).
- Complex data analysis generally requires larger samples than simple analysis.
- In general, longitudinal studies require a larger sample size than case-control and cross-sectional studies.
- The rarer the condition being investigated, the larger the sample size, all other things being equal.

- The higher the level of accuracy and precision desired for the resulting estimates, the larger the sample size necessary.
- When more than one item or outcome is to be studied, sample sizes are estimated separately for each item. The final sample size will be a compromise between the largest n and the resources available to conduct the study.

## Summary

- Explained the concept/importance of sample size.
- Explained and applied the concept of hypothesis testing.
- Applied sample size formulas for descriptive and analytic studies.
- Identified the requirements for sample size calculation.
- Introduced OpenEpi/Epi Info for application in sample size calculation for cross-sectional, cohort, case-control and experimental studies.

- Statistical inference allows us to generalize sample results to the target population.
- Sample size is based on the:
  - research objectives/design
  - sample estimates, variability from previous studies
  - power, level of confidence
  - operational constraints (time, resources)