
Special Lecture for First Year Medical Students
1. Define sampling
2. Enumerate the advantages of sampling
3. Enumerate and discuss the uses of sampling
4. Define some terms used in sampling
a. Population and sample
b. Target population and sampling population
c. Sampling unit and elementary unit
d. Sampling frame
5. Enumerate and discuss the two types of sampling designs
6. Enumerate the criteria of a good sampling design
Studying the entire population is:
- not feasible most of the time
- difficult
- expensive
Sampling
- the act of studying or examining only a segment of the population to represent the whole
- whatever findings we get for this segment of the population, we generalize to the total population
1. Cheaper
2. Faster
- shorter time spent for data collection and
processing
3. Better quality of information
- smaller number of data collectors who can be
trained more rigidly and supervised closely.
4. More comprehensive data may be obtained
- Detailed questions on a specific topic
5. The only possible method when the procedure is destructive
1. Evaluating the health status of a population
- estimating the magnitude or the extent of various health problems
or conditions
2. Investigating the factors affecting health
- identify the risk factors for given diseases or the determinants of
certain conditions or practices
3. Evaluating the effectiveness of health measures
4. Assessing specific aspects in the administration of
health services
- concerns regarding availability, accessibility and quality of health
services rendered in a given area
5. Evaluating the reliability and completeness of record
systems like the vital registration system and hospital
records
Population
- the entire group of individuals or items of interest in the study
Sample
- a subset or segment of the population
Target population
- the group from which representative information is desired and to which inferences will be made
Sampling population
- the population from which a sample will actually be taken
Sampling unit
- the units that are chosen in selecting the sample
- may be made up of a non-overlapping collection of elementary units
Elementary unit
- an object or a person on which a measurement is actually taken or an observation is made
Sampling frame
- a collection of all the sampling units
1. Probability sampling design
- specifies the rules and procedures for selecting the sample and estimating the parameters
- each unit in the population has a known, non-zero chance of being included in the sample
2. Non-probability sampling design
- the probability of each member of the population to be
selected in the sample is difficult to determine or
cannot be specified.
- standard errors cannot be computed and methods of
statistical inference cannot be applied
- best used only for descriptive purposes, not for generalizations or inferences about the target population
1. Judgmental or purposive sampling
- the most common type of non-probability
sampling
- a “representative” sample of the population is
selected based on an expert’s subjective
judgment or on some pre-specified criteria.

2. Accidental or haphazard sampling
- the researcher uses whatever items come to hand or whoever is available
3. Quota sampling
- Interviewers may be given instructions to
keep on interviewing household heads in a
given place until the pre-specified quota for
that place has been reached.
4. Snowball sampling technique
- used when studying “hidden populations” such as drug users and prostitutes
- the first person identified to be a member of
the target population will be interviewed for
the study and will be asked to identify other
members of the population
1. Simple random sampling (SRS)
- The most basic type of sampling design
- every element in the population has an equal chance of
being included in the sample
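- use a table of random numbers or computer software
Example: to select 30 study subjects from a population of 100, number the units from 001 to 100, then read successive three-digit numbers from the table of random numbers, skipping repeats and numbers outside 001 to 100, until 30 units are chosen.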
64249 63664 39652 40646
26538 44249 04050 48174
05845 00512 78630 55328
74897 68373 67359 51014
20872 54570 35017 88132
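The "computer software" route for simple random sampling can be sketched as follows. This is a minimal illustration, assuming a population of 100 numbered units and 30 study subjects as in the example; the function name and the seed value are hypothetical and not part of the lecture.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw unit numbers 1..population_size without replacement.

    Every unit has the same chance of selection, which is the defining
    property of simple random sampling.
    """
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    units = range(1, population_size + 1)
    return sorted(rng.sample(units, sample_size))

# Assumed setting from the slide: 100 numbered units, 30 study subjects.
chosen = simple_random_sample(100, 30, seed=2024)
print(chosen)
```

Sampling without replacement plays the same role as skipping repeated numbers when reading a printed random number table.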
2. Systematic sampling
- the sampling interval, k, is first determined
- k is the ratio of the population size (N) to the sample size (n): k = N/n
Example: get a sample of 10 households from a population of 80
k = 80 / 10 = 8
One household will be selected for every 8 households in the population, with the households numbered 1 to 80:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27 28 29 30 ... 80
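A short sketch of the household example may help. It assumes the usual convention of picking a random start between 1 and k and then taking every k-th unit; the random start is not stated on the slide, and the function name is hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Select every k-th unit after a random start, with k = N // n."""
    k = population_size // sample_size      # sampling interval
    rng = random.Random(seed)
    start = rng.randint(1, k)               # assumed: random start in 1..k
    return [start + i * k for i in range(sample_size)]

# Slide example: N = 80 households, n = 10, so k = 8.
households = systematic_sample(80, 10, seed=1)
print(households)
```

Whatever the start, consecutive selected households are exactly 8 apart and the last one never exceeds 80.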
3. Stratified random sampling
- The population is first divided into non-overlapping
groups called “strata”
- a simple random sample is then selected from each
stratum

Example: getting 100 UST medical students as a sample

Year level   Students   Computation         Sample
1st year     550        (550/1630) x 100    34
2nd year     400        (400/1630) x 100    25
3rd year     380        (380/1630) x 100    23
4th year     300        (300/1630) x 100    18
Total        1,630                          100
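The allocation in the table is proportional to stratum size. A minimal sketch of that arithmetic, with a hypothetical helper name, is:

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Allocate the sample to strata in proportion to stratum size."""
    population = sum(stratum_sizes.values())
    return {
        name: round(size / population * total_sample)
        for name, size in stratum_sizes.items()
    }

year_levels = {"1st year": 550, "2nd year": 400, "3rd year": 380, "4th year": 300}
allocation = proportional_allocation(year_levels, 100)
print(allocation)
```

Simple rounding happens to sum to 100 here; with other stratum sizes the rounded counts may need a small manual adjustment to hit the target exactly.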
4. Cluster sampling
- the population is first divided into clusters, which serve as the sampling units, and a sample of these units is selected
- all or only some of the elements found in each selected cluster may be included in the study

Example: 360 students are needed in the study. School classes serve as the clusters (50 classes), and each class has 40 students.
How many clusters are needed?
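One way to answer the question is sketched below, assuming every student in a selected class is included and ignoring any design-effect adjustment (neither point is settled by the slide). The function name is illustrative.

```python
import math

def clusters_needed(required_subjects, cluster_size):
    """Number of whole clusters needed to reach the required sample."""
    return math.ceil(required_subjects / cluster_size)

# Slide example: 360 students needed, 40 students per class.
n_clusters = clusters_needed(360, 40)
print(n_clusters)
```

Under these assumptions the required classes would then be drawn at random from the 50 available classes.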
5. Multi-stage sampling
- The population is first divided into a set of primary or first-stage sampling units. A sample of such units is selected.
- Each primary sampling unit included in the sample is further subdivided into secondary or second-stage sampling units, from which a sample will again be taken.
- The procedure continues until the desired stage is reached.
EXAMPLE:
- A nationwide study of drug abuse among teenagers
- Regions (15): select 6
- Provinces: select 2 per region = 12 provinces
- High schools: select 4 per province = 48 high schools
- Students: 50 per school = 2,400 students
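The running totals in the example can be checked with a few lines of code. This sketch assumes the counts are "per unit selected at the previous stage" (2 provinces per region, 4 high schools per province, 50 students per school), which is the reading consistent with the totals shown.

```python
# (stage name, number selected per unit of the previous stage)
stages = [
    ("regions", 6),
    ("provinces", 2),
    ("high schools", 4),
    ("students", 50),
]

totals = {}
running_total = 1
for name, per_parent in stages:
    running_total *= per_parent   # units accumulated at this stage
    totals[name] = running_total

print(totals)
```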
1. The sample to be obtained should be
representative of the population
2. The sample size should be adequate
3. The sampling procedure must be practical and
feasible.
4. The sampling design must be economical and efficient (cost-effective: it must give the most information at the smallest cost)
- Sample size is usually the most important factor determining the time and funding necessary to perform the research
- It has a profound impact on the likelihood of finding statistical significance
- An inadequate sample size may sometimes explain why apparently useful clinical studies are not statistically significant
- The specific formulas applied in the computation of sample size are varied and numerous, and depend mainly on the nature of the study and the sampling design used
1. Direct calculation using the formula
2. Determination of sample size using specific tables
3. Using computer software (Epi Info)
1. Study design used
- There is a formula for sample size estimation
corresponding to each type of sampling and study design
- In general, cluster sampling designs require larger
samples than simple random sampling designs
- Longitudinal studies also require larger samples than cross-sectional studies
2. Magnitude of the parameter being estimated
- The rarer the condition being investigated, the larger the
necessary sample size
3. Variability of the parameter being estimated
- The more heterogeneous the parameter is in the
population, the larger the sample size that is necessary

4. Level of precision desired
- This is usually expressed in terms of the maximum permissible error that is desired
- The lower the desired error, the larger the sample size
5. Data analysis plan
- Multivariate data analysis generally requires larger samples than univariate analysis
1. How much money is available to do the work ?
2. How many people can be involved in the different
phases of the study ?
3. How fast can they work ?
4. How much time is available to finish the project ?

All of these factors, both statistical and practical, are very important considerations in deciding on the number of samples to be included in a given study.
1. Whether the research design involves paired data or unpaired data
2. Whether the investigator anticipates a large or small variance in the variable of interest
- The larger the variance (s^2), the larger the sample size must be
3. Whether the investigator wishes to consider beta (type II or false-negative) errors in addition to alpha (type I or false-positive) errors
4. Whether the investigator chooses the usual alpha level (p value of 0.05 or confidence interval of 95%) or a smaller level of alpha
- Decreasing the probability of being wrong from 5% to 1% requires a considerably larger sample, on the order of 1.5 to 2 times as large
5. Whether the alpha chosen is one-sided or two-sided
6. Whether the investigator wants to be able to detect a fairly small or an extremely small difference between the means or proportions of the outcome variable
- To have considerable confidence that a mean difference shown in a study is real, the analysis must produce a small p-value for the observed mean difference, which implies that the value of t or z was large
Study characteristic: Assumptions made by investigator
Type of study: Randomized controlled trial of a drug to reduce 5-year mortality in patients with a particular form of cancer
Data sets: Observations in one experimental group (E) and one control group (C) of the same size
Variable: Success = 5-year survival after treatment (expected to be 0.6 in the experimental group and 0.5 in the control group); failure = death within 5 years of treatment
Losses to follow-up: None
Variance, expressed as p(1-p): p = 0.55; (1-p) = 0.45
Data for alpha (z_alpha): p = 0.05; 95% confidence desired (two-tailed test); z_alpha = 1.96
Data for beta (z_beta): 20% beta error; 80% power desired (one-tailed test); z_beta = 0.84
Difference to be detected (d): 0.1 difference between the success (survival) of the experimental group and that of the control group (i.e., 10% difference, because pE = 0.6 and pC = 0.5)
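The assumptions above can be turned into a number with a direct calculation. The slides do not show the formula, so this sketch assumes the common normal-approximation expression for comparing two proportions, N per group = (z_alpha + z_beta)^2 x 2 x p(1-p) / d^2; the function name is hypothetical.

```python
import math

def n_per_group(z_alpha, z_beta, p_bar, d):
    """Approximate sample size per group for a difference in proportions.

    Assumed formula: (z_alpha + z_beta)^2 * 2 * p(1 - p) / d^2,
    where p is the mean of the two expected proportions.
    """
    variance = p_bar * (1 - p_bar)
    return (z_alpha + z_beta) ** 2 * 2 * variance / d ** 2

# Values taken from the table of assumptions above.
n = n_per_group(z_alpha=1.96, z_beta=0.84, p_bar=0.55, d=0.10)
print(round(n, 2), math.ceil(n))
```

Under this assumed formula the result is a little over 388 per group, which would be rounded up to a whole number of participants and doubled for the total trial size.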
1. State the null hypothesis and either a one- or two-tailed alternative hypothesis
2. Select the appropriate statistical test from the table based on the type of predictor variable and outcome variable in those hypotheses
3. Choose a reasonable effect size (and variability, if necessary)
4. Set α and β (if the alternative hypothesis is one-tailed, use a one-tailed α; otherwise, use a two-tailed α)
5. Use the appropriate table or formula to estimate the sample size
Predictor variable   Outcome variable: Dichotomous   Outcome variable: Continuous
Dichotomous          Z statistic                     t test
Continuous           t test                          Correlation coefficient
One-tailed α =      0.005               0.025               0.05
Two-tailed α =      0.01                0.05                0.10
E/S*   β =    0.05  0.10  0.20    0.05  0.10  0.20    0.05  0.10  0.20
0.10          3563  2977  2337    2599  2102  1570    2165  1713  1237
0.15          1584  1323  1038    1155   934   698     962   762   550
0.20           891   744   584     650   526   393     541   428   309
0.25           570   476   374     416   336   261     346   274   198
0.30           396   331   260     289   234   174     241   190   137
0.40           223   186   146     162   131    98     135   107    77
0.50           143   119    93     104    84    63      87    69    49
0.60            99    83    65      72    58    44      60    48    34
0.70            73    61    48      53    43    32      44    35    25
0.80            56    47    36      41    33    25      34    27    19
0.90            44    37    29      32    26    19      27    21    15
1.00            36    30    23      26    21    16      22    17    12
* E/S is the standardized effect size, computed as E (expected effect size) divided by S (standard deviation of the outcome variable).
Upper number: α = 0.05 (one-tailed) or α = 0.10 (two-tailed); β = 0.20
Middle number: α = 0.025 (one-tailed) or α = 0.05 (two-tailed); β = 0.20
Lower number: α = 0.025 (one-tailed) or α = 0.05 (two-tailed); β = 0.10
Rows: smaller of P1 and P2*. Columns: expected difference between P1 and P2.

0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
.05 342 110 59 38 27 21 17 13 11 9


434 140 75 49 35 27 21 17 14 12
581 187 100 65 46 35 28 22 19 15

.10 539 156 78 48 33 25 19 15 12 10


685 199 99 62 43 31 24 19 16 13
916 266 133 82 56 42 32 25 21 17

.15 712 197 95 57 38 28 21 16 13 11


904 250 120 72 49 35 27 21 17 14
1210 334 161 96 65 47 35 28 22 19

.20 860 231 108 64 42 30 23 17 14 11


1093 293 138 81 54 38 29 22 18 14
1462 392 184 108 72 51 38 29 23 19
.25 984 258 119 69 45 32 24 18 14 11


1249 328 152 88 58 41 30 23 18 14
1672 439 203 117 77 54 40 30 24 19

.30 1083 280 128 73 47 33 24 18 14 11


1375 356 162 93 60 42 31 23 18 14
1840 476 217 124 80 56 41 31 24 19

.35 1157 295 133 75 48 33 24 18 14 11


1469 375 169 96 61 42 31 23 18 14
1966 502 226 128 82 56 41 30 23 18

.40 1206 305 136 76 48 33 24 17 13 10


1532 387 173 97 61 42 30 22 17 13
2050 518 231 129 82 56 40 29 22 17
.45 1231 308 136 75 47 32 23 16 12 9


1563 391 173 96 60 41 29 21 16 12
2092 523 231 128 80 54 38 28 21 15

.50 1231 305 133 73 45 30 21 15 11 -


1563 387 169 93 58 38 27 19 14 -
2092 518 226 124 77 51 35 25 19 -

.55 1206 295 128 69 42 28 19 13 - -


1532 375 162 88 54 35 24 17 - -
2050 502 217 117 72 47 32 22 - -

.60 1157 280 119 64 38 25 17 - - -


1469 356 152 81 49 31 21 - - -
1966 476 203 108 65 42 28 - - -
.65 1083 258 108 57 33 21 - - - -


1375 328 138 72 43 27 - - - -
1841 439 184 96 56 35 - - - -

.70 984 231 95 48 27 - - - - -


1249 293 120 62 35 - - - - -
1672 392 161 82 46 - - - - -

.75 860 197 78 38 - - - - - -


1093 250 99 49 - - - - - -
1462 334 133 65 - - - - - -

.80 712 156 59 - - - - - - -


904 199 75 - - - - - - -
1210 266 100 - - - - - - -
.85 539 110 - - - - - - - -
685 140 - - - - - - - -
916 187 - - - - - - - -

.90 342 - - - - - - - - -
434 - - - - - - - - -
581 - - - - - - - - -

* P1 represents the proportion of subjects expected to have the outcome in one group; P2, in the other group. (In a case-control study, P1 represents the proportion of cases with the predictor variable; P2, the proportion of controls with the predictor variable.)
One-tailed α = 0.005 0.025 0.05
Two-tailed α = 0.01 0.05 0.10
r    β = 0.05 0.10 0.20 0.05 0.10 0.20 0.05 0.10 0.20
0.05 7118 5947 4663 5193 4200 3134 4325 3424 2469
0.10 1773 1481 1162 1294 1047 782 1078 854 616
0.15 783 655 514 572 463 346 477 378 273

0.20 436 365 287 319 259 194 266 211 153
0.25 276 231 182 202 164 123 169 134 98
0.30 189 158 125 139 113 85 116 92 67

0.35 136 114 90 100 82 62 84 67 49


0.40 102 86 68 75 62 47 63 51 37
0.45 79 66 53 58 48 36 49 39 29

0.50 62 52 42 46 38 29 39 31 23
0.60 40 34 27 30 25 19 26 21 16
0.70 27 23 19 20 17 13 17 14 11
0.80 18 15 13 14 12 9 12 10 8
Confidence Level
W/S * 90% 95% 99%
0.10 1083 1537 2665
0.15 482 683 1180
0.20 271 385 664

0.25 174 246 425


0.30 121 171 295
0.35 89 126 217

0.40 68 97 166
0.50 44 62 107
0.60 31 43 74

.70 23 32 55
0.80 17 25 42
0.90 14 19 33
1.00 11 16 27
• W/S is the standardized width of the confidence interval, computed as W (desired total width) divided by S (standard
deviation of the variable)
Upper number: 90% confidence level
Middle number: 95% confidence level
Lower number: 99% confidence level
Rows: expected proportion (P). Columns: total width of confidence interval (W).
0.10 0.15 0.20 0.25 0.30
0.10 98 - - - -
139 - - - -
239 - - - -

0.15 138 62 - - -
196 88 - - -
339 151 - - -

0.20 174 77 43 - -
246 110 62 - -
425 189 107 - -

0.25 203 91 51 33 -
289 128 73 47 -
498 221 125 80 -
0.30 228 101 57 37 26
323 144 81 52 36
558 248 139 90 62

0.40 260 116 65 42 29


369 164 93 60 41
638 283 160 102 71

0.50 271 121 68 44 31


384 171 96 62 43
664 294 166 107 74
Problem: The research question is to compare the efficacy of metaproterenol and theophylline in the treatment of asthma. The outcome variable is FEV1 (forced expiratory volume in 1 second) 1 hour after treatment. A previous study has reported that the mean FEV1 in persons with treated asthma was 2.0 liters, with a standard deviation of 1.0 liter. The investigator would like to be able to detect a difference of 10% or more in mean FEV1 between the two treatment groups. How many patients are required in each group (metaproterenol and theophylline) at α (two-tailed) = 0.05 and power = 0.80?
1. Ho: Mean FEV1 at 1 hour after treatment is the same in asthmatics treated with theophylline as in those treated with metaproterenol.
2. HA: Mean FEV1 at 1 hour after treatment is different in asthmatics treated with theophylline than in those treated with metaproterenol.
3. Effect size = 0.2 liters (10% x 2.0 liters)
4. Standardized effect size = effect size ÷ standard deviation = 0.2 liters ÷ 1.0 liter = 0.2
5. α (two-tailed) = 0.05; β = 1 – 0.80 = 0.20
Answer: 393 patients are required per group
Problem: The research question is whether serum cholesterol level is associated with stroke. The mean serum cholesterol level in controls without stroke is about 200 mg/dl, with a standard deviation of about 20 mg/dl. A few previous studies have detected a difference of about +10 mg/dl between stroke patients and controls, and other studies have found no difference or even a tendency for serum cholesterol to be lower in stroke patients. How many cases and controls will be needed, at α (two-tailed) = 0.05 and β = 0.10, to detect a difference of 10 mg/dl between the two groups? Why was a two-tailed α used?
1. Ho: There is no difference in mean serum cholesterol level in stroke cases and controls
2. HA: There is a difference in mean serum cholesterol level in stroke cases and controls
3. Effect size = 10 mg/dl
4. Standardized effect size = effect size ÷ standard deviation = 10 mg/dl ÷ 20 mg/dl = 0.5 (This assumes that the standard deviation of serum cholesterol level is the same in patients with and without stroke)
5. α (two-tailed) = 0.05; β = 0.10
Answer: 84 cases and 84 controls
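The two t-test answers above come from the table, but they can be roughly cross-checked by direct calculation. This sketch assumes the normal-approximation formula N per group = 2 (z_alpha + z_beta)^2 / (E/S)^2, which is not stated in the slides and is not necessarily how the table was built; the function name is hypothetical.

```python
from statistics import NormalDist

def n_per_group_means(std_effect, alpha, power, two_tailed=True):
    """Approximate per-group sample size for comparing two means.

    Assumed formula: 2 * (z_alpha + z_beta)^2 / (E/S)^2.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 / std_effect ** 2

fev1 = n_per_group_means(0.2, alpha=0.05, power=0.80)         # FEV1 problem
cholesterol = n_per_group_means(0.5, alpha=0.05, power=0.90)  # stroke problem
print(round(fev1, 1), round(cholesterol, 1))
```

Both results land within about one subject of the tabulated answers (393 and 84); small differences are expected because published tables may use slightly different rounding or a t-based correction.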
- The z test can be used to compare the proportion of subjects in each of two groups who have a dichotomous outcome
- In an experiment or cohort study, effect size is specified by the difference between P1, the proportion of subjects expected to have the outcome in one group, and P2, the proportion expected in the other group
- In a case-control study, P1 represents the proportion of cases expected to have a particular risk factor, and P2 represents the proportion of controls expected to have the risk factor
- Often the investigator will have specified the effect size in terms of the relative risk (risk ratio)
- In a cohort study, relative risk = P1 ÷ P2
- In a case-control study, the relative risk must be approximated by the odds ratio (OR)
- The investigator must specify P2 (the proportion of controls exposed to the predictor variable)
- P1 (the proportion of cases exposed to the predictor variable) then equals

P1 = (OR x P2) ÷ (1 – P2 + OR x P2)
Problem:
The research question is whether elderly smokers have a greater incidence of skin cancer than nonsmokers. A review of previous literature suggests that the 5-year incidence of skin cancer is about 0.20 in elderly nonsmokers. At α (one-tailed) = 0.05 and power = 80%, how many smokers and nonsmokers will need to be studied to determine whether the 5-year skin cancer incidence is at least 0.30 in smokers? Why was a one-tailed alternative hypothesis chosen?
1. Ho: The incidence of skin cancer is the same in elderly smokers and nonsmokers
2. HA: The incidence of skin cancer is higher in elderly smokers than nonsmokers
3. P2 (incidence in nonsmokers) = 0.20; P1 (incidence in smokers) = 0.30. The difference between them (P1 – P2) is 0.10
4. α (one-tailed) = 0.05; β = 1 – 0.80 = 0.20
Answer: 231 per group


Problem:
The investigator plans a case-control study of whether a
history of herpes simplex is associated with lip cancer. A
brief pilot study finds that about 30% of persons without lip
cancer have had herpes simplex. The investigator is
interested in detecting, with α (one-tailed) = 0.025 and
power = 90%, whether the odds ratio for lip cancer
associated with herpes simplex infection is 2.5 or more.
How many subjects will be required ?
1. Ho: The proportion of cases of lip cancer with a history
of herpes simplex is the same as the proportion of
controls with a herpes simplex history
2. HA: The proportion of cases of lip cancer with a history
of herpes simplex is greater than the proportion of
controls with a herpes simplex history
3. P2 (proportion of controls expected to have the risk
factor) = 0.30; P1 (proportion of cases expected to have
the risk factor) = OR X P2 ÷ (1 – P2 + OR X P2) = (2.5
X 0.3) ÷ (1 – 0.3 + 2.5 X 0.3) = 0.75 ÷ 1.45 = 0.52.
The difference between P1 and P2 is about 0.20
4. α (one-tailed) = 0.025; β = 1 – 0.90 = 0.10
Answer : 124 per group
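The two proportion problems can be cross-checked in code. The odds-ratio conversion is the one given in the slides; the sample size function assumes the standard normal-approximation formula for two proportions without continuity correction, which the slides do not state. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def p1_from_odds_ratio(odds_ratio, p2):
    """P1 = OR x P2 / (1 - P2 + OR x P2), as given in the lecture."""
    return odds_ratio * p2 / (1 - p2 + odds_ratio * p2)

def n_per_group_proportions(p1, p2, alpha, power, two_tailed=False):
    """Approximate per-group sample size for comparing two proportions.

    Assumed formula (no continuity correction):
    [z_a * sqrt(2 * p * q) + z_b * sqrt(p1*q1 + p2*q2)]^2 / (p1 - p2)^2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    term_null = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    term_alt = z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return (term_null + term_alt) ** 2 / (p1 - p2) ** 2

# Skin cancer problem: P1 = 0.30, P2 = 0.20, one-tailed alpha = 0.05, power 80%.
skin = n_per_group_proportions(0.30, 0.20, alpha=0.05, power=0.80)

# Herpes problem: OR = 2.5 and P2 = 0.30 give P1 of about 0.52; the worked
# answer rounds the difference to 0.20, i.e. P1 = 0.50 in the table lookup.
p1_herpes = p1_from_odds_ratio(2.5, 0.30)
herpes = n_per_group_proportions(0.50, 0.30, alpha=0.025, power=0.90)
print(round(skin, 1), round(p1_herpes, 2), round(herpes, 1))
```

Under these assumptions both results fall within one subject of the tabulated answers (231 and 124 per group).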
- Not commonly used in sample size calculation
- Can be useful when the predictor and outcome variables are both continuous
- The correlation coefficient (r) is a measure of the strength of the linear association between the two variables
- It varies between -1 and +1
- The closer the absolute value of r is to 1, the stronger the association; the closer to 0, the weaker the association
- To estimate sample size for a study that will be analyzed with a correlation coefficient, the investigator must:
1. State the null hypothesis, and decide whether the alternative hypothesis is one- or two-tailed
2. Estimate the effect size as the absolute value of the smallest correlation coefficient (r) that the investigator would like to be able to detect
3. Set α and β
Problem:
The research question is whether urinary cotinine levels (a measure of the intensity of current cigarette smoking) and bone density in smokers are inversely correlated. The investigator believes that smokers with higher cotinine levels will have lower bone densities. A previous study found a modest correlation (r = 0.2) between serum carbon monoxide levels (another measure of cigarette consumption) and bone density; the investigator anticipates that urinary cotinine levels will be at least as well correlated. How many smokers will need to be enrolled at α (one-tailed) = 0.05 and β = 0.10?
1. Ho: There is no correlation between urinary cotinine level and bone density in smokers
2. HA: There is an inverse correlation between urinary cotinine level and bone density in smokers
3. Effect size (r) = |-0.2| = 0.2
4. α (one-tailed) = 0.05 and β = 0.10
Answer: 211 smokers
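The tabulated answer can be approximated by direct calculation. This sketch assumes the usual Fisher z-transformation formula, N = [(z_alpha + z_beta) / C]^2 + 3 with C = 0.5 x ln[(1 + r) / (1 - r)], which the slides do not state; the function name is hypothetical.

```python
from math import log
from statistics import NormalDist

def n_for_correlation(r, alpha, power, two_tailed=False):
    """Approximate total sample size to detect a correlation of size |r|.

    Assumed formula: [(z_alpha + z_beta) / C]^2 + 3,
    where C = 0.5 * ln((1 + |r|) / (1 - |r|)).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    c = 0.5 * log((1 + abs(r)) / (1 - abs(r)))
    return ((z_alpha + z_beta) / c) ** 2 + 3

# Cotinine problem: r = -0.2, one-tailed alpha = 0.05, beta = 0.10 (power 90%).
cotinine = n_for_correlation(-0.2, alpha=0.05, power=0.90)
print(round(cotinine, 1))
```

The result is within one subject of the tabulated 211 smokers.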


- Estimating sample size for descriptive studies is based on somewhat different principles
- Such studies do not have predictor and outcome variables, nor do they compare different groups
- Instead, the investigator calculates descriptive statistics, such as means and proportions, and uses statistical techniques to make inferences about the population
- Descriptive studies commonly report confidence intervals, a range of values about the sample mean or proportion (e.g., 95% or 99%)
- Confidence intervals can be thought of as measures of the precision of sample estimates
- A narrower confidence interval is more precise than a wider one
- An interval with a higher confidence level (99%) is more likely to include the true population value than an interval with a lower one (90%)
- When estimating sample size for descriptive studies, the investigator specifies the desired level and width of the confidence interval
- When the variable of interest is continuous, a confidence interval around the mean value of that variable is often reported
- When the variable of interest is dichotomous, results can be expressed as a confidence interval around the estimated proportion of subjects with one of the values
- To estimate the sample size for a confidence interval around a mean, the investigator must:
1. Estimate the standard deviation of the variable of interest
2. Specify the desired precision (total width) of the interval
3. Select the confidence level for the interval (e.g., 95%, 99%)
Problem:
An investigator seeks to determine the mean birth weight in an urban area with a 99% confidence interval of ± 60 g. A previous study found that the standard deviation of birth weight in a similar city was 600 g.
1. Standard deviation of variable (SD) = 600 g
2. Total width of interval = 120 g (60 g above and 60 g below). Thus the standardized width of interval = total width ÷ SD = 120 g ÷ 600 g = 0.2
3. Confidence level = 99%
Answer: 664
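The table lookup can be reproduced by direct calculation. This sketch assumes the standard formula for the width of a confidence interval around a mean, N = 4 z^2 S^2 / W^2, which the slides do not state explicitly; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_for_mean_ci(sd, total_width, confidence):
    """Sample size for a confidence interval of a given total width.

    Assumed formula: 4 * z^2 * SD^2 / W^2, with z the two-sided
    standard normal value for the chosen confidence level.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 4 * z ** 2 * sd ** 2 / total_width ** 2

# Birth weight problem: SD = 600 g, total width = 120 g, 99% confidence.
birth_weight = n_for_mean_ci(sd=600, total_width=120, confidence=0.99)
print(ceil(birth_weight))
```

Rounding up to a whole number of subjects gives the same answer as the table.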
- To estimate the sample size for a confidence interval around a proportion, the investigator must:
1. Estimate the expected proportion with the variable of interest in the population (if more than half of the population is expected to have the characteristic, plan the sample size based on the proportion expected not to have the characteristic)
2. Specify the desired precision (total width) of the confidence interval
3. Select the confidence level for the interval (e.g., 95%)


Problem:
The investigator wishes to determine the sensitivity of a new diagnostic test for colon cancer. Based on a pilot study, she expects that 80% of patients with colon cancer will have positive tests. How many such patients will be required to estimate a 95% confidence interval for the test’s sensitivity of 0.80 ± 0.05?
1. Expected proportion = 0.20. (Because 0.80 is more than half, sample size is estimated from the proportion expected to have negative results, i.e., 0.20.)
2. Total width = 0.10 (0.05 below and 0.05 above)
3. Confidence level = 95%
Answer: 246 patients


Problem:
Suppose the investigator also wishes to determine the specificity of the test for ruling out colon cancer. She expects that 90% of subjects without colon cancer will have negative tests. How many such patients will be required to estimate a 95% confidence interval for the test’s specificity of 0.90 ± 0.05?
1. Expected proportion = 0.10 (1 – 0.90)
2. Total width = 0.10 (0.05 below and 0.05 above)
3. Confidence level = 95%
Answer: 139
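Both diagnostic-test answers can be reproduced by direct calculation. This sketch assumes the standard formula for the width of a confidence interval around a proportion, N = 4 z^2 P(1 - P) / W^2, which the slides do not state explicitly; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_for_proportion_ci(p, total_width, confidence):
    """Sample size for a confidence interval around a proportion.

    Assumed formula: 4 * z^2 * p * (1 - p) / W^2. Because p * (1 - p)
    is symmetric, using 0.80 or 0.20 gives the same result.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 4 * z ** 2 * p * (1 - p) / total_width ** 2

sensitivity = n_for_proportion_ci(0.80, total_width=0.10, confidence=0.95)
specificity = n_for_proportion_ci(0.90, total_width=0.10, confidence=0.95)
print(ceil(sensitivity), ceil(specificity))
```

Rounding each result up to a whole number of subjects matches the table entries used in the two problems.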
1. Use continuous variables
- This permits smaller sample sizes than dichotomous variables
2. Use more precise variables
- This permits a smaller sample size in both analytic and descriptive studies because it reduces variability
3. Use paired measurements
- This permits a smaller sample size because it reduces the variation of the outcome variable
4. Use a more common outcome
- This permits a smaller sample size than a rare outcome
- www.openepi.com
Test Questions
The nationwide survey on the prevalence of drug and substance abuse among high school students, which was done in 1989, was conducted in the following manner:
All twelve regions in the country were represented in the survey. From each region, two provinces were selected. All schools within each province were stratified into four strata, namely: urban public, urban private, rural public and rural private. From each stratum, a random sample of secondary schools was selected. Within each sample school, classes were stratified according to year level. A random sample of classes was selected from each year level. All students in the sample classes were included in the study.
Determine the following:
1. Target population
2. Sampling population
3. Stratification variables (2)
4. Elementary unit
5. If the level of significance of the test is changed from .05 to .01, will the sample size needed in the study increase or decrease?
1. Quota sampling and purposive sampling are examples of probability procedures.
2. The available sampling frame influences the choice of the sampling procedure.
3. In a cluster sampling procedure, a random sample of groups of elements of the population comprises the sample.
4. Non-probability sampling methods are usually the sampling method of choice for qualitative research.
5. In a sampling procedure, it is ideal that the sampled population be different from the target population.
