
Sampling in Epidemiology

EPIDEMIOLOGY

SimonPeters
Models of Epidemiology
• Host, agent, environment
• Person, place and time
• Exposure-outcome
• Cause-effect
• Observational vs. experimental
• Mathematical and statistical
Models in Epidemiology
• Common features across models

– Sampling

– Measurement
Sampling in Epidemiology
• Why?
– Unable to study all members of a population
– Reduce bias
– Save time and money
– Measurements may be better in sample than in
entire population
– Feasibility
Sampling in Epidemiology
• Definitions
– Sampling unit – the basic unit around which a
sampling procedure is planned
• Person
• Group – household, school, district, etc.
• Component – eye, physiological response
– Sampling frame – list of all of the sampling
units in a population
– Sample – collection of sampling units from the
eligible population
Sampling in Epidemiology
• Probability (random) sampling
– Sampling in which each sampling unit has a
known and nonzero probability of being
included in the sample

• Replacement
– With replacement – sampling unit returned to
population before next sampling event
– Without replacement – sampling unit not
returned to population before next sampling
event
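The difference between the two replacement schemes can be sketched in a few lines of Python. This is only an illustration: the function name `draw_sample` and the unit IDs 1 to 20 are assumptions made for the example, not part of the original material.

```python
import random

def draw_sample(units, n, replace, seed=None):
    """Draw n sampling units from `units`, with or without replacement."""
    rng = random.Random(seed)
    if replace:
        # Every draw is made from the full population, so a unit can recur.
        return rng.choices(units, k=n)
    # Each unit can be selected at most once.
    return rng.sample(units, n)

population = list(range(1, 21))  # hypothetical unit IDs 1..20
with_repl = draw_sample(population, 10, replace=True, seed=1)
without_repl = draw_sample(population, 10, replace=False, seed=1)
print("with replacement:   ", with_repl)
print("without replacement:", without_repl)
```

With replacement, the same unit ID may appear more than once in the output; without replacement, all ten IDs are distinct.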
Sampling in Epidemiology
• Random Sample
– Simple random sample
– Stratified random sample
– Cluster sample
– Adaptive cluster sample
– Multistage sample
• Non-random Sample
– Convenience sample
– Systematic sample
– Consecutive sample
– Quota sample
– Volunteer sample
– Capture-recapture
Sampling in Epidemiology
• Which sampling design is best?

– Choose the method that gives the greatest
degree of accuracy and precision for a given
cost.
Sampling in Epidemiology
• Simple random sampling
– Each sampling unit has an equal chance of
being included in the sample
– In epidemiology, sampling is generally done
without replacement, as this approach allows for
a wider coverage of sampling units and, as a
result, smaller standard errors
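The effect of replacement on the standard error can be made concrete with a standard textbook result, which is added here and not taken from the slides. The symbols are assumptions of this sketch: N is the population size, n the sample size, and σ² the population variance (divisor N) of the measurement. The variance of the sample mean under each scheme is:

```latex
\operatorname{Var}\left(\bar{y}_{\text{with}}\right) = \frac{\sigma^{2}}{n},
\qquad
\operatorname{Var}\left(\bar{y}_{\text{without}}\right) = \frac{\sigma^{2}}{n}\cdot\frac{N-n}{N-1}
```

The factor (N − n)/(N − 1) is at most 1, so the standard error without replacement is never larger than the one with replacement, and the gap grows as the sample covers more of the population.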
Sampling in Epidemiology
• Simple random sampling
– Advantages
• Simple process and easy to understand
• Easy calculation of means and variance
– Disadvantages
• Not most efficient method, that is, not the most
precise estimate for the cost
• Requires knowledge of the complete sampling frame
• Cannot always be certain that there is an equal
chance of selection
• Nonrespondents or refusals
Sampling in Epidemiology
• Simple random sampling
– Estimate hemoglobin levels in patients with sickle cell
anemia
1. Determine sample size
2. Obtain a list of all patients with sickle cell anemia
in a hospital or clinic
3. Patient is the sampling unit
4. Use a table of random numbers to select units
from the sampling frame
5. Measure hemoglobin in all selected patients
6. Calculate mean and standard deviation of sample
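The steps above can be sketched in Python. Everything specific here is an assumption for illustration: the sampling frame of 200 patients, the simulated hemoglobin values, the sample size of 30, and the function name `simple_random_sample`. A pseudo-random number generator stands in for the table of random numbers.

```python
import random
import statistics

def simple_random_sample(frame, n, seed=None):
    """Select n units from the sampling frame, each with equal probability,
    without replacement."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: one record per patient, with made-up
# hemoglobin values (g/dL) generated only so the example can run.
data_rng = random.Random(0)
frame = [
    {"patient_id": i, "hb": round(data_rng.gauss(8.5, 1.2), 1)}
    for i in range(1, 201)
]

sample = simple_random_sample(frame, 30, seed=42)  # steps 1-4
hb_values = [patient["hb"] for patient in sample]  # step 5
mean_hb = statistics.mean(hb_values)               # step 6
sd_hb = statistics.stdev(hb_values)                # sample SD (divisor n - 1)
print(f"mean = {mean_hb:.2f} g/dL, SD = {sd_hb:.2f} g/dL")
```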
Sampling in Epidemiology
• Systematic sampling
– The sampling units are spaced regularly
throughout the sampling frame, e.g., every 3rd
unit would be selected
– May be either a probability sample or a non-probability sample
• Not a probability sample unless the starting point is
randomly selected
• Non-random sample if the starting point is
determined by some other mechanism than chance
Sampling in Epidemiology
• Systematic sample
– Advantages
• Sampling frame does not need to be defined in
advance
• Easier to implement in the field
• If there are unrecognized trends in the sampling
frame, a systematic sample ensures coverage of the
spectrum of units
– Disadvantages
• Variance cannot be estimated unless assumptions
are made
Sampling in Epidemiology
• Systematic sampling
– Estimate HIV prevalence in children born
during a specified period at a hospital
1. Impossible to construct sampling frame in
advance
2. Select a random number between some pre-
specified bounds
3. Beginning with the random number chosen, take
every 5th birth and test for HIV infection
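Because the sampling frame cannot be built in advance, the selection can be sketched as a generator that processes births as they occur. This is a minimal sketch under assumptions: the function name `systematic_sample` is hypothetical, the random start is drawn within the first interval, and births are represented by sequence numbers 1 to 100.

```python
import random

def systematic_sample(stream, interval, seed=None):
    """Yield every `interval`-th unit from `stream`, starting at a random
    position within the first interval."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start: 0 .. interval - 1
    for index, unit in enumerate(stream):
        if index % interval == start:
            yield unit

births = range(1, 101)  # hypothetical birth sequence numbers
selected = list(systematic_sample(births, interval=5, seed=3))
print(selected)
```

The random start is what makes this a probability sample; fixing the start by any other mechanism would make it non-random.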
Sampling in Epidemiology
• Stratified random sample
– The sampling frame comprises groups, or
strata, with certain characteristics
– A sample of units is selected from each group
or stratum
Sampling in Epidemiology
• Stratified random sample
– Advantages
• Assures that certain subgroups are represented in a sample
• Allows investigator to estimate parameters in different strata
• More precise estimates of the parameters because strata are more
homogeneous, i.e., smaller variance within strata
• Strata of interest can be sampled most intensively, e.g., groups with
greatest variance
• Administrative advantages
– Disadvantages
• Loss of precision if only a small number of units is sampled from a stratum
Sampling in Epidemiology
• Stratified random sample
– Assess dietary intake in adolescents
1. Define three age groups: 11-13, 14-16, 17-19
2. Stratify age groups by sex
3. Obtain list of children in this age range from
schools
4. Randomly select children from each of the 6
strata until sample size is obtained
5. Measure dietary intake
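The stratification and selection steps above can be sketched as follows. The school list is a made-up frame of 600 children, and the names `stratum_of` and `stratified_sample`, the record fields, and the choice of 10 children per stratum are all assumptions for illustration.

```python
import random
from collections import defaultdict

AGE_GROUPS = [(11, 13), (14, 16), (17, 19)]  # age strata from step 1

def stratum_of(child):
    """Return the (age group, sex) stratum of a child, or None if out of range."""
    for low, high in AGE_GROUPS:
        if low <= child["age"] <= high:
            return (f"{low}-{high}", child["sex"])
    return None

def stratified_sample(frame, n_per_stratum, seed=None):
    """Draw a simple random sample of up to n_per_stratum units from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for child in frame:
        key = stratum_of(child)
        if key is not None:
            strata[key].append(child)
    return {
        key: rng.sample(members, min(n_per_stratum, len(members)))
        for key, members in sorted(strata.items())
    }

# Hypothetical school list: ages cycle through 11..19, sex alternates by block.
frame = [
    {"id": i, "age": 11 + i % 9, "sex": "F" if (i // 9) % 2 == 0 else "M"}
    for i in range(600)
]
result = stratified_sample(frame, n_per_stratum=10, seed=7)
for key, children in result.items():
    print(key, [c["id"] for c in children])
```

Sampling the same number from every stratum is only one option; allocation proportional to stratum size, or heavier sampling of strata with greater variance, would change `n_per_stratum` per stratum.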
Sampling in Epidemiology
• Cluster sampling
– Clusters of sampling units are first selected
randomly
– Individual sampling units are then selected
from within each cluster
Sampling in Epidemiology
• Cluster sampling
– Advantages
• The entire sampling frame need not be enumerated
in advance, just the clusters once identified
• More economical in terms of resources than simple
random sampling
– Disadvantages
• Loss of precision, i.e., larger variance, but this can be
offset by sampling a larger number of clusters
Sampling in Epidemiology
• Cluster sampling
– Estimate the prevalence of dental caries in
school children
1. Among the schools in the catchment area, list all
of the classrooms in each school
2. Take a simple random sample of classrooms, or
clusters of children
3. Examine all children in a cluster for dental caries
4. Estimate prevalence of caries within clusters,
then combine into an overall estimate, with variance
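A rough sketch of the cluster steps above, under stated assumptions: each classroom is a list of 0/1 caries indicators, the data are simulated, and the name `cluster_prevalence` is hypothetical. The overall estimate is a ratio estimator (total cases over total children examined), and the standard error uses one common between-cluster approximation that ignores the finite population correction. It is not necessarily the method the author had in mind.

```python
import math
import random

def cluster_prevalence(classrooms, n_clusters, seed=None):
    """Sample whole classrooms at random, examine every child in each,
    and return (overall prevalence, approximate standard error)."""
    if n_clusters < 2:
        raise ValueError("at least two clusters are needed to estimate variance")
    rng = random.Random(seed)
    chosen = rng.sample(classrooms, n_clusters)
    cases = [sum(room) for room in chosen]  # children with caries per cluster
    sizes = [len(room) for room in chosen]  # children examined per cluster
    p = sum(cases) / sum(sizes)             # ratio estimator of prevalence
    mean_size = sum(sizes) / n_clusters
    # Between-cluster variability of cases around what p predicts per cluster.
    ss = sum((a - p * m) ** 2 for a, m in zip(cases, sizes))
    variance = ss / (n_clusters * (n_clusters - 1) * mean_size ** 2)
    return p, math.sqrt(variance)

# Simulated classrooms: 40 rooms of 20-30 children, 1 = caries present.
data_rng = random.Random(0)
classrooms = [
    [1 if data_rng.random() < 0.3 else 0 for _ in range(data_rng.randint(20, 30))]
    for _ in range(40)
]
p, se = cluster_prevalence(classrooms, n_clusters=10, seed=5)
print(f"prevalence = {p:.3f}, SE = {se:.3f}")
```

Increasing `n_clusters` shrinks the standard error, which is the sense in which more clusters offset the loss of precision.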
Sampling in Epidemiology
• Multistage sampling
• Similar to cluster sampling except that there
are two sampling events, instead of one
– Primary units are randomly selected
– Individual units within primary units randomly
selected for measurement
Sampling in Epidemiology
• Multistage sampling
– Estimate the prevalence of dental caries in school
children
1. Among the schools in the catchment area, list all of the
classrooms in each school
2. Take a simple random sample of classrooms, or clusters of
children
3. Enumerate the children in each classroom
4. Take a simple random sample of children within the
classroom
5. Examine all selected children for dental caries
6. Estimate prevalence of caries within clusters, then combine
into an overall estimate, with variance
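The two sampling events above can be sketched as follows. The classroom names, child IDs, the stage sizes, and the function name `two_stage_sample` are all hypothetical and chosen only for illustration.

```python
import random

def two_stage_sample(classrooms, n_classrooms, n_children, seed=None):
    """Stage 1: simple random sample of classrooms (primary units).
    Stage 2: simple random sample of children within each chosen classroom."""
    rng = random.Random(seed)
    chosen_rooms = rng.sample(sorted(classrooms), n_classrooms)
    return {
        room: rng.sample(classrooms[room], min(n_children, len(classrooms[room])))
        for room in chosen_rooms
    }

# Hypothetical enumeration: 12 classrooms of 25 children each.
classrooms = {
    f"room-{r:02d}": [f"room-{r:02d}-child-{c:02d}" for c in range(1, 26)]
    for r in range(1, 13)
}
selected = two_stage_sample(classrooms, n_classrooms=4, n_children=8, seed=11)
for room, children in selected.items():
    print(room, children)
```

Only the children returned for each chosen classroom would be examined, which is what distinguishes this design from examining every child in a selected cluster.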
Sampling in Epidemiology
• Convenience sample
– A non-random collection of sampling units
from an undefined sampling frame
• Advantages
– Convenient and easy to perform
• Disadvantages
– No statistical justification for the sample
Sampling in Epidemiology
• Convenience sample
– Case series of patients with a particular
condition at a certain hospital
– “Normal” graduate students walking down the
hall are asked to donate blood for a study
– Children with febrile seizures reporting to an
emergency room
Investigator decides who is enrolled in a study
Sampling in Epidemiology
• Consecutive sample
– A case series of consecutive patients with a condition
of interest
– Consecutive series means ALL patients with the
condition within hospital or clinic, not just the patients
the investigators happen to know about
• Advantages
– Removes investigator from deciding who enters a study
– Requires protocol with definitions of condition of
interest
– Straightforward way to enroll subjects
• Disadvantage
– Non-random
Sampling in Epidemiology
• Consecutive sample
– Outcome of 1000 consecutive patients
presenting to the emergency room with chest
pain
– Natural history of all 125 patients with HIV-
associated TB during 5 year period

Explicit efforts must be made to identify
and recruit ALL persons with the condition
of interest
Sampling in Epidemiology
• Capture-recapture sampling
– A non-random method of sampling that relies
on lists of sampling units obtained from
multiple sources.
– The overlap in the lists allows one to estimate
the number of individuals not ‘captured’
Sampling in Epidemiology
• Uses of this method
– Estimate parameter when incomplete
information is available from two or more sources
– Refine prevalence or incidence estimates
from population surveys
– Assess completeness of event reporting
– Derive plausible upper and lower limits on total
population affected
Sampling in Epidemiology
• Advantages
– Does not require random sample
– Can give more precise estimate of parameter than
probability sample
– Easy to perform in the field
– Useful in estimating events in difficult to access
populations
• Disadvantages
– Analysis of lists may be complicated
– Need to be able to match individuals across lists
– Assumptions regarding probability of being listed by a
source
– Unfamiliar to epidemiologists
Sampling in Epidemiology
• Capture-recapture
– Estimate the number of AIDS cases among IDUs in a
city
1. From hospital and clinic records obtain lists of persons
with diagnosis of HIV/AIDS during study period
2. Determine IDU status
3. Identify people who appear on multiple lists
4. Use nested log-linear models to estimate the number of
IDUs with AIDS not captured by the different lists
5. Use the list of reported cases and estimate of non-
reported cases to obtain overall estimate of the number
of IDUs with AIDS (with confidence intervals)
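The logic of steps 3 to 5 can be illustrated with a much simpler stand-in for the nested log-linear models: the two-source Chapman estimator, a bias-corrected form of the Lincoln-Petersen estimator. This swap is an assumption made for brevity. It also assumes the two lists are independent, the population is closed during the study period, and individuals are matched across lists without error. The ID ranges and the function name `chapman_estimate` are made up.

```python
import math

def chapman_estimate(list_a, list_b):
    """Two-source capture-recapture estimate of total population size,
    using the Chapman estimator with a normal-approximation 95% CI."""
    a, b = set(list_a), set(list_b)
    n1, n2 = len(a), len(b)
    m = len(a & b)         # individuals appearing on both lists
    observed = len(a | b)  # individuals captured by at least one list
    n_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    variance = (
        (n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)
        / ((m + 1) ** 2 * (m + 2))
    )
    half_width = 1.96 * math.sqrt(variance)
    return {
        "observed": observed,
        "estimate": n_hat,
        "not_captured": n_hat - observed,
        "ci": (n_hat - half_width, n_hat + half_width),
    }

# Hypothetical matched person IDs from two sources.
hospital_list = range(1, 61)  # 60 persons
clinic_list = range(41, 91)   # 50 persons, 20 of them also on the hospital list
result = chapman_estimate(hospital_list, clinic_list)
print(result)
```

The smaller the overlap between the lists relative to their sizes, the larger the estimated number of individuals not captured by either list. With three or more lists, log-linear models allow dependence between sources to be modeled, which this two-source sketch cannot do.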
