Epidemiology
Anchita Khatri
Definitions
ERROR:
1. A false or mistaken result obtained in a study
or experiment
2. Random error is the portion of variation in
measurement that has no apparent connection
to any other measurement or variable,
generally regarded as due to chance
3. Systematic error, which often has a
recognizable source (e.g., a faulty measuring
instrument) or pattern (e.g., it is consistently
wrong in a particular direction)
(Last)
Relationship b/w Bias and Chance
[Figure: distributions of diastolic blood pressure
(mm Hg) readings. The true BP (intra-arterial
cannula, 80 mm Hg) is compared with BP measured
by sphygmomanometer (centered near 90 mm Hg);
the spread of the observations reflects chance
(random error), while the shift between the two
distributions reflects bias.]
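The figure above can be illustrated with a small simulation (a sketch; the sample size, the bias of +10 mm Hg, and the random spread are assumptions chosen to mirror the figure):

```python
import random

random.seed(42)

TRUE_BP = 80.0   # true diastolic BP (intra-arterial cannula), mm Hg
BIAS = 10.0      # systematic error of the sphygmomanometer (assumed)
SD = 4.0         # random (chance) variation of readings (assumed)

# Each reading = true value + constant bias + random error
readings = [TRUE_BP + BIAS + random.gauss(0, SD) for _ in range(10_000)]

mean_reading = sum(readings) / len(readings)
bias_estimate = mean_reading - TRUE_BP          # systematic component
chance_sd = (sum((r - mean_reading) ** 2 for r in readings)
             / (len(readings) - 1)) ** 0.5      # random component

print(f"mean reading:   {mean_reading:.1f} mm Hg")
print(f"estimated bias: {bias_estimate:.1f} mm Hg")
print(f"spread due to chance (SD): {chance_sd:.1f} mm Hg")
```

Averaging many readings cancels the random error but not the bias, which is why the mean still sits near 90 mm Hg.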
Validity
Validity: The degree to which a
measurement measures what it purports to
measure (Last)
Degree to which the data measure what they
were intended to measure; that is, the
results of a measurement correspond to the
true state of the phenomenon being
measured (Fletcher)
also known as Accuracy
Reliability
The degree of stability expected when a
measurement is repeated under identical conditions;
degree to which the results obtained from a
measurement procedure can be replicated
(Last)
Extent to which repeated measurements of a stable
phenomenon by different people and instruments,
at different times and places get similar results
(Fletcher)
Also known as Reproducibility and Precision
Validity and Reliability
[Figure: 2x2 grid of target diagrams showing the
four combinations of high/low validity and
high/low reliability.]
Bias
Deviation of results or inferences from the truth,
or processes leading to such deviation. Any
trend in the collection, analysis, interpretation,
publication, or review of data that can lead to
conclusions that are systematically different
from the truth. (Last)
A process at any stage of inference tending to
produce results that depart systematically from
true values (Fletcher)
Types of biases
1. Selection bias
2. Measurement / (mis)classification bias
3. Confounding bias
Selection bias
Errors due to systematic differences in
characteristics between those who are selected
for study and those who are not.
(Last; Beaglehole)
When comparisons are made b/w groups of
patients that differ, in ways other than the
main factors under study, in characteristics
that affect the outcome under study. (Fletcher)
Examples of Selection bias
Subjects: hospital cases under the care of a
physician
Excluded:
1. Die before admission (acute/severe disease)
2. Not sick enough to require hospital care
3. Do not have access due to cost, distance etc.
Result: conclusions cannot be generalized
Also known as Ascertainment Bias
(Last)
Ascertainment Bias
Systematic failure to represent equally all
classes of cases or persons supposed to be
represented in a sample. This bias may arise
because of the nature of the sources from
which the persons come, e.g., a specialized
clinic; from a diagnostic process influenced
by culture, custom, or idiosyncrasy. (Last)
Selection bias with volunteers
Also known as response bias
Systematic error due to differences in
characteristics b/w those who choose or
volunteer to take part in a study and those
who do not
Example: response bias
[Figure: disease-free survival (%) over 24 months
of follow-up, stratified by CEA level (<2.5,
2.5 to 10.0, >10.0 ng), in colorectal cancer
patients with similar pathological staging
(Dukes B).]
Selection bias with Survival Cohorts
Patients are included in the study because they
are available and currently have the disease
For lethal diseases patients in survival cohort are
the ones who are fortunate to have survived, and
so are available for observation
For remitting diseases patients are those who are
unfortunate enough to have persistent disease
Also known as Available patient cohorts
Example: bias with a survival cohort

TRUE COHORT (N=150)
Assemble cohort, measure outcome:
  Improved:      75
  Not improved:  75
Observed improvement: 50%   True improvement: 50%

SURVIVAL COHORT
Patients available at follow-up (N=50):
  Improved:      40
  Not improved:  10
Dropouts, not observed (N=100):
  Improved:      35
  Not improved:  65
Observed improvement: 80%   True improvement: 50%
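The arithmetic behind the survival-cohort example, in a few lines:

```python
# True cohort: everyone is observed
improved_true, total_true = 75, 150
true_improvement = improved_true / total_true       # 0.50

# Survival cohort: only the 50 patients still available are observed
improved_obs, total_obs = 40, 50
observed_improvement = improved_obs / total_obs     # 0.80

# Adding back the unobserved dropouts (35 improved, 65 not)
# recovers the true improvement rate
improved_all = improved_obs + 35                    # 75
total_all = total_obs + 100                         # 150

print(f"observed improvement: {observed_improvement:.0%}")
print(f"true improvement:     {improved_all / total_all:.0%}")
```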
Selection bias due to Loss to
Follow-up
Also known as Migration Bias
In nearly all large studies some members of the
original cohort drop out of the study
If drop-outs occur randomly, such that the
characteristics of lost subjects in one group are
on average similar to those who remain in
the group, no bias is introduced
But ordinarily the characteristics of the lost
subjects are not the same
Example: loss to follow-up
[Table: 2x2 tables of exposure (irradiation,
+nt/-nt) vs. disease (cataract), comparing the
full cohort with the cohort remaining after loss
to follow-up.]
Measurement errors
A gap can exist b/w the theoretical definition of
a variable and the empirical definition actually
used for measurement.

Example: gap b/w definitions

Theoretical definition:
- Exposure (passive smoking): inhalation of
  tobacco smoke from other people's smoking
- Disease (myocardial infarction): necrosis of
  the heart muscle tissue

Empirical definition:
- Exposure (passive smoking): time spent with
  smokers (e.g., having smokers as room-mates)
- Disease (myocardial infarction): certain
  diagnostic criteria (chest pain, enzyme levels,
  signs on ECG)
Exposure misclassification
Non-differential
Misclassification does not differ between
cases and non-cases
Generally leads to dilution of effect, i.e.
bias towards RR=1 (no association)
Example: non-differential exposure
misclassification
[Table: 2x2 tables of X-ray exposure (+nt/-nt)
vs. breast cancer, with correctly classified and
misclassified exposure; non-differential
misclassification dilutes the relative risk
towards 1.]
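The dilution effect can be sketched numerically. The cohort sizes, risks, and the 80% sensitivity / 90% specificity of exposure classification below are assumptions chosen for illustration:

```python
# True cohort: exposed risk 10%, unexposed risk 5% -> true RR = 2.0
n_exp, cases_exp = 1000, 100
n_unexp, cases_unexp = 1000, 50
rr_true = (cases_exp / n_exp) / (cases_unexp / n_unexp)

# Exposure is misclassified identically in cases and non-cases
# (non-differential): sensitivity 0.80, specificity 0.90 (assumed)
se, sp = 0.80, 0.90

# Expected counts after misclassification
obs_exp_n = se * n_exp + (1 - sp) * n_unexp            # labelled "exposed"
obs_exp_cases = se * cases_exp + (1 - sp) * cases_unexp
obs_unexp_n = (1 - se) * n_exp + sp * n_unexp          # labelled "unexposed"
obs_unexp_cases = (1 - se) * cases_exp + sp * cases_unexp

rr_obs = (obs_exp_cases / obs_exp_n) / (obs_unexp_cases / obs_unexp_n)

print(f"true RR:     {rr_true:.2f}")
print(f"observed RR: {rr_obs:.2f}")   # pulled towards 1
```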
[Figure: other sources of apparent treatment
effect: placebo effect, Hawthorne effect,
natural history of disease.]
Confounding
1. A situation in which the effects of two
processes are not separated. The distortion
of the apparent effect of an exposure on risk
brought about by the association with other
factors that can influence the outcome
2. A relationship b/w the effects of two or
more causal factors as observed in a set of
data such that it is not logically possible to
separate the contribution that any single
causal factor has made to an effect
(Last)
Confounding
When another exposure exists in the study
population (besides the one being studied)
and is associated both with the disease and with
the exposure being studied. If this extraneous
factor, itself a determinant of or risk factor
for the health outcome, is unequally distributed
b/w the exposure subgroups, it can lead to
confounding
(Beaglehole)
Confounder must be
1. Risk factor among the unexposed (itself a
determinant of disease)
2. Associated with the exposure under study
3. Unequally distributed among the exposed
and the unexposed groups
Examples: confounding
[Diagram: coffee drinking, smoking, and heart
disease. Coffee drinkers are more likely to
smoke, and smoking increases the risk of heart
disease, so smoking can confound an apparent
association b/w coffee and heart disease.]
[Diagram: alcohol intake, sex, and myocardial
infarction. Sex is associated both with alcohol
intake and with the risk of MI.]
Example: confounding (alcohol intake and MI)

Crude analysis
              Exposure (alcohol)
              +nt        -nt
MI cases      140        100
Persons       30,000     30,000

RR = (140/30,000) / (100/30,000) = 1.4

Stratified by sex
            Males                Females
            +nt      -nt        +nt      -nt
MI cases    120      60         20       40
Persons     20,000   10,000     10,000   20,000

RR (males)   = (120/20,000) / (60/10,000) = 1
RR (females) = (20/10,000) / (40/20,000) = 1

The crude RR of 1.4 disappears within each sex
stratum: sex confounds the crude association.
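The crude and stratum-specific relative risks from the alcohol/MI example:

```python
def rr(cases_exp, n_exp, cases_unexp, n_unexp):
    """Relative risk = risk in exposed / risk in unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

# Crude (both sexes combined)
rr_crude = rr(140, 30_000, 100, 30_000)      # 1.4

# Stratified by sex: the association vanishes within strata
rr_male = rr(120, 20_000, 60, 10_000)        # 1.0
rr_female = rr(20, 10_000, 40, 20_000)       # 1.0

print(f"crude RR:  {rr_crude:.1f}")
print(f"male RR:   {rr_male:.1f}")
print(f"female RR: {rr_female:.1f}")
```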
Matching
Matching is often done for age, sex, race, place
of residence, severity of disease, rate of
progression of disease, previous treatment
received etc.
Limitations:
- Controls for bias only for those factors
involved in the match
- Usually not possible to match for more than a
few factors, because of the practical difficulties
of finding patients that meet all matching criteria
- If categories for matching are relatively crude,
there may be room for substantial differences
b/w matched groups
Example: matching
Study question: Is sickle cell trait (HbAS)
associated with defects in physical growth and
cognitive development?
Other potential biasing factors: race, sex, birth
date, birth weight, gestational age, 5-min Apgar
score, socio-economic status
Solution: matching; for each child with HbAS, a
child with HbAA was selected who was similar with
respect to the seven other factors (50+50=100)
Result: no difference in growth and development
Overmatching
A situation that may arise when groups are being
matched. Several varieties:
1. The matching procedure partially or
completely obscures evidence of a true causal
association b/w the independent and
dependent variables. Overmatching may occur
if the matching variable is involved in, or is
closely connected with, the mechanism
whereby the independent variable affects the
dependent variable. The matching variable
may be an intermediate cause in the causal
chain, or it may be strongly affected by, or a
consequence of, such an intermediate cause
2. The matching procedure uses one or more
unnecessary matching variables, e.g., variables
that have no causal effect or influence on the
dependent variable, and hence cannot confound
the relationship b/w the independent and
dependent variables.
3. The matching process is unduly elaborate,
involving the use of numerous matching
variables and/or insisting on a very close
similarity with respect to specific matching
variables. This leads to difficulty in finding
suitable controls. (Last)
Stratification
The process of or the result of separating a
sample into several sub-samples according
to specified criteria such as age groups,
socio-economic status etc. (Last)
The effect of confounding variables may be
controlled by stratifying the analysis of
results
After data are collected, they can be
analyzed and results presented according to
subgroups of patients, or strata, of similar
characteristics (Fletcher)
Example: stratification (Fletcher)

HOSPITAL A
Pre-op Risk   Pts     Deaths   %
High          500     30       6
Medium        400     16       4
Low           300     2        0.67
Total         1200    48       4

HOSPITAL B
Pre-op Risk   Pts     Deaths   %
High          400     24       6
Medium        800     32       4
Low           1200    8        0.67
Total         2400    64       2.6

Death rates are identical within each risk
stratum, yet the crude rates differ (4% vs. 2.6%)
because the hospitals differ in case mix.
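The stratum-specific and crude rates from the hospital example:

```python
# (patients, deaths) per pre-operative risk stratum
hospital_a = {"high": (500, 30), "medium": (400, 16), "low": (300, 2)}
hospital_b = {"high": (400, 24), "medium": (800, 32), "low": (1200, 8)}

def rates(hospital):
    total_pts = sum(p for p, _ in hospital.values())
    total_deaths = sum(d for _, d in hospital.values())
    by_stratum = {k: d / p for k, (p, d) in hospital.items()}
    return by_stratum, total_deaths / total_pts

strata_a, crude_a = rates(hospital_a)
strata_b, crude_b = rates(hospital_b)

# Within every stratum the two hospitals perform identically...
for stratum in strata_a:
    print(f"{stratum}: A={strata_a[stratum]:.2%}  B={strata_b[stratum]:.2%}")

# ...but the crude rates differ because hospital A treats
# proportionally more high-risk patients (case mix)
print(f"crude: A={crude_a:.1%}  B={crude_b:.1%}")
```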
Example: age-wise stratification

               Pinellas county          Dade county            Relat.
Age stratum    Dead    Total     Rate   Dead    Total    Rate  Rate
Birth-54 yrs   737     229,198   3.2    2,463   748,035  3.3   1.0

(Rates per 1,000)
[Figure: lead-time in screening. Timelines compare
diagnosis without screening vs. earlier diagnosis
with screening, both when early treatment is not
effective and when it is effective.]
Assessing the role of chance:
1. Hypothesis testing
2. Estimation
Hypothesis testing
Start off with the Null Hypothesis (H0):
the statistical hypothesis that one variable
has no association with another variable or
set of variables, or that two or more
population distributions do not differ from
one another.
In simpler terms, the null hypothesis states
that the results observed in a study,
experiment, or test are no different from
what might have occurred as a result of the
operation of chance alone
Statistical tests: errors (Fletcher)

                                   TRUE DIFFERENCE
                             PRESENT            ABSENT
                             (H0 false)         (H0 true)
CONCLUSION   Significant
OF           (H0 rejected)   Power (1 - β)      Type I (α) error
STATISTICAL  Not significant
TEST         (H0 accepted)   Type II (β) error  (correct)
Statistical tests - errors
Type I (α) error: the error of rejecting a true
null hypothesis, i.e. declaring a difference
exists when it does not
Type II (β) error: the error of failing to reject
a false null hypothesis, i.e. declaring that a
difference does not exist when in fact it
does
Power of a study: the ability of a study to
demonstrate an association if one exists
Power = 1 - β
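A sketch of how power relates to these quantities, for the common case of comparing two proportions (normal approximation; the event rates and group size are illustrative assumptions, not from the source):

```python
import math
from statistics import NormalDist

def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sided two-sample test of
    proportions with n subjects per group (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha=0.05
    se = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
    return NormalDist().cdf(abs(p1 - p2) / se - z_alpha)

# Illustrative: detect 30% vs 20% event rates with 300 patients per group
power = power_two_proportions(0.30, 0.20, 300)
print(f"power = {power:.2f}  (Type II error beta = {1 - power:.2f})")
```

Larger samples shrink the standard error, raising power and lowering β for the same α.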
p-value
The probability of a Type I (α) error:
A quantitative estimate of the probability that the
observed difference b/w the groups in the
study could have happened by chance alone,
assuming that there is no real difference b/w the
groups OR
If there were no difference b/w the groups, and
the trial were repeated many times, the
proportion of the trials that would lead to the
conclusion that there is a difference b/w the
groups the same as, or bigger than, the one
found in the study
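The "repeated trials" reading of the p-value can be sketched by simulation: draw many trials in which there truly is no difference b/w the groups, and count how often a difference at least as large as the observed one appears. All numbers here are illustrative assumptions:

```python
import random

random.seed(1)

# Suppose both groups truly have a 20% event rate (null is true),
# with 200 patients per group, and the study observed a 7% difference
n, p_null, observed_diff = 200, 0.20, 0.07

def simulate_diff():
    """One trial under the null: absolute difference in event rates."""
    a = sum(random.random() < p_null for _ in range(n))
    b = sum(random.random() < p_null for _ in range(n))
    return abs(a - b) / n

trials = 10_000
extreme = sum(simulate_diff() >= observed_diff for _ in range(trials))
p_value = extreme / trials
print(f"proportion of null trials with a difference >= 7%: {p_value:.3f}")
```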
p value Remember!!
Usually P < 0.05 is considered statistically
significant (i.e. probability of 1 in 20 that
observed difference is due to chance)
0.05 is an arbitrary cut-off; can change
according to requirements
Statistically significant result might not be
clinically significant and vice-versa
Statistical significance vs.
clinical significance
Large RCT called GUSTO (41,021 patients with acute MI)
Study: streptokinase vs. tPA
Result: death rate at 30 days
- streptokinase: 7.2%
- tPA: 6.3% (p < 0.001)
But, about 100 patients need to be treated with tPA
instead of streptokinase to prevent 1 death!
tPA is costly: roughly $250 thousand to prevent one death
??? Clinically significant
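The number needed to treat implied by the 30-day death rates (roughly 100 in round numbers):

```python
# 30-day death rates from the GUSTO trial
risk_streptokinase = 0.072
risk_tpa = 0.063

# Absolute risk reduction and number needed to treat
arr = risk_streptokinase - risk_tpa   # 0.009 (0.9 percentage points)
nnt = 1 / arr                         # ~111 patients per death prevented

print(f"ARR = {arr:.3f}")
print(f"NNT = {nnt:.0f}")
```

A tiny absolute risk reduction can be highly statistically significant in a trial of 41,021 patients yet still translate into a large NNT.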
Estimation
The effect size observed in a particular study is
called the Point estimate
The true effect is unlikely to be exactly that
observed in the study, because of random
variation
Confidence interval (CI): usually 95%
(Last) A computed interval, with a given
probability (e.g. 95%), that the true value
(such as a mean, proportion, or rate) is
contained within the interval
Confidence intervals
(Fletcher) If the study is unbiased, there is a 95%
chance that the interval includes the true
effect size. The true value is likely to be close
to the point estimate, less likely to be near the
outer limits of that interval, and could (5 times
out of 100) fall outside these limits altogether.
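A sketch of a 95% confidence interval for a proportion, using the normal (Wald) approximation; the counts are illustrative, chosen to roughly match the 6.3% tPA death rate:

```python
import math

# Illustrative: 63 deaths among 1,000 patients (6.3%)
deaths, n = 63, 1000
p = deaths / n                     # point estimate

# 95% CI using the normal (Wald) approximation
se = math.sqrt(p * (1 - p) / n)
lo, hi = p - 1.96 * se, p + 1.96 * se

print(f"point estimate: {p:.1%}")
print(f"95% CI: {lo:.1%} to {hi:.1%}")
```

With a larger n the interval tightens around the point estimate, reflecting less random variation.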