
Selection and Information Bias

Learning Objectives
By the end of this session the student is expected to be able to:
1. Define bias and differentiate between selection bias,
misclassification/information bias, and confounding bias
2. Define and identify selection bias
3. Define and identify non-differential misclassification of disease
and exposure
4. Define and identify differential misclassification of exposure in a
case-control study (recall bias)
5. Identify the effect that a particular bias can have on a particular
study
6. Identify which types of studies are prone to which types of bias
7. Identify common sources of bias
Why do we conduct epidemiologic studies?
Goal:
To determine the relationship between an exposure and an
outcome with validity, precision, and efficiency.

This requires:
• Consideration of the exposure and disease of interest.
• Consideration of decreasing random error, which can lead to a false association by “chance.”
• Consideration of decreasing bias and confounding that would otherwise distort the results.
• Consideration of cost and time in conducting a study.
Goal in Epidemiologic Studies
• Obtain an accurate measure of association

• Sources of inaccuracy
– Systematic Error (lack of validity)
– Random Error (lack of precision)
Sources of Error in Epidemiologic Research
• Random error
• Systematic error
  – Bias
      o Information bias
      o Selection bias
  – Confounding
Bias
Bias
Definition: A systematic error that causes an incorrect or invalid
estimate of an association

Sources: Caused by investigator or study participants during the design or
conduct (data collection, analysis) of the study. Can occur in experimental,
case-control, or cohort studies.

Effects: Create appearance of an association when there is none or mask an
association that really exists. Cannot be fixed in the analysis.

Selection bias = occurs during selection and follow-up of study participants

Information bias = occurs during data collection
What can we do about bias?
• Limit in study design
• Limit in study conduct (i.e., data collection)
• Evaluate after study completion and discuss
the effects
– Types and sources
– Direction
– Magnitude
– Impact on study results / interpretation

Bias cannot be controlled for in the analysis!


Sources of Error in Epidemiologic Research
• Random error
• Systematic error
  – Bias
      o Information bias: Where in 2x2 table?
      o Selection bias: Who is included in 2x2 table?
  – Confounding

[2x2 tables: rows E+ / E-, columns D+ / D-]
Non-Differential or Differential?
Errors (who gets in, or what gets recorded) can be:
– NON-DIFFERENTIAL: Extent of errors is the SAME for people with and without
disease (or for cases and controls), or for exposed and unexposed groups.
OFTEN (but not always) leads to biased results.

– DIFFERENTIAL: Extent of errors is DIFFERENT for people with and without
disease (or for cases and controls), or for exposed and unexposed groups.
ALWAYS leads to biased results.
Toward or Away from the Null
• Null: RR = 1.0; truth: RR = 2.0; biased: RR = 2.6 (bias away from the null)
• Null: RR = 1.0; biased: RR = 1.7; truth: RR = 2.0 (bias toward the null)
• Biased: RR = 0.3; truth: RR = 0.7; null: RR = 1.0 (bias away from the null)
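The three number-line examples above can be checked with a short sketch. The function name `bias_direction` and the label "across the null" (for a biased estimate that lands on the other side of 1.0) are illustrative assumptions, not terms from the slides. Distances from the null are compared on the log scale, so that RR = 0.5 and RR = 2.0 count as equally far from 1.0.

```python
import math


def bias_direction(true_rr, observed_rr, null=1.0):
    """Describe where a biased estimate sits relative to the truth and the null."""
    if observed_rr == true_rr:
        return "no bias"
    # Truth and estimate on opposite sides of the null value.
    if (true_rr - null) * (observed_rr - null) < 0:
        return "across the null"
    # Compare distances from the null on the log scale.
    true_dist = abs(math.log(true_rr / null))
    obs_dist = abs(math.log(observed_rr / null))
    return "away from the null" if obs_dist > true_dist else "toward the null"


# The three slide examples: (truth, biased estimate)
for truth, biased in [(2.0, 2.6), (2.0, 1.7), (0.7, 0.3)]:
    print(truth, biased, bias_direction(truth, biased))
```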
Selection Bias
Selection Bias
Definition: Systematic error resulting from procedures used to select
subjects into a study

Sources:
Case-Control Study: Selection/participation of cases and controls related to
exposure status.
Cohort Study: Selection of exposed and unexposed subjects (or follow-up of
exposed and unexposed) related to outcome.

Effects: Lead to a result that is different from what would have been obtained
from the entire population targeted for the study.

Participation differs on exposure and outcome
Selection Bias (cont.)
Source population (True OR = 2):
            E+    E-
Cases      150   100
Controls   150   200

75% of E- cases selected (25 E- cases excluded):
            E+    E-
Cases      150    75
Controls   150   200
Estimated OR = 2.7 (selection bias)

75% of E- cases and 75% of E- controls selected (25 E- cases and 50 E- controls excluded):
            E+    E-
Cases      150    75
Controls   150   150
Estimated OR = 2 (no selection bias)

Most likely to occur in case-control and retrospective cohort studies
because exposure and outcome have occurred by time of subject selection
into study
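A minimal sketch of the arithmetic behind this slide, assuming the cell layout a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls. The helper names `odds_ratio` and `select` are illustrative, not part of the lecture.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: (a * d) / (b * c)."""
    return (a * d) / (b * c)


def select(table, probs):
    """Apply a selection probability to each cell (a, b, c, d)."""
    return tuple(n * p for n, p in zip(table, probs))


source = (150, 100, 150, 200)  # source population from the slide

# Only unexposed cases are under-selected (75%).
cases_only = select(source, (1.0, 0.75, 1.0, 1.0))
# Unexposed cases AND unexposed controls are under-selected equally (75%).
both = select(source, (1.0, 0.75, 1.0, 0.75))

print(round(odds_ratio(*source), 2))      # 2.0
print(round(odds_ratio(*cases_only), 2))  # 2.67
print(round(odds_ratio(*both), 2))        # 2.0
```

When selection depends on exposure to the same degree in cases and controls, the distortion cancels and the odds ratio is unchanged.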
Types of Selection Bias
• Case-Control Studies
– Control selection bias
– Self-selection bias
– Surveillance, diagnostic, and referral bias
• Cohort Studies
– Loss to follow-up
– Healthy worker effect
Control-Selection Bias
Bias that can occur if controls are more (or less) likely to be selected
if they are exposed (or unexposed)

This bias occurs when controls fail to represent the exposure distribution in
the source population from which the cases were identified because controls
do not accurately represent the same source population as the cases

Research Question: Is socioeconomic status associated with cervical cancer?

Potential Problem: Different criteria used to select cases and controls (non-comparability)
Control-Selection Bias (cont.)
Biased design:
• 250 hospital cases of cervical cancer
• 250 neighborhood controls from door-to-door survey conducted during work day

Problem: Who is at home during the day?
People at home during the day are more likely unemployed or older (retired). The
likelihood of being included as a control is therefore related to the exposure (SES).

Improved design:
• 250 hospital cases of cervical cancer
• Hospital controls, or controls from a door-to-door survey conducted in the evening

Solution: Use of identical selection criteria helps ensure that controls come
from the same source population as the cases.
Control-Selection Bias (cont.)
Selected controls have lower SES than the source population:
• 40% of controls in source population (true) have low SES (100/250=40%)
• 60% of controls in selected population have low SES (150/250=60%)

True Association
            Low SES   High SES
Cases         150       100
Controls      100       150
True OR = 2.25

Results with Selection Bias (too many low-SES controls)
            Low SES   High SES
Cases         150       100
Controls      150       100
Observed (Biased) OR = 1.0
Self-Selection Bias
Bias that can occur if willingness or ability to participate is related
to both exposure and disease status
In a case-control study, cases and controls should be selected
independent of exposure status
Goal is to have comparable rates of participation in all categories of
disease and exposure

Research Question: Does taking antidepressants during pregnancy increase
the risk for birth defects?
Potential Problem: Mothers who took antidepressants during pregnancy and
had babies with birth defects may be more willing to participate in a
case-control study compared with other mothers
Self-Selection Bias (cont.)
Participation rates:
• 80% of mothers who took antidepressants and had a baby with a birth defect
• 60% of all other mothers

True Association
            AntiD+   AntiD-
Cases         200      100
Controls      500      500
True OR = 2.0

Results with Selection Bias (80% participation among AntiD+ cases, 60% in all other cells)
            AntiD+   AntiD-
Cases         160       60
Controls      300      300
Observed (Biased) OR = 2.7
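One way to see why unequal participation distorts the odds ratio: if each cell of the 2x2 table participates at its own rate, the observed OR equals the true OR multiplied by (s_a × s_d) / (s_b × s_c). The sketch below applies this to the slide's numbers; the name `selection_bias_factor` and the `s_*` parameters are illustrative assumptions.

```python
def selection_bias_factor(s_a, s_b, s_c, s_d):
    """Multiplier on the true OR given participation rates for
    exposed cases (s_a), unexposed cases (s_b),
    exposed controls (s_c), and unexposed controls (s_d)."""
    return (s_a * s_d) / (s_b * s_c)


true_or = (200 * 500) / (100 * 500)  # 2.0, from the true table

# 80% of exposed cases participate; 60% of everyone else.
factor = selection_bias_factor(0.80, 0.60, 0.60, 0.60)
observed_or = true_or * factor

print(round(factor, 2))       # 1.33
print(round(observed_or, 1))  # 2.7
```

If all four rates are equal, or if they differ only by disease status or only by exposure status, the factor is 1 and the OR is unbiased. Bias arises when participation depends on exposure and disease jointly.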
Surveillance, Diagnostic, & Referral Bias
Bias that can occur if patients who are exposed are more likely
to be diagnosed with a disease

This bias occurs when the people who play a role in, or are responsible for,
disease ascertainment base their diagnosis on whether or not the
participant has the exposure of interest

Research Question: Do statins protect against breast cancer incidence?


Potential Problem: Patients who are taking statins may be more likely to
receive a breast cancer diagnosis (or to receive a diagnosis sooner) because
they have more frequent contact with their physicians; or patients who are
not taking statins may be less likely to have their breast cancer diagnosed
Surveillance, Diagnostic, & Referral Bias (cont.)
Breast cancer diagnosis:
• 100% of breast cancers among statin users were detected
• 50% of breast cancers among participants not using statins were detected

True Association
            Statin+   Statin-
Cases         100       100
Controls      200       200
True OR = 1.0

Results with Selection Bias (100% of Statin+ cases and 50% of Statin- cases detected)
            Statin+   Statin-
Cases         100        50
Controls      200       200
Observed (Biased) OR = 2.0
Selection Bias in Case-Control Studies
• Control Selection
  – Effect: Toward or Away from Null
  – Prevention: Use identical selection criteria for cases and controls (recall purpose of controls)
• Differential Participation
  – Effect: Toward or Away from Null
  – Prevention: Obtain high participation rates for all groups
• Differential Surveillance, Diagnosis, Referral
  – Effect: Toward or Away from Null
  – Prevention: Use identical selection criteria for cases and controls by accounting for referral and diagnosis patterns for disease of interest
Types of Selection Bias
• Case-Control Studies
– Control selection bias
– Self-selection bias
– Surveillance, diagnostic, and referral bias
• Cohort Studies
– Loss to follow-up
Loss to Follow-Up
Bias that can occur if study participants exit a study for reasons
related to both exposure and disease
Think of this type of bias as selection out of a study rather than entry into a
study. Concern in cohort and experimental studies.

Why do we care about losses to follow-up?
If people are lost to follow-up, we cannot know whether they developed
disease

Research Question: Does oral contraceptive (OC) use increase risk of thromboembolism?

Potential Problem: Loss to follow-up may be associated with exposure and outcome

Rarely possible to determine if losses are related to both outcome and exposure.
Loss to Follow-Up (cont.)
Example: Association between OC use and thromboembolism (TE)

True relationship
          OC+       OC-
TE+        20        10
TE-     9,980     9,990
N      10,000    10,000
True RR = 2.0

Results with loss to follow-up (90% of OC+ TE+ and 20% of OC- TE+ retained)
          OC+       OC-
TE+        18         2
TE-     8,974     8,986
N       8,992     8,988
Lost    1,008     1,012
Observed (Biased) RR = 9.0

Differential Loss to Follow-Up – Losses are related to both exposure and disease
Loss to Follow-Up (cont.)
Example: Association between OC use and thromboembolism (TE)

True relationship
          OC+       OC-
TE+        20        10
TE-     9,980     9,990
N      10,000    10,000
True RR = 2.0

Results with loss to follow-up (20% of OC+ TE+ and 90% of OC- TE+ retained)
          OC+       OC-
TE+         4         9
TE-     8,974     8,986
N       8,978     8,995
Lost    1,022     1,005
Observed (Biased) RR = 0.4

Differential loss to follow-up results in unpredictable direction of
measure of association.
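Both loss-to-follow-up scenarios can be reproduced with a small sketch. The TE- counts retained (8,974 and 8,986) are taken from the slides; the function names are illustrative, and the denominators are recomputed as retained TE+ plus retained TE-, which is an assumption about how the slide totals were built.

```python
def risk_ratio(d_exp, n_exp, d_unexp, n_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    return (d_exp / n_exp) / (d_unexp / n_unexp)


def rr_after_loss(te_exp, te_unexp, keep_exp, keep_unexp,
                  no_te_exp=8974, no_te_unexp=8986):
    """RR among participants still under follow-up.

    keep_exp / keep_unexp: fraction of TE+ participants retained
    in the OC+ and OC- groups, respectively.
    """
    d_exp = te_exp * keep_exp
    d_unexp = te_unexp * keep_unexp
    return risk_ratio(d_exp, d_exp + no_te_exp,
                      d_unexp, d_unexp + no_te_unexp)


true_rr = risk_ratio(20, 10_000, 10, 10_000)
rr_first = rr_after_loss(20, 10, keep_exp=0.90, keep_unexp=0.20)
rr_second = rr_after_loss(20, 10, keep_exp=0.20, keep_unexp=0.90)

print(round(true_rr, 1))    # 2.0
print(round(rr_first, 1))   # 9.0
print(round(rr_second, 1))  # 0.4
```

The same true RR of 2.0 becomes 9.0 or 0.4 depending only on which exposed/diseased cell loses more people.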
Selection Bias in Cohort Studies
• Differential Loss to Follow-Up
  – Effect: Toward or Away from Null
  – Prevention: Since outcome cannot be known without good follow-up, must maintain high participation rates
Selection Bias –
What Are the Solutions?
• Little or nothing can be done to fix this bias once it has
occurred
– Cannot control for it in the analysis

• You need to avoid it when you design the study. For example,
plan on:
– Using the same criteria for selecting cases and controls
– Obtaining high participation rates
– Taking into account diagnostic and referral patterns of
disease
– Obtaining all relevant subject records
Information (Observation) Bias
Information Bias
Definition: An error that arises from systematic differences in the way
information on exposure or disease is obtained from the study groups (aka
observation bias)

Sources: Both cohort and case-control studies are susceptible. Occurs after the
subjects have entered the study.
Case-Control Study: Different techniques are used to interview cases and
controls
Cohort Study: Different procedures are used to obtain outcome information on
exposed and unexposed.
Effects:
Results in participants who are incorrectly classified as either exposed or
unexposed or as diseased or not diseased. The misclassification can be
differential or non-differential.

Differences in how data is collected


Information Bias (cont.)
True classification (True OR = 2.0):
              E+     E-
Disease      200    100
No Disease   100    100

25% of E+ are classified as E-:
              E+                E-
Disease      (200-50) = 150    (100+50) = 150
No Disease   (100-25) = 75     (100+25) = 125
Estimated (Biased) OR = 1.67

10% of D+ are classified as D-:
              E+                E-
Disease      (200-20) = 180    (100-10) = 90
No Disease   (100+20) = 120    (100+10) = 110
Estimated (Biased) OR = 1.83
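The two misclassification scenarios on this slide can be sketched as follows, with a, b = diseased exposed/unexposed and c, d = non-diseased exposed/unexposed. The function names are illustrative assumptions, and misclassification is modeled in one direction only (E+ recorded as E-, or D+ recorded as D-), as on the slide.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: (a * d) / (b * c)."""
    return (a * d) / (b * c)


def misclassify_exposure(a, b, c, d, frac):
    """Record `frac` of the truly exposed (a, c) as unexposed (b, d)."""
    return a - a * frac, b + a * frac, c - c * frac, d + c * frac


def misclassify_disease(a, b, c, d, frac):
    """Record `frac` of the truly diseased (a, b) as non-diseased (c, d)."""
    return a - a * frac, b - b * frac, c + a * frac, d + b * frac


truth = (200, 100, 100, 100)

or_true = odds_ratio(*truth)
or_exposure = odds_ratio(*misclassify_exposure(*truth, frac=0.25))
or_disease = odds_ratio(*misclassify_disease(*truth, frac=0.10))

print(round(or_true, 2))      # 2.0
print(round(or_exposure, 2))  # 1.67
print(round(or_disease, 2))   # 1.83
```

In both cases the same error rate is applied to both comparison groups, and the estimate moves toward the null.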


Types of Information Bias
• Recall Bias

• Interviewer/Recording Bias

• Misclassification (measurement error)


Recall Bias
Bias that can occur if people with disease remember/report their
exposure differently (e.g. more often or less often) than people
without disease

Case-control study: Cases are more or less likely to recall prior
exposures than controls
Retrospective cohort study: Diseased participants are more or less likely
to recall prior exposures than non-diseased participants

Research Question:
Are birth defects associated with use of Bendectin in
pregnancy?
Potential Problem:
Mothers of affected infants may have more accurate recall
of their exposure to Bendectin
Recall Bias (cont.)
Cases and controls recall their exposures differently
• 100% of cases accurately recalled their exposure to Bendectin
• 60% of controls who used Bendectin accurately recalled their exposure (40% forgot)

True Association
            Bendect+   Bendect-   Total
Cases          50         50       100
Controls       50         50       100
True OR = 1.0

Results with Recall Bias (40% of exposed controls report no exposure)
            Bendect+   Bendect-   Total
Cases          50         50       100
Controls       30         70       100
Observed (Biased) OR = 2.3
Recall Bias: Solutions
• Use controls who are also sick to promote comparable recall
• Use standardized, closed-ended questionnaires to promote consistency and specificity
  – Danger of recall bias lessens as specificity of exposure assessment increases
• Include considerations for sensitive exposures/topics
• Examine pre-existing data (e.g. medical or employment records) or use biological measurements to ascertain exposure
• Ask subjects about their knowledge of the study hypothesis (at end of interview), and analyze data accordingly
Poor Recall vs. Recall Bias
Poor recall happens all the time

Recall bias only occurs when recall is different for people who do and
do not have disease (differential)
• 90% of cases and 90% of controls have accurate recall → NO recall bias (non-differential)
• 90% of cases and 70% of controls have accurate recall → recall bias (differential)
Interviewer Bias
Bias that can occur if there is a systematic difference in
soliciting, recording, or interpreting information

Case-control study: Interviewer is influenced by participant’s case or control status
Cohort study (including RCT): Interviewer is influenced by participant’s treatment or
exposure status

Research Question:
Is the presence of flame retardants in your office
associated with migraine headaches?
Potential Problem:
Interviewers who know the amount of flame retardants in your
office may be “looking” for disease
Interviewer Bias (cont.)
Outcomes are recorded differently for exposed and unexposed groups
• 100% of unexposed people accurately reported their migraine incidence
• 20% of exposed people without a migraine were recorded as having one

True Association
              FR+   FR-
Migraine +     50    50
Migraine -     50    50
Total Pop     100   100
True RR = 1.0

Results with Interviewer Bias (+20% / -20% in the FR+ column)
              FR+   FR-
Migraine +     60    50
Migraine -     40    50
Total Pop     100   100
Observed (Biased) RR = 1.2
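A quick check of this example from the cell counts: the true risk is 50/100 in both groups, so the risk ratio computed from the true table is 1.0, and recording 20% of the exposed non-migraine participants as migraine cases raises it to 1.2. The function names below are illustrative assumptions.

```python
def risk_ratio(d_exp, n_exp, d_unexp, n_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    return (d_exp / n_exp) / (d_unexp / n_unexp)


def add_false_positives(diseased, total, false_pos_rate):
    """Number recorded as diseased when a fraction of the truly
    non-diseased are wrongly recorded as diseased."""
    return diseased + (total - diseased) * false_pos_rate


true_rr = risk_ratio(50, 100, 50, 100)

# Interviewers "find" migraines in 20% of exposed people who had none.
recorded_exposed = add_false_positives(50, 100, 0.20)
observed_rr = risk_ratio(recorded_exposed, 100, 50, 100)

print(round(true_rr, 1))      # 1.0
print(recorded_exposed)       # 60.0
print(round(observed_rr, 1))  # 1.2
```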
Interviewer/Recording Bias: Solutions
• Provide rigorous training (e.g. nondirective probing)
• Monitor interviewers
• Blind the interviewer/ abstractor to study
hypothesis and disease or exposure status of
subjects
• Use standardized questionnaires consisting of
closed-ended, easy questions with appropriate
response categories
• Use standardized methods of outcome (or
exposure) ascertainment
• Examine pre-existing data (e.g. medical or
employment records)
Reducing Information Bias
• Little (or nothing) can be done to fix
information bias once it has occurred

• Information bias must be avoided through careful study design and conduct (see
strategies for individual biases)
– Information bias cannot be “controlled for” in the
analysis
Measurement (Misclassification) Error
Bias that can occur if study participants are placed into the
wrong exposure or disease category

This is the most common form of bias


Sources:
– Self-reports (BP, cancer, smoking, drug use)
– Errors on medical records, death certificates, etc.
– Errors in how data are captured (data entry)
– Non-specific exposure or disease definitions (cervical vs. uterine
cancer, using job title for occupational exposure)

Effect:
Non-differential → bias towards the null
Differential → bias upward or downward
Non-Differential Misclassification
Non-differential means that the extent of the misclassification
is the same (not different) for both groups

If EXPOSURE is the thing being misclassified, then the extent of the
exposure misclassification is the same for cases/controls or
diseased/non-diseased

If DISEASE is the thing being misclassified, then the extent of the
disease misclassification is the same for exposed and unexposed persons

              E+   E-
Disease        a    b
No Disease     c    d
Non-Differential Misclassification
If EXPOSURE is the thing being misclassified, then the extent of the exposure
misclassification is the SAME for cases/controls or diseased/non-diseased

Example: A case-control study to determine whether smokers have an
increased risk of bladder cancer
Among smokers, only half (50%) are correctly classified in both cases and
controls (the other 50% of smokers are classified as non-smokers)

True Association
            Smk+   Smk-
Cases        200    100
Controls     100    200
OR = 4.0

Results with ND Misclassification
            Smk+   Smk-
Cases        100    200
Controls      50    250
OR = 2.5

E+ and E- groups become similar → bias towards the null


Non-Differential Misclassification
If DISEASE is the thing being misclassified, then the extent of the disease
misclassification is the SAME for exposed and unexposed

Example: A case-control study to determine whether smokers have an
increased risk of bladder cancer
Among bladder cancer cases, only 75% are correctly classified in both the
exposed and unexposed groups (the other 25% of cases are classified as
controls)

True Association
            Smk+   Smk-
Cases        200    100
Controls     100    200
OR = 4.0

Results with ND Misclassification
            Smk+   Smk-
Cases        150     75
Controls     150    225
OR = 3.0

D+ and D- groups become similar → bias towards the null


Differential Misclassification
Differential means that the extent of the misclassification is
different (not the same) for both groups

If EXPOSURE is the thing being misclassified, then the extent of the
exposure misclassification is different for cases/controls or
diseased/non-diseased

If DISEASE is the thing being misclassified, then the extent of the
disease misclassification is different for exposed and unexposed persons

              E+   E-
Disease        a    b
No Disease     c    d
Differential Misclassification
If EXPOSURE is the thing being misclassified, then the extent of the exposure
misclassification is DIFFERENT for cases/controls or diseased/non-diseased

Example: Exposure ascertainment is related to outcome
100% exposure accuracy in CASES
50% exposure accuracy in CONTROLS
**Differential degree of accuracy between the compared groups

True Association
            Smk+   Smk-
Cases        200    100
Controls     100    200
OR = 4.0

Results with D Misclassification
            Smk+   Smk-
Cases        200    100
Controls      50    250
OR = 10.0

Variable effects; here: → bias away from null


Differential Misclassification
If EXPOSURE is the thing being misclassified, then the extent of the exposure
misclassification is DIFFERENT for cases/controls or diseased/non-diseased

Example: Exposure ascertainment is related to outcome
50% exposure accuracy in CASES
100% exposure accuracy in CONTROLS
**Differential degree of accuracy between the compared groups

True Association
            Smk+   Smk-
Cases        200    100
Controls     100    200
OR = 4.0

Results with D Misclassification
            Smk+   Smk-
Cases        100    200
Controls     100    200
OR = 1.0

Variable effects; here: → bias toward the null
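The smoking and bladder cancer examples (non-differential and differential exposure misclassification) differ only in how accurately smokers are recorded among cases versus controls. A minimal sketch, assuming non-smokers are always recorded correctly, as in the slides; the function name `classify_smokers` and its parameters are illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: (a * d) / (b * c)."""
    return (a * d) / (b * c)


def classify_smokers(cases, controls, acc_cases, acc_controls):
    """Return the observed table (a, b, c, d).

    cases / controls: (true smokers, true non-smokers).
    acc_cases / acc_controls: fraction of true smokers recorded as
    smokers in each group; the rest are recorded as non-smokers.
    """
    a = cases[0] * acc_cases
    b = cases[1] + cases[0] * (1 - acc_cases)
    c = controls[0] * acc_controls
    d = controls[1] + controls[0] * (1 - acc_controls)
    return a, b, c, d


cases, controls = (200, 100), (100, 200)  # true OR = 4.0

scenarios = {
    "non-differential (50% / 50%)": (0.5, 0.5),
    "differential (100% cases / 50% controls)": (1.0, 0.5),
    "differential (50% cases / 100% controls)": (0.5, 1.0),
}
for label, (acc_ca, acc_co) in scenarios.items():
    table = classify_smokers(cases, controls, acc_ca, acc_co)
    print(label, round(odds_ratio(*table), 1))
```

Equal accuracy in both groups pulls the OR from 4.0 toward the null (2.5), while unequal accuracy can push it away from the null (10.0) or toward it (1.0).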


In Which Direction Does Misclassification Bias The Results?
Non-differential misclassification tends to bias the results in
which direction?
• Toward the null
• Away from the null
• Either towards or away from the null

Differential misclassification tends to bias the results in which
direction?
• Toward the null
• Away from the null
• Either towards or away from the null
Measurement Error (Misclassification): Solutions
Improve accuracy of collected information:
• Multiple measurements of exposure and disease (e.g.,
several BP readings, continuous monitoring of air pollution)
• Validation – corroborate the data using several sources (e.g.,
questionnaire data, medications, medical records, laboratory
tests)
• Use most accurate source of information available
• Use sensitive and specific criteria for exposure and disease
Information Bias
• Recall Bias
  – Effect: Toward or Away from Null
  – Prevention: Use sick controls; quality questionnaires; use pre-existing data; ask participants what they know
• Interviewer Bias
  – Effect: Toward or Away from Null
  – Prevention: Masking; quality questionnaires; train interviewers
• Measurement Error
  – Effect: Toward or Away from Null
  – Prevention: Sensitive and specific definitions of E and D; accurate data sources; multiple measurements; validation
Quick Recap
[Table: applicability of each bias by study design (case-control, cohort, experimental)]
Selection:
• Control selection bias
• Self-selection bias
• Surveillance, diagnostic, referral bias
• Loss to follow-up
Information:
• Recall bias
• Interviewer bias
• Measurement error (misclassification)
Questions to Ask Yourself When
Interpreting Study Results…
• Given the conditions of the study, could bias have
occurred? Which types?

• Is bias actually present?

• Which direction is the distortion? Is it towards the null or
away from the null?

• Are consequences of the bias large enough to distort the
measure of association in an important way? (qualitative,
quantitative)
Practice Questions
Question 1
In a hospital-based case-control study, investigators selected only
one disease for controls.
It turned out that this disease was also caused by the exposure.
Did this cause a bias? If so, what was the direction?

This is another way of saying that controls had a higher level of exposure
than the source population, or that the number of exposed controls is too
high
Question 1
Population targeted for study
            E+   E-
Cases        a    b
Controls     c    d

Biased study sample (too many exposed controls: C > c)
            E+   E-
Cases        a    b
Controls     C    d

OR = (a/C)/(b/d)

Yes, there was a bias, and the direction was toward the null
Question 2
A study found a relative risk of 1.5. A reviewer critiqued the study.
She pointed out ways that the estimate was biased by
non-differential misclassification of exposure and argued that therefore
there really may have been no association. Is the reviewer correct?
Question 2
The reviewer is NOT correct.
Non-differential misclassification of exposure (NDME) biases
toward the null. Therefore, had there been no NDME,
the relative risk would have been even greater.
Question 3
In a case-control study of condom use and urinary tract infection (UTI) in young women,
the investigators reported an odds ratio of 2.4 for the association between use of a
spermicide-coated condom in the previous month and UTI incidence (Fihn et al, 1996).

You suspect that the control women were less likely to remember that they had used a
spermicide-coated condom in the previous month than were the women with a UTI. If
such information bias occurred (assuming no other bias or confounding), it would have
led the investigators to observe an effect of spermicide-coated condom use on UTI that
was:

a. An overestimate of the true association (bias away from the null).
b. An underestimate of the true association (bias towards the null).
c. The same as the true association.
Question 3
Actual information for study
            E+   E-
Cases        a    b
Controls     c    d

Biased information collected (exposed controls under-report exposure, so c is too small and d is too large)
            E+   E-
Cases        a    b
Controls     c    d

OR = (a/c)/(b/d)

Answer: a. Yes, there was a bias, and the direction is away from the null
