Bias

Selection and Information Bias
Learning Objectives
By the end of this session the student is expected to be able to:
1. Define bias and differentiate between selection bias, misclassification/information bias, and confounding bias
2. Define and identify selection bias
3. Define and identify non-differential misclassification of disease and exposure
4. Define and identify differential misclassification of exposure in a case-control study (recall bias)
5. Identify the effect that a particular bias can have on a particular study
6. Identify which types of studies are prone to which types of bias
7. Identify common sources of bias
Why do we conduct epidemiologic studies?
Goal:
To determine the relationship between an exposure and an outcome with validity, precision, and efficiency.
• Consideration of the exposure and disease of interest
• Consideration of decreasing random error, which can lead to a false association by "chance"
• Consideration of decreasing bias and confounding that would otherwise distort the results
• Consideration of cost and time in conducting a study
Goal in Epidemiologic Studies
• Obtain an accurate measure of association
• Sources of inaccuracy
– Systematic Error (lack of validity)
– Random Error (lack of precision)
Sources of Error in Epidemiologic Research
• Random error
• Systematic error: bias (information bias and selection bias) and confounding
Bias
Definition: A systematic error that causes an incorrect or invalid estimate of an association
Sources: Caused by the investigator or study participants during the design or conduct (data collection, analysis) of the study. Can occur in experimental, case-control, or cohort studies.
Effects: Can create the appearance of an association when there is none, or mask an association that really exists. Cannot be fixed in the analysis.
• Selection bias = occurs during selection and follow-up of study participants
• Information bias = occurs during data collection
What can we do about bias?
• Limit in study design
• Limit in study conduct (i.e., data collection)
• Evaluate after study completion and discuss
the effects
– Types and sources
– Direction
– Magnitude
– Impact on study results / interpretation
Bias cannot be controlled for in the analysis!

Sources of Error in Epidemiologic Research
• Random error
• Systematic error: bias and confounding
  – Information bias: where in the 2x2 table is each person placed?
  – Selection bias: who is included in the 2x2 table?
Non-Differential or Differential?
Errors (who gets in, or what gets recorded) can be:
– NON-DIFFERENTIAL: Extent of errors is the SAME for people with and without disease (or for cases and controls), or for exposed and unexposed groups. OFTEN (but not always) leads to biased results.
– DIFFERENTIAL: Extent of errors is DIFFERENT for people with and without disease (or for cases and controls), or for exposed and unexposed groups. ALWAYS leads to biased results.
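Toward or Away from the Null (null value: RR = 1.0)
• Away from the null: truth RR = 2.0, biased RR = 2.6
• Toward the null: truth RR = 2.0, biased RR = 1.7
• Away from the null (protective association): truth RR = 0.7, biased RR = 0.3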
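Because ratio measures are multiplicative, distance from the null is most naturally judged on the log scale (an RR of 0.5 is as far from 1.0 as an RR of 2.0). The minimal sketch below classifies the three cases above under that assumption; the helper name `bias_direction` is hypothetical, and the "crosses the null" label is an extra category added for completeness.

```python
import math


def bias_direction(true_ratio: float, biased_ratio: float) -> str:
    """Classify how a biased ratio measure (RR or OR) moved relative to the null (1.0).

    Distances from the null are compared on the log scale, so that
    RR = 0.5 and RR = 2.0 count as equally far from 1.0.
    """
    t = math.log(true_ratio)
    b = math.log(biased_ratio)
    if t * b < 0:
        # The biased estimate lies on the opposite side of 1.0 from the truth.
        return "crosses the null"
    if math.isclose(abs(b), abs(t)):
        return "no bias"
    return "away from the null" if abs(b) > abs(t) else "toward the null"


print(bias_direction(2.0, 2.6))  # away from the null
print(bias_direction(2.0, 1.7))  # toward the null
print(bias_direction(0.7, 0.3))  # away from the null
```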
Selection Bias
Definition: Systematic error resulting from procedures used to select subjects into a study
Sources:
Case-Control Study: Selection/participation of cases and controls is related to exposure status.
Cohort Study: Selection of exposed and unexposed subjects (or follow-up of exposed and unexposed) is related to outcome.
Effects: Leads to a result that is different from what would have been obtained from the entire population targeted for the study.
Participation differs by both exposure and outcome.
Selection Bias (cont.)
True association (True OR = 2)
            E+    E-
Cases      150   100
Controls   150   200

Only 75% of E- cases selected (selection bias)
            E+    E-
Cases      150    75
Controls   150   200
Estimated OR = 2.7

75% of E- cases and 75% of E- controls selected (no selection bias)
            E+    E-
Cases      150    75
Controls   150   150
Estimated OR = 2

Most likely to occur in case-control and retrospective cohort studies, because exposure and outcome have both occurred by the time of subject selection into the study.
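The arithmetic on this slide can be reproduced by multiplying each cell of the true 2x2 table by the fraction of that cell that ends up in the study. A standard identity (assumed here, not stated on the slide) is that the observed OR equals the true OR times the selection bias factor (s_a × s_d) / (s_b × s_c), where s_a, s_b, s_c, s_d are the selection fractions for exposed cases, unexposed cases, exposed controls, and unexposed controls. The factor is 1, and the OR unbiased, when selection fractions cancel out as in the third table. The helper names below are hypothetical.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """OR for a 2x2 table: a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    return (a * d) / (b * c)


def select(cells, fractions):
    """Keep the given fraction of each cell (selection into the study)."""
    return tuple(n * s for n, s in zip(cells, fractions))


def selection_bias_factor(s_a, s_b, s_c, s_d):
    """Multiplier linking the observed OR to the true OR."""
    return (s_a * s_d) / (s_b * s_c)


true_cells = (150, 100, 150, 200)            # E+ cases, E- cases, E+ controls, E- controls
fractions_biased = (1.0, 0.75, 1.0, 1.0)     # only 75% of unexposed cases enrolled
fractions_balanced = (1.0, 0.75, 1.0, 0.75)  # 75% of unexposed cases AND controls

print(odds_ratio(*true_cells))                                      # 2.0
print(round(odds_ratio(*select(true_cells, fractions_biased)), 2))  # 2.67
print(odds_ratio(*select(true_cells, fractions_balanced)))          # 2.0
print(round(selection_bias_factor(*fractions_biased), 3))           # 1.333
```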
Types of Selection Bias
• Case-Control Studies
– Control selection bias
– Self-selection bias
– Surveillance, diagnostic, and referral bias
• Cohort Studies
– Loss to follow-up
– Healthy worker effect
Control-Selection Bias
Bias that can occur if controls are more (or less) likely to be selected
if they are exposed (or unexposed)
This bias occurs when controls fail to represent the exposure distribution in
the source population from which the cases were identified because controls
do not accurately represent the same source population as the cases
Research Question: Is socioeconomic status associated with cervical cancer?

Potential Problem: Different criteria used to select cases and controls (non-comparability)
Control-Selection Bias (cont.)
Problem design: 250 hospital cases of cervical cancer; 250 neighborhood controls from a door-to-door survey conducted during the work day
Problem: Who is at home during the day? People at home during the day are more likely to be unemployed or older (retired). The likelihood of being included as a control is therefore related to the exposure (SES).
Better design: 250 hospital cases of cervical cancer; hospital controls, or a door-to-door survey conducted in the evening
Solution: Using identical selection criteria helps ensure that cases and controls come from the same source population.
Control-Selection Bias (cont.)
Selected controls have lower SES than controls from the source population:
• 40% of controls in the source population (true) have low SES (100/250 = 40%)
• 60% of controls in the selected population have low SES (150/250 = 60%): too many low-SES controls
True association
            Low SES   High SES
Cases         150       100
Controls      100       150
True OR = 2.25

Results with selection bias
            Low SES   High SES
Cases         150       100
Controls      150       100
Observed (biased) OR = 1.0
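In a case-control study the OR can be read as the exposure odds among cases divided by the exposure odds among controls, which makes the source of this bias easy to see: only the controls' exposure odds change. The sketch below uses Python's `fractions.Fraction` to keep the arithmetic exact; the helper and variable names are illustrative, not from the source.

```python
from fractions import Fraction


def exposure_odds(exposed: int, unexposed: int) -> Fraction:
    """Odds of exposure (here, low SES) within one group."""
    return Fraction(exposed, unexposed)


case_odds = exposure_odds(150, 100)            # cases: 150 low SES, 100 high SES
true_control_odds = exposure_odds(100, 150)    # controls from the source population
biased_control_odds = exposure_odds(150, 100)  # daytime door-to-door controls

true_or = case_odds / true_control_odds
biased_or = case_odds / biased_control_odds
print(true_or, float(true_or))      # 9/4 2.25
print(biased_or, float(biased_or))  # 1 1.0
```

Because the selected controls end up with the same exposure odds as the cases, the observed OR collapses to the null even though the true OR is 2.25.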
Self-Selection Bias
Bias that can occur if willingness or ability to participate is related
to both exposure and disease status
In a case-control study, cases and controls should be selected
independent of exposure status
Goal is to have comparable rates of participation in all categories of
disease and exposure
Research Question: Does taking antidepressants during pregnancy increase the risk for birth defects?
Potential Problem: Mothers who took antidepressants during pregnancy and had babies with birth defects may be more willing to participate in a case-control study compared with other mothers
Self-Selection Bias (cont.)
Participation rates:
• 80% of mothers who took antidepressants and had a baby with a birth defect
• 60% of all other mothers
True association
            AntiD+   AntiD-
Cases        200      100
Controls     500      500
True OR = 2.0

Results with selection bias (80% participation for AntiD+ cases, 60% for all other cells)
            AntiD+   AntiD-
Cases        160       60
Controls     300      300
Observed (biased) OR = 2.7
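Selection bias is a systematic error, so unlike random error it does not shrink as the study grows. The simulation below is a sketch under assumed conditions: it scales the slide's table up 100-fold (a hypothetical choice), lets each eligible person participate independently with the slide's probabilities (80% for exposed cases, 60% for everyone else), and uses a fixed seed so the run is reproducible. The observed OR stays near 2.7 rather than drifting back to the true value of 2.0.

```python
import random


def simulate_participation(n: int, p: float, rng: random.Random) -> int:
    """Number of the n eligible people who agree to participate."""
    return sum(rng.random() < p for _ in range(n))


rng = random.Random(2024)
scale = 100  # hypothetical: 100 times the slide's counts
population = {  # (case/control, exposure): true count
    ("case", "AntiD+"): 200 * scale,
    ("case", "AntiD-"): 100 * scale,
    ("control", "AntiD+"): 500 * scale,
    ("control", "AntiD-"): 500 * scale,
}
participation = {key: 0.6 for key in population}
participation[("case", "AntiD+")] = 0.8  # exposed cases more willing to take part

observed = {key: simulate_participation(n, participation[key], rng)
            for key, n in population.items()}

a = observed[("case", "AntiD+")]
b = observed[("case", "AntiD-")]
c = observed[("control", "AntiD+")]
d = observed[("control", "AntiD-")]
observed_or = (a * d) / (b * c)
print(round(observed_or, 2))  # close to 2.67, not 2.0
```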
Surveillance, Diagnostic, & Referral Bias
Bias that can occur if patients who are exposed are more likely
to be diagnosed with a disease
This bias occurs when the people who play a role in, or are responsible for,
disease ascertainment base their diagnosis on whether or not the
participant has the exposure of interest
Research Question: Do statins protect against breast cancer incidence?

Potential Problem: Patients who are taking statins may be more likely to
receive a breast cancer diagnosis (or to receive a diagnosis sooner) because
they have more frequent contact with their physicians; or patients who are
not taking statins may be less likely to have their breast cancer diagnosed
Surveillance, Diagnostic, & Referral Bias (cont.)
Breast cancer diagnosis:
• Breast cancer was detected in 100% of statin users
• Breast cancer was detected in 50% of participants who were not using statins
True association
            Statin+   Statin-
Cases         100       100
Controls      200       200
True OR = 1.0

Results with selection bias
            Statin+   Statin-
Cases         100        50
Controls      200       200
Observed (biased) OR = 2.0
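Under the slide's assumptions (cases among statin users are always detected, cases among non-users half the time, and the control row is unaffected), the observed OR is the true OR multiplied by the ratio of the two detection probabilities. A short sketch with hypothetical function names:

```python
def odds_ratio(a, b, c, d):
    """a/b = exposed/unexposed cases, c/d = exposed/unexposed controls."""
    return (a * d) / (b * c)


def detected_cases(true_cases: int, detection_probability: float) -> float:
    """Expected number of true cases that are actually diagnosed."""
    return true_cases * detection_probability


true_table = (100, 100, 200, 200)  # Statin+ cases, Statin- cases, Statin+ controls, Statin- controls
detect_statin_users, detect_non_users = 1.0, 0.5

observed_table = (
    detected_cases(true_table[0], detect_statin_users),
    detected_cases(true_table[1], detect_non_users),
    true_table[2],
    true_table[3],
)
print(odds_ratio(*true_table))                 # 1.0
print(odds_ratio(*observed_table))             # 2.0
print(detect_statin_users / detect_non_users)  # 2.0, the bias factor
```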
Selection Bias in Case-Control Studies (bias: effect; prevention)
• Control selection: toward or away from the null; use identical selection criteria for cases and controls (recall the purpose of controls)
• Differential participation: toward or away from the null; obtain high participation rates for all groups
• Differential surveillance, diagnosis, referral: toward or away from the null; use identical selection criteria for cases and controls by accounting for referral and diagnosis patterns for the disease of interest
Types of Selection Bias
• Case-Control Studies
– Control selection bias
– Self-selection bias
– Surveillance, diagnostic, and referral bias
• Cohort Studies
– Loss to follow-up
Loss to Follow-Up
Bias that can occur if study participants exit a study for reasons
related to both exposure and disease
Think of this type of bias as selection out of a study rather than entry into a
study. Concern in cohort and experimental studies.
Why do we care about losses to follow-up?

If people are lost to follow-up, we cannot know whether they developed
disease
Research Question: Does OC use increase risk of thromboembolism?
Potential Problem: Loss to follow-up may be associated with both exposure and outcome
It is rarely possible to determine whether losses are related to both outcome and exposure.
Loss to Follow-Up (cont.)
Example: Association between OC use and thromboembolism (TE)
True relationship
           OC+      OC-
TE+         20       10
TE-      9,980    9,990
N       10,000   10,000
True RR = 2.0

Results with loss to follow-up (90% of OC+ TE+ and 20% of OC- TE+ remain in follow-up)
           OC+      OC-
TE+         18        2
TE-      8,974    8,986
N        8,992    8,988
Lost     1,008    1,012
Observed (biased) RR = 9.0
Differential Loss to Follow-Up: losses are related to both exposure and disease
Loss to Follow-Up (cont.)
Example: Association between OC use and thromboembolism (TE)
True relationship
           OC+      OC-
TE+         20       10
TE-      9,980    9,990
N       10,000   10,000
True RR = 2.0

Results with loss to follow-up (20% of OC+ TE+ and 90% of OC- TE+ remain in follow-up)
           OC+      OC-
TE+          4        9
TE-      8,974    8,986
N        8,978    8,995
Lost     1,022    1,005
Observed (biased) RR = 0.4
Differential loss to follow-up can distort the measure of association in an unpredictable direction.
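In a cohort study the risk ratio compares cumulative incidence among those who remain under follow-up. The sketch below (hypothetical helper name) recomputes both scenarios from the TE+ and TE- counts, taking each true column to total 10,000 (20 and 9,980 for OC+) and computing N as TE+ plus TE-. The same true RR of 2.0 is distorted in opposite directions depending on which cells lose the most cases.

```python
def risk_ratio(cases_exp, noncases_exp, cases_unexp, noncases_unexp):
    """RR = cumulative incidence in exposed / cumulative incidence in unexposed,
    using only participants who completed follow-up."""
    risk_exp = cases_exp / (cases_exp + noncases_exp)
    risk_unexp = cases_unexp / (cases_unexp + noncases_unexp)
    return risk_exp / risk_unexp


true_rr = risk_ratio(20, 9_980, 10, 9_990)
# Scenario 1: 90% of OC+ cases but only 20% of OC- cases remain in follow-up
rr_scenario_1 = risk_ratio(18, 8_974, 2, 8_986)
# Scenario 2: 20% of OC+ cases but 90% of OC- cases remain in follow-up
rr_scenario_2 = risk_ratio(4, 8_974, 9, 8_986)

print(round(true_rr, 1), round(rr_scenario_1, 1), round(rr_scenario_2, 1))  # 2.0 9.0 0.4
```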
Selection Bias in Cohort Studies (bias: effect; prevention)
• Differential loss to follow-up: toward or away from the null; since the outcome cannot be known without good follow-up, maintain high participation rates
Selection Bias –
What Are the Solutions?
• Little or nothing can be done to fix this bias once it has
occurred
– Cannot control for it in the analysis
• You need to avoid it when you design the study. For example,
plan on:
– Using the same criteria for selecting cases and controls
– Obtaining high participation rates
– Taking into account diagnostic and referral patterns of
disease
– Obtaining all relevant subject records
Information (Observation) Bias
Information Bias
Definition: An error that arises from systematic differences in the way
information on exposure or disease is obtained from the study groups (aka
observation bias)
Sources: Both cohort and case-control studies are susceptible. Occurs after the
subjects have entered the study.
Case-Control Study: Different techniques are used to interview cases and
controls
Cohort Study: Different procedures are used to obtain outcome information on
exposed and unexposed.
Effects:
Results in participants who are incorrectly classified as either exposed or
unexposed or as diseased or not diseased. The misclassification can be
differential or non-differential.
Differences in how data is collected

Information Bias (cont.)
True association (True OR = 2.0)
              E+    E-
Disease      200   100
No Disease   100   100

25% of E+ are classified as E-
              E+               E-
Disease      (200-50) = 150   (100+50) = 150
No Disease   (100-25) = 75    (100+25) = 125
Estimated (biased) OR = 1.67

10% of D+ are classified as D-
              E+               E-
Disease      (200-20) = 180   (100-10) = 90
No Disease   (100+20) = 120   (100+10) = 110
Estimated (biased) OR = 1.83
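Both panels are misclassification that does not depend on the other variable: a fixed fraction of the truly exposed (or truly diseased) is recorded in the wrong category. The sketch below reproduces the two biased ORs, assuming, as the slide does, perfect specificity (nobody is wrongly moved into the exposed or diseased category). The function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """a = E+D+, b = E-D+, c = E+D-, d = E-D-."""
    return (a * d) / (b * c)


def misclassify_exposure(a, b, c, d, sensitivity):
    """Record only `sensitivity` of truly exposed people as exposed, in both
    disease groups; the rest are recorded as unexposed."""
    return (a * sensitivity, b + a * (1 - sensitivity),
            c * sensitivity, d + c * (1 - sensitivity))


def misclassify_disease(a, b, c, d, sensitivity):
    """Record only `sensitivity` of truly diseased people as diseased, in both
    exposure groups; the rest are recorded as non-diseased."""
    return (a * sensitivity, b * sensitivity,
            c + a * (1 - sensitivity), d + b * (1 - sensitivity))


true_table = (200, 100, 100, 100)
print(odds_ratio(*true_table))                                                     # 2.0
print(round(odds_ratio(*misclassify_exposure(*true_table, sensitivity=0.75)), 2))  # 1.67
print(round(odds_ratio(*misclassify_disease(*true_table, sensitivity=0.90)), 2))   # 1.83
```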

Types of Information Bias
• Recall Bias
• Interviewer/Recording Bias
• Misclassification (measurement error)

Recall Bias
Bias that can occur if people with disease remember/report their
exposure differently (e.g. more often or less often) than people
without disease
Case-control study: Cases are more or less likely to recall prior exposures than controls
Retrospective cohort study: Diseased participants are more or less likely to recall prior exposures than non-diseased participants
Research Question: Are birth defects associated with use of Bendectin in pregnancy?
Potential Problem: Mothers of affected infants may have more accurate recall of their exposure to Bendectin
Recall Bias (cont.)
Cases and controls recall their exposures differently:
• 100% of cases accurately recalled their exposure to Bendectin
• 60% of controls who used Bendectin accurately recalled their exposure (40% forgot)
True association
            Bendectin+   Bendectin-   Total
Cases           50           50        100
Controls        50           50        100
True OR = 1.0

Results with recall bias
            Bendectin+   Bendectin-   Total
Cases           50           50        100
Controls        30           70        100
Observed (biased) OR = 2.3
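Recall bias is differential misclassification of exposure: the probability that a truly exposed person reports the exposure (the sensitivity of recall) differs between cases and controls. A minimal sketch, assuming as on the slide that nobody falsely reports exposure; the function name is hypothetical.

```python
def reported_exposure(exposed, unexposed, recall_sensitivity):
    """Split a group into reported exposed/unexposed when only a fraction of
    truly exposed people remember their exposure."""
    remembered = exposed * recall_sensitivity
    return remembered, unexposed + (exposed - remembered)


cases = reported_exposure(50, 50, recall_sensitivity=1.0)     # cases recall perfectly
controls = reported_exposure(50, 50, recall_sensitivity=0.6)  # 40% of exposed controls forget

observed_or = (cases[0] * controls[1]) / (cases[1] * controls[0])
print(round(observed_or, 2))  # 2.33
```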
Recall Bias: Solutions
• Use controls who are also sick to promote comparable recall
• Use standardized, closed-ended questionnaires to promote consistency and specificity
  – Danger of recall bias lessens as specificity of exposure assessment increases
• Include considerations for sensitive exposures/topics
• Examine pre-existing data (e.g. medical or employment records) or use biological measurements to ascertain exposure
• Ask subjects about their knowledge of the study hypothesis (at end
of interview), and analyze data accordingly
Poor Recall vs. Recall Bias
Poor recall happens all the time.
Recall bias only occurs when recall is different for people who do and do not have disease (differential).
• 90% of cases and 90% of controls have accurate recall → NO recall bias
• 90% of cases and 70% of controls have accurate recall → recall bias
Interviewer Bias
Bias that can occur if there is a systematic difference in
soliciting, recording, or interpreting information
Case-control study: Interviewer is influenced by participant’s case or control status

Cohort study (including RCT): Interviewer is influenced by participant’s treatment or
exposure status
Research Question: Is the presence of flame retardants in your office associated with migraine headaches?
Potential Problem: Interviewers who know the amount of flame retardants in your office may be "looking" for disease
Interviewer Bias (cont.)
Outcomes are recorded differently for exposed and unexposed groups:
• 100% of unexposed people had their migraine status recorded accurately
• 20% of exposed people without a migraine were recorded as having one (+20% to Migraine +, -20% from Migraine -)
True association
              FR+   FR-
Migraine +     50    50
Migraine -     50    50
Total pop     100   100
True RR = 1.0

Results with interviewer bias
              FR+   FR-
Migraine +     60    50
Migraine -     40    50
Total pop     100   100
Observed (biased) RR = 1.2
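Here the error is in the outcome: a fraction of exposed people without migraine are recorded as cases, while the unexposed are recorded correctly. A minimal sketch with hypothetical names; both risk ratios are computed directly from the tables rather than copied from them.

```python
def risk_ratio(cases_exp, total_exp, cases_unexp, total_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / total_exp) / (cases_unexp / total_unexp)


def recorded_cases(true_cases, total, false_positive_rate):
    """Cases recorded when a fraction of true non-cases are wrongly recorded as cases."""
    return true_cases + (total - true_cases) * false_positive_rate


true_rr = risk_ratio(50, 100, 50, 100)
observed_rr = risk_ratio(recorded_cases(50, 100, 0.20), 100,  # FR+: interviewer "finds" migraines
                         recorded_cases(50, 100, 0.0), 100)   # FR-: recorded accurately
print(true_rr, round(observed_rr, 2))  # 1.0 1.2
```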
Interviewer/Recording Bias: Solutions
• Provide rigorous training (e.g. nondirective probing)
• Monitor interviewers
• Blind the interviewer/ abstractor to study
hypothesis and disease or exposure status of
subjects
• Use standardized questionnaires consisting of
closed-ended, easy questions with appropriate
response categories
• Use standardized methods of outcome (or
exposure) ascertainment
• Examine pre-existing data (e.g. medical or
employment records)
Reducing Information Bias
• Little (or nothing) can be done to fix
information bias once it has occurred
• Information bias must be avoided through careful study design and conduct (see strategies for individual biases)
– Information bias cannot be “controlled for” in the
analysis
Measurement (Misclassification) Error
Bias that can occur if study participants are placed into the
wrong exposure or disease category
This is the most common form of bias

Sources:
– Self-reports (BP, cancer, smoking, drug use)
– Errors on medical records, death certificates, etc.
– Errors in how data are captured (data entry)
– Non-specific exposure or disease definitions (cervical vs. uterine
cancer, using job title for occupational exposure)
Effect:
Non-differential → bias towards the null
Differential → bias upward or downward
Non-Differential Misclassification
Non-differential means that the extent of the misclassification is the same (not different) for both groups.
• If EXPOSURE is the thing being misclassified, then the extent of the exposure misclassification is the same for cases/controls or diseased/non-diseased.
• If DISEASE is the thing being misclassified, then the extent of the disease misclassification is the same for exposed and unexposed persons.
              E+   E-
Disease        a    b
No Disease     c    d
If EXPOSURE is the thing being misclassified, then the extent of the exposure misclassification is the SAME for cases/controls or diseased/non-diseased
Example: A case-control study to determine whether smokers have an increased risk of bladder cancer
Among smokers, only half (50%) are correctly classified in both cases and controls (the other 50% of smokers are classified as non-smokers)
True association
            Smk+   Smk-
Cases        200    100
Controls     100    200
OR = 4.0

Results with ND misclassification
            Smk+   Smk-
Cases        100    200
Controls      50    250
OR = 2.5
E+ and E- groups become similar → bias towards the null
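The slide shows a single sensitivity (50%). Sweeping the sensitivity of exposure classification, applied equally to cases and controls, illustrates the general pattern in this example: the lower the sensitivity, the closer the observed OR is pulled toward 1.0. A sketch assuming perfect specificity; the function name is hypothetical.

```python
def observed_or(sensitivity, cases=(200, 100), controls=(100, 200)):
    """OR after the same fraction of true smokers is recorded as non-smokers
    in both cases and controls (non-differential, perfect specificity)."""
    def reclassify(exposed, unexposed):
        return exposed * sensitivity, unexposed + exposed * (1 - sensitivity)

    a, b = reclassify(*cases)
    c, d = reclassify(*controls)
    return (a * d) / (b * c)


for se in (1.0, 0.9, 0.75, 0.5, 0.25):
    print(se, round(observed_or(se), 2))
# prints ORs of 4.0, 3.5, 3.0, 2.5, 2.2 as sensitivity falls
```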

If DISEASE is the thing being misclassified, then the extent of the disease misclassification is the SAME for exposed and unexposed
Example: A case-control study to determine whether smokers have an increased risk of bladder cancer
Among bladder cancer cases, only 75% are correctly classified in both the exposed and unexposed groups (the other 25% of cases are classified as controls)
True association
            Smk+   Smk-
Cases        200    100
Controls     100    200
OR = 4.0

Results with ND misclassification
            Smk+   Smk-
Cases        150     75
Controls     150    225
OR = 3.0
D+ and D- groups become similar → bias towards the null
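Here the misclassified cases do not disappear: in a case-control study they land in the controls row. The sketch below moves 25% of true cases in each exposure column into the controls row, using the same true table as the exposure example; the function names are hypothetical.

```python
def misclassify_cases(cases, controls, sensitivity):
    """Move (1 - sensitivity) of true cases, in each exposure column, into the controls row."""
    kept = tuple(n * sensitivity for n in cases)
    moved = tuple(n - k for n, k in zip(cases, kept))
    return kept, tuple(ctrl + m for ctrl, m in zip(controls, moved))


def odds_ratio(cases, controls):
    """cases and controls are (exposed, unexposed) pairs."""
    return (cases[0] * controls[1]) / (cases[1] * controls[0])


true_cases, true_controls = (200, 100), (100, 200)
obs_cases, obs_controls = misclassify_cases(true_cases, true_controls, sensitivity=0.75)
print(obs_cases, obs_controls)  # (150.0, 75.0) (150.0, 225.0)
print(odds_ratio(true_cases, true_controls), odds_ratio(obs_cases, obs_controls))  # 4.0 3.0
```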

Differential Misclassification
Differential means that the extent of the misclassification is different (not the same) for the two groups.
• If EXPOSURE is the thing being misclassified, then the extent of the exposure misclassification is different for cases/controls or diseased/non-diseased.
• If DISEASE is the thing being misclassified, then the extent of the disease misclassification is different for exposed and unexposed persons.
              E+   E-
Disease        a    b
No Disease     c    d
If EXPOSURE is the thing being misclassified, then the extent of the exposure misclassification is DIFFERENT for cases/controls or diseased/non-diseased
Example: Exposure ascertainment is related to outcome
• 100% exposure accuracy in CASES
• 50% exposure accuracy in CONTROLS
Differential degree of accuracy between the compared groups
True association
            Smk+   Smk-
Cases        200    100
Controls     100    200
OR = 4.0

Results with D misclassification
            Smk+   Smk-
Cases        200    100
Controls      50    250
OR = 10.0
Variable effects; here: bias away from the null

If EXPOSURE is the thing being misclassified, then the extent of the exposure misclassification is DIFFERENT for cases/controls (second example, with the accuracies reversed)
Example: Exposure ascertainment is related to outcome
• 50% exposure accuracy in CASES
• 100% exposure accuracy in CONTROLS
Differential degree of accuracy between the compared groups
True association
            Smk+   Smk-
Cases        200    100
Controls     100    200
OR = 4.0

Results with D misclassification
            Smk+   Smk-
Cases        100    200
Controls     100    200
OR = 1.0
Variable effects; here: bias toward the null
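Both differential examples can be reproduced by giving cases and controls different sensitivities of exposure classification (perfect specificity assumed, as in the slides). The same true OR of 4.0 moves away from the null in one scenario and toward it in the other, which is why the direction of differential misclassification cannot be predicted in general. The function names are hypothetical.

```python
def classify(exposed, unexposed, sensitivity):
    """Record only `sensitivity` of truly exposed people as exposed."""
    recorded = exposed * sensitivity
    return recorded, unexposed + (exposed - recorded)


def odds_ratio(cases, controls):
    """cases and controls are (exposed, unexposed) pairs."""
    return (cases[0] * controls[1]) / (cases[1] * controls[0])


true_cases, true_controls = (200, 100), (100, 200)
scenarios = {
    "cases 100%, controls 50%": (1.0, 0.5),
    "cases 50%, controls 100%": (0.5, 1.0),
}
results = {}
for label, (se_cases, se_controls) in scenarios.items():
    results[label] = odds_ratio(classify(*true_cases, se_cases),
                                classify(*true_controls, se_controls))
    print(label, results[label])
# cases 100%, controls 50% 10.0
# cases 50%, controls 100% 1.0
```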

In Which Direction Does Misclassification Bias the Results?
Non-differential misclassification tends to bias the results in which direction?
○ Toward the null
○ Away from the null
○ Either toward or away from the null
Differential misclassification tends to bias the results in which direction?
○ Toward the null
○ Away from the null
○ Either toward or away from the null
Measurement Error (Misclassification): Solutions
Improve accuracy of collected information:
• Multiple measurements of exposure and disease (e.g., several BP readings, continuous monitoring of air pollution)
• Validation – corroborate the data using several sources (e.g., questionnaire data, medications, medical records, laboratory tests)
• Use the most accurate source of information available
• Use sensitive and specific criteria for exposure and disease
Information Bias (bias: effect; prevention)
• Recall bias: toward or away from the null
  – Use sick controls
  – Quality questionnaires
  – Use pre-existing data
  – Ask participants what they know
• Interviewer bias: toward or away from the null
  – Masking
  – Quality questionnaires
  – Train interviewers
• Measurement error: toward or away from the null
  – Sensitive and specific definitions of E and D
  – Accurate data sources
  – Multiple measurements
  – Validation
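Quick Recap: study designs prone to each bias
Selection bias
• Control selection bias: case-control
• Self-selection bias: case-control, cohort
• Surveillance, diagnostic, referral bias: case-control, cohort
• Loss to follow-up: cohort, experimental
Information bias
• Recall bias: case-control, cohort
• Interviewer bias: case-control, cohort, experimental
• Measurement error (misclassification): case-control, cohort, experimental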
Questions to Ask Yourself When Interpreting Study Results…
• Given the conditions of the study, could bias have occurred? Which types?
• Is bias actually present?
• Which direction is the distortion? Is it towards the null or away from the null?
• Are the consequences of the bias large enough to distort the measure of association in an important way? (qualitative, quantitative)
Practice Questions
Question 1
In a hospital-based case-control study, investigators selected controls with only one disease.
It turned out that this disease was also caused by the exposure.
Did this cause a bias? If so, what was the direction?
This is another way of saying that controls had a higher level of exposure than the source population, or that the number of exposed controls is too high.
Question 1 (answer)
Population targeted for study
            E+   E-
Cases        a    b
Controls     c    d

Biased study sample (too many exposed controls, C > c)
            E+   E-
Cases        a    b
Controls     C    d

OR = (a/C)/(b/d)
Yes, there was a bias, and the direction was toward the null: the inflated C makes the OR smaller, pulling a positive association toward 1.0.
Question 2
A study found a relative risk of 1.5. A reviewer critiqued the study. She pointed out ways that the estimate was biased by non-differential misclassification of exposure and argued that therefore there really may have been no association. Is the reviewer correct?
Question 2 (answer)
The reviewer is NOT correct.
Non-differential misclassification of exposure biases toward the null. Therefore, had there been no non-differential misclassification of exposure, the relative risk would likely have been even greater (further from 1.0).
Question 3
In a case-control study of condom use and urinary tract infection (UTI) in young women,
the investigators reported an odds ratio of 2.4 for the association between use of a
spermicide-coated condom in the previous month and UTI incidence (Fihn et al, 1996).
You suspect that the control women were less likely to remember that they had used a
spermicide-coated condom in the previous month than were the women with a UTI. If
such information bias occurred (assuming no other bias or confounding), it would have
led the investigators to observe an effect of spermicide-coated condom use on UTI that
was:


a. An overestimate of the true association (bias away from the null).
b. An underestimate of the true association (bias towards the null).
c. The same as the true association.
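Question 3 (answer)
Actual information for study
            E+   E-
Cases        a    b
Controls     c    d

Biased information collected (controls under-report exposure, so c is too small)
            E+   E-
Cases        a    b
Controls     c    d

OR = (a/c)/(b/d): a smaller c gives a larger OR.
Yes, there was a bias, and the direction is away from the null (answer a).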
