
Epidemiology Final Exam Outline:

Relative Risk

 Risk
o The probability of developing a disease
 Incidence is a measure of risk
 Prevalence is not a measure of risk
 Because prevalent cases include people whose disease was not newly developed during the period
o Relative risk is the ratio of the absolute risk for disease in an exposed group
versus an unexposed group over a defined time interval
 Examples of this measure are the risk ratio and the rate ratio (both are commonly
called relative risk, RR)
 RR = Incidence in exposed during specified time interval/ Incidence in
unexposed during specified time interval.
o The absolute risk used to calculate relative risk is the probability that a disease-free individual
will develop a given disease over a specified time interval
 Examples of this probability are: Cumulative incidence, attack rate, and
incidence rate
 Absolute risk shows the magnitude of risk in a group of people
 Absolute risk does not involve a comparison
 Absolute risk does not tell us the excess (or decreased) risk associated
with the exposure
 ? Which is why it is not used to calculate the interaction of
exposures.?
 Absolute risk does not provide any information about associations
between exposure and disease.
 Measures of morbidity
o Prevalence = (the number of cases of a disease in a population during a specified
period of time)/ (the number of persons in the population during that period of
time)
o Incidence:
 Cumulative incidence = (Number of new cases of a disease occurring in
the population during a specified period of time) / ( the number of
persons at risk of developing the disease during that period of time)
 Incidence Rate = (number of new cases of a disease occurring in a
population during follow-up) / (Total time at risk experienced by the
individuals in the population)
 ***Time at risk is typically measured in person-years (the time unit can be any
length, e.g., person-months or person-days)***
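The morbidity formulas above can be written as small functions. This is an illustrative sketch only: the function names, counts, and follow-up times are hypothetical, and the person-years total is simply the sum of each person's time at risk.

```python
# Minimal sketch of the morbidity measures above.
# All numbers are hypothetical and chosen only for illustration.

def prevalence(cases: int, population: int) -> float:
    """Cases present during the period / persons in the population."""
    return cases / population


def cumulative_incidence(new_cases: int, at_risk: int) -> float:
    """New cases during the period / persons at risk at the start."""
    return new_cases / at_risk


def incidence_rate(new_cases: int, person_time: float) -> float:
    """New cases / total person-time at risk (e.g., person-years)."""
    return new_cases / person_time


# Person-time: each participant contributes only the time they were at risk.
years_at_risk = [2.0, 5.0, 5.0, 3.5, 4.5]  # hypothetical follow-up, 5 people
person_years = sum(years_at_risk)           # 20.0 person-years

print(prevalence(30, 1000))                 # 0.03
print(cumulative_incidence(12, 400))        # 0.03 over the period
print(incidence_rate(2, person_years))      # 0.1 cases per person-year
```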
Study design        Prevalence   Incidence
Cross-sectional     YES          NO
Case-control        NO           NO
Cohort              YES          YES
Randomized trial    YES          YES

 Study designs, morbidity, and relative risk


o In a randomized trial
 The study is experimental
 The epidemiological type can be analytical or applied
 Analytical epidemiology is the “identification” of determinants
of health-related states or events
o Determinants can be causes, risk factors, or treatments
 Applied epidemiology is the application of knowledge of
distribution of determinants of disease to improve health
o Ex. Screening programs and Public Health programs
 Longitudinal
 Used to evaluate the effect of a treatment or a preventative regimen
 Population of the study is typically a representative sample of general
population
 As long as internal validity is not threatened or compromised the
results of the study can have strong generalizability when applied
to the population sampled from.
 Can measure incidence (and prevalence)
 You can assess the relative risk in a randomized trial study.
 A case-crossover study is a variation of the case-control design in which each
participant serves as both a case and their own control. This type of study
requires that the outcome and exposure are both acute and intermittent.
 Like a cross-sectional study, a case-crossover study cannot
measure incidence
o Therefore, the relative risk cannot be calculated.
o In a cohort study
 The study is observational
 This decreases the strength of any causal relationship found in the
study (a causal relationship cannot be conclusively established)
 This can also be analytical epidemiology or applied epidemiology
 Longitudinal
 Depicts the incidence or natural history of a disease
 Natural history:
 Selection of the population for a cohort study can be done in one of two ways:
 Representative sample
 Enroll exposed and unexposed individuals
 Can also measure incidence (and prevalence)
 Therefore, relative risk can be calculated
 BEGINS WITH EXPOSURE STATUS
 An alternative study design, the case-cohort study, selects its population as a
subset of an existing longitudinal cohort study: all cases are included, and a set
of controls (a subcohort) is randomly selected from everyone at baseline
 This type of study can estimate incidence and relative risk, if and
ONLY IF, specialized methods are used to account for
oversampling of cases
o In a case-control study
 The study is observational
 This type of study is used for descriptive and analytical epidemiological
investigations
 Descriptive epidemiology is the study of the “distribution” of health-related
states or events
 NOT longitudinal
 The population for the study is selected based on case and control status
 NOT able to measure incidence or prevalence
 Therefore, the relative risk cannot be calculated during a case-
control study
 BEGINS WITH THE DISEASE
 A nested case-control study is an alternative study design in which the
cases and controls are selected from within an existing longitudinal cohort
study.
 BUT incidence cannot be measured; therefore, the relative risk
cannot be calculated
o In a cross-sectional study
 The study is observational
 This type of study is used to describe the distribution of health-related
states or events occurring at a point in time
 NOT longitudinal (a “snapshot”)
 Population is typically selected to be a representative sample, but can
also be selected based on exposure (like a cohort)
 Like a snapshot of a cohort study
 Prevalence can be measured if the population is a representative sample;
however,
 Incidence cannot be measured
o Therefore, the relative risk cannot be calculated
 Measurements of association
o An association is a statistical dependence between two or more events,
characteristics, or other variables
 The purpose of an epidemiological study is to determine whether there is
an association between a factor or characteristic and the development of
a disease
o Examples
 RR, Odds Ratio (OR), Prevalence ratio, Attributable risk
o REMEMBER ASSOCIATION IS NOT CAUSATION
 Risk Ratio
o Risk Ratio = Cumulative Incidence of the exposed group / Cumulative Incidence
of the unexposed group
o Dose-response
 One way to look at dose-response is to compare RR’s from different
levels of exposure.
 Does the RR increase (does the effect increase) as exposure
increases?
 Rate Ratio
o Rate Ratio = Incidence Rate of exposed group / Incidence Rate of unexposed
group
 RR key:
o RR > 1 = Having the exposure increases the risk of developing disease
o RR = 1 = Having the exposure does not change the risk of developing disease
o RR < 1 = Having the exposure decreases the risk of developing disease
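A minimal sketch of the risk ratio, the rate ratio, the RR key, and a dose-response comparison, assuming hypothetical cohort counts and person-time; the function names are illustrative, not taken from the course materials. The interpretation function uses a small tolerance so floating-point noise does not turn a ratio of 1 into "increases" or "decreases".

```python
# Minimal sketch: risk ratio, rate ratio, and the RR key (hypothetical data).

def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Cumulative incidence (exposed) / cumulative incidence (unexposed)."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def rate_ratio(cases_exp, py_exp, cases_unexp, py_unexp):
    """Incidence rate (exposed) / incidence rate (unexposed), per person-time."""
    return (cases_exp / py_exp) / (cases_unexp / py_unexp)


def interpret(ratio, tol=1e-9):
    """Apply the RR key; tol guards against floating-point noise."""
    if ratio > 1 + tol:
        return "exposure increases risk"
    if ratio < 1 - tol:
        return "exposure decreases risk"
    return "exposure does not change risk"


rr = risk_ratio(40, 200, 10, 200)          # 0.20 / 0.05
irr = rate_ratio(30, 1500.0, 15, 1500.0)   # 0.02 / 0.01 per person-year
print(round(rr, 2), interpret(rr))
print(round(irr, 2), interpret(irr))

# Dose-response: RR at each exposure level vs. the unexposed reference.
levels = {"none": (10, 1000), "low": (20, 1000), "high": (40, 1000)}
ref_cases, ref_n = levels["none"]
for name, (cases, n) in levels.items():
    print(name, round(risk_ratio(cases, n, ref_cases, ref_n), 2))  # RR rises with dose (1.0, 2.0, 4.0)
```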

Odds Ratio

 Prevalence Ratio
o Similar to RR, except that PR uses prevalence instead of incidence
o Can be used in
 Randomized clinical trial (RCT)
 Cohort Study
 Cross-sectional Study
 This is the only one of these designs in which RR cannot be calculated
 Typically used only in cross-sectional studies, because RR can be used in RCTs
and cohort studies
 In practice, not often used (other options for cross-sectional studies are
available)
o PR = (Prevalence of disease of exposed group) / (Prevalence of disease of
unexposed group)
 Odds
o Odds = Probability of an event happening / Probability of an event NOT
happening
o Odds = P / (1 - P)
 Odds of Exposure
o Odds that the case was exposed = Probability that a case was exposed /
Probability that a case was NOT exposed
o Odds that the control was exposed = Probability that a control was exposed /
Probability that a control was NOT exposed
 Odds Ratio (OR) in a case-control study = Odds that a case was exposed / Odds that a
control was exposed
o This is the odds ratio of exposure
o Sometimes called relative odds
o This is the only measure of association that can be used in case-control studies
o May also calculate this for a cross-sectional study
 OR Key:
o OR > 1: Cases were more likely to be exposed, consistent with when exposure
increases the odds of disease.
o OR = 1: Cases and controls are equally likely to be exposed, consistent with when
exposure is not related to disease.
o OR < 1: Cases were less likely to be exposed, consistent with when exposure
decreases the odds of disease (or is protective against disease).
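The odds and odds-ratio definitions above can be sketched with a 2×2 table. The cell labels a, b, c, d and the counts are assumptions for illustration; the point is that the ratio of the two exposure odds simplifies to the cross-product (a × d) / (b × c).

```python
# Minimal sketch: odds and the odds ratio of exposure in a case-control 2x2 table.
# Cell labels are an assumption made for this example:
#   a = exposed cases,   b = exposed controls
#   c = unexposed cases, d = unexposed controls

def odds(p):
    """Odds = P / (1 - P)."""
    return p / (1 - p)


def exposure_odds_ratio(a, b, c, d):
    odds_case_exposed = a / c        # odds that a case was exposed
    odds_control_exposed = b / d     # odds that a control was exposed
    return odds_case_exposed / odds_control_exposed  # algebraically (a*d)/(b*c)


def interpret_or(or_value, tol=1e-9):
    """Apply the OR key above."""
    if or_value > 1 + tol:
        return "cases more likely exposed"
    if or_value < 1 - tol:
        return "cases less likely exposed (possibly protective)"
    return "exposure not related to disease"


# Hypothetical study: 60 of 100 cases exposed, 30 of 100 controls exposed.
or_exp = exposure_odds_ratio(a=60, b=30, c=40, d=70)
print(round(odds(0.60), 2))          # odds that a case was exposed
print(round(or_exp, 2), interpret_or(or_exp))
```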
 Protective Odds Ratio
o If the odds ratio is <1.0
 There are decreased odds, or a protective effect
 An example is on slide 26 of the PowerPoint
 Dose-response
o You can also observe a trend in dose-response with the OR
 Matched case-control studies
o Cases and controls are matched by a third variable such as age, sex, etc.
o This is a method to ensure that the case and control groups have the same proportions of ages,
sexes, etc., so that these variables do not distort the association between exposure and
outcome
o Concordant pairs: the case and control share the same exposure status
o Discordant pairs: the case and control have different exposure statuses
o To calculate the OR in a matched case-control study you will only use the
discordant pairs
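A minimal sketch of the matched-pairs calculation, using hypothetical pairs: the OR is the number of discordant pairs in which only the case was exposed divided by the number in which only the control was exposed; concordant pairs drop out of the calculation.

```python
# Minimal sketch: OR for a matched-pairs case-control study.
# Each pair is (case_exposed, control_exposed); the data are hypothetical.

def matched_pair_or(pairs):
    # Only discordant pairs are used; concordant pairs carry no information about the OR.
    case_only = sum(1 for case, ctrl in pairs if case and not ctrl)
    control_only = sum(1 for case, ctrl in pairs if ctrl and not case)
    return case_only / control_only


pairs = ([(True, True)] * 15       # concordant: both exposed
         + [(True, False)] * 20    # discordant: only the case exposed
         + [(False, True)] * 10    # discordant: only the control exposed
         + [(False, False)] * 55)  # concordant: neither exposed
print(matched_pair_or(pairs))      # 2.0 (20 / 10)
```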
 Odds ratio of exposure
o In a case-control study, we calculate the odds ratio of exposure
 This is because in a case-control study participants are selected based on
case/control status, so the estimate is based on the odds of exposure
among cases and controls.
 Odds of disease:
o Odds of disease among exposed = (Number of exposed with disease)/ (Number
of exposed without disease)
o Odds of disease among unexposed = (Number of unexposed with disease)/
(Number of unexposed without disease)
 Odds Ratio (OR) for cohort study
o OR = Odds of disease among exposed/ Odds of disease among unexposed
 Odds ratio of disease
o In a cohort study, we can calculate an odds ratio, but it is the odds ratio of
disease
 This is because in a cohort study participants are not selected by their
disease status (the disease develops over time), so you can compare the odds
of disease by exposure status
 Odds ratio vs. Relative risk
o Both are useful measures of association between exposure and disease
o RR can only be calculated in cohort or experimental studies because incidence
is needed for the calculation
o In case-control studies, only the OR can be calculated, because incidence cannot be
measured in this type of study design
o The two measures are related to each other
 OR tends to exaggerate the RR
 Can see this in a cohort study
o The two (OR and RR) are similar when:
 The cases (exposed) are representative, with regard to history of
exposure, of all people with the disease in the population from which
cases are drawn, and controls (unexposed) are representative, with
regard to history of exposure, of all people without the disease in the
population from which the cases were drawn,
 The exposed and unexposed participants would have to be
selected as a representative sample and not enrolled based on
exposed and unexposed status
 And the disease being studied does not occur frequently (< 10%).
 When a disease is rare, the number of exposed individuals with
the disease and the number of unexposed individuals with the
disease will be relatively small compared to the exposed and
unexposed individuals without the disease
 However, cohort studies are not used for rare diseases because
they are inefficient
o Cohort studies are used for rare exposures
o So, because a cohort study would typically not be used for a rare
disease in the first place (unless the exposure was also rare), a
similar OR and RR would typically not be seen in cohort studies.
o RR is the preferred measure of association because it incorporates the
development of disease, i.e., risk
o Under certain conditions such as: (1) a representative sample with regard to
history of exposure and (2) the disease does not occur frequently (Prevalence of
disease <10%) the OR can be used to approximate the RR.
o The OR always exaggerates the RR
 OR will be farther from the null value, making the effect observed seem
bigger
 If RR and OR > 1, the OR > RR
 If RR and OR <1, the OR < RR
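To see the rare-disease condition and the "OR exaggerates RR" rule numerically, the sketch below computes both measures from hypothetical cohort tables that share the same RR but differ in disease frequency, plus one protective example. The counts are invented for illustration.

```python
# Minimal sketch: RR vs. OR (of disease) from the same hypothetical cohort 2x2 table.
#   a = exposed with disease,   b = exposed without disease
#   c = unexposed with disease, d = unexposed without disease

def rr_and_or(a, b, c, d):
    rr = (a / (a + b)) / (c / (c + d))   # ratio of cumulative incidences
    or_ = (a / b) / (c / d)              # ratio of odds of disease
    return rr, or_


rare = rr_and_or(20, 980, 10, 990)       # incidence 2% vs 1%
common = rr_and_or(400, 600, 200, 800)   # incidence 40% vs 20%
protective = rr_and_or(10, 990, 20, 980) # incidence 1% vs 2%

for label, (rr, or_) in [("rare", rare), ("common", common), ("protective", protective)]:
    print(f"{label}: RR = {rr:.2f}, OR = {or_:.2f}")
# rare: RR = 2.00, OR = 2.02
# common: RR = 2.00, OR = 2.67
# protective: RR = 0.50, OR = 0.49
```

With a rare disease the two measures nearly coincide; with a common disease the OR moves noticeably further from the null than the RR, in either direction.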
 Study Designs & Measures of association

Study design              SIR/SMR   KM    RR    AR    OR    PR
Case-report/case-series   -         -     -     -     -     -
Cross-sectional           -         -     -     -     YES   YES
Case-control              -         -     -     -     YES   -
Cohort                    YES       YES   YES   YES   YES   YES
RCT                       -         YES   YES   YES   YES   -

Attributable Risk

 Also known as: "Risk difference"


 Attributable risk is another measurement of association
 Attributable risk can be viewed two ways:
o The amount or proportion of disease incidence (risk) that can be attributed to a
specific exposure
o The amount or proportion of disease incidence (risk) we can prevent if we can
eliminate the exposure.
 Types of attributable risk
o There are two types of attributable risk
 Attributable risk (in exposed) - AR
 The excess risk of disease among the exposed population
attributable to exposure
 The question answered in regard to the association between
exposure and disease is:
o Among those who were exposed, how much disease can
we attribute to the specific exposure?
 AR and AR% are typically how this measurement of association is
presented
 Population attributable risk – PAR
 The excess risk of disease in total population attributable to
exposure
 This measurement of association answers the question:
o In the total population (both exposed and unexposed),
how much disease can we attribute to the specific
exposure?
 Typically presented as PAR or PAR%
 Attributable risk (AR)
o AR = Risk of disease among exposed – Background risk of disease
o AR = incidence of exposed – Incidence of unexposed
 Gives the absolute excess risk of disease due to exposure in the exposed population
 Also known as risk difference
o AR% = [(incidence of exposed – Incidence of unexposed) / (Incidence of
exposed)] * 100%
 Gives the proportion of the risk for disease due to exposure in exposed
population
 Also known as attributable proportion, attributable fraction, etiological
fraction
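A sketch of AR and AR% with hypothetical incidences expressed per 1,000 person-years (the unit is an assumption for the example; any consistent unit works).

```python
# Minimal sketch: AR and AR% with hypothetical incidences per 1,000 person-years.

def attributable_risk(inc_exposed, inc_unexposed):
    """Excess risk among the exposed (risk difference)."""
    return inc_exposed - inc_unexposed


def attributable_risk_percent(inc_exposed, inc_unexposed):
    """Proportion of the exposed group's risk that is due to exposure, as a percent."""
    return (inc_exposed - inc_unexposed) / inc_exposed * 100


inc_exp, inc_unexp = 40, 10   # cases per 1,000 person-years (hypothetical)
print(attributable_risk(inc_exp, inc_unexp))          # 30 per 1,000 person-years
print(attributable_risk_percent(inc_exp, inc_unexp))  # 75.0 (%)
```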
 Population attributable risk (PAR)
o PAR = Risk of disease in total population – Background risk of disease
 Total population = both exposed and unexposed
o PAR = Incidence of total population – Incidence of unexposed
 If the exposure is positively associated with disease, the incidence of
disease in the exposed ≥ incidence in total population
 Gives the absolute excess risk of disease due to exposure in the total population
o PAR% = [(Incidence of total population –Incidence of unexposed)/ (Incidence of
total population)] * 100%
 The proportion of risk for the disease in the total population that is due
to exposure
 Also known as population attributable fraction
o Total population incidence is not always directly displayed
o To calculate it when it is unknown, you need:
 Incidence of disease in exposed
 Incidence of disease in unexposed
 Prevalence of exposure in total population
o To estimate the total population incidence
 Calculate a weighted average of incidence from the exposed and
unexposed groups
 Incidence of total population = (Incidence of exposed * % of population
that was exposed) + (Incidence of unexposed * % of population that was
unexposed)
 REMEMBER that when PAR is calculated from an estimated total population
incidence, its interpretation should include the key word: estimated
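When total-population incidence is not given, it can be estimated as the weighted average described above and then plugged into PAR and PAR%. The sketch below assumes hypothetical incidences (per 1,000 person-years) and an exposure prevalence of 25%.

```python
# Minimal sketch: PAR and PAR% when total-population incidence must be estimated.
# Incidences are per 1,000 person-years; all numbers are hypothetical.

def total_incidence(inc_exposed, inc_unexposed, prop_exposed):
    """Weighted average of exposed and unexposed incidence."""
    return inc_exposed * prop_exposed + inc_unexposed * (1 - prop_exposed)


def par(inc_total, inc_unexposed):
    return inc_total - inc_unexposed


def par_percent(inc_total, inc_unexposed):
    return (inc_total - inc_unexposed) / inc_total * 100


inc_total = total_incidence(40, 10, prop_exposed=0.25)  # estimated: 17.5
print(inc_total)
print(par(inc_total, 10))                     # estimated 7.5 per 1,000 person-years
print(round(par_percent(inc_total, 10), 1))   # estimated PAR%, about 42.9
```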
 AR, PAR & prevention
o Attributable risk (AR) (among exposed)
 Quantifies the potential for risk reduction in the exposed if the exposure is removed
 Clinical application
 (RCT)
o Population attributable risk (PAR)
 Quantifies the potential for risk reduction if the exposure is eliminated
from the population
 Public health application
 Ex. Slide 35
 This holds only if the exposure is the true cause: if confounders are present
and substantially influence the association between the exposure
and disease, attributable risk can be exaggerated, and an
intervention may in turn be less effective than expected, or not effective at all
 It is important to assess if the exposure can be eliminated
o This is assessed relatively: the amount of money available, or the
alternatives present, affects whether an exposure "can" be
eliminated.
 Quantifying the impact of the intervention (elimination of
exposure)
o If AR is high, but exposure is rare --> small overall benefit
o If AR is lower, but exposure is very common --> larger
benefit
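The point about rare vs. common exposures follows from the weighted-average formula: I_total - I_unexposed = p × I_exposed + (1 - p) × I_unexposed - I_unexposed = p × (I_exposed - I_unexposed), so PAR = (proportion exposed) × AR. A sketch with two hypothetical exposures (incidences per 1,000 person-years):

```python
# Minimal sketch: a high AR with a rare exposure vs. a lower AR with a common
# exposure. Uses PAR = (proportion exposed) * AR; both exposures are hypothetical.

def par_from_ar(inc_exposed, inc_unexposed, prop_exposed):
    ar = inc_exposed - inc_unexposed
    return prop_exposed * ar


exposures = {
    "rare, high AR": (100, 10, 0.01),    # AR = 90, 1% of population exposed
    "common, lower AR": (20, 10, 0.50),  # AR = 10, 50% of population exposed
}
for name, (ie, iu, p) in exposures.items():
    print(f"{name}: AR = {ie - iu}, PAR = {par_from_ar(ie, iu, p):.1f}")
```

The common exposure yields the larger PAR (5.0 vs. 0.9 per 1,000 person-years), even though its AR is much smaller.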
 Types of risk measures
o Absolute Risk
 Magnitude of risk
 Measurements of morbidity: Incidence and Prevalence
o Relative Risk
 Measure of relative risk (strength of association)
 Measured in etiological research
o Attributable Risk
 Measure of absolute excess risk
 Measured in public health and clinical practice
Regression Modeling

 Hypothesis testing
o Research question—the first step before any epidemiological study can be
designed
 Must be clearly defined and explicitly stated
o Study hypothesis
 Null hypothesis (Ho)
 the exposure is not associated with the outcome
 Alternative Hypothesis (Ha)
 The exposure is associated with outcome
o Association
 If your results do not reject the null hypothesis --> no statistically significant association (not proof that there is no association)
 If your results reject the null hypothesis, which by default accepts the
alternative hypothesis --> there is an association
 Statistical significance 

o Statistical significance refers to whether the change in a measurement
observed in a study is different enough from the null (no effect) value that the
observed change is not likely to have occurred by chance alone.
o A statistically significant result is not likely to have occurred by chance alone 
o Statistical significance is assessed with p-values and confidence intervals (CI) 
 P-values  
o A p-value is a value used to illustrate the chance that you would get a result as or
more extreme than the result observed assuming the null hypothesis is true 
o A p-value < 0.05 is considered statistically significant  
o P <0.05 means that there is less than a 5% chance that a result as strong or
stronger would have been observed if there was really no association.  
 Confidence Interval (CI) 
o Also known as confidence limit  
o A 95% confidence interval is constructed so that, if you repeated your study many times and
computed an interval each time, about 95% of those intervals would contain the true value
o Statistical significance of the confidence interval  
 If the confidence interval includes the null value, the result is not
statistically significant  
 If the confidence interval does not include the null value, the result is
statistically significant
o Unlike the p-value, the confidence interval can give some idea of the magnitude
of the association (i.e. how much it may vary)
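The link between confidence intervals, the null value, and sample size can be sketched for a risk ratio. This uses the common log-scale (Katz) method with z = 1.96; that choice is an assumption here, and other CI methods exist. Both hypothetical cohorts have RR = 2.0, but only the larger one gives an interval that excludes 1.

```python
import math

# Minimal sketch: 95% CI for a risk ratio using the log-scale (Katz) method.
# One standard approach among several; not necessarily the course's method.

def rr_ci(a, n_exp, c, n_unexp, z=1.96):
    """a, c = cases in the exposed/unexposed groups; n_exp, n_unexp = group sizes."""
    rr = (a / n_exp) / (c / n_unexp)
    se_log_rr = math.sqrt(1 / a - 1 / n_exp + 1 / c - 1 / n_unexp)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


def significant(lower, upper, null=1.0):
    """Statistically significant at this level if the CI excludes the null value."""
    return not (lower <= null <= upper)


# Same RR (2.0) in a large and a small hypothetical cohort.
for a, n1, c, n0 in [(40, 200, 20, 200), (4, 20, 2, 20)]:
    rr, lo, hi = rr_ci(a, n1, c, n0)
    print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}, significant: {significant(lo, hi)}")
```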
 Sample size  
o A key factor that influences statistical significance is the sample size 
 With more participants you can detect a smaller difference between
groups 
 With a large enough sample size, almost any comparison may be
statistically significant  
o It is also possible to have a statistically significant effect that is not considered
clinically relevant
o Larger sample sizes lead to increased power
o Smaller sample sizes lead to decreased power
o Clinical relevance vs. population relevance
 Number of comparisons  
o Type 1 error is still possible with a p-value set at p = 0.05 or with a 95%
confidence interval  
o If you make 20 comparisons, you would expect that 1 would be significant by
chance alone.  
 With a very large number of comparisons, it is likely that some
observations will occur by chance as opposed to a true association.
 Some researchers address this possibility by using a smaller cutoff for the
p-value when a study involves a large number of comparisons.
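The "1 in 20" expectation and the smaller-cutoff idea can be checked with a few lines of arithmetic. This sketch assumes every null hypothesis is true and the tests are independent, which real studies rarely guarantee; the Bonferroni cutoff (alpha divided by the number of comparisons) is used as the example of a smaller cutoff.

```python
# Minimal sketch of multiple-comparisons arithmetic, assuming all null
# hypotheses are true and the tests are independent (simplifying assumptions).

def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value against alpha divided by the number of comparisons."""
    cutoff = alpha / len(p_values)
    return [p < cutoff for p in p_values]


alpha = 0.05
m = 20                                      # number of comparisons

expected_false_positives = m * alpha        # about 1 "significant" result by chance
p_at_least_one = 1 - (1 - alpha) ** m       # chance of at least one false positive
bonferroni_alpha = alpha / m                # stricter per-comparison cutoff

print(round(expected_false_positives, 2))   # 1.0
print(round(p_at_least_one, 2))             # 0.64
print(round(bonferroni_alpha, 4))           # 0.0025
```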

 Basic concepts of regression


o Regression
 Best line of fit for a scatterplot
 Used to characterize the slope of the relationship between the exposure
and the outcome
 It shows how the outcome changes with each one-unit increase in
exposure
o Regression methods find the model that minimizes the “distance” between the
data points and the model
o Regression models
 Types:
 Explanatory models
 Predictive models
 Computer programs are used to fit models to the most complex data sets
 There are major benefits for using regression modeling
 Most systems are too complex to model with simplified models
(2×2 tables)
 Models allow for you to adjust for additional variables at the same
time (i.e., like age-adjusted rates but for numerous variables)
 Regression model variables
 Dependent variable
o The variable you are trying to predict
o Outcome (disease)
o In a scatterplot, typically on the y-axis
 Independent variable (s)
o Variable(s) used to predict the dependent variable
o Exposure(s) or risk factors
o In a scatter plot, typically on the x-axis
o A crude regression model only includes 2 variables (exposure and outcome)
o A regression model with many variables (exposure, outcome, and additional
covariates) is called:
 Multiple, multivariable, adjusted
 Not called multivariate
o There are many types of models that can be used to describe these relationships,
but all are not always appropriate to use for every situation
 The limitations and assumptions made to simplify a model influence which
models may or may not be “appropriate”
o Factors that affect the selection of a model are:
 The type of outcome variable
 The type of study design
 The correlation between different variables in the model
 Restrictions (or not) on the shape of the modeled line
 The statistical method used
 Parametric, nonparametric, Bayesian
o Regression models are used to present information about associations, not
about causation
o Linear regression
 Straight line
 Only two variables
 E(Y) = α + βx; E(Y)—expected value of outcome, x—exposure, α—y-
intercept, β—slope
 A linear model predicts a continuous variable; the outcome can range
from -∞ to ∞
 Assumptions made for linear model:
 The residuals (distance actual points are from the regression line)
are normally distributed
 Variance of the residuals is constant
 The relationship between X and Y can be described with a line
 Interpretation of β:
 If exposure has no effect—if the null hypothesis is true—the slope
will equal 0; horizontal line, E(Y) = α (the y-intercept)
 If exposure has an effect
o Positive association; β > 0
o Negative association; β < 0
 β is statistically significant when the p-value < 0.05 or when the
95% CI does not include 0.
 Multiple linear regression
 Used to adjust for covariates
 Adds another variable to the model:
o E(Y) = α + β₁x₁ + β₂x₂
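A minimal sketch of how the slope and intercept of E(Y) = α + βx are found by least squares for one exposure, using hypothetical data that lie exactly on a line so the result is easy to check. Multiple linear regression with several covariates needs matrix methods (typically a statistics package) and is not shown; Python 3.10+ also offers `statistics.linear_regression` for the single-exposure case.

```python
# Minimal sketch: fitting E(Y) = alpha + beta*x by ordinary least squares,
# using the closed-form solution for a single exposure. Data are hypothetical.

def fit_simple_linear(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sxy / sxx                # slope: change in E(Y) per one-unit increase in x
    alpha = y_bar - beta * x_bar    # y-intercept
    return alpha, beta


exposure = [0, 1, 2, 3, 4]
outcome = [2, 5, 8, 11, 14]        # lies exactly on y = 2 + 3x
print(fit_simple_linear(exposure, outcome))          # (2.0, 3.0)

# Null case: outcome does not change with exposure, so the slope is 0
# and E(Y) equals the intercept (here 7).
print(fit_simple_linear(exposure, [7, 7, 7, 7, 7]))
```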
 Logistic regression
o Used when the outcome is dichotomous
 i.e., yes vs. no
 used to predict the probability of the outcome, not the actual value of
the outcome
 this model is by definition limited to values between 0-1
o commonly used in case-control studies
 outcome is either case or control in this study design
o Uses the logit function, logit(p) = ln[p / (1 - p)], to model the probability (p)
o The coefficient in a logistic regression, β, equals the ln (odds ratio); exp(β) = odds
ratio
 Generally, the odds ratio is reported
 The same interpretation as for any odds ratio applies
 Null value = 1; so, confidence intervals that include 1 are not statistically
significant
o A multiple logistic regression model is similar to a multiple linear regression
o Other commonly used logistic models listed in PowerPoint; slides 61-62
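A sketch of the logit link and of converting a coefficient into an OR with a 95% CI. It relies on a known special case rather than a fitting routine: with a single binary exposure, the logistic-regression β equals ln(ad/bc) from the 2×2 table, and its standard error equals Woolf's formula, √(1/a + 1/b + 1/c + 1/d). The counts and the z = 1.96 choice are assumptions for the example.

```python
import math

# Minimal sketch: logit link and exp(beta) = OR with a 95% CI.
# Single binary exposure, so beta and its SE come directly from the 2x2 table.
# Counts are hypothetical.

def logit(p):
    return math.log(p / (1 - p))


def inv_logit(z):
    return 1 / (1 + math.exp(-z))


def beta_and_se_from_2x2(a, b, c, d):
    """a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    beta = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return beta, se


def or_with_ci(beta, se, z=1.96):
    """exp(beta) is the OR; the CI is exp(beta +/- z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


beta, se = beta_and_se_from_2x2(60, 30, 40, 70)
or_, lower, upper = or_with_ci(beta, se)
print(f"beta = {beta:.3f}, OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print("significant:", not (lower <= 1 <= upper))   # null value for an OR is 1
```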
 Model building
o Model size and sample size
 Generally, the larger the sample size the more variables you can put in
the model equation and still get a good result
 Parsimonious models are generally preferred
 Using the least # of covariates while still having a good
explanation of the data
o Methods for variable selection
 The kitchen sink method—use everything
 Have to have a sufficient sample size to support this approach
 Stepwise model fitting—Computer selects “best fitting” variables
 Downside: this can create biased estimates
 Both the kitchen sink and stepwise approaches
 Consider only the statistical correlation of variables, not biological
relevance
 Potential to include variables in a nonsensical way
 They target the associations/ predictions, but these associations
and predictions are not always helpful in thinking about the true
causal relationships
 Use variables identified in previous studies as important
 Use the variables that you identify as being associated with the
exposure-outcome relationship
 Create an a priori theoretical model of how variables interact, and make
decisions using the model (likely to become the standard)
 The latter three methods for variable selection are preferred
o Multicollinearity
 Model fitting does not work well if covariates are highly correlated with
each other
 Examples of this: slide 68
 Solutions for this: slide 68
Interpreting studies

 Measures of association
o Examples:
 RR, OR, PR, AR, PAR and regression coefficients
 Introduction to causal inference
o Causation
 Association is not causation
 In order to improve public health, we have to distinguish true causes from
factors which simply have an association with health outcomes
 Causal inference
 Causal inference is the process of drawing a conclusion about
whether a relationship is causal
 Components of causal inference:
o Evaluation of the likelihood of a causal association in a
given study
o Evaluation of all epidemiology studies
o Evaluation of all data
 Inference from an individual study
o First, is there an association in the study
o Second, is the association in an individual study likely to be
causal?
o If there is a significant association:
 It might be real
 Statistical significance
o It may be causal
o It might not be causal
 Confounding
 It might occur by chance (spurious)
 Statistical insignificance
 It might not be real (i.e. there is not really an
association)
 bias
 Statistical significance
o Factors that influence statistical significance
 Sample size
 Multiple comparisons
 Bonferroni corrections (i.e., stricter p-value or confidence interval
thresholds) are solutions for controlling the influence of
multiple comparisons on the significance of an association
 Borderline significance
o A term used to describe results which are close to being statistically significant
 Typically, p is between 0.05 and 0.10
o Rationale for using borderline significance
 Cutoff points are arbitrary anyway
 Significance is highly dependent on the sample size
 In some studies, it can be difficult to obtain a large sample size
Confounding

 Confounding
o In confounding a third variable (the confounder) alters the observed association
between exposure and outcome
 The causal association we are interested in is masked by the effect of the
confounder
o Confounding generally represents a real association within data
 It is not caused by error (bias) or by chance
 But when you do not address confounding, it prevents you from
determining causal relationships
o The classic criteria for a confounder are as follows:
 (1) The confounding factor must be causally associated with the outcome
(disease)
 (2) The confounding factor must be causally or non-causally associated
with exposure
 (3) The confounding factor must not be an intermediate variable in the
causal pathway between exposure and disease
o Positive confounding: the exposure-outcome association is exaggerated (moved
further away from the null value)
 The magnitude of the crude OR is exaggerated
 Crude OR > Adjusted OR (when the ORs are above 1)
o Negative confounding: the exposure-outcome association is attenuated (moved
closer to the null value)
 The magnitude of the crude OR is attenuated
 Crude OR < Adjusted OR (when the ORs are above 1)
 Intermediate variable – mediator
o A mediator represents an intermediate effect between exposure and disease, or
an effect in the causal pathway between exposure and disease.
o NOT a confounder
o Example: slide 54
 Confounding, observational & experimental studies
o Confounding is the most important cause of spurious associations in
observational epidemiology studies
o Confounding is not a problem in experimental lab studies
 They are designed so that the only difference between study groups is the
exposure; thus, no confounder is possible
o Confounding is usually not present in randomized trials
 Randomization is performed so that the comparison groups are as similar
to each other as possible with respect to every variable except for the
exposure
 Identifying confounders
o During design
 Biological model or underlying theory should allow you to specify
potential confounders in advance of study/analysis
 Collect information on potential confounders when possible
o During analysis
 Assess for confounding in a systematic way
 Known or potential confounding factors
 Other factors not previously known to be confounding factors, but which may be
confounders in your population
 Evaluate by comparing the distribution of the factor by both exposure and
outcome
o Confounding in regression
 “informal rule”
 Alternate method to identify confounders in multiple regression
models
 Compare how much the estimate for your exposure changes
when using
o A model with the confounder
o A model without the confounder
o Example: slides 22-24
o Identifying confounders using only your data is not suggested, because your data
can still be influenced by other variables or flaws in data collection, data analysis,
etc.
 It is critical to use other methods as well
 Prior (external) information
 Evaluation of study design and conduct
 DAGs—Directed Acyclic Graphs
 Type of causal diagram
o Directed: indicates direction of effect
o Acyclic: no path creates a circular loop
o Graph: created in graphical form
o Method used to select confounders
 Solutions for confounding
o At design stage
 Restriction (eligibility criteria)
 Matching (matched case-control study)
 Individual and group matching
 Randomization (randomized trial)
o At analysis stage
 Direct or indirect adjustments
 Age adjusted rates
o Adjusted rates are relative indexes rather than actual
measures of risk
 SMRs
 Example: slides 31-34
 Stratified analysis
 Divide your population by different values of the confounder
and analyze within each subgroup
 Interpretation for stratified analysis: slides 36-41
o Can be done with RR, OR, or even PR (any measure of
association)
 Mantel-Haenszel Pooled Estimates
 Slide 42
 Regression analysis
 Adjusted models = adjust for confounders
 Slide 43
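A sketch of the stratified-analysis and Mantel-Haenszel ideas above, with invented strata built so that the exposure has no effect within either stratum (stratum ORs = 1) while the confounder is associated with both exposure and disease. The crude OR is pulled away from the null (positive confounding), whereas the MH pooled OR, Σ(aᵢdᵢ/nᵢ) / Σ(bᵢcᵢ/nᵢ), recovers 1. The percent change printed at the end mirrors the "informal rule"; a 10% change is a commonly cited cutoff, but that threshold is an assumption, not something stated in these notes.

```python
# Minimal sketch: crude vs. Mantel-Haenszel (MH) pooled OR for hypothetical
# case-control data stratified by a confounder (e.g., smoking status).
# Each stratum is (a, b, c, d): a = exposed cases, b = exposed controls,
# c = unexposed cases, d = unexposed controls.

def crude_or(strata):
    """Collapse the strata into one 2x2 table, ignoring the confounder."""
    a = sum(s[0] for s in strata)
    b = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    d = sum(s[3] for s in strata)
    return (a * d) / (b * c)


def mh_or(strata):
    """Sum(a*d/n) / Sum(b*c/n) across strata, where n is the stratum total."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


strata = [
    (90, 30, 30, 10),   # confounder present: stratum OR = 1
    (10, 30, 30, 90),   # confounder absent:  stratum OR = 1
]
crude, adjusted = crude_or(strata), mh_or(strata)
change = (crude - adjusted) / adjusted * 100   # percent change in the estimate
print(f"crude OR = {crude:.2f}, MH-adjusted OR = {adjusted:.2f}, change = {change:.0f}%")
# crude OR = 2.78, MH-adjusted OR = 1.00, change = 178%
```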
 Types of confounding
o Induced
 It is possible to create (induce) confounding where it did not exist
previously
 In selection of participants
o Selection is based on criteria associated with exposure
(cohort) or outcome (case-control)
 In matching of data
o Matching subjects in a case-control study on a confounder
can make this a confounder even if it is not a risk factor for
disease
 In analysis
o Controlling for a variable which is not a confounder may
inadvertently create confounding elsewhere
o Residual
 When strata of a variable are broad, there may be confounding within
the strata
 Confounding remaining due to an inaccurately measured confounding factor
 Lack of adjustment for factors that are confounders
o Unknown
 Known knowns
 Confounders that you are aware of and have been accounted for
by design or analysis
 Known unknowns:
 Confounders that you are aware of but were not able to account
for
 Unknown unknowns:
 Confounders that you are unaware of and were not able to
account for
Bias

 Bias is any systematic error that results in a mistaken estimate of an exposure's effect on
the risk of disease
o Can occur in design, conduct, or analysis
o Affects internal validity
 Types of error
o Random
 Statistical variation
 Confidence interval
o Systematic (Bias)
 Main types of bias:
 Selection Bias
o Systematic errors in selecting study participants
o Distort the relationship between exposure and outcome
 Information Bias
o Systematic errors in collecting information
o Mistakes in exposure, disease status
o Distorts the relationship between exposure and outcome
 Selection Bias
o Occurs during the process used to recruit and enroll participants and results in a
distorted relationship between the exposure and outcome
 Examples of this can be seen in a
 Cohort study: exposed vs. Non-exposed
 Case-control study: diseased vs. Non-diseased
o Types of selection bias
 Response bias
 Differential loss to follow-up
 Participation in the study is related to exposure (cohort) or
disease (case-control)
o At enrolment (agreement/refusal)
o During follow-up (response/ non-response)
 If subjects in a particular exposure-disease group are more likely
or less likely to participate than other subjects, the observed
measure of association will be biased
 Example on slides: 25-29
 Exclusion bias
 Systematic difference in eligibility criteria of cases/controls
(exposed/ unexposed) that is related to exposure (or disease)
 Important to ensure that the only difference in eligibility of cases
and controls is the disease status
 Example on slide 31
 Berkson's bias
 Applied to hospital-based case-control studies
 Systematic difference in case/control selection
o Occurs when the combination of exposure and disease
increases the risk of admission to hospital
 Example on slide 32: coffee and pancreatic cancer
 Neyman's bias
 Also known as incidence-prevalence bias
 Exposures are related to survival or to disease status
o Especially when incidence may precede diagnosis
 Case-control example: on slide 33
 Cross-sectional example: on slide 33
 Typically leads to an underestimation of odds ratio (OR)
 Surveillance/ diagnosis bias
 Selection bias in case-control studies
 Individuals with known risk factors are more likely to be diagnosed
with disease due to increased medical surveillance
o Those with exposure are more likely to have identified
disease (especially subclinical)
 Individuals with a family history of cancer may be more likely to
have cancer screening tests
 Diabetics are more frequently screened for development of
hypertension
 Generalizability vs. Selection bias
 Remember generalizability relates to the external validity of a
study and the study populations.
o Is the study population similar to the reference
population?
 Population at risk?
 Selection bias impacts the internal validity of a study
o This happens when there is a difference in how the groups
for the study population are selected, which results in
biased risk estimates.
o When internal validity is threatened or compromised,
external validity is also compromised, which decreases the
generalizability to the reference population.
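Selection bias can be summarized by the probability that each exposure-disease group ends up in the study. The sketch below (hypothetical counts and probabilities) applies Berkson-type admission probabilities to a source population with no association. Because every cell is multiplied by its own selection probability, the observed OR equals the true OR multiplied by (s_a × s_d) / (s_b × s_c), where s_a, s_b, s_c and s_d are the selection probabilities of exposed cases, exposed non-cases, unexposed cases and unexposed non-cases; the same logic covers response bias and differential loss to follow-up.

```python
"""Selection bias as differential selection probabilities (hypothetical numbers)."""


def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a=exposed cases, b=exposed non-cases,
    c=unexposed cases, d=unexposed non-cases."""
    return (a * d) / (b * c)


def observed_table(true_counts, select_prob):
    """Expected counts that end up in the study, given the probability that
    each exposure-disease cell is selected (enrolled, admitted, retained)."""
    return tuple(n * p for n, p in zip(true_counts, select_prob))


# Source population: no association between exposure and disease (OR = 1).
population = (100, 900, 100, 900)

# Berkson-type scenario: having both exposure and disease raises the chance of
# hospital admission, so exposed cases are over-represented among hospital cases.
admission_prob = (0.9, 0.3, 0.5, 0.3)

study = observed_table(population, admission_prob)
print(f"true OR:     {odds_ratio(*population):.2f}")
print(f"observed OR: {odds_ratio(*study):.2f}")
```

Here the true OR is 1.00 but the hospital-based OR is 1.80, created entirely by who gets admitted.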
 Information bias
o Systematic difference in the way the information on exposure or disease is
obtained from study groups
o Results in participants being incorrectly classified as either exposed or
unexposed/ diseased or not diseased
 Misclassification of exposure or disease
 Information bias results from misclassification
 Misclassification of exposure
o Exposed as unexposed
o Unexposed as exposed
 Misclassification of outcome
o Disease as non-diseased
o Non-diseased as diseased
 Differential vs. Non-differential
o Non-differential misclassification
 Error in assessing exposure (or disease) is similar
between comparison groups
 Measure of effect tends toward the null
o Differential misclassification
 Error in assessing exposure (or disease) differs in
comparison groups
 May increase or decrease measure of effect
 Example on slides: 45-50
o Occurs after subjects have entered the study
o Types of information bias
 Recall bias
 Of particular concern in case-control studies
 Systematic error due to differences in accuracy or completeness
of recall (memory)
o Cases tend to recall exposure more than controls
o Results in an overestimate of the OR
 More common with exposures that are
o Involuntary
o Not associated with social stigma
 Example: Home pesticide use and birth defects
 Reporting bias
 Of particular concern in case-control studies and cross-sectional
studies
 Exposures that are associated with a stigma are likely to be
underreported
o Attitudes, perceptions, or beliefs about exposure
o Exposures that are not socially acceptable
o Results in an underestimation of the OR
 Example: Alcohol consumption and fetal alcohol syndrome
 Interviewer bias
 Systematic error due to interviewers' subconscious or conscious
influence on data gathering
o Might more thoroughly question cases
o Usually a problem in case-control studies
o Generally, more of a problem when data gathered is
subjective
 Similar biases
o Observer bias
 Seen in RCTs
 Minimized with double blinding
o Responder (interviewee) bias
 Minimized with blinding of participants
 Surveillance (diagnosis) bias
 Information bias in a cohort study
 When exposed are more likely to be under medical surveillance
o Disease is more likely to be diagnosed
o Especially when subclinical
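The direction of misclassification bias can be checked with expected counts. The sketch below assumes a hypothetical case-control study and invented sensitivity/specificity values for exposure classification. With the same error rates in cases and controls (non-differential), the OR for a binary exposure moves toward 1; with more complete recall among cases (differential, the recall-bias pattern), it moves away from 1 in this example.

```python
"""Expected effect of exposure misclassification on the OR (hypothetical numbers)."""


def classify(exposed, unexposed, sensitivity, specificity):
    """Expected (classified exposed, classified unexposed) counts when exposure
    is measured with the given sensitivity and specificity."""
    called_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    return called_exposed, exposed + unexposed - called_exposed


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


true_cases = (300, 700)      # (truly exposed, truly unexposed)
true_controls = (150, 850)

print(f"true OR: {odds_ratio(*true_cases, *true_controls):.2f}")

# Non-differential: same error rates in cases and controls.
nd = classify(*true_cases, 0.8, 0.9) + classify(*true_controls, 0.8, 0.9)
print(f"non-differential OR: {odds_ratio(*nd):.2f}")   # pulled toward 1

# Differential (recall-bias pattern): cases report past exposure more completely.
diff = classify(*true_cases, 0.95, 0.9) + classify(*true_controls, 0.7, 0.95)
print(f"differential OR: {odds_ratio(*diff):.2f}")     # here pushed away from 1
```

The printed ORs are about 2.43 (true), 1.74 (non-differential) and 3.18 (differential). Differential error is not always away from the null; other error patterns can push the estimate in either direction.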
 Solutions to Bias
o Avoid bias by
 Implementing clearly defined inclusion/ exclusion criteria
 Minimizing loss to follow-up
 Retention methods and tracing
 Applying the same methods for assessing exposure (or disease) for all
participants
 Training (and retraining) the data collection personnel
 Improved reliability
 Blinding (mask) interviewers and study participants
 Using a control group of diseased individuals—or multiple control groups
 Measurement methods
 Are methods valid and reliable?
o Validity—accuracy
 Indicates how close a measurement is to the truth,
or the true state of nature.
 It is assessed with
 Sensitivity
o The probability of a positive test
given that the person truly has
disease as determined by a "gold
standard" test
o Example on slide 24 (screening test)
 Specificity
o The probability of a negative test
given that the person truly does not
have disease as determined by a
"gold standard" test
o Example on slide 25 (screening test)
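 Positive predictive value
o The probability that a person with a
positive test truly has disease
 Negative predictive value
o The probability that a person with a
negative test truly does not have disease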
o Reliability—Repeatability
 Agreement between multiple measurements on
the same sample
 Intra-observer (intra-subject) variation
 Inter-observer variation
 Assessed with:
 Percent agreement
 Kappa statistic
 Consider using multiple measurements
o Sequential testing
 Sensitivity reduced
 High specificity method
 Example on slide 40 (screening test)
o Simultaneous testing
 High sensitivity method
 Specificity reduced
 Example on slide 45 (screening test)
 Regular calibration of instruments
o This improves the reliability
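The validity measures above come from a 2×2 table comparing the test with a gold standard. The sketch below uses invented counts. The formulas for sequential (serial) and simultaneous (parallel) testing assume the two tests err independently given true disease status, an assumption that often does not hold exactly in practice.

```python
"""Validity of a screening test against a gold standard (hypothetical counts)."""


def validity(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # P(test+ | disease)
        "specificity": tn / (tn + fp),  # P(test- | no disease)
        "ppv": tp / (tp + fp),          # P(disease | test+)
        "npv": tn / (tn + fn),          # P(no disease | test-)
    }


def sequential(sens1, spec1, sens2, spec2):
    """Positive only if BOTH tests are positive (assumes conditionally independent tests)."""
    return sens1 * sens2, 1 - (1 - spec1) * (1 - spec2)


def simultaneous(sens1, spec1, sens2, spec2):
    """Positive if EITHER test is positive (assumes conditionally independent tests)."""
    return 1 - (1 - sens1) * (1 - sens2), spec1 * spec2


# 1,000 people: 100 with disease by the gold standard, 900 without.
print(validity(tp=80, fp=90, fn=20, tn=810))

# Combining this test (sens 0.80, spec 0.90) with a second one (sens 0.90, spec 0.95).
print("sequential:   sens=%.3f spec=%.3f" % sequential(0.80, 0.90, 0.90, 0.95))
print("simultaneous: sens=%.3f spec=%.3f" % simultaneous(0.80, 0.90, 0.90, 0.95))
```

With these numbers the single test has sensitivity 0.80, specificity 0.90, PPV about 0.47 and NPV about 0.98. Sequential testing gives sensitivity 0.720 and specificity 0.995; simultaneous testing gives 0.980 and 0.855. PPV and NPV also depend on prevalence (10% in this example), unlike sensitivity and specificity.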
o Assess whether bias exists by
 Comparing participants to nonparticipants
 Responders and non-responders
 Retained participants and those lost to follow-up
 Analyzing data by potential sources of bias
 Interviewer
 Laboratory batch
 Exposure assessment method (if varied)
 Outcome assessment method (if varied)
 Secondary control group
o Eliminate bias when possible
 Frequently, this is not possible because extent/direction of bias is often
unknown
 If extent of bias is known:
 Selection bias: make groups comparable
 Information bias: correct error in exposure/ disease assessment
 Examples of bias slides: 58-64
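Percent agreement and the kappa statistic, listed under reliability above, can be computed from a 2×2 agreement table for two observers. The sketch below uses Cohen's kappa for a binary rating and invented counts.

```python
"""Inter-observer reliability: percent agreement and Cohen's kappa (hypothetical data)."""


def agreement(both_pos, pos_neg, neg_pos, both_neg):
    """Two observers rate the same n subjects as positive/negative.
    pos_neg = observer 1 positive, observer 2 negative; neg_pos = the reverse."""
    n = both_pos + pos_neg + neg_pos + both_neg
    observed = (both_pos + both_neg) / n
    # Agreement expected by chance from each observer's marginal totals.
    obs1_pos = both_pos + pos_neg
    obs2_pos = both_pos + neg_pos
    expected = (obs1_pos * obs2_pos + (n - obs1_pos) * (n - obs2_pos)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


pct, kappa = agreement(both_pos=40, pos_neg=10, neg_pos=5, both_neg=45)
print(f"percent agreement: {pct:.0%}, kappa: {kappa:.2f}")
```

This prints 85% agreement but a kappa of 0.70, because 50% agreement would be expected by chance alone given the observers' marginal totals.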
Interaction

 Confounding, bias, interaction


o All may affect inferences
 A significant association that is not a real association
 Might be explained by bias
 A significant association that is not causal
 Might be explained by confounding
 Associations that change due to a third variable (but not a confounder)
 Might be explained by interaction
 Interaction (also called effect modification)
o Interaction is present when the measure of effect between exposure and outcome
changes across values of a third variable
o The third variable is an effect modifier
o Some terminology
 Interaction
 Biological interaction
 Effect modification
 Effect-measure modification
 Synergism vs. Antagonism
o Positive interaction (synergism)
 The result of combined exposure + effect modifier is greater than would be
expected
o Additive interaction
 The additive model is where the effect of one factor is added to another
 Used with incidence and attributable risk
 If the observed result > the expected result (i.e., sum of both
factors), then we conclude that interaction exists on an additive scale
 In an additive interaction, we would expect that the result would
be "greater than the sum of the parts"
 The expected value = sum of effect of individual factors
 If actual (observed value) is greater than expected, we can
conclude that there is an additive interaction
 Generally, used in public health/disease prevention
 Typically, in these cases you want to see whether something works in
conjunction with something else to improve outcome
 Example slides: 35-38 (incidence) & 39-42 (Attributable risk)
o Multiplicative Interaction
 The multiplicative model is where the effect of two factors is
multiplicative
 With interaction, the joint effect is greater than the product of the
individual effects
 Used with incidence, relative risk, and odds ratio
 If the observed result > expected, we conclude that interaction exists on a
multiplicative scale
 Generally, used in etiologic research
 Etiologic: causation or origin type research
 Can only assess multiplicative scale in case-control studies
 Example slides: 45-49 (incidence) & 50-53 (relative risk)
o Negative interaction (antagonism)
 The result of combined exposure + effect modifier is less than would be
expected
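The additive and multiplicative comparisons can be written out for four incidence rates (neither factor, A only, B only, both). The rates below are invented. In real data the comparison of observed and expected would be made with a statistical test rather than an exact inequality.

```python
"""Additive vs multiplicative interaction from four incidence rates (hypothetical)."""


def interaction(i_neither, i_a_only, i_b_only, i_both):
    """Compare the observed joint effect with what each model predicts."""
    # Additive model: background + excess risk from A + excess risk from B.
    additive_expected = i_neither + (i_a_only - i_neither) + (i_b_only - i_neither)
    # Multiplicative model: joint RR = RR(A alone) * RR(B alone).
    rr_a = i_a_only / i_neither
    rr_b = i_b_only / i_neither
    rr_both = i_both / i_neither
    return {
        "additive": (additive_expected, i_both),
        "multiplicative": (rr_a * rr_b, rr_both),
    }


# Incidence per 1,000: neither factor, factor A only, factor B only, both.
result = interaction(i_neither=3, i_a_only=9, i_b_only=15, i_both=45)
for scale, (expected, observed) in result.items():
    if observed > expected:
        verdict = "positive interaction (synergism)"
    elif observed < expected:
        verdict = "negative interaction (antagonism)"
    else:
        verdict = "no interaction"
    print(f"{scale}: expected {expected:g}, observed {observed:g} -> {verdict}")
```

These rates show positive interaction on the additive scale (observed 45 vs. expected 21 per 1,000) but no interaction on the multiplicative scale (observed and expected RR both 15), which is why the scale should always be stated.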
 Identification/solutions
o Identifying interaction
 Heterogeneity across different strata
 Is it additive or multiplicative?
 Statistical tests for interaction
 Mantel-Haenszel test
 Use of an interaction term in regression models
o Addressing interaction
 Present stratum-specific estimates
 Additive or multiplicative
 Crude estimates are not meaningful
 In multivariable models (multiplicative interaction)
 Create interaction term
 Combine exposure and effect modifier into one categorical
variable
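Presenting stratum-specific estimates, and the single combined categorical variable, can be sketched for a hypothetical cohort stratified by a possible effect modifier. Smoking is used only as a label and the counts are invented.

```python
"""Presenting stratum-specific estimates when interaction is present (hypothetical)."""

# stratum -> (exposed cases, exposed total, unexposed cases, unexposed total)
strata = {
    "non-smokers": (20, 1000, 10, 1000),
    "smokers": (160, 1000, 20, 1000),
}


def risk_ratio(a, n1, c, n0):
    return (a / n1) / (c / n0)


# 1) Stratum-specific RRs: the main result to report.
for name, counts in strata.items():
    print(f"RR among {name}: {risk_ratio(*counts):.1f}")

# 2) A single crude RR hides the heterogeneity.
totals = [sum(s[i] for s in strata.values()) for i in range(4)]
print(f"crude RR: {risk_ratio(*totals):.1f}")

# 3) One combined categorical variable: every level compared with the
#    doubly unexposed reference group (unexposed non-smokers).
ns_exp_cases, ns_exp_n, ns_unexp_cases, ns_unexp_n = strata["non-smokers"]
sm_exp_cases, sm_exp_n, sm_unexp_cases, sm_unexp_n = strata["smokers"]
reference_risk = ns_unexp_cases / ns_unexp_n
combined = {
    "exposed, non-smoker": (ns_exp_cases / ns_exp_n) / reference_risk,
    "unexposed, smoker": (sm_unexp_cases / sm_unexp_n) / reference_risk,
    "exposed, smoker": (sm_exp_cases / sm_exp_n) / reference_risk,
}
print(combined)
```

The stratum-specific RRs are 2.0 and 8.0, while the crude RR (and, with these counts, the Mantel-Haenszel RR) is 6.0, a value that describes neither group. In the combined variable the three levels have RRs of about 2, 2 and 16 relative to unexposed non-smokers.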
 Causal models
o Are used to help think about the relationship between various factors in an area
of study
o Can help in decisions about which variables one needs to measure/adjust for in
studies
o DAGs (directed acyclic graphs) are becoming more commonly used in epidemiology
 DAGs describe a hypothesized causal model, including
confounders, mediators, etc.
 Remember that the arrows absent from a DAG are as important as the ones
drawn: a missing arrow asserts that there is no direct causal effect
o Examples of causal models on slides: 64-66
Causal Inference

 Confounding, Bias & Interaction


o All may affect inferences
o A significant association that is not a real association might be explained by bias
o A significant association that is not causal might be explained by confounding
o Associations that change due to a third variable (but not a confounder) might be
explained by interaction
o While we want to eliminate confounding and bias, we generally do not want to
eliminate interaction
 Goal is to understand and describe the interaction when it is present
because it can help identify susceptible populations or help suggest route
of exposure or biological mechanisms of action
 Disease etiology
o Direct and indirect causes
 Direct causes are when the exposure itself causes the disease
 Indirect causes are when the exposure leads to effects which eventually
cause the disease
o Necessary and sufficient causes
 Necessary causes are when the exposure must be present in order for
disease to develop
 Sufficient causes are when the exposure on its own is all that is needed in
order for the disease to develop.
 These are very rare causes.
o Types of causal relationships
 Necessary and sufficient
 Without the exposure the disease would not develop, and in the
presence of that exposure the disease is always present
 Necessary and not sufficient
 Various factors are necessary for the disease to develop, but no
single exposure alone is sufficient to cause disease
 Example of this is a two-hit or multiple hit theory of cancer
causation
 Sufficient and not necessary
 A single factor alone can result in disease development, but so can
other factors
 Example would be the factors that can cause a car accident
o Not all people who drive drunk, text and drive, or are sleep
deprived will have an accident
 Neither necessary nor sufficient
 Made up of component causes that combined with other factors
may cause the disease
 Causal pie
 Causal criteria
o Causality at different phases
 Design phase
 Conceptual diagrams
 Analysis phase
 Assessment of confounding, bias
 Interpretation phase
 More than statistical probability
 More than one study
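The necessary/sufficient distinctions and the causal pie can be expressed as sets. In this toy Python sketch the component labels (A, B, C, D, E, U1, U2) are hypothetical: each sufficient cause is a set of components, disease occurs when a person has every component of at least one set, and a component is necessary if it appears in every set.

```python
"""Sufficient-component cause ("causal pie") model: a toy sketch with made-up components."""

# Each sufficient cause is a complete "pie": a set of component causes that
# together are enough for disease. The components here are hypothetical labels.
SUFFICIENT_CAUSES = [
    frozenset({"A", "B", "U1"}),
    frozenset({"A", "C", "U2"}),
    frozenset({"A", "D", "E"}),
]


def develops_disease(person_factors):
    """Disease occurs if the person's factors complete at least one pie."""
    return any(pie <= person_factors for pie in SUFFICIENT_CAUSES)


def necessary_components():
    """A component is necessary if it appears in every sufficient cause."""
    return frozenset.intersection(*SUFFICIENT_CAUSES)


def sufficient_alone(component):
    """True only if some pie consists of this single component."""
    return frozenset({component}) in SUFFICIENT_CAUSES


print(sorted(necessary_components()))       # A is in every pie
print(sufficient_alone("A"))                # A alone does not complete a pie
print(develops_disease({"A", "B", "U1"}))   # completes the first pie
print(develops_disease({"B", "C", "U1"}))   # no A, so no complete pie
```

Here A is necessary but not sufficient, while B, C, D and E are component causes that are neither necessary nor sufficient on their own.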
 Causal criteria
o Establishing criteria for causality was done because, “statistical methods cannot
establish proof of a causal relationship in an association. The causal significance
of an association is a matter of judgement which goes beyond any statement of
statistical probability.”
o Causal criteria
 Temporal relationship
 The exposure must occur prior to the disease
o Consider latency periods
o Consider reverse causation
o Designs which establish temporal relationships:
 Nested case control studies
 Case-cohort
 Retrospective cohort
 Prospective cohort
 RCT
o Example on slide 46
 Strength of association
 Concept: the stronger the association, the more likely it is causal
 Measured by the relative risk (RR) in cohort studies and the odds
ratio (OR) in case-control studies
o Not determined by p-value or CI
 However, strength may depend on factors other than a causal
relationship
o Other risk factors may be more prevalent
o Confounding might result in strong association
 Defining strength
o There is no agreement about how large an OR or RR needs
to be in order to be “strong”
 Many studies have a significant OR or RR which are
less than 2.
o This was a concept developed when powerful statistical
models were not in wide use.
o In situations with multiple etiologic factors, individual
factors may well have a small influence
 Dose-response relationship
 Sometimes called biologic gradient
 Generally, a monotonic relationship between exposure and
disease
o Increased exposure → increased disease
o Not necessarily linear
 Provides strong evidence for a causal relationship
 However, there are concerns with dose-response
o It is well understood that not all causal relationships will
have a monotonic dose-response curve
 Threshold effects
 U-shaped and J-shaped curves
 Nutrients
o If analysis does not allow for non-linear dose-response
curves, these may not be identified
 Replication of findings
 Multiple studies have the same result
 Same result in different populations
 Same result in different subgroups
o Unless there is an explanation for a difference such as
interaction
 Same result using different study designs
 Biologic plausibility
 Association is consistent with biology
o Makes biologic sense
o Existence/description of a biological mechanism
explaining how exposure causes disease
 This evidence often comes from laboratory
(in vivo/ in vitro) studies
 Consideration of alternative explanations
 Is the association the result of bias?
 Could the association be the result of confounding?
o What about residual confounding?
o Unknown confounders?
 Are there biases or confounders that affect the majority of studies
in this area?
 Cessation of exposure
 Reduced risk of disease when exposure reduced or eliminated
 Potential concerns with cessation
o Sometimes cessation data is not available
o Sometimes natural history cannot be reversed
 Consistency with other knowledge
 If causal, the relationship should be consistent with other
information
o Similar relationships
o Consistency with data from other sources
 Other data might be:
 Laboratory data
 Clinical knowledge
 Behavioral studies
 Surveillance data
 Economics, sociology, etc.

 Specificity of the association
 A specific exposure is related to a specific disease
 Weakest of the causal criteria
o Generally, not used today
o Today the causal criteria that should be met includes:
 Temporality
 Biological plausibility
 Replication of findings
 Consideration of alternative explanations
 Consistency with other knowledge
o Criteria such as: dose-response relationship, strength of association and
cessation of exposure, provide supporting evidence for causation.
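Strength of association and dose-response can be examined together by computing the RR for each exposure level against the unexposed reference. The cohort below is hypothetical, and the simple monotonic check is only a descriptive aid; a formal trend test and models that allow non-linear curves would be used in practice.

```python
"""Strength of association and a simple dose-response check (hypothetical cohort)."""

# Dose category -> (cases, persons at risk); category 0 is the unexposed reference.
cohort = {
    0: (10, 1000),
    1: (15, 1000),
    2: (24, 1000),
    3: (40, 1000),
}


def rr_by_dose(data, reference=0):
    """Risk ratio for each dose category relative to the reference category."""
    ref_cases, ref_n = data[reference]
    ref_risk = ref_cases / ref_n
    return {dose: (cases / n) / ref_risk for dose, (cases, n) in data.items()}


def is_monotonic(values):
    """Non-decreasing trend; U- or J-shaped curves fail this check."""
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


rrs = rr_by_dose(cohort)
print({dose: round(rr, 2) for dose, rr in sorted(rrs.items())})
print("monotonic:", is_monotonic([rrs[d] for d in sorted(rrs)]))
```

The RRs rise from 1.0 to 1.5, 2.4 and 4.0, so the check returns True; a U-shaped or J-shaped pattern would return False even if the relationship were causal.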
Program Review

 Different goals in epidemiology


o Descriptive epidemiology
 Describes distribution of disease among person, place and time
 Some measures used
 Incidence, prevalence
 YLL, DALY
 Common study designs used
 Cross-sectional studies
 Repeated cross-sectional studies over time using NHANES or
BRFSS
o Analytical epidemiology
 Is there an association between exposure and disease?
 Some measures used:
 RR
 OR
 Regression coefficients
 Attributable risk
 Some study designs used
 Cross-sectional
 Case-control
 Cohort
 RCT
o Applied epidemiology
 Evaluation of health programs or policies
 Some measures used
 Change in incidence, prevalence, count data (enrollment,
utilization data)
 Relative risk, attributable risk, odds ratios, regression analyses
 Some study designs used
 Case-control
 Cohort
 RCT
 Screening Programs
 Rationale for evaluation
o To determine change in actual survival
o To evaluate cost-benefit ratio of programs
o Screening tests have negative aspects
 Screening costs money
 Some screening tests may result in some physical or mental harm
 No test is perfect (error rates)
o Do the benefits of actual survival outweigh negative aspects of screening?
 Evaluation metrics
o Operational/ process measures
 Evaluate program in terms of its management, breadth, economics,
program efficiency
 Examples of operational measures
 # of people screened
 # of times people are screened
 Proportion of target population screened
 Detected prevalence of preclinical disease
 Total cost of the program
 Costs per case found (including previously unknown cases)
 Proportion of positive screenees brought to final
diagnosis/treatment
 Predictive value of a positive test in population screened
o Outcome measure
 Evaluate change in health outcomes
 Examples of outcome measures
 Reduction of mortality in population screened
 Reduction of case-fatality rate in screened individuals
 Increase in percent of cases detected at earlier stages
 Reduction in complications
 Prevention of or reduction in recurrences or metastases
 Improvement of quality of life in screened individuals
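The operational measures above are simple ratios. The sketch below computes several of them from invented program figures; it assumes PPV is estimated among positive screenees who completed diagnostic work-up, and that cost is in arbitrary currency units.

```python
"""Operational (process) measures for a screening program (hypothetical figures)."""


def operational_measures(target_pop, screened, positives, worked_up, cases_found, total_cost):
    return {
        "proportion of target population screened": screened / target_pop,
        "detected prevalence of preclinical disease": cases_found / screened,
        "proportion of positives brought to final diagnosis": worked_up / positives,
        # PPV estimated among positives who completed diagnostic work-up.
        "predictive value of a positive test": cases_found / worked_up,
        "cost per case found": total_cost / cases_found,
    }


measures = operational_measures(
    target_pop=20_000, screened=12_000, positives=600,
    worked_up=540, cases_found=120, total_cost=360_000,
)
for name, value in measures.items():
    print(f"{name}: {value:,.3f}")
```

With these figures, 60% of the target population was screened, detected prevalence is 1%, 90% of positives reached final diagnosis, the PPV is about 0.22, and the cost per case found is 3,000.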
 Possible study designs used for program evaluation
o Experimental
 RCT
o Observational
 Case-control
 Cohort
o Examples on slides: 25-27
 Potential bias
o Referral (volunteer) bias
 Is there a systematic difference in those who were screened versus
those who were not screened?
 Sometimes referred to as volunteer bias
 A form of selection bias
 Solutions: randomization, representative sampling, adjusted analyses
o Length-based sampling (prognosis) bias
 Type of selection bias
 Does screening selectively identify cases which have a better
prognosis?
 Clinical phase of an illness varies between people; so does the
preclinical phase
 With less frequent screening, we are more likely to identify people
with longer preclinical phases.
o Lead-time bias
 Apparent increase in survival resulting from earlier detection
 Change in observed survival versus change in actual survival
 Solution: incorporate estimated “lead-time” into analyses
 Is the survival of those screened > the survival of those not
screened + lead time?
 Example on slide 34
o Overdiagnosis bias
 Type of information bias
 When screeners “over diagnose” disease
 Increased false positive rate
 But not an increased rate of false negatives
 As a result, the case group includes healthy individuals and survival
appears better than it is in reality
 Solution to overdiagnosis bias: evaluate the diagnostic process to
ensure quality
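Lead-time bias can be seen in a single hypothetical disease course in which screening moves diagnosis earlier but does not change the age at death. Comparing screened survival with unscreened survival plus the lead time, as suggested under lead-time bias, removes the artifact. Real analyses would use an estimated average lead time rather than a known one.

```python
"""Lead-time bias: observed vs actual survival gain (hypothetical ages, in years)."""


def survival_comparison(age_dx_unscreened, age_dx_screened, age_death_unscreened, age_death_screened):
    """Return (apparent gain in survival after diagnosis, gain after adding lead time)."""
    observed_unscreened = age_death_unscreened - age_dx_unscreened
    observed_screened = age_death_screened - age_dx_screened
    lead_time = age_dx_unscreened - age_dx_screened   # how much earlier screening detects
    # Fair comparison: screened survival vs unscreened survival + lead time.
    actual_gain = observed_screened - (observed_unscreened + lead_time)
    return observed_screened - observed_unscreened, actual_gain


# Screening finds the disease at 62 instead of 65 (symptoms), but death still occurs at 70.
apparent, actual = survival_comparison(65, 62, 70, 70)
print(f"apparent survival gain: {apparent} years, actual gain: {actual} years")
```

This prints an apparent gain of 3 years and an actual gain of 0 years.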
 Public health programs
o Rationale for evaluation
 Is there a measurable improvement in health outcome?
 Outcome measures
o Operational/process measures
 Number/ proportion of eligible persons in program
 Costs of program
 Outcome measures
o Difference or change in morbidity, mortality
o Patient satisfaction and quality of life
o Degree of dependence/ disability
o Example slide 40
 Is the cost-benefit ratio of the program favorable?
 Program evaluation in the healthcare context is also referred to as
outcomes research
o Efficacy, effectiveness, efficiency
 Efficacy: does it “work” under ideal “lab” conditions
 Effectiveness: does it work in real-life
 Efficiency: what is the cost-benefit ratio
 Cost = not just money, but discomfort, side effects, etc.
 Efficacy → Effectiveness → Efficiency
o Study designs used
 Ecologic (group data) studies
 Use of aggregate (group) data to evaluate programs
 Advantages
o Typically, data is collected for other purposes
o Generally quick, relatively simple analyses
o Low cost
 Disadvantages
o May not include information about crucial variables
(confounders) of interest
o May reflect sampling biases of different surveys
 Important to understand characteristics of the surveys/methods
to collect these data
 Large dataset studies
 Medicaid, Medicare, hospital records
 Typically, individual-level data
 Large sample sizes
 “real-world” population, increases generalizability
 Can be analyzed quickly and inexpensively
 Disadvantages
o Often gathered to assess economics/ management, not
disease
o May not contain ideal definitions or details about exposure
or disease
o Data may be incomplete
o Selection bias may still be an issue
 Are there factors affecting whether those eligible
for a program are signed up for it?
 RCT
 Randomization
 Observational (nonrandomized) studies
 Before-after (using historic data)
 Program-no program (synchronized in time)
 Utilizers- nonusers
 Eligible -noneligible
 Also, combinations of the above
Ethics and Policy
