
Weighing Evidence and Identifying Causes


Justin Lessler, PhD
Johns Hopkins University
Objectives

► Turn general questions about disease and health into precise hypotheses

► Identify appropriate comparison groups

► Describe and calculate basic measures of association, including:


► Relative risk
► Risk difference

► Understand basic measures of uncertainty and significance

2
Recommended Readings and Resources

► Gordis, Leon. (1996). Epidemiology. (Fifth edition).

► CDC Self-Study Course SS1978, Lesson 3

► “John Snow—The Father of Epidemiology”

► “Is the p-value pointless?”

► “Understanding confidence intervals”

3
General Questions to Precise Hypotheses

The material in this video is subject to the copyright of the owners of the material and is being provided for educational purposes under
rules of fair use for registered students in this course only. No additional copies of the copyrighted work may be made or distributed.
Following the Path of John Snow

► Famed physician of his day

► Anesthesiologist to Queen Victoria

► Investigations of cholera are recognized to be among the first epidemiologic studies

► We will use his work as a running example throughout this module

Source: accessed on January 23, 2019, from https://upload.wikimedia.org/wikipedia/commons/c/cc/John_Snow.jpg


A General Question

► “What causes cholera?”

► Of scientific and public health importance

► Not specific enough to suggest a study or experiment

► Does not propose a mechanism or cause

► May be appropriate for framing your life’s work, but not for a specific study

6
A Broad Hypothesis

► “Cholera is transmitted by contaminated water”

► More specific and, in principle, testable

► Is specific as to mechanism and cause (given present knowledge)

► Not tied to a particular set of data, experiment or setting


► Hence no clear observation that will support or refute the hypothesis

► Can help guide an investigation, but is too broad to be supported or refuted directly

7
A Specific Hypothesis

► “People who drink water with sewage in it are more likely to get cholera than
those who drink clean water”

► A clearly testable result of the broad hypothesis about cause

► Can be shown to be false (that is, is falsifiable)

► Still not tied to a particular study or observation

► Serves as the basis for the design of experiments and epidemiologic studies

8
A Hypothesis Tied to a Study or an Observation

► “People who get their water from the less contaminated part of the Thames upstream of London are less likely to die from cholera than those who get their water from the more contaminated part of the Thames downstream of London.”

► Tied to a specific location, study, or place

► Can be tested by a specific observation

► The basis of a specific study or analysis


Image source: accessed on January 23, 2019, from
https://commons.wikimedia.org/wiki/File:Monster_Soup_commonly_called_Thames_Water._Wellcome_V0011218.jpg
What Makes a Good Hypothesis?

► Relevant: testing the hypothesis sheds light on a larger scientific or epidemiologic question

► Specific: postulates an exposure, outcome, and relationship

► Falsifiable: there is some observation that could lead us to conclude that the hypothesis is false

► Precise: states exactly the relationship that will be tested and the expected result

10
The Problem with a Bad Hypothesis

► “Vaccines cause autism”


► Non-specific in exposure or mechanism
► Does not define a particular outcome that can be falsified
► Hence cannot be disproved with particular evidence

► “Children who have received the MMR [measles, mumps, rubella] vaccine are significantly more likely to have autism than those who have not”
► Has a clearly defined exposure, outcome, and relationship
► Can be falsified …
► … and has been proven false!
● For example, Jain et al., 2015, JAMA

11
Key Points

► A good hypothesis is the key to collecting and weighing evidence

► Broad scientific questions and general hypotheses can help guide research
► But are of limited value when evaluating evidence

► Evidence should be collected and evaluated based on a specific hypothesis that is falsifiable by experiment or observation
► Specified, measurable exposure or comparison groups
► Specific, measurable outcome

12
Exercise

► Many of your family members developed diarrhea after a family dinner

► Make a specific, testable hypothesis that would help you determine the likely cause

13
Identifying Comparison Groups

What Is a Comparison Group?

► Specific epidemiologic hypotheses give a relation between an exposure and an outcome

► To test hypotheses, we either:


► Compare incidence of the outcome in those with and without the exposure
► Compare frequency of the exposure in those with and without the outcome
● A case/control study

► Selection of appropriate groups to compare is the key to correctly testing a hypothesis

► Inappropriate comparisons lead to bias


► Bias means we measure a systematic (vs. random) difference from the true effect

2
What Makes a Good Comparison Group?—1

► The groups must be selected without regard to what is being compared between groups!
► If comparing incidence in the exposed and unexposed: group members must be
selected without regard to outcome
► If comparing exposure in cases and controls: group members must be selected without
regard to exposure
► Failure to do this is called selection bias

3
What Makes a Good Comparison Group?—2


► We must not be unintentionally comparing a factor that causes the outcome when we
define groups
► This happens when some other factor is associated with our exposure of interest and
the outcome
► When this occurs it is called confounding

4
Visualizing Selection Bias—1

► We hypothesize that people who get cholera are more likely to be vegetarians than those who do not

► We decide to compare people who are hospitalized for watery diarrhea who test positive for cholera with those who are hospitalized for diarrhea and test negative for cholera (test-negative design)

► Meat eaters are more likely to be hospitalized with E. coli infection, another cause of acute watery diarrhea (AWD)

► We incorrectly conclude that cholera cases are 2.5 times as likely to be vegetarians as non-cholera cases (2/4 vs. 1/5)

5
Visualizing Selection Bias—2

► In reality, 50% of both groups are vegetarians (2/4 of cholera cases vs. 38/76 of people without cholera)

► Because meat eaters are more likely to be hospitalized with E. coli infection, vegetarians are under-represented among the hospitalized test-negative controls (1/5)

► The apparent 2.5-fold difference (2/4 vs. 1/5) comes from how the controls were selected, not from cholera
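
The arithmetic behind this example can be checked with a short sketch. The counts are the ones quoted on the slide; the variable and function names are illustrative assumptions, not part of the original material.

```python
# Counts quoted on the slide; names are illustrative assumptions.
cholera_veg, cholera_total = 2, 4          # cholera cases who are vegetarians
population_veg, population_total = 38, 76  # people without cholera (true population)
hospital_veg, hospital_total = 1, 5        # hospitalized, test-negative controls


def proportion(numerator, denominator):
    """Fraction of a group that is vegetarian."""
    return numerator / denominator


# Cases compared with everyone who does not have cholera: no difference.
true_ratio = proportion(cholera_veg, cholera_total) / proportion(
    population_veg, population_total
)

# Cases compared with hospitalized test-negative controls: biased upward.
biased_ratio = proportion(cholera_veg, cholera_total) / proportion(
    hospital_veg, hospital_total
)

print(f"Ratio against the full non-cholera population: {true_ratio:.1f}")
print(f"Ratio against hospitalized controls: {biased_ratio:.1f}")
```

The two ratios use the same cases; only the comparison group changes, which is what makes this selection bias rather than a real association.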

6
Selection Bias in Real Life: Coffee and Pancreatic Cancer—1

► A case-control study in 1981 looked at the association between pancreatic cancer and coffee drinking (among other exposures)

► Controls were selected from other patients of the doctors who diagnosed the pancreatic cancer

7
Selection Bias in Real Life: Coffee and Pancreatic Cancer—2

► Many of these other patients had gastrointestinal problems and were advised to avoid coffee

► Hence controls consumed less coffee than cases, and the study incorrectly concluded that coffee was associated with pancreatic cancer

8
Visualizing Confounding—1

► When studying a cholera outbreak, we hypothesize that people with low BMI are more likely to get cholera than people with normal BMI

► Comparison of these groups leads us to conclude that the risk of cholera in the low BMI group is 1.8 times that of the high BMI group (9/40 vs. 5/40)

9
Visualizing Confounding—2

► But people with high BMI are better off, and more likely to have
private wells to get their water

► Comparing just among people who have city water shows no increased risk for low BMI (2/10 infected in both groups)

10
Confounding in Real Life: William Farr and Cholera

► Contemporary of John Snow; initially believed the miasma theory of cholera transmission

► Supported this by the lower rates of cholera in those living at high elevations versus those living at low elevations
► Rates were lower among those living at high elevations
► People living at high elevations were richer and paid for
cleaner water
► The comparison between high and low elevations was
confounded by economic status

► Farr was eventually convinced by John Snow’s work

Image source: accessed on January 23, 2019, from https://upload.wikimedia.org/wikipedia/commons/d/df/William_Farr.jpg


Key Points

► Choosing appropriate comparison groups is the key to testing hypotheses

► Bias can occur if you pick a bad comparison group

► Accidentally picking groups based on the exposure (or outcome) can lead to selection bias

► Picking groups that differ by some factor associated with disease other than what is being
tested can lead to confounding

12
Exercise

► At your family dinner, you think it is the sweets that are causing the diarrhea

► Pick an appropriate comparison group to sweet eaters to determine if it is the cause

► Do the same for comparing cases with non-cases

► Why did you pick what you picked?


► Are there specific biases you tried to avoid?

13
Basic Measures of Association 1, Risk Difference

Measures of Disease Occurrence

► Incidence is the number of new cases of a disease that occur in a defined population in a
defined period of time; measures include:
► Reported cases: raw number of cases reported over some time period
► Cumulative incidence (rate): total number of cases that occur in a given population
over a span of time (sometimes called the attack rate)
► Incidence rate (IR): number of cases that occur per some unit of population per a unit
of time
● That is, cases per person-time

► Prevalence is the number of people that are infected with a disease in a population at a
given time

2
Risk Difference

► Difference in the incidence rate (or cumulative incidence) of disease between groups

$$\mathrm{RD} = \mathrm{IR}_{\text{Population A}} - \mathrm{IR}_{\text{Population B}}$$

► Measures the population difference in rates of disease in populations with different exposures (0 means no difference)

3
Classical Risk Difference Calculation—1

► If we observe people for the same amount of time, we can calculate risk differences based on cumulative incidence:

             Sick   Not sick
Exposed       a        b
Unexposed     c        d

$$\mathrm{RD} = \frac{a}{a+b} - \frac{c}{c+d}$$

4
Classical Risk Difference Calculation—2

► Risk differences from John Snow’s investigation:

                          Deaths in households (HHs)   Surviving in HHs
Southwark and Vauxhall              1,263                   38,783
Lambeth                                98                   26,009

$$\mathrm{RD} = \frac{1{,}263}{1{,}263 + 38{,}783} - \frac{98}{98 + 26{,}009} = 2.8 \text{ deaths per } 100$$
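
As a quick check of the calculation above, here is a minimal sketch of the cumulative-incidence risk difference. The function name `risk_difference` is an illustrative assumption; the counts are the ones in Snow's table.

```python
def risk_difference(a, b, c, d):
    """Risk difference from a 2x2 table of counts.

    a, b: sick and not sick among the exposed
    c, d: sick and not sick among the unexposed
    """
    return a / (a + b) - c / (c + d)


# Snow's counts: deaths and survivors by water supplier
# (Southwark and Vauxhall as "exposed", Lambeth as "unexposed").
rd = risk_difference(1263, 38783, 98, 26009)
print(f"RD = {rd * 100:.1f} deaths per 100")
```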

5
Person-Time Based Risk Difference Calculation—1

► If different individuals are observed (that is, at risk) for different amounts of time, we need to base this calculation on time observed

             Sick   Time observed
Exposed       a          b
Unexposed     c          d

$$\mathrm{RD} = \frac{a}{b} - \frac{c}{d}$$

6
Person-Time Based Risk Difference Calculation—2

► Risk of coronary artery disease among participants in the Nurses’ Health Study:

                           N disease   Years of follow-up
Currently use hormones        259           265,203
Never used hormones           662           358,135

$$\mathrm{RD} = \frac{259}{265{,}203} - \frac{662}{358{,}135} = -8.7 \text{ cases per } 10{,}000 \text{ person-years}$$

Data from Grodstein et al. (2000). Annals of Internal Medicine.
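
The person-time version differs only in its denominators. A minimal sketch, with `rate_difference` as an illustrative (assumed) function name and the counts taken from the table above:

```python
def rate_difference(cases_a, time_a, cases_b, time_b):
    """Difference in incidence rates (cases per unit of person-time)."""
    return cases_a / time_a - cases_b / time_b


# Current hormone users vs. never users; time is in person-years.
rd = rate_difference(259, 265_203, 662, 358_135)
print(f"RD = {rd * 10_000:.1f} cases per 10,000 person-years")
```

A negative value means the rate is lower in the first group (here, current hormone users).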


Basic Measures of Association 2, Relative Risk

Relative Risk

► The incidence rate ratio (IRR) is the ratio of disease rates in different populations and is a measure of relative risk (RR)

$$\mathrm{IRR} = \frac{\mathrm{IR}_{\text{Population A}}}{\mathrm{IR}_{\text{Population B}}}$$

► Measures the relative difference in rates of disease in populations with different exposures
► 1 means no difference

2
Classical Relative Risk Calculation

► If we observe people for the same amount of time, we can calculate relative risk based on cumulative incidence (more properly, the cumulative incidence ratio)

             Sick   Not sick
Exposed       a        b
Unexposed     c        d

$$\mathrm{RR} = \frac{a/(a+b)}{c/(c+d)}$$

3
Classical Relative Risk Calculation

► Relative risk from John Snow’s investigation:

                          Deaths in households (HHs)   Surviving in HHs
Southwark and Vauxhall              1,263                   38,783
Lambeth                                98                   26,009

$$\mathrm{RR} = \frac{1{,}263/(1{,}263 + 38{,}783)}{98/(98 + 26{,}009)} = 8.4 \text{ times the risk of death}$$
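
The same 2x2 counts give the relative risk by dividing instead of subtracting. A minimal sketch, with `relative_risk` as an illustrative (assumed) function name:

```python
def relative_risk(a, b, c, d):
    """Cumulative incidence ratio from a 2x2 table of counts.

    a, b: sick and not sick among the exposed
    c, d: sick and not sick among the unexposed
    """
    return (a / (a + b)) / (c / (c + d))


# Southwark and Vauxhall households relative to Lambeth households.
rr = relative_risk(1263, 38783, 98, 26009)
print(f"RR = {rr:.1f} times the risk of death")
```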

4
Person-Time Based Relative Risk Calculation

► If different individuals are observed (that is, at risk) for different amounts of time, we need to calculate a proper incidence rate ratio based on the time observed:

             Sick   Time observed
Exposed       a          b
Unexposed     c          d

$$\mathrm{IRR} = \frac{a/b}{c/d}$$

5
Person-Time Based Relative Risk Calculation

► Risk of coronary artery disease among participants in the Nurses’ Health Study:

                         N disease   Years of follow-up
Used hormones               259           265,203
Did not use hormones        662           358,135

$$\mathrm{IRR} = \frac{259/265{,}203}{662/358{,}135} = 0.53 \text{ times the annual rate of developing disease}$$

Data from Grodstein et al. (2000). Annals of Internal Medicine.
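
A minimal sketch of the incidence rate ratio for the person-time table above. The function name `incidence_rate_ratio` is an illustrative assumption, and the result is the crude (unadjusted) ratio computed directly from the counts shown.

```python
def incidence_rate_ratio(cases_a, time_a, cases_b, time_b):
    """Ratio of incidence rates (cases per unit of person-time)."""
    return (cases_a / time_a) / (cases_b / time_b)


# Hormone users relative to non-users; time is in person-years.
irr = incidence_rate_ratio(259, 265_203, 662, 358_135)
print(f"IRR = {irr:.2f}")
```

A value below 1 means the rate is lower in the first group.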


Personal Risk vs. Public Health Risk—1

► John Snow compares those who get water from Southwark and Vauxhall vs. Lambeth for one year
► 1/100 Lambeth users die
► 3/100 Southwark and Vauxhall users die

► The risk difference tells us the suggested benefit of everyone using Lambeth as a water supplier
► 2 fewer deaths per 100 users per year

► The relative risk tells us the increased risk of death from switching providers to Southwark and Vauxhall
► 3 times as likely to die each year

7
Personal Risk vs. Public Health Risk—2

► John Snow compares those who get water from Southwark and Vauxhall vs. Lambeth for one year
► 10/100 Lambeth users die
► 30/100 Southwark and Vauxhall users die

► The risk difference tells us the suggested benefit of everyone using Lambeth as a water supplier
► 20 fewer deaths per 100 users per year

► The relative risk tells us the increased risk of death from switching providers to Southwark and Vauxhall
► 3 times as likely to die each year

8
Key Points

► The risk difference and relative risk are two common measures of association

► A risk difference that is significantly different from zero suggests an association

► A relative risk that is significantly different from one suggests an association

► If individuals are observed for different amounts of time, rates must be used

► To use these metrics, inclusion must be independent of having the outcome

► The two measures have different individual- and population-level interpretations

9
Exercise

► Compare the relative risk and risk difference for the data below

► Do you think these are strong evidence of association?

               Cholera cases   Person-years observed
Vaccinated          15                3,705
Unvaccinated        13                1,235

10
Understanding Basic Statistical Tests

Statistical Significance—1

► Large measures of association are not necessarily meaningful ones

► Consider that I flip a coin 3 times and get 3 heads; my office mate flips a coin 3 times and gets 1 head
► I am 3 times as likely to get a head (RR = 3)
► Suggests a difference of 67 heads per 100 coin flips (RD = 67 per 100)

Image source: accessed on January 23, 2019, from https://commons.wikimedia.org/wiki/File:Coin_Toss_(3635981474).jpg


Statistical Significance—2

► But is this strong evidence that I am better at flipping heads than my office mate?

► Testing statistical significance helps us make this call



P-Values—1

► Measure strength of evidence versus no association (the null hypothesis)

► The probability of seeing an effect as large as observed or greater if we repeated the experiment a large number of times and there was not a true association

► Typically values less than 0.05 are considered significant

► That is, if there were no true association, we would expect to see an effect that large or greater in fewer than 1 of every 20 times we did the experiment
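
The definition above can be made concrete with the coin-flip example from the Statistical Significance slides (3 heads out of 3 vs. 1 head out of 3). The sketch below is not a standard named test: it simply enumerates every possible outcome under the null hypothesis and adds up the probability of a difference at least as large as the one observed. It assumes both coins are fair under the null hypothesis and looks in one direction only (me flipping more heads); all names are illustrative.

```python
from itertools import product
from math import comb

N_FLIPS = 3            # flips per person in the coin example
P_HEADS = 0.5          # assumption: both coins are fair under the null hypothesis
OBSERVED_DIFF = 3 - 1  # my heads minus my office mate's heads


def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips with head probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Sum the probability of every outcome whose difference in heads is at least
# as large as the observed one (one-sided: in my favor).
p_value = sum(
    binom_pmf(mine, N_FLIPS, P_HEADS) * binom_pmf(theirs, N_FLIPS, P_HEADS)
    for mine, theirs in product(range(N_FLIPS + 1), repeat=2)
    if mine - theirs >= OBSERVED_DIFF
)

print(f"One-sided p-value: {p_value:.3f}")
```

With so few flips, a difference this large turns up by chance fairly often, so RR = 3 is weak evidence on its own.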

4
P-Values—2

► Not a yes or no decision!


► We may need different strengths of evidence in different times

5
Testing Tabular Data: the χ² Test

► χ² (pronounced chi-squared) tests tell us how different tabular data are from what is expected by chance

► Looks at differences between individual cells and marginal totals

► Implemented in Microsoft Excel and other commonly used office spreadsheets

► Gives evidence that a significant association exists—not the direction of that association

6
Understanding the χ² Test—1

► The table below shows the expected values for sick men and women

► These values are based on the total number of men and women, and the total number of sick people

          Sick   Not sick   Total
Men        15       35        50
Women      15       35        50
Total      30       70

7
Understanding the χ² Test—2

► If values are similar to the expected values, the χ² test will produce a large p-value

► Here p = 0.66
► Not significant

          Sick   Not sick   Total
Men        14       36        50
Women      17       33        50
Total      30       70

8
Understanding the χ² Test—3

► If values are different than expected, the χ² test will produce a small p-value

► Here p < 0.0001
► Highly significant

          Sick   Not sick   Total
Men         5       45        50
Women      25       25        50
Total      30       70
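
To show where these p-values come from, here is a minimal sketch of the Pearson χ² test for a 2x2 table, using only the Python standard library. It computes expected counts from the marginal totals, as described above, and does not apply a continuity correction; some tools apply one by default, so their p-values can differ slightly. The function name is an illustrative assumption.

```python
from math import erfc, sqrt


def chi_squared_2x2(table):
    """Pearson chi-squared test for a 2x2 table (no continuity correction).

    Returns (statistic, p_value). With 1 degree of freedom, the upper-tail
    probability of the chi-squared distribution equals erfc(sqrt(x / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic, erfc(sqrt(statistic / 2))


# Rows: men, women; columns: sick, not sick.
stat, p = chi_squared_2x2([[5, 45], [25, 25]])
print(f"chi-squared = {stat:.2f}, p = {p:.6f}")

# A table that matches the expected values exactly gives a statistic of 0.
null_stat, null_p = chi_squared_2x2([[15, 35], [15, 35]])
```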

9
Subjecting Snow to the χ² Test

► John Snow did not have these statistical tools when he conducted his investigation

► If he had, he would have seen a very significant relationship

► Here p = 0.0002
► There is strong evidence for a significant relationship between district water provider and cholera

                     No cholera   Cholera
Lambeth and S&V        346,152      211
S&V only               118,156      111
S&V and Kent            17,786       19
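
The same calculation extends to Snow's three-row table. The sketch below assumes a plain Pearson χ² test with (3 − 1) × (2 − 1) = 2 degrees of freedom, for which the upper-tail probability has the closed form exp(−x/2); the slide does not say which software or options produced its p-value, and the variable names are illustrative.

```python
from math import exp

# Counts from the table: (no cholera, cholera) by district water supplier.
observed = {
    "Lambeth and S&V": (346_152, 211),
    "S&V only": (118_156, 111),
    "S&V and Kent": (17_786, 19),
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
total = sum(row_totals)

statistic = 0.0
for i, row in enumerate(rows):
    for j, count in enumerate(row):
        # Expected count if supplier and cholera were independent.
        expected = row_totals[i] * col_totals[j] / total
        statistic += (count - expected) ** 2 / expected

# Upper tail of the chi-squared distribution with 2 degrees of freedom.
p_value = exp(-statistic / 2)
print(f"chi-squared = {statistic:.2f}, p = {p_value:.4f}")
```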

10
Other Common Sources of P-Values

► The t-test tests the difference in means between two groups
► Has one- and two-tailed versions
► Two-tailed is more conservative

► Analysis of variance (ANOVA) is typically used to evaluate the results of statistical experiments

► Regression models produce p-values for various effect estimates

► Mann-Whitney U test produces p-values for comparing groups without calculating means

► There are as many ways to calculate p-values as there are ways to analyze data
11
Avoiding Misuse of P-Values

► P-values are not inherently bad, but they are misused

► Because of this misuse, some scientific journals no longer allow them

► A few pointers to avoid misuse:


► There is no magical value that makes something significant; 0.05 is just a rule of thumb
► In most statistical tests, p-values are only comparing against an alternative of no
association (that is, do not compare against another hypothesis)
► Just because something is significant does not mean it is meaningful
► Non-significance is not evidence of no association; it is just lack of evidence for one
► If you do an experiment or study enough times, you will eventually get a significant
result

12
Confidence Intervals

► Confidence intervals give a measure of the likely range of values a measure of association
might take

► Improve over p-values by giving us a range of supported values—not just a binary answer
about significance

► Most commonly reported as a 95% confidence interval


► Appropriate interpretation is subtle:
● An interval that, if calculated the same way in multiple experiments, would cover
the true value 95% of the time
► In practice, can be thought of as giving the range of values that are not significantly different from our main estimate at the 0.05 (that is, 1 − 0.95) level

13
P-value vs. Confidence Interval

► P-value = 0.05

► 95% confidence interval

14
Bringing it all Together

► Our original specific hypothesis:
► “People who get their water from the less-contaminated part of the Thames upstream of London are less likely to die from cholera than those who get their water from the more contaminated part of the Thames downstream of London”

► Our estimates of relative risk and risk difference strongly suggest this is true:
► RR = 8.4; RD = 2.8 deaths per 100

► Confidence intervals show this is a significant result:
► RR 95% CI: 6.8 to 10.3
► RD 95% CI: 2.6 to 3.0

                          Deaths in households (HHs)   Surviving in HHs
Southwark and Vauxhall              1,263                   38,783
Lambeth                                98                   26,009
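
The slides do not say how these confidence intervals were calculated. As a sketch under that caveat, the common large-sample (Wald) formulas, applied on the log scale for the relative risk, give intervals close to the ones reported. The function names and the choice of z = 1.96 are assumptions made for illustration.

```python
from math import exp, log, sqrt

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def relative_risk_ci(a, b, c, d, z=Z_95):
    """Relative risk with a Wald interval computed on the log scale."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, exp(log(rr) - z * se_log_rr), exp(log(rr) + z * se_log_rr)


def risk_difference_ci(a, b, c, d, z=Z_95):
    """Risk difference with a Wald interval."""
    p1, p2 = a / (a + b), c / (c + d)
    rd = p1 - p2
    se = sqrt(p1 * (1 - p1) / (a + b) + p2 * (1 - p2) / (c + d))
    return rd, rd - z * se, rd + z * se


# Southwark and Vauxhall (exposed) vs. Lambeth (unexposed).
rr, rr_lo, rr_hi = relative_risk_ci(1263, 38783, 98, 26009)
rd, rd_lo, rd_hi = risk_difference_ci(1263, 38783, 98, 26009)

print(f"RR = {rr:.1f} (95% CI {rr_lo:.1f} to {rr_hi:.1f})")
print(f"RD = {rd * 100:.1f} per 100 (95% CI {rd_lo * 100:.1f} to {rd_hi * 100:.1f})")
```

Neither interval includes the "no difference" value (1 for the relative risk, 0 for the risk difference), which is what the slide means by a significant result.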

15
Key Points—1

► Large measures of association are not necessarily strong evidence (and small ones sometimes are)

► Statistical tests let us test them against a “null” hypothesis of no association

► Low p-values support that some association exists, but there is no magic number that “proves” it

16
Key Points—2

► Confidence intervals provide more insight into the range of associations that are consistent with the evidence
► Still can be misused

► Statistical tests measure associations in the data and are stronger as there is more data
► If there is systematic bias, it will just be stronger evidence for the wrong conclusion
● For example, a bad comparison group

17
Exercise

► Use Excel or an online chi-squared test calculator to calculate the p-values for the results
of me betting my friend John that I can flip more heads than he can:
► I flip 4 heads out of 5 tries; John flips 3 out of 5
► I flip 9 heads out of 10 tries; John flips 4 out of 9
► I flip 18 heads out of 20 tries; John flips 10 out of 20

► Would any of these give you strong enough evidence to accuse me of cheating?

► Online χ² test calculators:


► https://www.socscistatistics.com/tests/chisquare2/Default2.aspx
► http://www.quantpsy.org/chisq/chisq.htm
► http://turner.faculty.swau.edu/mathematics/math241/materials/contablecalc/

18
Summary

► A precise hypothesis is the key to testing an epidemiologic idea

► Testing a precise hypothesis requires comparisons between appropriate groups

► Measures of relative risk and risk differences can help us to quantify associations

► Statistical tests help us determine if these associations provide meaningful evidence for
or against a hypothesis

19
