► Turn general questions about disease and health into precise hypotheses
Recommended Readings and Resources
General Questions to Precise Hypotheses
The material in this video is subject to the copyright of the owners of the material and is being provided for educational purposes under
rules of fair use for registered students in this course only. No additional copies of the copyrighted work may be made or distributed.
Following the Path of John Snow
► May be appropriate for framing your life’s work, but not for a specific study
A Broad Hypothesis
► Can help guide an investigation, but cannot itself be directly supported or refuted
A Specific Hypothesis
► “People who drink water with sewage in it are more likely to get cholera than
those who drink clean water”
► Serves as the basis for the design of experiments and epidemiologic studies
A Hypothesis Tied to a Study or an Observation
► Falsifiable: there is some observation that could lead us to conclude that the hypothesis is
false
► Precise: states exactly the relationship that will be tested and the expected result
The Problem with a Bad Hypothesis
► “Children who have received the MMR [measles, mumps, rubella] vaccine are significantly
more likely to have autism than those who do not”
► Has a clearly defined exposure, outcome, and relationship
► Can be falsified …
► … and has been proven false!
● For example, Jain et al., 2015, JAMA
Key Points
► Broad scientific questions and general hypotheses can help guide research
► But are of limited value when evaluating evidence
Exercise
► Make a specific, testable hypothesis that would help you determine the likely cause
Identifying Comparison Groups
What Is a Comparison Group?
What Makes a Good Comparison Group?—1
► The groups must be selected without regard to what is being compared between groups!
► If comparing incidence in the exposed and unexposed: group members must be
selected without regard to outcome
► If comparing exposure in cases and controls: group members must be selected without
regard to exposure
► Failure to do this is called selection bias
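A small numeric illustration of the case-control rule, as a Python sketch with purely hypothetical counts (not taken from the coffee study or any other real study): when controls are drawn without regard to exposure, cases and controls are exposed at the same rate, but drawing controls from a group that tends to avoid the exposure creates a spurious difference. The function name is illustrative.

```python
def proportion_exposed(exposed: int, total: int) -> float:
    """Share of a group that has the exposure of interest."""
    return exposed / total


# Hypothetical counts: the exposure has no real effect on the outcome,
# so cases and the source population are exposed at the same rate (60%).
cases = proportion_exposed(60, 100)

# Controls sampled from the source population without regard to exposure.
fair_controls = proportion_exposed(600, 1000)

# Controls sampled from a group that tends to avoid the exposure,
# i.e. (unintentionally) selected with regard to exposure.
biased_controls = proportion_exposed(400, 1000)

fair_gap = cases - fair_controls      # no apparent association
biased_gap = cases - biased_controls  # spurious association from selection bias
print(f"fair controls: {fair_gap:.2f}, biased controls: {biased_gap:.2f}")
```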
What Makes a Good Comparison Group?—2
► When we define groups, we must not unintentionally compare groups that differ in some other factor that causes the outcome
► This happens when another factor is associated with both our exposure of interest and the outcome
► When this occurs, it is called confounding
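To see how a third factor can manufacture an association, here is a minimal Python sketch with purely hypothetical counts. Within each level of the confounder, exposed and unexposed people have identical risk, but exposed people are concentrated where the outcome is common, so pooling the groups produces a spurious relative risk (risk in the exposed divided by risk in the unexposed). The `Group` class and the stratum labels are illustrative names, not part of the course material.

```python
from dataclasses import dataclass


@dataclass
class Group:
    cases: int
    total: int

    @property
    def risk(self) -> float:
        """Cumulative incidence: cases divided by the number of people in the group."""
        return self.cases / self.total


def risk_ratio(exposed: Group, unexposed: Group) -> float:
    return exposed.risk / unexposed.risk


# Hypothetical counts split by a third factor (the confounder).
# Within each stratum the exposure has no effect (equal risks), but exposed
# people are concentrated in the stratum where the outcome is common.
strata = {
    "confounder present": (Group(cases=160, total=800), Group(cases=40, total=200)),
    "confounder absent": (Group(cases=10, total=200), Group(cases=40, total=800)),
}

stratum_ratios = {name: risk_ratio(e, u) for name, (e, u) in strata.items()}

# Ignoring the confounder: pool the strata back together.
crude_exposed = Group(sum(e.cases for e, _ in strata.values()),
                      sum(e.total for e, _ in strata.values()))
crude_unexposed = Group(sum(u.cases for _, u in strata.values()),
                        sum(u.total for _, u in strata.values()))
crude_ratio = risk_ratio(crude_exposed, crude_unexposed)

print(stratum_ratios)          # 1.0 in both strata
print(round(crude_ratio, 3))   # 2.125: an apparent effect created by confounding
```

Comparing risks within levels of a suspected confounder, as in this sketch, is one common way to check for confounding; it only works for factors that were actually measured.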
Visualizing Selection Bias—1
Visualizing Selection Bias—2
Selection Bias in Real Life: Coffee and Pancreatic Cancer—1
Selection Bias in Real Life: Coffee and Pancreatic Cancer—2
Visualizing Confounding—1
Visualizing Confounding—2
► But people with high BMI are better off and more likely to have private wells to get their water
Confounding in Real Life: William Farr and Cholera
► Accidentally picking groups based on the exposure (or outcome) can lead to selection bias
► Picking groups that differ by some factor associated with disease other than what is being
tested can lead to confounding
Exercise
► At your family dinner, you think it is the sweets that are causing the diarrhea
Basic Measures of Association 1: Risk Difference
Measures of Disease Occurrence
► Incidence is the number of new cases of a disease that occur in a defined population in a
defined period of time; measures include:
► Reported cases: raw number of cases reported over some time period
► Cumulative incidence: proportion of an initially disease-free population that develops the disease over a span of time (sometimes called the attack rate)
► Incidence rate (IR): number of new cases per unit of population per unit of time
● That is, cases per person-time
► Prevalence is the number (or proportion) of people in a population who have a disease at a given time
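The difference between cumulative incidence and incidence rate shows up as soon as people are followed for different lengths of time. This Python sketch uses a hypothetical five-person cohort and a hypothetical prevalence survey (all names and counts are illustrative), and assumes that a person stops contributing person-time once diagnosed.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    years_at_risk: float   # time observed while disease-free and at risk
    became_case: bool      # developed the disease during follow-up


# Hypothetical cohort of 5 initially disease-free people, followed for up to 2 years.
cohort = [
    Participant(2.0, False),
    Participant(2.0, False),
    Participant(1.5, True),   # diagnosed after 1.5 years; stops contributing time
    Participant(0.5, True),
    Participant(2.0, False),
]

new_cases = sum(p.became_case for p in cohort)          # count of new cases
cumulative_incidence = new_cases / len(cohort)          # proportion over 2 years
person_years = sum(p.years_at_risk for p in cohort)
incidence_rate = new_cases / person_years               # cases per person-year

# Prevalence is a snapshot: people who have the disease at one point in time.
people_with_disease, population = 30, 1_000             # hypothetical survey counts
prevalence = people_with_disease / population

print(new_cases, cumulative_incidence, incidence_rate)  # 2 0.4 0.25
```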
Risk Difference
► Difference in the incidence rate (or cumulative incidence) of disease between groups
$$\mathrm{RD} = \mathrm{IR}_{\text{Population } A} - \mathrm{IR}_{\text{Population } B}$$
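Classical Risk Difference Calculation—1
► If we observe people for the same amount of time, we can calculate risk differences based on cumulative incidence:
$$\mathrm{RD} = \frac{a}{a+b} - \frac{c}{c+d}$$
► Here a and b are the numbers of exposed people who do and do not develop the disease, and c and d are the corresponding counts among the unexposed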
Classical Risk Difference Calculation—2
► Risk difference from John Snow’s investigation:
$$\mathrm{RD} = \frac{1{,}263}{1{,}263 + 38{,}783} - \frac{98}{98 + 26{,}009} = 2.8 \text{ deaths per 100}$$
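The classical risk difference is simple arithmetic on a 2x2 table. This minimal Python sketch assumes the usual layout (a, b = exposed with and without the outcome; c, d = unexposed with and without the outcome) and reproduces the 2.8 per 100 figure from the counts on the slide; the function name is illustrative.

```python
def risk_difference(a: int, b: int, c: int, d: int) -> float:
    """Cumulative-incidence risk difference for a 2x2 table.

    a, b: exposed group with / without the outcome
    c, d: unexposed group with / without the outcome
    """
    return a / (a + b) - c / (c + d)


# Counts from the slide (Snow's two water-supply groups).
rd = risk_difference(1_263, 38_783, 98, 26_009)
print(f"RD = {rd * 100:.1f} deaths per 100")  # RD = 2.8 deaths per 100
```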
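Person-Time Based Risk Difference Calculation—1
► If different individuals are observed (that is, at risk) for different amounts of time, we need to base this calculation on the time observed:
$$\mathrm{RD} = \frac{a}{b} - \frac{c}{d}$$
► Here a and c are the numbers of cases, and b and d are the total person-time observed in each group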
Person-Time Based Risk Difference Calculation—2
► Risk of coronary artery disease among participants in the Nurses’ Health Study:
$$\mathrm{RD} = \frac{259}{265{,}203} - \frac{662}{358{,}132} = -8.7 \text{ cases per 10,000 person-years}$$
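With person-time denominators, the same subtraction uses rates instead of proportions. This Python sketch assumes the slide's layout, with 259 cases over 265,203 person-years in the first group and 662 cases over 358,132 person-years in the second; a negative result means the first group has the lower rate. The function name is illustrative.

```python
def rate_difference(a: int, person_time_a: float, c: int, person_time_c: float) -> float:
    """Incidence rate difference: cases per unit of person-time in each group."""
    return a / person_time_a - c / person_time_c


rd = rate_difference(259, 265_203, 662, 358_132)
print(f"RD = {rd * 10_000:.1f} cases per 10,000 person-years")  # RD = -8.7 ...
```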
Relative Risk
$$\mathrm{IRR} = \frac{\mathrm{IR}_{\text{Population } A}}{\mathrm{IR}_{\text{Population } B}}$$
Classical Relative Risk Calculation
► If we observe people for the same amount of time, we can calculate relative risk based on cumulative incidence (more properly, the cumulative incidence ratio):
$$\mathrm{RR} = \frac{a/(a+b)}{c/(c+d)}$$
Classical Relative Risk Calculation
► Relative risk from John Snow’s investigation:
$$\mathrm{RR} = \frac{1{,}263/(1{,}263 + 38{,}783)}{98/(98 + 26{,}009)} = 8.4 \text{ times the risk of death}$$
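The cumulative incidence ratio divides the two risks instead of subtracting them. A minimal Python check of the slide's arithmetic, under the same assumed 2x2 layout (a, b exposed with and without the outcome; c, d unexposed); the function name is illustrative.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Cumulative incidence ratio: [a / (a + b)] / [c / (c + d)]."""
    return (a / (a + b)) / (c / (c + d))


rr = relative_risk(1_263, 38_783, 98, 26_009)
print(f"RR = {rr:.1f}")  # RR = 8.4
```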
Person-Time Based Relative Risk Calculation
► If different individuals are observed (that is, at risk) for different amounts of time, we need to calculate a proper incidence rate ratio based on the time observed:
$$\mathrm{IRR} = \frac{a/b}{c/d}$$
Person-Time Based Relative Risk Calculation
► Risk of coronary artery disease among participants in the Nurses’ Health Study:
$$\mathrm{IRR} = \frac{259/265{,}203}{662/358{,}132} = 0.53 \text{ times the annual rate of developing disease}$$
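The incidence rate ratio follows the same pattern with person-time denominators. This Python sketch plugs in the Nurses' Health Study counts; with a second-group denominator of 358,132 person-years (358,135 gives the same result to two decimals), the ratio rounds to 0.53. The function name is illustrative.

```python
def incidence_rate_ratio(a: int, person_time_a: float, c: int, person_time_c: float) -> float:
    """Ratio of two incidence rates (cases per unit of person-time)."""
    return (a / person_time_a) / (c / person_time_c)


irr = incidence_rate_ratio(259, 265_203, 662, 358_132)
print(f"IRR = {irr:.2f}")  # IRR = 0.53
```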
Personal Risk vs. Public Health Risk—2
Key Points
► The risk difference and relative risk are two common measures of association
► If individuals are observed for different amounts of time, rates must be used
Exercise
► Compare the relative risk and risk difference for the data below
Understanding Basic Statistical Tests
Statistical Significance—1
P-Values—2
Testing Tabular Data: the χ² Test
► χ² (pronounced chi-squared) tests tell us how much tabular data differ from what we would expect by chance
► Gives evidence that a significant association exists, but not the direction of that association
Understanding the χ² Test—1
► The table on the right shows the expected values for sick men and women:
         Sick   Not sick   Total
Men        15         35      50
Understanding the χ² Test—2
► If the observed values are similar to the expected ones, the χ² test will produce a large p-value:
         Sick   Not sick   Total
Men        14         36      50
Understanding the χ² Test—3
► If the observed values differ from the expected ones, the χ² test will produce a small p-value:
         Sick   Not sick   Total
Men         5         45      50
Subjecting Snow to the χ² Test
► Here p = 0.0002
► There is strong evidence for a
significant relationship
between district water provider
and cholera
Other Common Sources of P-Values
► The t-test assesses whether the difference in means between two groups is larger than expected by chance
► Has one- and two-tailed versions
► Two-tailed is more conservative
Confidence Intervals
► Confidence intervals give a range of plausible values for a measure of association, given the data
► Improve over p-values by giving us a range of supported values, not just a binary answer about significance
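One standard way to attach a confidence interval to a relative risk is the large-sample log method (often attributed to Katz): build a symmetric interval for log(RR) and exponentiate it. This is one common approach, not necessarily the one used elsewhere in this course. The Python sketch applies it to Snow's counts from the relative risk slide, assumes the same 2x2 layout (a, b exposed; c, d unexposed), and treats the counts as independent observations; the function name is illustrative.

```python
from __future__ import annotations

import math


def relative_risk_ci(a: int, b: int, c: int, d: int,
                     z: float = 1.96) -> tuple[float, float, float]:
    """Relative risk with an approximate 95% CI using the log (Katz) method."""
    rr = (a / (a + b)) / (c / (c + d))
    # Large-sample standard error of log(RR).
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lower, upper = relative_risk_ci(1_263, 38_783, 98, 26_009)
print(f"RR = {rr:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

Because the interval is built on the log scale, it is not symmetric around the point estimate; an interval that excludes 1 corresponds to a small p-value for the hypothesis of no association.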
P-value vs. Confidence Interval
► P-value = 0.05
Bringing It All Together
► Our original specific hypothesis:
► “People who get their water from the less-contaminated part of the Thames upstream of London are less likely to die from cholera than those who get their water from the more contaminated part of the Thames downstream of London”
Key Points—1
► Large measures of association are not necessarily strong evidence (and small ones sometimes are)
Key Points—2
► Confidence intervals provide more insight into the range of associations that are consistent with the evidence
► They can still be misused
Exercise
► Use Excel or an online chi-squared test calculator to calculate the p-values for the results of me betting my friend John that I can flip more heads than he can:
► I flip 4 heads out of 5 tries; John flips 3 out of 5
► I flip 9 heads out of 10 tries; John flips 4 out of 9
► I flip 18 heads out of 20 tries; John flips 10 out of 20
► Would any of these give you strong enough evidence to accuse me of cheating?
Summary
► Measures of relative risk and risk differences can help us to quantify associations
► Statistical tests help us determine if these associations provide meaningful evidence for
and against a hypothesis