Some of these assertions, like the link between cigarettes and cancer, stand the test of
time, get confirmed by other scientific experiments, and become acknowledged as true
scientific facts. For others, new studies fail to find the same effect, or even find the
opposite one, and the literature goes back and forth with seemingly conflicting claims.
Researchers who have studied this phenomenon have found surprisingly high rates of
nonreplicability for scientific results. Some have even proposed that in certain fields, a
majority of the published research findings are false.
This file is meant for personal use by jmro@me.com only.
Sharing or publishing the contents in part or full is liable for legal action.
This is pretty surprising, since these studies are rigorously refereed, and they are
usually accompanied by p-values, confidence intervals, and tables of data that provide
a veneer of the epistemological comfort conveyed by modern science.
Here are some examples of common mistakes that can cause these problems.
→Consider the following scenario. Suppose we get frustrated with all the conflicting
claims we are reading, and we want to decide once and for all which foods cause or
prevent heart disease.
To do this, we collect data on the 100 most common foods that people eat and their
incidence of heart disease.
Designing this study is complicated: people lie about their food habits, it's not always
clear whether someone does or doesn't have heart disease, and so on.
For simplicity, let’s just set this aside and assume that we can successfully assign
p-values to the hypothesis of the form:
Food X causes heart disease.
Suppose we do this for all 100 foods at the 95% confidence level. Even if no food
actually affects heart disease, we would expect about five of them to appear
statistically significant purely by chance. Two variables can seem related when they
only appear so by coincidence or due to the role of a third, intermediary variable. When
this occurs, the two original variables are said to have a "spurious relationship."
The problem here is called hypothesis shopping. Instead of fixing a hypothesis and
then testing it, we looked at our data and tried to find a hypothesis that would pass
the test.
This probabilistic phenomenon, where we are simultaneously testing a large number of
hypotheses, can occur fairly easily under less contrived conditions.
For instance, imagine testing each of five variants of a drug on five different types of
cancer, and then looking at how the effects break down by age, gender, and ethnicity.
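Counting the implicit comparisons makes the problem concrete. A sketch, with hypothetical numbers for the demographic breakdown:

```python
drugs = 5
cancers = 5
# Hypothetical breakdown: 3 age brackets x 2 genders x 4 ethnicities.
subgroups = 3 * 2 * 4

n_tests = drugs * cancers * subgroups
expected_false_positives = n_tests * 0.05  # at the 95% confidence level

print(n_tests)                   # 600 implicit hypothesis tests
print(expected_false_positives)  # about 30 significant-looking results by chance
```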
Even more extreme examples are easy to find in scientific settings.
For example, in genomics, scientists often want to figure out which genes cause a
disease. One way to start looking is a Genome-Wide Association Study, where they
cross-reference the incidence of the disease with tens of thousands of genes, looking
for the small number of genes that cause it.
Here, if they use a 95% confidence threshold (p < 0.05), we would expect thousands of
false positives: genes that exhibit a seemingly statistically significant correlation but
don't actually have anything to do with the disease. On the other hand, we will get just
a handful of true positives corresponding to the actual genes they are looking for.
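The arithmetic behind this is stark. A sketch with hypothetical numbers (20,000 genes scanned, 10 of them truly involved):

```python
n_genes = 20_000  # hypothetical size of a genome-wide scan
n_causal = 10     # hypothetical number of genes truly linked to the disease
alpha = 0.05      # the 95% confidence threshold

# Every non-causal gene still has a 5% chance of looking significant.
expected_false_positives = (n_genes - n_causal) * alpha

print(expected_false_positives)  # 999.5: roughly a thousand spurious hits
```

Under these assumptions the chance hits outnumber the real genes by about a hundred to one, so the raw list of "significant" genes is almost all noise.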
Of course, good researchers carefully correct for this, for example by requiring much
stricter significance thresholds, or by using these studies only to identify candidates
and then carefully testing them by other means.
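One standard way to require a much stricter threshold (not named in the text, but a common choice) is the Bonferroni correction: divide the significance threshold by the number of tests, so the chance of even one false positive across the whole study stays near 5%. A sketch using the same hypothetical 20,000-gene scan:

```python
n_tests = 20_000
alpha = 0.05

# Bonferroni: each individual test must clear alpha / n_tests.
per_test_threshold = alpha / n_tests
print(per_test_threshold)  # alpha / n_tests = 2.5e-06

# Expected false positives across all null tests drops from ~1000 to ~0.05.
expected_false_positives = n_tests * per_test_threshold
print(expected_false_positives)
```

The price is statistical power: a real effect now needs overwhelming evidence to register, which is one reason such scans are often used only to nominate candidates for follow-up experiments.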
But it’s easy to miss this and to make mistakes.
Alternatively, suppose we set up an observational experiment, where we set
something up and just look to see if anything interesting happens. There is a lot of
potentially interesting behavior, so we might actually expect something interesting to
happen just by chance. This last point leads to a very subtle way that these biases can
creep into the literature.
→In general, it's pretty hard to get a good journal to accept a paper about an
experiment in which nothing interesting happened. People are much more likely to
publish papers that describe some new and exciting phenomenon. If one uses 95% as
the cutoff for something being significant, then one would get something significant by
chance 5% of the time.
But people are often much more likely to report their findings when they see
something significant. So, if people don't properly correct for this, then we would
expect a lot more than 5% of the interesting phenomena that are submitted for
publication to have occurred by chance.
This is particularly acute if we have a bunch of groups of people studying the same
phenomenon.
→For instance, suppose that some food actually has no effect on whether we get
cancer, but people think there are some reasons it might.
Suppose this results in 50 groups of people independently performing experiments to
test the hypothesis. Of these, we would expect a few groups, around two or three, to
see an effect that clears the 95% confidence bar.
These groups don't know about each other, so they excitedly submit their work for
publication. The other 47 or so groups don't see anything interesting, and many of
them don't even bother to publish these findings.
This means that all anybody ever sees is that two or three groups independently
observed the phenomenon with 95% confidence, which, if they were the only ones
looking at the question, would seem extremely unlikely to happen by chance.
This is not even a mistake on their part: if they don't know about the other groups,
they might not have any obvious reason to worry about this sort of problem.
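The 50-group scenario is easy to simulate. A sketch, assuming each group runs an independent test of a truly null hypothesis at the 95% confidence level and only "significant" results get written up:

```python
import random

random.seed(1)

N_GROUPS = 50
ALPHA = 0.05
N_RUNS = 10_000  # repeat the whole 50-group scenario many times

positives_per_run = []
for _ in range(N_RUNS):
    # Each group's p-value is uniform under the (true) null hypothesis;
    # a group submits for publication only if it clears the 95% bar.
    positives = sum(1 for _ in range(N_GROUPS) if random.random() < ALPHA)
    positives_per_run.append(positives)

average = sum(positives_per_run) / N_RUNS
print(average)  # close to 50 * 0.05 = 2.5 "exciting" results per scenario
```

The literature then contains only those two or three positive papers, so the published record looks like independent replication of a real effect.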
One reason we discuss this sort of problem is that it can become particularly common
in the context of so-called Big Data. There is a very thin line between data mining and
hypothesis shopping. As illustrated by the genome-wide association study example,
bigger datasets make it possible to look for an increasingly large number of patterns.
Moreover, complex machine learning techniques look for increasingly complex
patterns, which makes it even easier to find something seemingly significant that
occurs just by chance.
The techniques we discussed are powerful, so it's important to make sure you are
using them correctly. When used correctly, they let people do things that otherwise
wouldn't be possible. But if you don't combine them with appropriate skepticism and
common sense, it's easy to make mistakes and reach the wrong conclusions.