
Misapplication of Statistical Techniques

Misapplication of statistical techniques can lead to incorrect conclusions.


Let’s look at an example from the setting of scientific research.
If we read the papers or browse the internet, there is no shortage of advice, supposedly backed by some scientific study, about the health effects of some food or behavior. For example, we may read that red wine prevents heart disease, antioxidants make you live longer, cigarettes cause cancer, coffee causes cancer, coffee prevents cancer, or some berry causes weight loss.

Some of these assertions like the link between cigarettes and cancer stand the test of
time, get confirmed by other scientific experiments, and become acknowledged as true
scientific facts. For others, new studies don’t find the same effect or maybe they even
find the opposite one. And literature goes back and forth with seemingly conflicting
claims.

This is not just a problem with nutrition literature, but we can find similar examples in the literature discussing the best treatment for a medical problem, the genetic basis for disease, or the existence of telepathy. We can find such conflicting evidence even in clinical trials published in top scientific journals.

Researchers who have studied this phenomenon have found surprisingly high rates of non-replicability for scientific results, and some have proposed that in certain fields a majority of the published research findings may be false.

This is pretty surprising, since these studies are rigorously refereed, and they are usually accompanied by p-values, confidence intervals, and tables of data that provide a veneer of the epistemological comfort conveyed by modern science.

Here are some examples of commonly made mistakes that can cause these problems.
Consider the following scenario. Suppose we get frustrated with all the conflicting claims that we are reading, and we want to decide once and for all which foods cause or prevent heart disease.
To do this, we collect data on the 100 most common foods that people eat and their incidence of heart disease.
Designing this study is complicated: people lie about their food habits, it’s not always clear whether someone does or doesn’t have heart disease, and so on.
For simplicity, let’s set this aside and assume that we can successfully assign p-values to hypotheses of the form:
Food X causes heart disease.

Then we take all the foods with p > 0.95.


We will report these foods as causing heart disease with 95% confidence.
But here is the problem: p > 0.95 means that the chance of the observed outcome under the null hypothesis (no effect) is ≤ 0.05.
Suppose that none of the foods we test actually has any effect on heart disease. This means that for each of the hundred foods, the null hypothesis, that the food doesn’t cause heart disease, is true.
However, if we get a false positive for each food with probability 5%, then we would expect to find an apparent effect for about 5 of the 100 foods.
Even worse, if the false positives for different foods are independent, then the probability that at least one food appears to have an effect is very high: it turns out to be about 99.4%.
So, even though the 95% confidence level seems quite good, the chance of us reporting a spurious effect is extremely high.
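As a quick sanity check of these numbers, here is a minimal sketch in Python, assuming 100 foods and a 5% false-positive rate per food as in the scenario above:

    # Sketch: expected false positives, and the chance of at least one,
    # when testing 100 independent "no effect" hypotheses at the 5% level.
    n_foods = 100
    alpha = 0.05  # per-food false-positive probability

    expected_false_positives = n_foods * alpha          # 5.0
    prob_at_least_one = 1 - (1 - alpha) ** n_foods      # about 0.994

    print(f"Expected spurious effects: {expected_false_positives:.1f}")
    print(f"P(at least one spurious effect): {prob_at_least_one:.3f}")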
Spurious is a term used to describe a statistical relationship between two variables that would, at first glance, appear to be causally related, but upon closer examination only appear so by coincidence or due to the role of a third, intermediary variable. When this occurs, the two original variables are said to have a "spurious relationship."
The problem here is called Hypothesis Shopping.

Here, instead of fixing a hypothesis and then testing it, we looked at our data and tried to find a hypothesis that would pass the test.
This probabilistic phenomenon, where we are simultaneously testing a large number of hypotheses, can occur fairly easily under less contrived conditions.

For instance, imagine testing each of five variants of a drug on five different types of cancer, and then looking at how the effects break down by age, gender, and ethnicity.
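To get a feel for how fast the number of implicit hypotheses grows, here is a rough sketch; the particular subgroup counts for age, gender, and ethnicity are illustrative assumptions, not values from the text:

    # Sketch: counting the hypotheses implicitly tested in the drug example.
    # The subgroup breakdown below is an illustrative assumption.
    n_drug_variants = 5
    n_cancer_types = 5
    n_subgroups = 3 * 2 * 4  # e.g. 3 age bands x 2 genders x 4 ethnicities

    n_hypotheses = n_drug_variants * n_cancer_types * n_subgroups  # 600
    alpha = 0.05

    print(f"Implicit hypotheses tested: {n_hypotheses}")
    print(f"Expected false positives at the 5% level: {n_hypotheses * alpha:.0f}")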


It’s easy to find even more extreme examples in scientific settings.
For example, in genomics, scientists often want to figure out which genes cause a disease. One way to start looking for these is a Genome-Wide Association Study, where they cross-reference the incidence of the disease with tens of thousands of genes, looking for the small number of genes that cause it.
Here, if they use the same p > 0.95 cutoff, we would expect thousands of false positives, that is, genes that exhibit a seemingly statistically significant correlation but don’t actually have anything to do with the disease.
On the other hand, they would get just a handful of true positives corresponding to the actual genes they are looking for.
Of course, good researchers carefully correct for this, for example by requiring much higher probabilities, or by just using these studies to identify candidates and then carefully testing them by other means.
But it’s easy to miss this and to make mistakes.
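One common way to "require much higher probabilities" is a Bonferroni correction, which divides the significance threshold by the number of tests. The sketch below uses simulated p-values for a hypothetical screen of 20,000 genes; the numbers are made up for illustration:

    # Sketch of a Bonferroni correction in a GWAS-style screen.
    # The p-values are simulated under the null hypothesis (no real effects);
    # in a real study they would come from per-gene association tests.
    import random

    random.seed(0)
    n_genes = 20_000
    alpha = 0.05

    p_values = [random.random() for _ in range(n_genes)]  # uniform under the null

    naive_hits = sum(p < alpha for p in p_values)                 # roughly 1,000
    bonferroni_hits = sum(p < alpha / n_genes for p in p_values)  # almost always 0

    print(f"Naive 5% threshold:   {naive_hits} 'significant' genes")
    print(f"Bonferroni threshold: {bonferroni_hits} 'significant' genes")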

Alternatively, suppose we run an observational experiment where we set something up and just look to see if anything interesting happens. There is a lot of potentially interesting behavior, so we might actually expect something interesting to happen just by chance. This last example leads to a very subtle way in which these biases can creep into the literature.

In general, it’s pretty hard to get a good journal to accept a paper in which nothing interesting happened in an experiment. People are much more likely to publish papers that describe some new and exciting phenomenon. If one uses 95% as the cutoff for something being significant, then one would get something significant by chance 5% of the time.

But people are often much more likely to report their findings when they see
something significant. So, if people don’t properly correct for this, then we would
expect a lot more than 5% of the interesting phenomena that are submitted for
publication to have occurred by chance.
This is particularly acute if we have a bunch of groups of people studying the same phenomenon.
For instance, suppose that some food actually has no effect on whether we get cancer, but people think that there are some reasons it might.
Suppose that this results in 50 groups of people independently performing experiments to test this hypothesis. Of these, we would expect a few groups, on average about 2 or 3, to see an effect that clears the 95% confidence bar.

These groups don’t know about each other, so they excitedly submit their work for publication. The other 47 or so groups don’t see anything interesting, and many of them don’t even bother to publish this sort of finding.
This means that all anybody ever sees is that 2 or 3 groups independently observed the phenomenon with 95% confidence, which, if they were the only ones looking at the question, would seem extremely unlikely to happen by chance.
This is not even a mistake on their part: if they don’t know about the other groups, they might not have any obvious reason to worry about this sort of problem.
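A small simulation makes this effect concrete. The sketch below assumes 50 groups, each with a 5% false-positive rate, as in the story above; the number of simulated repetitions is an arbitrary choice:

    # Sketch: 50 independent groups study a food with no real effect.
    # Each group has a 5% chance of a false positive; only positive
    # results tend to get written up and submitted.
    import random

    random.seed(1)
    n_groups = 50
    alpha = 0.05
    n_trials = 10_000

    positives_per_trial = []
    for _ in range(n_trials):
        positives = sum(random.random() < alpha for _ in range(n_groups))
        positives_per_trial.append(positives)

    avg_positives = sum(positives_per_trial) / n_trials
    frac_any = sum(c > 0 for c in positives_per_trial) / n_trials
    print(f"Average groups seeing an 'effect': {avg_positives:.2f}")        # about 2.5
    print(f"Fraction of runs with at least one 'effect': {frac_any:.1%}")   # about 92%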

One reason we discussed this sort of problem is that it can become particularly common in the context of so-called Big Data. There is really a very thin line between Data Mining and Hypothesis Shopping. As illustrated by the genome-wide association study example, bigger datasets make it possible to look for an increasingly large number of patterns. Moreover, complex machine learning techniques look for increasingly complex patterns, which makes it even easier to find something seemingly significant that occurs just by chance.
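To illustrate how easily a large search over noise turns up an apparent pattern, here is a small sketch on entirely made-up random data; it simply reports the strongest of many correlations between unrelated variables:

    # Sketch: generate pure noise, then "shop" for the feature most correlated
    # with an unrelated target. The winning correlation looks meaningful even
    # though every feature is random.
    import random

    random.seed(2)
    n_samples, n_features = 50, 1000

    target = [random.gauss(0, 1) for _ in range(n_samples)]
    features = [[random.gauss(0, 1) for _ in range(n_samples)]
                for _ in range(n_features)]

    def corr(xs, ys):
        # Pearson correlation coefficient between two equal-length lists.
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        return cov / (vx * vy) ** 0.5

    best = max(abs(corr(f, target)) for f in features)
    print(f"Strongest correlation found among {n_features} noise features: {best:.2f}")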

The techniques we discussed are powerful, so it’s important to make sure you are
using them correctly. When used correctly, they let people do things that otherwise
wouldn’t be possible. But if you don’t combine them with the appropriate skepticism
and common sense, it’s easy to make mistakes and reach the wrong conclusions.
