Statistics, when used in a misleading fashion, can trick the casual observer into believing something other than what the data shows. That is, a misuse of statistics occurs when a statistical argument asserts a falsehood. In some cases, the misuse may be accidental. In others, it is purposeful and for the gain of the perpetrator. When the statistical reason involved is false or misapplied, this constitutes a statistical fallacy.
The false statistics trap can be quite damaging to the quest for knowledge. For example, in medical science, correcting a falsehood may take decades and cost lives.
Misuses can be easy to fall into. Professional scientists, even mathematicians and professional statisticians, can be fooled by quite simple methods, even when they are careful to check everything. Scientists have been known to fool themselves with statistics due to a lack of knowledge of probability theory and a lack of standardization of their tests.
Simple causes
Many misuses of statistics occur because
The source is a subject matter expert, not a statistics expert.[8] The source may incorrectly
use a method or interpret a result.
The source is a statistician, not a subject matter expert.[9] An expert should know when the numbers being compared describe different things. Numbers change, while reality does not, when legal definitions or political boundaries change.
The subject being studied is not well defined.[10] While IQ tests are available and numeric, it is difficult to define what they measure; intelligence is an elusive concept. Publishing "impact" has
the same problem.[11] A seemingly simple question about the number of words in the English
language immediately encounters questions about archaic forms, accounting for prefixes and
suffixes, multiple definitions of a word, variant spellings, dialects, fanciful creations (like
ectoplastistics from ectoplasm and statistics),[12] technical vocabulary...
Data quality is poor.[13] Apparel provides an example. People have a wide range of sizes and body shapes, so it is obvious that apparel sizing must be multidimensional. Instead, it is complex in unexpected ways: some apparel is sold by size only (with no explicit consideration of body shape), sizes vary by country and manufacturer, and some sizes are deliberately misleading. While sizes are numeric, only the crudest of statistical analyses is possible using the size numbers, and even then only with care.
The popular press has limited expertise and mixed motives.[14] If the facts are not "newsworthy" (which may require exaggeration), they may not be published. The motives of advertisers are even more mixed.
"Politicians use statistics in the same way that a drunk uses lamp-posts—for support rather
than illumination" - Andrew Lang (WikiQuote)
"What do we learn from these two ways of looking at the same numbers? We learn that a clever propagandist, right or left, can almost always find a way to present the data on economic growth that seems to support her case. And we therefore also learn to take any statistical analysis from a strongly political source with handfuls of salt."[15]
The term statistics originates from numbers generated for and utilized by the state. Good government may require accurate numbers, but popular government may require supportive numbers (not necessarily the same). "The use and misuse of statistics by governments is an ancient art."[16]
Types of misuse
Discarding unfavorable data
All a company has to do to promote a neutral (useless) product is to find or conduct, for example, 40
studies with a confidence level of 95%. If the product is really useless, this would on average
produce one study showing the product was beneficial, one study showing it was harmful and thirty-
eight inconclusive studies (38 is 95% of 40). This tactic becomes more effective the more studies
there are available. Organizations that do not publish every study they carry out, such as tobacco
companies denying a link between smoking and cancer, anti-smoking advocacy groups and media
outlets trying to prove a link between smoking and various ailments, or miracle pill vendors, are likely
to use this tactic.
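As a rough illustration of this arithmetic (a sketch of my own, not taken from the sources cited here), the following simulation assumes each study is a two-sided test at the 5% significance level on outcomes from a product with no real effect; run repeatedly, it produces on average about one "beneficial" result, one "harmful" result, and thirty-eight inconclusive results out of forty.

import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
n_studies, n_subjects = 40, 100
beneficial = harmful = inconclusive = 0

for _ in range(n_studies):
    # The product is useless: measured outcomes are pure noise around zero.
    outcomes = rng.normal(loc=0.0, scale=1.0, size=n_subjects)
    result = ttest_1samp(outcomes, popmean=0.0)
    if result.pvalue < 0.05 and result.statistic > 0:
        beneficial += 1    # appears beneficial purely by chance
    elif result.pvalue < 0.05 and result.statistic < 0:
        harmful += 1       # appears harmful purely by chance
    else:
        inconclusive += 1

print(beneficial, harmful, inconclusive)

Publishing only the occasional "beneficial" study, and shelving the rest, is the misuse described above.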
Ronald Fisher considered this issue in his famous lady tasting tea experiment (from his
1935 book, The Design of Experiments). Regarding repeated experiments he said, "It would clearly
be illegitimate, and would rob our calculation of its basis, if unsuccessful results were not all brought
into the account."
Another term related to this concept is cherry picking.
Loaded questions
Main article: Loaded question
The answers to surveys can often be manipulated by wording the question in such a way as to steer the respondent towards a particular answer. For example, in polling support
for a war, the questions:
Do you support the attempt by the USA to bring freedom and democracy to other places in
the world?
Do you support the unprovoked military action by the USA?
will likely result in data skewed in different directions, although they are both polling about support for the war. A better way of wording the question could be "Do you support the current US military action abroad?" A still more neutral way to put the question is "What is your view about the current US military action abroad?" The point is that the person being asked has no way of guessing from the wording what the questioner might want to hear.
Another way to do this is to precede the question by information that supports the "desired" answer.
For example, more people will likely answer "yes" to the question "Given the increasing burden of
taxes on middle-class families, do you support cuts in income tax?" than to the question
"Considering the rising federal budget deficit and the desperate need for more revenue, do you
support cuts in income tax?"
The proper formulation of questions can be very subtle. The responses to two questions can vary
dramatically depending on the order in which they are asked.[17] "A survey that asked about
'ownership of stock' found that most Texas ranchers owned stock, though probably not the kind
traded on the New York Stock Exchange."[18]
Overgeneralization
Overgeneralization is a fallacy occurring when a statistic about a particular population is asserted to
hold among members of a group for which the original population is not a representative sample.
For example, suppose 100% of apples are observed to be red in summer. The assertion "All apples
are red" would be an instance of overgeneralization because the original statistic was true only of a
specific subset of apples (those in summer), which is not expected to be representative of the
population of apples as a whole.
A real-world example of the overgeneralization fallacy can be observed as an artifact of modern
polling techniques, which prohibit calling cell phones for over-the-phone political polls. As young
people are more likely than other demographic groups to lack a conventional "landline" phone, a telephone poll that surveys only respondents reached on landline phones may undersample the views of young people, if no other measures are taken to account for this skewing of the sample. Thus, a poll examining the voting preferences of young people by this technique may not accurately represent young people's true voting preferences as a whole, because the sample excludes young people who carry only cell phones, whose voting preferences may or may not differ from those of the rest of the population.
Overgeneralization often occurs when information is passed through nontechnical sources, in
particular mass media.
Biased samples
Main article: Biased sample
Scientists have learned at great cost that gathering good experimental data for statistical analysis is
difficult. For example, the placebo effect (mind over body) is very powerful: 100% of subjects developed a rash when exposed to an inert substance that was falsely called poison ivy, while few developed a rash to a "harmless" object that really was poison ivy.[19] Researchers combat this effect with double-blind randomized comparative experiments. Statisticians typically worry more about the validity of
the data than the analysis. This is reflected in a field of study within statistics known as the design of
experiments.
Pollsters have learned at great cost that gathering good survey data for statistical analysis is difficult.
The selective effect of cellular telephones on data collection (discussed in the Overgeneralization section) is one potential example; if young people with traditional telephones are not representative of all young people, the sample can be biased. Sample surveys have many pitfalls and require great care in execution.
[20]
One effort required almost 3000 telephone calls to get 1000 answers. The simple random sample
of the population "isn't simple and may not be random."[21]
False causality
Main article: Correlation does not imply causation
When a statistical test shows a correlation between A and B, there are usually six possibilities:
1. A causes B.
2. B causes A.
3. A and B both partly cause each other.
4. A and B are both caused by a third factor, C.
5. B is caused by C which is correlated to A.
6. The observed correlation was due purely to chance.
The sixth possibility can be quantified by statistical tests that can calculate the probability that the
correlation observed would be as large as it is just by chance if, in fact, there is no relationship
between the variables. However, even if that possibility has a small probability, there are still the five
others.
If the number of people buying ice cream at the beach is statistically related to the number of people
who drown at the beach, then nobody would claim ice cream causes drowning because it's obvious
that it isn't so. (In this case, both drowning and ice cream buying are clearly related by a third factor:
the number of people at the beach).
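A minimal sketch of how a lurking third factor produces such a correlation (my own illustration; the numbers are arbitrary): both ice cream purchases and drownings below are driven only by the number of people at the beach, yet the two series end up strongly correlated with each other.

import numpy as np

rng = np.random.default_rng(1)
days = 365

# The third factor C: how many people are at the beach each day.
beachgoers = rng.integers(50, 1000, size=days)

# A and B each depend only on C, plus independent noise; neither causes the other.
ice_cream_sales = 0.5 * beachgoers + rng.normal(0, 20, size=days)
drownings = 0.01 * beachgoers + rng.normal(0, 1, size=days)

r = np.corrcoef(ice_cream_sales, drownings)[0, 1]
print(f"correlation between ice cream sales and drownings: {r:.2f}")

The printed correlation is clearly positive even though, by construction, neither variable influences the other.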
This fallacy can be used, for example, to prove that exposure to a chemical causes cancer. Replace
"number of people buying ice cream" with "number of people exposed to chemical X", and "number
of people who drown" with "number of people who get cancer", and many people will believe you. In
such a situation, there may be a statistical correlation even if there is no real effect. For example, if
there is a perception that a chemical site is "dangerous" (even if it really isn't) property values in the
area will decrease, which will entice more low-income families to move to that area. If low-income
families are more likely to get cancer than high-income families (this can happen for many reasons,
such as a poorer diet or less access to medical care) then rates of cancer will go up, even though
the chemical itself is not dangerous. It is believed[24] that this is exactly what happened with some of
the early studies showing a link between EMF (electromagnetic fields) from power lines and cancer.
[25]
In well-designed studies, the effect of false causality can be eliminated by assigning some people
into a "treatment group" and some people into a "control group" at random, and giving the treatment
group the treatment and not giving the control group the treatment. In the above example, a
researcher might expose one group of people to chemical X and leave a second group unexposed. If
the first group had higher cancer rates, the researcher knows that no third factor affected whether a person was exposed, because exposure was under the researcher's control and assigned at random. However, in many
applications, actually doing an experiment in this way is either prohibitively expensive, infeasible,
unethical, illegal, or downright impossible. For example, it is highly unlikely that an IRB would accept
an experiment that involved intentionally exposing people to a dangerous substance in order to test
its toxicity. The obvious ethical implications of such experiments limit researchers' ability to
empirically test causation.
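A brief sketch of the random assignment step described above (my own illustration with made-up variable names): because assignment ignores every subject characteristic, a potential third factor such as income ends up balanced between the two groups.

import numpy as np

rng = np.random.default_rng(2)
n = 1000

# A potential confounder (e.g. household income) varies across subjects.
income = rng.lognormal(mean=10, sigma=0.5, size=n)

# Random assignment: each subject goes to treatment or control by coin flip.
treated = rng.random(n) < 0.5

# The groups have similar average income, so income cannot explain a difference in outcomes.
print("mean income, treated:", income[treated].mean())
print("mean income, control:", income[~treated].mean())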
Data dredging
Main article: Data dredging
Data dredging is an abuse of data mining. In data dredging, large compilations of data are examined
in order to find a correlation, without any pre-defined choice of a hypothesis to be tested. Since the required confidence level to establish a relationship between two parameters is usually chosen to be 95% (meaning that an apparent relationship this strong would arise by chance only 5% of the time if no real relationship existed), there is thus a 5% chance of finding an apparently significant correlation between any two sets of completely random variables. Given that data dredging efforts typically examine large datasets with many
variables, and hence even larger numbers of pairs of variables, spurious but apparently statistically
significant results are almost certain to be found by any such study.
Note that data dredging is a valid way of finding a possible hypothesis but that hypothesis must then
be tested with data not used in the original dredging. The misuse comes in when that hypothesis is
stated as fact without further validation.
"You cannot legitimately test a hypothesis on the same data that first suggested that hypothesis. The
remedy is clear. Once you have a hypothesis, design a study to search specifically for the effect you
now think is there. If the result of this test is statistically significant, you have real evidence at last."[28]
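To make the 5% figure concrete, here is a small simulation (my own sketch, assuming all variables are independent Gaussian noise): with 20 unrelated variables there are 190 possible pairs, so roughly ten of them will look "significant" at the 5% level by chance alone.

from itertools import combinations
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
n_vars, n_obs = 20, 100

# Twenty completely unrelated variables.
data = rng.normal(size=(n_vars, n_obs))

spurious = 0
for i, j in combinations(range(n_vars), 2):
    r, p = pearsonr(data[i], data[j])
    if p < 0.05:
        spurious += 1   # a "significant" correlation found purely by chance

print(f"{spurious} of 190 pairs appear significant at the 5% level")

Any pair flagged this way is only a candidate hypothesis; as the quotation above says, it must then be tested on data that played no part in the dredging.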
Data manipulation
Informally called "fudging the data," this practice includes selective reporting (see also publication
bias) and even simply making up false data.
Examples of selective reporting abound. The easiest and most common examples involve choosing
a group of results that follow a pattern consistent with the preferred hypothesis while ignoring other
results or "data runs" that contradict the hypothesis.
Studies purporting to show that some people have ESP ability have long been disputed. Critics accuse
ESP proponents of only publishing experiments with positive results and shelving those that show
negative results. A "positive result" is a test run (or data run) in which the subject guesses a hidden
card, etc., at a much higher frequency than random chance.[citation needed]
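As a sketch of why shelving negative runs matters (my own illustration; the parameters are arbitrary), suppose each run consists of 25 guesses at a 5-symbol card deck with no real ESP, so chance accuracy is 20%; reporting only the best runs makes the apparent hit rate climb well above chance.

import numpy as np

rng = np.random.default_rng(4)
n_runs, guesses_per_run, p_chance = 100, 25, 0.2

# Correct guesses per run under pure chance (no ESP at all).
hits = rng.binomial(n=guesses_per_run, p=p_chance, size=n_runs)

all_runs_rate = hits.mean() / guesses_per_run
published_rate = np.sort(hits)[-10:].mean() / guesses_per_run  # keep only the 10 best runs

print(f"hit rate over all runs: {all_runs_rate:.0%}")          # close to 20%
print(f"hit rate over 'published' runs: {published_rate:.0%}")  # well above 20%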
Scientists, in general, question the validity of study results that cannot be reproduced by other
investigators. However, some scientists refuse to publish their data and methods.[29]
Data manipulation is a serious consideration even in the most honest of statistical analyses.
Outliers, missing data and non-normality can all adversely affect the validity of statistical analysis. It
is appropriate to study the data and repair real problems before analysis begins. "[I]n any scatter
diagram there will be some points more or less detached from the main part of the cloud: these
points should be rejected only for cause."[30]
Other fallacies
Pseudoreplication is a technical error associated with analysis of variance. Complexity hides the fact
that statistical analysis is being attempted on a single sample (N=1). For this degenerate case the
variance cannot be calculated (division by zero).
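For reference, the usual sample-variance formula makes the problem explicit: with a single sample the denominator is zero, so the variance is undefined.

s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad n = 1 \implies n - 1 = 0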
The gambler's fallacy assumes that the likelihood assigned to an event before any outcomes were observed still applies after some of those outcomes have already occurred. Thus, if someone has already tossed 9 coins and each has come up heads, people tend to assume that the likelihood of a tenth toss also being heads is 1023 to 1 against (which it was before the first coin was tossed), when in fact the chance of the tenth head is 50% (assuming the coin is unbiased).
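The arithmetic behind the two figures, for a fair coin, is:

P(\text{10 heads in a row}) = \left(\tfrac{1}{2}\right)^{10} = \tfrac{1}{1024} \quad (\text{odds of 1023 to 1 against})

P(\text{10th toss is heads} \mid \text{first 9 were heads}) = \tfrac{1}{2}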
The prosecutor's fallacy[31] has led, in the UK, to the false imprisonment of women for murder when the courts treated the prior statistical likelihood of a woman's 3 children dying from Sudden Infant Death Syndrome as if it were the chance that her already dead children had died from the syndrome. This led to statements from Roy Meadow that the chance they had died of Sudden Infant Death Syndrome was extremely small (one in millions). The courts then handed down convictions in spite
of the statistical inevitability that a few women would suffer this tragedy. The convictions were
eventually overturned (and Meadow was subsequently struck off the U.K. Medical Register for giving
“erroneous” and “misleading” evidence, although this was later reversed by the courts).[32] Meadow's
calculations were irrelevant to these cases, but even if they had been relevant, the same methods of calculation would have shown that the chance of two cases of infanticide was even smaller (one in billions).[32]
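Schematically (a sketch of my own, assuming for simplicity that two natural deaths and two infanticides are the only possible explanations), the fallacy conflates two different conditional probabilities; the quantity the court needs also depends on how rare double infanticide is:

P(\text{innocent} \mid \text{two deaths}) \neq P(\text{two deaths} \mid \text{innocent})

P(\text{innocent} \mid \text{two deaths}) = \frac{P(\text{two natural deaths})}{P(\text{two natural deaths}) + P(\text{two infanticides})}

Because both terms in the denominator are very small, the tiny figure quoted in court says little by itself about guilt or innocence.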
The ludic fallacy. Probabilities are based on simple models that ignore real (if remote) possibilities.
Poker players do not consider that an opponent may draw a gun rather than a card. The insured
(and governments) assume that insurers will remain solvent, but see AIG and systemic risk.