

Scientific consensus has gotten a bad reputation, and it doesn't deserve it


It's used by both sides in the climate debates, but consensus is part of a process.
by John Timmer - Sept 4, 2014, 9:00am

http://arstechnica.com/science/2014/09/scientific-consensus-has-gotten-a-bad-reputation-and-it-doesnt-deserve-it/
Consensus? It's complicated and does not involve A) everyone agreeing or B) everyone meeting for coffee.
One of the many unfortunate aspects of arguments over climate change is that it's where many people come across the
idea of a scientific consensus. Just as unfortunately, their first exposure tends to be in the form of shouted sound bites:
"But there's a consensus!" "Consensus has no place in science!"
Lost in the shouting is the fact that consensus plays several key roles in the process of science. In light of all the
consensus choruses, it's probably time to step back and examine its importance and why it's a central part of the
scientific process. And only after that is it possible to take a look at consensus and climate change.

Standards of evidence
Fiction author Michael Crichton probably started the backlash against the idea of consensus in science. Crichton was
rather notable for doubting the conclusions of climate scientists (he wrote an entire book in which they were the villains), so it's fair to say he wasn't thrilled when the field reached a consensus. Still, it's worth looking at what he said,
if only because it's so painfully misguided:
Let's be clear: the work of science has nothing whatever to do with consensus. Consensus is the business of politics.
Science, on the contrary, requires only one investigator who happens to be right, which means that he or she has results
that are verifiable by reference to the real world. In science consensus is irrelevant. What is relevant is reproducible
results.
Reproducible results are absolutely relevant. What Crichton is missing is how we decide that those results are
significant and how one investigator goes about convincing everyone that he or she happens to be right. This comes
down to what the scientific community as a whole accepts as evidence.
In an earlier discussion of science's standards for statistical significance, we wrote, "Nobody's ever found a stone tablet
etched with a value for scientific certainty." Different fields use different thresholds for what they think constitutes
significance. In biology, where "facts" are usually built from a large collection of consistent findings, scientists are
willing to accept findings that are only two standard deviations away from random noise as evidence. In physics, where
particles either exist or don't, five standard deviations are required.
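For a sense of what those thresholds mean in probability terms, here's a quick sketch that converts sigma counts into p values using Python's scipy (the two-sigma value is quoted two-sided and the five-sigma discovery threshold one-sided, following the usual conventions):

    from scipy.stats import norm

    # Chance of random noise producing a signal at least this many
    # standard deviations (sigma) away from the expected value.
    two_sigma = 2 * norm.sf(2)   # two-sided: ~0.046, biology's p < 0.05 territory
    five_sigma = norm.sf(5)      # one-sided: ~2.9e-7, physics' discovery threshold

    print(f"2 sigma: p = {two_sigma:.3f}")
    print(f"5 sigma: p = {five_sigma:.1e}")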
While that makes the standards of evidence sound completely rational, they're also deeply empirical. Physicists found
that signals that were three standard deviations from the expected value came and went all the time, which is why they
increased their standard. Biologists haven't had such problems, but other problems have popped up as new technology
enabled them to do tests that covered tens of thousands of genes instead of only a handful. Suddenly, spurious results
were cropping up at a staggering pace. For these experiments, biologists agreed to a different standard of evidence.
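The scale of that problem is easy to see with back-of-the-envelope arithmetic; a sketch, using the round gene count usually quoted for genome-wide experiments:

    # At the traditional cutoff, a 5 percent false-positive rate applied
    # to thousands of simultaneous tests guarantees a flood of flukes.
    n_tests = 20_000   # roughly every gene in the human genome
    alpha = 0.05       # the traditional significance threshold

    expected_false_positives = n_tests * alpha
    print(expected_false_positives)   # 1000.0 spurious "hits" expected by chance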
It's not like they got together and had a formal vote on it. Instead, there were a few editorials that highlighted the
problem, and those pieces started to sway the opinions of not only scientists but journal editors and the people who fund
grants. In other words, the field reached a consensus.
That sort of thing is easiest to see in terms of statistical significance, but it pervades the process of science. If two
closely related species share a feature, then we conclude it was present in their common ancestor. The scientific
community decided to establish 15 percent ice coverage as the standard for when a region of the ocean contains ice.
The astronomy community required that every potential planet imaged by the Kepler probe be confirmed by an independent method before being called a planet. There's no objective standard that defines any one of these tests as the truth; it's just
that the people in the field have reached a consensus about what constitutes evidence.

Consensus is not just for standards


Just as fields reach a consensus about what constitutes evidence, they reach a consensus about what that evidence has
demonstrated. Confusion about the potential causes of AIDS dominated the early years of the epidemic, but it took
researchers only two years after the formal description of the disorder to identify a virus that infected the right cells. In
less than a decade, enough evidence piled up to allow the biomedical research community to form a consensus: HIV
was the causal agent of AIDS.
That doesn't mean that every single person in the field had been convinced; there are holdouts, including a Nobel Prize
winner, who continue to argue that the evidence is insufficient. Those in the field (and humanity in general) simply
don't find their arguments persuasive. We've since oriented public policy around what the vast majority of
experts consider a fact.

In most fields, however, the stakes aren't quite so high. You get informal consensuses forming around things that the
public isn't ever aware of: the existence of morphogens in patterning embryonic tissues, the source of the radiation in
the jets of quasars, and so on. If you asked a large group of scientists, their consensus would be that consensus is a
normal part of the scientific process. Contrary to Crichton's writings, the consensus forms precisely because
reproducible evidence is generated.

Consensus matters
On its own, the existence of a consensus seems trivial; researchers conclude some things based on the state of the
evidence without that evidence ever rising to the level of formal proof. But consensus plays a critical role in the day-to-day functioning of science as well.
In The Structure of Scientific Revolutions, Thomas Kuhn discussed the idea of paradigms: big intellectual frameworks
that organize a field's research. Paradigms help identify problems that need solving, areas that still have anomalous
results, all while giving researchers ways of interpreting any results they get. Generally, they tell scientists what to do
and how to think of their results. Although not as important or over-arching as a paradigm, a consensus functions the
same way, just at a smaller scale.
For example, researchers will necessarily interpret their results based on what the consensus in their field is. So odd
cosmic observations will be considered in terms of the existence of dark matter particles, given that there's a consensus
that the particles exist. It doesn't matter whether the researchers (or their results) agree with the consensus. The
existence of a consensus simply shapes the discussion. In the same way, research goals and grants are set based on areas
where the consensus opinion seems a bit weak or has unanswered questions.
At first glance, this may seem like it can stifle the appearance of ideas that run counter to the consensus. But any idea in
science carries the seeds of its own destruction. By directing research to the areas where there are outstanding questions,
a consensus makes it more likely that we'll generate data that directly contradicts it. That data may take a little while to be recognized for what it is, but eventually it will win out.

A climate of consensus
It's easy to find examples of how a consensus operates from any area of science, and the field of climate science is no
different. A strong consensus has formed about the broad outlines of climate change, although there are still some
details, like the precise impact of aerosols, that are recognized as uncertain. (You could say that either no consensus has
formed or that there's a consensus that we don't precisely know.) You're never going to convince everyone in the field,
but a variety of studies have suggested that over 95 percent of the scientists with the relevant expertise are on the same
page about the general outlines of climate change.
What's really different is that the consensus has been formalized. The Intergovernmental Panel on Climate Change
produces assessment reports that summarize the latest knowledge in the field. These reports synthesize multiple
research papers to paint a general picture of the state of knowledge, and they even provide measures of how certain that
picture is. And as mentioned above, people have used a variety of methods (literature searches, polling of scientists, and so on) to measure the state of the consensus. Each of those attempts has put the consensus number for climate
change in the area of 97 percent agreement.
A consensus definitely exists. Does that mean it's right?
There have clearly been times in the past when the consensus wasn't especially brilliant. Mendel was ignored instead of starting to build a consensus, and Alfred Wegener's formative ideas about plate tectonics were roundly ridiculed. But it's
worth noting that these cases are the exception. The majority of the time, the consensus is a bit closer to being right than
whatever came before it. And while it may be slow to change sometimes, it can eventually be shifted by the weight of
the evidence.
The other thing is that scientists are reasonably good at knowing when they don't know something as well as they'd like
to. For example, uncertainties about aerosols and cloud feedbacks are generally recognized as the biggest challenge
facing climate projections, and most scientists would also agree that projections don't do a great job with regional
effects.
That doesn't mean individual scientists aren't convinced that they know what's going on, either about these topics or
others where scientific knowledge remains fluid. It's just that enough other scientists are convinced that the first group is wrong that it becomes clear to everyone involved there's a bitter argument going on. And that's enough to show bystanders that matters haven't been settled yet.

No consensus?
Unfortunately, for people outside the field, it can be hard to distinguish these sorts of scientific arguments from the
pedantic nitpicking that's often done by scientists who haven't been persuaded by the evidence and probably never will
be. There will always be people with relevant credentials who aren't convinced, and they often make technical-sounding
arguments about why the evidence falls short. And they typically find ways of making sure the public hears those
arguments.

But it's important to understand what the few scientists who don't accept the consensus are arguing. To begin with,
critics of the mainstream climate consensus aren't arguing that consensus has no place in science. Judith Curry, often an
outspoken critic of other climate scientists, describes consensus as a normal part of science. And Roy Spencer has no complaints about the existence of a consensus; he just questions the extent of the agreement.
Spencer also attempts to clarify what, exactly, the consensus is about: the warming of the Earth and the existence of a
greenhouse effect that can be driven by carbon dioxide. By that standard, he has suggested, pretty much everyone is in
agreement.

The consensus in popular arguments


All of this is true in the scientific community. But in the popular debate, these things frequently get lost to the extent
that polls consistently show that a large fraction of the US public doesn't even think temperatures have gone up,
much less that humans might have anything to do with it.
If the consensus comes up in these conversations at all, it's usually used in one of two ways. Advocates of the consensus use it as a rhetorical club, essentially asking why anybody wouldn't agree with all the scientists. In other cases, it's used as a parry. Someone will start an argument from authority based on a scientist who doesn't agree with the
scientific community's conclusions, and the consensus will be pulled out in order to provide a bigger, more
comprehensive authority.
Things usually go downhill from there. People will argue that consensus has no place in science (often quoting
Crichton) or complain that studies that showed a consensus exists were somehow lacking (even though several
independent ones have come to roughly the same numbers).
Given that it's so often used in unfortunate ways, is there any value in publicizing the existence of a consensus on
climate? Quite possibly. Repeated polls indicate that most people think there is still significant debate about the reality
of human-driven climate change within the scientific community; the strength of the consensus indicates that the debate
is largely over. The gap between the public's perception of things and reality indicates that there's a need for better
communications. And one of the people who have studied the degree of consensus, John Cook, has argued that the gap
exists across the ideological spectrum, even among those predisposed to accept the scientific community's conclusions.

But simply pointing out that a consensus exists won't help much if the public doesn't understand that consensus is a
natural part of the process of science, something arrived at by a careful evaluation of evidence. Mentions of the
consensus are best made in the context of a conversation about how it functions within science rather than when people
are attempting to shout each other down. And discussions on climate change far too often veer to the latter.



Is it time to up the statistical standard for scientific results?
A statistician says science's test for significance falls short.
by John Timmer - Nov 12, 2013, 10:10pm

If you believe The Economist, science is in the midst of a crisis, with most of its conclusions failing to stand the test of
time. Research fraud is rising, but even studies that were performed properly sometimes either can't be reproduced or
appear to suffer from bias.
A new analysis suggests a very simple explanation for some of the problems: our statistics are weak. A statistician has
figured out how to compare Bayesian statistics to those normally used in scientific tests of significance. By comparing
the two, he finds that researchers are often accepting numbers that any good Bayesian would consider to be weak
evidence.

What's in a p?
To understand the problem, we have to go into how scientists assess significance. Typically, a given experiment has an
experimental condition that produces a number and a control condition that produces a second. The two numbers will
typically be different, but we need to know if those differences are significant. That's where statistics comes in. The
typical test used in science asks how likely you'd be to see a difference that large through random chance alone. In most fields, if there's less than a five percent chance, then you can reject chance: the results are considered significant. In statistical terms, this is called having a p value of less than 0.05.
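As a concrete sketch of how that plays out in code (the measurements are made up for illustration; the test is scipy's standard two-sample t test):

    from scipy import stats

    # Made-up measurements from a control and an experimental condition.
    control = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
    treated = [10.6, 10.9, 10.5, 10.8, 10.7, 10.6]

    # Welch's t test asks: how likely is a difference this large
    # if both sets of numbers actually come from the same distribution?
    result = stats.ttest_ind(control, treated, equal_var=False)
    print(result.pvalue)          # far below 0.05
    print(result.pvalue < 0.05)   # True: the difference counts as significant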
Is there something special about a 95 percent probability? Absolutely not; a recent paper referred to it as "seemingly
arbitrary." It's simply been arrived at through the consensus of people working in the field. It seems in most fields,
people have been willing to accept a situation where, out of every 20 positive results, chances are that one of them is a
fluke and will not be reproducible.
But the 95 percent rule doesn't apply to every field. In particle physics, hints of particles with greater than 95 percent
certainty come and go all the time; you can get a different answer depending on how much data you have at the time of
analysis. So that field has settled on a much higher standard: greater than 99.9999 percent confidence.
Even biology has made exceptions when needed. In genetics experiments, 95 percent confidence was considered perfectly acceptable evidence. Until the 90s, that is, when the development of gene chips meant that you could do a single experiment that looked at every single gene in the human genome at once (over 20,000 of them). Suddenly, a
five percent error rate meant that every experiment produced over 1,000 false positives. The problem should have been
obvious, but, amazingly, it wasn't. It took a number of papers and ensuing discussions to swing the consensus of the
field around to demanding more statistical rigor.
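The article doesn't name the fix, but one standard genomics widely adopted is controlling the false discovery rate rather than the per-test error rate. Here's a sketch of the common Benjamini-Hochberg step-up procedure (our own implementation, run on made-up p values):

    import numpy as np

    def benjamini_hochberg(p_values, fdr=0.05):
        """Boolean mask of discoveries at the given false discovery rate."""
        p = np.asarray(p_values)
        n = len(p)
        order = np.argsort(p)
        # Largest rank k where p_(k) <= (k/n) * fdr; everything ranked
        # at or below k is declared a discovery.
        below = p[order] <= (np.arange(1, n + 1) / n) * fdr
        mask = np.zeros(n, dtype=bool)
        if below.any():
            k = np.nonzero(below)[0].max()
            mask[order[:k + 1]] = True
        return mask

    # Two strong signals buried among chance-level results.
    pvals = [0.0001, 0.0004, 0.02, 0.03, 0.04, 0.2, 0.5, 0.8]
    print(benjamini_hochberg(pvals))   # only the first two survive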
It's worth noting that this value is no protection against research fraud. You can fake whatever statistical significance
you like. It also doesn't protect against more subtle and potentially unconscious biases, like which experiments to
include in a paper and when to stop collecting data for them. As noted at the Retraction Watch blog, results just at or
below a p of 0.05 are over-represented compared to other values, and their frequency is increasing. This suggests the
pressure for positive results is affecting what people publish.

Bigger problems for p


The new paper, however, argues that there are much bigger problems than biases or fields where 95 percent confidence
doesn't work. Instead, it contends that the measure itself is fundamentally misguided.
The author, Valen Johnson, is a statistician at Texas A&M. In his introduction, he notes that the standard statistics used
in science involve comparing the experimental results to a null hypothesis, namely random chance. Bayesian statistics
isn't used as often, in part because it requires comparing the data under two hypotheses: random chance and a specific alternative. Since it's usually hard to come up with a concrete alternative hypothesis, Bayesian statistics is often impractical to use.
Johnson's big contribution, published previously, was to develop a way to mathematically link Bayesian statistics to the
standard probabilities used by scientists. The math then allows a direct comparison between the probability values. In
his comparison, scientific standards seem pretty weak. The 95 percent certainty corresponds to a Bayesian evidence
threshold of between three and five, which Johnson notes is typically considered "positive evidence," but it falls well
below the values considered to be "strong evidence." It takes 99 percent certainty to get there.
(Just as with the standard practice, the values that Bayesian fans have set for what constitutes positive and strong
evidence are suggested by individual researchers and agreed to by consensus. Nobody's ever found a stone tablet etched
with a value for scientific certainty.)
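For the simplest case, a one-sided test on normally distributed data, the correspondence works out so that rejecting at a given p value matches a Bayes factor threshold of exp(z^2 / 2), where z is the normal score for that p. The sketch below is our reading of that result, not code from Johnson's paper:

    from math import exp
    from scipy.stats import norm

    def bayes_threshold(p):
        """Bayes factor matching a one-sided p value (normal case)."""
        z = norm.isf(p)          # z score whose upper tail area equals p
        return exp(z ** 2 / 2)

    print(bayes_threshold(0.05))    # ~3.9: merely "positive" evidence
    print(bayes_threshold(0.005))   # ~28: into "strong" evidence territory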
Johnson concludes that if we assume that only one-half of the hypotheses should give us a positive result, then "these
results suggest that between 17 percent and 25 percent of marginally significant scientific findings are false." If we
assume the proportion of correct hypotheses is larger (which we might, given that scientists are usually pretty clever about the hypotheses they choose to test), then the problem gets even more pronounced. Overall, Johnson's suggestion
is simple: raise the statistical rigor all around. Demand that experiments produce a p value of 0.005 or smaller. And be
even pickier about results that we consider highly significant. There is a cost to this, in that you need bigger samples to
achieve the higher statistical rigor. In his example, you'd have to double the sample size. That's no problem if you're
breeding bacteria and fruit flies, but it will add a lot of time and expense if your project involves mice. Science as a
whole would move a lot more slowly.
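The rough doubling falls out of a standard power calculation. For a one-sided z test with a fixed effect size, the required sample size scales as the square of (z_alpha + z_power); a sketch assuming 80 percent power:

    from scipy.stats import norm

    def relative_sample_size(alpha, power=0.8):
        """Sample size for a one-sided z test, up to a constant
        factor set by the effect size."""
        return (norm.isf(alpha) + norm.isf(1 - power)) ** 2

    ratio = relative_sample_size(0.005) / relative_sample_size(0.05)
    print(ratio)   # ~1.9: tightening p from 0.05 to 0.005 roughly doubles n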

Will this ensure reproducibility?


It will probably make things better, but it's not going to solve the problem. That's because there are really three classes
of reproducibility issues. The first one is simply a matter of numbers. The more experiments you do, the more likely
you'll fall afoul of the five percent error that we tolerate. And scientists are doing a lot more experiments. The number
of journals and papers is proliferating, and a presentation I attended recently indicated that, at least in biology, the
number of individual experiments per paper has gone up dramatically. (The figures within papers used to have about
eight individual images; now, they often have more than 20.)
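A little arithmetic shows how quickly those numbers compound (a sketch using the 20-experiment figure above and treating each experiment as independent):

    # Chance that a paper with 20 experiments, each tested at p < 0.05,
    # contains at least one spurious result purely by chance.
    alpha = 0.05
    experiments = 20

    p_any_fluke = 1 - (1 - alpha) ** experiments
    print(p_any_fluke)   # ~0.64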
Given Johnson's standard, a lot of these results will no longer be significant. So, we'll either have to get comfortable
with publishing a lot more suggestive, but negative, results or comfortable with publishing a lot less. It's hard to see
anyone (researchers, publishers, funding agencies) who will actually be enthused about this, so it will be a difficult
sell.
Perhaps more significantly, there are those other types of reproducibility issues. One of them is what I'd consider the
"big picture" issue. Individual experiments may be wrong five percent of the time, but the conclusions of most papers
are built from a number of individual experiments that all point roughly in the same direction. A higher statistical rigor
would probably help by eliminating some of the spurious little-picture information that leads us astray when we
consider the big picture. But we get led astray for all sorts of additional reasons, including our biases, faulty reasoning,
and simply not having all the information we need to reach the right conclusion.
The other reproducibility problem is really a simple yes/no issue that has nothing to do with statistics. If you knock out
a specific gene in mice, does it have the phenotype that people have reported? If you use a specific antibody in a
procedure, do you see the same signal that another lab did? These are the sorts of nuts-and-bolts reproducibility issues
that drive researchers crazy, because they can be affected by things like the specific strain of mice you use, where you
buy your chemicals, and even the pH of your lab's water supply. No amount of statistical thinking is going to change
any of that.
Overall, Johnson has made what could be an important contribution at a time when a lot of people are worried about
reproducibility. Unfortunately, it also comes with a message (do more science before you publish) that's going to be a
tough sell in the publish-or-perish research culture that currently exists. And even if Johnson were to succeed, the
problem of reproducibility won't go away entirely.
