http://arstechnica.com/science/2014/09/scientific-consensus-has-gotten-a-bad-reputation-and-it-doesnt-deserve-it/
Consensus? It's complicated and does not involve A) everyone agreeing or B) everyone meeting for coffee.
One of the many unfortunate aspects of arguments over climate change is that it's where many people come across the
idea of a scientific consensus. Just as unfortunately, their first exposure tends to be in the form of shouted sound bites:
"But there's a consensus!" "Consensus has no place in science!"
Lost in the shouting is the fact that consensus plays several key roles in the process of science. In light of all the
consensus choruses, it's probably time to step back and examine its importance and why it's a central part of the
scientific process. And only after that is it possible to take a look at consensus and climate change.
Standards of evidence
Fiction author Michael Crichton probably started the backlash against the idea of consensus in science. Crichton was
rather notable for doubting the conclusions of climate scientists (he wrote an entire book in which they were the villains), so it's fair to say he wasn't thrilled when the field reached a consensus. Still, it's worth looking at what he said,
if only because it's so painfully misguided:
Let's be clear: the work of science has nothing whatever to do with consensus. Consensus is the business of politics.
Science, on the contrary, requires only one investigator who happens to be right, which means that he or she has results
that are verifiable by reference to the real world. In science consensus is irrelevant. What is relevant is reproducible
results.
Reproducible results are absolutely relevant. What Crichton is missing is how we decide that those results are
significant and how one investigator goes about convincing everyone that he or she happens to be right. This comes
down to what the scientific community as a whole accepts as evidence.
In an earlier discussion of science's standards for statistical significance, we wrote, "Nobody's ever found a stone tablet
etched with a value for scientific certainty." Different fields use different values for what they think constitutes
significance. In biology, where "facts" are usually built from a large collection of consistent findings, scientists are
willing to accept findings that are only two standard deviations away from random noise as evidence. In physics, where
particles either exist or don't, five standard deviations are required.
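To put rough numbers on that difference, here's a minimal sketch in Python (our illustration, using SciPy) converting each sigma level into the probability that pure noise would produce a signal at least that extreme:

```python
# A minimal sketch: how many-sigma thresholds translate into the
# probability of a pure-noise fluctuation at least that extreme.
from scipy.stats import norm

for sigma in (2, 3, 5):
    p = 2 * norm.sf(sigma)  # two-sided tail probability
    print(f"{sigma} sigma -> p = {p:.2e}")

# 2 sigma -> p = 4.55e-02  (about 1 in 20, the biologists' bar)
# 3 sigma -> p = 2.70e-03  (signals at this level came and went)
# 5 sigma -> p = 5.73e-07  (the physicists' discovery standard)
```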
While that makes the standards of evidence sound completely rational, they're also deeply empirical. Physicists found
that signals that were three standard deviations from the expected value came and went all the time, which is why they
increased their standard. Biologists haven't had such problems, but other problems have popped up as new technology
enabled them to do tests that covered tens of thousands of genes instead of only a handful. Suddenly, spurious results
were cropping up at a staggering pace. For these experiments, biologists agreed to a different standard of evidence.
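To see why the new technology forced the issue, here's a rough simulation (our construction, with made-up data) of a genome-scale experiment in which no gene actually responds to the treatment:

```python
# Simulate 20,000 independent tests where nothing real is happening,
# then count how many clear the traditional p < 0.05 bar anyway.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_genes, n_samples = 20_000, 10

control = rng.normal(size=(n_genes, n_samples))  # no real effect
treated = rng.normal(size=(n_genes, n_samples))  # in either group

pvals = ttest_ind(control, treated, axis=1).pvalue
print((pvals < 0.05).sum())  # roughly 1,000 spurious "discoveries"
```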
It's not like they got together and had a formal vote on it. Instead, there were a few editorials that highlighted the
problem, and those pieces started to sway the opinions of not only scientists but journal editors and the people who fund
grants. In other words, the field reached a consensus.
That sort of thing is easiest to see in terms of statistical significance, but it pervades the process of science. If two
closely related species share a feature, then we conclude it was present in their common ancestor. The scientific
community decided to establish 15 percent ice coverage as the threshold for considering a region of the ocean ice-covered. It decided that every potential planet imaged by the Kepler probe must have its presence confirmed by an independent method before being called a planet. There's no objective standard that defines any one of these tests as the truth; it's just
that the people in the field have reached a consensus about what constitutes evidence.
In most fields, however, the stakes aren't quite so high. You get informal consensuses forming around things that the
public isn't ever aware of: the existence of morphogens in patterning embryonic tissues, the source of the radiation in
the jets of quasars, and so on. If you asked a large group of scientists, their consensus would be that consensus is a
normal part of the scientific process. Contrary to Crichton's writings, the consensus forms precisely because
reproducible evidence is generated.
Consensus matters
On its own, the existence of a consensus seems trivial; researchers conclude some things based on the state of the
evidence without that evidence ever rising to the level of formal proof. But consensus plays a critical role in the day-to-day functioning of science as well.
In The Structure of Scientific Revolutions, Thomas Kuhn discussed the idea of paradigms: big intellectual frameworks
that organize a field's research. Paradigms help identify problems that need solving, areas that still have anomalous
results, all while giving researchers ways of interpreting any results they get. Generally, they tell scientists what to do
and how to think of their results. Although not as important or over-arching as a paradigm, a consensus functions the
same way, just at a smaller scale.
For example, researchers will necessarily interpret their results based on what the consensus in their field is. So odd
cosmic observations will be considered in terms of the existence of dark matter particles, given that there's a consensus
that the particles exist. It doesn't matter whether the researchers (or their results) agree with the consensus. The
existence of a consensus simply shapes the discussion. In the same way, research goals and grants are set based on areas
where the consensus opinion seems a bit weak or has unanswered questions.
At first glance, this may seem like it can stifle the appearance of ideas that run counter to the consensus. But any idea in
science carries the seeds of its own destruction. By directing research to the areas where there are outstanding questions,
a consensus makes it more likely that we'll generate data that directly contradicts it. It may take a little while to get
recognized for what it is, but eventually the data will win out.
A climate of consensus
It's easy to find examples of how a consensus operates from any area of science, and the field of climate science is no
different. A strong consensus has formed about the broad outlines of climate change, although there are still some
details, like the precise impact of aerosols, that are recognized as uncertain. (You could say that either no consensus has
formed or that there's a consensus that we don't precisely know.) You're never going to convince everyone in the field,
but a variety of studies have suggested that over 95 percent of the scientists with the relevant expertise are on the same
page about the general outlines of climate change.
What's really different is that the consensus has been formalized. The Intergovernmental Panel on Climate Change
produces assessment reports that summarize the latest knowledge in the field. These reports synthesize multiple
research papers to paint a general picture of the state of knowledge, and they even provide measures of how certain that
picture is. And as mentioned above, people have used a variety of methods (literature searches, polling of scientists, and so on) to measure the state of the consensus. Each of those attempts has put the consensus number for climate
change in the area of 97 percent agreement.
A consensus definitely exists. Does that mean it's right?
There have clearly been times in the past where the consensus wasn't especially brilliant. Mendel was ignored instead of
starting to build a consensus, and Alfred Wegener's formative ideas about plate tectonics were roundly ridiculed. But it's
worth noting that these cases are the exception. The majority of the time, the consensus is a bit closer to being right than
whatever came before it. And while it may be slow to change sometimes, it can eventually be shifted by the weight of
the evidence.
The other thing is that scientists are reasonably good at knowing when they don't know something as well as they'd like
to. For example, uncertainties about aerosols and cloud feedbacks are generally recognized as the biggest challenge
facing climate projections, and most scientists would also agree that projections don't do a great job with regional
effects.
That doesn't mean individual scientists aren't convinced that they know what's going on, either about these topics or
others where scientific knowledge remains fluid. It's just that, when enough other scientists are convinced that the first group is wrong, it becomes clear to everyone involved that there's a bitter argument going on. And that's enough to show bystanders that matters haven't been settled yet.
No consensus?
Unfortunately, for people outside the field, it can be hard to distinguish these sorts of scientific arguments from the
pedantic nitpicking that's often done by scientists who haven't been persuaded by the evidence and probably never will
be. There will always be people with relevant credentials who aren't convinced, and they often make technical-sounding
arguments about why the evidence falls short. And they typically find ways of making sure the public hears those
arguments.
But it's important to understand what the few scientists that don't accept the consensus are arguing. To begin with,
critics of the mainstream climate consensus aren't arguing that consensus has no place in science. Judith Curry, often an
outspoken critic of other climate scientists, describes consensus as a normal part of science. And Roy Spencer has no
complaints about the existence of a consensus; he just questions the extent of the agreement.
Spencer also attempts to clarify what, exactly, the consensus is about: the warming of the Earth and the existence of a
greenhouse effect that can be driven by carbon dioxide. By that standard, he has suggested, pretty much everyone is in
agreement.
But simply pointing out that a consensus exists won't help much if the public doesn't understand that consensus is a
natural part of the process of science, something arrived at by a careful evaluation of evidence. Mentions of the
consensus are best made in the context of a conversation about how it functions within science rather than when people
are attempting to shout each other down. And discussions on climate change far too often veer to the latter.
If you believe The Economist, science is in the midst of a crisis, with most of its conclusions failing to stand the test of
time. Research fraud is rising, but even studies that were performed properly sometimes either can't be reproduced or
appear to suffer from bias.
A new analysis suggests a very simple explanation for some of the problems: our statistics are weak. A statistician, Valen Johnson, has figured out how to compare Bayesian statistics to those normally used in scientific tests of significance. By comparing
the two, he finds that researchers are often accepting numbers that any good Bayesian would consider to be weak
evidence.
What's in a p?
To understand the problem, we have to go into how scientists assess significance. Typically, a given experiment has an
experimental condition that produces a number and a control condition that produces a second. The two numbers will
typically be different, but we need to know if those differences are significant. That's where statistics comes in. The
typical test used in science involves determining how likely it is that you'd get the two numbers by random chance alone. In most fields, if that chance is less than five percent, then you can reject chance as the explanation; the results are considered significant. In statistical terms, this is called having a p value of less than 0.05.
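As a concrete example, here's a minimal sketch in Python of the comparison just described. The measurements are invented, and the t-test is one common version of such a test, not necessarily the one any given field uses:

```python
# Compare an experimental group against a control and ask whether
# random chance plausibly explains the difference between them.
from scipy.stats import ttest_ind

control      = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
experimental = [10.6, 10.9, 10.5, 10.8, 10.7, 11.0]

result = ttest_ind(control, experimental)
print(result.pvalue)  # far below 0.05, so by the usual convention
                      # the difference counts as significant
```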
Is there something special about a 95 percent probability? Absolutely not; a recent paper referred to it as "seemingly
arbitrary." It's simply been arrived at through the consensus of people working in the field. It seems in most fields,
people have been willing to accept a situation where, out of every 20 positive results, chances are that one of them is a
fluke and will not be reproducible.
But the 95 percent rule doesn't apply to every field. In particle physics, hints of particles with greater than 95 percent
certainty come and go all the time; you can get a different answer depending on how much data you have at the time of
analysis. So that field has settled on a much higher standard: greater than 99.9999 percent confidence.
Even biology has made exceptions when needed. In genetics experiments, 95 percent confidence was considered perfectly acceptable evidence. Until the 90s, that is, when the development of gene chips meant that you could do a single experiment that looked at every single gene in the human genome at once (over 20,000 of them). Suddenly, a
five percent error rate meant that every experiment produced over 1,000 false positives. The problem should have been
obvious, but, amazingly, it wasn't. It took a number of papers and ensuing discussions to swing the consensus of the
field around to demanding more statistical rigor.
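One common form that extra rigor takes (our sketch; the article doesn't specify the method) is a multiple-testing correction, which adjusts the significance threshold to account for how many tests were run:

```python
# 20,000 tests with no real effects: the naive threshold yields about
# 1,000 false hits, while a false-discovery-rate correction yields
# almost none.
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
pvals = rng.uniform(size=20_000)  # p values from 20,000 null tests

print((pvals < 0.05).sum())  # ~1,000 naive "hits"

reject, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(reject.sum())  # typically zero survive Benjamini-Hochberg
```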
It's worth noting that this value is no protection against research fraud. You can fake whatever statistical significance
you like. It also doesn't protect against more subtle and potentially unconscious biases, like which experiments to
include in a paper and when to stop collecting data for them. As noted at the Retraction Watch blog, results just at or
below a p of 0.05 are over-represented compared to other values, and their frequency is increasing. This suggests the
pressure for positive results is affecting what people publish.
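The stopping-rule bias in particular is easy to demonstrate. Here's a toy simulation (our construction, not anything from Retraction Watch) in which an experimenter with no real effect to find peeks at the p value after every batch of data and stops the moment it dips below 0.05:

```python
# Optional stopping: repeatedly test as data accumulates and stop at
# the first "significant" result. Even with no real effect, this
# succeeds far more often than the nominal 5% of the time.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)

def peeking_experiment(max_n=100, batch=10):
    a, b = [], []
    while len(a) < max_n:
        a.extend(rng.normal(size=batch))  # no real difference
        b.extend(rng.normal(size=batch))  # between the groups
        if ttest_ind(a, b).pvalue < 0.05:
            return True  # stop early and declare success
    return False

hits = sum(peeking_experiment() for _ in range(1_000))
print(hits / 1_000)  # well above the nominal 0.05
```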
If researchers are biased in other ways (say, about the hypotheses they choose to test), then the problem gets even more pronounced. Overall, Johnson's suggestion
is simple: raise the statistical rigor all around. Demand that experiments produce a p value of 0.005 or smaller. And be
even pickier about results that we consider highly significant. There is a cost to this, in that you need bigger samples to
achieve the higher statistical rigor. In his example, you'd have to double the sample size. That's no problem if you're
breeding bacteria and fruit flies, but it will add a lot of time and expense if your project involves mice. Science as a
whole would move a lot more slowly.
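To get a feel for that cost, here's a back-of-the-envelope power calculation; the effect size of 0.5 and 80 percent power are our illustrative assumptions, not Johnson's figures:

```python
# How many subjects per group does a two-sample comparison need to
# reliably detect a medium-sized effect at p < 0.05 versus p < 0.005?
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for alpha in (0.05, 0.005):
    n = analysis.solve_power(effect_size=0.5, alpha=alpha, power=0.8)
    print(f"alpha = {alpha}: about {n:.0f} subjects per group")

# Prints roughly 64 per group at 0.05 versus roughly 108 at 0.005
# (values are approximate).
```

The exact multiplier depends on the assumed effect size and power; in Johnson's example it worked out to roughly a doubling.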