Science is broken
Perverse incentives and the misuse of quantitative
metrics have undermined the integrity of scientific

Siddhartha Roy is an environmental engineer and PhD candidate at Virginia Tech.

Marc A Edwards is University Distinguished Professor at Virginia Tech.

The rise of the 20th-century research university in the United States stands as one of the
great achievements of human civilisation – it helped to establish science as a public
good, and advanced the human condition through training, discovery and innovation. But if
the practice of science should ever undermine the trust and symbiotic relationship with
society that allowed both to flourish, our ability to solve critical problems facing humankind
and civilisation itself will be at risk. We recently explored how increasingly perverse
practices, and by extension, whether a loss of support for science in some segments of
are doing to science.

We argue that over the past half-century, the incentives and reward structure of science
have changed, creating a hypercompetition among academic researchers. Part-time and
adjunct faculty now make up 76 per cent of the academic labour force, allowing universities
to operate more like businesses, making tenure-track positions much more rare and
desirable. Increased reliance on emerging quantitative performance metrics that value
numbers of papers, citations and research dollars raised has decreased the emphasis on
socially relevant outcomes and quality. There is also concern that these pressures could
encourage unethical conduct by scientists and the next generation of STEM scholars who
persist in this hypercompetitive environment. We believe that reform is needed to bring
balance back to the academy and to the social contract between science and society, to
ensure the future role of science as a public good.

The pursuit of tenure traditionally influences almost all decisions, priorities and
activities of young faculty at research universities. Recent changes in academia,
however, including increased emphasis on quantitative performance metrics, harsh
competition for static or reduced federal funding, and implementation of private business
models at public and private universities are producing undesirable outcomes and
unintended consequences (see Table 1 below).

Quantitative metrics are increasingly dominating decision-making in faculty hiring,
promotion and tenure, awards and funding, and creating an intense focus on publication
count, citations, combined citation-publication counts (h-index being the most popular),
journal impact factors, total research dollars and total patents. All these measures are
subject to manipulation as per Goodhart’s law, which states: ‘When a measure becomes a
target, it ceases to be a good measure.’ The quantitative metrics can therefore be misleading
and ultimately counterproductive to assessing scientific research.
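
To make the h-index concrete: a researcher has index h when h of their papers have each been cited at least h times. The sketch below is a minimal illustration in Python, not any citation database's implementation; the function name `h_index` and all citation counts are hypothetical. It also hints at why the metric is exposed to Goodhart's law: a closed set of papers that all cite one another earns a high h-index with no outside readership at all.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical record: five papers with these citation counts.
honest_record = [10, 8, 5, 4, 3]

# Hypothetical citation ring: n papers, each cited by the other n - 1.
n = 50
citation_ring = [n - 1] * n

print(h_index(honest_record))  # 4
print(h_index(citation_ring))  # 49
```

In this toy case the ring of 50 mutually citing papers scores far above the five-paper record, although nothing about its scientific value has been measured.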

Table 1: Modified and with quotes from the blog Embedded in Academia by John Regehr, professor of
computer science at the University of Utah; used with permission.

The increased reliance on quantitative metrics might create inequities and outcomes worse
than the systems they replaced. Specifically, if rewards are disproportionally given to
individuals manipulating the metrics, well-known problems of the old subjective paradigms
(eg, old-boys’ networks) appear simple and solvable. Most scientists think that the damage
owing to metrics is already apparent. In fact, 71 per cent of researchers believe that it is
possible to ‘game’ or ‘cheat’ their way into better evaluations at their institutions.

This manipulation of the evaluative metrics has been documented. Recent exposés have
revealed schemes by journals to manipulate impact factors, use of p-hacking by researchers
to mine for statistically significant and publishable results, rigging of the peer-review
process itself and over-citation practices. The computer scientist Cyril Labbé at the Joseph
Fourier University in Grenoble even created Ike Antkare, a fictional character, who, by
virtue of publishing 102 computer-generated fake papers, achieved a stellar h-index of 94
on Google Scholar, surpassing that of Albert Einstein. Blogs describing how to inflate your
h-index without committing outright fraud are, in fact, just a Google search away.
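
The arithmetic behind p-hacking is simple. As a rough sketch, assume that each analysis of data containing no real effect has a 5 per cent chance of crossing the conventional p < 0.05 threshold, and that the analyses are independent of one another. Both are simplifying assumptions, and the function name is ours. The chance that at least one of k analyses looks 'significant' is then 1 - 0.95^k:

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent null tests gives p < alpha."""
    return 1 - (1 - alpha) ** num_tests


for k in (1, 5, 20):
    print(k, round(chance_of_false_positive(k), 3))
```

Under these assumptions, a researcher who tries 20 different analyses has roughly a 64 per cent chance of finding at least one publishable 'result' in pure noise.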

Since the Second World War, scientific output as measured by cited work has doubled
every nine years. How much of the growth in this knowledge industry is, in essence,
illusory and a natural consequence of Goodhart’s law? It is a real question.
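
For a sense of scale, a doubling time of nine years corresponds to compound growth of about 8 per cent a year. The short calculation below is only arithmetic on the doubling figure quoted above; the 70-year horizon is an illustrative assumption, not a number taken from the underlying study.

```python
DOUBLING_TIME_YEARS = 9  # figure quoted in the text

# Annual growth rate implied by the doubling time.
annual_growth = 2 ** (1 / DOUBLING_TIME_YEARS) - 1


def growth_factor(years, doubling_time=DOUBLING_TIME_YEARS):
    """Total multiplication of output after `years` of steady doubling."""
    return 2 ** (years / doubling_time)


print(f"{annual_growth:.1%}")
print(round(growth_factor(70)))  # illustrative 70-year horizon
```

At that pace, output would multiply roughly 220-fold over 70 years, which is why the question of how much of the growth is real matters.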

Consider the role of quality versus quantity in maximising true scientific progress. If a process
is overcommitted to quality over quantity, accepted practices might require triple- or
quadruple-blinded studies, mandatory replication of results by independent parties, and
peer review of all data and statistics before publication. Such a system would produce very
few results due to over-caution, and would waste scarce research funding. At another
extreme, an overemphasis on quantity would produce numerous substandard papers with
lax experimental design, little or no replication, scant quality control and substandard peer-
review (see Figure 1 below). As measured by the quantitative metrics, apparent scientific
progress would explode, but too many results would be erroneous, and consumers of
research would be mired in wondering what was valid or invalid. Such a system merely
creates an illusion of scientific progress. Obviously, a balance between quantity and quality
is desirable.

It is hypothetically possible that in an environment without quantitative metrics and fewer
perverse incentives emphasising quantity over quality, practices of scholarly evaluation
(enforced by peer review) would evolve to be near to an optimum level of productivity. But
we suspect that the existing perverse-incentive environment is pushing researchers to
overemphasise quantity in order to compete, leaving true scientific productivity at less than
optimal levels. If the hypercompetitive environment also increased the likelihood and
frequency of unethical behaviour, the entire scientific enterprise would be eventually cast
into doubt. While there is virtually no research exploring the precise impact of perverse
incentives on scientific productivity, most in the academic world would acknowledge a shift
towards quantity in research.

Figure 1: Quantity versus quality, vis-à-vis true scientific progress

Favouring output over outcomes, or quantity over quality, can also create a ‘perversion of
natural selection’. Such a system is more likely to weed out ethical and altruistic
researchers, while selecting for those who better respond to perverse incentives. The
average scholar can be pressured to engage in unethical practices in order to have or
maintain a career. Then, as per Mark Granovetter’s ‘Threshold Models of Collective
Behaviour’ (1978), unethical actions become ‘embedded in the structures and processes’ of
a professional culture. At this point, the conditioning to ‘view corruption as permissible’ or
even necessary is very strong. Compelling anecdotal testimony, in which accomplished and
public-minded professors write about why they are leaving a career they once loved, is
emerging. The Chronicle of Higher Education has even coined a name for this genre: Quit Lit.
In Quit Lit, even senior researchers provide perfectly rational explanations for leaving their
privileged and prized positions, rather than compromise their principles in a
hypercompetitive, perverse-incentive environment. One is left to wonder whether minority
students or women rationally and disproportionately decide to opt out of the system more
so than the groups who tend to persist.

In brief, although quantitative metrics provide a superficially attractive approach to
evaluating research productivity in comparison with subjective measures, once they are a
target they cease to be useful and can even be counter-productive. Continued
overemphasis of quantitative metrics might compel all but the most ethical scientists to
produce more work of lower quality, to ‘cut corners’ whenever possible, decrease true
productivity, and select for scientists who persist and thrive in a perverse-incentive
environment. It is hypothetically possible that the realities of modern academia affect the
persistence of women and minorities at all phases of the academic pipeline.

Many scientific societies, research institutions, academic journals and individuals have
advanced arguments trying to correct some excesses of quantitative metrics. Some have
signed the San Francisco Declaration on Research Assessment (DORA). DORA recognises
the need for improving ‘ways in which output of scientific research are evaluated’, and calls
for challenging research-assessment practices, especially the currently operative ‘journal
impact factor’ parameters. As of 1 August this year, 871 organisations and 12,788
individuals have signed DORA, including the American Society for Cell Biology, the
American Association for the Advancement of Science, the Howard Hughes Medical
Institute, and the Proceedings of the National Academy of Sciences. The publishers of
Nature, Science and other journals have called for downplaying the impact-factor metric. The
American Society for Microbiology recently took a principled stand and eliminated
impact-factor information from all their journals ‘to avoid contributing further to the inappropriate
focus on journal [impact factors]’. The aim is to slow the ‘avalanche’ of unreliable
performance metrics dominating research assessment. Like others, we are not advocating
for the abandonment of metrics, but reducing their importance in decision-making by
institutions and funding agencies, until we possibly have objective measures that better
represent the true value of scientific research.

In the hypercompetitive funding environment of modern science, the federal government
has been the one enabling, indispensable resource. It has been paramount in financing
research and development (R&D), creating new knowledge, and fulfilling public missions
including national security, agriculture, infrastructure and environmental health. Since
the Second World War, the federal government has borne a large share of the cost
of high-risk, long-term scientific research. Such scientific research carries uncertain
prospects or sometimes lacks obvious short-term societal impacts, and follows an agenda
that is often set by scientists and funding agencies. This foundation of federal funding has
created a research and knowledge ecosystem supplemented by universities and industries.
Together, it has made historic contributions to the collective progress of humanity.

For at least the past decade, however, US federal spending on R&D has been in decline. Its
‘research intensity’ (the federal R&D budget as a share of the country’s gross domestic
product) fell to 0.78 per cent in 2014 from about 2 per cent in the 1960s. In tandem,
China is projected to outspend the US on R&D by 2020.

US colleges and universities have also historically served to shape the next generation of
researchers, who will provide education and knowledge for and to the public. But as
universities morph into ‘profit centres’ focused on generating new products and patents,
they are de-emphasising science as a public good.

Competition among researchers for funding has never been more intense, as we enter an era
with the worst funding environment in half a century. Between 1997 and 2014, the funding
rate for US National Institutes of Health (NIH) grants fell from 30.5 per cent to 18 per
cent. US National Science Foundation (NSF) funding rates have remained stagnant at 23-25
per cent in the past decade. We can be thankful for small favours: these funding rates are
still well above 6 per cent, the approximate breakeven point at which the net cost of
proposal-writing equals the net value obtained from a grant by the grant-winner. Nonetheless, the
grant environment is hypercompetitive, susceptible to reviewer biases, skewed towards
funding agencies’ research agendas, and strongly dependent on prior success as measured
by quantitative metrics. Even before the financial crisis struck, the Nobel laureate Roger
Kornberg remarked: ‘If the work you propose to do isn’t virtually certain of success, then it
won’t be funded.’ These broad changes take valuable time and resources away from
scientific discovery and translation, compelling researchers to spend inordinate amounts of
time constantly chasing grant proposals and filling out ever increasing paperwork for grant

The steady growth of perverse incentives, and their instrumental role in faculty research,
hiring and promotion practices, amounts to a systemic dysfunction endangering scientific
integrity. There is growing evidence that today’s research publications too frequently suffer
from lack of replicability, rely on biased data-sets, apply low or sub-standard statistical
methods, fail to guard against researcher biases, and overhype their findings. In other
words, they reflect an overemphasis on quantity over quality. It is therefore not surprising that
scrutiny has revealed a troubling level of unethical activity, outright faking of peer review,
and retractions. The Economist recently highlighted the prevalence of shoddy and
non-reproducible modern scientific research and its high financial cost to society. It strongly
suggested that modern science is untrustworthy and in need of reform. Given the high cost
of exposing, disclosing or acknowledging scientific misconduct, we can be fairly certain
that there is much more than has been revealed. Warnings of systemic problems go back to
at least 1991, when the NSF director Walter E Massey noted that the size, complexity and
increased interdisciplinary nature of research in the face of growing competition was
making science and engineering ‘more vulnerable to falsehoods’.

The NSF defines research misconduct as intentional ‘fabrication, falsification, or plagiarism
in proposing, performing, or reviewing research, or in reporting research results’. Among
research misconduct cases investigated by the US Department of Health and Human
Services (which includes the NIH) and the NSF, 20-33 per cent end in a finding of misconduct.
Annual costs, at the institutional level, of $110 million are incurred for all such
research-misconduct investigations in the US. From 1992 to 2012, 291 scientific papers published under NIH
grants were retracted due to misconduct, accounting for $58 million in direct funding from
the agency. Obviously, the incidence of undetected misconduct is greater, some multiple of
the cases judged as such each year.

The true incidence is difficult to predict. A comprehensive meta-analysis of
research-misconduct surveys during 1987-2008 indicated that one in 50 scientists admitted to
committing misconduct (fabrication, falsification, and/or modifying data) at least once, and
14 per cent of scientists knew of colleagues who had done so. Most likely, given the
sensitivity of the questions asked and the low response rates, these numbers are an
underestimate of the true incidence. Since 1975, in life science and biomedical research,
the percentage of scientific articles retracted has increased tenfold; 67 per cent of the
retractions were due to misconduct. Hypotheses for the increase include the ‘lure of the
luxury journal’, ‘pathological publishing’, insufficient misconduct policies, academic
culture, career stage, and perverse incentives. From climate science to galvanic corrosion,
we have seen research published that denigrates the scientific ethos, and undermines the
credibility of the scientific community and everyone in it.

The principle of self-government in academia is strong, and this is a distinguishing
feature of the modern research university. Science is expected to be self-policing and
self-correcting. We have come to believe, however, that incentives throughout the system
induce all stakeholders to ‘pretend misconduct does not happen’. It is remarkable that
science never developed a clear system for reporting and investigating allegations of
research misconduct. Individuals who do allege misconduct don’t have an easy, evident
path to do so, and risk suffering severe negative professional repercussions. In relation to
what is considered fair in reporting research, grant-writing practices and promoting
research ideas, scholars operate, to a great extent, on an unenforceable and unwritten
honour system. Today, there are compelling reasons to doubt that science as a whole is self-
correcting. We are not the first to recognise this problem. Scientists have proposed open-
data, open-access, post-publication peer review, meta-studies and efforts to reproduce
landmark studies as practices to help compensate for the high error rates in modern
science. Beneficial as these corrective measures might be, perverse incentives on
individuals and institutions remain the root problem.

There are exceptional cases in which individuals have provided a reality check on
overhyped research press releases, especially in areas deemed potentially transformative
(for example, Jonathan Eisen’s real-time commentary on some mania surrounding the
‘microbiome’). Generally, however, the limitations of hot research sectors are downplayed or
ignored. Because every modern scientific mania creates a quantitative metric windfall for
participants, and because few consequences come to those responsible when a science
bubble bursts, the only effective check on pathological science and a misallocation of
resources is the unwritten honour system.

Misconduct is not limited to academic researchers. Perverse incentives and
hypercompetition also come to bear on federal agencies, giving rise to a new phenomenon
of institutional scientific research misconduct. The US Centers for Disease Control and
Prevention (CDC), for example, produced an erroneous report on the drinking-water crisis
in Washington, DC, claiming that the extremely high levels of lead in the water did not
cause an elevation in the local children’s blood lead levels. After the agency refused to correct
or defend its research, Congressional investigators had to intervene, and found the report to
be ‘scientifically indefensible’. A few months after being chastised in Congress, the same
branch of CDC wrote what a Reuters investigation called yet another ‘flawed’ report on lead
contamination of soil, drinking water and air in East Chicago in Indiana that left vulnerable
children and minorities in harm’s way for at least five years longer than was necessary.

The US Environmental Protection Agency (EPA) also published scientific reports from
consultants based on non-existent data in industry journals. More recently, the EPA
silenced its own whistleblowers during the water crisis in the city of Flint in Michigan. As
agencies increasingly compete with each other for reduced discretionary funding and
maintaining existing cash flows (CDC’s desire to focus more on lead paint, as opposed to
lead in water, for example), they seem to be more inclined to publish ‘good news’ instead of
science. In an era of declining discretionary funding, federal agencies have financial
conflicts of interest and fears of survival, similar to those in private industry. Given the
common misconception that federal funding agencies are free of such conflicts, the dangers
of institutional research misconduct might rival or even outweigh those of industry-
sponsored research, given that there is no system of checks and balances, and consumers
of such work might be overly trusting.

If we don’t reform the academic scientific-research enterprise, we risk significant disrepute
to and public distrust of science. The modern academic research enterprise, which The
Economist has derided as a ‘Ponzi scheme’, operates on a system of perverse incentives that
would have been almost inconceivable to researchers 50 years ago. We believe that this
system presents a real threat to the future of science. If immediate action is not taken, we
risk creating a corrupt professional culture akin to that revealed in professional cycling (ie,
20 out of 21 Tour de France podium finishers during 1999-2005 were conclusively tied to
doping), where an uncontrolled perverse-incentive system created an environment in which
athletes felt that they had to cheat to compete. While pro-cycling suffered severe disrepute
due to prolific doping scandals instigated by a burning desire to win at any cost, the stakes
in science are much higher. The loss of altruistic actors and trust in science would bring
even greater harm to the public and the planet.

In recent years, academia has witnessed unqualified success in acknowledging numerous
important issues, including those of demographic diversity, work-life balance, funding,
better teaching, public outreach, and engagement – attempts are being made to address
many of these problems.

All scientists should aspire to leave the field in a better state than when we first entered it.
The very important matters of state and federal funding lie beyond our direct control.
However, when it comes to the health, integrity and public perception of science and its
value, we are the key actors. We can openly acknowledge and address problems with
perverse incentives and hypercompetition that are distorting science and imperilling
scientific research as a public good. Some relatively simple steps are available. First, we
can arrive at a better understanding of the problem by systematically mining the experiences
and perceptions of academics in STEM fields, via a comprehensive survey of high-achieving
graduate students and researchers.

Second, the NSF should commission a panel of economists and social scientists with
expertise in perverse incentives to collect and review input from all levels of academia,
including retired National Academy members and distinguished STEM scholars. With a
long-term view to fostering science as a public good, the panel could also develop a list of
‘best practices’ to guide evaluation of candidates for hiring and promotion.

Third, we can no longer afford to pretend that the problem of research misconduct does not
exist. At both the undergraduate and graduate levels, science and engineering students
should receive realistic instruction on these subjects, so that they are prepared to act when,
not if, they encounter it. The curriculum should include review of real-world pressures,
incentives and stresses that can increase the likelihood of research misconduct.

Fourth, universities can take measures immediately to protect the integrity of scientific
research, and announce steps to reduce perverse incentives and uphold research
misconduct policies that discourage unethical behaviour. Finally, and perhaps most simply,
in addition to teaching technical skills, PhD programmes themselves should acknowledge
the present reality of perverse incentives, while also fostering character development,
respect for science as a public good, and an appreciation of the critical role of
quality science in the future of humankind.

