
Listicle

Three Reasons Why Research Has a Reproducibility Crisis (and Four Ways We Can Fix It)
Naomi Heffer

Reproducibility is a key principle of the scientific method. If science is to be considered credible and
trustworthy, it needs to yield reproducible findings. To be called reproducible, a result obtained in a single
study should be achievable again with a high level of agreement when the study is repeated using the
same methodology. Despite the vital importance of reproducibility to credible science, fields across
biomedical science continue to experience a reproducibility crisis, in which many findings cannot be
independently reproduced or replicated (and some cannot even be replicated by the original researchers
themselves).

This listicle will consider how the crisis began and what data integrity and compliance steps can be taken by
individual researchers to help address the ongoing reproducibility issues in biomedical research.

Is there really a reproducibility “crisis”?


Many of the factors responsible for undermining reproducibility have been rampant within research practice
for many years, but collective awareness that biomedical science was experiencing a “reproducibility crisis”
didn’t come about until the mid-2000s and early 2010s. A key milestone came in 2005 with the publication
of the paper “Why Most Published Research Findings Are False”, which used statistical modeling to
estimate how much of published research was reproducible and concluded that published scientific findings
were more likely to be false (i.e., not reproducible) than true. This was followed in the early 2010s by the
discovery of major reproducibility issues within preclinical research, when an attempt to replicate cancer
research findings from fifty-three “landmark” studies found that the results could be replicated in only 11%
of cases. Awareness of reproducibility problems in biomedical science is now widespread: 90% of
respondents to a 2016 survey in Nature agreed that there is a “reproducibility crisis”.

How did we get into this mess?


This is a complex problem resulting from biases at many levels, from the cognitive biases of individual
researchers to prejudices inherent within academic culture, but the main culprits fall into three broad
categories: poor study design, perverse incentives, and questionable research practices.


Poor study design

Failure to understand or apply basic design principles, such as blinding, randomization, and the use of
appropriate control groups, can introduce bias from confounding factors. One study design issue that is
pervasive across different fields of biomedical science is the use of small, underpowered samples, which is
made worse by the frequent use of less powerful between-subjects, rather than within-subjects, designs.
This not only increases the likelihood of false-negative results, but also reduces the positive predictive
value of the findings, meaning that results that do reach the specified statistical significance level are less
likely to reflect a true effect.
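
To see how low power erodes positive predictive value (PPV), note that PPV = (power × R) / (power × R + α), where R is the prior odds that a tested hypothesis is true and α is the significance threshold. The sketch below works through this relationship; the power, α, and prior-odds values are illustrative assumptions, not estimates for any particular field.

```python
def positive_predictive_value(power: float, alpha: float, prior_odds: float) -> float:
    """Probability that a statistically significant result reflects a true effect.

    power      -- probability of detecting a real effect (1 - beta)
    alpha      -- significance threshold (expected false-positive rate under the null)
    prior_odds -- odds that a tested hypothesis is true (0.25 = one true per four false)
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)

# Illustrative assumption: prior odds of 0.25 (one true effect per four nulls tested).
print(positive_predictive_value(power=0.8, alpha=0.05, prior_odds=0.25))  # 0.8
print(positive_predictive_value(power=0.2, alpha=0.05, prior_odds=0.25))  # 0.5
```

Under these assumed numbers, dropping from 80% to 20% power means only half of the “significant” results reflect true effects, even before any questionable practices enter the picture.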

Perverse incentives

Another major part of the problem is that doing replicable research is not highly incentivized within
research culture. Publication bias within research means that it is generally much easier to publish novel,
positive findings than negative findings or replication studies, no matter how rigorously these may have
been carried out. Owing to the influential role that publication metrics play in academic promotions, this
places researchers in a “publish or perish” situation, which shifts the emphasis from conducting accurate,
reproducible research, to producing “publishable” research findings.

Questionable research practices

The pressure to publish, amongst other research and career pressures, may lead researchers to engage
in questionable research practices to increase the likelihood of attaining more novel or conclusive findings.
Such practices include “p-hacking” (changing pre-processing or analysis routines to make statistical
significance more likely) and HARKing (Hypothesizing After the Results are Known), where researchers
formulate their hypotheses after they have seen the results of the study. The consequence is an increase in
the frequency of false-positive results and inflated effect sizes in the published literature.
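
To see why flexible analysis inflates false positives, consider a researcher who runs several analysis variants on data with no real effect and reports whichever variant reaches significance. Assuming the variants behave like independent tests (a simplification), the chance of at least one spurious “hit” is 1 − (1 − α)^k. A minimal sketch:

```python
def chance_of_false_positive(alpha: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tests on
    pure-noise data crosses the significance threshold `alpha`."""
    return 1 - (1 - alpha) ** attempts

# With alpha = 0.05, each extra analysis route raises the odds of a spurious finding.
for k in (1, 5, 10, 20):
    print(f"{k:>2} analysis variants -> "
          f"{chance_of_false_positive(0.05, k):.0%} chance of a false positive")
```

With 20 variants, a “significant” result appears roughly two times in three, despite there being no true effect anywhere in the data.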

...and how do we get out of it?


There is no one easy solution to the reproducibility crisis. However, there are lots of small steps that
individual researchers can take to ensure the integrity of their data and the transparency of their
published research. The UK Reproducibility Network (UKRN) is a national, researcher-led organization that
aims to promote replicable research in the UK by providing training opportunities and disseminating best
practice. The UKRN identifies four key facets of reproducibility: organization, documentation, analysis, and
dissemination.

Organization

Even after a study is completed, it is important that researchers have full access to their data and
materials so that they can resolve issues and address queries from other researchers. One way to
ensure this happens is to build a data management plan ahead of time, outlining what data will be
produced as part of the project and how each type of data will be organized, documented, standardized
and stored. By developing an informative directory structure and establishing file naming conventions
in advance, researchers maximize their ability to retrieve information about their research projects in a
timely manner. As is the case for many different reproducible research practices, adopting this approach
to project organization not only serves to benefit the wider research community, but also enhances
individual research efficiency.
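
As a concrete illustration, the short sketch below creates one possible project skeleton and documents a file-naming convention; the folder layout and naming pattern are illustrative assumptions rather than a prescribed standard.

```python
from pathlib import Path

# One possible layout: raw data kept read-only and separate from processed
# outputs, code, documentation, and results.
FOLDERS = [
    "data/raw",        # original files, never edited in place
    "data/processed",  # cleaned/derived datasets, regenerable from raw
    "code",            # analysis scripts and notebooks
    "docs",            # protocols, data dictionary, README
    "results",         # figures and tables for dissemination
]

def create_project(root: str) -> None:
    """Create a standardized directory skeleton for a new study."""
    for folder in FOLDERS:
        Path(root, folder).mkdir(parents=True, exist_ok=True)

# Example naming convention: project_datatype_date_version, e.g.
# "studyA_eyetracking_2023-01-15_v01.csv" sorts chronologically and is
# self-describing without opening the file.
create_project("studyA")
```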


Documentation

Poor record keeping is another major challenge to reproducibility. For research findings to be
reproducible, it is essential that researchers have a clear record of all the decisions that were taken
during data collection and analysis, so that dissemination of study methods and findings can be made as
detailed and transparent as possible. Paper lab notebooks have been used to record research decisions
and observations since the 15th century, but they have several important limitations: they are not
searchable, they can easily be damaged or misplaced, and the recorded information is not easy to share
with peers or collaborators. For these reasons, there has been a move in recent years towards electronic
lab notebooks (ELNs), which are easily searchable, can be made globally accessible and are instantly
sharable with collaborators and the wider research community. Many different ELNs are available, including
paid as well as free and open-source tools. As the number of available tools increases, it is useful to be
able to compare the advantages and limitations of different ELN platforms (see this comparison tool from
Harvard Medical School).

To promote reproducibility, researchers should make detailed, digitized protocols relating to their
research findings accessible to the wider research community. When seeking protocols to use in their
own research, researchers often rely on searching the available literature, but the information included
in the methods section of research articles is usually not detailed enough to be exactly replicated, and
additional information included in supplementary files may not be as easily located. Research protocols
should be documented in as much detail as possible, including details such as duration per step, reagent
amount, expected result, and software packages used. Publishing protocols to a dedicated protocol
repository, rather than including them as a supplementary file to a research article, can increase
discoverability and enable protocol reuse by researchers seeking to extend or replicate the published work.

Analysis

Researchers don’t usually set out to engage in questionable statistical practices, such as p-hacking, but if
they don’t have a detailed prior analysis plan, or clearly documented justification for analytical decisions,
they can easily fall into bad practices. One way that researchers can protect against the potential
influence of their own cognitive biases is pre-registration. Publishing a statement outlining the primary
outcomes and analysis plan before beginning the planned research can help to dissuade researchers
from exploiting analytical flexibility to make their findings appear more conclusive. An additional benefit
of pre-registration is that it forces the researcher to think about key aspects of study design which are
important for reproducibility ahead of time (e.g., clear hypotheses, blinding, sample size justification).

Another step that researchers can take towards making their data analysis reproducible is to engage in
literate analysis programming. Tools like Jupyter and RMarkdown can be used to make understandable,
reproducible analysis records, where what was done can be recorded alongside a narrative description of
why it was done in a single, executable document.
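
The sketch below mimics the literate style in a plain Python script using the percent-format cell markers that Jupyter-compatible editors recognize; the data and exclusion rule are illustrative assumptions, and in practice the same content would live in a notebook or RMarkdown document.

```python
# %% [markdown]
# ## Exclusion rule: drop trials with implausibly fast responses (< 200 ms),
# which likely reflect accidental key presses rather than genuine decisions.

# %% Load trial-level data (inlined here for illustration; in practice this
# would be read from a file in the project's data directory).
import pandas as pd

trials = pd.DataFrame({
    "participant": ["P01", "P01", "P02", "P02"],
    "reaction_time_ms": [154, 420, 515, 388],
})

# %% Apply the documented exclusion and record its effect, so the decision
# is transparent and auditable by other researchers.
clean = trials[trials["reaction_time_ms"] >= 200]
print(f"Excluded {len(trials) - len(clean)} of {len(trials)} trials (< 200 ms).")
```

Because the narrative (why) sits next to the code (what) in one executable document, another researcher can rerun the analysis and see the reasoning behind each decision.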

Dissemination

By sharing their data and analysis code, researchers make it possible to validate their findings and
reproduce results, whilst also providing access to materials that might be valuable to other researchers
or policy makers, which increases the impact of the research. To maximize impact, data and code sharing
should be FAIR: Findable, Accessible, Interoperable and Reusable.

Data can be made findable by sharing it via a dedicated data repository, rather than publishing it on
the researcher’s own website or in a supplementary file to a journal publication. Publishing data to a
repository provides a unique and citable identifier (similar to the DOI for a journal publication) and makes
the data easier to find, as it is more likely to appear in searches.


Making data and code open access whilst applying a suitable data license, such as a Creative
Commons license, helps to make the data as widely accessible as possible, whilst also ensuring that
the original authors receive appropriate credit for their work. To make the data interoperable, data and
code should be made available in open, persistent, and non-proprietary file formats that don’t require
specialized software to open and run. Sharing the data alongside detailed documentation and rich
metadata makes the data reusable.
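
As a small illustration of the interoperable and reusable facets, the sketch below writes a dataset in an open format (CSV) alongside a machine-readable metadata file; the variable names and license choice are illustrative assumptions.

```python
import csv
import json

# Interoperable: plain CSV is an open, non-proprietary format readable by any tool.
rows = [
    {"participant_id": "P01", "condition": "control", "score": 41},
    {"participant_id": "P02", "condition": "treatment", "score": 55},
]
with open("studyA_scores.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

# Reusable: rich metadata documents the license and the meaning of each variable.
metadata = {
    "title": "Study A outcome scores (illustrative example)",
    "license": "CC-BY-4.0",
    "variables": {
        "participant_id": "anonymized participant code",
        "condition": "experimental group: control or treatment",
        "score": "primary outcome, arbitrary units",
    },
}
with open("studyA_scores.metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```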
