Three Reasons Why Research Has a Reproducibility Crisis (and Four Ways We Can Fix It)
Reproducibility is a key principle of the scientific method. If science is to be considered credible and trustworthy, it needs to yield reproducible findings. To be called reproducible, a result obtained in a single study should be achievable again, with a high level of agreement, when the study is repeated using the same methodology. Despite the vital importance of reproducibility to credible science, fields across biomedical science continue to experience a reproducibility crisis, in which many findings cannot be independently reproduced or replicated (and some cannot even be replicated by the original researchers themselves).
This listicle will consider how the crisis began and what data integrity and compliance steps can be taken by
individual researchers to help address the ongoing reproducibility issues in biomedical research.
Flawed study design
Failure to understand or apply basic design principles, such as blinding, randomization and the use of appropriate control groups, can introduce bias from confounding factors. One study
design issue which is pervasive throughout different fields of biomedical science is the use of small,
underpowered samples, which is made worse by the frequent use of less powerful between-subjects,
rather than within-subjects, designs. This not only increases the likelihood of false negative results, but
also reduces the positive predictive value of the findings, meaning that results that do reach the specified
statistical significance level are less likely to reflect a true effect.
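The relationship between effect size and required sample size can be sketched with the standard normal-approximation formula for comparing two group means (the effect sizes below are illustrative, not taken from any particular study):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sample comparison of
    means (normal approximation, two-sided test, standardized effect)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # value for desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A "medium" standardized effect (d = 0.5) already needs ~63 subjects per
# group; halving the expected effect roughly quadruples the requirement.
print(n_per_group(0.5))   # 63
print(n_per_group(0.25))  # 252
```

This is why small-sample studies are so often underpowered: recruiting what feels like a reasonable number of subjects per group only achieves 80% power for quite large effects.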
Perverse incentives
Another major part of the problem is that doing replicable research is not highly incentivized within
research culture. Publication bias within research means that it is generally much easier to publish novel,
positive findings than negative findings or replication studies, no matter how rigorously these may have
been carried out. Owing to the influential role that publication metrics play in academic promotions, this
places researchers in a “publish or perish” situation, which shifts the emphasis from conducting accurate,
reproducible research, to producing “publishable” research findings.
The pressure to publish, amongst other research and career pressures, may lead researchers to engage
in questionable research practices to increase the likelihood of attaining more novel or conclusive findings.
Such practices include “p-hacking” (changing pre-processing or analysis routines to make statistical
significance more likely) and HARKing (Hypothesizing After the Results are Known), where researchers
formulate their hypotheses after they have seen the results of the study. The consequence of this is an increase in the frequency of false-positive results and inflated effect sizes in the published literature.
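The cost of this analytical flexibility can be made concrete with a small simulation (parameters are arbitrary, for illustration only): repeatedly testing as data accumulate and stopping at the first "significant" result inflates the false-positive rate well above the nominal 5%, even when no true effect exists.

```python
import random
from statistics import NormalDist

def peeking_false_positive_rate(n_sims=2000, max_n=100, look_every=10, seed=1):
    """Simulate one-sample z-tests on pure noise, testing after every
    `look_every` observations and stopping at the first p < 0.05
    (a simple form of optional stopping / p-hacking)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)  # two-sided 5% critical value
    rejections = 0
    for _ in range(n_sims):
        data = []
        for _ in range(max_n):
            data.append(rng.gauss(0, 1))  # the null is true: no effect
            n = len(data)
            if n % look_every == 0:
                z = (sum(data) / n) * n ** 0.5  # z-test with known sigma = 1
                if abs(z) > z_crit:
                    rejections += 1  # a "significant" result on pure noise
                    break
    return rejections / n_sims

rate = peeking_false_positive_rate()
print(f"False-positive rate with peeking: {rate:.3f}")  # well above 0.05
```

Pre-specifying the sample size and analyzing only once (or using proper sequential-analysis corrections) keeps the error rate at its nominal level.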
Organization
Even after a study is completed, it is important that researchers have full access to their data and
materials so that they can resolve issues and address queries from other researchers. One way to
ensure this happens is to build a data management plan ahead of time, outlining what data will be
produced as part of the project and how each type of data will be organized, documented, standardized
and stored. By developing an informative directory structure and establishing file naming conventions
in advance, researchers maximize their ability to retrieve information about their research projects in a
timely manner. As is the case for many different reproducible research practices, adopting this approach
to project organization not only serves to benefit the wider research community, but also enhances
individual research efficiency.
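As a sketch of what such a plan can specify up front, the skeleton below uses one common layout convention (the directory names and the ISO-8601 date-first file naming scheme are illustrative choices, not a standard):

```python
import tempfile
from pathlib import Path

# Illustrative project skeleton; one common convention, not a standard.
SKELETON = [
    "data/raw",        # original, untouched data files
    "data/processed",  # cleaned or derived data, regenerable from raw
    "code",            # analysis scripts and notebooks
    "docs",            # protocols, data dictionary, README
    "results/figures",
    "results/tables",
]

def create_project(root: Path) -> None:
    for sub in SKELETON:
        (root / sub).mkdir(parents=True, exist_ok=True)
    # File naming convention: ISO-8601 date + short description, so files
    # sort chronologically and remain self-describing (names invented).
    (root / "data" / "raw" / "2024-03-01_cohort-A_weights.csv").touch()
    (root / "docs" / "README.md").write_text("# Project overview\n")

root = Path(tempfile.mkdtemp()) / "my_study"
create_project(root)
print(sorted(p.relative_to(root).as_posix()
             for p in root.rglob("*") if p.is_dir()))
```

Separating raw from processed data also makes it easy to verify that every derived file can be regenerated from the originals by the analysis code.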
Documentation
Poor record keeping is another major challenge to reproducibility. For research findings to be
reproducible, it is essential that researchers have a clear record of all the decisions that were taken
during data collection and analysis, so that dissemination of study methods and findings can be made as
detailed and transparent as possible. Paper lab notebooks have been used to record research decisions
and observations since the 15th century, but have several important limitations: they are not searchable,
they can be easily damaged or misplaced and it is not easy to share the recorded information with peers
or collaborators. For these reasons, there has been a move in recent years towards using electronic
lab notebooks (ELNs), which are easily searchable, can be made globally accessible and are instantly
sharable with collaborators and the wider research community. Many different ELNs are available, including commercial as well as free and open-source tools. As the number of available tools increases, it is
useful to be able to compare the advantages and limitations of different ELN platforms (see this tool from
Harvard Medical School).
To promote reproducibility, researchers should make detailed, digitized protocols relating to their
research findings accessible to the wider research community. When seeking protocols to use in their
own research, researchers often rely on searching the available literature, but the information included
in the methods section of research articles is usually not detailed enough to be exactly replicated, and
additional information included in supplementary files may not be as easily located. Research protocols
should be documented in as much detail as possible, including details such as duration per step, reagent
amount, expected result and software packages used. Publishing protocols to a protocol repository, rather than including them as a supplementary file to a research article, can increase discoverability and enable
protocol reuse by researchers seeking to extend or replicate the published work.
Analysis
Researchers don’t usually set out to engage in questionable statistical practices, such as p-hacking, but if
they don’t have a detailed prior analysis plan, or clearly documented justification for analytical decisions,
they can easily fall into bad practices. One way that researchers can protect against the potential
influence of their own cognitive biases is pre-registration. Publishing a statement outlining the primary
outcomes and analysis plan before beginning the planned research can help to dissuade researchers
from exploiting analytical flexibility to make their findings appear more conclusive. An additional benefit
of pre-registration is that it forces the researcher to think about key aspects of study design which are
important for reproducibility ahead of time (e.g., clear hypotheses, blinding, sample size justification).
Another step that researchers can take towards making their data analysis reproducible is to engage in
literate analysis programming. Tools like Jupyter and RMarkdown can be used to make understandable,
reproducible analysis records, where what was done can be recorded alongside a narrative description of
why it was done in a single, executable document.
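As a minimal sketch of the idea (the data and numbers are invented for illustration), a Jupytext-style "percent" script interleaves narrative and executable code in a single plain-text file; the same structure applies to Jupyter notebooks and RMarkdown documents:

```python
# %% [markdown]
# # Analysis of treatment effect (illustrative data)
# Narrative cells record *why* each step was taken, directly alongside
# the code that performs it, so the analysis record is self-explaining.

# %%
from statistics import mean, stdev

control = [4.1, 3.8, 4.4, 4.0, 3.9]    # invented example measurements
treatment = [4.9, 5.1, 4.7, 5.3, 4.8]

# %% [markdown]
# We report the standardized difference in means (Cohen's d with a
# pooled SD) because the group sizes are equal.

# %%
pooled_sd = ((stdev(control) ** 2 + stdev(treatment) ** 2) / 2) ** 0.5
cohens_d = (mean(treatment) - mean(control)) / pooled_sd
print(f"Cohen's d = {cohens_d:.2f}")
```

Because the document is executable end to end, re-running it regenerates every reported number, which eliminates copy-paste errors between analysis output and manuscript.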
Dissemination
By sharing their data and analysis code, researchers make it possible to validate their findings and
reproduce results, whilst also providing access to materials that might be valuable to other researchers
or policy makers, which increases the impact of the research. To maximize impact, data and code sharing
should be FAIR: Findable, Accessible, Interoperable and Reusable.
Data can be made findable by sharing data via a dedicated data repository, rather than publishing
on the researcher’s own website or in a supplemental file to a journal publication. This is because
publishing data to a repository provides a unique and citable identifier for the data (similar to a DOI for a journal publication) and makes the data easier to find, as it is more likely to appear in searches.
Making data and code open access whilst applying a suitable data license, such as a Creative
Commons license, helps to make the data as widely accessible as possible, whilst also ensuring that
the original authors receive appropriate credit for their work. Data and code should be made available
in open, persistent, and non-proprietary file formats that don’t require specialized software to open
and run, to make the data interoperable. Sharing the data alongside detailed documentation and rich
metadata makes the data reusable.
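As a small sketch of these last two points (file names and metadata fields are illustrative), saving a dataset as plain CSV with a machine-readable metadata sidecar keeps it interoperable and reusable without any proprietary software:

```python
import csv
import json
import tempfile
from pathlib import Path

out = Path(tempfile.mkdtemp())

# Interoperable: plain CSV is an open, non-proprietary format readable
# by any tool. Rows and column names here are invented examples.
rows = [
    {"subject_id": "S01", "group": "control", "outcome_mm": 4.1},
    {"subject_id": "S02", "group": "treatment", "outcome_mm": 4.9},
]
with open(out / "outcomes.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["subject_id", "group", "outcome_mm"])
    writer.writeheader()
    writer.writerows(rows)

# Reusable: rich, machine-readable metadata alongside the data, including
# an explicit license and a description of every variable.
metadata = {
    "title": "Example outcome measurements",
    "license": "CC-BY-4.0",
    "variables": {
        "subject_id": "anonymized subject identifier",
        "group": "experimental group (control/treatment)",
        "outcome_mm": "primary outcome, millimetres",
    },
}
(out / "outcomes.metadata.json").write_text(json.dumps(metadata, indent=2))
print(sorted(p.name for p in out.iterdir()))
```

Depositing both files together in a data repository then covers the remaining two FAIR criteria: the repository makes them findable and accessible.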