INCF Workshop Report - Towards Neuroscience-Centered Selection Criteria For Data Repositories and Scientific Gateways
Authors
Malin Sandström, Mathew Abrams
Workshop organizer
INCF Infrastructure Committee
Workshop participants
Jan Bjaalie, University of Oslo & EBRAINS (chair)
Mathew Abrams, INCF Secretariat
Alice Allen
Matt Cannon, Taylor & Francis
Satrajit Ghosh, Massachusetts Institute of Technology
Wojtek Gościński, Monash University
Chris Graf, Wiley
Nick Guenther
Daniel S. Katz, University of Illinois
Dawei Lin, NIH
Maryann Martone, University of California, San Diego
Mustapha Mokrane, CoreTrustSeal
Tom Morrell, California Institute of Technology
Malin Sandström, INCF Secretariat
Susanna-Assunta Sansone, FAIRsharing.org
Sirarat Sarntivijai, ELIXIR
Kathleen Shearer, Confederation of Open Access Repositories
Jerry Sheehan, NIH
Marie Timmerman, Science Europe
Contents
Executive summary
Introduction
Current Landscape of guidelines/recommendations for digital data services
RDA/FAIRSharing
FORCE11 Software Citation
INCF Infrastructure Committee
COAR
Summary of workshop discussions
Recommendations to INCF
Next steps
References
Executive summary
FAIR (Findable, Accessible, Interoperable, Reusable) research requires that the necessary infrastructure, in the form of web-accessible repositories, is available to neuroscientists for publishing research objects such as data, code, and workflows. The objective of this workshop was to bring together representatives from international initiatives working to develop FAIR criteria for data repositories, portals, and scientific gateways, in order to inform the INCF Infrastructure Committee’s work on a common set of recommendations for the neuroscience community.
Introduction
Digital data services such as repositories and scientific gateways play an important role in the archiving, management, analysis and sharing of research data. They provide stable, long-term storage; can improve data quality through active curation; can increase the discoverability and reusability of data through the use of controlled terms and standardized metadata; and make it easier to request and transfer data, while removing or lowering barriers to data reuse and collaboration. Besides hosting data and providing computational power, repositories and scientific gateways are also important for supporting research reproducibility and replicability: they can preserve data and computational research outcomes that might otherwise be lost or become unfindable over time, and they make it realistically possible to redo the same analyses or computational experiments. Broadly, openly available data storage and computational resources can also become a driver for increasing diversity and equality in science, as they help counteract differences in access to hardware, tools and resources.
The mass proliferation of repositories and scientific gateways in recent years has resulted in a landscape so diverse and varied that the many possible choices often overwhelm potential users. In response, many community-led organizations have issued guidelines for community-specific repositories. Unfortunately, instead of providing the intended clarity (i.e. aiding users in selecting the appropriate digital data service), these guidelines/recommendations have added to the confusion, since they vary between the different issuing bodies and typically cover only repositories and not scientific gateways. Thus, there is a need for coordination among the community-led organizations developing community-specific guidelines/recommendations for digital data services.
Current Landscape of guidelines/recommendations for digital data services

RDA/FAIRSharing

At a recent RDA meeting, a session was held focusing on mapping the repository landscape and trying to identify a common set of metadata descriptors. It gathered many initiatives that had defined guides, criteria or recommendations to help users and stakeholders understand what functionalities are offered, and to define good practices that assist repositories in evaluating and improving their current operations. The outcome was that the different bodies were willing to work together on selection criteria, but there was a divide between recommending features (to be adopted) and best practices (guidelines for improvement).

1 “Transparency and Openness Promotion (TOP) Guidelines” https://osf.io/9f6gx/
It was observed that recommending repositories is a sensitive issue that requires flexibility; anyone making recommendations must make sure that the outcome fits researchers’ needs. The landscape of repositories is evolving, and the proliferation of repositories should be met with understanding and collaboration.
FORCE11 Software Citation

The FORCE11 software citation working groups influenced a new version of DataCite's metadata schema (v4.1) that supports software citation, and have encouraged use of the CodeMeta framework and the Citation File Format (CITATION.cff) for storing citation metadata with the code in the repository.

The Software Citation working group identified a number of technical challenges to software citation and concluded that software metadata is fundamental. There are also social challenges: different groups need to come together and establish norms that will actually work for their communities.
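Both metadata formats mentioned above are plain files kept alongside the code. As an illustration (all field values here are hypothetical, not from the report), a minimal CITATION.cff file placed in the root of a code repository might look like:

```yaml
# CITATION.cff — citation metadata read by services such as GitHub and Zenodo
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Example Neuroimaging Toolbox"   # hypothetical project name
version: "1.0.0"
doi: "10.5281/zenodo.0000000"           # placeholder DOI
date-released: "2021-06-01"
authors:
  - family-names: "Doe"
    given-names: "Jane"
    affiliation: "Example University"
```

CodeMeta serves a similar purpose using a JSON-LD file (codemeta.json), and crosswalks exist for converting between the two formats.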
The FORCE11 Software Citation Implementation working group has developed a set of best
practices for research software registries and repositories [FORCE11 2020]. The working group
had representatives from 25 different registries and repositories.
A new consortium, SciCodes, has been formed to coordinate adoption of the best practices and to keep them updated.
The group also worked with a set of publishers to create specific guidance for them to use with
their editors, authors, and reviewers, and many publishers are now sharing their guidance
through a common index [Katz 2021].
COAR

The COAR Community Framework for Good Practices in Repositories [COAR 2020] covers the following areas:

● Discoverability
● Access
● Reuse
● Integrity and authenticity
● Quality assurance
● Privacy of sensitive data
● Preservation
● Sustainability and governance
● Other characteristics, such as user support, usage information, scope documentation and interoperability.

The framework will be updated yearly. It is intended to work as a common baseline that links out to more comprehensive requirements relevant to different communities, content types, funder policies et cetera.
COAR reports that the repository ecosystem is diverse and that aspects other than FAIR might be relevant to users, such as jurisdiction, sustainability or governance model. Therefore, good practices should be used to improve (existing) repository operations, not to exclude repositories or disqualify them from participating in the ecosystem. Setting the threshold too high would be harmful and would slow down progress towards open science, especially in developing countries.
INCF Infrastructure Committee

The International Neuroinformatics Coordinating Facility (INCF) criteria for selecting neuroscience data repositories and scientific gateways are being developed by the INCF Infrastructure Committee to help neuroscience researchers and students choose good services for their specific use cases, and to help neuroinformatics providers make good decisions. The work consists of a criteria checklist and ten high-level recommendations.
Summary of workshop discussions

The discussion focused on the following seven questions:
1. Do you have any feedback to the INCF Infrastructure Committee on the developing
criteria?
The draft criteria were positively received by the workshop’s participants and were seen to address the major issues. It was recommended that INCF focus on advising its community members on how to adopt and start using the criteria, and on identifying ways to help them do so.
2. Do you have any advice to INCF or to those who work with neuroscience repositories?
It was observed that it is a challenge to get researchers to understand why they are being asked to do more, and to see the value to the community of doing those things. Neuroscience has many communities, at many different levels of maturity, and it will be a challenge to reach all of them.
3. Is there a way to identify the minimum of everything required? Is that the right way to think about it?
There was clear interest among the participants in coming together to collaborate on a
joint set of minimum requirements.
It was agreed that building coherence could help many communities. It was
recommended that efforts should focus on interoperability, transparency, and making
data available. Information sharing was seen as a good first step before doing work
downstream.
It was suggested that it could be useful to assemble a set of community challenges and then have different repositories come together and discuss what they would do to support those challenges.
Journals were identified as the first place researchers turn for advice on which services to use. For this reason, journals could help set and reinforce high-level minimum standards that could then be implemented by different communities according to their own needs.
Sustainability for digital services was seen as a complex issue. It was noted that
currently many services have grown out of time-limited projects or have short funding
cycles and that there is a need to change how open science infrastructure is funded.
Centrally funded infrastructure was considered key. Funding a few larger services was seen as less complex than funding many small ones. Institutions were seen to have an extremely important role, since they can repurpose some funding towards institutional repositories, which are critical for fields that lack community repositories. The Open Library of the Humanities was recommended as an interesting model for sustainability and institutional funding.
Curation was seen to be very important for sustainability, but often not funded and hard
to get funding for. In addition, curators are hard to retain since they are not paid well.
COAR (the Confederation of Open Access Repositories) is discussing how to set up shared curation models; the Data Curation Network might serve as inspiration.
It was noted that, increasingly, very large datasets require computational resources to be provided within the same service. Institutions may therefore need to contribute compute resources in order for researchers’ data to be available and creditable. This raises policy questions for institutions, such as whether to let people outside the institution use their computational resources. One solution could be for communities to develop domain-specific repositories on top of shared infrastructures, covering only the costs of curation rather than building and maintaining the infrastructure themselves. Communities are moving towards this model, but funding is still an unsolved issue. National or regional infrastructures can help.
Recommendations to INCF
1. INCF should embark on a campaign to promote the adoption of its guidelines by the neuroscience community. Particular focus should be placed on advising the community on how to adopt the guidelines, and on identifying ways to help the community use them. Since the neuroscience community is large, diverse, and at many different stages of data sharing maturity, it was recommended that INCF focus its efforts on a limited number of subcommunities instead of trying to reach them all. In addition, INCF should engage in targeted outreach to the publishing community, since the participants agreed that publishers are most likely the first place researchers will turn for advice on which services to use. For this reason, publishers could help set and reinforce high-level minimum standards that could then be implemented by different communities according to their own needs.
2. INCF should organize a follow-up workshop that is dedicated to defining a minimum set
of requirements/guidelines for digital data services that includes the current set of
participants as well as other interested organizations.
Next steps
As a first step, the INCF Infrastructure Committee plans to submit the INCF Recommendations
for repositories and scientific gateways for publication and establish a web presence on the
INCF portal for the recommendations that links to FAIR data sharing training modules on the
INCF TrainingSpace.
References
COAR (Confederation of Open Access Repositories). Community Framework for Good Practices in Repositories, v1, October 8, 2020.
https://www.coar-repositories.org/coar-community-framework-for-good-practices-in-repositories/

FORCE11 Task Force on Best Practices for Software Registries. Nine Best Practices for Research Software Registries and Repositories: A Concise Guide. 2020-12-23.
https://arxiv.org/pdf/2012.13117.pdf

Daniel S. Katz, Neil P. Chue Hong, Tim Clark, August Muench, Shelley Stall, Daina Bouquin, Matthew Cannon, Scott Edmunds, Telli Faez, Patricia Feeney, Martin Fenner, Michael Friedman, Gerry Grenier, Melissa Harrison, Joerg Heber, Adam Leary, Catriona MacCallum, Hollydawn Murray, Erika Pastrana, Katherine Perry, Douglas Schuster, Martina Stockhause, Jake Yeston. Recognizing the value of software: a software citation guide (version 2). F1000Research 2021, 9:1257.
https://doi.org/10.12688/f1000research.26932.2

Susanna-Assunta Sansone, Peter McQuilton, Helena Cousijn, Matthew Cannon, Wei Mun Chan, Ilaria Carnevale, Imogen Cranston, Scott Edmunds, Nicholas Everitt, Emma Ganley, Chris Graf, Iain Hrynaszkiewicz, Varsha K. Khodiyar, Thomas Lemberger, Catriona J. MacCallum, Kiera McNeice, Hollydawn Murray, Philippe Rocca-Serra, Kathryn Sharples, Marina Soares E Silva, Jonathan Threlfall. Data Repository Selection: Criteria That Matter. 2019-10-28.
https://osf.io/m2bce/