
Towards neuroscience-centered selection criteria for data repositories and scientific gateways

Authors
Malin Sandström, Mathew Abrams

Workshop organizer
INCF Infrastructure Committee

Workshop participants
Jan Bjaalie, University of Oslo & EBRAINS (chair)
Mathew Abrams, INCF Secretariat
Alice Allen
Matt Cannon, Taylor & Francis
Satrajit Ghosh, Massachusetts Institute of Technology
Wojtek Gościński, Monash University
Chris Graf, Wiley
Nick Guenther
Daniel S. Katz, University of Illinois
Dawei Lin, NIH
Maryann Martone, University of California, San Diego
Mustapha Mokrane, CoreTrustSeal
Tom Morrell, California Institute of Technology
Malin Sandström, INCF Secretariat
Susanna-Assunta Sansone, FAIRsharing.org
Sirarat Sarntivijai, ELIXIR
Kathleen Shearer, Confederation of Open Access Repositories
Jerry Sheehan, NIH
Marie Timmerman, Science Europe

Contents
Executive summary
Introduction
Current Landscape of guidelines/recommendations for digital data services
RDA/FAIRsharing
FORCE11 Software Citation
INCF Infrastructure Committee
COAR
Summary of workshop discussions
Recommendations to INCF
Executive summary
FAIR (Findable, Accessible, Interoperable, Reusable) data sharing requires that the necessary infrastructure, in the form of web-accessible repositories, is available to neuroscientists for publishing research objects such as data, code, and workflows. The objective of this workshop was to bring together representatives from international initiatives working to develop FAIR criteria for data repositories, portals, and scientific gateways, in order to inform the INCF Infrastructure Committee's work on a common set of recommendations for the neuroscience community.

Presentations on current initiatives were given by representatives from RDA, FAIRsharing, FORCE11 and COAR, followed by open discussion. The workshop identified a number of challenges but also an opportunity: there was interest in cross-disciplinary collaboration on a minimal set of requirements that could be accepted as a common standard for repositories to work towards. The challenges identified were the many differing recommendations from different bodies, how to help repositories move towards adoption, how to ensure sustainability, and how to fund the compute resources needed to share very large data sets. It was recommended that INCF identify ways to assist its community in using the criteria, and it was suggested that INCF could help repositories develop roadmaps towards adoption and implementation of the criteria.

Introduction
Digital data services such as repositories and scientific gateways play an important role in the archiving, management, analysis and sharing of research data. They provide stable, long-term storage, can improve data quality through active curation, can increase the discoverability and reusability of data through controlled terms and standardized metadata, and make it easier to request and transfer data, as well as removing or lowering barriers to data reuse and collaboration. Besides hosting data and providing computational power, repositories and scientific gateways also support research reproducibility and replicability: they can preserve data and computational research outcomes that might otherwise be lost or become unfindable over time, and they make it realistically possible to redo the same analyses or computational experiments. More broadly, openly available data storage and computational resources can become a driver for increasing diversity and equality in science, as they help counteract differences in access to hardware, tools and resources.

The proliferation of repositories and scientific gateways in recent years has produced a diverse and varied landscape with many possible choices, which often overwhelms potential users. In response, many community-led organizations have issued guidelines for community-specific repositories. Unfortunately, instead of providing the intended clarity (i.e. helping users select an appropriate digital data service), these guidelines and recommendations have added to the confusion, since they vary between the bodies issuing them and typically cover only repositories, not scientific gateways. There is therefore a need for coordination among the community-led organizations developing community-specific guidelines and recommendations for digital data services.

In an effort to provide the neuroscience community with guidelines and recommendations for digital data services, the Infrastructure Committee of the International Neuroinformatics Coordinating Facility (INCF) developed a set of selection criteria and associated recommendations from a FAIR neuroscience perspective, and attempted to harmonize them with existing work on repository selection criteria and best practices from FAIRsharing [Sansone et al 2019], FORCE11 [FORCE11 2020] and the Confederation of Open Access Repositories [COAR 2020]. The objective of this workshop was to present the INCF guidelines and recommendations to the neuroscience community at large and to key actors from other organizations that have developed similar guidelines, in order to ensure that the INCF guidelines are an inclusive set of recommendations that can apply to a wide and diverse range of digital services, both repositories and scientific gateways, for data as well as software.

Current Landscape of guidelines/recommendations for digital data services

FAIRsharing.org. The FAIRsharing initiative has delivered a registry of FAIR repository services and a set of recommendations related to the FAIRness of repositories. The registry guides users to discover, select and use resources, and helps service owners make their resources visible, more widely adopted and cited. It holds information on a wide range of community standards and can be used to search for repositories that implement particular standards. Policy makers can use FAIRsharing to list repositories that follow their recommended data policies and standards, and to monitor the evolution of the landscape. Other data management services, such as automated FAIR evaluation tools and data stewards, can use FAIRsharing as a lookup and selection service for standards and repositories.
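As a rough illustration of the "lookup and select" role described above, the sketch below shows how a data-stewardship tool might query a repository registry for repositories that implement a particular standard. The endpoint, parameters and response fields shown are illustrative assumptions only, not FAIRsharing's actual API; its real interface and authentication requirements are described in the FAIRsharing documentation.

# Illustrative sketch: querying a repository registry (such as FAIRsharing) for
# repositories that implement a given metadata standard. The endpoint URL and the
# response fields used below are placeholders for illustration, not the real API.
import requests

REGISTRY_URL = "https://example.org/api/records"  # placeholder, not a real endpoint


def find_repositories_using_standard(standard_name: str) -> list[dict]:
    """Return registry records for repositories that list `standard_name`."""
    response = requests.get(
        REGISTRY_URL,
        params={"type": "repository", "standard": standard_name},
        timeout=30,
    )
    response.raise_for_status()
    records = response.json()
    # Keep only the fields a repository-selection workflow would typically need.
    return [
        {
            "name": rec.get("name"),
            "homepage": rec.get("homepage"),
            "standards": rec.get("standards", []),
        }
        for rec in records
    ]


if __name__ == "__main__":
    for repo in find_repositories_using_standard("Neurodata Without Borders"):
        print(repo["name"], "-", repo["homepage"])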

The joint Research Data Alliance (RDA)-Future of Research Communications and e-Scholarship (FORCE11)-FAIRsharing registry working group. This joint working group is mapping repository features across initiatives and aligning data policies from funders and publishers, focusing on data availability statements and data deposit requirements. FAIRsharing will offer harmonized policy templates that will be refined iteratively with community feedback. The FAIRsharing service allows viewing of current policy implementations and comparison of different policies; it uses the TOP (Transparency and Openness Promotion, https://osf.io/9f6gx/) guidelines for data transparency as one of the comparison criteria.

At a recent RDA meeting, a session was held on mapping the repository landscape and identifying a common set of metadata descriptors. It gathered many initiatives that had defined guides, criteria or recommendations to help users and stakeholders understand which functionalities are offered, and to define good practices that assist repositories in evaluating and improving their current operations. The outcome was that different bodies were willing to work together on selection criteria, but there was a divide between recommending features (to be adopted) and best practices (guidelines for improvement).

It was observed that recommending repositories is a sensitive issue that requires flexibility, and that anyone making recommendations must ensure the outcome fits researchers' needs. The repository landscape is evolving, and its continued growth should be approached with understanding and collaboration.

FORCE11 (Future of Research Communications and e-Scholarship). The FORCE11 work on software citation began with the FORCE11 Software Citation working group in 2015. A second working group was started in 2017 to focus on implementation of software citation and on collaboration with publishers, conferences, repositories, indexers and funders. The working groups aim to improve credit for software developers and maintainers, to incentivize better and more sustainable software, and to gather support for reproducibility. Several task forces have been formed covering implementation, guidance, CodeMeta, software registries, and journals. The initial group published two preprints, on the differences between citing data and software (PeerJ) and on the challenges of software citation (arXiv), as well as a paper presenting software citation principles.

The working groups influenced a new version of DataCite's metadata schema (v4.1) that supports software citation, and have encouraged the use of the CodeMeta framework and the Citation File Format (CITATION.cff) for citation metadata stored alongside the software in its repository.
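As a rough illustration of the kind of machine-readable citation metadata these frameworks describe, the sketch below writes a minimal CodeMeta-style record from Python. The property names follow the CodeMeta/schema.org vocabulary, but the subset shown and all values (package name, DOI, repository URL, author) are hypothetical placeholders; the CodeMeta specification defines the full term list.

# Minimal sketch: writing CodeMeta-style citation metadata (codemeta.json) for a
# software package. The property subset and all values below are illustrative
# placeholders; consult the CodeMeta specification for the full vocabulary.
import json

codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-analysis-toolkit",  # hypothetical package name
    "version": "1.2.0",
    "identifier": "https://doi.org/10.5281/zenodo.0000000",  # placeholder DOI
    "codeRepository": "https://github.com/example/example-analysis-toolkit",
    "license": "https://spdx.org/licenses/MIT",
    "author": [{"@type": "Person", "givenName": "Jane", "familyName": "Doe"}],
}

with open("codemeta.json", "w") as fh:
    json.dump(codemeta, fh, indent=2)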

Software citation principles

1. Importance - software should be considered a legitimate and citable product of research
2. Credit and attribution - software citations should facilitate giving credit and attribution
3. Unique identification - software citations should include a method for identification that is machine actionable, globally unique and interoperable
4. Persistence - unique identifiers and metadata describing the software and its disposition should persist
5. Accessibility - software citations should facilitate access to the software itself and to its associated metadata, documentation, data, and other materials
6. Specificity - software citations should facilitate identification of, and access to, the specific version of software that was used

The Software Citation working group identified a number of technical challenges to software citation and concluded that software metadata is fundamental. There are also social challenges: different groups need to come together and establish norms that will actually work for their communities.

The FORCE11 Software Citation Implementation working group has developed a set of best
practices for research software registries and repositories [FORCE11 2020]. The working group
had representatives from 25 different registries and repositories.

Nine best practices for software registries and repositories

1. Provide a public scope statement
2. Provide guidance for users
3. Provide guidance to software contributors
4. Establish an authorship policy
5. Share your metadata schema - provide documentation for metadata
6. Stipulate conditions of use - including citation guidelines, licenses, copyright, disclaimers
7. State a privacy policy - what data is collected, how is it used, how long is it retained?
8. Provide a retention policy - beyond the end of your grant; what happens to metadata? When is information removed?
9. Disclose your end-of-life policy

A new consortium, SciCodes, has been formed to coordinate adoption and to update the best practices.

The group also worked with a set of publishers to create specific guidance for them to use with
their editors, authors, and reviewers, and many publishers are now sharing their guidance
through a common index [Katz 2021].

Confederation of Open Access Repositories (COAR). COAR is a global network of open repositories, representing among others libraries, universities, research institutions, and government funders. COAR engages directly with its members, but also with national and regional networks of open repositories, to build capacity, align policies and practices, and act as a global voice for the open repository community. COAR reported that its community is confused and overwhelmed by the plurality of existing requirements, recommendations and frameworks for repositories. Therefore COAR developed a community framework: a global, multidimensional framework for assessing good practices. An internal working group was set up to review and assess existing recommendations and requirements. A draft was published in October 2020 and is available in eight languages.

The COAR framework consists of essential and desired characteristics, connected to specific objectives:

● Discoverability
● Access
● Reuse
● Integrity and authenticity
● Quality assurance
● Privacy of sensitive data
● Preservation
● Sustainability and governance
● Other characteristics, such as user support, usage information, scope documentation and interoperability.

The framework will be updated yearly. It is intended to work as a common baseline that links
out to more comprehensive requirements relevant to different communities, content types,
funder policies et cetera.

COAR notes that the repository ecosystem is diverse and that aspects other than FAIR, such as jurisdiction, sustainability or governance model, may also be relevant to users. Good practices should therefore be used to improve existing repository operations, not to exclude or disqualify repositories from participating in the ecosystem. Setting the threshold too high would be harmful and would slow progress towards open science, especially in developing countries.

International Neuroinformatics Coordinating Facility (INCF). The INCF criteria for selecting neuroscience data repositories and scientific gateways are being developed by the INCF Infrastructure Committee to help neuroscience researchers and students choose good services for their specific use cases, and to help neuroinformatics providers make good decisions. The work consists of a criteria checklist and a set of high-level recommendations.

INCF Recommendations for repositories and scientific gateways

● Ensure findability and transparency in ownership & usage
● Clearly communicate access conditions
● Follow best practices for licensing & responsibility
● Involve community in governance & decision making
● Build in capabilities for reproducibility, replicability, reuse
● Be transparent on accessibility - financial & technical
● Be transparent on sustainability - financial & technical
● Excel in documentation & user support
● Consider ethics requirements for sensitive data and authorship transparency

Discussions
The discussion focused on the following seven questions:

1. Do you have any feedback to the INCF Infrastructure Committee on the developing criteria?

The draft criteria were positively received by the workshop's participants and were seen to address the major issues. It was recommended that INCF focus on advising its community members on how to adopt and start using the criteria, and that it identify ways to help them do so.

2. Do you have any advice to INCF or to those who work with neuroscience repositories?

The participants agreed that recommendations of best practices should be aspirational, a driver to improve existing repository operations rather than a way to exclude them. It should be recognized that bringing existing repositories to the level where they can follow the recommendations will require extra resources on their part. It was agreed that it is important to work closely with repositories from the start and to recognize that they may face resource challenges in meeting criteria or recommendations. It was recommended that repositories receive assistance in developing a roadmap that tracks (i) where they are now in terms of functionality and (ii) how they can move towards supporting the requirements.

It was noted that repository recommendation initiatives, in general, have much in common but differ in granularity and focus. It was also noted that the many conflicting recommendations that characterize the current landscape might be too complex for service users to handle; however, the participants also noted that the current target group mainly consists of people working in the repository field rather than general researchers, and that these people have the knowledge to handle the current complexity.

It was observed that there is a challenge in getting researchers to understand why they are being asked to do more, and to understand the value to the community of doing those things. Neuroscience has many communities, at many different levels of maturity, and it will be a challenge to reach all of them.
3. Is there a way to identify a minimum set of everything required? Is that the right way to think about it?

There was clear interest among the participants in coming together to collaborate on a
joint set of minimum requirements.

4. Can a minimum set of recommendations be standardized across disciplines?

It was agreed that building coherence could help many communities. It was
recommended that efforts should focus on interoperability, transparency, and making
data available. Information sharing was seen as a good first step before doing work
downstream.

It was suggested that it could be useful to assemble a set of community challenges and then have different repositories come together and discuss what they would do to support those challenges.

5. Is there a particular role for journals?

Journals were identified as the first place researchers turn to for advice on which services to use. For this reason, journals could help set and reinforce high-level minimum standards that could then be implemented by different communities according to their own needs.

6. Can you comment on sustainability?

Sustainability for digital services was seen as a complex issue. It was noted that many services have grown out of time-limited projects or have short funding cycles, and that there is a need to change how open science infrastructure is funded. Centrally funded infrastructure was considered key. Funding a few larger services was seen as less complex than funding many small ones. Institutions were seen to have an extremely important role, since they can repurpose some funding towards institutional repositories, which are critical for fields that lack community repositories. The Open Library of Humanities was recommended as an interesting model to look at for sustainability and institutional funding.

Software sustainability was observed to be harder than data sustainability because software, unlike data, will stop working at some point unless it is actively maintained. Simply putting software in a repository and expecting that people will be able to use it in five years is unrealistic. To incentivize this kind of work (active maintenance), credit mechanisms are needed. Citation should be for credit rather than for reproducibility.

Curation was seen as very important for sustainability, but it is often not funded and hard to get funding for. In addition, curators are hard to retain since they are not paid well. COAR is discussing how to set up shared curation models; the Data Curation Network might serve as inspiration.

7. In the context of neuroscience, we are looking at petabytes of data coming into existence over the next five years from individual research projects. Repositories at present are focused on storing and archiving data, not on computation; how can we bring these components together? How should institutions approach that?

It was noted that, increasingly, very large datasets require computational resources to be provided within the same service. Institutions may therefore need to contribute compute resources in order for researchers' data to be available and creditable. What should institutional policies be: should they let people outside the institution use computational resources? One solution could be for communities to develop domain-specific repositories on top of shared infrastructures, covering only the costs of curation rather than building and maintaining the infrastructure. Communities are moving towards this model, but funding is still an unsolved issue. National or regional infrastructures can help.

Recommendations to INCF
1. INCF should embark on a campaign to promote the adoption of its guidelines by the neuroscience community. Particular focus should be placed on advising the community on how to adopt and start using the guidelines, and on identifying ways to help the community use them. Since the neuroscience community is large, diverse, and at many different stages of data sharing maturity, it was recommended that INCF focus its efforts on a limited number of subcommunities instead of trying to reach them all. In addition, INCF should engage in targeted outreach to the publishing community, since the participants agreed that publishers are most likely the first place researchers will turn for advice on which services to use. For this reason, publishers could help set and reinforce high-level minimum standards that could then be implemented by different communities according to their own needs.

2. INCF should organize a follow-up workshop that is dedicated to defining a minimum set
of requirements/guidelines for digital data services that includes the current set of
participants as well as other interested organizations.

Next steps
As a first step, the INCF Infrastructure Committee plans to submit the INCF Recommendations for repositories and scientific gateways for publication, and to establish a web presence for the recommendations on the INCF portal that links to FAIR data sharing training modules on the INCF TrainingSpace.

References
COAR (Confederation of Open Access Repositories). Community Framework for Good Practices in Repositories, version 1, October 8, 2020. https://www.coar-repositories.org/coar-community-framework-for-good-practices-in-repositories/

FORCE11 Task Force on Best Practices for Software Registries. Nine Best Practices for Research Software Registries and Repositories: A Concise Guide, December 23, 2020. https://arxiv.org/pdf/2012.13117.pdf

Daniel S. Katz, Neil P. Chue Hong, Tim Clark, August Muench, Shelley Stall, Daina Bouquin, Matthew Cannon, Scott Edmunds, Telli Faez, Patricia Feeney, Martin Fenner, Michael Friedman, Gerry Grenier, Melissa Harrison, Joerg Heber, Adam Leary, Catriona MacCallum, Hollydawn Murray, Erika Pastrana, Katherine Perry, Douglas Schuster, Martina Stockhause, Jake Yeston. Recognizing the value of software: a software citation guide (version 2). F1000Research 2021, 9:1257. https://doi.org/10.12688/f1000research.26932.2

Susanna-Assunta Sansone, Peter McQuilton, Helena Cousijn, Matthew Cannon, Wei Mun Chan, Ilaria Carnevale, Imogen Cranston, Scott Edmunds, Nicholas Everitt, Emma Ganley, Chris Graf, Iain Hrynaszkiewicz, Varsha K. Khodiyar, Thomas Lemberger, Catriona J. MacCallum, Kiera McNeice, Hollydawn Murray, Philippe Rocca-Serra, Kathryn Sharples, Marina Soares E Silva, Jonathan Threlfall. Data Repository Selection: Criteria That Matter, October 28, 2019. https://osf.io/m2bce/

