
Dutch Open Science Dashboard 2020-2021

Peter Doorn
@DANSKNAW @PKDoorn
Draft 1, 11 August 2021

dans.knaw.nl
DANS is an institute of KNAW and NWO
Fig. 1. Sources & Definitions: Digital Research Infrastructure Components according to NWO,
Re3Data, MERIL, ESFRI and LCRDM

Number of components per source (chart legend: Data-oriented, Other facilities, Not specified):
• LCRDM Facilities (2021): 35
• LCRDM Facilities (2017): 50
• ESFRI Research Infrastructures (NWO 2021): 39
• ESFRI Research Infrastructures (2018): 32
• MERIL Research infrastructures (2019): 57
• Re3Data Repositories (2021): 62
• Re3Data Repositories (2020): 56
• NWO Large-Scale Scientific Infrastructures (2021): 25 and 99 (two bar segments)
• NWO Large-Scale Scientific Infrastructures (2018): 38 and 75 (two bar segments)



1. Sources & Definitions: Digital Research Infrastructure components according to
NWO, Re3Data, MERIL, ESFRI and LCRDM

Different registers and overviews define digital research infrastructure differently.

• In 2021, NWO updated its 2018 list of “large-scale scientific infrastructures”, also called
“facilities”. Many of them qualify as equipment rather than as data facilities.
• Re3Data is an international registry of “repositories”, which include information systems
and databanks of various kinds.
• MERIL and ESFRI provide overviews of research infrastructures:
  • MERIL has not been maintained since 2019 and is being succeeded by CATRIS, which is,
  however, incomplete.
  • ESFRI maintains a Roadmap that distinguishes R.I. “projects” and “landmarks”.
• The LCRDM overview of “RDM facilities” was updated last year. The 2017 list was rather
heterogeneous, including repositories, training courses, RDM services, etc. From the
2020/21 update we selected “repositories” and “infrastructures”.

Conclusion: the figures presented depend on what is measured and how. Keep in mind that
registrations are seldom complete and quickly become outdated. Still, we can safely assume
that the Dutch data infrastructure consists of more than 100 components of some substance
and recognition.
Fig. 2. Research facilities/data repositories/infrastructures by discipline according to
NWO, Re3Data and MERIL
[Stacked bar chart, 0-100%: share of facilities per discipline for NWO (2018), NWO (2021),
Re3data (2021), Re3data (2021) and MERIL (2019). Legend:
6 - Humanities;
5 - Social Sciences;
3 - Medical & Health Sciences (incl. bio/life sciences, ecology);
2 + 4 - Engineering & Technology + Agriculture (incl. computer science);
1 - Natural Sciences (incl. astronomy, earth science).]
2. Research facilities/data repositories/infrastructures by
discipline according to NWO, Re3Data, MERIL

Three sources make it possible to compare facilities by discipline. The distribution over
disciplines depends on the definitions used.
• The 2018 NWO list assigns multiple disciplines to a facility; for 2021 we classified
facilities by their primary discipline. Whatever the classification, the humanities and
social sciences (HSS) are strongly underrepresented in both 2018 and 2021 (12%).
• According to Re3Data, the repositories are more evenly spread across scientific fields.
• In MERIL’s overview of general RIs, the share of the HSS (22%) lies between the figures
of NWO and Re3Data.
Fig. 3. Certified Repositories 2020-21
[Bar chart: number of repositories per certificate in 2020 and 2021. Categories:
CoreTrustSeal, Data Seal of Approval, CLARIN certificate B, DIN 31644, RatSWD,
Repositories without certificate.]
3. Certified data repositories for Long Term Preservation

In 2020, 16 out of 56 data repositories in the Netherlands held at least one certificate
for trustworthy long-term preservation; in 2021, the numbers increased to 18 out of 62.
The total number of certificates grew from 20 to 22 (4 repositories have more than one
certificate). The Data Seal of Approval (DSA) is gradually being superseded by the
CoreTrustSeal (CTS). In 2021, 17 repositories comply with DSA or CTS (2020: 15). These
seals ensure that data are preserved and shared in a FAIR way.
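
The shares behind these counts can be recomputed with a short sketch. The counts are the ones quoted in the text above; the variable and function names are illustrative and not part of the dashboard.

```python
# Counts quoted in the text (data repositories in the Netherlands).
certified = {2020: 16, 2021: 18}      # repositories with at least one certificate
total = {2020: 56, 2021: 62}          # all registered data repositories
certificates = {2020: 20, 2021: 22}   # total number of certificates held


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal."""
    return round(100 * part / whole, 1)


for year in (2020, 2021):
    uncertified = total[year] - certified[year]
    extra = certificates[year] - certified[year]  # certificates beyond the first
    print(
        f"{year}: {share(certified[year], total[year])}% certified, "
        f"{uncertified} without certificate, {extra} additional certificates"
    )
```

In relative terms the certified share is almost flat (about 29% in both years), because the number of registered repositories grew along with the number of certified ones.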
Fig. 4. PID systems used in repositories, 2020-21

[Bar chart: number of repositories per PID system in 2020 and 2021. Categories: DOI,
Handle, URN, Other (PURL, ARK), No PID.]
4. Use of Persistent Identifiers (PID) in data repositories

The use of persistent identifiers is an important element of FAIR data. In 2020, 54% of
the Dutch data repositories supplied PIDs, mostly Handle (23%) or DOI (21%), sometimes a
URN (8%).
Over the past year, the numbers improved: 63% now supply a PID. The share of DOI in
particular grew (to 29%), while Handle remained stable.
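
As a rough cross-check, the percentages can be reproduced from repository counts. The totals (56 and 62) come from section 3; the per-category counts below are an assumed reading of the bars in Fig. 4 and are not stated in the text, so treat them as illustrative.

```python
# Repository totals from section 3.
TOTAL = {2020: 56, 2021: 62}

# ASSUMPTION: counts read from the Fig. 4 bar chart, not given in the text.
COUNTS = {
    "DOI": {2020: 12, 2021: 18},
    "Handle": {2020: 13, 2021: 13},
    "No PID": {2020: 26, 2021: 23},
}


def pct(count: int, total: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * count / total)


def with_pid_share(year: int) -> int:
    """Share of repositories supplying any PID, derived from the 'No PID' count."""
    return pct(TOTAL[year] - COUNTS["No PID"][year], TOTAL[year])


for year in (2020, 2021):
    doi = pct(COUNTS["DOI"][year], TOTAL[year])
    handle = pct(COUNTS["Handle"][year], TOTAL[year])
    print(f"{year}: any PID {with_pid_share(year)}%, DOI {doi}%, Handle {handle}%")
```

Under this reading, Handle is stable in absolute terms (13 repositories in both years), while its share of a growing total drops slightly.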
Fig. 5. Numbers of datasets (logarithm) in data repositories

[Bar chart, logarithmic scale: number of datasets per repository on 07/09/2020 and
10/08/2021.]
5. Numbers of datasets in data repositories

Although there is no complete overview of the research data in all Dutch repositories, let
alone of data that are not stored in a repository, NARCIS aggregates information on
research data sets from 23 sources. With some minor manual modifications and additions, the
total number of data sets in 25 recorded repositories amounted to 425,435 on 7 September
2020 and 465,055 on 10 August 2021.

Moreover, the types of data units (files, sets, collections) vary across (and within)
repositories. One data set may consist of just one file, or of thousands of files. One file
may be a small table or consist of millions of records. Such differences partly depend on
the discipline or community. For example, TLA-MPI registers individual language files or
bundles that are parts of projects, archives or collections.
More research into what data are stored under controlled conditions in repositories, and
what data are not, is urgently required.
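
The growth between the two measurement dates can be worked out with a small sketch. It assumes the dates are in day/month order (7 September 2020 and 10 August 2021); the variable names are illustrative.

```python
from datetime import date

# Totals reported for the 25 recorded repositories.
# ASSUMPTION: the dates 7/9/2020 and 10/8/2021 are in day/month/year order.
start_day, start_count = date(2020, 9, 7), 425_435
end_day, end_count = date(2021, 8, 10), 465_055

added = end_count - start_count          # data sets added between the two dates
days = (end_day - start_day).days        # length of the interval in days
growth_pct = 100 * added / start_count   # relative growth over the interval
per_day = added / days                   # average number of data sets added per day

print(f"{added} data sets added in {days} days")
print(f"growth: {growth_pct:.1f}%, about {per_day:.0f} data sets per day")
```

Because the units counted differ so much between repositories, such totals mainly indicate the direction and rough size of the growth.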
6. Access licenses and access restrictions applicable in data repositories

Counts per access type (pie chart):
• Open: 45
• Registration: 29
• Other restrictions: 23
• Closed: 13
• Embargoed: 11
• Access fee: 5
• Membership: 1
7. Metadata standards used by data repositories

Counts per metadata standard (pie chart):
• Dublin Core: 16
• Other: 10
• None provided: 9
• DDI - Data Documentation Initiative: 7
• DataCite Metadata Schema: 6
• ISO 19115: 5
• RDF Data Cube Vocabulary: 3
8. Open Access in 49 Academic and Higher Education publication repositories, 2016-2020
(total number of publications in brackets)

[Stacked bar chart, 0-100%: share of publications per access category (Open, Restricted,
Closed, Embargoed) per year: 2016 (94786), 2017 (101622), 2018 (109840), 2019 (106267),
2020 (25939 on June 5).]
Sources

Sources per section: 1: a, b, c, d, e; 2: a, b, c; 3, 4, 6, 7: b; 5: f; 8: g
a. NWO Large-Scale Scientific Infrastructures (2018): http://www.onderzoeksfaciliteiten.nl
b. Re3Data Repositories (2020): https://www.re3data.org/search?query=&countries[]=NLD
c. MERIL Research infrastructures (2019): https://portal.meril.eu/meril/
d. ESFRI Research Infrastructures (2018): http://roadmap2018.esfri.eu/media/1049/roadmap18-part3.pdf
e. LCRDM Facilities (2017): https://www.lcrdm.nl/en/rdm-in-the-netherlands
f. DANS NARCIS (5 June 2020): https://www.narcis.nl/search/coll/dataset/Language/en
(corrected; some data holders added manually).
g. DANS NARCIS (5 June 2020): https://www.narcis.nl/search/coll/publication/Language/en

For a more extensive report see: Landscaping Country Report The Netherlands (final draft).
EOSC-Synergy 14-6-2020. https://doi.org/10.17026/dans-2by-ereu.
