Ares(2016)1344931 - 17/03/2016
Introduction
As a systematist, you are likely to examine, sequence or analyze a number of specimens
in the course of your studies. Those specimens can be on loan from a natural history
collection, or they can be in a collection you build yourself. In either case, you will have
to manage information about the specimens and link that information to the results of
your studies, which may be presented in papers or in various online databases. What is
the best way of approaching these information management tasks?
Most systematists today keep some kind of personal collection database. It is often built
from scratch using commercial database software like FileMaker Pro or Microsoft
Access. There are also dedicated software packages like Specify 1 and Biota 2, which allow
you to manage collections data, and VoSeq 3, which is focused on handling DNA sequence
and voucher information.
Unfortunately, keeping a personal collection database has the effect of building
information silos that are not easily connected. For instance, if you extract and sequence
DNA from some small part of a specimen from a natural history museum, you are likely
to create your own identifier for the voucher specimen, and this is the number that is
likely to end up with the sequence submission record. However, once you have returned
the specimen to the natural history museum where it belongs, any additional information
that becomes available about the specimen in the future is unlikely to be linked to
your identifier. Most natural history museums simply do not have systems in place for
routinely assigning globally unique identifiers (GUIDs) to specimens, which would make
it possible to find information about them online. Many
entomology collections do not even keep specimen-level databases today, except
possibly for type specimens.
1 http://specifysoftware.org/
2 http://viceroy.eeb.uconn.edu/Biota/
3 http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0039071
4 http://hbs.bishopmuseum.org/codens/codens-inst.html
maintaining GRBio5, the global registry of biodiversity repositories. Unfortunately, the
list is partly outdated and contains a number of obvious problems. It does not appear
that the list is very actively maintained.
Previously, there was a movement to assign life science identifiers (LSIDs) as GUIDs
for biodiversity information records, such as specimen records. However, many experts
are now abandoning LSIDs in favor of plain old uniform resource identifiers (URIs), that
is, internet addresses. Some institutions can already provide permanent
URIs for their collection objects, often including the institutional address, a globally
recognized acronym for the collection, and a unique catalog number for the object.
However, these are still early days in the adoption of such URI schemes, and many
institutions are still working on this, or have not even started.
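As a rough illustration of the URI pattern described above (institutional address, collection acronym, catalog number), the following Python sketch composes such an identifier. The base address, the acronym and the catalog number are invented placeholders, and the function name specimen_uri is hypothetical. Each institution decides its own scheme, so treat this as an assumption about the general shape only.

```python
from urllib.parse import quote


def specimen_uri(base: str, collection_code: str, catalog_number: str) -> str:
    """Join the three parts into one URI, percent-encoding the variable parts.

    Hypothetical helper: real institutions publish their own URI schemes.
    """
    parts = [
        quote(collection_code.strip(), safe=""),
        quote(catalog_number.strip(), safe=""),
    ]
    return base.rstrip("/") + "/" + "/".join(parts)


# Placeholder values, not a real institutional address or catalog number.
print(specimen_uri("http://collections.example.org/object", "ENT", "A 12345"))
```

Percent-encoding the acronym and catalog number keeps spaces and slashes in catalog numbers from producing an address that resolves to something else.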
In conclusion, there is currently no way for you to assemble information about
specimens in natural history collections and be sure you can link this information online
through a GUID to other data associated with that specimen. The best you can do is to
ask each natural history museum from which you borrow specimens for a GUID,
preferably a URI, for each specimen they send you on loan or allow you to study. Then
use that GUID in your own database, and cite it in any online references you make to that
specimen in sequence databases or other online repositories to which you upload
results from your studies.
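One way to follow this advice in a personal database is to store the museum's GUID next to your own voucher identifier, so that every sequence record can be traced back to the institutional identifier. The sketch below uses Python's built-in sqlite3 module. The table and column names, the example URI and the accession value are all assumptions made for illustration, not part of any existing system.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with specimen and sequence tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE specimen (
               local_id         TEXT PRIMARY KEY,  -- your own voucher identifier
               institution      TEXT NOT NULL,     -- museum that owns the specimen
               institution_guid TEXT UNIQUE        -- GUID/URI from the museum, NULL if none yet
           )"""
    )
    conn.execute(
        """CREATE TABLE sequence (
               accession TEXT PRIMARY KEY,         -- identifier in the sequence database
               local_id  TEXT NOT NULL REFERENCES specimen(local_id)
           )"""
    )
    return conn


def guid_for_accession(conn: sqlite3.Connection, accession: str):
    """Return the museum GUID for a sequence accession, or None if unknown."""
    row = conn.execute(
        """SELECT s.institution_guid
             FROM sequence AS q
             JOIN specimen AS s ON s.local_id = q.local_id
            WHERE q.accession = ?""",
        (accession,),
    ).fetchone()
    return row[0] if row else None


# Placeholder data: none of these identifiers refer to real records.
conn = make_db()
conn.execute(
    "INSERT INTO specimen (local_id, institution, institution_guid) VALUES (?, ?, ?)",
    ("FR-0001", "Example Museum", "http://collections.example.org/object/ENT/A%2012345"),
)
conn.execute(
    "INSERT INTO sequence (accession, local_id) VALUES (?, ?)",
    ("XX000001", "FR-0001"),
)
print(guid_for_accession(conn, "XX000001"))
```

Keeping the GUID in its own column, rather than reusing it as the local identifier, lets you record specimens before the museum has issued an identifier and fill it in later.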
5
In recent years, this situation has started to change as natural history museums and
similar institutions have become increasingly aware of the value of digital assets and the
need for professional information management. Recent trends like the push for open
science, shared public data, the semantic web, and linked open data have accelerated
this development. This has led to a movement towards a coherent information
management strategy and a central institutional collection management system in most
places.
In the choice of a central system, an organization can opt to: (1) acquire a commercial
system (EMu being the major system used currently by large natural history museums);
(2) develop a system in-house; or (3) join other institutions in distributed open-source
development. There are many reasons suggesting that the third choice is going to be the
most flexible and cost-efficient solution in the long term (see separate PowerPoint
presentation).
The DINA 6 consortium is currently the largest initiative for producing a web-based
collection management system through distributed open-source development. The
consortium currently includes six organizations in six different countries, four of which
contribute actively to the development. Two of the BIG 4 institutions are among the core
members of the DINA consortium: the Swedish Museum of Natural History and the
Natural History Museum of Denmark.
DINA is based on the Specify data model. A hybrid DINA-Specify system, relying on the
Specify 6 Java client for core collection management tasks, is available from the DINA
team at the Swedish Museum of Natural History (contact Markus Skyttner,
markus.skyttnar@nrm.se). The hybrid system has been in production at the Swedish
Museum of Natural History since 2011, when the first components were installed.
A fully web-based DINA version is not expected to be available until 2018 according to
the current DINA roadmap. Functionality specifically tailored for researchers is not
currently on the roadmap, but the consortium already provides API specifications
that you can use to develop your own research client for the DINA-web database
backend. If you are interested in exploring this, contact Markus Skyttner (e-mail above)
for more information on how to install and run a DINA backend that your front-end
client can communicate with. You can run the entire DINA system on your laptop, and,
if you like, you can share your front-end client with all other institutional and
individual DINA users through the DINA consortium and its GitHub repository.
If you like the DINA approach and your institution is not a member of the DINA
consortium, you can ask the decision makers at your institution to consider the
possibility of joining the DINA initiative. An easy way of preparing yourself for a future
transition to the DINA system is to use Specify or a Specify-compatible data model for
your own private collection database.
Separately, you will find a PowerPoint presentation that gives you an introduction to the
DINA project, with pointers to web sites where you can find more information.
6 http://dina-project.net
Introduction to the DINA project
Fredrik Ronquist
Dept. Bioinformatics and Genetics
Swedish Museum of Natural History
Collection Management Systems
Institutional Choices:
1. Develop your own system in-house
2. Acquire a commercial system (e.g., EMu)
3. Partner with other institutions in distributed open-source
development (e.g., DINA project)
The Case For Open Source
Market considerations. Professional collection management systems are not
viable commercial products in a pluralistic market.
Long-term stability. An open-source software solution developed by
institutions with long-term focus will be more stable than a commercial
solution.
Flexibility. A distributed open-source system must by necessity conform to a
modular design based on open APIs. This favors flexibility and adaptability
in a way that a commercial product will not.
Cost effectiveness. Although some overhead is associated with distributed
development, more development teams involved in the effort will result in a
lower cost to the individual institution compared to in-house or commercial
solutions.
The Case For Open Source (cont’d)
Opt-in opt-out scheme. Institutions can participate in the development
when they have resources to do so, and can opt out when they do not. At any
single point in time, it should be feasible to have enough institutions involved
for development to move forward at an acceptable pace.
Community Control. A distributed open-source solution means that the
community retains control over both the information standards and the
system architecture and web service/API designs.
Egalitarian. A professional open-source collection management system
offers a better way for developing countries to catch up than any commercial
product.
Stable marketplace for extensions and services. A community-supported
de-facto standard for collection management systems architecture will ensure
that there is a stable market for various plugins, extensions and services based
on the system.
EMu: The major commercial collection management system used in natural
history museums
Axiell group, owned by Swedish venture capitalists, recently acquired the company
behind EMu. Lack of competition = profit.
The Natural History Museum in the UK is one of several major natural history museums
currently running EMu. They have given Axiell 12 months to solve a number of
serious issues with the system; in parallel, open-source options are being reviewed.
Koha – Origin New Zealand, now 15 % of market share for Library Mgmt Systems
Atlas of Living Australia – Origin Australia (284 M SEK initial investment),
the world’s most complete system for integration, analysis and visualization
of biodiversity data, now 70+ developers around the world, running or being installed
in many countries in Europe and South America in addition to Australia
DINA Consortium
(Digital Information system for NAtural history data)
Core mission. Pool resources to develop an open-source web-based
collection management system for natural history collections.
Core Member. Required contribution 1.0 FTE to the project, of which at
least 0.5 to the development effort. Voting member of the DINA Technical
Committee (TC), which controls deliverables and deadlines for the 1.0 FTE
contribution.
Associate Member. No contribution requirements. Non-voting member of
the Steering Group.
[Organization chart: Steering Group (all members); Technical Committee (core members); Task Force I and Task Force II]
DINA Web System Overview
[Diagram. Clients: Specify 6 "thick client" (Java client), Collection Manager, Collection web portal, Biodiversity survey client, DNA barcode portal, Species pages. Data stores: Collection-related databases, Media metadata, Media files, BLAST DB, Taxon info.]
Current Specify 6 client (Java stand-alone). Old technology, old-style interface.