BioScience 2015, Ellwood et al.
A goal of the biodiversity research community is to digitize the majority of the one billion specimens in US collections by 2020. Meeting
this ambitious goal requires increased collaboration and technological innovation and broader engagement beyond the walls of universities
and museums. Engaging the public in digitization promises to both serve the digitizing institutions and further the public understanding
of biodiversity science. We discuss three broad areas accessible to public participants that will accelerate research progress: label and ledger
transcription, georeferencing from locality descriptions, and specimen annotation from images. We illustrate each activity, compare useful
tools, present best practices and standards, and identify gaps in our knowledge and areas for improvement. The field of public participation in
digitization of biodiversity research specimens is in a growth phase with many emerging opportunities for scientists, educators, and the public,
as well as broader communication with complementary projects in other areas (e.g., the digital humanities).
Keywords: crowdsourcing, citizen science, digital humanities, digitization of biodiversity research collections, public participation in scientific research.
BioScience XX: 1–14. © The Author(s) 2015. Published by Oxford University Press on behalf of the American Institute of Biological Sciences. All rights
reserved. For Permissions, please e-mail: journals.permissions@oup.com.
doi:10.1093/biosci/biv005 Advance Access publication XX XXXX, XXXX
and OpenStreetMap, http://openstreetmap.org) has become increasingly important (Bonney et al. 2014, Kelty et al. 2014). Public participation is also known as citizen science (when scientists collaborate with the public) or crowdsourced science (in which contributions are made by a large, usually online and occasionally paid community of individuals; Wiggins and Crowston 2011). In the sciences, the need for a formalization of practice related to public participation and the establishment of supporting infrastructure has been met by several recent organizational developments. The Human Computation and Crowdsourcing meetings began as annual workshops sponsored by the Association for the Advancement of Artificial Intelligence in 2009 and became an annual conference in 2013. The biennial Citizen Cyberscience Summit in the United Kingdom began in 2010

public participation, we focus here on those activities that can be deployed online, where the number of potential participants is greater because it is less limited by monetary and physical constraints such as those related to onsite supervising personnel, workspace, and parking. Improvements and advancements made to online digitization tools for public participation might also lead to their widespread use onsite by paid staff.

iDigBio's Public Participation in Digitization of Biodiversity Specimens Workshop participants recognized 26 digitization activities in which the public could participate, some of which fit neatly into the last two task (i.e., activity) clusters of Nelson and colleagues (2012) described above and others that occur after the initial digitization of the specimen data and subsequent deployment of it online.
Table 1. Digitization activities identified by the participants of iDigBio's Public Participation in Digitization of Biodiversity Specimens Workshop, organized by the twelve crowdsourcing processes recognized by Dunn and Hedges (2013) for the humanities.

Transcribing • Into appropriate database fields.
Cataloging • Overlaps broadly with other processes (e.g., transcribing and georeferencing); identified by the production of structured, descriptive metadata.
Translating • Between a nonnative language and the native language (e.g., between Chinese and English in the United States).
Georeferencing • Assign latitude and longitude and measures of precision to collection localities not previously described in that way.
Recording and creating content • Provide location and other information on historical place names used in collection locality descriptions.
Mapping • Production of maps useful for identifying outliers that might be due to errors or something that is biologically interesting. • Production of maps useful for citizen science research.
(http://crowdflower.com). These benefits can be gained in formal classroom settings or in informal settings. The design and supplementary materials for online digitization activities in a classroom setting can emphasize foundational areas in the Next Generation Science Standards (National Research Council 2012), including scientific and engineering practices, crosscutting concepts, and disciplinary core ideas. ZooTeach (http://zooteach.org) is a repository for K–16 educational materials that use Zooniverse's citizen science tools (Masters 2013). Participants in informal and online learning experiences are diverse and include all ages, cultural and socioeconomic backgrounds, abilities, knowledge, and educational backgrounds. Their experiences are characterized as being self-motivated, guided by their own interests, voluntary, personal, embedded in a context, and open-ended (Falk and Dierking 2000, Falk et al. 2001, National Research Council 2009). These experiences provide crucial lifelong learning opportunities to increase science awareness, appreciation, interest, and understanding, with different types of digitization programs and activities being able to achieve a variety of learning outcomes.

Despite successful scientific advancements (e.g., Lintott et al. 2008), critics of these approaches cite data quality as a primary concern over the use of citizen science data (Penrose and Call 1995, Nerbonne and Vondracek 2003). In addition, citizen science is not well suited to all facets of scientific applications and workflows (Dickinson et al. 2010, Kremen et al. 2011). Description of data quality has been formalized in the areas of transcription (Hill et al. 2012) and georeferencing (e.g., the National Standard for Spatial Data Accuracy; http://fgdc.gov/standards/projects/
Table 2. Online tools for public participation in transcription of biodiversity specimen labels and field notebooks. Characteristics of each are described as applicable according to the given category. Values are valid as of February 2015, unless otherwise noted.

Transcription tool: Atlas of Living Australia's DigiVol
Taxonomic, geographic, and object type focus: Life; global, but especially Australia; specimens and field notebooks.
Training process: Onsite tutorials and forum.
Incentives: Recognition of every individual's contributions to each expedition, as well as those making the greatest contribution.
Contributors: 860
Transcriptions: 130,816
Interface: Zoom and pan in window or in separate window; all fields seen at once.
Validation: Each task has one transcription and one validation (proofread by an experienced transcriber).
is XML-TEI markup (http://tei-c.org/index.xml), which is important in the context of transcribing ledgers.

Gaps in our knowledge and areas for improvement. Despite recent recommendations from the Notes from Nature project (Hill et al. 2012) and limited research into motivations of citizen scientists (Rotman et al. 2014), we still lack a satisfactory understanding of several aspects of public participation in transcribing biodiversity specimen labels and ledgers. These include the most significant factors affecting efficiency, accuracy, initial motivation, and long-term engagement; the best algorithms to produce consensus transcription from multiple replicates; and the most effective data validation methods. Each of these also has clear relevance to the georeferencing and annotating activities. Improvements to transcription tools could enhance participant enjoyment and ease of use. For example, new functionality could give the contributor more control of their transcription experience, such as providing them with the ability to establish the criteria used to determine the specimens that they transcribe (e.g., on the basis of the collection supplying the specimen images or the occurrence of a word in the OCR text strings generated from images) or the ability to toggle between interfaces that show a single field at a time and multiple fields at a time. Furthermore, records could be sorted for transcription based on similarity (e.g., overall
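One simple family of algorithms for producing a consensus transcription from multiple replicates is a per-field majority vote after light normalization. The sketch below is illustrative only; the function and field names are our own and do not come from any of the transcription tools discussed.

```python
from collections import Counter

def consensus_transcription(replicates):
    """Derive a consensus record from replicate volunteer transcriptions.

    `replicates` is a list of dicts mapping field names to transcribed
    values; the consensus value for each field is the most frequent
    entry after trivial normalization (whitespace and case).
    """
    fields = {f for r in replicates for f in r}
    consensus = {}
    for field in sorted(fields):
        values = [r[field].strip() for r in replicates if r.get(field)]
        if not values:
            continue
        counts = Counter(v.lower() for v in values)
        best, _ = counts.most_common(1)[0]
        # Report the first original-case spelling of the winning value.
        consensus[field] = next(v for v in values if v.lower() == best)
    return consensus

replicates = [
    {"collector": "C. Hartman", "state": "Georgia"},
    {"collector": "C. Hartman", "state": "georgia"},
    {"collector": "G. Hartman", "state": "Georgia"},
]
print(consensus_transcription(replicates))
# {'collector': 'C. Hartman', 'state': 'Georgia'}
```

Real projects would add tie-breaking rules, per-field similarity metrics for free text, and escalation of low-agreement records to expert validators.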
• The image display should produce a clear view of all relevant text at an appropriate zoom level at once or via panning.
• Data entry fields should be accessible whilst viewing the image.
• Drop-down lists should be provided when the universe of acceptable responses can be populated from controlled vocabularies and is relatively small (e.g., the 50 US states); autocomplete functionality in free text fields should be provided when the number of acceptable responses is larger and cannot be fully populated from the beginning of the project (e.g., collector names).
• Dependencies in the acceptable values for fields should be built in (e.g., only those counties from the state of Georgia are available in a dropdown once the state is established as Georgia).
• Readily accessible examples and directions for each field should be available during the activity.
• Forums should be provided to enable volunteers to ask questions of the project manager and each other about specific specimens or ledgers or the general process of transcription.
• A task completion count should provide the public participant with both progress towards the project's digitization goal and the
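The controlled-vocabulary and field-dependency recommendations above can be sketched as follows. The state and county lists are illustrative placeholders, not an actual authority file; real interfaces would load these vocabularies from authoritative sources.

```python
# Hypothetical controlled vocabularies for a state -> county dependency.
STATES = {"Georgia", "Florida", "Alabama"}
COUNTIES = {
    "Georgia": {"Clarke", "Fulton", "Liberty"},
    "Florida": {"Alachua", "Leon", "Liberty"},
}

def county_choices(state):
    """Return the dropdown options valid once a state has been chosen."""
    if state not in STATES:
        raise ValueError(f"unrecognized state: {state!r}")
    return sorted(COUNTIES.get(state, set()))

print(county_choices("Georgia"))  # ['Clarke', 'Fulton', 'Liberty']
```

Constraining the county list only after the state is established both prevents invalid combinations and shortens the list a volunteer must scan.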
similarity of OCR text strings). Improvements could also address data quality issues by providing the ability for participants to return to earlier transcription records to correct what they later learn are transcription errors. The biodiversity research collections community would also benefit from greater sharing of best practices and tools with the digital humanities community, in which projects, such as the University College London's Transcribing Bentham Project (http://blogs.ucl.ac.uk/transcribe-bentham), the University of Iowa's Civil War Diaries and Letters Transcription Project (http://digital.lib.uiowa.edu/cwd), and the Medici Archives Project (http://medici.org), and standalone tools such as Ben Brumfield's FromThePage (http://beta.fromthepage.com) for transcription and Juxta (http://juxtasoftware.org) for the comparison of multiple transcriptions of a single text, represent significant overlap in objectives between the two communities.

Online activity 2: Georeferencing
Georeferencing, as applied to biodiversity research collections, is the inference of a geospatial geometry from the textual collection locality description on a label or in a ledger (figure 2; Guralnick et al. 2006).

Overview. The geospatial geometry is often expressed as a single point representing latitude and longitude, usually with an associated radius allowing representation of uncertainty (Wieczorek et al. 2004). However, localities could also be represented as multipoints, lines, multilines, polygons, and multipolygons to better reflect either the collection method or imprecision associated with the interpretation of a textual collection locality description. For example, sampling transects may be recorded as a line with start and stop coordinates, as is common in samples from trawlers. The expression of uncertainty is crucial to determining a data record's fitness for use (Wieczorek et al. 2004). For example, point data with an uncertainty of 10 km may be unsuitable for an analysis across 1-km-resolution environmental gradients. Georeferences as latitude and longitude coordinates and the datum on which the coordinates are based are typically lacking from terrestrial and inland aquatic specimens collected before the 1990s (Beaman and Conn 2003; marine specimens might differ). Where those are available, they can provide useful validation for textual descriptions or vice versa, because such latitude and longitude readings also have associated, and often unreported, uncertainties.

Public participants can be expected to be most efficient and accurate at georeferencing when they can read the language in which the label was written, can read relevant map types (e.g., topographic or nautical), and have some familiarity with the area in which the specimen was collected (i.e., experience on the ground or with locally used names). Useful emphases in training for the task can be placed on basic geographical skills such as identifying the locality information and interpreting locality types, interpreting geographic jargon, compass bearings, abbreviations, and formats, and understanding the common types of geographic projections (e.g., equal area), coordinate systems (e.g., Universal Transverse Mercator) and geodetic systems (e.g., World Geodetic System 1984). Training will also improve a participant's ability to interpret locality descriptions and uncertainties. For these skills, training emphases can be placed on finding and using relevant maps and indices of place names, and precisely describing the georeferencing
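The fitness-for-use reasoning above (a 10-km uncertainty is unsuitable against a 1-km environmental grid) can be made concrete with a crude screen. The `max_cells` threshold below is an assumption for illustration, not a community standard.

```python
def fit_for_use(uncertainty_m, grid_resolution_m, max_cells=1.0):
    """Crude fitness-for-use screen for a point-radius georeference.

    A record is flagged unsuitable when its uncertainty radius spans
    more than `max_cells` cells of the analysis grid.
    """
    return uncertainty_m <= max_cells * grid_resolution_m

# A 10-km uncertainty fails against a 1-km environmental grid...
print(fit_for_use(10_000, 1_000))   # False
# ...but passes against a 50-km climate surface.
print(fit_for_use(10_000, 50_000))  # True
```

The point is that the same record can be fit or unfit depending on the analysis, which is why uncertainty must travel with the coordinates.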
geographic origin of the specimen (e.g., Africa), rather than the collection that curates the specimen.

Best practices and standards. Best practice documents specific to georeferencing specimens include Guide to Best Practices for Georeferencing (Chapman et al. 2006), Principles and Methods of Data Cleaning—Primary Species and Species-Occurrence Data (Chapman 2005), and Guide to Best Practices for Generalising Sensitive Species Occurrence Data (Chapman and Grafton 2008). However, the geospatial community has produced many other best practice documents, including those related to standards (e.g., as at the Open Geospatial Consortium; http://opengeospatial.org/standards/bp) and commercial or open-source geographic information systems (e.g., as found at ESRI; http://esri.com). A useful

produce a useful consensus georeference. Still lacking are the ability to match georeferencing competencies with collection localities and sufficient strategies for assessing a user's specific georeferencing competencies initially and through time. A better understanding of how to enable collaboration and communication (e.g., by visualizing on a map the collection localities being discussed in a forum) is also needed.

Digital imaging and linking of field notes to specimens would likely provide a big benefit to georeferencing, because field notes can contain a wealth of information about collecting sites, including travel itineraries, site sketches, environmental information, and other remarks not often found on specimen labels. iDigBio's 2014 Digitizing from Source Materials Workshop (http://idigbio.org/wiki/index.php/Digitizing-From-Source-Materials) laid the groundwork
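One simple way to produce a consensus georeference from replicate submissions is a coordinate mean with an enclosing uncertainty radius. This is a sketch under stated simplifying assumptions (no outlier rejection, a spherical Earth, small separations), not an endorsed community workflow.

```python
import math

def consensus_georeference(points):
    """Combine replicate (lat, lon, uncertainty_m) georeferences.

    The consensus point is the coordinate mean; the consensus
    uncertainty is the largest distance from that mean to any
    submission plus that submission's own uncertainty, so the
    consensus circle covers every replicate.
    """
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)

    def dist_m(a, b):
        # Equirectangular approximation, adequate at small separations.
        x = math.radians(b[1] - a[1]) * math.cos(math.radians((a[0] + b[0]) / 2))
        y = math.radians(b[0] - a[0])
        return math.hypot(x, y) * 6_371_000

    radius = max(dist_m((lat, lon), (p[0], p[1])) + p[2] for p in points)
    return lat, lon, radius
```

A production workflow would first reject outlying submissions, weight contributors by demonstrated competency, and record the datum explicitly.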
Arkansas), the use of authoritative resources (e.g., taxonomic keys and illustrated glossaries), and the use of relevant terms (e.g., leaves and glaucous). Useful emphases in taxa-specific training can be placed on recognizing relevant features of the focal taxonomic group, correct usage of relevant terms, use of specific resources (e.g., a key to the millipedes of Arkansas) and the protocol for describing relevant resources and methods used for reaching the conclusion of an annotation. Process- and image-specific training can include identifying typical changes that can occur in the phenotype after preservation as a specimen (e.g., common color changes or pest damage patterns) and typical distortions introduced by an imaging technique (e.g., deviations from a rectilinear projection or chromatic aberrations).

Relatively many online applications enable public partici-

anticipated (e.g., many beetles are only identifiable by the number of segments on the tarsus, and without that part in the image, an annotation of taxonomic identity is difficult). Also, users should have easy access to tools for zooming and panning and designating an area of interest in the image to associate with the annotation. Finally, constraint of annotation terms to those in controlled vocabularies (e.g., from ontologies or taxonomic authority files) can enable semantic processing and reduces spelling errors. Recommendations made above in reference to transcription and georeferencing best practices are also relevant here, especially provision of a forum for the users to discuss annotations with each other and project scientists, leading to greater user proficiency and understanding.

Standards relevant to annotation specifically include the
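Constraining annotation terms to a controlled vocabulary, as recommended above, can be sketched as a normalization step that also suggests a correction for near misses. The term list here is a hypothetical stand-in for an ontology or taxonomic authority file.

```python
import difflib

# Hypothetical botanical term list; projects would draw this from an
# ontology or authority file rather than define it inline.
VOCABULARY = {"glaucous", "pubescent", "serrate", "glabrous"}

def normalize_term(term):
    """Accept a term only if it is in the controlled vocabulary,
    suggesting the closest valid spelling for near misses.

    Returns (accepted_term, suggestion); exactly one is non-None
    unless no plausible suggestion exists.
    """
    t = term.strip().lower()
    if t in VOCABULARY:
        return t, None
    close = difflib.get_close_matches(t, sorted(VOCABULARY), n=1)
    return None, (close[0] if close else None)

print(normalize_term("Glaucous"))  # ('glaucous', None)
print(normalize_term("glaucuos"))  # (None, 'glaucous')
```

Because every accepted term maps to a vocabulary entry, downstream semantic processing can rely on exact string matches instead of fuzzy reconciliation.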
social challenges, such as predicting biotic responses to climate change and invasive species. Here, we reviewed the state of public participation in three major areas of digitization: transcription, georeferencing, and annotation. Each of these activities contributes crucial data to research and offers educational opportunities, but public participation in transcription is the most advanced of the three. This is perhaps due to efficiencies that can be introduced into the latter two activities once the specimen's identity and collection locality description have been digitized.

Across the three major digitization tasks, several common needs for improvement can be noted. We recognize seven high-priority steps for the community to take in this area: (1) All of the public participation tools for biodiversity specimen digitization that we have discussed are relatively

Finally, the development of a public digitization project relies on somewhat ad hoc negotiations between a collection curator and the managers of relevant public participation tools, who each require different information in different formats. This can slow progress and is an area where standardization has the potential to make the creation and management of public digitization projects accessible not just to all collections curators but also to members of the public (e.g., a local chapter of a native plant society). Empowering the latter group has the potential to engage far more participants by better aligning the digitization projects that are available with the motivations of the public, making the projects collaborative or cocreated, rather than simply contributory (sensu Shirk et al. 2012). By contrast, opportunities for public engagement today are largely contingent on
Bird TJ, et al. 2014. Statistical solutions for error and bias in global citizen science datasets. Biological Conservation 173: 144–154.
Bonney R, Cooper CB, Dickinson J, Kelling S, Phillips T, Rosenberg KV, Shirk J. 2009. Citizen science: A developing tool for expanding science knowledge and scientific literacy. BioScience 59: 977–984.
Bonney R, Shirk JL, Phillips TB, Wiggins A, Ballard HL, Miller-Rushing AJ, Parrish JK. 2014. Next steps for citizen science. Science 343: 1436–1437.
Bonter DN, Cooper CB. 2012. Data validation in citizen science: A case study from Project FeederWatch. Frontiers in Ecology and the Environment 10: 305–307.
Brumfield B. 2012. Quality control for crowdsourced transcription. In Brumfield B, ed. Collaborative Manuscript Transcription. BlogSpot. (17 January 2015; http://manuscripttranscription.blogspot.com/2012/03/quality-control-for-crowdsourced.html)
Chapman AD. 2005. Principles and Methods of Data Cleaning: Primary Species and Species-Occurrence Data. Global Biodiversity Information Facility.
Hirschman L, et al. 2008. Habitat-Lite: A GSC case study based on free text terms for environmental metadata. OMICS: A Journal of Integrative Biology 12: 129–136.
Jenkins M. 2003. Prospects for biodiversity. Science 302: 1175–1177.
Jinbo U, Kato T, Ito M. 2011. Current progress in DNA barcoding and future implications for entomology. Entomological Science 14: 107–124.
Jordan RC, Gray SA, Howe DV, Brooks WR, Ehrenfeld JG. 2011. Knowledge gain and behavioral change in citizen-science programs. Conservation Biology 25: 1148–1154.
Kelty C, Panofsky A, Currie M, Crooks R, Erickson S, Garcia P, Wartenbe M, Wood S. 2014. Seven dimensions of contemporary participation disentangled. Journal of the Association for Information Science and Technology. Forthcoming. doi:10.1002/asi.23202
Kremen C, Ullman KS, Thorp RW. 2011. Evaluating the quality of citizen-scientist data on pollinator communities. Conservation Biology 25: 607–617.
Kumar N, Belhumeur PN, Biswas A, Jacobs DW, Kress WJ, Lopez IC,
Russell KN, Do MT, Huff JC, Platnick NI. 2007. Introducing SPIDA-Web: Wavelets, neural networks and internet accessibility in an image-based automated identification system. Pages 131–152 in MacLeod N, ed. Automated Taxon Identification in Systematics: Theory, Approaches and Applications. CRC Press Taylor & Francis Group.
Sheshadri A, Lease M. 2013. SQUARE: A benchmark for research on computing crowd consensus. Pages 156–164 in Proceedings of the First AAAI Conference on Human Computation and Crowdsourcing. Association for the Advancement of Artificial Intelligence. (17 January 2015; http://ir.ischool.utexas.edu/square/documents/sheshadri.pdf)
Shirk JL, et al. 2012. Public participation in scientific research: A framework for deliberate design. Ecology and Society 17 (art. 29).
Tschöpe O, Macklin JA, Morris RA, Suhrbier L, Berendsohn WG. 2013. Annotating biodiversity data via the Internet. Taxon 62: 1248–1258.
Wake DB, Vredenburg VT. 2008. Are we in the midst of the sixth mass extinction? A view from the world of amphibians. Proceedings of the National Academy of Sciences 105: 11466–11473.

scientist for the Center for Science Learning at the Florida Museum of Natural History, in Gainesville. She has formed innovative partnerships and developed numerous programs designed to promote science interest, understanding, and engagement. Paul Flemons is head of the Science Services and Infrastructure Branch and is manager of collection informatics at the Australian Museum, in Sydney. His focus is on research and development of innovative solutions to biodiversity informatics challenges, particularly Web-based applications for accessing and analysing biodiversity collection data. Robert Guralnick is an Associate Curator of Biodiversity Informatics at University of Florida. His research bridges from biodiversity informatics, especially the digitization and mobilization of biodiversity data, to scientific questions related to assessing drivers of broad-scale biospheric change. Gil Nelson is an assistant professor for research in the Institute for Digital Information and Scientific Communication at Florida State University, in Tallahassee, where he specializes in digitization research and practice for iDigBio. Greg Newman is a research scientist at the Natural Resource Ecology Laboratory at Colorado State University, in Fort Collins, whose research focuses on citizen science, ecological informatics