
OCLC Systems & Services

Emerald Article: Digital libraries: the systems analysis perspective


Robert Fox

Article information:
To cite this document: Robert Fox, (2012),"Digital libraries: the systems analysis perspective", OCLC Systems & Services, Vol. 28
Iss: 4 pp. 170 - 175
Permanent link to this document:
http://dx.doi.org/10.1108/10650751211279102
Downloaded on: 04-11-2012
References: This document contains references to 4 other documents
The current issue and full text archive of this journal is available at
www.emeraldinsight.com/1065-075X.htm

Digital libraries: the systems analysis perspective
Robert Fox
Hesburgh Library, University of Notre Dame, Notre Dame, Indiana, USA
Accepted July 2012

Abstract
Purpose – This column seeks to examine the role of libraries in the description, organization, and
accumulation of critical research data.
Design/methodology/approach – This is an opinion column, but the literature was briefly
examined to understand the trends for this area.
Findings – Libraries have an important role in data curation and preservation regarding the research
data output of a university.
Practical implications – There is a very important opportunity for libraries to take the lead in the
important area of data curation and preservation regarding research data.
Originality/value – Libraries can play a vital role in establishing orderly policies and procedures
surrounding the curation, preservation, and organization of important research data, which is an
emerging area at research institutions, but is becoming an acute need.
Keywords Data management, Data preservation, Research data, Digital libraries
Paper type Viewpoint

Digital emancipation
A chief cornerstone of modern civilization is industry. Our capacity to transform raw
materials into useful products has steadily increased over the past four hundred years.
It is amazing to consider, in just the past century, how rapidly tools and machines have
been invented that make the transformation from naturally occurring substances a
very efficient and almost magical process. It is fascinating to watch documentaries that
narrate and demonstrate visually how all of the products that we take for granted, from
basic everyday items, to complex components and machines, are formed from
fundamental elements such as sand, wood, rock, naturally occurring metals, etc.
Although we take for granted what we can buy in the department store, what
becomes readily apparent when one sees the fluid motion of an assembly line very
precisely manufacturing, for example, a fire-resistant set of boots, is that a tremendous
amount of detailed analysis is required, not only of the construction of the product from
raw materials, but also of the intricate automation process that needs to be in place to
make sure that the product is of the highest quality. While it is inevitable that there will
be occasional mishaps, it is nothing short of amazing to see this almost flawless process
work over and over again, taking something amorphous and producing something
recognizable when the process is complete.
In the sphere of academics and society, libraries have an analogous role regarding
data. And, when we refer to data, we really mean two primary types of data: metadata
and primary items themselves (either born digital or digital surrogates and
derivatives). Data in many ways is the raw material that libraries work with and are
stewards of, and librarians are, in a fashion, docents guiding patrons through the
ever-growing and already vast universe of data. Up until the last couple of decades, this
responsibility corresponded primarily to the management of metadata, albeit in an analog
venue on book spines, card catalogs, vertical filing systems and microfilm/microfiche. The
organization and description of primary items (or artifacts, if you will, such as the
codex) has been and continues to be driven by the creation and maintenance of
metadata. Admittedly, the types of metadata that the modern, especially academic,
library works with have diversified and grown both more complex and flexible.
However, metadata as a primary category of data that libraries deal with on a
daily basis will continue to be central to library operations. The other category of data,
the primary objects, has now grown into a significant portion of daily library
operations as well. As was alluded to earlier, many artifacts that libraries have
traditionally worked with are now born digital and so that digital object is the
canonical reference point. For libraries, as preservers of cultural heritage and
information, the housing and maintenance of these primary digital objects requires as
much intentional planning, and sometimes more, as has gone into the housing and
maintenance of physical items.
As libraries deal with the influx and production of both kinds of data, the planning
around how to process, store, and access it has been a rather organic process. All along
the journey, though, librarians have been the first and best contributors to the store of
metadata that is now ubiquitous. The management and distribution of metadata has
had an interesting history. In one sense, descriptive metadata is a form of intellectual
property similar to any item that requires a good deal of analysis and expertise. On the
other hand, it is also something that is intended to be shared and leveraged across
organizations. While ownership of metadata is an interesting question, it pales in
comparison to the issues concerning primary data objects. Stewardship of data, not to
mention the analysis and provisioning of access to data objects, has made the situation
for libraries considerably more complex. Digital rights management and licensing
seem to be in a constant state of flux, and have had a significant influence on the
loaning of this material, placement with digital reserves, and ownership.
This concern is especially acute regarding born digital serials content. If the content
cannot be archived in paper format, and access to licensed or free open access journals
is terminated, ownership of these digital assets comes to the fore. Solutions have been
put in place that help to mitigate this sort of loss, but libraries must actively
commit financially to them, over and above the cost of licensing access to the journal
content itself. Escrow services and peer-to-peer archival mechanisms (e.g. LOCKSS and
Portico) do not fully resolve the problem and sometimes leave libraries in the difficult
position of deciding how to re-provision access to the content, if that is feasible at all.
Abstract and indexing services are also stewards of metadata in particular, and the
licensing for access to this metadata again highlights the nature of intellectual property
in relation to metadata.
The ways in which libraries access and utilize both categories of data have become
increasingly complex; however, they can be distilled into three major areas. The first is,
of course, the traditional method in which libraries legally and physically possess an
object. The object is a part of the library’s collection, but with digital objects the mode
of access differentiates it from physical objects. The object is stored on some form of
digital media, usually located within the bounds of the parent institution. It is
maintained, backed up and hopefully preserved appropriately in the digital context.
The second category is somewhat of a hybrid approach. In this approach, the metadata
or data object is in fact owned by the library, but it is housed and hosted by another
organization which may be a vendor, a consortial member, or a centrally managed
government organization. And in the third category, the library simply licenses access
to these two types of data, but possesses no ownership rights. Once the licensing
arrangement has been terminated, access to the data is lost.

Lifeblood of services
The modern library (as opposed to the ancient or medieval conception of the library as
a repository) has served as a conduit to information, and over time, rich sets of services
have also been provided in order to assist patrons in locating appropriate content and
to discern the quality of the content that they find. As the demand for ubiquitous
access to information has increased, libraries have had to relinquish the direct
administration of metadata to automation vendors who are now consolidating access
via cloud-based, “software as a service” architectures. For smaller institutions, this is
often a necessity due to scalability and financial constraints. However, in many cases,
what is offered to libraries is a completely turnkey hosted service that is both
physically and logically detached from the day-to-day workflow. Data now fuels
practically all of the services that libraries provide, and given that it is such a crucial
component, it is somewhat ironic that information science professionals have a
diminished ability to influence the content provided and the means by which these
hosted metadata services are provisioned.
There is nothing inherently imprudent about the situation as it currently stands.
Obviously having automated discovery systems which either simply provide an index
tuned for relevancy against an institution’s holdings, or which combine aggregated
results from multiple sources including abstracting and indexing content, is a vast
improvement over non-automated methods. It must be admitted, though, that
companies such as Google have proven that having an unmediated ability to study,
manipulate, and directly enhance discovery algorithms which are constantly informed
by feedback mechanisms based on user input is so far the most successful approach.
Automated algorithms, no matter how sophisticated, can only go so far. At some point,
we need to concede that constant human input is required in order to further enhance
the relevancy of result sets. And, as Google has also proven, simply overloading an
index with keywords that act as conceptual access points is not enough. In fact, any
Internet search engine that is worth mentioning deliberately ignores such an approach.
Information professionals have for some time been aware of these facts. And yet,
apart from a few notable exceptions, libraries have not argued strongly for having
input into the algorithms that are used in vendor supplied discovery systems, nor have
we as a profession pressed for access to the business intelligence data that vendors
most assuredly collect. The jurisdiction over metadata and the systems that are built to
expose that carefully crafted information is typically not viewed as a partnership
between the libraries and vendors. And so in many cases, institutions which have more
abundant resources have pooled their talent and have been generous with their time, in
order to better leverage descriptive and technical metadata.
Within the academy, though, we are at a crossroads regarding data in general.
Institutions now have an acute need for data management services that far exceed
what libraries have traditionally offered. This is true not only in the realm of metadata,
but also with primary data objects themselves. Research needs, particularly in the
sciences but also in other fields of study, are poised to outpace the capacity of libraries
to assist in an area in which they are particularly suited to provide assistance. Grant
agencies now require a data management plan to be in place and articulated by the
principal investigators prior to awarding grant funds. Typically, the researchers
themselves have a primary focus on what they are studying and not on the mechanics
of project management. Whereas previously academic libraries would have the
responsibility of providing access to materials necessary for researchers to do their
work, there is now an opportunity for libraries to provide workflow assistance to
researchers over the entire course of their projects. Also, information professionals are
particularly suited to assist with archival and data repository concerns.
There are many ways in which libraries are positioned to meet the data
requirements of institutions that aspire to be top tier research schools. The National
Science Foundation, in its report entitled Cyberinfrastructure Vision for 21st Century
Discovery, highlights libraries as being central to the provisioning of data archiving,
curation, and analysis of research data, and this has been echoed by studies that have
been conducted by ARL over the past five years (Peters and Dryden, 2011). An
example of the way in which academic libraries are being proactive in this regard is the
Distributed Institutional Repository that was developed by Purdue University, and
other schools such as Cornell and Georgia Tech have followed suit by providing
innovative services in their own libraries (Peters and Dryden, 2011, p. 388). In
providing these services, it is important to keep in mind that these institutions
recognized the unique skills that archivists and librarians bring to this endeavor not
only in the area of cataloging and categorizing resources, but also in the storage and
preservation of digital assets. With the full realization that data resources in a research
context are not monolithic, libraries have become increasingly prepared to deal with
heterogeneous data such as multimedia, statistical data files, and raw data formats.
Data management is becoming a necessity for current research interests. In the
corporate world, data management standards have been in place for quite some time,
since the data that is amassed in those situations corresponds directly to the fiscal
interests of the company. Backups, retention policies, OLAP analysis, and data
warehousing are all critical in this sphere. Academic researchers, though, have
typically not been as concerned with how their data is housed and managed over the
long haul because their focus is divided between academic duties, publishing, and their
own research interests. It has usually been an afterthought as to how data should be
managed (both metadata, and primary research data), and in many cases this is turned
over to graduate students, or a simple jointly accessible departmental storage area is
used. It is amazing, though, how much data can be accumulated for even one project,
and typically none of this data has any organizational scheme applied whatsoever.
Also, little thought is given as to any perennial value that the data may have for the
future. In many cases, it is largely abandoned after the projects are complete.

Stewardship partners
Coming back to the two types of data that have been the focus here, metadata and primary
data sources: while the former has traditionally been an area of expertise for libraries, the
latter is as well, despite the fact that the artifacts are digital. Institutions are now
recognizing that action needs to be taken in this area because this is, for the university,
an issue of stewardship and responsibility. Previously, research data would have been
primarily in print form, occupying file cabinets and organized into reports prior to its
use for publication. This forced the researcher to be a little more organized and also pay
attention to the quantity of data that was being accumulated. Now, for most university
researchers, storage space seems to be a practically unlimited resource. It is not unusual
for certain scientific projects to accumulate terabytes, if not petabytes, of research data.
Clearly information specialists cannot organize such a vast quantity of information on
their own. They have no ability to properly analyze these data artifacts to determine
what would be valuable to retain, and what the meaning of the data is itself. Therefore, if
data curation on the part of libraries is to be successful, a partnership needs to be formed
between subject liaisons, metadata specialists, and research faculty members.
In many ways, we are now in the midst of what some have called a “data deluge”.
As was previously alluded to, researchers are now producing unprecedented quantities
of data, and not only in the hard sciences. Much of this data is, in fact, useful to the
broader research community. And yet in the state that it is usually stored and
managed, it is not capable of being shared or understood by those who are not
intimately familiar with the project at hand (Borgman, 2012). However, the need to
retrieve and intelligently access previously accumulated research data is sometimes
vital to the broader scientific endeavor. For example, genetics research now typically
involves scientists from many corners of the globe who are constantly needing to share
critical information in order to further their own research.
Information specialists and metadata experts have a place at almost every stage of
academic research today, given the more rigorous stipulations that are being imposed
by grant funding agencies, broad academic and research foundations, and universities
themselves. There are many areas that require analysis on this front, including the
scope of a given project, the goals of a project, how to define and describe what various
researchers consider to be “data”, the collection and organization of data both at the
physical level as well as the logical level, retention policies, preservation practices, etc.
In many cases, it may be desirable to retain what could be considered ephemeral data,
so that future researchers can perform more extensive analysis and see whether or
not conclusions can be verified and replicated.
This is a largely untapped area that libraries need to consider at a strategic level.
The needs go far beyond simply warehousing information, or providing rudimentary
search and discovery. With this opportunity, though, there are risks involved, and in
order to meet these needs, an entrepreneurial mindset needs to prevail. It is necessary
for libraries to fill this gap, however, since doing so will help the research enterprise (both
scientific and non-scientific) avoid the liability of data loss, which could impact the
credibility of the parent institution or the researcher (Heidorn, 2011). The task of
convincing university administrative personnel who are in charge of institutional
research may not in some cases be easy, but it is becoming an issue that cannot
be avoided, and the merits of leveraging the time-honored skills already in place in
libraries make for a strong argument (Lage et al., 2011).

References
Borgman, C. (2012), “The conundrum of sharing research data”, Journal of the American Society
for Information Science & Technology, Vol. 63 No. 6, pp. 1059-78.
Heidorn, P.B. (2011), “The emerging role of libraries in data curation and e-science”, Journal of
Library Administration, Vol. 51 Nos 7/8, pp. 662-72.
Lage, K., Losoff, B. and Maness, J. (2011), “Receptivity to library involvement in scientific data
curation: a case study at the University of Colorado Boulder”, portal: Libraries & the
Academy, Vol. 11 No. 4, pp. 915-37.
Peters, C. and Dryden, A.R. (2011), “Assessing the academic library’s role in campus-wide
research data management: a first step at the University of Houston”, Science & Technology
Libraries, Vol. 30 No. 4, pp. 387-403.

Corresponding author
Robert Fox can be contacted at: rfox2@nd.edu
