
E-Content [All Things Digital]

Five New Paradigms for Science and an Introduction to DataONE

By William Michener
EDUCAUSE Review, March/April 2012 (www.educause.edu/er). E-Content Department Editor: Kevin M. Guthrie

We are entering a new era of science and scholarship. At least five paradigm shifts are driving many of the emerging trends associated with this new era. First, grand challenge questions are increasingly dominating the scientific research agenda. The National Science Foundation (NSF) budget, for example, designates significant funding for challenging problems including clean energy research; science, engineering, and education for sustainability; and creating cyberinfrastructure for the 21st century.1 NSF has invested heavily in telescopes, gravitational observatories, and other community instruments for the astronomy and physics communities. The term "big science" is often used to refer to the use of these community-based infrastructure platforms that engage large, interdisciplinary teams of scientists in addressing extremely complex and challenging questions. The biological sciences and geosciences are now seeing similar investments in community infrastructure such as EarthScope (http://www.earthscope.org/), the Ocean Observatories Initiative (http://www.oceanobservatories.org/), and the National Ecological Observatory Network (http://www.neoninc.org/). Thus, in this new era, big science extends to all research domains.

Second, data are now being viewed as valuable products of the scientific enterprise, as evidenced by the requirements for data-management plans by the National Institutes of Health and NSF.2 This represents a major departure from the past, when project success was judged almost entirely by the number of publications and number of students supported.

Third, libraries are going virtual and are becoming the new era's repositories for knowledge, information, and data. Increasingly, books are shelved on moveable stacks, freeing up space for new collaboration spaces, as well as computing and visualization hardware. One consequence of this change is the move toward digital content collections that are readily accessible via the web, enabling even small libraries to develop and make accessible valuable digital material as part of curated collections.

Fourth, data-intensive science has been characterized as the fourth research paradigm, following on the heels of experimentation, theory, and computer simulation.3 It is now possible, for example, to perform dynamic simulations with sensor networks providing real-time data that are used to update models and forecasts on-the-fly.

Fifth, with the emergence of data-intensive science, it can be argued that data management has become the new statistics, meaning that students now need to be trained in all aspects of the data life cycle so that they can proficiently manage massive volumes of complex data and use new analytical and visualization tools to interpret underlying patterns and processes.

Associated Challenges

The five paradigm shifts described above have created a need for new information infrastructure and research approaches. This is exemplified in the environmental sciences, where the scope and nature of biological, environmental, and earth sciences research are evolving in response to environmental challenges such as global climate change, invasive species, and emergent diseases. Scientific studies, as a consequence, are increasingly focusing on long-term, broad-scale, and complex questions. Large volumes of diverse data collected by remote-sensing platforms and embedded environmental sensor networks via collaborative, interdisciplinary science teams are required to address such questions. In addition, new approaches are necessary for managing, preserving, analyzing, and sharing the diverse array of data.

We face several challenges as we move into this new era of grand challenge science and scholarship. First, big science and the digital makeover of libraries require substantial funding, which has been difficult to realize in recent times when research sponsors and university systems have been financially strapped.

Second, numerous informatics-related challenges complicate the picture. In a recent survey of environmental scientists, Carol Tenopir and her colleagues ascertained that more than 80 percent of the respondents agreed that they would be willing to share data across a broad group of researchers who use data in different ways.4 However, a majority of respondents also noted that they experienced difficulties in doing so because of the absence of formal established processes to store data beyond the project, inadequate tools and support for data management during the life of the project, and the poor state of existing tools for preparing documentation. These challenges are amplified by the fact that most data sets reside in hard-to-discover data silos, including individual laptop and desktop computers, institutional repositories, and even large data centers that may not be readily accessible to the interested scientist. Hence, I postulate that science is presently hindered by the 80:20 problem: that is, 80 percent of a scientist's effort is spent discovering, acquiring, documenting, transforming, and integrating data, whereas only 20 percent of the effort is devoted to more intellectually stimulating pursuits such as analysis, visualization, and making new discoveries. New IT solutions are clearly needed.

New Solutions to the Informatics Challenges

The DataNet program at NSF was created to catalyze the development of a system of science and engineering data collections that is open, extensible and evolvable.5 To date, five DataNet awards have been made, three of them in late 2011; the two earlier awards (in 2009) went to DataONE (University of New Mexico) and the Data Conservancy (Johns Hopkins University). To focus on one example, DataONE (https://www.dataone.org/) was designed to provide an underlying information infrastructure that facilitates data preservation and reuse for research with a principal focus on the biological, environmental, and earth sciences. DataONE, which stands for Data Observation Network for Earth, supports rapid data discovery and access across diverse data centers distributed worldwide and will provide scientists with an integrated set of familiar tools that support all elements of the data life cycle (e.g., from data-management planning and acquisition through data integration, analysis, and visualization).

The cyberinfrastructure implemented by DataONE comprises three principal components: Member Nodes, Coordinating Nodes, and an Investigator Toolkit. Member Nodes include existing or new data repositories that install the DataONE Member Node application programming interfaces (APIs). Member Nodes encompass natural history collections, earth-observing institutions, research projects and networks, libraries, universities, and governmental and nongovernmental organizations. Each Member Node acquires and maintains data and frequently provides value-added support services (e.g., user help desk, visualization services) to a particular community of users.

Coordinating Nodes are designed to be tightly coordinated, stable platforms providing network-wide services to Member Nodes. They are responsible for cataloging content, managing replication of content, providing search and discovery mechanisms, managing access-control rules, and mapping identities among different identity providers. Three initial Coordinating Nodes are located at Oak Ridge Campus (a consortium comprising Oak Ridge National Laboratory and the University of Tennessee), the University of California, Santa Barbara, and the University of New Mexico. Coordinating Nodes maintain the integrity of the DataONE federation by ensuring sufficient replicas are made of digital objects (e.g., data plus associated metadata) to facilitate long-term preservation and by tracking those replicas to enable the identification of specific Member Nodes where the content can be retrieved. The Coordinating Node indexing services, in essence, provide a system-wide search mechanism enabling users to discover relevant content from all participating Member Nodes.

The Investigator Toolkit is a modular set of software and plug-ins that enables interaction with DataONE infrastructure through commonly used analysis and data-management tools. Components in the Investigator Toolkit include low-level software libraries intended for developers and more technically inclined investigators, desktop application plug-ins like the R Project for Statistical Computing (http://www.r-project.org/), and operating system extensions such as file system drivers that expose DataONE as essentially a large network drive. The overarching goal of the Investigator Toolkit is to provide seamless interaction with the DataONE cyberinfrastructure for storing, retrieving, discovering, and visualizing data.

Ushering in the New Era

Innovative cyberinfrastructure platforms, a re-envisioning of the library's role in support of scholarship, and scientific approaches that place high value on data stewardship are needed to resolve the numerous grand challenges faced by scientists and society. Platforms like DataONE and tools that reduce the amount of time scientists spend focusing on more mundane data-management activities are expected to significantly advance the nature and pace of science.

The new era of grand challenge science and scholarship offers significant potential to advance our state of knowledge, transform academia, and benefit society. Three specific actions can advance this transition. First, we need to promote this change by embracing interdisciplinary, transdisciplinary, collaborative, and data-intensive science. This requires lobbying for the necessary funding and providing support and recognition to those individuals who choose to join teams in addressing grand challenge problems. Second, we need to educate future generations of scientists by inculcating informatics throughout domain curricula, not just in the computer and information sciences. Third, we need to advocate for change with a focus on breaking down data, academic, and funding silos so that adequately funded, interdisciplinary teams of scientists are poised to tackle the grand challenges.

Notes

1. "NSF Presents President's Fiscal Year 2012 Budget Request of $7.76 Billion," press release, February 14, 2011, <http://www.nsf.gov/news/news_summ.jsp?cntn_id=118642>.
2. NIH Data Sharing Policy and Implementation Guidance: <http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm>; NSF Dissemination and Sharing of Research Results: <http://www.nsf.gov/bfa/dias/policy/dmp.jsp>.
3. Tony Hey, Stewart Tansley, and Kristin Tolle, eds., The Fourth Paradigm: Data-Intensive Scientific Discovery (Redmond, Wash.: Microsoft Research, 2009), <http://research.microsoft.com/en-us/collaboration/fourthparadigm/>.
4. Carol Tenopir, Suzie Allard, Kimberly Douglass, Arsev Umur Aydinoglu, Lei Wu, et al., "Data Sharing by Scientists: Practices and Perceptions," PLoS ONE, vol. 6, no. 6 (June 2011), pp. 1-21, <http://www.plosone.org/article/info:doi/10.1371/journal.pone.0021101>.
5. National Science Foundation, Office of Cyberinfrastructure, Directorate for Computer & Information Science & Engineering, "Sustainable Digital Data Preservation and Access Network Partners (DataNet)," <http://www.nsf.gov/pubs/2007/nsf07601/nsf07601.htm>.

William Michener (wmichene@unm.edu) is Professor and Director of e-Science Initiatives for University Libraries at the University of New Mexico. He is Project Director for Data Observation Network for Earth (DataONE) and is involved in research related to sustainability of cyberinfrastructure, development of federated data systems, and community engagement and education.

© 2012 William Michener. The text of this article is licensed under the Creative Commons Attribution 3.0 Unported License (http://creativecommons.org/licenses/by/3.0/).

