Unfortunately, it is difficult to monitor data sharing because data can be shared in so many different ways. Previous assessments of data sharing have included manual curation [8-10] and investigator self-reporting. These methods can identify instances of data sharing and data withholding in only a limited number of cases, and therefore cannot support inquiry into patterns of data sharing behavior.
This proposal addresses three phases of research critical to a full evaluation of data sharing behavior:
Aim 1: Does sharing have benefit for those who share?
Aim 2: Can sharing and withholding be systematically measured?
Aim 3: How often is data shared? What predicts sharing? How can we model sharing behavior?
B.1 Significance of this proposal
This proposal describes a new, exploratory, and innovative research project that could radically impact the adoption of data sharing in biomedical research.
This work directly supports NIH strategic initiatives. It will provide a foundation for evaluating the effectiveness of the NIH’s data sharing policies. Further, the current proposal will contribute to the NLM’s goals of “contributing to comprehensive strategies for preservation of biomedical information in the US and worldwide” and “developing linked databases for discovering relationships between clinical data, genetic information, and environmental factors”. This work would also be relevant for the NCRR in its work to “facilitate information sharing among biomedical researchers” as part of its current strategic plan.
Progress will also be of significant value to a broad cross-section of disciplines, including:
Funders, policy makers and thought leaders.
Although some results of this analysis may be intuitive (a stronger journal data sharing policy results in more data sharing, or shared data permits reuse and thus supports a higher citation rate), these relationships have not yet been demonstrated. Concrete evidence, whether supporting or contradictory, will be of value to a wide spectrum of policy makers and thought leaders. For example, funders can improve the impact and efficiency of their investments by applying lessons learned from communities with high data sharing adoption to those with room for improvement.
Database, software, and data standard developers.
The usage patterns of those who share data provide critical requirement specification feedback for developing and refining databases, software, and standards to support data sharing and reuse. Learning who does not currently share data can provide insight into failings of current tools and opportunities for improvements.
Biomedical informatics community.
Informatics involves evaluation of the generation, use, and value of information resources; this research addresses this topic from a novel perspective. The biomedical informatics field will also benefit from exposure to methods it does not commonly apply. For example, my plans to apply NLP techniques to the biomedical literature through full-text portals could have wide applicability for information retrieval. Finally, the general biomedical informatics community will benefit if and when this research leads to initiatives that increase the rate of data sharing.
Information science and digital library community.
Data use behavior and resource usage metrics are active research topics in information science and digital library research. Several ongoing projects are investigating data use in the social sciences, but there has been little recent investigation or measurement of research sharing in the biomedical arena. Furthermore, most of the information science studies have used survey approaches; my emphasis on measured variables will provide a different perspective.
Open Science community.
Grassroots movements to increase openness and transparency in science will benefit from rigorous, quantitative assessments of current data sharing behavior.
Last but not least, I expect that this research will help inspire investigators to share their data and help inform the creation of tools that help them. As data sharing is evaluated and policies and incentives improved, hopefully investigators will become more apt to share and reuse study data and thus maximize its usefulness to society.
Expected contributions, taking the form of papers and associated datasets, include:
1. an assessment of the observed and measured rewards, prevalence, and patterns of gene expression microarray dataset sharing