
Research Data Management

DataGuide
Version 6.2

UBC Library
Research Data Services
research.data@ubc.ca

Last Updated: July 28, 2022

Introduction
The FAIR Principles
ORCID ID: A Persistent Identifier for You
DOIs: A Persistent Identifier for Your Digital Objects
Introduction References

Research Data and Data Management Plans
What is research data?
Why is it important to properly manage your data?
The research data lifecycle
What is a Data Management Plan?
Canadian Requirements
Writing a DMP
Indigenous Research Data
Research Data and DMP References

Metadata and Data Organization
What is Metadata?
Organizing Metadata
README files
Recommended Naming Conventions
Metadata References

Data Storage and Security
Data Storage
File Formats
Considerations When Selecting File Formats
Recommended File Formats
Data security
Data Storage and Security References

Data Sharing and Reuse
Why share data?
Challenges to sharing data
How to share data
UBC Supported Repositories
Licensing Data
Citing Data
Data Sharing and Reuse References

Major Version History

Introduction

Research Data Management is increasingly important for researchers at all stages of their
careers. Adopting a workflow that produces well-organized, documented, preserved, and
accessible research data will make your job easier in the long run. Funding organizations and
ethics boards have begun to require more transparency and compliance with research data
management practices as a condition of approval and funding. Planning up front with your
project team can save you countless hours of deciphering and numerous head-scratching
moments, and can ensure that your research data remains accessible in the future to you and
to anyone you share it with.

Research settings commonly include numerous researchers or assistants. Creating a plan for
your research data from the beginning will serve you all well. Part of the planning process allows
you to account for all members of the team, and consider what the procedure would be for any
incoming or departing team members. In the long run, this can help avoid data loss that occurs
as teams evolve over time.

Take a moment to think about your research and the kinds of data you generate.

● Where is this data stored and how is it organized?
● If you were asked to share your data with another researcher would they be able to
make sense of your work?
● If you needed to locate your data files from 5 years ago, how easy would they be to find
and use?

If you are unsure about the answer to any of these questions you have come to the right place.
This document was created for graduate students, research assistants, and researchers
interested in improving their data management skills. We cover the basics of data management
plans, metadata and data documentation, data storage and security, and data sharing. This
document is designed to get researchers thinking about the steps they can take to better
manage their research data.

The FAIR Principles
FAIR stands for:
● Findable - Findable data is discoverable thanks to its metadata.
● Accessible - Accessible data is always available and obtainable. This does not mean the
files are open; rather, it means that you can access the metadata regarding the files.
● Interoperable - Interoperable data is able to be used by many researchers from many
locations.
● Reusable - Reusable data is described, licensed, and shared in such a way that wide
reuse is possible.

Image: OpenAIRE. https://www.openaire.eu/how-to-make-your-data-fair

The FAIR Principles of Research Data Management focus on metadata, which is the descriptive
data about your project that supports its ability to be FAIR. The cartoon above, by OpenAIRE,
illustrates the principles nicely.

It is important to note that all data can be FAIR, but not all FAIR data is open. OpenAIRE states
that data should be “as open as possible, as closed as necessary.” Not all data can be fully
open, but it should still be findable at the metadata level.

ORCID ID: A Persistent Identifier for You


Persistent identifiers, like DOIs, are utilised to create reliable, persistent links to digital objects
like research papers and datasets. There are also persistent identifiers for researchers. Unlike
those used for digital objects, those for researchers are attached to one person’s research
output. However, some persistent identifiers are directly tied to disciplines or publishers, thus
limiting their application.

We’d like to introduce you to ORCID.

ORCID is an international non-profit organization. In Canada, the ORCID-CA consortium is
administered by the Canadian Research Knowledge Network (CRKN) with dozens of members.
The consortium hopes to register all researchers in Canada for an ORCID ID to use throughout
their careers.

ORCID IDs allow researchers to connect all of their research to a singular, unique record. An
ORCID ID is yours, for your entire career. Even if your name changes or you move between
countries, postings, or fields, your ORCID ID remains unchanged. You will commonly see
ORCID IDs listed on publications and requested in grant applications. ORCID IDs can be
represented as a unique URL or as a QR code, allowing you to add yours with ease to anything
you wish, including CVs, personal websites, business cards, and presentations. An ORCID ID
also allows you to compile your works and other pertinent information into a profile accessible
through your ORCID ID.

Register for your ORCID ID here: https://orcid.org

DOIs: A Persistent Identifier for Your Digital Objects


Digital Object Identifiers (DOI) are utilised to create reliable, persistent links to digital objects like
research papers and datasets. Some repositories will automatically assign DOIs to your
datasets, allowing you to link to them without concern that the link has changed. This is
especially useful for citations. DOIs create a discoverable object, with a persistent home,
helping you track your scholarly impact.

DOIs are created automatically for any digital objects you deposit in the UBC Dataverse
Collection, cIRcle, FRDR, and Dryad. For more information, visit Get DOIs
(https://researchdata.library.ubc.ca/plan/get-dois/).

Introduction References
OpenAIRE. (n.d.). How to make your data FAIR. OpenAIRE.
https://www.openaire.eu/how-to-make-your-data-fair

ORCID. (n.d.). About ORCID. ORCID. https://info.orcid.org/what-is-orcid/

UBC Library. (2018). Get DOIs. University of British Columbia.
https://researchdata.library.ubc.ca/plan/get-dois/

Research Data and Data Management Plans

What is research data?


Before we begin our discussion of research data management we should clarify what we mean
by research data. Research data can be defined as:

Data that are used as primary sources to support technical or scientific
enquiry, research, scholarship, or artistic activity, and that are used as
evidence in the research process and/or are commonly accepted in the
research community as necessary to validate research findings and results.
All other digital and non digital content have the potential of becoming
research data. Research data may be experimental data, observational data,
operational data, third party data, public sector data, monitoring data,
processed data, or repurposed data.
- (CASRAI Dictionary, 2018)

In practice, research data can be a great many things, from DNA samples to interview
transcripts to photographs.

Why is it important to properly manage your data?


Most researchers today work with a lot of research data, and without proper data management
it can be difficult to keep track of everything!

Proper data management can make it easier for you to:


● Find your files
● Keep track of different versions of your data
● Organize and compile information at the end of a project
● Reproduce your work (if required for a journal or patent)
● Pass on your work to another researcher or bring on a new team member
● Share your work
● Satisfy grant requirements
● Satisfy journal requirements
● Satisfy research ethics board requirements

The research data lifecycle
It is important to remember that managing your research data is not something that you do at the
end of a project, but throughout each stage of the process. The following diagram illustrates the
different stages in working with research data:

Image: University of Ottawa

We use the data life cycle to help us understand where we are in our research and the research
data management needs of our data. The data life cycle consists of seven phases:
● Plan - Review existing data, gain informed consent for sharing, consider costs for data
management and storage, write a DMP
● Create - Produce research data, record all metadata
● Process - Digitize data. Data is validated, cleaned, coded, and anonymized.
● Analyze - Interpretation to produce findings. All documentation completed.
● Preserve - Selection of formats for storage, DOIs assigned
● Share - A license is applied and access is outlined
● Reuse - Complete research data packages are made available for reuse

While the cycle presents a beautiful, cyclical flow, most research flows back and forth through
the cycle over time. In order to determine how best to manage data through each stage of a
project, researchers create a data management plan.

What is a Data Management Plan?
A data management plan, or a DMP, is a living document that helps you manage your research
data by outlining in advance what you will do with your data during and after your research
(DataOne, 2012). As living documents, data management plans can be updated throughout
your research.

A detailed data management plan can help you save time and money by getting you to think
about the different steps in your research process and the resources and tools you will need in
order to organize, store, and share your data now and in the future.

Canadian Requirements
Data management plans are a key requirement of the Tri-Agency Research Data
Management Policy. The Tri-Agency is a joint effort from three Canadian agencies: The
Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering
Research Council of Canada (NSERC), and the Social Sciences and Humanities Research
Council of Canada (SSHRC).

The policy outlines the steps the Tri-Agency will take to incorporate DMPs into grant
applications beginning in the spring of 2022. The policy also requires institutions to outline how
they will support research data and researchers. UBC is in the process of developing its
institutional plan. Many American funding agencies, including the National Science Foundation
(NSF), require data management plans. We recommend that you save yourself a headache
later and incorporate writing a data management plan into your research development
processes. We also recommend that you read the policy in full, as there are nuances that may
apply to your research.

Section 3.2 of the Tri-Agency Research Data Management Policy states that

[A]ll research data management plans should describe:

● How data will be collected, documented, formatted, protected and
preserved;
● How existing datasets will be used and what new data will be created
over the course of the research project;
● Whether and how data will be shared; and
● Where data will be deposited.

DMPs should state who is responsible for the project’s data management, the succession plan
should they leave, and what the role of each team member is as it pertains to data.

Writing a DMP

Several online tools exist to help you write a DMP. In Canada, the Portage Network has
created DMP Assistant; in the United States, the University of California maintains DMPTool;
and in the UK, the Digital Curation Centre maintains DMPonline. Each of these tools
functions similarly by walking you through the process of writing a DMP.

Standard components of a DMP include:


● Data Collection
○ What types of data are collected & how do you plan to organize it?
● Documentation and Metadata
○ How do you plan to describe your data?
○ What standards apply in your field?
● Storage and Backup
○ What storage methods and backup procedures will you implement?
● Preservation
○ What is your plan for long term access?
● Sharing and Reuse
○ How will you ensure that your data is accessible?
○ What licenses will you apply to aid in reuse of data?
● Responsibilities
○ What data management tasks will be managed by each member of the team?
Who is overseeing the management?
○ What are the data management, storage, and security costs of your research in
the short and long term?
● Ethics and Legal Compliance
○ What legal requirements must you meet?
○ What ethical concerns accompany your data collection?
○ How do you plan to obtain consent for participants to share the research data?

Here in Canada, the DMP Assistant is available in both English and French. It allows you to
create and export your DMP. Portage has created templates for various fields, and UBC has a
general template for use by UBC researchers. Note that not every question is applicable to
every research project. The Portage Network provides access to free webinars and training
modules on data management and DMP Assistant. If you have questions about your DMP,
please ask your liaison librarian or email research.data@ubc.ca.

Indigenous Research Data

There are specific considerations and protocols surrounding Indigenous research data
collection, use, and sharing.

UBC Principles for Indigenous Data Governance


UBC’s Indigenous Research Support Initiative (IRSI) has published its own Principles for
Indigenous Data Governance.

1. UBC recognizes Indigenous self-governance and self-determination

2. Engagement and data governance will be informed by Indigenous communities
following a community-driven and Nation-based pathway.

3. Ownership and right to control their data, as asserted by Indigenous
governments.

4. Recognition that access and possession of Indigenous data must be respectfully
permitted by the relevant authority. This applies to the collection, protection, use,
and management of data records and information.

5. UBC helps to advance the interests, rights, and jurisdiction with respect to
reclamation of information in UBC’s possession including research data, records,
and other types of information.

6. Indigenous data governance standards apply to data that relates to each Nation
and their identity as distinct people, communities, and Nations regardless of
where the data is held across UBC.

7. Individual rights and privacy are protected, while collective rights, privacy, and
security are evolving.

8. Adoption of common approaches to advancing data-related interests and issues
will be considered.

9. Collaboration on projects will increase the capacity of Indigenous Nations to
manage and govern their own data and information.

10. All activities of UBC IRSI will be transparent and consistent with co-developed
management processes.

Please visit the IRSI website for more information related to engagement principles, data
governance, and ethics. IRSI is currently developing ethics guidelines for Indigenous research.

CARE Principles for Indigenous Data Governance

The CARE Principles for Indigenous Data Governance are designed to complement the FAIR
principles and take into account the current and historic power imbalances between researchers
and Indigenous communities.

CARE stands for:


● Collective Benefit - “Data ecosystems shall be designed and function in ways that
enable Indigenous Peoples to derive benefit from the data.”
● Authority to Control - Indigenous Peoples have the right and authority to control their
data.
● Responsibility - Researchers working with Indigenous Peoples have a responsibility to
support Indigenous Peoples’ rights.
● Ethics - “Indigenous Peoples’ rights and wellbeing should be the primary concern at all
stages of the data life cycle and across the data ecosystem.”

The CARE Principles can be examined in depth here: CARE Principles for Indigenous Data
Governance (https://www.gida-global.org/care)

The First Nations Principles of OCAP®

The OCAP® Principles of data governance outline how to interact with First Nations data.
OCAP® stands for:
● Ownership - First Nations communities or groups own their data collectively
● Control - First Nations communities can control all aspects of the research cycle that
impact them directly.
● Access - First Nations retain access to the data, regardless of where it is held.
● Possession - First Nations retain physical control of the data.

OCAP® certifications are available through the First Nations Information Governance Centre
(FNIGC). OCAP® is a registered trademark of the FNIGC.

Research Data and DMP References

CASRAI. (n.d.). Research data management glossary. CASRAI. https://casrai.org/rdm-glossary/

DataOne. (2012). DataOne education modules. DataOne.
https://old.dataone.org/education-modules

Digital Curation Centre. (n.d.). Data management plans. Digital Curation Centre.
https://www.dcc.ac.uk/resources/data-management-plans

First Nations Information Governance Centre. (n.d.). The First Nations principles of OCAP®.
https://fnigc.ca/ocap-training/

Government of Canada. (2021). Tri-Agency research data management policy. Tri-Agency.
https://www.science.gc.ca/eic/site/063.nsf/eng/h_97610.html

Indigenous Research Support Initiative. (2019). Principles for Indigenous data governance.
University of British Columbia.
https://irsi.ubc.ca/transforming-research/indigenous-data-governance

Krier, L., & Strasser, C. A. (2014). Data management for libraries: A LITA guide. Chicago: ALA
TechSource.

Portage Network. (2020). Brief guide - Research data management. Portage Network.
https://doi.org/10.5281/zenodo.4000989

Research Data Alliance International Indigenous Data Sovereignty Interest Group. (2019).
CARE principles of Indigenous data governance. The Global Indigenous Data Alliance.
https://www.gida-global.org/care

UBC Library. (2018). Research data management. University of British Columbia.
https://researchdata.library.ubc.ca/

Van den Eynden, V., Corti, L., Woollard, M., Bishop, L., & Horton, L. (2011). Managing and
sharing data: Best practices for researchers (3rd ed.). UK Data Archive.
https://ukdataservice.ac.uk/media/622417/managingsharing.pdf

Metadata and Data Organization

What is Metadata?

Metadata is often described as “data about data” and helps answer the questions of who, what,
when, where, why. This descriptive data is essential for creating FAIR and open data, and
ensuring that the datasets you preserve will be accessible for many years to come.

Metadata makes it easier for researchers to:


● share their data
● publicize their data
● locate and retrieve data sets from others

Three of the most common categories of metadata are:

Descriptive: Descriptive metadata describes the content and context of your data at both the
dataset and item level. Examples: title, author, keywords

Administrative: Administrative metadata includes information needed to use the data.
Examples: software requirements, copyright, licensing

Structural: Structural metadata describes how different data sets relate to one another, or
any processing or formatting steps that were undertaken. Examples: Information about the
relationship between data sets in a database, file formats

Take a moment to think about your research project. What kind of descriptive, administrative
and structural metadata might you want to record?
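
As a minimal sketch of how the three categories might be recorded together, the Python snippet below groups a project's metadata under descriptive, administrative, and structural headings and prints it as JSON. The field names and values are hypothetical examples chosen for illustration; they do not follow any formal metadata standard.

```python
import json


def build_metadata_record():
    """Return an example metadata record grouped by category.

    Every field name and value here is a hypothetical placeholder.
    """
    return {
        "descriptive": {
            "title": "Example interview study",
            "author": "A. Researcher",
            "keywords": ["interviews", "example"],
        },
        "administrative": {
            "software_requirements": "Any plain text editor",
            "copyright": "A. Researcher",
            "license": "CC BY 4.0",
        },
        "structural": {
            "file_formats": [".txt", ".csv"],
            "relationships": "Each transcript matches one row in participants.csv",
        },
    }


record = build_metadata_record()
metadata_json = json.dumps(record, indent=2)
print(metadata_json)
```

If your field has a metadata standard, its element names should replace the placeholder keys used here.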

Organizing Metadata

Odds are you have already written down a good deal of metadata about your project; hopefully
you don’t plan on doing it all at the end. Save yourself some trouble and start gathering
metadata at the beginning of your project.

Are you unsure what to record? Many disciplines have created their own metadata standards to
ensure that data records can be interpreted and compared across projects and fields. A typical
metadata standard provides a set structure and language for describing your data. Some of the
most common metadata standards include Dublin Core, Darwin Core (for the biological
sciences), and DDI (Data Documentation Initiative).

If you are deciding which metadata standard to use, remember that many data repositories,
organizations, and journals have specific requirements for metadata. Double check before you
commit.

Curious about what metadata standards are common in your field? Take a moment to visit
the following link and find a metadata standard used in your field: Digital Curation Centre:
Disciplinary Metadata Standards

README files

Regardless of which metadata standards you follow, it is important to properly document your
data. README files are the most basic tool for project documentation. They contain basic
descriptive metadata about your project and should accompany your data throughout its life.
README files are plain text files (.txt) that can be opened on any computer. A README file can
pertain to your entire project, or you can create several README files for more complex
datasets.

At the very least you should document the following in a README.txt file stored alongside
your data:
● Contact information of researchers, including ORCID IDs
● Description of dataset
● Sources used
● Date of collection
● Use license that dictates how the data can be reused
● Methods of collection (protocols, sampling, instruments, coverage, etc.)
● Tools used to collect & process the data
● Data modifications made
● Quality assurances (data validation, checking)
● File structure and file relations for the data set
● Explanations of codes, classifications, variables, and file names

Cornell has a very useful README template that you can use to build your own.
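
If you would like a starting point of your own, the sketch below generates an empty README.txt with one section per item in the checklist above. It is only an illustration: the function names and the layout of the file are assumptions, not a required format, and the headings are shortened versions of the checklist items.

```python
from pathlib import Path

# Section headings adapted from the README checklist in this guide.
README_SECTIONS = [
    "Contact information (including ORCID IDs)",
    "Description of dataset",
    "Sources used",
    "Date of collection",
    "Use license",
    "Methods of collection",
    "Tools used to collect and process the data",
    "Data modifications made",
    "Quality assurances",
    "File structure and file relations",
    "Explanations of codes, classifications, variables, and file names",
]


def readme_skeleton(project_title):
    """Return README text with one empty section per checklist item."""
    lines = [project_title, "=" * len(project_title), ""]
    for section in README_SECTIONS:
        lines.append(section + ":")
        lines.append("")  # blank line to be filled in by hand
    return "\n".join(lines)


def write_readme(folder, project_title):
    """Write README.txt as UTF-8 plain text in the folder and return its path."""
    path = Path(folder) / "README.txt"
    path.write_text(readme_skeleton(project_title), encoding="utf-8")
    return path
```

Calling write_readme("my_project_folder", "My Project") (both arguments are placeholders) would create the file next to your data, ready to be filled in as the project progresses.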

In addition to what you write, how you write it is very important. Always remember to be as clear
as possible! It is easy to take for granted what is “common knowledge.” Remember that
common knowledge changes over time and failing to record something because “everyone
does it this way” could have dire consequences for the future accessibility and reusability of
your data.

The following is a list of best practices related to data documentation:

● Don’t use jargon
● Define all terms and acronyms
● State limitations
● Use descriptive titles
● Be specific and quantify
● Use keywords
● Make it machine readable (avoid symbols)

Finally, don’t wait to document your data! If you wait until the end of your project, you might
lose valuable information!

File naming

Research projects can generate hundreds or even thousands of individual data files. Proper file
names and organization can make these files easier to locate and navigate. But even if you
don’t have hundreds of files, creating a file naming structure will keep your research organized,
especially within research teams.

It is recommended that you choose a file naming convention and implement it throughout the
duration of your project. Make sure that everyone on your team is following the same rules for
naming files. When deciding how to name your files remember the following:

● Keep file names under 32 characters
● Classify broad types of files (transcript, photo, etc.)
● Avoid spaces and special characters
● Use underscores instead of periods or spaces to separate portions of the file names
● Make sure that file names are descriptive outside of their folders (in case they are
misplaced or change locations); i.e., the file name should include all necessary
descriptive information
● Include dates and format them consistently (YYYYMMDD follows the international
standard for date notation, ISO 8601; YYYY_MM_DD is an underscore-delimited variant)
● Include a version number to track multiple versions of a document
● Be consistent!

Recommended Naming Conventions


1. Denote dates in YYYYMMDD format so that files sort in chronological order.

DO: 20180403

DON’T: 04032018

2. Use a short unique identifier (e.g. Project Name or Grant #) to reduce the need to scroll
horizontally in order to read the file name.

DO: CHHM

DON’T: Centre for Hip Health and Mobility

3. Include a summary of content (e.g. Questionnaire or GrantProposal) as part of the file
name

DO: FileNm_Guidelines_20180409_v01.docx

DON’T: FileNm_20180409.docx

4. Use _ (underscore) as a delimiter. Avoid spaces between words and these special
characters: & , * % # ( ) ! @ $ ^ ~ ' { } [ ] ? < > – as different
operating systems handle special characters differently. Using special characters can
impact the ability of a file to be opened or change how the system sorts the files.

DO: FileNm_Guidelines_20140409_v01.docx

DON’T: FileNm Guidelines 2014 04 09 v01.docx

5. Keep track of document versions either sequentially (e.g. v01, v02) or with a unique
date and time (e.g. 20140403_1800).

DO: FileNm_Guidelines_20140409_v01.docx

DON’T: FileNm_Guidelines_20140409_Review.docx OR

FileNm_Guidelines_20140409_Investigation.docx

6. A good file naming system can replace an extensive folder hierarchy. Limit the number of
nested folders and strive to make hierarchies as simple as possible. Complex folder
hierarchies are harder to navigate and offer more opportunities for filing errors, and
system back-ups may take longer.

DO: F:/Env/LIBR/DataMgmt_FileFormats_20140409_v01.docx

DON’T:
F:/Environment/Library/Woodward/Data/Education/Materials/Draft/2014/04/DataMgmt_FileFormats_20140409_v01.docx
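
The conventions above can be sketched in a few lines of Python. This is a hedged illustration rather than a prescribed tool: the function name build_file_name and the order of the name parts (project, content, date, version) are assumptions that mirror the DO examples in this section.

```python
import re
from datetime import date

# Letters, digits, and underscores only, followed by a single extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_]+\.[A-Za-z0-9]+$")


def build_file_name(project, content, when, version, extension):
    """Assemble Project_Content_YYYYMMDD_vNN.ext from its parts."""
    name = f"{project}_{content}_{when:%Y%m%d}_v{version:02d}.{extension}"
    if not SAFE_NAME.match(name):
        raise ValueError(f"spaces or special characters in file name: {name!r}")
    return name


name = build_file_name("FileNm", "Guidelines", date(2018, 4, 9), 1, "docx")
print(name)  # FileNm_Guidelines_20180409_v01.docx

# YYYYMMDD strings sort chronologically; MMDDYYYY strings do not.
iso_style = sorted(["20180403", "20171231", "20180101"])
us_style = sorted(["04032018", "12312017", "01012018"])
print(iso_style)  # ['20171231', '20180101', '20180403']
print(us_style)   # ['01012018', '04032018', '12312017'] (2017 sorts last)
```

Generating names from one shared function is one way to keep a whole team on the same convention, since nobody has to remember the order of the parts.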

Metadata References

DataOne. (2012). DataOne education modules: Metadata. DataOne.
https://old.dataone.org/education-modules

GO FAIR. (n.d.). FAIR principles. GO FAIR. https://www.go-fair.org/fair-principles/

Government of Canada. (2021). Tri-Agency research data management policy. Tri-Agency.
https://www.science.gc.ca/eic/site/063.nsf/eng/h_97610.html

Krier, L., & Strasser, C. A. (2014). Data management for libraries: A LITA guide. Chicago: ALA
TechSource.

Library of Congress. (2021). Sustainability of digital formats planning for Library of Congress
collections. Library of Congress. http://www.digitalpreservation.gov/formats/

Portage Network. (2020). Brief guide - Research data management. Portage Network.
https://doi.org/10.5281/zenodo.4000989

UBC Library. (2018). Research data management. University of British Columbia.
https://researchdata.library.ubc.ca/

University of Oregon Libraries. (2021). Research guide: Research data management. University
of Oregon. https://library.uoregon.edu/research-data-management

Virginia Tech Digital Library and Archives. (2017, June 16). Recommended file formats. Virginia
Tech. https://etd.vt.edu/howto/accept.html

Data Storage and Security

Data Storage

Data storage and security considerations are essential aspects of managing research data and
should be mapped out in your data management plan. At the beginning of any project
researchers should map out what data they will be generating and how they plan on storing it. In
deciding where to store your data ensure that you understand your organization’s policies and
infrastructure for data storage and backups. This includes considering the most appropriate
storage system for sensitive data and what institutional policies apply to its handling.

A best practice is to have three copies stored in at least two locations (in case of a failure at
one location), one of them off-site. As a precaution, do not store all of your backups with the
same cloud service, even if every location is cloud-based. Cloud-based servers do have
internal redundancies to prevent the loss of data, but utilizing multiple services is a good
practice in the off-chance of a catastrophic loss.

Another essential step in data storage is to retain an original, unedited copy of your raw data
file. This file should be locked in a read-only format, which requires copying the file to make
changes. It is imperative that you do not overwrite this file so you have a fail-safe to return to
should something go awry.
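
One way to lock a raw file, sketched here in Python under the assumption that ordinary file permissions are sufficient for your project, is to clear its write permission and then work only on a copy. The function names are hypothetical. Note that a read-only flag guards against accidental overwriting; it is not a security control and does not replace backups.

```python
import shutil
import stat
from pathlib import Path


def protect_raw_file(raw_path):
    """Remove write permission from the raw data file and return its path."""
    raw_path = Path(raw_path)
    mode = raw_path.stat().st_mode
    # Clear the write bits for owner, group, and others.
    raw_path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return raw_path


def make_working_copy(raw_path, working_path):
    """Copy the protected raw file and make the copy writable for editing."""
    # copy2 also copies the read-only permission, so restore owner write access.
    working_path = Path(shutil.copy2(raw_path, working_path))
    working_path.chmod(working_path.stat().st_mode | stat.S_IWUSR)
    return working_path
```

All cleaning and analysis would then happen on the working copy, leaving the raw file untouched as your fail-safe.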

Remember, just because you have saved your data doesn’t mean it is safe! Data can be lost
for a number of reasons including:

● hardware failures
● software failures
● viruses or hacking
● power failures
● natural disasters
● human error
● theft of equipment

Even if you are backing up your data, remember to check that the backups are working and that
the data is accessible. Every time you edit your working copy, the backup copies should be
updated! A backup copy from 6 months ago that contains none of your recent data is practically
useless. This backup copy should include all pertinent files, including your README files. Think
of each backup as a complete packaged copy of your working files, allowing you to return to
work without any rework should you need to utilize your copies. Finally, backing up the entire
package of stored data helps ensure that everything can be understood in the future.
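
Checking that a backup is complete and current can be partly automated. The following sketch compares every file in a working folder with its counterpart in a backup folder using SHA-256 checksums. The folder layout and function names are assumptions made for illustration, and the check only tells you that the copies match; you should still open a few files from the backup to confirm they are usable.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_backup(working_dir, backup_dir):
    """Return (relative path, problem) pairs for files missing or changed in the backup."""
    working_dir, backup_dir = Path(working_dir), Path(backup_dir)
    problems = []
    for source in sorted(working_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(working_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            problems.append((relative.as_posix(), "missing"))
        elif sha256_of(source) != sha256_of(copy):
            problems.append((relative.as_posix(), "differs"))
    return problems
```

An empty list means every working file, README files included, has an identical copy in the backup folder.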

File Formats

A file format is a way of encoding information within a computer file so that it can be recognized
by an application and accessed. It is indicated by the file name extension (generally a full stop
followed by three letters such as .txt, .doc, .jpg, .mov). In other words, this allows the computer
to recognize that a document contains text or that a file should be processed as a video.
Additionally, file formatting is important as this may affect whether the file contents are
accessible following long-term storage.

File formats are an essential consideration in data storage. Software and data storage
technologies change quickly, and files can easily become obsolete or difficult to access. In
general, it is recommended that data files are copied to new media every 2-5 years, especially if
technology changes or if files begin to degrade.

Considerations When Selecting File Formats


1. Proprietary and non-proprietary (open) formats
Proprietary formats are limited by software patents, lack of format specification details, or
built-in encryption to prevent open usage by the public. This results in requiring specific
software provided by one vendor in order to use the proprietary format. In contrast, an
open format is a file format that is freely available for everyone to use. Because the
specifications are released, open-source developers can write software to utilize the file
format in the case that a particular vendor no longer supports the file format. This
increases the chances that technological developments do not make particular file
formats obsolete.

2. Industry format adoption


In some cases, an industry may treat specific file formats as a de facto standard even if
the formats are proprietary and rely on expensive software. In those cases, it may be
more convenient to use the same proprietary file format.

3. Technical dependencies
Technical dependencies are the degree to which a particular format depends on
particular hardware, operating system, or software and how these dependencies might
influence future usage of the media. Using non-proprietary file formats may decrease the
risk of technical obsolescence by removing the dependency on the underlying
technology.

4. File quality and file size
Each file type such as text, images, or sound has many file formats available. File
quality, the representation of the given item’s characteristics, is a large part of the file
format decision. Formats that preserve higher quality (for example, higher resolution)
produce larger files than lower quality formats. The trade-off is that higher quality
comes at the cost of storage space and convenience in disseminating the file to others.

Recommended File Formats

Digital Images
● TIFF version 6 uncompressed (.tif)
● JPEG (.jpeg, .jpg)
● TIFF (other versions)(.tif, .tiff)
● JPEG 2000 (.jp2)
● Adobe Portable Document Format (PDF/A, PDF) (.pdf)

Digital Sound
● AIFF (96kHz 16bit PCM) (.aif, .aiff)
● FLAC (.flac)
● MP3 (.mp3)
● WAV (96kHz 24bit PCM) (.wav)

Digital Video
● MPEG-4 High Profile (.mp4)

E-Books
● EPUB

Qualitative Data (text)


● eXtensible Mark-up Language (XML) text according to an appropriate Document Type
Definition (DTD) or schema (.xml)
● Rich Text Format (.rtf)
● plain text data, UTF-8 (unicode) (.txt)

Quantitative Data, tabular with extensive metadata


A dataset with variable labels, code labels, and defined missing values, in addition to the matrix
of data
● Character delimited text (ASCII or Unicode preferred): Comma Separated Values (.csv)
or Delimited Text (.txt)
● Structured text or mark-up file containing metadata information, e.g. DDI XML or JSON
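
As one possible way to follow this recommendation, the sketch below writes the data matrix to a UTF-8 CSV file and the variable labels, code labels, and missing-value definitions to a separate JSON file. The file names, variable names, codes, and the layout of the JSON file are all hypothetical; the JSON structure is an ad hoc example and does not follow DDI or any other metadata standard.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical survey data; variable names and codes are illustrative only.
rows = [
    {"id": 1, "age": 34, "region": 1},
    {"id": 2, "age": -99, "region": 2},  # -99 marks a missing age
]
metadata = {
    "variables": {
        "id": {"label": "Participant identifier"},
        "age": {"label": "Age in years", "missing_values": [-99]},
        "region": {"label": "Region of residence",
                   "codes": {"1": "Urban", "2": "Rural"}},
    }
}


def write_dataset(folder: Path) -> tuple[Path, Path]:
    """Write the data matrix as UTF-8 CSV and its metadata as UTF-8 JSON."""
    data_path = folder / "survey.csv"
    meta_path = folder / "survey_metadata.json"
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    meta_path.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return data_path, meta_path


# Write to a temporary folder so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    data_path, meta_path = write_dataset(Path(tmp))
    csv_text = data_path.read_text(encoding="utf-8")
    meta_back = json.loads(meta_path.read_text(encoding="utf-8"))

print(csv_text)
```

Keeping the metadata in its own structured file means the CSV stays readable by any software, while the labels and missing-value codes travel with the data instead of living only in a proprietary statistics package.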

Quantitative Data, tabular with minimal metadata


A matrix of data with or without column headings or variable names, but no other metadata or
labelling
● comma-separated values (CSV) file (.csv)
● tab-delimited file (.tab) including delimited text of given character set with SQL data
definition statements where appropriate

Text Documentation and Scripts


● Plain text (.txt)
○ Encoding: US-ASCII, UTF-8, or UTF-16 with BOM
● PDF/A-1 (ISO 19005-1)
● XML (includes XSD/XSL/XHTML, etc.; with included or accessible schema)

Vector and Raster Geospatial Data


● ESRI Shapefile (essential: .shp, .shx, .dbf; optional: .prj, .sbx, .sbn)
● geo-referenced TIFF (.tif, .tfw)
● CAD data (.dwg)
● tabular GIS attribute data
● Keyhole Mark-up Language (KML) (.kml)

Data security

For an ethical researcher, data security is an essential aspect of data management. Security
regulations will differ based on the confidentiality of your data. Generally, the more confidential
your data, the more you should limit access to it.

Security planning should encompass the following areas:


● Network security
○ Who has access to the network?
○ Are there firewalls?
● Physical security
○ Who has access to the computers?
○ Who can access physical files?
○ How is data transported?
● Computer security
○ Is antivirus software up to date?
○ Are you protected against power surges?
○ Do you use passwords and firewalls?
○ Is data encrypted?
○ Is data storage secure?

If you are dealing with private or sensitive data, make sure you understand your organization’s
regulations about storage, security, and disposal. Data can be sensitive because of direct and
indirect identifiers, but also because of data ownership, use agreements, etc. If you’re
unsure, please ask. Some countries, including Canada, do not allow personal data to be stored
on servers outside the country, making commercial storage systems like Dropbox or Google
Drive unusable for files containing personal information.

Finally, remember that just because you deleted something doesn’t mean it can’t be recovered!
To destroy data, you must overwrite a hard drive, physically destroy memory sticks and shred
paper documents.

For more help and training on data security, please visit Privacy Matters @ UBC. They have a
two-part training module on privacy and information security.

Data Storage and Security References

Cornell University Library. (2021). Recommended file formats. Cornell University.
https://guides.library.cornell.edu/ecommons/formats

Krier, L., & Strasser, C. A. (2014). Data management for libraries: A LITA guide. Chicago: ALA
TechSource.

MIT Libraries. (n.d.). Data management. Massachusetts Institute of Technology.
https://libraries.mit.edu/data-management/

Portage Network. (2020). Brief Guide - Research data management. Portage Network.
https://doi.org/10.5281/zenodo.4000989

Research Data Management Services Group. (n.d.). Data management planning. Cornell
University. https://data.research.cornell.edu/content/data-management-planning

UBC Library. (2018). Research data management. University of British Columbia.
https://researchdata.library.ubc.ca/

UO Libraries. (2021). Research guide: File formats. University of Oregon.
https://researchguides.uoregon.edu/data-management/fileformats

Van den Eynden, V., Corti, L., Woollard, M., Bishop, L., & Horton, L. (2011). Managing and
sharing data: Best practices for researchers (3rd ed.). UK Data Archive.
https://ukdataservice.ac.uk/media/622417/managingsharing.pdf

Data Sharing and Reuse

Why share data?

Have you considered what you might do with your data once your project has finished? Have
you thought that someone else might benefit from your raw data? You might want to consider
sharing your data!

Sharing research data can:


● Satisfy grant requirements or journal requirements
● Make research more open and accessible
● Promote scholarly rigor
● Raise the profile of a researcher
● Increase research efficiency
● Promote collaboration
● Establish a public record
● Maximize transparency
● Promote inquiry and innovation
● Increase the economic and social impact of research
● Provide greater resources for education and training

If that weren’t incentive enough, the Canadian Social Sciences and Humanities Research
Council (SSHRC) and Canadian Institutes of Health Research (CIHR) require grantees to
deposit their data in publicly accessible repositories.

Challenges to sharing data

While sharing research data can have huge benefits, there are sometimes barriers to sharing.
Preparing data for a repository can be time-consuming, and concerns about legal and ethical
issues can make researchers wary of sharing data with others. Some types of data are simply
not meant to be shared. These include trade secrets, medical information, commercial
information, preliminary analysis, third-party data, and some geospatially linked data. Other
data, however, can be shared after it has been anonymized.

To ensure you are sharing data in an ethical manner, you should:

● evaluate the anonymity of your data
● obtain a confidential review (someone from the repository looks it over)
● comply with institutional regulations (e.g., those of your institution’s research ethics
board)
● comply with other regulations (HIPAA, BREB)
● have informed consent for data sharing
● restrict use of confidential data
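
One simple starting point for evaluating anonymity is to count how many records share each combination of indirect identifiers; combinations held by very few records are the ones most likely to single someone out. The sketch below uses a group-size count in the spirit of k-anonymity, a technique this guide does not name and which is offered here only as an illustration. The field names, the records, and the threshold k are hypothetical, and passing such a check does not by itself make a dataset safe to share.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return combinations of indirect identifiers shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical records with direct identifiers already removed.
records = [
    {"age_band": "30-39", "postal_prefix": "V6T", "occupation": "nurse"},
    {"age_band": "30-39", "postal_prefix": "V6T", "occupation": "nurse"},
    {"age_band": "60-69", "postal_prefix": "V5K", "occupation": "pilot"},
]

# With k=2, any combination that appears only once is flagged for review.
risky = small_groups(records, ["age_band", "postal_prefix", "occupation"], k=2)
print(risky)
```

Flagged combinations can then be reviewed by hand, for example by widening age bands or dropping a variable, before the confidential review by the repository.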

How to share data

Once you have decided you are interested in sharing your data, how do you go about sharing
it?

The easiest ways to share your data are to:

● Submit it to a subject-specific or institutional repository or archive
● Post it to a project website
● Submit it to a journal
● License your data and provide a suggested data citation

Data repositories are an especially great way to share data as many of them offer long-term
storage and preservation, regular backups, licensing arrangements, and online discovery and
data promotion.

Data repositories exist at the institutional, national, and discipline level. It’s probably a good idea
to check with your colleagues and peers to see whether there is a recommended repository in
your field.

When choosing a repository, consider the following:


● Who might want access to your data and where will they look?
● Is there an appropriate discipline-specific repository?
● What are the access policies?
● What is the storage and preservation plan?
● What kind of data do they accept?
● What metadata standards are required?
● Do they charge any fees?

Take a moment to check out the repository database at http://www.re3data.org/. Is there a
repository in your area? What are the requirements for submitting data?

UBC Supported Repositories

UBC Dataverse Collection at Borealis. Borealis, the Canadian Dataverse Repository, is a
bilingual, multidisciplinary, secure, Canadian research data repository, supported by academic
libraries and research institutions across Canada. Borealis supports open discovery,
management, sharing, and preservation of Canadian research data. On Borealis, we have the
UBC Data Collection, which contains sub-Dataverses for researchers, labs, and projects across
UBC. It accepts all data formats and places no limit on the number of files, but each file is
limited to 2.5GB. It also allows UBC to mint DOIs for your datasets, which helps fulfil the
Accessible portion of FAIR. Datasets in our dataverse are discoverable through Google, Google
Data, UBC Library Summon, FRDR, DataCite, and more. We also import all UBC-connected
datasets that were deposited in Dryad to our dataverse.
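
Before depositing, it can be useful to check a project folder for files that exceed the per-file limit. The sketch below is one way to do that; the function name is hypothetical, and reading "2.5GB" as 2.5 × 10^9 bytes is an assumption, so confirm the exact limit with the repository. The demonstration uses a temporary folder and a tiny 10-byte limit so that it runs without large files.

```python
import tempfile
from pathlib import Path

# Assumption: "2.5GB" is interpreted as 2.5 * 10**9 bytes.
MAX_FILE_BYTES = int(2.5 * 1000**3)


def oversized_files(folder: Path, limit: int = MAX_FILE_BYTES) -> list[Path]:
    """List files under folder (recursively) whose size exceeds limit in bytes."""
    return sorted(
        p for p in folder.rglob("*") if p.is_file() and p.stat().st_size > limit
    )


# Demonstration with small files and an artificially low limit.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "small.csv").write_bytes(b"a,b\n")
    (folder / "large.bin").write_bytes(b"0" * 100)
    flagged = [p.name for p in oversized_files(folder, limit=10)]
    none_flagged = oversized_files(folder)  # default limit: nothing is this big

print(flagged)
```

Files that are flagged could be split, compressed, or deposited in a repository better suited to large files.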

FRDR, the Federated Research Data Repository, is a Canadian national research data
repository. It allows researchers to discover, share, and download Canadian research data. It
complies with FAIR principles. FRDR is great for large individual files or for a large number of
research files. FRDR mints DOIs as well, and allows you to apply a reuse license. Geodisy is a
geographic overlay within FRDR that allows you to search for data by research location.

Dryad is an international data repository that supports access to data underlying published
literature. UBC is a Dryad institutional partner. Dryad is able to assign DOIs and licenses,
typically CC0.

UBC cIRcle is UBC’s digital repository for research and teaching materials created by the UBC
community and its partners. Materials in cIRcle are openly accessible to anyone on the web,
and will be preserved for future generations.

Licensing Data
When you submit your data to a repository it is a good idea to license your data. Licensing data
allows researchers to clearly state how they want their data to be used and makes it easier for
others to re-use the data. While data itself does not fall under copyright protection, datasets and
databases do, and the easiest way to protect your copyright while allowing access is by
attaching a license.

Before deciding what license to use, you must first ensure that you yourself have permission to
license the data, as only the rights holder can grant a license. Once you are sure you can grant
a license, you must choose which license to apply. Make sure to check with your organization
or repository as they might recommend a certain license or provide one for you.

The most common data licenses are from Creative Commons and the Open Data
Commons. Each has standard sets of licenses that allow data to be used in different ways.
Alternatively, you can place your data in the public domain, allowing free and unrestricted
access. The Creative Commons Zero (CC0) license is the most popular copyright waiver.

Citing Data
Let’s say you are interested in using someone else’s data that you have located in a repository.
How do you cite it? Many journals and conferences have established data citation rules. Most
citation styles, besides APA, have not yet formally included datasets within their citation
standards. Generally, it is a good idea to include the following information:
● Author/creator
● Date created
● Title
● Publisher
● Persistent Identifier (e.g. DOI)

For more information on citing datasets please see the UBC Library guide on How to Cite.
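
As an illustration, the five elements listed above can be assembled into a citation string. The sketch below is loosely modelled on the APA pattern for datasets; the function name, the exact punctuation, and the example values (including the DOI) are hypothetical, so check the style guide required by your journal or conference.

```python
def format_dataset_citation(creator, year, title, publisher, identifier):
    """Combine the recommended citation elements into a single string."""
    return f"{creator} ({year}). {title} [Data set]. {publisher}. {identifier}"


# Hypothetical dataset; none of these values refer to a real deposit.
citation = format_dataset_citation(
    creator="Researcher, A. B.",
    year=2022,
    title="Example survey responses",
    publisher="Example Repository",
    identifier="https://doi.org/10.0000/example",
)
print(citation)
```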

Data Sharing and Reuse References


Ball, A., & Duke, M. (2015). How to cite datasets and link to publications. Digital Curation
Centre. https://www.dcc.ac.uk/resources/how-guides/cite-datasets

Ball, A. (2014). How to license research data. Digital Curation Centre.
https://www.dcc.ac.uk/resources/how-guides/license-research-data

Government of Canada. (2016). Tri-Agency open access policy. Tri-Agency.
https://www.science.gc.ca/eic/site/063.nsf/eng/h_F6765465.html?OpenDocument

Government of Canada. (2021). Tri-Agency research data management policy. Tri-Agency.
https://www.science.gc.ca/eic/site/063.nsf/eng/h_97610.html

Krier, L., & Strasser, C. A. (2014). Data management for libraries: A LITA guide. Chicago: ALA
TechSource.

UBC Library. (2018). Research data management. University of British Columbia.
https://researchdata.library.ubc.ca/

Van den Eynden, V., Corti, L., Woollard, M., Bishop, L., & Horton, L. (2011). Managing and
sharing data: Best practices for researchers (3rd ed.). UK Data Archive.
https://ukdataservice.ac.uk/media/622417/managingsharing.pdf

Version History

Version  Date        Contributors
6.2      2022-07-28  Eugene Barsky
6.1      2021-09-13  Doug Brigham, Jentry Campbell
5.1      2018-05-08  Eugene Barsky
4.1      2016-07-12  Eugene Barsky
3.1      2015-10-20  Eugene Barsky
2.1      2014-11-03  Arielle Lomness, Eugene Barsky, Sally Taylor, Marjorie Mitchell,
                     Laurie Henderson, Jennifer Abel, Devin Soper
