DataGuide
Version 6.2
UBC Library
Research Data Services
research.data@ubc.ca
Last Updated:
July 28, 2022
Introduction
The FAIR Principles
ORCID ID: A Persistent Identifier for You
DOIs: A Persistent Identifier for Your Digital Objects
Introduction References
Introduction
Research Data Management is increasingly important for researchers at all stages of their
careers. Adopting a workflow that produces well organized, documented, preserved, and
accessible research data will make your job easier in the long run. Funding organizations and
ethics boards have begun to require more transparency and compliance with research data
management practices as a condition of approval and funding. Being upfront with your project
and your team can save you countless hours of deciphering and numerous head-scratching
moments, and can ensure that your research data remains accessible in the future to you and
to others you may share it with.
Research settings commonly include numerous researchers or assistants. Creating a plan for
your research data from the beginning will serve you all well. Part of the planning process allows
you to account for all members of the team, and consider what the procedure would be for any
incoming or departing team members. In the long run, this can help avoid data loss that occurs
as teams evolve over time.
Take a moment to think about your research and the kinds of data you generate.
If you are unsure about the answer to any of these questions you have come to the right place.
This document was created for graduate students, research assistants, and researchers
interested in improving their data management skills. We cover the basics of data management
plans, metadata and data documentation, data storage and security, and data sharing. This
document is designed to get researchers thinking about the steps they can take to better
manage their research data.
The FAIR Principles
FAIR stands for:
● Findable - Findable data is discoverable thanks to its metadata.
● Accessible - Accessible data is always available and obtainable. This does not mean the
files are open, but rather that you can access the metadata describing the files.
● Interoperable - Interoperable data can be used by many researchers from many
locations.
● Reusable - Reusable data is described, licensed, and shared in such a way that wide
reuse is possible.
The FAIR Principles of Research Data Management focus on metadata, which is the descriptive
data about your project that supports its ability to be FAIR. The cartoon above, by OpenAIRE,
illustrates the principles nicely.
It is important to note that all data can be FAIR, but not all FAIR data is open. OpenAIRE states
that data should be “as open as possible, as closed as necessary.” Not all data can be fully
open, but it should still be findable at the metadata level.
ORCID ID: A Persistent Identifier for You
ORCID is an international non-profit organization. In Canada, the ORCID-CA consortium is
administered by the Canadian Research Knowledge Network (CRKN) and has dozens of members.
The consortium hopes to register all researchers in Canada for an ORCID ID to use throughout
their careers.
ORCID IDs allow researchers to connect all of their research to a singular, unique record. An
ORCID ID is yours, for your entire career. Even if your name changes or you move between
countries, postings, or fields, your ORCID ID remains unchanged. You will commonly see
ORCID IDs listed on publications and requested in grant applications. An ORCID ID can be
represented as a unique URL or as a QR code, so you can easily add it to anything you wish,
including CVs, personal websites, business cards, and presentations. You can also compile
your works and other pertinent information into a profile accessible through your ORCID ID.
DOIs: A Persistent Identifier for Your Digital Objects
DOIs are created automatically for any digital objects you deposit in the UBC Dataverse
Collection, cIRcle, FRDR, and Dryad. For more information visit: Get DOIs.
Introduction References
OpenAIRE. (n.d.). How to make your data FAIR. OpenAIRE.
https://www.openaire.eu/how-to-make-your-data-fair
Research Data and Data Management Plans
In practice, research data can be a great many things, from DNA samples to interview
transcripts to photographs.
The research data lifecycle
It is important to remember that managing your research data is not something that you do at the
end of a project, but throughout each stage of the process. The following diagram illustrates the
different stages in working with research data:
We use the data life cycle to help us understand where we are in our research and the research
data management needs of our data. The data life cycle consists of seven phases:
● Plan - Review existing data, gain informed consent for sharing, consider costs for data
management and storage, write a DMP
● Create - Produce research data, record all metadata
● Process - Digitize data. Data is validated, cleaned, coded, and anonymized.
● Analyze - Interpretation to produce findings. All documentation completed.
● Preserve - Selection of formats for storage, DOIs assigned
● Share - A license is applied and access is outlined
● Reuse - Complete research data packages are made available for reuse
While the cycle presents a beautiful, cyclical flow, most research flows back and forth through
the cycle over time. In order to determine how best to manage data through each stage of a
project, researchers create a data management plan.
What is a Data Management Plan?
A data management plan, or a DMP, is a living document that helps you manage your research
data by outlining in advance what you will do with your data during and after your research
(DataOne, 2012). As a living document, a data management plan can be updated throughout
your research.
A detailed data management plan can help you save time and money by getting you to think
about the different steps in your research process and the resources and tools you will need in
order to organize, store, and share your data now and in the future.
Canadian Requirements
Data management plans are a key requirement of the Tri-Agency Research Data
Management Policy. The Tri-Agency is a joint effort from three Canadian agencies: the
Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering
Research Council of Canada (NSERC), and the Social Sciences and Humanities Research
Council of Canada (SSHRC).
The policy outlines the steps the Tri-Agency will take to incorporate DMPs into grant
applications beginning in the spring of 2022. The policy also requires institutions to outline how
they will support research data and researchers. UBC is in the process of developing its
institutional plan. Many American funding agencies, including the National Science Foundation
(NSF), require data management plans. We recommend that you save yourself a headache
later and incorporate writing a data management plan into your research development
processes. We also recommend that you read the policy in full, as there are nuances that may
apply to your research.
DMPs should state who is responsible for the project’s data management, the succession plan
should they leave, and what the role of each team member is as it pertains to data.
Writing a DMP
Several online tools exist to help you write a DMP. In Canada, the Portage
Network has created DMP Assistant; in the United States, the University of California maintains
DMPTool; and in the UK, the Digital Curation Centre maintains DMPonline. Each of these tools
functions similarly by walking you through the process of writing a DMP.
Here in Canada, the DMP Assistant is available in both English and French. It allows you to
create and export your DMP. Portage has created templates for various fields, and UBC has a
general template for use by UBC researchers. Note that not every question is applicable to
every research project. The Portage Network provides access to free webinars and training
modules on data management and DMP Assistant. If you have questions about your DMP,
please ask your liaison librarian or email research.data@ubc.ca.
Indigenous Research Data
There are specific considerations and protocols surrounding Indigenous research data
collection, use, and sharing.
5. UBC helps to advance the interests, rights, and jurisdiction with respect to
reclamation of information in UBC’s possession including research data, records,
and other types of information.
6. Indigenous data governance standards apply to data that relates to each Nation
and their identity as distinct people, communities, and Nations regardless of
where the data is held across UBC.
7. Individual rights and privacy are protected, while collective rights, privacy, and
security are evolving.
10. All activities of UBC IRSI will be transparent and consistent with co-developed
management processes.
Please visit the IRSI website for more information related to engagement principles, data
governance, and ethics. IRSI is currently developing ethics guidelines for Indigenous research.
CARE Principles for Indigenous Data Governance
The CARE Principles for Indigenous Data Governance are designed to complement the FAIR
principles and take into account the current and historic power imbalances between researchers
and Indigenous communities.
The CARE Principles can be examined in depth here: CARE Principles for Indigenous Data
Governance
The OCAP® Principles of data governance outline how to interact with First Nations data.
OCAP® stands for:
● Ownership - First Nations communities or groups own their data collectively
● Control - First Nations communities can control all aspects of the research cycle that
impact them directly.
● Access - First Nations retain access to the data, regardless of where it is held.
● Possession - First Nations retain physical control of the data.
OCAP® certifications are available through the First Nations Information Governance Centre
(FNIGC). OCAP® is a registered trademark of the FNIGC.
Research Data and DMP References
Digital Curation Centre. (n.d.). Data management plans. Digital Curation Centre.
https://www.dcc.ac.uk/resources/data-management-plans
First Nations Information Governance Centre. (n.d.). The First Nations principles of OCAP®.
https://fnigc.ca/ocap-training/
Indigenous Research Support Services. (2019). Principles for Indigenous data governance.
University of British Columbia.
https://irsi.ubc.ca/transforming-research/indigenous-data-governance
Krier, L., & Strasser, C. A. (2014). Data management for libraries: A LITA guide. Chicago: ALA
TechSource.
Portage Network. (2020). Brief guide - Research data management. Portage Network.
https://doi.org/10.5281/zenodo.4000989
Research Data Alliance International Indigenous Data Sovereignty Interest Group. (2019).
CARE principles of Indigenous data governance. The Global Indigenous Data Alliance.
https://www.gida-global.org/care
Van den Eynden, V., Corti, L., Woollard, M., Bishop, L., & Horton, L. (2011). Managing and
sharing data: Best practices for researchers (3rd ed.). UK Data Archive.
https://ukdataservice.ac.uk/media/622417/managingsharing.pdf
Metadata and Data Organization
What is Metadata?
Metadata is often described as “data about data” and helps answer the questions of who, what,
when, where, why. This descriptive data is essential for creating FAIR and open data, and
ensuring that the datasets you preserve will be accessible for many years to come.
Descriptive: Descriptive metadata describes the content and context of your data at both the
dataset and item level. Examples: title, author, keywords
Structural: Structural metadata describes how different data sets relate to one another, or
any processing or formatting steps that were undertaken. Examples: Information about the
relationship between data sets in a database, file formats
Take a moment to think about your research project. What kind of descriptive, administrative
and structural metadata might you want to record?
Organizing Metadata
Odds are you have already written down a good deal of metadata about your project; hopefully
you don’t plan on doing it all at the end. Save yourself some trouble and start gathering
metadata at the beginning of your project.
Are you unsure what to record? Many disciplines have created their own metadata standards to
ensure that data records can be interpreted and compared across projects and fields. A typical
metadata standard provides a set structure and language for describing your data. Some of the
most common metadata standards include Dublin Core, Darwin Core (for the biological
sciences), and DDI (Data Documentation Initiative).
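To make the idea of a "set structure and language" concrete, here is a minimal Python sketch that checks a record against the fifteen element names of the Dublin Core Metadata Element Set and writes it out as JSON. The helper name `dublin_core_record` and all of the sample values are assumptions made for illustration; your repository may expect a different serialization.

```python
import json

# The fifteen element names of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_record(**fields):
    """Return the fields as a JSON string, rejecting non-Dublin Core keys.

    Hypothetical helper for illustration only.
    """
    unknown = set(fields) - DUBLIN_CORE_ELEMENTS
    if unknown:
        raise KeyError(f"not Dublin Core elements: {sorted(unknown)}")
    return json.dumps(fields, indent=2, sort_keys=True)


# Placeholder values; replace them with details of your own dataset.
print(dublin_core_record(
    title="Example survey responses",
    creator="Doe, J.",
    date="2022-07-28",
    type="Dataset",
    format="text/csv",
    rights="CC0",
))
```

Restricting the keys to a fixed vocabulary is what lets records from different projects be compared: a typo such as `author` instead of `creator` is caught immediately rather than silently creating a new field.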
If you are deciding which metadata standard to use, remember that many data repositories,
organizations, and journals have specific requirements for metadata. Double check before you
commit.
Curious about what metadata standards are common in your field? Take a moment to visit
the following link and find one: Digital Curation Centre:
Disciplinary Metadata Standards
README files
Regardless of which metadata standards you follow, it is important to properly document your
data. README files are the most basic tool for project documentation. They contain basic
descriptive metadata about your project and should accompany your data throughout its life.
README files are plain text files (.txt) that can be opened on any computer. A README file can
pertain to your entire project, or you can create several README files for more complex
datasets.
At the very least you should document the following in a README.txt file stored alongside
your data:
● Contact information of researchers, including ORCID IDs
● Description of dataset
● Sources used
● Date of collection
● Use license that dictates how the data can be reused
● Methods of collection (protocols, sampling, instruments, coverage, etc.)
● Tools used to collect & process the data
● Data modifications made
● Quality assurances (data validation, checking)
● File structure and file relations for the data set
● Explanations of codes, classifications, variables, and file names
Cornell has a very useful README template that you can use to build your own.
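As a rough illustration of the checklist above, the following Python sketch writes a skeleton README.txt with one heading per item, so that nothing on the list is forgotten. The function name `write_readme_skeleton` and the exact heading wording are assumptions made for this example; they are not taken from the Cornell template.

```python
from pathlib import Path

# Headings follow the checklist above; the wording is illustrative.
README_SECTIONS = [
    "Contact information (include ORCID IDs)",
    "Description of dataset",
    "Sources used",
    "Date of collection",
    "Use license",
    "Methods of collection",
    "Tools used to collect and process the data",
    "Data modifications made",
    "Quality assurances",
    "File structure and file relations",
    "Explanations of codes, classifications, variables, and file names",
]


def write_readme_skeleton(folder, title):
    """Create README.txt in `folder` with an empty entry for each section.

    Hypothetical helper: it refuses to overwrite an existing README so
    that documentation already written is never lost.
    """
    path = Path(folder) / "README.txt"
    if path.exists():
        raise FileExistsError(f"{path} already exists")
    lines = [title, "=" * len(title), ""]
    for section in README_SECTIONS:
        lines += [section.upper(), "(to be completed)", ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Calling `write_readme_skeleton("my_project", "My project")` on the first day of a project (the folder must already exist) gives the team a single place to fill in details as they arise.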
In addition to what you write, how you write it is very important. Always remember to be as clear
as possible! It is easy to take for granted what is “common knowledge.” Remember that
common knowledge changes over time and failing to record something because “everyone
does it this way” could have dire consequences for the future accessibility and reusability of
your data.
● Use descriptive titles
● Be specific and quantify
● Use keywords
● Make it machine readable (avoid symbols)
Finally, don’t wait to document your data! If you wait until the end of your project, you might
lose valuable information!
File naming
Research projects can generate hundreds or even thousands of individual data files. Proper file
names and organization can make these files easier to locate and navigate. But even if you
don’t have hundreds of files, creating a file naming structure will keep your research organized,
especially within research teams.
It is recommended that you choose a file naming convention and implement it throughout the
duration of your project. Make sure that everyone on your team is following the same rules for
naming files. When deciding how to name your files remember the following:
DO: 20180403
DON’T: 04032018
2. Use a short unique identifier (e.g. Project Name or Grant #) to reduce the need to scroll
horizontally in order to read the file name.
DO: CHHM
3. Include a summary of content (e.g. Questionnaire or GrantProposal) as part of the file
name
DO: FileNm_Guidelines_20180409_v01.docx
DON’T: FileNm_20180409.docx
4. Use _ (underscore) as a delimiter. Avoid spaces between words and these special
characters: & , * % # ( ) ! @ $ ^ ~ ‘ { } [ ] ? < > – as different
operating systems handle special characters differently. Using special characters can
impact the ability of a file to be opened or change how the system sorts the files.
DO: FileNm_Guidelines_20140409_v01.docx
5. Keep track of document versions either sequentially (e.g. v01, v02) or with a unique
date and time (e.g. 20140403_1800).
DO: FileNm_Guidelines_20140409_v01.docx
DON’T: FileNm_Guidelines_20140409_Review.docx OR
FileNm_Guidelines_20140409_Investigation.docx
6. A good file naming system will replace an extensive folder hierarchy. Limit the number of
nested folders and strive to make hierarchies as simple as possible. Complex folder
hierarchies are harder to navigate, offer more opportunities for filing errors, and can make
system backups take longer.
DON’T:
F:/Environment/Library/Woodward/Data/Education/Materials/Draft/2014/04/DataMgmt_FileFormats_20140409_v01.docx
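One way to keep a whole team on the same convention is to generate file names rather than type them. The Python sketch below assembles a name from the parts shown in the DO examples above: a short identifier, a content summary, a YYYYMMDD date, and a two-digit version, joined by underscores. The function name `build_file_name` and the choice to allow only letters and digits in each part are assumptions for this example.

```python
import re
from datetime import date

# Letters and digits only; the underscore is reserved as the delimiter.
_SAFE_PART = re.compile(r"[A-Za-z0-9]+")


def build_file_name(identifier, content, when, version, extension):
    """Return a name such as FileNm_Guidelines_20180409_v01.docx."""
    for part in (identifier, content, extension):
        if not _SAFE_PART.fullmatch(part):
            raise ValueError(f"spaces or special characters in {part!r}")
    if version < 1:
        raise ValueError("version numbers start at 1")
    stamp = when.strftime("%Y%m%d")  # year first, so names sort by date
    return f"{identifier}_{content}_{stamp}_v{version:02d}.{extension}"


print(build_file_name("FileNm", "Guidelines", date(2018, 4, 9), 1, "docx"))
# FileNm_Guidelines_20180409_v01.docx
```

Because the function rejects spaces and special characters outright, a name that would sort or open unpredictably on another operating system never gets created in the first place.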
Metadata References
Krier, L., & Strasser, C. A. (2014). Data management for libraries: A LITA guide. Chicago: ALA
TechSource.
Library of Congress. (2021). Sustainability of digital formats planning for Library of Congress
collections. Library of Congress. http://www.digitalpreservation.gov/formats/
Portage Network. (2020). Brief guide - Research data management. Portage Network.
https://doi.org/10.5281/zenodo.4000989
University of Oregon Libraries. (2021). Research guide: Research data management. University
of Oregon. https://library.uoregon.edu/research-data-management
Virginia Tech Digital Library and Archives. (2017, June 16). Recommended file formats. Virginia
Tech. https://etd.vt.edu/howto/accept.html
Data Storage and Security
Data Storage
Data storage and security considerations are essential aspects of managing research data and
should be mapped out in your data management plan. At the beginning of any project
researchers should map out what data they will be generating and how they plan on storing it. In
deciding where to store your data ensure that you understand your organization’s policies and
infrastructure for data storage and backups. This includes considering the most appropriate
storage system for sensitive data and what institutional policies apply to its handling.
A best practice is to keep three copies of your data in at least two locations (in case of a
failure at one location), with one copy off-site. Even if each location is cloud-based, do not
keep all of your backups with the same cloud service. Cloud services do have
internal redundancies to prevent the loss of data, but using more than one service is a good
precaution against a catastrophic loss.
Another essential step in data storage is to retain an original, unedited copy of your raw data
file. This file should be locked as read-only, so that any changes must be made to a copy.
It is imperative that you do not overwrite this file, so that you have a fail-safe to return to
should something go awry.
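A minimal Python sketch of this practice follows, assuming the raw data lives in an ordinary file on your own computer. The function names `lock_raw_file` and `working_copy` are hypothetical. Note that file permission bits guard against accidental edits only; they are not a substitute for access controls or backups, and on Windows only the read-only flag is affected.

```python
import shutil
import stat
from pathlib import Path


def lock_raw_file(path):
    """Remove write permission so the raw file is not overwritten by accident."""
    path = Path(path)
    mode = path.stat().st_mode
    path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def working_copy(raw_path, work_dir):
    """Copy the locked raw file into `work_dir` and return the copy's path."""
    raw_path = Path(raw_path)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    copy = work_dir / raw_path.name
    # copyfile copies the contents only, so the copy is not read-only.
    shutil.copyfile(raw_path, copy)
    return copy
```

All cleaning and analysis then happens on the file returned by `working_copy`, while the locked original stays untouched.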
Remember, just because you have saved your data doesn’t mean it is safe! Data can be lost
for a number of reasons including:
● hardware failures
● software failures
● viruses or hacking
● power failures
● natural disasters
● human error
● theft of equipment
Even if you are backing up your data, remember to check that the backups are working and that
the data is accessible. Every time you edit your working copy, the backup copies should be
updated! A backup copy from 6 months ago that contains none of your recent data is practically
useless. This backup copy should include all pertinent files, including your README files. Think
of each backup as a complete packaged copy of your working files, allowing you to return to
work without any rework should you need to utilize your copies. Finally, backing up the entire
package of stored data helps ensure that everything can be understood in the future.
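Checking that a backup is complete and current can be partly automated. The sketch below, in Python, compares every file in a working folder against a backup folder using SHA-256 checksums and reports files that are missing from the backup or whose contents differ. The function names and the folder layout (the backup mirrors the working folder) are assumptions for this example; a dedicated backup tool may already do this for you.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_backup(working_dir, backup_dir):
    """List (relative path, problem) pairs for files not safely backed up."""
    working_dir, backup_dir = Path(working_dir), Path(backup_dir)
    problems = []
    for source in sorted(working_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(working_dir)
        target = backup_dir / relative
        if not target.is_file():
            problems.append((relative.as_posix(), "missing"))
        elif sha256_of(source) != sha256_of(target):
            problems.append((relative.as_posix(), "differs"))
    return problems
```

An empty list means every working file, README files included, has an identical copy in the backup. The check runs in one direction only, so files that exist only in the backup are not reported.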
File Formats
A file format is a way of encoding information within a computer file so that it can be recognized
by an application and accessed. It is indicated by the file name extension (generally a full stop
followed by three letters such as .txt, .doc, .jpg, .mov). In other words, this allows the computer
to recognize that a document contains text or that a file should be processed as a video.
Additionally, file formatting is important as this may affect whether the file contents are
accessible following long-term storage.
File formats are an essential consideration in data storage. Software and data storage
technology change quickly, and files can easily become obsolete or difficult to access. In
general, it is recommended that data files be copied to new media every 2-5 years, especially if
technology changes or if files begin to degrade.
open format is a file format that is freely available for everyone to use. Because the
specifications are released, open-source developers can write software to utilize the file
format in the case that a particular vendor no longer supports the file format. This
increases the chances that technological developments do not make particular file
formats obsolete.
3. Technical dependencies
Technical dependencies are the degree to which a particular format depends on
particular hardware, operating system, or software and how these dependencies might
influence future usage of the media. Using non-proprietary file formats may decrease the
risk of technical obsolescence by removing the dependency on the underlying
technology.
Digital Images
● TIFF version 6 uncompressed (.tif)
● JPEG (.jpeg, .jpg)
● TIFF (other versions)(.tif, .tiff)
● JPEG 2000 (.jp2)
● Adobe Portable Document Format (PDF/A, PDF) (.pdf)
Digital Sound
● AIFF (96kHz 16bit PCM) (.aif, .aiff)
● FLAC (.flac)
● MP3 (.mp3)
● WAV (96kHz 24bit PCM) (.wav)
Digital Video
● MPEG-4 High Profile (.mp4)
E-Books
● EPUB
Data security
○ Are there firewalls?
● Physical security
○ Who has access to the computers?
○ Who can access physical files?
○ How is data transported?
● Computer security
○ Is antivirus software up to date?
○ Are you protected against power surges?
○ Do you use passwords and firewalls?
○ Is data encrypted?
○ Is data storage secure?
If you are dealing with private or sensitive data make sure you understand your organization’s
regulations about storage, security, and disposal. Data can be sensitive due to direct and
indirect identifiers, but can also be due to data ownership, use agreements, etc. If you’re
unsure, please ask. Some countries, including Canada, do not allow personal data to be stored
on servers outside the country, making commercial storage systems like Dropbox or Google
Drive unusable for files containing personal information.
Finally, remember that just because you deleted something doesn’t mean it can’t be recovered!
To destroy data, you must overwrite a hard drive, physically destroy memory sticks and shred
paper documents.
For more help and training on data security, please visit Privacy Matters @ UBC, which offers a
two-part training module on privacy and information security.
Data Storage and Security References
Krier, L., & Strasser, C. A. (2014). Data management for libraries: A LITA guide. Chicago: ALA
TechSource.
Portage Network. (2020). Brief Guide - Research data management. Portage Network.
https://doi.org/10.5281/zenodo.4000989
Research Data Management Services Group. (n.d.). Data management planning. Cornell
University. https://data.research.cornell.edu/content/data-management-planning
Van den Eynden, V., Corti, L., Woollard, M., Bishop, L., & Horton, L. (2011). Managing and
sharing data: Best practices for researchers (3rd ed.). UK Data Archive.
https://ukdataservice.ac.uk/media/622417/managingsharing.pdf
Data Sharing and Reuse
Have you considered what you might do with your data once your project has finished? Have
you thought that someone else might benefit from your raw data? You might want to consider
sharing your data!
If that weren’t incentive enough, the Canadian Social Sciences and Humanities Research
Council (SSHRC) and Canadian Institutes of Health Research (CIHR) require grantees to
deposit their data in publicly accessible repositories.
While sharing research data can have huge benefits, there are sometimes barriers to sharing.
Preparing data for a repository can be time-consuming, and concerns about legal and ethical
issues can make researchers wary of sharing data with others. Some types of data are simply
not meant to be shared. These include trade secrets, medical information, commercial
information, preliminary analysis, third party data, and some geospatially linked data. Other
data, however, can be shared after it has been anonymized.
In order to ensure you are sharing data in an ethical manner you should:
Once you have decided you are interested in sharing your data, how do you go about sharing
it?
Data repositories are an especially great way to share data as many of them offer long-term
storage and preservation, regular backups, licensing arrangements, and online discovery and
data promotion.
Data repositories exist at the institutional, national, and discipline level. It’s probably a good idea
to check with your colleagues and peers to see whether there is a recommended repository in
your field.
UBC Supported Repositories
FRDR, the Federated Research Data Repository, is a Canadian national research data
repository. It allows researchers to discover, share, and download Canadian research data. It
complies with FAIR principles. FRDR is great for large individual files or for a large number of
research files. FRDR mints DOIs as well, and allows you to apply a reuse license. Geodisy is a
geographic overlay within FRDR that allows you to search for data by research location.
Dryad is an international data repository that supports access to data underlying published
literature. UBC is a Dryad institutional partner. Dryad is able to assign DOIs and licenses,
typically CC0.
UBC cIRcle is UBC’s digital repository for research and teaching materials created by the UBC
community and its partners. Materials in cIRcle are openly accessible to anyone on the web,
and will be preserved for future generations.
Licensing Data
When you submit your data to a repository it is a good idea to license your data. Licensing data
allows researchers to clearly state how they want their data to be used and makes it easier for
others to re-use the data. While data itself does not fall under copyright protection, datasets and
databases do, and the easiest way to protect your copyright while allowing access is by
attaching a license.
Before deciding what license to use, you must first ensure that you yourself have permission to
license the data, as only the rights holder can grant a license. Once you are sure you can grant
a license, you must choose which license to apply. Make sure to check with your organization
or repository as they might recommend a certain license or provide one for you.
The most common data licenses are from Creative Commons and the Open Data
Commons. Each has standard sets of licenses that allow data to be used in different ways.
Alternatively, you can place your data in the public domain, allowing free and unrestricted
access. The Creative Commons Zero (CC0) license is the most popular copyright waiver.
Citing Data
Let’s say you are interested in using someone else’s data that you have located in a repository.
How do you cite it? Many journals and conferences have established data citation rules. Most
citation styles, besides APA, have not yet formally included datasets within their citation
standards. Generally, it is a good idea to include the following information:
● Author/creator
● Date created
● Title
● Publisher
● Persistent Identifier (e.g. DOI)
For more information on citing datasets please see the UBC Library guide on How to Cite.
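To show how the five elements above fit together, here is a small Python sketch that assembles them into a single citation string. The layout is loosely modelled on APA style and is an assumption for this example, as are the class name `DatasetCitation` and every sample value, including the DOI. Always follow the style required by your journal or conference.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    creator: str
    year: int
    title: str
    publisher: str
    doi: str  # the bare DOI, without the https://doi.org/ prefix

    def format(self):
        """Return the citation as one line, ending with the DOI as a URL."""
        return (
            f"{self.creator} ({self.year}). {self.title} [Data set]. "
            f"{self.publisher}. https://doi.org/{self.doi}"
        )


# Placeholder values only; this is not a real dataset or DOI.
example = DatasetCitation(
    creator="Doe, J.",
    year=2022,
    title="Example survey responses",
    publisher="Example Repository",
    doi="10.5072/example",
)
print(example.format())
```

Keeping the elements as separate fields means the same record can be reformatted for a different citation style without retyping anything.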
Data Sharing and Reuse References
Krier, L., & Strasser, C. A. (2014). Data management for libraries: A LITA guide. Chicago: ALA
TechSource.
Van den Eynden, V., Corti, L., Woollard, M., Bishop, L., & Horton, L. (2011). Managing and
sharing data: Best practices for researchers (3rd ed.). UK Data Archive.
https://ukdataservice.ac.uk/media/622417/managingsharing.pdf
Version History