Research Data Management Lifecycle Checklist

Collect & Create

What types of data will be produced? What standards will be used for data documentation and metadata?

☐ Project Description
Begin by building or locating a detailed README.txt overview of your project immediately. Include a brief description of the data that will be collected, along with any value definitions, questionnaires or instruments, and analysis procedures. Examples of data documentation include lab notebooks and experimental protocols, questionnaires, codebooks, data dictionaries, software syntax and output files, information about your equipment settings and calibration, database schemas, methodology reports, and provenance information.

☐ Data Gathering
Explain how the data will be collected and/or describe any existing data being used (citations, links, and DOIs). Create an organizational workflow for the data, detailing any methods, procedures, and/or protocols used.

☐ Electronic Lab Notebook
ELNs allow users to enter protocols, observations, notes, and other data using a computer or mobile device. Determine how you will document data collection methods.

☐ Protocols
Record protocols and methods collaboratively so that they can be edited and shared when appropriate. protocols.io is a collaborative platform and preprint server for methods and protocols that allows you to create step-by-step, detailed, interactive, and dynamic protocols that can be run on mobile or web. Free Carbon accounts are available to HMS/HSDM/HSPH email holders.

☐ Data Types & Formats
Describe the types (imaging data, genomic, Qx, etc.) and formats of the data. If you need to convert or migrate your data files from one format to another, be aware of the potential risk of data loss or corruption and take appropriate steps to avoid or minimize it.
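One way to reduce the conversion risk described above is to verify every conversion rather than trust it. The sketch below is a minimal Python example using only the standard library; it assumes a tab-delimited source file, and the function names `sha256_of` and `convert_tsv_to_csv` are hypothetical, not part of any institutional tool. It records a SHA-256 checksum of the source, converts it to CSV, re-reads the output to confirm every cell survived, and confirms the source file itself was left untouched.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def convert_tsv_to_csv(src: Path, dst: Path) -> dict:
    """Convert a tab-delimited file to CSV (hypothetical helper) and verify nothing was lost."""
    before = sha256_of(src)
    with src.open(newline="", encoding="utf-8") as fin:
        rows = list(csv.reader(fin, delimiter="\t"))
    with dst.open("w", newline="", encoding="utf-8") as fout:
        # The csv writer quotes cells that contain commas, so values survive intact.
        csv.writer(fout).writerows(rows)
    # Round-trip check: re-read the new file and compare it cell by cell.
    with dst.open(newline="", encoding="utf-8") as fin:
        round_trip = list(csv.reader(fin))
    if round_trip != rows:
        raise ValueError(f"Conversion of {src} altered the data")
    # The source (master copy) must be unchanged by the conversion.
    if sha256_of(src) != before:
        raise RuntimeError(f"Source file {src} changed during conversion")
    return {"rows": len(rows), "source_sha256": before, "output_sha256": sha256_of(dst)}
```

Storing the returned checksums alongside the converted file gives you a record you can re-check later to detect silent corruption.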
Justify your choice of format: is it open, non-proprietary, and in widespread use? Plan to create sustainable file formats such as .txt, .pdf, and .csv.

☐ Data Volume
Estimate how much data will be produced throughout the project, at what rate it will grow, and whether the production rate will change during the project.

☐ Data Access
Address any tools or software needed to create, process, and/or visualize the data.

2022-12-16

☐ Metadata Standards
Use community standards for sharing and integration. Metadata standards include Dublin Core, GMS, ISO 19115:2003(E) (Geo), PREMIS, and MIBBI. Be sure to include all the information needed for the data to be read and interpreted in the future.

☐ Data Dictionary or README
Create a data dictionary or README files that define variables, measurement units, formats, and data types. Ensure data quality and integrity during collection by creating training documentation that includes the data dictionary, establishing a shared understanding of what is collected or recorded and why, and documenting data decisions (e.g., raw vs. compound variables; age vs. DOB).

☐ Code Documentation
Establish documentation for code, scripts, and software, and revise it as the project continues. Include descriptive comments within code or scripts to explain what they do. Record scripts for every stage of data processing and/or have a plan to document every manual action or change.

☐ Data Organization
Establish rules for data organization and describe your file naming and folder structure. Research data files and folders need to be labeled and organized in a systematic way agreed upon by the entire research team, so that they are both identifiable and accessible for current and future users. Reach team consensus on standard file naming conventions and versioning plans.
• Will someone new to the project be able to follow the workflow easily?
• Are the process and organization consistent throughout?
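A data dictionary written in machine-readable form can also be used to check records as they are collected, not just to document them. The sketch below is a minimal Python example that assumes records arrive as dictionaries; the `Variable` class, `validate_record` function, and the variables, units, and ranges in `DATA_DICTIONARY` are hypothetical illustrations, not recommended values.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Variable:
    """One data dictionary entry: name, type, unit, and allowed range."""
    name: str
    dtype: type
    unit: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    description: str = ""


# Hypothetical data dictionary for a small study (names and ranges are illustrative).
DATA_DICTIONARY = (
    Variable("subject_id", str, "n/a", description="Study-assigned identifier"),
    Variable("age", int, "years", 0, 120, "Age at enrollment (derived from DOB)"),
    Variable("weight", float, "kg", 0.5, 400.0, "Body weight at baseline"),
)


def validate_record(record: dict, dictionary: Sequence[Variable] = DATA_DICTIONARY) -> list:
    """Return a list of problems found in one record; an empty list means it passed."""
    problems = []
    expected = {v.name for v in dictionary}
    # Flag fields that nobody documented.
    for extra in sorted(set(record) - expected):
        problems.append(f"{extra}: not defined in the data dictionary")
    for var in dictionary:
        if var.name not in record:
            problems.append(f"{var.name}: missing")
            continue
        value = record[var.name]
        if not isinstance(value, var.dtype):
            problems.append(f"{var.name}: expected {var.dtype.__name__}, got {type(value).__name__}")
            continue
        if var.min_value is not None and value < var.min_value:
            problems.append(f"{var.name}: {value} {var.unit} is below {var.min_value}")
        if var.max_value is not None and value > var.max_value:
            problems.append(f"{var.name}: {value} {var.unit} is above {var.max_value}")
    return problems
```

For example, `validate_record({"subject_id": "S002", "age": 150, "weight": 70.5})` returns `["age: 150 years is above 120"]`. The type check is deliberately strict (a weight entered as the integer 70 is flagged), which surfaces data-entry inconsistencies early; relax it if your team decides otherwise and record that decision in the README.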
☐ File Naming: Be consistent and use descriptive names (e.g., 20190112-RawData-Smith).
☐ Folder Structure: Create folders according to file naming conventions, document their purpose, and ensure files are stored in the correct folders.
☐ Versioning: Establish naming and storage rules for different versions of the same data. Version control can be achieved manually or with a system (e.g., Git).
☐ Inventory: Track data files using a simple spreadsheet, database, or collaborative tool.

Analyze & Collaborate

What software or tools will be used for data analysis? How will you collaborate and document the process over time?

☐ Active Data
Review institutional storage options to better understand where to store data based on behavior, performance, and means of access. A master copy of raw data should be retained, with further changes to subsequent versions well documented.

☐ Analysis-Ready Datasets
Analysis-ready datasets have been responsibly collected and reviewed so that analysis of the data yields clear, consistent, and error-free results to the greatest extent possible. When working on a research project, take steps to ensure that your data is safe, authentic, and usable. Because raw data is often unstructured, data cleaning is part of the analysis process.

☐ Document Your Process
Consider the software you use for analysis, and whether those applications automatically generate documentation about your data files and process steps. Keeping track of your steps can save you time when you want to recreate your work or share your methodology with others.

☐ Data Analytics
Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.
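The File Naming and Versioning items earlier in this checklist are easier to follow consistently when names are generated and checked by a script. The sketch below is a minimal Python example of the date-description-owner pattern shown in 20190112-RawData-Smith; the optional `_vNN` version suffix, the allowed characters, and the names `make_name` and `check_name` are hypothetical choices for illustration, to be replaced by whatever rules your team agrees on.

```python
import re
from datetime import date, datetime
from typing import Optional

# Hypothetical team convention: YYYYMMDD-Description-Owner, an optional _vNN
# version suffix, and an optional extension (e.g. 20190112-RawData-Smith_v02.csv).
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})-(?P<desc>[A-Za-z0-9]+)-(?P<owner>[A-Za-z]+)"
    r"(?:_v(?P<version>\d{2}))?(?P<ext>\.[A-Za-z0-9]+)?$"
)


def make_name(day: date, description: str, owner: str,
              version: Optional[int] = None, ext: str = "") -> str:
    """Build a file name that follows the convention."""
    # Drop spaces and punctuation so names stay portable across systems.
    desc = re.sub(r"[^A-Za-z0-9]", "", description)
    name = f"{day:%Y%m%d}-{desc}-{owner}"
    if version is not None:
        name += f"_v{version:02d}"
    return name + ext


def check_name(name: str) -> bool:
    """Return True if a name matches the convention and carries a real calendar date."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return False
    try:
        # Reject impossible dates such as month 13.
        datetime.strptime(match.group("date"), "%Y%m%d")
    except ValueError:
        return False
    return True
```

Running `check_name` over a folder listing is a quick way to spot files that drifted from the agreed convention before they are added to the inventory.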
☐ Shared Drive
Centralized file servers are secure and backed up nightly. Storing your personal and departmental documents on Harvard servers protects you from the data loss that can occur when files are stored on local computers or portable hard drives.

☐ Dropbox (or other cloud storage)
Dropbox and other cloud sharing options offer secure ways to store, sync, and share data across platforms and around the globe, keeping your computer up to date with your working files.

Collaborate

☐ Wiki
Interactive wiki-type platforms can offer researchers solutions for web-based project management and content collaboration.

☐ High-Performance Computing (HPC)
HPC clusters offer the equivalent computational power of thousands of workstations and can make your analysis faster and more efficient. HPC environments are typically connected to high-performance storage clusters where you can store the valuable data being processed.

☐ Electronic Lab Notebooks
Consider using an Electronic Lab Notebook for data collection. Electronic notebooks allow users to enter protocols, observations, notes, and other data using a computer or mobile device.

☐ GitHub
Version control can help you understand how the code or writing came to be, who wrote or contributed particular parts, and who you might ask to help understand it better. While version control is not meant to be a backup solution, using it means that your code and writing can be stored on multiple other computers. Git is the most common and widely accepted version control software; you can run it locally on your computer or through services like GitHub or Bitbucket. With Git, you have versions of the whole project, not just versions of each file as in Google Drive or Dropbox. When using Git, you are prompted along the way to add comments (commit messages) to your changes, which lets you capture the documentation of a project as you go.
Git is a free, open-source tool that can be downloaded to your local machine and used to log all changes made to a designated group of computer files over time. It can be used to control file versions locally, by you alone on your computer, and to coordinate simultaneous work on a group of files shared among several people. GitHub is a popular website for hosting and sharing Git repositories remotely. It offers a web interface and a mixture of free and paid services for working with such repositories.

☐ Image Management
Images can be collected in a number of different ways, such as in-house scanning or photography, digital creation, or purchase from outside sources. Just as with any data gathering process, researchers should have a plan for image management: collecting, capturing, analyzing, and storing images.
Preferred file formats for image data:
• Moving images: MOV, MPEG, AVI, MXF
• Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
☐ OMERO: client-server software for visualization, management, and analysis of biological microscope images
☐ Adobe Bridge: free software for locally organizing images
☐ ImageJ: free, open-source, Java-based image processing and display tool
☐ Tropy: free and open-source software that allows you to organize, manage, and describe photographs of research materials
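Checking an image collection against the preferred formats listed above can be scripted as part of an image management plan. The sketch below is a minimal Python example that assumes formats can be recognized by file extension; it does not inspect file contents, so a mislabeled file will be misclassified. The extension mapping (for example, .jp2 for JPEG 2000 and .tif/.tiff for TIFF) and the names `classify_image_file` and `audit` are hypothetical.

```python
from pathlib import Path

# Preferred formats from the checklist, keyed by assumed file extensions.
# Note that plain JPEG (.jpg) is not listed; only JPEG 2000 (.jp2) is.
PREFERRED_EXTENSIONS = {
    "moving": {".mov", ".mpeg", ".mpg", ".avi", ".mxf"},
    "still": {".tif", ".tiff", ".jp2", ".pdf", ".png", ".gif", ".bmp"},
}


def classify_image_file(path: str) -> str:
    """Return 'moving', 'still', or 'non-preferred' based on the file extension."""
    suffix = Path(path).suffix.lower()
    for category, extensions in PREFERRED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return "non-preferred"


def audit(paths) -> dict:
    """Group file paths by category so non-preferred files can be reviewed for conversion."""
    report = {"moving": [], "still": [], "non-preferred": []}
    for p in paths:
        report[classify_image_file(p)].append(p)
    return report
```

Files that land in the non-preferred group are candidates for conversion, which should be verified in the same way as any other format migration.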
