Data Management Plans
JENNIFER L. THOEGERSEN, DATA CURATION LIBRARIAN
NURAMP WORKSHOP SERIES
MARCH 17, 2016

Jenny Thoegersen, Data Curation Librarian
University of Nebraska-Lincoln Libraries
jthoegersen2@unl.edu

Agenda
  Data Management
    Definition
    Importance
  Overview of Data Management Plans (DMPs)
  Components of a DMP
  Library Services
  DMP Activity

Acronyms
  DMP       Data Management Plan
  PII       Personally Identifiable Information
  DOI       Digital Object Identifier
  ARK       Archival Resource Key
  URN       Uniform Resource Name
  PURL      Persistent Uniform Resource Locator
  CSV       Comma-Separated Values
  TIF/TIFF  Tagged Image File (Format)
  XML       Extensible Markup Language
  UNLDR     UNL Data Repository and Registry

Definition
  "Data Management is the process of controlling the information generated during a research project"
  Penn State University Libraries

Importance
  Validity
  Funding
  Security
  Publishing

Joint Data Archiving Policy (JDAP)
  Requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive.
  Dryad (2014-04-04). Joint Data Archiving Policy (JDAP). http://datadryad.org/pages/jdap

DMPs for Proposals
  Follow guidelines provided by granting agency, directorate, division, and solicitation
  Keep the plan clear, complete, and concise
  Refer back to the project proposal, if necessary
  Start early!
  Recheck requirements for changes

NSF Basic DMP Requirements
  1. Types of data
  2. Standards for data and metadata
  3. Policies for sharing and protection
  4. Provisions for re-use
  5. Plans for preservation
  From the Grant Proposal Guide (http://www.nsf.gov/pubs/policydocs/pappguide/nsf13001/gpg_2.jsp)

Data Management Components
  Planning
  Data
  Metadata
  Storage & Backup
  Legal & Ethical Issues
  Preservation & Sharing

Data
  Include ALL data to be produced or used in the project
  Explicitly match each data type to its file format
  Consider open, widely used formats for sharing and preservation
    http://en.wikipedia.org/wiki/open_format
    Library of Congress Formats Recommendations
    http://www.loc.gov/preservation/resources/rfs/toc.html

Examples of Open Formats
  CSV
  XML
  tar
  HTML
  PNG
  FLAC
  MKV
  Plain text
  EPUB
  LaTeX
  JSON
  OpenDocument
  PDF/A
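
As a minimal sketch of saving tabular data in two of the open formats listed above (CSV and JSON), the following uses only the Python standard library. The sample records, field names, and the helper names to_csv and to_json are hypothetical and are not taken from the workshop materials.

```python
import csv
import io
import json

# Hypothetical measurements; the field names are illustrative only.
rows = [
    {"sample_id": "S1", "temperature_c": 21.5},
    {"sample_id": "S2", "temperature_c": 22.1},
]


def to_csv(records):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize the same records to indented JSON text."""
    return json.dumps(records, indent=2)


csv_text = to_csv(rows)
json_text = to_json(rows)
```

Both outputs are plain text, so they can be opened without the software that produced them, which is the main reason open formats are suggested for sharing and preservation.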

Metadata
  Provides contextual and descriptive information
  Aids in discovery
  Should use standards, if possible
  http://www.dcc.ac.uk/resources/metadatastandards
  http://datadryad.org/resource/doi:10.5061/dryad.1321/1

Storage & Backup
  Identify where data will be stored during the project
  Specify access and security restrictions
  Explain how data will be backed up
    What data will be backed up?
    Where?
    How often?
  Make it clear that you will maintain at least 3 copies of your data at all times
    1 should be remote

Backup: 3-2-1
  3 copies
  2 media types
  1 remote
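
The 3-2-1 rule above (3 copies, 2 media types, 1 remote) can be checked mechanically against a backup plan. The sketch below is only an illustration: the Copy class, the satisfies_3_2_1 function, and the example locations are hypothetical and do not describe any particular UNL service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data (hypothetical structure for illustration)."""
    location: str
    media: str
    remote: bool


def satisfies_3_2_1(copies):
    """Return True if there are >= 3 copies, >= 2 media types, and >= 1 remote copy."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_remote = any(c.remote for c in copies)
    return enough_copies and enough_media and has_remote


# Example plan: the locations and media names are made up.
plan = [
    Copy("lab workstation", "internal disk", False),
    Copy("external hard drive", "external disk", False),
    Copy("off-site server", "network storage", True),
]

plan_ok = satisfies_3_2_1(plan)
```

Dropping the off-site copy from the example plan makes the check fail on two counts: only two copies remain, and none is remote.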

Storage & Backup Options at UNL
  Departmental servers
  Box
  Personal computer
  External hard drives
  Holland Computing Center
  Nsave backup by ITS

Cloud Considerations
  Research data can be stored in the cloud
  Must be conscious of security, privacy, confidentiality, legal, and access issues
    HIPAA
    Export controls
    PII
  See box.unl.edu for details on allowable data

Ethical & Legal Considerations
  DMP must address whether there are any legal, ethical, or intellectual property (IP) considerations for the data
  If human subjects are involved, how will privacy be protected?
  Consider in terms of storage, access, and potential sharing
  State explicitly if not applicable

Preservation & Sharing
  Identify what data will be preserved and shared, when, where, and for how long
  May be able to embargo data for a time (usually 1-2 years)
  Data repositories are encouraged for preservation/sharing
  Use re3data.org to search for repositories in your discipline
  Consider cost, longevity, audience
  Popular repositories include ICPSR, Dryad, and Figshare

Data identifiers
  Persistent identifiers assist in citation and long-term access
  Common systems include:
    DOI
    ARK
    URN
    PURL
  Image: Message error 404 by Roberto Zingales, https://www.flickr.com/photos/filicudi/2891898817 (CC BY 2.0)

Services at UNL Libraries
  Workshops
  Consultations
  UNL Data Repository & Registry
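
Returning to the persistent identifiers on the Data identifiers slide: a DOI becomes a stable link when it is appended to the https://doi.org/ resolver. The sketch below shows one way to normalize a DOI string into such a link. The function name doi_to_url and the list of prefixes it strips are assumptions made for illustration.

```python
from urllib.parse import quote


def doi_to_url(doi):
    """Return the https://doi.org/ resolver URL for a DOI string.

    Accepts a bare DOI or one carrying a common prefix (assumed list below).
    """
    doi = doi.strip()
    for prefix in ("doi:", "https://doi.org/", "http://doi.org/", "http://dx.doi.org/"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Percent-encode unusual characters but keep the "/" between prefix and suffix.
    return "https://doi.org/" + quote(doi, safe="/")


# DOI of the Roche et al. (2014) article cited in these slides.
article_url = doi_to_url("doi:10.1371/journal.pbio.1001779")
```

Citing the resolver link rather than a publisher page address is what keeps the reference working if the hosting site reorganizes its URLs.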

Potential Workshop Topics
  File organization
  Data management plans
  Storage and backup
  File formats
  Documentation and metadata
  Data repositories

Consultations
  Every Project Is Unique
  Image: By Muhammad Rafizeldi (MRafizeldi) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

UNLDR
  Deposit inactive datasets
  Data can be private or public (with embargo if needed)
  Assign DOIs to public data
  Guarantee 20 years
  50 GB free for UNL researchers
  Roche DG, Lanfear R, Binning SA, Haff TM, Schwanz LE, et al. (2014) Troubleshooting Public Data Archiving: Suggestions to Increase Participation. PLoS Biol 12(1): e1001779. doi:10.1371/journal.pbio.1001779

Bach, Roger and Batelaan, Herman (2015): Electron Double Slit and Talbot-Lau Inteferometer. UNL Data Repository. Dataset. http://dx.doi.org/10.13014/k2rn35sz

Data Management Plan Activity
  See separate handout

Review an NSF Data Management Plan
  Read through the DMP from a successful NSF Grant Proposal
  Note where the researchers addressed:
    Data
    Metadata
    Storage & Backup
    Legal & Ethical Considerations
    Preservation & Sharing

Resources
  Overview:     unl.libguides.com/datamanagement
  Sample DMPs:  go.unl.edu/sample_dmps
  UNLDR:        dataregistry.unl.edu
  Email:        jthoegersen2@unl.edu