Data management: background and steps to implementation. A pragmatic approach.


Research and data management through the years: find the differences

Research and data management through the years: find the similarities

Research and data management through the years: similarities
- Lots of data (and increasing explosively)
- Lots of information
- Unstructured
- Poorly accessible from outside
- Poorly searchable; dependent on the index used
- Items hardly findable; dependent on the system used
- Hardly administered
- Unsafe (content and carrier)
- Ownership based on geographical location

Research and data management through the years: as a result, data are
- Poorly accessible
- Poorly findable
- Poorly searchable
- Poorly reproducible
- Poorly verifiable
- Poorly reusable
- Unsafe

How do we solve that?

First, some definitions, theories and thoughts

Definitions: data versus information
Data are facts. When data are processed, organised, structured or presented in a specific context in order to make them useful, they are called information.

Definitions: what are research data?
Research data are data collected during research in order to be analysed and thereby produce original results, e.g. measurements, pictures, models, chromatograms, surveys.

Definitions: what are Big Data, and what makes those data big?
Gartner: 1. high volume; 2. high velocity; 3. high variety.

Definitions: what are metadata?
Metadata are data describing the characteristics of data. In short, metadata are data about data.
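To make the distinction concrete, here is a minimal Python sketch (not from the original slides) that pairs a few data values with a metadata record describing them. The field names loosely borrow from Dublin Core elements; all values, and the extra `n_values` field, are invented for illustration.

```python
import json

# The data: a few hypothetical measurements (values invented for illustration).
measurements = [18.2, 18.6, 17.9]

# The metadata: data describing the characteristics of those data.
# Field names loosely follow Dublin Core elements; "n_values" is a hypothetical extra field.
metadata = {
    "title": "Water temperature, pilot measurements",
    "creator": "Example Researcher",
    "date": "2017-03-01",
    "format": "text/csv",
    "description": "Water temperature in degrees Celsius, one value per sample",
    "n_values": len(measurements),  # derived from the data it describes
}

# Metadata are often stored next to the data, e.g. as a JSON sidecar file.
print(json.dumps(metadata, indent=2))
```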

Definitions: what is an archive?
An accumulation of historical records, or the physical place where they are located. Archives contain primary source documents that have accumulated over the course of an individual's or organization's lifetime, and are kept to show the function of that person or organization. Source: Wikipedia

Definitions: what is a repository?
A place where things are, or may be, stored.

Definitions: what is data deposition?
Placing data somewhere for safekeeping, or as proof, for a longer period of time, usually in a repository. (In law, a deposition is the out-of-court oral testimony of a witness that is reduced to writing for later use in court or for discovery purposes.)

Definitions: what is a Dataverse?
A Dataverse is a container for research data studies, customized and managed by its owner. A study is a container for a research data set; it includes cataloging information, data files and complementary files. Source: thedata.harvard.edu
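The container hierarchy in this definition (a Dataverse holds studies; a study holds cataloging information, data files and complementary files) can be pictured as nested data structures. The Python sketch below is a simplified, hypothetical model for illustration only; the class and method names are assumptions and are not the data model or API of the Dataverse software.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    """Container for one research data set (simplified, hypothetical model)."""
    title: str
    cataloging: dict[str, str] = field(default_factory=dict)  # cataloging information
    data_files: list[str] = field(default_factory=list)
    complementary_files: list[str] = field(default_factory=list)  # e.g. codebooks, protocols


@dataclass
class Dataverse:
    """Container for studies, customized and managed by its owner."""
    name: str
    owner: str
    studies: list[Study] = field(default_factory=list)

    def add_study(self, study: Study) -> None:
        self.studies.append(study)

    def all_data_files(self) -> list[str]:
        """Collect the data files of every study in this Dataverse."""
        return [f for s in self.studies for f in s.data_files]


dv = Dataverse(name="Lab dataverse", owner="Research group X")
dv.add_study(Study(title="Pilot study",
                   cataloging={"creator": "Example Researcher", "year": "2015"},
                   data_files=["pilot_raw.csv"],
                   complementary_files=["codebook.pdf"]))
print(dv.all_data_files())
```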

Theory: structuring and arranging the research data lifecycle
[Figure: the research data lifecycle, divided into a dynamic part (work in progress) and a static part (archive). Stages: study, concept & design; data collection; data processing; data access & dissemination; analysis; research outcomes, linked to the knowledge transfer (KT) cycle; data discovery; data repurposing. Data dissemination is the distribution or transmitting of data to end users.]
Source: Charles Humphrey and Elizabeth Hamilton (2004)

Theory: the knowledge transfer cycle
[Figure: the knowledge transfer (KT) cycle, with the stages initialising, conceptualising, analysis, formalising and popularising. Outputs placed around the cycle include grant applications, e-mails, letters, literature reviews, meeting minutes, initial results, grant reports, technical reports, theses, presentations, conferences, seminars, journal articles, books, curricula content, policy, popular literature, newspapers and practice. Value: teaching/research, numeracy, an informed public.]
Source: Charles Humphrey and Elizabeth Hamilton (2004)

Thoughts: data volume per researcher per three years
- 1 TB of raw data per dataset; after cleaning, 750 GB remains
- 8-10 datasets: up to ~7.5 TB
- Metadata, analysis and reports: 1 GB
- Approx. 8 TB per researcher per three years
30 researchers therefore generate about 240 TB in three years.
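The estimate above is simple arithmetic. This Python sketch reproduces it under the slide's assumptions (10 datasets of 750 GB cleaned data each, plus 1 GB of metadata, analysis and reports per researcher) and additionally assumes decimal units (1 TB = 1000 GB). The function names are hypothetical. The unrounded total for 30 researchers comes to about 225 TB; rounding each researcher's share up to about 8 TB, as the slide does, gives 240 TB.

```python
# Rough storage estimate following the slide's assumptions.
# Decimal units are assumed: 1 TB = 1000 GB.
GB_PER_TB = 1000


def per_researcher_gb(datasets: int, cleaned_gb_per_dataset: float, metadata_gb: float) -> float:
    """Cleaned data for all datasets, plus metadata, analysis and reports (in GB)."""
    return datasets * cleaned_gb_per_dataset + metadata_gb


def group_tb(researchers: int, gb_per_researcher: float) -> float:
    """Total for a group of researchers over the same period (in TB)."""
    return researchers * gb_per_researcher / GB_PER_TB


exact = per_researcher_gb(datasets=10, cleaned_gb_per_dataset=750, metadata_gb=1)
print(f"Per researcher: {exact / GB_PER_TB:.2f} TB")                # 7.50 TB
print(f"30 researchers (unrounded): {group_tb(30, exact):.0f} TB")  # 225 TB
print(f"30 researchers (8 TB each): {group_tb(30, 8000):.0f} TB")   # 240 TB
```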

Recap: research data were, and still are
- Poorly accessible
- Poorly findable
- Poorly searchable
- Poorly reproducible
- Poorly verifiable
- Poorly reusable
- Unsafe

May I introduce...

Solution: the data management plan
In a data management plan (DMP) a researcher records how the research data will be stored, administered, documented, protected and shared.

A data management plan describes:

Information about data & data format
- Include a description of the data to be produced by the project. This might include (but is not limited to) data that are:
- How will the data be acquired? When and where will they be acquired?
- After collection, how will the data be processed? Include information about the software and algorithms used.
- Describe the file formats that will be used, justify those formats, and describe the naming conventions used.
- Identify the quality assurance and quality control measures that will be taken during sample collection, analysis and processing.
- If existing data are used, what are their origins? How will the data collected be combined with existing data? What is the relationship between the data collected and existing data?
- How will the data be managed in the short term? E.g. version control; backing up data and data products; security and protection of data and data products; who will be responsible for management.

Metadata content and format
- What metadata are needed?
- How will the metadata be created and/or captured? Examples include lab notebooks.
- What format will be used for the metadata? Consider the metadata standards commonly used in your scientific discipline.

Policies for access, sharing and re-use
- Describe any obligations that exist for sharing the data collected. These may include obligations from funding agencies, institutions, other professional organizations, and legal requirements.
- Include information about how the data will be shared: when they will be accessible, how long they will be available, how access can be gained, and any rights the data collector reserves for using the data.
- Address any ethical or privacy issues with data sharing.
- Who owns the copyright? What are the institutional, publisher and/or funding agency policies associated with intellectual property? Are there embargoes for political, commercial or patent reasons?
- Describe the intended future uses/users of the data.
- Indicate how the data should be cited by others. How will persistent citation be addressed? For example, if the data will be deposited in a public archive, will the dataset be assigned a digital object identifier (DOI)?

Long-term storage and data management
- Researchers should identify an appropriate archive for long-term preservation of their data. By identifying the archive early in the project, the data can be formatted, transformed and documented to meet the archive's requirements.
- Researchers should consult colleagues and professional societies in their discipline to determine the most appropriate archive, and include a backup archive in their data management plan in case their first choice ceases to exist.
- Early in the project, the primary researcher should identify which data will be preserved in an archive. Preserving the data in their rawest form is usually desirable, although data derivatives and products can also be preserved.

Budget
Data management and preservation costs may be considerable, depending on the nature of the project. By anticipating costs ahead of time, researchers ensure that the data will be properly managed and archived. Potential expenses to consider are:
- Personnel time for data preparation, management, documentation and preservation.
- Hardware and/or software needed for data management, backup, security, documentation and preservation.
- Costs associated with submitting the data to an archive.
The data management plan should state how these costs will be paid.
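One item in a data management plan, the file naming convention, is easy to make machine-checkable. The Python sketch below validates file names against a hypothetical convention, `<project>_<YYYYMMDD>_<description>_v<NN>.<ext>`; the pattern, the allowed extensions and the `check_name` helper are assumptions for illustration, to be replaced by whatever convention your own plan specifies.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical naming convention: <project>_<YYYYMMDD>_<description>_v<NN>.<ext>
# e.g. "study1_20170301_temperature_v02.csv". The allowed extensions are an example only.
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<date>\d{8})_"
    r"(?P<description>[a-z0-9-]+)_"
    r"v(?P<version>\d{2})"
    r"\.(?P<ext>csv|txt|tif)$"
)


def check_name(filename: str) -> Optional[dict]:
    """Return the parts of a conforming file name, or None if it breaks the convention."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    try:
        # Reject impossible calendar dates such as 20170231.
        datetime.strptime(parts["date"], "%Y%m%d")
    except ValueError:
        return None
    return parts


for name in ["study1_20170301_temperature_v02.csv", "Final data (new).xlsx"]:
    print(name, "->", check_name(name))
```

Running such a check before files are archived keeps names consistent with what the plan promises.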

Recap: the research data lifecycle
[Figure: the research data lifecycle, divided into a dynamic part (work in progress) and a static part (archive). Stages: study, concept & design; data collection; data processing; data access & dissemination; analysis; research outcomes and the KT cycle; data discovery; data repurposing.]
Source: Charles Humphrey and Elizabeth Hamilton (2004)

How does this fit into the earlier models?

How to get there, and how we did it at FPN
A. Define a guideline. DHMR (data handling & methods reporting) describes regulations for data packages:
1. Metadata
2. Data collecting
3. Raw data file definition
4. Data storage
5. Materials
6. Statistical processing
7. Processed data file
8. Access and verification
9. Retention
B. Let the (faculty) board decide on implementing the guideline.
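A data package under this guideline must cover nine components. As an illustration (not part of the FPN guideline itself), a short Python sketch can flag components that are missing or left empty in a package description. Representing a package as a dictionary from component name to a file or short note is an assumption, the `missing_components` helper is hypothetical, and the example entries are invented.

```python
# The nine DHMR data-package components listed in the guideline.
DHMR_COMPONENTS = (
    "metadata",
    "data collecting",
    "raw data file definition",
    "data storage",
    "materials",
    "statistical processing",
    "processed data file",
    "access and verification",
    "retention",
)


def missing_components(package: dict[str, str]) -> list[str]:
    """List the DHMR components that are absent or left empty in a package description."""
    return [c for c in DHMR_COMPONENTS if not package.get(c, "").strip()]


# Hypothetical package description: each component points to a file or a short note.
package = {
    "metadata": "README.txt",
    "data collecting": "collection_protocol.pdf",
    "raw data file definition": "codebook_raw.pdf",
    "data storage": "faculty network drive, project folder",
    "statistical processing": "analysis_script.R",
    "processed data file": "processed.csv",
    "access and verification": "",
}

print(missing_components(package))  # ['materials', 'access and verification', 'retention']
```

A check like this only tests presence; whether each component meets the guideline's content requirements remains a matter for review, for instance by the DHMR commission.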

How to get there, and how we did it at FPN
C. Define rules/regulations for a DHMR commission.
D. Instate a DHMR commission* with the following tasks:
a. (Re)defining and updating the DHMR guidelines.
b. Overseeing compliance with DHMR by means of internal evaluations and/or audits of research.
c. Drawing up an annual DHMR report.
d. Providing the dean with requested and unrequested advice about: aspects of DHMR (e.g. infrastructure, open access); (legal) aspects of data management; DHMR training (staff, facilities, programmes); research culture and ethics.
E. Install the DHMR commission.
F. Implement DHMR and start working according to it.
*Consists of senior researchers; at FPN ORa
