Research Data Preserve, Share, Reuse, Publish, or Perish Mark Gahegan Director, Centre for eresearch 24 Symonds St

Similar documents
Developing a Research Data Policy

GEOSS Data Management Principles: Importance and Implementation

Science Europe Consultation on Research Data Management

The Materials Data Facility

Introducing the Springer Nature Data Support Services

Data Management Plan Generic Template Zach S. Henderson Library

Data Discovery - Introduction

Slide 1 & 2 Technical issues Slide 3 Technical expertise (continued...)

BPMN Processes for machine-actionable DMPs

Advancing code and data publication and peer review. Erika Pastrana, PhD Executive Editor, Nature Journals ALPSP_Sept 2018

Data Curation Handbook Steps

SHARING YOUR RESEARCH DATA VIA

A Data Management Plan Template for Ecological Restoration and Monitoring

Checklist and guidance for a Data Management Plan, v1.0

Big Data infrastructure and tools in libraries

Make your own data management plan

Conducting a Self-Assessment of a Long-Term Archive for Interdisciplinary Scientific Data as a Trustworthy Digital Repository

Writing a Data Management Plan A guide for the perplexed

Data Management Checklist

Open Science, FAIR data and effective data management

Linda Strick Fraunhofer FOKUS. EOSC Summit - Rules of Participation Workshop, Brussels 11th June 2018

Certification. F. Genova (thanks to I. Dillo and Hervé L Hours)

Reproducibility and FAIR Data in the Earth and Space Sciences

The Data Management Plan: Putting policy into practice Suzanne Clarke Director, Information Resources

NSF Data Management Plan Template Duke University Libraries Data and GIS Services

ISMTE Best Practices Around Data for Journals, and How to Follow Them" Brooks Hanson Director, Publications, AGU

Emory Libraries Digital Collections Steering Committee Policy Suite

ONBOARDING APPLICATION

Horizon2020/EURO Coordination and Support Actions. SOcietal Needs analysis and Emerging Technologies in the public Sector

Susan Thomas, Project Manager. An overview of the project. Wellcome Library, 10 October

Competency Definition

Data Management Dr Evelyn Flanagan

Swedish National Data Service, SND Checklist Data Management Plan Checklist for Data Management Plan

FAIR-aligned Scientific Repositories: Essential Infrastructure for Open and FAIR Data

IBM Information Server on Cloud

Data Management Plans

EUROPEANA METADATA INGESTION , Helsinki, Finland

Enabling Open Science: Data Discoverability, Access and Use. Jo McEntyre Head of Literature Services

For Attribution: Developing Data Attribution and Citation Practices and Standards

The iplant Data Commons

Virtual Machine Encryption Security & Compliance in the Cloud


Establishing Enterprise-Wide Data Governance

Amazon Elastic File System


The Value of Metadata

Guidelines for Depositors

Where to store research data during and after a project. Dr. Chris Emmerson Research Data Manager

Open Access, RIMS and persistent identifiers

The Emerging Data Lake IT Strategy

Focus: Themes within Introduction and Context

How to make your data open

Making Sense of Data: What You Need to know about Persistent Identifiers, Best Practices, and Funder Requirements

Data Curation Profile Movement of Proteins

DATA SELECTION AND APPRAISAL CHECKLIST University of Reading Research Data Archive

Towards FAIRness: some reflections from an Earth Science perspective

Data driven transformation of the public sector Tallinn, Estonia Head of unit 22 September 2016 European Commission

Perspectives on Open Data in Science Open Data in Science: Challenges & Opportunities for Europe

Integration of New and Big Administrative Data Sources into Turkish Statistical System

ORCID: A simple basis for digital data governance

Ensuring Proper Storage for Earth Science Data: The USGS Process to Certify Trusted Digital Repositories

Records Management at MSU. Hillary Gatlin University Archives and Historical Collections January 27, 2017

Records Management Standard for the New Zealand Public Sector: requirements mapping document

Archiving Market Update

Leveraging the Cloud for Law Enforcement. Richard A. Falkenrath, PhD Principal, The Chertoff Group

Trust and Certification: the case for Trustworthy Digital Repositories. RDA Europe webinar, 14 February 2017 Ingrid Dillo, DANS, The Netherlands

Hello, I m Melanie Feltner-Reichert, director of Digital Library Initiatives at the University of Tennessee. My colleague. Linda Phillips, is going

Assessing the FAIRness of Datasets in Trustworthy Digital Repositories: a 5 star scale

Developing Data Management Plans (DMP) Scholarly Communication Initiative Mississippi State University Libraries March 25, 2015

Common approaches to management. Presented at the annual conference of the Archives Association of British Columbia, Victoria, B.C.

Legal Issues in Data Management: A Practical Approach

SAP Agile Data Preparation Simplify the Way You Shape Data PUBLIC

The Data Curation Profiles Toolkit: Interview Worksheet

National Materials Data Initiatives

2 The IBM Data Governance Unified Process

Roy Lowry, Gwen Moncoiffe and Adam Leadbetter (BODC) Cathy Norton and Lisa Raymond (MBLWHOI Library) Ed Urban (SCOR) Peter Pissierssens (IODE Project

Open Data is a new paradigm in which research data are freely and openly shared, with full re-use rights. Open data ensures that research integrity

CA ERwin Data Profiler

Bringing Business Value to Object Oriented Storage

Services to Make Sense of Data. Patricia Cruse, Executive Director, DataCite Council of Science Editors San Diego May 2017

LOUGHBOROUGH UNIVERSITY RESEARCH OFFICE STANDARD OPERATING PROCEDURE. Loughborough University (LU) Research Office SOP 1027 LU

Privacy and Security Aspects Related to the Use of Big Data Progress of work in the ESS. Pascal Jacques Eurostat Local Security Officer 1

Research Data Management Procedures

Research Data Management Procedures and Guidance

Chapter 11: Editorial Workflow

COALITION ON PUBLISHING DATA IN THE EARTH AND SPACE SCIENCES: A MODEL TO ADVANCE LEADING DATA PRACTICES IN SCHOLARLY PUBLISHING. Source: NSF.

AS/NZS ISO 13008:2014

Integration With the Business Modeler

Web of Science. Platform Release Nina Chang Product Release Date: March 25, 2018 EXTERNAL RELEASE DOCUMENTATION

DURAFILE. D1.2 DURAFILE Technical Requirements

Introduction to SURE

Data Curation Practices at the Oak Ridge National Laboratory Distributed Active Archive Center

WE HAVE SOME GREAT EARLY ADOPTERS

DATABASE ADMINISTRATOR

Global Reference Architecture: Overview of National Standards. Michael Jacobson, SEARCH Diane Graski, NCSC Oct. 3, 2013 Arizona ewarrants

The Open Archives Initiative in Practice:

DIGITAL RECORDS MANAGEMENT GUIDELINES

Preservation and Access of Digital Audiovisual Assets at the Guggenheim

Data Archival and Dissemination Tools to Support Your Research, Management, and Education

Research Elsevier

Transcription:

Research Data Preserve, Share, Reuse, Publish, or Perish Mark Gahegan Director, 24 Symonds St

Outline The need The approach The progress to date

What do we need? Data storage services that are reliable And that are quick to provision Clarity over what we are entitled to Services that integrate well into research workflows and practices

What do others need? Open data Discoverable data Well documented or self-describing data Reliable data (or at least the quality is assessed) Data that can automatically configure itself to our needs

Turning the question around Not: How do I want to describe the data? But rather: What does a future data consumer need to know? So, as well as asking: How do we share our data? we should also be asking: What kinds of data descriptions are demonstrably useful to facilitate data reuse?

Andrew Treloar, Australian National Data Service (ANDS)

Storage Solutions Backup Service: taking a snapshot of the current stare, just in case File Sharing Service: (like DropBox) for collaborative work on active data Data Publishing Service: (like figshare) to publish & discover data, capture metadata and track impact Archive Service: planned, long-term data preservation for important resources Fast Data Transfer for accessing remote science equipment

Data Lifecycle Services Creating Data Management Plans (DMPs): good practice, and increasingly required by funders Data Ethics Advice: privacy, encryption, access considerations, disposal Database Design: for ease of data management and preservation Data Publication advice: including metadata, ownership and licensing

Data Management Plans (DMPs) They describe how (and when) you will take care of, and share, your data They show funding agencies they can trust you in turning their money into data They help IT support groups understand your needs They allow the institution to know what we have, and when we can delete it.

http://www.lib.cam.ac.uk/dataman/pages/preservation.html

figshare

Discover data

Manage data

Publish data

Long term Research Data Challenges 1. Storing unprecedented volumes of data (and accelerating) Data production passed storage capacity in 2007 Cost differential is increasing, Rate of data production is increasing 2. Describing what we have in ways that are helpful to future users (and our future selves) Metadata and Semantics for describing content (this tends to be producer-focused) But also use-case metadata and emergent relationships (tends to be consumer-focused) 3. Finding what we need, in the context of our current task semantically-enabled search engines that can use the above descriptions, (ideally from within analytical tools and workflows) 4. Working out what we do not need to keep Because it will not be used again or offers no information gain Because it is easier to recreate than to store 5. Governing data collections well, within their communities of use effective governance of data resources quality control strategies, including peer review and rewarding excellence

Manifesto for Open Science 1. Remove restrictions on data re-use 2. Data and metadata should be persistent and linked 3. Build or learn strong descriptions of data to aid automated discovery and human comprehension 4. Expose the provenance and uncertainty where possible 5. Develop indicators of data quality 6. Encourage and support secondary use of data Provide a contextual measure of Fitness For Purpose (FFP), to connect researchers with useful resources

Questions?

1. Online submission of data for publication with basic metadata *2. Editor verifies that the data is within the scope of the collection 3. Automated tools check data for obvious omissions and errors. 4. Online tools ingest and integrate data & generate tables of statistics **5. Potential errors and omissions reported to data author and/or editors 6. Data author acts on this feedback 7. Automated checks verify that data set is complete and standardised. ***8. Data editor confirms that resubmitted data and metadata are correct 9. Independent peer review of data 10. Author responds to referees comments 11. Editor makes a publishing decision based on quality standard achieved by dat set, (including reject and revise and resubmit). ****12. Data and metadata are published online. The data has its own identity that tracks its use, or is integrated into the authoritative subject databases *****13. Papers are published that consumed the data and any errors found have been corrected Biodiversity data should be published, cited, and peer reviewed Mark J. Costello1,, William K. Michener2, Mark Gahegan3, Zhi-Qiang Zhang4, Philip E. Bourne5

Storage devices: Cost vs Value vs Use $0.026/GB 2PB = $52K IOPS 0 $0.05/GB $100K 200 $0.60/GB $1.2m 80K-100K >$1/GB $2m 250K-2m $0.12/GB/pa $240K/pa 0 Near Future 3D 2.5 SSD form factor, 10TB, 20TB by 2017 $0.05/GB Also 3D mssd form factors aiming for 2TB (laptop dimensions) Tim Chaffe, UoA Chief Architect: Valid as at 8/4/2015 with the proviso that this market is changing monthly