Data Reuse and Transparency in the Data Lifecycle
Steven Worley, Doug Schuster, Bob Dattore
National Center for Atmospheric Research, Boulder, CO, USA



Topics
- Data Reuse and Transparency
  - What are these data features?
  - Why are they important?
- Archiving practices
- Access practices
DCERC, NCAR, June 5-7 2012

What are these data features?
Data reuse implies:
- Expanding usage beyond the intended primary community
- Maintaining reference datasets and building many products from them
Data transparency implies:
- Reproducibility: ability to reproduce data files or products for users
- Traceability: tagging and preserving access details

Why are Reuse and Transparency Important?
Data centers/providers are expected to support fact-based outcomes:
- Traditionally for science/research
- Now also for policy makers, community leaders, individual citizens, and commercial interests

Supporting New Reuse and Transparency
- Decisions by policy makers
  - Traceable open access sources
- Actions by community leaders
  - Planning for societal services
  - Emergencies, water, energy, etc.
- Usage by citizens and educators
  - Inquisitive science, family activities, safety
  - Science learning
- Collaborative commercial applications
  - Tighter coupling between engineering and science
  - Weather forecasts for wind energy production
  - Energy companies contribute mesoscale observations

Archiving practices
Curation that assures data authenticity
- Preserve original data formats, to the maximum extent possible
  - Maintaining 100% content and accuracy is a serious challenge
- Use a rich metadata standard
  - A local standard?
  - Generate discipline and cross-discipline standards, e.g. ISO, DIF, etc.
- Create multiple copies
  - Data files, metadata, documentation, and software
  - Disaster recovery is not a secondary concern

Archiving practices
Collection completeness and integrity
- Closely monitor data workflow
  - Account for every file
  - Read every file
  - Gather, check, preserve metadata
  - Compute and preserve file checksums
- Maintain dataset lineage / provenance
- Use approved processes to delete datasets (never?)
- Establish tiered levels of service for data
  - Move old / superseded versions to a lower level
  - Keep all metadata on the highest tier: discoverable!
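The "account for every file, read every file, compute and preserve file checksums" practice can be sketched as a small manifest audit. This is a minimal illustration only, not the NCAR archive's implementation; the function names (file_checksum, build_manifest, audit) and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Read the whole file in chunks and return its hex digest."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map every file under root (relative path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root, manifest):
    """Compare the files on disk with a previously preserved manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "changed": sorted(
            name for name in manifest
            if name in current and manifest[name] != current[name]
        ),
    }
```

Storing the manifest alongside the metadata, and re-running the audit on a schedule, gives a simple way to detect lost, added, or silently altered files.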

Archiving practices
- Explicit data version tracking
  - Sometimes, internal to files
  - Always, within the data management system
  - Include notations in all documentation
- Establish Digital Object Identifiers (DOIs)
  - Two-way linkage between publications and data
  - Promotes an easy path for follow-on research from publications
  - Leverages skills / facilities of libraries: richer knowledge base
- Create data family tree connections

Dataset Family Tree Example
(Diagram tiers, top to bottom)
- Global and Regional Atmospheric and Ocean Re-analyses: NCEP/NCAR, NARR, ERA-40, ERA-Interim, 20CR, OARCA
- NOC Surf. Flux (1973-2009), Ocean Clouds (1900-2010), HadISST (1871-2011), WASwind (1950-2009), JMA SST (1871-2011), NOAA OI SST (1981-2011), HadSLP (1871-2011), NOAA ERSST (1854-2011), etc.
- International Comprehensive Ocean Atmosphere Data Set (ICOADS): global marine surface observations (1662-2011)
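One way to make a family tree like this queryable is to record "derived from" links and walk them in both directions. The sketch below uses dataset names from the slide, but the specific edges are illustrative assumptions; the slide shows tiers, not a documented per-dataset lineage. The names DERIVED_FROM, ancestors, and descendants are hypothetical.

```python
from collections import defaultdict

# child -> list of parents. Names come from the slide; the edges themselves
# are assumptions made for illustration only.
DERIVED_FROM = {
    "HadISST": ["ICOADS"],
    "NOAA ERSST": ["ICOADS"],
    "ERA-40": ["HadISST"],
    "20CR": ["HadISST"],
}


def ancestors(dataset, derived_from):
    """Return every dataset that the given dataset depends on, sorted."""
    found = set()
    stack = list(derived_from.get(dataset, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(derived_from.get(parent, []))
    return sorted(found)


def descendants(dataset, derived_from):
    """Return every dataset built directly or indirectly from this one, sorted."""
    children = defaultdict(list)
    for child, parents in derived_from.items():
        for parent in parents:
            children[parent].append(child)
    found = set()
    stack = [dataset]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return sorted(found)
```

With links like these, a change to a parent dataset can be traced forward to every product that would need review, and any product can be traced back to its sources.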

Dataset Family Tree - Evolution
(Diagram: Parent, Child, and Grand Child datasets, shown in a Data Center Centric view and a Web Centric view)
Challenges:
- System of immutable IDs: DOIs?
- Multi-institution preservation commitment
- Transparency across institutions, accepted standards/governance
- Promote discovery by sharing metadata, OAI-PMH
- Future: knowledge-based discovery and access via ontologies within the semantic web
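Sharing metadata over OAI-PMH amounts to answering simple HTTP requests with XML. The sketch below builds a ListIdentifiers request and reads identifiers from a response. The verb, the metadataPrefix and from arguments, and the XML namespace are part of the OAI-PMH 2.0 protocol; the base URL, the sample response, and the function names are made up for illustration and do not describe any real repository.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListIdentifiers request URL."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


def parse_identifiers(xml_text):
    """Extract record identifiers from a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Hand-written sample response (not real data) for demonstration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:repository.example.org:dataset-1</identifier>
      <datestamp>2012-06-05</datestamp>
    </header>
    <header>
      <identifier>oai:repository.example.org:dataset-2</identifier>
      <datestamp>2012-06-06</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""

url = list_identifiers_url("https://repository.example.org/oai", from_date="2012-06-01")
identifiers = parse_identifiers(SAMPLE_RESPONSE)
```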

Access Practices
- User identification: key to reproducibility
- Record all data access transactions
  - Who received what and when
  - Log product creation constraints from GUIs and web services
  - Log software IDs used for product creation
- Benefits
  - Reproduce a data access process
  - Feedback to users about data changes
  - Usage metrics indicate how to improve access
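A transaction log that captures who received what and when, plus the request constraints and software ID, could look like the following. This is a minimal sketch under assumed field names (AccessRecord, user_id, dataset_id, and so on are hypothetical); it is not the schema used by the Research Data Archive.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessRecord:
    user_id: str        # who
    dataset_id: str     # what (dataset)
    files: list         # what (files or products delivered)
    constraints: dict   # subsetting constraints from the GUI or web service
    software_id: str    # version of the software that built the product
    timestamp: str = field(  # when (UTC, ISO 8601)
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_record(log_lines, record):
    """Serialize one transaction as a JSON line and append it to the log."""
    log_lines.append(json.dumps(asdict(record), sort_keys=True))


def recipients_of(log_lines, dataset_id, file_name):
    """List users who received a given file, e.g. to notify them of a fix."""
    users = set()
    for line in log_lines:
        entry = json.loads(line)
        if entry["dataset_id"] == dataset_id and file_name in entry["files"]:
            users.add(entry["user_id"])
    return sorted(users)
```

Because each entry keeps the constraints and software ID, the same request can be replayed later to reproduce a product, and the recipients lookup supports feedback to users about data changes.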

Metrics Example
CFSR: 6-hourly, GRIB2, 1979-2011, 75 TB, 28K fields/time step, 168K files
- 63% of users are non-US
- Now exporting 25+ TB monthly
- Subsetting: now ~500 requests/month
- Track user activity: who accessed what and when
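The headline numbers can be put in perspective with some quick arithmetic. The calculation below assumes decimal terabytes, treats the rounded slide figures as exact, and assumes the record covers every 6-hourly step from the start of 1979 through the end of 2011; none of these assumptions is stated on the slide.

```python
from datetime import date

ARCHIVE_BYTES = 75e12          # 75 TB (decimal TB assumed)
FILE_COUNT = 168_000           # 168K files
FIELDS_PER_STEP = 28_000       # 28K fields per time step
MONTHLY_EXPORT_BYTES = 25e12   # lower bound of "25+ TB monthly"

# Average file size in megabytes.
mean_file_mb = ARCHIVE_BYTES / FILE_COUNT / 1e6

# Number of 6-hourly time steps, assuming full coverage of 1979-2011.
days = (date(2012, 1, 1) - date(1979, 1, 1)).days
time_steps = days * 4

# Total individual fields across the record.
total_fields = time_steps * FIELDS_PER_STEP

# Monthly export volume as a fraction of the archived volume.
export_fraction = MONTHLY_EXPORT_BYTES / ARCHIVE_BYTES
```

Under these assumptions the average file is a few hundred megabytes, the record holds over a billion individual fields, and each month's exports amount to roughly a third of the dataset's archived volume.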

Conclusions
- Data reuse and transparency are rapidly expanding in importance
- Many best practices in archive management support reuse and transparency
- Archive access monitoring is necessary for transparency, reproducibility, and traceability
- Significant improvement is needed in linking data family trees, and data to publications, to advance reuse and transparency

Research Data Archive at NCAR
http://rda.ucar.edu/