Data Curation Practices at the Oak Ridge National Laboratory Distributed Active Archive Center

Similar documents
DataONE. Promoting Data Stewardship Through Best Practices

DataONE Enabling Cyberinfrastructure for the Biological, Environmental and Earth Sciences

DataONE: Open Persistent Access to Earth Observational Data

Exercise 3 Visualization of MODIS LAI/FPAR product at local scale at ORNL DAAC

Conducting a Self-Assessment of a Long-Term Archive for Interdisciplinary Scientific Data as a Trustworthy Digital Repository

FAIR-aligned Scientific Repositories: Essential Infrastructure for Open and FAIR Data

Defense Coastal/Estuarine Research Program (DCERP) SERDP RC DCERP Data Policy Version 2.0

Sustainable Governance for Long-Term Stewardship of Earth Science Data

NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION S SCIENTIFIC DATA STEWARDSHIP PROGRAM

The U.S. National Spatial Data Infrastructure

GEOSS Data Management Principles: Importance and Implementation

Data Stewardship NOAA s Programs for Archive, Access, and Producing Climate Data Records

A Data Management Plan Template for Ecological Restoration and Monitoring

Introduction to FREE National Resources for Scientific Computing. Dana Brunson. Jeff Pummill

DEEPWAVE DATA MANAGEMENT

Growing Variety and Volume of Remote Sensing and In Situ Data

Levels of Service Authors: R. Duerr, A. Leon, D. Miller, D.J Scott Date 7/31/2009

The Changing Role of Data Stewardship in Creating Trustworthy, Transdisciplinary High Performance Data Platforms for the Future

California GIS Enterprise Strategy

DIAS_Satellite_MODIS_SurfaceReflectance dataset

Introduction to Data Management for Ocean Science Research

DATA STEWARDSHIP BODY OF KNOWLEDGE (DSBOK)

Western U.S. TEMPO Early Adopters

The US National Near-Earth Object Preparedness Strategy and Action Plan

4 th Working Group on Geospatial Information

How to use Water Data to Produce Knowledge: Data Sharing with the CUAHSI Water Data Center

Data Governance Central to Data Management Success

National Snow and Ice Data Center. Plan for Reassessing the Levels of Service for Data at the NSIDC DAAC

Trust and Certification: the case for Trustworthy Digital Repositories. RDA Europe webinar, 14 February 2017 Ingrid Dillo, DANS, The Netherlands

Image Processing on the Cloud. Outline

Ensuring Proper Storage for Earth Science Data: The USGS Process to Certify Trusted Digital Repositories

Persistent Identifier the data publishing perspective. Sünje Dallmeier-Tiessen, CERN 1

Certification. F. Genova (thanks to I. Dillo and Hervé L Hours)

Toward the Development of a Comprehensive Data & Information Management System for THORPEX

AWIPS Technology Infusion Darien Davis NOAA/OAR Forecast Systems Laboratory Systems Development Division April 12, 2005

University of Michigan deepblue.lib.umich.edu Web Presence Update. Vacek, Rachel.

US NODC Archival Management Practices and the OAIS Reference Model Donald W. Collins

State of the Art in Ethno/ Scientific Data Management

Arctic Data Center: Call for Synthesis Working Group Proposals Due May 23, 2018

DEVELOPING, ENABLING, AND SUPPORTING DATA AND REPOSITORY CERTIFICATION

For Attribution: Developing Data Attribution and Citation Practices and Standards

The CEDA Archive: Data, Services and Infrastructure

and The Technical Assist Database Presented to the Regional GIS Council October 8, 2008

Supporting Data Stewardship Throughout the Data Life Cycle in the Solid Earth Sciences

ANZPAA National Institute of Forensic Science BUSINESS PLAN

Paving the Rocky Road Toward Open and FAIR in the Field Sciences

GRID MODERNIZATION INITIATIVE PEER REVIEW GMLC Industrial Microgrid Analysis and Design for Energy Security and Resiliency

Data Symposium 2012 SeWHIP & CTSI John W. Cobb, Ph.D. Milwaukee, WI March 1, 2012

State of Nevada Perspective

GEO Update and Priorities for 2014

Earth Science Community view on Digital Repositories

ORION/OOI CYBERINFRASTRUCTURE

Data Curation Profile Cornell University, Biophysics

SLAS Special Interest Group Charter Application

Building a Global Data Federation for Climate Change Science The Earth System Grid (ESG) and International Partners

State of the Art in Data Citation

Reproducibility and FAIR Data in the Earth and Space Sciences

ODC and future EIDA/ EPOS-S plans within EUDAT2020. Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team

National Snow and Ice Data Center. Plan for Reassessing the Levels of Service for Data at the NSIDC DAAC

Data Citation. Mark Parsons, Ruth Duerr and the Federation of Earth Science Information Partners (ESIP)

PNAMP Metadata Builder Prototype Development Summary Report December 17, 2012

Cheshire 3 Framework White Paper: Implementing Support for Digital Repositories in a Data Grid Environment

Data Curation Profile Movement of Proteins

NASA's GLOBAL CHANGE MASTER DIRECTORY: FOSTERING COLLABORATIONS FOR EARTH SCIENCE INFORMATION AND DATA RETRIEVAL

CODE AND DATA MANAGEMENT. Toni Rosati Lynn Yarmey

A/AC.105/C.1/2013/CRP.6

Annual Report for the Utility Savings Initiative

Denver Regional Data Summit 2009

Integrating Ground-Based EO Data in Satellite-Based Systems*

Edinburgh DataShare: Tackling research data in a DSpace institutional repository

JCOMM Observing Programme Support Centre

CODATA: Data Citation Workshop Perspectives from Editors and Publishers. Brooks Hanson Director, Publications, AGU

Good morning, Chairman Harman, Ranking Member Reichert, and Members of

Implementing a Data Quality Strategy to simplify access to data

GNSSN. Global Nuclear Safety and Security Network

Advancing the fourth paradigm of research: Assimilating repositories into active research phases

Homeland Security and Geographic Information Systems

Research Data Preserve, Share, Reuse, Publish, or Perish Mark Gahegan Director, Centre for eresearch 24 Symonds St

NASA Policy Directive

Digital Preservation Policy. Principles of digital preservation at the Data Archive for the Social Sciences

Analysis Ready Data For Land (CARD4L-ST)

ACCI Recommendations on Long Term Cyberinfrastructure Issues: Building Future Development

Earth Observation, Climate and Space for Smarter Government

DATA SHARING FOR BETTER SCIENCE

Stewarding NOAA s Data: How NCEI Allocates Stewardship Resources

Data Curation Profile Human Genomics

The IRIS Data Management Center maintains the world s largest system for collecting, archiving and distributing freely available seismological data.

Cyberinfrastructure Framework for 21st Century Science & Engineering (CIF21)

Marine Transportation System Resilience: A Federal Agency Perspective

National Counterterrorism Center

Collaborations and Partnerships. John Broome CODATA-International

THE NATIONAL DATA SERVICE(S) & NDS CONSORTIUM A Call to Action for Accelerating Discovery Through Data Services we can Build Ed Seidel

DataFlow and VIDaaS Workshop

SCICEX Data Stewardship: FY2012 Report

Callicott, Burton B, Scherer, David, Wesolek, Andrew. Published by Purdue University Press. For additional information about this book

STC Program. Title: Is your Campus website in compliance with NYS accessibility standards

Earth Observation Imperative

Indiana University Research Technology and the Research Data Alliance

Data Citation and Scholarship

STATE BROADBAND ACTION PLAN MAY 2015 Nevada Economic Development Conference PREPARED BY CONNECT NEVADA AND THE NEVADA BROADBAND TASK FORCE

Transcription:

Data Curation Practices at the Oak Ridge National Laboratory Distributed Active Archive Center Robert Cook, DAAC Scientist Environmental Sciences Division Oak Ridge National Laboratory Oak Ridge, TN cookrb@ornl.gov Data Curation Education in Research Centers Workshop Boulder, CO June 5-7, 2012

Presentation Outline Overview of the ORNL DAAC Data Curation Practices Tools Challenges and Lessons 2

The Earth Observing System (EOS) Program NASA s Earth Observing System (EOS) Program centerpiece of NASA s Earth Science Division and largest component in the U.S Global Change Research Program space-based observing system ground-based field campaigns EOS Data and Information System objective is to enable quick and easy access to data about the earth data available from 12 Data Centers, organized by scientific discipline ORNL DAAC focuses on terrestrial biogeochemistry and ecological dynamics data 3

ORNL DAAC Archive data products produced by projects within NASA s Terrestrial Ecology Program Mission: assemble, distribute, and provide data services for a comprehensive archive of terrestrial biogeochemistry and ecological dynamics observations and models to facilitate research, education, and decision-making in support of NASA s Earth science. ORNL DAAC: daac.ornl.gov 4

ORNL DAAC: Data Holdings Total Data Sets = 969 1. Field Campaigns (738) FIFE OTTER SNF BOREAS LBA NACP LBA BOREAS S2K LBA S2K 2. Validation of Land Products (23) Land Validation MODIS Subsets FLUXNET NPP BigFoot In-situ Observations LAI/fPAR NPP? Remote Sensing LAI/fPAR NPP 3. Regional and Global Studies (198) Climate Soils Vegetation Hydroclimatology Daymet 4. Model Products (10) Benchmark Models IBIS, BIOME-BGC, LSM Manuscript Models PNeT, Century, Biome-BGC 5

ORNL DAAC by the numbers Data cost: Zero (free, unrestricted) Staff: 10 full-time equivalents Environmental scientists, computer scientists, database managers, and GIS specialists Data Holdings: 969 data sets; ~0.5 Tb volume Data orders: FY 2011: 1,300,000 products ordered by 16,500 users FY 2012 (first half): Nearly 450,000 products distributed to 11,000 users ~30% are foreign and ~30% have numeric IP Remaining user domains are commercial, educational, and government Data Product Use: 140 data products cited in the literature in 2011 Distributing data for 18 years (began in 1994) 6

Data Flow - From Investigator to the Archive Investigators Collect Document QA / QC Analyze Publish Project Policies Protocols QA Review Generate standard products Integration and synthesis DAAC QA Review Document Archive Distribute User Support Project Investigators Project Data System Finalized Project Data Sets 7

Proper Curation Enables Data Reuse 20-Year Rule NRC 1991 The metadata accompanying a data set should be written for a user 20 years into the future. 8

Provide Guidance: Best Practices for Preparing Ecological Data to Share Cook et al. 2001 Bulletin ESA 82: 138 141 Best Practices include: 1. Assign Descriptive File Names 2. Use Consistent and Stable File Formats 3. Define the Parameters 4. Use Consistent Data Organization 5. Perform Basic Quality Assurance 6. Assign Descriptive Data Set Titles 7. Provide Documentation Update on-line: http://daac.ornl.gov/pi/pi_info.html Best Practices for Preparing Ecological and Ground-Based Data Sets to Share and Archive Robert B. Cook, Richard J. Olson, Paul Kanciruk, and Leslie A. Hook Environmental Sciences Division Oak Ridge National Laboratory Workshops at ESA 2010 & 2011 (with DataONE) Workshops at AGU 2010 and 2011 with ESIP Workshop at AMS 2012 with ESIP 9

ORNL DAAC: Curation and Archive Functions Acquisition identify how best to serve the scientific community establish how and when to receive data Ingest perform QA checks compile project-provided metadata convert to archivable file formats Enhance (as requested) convert to standard formats & units aggregate files Metadata / Documentation Prepare final metadata record and documentation Archive / Publish generate citation Exploration and Distribution provide tools to explore, access, and extract data for users worldwide Post-Project Data Support advertise data serve as a buffer between end users and PIs provide usage / citation statistics Stewardship provide long-term secure archiving of the data security, disaster recovery migration to new computer systems User Working Group science panel that provides advice on all aspects of the ORNL DAAC 10

ORNL DAAC: User Working Group Working Group Includes NASA sponsors and DAAC leaders Board of Directors function not FACA (Federal Advisory Committee Act) Serves a peer review function for an on-going project (began in 1992; began distributing data in 1994) Assist in defining the DAAC's science goals, setting priorities Represent the scientific interests of the research community members are data providers and data users Driving force for DAAC evolution over the past decade Provide guidance on DAAC activities, including data set acquisition, development of tools and services, incorporation of new technologies 11

Enable collectors and users to spend more time analyzing the data and less time doing data management Acquiring Data Discover, access, extract, integrate and analyze Training in Data Management Enhanced metadata entry tools Data Discovery and Access Tools Data Center Activities Data Exploration tools Visualize Subset Analyze Integrate Spatial data analysis 12

User Tools Metadata Editor Search and Access Spatial Data Access Tool THREDDS Data Server MODIS Subsetting Search and Access Spatial Data Access Tool THREDDS Data Server 13

Subsets of MODIS* remote sensing products for field studies MODIS Rapid Response System 2009.08.30 Station Fire Station Fire with JPL in the foreground (photo courtesy of C. Miller, JPL) *MODIS = Moderate Resolution Imaging Spectroradiometer 14

MODIS Subsets: Vegetation Index Interactive time series Increasing Greeness Demonstration 15

Data management coordination Collectors Users Data Management and Analysis System Close coordination / communication among data managers, those making the measurements, and data users is critical 16

Challenges Obtaining documented, organized, and highquality data from field investigators 17

Lessons Data centers should be a partnership of data managers, data providers, and data users Seek formal guidance from User Working Group Data centers should focus on use of the data Data centers should facilitate new science Metric is citations to data sets Expect change Funder and research community expectations and advances in information technologies demand flexibility 18

Web Resources ORNL DAAC http://daac.ornl.gov Search Tool / Metadata Clearinghouse http://mercury.ornl.gov/ornldaac/index.jsp MODIS Land Product Subsetting Spatial Data Access Tool 19

Additional Slides 20

Challenges Obtaining documented, organized, and highquality data from field investigators Carrots Providing training and tools Work effectively Citations & indices Collaborations Credit for future proposals Funders look good Sticks Can t publish without archiving Requirement for closing projects Link to future funding 21

Metadata needed to Understand Data The details of the data. parameter name Measurement date sample ID location For those on Investigator s team, amount of metadata required to understand the data is small 22

Metadata Needed to Understand Data: 20-year perspective date words, words. QA def. Units def. Units QA flag generator date words, words units method Parameter def. parameter name Measurement sample ID method location media records lab field Method def. Record system org.type name custodian address, etc. From Raymond McCord, ORNL Sample def. type date location generator coord. elev. type depth GIS 23