Data Exchange in the Earth Sciences


Data Exchange in the Earth Sciences: Perspective of a Multidisciplinary Data Facility
Kerstin Lehnert, Columbia University (lehnert@ldeo.columbia.edu)

Access to Data
Diagram labels: Transparency & Reproducibility; Publishers/Journals; New science; Funders; Open Data; Researchers; Return on Investment; Repositories; Credit

FAIR Data
Findable: Easy to find by both humans and computer systems, based on mandatory descriptions (metadata) that allow discovery.
Accessible: Stored for the long term so that they can be easily accessed and/or downloaded, with well-defined license and access conditions (Open Access when possible), whether at the level of metadata or at the level of the actual data content.
Interoperable: Ready to be combined with other datasets by humans as well as computer systems.
Reusable: Ready to be used for future research and to be processed further using computational methods.
Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018, doi:10.1038/sdata.2016.18 (2016).

Topics
The Research Data Lifecycle
Enhance re-usability: provenance, integration
Enhance access: linking data & literature

FAIR / FARI
Figure (value diagram), levels as labeled on the slide:
Integrated / interoperable: harmonized, machine-readable
Re-usable: context, provenance
Accessible: protection, protocols
Findable: identification, persistence
Supporting elements: Data Curation Standards; Domain-specific Data Standards; Domain Repositories

Data Facilities: Objectives
Advance & accelerate scientific discoveries
Easy & fast access to data
Comprehensive knowledge base
Integration of data for interdisciplinary research
Ensure quality of data and trust in scientific results
Preserve irreproducible observations & results
Support research ethics

IEDA: A Multi-Disciplinary Data Facility
Geochemistry, marine geophysics, marine geology, geochronology, and more
Sensor data and sample-based observations & experiments
Field data, lab data, processed data, samples
Gridded data, point data, time-series data, maps, photos, and more
Long-tail to big data
Figure labels: Geochemistry; Marine Geophysics; Samples; Antarctic Science

IEDA Services
Trusted repository services: ensure citability (DOI registration); long-term preservation; links to publications
Domain-specific data curation: QA/QC; metadata standards development
Science-specific user interfaces & software tools
Data products (synthesis)
Cross-disciplinary data access: programmatic, standards-based interfaces
User support & training
www.iedadata.org

IEDA Repositories
Diagram labels: Investigators; Data; Data Managers; EarthChem; Metadata Catalog
FINDABLE & ACCESSIBLE: DOI registration; long-term archiving; CC license
Guidelines for data reporting: provenance metadata; formats
QC by data managers

Guidance for Investigators

QA/QC: Data Reporting Standards
Accessible in the EarthChem Library

DOI to allow proper citation of data
Link to publications
Link to funding source
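As an illustration of how a registered DOI supports citation, the sketch below assembles a citation string from a few descriptive metadata fields, loosely following the creator, year, title, publisher, identifier pattern that is commonly recommended for data citations. The DatasetRecord class, its field names, and the example DOI are hypothetical; they are not taken from the EarthChem Library or from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Minimal descriptive metadata for a published dataset (hypothetical shape)."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str  # bare DOI without resolver prefix, e.g. "10.0000/example"


def format_citation(rec: DatasetRecord) -> str:
    """Build a 'Creators (Year): Title. Publisher. DOI link' citation string."""
    authors = "; ".join(rec.creators)
    return (
        f"{authors} ({rec.year}): {rec.title}. "
        f"{rec.publisher}. https://doi.org/{rec.doi}"
    )


if __name__ == "__main__":
    # Made-up record for demonstration only.
    demo = DatasetRecord(
        creators=["Doe, J.", "Roe, R."],
        year=2018,
        title="Example basalt glass compositions",
        publisher="Example Repository",
        doi="10.0000/example",
    )
    print(format_citation(demo))
```

Because the DOI is written as a resolvable https://doi.org/ link, the same string serves both as a human-readable reference and as a stable pointer to the dataset landing page.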


Publishers' Concern: Reproducibility
M. McNutt, K. Lehnert, B. Hanson, B. A. Nosek, A. M. Ellison, J. L. King. Science, Policy Forum, 4 March 2016.

Recent Alignment by Publishers, Repositories, and Funders Around Data
TOP (Transparency and Openness Promotion guidelines): 538 journals
COPDESS.org (Coalition on Publishing Data in the Earth and Space Sciences): Statement of Commitment endorsed by most publishers and repositories in the Earth and space sciences
Joint Declaration of Data Citation Principles: endorsed by 109 organizations, including most major publishers
Reproducibility conferences and outcomes (AAAS and other organizations)
Certification standards for repositories (WDS, DSA, ISO)

TOP's 8 Standards
Data citation
Design transparency
Research materials transparency
Data transparency
Analytic methods (code) transparency
Preregistration of studies
Preregistration of analysis plans
Replication
3 tiers: Disclose, Require, Verify

https://www.force11.org/group/joint-declaration-data-citation-principles-final

Certification standards for repositories
From Helen Glaves and Gary Baker

Coalition on Publishing Data in the Earth and Space Sciences (COPDESS.org)
Connecting Earth science publishers and data facilities to help translate the aspirations of open, available, and useful data from policy into practice.
Formed in October 2014
Endorsed a Statement of Commitment, 2015

COPDESS Goals
Consistent policies across publishers/journals
Increase development and enforcement of data best practices
Reduce effort of metadata QC
Increase flow of small data into repositories
www.copdess.org

New Publishing Paradigms

Future of Linking Literature & Data
http://dliservice.research-infrastructures.eu/
http://www.scholix.org

The Challenge of Data Integration
Each system is listed with its metadata standard, database schema, and data types.

MGDL
Metadata standard: DataCite, ISO 19115-2, DIF, SeismicXML
Database schema: expedition-based
Data types: Multidisciplinary marine geoscience sensor field data (e.g. towed bathymetry, side scan sonar, controlled source seismic, seafloor photos and photomosaics). Derived data products (e.g. DEMs, microseismicity catalogs, magnetic and gravity gridded compilations, geologic interpretations). Diverse complementary sensor data (e.g. temperature and chemical probes, optical backscatter).

ECL
Metadata standard: DataCite, Dublin Core, ISO 19115-2
Database schema: dataset-based
Data types: Geochemical, geochronological, petrographic, and petrological datasets; code for geochemical data reduction and modelling.

USAP-DC
Metadata standard: DataCite, DIF
Database schema: NSF award-based
Data types: Multi-disciplinary data types spanning Antarctic research (e.g. volcano observatory video, penguin counts, paleo-geologic maps, meteorological model outputs).

ASP@UTIG
Metadata standard: DataCite, SeismicXML
Database schema: expedition-based
Data types: Controlled source seismic field (legacy) and processed data.

SESAR
Metadata standard: IGSN Description Metadata
Database schema: custom (sample-based)
Data types: Sample metadata for Earth science samples (e.g. rock, fossil, sediment, soil, fluid).
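One way to picture the integration problem is a metadata crosswalk: records described under different standards are mapped into one shared shape before they can be searched together. The sketch below is a minimal illustration under stated assumptions. The input dictionaries are simplified, flattened stand-ins for DataCite and Dublin Core records (real records are richer and nested), and the common field names (id, title, creators, year) are hypothetical, not a schema used by any of the systems above.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def from_datacite(rec: Record) -> Record:
    """Map a simplified DataCite-style record to the common shape."""
    return {
        "id": rec["identifier"],
        "title": rec["titles"][0],           # DataCite allows several titles
        "creators": list(rec["creators"]),
        "year": int(rec["publicationYear"]),
        "source_standard": "DataCite",
    }


def from_dublin_core(rec: Record) -> Record:
    """Map a simplified Dublin Core-style record to the common shape."""
    return {
        "id": rec["identifier"],
        "title": rec["title"],
        "creators": list(rec["creator"]),
        "year": int(str(rec["date"])[:4]),   # assumes an ISO date such as 2012-06-01
        "source_standard": "Dublin Core",
    }


# One crosswalk function per source standard (keys are illustrative labels).
CROSSWALKS: Dict[str, Callable[[Record], Record]] = {
    "datacite": from_datacite,
    "dublin_core": from_dublin_core,
}


def harmonize(records: List[Tuple[str, Record]]) -> List[Record]:
    """Convert (standard, record) pairs into a list of common-shape records."""
    return [CROSSWALKS[standard](rec) for standard, rec in records]


if __name__ == "__main__":
    # Made-up records for demonstration only.
    mixed = [
        ("datacite", {"identifier": "10.0000/a", "titles": ["Bathymetry grid"],
                      "creators": ["Doe, J."], "publicationYear": "2015"}),
        ("dublin_core", {"identifier": "10.0000/b", "title": "Glass chemistry",
                         "creator": ["Roe, R."], "date": "2012-06-01"}),
    ]
    for row in harmonize(mixed):
        print(row)
```

Each additional standard (ISO 19115-2, DIF, SeismicXML) would need its own mapping function, which hints at why integration effort grows with the number of systems.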

Toward Integration of Systems

EarthCube GeoLink Building Block
Find and integrate resources across repositories via Semantic Web (Linked Data): Awards, Expeditions, Datasets, Documents, Features, Instruments, Measurements, Organizations, Papers, Persons, Platforms, Presentations, Programs, Repositories, Samples
Slide courtesy of R. Arko, Columbia University
GeoLink + IGSNs

GeoLink design
1. Shared Ontology: classes + properties that describe resources, sufficient for discovery
2. Linked Data: Resources are on the Web under an open license; Resources are structured and use non-proprietary formats and languages (RDF, SPARQL); Resources have HTTP URIs; Resources are linked to other Resources
3. Canonical Resources: agreement on a reference set of resources to anchor mappings

Example: Linked Resources
ORCID; Expedition @R2R; Award @NSF; Paper; Dataset @ECL; Dataset @BCO-DMO; Samples @SESAR
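The linked-resource idea can be sketched with plain subject-predicate-object triples, the data model behind RDF. The example below is a toy in-memory illustration, not GeoLink code: it uses only the Python standard library rather than an RDF store or SPARQL endpoint, and every URI and property name (the https://example.org/ namespace, fundedBy, collectedDuring, cites, authoredBy) is a hypothetical placeholder rather than a term from the GeoLink ontology.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject URI, predicate URI, object URI)

EX = "https://example.org/"  # placeholder namespace, not a real GeoLink base URI

# Made-up links between an award, an expedition, datasets, a sample, and a paper.
TRIPLES: Set[Triple] = {
    (EX + "expedition/X1", EX + "prop/fundedBy", EX + "award/A1"),
    (EX + "dataset/ecl-1", EX + "prop/collectedDuring", EX + "expedition/X1"),
    (EX + "dataset/bco-1", EX + "prop/collectedDuring", EX + "expedition/X1"),
    (EX + "sample/S1", EX + "prop/collectedDuring", EX + "expedition/X1"),
    (EX + "paper/P1", EX + "prop/cites", EX + "dataset/ecl-1"),
    (EX + "paper/P1", EX + "prop/authoredBy", EX + "person/P-0001"),
}


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def resources_for_award(triples: Iterable[Triple], award: str) -> List[str]:
    """Two-hop traversal: award -> funded expeditions -> resources collected."""
    triples = list(triples)
    expeditions = [s for s, _, _ in match(triples, p=EX + "prop/fundedBy", o=award)]
    found = set()
    for expedition in expeditions:
        for s, _, _ in match(triples, p=EX + "prop/collectedDuring", o=expedition):
            found.add(s)
    return sorted(found)


if __name__ == "__main__":
    for uri in resources_for_award(TRIPLES, EX + "award/A1"):
        print(uri)
```

The traversal works across repositories only because each resource has one agreed URI; if two repositories minted different identifiers for the same expedition, the second hop would silently miss data, which is the role of the canonical resources in the design.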

Data Synthesis
Forming a picture

Concept: Data Mining
Measurements stored in a relational database
Linked to context & provenance metadata for search and filter
Harmonized and quality-controlled data & metadata
Data output allows immediate data analysis
Users generate new data compilations across any number of data sets
Downloadable spreadsheets with all context & provenance information
Use of a unique sample ID links data for individual samples across datasets to support further exploration
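The role of the unique sample ID can be illustrated with a small relational sketch using Python's built-in sqlite3 module. The table layout, column names, sample identifiers, and values below are invented for demonstration and do not reflect the actual schema or contents of any IEDA database; the point is only that one query keyed on the sample ID pulls together measurements published in different datasets, each with its provenance.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up samples and measurements."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE sample (
            sample_id TEXT PRIMARY KEY,   -- unique sample identifier
            rock_type TEXT
        );
        CREATE TABLE dataset (
            dataset_id INTEGER PRIMARY KEY,
            citation   TEXT               -- provenance: where the values came from
        );
        CREATE TABLE measurement (
            sample_id  TEXT REFERENCES sample(sample_id),
            dataset_id INTEGER REFERENCES dataset(dataset_id),
            variable   TEXT,
            value      REAL,
            unit       TEXT
        );
        """
    )
    con.executemany("INSERT INTO sample VALUES (?, ?)",
                    [("SAMPLE-001", "basalt"), ("SAMPLE-002", "basalt")])
    con.executemany("INSERT INTO dataset VALUES (?, ?)",
                    [(1, "Author A (2001)"), (2, "Author B (2010)")])
    con.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?, ?)", [
        ("SAMPLE-001", 1, "SiO2", 50.1, "wt%"),
        ("SAMPLE-001", 2, "Sr", 120.0, "ppm"),
        ("SAMPLE-002", 1, "SiO2", 48.7, "wt%"),
    ])
    return con


def compile_for_sample(con: sqlite3.Connection,
                       sample_id: str) -> List[Tuple[str, float, str, str]]:
    """Gather all measurements for one sample across datasets, with provenance."""
    return con.execute(
        """
        SELECT m.variable, m.value, m.unit, d.citation
        FROM measurement AS m
        JOIN dataset AS d ON d.dataset_id = m.dataset_id
        WHERE m.sample_id = ?
        ORDER BY m.variable
        """,
        (sample_id,),
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in compile_for_sample(db, "SAMPLE-001"):
        print(row)
```

Keeping the citation in a separate table and joining on it means every exported value can carry its source, which matches the slide's emphasis on downloads that retain context and provenance information.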

Data Synthesis: PetDB
Global compilation of geochemical data for igneous rocks from the ocean floor & mantle xenoliths
More than 2,280 data sets/publications
More than 87,600 samples
More than 3.3 million observed values
http://www.earthchem.org/petdb

Syntactic: ODM2
ODM2 Team: J. S. Horsburgh, A. K. Aufdenkampe, L. Hsu, A. Jones, K. Lehnert, E. Mayorga, L. Song, D. Tarboton, I. Zaslavsky

ODM2 Benefits
Integration of different observational data types: specimen-based single observations, time-series, arrays
Alignment with the OGC conceptual data model Observations & Measurements (ISO 19156:2011)
Comprehensive capture of provenance (but not aligned with W3C PROV)
Controlled vocabularies
Interoperability at the level of sampling features
Web services (OGC standards), XML (e.g. GeoSciML)

Interoperability with Library of Experimental Phase Relations

Interoperability with Modeling Tools

Data Synthesis: A (too?) BIG Effort
Transfer of data and metadata from publications into databases by data curators: time consuming; requires deep understanding of data; lack of recognition for the job
Partial automation possible: currently not satisfactory; expensive development effort

Automation: Example
Best reference: Ce Zhang. DeepDive: A Data Management System for Automatic Knowledge Base Construction. Ph.D. dissertation, University of Wisconsin-Madison, 2015.

Integrating Heterogeneous Data
You think it's easy?

Take home messages
Relevance of standards
Community best practices: data-type-specific documentation of provenance; data exchange protocols & APIs
Repository best practices
Benefits of collaboration & participation in research data organizations (RDA, WDS, etc.)
Connections to publishers & editors
Difficulty of migrating older (legacy) systems
