SESAR, IGSN, & a vision for a Repository Portal and Hosted Collection Management

Moving Repositories into the Digital Age: SESAR, IGSN, & a vision for a Repository Portal and Hosted Collection Management
Kerstin Lehnert & Megan Carter Orlando
IEDA, Lamont-Doherty Earth Observatory, Columbia University

Talk Outline
  Update on CI developments: SESAR, IGSN
  Motivations & Previous Work
  Proposed System Functionalities
  Leveraging Existing Components
  Discussion

SESAR: System for Earth Sample Registration
  Authenticated workspace with tools for users to submit & manage sample metadata (MySESAR)
  IGSN Allocating Agent: Register samples with IGSN
  Searchable catalog of sample metadata & supplementary documents submitted by users
  Register samples (batch or individual)
  View/edit metadata
  Create groups/collections
  Transfer ownership of metadata
  Generate labels (QR code)
  Role-based access

Update: New SESAR Features
  Architecture aligned with new IGSN syntax rules
    Allows IGSN >9 digits
    SESAR holds name space IE, users have sub-name spaces
  APIs for submission (incl. authentication), updates, & access of sample metadata
  Batch updating of sample metadata
  Role-based permissions
  Linked Data version of SESAR (GeoLink project: R. Arko, P. Ji)
    Links to cruise DOIs, publication & data DOIs, ORCIDs
  OAI-PMH provider for IGSN Central Catalog & Community Portals
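An OAI-PMH provider such as the one listed above is harvested with plain HTTP requests defined by the protocol. The following is a minimal sketch of how a harvester might build a ListRecords request. The base URL is a placeholder and not SESAR's actual endpoint; `oai_dc` is used because every OAI-PMH provider is required to support it, while any richer metadata prefix offered by SESAR is not known from the slides.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; not SESAR's real address.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    'verb', 'metadataPrefix', 'from', and 'set' are standard OAI-PMH
    request arguments; everything else here is an assumption.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec    # selective harvesting by set
    return base_url + "?" + urlencode(params)


url = list_records_url(BASE_URL, from_date="2017-01-01")
print(url)
```

A harvester would fetch this URL, parse the XML response, and repeat the request with the returned resumption token until the list is complete.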

IEDA Data Browser

SESAR: Plans
  Modify metadata to distinguish samples in trusted repositories from informal (uncurated) archiving, to accommodate emerging open access policies of publishers & funders
  Allow users to add customized metadata fields to accommodate specific sample types & communities
  Improved search interface (Solr index, Elasticsearch)
  Linking to data (as data repositories start to use IGSN)
  PID-based linking to people (ORCID), cruises (DOI), funding awards (FundRef), repositories (re3data), etc.

IGSN
  Resolvable, globally unique PID (Persistent Identifier)
  Based on the Handle System (like DOI), resolvable at http://igsn.org/xxxzzzzzz
  Governed by an international non-profit organization (IGSN e.V.)
  Members on 5 continents
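Because an IGSN resolves at the address pattern shown above, a link to a sample can be built from the identifier alone. A minimal sketch, assuming only the resolver address given on the slide; the function name and the sample identifier `XXXZZZZZZ` are placeholders, and no IGSN syntax validation is attempted because the syntax rules are not spelled out here.

```python
from urllib.parse import quote

RESOLVER = "http://igsn.org/"  # resolver address as given on the slide


def igsn_url(igsn, resolver=RESOLVER):
    """Return the resolver URL for an IGSN (hypothetical helper)."""
    cleaned = igsn.strip()
    if not cleaned:
        raise ValueError("empty IGSN")
    # Percent-encode so the identifier is safe inside a URL path.
    return resolver + quote(cleaned, safe="")


print(igsn_url(" XXXZZZZZZ "))
```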

Update about IGSN
  >6.25 million IGSNs issued! (Status January 2017)
  Counts shown on the slide: 4,106,273; 2,100,273; 25,748; 25,583; 4,461
  Allocating Agent (started): IEDA (2005), Geoscience Australia (2015), CSIRO (2015), MARUM (2015), GFZ (2015)
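The headline figure can be checked against the five counts on the slide. The short calculation below assumes that those five numbers are the per-agent totals that make up the headline; the slides do not state this explicitly.

```python
# Counts as they appear on the slide (assumed to be per-agent totals).
counts = [4_106_273, 2_100_273, 25_748, 25_583, 4_461]

total = sum(counts)
print(f"{total:,}")        # 6,262,338
print(total > 6_250_000)   # True, consistent with ">6.25 million"
```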

[Chart: IGSN Prefixes Used at SESAR (proxy for active users), by year, 2005 to 2016; vertical axis 0 to 120; annotation: "Batch registration released"]

Linking Samples, Data, & Publications

Adoption: Agencies
  USGS, State surveys, Smithsonian, GA, BGS, IFREMER, GESEP

IGSN Adoption by publishers
  "AGU Publications also strongly encourages use of other identifiers in our journal papers. International Geo Sample Numbers (IGSNs) uniquely identify items, such as a rock sample, a piece of coral, or a vial of water taken from the natural environment, and provide important, consistent information about these samples. Registering samples and including the IGSN in papers helps secure provenance information but most importantly connects common samples across multiple studies in the literature. IGSNs also will help you keep track of your samples. These identifiers can be reserved before a field season or assigned afterward."
  Hanson, B. (2016), AGU opens its journals to author identifiers, Eos, 97, doi:10.1029/2016eo043183. Published on 7 January 2016.

IGSN in DataCite

IGSN & ORCID
  "All, ORCID is officially introducing IGSN as a new identifier type in the API and search indexes. It will use the igsn.org prefix and should be live in a couple of weeks. I will check what this means concretely when it is live and see if I can come up with some examples that demonstrate the functionality. Cheers, m."
  (mail from Markus Stocker, Pangaea, 4/25/2017)

IGSN Architecture
  The separation of administrative and descriptive metadata (learning from setting up DataCite).
  Metadata in a common service can be:
    Least common denominator
    All-encompassing with lots of optional elements
  Communities of practice define their metadata as an extension of a core set of metadata.
  http://igsn.github.io/

IGSN Description Metadata
  The schema gives a minimal set of descriptive metadata for a global sample catalogue.
    Sample Identification
    Sampling Activity
    Sample Curation
    Related Resources
  Elements reference IGSN admin, ODM2, O&M, DataCite.
  http://schema.igsn.org/description/
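The idea of a small core of descriptive metadata that communities extend can be pictured as a data structure. The sketch below is illustrative only: the four groups mirror the section names on the slide, but every class name, field name, and value is hypothetical and does not reproduce the actual schema at schema.igsn.org.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CoreDescription:
    """Hypothetical core record: one slot per section named on the slide."""
    igsn: str
    sample_identification: dict = field(default_factory=dict)
    sampling_activity: dict = field(default_factory=dict)
    sample_curation: dict = field(default_factory=dict)
    related_resources: list = field(default_factory=list)


@dataclass
class CommunityDescription(CoreDescription):
    """Hypothetical community profile: the core plus its own extension."""
    community: str = ""
    extension: dict = field(default_factory=dict)


def core_view(record):
    """Return only the core fields, as a global catalogue might store them."""
    return {f.name: getattr(record, f.name) for f in fields(CoreDescription)}


# All values below are made up for illustration.
rec = CommunityDescription(
    igsn="XXXZZZZZZ",
    sample_identification={"name": "example sample"},
    community="example-community",
    extension={"custom_field": "custom value"},
)
print(sorted(core_view(rec)))
```

A global catalogue would keep only the output of `core_view`, while a community portal could store and display the full record including its extension.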

A Repository Portal
  Federated network of repository catalogs that improves discovery, access, sharing, analysis, and curation of physical samples to promote transparency, reproducibility, and re-use in the era of open science.
  Supports digital collection management if needed
  Streamlines collection metadata management, exchange, & integration with other systems (e.g. IGSN, IMLGS)
  Facilitates discovery & access to samples for investigators
  Harmonizes sample request & access policies and procedures across repositories
  Supports consistent & automated generation of use statistics to demonstrate Return on Investment (ROI)

Requirements Gathering: DESC
  Supplement to IEDA in 2011 from OCI (DCL)
  Collaboration with OSU, LacCore, AZGS, RENCI
  Extensive survey of repositories
  Produced a report on status and requirements

Requirements Gathering: iSamples
  EarthCube iSamples Research Coordination Network (RCN)
  Stakeholder Alignment Survey (J. Cutcher-Gershenfeld)
  Use Cases Working Group (S. Ramdeen, A. Deere)
    Collected user stories to articulate life cycle practices of samples for different users
  Workflow Applications Working Group (J. Bowring, A. Hangsterfer)
    Identify barriers that inhibit adoption of leading practices, incl. IGSN assignment, sample documentation, and sample citation.
  "Requesting and receiving actual physical samples from a museum/repository is important.": .81 (.19)
  "Requesting and receiving actual physical samples from a museum/repository is easy.": .43 (.24)

Requirements Gathering: SESAR
  Ongoing feedback from the community (investigators, curators) regarding functionality, usability, performance
  Includes EAR and PLR funded repositories, museums, agencies
  Collaboration with other Allocating Agents of the IGSN e.V.
  International best practices and emerging software tools
  LDEO is the Managing Office of the IGSN e.V.
  Participation in previous Curators Meetings
  Extensive use cases provided by Anders Noren, Anthony Koppers, Kevin Johnson and others

Possible System Components
  Repository Portal
    User interface and APIs for sample search across repositories
    Authenticated workspace for users & repositories
      Request samples from multiple repositories with a single transaction
      View loans
      Communicate with curators
    Authenticated dashboard for NSF to view use statistics
  Repository Collection Management
    Digital sample and collection management for curators

Repository Portal Functionality
  Rich metadata catalog, harvests catalogs from external systems
  UI to search for samples across repositories
    Phase 1: simple search by location, lithology, age, expedition
    Phase 2: advanced search on available data (via OpenCoreData?)
  Shopping cart approach for sample requests
    Submit single request to multiple repositories
  Centralized user management (aligned with ORCID)
    User account setup and login for transactions with all repositories
  Dashboard for users to view pending, active, and past requests
    Track communication with repositories
  Dashboard for NSF to view use statistics such as number of requests, samples shipped, users served, requests returned, etc.
  Additional functionality as requested and as funding allows
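The shopping cart item implies that one user transaction fans out into one request per repository. A minimal sketch of that step follows, assuming a cart is a list of (repository, sample identifier) pairs; the function name, repository names, and identifiers are all hypothetical, since the proposed portal is not yet specified at this level.

```python
from collections import defaultdict


def split_cart(cart):
    """Group cart items by repository so each one receives a single request."""
    per_repository = defaultdict(list)
    for repository, sample_id in cart:
        per_repository[repository].append(sample_id)
    return dict(per_repository)


# Made-up cart contents for illustration.
cart = [
    ("Repository A", "SAMPLE-001"),
    ("Repository B", "SAMPLE-002"),
    ("Repository A", "SAMPLE-003"),
]

requests = split_cart(cart)
for repository, samples in requests.items():
    print(repository, samples)
```

The user submits once; the portal would then forward each group to the curator of the corresponding repository and track the replies in the user's dashboard.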

Repository Collection Management
  Hosted collection database to enter & edit sample metadata
    Batch upload of new samples added to the collection
    Batch upload of samples from sampling events in the lab
    Customizable metadata (add fields, add vocabularies)
    Set role-based permissions for metadata access & editing
    Track metadata changes (who did it, when)
    Automated IGSN registration
  Integration with Repository Portal loan management
    Alerts about new requests
    View, approve, track request status
    Send reminder to users for sample returns and reports
    Automatically tracks loan statistics
  Other functionality as desired (data & image storage, label generation, etc.)
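Tracking metadata changes (who did it, when) amounts to keeping an append-only history next to each record. The sketch below shows one simple way to do that; the class, its fields, and the sample values are hypothetical and are not taken from SESAR or any existing collection system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SampleRecord:
    """Hypothetical sample record with an append-only change history."""
    metadata: dict
    history: list = field(default_factory=list)

    def update(self, user, key, value, when=None):
        """Change one metadata field and log who changed it and when."""
        old = self.metadata.get(key)
        if old == value:
            return  # nothing changed, so nothing is logged
        self.history.append({
            "user": user,
            "when": when or datetime.now(timezone.utc),
            "field": key,
            "old": old,
            "new": value,
        })
        self.metadata[key] = value


# Made-up values for illustration.
record = SampleRecord(metadata={"lithology": "basalt"})
record.update("curator-1", "lithology", "andesite")
record.update("curator-1", "lithology", "andesite")  # no-op, not logged

for change in record.history:
    print(change["user"], change["field"], change["old"], "->", change["new"])
```

Role-based permissions would sit in front of `update`, deciding whether a given user may edit a given field before the change is recorded.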

Leveraging Existing Components
  SESAR
    PLUS: Multi-user; role-based access; dashboards; APIs; integration with IGSN; operational environment (data facility)
    MINUS: Missing sample request & loan management; technology upgrade required
  Specify
    PLUS: Cloud-based, scalable, modular; broad adoption in BIO; open source; customizable
    MINUS: Designed for BIO; no core-specific functionality; single operator at KU
  Polar Rock Repository
    PLUS: Great functionality for collection management and sample requests; user friendly
    MINUS: Single user implementation; no core-specific functionality
  Curation DIS
    PLUS: Core-specific functionality; multi-institutional architecture
    MINUS: Not open source; old technology; single POF; only for cores
  CyVerse (former iPlant)
    PLUS: Scalable architecture
    MINUS: Complex (overkill?)
  TAMU ODP
    PLUS: To be explored
    MINUS: To be explored

Value Proposition
  For Repositories
    Online loan management & user interactions
    Automated use statistics
    Seamless IGSN registration
    APIs to support metadata harvest by IMLGS & other catalogs
    Common user database aligned with ORCID
    Linking of samples to cruises (R2R), data, publications, etc.
    Integration with Open Core Data?
    Enhanced communication & coordination across repositories
    Long-term sample metadata preservation

Value Propositions
  For Users
    One-stop-shop to search for and request samples across repositories
    Single user account to view pending/active/past requests, and to communicate with repositories
  For NSF
    Increased efficiency of repository operations over the long-term
    Easy access to harmonized up-to-date use statistics via dashboard
    Implementation of common sample access policies

Discussion
  Does this plan address your most pressing concerns?
  Is there anything that is missing?
  Are there other existing tools or efforts that you think we should look into/leverage?

Adoption
  Poster session at AGU Fall Meeting 2016
  EGU 2016: "The IGSN Experience"
  Posters at EGU General Assembly 2016