Moving Repositories into the Digital Age: SESAR, IGSN, & a vision for a Repository Portal and Hosted Collection Management
Kerstin Lehnert & Megan Carter Orlando
IEDA, Lamont-Doherty Earth Observatory, Columbia University
Talk Outline
- Update on CI developments: SESAR, IGSN
- Motivations & Previous Work
- Proposed System Functionalities
- Leveraging Existing Components
- Discussion
SESAR: System for Earth Sample Registration
- Authenticated workspace with tools for users to submit & manage sample metadata (MySESAR)
  - Register samples (batch or individual)
  - View/edit metadata
  - Create groups/collections
  - Transfer ownership of metadata
  - Generate labels (QR code)
  - Role-based access
- IGSN Allocating Agent: register samples with IGSN
- Searchable catalog of sample metadata & supplementary documents submitted by users
Update: New SESAR Features
- Architecture aligned with new IGSN syntax rules
  - Allows IGSNs longer than 9 characters
  - SESAR holds name space IE; users have sub-name spaces
- APIs for submission (incl. authentication), updates, & access of sample metadata
- Batch updating of sample metadata
- Role-based permissions
- Linked Data version of SESAR (GeoLink project; R. Arko, P. Ji)
  - Links to cruise DOIs, publication & data DOIs, ORCIDs
- OAI-PMH provider for IGSN Central Catalog & Community Portals
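To make the OAI-PMH item concrete, here is a minimal sketch of how a harvester (for example a community portal) might build a ListRecords request. The verb and the argument names (metadataPrefix, from, resumptionToken) come from the OAI-PMH protocol itself; the endpoint address and the function name are assumptions for illustration only, since the slides do not give the actual provider URL.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint; the slides do not state the real provider address.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent with the verb and nothing else.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            # Selective harvesting: only records changed since this date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(list_records_url(BASE_URL, from_date="2017-01-01"))
```

A harvester would request the first page with a metadata prefix and then repeat the call with each resumption token the provider returns until none is left.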
IEDA Data Browser
SESAR: Plans
- Modify metadata to distinguish samples in trusted repositories from informal (uncurated) archiving, to accommodate emerging open access policies of publishers & funders
- Allow users to add customized metadata fields to accommodate specific sample types & communities
- Improved search interface (Solr index, Elasticsearch)
- Linking to data (as data repositories start to use IGSN)
- PID-based linking to people (ORCID), cruises (DOI), funding awards (FundRef), repositories (re3data), etc.
IGSN
- Resolvable, globally unique PID (Persistent Identifier)
- Based on the Handle System (like DOI), resolvable at http://igsn.org/xxxzzzzzz
- Governed by an international non-profit organization (IGSN e.V.)
- Membership on 5 continents
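As a small illustration of "resolvable at http://igsn.org/...", the sketch below turns an IGSN into its resolver URL. The base address is the one given above; the function name, the normalization to upper case, and the letters-and-digits check are assumptions made for this example, not rules taken from the IGSN specification.

```python
import string

RESOLVER_BASE = "http://igsn.org/"  # resolver address as given on the slide

# Assumption for this sketch: an IGSN contains only ASCII letters and digits.
_ALLOWED = set(string.ascii_uppercase + string.digits)


def igsn_url(igsn: str) -> str:
    """Return the resolver URL for an IGSN, normalizing case and whitespace."""
    normalized = igsn.strip().upper()
    if not normalized or not set(normalized) <= _ALLOWED:
        raise ValueError(f"not a plausible IGSN: {igsn!r}")
    return RESOLVER_BASE + normalized


if __name__ == "__main__":
    # "IEXXX0001" is a made-up identifier used only for illustration.
    print(igsn_url(" iexxx0001 "))
```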
Update about IGSN
>6.25 million IGSNs issued! (Status January 2017)
[Chart: IGSNs issued per Allocating Agent; values shown: 4,106,273; 2,100,273; 25,748; 25,583; 4,461]
Allocating Agents (year started): IEDA (2005), Geoscience Australia (2015), CSIRO (2015), MARUM (2015), GFZ (2015)
IGSN Prefixes Used at SESAR (proxy for active users)
[Chart: IGSN prefixes used, 2005 to 2016; vertical axis 0 to 120; annotation: "Batch registration released"]
Linking Samples, Data, & Publications
Adoption: Agencies
- USGS
- State surveys
- Smithsonian
- GA
- BGS
- IFREMER
- GESEP
IGSN Adoption by Publishers
"AGU Publications also strongly encourages use of other identifiers in our journal papers. International Geo Sample Numbers (IGSNs) uniquely identify items, such as a rock sample, a piece of coral, or a vial of water taken from the natural environment, and provide important, consistent information about these samples. Registering samples and including the IGSN in papers helps secure provenance information but most importantly connects common samples across multiple studies in the literature. IGSNs also will help you keep track of your samples. These identifiers can be reserved before a field season or assigned afterward."
Hanson, B. (2016), AGU opens its journals to author identifiers, Eos, 97, doi:10.1029/2016eo043183. Published on 7 January 2016.
IGSN in DataCite
IGSN & ORCID
"All, ORCID is officially introducing IGSN as a new identifier type in the API and search indexes. It will use the igsn.org prefix and should be live in a couple of weeks. I will check what this means concretely when it is live and see if I can come up with some examples that demonstrate the functionality. Cheers, m."
(mail from Markus Stocker, Pangaea, 4/25/2017)
IGSN Architecture
- The separation of administrative and descriptive metadata (learning from setting up DataCite)
- Metadata in a common service can be:
  - Least common denominator
  - All-encompassing with lots of optional elements
- Communities of practice define their metadata as an extension of a core set of metadata
http://igsn.github.io/
IGSN Description Metadata
The schema gives a minimal set of descriptive metadata for a global sample catalogue.
- Sample Identification
- Sampling Activity
- Sample Curation
- Related Resources
Elements reference IGSN admin, ODM2, O&M, and DataCite.
http://schema.igsn.org/description/
A Repository Portal
Federated network of repository catalogs that improves discovery, access, sharing, analysis, and curation of physical samples to promote transparency, reproducibility, and re-use in the era of open science.
- Supports digital collection management if needed
- Streamlines collection metadata management, exchange, & integration with other systems (e.g. IGSN, IMLGS)
- Facilitates discovery of & access to samples for investigators
- Harmonizes sample request & access policies and procedures across repositories
- Supports consistent & automated generation of use statistics to demonstrate Return on Investment (ROI)
Requirements Gathering: DESC
- Supplement to IEDA in 2011 from OCI (DCL)
- Collaboration with OSU, LacCore, AZGS, RENCI
- Extensive survey of repositories
- Produced a report on status and requirements
Requirements Gathering: iSamples
EarthCube iSamples Research Coordination Network (RCN)
- Stakeholder Alignment Survey (J. Cutcher-Gershenfeld)
- Use Cases Working Group (S. Ramdeen, A. Deere)
  - Collected user stories to articulate life cycle practices of samples for different users
- Workflow Applications Working Group (J. Bowring, A. Hangsterfer)
  - Identify barriers that inhibit adoption of leading practices, incl. IGSN assignment, sample documentation, and sample citation
Survey items shown on the slide:
- "Requesting and receiving actual physical samples from a museum/repository is important": .81 (.19)
- "Requesting and receiving actual physical samples from a museum/repository is easy": .43 (.24)
Requirements Gathering: SESAR
- Ongoing feedback from the community (investigators, curators) regarding functionality, usability, performance
  - Includes EAR- and PLR-funded repositories, museums, agencies
- Collaboration with other Allocating Agents of the IGSN e.V.
  - International best practices and emerging software tools
  - LDEO is the Managing Office of the IGSN e.V.
- Participation in previous Curators Meetings
  - Extensive use cases provided by Anders Noren, Anthony Koppers, Kevin Johnson, and others
Possible System Components
Repository Portal
- User interface and APIs for sample search across repositories
- Authenticated workspace for users & repositories
  - Request samples from multiple repositories with a single transaction
  - View loans
  - Communicate with curators
- Authenticated dashboard for NSF to view use statistics
Repository Collection Management
- Digital sample and collection management for curators
Repository Portal Functionality
- Rich metadata catalog, harvests catalogs from external systems
- UI to search for samples across repositories
  - Phase 1: simple search by location, lithology, age, expedition
  - Phase 2: advanced search on available data (via OpenCoreData?)
- Shopping cart approach for sample requests
  - Submit single request to multiple repositories
- Centralized user management (aligned with ORCID)
  - User account setup and login for transactions with all repositories
  - Dashboard for users to view pending, active, and past requests
  - Track communication with repositories
- Dashboard for NSF to view use statistics such as number of requests, samples shipped, users served, requests returned, etc.
- Additional functionality as requested and as funding allows
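The "single request to multiple repositories" idea can be sketched as a cart that the portal splits into one request per repository. This is only an illustration of the concept: the class, function, and repository names are hypothetical, and the actual portal design is still open.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CartItem:
    igsn: str        # identifier of the requested sample
    repository: str  # repository that holds the sample


def split_by_repository(cart: List[CartItem]) -> Dict[str, List[str]]:
    """Group the IGSNs in one cart into one request per repository."""
    requests: Dict[str, List[str]] = defaultdict(list)
    for item in cart:
        requests[item.repository].append(item.igsn)
    return dict(requests)


if __name__ == "__main__":
    # Made-up identifiers and repository names, for illustration only.
    cart = [
        CartItem("IEXXX0001", "REPO_A"),
        CartItem("IEXXX0002", "REPO_B"),
        CartItem("IEXXX0003", "REPO_A"),
    ]
    for repository, igsns in split_by_repository(cart).items():
        print(repository, igsns)
```

The user submits the cart once; each repository would then see only the part of the request that concerns its own samples.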
Repository Collection Management
- Hosted collection database to enter & edit sample metadata
  - Batch upload of new samples added to the collection
  - Batch upload of samples from sampling events in the lab
  - Customizable metadata (add fields, add vocabularies)
  - Set role-based permissions for metadata access & editing
  - Track metadata changes (who did it, when)
  - Automated IGSN registration
- Integration with Repository Portal loan management
  - Alerts about new requests
  - View, approve, track request status
  - Send reminders to users for sample returns and reports
  - Automatically tracks loan statistics
- Other functionality as desired (data & image storage, label generation, etc.)
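"Track metadata changes (who did it, when)" could be realized as a per-sample change history. The sketch below shows one simple way to do that under stated assumptions: all class and field names are hypothetical, and a hosted system would persist the history in a database rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class Change:
    field_name: str
    old_value: Any
    new_value: Any
    user: str            # who made the change
    timestamp: datetime  # when the change was made (UTC)


@dataclass
class SampleRecord:
    metadata: Dict[str, Any] = field(default_factory=dict)
    history: List[Change] = field(default_factory=list)

    def update(self, field_name: str, new_value: Any, user: str) -> None:
        """Set a metadata field and log who changed it and when."""
        old_value = self.metadata.get(field_name)
        if old_value == new_value:
            return  # nothing changed, so nothing is logged
        self.metadata[field_name] = new_value
        self.history.append(
            Change(field_name, old_value, new_value, user,
                   datetime.now(timezone.utc))
        )


if __name__ == "__main__":
    record = SampleRecord()
    record.update("lithology", "basalt", user="curator_1")
    record.update("lithology", "gabbro", user="curator_2")
    for change in record.history:
        print(change.timestamp.isoformat(), change.user,
              change.field_name, change.old_value, "->", change.new_value)
```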
Leveraging Existing Components

System: SESAR
  PLUS: Multi-user; role-based access; dashboards; APIs; integration with IGSN; operational environment (data facility)
  MINUS: Missing sample request & loan management; technology upgrade required

System: Specify
  PLUS: Cloud-based, scalable, modular; broad adoption in BIO; open source; customizable
  MINUS: Designed for BIO; no core-specific functionality; single operator at KU

System: Polar Rock Repository
  PLUS: Great functionality for collection management and sample requests; user friendly
  MINUS: Single-user implementation; no core-specific functionality

System: Curation DIS
  PLUS: Core-specific functionality; multi-institutional architecture
  MINUS: Not open source; old technology; single point of failure; only for cores

System: CyVerse (formerly iPlant)
  PLUS: Scalable architecture
  MINUS: Complex (overkill?)

System: TAMU ODP
  PLUS: To be explored
  MINUS: To be explored
Value Proposition
For Repositories
- Online loan management & user interactions
- Automated use statistics
- Seamless IGSN registration
- APIs to support metadata harvest by IMLGS & other catalogs
- Common user database aligned with ORCID
- Linking of samples to cruises (R2R), data, publications, etc.
- Integration with Open Core Data?
- Enhanced communication & coordination across repositories
- Long-term sample metadata preservation
Value Propositions
For Users
- One-stop shop to search for and request samples across repositories
- Single user account to view pending/active/past requests, and to communicate with repositories
For NSF
- Increased efficiency of repository operations over the long term
- Easy access to harmonized, up-to-date use statistics via dashboard
- Implementation of common sample access policies
Discussion
- Does this plan address your most pressing concerns?
- Is there anything that is missing?
- Are there other existing tools or efforts that you think we should look into or leverage?
Adoption
- Poster session at AGU Fall Meeting 2016
- EGU 2016: "The IGSN Experience"
- Posters at EGU General Assembly 2016