Sessions 3/4: Member Node Breakouts. John Cobb Matt Jones Laura Moyers 7 July 2013 DataONE Users Group

Similar documents
DataONE Enabling Cyberinfrastructure for the Biological, Environmental and Earth Sciences

DataONE Cyberinfrastructure. Ma# Jones Dave Vieglais Bruce Wilson

Wendy Thomas Minnesota Population Center NADDI 2014

Research Data Repository Interoperability Primer

Comparing Open Source Digital Library Software

Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitative Social Science (IQSS) Harvard

NRF Open Access Statement Impact Survey

DEVELOPING, ENABLING, AND SUPPORTING DATA AND REPOSITORY CERTIFICATION

SEAD Data Services. Jim Best Practices in Data Infrastructure Workshop. Cooperative agreement #OCI

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

Building a Digital Repository on a Shoestring Budget

8th International Digital Curation Conference January 2013

COAR Interoperability Roadmap. Uppsala, May 21, 2012 COAR General Assembly

Paving the Rocky Road Toward Open and FAIR in the Field Sciences

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

DSpace Fedora. Eprints Greenstone. Handle System

International Multidisciplinary Metadata Workshop 18 January Rebecca Koskela Arctic Region Supercomputing Center

Digital Curation and Preservation: Defining the Research Agenda for the Next Decade

Helping Journals to Upgrade Data Publications for Reusable Research

Dagmar Triebel, Peter Grobe, Anton Güntsch, Gregor Hagedorn, Joachim Holstein, Carola Söhngen, Claus Weiland, Tanja Weibulat.

Data Curation Handbook Steps

DATAVERSE FOR JOURNALS

Digital repositories as research infrastructure: a UK perspective

The GeoPortal Cookbook Tutorial

Adding OAI ORE Support to Repository Platforms

Networking European Digital Repositories

Engaging and Connecting Faculty:

Networking European Digital Repositories

An overview of the OAIS and Representation Information

Dryad Curation Manual, Summer 2009

Data Symposium 2012 SeWHIP & CTSI John W. Cobb, Ph.D. Milwaukee, WI March 1, 2012

Building for the Future

EUDAT. Towards a pan-european Collaborative Data Infrastructure

FLAT: A CLARIN-compatible repository solution based on Fedora Commons

The European Repositories Landscape - The view from 20,000 feet

B2SAFE metadata management

DA-NRW: a distributed architecture for long-term preservation

Enabling Interaction and Quality in a Distributed Data DRIS

University of British Columbia Library. Persistent Digital Collections Implementation Plan. Final project report Summary version

Increasing access to OA material through metadata aggregation

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

DataONE: Open Persistent Access to Earth Observational Data

Persistent identifiers, long-term access and the DiVA preservation strategy


EPrints: Repositories for Grassroots Preservation. Les Carr,

OpenAIRE. Fostering the social and technical links that enable Open Science in Europe and beyond

Town Hall Meeting on the National and International Data Infrastructure. Jim Warren, NIST Lisa Friedersdorf, NNCO

The Design of a DLS for the Management of Very Large Collections of Archival Objects

A Micro-Services-Based Approach for Curation and Preservation Solutions

Data publication and discovery with Globus

A service-oriented national e-thesis information system and repository

How to contribute information to AGRIS

FAIR-aligned Scientific Repositories: Essential Infrastructure for Open and FAIR Data

Current Digital Preservation Trends and SDB4 Features. Pauline Sinclair, Tessella, PASIG, Madrid, 5 July 2010

ACDH AUSTRIAN CENTRE FOR DIGITAL HUMANITIES

Site# Date H20 Temperature Conductance Turbidity KRS Sep KRS Aug KRS Aug

Richard Marciano Alexandra Chassanoff David Pcolar Bing Zhu Chien-Yi Hu. March 24, 2010

Its All About The Metadata

Mitigating Risk of Data Loss in Preservation Environments

OPENAIRE FP7 POST-GRANT OPEN ACCESS PILOT

OpenAIRE Guidelines Promoting Repositories Interoperability and Supporting Open Access Funder Mandates

Part 2: Current State of OAR Interoperability. Towards Repository Interoperability Berlin 10 Workshop 6 November 2012

Lessons Learned. Implementing Rosetta in the Harold B. Lee Library

EUDAT. Towards a Collaborative Data Infrastructure. Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012

DATA SHARING AND DISCOVERY WITH ARCGIS SERVER GEOPORTAL EXTENSION. Clive Reece, Ph.D. ESRI Geoportal/SDI Solutions Team

Research Data Management and Institutional Repositories

Presented by Dr Joanne Evans, Centre for Organisational and Social informatics Faculty of IT, Monash University Designing for interoperability

EUDAT Towards a Collaborative Data Infrastructure

Writing a Data Management Plan A guide for the perplexed

CMSP Discovery Vocabularies Workshop. A Joint WHOI/USGS/NOAA Workshop

The MetaArchive of Southern Digital Culture. Building a Multi-Institutional Digital Preservation Network

Cheshire 3 Framework White Paper: Implementing Support for Digital Repositories in a Data Grid Environment

Ensuring Proper Storage for Earth Science Data: The USGS Process to Certify Trusted Digital Repositories

GNU EPrints 2 Overview

Persistent Identifier the data publishing perspective. Sünje Dallmeier-Tiessen, CERN 1

Developing Java TM 2 Platform, Enterprise Edition (J2EE TM ) Compatible Applications Roles-based Training for Rapid Implementation

Registry Interchange Format: Collections and Services (RIF-CS) explained

Software + Services for Data Storage, Management, Discovery, and Re-Use

Demos: DMP Assistant and Dataverse

If you build it, will they come? Issues in Institutional Repository Implementation, Promotion and Maintenance

Showing it all a new interface for finding all Norwegian research output

DOIs for Research Data

B2FIND: EUDAT Metadata Service. Daan Broeder, et al. EUDAT Metadata Task Force

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

Networking European Digital Repositories

The DataCite Metadata Schema. Frauke Ziedorn Workshop: Metadata and Persistent Identifiers for Social and Economic Data 7th May 2012

Dataverse and DataTags

Metadata Zoo Dataset Metadata Rebecca Koskela Execu4ve Director, DataONE

Igitur Archive: Institutional Repository Utrecht University. May , Martin Slabbertje

Jeffery S. Horsburgh. Utah Water Research Laboratory Utah State University

Agenda. Bibliography

The Common Framework for Earth Observation Data. US Group on Earth Observations Data Management Working Group

Conducting a Self-Assessment of a Long-Term Archive for Interdisciplinary Scientific Data as a Trustworthy Digital Repository

The Semantic Institution: An Agenda for Publishing Authoritative Scholarly Facts. Leslie Carr

Infrastructure for the UK

Research on the Interoperability Architecture of the Digital Library Grid

Building a Materials Data Facility (MDF)

SMART CONNECTOR TECHNOLOGY FOR FEDERATED SEARCH

Fedora Commons Update

The Materials Data Facility

Transcription:

Sessions 3/4: Member Node Breakouts John Cobb Matt Jones Laura Moyers 7 July 2013 DataONE Users Group

Schedule 1:00-2:20 and 2:40-4:00 Member Node Breakouts Member Node Overview and Process Overview Documentation and Communications Data driven Science, Data archives (MNs) and DataONE Integration opportunities Summary / Questions & Comments http://epad.dataone.org/dug2103-bk1-mns 2

Member Nodes Authoritative members of the Federation Curate data holdings Provide unique identifiers for each object Ensure availability, quality, and reliability Log and report accesses to objects Control access to data and metadata Replicate holdings for other MNs Deploy a DataONE-compatible software system 3 3 3

DataONE Design goals Embrace heterogeneity via Data Packaging Preservation High availability Reproducibility, Immutability and Versioning Logging All within a distributed, heterogeneous, autonomous federation of Member Nodes 4

Persistent Identifiers Goal: Uniquely identify data or metadata objects to support reproducible analysis via data citation Every object in DataONE gets a Persistent Identifier Not-reusable Indirect reference to immutable content 800 Unicode characters or less Whitespace and non-printing characters illegal http://mule1.dataone.org/architecturedocs-current/design/ PIDs.html 5 5

Some identifiers doi:10.5063/aa/duc_merp.126.4 ark:/13030/m5zp459k/1/cadwsaps5800837-001.xml urn:uuid:e26ef510-cfcd-11e1-aee9-7f4e395c5a4c urn:lsid:ubio.org:namebank:11815 http://example.com/data/mydata?row=24 duc_merp.126.4 ข อม ลท เป นประโยชน 6 6

DataONE Packages Goal: Aggregate heterogeneous data and metadata objects, linking among components Flexibly describe complex data structures Supports arbitrary file formats Virtual aggregations by reference Objects can be in multiple packages Extensible model for relationships Linked-data compatible http://mule1.dataone.org/architecturedocs-current/design/datapackage.html 7 7

Package Model Package Data System Metadata Science Metadat a System Metadata Any data object Each object: Has unique identifier Content does not change Resourc e Map OAI-ORE RDF XML documents: ISO19115, EML, FGDC, System Metadata 8

Discover Content: Metadata Formats Extract and index common fields from metadata standards Ecological Metadata Language (EML) FGDC Biological Data Profile (BDP) ISO 19115 Geospatial Metadata Dublin Core Darwin Core METS Extensible to include many more DIF, NexML, WaterML, CF, NcML, ESML, DDI, MIENS,... 9 9

Road Map to a Member Node Plan Develop Deploy Operate Determine feasibility Join the DataONE federation Select the tier Plan the implementation Use existing MN software Ye s Deploy in staging? environment Announcement No Developmen t iteration Developme nt test Establish a test system Test in staging environment Deploy in production Mutual acceptance Ongoing operations Participate in MN Forums 10

Plan: Member Node Tiers Plan Determine feasibility Join the DataONE federation Select the tier Tier 1: Public content Tier 2: Access control Tier 3: Write services Tier 4: Act as a replication target Plan the implementation 11 11

Develop: Available MN Software Develop Use existing MN software? No Developmen t iteration Developme nt test Establish a test system Wrap your own Implement the Tier 1 MN APIs however you like Implement Tiers 2-4 as you need Use existing repository software Metacat (Tier 4) Generic Member Node (Tier 4) Dryad (Tier 1) Mercury (Tier 1) Merritt (Tier 1)... more coming... 12 12 38 12

Deploy: The Bottom Line Deploy Deploy in staging environment Test in staging environment Deploy in production Mutual acceptance Hardware CPUs Storage Infrastructure Power Internet Facilities Administration MN operation Data curation Implementation Design Development Long-term Maintenance Migration Shutdown 13

Operate: Moving to Production Operate Announcement Ongoing operations Participate in MN Forums To flip the switch: Must pass all tests required for tier Security updates, patches applied Administrative procedures in place Agreements in place Once running: Announcement Maintain member node integrity Respond to administrative requests Community participation 14 14

Typical Barriers During deployment Establishing secure SSL environment Providing package ORE maps Design Issues Immutability & Versioning Logging 15

Questions and Discussion Member Node Checklist http://mule1.dataone.org/operationdocs/ member_node_deployment/mn_checklist.html http://epad.dataone.org/dug2103-bk1-mns 16

MN Documentation and Communication Current communication channels Are they working? What can make them better? Documentation is a form of communication Current status What can make it better? 17

DataONE communications channels with MNs Member Node Forum (MNF) Meets biweekly Desire for MNF to be: One stop communications start location Especially for issues common to MNs (which most are) A place for MN leverage (MNs helping MNs directly) IRC (Internet Relay Chat) Redmine can follow status of MN tickets DataONE developers mailing list great for technical POCs http://epad.dataone.org/dug2013-bk1-mns 18

MN documentation status Current documentation Overview (what is DataONE, how to become a member node, etc.) DataONE website https://www.dataone.org/ https://ask.dataone.org/ Technical documentation Architecture documentation http://mule1.dataone.org/architecturedocs-current/ Operations documentation http://mule1.dataone.org/operationdocs/ http://epad.dataone.org/dug2013-bk1-mns 19

MN documentation status What s missing? A link between the high-level overview information and the technical details It needs to be accessible user-friendly comprehensive http://epad.dataone.org/dug2103-bk1-mns 20

Road Map to a Member Node Plan Develop Deploy Operations Determine feasibility Join the DataONE federation Select the tier Plan the implementation Use existing MN software Ye s Deploy in staging? environment Announcement No Developmen t iteration Developme nt test Establish a test system Test in staging environment Deploy in production Mutual acceptance Ongoing operations Participate in MN Forums 21

Questions: Documentation 1. What should be seen on the DataONE website regarding Member Nodes? http://epad.dataone.org/dug2103-bk1-mns 22

Questions: Documentation 1. What should be seen on the DataONE website regarding Member Nodes? 2. What new questions and answers are needed on ask.dataone.org re: Member Nodes (and science users)? http://epad.dataone.org/dug2103-bk1-mns 23

Questions: Documentation 1. What should be seen on the DataONE website regarding Member Nodes? 2. What new questions and answers are needed on ask.dataone.org re: Member Nodes (and science users)? 3. What 2-3 things do you need to know, or some questions that you have had trouble finding the answer to, that would improve documentation? http://epad.dataone.org/dug2103-bk1-mns 24

Question: MN communications When and how do you use IRC? Online documentation? Developers mailing list? CCIT calls Redmine? How can we maximize the utility of the MNF and other communication means? http://epad.dataone.org/dug2103-bk1-mns 25

DataONE and MN added value For science researchers, how can DataONE help amplify the science impact of the data archives represented by the Member nodes? Known: Resilient access Persistence A single interface (may or may not be preferred) Unified discovery across archives Pilot demonstrations Scaling Enabling new investigations (may be scaling related) Not known: Routinely access multiple archives simultaneously (incl. synthesis) Build multi-archive interoperable workflows Building useful/powerful derived data products Known not possible Silver bullets Everyone gets a pony Free lunches 26

Questions: What can MNs and DataONE do to augment synergistic opportunities? What are the general areas of opportunity? How to identify the ripe, low hanging fruit? What are the principal challenges? Can one DUG function as a self-sustaining reservoir for this discussion? http://epad.dataone.org/dug2103-bk1-mns 27

Questions: What can MNs and DataONE do to augment synergistic opportunities? How to find and assist science users who want to use multiple archives (multiple MN s)? What is the value to science users beyond direct access to individual archives? How to enable this value within DataONE? What are the important differences between using different MNs in DataONE versus using multiple access methods (for example, multiple datanets) http://epad.dataone.org/dug2103-bk1-mns 28

Questions: What can MNs and DataONE do to augment synergistic opportunities? How to find and assist science users who want to use multiple archives (multiple MN s) Other Questions? http://epad.dataone.org/dug2103-bk1-mns 29

Contact Points John Cobb cobbjw@ornl.gov 865.576.5439 Matt Jones jones@nceas.ucsb.edu Laura Moyers lmoyers1@utk.edu Member Node Forum (contact us to be added) http://www.dataone.org/contact 30

Metacat (Tier 4) Flexible storage system for metadata and data Stores, search, and document data Java webapp runs on Linux, Windows, MacOS Deployed worldwide, maintained over 10 years Web-based search interface Customizable user interface Web metadata entry tool Tier 4 Replication capabilities Postgres or Oracle backend DOI Support OAI-PMH harvester GPL open source license 31 31

Generic Member Node (GMN) (Tier 4) Lightweight system to wrap backend data stores Tier 4 compliant Simple to install Configurable to wrap different back end systems Python Django implementation Linux, MacOS X, Windows Apache2 open source license No user interface http://repository.dataone.org/ software/cicore/trunk/mn/d1_mn_generic 32 32 40 32

Dryad (Tier 1) Data from peer-reviewed articles in bioscience Metadata schemas focused on METS Submissions integrated with journal workflows Customized DSpace repository DOI Support OAI-PMH harvester Data usage displays Dublin Core Application Profile Data file format determined by depositor and journal policy Some curation and migration of file formats 33 33 41 33

Mercury (Tier 1) Flexible data and metadata repository Supports common metadata formats (FGDC, Dublin- Core, EML, ISO-19115) Open source, Java-based webapp jointly developed by USGS, DOE, NASA and NSF SOLR-based member node implementation for metadata File-system store for data Uses existing Clearinghouse harvesting architecture DOI support Deployed across many environmental data projects Web-based search interface Customizable user interface Web metadata entry tool 42 34 34 34

Merritt (Tier 1) Flexible data repository from California Digital Library Implementation uses Metacat for DataONE services Micro-Services approach to digital curation Easy-to-use interface for deposit and update Integration with EZID/ DOI support Tools for long-term management 35 35 43 35