iRODS: the Integrated Rule-Oriented Data-Management System


Wayne Schroeder, Paul Tooby
Data Intensive Cyber Environments Team (DICE)
DICE Center, University of North Carolina at Chapel Hill; Institute for Neural Computation (INC), University of California San Diego
irods.org, dice.unc.edu, diceresearch.org

Who Are We?
  Computer Scientists and Software Engineers
    Started in 1997
    Grew out of High Performance Computing
    Broadened support to Digital Libraries/Preservation
  Doing applied research
    Digital Preservation and Data Grids
  Develop and distribute the Integrated Rule-Oriented Data System (iRODS)
    Open Source; PCs to High-Performance Computing

What Problems Are We Solving?
  Researchers may have millions of computer files
    Keep them safely stored and replicated (remotely)
    Distribute them across the network; remote access
    Automate handling; rules, workflows
    Keep track of what they are (metadata)
    Be able to find the right ones quickly (queries)
    Share them, in a controlled manner (authentication, access control, audit trails)
    Preserve them; change storage transparently

Data Management Applications
  Data grids
    Build shareable collection
  Processing pipelines
    Apply standard procedures on files
  Digital Libraries
    Publish data collections
  Preservation Environments
    Ensure properties of a collection
    Migrate collections to new technology
  Federation
    Build collection that spans multiple data grids

What Does iRODS Do? (1 of 3)
  Remote High-Performance Data Access
    get/put, read/write
    Encapsulate small message in request to send
    Parallel threads for large transfers with TCP/IP
    Reliable Blast UDP
  Unified View of Disparate Data
    Organizes distributed data into a shareable collection
    Separates physical from logical (logical name-space)
    Keeps track of names and locations of files
  Storage System Independent
    Unix/Windows File Systems
    HPSS (Archival Storage)
    Universal mass storage system interface
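The separation of logical names from physical locations can be illustrated with a small sketch. This is a toy model, not the iRODS API: the `Catalog` and `Replica` classes, the zone name, the resource names, and all paths are hypothetical. It only shows the idea that a file keeps one logical name while its physical copies are replicated or moved between storage systems.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str       # hypothetical storage resource name
    physical_path: str  # where the bytes live on that resource


@dataclass
class Catalog:
    """Toy stand-in for a metadata catalog: logical path -> replicas."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_path, resource, physical_path):
        # Record one more physical copy under the same logical name.
        self.entries.setdefault(logical_path, []).append(
            Replica(resource, physical_path))

    def migrate(self, logical_path, old_resource, new_resource, new_path):
        # Swap the physical location; the logical name is untouched.
        self.entries[logical_path] = [
            Replica(new_resource, new_path) if r.resource == old_resource else r
            for r in self.entries[logical_path]
        ]

    def locate(self, logical_path):
        return [(r.resource, r.physical_path)
                for r in self.entries[logical_path]]


catalog = Catalog()
logical = "/demoZone/home/alice/scan001.tif"  # hypothetical logical name
catalog.register(logical, "unix_disk_ca", "/vault/a1/scan001.tif")
catalog.register(logical, "archive_nj", "/hpss/proj/0001/scan001.tif")

# Move the disk copy to new storage; users keep using the same logical name.
catalog.migrate(logical, "unix_disk_ca", "new_disk_ca", "/vault2/scan001.tif")
print(catalog.locate(logical))
```

In this sketch a client only ever refers to the logical path, which is why storage can be changed without users noticing.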

What Does iRODS Do? (2 of 3)
  Replication/Backup
    Physical copies
  Metadata (RDBMS)
    System and user-defined
    PostgreSQL, Oracle, MySQL
    Queries/Information Discovery
  Access Control
    users/groups
  Secure
    Passwords, Grid Security Infrastructure (GSI), Kerberos, Shibboleth soon
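How metadata in a relational database supports discovery and access control can be sketched with Python's built-in sqlite3 module. This is an illustration only: the table layout, the column names, the users, and the `find` helper are assumptions made for this example and do not reflect the actual iRODS catalog schema or query interface.

```python
import sqlite3

# In-memory database standing in for the metadata catalog (hypothetical schema).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE data_object (id INTEGER PRIMARY KEY, logical_path TEXT);
    CREATE TABLE metadata (object_id INTEGER, attribute TEXT, value TEXT, unit TEXT);
    CREATE TABLE access (object_id INTEGER, user_name TEXT, permission TEXT);
""")

db.executemany("INSERT INTO data_object VALUES (?, ?)", [
    (1, "/demoZone/brain/scan001.tif"),
    (2, "/demoZone/brain/scan002.tif"),
])
# User-defined descriptive metadata attached to each object.
db.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?)", [
    (1, "species", "mouse", ""),
    (1, "resolution", "0.5", "um"),
    (2, "species", "rat", ""),
])
# Who may see which object.
db.executemany("INSERT INTO access VALUES (?, ?, ?)", [
    (1, "alice", "own"), (1, "bob", "read"), (2, "alice", "own"),
])


def find(user, attribute, value):
    """Return logical paths matching the metadata that `user` may access."""
    rows = db.execute("""
        SELECT o.logical_path FROM data_object o
        JOIN metadata m ON m.object_id = o.id
        JOIN access a ON a.object_id = o.id
        WHERE a.user_name = ? AND m.attribute = ? AND m.value = ?
        ORDER BY o.logical_path
    """, (user, attribute, value))
    return [path for (path,) in rows]


print(find("bob", "species", "mouse"))  # bob has read access to scan001
print(find("bob", "species", "rat"))    # bob has no access to scan002
```

Joining the query against the access table means a user can only discover objects they are allowed to see, which is one simple way to combine information discovery with access control.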

What Does iRODS Do? (3 of 3)
  Policies / computer-actionable rules
    Highly configurable
    Enforce management policies
    Automate administrative functions
    Validate assessment criteria
  Workflows
    Rules/Micro-services (immediate, delayed, periodic)
  Management of Large Collections
    irsync - synchronize remote directory into the data grid
    Audit trails
    Descriptive metadata - extensible schema
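The three execution modes for rules (immediate, delayed, periodic) can be sketched as a tiny scheduler. This is a conceptual model only and is not written in the iRODS rule language: the `RuleEngine` class, the `checksum` and `replicate` functions, and the integer clock are all hypothetical stand-ins, with each function playing the role of a micro-service and a rule being a list of them.

```python
import heapq


# Hypothetical micro-services: small functions that a rule chains together.
def checksum(obj, log):
    log.append(f"checksum {obj}")


def replicate(obj, log):
    log.append(f"replicate {obj}")


class RuleEngine:
    def __init__(self):
        self.now = 0     # simulated clock, in arbitrary time units
        self.queue = []  # heap of (due_time, sequence, rule, obj, period)
        self.seq = 0     # tie-breaker so heap entries never compare rules
        self.log = []    # audit trail of executed micro-services

    def run(self, rule, obj):
        # Immediate execution: apply each micro-service in order.
        for microservice in rule:
            microservice(obj, self.log)

    def schedule(self, rule, obj, delay, period=None):
        # Delayed execution; with `period` set, the rule repeats.
        heapq.heappush(
            self.queue, (self.now + delay, self.seq, rule, obj, period))
        self.seq += 1

    def advance(self, until):
        # Run every queued rule that falls due up to time `until`.
        while self.queue and self.queue[0][0] <= until:
            due, _, rule, obj, period = heapq.heappop(self.queue)
            self.now = due
            self.run(rule, obj)
            if period is not None:
                self.schedule(rule, obj, period, period)
        self.now = until


engine = RuleEngine()
engine.run([checksum], "scan001.tif")                           # immediate
engine.schedule([replicate], "scan001.tif", delay=10)           # delayed
engine.schedule([checksum], "scan001.tif", delay=5, period=20)  # periodic
engine.advance(until=30)
print(engine.log)
```

The log doubles as a simple audit trail: every micro-service that ran is recorded in execution order.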

Sharing Data in iRODS Data System
  [Diagram: Scientist A adds data to a shared collection; Scientist B accesses and analyzes the shared data. The iRODS Data System connects a Brain Data Server (CA), an Audio Data Server (NJ), a Video Data Server (TN), and the iRODS Metadata Catalog.]
  Scientists can use iRODS as a data grid to share multiple types of data, near and far. iRODS Rules also enforce and audit human subjects access restrictions.

DICE Technologies Helping UCSD Projects
  The National Center for Microscopy and Imaging Research (NCMIR) is using DICE SRB and testing iRODS in the Cell Centered Database project.
  DICE iRODS helps computational seismologists from the Southern California Earthquake Center (SCEC) manage large-scale earthquake simulation data at SDSC and other TeraGrid sites.
  The UCSD Libraries Digital Asset Management System (DAMS) uses DICE technologies, including SRB.
  DICE iRODS helps the Ocean Observatories Initiative (OOI), with Scripps and Calit2, manage large-scale diverse ocean data, including real-time streaming data.
  And others, including CineGrid, TDLC, etc.

Connecting Data Collections for New Science
  "Federating" isolated "silos" of data enables new collaborations
    OOI ocean data flows in the iRODS data grid to the NOAA National Climatic Data Center (NCDC)
    NCDC climate data is accessed through the data grid for CUAHSI hydrology research on floods
    CUAHSI hydrology data connects to the Odum Institute for social science research on human impacts and response to floods
    OOI climate data is discovered and flows to the iPlant Consortium for designing drought-resistant plants for climate change adaptation

Growing Use of iRODS Data System
  Astronomy: NOAO, NVO, Observatoire de Strasbourg, France; CADAC, etc.
  Geo: NOAA NCDC; OOI; SCEC, etc.
  HPC: TeraGrid sites, SDSC, TACC, NICS, etc.
  NASA NCCS
  Bio: TDLC, NCMIR, iPlant, etc.
  Preservation: NARA TPAP, French National Library; Texas Digital Library; Fedora Commons; DSpace, etc.
  Workflow: Kepler, Taverna, etc.
  International: EU SHAMAN; Australian ARCS; UK e-Science; KEK (Japan); Academia Sinica (Taiwan); CC-IN2P3 HEP, France; etc.
  Industry: IBM, Oracle/Sun, Atos Origin, Microsoft, DataDirect

DICE Team
  Data Intensive Cyber Environments Center
  University of North Carolina at Chapel Hill (UNC)
    UNC School of Information and Library Science (SILS)
    Renaissance Computing Institute (RENCI)
    Reagan Moore (Professor)
    Arcot Rajasekar (Professor)
    Antoine de Torcy, Chien-Yi Hou, Mike Conway
  UC San Diego
    Institute for Neural Computation (INC)
    Mike Wan
    Wayne Schroeder
    Sheau-Yen Chen, Bing Zhu, Paul Tooby
  iRODS development is supported by
    NSF OCI-0848296 "NARA Transcontinental Persistent Archives Prototype" (2008-2012)
    NSF SDCI 0721400 "Data Grids for Community Driven Applications" (2007-2010)