iRODS - An Overview
Jason Coposky, @jason_coposky
Executive Director, iRODS Consortium
CS3 2018
Department of Computer Science, AGH Kraków, Poland
What is iRODS
iRODS is:
- Distributed
- Open source
- Metadata driven
- Data centric
A flexible framework for the abstraction of infrastructure
iRODS as the Integration Layer
Data Virtualization
Combine various distributed storage technologies into a unified namespace:
- Existing file systems
- Cloud storage
- On-premises object storage
- Archival storage systems
iRODS provides a logical view into the complex physical representation of your data, distributed geographically and at scale.
Data Virtualization
[Figure: a logical path and its physical path(s)]
Data Virtualization
$ ils -L /tempzone/home/rods/thefile.txt
  rods 0 demoresc 29606 2016-10-05.09:05 & thefile.txt
    generic /var/lib/irods/irods/vault/home/rods/thefile.txt
  rods 1 repl;u2 29606 2016-10-05.09:06 & thefile.txt
    generic /tmp/u2vault/home/rods/thefile.txt
  rods 2 repl;u1 29606 2016-10-05.09:06 & thefile.txt
    generic /tmp/u1vault/home/rods/thefile.txt
Logical Path:
  /tempzone/home/rods/thefile.txt
Physical Paths:
  /var/lib/irods/irods/vault/home/rods/thefile.txt
  /tmp/u2vault/home/rods/thefile.txt
  /tmp/u1vault/home/rods/thefile.txt
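The mapping from one logical path to several physical replicas can be pulled out of the listing with a few lines of Python. This is a minimal sketch, assuming each replica occupies two lines (a summary line followed by a line ending in the physical path) and that paths contain no whitespace. The names `Replica` and `parse_replicas` are illustrative and not part of iRODS.

```python
from typing import List, NamedTuple


class Replica(NamedTuple):
    owner: str
    number: int
    resource: str
    size: int
    physical_path: str


# Listing copied from the slide, without the shell prompt line.
ILS_OUTPUT = """\
  rods 0 demoresc 29606 2016-10-05.09:05 & thefile.txt
    generic /var/lib/irods/irods/vault/home/rods/thefile.txt
  rods 1 repl;u2 29606 2016-10-05.09:06 & thefile.txt
    generic /tmp/u2vault/home/rods/thefile.txt
  rods 2 repl;u1 29606 2016-10-05.09:06 & thefile.txt
    generic /tmp/u1vault/home/rods/thefile.txt
"""


def parse_replicas(text: str) -> List[Replica]:
    """Pair each summary line with the location line that follows it."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    replicas = []
    for summary, location in zip(rows[0::2], rows[1::2]):
        owner, number, resource, size = summary[:4]
        # Assumption: the physical path is the last field of the second line.
        replicas.append(
            Replica(owner, int(number), resource, int(size), location[-1])
        )
    return replicas


for replica in parse_replicas(ILS_OUTPUT):
    print(replica.number, replica.resource, replica.physical_path)
```

All three replicas share the same logical path, which is why a client only ever needs to name `/tempzone/home/rods/thefile.txt`.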
Data Discovery
Attach metadata to any first-class entity within the iRODS Zone:
- Data Objects
- Collections
- Users
- Storage Resources
- The Namespace
iRODS provides automated and user-provided metadata, which makes your data and infrastructure more discoverable, operational, and valuable.
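iRODS metadata is usually described as attribute-value-unit (AVU) triples. The toy catalog below is a sketch of that idea only: it runs in memory, does not talk to an iRODS server, and `ToyCatalog`, the `project` attribute, and its values are made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class AVU(NamedTuple):
    attribute: str
    value: str
    unit: str = ""


class ToyCatalog:
    """In-memory stand-in for a metadata catalog (illustrative only)."""

    def __init__(self):
        # (entity type, entity name) -> set of AVU triples
        self._avus = defaultdict(set)

    def attach(self, entity, attribute, value, unit=""):
        self._avus[entity].add(AVU(attribute, value, unit))

    def find(self, attribute, value):
        """Return every entity carrying the given attribute/value pair."""
        return sorted(
            entity
            for entity, avus in self._avus.items()
            if any(a.attribute == attribute and a.value == value for a in avus)
        )


catalog = ToyCatalog()
# The same kind of triple can be attached to different entity types.
catalog.attach(("data_object", "/tempzone/home/rods/thefile.txt"), "project", "demo")
catalog.attach(("collection", "/tempzone/home/rods"), "project", "demo")
catalog.attach(("resource", "demoresc"), "project", "other")
catalog.attach(("user", "rods"), "quota", "100", "GB")

print(catalog.find("project", "demo"))
```

The point of the sketch is that discovery becomes a query over triples rather than a walk over directory names.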
Metadata Everywhere
Workflow Automation
Integrated scripting language that is triggered by any operation within the framework:
- Authentication
- Storage Access
- Database Interaction
- Network Activity
- Extensible RPC API
The iRODS rule engine provides the ability to capture real-world policy as computer-actionable rules, which may allow, deny, or add context to operations within the system.
Dynamic Policy Enforcement
The iRODS rule may:
- restrict access
- log for audit and reporting
- provide additional context
- send a notification
Dynamic Policy Enforcement
A single API call expands to many plugin operations, all of which may invoke policy enforcement.
Plugin Interfaces:
- Authentication
- Database
- Storage
- Network
- Rule Engine
- Microservice
- RPC API
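The expansion of one call into many policy enforcement points can be sketched in plain Python. This is a toy model, not the iRODS rule engine: `ToyRuleEngine`, the hook names, and the list of operations in `PUT_PLAN` are assumptions chosen for illustration. Each plugin operation fires a "pre" and a "post" hook, and a rule attached to a hook may deny the operation, add context, or record a notification.

```python
class PolicyDenied(Exception):
    """Raised by a rule to stop the operation that triggered it."""


class ToyRuleEngine:
    def __init__(self):
        self.rules = {}   # hook name -> list of rule callables
        self.fired = []   # every hook fired, kept as a simple audit log

    def on(self, hook, rule):
        self.rules.setdefault(hook, []).append(rule)

    def fire(self, hook, context):
        self.fired.append(hook)
        for rule in self.rules.get(hook, []):
            rule(context)


# Hypothetical expansion of a single "put" call into plugin operations.
PUT_PLAN = [
    ("authentication", "verify"),
    ("network", "read"),
    ("database", "check_access"),
    ("storage", "write"),
    ("database", "register"),
]


def put(engine, user, logical_path):
    context = {"user": user, "logical_path": logical_path, "notes": []}
    for interface, operation in PUT_PLAN:
        engine.fire(f"{interface}_{operation}_pre", context)
        # A real plugin would do its work here.
        engine.fire(f"{interface}_{operation}_post", context)
    return context


def restrict_guest(context):
    # Restrict access: deny before the permission check runs.
    if context["user"] == "guest":
        raise PolicyDenied("guest may not write")


def request_checksum(context):
    # Provide additional context for later operations.
    context["checksum_requested"] = True


def notify(context):
    # Stand-in for sending a notification.
    context["notes"].append(f"stored {context['logical_path']}")


engine = ToyRuleEngine()
engine.on("database_check_access_pre", restrict_guest)
engine.on("storage_write_pre", request_checksum)
engine.on("database_register_post", notify)
```

Calling `put(engine, "rods", "/tempzone/home/rods/thefile.txt")` fires ten hooks for five operations, while the same call as `guest` stops at the access check with `PolicyDenied`.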
Provenance and Reporting
Secure Collaboration
iRODS allows for collaboration across administrative boundaries after deployment:
- No need for common infrastructure
- No need for shared funding
- Affords temporary collaborations
iRODS provides the ability to federate namespaces across organizations without pre-coordinated funding or effort.
iRODS Service Interface
Federation - Shared Data and Services
Institutional repositories
As data matures and reaches a broader community, data management policy must also evolve to meet these additional requirements.
iRODS Use Cases
On Premises to Any Cloud Infrastructure
Data to Compute Use Case
Compute to Data Use Case
The Wellcome Trust Sanger Institute
Sanger - Replication
- Data is preferentially placed on resource servers in the green data center (fallback to red)
- Data is replicated to the other room
- Checksums are applied
- Green and red centers are both used for read access
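The placement policy on this slide can be sketched as a small simulation. This is not Sanger's implementation: the `Room` class and the `ingest` and `read` functions are hypothetical, the rooms are plain dictionaries, and MD5 is assumed as the checksum only because `md5` appears among the metadata attributes on the next slide.

```python
import hashlib


class Room:
    """A data center room holding replicas (in-memory stand-in)."""

    def __init__(self, name, online=True):
        self.name = name
        self.online = online
        self.objects = {}  # logical path -> (data, checksum)


def ingest(path, data, green, red):
    """Place data in green if possible, else red, then replicate."""
    primary, other = (green, red) if green.online else (red, green)
    if not primary.online:
        raise RuntimeError("no resource server available")
    checksum = hashlib.md5(data).hexdigest()
    primary.objects[path] = (data, checksum)
    replicated = False
    if other.online:
        other.objects[path] = (data, checksum)
        # Verify the replica against the checksum taken at ingest.
        replicated = hashlib.md5(other.objects[path][0]).hexdigest() == checksum
    return primary.name, checksum, replicated


def read(path, green, red):
    """Serve a read from either room, checking the stored checksum."""
    for room in (green, red):
        if room.online and path in room.objects:
            data, checksum = room.objects[path]
            if hashlib.md5(data).hexdigest() == checksum:
                return data
    raise FileNotFoundError(path)


green, red = Room("green"), Room("red")
print(ingest("/seq/run1.cram", b"reads", green, red))
```

If the green room goes offline after ingest, `read` still returns the data from the red replica, which is the practical payoff of replicating to the other room.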
Sanger - Metadata
- Users query and access data from local compute clusters
- Users access iRODS locally via the command line interface
Example metadata attributes:
attribute: library
attribute: total_reads
attribute: type
attribute: lane
attribute: is_paired_read
attribute: study_accession_number
attribute: library_id
attribute: sample_accession_number
attribute: sample_public_name
attribute: manual_qc
attribute: tag
attribute: sample_common_name
attribute: md5
attribute: tag_index
attribute: study_title
attribute: study_id
attribute: reference
attribute: sample
attribute: target
attribute: sample_id
attribute: id_run
attribute: study
attribute: alignment
Sanger - Federation
University College London
- UK sponsored research requirements: last date of access request plus 10 years
- iRODS tiers data across storage technologies
- Enables federated access from other centers
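The retention requirement is simple date arithmetic: the clock restarts at every access request and runs for ten years. A minimal sketch follows, assuming whole calendar years and rolling a 29 February anniversary forward to 1 March in non-leap years so data is never released early. The function names are illustrative.

```python
from datetime import date

RETENTION_YEARS = 10  # from the slide: last access request plus 10 years


def retention_expiry(last_access: date, years: int = RETENTION_YEARS) -> date:
    """Return the date until which the data must be kept."""
    try:
        return last_access.replace(year=last_access.year + years)
    except ValueError:
        # 29 February has no anniversary in a non-leap year; keep the data
        # one day longer rather than one day shorter.
        return date(last_access.year + years, 3, 1)


def may_release(last_access: date, today: date) -> bool:
    """True once the retention period has fully elapsed."""
    return today > retention_expiry(last_access)


print(retention_expiry(date(2018, 1, 29)))
```

Because the expiry depends on the last access request, a policy of this kind needs that date recorded per data object, for example as metadata updated on each access.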
iRODS Software Roadmap
The Roadmap
- iRODS 4.3
- Packaged iRODS Capabilities
- Multipart Transfer
- Cacheless Object Storage
- Query Arrow
- Metadata Templates
- Filesystem Integration
The Roadmap - iRODS 4.3
- Hardening Release
- Logging
- iRODS Monitor
- Delegate Checksum to Storage Plugins
Packaged iRODS Capabilities
Multipart Transfer
- Provide reliable transfer with restart - object parts tracked in the catalog
- Later versions will provide fast, first-class access to object storage
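Restartable transfer follows from tracking parts: if each completed part is recorded, a retry only has to send what is missing. The sketch below illustrates that idea and nothing more. The catalog is a Python set, the destination is a dictionary, and `split_parts`, `transfer`, `assemble`, and the `fail_on_part` switch are hypothetical names, not the planned iRODS design.

```python
def split_parts(data: bytes, part_size: int):
    """Cut the object into fixed-size parts (the last may be shorter)."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def transfer(data, part_size, catalog, destination, fail_on_part=None):
    """Send every part not yet recorded in the catalog.

    Returns the indices sent during this call. `fail_on_part` simulates
    a dropped connection just before that part is sent.
    """
    sent = []
    for index, part in enumerate(split_parts(data, part_size)):
        if index in catalog:
            continue  # already stored by an earlier attempt
        if index == fail_on_part:
            raise ConnectionError(f"link dropped before part {index}")
        destination[index] = part
        catalog.add(index)  # record the part only after it is stored
        sent.append(index)
    return sent


def assemble(destination, part_count):
    return b"".join(destination[i] for i in range(part_count))


payload = bytes(range(10))
catalog, destination = set(), {}

try:
    transfer(payload, 3, catalog, destination, fail_on_part=2)
except ConnectionError:
    pass  # parts 0 and 1 are already recorded in the catalog

resent = transfer(payload, 3, catalog, destination)
print(resent, assemble(destination, 4) == payload)
```

Recording a part only after it is stored is the design choice that makes the restart safe: a crash between the two steps costs one resent part, never a missing one.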
iRODS 4.2 and Beyond - The Scatter
Next Generation Query Interface
iRODS 4.3 and Beyond - The Gather
Shared Data - Shared Infrastructure
Metadata Templates
iRODS Consortium Business Model
The iRODS Consortium
Our Mission:
- Write Good Software
- Grow the Community
- Show Value to our Membership
Why Open Source
- Transparency
- Quality
- Persistence
- Vendor Neutrality
- Customization
- Community
- Try before you buy
Our Membership
Our Business Model
Consortium Membership:
- Participate in roadmap development
- Participate in consortium governance
- Direct support from the team
- Tier 3 support agreements
- Discount for support agreements
Our Business Model
Service & Support Contracts:
- Billed hourly
- Implement Proofs of Concept
- Custom rule and plugin development
- Expand to new use cases
- Discounted rate for consortium members
Membership Committees
Technology Working Group:
- Monthly web conferences
- Build iRODS Roadmap
- Propose new technology direction
- Propose inclusion of new software
- Propose new working groups
Membership Committees
Planning Committee:
- Monthly web conferences
- Discuss consortium policy and business practices
- Propose conferences and workshops
- Vote on inclusion of new software
- Vote on roadmap
Membership Committees
Executive Board:
- Meets twice yearly
- Votes on consortium budget and bylaw changes
- Determines the thematic priorities of the consortium
Additional working groups are formed as required
Our Consortium Participation