irods and Objectstorage UGM 2016, Chapel Hill / Othmar Weber, Bayer Business Services / v0.2

Similar documents
Implementing a Genomic Data Management System using irods at Bayer HealthCare

irods for Data Management and Archiving UGM 2018 Masilamani Subramanyam

FEBRUARY - MAY 2017 PROOF OF CONCEPT AND CASE STUDY. IBM Spectrum Protect and Backing up to Object Storage in the Cloud

AWS Storage Gateway. Not your father s hybrid storage. University of Arizona IT Summit October 23, Jay Vagalatos, AWS Solutions Architect

Simplifying Collaboration in the Cloud

Dell EMC CIFS-ECS Tool

Cloud Computing /AWS Course Content

Storage for HPC, HPDA and Machine Learning (ML)

REFERENCE ARCHITECTURE Quantum StorNext and Cloudian HyperStore

Object Storage Level 100

Zadara Enterprise Storage in

irods - An Overview Jason Executive Director, irods Consortium CS Department of Computer Science, AGH Kraków, Poland

What s New in Oracle Cloud Infrastructure Object Storage Classic. Topics: On Oracle Cloud. Oracle Cloud

It s More Than Just Backup. Sean Craig

Tape Sucks for Long-Term Retention Time to Move to the Cloud. How Cloud is Transforming Legacy Data Strategies

XtreemStore A SCALABLE STORAGE MANAGEMENT SOFTWARE WITHOUT LIMITS YOUR DATA. YOUR CONTROL

System Requirements. Hardware and Virtual Appliance Requirements

Web Object Scaler. WOS and IRODS Data Grid Dave Fellinger

Opendedupe & Veritas NetBackup ARCHITECTURE OVERVIEW AND USE CASES

APRIL 2017 PROOF OF CONCEPT AND CASE STUDY. IBM Spectrum Protect and Backing up to Object Storage in the Cloud

Object storage platform How it can help? Martin Lenk, Specialist Senior Systems Engineer Unstructured Data Solution, Dell EMC

Provisioning with SUSE Enterprise Storage. Nyers Gábor Trainer &

Executive Summary SOLE SOURCE JUSTIFICATION. Microsoft Integration

The Balabit s Privileged Session Management 5 F5 Azure Reference Guide

Interoperable Cloud Storage with the CDMI Standard. Mark Carlson, SNIA TC and Oracle Co-Chair, SNIA Cloud Storage TWG

Archive II. The archive. 26/May/15

irods usage at CC-IN2P3 Jean-Yves Nief

Ceph Intro & Architectural Overview. Abbas Bangash Intercloud Systems

Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved.

3.30pm. A sneak peek at Veeam 2018 releases Veeam for VMware Cloud on AWS technical deep dive Veeam Availability Console Update pm. 2.

Microsoft SharePoint 2010 The business collaboration platform for the Enterprise and the Web. We have a new pie!

Introduction to The Storage Resource Broker

Cloud object storage : the right way. Orit Wasserman Open Source Summit 2018

HPSS Treefrog Summary MARCH 1, 2018

Best Practices in Designing Cloud Storage based Archival solution Sreenidhi Iyangar & Jim Rice EMC Corporation

Copyright 2012 EMC Corporation. All rights reserved.

Vendor: IBM. Exam Code: Exam Name: Storage Sales V2. Version: DEMO

Security and Reliability of the SoundBite Platform Andy Gilbert, VP of Operations Ed Gardner, Information Security Officer

Storage S3 in backup. When? Value Architecture.

VMware Virtual SAN Technology

Amazon Web Services (AWS) Solutions Architect Intermediate Level Course Content

CC-IN2P3 activity. irods in production: irods developpements in Lyon: SRB to irods migration. Hardware setup. Usage. Prospects.

The Latest EMC s announcements

Reactive Microservices Architecture on AWS

Addressing Data Management and IT Infrastructure Challenges in a SharePoint Environment. By Michael Noel

Architecting Microsoft Azure Solutions (proposed exam 535)

DSpace Fedora. Eprints Greenstone. Handle System

SXL-8500 Series: LTO Digital Video Archives

How Symantec Backup solution helps you to recover from disasters?

Refresh a 1TB+ database in under 10 seconds

70-414: Implementing an Advanced Server Infrastructure Course 01 - Creating the Virtualization Infrastructure

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

powered by Cloudian and Veritas

The Fastest and Most Cost-Effective Backup for Oracle Database: What s New in Oracle Secure Backup 10.2

Managing Petabytes of data with irods. Jean-Yves Nief CC-IN2P3 France

XenData Product Brief: SX-550 Series Servers for LTO Archives

Using Cohesity with Amazon Web Services (AWS)

Rio-2 Hybrid Backup Server

SUSE Enterprise Storage Case Study Town of Orchard Park New York

How to solve your backup problems with HP StoreOnce

SnapCenter Software 4.0 Concepts Guide

Course AZ-100T01-A: Manage Subscriptions and Resources

How To Guide: Long Term Archive for Rubrik. Using SwiftStack Storage as a Long Term Archive for Rubrik

Copyright 2012 EMC Corporation. All rights reserved.

Got Isilon? Need IOPS? Get Avere.

The Technology Behind Datrium Cloud DVX

An overview of irods clients. Ton Smeele

Overcoming Obstacles to Petabyte Archives

GlusterFS and RHS for SysAdmins

SAA-C01. AWS Solutions Architect Associate. Exam Summary Syllabus Questions

How to Protect SAP HANA Applications with the Data Protection Suite

ROCK INK PAPER COMPUTER

CIMERA ARCHITECTURE. Release 4.2.x

Secure VFX in the Cloud. Microsoft Azure

Securing Data-at-Rest

Actifio Test Data Management

SoftNAS Cloud Data Management Products for AWS Add Breakthrough NAS Performance, Protection, Flexibility

Tanium Endpoint Detection and Response. (ISC)² East Bay Chapter Training Day July 13, 2018

Balakrishnan Nair. Senior Technology Consultant Back Up & Recovery Systems South Gulf. Copyright 2011 EMC Corporation. All rights reserved.

SXL-6500 Series: LTO Digital Video Archives

IBM Spectrum NAS, IBM Spectrum Scale and IBM Cloud Object Storage

Backup and archiving need not to create headaches new pain relievers are around

IT Certification Exams Provider! Weofferfreeupdateserviceforoneyear! h ps://

Developing Microsoft Azure Solutions (70-532) Syllabus

AMAZON S3 FOR SCIENCE GRIDS: A VIABLE SOLUTION?

AWS Services for Data Migration Luke Anderson Head of Storage, AWS APAC

TIBX NEXT-GENERATION ARCHIVE FORMAT IN ACRONIS BACKUP CLOUD

AUTOMATING IBM SPECTRUM SCALE CLUSTER BUILDS IN AWS PROOF OF CONCEPT

Business Continuity and Disaster Recovery Disaster-Proof Your Business

Developing Microsoft Azure Solutions (70-532) Syllabus

Isilon: Raising The Bar On Performance & Archive Use Cases. John Har Solutions Product Manager Unstructured Data Storage Team

Google Cloud Platform for Systems Operations Professionals (CPO200) Course Agenda

Exploiting Cloud Storage With DB2 for LUW

Cohesity Microsoft Azure Data Box Integration

Future plans for irods. John Constable Informatics Support Group

Presents. Functionality. Overview. Great Compatibility. LTO-7 Archive System. managed by XenData6 Server software

Scaling Massive Content Stores in the Cloud. CloudExpo New York June Alfresco Founder & CTO

Amazon S3 Glacier. Developer Guide API Version

Cloud Analytics and Business Intelligence on AWS

Application of Virtualization Technologies & CernVM. Benedikt Hegner CERN

Transcription:

irods and Objectstorage UGM 2016, Chapel Hill 2016-06-08 / Othmar Weber, Bayer Business Services / v0.2

Agenda irods at Bayer Situation and call for action Object Storage PoC Pillow talks Page 2

Overview irods@bayer Introduced in 2014 by an IT research project for LifeSciences Developed as internal data platform to securely store and share genomic data Integrated into the Bayer specific environment Identified as key enabler for future research data strategy Provided as a solution by a joint operations team (DevOps style) 2016: Bayer joined the irods consortium @ Used by HC Bioinformatics (Research) for data management HC Biomarker as an archival system CS Digital Farming in consideration by other projects Page 3

irods integration example HealthCare Bioinformatics HPC Users send / receive data (on demand) transparent storage access file system, object storage Hard Disk Import HiSeq, MiSeq file transfer send new experiment data (scheduled job) custom script irods metadata storage and query user + access rights management Genedata Profiler Metadata Input and Validation file and metadata transfer metadata transfer APIs audit trail Page 4

irods setup 1.0 Traditional deployment like any other server icat: VM with local database (local=san) ires: VM with local filesystem (local=san) It works. Why should I change something? Ever heard of NGS? NGS means ~ 300 GB per individual How do we handle the data of a study with 1000 individuals? Page 5

irods setup 1.0 Deployment like any other server icat: VM with local database (local=san) ires: VM with local filesystem (local=san) How does this work at Petabyte scale? It simply does not! Cost: SAN storage + Backup DR: Data mirror in 2 nd site needed Recovery: How do I revert a user or system error? Page 6

How should a storage solution in PB scale look like? (Whishes are for free ) highly scalable (from TB to PB) low price tag fully automated low administrative effort (self service?) backup included by design (replication + versioning) DR included by design (replication) data security features (fast rebuilt, check summing, self healing, access control, encryption) appliance with vendor support (no D.I.Y.) S3 interface support (de facto standard with irods plugin available) performant Object Storage! Page 7

The Proof of Concept Setup Setup: two different storage solutions S3 interface dedicated resource server with local file systems both storage solutions connected to the same resource server irods version 4.0.3 (start), 4.1.8 (later) S3-E icat ires S3-H Page 8

The Proof of Concept irods integration (Compound resource) S3 resource is implemented as compound resource Cache and Archive resource No direct storage access additional cache file system needed dedicated resource server with local file system both storage solutions connected to the same resource server Observations/Findings write cache read additional caching tier is not for free (housekeeping, performance) consistency monitoring between cache and archive necessary iput/iget provide purge cache option to delete cached objects archive not available for all commands Page 9

The Proof of Concept irods S3 resource plugin current available plugin is at version v1.3 compatible with irods v4.1.8 only Support of multiple connection parameters (->) Relies on libs3 & libcurl Object versioning was enabled on bucket level restore script to revert old versions Parameter S3_DEFAULT_HOSTNAME S3_AUTH_FILE S3_KEY_ID S3_ACCESS_KEY S3_RETRY_COUNT S3_WAIT_TIME_SEC S3_PROTO Meaning S3 endpoint address (e.g. s3.acme.com:80) File with Access key and secret key Access key (instead of S3_AUTH_FILE) Secret key (instead of S3_AUTH_FILE) Number of retries on connection error Time between retries HTTP or HTTPS More to come Example: iadmin mkresc S3archive s3 server:/bucket/irods/vault "S3_DEFAULT_HOSTNAME=s3.acme.com:80;S3_AUTH_FILE=/var/lib/irods/.S3_Auth;S3_RETRY_COUNT=3;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP" Page 10

The Proof of Concept irods S3 resource plugin observations and findings Load balancing somehow tricky S3 endpoints are usually hidden behind a load balancer iput with bulk option was only using one S3 connection multiple iput requests were faster because of load balancing Configuring multiple endpoints on same ires did not work (issue #1826) AWS v4 signatures currently not supported (regions eu-central-1 and ap-northeast-2, issue #1831) Renaming issues for files with special characters (issue #1829) Renaming issues for large files (no multipart copy, #1830) rename is a two step procedure (copy & delete) -> processing time related to object size! consistency issues between irods catalog and S3 bucket Issue to delete orphan catalog entries (issue #3154) ichksum did not work with archive resources (hierarchy issue, #3127) Page 11

Pillow talks (1/2) How to improve the relationship of irods and object storage irods currently relies on a POSIX file system implemented as compound resource Native object implementation possible? S3 plugin uses catalog entries as object keys leads to issues (rename timeout, ) archive nature of object storage should be considered decouple object and catalog entries Utilize object storage checksum capabilities (iscan, ichksum) Consider increasing parallel sessions on bulk transfer to better utilize load balancing multiple single files in bulk upload multipart upload (#1824,#1827) Page 12

Pillow talks (2/2) How to improve the relationship of irods and object storage Error handling for object storage access object storage needs to be handled different than POSIX files see: S3 ErrorBestPractices.html Think about metadata usage in object store metadata is a feature of object storage why not use this feature for recovery purposes? Things get better with plugin version 1.4: V4 signature, decoupling object names, renaming, multipart, new parameters, Page 13

Any questions? OTHMAR WEBER Scientific Computing Competence Center Phone: E-mail: +49 214 30 56972 othmar.weber@bayer.com Page 14 Discovering Bayer March 2016

Forward-Looking Statements This website/release/presentation may contain forward-looking statements based on current assumptions and forecasts made by Bayer management. Various known and unknown risks, uncertainties and other factors could lead to material differences between the actual future results, financial situation, development or performance of the company and the estimates given here. These factors include those discussed in Bayer s public reports which are available on the Bayer website at http://www.bayer.com/. The company assumes no liability whatsoever to update these forward-looking statements or to conform them to future events or developments. Page 15

Thank you!

Image credits Slide 6, DNA Sequencer CC BY 2.0, link to picture Slide 8, Think out of the box CC0, link to picture Slide 9, S3 Bucket, CC BY-SA 3.0, link to picture Slide 13/14, Pillow talk CC BY 2.0, link to picture Page 17