Simplifying Collaboration in the Cloud


Simplifying Collaboration in the Cloud: WOS and iRODS Data Grid. Dave Fellinger, dfellinger@ddn.com

Innovating in Storage
DDN firsts:
» Streaming ingest from satellites with guaranteed bandwidth
» Continuous service to air for a major network
» Guaranteed QoS for supercomputers
» 12 GB/s storage controller
» Controller with an embedded application VM
» And the first object storage system designed for scientific collaboration: WOS!

The Big Data Reality
Information universe in 2009: 800 exabytes. Projected for the 2020s: 35 zettabytes.
A new type of data is driving this growth:
» Structured data: relational tables or arrays
» Unstructured data: all other human-generated data
» Machine-generated data: growing as fast as Moore's Law

Overview of iRODS + WOS Architecture
(Diagram: users can search, access, add, and manage data & metadata; the iRODS Rule Engine tracks policies, the iRODS Data System's data servers hold data on disk, and the iRODS Metadata Catalog tracks data, all backed by the WOS object system: WOS data, object locality manager, policy-driven global data replication management, 60 drives in 4 RU.)
iRODS:
» Assemble distributed data into a shared collection
» Manage properties of the collection
» Enforce management policies
» Register data to facilitate search mechanisms
» Support a wide range of management applications
» Policy driven: data sharing, publication, preservation, analysis
» Automate administrative tasks

NASA & iRODS: Some iRODS Examples
» Jet Propulsion Laboratory: selected for managing distribution of planetary data
» MODIS (NASA Center for Climate Simulation): federated satellite image and reference data for climate simulation
» U.S. Library of Congress: manages the entire digital collection
» U.S. National Archives: manages ingest and distribution
» French National Library: iRODS rules control ingestion, access, and audit functions
» Australian Research Collaboration Service: manages data between academic institutions

A Paradigm Shift is Needed

                    File Storage              Object Storage
Scalability         millions of files         100s of billions of objects
Access              point-to-point, local     peer-to-peer, global
Management          fault-tolerant            self-healing, autonomous
Information         files, extent lists       objects w/ metadata, no file system (NoFS)
Space utilization   75% on average            near 100%

The WOS Initiative
» Understand the data usage model in a collaborative environment where immutable data is shared and studied.
» Build a simplified data access system with minimal layers.
» Eliminate the concept of a FAT and extent lists.
» Reduce the instruction set to PUT, GET, & DELETE (see the sketch after this list).
» Add the concept of locality based on latency to data.
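To make the three-verb model concrete, here is a minimal Python sketch of an object client exposing only PUT, GET, and DELETE. The endpoint URL and the x-object-id response header are hypothetical stand-ins; the slides name only the verbs and the HTTP/REST access path, not the actual WOS-Lib wire format.

```python
# Minimal three-verb object client. BASE and the x-object-id header are
# hypothetical; the deck names only the verbs and the HTTP/REST path.
import urllib.request

BASE = "http://wos-node.example.com/objects/"   # hypothetical endpoint

def put(data: bytes) -> str:
    """Store an object; the store, not the caller, assigns the Object ID."""
    req = urllib.request.Request(BASE, data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.headers["x-object-id"]      # hypothetical response header

def get(oid: str) -> bytes:
    """Retrieve an object by its OID."""
    with urllib.request.urlopen(BASE + oid) as resp:
        return resp.read()

def delete(oid: str) -> None:
    """Remove an object by its OID."""
    urllib.request.urlopen(urllib.request.Request(BASE + oid, method="DELETE"))
```

Note what is absent: no open/seek/close, no paths, no extent bookkeeping; the caller only ever holds an opaque OID.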

A Match Made in iRODS
iRODS:
» Ease of access (48 clients)
» Secure access controls
» Policy-based data management
WOS:
» Simplicity
» Scalability
» Reliability with replication
» Policy-based, deterministic data redundancy

Storage Should Improve Collaboration, Not Make It Harder
» Minutes to install, not hours
» Milliseconds to retrieve data, not seconds
» Replication built in, not added on
» Instantaneous recovery from disk failure, not days
» Built-in data integrity, not silent data corruption

WOS: Distributed Data Management
PUT path:
1. A file is uploaded to the application or web server.
2. The application makes a call to the WOS client to store (PUT) a new object.
3. The WOS client stores the object on a WOS node; subsequent objects are automatically load balanced across the cloud.
4. The WOS client returns a unique Object ID (e.g., OID = 5718a36143521602), which the application stores in lieu of a file path and registers with its content database.
5. The system then replicates the data according to the WOS policy; in this case the file is replicated to Zone 2.
GET path:
1. A user needs to retrieve a file; the application passes the stored OID to the WOS client with a read (GET) call.
2. The WOS client automatically determines which nodes have the requested object, retrieves it from the lowest-latency source, and rapidly returns it to the application.
3. The application returns the file to the user.
(Diagram: app/web servers with a content database and WOS nodes in Zone 1 and Zone 2 over LAN/WAN. A sketch of the OID-in-database pattern follows.)
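A minimal sketch of the "OID in lieu of a file path" pattern above, reusing the hypothetical put()/get() client from the earlier sketch; the sqlite3 table stands in for the application's content database, which the slide does not specify.

```python
# The application persists only the opaque OID, never a path or extent list.
# put()/get() are the hypothetical client sketched earlier; sqlite3 is
# purely illustrative as the "content database".
import sqlite3

db = sqlite3.connect("content.db")
db.execute("CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, oid TEXT)")

def register(name: str, data: bytes) -> None:
    oid = put(data)                              # PUT returns the OID (step 4)
    db.execute("INSERT OR REPLACE INTO files VALUES (?, ?)", (name, oid))
    db.commit()

def fetch(name: str) -> bytes:
    (oid,) = db.execute("SELECT oid FROM files WHERE name = ?", (name,)).fetchone()
    return get(oid)                              # client picks the lowest-latency copy
```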

WOS iRODS Integration
» Petabyte scalability: scale out by simply adding storage modules
» Unrivaled simplicity: management simplicity translates directly to lower cost of ownership
» Self-healing: zero intervention required for failures; automatically recovers from lost drives
» Rapid rebuilds: fully recover from lost drives in moments
» Replication ready: ingest & distribute data globally
» Disaster recoverable: for uninterrupted transactions no matter what type of disaster occurs
» File layout: capacity and performance optimized
» Object metadata: user-defined metadata makes files smarter

Superior iRODS Performance
Traditional storage is performance-expensive:
» Expends excess disk operations: 5-12 disk operations per file read
» Multiple levels of translation and communication: metadata lookups and directory traversal, extent list fetches, RAID & block operations
WOS delivers performance through simplicity:
» None of the constructs of traditional systems
» Single-disk-operation reads, dual-operation writes
» Reduced latency from SATA disks, since seeks are minimized
» Millions of file ops per second with ¼ of the disks

WOS Under the Hood
(Diagram: clients/apps reach a WOS node / MTS through WOS-Lib (the WOS API) or HTTP/REST; the node connects to other WOS nodes over 4x GigE and holds array controllers, a node OID map, and 2 or 3 TB drives.)
WOS tray components:
» Processor/controller motherboard
» WOS node software
» SATA or SAS drives (2 or 3 TB)
WOS software:
» Services I/O requests from clients
» Directs local I/O requests to disk
» Replicates objects during PUTs
» Replicates objects to maintain policy compliance
» Monitors hardware health

WOS + iRODS: The Division of Labor
iRODS (user features):
» Federated namespace across administrative domains
» Policy-based data integrity checking
» Policy-based data authenticity: version control and/or WORM
» Policy-based data distribution and/or recovery
» Administrative policies: retention, disposition, access controls
» Management: metadata-server based
WOS (infrastructure):
» Common object namespace across distributed locations
» Automatic data verification and correction
» Object immutability
» Data continuance: automatic replication and recovery
» Storage policies: data placement, geographic replica distribution
» Management: distributed storage management

WOS: Common Object Namespace (Global View, Local Access)
(Diagram: complete DR & global view, with replication across all 4 sites.)
Key features:
» Replication across all 4 sites
» Users have access to data globally
» Users always access the closest data
Key benefits:
» Enables global collaboration
» Increases performance & productivity via data locality
» No risk of data loss

WOS: Common Object Namespace (Data Locality & Collaboration)
(Diagram: Site 1 and Site 2 users each reach WOS through WOS-Lib over their own xGigE network; WOS-Lib directly routes object requests to the closest copy of the object.)
Key features:
» Up to 4-way replication, local and/or remote, for disaster protection
» Single namespace across all replicas
» Data locality
Intelligent data placement: each WOS node maintains a memory map of the locations of the objects on disk, enabling a single disk operation to retrieve the object, vastly improving performance (sketched below).
Data sharing and collaboration:
» Objects are replicated across sites per policy
» All replicas have the same object ID
» Users always access the closest object
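The memory map is what makes the single-disk-operation reads claimed earlier possible: locating an object costs a RAM lookup, not metadata I/O. A toy Python sketch, assuming an illustrative {OID: (drive, offset, length)} layout that is not the actual WOS on-disk format:

```python
# Toy version of a node's in-memory OID map. The (drive, offset, length)
# layout is an illustrative assumption, not the real WOS format.
oid_map: dict[str, tuple[str, int, int]] = {}

def node_get(oid: str) -> bytes:
    drive, offset, length = oid_map[oid]    # RAM lookup: no metadata I/O
    with open(drive, "rb") as disk:
        disk.seek(offset)
        return disk.read(length)            # the single disk operation
```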

WOS Objects: Data Integrity
Sample Object ID (OID): ACuoBKmWW3Uw1W2TmVYthA
An object carries:
» Signature: a random 64-bit key to prevent unauthorized access to WOS objects
» Policy: e.g., replicate twice, zones 1 & 3
» Checksum: a robust 128-bit MD5 checksum, over the full file or a sub-object, verifies data integrity during every read
» User metadata: key-value or binary, e.g. Object = DNA_SEQ, Tag = BRACA1, or thumbnails
(The slide shows a FASTQ DNA-sequence fragment as the sample object payload.)
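A hedged sketch of this object anatomy: a random 64-bit signature, a replication policy, and an MD5 checksum verified on every read. Only the fields come from the slide; the dict representation, field names, and encodings are illustrative assumptions.

```python
# Object anatomy per the slide: random 64-bit signature, policy string,
# 128-bit MD5 checksum verified on every read. Encodings are assumptions.
import base64, hashlib, os

def make_object(data: bytes, policy: str) -> dict:
    return {
        "signature": base64.urlsafe_b64encode(os.urandom(8)).decode(),  # random 64-bit key
        "policy": policy,                        # e.g. "Replicate Twice; Zone 1 & 3"
        "checksum": hashlib.md5(data).hexdigest(),
        "data": data,
    }

def read_object(obj: dict) -> bytes:
    # Verify data integrity during every read, as the slide states.
    if hashlib.md5(obj["data"]).hexdigest() != obj["checksum"]:
        raise IOError("object corrupted: checksum mismatch")
    return obj["data"]
```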

Failure Recovery (Data, Disk, or Net)
Operation: GET of object A. Latency map: zone San Fran 10 ms, zone New York 40 ms, zone London 80 ms.
Normal GET: WOS-Lib selects the replica with the least latency and sends the GET request; the node in zone San Fran returns object A to the application.
GET with corruption and repair:
1. WOS-Lib selects the replica with the least latency and sends the GET request.
2. The node in zone San Fran detects object corruption.
3. WOS-Lib finds the next-nearest copy and retrieves it to the client app.
4. In the background, the good copy is used to replace the corrupted object in the San Fran zone.
(Diagram: client app with WOS-Lib, cluster group map, and WOS nodes 10.8.24.101-105, 10.8.25.101-105, 10.8.26.101-105 in zones San Fran, New York, and London.)
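The recovery flow in steps 1-4 amounts to latency-ordered replica selection with checksum verification and background repair. Zone names and latencies mirror the slide; the in-memory replicas dict and the inline repair loop are illustrative assumptions.

```python
# Latency-ordered GET with corruption fallback and background repair.
# replicas maps zone -> that zone's copy of the object (a stand-in for
# "which zones hold a replica"); checksum is the object's MD5.
import hashlib

latency_ms = {"San Fran": 10, "New York": 40, "London": 80}

def get_with_repair(oid: str, replicas: dict[str, bytes], checksum: str) -> bytes:
    for zone in sorted(replicas, key=latency_ms.get):       # least latency first (step 1)
        data = replicas[zone]
        if hashlib.md5(data).hexdigest() == checksum:       # corruption check (step 2)
            # Step 4: in the background, overwrite any corrupted replicas.
            for z in replicas:
                if hashlib.md5(replicas[z]).hexdigest() != checksum:
                    replicas[z] = data
            return data                                     # step 3: serve the good copy
    raise IOError(f"all replicas of {oid} are corrupted")
```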

Geographic Replica Distribution
Operation: PUT with asynchronous replication. Latency map: zone San Fran 10 ms, zone New York 40 ms, zone London 80 ms.
1. WOS-Lib selects the shortest-path node.
2. The node in zone San Fran stores 2 copies of the object on different disks (nodes).
3. The San Fran node returns the OID to the application.
4. Later (as soon as possible), the cluster asynchronously replicates to the New York & London zones.
5. Once ACKs are received from the New York & London zones, the extra copy in the San Fran zone is removed.
(Diagram: WOS nodes 10.8.24.101-105, 10.8.25.101-105, 10.8.26.101-105 in zones San Fran, New York, and London.)
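A sketch of this asynchronous-replication PUT: two local copies protect the object until the remote zones acknowledge, then the extra local copy is dropped. The per-zone dicts and the thread standing in for the background replicator are illustrative assumptions.

```python
# PUT with asynchronous replication, following steps 1-5 above: two local
# copies, immediate OID return, deferred remote replication, extra copy
# removed once the remote writes complete.
import threading, uuid

zones = {"San Fran": {}, "New York": {}, "London": {}}

def put_async(data: bytes) -> str:
    oid = uuid.uuid4().hex[:16]                 # stand-in for a real WOS OID
    zones["San Fran"][oid] = data               # copy 1, one disk (step 2)
    zones["San Fran"][oid + "#extra"] = data    # copy 2, a different disk (step 2)

    def replicate() -> None:                    # steps 4-5 run after the OID returns
        for zone in ("New York", "London"):
            zones[zone][oid] = data             # remote replica; return acts as ACK
        del zones["San Fran"][oid + "#extra"]   # extra local copy removed (step 5)

    threading.Thread(target=replicate, daemon=True).start()
    return oid                                  # step 3: OID returned immediately
```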

Efficient Data Placement
WOS buckets contain objects of similar size to optimize placement. Different-sized objects are written into slots contiguously, and slots can be as small as 512 B to efficiently support the smallest of files. The result: WOS eliminates the wasted capacity seen with conventional NAS storage.
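The slot scheme amounts to rounding each object up to the smallest slot class that fits, which bounds per-object waste to the class granularity. A sketch, assuming illustrative power-of-two slot classes; the slide states only the 512 B minimum.

```python
# Round each object up to the smallest slot class that fits. The class
# sizes are an assumption; the slide gives only the 512 B minimum slot.
SLOT_CLASSES = [512 * 2**i for i in range(21)]   # 512 B ... 512 MB

def slot_for(size: int) -> int:
    """Smallest slot class holding an object of `size` bytes."""
    return next(s for s in SLOT_CLASSES if s >= size)

for size in (300, 4_000, 1_000_000):
    slot = slot_for(size)
    print(f"{size:>9} B object -> {slot:>9} B slot ({slot - size} B unused)")
```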

Storage Management: Provisioning
WOS building blocks are easy to deploy & provision in 10 minutes or less:
» Provide power & network for the WOS node
» Assign an IP address to the WOS node & specify the cluster name (e.g., "Acme WOS 1")
» Go to the WOS Admin UI; the WOS node appears in the Pending Nodes list for that cluster
» Drag & drop the node into the desired zone (San Francisco, New York, London, Tokyo); simply drag new nodes to any zone to extend storage
Congratulations! You have just added 120 TB to your WOS cluster!

WOS + iRODS: YottaBrain Program
» Each container: 5 PB of WOS
» WOS clusters federated with iRODS

WOS + iRODS Is the Simple Solution for Cloud Collaboration
» WOS is a flat, addressable, low-latency data structure.
» WOS creates a trusted environment with automated replication.
» WOS is not an extent-based file system with layers of v-nodes and i-nodes.
» iRODS is the ideal complement to WOS, allowing multiple client access and incorporating an efficient database for metadata search activities.

Thank You. Dave Fellinger, dfellinger@ddn.com