Campaign Storage. Peter Braam Co-founder & CEO Campaign Storage

Similar documents
Next-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads

Data Movement & Tiering with DMF 7

Object storage platform How it can help? Martin Lenk, Specialist Senior Systems Engineer Unstructured Data Solution, Dell EMC

Store Process Analyze Collaborate Archive Cloud The HPC Storage Leader Invent Discover Compete

LustreFS and its ongoing Evolution for High Performance Computing and Data Analysis Solutions

Leveraging Software-Defined Storage to Meet Today and Tomorrow s Infrastructure Demands

IME (Infinite Memory Engine) Extreme Application Acceleration & Highly Efficient I/O Provisioning

DDN s Vision for the Future of Lustre LUG2015 Robert Triendl

A product by CloudFounders. Wim Provoost Open vstorage

THE EMC ISILON STORY. Big Data In The Enterprise. Deya Bassiouni Isilon Regional Sales Manager Emerging Africa, Egypt & Lebanon.

13th ANNUAL WORKSHOP Objects over RDMA. Jeff Inman, Will Vining, Garret Ransom. Los Alamos National Laboratory. March, 2017

UK LUG 10 th July Lustre at Exascale. Eric Barton. CTO Whamcloud, Inc Whamcloud, Inc.

Structuring PLFS for Extensibility

Coordinating Parallel HSM in Object-based Cluster Filesystems

IBM Spectrum NAS, IBM Spectrum Scale and IBM Cloud Object Storage

Fast Forward I/O & Storage

TECHNICAL OVERVIEW OF NEW AND IMPROVED FEATURES OF EMC ISILON ONEFS 7.1.1

Lustre A Platform for Intelligent Scale-Out Storage

DDN. DDN Updates. DataDirect Neworks Japan, Inc Nobu Hashizume. DDN Storage 2018 DDN Storage 1

Rethink Storage: The Next Generation Of Scale- Out NAS

Campaign Storage. Dave Bonnie Technical Lead - Campaign Storage Los Alamos National Laboratory Los Alamos, NM, USA

REFERENCE ARCHITECTURE Quantum StorNext and Cloudian HyperStore

Storage for HPC, HPDA and Machine Learning (ML)

Effizientes Speichern von Cold-Data

HPC Storage Use Cases & Future Trends

Improved Solutions for I/O Provisioning and Application Acceleration

AWS Storage Gateway. Not your father s hybrid storage. University of Arizona IT Summit October 23, Jay Vagalatos, AWS Solutions Architect

Flashed-Optimized VPSA. Always Aligned with your Changing World

The Fusion Distributed File System

EMC Forum EMC ViPR and ECS: A Lap Around Software-Defined Services

EMC Forum 2014 EMC ViPR and ECS: A Lap Around Software-Defined Services. Magnus Nilsson Blog: purevirtual.

HPC NETWORKING IN THE REAL WORLD

A Breakthrough in Non-Volatile Memory Technology FUJITSU LIMITED

ATEMPO-DIGITAL ARCHIVE FOR MEDIA AND ENTERTAINMENT

ECS ARCHITECTURE DEEP DIVE #EMCECS. Copyright 2015 EMC Corporation. All rights reserved.

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

API and Usage of libhio on XC-40 Systems

An Exploration into Object Storage for Exascale Supercomputers. Raghu Chandrasekar

EMC XTREMCACHE ACCELERATES ORACLE

Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA

Got Isilon? Need IOPS? Get Avere.

Copyright 2012 EMC Corporation. All rights reserved.

DDN. DDN Updates. Data DirectNeworks Japan, Inc Shuichi Ihara. DDN Storage 2017 DDN Storage

Accelerating Spectrum Scale with a Intelligent IO Manager

Implementing a Hierarchical Storage Management system in a large-scale Lustre and HPSS environment

libhio: Optimizing IO on Cray XC Systems With DataWarp

Let s Make Parallel File System More Parallel

An Introduction to GPFS

AWS Storage Gateway. Amazon S3. Amazon EFS. Amazon Glacier. Amazon EBS. Amazon EC2 Instance. storage. File Block Object. Hybrid integrated.

Compatibility Guide Storfirst Migration Version 4.9 Q1 2015

The Fastest Scale-Out NAS

AUTOMATING IBM SPECTRUM SCALE CLUSTER BUILDS IN AWS PROOF OF CONCEPT

See what s new: Data Domain Global Deduplication Array, DD Boost and more. Copyright 2010 EMC Corporation. All rights reserved.

GlusterFS Architecture & Roadmap

Data Management. Parallel Filesystems. Dr David Henty HPC Training and Support

Distributed File Systems II

DELL EMC ISILON F800 AND H600 I/O PERFORMANCE

HPSS Treefrog Summary MARCH 1, 2018

DDN About Us Solving Large Enterprise and Web Scale Challenges

SMORE: A Cold Data Object Store for SMR Drives

BUSINESS DATA LAKE FADI FAKHOURI, SR. SYSTEMS ENGINEER, ISILON SPECIALIST. Copyright 2016 EMC Corporation. All rights reserved.

Discriminating Hierarchical Storage (DHIS)

朱义普. Resolving High Performance Computing and Big Data Application Bottlenecks with Application-Defined Flash Acceleration. Director, North Asia, HPC

Opendedupe & Veritas NetBackup ARCHITECTURE OVERVIEW AND USE CASES

DDN and Flash GRIDScaler, Flashscale Infinite Memory Engine

Lustre overview and roadmap to Exascale computing

-Presented By : Rajeshwari Chatterjee Professor-Andrey Shevel Course: Computing Clusters Grid and Clouds ITMO University, St.

Emerging Technologies for HPC Storage

Fast Forward Storage & I/O. Jeff Layton (Eric Barton)

OpenStack SwiftOnFile: User Identity for Cross Protocol Access Demystified Dean Hildebrand, Sasikanth Eda Sandeep Patil, Bill Owen IBM

1Z0-433

Deep Storage for Exponential Data. Nathan Thompson CEO, Spectra Logic

Enhancing Lustre Performance and Usability

NetApp: Solving I/O Challenges. Jeff Baxter February 2013

BeeGFS. Parallel Cluster File System. Container Workshop ISC July Marco Merkel VP ww Sales, Consulting

Introduction to High Performance Parallel I/O

The storage challenges of virtualized environments

The ASCI/DOD Scalable I/O History and Strategy Run Time Systems and Scalable I/O Team Gary Grider CCN-8 Los Alamos National Laboratory LAUR

EMC ISILON HARDWARE PLATFORM

Bridging the peta- to exa-scale I/O gap

PRESENTATION TITLE GOES HERE. Understanding Architectural Trade-offs in Object Storage Technologies

MapR Enterprise Hadoop

Provisioning with SUSE Enterprise Storage. Nyers Gábor Trainer &

Shared Object-Based Storage and the HPC Data Center

Big Data and Object Storage

IME Infinite Memory Engine Technical Overview

Designing Next Generation FS for NVMe and NVMe-oF

dcache Ceph Integration

Dell Technologies IoT Solution Surveillance with Genetec Security Center

EIOW Exa-scale I/O workgroup (exascale10)

Deduplication File System & Course Review

Tutorial. Storage Lessons from HPC: Extreme Scale Computing Driving High Performance Data Storage. Gary Grider HPC Division Leader LANL/US DOE

Architected for Performance. NVMe over Fabrics. September 20 th, Brandon Hoff, Broadcom.

Data Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research

TECHNICAL OVERVIEW OF NEW AND IMPROVED FEATURES OF DELL EMC ISILON ONEFS 8.0

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

TCO REPORT. NAS File Tiering. Economic advantages of enterprise file management

Isilon Scale Out NAS. Morten Petersen, Senior Systems Engineer, Isilon Division

An ESS implementation in a Tier 1 HPC Centre

Warsaw. 11 th September 2018

Transcription:

Campaign Storage Peter Braam 2017-04 Co-founder & CEO Campaign Storage

Contents Memory class storage & Campaign storage Object Storage Campaign Storage Search and Policy Management Data Movers & Servers Road Ahead 4/1/17 Campaign Storage LLC 2

Campaign Storage Campaign Storage was invented at Los Alamos National Laboratory 2014- Peter Braam & Nathan Thompson founded Campaign Storage, LLC in March 2016 deliver products in this space Software Defined Storage we will partner with integrators Other companies are addressing parts of Campaign Storage also 3/16/17 Campaign Storage LLC 3

Storage Tiers & Campaign Storage 3/16/17 Campaign Storage LLC 4

CPU or GPU packages CPU cores High Bandwidth Memory NVRAM e.g. XPOINT, PCM, STTRAM RAM FLASH DISK TAPE Node BW (GB/sec) 1 TB/s 100 GB/s 20 GB/s 5 GB/s 350 MB/s Cluster BW (TB/sec) 1 PB/s 100 TB/s 5 TB/s 100 GB/s 10 s GB/s Software Language level Language level, NVM libs HDF5 & DAOS Key features transparent computation transparent computation ultra-fast storage apps Burst Buffers DDN IME, Cray Data Warp HDF5 DAOS name space scientific formats FS style container Parallel FS & Campaign Storage bulk data movement - many files - subtrees of MD Archive & Campaign BW Cost $/ (GB/s) $10 (CPU included!) $10 $300 $2K $30K Capacity Cost $/GB $ $8 $0.3 $0.05 $0.01 3/16/17 Campaign Storage LLC 5

Role of containers Fundamentally unlikely: different tiers perform data movement at similar granularity Containers are a must-have 3/16/17 Campaign Storage LLC 6

Tiers and NVRAM Considerations Tiering Persistence RAM tiers are for computation è migrate pointers, pages Distinguishing NVM feature is that data stays if power is off. Flash storage is 5x faster with large IO Disk similarly is very IO size sensitive: èretrieve & store containers (distributed?) èshow internal structure on faster side èstream and serialize data to slower side Internal program data formats not re-usable è computing format to namespace NVRAM will be the fastest storage device è for most demanding storage applications NVRAM: what other benefits to computing? Current libraries transactions, persistent heaps (not so novel see Camelot & RVM from 1980 s) 3/16/17 Campaign Storage LLC 7

Example Container Functionality - lower tiers 3/16/17 Campaign Storage LLC 8

Tiers & Transparency RAM - Demote infrequently used pointers - Promote frequently used pointers If pointers are not first class objects - Promote upon access - Demote finding less used ones Low level languages HW or OS support Storage Same principle transparency requires accessing data through a handle One handle system with location database allows other objects to move Expect distributed tiered KV store Key value lookup Callbacks for invalidation 3/16/17 Campaign Storage LLC 9

Using Tiers 3/16/17 Campaign Storage LLC 10

Data Center Identity and namespace management with e.g. AD or LDAP Data Center Compute Data Center Nodes Compute Data Center Nodes Compute Nodes Containers for global namespace Enterprise Data Services Enterprise Data Services Enterprise - Groups Data of Services file servers - Groups - Availability of file servers - - Groups Availability - Performance of file servers - - Availability Performance - Performance Campaign Storage Service Layer - SMB, NFS, other Data Movers - Parallel ingest & restore Storage layer - ZFS & object - Integrated search - Analytics Support - Data management - Massive, low $ 3/16/17 Campaign Storage LLC 11

Future Exa-Scale Storage Architecture 3/16/17 Campaign Storage LLC 12

Object Storage 3/16/17 Campaign Storage LLC 13

Cloud object stores pros & cons pro con massive scalability very good storage management widely agreed S3 REST API runs on cheapest hardware data lacks organization API s don t allow distributed concurrent access or random writes performance can be disappointing difficult to re-use as a component of other storage systems 3/16/17 Campaign Storage LLC 14

Too much choice? Caringo Swarm (formerly CAStor) Cleversafe dsnet Cloudian Data Direct Networks Web Object Scaler (WOS) EMC Atmos EMC Centera EMC Elastic Cloud Storage (ECS) HP StoreAll HGST Himalaya HGST Active Archive Hitachi Data Systems HCP NetApp StorageGrid Webscale Quantum Lattus Scality Ring What is needed offers: SwiftStack Swift To mention a few. (others S3, CEPH, SNIA T10, Seagate A200, DDN WOZ.) - Normal read/write IO per object - Non overlapping IO from multiple clients - 3 tier hierarchical redundancy (box, rack, data center) - Transaction protocol to snapshot consistent state 3/16/17 Campaign Storage LLC 15

Campaign Storage 3/16/17 Campaign Storage LLC 16

Campaign Storage - a new tier Old World New World Parallel File System High BW, high $$$ Decreasing capacities Burst Buffer Parallel File System decreasing emphasis Campaign Storage new Archive Archive Cloud 3/16/17 Campaign Storage LLC 17

Campaign Storage It is A file system Focus: staging and archiving Built from Industry standard object stores Existing metadata stores Lowest cost HW High capacity, ultra scalable Not highest BW or lowest latency 10-100x higher than archives 10x lower than PFS It is not General purpose file system Wait these don t exist actually Using object stores has problems Limited set of data movers supported 3/16/17 Campaign Storage LLC 18

Implementation OS with VFS and Fuse MarFS Object Storage Metadata FS 3/16/17 Campaign Storage LLC 19

HPC Cluster A Simulation Cluster 20PF Burst Buffer 5 PB & 5 TB/s Campaign Storage Campaign Storage HPCD & Viz Cluster HDFS Campaign Storage Mover Nodes Campaign Storage Metadata Repository Campaign Storage Mover Nodes Campaign Storage Object Repository HPC Cluster B HPCDCluster 20PF Burst Buffer 5 PB & 5 TB/s Lustre FS 1 TB/s Parallel Staging & Archiving File System Interface customer infrastructure Search & Data Management Optional other tools: Policy managers (e.g. Robinhood) Workflow managers (e.g. Irods) 3/16/17 Campaign Storage LLC 20

Campaign Storage Data Layout 3/16/17 Campaign Storage LLC 21

Search & Policy Management 3/16/17 Campaign Storage LLC 22

3/16/17 Campaign Storage LLC 23

Histograms for subtree search Every directory has histogram DB recording properties of its subtree: i.e. #files, #bytes in the subtree have a property? Limited granularity, limited relational algebra Store perhaps ~100,000 properties in multiple histograms Examples: Quota in subtree? What fileservers contain files? Geospatial information in file? (file type, size, access time) tuples Allows limited relational algebra User database for subtree eliminates reliance on external identity management Not a new idea. Can be added to ZFS & Lustre 3/16/17 Campaign Storage LLC 24

Data Movers & Services 3/16/17 Campaign Storage LLC 25

Data Movers Data Movement Today LANL parallel rsync pftool Lustre HSM mover Packing small files & striping big files Candidates DMAPI HSM mover Gridftp Full POSIX interface Metadata Movement Today Traditional metadata API Multiple namespaces Coming Bulk integration of containers Accompanying metadata 3/16/17 Campaign Storage LLC 26

pftool internals Load Balancer Scheduler dirs queue stat queue cp/s/v queue readdir stat copy / sync validate D O N E Q U E U E REPORTER 3/16/17 Campaign Storage LLC 27

Features of DS3 archival data mover Object store moves batches of files New concept: file level I/O vectorization Includes server driven ordering Packing small files into one object int copy_file_range_fv(copy_range *r, uint count, int flags) struct copy_range { int source_fd; int dest_fd; off_t source_offset; off_t dest_offset; size_t length; } 3/16/17 Campaign Storage LLC 28

Services Campaign Storage always exports the MarFS file system Enterprise services as further exported protocols: - SMB, NFS, HTTP - Data movement can be out of band Integration of namespaces, user databases, other plugins 3/16/17 Campaign Storage LLC 29

Campaign Storage Use Cases 3/16/17 Campaign Storage LLC 30

Workflows - HPC Staging & De-staging Schedule migration with pftool HSM Copy metadata first Use subtree search index Execute policies Specialized data movers For transparent retrieval & attributes Single project extraction Use ZFS namespace and object bucket per project Hot vs cold Campaign Locations Select destination object stores Migration on campaign storage Multi site Leverage object bucket replication Leverage ZFS pool replication Cloud Migrate pool and buckets to S3 Use Snowball? 3/16/17 Campaign Storage LLC 31

Workflows Data Center Staging & archive Schedule migration with pftool Service offload to Campaign Data available without staging Single project extraction Use ZFS namespace and object bucket per project Hot vs cold locations Select destination object stores Migration within campaign storage Automatic movement when services need the data Multi site Leverage object bucket replication Leverage ZFS pool replication Cloud Migrate pool and buckets to S3 Use Snowball? 3/16/17 Campaign Storage LLC 32

Road Forward Unique opportunity to innovate data management LANL and Campaign Storage created an Industry Steering Group Seek agreement on Data layout handling Attributes used in connection with long term storage Interfaces for workflows 3/16/17 Campaign Storage LLC 33

Conclusions 3/16/17 Campaign Storage LLC 34

Conclusions Hardware diversification è Software Specialization Expect a rich high speed exa-scale I/O platform to use containers Similar containers will organize enterprise tiers of storage Campaign Storage: bulk data store, archive & data movement 3/16/17 Campaign Storage LLC 35

Thank you 3/16/17 Campaign Storage LLC 36