Data Movement & Tiering with DMF 7

Similar documents
IBM Spectrum NAS, IBM Spectrum Scale and IBM Cloud Object Storage

HPSS Treefrog Summary MARCH 1, 2018

Storage for HPC, HPDA and Machine Learning (ML)

Next-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads

Cloud Storage with AWS: EFS vs EBS vs S3 AHMAD KARAWASH

HPSS Treefrog Introduction.

Effizientes Speichern von Cold-Data

SGI Overview. HPC User Forum Dearborn, Michigan September 17 th, 2012

Introducing SUSE Enterprise Storage 5

Isilon: Raising The Bar On Performance & Archive Use Cases. John Har Solutions Product Manager Unstructured Data Storage Team

TechTour. Winning Technology Roadmaps. Sig Knapstad. Cutting Edge

Emerging Technologies for HPC Storage

LUG 2012 From Lustre 2.1 to Lustre HSM IFERC (Rokkasho, Japan)

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM

Data Protection Modernization: Meeting the Challenges of a Changing IT Landscape

THE EMC ISILON STORY. Big Data In The Enterprise. Deya Bassiouni Isilon Regional Sales Manager Emerging Africa, Egypt & Lebanon.

SC17 - Overview

Introduction to Digital Archiving and IBM archive storage options

Active Archive and the State of the Industry

Data Management. Parallel Filesystems. Dr David Henty HPC Training and Support

RAIDIX Data Storage Solution. Clustered Data Storage Based on the RAIDIX Software and GPFS File System

An Exploration into Object Storage for Exascale Supercomputers. Raghu Chandrasekar

IBM Tivoli Storage Manager Version Introduction to Data Protection Solutions IBM

DDN About Us Solving Large Enterprise and Web Scale Challenges

São Paulo. August,

DDN s Vision for the Future of Lustre LUG2015 Robert Triendl

Lustre* is designed to achieve the maximum performance and scalability for POSIX applications that need outstanding streamed I/O.

IBM Storwize V7000 Unified

TGCC OVERVIEW. 13 février 2014 CEA 10 AVRIL 2012 PAGE 1

XtreemStore A SCALABLE STORAGE MANAGEMENT SOFTWARE WITHOUT LIMITS YOUR DATA. YOUR CONTROL

LustreFS and its ongoing Evolution for High Performance Computing and Data Analysis Solutions

IBM TS4300 with IBM Spectrum Storage - The Perfect Match -

Using Cohesity with Amazon Web Services (AWS)

VEXATA FOR ORACLE. Digital Business Demands Performance and Scale. Solution Brief

AWS Storage Gateway. Not your father s hybrid storage. University of Arizona IT Summit October 23, Jay Vagalatos, AWS Solutions Architect

<Insert Picture Here> Tape Technologies April 4, 2011

EMC ISILON HARDWARE PLATFORM

Data Protection for Virtualized Environments

A GPFS Primer October 2005

Opendedupe & Veritas NetBackup ARCHITECTURE OVERVIEW AND USE CASES

PRESENTATION TITLE GOES HERE

Software Defined Storage for the Evolving Data Center

DELL EMC ISILON SCALE-OUT NAS PRODUCT FAMILY Unstructured data storage made simple

Coordinating Parallel HSM in Object-based Cluster Filesystems

UK LUG 10 th July Lustre at Exascale. Eric Barton. CTO Whamcloud, Inc Whamcloud, Inc.

Cohesity Flash Protect for Pure FlashBlade: Simple, Scalable Data Protection

Automating Information Lifecycle Management with

TECHNICAL OVERVIEW OF NEW AND IMPROVED FEATURES OF EMC ISILON ONEFS 7.1.1

Remove complexity in protecting your virtual infrastructure with. IBM Spectrum Protect Plus. Data availability made easy. Overview

Oracle Exadata: Strategy and Roadmap

Data Protection for Cisco HyperFlex with Veeam Availability Suite. Solution Overview Cisco Public

REFERENCE ARCHITECTURE Quantum StorNext and Cloudian HyperStore

Kaladhar Voruganti Senior Technical Director NetApp, CTO Office. 2014, NetApp, All Rights Reserved

Executive Summary SOLE SOURCE JUSTIFICATION. Microsoft Integration

Costefficient Storage with Dataprotection

designed. engineered. results. Parallel DMF

MODERNISE WITH ALL-FLASH. Intel Inside. Powerful Data Centre Outside.

Next Generation Storage for The Software-Defned World

Ta kontroll över er data! Christofer Jensen Client Technical Specialist. Stockholm

Modernize Your Storage

Improved Solutions for I/O Provisioning and Application Acceleration

Solution Brief. Bridging the Infrastructure Gap for Unstructured Data with Object Storage. 89 Fifth Avenue, 7th Floor. New York, NY 10003

DDN. DDN Updates. Data DirectNeworks Japan, Inc Shuichi Ihara. DDN Storage 2017 DDN Storage

Reconstruyendo una Nube Privada con la Innovadora Hiper-Convergencia Infraestructura Huawei FusionCube Hiper-Convergente

Edge to Core to Cloud Architecture for AI

朱义普. Resolving High Performance Computing and Big Data Application Bottlenecks with Application-Defined Flash Acceleration. Director, North Asia, HPC

An Introduction to GPFS

Object storage platform How it can help? Martin Lenk, Specialist Senior Systems Engineer Unstructured Data Solution, Dell EMC

Leveraging Software-Defined Storage to Meet Today and Tomorrow s Infrastructure Demands

GPFS Experiences from the Argonne Leadership Computing Facility (ALCF) William (Bill) E. Allcock ALCF Director of Operations

Intelligent Storage for HPC: Sun StorageTek QFS and Sun StorageTek Storage Archive Manager Harriet Coverston

BlackPearl Customer Created Clients Using Free & Open Source Tools

IME (Infinite Memory Engine) Extreme Application Acceleration & Highly Efficient I/O Provisioning

FLASHARRAY//M Business and IT Transformation in 3U

The Intersection of Cloud & Solid State Storage

Lustre overview and roadmap to Exascale computing

Got Isilon? Need IOPS? Get Avere.

IBM Spectrum Protect Plus

BIG DATA READY WITH ISILON JEUDI 19 NOVEMBRE Bertrand OUNANIAN: Advisory System Engineer

DELL EMC ISILON SCALE-OUT NAS PRODUCT FAMILY

Modernize Your Backup and DR Using Actifio in AWS

Glamour: An NFSv4-based File System Federation

FLASHARRAY//M Smart Storage for Cloud IT

Bringing HyperScale Computing to the Enterprise. The need for Enterprises to overhaul their IT systems

Storage Optimization with Oracle Database 11g

WHITE PAPER Software-Defined Storage IzumoFS with Cisco UCS and Cisco UCS Director Solutions

Modern Data Warehouse The New Approach to Azure BI

1Z0-433

Implementing a Hierarchical Storage Management system in a large-scale Lustre and HPSS environment

Commvault Backup to Cloudian Hyperstore CONFIGURATION GUIDE TO USE HYPERSTORE AS A STORAGE LIBRARY

Data Analytics and Storage System (DASS) Mixing POSIX and Hadoop Architectures. 13 November 2016

Campaign Storage. Peter Braam Co-founder & CEO Campaign Storage

Deep Storage for Exponential Data. Nathan Thompson CEO, Spectra Logic

EMC VMAX 400K SPC-2 Proven Performance. Silverton Consulting, Inc. StorInt Briefing

HIGH-PERFORMANCE STORAGE FOR DISCOVERY THAT SOARS

Parallel File Systems. John White Lawrence Berkeley National Lab

Capture Business Opportunities from Systems of Record and Systems of Innovation

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan

Virtualizing Oracle on VMware

Provisioning with SUSE Enterprise Storage. Nyers Gábor Trainer &

Transcription:

Data Movement & Tiering with DMF 7 Kirill Malkin Director of Engineering April 2019

Why Move or Tier Data? We wish we could keep everything in DRAM, but It s volatile It s expensive Data in Memory 2

Why Move or Tier Data? We wish we could keep everything in DRAM, but It s volatile It s expensive So we need to move data to and from non-volatile medium Solid State or Magnetic Make copies: snapshot, backup, archive Data in Memory Parallel Filesystems 3

Why Move or Tier Data? We wish we could keep everything in DRAM, but It s volatile It s expensive So we need to move data to and from non-volatile medium Solid State or Magnetic Make copies: snapshot, backup, archive When we move data to less expensive medium it s called tiering Solid State to Hard Disk to Cloud to Tape Data in Memory Parallel Filesystems Backend Store 4

Why Move or Tier Data? We wish we could keep everything in DRAM, but It s volatile It s expensive So we need to move data to and from non-volatile medium Solid State or Magnetic Make copies: snapshot, backup, archive When we move data to less expensive medium it s called tiering Solid State to Hard Disk to Cloud to Tape We also move data because of locality Different compute system Another data center or another organization Computing at Edge Data in Memory Parallel Filesystems Backend Store 5

Why Move or Tier Data? We wish we could keep everything in DRAM, but It s volatile It s expensive So we need to move data to and from non-volatile medium Solid State or Magnetic Make copies: snapshot, backup, archive When we move data to less expensive medium it s called tiering Solid State to Hard Disk to Cloud to Tape We also move data because of locality Different compute system Another data center or another organization Computing at Edge Doing this well requires Data Management Data in Memory Parallel Filesystems Backend Store 6

Pushing Compute to Edge Industry Trend Moving compute closer to where data is Transferring large amounts of data produced by IoT devices is too expensive Decision making must occur in real time, transfers take too long AI, Big Data and HPC processing gravitates to minidatacenters at Edge 7

Pushing Compute to Edge Industry Trend Moving compute closer to where data is Transferring large amounts of data produced by IoT devices is too expensive Decision making must occur in real time, transfers take too long AI, Big Data and HPC processing gravitates to minidatacenters at Edge What happens to data & results produced at Edge? Valuable IoT data as well as results should be preserved Typically this means copying to one or more locations Key to organizing and optimizing this process is distributed metadata Workflow managers and users query metadata & schedule data movement in dormant form 8

Pushing Compute to Edge Industry Trend Moving compute closer to where data is Transferring large amounts of data produced by IoT devices is too expensive Decision making must occur in real time, transfers take too long AI, Big Data and HPC processing gravitates to minidatacenters at Edge What happens to data & results produced at Edge? Valuable IoT data as well as results should be preserved Typically this means copying to one or more locations Key to organizing and optimizing this process is distributed metadata Workflow managers and users query metadata & schedule data movement in dormant form POSIX is still dominant access method in HPC Typically, not 100% compliant consistency optimized for performance Significant dependency of codes on POSIX semantics Non-HPC applications store data differently Buckets of objects in cloud, or S3-API Emergence of Data Lakes 9

Pushing Compute to Edge Industry Trend Moving compute closer to where data is Transferring large amounts of data produced by IoT devices is too expensive Decision making must occur in real time, transfers take too long AI, Big Data and HPC processing gravitates to minidatacenters at Edge What happens to data & results produced at Edge? Valuable IoT data as well as results should be preserved Typically this means copying to one or more locations Key to organizing and optimizing this process is distributed metadata Workflow managers and users query metadata & schedule data movement in dormant form POSIX is still dominant access method in HPC Typically, not 100% compliant consistency optimized for performance Significant dependency of codes on POSIX semantics Non-HPC applications store data differently Buckets of objects in cloud, or S3-API Emergence of Data Lakes Moving POSIX data Better done in dormant form where it is immutable Once local to data center, data can be staged as POSIX and computed on 10

Data Management What Challenges does it Solve? Too much data 1PB or more of unstructured file data Need simple & costeffective storage or backup solution Too many files Billions of files that require periodic movement Locate & construct datasets based on workflow Need for Speed Workflow requires high bandwidth or I/O rate HPC, data analytics or AI clusters need faster storage 11

Introducing Data Management Framework Active & Dormant Data Forms HPC/AI Compute Cluster High-Performance Storage Active Tier Hot Data: Performance All-Flash File System Parallel File Systems Scale-out NAS & Object Storage HPE Data Management Framework Tiered data management Dormant Tier Cold Data: Capacity Tape DMF zero watt storage Object Storage & Cloud 12

Data Management Framework Technology Highlights What is DMF? Data Management hub encompassing entire data life cycle Policy-driven POSIX data movement & tiering solution for tape, disk and cloud Scalable metadata capture & search engine based on Big Data technology Scalable parallel data transfer engine optimized for backend Brief history of the product Data Migration Facility Cray + SGI DMF 1.0-2.5 Cray DMF 2.6-3.11 SGI Data Management Framework HPE DMF Suite (7.1) HPE DMF 4.0-6.9 SGI

Stage & De-stage files & directories File System Management What Can DMF 7 Do? Maintain namespace reflection with file and directory metadata that can be queried independently of filesystem or data Transparently migrate & recall files on Lustre, HPE XFS and other parallel filesystems to and from versioned backend store De-stage & stage files and directories, including all metadata, among managed namespaces Recover files, directories and entire file systems, replacing backups Store files in a dormant form without file system representation Construct and manage datasets based on file & directory metadata, including extended attributes Stage datasets just-in-time on demand via API or HPC job scheduler Tier, copy or move datasets according to policy or workflow DMF Metadata Repository Migrate & Recall file data 14

Extensible Metadata Support Data management flexibility and precision with extensible metadata DMF v7 is based on scalable metadata repository Repository functions as a long term data store for information about file system structure, attributes, contents and evolution over time Metadata repository supports POSIX extended attributes on files and directories, e.g. project name, project ID, etc. Queries can be run against metadata including extended attributes for precise and flexible selection of files, e.g. data set creation Additionally, policies can be run against the results of metadata queries for data movement, archiving, etc. Extensible Metadata Project Name Project ID Author Name Budget Category 15

Tape Storage Zero Watt Storage On-Premise Object Storage Or Off-Site Cloud Vertical & Horizontal Data Movement HPC Compute Nodes Work Directory File System & Storage Home Directory File System & Storage All-Flash File Systems & Burst Buffers DMF Policy-based Data Management DMF can provide backup, disaster recovery and highperformance tiered storage for user namespaces Data staging to WORK and BURST is orchestrated by DMF 16

Tape Storage Zero Watt Storage On-Premise Object Storage Or Off-Site Cloud Next Generation Storage Transition to New Technology Manage introduction of new storage technologies over time without disruption Seamlessly manage migration, validation and consolidation of massive data sets Perform migration over period of weeks or months with no impact to user data access Stage managed data to burst buffers or allflash filesystems HPC Compute Nodes High-Performance File System & Storage DMF Policy-based Data Management All-Flash File Systems & Burst Buffers

DMF 7 Software Architecture Designed for Scalability Dynamic or Static Namespaces State-of-art open source components Kafka for Changelog processing Cassandra for Scalable Metadata Mesos for Task Scheduling Changelogs Policy Engine Data Movers Spark for Query Engine Zookeeper for Configuration Cassandra Containerized Components Dedicated Components per Filesystem Component Level HA File Object Metadata

Solution Scaling & Extensibility Unified Scalable Front-End DMF 7 has a unified scalable front end for Lustre, HPE XFS and other filesystems (e.g. GPFS) lfs hsm commands /lfs1 /lfs2 /xfs1 API DMF cli Same Query and Policy engine for the all filesystem types Same DMF CLI for all filesystem types Lustre lfs hsm commands are supported along with the native DMF CLI Copytool Agent Change Log Job Dispatcher FS Agents Path Finder Policy Engine Query Agent API Server Data Manager Cassandra DB

20 Data Management DMF Tape Storage Integration DMF is certified with libraries from HPE, as well as Spectra Logic, IBM and Oracle (StorageTek) Streams to tape drive at native rates, even for small files Block ID positioning for fast seek Support for latest LTO-8 and Enterprise-class drive technology Advanced feature support for accelerated retrieval and automated library management Supports Data Integrity Verification (DIV) and Logical Block Protection (LBP) available with Oracle T10k and IBM LTO7 drives

Zero Watt Storage High-Density Storage for DMF Performance-Oriented & Power-Managed Hardware-withsoftware solution optimized for use with the HPE DMF data management platform Cost-optimized = Total cost of storage competitive with Tape and lower than Cloud High Performance disk based tier provides instant access to first byte and high throughput streaming On-premise storage tier which can be used as a capacity storage tier as well as a fast mount cache or relief to RAID arrays 21

Solution Scaling & High Availability DMF can scale by adding nodes that perform the required roles Add nodes that are DMF database servers to scale metadata capability Add nodes that are DMF movers to scale data migration capability Example of large Lustre configuration Five nodes each provide the DMF database server. Three of those nodes also act as the DMF core server Six nodes are the DMF movers 22

Data Management DMF 7 Design of Lustre Tiering to SES DMF 1 Lustre Cluster(s) DMF 2 DMF 3 DMF 4 DMF 5 Mover 1 Target: 10-20GB/s RGW 1 RGW 2 RGW 3 RGW 4 RGW 5 SUSE Enterprise Storage Mover 2 Mover 3 RGW 6 Mover 4 Infiniband 100gE 23

GiB/sec DMF7 S3 Mover Performance Four DL360 Mover Nodes Lustre to SES 20 18 16 14 12 10 8 Mover4 Mover3 Mover2 Mover1 6 4 2 0 24

Summary DMF 7 Tiered Storage Scalable storage tiering and backup High-latency media such as tape and cloud Metadata Search Locate, select and move large groups of files Standard and userassigned attributes Flash Scratch Right-sized, flashbased, burst buffer namespaces High throughput and millions of IOPS 25

Thank you kirill.malkin@hpe.com 26