Next-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads


Next-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads
Liran Zvibel, CEO and Co-founder, WekaIO (@liranzvibel)

WekaIO Matrix: Full-featured and Flexible
o Fully coherent POSIX shared file system that is faster than a local FS
o Runs on public or private S3-compatible storage or bare metal; cloud native
o Distributed coding, scale-out metadata, fast rebuilds, end-to-end data protection
o Instantaneous snapshots and clones, performance tiering to S3, DR, backup
o InfiniBand or Ethernet; hyperconverged or dedicated storage server

Why an NVMe-oF parallel FS?
o Local-copy architectures were developed when 1GbitE and HDDs were standard
o Modern 100Gbit networks are 100x faster than an SSD
o It is much easier to create distributed algorithms when locality is not important
o 4KB I/O latency is similar to a local FS; bigger I/Os parallelize, so latency is even lower
[Chart: time to complete a 4KB page move, in microseconds, for SSD read, SSD write, 10Gbit (SDR), and 100Gbit (EDR)]
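The claim behind the chart can be sanity-checked with simple serialization arithmetic: moving a 4KB page over a 100Gbit link takes far less time than the SSD access itself. The SSD latencies below are assumed order-of-magnitude figures for illustration, not measurements from the talk:

```python
PAGE_BITS = 4096 * 8  # one 4KB page, in bits

def wire_time_us(link_gbps):
    """Serialization time for a 4KB page on a link, in microseconds."""
    return PAGE_BITS / (link_gbps * 1e9) * 1e6

t_10g = wire_time_us(10)     # ~3.3 us on 10Gbit (SDR)
t_100g = wire_time_us(100)   # ~0.33 us on 100Gbit (EDR)

# Assumed typical NVMe SSD media latencies (illustrative only):
ssd_read_us, ssd_write_us = 90.0, 30.0
```

At 100Gbit the wire time is a fraction of a microsecond, so the network adds little on top of the flash access itself, which is why remote NVMe can feel local.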

WekaIO Performance
o The only PFS for NVMe-oF, and a PFS over S3
o Faster than a burst buffer plus a traditional PFS
o Massive scale: trillions of files, billions of files per directory, 100s of petabytes
o Millions of IOPS, 100s of GB of bandwidth, lowest-latency FS; higher performance than an AFA
o HDD throughput cost similar to a traditional PFS; cloud economics via cloud object store
[Chart: positioning by speed/simplicity vs. scale/value against AFA, SAN, all-flash NAS, scale-out NAS, scale-out parallel NAS, and cloud object store]

Software Architecture
o Keeps out of the kernel; runs inside an LXC container for isolation
o SR-IOV to run the network stack and NVMe in user space
o Provides a POSIX VFS through lockless queues to the WekaIO driver
o I/O stack bypasses the kernel; scheduling and memory management also bypass the kernel
o Metadata is split into many buckets; buckets migrate quickly, so no hot spots
o Supports bare metal, containers, and hypervisors
[Diagram: applications in user space talk through the Weka driver or NFS/S3/SMB/HDFS frontends to a distributed FS backend of metadata buckets, data placement, tiering, SSD agent, networking, clustering, balancing, failure detection, and IP takeover over a TCP/IP stack]
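The talk does not detail the lockless queues between applications and the Weka driver; a minimal single-producer/single-consumer ring buffer sketches the general idea (an illustration only, not WekaIO's implementation, which would live in shared memory with atomic operations):

```python
class SpscRing:
    """Single-producer/single-consumer ring buffer.

    With exactly one producer and one consumer, `head` is written only
    by the consumer and `tail` only by the producer, so no lock is
    needed; in C this would use atomic loads/stores with memory fences.
    """

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0  # next slot to pop (consumer-owned)
        self.tail = 0  # next slot to push (producer-owned)

    def push(self, item):
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:       # full: one slot is kept empty
            return False
        self.buf[self.tail] = item
        self.tail = nxt
        return True

    def pop(self):
        if self.head == self.tail:  # empty
            return None
        item = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        return item
```

Keeping one slot empty distinguishes "full" from "empty" without a shared counter, which is what lets each side touch only its own index.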

Processing Has Shrunk while Data Sets Explode
o GPUs have shrunk compute infrastructure by 10x: 10 CPU-only servers become 1 GPU-accelerated server
o But the data that needs processing has grown 50x
o The industry is cornered into an I/O nightmare
[Chart: data growth, 2010-2020]

NFS = Not For Speed
o NFS bandwidth: 1-1.5 GBytes/sec; GPU bandwidth: 3-6 GBytes/sec
o A protocol developed in 1984 trying to solve a 2018 problem
o pNFS tried to fix NFS but failed when metadata workloads exploded
o Legacy parallel file systems like Lustre and GPFS cannot handle billions of small files, and they require a PhD to operate

WekaIO Solves the Data Accessibility Problem
o GPU bandwidth: 3-6 GBytes/sec; per GPU server: 5-44 GBytes/sec
o Shared, parallel file system written for NVMe
o POSIX client runs on the GPU servers
o A cluster of servers provides high-performance file services from NVMe
o Low-latency networking on InfiniBand or Ethernet
o Training data lake stored on low-cost object storage for best cost

Actual Results from Bake-off vs. All-Flash Filer
o 1MB throughput (7x faster): 7.1 GB/sec vs. 1 GB/sec
o 4KB IOPS/latency (2.6x better): 165K IOPS at 271 µsec vs. 61K IOPS at 670 µsec
o FS traverse (find, 3.3x better): 55 seconds vs. 6.5 hours
o ls of a directory (5.5x better): 10 seconds vs. 2 hours

GPU Performance vs. Alternatives
https://www.theregister.co.uk/2018/06/07/pure_beats_netapp_at_ai/
http://dlpg.labs.hpe.com

Deep Learning Requirements
o Actually very close to HPC problems
o Store a vast amount of data; effectively stage the working set back on fast storage for efficient access
o High bandwidth, low latency
o Very good metadata performance: traverse files quickly, billions of files per directory, huge namespaces
o Very high single-host performance
o Multiprotocol support (S3, HDFS, SMB, NFS)

SSD vs. HDD Pricing (per-GB ratio)
Source: Hyperion Research, https://www.storagenewsletter.com/2018/08/07/flash-storage-trends-and-impacts/

HPC only cares about throughput, right?
o NAND has been cheaper than HDD for IOPS (and obviously latency) for several years now
o HDD stats: 160 MB/sec; $0.02/GB capacity for 10TB devices
o 3.84TB TLC devices read at 1700 MB/sec, faster than 10 HDDs; total HDD cost to read at 1700 MB/sec → $2000, which works out to $0.52/GB per NAND device, so flash is already cheaper for read throughput today
o 7.68TB QLC devices coming next year write at 1000 MB/sec; 6 HDDs needed; total HDD cost to write at 1000 MB/sec → $1200, which works out to $0.16/GB per NAND device, so next year QLC will be cheaper for write throughput
o Endurance will probably not hold for checkpointing, but that is anyway a small capacity where TLC makes sense
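The break-even arithmetic above can be reproduced directly. The drive speeds and prices are the slide's own 2018 figures; the helper name and the drive counts passed in are taken from the slide ("10 HDDs", "6 HDDs needed"):

```python
# Slide figures for a 10TB nearline HDD (2018, assumed list prices):
HDD_MBPS = 160          # streaming rate per drive
HDD_USD_PER_GB = 0.02   # capacity price
HDD_GB = 10_000         # 10TB drive

def hdd_cost_to_match(mbps, n_hdds=None):
    """Cost of enough HDDs to match a given streaming rate, in USD."""
    n = n_hdds if n_hdds is not None else -(-mbps // HDD_MBPS)  # ceil
    return n * HDD_GB * HDD_USD_PER_GB

# $/GB a NAND device may cost and still beat HDDs on throughput:
tlc_breakeven = hdd_cost_to_match(1700, n_hdds=10) / 3840  # 3.84TB TLC reads
qlc_breakeven = hdd_cost_to_match(1000, n_hdds=6) / 7680   # 7.68TB QLC writes
```

This reproduces the slide's $2000 and $1200 HDD totals and the $0.52/GB and $0.16/GB break-even prices: throughput per dollar, not capacity per dollar, is where flash wins first.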

The Future of HPC Storage is NAND Flash
o Currently HDDs still make sense for some workloads
o In a year (and obviously beyond) HPC storage should steer toward NAND flash technologies
o A parallel FS for NVMe-oF requires different data structures and algorithms built for modern workloads (scaling metadata, small IOPS, etc.)
o HPC applications should consider NAND flash only for the active workload, with other media (tape, optical, etc.) for archival capacity