IME (Infinite Memory Engine) Extreme Application Acceleration & Highly Efficient I/O Provisioning


IME (Infinite Memory Engine): Extreme Application Acceleration & Highly Efficient I/O Provisioning. September 22nd, 2015. Tommaso Cecchi

2 What is IME? This breakthrough software-defined storage application introduces a whole new application-aware data acceleration tier that provides game-changing latency reduction and greater bandwidth and IOPS performance for today's and tomorrow's performance-hungry scientific, analytic and big data applications.

3 What is IME? IME delivers the performance of flash with the manageability and capacity of shared storage. IME is a new tier of transparent, extendable, non-volatile memory (NVM) that provides game-changing latency reduction and greater bandwidth and IOPS performance for the next generation of performance-hungry scientific, analytic and big data applications.

4 What is IME? IME creates a new application-aware fast data tier that resides between compute and the parallel file system to accelerate I/O, reduce latency, and provide greater operational and economic efficiency.

5 How Does IME Help? Changes the I/O Provisioning Paradigm & Reduces the Total Cost of Storage. IME enables organizations to provision for peak and sustained performance requirements separately, with greater operational efficiency and cost savings than an exclusively disk-based parallel file system. Storage bandwidth utilization of a major HPC production storage system: 99% of the time it is below 33% of maximum, and 70% of the time it is below 5% of maximum. IME reduces storage hardware by up to 70%: fewer systems to buy, power, manage and maintain.

6 How Does IME Help? Limitless Performance Scaling: Removes Architectural & Economic Barriers. IME makes exascale I/O a reality and enables the enterprise to run HPC jobs with much greater performance and efficiency. IME eliminates:
- Parallel file system locking, limitations and bottlenecks
- 70% of storage hardware and consumed floorspace
- Latency driving a 30% loss of compute resources
- 90% of checkpoint/restart downtime

7 Why Cache Matters in HPC: Even Large HPC Sites Drive a Lot of Small I/O. Cache is critical for aligning all-too-frequent unaligned writes and for capturing small writes to preserve spinning-disk performance. All DDN Storage products offer cache mirroring and battery-backed RAM cache, proven across 3 generations to accelerate all varieties of data. Many systems today do not even offer a protected, redundant write cache. Caching is one of the most difficult layers of a storage stack to engineer; it is also the most critical. 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. ddn.com

8 Where IME Provides Value: IME Accelerates Parallel Filesystems. It absorbs all sizes of I/O at full performance, unlike Lustre* and GPFS.

9 Where IME Provides Value
1) MITIGATES POOR PFS PERFORMANCE caused by PFS locking, small I/O, and mal-aligned, fragmented I/O patterns. IME makes badly behaved apps run well and prevents a poorly behaving app from impacting the entire supercomputer. This is especially valuable in diverse workload environments and for ISV applications. IOR benchmarks indicate a 3x to 20x speedup on I/Os smaller than 32 KB. [Chart: S3D Turbulent Flow Model; figures shown: 25 MB/s, 4 GB/s, 50 GB/s]
2) PROVIDES HIGHER-PERFORMANCE I/O (bandwidth and latency) to the application. At ISC14, we demonstrated a three-orders-of-magnitude speed-up due to this high-performance tier.
3) DRIVES SIGNIFICANTLY MORE EFFICIENT I/O TO THE PFS by re-aligning and coalescing data within the non-volatile storage. At ISC14, we demonstrated a two-orders-of-magnitude speed-up due to this efficiency.

10 IME Lowers the Total Cost of Storage: IME+PFS delivers better price/performance than the PFS alone.
Configuration: cluster memory = 400 TB; 12 IME appliances (each with 48 x 1.9 TB NVMe SSDs); NVM capacity = 2.75x cluster memory.

Component           | SFA Only                | IME + SFA                | Advantage
Cluster I/O BW      | 540 GB/s                | 756 GB/s                 | 216 GB/s more BW delivered
Storage fabric BW   | 540 GB/s                | 270 GB/s                 | 50% less BW needed to the PFS
Qty OSS             | 112                     | 56                       | 50% fewer OSS to buy
Qty SFA appliances  | 14                      | 7                        | 50% fewer SFA appliances needed
HDDs per SFA        | 400 (80 HDD x 5 encl.)  | 800 (80 HDD x 10 encl.)  | 200% more HDD density per SFA at the same capacity
Total HDDs          | 5,600 (14 SFA x 400)    | 5,600 (7 SFA x 800)      | Similar persistent capacity

IME value proposition: more bandwidth to the cluster (faster job turn-around, more jobs in the same period, fewer nodes needed to complete the same amount of work); fewer OSS and SFAs (reduced power, space and operational cost); similar persistent capacity; lower overall capital cost.
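As a quick worked check, the sizing figures in the table are internally consistent. The sketch below is our own arithmetic (variable names are ours, not DDN's), re-deriving each "Advantage" column entry from the raw numbers:

```python
# Sanity check (ours, not DDN's) of the sizing arithmetic in the table above.
sfa_only_bw = 540             # GB/s, cluster I/O bandwidth with the PFS alone
ime_sfa_bw = 756              # GB/s, cluster I/O bandwidth with IME in front
extra_bw = ime_sfa_bw - sfa_only_bw
assert extra_bw == 216        # "216 GB/s more BW delivered"

fabric_bw = sfa_only_bw // 2  # IME coalesces I/O, halving the fabric BW needed to the PFS
assert fabric_bw == 270

# The same 5,600 HDDs either way: 14 SFAs x 400 HDDs vs. 7 SFAs x 800 HDDs,
# i.e. double the HDD density per SFA at the same total capacity.
assert 14 * 400 == 7 * 800 == 5600

# NVM capacity: 12 appliances x 48 NVMe SSDs x 1.9 TB each
nvm_tb = 12 * 48 * 1.9
print(round(nvm_tb))          # about 1094 TB, roughly 2.75x the 400 TB of cluster memory
```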

11 HPC Ecosystem: Client IO Interfaces
[Diagram: the application's IO implementation may use high-level IO libraries (optional), MPI-IO, native IO, or POSIX IO; the data path for a high-level IO library is built on POSIX. Optional forwarded/exported IO sits above the file system IO interface (VFS or a user-space library).]

12 High-Level IO Libraries
Provide an application- and end-user-oriented IO interface:
- Files and directories are abstracted from users in favor of data sets, objects, containers and variables
- Object operations (put, get) instead of byte streams (read, write)
- Portable, self-describing data sets
Example high-level IO libraries:
- HDF5 (http://www.hdfgroup.org/hdf5/)
- netCDF (http://www.unidata.ucar.edu/software/netcdf/)
- PnetCDF (http://cucis.ece.northwestern.edu/projects/pnetcdf/)
- ADIOS (https://www.olcf.ornl.gov/center-projects/adios/)
Implementations leverage lower-level IO interfaces: POSIX and MPI-IO.
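To make "self-describing data sets with put/get instead of byte streams" concrete, here is a toy sketch in the spirit of HDF5/netCDF. It is not the API of any real library: named, typed datasets with attributes are stored behind put/get calls, and serialization writes a header that describes the layout so any reader can interpret the bytes without out-of-band knowledge:

```python
import json
import struct

# Toy self-describing container (illustrative only, not HDF5/netCDF code).
class SelfDescribingFile:
    def __init__(self):
        self._data = {}  # dataset name -> {"dtype", "attrs", "values"}

    def put(self, name, values, dtype="f8", attrs=None):
        # Object-style write: a named dataset, not a byte offset.
        self._data[name] = {"dtype": dtype, "attrs": attrs or {}, "values": list(values)}

    def get(self, name):
        # Object-style read: ask for the dataset by name.
        return self._data[name]["values"]

    def serialize(self):
        # The header describes every dataset (type, length, attributes),
        # which is what makes the resulting byte stream self-describing.
        header = json.dumps(
            {k: {"dtype": v["dtype"], "n": len(v["values"]), "attrs": v["attrs"]}
             for k, v in self._data.items()}
        ).encode()
        body = b"".join(
            struct.pack("<%dd" % len(v["values"]), *v["values"])
            for v in self._data.values()
        )
        return struct.pack("<I", len(header)) + header + body

f = SelfDescribingFile()
f.put("temperature", [290.5, 291.0], attrs={"units": "K"})
print(f.get("temperature"))  # [290.5, 291.0]
```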

13 MPI-IO
Provides a high-performance parallel IO interface and semantics, applying successful MPI capabilities to file IO:
- Bulk data capabilities (MPI_File_write_at_all)
- Metadata capabilities (e.g. scalable file open)
The most popular implementation is Argonne National Laboratory's ROMIO, distributed in MPICH and available in MPICH derivatives (MVAPICH, IBM MPI, Intel MPI, Cray MPI, and others).
Key features:
- Independent IO: uncoordinated parallel IO from many concurrent readers and writers
- Collective IO: coordinated IO from many readers and writers, with two popular implementations:
  o Data sieving: selective filtering of data (reduces IOPS)
  o Two-phase IO: intermediate processes collect and serve data to other processes (reduces the number of readers/writers touching the PFS)
- MPI derived data type support: allows the MPI runtime to load non-contiguous data in files directly into application data structures in RAM; used heavily by higher-level IO libraries (e.g. PnetCDF and HDF5)
- Specialization for storage system targets (ROMIO ADIO drivers):
  o IME provides an ADIO driver that translates MPI-IO requests into IME requests
  o ROMIO provides drivers for Lustre, GPFS, PanFS, and others
Further reading: Chapter 13 of the MPI-3 standard, http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf
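The two-phase collective IO idea above can be simulated without MPI at all. The pure-Python sketch below (all names and sizes are ours, for illustration) models many ranks holding records interleaved in the file, a worst case for independent IO, and shows how a few aggregator ranks gather them into contiguous regions so the file system sees only a handful of large sequential writes:

```python
# Illustrative simulation of two-phase collective I/O (no MPI involved).
NRANKS, NAGG, RECORDS_PER_RANK = 8, 2, 4

# Phase 0: rank r owns records at file offsets r, r+NRANKS, r+2*NRANKS, ...
# i.e. the data is finely interleaved across ranks.
rank_data = {
    r: {r + i * NRANKS: "rec%d" % (r + i * NRANKS) for i in range(RECORDS_PER_RANK)}
    for r in range(NRANKS)
}

total = NRANKS * RECORDS_PER_RANK
per_agg = total // NAGG  # each aggregator owns one contiguous file region

# Phase 1 (communication): ship each record to the aggregator whose
# contiguous region contains its file offset.
agg_buffers = {a: {} for a in range(NAGG)}
for recs in rank_data.values():
    for offset, payload in recs.items():
        agg_buffers[offset // per_agg][offset] = payload

# Phase 2 (I/O): each aggregator issues ONE contiguous write, so the PFS
# sees NAGG large writes instead of NRANKS * RECORDS_PER_RANK small ones.
file_image = [None] * total
writes = 0
for a, buf in agg_buffers.items():
    region = [buf[o] for o in sorted(buf)]  # already a contiguous slice
    file_image[a * per_agg:(a + 1) * per_agg] = region
    writes += 1

print(writes, file_image[:4])  # 2 ['rec0', 'rec1', 'rec2', 'rec3']
```

With 8 ranks writing 4 records each, independent IO would issue 32 small, interleaved writes; two-phase IO reduces that to 2 large contiguous ones, at the cost of one extra communication step.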

14 POSIX IO
Provides a portable byte-stream IO interface: read(), write(), open(), close(), ...
POSIX IO pros:
- Portable
- Inertia
POSIX IO cons:
- Some design assumptions no longer hold for modern computers (concurrency and parallelism)
- Lots of state at runtime (file descriptors)
Further reading: the POSIX standard (POSIX.1-2001)
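The byte-stream model is easy to see from Python's thin wrappers over the POSIX calls named above (the file path here is a throwaway temp file of our choosing): the file is an opaque sequence of bytes, positioned and read at byte granularity, with an open file descriptor as runtime state.

```python
import os
import tempfile

# POSIX-style byte-stream I/O via os.open/os.write/os.read/os.close.
path = os.path.join(tempfile.mkdtemp(), "demo.dat")

fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
os.write(fd, b"hello, posix")   # just bytes, no structure or metadata
os.close(fd)

fd = os.open(path, os.O_RDONLY)
os.lseek(fd, 7, os.SEEK_SET)    # byte-granular positioning in the stream
data = os.read(fd, 5)
os.close(fd)                    # the descriptor was per-process runtime state
print(data)                     # b'posix'
```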

15 DDN IME Ecosystem: Client IO Interfaces
[Diagram: the application's IO implementation may use high-level IO libraries (optional), MPI-IO via the IME ADIO driver, POSIX via IME FUSE, or the IME native client library; the data path for a high-level IO library is built on POSIX.]

16 DDN IME Ecosystem: Client IO Interfaces
Three primary interfaces for IME:
- IME FUSE
  o Provides POSIX IO
  o Captures IO requests through the Linux VFS
  o Target use case: general-purpose applications that use POSIX
- IME ROMIO
  o Provides MPI-IO support
  o Captures IO requests through the MPI runtime in user space
  o Target use case: parallel applications
- IME native library
  o Low-level programming interface
  o The FUSE and ROMIO layers are implemented on this interface
  o Target use case: highly optimized customer applications that may not map cleanly onto POSIX or MPI-IO

IME Internal Architecture Overview

18 Aggregate IME Adaptive vs. Non-Adaptive WRITE Performance
[Chart comparing aggregate write bandwidth for: an ideal, healthy system; one degraded IME server with adaptive placement; and one degraded IME server, non-adaptive. Amdahl's Law in action!]
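The Amdahl's-Law effect behind the chart can be sketched with toy numbers (ours, not DDN's measurements): under non-adaptive striping every server receives an equal share, so the write completes only when the slowest server finishes, while adaptive placement sends each server work in proportion to its observed speed.

```python
# Illustrative model of adaptive vs. non-adaptive write striping.
n_servers = 10
healthy_rate = 10.0   # GB/s per healthy server (made-up figure)
degraded_rate = 2.5   # one server running at a quarter speed (made-up figure)

# Non-adaptive: equal shares, so the aggregate rate is gated by the
# slowest server: n servers all finish at the degraded server's pace.
bw_nonadaptive = n_servers * degraded_rate

# Adaptive: data placed in proportion to each server's speed, so every
# server stays busy for the whole write.
bw_adaptive = (n_servers - 1) * healthy_rate + degraded_rate

print(bw_nonadaptive, bw_adaptive, round(bw_adaptive / bw_nonadaptive, 1))
# 25.0 GB/s vs 92.5 GB/s: roughly the 4x loss the next slide reports
```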

19 Real-Time IME Adaptive vs. Non-Adaptive WRITE Performance
[Chart: the adaptive heuristic learns quickly; 4x performance lost with non-adaptive placement.]

20 Use of Log Structuring in IME
What does this give us? Near-line-rate performance regardless of the output pattern.
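The core trick of log structuring can be shown in a few lines. This is a minimal sketch of the general technique, not DDN's implementation: writes at arbitrary logical offsets are appended sequentially to a log, and an index maps each logical offset to its position in the log, so the device only ever sees sequential appends no matter how random the application's output pattern is.

```python
# Toy log-structured store (general technique, not IME internals).
class LogStructuredStore:
    def __init__(self):
        self.log = bytearray()  # append-only log: all device writes are sequential
        self.index = {}         # logical offset -> (position in log, length)

    def write(self, offset, data):
        # Regardless of the logical offset, the data lands at the log tail.
        self.index[offset] = (len(self.log), len(data))
        self.log += data

    def read(self, offset):
        pos, length = self.index[offset]
        return bytes(self.log[pos:pos + length])

store = LogStructuredStore()
store.write(4096, b"BBBB")  # "random" logical offsets...
store.write(0, b"AAAA")
# ...but the log itself grew strictly sequentially: BBBB then AAAA.
print(store.read(0), store.read(4096))  # b'AAAA' b'BBBB'
```

A real implementation also needs crash-consistent index persistence and garbage collection of overwritten log entries, which this sketch omits.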