HPC Storage Use Cases & Future Trends


HPC Storage Use Cases & Future Trends
Massively-Scalable Platforms and Solutions Engineered for the Big Data and Cloud Era
Oct 2014 | Atul Vidwansa | Email: atul@

DDN About Us
DDN is a Leader in Massively Scalable Platforms and Solutions for Big Data and Cloud Applications
Main Office: Santa Clara, California, USA
Installed Base: 1,000+ Customers in 50 Countries
Go-To-Market: Partner & Reseller Assisted, Direct
DDN: World's Largest Private Storage Company, World-Renowned & Award-Winning

DDN: The Technology Behind the World's Leading Data-Driven Organizations
HPC & Big Data Analysis | Cloud & Web Infrastructure | Professional Media | Security

Biggest of the HPC Customers
DDN Powers the Fastest HPC Centers in the World: #1 on 5 Continents and 61 of the Top100, Delivering More GB/s Than All Others Combined
Fastest Parallel Filesystems: 32 PB @ 1 TB/s, 20 PB @ 250 GB/s, 26 PB @ 250 GB/s, 10 PB @ 200 GB/s, 10 PB @ 150 GB/s

Supercomputer Storage in Asia-Pacific: Best Suited for CPU or Accelerator Based Clusters
Raijin @ ANU-NCI: 1.1 PF CPU-only cluster; 10 PB DDN storage (SFA12K with FDR IB); mix of 10K RPM SAS & 7.2K RPM nearline SAS; DDN-supplied and -supported Lustre parallel filesystem; 150 GB/s throughput
Tsubame2 @ Titech, Japan: 2.2 PF CPU+GPU cluster; 7.2 PB DDN storage (SFA10K with QDR IB); mix of 10K RPM SAS & 7.2K RPM nearline SAS; DDN-supplied and -supported Lustre + GPFS parallel filesystems; 70 GB/s throughput

Life Science Systems, Asia Pacific
2012 Kazusa DNA Research Institute: 30 GB/s, 4.5 PB, several storage tiers
2013 Kyoto University CiRA: 4 PB capacity
2012/2014 National Institute of Genetics: 300 GB/s aggregate throughput, ~10 PB disk capacity
2014 Medical Megabank Project: 500 GB/s aggregate throughput, 10-50 PB disk capacity
2014 Peking University: 60 GB/s
2015 Tokyo University, Human Genome Center (HGC): 22 PB disk capacity, 200 TB flash, 4.5 million random read IOPS, 10s to 100s of petabytes of tape library

Snapshot of DDN Customers in Singapore & India
2 PB @ 20 GB/s at CSIR C-MMACS
1.5 PB @ 15 GB/s at TIFR-NCBS
500 TB @ 25 GB/s at IIT Kanpur
1 PB @ 8 GB/s at IUCAA Pune
1 PB @ 8 GB/s at IITM Pune
300 TB @ 5 GB/s at NIBMG

Use Cases for Petaflop to Exaflop Class Machines

Storage Use Cases for 100-500 PF Supercomputers (taken from RFPs of Trinity, NERSC-8, ORNL, LLNL)

Priority 1: Defensive I/O (checkpoint-restart). Users are demanding the ability to dump 100% of memory in a very short period of time, together with a reduction of I/O wait times. Trinity requires 4.4-17.8 TB/s, whereas NERSC-8 requires 2.2-8.9 TB/s of bandwidth to handle these scenarios (see the sketch after this slide).

Priority 2: Handle concurrency of 50 million CPU cores. Handle the average concurrency of an application running on 50 million CPU cores where each core generates data, and an average of 1 million file creates/s of metadata performance.

Priority 3: On-the-fly visualization with quick drain & pre-staging of data. Start visualization as soon as data is generated by the simulation, but before it is written to the parallel filesystem; this requires an average of 500 GB/s write throughput followed by 500 GB/s read throughput. Users are also demanding a reduction of the time to start jobs that need to pre-stage multiple petabytes of data.
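The Priority 1 requirement is simple arithmetic: bandwidth is memory size divided by the acceptable checkpoint window. A minimal sketch follows; the memory sizes and windows used are illustrative assumptions, not figures taken from the Trinity or NERSC-8 RFPs.

```python
# Illustrative sketch: required defensive-I/O (checkpoint) bandwidth.
# The machine sizes and windows below are assumptions for illustration only.

def checkpoint_bandwidth_tb_s(memory_pb: float, window_s: float) -> float:
    """Bandwidth (TB/s) needed to dump the full system memory within window_s seconds."""
    return memory_pb * 1000.0 / window_s   # 1 PB = 1000 TB

if __name__ == "__main__":
    for memory_pb, window_s in [(2.0, 300.0), (4.0, 600.0)]:
        bw = checkpoint_bandwidth_tb_s(memory_pb, window_s)
        print(f"{memory_pb} PB memory, {window_s:.0f} s window -> {bw:.1f} TB/s")
```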

Continued:

Priority 4: Accelerate reads for common files. Load user profiles, shell environments, libraries, configuration files, and data files for thousands of users on a multidisciplinary HPC system. Also applicable to the chip-design industry for silicon verification, pinning a reference genome in bioinformatics, etc.

Priority 5: Fast data lookup. Scale-out pattern matching for defense/intelligence use cases, Google-scale data lookups (Hadoop jobs), and fraud detection based on embarrassingly parallel databases.

Priority 6: Application-level power capping. The ability to dynamically allocate faster CPU, network, and storage components to power-hungry applications before power capping starts.

Common Requirements from Storage
New supercomputers require main-memory (RAM) performance at the cost of disk (PFS) performance:
An extremely fast cache to absorb checkpoint-restart data
Must be able to cope with bursts of random data
Has to be an extension of the parallel filesystem, since data finally needs to go to long-term persistent storage, and needs to integrate with the caching mechanism of the underlying storage infrastructure
Must not require re-compiling of applications (has to provide POSIX or MPI-IO semantics)
Must be power efficient & space efficient

What is a Burst Buffer & Why Do You Need It?

Analysis of Argonne's LCF production storage system: 99% of the time, storage bandwidth utilization is below 33% of maximum; 70% of the time, it is below 5% of maximum.

Need for a burst buffer: exascale applications need extreme throughput, on the order of 5x-10x of what a PFS can provide. Large systems do not want to invest in disk-only technology; it is too costly and power hungry. The capacity required for extreme throughput is usually 2-5 times the system memory size, and a PFS that small cannot deliver the performance. PFS storage is often not used to 100% of its capability, and small random I/O is still a challenge for every PFS. (See the sizing sketch below.)

Trend: Burst buffers will demand smaller, robust parallel file systems that sustain very high bandwidth efficiency; the SFA value proposition remains strong in a burst buffer world.
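A minimal sizing sketch based only on the rules of thumb quoted on this slide (burst-buffer capacity of roughly 2-5x system memory, throughput of roughly 5-10x the PFS); the input values are hypothetical.

```python
# Rough burst-buffer sizing from the slide's rules of thumb.
# Inputs (system memory, PFS bandwidth) are illustrative assumptions.

def size_burst_buffer(system_memory_pb: float, pfs_bandwidth_tb_s: float):
    capacity_pb = (2 * system_memory_pb, 5 * system_memory_pb)        # 2-5x memory
    bandwidth_tb_s = (5 * pfs_bandwidth_tb_s, 10 * pfs_bandwidth_tb_s)  # 5-10x PFS
    return capacity_pb, bandwidth_tb_s

capacity, bandwidth = size_burst_buffer(system_memory_pb=1.5, pfs_bandwidth_tb_s=1.0)
print(f"Burst buffer capacity:  {capacity[0]:.1f}-{capacity[1]:.1f} PB")
print(f"Burst buffer bandwidth: {bandwidth[0]:.0f}-{bandwidth[1]:.0f} TB/s")
```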

The Move to NVRAM (SSDs) is a MUST! NVRAM Is a Viable HPC Tier Today, and Performance Gains Outpace HDD
Flash vs HDD: $/MB/s $0.29 vs $3.19 (91% less); $/GB $1 vs $0.10 (about 10x more); W/GB/s 0.003 vs 0.13 (98% less)
[Chart: IOPS of HDDs and flash vs CPU operations, 2000 to 2018; the gap widens from 10^6 to 10^8.]
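A back-of-the-envelope comparison using only the $/MB/s and $/GB figures quoted on this slide; the 100 GB/s bandwidth and 10 PB capacity targets are illustrative assumptions, not from the deck. It shows why flash is the cheap way to buy bandwidth while disk remains the cheap way to buy capacity.

```python
# Cost comparison from the slide's figures:
# flash: $0.29 per MB/s, $1.00 per GB; HDD: $3.19 per MB/s, $0.10 per GB.
# The bandwidth and capacity targets below are hypothetical.

FLASH = {"usd_per_mb_s": 0.29, "usd_per_gb": 1.00}
HDD   = {"usd_per_mb_s": 3.19, "usd_per_gb": 0.10}

target_bw_gb_s = 100   # bandwidth target (illustrative)
target_cap_pb  = 10    # capacity target (illustrative)

for name, media in (("flash", FLASH), ("HDD", HDD)):
    bw_cost  = media["usd_per_mb_s"] * target_bw_gb_s * 1000     # GB/s -> MB/s
    cap_cost = media["usd_per_gb"] * target_cap_pb * 1_000_000   # PB -> GB
    print(f"{name:5s}: ${bw_cost:>10,.0f} for {target_bw_gb_s} GB/s, "
          f"${cap_cost:>12,.0f} for {target_cap_pb} PB")
```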

What is Infinite Memory Engine (IME)?
A DDN-developed burst buffer implementation, using a patent-protected distributed hash table algorithm, that manages distributed non-volatile memory devices:
High bandwidth and low latency for reads & writes, large and small, aligned or random
Data integrity & protection
Massive scalability
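IME's actual placement algorithm is patent-protected and not described in the deck; the following is only a generic consistent-hashing sketch of how a distributed hash table can spread I/O fragments across a set of burst-buffer servers. The server names, fragment size, and file path are hypothetical.

```python
# Generic consistent-hashing sketch, NOT DDN's patented IME algorithm:
# illustrates hashing (file, offset) fragments onto distributed NVM servers.

import bisect
import hashlib

class HashRing:
    def __init__(self, servers, vnodes=64):
        # Each server gets several virtual points on the ring for load balance.
        self._ring = sorted(
            (self._h(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self._keys = [k for k, _ in self._ring]

    @staticmethod
    def _h(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, file_id: str, offset: int) -> str:
        """Pick the server that owns a given (file, offset) fragment."""
        idx = bisect.bisect(self._keys, self._h(f"{file_id}:{offset}")) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing([f"ime{i:02d}" for i in range(20)])   # hypothetical 20-server tier
for off in range(0, 4 * 1024 * 1024, 1024 * 1024):    # 1 MiB fragments
    print(f"/scratch/ckpt.0 @ {off:>8}: {ring.server_for('/scratch/ckpt.0', off)}")
```

Spreading fragments by hash rather than by fixed striping is one way a burst buffer can keep small, random, or misaligned writes evenly balanced across servers.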

Where Does Infinite Memory Engine Fit? HPC Burst Buffer and DDN's Exascale/Webscale Architecture
Elements of the DDN Exascale/Webscale stack:
Massively Parallel Computational Platform: massively parallel processing platform + high-performance network fabric
DDN IME Burst Buffer Tier (NEW): file system buffer cache software and NVRAM appliances
DDN SFA Persistent Storage Tier: high-performance, high-capacity, and reliable file storage appliances
DDN WOS Archival Storage Tier: cost-effective, cloud-enabled, object-based archive

IME Design Decisions
1. Decouples storage performance from capacity (SSD vs spinning disk)
2. Speeds up apps by moving I/O next to compute (bandwidth & IOPS, read & write, small & large)
3. Shrinks cluster idle time with I/O provisioning (you bought a $100 cluster but are using only $25; IME gives back the other $75)

The Infinite Memory Engine: How It Works

How IME Works, Continued

Demo, Comparative Testing of Shared Writes: IME Accelerates Parallel File Systems by 2,000x

Cluster-level testing (DDN GRIDScaler vs IME overall; IME shows linear cluster scaling):
6,225 concurrent write requests: 49 GB/s vs 49 GB/s
12,250,000 concurrent write requests: 17 MB/s vs 49 GB/s

Disk-level testing (DDN GRIDScaler per SSD vs IME per SSD):
62.5 concurrent write requests: 438 MB/s vs 500 MB/s
125,000 concurrent write requests: 170 KB/s vs 500 MB/s

SSDs behind a PFS don't help; IME stays at line rate and scales with SSD rates. Average 2018 Top500 cluster concurrency: 57,772,000 cores (est.)

IME Demo at ISC 2014: Summary
Content: write to and read from IME with IOR; write S3D application data to IME; purge data to the underlying parallel file system
Interface: MPI-IO driver for IME
Testbed hardware: DDN IME 2U servers with 24 SATA SSDs per IME server; GRIDScaler (GPFS) with SFA7700
S3D app performance (MPI-IO): <50 MB/s
[Diagram: Compute Cluster -> Burst Buffer at 80 GB/s -> Parallel Filesystem at 3 GB/s]
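For context on the demo workload, here is a minimal mpi4py sketch of the kind of N-to-1 shared-file MPI-IO write that IOR and S3D generate. It is not DDN's IME MPI-IO driver and uses only standard MPI-IO calls; the /ime/scratch mount point is a hypothetical path.

```python
# Minimal shared-file MPI-IO write, IOR-style; assumes a hypothetical
# IME-backed mount at /ime/scratch. No IME-specific API is used.

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

block = np.full(1 << 20, rank % 256, dtype=np.uint8)   # 1 MiB payload per write
fh = MPI.File.Open(comm, "/ime/scratch/shared.dat",    # hypothetical path
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)

# Every rank writes its own strided region of the single shared file.
for step in range(16):
    offset = (step * comm.Get_size() + rank) * block.nbytes
    fh.Write_at_all(offset, block)

fh.Close()
```

Run with, for example, `mpirun -n 64 python shared_write.py`; pointing the same unmodified code at a PFS-only path is the shared-write pattern whose slowdown is shown two slides earlier.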

IME Roadmap: Various Degrees of DDN Hardware
Product Appliance: IME client SW, DDN SSDs & server SW
Software Only: IME client and server SW
Clustered Systems: IME client and server SW

IME Phase 1 System: Available for POC Now
Single-rack solution: 20 2U IME server nodes, dual-rail IB FDR switching, 250 TB of SSD capacity; extends the PFS interface to the cluster; 50M IOPS and 200 GB/s throughput
This rack can support ~1,600 compute nodes, or can be operated as a stand-alone data-intensive cluster
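A quick division of the rack-level figures above into per-server and per-client shares; the only assumption is that the load spreads evenly across servers and compute nodes.

```python
# Per-server and per-client shares of the phase-1 rack figures quoted above
# (20 servers, 250 TB SSD, 200 GB/s, 50M IOPS, ~1,600 compute nodes).

servers, clients = 20, 1600
capacity_tb, bandwidth_gb_s, iops = 250, 200, 50_000_000

print(f"Per IME server: {capacity_tb / servers:.1f} TB SSD, "
      f"{bandwidth_gb_s / servers:.0f} GB/s, {iops / servers / 1e6:.1f}M IOPS")
print(f"Per compute node: {bandwidth_gb_s / clients * 1000:.0f} MB/s of burst bandwidth")
```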

IME Ushers In Tremendous Value to HPC Centers! Clear Benefits at Any Level of Scale
>90% fewer I/O & router nodes
>90% fewer spinning disks
>90% less storage networking
>90% fewer data center racks
>90% less data center power

Thank You