Store. Process. Analyze. Collaborate. Archive. Cloud.
The HPC Storage Leader
Invent. Discover. Compete.
DDN: Who We Are
We design, deploy, and optimize storage systems that solve HPC, Big Data, and Cloud business challenges at scale.
• Main office: Sunnyvale, California, USA
• Go to market: partner & reseller assisted, direct
• DDN: the world's largest private storage company
• The only storage company with a long-term focus on Big Data
• World-renowned & award-winning
An Elite Collection of HPC's Finest: Some of Our 1000+ Customers
DDN: 15 Years of HPC Innovation (1998-2014)
Company milestones: DDN founded (1998); 1st customer: NASA; 10 GB/s (NCSA); 100 GB/s (CEA, LLNL); largest private storage company (IDC); 1 TB/s (ORNL); 500+ employees; 5 PB/rack.
Technology milestones: 1st real-time appliance for high-scale Big Data; EXAScaler, DDN's 1st parallel file system offering, featuring Lustre; 1st in data center density; SFA (Storage Fusion Architecture), 1st in bandwidth + IOPS; 1st in-storage processing with a SW-only, portable architecture; 1st hyperscale object storage; web-scale computing and HPC collaboration; SFX flash tiering, 1st application-aware hybrid caching.
DDN leads on the List of Lists: 80% of the Top 10, 67% of the Top 100, 32% of the Top 500. Revolutionizing HPC.
Our Unwavering Commitment to HPC
• Investments in Exascale
• Real engineering is needed to scale 1000x
• Fast Forward
Exascale I/O Challenges: Cost
LANL Trinity hybrid scratch cost analysis (chart). Takeaway: a hybrid approach is necessary to meet both bandwidth and capacity requirements.
Exascale I/O Challenges: Power Consumption
NERSC-8 cost comparison (chart: number of HDDs vs. burst throughput, TB/sec):
• 0.76 TB/s, HDD only: 26 SFA controllers, 470 kW
• 2.96 TB/s, HDD only: 99 SFA controllers, 1,792 kW
• 2.96 TB/s, hybrid HDDs + burst buffer: 26 SFA controllers, 768 kW
Exascale I/O Challenges: Efficiency
Analysis of Argonne's LCF production storage system (circa 2010):
• 99% of the time, storage BW utilization is < 33% of max
• 70% of the time, storage BW utilization is < 5% of max
Conclusions:
1) Separation of bandwidth and capacity is required
2) Utilization efficiency must be improved
Tiered design (SC'13 IME demo cluster figures): the burst buffer tier absorbs the peak load (~50 GB/s into IME), the file system on the persistent storage tier handles the remaining load (~4 GB/s), and data trickles to the archival storage tier (~25 MB/s). A back-of-the-envelope sketch follows below.
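A minimal Python sketch of the arithmetic, using the demo-cluster rates above and a hypothetical 10 TB checkpoint, to show why separating burst bandwidth from capacity pays off: the application only waits for the burst-buffer ingest, while the file system drains the data at its sustained rate in the background.

```python
# Burst-buffer arithmetic sketch. The 50 GB/s and 4 GB/s rates are the SC'13
# demo-cluster figures from the slide; the checkpoint size is hypothetical.
checkpoint_tb = 10.0   # hypothetical checkpoint size (TB)
burst_gbps    = 50.0   # burst-buffer ingest rate (GB/s)
pfs_gbps      = 4.0    # persistent file system drain rate (GB/s)

absorb_s = checkpoint_tb * 1000 / burst_gbps  # time the application stalls on I/O
drain_s  = checkpoint_tb * 1000 / pfs_gbps    # background destage time to the PFS

print(f"Application stalls {absorb_s:.0f} s instead of {drain_s:.0f} s;")
print("the PFS only has to sustain the average load, not the peak.")
```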
Why is today's I/O efficiency so poor?
• Serialization at various points in the I/O path
• Stripe and block alignment (PFS and RAID)
  o Read-modify-writes to underlying storage (see the sketch below)
• Lock contention
  o Exacerbated by poor I/O structuring in applications
  o Worsens with 1000s of nodes (diagram: compute nodes contending across file servers 1-4 and storage)
Projected scaling (source: http://storageconference.org/2011/presentations/snapi/1.grider.pdf): ~20,000 TF peak performance with ~5,000,000-way concurrency in 2015, growing to ~1,000,000 TF with ~1,000,000,000-way concurrency in 2018.
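A minimal Python sketch (hypothetical sizes, not tied to any particular PFS or RAID geometry) of why a small, unaligned write inflates into a read-modify-write of the full segment at the storage layer:

```python
# Read-modify-write amplification sketch. Segment and write sizes are assumed,
# purely to illustrate the effect described on the slide.
segment_kb = 1024   # assumed full-stripe / RAID segment size (KB)
write_kb   = 4      # small write issued by the application (KB)

# A parity-protected segment cannot be partially updated in place: the storage
# layer reads the old segment, merges the new 4 KB, and writes the segment back.
backend_io_kb = segment_kb + segment_kb   # read old segment + write new segment
amplification = backend_io_kb / write_kb

print(f"{write_kb} KB application write -> {backend_io_kb} KB of back-end I/O "
      f"(~{amplification:.0f}x amplification), before any lock contention.")
```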
Why is today's I/O efficiency so poor? (continued)
• Poor horizontal scaling characteristics in the PFS: the weakest link
  o A PFS is only as fast as its slowest I/O component
  o Oversubscribed or crippled I/O components affect the performance of the entire system
  o As I/O sections get larger and the number of components increases, the problem worsens (congestion)
  o The weakest link can be all the way down at the disks (RAID rebuilds)
• A single overloaded server can slow down the entire system (diagram: file servers 1-4 and storage; see the sketch below)
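A minimal Python sketch (hypothetical per-server rates) of the weakest-link effect: with deterministic striping, every client waits on its slice from every server, so one degraded server throttles the aggregate.

```python
# Weakest-link sketch. Per-server bandwidths are assumed; one server is
# degraded, e.g. by a RAID rebuild or oversubscription.
server_gbps = [12.0, 12.0, 12.0, 1.5]   # four file servers, one crippled

ideal   = sum(server_gbps)                      # what the hardware could deliver
striped = len(server_gbps) * min(server_gbps)   # what synchronous striping delivers

print(f"ideal {ideal:.1f} GB/s vs. striped {striped:.1f} GB/s "
      f"({100 * striped / ideal:.0f}% efficiency)")
```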
PFS Efficiency as a Function of I/O Size
• Chart 1: performance and efficiency (percent of stripe size) of non-mergeable writes as a function of I/O size (log scale, ~4,096 to ~409,600 bytes). Takeaway: aligned, full-stripe-width I/O is required for maximum PFS performance (see the sketch below).
• Chart 2: throughput (MB/s) of a parallel file system on the IME demo cluster SSDs (50 GB/s available) vs. I/O request size (1 KB to 512 KB). Takeaway: faster media (SSDs) may not address the underlying PFS performance limitations.
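A minimal Python sketch of the efficiency curve's shape, assuming a 1 MiB stripe width (the real stripe size depends on the PFS and RAID configuration): a non-mergeable write that fills only part of a stripe still pays for the whole stripe on the back end.

```python
# Stripe-efficiency sketch. The 1 MiB stripe width is an assumption for
# illustration; efficiency is the useful fraction of the back-end stripe traffic.
STRIPE = 1 << 20  # assumed stripe width: 1 MiB

def efficiency(io_size_bytes: int) -> float:
    """Fraction of the stripe's back-end traffic carrying application data."""
    return min(io_size_bytes, STRIPE) / STRIPE

for size in (4096, 64 * 1024, 256 * 1024, 1 << 20):
    print(f"{size:>8} B non-mergeable write -> ~{100 * efficiency(size):.1f}% efficiency")
```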
What is Infinite Memory Engine (IME)?
• High-performance I/O system based on parallel log structuring (see the sketch below)
• Massive concurrency regardless of application I/O pattern
• Dynamic load balancing helps steer clear of oversubscribed and handicapped components
• Innovative lookup mechanism enables immediate availability of data
• Distributed fault tolerance
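A minimal Python sketch of the log-structuring idea (greatly simplified, and not DDN's implementation): fragments are appended to a log in arrival order and a small index maps file offsets back to log positions, so even random, interleaved application writes become sequential appends on the device.

```python
# Log-structured write sketch: an append-only log plus an index, standing in
# for one IME-style server's SSD. This is an illustration, not DDN's design.
log = bytearray()   # append-only data log
index = {}          # (file_id, file_offset) -> (log_offset, length)

def log_write(file_id: str, offset: int, data: bytes) -> None:
    """Record where the fragment landed, then append it sequentially."""
    index[(file_id, offset)] = (len(log), len(data))
    log.extend(data)

def log_read(file_id: str, offset: int) -> bytes:
    """Look up a fragment by its logical (file, offset) address."""
    pos, length = index[(file_id, offset)]
    return bytes(log[pos:pos + length])

# Interleaved, out-of-order writes from two "files" still land sequentially.
log_write("ckpt.0", 4096, b"B" * 4)
log_write("ckpt.1", 0,    b"A" * 4)
log_write("ckpt.0", 0,    b"C" * 4)
print(log_read("ckpt.0", 0), log_read("ckpt.0", 4096))
```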
The Infinite Memory Advantage
• Designed for scalability: patented DDN algorithms
• Fully POSIX & HPC compatible: no application modifications
• Scale-out data protection: distributed erasure coding
• Non-deterministic system: write anywhere, no layout needed
• Integrated with file systems: designed to accelerate Lustre*, GPFS; no code modification needed
• Writes: fast. Reads: they're fast too. No other system offers both at scale.
SC'13 Demo Comparative Testing: Shared Writes
IME clients, one per compute node; 98 node-local MLC SSDs; linear cluster scaling.
Cluster-level testing:
• 6,225 concurrent write requests (8 MB): DDN GRIDScaler 49 GB/s; IME (overall) 49 GB/s
• 12,250,000 concurrent, interleaved write requests (4 KB): DDN GRIDScaler 17 MB/s; IME (overall) 49 GB/s
Disk-level testing:
• 62.5 concurrent write requests: DDN GRIDScaler 438 MB/s per SSD; IME 500 MB/s per SSD
• 125,000 concurrent write requests: DDN GRIDScaler 170 KB/s per SSD; IME 500 MB/s per SSD
Takeaway: SSDs behind a PFS don't help; IME runs at line rate and scales with SSD rates.
For reference, the average 2018 Top500 cluster concurrency is an estimated 57,772,000 cores.
IME Checkpoint / Migration Workload Demo: Achieves >90% of Available Storage Bandwidth
• Checkpoint I/O directed at IME (emulated with IOR):
  o File #1 (49-50 GB/s)
  o File #2 (49-50 GB/s)
  o File #3 (49-50 GB/s)
• Migration of File #3 from IME to the PFS (4-5 GB/s)
ISC'14 IME Demo Server
• Off-the-shelf 2U server chassis
• Dual-socket Ivy Bridge with 128 GB RAM
• Up to 24 SSDs per IME server
• 2 FDR InfiniBand ports
• Expected burst bandwidth per IME server: ~10 GB/s
ISC'14 Demo System in the DDN Booth
• 16U (servers)
• Total peak bandwidth: ~80 GB/s