Emerging Technologies for HPC Storage
Dr. Wolfgang Mertz, CTO EMEA, Unstructured Data Solutions
June 2018
The very definition of HPC is expanding: from blazing fast speed to accessibility and flexibility.
- Traditional High Performance Computing: computationally intensive modeling and simulation, e.g. computer-aided engineering, weather forecasting, oil exploration
- Artificial Intelligence: machine and deep learning applications, e.g. fraud/anomaly detection, predictive maintenance, personalized medicine
- High Performance Data Analytics: complex or time-critical big data analytics workloads, e.g. genomics, financial analytics, business intelligence
HPC vs HPDA from a data perspective
[Chart: HPC and HPDA workloads plotted over time, from start to end of a job]
Value of data over time
[Chart: value of data ($) versus time, on a scale from µs, ms, s, hour, day, month, year, to yr+; "fast data" sits at the short end of the time scale, "big data" at the long end]
HPC Storage Challenges

Critical Challenge                           | Performance (traditional /scratch)                    | Persistence (traditional /home)
Performance                                  | Bare metal, parallel access                           | Scalable performance, tunable for the workload
High availability, backup, data protection   | Uptime important, but no special backup requirements  | Able to fulfill compliance requirements, protect important data
Data sharing and accessibility               | Not the critical feature                              | Pre- and post-processing, analytics, desktop access
Management & integration                     | Important, but not the critical feature               | Management functionality and support; connections with other tools
Storage tiers: performance to capacity
[Diagram: tiers arranged from highest performance (scratch) to highest capacity (archive), fed by Dell EMC and 3rd-party job schedulers]
- Scratch: NVMe-oF; HPC fast storage (Lustre, BeeGFS, GPFS)
- Project: Isilon HPC NFS storage
- Archive: Elastic Cloud Storage (ECS), Virtustream
NVMe Usage Scenarios: Local Dedicated Devices
- Each host has one or more dedicated NVMe devices
- Up to 24 U.2 NVMe drives per host (R740XD), or up to 4 Intel AIC NVMe P3700 2 TB add-in cards per host
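For a concrete feel, these local devices show up as ordinary block devices; a minimal sketch using the standard nvme-cli tool (the device path is illustrative):

    # Enumerate locally attached NVMe controllers and namespaces
    nvme list

    # Inspect one controller's identity data (model, firmware, capacity)
    nvme id-ctrl /dev/nvme0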
NVMe Usage Scenarios: Sharing NVMe over Fabrics
- Targets: up to 24 U.2 NVMe drives (R740XD), or up to 4 Intel AIC NVMe P3700 2 TB per target server
- Each host can mount one or more NVMe targets over a 100 Gb/s fabric
- Even hosts with no space or support for local NVMe can use NVMe-oF devices
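On the host side, attaching a fabric target is a short sequence with nvme-cli; a minimal sketch, assuming an RDMA-capable fabric and a target already exported at 172.20.1.1 (the subsystem NQN is illustrative):

    # Load the NVMe-oF RDMA host transport
    modprobe nvme-rdma

    # Ask the target which subsystems it exports
    nvme discover -t rdma -a 172.20.1.1 -s 4420

    # Connect; the remote namespace then appears as a local /dev/nvmeXnY
    nvme connect -t rdma -a 172.20.1.1 -s 4420 -n nqn.2018-06.example:nvme-target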
NVMe-oF Test System 1 (InfiniBand)
- Fabric: Mellanox ConnectX-5 EDR InfiniBand, 100 Gb/s
- Target server: R730, 2x Xeon E5-2690 v4 @ 2.60 GHz, 256 GiB DDR4 @ 2133 MHz; targets: 4 Intel AIC NVMe P3700 2 TB
- Host clients 1 & 2: R730, 2x Xeon E5-2690 v3 @ 2.60 GHz, 256 GiB DDR4 @ 2133 MHz
- Each host can mount one or more targets
- Software: RHEL 7.4 x86_64 (GA level), kernel version 3.10.0-693, native drivers
NVMe-oF Test System 2 (Omni-Path)
- Fabric: Intel Omni-Path, 100 Gb/s
- Target server: R730, 2x Xeon E5-2690 v4 @ 2.60 GHz, 256 GiB DDR4 @ 2133 MHz; targets: 4 Intel AIC NVMe P3700 2 TB
- Host clients 1 & 2: R730, 2x Xeon E5-2690 v3 @ 2.60 GHz, 256 GiB DDR4 @ 2133 MHz
- Each host can mount one or more targets
- Software: RHEL 7.4 x86_64 (GA level), kernel version 3.10.0-693, native drivers
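For reference, the target side of such a setup can be built with the upstream Linux kernel NVMe target via configfs; a minimal sketch (subsystem NQN, port number, and address are illustrative, and the permissive host policy is for a closed lab only):

    # Load the NVMe target core and its RDMA transport
    modprobe nvmet
    modprobe nvmet-rdma

    # Create a subsystem and allow any host to connect (lab setting only)
    cd /sys/kernel/config/nvmet
    mkdir subsystems/nqn.2018-06.example:nvme-target
    echo 1 > subsystems/nqn.2018-06.example:nvme-target/attr_allow_any_host

    # Back namespace 1 with one local P3700 device and enable it
    mkdir subsystems/nqn.2018-06.example:nvme-target/namespaces/1
    echo /dev/nvme0n1 > subsystems/nqn.2018-06.example:nvme-target/namespaces/1/device_path
    echo 1 > subsystems/nqn.2018-06.example:nvme-target/namespaces/1/enable

    # Expose the subsystem on an RDMA port
    mkdir ports/1
    echo rdma       > ports/1/addr_trtype
    echo ipv4       > ports/1/addr_adrfam
    echo 172.20.1.1 > ports/1/addr_traddr
    echo 4420       > ports/1/addr_trsvcid
    ln -s /sys/kernel/config/nvmet/subsystems/nqn.2018-06.example:nvme-target ports/1/subsystems/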
Configuration Details of NVMe-oF
- 3 Dell PowerEdge R730: one used as (target) server, connected to two clients (hosts)
- Clients: dual Intel Xeon E5-2690 v3 @ 2.60 GHz; server: dual Intel Xeon E5-2690 v4 @ 2.60 GHz
- 256 GiB of DDR4 @ 2133 MHz per machine
- Omni-Path adapters installed in slot 4 (PCIe x16), connected to a switch
- RHEL 7.4 x86_64 (GA level), kernel version 3.10.0-693
- 4 Intel P3700 2 TB AIC adapters in server slots 1, 2, 3 & 5 (all PCIe x8)
- FIO 2.99 compiled on each machine with libaio support
- Direct I/O (no buffered I/O) was used to keep the RAM cache out of the measurements
- Ramp-up time was one hour; test time was limited to 300 seconds per data point
- Each write test at a given block size was followed by a matching read test
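A sketch of how one data point maps onto an FIO invocation under these rules (queue depth and device path are assumptions, not from the slide; the block size is varied per data point):

    # Sequential write at one block size: direct I/O, libaio,
    # one hour ramp, 300 s measured
    fio --name=nvmeof-write --filename=/dev/nvme1n1 \
        --rw=write --bs=1M --direct=1 --ioengine=libaio \
        --iodepth=32 --ramp_time=3600 --runtime=300 --time_based

    # The matching read test at the same block size
    fio --name=nvmeof-read --filename=/dev/nvme1n1 \
        --rw=read --bs=1M --direct=1 --ioengine=libaio \
        --iodepth=32 --ramp_time=3600 --runtime=300 --time_based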
Bandwidth Baseline
Command: ib_write_bw -F -R -a 172.20.1.1
[Chart: average bandwidth (MB/s, 0 to 12,000) versus message size (2 B to 8 M) for InfiniBand EDR (100 Gb/s) and Omni-Path (100 Gb/s)]
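ib_write_bw (from the perftest suite) runs as a server/client pair; the sweep behind this baseline looks roughly like this, with -a covering all message sizes, -F suppressing the CPU-frequency warning, and -R connecting via rdma_cm:

    # On the target server: wait for the client
    ib_write_bw -F -R -a

    # On the host client: sweep message sizes against the server
    ib_write_bw -F -R -a 172.20.1.1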
Going to the Next Level with Hardware: ME4 RAID Array
- Form factors: 2U12, 2U24, 5U84; expansion (DAE): 2U12, 2U24, 5U84
- Back-end interface: 12G SAS
- Front-end interfaces: 16 Gb FC (4 ports per controller), 10G iSCSI (4 ports per controller, SFP+ or BaseT), 12G SAS (4 ports per controller)
- Read IOPS: 320K (4x over MD3)
- Sequential reads: 7,000 MB/s; sequential writes: 5,500 MB/s (2.6x over MD3)
- Total system drive count: 336 (1.75x over MD3); raw capacity: 4 PB
- Single or dual controller
- Models: ME4012 (12-drive RBOD), ME4024 (24-drive RBOD), ME4084 (84-drive RBOD); ME412/ME424/ME484 expansion units
- Note: ME expansion units (DAE) cannot be connected to a server directly (not a server-attached JBOD)
Parallel File System: Lustre with ME4
[Diagram: Lustre solution built on PowerVault ME4 arrays]
- Dell PowerVault ME4024
- Dell PowerVault ME4024 (optional, for DNE)
- Dell PowerVault ME4084
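From a compute node, the resulting file system is consumed like any Lustre client mount; a minimal sketch (the MGS hostname and the fsname "scratch" are illustrative):

    # Mount the Lustre file system over the o2ib (InfiniBand) LNet network
    mount -t lustre mgs01@o2ib:/scratch /mnt/scratch

    # Stripe a job directory across 4 OSTs with a 1 MiB stripe size
    lfs setstripe -c 4 -S 1M /mnt/scratch/myjob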
Dell EMC Isilon Scale-Out NAS
- High performance to archive: F-Series (tier 1), H-Series (tier 2), A-Series (tier 3), CloudPools
- A few TBs to 100 PB in a single file system; reduced cost/TB
- Up to 1.5 TB/s aggregate read bandwidth
- Policy-based automatic tiering across flash, SAS, and SATA
- Native multi-protocol access: NFS, CIFS, HDFS, Swift
- Enterprise features for data management, long-term archive, and compliance
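Because access is via standard protocols, no special client is needed; a minimal sketch of an NFS mount (hostname and export path are illustrative; /ifs is the OneFS root file system):

    # Mount an Isilon export for pre/post processing or desktop access
    mount -t nfs isilon.example.com:/ifs/projects /mnt/projects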
ECS Scale-Out Object Store
- Modern archive: universal archive for existing primary storage; replaces tape; no changes to applications or operations; archive stays online for analytics workflows
- Cloud native: enables new healthcare business operations; cloud economics and ease of use on-premises; lower TCO compared to public cloud providers
- Scalability: deployable in clusters for petabyte and exabyte scalability
- Data protection: geo-distributed data protection with no single point of failure; globally accessible; one namespace; multi-tenant architecture
- Accelerate cloud-native applications: future healthcare IoT applications on private infrastructure
- Operational flexibility: multi-protocol support for legacy & modern applications
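ECS speaks an S3-compatible API among other protocols, so existing S3 tooling can archive data to it unchanged; a minimal sketch with the AWS CLI (endpoint URL, port, and bucket name are illustrative):

    # Create a bucket on the ECS endpoint and push an archive object
    aws s3 --endpoint-url https://ecs.example.com:9021 mb s3://archive
    aws s3 --endpoint-url https://ecs.example.com:9021 cp results.tar.gz s3://archive/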
Emerging Technologies for Persistent Storage
- Higher scale: 100s of PB for file, exabytes for object
- High-performance object storage
- Removing protocol overheads
- Gen 7