Performance Boost for Seismic Processing with the Right IT Infrastructure
Vsevolod Shabad, CEO and founder
vshabad@netproject.ru, +7 (985) 765-76-03
NetProject at a glance
System integrator with a strong oil & gas focus: we build and expand IT infrastructure for geological and geophysical (G&G) applications
How to balance the HPC cluster correctly?
Application performance must be limited only by the licenses, at minimal overall cost:
- Paradigm ES 360, GeoDepth, Echos
- Schlumberger Omega
- Landmark SeisSpace ProMAX
- CGG Geocluster
Cluster components utilization
Cluster performance must be limited only by the CPU!
[Chart: utilization of CPU, RAM, network, local HDD, and shared file system in three scenarios: optimal (limited only by CPU), suboptimal (nothing limits: excessive resources), suboptimal (limited by network but not CPU)]
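The balance rule on this slide can be sketched as a small diagnostic function. This is an illustrative sketch only; the 90% threshold is an assumption, not a figure from the slide.

```python
# Sketch of the slide's balance rule: a cluster is "right-balanced" when the
# CPU is the most-utilized component. Any other component at the top is a
# bottleneck; no component near saturation means over-provisioning.
# The 0.9 saturation threshold is an assumed value for illustration.

def diagnose(utilization):
    """utilization: dict of component name -> fraction of capacity in use."""
    top = max(utilization, key=utilization.get)
    if utilization[top] < 0.9:
        return "over-provisioned"
    return "optimal" if top == "CPU" else f"bottleneck: {top}"

print(diagnose({"CPU": 0.95, "RAM": 0.6, "Network": 0.4}))   # optimal
print(diagnose({"CPU": 0.5, "RAM": 0.4, "Network": 0.92}))   # bottleneck: Network
```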
Key ideas for optimization
Massive reduction of:
- CPU idle time on compute nodes
- CPU load from auxiliary tasks
- operating costs
[Chart: typical HPC CPU load during a workday (real case)]
Candidates for optimization
- Compute nodes
- File nodes and storage
- Management nodes and resource schedulers
- Cluster interconnect
- Backup and restore subsystem
- Monitoring and management subsystem
- Cooling subsystem
Compute nodes optimization
- CPU & RAM: optimal choice
- CPU offload with RDMA
- CPU offload with GPU
- Right form factor of servers
CPU & RAM: optimal choice sample
Typical compute node for Paradigm ES 360 (HPE ProLiant XL230a Gen9; 2x 1.8 TB SAS 10K RPM HDD; 1x InfiniBand FDR, 4x GigE)

| Part                       | Haswell-EP (Xeon E5-2600 v3)           | Broadwell-EP (Xeon E5-2600 v4)         |
| CPU                        | 2x Xeon E5-2650 v3 (20 cores, 2.3 GHz) | 2x Xeon E5-2680 v4 (28 cores, 2.4 GHz) |
| RAM                        | 160 GB (8 GB/core) PC4-2133            | 224 GB (8 GB/core) PC4-2400            |
| List price, USD            | 17 456                                 | 18 366                                 |
| Performance, TFLOPS        | 0.74                                   | 1.08                                   |
| List price per core, USD   | 872.8                                  | 655.9 (-24.8%)                         |
| List price per TFLOPS, USD | 23 713.4                               | 17 081.5 (-28.0%)                      |
CPU & RAM: non-optimal choice sample
Typical compute node for Paradigm ES 360 (HPE ProLiant XL230a Gen9; 2x 1.8 TB SAS 10K RPM HDD; 1x InfiniBand FDR, 4x GigE)

| Part                       | Broadwell-EP (E5-2680 v4)              | Broadwell-EP (E5-2699 v4)              |
| CPU                        | 2x Xeon E5-2680 v4 (28 cores, 2.4 GHz) | 2x Xeon E5-2699 v4 (44 cores, 2.2 GHz) |
| RAM                        | 224 GB (8 GB/core) PC4-2400            | 352 GB (8 GB/core) PC4-2400            |
| List price, USD            | 18 366                                 | 36 396                                 |
| Performance, TFLOPS        | 1.08                                   | 1.55                                   |
| List price per core, USD   | 655.9                                  | 827.2 (+26.1%)                         |
| List price per TFLOPS, USD | 17 081.5                               | 23 499.5 (+37.6%)                      |
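The price-efficiency figures on these two slides are simple arithmetic and can be checked directly. In this sketch the list prices and core counts come from the slides; the $/TFLOPS values are the slides' own (rounded) figures, since the exact TFLOPS inputs are not given.

```python
# Check of the slide's price-efficiency comparison between the Haswell-EP
# and the mid-range Broadwell-EP node. All inputs are transcribed from the
# slide tables; nothing here is independent vendor data.

nodes = {
    "2x E5-2650 v3": {"price": 17456, "cores": 20, "usd_per_tflops": 23713.4},
    "2x E5-2680 v4": {"price": 18366, "cores": 28, "usd_per_tflops": 17081.5},
}

def usd_per_core(node):
    return node["price"] / node["cores"]

old, new = nodes["2x E5-2650 v3"], nodes["2x E5-2680 v4"]
tflops_delta = (new["usd_per_tflops"] / old["usd_per_tflops"] - 1) * 100

print(f"$/core: {usd_per_core(old):.1f} vs {usd_per_core(new):.1f}")  # 872.8 vs 655.9
print(f"$/TFLOPS change: {tflops_delta:+.1f}%")                       # -28.0%
```

The same pattern applied to the top-bin E5-2699 v4 node shows why it is the non-optimal choice: doubling the list price buys only ~44% more TFLOPS.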
Some non-obvious restrictions
HPE ProLiant XL230a Gen9:
- cannot work with a single CPU: two identical processors are required
- cannot use 32 GB RDIMMs for DDR4 RAM: only 32 GB DDR4 LRDIMMs are allowed
Fujitsu CX400 M1:
- cannot use two InfiniBand HBAs with liquid cooling: only one InfiniBand HBA is allowed
CPU offload with RDMA
Direct buffer-to-buffer data movement, bypassing the CPU and the operating system
RDMA usage in HPC:
- between compute nodes
- between a compute node and a file node (storage)
Sample of RDMA efficiency
Source: http://cto.vmware.com/wpcontent/uploads/2012/09/rdmaonvsphere.pdf
RDMA must be supported by the apps!
Seismic processing:
- Paradigm ES 360, GeoDepth, Echos (since version 15.5)
- Schlumberger Omega
- Landmark ProMAX with MPI
File systems:
- IBM Spectrum Scale (formerly GPFS)
- Lustre
- BeeGFS (formerly FhGFS)
CPU offload with GPU
- Paradigm: Echos RTM
- Schlumberger: Omega RTM, Kirchhoff
- Tsunami: Tsunami RTM
Full list: http://www.nvidia.co.uk/content/emeai/pdf/teslagpu-applications/gpu-apps-catalog-eu.pdf
Sample of GPU efficiency
Statoil custom seismic processing application on an NVIDIA GPU (GeForce GTX 280)
Source: http://www.idi.ntnu.no/~elster/master-studs/owej/owe-johansen-masterntnu.pdf
Right form factor of servers
- Rack-optimized 1U: HPE ProLiant DL360 Gen9, ...
- Rack-optimized 2U: Fujitsu RX2540 M2, ...
- High-density servers: Lenovo NeXtScale M5, ...
- Blade servers: DELL M1000e, ...
Typical high-density chassis

| Chassis                   | Number of nodes | Chassis height, RU |
| Lenovo NeXtScale n1200    | 12              | 6                  |
| HPE Apollo 2000           | 4               | 2                  |
| HPE Apollo a6000          | 10              | 5 (+1.5)           |
| Fujitsu Primergy CX400 S2 | 4               | 2                  |
| Huawei X6800              | 8               | 4                  |
| DELL PowerEdge FX2        | 8               | 2                  |
| DELL PowerEdge C6320      | 4               | 2                  |
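A nodes-per-RU figure makes the chassis easier to compare than raw node counts. This sketch transcribes the slide's numbers (ignoring the a6000's extra +1.5 RU, presumably a power shelf) and ranks the chassis by density.

```python
# Density comparison (nodes per rack unit) for the chassis on the slide.
# Values transcribed from the slide; the Apollo a6000's "+1.5" RU is ignored.

chassis = {
    "Lenovo NeXtScale n1200": (12, 6),
    "HPE Apollo 2000": (4, 2),
    "HPE Apollo a6000": (10, 5),
    "Fujitsu Primergy CX400 S2": (4, 2),
    "Huawei X6800": (8, 4),
    "DELL PowerEdge FX2": (8, 2),
    "DELL PowerEdge C6320": (4, 2),
}

density = {name: nodes / ru for name, (nodes, ru) in chassis.items()}
for name, d in sorted(density.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {d:.1f} nodes/RU")
```

Most options land at 2.0 nodes/RU; by these figures the PowerEdge FX2 stands out at 4.0 (note it packs half-width sleds, so per-node capabilities differ).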
File nodes and storage
- Right choice of file system with RDMA and QoS
- High-density block storage arrays
- Small random I/O (IOPS) caching with SSD
- File system overhead reduction
- Backup to Redirect-On-Write snapshots
- Transparent data migration to tapes
Right choice of file system

| Feature                            | IBM Spectrum Scale (GPFS) | Lustre (+ZFS)              | BeeGFS | Panasas PanFS | Scale-out NAS (EMC Isilon, Huawei 9000, NetApp FAS/NFS) |
| RDMA support at client side        | Yes                       | Yes                        | Yes    | No            | No                                                      |
| Redirect-On-Write snapshot support | Yes                       | Yes (with ZFS)             | No     | Yes           | Yes                                                     |
| Transparent migration to tape      | Yes                       | No                         | No     | No            | No                                                      |
| Commercial support                 | Yes                       | Yes (not for ZFS-on-Linux) | Yes    | Yes           | Yes                                                     |
| Installation difficulty            | High                      | High                       | High   | Low           | Low                                                     |
| Single-thread performance          | Moderate                  | Low                        | High   | Low           | Low                                                     |
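The slide's comparison amounts to matching required features against each file system's capabilities. Below is a toy filter over those capabilities; the feature flags are my transcription of the slide's Yes/No cells (e.g. Lustre's snapshots via ZFS), and the three NAS products are grouped as one entry as on the slide.

```python
# Toy feature filter over the file-system comparison on the slide.
# Feature sets are transcribed from the slide's table, not vendor docs.

features = {
    "IBM Spectrum Scale": {"rdma", "row_snapshots", "tape_migration"},
    "Lustre (+ZFS)":      {"rdma", "row_snapshots"},  # snapshots only with ZFS
    "BeeGFS":             {"rdma"},
    "Panasas PanFS":      {"row_snapshots"},
    "Scale-out NAS":      {"row_snapshots"},
}

def candidates(required):
    """Return file systems whose feature set covers all required features."""
    return [fs for fs, have in features.items() if required <= have]

print(candidates({"rdma", "row_snapshots"}))
# ['IBM Spectrum Scale', 'Lustre (+ZFS)']
```

With all three must-haves (RDMA clients, Redirect-On-Write snapshots, transparent tape migration), only Spectrum Scale survives, which is consistent with the later slides.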
File system overhead sample
Source: http://wiki.lustre.org/images/b/b2/lustre_on_zfs-ricardo.pdf
Right choice of block storage array
Requirements sample (moderate cluster for SLB Omega):
- I/O throughput: 3 GB/s sequential
- usable capacity: 300 TB
Solutions:
- EMC VNX5400: 174 spindles, 39 RU
- NetApp E5600: 69 spindles, 12 RU
Front-end I/O port options: SAS, Fibre Channel, 10GbE, InfiniBand
The NetApp E5600 disk array supports iSER connections over an InfiniBand fabric (without FC & SAS)
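A spindle count like the ones above comes from taking the larger of two estimates: disks needed for throughput and disks needed for capacity. The sketch below illustrates that sizing logic; the per-disk throughput, capacity, and RAID efficiency are assumed figures for illustration, not E5600 or VNX data.

```python
import math

# Sketch of spindle-count sizing for a sequential-I/O workload.
# Per-disk figures below are assumptions, not vendor specifications.

def spindles_needed(throughput_gbps, usable_tb,
                    disk_mbps=120.0,        # assumed sustained MB/s per NL-SAS disk
                    disk_tb=6.0,            # assumed raw capacity per disk, TB
                    raid_efficiency=0.75):  # assumed usable/raw ratio (RAID + spares)
    by_throughput = math.ceil(throughput_gbps * 1000 / disk_mbps)
    by_capacity = math.ceil(usable_tb / (disk_tb * raid_efficiency))
    return max(by_throughput, by_capacity)

# The slide's requirement: 3 GB/s sequential, 300 TB usable.
print(spindles_needed(3, 300))  # -> 67 (capacity-driven under these assumptions)
```

Under these assumed disk parameters the requirement is capacity-driven and lands near the slide's 69-spindle E5600 configuration; an array built from small fast disks instead ends up spindle-heavy, as the 174-spindle VNX5400 example shows.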
Small random I/O (IOPS) caching with SSD
Most applicable for:
- metadata
- trace headers
File system support:
- IBM Spectrum Scale: Highly Available Write Cache (HAWC), Local Read-Only Cache (LROC)
- Panasas ActiveStor: built-in capabilities
- Lustre: none
- EMC Isilon: SmartFlash
Backup to Redirect-On-Write snapshots
Traditional backup to tapes: 500 TB of data, verification after write, four LTO-7 (300 MB/s) drives:
500 000 000 MB / (4 * 300 MB/s * 2 * 3600 s/h) = 58 hours under ideal conditions
Innovative backup (snapshots): 10 minutes regardless of data volume
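The tape-backup estimate is worth reproducing explicitly. The factor of 2 in the denominator is taken verbatim from the slide's formula (it reads like a 2:1 effective-rate factor, but the slide does not say).

```python
# Reproduces the slide's tape-backup time estimate for 500 TB on four
# LTO-7 drives. The "*2" factor is copied verbatim from the slide's formula.

data_mb = 500_000_000            # 500 TB expressed in MB
drives = 4
native_mbps = 300                # LTO-7 native transfer rate, MB/s
factor = 2                       # as in the slide's formula

hours = data_mb / (drives * native_mbps * factor * 3600)
print(f"{hours:.1f} hours")  # -> 57.9 hours, i.e. ~58 under ideal conditions
```

Even this best-case figure is two and a half days per full backup, which is the argument for snapshot-based protection at roughly constant time.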
Transparent data migration to tapes
- Middleware: IBM Spectrum Archive (LTFS)
- Drives: IBM TS1150 (360 MB/s, 10 TB per cartridge)
Resource schedulers
- Cluster usage alignment and optimization
- Energy-aware scheduling
Top-3 systems:
- IBM Spectrum LSF (Platform LSF)
- Altair PBS Pro
- Adaptive Computing MOAB HPC Suite
Cluster interconnect
Key technology: RDMA
Two options:
- InfiniBand FDR/EDR
- Ethernet 40G/56G/100G (with RoCE)
Backup and restore subsystem
- Protection from logical data damage: with Redirect-On-Write snapshots
- Protection from physical data damage: not necessary, given a dedicated seismic data archive on LTO-7 tapes
Snapshot technology comparison
- Redirect-On-Write: NetApp FAS3240
- Copy-On-Write: IBM Storwize V7000
Source: NetProject's comparative testing for JSC NOVATEK (2012)
Monitoring and management
- Dedicated Gigabit Ethernet fabric
Management software:
- Altair PBS Pro + HPE CMU
- IBM Platform Cluster Manager
Cooling subsystem
Air cooling:
- applicable for any equipment (up to 15 kW/rack)
- temperature range can be expanded
Liquid cooling:
- requires equipment adaptation or a liquid-cooled rack door
- enables high data center density
Temperature range expansion
4% operational savings on cooling for every 1 °C increase in operating temperature (Intel, IDC, Gartner)
Incompatible:
- with NVIDIA GRID
- with CPUs > 120 W
The ATD module price for Fujitsu Primergy is only 26 USD!
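The 4%-per-degree rule of thumb is easy to turn into an estimate of cumulative savings. The per-degree figure is the slide's; compounding the savings multiplicatively (rather than adding them linearly) is my assumption for this sketch.

```python
# Sketch of the slide's cooling rule of thumb: ~4% of cooling cost saved
# per 1 degC setpoint increase. Compounding model is an assumption.

def cooling_cost_ratio(delta_c, savings_per_degree=0.04):
    """Fraction of original cooling cost remaining after raising the setpoint."""
    return (1 - savings_per_degree) ** delta_c

for dt in (1, 3, 5):
    saved = 100 * (1 - cooling_cost_ratio(dt))
    print(f"+{dt} degC -> {saved:.1f}% of cooling cost saved")
```

A linear model would give slightly larger numbers for big setpoint changes; either way the savings are material only if the hardware (no NVIDIA GRID, CPUs at or below 120 W) tolerates the expanded range.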
Our product portfolio
- Servers: Lenovo, HPE, DELL, Fujitsu, Cisco, Huawei, Intel, Supermicro, Inspur, Sugon, ...
- Storage: NetApp, IBM, HDS, Panasas, Huawei, EMC, ...
- Network: Mellanox, Cisco, Lenovo, Huawei, Brocade
- Tape libraries: IBM, HPE, Quantum
- Resource scheduling tools: IBM Platform, Altair, Adaptive Computing
Company strengths
- Strong industry focus on G&G IT infrastructure
- Deep knowledge and experience:
  - industry specifics and major applications: better than most system integrators and IT infrastructure vendors
  - IT infrastructure products and technologies: better than most G&G software vendors
- Advanced project management methodology
- Structured knowledge base of past projects
- High engineering culture of staff
- Deep customer involvement in projects
Company weaknesses
- Strong industry focus on G&G IT infrastructure
- Low brand recognition
- Lack of overseas market experience
- Small number of employees
- Limited financial resources
Company background in brief
Technology experience:
- networking since 1996
- storage since 2004
- servers since 2006
- VDI since 2010
- HPC since 2014
Oil & gas industry focus since 2012 (metallurgy and banking focus in 1996-2014)
Big Data experience since 2007
ISO 9001:2008 certified since 2012
Partner reference about NetProject
"Lately we have been turning to NetProject more and more often for advice on hardware configurations, clusters, workstations, and infrastructure solutions. The company's team has taken part in designing data center architectures for a wide spectrum of Paradigm technologies and has proved itself excellently."
Serge Levin, Sales Director, Paradigm Geophysical (Russia)
A proven way to win. Join us!
http://www.netproject.ru/