HPC Innovation Lab Update. Dell EMC HPC Community Meeting 3/28/2017


Dell EMC HPC Innovation Lab charter
Design, develop and integrate HPC systems
- Flexible reference architectures
- Systems tuned for research computing, manufacturing, life sciences, oil and gas, etc.
Act as the focal point for joint R&D activities
- Technology collaboration with partners for joint innovation
- Research coordination with DSC, COEs and customers
- New investment: more SMEs, a huge innovation eco-system
HPC Innovation Lab
- Technical briefings, tours, remote access
- Conduct application performance studies and develop best practices: white papers, blogs, presentations (www.hpcatdell.com)
- Prototype and evaluate advanced technologies: HPC+Cloud, HPC+Big Data; processors, accelerators, file systems, software, etc.

Focus areas
- HPC software stack: Bright Cluster Manager, OpenHPC; integration of all software components
- Compute performance and tuning: application focus (BIOS, memory, interconnect); accelerators and co-processors
- Interconnect performance and tuning
- Storage solutions: NSS, IEEL
- Vertical solutions: genomics research, CFD/manufacturing
- Proof-of-concept studies: OpenStack for HPC, Hadoop on Lustre, etc.
Collateral at: https://esg.one.dell.com/sites/solutions/esc/hpc/whiteblogs/sitepages/home.aspx

World-class infrastructure
13K sq. ft. facility with 1300+ servers and ~10PB of storage dedicated to HPC research, development and innovation in collaboration with the Dell HPC community.
Zenith
- Top500 system based on the Intel Scalable Systems Framework (OPA, KNL, Xeon, OpenHPC)
- 384 nodes with dual E5-2697 v4 processors, a non-blocking OPA fabric and 451 TFlops sustained performance (#372 on the Top500)
- Will grow to 512 nodes
Rattler
- Research/development system in collaboration with Mellanox and NVIDIA
- 84 nodes with EDR InfiniBand and E5-2697 v4 processors
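As a rough sanity check on the Zenith figures, the sustained HPL number can be compared against an estimated peak. The sketch below is a back-of-the-envelope Python calculation; the 2.3 GHz clock is an assumption (the base frequency of the E5-2697 v4; the sustained AVX frequency, which is lower, is not given on the slide).

```python
# Back-of-the-envelope Rpeak estimate for Zenith (dual E5-2697 v4 nodes).
CORES_PER_SOCKET = 18     # E5-2697 v4
SOCKETS_PER_NODE = 2
CLOCK_GHZ = 2.3           # assumed base clock; AVX clocks run lower
FLOPS_PER_CYCLE = 16      # AVX2: 2 FMA units x 4 doubles x 2 flops each
NODES = 384

rpeak_node = CORES_PER_SOCKET * SOCKETS_PER_NODE * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000
rpeak_system = rpeak_node * NODES
print(f"Rpeak per node:   {rpeak_node:.2f} TFLOPS")         # ~1.32
print(f"Rpeak, 384 nodes: {rpeak_system:.0f} TFLOPS")       # ~509
print(f"Implied HPL efficiency: {451 / rpeak_system:.0%}")  # ~89%
```

Under these assumptions the 451 TFlops sustained figure implies roughly 89% HPL efficiency, plausible for a non-blocking fabric.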

C6320P

PowerEdge C6320p: delivering balanced high performance computing
- Intel Xeon Phi processor: up to 72 out-of-order cores, energy efficient
- Embedded Omni-Path and InfiniBand fabric options: a choice of low-latency I/O for applications with the most demanding I/O requirements
- 6 DIMMs of memory (384GB max.): local memory enables easy scaling across scale-out computing infrastructures
- 6 internal drives (12TB max.): local storage capacity permits faster access to data for better performance and faster results
- 1 PCIe Gen3 x16 (low profile) / 1 x4 mezzanine: permits a flexible range of usage

WRF: BDW vs. KNL in different memory modes
[Chart: average time step (lower is better) for the Intel conus12k and public conus12k datasets on dual-socket Broadwell (2697 v4, 2x18c) and on KNL (60C2T) in Quad and All2All cluster modes with MCDRAM in flat and cache modes, with relative performance per configuration]
- KNL is 54% better than 2S BDW for the new dataset, and 34% better on the current conus 12km benchmark.
- BDW to KNL: 3.3x more cores in use; without hyper-threading, 67% more cores in use.
- Quad cluster mode with MCDRAM in flat mode is the best performer; Quad with cache mode is within 1%, since the dataset fits in MCDRAM.
- All2All is within 4% of Quad.
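The core-count claims follow from simple arithmetic; a minimal check in Python, using the 60-core, 2-threads-per-core KNL configuration and the 36 Broadwell cores per node stated on the slide:

```python
# Verify the BDW-to-KNL core-ratio claims from the WRF slide.
bdw_cores = 2 * 18            # dual E5-2697 v4, 18 cores per socket
knl_cores = 60                # KNL run in 60C2T mode
knl_threads = knl_cores * 2   # 2 hardware threads per core

print(f"With HT:    {knl_threads / bdw_cores:.1f}x more cores in use")   # ~3.3x
print(f"Without HT: {knl_cores / bdw_cores - 1:.0%} more cores in use")  # ~67%
```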

Storage with IEEL

Dell Storage for HPC with Intel EE for Lustre solution
Turn-key solution designed for high-speed scratch storage.
Solution benefits and Dell differentiation:
- Parallel scalable file system based on Intel EE for Lustre software, with a single file-system namespace scalable to high capacities and performance
- Best practices developed by Dell HPC Engineering provide optimal performance on Dell hardware; tests yield peaks of roughly 15GB/s write and 17GB/s read per building block
- Lustre Distributed Namespace (DNE) allows Lustre sub-directories to be distributed across multiple MDTs, increasing metadata capacity and performance
- Solution design for Big Data workloads using the Intel Hadoop Adapter for Lustre (HAL)
- Data sharing with other file systems through an optional NFS/CIFS gateway
- Dell Networking 10/40GbE, InfiniBand or Omni-Path
Reference hardware: Intel Manager for Lustre on a PowerEdge R630; an active/passive MDS pair of PowerEdge R730s attached to a PowerVault MD3420 (with an optional second MD3420 for DNE); an active/active OSS pair of PowerEdge R730s attached to a PowerVault MD3460; all with 12Gbps SAS failover connections.
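With DNE phase 1, metadata scales by placing different sub-directories on different MDTs, which on a Lustre client is done with `lfs mkdir -i <mdt_index>`. Below is a minimal sketch of spreading scratch directories round-robin across MDTs; the paths, user names and MDT count are hypothetical.

```python
# Distribute top-level scratch directories across Lustre MDTs (DNE phase 1).
# Assumes a Lustre client with the 'lfs' utility; paths and the MDT count
# are illustrative, not taken from the slides.
import subprocess

MDT_COUNT = 2                 # e.g. one MDT per MD3420 enclosure
users = ["alice", "bob", "carol", "dave"]

for i, user in enumerate(users):
    mdt = i % MDT_COUNT       # round-robin placement across MDTs
    path = f"/lustre/scratch/{user}"
    # 'lfs mkdir -i N DIR' creates DIR with its metadata on MDT index N.
    subprocess.run(["lfs", "mkdir", "-i", str(mdt), path], check=True)
    print(f"{path} -> MDT{mdt}")
```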

IEEL 3.0 + OPA

ML/DL

HPL performance on P100-PCIe
[Chart: HPL performance scaling on P100-PCIe, TFLOPS and efficiency (%) per configuration]

Configuration          TFLOPS   Efficiency
CPU (2x E5-2690 v4)      1.1       93%
1 P100                   3.9       82%
2 P100                   7.9       86%
4 P100                  15.5       84%
8 P100                  29.4       81%
12 P100                 41.8       81%
16 P100                 57.8       85%

- HPL runs in double precision.
- 1 P100 node = 14.1 CPU nodes (2x E5-2690 v4).
- Scales very well across nodes: 16 P100s across 4 nodes = 14.9x a single P100 card.
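The headline ratios on this slide follow from the measured TFLOPS; a small Python check (the 4-GPU-per-node grouping is taken from the slide, and small differences from the quoted 14.9x and 14.1 figures come from rounding in the charted values):

```python
# Derive the headline ratios from the measured HPL numbers on this slide.
tflops = {"cpu_node": 1.1, "p100_x1": 3.9, "p100_x16": 57.8}

# 16 P100s spread across 4 nodes vs. a single P100 card:
print(f"16 P100 vs 1 P100: {tflops['p100_x16'] / tflops['p100_x1']:.1f}x")  # ~14.8x

# One 4-GPU P100 node vs. CPU-only nodes (assumes near-linear in-node scaling):
p100_node = 4 * tflops["p100_x1"]
print(f"1 P100 node ~= {p100_node / tflops['cpu_node']:.1f} CPU nodes")     # ~14.2
```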

NV-Caffe training on a single P100-PCIe node
[Chart: training speed of GoogLeNet in NV-Caffe, images/sec (higher is better), with speedup relative to 1 P100]

Configuration        Images/sec   Speedup vs. 1 P100
2x E5-2690 v4 (CPU)      89             --
1 P100                  476            1.0x
2 P100                  905            1.9x
4 P100                 1782            3.7x

- Dataset: ImageNet 2012 (ILSVRC2012) with 1.2M training images, 50K validation images and 1000 categories; tested with the GoogLeNet model.
- 1 P100 node = 20 CPU nodes.
- 4 P100 = 3.75x 1 P100; scales very well with multiple GPUs.
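Both headline claims follow directly from the images/sec figures; a minimal check:

```python
# Derive the NV-Caffe speedup claims from the measured images/sec.
imgs_per_sec = {"cpu_node": 89, "p100_x1": 476, "p100_x4": 1782}

print(f"4 P100 vs 1 P100: "
      f"{imgs_per_sec['p100_x4'] / imgs_per_sec['p100_x1']:.2f}x")    # ~3.74x
print(f"4-GPU P100 node vs CPU node: "
      f"{imgs_per_sec['p100_x4'] / imgs_per_sec['cpu_node']:.1f}x")   # ~20x
```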

HPC System for Manufacturing

Dell EMC HPC System for Manufacturing
- ISV applications
- Dell ProSupport, ProSupport Plus and deployment services
- Bright Cluster Manager
Key takeaways:
- A comprehensive offering that includes compute, storage, networking, unified management, monitoring and services
- Choice and flexibility at every level: HPC storage offerings, HPC networking offerings
- Building blocks for explicit solvers, implicit solvers, remote visualization and management

CD-adapco STAR-CCM+: Explicit BB scaling (1/2)
[Two charts: performance relative to 32 cores (1 node) at 32 (1), 64 (2), 128 (4), 192 (6) and 256 (8) cores (nodes), for the Civil_Trim_20M, HlMach10Sou, KcsWithPhysics, LeMans_Poly_17M, EglinStoreSeparation, LeMans_100M, Reactor_9M, TurboCharger and VtmUhoodFanHeatx68m datasets]
- Scaling for all datasets is as expected; scaling for most datasets is very good, with linear scaling up to 8 nodes.
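Relative-performance curves like these are typically computed as the ratio of single-node elapsed time to N-node elapsed time; a short sketch with hypothetical timings showing how the relative performance and parallel efficiency for one dataset would be derived:

```python
# Relative performance and parallel efficiency from solver elapsed times.
# The timings below are hypothetical, for illustration only.
node_times = {1: 1000.0, 2: 510.0, 4: 262.0, 6: 180.0, 8: 140.0}  # seconds

base = node_times[1]
for nodes, t in sorted(node_times.items()):
    rel = base / t          # performance relative to 32 cores (1 node)
    eff = rel / nodes       # efficiency vs. ideal linear scaling
    print(f"{nodes} node(s) ({32 * nodes:3d} cores): {rel:.2f}x, efficiency {eff:.0%}")
```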

HPC System for Life Sciences

Turn-key solutions designed for genomic computing

Cryo-EM ROME SML: Xeon vs. KNL over OPA
[Three charts: compute time (lower is better) vs. number of servers (1, 2, 4, 8, 10, 12, 16) for the DATA8, DATA6 and RING11_ALL datasets, comparing BDW and KNL over OPA; KNL-over-BDW performance ratios are ~2.9-3.1 for DATA8, ~2.6-2.9 for DATA6 and ~3.3-3.4 for RING11_ALL]
- Xeon: 2697 v4, 18-core CPU (36 cores per server).
- KNL 7230 is ~3x better than Xeon for all three datasets.
- Both architectures scale well, but KNL starts off better and stays better.

Dell EMC Isilon
Isilon X410: the results came from a 3-node configuration (clusters scale from 3 to 144 nodes, up to 20.7 PB of capacity).
SmartConnect: maximizes performance by keeping client connections balanced across the entire storage cluster.

Tying it together - Access to the lab - White papers and Blogs

How to engage the HPC Innovation Lab
1) Work with your Dell account team.
2) Submit a request using the tool below, including as much detail as possible: https://esg.one.dell.com/sites/solutions/esc/hpc/request/_layouts/15/start.aspx
   - Complete the Dell HPC Innovation Lab Evaluation Program Agreement.
   - Indicate whether the customer/SC will complete the benchmarking remotely or an HPC Engineering team member is being requested to assist.
3) Expect a response within 2 days on availability and scheduling.
4) The HPC Innovation Lab is located on the Dell Parmer Campus in Austin, Texas.
Resources:
https://esg.one.dell.com/sites/solutions/esc/hpc/whiteblogs/sitepages/home.aspx
http://www.dell.com/hpc
http://www.hpcatdell.com

Team publications
Blogs: www.hpcatdell.com
White papers: www.dell.com