Cyclone SGI Cloud Computing for HPC. Christian Tanasescu Vice President Software Engineering

Similar documents
The Effect of In-Network Computing-Capable Interconnects on the Scalability of CAE Simulations

FUSION1200 Scalable x86 SMP System

Scalable x86 SMP Server FUSION1200

MSC Nastran Explicit Nonlinear (SOL 700) on Advanced SGI Architectures

GPU ACCELERATED COMPUTING. 1 st AlsaCalcul GPU Challenge, 14-Jun-2016, Strasbourg Frédéric Parienté, Tesla Accelerated Computing, NVIDIA Corporation

ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation

Mellanox Technologies Maximize Cluster Performance and Productivity. Gilad Shainer, October, 2007

sgi Scalability Considerations for Compute Intensive Applications on Clusters Christian Tanasescu Daniel Thomas SGI Inc.

The State of Accelerated Applications. Michael Feldman

TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING

RECENT TRENDS IN GPU ARCHITECTURES. Perspectives of GPU computing in Science, 26 th Sept 2016

MD NASTRAN on Advanced SGI Architectures *

Determining Optimal MPI Process Placement for Large- Scale Meteorology Simulations with SGI MPIplace

Dell EMC Ready Bundle for HPC Digital Manufacturing Dassault Systѐmes Simulia Abaqus Performance

Optimizing LS-DYNA Productivity in Cluster Environments

TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING

Technologies and application performance. Marc Mendez-Bermond HPC Solutions Expert - Dell Technologies September 2017

AcuSolve Performance Benchmark and Profiling. October 2011

Building NVLink for Developers

2008 International ANSYS Conference

AcuSolve Performance Benchmark and Profiling. October 2011

HPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances)

ANSYS HPC Technology Leadership

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA

HPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances)

Stockholm Brain Institute Blue Gene/L

Arm in HPC. Toshinori Kujiraoka Sales Manager, APAC HPC Tools Arm Arm Limited

HPC Solution. Technology for a New Era in Computing

ANSYS HPC. Technology Leadership. Barbara Hutchings ANSYS, Inc. September 20, 2011

The Road to ExaScale. Advances in High-Performance Interconnect Infrastructure. September 2011

SGI Overview. HPC User Forum Dearborn, Michigan September 17 th, 2012

Leveraging LS-DYNA Explicit, Implicit, Hybrid Technologies with SGI hardware and d3view Web Portal software

Headline in Arial Bold 30pt. SGI Altix XE Server ANSYS Microsoft Windows Compute Cluster Server 2003

MPI Optimizations via MXM and FCA for Maximum Performance on LS-DYNA

Dell HPC System for Manufacturing System Architecture and Application Performance

Cray XC Scalability and the Aries Network Tony Ford

Manufacturing Bringing New Levels of Performance to CAE Applications

Speedup Altair RADIOSS Solvers Using NVIDIA GPU

Cray events. ! Cray User Group (CUG): ! Cray Technical Workshop Europe:

Accelerating High Performance Computing.

Practical Scientific Computing

MSC NASTRAN EXPLICIT NONLINEAR (SOL 700) ON ADVANCED SGI ARCHITECTURES. Authors. Summary. Tech Guide

THE HIGH-END VIRTUALIZATION COMPANY SERVER AGGREGATION CREATING THE POWER OF ONE

ACCELERATED COMPUTING: THE PATH FORWARD. Jensen Huang, Founder & CEO SC17 Nov. 13, 2017

Solving Large Complex Problems. Efficient and Smart Solutions for Large Models

White Paper. Technical Advances in the SGI. UV Architecture

LS-DYNA Scalability Analysis on Cray Supercomputers

LS-DYNA Productivity and Power-aware Simulations in Cluster Environments

The Fermi GPU and HPC Application Breakthroughs

HPC Architectures. Types of resource currently in use

2008 International ANSYS Conference

Interconnect Your Future

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

OpenFOAM Performance Testing and Profiling. October 2017

BlueGene/L. Computer Science, University of Warwick. Source: IBM

Trends in systems and how to get efficient performance

MSC NASTRAN EXPLICIT NONLINEAR (SOL 700) ON ADVANCED SGI ARCHITECTURES

Making Supercomputing More Available and Accessible Windows HPC Server 2008 R2 Beta 2 Microsoft High Performance Computing April, 2010

SuperMike-II Launch Workshop. System Overview and Allocations

Interconnect Your Future

ABySS Performance Benchmark and Profiling. May 2010

HOKUSAI System. Figure 0-1 System diagram

Birds of a Feather Presentation

Stan Posey, CAE Industry Development NVIDIA, Santa Clara, CA, USA

Maximize automotive simulation productivity with ANSYS HPC and NVIDIA GPUs

SGI. Technology Guide for Users of Abaqus. September, Authors Scott Shaw, Dr. Olivier Schreiber, Tony DeVarco TECH GUIDE

Preparing GPU-Accelerated Applications for the Summit Supercomputer

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.

GPUs and the Future of Accelerated Computing Emerging Technology Conference 2014 University of Manchester

ANSYS Fluent 14 Performance Benchmark and Profiling. October 2012

Interconnect Your Future Enabling the Best Datacenter Return on Investment. TOP500 Supercomputers, November 2017

Hybrid KAUST Many Cores and OpenACC. Alain Clo - KAUST Research Computing Saber Feki KAUST Supercomputing Lab Florent Lebeau - CAPS

The Future of High Performance Interconnects

Dell EMC Ready Bundle for HPC Digital Manufacturing ANSYS Performance

The Cray CX1 puts massive power and flexibility right where you need it in your workgroup

InfiniBand Strengthens Leadership as the Interconnect Of Choice By Providing Best Return on Investment. TOP500 Supercomputers, June 2014

CESM (Community Earth System Model) Performance Benchmark and Profiling. August 2011

Interconnect Your Future

Cluster Network Products

STRATEGIES TO ACCELERATE VASP WITH GPUS USING OPENACC. Stefan Maintz, Dr. Markus Wetzstein

STAR-CCM+ Performance Benchmark and Profiling. July 2014

Single-Points of Performance

Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink

GROMACS Performance Benchmark and Profiling. August 2011

GROMACS Performance Benchmark and Profiling. September 2012

QLogic in HPC Vendor Update IDC HPC User Forum April 16, 2008 Jeff Broughton Sr. Director Engineering Host Solutions Group

Checklist for Selecting and Deploying Scalable Clusters with InfiniBand Fabrics

Future Routing Schemes in Petascale clusters

Introduction to High Performance Computing. Shaohao Chen Research Computing Services (RCS) Boston University

Altair OptiStruct 13.0 Performance Benchmark and Profiling. May 2015

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Solutions for Scalable HPC

Clustering Optimizations How to achieve optimal performance? Pak Lui

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA

Intel Accelerates Supercomputing. Press Briefing June 19, 2007

Considerations for LS-DYNA Workflow Efficiencies in an HPC Linux Environment

Mapping MPI+X Applications to Multi-GPU Architectures

FUJITSU PHI Turnkey Solution

NAMD Performance Benchmark and Profiling. January 2015

Transcription:

Cyclone SGI Cloud Computing for HPC Christian Tanasescu Vice President Software Engineering

Agenda Rationale for Cyclone SGI offering Role in SGI business model Cyclone service and usage models Partnerships Future Directions

Limitations of Existing Cloud Services for HPC Most cloud systems built on virtualized instances only Limited scalability of available servers: Scale out clusters only Lack of high speed networks Mostly GigE Lack of user control over node interconnect topology MPI latency, core distribution, contiguous memory Lack of available HPC software stacks Lack of technical application specific configurations 3

Benefit of HPC Cloud Computing 10000 Scalability (50% Efficiency) NUMBER OF USERS, APPLICATIONS Desktop Computing 1000 100 10 1 NUMBER OF USERS, APPLICATIONS Nastran Ansys Abaqus/Std. Pamcrash Ls-Dyna Abaqus/Expl. Fluent StarCCM+ Powerflow ANSYS CFX CFD++ FEKO HFSS Gaussian Gamess Amber CASTEP DMOL3 NAMD VASP BLAST FASTA ClustalW HMMER MM5 WRF HIRLAM IFS CCSM POP ProMAX RTM EPOS Eclipse VIP HHG (LRZ) G5D (JAEA) GCM (NASA) BQCD Entry-Level Cluster Computing Enabling wider use of HPC High-End Computing (HPC) Leading-Edge Computing OpenMP MPI Hybrid NUMBER OF PROCESSORS, MEMORY SIZE, JOB COMPLEXITY NUMBER OF PROCESSORS, MEMORY SIZE, JOB COMPLEXITY Source: Nimbis Services, 2010 Graphic adopted from OSC, Council on Competitiveness and the University of Southern California.

Completing the need Personal& Workgroup Traditional Data Center Modular Data Center Cloud Access& Compute Scale & Control Modular & Mobile On Demand Move freely between the environments

SGI Cyclone results on demand Cloud computing dedicated to technical computing Cyclone Offers Flexible Choice of Platform Scale up (Altix 4700 and Altix UV) and Scale out (Altix ICE and Altix XE) Hybrid systems with NVIDIA Tesla, ATI FireStream and Tilera accelerators Operating System (SLES, RHEL) Interconnect (NUMAlink, InfiniBand, GigE) Physicalization and virtualization Application Specific Cloud Application tuned software stacks Open source and close source applications Differentiators Significantly broader platform flexibility SGI Performance Suite for application acceleration Deep Application Engineering expertise

SGI Cyclone Usage Models Cloud Service for Customers SaaS and Iaas Bridge for overflow capacity Bridge for new system installation Hub for new applications and architectures Cloud for Software Development SGI Solutions Partner Program Internal Cloud Benchmarking Software development

SGI Cyclone Domain and Application Focus Domain Use Cases Applications Computational Biology (BIO) Computational Chemistry and Materials (CCM) Computational Fluid Dynamics (CFD) Genomics, proteomics, molecular modeling and drug discovery Nanotechnology, materials research (metal alloys, polymers, composites, ceramics and plastics) Automotive design, aerospace design, defense systems design and power generation design BLAST, FASTA, HMMER, ClustalW Gaussian, Amber, Gamess, Namd, Gromacs, LAMMPS, DL_POLY StarCCM+, OpenFOAM, Acusolve, NUMECA Computational Structural Mechanics (CSM) Computational Electromagnetis (CEM) Structural, heat transfer, fatigue and vibrational analysis for manufactured products Design and analysis of antennas, antenna placement, EMC (shielding, coupling.), RF components and bioelectromagnetic analysis LS DYNA FEKO Ontologies (ONT) Semantic Web, data mining OntoStudio, SemanticMiner Six Initial Domains: More to follow 19 Supported Applications: More to follow

SGI Cyclone System Architectures 1400 1200 Fluent Performacne: FL5L3 9,7M cells Cluster No. of jobs per day 1000 800 600 400 200 Altix ICE, XE IB Fabric Flat tree Hypercube Enhanced Hypercube single rail dual rail 0 0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 Nr. of cores IB Altix XE1200 Xeon 5160 3GHz IB Altix XE1200 Xeon 5160 3GHz GigE LS-Dyna Performance - Frontal Crash 1200 IB HCA IB HCA 1000 1091 1091 Elapsed Time (secs) 800 600 400 200 0 628 569 382 326 376 36% 16 32 64 128 256 Number of Cores MPT Single-rail, ICE 8200 MPT Dual-rail, ICE 8200 246 353 226 Platform interconnect choices determine the achieved application perforamance

SGI Cyclone System Architectures NUMAlink SMP Altix UV NUMAlink Router IB Cluster Altix ICE, XE IB Fabric UV HUB UV HUB IB HCA IB HCA 256GB 8PB Shared Memory NUMAlink 5 is the glue of Altix UV Global Shared Memory Sockets: 2 to 256 in in one OS image system Memory: up to 16TB in one OS image system NUMAlink5 BW: 15 GB/s aggregate MPI off load engine on Altix UV MPI reduction, barrier > Synchronization 3x 10x MPI short messages > local Latency 2x MPI next neighbor communication > local BW 3x MPI Barrier Latency <1usec (4096 thread)

Performance Preview on Altix UV SPECfp_rate base2006 SPECint_rate base2006 SPECjbb2005* 8000 6000 4000 2000 6840 X2.7 2550 10000 8000 6000 4000 2000 10400 X3.30 3150 24000000 20000000 16000000 12000000 8000000 X3.2 0 SGI Altix UV 1000, 64S Next x86 competitor 0 SGI Altix UV 1000, 64S Next competitor 4000000 SGI Altix UV 1000, 64S next competitor GUPS* STREAM* 900 800 700 600 500 400 300 200 100 0 SGI Altix UV 1000, 256C X1.8 next SSI competitor 401 301 201 101 1 SGI Altix UV 1000, 64S next x86 competitor SGI Altix UV is the industry s leading supercomputer in relevant performance metrics

Rationale for GPU Analytics of Top500 list, Nov 2009 System Performance = Chip Performance x Nr of Chips 10000,0 10 Pflops 100 Pflops 1Eflops Rmax/Chip Performance( Gflop/s) 1000,0 100,0 10,0 1 Pflops Earth Sim NEC NUDT GPU AMD DKRZ IBM Power6 LANL Roadrunner ECMWF IBM Power6 575 Hybrid SW availability NICS CRAY XT5 AMD64 NASA SGI Altix ICE8200EX X86 General Purpose power, clock ORNL CRAY XT5 AMD64 FZJ IBM BG/P LLNL IBM BG/P ANL IBM BG/P Low-power Comm. overhead 1,0 1000 10000 100000 1000000 Nr of Chips/Sockets

SGI Cyclone Software Stack Applications Development Tools and Libraries Altair PBSProfessional SGI Management Suite SGI Management Center SGI Performance Suite SGI ProPack SGI Foundation Software Novell SUSE Linux, Red Hat Enterprise Linux SGI Scale up, Scale out and Hybrid Systems and Storage SGI products ( SGI Third party product (available from and/or integrated by

SGI Performance Suite Application Accelerator FFIO IO Accelerator linkless IO library that enhances performance for jobs that have much larger IO footprint sizes than free memory sizes on the systems. SGISolve -Scalable SMP parallel in-core and outof-core sparse solvers - Parallel iterative solvers and preconditioners library MPT - MPI offload engine for SMP and Clusters PerfBoost Accelerate applications certified HP-MPI, Intel MPI or OpenMPI No recompile or re-linking needed XPMEM, XPNET fast cross-partition data transfer MPInside MPI performance modelling and prrojection tool Seconds Elapsed Time (min) 600 500 400 300 200 100 0 Powertrain Model,10M DOFs on 2 sockets 532 +20% 444 Altix ICE 8200EX Xeon X5572 Altix ICE 8200EX Xeon X5572 University 2.93GHz of w/o Dublin FFIOComplex Matrix 2.93GHz Benchmark with FFIO 14 12 Pardiso PSLDLT 10 8 6 4 2 0 +30% 5344x5344 1core 10688x10688 1 core 21376x21376 1 core 21376x21376 8core

Partnership Nimbis Cloud Portal for SGI Cyclone Desktop Only Users Nimbis Cloud Portal Nimbis Ecommerce Subsystem Nimbis Broker Subsystem SGI Cyclone

Cyclone Technical Application Library Application tuned HPC environments SGI Apps Portal Gaussian Recipe LS Dyna Recipe Fluent Recipe Process Altix XE Process Process OpenFOAM StarCCM+ Recipe.. Recipe Factory Integration Process Process Process Shipping Factory Order Customer

Conclusions Cyclone completes the need Specifically dedicated to technical computing Making HPC pervasive Fast access to new technologies