Cyclone SGI Cloud Computing for HPC Christian Tanasescu Vice President Software Engineering
Agenda Rationale for Cyclone SGI offering Role in SGI business model Cyclone service and usage models Partnerships Future Directions
Limitations of Existing Cloud Services for HPC Most cloud systems built on virtualized instances only Limited scalability of available servers: Scale out clusters only Lack of high speed networks Mostly GigE Lack of user control over node interconnect topology MPI latency, core distribution, contiguous memory Lack of available HPC software stacks Lack of technical application specific configurations 3
Benefit of HPC Cloud Computing 10000 Scalability (50% Efficiency) NUMBER OF USERS, APPLICATIONS Desktop Computing 1000 100 10 1 NUMBER OF USERS, APPLICATIONS Nastran Ansys Abaqus/Std. Pamcrash Ls-Dyna Abaqus/Expl. Fluent StarCCM+ Powerflow ANSYS CFX CFD++ FEKO HFSS Gaussian Gamess Amber CASTEP DMOL3 NAMD VASP BLAST FASTA ClustalW HMMER MM5 WRF HIRLAM IFS CCSM POP ProMAX RTM EPOS Eclipse VIP HHG (LRZ) G5D (JAEA) GCM (NASA) BQCD Entry-Level Cluster Computing Enabling wider use of HPC High-End Computing (HPC) Leading-Edge Computing OpenMP MPI Hybrid NUMBER OF PROCESSORS, MEMORY SIZE, JOB COMPLEXITY NUMBER OF PROCESSORS, MEMORY SIZE, JOB COMPLEXITY Source: Nimbis Services, 2010 Graphic adopted from OSC, Council on Competitiveness and the University of Southern California.
Completing the need Personal& Workgroup Traditional Data Center Modular Data Center Cloud Access& Compute Scale & Control Modular & Mobile On Demand Move freely between the environments
SGI Cyclone results on demand Cloud computing dedicated to technical computing Cyclone Offers Flexible Choice of Platform Scale up (Altix 4700 and Altix UV) and Scale out (Altix ICE and Altix XE) Hybrid systems with NVIDIA Tesla, ATI FireStream and Tilera accelerators Operating System (SLES, RHEL) Interconnect (NUMAlink, InfiniBand, GigE) Physicalization and virtualization Application Specific Cloud Application tuned software stacks Open source and close source applications Differentiators Significantly broader platform flexibility SGI Performance Suite for application acceleration Deep Application Engineering expertise
SGI Cyclone Usage Models Cloud Service for Customers SaaS and Iaas Bridge for overflow capacity Bridge for new system installation Hub for new applications and architectures Cloud for Software Development SGI Solutions Partner Program Internal Cloud Benchmarking Software development
SGI Cyclone Domain and Application Focus Domain Use Cases Applications Computational Biology (BIO) Computational Chemistry and Materials (CCM) Computational Fluid Dynamics (CFD) Genomics, proteomics, molecular modeling and drug discovery Nanotechnology, materials research (metal alloys, polymers, composites, ceramics and plastics) Automotive design, aerospace design, defense systems design and power generation design BLAST, FASTA, HMMER, ClustalW Gaussian, Amber, Gamess, Namd, Gromacs, LAMMPS, DL_POLY StarCCM+, OpenFOAM, Acusolve, NUMECA Computational Structural Mechanics (CSM) Computational Electromagnetis (CEM) Structural, heat transfer, fatigue and vibrational analysis for manufactured products Design and analysis of antennas, antenna placement, EMC (shielding, coupling.), RF components and bioelectromagnetic analysis LS DYNA FEKO Ontologies (ONT) Semantic Web, data mining OntoStudio, SemanticMiner Six Initial Domains: More to follow 19 Supported Applications: More to follow
SGI Cyclone System Architectures 1400 1200 Fluent Performacne: FL5L3 9,7M cells Cluster No. of jobs per day 1000 800 600 400 200 Altix ICE, XE IB Fabric Flat tree Hypercube Enhanced Hypercube single rail dual rail 0 0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 Nr. of cores IB Altix XE1200 Xeon 5160 3GHz IB Altix XE1200 Xeon 5160 3GHz GigE LS-Dyna Performance - Frontal Crash 1200 IB HCA IB HCA 1000 1091 1091 Elapsed Time (secs) 800 600 400 200 0 628 569 382 326 376 36% 16 32 64 128 256 Number of Cores MPT Single-rail, ICE 8200 MPT Dual-rail, ICE 8200 246 353 226 Platform interconnect choices determine the achieved application perforamance
SGI Cyclone System Architectures NUMAlink SMP Altix UV NUMAlink Router IB Cluster Altix ICE, XE IB Fabric UV HUB UV HUB IB HCA IB HCA 256GB 8PB Shared Memory NUMAlink 5 is the glue of Altix UV Global Shared Memory Sockets: 2 to 256 in in one OS image system Memory: up to 16TB in one OS image system NUMAlink5 BW: 15 GB/s aggregate MPI off load engine on Altix UV MPI reduction, barrier > Synchronization 3x 10x MPI short messages > local Latency 2x MPI next neighbor communication > local BW 3x MPI Barrier Latency <1usec (4096 thread)
Performance Preview on Altix UV SPECfp_rate base2006 SPECint_rate base2006 SPECjbb2005* 8000 6000 4000 2000 6840 X2.7 2550 10000 8000 6000 4000 2000 10400 X3.30 3150 24000000 20000000 16000000 12000000 8000000 X3.2 0 SGI Altix UV 1000, 64S Next x86 competitor 0 SGI Altix UV 1000, 64S Next competitor 4000000 SGI Altix UV 1000, 64S next competitor GUPS* STREAM* 900 800 700 600 500 400 300 200 100 0 SGI Altix UV 1000, 256C X1.8 next SSI competitor 401 301 201 101 1 SGI Altix UV 1000, 64S next x86 competitor SGI Altix UV is the industry s leading supercomputer in relevant performance metrics
Rationale for GPU Analytics of Top500 list, Nov 2009 System Performance = Chip Performance x Nr of Chips 10000,0 10 Pflops 100 Pflops 1Eflops Rmax/Chip Performance( Gflop/s) 1000,0 100,0 10,0 1 Pflops Earth Sim NEC NUDT GPU AMD DKRZ IBM Power6 LANL Roadrunner ECMWF IBM Power6 575 Hybrid SW availability NICS CRAY XT5 AMD64 NASA SGI Altix ICE8200EX X86 General Purpose power, clock ORNL CRAY XT5 AMD64 FZJ IBM BG/P LLNL IBM BG/P ANL IBM BG/P Low-power Comm. overhead 1,0 1000 10000 100000 1000000 Nr of Chips/Sockets
SGI Cyclone Software Stack Applications Development Tools and Libraries Altair PBSProfessional SGI Management Suite SGI Management Center SGI Performance Suite SGI ProPack SGI Foundation Software Novell SUSE Linux, Red Hat Enterprise Linux SGI Scale up, Scale out and Hybrid Systems and Storage SGI products ( SGI Third party product (available from and/or integrated by
SGI Performance Suite Application Accelerator FFIO IO Accelerator linkless IO library that enhances performance for jobs that have much larger IO footprint sizes than free memory sizes on the systems. SGISolve -Scalable SMP parallel in-core and outof-core sparse solvers - Parallel iterative solvers and preconditioners library MPT - MPI offload engine for SMP and Clusters PerfBoost Accelerate applications certified HP-MPI, Intel MPI or OpenMPI No recompile or re-linking needed XPMEM, XPNET fast cross-partition data transfer MPInside MPI performance modelling and prrojection tool Seconds Elapsed Time (min) 600 500 400 300 200 100 0 Powertrain Model,10M DOFs on 2 sockets 532 +20% 444 Altix ICE 8200EX Xeon X5572 Altix ICE 8200EX Xeon X5572 University 2.93GHz of w/o Dublin FFIOComplex Matrix 2.93GHz Benchmark with FFIO 14 12 Pardiso PSLDLT 10 8 6 4 2 0 +30% 5344x5344 1core 10688x10688 1 core 21376x21376 1 core 21376x21376 8core
Partnership Nimbis Cloud Portal for SGI Cyclone Desktop Only Users Nimbis Cloud Portal Nimbis Ecommerce Subsystem Nimbis Broker Subsystem SGI Cyclone
Cyclone Technical Application Library Application tuned HPC environments SGI Apps Portal Gaussian Recipe LS Dyna Recipe Fluent Recipe Process Altix XE Process Process OpenFOAM StarCCM+ Recipe.. Recipe Factory Integration Process Process Process Shipping Factory Order Customer
Conclusions Cyclone completes the need Specifically dedicated to technical computing Making HPC pervasive Fast access to new technologies