
HPC In The Cloud? Michael Kleber Department of Computer Sciences University of Salzburg, Austria July 2, 2012

Content 1 Motivation 2 Benchmarks: EC2 CCI versus local cluster 3 Cost-effectiveness 4 MUSCLE 5 NASA

Motivation widespread availability of cloud services; easy access to compute resources; easy to scale; pay-as-you-go

Questions Performance? Suitable for every HPC application? Shortcomings and problems? Cost-effective?

Cloud Amazon Amazon EC2: well known, available to everyone; CCI (Cluster Compute Instances) targets the HPC user

Cloud virtualisation; processor type per instance is not specified; performance (CU) and memory are specified; 1GbE or 10GbE (CCI); (real) network capacity, quality and topology are not known

HPC Cluster no virtualisation; processor type and memory are known; InfiniBand, HT-Link or Ethernet (1Gb or 10Gb); several network topologies, e.g. 2D mesh; (real) capacity and quality are known


Overview EC2 CCI versus local cluster [1]: synthetic benchmarks; network performance; compute performance; parallel I/O; performance variation; cost-effectiveness

EC2 CCI Setup 2 quad-core Xeon X5570, 23 GiB RAM; 10GbE; per instance: LBS 1690 GiB, EBS + S3; Amazon Linux 64-bit; Intel compiler

Cluster Setup six-core Xeon 5670, 32 GiB RAM; QDR InfiniBand; NFS; Intel compiler; similar compute performance
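The gap between 10GbE and QDR InfiniBand is usually quantified with a simple MPI ping-pong kernel. The following minimal C sketch illustrates the idea; the message size and iteration count are arbitrary placeholders, not the settings used in [1]:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal MPI ping-pong between ranks 0 and 1 (run with >= 2 ranks).
 * Message size and iteration count are illustrative placeholders. */
int main(int argc, char **argv)
{
    const int size = 1 << 20;   /* 1 MiB messages */
    const int iters = 1000;
    int rank, i;
    char *buf = malloc(size);
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)   /* each iteration moves 2*size bytes */
        printf("avg round trip: %.1f us, bandwidth: %.1f MiB/s\n",
               (t1 - t0) / iters * 1e6,
               2.0 * size * iters / (t1 - t0) / (1 << 20));

    free(buf);
    MPI_Finalize();
    return 0;
}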

Intel MPI NPB

Intel OpenMPI

NPB Completion Time

NPB Communication

Parallel I/O NFS is common on small and some mid-range clusters; parallel file systems are popular on large clusters; examples of parallel file systems are PVFS, GPFS and Lustre; changing the file system for every session is only practical with cloud services; tested file systems: NFS, PVFS; benchmarks: IOR micro-benchmarks, BT-IO
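The shared-file pattern such benchmarks drive, every rank writing its own contiguous block of one common file, can be sketched with plain MPI-IO; the file name and block size below are illustrative placeholders, not IOR's actual code:

#include <mpi.h>
#include <stdlib.h>

/* Sketch of a segmented shared-file write, the kind of access
 * pattern IOR can generate. File name and block size are
 * hypothetical placeholders. */
int main(int argc, char **argv)
{
    const int block = 4 << 20;          /* 4 MiB per rank */
    int rank;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    char *buf = calloc(block, 1);

    MPI_File_open(MPI_COMM_WORLD, "ior_testfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* collective write: rank i fills bytes [i*block, (i+1)*block) */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * block, buf, block,
                          MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    free(buf);
    MPI_Finalize();
    return 0;
}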

Parallel I/O IOR Read Rate

Parallel I/O IOR Write Rate

Parallel I/O BT-IO Completion Time

Parallel I/O BT-IO Data Rate

Performance Variation


Background case study by Purdue [2] 2 community clusters Steele: 902 nodes, Xeon E5410, utilisation 68.4% Coates: 993 nodes, Opteron 2380, utilisation 80.1% Amazon EC2 CCI: $1.60 per node-hour + storage => ca. $1.75
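The underlying comparison is plain amortisation arithmetic. In the sketch below, the node purchase price, operating cost and lifetime are invented placeholders; only the ca. $1.75 rate and the Steele utilisation come from the slide:

#include <stdio.h>

/* Back-of-envelope cost comparison. node_capex, opex_per_year and
 * years are invented placeholders; ec2_per_node_hour and
 * utilisation are the figures quoted on the slide. */
int main(void)
{
    const double ec2_per_node_hour = 1.75;  /* EC2 CCI incl. storage */
    const double node_capex = 4000.0;       /* assumed purchase price */
    const double opex_per_year = 1000.0;    /* assumed operations */
    const double years = 4.0;               /* assumed lifetime */
    const double utilisation = 0.684;       /* Steele, from the slide */

    double hours_used = years * 365 * 24 * utilisation;
    double cluster_total = node_capex + years * opex_per_year;

    printf("cluster: $%.2f per utilised node-hour\n",
           cluster_total / hours_used);
    printf("EC2 CCI: $%.2f per node-hour\n", ec2_per_node_hour);
    return 0;
}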

Cluster Cost

Performance


MUSCLE Multi-Scale Simulation MUSCLE [3] usage examples: blood flow, solid tumor models, stellar systems, virtual reactor

Setup in-stent restenosis application, 15 and 150 iterations; HP cluster platform with 2.26 GHz Xeon processors, InfiniBand; EC2 High-CPU Extra Large instances with 7 GiB of RAM, 20 CUs (8 virtual cores) and 1690 GiB of local storage; EC2 Cluster Compute Quadruple Extra Large instances with 23 GiB of RAM, 33.5 CUs and 1690 GiB of local storage

Performance


Setup NASA [4] Amazon EC2 quadruple extra large, NFS, SLES 11, Intel compiler 12.4, OpenMPI 1.4.4; Pleiades: SGI ICE with Xeon X5570, 24 GiB of RAM, InfiniBand 4X QDR, MPT 2.04, Intel compiler 12.4, Lustre parallel FS

Applications 4 real-world applications: Cart3D - CFD fluid simulation; DDSCAT - calculates absorption and scattering properties; Enzo - simulates cosmological structure formation; MITgcm - global ocean simulation model

Cart3D

DDSCAT

Enzo

MITgcm

Conclusion not well suited for tightly coupled applications; high variance; comparable costs; easy to set up, low maintenance; easy to switch filesystems; no waiting time

References
[1] Cloud Versus In-house Cluster: Evaluating Amazon Cluster Compute Instances for Running MPI Applications
[2] Cost-effective HPC: The Community or the Cloud?
[3] Comparison of Cloud and Local HPC Approach for MUSCLE-based Multiscale Simulations
[4] Performance Evaluation of Amazon EC2 for NASA HPC Applications