HPC In The Cloud?
Michael Kleber
Department of Computer Sciences, University of Salzburg, Austria
July 2, 2012
Outline
1. Motivation
2. Amazon EC2 CCI versus a local cluster
3. Cost-effectiveness
4. MUSCLE
5. NASA
Motivation
- widespread availability of cloud services
- easy access to compute resources
- easy to scale
- pay-as-you-go
Questions
- Performance?
- Suitable for every HPC application?
- Shortcomings and problems?
- Cost-effective?
Cloud: Amazon
- Amazon EC2: well known, available to everyone
- CCI (Cluster Compute Instances) target HPC users
Cloud
- virtualisation
- processor type per instance is not specified
- performance (CU) and memory are specified
- 1 GbE, or 10 GbE with CCI
- (real) network capacity, quality and topology are not known
HPC Cluster
- no virtualisation
- processor type and memory are known
- InfiniBand, HT-Link or Ethernet (1 Gb or 10 Gb)
- several network topologies, e.g. 2D mesh
- (real) capacity and quality are known
Overview
EC2 CCI versus a local cluster [1], evaluated with "synthetic benchmarks":
- network performance (see the sketch below)
- compute performance
- parallel I/O
- performance variation
- cost-effectiveness
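To make the network measurements concrete, the following is a minimal MPI ping-pong sketch in C, the classic way such synthetic benchmarks measure point-to-point latency and bandwidth. The message size and repetition count are illustrative choices, not parameters taken from [1].

/* Minimal MPI ping-pong sketch: rank 0 and rank 1 exchange a message
 * repeatedly; the round-trip time yields latency and bandwidth.
 * Run with at least two ranks, e.g. mpirun -np 2 ./pingpong.
 * Message size and repetition count are illustrative, not from [1]. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 1000;
    const int size = 1 << 20;               /* 1 MiB message */
    char *buf = malloc(size);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double dt = MPI_Wtime() - t0;

    if (rank == 0) {
        double rtt = dt / reps;              /* average round-trip time */
        double bw  = 2.0 * size / rtt / 1e6; /* MB/s, both directions */
        printf("avg RTT %.3f us, bandwidth %.1f MB/s\n", rtt * 1e6, bw);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}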
EC2 CCI Setup
- 2 quad-core Xeon X5570, 23 GiB RAM per instance
- 10 GbE
- 1690 GiB local block storage, EBS + S3
- Amazon Linux 64-bit
- Intel compiler
Cluster Setup
- six-core Xeon X5670, 32 GiB RAM
- QDR InfiniBand
- NFS
- Intel compiler
- similar compute performance to the EC2 CCI nodes
[Figure: NPB results with Intel MPI]
[Figure: NPB results with OpenMPI (Intel compiler)]
[Figure: NPB completion times]
[Figure: NPB communication times]
Parallel I/O
- NFS is common on small and some mid-range clusters; parallel file systems are popular on large clusters
- examples of parallel file systems: PVFS, GPFS, Lustre
- only cloud services make it practical to change the file system for every session
- tested file systems: NFS, PVFS
- benchmarks: IOR micro-benchmarks, BT-IO (see the MPI-IO sketch below)
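As a rough illustration of what an IOR-style measurement does, here is a minimal collective MPI-IO write in C: every rank writes one contiguous block of a shared file and the aggregate rate is reported. The file name, block size and access pattern are illustrative assumptions, not the settings used in [1].

/* Sketch of an IOR-style parallel write benchmark via collective
 * MPI-IO. File name and per-rank block size are illustrative. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int block = 4 << 20;               /* 4 MiB per rank */
    char *buf = malloc(block);
    for (int i = 0; i < block; i++) buf[i] = (char)rank;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "ior_test.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    /* rank r writes the contiguous byte range [r*block, (r+1)*block) */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * block, buf, block,
                          MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    double dt = MPI_Wtime() - t0;

    if (rank == 0)
        printf("aggregate write rate: %.1f MB/s\n",
               (double)nprocs * block / dt / 1e6);
    free(buf);
    MPI_Finalize();
    return 0;
}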
[Figure: IOR read rates]
[Figure: IOR write rates]
[Figure: BT-IO completion times]
[Figure: BT-IO data rates]
[Figure: performance variation across repeated runs]
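Run-to-run variation of the kind shown above can be quantified with a few lines of MPI: time the same operation many times and report the spread. The collective, the repetition count and the statistic below are illustrative choices, not the methodology of [1].

/* Sketch for quantifying run-to-run performance variation: time the
 * same collective repeatedly and report the mean and the coefficient
 * of variation (stddev relative to the mean). */
#include <mpi.h>
#include <stdio.h>
#include <math.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    enum { RUNS = 50 };
    double t[RUNS], x = 1.0, sum;

    for (int i = 0; i < RUNS; i++) {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        MPI_Allreduce(&x, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        t[i] = MPI_Wtime() - t0;
    }

    if (rank == 0) {
        double mean = 0.0, var = 0.0;
        for (int i = 0; i < RUNS; i++) mean += t[i];
        mean /= RUNS;
        for (int i = 0; i < RUNS; i++) var += (t[i] - mean) * (t[i] - mean);
        var /= RUNS;
        printf("mean %.3f us, CoV %.1f%%\n",
               mean * 1e6, 100.0 * sqrt(var) / mean);
    }
    MPI_Finalize();
    return 0;
}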
Background
- case study by Purdue [2]
- two community clusters:
  - Steele: 902 nodes, Xeon E5410, utilisation 68.4%
  - Coates: 993 nodes, Opteron 2380, utilisation 80.1%
- Amazon EC2 CCI: $1.60 per node-hour + storage, i.e. ca. $1.75 in total
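The comparison boils down to simple arithmetic: amortise a cluster node's purchase and operating cost over its utilised hours and compare the result with the EC2 rate. The sketch below does exactly that; the hardware price, yearly operating cost and service life are hypothetical placeholders, not Purdue's figures, and only the ~$1.75 EC2 rate and the 68.4% utilisation come from the slide above.

/* Back-of-the-envelope node-hour cost comparison in the spirit of [2].
 * node_price, ops_per_year and years are hypothetical placeholders. */
#include <stdio.h>

int main(void) {
    const double ec2_per_node_hour = 1.75;   /* EC2 CCI incl. storage */

    const double node_price   = 4000.0;      /* hypothetical purchase cost */
    const double ops_per_year = 500.0;       /* hypothetical power/admin share */
    const double years        = 4.0;         /* hypothetical service life */
    const double utilisation  = 0.684;       /* Steele's reported utilisation */

    /* amortise total cost over the hours the node is actually used */
    double hours_used = years * 365 * 24 * utilisation;
    double total_cost = node_price + ops_per_year * years;
    double cluster_per_node_hour = total_cost / hours_used;

    printf("cluster: %.2f $/node-hour vs. EC2 CCI: %.2f $/node-hour\n",
           cluster_per_node_hour, ec2_per_node_hour);
    return 0;
}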
[Figure: community cluster cost per node-hour]
[Figure: performance comparison]
MUSCLE
- multi-scale simulation with MUSCLE [3]
- usage examples: blood flow, solid tumor models, stellar systems, virtual reactor
Setup
- in-stent restenosis application, 15 and 150 iterations
- HP cluster platform with 2.26 GHz Xeon processors, InfiniBand
- EC2 high-CPU extra large instances: 7 GiB of RAM, 20 CUs (8 virtual cores), 1690 GiB of local storage
- EC2 cluster compute quadruple extra large instances: 23 GiB of RAM, 33.5 CUs, 1690 GiB of local storage
[Figure: MUSCLE performance on the local cluster vs. EC2]
Setup
- NASA study [4]
- Amazon EC2 cluster compute quadruple extra large instances, NFS, SLES 11, Intel compiler 12.4, OpenMPI 1.4.4
- Pleiades: SGI ICE with Xeon X5570, 24 GiB of RAM, InfiniBand 4X QDR, MPT 2.04, Intel compiler 12.4, Lustre parallel file system
Applications
Four real-world applications:
- Cart3D: CFD fluid simulation
- DDSCAT: calculates absorption and scattering properties
- Enzo: simulates cosmological structure formation
- MITgcm: global ocean simulation model
[Figures: Cart3D, DDSCAT, Enzo and MITgcm performance, Pleiades vs. EC2]
Conclusion
- well suited for tightly coupled applications
- high variance
- comparable costs
- easy to set up, low maintenance
- easy to switch file systems
- no waiting time
References
[1] Cloud Versus In-house Cluster: Evaluating Amazon Cluster Compute Instances for Running MPI Applications.
[2] Cost-effective HPC: The Community or the Cloud?
[3] Comparison of Cloud and Local HPC Approach for MUSCLE-based Multiscale Simulations.
[4] Performance Evaluation of Amazon EC2 for NASA HPC Applications.