Amazon Web Services: Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud

1 Amazon Web Services: Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud Summarized by: Michael Riera 9/17/2011 University of Central Florida CDA5532

2 Agenda: Purpose; Benchmarks used; Machine Setups (including EC2); Experiment Setup; Results; Conclusions

3 Introduction: The purpose of this paper is to compare Amazon EC2 service performance against industry-standard benchmarks for High Performance Computing data centers. The paper draws comparisons between known supercomputers, an HPC data center, and AWS EC2.

4 Benchmarks: NERSC framework. The workload includes the areas of climate, materials science, fusion, accelerator modeling, astrophysics, and quantum chromodynamics. Integrated Performance Monitoring (IPM): used to quantify the computation and communication performed through MPI interfaces.

5 Machine Setup: Carver. National Energy Research Scientific Computing (NERSC) Center at Lawrence Berkeley National Laboratory. 400 nodes. Quad-core Intel Nehalem 2.67 GHz processors, dual-socket nodes, and a single Quad Data Rate (QDR) InfiniBand link. Each node has 24 GB of RAM (3 GB per core).

6 Machine Setup: Franklin. National Energy Research Scientific Computing (NERSC) Center at Lawrence Berkeley National Laboratory. Cray XT4 supercomputer nodes. Single quad-core 2.3 GHz AMD Opteron Budapest processor per node. 6.4 Gb interconnect (node interconnect). Each node has 8 GB of RAM (2 GB per core).

7 Machine Setup: Lawrencium. Information Technology Division at Berkeley. 198 nodes (1,584 cores). Dell PowerEdge 1950 servers. Two Intel Xeon quad-core 64-bit 2.66 GHz Harpertown processors per node. DDR InfiniBand network. Each node has 16 GB of RAM (2 GB per core).

8 Machine Setup: Amazon EC2. Virtual configuration. CPU capacity is defined in terms of an abstract Amazon EC2 Compute Unit (CU). EC2 CU are approximately equivalent to Ghz. The large instance has 4 EC2 Compute Units, 2 virtual cores, and 7.5 GB of memory. Interconnect: Gigabit Ethernet.
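The per-core figures quoted on the Machine Setup slides can be cross-checked with a little arithmetic. The sketch below is illustrative only: the dictionary layout and names are assumptions made for this example, and the cores-per-node values are derived from the socket and core descriptions on each slide rather than stated there directly.

```python
# Per-node figures as stated on the Machine Setup slides.
# cores_per_node is derived from each slide's socket/core description
# (an assumption of this sketch, not a number printed on the slides).
machines = {
    "Carver": {"cores_per_node": 8, "ram_gb": 24.0},      # dual socket, quad-core
    "Franklin": {"cores_per_node": 4, "ram_gb": 8.0},     # single quad-core
    "Lawrencium": {"cores_per_node": 8, "ram_gb": 16.0},  # two quad-core
    "EC2 large": {"cores_per_node": 2, "ram_gb": 7.5},    # 2 virtual cores
}


def ram_per_core(spec):
    """Return GB of RAM available per core for one node description."""
    return spec["ram_gb"] / spec["cores_per_node"]


ram_table = {name: ram_per_core(spec) for name, spec in machines.items()}

# Lawrencium: 198 nodes times 8 cores per node.
lawrencium_total_cores = 198 * machines["Lawrencium"]["cores_per_node"]

# EC2 large instance: 4 Compute Units spread over 2 virtual cores.
ec2_cu_per_core = 4 / machines["EC2 large"]["cores_per_node"]

for name, gb in ram_table.items():
    print(f"{name}: {gb:.2f} GB per core")
```

Under these assumptions the derived values agree with the slides (3 GB, 2 GB, and 2 GB per core, and 1,584 Lawrencium cores), while the EC2 large instance works out to 3.75 GB and 2 Compute Units per virtual core.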

9 Machine Setup

10 Machine Setup: /proc/cpuinfo showed different hardware combinations (no control over assignment): Intel Xeon E Ghz quad-core processor; AMD Opteron Ghz dual-core; AMD Opteron 2218 HE 2.6 GHz dual-core.
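One way to see which processor an instance actually landed on is to read the "model name" entries from /proc/cpuinfo. The following is a minimal sketch, assuming an x86 Linux guest where that field is present; the function names and the sample text are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cpu_models(cpuinfo_text):
    """Count the 'model name' entries in /proc/cpuinfo-style text."""
    models = Counter()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            models[value.strip()] += 1
    return models


def read_local_cpu_models(path="/proc/cpuinfo"):
    """Read the local file; only meaningful on Linux."""
    with open(path) as f:
        return cpu_models(f.read())


# Illustrative sample only: a made-up two-core listing, not real output.
SAMPLE = """processor\t: 0
model name\t: Example CPU A
processor\t: 1
model name\t: Example CPU A
"""

print(cpu_models(SAMPLE))
```

Running `read_local_cpu_models()` on every instance in a virtual cluster and comparing the results would reveal the kind of hardware mix described on this slide.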

11 Experiment Setup: CAM. The Community Atmosphere Model (CAM) is the atmospheric component of the Community Climate System Model (CCSM). GAMESS: uses sockets communication. Considered to exercise stride-1 memory access, which stresses memory bandwidth, and interconnect collective performance.
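"Stride-1" means that consecutive memory accesses touch adjacent elements. The sketch below is a generic illustration and is not code from any of the benchmarks: it assumes a row-major 2-D array stored in one flat buffer and reports the index gap between successive accesses for two traversal orders. The function name and parameters are made up for this example.

```python
def access_strides(n_rows, n_cols, order):
    """Return the index gaps between consecutive accesses to a row-major
    n_rows x n_cols array stored in one flat buffer."""
    if order == "row":
        # Inner loop over columns: neighbours in memory are visited in turn.
        indices = [i * n_cols + j for i in range(n_rows) for j in range(n_cols)]
    elif order == "col":
        # Inner loop over rows: each step jumps a whole row ahead.
        indices = [i * n_cols + j for j in range(n_cols) for i in range(n_rows)]
    else:
        raise ValueError("order must be 'row' or 'col'")
    return [b - a for a, b in zip(indices, indices[1:])]


row_gaps = access_strides(3, 4, "row")
col_gaps = access_strides(3, 4, "col")
print(set(row_gaps))   # every gap is 1: stride-1 access
print(col_gaps[:3])    # gaps of n_cols, then a jump back to the next column
```

Row-order traversal streams through memory one element at a time, so the achievable rate tends to be limited by memory bandwidth, which is why a stride-1 workload is described as stressing it.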

12 Experiment Setup: GTC. Fully self-consistent, gyrokinetic 3-D particle-in-cell (PIC) code with a non-spectral Poisson solver. IMPACT-T: Integrated Map and Particle Accelerator Tracking Time; uses Hockney's FFT. MAESTRO: used to simulate astrophysical flows such as those leading up to ignition in Type Ia supernovae. MILC: represents lattice computation that is used to study Quantum Chromodynamics. PARATEC: performs Density Functional Theory quantum-mechanical total energy calculations using pseudopotentials.

13 Results

14 Results: In GAMESS, Franklin, Lawrencium, and EC2 are 1.4x, 2.6x, and 2.7x slower than Carver. The worst case is PARATEC, where EC2 is more than 50x slower than Carver: PARATEC performs a 3-D FFT, and EC2 ran 52x slower than Carver.
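The "Nx slower" figures are ratios of a machine's runtime to Carver's runtime for the same benchmark. A minimal sketch of that calculation follows. The slowdown factors are the ones quoted on this slide; the 100-second baseline is a hypothetical placeholder chosen only to make the arithmetic concrete and is not a measured value.

```python
def slowdown(runtime, baseline_runtime):
    """Return how many times slower `runtime` is than `baseline_runtime`."""
    if baseline_runtime <= 0:
        raise ValueError("baseline runtime must be positive")
    return runtime / baseline_runtime


def implied_runtime(baseline_runtime, factor):
    """Runtime implied by a slowdown factor relative to the baseline."""
    return baseline_runtime * factor


# Slowdown factors relative to Carver, as quoted on the slide (GAMESS).
gamess_slowdown = {"Franklin": 1.4, "Lawrencium": 2.6, "EC2": 2.7}

# Hypothetical baseline, NOT a measurement from the paper.
HYPOTHETICAL_CARVER_SECONDS = 100.0

for machine, factor in gamess_slowdown.items():
    seconds = implied_runtime(HYPOTHETICAL_CARVER_SECONDS, factor)
    print(f"{machine}: {seconds:.0f} s if Carver took "
          f"{HYPOTHETICAL_CARVER_SECONDS:.0f} s")
```

With the same placeholder baseline, the 52x PARATEC figure would turn a 100-second Carver run into roughly an hour and a half on EC2.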

15 Results

16 Results: AWS Cloud HW Variance

17 Conclusion: Cannot control the type of hardware in the cloud. Near-supercomputer speeds at every household.
