The Use of Cloud Computing Resources in an HPC Environment

Bill Labate, UCLA Office of Information Technology
Prakashan Korambath, UCLA Institute for Digital Research & Education

Cloud computing becomes relevant to HPC users only when their specific requirements can be met in a cloud environment. In broad terms these requirements involve hardware and OS/software dependencies. The more generic an HPC user's requirements are (particularly for hardware), the greater the probability that a cloud can meet them. The hardware environment that HPC applications require, as opposed to the OS/software environment, is much harder to obtain from a general cloud computing provider, because HPC users are generally interested in extracting as much performance from the hardware as possible. Knowledge of the CPU type, the amount of L2 cache, the floating point operations per clock cycle, the availability of accelerator hardware such as a GPU, a special-purpose FPGA or a Cell processor, the bus architecture, the amount of memory per core, the availability of a parallel file system versus standard NFS storage, an Ethernet network versus a high-performance interconnect, and a good compiler all contribute to how well a given code will perform. By appropriately modifying their code to match the environment, a skilled researcher can speed up a job's runtime by a factor of 10 to 100. At the same time, there are situations where running a job in a generic environment is sufficient no matter how long it takes to finish, generally because local resources are insufficient and access to something that runs slowly is better than no access at all. In any case, the cloud environment must meet some level of applicability to be useful.

Because the revenue model of many of the large cloud computing providers, such as Google and Amazon, is to sell idle cycles on their fairly generic hardware, there is no incentive (at this time) for them to provide the highly optimized hardware that HPC applications often require. While you can currently specify things like memory, CPU speed, number of cores and the physical proximity of systems, to date no company has taken the next step and made truly HPC-caliber hardware resources available. For the main consumers of cloud services, a typical application might include a website, some type of application processing and the population of a database; such a user may only care about CPU speed, memory and storage space. It is this type of environment that we look to match against HPC requirements.
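
As a minimal illustration of the hardware parameters listed above, the C sketch below queries a few of them at run time. It assumes a Linux system with glibc (the cache-size sysconf names are glibc extensions and may report 0 or -1 elsewhere); it is a sketch, not a complete hardware inventory.

```c
#include <stdio.h>
#include <unistd.h>

/* Minimal sketch: query a few hardware parameters that influence HPC
 * performance. Assumes Linux/glibc; the cache-size sysconf names are
 * glibc extensions and may return 0 or -1 on other systems. */
int main(void) {
    long cores     = sysconf(_SC_NPROCESSORS_ONLN);
    long page_size = sysconf(_SC_PAGESIZE);
    long l2_cache  = sysconf(_SC_LEVEL2_CACHE_SIZE);

    printf("online cores: %ld\n", cores);
    printf("page size   : %ld bytes\n", page_size);
    printf("L2 cache    : %ld bytes\n", l2_cache);
    return 0;
}
```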

Definition

Just as with Grid computing, cloud computing has many different definitions and interpretations. In this paper we define cloud computing, in terms of HPC, as a configurable resource, versus a fixed resource such as those provided through a computational Grid. Configurable in this context means that the OS/software operating environment can be tailored to a specific application or code run: the OS, kernel-level settings, libraries and other software (versus hardware) dependencies. As mentioned above, no cloud computing service we are aware of offers the hardware tailoring that would be of interest in HPC, i.e., high-speed interconnects and storage/scratch space. Hardware tailoring would have to be done by selecting a given cloud resource that meets a specific requirement; currently there is no way to dynamically change hardware other than by targeting specific resources or through some type of scheduling/resource allocation system. Basically, the hardware you need must already be installed on the resource you want to use.

HPC Use Cases and Hardware Requirements

High performance computing can be divided into two major use cases, serial and parallel, plus a hybrid called multi-threaded, which is essentially parallel computing on a single node. Parallel can be further divided into loosely coupled and tightly coupled. The most demanding characteristic is the amount of dependency or communication (coupling) between the multiple processes being used for a given job.

Serial (Single Threaded)

Serial applications are good candidates for cloud computing: they run simultaneous but separate processes on separate hardware (hosts). These single-threaded jobs run as fast as the CPU, memory and I/O hardware permit. The only communication occurs when a given process completes and writes its results back to a central location; the job finishes when all processes complete. This use case is often called embarrassingly parallel, since no process depends on any other running process, and it scales easily in a distributed computing environment to the extent the application allows. A good example is SETI@home, run through the Berkeley Open Infrastructure for Network Computing (BOINC) project: over 500,000 hosts work on discrete pieces of data and produce results without any need to synchronize with anything but the main SETI@home server. Note that some users run commercial applications in their serial slots, so licensing will be a factor; some licenses are limited to a set number of nodes or to specific nodes identified by MAC address, so it is entirely possible that some applications cannot be used in a cloud environment at all. A sketch of such an independent worker appears below.
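
The following hypothetical C worker illustrates the embarrassingly parallel pattern: each instance is launched independently with a task ID, computes on its own piece of data, and writes one result file to a shared location, with no communication between processes. The work function and file layout are placeholders of our own choosing.

```c
#include <stdio.h>
#include <stdlib.h>

/* Placeholder for the real per-task computation; an actual job would
 * read its own input slice and do substantial work here. */
static double do_work(int task_id) {
    double sum = 0.0;
    for (int i = 1; i <= 1000000; i++)
        sum += (double)task_id / i;
    return sum;
}

/* Usage: ./worker <task_id>
 * Many copies run at once on separate hosts; the only "communication"
 * is each process writing its result into a shared results/ directory
 * (assumed to exist). The job is done when every task has written. */
int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <task_id>\n", argv[0]);
        return 1;
    }
    int task_id = atoi(argv[1]);
    double result = do_work(task_id);

    char path[256];
    snprintf(path, sizeof path, "results/task_%d.out", task_id);
    FILE *f = fopen(path, "w");
    if (!f) { perror("fopen"); return 1; }
    fprintf(f, "%.10f\n", result);
    fclose(f);
    return 0;
}
```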

Multi Threaded

These are jobs whose threads share memory and therefore must all run on the same node. Most C/C++ and Fortran compilers can distribute the compute-intensive part of a job across multiple threads once the programmer inserts some compiler directives; OpenMP is the industry standard for these kinds of jobs. This type of computing is becoming increasingly important as core counts rise, and both Intel and AMD have invested substantial resources in making their chips run multi-threaded applications more efficiently.
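
A minimal OpenMP sketch in C shows the pattern: a single directive spreads a compute-intensive loop across the cores of one node, with all threads sharing that node's memory. The loop body here is an arbitrary stand-in for real work.

```c
#include <stdio.h>
#include <omp.h>

#define N 10000000

/* Minimal OpenMP sketch: one directive parallelizes the loop across
 * the cores of a single node. Compile with, e.g., gcc -fopenmp. */
int main(void) {
    double sum = 0.0;

    /* All threads share this node's memory; the reduction clause
     * safely combines each thread's partial sum. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= N; i++)
        sum += 1.0 / i;

    printf("max threads: %d\n", omp_get_max_threads());
    printf("sum        : %f\n", sum);
    return 0;
}
```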

Parallel (Distributed Memory)

Jobs that run in this environment are CPU intensive, memory intensive, or both. Memory-intensive jobs are those whose memory requirements exceed the maximum memory a single node can provide, no matter how fast its CPUs are. Distributed parallel jobs spread parts of the data or the computation across multiple nodes interconnected through a network switch, and each thread may periodically need to exchange data with, or update memory used by, threads on the same node or on remote nodes. This makes parallel jobs sensitive to both latency and data bandwidth; some become impossible to scale beyond a few nodes unless a faster interconnect fabric such as InfiniBand or Myrinet is used. The amount of communication (bandwidth) and its time-sensitive nature (latency) determine the application's coupling, and the network or interconnect determines how efficiently a given job is processed. A cloud resource will have some type of network, generally 100 Mb Fast Ethernet or in some cases 1 Gb Gigabit Ethernet. In contrast, most tightly coupled clusters rely on at least Gigabit Ethernet, while higher-performance systems use a special interconnect such as InfiniBand, Myrinet or another proprietary fabric offering high bandwidth (10 Gb or more) and low latency (in the low, single-digit microsecond range). There is no reason a tightly coupled job could not run on a 100 Mb network; but depending on how much communication the application requires between hosts, the slowdown could be many orders of magnitude, to the point that the hardware is for all intents and purposes useless.
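
To make the latency and bandwidth dependence concrete, the following minimal MPI ping-pong sketch in C (message size and repetition count are arbitrary choices) times a message bouncing between two ranks; this is a standard way to characterize the interconnect that a tightly coupled job depends on.

```c
#include <stdio.h>
#include <mpi.h>

#define REPS 1000
#define MSG_BYTES (1 << 20)   /* 1 MiB per message; arbitrary choice */

/* Minimal MPI ping-pong sketch: ranks 0 and 1 bounce a message back
 * and forth; the average round trip reflects the interconnect's
 * latency and bandwidth. Run with: mpirun -np 2 ./pingpong */
int main(int argc, char **argv) {
    static char buf[MSG_BYTES];
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("avg round trip: %.3f ms\n", (t1 - t0) / REPS * 1e3);

    MPI_Finalize();
    return 0;
}
```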

Custom Operating Environments

One of the benefits of the current Grid environment is also a drawback to usability: the ability to construct and run in a custom operating environment does not currently exist, because Grid resources are available only in a static, predefined way. On the positive side, well-defined specifications make matching requirements to resources much easier. For some users, however, this rigidity prevents running their code in an environment optimized for them. Frequently, users have prototyped their applications on hardware they own and completely control, and the environment they have built fulfills the dependencies their application requires. When such an application is moved to a new resource, it may have to be modified, sometimes substantially; if users could take their environment with them, this extra step would be avoided. On the other hand, HPC applications written with scalability in mind should keep the need for a custom environment to a minimum.

Building the Custom Environment

For an HPC user who requires a custom operating environment, using cloud resources takes some extra effort: the user must recreate an operating environment compatible with the OS choices the cloud provides, load it onto the cloud resource, and deploy it over the virtual hardware assigned to them (Amazon, for instance, calls these hardware Instance Types). For most users who have already built a custom operating environment this is a fairly trivial exercise; for less sophisticated users it could be beyond their ability. One way around this is to offer pre-built environments that have been compiled and tested on given hardware. Most cloud services already do this for web servers, databases and application stacks on specific operating systems, and the same could be done for HPC users, with specific libraries, compilers and applications pre-built for an environment. One could envision a service that assembles such an environment interactively from a menu of choices; such services already exist for non-HPC environments and could be adapted for HPC users.

How Cloud Computing Could Fit Into the HPC Environment

From the use cases discussed previously, there are several where on-demand resources from a cloud provider could be utilized in an HPC environment.

Serial (embarrassingly parallel) jobs

Serial-type jobs, with or without the requirement for a custom operating environment, are ideally suited to cloud environments. Their custom operating environment requirements are generally low, possibly limited to specific commercial applications, compilers or libraries.

Multi-threaded jobs

Multi-threaded jobs run more efficiently on hardware optimized for the purpose: CPUs with advanced multi-threading capabilities, fast memory and a fast memory bus architecture. If a cloud computing provider is willing to make this type of hardware available, this is a viable use case. Multi-threaded jobs can run on less optimized hardware, just not as efficiently, which for some use cases is entirely acceptable.

An add-on or overflow service for the Grid

For serial and multi-threaded jobs, if cloud resources could be coupled to a Grid scheduling system, it would be possible to extend a Grid with cloud resources whenever sufficient resources are not available within the Grid or a large burst requirement must be accommodated. For this to work, certain network and operating environment dependencies would have to be satisfied. This use case can be viewed as either stable or ad hoc. A stable resource would require some type of longer-term agreement with the cloud computing provider, and because these resources must be paid for, careful accounting of their usage. An ad-hoc resource would require a method for quickly establishing and tearing down a connection with a cloud provider; some companies are offering spare cycles to universities in exchange for a tax deduction, and since the notice given for such availability can be fairly short, a way of establishing the link quickly is extremely important.

Supercomputing Centers as an HPC Cloud

The national supercomputing centers are in the best position to provide a true HPC cloud environment: one that could serve serial, multi-threaded and parallel applications and fulfill specific hardware requirements for interconnects, storage and node configurations. Loosely defined, supercomputing centers are already a cloud, minus the ability to provide configurable resources. Whether the national centers have the ability and the desire to offer a configurable environment is unknown at this time.

Conclusions

Cloud computing offers great promise for organizations that want to supplement their computing capabilities without building out their IT infrastructure. Surges in demand can be met more efficiently, and the build-out of very expensive data centers, along with the related energy costs, can be avoided or mitigated. Provided the cost of cloud services is reasonable, this is a very attractive scenario. For HPC users, cloud computing can satisfy some of the specific use cases described above; for high-end HPC usage, however, it is not yet a viable solution. It remains to be seen whether cloud service providers can develop a revenue model that makes true HPC resources available at a reasonable price.