Large Scale Computing Infrastructures

GC3: Grid Computing Competence Center
Large Scale Computing Infrastructures
Lecture 2: Cloud technologies
Sergio Maffioletti <sergio.maffioletti@gc3.uzh.ch>
GC3: Grid Computing Competence Center, University of Zurich
http://www.gc3.uzh.ch/
October 3, 2012

What will we cover today?
1. What is cloud computing?
   Basic concepts
   A little taxonomy we will be using during the lectures
2. Running large scale scientific usecases on cloud
   What are the possible models/scenarios that could be used

What is cloud computing? (Wikipedia)
It is a paradigm shift whereby details are abstracted from the users who no longer need knowledge of, expertise in, or control over the technology infrastructure in the cloud that supports them. (Source: Wikipedia)

What is cloud computing? (NIST)
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. (Source: http://csrc.nist.gov/publications/nistpubs/800-145/sp800-145.pdf)

Virtualization as the foundation for resource provision
A virtual machine is taken to be an efficient, isolated duplicate of a real machine.
(Reference: G. J. Popek and R. P. Goldberg (1974): Formal Requirements for Virtualizable Third Generation Architectures. Communications of the ACM 17 (7).)

Why Virtualization?
  Execution environments with resource limits and/or resource guarantees.
  Provide secure, isolated sandboxes for running untrusted applications.
  Make systems independent of the hardware.
  Run legacy applications.
  Provide binary compatibility.
  Co-locate and consolidate independent workloads.
  Run multiple operating systems.
  Treat application suites as appliances by packaging and running each in a virtual machine.

Pros of using Virtualization
  Server Consolidation
  Testing and development
  Dynamic Load Balancing
  Disaster Recovery
  Reduction in cost of infrastructure

Cons of using Virtualization
  Magnified physical failures
  Degraded performance
  Complex root cause analysis
  New management tools

Fundamental components for a data processing infrastructure
  Computing: the Virtual Appliances where the application will run
  Image Repository: where Virtual Appliances will be stored
  Storage: object storage used for persistent data (e.g. computation results)
  Identity management: authentication system to access Virtual Appliances and deployed services

Prototype of a cloud: the OpenStack example

Prototype of a cloud: the OpenStack example
  Computing provisioning: provision and manage large networks of virtual machines
  Storage blocks: create redundant, scalable object storage
  Image repository: discovery, registration, and delivery services for virtual disk images
  Identity management: authentication system across the cloud operating system

Prototype of a cloud: the Amazon example

Prototype of a cloud: the Amazon example
  Computing provisioning: Elastic Compute Cloud (EC2)
  Storage blocks: Elastic Block Store (EBS) and Simple Storage Service (S3)
  Image repository: Amazon Machine Images (AMIs) are stored in S3 buckets
  Identity management: Identity and Access Management (IAM)

Prototype of a cloud: the Amazon example
Amazon offers a large variety of infrastructural services and has started to provide platform services like:
  Elastic MapReduce (EMR)
  Simple Data Base (SimpleDB)
  Simple Queue Service (SQS)

Classification of cloud provisioning

Infrastructure as a Service (IaaS)
Provisions and manages the physical processing, storage, networking, the hosting environment and the cloud infrastructure.
  Fully outsourced service, so businesses do not have to purchase servers, software or equipment
  Infrastructure providers can dynamically allocate resources for service providers
  Users have to create/install, manage and monitor services for IT infrastructure operations
  Examples: Amazon's EC2, RightScale, CloudSigma, FutureGrid

Classification of cloud provisioning

Platform as a Service (PaaS)
Provisions and manages cloud infrastructure and middleware; provides development, deployment and administration tools.
  Infrastructure providers can transparently alter the platforms for their customers' unique needs
  Users have to develop, test, deploy and manage applications hosted in a cloud environment
  Examples: Google App Engine, SalesForce

Classification of cloud provisioning

Software as a Service (SaaS)
Defined as service-on-demand, where a provider licenses tailored software. The provider installs, manages, maintains and supports the software application on a cloud infrastructure.
  Infrastructure providers can allow customers to run applications off their infrastructure, transparently to the end user
  Users interact with the application/service for process operations
  Examples: Gmail, Facebook, Flickr, ...

Classification of cloud provisioning
[Figure: image from NIST Cloud Computing Reference Architecture & Taxonomy Working Group.]

Cloud architectures
  Public Clouds
  Private Clouds
  Hybrid Clouds

Public Clouds
The cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
  Only operational expenses
  No control on cloud stack, dependency on external partner

Private Clouds
The cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on premise or off premise.
  The owner organization provides cloud services to its own customers
  Full control on cloud stack, accounting, allocation

Hybrid Clouds
The cloud infrastructure is a composition of two or more clouds (private and public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
  Constraints on own cloud stack: needs to interoperate with public cloud

Running large scale scientific usecases on cloud

We still need to put all pieces together
Implementing a scalable solution for a large-scale data analysis usecase still requires us to:
  Design the infrastructure
  Identify the services
  Integrate them together
This has to be done even on cloud.

We still need to put all pieces together
  How to control the execution flow on the processing instances?
  Where will data be stored?
  How will data be made available to the processing instances?
  What about the results?

An example of a common cloud usecase: Web application hosting
  Highly available and scalable
  Accommodates dense peak periods
  Stores accounting and log files in a secured location (with backups)
  Aimed at a long-lasting lifecycle (start it and let it go)

An example of a common cloud usecase: Web application hosting
1. Create a dedicated Virtual Appliance and then customize it (e.g. Amazon AMI)
2. Use a Load Balancer to distribute the load of incoming requests
3. Requests are sent across multiple computing instances (e.g. Amazon EC2)
4. Use something like Amazon Auto Scaling to automatically adjust the number of computing instances according to load conditions (in both directions: increase and reduce)
5. Resources and static content used by the web application are stored on a storage service (e.g. Amazon S3)
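The scaling rule in step 4 can be sketched as a small pure function. This is only an illustration of the idea, not how Amazon Auto Scaling is actually configured: the function name `desired_instances`, the per-instance capacity and the minimum/maximum bounds are all assumptions introduced here.

```python
import math


def desired_instances(requests_per_sec, capacity_per_instance,
                      min_instances=1, max_instances=20):
    """Return how many computing instances the current load calls for.

    All parameters are hypothetical: a real auto scaling service would
    derive the load from monitoring metrics rather than take it as input.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    # Clamp to the allowed range so the pool never drops to zero
    # and never grows without bound.
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    # Load rises, peaks, then falls: the pool grows and shrinks with it.
    for load in (30, 250, 900, 120):
        n = desired_instances(load, capacity_per_instance=100)
        print(f"{load} req/s -> {n} instance(s)")
```

The same function covers both directions mentioned in step 4: it returns a larger number when the load grows and a smaller one when it falls.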

Example from Systems Biology
ROSETTA is a software suite for predicting and designing protein structures.
  Run the ROSETTA application over 4000 proteins
  Each protein is a single file
  Each execution takes 20 minutes
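To get a feeling for the size of this usecase, the figures above can be turned into a back-of-the-envelope estimate. The sketch below assumes one execution per instance at a time and ignores instance startup and data transfer overhead; the function names are made up for illustration.

```python
import math

PROTEINS = 4000        # from the slide: one input file per protein
MINUTES_PER_RUN = 20   # from the slide: each execution takes 20 minutes


def total_cpu_hours(n_tasks=PROTEINS, minutes=MINUTES_PER_RUN):
    """Total compute time needed, independent of how many instances run."""
    return n_tasks * minutes / 60


def wall_clock_hours(n_instances, n_tasks=PROTEINS, minutes=MINUTES_PER_RUN):
    """Elapsed time with n_instances, one task per instance at a time.

    Assumption: no startup, scheduling or data transfer overhead.
    """
    rounds = math.ceil(n_tasks / n_instances)
    return rounds * minutes / 60


if __name__ == "__main__":
    print(f"total: {total_cpu_hours():.0f} CPU hours")
    for n in (1, 10, 100, 1000):
        print(f"{n:5d} instance(s): {wall_clock_hours(n):8.1f} hours")
```

The total is about 1,333 CPU hours, which is roughly 55 days on a single instance; this is why the execution models discussed next matter.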

Example from Computational Chemistry
GAMESS is a general ab initio quantum chemistry package.
  Run a new release of GAMESS over a validation suite.
  New code available on a local SVN repository.
  Validation suite composed of 44 .inp files available online.
  Application needs to be compiled.

Example from Cryptography
  Factorization of a 768-bit RSA
  range: 200M - 240M
  Each execution takes a chunk of 2000 numbers and runs on average for 4h
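The work decomposition for this usecase can be sketched as follows. The slide does not say whether the range bounds are inclusive, so this sketch assumes the half-open interval [200M, 240M); the helper `chunks` and the constant names are hypothetical.

```python
def chunks(start, stop, size):
    """Yield (first, last_exclusive) pairs covering [start, stop)."""
    for first in range(start, stop, size):
        yield first, min(first + size, stop)


# Figures from the slide; the half-open interpretation is an assumption.
START, STOP = 200_000_000, 240_000_000
CHUNK_SIZE = 2000
HOURS_PER_CHUNK = 4   # average run time of one execution

n_chunks = sum(1 for _ in chunks(START, STOP, CHUNK_SIZE))
cpu_hours = n_chunks * HOURS_PER_CHUNK

if __name__ == "__main__":
    print(f"{n_chunks} executions, about {cpu_hours} CPU hours in total")
```

Under these assumptions the range splits into 20,000 independent executions, each of which can be handed to a different computing instance.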

Webserver-like approach
1. Create a dedicated Virtual Appliance with the application binaries and all dependencies
2. The customized Virtual Appliance will also contain a simple web-server to accept requests for processing
3. Use a Load Balancer to distribute the load of incoming requests across multiple computing instances
4. Use an Auto Scaling service to adjust the number of computing instances to the number of requests (e.g. 1 request per input file to be processed)
5. Results are stored on a storage service or in a Database
6. Once all data have been processed, turn all Virtual Appliances off
7. Download results (if needed)
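The "simple web-server" of step 2 could look roughly like the sketch below, which uses only the Python standard library. The URL layout (`/process?input=...`), the `run_application` placeholder and the result naming are assumptions made for illustration; a real appliance would launch the scientific application and upload its output to the storage service.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs


def run_application(input_ref):
    """Placeholder for the real application run (hypothetical).

    Returns the location where the result would be stored.
    """
    return f"results/{input_ref}.out"


def handle_path(path):
    """Map a request path to an (HTTP status, response body) pair."""
    url = urlparse(path)
    inputs = parse_qs(url.query).get("input")
    if url.path != "/process" or not inputs:
        return 400, "expected /process?input=<file reference>\n"
    return 200, run_application(inputs[0]) + "\n"


class ProcessingHandler(BaseHTTPRequestHandler):
    """One request per input file, as suggested in step 4."""

    def do_GET(self):
        status, body = handle_path(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the server; the port is an arbitrary choice."""
    HTTPServer(("", port), ProcessingHandler).serve_forever()
```

Calling `serve()` inside the Virtual Appliance starts the listener; the Load Balancer of step 3 would then forward requests to every running instance.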

Webserver-like approach
  This approach works well for small computations (each Virtual Appliance can serve multiple simultaneous requests)
  Problems arise when large computations call for several Virtual Appliances to be instantiated

Use only a limited number of Virtual Appliances
When it is not possible to allocate at once all the computing units needed to handle the whole data set, it is possible to use a queuing system to store processing requests that will be served by the computing instances when available. In this case, the queues carry messages to be processed in an orderly fashion by the application running on the computing instance. The computing instances can read the queue, process the job, and then post the results.

Use only a limited number of Virtual Appliances
1. Use a messaging system (e.g. Amazon SQS or ActiveMQ) to queue processing requests
2. Create a dedicated Virtual Appliance with the application binaries and all dependencies
3. The customized Virtual Appliance will also contain a message consumer to process queued requests
4. Each message will contain a reference (link) to the input file to process and the expected location of the result
5. Results are stored on a storage service or in a Database
6. Once all data have been processed, turn all Virtual Appliances off
7. Download results (if needed)
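The queue-based flow above can be sketched on a single machine with the Python standard library. Here `queue.Queue` stands in for the messaging system (Amazon SQS or ActiveMQ), threads stand in for computing instances, and a dictionary stands in for the storage service; these substitutions, the message layout and the `process` placeholder are assumptions made only to keep the sketch self-contained.

```python
import queue
import threading


def process(input_ref):
    """Placeholder for the real application run (hypothetical)."""
    return f"processed {input_ref}"


def consumer(requests, storage, lock):
    """Message consumer running on each computing instance (step 3)."""
    while True:
        try:
            message = requests.get_nowait()
        except queue.Empty:
            return  # queue drained: this instance can be turned off
        result = process(message["input"])
        with lock:
            # Store the result at the location named in the message.
            storage[message["output"]] = result
        requests.task_done()


def run(n_inputs=10, n_instances=3):
    """Queue n_inputs requests and serve them with n_instances consumers."""
    requests = queue.Queue()
    for i in range(n_inputs):
        # Step 4: each message references the input and the result location.
        requests.put({"input": f"in/{i}.dat", "output": f"out/{i}.res"})
    storage, lock = {}, threading.Lock()
    workers = [
        threading.Thread(target=consumer, args=(requests, storage, lock))
        for _ in range(n_instances)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return storage


if __name__ == "__main__":
    results = run()
    print(len(results), "results stored")
```

The number of consumers is independent of the number of requests, which is the point of this model: a small, fixed pool of Virtual Appliances works through an arbitrarily long queue.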

When the processing flow has already been implemented for cluster resources
  Most scientific applications have been used on batch-based systems
  Several tools have been written to automate the execution of specific scientific applications
  Rewriting such tools to cope with cloud is out of scope for a research group
In this case, it is possible to instantiate a batch-based cluster as a collection of Virtual Appliances

Use a virtual cluster
1. Create a dedicated set of Virtual Appliances with the application binaries and all dependencies
2. The customized Virtual Appliances will also contain a Job management system
3. The Job management system controls the process of accepting, scheduling, starting, managing, and completing batch jobs
4. Interaction with the Virtual Cluster is done as with an ordinary computing cluster
5. Processing requests are defined by queuing jobs in the Job management system's queue
6. Execution needs to be supervised
7. Results are stored on a storage service or in a Database
8. Once all data have been processed, turn all Virtual Appliances off
9. Download results (if needed)
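What the Job management system does in step 3 can be illustrated with a toy simulation of first-in-first-out scheduling on a fixed number of nodes. This is a deliberately simplified sketch, not the behaviour of any particular batch system: the function name `simulate_fifo` is hypothetical, and job run times are assumed to be known in advance.

```python
import heapq


def simulate_fifo(job_minutes, n_nodes):
    """Simulate FIFO scheduling of queued jobs on n_nodes.

    job_minutes: run time of each job, in submission order.
    Returns (makespan, assignment), where assignment[i] is the node
    that ran job i and makespan is the time the last job finishes.
    """
    if n_nodes < 1:
        raise ValueError("need at least one node")
    # Heap of (time at which the node becomes free, node id).
    free_at = [(0, node) for node in range(n_nodes)]
    heapq.heapify(free_at)
    assignment = []
    makespan = 0
    for minutes in job_minutes:
        start, node = heapq.heappop(free_at)  # earliest available node
        end = start + minutes
        assignment.append(node)
        makespan = max(makespan, end)
        heapq.heappush(free_at, (end, node))
    return makespan, assignment


if __name__ == "__main__":
    # The ROSETTA usecase on a virtual cluster of 100 nodes.
    makespan, _ = simulate_fifo([20] * 4000, n_nodes=100)
    print(f"all jobs done after {makespan} minutes")
```

From the user's point of view nothing changes with respect to an ordinary cluster: jobs are queued, and the scheduler decides where and when each one runs.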

CloudBursting
Cloud bursting is a resource provisioning model in which an application that runs on an on-premise computing resource bursts into a public/private cloud when the demand for computing capacity spikes. The advantage is that an organization only pays for extra compute resources when they are needed.
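The bursting decision and the resulting cost can be sketched in a few lines. The function names, the hourly demand figures and the price per instance hour are illustrative assumptions, not data from the lecture.

```python
def split_demand(demand, local_capacity):
    """Return (units run on premise, units burst to the cloud)."""
    if demand < 0 or local_capacity < 0:
        raise ValueError("demand and local_capacity must be non-negative")
    on_premise = min(demand, local_capacity)
    return on_premise, demand - on_premise


def burst_cost(hourly_demand, local_capacity, price_per_instance_hour):
    """Cost of the cloud resources used over a sequence of hours.

    Only the demand exceeding the local capacity is paid for.
    """
    burst_hours = sum(split_demand(d, local_capacity)[1] for d in hourly_demand)
    return burst_hours * price_per_instance_hour


if __name__ == "__main__":
    demand = [40, 64, 200, 90]   # hypothetical demand per hour
    for d in demand:
        print(d, "->", split_demand(d, local_capacity=64))
    print("cloud cost:", burst_cost(demand, 64, price_per_instance_hour=1.0))
```

As long as the demand stays below the local capacity the cloud cost is zero, which is the advantage stated above.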

Worth considering
  Bear in mind that most of the mentioned components (Load Balancer, queuing system, attached storage, ...) existed long before Cloud and do not depend on any virtualized infrastructure
  Heavily used in IT infrastructures

Worth considering
  Not used in scientific computing, mostly because of lack of know-how, difficulty of deployment and configuration, difficulty of integration, and not enough control on the infrastructure.
The paradigm shift comes from the fact that, through easy provisioning of dedicated Appliances (and a robust platform that allows them to be integrated), one can design and implement scalable solutions that would have been non-trivial to deploy from scratch on one's own local infrastructure.

Homework
http://aws.amazon.com/architecture/
3 working groups
Select one illustrated usecase:
  Fault tolerance and High Availability
  Log Analysis
  Large Scale Processing and Huge Data sets
Explain the proposed solution
Criticize it
Provide an alternative