HPC learning using Cloud infrastructure

HPC learning using Cloud infrastructure. Florin MANAILA, IT Architect, florin.manaila@ro.ibm.com. Cluj-Napoca, 16 March 2010

Agenda: 1. Leveraging Cloud model; 2. HPC on Cloud; 3. Recent projects: FutureGRID; 4. Open research problems; 5. Conclusions

1. Leveraging Cloud model

Cloud support beneficial to HPC: application portability; image management; simulate scaling; VM migration; better resource utilization; rapid provisioning.

Application portability. Users maintain their own virtualized OS: Linux, Windows, Solaris, ... Isolation: no conflicts at the level of the OS, libraries, software, ... A concurrent mix of different environments can run on the same physical server. Subnetwork: cluster of servers.

Image management. Capture an HPC environment: create & customize your own; vendor-supplied software stack. Usage: save, restore; share, publish; archive a legacy environment. Possible in any storage configuration: LVM, SAN LUN, VMware.

Simulate scaling. Developing and debugging large-scale execution is time consuming and leaves the system unproductive; result and execution time are not a factor at this stage. A shared cluster gives more efficient usage: there is no need to tie up dedicated resources. When ready, execute on a dedicated cluster.

VM migration. Fault tolerance: detect a pending failure (disk error, high temperature, communication error, ...) and migrate the VM to a healthy server. Load balancing. [Diagram: a VM moving between two high-end servers, server 1 and server 2, attached to a SAN.] Requirements: SAN, separate network for migration. Our observation: 300-400 ms of downtime.
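
The fault-tolerance policy on this slide (detect a pending failure, then move the VMs to a healthy server) can be sketched as follows. This is only an illustrative model, not the tooling used by the author: the `Server` class, the thresholds and the "least loaded healthy server" rule are all assumptions, and no real hypervisor API is called.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds; the slide only names the failure signals.
MAX_DISK_ERRORS = 0
MAX_TEMPERATURE_C = 75.0
MAX_COMM_ERRORS = 5


@dataclass
class Server:
    """Toy model of a physical host and the VMs it runs (all fields assumed)."""
    name: str
    disk_errors: int = 0
    temperature_c: float = 40.0
    comm_errors: int = 0
    capacity: int = 4                      # maximum number of VMs per host
    vms: list = field(default_factory=list)


def failure_pending(server):
    """True if any monitored signal suggests the host may fail soon."""
    return (server.disk_errors > MAX_DISK_ERRORS
            or server.temperature_c > MAX_TEMPERATURE_C
            or server.comm_errors > MAX_COMM_ERRORS)


def plan_migrations(servers):
    """Move every VM off unhealthy hosts; return (vm, source, target) tuples."""
    healthy = [s for s in servers if not failure_pending(s)]
    moves = []
    for source in servers:
        if not failure_pending(source):
            continue
        for vm in list(source.vms):
            candidates = [h for h in healthy if len(h.vms) < h.capacity]
            if not candidates:
                break                      # no spare capacity: VM stays put
            target = min(candidates, key=lambda h: len(h.vms))
            source.vms.remove(vm)
            target.vms.append(vm)
            moves.append((vm, source.name, target.name))
    return moves


if __name__ == "__main__":
    hot = Server("server1", temperature_c=90.0, vms=["vm-a", "vm-b"])
    ok1 = Server("server2", vms=["vm-c"])
    ok2 = Server("server3")
    for move in plan_migrations([hot, ok1, ok2]):
        print(move)
```

In a real deployment the move itself would be a live migration over the dedicated migration network, with the VM disk on the shared SAN so that only memory state has to be copied.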

Better resource utilization. Consolidate multiple VMs on the same node. Multicore CPUs in high-end servers have idle time due to disk I/O and blocking send/receive. Network bandwidth: applications have different communication requirements. Tradeoff: flexibility in job time; more users, full system utilization.

Rapid provisioning. New VM in about 2-3 minutes. No work by the IT team: a cluster is created automatically when requested and freed automatically when done. With a self-service portal: change thinking about IT infrastructure; enable more experimentation; vehicle for teaching.

Concerns about Cloud. Overhead: virtualization cost in computation, memory, disk I/O, networking. Reliable performance: resource sharing among VMs; communication capacity.

2. HPC on Cloud

Typical use cases. HPC environment: resources dedicated (physical); applications tuned to the environment; batch scheduled (MOAB + xCAT). Public Cloud: resources virtualized and shared; personalized environment; runs continuously, with create/start/stop by the user.

Developing & running HPC applications. Dedicated cluster. Virtualized environment. Custom scheduler. Grid model: install grid management software; launch applications. Map Reduce model: install Hadoop or use a ready-made template; launch applications.

Map Reduce, a new computation model. Model developed by Google: 1. map: transform the input into (key, value) pairs; 2. reduce: combine the set of (key, value) pairs that share a key into one. Hadoop runtime developed by Yahoo: distribution; redundant computation (fault tolerant); file system for distributed data. A simple distributed computing model, applicable to many applications: log processing, Web index building.

Word Count Dataflow (Hairong Kuang, Yahoo)

Word Count Example (Hairong Kuang, Yahoo). Mapper: input value: lines of text of the input; output key: word, value: 1. Reducer: input key: word, value: set of counts; output key: word, value: sum. Launching program: defines the job; submits the job to the cluster.
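
The word count example above can be sketched in plain Python to show how the three pieces fit together. This is an in-process illustration, not Hadoop code: the function names (`mapper`, `shuffle`, `reducer`, `word_count`) are assumptions, and the `shuffle` step stands in for the grouping by key that the runtime performs between map and reduce.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group values by key, as the runtime does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(word, counts):
    """Reduce step: sum the set of counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Launching program: defines the job and runs it (here, in one process)."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The fox"]))
```

On a cluster the mappers and reducers run on different nodes and the grouped data moves over the network; the per-record logic stays the same.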

Current Hadoop projects (http://wiki.apache.org/hadoop/poweredby). Yahoo: more than 25,000 computers running Hadoop (100,000 CPUs). Largest cluster: 4,000 nodes (2*4cpu boxes, 4*1TB disk & 16GB RAM). Supports research for Ad Systems and Web Search. Scaling tests to support development of Hadoop on larger clusters.
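
The per-node figures quoted for the largest cluster can be multiplied out to get a feel for its aggregate size. The sketch below assumes that "2*4cpu" means two sockets with four cores each, and that data is stored with a replication factor of 3 (the HDFS default); neither assumption is stated on the slide.

```python
def cluster_totals(nodes=4_000,
                   sockets_per_node=2,     # assumed reading of "2*4cpu"
                   cores_per_socket=4,     # assumed reading of "2*4cpu"
                   disks_per_node=4,
                   disk_tb=1,
                   ram_gb_per_node=16,
                   replication=3):         # assumed: HDFS default replication
    """Aggregate capacity of a cluster built from identical nodes."""
    raw_disk_tb = nodes * disks_per_node * disk_tb
    return {
        "cores": nodes * sockets_per_node * cores_per_socket,
        "raw_disk_tb": raw_disk_tb,
        "usable_disk_tb": raw_disk_tb / replication,
        "ram_gb": nodes * ram_gb_per_node,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions the cluster has 32,000 cores, 16,000 TB of raw disk and 64,000 GB of RAM.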

3. Recent projects: FutureGRID

FutureGrid. The goal of FutureGrid is to support research on the future of distributed, grid, and cloud computing. FutureGrid will build a robustly managed simulation environment or testbed to support the development and early use in science of new technologies at all levels of the software stack, from networking to middleware to scientific applications. The environment will mimic TeraGrid and/or general parallel and distributed systems. FutureGrid is part of TeraGrid and one of two experimental TeraGrid systems (the other is GPU). This testbed will succeed if it enables major advances in science and engineering through collaborative development of science applications and related software. FutureGrid is a (small, 5600-core) Science/Computer Science Cloud, but it is more accurately a virtual-machine-based simulation environment.

FutureGrid Hardware

FutureGrid Usage Scenarios. Developers of end-user applications who want to develop new applications in cloud or grid environments, including analogs of commercial cloud environments such as Amazon or Google: Is a Science Cloud for me? Is my application secure? Developers of end-user applications who want to experiment with multiple hardware environments. Grid/Cloud middleware developers who want to evaluate new versions of middleware or new systems. Networking researchers who want to test and compare different networking solutions in support of grid and cloud applications and middleware. (Some types of networking research will likely best be done through the GENI program.) Education as well as research. Interest in performance means that bare-metal access is important.

Education and Training. Experimental work is important in contemporary distributed systems research, and this also needs to be addressed in education, as a complement to fundamental theory. FutureGrid: a testbed for experimentation and collaboration around new architectures. Education and training are key to: enabling new users to get started quickly; enabling students to experiment with FutureGrid technologies, from core to bleeding-edge; fostering dissemination of FutureGrid architectures.

Goals and Approach. A flexible, extensible platform for hands-on, lab-oriented education on FutureGrid. Focus on usability: lowering barriers to entry; plug and play, open source. Apply virtualization and social networking technologies to create educational sandboxes. Virtual Grid appliances: self-contained, prepackaged execution environments. Group VPNs: simple management of virtual clusters by students and educators.

Background. Virtual appliances encapsulate a software environment in an image: virtual disk file(s) and virtual hardware configuration. The Grid appliance at UF encapsulates cluster software environments; current examples: Condor, MPI, Hadoop. Homogeneous images at each node. Virtual LAN connecting nodes to form a cluster. Deploy within or across domains.

Appliance Deployments. PlanetLab overlay: ~450 nodes, 24/7 on a shared infrastructure. Archer cluster ramp-up: UFL, NEU, UMN, UTA, FSU.

Appliance interface. VM; hardware configuration; user files; domain tools; Linux + IPOP + Condor.

Connecting virtual nodes. The virtual LAN is a self-configuring VPN: an IP-over-P2P (IPOP) overlay network. It brings flexibility and usability to the setup and management of a virtual cluster. Users can configure and manage their own VPN groups using simple interfaces. GroupVPN: all-to-all connectivity within a group; membership managed by the group owner. Web interface: Grid appliance; FutureGrid, or deploy your own (WebUI appliance: ongoing work).

GroupVPN Example: Archer. 1. Download appliance: free, pre-packaged Archer virtual appliances that run on free VMMs (VMware, VirtualBox, KVM). 2. Create/join VPN group; download config. 3. Boot appliances: automatic connection to the group VPN; self-configuring DHCP. Middleware: Condor scheduler, NFS file systems. CMS, Wiki, YouTube: community-contributed content (applications, datasets, tutorials). Archer Global Virtual Network: Archer seed resources, 450 cores, 5 sites.

References/Contact. FutureGrid (http://www.futuregrid.org). Grid appliances used in FutureGrid education (http://www.grid-appliance.org). HPC Group Europe (http://www.hpc-g.eu.org).

4. Open research problems

Open research related to HPC. Networking latency/bandwidth: reduce overhead, better sharing. Scheduling: gang scheduling for VMs. Placement algorithm: optimize for power consumption, utilization. Migrate applications to a new computation model: Map Reduce.
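
As a point of reference for the placement problem, the sketch below packs VMs onto as few nodes as possible, so that idle nodes could be powered down. It uses first-fit decreasing, a classic bin-packing heuristic chosen here only for illustration; the slide does not name an algorithm, and the function name `place_vms` and the single "cores" dimension are assumptions.

```python
def place_vms(vm_cores, node_cores):
    """First-fit decreasing placement.

    vm_cores:   dict mapping VM name to the number of cores it needs
    node_cores: number of cores on every (identical) node
    Returns a list of nodes, each a list of the VM names placed on it.
    """
    nodes = []  # each entry: {"free": remaining cores, "vms": [names]}
    # Place the largest VMs first; they are the hardest to fit.
    ordered = sorted(vm_cores.items(), key=lambda item: item[1], reverse=True)
    for name, cores in ordered:
        if cores > node_cores:
            raise ValueError(f"{name} needs {cores} cores, node has {node_cores}")
        for node in nodes:
            if node["free"] >= cores:      # first node with enough room
                node["free"] -= cores
                node["vms"].append(name)
                break
        else:                              # no existing node fits: open a new one
            nodes.append({"free": node_cores - cores, "vms": [name]})
    return [node["vms"] for node in nodes]


if __name__ == "__main__":
    demand = {"a": 4, "b": 4, "c": 2, "d": 2, "e": 2, "f": 2}
    print(place_vms(demand, node_cores=8))
```

A production placement algorithm would also have to weigh memory, network traffic between VMs and migration cost, which is part of what makes this an open research problem.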

Research on networking. Virtualization support in advanced NICs, e.g. InfiniBand. Shift virtualization from software to firmware. Reduce latency by 50% and double throughput. Virtual machines can support the I/O requirements of HPC applications without impact.

5. Conclusions

Some successful educational Clouds. Google Cloud: built by the IBM HiPODS team, supported by NSF for university research. University-based Cloud: Virtual Computing Lab, North Carolina State University; collaboration with IBM University Relations; Apache open source. Lightweight infrastructure, effective for teaching purposes.

Current state. Concerns and perception: virtualization overhead is not significant; relatively reliable performance can be achieved. Many alternatives exist to organize Cloud + HPC. Continuing research, optimization.

Challenge of HPC administration. Dynamic computing power/platform requests. Maintenance cost increase: staff, monitoring, management, etc. Flexible infrastructure requests. Need a dynamic & flexible system to meet user needs faster! Need to improve computing effectiveness! Need a shared & simple management system!

Compelling arguments for Cloud. Advantages: application portability; image management; simulate scaling; VM migration; better resource utilization; rapid provisioning. Impact: easier to use and manage for both users and administrators; interesting new research areas.

Thank you! 39 Cloud Computing 6/30/2009