Virtual Appliances and Education in FutureGrid
Dr. Renato Figueiredo
ACIS Lab - University of Florida
Background
- Traditional ways of delivering hands-on training and education in parallel/distributed computing have non-trivial dependences on the environment
  - Difficult to replicate the same environment on different resources (e.g. HPC clusters, desktops)
  - Difficult to cope with changes in the environment (e.g. software upgrades)
- Virtualization technologies
  - Remove key software dependences
  - Allow packaging and replication of hands-on, executable educational environments
  - Can be deployed and managed with cloud technologies
Overview
- FutureGrid enables new approaches to education and training and opportunities to engage in outreach
  - Cloud, virtualization and dynamic provisioning: the environment can adapt to the user, rather than expecting the user to adapt to the environment
- FutureGrid education leverages the unique capabilities of the infrastructure and its software to:
  - Reduce barriers to entry and engage new users
  - Use encapsulated environments ("appliances") as a primary delivery mechanism for education/training modules, promoting reuse, replication, and sharing
Guiding principles
- Fidelity: activities should use full-fledged, executable software: education/training modules
  - Learn using the proper tools
- Reproducibility: creators of content should be able to install, configure, and test their modules once, and be assured of the same functional behavior regardless of where the module is deployed
  - Incentive to invest effort in developing, testing and documenting new modules
Guiding principles
- Deployability: students and users should be able to deploy modules in a simple manner, and on a variety of resources
  - Reduce barriers to entry; avoid dependences upon a particular infrastructure
- Community-oriented: modules should be simple to share, discover, reuse, and expand
  - Create conditions for viral growth
Role of clouds and portal
- Executable modules: virtual appliances
  - Deployable on FutureGrid resources
  - Deployable on other cloud platforms, as well as virtualized desktops
- Community sharing
  - Web 2.0 portal, appliance image repositories
  - An aggregation hub for executable modules and documentation
What is an appliance?
- Hardware/software appliances
  - TV receiver + computer + hard disk + Linux + user interface
  - Computer + network interfaces + FreeBSD + user interface
Virtual appliance example
- LAMP image: Linux + Apache + MySQL + PHP
[Diagram: the LAMP image is copied and instantiated on the virtualization layer to create a web server; repeating the steps yields another web server]
Educational appliances
- A flexible, extensible platform for hands-on, lab-oriented education on FutureGrid
- Support clustering of resources
  - Virtual machines + virtual networking to create sandboxed modules
  - Virtual Grid appliances: self-contained, pre-packaged execution environments
  - Group VPNs: simple management of virtual clusters by students and educators
Virtual Networking
- A single appliance encapsulates software and configuration
- Cluster/Grid/Cloud computing
  - Middleware expects a collection of machines, typically on a LAN (Local Area Network)
  - Appliances need to communicate and coordinate with each other
  - Each worker needs an IP address and uses TCP/IP sockets
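To make the last point concrete, here is a minimal sketch of why each worker needs a reachable IP address: a coordinator listens on a TCP socket and workers register with it by address. This is not code from the Grid appliance or from any middleware named in these slides. The names `run_coordinator`, `register_worker`, and `demo` are hypothetical, and the loopback address stands in for a virtual IP so that the sketch runs on a single machine.

```python
import socket
import threading


def run_coordinator(server_sock: socket.socket, expected_workers: int, seen: list) -> None:
    """Accept one connection per worker and record the name each one reports."""
    for _ in range(expected_workers):
        conn, _addr = server_sock.accept()
        with conn:
            name = conn.recv(1024).decode("utf-8")
            seen.append(name)
            conn.sendall(b"registered " + name.encode("utf-8"))


def register_worker(coordinator_ip: str, port: int, name: str) -> str:
    """Connect to the coordinator by IP address and return its reply."""
    with socket.create_connection((coordinator_ip, port), timeout=5) as sock:
        sock.sendall(name.encode("utf-8"))
        return sock.recv(1024).decode("utf-8")


def demo() -> tuple:
    """Run a coordinator and two workers on loopback (assumption: stands in for a virtual LAN)."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server_sock.listen()
    port = server_sock.getsockname()[1]

    seen = []
    coordinator = threading.Thread(target=run_coordinator, args=(server_sock, 2, seen))
    coordinator.start()

    # Workers register one after another, so the order is deterministic.
    replies = [register_worker("127.0.0.1", port, n) for n in ("worker-1", "worker-2")]

    coordinator.join()
    server_sock.close()
    return seen, replies


if __name__ == "__main__":
    print(demo())
```

On a virtual cluster, the only change to this sketch would be the address passed to `register_worker`: the coordinator's virtual IP instead of loopback.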
Virtual cluster appliances
- Virtual appliance + virtual network
[Diagram: a Hadoop + virtual network image is copied and instantiated as a virtual machine on the virtual network to create a Hadoop worker; repeating the steps yields another Hadoop worker]
Grid appliance in a nutshell
- Plug-and-play clusters with a preconfigured software environment
  - Linux + (Hadoop, Condor, MPI, ...)
  - Scripts for zero-configuration
- Hands-on examples, bootstrap infrastructure, and zero-configuration software: you're off to a quick start
Virtual Network - GroupVPN
- Setting up and managing typical VPNs can be daunting
  - VPN server(s), key distribution, NAT traversal
- GroupVPN makes it simple for users to create and manage virtual cluster VPNs
- Key insights:
  - Web 2.0 interface: create/manage user groups
  - All the complexity of setting up and managing VPN links is automated
GroupVPN Web interface
- You can request to join or create your own VPN group
  - Determines who is allowed to connect to the virtual network
- You can request to join or create your own appliance group
  - Determines priorities of users on resources owned by their groups
Deploying virtual clusters
- Same image, different VPNs
[Diagram: a Hadoop + virtual network image, combined with GroupVPN credentials (from the Web site), is copied and instantiated as a virtual machine to create a Hadoop worker on the Group VPN with virtual IP 10.10.1.1 (DHCP); repeating the steps yields another Hadoop worker with virtual IP 10.10.1.2 (DHCP)]
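The addressing shown in the diagram can be illustrated with a toy lease table. This is only a sketch of the idea that each instantiated worker receives the next free virtual address. It is not how GroupVPN's DHCP is implemented. The class name `VirtualIpPool`, the VM identifiers, and the `10.10.1.0/24` subnet are assumptions inferred from the two example addresses on the slide.

```python
import ipaddress


class VirtualIpPool:
    """Toy DHCP-like allocator: hands out host addresses of a subnet in order."""

    def __init__(self, cidr: str) -> None:
        # hosts() skips the network and broadcast addresses of the subnet.
        self._hosts = iter(ipaddress.ip_network(cidr).hosts())
        self._leases = {}

    def lease(self, vm_id: str) -> str:
        """Return the address leased to vm_id, allocating one on first request."""
        if vm_id not in self._leases:
            try:
                self._leases[vm_id] = str(next(self._hosts))
            except StopIteration:
                raise RuntimeError("no free addresses left in the pool") from None
        return self._leases[vm_id]


if __name__ == "__main__":
    # Assumed subnet; the slide only shows 10.10.1.1 and 10.10.1.2.
    pool = VirtualIpPool("10.10.1.0/24")
    print(pool.lease("hadoop-worker-a"))
    print(pool.lease("hadoop-worker-b"))
```

Asking again for a VM that already holds a lease returns the same address, which mirrors the expectation that a worker keeps its virtual IP while it remains in the group.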
Cloud deployment approach
- Generate virtual floppies
  - Through the GroupVPN and GroupAppliance Web interface
- Deploy appliance image(s)
  - FutureGrid (Nimbus/Eucalyptus), EC2
  - GUI or command line tools
  - Use APIs to copy the virtual floppy to the image
- Submit jobs; terminate VMs when done
Setting up classes
- Classes are set up and managed using the FutureGrid portal
- Project proposal: can be a class, workshop, short course, tutorial
  - Needs to be approved by the FutureGrid project to become active
- Users can be added to a project
  - Users create accounts using the portal
  - Project leaders can authorize them to gain access to resources
  - Students can then interactively use FG resources (e.g. to start VMs)
Use of FutureGrid in classes
- Cloud computing/distributed systems classes
  - U. of Florida, U. Central Florida, U. of Puerto Rico, Univ. of Piemonte Orientale (Italy), Univ. of Mostar (Croatia)
- Distributed scientific computing
  - Louisiana State University
- Tutorials, workshops:
  - Big Data for Science summer school
  - A cloudy view on computing
  - SC'11 tutorial: Clouds for science
  - Science Cloud Summer School
Cloud computing classes
- Massimo Canonico, U. Piemonte Orientale
- Difficulties to overcome:
  - Hardware issues: finding enough free physical machines able to host virtual machines
  - Software issues: time needed to install and configure as many different cloud platforms as possible
  - The university was not able to provide me with the necessary hardware and software support
- Students started to play with FutureGrid
  - After attending a few lessons, they were able to start/stop virtual instances on several cloud computing platforms
Cloud computing classes
- Students used Eucalyptus, OpenStack and Nimbus
  - Half were not computer scientists
- As FutureGrid freely shares its physical machines and cloud platforms, I decided to freely share all materials of my class
  - Handouts, configuration files and links to useful documentation are available at https://portal.futuregrid.org/contrib/cloudcomputing-class
Cloud computing classes
- Graduate-level Cloud Computing for Data-Intensive Sciences (Judy Qiu, Fall 2010)
  - Virtualization technologies and tools
  - Infrastructure as a service
  - Parallel programming (MPI, Hadoop)
- FutureGrid provided a set of software options that made it possible for students to work on different projects along the system stack
Term Projects
System stack layers: Higher Level Languages, Cloud Platform, Cloud Infrastructure, Hypervisor/Virtualization
- Dryad/DryadLINQ
  - #1 Matrix Multiplication (Swapnil, Amit, Pradnay)
  - #2 PhyloD (Ratul, Adrija, Chengming)
- Iterative MapReduce
  - #3 LDA (Changsi, Yang)
  - #4 MemCache (Saliya, Yiming, Jerome)
  - #5 Avro (Yuduo, Yuan, Patanachai)
  - #6 PageRank (Shuo-Huan, Parag)
- Cloud Infrastructure
  - #7 Nimbus, Eucalyptus (Stephen, Sonali, Shakeela)
- Cloud Storage
  - #8 Cloud Storage Survey (Xiaoming, Nixiaogang)
- Virtualization
  - #9 Hypervisor Performance Analysis Project (James, Andrew)
(Slide courtesy of Judy Qiu)
Big Data for Science
- 300+ students (200 on site from 10 institutes; 100 online)
- July 26-30, 2010 NCSA Summer School Workshop
- IU MapReduce and UF Virtual Appliance technologies are supported by FutureGrid
- http://salsahpc.indiana.edu/tutorial
- Sites: Washington University, IBM Almaden Research Center, Iowa State, University of Minnesota, Univ. Illinois at Chicago, Michigan State, Notre Dame, Penn State, Johns Hopkins, University of California at Los Angeles, Indiana University, San Diego Supercomputer Center, University of Texas at El Paso, University of Arkansas, University of Florida
(Slide courtesy of Judy Qiu)
Demonstration
- Deploying a virtual appliance node on FutureGrid (Nimbus @ Alamo)
- Connecting to the virtual machine
- Virtual networking
- Running a sample job
Demonstration
- Pre-instantiated VM to save us time:
    cloud-client.sh --conf alamo.conf --run --name grid-appliance-2.05.03.gz --hours 24
- Connect to VM
    ssh root@vmip
- Check virtual network interface
    ifconfig
- Ping other VMs in the virtual cluster
- Submit Condor job
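The slides do not show the sample Condor job itself, so the following is only a sketch of what the last step might involve: generating a small submit description file. It assumes a vanilla-universe job that runs `/bin/hostname`. The function name `condor_submit_description`, the job name, and the output file names are hypothetical.

```python
def condor_submit_description(executable: str, job_name: str, count: int) -> str:
    """Build the text of a Condor submit description for `count` identical jobs."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        # $(Process) is expanded by Condor to 0, 1, ... for each queued job.
        f"output = {job_name}.$(Process).out",
        f"error = {job_name}.$(Process).err",
        f"log = {job_name}.log",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed sample job: print the host name of whichever worker runs it.
    print(condor_submit_description("/bin/hostname", "hostname", 3), end="")
```

Saving the generated text to a file on the VM and passing that file to `condor_submit` would queue the jobs. With a job like this, each output file would name the worker that ran it, which gives a quick check that the virtual cluster is scheduling across nodes.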
Uploading and sharing images
- APIs are available to upload, customize, save, and share images
- Community education pages are available
  - FutureGrid Web portal allows users to publish their own content
  - Tutorials, presentations on Web portal; VMs on image repositories
Where to go from here?
- Tutorials on the FutureGrid and Grid appliance Web sites for various middleware stacks
  - Condor, MPI, Hadoop
- A community resource for educational virtual appliances
  - Success hinges on users effectively getting involved
  - If you are happy with the system, let others know!
  - Contribute your own content: virtual appliance images, tutorials, etc.
Questions?
- More information:
  - http://www.futuregrid.org
  - http://grid-appliance.org
- This document was developed with support from the National Science Foundation (NSF) under Grant No. 0910812 to Indiana University for "FutureGrid: An Experimental, High-Performance Grid Test-bed." Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.