CloudMan cloud clusters for everyone

Similar documents
Get your own Galaxy within minutes

Building the Genomics Virtual Lab

What is Cloud Computing? What are the Private and Public Clouds? What are IaaS, PaaS, and SaaS? What is the Amazon Web Services (AWS)?

Introduction to Cloud Computing

OPENSTACK: THE OPEN CLOUD

BIG DATA COURSE CONTENT

Google Cloud Platform for Systems Operations Professionals (CPO200) Course Agenda

Backtesting in the Cloud

Basics of Cloud Computing Lecture 2. Cloud Providers. Satish Srirama

Bringing OpenStack to the Enterprise. An enterprise-class solution ensures you get the required performance, reliability, and security

AWS Solution Architecture Patterns

Cloud Computing II. Exercises

Amazon Web Services (AWS) Solutions Architect Intermediate Level Course Content

Basics of Cloud Computing Lecture 2. Cloud Providers. Satish Srirama

Tokalabs LaunchStation. Software Defined Solution for Inventory Management, Topology Creation, Test Automation, & Resource Utilization

Upcoming Services in OpenStack Rohit Agarwalla, Technical DEVNET-1102

Introduction to Amazon EC2 Container Service (Amazon ECS) Hands On Lab

DNA Sequence Bioinformatics Analysis with the Galaxy Platform

At Course Completion Prepares you as per certification requirements for AWS Developer Associate.

Design Patterns for the Cloud. MCSN - N. Tonellotto - Distributed Enabling Platforms 68

NBIC Cloud. Mattias de Hollander David van Enckevort Leon Mei Rob Hooft

Automated Deployment of Private Cloud (EasyCloud)

INDIGO-DataCloud Architectural Overview

How can you implement this through a script that a scheduling daemon runs daily on the application servers?

LaunchStation Controller

Building a Data-Friendly Platform for a Data- Driven Future

Galaxy. Daniel Blankenberg The Galaxy Team

Software as a Service (SaaS) Platform as a Service (PaaS) Infrastructure as a Service (IaaS)

Red Hat OpenStack Platform 10 Product Guide

Container-Native Storage

Scale-out Storage Solution and Challenges Mahadev Gaonkar igate

Isolation Forest for Anomaly Detection

EDB Ark 2.0 Release Notes

Cloud Computing. Amazon Web Services (AWS)

Demystifying the Cloud With a Look at Hybrid Hosting and OpenStack

AWS Setup Guidelines

Adding Transparency and Automation into the Galaxy Tool Installation Process

Why software defined storage matters? Sergey Goncharov Solution Architect, Red Hat

SUG Breakout Session: OSC OnDemand App Development

arxiv: v1 [cs.dc] 7 Apr 2014

Galaxy. Data intensive biology for everyone. / #usegalaxy

Reproducible & Transparent Computational Science with Galaxy. Jeremy Goecks The Galaxy Team

LINUX, WINDOWS(MCSE),

Cloud Essentials for Architects using OpenStack

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Cloud Infrastructure for Research Computing and Laboratory Environment. Bach Dániel, Geist Éva, Guba Sándor, Imre Szeberényi

RED HAT CEPH STORAGE ROADMAP. Cesar Pinto Account Manager, Red Hat Norway

We are ready to serve Latest IT Trends, Are you ready to learn? New Batches Info

OPENSTACK PRIVATE CLOUD WITH GITHUB

Deploy Mediawiki using Fiware Lab facilities

High-performance computing on Microsoft Azure: GlusterFS

Managing Deep Learning Workflows

Galaxy Project Update

Amazon Web Services Training. Training Topics:

Oracle Cloud IaaS: Compute and Storage Fundamentals

Next Generation Storage for The Software-Defned World

High Performance and Cloud Computing (HPCC) for Bioinformatics

Large Scale Computing Infrastructures

CIT 668: System Architecture. Amazon Web Services

Puppet on the AWS Cloud

Deploying a distributed application with OpenStack

Automated Deployment of Private Cloud (EasyCloud)

Orchestrating the Continuous Delivery Process

Oracle Cloud Using Oracle Big Data Cloud. Release 18.1

Overview of Data Services and Streaming Data Solution with Azure

Amazon Web Services (AWS) Training Course Content

Baremetal with Apache CloudStack

ElasterStack 3.2 User Administration Guide - Advanced Zone

Module Day Topic. 1 Definition of Cloud Computing and its Basics

Automation with Meraki Provisioning API

ADABAS & NATURAL 2050+

App Orchestration 2.0

Faculté Polytechnique

OpenNebula 4.4 Quickstart CentOS 6 and ESX 5.x. OpenNebula Project

Advanced Continuous Delivery Strategies for Containerized Applications Using DC/OS

Netflix OSS Spinnaker on the AWS Cloud

Designing MQ deployments for the cloud generation

AT&T Flow Designer. Current Environment

Quick Install for Amazon EMR

Introduction to Cloudbreak

What is Cloud Computing? Cloud computing is the dynamic delivery of IT resources and capabilities as a Service over the Internet.

Web-Based Visualization and Visual Analysis for High-Throughput Genomics. Jeremy Goecks! Computational Biology Institute

Lambda Architecture for Batch and Stream Processing. October 2018

Azure Certification BootCamp for Exam (Developer)

What s new in MicroStrategy on AWS

GlusterFS Architecture & Roadmap

Automating Elasticity. March 2018

HPE Digital Learner OpenStack Content Pack

Hadoop. Introduction / Overview

NA120 Network Automation 10.x Essentials

JEUDI 19 NOVEMBRE 2015

opennebula and cloud architecture

Top 40 Cloud Computing Interview Questions

DISTRIBUTED SYSTEMS [COMP9243] Lecture 8a: Cloud Computing WHAT IS CLOUD COMPUTING? 2. Slide 3. Slide 1. Why is it called Cloud?

modencode Galaxy: Uniform ChIP-Seq Processing Tools for modencode and ENCODE Data

Orchestrating Big Data with Apache Airflow

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

About Intellipaat. About the Course. Why Take This Course?

CloudCenter for Developers

VMware vcloud Director for Service Providers

Transcription:

CloudMan cloud clusters for everyone Enis Afgan usecloudman.org

This is accessibility!

But only sometimes

So, there are alternatives

BUT WHAT IF YOU WANT YOUR OWN, QUICKLY

The big picture A. Users in different labs B. Isolated cluster instances C. Dense data center SaaS CloudMan IaaS

What is CloudMan? A cloud manager that orchestrates all of the steps required to provision, manage, and share a compute platform on a cloud infrastructure, all through a web browser.

More specifically CloudMan allows one to create a compute cluster in the cloud, use pre-configured applications, or add one s own. And then share it all.

Where is it used? Galaxy project Genomics Virtual Laboratory (GVL) Endocrine Genomics Virtual Laboratory (endovl) Neuroimaging Virtual Lab Human Communications Services VL NBIC's SARA cloud JCVI, MSI, Harvard Medical School

Deploying a CloudMan Platform 1. An account on the supported cloud 2. Start a master instance via BioCloudCentral.org or the cloud web console 3. Use the CloudMan web interface on the master instance to manage the platform

Starting an Instance Launcher choices usegalaxy.org/cloudlaunch launch.genome.edu.au biocloudcentral.org Cloud dashboard / console Script it via API

Configuring a cluster

Manage Your Cluster

Remote desktop Use VNC server on the instance In-browser access via novnc Just point your browser to <inst IP>:6080 Early days for the feature

Beyond GUI API interface Goal is to enable creation of automated and scalable pipelines while hiding infrastructure details CloudMan as an infrastructure manager Galaxy as a workflow execution engine

BioBlend is a Python library which wraps the Galaxy API and the CloudMan API

http://bioblend.readthedocs.org/

Beyond GUI #2 Command line access (with sudo access) $ ssh ubuntu@<instance IP address> $ sudo -s Run jobs on a cluster $ qsub job_script_from_any_sge_cluster.sh $ qstat -f Use Galaxy tools and reference genome data $ cd ~/gvl_commandline_utilities $ sh run_all.sh $ module avail

CloudMan Platform Features A complete solution for instantiating, running and scaling cloud resources Get a scalable compute cluster: SGE, Hadoop, HTCondor Get an automatically configured Galaxy application Scope of tools and reference datasets exceed Galaxy Main Deployment on AWS, OpenStack, Eucalyptus, and OpenNebula clouds Automated configuration for machine image, tools, and data Wizard-guided startup: requires no computational expertise, no infrastructure, no software Self-contained deployment Ability to re-launch clusters after periods of inactivity Elastic resource scaling: manual or automatic On AWS, support for Spot instances Dynamic persistent storage Use any S3 bucket as a local file system Use managed NFS / Gluster as a file system Use archive URL to download arbitrary file system Share your instance: including all customizations (data, tools & configurations) Easily replicate the EXACT environment

Value Added Features Customizing, Sharing, Scaling

Customize Your Instance Each CloudMan instance is self-contained, meaning that it can be built upon If a tool is missing, simply install it Readily integrates with the Tool Shed Install your tool and make it available With all the configurations and sample data

Share Your Instance Share entire (Galaxy) CloudMan platform Even the customized ones (including data and/or tools) Fully automated solution Publish a self-contained analysis Analyses in progress or complete

Instance sharing

Scaling the infrastructure with the computation

Fixed cluster size Exercising elasticity with Auto-Scaling 5 nodes Computation time: 9 hrs Computation cost: $20 Computation time: 6 hrs 20 nodes Computation cost: $50 Dynamic cluster size 1 to 16 nodes Computation time: 6 hrs Computation cost: $20

Underlying architecture Deploying, coding, extending

System architecture Block storage FS archive Inst. storage or Snap1 Snap.. FS 1 FS... or Managed FS 5 1 Management Console 2 Contextualize image 3 6, 8 Start CloudMan Setup services 7 9 4 Persistent data repository S3/Swift 10 Application(s) (eg, Galaxy) 11 CloudMan MI CloudMan MI CloudMan MI CM-w CM-w CM-w CloudMan machine image CloudMan instance......

Data incarnations Swift container https://swift.rc.nectar.org.au:8888/v1/ AUTH_377/cloudman-os/galaxyFS-2.13.tar.gz Volume snapshots snap-a734hin1 Managed Gluster file system 115.146.85.193:/gvl-vol-replicated wget & untar vol-82ojids2 Block storage mount & nfs share FS archive Inst. storage or Snap1 Snap.. FS 1 FS... or Managed FS

CloudMan infrastructure requirements Customizable machine image Support for instance user data Persistent object store Data volumes and (shareable) volume snapshots Resource metadata (ie, tags)

Building the components Leverage CloudBioLinux build framework A number of flavors exist Core CloudMan image Base Galaxy image Full CloudBioLinux image Semi-automated process See Ron Horst s talk tomorrow

CloudMan-as-a-Platform User-specific platform Application = Infrastructure + execution + + Applications Data environment Easily adaptable Customizable Configurable, Sharable, Scalable Enable easy creation of user-specific cloud platforms Couple the infrastructure, functional application execution environments, applications, and data into a single unit that can easily be used and manipulated by a user.

Extending the platform Batch Cluster jobs Cloudgene MapReduce tools Galaxy Traditional tools CloudMan platform SGE Hadoop HTCondor Spark... AWS OpenStack Eucalyptus OpenNebula...

Packaged platform enables reproducibility Move or share Move or share User-specific platform App App App App Platform A? Platform A required CloudMan CloudMan Cloud provider X Cloud provider Y Cloud provider X Cloud provider Y

- Project start as part of Galaxy - Initial version available on AWS - Support for instance sharing - AWS spot instance support - Use AWS S3 as a local file system * - Versioned deployments - Galaxy & object store integration - Support for cloud bursting 2010 2011 2012 2013 2014 - Expansion of available tools - Deployment automation - Infrastructure scaling: compute and storage - Multi-cloud support: OpenStack, OpenNebula, Eucalyptus - Support for Hadoop jobs, HTCondor - API * planned

Acknowledgments