TPP On The Cloud
Joe Slagel
Lecture topics
  - Introduction to Cloud Computing and Amazon Web Services
  - Overview of TPP Cloud components
  - Setup of a trial AWS account and use of the new TPP Web Launcher for Amazon (TWA)
  - Future TPP Direction with the Cloud
So What is Cloud Computing?
"Cloud computing is Internet-based computing, whereby shared resources, software, and information are provided to computers and other devices on demand, like the electricity grid." (Wikipedia)
Three Aspects of Cloud Computing
  - SaaS: Software as a Service
      Software applications available via the browser
      E.g. Gmail, Flickr, NCBI
  - IaaS: Infrastructure as a Service
      Storage, servers, and networking components provided on demand through the Internet
      E.g. Amazon EC2 & S3, Rackspace, IBM, HP
  - PaaS: Platform as a Service
      Hosted development environment for building and deploying cloud applications
      E.g. Google Apps, Microsoft Azure, Salesforce.com
Cloud Computing Players
[Figure. Source: Gartner (June 2015)]
Amazon Web Services
  - Collection of web computing services offered by Amazon
  - Elastic IT infrastructure: allocate computers, storage, and other services as needed
  - Cost effective: pay only for what you use
  - Easy to use: simple API accessed over HTTP, usable from almost every language
  - Services include:
      Elastic Compute Cloud (EC2)
      Simple Storage Service (S3)
      Simple Queue Service
      SimpleDB
      Elastic MapReduce
      Flexible Payments Service
      AWS Import/Export
      Mechanical Turk
      And many more
  - Large number of tools built for it
Amazon Web Services Regions
Amazon S3: Simple Storage Service
  - S3 lets you store files/data on the web in buckets
  - Virtually unlimited storage, bandwidth, and number of users
  - No loss of data: 99.999999999% durability per year
  - Always on: 99.99% availability per year
  - Files can range from 1 byte to 5 gigabytes in size, with no limit on the number of files
  - Authentication mechanisms keep data secure, and access rights can be granted to specific users
  - Uses standard HTTP REST and SOAP interfaces, which work with any language

First Tier S3 Pricing
  Storage        $0.0295 per GB for the first 50 TB/month of storage used (formerly $0.150, then $0.11); 100 GB = $35/yr
  Data Transfer  In: free (formerly $0.100 per GB)
                 Out: $0.120 per GB for the first 10 TB/month (sliding scale)
  Requests       $0.01 per 1,000 PUT, COPY, POST, or LIST requests
                 $0.01 per 10,000 GET and all other requests
                 Delete requests are free
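The "100 GB = $35/yr" figure follows from the first-tier storage price quoted on this slide. A minimal sketch of that arithmetic, assuming a constant amount of stored data and ignoring transfer charges; the function names are illustrative and not part of any AWS tool:

```python
# First-tier S3 prices as quoted on the slide (USD); real prices change over time.
STORAGE_PER_GB_MONTH = 0.0295
PUT_PRICE_PER_1000 = 0.01      # PUT, COPY, POST, or LIST requests
GET_PRICE_PER_10000 = 0.01     # GET and all other requests


def s3_storage_cost_per_year(gigabytes):
    """Yearly storage cost for a constant amount of data in the first tier."""
    return gigabytes * STORAGE_PER_GB_MONTH * 12


def s3_request_cost(put_like_requests, get_like_requests):
    """Request charges; delete requests are free and are not counted."""
    return (put_like_requests / 1000 * PUT_PRICE_PER_1000
            + get_like_requests / 10000 * GET_PRICE_PER_10000)


yearly = s3_storage_cost_per_year(100)
print(f"100 GB stored for one year: ${yearly:.2f}")
```

The result is slightly above the rounded $35 shown on the slide because the slide truncates the cents.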
Amazon S3: Management Console
  - Web-based, secure tool for managing S3 storage
  - Features include:
      Create/delete buckets
      Create/delete folders
      Upload or download files
      Modify properties (permissions)
Amazon EC2: Elastic Compute Cloud
  - EC2 lets you launch new computer instances on demand, in minutes
  - Instance types range from Small to High-CPU Extra Large to Cluster Compute
  - Choose from a large assortment of OS images (Linux, Windows) or create your own Amazon Machine Image (AMI)
  - Billed only for actual usage per hour plus data transfers
  - Full control of the instance

US East EC2 Linux/Unix Pricing
  Instance type          Memory   Arch     Cores / ECU (1)          Storage   I/O        Price
  Small                  1.7 GB   32-bit   1 core, 1 ECU            160 GB    Moderate   $0.08/hr + I/O (2)
  Large                  7.5 GB   64-bit   2 cores, 2 ECU each      850 GB    High       $0.32/hr + I/O (2)
  High-CPU Extra Large   7 GB     64-bit   8 cores, 2.5 ECU each    1690 GB   High       $0.64/hr + I/O (2)

  (1) EC2 Compute Unit: one unit is equivalent to the CPU capacity of a 1.0-1.2 GHz Opteron or Xeon processor
  (2) Data transfer in is free (formerly $0.10/GB); transfer out is $0.12/GB for the first 10 TB/month
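To get a feel for these hourly rates, the sketch below estimates the compute charge for a run. It uses the three US East Linux/Unix prices from this slide and assumes that a partial hour is billed as a full instance-hour; that rounding rule and the dictionary keys are assumptions for illustration, and data transfer (I/O) charges are left out.

```python
import math

# Hourly prices from the slide (USD per instance-hour, US East, Linux/Unix).
HOURLY_RATE = {
    "small": 0.08,
    "large": 0.32,
    "high-cpu-xlarge": 0.64,
}


def ec2_compute_cost(instance_type, hours, instances=1):
    """Estimated compute charge, assuming partial hours are billed as full hours."""
    billed_hours = math.ceil(hours)
    return HOURLY_RATE[instance_type] * billed_hours * instances


# Example: five Large instances that each run for 3.7 hours are billed 4 hours each.
estimate = ec2_compute_cost("large", 3.7, instances=5)
print(f"Estimated compute charge: ${estimate:.2f}")
```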
Amazon EC2: Management Console
  - Web-based, secure tool for managing EC2
  - Start and stop EC2 instances
  - Find, manage, and create Amazon Machine Images (AMIs)
  - Monitor instances with real-time operational metrics
Getting Started: Account Creation
  - Go to https://aws.amazon.com/ and click the "Create a Free Account" button
Getting Started: AWS Console
Getting Started: Amazon Key ID and Secret Key
  - Your Amazon API key ID and secret key are used to access Amazon services programmatically
  1. Under the account menu, select Security Credentials
  2. Go to IAM Users and select a user to see its Access Key ID and keys
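Once the key ID and secret key exist, they are usually kept out of scripts and read from the environment. The sketch below assumes the conventional AWS variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; the helper names and the placeholder values are hypothetical and only for illustration.

```python
import os


def load_aws_credentials(env=None):
    """Return (key_id, secret_key) read from environment-style variables."""
    env = os.environ if env is None else env
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if not key_id or not secret:
        raise RuntimeError("Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first")
    return key_id, secret


def mask(secret, visible=4):
    """Hide all but the last few characters so a secret can be logged safely."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Demonstration with placeholder values (not real credentials).
demo_env = {
    "AWS_ACCESS_KEY_ID": "EXAMPLEKEYID",
    "AWS_SECRET_ACCESS_KEY": "example-secret-value",
}
key_id, secret = load_aws_credentials(demo_env)
print(key_id, mask(secret))
```

Treat the secret key like a password: anyone who holds it can start instances that are billed to the account.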
Lecture topics
  - Introduction to Cloud Computing and Amazon Web Services
  - Overview of new TPP Cloud components
  - Setup and trial of the new TPP Web Launcher for Amazon (TWA)
  - Future TPP Direction with the Cloud
TPP Amazon Images
  - Publicly available Amazon Machine Images (AMIs) for the TPP
  - Based on official public releases of Ubuntu
  - Contain additional open software (OMSSA, MyriMatch, etc.)
  - Publicly available scripts for building, updating, and publishing images
  - Usage instructions and details are documented on the wiki: http://tools.proteomecenter.org/wiki/index.php?title=amazon_ec2_ami
Using TPP on the Cloud
  - TPP Web Launcher for Amazon (TWA)
      Simple web-based launcher that starts Petunia on an Amazon server
      Starts up a preconfigured TPP instance
      Doesn't require any software installation and is inexpensive to run
      Great tool for just trying out TPP
      Can be used when an analysis needs more memory and a better CPU
  - TPP Amazon command line tools (amztpp)
      Advanced command line toolset
      Launches parallel searches of files across multiple nodes
      Currently supports X!Tandem, OMSSA, MyriMatch, and InsPecT
      Manages all aspects of cloud computing, including data transfer, scheduling, and instances
      Great for quickly and inexpensively processing large amounts of data
  - Direct Cloud support in TPP's user interface, Petunia
TPP Web Launcher for Amazon (TWA)
  1. Navigate to http://tools.proteomecenter.org/twa
  2. Enter your Amazon Key ID and Secret
  3. Click Start Instance
  4. Welcome to Petunia
  5. When you are done, just click Stop Instance
Using the amztpp tool
  - Requires a separate installation; it is not included in the standard TPP installation (see the wiki for instructions and post on the spc-discuss mailing list)
  - Has a simple command line interface: amztpp <options> <command> <file(s)>
  - Examples:
      amztpp xtandem *.mzml    => submit xtandem searches
      amztpp -n 5 launch       => start 5 more instances
      amztpp status            => report current status
      amztpp man               => display manual
  - Supports the X!Tandem, OMSSA, MyriMatch, and InsPecT search engines
  - Can run multiple different search engines in parallel on multiple instances
  - Data is automatically copied to the cloud, and results are downloaded when available
  - Automatically launches instances based on the amount of data being searched
  - NEW! Execute and monitor X!Tandem searches via Petunia, as well as basic cloud account administration
Amazon Web Services Cost

Hypothetical Analysis: 100 mzXML files, avg 100 MB/file, avg 10 min/file
  # EC2   Time (Hrs)   Cost
  1       16.98        $8.10
  5       3.7          $8.10
  100     0.54         $11.34
[Chart: Compute Time vs Cost; x-axis: # Instances; series: Compute Time (Hours) and Cost]

Actual Results
  Data Set   # EC2 / Threads   # Files   Scan Count   Time   Cost     Cost/file
  0021       20 / 8x           132       1,372,984    4:03   $50.05   $0.38
  0049       20 / 8x           132       2,279,874    6:08   $46.24   $0.35

AWS Cost Breakdown: Amazon EC2 99%, Amazon S3 1%, Amazon SQS 0%
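The hypothetical figures on this slide can be read as a scaling experiment. The sketch below computes speedup and parallel efficiency from the three time values, plus cost per file for the two actual runs. It uses only numbers from this slide, and the function name is illustrative.

```python
# Wall-clock hours by number of EC2 instances (hypothetical analysis on the slide).
hours_by_instances = {1: 16.98, 5: 3.7, 100: 0.54}


def scaling_report(hours_by_n):
    """Return {n: (speedup, efficiency)} relative to the single-instance run."""
    baseline = hours_by_n[1]
    report = {}
    for n, hours in hours_by_n.items():
        speedup = baseline / hours      # how many times faster than one instance
        efficiency = speedup / n        # 1.0 would be perfect linear scaling
        report[n] = (speedup, efficiency)
    return report


report = scaling_report(hours_by_instances)
for n, (speedup, efficiency) in sorted(report.items()):
    print(f"{n:>3} instances: speedup {speedup:5.1f}x, efficiency {efficiency:.0%}")

# Cost per file for the two actual runs (total cost / number of files).
for data_set, cost, files in [("0021", 50.05, 132), ("0049", 46.24, 132)]:
    print(f"data set {data_set}: ${cost / files:.2f} per file")
```

Efficiency falls as instances are added because some per-run time does not shrink with more instances. The same numbers show why the 100-instance run costs more while finishing much sooner.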
amztpp: What does it cost?

Canis lupus familiaris Data Set
  - Total of 982 raw files organized in 35 folders
      598 raw files from LTQ Orbitrap
      288 raw files from LTQ
      96 raw files from LCQ Deca
  - Searched using X!Tandem, InsPecT, MyriMatch, and OMSSA
  - Total of 3,928 searches
  - Total of 10,759,379 spectra
  - Total AWS cost of $112.74, of which 82% was EC2 instances
  - Time to completion: 5.95 hrs
  - 3,915/3,928 searches completed (99%)
  - Spot price (--ec2-spot) of $0.22 (market $0.2160)
  - Max # of EC2 instances (-m) = 100
  - Max # of parallel upload/download processes (-P) = 10

EC2
  Operation       Spot Price    Hours    Cost
  m1.xlarge       $0.216        95       $20.52
  m1.xlarge       $0.22         328      $72.16
  Subtotal                      423      $92.68

  Operation       Price         Usage    Cost
  PublicIP-In     $0.12/GB      0.0062   $0.00
  PublicIP-Out    $0.12/GB      0.0105   $0.00
  InterZone-In    $0.12/GB      0.0211   $0.00
  InterZone-Out   $0.12/GB      0.0005   $0.00
  Subtotal                               $0.00
  EC2 Total                              $92.68

S3
  Operation               Price          Usage    Cost
  PUT, COPY, POST, LIST   $0.01/1,000    11,909   $0.12
  GET, all other          $0.01/10,000   17,433   $0.02
  Data Transfer In        $-             118.08   $-
  Data Transfer Out       $0.12/GB       165.56   $19.87
  S3 Total                                        $20.00

SQS
  Operation           Price          Usage    Cost
  Requests            $0.01/10,000   58,392   $0.06
  Data Transfer In    $-             0        $-
  Data Transfer Out   $0.12/GB       0.023    $0.00
  SQS Total                                   $0.06
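The totals on this slide can be checked from its own line items. The sketch below recomputes the EC2 subtotal from the spot prices and hours, adds the reported S3 and SQS totals, and derives the EC2 share and the average cost per search. All figures come from the slide; only the variable names are invented here.

```python
# EC2 spot usage from the slide: (spot price in USD per hour, instance-hours).
ec2_usage = [(0.216, 95), (0.22, 328)]

ec2_total = round(sum(price * hours for price, hours in ec2_usage), 2)
s3_total = 20.00      # reported S3 total (requests plus data transfer out)
sqs_total = 0.06      # reported SQS total

grand_total = round(ec2_total + s3_total + sqs_total, 2)
ec2_share = ec2_total / grand_total
cost_per_search = grand_total / 3928   # 3,928 searches were submitted

print(f"EC2 total:       ${ec2_total:.2f}")
print(f"Grand total:     ${grand_total:.2f}")
print(f"EC2 share:       {ec2_share:.0%}")
print(f"Cost per search: ${cost_per_search:.3f}")
```

The recomputed grand total matches the $112.74 reported on the slide, and the EC2 share matches the stated 82%.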
Advantages of Cloud vs. Cluster

Cloud
  - Scalable
      Unlimited amount of disk space
      As many servers as needed
  - Dependable
      Large distributed system
      Secure
      Resilient
  - Platform agnostic
      CentOS, Debian, Ubuntu, Windows, etc.
      32-bit/64-bit
  - No support costs

Traditional Cluster
  - High initial startup cost
  - Limited scalability
  - Single point of failure
  - Requires local IT personnel for maintenance
  - OS/hardware lock-in
  - Requires third-party grid software (PBS, GridEngine, etc.)
  - High initial costs and variable support costs
  - Scheduling issues/complexity
  - Users compete for resources
Advantages of Cloud vs. Cluster

Cloud (drawbacks)
  - Performance issues
      Bandwidth between instances
      Bandwidth for S3 uploads/downloads
      Instance performance
      Resource allocation (instances)
      File I/O
  - Cost model
      Could get expensive
      Pay as you go means you also pay for mistakes
  - Lack of control
  - Regulatory compliance

Traditional Cluster (strengths)
  - High performance
      Can finely tune network performance
      Can finely tune hardware performance
      Exceptional hardware capabilities
      Exceptional file I/O
  - Dedicated resources
      Instances immediately available
  - Host MPI applications
  - Security
TPP Cloud Future Directions
  - Add features for persistently storing MS data and TPP results with both TWA and amztpp
  - Improve performance of file transfers to S3
  - Launch other search engines in the cloud via Petunia
  - Support more TPP programs (e.g. the Prophets)
  - Ability to share data sets on the cloud
Article in MCP describing AWS & TPP
More Information
  - TPP Cloud Services: http://tools.proteomecenter.org/wiki/index.php?title=tpp:cloud
  - Cloud computing with Amazon Web Services: http://www.ibm.com/developerworks/library/ar-cloudaws1/
  - Amazon Elastic Compute Cloud: http://aws.amazon.com/ec2/
  - Amazon Simple Storage Service: http://aws.amazon.com/s3
  - TPP Mailing List: http://groups.google.com/group/spctools-discuss