More AWS, Serverless Computing and Cloud Research

Basics of Cloud Computing Lecture 7 More AWS, Serverless Computing and Cloud Research Satish Srirama

Outline More Amazon Web Services More on serverless computing Cloud based Research @ Mobile & Cloud Lab 3/27/2018 Satish Srirama 2/34

Cloud Providers and Services we Amazon Web Services Amazon EC2 Amazon S3 Amazon EBS Amazon Elastic Load Balancing Amazon Auto Scale Amazon CloudWatch Eucalyptus OpenStack SciCloud Management providers AWS Management Console OpenStack Horizon RightScale PaaS Google AppEngine Windows Azure already discussed 3/27/2018 Satish Srirama 3/34

MORE AWS 3/27/2018 Satish Srirama 4

AWS we discuss AWS Management Console AWS Identity and Access Management AWS CloudFormation Amazon Elastic MapReduce 3/27/2018 Satish Srirama 5/34

AWS Management Console Hope some of you have started using Amazon accounts You can manage your complete Amazon account with management console AMI Management Instance Management Security Group Management Elastic IP Management Elastic Block Store Key Pair management etc. Have different panes for different services 3/27/2018 Satish Srirama 6/34

AWS Management Console - screenshot https://console.aws.amazon.com/

AWS Identity and Access Management (IAM) How can an enterprise or group of people use a single credit card? Manage IAM users Create new users and manage them Create groups Manage permissions Creating policies Manage credentials Create and assign temporary security credentials 3/27/2018 Satish Srirama 8/34

IAM policy Example policy giving access to complete EC2 http://aws.amazon.com/iam/ 3/27/2018 Satish Srirama 9/34

AWS CloudFormation Provides an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion It is based on templates model Templates describe the AWS resources, the associated dependencies, and runtime parameters to run an app. The templates describe stacks, which are set of software and hardware resources. Something similar to CloudML and RightScale server templates Hides several details How the AWS services need to be provisioned Subtleties of how to make those dependencies work. 3/27/2018 Satish Srirama 10/34

AWS CloudFormation Amazon provides several pre-built templates to start common apps as: WordPress (blog) LAMP stack Gollum (wiki used by GitHub) There is no additional charge for AWS CloudFormation. You pay for AWS resources (e.g. EC2 instances, Elastic Load Balancers, etc.) http://aws.amazon.com/cloudformation/ 3/27/2018 Satish Srirama 11/34

Amazon Elastic MapReduce Last lecture already discussed it as Data Processing as a Service Web interface and command-line tools for running Hadoop jobs on EC2 Data stored in Amazon S3 Monitors job and shuts machines after use Running a job Upload job jar & input data to S3 Create the cluster Create a Job Flow as steps Wait for the completion and examine the results http://aws.amazon.com/elasticmapreduce/ 3/27/2018 Satish Srirama 12/34

Other interesting AWS Amazon Relational Database Service Provides access to the capabilities of familiar database engines MySQL, Oracle or Microsoft SQL Server NoSQL databases Simple DB DynamoDB AWS Snowball & Snowmobile Exabyte-scale data transfer services used to move extremely large amounts of data to AWS 3/27/2018 Satish Srirama 13/34

A BIT MORE ON SERVERLESS COMPUTING 3/27/2018 Satish Srirama 14

3/27/2018 Satish Srirama 15

Serverless computing - continued Newer workloads are a better fit for event driven programming Execute application logic in response to database triggers Execute app logic in response to sensor data Execute app logic in response to scheduled tasks etc. Applications are charged by compute time (millisecond) rather than by reserved resources Greater linkage between cloud resources used and business operations executed Serverless in a nutshell Event-action platforms to execute code in response to events 3/27/2018 Satish Srirama 16/34

Current platforms for serverless 3/27/2018 Satish Srirama 17/34

Apache OpenWhisk Initiated by IBM but now an Apache project Open source cloud platform Serverless deployment and operations model Optimal utilization and granular pricing Scales on a per-request basis Supports JS, Swift, Python, Java, Docker 3/27/2018 Satish Srirama 18/34

Triggers, actions, rules (and packages) Services or data sources define the events they emit as triggers Developers associate the actions to handle the events via rules Packages are a shared collection of Triggers and Actions 3/27/2018 Satish Srirama 19/34

Triggers and Rules Trigger examples changes to database records IoT sensor readings that exceed a certain temperature new code commits to a GitHub repository simple HTTP requests from web or mobile apps Rule is an association of a trigger to an action Many to many mapping 3/27/2018 Satish Srirama 20/34

Actions They can be small snippets of JavaScript or Swift code custom binary code embedded in a Docker container Instantly deployed and executed whenever a trigger fires It is also possible to directly invoke an action by using the OpenWhisk API, CLI, or ios SDK A set of actions can also be chained CLI Command line interface 3/27/2018 Satish Srirama 21/34

Actions - continued Create an action : wsk action create firstaction Hello-Python.py Invoke an action : wsk action invoke firstaction-- result Update an action : wsk action update firstaction Hello-Python.py 3/27/2018 Satish Srirama 22/34

System overview https://github.com/apache/incubator-openwhisk/blob/master/docs/about.md 3/27/2018 Satish Srirama 23/34

The internal flow of processing For more information read https://github.com/apache/incubator-openwhisk/blob/master/docs/about.md

CLOUD BASED RESEARCH @ MOBILE & CLOUD LAB 3/27/2018 Satish Srirama 25

Scientific Computing on the Cloud Public clouds provide very convenient access to computing resources On-demand and in real-time As long as you can afford them High performance computing (HPC) on cloud Virtualization and communication latencies are major hindrances [Srirama et al, SPJ 2011; Batrashev et al, HPCS 2011] Things have improved significantly over the years Containerization is a good alternative Research at scale Cost-to-value of experiments 3/27/2018 Satish Srirama 26/34

Communication pattern of Cluster vs Cloud Cloud has huge troubles with communication/transmission latencies Virtualization technology is the culprit Performance Comparison of virtual machines and Linux containers (e.g. Docker) [Srirama et al, SPJ 2011] 3/27/2018 Satish Srirama 27/34

Migrating Scientific Workflows to the Cloud Workflow can be represented as weighted directed acyclic graph (DAG) Partitioning the workflow into groups with graph partitioning techniques [Srirama and Viil, HPCC 2014] Such that the sum of the weights of the edges connecting to vertices in different groups is minimized Utilized Metis multilevel k-way partitioning Scheduling the workflows with tools like Pegasus Considered peer-to-peer file manager (Mule) for Pegasus A framework is developed to help the scientist with the procedure [Viil and Srirama, JSC 2018]

Adapting Computing Problems to Cloud Reducing the algorithms to cloud computing frameworks such as MapReduce [Srirama et al, FGCS 2012] Designed a classification on how the algorithms can be adapted to MR Algorithm single MapReduce job Monte Carlo, RSA breaking Algorithm nmapreduce jobs CLARA (Clustering), Matrix Multiplication Each iteration in algorithm single MapReduce job PAM (Clustering) Each iteration in algorithm nmapreduce jobs Conjugate Gradient Applicable especially for Hadoop MapReduce 3/27/2018 Satish Srirama 29/34

Issues with Hadoop MapReduce It is designed and suitable for: Data processing tasks Embarrassingly parallel tasks Has serious issues with iterative algorithms Long start up and clean up times ~17 seconds No way to keep important data in memory between MapReduce job executions At each iteration, all data is read again from HDFS and written back there at the end Results in a significant overhead in every iteration 3/27/2018 Satish Srirama 30/34

Alternative Approaches Restructuring algorithms into non-iterative versions CLARA instead of PAM [Jakovits & Srirama, Nordicloud 2013] Alternative MapReduce implementations that are designed to handle iterative algorithms [Jakovits and Srirama, HPCS 2014] E.g. Twister, HaLoop, Spark [Distributed Data Processing on the Cloud - LTAT.06.005 (Fall 2018)] Alternative distributed computing models Bulk Synchronous Parallel model [Valiant, 1990] [Jakovits et al, HPCS 2013] Building a fault-tolerant BSP framework (NEWT) [Kromonov et al, HPCS 2014] 3/27/2018 Satish Srirama 31/34

This week in lab Continue working with Google AppEngine 3/27/2018 Satish Srirama 32/34

Next Lecture Continue with Research on cloud 3/27/2018 Satish Srirama 33/34

References Check Amazon videos and webinars at http://aws.amazon.com/resources/webinars/ Mike Roberts, Serverless Architectures, https://martinfowler.com/articles/serverless.html Abel Avram, FaaS, PaaS, and the Benefits of the Serverless Architecture, https://www.infoq.com/news/2016/06/faas-serverless-architecture Apache OpenWhisk - https://github.com/apache/incubator-openwhisk/blob/master/docs/about.md List of Publications - Satish Narayana Srirama - http://math.ut.ee/~srirama/publications.html [Srirama et al, FGCS 2012] S. N. Srirama, P. Jakovits, E. Vainikko: Adapting Scientific Computing Problems to Clouds using MapReduce, Future Generation Computer Systems Journal, 28(1):184-192, 2012. Elsevier press. DOI 10.1016/j.future.2011.05.025. [Jakovits and Srirama, HPCS 2014] P. Jakovits, S. N. Srirama: Evaluating MapReduce Frameworks for Iterative Scientific Computing Applications, The 2014 (12th) International Conference on High Performance Computing & Simulation (HPCS 2014), July 21-25, 2014, pp. 226-233. IEEE. [Kromonov et al, HPCS 2014] I. Kromonov, P. Jakovits, S. N. Srirama: NEWT - A resilient BSP framework for iterative algorithms on Hadoop YARN, The 2014 (12th) International Conference on High Performance Computing & Simulation (HPCS 2014), July 21-25, 2014, pp. 251-259. IEEE. [Srirama and Viil, HPCC 2014] S. N. Srirama, J. Viil: Migrating Scientific Workflows to the Cloud: Through Graphpartitioning, Scheduling and Peer-to-Peer Data Sharing, 16th Int. Conf. on High Performance and Communications (HPCC 2014) workshops, August 20-22, 2014, pp. 1137-1144. IEEE. [Viil and Srirama, JSC 2018] J. Viil, S. N. Srirama: Framework for Automated Partitioning and Execution of Scientific Workflows in the Cloud, The Journal of Supercomputing, ISSN: 0920-8542, 2018. Springer. DOI: 10.1007/s11227-018-2296-7 (In Print) [Jakovits and Srirama, Nordicloud 2013] P. Jakovits, S. N. Srirama: Clustering on the Cloud: Reducing CLARA to MapReduce, 2nd Nordic Symposium on Cloud Computing & Internet Technologies (NordiCloud 2013), September 02-03, 2013, pp. 64-71. ACM. [Jakovits et al, HPCS 2013] P. Jakovits, S. N. Srirama, I. Kromonov: Viability of the Bulk Synchronous Parallel Model for Science on Cloud, The 2013 (11th) International Conference on High Performance Computing & Simulation (HPCS 2013), July 01-05, 2013, pp. 41-48. IEEE. [Srirama et al, SPJ 2011] S. N. Srirama, O. Batrashev, P. Jakovits, E. Vainikko: Scalability of Parallel Scientific Applications on the Cloud, Scientific Programming Journal, Special Issue on Science-driven Cloud Computing, 19(2-3):91-105, 2011. IOS Press. DOI 10.3233/SPR-2011-0320. [Batrashev et al, HPCS 2011] O. Batrashev, S. N. Srirama, E. Vainikko: Benchmarking DOUG on the Cloud, The 2011 International Conference on High Performance Computing & Simulation (HPCS 2011), 2011, pp. 677-685. IEEE. [Valiant, 1990] L. G. Valiant: A bridging model for parallel computation, Commun. ACM, vol. 33, no. 8, pp. 103 111, Aug. 1990.