CS15-319: Cloud Computing. Lecture 3 Course Project and Amazon AWS Majd Sakr and Mohammad Hammoud

Similar documents
BERLIN. 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

Training on Amazon AWS Cloud Computing. Course Content

Designing Fault-Tolerant Applications

Enroll Now to Take online Course Contact: Demo video By Chandra sir

Introduction to Cloud Computing

Introduction to Amazon Web Services

CIT 668: System Architecture. Amazon Web Services

Amazon Web Services. Foundational Services for Research Computing. April Mike Kuentz, WWPS Solutions Architect

Fault-Tolerant Computer System Design ECE 695/CS 590. Putting it All Together

Security & Compliance in the AWS Cloud. Amazon Web Services

Security & Compliance in the AWS Cloud. Vijay Rangarajan Senior Cloud Architect, ASEAN Amazon Web

Introduction to Amazon Web Services. Jeff Barr Senior AWS /

AWS Solutions Architect Associate (SAA-C01) Sample Exam Questions

Amazon Web Services (AWS) Solutions Architect Intermediate Level Course Content

SAA-C01. AWS Solutions Architect Associate. Exam Summary Syllabus Questions

Amazon Web Services Training. Training Topics:

Expected Learning Outcomes Introduction To AWS

Amazon Web Services. Block 402, 4 th Floor, Saptagiri Towers, Above Pantaloons, Begumpet Main Road, Hyderabad Telangana India

Amazon Web Services. Amazon Web Services

Best Practices and Performance Tuning on Amazon Elastic MapReduce

Amazon Web Services (AWS) Training Course Content

Scaling on AWS. From 1 to 10 Million Users. Matthias Jung, Solutions Architect

Srinath Vaddepally.

Security Aspekts on Services for Serverless Architectures. Bertram Dorn EMEA Specialized Solutions Architect Security and Compliance

Amazon AWS-Solution-Architect-Associate Exam

PrepAwayExam. High-efficient Exam Materials are the best high pass-rate Exam Dumps

How can you implement this through a script that a scheduling daemon runs daily on the application servers?

LINUX, WINDOWS(MCSE),

Introduction to Amazon Cloud & EC2 Overview

What is Cloud Computing? What are the Private and Public Clouds? What are IaaS, PaaS, and SaaS? What is the Amazon Web Services (AWS)?

Amazon Web Services: The Pla6orm for Data Science. Jamie Kinney

2013 AWS Worldwide Public Sector Summit Washington, D.C.

Amazon Web Services and Feb 28 outage. Overview presented by Divya

High School Technology Services myhsts.org Certification Courses

AWS: Basic Architecture Session SUNEY SHARMA Solutions Architect: AWS

At Course Completion Prepares you as per certification requirements for AWS Developer Associate.

Design Patterns for the Cloud. MCSN - N. Tonellotto - Distributed Enabling Platforms 68

Cloudera s Enterprise Data Hub on the Amazon Web Services Cloud: Quick Start Reference Deployment October 2014

CIT 668: System Architecture

ARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS

2013 AWS Worldwide Public Sector Summit Washington, D.C.

Certificate of Registration

Amazon Web Services 101 April 17 th, 2014 Joel Williams Solutions Architect. Amazon.com, Inc. and its affiliates. All rights reserved.

Netflix OSS Spinnaker on the AWS Cloud

Automating Elasticity. March 2018

Cloud and Storage. Transforming IT with AWS and Zadara. Doug Cliche, Storage Solutions Architect June 5, 2018

Engage with ESRI in the AWS Cloud. Teresa Carlson, VP of Global Public Sector

AWS Course Syllabus. Linux Fundamentals. Installation and Initialization:

SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS

WHITEPAPER AMAZON ELB: Your Master Key to a Secure, Cost-Efficient and Scalable Cloud.

Introduction to. Amazon Web Services. Thilina Gunarathne Salsa Group, Indiana University. With contributions from Saliya Ekanayake.

AWS Solution Architect Associate

Getting started with AWS security

CPET 581 Cloud Computing: Technologies and Enterprise IT Strategies

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

PracticeDump. Free Practice Dumps - Unlimited Free Access of practice exam

MarkLogic Cloud Service Pricing & Billing Effective: October 1, 2018

DISTRIBUTED SYSTEMS [COMP9243] Lecture 8a: Cloud Computing WHAT IS CLOUD COMPUTING? 2. Slide 3. Slide 1. Why is it called Cloud?

HOW TO PLAN & EXECUTE A SUCCESSFUL CLOUD MIGRATION

AWS_SOA-C00 Exam. Volume: 758 Questions

Better, Faster, Stronger web apps with Amazon Web Services. Senior Technology Evangelist, Amazon Web Services

AWS Well Architected Framework

Amazon Elastic Compute Cloud (EC2)

Cloud Providers more AWS, Aneka

JIRA Software and JIRA Service Desk Data Center on the AWS Cloud

Introducing Amazon Elastic File System (EFS)

AWS Security Overview. Bill Shinn Principal Security Solutions Architect

The Orion Papers. AWS Solutions Architect (Associate) Exam Course Manual. Enter

Amazon Linux: Operating System of the Cloud

AWS Solution Architecture Patterns

EE 660: Computer Architecture Cloud Architecture: IaaS

CIT 668: System Architecture

Cloud Computing /AWS Course Content

MySQL In the Cloud. Migration, Best Practices, High Availability, Scaling. Peter Zaitsev CEO Los Angeles MySQL Meetup June 12 th, 2017.

About Intellipaat. About the Course. Why Take This Course?

Pass4test Certification IT garanti, The Easy Way!

How to go serverless with AWS Lambda

Security on AWS(overview) Bertram Dorn EMEA Specialized Solutions Architect Security and Compliance

Cloud Computing. Amazon Web Services (AWS)

Private Cloud Public Cloud Edge. Consistent Infrastructure & Consistent Operations

ActiveNET. #202, Manjeera Plaza, Opp: Aditya Park Inn, Ameerpetet HYD

CLOUD AND AWS TECHNICAL ESSENTIALS PLUS

Basics of Cloud Computing Lecture 2. Cloud Providers. Satish Srirama

Advanced Architectures for Oracle Database on Amazon EC2

Managing Deep Learning Workflows

Project Presentation

Exam Questions AWS-Certified- Developer-Associate

Basics of Cloud Computing Lecture 2. Cloud Providers. Satish Srirama

Confluence Data Center on the AWS Cloud

THE DEFINITIVE GUIDE FOR AWS CLOUD EC2 FAMILIES

Getting started with AWS security

AWS Security. Stephen E. Schmidt, Directeur de la Sécurité

What is. Thomas and Lori Duncan

Informatica Big Data Management on the AWS Cloud

ArcGIS 10.3 Server on Amazon Web Services

AWS Storage Gateway. Not your father s hybrid storage. University of Arizona IT Summit October 23, Jay Vagalatos, AWS Solutions Architect

HPE Digital Learner AWS Certified SysOps Administrator (Intermediate) Content Pack

Amazon AppStream 2.0: SOLIDWORKS Deployment Guide

Principal Solutions Architect. Architecting in the Cloud

Oracle WebLogic Server 12c on AWS. December 2018

Transcription:

CS15-319: Cloud Computing Lecture 3 Course Project and Amazon AWS Majd Sakr and Mohammad Hammoud

Lecture Outline Discussion On Course Project Amazon Web Services 2

Course Project Course Project Phase I-A Phase I-B Phase II Phase III 3

Course Project Course Project Phase I-A Phase I-B Phase II Phase III Introduction to Amazon AWS and Hadoop 4

Project Phase 1-A Get comfortable with the AWS web interface for VM management Provision a single instance manually and analyze instance performance Run and analyze resource micro-benchmarks (CPU and Memory micro-benchmarks) Provision and utilize a Hadoop cluster on Amazon using Apache Whirr and Amazon EC2 and S3 web services Learn how to develop and deploy MapReduce jobs on your Amazon Hadoop cluster Perform scalability and sizing studies for MapReduce applications 5

Course Project Course Project Phase I-A Phase I-B Phase II Phase III Project Management 6

What is Project Management? Project management is a series of flexible and iterative steps through which you identify: Where you want to go A reasonable way to get there Specifics of what to do Specifics of when to do what Project management consists of planning each part of your project via: Stating your projected work Tracking your work 7

The Statement of Work (1/2) The statement of work is a written document that clearly explains what the project is about Your statement of work should include the following sections: 1. Purpose 2. Objectives 3. Constraints 4. Assumptions 8

The Statement of Work (2/2) 1. The purpose section typically includes: A Background Section: This is the first step of all good investigations. You typically provide a very brief review of literature A Scope of Work Section: What will you do? a brief statement describing the major work to be performed A Strategy Section: How will you perform the work? 9

The Statement of Work (2/3) 2. Objectives are the end results achieved by the project 2.1. Each objective should include a description of the desired outcome when the project is completed 3. Constraints are the restrictions on the project (e.g., wanting to complete all your experiments 2 weeks before the end of the semester) 4. Assumptions are the unknowns you assume in developing your plan statements about uncertain information you will take as fact as you conceive, plan and perform the project 10

Tracking the Work A project requires setting a schedule for a series of activities to be performed Your project schedule should: 1. Include estimates of how long each activity will take 2. Identify the order of experiments 3. Show the relationship of experiments to each other (e.g., do they need to be done sequentially or can they be done in parallel) 4. Identify bottlenecks 11

Tools for Tracking the Work Here are two tools that you can use to track your work: Activities plan: A table showing activities (at a coarse-grain level) and their planned start and end dates Gantt chart: A graph consisting of horizontal bars that depict the start date and duration for each sub-activity (at a fine-grain level) Activity Start Date End Date A1 -- -- Activities and Sub- Activities A1 Jan Feb March April A2 -- -- Activities Plan A1.1 A1.2 Sequential sub-activities A2 A2.1 Parallel sub-activities Gantt Chart 12

Keep in Mind Project management is: Not just a planning tool, it is also a training and communication tool A process for identifying what to think about, not how to think about it Merely a tool to help you plan and organize your work It shouldn't become your work, bogging you down in complex manipulations or fancy tables/graphs that look impressive but don t contribute to your progress in the project Effective project management demands that the components of a project be constantly monitored and revised with new information (especially as you make progress and get more experienced) 13

Course Project Course Project Phase I-A Phase I-B Phase II Phase III Developing a MapReduce Application for a Real Problem 14

Project Phase II Develop a MapReduce application for a problem from following domains: Natural Language Processing Image Processing Bioinformatics Domain 15

Image Processing I. Edge Detection Edge detection is a basic operation that is used in applications, such as image recognition Edge detection is representative of a class of applications that highlight or collect points on high contrast (e.g., edges of an object in an image) in the input One of the algorithms which you can implement is the Sobel algorithm 16

Natural Language Processing II. Word Alignment You can implement word alignment, a problem of determining translational correspondence at the word level given a corpus of parallel sentences Automatic word alignment (Brown et al.,1993) has received extensive treatment in natural language processing as it is a vital component of all statistical machine translation approaches Michael geht davon aus -- dass er im haus bleibt Michael assumes that he will stay in 17

Bioinformatics Benchmarks III. Single Nucleotide Polymorphisms (SNPs) One of the applications that you can implement in Bioinformatics is SNP search, which uses the hill climbing search method SNPs are DNA sequence variations that occur when a single nucleotide is altered in a genome sequence Understanding the importance of many recently identified SNPs in human genes has become a primary goal of human genetics 18

Amazon Deployments and Discussions You will deploy and test your application on the Amazon public cloud You will also provide a discussion on: Your experience in applying MapReduce to a real problem Your insights concerning the performance of a real MapReduce application Your thoughts on the applicability of MapReduce to your selected problem Your recommendations regarding the usage of MapReduce for algorithms similar to your selected one 19

Course Project Course Project Phase I-A Phase I-B Phase II Phase III Characterizing Hadoop 20

Workload Characterization In order to capture the processing and the I/O characteristics of MapReduce applications, we have to perform what is referred to as workload characterization Workload characterization is a crucial component of any performance analysis process In this part of the project, you will characterize Hadoop using the MapReduce application you developed in Phase II 21

Why Workload Characterization? Through your workload characterization you can: Provide insights into the Hadoop framework bottlenecks and intricacies and probably trigger framework improvements Provide a quantitative foundation for MapReduce researchers and developers seeking validations for their hypotheses against real-world workloads Help researchers and organizations adopt sound experimental methodology via: Fine-tuning application parameters and achieve performance targets for applications similar to yours 22

Characterization Dimensions Among the dimensions that you can use to pursue your workload characterization are: 1.Data Patterns 7000000 5000000 3000000 2.Concurrency 1000000-1000000 3.Phase Timelines 4.Dataset Types 5.Network Traffic Patterns 6.System Resource Utilization 7.Communication to Computation Ratio Partition Size (Bytes) Mapper ID 212 197 182 167 152 137 122 107 92 77 62 47 32 17 2-1000000-1000000 1000000-3000000 3000000-5000000 5000000-7000000 20 5 10 15 Reducer ID 23

Lecture Outline Discussion On Course Project Amazon Web Services 24

Amazon Web Services (AWS) AWS is a collection of remote computing services that together make up a cloud computing platform AWS offers the following products: Compute (e.g., EC2) Storage (e.g., S3) Database (e.g., RDS) Networking (e.g., VPC) 25

AWS Ecosystem 26

Regions and Availability Zones 27

Regions and Availability Zones Amazon EC2 provides the ability to place instances in multiple locations Amazon EC2 locations are composed of Availability Zones and Regions Regions are dispersed and located in separate geographic areas (US, EU, etc.) You can design your application to be closer to specific customers or to meet legal or other requirements Availability Zones are distinct locations within a Region You can protect your applications from the failure of a single location 28

Amazon s Global Datacenters Amazon EC2 is currently available in eight regions: US East (Northern Virginia), US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), South America (Sao Paulo), and AWS GovCloud 29

Amazon EC2 Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud It has been offered in 2006 It is deemed the first real cloud computing product You can rent EC2 instances by the hour Many instance types are available in Amazon 30

Amazon EC2 Amazon EC2 presents a true virtual computing environment, allowing you to: Use web service interfaces to launch instances with a variety of operating systems (bundled into AMIs) Load your instances with your custom application environment Manage your network s access permissions Amazon EC2 reduces the time required to obtain and boot new server instances to minutes This allows you to quickly scale capacity as your computing requirements change 31

EC2 Instance Models 1. On-Demand Instances Pay-by-the hour Start and stop as you wish 2. Reserved Instances Pay-by-the-year Reserved for you, can start-or-stop as needed 3. Spot Instances Bid for unused EC2 capacity Mention your Spot Price and if the market rate is less than your Bid, you get your instance Instance automatically terminates if your Spot Price becomes less than the current market rate 32

EC2 Instance Parameters CPU Power Elastic Compute Unit (ECU) Defined by amazon as the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron/Zeon processor Memory GB I/O performance Low/Moderate/High High-end instances have 10 Gigabit Ethernet 33

EC2 Instance Types Instance Name Mem (GB) CPU Capacity Disk (GB) Platform On-Demand Pricing / hour (Linux) Micro 0.59 Upto 2 ECUs -- 32/64 $0.02 Small 1.7 1 core - 1 ECU 160 32 $0.085 Large 7.5 2 cores, 2 ECUs each 850 64 $0.34 Extra Large 15 4 cores, 2 ECUs each 1690 64 $0.68 High-Mem Extra Large 17.1 2 cores, 3.25 ECUs each 420 64 $0.50 High-Mem Double Extra Large 34.2 4 cores, 3.25 ECUs each 850 64 $1.00 High-Mem Quad Extra Large 68.4 8 cores, 3.25 ECUs each 1690 64 $2.00 High CPU Medium 1.7 2 cores, 2.5 ECUs each 350 32 $0.17 High CPU Extra Large 7 8 cores, 2.5 ECUs each 1690 64 $0.68 Cluster Compute Quad XL 23 33.5 ECUs 1690 64 $1.30 Cluster GPU Quad XL 22 33.5 ECUS + 2x GPUs 1690 64 $2.10 34

EC2 Instance Lifecycle (1/2) Costs $ (instance-hours) ec2-reboot-instances Pending Running ec2-run-instances ec2-terminate-instances Terminated Shutting Down Standard EC2 Instance (Once terminated, data is lost) 35

EC2 Instance Lifecycle (2/2) Costs $ (instance-hours) ec2-reboot-instances Pending Running ec2-run-instances ec2-terminate-instances ec2-stop-instances Terminated Shutting Down ec2-start-instances Stopped ec2-terminate-instances Costs $ (EBS volumes) EBS-backed EC2 Instance (Data persists as long as EBS volume is Intact) 36

Amazon EC2 Characteristics Characteristic Elastic Completely Controlled Flexible Extensible Reliable Secure Inexpensive Description Amazon EC2 enables you to increase or decrease capacity within minutes, not hours or days You have complete control of your instances You have the choice of multiple instance types, operating systems, and software packages Amazon EC2 works in conjunction with Amazon S3, Amazon RDS, Amazon SimpleDB and Amazon SQS to provide a complete solution for computing, query processing and storage across a wide range of applications Amazon EC2 offers a highly reliable environment. The Amazon EC2 Service Level Agreement commitment is 99.95% availability for each Amazon EC2 region Amazon EC2 provides numerous mechanisms for securing your compute resources (e.g., Amazon VPC) You pay a very low rate for the compute capacity you actually consume 37

Amazon Simple Storage Service Amazon S3 can be used to store and retrieve any amount of data, at any time, from anywhere on the web You can write, read, and delete objects into S3 containing from 1B to 5TBs of data each (the number of objects you can store is unlimited) Each object is stored in a bucket and a bucket can be stored in one of several regions Objects stored in a region never leave the region unless you transfer them out 38

Amazon Elastic Block Store Amazon Elastic Block Store (EBS) offers persistent storage for Amazon EC2 instances Amazon EBS volumes provide off-instance storage that persists independently from the life of an instance Amazon EBS provides the ability to create point-in-time consistent snapshots of your volumes that are then stored in Amazon S3, and automatically replicated across multiple available zones These snapshots: Can be used as the starting point for new Amazon EBS volumes Can protect your data for long term durability Can be easily shared with co-workers and other AWS developers 39

Amazon CloudWatch Amazon CloudWatch provides monitoring for AWS cloud resources and applications running on AWS It provides you with visibility into resource utilization, operational performance, and overall demand patterns including metrics such as CPU utilization, disk reads and writes, and network traffic Amazon CloudWatch provides a reliable, scalable, and flexible monitoring solution that you can start using within minutes You no longer need to set up, manage, or scale your own monitoring systems and infrastructure 40

Other AWS Products / Services / Features Compute Auto Scaling Elastic Load Balancing Amazon Elastic MapReduce (EMR) Database Amazon Relational Database Service (RDS) Amazon SimpleDB Amazon DynamoDB Networking Amazon Virtual Private Cloud (VPC) Amazon Route 52 Amazon Direct Connect Messaging Amazon Simple Queue Service (SQS) Amazon Simple Email Service (SES) Amazon Simple Notification Service (SNS) Management and Deployment Amazon Identity and Access Management (IAM) Amazon Cloudwatch Amazon Elastic Beanstalk Amazon CloudFormation and much more 41