Cloud Architectures. Jinesh Varia. Amazon Web Services. Technology Evangelist

Similar documents
Amazon Web Services Cloud Computing in Action. Jeff Barr

Chapter 5. The MapReduce Programming Model and Implementation

Principal Solutions Architect. Architecting in the Cloud

What is Cloud Computing? What are the Private and Public Clouds? What are IaaS, PaaS, and SaaS? What is the Amazon Web Services (AWS)?

2013 AWS Worldwide Public Sector Summit Washington, D.C.

Amazon Web Services CSE 490H. This presentation incorporates content licensed under the Creative Commons Attribution 2.5 License.

Fault-Tolerant Computer System Design ECE 695/CS 590. Putting it All Together

CIT 668: System Architecture. Amazon Web Services

Big Data 7. Resource Management

CPET 581 Cloud Computing: Technologies and Enterprise IT Strategies

Scalable Web Programming. CS193S - Jan Jannink - 2/25/10

Welcome to the New Era of Cloud Computing

Introduction to MapReduce

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

CS / Cloud Computing. Recitation 3 September 9 th & 11 th, 2014

Amazon Web Services: Building a Web-Scale Computing Infrastructure. Jeff Barr Senior Web Services Evangelist

How can you implement this through a script that a scheduling daemon runs daily on the application servers?

DISTRIBUTED SYSTEMS [COMP9243] Lecture 8a: Cloud Computing WHAT IS CLOUD COMPUTING? 2. Slide 3. Slide 1. Why is it called Cloud?

Introduction to Data Management CSE 344

Prototyping Data Intensive Apps: TrendingTopics.org

KillTest *KIJGT 3WCNKV[ $GVVGT 5GTXKEG Q&A NZZV ]]] QORRZKYZ IUS =K ULLKX LXKK [VJGZK YKX\OIK LUX UTK _KGX

How to host and manage enterprise customers on AWS: TOYOTA, Nippon Television, UNIQLO use cases

Programming model and implementation for processing and. Programs can be automatically parallelized and executed on a large cluster of machines

Where We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344

Designing Fault-Tolerant Applications

Hadoop An Overview. - Socrates CCDH

Big Data for Engineers Spring Resource Management

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

High Performance Computing on MapReduce Programming Framework

Large Scale Processing with Hadoop

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

MATE-EC2: A Middleware for Processing Data with Amazon Web Services

The Bits Are In The Clouds

Hadoop/MapReduce Computing Paradigm

DATA MINING II - 1DL460

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

Amazon Elastic MapReduce. API Reference API Version

MapReduce: Simplified Data Processing on Large Clusters 유연일민철기

Oracle NoSQL Database and Cisco- Collaboration that produces results. 1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

CISC 7610 Lecture 2b The beginnings of NoSQL

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe

The MapReduce Framework

Introduction to Hadoop. Owen O Malley Yahoo!, Grid Team

Introduction to MapReduce (cont.)

Basics of Cloud Computing Lecture 2. Cloud Providers. Satish Srirama

Project Presentation

Outline. Distributed File System Map-Reduce The Computational Model Map-Reduce Algorithm Evaluation Computing Joins

Page 1. Goals for Today" Background of Cloud Computing" Sources Driving Big Data" CS162 Operating Systems and Systems Programming Lecture 24

Intro to Software as a Service (SaaS) and Cloud Computing

Amazon AWS-Solution-Architect-Associate Exam

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS

CS November 2018

CS November 2017

Rails on HBase. Zachary Pinter and Tony Hillerson RailsConf 2011

Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds. Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng

amazon.com s Journey to the Cloud Jon Jenkins AWS Summit June 13, 2011

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

Introduction to Amazon Web Services. Jeff Barr Senior AWS /

AWS Solutions Architect Associate (SAA-C01) Sample Exam Questions

Design Patterns for the Cloud. MCSN - N. Tonellotto - Distributed Enabling Platforms 68

MixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp

MapReduce, Hadoop and Spark. Bompotas Agorakis

CS15-319: Cloud Computing. Lecture 3 Course Project and Amazon AWS Majd Sakr and Mohammad Hammoud

Ahead in the Cloud. Matt Wood TECHNOLOGY EVANGELIST

CS 345A Data Mining. MapReduce

Indrajit Roy, Srinath T.V. Setty, Ann Kilzer, Vitaly Shmatikov, Emmett Witchel The University of Texas at Austin

Big Data Infrastructure at Spotify

PaaS Cloud mit Java. Eberhard Wolff, Principal Technologist, SpringSource A division of VMware VMware Inc. All rights reserved

AWS Solution Architecture Patterns

Embedded Technosolutions

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

The Gearman Cookbook OSCON Eric Day Senior Software Rackspace

Launch and Scale your Social Game in the Cloud with AltEgo, Amazon Web Services and RightScale

Big Data Management and NoSQL Databases

AWS Certified Solutions Architect - Associate 2018 (SAA-001)

1. Stratified sampling is advantageous when sampling each stratum independently.

A Review Paper on Big data & Hadoop

Introduction to. Amazon Web Services. Thilina Gunarathne Salsa Group, Indiana University. With contributions from Saliya Ekanayake.

Training on Amazon AWS Cloud Computing. Course Content

SAA-C01. AWS Solutions Architect Associate. Exam Summary Syllabus Questions

The Evolution of a Data Project

MySQL in the Cloud Tricks and Tradeoffs

The ATLAS EventIndex: Full chain deployment and first operation

Apache Giraph: Facebook-scale graph processing infrastructure. 3/31/2014 Avery Ching, Facebook GDM

Sensor Data Collection and Processing

Data-Intensive Distributed Computing

Programming Systems for Big Data

Hadoop Map Reduce 10/17/2018 1

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce

WHITEPAPER AMAZON ELB: Your Master Key to a Secure, Cost-Efficient and Scalable Cloud.

AWS_SOA-C00 Exam. Volume: 758 Questions

what is cloud computing?

AWS Interview Questions and Answers

Database Systems CSE 414

MapReduce: Recap. Juliana Freire & Cláudio Silva. Some slides borrowed from Jimmy Lin, Jeff Ullman, Jerome Simeon, and Jure Leskovec

Distributed Computing.

HaLoop Efficient Iterative Data Processing on Large Clusters

CS 345A Data Mining. MapReduce

7/22/2008. Transformations

Facilitating Consistency Check between Specification & Implementation with MapReduce Framework

Transcription:

Cloud Architectures Jinesh Varia Technology Evangelist Amazon Web Services

ANIMOTO.COM

Scale: 50 servers to 3500 servers in 3 days

Winner December 2007 Prize:Golden Hammer Photo: Smashing the hardware

TimesMachine from NY Times 1851-1922 Articles TIFF -> PDF Input: 11 Million Articles (4TB of data) What did he do? 100 EC2 Instances for 24 hours All data on S3 Output: 1.5 TB of Data Hadoop, itext, JetS3t Under $400

CS290F : Scalable Internet Services USCB Fall 2006 Prof created an app to manage team usage Ruby on Rails Complete Stack: From Load balancer, App Server to DB Learn how to scale: Simulated load Generated Graphs All course contents, students assignments, lessons learned are on the Wiki

CS345a : Data Mining @ Stanford Tools used: Shell/Linux/Java Hadoop on EC2 Data set on S3 Datasets :NetFlix, Alexa, IR datasets from TREC Class organization: Stanford Winter 2007 30-35 Students Each Team spawns 10-15 Hadoop slave nodes TA created Getting- Started AMIs (& scripts) TA managed the students usage

Cloud Architectures

Cloud Architectures Hardware Infrastructure Job execution time time

Cloud Architectures Hardware Infrastructure Job execution time time

Shrink your processing time CPUs time

Shrink your processing time CPUs time

Main Problems Technical How to co-ordinate jobs between machines (distributed processing)? What if a machine fails? How will I Scale-out? Business How do I get management signoff? Resources to manage the infrastructure? How do I get rid of the Idle Infrastructure? Hadoop Web Services Cloud Computing

Let s take a usecase Web Company : Analyze large-data sets of clickstream logs Social Networking Company : Analyze demographic and market data Phone Company : Locate all customers who have called in a given area Large Retailer Chain : Wants to know what items a particular customer bought last month Surveillance Company : Wants to transcodevideo for last several years PharmaCompany : Wants locate people who were prescribed a certain drug

GrepTheWeb

What s so cool about GrepTheWeb? RegEx WWW

Examples of Patterns Source Code intx = 40 + i Any thing with punctuation Hey! he said, Are you ok? Case Sensitive Function CallOrderController() Equations f(x) = x^2 Other Patterns (dis)integration of life, Email Address

Zoom Level 1 RegEx Input dataset (List of Document Urls) Alexa GrepTheWeb Service GetStatus Subset of document URLs that matched the RegEx

Zoom Level 2 Amazon SQS Distributed Transient Buffer Never Lose a message StartGrep RegEx Amazon SQS Input Files (Alexa Crawl) Amazon S3 Infinitely Scalable Storage in the cloud Ideal for small short-lived Highly Available, Amazon Durable SimpleDB and Reliable Amazon EC2 Database in the cloud messages Manage phases Resizable Computing Private and Public Storage Capacity in the cloud Controller Access control Pay by the GB Lightweight Query-able User info, Attribute Launch, Monitor, Store Spawn Server Instances Job status info Shutdown Message using a Locking Web Service call Root Level Access Amazon SimpleDB DB Amazon EC2 Cluster Amazon S3 GetStatus Get Output Pay by the hour Distributed and Partitioned Input Output Pay by GB, Pay per Query

Zoom Level 3 Amazon SQS Billing Queue StartGrep Launch Queue Monitor Queue Shut down Queue Billing Service Controller Launch Monitor Shutdown Controller Controller Controller Billing Controller Insert JobID, Status launch Insert EC2 info ping Get EC2 Info Shutdown Check for results GetStatus Status DB Master M Slaves N HDFS Put File Output Input Get Amazon HadoopCluster on File SimpleDB Amazon EC2 Amazon S3 Get Output Input Files (Alexa Crawl)

Zoom Level 4 User1 StartJob 1 Servic e User2 StartJob 2 Map Map Map.. Map Tasks Map Map Map.. Map Tasks Combin e Reduce Hadoop Job Combine Reduce Hadoop Job StopJob 1 StopJob 2 Store status and results Get Result

SideTrack: WordCountExample MAPPER: For each input record, extract a set of key/value pairs that we care about the each record Hi Hadoop, Bye Hadoop Input Map Input key value pairs ( Hi, 1), ( Hadoop, 1), ( Bye, 1), ( Hadoop, 1) key 1 Values.. key 3 Values.. REDUCER: For each extracted key/value pair, combine it with other values that share the same key ( Hadoop, [1,1]) Key 1 All Values.. Reduce Aggregate ( Hadoop, 2) Final Key 1 Values.. Source: Doug Cutting s Slide Deck on Hadoop

Zoom Level 5 (HadoopMapReduce) MAPPER: For each input record, extract a set of key/value pairs that we care about the each record (LineNumber, s3pointer) Input Map Input key value pairs (s3pointer, [matches]) key 1 Values.. key 3 Values.. REDUCER: For each extracted key/value pair, combine it with other values that share the same key Key 1 All Values.. Reduce Aggregate Identity Function Final Key 1 Values.. Source: Doug Cutting s Slide Deck on Hadoop

The Open Source Hadoop framework is giving developers the power to do some pretty extraordinary things.

The Open Source Hadoop frameworkon Amazon EC2/S3is giving every developerthe power to do some pretty extraordinary things.

References Running Hadoop MapReduce on Amazon EC2 and Amazon S3 http://developer.amazonwebservices.com/connect/entry.jspa?externali D=873 Hadoop on Amazon EC2 Step By Step Wiki http://wiki.apache.org/hadoop/amazonec2 Taking Massive Distributed Computing to the Common Man - Hadoop on Amazon EC2/S3 http://aws.typepad.com/aws/2008/02/taking-massive.html

Thank you! Jinesh Varia jvaria@amazon.com http://aws.amazon.com /s3 /ec2 /sdb /sqs /forums /resources /blog