Cloud Programming: Programming Environment. Oct 29, 2015. Osamu Tatebe


Cloud Computing. Only the required amount of CPU and storage is used, anytime, from anywhere, over the network. Availability, throughput, reliability. Manageability: there is no need to procure, maintain, or update computers yourself. Large-scale distributed data processing with MapReduce: loosely coupled, data-intensive computing that can serve as a standard parallel programming model other than MPI.

Salesforce.com (1999). Provides a Customer Relationship Management (CRM) service over the network; no software or hardware to install. Web interface, with access from Outlook, Office, Notes, mobile, and offline clients. Customizable by mouse click or with Apex code. Multitenant.

Amazon Web Services (2002). On-demand elastic infrastructure managed through web services. Elastic Compute Cloud (EC2): a web service that provides resizable compute capacity. Simple Storage Service (S3): a simple web-service interface to store and retrieve data. Elastic Block Store (EBS): block-level storage attached to EC2 instances in the same Availability Zone (AZ), automatically replicated within that AZ; point-in-time snapshots can be persisted to S3. Resources are organized into Regions and Availability Zones.
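
As a concrete illustration (not from the slides), a minimal sketch of storing an object in S3 and launching an EC2 instance with the boto3 Python SDK; the bucket name and AMI ID are placeholders, and real calls require AWS credentials.

    import boto3

    # S3: simple put/get of an object through the web-service interface.
    s3 = boto3.client("s3")
    s3.put_object(Bucket="example-bucket", Key="hello.txt", Body=b"hello cloud")
    print(s3.get_object(Bucket="example-bucket", Key="hello.txt")["Body"].read())

    # EC2: request one small instance (resizable compute capacity on demand).
    ec2 = boto3.client("ec2", region_name="us-east-1")
    ec2.run_instances(ImageId="ami-00000000", InstanceType="t2.micro",
                      MinCount=1, MaxCount=1)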

Amazon CloudFront (2008). A web service for content delivery: low latency, high data-transfer rates, no commitments. Caches copies close to end users (US, Europe, Japan, Hong Kong), so there is no need to maintain your own web servers. By default it supports peak speeds of 1 Gbps and peak rates of 1,000 requests/sec. Designed for delivery of popular objects: popular objects are cached, and less popular objects are evicted.

Google App Engine (2008). Google provides the infrastructure to execute web apps, with a Python SDK. Datastore: a distributed data-storage service in which data objects have a set of properties, and objects are retrieved by their properties. Not intended for large-scale data processing.
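
To make the Datastore model concrete, a toy in-memory sketch (not the actual App Engine API): entities carry named properties, and queries retrieve entities by filtering on property values.

    # Entities are dictionaries of properties; queries filter on them.
    datastore = [
        {"kind": "Book", "title": "MapReduce paper", "year": 2004},
        {"kind": "Book", "title": "GFS paper", "year": 2003},
    ]

    def query(kind, **filters):
        # Retrieve entities of the given kind whose properties match the filters.
        return [e for e in datastore
                if e["kind"] == kind
                and all(e.get(name) == value for name, value in filters.items())]

    print(query("Book", year=2004))   # [{'kind': 'Book', 'title': 'MapReduce paper', ...}]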

Taxonomy of cloud. SaaS (Software as a Service): Google Apps (Gmail, ...), CRM, Microsoft Online Services. PaaS (Platform as a Service): development of web apps; Force.com, Google App Engine, Windows Azure. IaaS (Infrastructure as a Service): Amazon EC2, S3. [Layer diagram: Service = software package; Platform = services, database; Infrastructure = hardware.]

Cloud technology. SaaS (Software as a Service): Web 2.0. PaaS (Platform as a Service): Web APIs and web services (XML, WSDL, SOAP/REST). IaaS (Infrastructure as a Service): virtual machines (Xen, KVM); virtualization of hard disks, storage, and the network. [Same layer diagram: Service = software package; Platform = services, database; Infrastructure = hardware.]

Example of IaaS: Eucalyptus [Nurmi 2009]. [Architecture diagram: a cloud controller with attached storage sits in front of a cluster controller (CC), a storage controller (SC), and node controllers (NC); each node is virtualized by Xen or KVM. The numbered steps include 2) storing the VM image and 4) allocating NCs via the CC and executing the VM image.]

Eucalyptus (2). The node controller virtualizes the compute node on which a VM image is executed (the equivalent of EC2). The storage controller virtualizes block devices (EBS). Walrus virtualizes storage (S3). The cloud controller manages the cloud system via a web interface: it registers a VM image, allocates a block device, allocates a compute node, executes the VM image and mounts the block device, and provides access to storage.

Storage systems in the cloud. Availability and reliability. Amazon Web Services: S3 and EBS; any (file) system that uses a block device can be constructed, e.g. HDFS (on EBS) for Elastic MapReduce, but it is difficult to construct a system that spans Availability Zones and Regions. Google App Engine: utilizes GFS and BigTable; MapReduce cannot be used, and storage cannot be geographically distributed.

Summary of cloud computing. Resources in cloud computing are inexpensive, always available, reliable, high performance, and easy to maintain. This is realized by virtualization and web interfaces. There is no need to procure, maintain, and update computers; if required, more resources can be obtained from the cloud.

MapReduce (2004). A programming model and runtime for data processing on large-scale clusters. A user specifies a map and a reduce function; the runtime system automatically parallelizes the computation, manages machine failures, and schedules jobs to efficiently exploit disks and the network.

Background. Google needs to compute inverted indices, various graph representations of web documents, the number of pages crawled per host, and the set of hottest queries in a day, from large amounts of crawled documents and web request logs, using hundreds to thousands of compute nodes. Large amounts of code for parallelization, data distribution, and error handling are required, and this code hides the original computation.

New abstraction (1). Describe only the required computation; the runtime library hides the complicated machinery, including parallelization, fault handling, data distribution, and load balancing. Most of these computations share the same pattern. [Dataflow diagram: inputs 1..N feed map tasks, which emit key/value pairs such as (k1, v1), (k2, v2), ...; a shuffle-and-sort phase groups values by key, e.g. (k1, [v1, v2, v4]), (k2, [v2, v3]), ...; reduce tasks consume the groups and write outputs 1..N.]

New abstraction (2). A functional model of user-supplied map and reduce operations enables easy parallelization of large-scale computations and re-running failed tasks for fault tolerance. The interface is simple but powerful: it enables high-performance computation on large clusters through automatic parallelization and automatic distribution.

Programming model. [Same dataflow diagram as in "New abstraction (1)": inputs, map, shuffle and sort, reduce, outputs.] Input, output, and intermediate data are all sets of key/value pairs. The map and reduce operations are specified by the user. The output of each map task is sorted by key and transferred to the reduce tasks.
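
For concreteness, the functional types of the two user-supplied operations, written as Python type aliases (the shapes mirror the original paper: map takes (k1, v1) to a list of (k2, v2); reduce takes (k2, list of v2) to a list of v2):

    from typing import Callable, Iterable, List, Tuple

    MapFn = Callable[[str, str], Iterable[Tuple[str, str]]]   # (k1, v1) -> list of (k2, v2)
    ReduceFn = Callable[[str, List[str]], Iterable[str]]      # (k2, [v2, ...]) -> list of v2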

Example: word count. The map task emits each word as a key and 1 as a value: (doc, "this is a pen") → (this, 1), (is, 1), (a, 1), (pen, 1). The reduce task sums the list of values [1 1 1] for each key: (this, [1 1 1 1]), (is, [1 1 1]), ... → (this, 4), (is, 3), ...

Pseudocode for word count:

    map(String key, String value):
        // key: document name
        // value: document contents
        for each word w in value:
            // for each word w, emit (w, "1")
            EmitIntermediate(w, "1");

    reduce(String key, Iterator values):
        // key: a word
        // values: a list of counts
        int result = 0;
        for each v in values:
            // sum all the counts for this word
            result += ParseInt(v);
        Emit(AsString(result));
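
A minimal, runnable Python version of the same word count, with the map, shuffle-and-sort, and reduce phases simulated in memory (an illustrative sketch; the real system distributes each phase across many machines):

    from collections import defaultdict

    def map_fn(key, value):
        # key: document name, value: document contents
        for word in value.split():
            yield word, 1

    def reduce_fn(key, values):
        # key: a word, values: a list of counts
        yield sum(values)

    def mapreduce(inputs, map_fn, reduce_fn):
        intermediate = defaultdict(list)
        for k, v in inputs:                      # map phase
            for ik, iv in map_fn(k, v):
                intermediate[ik].append(iv)      # shuffle: group values by key
        return {k: next(reduce_fn(k, vs))        # reduce phase, keys in sorted order
                for k, vs in sorted(intermediate.items())}

    print(mapreduce([("doc", "this is a pen this is")], map_fn, reduce_fn))
    # prints {'a': 1, 'is': 2, 'pen': 1, 'this': 2}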

Execution overview. Input files are split into M pieces of 16-64MB each. The intermediate key space is split into R pieces by a user-specified partitioning function, and the pieces are processed by multiple reduce workers.
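
The paper's default partitioning function is a hash of the intermediate key modulo R; a minimal sketch (using CRC32 so the assignment is deterministic across processes, unlike Python's built-in string hash):

    import zlib

    def partition(key: str, R: int) -> int:
        # Decide which of the R reduce tasks receives this intermediate key.
        return zlib.crc32(key.encode()) % R

    print(partition("this", 4000))   # stable reduce-task index in [0, 4000)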

Fault tolerance. Indispensable when using hundreds to thousands of nodes. Handling worker failures: the master pings every worker periodically; if no response is received from a worker within a certain amount of time, the master marks the worker as failed. Any map tasks completed by the failed worker, and any map or reduce task in progress on it, are rescheduled. The output of a map task is stored on a local disk, so if the node fails the output can no longer be read; the output of a reduce task is stored in a shared file system, so it can still be read after a worker failure. Handling master failure: recovery is possible with a checkpoint/restart mechanism; however, master failure is rare, since there is only a single master.
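
A sketch of the worker-failure rule just described (the task bookkeeping here is hypothetical, not the actual implementation): completed map tasks must be redone because their output sits on the failed machine's local disk, while completed reduce tasks are safe in the shared file system.

    def on_worker_failure(failed_worker, tasks, reschedule):
        # tasks: mapping of task id -> (kind, state, worker); illustrative shape.
        for task_id, (kind, state, worker) in tasks.items():
            if worker != failed_worker:
                continue
            if state == "in-progress":
                reschedule(task_id)      # always redo in-progress work
            elif state == "completed" and kind == "map":
                reschedule(task_id)      # map output was on the lost local disk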

Locality. Network bandwidth is a relatively scarce resource in a PC cluster. Input data is stored in the Google File System (GFS), with the file data on the local disks of the worker nodes; each file is divided into 64MB blocks, and 3 copies of each block are stored on different machines. The master takes the location of the input files into account and attempts to schedule each map task on a machine that holds a replica of the corresponding input data, or, failing that, on a machine on the same network switch. As a result, most input data is read locally and consumes no network bandwidth.
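
A sketch of this locality preference (the helper names are hypothetical): prefer an idle worker that holds a replica of the input block, then one on the same network switch as a replica, then any idle worker.

    def pick_worker(split, idle_workers, replicas_of, switch_of):
        # replicas_of(split): machines holding a copy of this 64MB block.
        # switch_of(machine): the network switch the machine hangs off.
        replicas = set(replicas_of(split))
        local = [w for w in idle_workers if w in replicas]
        if local:
            return local[0]              # data-local: no network traffic
        switches = {switch_of(r) for r in replicas}
        nearby = [w for w in idle_workers if switch_of(w) in switches]
        if nearby:
            return nearby[0]             # same-switch: cheap transfer
        return idle_workers[0]           # fall back to any idle worker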

Task granularity. Let there be M map tasks and R reduce tasks. M, R >> #workers is ideal: it improves dynamic load balancing and speeds up recovery when a worker fails. There are practical bounds on M and R: the master must make O(M+R) scheduling decisions and keep O(M*R) state in memory. In practice, M is chosen so that each individual task handles 16MB to 64MB of input data, and R is a small multiple of the number of worker machines. A typical example: M = 200,000 and R = 5,000 with 2,000 worker machines.
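
A back-of-the-envelope check of these numbers: with 64MB splits, the ~1TB inputs used in the evaluation below yield roughly the M = 15,000 that the grep and sort runs actually use.

    total_bytes = 10**10 * 100           # 10^10 100-byte records, ~1TB
    split_bytes = 64 * 2**20             # one 64MB split per map task
    M = -(-total_bytes // split_bytes)   # ceiling division
    print(M)                             # 14902, i.e. M = 15,000 in round numbers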

Backup tasks. A straggler, a machine that takes an unusually long time to complete a task, lengthens the total running time. A bad disk may slow read performance from 30MB/s to 1MB/s, or other tasks scheduled on the machine may compete for CPU, memory, local disk, or network bandwidth. The master schedules backup executions of the remaining in-progress tasks when the MapReduce operation is close to completion, and a task is marked complete whenever either execution completes. This mechanism can be tuned so that it increases the computational resources used by no more than a few percent. In the sort example, the run takes 44% longer when backup tasks are disabled.
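
A sketch of the backup-task policy (the 95% threshold and the bookkeeping are illustrative assumptions, not values from the paper):

    from dataclasses import dataclass

    @dataclass
    class Task:
        state: str               # "completed" or "in-progress"
        has_backup: bool = False

    def maybe_schedule_backups(tasks, schedule, threshold=0.95):
        # Near the end of the job, duplicate the remaining in-progress tasks;
        # whichever copy finishes first marks the task as complete.
        done = sum(1 for t in tasks if t.state == "completed")
        if done >= threshold * len(tasks):
            for t in tasks:
                if t.state == "in-progress" and not t.has_backup:
                    t.has_backup = True
                    schedule(t)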

Refinements. A user-specified partitioning function determines the mapping of intermediate keys to the R reduce tasks. Ordering guarantees for intermediate key/value pairs. User-specified combiner functions perform partial combining of the intermediate values generated with the same key within a single map task, to reduce the amount of intermediate data that must be transferred across the network (a sketch follows below). Custom input and output types. A mode for execution on a single machine to simplify debugging and small-scale testing. An HTTP server function to monitor the execution.
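
For word count, the combiner can reuse the reducer's summation, since counting is associative and commutative; a sketch in the style of the earlier Python example:

    def combine_fn(key, values):
        # Runs on the map worker: collapses the many (word, 1) pairs that one
        # map task emits for the same word into a single (word, partial_count)
        # pair before the data crosses the network to the reduce task.
        yield sum(values)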

Environment of the performance evaluation. A cluster of 1,800 nodes, each with two 2GHz Xeons with Hyper-Threading enabled, 4GB of memory, two 160GB IDE disks, and gigabit Ethernet. Network configuration: a two-level tree-shaped switched network with 100-200Gbps of aggregate bandwidth available at the root. All nodes are in the same hosting facility, so the RTT is less than a millisecond.

Grep. Scans 10^10 100-byte records (~1TB), searching for a three-character pattern; the pattern occurs in 92,337 records. M = 15,000 (the input data is split into 64MB pieces), R = 1.

Data-transfer rate over time. The rate peaks at over 30GB/s once 1,764 workers have been assigned. Startup overhead includes the time to collect location information and delays in GFS when opening the 1,000 input files.

Sort. Sorts 10^10 100-byte records (~1TB); cf. the TeraSort benchmark, http://sortbenchmark.org/. Less than 50 lines of user code. The final output is written to a set of 2-way replicated GFS files. M = 15,000, R = 4,000. The partitioning function uses the initial bytes of the key (12 bits?). In general this requires knowledge of the distribution of keys, which can be obtained by a pre-pass MapReduce operation that collects a sample of the keys.

Locality-aware scheduling of map tasks gives a high input data rate, peaking at about 13GB/s; it is lower than for grep because sort must also write output of the same size as its input. After 1,700 map tasks have completed, the intermediate data is transferred to the reduce tasks, in two waves. Because two copies of the output are stored, the output rate is lower than the shuffle rate.

Example of large-scale indexing. All indexing processes at Google are written with MapReduce. The indexing code is simpler and smaller: from 3,800 lines of C++ down to 700 lines. It is easy to change the indexing process, no operator intervention is needed thanks to MapReduce's fault tolerance, and it is easy to improve performance by adding new machines to the cluster.

Summary of MapReduce. The MapReduce programming model has been successfully used at Google for many different purposes. It is easy to use: it hides the details of parallelization, fault tolerance, locality optimization, and load balancing. A large variety of problems are easily expressible, and the system scales to clusters comprising thousands of machines. These properties are obtained by restricting the programming model.