CS162 Operating Systems and Systems Programming
Lecture 24, Capstone: Cloud Computing
December 2, 2013
Anthony D. Joseph and John Canny
http://inst.eecs.berkeley.edu/~cs162
Note: Some slides and/or pictures in the following are adapted from slides by Ali Ghodsi.

Goals for Today
- Distributed systems
- Cloud computing programming paradigms
- Cloud computing OS

Background of Cloud Computing
- 1990: heyday of parallel computing and multi-processors; 52% growth in performance per year
- 2002: the thermal wall; speed (frequency) peaks, but transistors keep shrinking
- The multicore revolution: 15-20 years later than predicted, we have hit the performance wall

Sources Driving Big Data
- It's all happening on-line: every click, ad impression, billing event, fast-forward/pause, friend request, transaction, network message, and fault leaves a record
- User generated (Web & Mobile)
- Internet of Things / M2M
- Scientific computing

Data Deluge
- Billions of users connected through the net: WWW, Facebook, Twitter, cell phones, ...
- 80% of the data on Facebook was produced in the last year
- Storage is getting cheaper, so we store more data

Data Grows Faster than Moore's Law
[Figure: projected growth relative to 2010, for 2010-2015, comparing Moore's Law, overall data, and particle-accelerator data; overall stored data grows much faster than Moore's Law.]

Solving the Impedance Mismatch
- The amount of stored data is exploding; at the same time, computers are not getting faster, and we are drowning in data
- How do we resolve the dilemma?
- Solution adopted by web-scale companies: go massively distributed and parallel

Enter the World of Distributed Systems
- Distributed systems/computing: a loosely coupled set of computers, communicating through message passing, solving a common goal
- Distributed computing is challenging: dealing with partial failures (examples?) and with asynchrony (examples?)
- Distributed computing versus parallel computing? Distributed computing = parallel computing + partial failures

Dealing with Distribution
- We have seen several tools that help with distributed programming: Message Passing Interface (MPI), Distributed Shared Memory (DSM), Remote Procedure Calls (RPC)
- But distributed programming is still very hard: programming for scale, fault-tolerance, consistency, ...

The Datacenter is the new Computer
- Program == Web search, email, map/GIS, ...
- Computer == 10,000s of computers, storage, network
- Warehouse-sized facilities and workloads
- Built from less reliable components than traditional datacenters

Datacenter/Cloud Computing OS
- If the datacenter/cloud is the new computer, what is its operating system?
- Note that we are not talking about a host OS

Classical Operating Systems
- Data sharing: inter-process communication, RPC, files, pipes, ...
- Programming abstractions: libraries (libc), system calls, ...
- Multiplexing of resources: scheduling, virtual memory, file allocation/protection, ...

Datacenter/Cloud Operating System
- Data sharing: Google File System, key/value stores
- Programming abstractions: Google MapReduce, Pig, Hive, Spark
- Multiplexing of resources: Apache projects: Mesos, YARN (MRv2), ZooKeeper, BookKeeper, ...

Google Cloud Infrastructure
- Google File System (GFS), 2003: a distributed file system for the entire cluster, with a single namespace
- Google MapReduce (MR), 2004: runs queries/jobs on data; manages work distribution and fault-tolerance; colocated with the file system
- Apache open-source versions: Hadoop DFS and Hadoop MR

GFS/HDFS Insights
- Petabyte storage: files split into large blocks (128 MB) and replicated across several nodes; big blocks allow high-throughput sequential reads/writes
- Data striped on hundreds/thousands of servers: scanning 100 TB on one node at 50 MB/s takes about 24 days, while a 1000-node cluster scans it in about 35 minutes
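A quick back-of-the-envelope check of those scan numbers (a sketch; the slide's 24 days / 35 minutes are in the same ballpark, and the small gap comes from rounding and unit conventions):

    # Scan-time estimate for 100 TB at 50 MB/s per node,
    # assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes).
    data_bytes = 100 * 10**12          # 100 TB of data
    per_node_rate = 50 * 10**6         # 50 MB/s sequential read per node

    one_node_secs = data_bytes / per_node_rate
    print(one_node_secs / 86400)       # ~23 days on a single node
    print(one_node_secs / 1000 / 60)   # ~33 minutes across 1000 nodes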

GFS/HDFS Insights (2)
- Failures will be the norm: mean time between failures for 1 node = 3 years, so for 1,000 nodes it is about 1 day (3 years is roughly 1,100 days; spread across 1,000 independently failing nodes, expect about one failure per day)
- Use commodity hardware: failures are the norm anyway, so buy cheaper hardware

MapReduce Insights
- Restricted key-value model: the same fine-grained operations (Map and Reduce) are repeated on big data; operations must be deterministic and idempotent, with no side effects
- The only communication is through the shuffle; operation (Map and Reduce) output is saved on disk
- No complicated consistency models: single writer, append-only data

What is MapReduce Used For?
- At Google: index building for Google Search, article clustering for Google News, statistical machine translation
- At Yahoo!: index building for Yahoo! Search, spam detection for Yahoo! Mail
- At Facebook: data mining, ad optimization, spam detection

MapReduce Pros
- Distribution is completely transparent: not a single line of distributed programming (ease, correctness)
- Automatic fault-tolerance: determinism enables re-running failed tasks somewhere else; saved intermediate data enables re-running just the failed reducers
- Automatic scaling: as operations are side-effect free, they can be distributed to any number of machines dynamically
- Automatic load-balancing: move tasks, and speculatively execute duplicate copies of slow tasks (stragglers)
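To make the restricted key-value model concrete, here is a minimal word-count sketch in Python. The single-process driver is purely illustrative and stands in for the framework; in Hadoop the equivalent Map and Reduce functions would be supplied as classes or as Hadoop Streaming scripts:

    from collections import defaultdict

    def map_fn(_, line):
        # Map: emit (word, 1) for every word; deterministic, no side effects.
        for word in line.split():
            yield (word, 1)

    def reduce_fn(word, counts):
        # Reduce: combine everything the shuffle grouped under one key.
        yield (word, sum(counts))

    def run_job(lines):
        # Toy single-process driver standing in for the framework: the
        # group-by-key dictionary below plays the role of the shuffle.
        groups = defaultdict(list)
        for offset, line in enumerate(lines):
            for key, value in map_fn(offset, line):
                groups[key].append(value)
        return [out for key in groups for out in reduce_fn(key, groups[key])]

    print(run_job(["the quick brown fox", "the lazy dog"]))
    # [('the', 2), ('quick', 1), ('brown', 1), ('fox', 1), ('lazy', 1), ('dog', 1)]

Because map_fn and reduce_fn are deterministic and side-effect free, a real framework can re-run either of them on any machine after a failure, which is exactly what enables the "pros" above.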

MapReduce Cons
- Restricted programming model: not always natural to express problems in this model, and low-level coding is necessary
- Little support for iterative jobs (lots of disk access)
- High latency (batch processing)
- Addressed by follow-up research: Pig and Hive for high-level coding, Spark for iterative and low-latency jobs

Pig
- High-level language: expresses sequences of MapReduce jobs; provides relational (SQL) operators (JOIN, GROUP BY, etc.); easy to plug in Java functions
- Started at Yahoo! Research; runs about 50% of Yahoo!'s jobs

Example Problem
Given user data in one file and website data in another, find the top 5 most-visited pages by users aged 18-25:
Load Users -> Filter by age; Load Pages; Join on name -> Group on url -> Count clicks -> Order by clicks -> Take top 5
(Example from http://wiki.apache.org/pig-data/attachments/pigtalkspapers/attachments/apacheconeurope09.ppt)

In MapReduce
The original slide shows the equivalent hand-written MapReduce code: several chained jobs of dense, low-level Java, not reproduced in this transcript. A condensed sketch of those jobs follows.
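This Python sketch is only a rough stand-in for the Java on the original slide; the record formats and the source-tagging scheme are illustrative assumptions, not the slide's actual code:

    # Job 1: filter Users by age and join with Pages on user name.
    # The map tags each record with its source so the reducer can join them.
    def join_map(record, source):
        if source == "users":
            name, age = record
            if 18 <= int(age) <= 25:
                yield (name, ("U", None))          # qualifying user
        else:
            user, url = record
            yield (user, ("P", url))               # page view

    def join_reduce(name, tagged_values):
        # The shuffle groups both sources under the same name.
        if any(tag == "U" for tag, _ in tagged_values):
            for tag, url in tagged_values:
                if tag == "P":
                    yield (url, 1)

    # Job 2: group by url and count clicks.
    def count_reduce(url, ones):
        yield (url, sum(ones))

    # Job 3: a whole extra MapReduce job just to order by clicks and take
    # the top 5, since global sorting requires another shuffle.

Even condensed, the query takes three chained jobs plus explicit join plumbing, versus the nine lines of Pig Latin below.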

In Pig Latin
Users = load 'users' as (name, age);
Filtered = filter Users by age >= 18 and age <= 25;
Pages = load 'pages' as (user, url);
Joined = join Filtered by name, Pages by user;
Grouped = group Joined by url;
Summed = foreach Grouped generate group, COUNT(Joined) as clicks;
Sorted = order Summed by clicks desc;
Top5 = limit Sorted 5;
store Top5 into 'top5sites';
(Example from http://wiki.apache.org/pig-data/attachments/pigtalkspapers/attachments/apacheconeurope09.ppt)

Translation to MapReduce
Notice how naturally the components of the job translate into Pig Latin:
- Job 1: Load Users, Filter by age, Load Pages, Join on name (Users = load, Filtered = filter, Pages = load, Joined = join)
- Job 2: Group on url, Count clicks (Grouped = group, Summed = COUNT())
- Job 3: Order by clicks, Take top 5 (Sorted = order, Top5 = limit)

Hive
- Relational database built on Hadoop; developed at Facebook and used for many Facebook jobs
- Maintains table schemas
- SQL-like query language (which can also call Hadoop Streaming scripts)
- Supports table partitioning, complex data types, sampling, and some query optimization

Spark Motivation
- Complex jobs, interactive queries, and online processing all need one thing that MR lacks: efficient primitives for data sharing
- Examples: iterative jobs (Stage 1 -> Stage 2 -> Stage 3), interactive mining (Query 1, 2, 3 over the same data), stream processing (Job 1, Job 2, ...)

Spark Motivation (continued)
- Problem: in MR, the only way to share data across jobs is via stable storage (e.g., the file system), which is slow. An iterative job becomes Input -> HDFS read -> iter. 1 -> HDFS write -> HDFS read -> iter. 2 -> HDFS write -> ...; interactive mining re-reads the input from HDFS for every query (query 1 -> result 1, query 2 -> result 2, ...)
- Opportunity: DRAM is getting cheaper; use main memory for intermediate results instead of disks

Goal: In-Memory Data Sharing
- After one-time processing, keep the input and intermediate results in distributed memory, so iterations and queries run 10-100x faster than over the network and disk

Solution: Resilient Distributed Datasets (RDDs)
- Partitioned collections of records that can be stored in memory across the cluster
- Manipulated through a diverse set of transformations (map, filter, join, etc.)
- Fault recovery without costly replication: remember the series of transformations that built an RDD (its lineage) and recompute lost data
- http://spark.incubator.apache.org/
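A minimal PySpark sketch of this idea; the log-file path and filter predicates are illustrative assumptions:

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "rdd-sketch")

    # Transformations build the RDD's lineage; nothing executes yet (lazy).
    lines = sc.textFile("hdfs:///logs/app.log")        # hypothetical path
    errors = lines.filter(lambda l: l.startswith("ERROR"))

    # Keep the filtered data in cluster memory for reuse.
    errors.cache()

    # Actions trigger computation; both queries reuse the cached in-memory
    # RDD instead of re-reading from HDFS as MapReduce would.
    print(errors.count())
    print(errors.filter(lambda l: "timeout" in l).count())

    # If a cached partition is lost, Spark recomputes just that partition
    # from the lineage: textFile -> filter(startswith "ERROR").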

Administrivia
- Project 4 design doc due today (12/2) by 11:59pm; code due next week, Thursday 12/12, by 11:59pm
- Midterm #2 is this Wednesday 12/4, 5:30-7pm, in 145 Dwinelle (A-L) and 2060 Valley LSB (M-Z): covers Lectures 13-24, projects, and readings; one sheet of notes, both sides
- Prof. Joseph's office hours extended tomorrow: 10-11:30 in 449 Soda; RRR week office hours: TBA

(5 min break)

Datacenter Scheduling Problem
- Rapid innovation in datacenter computing frameworks (Pregel, CIEL, Pig, Dryad, Percolator, ...)
- No single framework is optimal for all applications
- Want to run multiple frameworks in a single datacenter, to maximize utilization and to share data between frameworks

Where We Want to Go
- Today: static partitioning; Hadoop runs on one set of nodes, Pregel on another
- Goal: dynamic sharing of a single cluster among frameworks

Solution: Apache Mesos
- Mesos is a common resource-sharing layer over which diverse frameworks (Hadoop, Pregel, ...) run on a shared cluster
- Run multiple instances of the same framework: isolate production and experimental jobs; run multiple versions of the framework concurrently
- Build specialized frameworks targeting particular problem domains, with better performance than general-purpose abstractions

Mesos Goals
- High utilization of resources
- Support for diverse frameworks (current and future)
- Scalability to 10,000s of nodes
- Reliability in the face of failures
- http://mesos.apache.org/

Mesos Design Elements
- Fine-grained sharing: allocation at the level of tasks within a job; improves utilization, latency, and data locality
- Resource offers: a simple, scalable, application-controlled scheduling mechanism
- Resulting design: a small microkernel-like core that pushes scheduling logic to the frameworks

Element 1: Fine-Grained Sharing
- Coarse-grained sharing (HPC): each framework statically holds whole nodes on top of the storage system (e.g., HDFS)
- Fine-grained sharing (Mesos): tasks from frameworks 1, 2, and 3 are interleaved across all nodes, over the same storage system
- + Improved utilization, responsiveness, and data locality

Element 2: Resource Offers
- Option: global scheduler. Frameworks express their needs in a specification language, and the global scheduler matches them to resources
  + Can make optimal decisions
  - Complex: the language must support all framework needs; difficult to scale and to make robust; future frameworks may have unanticipated needs
- Mesos: resource offers. Offer available resources to frameworks, and let them pick which resources to use and which tasks to launch
  + Keeps Mesos simple and lets it support future frameworks
  - Decentralized decisions might not be optimal
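To make the offer mechanism concrete, here is a simplified Python sketch of a framework-side scheduler responding to an offer like the example in the architecture notes below. The types and method name are hypothetical stand-ins, not the real Mesos framework API:

    from dataclasses import dataclass

    @dataclass
    class Offer:
        node: str
        cpus: float
        mem_gb: float

    class FrameworkScheduler:
        """Scheduling logic lives in the framework, not in Mesos."""
        def __init__(self, pending_tasks):
            self.pending = pending_tasks      # list of (name, cpus, mem_gb)

        def resource_offers(self, offers):
            # Called with the master's offer; pick which resources to use.
            launched = []
            for offer in offers:
                while self.pending:
                    name, cpus, mem = self.pending[0]
                    if cpus <= offer.cpus and mem <= offer.mem_gb:
                        launched.append((offer.node, name))
                        offer.cpus -= cpus        # consume offered resources
                        offer.mem_gb -= mem
                        self.pending.pop(0)
                    else:
                        break                     # decline the remainder
            return launched

    sched = FrameworkScheduler([("map-1", 1, 2), ("map-2", 2, 2)])
    print(sched.resource_offers([Offer("node1", 2, 4), Offer("node2", 3, 2)]))
    # [('node1', 'map-1'), ('node2', 'map-2')]

Mesos itself only decides which framework receives the offer; the framework decides what runs where, which is what keeps the core small.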

Mesos Architecture
- Each framework (e.g., an MPI job with its MPI scheduler, a Hadoop job with the Hadoop scheduler) registers its scheduler with the Mesos master
- The master's allocation module picks a framework to offer resources to and sends it a resource offer: a list of (node, availableResources) pairs, e.g. { (node1, <2 CPUs, 4 GB>), (node2, <3 CPUs, 2 GB>) }
- Framework-specific scheduling: the framework's scheduler picks which offered resources to use and which tasks to launch
- Mesos slaves launch and isolate the framework executors (e.g., the Hadoop executor) that run the tasks

Deployments
- 1,000s of nodes running over a dozen production services
- Genomics researchers using Hadoop and Spark on Mesos
- Spark in use by Yahoo! Research; Spark for analytics
- Hadoop and Spark used by machine-learning researchers

Summary
- Cloud computing/datacenters are the new computer
- An emerging Datacenter/Cloud Operating System is appearing
- Pieces of the DC/Cloud OS: high-throughput filesystems (GFS/HDFS); job frameworks (MapReduce, Apache Hadoop, Apache Spark, Pregel); high-level query languages (Apache Pig, Apache Hive); cluster scheduling (Apache Mesos)