Batch Processing: Basic architecture


Batch Processing in big data systems
COS 518: Distributed Systems, Lecture 10
Andrew Or, Mike Freedman

[Figure: a cluster of commodity machines, each with 64 GB RAM and 32 cores.]

[Figure: a user submits WordCount.java and the cluster launches a driver for it; a second user submits FindTopTweets.java and another driver is launched alongside the first.]

[Figure: several applications (WordCount, Tweets, App3) running side by side, each with its own driver and executor containers spread across the cluster.]

Basic architecture
Users submit applications to the cluster manager. The cluster manager assigns cluster resources to applications. Each worker launches containers for each application: driver containers run the main method of the user program, and executor containers run the actual computation.
Examples of cluster managers: YARN, Mesos. Examples of computing frameworks: Hadoop MapReduce, Spark.

Two levels of scheduling
Cluster-level: the cluster manager assigns resources to applications.
Application-level: the driver assigns tasks to run on executors. A task is a unit of execution that operates on one partition.
Some advantages: applications need not be concerned with resource fairness; the cluster manager need not be concerned with individual tasks; priorities and preemption are easy to implement.

Case Study: MapReduce (data-parallel programming at scale)

Application: Word count
Locally: tokenize and put words in a hash map.
"Hello my love. I love you, my dear. Goodbye." becomes hello: 1, my: 2, love: 2, i: 1, you: 1, dear: 1, goodbye: 1.
How do you parallelize this? Split the document in half, build two hash maps (one for each half), then merge the two hash maps by key.
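A minimal local sketch of this split-and-merge idea, in plain Python (not the lecture's code; the tokenization details are illustrative):

    from collections import Counter

    text = "Hello my love. I love you, my dear. Goodbye."

    def tokenize(chunk):
        # lowercase, strip punctuation, split on whitespace
        return [w.strip(".,!?").lower() for w in chunk.split() if w.strip(".,!?")]

    # split the token stream in half and count each half independently
    words = tokenize(text)
    mid = len(words) // 2
    left, right = Counter(words[:mid]), Counter(words[mid:])

    # merge the two hash maps by key (Counter addition sums per-key counts)
    merged = left + right
    print(merged)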

How do you do this in a distributed environment?
Input document: the opening of the Declaration of Independence ("When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, and to assume, among the Powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.").
Partition the document into chunks and send each chunk to a different node.

Compute word counts locally
Each node builds a hash map for its own chunk, e.g. when: 1, in: 1, the: 1, course: 1, of: 1, human: 1, events: 1, it: 1 on one node; dissolve: 1, the: 2, political: 1, bands: 1, which: 1, have: 1, connected: 1, them: 1, ... on another; and so on.
Now what? How do we merge the results?

Merging results computed locally
Several options:
Don't merge: requires additional computation later to get correct results.
Send everything to one node: too slow, and what if the data is too big?
Partition the key space among the nodes in the cluster (e.g. [a-e], [f-j], [k-p], ...; sketched below):
1. Assign a key space to each node.
2. Partition the local results by the key spaces.
3. Each node fetches and merges the results that correspond to its key space.
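A sketch of the key-space partitioning and merge, in plain Python (the ranges and sample counts are illustrative):

    from collections import Counter

    # key ranges owned by each node, matching the [a-e], [f-j], ... example
    RANGES = [("a", "e"), ("f", "j"), ("k", "p"), ("q", "s"), ("t", "z")]

    def owner(word):
        # the node whose range contains the word's first letter
        first = word[0].lower()
        for node, (lo, hi) in enumerate(RANGES):
            if lo <= first <= hi:
                return node
        return 0  # anything non-alphabetic falls back to node 0

    def split_by_key_space(local_counts):
        # partition one node's local hash map into per-destination pieces
        pieces = [Counter() for _ in RANGES]
        for word, count in local_counts.items():
            pieces[owner(word)][word] += count
        return pieces

    def merge(received_pieces):
        # after the all-to-all shuffle, each node merges the pieces it received
        return sum(received_pieces, Counter())

    node0 = Counter({"when": 1, "the": 1, "course": 1, "of": 1})
    node1 = Counter({"the": 2, "political": 1, "bands": 1})
    # pieces destined for the node that owns [a-e]
    to_ae = [split_by_key_space(node0)[0], split_by_key_space(node1)[0]]
    print(merge(to_ae))   # Counter({'course': 1, 'bands': 1})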

[Figure: each node splits its local results into pieces by key space ([a-e], [f-j], [k-p], [q-s], [t-z]), then an all-to-all shuffle delivers each piece to the node that owns that range. After the shuffle a node holds duplicates for the same key (e.g. the: 1, the: 2, the: 1), which it merges into final counts (e.g. the: 4, which: 2, them: 2, of: 5).]

MapReduce
Partition the dataset into many chunks.
Map stage: each node processes one or more chunks locally.
Reduce stage: each node fetches and merges partial results from all other nodes.

MapReduce interface
map(key, value) -> list(<k', v'>): applies a function to a (key, value) pair and outputs a set of intermediate pairs.
reduce(key, list<value>) -> <k', v'>: applies an aggregation function to the values and outputs the result.

MapReduce: Word count
map(key, value):
  // key = document name
  // value = document contents
  for each word w in value:
    emit(w, 1)

reduce(key, values):
  // key = the word
  // values = number of occurrences of that word
  count = sum(values)
  emit(key, count)

[Figure: the full word-count pipeline: map, combine, partition, reduce.]
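A single-machine simulation of this interface, in plain Python (the function names mirror the pseudocode; everything else is illustrative):

    from collections import defaultdict

    def map_fn(key, value):
        # key = document name, value = document contents
        for word in value.split():
            yield (word, 1)

    def reduce_fn(key, values):
        # key = a word, values = the list of partial counts for that word
        return (key, sum(values))

    def run_mapreduce(documents):
        # map stage: apply map_fn to every (name, contents) pair
        intermediate = defaultdict(list)
        for name, contents in documents.items():
            for k, v in map_fn(name, contents):
                intermediate[k].append(v)   # group by intermediate key (the shuffle)
        # reduce stage: apply reduce_fn to each key and its grouped values
        return dict(reduce_fn(k, vs) for k, vs in intermediate.items())

    docs = {"doc1": "the quick brown fox", "doc2": "the lazy dog and the fox"}
    print(run_mapreduce(docs))   # {'the': 3, 'quick': 1, ..., 'fox': 2}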

[Figure: the synchronization barrier between the map stage and the reduce stage.]

[Figure: timeline of data-parallel frameworks from 2004 to 2015, including MapReduce (2004) and Dryad (2007).]

Brainstorm: Top K
Top K is the problem of finding the largest K values from a set of numbers. How would you express this as a distributed application? In particular, what would the map and reduce phases look like? Hint: use a heap.

Key idea (assuming a set of K integers fits in memory):
Map phase: every node maintains a heap of K elements.
Reduce phase: merge the heaps until you're left with one.

Brainstorm: Top K
Problem: what are the keys and values here? There is no natural key, so assign the same key to all of the values (e.g. key = 1):
Map task 1: [10, 5, 3, 700, 18, 4] -> (1, heap(700, 18, 10))
Map task 2: [16, 4, 523, 100, 88] -> (1, heap(523, 100, 88))
Map task 3: [3, 3, 3, 3, 300, 3] -> (1, heap(300, 3, 3))
Map task 4: [8, 15, 20015, 89] -> (1, heap(20015, 89, 15))
Then all the heaps go to the single reducer responsible for key 1. This works, but it is clearly not scalable.

Idea: use X different keys to balance the load (e.g. X = 2 here):
Map task 1: [10, 5, 3, 700, 18, 4] -> (1, heap(700, 18, 10))
Map task 2: [16, 4, 523, 100, 88] -> (1, heap(523, 100, 88))
Map task 3: [3, 3, 3, 3, 300, 3] -> (2, heap(300, 3, 3))
Map task 4: [8, 15, 20015, 89] -> (2, heap(20015, 89, 15))
Then the heaps (hopefully) go to X different reducers. Rinse and repeat (what's the runtime complexity?).
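A sketch of the map-side top-K and the reduce-side merge, using Python's heapq (assuming K values fit comfortably in memory; nlargest maintains a bounded min-heap internally, which is the heap the slides refer to):

    import heapq

    K = 3
    partitions = [[10, 5, 3, 700, 18, 4],
                  [16, 4, 523, 100, 88],
                  [3, 3, 3, 3, 300, 3],
                  [8, 15, 20015, 89]]

    def map_top_k(partition):
        # each map task keeps only the K largest values it has seen
        return heapq.nlargest(K, partition)

    def reduce_top_k(heaps):
        # merge the per-task results and keep the K largest overall
        return heapq.nlargest(K, (x for h in heaps for x in h))

    heaps = [map_top_k(p) for p in partitions]   # map phase
    print(reduce_top_k(heaps))                   # reduce phase -> [20015, 700, 523]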

Case Study: Spark (data-parallel programming at scale)

What is Spark?
A general distributed data execution engine.
Key optimizations: general computation graphs (pipelining, lazy execution), in-memory data sharing (caching), and fine-grained fault tolerance.
The #1 most active big data project.
The Spark stack: Spark Core at the bottom, with Spark SQL (structured data), Spark Streaming (real-time), MLlib (machine learning), and GraphX (graph processing) built on top.

Spark computational model
Most computation can be expressed in terms of two phases: a map phase defines how each machine processes its individual partition, and a reduce phase defines how to merge the map outputs from the previous phase. Spark expresses a computation as a DAG of maps and reduces.

[Figure: a simple DAG with a map phase, a synchronization barrier, and a reduce phase, followed by a more complicated DAG chaining several such stages.]

Spark: Word count
Transformations express how to process a dataset; actions express how to turn a transformed dataset into results.

sc.textFile("declaration-of-independence.txt")
  .flatMap { line => line.split(" ") }
  .map { word => (word, 1) }
  .reduceByKey { case (counts1, counts2) => counts1 + counts2 }
  .collect()

Pipelining + lazy execution
Transformations can be pipelined until we hit a synchronization barrier (e.g. a reduce) or an action.
Example: data.map { ... }.filter { ... }.flatMap { ... }.groupByKey().count()
The map, filter, and flatMap operations can all run in the same task. This allows lazy execution; we don't need to eagerly execute map. (A rough single-machine analogy appears at the end of this section.)

In-memory caching
Store intermediate results in memory to bypass disk access. This is important for iterative workloads (e.g. machine learning!).
Example:
val cached = data.map { ... }.filter { ... }.cache()
(1 to 100).foreach { i => cached.reduceByKey { ... }.saveAsTextFile(...) }

Reusing map outputs
Reusing map outputs allows Spark to avoid redoing map stages. Along with caching, this makes iterative workloads much faster.
Example:
val transformed = data.map { ... }.reduceByKey { ... }
transformed.collect()
transformed.collect() // does not run the map phase again
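The lazy pipelining idea can be mimicked on a single machine with Python generators: nothing runs until an "action" forces the chain. This is only a rough analogy, not how Spark is implemented:

    data = range(10)

    # "transformations": each wraps the previous one lazily; nothing executes yet
    mapped = (x * 2 for x in data)
    filtered = (x for x in mapped if x % 3 == 0)

    # "action": forces the whole pipeline to run in a single pass over the data
    result = list(filtered)
    print(result)   # [0, 6, 12, 18]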

Logistic regression performance
[Figure: running time (s) versus number of iterations (1 to 30) for Hadoop and Spark. Hadoop takes roughly 127 s per iteration; Spark takes 174 s for the first iteration and about 6 s for each further iteration. Borrowed from Matei Zaharia's talk, "Spark: Fast, Interactive, Language-Integrated Computing".]

Monday: stream processing.

Application: word count, as a SQL query and as a shell pipeline:
SELECT count(word) FROM data GROUP BY word
cat data.txt | tr -s '[[:punct:][:space:]]' '\n' | sort | uniq -c

Using partial aggregation
1. Compute word counts from individual files.
2. Then merge the intermediate output.
3. Compute the word count on the merged outputs.

Using partial aggregation, distributed
1. In parallel, send to each worker: compute word counts from its individual files. Collect the results; wait until all workers have finished.
2. Then merge the intermediate output.
3. Compute the word count on the merged intermediates.

MapReduce: Programming interface
map(key, value) -> list(<k', v'>): applies a function to a (key, value) pair and produces a set of intermediate pairs.
reduce(key, list<value>) -> <k', v'>: applies an aggregation function to the values and outputs the result.

map(key, value):
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(key, list(values)):
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));

MapReduce: Optimizations
combine(list<key, value>) -> list<k, v>: performs partial aggregation on the mapper node, e.g. <the, 1>, <the, 1>, <the, 1> becomes <the, 3>. For this, reduce() should be commutative and associative.
partition(key, int) -> int: intermediate values with the same key must be aggregated together. Given n partitions, map a key to a partition 0 <= i < n, typically via hash(key) mod n. A sketch of both follows.
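A sketch of a partitioner and a combiner, in plain Python (the hash choice is illustrative; a stable hash keeps the key-to-partition assignment consistent across mapper processes):

    import hashlib
    from collections import Counter

    def partition(key, n):
        # map a key to a partition in [0, n) via hash(key) mod n
        digest = hashlib.md5(key.encode()).hexdigest()
        return int(digest, 16) % n

    def combine(pairs):
        # partial aggregation on the mapper node:
        # [("the", 1), ("the", 1), ("fox", 1)] -> [("the", 2), ("fox", 1)]
        combined = Counter()
        for k, v in pairs:
            combined[k] += v
        return list(combined.items())

    print(partition("the", 4))                             # a value in 0..3
    print(combine([("the", 1), ("the", 1), ("fox", 1)]))   # [('the', 2), ('fox', 1)]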

Fault tolerance in MapReduce
A map worker writes its intermediate output to local disk, separated by partition; once it has completed, it tells the master node. A reduce worker is told the locations of the map task outputs, pulls its partition's data from each mapper, and executes the reduce function across that data.
Note: there is an all-to-all shuffle between mappers and reducers, and data is written to disk ("materialized") between each stage.

The master node monitors the state of the system. If the master fails, the job aborts and the client is notified.
Map worker failure: both in-progress and completed map tasks are marked as idle (to be re-executed); reduce workers are notified when a map task is re-executed on another map worker.
Reduce worker failure: in-progress tasks are reset to idle (and re-executed); completed tasks have already been written to the global file system.

Straggler mitigation in MapReduce
Tail latency means some workers finish late. For slow map tasks, execute a backup copy in parallel on a second map worker and race to complete the task. A toy sketch follows.
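A toy sketch of this backup-task idea on one machine, using Python threads (purely illustrative; in a real system the losing copy would be killed rather than left to finish):

    import concurrent.futures as cf
    import random, time

    def map_task(task_id, copy):
        # simulate a task whose running time varies (stragglers are the slow copies)
        time.sleep(random.uniform(0.1, 1.0))
        return f"task {task_id} finished first on the {copy} copy"

    def run_with_backup(task_id):
        with cf.ThreadPoolExecutor(max_workers=2) as pool:
            primary = pool.submit(map_task, task_id, "primary")
            backup = pool.submit(map_task, task_id, "backup")
            # race: take the result of whichever copy completes first
            done, _ = cf.wait([primary, backup], return_when=cf.FIRST_COMPLETED)
            return next(iter(done)).result()

    print(run_with_backup(7))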