Database Systems CSE 414


Database Systems CSE 414
Lecture 19: MapReduce (Ch. 20.2)
CSE 414 - Fall 2017

Announcements
- HW5 is due tomorrow 11pm
- HW6 is posted and due Nov. 27, 11pm
- Section Thursday on setting up Spark on AWS
  - Create your AWS account before arriving
  - Follow the first part of the Spark setup instructions ("Setting up an AWS account") to get credits for free use: https://courses.cs.washington.edu/courses/cse414/17au/spark/spark-setup.html (note that this may take a while to process)
- Remember to terminate your cluster when not in use! Otherwise, you will be charged lots of $$$$

Optional Reading
- Original paper: https://www.usenix.org/legacy/events/osdi04/tech/dean.html
- Rebuttal to a comparison with parallel DBs: http://dl.acm.org/citation.cfm?doid=1629175.1629198
- Chapter 2 (Sections 1, 2, 3 only) of Mining of Massive Datasets, by Rajaraman and Ullman: http://i.stanford.edu/~ullman/mmds.html

Distributed File System (DFS)
- For very large files: TBs, PBs
- Each file is partitioned into chunks, typically 64MB
- Each chunk is replicated several times (e.g., 3), on different racks, for fault tolerance
- Implementations:
  - Google's DFS: GFS, proprietary
  - Hadoop's DFS: HDFS, open source

MapReduce
- Google: paper published 2004
- Free variant: Hadoop
- MapReduce = high-level programming model and implementation for large-scale parallel data processing

MapReduce Process
- Read a lot of data (records)
- Map: extract the info you want from each record
- Shuffle and sort (looks familiar?)
- Reduce: aggregate, summarize, filter, transform
- Write the results
The paradigm stays the same; change the map and reduce functions for different problems. (slide source: Jeff Dean)

Data Model
Files! A file = a bag of (key, value) pairs
A MapReduce program:
- Input: a bag of (inputkey, value) pairs
- Output: a bag of (outputkey, value) pairs

Step 1: the MAP Phase
User provides the MAP function:
- Input: (input key, value)
- Output: bag of (intermediate key, value) pairs
The system applies the map function in parallel to all (input key, value) pairs in the input file.

Step 2: the REDUCE Phase
User provides the REDUCE function:
- Input: (intermediate key, bag of values)
- Output: bag of output (key, value) pairs
The system groups all pairs with the same intermediate key and passes the bag of values to the REDUCE function.

Example
Counting the number of occurrences of each word in a large collection of documents.
For each document:
- The key = document id (did)
- The value = set of words (word)

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += 1;
  Emit(key, AsString(result));
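The word-count pseudocode above can be mimicked by a small, single-process Python simulation of the map → shuffle → reduce pipeline; the names run_mapreduce, map_fn, and reduce_fn are illustrative only, not part of any real MapReduce API:

```python
from collections import defaultdict

def map_fn(doc_id, contents):
    # Map: emit (word, 1) for every word occurrence in the document.
    for word in contents.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Reduce: sum the counts for one word.
    yield (word, sum(counts))

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Shuffle: group all intermediate (key, value) pairs by key.
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)
    # Apply the reduce function to each group.
    output = {}
    for ikey, ivalues in sorted(groups.items()):
        for okey, ovalue in reduce_fn(ikey, ivalues):
            output[okey] = ovalue
    return output

docs = [("did1", "a rose is a rose"), ("did2", "a b")]
print(run_mapreduce(docs, map_fn, reduce_fn))
# {'a': 3, 'b': 1, 'is': 1, 'rose': 2}
```

In a real cluster the shuffle happens over the network between map and reduce workers; here it is just a dictionary of lists.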

MAP → Shuffle → REDUCE (data flow):
(did1, v1), (did2, v2), (did3, v3), ...
→ map → (w1, 1), (w2, 1), (w3, 1), (w1, 1), (w2, 1), ...
→ shuffle → (w1, (1, 1, 1, ..., 1)), (w2, (1, 1, ...)), (w3, (1, ...))
→ reduce → (w1, 25), (w2, 77), (w3, 12), ...

Jobs & Tasks
- A MapReduce job = one single query, e.g., count the words in all docs; more complex queries may consist of multiple jobs
- A map task or a reduce task = a group of instantiations of the map or reduce function, scheduled on a single worker

Workers
- A worker is a process that executes one task at a time
- Typically there is one worker per processor, hence 4 or 8 per node

Fault Tolerance
- If one server fails once every year, then a job with 10,000 servers will fail in less than one hour
- MapReduce handles fault tolerance by writing intermediate files to disk:
  - Mappers write files to local disk
  - Reducers read the files (= reshuffling); if a server fails, the reduce task is restarted on another server

MAP Tasks → Shuffle → REDUCE Tasks: the same data flow as on the previous slide, with the map and reduce work grouped into tasks.

MapReduce Execution Details
- Map tasks read their input from the file system (GFS or HDFS); the data is not necessarily local
- Intermediate data goes to local disk, partitioned into M × R files (why?)
- Reduce (shuffle) tasks read the intermediate files
- Output is written to disk, replicated in the cluster

MapReduce Phases [diagram: map and reduce phases, with intermediate results on local storage]

Implementation
- There is one master node
- The master partitions the input file into M splits, by key
- The master assigns workers (= servers) to the M map tasks and keeps track of their progress
- Map workers write their output to local disk, partitioned into R regions
- The master assigns workers to the R reduce tasks
- Reduce workers read the regions from the map workers' local disks

Interesting Implementation Details
Worker failure:
- The master pings workers periodically
- If a worker is down, the master reassigns its task to another worker

Interesting Implementation Details
Backup tasks:
- Straggler = a machine that takes an unusually long time to complete one of the last tasks, e.g.:
  - A bad disk forces frequent correctable errors (30 MB/s → 1 MB/s)
  - The cluster scheduler has scheduled other tasks on that machine
- Stragglers are a main reason for slowdown
- Solution: pre-emptive backup execution of the last few remaining in-progress tasks

Issues with MapReduce
- Difficult to write more complex queries
- Such queries need multiple MapReduce jobs, which dramatically slows things down because all intermediate results are written to disk
- More recent systems work in memory; next lecture: Spark

Relational Operators in MapReduce
Given relations R(A, B) and S(B, C), compute:
- Selection: σ_{A=123}(R)
- Group-by: γ_{A, sum(B)}(R)
- Join: R ⋈ S

Selection σ_{A=123}(R)

map(String value):
  if value.A = 123:
    EmitIntermediate(value.key, value);

reduce(String k, Iterator values):
  for each v in values:
    Emit(v);

Selection σ_{A=123}(R)

map(String value):
  if value.A = 123:
    EmitIntermediate(value.key, value);

reduce(String k, Iterator values):
  for each v in values:
    Emit(v);

There is no need for the reduce here, but it takes system hacking to remove the reduce phase from MapReduce.
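A minimal Python sketch of this map-side-only selection; the tuple layout (dicts with "key", "A", "B" fields) is an assumption for illustration:

```python
def select_map(t):
    # Map: keep only tuples with A = 123; all other tuples are dropped.
    if t["A"] == 123:
        yield (t["key"], t)

R = [
    {"key": "k1", "A": 123, "B": 10},
    {"key": "k2", "A": 7,   "B": 20},
    {"key": "k3", "A": 123, "B": 30},
]

# Map-side only: no grouping is needed, a reducer would just pass values through.
result = [v for t in R for _, v in select_map(t)]
print([t["key"] for t in result])   # ['k1', 'k3']
```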

Group By γ_{A, sum(B)}(R)

map(String value):
  EmitIntermediate(value.A, value.B);

reduce(String k, Iterator values):
  s = 0;
  for each v in values:
    s = s + v;
  Emit(k, s);
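The group-by pseudocode can be simulated in Python as follows; the helper names (groupby_map, groupby_reduce, run) are illustrative:

```python
from collections import defaultdict

def groupby_map(t):
    # Map: emit (A, B) so that tuples with equal A meet at the same reducer.
    yield (t["A"], t["B"])

def groupby_reduce(a, bs):
    # Reduce: compute sum(B) for one group.
    yield (a, sum(bs))

def run(R):
    # Shuffle: group the B values by their A key.
    groups = defaultdict(list)
    for t in R:
        for k, v in groupby_map(t):
            groups[k].append(v)
    return dict(kv for k, vs in groups.items() for kv in groupby_reduce(k, vs))

R = [{"A": "x", "B": 1}, {"A": "y", "B": 5}, {"A": "x", "B": 2}]
print(run(R))   # {'x': 3, 'y': 5}
```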

Join
Two simple parallel join algorithms:
- Partitioned hash-join
- Broadcast join

Partitioned Hash-Join for R(A, B) ⋈_{B=C} S(C, D)
- Initially, both R and S are horizontally partitioned: R_1, S_1; R_2, S_2; ...; R_P, S_P
- Reshuffle R on R.B and S on S.C
- Each server computes the join locally on its reshuffled partition

Partitioned Hash-Join for R(A, B) ⋈_{B=C} S(C, D)

map(String value):
  case value.relationName of
    "R": EmitIntermediate(value.B, ("R", value));
    "S": EmitIntermediate(value.C, ("S", value));

reduce(String k, Iterator values):
  R = empty; S = empty;
  for each v in values:
    case v.type of
      "R": R.insert(v);
      "S": S.insert(v);
  for v1 in R, for v2 in S:
    Emit(v1, v2);
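A runnable sketch of the partitioned hash-join, simulated in one process; the shuffle-by-key stands in for the network reshuffle, and all names are illustrative:

```python
from collections import defaultdict

def join_map(rel_name, t):
    # Map: tag each tuple with its relation and emit it under its join key.
    if rel_name == "R":
        yield (t["B"], ("R", t))
    else:
        yield (t["C"], ("S", t))

def join_reduce(key, tagged):
    # Reduce: separate the two relations, then output the local cross product.
    r_side = [t for tag, t in tagged if tag == "R"]
    s_side = [t for tag, t in tagged if tag == "S"]
    for r in r_side:
        for s in s_side:
            yield (r, s)

def partitioned_hash_join(R, S):
    # Shuffle: tuples of R and S with equal join key land in the same group.
    groups = defaultdict(list)
    for rel, tuples in (("R", R), ("S", S)):
        for t in tuples:
            for k, v in join_map(rel, t):
                groups[k].append(v)
    out = []
    for k, vs in groups.items():
        out.extend(join_reduce(k, vs))
    return out

R = [{"A": 1, "B": 10}, {"A": 2, "B": 20}]
S = [{"C": 10, "D": "a"}, {"C": 10, "D": "b"}]
print(partitioned_hash_join(R, S))   # two result pairs, both with B = C = 10
```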

Broadcast Join for R(A, B) ⋈_{B=C} S(C, D)
- Broadcast S to every server (instead of reshuffling R on R.B; R stays partitioned as R_1, R_2, ..., R_P)
- Each server joins its local partition with its full copy of S: R_1 ⋈ S, R_2 ⋈ S, ..., R_P ⋈ S

Broadcast Join for R(A, B) ⋈_{B=C} S(C, D)

map(String value):
  /* the map should read several records of R:
     value = some group of records */
  open(S); /* over the network */
  hashTbl = new();
  for each w in S:          /* read the entire table S, build a hash table */
    hashTbl.insert(w.C, w);
  close(S);
  for each v in value:
    for each w in hashTbl.find(v.B):
      Emit(v, w);

reduce(): /* empty: map-side only */
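The broadcast join can be sketched in Python as a map-side-only function: build a hash table on the small, broadcast relation S, then probe it with each local tuple of R. The names and tuple layout are assumptions for illustration:

```python
from collections import defaultdict

def broadcast_join(R_chunk, S):
    # Build a hash table on the (small) broadcast relation S, keyed on C.
    table = defaultdict(list)
    for s in S:
        table[s["C"]].append(s)
    # Probe the table with each local R tuple; no shuffle, no reduce phase.
    for r in R_chunk:
        for s in table.get(r["B"], []):
            yield (r, s)

R = [{"A": 1, "B": 10}, {"A": 2, "B": 20}]
S = [{"C": 10, "D": "a"}]
print(list(broadcast_join(R, S)))   # [({'A': 1, 'B': 10}, {'C': 10, 'D': 'a'})]
```

In a cluster, each server would run this function over its own chunk of R, each with its own full copy of S.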

Conclusions
- MapReduce offers a simple abstraction and handles distribution + fault tolerance
- Speedup/scaleup is achieved by dynamically allocating map tasks and reduce tasks to available servers; however, skew is possible (e.g., one huge reduce task)
- Writing intermediate results to disk is necessary for fault tolerance, but very slow; Spark replaces this with Resilient Distributed Datasets = main memory + lineage

Conclusions II
- Widely used in industry: Google Search, machine learning, etc.; looks good on a resume
- Has been generalized (see Google Dataflow)
- Harder to use than necessary: the language is imperative, not declarative (i.e., you have to actually write code)