Announcements. Parallel Data Processing in the 20 th Century. Parallel Join Illustration. Introduction to Database Systems CSE 414

Size: px

Start display at page:

Download "Announcements. Parallel Data Processing in the 20 th Century. Parallel Join Illustration. Introduction to Database Systems CSE 414"

Julianna Johnson
5 years ago
Views:

) CSE 414 - Spring 2018 1 2 Approaches to Parallel Query Evaluation Inter-query parallelism One query per node Good for transactional (OLTP) workloads Inter-operator parallelism Operator per node

1 Introduction to Database Systems CSE 414 Lecture 17: MapReduce and Spark Announcements Midterm this Friday in class! Review session tonight See course website for OHs Includes everything up to Monday s lecture HW6 released Not due until next Friday 5/11 No WQ6 (Yay!) CSE Spring Approaches to Parallel Query Evaluation Inter-query parallelism One query per node Good for transactional (OLTP) workloads Inter-operator parallelism Operator per node Good for analytical (OLAP) workloads Parallel Data Processing in the 20 th Century Intra-operator parallelism Operator on multiple nodes Good for both? CSE Spring We study only intra-operator parallelism: most scalable 4 Parallel Execution of RA Operators: Partitioned Hash-Join Data: R(, A, ), S(,, C) Query: R(, A, ) S(,, C) Initially, both R and S are partitioned on and Reshuffle R on R. and S on S. Each server computes the join locally R 1, S 1 R 2, S 2... R P, S P R 1, S 1 R 2, S 2... R P, S P CSE Spring Data: R(,A, ), S(,, C) Query: R(,A,) S(,,C) Partition Shuffle on Local Join Parallel Join Illustration R1 S1 R2 S M R1 S1 R2 S2 M M2 M

Data: R(A, ), S(C, D) Query: R(A,) =C S(C,D) roadcast Join roadcast S Reshuffle R on R. R 1 R 2... R P S Parallel Data Processing @ 2000 R 1, S R 2, S... R P, S Why would you want to do this?

2 Data: R(A, ), S(C, D) Query: R(A,) =C S(C,D) roadcast Join roadcast S Reshuffle R on R. R 1 R 2... R P S Parallel Data 2000 R 1, S R 2, S... R P, S Why would you want to do this? CSE Spring CSE Spring Optional Reading Original paper: ech/dean.html Rebuttal to a comparison with parallel Ds: Chapter 2 (Sections 1,2,3 only) of Mining of Massive Datasets, by Rajaraman and Ullman CSE Spring Motivation We learned how to parallelize relational database systems While useful, it might incur too much overhead if our query plans consist of simple operations MapReduce is a programming model for such computation First, let s study how data is stored in such systems CSE Spring Distributed File System (DFS) For very large files: Ts, Ps Each file is partitioned into chunks, typically 64M Each chunk is replicated several times ( 3), on different racks, for fault tolerance Implementations: Google s DFS: GFS, proprietary Hadoop s DFS: HDFS, open source MapReduce Google: paper published 2004 Free variant: Hadoop MapReduce = high-level programming model and implementation for large-scale parallel data processing CSE Spring CSE Spring

Typical Problems Solved by MR Read a lot of data Map: extract something you care about from each record Shuffle and Sort Reduce: aggregate, summarize, filter, transform Write the results Paradigm

A MapReduce program: Input: a bag of (inputkey, value) pairs Output: a bag of (outputkey, value) pairs outputkey is optional CSE 414 - Spring 2018 13 slide source: Jeff Dean CSE 414 - Spring 2018 14

(intermediate key, bag of values) Output: bag of output (values) System applies the map function in parallel to all (input key, value) pairs in the input file System groups all pairs with the same

3 Typical Problems Solved by MR Read a lot of data Map: extract something you care about from each record Shuffle and Sort Reduce: aggregate, summarize, filter, transform Write the results Paradigm stays the same, change map and reduce functions for different problems Files! Data Model A file = a bag of (key, value) pairs Sounds familiar after HW5? A MapReduce program: Input: a bag of (inputkey, value) pairs Output: a bag of (outputkey, value) pairs outputkey is optional CSE Spring slide source: Jeff Dean CSE Spring Step 1: the MAP Phase Step 2: the REDUCE Phase User provides the MAP-function: Input: (input key, value) Output: bag of (intermediate key, value) User provides the REDUCE function: Input: (intermediate key, bag of values) Output: bag of output (values) System applies the map function in parallel to all (input key, value) pairs in the input file System groups all pairs with the same intermediate key, and passes the bag of values to the REDUCE function CSE Spring CSE Spring Example MAP REDUCE Counting the number of occurrences of each word in a large collection of documents Each Document The key = document id (did) The value = set of words (word) (did1,v1) (did2,v2) (did3,v3) (w3,1) Shuffle (w1, (1,1,1,,1)) (w2, (1,1,)) (w3,(1)) (w1, 25) (w2, 77) (w3, 12) map(string key, String value): // key: document name // value: document contents for each word w in value: emitintermediate(w, 1 ); reduce(string key, Iterator values): // key: a word // values: a list of counts int result = 0; result += ParseInt(v);.... emit(asstring(result)); 17 CSE Spring

4 Workers A worker is a process that executes one task at a time Typically there is one worker per processor, hence 4 or 8 per node (did1,v1) (did2,v2) (did3,v3) MAP Tasks (M) (w3,1) Shuffle REDUCE Tasks (R) (w1, (1,1,1,,1)) (w1, 25) (w2, (1,1,)) (w2, 77) (w3,(1)) (w3, 12).... CSE Spring Fault Tolerance If one server fails once every year... then a job with 10,000 servers will fail in less than one hour MapReduce handles fault tolerance by writing intermediate files to disk: Mappers write file to local disk Reducers read the files (=reshuffling); if the server fails, the reduce task is restarted on another server CSE Spring Implementation There is one master node Master partitions input file into M splits, by key Master assigns workers (=servers) to the M map tasks, keeps track of their progress Workers write their output to local disk, partition into R regions Master assigns workers to the R reduce tasks Reduce workers read regions from the map workers local disks CSE Spring Interesting Implementation Details ackup tasks: Straggler = a machine that takes unusually long time to complete one of the last tasks. E.g.: ad disk forces frequent correctable errors (30M/s à 1M/s) The cluster scheduler has scheduled other tasks on that machine Stragglers are a main reason for slowdown Solution: pre-emptive backup execution of the last few remaining in-progress tasks Worker 1 Worker 2 Worker 3 Straggler Example Killed ackup execution Straggler Killed time CSE Spring CSE Spring

Using MapReduce in Practice: Implementing RA Operators in MR Relational Operators in MapReduce Given relations R(A,) and S(,C) compute: Selection: σ A=123 (R) Group-by: γ A,sum() (R) Join: R S CSE

A, t); (123, [ t 2, t 3 ] ) if t.a = 123: EmitIntermediate(t.

5 Using MapReduce in Practice: Implementing RA Operators in MR Relational Operators in MapReduce Given relations R(A,) and S(,C) compute: Selection: σ A=123 (R) Group-by: γ A,sum() (R) Join: R S CSE Spring Selection σ A=123 (R) Selection σ A=123 (R) if t.a = 123: EmitIntermediate(t.A, t); (123, [ t 2, t 3 ] ) if t.a = 123: EmitIntermediate(t.A, t); A t 1 23 t t t 4 42 reduce(string A, Iterator values): Emit(v); ( t 2, t 3 ) 27 reduce(string A, Iterator values): Emit(v); No need for reduce. ut need system hacking in Hadoop to remove reduce from MapReduce 28 Group y γ A,sum() (R) Join EmitIntermediate(t.A, t.); (23, [ t 1 ] ) (42, [ t 4 ] ) (123, [ t 2, t 3 ] ) Two simple parallel join algorithms: Partitioned hash-join (we saw it, will recap) A t t t t reduce(string A, Iterator values): s = 0 s = s + v Emit(A, s); ( 23, 10 ), ( 42, 6 ), (123, 25) 29 roadcast join CSE Spring

R(A,) =C S(C,D) Partitioned Hash-Join R(A,) =C S(C,D) Partitioned Hash-Join Reshuffle R on R. and S on S.

relationname of R : EmitIntermediate(t., ( R, t)); S : EmitIntermediate(t.

.. R P, S P CSE 414 - Spring 2018 31 reduce(string k, Iterator values): R = empty; S = empty; case v.type of: R : R.

= new HashTable() for each w in S: hashtable.insert(w.

6 R(A,) =C S(C,D) Partitioned Hash-Join R(A,) =C S(C,D) Partitioned Hash-Join Reshuffle R on R. and S on S. Initially, both R and S are horizontally partitioned R 1, S 1 R 2, S 2... R P, S P type case t.relationname of R : EmitIntermediate(t., ( R, t)); S : EmitIntermediate(t.C, ( S, t)); actual tuple Each server computes the join locally R 1, S 1 R 2, S 2... R P, S P CSE Spring reduce(string k, Iterator values): R = empty; S = empty; case v.type of: R : R.insert(v) S : S.insert(v); for v1 in R, for v2 in S Emit(v1,v2); 32 R(A,) =C S(C,D) Reshuffle R on R. roadcast Join R 1 R 2... R P roadcast S S R(A,) =C S(C,D) roadcast Join map(string value): readfromnetwork(s); /* over the network */ hashtable = new HashTable() for each w in S: hashtable.insert(w.c, w) map should read several records of R: value = some group of tuples from R Read entire table S, build a Hash Table R 1, S R 2, S... R P, S CSE Spring for each v in value: for each w in hashtable.find(v.) Emit(v,w); reduce(): /* empty: map-side only */ 34 HW6 HW6 will ask you to write SQL queries and MapReduce tasks using Spark You will get to implement SQL using MapReduce tasks Can you beat Spark s implementation? 6

Announcements. Optional Reading. Distributed File System (DFS) MapReduce Process. MapReduce. Database Systems CSE 414. HW5 is due tomorrow 11pm

Announcements. Optional Reading. Distributed File System (DFS) MapReduce Process. MapReduce. Database Systems CSE 414. HW5 is due tomorrow 11pm Announcements HW5 is due tomorrow 11pm Database Systems CSE 414 Lecture 19: MapReduce (Ch. 20.2) HW6 is posted and due Nov. 27 11pm Section Thursday on setting up Spark on AWS Create your AWS account before