Principles of Data Management. Lecture #16 (MapReduce & DFS for Big Data)

Size: px

Start display at page:

Download "Principles of Data Management. Lecture #16 (MapReduce & DFS for Big Data)"

Terence Williams
5 years ago
Views:

Principles of Data Management Lecture #16 (MapReduce & DFS for Big Data) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J.

1 Principles of Data Management Lecture #16 (MapReduce & DFS for Big Data) Instructor: Mike Carey Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s News Bulletin v Project dates Query execution layer is due on 3/17 v Upcoming lectures Today: MapReduce (and distributed file systems) Next week: Wrap-up & review, in-class endterm v Other upcoming events The long-lost midterms will appear on Tuesday! v Class participation opportunities Teaching evaluations (after Tuesday J ) End-of-term opinion survey (watch for it!) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2

2 Motivation v Google needed to process web-scale data Data much larger than what fits on one machine v Needed parallel processing to get results in a reasonable time Wanted to use cheap commodity machines to do the job Credits: Some of the following slide content is excerpted from the Google ODSI 2004 talk where MapReduce was publically born and/or Google s SOSP 2003 talk on the underlying DFS. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3 Requirements v Solution must Scale to 1000s of compute nodes Must automatically handle faults Provide monitoring of jobs Be easy for programmers to use Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4

3 MapReduce Programming model v Input and Output are sets of key/value pairs v Programmer provides two functions map(k1, V1) -> list(k2, V2) Produces list of intermediate key/value pairs for each input key/value pair reduce(k2, list(v2)) -> list(k3, V3) Produces a list of result values for all intermediate values that are associated with the same intermediate key Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5 MapReduce Pipeline Map Shuffle Reduce Read from DFS Write to DFS Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6

4 MapReduce in Action Map (k1, v1) à list(k2, v2) Processes one input key/value pair Produces a set of intermediate key/value pairs Reduce (k2, list(v2)) list(k3, v3) Combines intermediate values for one particular key Produces a set of merged output values (usually one) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7 MapReduce Architecture MapReduce Job Tracker Network MapReduce MapReduce MapReduce MapReduce Distributed File System Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8

5 Software Components v Job Tracker (Master) Maintains Cluster membership of workers Accepts MR jobs from clients and dispatches tasks to workers Monitors workers progress Restarts tasks in the event of failure v Task Tracker (Worker) Provides an environment to run a task Maintains and serves intermediate files between Map and Reduce phases Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9 MapReduce Parallelism Hash Partitioning Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10

6 Example 1: Count Word Occurrences v Input: Set of (Document name, Document Contents) v Output: Set of (Word, Count(Word)) v map(k1, v1): for each word w in v1 emit(w, 1) v reduce(k2, v2_list): int result = 0; for each v in v2_list result += v; emit(k2, result) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11 Example 1: Count Word Occurrences this is a line this is another line Map this, 1 is, 1 a, 1 this, 1 is, 1 a, 1 is, 1 is, 1 Reduce a, 1 another, 3 is, 2 another line yet another line Map yet, 1 this, 1 this, 1 yet, 1 Reduce line, 4 this, 2 yet, 1 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12

MapReduce Pipeline Revisited (Picture borrowed from Shiv Babu @ Duke University) Database Management Systems 3ed, R. Ramakrishnan and J.

7 MapReduce Pipeline Revisited (Picture borrowed from Shiv Duke University) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13 Example 2: Equijoins v Input: Rows of Relation R, Rows of Relation S v Output: R join S on R.x = S.y v map(k1, v1) if (input == R) emit(v1.x, [ R, v1]) else emit(v1.y, [ S, v2]) v reduce(k2, v2_list) for r in v2_list where r[1] == R for s in v2_list where s[1] == S emit(1, result(r[2], s[2])) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14

8 Other Examples v Distributed grep v Inverted index construction v Machine learning v Distributed sort v Fuzzy join v v Or: A Pig script or a Hive query (which are then auto-converted to a Hadoop MapReduce job series under the covers) e.g., at Netflix Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15 Fault Tolerant Evaluation v Task Fault Tolerance is achieved through reexecution ( roll forward, not back!) v All consumers consume data only after completely generated by the producer This is an important property to isolate faults to one task v Task completion committed through Master v Cannot handle master failure Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16

Task granularity and pipelining v Fine granularity tasks Many more map tasks than machines Minimizes time for fault recovery Pipelines shuffling with map execution Better load balancing Database

9 Task granularity and pipelining v Fine granularity tasks Many more map tasks than machines Minimizes time for fault recovery Pipelines shuffling with map execution Better load balancing Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 17 Optimization: Combiners v Sometimes partial aggregation is possible on the Map side v May cut down the amount of data needing to be transferred to the reducer (significantly in some cases, like grouped aggregation in Hive) v combine(k2, list(v2)) -> K2, list(v2) v For Word Occurrence Count example, Combine == Reduce (Q: Why?) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 18

10 Example 1: Word Count Revisited (With Combiners) this is a line this is another line Map this, 1 is, 1 a, 1 this, 1 is, 1 a, 1 is, 2 another, 2 ß Intermediate Results Reduce a, 1 another, 3 is, 2 another line yet another line Map yet, 1 line, 2 this, 2 line, 2 yet, 1 Reduce line, 4 this, 2 yet, 1 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 19 Optimization: Redundant Execution v Slow workers lengthen completion time v Slowness happens because of Other jobs consuming resources Bad disks/network etc v Solution: Near the end of the job spawn extra copies of long running tasks Whichever copy finishes first, wins. Kill the rest v In Hadoop this is called speculative execution Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 20

11 Optimization: Locality v Task scheduling policy Ask DFS (next topic!) for locations of replicas of input file blocks Map tasks scheduled so that input blocks are machine local or rack local v Effect: Tasks read data at local disk speeds v Without this, rack switches limit data rate Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 21 Distributed (Big!) Filesystem v Used as the store for MapReduce data v MapReduce reads its input from DFS and writes its output to DFS v Provides a shared disk view to applications using local storage on shared-nothing hardware v Provides redundancy by replication to protect from node/disk failures Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 22

12 DFS Architecture Single Master (with backups) that track DFS file name to chunk mapping Several Chunk servers that store chunks on local disks Chunk Size ~ 64MB or larger Chunks are replicated Master only used for chunk lookups Does not participate in transfer of data Taken from Ghemawat s SOSP 03 paper (The Google Filesystem) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 23 Chunk Replication v Several Replicas of each Chunk Replicas usually spread across racks and data centers to maximize availability 3 replicas common (local, same rack, different rack) v Master tracks location of each replica of a chunk v When chunk failure is detected, master automatically rebuilds new replica to maintain replication level v Automatically picks chunk servers for new replicas based on utilization Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 24

MapReduce & DFS: Summary v Google laid a

13 MapReduce & DFS: Summary v Google laid a foundation for a new flurry of large-scale data storage and processing with their MR and DFS work in the early 2000 s v Apache open source versions soon sprung up outside of Google: Hadoop MapReduce & HDFS v Today, Big Data use cases are addressed with a mix of parallel RDBMS technologies as well as more flexible Hadoop-based technologies v So where are we now? Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 25 Today s Tangled World (Pig) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 26

14 Today s Tangled World (Pig) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 27 Additional Reading v Original MapReduce Paper (*** MUST READ! ***) Simplified Data Processing on Large Clusters by Jeffrey Dean and Sanjay Ghemawat in OSDI 04 v Original DFS Paper The Google Filesystem by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung in SOSP 03 v MapReduce vs. Parallel DBMS Papers in CACM (Jan. 2010) MapReduce and Parallel DBMSs: Friends or Foes? by Michael Stonebraker, Daniel Abadi, David DeWitt, Sam Madden, Erik Paulson, Andrew Pavlo, and Alexander Rasin MapReduce: A Flexible Data Processing Tool by Jeffrey Dean and Sanjay Ghemawat v EDBT Ogres & Onions Keynote Paper Inside "Big Data Management": Ogres, Onions, or Parfaits? by Vinayak Borkar, Michael J. Carey, and Chen Li in EDBT '12 (or watch the movie ) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 28

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University CS 555: DISTRIBUTED SYSTEMS [MAPREDUCE] Shrideep Pallickara Computer Science Colorado State University Frequently asked questions from the previous class survey Bit Torrent What is the right chunk/piece