Distributed Systems
18. MapReduce
Paul Krzyzanowski
Rutgers University
Fall 2018 (November 5, 2018)

Credit

Much of this information is from Google:
- Google Code University [no longer supported]: http://code.google.com/edu/parallel/mapreduce-tutorial.html
- MapReduce: The programming model and practice: research.google.com/pubs/pub36249.html
- See also http://hadoop.apache.org/common/docs/current/ for the Apache Hadoop version
- Read this (the definitive paper): http://labs.google.com/papers/mapreduce.html

Background

- Traditional programming is serial.
- Parallel programming: break processing into parts that can be executed concurrently on multiple processors.
- Challenge: identify tasks that can run concurrently and/or groups of data that can be processed concurrently. Not all problems can be parallelized.

Simplest environment for parallel processing

- No dependency among the data.
- Data can be split into lots of smaller chunks (shards), and each process can work on a chunk.
- Master/worker approach:
  - Master: initializes the array and splits it according to the number of workers; sends each worker its sub-array; receives the results from each worker.
  - Worker: receives a sub-array from the master; performs the processing; sends the results to the master.

MapReduce

- Created by Google in 2004 (Jeffrey Dean and Sanjay Ghemawat).
- Inspired by LISP:
  - map(function, set of values): applies the function to each value in the set
    (map length (() (a) (a b) (a b c))) → (0 1 2 3)
  - reduce(function, set of values): combines all the values using a binary function (e.g., +)
    (reduce #'+ '(1 2 3 4 5)) → 15
- MapReduce is a framework for parallel computing. Programmers get a simple API and don't have to worry about handling parallelization, data distribution, load balancing, or fault tolerance. It allows one to process huge amounts of data (terabytes and petabytes) on thousands of processors.
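For readers who do not speak Lisp, the same two operations exist in Python's standard library. This is only a minimal illustration of the map/reduce abstraction, not of the MapReduce framework itself:

    from functools import reduce
    import operator

    # map: apply a function to each value in a set
    print(list(map(len, [[], ['a'], ['a', 'b'], ['a', 'b', 'c']])))   # [0, 1, 2, 3]

    # reduce: combine all the values using a binary function (here, +)
    print(reduce(operator.add, [1, 2, 3, 4, 5]))                      # 15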

Who has it?

- Google: the original, proprietary implementation
- Apache Hadoop: the most common (open-source) implementation, built to the specs defined by Google
- Amazon Elastic MapReduce: uses Hadoop running on Amazon EC2
- Also Microsoft Azure HDInsight and Google Cloud MapReduce for App Engine

MapReduce

- Map: grab the relevant data from the source. The user's map function gets called for each chunk of input and spits out (key, value) pairs.
- Reduce: aggregate the results. The user's reduce function gets called for each unique key.

MapReduce: what happens in between?

Map: (input shard) → intermediate (key/value pairs)
- Automatically partition the input data into M shards
- Discard unnecessary data and generate (key, value) sets
- The framework groups together all intermediate values with the same intermediate key and passes them to the reduce function

Reduce: intermediate (key/value pairs) → result files
- Input: a key and a set of values
- Merge these values together to form a smaller set of values
- Reduces are distributed by partitioning the intermediate key space into R pieces using a partitioning function (e.g., hash(key) mod R)
- The user specifies the number of partitions (R) and the partitioning function

Map
- Grab the relevant data from the source (parse it into key, value)
- Write it to an intermediate file

Partition
- Partitioning: identify which of the R reducers will handle which keys
- Map partitions its data to target it at one of the R reducers based on the partitioning function (both R and the partitioning function are user defined)

Shuffle (Sort)
- Fetch the relevant partition of the output from all mappers
- Sort by keys (different mappers may have output the same key)

Reduce
- The input is the sorted output of the mappers
- Call the user's reduce function once per key, with the list of values for that key, to aggregate the results

(A toy, single-process sketch of these four phases appears in code after Step 1 below.)

MapReduce: the complete picture
[Diagram: the client forks a master, which assigns M map work items (one per input shard) and R reduce work items to worker processes.]

Step 1: Split input files into chunks (shards)
- Break up the input data into M pieces (typically 64 MB)
- The input files, divided into M shards, become the M map work items
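A minimal, single-process Python sketch of the four phases described above (Map, Partition, Shuffle/Sort, Reduce). Everything here is illustrative: the function names, the choice of R, and the in-memory "partitions" are assumptions, not part of Hadoop's or Google's actual API.

    from itertools import groupby
    from operator import itemgetter

    R = 3  # number of reduce partitions (user-defined)

    def default_partition(key):
        # Default partitioning function: hash(key) mod R
        return hash(key) % R

    def run_mapreduce(inputs, map_fn, reduce_fn, partition_fn=default_partition):
        # Map + Partition: call the user's map function on every (key, value)
        # input record and route each emitted pair to one of R partitions
        partitions = [[] for _ in range(R)]
        for in_key, in_value in inputs:
            for k, v in map_fn(in_key, in_value):
                partitions[partition_fn(k)].append((k, v))

        # Shuffle (Sort) + Reduce: sort each partition by key, then call the
        # user's reduce function once per unique key with all of its values
        results = []
        for part in partitions:
            part.sort(key=itemgetter(0))
            for k, group in groupby(part, key=itemgetter(0)):
                results.extend(reduce_fn(k, [v for _, v in group]))
        return results

The word-count example later in these slides can be plugged in directly as map_fn and reduce_fn.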

Step 2: Fork processes

Start up many copies of the program on a cluster of machines:
- 1 master: the scheduler and coordinator
- Lots of workers

Idle workers are assigned either:
- map tasks (each works on a shard) — there are M map tasks
- reduce tasks (each works on intermediate files) — there are R of them (R = the number of partitions, defined by the user)

Step 3: Run Map Tasks

A map worker:
- Reads the contents of the input shard assigned to it
- Parses key/value pairs out of the input data
- Passes each pair to a user-defined map function
- Produces intermediate key/value pairs, which are buffered in memory

[Diagram: the user program remote-forks the master and the workers; each map worker reads its assigned shard (e.g., Shard 2).]

Step 4: Create intermediate files

- The key/value pairs produced by the user's map function are buffered in memory and periodically written to the local disk
- They are partitioned into R regions by a partitioning function
- The map worker notifies the master when it is complete and passes the locations of its intermediate data to the master
- The master forwards these locations to the reduce workers

Step 4a: Partitioning

- The map output will be processed by reduce workers
- The user's reduce function will be called once per unique key generated by map
- This means we need to sort all the (key, value) data by keys and decide which reduce worker processes which keys — the partitioning function does this
- Partition function: decides which of the R reduce workers will work on which key
- Default function: hash(key) mod R partitions the data by keys
- Each reduce worker will read its partition from every map worker

[Diagram: a map worker reads Shard n and locally writes its output split into R partitions.]

Step 5: Reduce Task: sorting

- A reduce worker gets notified by the master about the location of the intermediate files for its partition
- It uses RPCs to read the data from the local disks of the map workers
- When the reduce worker has read the intermediate data for its partition, it sorts the data by the intermediate keys
- All occurrences of the same key are grouped together

Step 6: Reduce Task: reducing

- The sort phase grouped the data by unique intermediate key
- The user's reduce function is given the key and the set of intermediate values for that key: <key, (value1, value2, value3, value4, ...)>
- The output of the reduce function is appended to an output file

[Diagram: map workers write intermediate data locally; reduce workers remotely read their partitions and write the final output files.]
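A sketch of Steps 4-5 under simplifying assumptions: the "local disks" are one temporary directory on one machine, the "mapN-partR" file names are made up, and plain file reads stand in for the RPCs a real reduce worker would issue.

    import json, os, tempfile
    from itertools import groupby
    from operator import itemgetter

    R = 3                            # number of reduce partitions
    workdir = tempfile.mkdtemp()     # stands in for the workers' local disks

    def spill(map_id, buffered_pairs):
        # Step 4: a map worker writes its buffered (key, value) pairs into R
        # region files on local disk, one per reduce partition
        region_files = [open(os.path.join(workdir, f"map{map_id}-part{r}"), "w")
                        for r in range(R)]
        for k, v in buffered_pairs:
            region_files[hash(k) % R].write(json.dumps([k, v]) + "\n")  # hash(key) mod R
        for f in region_files:
            f.close()

    def fetch_and_sort(r, num_map_workers):
        # Step 5: a reduce worker collects partition r from every map worker
        # (reading local files here, where the real system uses RPCs), sorts
        # by intermediate key, and groups all values for the same key
        pairs = []
        for m in range(num_map_workers):
            with open(os.path.join(workdir, f"map{m}-part{r}")) as f:
                pairs.extend(tuple(json.loads(line)) for line in f)
        pairs.sort(key=itemgetter(0))
        # Step 6 would now call the user's reduce function once per (key, values)
        return [(k, [v for _, v in grp])
                for k, grp in groupby(pairs, key=itemgetter(0))]

With M map workers, worker m would call spill(m, ...) on its buffered output, and reduce worker r would later call fetch_and_sort(r, M).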

Step 7: Return to user

When all map and reduce tasks have completed, the master wakes up the user program. The MapReduce call in the user program returns and the program can resume execution. The output of MapReduce is available in R output files.

MapReduce: the complete picture
[Diagram: the client forks the master, which assigns the M map work items and R reduce work items; input shards flow through MAP, SHUFFLE, and REDUCE stages into the output files.]

Example

Count the number of occurrences of each word in a collection of documents:
- Map: parse the data; output each word and a count (1)
- Shuffle (Sort): sort by keys (words)
- Reduce: sum together the counts for each key (word)

    map(String key, String value):
      // key: document name, value: document contents
      for each word w in value:
        Emit(w, "1");

    reduce(String key, Iterator values):
      // key: a word; values: a list of counts
      int result = 0;
      for each v in values:
        result += ParseInt(v);
      Emit(AsString(result));

Example input (from Moby-Dick):

  "It will be seen that this mere painstaking burrower and grub-worm of a poor devil of a Sub-Sub appears to have gone through the long Vaticans and street-stalls of the earth, picking up whatever random allusions to whales he could anyways find in any book whatsoever, sacred or profane. Therefore you must not, in every case at least, take the higgledy-piggledy whale statements, however authentic, in these extracts, for veritable gospel cetology. Far from it. As touching the ancient authors generally, as well as the poets here appearing, these extracts are solely valuable or entertaining, as affording a glancing bird's eye view of what has been promiscuously said, thought, fancied, and sung of Leviathan, by many nations and generations, including our own. [...]"

After MAP:
  it 1, will 1, be 1, seen 1, that 1, this 1, mere 1, painstaking 1, burrower 1, and 1, grub-worm 1, of 1, poor 1, devil 1, of 1, sub-sub 1, appears 1, to 1, have 1, gone 1, ...

After Sort:
  aback 1, aback 1, abaft 1, abaft 1, abandon 1, abandon 1, abandon 1, abandoned 1, abandonedly 1, abandonment 1, abandonment 1, abased 1, abased 1, ...

After REDUCE:
  a 4736, aback 2, abaft 2, abandon 3, abandoned 7, abandonedly 1, abandonment 2, abased 2, abasement 1, abashed 2, abate 1, abated 3, abatement 1, abating 2, abbreviate 1, abbreviation 1, abeam 1, abed 2, abednego 1, abel 1, abhorred 3, abhorrence 1, ...

Fault tolerance

The master pings each worker periodically. If no response is received within a certain time, the worker is marked as failed. Map or reduce tasks given to this worker are reset back to the initial state and rescheduled for other workers.

Locality

Input and output files live in GFS (the Google File System); Bigtable runs on GFS chunkservers. Keep computation close to the files if possible: the master tries to schedule a map worker on one of the machines that has a copy of the input chunk it needs.
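A Python rendering of the word-count pseudocode above, written to plug into the toy run_mapreduce driver sketched earlier (both are illustrative, not a real API; counts are emitted as integers rather than the strings used in the pseudocode):

    def word_count_map(key, value):
        # key: document name, value: document contents
        for w in value.lower().split():
            yield (w, 1)

    def word_count_reduce(key, values):
        # key: a word; values: a list of counts
        yield (key, sum(values))

    # Example, using the toy driver from earlier:
    docs = [("extract.txt", "It will be seen that this mere painstaking "
                            "burrower and grub-worm of a poor devil of a Sub-Sub")]
    for word, count in run_mapreduce(docs, word_count_map, word_count_reduce):
        print(word, count)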

Distributed grep (search for words)
- Search for words in lots of documents
- Map: emit a line if it matches a given pattern
- Reduce: just copy the intermediate data to the output
- Map: input: a line of text; if the pattern matches, output (pattern, line)
- Reduce: input: pattern, [lines]; output: the lines

Count URL access frequency
- Find the frequency of each URL in web logs
- Map: process logs of web page accesses; output <URL, 1>
- Reduce: add all the values for the same URL
- Map: input: a line from the log; output (url, 1)
- Reduce: input: url, [accesses]; output (url, sum(accesses))

Reverse web-link graph
- Find where page links come from
- Map: output <target, source> for each link to target found in a page source
- Reduce: concatenate the list of all source URLs associated with a target: <target, list(source)>
- Map: input: HTML files; output (target, source)
- Reduce: input: target, [sources]; output (target, [sources])

Inverted index
- Find what documents contain a specific word
- Map: parse a document, emit <word, document-id> pairs
- Reduce: for each word, sort the corresponding document IDs and emit a <word, list(document-id)> pair; the set of all output pairs is an inverted index
- Map: input: a document; output (word, doc_id)
- Reduce: input: word, [doc_id]; output (word, [doc_id])
- (See the Python sketch after these examples.)

Stock summary
- Find the average daily gain of each company from 1/1/2000 to 12/31/2015
- Data is a set of lines: { date, company, start_price, end_price }
- Map: if (date >= 1/1/2000 && date <= 12/31/2015): output (company, end_price - start_price)
- Reduce: input: company, [daily_gains]; output (company, average([daily_gains]))

Average salaries in regions
- Show zip codes where average salaries are in the ranges: (1) < $100K, (2) $100K-$500K, (3) > $500K
- Data is a set of lines: { name, age, address, zip, salary }
- Takes two rounds of MapReduce.
- Round 1: show the average salary for each zip code
  - Map: output (zip, salary)
  - Reduce: input: zip, [salary]; output (zip, average([salary]))
- Round 2: bucket zip codes by average salary
  - Map: if (salary < 100K) output (<$100K, zip); else if (salary > 500K) output (>$500K, zip); else output ($100-500K, zip)
  - Reduce: input: range, [zips]; for z in zips: output(z)
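As one concrete instance, here is a hedged Python sketch of the inverted-index map and reduce functions described above. The tokenizer and document format are assumptions (a real job would parse HTML), and emitting each word once per document via a set is a design choice to keep the index compact.

    import re

    def inverted_index_map(doc_id, contents):
        # Emit a (word, doc_id) pair for every distinct word in the document
        for word in set(re.findall(r"[a-z0-9']+", contents.lower())):
            yield (word, doc_id)

    def inverted_index_reduce(word, doc_ids):
        # For each word, sort the corresponding document IDs and emit
        # a single (word, [doc_id, ...]) pair
        yield (word, sorted(doc_ids))

    # With the toy driver sketched earlier:
    # run_mapreduce([("doc1", "..."), ("doc2", "...")],
    #               inverted_index_map, inverted_index_reduce)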

MapReduce for Rendering Tiles

[Figure from "Designs, Lessons and Advice from Building Large Distributed Systems," Jeff Dean, Google, http://www.odbms.org/download/dean-keynote-ladis2009.pdf. Used with permission.]

Summary

- Get a lot of data
- Map: parse and extract items of interest
- Sort (shuffle) and partition
- Reduce: aggregate the results
- Write to output files

All is not perfect

MapReduce was used to process the webpage data collected by Google's crawlers. It would extract the links and metadata needed to search the pages and determine each site's PageRank. The process took around eight hours; results were then moved to the search servers, and this was done continuously (web crawlers → MapReduce → migrate to search servers: ~8 hours!).

The web has become more dynamic, and an 8+ hour delay is a lot for some sites. The goal became to refresh certain pages within seconds. MapReduce is batch-oriented and not suited for near-real-time processing: a new phase cannot start until the previous one has completed (Reduce cannot start until all the Maps have completed), and it suffers from stragglers — workers that take too long (or fail).

MapReduce is still used for many Google services. The search framework was updated in 2009-2010 (Caffeine): the index is updated by making direct changes to data stored in Bigtable, and the data resides in Colossus (GFS2) instead of GFS.

In Practice

- Most data is not kept in simple files: B-trees, tables, SQL databases, memory-mapped key-value stores
- Textual data is hardly ever used: it is slow and hard to parse
- Most I/O is encoded with Protocol Buffers

More info

- Good tutorial presentation and examples: http://research.google.com/pubs/pub36249.html
- The definitive paper: http://labs.google.com/papers/mapreduce.html

The End