CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing
Lecture 9: MapReduce
Prof. Li Jiang, 2014/11/19

What is MapReduce?
Originally from Google [OSDI'04], MapReduce is a simple programming model rooted in functional programming and designed for large-scale data processing. It exploits large sets of commodity computers, executes processing in a distributed manner, and offers high availability.

Motivation
Large-scale data processing: we want to use thousands of CPUs, but we do not want the hassle of managing them. MapReduce provides automatic parallelization and distribution, fault tolerance, I/O scheduling, and monitoring and status updates.

Benefits of MapReduce
Map/reduce is a programming model borrowed from Lisp (and other functional languages). Many problems can be phrased this way, the work is easy to distribute across nodes, and it has clean retry/failure semantics.

Distributed Word Count
Dataflow: very big data is split into pieces, each split is counted in parallel, and the per-split counts are merged into a single merged count.

Distributed Grep
Dataflow: very big data is split into pieces, grep runs on each split to produce matches, and the matches are concatenated (cat) into all matches.

Map + Reduce
Dataflow: very big data flows into the map phase, through a partitioning function, and into the reduce phase, which produces the result.
Map: accepts an input key/value pair and emits intermediate key/value pairs.
Reduce: accepts an intermediate key and the list of values for that key (key/value*) and emits output key/value pairs.

Map + Reduce
map(key, val) is run on each item in the input set and emits new (key, value) pairs.
reduce(key, vals) is run once for each unique key emitted by map() and emits the final output.
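
To make these signatures concrete, here is a minimal single-process sketch in Java (ToyMapReduce, mapFn, reduceFn, and emit are hypothetical names, not part of any real framework): map emits intermediate (key, value) pairs, the framework groups them by key, and reduce folds each group into one output value.

import java.util.*;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

// Minimal, single-process sketch of the MapReduce programming model.
public class ToyMapReduce {

    public static Map<String, String> run(
            List<String> inputs,
            BiConsumer<String, BiConsumer<String, String>> mapFn,  // mapFn(record, emit)
            BiFunction<String, List<String>, String> reduceFn) {   // reduceFn(key, values)

        // Map phase: run mapFn on every record and group the emitted pairs by key.
        Map<String, List<String>> intermediate = new HashMap<>();
        BiConsumer<String, String> emit =
            (k, v) -> intermediate.computeIfAbsent(k, x -> new ArrayList<>()).add(v);
        for (String record : inputs) {
            mapFn.accept(record, emit);
        }

        // Reduce phase: one reduce call per unique intermediate key.
        Map<String, String> output = new TreeMap<>();
        intermediate.forEach((key, values) -> output.put(key, reduceFn.apply(key, values)));
        return output;
    }

    public static void main(String[] args) {
        // Distributed grep, shrunk to one process: map emits lines containing
        // the pattern, keyed by the pattern; reduce concatenates the matches.
        List<String> lines = List.of("see bob throw", "see spot run", "nothing here");
        Map<String, String> matches = run(
            lines,
            (line, emit) -> { if (line.contains("see")) emit.accept("see", line); },
            (key, values) -> String.join(" | ", values));
        System.out.println(matches);   // {see=see bob throw | see spot run}
    }
}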

Square Sum (in Lisp/Scheme)
(map f list [list2 list3 ...])
(map square '(1 2 3 4))  =>  (1 4 9 16)
(reduce + '(1 4 9 16))  =>  (+ 16 (+ 9 (+ 4 1)))  =>  30
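
The same map-then-reduce shape written with Java streams, just to mirror the Lisp expressions above (a sketch, not part of the course code):

import java.util.stream.IntStream;

public class SquareSum {
    public static void main(String[] args) {
        // map: square each element; reduce: fold the squares with +.
        int sum = IntStream.of(1, 2, 3, 4)
                           .map(x -> x * x)          // (1 4 9 16)
                           .reduce(0, Integer::sum);
        System.out.println(sum);                      // 30
    }
}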

Word Count
Input consists of (url, contents) pairs.
map(key=url, val=contents): for each word w in contents, emit (w, "1").
reduce(key=word, values=uniq_counts): sum all the "1"s in the values list and emit the result (word, sum).

Word Count, Illustrated
map(key=url, val=contents): for each word w in contents, emit (w, "1").
reduce(key=word, values=uniq_counts): sum all the "1"s in the values list and emit (word, sum).
Input documents: "see bob throw" and "see spot run".
Map output: (see, 1) (bob, 1) (throw, 1) (see, 1) (spot, 1) (run, 1).
Reduce output: (bob, 1) (run, 1) (see, 2) (spot, 1) (throw, 1).
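
A direct single-machine rendering of this pseudocode in plain Java (a sketch; the two toy documents are made up and no framework is involved):

import java.util.*;

public class WordCountLocal {
    public static void main(String[] args) {
        // Two toy "documents", standing in for (url, contents) pairs.
        List<String> contents = List.of("see bob throw", "see spot run");

        // Map phase: emit (word, 1) for every word occurrence.
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String doc : contents) {
            for (String w : doc.split("\\s+")) {
                intermediate.add(Map.entry(w, 1));
            }
        }

        // Shuffle + reduce phase: group by word and sum the 1s.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : intermediate) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        System.out.println(counts);   // {bob=1, run=1, see=2, spot=1, throw=1}
    }
}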

Reverse Web-Link Graph
Map: for each URL (source) linking to a target, output a <target, source> pair.
Reduce: concatenate the list of all source URLs and output <target, list(source)> pairs.
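
Sketched in plain Java below; the page names and the forward-link list are invented for illustration, and the group-by step stands in for the shuffle that a real MapReduce run would perform.

import java.util.*;

public class ReverseLinks {
    public static void main(String[] args) {
        // Input: (source page, target it links to) pairs, i.e. the forward link graph.
        List<String[]> links = List.of(
            new String[]{"a.html", "x.html"},
            new String[]{"b.html", "x.html"},
            new String[]{"a.html", "y.html"});

        // Map: emit <target, source>; reduce: collect all sources per target.
        Map<String, List<String>> reversed = new TreeMap<>();
        for (String[] link : links) {
            String source = link[0], target = link[1];
            reversed.computeIfAbsent(target, t -> new ArrayList<>()).add(source);
        }
        System.out.println(reversed);   // {x.html=[a.html, b.html], y.html=[a.html]}
    }
}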

The Model is Widely Used
Example uses: distributed grep, distributed sort, web link-graph reversal, term vectors per host, web access log statistics, inverted index construction, document clustering, machine learning, statistical machine translation, and more.

Implementation
Typical cluster: hundreds to thousands of 2-CPU x86 machines with 2-4 GB of memory, limited bisection bandwidth, and storage on local IDE disks. GFS, a distributed file system, manages the data (SOSP'03). A job scheduling system breaks jobs into tasks and assigns tasks to machines. The implementation is a C++ library linked into user programs.

Execution
How is this distributed? Partition the input key/value pairs into chunks and run map() tasks in parallel. After all maps are complete, consolidate all emitted values for each unique emitted key. Then partition the space of intermediate keys and run reduce() in parallel. If a map() or reduce() task fails, re-execute it.

Architecture
The user submits jobs to a master node running the job tracker; slave nodes 1..N each run a task tracker that manages the workers on that node.

Task Granularity
Fine-grained tasks: the number of map tasks is much larger than the number of machines. This minimizes the time for fault recovery, lets shuffling be pipelined with map execution, and gives better dynamic load balancing. A job often uses 200,000 map tasks and 5,000 reduce tasks running on 2,000 machines.

GFS
Goal: a global view that keeps huge files available in the face of node failures. The master node (meta server) is centralized and indexes all chunks on the data servers. On the chunk servers (data servers), each file is split into contiguous chunks, typically 16-64 MB; each chunk is replicated (usually 2x or 3x), and replicas are kept in different racks where possible.

GFS
Layout: clients contact the GFS master for metadata, while chunks (C0, C1, C2, C3, C5, ...) are spread and replicated across chunkservers 1..N.
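
A rough sketch of the replica-placement preference described above, in Java (pickReplicas and the host-to-rack map are hypothetical; real GFS placement also weighs disk utilization and load):

import java.util.*;

public class ReplicaPlacement {
    // Choose hosts for `replicas` copies of a chunk, preferring distinct racks.
    static List<String> pickReplicas(Map<String, String> hostToRack, int replicas) {
        List<String> chosen = new ArrayList<>();
        Set<String> usedRacks = new HashSet<>();
        // First pass: one replica per rack, as long as unused racks remain.
        for (Map.Entry<String, String> e : hostToRack.entrySet()) {
            if (chosen.size() == replicas) break;
            if (usedRacks.add(e.getValue())) chosen.add(e.getKey());
        }
        // Second pass: if copies are still needed, reuse racks but not hosts.
        for (String host : hostToRack.keySet()) {
            if (chosen.size() == replicas) break;
            if (!chosen.contains(host)) chosen.add(host);
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, String> hostToRack = new LinkedHashMap<>();
        hostToRack.put("cs1", "rackA");
        hostToRack.put("cs2", "rackA");
        hostToRack.put("cs3", "rackB");
        System.out.println(pickReplicas(hostToRack, 3)); // [cs1, cs3, cs2]
    }
}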

Execution and Workflow
(Figures on the original slides: the step-by-step execution and overall job workflow.)

Locality
Master scheduling policy: the master asks GFS for the locations of the replicas of the input file blocks. Map task inputs are typically 64 MB splits (equal to the GFS block size), and map tasks are scheduled so that a replica of the input block is on the same machine or in the same rack. Effect: thousands of machines read input at local-disk speed; without this, the rack switches would limit the read rate.
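
A toy version of that scheduling preference in Java (pickWorker, the worker names, and the rack map are invented for the sketch): prefer a worker holding a local replica, then a worker in the same rack as a replica, then any idle worker.

import java.util.*;

public class LocalityScheduler {
    // Pick a worker for a map task whose input block lives on `replicaHosts`.
    static String pickWorker(List<String> idleWorkers,
                             Set<String> replicaHosts,
                             Map<String, String> rackOf) {
        // 1. A worker that already has the data on its local disk.
        for (String w : idleWorkers) {
            if (replicaHosts.contains(w)) return w;
        }
        // 2. Otherwise, a worker in the same rack as some replica.
        Set<String> replicaRacks = new HashSet<>();
        for (String h : replicaHosts) replicaRacks.add(rackOf.get(h));
        for (String w : idleWorkers) {
            if (replicaRacks.contains(rackOf.get(w))) return w;
        }
        // 3. Otherwise, any idle worker (data must cross the rack switch).
        return idleWorkers.get(0);
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = Map.of("w1", "rackA", "w2", "rackB", "w3", "rackB");
        System.out.println(pickWorker(List.of("w1", "w2", "w3"),
                                      Set.of("w3", "w9"), rackOf));  // w3 (local replica)
    }
}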

Fault Tolerance: the Reactive Way
Worker failure: the master periodically pings the workers (heartbeats); no response means a failed worker, and the tasks of that worker are reassigned to another worker. What about a completed map or reduce task? Completed map tasks are re-executed, since their output sits on the failed worker's local disk; completed reduce tasks are not, since their output is already in the global file system.
Master failure: the master writes periodic checkpoints, so a new master can be started from the last checkpointed state. If the master dies and is not restarted, the job is aborted.

Fault Tolerance: the Proactive Way (Redundant Execution)
The problem of stragglers (slow workers): other jobs consuming resources on the machine, bad disks with soft errors that transfer data very slowly, or weird things such as processor caches being disabled (!!). When the computation is almost done, the master schedules backup executions of the in-progress tasks; whichever of the primary or backup execution finishes first, the task is marked as completed.

Fault Tolerance: Input Errors (Bad Records)
Map/reduce functions sometimes fail for particular inputs. The best solution is to debug and fix the code, but that is not always possible. On a segmentation fault, the worker's signal handler sends a UDP packet to the master that includes the sequence number of the record being processed. Skipping bad records: if the master sees two failures for the same record, the next worker is told to skip that record.
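
The master-side bookkeeping can be as simple as a failure counter per record sequence number; a sketch in Java (the class name is mine, and only the two-failure threshold comes from the slide):

import java.util.*;

public class BadRecordTracker {
    private final Map<Long, Integer> failures = new HashMap<>();  // record seqno -> crash count

    // Called when a worker's signal handler reports a crash on this record.
    public void reportFailure(long recordSeqno) {
        failures.merge(recordSeqno, 1, Integer::sum);
    }

    // The next worker asks: should this record be skipped?
    public boolean shouldSkip(long recordSeqno) {
        return failures.getOrDefault(recordSeqno, 0) >= 2;
    }

    public static void main(String[] args) {
        BadRecordTracker tracker = new BadRecordTracker();
        tracker.reportFailure(41992L);
        System.out.println(tracker.shouldSkip(41992L)); // false: only one failure so far
        tracker.reportFailure(41992L);
        System.out.println(tracker.shouldSkip(41992L)); // true: skip it next time
    }
}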

Refinements
Task granularity: fine-grained tasks minimize fault-recovery time and improve load balancing. Practical bounds: the master makes O(M + R) scheduling decisions and keeps O(M * R) state for map/reduce task pairs. Choose M large, so that each task processes roughly 16-64 MB of input; choose R small, a small multiple of the number of worker machines. Other refinements: local execution for debugging/testing, and compression of intermediate data.

Notes
No reduce can begin until the map phase is complete, and the master must communicate the locations of the intermediate files. Tasks are scheduled based on the location of the data. If a map worker fails at any time before the reduce phase finishes, its tasks must be completely rerun. The MapReduce library does most of the hard work for us!

Case Study
The user's to-do list: indicate the input/output files, M (the number of map tasks), R (the number of reduce tasks), and W (the number of machines); write the map and reduce functions; submit the job.

Case Study: the map function, the reduce function, and the main program (code listings on the original slides; a comparable Hadoop version is sketched below).
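
The original listings are the C++ word-count example written against Google's MapReduce library; as a stand-in, here is comparable user code in Hadoop's Java API (a sketch assuming the org.apache.hadoop.mapreduce API of Hadoop 2.x or later): a map class, a reduce class, and a main program that sets the input/output files and submits the job.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: for each word w in the input line, emit (w, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce: sum the 1s for each word and emit (word, sum).
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Main: configure the job (the user's "to-do list") and submit it.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // job.setNumReduceTasks(R) would set R; M follows from the input splits.
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input files
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);        // submit and wait
  }
}

Packaged into a jar, this is invoked the same way as the bundled example run on the later Hadoop slides (hadoop jar ... wordcount test-in test-out).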

MapReduce on Commodity Platforms
Cluster: (1) Google MapReduce, (2) Apache Hadoop. Multicore CPU: Phoenix @ Stanford. GPU: Mars @ HKUST.

Hadoop
Google stack vs. the Hadoop (Yahoo) stack:
Google MapReduce <-> Hadoop MapReduce
GFS <-> HDFS
Bigtable <-> HBase
Chubby <-> (nothing yet, but planned)

Hadoop
Apache Hadoop wins the terabyte sort benchmark. The sort used 1,800 maps and 1,800 reduces, and allocated enough buffer memory to hold the intermediate data in memory.

MR_Sort
(Figure: sort performance under three runs: normal execution, no backup tasks, and 200 processes killed.) Backup tasks reduce job completion time a lot, and the system deals well with failures.

Hadoop
Hadoop configuration files:
conf/hadoop-env.sh: Hadoop environment variables
conf/core-site.xml: NameNode IP and port
conf/hdfs-site.xml: HDFS data block settings
conf/mapred-site.xml: JobTracker IP and port
conf/masters: master IP
conf/slaves: slave IPs

Hadoop
Start HDFS and MapReduce:
[hadoop@Master ~]$ start-all.sh
Check status with jps:
[hadoop@Master ~]$ jps
Stop HDFS and MapReduce:
[hadoop@Master ~]$ stop-all.sh

Hadoop
Create two data files under /root/test:
file1.txt: Hello Hadoop Hello World
file2.txt: GoodBye Hadoop
Copy the files to HDFS (test-in is a data folder under HDFS):
[hadoop@Master ~]$ hadoop dfs -copyFromLocal /root/test test-in
Run the Hadoop WordCount program:
[hadoop@Master ~]$ hadoop jar hadoop-0.20.1-examples.jar wordcount test-in test-out

Hadoop
Check test-out; the results are in test-out/part-r-00000:
username@master:~/workspace/wordcount$ hadoop dfs -ls test-out
Found 2 items
drwxr-xr-x - hadoopusr supergroup 0 2010-05-23 20:29 /user/hadoopusr/test-out/_logs
-rw-r--r-- 1 hadoopusr supergroup 35 2010-05-23 20:30 /user/hadoopusr/test-out/part-r-00000
Check the results:
username@master:~/workspace/wordcount$ hadoop dfs -cat test-out/part-r-00000
GoodBye 1
Hadoop 2
Hello 2
World 1
Copy the results from HDFS to the local Linux file system:
username@master:~/workspace/wordcount$ hadoop dfs -get test-out/part-r-00000 test-out.txt
username@master:~/workspace/wordcount$ vi test-out.txt
GoodBye 1
Hadoop 2
Hello 2
World 1

Hadoop Program Development
Programmers develop on their local machines and upload the files to the Hadoop cluster. Eclipse development environment: Eclipse is an open-source IDE that provides an integrated platform for Java development. Eclipse official website: http://www.eclipse.org/

MPI vs. MapReduce
Objective: MPI is a general distributed programming model; MapReduce targets large-scale data processing.
Availability (fault tolerance): weaker and harder in MPI; better in MapReduce.
Data locality: MPI-IO for MPI; GFS for MapReduce.
Usability: MPI is difficult to learn; MapReduce is easier.

Course Summary
Most people in the research community agree that there are at least two kinds of parallel programmers that will be important to the future of computing: programmers who understand how to write software but are naive about parallelization and mapping to architecture ("Joe" programmers), and programmers who are knowledgeable about parallelization and mapping to architecture and so can achieve high performance ("Stephanie" programmers). Intel and Microsoft say there are three kinds (Mort, Elvis, and Einstein). This course is about teaching you how to become Stephanie/Einstein programmers.

Course Summary
Why OpenMP, Pthreads, MapReduce, and CUDA? These are the languages that Einstein/Stephanie programmers use: they can achieve high performance, and they are widely available and widely used. It is no coincidence that both textbooks I have used for this course teach all of these except CUDA.

Course Summary
It seems clear that for the next decade architectures will continue to get more complex, and achieving high performance will get harder. Programming abstractions will get a whole lot better; they seem to be bifurcating along the Joe/Stephanie or Mort/Elvis/Einstein boundaries, and they will be very different. Whatever the language or architecture, some of the fundamental ideas from this class will still be foundational to the area: locality, deadlock, load balance, race conditions, and granularity.