
MapReduce & HyperDex
Kathleen Durant, PhD
Lecture 21, CS 3200, Northeastern University

Distributing Processing Mantra
Scale out, not up.
Assume failures are common.
Move processing to the data.
Process data sequentially and avoid random access.
Hide system-level details from the application developer.
Incorporate seamless scalability.

Drivers of MapReduce
Our ability to store data is fast overwhelming our ability to process what we store: you can write it, you just can't use it for any calculations.
Increases in capacity are outpacing improvements in bandwidth: you can write it, you just can't read it back in a reasonable time.

Introduction to Parallelization
Writing algorithms for a cluster on the order of 10,000 or more machines.
Failure or crash is not an exception but a common phenomenon.
The programmer must parallelize computation, distribute data, and balance load.
This makes implementing conceptually straightforward computations challenging.

MapReduce
Wanted: a model to express computation while hiding the messy details of the execution.
Inspired by the map and reduce primitives in functional programming:
Apply map to each input record to create a set of intermediate key-value pairs.
Apply reduce to all values that share the same key (like GROUP BY).
Automatically parallelized, with re-execution as the primary mechanism for fault tolerance.
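To make the functional-programming inspiration concrete, here is a minimal single-machine Python sketch using the built-in map and functools.reduce; the word list and the explicit grouping step are illustrative stand-ins for what the framework does at scale.

    # Single-machine illustration of the map/reduce primitives behind MapReduce.
    from functools import reduce

    words = ["the", "quick", "the", "lazy", "the"]          # toy input records

    # "map": transform each record into an intermediate key-value pair
    pairs = list(map(lambda w: (w, 1), words))

    # group values that share the same key (in MapReduce, the shuffle does this)
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)

    # "reduce": fold each key's list of values into a single result
    counts = {key: reduce(lambda a, b: a + b, vals) for key, vals in groups.items()}
    print(counts)   # {'the': 3, 'quick': 1, 'lazy': 1}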

What is MapReduce?
A programming model for expressing distributed computations on massive amounts of data, AND
an execution framework for large-scale data processing on clusters of commodity servers.

Typical MapReduce Application
MAP: iterate over a large number of records and extract something of interest from each.
Shuffle and sort the intermediate results.
REDUCE: aggregate the intermediate results and generate the final outcome.

Programming Model
Transforms a set of input key-value pairs into a set of output key-value pairs.
Map function, written by the user: Map: (k1, v1) -> list(k2, v2)
The MapReduce library groups all intermediate pairs with the same key together.
Reduce function, written by the user: Reduce: (k2, list(v2)) -> list(v2)
Usually zero or one output value per group.
Intermediate values are supplied via an iterator (to handle lists that do not fit in memory).

Execution Framework
Handles scheduling: assigns workers to map and reduce tasks.
Handles data distribution: moves the process to the data.
Handles synchronization: gathers, sorts, and shuffles the intermediate data.
Handles faults: detects worker failures and restarts.
Understands the distributed file system.

EXAMPLE: Count occurrences of each word in a document collection

    Map(String key, String value):
        // key: document name
        // value: document contents
        for each word w in value:
            EmitIntermediate(w, "1");

    Reduce(String key, Iterator values):
        // key: a word
        // values: a list of counts
        int result = 0;
        for each v in values:
            result += ParseInt(v);
        Emit(AsString(result));
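A runnable single-process Python sketch of the same word count, with a toy driver standing in for the framework's shuffle; the document names and contents are invented for illustration.

    from collections import defaultdict

    def map_fn(doc_name, contents):
        # Emit an intermediate (word, 1) pair for every word in the document.
        for word in contents.split():
            yield word, 1

    def reduce_fn(word, counts):
        # Sum all counts emitted for this word.
        return word, sum(counts)

    def run_mapreduce(documents):
        # Shuffle: group intermediate values by key (the real framework does this).
        groups = defaultdict(list)
        for doc_name, contents in documents.items():
            for key, value in map_fn(doc_name, contents):
                groups[key].append(value)
        # Reduce each group independently; groups could be processed in parallel.
        return dict(reduce_fn(k, v) for k, v in groups.items())

    docs = {"doc0": "every happy family is alike",
            "doc1": "every unhappy family is unhappy in its own way"}
    print(run_mapreduce(docs))   # {'every': 2, 'happy': 1, 'family': 2, ...}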

Distributing work to nodes
Focuses on large clusters; relies on the existence of a reliable and highly available distributed file system.
Map invocations: the input data is automatically partitioned into M chunks (typically 16-64 MB), and the chunks are processed in parallel.
Reduce invocations: the intermediate key space is partitioned into R pieces, e.g., using hash(key) mod R.
A master node controls program execution.
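A small sketch of the reduce-side partitioning rule hash(key) mod R; the value of R and the keys are just examples, and a stable hash is used so the assignment stays deterministic across runs.

    import zlib

    R = 4   # number of reduce tasks (illustrative)

    def partition(key: str, r: int) -> int:
        # Route an intermediate key to one of r reduce tasks.
        return zlib.crc32(key.encode()) % r

    for key in ["every", "happy", "unhappy", "family"]:
        print(key, "-> reduce task", partition(key, R))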

Dealing with failing nodes
The master monitors tasks on mappers and reducers: idle, in progress, completed.
Worker failure (common):
The master pings workers periodically; no response => it assumes the worker has failed.
It resets the worker's map tasks, whether completed or in progress, to the idle state (those tasks become available for scheduling on other workers); completed map output lives only on the failed worker's local disk and is hence inaccessible.
The same applies to a reducer's in-progress tasks; a reducer's completed tasks are stored in the global file system and hence remain accessible.
Reducers are notified about the change of mapper assignment.
Master failure (unlikely): checkpointing, or simply abort the computation.
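A toy sketch (not the Google implementation) of the master's bookkeeping when a heartbeat times out: map tasks on the dead worker go back to idle whether or not they finished, while a reducer's completed output stays valid because it lives in the global file system. Worker and task names are invented.

    import time

    # task_id -> {"kind", "state", "worker"}; w1 has stopped sending heartbeats.
    tasks = {
        "m1": {"kind": "map",    "state": "completed",   "worker": "w1"},
        "m2": {"kind": "map",    "state": "in_progress", "worker": "w1"},
        "r1": {"kind": "reduce", "state": "completed",   "worker": "w1"},
        "r2": {"kind": "reduce", "state": "in_progress", "worker": "w2"},
    }
    last_heartbeat = {"w1": 0.0, "w2": time.time()}

    def handle_failures(timeout=10.0):
        now = time.time()
        for worker, seen in last_heartbeat.items():
            if now - seen < timeout:
                continue                        # worker is still alive
            for task in tasks.values():
                if task["worker"] != worker:
                    continue
                if task["kind"] == "map":
                    task["state"] = "idle"      # map output was on local disk: redo it
                elif task["state"] == "in_progress":
                    task["state"] = "idle"      # in-progress reduce: reschedule
                # completed reduce output is in the global FS, so it stays completed

    handle_failures()
    print(tasks)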

Other considerations
Conserve network bandwidth ("locality optimization"): the distributed file system assigns data chunks to local disks, so schedule a map task on a machine that already has a copy of its chunk, or on one nearby.
Choose M and R much larger than the number of worker machines: better load balancing and faster recovery (many small tasks from a failed machine). Limitation: O(M+R) scheduling decisions and O(M*R) in-memory state at the master. Common choice: M such that the chunk size is 16-64 MB, R a small multiple of the number of workers.
Backup tasks deal with machines that take unusually long for the last few in-progress tasks when the MapReduce job is near completion.

MapReduce execution flow (diagram): the master coordinates the job; input files are split and assigned to M map workers; the intermediate keys are partitioned into R reduce tasks; reduce workers write the final output.

Map Interface
Input: an <in_key, in_value> pair, here <url, content>.
Output: a list of intermediate <key, value> pairs, here a list of <word, url>.
Example:
key = http://url0.com, value = "every happy family is alike."
map() emits <every, http://url0.com>, <happy, http://url0.com>, <family, http://url0.com>
key = http://url1.com, value = "every unhappy family is unhappy in its own way."
map() emits <every, http://url1.com>, <unhappy, http://url1.com>, <family, http://url1.com>

Shuffle
The MapReduce system collects the outputs from all map executions and groups all intermediate values by key.
Map output (list of <word, url>): <every, http://url0.com>, <happy, http://url0.com>, <family, http://url0.com>, <every, http://url1.com>, <unhappy, http://url1.com>, <family, http://url1.com>
Reduce input (<word, list of urls>):
every -> http://url0.com, http://url1.com
happy -> http://url0.com
unhappy -> http://url1.com
family -> http://url0.com, http://url1.com

Reduce Interface
Input: <out_key, list of intermediate_value>, here <word, list of urls>.
Output: <out_key, out_value>, here <word, string of urls>.
reduce() produces:
<every, "http://url0.com, http://url1.com">
<happy, "http://url0.com">
<unhappy, "http://url1.com">
<family, "http://url0.com, http://url1.com">
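The three slides above walk through a tiny inverted index. Here is a minimal runnable Python sketch of the same map / shuffle / reduce pipeline, reusing the URLs and sentences from the slides; the driver loop and the comma-joined output string are illustrative assumptions.

    from collections import defaultdict

    def map_fn(url, content):
        # Map input <url, content> to intermediate <word, url> pairs.
        for word in content.split():
            yield word, url

    def reduce_fn(word, urls):
        # Reduce input <word, list of urls> to output <word, string of urls>.
        return word, ", ".join(urls)

    pages = {
        "http://url0.com": "every happy family is alike",
        "http://url1.com": "every unhappy family is unhappy in its own way",
    }

    # Shuffle: group intermediate values by key.
    groups = defaultdict(list)
    for url, content in pages.items():
        for word, u in map_fn(url, content):
            groups[word].append(u)

    index = dict(reduce_fn(w, us) for w, us in groups.items())
    print(index["every"])   # http://url0.com, http://url1.com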

Parallel Database
SQL specifies what to compute, not how to do it: perfect for parallel and distributed implementation.
You just need an optimizer that can choose the best plan for the given parallel/distributed system; the cost estimate includes disk, CPU, and network cost.
Recent benchmarks show parallel DBMSs can significantly outperform MapReduce, but many programmers prefer writing Map and Reduce in a familiar programming language (C++, Java).
Recent trend: high-level languages for writing MapReduce programs with DBMS-inspired operators.

MySQL vs. MongoDB
Database gurus have spoken out against MapReduce (Dave DeWitt, Michael Stonebraker). Compare the SQL aggregation with its generated MongoDB map/reduce equivalent:

    SELECT goaltype,
           SUM(distancekm)       AS totalkm,
           COUNT(*)              AS workouts,
           COUNT(powersongalbum) AS songcount,
           AVG(distancekm)       AS avgkm,
           MAX(distancekm)       AS maxkm,
           MIN(distancekm)       AS minkm
    FROM workouts
    GROUP BY goaltype;

    db.runCommand({
        mapreduce: "workouts",
        map: function () {
            emit(this.goaltype, {
                '_cfcount': 1,
                'distancekm_cfsum': isNaN(this.distancekm) ? null : this.distancekm,
                'distancekm_cfnum': isNaN(this.distancekm) ? 0 : 1,
                'powersongalbum_cfcount': (this.powersongalbum == null) ? 0 : 1,
                'distancekm_cfmax': isNaN(this.distancekm) ? null : this.distancekm,
                'distancekm_cfmin': isNaN(this.distancekm) ? null : this.distancekm
            });
        },
        reduce: function (key, vals) {
            var ret = { 'distancekm_cfmax': null, 'distancekm_cfsum': null,
                        'distancekm_cfmin': null, 'distancekm_cfnum': 0,
                        'powersongalbum_cfcount': 0, '_cfcount': 0 };
            for (var i = 0; i < vals.length; i++) {
                var v = vals[i];
                ret['distancekm_cfnum'] += v['distancekm_cfnum'];
                if (!isNaN(v['distancekm_cfmax']))
                    ret['distancekm_cfmax'] = (ret['distancekm_cfmax'] == null) ? v['distancekm_cfmax'] :
                        (ret['distancekm_cfmax'] > v['distancekm_cfmax']) ? ret['distancekm_cfmax'] : v['distancekm_cfmax'];
                ret['_cfcount'] += v['_cfcount'];
                if (!isNaN(v['distancekm_cfmin']))
                    ret['distancekm_cfmin'] = (ret['distancekm_cfmin'] == null) ? v['distancekm_cfmin'] :
                        (v['distancekm_cfmin'] > ret['distancekm_cfmin']) ? ret['distancekm_cfmin'] : v['distancekm_cfmin'];
                ret['powersongalbum_cfcount'] += v['powersongalbum_cfcount'];
                if (!isNaN(v['distancekm_cfsum']))
                    ret['distancekm_cfsum'] = v['distancekm_cfsum'] +
                        (ret['distancekm_cfsum'] == null ? 0 : ret['distancekm_cfsum']);
            }
            return ret;
        },
        finalize: function (key, val) {
            return {
                'totalkm':   val['distancekm_cfsum'],
                'workouts':  val['_cfcount'],
                'songcount': val['powersongalbum_cfcount'],
                'avgkm': (isNaN(val['distancekm_cfnum']) || isNaN(val['distancekm_cfsum'])) ? null :
                          val['distancekm_cfsum'] / val['distancekm_cfnum'],
                'maxkm': val['distancekm_cfmax'],
                'minkm': val['distancekm_cfmin']
            };
        },
        out: "s2mr",
        verbose: true
    });

http://rickosborne.org/blog/2010/02/yes-virginia-thats-automated-sql-to-mongodb-mapreduce/

MapReduce Summary
MapReduce = a programming model that hides the details of parallelization, fault tolerance, locality optimization, and load balancing.
It is a simple model, but it fits many common problems, and implementations on clusters scale to 1000s of machines and more; an open-source implementation, Hadoop, is available.
Parallel DBMSs and SQL are more powerful than MapReduce and similarly allow automatic parallelization of sequential code, but they never achieved mainstream acceptance or broad open-source support like Hadoop.
Recent trend: simplify coding in MapReduce by using DBMS ideas; (variants of) relational operators and BI tools are being implemented on top of Hadoop.

HyperDex
Adapted from "HyperDex: A Distributed, Searchable Key-Value Store," Robert Escriva, Bernard Wong, Emin Gün Sirer, ACM SIGCOMM Conference, August 14, 2012.

CAP Review
Strong Consistency: all clients see the same view, even in the presence of updates.
High Availability: all clients can find some replica of the data, even in the presence of failures.
Partition-Tolerance: the system properties hold even when the system is partitioned or not fully operational.

Typical NoSQL architecture (diagram): a hashing function maps each key K to a server (node).
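A minimal sketch of that key-to-node mapping: hash the key and pick one of N servers; the server names are invented. (Real systems usually use consistent hashing so that adding or removing a node does not remap every key.)

    import zlib

    servers = ["node0", "node1", "node2", "node3"]   # illustrative cluster

    def server_for(key: str) -> str:
        # Hash the key to choose a server; every GET/PUT for this key goes there.
        return servers[zlib.crc32(key.encode()) % len(servers)]

    print(server_for("user:42"), server_for("user:43"))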

The search problem: no hash key. Locating a record without the hash key requires searching multiple servers.

The fault-tolerance problem: many NoSQL systems' default settings consider a write complete after writing to just one node.

The consistency problem: clients may read inconsistent data, and writes may be lost.

HyperDex Key Points
Maps records to a hyperspace (hypercube):
An object's key is stored in a dedicated one-dimensional subspace for efficient key lookup.
A search only needs to contact the servers whose regions of the hyperspace match the search attributes.
Value-dependent chaining:
Keeps replicas consistent without heavy overhead from coordination of servers.
Uses the hyperspace; appoints a point leader that contains the most recent update of a record, and the other replicas are updated from the point leader.

Attributes map to dimensions in a multidimensional hyperspace.

Attribute values are hashed independently; any hash function may be used.

Objects reside at the coordinate specified by the hashes.

Different objects reside at different coordinates.

The hyperspace is divided into regions; each object resides in exactly one region.

Each server is responsible for a region of the hyperspace.

Each search intersects a subset of the regions in the hyperspace.

Example: all people named Neil map to the yellow plane.

All people named Armstrong map to the grey plane.

A more restrictive search for Neil Armstrong contacts fewer servers.

Range searches are natively supported.
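A toy Python sketch of hyperspace hashing as described on these slides: each attribute hashes independently to one coordinate, the space is cut into regions along each axis, and a search only needs the regions consistent with the attributes it specifies. The attribute names, grid size, and region enumeration are illustrative assumptions, not HyperDex's actual scheme.

    import zlib
    from itertools import product

    ATTRS = ["first_name", "last_name", "phone"]   # dimensions of the hyperspace
    CELLS = 4                                       # regions per dimension -> 4^3 = 64 regions

    def coord(value: str) -> int:
        # Hash one attribute value independently onto one axis.
        return zlib.crc32(value.encode()) % CELLS

    def region(obj: dict) -> tuple:
        # An object resides in exactly one region, given by all of its coordinates.
        return tuple(coord(obj[a]) for a in ATTRS)

    def search_regions(query: dict) -> list:
        # A search fixes the coordinates of the attributes it specifies and leaves
        # the rest free, so it only intersects the matching regions (a plane, a line, ...).
        axes = [[coord(query[a])] if a in query else range(CELLS) for a in ATTRS]
        return list(product(*axes))

    neil = {"first_name": "Neil", "last_name": "Armstrong", "phone": "555-0100"}
    print(region(neil))                                      # the one region storing the object
    print(len(search_regions({"first_name": "Neil"})))       # 16 of 64 regions (a plane)
    print(len(search_regions({"first_name": "Neil",
                              "last_name": "Armstrong"})))   # 4 of 64 regions: fewer servers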

Space Partitioning: Subspaces
The hyperspace would grow exponentially in the number of dimensions; space partitioning prevents exponential growth in the number of searchable attributes.
(Diagram: the key k gets its own one-dimensional subspace, and the attributes a1 ... ad are split into several lower-dimensional subspaces, e.g., (a1, a2, a3, a4), ..., (ad-3, ad-2, ad-1, ad).)
A search is performed in the most restrictive subspace.

Hyperspace Hashing Index
Searches are efficient, yet hyperspace hashing is a mapping, not an index:
No per-object updates to a shared data structure.
No overhead for building and maintaining B-trees.
The functionality is gained solely through careful placement.

Updating a record: value-dependent chaining
Since the values of the fields determine where in the hyperspace the record is stored, the record must be updated in an ordered way, because the update may involve moving the data.
A put involves one node from each subspace; servers in each region hold replicas of the data, providing fault tolerance.
Updates propagate from the point leader, which contains the most recent update of the record, and proceed to the tail (last dimension); commits start at the tail and move back to the head.
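A highly simplified sketch of the chain-ordering idea (not the real HyperDex protocol): the update flows head-to-tail from the point leader through one replica per subspace, and the commit travels back tail-to-head, so a node commits only after its successor has. Server names are invented.

    # Toy chain: point leader first, then one server per subspace.
    chain = ["point_leader", "key_subspace_srv", "subspace1_srv", "subspace2_srv"]

    def put(record_key, new_value):
        # Phase 1: propagate the update head -> tail so every replica sees it in order.
        for server in chain:
            print(f"{server}: apply pending update {record_key}={new_value}")
        # Phase 2: commit tail -> head; a node commits only after its successor has.
        for server in reversed(chain):
            print(f"{server}: commit {record_key}")
        print("client acknowledged: update is durable and visible")

    put("neil", {"first_name": "Neil", "last_name": "Armstrong"})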

HyperDex features
Consistent: linearizable; GET requests will always return the latest PUT.
Highly Available: the system will stay up in the presence of up to f failures.
Partition-Tolerant: for partitions involving at most f nodes, you can be certain the system is still operational.
Horizontally Scalable: you can grow the system by adding additional servers.
Performant: high throughput and low variance.
Searchable: it provides an expressive API for searching your data.

Summary: HyperDex
Two key ideas: hyperspace hashing and value-dependent chaining.
High performance: high throughput with low variance.
Strong consistency: strong safety guarantees.
Fault tolerance: tolerates a threshold of failures.
Scalable: adding resources increases performance.
Rich API: support for complex data structures and search.