CmpE 138 Spring 2011 Special Topics L2 Shivanshu Singh shivanshu.sjsu@gmail.com

Map Reduce Election process

Map Reduce Typical single-node architecture: Application, CPU, Memory, Storage

Map Reduce Application

Map Reduce Counting, Sorting (merge sort, quick sort), BIG Data, Data Mining, Trend Analysis (e.g. Twitter), Recommendation Systems (if bought = (A, B) => likely to buy C), Google Search

The Underlying Technologies

Distributed systems, storage, computing. Web data sets can be very large: tens to hundreds of terabytes, soon petabyte(s). Cannot mine on a single server (why?). Standard architecture emerging: cluster of commodity Linux nodes, (very) high-speed Ethernet interconnect. How to organize computations on this architecture? Storage is cheap but data management is not (nodes are bound to fail). Mask issues such as hardware failure.

Goal: Stable Storage For: (Stable) Computation In other words, if any of the nodes fails, how do we ensure data availability and persistence?

Goal: Stable Storage Answer: Distribute it and have redundancy. A filesystem! Manage this data: operations and services (store, retrieve) on a single logical resource that is distributed over a number of locations.

DFS Distributed File System Provides a global file namespace Google GFS; Hadoop HDFS; etc. Typical usage pattern: huge files (100s of GB to TB); reads and appends are common

DFS Chunk Servers File is split into contiguous chunks Typically each chunk is 16-64 MB Each chunk replicated (usually 2x or 3x) Try to keep replicas in different racks Master node (GFS), a.k.a. Name Node in HDFS Stores metadata Might be replicated Client library for file access Talks to the master to find chunk servers Connects directly to chunk servers to access data
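A minimal sketch of that client read path, against hypothetical Master and ChunkServer interfaces (the names, signatures, and 64 MB chunk size are assumptions for illustration, not the real GFS or HDFS client API):

import java.util.List;

// Hypothetical interfaces; real GFS/HDFS client libraries differ.
interface Master {
    // Return the chunk handle and replica locations covering a byte offset in a file.
    ChunkLocation locate(String path, long offset);
}

interface ChunkServer {
    byte[] read(String chunkHandle, long offsetInChunk, int length);
}

record ChunkLocation(String chunkHandle, List<ChunkServer> replicas) {}

class DfsClient {
    private static final long CHUNK_SIZE = 64L * 1024 * 1024;   // assume 64 MB chunks
    private final Master master;

    DfsClient(Master master) { this.master = master; }

    // Ask the master for metadata only, then fetch the bytes directly from a chunk server.
    byte[] read(String path, long offset, int length) {
        ChunkLocation loc = master.locate(path, offset);
        ChunkServer replica = loc.replicas().get(0);             // e.g., pick the nearest replica
        return replica.read(loc.chunkHandle(), offset % CHUNK_SIZE, length);
    }
}

The point of the split is that the master handles only small metadata requests, while bulk data moves directly between clients and chunk servers.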

Chubby A coarse-grained lock service; distributed systems can use this to synchronize access to shared resources. Intended for use by loosely-coupled distributed systems. In GFS: elect a master. In BigTable: master election, client discovery, table service locking.

Interface Presents a simple distributed file system Clients can open/close/read/write files Reads and writes are whole-file Also supports advisory reader/writer locks Clients can register for notification of file updates

Topology One Chubby cell: a small set of replicas, one of which is the master; all client traffic goes to the master replica.

Master Election All replicas try to acquire a write lock on a designated file; the one that gets the lock is the master. The master can then write its address to the file, and other replicas can read this file to discover the chosen master's name. Chubby doubles as a name service.
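A minimal sketch of that election pattern against a hypothetical Chubby-like client (LockService, LockFile, and the file path are assumed names for illustration, not the real Chubby API):

// Hypothetical lock-service API; the real Chubby client library differs.
interface LockService {
    LockFile open(String path);
}

interface LockFile {
    boolean tryAcquireExclusive();   // true if we now hold the write lock
    void setContents(byte[] data);
    byte[] getContents();
}

class MasterElection {
    // Every replica runs this; exactly one acquires the lock and becomes master.
    static boolean runForMaster(LockService chubby, String myAddress) {
        LockFile f = chubby.open("/ls/cell/master");      // designated election file (example path)
        if (f.tryAcquireExclusive()) {
            f.setContents(myAddress.getBytes());          // advertise our address: name service
            return true;                                  // we are the master
        }
        String master = new String(f.getContents());      // otherwise, discover who won
        System.out.println("Current master: " + master);
        return false;
    }
}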

Consensus A Chubby cell is usually 5 replicas; 3 must be alive for the cell to be viable. How do the replicas in Chubby agree on their own master and on official lock values? The PAXOS algorithm.

PAXOS Paxos is a family of algorithms (by Leslie Lamport) designed to provide distributed consensus in a network of several processors.

Processor Assumptions Operate at arbitrary speed Independent, random failures Processors with stable storage may rejoin the protocol after a failure Do not lie, collude, or attempt to maliciously subvert the protocol

Network Assumptions All processors can communicate with (see) one another Messages are sent asynchronously and may take arbitrarily long to deliver Order of messages is not guaranteed: they may be lost, reordered, or duplicated Messages, if delivered, are not corrupted in the process

A Fault-Tolerant Memory of Facts Paxos provides a memory for individual facts in the network. A fact is a binding from a variable to a value. Paxos among 2F+1 processors is reliable and can make progress if up to F of them fail (e.g., a 5-replica cell, F = 2, keeps working with 3 replicas alive).

Roles Proposer: an agent that proposes a fact Leader: the authoritative proposer Acceptor: holds agreed-upon facts in its memory Learner: may retrieve a fact from the system

Safety Guarantees Nontriviality: only proposed values can be learned Consistency: at most one value can be learned Liveness: if at least one value V has been proposed, eventually any learner L will get some value

Key Idea Acceptors do not act unilaterally. For a fact to be learned, a quorum of acceptors must agree upon the fact. A quorum is any majority of acceptors. Given acceptors {A, B, C, D}, Q = {{A, B, C}, {A, B, D}, {B, C, D}, {A, C, D}}.
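A minimal sketch of that majority test (the names here are illustrative):

import java.util.Set;

class Quorum {
    // A set of votes forms a quorum iff it is a strict majority of all acceptors.
    static boolean isQuorum(Set<String> votes, Set<String> allAcceptors) {
        return votes.size() > allAcceptors.size() / 2;
    }

    public static void main(String[] args) {
        Set<String> all = Set.of("A", "B", "C", "D");
        System.out.println(isQuorum(Set.of("A", "B", "C"), all));   // true  (3 of 4)
        System.out.println(isQuorum(Set.of("A", "B"), all));        // false (2 of 4)
    }
}

Because any two majorities intersect, a later proposer contacting any quorum is guaranteed to hear about any fact a previous quorum agreed upon.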

Basic Paxos Determines the authoritative value for a single variable. Several proposers offer a value Vn to set the variable to. The system converges on a single agreed-upon V to be the fact.

Step 1: Prepare (diagram; credit: Spinnaker Labs Inc.)

Step 2: Promise PROMISE x: the acceptor will accept only proposals numbered x or higher. Proposer 1 is ineligible because an acceptor quorum has voted for a number higher than j. (Diagram credit: Spinnaker Labs Inc.)

Step 3: Accept (diagram; credit: Spinnaker Labs Inc.)

Step 4: Accepted Acceptors acknowledge (ack) the accepted proposal. (Diagram credit: Spinnaker Labs Inc.)

Learning If a learner interrogates the system, a quorum will respond with fact V_k

Basic Paxos continued... Proposer 1 is free to try again with a proposal number > k; it can take over leadership and write in a new authoritative value. The official fact changes atomically on all acceptors from the perspective of learners. If a leader dies mid-negotiation, the value just drops and another leader tries again with a higher proposal number.
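A minimal sketch of the acceptor's side of the prepare/promise and accept/accepted exchanges above (the message and field names are my own, not taken from any particular implementation):

import java.util.Optional;

// Illustrative single-decree Paxos acceptor.
class Acceptor {
    private long promisedN = -1;                    // highest proposal number we have promised
    private long acceptedN = -1;                    // number of the proposal we last accepted
    private Optional<String> acceptedValue = Optional.empty();

    // Phase 1: Prepare(n) -> Promise(n, previously accepted proposal, if any), or silence.
    synchronized Optional<Promise> onPrepare(long n) {
        if (n <= promisedN) return Optional.empty();   // stale proposal: ignore
        promisedN = n;
        return Optional.of(new Promise(n, acceptedN, acceptedValue));
    }

    // Phase 2: Accept(n, v) -> Accepted/ack only if we have not promised a higher number.
    synchronized boolean onAccept(long n, String value) {
        if (n < promisedN) return false;
        promisedN = n;
        acceptedN = n;
        acceptedValue = Optional.of(value);
        return true;                                   // caller sends the Accepted(n, v) ack
    }

    record Promise(long n, long acceptedN, Optional<String> acceptedValue) {}
}

A proposer that collects promises from a quorum must re-propose the highest-numbered value reported in those promises (if any); that rule is what keeps the learned fact stable even as leaders come and go.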

Paxos in Chubby Replicas in a cell initially use Paxos to establish the leader. A majority of replicas must agree. Replicas promise not to try to elect a new master for at least a few seconds (master lease). The master lease is periodically renewed. Read More: http://labs.google.com/papers/chubby.html http://labs.google.com/papers/bigtable-osdi06.pdf

Big Table Google's Needs: Data reliability High-speed retrieval Storage of huge numbers of records (several TB of data) (Multiple) past versions of records should be available

HBase - Big Table Features: Simplified data retrieval mechanism: (row, col, timestamp) → value lookups only No relational operators Arbitrary number of columns per row Arbitrary data type for each column New constraint: data validation must be performed by the application layer!

Logical Data Representation Rows & columns identified by arbitrary strings Multiple versions of a (row, col) cell can be accessed through timestamps Application controls the version-tracking policy Columns grouped into column families

Data Model Related columns are stored in a fixed number of families. The family name is a prefix on the column name, e.g., fileattr:owning_group, fileattr:owning_user. A column name has the form "<family>:<label>" where <family> and <label> can be arbitrary byte arrays. Lookup is hash-based. Column families are stored physically close on disk: items in a given column family should have roughly the same read/write characteristics and contain similar data.

Conceptual View (table figure: rows with columns grouped by column family)

Physical Storage View Each column family is stored in contiguous chunks spread over multiple nodes as the data grows

Example GET
DecimalFormat decimalFormat = new DecimalFormat("0000000");
HTable htable = new HTable("rest_data");
String str = decimalFormat.format(4);
Get g = new Get(Bytes.toBytes(str));
Result r = htable.get(g);
NavigableMap<byte[], byte[]> map = r.getFamilyMap(Bytes.toBytes("feature"));

Example PUT
DecimalFormat restIdFormat = new DecimalFormat("0000000");
HTable htable = new HTable("restaurants");
String restId = restIdFormat.format(4);
Put put = new Put(Bytes.toBytes("rest_ids"));
put.add(Bytes.toBytes("restaurant_id"), Bytes.toBytes(restId), Bytes.toBytes(restId));
htable.put(put);

HBase - BigTable Further Reading with many more details: http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture http://labs.google.com/papers/bigtable-osdi06.pdf

MapReduce Implementations run on the backbone of a DFS such as HDFS or GFS, using, if needed, storage solutions like HBase or BigTable.

Word Count We have a large file of words, one word to a line. Count the number of times each distinct word appears in the file. Sample application: analyze web server logs to find popular URLs.

Word Count Input: a set of key/value pairs. The user supplies two functions: map(k, v) → intermediate list(k1, v1) and reduce(k1, list(v1)) → v2, where (k1, v1) is an intermediate key/value pair. The output is the set of (k1, v2) pairs.

Word Count using MapReduce
map(key, value):
  // key: document name; value: text of the document
  for each word w in value:
    emit(w, 1)
reduce(key, values):
  // key: a word; values: an iterator over counts
  result = 0
  for each count v in values:
    result += v
  emit(key, result)
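For reference, a minimal Hadoop version of the same pseudocode, written as a sketch against the org.apache.hadoop.mapreduce API (the class names WordCountMapper and WordCountReducer are my own):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emit (word, 1) for every word in the input line.
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// Reducer: sum the counts for each word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();
        context.write(key, new IntWritable(sum));
    }
}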

Overview

Data Flow Input and final output are stored on a distributed file system (GFS, HDFS). The scheduler tries to schedule map tasks close to the physical storage location of the input data. Intermediate results are stored on the local FS of map and reduce workers. Output is often input to another MapReduce task, e.g. the data-mining Apriori algorithm.

Coordination Master data structures Task status: (idle, in-progress, completed) Idle tasks get scheduled as workers become available When a map task completes, it sends the master the locations and sizes of its R intermediate files, one for each reducer Master pushes this info to reducers Master pings workers periodically to detect failures

Failures Map worker failure: map tasks completed or in-progress at the worker are reset to idle; reduce workers are notified when a task is rescheduled on another worker. Reduce worker failure: only in-progress tasks are reset to idle. Master failure: the MapReduce task is aborted and the client is notified.

Combiners Often a map task will produce many pairs of the form (k,v1), (k,v2), ... for the same key k, e.g., popular words in Word Count. Can save network time by pre-aggregating at the mapper: combine(k1, list(v1)) → v2, usually the same as the reduce function.
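In Hadoop this usually amounts to registering the reducer as the combiner in the job driver. A sketch, reusing the WordCountMapper and WordCountReducer classes from the earlier sketch (the driver class name and paths are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);   // combiner == reducer: pre-aggregate at the mapper
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Reusing the reducer as a combiner is safe here because summing counts is associative and commutative.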

Partition Function Inputs to map tasks are created by contiguous splits of the input file. For reduce, we need to ensure that records with the same intermediate key end up at the same worker. The system uses a default partition function, e.g., hash(key) mod R. Sometimes useful to override, e.g., hash(hostname(url)) mod R ensures URLs from the same host end up in the same output file.
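A sketch of that override as a Hadoop Partitioner; the hostname-extraction logic here is my own illustration, assuming the intermediate keys are URL strings with integer counts as values:

import java.net.URI;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Route all URLs from the same host to the same reducer: hash(hostname(url)) mod R.
class HostPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        String host;
        try {
            host = URI.create(key.toString()).getHost();
        } catch (IllegalArgumentException e) {
            host = null;                                  // malformed URL
        }
        if (host == null) host = key.toString();          // fall back to the raw key
        return (host.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

It would be enabled in the driver with job.setPartitionerClass(HostPartitioner.class).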

More Reading http://labs.google.com/papers/mapreduce-osdi04-slides/index.html http://labs.google.com/papers/mapreduce-osdi04.pdf http://wiki.apache.org/hadoop/ http://code.google.com/edu/parallel/mapreduce-tutorial.html