Distributed Programming the Google Way

Similar documents
CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores

References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals

Map-Reduce. Marco Mura 2010 March, 31th

Distributed File Systems II

CS November 2018

CS November 2017

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce

L22: SC Report, Map Reduce

Clustering Lecture 8: MapReduce

CS 138: Google. CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved.

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao

CS 138: Google. CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

BigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis

MapReduce & BigTable

Distributed computing: index building and use

Where We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344

ΕΠΛ 602:Foundations of Internet Technologies. Cloud Computing

big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures

Chapter 5. The MapReduce Programming Model and Implementation

CA485 Ray Walshe NoSQL

BigTable: A Distributed Storage System for Structured Data

Parallel Programming Concepts

Lessons Learned While Building Infrastructure Software at Google

CA485 Ray Walshe Google File System

Data Informatics. Seon Ho Kim, Ph.D.

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 12 Google Bigtable

Map Reduce. Yerevan.

CISC 7610 Lecture 2b The beginnings of NoSQL

Introduction to Data Management CSE 344

Outline. Distributed File System Map-Reduce The Computational Model Map-Reduce Algorithm Evaluation Computing Joins

BigTable. CSE-291 (Cloud Computing) Fall 2016

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS

CS 345A Data Mining. MapReduce

Introduction to Computer Science. William Hsu Department of Computer Science and Engineering National Taiwan Ocean University

Databases 2 (VU) ( / )

Introduction to MapReduce

Distributed computing: index building and use

Data Partitioning and MapReduce

Bigtable. Presenter: Yijun Hou, Yixiao Peng

Google big data techniques (2)

Introduction to NoSQL Databases

NPTEL Course Jan K. Gopinath Indian Institute of Science

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13

Extreme Computing. NoSQL.

Factors Driving Cloud Adoption

Announcements. Optional Reading. Distributed File System (DFS) MapReduce Process. MapReduce. Database Systems CSE 414. HW5 is due tomorrow 11pm

Introduction Data Model API Building Blocks SSTable Implementation Tablet Location Tablet Assingment Tablet Serving Compactions Refinements

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

Database Systems CSE 414

The Google File System

GFS Overview. Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures

Advanced Data Management Technologies Written Exam

goals monitoring, fault tolerance, auto-recovery (thousands of low-cost machines) handle appends efficiently (no random writes & sequential reads)

Announcements. Parallel Data Processing in the 20 th Century. Parallel Join Illustration. Introduction to Database Systems CSE 414

Challenges for Data Driven Systems

Introduction to MapReduce

Final Exam Review 2. Kathleen Durant CS 3200 Northeastern University Lecture 23

Introduction to MapReduce. Adapted from Jimmy Lin (U. Maryland, USA)

Distributed Filesystem

Bigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng

Parallel Computing: MapReduce Jin, Hai

Developing MapReduce Programs

Scaling Up HBase. Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech. CSE6242 / CX4242: Data & Visual Analytics

CS5412: DIVING IN: INSIDE THE DATA CENTER

Goal of the presentation is to give an introduction of NoSQL databases, why they are there.

Distributed Systems. Lec 10: Distributed File Systems GFS. Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University

W b b 2.0. = = Data Ex E pl p o l s o io i n

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent

Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612

Chapter 24 NOSQL Databases and Big Data Storage Systems

MapReduce and Friends

CSE-E5430 Scalable Cloud Computing Lecture 9

The amount of data increases every day Some numbers ( 2012):

Programming model and implementation for processing and. Programs can be automatically parallelized and executed on a large cluster of machines

2/26/2017. The amount of data increases every day Some numbers ( 2012):

Data Storage in the Cloud

Introduction to NoSQL

Part A: MapReduce. Introduction Model Implementation issues

Distributed Computations MapReduce. adapted from Jeff Dean s slides

CS /29/18. Paul Krzyzanowski 1. Question 1 (Bigtable) Distributed Systems 2018 Pre-exam 3 review Selected questions from past exams

The MapReduce Abstraction

MI-PDB, MIE-PDB: Advanced Database Systems

Lecture 11 Hadoop & Spark

Distributed Systems Pre-exam 3 review Selected questions from past exams. David Domingo Paul Krzyzanowski Rutgers University Fall 2018

Distributed System. Gang Wu. Spring,2018

Distributed Computation Models

This material is covered in the textbook in Chapter 21.

Introduction to Column Oriented Databases in PHP

TI2736-B Big Data Processing. Claudia Hauff

Dremel: Interactice Analysis of Web-Scale Datasets

HDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung

CS6030 Cloud Computing. Acknowledgements. Today s Topics. Intro to Cloud Computing 10/20/15. Ajay Gupta, WMU-CS. WiSe Lab

Big Data Management and NoSQL Databases

A brief history on Hadoop

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2008 Quiz II

Distributed Systems 16. Distributed File Systems II

Map Reduce.

Transcription:

Distributed Programming the Google Way Gregor Hohpe Software Engineer www.enterpriseintegrationpatterns.com Scalable & Distributed Fault tolerant distributed disk storage: Google File System Distributed shared memory: Bigtable Parallel programming abstraction: MapReduce Domain Specific Languages: Sawzall 2

GFS: Google File System Data replicated 3 times. Upon failure, software rereplicates. Master: Manages file metadata. Chunk size 64 MB. GFS Master Client Client C 0 C C 5 C 2 C 5 C C 3 C 0 C 5 C 2 Chunkserver Chunkserver 2 Chunkserver N http://research.google.com/archive/gfs-sosp2003.pdf 3 Bigtable: A sparse, distributed, persistent, multidimensional, sorted Map (RowKey, ColumnFamily:Column, Timestamp) Value Column Family Column Family Column A Column B Column C Key Cell Cell Cell 2007//26 2:59 2008/04/0 :0 Column A Column A Key 2 Key 3 Cell Column E Cell Column F Cell Cell Latest Version of Key / Column C http://research.google.com/archive/bigtable-osdi06.pdf 4 2

Map-Reduce Express computation as Map / Group / Reduce map(in_key, data) list(key, value) (group output by key) reduce(key, list(values)) list(out_data) Well suited for "embarrassingly parallel" problems Open source implementation: Hadoop http://research.google.com/archive/mapreduce.html 5 Simple Example: Word Count Chose the key so you get the most out of the framework map(in_key, data): for word in data.split(): output(word, ) reduce(key, list(values)): print key, len(values) to be or not to be Map KeyValue to be or not to be Group Key Values be not or to Reduce be not or to 2 2 6 3

Log Processing: Sawzall Domain-specific language Process one record at a time Aggregation externalized into "tables" count: table sum of int; total: table sum of float; number: float = string(input); emit count <- ; emit total <- number; 3.5 2.25 ------- count[] = 3 total[] = 6.75 "Programming in the first derivative" http://labs.google.com/papers/sawzall.html 7 Simple Example: Word Count word_count: table sum[string] of int; line = input; words = split(line); for each word in words: emit word_count[words] <- ; to be or not to be that is the question input to be or not to be that is the question emit sum be 2 to 2 not that 8 4

Underlying Considerations. Sharding 2. Less is More 3. Autonomy 4. Expect Failure 5. Power to the Runtime 6. Favor Stateless 7. Separate Stateless from Stateful 8. Precision vs. Speed Conscious trade-offs 9 Sharding Divide and Conquer 0 5

Partition Data Across DB Instances Shard function, e.g. customer ID Hierarchies (one-way) work well Many-to-many relationships (two-way) difficult "Special Shard" / All shard queries Application Layer Logical Entity Persistence Layer Shard Function Physical Entity Less is More 2 6

Bigtable, Not Bigdatabase Less is More: No transactions No schema No foreign keys No join No relational algebra, Cartesian product, etc No SQL Single row updates are atomic. Everything else is not. Only basic data types: string, counter, protocol buffer 3 Autonomy Keep Going without Supervision 4 7

GFS Direct communication between client and Chunk server Large Chunk size (64MB) GFS Master Client Client C 0 C C 5 C 2 C 5 C C 3 C 0 C 5 C 2 Chunkserver Chunkserver 2 Chunkserver N 5 Expect Failure Not If, But When 6 8

Expect Failure: GFS & MapReduce GFS: Data replicated 3 times. Upon failure, software re-replicates. MapReduce: I n p u t Map Task Sort & Group Reduce Task Restarts failed map / reduce workers Detects key/value pairs that cause crashes, skips Tougher to deal with: laggards Map Task 2 key D a t a Map Task 3 Sort & Group Reduce Task 2 7 Power to the Runtime Free Flow Instead of Strict Rules 8 9

MapReduce Execution 9 Sawzall Quantifiers Descriptive as opposed to Prescriptive function(word: string): bool { when(i: some int; word[i]!= word[$--i]) return false; return true; }; some / all / each 20 0

Favor Stateless Don't Remember a Thing 2 MapReduce / Sawzall /Bigtable Map and Reduce step are stateless map(in_key, data) list(key, value) reduce(key, list(values)) list(out_data) Sawzall views input data as set, not list Bigtable has set() operation vs. insert / update 22

Separate Stateless From Stateful The Real World is Rarely Stateless 23 Sawzall State!! while!eof(file) { line = readline(file); words = split(line); for each word in words: map[word]++; } map: table sum[string] of int; line = input; words = split(line); for each word in words: emit map[words] <- ; Input Process Σ Process Process Σ 24 2

Precision vs. Speed Faster is Better (in Software) 25 Trade precision for speed: Sawzall Top() function: Statistical samplings that record the `top N' data items. // This type is for estimating the most common // entries based on the CountSketch algorithm from // "Finding Frequent Items in Data Streams", // Moses Charikar, Kevin Chen and Martin // Farach-Colton. 26 3

GFS It's all about trade-offs! Large chunk size (64MB) Optimized for sequential read, not random access Bigtable Optimized for read-intensive applications Distributed, but not transactional Sawzall Cannot detect duplicate rows 27 http://www.flickr.com/photos/hellolapomme/2552346073 http://www.flickr.com/photos/lorkano/327796949 http://www.flickr.com/photos/mukluk/28892573 Thank You! 4