Fault Tolerant Distributed Main Memory Systems

CompSci 590.04, Lecture 16 (Fall 2015)
Instructor: Ashwin Machanavajjhala

Recap: Map Reduce

map: (k1, v1) → list(k2, v2)         Map Phase (per-record computation)
reduce: (k2, list(v2)) → list(v2)    Reduce Phase (global computation)

[Figure: records flow through Split, the Map Phase, the Shuffle, and the Reduce Phase.]

Recap: Map Reduce Programming Model

Simple model: the programmer only describes the logic.

Plus a distributed system that:
- works on commodity hardware
- scales to thousands of machines
- ships code to the data, rather than shipping data to the code
- hides all the hard systems problems from the programmer: machine failures, data placement
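To make the model concrete, here is word count written in the map/reduce style as plain local Scala, with no cluster behind it; mapFn, reduceFn, and the sample records are illustrative names rather than any framework's API.

    // Map phase: emit (word, 1) for every word in one input record.
    def mapFn(line: String): Seq[(String, Int)] =
      line.split("\\s+").filter(_.nonEmpty).map(w => (w, 1)).toSeq

    // Reduce phase: sum the counts collected for one key.
    def reduceFn(word: String, counts: Seq[Int]): (String, Int) = (word, counts.sum)

    val records = Seq("to be or not", "to be")
    val counts = records
      .flatMap(mapFn)                                      // map (per-record computation)
      .groupBy(_._1)                                       // shuffle (group by key)
      .map { case (w, kvs) => reduceFn(w, kvs.map(_._2)) } // reduce (per-key computation)
    println(counts) // e.g. Map(to -> 2, be -> 2, or -> 1, not -> 1)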

Recap: Map Reduce

But as soon as it got popular, users wanted more:
- more complex, multi-stage applications (e.g., iterative machine learning and graph processing)
- more interactive ad-hoc queries

[Figure: an iterative job on MapReduce. The input and every intermediate result pass through an HDFS read and an HDFS write on each iteration.]

Thus arose many specialized frameworks for parallel processing.

Recap: Pregel

[Figure 2: Maximum Value Example. Vertices start with values 3, 6, 2, 1 and all converge to 6 over four supersteps. Dotted lines are messages; shaded vertices have voted to halt.]
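The superstep loop behind that figure fits in a toy, single-machine sketch. Everything below (the Vertex class, the chain topology, the message plumbing) is an assumption made for illustration; real Pregel partitions vertices across workers and sends messages over the network.

    import scala.collection.mutable

    case class Vertex(id: Int, var value: Int, var active: Boolean = true)

    // A chain graph with initial values 3, 6, 2, 1 (topology assumed for illustration).
    val vertices  = Array(Vertex(0, 3), Vertex(1, 6), Vertex(2, 2), Vertex(3, 1))
    val neighbors = Map(0 -> Seq(1), 1 -> Seq(0, 2), 2 -> Seq(1, 3), 3 -> Seq(2))

    var inbox = Map[Int, Seq[Int]]().withDefaultValue(Seq())
    var superstep = 0
    while (vertices.exists(_.active)) {
      val outbox = mutable.Map[Int, Seq[Int]]().withDefaultValue(Seq())
      for (v <- vertices) {
        val best = (inbox(v.id) :+ v.value).max
        if (best > v.value || superstep == 0) {
          v.value = best
          v.active = true
          neighbors(v.id).foreach(n => outbox(n) = outbox(n) :+ best) // send messages
        } else v.active = false                                       // vote to halt
      }
      inbox = outbox.toMap.withDefaultValue(Seq())
      superstep += 1
    }
    println(vertices.map(_.value).mkString(" ")) // 6 6 6 6

Each vertex keeps the largest value it has seen, forwards it to its neighbors when it changes, and votes to halt otherwise; the computation ends once every vertex has halted and no messages remain.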

GraphLab

[Figure: GraphLab's main components: the Data Graph, the Shared Data Table, Scheduling, and Update Functions and Scopes.]

Problem with specialized frameworks

Running multi-stage workflows is hard. For example:
- extract mentions of celebrities from news articles
- construct a co-reference graph of celebrities (based on co-occurrence in the same article)
- analyze this graph (say, connected components / PageRank)

Graph processing on Map Reduce is slow, since the input does not have a graph abstraction. Yet Map Reduce is a good candidate for constructing the graph in the first place.

Root Cause Analysis

Why do graph processing algorithms and iterative computation do poorly on Map Reduce?

[Figure: the iterative MapReduce dataflow again, with an HDFS read and an HDFS write around every iteration.]

There is usually some (large) input that does not change across iterations, yet Map Reduce unnecessarily keeps writing it to and reading it from disk.

Examples

- PageRank: the links in the graph (LARGE) do not change; only the rank of each node (small) changes.
- Logistic regression: the original set of points does not change; only the model needs to be updated.
- Connected components / k-means clustering: the graph/dataset does not change; only the labels on the nodes/points change.

In each case the part that never changes is LARGE, while the part that changes is small.

Idea: Load the immutable part into memory

The Twitter follows graph is 26 GB uncompressed. It can be stored in memory using 7 off-the-shelf machines, each with 4 GB of memory.

Problem: Fault Tolerance!

Fault Tolerant Distributed Memory

Solution 1: Global checkpointing, e.g., Piccolo (http://piccolo.news.cs.nyu.edu/)

Problem: after a failure, a lot of computation must be redone. (In Map Reduce, only the failed mapper or reducer needs to be redone.)

Fault Tolerant Distributed Memory

Solution 2: Replication (e.g., RAMCloud)

RAMCloud: Log Structured Storage

Each master maintains in memory:
- an append-only log
- a hash table mapping object id to the object's location on the log

Every write becomes an append to the log, plus a write to the hash table. The log is divided into log segments.

Durable Writes

1. Write to the head of the log (in the master's memory).
2. Write to the hash table (in the master's memory).
3. Replicate to 3 other backups; each writes to its backup log in memory and returns.
4. The master returns as soon as ACKs are received from the replicas.

Backups write to disk only when a log segment becomes full.
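Condensed to a single machine, that write path looks roughly like the sketch below. LogStructuredStore, Backup, and every method name are illustrative stand-ins, not RAMCloud's actual interfaces; synchronous calls stand in for the ACK protocol, and real backups buffer whole segments rather than individual objects.

    import scala.collection.mutable

    trait Backup { def append(objectId: Long, value: Array[Byte]): Unit }

    class LogStructuredStore(backups: Seq[Backup]) {
      private val log   = mutable.ArrayBuffer[(Long, Array[Byte])]() // append-only log
      private val index = mutable.Map[Long, Int]()                   // object id -> position in log

      def write(objectId: Long, value: Array[Byte]): Unit = {
        log += ((objectId, value))                 // 1. append at the head of the log
        index(objectId) = log.size - 1             // 2. update the hash table
        backups.foreach(_.append(objectId, value)) // 3. replicate before acknowledging the client
      }

      def read(objectId: Long): Option[Array[Byte]] =
        index.get(objectId).map(pos => log(pos)._2)
    }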

Fault Tolerant Distributed Memory

Solution 2: Replication, i.e., log structured storage (e.g., RAMCloud) + replication.

Problems: every write triggers replication across nodes, which can become expensive, and the log needs constant maintenance and garbage cleaning.

Fault Tolerant Distributed Memory

Moreover, existing solutions (Piccolo, RAMCloud, memcached) assume that objects in memory can be read as well as written. But in most applications we only need objects in memory that are read (and hence immutable).

Fault Tolerant Distributed Memory

Solution 3: Resilient Distributed Datasets (RDDs)
- A restricted form of distributed shared memory: data in memory is immutable.
- A partitioned collection of records.
- Can only be built through coarse-grained deterministic transformations (map, filter, join, etc.).
- Fault tolerance through lineage: maintain a small log of operations, and recompute lost partitions when failures occur.
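The lineage idea fits in a few lines of toy code. The sketch below is a single-machine caricature, not Spark's actual implementation (ToyRDD, SourceRDD, and MappedRDD are invented names); it shows how a lost partition can be recomputed from its parent rather than replicated.

    sealed trait ToyRDD[T] {
      def numPartitions: Int
      def computePartition(i: Int): Seq[T]                 // rebuild partition i from lineage
      def map[U](f: T => U): ToyRDD[U] = MappedRDD(this, f)
    }

    // A source dataset, partitioned up front.
    case class SourceRDD[T](parts: Vector[Seq[T]]) extends ToyRDD[T] {
      def numPartitions = parts.size
      def computePartition(i: Int) = parts(i)
    }

    // A transformed dataset: a lost partition is recovered by
    // rerunning f over the parent's corresponding partition.
    case class MappedRDD[T, U](parent: ToyRDD[T], f: T => U) extends ToyRDD[U] {
      def numPartitions = parent.numPartitions
      def computePartition(i: Int) = parent.computePartition(i).map(f)
    }

    val doubled = SourceRDD(Vector(Seq(1, 2), Seq(3, 4))).map(_ * 2)
    println(doubled.computePartition(1)) // List(6, 8), recomputed rather than stored

Note that MappedRDD stores no data at all, only its parent and the function, which is exactly the small log of operations the slide describes.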

Example: Log Mining

    val lines = spark.textFile("hdfs://...")         // the original file
    val errors = lines.filter(_.startsWith("ERROR"))
    val messages = errors.map(_.split("\t")(2))      // this is the RDD that is stored
    messages.persist()

    messages.filter(_.contains("foo")).count         // the first action triggers RDD computation and load into memory
    messages.filter(_.contains("bar")).count

RDD Fault Tolerance

RDDs track the graph of operations used to construct them, called lineage. Lineage is used to rebuild data lost due to failures.

    val lines = spark.textFile("hdfs://...")         // HadoopRDD
    val errors = lines.filter(_.startsWith("ERROR")) // FilteredRDD
    val messages = errors.map(_.split("\t")(2))      // MappedRDD

[Figure: the lineage chain HadoopRDD (path = "hdfs://...") → FilteredRDD (func = _.startsWith(...)) → MappedRDD (func = _.split(...)).]

RDD Fault Tolerance

The larger the lineage, the more computation is needed, and thus the longer recovery from failure will take. Therefore, RDDs only allow operations that touch a large number of records at the same time.

Transformations (define a new RDD): map, filter, sample, groupByKey, reduceByKey, sortByKey, flatMap, union, join, cogroup, cross, mapValues

Actions (return a result to the driver program): collect, reduce, count, save, lookupKey
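This split matters because transformations are lazy: they only extend the lineage graph, and computation happens when an action runs (as the log-mining example showed). A small illustration, where spark names the context as in the slides and "data.txt" is a hypothetical file:

    val nums  = spark.textFile("data.txt").map(_.trim.toInt) // transformation: nothing runs yet
    val evens = nums.filter(_ % 2 == 0)                      // transformation: lineage grows
    val n     = evens.count()                                // action: the whole chain now executes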

RDD Fault Tolerance

Because recovery cost grows with lineage, and operations must be coarse-grained, RDDs are great for batch operations, but not so good for random access or asynchronous algorithms.

Iterative Computation: Logistic Regression

    val points = spark.textFile(...).map(parsePoint).persist()
    var w = // random initial vector
    for (i <- 1 to ITERATIONS) {
      val gradient = points.map { p =>
        p.x * (1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y
      }.reduce((a, b) => a + b)
      w -= gradient
    }

The points RDD is persisted once and reused by every iteration; only the small vector w changes.

Page Rank

Lineage graphs can be long; Spark uses checkpointing in such cases.

    val links = spark.textFile(...).map(...).persist()
    var ranks = // RDD of (URL, rank) pairs
    for (i <- 1 to ITERATIONS) {
      // Build an RDD of (targetURL, float) pairs
      // with the contributions sent by each page
      val contribs = links.join(ranks).flatMap {
        case (url, (links, rank)) =>
          links.map(dest => (dest, rank / links.size))
      }
      // Sum contributions by URL and get new ranks
      ranks = contribs.reduceByKey((x, y) => x + y)
                      .mapValues(sum => a/N + (1-a)*sum)
    }

[Figure: the lineage grows each iteration: input file → (map) → links; links join ranks_0 → contribs_0 → (reduce + map) → ranks_1 → contribs_1 → ranks_2 → ...]

Transformations and Lineage Graphs

Narrow dependencies: map, filter, union, join with inputs co-partitioned
Wide dependencies: groupByKey, join with inputs not co-partitioned

The user can specify how data is partitioned to ensure narrow dependencies, as in the sketch below.
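A hedged example of forcing co-partitioning: HashPartitioner and partitionBy are real Spark APIs, but parseLink, the file name, and the partition count are made up for illustration.

    import org.apache.spark.HashPartitioner

    val part  = new HashPartitioner(8)
    val links = spark.textFile("links.txt")
                     .map(parseLink)     // parseLink (hypothetical) yields (url, outlinks) pairs
                     .partitionBy(part)  // fix the partitioning once, up front
                     .persist()
    val ranks  = links.mapValues(_ => 1.0) // mapValues preserves the partitioner
    val joined = links.join(ranks)         // co-partitioned inputs: narrow dependency, no shuffle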

Scheduling

[Figure: a job's RDD graph (A through G) cut into three stages. Stage boundaries fall at wide dependencies such as groupBy and the non-co-partitioned join; narrow dependencies such as map and union stay inside a stage.]

Execution can be pipelined as long as dependencies are narrow.

Summary

Map Reduce requires writing to disk for fault tolerance, which is not good for iterative computation.

RDD: a restricted form of distributed shared memory
- Data in memory is immutable; a partitioned collection of records.
- Can only be built through coarse-grained deterministic transformations (map, filter, join, etc.).
- Fault tolerance through lineage: maintain a small log of operations, and recompute lost partitions when failures occur.