DRIZZLE: FAST AND ADAPTABLE STREAM PROCESSING AT SCALE

Shivaram Venkataraman, Aurojit Panda, Kay Ousterhout, Michael Armbrust, Ali Ghodsi, Michael Franklin, Benjamin Recht, Ion Stoica

STREAMING WORKLOADS

Streaming Trends: Low Latency
Results power decisions made by machines:
- Credit card fraud -> disable the account
- Suspicious user logins -> ask security questions
- Slow video load -> direct the user to a new CDN

Streaming Requirements: High Throughput
- Disable stolen accounts
- Detect suspicious logins
- Dynamically adjust application behavior
As many as tens of millions of updates per second: we need a distributed system.

Distributed Execution Models

Execution Models: Continuous Operators
Example: group by user, run anomaly detection
- Low-latency output
- Mutable local state
- Systems: Naiad, Google MillWheel; streaming databases: Borealis, Flux, etc.

Execution Models: Micro-Batch
Example: group by user, run anomaly detection
- Tasks output state on completion; output at task granularity
- Dynamic task scheduling gives adaptability: straggler mitigation, elasticity, fault tolerance
- Systems: Google FlumeJava, Microsoft Dryad
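
To make the micro-batch model concrete, here is a minimal Scala sketch (hypothetical, not Drizzle's or Spark's API) in which state crosses batch boundaries only as task output; because nothing else is shared, a scheduler is free to re-run or relocate any task, which is what enables the straggler mitigation and parallel recovery above.

```scala
// A minimal sketch of the micro-batch model (illustrative names only):
// each micro-batch runs independent tasks, and state moves between
// batches only as task output.
object MicroBatchSketch {
  type State = Map[String, Long] // e.g. per-user event counts

  // A task processes one partition of a micro-batch against prior state
  // and outputs new state on completion (output at task granularity).
  def runTask(events: Seq[String], prior: State): State =
    events.foldLeft(prior) { (s, user) =>
      s.updated(user, s.getOrElse(user, 0L) + 1)
    }

  def main(args: Array[String]): Unit = {
    val batches = Seq(Seq("alice", "bob", "alice"), Seq("bob", "carol"))
    // Because state only moves at task boundaries, a scheduler is free
    // to re-run or relocate any task: this is what enables straggler
    // mitigation and parallel recovery.
    val finalState = batches.foldLeft(Map.empty[String, Long])(
      (state, batch) => runTask(batch, state))
    println(finalState) // Map(alice -> 2, bob -> 2, carol -> 1)
  }
}
```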

Failure Recovery

Failure Recovery: Continuous Operators
- Asynchronous Chandy-Lamport checkpoints record operator state
- On failure, must all machines replay from the checkpoint?

Failure Recovery: Micro-Batch
- Task boundaries capture task interactions!
- Task output is periodically checkpointed
- Replay tasks from the failed machine; parallelize the replay

Execution Models Compared
- Scheduling granularity: continuous operators are static (inflexible, slow failover); micro-batch is dynamic (adaptable: parallel recovery, straggler mitigation)
- Processing granularity: continuous operators give low latency; micro-batch gives higher latency

Execution Models: Where Drizzle Fits
- Continuous operators: static scheduling; low latency
- Micro-batch: dynamic scheduling (coarse granularity); higher latency (coarse-grained processing)
- Drizzle: dynamic scheduling (coarse granularity); low latency (fine-grained processing)

What Happens Inside the Scheduler?
(1) Decide how to assign tasks to machines (data locality, fair sharing)
(2) Serialize and send the tasks

Scheduling Overheads
[Figure: median per-task time breakdown (compute + data transfer, task fetch, scheduler delay) on 4 to 128 machines]
Cluster: 4-core r3.xlarge machines. Workload: sum of 10k numbers per core.

What Happens Inside the Scheduler? (revisited)
(1) Decide how to assign tasks to machines (data locality, fair sharing)
(2) Serialize and send the tasks
Key idea: reuse scheduling decisions across micro-batches!
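
As a sketch of the "reuse scheduling decisions" idea, the hypothetical scheduler below caches both the placement decision and the serialized task bytes for a micro-batch DAG, so steps (1) and (2) are paid only once while the DAG is unchanged. All names are illustrative, not Drizzle's implementation.

```scala
import scala.collection.mutable

final case class Placement(taskId: Int, machine: String)

// Hypothetical scheduler that replays cached scheduling decisions.
class ReusingScheduler(machines: Seq[String]) {
  private val cache = mutable.Map.empty[String, Seq[(Placement, Array[Byte])]]

  private def serialize(taskId: Int): Array[Byte] =
    s"task-$taskId".getBytes("UTF-8") // stand-in for real task serialization

  // dagKey identifies the (unchanging) micro-batch DAG.
  def schedule(dagKey: String, numTasks: Int): Seq[(Placement, Array[Byte])] =
    cache.getOrElseUpdate(dagKey, {
      // Steps (1) and (2) are paid only on a cache miss.
      (0 until numTasks).map { t =>
        (Placement(t, machines(t % machines.size)), serialize(t))
      }
    })
}

object ReusingSchedulerDemo {
  def main(args: Array[String]): Unit = {
    val sched = new ReusingScheduler(Seq("m1", "m2", "m3"))
    sched.schedule("microbatch-dag", 6) // pays steps (1) and (2)
    sched.schedule("microbatch-dag", 6) // served from the cache
    println("second micro-batch reused the first batch's decisions")
  }
}
```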

Drizzle
Goal: remove frequent scheduler interaction
(1) Pre-schedule reduce tasks
(2) Group-schedule micro-batches

(1) Pre-Schedule Reduce Tasks
Goal: remove scheduler involvement for reduce tasks

Coordinating Shuffles: Existing Systems
- Scheduler metadata describes shuffle data locations
- Data is fetched from remote machines

Coordinating Shuffles: Pre-Scheduling
(1) Pre-schedule the reducers
(2) Mappers get the reducer metadata
(3) Mappers trigger the reducers
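
The toy sketch below walks through this three-step protocol with hypothetical names: reducers are placed before the map stage starts, mappers are handed the reducer locations, and mappers trigger the reducers directly, so no scheduler round-trip is needed between stages.

```scala
// Illustrative pre-scheduling sketch; not Drizzle's actual code.
object PreScheduleSketch {
  final case class Reducer(id: Int) {
    val inbox = scala.collection.mutable.Buffer.empty[Int]
    def trigger(partial: Int): Unit = inbox += partial // (3) mapper-driven
  }

  def main(args: Array[String]): Unit = {
    // (1) Pre-schedule reducers before any mapper starts.
    val reducers = Vector(Reducer(0), Reducer(1))
    // (2) Mappers get reducer metadata (here: the reducer handles).
    val mapInputs = Seq(Seq(1, 2, 3), Seq(4, 5, 6))
    for (partition <- mapInputs) {
      val partial = partition.sum
      // (3) Mappers trigger reducers with their output; no scheduler hop
      // between the map and reduce stages.
      reducers(partial % reducers.size).trigger(partial)
    }
    reducers.foreach(r => println(s"reducer ${r.id}: sum=${r.inbox.sum}"))
  }
}
```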

Drizzle
(1) Pre-schedule reduce tasks
(2) Group-schedule micro-batches
Goal: avoid returning to the scheduler after every micro-batch

(2) Group Scheduling
- Schedule a group of micro-batches at once (e.g., groups of 2)
- Fault tolerance and scheduling decisions happen at group boundaries
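
A minimal illustration of group scheduling, assuming a hypothetical driver loop: the scheduler is invoked once per group rather than once per micro-batch, and decisions (including fault-tolerance actions) are taken at group boundaries.

```scala
// Illustrative group-scheduling sketch; names are hypothetical.
object GroupSchedulingSketch {
  var schedulerInvocations = 0

  def scheduleGroup(group: Seq[Int]): Unit = {
    schedulerInvocations += 1    // one scheduler interaction per group...
    group.foreach(runMicroBatch) // ...covers every micro-batch in it
  }

  def runMicroBatch(id: Int): Unit = println(s"running micro-batch $id")

  def main(args: Array[String]): Unit = {
    val groupSize = 2            // "group of 2" from the slide
    (1 to 6).grouped(groupSize).foreach(g => scheduleGroup(g.toSeq))
    // 6 micro-batches but only 3 scheduler interactions; fault tolerance
    // and adaptability decisions are taken at these group boundaries.
    println(s"scheduler invocations: $schedulerInvocations")
  }
}
```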

Micro-Benchmark: Two Stages, 100 Iterations
[Figure: time per iteration (ms) for Baseline, Only Pre-Scheduling, Drizzle-10, and Drizzle-100 on 4 to 128 machines, breaking down the gains from pre-scheduling and group scheduling]
In the paper: group-size auto-tuning.

Evaluation
Against the comparison above, two questions:
1. Latency: does Drizzle match the low latency (fine-grained processing) of continuous operators?
2. Adaptability: does Drizzle keep the dynamic scheduling of micro-batch?

Evaluation: Latency
Yahoo! Streaming Benchmark
- Input: JSON ad-click events
- Compute: number of clicks per campaign
- Window: updated every 10 s
Comparing Spark 2.0, Flink 1.1.1, and Drizzle on 128 Amazon EC2 r3.xlarge instances.

Streaming Benchmark: Performance
Yahoo Streaming Benchmark: 20M JSON ad-events/second, 128 machines
Event latency: the difference between window end and processing end
[Figure: CDF of event latency (ms), 0-3000 ms, for Spark, Drizzle, and Flink]

Adaptability: Fault Tolerance
Yahoo Streaming Benchmark: 20M JSON ad-events/second, 128 machines
A machine failure is injected at 240 seconds.
[Figure: latency (ms) over time, 190-290 s, for Spark, Flink, and Drizzle around the injected failure]

Execution Models: Where Drizzle Fits (with Optimization)
- Continuous operators: static scheduling; low latency
- Micro-batch: dynamic scheduling; higher latency; optimization of batches
- Drizzle: dynamic scheduling (coarse granularity); low latency (fine-grained processing); optimization of batches

Intra-Batch Query Optimization
Yahoo Streaming Benchmark: 20M JSON ad-events/second, 128 machines
Optimize the execution of each micro-batch by pushing down aggregation.
[Figure: CDF of event latency (ms), 0-3000 ms, for Spark, Drizzle, Flink, and Drizzle-Optimized]
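
As a rough sketch of what "pushing down aggregation" buys here: each map task pre-aggregates clicks per campaign within its partition, so only one (campaign, count) pair per campaign is shuffled instead of every raw event. The code is illustrative Scala, not the benchmark's implementation.

```scala
// Hedged sketch of aggregation pushdown (partial aggregation before the
// shuffle); all names are illustrative.
object PushdownSketch {
  def main(args: Array[String]): Unit = {
    // Two map-side partitions of raw ad-click events, keyed by campaign.
    val partitions: Seq[Seq[String]] =
      Seq(Seq("c1", "c2", "c1"), Seq("c2", "c2", "c3"))

    // Pushed-down partial aggregation, executed inside each map task:
    // one (campaign, count) pair per campaign leaves each partition.
    val partials: Seq[Map[String, Long]] =
      partitions.map(_.groupBy(identity).map { case (c, xs) => c -> xs.size.toLong })

    // Final merge over the (much smaller) partial results.
    val clicksPerCampaign = partials.flatten
      .groupBy(_._1)
      .map { case (c, pairs) => c -> pairs.map(_._2).sum }

    println(clicksPerCampaign) // Map(c1 -> 2, c2 -> 3, c3 -> 1)
  }
}
```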

Evaluation (more in the paper)
- Metrics: end-to-end latency, fault tolerance, query optimization, throughput, elasticity, group-size tuning
- Workloads: Yahoo Streaming Benchmark, synthetic micro-benchmarks, video analytics
- In Shivaram's thesis: iterative ML algorithms

Conclusion
- Continuous operators: static scheduling; low latency
- Micro-batch: dynamic scheduling (coarse granularity); higher latency (coarse-grained processing); optimization of batches
- Drizzle: dynamic scheduling (coarse granularity); low latency (fine-grained processing); optimization of batches
Source code: https://github.com/amplab/drizzle-spark
Shivaram is answering questions on sli.do.