Load Shedding in a Data Stream Manager


Nesime Tatbul, Uğur Çetintemel, Stan Zdonik (Brown University)
Mitch Cherniack (Brandeis University)
Michael Stonebraker (M.I.T.)

The Overload Problem
- Push-based data sources
- High and unpredictable data rates
- Problem: Load > Capacity during spikes
- Load shedding: eliminating excess load by dropping data

Aurora Data Stream Management System
[Figure: Aurora query network. Input data streams flow through the network to application outputs, each annotated with a QoS specification.]
- Stream Query Algebra (SQuAl)
- Quality of Service
- Run-time operation: operator scheduling, storage management, performance monitoring, load shedding

Load Shedding by Inserting Drops
[Figure: query network with QoS graphs at its outputs.]
Two types of drops:
- Random drop: Drop k%
- Semantic drop: Filter P(value)
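The two drop types can be illustrated with a minimal Python sketch. This is not Aurora code: the function names `random_drop` and `semantic_drop`, the generator-based stream model, and the example predicate are all assumptions made for illustration.

```python
import random
from typing import Callable, Iterable, Iterator


def random_drop(stream: Iterable[float], k: float,
                rng: random.Random) -> Iterator[float]:
    """Random drop: discard each tuple independently with probability k / 100."""
    for value in stream:
        if rng.random() >= k / 100.0:
            yield value


def semantic_drop(stream: Iterable[float],
                  predicate: Callable[[float], bool]) -> Iterator[float]:
    """Semantic drop: a filter that keeps only tuples satisfying the predicate."""
    for value in stream:
        if predicate(value):
            yield value


# Hypothetical input: the integers 0..99, one tuple per value.
values = list(range(100))
kept_random = list(random_drop(values, 20, random.Random(0)))
kept_semantic = list(semantic_drop(values, lambda v: v >= 25))
```

The random drop removes about k% of the tuples regardless of content, while the semantic drop decides from the tuple's value, which is what lets it discard the least useful tuples first.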

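Quality of Service
- Loss-tolerance QoS: utility as a function of % delivery [Figure: utility ticks 1.0 and 0.7; % delivery axis 100, 50, 0]
- Value-based QoS: utility as a function of output values [Figure: utility ticks 1.0 and 0.4; values axis 0, 80, 120, 200]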
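A QoS graph of this kind can be modelled as a piecewise-linear function. The sketch below assumes the loss-tolerance graph passes through (100%, 1.0), (50%, 0.7) and (0%, 0.0); those breakpoints are one reading of the axis ticks, not values confirmed by the slide, and `piecewise_linear` is a hypothetical helper.

```python
from bisect import bisect_right


def piecewise_linear(points):
    """Return a function that interpolates linearly between sorted (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    def utility(x):
        # Clamp outside the defined range.
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, x)
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return utility


# Assumed breakpoints: (% delivery, utility).
loss_tolerance_qos = piecewise_linear([(0, 0.0), (50, 0.7), (100, 1.0)])
```

Under these assumed breakpoints, losing the first half of the tuples costs 0.3 utility while losing the second half costs 0.7, so shedding is cheap at first and increasingly expensive afterwards.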

Assumptions
- The processor is the main scarce resource.
- Operators that produce new attribute values (e.g., aggregate, map) are not considered.
- The Load Shedder operates independently of the Scheduler.

Problem Statement
- N: query network
- I: set of input streams
- C: processing capacity
When Load(N(I)) > C, transform N into N' such that
- Load(N'(I)) < C
- Utility(N(I)) - Utility(N'(I)) is minimized
Key questions:
- When to shed load?
- Where to shed load?
- How much load to shed?
- Which tuples to drop?

Talk Outline
- Introduction
- Technical Overview
- Experimental Results
- Related Work
- Conclusions and Future Work

Algorithmic Overview
  evaluate the load of the query network
  IF Load > Capacity
      insert drops into the query network
  ELSE IF Load < Capacity AND there are drops in the query network
      remove drops from the query network
[Figure: the load shedder reads statistics and the network description from the system catalog, looks up drop insertion plans, and modifies the query network. The catalog holds queries, statistics, and drop insertion plans.]
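The decision logic above can be sketched as a single step of a control loop. The `QueryNetwork` class, the `shed_step` function, and the string-valued plan entries are hypothetical stand-ins; the real system looks up plans in its catalog rather than receiving them as an argument.

```python
from dataclasses import dataclass, field


@dataclass
class QueryNetwork:
    capacity: float                               # processing capacity C
    drops: list = field(default_factory=list)     # drops currently inserted


def shed_step(network: QueryNetwork, load: float, plan: list) -> str:
    """Apply one evaluation step and report which action was taken."""
    if load > network.capacity:
        network.drops = list(plan)    # insert drops per the looked-up plan
        return "insert"
    if load < network.capacity and network.drops:
        network.drops = []            # load has subsided: remove the drops
        return "remove"
    return "none"


net = QueryNetwork(capacity=100.0)
first = shed_step(net, load=150.0, plan=["drop 20% at location 1"])
drops_after_insert = list(net.drops)
second = shed_step(net, load=80.0, plan=[])
third = shed_step(net, load=80.0, plan=[])
```

This sketch removes all drops at once when load falls below capacity; a finer-grained version would step back through smaller plans instead.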

Load Evaluation
- Load coefficient for each input stream $R_i$, feeding a chain of $n$ operators with costs $Cost_1, \dots, Cost_n$ and selectivities $Sel_1, \dots, Sel_n$:
  $L_i = \sum_{j=1}^{n} \left( \prod_{k=1}^{j-1} Sel_k \right) Cost_j$ (CPU cycles per tuple); can be pre-computed.
- Total load at run time:
  $\sum_{i=1}^{m} L_i R_i$ (CPU cycles per time unit)
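As a worked example of the two formulas, the following sketch computes a load coefficient for a linear chain of operators and then the total load for a set of inputs. The operator costs, selectivities and rates are made-up numbers, and the helper names are assumptions.

```python
def load_coefficient(operators):
    """Cycles per input tuple for a chain of (cost, selectivity) operators.

    Operator j only sees the fraction of tuples that survived operators
    1..j-1, so its cost is weighted by the product of their selectivities.
    """
    total = 0.0
    surviving = 1.0
    for cost, selectivity in operators:
        total += surviving * cost
        surviving *= selectivity
    return total


def total_load(coefficients, rates):
    """Cycles per time unit: sum of L_i * R_i over all input streams."""
    return sum(l * r for l, r in zip(coefficients, rates))


# Hypothetical chain: costs 10, 20, 5 with selectivities 0.5, 0.8, 1.0.
chain = [(10.0, 0.5), (20.0, 0.8), (5.0, 1.0)]
l1 = load_coefficient(chain)          # 10 + 0.5*20 + 0.5*0.8*5
load = total_load([l1, 10.0], [100.0, 50.0])
```

Because the coefficients depend only on operator costs and selectivities, they can be computed ahead of time; only the rates need to be measured at run time.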

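Load Shedding Road Map (LSRM)
- Each LSRM entry holds: excess load, drop insertion plan, QoS cursors.
- Entries are ordered by the excess load they shed; the figure shows entries from 10% and 20% up to 300%.
- A cursor marks the active entry: it moves toward smaller entries to shed less and toward larger entries to shed more.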
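A lookup over such a road map might be sketched as follows. The `LsrmEntry` fields, the plan representation, and the rule "pick the first entry that sheds at least the measured excess" are assumptions for illustration; the slide does not spell out the lookup rule.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class LsrmEntry:
    excess_load: float   # percent of excess load this entry can shed
    plan: tuple          # hypothetical (location, drop percent) pairs


def find_entry(lsrm, excess):
    """Index of the first entry shedding at least `excess`; last entry if none does."""
    keys = [entry.excess_load for entry in lsrm]
    return min(bisect_left(keys, excess), len(lsrm) - 1)


# Entries must be sorted by excess_load for the binary search to be valid.
lsrm = [
    LsrmEntry(10.0, (("loc1", 5.0),)),
    LsrmEntry(20.0, (("loc1", 12.0),)),
    LsrmEntry(300.0, (("loc1", 90.0), ("loc2", 80.0))),
]
cursor = find_entry(lsrm, 15.0)
```

Keeping the entries sorted makes "shed more" and "shed less" simple cursor moves to the next or previous entry.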

Constructing the LSRM
1. Identify potential drop locations.
2. Sort the drop locations.
3. Apply the drop locations in order:
   - insert one unit of drop
   - if it is a semantic drop, determine the filter predicate
   - create an LSRM entry

Drop Locations
[Figure: query network with candidate drop locations numbered 1 to 8 and a loss-tolerance QoS graph (utility vs. % delivery) at each of three outputs.]
- Early drops save more cycles.
- Drops before splits cause more utility loss.

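Best Drop Location
Goal: maximize cycle gain, minimize utility loss.
[Figure: a stream with rate R passes through a Drop of x% (costing D cycles/tuple) placed ahead of a subnetwork with load coefficient L.]
- Cycle gain: $G(x) = R\,(x L - D)$ if $x > 0$, and $G(x) = 0$ otherwise
- Utility loss: $U(x)$, read from the loss-tolerance QoS graph [Figure: utility ticks 1 and 0.7; % delivery axis 100, 50, 0]
- Loss/Gain ratio: $\dfrac{-\,dU(x)/dx}{dG(x)/dx} = \dfrac{\text{negative slope of } U(x)}{R\,L}$; the smaller, the better.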
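The ranking of drop locations by loss/gain ratio can be sketched as below. Here x is treated as a drop fraction in [0, 1], and the candidate locations, their rates, load coefficients and utility slopes are invented for illustration.

```python
def cycle_gain(x, rate, load_coeff, drop_cost):
    """G(x) = R * (x * L - D) for x > 0, else 0."""
    return rate * (x * load_coeff - drop_cost) if x > 0 else 0.0


def loss_gain_ratio(utility_slope, rate, load_coeff):
    """Utility lost per cycle gained: (-dU/dx) / (R * L).

    `utility_slope` is the magnitude of the QoS graph's slope, i.e. the
    utility lost per unit of drop fraction.
    """
    return utility_slope / (rate * load_coeff)


# Hypothetical candidates: name -> (utility slope, rate R, load coefficient L).
candidates = {
    "A": (0.6, 100.0, 20.0),
    "B": (0.6, 50.0, 10.0),
}
ratios = {name: loss_gain_ratio(*args) for name, args in candidates.items()}
best = min(ratios, key=ratios.get)   # smallest ratio wins
gain_at_a = cycle_gain(0.1, rate=100.0, load_coeff=20.0, drop_cost=1.0)
```

With equal utility slopes, the location with the larger R × L wins because each dropped tuple saves more downstream work there.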

From Values to % Delivery
- "When, where, how much?" have the same answer for random and semantic load shedding.
- Trick: translate between QoS functions.
- Assumption: drop the lowest-utility values first.
[Figure: three graphs. Value-based QoS (utility ticks 1.0 and 0.2 over values 0, 50, 100); relative frequency of values (ticks 0.6 and 0.4 over values 0, 50, 100); derived loss-tolerance QoS (utility ticks 1.0 and 0.88 over % delivery 100, 60, 0).]
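The translation can be sketched numerically. The example assumes two value intervals, one with utility 0.2 and relative frequency 0.4 and one with utility 1.0 and relative frequency 0.6; this pairing is a reading of the figure's tick labels, not something the slide states outright. Each interval's share of total utility is its utility times its frequency, normalized over all intervals.

```python
def to_loss_tolerance(intervals):
    """Convert (utility, frequency) intervals to (% delivery, utility) points.

    Intervals are dropped in order of increasing utility, so each point gives
    the utility that remains after the cheapest intervals have been shed.
    """
    total = sum(u * f for u, f in intervals)
    delivered, utility = 1.0, 1.0
    points = [(100.0, 1.0)]
    for u, f in sorted(intervals):
        delivered -= f
        utility -= u * f / total
        points.append((round(100.0 * delivered, 6), round(utility, 6)))
    return points


# Assumed reading of the figure: (utility, relative frequency) per interval.
points = to_loss_tolerance([(0.2, 0.4), (1.0, 0.6)])
```

Under these assumptions, dropping the low-utility interval removes 40% of the tuples but only about 12% of the utility, which agrees with the 0.88 tick at 60% delivery in the figure.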

Determining the Filter Predicate
[Figure: the same three graphs as on the previous slide, with the loss-tolerance QoS axis marked at 100, 80, 60, 0 % delivery.]
- Drop 20%: take the tuples from the lowest-utility value interval
- Drop values < 25
- Filter predicate: value ≥ 25
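The step from "drop 20%" to "drop values < 25" can be reproduced with a short sketch. It assumes values are spread uniformly within each interval, that the lowest-utility interval covers values 0 to 50 with relative frequency 0.4, and that the remaining values 50 to 100 have frequency 0.6; the function name and the tuple layout are hypothetical.

```python
def dropped_ranges(intervals, drop_fraction):
    """Value ranges to discard, taking the lowest-utility intervals first.

    intervals: list of (low, high, utility, frequency) tuples.
    Assumes values are uniformly distributed inside each interval.
    """
    remaining = drop_fraction
    ranges = []
    for low, high, _utility, freq in sorted(intervals, key=lambda iv: iv[2]):
        if remaining <= 1e-12:
            break
        if freq <= 0:
            continue
        share = min(1.0, remaining / freq)   # portion of this interval to drop
        ranges.append((low, low + share * (high - low)))
        remaining -= share * freq
    return ranges


# Assumed reading of the figure: (low, high, utility, relative frequency).
intervals = [(0, 50, 0.2, 0.4), (50, 100, 1.0, 0.6)]
ranges_20 = dropped_ranges(intervals, 0.20)
```

Dropping 20% of all tuples uses up half of the 40% that fall in the low-utility interval, so the dropped range is 0 to 25 and the semantic drop becomes a filter that keeps value ≥ 25.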


Experimental Study
Setup:
- Streams: uniformly distributed integer values
- Networks: a mix of filters and unions
- QoS: value-based QoS, with range utilities chosen from a Zipf distribution
Metrics: percent loss in total average utility on
- loss-tolerance QoS (Tuple Utility Loss)
- value-based QoS (Value Utility Loss)

Random-LS beats Admission Control
[Figure: plot comparing Input-Uniform and Random-LS; y-axis from 0 to 70, x-axis from 0.1 to 0.5, annotated with Load ~20% and Load ~300%.]

Semantic-LS beats Admission Control
[Figure: plot comparing Input-Uniform and Semantic-LS; y-axis from 0 to 80, x-axis from 0.1 to 0.5, annotated with Load ~20% and Load ~300%.]

Semantic-LS beats Random-LS
[Figure: plot with series r=0.15, r=0.20, r=0.25; y-axis from 0 to 25, x-axis ticks 0, 0.2, 0.5, 0.8, 0.99.]

Performance for Networks with Sharing
[Figure: plot with series theta=0.99, theta=0.5, theta=0; y-axis from 0 to 9, x-axis from 10 to 100.]


Relevance to Existing Work
- Congestion control in networks
- Multimedia streaming
- Approximate query answering
- Data stream processing:
  - sampling, shedding on aggregates: STREAM [MW+03, BDM03]
  - approximate join processing [DGR03]
  - adjusting rates to windowed joins [KNV03]


Highlights
- An end-to-end solution for detecting and resolving overload in data stream systems
- Utility loss minimization for networks of queries that share resources and processing
- Semantic utility as well as quantity-based utility
- Static drop insertion plans, dynamic instantiation

Future Directions
- Handling complex operators: joins and aggregates
- Other resource limitations:
  - memory: windowed operators
  - bandwidth: Aurora*
  - power: at sensor level

More Information
- Aurora Web Page: http://www.cs.brown.edu/research/aurora
- Email: tatbul@cs.brown.edu
- Demo