Big Data. Donald Kossmann & Nesime Tatbul Systems Group ETH Zurich

Size: px
Start display at page:

Download "Big Data. Donald Kossmann & Nesime Tatbul Systems Group ETH Zurich"

Transcription

1 Big Data Donald Kossmann & Nesime Tatbul Systems Group ETH Zurich

2 Data Stream Processing

3 Overview Introduction Models and languages Query processing systems Distributed stream processing Real-time analytics and stream warehousing 3

4 Data Streams Continuous sequences of data elements that are typically: Push-based (like in publish/subscribe systems) Ordered (e.g., by arrival time, or by explicit timestamps) Rapid (e.g., ~ millions of messages/sec in market data) Potentially unbounded (may have no (known) end) Time-sensitive (real-time events, latency-critical) Time-varying (in content and speed) Unpredictable (autonomous data sources) 4

5 Example Applications Financial Services Example: Trades(time, symbol, price, volume) Typical Applications: Algorithmic Trading Foreign Exchange Fraud Detection Compliance Checking 5

6 Example Applications System and Network Monitoring Example: Connections(time, srcip, destip, destport, status) Typical Applications: Server load monitoring Network traffic monitoring Detecting security attacks Denial of Service Intrusion 6

7 Example Applications Sensor-based Monitoring Example: CarPositions(time, id, speed, position) Typical Applications: Monitoring congested roads Route planning Rule violations Tolling 7

8 Historical Background 1990s: Various extensions to traditional database systems Triggers in Active DB s, Sequence DB s, Continuous Queries, Pub/Sub, etc. Early 2000s: Data Stream Management Systems (DSMSs) Aurora [Brandeis-Brown-MIT] STREAM [Stanford] TelegraphCQ [UC Berkeley] Many other research systems (NiagaraCQ, Gigascope, Nile, PIPES, ) After 2003: Start-ups Aurora -> StreamBase, Inc. -> Borealis (= distributed Aurora) Coral8, Inc. [Stanford] TelegraphCQ -> Truviso, Inc. Today: Active industry interest, new systems Acquisitions: Coral8 -> Aleri -> Sybase -> SAP, Truviso -> Cisco From DB vendors: IBM InfoSphere Streams, Microsoft StreamInsight, Oracle CEP (Market players as of Dec 2011: Open-source: ESPER, Yahoo! S4, Twitter Storm Streams Big Data (the velocity dimension) 8

9 A Paradigm Shift in Data Processing Model Query DBMS Answer Data DSMS Answer Data Base Query Base Traditional Data Management Data Stream Management 9

10 DBMS vs. DSMS Persistent relations Read-intensive One-time queries Random access Access plan determined by query processor and physical DB design Transient streams Update-intensive (mostly append-only) Continuous queries (a.k.a., long-running, standing, or persistent queries) Sequential access Unpredictable data characteristics and arrival patterns 10

11 Models & Languages

12 Model Issues Data models Relational-based vs. XML-based Time and Order Query models Declarative vs. Procedural Window-based Processing Integrity constraints and punctuations Views and negative tuples Pattern matching queries 12

13 Example Models STREAM / CQL Relational-based data model Declarative query language (SQL extensions) Aurora / SQuAl Relational-based data model Procedural query language (Relational algebra extensions) MXQuery [ETH Zurich] XML-based data model Declarative query language (XQuery extensions) 13

14 Window-based Processing Windows are finite excerpts of a potentially unbounded stream. Most streaming applications are interested in the readings of the recent past. Windows help us unblock operators such as aggregates. Windows help us bound the memory usage for operators such as joins. 14

15 Window Example Two basic parameters: size and slide Example: Trades(time, symbol, price, volume) size = 10 min slide by 5 min (10:00, IBM, 20, 100) (10:00, INTC, 15, 200) (10:00, MSFT, 22, 100) (10:05, IBM, 18, 300) (10:05, MSFT, 21, 100) (10:10, IBM, 18, 200) (10:10, MSFT, 20, 100) (10:15, IBM, 20, 100) (10:15, INTC, 20, 200) (10:15, MSFT, 20, 200).. 15

16 Windows: Unblocking Aggregate Operation Average Problem: No results can be produced until the stream ends. Average is blocked Average size = 3 slide = Solution: Average can be computed on sliding windows. Average is unblocked. 16

17 Windows: Bounding Join State Join.. (10, 10) (30, 30) Problem: Join must buffer its inputs until both streams end. Join state is unbounded Join size = 2.. (10, 10) (30, 30) Solution: Join must only buffer the latest window on its inputs. Join state is bounded. 17

18 Common Window Types Sliding window A window that slides (i.e., both of its end-points move) as new stream tuples arrive. Tumbling window A sliding window for which window size = window slide (i.e., consecutive windows do not overlap). Landmark window A window which is moving only on one of its endpoints (usually the forward end-point). 18

19 Time-based window Common Window Types A window whose size and content is determined by tuples that arrived within a time period. Note: The actual size (i.e., in terms of tuple count) of such a window may vary. Tuple-based window (a.k.a., count-based/row-based window) A window whose size and content is determined by the number of tuples arrived. Note: The actual size is always fixed. Semantic window (a.k.a., predicate-based window) A window whose size and content is determined by the tuple contents. Note: Time-based window can be seen as a simple form of semantic window where the time field carried in the tuple is used for windowing. 19

20 STREAM CQL: Continuous Query Language SQL for Relation-to-Relation operations Additionally: Stream as a new data type (in addition to Relation ) Continuous instead of one-time query semantics Stream-to-Relation operations: Window specifications derived from SQL-99 Relation-to-Stream operations: Three special operators: Istream, Dstream, Rstream Simple sampling operations on streams No Stream-to-Stream operations 20

21 CQL: Streams vs. Relations T: discrete, ordered time domain A stream S is a possibly infinite bag of elements <s, t>, where s is a tuple with the schema of S and t є T is the timestamp of the element. Note: Timestamp is not part of the tuple schema. A relation R is a mapping from each time instant in T to a finite but unbounded bag of tuples with the schema of R. 21

22 CQL: Continuous Query Semantics Time advances from t-1 to t, when all inputs up to t-1 have been processed. For a query producing a stream: At time t є T, all inputs up to t are processed and the continuous query emits any new stream result elements with timestamp t. For a query producing a relation: At time t є T, all inputs up to t are processed and the continuous query updates the output relation to state R(t). 22

23 CQL: Mappings between Streams and Relations Stream-to-Relation Streams Relations Relation-to-Relation Relation-to-Stream Stream-to-Stream = Stream-to-Relation + Relation-to-Stream 23

24 CQL: Stream-to-Relation Operators Time-based sliding windows FROM S[RANGE T] Tuple-based sliding windows FROM S[ROWS N] Partitioned windows FROM S[PARTITION BY A 1,, A k RANGE T] FROM S[PARTITION BY A 1,, A k ROWS N] Windows with a slide parameter FROM S[RANGE T SLIDE L] FROM S[ROWS N SLIDE L] FROM S[PARTITION BY A 1,, A k RANGE T SLIDE L] FROM S[PARTITION BY A 1,, A k ROWS N SLIDE L] 24

25 CQL: Relation-to-Stream Operators Insert stream Istream( R) = (( R( t) R( t 1)) {}) t Delete stream t 0 Dstream( R) = (( R( t 1) R( t)) {}) t Relation stream t> 0 Rstream( R) = ( R( t) {}) t t 0 SELECT Istream(..), SELECT Dstream(..), SELECT Rstream(..) 25

26 CQL: Example Queries Trades (time, symbol, price, volume) NYSE_Trades (time, symbol, price, volume) SWX_Trades (time, symbol, price, volume) Streaming Filter SELECT Istream(*) FROM Trades[RANGE Unbounded] WHERE price > 20 Streaming Aggregation SELECT Istream(Count(*)) FROM Trades[PARTITION BY symbol RANGE 10 Minutes SLIDE 1 Minute] Sliding-window Join SELECT Istream(*) FROM NYSE_Trades[RANGE 10 Minutes], SWX_Trades[RANGE 10 Minutes] WHERE NYSE_Trades.symbol = SWX_Trades.symbol 26

27 CQL: Example Query Execution Stream: S(A) Query: SELECT Istream(*) FROM S[ROWS 1] WHERE <Filter> Operations: LastRow: S-to-R Filter: R-to-R Istream: R-to-S Assumption: (a 0 ), (a 2 ), (a 4 ) satisfy the filter. 27

28 Aurora SQuAl: Stream Query Algebra A stream is an append-only sequence of tuples with a uniform schema. The system stamps each tuple with its time of arrival (which can be used for keeping latency statistics or time-based windowing). Disorder is allowed (but regarded as an irregularity). Queries are represented with data-flow diagrams consisting of operators. Order-agnostic operators: Filter, Map, Union Order-sensitive operators: BSort, Aggregate, Join, Resample All operators are Stream-to-Stream (i.e., SQuAl is closed under streams like relational algebra is closed under relations). Table operators (WriteSQL, ReadSQL) added in later versions. 28

29 SQuAl: Operators Filter applies a predicate on each stream tuple. Map applies a function on each stream tuple. (* extensibility) e.g., projection Union merges two or more streams into one. order-preserving version also exists. BSort is a buffer-based approximate sort. equivalent to n-pass bubble sort Aggregate applies aggregate functions to sliding windows over its input. (* extensibility) Join applies a predicate to pairs of tuples from two input streams that are within a certain window distance from each other. Resample applies an interpolation function on a stream to align it with another stream. 29

30 SQuAl: Example Query Filter Aggregate Filter Aggregate Filter symbol= IBM size = 5 min slide = 5 min diff = high-low diff > 5 size = 60 min slide = 60 min count count > 0 User-Defined Function (UDF) (provides extensibility) Boxes and arrows data-flow diagram instead of a declarative specification Same query could also be written in STREAM CQL as a nested query Windows are defined as part of other operators (e.g., Aggregate) 30

31 SQuAl: Slack & Timeout Parameters Slack is a stream parameter to specify the degree of disorder in that stream. Out of order tuples beyond the slack parameter are simply discarded. Timeout is a parameter for sliding window operators to specify the maximum time period that a window is allowed to remain open. Delayed tuples beyond the timeout parameter are simply discarded. 31

32 Summary: CQL vs. SQuAl CQL SQuAl Closed under Streams, Relations Streams Stream-to-Stream ops None, but many queries assume use of Istream Default Stream-to-Relation ops Windows None. Windows are internal. WriteSQL enables relation access as a side-effect. Relation-to-Stream ops Istream, Dstream, Rstream None. ReadSQL enables relation access as a side effect. Windowing support Window types Stream-Relation correlations First-class (can be named, shared, queried) Tuple-based, Time-based, Sliding, Tumbling No, but Relation-Relation correlation using a NOW window has same effect Stream-Stream correlations Must window both streams Join Not first-class (internal to the query operators) Tuple-based, Time-/Valuebased, Sliding, Tumbling ReadSQL 32

33 The SECRET Model Problem: execution model heterogeneity Currently, there is no standard model for defining and executing stream windows. Each Stream Processing Engine (SPE) has its own execution semantics which leads to unexpected differences in query results. SECRET: a descriptive model for analyzing the window execution semantics of SPEs [ETH Zurich]. Data Version 1 [VLDB 10] in-order streams (gaps, simultaneity) Version 2 [VLDBJ, to appear] Queries time-based windows + tuple-based windows Systems Coral8, STREAM, StreamBase + Oracle CEP 33

34 Example #1: StreamBase vs. Coral8 Input Stream: InStream(Time, Val) {(30, 10), (31, 20), (36, 30), } Continuous Query: Compute average value of the tuples for the last 5 seconds. Result Stream: OutStream(Val) StreamBase: {(10), (15), (15), (15), (15), (20), (30),...} Coral8 : {(10), (15), (20), (30),...} 34

35 SECRET Model Overview What affects query results produced by an SPE? Query Result = F(system, query, input) Tick REport ScopE t 0 window ScopE Content Tick REport Content Tick -> Report -> Content -> Scope 35

36 Time & Order in SECRET Two notions of time: t app : application time of a tuple (t in SECRET) t sys : system time (arrival time) of a tuple (τ in SECRET) Order assumptions: Tuples are partially ordered by their t app values. Tuples are totally ordered by their t sys values. Batch-id (bid) defines a further ordering among simultaneous tuples (i.e., tuples having the same t app ) [Jain et al.]. Tuples are partially ordered by their bid values. 36

37 ScopE in SECRET Scope is based on window specification (size (ω) and slide (β)) start of the first window (t 0 ) Scope(t) defines the interval of active window at application time t. Active window at time t is the open window with the earliest start time. W 1 W 2 t 0 t 0 +β t 0 +ω t 0 +2β t 0 +β+ω t 0 +2β+ω t app W 1 W 2 W 3 37

38 Content in SECRET Content(t, τ) specifies the set of input tuples that are in Scope(t) as of system time τ. It can return different values at t, depending on the arrival of tuples (due to τ). 38

39 REport in SECRET Report(t, τ) defines the conditions under which the window contents become visible for further query evaluation and result reporting. It can take a logical combination of the following: content change window close non-empty content periodic 39

40 Example #1 with SECRET Input: InStream(Time, Val) = {(30, 10), (31, 20), (36, 30), } Query: Compute average value of the tuples for the last 5 sec. Result: OutStream(Val) StreamBase (window close&non-empty) = {(10), (15), (15), (15), (15), (20), (30),...} Coral8 (content change&non-empty) = {(10), (15), (20),(30),...} t app

41 Tick in SECRET Tick(τ) defines the condition which drives an SPE to take action on its input. It can be based on one of the following: tuple-driven: react to individual tuples time-driven: react to all tuples with the same t app value batch-driven: react to subsets of tuples with the same t app value [Jain et al.] Note: tuple-driven and time-driven are in fact special cases of batch-driven. 41

42 Example #2: Coral8 - Difference in Tick Input: InStream(Time, Val) = {(3, 10), (5, 20), (5, 30), (5, 40), (5, 50),...} Query: Compute sum of the values for the last 4 seconds. Result: OutStream(Val) Coral8 (tuple-driven) = {(10), (30), (60), (100), (150),...} Coral8 (time-driven) = {(10), (150),...} t app

43 Extending SECRET From Time-based to Tuple-based Windowing domain is different. tid: tuple-id of a tuple (i in SECRET) Tuples are totally ordered by their tid values. Windows (size and slide) are defined in terms of tid s instead of t app s. Scope and Content: t -> i and t 0 -> i 0 t is not unique -> i is unique Tick: maps from t sys to t app -> maps from t sys to tid no gaps nor a notion of simultaneity in the tid domain => simpler to formulate Report: Content-change and non-empty-content always true => simpler to formulate Tick may invoke Report with multiple tid values (when: time- or batch-driven tick + simultaneous input) => One of those that satisfy the Report condition is chosen ( evaporating tuples [Jain et al.]) 43

44 Example #3: Tuple-based Windows Input: InStream(Time, Val) = {(1, 10), (2, 20), (2, 30), (2, 40), (2, 50), (4, 60),...} Query: Compute sum of the values for a tuple-based tumbling window of size 2. Result for an SPE S(Tick=time-driven, Report=window-close): OutStream(Val) = {(70), (110), } (1,10) (2,20) (2,30) (2,40) (2,50) (4,60) t sys tid

45 SPEs in SECRET Coral8 Oracle CEP STREAM StreamBase SPE t 0, i 0 Report Tick t t1 ω β 1, 0 β tt1 β ω, β ω β tt1 ωβ, ω t t1 ω β 1, 0 β content change window close non-empty content periodic content change window close non-empty content periodic content change window close non-empty content periodic content change window close non-empty content periodic tuple-driven time-driven batch-driven tuple-driven time-driven batch-driven tuple-driven time-driven batch-driven tuple-driven time-driven batch-driven 45

46 Query Processing Systems

47 Systems Issues In principle, similar concerns as in traditional databases (e.g., storage management, query processing and optimization) Different architectural requirements and opportunities: Latency more important than throughput Continuous query processing driven by data arrival (pushbased) On-the-fly processing in memory, queue-based storage Limited resources (CPU, memory) Multi-query optimization (sharing state and computation) Approximate query answering (load shedding) Adaptive query processing 47

48 Main Architectural Components 48

49 Example Systems STREAM Physical query plan structure (operators, queues, synopses) Selected optimizations (synopsis sharing, chain scheduling) Aurora System and QoS model Selected optimizations (batch-based scheduling, load shedding) 49

50 STREAM Query Plans Query in CQL -> Physical query plan tree SELECT * FROM S1 [ROWS 1000], S2 [RANGE 2 MINUTES] WHERE S1.A = S2.A AND S1.A > 10 operators for processing queues for buffering tuples synopses for storing operator state 50

51 STREAM Operators 51

52 STREAM Queues Queues encapsulate the typical producerconsumer relationship between the operators. They act as in-memory buffers. They enforce that tuple timestamps are nondecreasing. Why is this necessary? Heartbeat mechanism for time management 52

53 STREAM Heartbeats in a Nutshell Problem: Out of order data arrival Unsynchronized application clocks at the sources Different network latencies from different sources to the DSMS Data transmission over a non-orderpreserving channel Solution: Order tuples at the input manager by generating heartbeats based on applicationspecified parameters Heartbeat value T at a given time instant means that all tuples after that instant will have a timestamp greater than T. 53

54 STREAM Synopses A synopsis stores the internal state of an operator needed for its evaluation. Example: A windowed join maintains a hash table for each of its inputs as a synopsis. Do we need synopses for all types of operators? Like queues, synopses are also kept in memory. Synopses can also be used in more advanced ways: shared among multiple operators (for space optimization) store summary of stream tuples (for approximate processing) 54

55 Synopsis Sharing in STREAM Replace identical synopses with stubs and store the actual tuples in a single store. Also for multiple query plans. SELECT * FROM S1 [ROWS 1000], S2 [RANGE 2 MINUTES] WHERE S1.A = S2.A AND S1.A > 10 SELECT A, MAX(B) FROM S1 [ROWS 200] GROUP By A 55

56 Operator Scheduling in STREAM A global scheduler decides on the order of operator execution. Changing the execution order of the operators does not affect their semantic correctness (operator semantics does not depend on wallclock time), but may affect system s resource utilization or latencies of queries. Main focus in STREAM: Memory utilization (reducing total queue size). 56

57 Operator Scheduling in STREAM Example #1 Example Query Plan: Input Arrival Pattern: n 0.2n 0 OP1 OP2 cost = 1/n sel = 0.2 cost = 1/0.2n sel = 0 t t t t t t t Total Queue Sizes for two alternative scheduling policies: t time Process input in batches of n. Greedy prioritizes OP1 < OP2, since has higher rate of reduction in total queue size per unit time. FIFO schedules OP1-OP2 in sequence. Greedy uses less memory. 57

58 Operator Scheduling in STREAM Example #2 Example Query Plan: Input Arrival Pattern: n 0.9n 0.9n OP1 OP2 cost = 1/n sel = 0.9 cost = 1/0.9n sel = 1 OP3 cost = 1/0.9n sel = 0 t t t t t t t t Total Queue Sizes for two alternative scheduling policies: Process input in batches of n. Greedy prioritizes OP3 < OP1 < OP2, again based on the previous criteria. FIFO schedules OP1-OP2-OP3 in sequence. FIFO uses less memory. 58 time

59 Operator Scheduling in STREAM Chain Scheduling Greedy scheduling, based on prioritizing blocks of operators ( chains ) according to their cumulative rate of reduction in total queue size, instead of individual operators. Apply FIFO scheduling within each chain. This strategy is proven to be close to optimal. In Example #2: Consider OP1-OP2-OP3 as one chain. 59

60 Aurora System Model 60

61 Aurora Quality of Service (QoS) Latency QoS utility 1.0 Loss-tolerance QoS utility δ Value-based QoS utility latency % delivery values 61

62 Aurora Architecture 62

63 Operator Scheduling Goal: To allocate the CPU among multiple queries with multiple operators so as to optimize a metric, such as: minimize total average latency maximize total average latency QoS utility maximize total average throughput minimize total memory consumption Deciding which operator should run next, for how long or with how much input. Must be low overhead. 63

64 Batching Exploit inter-box and intra-box non-linearities in execution overhead Train scheduling batching and executing multiple tuples together Superbox scheduling batching and executing multiple boxes together 64

65 Batching reduces execution costs 65

66 The Overload Problem If Load > Capacity during the spikes, then queues form and latency proliferates. Given a query network N, a set of input streams I, and a CPU with processing capacity C; when Load(N(I)) > C, transform N into N such that: Load(N (I)) < C, and Utility(N(I)) Utility(N (I)) is minimized. 66

67 Load Shedding in Aurora Aurora Query Network.. Problem: When load > capacity, latency QoS degrades. Solution: Insert drop operators into the query plan. Result: Deliver approximate answers with low latency. 67

68 Load Shedding in Aurora Aurora Query Network. Key questions: when to shed load? where to shed load? how much load to shed? which tuples to drop? Problem: When load > capacity, latency QoS degrades. Solution: Insert drop operators into the query plan. Result: Deliver approximate answers with low latency.. 68

69 The Drop Operator is an abstraction for load reduction can be added, removed, updated, moved reduces load by a factor produces a subset of its input picks its victims probabilistically semantically (i.e., based on tuple content) 69

70 Load coefficients When to Shed Load? R cost 1 cost 2 i sel 1 sel 2 cost n sel n n j 1 L = sel cost Total load i k j j= 1 k = 1 (CPU cycles per tuple) m i= 1 L i R i (CPU cycles per time unit) 70

71 Aurora Load Shedding Three Basic Principles 1. Minimize run-time overhead. 2. Minimize loss in query answer accuracy. 3. Deliver subset results. 71

72 Principle 1: Plan in advance. Excess Load Drop Insertion Plan QoS Cursors 10% shed less! cursor 20% shed more! 300% 72

73 Principle 2: Minimize error. utility 2 % delivery 1 utility 3 % delivery Early drops save more processing cycles. Drops before sharing points can cause more accuracy loss. We rank possible drop locations by their loss/gain ratios. 73

74 Principle 3: Keep sliding windows intact. Two parameters: size and slide Example: Trades(time, symbol, price, volume) size = 10 min slide by 5 min (10:00, IBM, 20, 100) (10:00, INTC, 15, 200) (10:00, MSFT, 22, 100) (10:05, IBM, 18, 300) (10:05, MSFT, 21, 100) (10:10, IBM, 18, 200) (10:10, MSFT, 20, 100) (10:15, IBM, 20, 100) (10:15, INTC, 20, 200) (10:15, MSFT, 20, 200).. 74

75 Windowed Aggregation Apply an aggregate function on the window Average, Sum, Count, Min, Max User-defined Can be nested Example: Filter Aggregate Filter Aggregate Filter symbol= IBM ω = 5 min diff > 5 δ = 5 min diff = high-low ω = 60 min δ = 60 min count count > 0 75

76 Dropping from an Aggregation Query Tuple-based Approach Average ω = 3 δ = Drop before : non-subset result of nearly the same size Drop p = Average ω = 3 δ = Drop after : subset result of smaller size Average Drop ω = 3 p = 0.5 δ = 3 76

77 Dropping from an Aggregation Query Window-based Approach Drop before : subset result of smaller size Window Drop ω = 3, δ = 3 p = Average ω = 3 δ = Window-aware load shedding works with any aggregate function delivers correct results keeps error propagation under control can handle nesting can drop load early 77

78 References Aurora: A New Model and Architecture for Data Stream Management, D. Abadi et al., VLDB Journal, 12(2), The CQL Continuous Query Language: Semantic Foundations and Query Execution, A. Arasu, S. Babu, J. Widom, VLDB Journal, 15(2), Stanford Data Stream Management System, A. Arasu et al., book chapter, Extending XQuery with Window Functions, I. Botan et al., VLDB Stream-Oriented Query Languages and Operators, M. Cherniack, S. Zdonik, Encyclopedia of Database Systems, Springer, Data Stream Management, L. Golab, T. Ozsu, Morgan & Claypool Publishers, Towards a Streaming SQL Standard, N. Jain et al., VLDB The SECRET Model, 78

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2012/13 Data Stream Processing Topics Model Issues System Issues Distributed Processing Web-Scale Streaming 3 Data Streams Continuous

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2012/13 Data Stream Processing Topics Model Issues System Issues Distributed Processing Web-Scale Streaming 3 System Issues Architecture

More information

Streaming Data Integration: Challenges and Opportunities. Nesime Tatbul

Streaming Data Integration: Challenges and Opportunities. Nesime Tatbul Streaming Data Integration: Challenges and Opportunities Nesime Tatbul Talk Outline Integrated data stream processing An example project: MaxStream Architecture Query model Conclusions ICDE NTII Workshop,

More information

Load Shedding in a Data Stream Manager

Load Shedding in a Data Stream Manager Load Shedding in a Data Stream Manager Nesime Tatbul, Uur U Çetintemel, Stan Zdonik Brown University Mitch Cherniack Brandeis University Michael Stonebraker M.I.T. The Overload Problem Push-based data

More information

DATA STREAMS AND DATABASES. CS121: Introduction to Relational Database Systems Fall 2016 Lecture 26

DATA STREAMS AND DATABASES. CS121: Introduction to Relational Database Systems Fall 2016 Lecture 26 DATA STREAMS AND DATABASES CS121: Introduction to Relational Database Systems Fall 2016 Lecture 26 Static and Dynamic Data Sets 2 So far, have discussed relatively static databases Data may change slowly

More information

UDP Packet Monitoring with Stanford Data Stream Manager

UDP Packet Monitoring with Stanford Data Stream Manager UDP Packet Monitoring with Stanford Data Stream Manager Nadeem Akhtar #1, Faridul Haque Siddiqui #2 # Department of Computer Engineering, Aligarh Muslim University Aligarh, India 1 nadeemalakhtar@gmail.com

More information

1. General. 2. Stream. 3. Aurora. 4. Conclusion

1. General. 2. Stream. 3. Aurora. 4. Conclusion 1. General 2. Stream 3. Aurora 4. Conclusion 1. Motivation Applications 2. Definition of Data Streams 3. Data Base Management System (DBMS) vs. Data Stream Management System(DSMS) 4. Stream Projects interpreting

More information

Data Streams. Building a Data Stream Management System. DBMS versus DSMS. The (Simplified) Big Picture. (Simplified) Network Monitoring

Data Streams. Building a Data Stream Management System. DBMS versus DSMS. The (Simplified) Big Picture. (Simplified) Network Monitoring Building a Data Stream Management System Prof. Jennifer Widom Joint project with Prof. Rajeev Motwani and a team of graduate students http://www-db.stanford.edu/stream stanfordstreamdatamanager Data Streams

More information

DSMS Benchmarking. Morten Lindeberg University of Oslo

DSMS Benchmarking. Morten Lindeberg University of Oslo DSMS Benchmarking Morten Lindeberg University of Oslo Agenda Introduction DSMS Recap General Requirements Metrics Example: Linear Road Example: StreamBench 30. Sep. 2009 INF5100 - Morten Lindeberg 2 Introduction

More information

Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka

Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka What problem does Kafka solve? Provides a way to deliver updates about changes in state from one service to another

More information

Query Processing over Data Streams. Formula for a Database Research Project. Following the Formula

Query Processing over Data Streams. Formula for a Database Research Project. Following the Formula Query Processing over Data Streams Joint project with Prof. Rajeev Motwani and a group of graduate students stanfordstreamdatamanager Formula for a Database Research Project Pick a simple but fundamental

More information

An Efficient Execution Scheme for Designated Event-based Stream Processing

An Efficient Execution Scheme for Designated Event-based Stream Processing DEIM Forum 2014 D3-2 An Efficient Execution Scheme for Designated Event-based Stream Processing Yan Wang and Hiroyuki Kitagawa Graduate School of Systems and Information Engineering, University of Tsukuba

More information

SCALABLE AND ROBUST STREAM PROCESSING VLADISLAV SHKAPENYUK. A Dissertation submitted to the. Graduate School-New Brunswick

SCALABLE AND ROBUST STREAM PROCESSING VLADISLAV SHKAPENYUK. A Dissertation submitted to the. Graduate School-New Brunswick SCALABLE AND ROBUST STREAM PROCESSING by VLADISLAV SHKAPENYUK A Dissertation submitted to the Graduate School-New Brunswick Rutgers, The State University of New Jersey in partial fulfillment of the requirements

More information

Parallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach

Parallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach Parallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach Tiziano De Matteis, Gabriele Mencagli University of Pisa Italy INTRODUCTION The recent years have

More information

Analytical and Experimental Evaluation of Stream-Based Join

Analytical and Experimental Evaluation of Stream-Based Join Analytical and Experimental Evaluation of Stream-Based Join Henry Kostowski Department of Computer Science, University of Massachusetts - Lowell Lowell, MA 01854 Email: hkostows@cs.uml.edu Kajal T. Claypool

More information

RUNTIME OPTIMIZATION AND LOAD SHEDDING IN MAVSTREAM: DESIGN AND IMPLEMENTATION BALAKUMAR K. KENDAI. Presented to the Faculty of the Graduate School of

RUNTIME OPTIMIZATION AND LOAD SHEDDING IN MAVSTREAM: DESIGN AND IMPLEMENTATION BALAKUMAR K. KENDAI. Presented to the Faculty of the Graduate School of RUNTIME OPTIMIZATION AND LOAD SHEDDING IN MAVSTREAM: DESIGN AND IMPLEMENTATION by BALAKUMAR K. KENDAI Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial

More information

DSMS Part 1, 2 & 3. Data Stream Management. Example: Sensor Networks. Some Sensornet Applications. Data Stream Management Systems

DSMS Part 1, 2 & 3. Data Stream Management. Example: Sensor Networks. Some Sensornet Applications. Data Stream Management Systems Data Stream Management Systems Data Stream Management Systems Vera Goebel Thomas Plagemann University of Oslo 1 DSMS Part 1, 2 & 3 Introduction: What are DSMS? (terms) Why do we need DSMS? (applications)

More information

A Cost Model for Adaptive Resource Management in Data Stream Systems

A Cost Model for Adaptive Resource Management in Data Stream Systems A Cost Model for Adaptive Resource Management in Data Stream Systems Michael Cammert, Jürgen Krämer, Bernhard Seeger, Sonny Vaupel Department of Mathematics and Computer Science University of Marburg 35032

More information

Exploiting Predicate-window Semantics over Data Streams

Exploiting Predicate-window Semantics over Data Streams Exploiting Predicate-window Semantics over Data Streams Thanaa M. Ghanem Walid G. Aref Ahmed K. Elmagarmid Department of Computer Sciences, Purdue University, West Lafayette, IN 47907-1398 {ghanemtm,aref,ake}@cs.purdue.edu

More information

Querying Sliding Windows over On-Line Data Streams

Querying Sliding Windows over On-Line Data Streams Querying Sliding Windows over On-Line Data Streams Lukasz Golab School of Computer Science, University of Waterloo Waterloo, Ontario, Canada N2L 3G1 lgolab@uwaterloo.ca Abstract. A data stream is a real-time,

More information

Linked Stream Data Processing Part I: Basic Concepts & Modeling

Linked Stream Data Processing Part I: Basic Concepts & Modeling Linked Stream Data Processing Part I: Basic Concepts & Modeling Danh Le-Phuoc, Josiane X. Parreira, and Manfred Hauswirth DERI - National University of Ireland, Galway Reasoning Web Summer School 2012

More information

Scheduling Strategies for Processing Continuous Queries Over Streams

Scheduling Strategies for Processing Continuous Queries Over Streams Department of Computer Science and Engineering University of Texas at Arlington Arlington, TX 76019 Scheduling Strategies for Processing Continuous Queries Over Streams Qingchun Jiang, Sharma Chakravarthy

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases Big Data Management and NoSQL Databases Lecture 13. Data Stream Management PD Dr. Andreas Behrend Data Stream A data stream is a sequence of data tuples. Think of standard tuples of relational databases.

More information

Window Specification over Data Streams

Window Specification over Data Streams Window Specification over Data Streams Kostas Patroumpas and Timos Sellis School of Electrical and Computer Engineering National Technical University of Athens, Hellas {kpatro,timos}@dbnet.ece.ntua.gr

More information

Big Data Infrastructures & Technologies

Big Data Infrastructures & Technologies Big Data Infrastructures & Technologies Data streams and low latency processing DATA STREAM BASICS What is a data stream? Large data volume, likely structured, arriving at a very high rate Potentially

More information

Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing

Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing Nesime Tatbul Uğur Çetintemel Stan Zdonik Talk Outline Problem Introduction Approach Overview Advance Planning with an

More information

Imperial College of Science, Technology and Medicine Department of Computing. Distributing Complex Event Detection Kristian A.

Imperial College of Science, Technology and Medicine Department of Computing. Distributing Complex Event Detection Kristian A. Imperial College of Science, Technology and Medicine Department of Computing Distributing Complex Event Detection Kristian A. Nagy MEng Individual project report Supervised by Dr. Peter Pietzuch June 2012

More information

Streaming SQL. Julian Hyde. 9 th XLDB Conference SLAC, Menlo Park, 2016/05/25

Streaming SQL. Julian Hyde. 9 th XLDB Conference SLAC, Menlo Park, 2016/05/25 Streaming SQL Julian Hyde 9 th XLDB Conference SLAC, Menlo Park, 2016/05/25 @julianhyde SQL Query planning Query federation OLAP Streaming Hadoop Apache member VP Apache Calcite PMC Apache Arrow, Drill,

More information

Data Stream Management Issues A Survey

Data Stream Management Issues A Survey Data Stream Management Issues A Survey Lukasz Golab M. Tamer Özsu School of Computer Science University of Waterloo Waterloo, Canada {lgolab, tozsu}@uwaterloo.ca Technical Report CS-2003-08 April 2003

More information

DSMS Part 1 & 2. Handle Data Streams in DBS? Data Management. Data Stream Management Systems - Applications, Concepts, and Systems

DSMS Part 1 & 2. Handle Data Streams in DBS? Data Management. Data Stream Management Systems - Applications, Concepts, and Systems Advanced Topics in Distributed Systems Data Stream Management Systems - Applications, Concepts, and Systems Vera Goebel & Thomas Plagemann University of Oslo 1 DSMS Part 1 & 2 Introduction: What are DSMS?

More information

RDF stream processing models Daniele Dell Aglio, Jean-Paul Cabilmonte,

RDF stream processing models Daniele Dell Aglio, Jean-Paul Cabilmonte, Stream Reasoning For Linked Data M. Balduini, J-P Calbimonte, O. Corcho, D. Dell'Aglio, E. Della Valle, and J.Z. Pan RDF stream processing models Daniele Dell Aglio, daniele.dellaglio@polimi.it Jean-Paul

More information

Understanding and Improving the Cost of Scaling Distributed Event Processing

Understanding and Improving the Cost of Scaling Distributed Event Processing Understanding and Improving the Cost of Scaling Distributed Event Processing Shoaib Akram, Manolis Marazakis, and Angelos Bilas shbakram@ics.forth.gr Foundation for Research and Technology Hellas (FORTH)

More information

Storage-centric load management for data streams with update semantics

Storage-centric load management for data streams with update semantics Research Collection Report Storage-centric load management for data streams with update semantics Author(s): Moga, Alexandru; Botan, Irina; Tatbul, Nesime Publication Date: 2009-03 Permanent Link: https://doi.org/10.3929/ethz-a-006733707

More information

One Size Fits All: An Idea Whose Time Has Come and Gone

One Size Fits All: An Idea Whose Time Has Come and Gone ICS 624 Spring 2013 One Size Fits All: An Idea Whose Time Has Come and Gone Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 1/9/2013 Lipyeow Lim -- University

More information

Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing

Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing Nesime Tatbul ETH Zurich tatbul@inf.ethz.ch Uğur Çetintemel Brown University ugur@cs.brown.edu Stan Zdonik Brown University

More information

High-Performance Event Processing Bridging the Gap between Low Latency and High Throughput Bernhard Seeger University of Marburg

High-Performance Event Processing Bridging the Gap between Low Latency and High Throughput Bernhard Seeger University of Marburg High-Performance Event Processing Bridging the Gap between Low Latency and High Throughput Bernhard Seeger University of Marburg common work with Nikolaus Glombiewski, Michael Körber, Marc Seidemann 1.

More information

A Temporal Foundation for Continuous Queries over Data Streams

A Temporal Foundation for Continuous Queries over Data Streams A Temporal Foundation for Continuous Queries over Data Streams Jürgen Krämer and Bernhard Seeger Dept. of Mathematics and Computer Science, University of Marburg e-mail: {kraemerj,seeger}@informatik.uni-marburg.de

More information

Spatio-Temporal Stream Processing in Microsoft StreamInsight

Spatio-Temporal Stream Processing in Microsoft StreamInsight Spatio-Temporal Stream Processing in Microsoft StreamInsight Mohamed Ali 1 Badrish Chandramouli 2 Balan Sethu Raman 1 Ed Katibah 1 1 Microsoft Corporation, One Microsoft Way, Redmond WA 98052 2 Microsoft

More information

Stream Schema: Providing and Exploiting Static Metadata for Data Stream Processing

Stream Schema: Providing and Exploiting Static Metadata for Data Stream Processing Stream Schema: Providing and Exploiting Static Metadata for Data Stream Processing Peter M. Fischer Systems Group ETH Zurich Switzerland peter.fischer@inf.ethz.ch Kyumars Sheykh Esmaili Systems Group ETH

More information

Incremental Evaluation of Sliding-Window Queries over Data Streams

Incremental Evaluation of Sliding-Window Queries over Data Streams Incremental Evaluation of Sliding-Window Queries over Data Streams Thanaa M. Ghanem 1 Moustafa A. Hammad 2 Mohamed F. Mokbel 3 Walid G. Aref 1 Ahmed K. Elmagarmid 1 1 Department of Computer Science, Purdue

More information

Big Data Platforms. Alessandro Margara

Big Data Platforms. Alessandro Margara Big Data Platforms Alessandro Margara alessandro.margara@polimi.it http://home.deib.polimi.it/margara Data Science Data science is an interdisciplinary field that uses scientific methods, processes, algorithms

More information

data parallelism Chris Olston Yahoo! Research

data parallelism Chris Olston Yahoo! Research data parallelism Chris Olston Yahoo! Research set-oriented computation data management operations tend to be set-oriented, e.g.: apply f() to each member of a set compute intersection of two sets easy

More information

Data Stream Management Systems - for Sensor Networks

Data Stream Management Systems - for Sensor Networks Data Stream Management Systems - for Sensor Networks Vera Goebel Department of Informatics, University of Oslo INF5100, Fall 2013 What are DSMSs? (terms) Why do we need DSMSs? (applications) Concepts:

More information

No Pane, No Gain: Efficient Evaluation of Sliding-Window Aggregates over Data Streams

No Pane, No Gain: Efficient Evaluation of Sliding-Window Aggregates over Data Streams No Pane, No Gain: Efficient Evaluation of Sliding-Window Aggregates over Data Streams Jin ~i', David ~aier', Kristin Tuftel, Vassilis papadimosl, Peter A. Tucke? 1 Portland State University 2~hitworth

More information

Evaluation of Relational Operations

Evaluation of Relational Operations Evaluation of Relational Operations Chapter 12, Part A Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Relational Operations We will consider how to implement: Selection ( ) Selects a subset

More information

Design and Evaluation of an FPGA-based Query Accelerator for Data Streams

Design and Evaluation of an FPGA-based Query Accelerator for Data Streams Design and Evaluation of an FPGA-based Query Accelerator for Data Streams by Yasin Oge A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Engineering

More information

Update-Pattern-Aware Modeling and Processing of Continuous Queries

Update-Pattern-Aware Modeling and Processing of Continuous Queries Update-Pattern-Aware Modeling and Processing of Continuous Queries Lukasz Golab M. Tamer Özsu School of Computer Science University of Waterloo, Canada {lgolab,tozsu}@uwaterloo.ca ABSTRACT A defining characteristic

More information

Data Stream Management and Complex Event Processing in Esper. INF5100, Autumn 2010 Jarle Søberg

Data Stream Management and Complex Event Processing in Esper. INF5100, Autumn 2010 Jarle Søberg Data Stream Management and Complex Event Processing in Esper INF5100, Autumn 2010 Jarle Søberg Outline Overview of Esper DSMS and CEP concepts in Esper Examples taken from the documentation A lot of possibilities

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Queries on streams

More information

CQL: A Language for Continuous Queries over Streams and Relations

CQL: A Language for Continuous Queries over Streams and Relations CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu Data Streams Continuous, unbounded, rapid, time-varying

More information

Data Stream Mining. Tore Risch Dept. of information technology Uppsala University Sweden

Data Stream Mining. Tore Risch Dept. of information technology Uppsala University Sweden Data Stream Mining Tore Risch Dept. of information technology Uppsala University Sweden 2016-02-25 Enormous data growth Read landmark article in Economist 2010-02-27: http://www.economist.com/node/15557443/

More information

Rethinking the Design of Distributed Stream Processing Systems

Rethinking the Design of Distributed Stream Processing Systems Rethinking the Design of Distributed Stream Processing Systems Yongluan Zhou Karl Aberer Ali Salehi Kian-Lee Tan EPFL, Switzerland National University of Singapore Abstract In this paper, we present a

More information

Parallel and Distributed Stream Processing: Systems Classification and Specific Issues

Parallel and Distributed Stream Processing: Systems Classification and Specific Issues Parallel and Distributed Stream Processing: Systems Classification and Specific Issues Roland Kotto-Kombi, Nicolas Lumineau, Philippe Lamarre, Yves Caniou To cite this version: Roland Kotto-Kombi, Nicolas

More information

Enforcing Access Control Over Data Streams

Enforcing Access Control Over Data Streams Enforcing Access Control Over Data Streams Barbara Carmina8, Elena Ferrari, and Kian Lee Tan Presented by: Mehmud Abliz Mo8va8on Data stream management systems have been increasingly used to support a

More information

Monitoring Streams A New Class of Data Management Applications

Monitoring Streams A New Class of Data Management Applications Monitoring Streams A New Class of Data Management Applications Don Carney dpc@csbrownedu Uğur Çetintemel ugur@csbrownedu Mitch Cherniack Brandeis University mfc@csbrandeisedu Christian Convey cjc@csbrownedu

More information

DELIVERING QOS IN XML DATA STREAM PROCESSING USING LOAD SHEDDING

DELIVERING QOS IN XML DATA STREAM PROCESSING USING LOAD SHEDDING DELIVERING QOS IN XML DATA STREAM PROCESSING USING LOAD SHEDDING Ranjan Dash and Leonidas Fegaras University of Texas at Arlington Arlington, TX 76019, USA ranjan.dash@mavs.uta.edu fegaras@cse.uta.edu

More information

Real-time Scheduling of Skewed MapReduce Jobs in Heterogeneous Environments

Real-time Scheduling of Skewed MapReduce Jobs in Heterogeneous Environments Real-time Scheduling of Skewed MapReduce Jobs in Heterogeneous Environments Nikos Zacheilas, Vana Kalogeraki Department of Informatics Athens University of Economics and Business 1 Big Data era has arrived!

More information

class 17 updates prof. Stratos Idreos

class 17 updates prof. Stratos Idreos class 17 updates prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ early/late tuple reconstruction, tuple-at-a-time, vectorized or bulk processing, intermediates format, pushing selects

More information

Dealing with Overload in Distributed Stream Processing Systems

Dealing with Overload in Distributed Stream Processing Systems Dealing with Overload in Distributed Stream Processing Systems Nesime Tatbul Stan Zdonik Brown University, Department of Computer Science, Providence, RI USA E-mail: {tatbul, sbz}@cs.brown.edu Abstract

More information

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing CS 4604: Introduction to Database Management Systems B. Aditya Prakash Lecture #10: Query Processing Outline introduction selection projection join set & aggregate operations Prakash 2018 VT CS 4604 2

More information

Comprehensive Guide to Evaluating Event Stream Processing Engines

Comprehensive Guide to Evaluating Event Stream Processing Engines Comprehensive Guide to Evaluating Event Stream Processing Engines i Copyright 2006 Coral8, Inc. All rights reserved worldwide. Worldwide Headquarters: Coral8, Inc. 82 Pioneer Way, Suite 106 Mountain View,

More information

6.893 Database Systems: Fall Quiz 2 THIS IS AN OPEN BOOK, OPEN NOTES QUIZ. NO PHONES, NO LAPTOPS, NO PDAS, ETC. Do not write in the boxes below

6.893 Database Systems: Fall Quiz 2 THIS IS AN OPEN BOOK, OPEN NOTES QUIZ. NO PHONES, NO LAPTOPS, NO PDAS, ETC. Do not write in the boxes below Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.893 Database Systems: Fall 2004 Quiz 2 There are 14 questions and 12 pages in this quiz booklet. To receive

More information

Streaming Video and TCP-Friendly Congestion Control

Streaming Video and TCP-Friendly Congestion Control Streaming Video and TCP-Friendly Congestion Control Sugih Jamin Department of EECS University of Michigan jamin@eecs.umich.edu Joint work with: Zhiheng Wang (UofM), Sujata Banerjee (HP Labs) Video Application

More information

Transformation of Continuous Aggregation Join Queries over Data Streams

Transformation of Continuous Aggregation Join Queries over Data Streams Transformation of Continuous Aggregation Join Queries over Data Streams Tri Minh Tran and Byung Suk Lee Department of Computer Science, University of Vermont Burlington VT 05405, USA {ttran, bslee}@cems.uvm.edu

More information

different problems from other networks ITU-T specified restricted initial set Limited number of overhead bits ATM forum Traffic Management

different problems from other networks ITU-T specified restricted initial set Limited number of overhead bits ATM forum Traffic Management Traffic and Congestion Management in ATM 3BA33 David Lewis 3BA33 D.Lewis 2007 1 Traffic Control Objectives Optimise usage of network resources Network is a shared resource Over-utilisation -> congestion

More information

DESIGN AND IMPLEMENTATION OF SCHEDULING STRATEGIES AND THEIR EVALUAION IN MAVSTREAM. Sharma Chakravarthy Supervising Professor.

DESIGN AND IMPLEMENTATION OF SCHEDULING STRATEGIES AND THEIR EVALUAION IN MAVSTREAM. Sharma Chakravarthy Supervising Professor. DESIGN AND IMPLEMENTATION OF SCHEDULING STRATEGIES AND THEIR EVALUAION IN MAVSTREAM The members of the Committee approve the master s thesis of Vamshi Krishna Pajjuri Sharma Chakravarthy Supervising Professor

More information

S-Store: Streaming Meets Transaction Processing

S-Store: Streaming Meets Transaction Processing S-Store: Streaming Meets Transaction Processing H-Store is an experimental database management system (DBMS) designed for online transaction processing applications Manasa Vallamkondu Motivation Reducing

More information

Extending XQuery with Window Functions

Extending XQuery with Window Functions Extending XQuery with Window Functions Irina Botan, Peter M. Fischer, Dana Florescu*, Donald Kossmann, Tim Kraska, Rokas Tamosevicius ETH Zurich, Oracle* Elevator Pitch Version of this Talk XQuery can

More information

Aurora: a new model and architecture for data stream management

Aurora: a new model and architecture for data stream management The VLDB Journal (2003) / Digital Object Identifier (DOI) 10.1007/s00778-003-0095-z Aurora: a new model and architecture for data stream management Daniel J. Abadi 1, Don Carney 2, Uǧur Çetintemel 2, Mitch

More information

Aurora: a new model and architecture for data stream management

Aurora: a new model and architecture for data stream management The VLDB Journal (2003) 12: 120 139 / Digital Object Identifier (DOI) 10.1007/s00778-003-0095-z Aurora: a new model and architecture for data stream management Daniel J. Abadi 1, Don Carney 2,Uǧur Çetintemel

More information

Ariadne: Managing Fine-Grained Provenance on Data Streams

Ariadne: Managing Fine-Grained Provenance on Data Streams Ariadne: Managing Fine-Grained Provenance on Data Streams Boris Glavic Illinois Institute of Technology Chicago, IL bglavic@iit.edu Kyumars Sheykh Esmaili Nanyang Technological University Singapore kyumarss@ntu.edu.sg

More information

Data Stream Processing

Data Stream Processing Data Stream Processing Part II 1 Data Streams (recap) continuous, unbounded sequence of items unpredictable arrival times too large to store locally one pass real time processing required 2 Reservoir Sampling

More information

Low-latency Estimates for Window-Aggregate Queries over Data Streams

Low-latency Estimates for Window-Aggregate Queries over Data Streams Portland State University PDXScholar Dissertations and Theses Dissertations and Theses 1-1-2011 Low-latency Estimates for Window-Aggregate Queries over Data Streams Amit Bhat Portland State University

More information

Overview Computer Networking What is QoS? Queuing discipline and scheduling. Traffic Enforcement. Integrated services

Overview Computer Networking What is QoS? Queuing discipline and scheduling. Traffic Enforcement. Integrated services Overview 15-441 15-441 Computer Networking 15-641 Lecture 19 Queue Management and Quality of Service Peter Steenkiste Fall 2016 www.cs.cmu.edu/~prs/15-441-f16 What is QoS? Queuing discipline and scheduling

More information

Replay-Based Approaches to Revision Processing in Stream Query Engines

Replay-Based Approaches to Revision Processing in Stream Query Engines Replay-Based Approaches to Revision Processing in Stream Query Engines Anurag S. Maskey Brandeis University Waltham, MA 02454 anurag@cs.brandeis.edu Mitch Cherniack Brandeis University Waltham, MA 02454

More information

Measuring Performance of Complex Event Processing Systems

Measuring Performance of Complex Event Processing Systems Measuring Performance of Complex Event Processing Systems Torsten Grabs, Ming Lu Microsoft StreamInsight Microsoft Corp., Redmond, WA {torsteng, milu}@microsoft.com Agenda Motivation CEP systems and performance

More information

Dynamic Plan Migration for Snapshot-Equivalent Continuous Queries in Data Stream Systems

Dynamic Plan Migration for Snapshot-Equivalent Continuous Queries in Data Stream Systems Dynamic Plan Migration for Snapshot-Equivalent Continuous Queries in Data Stream Systems Jürgen Krämer 1, Yin Yang 2, Michael Cammert 1, Bernhard Seeger 1, and Dimitris Papadias 2 1 University of Marburg,

More information

Rack-scale Data Processing System

Rack-scale Data Processing System Rack-scale Data Processing System Jana Giceva, Darko Makreshanski, Claude Barthels, Alessandro Dovis, Gustavo Alonso Systems Group, Department of Computer Science, ETH Zurich Rack-scale Data Processing

More information

Designing Hybrid Data Processing Systems for Heterogeneous Servers

Designing Hybrid Data Processing Systems for Heterogeneous Servers Designing Hybrid Data Processing Systems for Heterogeneous Servers Peter Pietzuch Large-Scale Distributed Systems (LSDS) Group Imperial College London http://lsds.doc.ic.ac.uk University

More information

HYRISE In-Memory Storage Engine

HYRISE In-Memory Storage Engine HYRISE In-Memory Storage Engine Martin Grund 1, Jens Krueger 1, Philippe Cudre-Mauroux 3, Samuel Madden 2 Alexander Zeier 1, Hasso Plattner 1 1 Hasso-Plattner-Institute, Germany 2 MIT CSAIL, USA 3 University

More information

From CQL to CEP to BAM - Oracle s Event Processing Platform Now and in the (near) Future

From CQL to CEP to BAM - Oracle s Event Processing Platform Now and in the (near) Future From CQL to CEP to BAM - Oracle s Event Processing Platform Now and in the (near) Future Hans Viehmann Senior Sales Consulting Manager Agenda EDA The Big Picture Event Publishers

More information

Apache Flink. Alessandro Margara

Apache Flink. Alessandro Margara Apache Flink Alessandro Margara alessandro.margara@polimi.it http://home.deib.polimi.it/margara Recap: scenario Big Data Volume and velocity Process large volumes of data possibly produced at high rate

More information

Data Stream Management Systems - for Sensor Networks

Data Stream Management Systems - for Sensor Networks Data Stream Management Systems - for Sensor Networks Vera Goebel Department of Informatics, University of Oslo New Computing Paradigm? Sensor Networks What are DSMSs? (terms) Why do we need DSMSs? (applications)

More information

In-Memory Data Management

In-Memory Data Management In-Memory Data Management Martin Faust Research Assistant Research Group of Prof. Hasso Plattner Hasso Plattner Institute for Software Engineering University of Potsdam Agenda 2 1. Changed Hardware 2.

More information

Oracle and Tangosol Acquisition Announcement

Oracle and Tangosol Acquisition Announcement Oracle and Tangosol Acquisition Announcement March 23, 2007 The following is intended to outline our general product direction. It is intended for information purposes only, and may

More information

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415 Faloutsos 1 introduction selection projection

More information

DRAFT A Survey of Event Processing Languages (EPLs)

DRAFT A Survey of Event Processing Languages (EPLs) DRAFT A Survey of Event Processing Languages (EPLs) October 15, 2006 (v14) Tim Bass, CISSP Co-Chair Event Processing Reference Architecture Working Group Principal Global Architect, Director TIBCO Software

More information

Partition and Compose: Parallel Complex Event Processing. Martin Hirzel, IBM Research Tuesday, 17 July 2012 DEBS

Partition and Compose: Parallel Complex Event Processing. Martin Hirzel, IBM Research Tuesday, 17 July 2012 DEBS Partition and Compose: Parallel Complex Event Processing Martin Hirzel, IBM Research Tuesday, 17 July 2012 DEBS 1 ? CEP = Stream Processing? Event (Stream) Processing Aggregate Enrich Join Filter Parse

More information

Database Architectures

Database Architectures Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/15/15 Agenda Check-in Parallelism and Distributed Databases Technology Research Project Introduction to NoSQL

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/24/2014 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 2 High dim. data

More information

B561 Advanced Database Concepts Streaming Model. Qin Zhang 1-1

B561 Advanced Database Concepts Streaming Model. Qin Zhang 1-1 B561 Advanced Database Concepts 2.2. Streaming Model Qin Zhang 1-1 Data Streams Continuous streams of data elements (massive possibly unbounded, rapid, time-varying) Some examples: 1. network monitoring

More information

Parallel paradigms for Data Stream Processing

Parallel paradigms for Data Stream Processing Università di Pisa Scuola Superiore Sant Anna Master Degree in Computer Science and Networking Master Thesis Parallel paradigms for Data Stream Processing Candidates Andrea Bozzi - Andrea Cicalese Supervisor

More information

Apache Flink- A System for Batch and Realtime Stream Processing

Apache Flink- A System for Batch and Realtime Stream Processing Apache Flink- A System for Batch and Realtime Stream Processing Lecture Notes Winter semester 2016 / 2017 Ludwig-Maximilians-University Munich Prof Dr. Matthias Schubert 2016 Introduction to Apache Flink

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

Large-Scale Data Engineering. Data streams and low latency processing

Large-Scale Data Engineering. Data streams and low latency processing Large-Scale Data Engineering Data streams and low latency processing DATA STREAM BASICS What is a data stream? Large data volume, likely structured, arriving at a very high rate Potentially high enough

More information

Real-time data processing with Apache Flink

Real-time data processing with Apache Flink Real-time data processing with Apache Flink Gyula Fóra gyfora@apache.org Flink committer Swedish ICT Stream processing Data stream: Infinite sequence of data arriving in a continuous fashion. Stream processing:

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/25/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 3 In many data mining

More information

Models and Issues in Data Stream Systems

Models and Issues in Data Stream Systems Models and Issues in Data Stream Systems Brian Babcock Shivnath Babu Mayur Datar Rajeev Motwani Jennifer Widom Department of Computer Science Stanford University Stanford, CA 94305 babcock,shivnath,datar,rajeev,widom

More information

What Developers must know about DB2 for z/os indexes

What Developers must know about DB2 for z/os indexes CRISTIAN MOLARO CRISTIAN@MOLARO.BE What Developers must know about DB2 for z/os indexes Mardi 22 novembre 2016 Tour Europlaza, Paris-La Défense What Developers must know about DB2 for z/os indexes Introduction

More information

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Copyright 2012, Oracle and/or its affiliates. All rights reserved. 1 Oracle NoSQL Database and Oracle Relational Database - A Perfect Fit Dave Rubin Director NoSQL Database Development 2 The following is intended to outline our general product direction. It is intended

More information