StreamGlobe Adaptive Query Processing and Optimization in Streaming P2P Environments

Similar documents
Event-based QoS for a Distributed Continual Query System

Querying Sliding Windows over On-Line Data Streams

RASC: Dynamic Rate Allocation for Distributed Stream Processing Applications

Load Shedding for Aggregation Queries over Data Streams

Complex Event Processing in a High Transaction Enterprise POS System

Specifying Access Control Policies on Data Streams

Load Shedding for Window Joins on Multiple Data Streams

Multi-Model Based Optimization for Stream Query Processing

Adaptive Load Diffusion for Multiway Windowed Stream Joins

Condensative Stream Query Language for Data Streams

Rethinking the Design of Distributed Stream Processing Systems

Service-oriented Continuous Queries for Pervasive Systems

LinkViews: An Integration Framework for Relational and Stream Systems

Load Shedding. Recommended Reading

Detouring and Replication for Fast and Reliable Internet-Scale Stream Processing

Dealing with Overload in Distributed Stream Processing Systems

U ur Çetintemel Department of Computer Science, Brown University, and StreamBase Systems, Inc.

MonetDB/DataCell: leveraging the column-store database technology for efficient and scalable stream processing Liarou, E.

Resource Allocation in a Middleware for Streaming Data

MJoin: A Metadata-Aware Stream Join Operator

An Adaptive Multi-Objective Scheduling Selection Framework for Continuous Query Processing

ARES: an Adaptively Re-optimizing Engine for Stream Query Processing

Network Awareness in Internet-Scale Stream Processing

UDP Packet Monitoring with Stanford Data Stream Manager

Analytical and Experimental Evaluation of Stream-Based Join

A flexible Framework for Multisensor Data Fusion using Data Stream Management Technologies

Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing

A Dynamic Attribute-Based Load Shedding Scheme for Data Stream Management Systems

liquid: Context-Aware Distributed Queries

Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters

SmartCIS: Integrating Digital and Physical Environments

Detection and Tracking of Discrete Phenomena in Sensor-Network Databases

Using Virtual Sensors to generate Timed Output from a Wireless Sensor Networks

Just-In-Time Processing of Continuous Queries

New Data Types and Operations to Support. Geo-stream

FIAP: Facility Information Access Protocol for Data-Centric Building Automation Systems

An Initial Study of Overheads of Eddies

OM: A Tunable Framework for Optimizing Continuous Queries over Data Streams

Live Data Views: Programming Pervasive Applications That Use Timely and Dynamic Data

SmartCIS: Integrating Digital and Physical Environments

Distributed Operator Placement and Data Caching in Large-Scale Sensor Networks

MIPS 2004 Tutorial: Data Stream Management Systems (DSMS) - Applications, Concepts, and Systems - Appendix: Literature References for DSMS

Query Optimization for Distributed Data Streams

Distributed Pattern Discovery in Multiple Streams

Database abstractions for managing sensor network data

Optimizing Multiple Distributed Stream Queries Using Hierarchical Network Partitions

Caching Queues in Memory Buffers

Sliding Window Based Method for Processing Continuous J-A Queries on Data Streams

Metadata Services for Distributed Event Stream Processing Agents

Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster

A Publish & Subscribe Architecture for Distributed Metadata Management

Customizable Parallel Execution of Scientific Stream Queries

Distributed Monitoring of Peer to Peer Systems

Operator Placement for In-Network Stream Query Processing

Stream Processing Algorithms that Model Behavior Change

Multilevel Secure Data Stream Processing

Towards Correcting Input Data Errors Probabilistically Using Integrity Constraints

Continuous Analytics: Rethinking Query Processing in a Network-Effect World

Dissemination of Real-time Sensor Data on the Internet

Research Issues in Outlier Detection for Data Streams

Dynamic Plan Migration for Snapshot-Equivalent Continuous Queries in Data Stream Systems

utility utility utility message % output value delay

Predicate Indexing for Incremental Multi-Query Optimization

A column-oriented data stream engine

ijoin: Importance-aware Join Approximation over Data Streams

TelegraphCQ: Continuous Dataflow Processing for an Uncertain World

A Distributed Event Stream Processing Framework for Materialized Views over Heterogeneous Data Sources

BIG data applications such as stock exchanges, smart. FPGA Based Custom Accelerator Architecture Framework for Complex Event Processing

Design Considerations of a Flexible Data Stream Processing Middleware

A theory of stream queries

Adaptive In-Network Query Deployment for Shared Stream Processing Environments

Overload Management in Data Stream Processing Systems with Latency Guarantees

QoS Management of Real-Time Data Stream Queries in Distributed. environment.

Real Time Processing of Streaming and Static Information

Load Shedding in a Data Stream Manager

Efficient Migration of Service Agent in P-Grid Environments based on Mobile Agent

Multi-granular Time-based Sliding Windows over Data Streams

Chapter 5 Data Streams and Data Stream Management Systems and Languages

New Event-Processing Design Patterns using CEP

No Pane, No Gain: Efficient Evaluation of Sliding-Window Aggregates over Data Streams

FREddies: DHT-Based Adaptive Query Processing via FedeRated Eddies

System Energy Efficiency Lab seelab.ucsd.edu. Jinseok Yang

Real-time stream mining: online knowledge. extraction using classifier networks

PRESTO: A Predictive Storage Architecture for Sensor Networks

Dissemination of Paths in Path-Aware Networks

XCo: Explicit Coordination to Prevent Network Fabric Congestion in Cloud Computing Cluster Platforms. Presented by Wei Dai

Towards a Unified Monitoring and Performance Analysis System for the Grid

MauveDB: Statistical Modeling inside Database Systems. Amol Deshpande, University of Maryland

Towards Context-Aware Adaptable Web Services

Clustering from Data Streams

Adaptive Load Shedding via Fuzzy Control in Data Stream Management Systems

PIX-Grid: A Platform for P2P Photo Exchange

PRESTO: A Predictive Storage Architecture for Sensor Networks

Managing Trajectories of Moving Objects as Data Streams

Diplomarbeit. Distributed Processing of Data Streams in P2P Networks. Tobias Scholl Alemannenstr Augsburg

Sensor Data and Managing Data Quality in Sensory Network Applications

TrajStore: an Adaptive Storage System for Very Large Trajectory Data Sets

Minimizing Latency in Fault-Tolerant Distributed Stream Processing Systems

Computer-based Tracking Protocols: Improving Communication between Databases

Transformation of Continuous Aggregation Join Queries over Data Streams

Transcription:

StreamGlobe Adaptive Query Processing and Optimization in Streaming P2P Environments A. Kemper, R. Kuntschke, and B. Stegmaier TU München Fakultät für Informatik Lehrstuhl III: Datenbanksysteme http://www-db.in.tum.de/research/projects/streamglobe

Outline Motivation StreamGlobe The StreamGlobe Approach Architecture Overview Current and Future Research Conclusion 09/05/2006 StreamGlobe 2

Exemplary Initial Situation B A WLAN b Network Consists of peers Given or grown topology Data Sources Provide XML data stream Possibly infinite streams (e.g., sensor measurements) User requests Continuous queries Query language XQuery Registered at a peer 09/05/2006 StreamGlobe 3

General Traditional Approach A B 1. Register requests 2. Establish data transfer Peers may connect arbitrarily 3. Process / Execute requests 4. Routing of streams Map streams to network b 09/05/2006 StreamGlobe 4

General Traditional Approach (ctd.) A 2 B Drawbacks 1. Transmission of useless data 2. Redundant transmissions 3. Multiple request evaluation 3 1 Network congestion and processing overhead 3 b 09/05/2006 StreamGlobe 5

Why StreamGlobe? Other Systems / previous work E.g. Cougar, TelegraphCQ, Multicast techniques: Focus on specific aspects (e.g., query optimization) Tailored to specific domains StreamGlobe Contribution is combination of techniques: In-network query processing combined with routing Constitutes a generic infrastructure Independent of domain Efficient data stream transformation and distribution 09/05/2006 StreamGlobe 6

Outline Motivation StreamGlobe The StreamGlobe Approach Architecture Overview Current and Future Research Conclusion 09/05/2006 StreamGlobe 7

The StreamGlobe Approach B Intelligent Routing Multicast routing techniques Data Stream Clustering A a ab Push query execution into network Multi-query optimization Reduce network traffic Avoid redundant transmissions Reduce processing cost b 09/05/2006 StreamGlobe 8

Basic Concepts P2P Network Topology No arbitrary communication Communication via transfer paths No fixed P2P topology Classification of peers Thin-Peers Super-Peers Constitution of a super-peer backbone Hierarchical organization Speaker-peer responsible for certain subnet 09/05/2006 StreamGlobe 9

StreamGlobe Peer Architecture XQuery Subscriptions register StreamGlobe Interface Optimization Query Engine Globus Toolkit XML Data Streams Metadata Management Based upon Open Grid Services Architecture (OGSA) Integration similar to OGSA- DAI or OGSA-DQP Layers as grid-services Availability according to peer capabilities Message exchange via RPC and notifications Data stream transfer via direct TCP connections 09/05/2006 StreamGlobe 10

Optimization Goals 1. Registration of arbitrary subscriptions at any peer 2. Achieve good distribution of data streams 3. Optimize evaluation of many subscriptions Achievement Pushing query execution into the network (1) and (3) Multiquery optimization (3) Early filtering of data streams resp. evaluation of subscriptions (2) Data stream clustering (2) 09/05/2006 StreamGlobe 11

Multi-Query Optimization b Performed by speaker-peer Analyze subscriptions and streams Common subqueries Query a Filter a Filter b Query ab Re-usability of streams Based on properties of subscriptions / streams Computes Filters and queries Data stream clustering Execution locations 09/05/2006 StreamGlobe 12

Query Execution Basic concepts Streaming evaluation and push-based techniques Preclude unbounded buffering by requiring window constraints Extensibility by means of mobile code Evaluation of subscriptions with FluX Designed for streaming processing of XQuery Event-based extension to XQuery Usage of schema information for buffer minimization Visit my talk at the VLDB: Tomorrow, Research Session 6: XML(II) 09/05/2006 StreamGlobe 13

Outline Motivation StreamGlobe The StreamGlobe Approach Architecture Overview Current and Future Research Conclusion 09/05/2006 StreamGlobe 14

Current and Future Research Current Research Optimization techniques Extension of FluX Future Research Quality-of-Service management Explicit load balancing Load shedding techniques Construction of overlay network 09/05/2006 StreamGlobe 15

Conclusion StreamGlobe Exploiting in-network query processing capabilities In combination with data stream clustering Minimization of network traffic Query execution with FluX Efficient and scalable execution of subscriptions Multi-query optimization Parallelization and load balancing in the network 09/05/2006 StreamGlobe 16

Related Work Aberer, Cudré-Mauroux, Datta, Despotovic, Hauswirth, Punceva, Schmidt. P-Grid: a selforganizing structured P2P system. SIGMOD Record 32(3), 2003 Arasu, Babcock, Babu, Datar, Ito, Motwani, Nishizawa, Srivastava, Thomas, Varma, Widom. STREAM: The Stanford Stream Data Manager. Data Engineering Bulletin 26(1), 2003 Carney, Cetintemel, Cherniack, Convey, Lee, Seidman, Stonebraker, Tatbul, Zdonik. Monitoring Streams A New Class of Data Management Applications. VLDB 2002 Chandrasekaran, Cooper, Deshpande, Franklin, Hellerstein, Hong, Krishnamurthy, Madden, Raman, Reiss, Shah. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. CIDR 2003 Cherniack, Balakrishnan, Balazinska, Carney, Cetintemel, Xing, Zdonik. Scalable Distributed Stream Processing. CIDR 2003 Krämer, Seeger. PIPES A Public Infrastructure for Processing and Exploring Streams. SIGMOD 2004 Madden, Shah, Hellerstein, Raman. Continuously Adaptive Continuous Queries over Streams. SIGMOD 2002 Sellis. Multiple-Query Optimization. TODS 1988 Yang, Garcia-Molina. Designing a Super-Peer Network. ICDE 2003 Yao, Gehrke. The Cougar Approach to In-Network Query Processing in Sensor Networks. SIGMOD Record 31(3), 2002 09/05/2006 StreamGlobe 17