Amol Deshpande, UC Berkeley. Suman Nath, CMU. Srinivasan Seshan, CMU

Similar documents
CS 525M Mobile and Ubiquitous Computing Seminar. Mark Figura

Adaptive Data Placement for Wide-Area Sensing Services

A method of sensor data processing on large scale publish/subscribe systems

Database-Centric Programming for Wide-Area Sensor Systems

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9

Part XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321

Part VII. Querying XML The XQuery Data Model. Marc H. Scholl (DBIS, Uni KN) XML and Databases Winter 2005/06 153

Data Structures and Algorithms

The XQuery Data Model

A Distributed Filtering Architecture for Multimedia Sensors

Evaluating XPath Queries

Tree-Pattern Queries on a Lightweight XML Processor

XML Data Stream Processing: Extensions to YFilter

PARALLEL XPATH QUERY EVALUATION ON MULTI-CORE PROCESSORS

XML in the DATA Step Michael Palmer, Zurich Biostatistics, Inc., Morristown, New Jersey

Grid Computing Systems: A Survey and Taxonomy

Flexible Design for Simple Digital Library Tools and Services

Trees : Part 1. Section 4.1. Theory and Terminology. A Tree? A Tree? Theory and Terminology. Theory and Terminology

A Location Model for Ambient Intelligence

Binary Trees, Binary Search Trees

Consistency-preserving Caching of Dynamic Database Content

Geodatabases. Dr. Zhang SPRING 2016 GISC /03/2016

YFilter: an XML Stream Filtering Engine. Weiwei SUN University of Konstanz

Semi-structured Data. 8 - XPath

SFilter: A Simple and Scalable Filter for XML Streams

Scalable Hybrid Search on Distributed Databases

An Efficient XML Index Structure with Bottom-Up Query Processing

CSE 214 Computer Science II Introduction to Tree

Oracle NoSQL Database Enterprise Edition, Version 18.1

An approach to the model-based fragmentation and relational storage of XML-documents

Trees. Q: Why study trees? A: Many advance ADTs are implemented using tree-based data structures.

XML Storage and Indexing

Outline. Depth-first Binary Tree Traversal. Gerênciade Dados daweb -DCC922 - XML Query Processing. Motivation 24/03/2014

Yandex.Classifieds. Vadim Tsesko

4 SAX. XML Application. SAX Parser. callback table. startelement! startelement() characters! 23

Multi-Channel Clustered Web Application Servers

Highly Distributed and Fault-Tolerant Data Management

Binary Trees

One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while

1 <?xml encoding="utf-8"?> 1 2 <bubbles> 2 3 <!-- Dilbert looks stunned --> 3

XML Technologies. Doc. RNDr. Irena Holubova, Ph.D. Web pages:

What s New in VMware vsphere 5.1 VMware vcenter Server

SAX Simple API for XML

Information Retrieval. Lecture 10 - Web crawling

MySQL & NoSQL: The Best of Both Worlds

Part XVII. Staircase Join Tree-Aware Relational (X)Query Processing. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 440

Copyright 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 7 XML

Table of Contents Chapter 1 - Introduction Chapter 2 - Designing XML Data and Applications Chapter 3 - Designing and Managing XML Storage Objects

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services

Wide Area Query Systems The Hydra of Databases

Bioinformatics Programming. EE, NCKU Tien-Hao Chang (Darby Chang)

Uses for Trees About Trees Binary Trees. Trees. Seth Long. January 31, 2010

Introduction to XQuery and XML Databases

Edge Side Includes (ESI) Overview

HYRISE In-Memory Storage Engine

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver

Triggers in IrisNet 1. INTRODUCTION. Hyang-Ah Kim, Tadashi Okoshi, Vladislav Shkapenyuk {hakim, slash,

status Emmanuel Cecchet

Databases and Information Systems 1. Prof. Dr. Stefan Böttcher

exist: An Open Source Native XML database

Mining XML Functional Dependencies through Formal Concept Analysis

Cost-effective development of flexible self-* applications

Gyu Sang Choi Yeungnam University

7.1 Introduction. extensible Markup Language Developed from SGML A meta-markup language Deficiencies of HTML and SGML

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval

1 The size of the subtree rooted in node a is 5. 2 The leaf-to-root paths of nodes b, c meet in node d

Part V. SAX Simple API for XML. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 84

11 TREES DATA STRUCTURES AND ALGORITHMS IMPLEMENTATION & APPLICATIONS IMRAN IHSAN ASSISTANT PROFESSOR, AIR UNIVERSITY, ISLAMABAD

ZooKeeper. Table of contents

XML and information exchange. XML extensible Markup Language XML

XML Data Management. 5. Extracting Data from XML: XPath

Optimizing distributed XML queries through localization and pruning

Part V. SAX Simple API for XML

Oracle NoSQL Database Enterprise Edition, Version 18.1

Course: The XPath Language

Network Awareness in Internet-Scale Stream Processing

XPath. by Klaus Lüthje Lauri Pitkänen

Trees. CSE 373 Data Structures

Chapter 13 XML: Extensible Markup Language

Implementing SQL Server 2016 with Microsoft Storage Spaces Direct on Dell EMC PowerEdge R730xd

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

516. XSLT. Prerequisites. Version 1.2

IBM Db2 Analytics Accelerator Version 7.1

High-Performance Holistic XML Twig Filtering Using GPUs. Ildar Absalyamov, Roger Moussalli, Walid Najjar and Vassilis Tsotras

Delivery Options: Attend face-to-face in the classroom or via remote-live attendance.

Comprehensive Guide to Evaluating Event Stream Processing Engines

Early Measurements of a Cluster-based Architecture for P2P Systems

TAG: A TINY AGGREGATION SERVICE FOR AD-HOC SENSOR NETWORKS

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight

Data Querying, Extraction and Integration II: Applications. Recuperación de Información 2007 Lecture 5.

Sensor Network Protocol Design and Implementation. Philip Levis UC Berkeley

Adaptive Cluster Computing using JavaSpaces

L3.4. Data Management Techniques. Frederic Desprez Benjamin Isnard Johan Montagnat

EMERGING TECHNOLOGIES

ON VIEW PROCESSING FOR A NATIVE XML DBMS

COMP9319 Web Data Compression & Search. Cloud and data optimization XPath containment Distributed path expression processing

Oracle NoSQL Database and Cisco- Collaboration that produces results. 1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Index-Driven XQuery Processing in the exist XML Database

H2 Spring B. We can abstract out the interactions and policy points from DoDAF operational views

MATLAB-to-ROCI Interface. Member(s): Andy Chen Faculty Advisor: Camillo J. Taylor

Transcription:

Amol Deshpande, UC Berkeley Suman Nath, CMU Phillip Gibbons, Intel Research Pittsburgh Srinivasan Seshan, CMU IrisNet

Overview of IrisNet Example application: Parking Space Finder Query processing in IrisNet Data partitioning Distributed query execution Conclusions 2

Motivation Proliferation of resource-intensive sensors attached to powerful devices Webcams, pressure gauges, microphones Rich data sources with high data volumes Typically distributed over wide geographical areas Useful services utilizing such sensors missing IrisNet: An infrastructure to support deployment of sensor services over such sensors 3

Ease of deployment of sensor services Minimal requirements from the service provider Distributed data storage and querying for high throughputs Ease of querying XML as the data format, XPATH as the query language Natural geographical hierarchy on data as well as queries Continuously evolving data Location transparency Logical view of the entire distributed database as a single centralized XML document 4

Sensing Agents (SA) PDA/PC-class processor, MBs GBs storage OAs SA SA Collect & process data from sensors, as dictated by senselet code uploaded by OAs Processed data sent to the OAs for update in-place Organizing Agents (OA) PC/Server-class processor, GBs storage Provide data storage, discovery, querying facilities Use an off-the-shelf database to store data locally Interface with the local database using XPATH/XSLT 5

Overview of IrisNet Example application: Parking Space Finder Query processing in IrisNet Data partitioning Distributed query execution Conclusions 6

Webcams monitor parking spaces and provide real-time information about their availability Image processing to extract availability information Natural geographical hierarchy on the data 7

<State id= Pennysylvinia > <County id= Allegheny > <City id= Pittsburgh > <Neighborhood id= Oakland > <total-spaces>200</total-spaces> <Block id= 1 > <GPS> </GPS> <pspace id= 1 > <in-use>no</in-use> <metered>yes</metered> </pspace> <pspace id= 2 > </pspace> </Block> </Neighborhood> <Neighborhood id= Shadyside > 8

9

Users issue queries against the document as a whole Find all available parking spots in Oakland /State[@id= Pennsylvania ] /County[@id= Allegheny ]/City[@id= Pittsburgh ] /Neighborhood[@id= Oakland ]/Block/pSpace[in-use = no ] Find all blocks in in Allegheny have more than 20 metered parking spots /State[@id= Pennsylvania ]/County[@id= Allegheny ] //Block[count(./pSpace[metered = yes ]) > 20] Find the cheapest parking spot in Oakland Block 1 /State[@id= Pennsylvania ]/County[@id= Allegheny ]/City[@id= Pittsburgh ] /Neighborhood[@id= Oakland ]/Block[@id= 1 ] /pspace[not(../pspace/price >./price)] Challenge : Evaluate arbitrary XPATH queries against the document even though the document may be partitioned across multiple OAs 10

Maintain data partitioning invariants Used to guarantee that an OA always has sufficient information to participate correctly in a query Use DNS to maintain the data distribution information and to route queries to data Convert the XPATH query to an XSLT query that : Walks the document recursively Evaluates part of the query that can be done locally Gathers missing information by asking subqueries 11

Overview of IrisNet Example application: Parking Space Finder Query processing in IrisNet Data partitioning Distributed query execution Conclusions 12

Definition : An IDable node in the document Has an id attribute with value unique among its siblings All its ancestors in the document are IDable 13

Definition : Local Information of an IDable node All its attributes and all its non-idable descendants IDs of all its IDable children 14

Definition : Local Information of an IDable node All its attributes and all its non-idable descendants IDs of all its IDable children 15

Data storage, ownership always in units of local informations corresponding to the IDable nodes in the document These form a nearly-disjoint partitioning of the overall document Granularity can be controlled using the id attributes A partitioning unit can be uniquely identified using the id s on the path to the root of the document Data ownership: Each partitioning unit owned by exactly one OA 16

Data stored locally at each OA: A document fragment consisting of union of partitioning units Constraints: Must store the document fragment it owns If stored the id of an IDable node, must also store the local information of all its ancestors We minimize the amount of information required to store (details in paper) Invariant : Only need to store ID s of all ancestors, and of their children If an OA has the id of an IDable node, it either Has the local information for the node, or Has the id s on the path to the root allowing it to locate the local information for that node 17

OA 1 Owns OA 2 Owns 18

Local information required Local information optional Local information optional Data storage configuration at OA 1 19

Local information required Local information required Local information optional Data storage configuration at OA 2 20

Mapping of nodes to physical OAs maintained using DNS For each IDable node, create a unique DNS-style name by concatenating the IDs on the path to the root OA 2 Owns OA 1 Owns Mapped to OA 1: Allegheny-County.iris.net Pittsburgh-City.Allegheny-County.iris.net Mapped to OA 2: Oakland-Neighborhood.Pittsburgh-City. Allegheny-County.iris.net 1-Block.Oakland-Neighborhood.Pittsburgh - City.Allegheny-County.iris.net 1-pSpace.1-Block.Oakland-Neighborhood. Pittsburgh-City.Allegheny-County.iris.net 21

Overview of IrisNet Example application: Parking Space Finder Query processing in IrisNet Data partitioning Distributed query execution Conclusions 22

Each query has a hierarchical prefix /State[@id= Pennsylvania ]/County[@id= Allegheny ] /City[@id= Pittsburgh ]/ /Neighborhood[@id= Oakland ]/Block/pSpace Simple parsing of the query to extract the least common ancestor (LCA) of the possible query result Send the query to Oakland-Neighborhood.Pittsburgh-City. Allegheny-County.Pennsylvania-State.parking.intel-iris.net Name extracted from query without any global or per-service state 23

Nesting depth of an XPATH query Maximum depth at which a location path that traverses over IDable nodes occurs in the query Examples : /a[@id= x ]/b[@id= y ]/c 0 /a[@id= x ]//c 0 /a[./b/c]/b 1 (if b is IDable) /a[count(./b/[./c[@id= 1 ]]) 2 Complexity of evaluating a query increases with nesting depth 24

Any predicate in the query can be evaluated using just the local information for an IDable node Example : /Block[@id= 1 ][./available-spaces > 10] Sketch of the XSLT program : Walk the document recursively If local information for the node under consideration available, evaluate the part of the query that refers to that node, otherwise tag the returned answer with the tag asksubquery Postprocessor finds the missing information by asking subqueries 25

A site can add to its document any fragment as long as the data partitioning constraints are satisfied We generalize subqueries to fetch the smallest superset of the answer that satisfies the constraints and cache it Data time-stamped at the time of caching Queries can specify freshness requirements 26

Queries with Nesting Depth > 0 Schema changes Data partitioning changes Implementation details and experimental study 27

Identified the challenges in query processing over a distributed XML document Developed formal framework and techniques that Allow for flexible document partitioning Integrate caching seamlessly Correctly and efficiently answer XPATH queries Experimental results demonstrate the advantages of flexible data partitioning and caching 28

IrisNet project website http://www.intel-iris.net 29

Overview of IrisNet Example application: Parking Space Finder Query processing in IrisNet Data partitioning Distributed query execution Performance Study 30

Current prototype written in Java A cluster of 9 2GHz Pentium IV machines Apache Xindice used as the backend XML database Artificially generated database 2400 parking spaces with 2 cities, 6 neighborhoods and 120 blocks Five query workloads QW-1: Asking for a single block QW-2: Asking for two blocks from a single neighborhood QW-3: Asking for two blocks from two neighborhoods QW-4: Asking for two blocks from two cities QW-Mix: 40% of QW-1 and QW-2, 15% QW-3, 5%QW-4 31

32

Architecture already allows for caching data An OA is allowed to store more data than that it owns Data time-stamped at the time of caching Queries can specify freshness tolerance 33

34

35

OA 1 OWNS OA 2 OWNS e.g. OA 2 must store local information of the County(Allegheny) node 36

Location transparency distributed DB hidden from user Flexible data partitioning Low latency queries & Query scalability Direct query routing to LCA of the answer Query-driven caching, supporting partial matches Load shedding; No per-service state needed at web servers Support query-based consistency Use off-the-shelf DB components 37

<County id= Allegheny > <City id= Pittsburgh > <Neighborhood id= Oakland > <available-spaces>8</available-spaces> <Block id= 1 > <pspace id= 1 > <in-use>no</in-use> <metered>yes</metered> </pspace> </Block> </Neighborhood> </City> </County> 38

Overview of IrisNet Example application: Parking Space Finder Query processing in IrisNet Data partitioning Distributed query execution Performance Study 39