A Distributed Indexing Scheme for Multi-dimensional Range Queries in Sensor Networks

Tingjian Ge

Outline
- Introduction and Overview
- Concepts and Technology
- Inserting Events into the Index
- Querying the Index
- Analysis and Comparison with Other Schemes

Motivation
- Events are data
  - An event is a tuple of attribute values <A_1, A_2, ..., A_k>
  - Each A_i is a sensor reading
- Multi-dimensional range queries
  - <x_1–y_1, x_2–y_2, ..., x_k–y_k>
  - Example: list all events that have temperatures between 50 °F and 60 °F, and light levels between 10 and 20
  - A point (equality) query is a special case
  - Essentially any query
- Correlating events and triggering actions
  - E.g., query results indicate an event, which triggers cameras

Traditional Database Systems (not in this paper)
- B-tree index
- Hash index
- Patricia tree index
- Bitmap index
- All of the above are centralized indices
- Data dependency (insertion order vs. index structure)
  - A B-tree is data-dependent; a Patricia tree is not
  - Our distributed index is not data-dependent either
- Sensitivity of the index structure to insert/update/delete
- Rebalancing of the index structure on skewed data
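As a rough illustration of the query model in the Motivation slide, the sketch below treats an event as a tuple of readings and a range query as one (low, high) pair per attribute. The names matches and point_query, the inclusive bounds, and the sample readings are assumptions made for this example, not part of the presentation.

```python
from typing import List, Sequence, Tuple

Event = Sequence[float]                      # one reading per attribute
RangeQuery = Sequence[Tuple[float, float]]   # one (low, high) pair per attribute


def matches(event: Event, query: RangeQuery) -> bool:
    """True if every attribute value lies inside its (inclusive) query range."""
    if len(event) != len(query):
        raise ValueError("event and query must have the same number of attributes")
    return all(low <= value <= high for value, (low, high) in zip(event, query))


def point_query(event: Event) -> List[Tuple[float, float]]:
    """A point (equality) query is a range query whose bounds coincide."""
    return [(value, value) for value in event]


# Hypothetical readings: (temperature in °F, light level).
events = [(55.0, 15.0), (72.0, 12.0), (58.0, 25.0)]
# "Temperature between 50 and 60, light level between 10 and 20."
query = [(50.0, 60.0), (10.0, 20.0)]
hits = [e for e in events if matches(e, query)]
print(hits)
```

Only the first hypothetical event satisfies both ranges. A centralized index would answer this from one place; the rest of the talk is about answering it without collecting all events at a single node.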

How about Queries in Sensor Networks?
- Flooding
  - Events are stored where they are generated, and queries are flooded through the network
- External storage
  - All events are stored centrally at a node outside the sensor network
- Geographic hash table (GHT)
  - Events are hashed on some attribute; range queries are subdivided and hashed to the appropriate locations
- Distributed Index for Multi-dimensional data (DIMs)

Overview
- Events (data) are mapped onto zones of the network (multi-dimensional space to 2-d space)
- Data locality: events with close attribute values are stored at the same location in the network
  - Locality-preserving geographic hash
- Events are routed to and stored at that node
- Queries are routed to and resolved by the appropriate nodes

Using GPSR (Greedy Perimeter Stateless Routing)
- Algorithm used to route events to the appropriate nodes at a specified location
- Greedy-mode forwarding
  - A node that receives a packet with destination X forwards the packet to the neighbor closest to X
- Perimeter mode
  - Used if no neighbor takes the packet closer to its destination (i.e., a void)
  - Right-hand rule to circumnavigate voids

Zones
- Rectangle R on the x-y plane (the entire network)
- A subrectangle Z is a zone if Z is obtained by dividing R k times, satisfying the following property:
  - After the i-th division, 1 ≤ i ≤ k, R is partitioned into 2^i equal rectangles
  - If i is odd (even), the division is parallel to the y-axis (x-axis)
- k is the level of the zone: level(Z) = k
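The greedy-mode rule in the GPSR slide can be sketched as a single forwarding decision. This is a minimal sketch, not GPSR itself: the function name greedy_next_hop, the dictionary of neighbor positions, and returning None to signal a void are assumptions for illustration, and perimeter mode (the right-hand rule) is not implemented.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def greedy_next_hop(current: Point,
                    neighbors: Dict[str, Point],
                    dest: Point) -> Optional[str]:
    """Pick the neighbor closest to dest, but only if it is closer than we are.

    Returns None when no neighbor makes progress toward dest (a void), which is
    the situation in which GPSR would switch to perimeter mode.
    """
    best_id = None
    best_dist = math.dist(current, dest)
    for node_id, position in neighbors.items():
        d = math.dist(position, dest)
        if d < best_dist:
            best_id, best_dist = node_id, d
    return best_id


# Hypothetical topology: n1 lies toward the destination, n2 does not.
print(greedy_next_hop((0.0, 0.0), {"n1": (1.0, 0.0), "n2": (0.0, 1.0)}, (3.0, 0.0)))
# A void: the only neighbor is farther from the destination than the current node.
print(greedy_next_hop((0.0, 0.0), {"n3": (-1.0, 0.0)}, (3.0, 0.0)))
```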

Zones Example (figure)

Zone Identification
- code(Z)
  - Bit string of length level(Z)
  - Starting from the left of the code string: if zone Z resides in the left half of R, the bit is 0, else 1
  - For the next bit: if zone Z resides in the bottom half of R, the bit is 0, else 1
- addr(Z)
  - Location of the centroid of the zone rectangle

Zone Terminology
- Sibling subtree of a zone
  - The left/right subtree rooted at the same parent zone
- Backup zone
  - If the sibling subtree of a zone is on the left (right), its backup zone is the rightmost (leftmost) zone in its sibling subtree
  - If code(Z) = p1, then code(backup(Z)) = p01*
  - If code(Z) = p0, then code(backup(Z)) = p10*

Associating Zones with Nodes
- The sensor field is divided into zones, which can be of different sizes (not a complete binary tree; zones sit at different levels)
- Zone ownership
  - A owns Z_A
  - Z_A is the largest zone that contains only node A
- Some zones may not have an owner node; for such a zone Z, the owner of backup(Z) is the owner
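The relationship between code(Z), the zone rectangle, and addr(Z) can be made concrete with a short sketch. It assumes the sensor field is the unit square unless another rectangle is passed in, and that bits alternate between x and y splits as in the Zones slide; the names zone_rect and zone_addr are hypothetical.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max)


def zone_rect(code: str, field: Rect = (0.0, 1.0, 0.0, 1.0)) -> Rect:
    """Rectangle of the zone with the given bit-string code.

    Bits 1, 3, 5, ... halve the rectangle in x (0 = left half, 1 = right half);
    bits 2, 4, 6, ... halve it in y (0 = bottom half, 1 = top half).
    """
    x_min, x_max, y_min, y_max = field
    for i, bit in enumerate(code):
        if bit not in ("0", "1"):
            raise ValueError("zone code must contain only 0 and 1")
        if i % 2 == 0:                      # odd-numbered division: split x
            mid = (x_min + x_max) / 2
            if bit == "0":
                x_max = mid
            else:
                x_min = mid
        else:                               # even-numbered division: split y
            mid = (y_min + y_max) / 2
            if bit == "0":
                y_max = mid
            else:
                y_min = mid
    return (x_min, x_max, y_min, y_max)


def zone_addr(code: str, field: Rect = (0.0, 1.0, 0.0, 1.0)) -> Tuple[float, float]:
    """addr(Z): the centroid of the zone rectangle."""
    x_min, x_max, y_min, y_max = zone_rect(code, field)
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


print(zone_rect("0110"))   # level-4 zone: x in [0.25, 0.5], y in [0.5, 0.75]
print(zone_addr("0110"))
```

The length of the code is the level of the zone, so the empty code stands for the whole field R.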

Algorithm for Zone Ownership
- Each node maintains its four boundaries
  - Initialized to the network boundary
- Each node sends messages to learn the locations of its neighbors
  - If a neighbor responds, the node adjusts its boundaries accordingly
  - Otherwise the boundary is undecided
- Undecided boundaries are resolved during querying or event insertion
- Discussions (optional):
  - Efficiency
  - Alternative
  - Reality and conclusion (offload insert/query, one-time)

Pseudocode for Zone Ownership

    Build-Zone(a)
      while Contain(Z_A, a) do
        if length(code(Z_A)) mod 2 == 0 then
          new_bound = (bound[0] + bound[1]) / 2
          if A.x < new_bound then bound[1] = new_bound
          else bound[0] = new_bound
        else
          new_bound = (bound[2] + bound[3]) / 2
          if A.y < new_bound then bound[3] = new_bound
          else bound[2] = new_bound

Inserting Events into the Index
- Hashing an event to a zone
- Routing an event to its owner
- Resolving undecided zone boundaries during insertion

Hashing an Event to a Zone
- An event has m attributes A_1, A_2, ..., A_m, and the attribute values have been normalized
- To assign a k-bit zone code to an event:
  - For i in [1, m]: if A_i < 0.5, the i-th bit of the zone code is 0, else 1
  - For i in [m+1, 2m]: if A_(i-m) < 0.25 or 0.5 ≤ A_(i-m) < 0.75, the i-th bit is 0, else 1
  - And so on, until all k bits are assigned
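The Build-Zone pseudocode can be turned into a small runnable sketch. Three things here are assumptions beyond the slide: the zone code is extended by one bit on every halving (the slide's loop tests the code length but does not show the update), the zone is treated as half-open when testing containment, and a max_level guard stops the loop if two nodes share a position.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def build_zone(a_pos: Point, neighbor_pos: Point, bound: List[float],
               code: str = "", max_level: int = 32) -> str:
    """Shrink node A's zone until it no longer contains the neighbor.

    bound is [x_min, x_max, y_min, y_max] and is modified in place.
    Returns A's zone code after shrinking.
    """
    def contains(p: Point) -> bool:
        return bound[0] <= p[0] < bound[1] and bound[2] <= p[1] < bound[3]

    while contains(neighbor_pos) and len(code) < max_level:
        if len(code) % 2 == 0:              # next division splits x
            new_bound = (bound[0] + bound[1]) / 2
            if a_pos[0] < new_bound:
                bound[1] = new_bound
                code += "0"                 # assumed code update: A keeps the left half
            else:
                bound[0] = new_bound
                code += "1"
        else:                               # next division splits y
            new_bound = (bound[2] + bound[3]) / 2
            if a_pos[1] < new_bound:
                bound[3] = new_bound
                code += "0"                 # assumed code update: A keeps the bottom half
            else:
                bound[2] = new_bound
                code += "1"
    return code


# Hypothetical positions in a unit-square field.
bound = [0.0, 1.0, 0.0, 1.0]
code = build_zone((0.2, 0.3), (0.4, 0.8), bound)
print(code, bound)
```

In this example the first halving (in x) leaves both nodes in the left half, so a second halving (in y) is needed before the neighbor falls outside A's zone.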

Hashing an Event to a Zone: Example
- Hash event <0.3, 0.8> to a 5-bit zone code
  - Zone code = 01110
- Discussions (optional):
  - Precision
  - What if k < m? Add dummy levels?
  - Ordering of attributes
  - Normalization, value bound tracking, dynamic updates
  - Can the actual code of an event be determined from only the max level of the network?

Routing an Event to its Owner
- The node generating the event calculates code(E) up to its own code length
- GPSR delivers the message to some intermediate node A
  - The message contains: event E, code(E), target location, owner, location of owner
- A encodes the event to code_new(E) (actually only if needed)
  - A updates the message if code_new(E) is longer than the code in the message
- A checks whether code(A) has a longer match with code(E) than the previous owner
  - If yes, A updates the message by setting itself as the owner
- If code(A) and code(E) are identical and A's boundaries are known, A is the owner of E and stores it
- Otherwise A routes E to its owner by invoking GPSR

Resolving Undecided Boundaries
- Suppose node C receives event E
- If code(C) = code(E) and all of C's boundaries are known, C stores the event
- If C has undecided boundaries, there may be a zone overlap with another node
  - C sets itself as owner and forwards the message using GPSR perimeter mode
- If the message is not changed, it comes back to C
  - C assumes it is the owner and stores the event

Resolving Undecided Boundaries (continued)
- An intermediate node X marks itself as the owner, but code(E) is unchanged
  - X recognizes a zone overlap with C, adjusts its boundaries, and sends a message to C to update its boundaries
- An intermediate node D refines code(E)
  - D tries to deliver the message to the new zone
  - Another node X may overlap with C
  - X shrinks its zone and sends C a message to do the same
  - C updates its undecided boundary
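The hashing example and the "longer match" test used during routing can be sketched together. The sketch assumes attribute values are normalized to [0, 1) and implements the hash by repeatedly halving each attribute's interval in round-robin order, which yields the same bits as the threshold rule on the hashing slide. The names event_code, common_prefix_len, and is_better_owner are hypothetical.

```python
from typing import Sequence


def event_code(event: Sequence[float], k: int) -> str:
    """k-bit zone code of an event with normalized attribute values."""
    m = len(event)
    low = [0.0] * m
    high = [1.0] * m
    bits = []
    for i in range(k):
        d = i % m                           # attributes are used in round-robin order
        mid = (low[d] + high[d]) / 2
        if event[d] < mid:
            bits.append("0")
            high[d] = mid
        else:
            bits.append("1")
            low[d] = mid
    return "".join(bits)


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits on which two codes agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def is_better_owner(code_a: str, code_prev_owner: str, code_e: str) -> bool:
    """True if code(A) matches code(E) on a longer prefix than the previous owner's code."""
    return common_prefix_len(code_a, code_e) > common_prefix_len(code_prev_owner, code_e)


print(event_code((0.3, 0.8), 5))            # the slide's example event
print(is_better_owner("0110", "00", "01110"))
```

For the slide's event <0.3, 0.8> with k = 5, the function produces 01110, matching the example. The ownership test is only the comparison step; forwarding and boundary resolution are left out.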

Example
- Nodes A and B have claimed the same zone 0
- Node A generates event E = <0.4, 0.8, 0.9>, code(E) = 0
- The event is forwarded to B in perimeter mode
- B and A engage in a message exchange to shrink their zones
- Mistake in the paper: B shrinks its zone from 0 to 01 according to A's location (not needed, it knows)

Queries: Routing
- Routing for point queries is the same as for event insertion
- Range queries
  - The query is initially routed to the zone corresponding to the entire range
    - Comment: effectively, this means the initial destination of the query is the lowest-level node containing the query
  - Ranges are progressively split into smaller sub-queries so that each sub-query can be resolved by a single node

Splitting Queries
- Node A splits a query if its zone overlaps, but does not entirely contain, the query range
- If the range of Q's first attribute contains the value 0.5, A divides Q into two sub-queries, one restricted to 0 to 0.5 on that attribute and the other to 0.5 to 1
- If a sub-query no longer overlaps zone A, A stops splitting it
- Otherwise A continues splitting the query using successive attribute ranges, recomputing the overlap, until the sub-query is small enough to fit entirely in zone(A)

Splitting Queries: Example
- Suppose there is a node A with code(A) = 0110
- Split the query Q = <0.3–0.8, 0.6–0.9>
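The splitting procedure can be sketched by walking the bits of code(A) and cutting the query at each division point that falls inside it. This is one reading of the slides under stated assumptions: ranges are treated as half-open, attribute values are normalized to [0, 1], and the function name split_query and its return shape (the part that fits in A's zone, plus the parts to forward) are hypothetical.

```python
from typing import List, Optional, Tuple

Range = Tuple[float, float]
Query = List[Range]


def split_query(query: Query, code: str) -> Tuple[Optional[Query], List[Query]]:
    """Split a range query at a node whose zone code is `code`.

    Returns (local, forwarded): `local` is the sub-query that lies inside the
    node's zone (None if nothing does), `forwarded` holds the sub-queries that
    no longer overlap the zone and must be routed elsewhere.
    """
    m = len(query)
    cell = [(0.0, 1.0)] * m                 # attribute-space box of the current zone prefix
    local: Optional[Query] = list(query)
    forwarded: List[Query] = []
    for i, bit in enumerate(code):
        if local is None:
            break
        d = i % m                           # attribute refined by this bit
        lo, hi = cell[d]
        mid = (lo + hi) / 2
        q_lo, q_hi = local[d]
        if q_lo < mid < q_hi:               # range straddles the division: split it
            lower = local[:d] + [(q_lo, mid)] + local[d + 1:]
            upper = local[:d] + [(mid, q_hi)] + local[d + 1:]
            parts = {"0": lower, "1": upper}
        elif q_hi <= mid and q_lo < mid:    # entirely in the lower half
            parts = {"0": local}
        else:                               # entirely in the upper half
            parts = {"1": local}
        local = parts.pop(bit, None)        # keep the part on this node's side
        forwarded.extend(parts.values())    # the rest no longer overlaps the zone
        cell[d] = (lo, mid) if bit == "0" else (mid, hi)
    return local, forwarded


local, forwarded = split_query([(0.3, 0.8), (0.6, 0.9)], "0110")
print(local)
print(forwarded)
```

Run on the slide's example (code(A) = 0110 and Q = <0.3–0.8, 0.6–0.9>), this sketch yields one sub-query that fits in A's zone, <0.3–0.5, 0.6–0.75>, and two that are forwarded, <0.5–0.8, 0.6–0.9> and <0.3–0.5, 0.75–0.9>.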

Query Resolution
- Once a sub-query falls into a zone, the owner node resolves the query and sends the reply to the querier
- The other sub-queries are forwarded to other nodes

Robustness
- Maintaining zones: zone expansion (due to nodes being turned off)
- Dealing with node failures
  - Local replication (sibling zone)
  - Mirror replication (one's complement of the zone code)
- Dealing with packet loss
  - ACK, negative ACK, timeout, selective re-issue

Analysis on DIMs
- Metrics
  - Average insertion cost: average number of messages required to insert an event into the network
    - Discussion: why is the result in the next slide?
  - Average query delivery cost: average number of messages required to route a query message to all the relevant nodes
- Compared against the alternatives, GHT and flooding
  - Discussion: what happens with GHT?
  - Discussion: why the difference between the bounded uniform and exponential query distributions?

Average Insertion Cost (figure)
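The two replication targets named on the Robustness slide can be expressed directly on zone codes. This is only a sketch of the code arithmetic: mirror_code takes the bitwise one's complement as the slide states, while sibling_code (flipping the last bit) is an assumption about what "sibling zone" means when the sibling has not been subdivided further. Neither function says anything about how replicas are actually written or kept consistent.

```python
def mirror_code(code: str) -> str:
    """One's complement of a zone code (the mirror-replication target)."""
    if any(bit not in ("0", "1") for bit in code):
        raise ValueError("zone code must contain only 0 and 1")
    return "".join("1" if bit == "0" else "0" for bit in code)


def sibling_code(code: str) -> str:
    """Code of the same-level sibling zone: same prefix, last bit flipped.

    Assumption: the sibling subtree is a single zone at the same level.
    """
    if not code:
        raise ValueError("the root zone has no sibling")
    return code[:-1] + ("1" if code[-1] == "0" else "0")


print(mirror_code("0110"))
print(sibling_code("0110"))
```

The sibling is geographically adjacent, so a local replica is cheap to write, whereas the mirror zone differs in every bit and so lies on the opposite side of every division of the field.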

Average Query Cost: Bounded Uniform Distribution (figure)

Average Query Cost: Exponential Query Distribution (figure)

Conclusion
- Under reasonable assumptions about query distributions, DIMs scale quite well with network size (both insertion and query costs scale as O(√N))
- Work that still needs to be done
  - Skewed data distribution
  - Existential queries
  - Node heterogeneity

Final Thoughts & Discussions
- Comparison with a B-tree index
  - A B-tree index can only do prefix matches; DIMs can match on any attribute, with distributed and concurrent processing
  - A B-tree can rebalance
  - A DIM is essentially a binary tree, but GPSR routing makes the cost more than log N, and N is the total number of network nodes, not the data size
- Locality
  - The more levels there are, the more finely the values are divided and the worse the locality
  - Even if there are few events (data), they are likely to be widely distributed
  - Possible solution: change the normalization, but that doesn't scale well
- Selectivity
  - Depends on the network node structure
- Insertion cost vs. query cost
  - In sensor networks, insertion cost is a big deal
- Improvements?
  - Distributed caching of query results?