R-Trees. Accessing Spatial Data

Similar documents
Introduction to Indexing R-trees. Hong Kong University of Science and Technology

Multidimensional Indexing The R Tree

I/O-Algorithms Lars Arge Aarhus University

CS127: B-Trees. B-Trees

Summary. 4. Indexes. 4.0 Indexes. 4.1 Tree Based Indexes. 4.0 Indexes. 19-Nov-10. Last week: This week:

2. Dynamic Versions of R-trees

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Chapter 2: The Game Core. (part 2)

X-tree. Daniel Keim a, Benjamin Bustos b, Stefan Berchtold c, and Hans-Peter Kriegel d. SYNONYMS Extended node tree

Spatial Data Management

Spatial Data Management

Course Content. Objectives of Lecture? CMPUT 391: Spatial Data Management Dr. Jörg Sander & Dr. Osmar R. Zaïane. University of Alberta

Data Warehousing & Data Mining

Chap4: Spatial Storage and Indexing. 4.1 Storage:Disk and Files 4.2 Spatial Indexing 4.3 Trends 4.4 Summary

B-Trees & its Variants

Principles of Data Management. Lecture #14 (Spatial Data Management)

Spatial Data Structures

Spatial Data Structures

Spatial Data Structures

Datenbanksysteme II: Multidimensional Index Structures 2. Ulf Leser

Laboratory Module X B TREES

CMPUT 391 Database Management Systems. Spatial Data Management. University of Alberta 1. Dr. Jörg Sander, 2006 CMPUT 391 Database Management Systems

Advances in Data Management Principles of Database Systems - 2 A.Poulovassilis

PCR*-Tree: PCM-aware R*-Tree

R-Tree Based Indexing of Now-Relative Bitemporal Data

Chapter 13: Query Processing

Introduction to Spatial Database Systems

Spatial Data Structures

Chapter 12: Query Processing. Chapter 12: Query Processing

Chapter 12: Indexing and Hashing. Basic Concepts

Accelerating Geometric Queries. Computer Graphics CMU /15-662, Fall 2016

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing

Cost Models for Query Processing Strategies in the Active Data Repository

Intro to DB CHAPTER 12 INDEXING & HASHING

CSIT5300: Advanced Database Systems

What we have covered?

Database index structures

Data Organization and Processing

Massive Data Algorithmics

Chapter 12: Query Processing

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana

CSE 530A. B+ Trees. Washington University Fall 2013

Chapter 11: Indexing and Hashing

Query Processing and Advanced Queries. Advanced Queries (2): R-TreeR

Massive Data Algorithmics

Perfect Hashing Base R-tree for Multiple Queries

I/O-Algorithms Lars Arge

Spatial Data Structures and Speed-Up Techniques. Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology

Level-Balanced B-Trees

Comparing Implementations of Optimal Binary Search Trees

Point Cloud Filtering using Ray Casting by Eric Jensen 2012 The Basic Methodology

Spatial Queries. Nearest Neighbor Queries

CMSC 754 Computational Geometry 1

Chapter 11: Indexing and Hashing

Computer Graphics. Bing-Yu Chen National Taiwan University

Algorithms. Deleting from Red-Black Trees B-Trees

Advanced Data Types and New Applications

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Chapter 12: Indexing and Hashing (Cnt(

Balanced Trees Part One

Experimental Evaluation of Spatial Indices with FESTIval

Computer Graphics. Bing-Yu Chen National Taiwan University The University of Tokyo

R-Tree Based Indexing of Now-Relative Bitemporal Data

amiri advanced databases '05

Main Memory and the CPU Cache

Database System Concepts, 5th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Ray Tracing Acceleration Data Structures

Computational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs

Algorithms in Systems Engineering ISE 172. Lecture 16. Dr. Ted Ralphs

Computational Geometry in the Parallel External Memory Model

CSE332: Data Abstractions Lecture 7: B Trees. James Fogarty Winter 2012

Balanced Binary Search Trees. Victor Gao

Distributed Geometric Data Structures. Philip Levis Stanford Platform Lab Review Feb 9, 2017

External-Memory Algorithms with Applications in GIS - (L. Arge) Enylton Machado Roberto Beauclair

Multimedia Database Systems

Suppose you are accessing elements of an array: ... or suppose you are dereferencing pointers: temp->next->next = elem->prev->prev;

B-Trees. Version of October 2, B-Trees Version of October 2, / 22

Spatial Data Structures

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Announcements. Written Assignment2 is out, due March 8 Graded Programming Assignment2 next Tuesday

File Structures and Indexing

kd-trees Idea: Each level of the tree compares against 1 dimension. Let s us have only two children at each node (instead of 2 d )

Chapter 11: Indexing and Hashing

Planar Point Location

Data Partitioning. Figure 1-31: Communication Topologies. Regular Partitions

Balanced Search Trees

Multidimensional Data and Modelling - DBMS

Lecture Notes: Range Searching with Linear Space

Indexing and Hashing

Database System Concepts

Computational Geometry

Accelerated Raytracing

Graph Partitioning for High-Performance Scientific Simulations. Advanced Topics Spring 2008 Prof. Robert van Engelen

Clustering Billions of Images with Large Scale Nearest Neighbor Search

Chapter 25: Spatial and Temporal Data and Mobility

A Real Time GIS Approximation Approach for Multiphase Spatial Query Processing Using Hierarchical-Partitioned-Indexing Technique

Indexing and Hashing

Transcription:

R-Trees Accessing Spatial Data

In the beginning The B-Tree provided a foundation for R- Trees. But what s a B-Tree? A data structure for storing sorted data with amortized run times for insertion and deletion Often used for data stored on long latency I/O (filesystems and DBs) because child nodes can be accessed together (since they are in order)

B-Tree From wikipedia

What s wrong with B-Trees B-Trees cannot store new types of data Specifically people wanted to store geometrical data and multi-dimensional data The R-Tree provided a way to do that (thanx to Guttman 84)

R-Trees R-Trees can organize any-dimensional data by representing the data by a minimum bounding box. Each node bounds it s children. A node can have many objects in it The leaves point to the actual objects (stored on disk probably) The height is always log n (it is height balanced)

R-Tree Example From http://lacot.org/public/enst/bda/img/schema1.gif

Operations Searching: look at all nodes that intersect, then recurse into those nodes. Many paths may lead nowhere Insertion: Locate place to insert node through searching and insert. If a node is full, then a split needs to be done Deletion: node becomes underfull. Reinsert other nodes to maintain balance.

Splitting Full Nodes Linear choose far apart nodes as ends. Randomly choose nodes and assign them so that they require the smallest MBR enlargement Quadratic choose two nodes so the dead space between them is maximized. Insert nodes so area enlargement is minimized Exponential search all possible groupings Note: Only criteria is MBR area enlargement

Demo How can we visualize the R-Tree By clicking here

Variants - R+ Trees Avoids multiple paths during searching. Objects may be stored in multiple nodes MBRs of nodes at same tree level do not overlap On insertion/deletion the tree may change downward or upward in order to maintain the structure

R+ Tree http://perso.enst.fr/~saglio/bdas/epfl0525/sld041.htm

Variants: Hilbert R-Tree Similar to other R-Trees except that the Hilbert value of its rectangle centroid is calculated. That key is used to guide the insertion On an overflow, evenly divide between two nodes Experiments has shown that this scheme significantly improves performance and decreases insertion complexity. Hilbert R-tree achieves up to 28% saving in the number of pages touched compared to R*-tree.

Hilbert Value?? The Hilbert value of an object is found by interleaving the bits of its x and y coordinates, and then chopping the binary string into 2-bit strings. Then, for every 2-bit string, if the value is 0, we replace every 1 in the original string with a 3, and vice-versa. If the value of the 2-bit string is 3, we replace all 2 s and 0 s in a similar fashion. After this is done, you put all the 2-bit strings back together and compute the decimal value of the binary string; This is the Hilbert value of the object. http://www-users.cs.umn.edu/research/shashi-group/cs8715/exercise_ans.doc

R*-Tree The original R-Tree only uses minimized MBR area to determine node splitting. There are other factors to consider as well that can have a great impact depending on the data By considering the other factors, R*-Trees become faster for spatial and point access queries.

Problems in original R-Tree Because the only criteria is to minimize area 1. Certain types of data may create small areas but large distances which will initiate a bad split. 2. If one group reaches a maximum number of entries, the rest of assigned without consideration of their geometry. Greene tried to solve, but he only used the split axis more criteria needs to be used

Splitting overfilled nodes Why is this overfull?

R*-Tree Parameters 1. Area covered by a rectangle should be minimized 2. Overlap should be minimized 3. The sum of the lengths of the edges (margins) should be minimized 4. Storage utilization should be maximized (resulting in smaller tree height)

Splitting in R*-Trees 1) Entries are sorted by their lower value, then their upper value of their rectangles. All possible distributions are determined 2) Compute the sum of the margin values and choose the axis with the minimum as the split axis 3) Along the split axis, choose the distribution with the minimum overlap 4) Distribute entries into these two groups

Deleting and Forced Re-insertion Experimentally, it was shown that re-inserting data provided large (20-50%) improvement in performance. Thus, randomly deleting half the data and reinserting is a good way to keep the structure balanced.

Results Lots of data sets and lots of query types. One example: Real Data: MBRs of elevation lines. 100K objects Query Storage util. After build up Disk accesses On insert

RC-Trees Changing motivations: Memory large enough to store objects It s possible to store the object geometry and not just the MBR representation. Data is dynamic and transient Spatial objects naturally overlap (ie: stock market triggers)

RC-Trees Take advantage of dynamic segmentation If the original geometry is thrown away, then later on the MBR cannot be modified to represent new changes to the tree RC Tree does 1. Clipping 2. Domain Reduction 3. Rebalancing

Discriminators A discriminator is used to decide (in binary) which direction a node should go in. (It means it s a binary tree, unlike other R-Trees) It partitions the space If an object intersects a discriminator, the object can be clipped into two parts When an object is clipped, the space it takes up (in terms of its MBR) is reduced (aka domain reduction) This allows for removal of dead space and faster point query lookups

Domain Reduction and Clipping

Operations Insert, Delete and Search are straightforward What happens on an node that has been overflowed? Choose a discriminator to partition the object into balanced sets How is a discriminator chosen?

Partitioning Two methods for finding a discriminator for a partition RC-MID faster, but ignores balancing and clipping. Uses pre-computed data to determine and average discriminator. Problems? Different distributions greatly affect partition Space requirements can be huge

Partitioning Take 2 RC-SWEEP sorts objects. Candidates for discriminators are the boundaries of the MBRs Assign a weight to each candidate using a formula not shown here Choose the minimum Problems? Slower, but space costs much better than RC-MID (which keeps info about nodes)

Rebuilding The tree can take a certain degree of flexibility in its structure before needing to be rebalanced On an insert, check if the height is too imbalanced If so, go to the imbalanced subtree and flush the items, sort and call split on them to get a better balancing

Experimentation CPU execution time not a good measure. (although they still calculate it) Instead use number of discriminators compared Lots of results Result summary: Insertion a little more expensive (because of possible rebalancing) Querying for point or spatial data faster (and fewer memory accesses) than all previous incarnations Storage requirements not that bad Dynamic segmentation (ie recalculating MBRs) can help a lot Controlling space with γ factor (by disallowing further splitting) controls space costs