Chap4: Spatial Storage and Indexing. 4.1 Storage:Disk and Files 4.2 Spatial Indexing 4.3 Trends 4.4 Summary

Learning Objectives (LO)
- LO1: Understand the concept of a physical data model
  - What is a physical data model?
  - Why learn about physical data models?
- LO2: Learn how to structure data files
- LO3: Learn how to use auxiliary data structures
- Focus on concepts, not procedures!
Mapping sections to learning objectives
- LO2: Section 4.1
- LO3: Section 4.2

Physical model in 3-level design
Recall the 3 levels of database design:
- Conceptual model: high-level abstract description
- Logical model: description of a concrete realization
- Physical model: implementation using basic components
Analogy with vehicles:
- Conceptual model: mechanisms to move, turn, stop, ...
- Logical models:
  - Car: accelerator pedal, steering wheel, brake pedal, ...
  - Bicycle: pedal forward to move, turn handle, pull brakes on handle
- Physical models:
  - Car: engine, transmission, master cylinder, brake lines, brake pads, ...
  - Bicycle: chain from pedal to wheels, gears, wire from handle to brake pads
We now go, so to speak, under the hood.

What is a physical data model?
What is a physical data model of a database?
- Concepts to implement the logical data model
- Using current components, e.g. computer hardware, operating systems
- In an efficient and fault-tolerant manner
Why learn physical data model concepts?
- To be able to choose between DBMS brands
  - Some brands do not have spatial indices!
- To be able to use DBMS facilities for performance tuning
  - For example, if a query is running slowly, one may create an index to speed it up
  - For example, if loading a large number of tuples takes forever, one may drop the indices on the table before the inserts and recreate them after the inserts are done
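The bulk-load tuning tip can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table `store`, the index name `store_x_idx`, and the generated rows are hypothetical, and the index is SQLite's ordinary one-dimensional B-tree index, not a spatial index.

```python
import sqlite3

# Hypothetical table and index names, chosen only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store (id INTEGER PRIMARY KEY, name TEXT, x REAL, y REAL)")
conn.execute("CREATE INDEX store_x_idx ON store (x)")

def bulk_load(conn, rows):
    """Drop the index, insert all rows, then rebuild the index once."""
    conn.execute("DROP INDEX IF EXISTS store_x_idx")
    conn.executemany("INSERT INTO store VALUES (?, ?, ?, ?)", rows)
    conn.execute("CREATE INDEX store_x_idx ON store (x)")
    conn.commit()

# Synthetic rows standing in for a large load.
rows = [(i, f"store-{i}", float(i % 100), float(i // 100)) for i in range(10_000)]
bulk_load(conn, rows)

row_count = conn.execute("SELECT COUNT(*) FROM store").fetchone()[0]
index_names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'store'")]
```

Rebuilding the index once after the load avoids updating it on every single insert; whether this pays off in practice depends on the DBMS and the data volume.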

Concepts in a physical data model
Database concepts:
- Conceptual data model: entity, (multi-valued) attributes, relationship, ...
- Logical model: relations, atomic attributes, primary and foreign keys
- Physical model: secondary storage hardware, file structures, indices, ...
Examples of physical model concepts from relational DBMS:
- Secondary storage hardware: disk drives
- File structures: sorted files
- Auxiliary search structures: search trees (hierarchical collections of one-dimensional ranges)

An interesting fact about physical data models
Physical data model design is a trade-off between:
- Efficient support for a small set of basic operations on a few data types
- Simplicity of the overall system
Each DBMS physical model:
- Chooses a few physical DM techniques
- The choice depends on the chosen sets of operations and data types
Relational DBMS physical model:
- Data types: numbers, strings, date, currency
  - one-dimensional, totally ordered
- Operations:
  - search on one-dimensional, totally ordered data types
  - insert, delete, ...

Common Spatial Queries and Operations
The physical model provides the simpler operations needed by spatial queries!
Common queries:
- Range query
- Nearest neighbor
- Spatial-join query
- Others (closest-pair query, color range query, etc.)
Example schema: a big company with a lot of stores and warehouses
  Store(Id int, Name char(30), Location Point)
  Warehouse(Id int, Name char(30), Location Point)

Range query
Find all objects contained in a rectangle/circle
Ex. Find all warehouses at distance < 50 km from location (0,0):
  Select Id
  From Warehouse
  Where distance(Warehouse.Location, Point(0,0)) < 50;

Nearest neighbor query
Find the object(s) closest to another object
Ex. Find the store closest to store 101 (store 101 itself must be excluded, otherwise the minimum distance is always 0):
  Select s2.Id
  From Store s1, Store s2
  Where s1.Id = 101 and s2.Id <> s1.Id
    and distance(s1.Location, s2.Location) =
        (Select min(distance(s1.Location, s3.Location))
         From Store s3
         Where s3.Id <> s1.Id);

Spatial-join query
Find pairs of objects satisfying a property
Ex. Find all store-warehouse pairs with distance < 10 km:
  Select Store.Id, Warehouse.Id
  From Store, Warehouse
  Where distance(Store.Location, Warehouse.Location) < 10;
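As a baseline, the range, nearest-neighbor, and spatial-join queries can each be answered by scanning every object. The sketch below assumes planar Euclidean distance and uses made-up store and warehouse locations; the function names are illustrative and not part of any DBMS API.

```python
import math

# Hypothetical sample data: id -> (x, y) location in km.
stores = {101: (0.0, 0.0), 102: (3.0, 4.0), 103: (30.0, 40.0)}
warehouses = {1: (6.0, 8.0), 2: (60.0, 80.0), 3: (31.0, 41.0)}

def distance(p, q):
    """Euclidean distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def range_query(objects, center, radius):
    """Ids of all objects strictly closer than radius to center."""
    return sorted(i for i, loc in objects.items() if distance(loc, center) < radius)

def nearest_neighbor(objects, query_id):
    """Id of the object closest to objects[query_id], excluding itself."""
    q = objects[query_id]
    others = (i for i in objects if i != query_id)
    return min(others, key=lambda i: distance(objects[i], q))

def spatial_join(left, right, max_dist):
    """All (left id, right id) pairs closer than max_dist (nested loop)."""
    return sorted((l, r) for l, lp in left.items()
                  for r, rp in right.items() if distance(lp, rp) < max_dist)
```

Each function touches every object (or every pair, for the join), which is the cost that the file structures and indices in the rest of the chapter aim to reduce.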

Other types of queries
- Closest-pair query: find the closest pair (i.e., with minimum distance) between a store and a warehouse (Corral et al., 2000)
- Color range query: what types of objects (e.g., stores, warehouses) are inside a rectangle/circle?
  - Find not the objects themselves, but their types (Nanopoulos et al., 2001)
- Computational geometry has many interesting queries
  - Not all of them have been transferred to the SDB realm

Learning Objectives (LO)
- LO1: Understand the concept of a physical data model
- LO2: Learn how to structure data files
  - What is a file structure?
  - Why structure files?
  - What are common structures for spatial data files?
- LO3: Learn how to use auxiliary data structures
Mapping sections to learning objectives
- LO2: Section 4.1
- LO3: Section 4.2

4.1.4 File Structures
What is a file structure?
- A method of organizing records in a file
- For efficient implementation of common file operations on disks
- Example: ordered files
Measures of efficiency:
- I/O cost: number of disk sectors retrieved from secondary storage
- CPU cost: number of CPU instructions used
Two basic categories of file structures in SDB:
- Point Access Methods (objects are strictly points)
- Spatial Access Methods (objects have spatial extent)

Point Access Methods (PAM)
PAM: index only point data
- Multidimensional hashing
- Hierarchical (tree-based) structures
- Space-filling curves

The problem
Given a point set and a rectangular query, find the points enclosed in the query
[Figure: a point set with a rectangular query region]

Grid File
- Hashing method for multidimensional points (an extension of extendible hashing)
- Idea: use a grid to partition the space; each cell is associated with one page
- Two-disk-access principle (exact match)

Grid File
- Select dividers along each dimension; partition the space into cells
- Dividers cut all the way across the space
- Each cell corresponds to 1 disk page
- Many cells can point to the same page
- The cell directory is potentially exponential in the number of dimensions

Example
[Figure: a grid directory with linear scales along X and Y; directory cells point to buckets (disk blocks)]

Grid File Search
Exact match search: at most 2 I/Os, assuming the linear scales fit in memory
- First use the linear scales to determine the index into the cell directory
- Access the cell directory to retrieve the bucket address (may cause 1 I/O if the cell directory does not fit in memory)
- Access the appropriate bucket (1 I/O)
Range queries:
- Use the linear scales to determine the indices into the cell directory
- Access the cell directory to retrieve the addresses of the buckets to visit
- Access the buckets
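The two search procedures can be sketched in memory as follows. This is a minimal illustration, not a disk-based implementation: the class name `GridFile`, the dictionary-based directory, and the sample dividers and points are all assumptions made for the example.

```python
from bisect import bisect_right

class GridFile:
    """In-memory stand-in for a 2-D grid file (illustrative only)."""

    def __init__(self, x_scale, y_scale, directory, buckets):
        self.x_scale = x_scale      # sorted dividers along X (linear scale, in memory)
        self.y_scale = y_scale      # sorted dividers along Y (linear scale, in memory)
        self.directory = directory  # (i, j) cell -> bucket id
        self.buckets = buckets      # bucket id -> list of points (one "disk page")

    def cell_of(self, x, y):
        """Use the linear scales to find the cell index of a point."""
        return bisect_right(self.x_scale, x), bisect_right(self.y_scale, y)

    def exact_match(self, x, y):
        bucket_id = self.directory[self.cell_of(x, y)]  # directory lookup: at most 1 I/O
        return (x, y) in self.buckets[bucket_id]        # bucket read: 1 I/O

    def range_query(self, x_lo, x_hi, y_lo, y_hi):
        i_lo, j_lo = self.cell_of(x_lo, y_lo)
        i_hi, j_hi = self.cell_of(x_hi, y_hi)
        # Several cells may share a bucket, so collect distinct bucket ids.
        bucket_ids = {self.directory[(i, j)]
                      for i in range(i_lo, i_hi + 1)
                      for j in range(j_lo, j_hi + 1)}
        return sorted(p for b in bucket_ids for p in self.buckets[b]
                      if x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi)

# Hypothetical 2 x 2 grid; the two bottom cells share bucket "A".
grid = GridFile(
    x_scale=[5], y_scale=[5],
    directory={(0, 0): "A", (1, 0): "A", (0, 1): "B", (1, 1): "C"},
    buckets={"A": [(1, 1), (7, 2)], "B": [(2, 8)], "C": [(6, 9)]},
)
```

For instance, `grid.exact_match(7, 2)` resolves the cell from the scales, follows the directory to bucket "A", and scans only that bucket.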

Tree-based PAMs
- Most tree-based PAMs are based on the kd-tree
- The kd-tree is a main-memory binary tree for indexing k-dimensional points
  - It needs to be adapted for the disk model
- Levels rotate among the dimensions, partitioning the space based on a value for that dimension
- A kd-tree is not necessarily balanced

Example
At each level we use a different dimension
[Figure: a kd-tree whose levels alternate between splits on x and splits on y, next to the corresponding partition of the plane]

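Kd-tree properties
- Height of the tree: $O(\log n)$ when the tree is balanced
- Search time for exact match: $O(\log n)$
- Search time for range query: $O(n^{1/2} + k)$ in two dimensions, where $k$ is the number of points reported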
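A small main-memory kd-tree shows how the levels rotate among the dimensions and how a range query prunes subtrees. This sketch assumes the tree is built once by median splitting (which keeps it balanced); the names `Node`, `build`, and `range_search` and the sample points are chosen for illustration.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    point: tuple
    axis: int                 # dimension compared at this level
    left: Optional["Node"]    # points with coordinate <= point[axis]
    right: Optional["Node"]   # points with coordinate >= point[axis]

def build(points, depth=0):
    """Build a balanced kd-tree by splitting on the median of the current axis."""
    if not points:
        return None
    axis = depth % len(points[0])  # rotate among the dimensions
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))

def range_search(node, lo, hi, out=None):
    """Collect all points p with lo[d] <= p[d] <= hi[d] in every dimension d."""
    if out is None:
        out = []
    if node is None:
        return out
    if all(l <= c <= h for l, c, h in zip(lo, node.point, hi)):
        out.append(node.point)
    split = node.point[node.axis]
    if lo[node.axis] <= split:   # the query reaches into the left half-space
        range_search(node.left, lo, hi, out)
    if split <= hi[node.axis]:   # the query reaches into the right half-space
        range_search(node.right, lo, hi, out)
    return out

# Hypothetical 2-D points.
points = [(3, 5), (7, 6), (5, 2), (8, 2), (2, 8), (6, 9)]
tree = build(points)
```

A subtree is skipped whenever the query rectangle lies entirely on the other side of the splitting value, which is where the savings over a full scan come from.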

Space Filling Curves: Z-ordering
- Map points from 2 dimensions to 1 dimension, then use a B+-tree to index the 1-dimensional values
- Basic assumption: finite precision in the representation of each coordinate, $K$ bits ($2^K$ values)
- The address space is a square (image), represented as a $2^K \times 2^K$ array
- Each element is called a pixel

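Z-ordering
- Impose a linear ordering on the pixels of the image, which reduces the problem to 1 dimension
- Example of shuffling (bit interleaving) for two points A and B on a 4 x 4 grid with coordinates 00 to 11 on each axis:
  - $Z_A = \mathrm{shuffle}(x_A, y_A) = \mathrm{shuffle}(01, 11) = 0111 = (7)_{10}$
  - $Z_B = \mathrm{shuffle}(01, 01) = 0011 = (3)_{10}$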
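The shuffle operation can be written as a short bit-interleaving routine. This sketch assumes the x bit comes first in each pair, which is the order that reproduces the two example values; the function name and the `bits` parameter are illustrative.

```python
def shuffle(x: int, y: int, bits: int) -> int:
    """Interleave the bits of x and y (x bit first), most significant bit first."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((x >> i) & 1)  # next bit of x
        z = (z << 1) | ((y >> i) & 1)  # next bit of y
    return z

z_a = shuffle(0b01, 0b11, bits=2)  # point A
z_b = shuffle(0b01, 0b01, bits=2)  # point B
```

The resulting integers are totally ordered, so they can be stored as keys in an ordinary B+-tree.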

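Queries
Find the z-values contained in the query region, then group them into ranges of consecutive values
[Figure: 4 x 4 grid, coordinates 00 to 11 on each axis, with query rectangles $Q_A$ and $Q_B$]
- $Q_A$: range [4, 7]
- $Q_B$: ranges [2, 3] and [8, 9]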
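The translation of a query rectangle into z-value ranges can be sketched by enumerating the covered pixels and merging consecutive z-values. The rectangle coordinates used here are inferred from the stated ranges under an x-bit-first shuffle (Q_A covering x in 00..01, y in 10..11; Q_B covering x in 01..10, y in 00..01), so treat them as assumptions about the figure.

```python
def shuffle(x: int, y: int, bits: int) -> int:
    """Interleave the bits of x and y (x bit first), most significant bit first."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z

def z_ranges(x_lo, x_hi, y_lo, y_hi, bits):
    """Z-value ranges covering the pixel rectangle [x_lo, x_hi] x [y_lo, y_hi]."""
    zs = sorted(shuffle(x, y, bits)
                for x in range(x_lo, x_hi + 1)
                for y in range(y_lo, y_hi + 1))
    ranges = []
    for z in zs:
        if ranges and z == ranges[-1][1] + 1:
            ranges[-1][1] = z      # extend the current run of consecutive values
        else:
            ranges.append([z, z])  # start a new run
    return [tuple(r) for r in ranges]

q_a = z_ranges(0b00, 0b01, 0b10, 0b11, bits=2)
q_b = z_ranges(0b01, 0b10, 0b00, 0b01, bits=2)
```

Each resulting range becomes one one-dimensional range scan on the B+-tree. Enumerating every pixel is only practical for small grids; it is used here to keep the sketch short.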

Learning Objectives (LO)
- LO1: Understand the concept of a physical data model
- LO2: Learn how to structure data files
- LO3: Learn how to use auxiliary data structures
  - Concept of an index
  - Spatial indices, e.g. the R-tree family
- Focus on concepts, not procedures!
Mapping sections to learning objectives
- LO2: Section 4.1
- LO3: Section 4.2

What is an index?
Concept of an index: an auxiliary file used to search a data file
- Example: Fig. 4.10
- Index records have a key value and the address of the relevant data sector (see the arrows in Fig. 4.10)
- Index records are ordered, so find, findnext, and insert are fast
- Note the assumption of a total order on the values of the indexed attributes
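An ordered index over a totally ordered key can be imitated with a sorted list and binary search. This is a toy in-memory sketch: the class `OrderedIndex`, its method names, and the sample keys and sector addresses are assumptions, and a real DBMS would typically use a disk-based structure such as a B+-tree instead.

```python
from bisect import bisect_left, bisect_right

class OrderedIndex:
    """Sorted (key, sector address) records searched by binary search."""

    def __init__(self):
        self.keys = []       # kept in sorted order
        self.addresses = []  # addresses[i] is the data sector for keys[i]

    def insert(self, key, address):
        i = bisect_left(self.keys, key)  # position found in O(log n)
        self.keys.insert(i, key)         # list shifting is O(n) in this toy version
        self.addresses.insert(i, address)

    def find(self, key):
        """Address of the record with this key, or None if absent."""
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.addresses[i]
        return None

    def find_next(self, key):
        """(key, address) of the smallest key strictly greater than key, or None."""
        i = bisect_right(self.keys, key)
        if i < len(self.keys):
            return self.keys[i], self.addresses[i]
        return None

# Hypothetical keys and sector addresses.
index = OrderedIndex()
for key, sector in [(30, "s3"), (10, "s1"), (20, "s2")]:
    index.insert(key, sector)
```

Binary search only works because any two keys can be compared; two-dimensional locations have no such natural total order, which is the gap the spatial access methods address.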

Spatial Access Methods (SAMs)
- Indexes for spatial data that have extent (not only point data)
- Use only Minimum Bounding Rectangles (MBRs) of the objects (filtering)
- The R-tree (Guttman, 1984) is the prominent SAM
  - Implemented in Oracle, Postgres, Informix
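The filtering role of MBRs can be sketched as follows. The rectangle layout `(x_lo, y_lo, x_hi, y_hi)`, the helper names, and the two sample objects are assumptions made for the example.

```python
def mbr(points):
    """Minimum bounding rectangle (x_lo, y_lo, x_hi, y_hi) of a point list."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def intersects(a, b):
    """True if two closed rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def filter_step(objects, query):
    """Names of objects whose MBR intersects the query rectangle (candidates only)."""
    return [name for name, pts in objects.items() if intersects(mbr(pts), query)]

# Hypothetical objects given by their vertices.
objects = {
    "road": [(0, 0), (10, 10)],            # a diagonal line segment
    "lake": [(20, 20), (25, 22), (22, 27)],
}
candidates = filter_step(objects, (8, 0, 10, 2))
```

Here the road's MBR overlaps the query although the diagonal segment itself does not pass through it, so candidates from the filter step still need to be checked against the exact geometry.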

R-Tree
- A multi-way external-memory tree
- Index nodes and data (leaf) nodes
- All leaf nodes appear on the same level
- Every node contains between m and M entries
- The root node has at least 2 entries (children), unless it is a leaf

Example
E.g., with fanout F = 4: group nearby rectangles into parent MBRs; each group is stored in one disk page
[Figure: data rectangles A to J grouped into parent MBRs P1 = {A, B}, P2 = {C, D, E}, P3 = {F, G}, P4 = {H, I, J}; the root node holds the entries P1, P2, P3, P4, each pointing to the leaf with its group]

R-trees: Search
[Figure: the example tree with a query rectangle; the search descends into every child whose MBR intersects the query]
A query may follow multiple branches
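A recursive search over a two-level tree illustrates why a query may follow multiple branches. The grouping of A to J into P1 to P4 follows the example, but the rectangle coordinates, the `Node` class, and the helper names are invented for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)  # (mbr, child Node) or (mbr, object id)

def intersects(a, b):
    """True if two closed rectangles (x_lo, y_lo, x_hi, y_hi) share a point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def cover(rects):
    """MBR of a list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def search(node, query):
    """Ids of all leaf entries whose MBR intersects the query rectangle."""
    hits = []
    for mbr, child in node.entries:
        if not intersects(mbr, query):
            continue  # prune this branch
        if node.is_leaf:
            hits.append(child)
        else:
            hits.extend(search(child, query))
    return hits

# Hypothetical coordinates for the rectangles A..J.
RECTS = {
    "A": (0, 6, 2, 8), "B": (1, 8, 3, 10),
    "C": (4, 5, 6, 7), "D": (5, 7, 7, 9), "E": (6, 4, 8, 6),
    "F": (9, 0, 11, 2), "G": (10, 2, 12, 4),
    "H": (1, 0, 3, 2), "I": (2, 2, 4, 4), "J": (3, 0, 5, 1),
}
GROUPS = [("A", "B"), ("C", "D", "E"), ("F", "G"), ("H", "I", "J")]  # P1..P4
leaves = [Node(True, [(RECTS[n], n) for n in group]) for group in GROUPS]
root = Node(False, [(cover([m for m, _ in leaf.entries]), leaf) for leaf in leaves])

result = search(root, (3, 3, 5, 6))
```

With these coordinates the query rectangle touches the MBRs of P1, P2, and P4, so three leaves are read even though only C and I are returned; P1 contributes nothing because parent MBRs also cover empty space.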

R-trees: Insertion
- Insert the new MBR in a leaf
- Find the leaf to insert into by searching, starting from the root
- How to find the next node on the way down? Use ChooseLeaf:
  - Pick the entry that needs the least enlargement to include the new MBR
  - Resolve ties by choosing the entry with the smallest area
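The ChooseLeaf rule can be sketched as a descent that minimizes area enlargement at each level. The dictionary-based node layout, the helper names, and the sample MBRs are assumptions for illustration only.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    """MBR of two rectangles given as (x_lo, y_lo, x_hi, y_hi)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def enlargement(mbr, new_mbr):
    """Extra area needed for mbr to also cover new_mbr."""
    return area(union(mbr, new_mbr)) - area(mbr)

def choose_leaf(node, new_mbr):
    """Descend from node to the leaf whose path needs the least enlargement."""
    while not node["leaf"]:
        # Least enlargement first; ties broken by the smallest area.
        _, node = min(node["entries"],
                      key=lambda e: (enlargement(e[0], new_mbr), area(e[0])))
    return node

# Hypothetical two-level tree: the root holds (MBR, leaf) entries.
leaf = {name: {"leaf": True, "name": name, "entries": []}
        for name in ("P1", "P2", "P3", "P4")}
root = {"leaf": False, "entries": [
    ((0, 6, 3, 10), leaf["P1"]), ((4, 4, 8, 9), leaf["P2"]),
    ((9, 0, 12, 4), leaf["P3"]), ((1, 0, 5, 4), leaf["P4"]),
]}
chosen = choose_leaf(root, (2, 10, 3, 11))
```

With these numbers, covering the new rectangle costs P1 an enlargement of 3 area units and every other entry far more, so the descent ends in P1.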

R-trees: Insertion (examples)
- Insert X [Figure: rectangle X added to the example tree]
- Insert Y, then extend the parent MBR so that it covers Y [Figure: rectangle Y added to the example tree]
- If the node is full, then split. Ex. insert W [Figure: the example tree with an additional rectangle K, and the new rectangle W]

R-trees: Split
Split node P1: partition the MBRs into two groups
[Figure: node P1 containing K, A, C, B and the new rectangle W]
- A1: linear split
- A2: quadratic split
- A3: exponential split: $2^{M-1}$ choices

R-trees: Split
- Pick two rectangles as seeds
- Assign each rectangle R to the closest seed
  - "closest": the smallest increase in area
[Figure: rectangle R between seed1 and seed2]

R-trees: Split
How to pick seeds:
- Linear: in each dimension, find the entry with the highest low side and the entry with the lowest high side, normalize their separation by the extent of the whole set along that dimension, and choose the pair with the greatest normalized separation
- Quadratic: for each pair E1 and E2, calculate the rectangle J = MBR(E1, E2) and d = area(J) - area(E1) - area(E2). Choose the pair with the largest d
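The quadratic seed selection, followed by assigning each remaining rectangle to the seed group that grows least, can be sketched as below. This is a simplified version: it processes the remaining rectangles in input order and does not enforce the minimum fill m, both of which the full algorithm handles. The function names and sample rectangles are illustrative.

```python
from itertools import combinations

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    """MBR of two rectangles given as (x_lo, y_lo, x_hi, y_hi)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def pick_seeds(rects):
    """Indices of the pair with the largest d = area(J) - area(E1) - area(E2)."""
    def waste(pair):
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=waste)

def split(rects):
    """Partition rects into two groups around the two quadratic seeds."""
    i, j = pick_seeds(rects)
    groups = [[rects[i]], [rects[j]]]
    covers = [rects[i], rects[j]]  # current MBR of each group
    for k, r in enumerate(rects):
        if k in (i, j):
            continue
        growth = [area(union(c, r)) - area(c) for c in covers]
        g = 0 if growth[0] <= growth[1] else 1  # smallest increase in area
        groups[g].append(r)
        covers[g] = union(covers[g], r)
    return groups

# Hypothetical overflowing node: three rectangles near the origin, two far away.
rects = [(0, 0, 1, 1), (1, 1, 2, 2), (0, 1, 1, 2), (8, 8, 9, 9), (9, 8, 10, 9)]
seeds = pick_seeds(rects)
groups = split(rects)
```

The seeds are the two rectangles that would waste the most area if kept together, so they end up in different groups and the remaining rectangles cluster around them.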

R-trees: Insertion (the complete algorithm)
- Use ChooseLeaf to find the leaf node in which to insert an entry E
- If the leaf node is full, then Split; otherwise insert E there
- Propagate the split upwards, if necessary
- Adjust the parent nodes

R-trees: Deletion
- Find the leaf node that contains the entry E
- Remove E from this node
- If the node underflows:
  - Eliminate the node by removing its remaining entries and its entry in the parent
  - Reinsert the orphaned entries (the node's other entries) into the tree using Insert

R-trees: Variations
- R+-tree: does not allow overlapping, so objects are split (similar to z-values)
- R*-tree: changes the insertion and deletion algorithms (minimizes not only area but also perimeter; forced reinsertion)
- Hilbert R-tree: uses Hilbert values to insert objects into the tree

Summary
- The physical DM efficiently implements the logical DM on computer hardware
- The physical DM has file structures and indexes
- Classical methods were designed for data with a total ordering
  - They fall short in handling spatial data, because spatial data is multi-dimensional
- Two approaches to support spatial data and queries:
  - Reuse classical methods: use space-filling curves to impose a total order on multi-dimensional data
  - Use new methods: R-trees, grid files