Multidimensional Data and Modelling (grid technique)


Grid file
Database usage and integrated information systems are growing, so file structures must provide efficient access to records. How? By combining attribute values (multikeys). But traditional file structures that provide multikey access to records are extensions of file structures originally designed for single-key access. They therefore show various deficiencies, in particular for multikey access to highly dynamic files.

Grid file
- Problem: spatial queries in k-d point sets
- Main idea: try to generalize hashing to k-d

Initial approach to locational data
[Figure: point set plotted on x and y axes over the square with corners (0,0), (100,0), (0,100), (100,100)]
- Toronto (62,77)
- Buffalo (82,65)
- Chicago (35,42)
- Omaha (27,35)
- Memphis (45,30)
- Mobile (52,10)
- Atlanta (85,15)
- Miami (90,5)

Traditional single-key access

Grid file
- Special kind of hashing
- Adaptable: with respect to inserts and deletes
- Efficient query handling
- Dynamic: access time is uniform (two-disk-access principle)
- Symmetric: no secondary keys; every key is treated as a primary key
- Multikey: records are accessed using a subset of the keys

Grid file
- A: put a grid
- Specs [Nievergelt+, 84] (Jurg Nievergelt):
  - symmetric with respect to all attributes
  - 2 disk accesses for exact-match queries
  - adaptive to non-uniform distributions
- Q: details?

Grid file

Grid file

Grid file
- Useful for range queries, which map to the set of cells corresponding to a group of values along the linear scales.
- Can be applied to any number of search keys.
- n search keys => n dimensions.
- Reduces access time for multiple-key queries.

Grid file
How?
- Divide the record space into grid blocks

Grid file
- Allocates storage in units of fixed size: disk blocks/pages/buckets
- How to map grid blocks to buckets? Use a grid directory
- Two-disk-access principle: retrieve a single record in at most 2 disk accesses
  - Access the directory (grid)
  - Access the bucket (database)
- Efficient range queries
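The two-disk-access lookup can be sketched in a few lines of Python. This is a minimal illustration, not the structure from the slides: the scales, the directory contents, the bucket ids, and the names `cell_of` and `lookup` are all hypothetical, the linear scales are assumed small enough to stay in memory, and the two "disk accesses" are simulated with plain in-memory lookups.

```python
from bisect import bisect_right

# Hypothetical linear scales (interior boundaries), one per key.
YEAR_SCALE = [1500, 1750]   # intervals: < 1500, 1500-1749, >= 1750
NAME_SCALE = ["k", "s"]     # intervals: a-j, k-r, s-z

# Hypothetical grid directory: DIRECTORY[cx][cy] -> bucket id.
# Several cells may share one bucket as long as they form a box.
DIRECTORY = [
    ["A", "A", "A"],
    ["B", "C", "C"],
    ["D", "D", "E"],
]

# Hypothetical buckets holding (year, initial) records.
BUCKETS = {
    "A": [(1200, "c")],
    "B": [(1600, "d")],
    "C": [(1700, "m")],
    "D": [(1800, "b"), (1900, "p")],
    "E": [(1980, "w")],
}


def cell_of(year, name):
    """Map a key pair to its directory cell using the in-memory scales."""
    return bisect_right(YEAR_SCALE, year), bisect_right(NAME_SCALE, name)


def lookup(year, name):
    """Exact-match query: one directory access, then one bucket access."""
    cx, cy = cell_of(year, name)
    bucket_id = DIRECTORY[cx][cy]          # access 1: grid directory
    bucket = BUCKETS[bucket_id]            # access 2: data bucket
    return [record for record in bucket if record == (year, name)]
```

With this toy data, `lookup(1980, "w")` resolves to cell (2, 2) and reads only bucket E.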

Grid Directory (k=2)

Single Record Access [1980,w]

Range Query
- [1450-1600, c-g]
- Different buckets?
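A range query maps each side of the query box onto the linear scales, visits the covered cells, and reads each distinct bucket once. The sketch below assumes hypothetical scales, directory, and bucket contents (none of them taken from the slide figures) and a hypothetical function name `range_query`.

```python
from bisect import bisect_right

# Hypothetical scales, directory and buckets, for illustration only.
YEAR_SCALE = [1500, 1750]
NAME_SCALE = ["k", "s"]
DIRECTORY = [
    ["A", "A", "A"],
    ["B", "C", "C"],
    ["D", "D", "E"],
]
BUCKETS = {
    "A": [(1200, "c")],
    "B": [(1600, "d")],
    "C": [(1700, "m")],
    "D": [(1800, "b"), (1900, "p")],
    "E": [(1980, "w")],
}


def range_query(year_lo, year_hi, name_lo, name_hi):
    """Return (bucket ids read, matching records) for an inclusive box query."""
    cx_lo, cx_hi = bisect_right(YEAR_SCALE, year_lo), bisect_right(YEAR_SCALE, year_hi)
    cy_lo, cy_hi = bisect_right(NAME_SCALE, name_lo), bisect_right(NAME_SCALE, name_hi)
    # Cells covered by the query; a bucket shared by several cells is read once.
    bucket_ids = {
        DIRECTORY[cx][cy]
        for cx in range(cx_lo, cx_hi + 1)
        for cy in range(cy_lo, cy_hi + 1)
    }
    hits = [
        record
        for bucket_id in sorted(bucket_ids)
        for record in BUCKETS[bucket_id]
        if year_lo <= record[0] <= year_hi and name_lo <= record[1] <= name_hi
    ]
    return bucket_ids, hits
```

For the query [1450-1600, c-g] this toy directory touches two different buckets, A and B, and the final filter discards records in those buckets that fall outside the box.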

Next in each direction
- Next x above: cx = (cx + 1) mod nx
- Next x below: cx = (cx - 1) mod nx
- Next y above: cy = (cy + 1) mod ny
- Next y below: cy = (cy - 1) mod ny
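The neighbor formulas wrap around the directory edges. A small Python sketch, with hypothetical function names, makes the wrap-around explicit; it relies on Python's `%` returning a non-negative result for a positive modulus, which is not true of every language (C, for example, truncates toward zero).

```python
def next_above(c, n):
    """Index of the next cell above c along a dimension with n intervals."""
    return (c + 1) % n


def next_below(c, n):
    """Index of the next cell below c; Python's % keeps the result in 0..n-1."""
    return (c - 1) % n
```

In a language where `%` can return a negative value, `(c - 1 + n) % n` is the usual safe form for the "below" direction.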

Insertion
- Bucket size = 4
- Bucket overflows: split it!

Grid File Insertion

Grid File Insertion

Grid File Insertion
- This example splits dimensions on a fixed schedule
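Insertion with bucket splitting can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm from the slide figures: it is 2-d only, the fixed schedule simply alternates x and y, new boundaries are placed at the median of the overflowing bucket, all coordinate values are assumed distinct in each dimension, and the class and method names are hypothetical. The sample points are the city coordinates from the locational data slide.

```python
from bisect import bisect_right, insort

CAPACITY = 4  # bucket size used on the slide


class GridFile2D:
    def __init__(self):
        self.scales = [[], []]         # interior boundaries per dimension
        self.directory = {(0, 0): []}  # cell -> bucket; cells may share one bucket
        self.next_dim = 0              # fixed schedule: x, y, x, y, ...

    def cell_of(self, p):
        return tuple(bisect_right(self.scales[d], p[d]) for d in (0, 1))

    def insert(self, p):
        bucket = self.directory[self.cell_of(p)]
        bucket.append(p)
        if len(bucket) > CAPACITY:
            self._split(bucket)

    def _cells(self, bucket):
        return [c for c, b in self.directory.items() if b is bucket]

    def _split(self, bucket):
        d, self.next_dim = self.next_dim, 1 - self.next_dim
        intervals = sorted({c[d] for c in self._cells(bucket)})
        if len(intervals) == 1:
            # Bucket spans a single interval in d: refine the scale at the median.
            boundary = sorted(p[d] for p in bucket)[len(bucket) // 2]
            self._refine(d, boundary)
            intervals = sorted({c[d] for c in self._cells(bucket)})
        # Split the bucket along a boundary that now exists in the scale.
        cut = intervals[len(intervals) // 2]
        low = [p for p in bucket if self.cell_of(p)[d] < cut]
        high = [p for p in bucket if self.cell_of(p)[d] >= cut]
        for c in self._cells(bucket):
            self.directory[c] = low if c[d] < cut else high
        for half in (low, high):
            if len(half) > CAPACITY:  # terminates only for distinct coordinates
                self._split(half)

    def _refine(self, d, boundary):
        k = bisect_right(self.scales[d], boundary)  # index of the interval being cut
        insort(self.scales[d], boundary)
        refined = {}
        for cell, bucket in self.directory.items():
            moved = list(cell)
            if cell[d] > k:
                moved[d] += 1            # intervals above the cut shift up by one
            elif cell[d] == k:
                refined[cell] = bucket   # lower half keeps the old index
                moved[d] += 1            # upper half shares the bucket for now
            refined[tuple(moved)] = bucket
        self.directory = refined


cities = [(62, 77), (82, 65), (35, 42), (27, 35),
          (45, 30), (52, 10), (85, 15), (90, 5)]
grid = GridFile2D()
for city in cities:
    grid.insert(city)
```

Note how refining a scale duplicates directory entries: cells that were not overflowing end up sharing their old bucket, which is why the directory can grow faster than the data.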

Directory Merging
- No queries between [a-k] and [0-1500]

Directory Merging
- Grid directory is trimmed on merging

Concurrent Access
- There is no root node as in trees (a root is a bottleneck when present), which allows concurrency

Advantages
- No special computations are required
- Only the right records are retrieved
- Can also be used for single search key queries
- Easy to extend to queries on n search keys
- Significant improvement in processing time for multiple-key queries
- Has a two-disk-access upper bound for accessing data
- Allows simpler concurrency control protocols

Grid files - disadvantages
- #1: problems in high-d: directory splits can be expensive
- #2: even in low-d, suffers on correlated attributes

Grid files - disadvantages
- (A1: rotate; A2: triangular cells)

Grid files - disadvantages
- #3: how about region data?

Grid files - disadvantages
- #3: how about region data?
- If we cut them, then we have O(volume) pieces (while z-ordering: O(surface))
- Translation to 2k-d points! (clever, BUT, still has subtle problems)
[Figure: e.g., 1-d regions A, B, C on a line from 0 to 1 (ticks at 0, ¼, ½, ¾, 1), and the same regions as 2-d points with axes x-start and x-end]

Grid files - disadvantages
- What to do?
- Translation to 2k-d points! (clever, BUT, still has subtle problems)
[Figure: e.g., 1-d regions A, B, C on a line from 0 to 1 (ticks at 0, ¼, ½, ¾, 1), and the same regions as 2-d points with axes x-start and x-end]
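The translation of k-d regions to 2k-d points can be sketched as below. The interval values for A, B and C are hypothetical (the slide figure only shows tick marks), as are the function names. An intersection query on regions becomes a box-shaped condition on the points: start <= query end and end >= query start in every dimension.

```python
def to_point(region):
    """Flatten [(lo_1, hi_1), ..., (lo_k, hi_k)] into a 2k-d point."""
    return tuple(value for lo_hi in region for value in lo_hi)


def intersects(point, query):
    """True if the region encoded by point overlaps the query box."""
    return all(
        point[2 * i] <= query[i][1] and point[2 * i + 1] >= query[i][0]
        for i in range(len(query))
    )


# Hypothetical 1-d regions (k = 1), so each becomes a 2-d point (x-start, x-end).
REGIONS = {"A": [(0.0, 0.25)], "B": [(0.25, 0.75)], "C": [(0.5, 1.0)]}
POINTS = {name: to_point(region) for name, region in REGIONS.items()}


def query_regions(query):
    """Names of all regions that overlap the query box, in sorted order."""
    return sorted(name for name, p in POINTS.items() if intersects(p, query))
```

One commonly cited difficulty with this mapping is that every point satisfies x-end >= x-start, so the transformed points crowd one side of the diagonal, which is exactly the kind of correlated distribution that grid files handle poorly.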

Disadvantages
- Dimensionality curse; large query regions
- Imposes space overhead
- Performance overhead on insertion and deletion
- Frequent reorganization of the file adds to the maintenance cost

BANG file

Two-level grid file

Twin grid file
A given set of points can be distributed among two grid files in such a way that storage space utilization is optimal. The optimal twin grid file can be built practically as fast as a standard grid file, i.e. the storage space optimality is obtained at almost no extra cost.

Twin grid file
Three structures are compared: the standard grid file, the optimal static twin grid file, and an efficient dynamic twin grid file, where insertions and deletions trigger the redistribution of points among the two grid files. Twin grid files utilize storage space at roughly 90%, as compared with the 69% of the standard grid file. Typical range queries, the most important spatial search operations, can be answered in twin grid files at least as fast as in the standard grid file.

Buddy tree
The buddy tree is a dynamic hashing scheme with a tree-like directory. The universe is cut recursively into two parts of equal size with iso-oriented hyperplanes, and each interior node corresponds to a partition together with an interval. The interval is the minimum bounding box (MBB) covering the points below the given node. Also:
- Each directory node contains at least two entries;
- Whenever a node is split, the MBBs and subnodes are recomputed to fit the new situation;
- Except for the root of the directory, there is exactly one pointer referring to each directory page.
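The two ingredients named above, halving the universe with an iso-oriented hyperplane and recomputing MBBs after a split, can be illustrated with a short Python sketch. It covers a single split step only, not a full buddy tree; the function names are hypothetical, and the sample points are the city coordinates from the locational data slide.

```python
def mbb(points):
    """Minimum bounding box of a non-empty point list as (low corner, high corner)."""
    low = tuple(min(coords) for coords in zip(*points))
    high = tuple(max(coords) for coords in zip(*points))
    return low, high


def halve(universe, points, dim):
    """Cut the universe into two equal halves along dim; return one MBB per non-empty half."""
    low, high = universe
    middle = (low[dim] + high[dim]) / 2
    lower_half = [p for p in points if p[dim] < middle]
    upper_half = [p for p in points if p[dim] >= middle]
    # MBBs are recomputed from the points, so they can be much smaller than the halves.
    return [mbb(part) for part in (lower_half, upper_half) if part]


cities = [(62, 77), (82, 65), (35, 42), (27, 35),
          (45, 30), (52, 10), (85, 15), (90, 5)]
boxes = halve(((0, 0), (100, 100)), cities, dim=0)
```

Because each directory entry stores the MBB rather than the whole half, empty parts of the data space are not represented, which is the main contrast with the grid directory.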

Buddy tree