Multidimensional Data and Modelling - DBMS

Similar documents
An Introduction to Spatial Databases

Multidimensional (spatial) Data and Modelling (2)

Introduction to Spatial Database Systems. Outline

Introduction to Spatial Database Systems

Introduction to Spatial Database Systems

Spatial Databases. Literature and Acknowledgments. Temporal and Spatial Data Management Fall 2017 SL05. Introduction/2.

Spatial Databases SL05

What we have covered?

Introduction to Indexing R-trees. Hong Kong University of Science and Technology

Data Organization and Processing

layers in a raster model

Multidimensional Data and Modelling

M. Andrea Rodríguez-Tastets. I Semester 2008

Database Applications (15-415)

Spatial Data Management

Lecture 6: GIS Spatial Analysis. GE 118: INTRODUCTION TO GIS Engr. Meriam M. Santillan Caraga State University

Spatial Data Management

GITA 338: Spatial Information Processing Systems

Chap4: Spatial Storage and Indexing. 4.1 Storage:Disk and Files 4.2 Spatial Indexing 4.3 Trends 4.4 Summary

Principles of Data Management. Lecture #14 (Spatial Data Management)

Course Content. Objectives of Lecture? CMPUT 391: Spatial Data Management Dr. Jörg Sander & Dr. Osmar R. Zaïane. University of Alberta

Multidimensional Indexes [14]

Database Applications (15-415)

Chapter 1: Introduction to Spatial Databases

Multidimensional (spatial) Data and Modelling (3)

Chap4: Spatial Storage and Indexing

CMPUT 391 Database Management Systems. Spatial Data Management. University of Alberta 1. Dr. Jörg Sander, 2006 CMPUT 391 Database Management Systems

External Memory Algorithms and Data Structures Fall Project 3 A GIS system

External-Memory Algorithms with Applications in GIS - (L. Arge) Enylton Machado Roberto Beauclair

Geometric and Thematic Integration of Spatial Data into Maps

Evaluating Queries. Query Processing: Overview. Query Processing: Example 6/2/2009. Query Processing

LECTURE 2 SPATIAL DATA MODELS

Announcements. Data Sources a list of data files and their sources, an example of what I am looking for:

GEOSPATIAL ENGINEERING COMPETENCIES. Geographic Information Science

A Real Time GIS Approximation Approach for Multiphase Spatial Query Processing Using Hierarchical-Partitioned-Indexing Technique

Overview of Query Evaluation. Chapter 12

Chapter 12: Query Processing. Chapter 12: Query Processing

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Database System Concepts

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Spatial Data Models. Raster uses individual cells in a matrix, or grid, format to represent real world entities

Implementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

Database Applications (15-415)

Geometric Modeling Mortenson Chapter 11. Complex Model Construction

Lecturer 2: Spatial Concepts and Data Models

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution

Multi-Step Processing of Spatial Joins

Chapter 12: Query Processing

Implementation of Relational Operations

Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 24 Solid Modelling

Storing And Using Multi-Scale Topological Data Efficiently In A Client-Server DBMS Environment

Evaluation of Relational Operations. Relational Operations

Maps as Numbers: Data Models

Concepts of Database Management Eighth Edition. Chapter 2 The Relational Model 1: Introduction, QBE, and Relational Algebra

Multidimensional Data and Modelling - Operations

Intersection Acceleration

CSG obj. oper3. obj1 obj2 obj3. obj5. obj4

Chapter 13: Query Processing

Evaluation of Relational Operations

Spatial Queries. Nearest Neighbor Queries

Class #2. Data Models: maps as models of reality, geographical and attribute measurement & vector and raster (and other) data structures

Subdivision Of Triangular Terrain Mesh Breckon, Chenney, Hobbs, Hoppe, Watts

Announcements. Written Assignment2 is out, due March 8 Graded Programming Assignment2 next Tuesday

Ray Intersection Acceleration

Sorting a file in RAM. External Sorting. Why Sort? 2-Way Sort of N pages Requires Minimum of 3 Buffers Pass 0: Read a page, sort it, write it.

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing

6. Parallel Volume Rendering Algorithms

Advanced Database Systems

Query Processing. Introduction to Databases CompSci 316 Fall 2017

Accelerating Geometric Queries. Computer Graphics CMU /15-662, Fall 2016

Analytical and Computer Cartography Winter Lecture 9: Geometric Map Transformations

Potentials for Improving Query Processing in Spatial Database Systems

R-Trees. Accessing Spatial Data

Advanced Data Types and New Applications

Object modeling and geodatabases. GEOG 419: Advanced GIS

DATA MODELS IN GIS. Prachi Misra Sahoo I.A.S.R.I., New Delhi

CMSC 754 Computational Geometry 1

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

Extending Rectangle Join Algorithms for Rectilinear Polygons

Page 1. Area-Subdivision Algorithms z-buffer Algorithm List Priority Algorithms BSP (Binary Space Partitioning Tree) Scan-line Algorithms

Morphological Image Processing

Ray Tracing Acceleration Data Structures

Problem Set 6 Solutions

Algorithms for GIS csci3225

CAR-TR-990 CS-TR-4526 UMIACS September 2003

Experimental Evaluation of Spatial Indices with FESTIval

ArcView QuickStart Guide. Contents. The ArcView Screen. Elements of an ArcView Project. Creating an ArcView Project. Adding Themes to Views

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Triangle meshes. Computer Graphics CSE 167 Lecture 8

VECTOR ANALYSIS: QUERIES, MEASUREMENTS & TRANSFORMATIONS

External Sorting Sorting Tables Larger Than Main Memory

Watershed Sciences 4930 & 6920 GEOGRAPHIC INFORMATION SYSTEMS

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

Chapter 12: Query Processing

Spatial Data Structures for Computer Graphics

Spatial Analysis and Modeling (GIST 4302/5302) Database Fundaments. Database. Review: Bits and Bytes

Realm-Based Spatial Data Types: The ROSE Algebra 1

Triangle meshes I. CS 4620 Lecture 2

Multidimensional Indexing The R Tree

Transcription:

Multidimensional Data and Modelling - DBMS 1

DBMS-centric approach Summary: l Spatial data is considered as another type of data beside conventional data in a DBMS. l Enabling advantages of DBMS (data structures, algorithms, conventional data model), however, l DBMS-centric approaches lack flexibility of GIS-centric for management of maps. l Relational operations union, except and intersect are used to yield spatial union, difference and intersection. 2

Introduction l A common technology for some applications: l GIS (geographic/geo-referenced data) l VLSI design (geometric data) l modeling complex phenomena (spatial data) l All need to manage large collections of relatively simple spatial objects l Spatial DB vs. Image/pictorial DB l Spatial DB contains objects in the space l Image DB contains representations of a space (images, pictures, : raster data)

SDBMS Definition A spatial database system: l Is a database system l A DBMS with additional capabilities for handling spatial data l Offers spatial data types (SDTs) in its data model and query language l Structure in space: e.g., POINT, LINE, REGION l Relationships among them: (l intersects r)

Modeling Assume 2-D and GIS application, two basic things need to be represented: l Objects in space: cities, forests, or rivers objects single l Coverage/Field: say something about every point in space (e.g., partitions, thematic maps) spatially related collections of objects

Modeling: spatial primitives l Point: object represented only by its location in space, e.g. center of a state l Line (actually a curve or polyline): representation of moving through or connections in space, e.g. road, river l Region: representation of an extent in 2d-space, e.g. lake, city

Modeling: coverages l Partition: set of region objects that are required to be disjoint (adjacency or region objects with common boundaries), e.g. thematic maps l Networks: embedded graph in plane consisting of set of points (vertices) and lines (edges) objects, e.g. highways, power supply lines, rivers l Spatial predicates for topological relationships: l inside: geo x regions bool l intersect, meets: ext1 x ext2 bool l adjacent, encloses: regions x regions bool

Modeling: coverage EXT={lines, regions}, GEO={points, lines, regions} l Operations returning atomic spatial data types: l intersection: lines x lines points l intersection: regions x regions regions l plus, minus: geo x geo geo l contour: regions lines l Spatial operators returning numbers l dist: geo1 x geo2 real l perimeter, area: regions real

Modeling: coverage l Spatial operations on set of objects l sum: set(obj) x (objgeo) geo l A spatial aggregate function, geometric union of all attribute values, e.g. union of set of provinces determine the area of the country l closest: set(obj) x (objgeo1) x geo2 set(obj) l Determines within a set of objects those whose spatial attribute value has minimal distance from geometric query object l Other complex operations: overlay, buffering,

Modeling: SDBMS data model l DBMS data model must be extended by SDTs at the level of atomic data types (such as integer, string), or better be open for user-defined types (OR-DBMS approach): relation states (sname: STRING; area: REGION; spop: INTEGER) relation cities (cname: STRING; center: POINT; ext: REGION; cpop: INTEGER); relation rivers (rname: STRING; route: LINE)

Querying Two main issues: 1. Connecting the operations of a spatial algebra (including predicates for spatial relationships) to the facilities of a DBMS query language. Fundamental spatial algebra operator are: Spatial selection Spatial join (overlay, fusion) 2. Providing graphical presentation of spatial data (i.e. results of queries), and graphical input of SDT values used in queries.

Querying: spatial selection l Spatial selection: returning those objects satisfying a spatial predicate with the query object l All cities in Bavaria SELECT cname FROM cities c WHERE c.center inside Bavaria.area l All rivers intersecting a query window SELECT * FROM rivers r WHERE r.route intersects Window

Querying: spatial selection l Spatial selection: returning those objects satisfying a spatial predicate with the query object l All big cities no more than 100 Kms from Hagen SELECT cname FROM cities c WHERE dist(c.center, Hagen.center) < 100 and c.pop > 500k (conjunction with other predicates and query optimization)

Querying: spatial join l Spatial join: A join which compares any two joined objects based on a predicate on their spatial attribute values. l For each river pass through Bavaria, find all cities within less than 50 Kms. SELECT r.rname, c.cname, length(intersection(r.route, c.area)) FROM rivers r, cities c WHERE r.route intersects Bavaria.area and dist(r.route,c.area) < 50

Querying: I/O l Requirements for spatial querying: l Spatial data types l Graphical display of query results l Graphical combination (overlay) of several query results (start a new picture, add/remove layers, change order of layers) l Display of context (e.g., show background such as a raster image (satellite image) or boundary of states) l Facility to check the content of a display (which query contributed to the content)

Querying: I/O Other requirements for spatial querying: l Extended dialog: use pointing device to select objects within a subarea, zooming, l Varying graphical representations: different colors, patterns, intensity, symbols to different objects classes or even objects within a class l Label placement: selecting object attributes (e.g., population) as labels l Scale selection: determines not only size of the graphical representations but also what kind of symbol be used and whether an object be shown at all

Data Structures & Algorithms 1. Implementation of spatial algebra in an integrated manner with the DBMS query processing. 2. Not just simply implementing atomic operations using computational geometry algorithms, but consider the use of the predicates within set-oriented query processing Spatial indexing or access methods, and spatial join algorithms

Data Structures l Representation of a value of a SDT must be compatible with two different views: 1. DBMS perspective: l Same as attribute values of other types with respect to generic operations l Can have varying and possibly large size l Reside permanently on disk page(s) l Can efficiently be loaded into memory l Offers a number of type-specific implementations for generic operations needed by the DBMS (e.g., transformation functions from/to ASCII or graphic)

Data Structures 2. Spatial algebra implementation perspective, the representation: l Is a value of some programming language data type l Is some arbitrary data structure which is possibly quite complex l Supports efficient computational geometry algorithms for spatial algebra operations l Is not geared only to one particular algorithm but is balanced to support many operations well enough

Data Structures l From both perspectives, the representation should be mapped by the compiler into a single or perhaps a few contiguous areas (to support DBMS paging). Also supports: l Plane sweep sequence: object s vertices stored in a specific sweep order (e.g. x-order) to expedite plane-sweep operation. l Approximations: stores some approximations as well (e.g. MBR) to speed up operations (e.g. comparison) l Stored unary function values: such as perimeter or area be stored once the object is constructed to eliminate future expensive computations.

Spatial Indexing l To expedite spatial selection (as well as other operations such as spatial joins, ) l It organizes space and the objects in it in some way so that only parts of the objects need to be considered to answer a query. l Two main approaches: 1. Dedicated spatial data structures (e.g. R-tree) 2. Spatial objects mapped to a 1-D space to utilize standard indexing techniques (e.g. B-tree)

Spatial Indexing: operations l Spatial data structures either store points or rectangles (for line or region values) l Operations on those structures: insert, delete, member l Query types for points: Range query: all points within a query rectangle Nearest neighbor: point closest to a query point Distance scan: enumerate points in increasing distance from a query point. l Query types for rectangles: Intersection query Containment query

Spatial Indexing l l A fundamental idea: use of approximations as keys 1) continuous (e.g. bounding box) 2) Grid (a geometric entity as a set of cells). Filter and refine strategy for query processing: 1. Filter: returns a set of candidate object which is a superset of the objects fulfilling a predicate 2. Refine: for each candidate, the exact geometry is checked

Spatial Indexing l A spatial index structure organizes points into buckets. l Each bucket has an associated bucket region, a part of space containing all objects stored in that bucket. l For point data structures, the regions are disjoint & partition space so that each point belongs into precisely one bucket. l For rectangle data structures, bucket regions may overlap. A kd-tree partitioning of 2d-space where each bucket can hold up to 3 points

Spatial Indexing l One dimensional embedding: z-order or bit-interleaving l Find a linear order for the cells of the grid while maintaining locality (i.e., cells close to each other in space are also close to each other in the linear order) l Define this order recursively for a grid that is obtained by hierarchical subdivision of 1111 01 space 10 00 11 0000

Spatial Indexing l Any shape (approximated as set of cells) over the grid can now be decomposed into a minimal number of cells at different levels (using always the highest possible level) l Hence, for each spatial object, we can obtain a set of spatial keys l Index: can be a B-tree of lexicographically ordered list of the union of these spatial keys

Spatial indexing: 2-D points l Data structures representing points have a much longer tradition: l Kd-tree and its extensions (KDB-tree and LSD-tree) l Grid file (organizing buckets into an irregular grid of pointers)

Spatial Indexing: 2-D rectangles l Spatial index structures for rectangles: unlike points,rectangles don t fall into a unique cell of a partition and might intersect partition boundaries l Transformation approach: instead of k-dimensional rectangles, 2k-dimensional points are stored using a point data structure l Overlapping regions: partitioning space is abandoned & bucket regions may overlap (e.g. R- tree & R*-tree) l Clipping: keep partitioning, a rectangle that intersects partition boundaries is clipped and represented within each intersecting cell (e.g. R+tree)

Spatial Join l Traditional join methods such as hash join or sort/ merge join are not applicable. l Filtering cartesian product is expensive. l Two general classes: 1. Grid approximation/bounding box 2. None/one/both operands are presented in a spatial index structure l Grid approximations and overlap predicate: l A parallel scan of two sets of z-elements corresponding to two sets of spatial objects is performed l Too fine a grid, too many z-elements per object (inefficient) l Too coarse a grid, too many false hits in a spatial join

Spatial Join l Bounding boxes: for two sets of rectangles R, S all pairs (r,s), r in R, s in S, such that r intersects s: l No spatial index on R and S: bb_join which uses a computational geometry algorithm to detect rectangle intersection, similar to external merge sorting l Spatial index on either R or S: index join scan the nonindexed operand and for each object, the bounding box of its SDT attribute is used as a search argument on the indexed operand (only efficient if non-indexed operand is not too big or else bb-join might be better) l Both R and S are indexed: synchronized traversal of both structures so that pairs of cells of their respective partitions covering the same part of space are encountered together.

System Architecture l Extensions required to a standard DBMS architecture: l Representations for the data types of a spatial algebra l Procedures for the atomic operations (e.g. overlap) l Spatial index structures l Access operations for spatial indices (e.g. insert) l Filter and refine techniques l Spatial join algorithms l Cost functions for all these operations (for query optimizer) l Statistics for estimating selectivity of spatial selection and join l Extensions of optimizer to map queries into the specialized query processing method l Spatial data types & operations within data definition and query language l User interface extensions to handle graphical representation and input of SDT values

System Architecture l The only clean way to accommodate these extensions is an integrated architecture based on the use of an extensible DBMS. l There is no difference in principle between: l a standard data type such as a STRING and a spatial data type such as REGION l same for operations: concatenating two strings or forming intersection of two regions l clustering and secondary index for standard attribute (e.g. B- tree) & for spatial attribute (R-tree) l sort/merge join and bounding-box join l query optimization (only reflected in the cost functions)