Course Content. Objectives of Lecture? CMPUT 391: Spatial Data Management Dr. Jörg Sander & Dr. Osmar R. Zaïane. University of Alberta

Database Management Systems, Winter 2002
CMPUT 391: Spatial Data Management
Dr. Jörg Sander & Dr. Osmar R. Zaïane, University of Alberta
Chapter 26 of Textbook

Course Content
  Introduction
  Database Design Theory
  Query Processing and Optimisation
  Concurrency Control
  Database Recovery and Security
  Object-Oriented Databases
  Inverted Index for IR
  Spatial Data Management
  XML
  Data Warehousing
  Data Mining
  Parallel and Distributed Databases

CMPUT 391 Database Management Systems, University of Alberta

Objectives of Lecture?
Spatial Data Management
This lecture will give you a basic understanding of spatial data management:
  What is special about spatial data?
  What are spatial queries?
  How do typical spatial index structures work?

Spatial Data Management
  Modeling Spatial Data
  Spatial Queries
  Space-Filling Curves + B-Trees
  R-trees

Relational Representation of Spatial Data
Example: Representation of geometric objects (here: parcels/fields of land) in normalized relations.
  Parcels(FNr, BNr)
  Borders(BNr, PNr1, PNr2)
  Points(PNr, X-Coord, Y-Coord)
[Figure and tables: parcels F1 to F7 with their borders and corner points]
A redundancy-free representation requires distributing the information over 3 tables: Parcels, Borders, Points.

Relational Representation of Spatial Data
For (spatial) queries involving parcels it is necessary to reconstruct the spatial information from the different tables.
E.g.: if we want to determine whether a given point P is inside parcel F2, we have to find all corner points of parcel F2 first:
  SELECT Points.PNr, X-Coord, Y-Coord
  FROM Parcels, Borders, Points
  WHERE FNr = 'F2' AND
        Parcels.BNr = Borders.BNr AND
        (Borders.PNr1 = Points.PNr OR Borders.PNr2 = Points.PNr)
Even this simple query requires expensive joins of three tables.
Querying the geometry (e.g., P in F2?) is not directly supported.

Extension of the Relational Model to Support Spatial Data
Integration of spatial data types and operations into the core of a DBMS (→ object-oriented and object-relational databases)
  Data types such as Point, Line, Polygon
  Operations such as ObjectIntersect, RangeQuery, etc.
Advantages
  Natural extension of the relational model and query languages
  Facilitates design and querying of spatial databases
  Spatial data types and operations can be supported by spatial index structures and efficient algorithms, implemented in the core of a DBMS
All major database vendors today implement support for spatial data and operations in their database systems via object-relational extensions.

Extension of the Relational Model to Support Spatial Data
Example relation: ForestZones(Zone: Polygon, ForestOfficial: String, Area: Cardinal)
[Figure: map of forest zones 1 to 6]

ForestZones
  Zone  ForestOfficial  Area (m^2)
  1     Stevens         3900
  2     Behrens         4250
  3     Lee             6700
  4     Goebel          5400
  5     Jones           900
  6     Kent            4600

The province decides that a reforestation is necessary in an area described by a polygon S. Find all forest officials affected by this decision.
  SELECT ForestOfficial
  FROM ForestZones
  WHERE ObjectIntersects (S, Zone)

Data Types for Spatial Objects
Spatial objects are described by
  Spatial Extent: location and/or boundary with respect to a reference point in a coordinate system, which is at least 2-dimensional. Basic object types: Point, Line, Polygon
  Other Non-Spatial Attributes: thematic attributes such as height, area, name, land-use, etc.
[Figure: 2-dim. points, 2-dim. lines, 2-dim. polygons (Crop, Forest, Water)]

Spatial Data Management
  Modeling Spatial Data
  Spatial Queries
  Space-Filling Curves + B-Trees
  R-trees

Spatial Query Processing
A DBMS has to support two types of operations:
  Operations to retrieve certain subsets of spatial objects from the database: spatial queries/selections, e.g., window query, point query, etc.
  Operations that perform basic geometric computations and tests, e.g., point-in-polygon test, intersection of two polygons, etc.
Spatial selections, e.g., in geographic information systems, are often supported by an interactive graphical user interface.
[Figure: Point Query P, Window Query W]

Basic Spatial Queries
  Containment Query: Given a spatial object R, find all objects that completely contain R. If R is a point: Point Query
  Region Query: Given a region R (polygon or circle), find all spatial objects that intersect with R. If R is a rectangle: Window Query
  Enclosure Query: Given a polygon region R, find all objects that are completely contained in R
  K-Nearest Neighbor Query: Given an object P, find the k objects that are closest to P (typically for points)
[Figure: Region Query, Containment Query, Enclosure Query, Point Query, Window Query, 2-nn Query]
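The containment, window, and enclosure queries listed above can be illustrated with a brute-force scan over axis-parallel rectangles. This is only a minimal sketch: the `Rect` class, the function names, and the sample objects are assumptions made for illustration and do not come from the lecture, and a real DBMS would use a spatial index instead of scanning every object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-parallel rectangle; a point is a rectangle with zero extent."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, other: "Rect") -> bool:
        # True if `other` lies completely inside this rectangle.
        return (self.xmin <= other.xmin and self.ymin <= other.ymin
                and other.xmax <= self.xmax and other.ymax <= self.ymax)

    def intersects(self, other: "Rect") -> bool:
        # True if the two rectangles share at least one point.
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


def point(x: float, y: float) -> Rect:
    return Rect(x, y, x, y)


def containment_query(objects: dict, r: Rect) -> list:
    """All objects that completely contain r (point query if r is a point)."""
    return [name for name, o in objects.items() if o.contains(r)]


def window_query(objects: dict, w: Rect) -> list:
    """All objects that intersect the rectangular window w."""
    return [name for name, o in objects.items() if o.intersects(w)]


def enclosure_query(objects: dict, r: Rect) -> list:
    """All objects that are completely contained in r."""
    return [name for name, o in objects.items() if r.contains(o)]


# Hypothetical sample data.
objects = {
    "a": Rect(0, 0, 4, 4),
    "b": Rect(3, 3, 6, 6),
    "c": Rect(10, 10, 12, 12),
}
print(containment_query(objects, point(3.5, 3.5)))
print(window_query(objects, Rect(5, 5, 11, 11)))
print(enclosure_query(objects, Rect(2, 2, 13, 13)))
```

Each query costs one pass over all objects here, which is exactly the cost that spatial index structures try to avoid.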

Basic Spatial Queries: Spatial Join
Given two sets of spatial objects (typically minimum bounding rectangles) $S_1 = \{R_1, R_2, \ldots, R_m\}$ and $S_2 = \{R'_1, R'_2, \ldots, R'_n\}$.
Spatial Join: Compute all pairs of objects $(R, R')$ such that $R \in S_1$, $R' \in S_2$, and $R$ intersects $R'$ ($R \cap R' \neq \emptyset$).
Spatial predicates other than intersection are also possible, e.g., all pairs of objects that are within a certain distance from each other.
[Figure: spatial join of the rectangle sets {…, A6} and {B1, …, B3}. Answer set: (A5, B1), (A4, B1), (…, B2), (A6, B2), (A2, B3)]

Index Support for Spatial Queries
Conventional index structures such as B-trees are not designed to support spatial queries:
  They group objects only along one dimension
  They do not preserve spatial proximity
E.g., nearest neighbor query: the nearest neighbor of Q is typically not the nearest neighbor in any single dimension.
[Figure: points A, B, C, D and query point Q with NN(Q). A and B are closer in one dimension; C and D are closer in the other dimension.]

Index Support for Spatial Queries
Spatial index structures try to preserve spatial proximity:
  Group objects that are close to each other on the same data page
Problem: the number of bytes needed to store extended spatial objects (lines, polygons) varies.
Solution: store approximations of spatial objects in the index structure, typically axis-parallel minimum bounding rectangles (MBR).
The exact object representation (ER) is stored separately; the index holds pointers to the ER.
[Figure: spatial index with entries of the form (MBR, pointer, ...) referring to the separately stored exact representations (ER)]

Query Processing Using Approximations
Two-Step Procedure
1. Filter Step: Use the index to find all approximations that satisfy the query. Some objects already satisfy the query based on the approximation; others have to be checked in the refinement step → candidate set.
2. Refinement Step: Load the exact object representations for the candidates left after the filter step and test whether they satisfy the query.
[Figure: objects a to g and a query window]
In the query-window example: a and b are certain answers; f, d, and g are certainly not answers; c and e are candidates; c is a false hit (i.e., a candidate that is not an answer).
[Figure: Query → Filter → candidates → Refinement (exact evaluation) → final results. Objects that are certainly not an answer are discarded by the filter; false hits are discarded by the refinement.]
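The two-step procedure can be sketched for a window query as follows. To keep the exact test short, the sketch assumes that the exact objects are discs `(cx, cy, radius)` approximated by their MBRs; the disc representation, all function names, and the sample data are illustrative assumptions, not part of the lecture.

```python
import math


def mbr_of(circle):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a disc."""
    cx, cy, r = circle
    return (cx - r, cy - r, cx + r, cy + r)


def rect_intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def rect_inside(inner, outer):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def circle_intersects_rect(circle, rect):
    """Exact test: distance from the centre to the rectangle <= radius."""
    cx, cy, r = circle
    nearest_x = min(max(cx, rect[0]), rect[2])
    nearest_y = min(max(cy, rect[1]), rect[3])
    return math.hypot(cx - nearest_x, cy - nearest_y) <= r


def window_query(objects, window):
    answers, candidates, false_hits = [], [], []
    # Filter step: decide as much as possible on the approximations alone.
    for name, circle in objects.items():
        mbr = mbr_of(circle)
        if not rect_intersects(mbr, window):
            continue                      # certainly not an answer
        if rect_inside(mbr, window):
            answers.append(name)          # certain answer, no refinement
        else:
            candidates.append(name)       # must be checked exactly
    # Refinement step: exact evaluation on the remaining candidates.
    for name in candidates:
        if circle_intersects_rect(objects[name], window):
            answers.append(name)
        else:
            false_hits.append(name)
    return answers, false_hits


# Hypothetical sample data.
window = (0, 0, 10, 10)
objects = {
    "a": (5, 5, 1),        # MBR inside the window: certain answer
    "e": (11, 5, 2),       # candidate, confirmed by the exact test
    "c": (12, 12, 2.5),    # candidate, MBR overlaps but the disc does not
    "f": (20, 20, 1),      # discarded by the filter
}
print(window_query(objects, window))
```

Only the candidates reach the exact test, which is the expensive part when the exact representations must be loaded from disk.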

Spatial Data Management
  Modeling Spatial Data
  Spatial Queries
  Space-Filling Curves + B-Trees
  R-trees

Embedding of the 2-dimensional space into a 1-dimensional space
Basic Idea:
  The data space is partitioned into rectangular cells.
  Use a space-filling curve to assign cell numbers to the cells (define a linear order on the cells)
    The curve should preserve spatial proximity as well as possible
    Cell numbers should be easy to compute
  Objects are approximated by cells.
  Store the cell numbers for objects in a conventional index structure with respect to the linear order.
[Figure: 8 x 8 grid of cells numbered 0 to 63 along a space-filling curve]

Space-Filling Curves

Lexicographic Order
  56 57 58 59 60 61 62 63
  48 49 50 51 52 53 54 55
  40 41 42 43 44 45 46 47
  32 33 34 35 36 37 38 39
  24 25 26 27 28 29 30 31
  16 17 18 19 20 21 22 23
   8  9 10 11 12 13 14 15
   0  1  2  3  4  5  6  7

Z-Order
  21 23 29 31 53 55 61 63
  20 22 28 30 52 54 60 62
  17 19 25 27 49 51 57 59
  16 18 24 26 48 50 56 58
   5  7 13 15 37 39 45 47
   4  6 12 14 36 38 44 46
   1  3  9 11 33 35 41 43
   0  2  8 10 32 34 40 42

Hilbert Curve
  63 62 49 48 47 44 43 42
  60 61 50 51 46 45 40 41
  59 56 55 52 33 34 39 38
  58 57 54 53 32 35 36 37
   5  6  9 10 31 28 27 26
   4  7  8 11 30 29 24 25
   3  2 13 12 17 18 23 22
   0  1 14 15 16 19 20 21

Z-Order: Z-Values
Coding of cells
  Partition the data space recursively into two halves
  Alternate between the X and Y dimension
  Left/bottom → 0
  Right/top → 1
Z-Value: (c, l)
  c = decimal value of the bit string
  l = level (number of bits)
  If all cells are on the same level, then l can be omitted
[Figure: recursive partitioning of the data space with example bit strings and their Z-values]
Z-Order preserves spatial proximity relatively well.
Z-Order is easy to compute.
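The coding rule for Z-values (alternate between the X and Y dimension, left/bottom → 0, right/top → 1) amounts to interleaving the bits of the cell coordinates, starting with X. The following is a minimal sketch under that reading; the function name `z_value` and the convention that cell (0, 0) is the bottom-left cell are assumptions made here for illustration.

```python
def z_value(x: int, y: int, bits: int) -> int:
    """Interleave the bits of the cell coordinates (x, y), X bit first.

    `bits` is the number of bits per dimension, so the grid has
    2**bits x 2**bits cells and the Z-value has 2 * bits bits.
    """
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((x >> i) & 1)   # left half -> 0, right half -> 1
        z = (z << 1) | ((y >> i) & 1)   # bottom half -> 0, top half -> 1
    return z


# Print the 8 x 8 grid with the top row first.
for y in reversed(range(8)):
    print(" ".join(f"{z_value(x, y, 3):2d}" for x in range(8)))
```

For a fixed resolution of 3 bits per dimension, the printed grid corresponds to the Z-Order numbering of the 8 x 8 cells, with all cells on level l = 6.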

Z-Order: Representation of Spatial Objects
For points
  Use a fixed resolution of the space in both dimensions, i.e., each cell has the same size
  Each point is then approximated by one cell
For extended spatial objects
  Minimum enclosing cell
    Problem: objects that already intersect the first partitions are approximated by very large cells
  Improvement: use several cells
    Better approximation of the objects
    Redundant storage
    Redundant retrieval in spatial queries
[Figure: coding of an object R by one cell C and by several cells C1 to C4 on the Z-ordered 8 x 8 grid; with several cells, a window query returns the same answer several times]

Z-Order: Mapping to a B+-Tree
Linear order for Z-values to store them in a B+-tree:
Let $(c_1, l_1)$ and $(c_2, l_2)$ be two Z-values and let $l = \min\{l_1, l_2\}$.
The order relation $\le_Z$ (that defines a linear order on Z-values) is then defined by
  $(c_1, l_1) \le_Z (c_2, l_2) \iff (c_1 \;\mathrm{div}\; 2^{(l_1 - l)}) \le (c_2 \;\mathrm{div}\; 2^{(l_2 - l)})$
Examples: $(1,2) \le_Z (3,2)$, $(3,4) \le_Z (3,2)$, $(1,2) \le_Z (10,4)$

Mapping to a B+-Tree: Example
[Figure: B+-tree over Z-values (c, l) of varying levels; the exact representations are stored in a different location]

Mapping to a B+-Tree: Window Query
Window Query → Range Query in the B+-tree
  Find all entries (Z-values) in the range [l, u] where
    l = smallest Z-value of the window (bottom left corner)
    u = largest Z-value of the window (top right corner)
  l and u are computed with respect to the maximum resolution/length of the Z-values in the tree (here: 6)
[Figure: window query on the example B+-tree, with the Min and Max Z-values of the window given at level 6, and the resulting entries]
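The order relation on Z-values compares two values on their common prefix, i.e., after truncating the longer bit string to the shorter level. A minimal sketch, assuming Z-values are represented as `(c, l)` tuples; the function names and the linear scan in `range_query` (which stands in for a real B+-tree range scan) are illustrative assumptions, as are the sample entries.

```python
def z_leq(a, b) -> bool:
    """(c1, l1) <=_Z (c2, l2): compare both values at the coarser level."""
    (c1, l1), (c2, l2) = a, b
    l = min(l1, l2)
    # Integer division by 2**(l_i - l) is a right shift by (l_i - l) bits.
    return (c1 >> (l1 - l)) <= (c2 >> (l2 - l))


def range_query(entries, lower, upper):
    """All Z-values e with lower <=_Z e <=_Z upper (linear scan)."""
    return [e for e in entries if z_leq(lower, e) and z_leq(e, upper)]


# The three examples from the slide.
print(z_leq((1, 2), (3, 2)), z_leq((3, 4), (3, 2)), z_leq((1, 2), (10, 4)))

# Hypothetical entries; the window bounds are given at level 6.
entries = [(0, 2), (2, 3), (6, 4), (7, 4), (4, 3), (20, 5)]
print(range_query(entries, (8, 6), (27, 6)))
```

In this example the window spans the level-6 cells 8 to 27, so the scan keeps exactly those entries whose cells overlap that range, including the coarse cell (0, 2) that covers cells 0 to 15.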

Spatial Data Management
  Modeling Spatial Data
  Spatial Queries
  Space-Filling Curves + B-Trees
  R-trees

The R-Tree: Properties
Balanced tree designed to organize rectangles [Gut 84].
Each page contains between m and M entries.
Data page entries are of the form (MBR, PointerToExactRepr).
  MBR is the minimum bounding rectangle of a spatial object, which PointerToExactRepr is pointing to
Directory page entries are of the form (MBR, PointerToSubtree).
  MBR is the minimum bounding rectangle of all entries in the subtree, which PointerToSubtree is pointing to
Rectangles can overlap.
The height h of an R-tree for N spatial objects: $h \le \lceil \log_m N \rceil + 1$
[Figure: directory level 1, directory level 2, data pages, exact representations]

The R-Tree: Queries
[Figure: point query and window query on an example R-tree with directory entries S and T, showing the paths that the query has to follow. Answer set of the point query: […]; answer set of the window query: [A2, …]]

  PointQuery (Page, Point);
    FOR ALL Entry ∈ Page DO
      IF Point IN Entry.MBR THEN
        IF Page = DataPage THEN
          PointInPolygonTest (load(Entry.ExactRepr), Point)
        ELSE
          PointQuery (Entry.Subtree, Point);

  First call: Page = Root of the R-tree

  WindowQuery (Page, Window);
    FOR ALL Entry ∈ Page DO
      IF Window INTERSECTS Entry.MBR THEN
        IF Page = DataPage THEN
          Intersection (load(Entry.ExactRepr), Window)
        ELSE
          WindowQuery (Entry.Subtree, Window);
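The two recursive query procedures can be sketched in Python as follows. The sketch covers only the filter step on MBRs: it returns the object identifiers of matching data page entries and leaves the exact tests (PointInPolygonTest, Intersection) to the caller. The `Page` class, the tuple layout, and the tiny hand-built tree with objects A1 to A4 are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """R-tree page; entries are (mbr, payload) pairs.

    On a data page the payload is an object identifier; on a
    directory page it is the child Page of the subtree.
    """
    is_data_page: bool
    entries: list = field(default_factory=list)


def contains_point(mbr, p):
    return mbr[0] <= p[0] <= mbr[2] and mbr[1] <= p[1] <= mbr[3]


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def point_query(page, p):
    """Follow every entry whose MBR contains the point p."""
    hits = []
    for mbr, payload in page.entries:
        if contains_point(mbr, p):
            if page.is_data_page:
                hits.append(payload)          # candidate for the exact test
            else:
                hits.extend(point_query(payload, p))
    return hits


def window_query(page, w):
    """Follow every entry whose MBR intersects the window w."""
    hits = []
    for mbr, payload in page.entries:
        if intersects(mbr, w):
            if page.is_data_page:
                hits.append(payload)
            else:
                hits.extend(window_query(payload, w))
    return hits


# Hypothetical two-level tree; MBRs are (xmin, ymin, xmax, ymax).
s = Page(True, [((0, 0, 2, 2), "A1"), ((1, 1, 4, 4), "A2")])
t = Page(True, [((6, 6, 8, 8), "A3"), ((5, 0, 7, 3), "A4")])
root = Page(False, [((0, 0, 4, 4), s), ((5, 0, 8, 8), t)])

print(point_query(root, (1.5, 1.5)))
print(window_query(root, (3, 2, 6, 5)))
```

Because directory rectangles may overlap, a single query can descend into several subtrees, which is why both functions loop over all entries instead of stopping at the first match.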

R-Tree Construction: Optimization Goals
Overlap between the MBRs
  Spatial queries have to follow several paths → try to minimize overlap
Empty space in MBRs
  Spatial queries may have to follow irrelevant paths → try to minimize area and empty space in MBRs

R-Tree Construction: Important Issues
Split Strategy: How to divide a set of rectangles into 2 sets?
[Figure: M = 3, m = 2. Start: empty data page (= root). Insert: A5, …, …, A4 → overflow (*); split into 2 pages]
Insertion Strategy: Where to insert a new rectangle?
[Figure: insert A2 into a tree with more than one possible target page]

R-Tree Construction: Insertion Strategies
Dynamic construction by insertion of rectangles R
  Searching for the data page into which R will be inserted traverses the tree from the root to a data page.
  When considering the entries of a directory page P, 3 cases can occur:
  1. R falls into exactly one Entry.MBR → follow Entry.Subtree
  2. R falls into the MBR of more than one entry $e_1, \ldots, e_n$ → follow $e_i$.Subtree for the entry $e_i$ with the smallest area of $e_i$.MBR
  3. R does not fall into an Entry.MBR of the current page → check the increase in area of the MBR for each entry when enlarging the MBR to enclose R. Choose the entry with the minimum increase in area (if this entry is not unique, choose the one with the smallest area); enlarge Entry.MBR and follow Entry.Subtree
Construction by bulk-loading the rectangles
  Sort the rectangles, e.g., using Z-Order
  Create the R-tree bottom-up

R-Tree Construction: Split
Insertion will eventually lead to an overflow of a data page.
  The parent entry for that page is deleted.
  The page is split into 2 new pages according to a split strategy.
  2 new entries pointing to the newly created pages are inserted into the parent page.
  A now possible overflow in the parent page is handled recursively in a similar way; if the root has to be split, a new root is created to contain the entries pointing to the newly created pages.
[Figure: split example with M = 3, m = 2: a page overflow (*) → split node; labels S, U, V and rectangles A2, A4, A5, A6]
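The three cases of the insertion strategy for choosing an entry of a directory page can be sketched as one selection function. This is a minimal sketch, not the lecture's implementation: rectangles are assumed to be `(xmin, ymin, xmax, ymax)` tuples, the function returns only the index of the chosen entry, and enlarging the chosen MBR and descending further are left out.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def enlarge(mbr, r):
    """Smallest rectangle that encloses both mbr and r."""
    return (min(mbr[0], r[0]), min(mbr[1], r[1]),
            max(mbr[2], r[2]), max(mbr[3], r[3]))


def choose_entry(entry_mbrs, r):
    """Index of the directory entry to follow when inserting r."""
    containing = [i for i, e in enumerate(entry_mbrs) if contains(e, r)]
    if containing:
        # Cases 1 and 2: r falls into one or more MBRs; with several
        # candidates, take the one with the smallest area.
        return min(containing, key=lambda i: area(entry_mbrs[i]))
    # Case 3: no MBR contains r; take the entry with the minimum
    # increase in area, breaking ties by the smaller area.
    return min(
        range(len(entry_mbrs)),
        key=lambda i: (area(enlarge(entry_mbrs[i], r)) - area(entry_mbrs[i]),
                       area(entry_mbrs[i])),
    )


# Hypothetical directory page with three entries.
entry_mbrs = [(0, 0, 10, 10), (2, 2, 6, 6), (20, 20, 30, 30)]
print(choose_entry(entry_mbrs, (3, 3, 4, 4)))      # case 2
print(choose_entry(entry_mbrs, (21, 21, 22, 22)))  # case 1
print(choose_entry(entry_mbrs, (12, 12, 13, 13)))  # case 3
```

Preferring the smallest area and the smallest enlargement follows the optimization goals of keeping MBRs small, so that later queries have fewer irrelevant paths to follow.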

R-Tree Construction: Splitting Strategies
Overflow of node K with $|K| = M + 1$ entries → distribution of the entries into two new nodes $K_1$ and $K_2$ such that $|K_1| \ge m$ and $|K_2| \ge m$
Exhaustive algorithm:
  Searching for the best split in the set of all possible splits is too expensive ($O(2^M)$ possibilities!)
Quadratic algorithm:
  Choose the pair of rectangles $R_1$ and $R_2$ that have the largest value $d(R_1, R_2)$ for empty space in an MBR which covers both $R_1$ and $R_2$:
    $d(R_1, R_2) := \mathrm{Area}(\mathrm{MBR}(R_1 \cup R_2)) - (\mathrm{Area}(R_1) + \mathrm{Area}(R_2))$
  Set $K_1 := \{R_1\}$ and $K_2 := \{R_2\}$
  Repeat until STOP:
    if all $R_i$ are assigned: STOP
    if all remaining $R_i$ are needed to fill the smaller node to guarantee minimal occupancy m: assign them to the smaller node and STOP
    else: choose the next $R_i$ and assign it to the node that will have the smallest increase in area of the MBR by the assignment. If not unique: choose the $K_i$ that covers the smaller area (if still not unique: the one with fewer entries).

R-Tree Construction: Splitting Strategies
Linear algorithm:
  Same as the quadratic algorithm, except for the choice of the initial pair: choose the pair with the largest distance.
  For each dimension, determine the rectangle with the largest minimal value and the rectangle with the smallest maximal value (the difference is the maximal distance/separation).
  Normalize the maximal distance of each dimension by dividing by the sum of the extensions of the rectangles in this dimension.
  Choose the pair of rectangles that has the greatest normalized distance. Set $K_1 := \{R_1\}$ and $K_2 := \{R_2\}$.
[Figure: largest minimal value and smallest maximal value in each dimension, with the resulting maximal distance per dimension]
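The quadratic algorithm can be sketched as follows. This is a simplified illustration under stated assumptions: rectangles are `(xmin, ymin, xmax, ymax)` tuples, "choose the next $R_i$" is read as taking the remaining rectangles in their given order, and the function name `quadratic_split` is hypothetical.

```python
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    """MBR that covers both rectangles."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def quadratic_split(rects, m):
    """Distribute M + 1 rectangles over two nodes with at least m entries each."""
    # Seeds: the pair wasting the most space, d(R1, R2), in a common MBR.
    i, j = max(
        combinations(range(len(rects)), 2),
        key=lambda p: area(union(rects[p[0]], rects[p[1]]))
        - area(rects[p[0]]) - area(rects[p[1]]),
    )
    groups = [[rects[i]], [rects[j]]]
    mbrs = [rects[i], rects[j]]
    remaining = [r for k, r in enumerate(rects) if k not in (i, j)]

    while remaining:
        # If one node needs all remaining rectangles to reach m, give them to it.
        needy = [g for g in (0, 1) if len(groups[g]) + len(remaining) <= m]
        if needy:
            g = needy[0]
            for r in remaining:
                groups[g].append(r)
                mbrs[g] = union(mbrs[g], r)
            break
        r = remaining.pop(0)
        # Smallest area increase; ties: smaller area, then fewer entries.
        g = min((0, 1), key=lambda g: (area(union(mbrs[g], r)) - area(mbrs[g]),
                                       area(mbrs[g]), len(groups[g])))
        groups[g].append(r)
        mbrs[g] = union(mbrs[g], r)
    return groups


# Hypothetical overflowing page with M = 3, m = 2.
rects = [(0, 0, 1, 1), (1, 1, 2, 2), (9, 9, 10, 10), (8, 8, 9, 9)]
print(quadratic_split(rects, m=2))
```

Evaluating all pairs for the seed choice is what makes the algorithm quadratic in the number of entries; the linear algorithm replaces only that step with a per-dimension scan.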
R-Trees: Variants
Many variants of R-trees exist, e.g., the R*-tree, the X-tree for higher-dimensional point data, ...
For further information see http://www.cs.umd.edu/~hjs/rtrees/index.html (includes an interactive demo)
R-trees are also efficient index structures for point data, since points can be modeled as degenerate rectangles.
Multi-dimensional points for which a distance function between the points is defined play an important role for similarity search in so-called feature or multimedia databases.
[Figure: R-tree over the points P1 to P13]

Examples of Feature Databases
  Measurements for celestial objects (e.g., intensity of emission in different wavelengths)
  Color histograms of images
  Documents, shape descriptors, ...
n d-dimensional feature vectors: $(o1_1, o1_2, \ldots, o1_d)$, $(o2_1, o2_2, \ldots, o2_d)$, ..., $(on_1, on_2, \ldots, on_d)$

Feature Databases and Similarity Queries
Objects + Metric Distance Function
  The distance function measures (dis)similarity between objects
Basic types of similarity queries
  Range queries with range ε
    Retrieve all objects which are similar to the query object up to a certain degree ε
  k-nearest neighbor queries
    Retrieve the k most similar objects to the query
[Figure: query range ε around a query object; 3-nn distance]
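Both similarity query types can be sketched with a brute-force scan over feature vectors. The sketch assumes the Euclidean distance as the metric distance function and a dictionary from object names to feature vectors; both choices, the function names, and the sample data are illustrative assumptions, and an index such as an R-tree variant would replace the scan in practice.

```python
import heapq
import math


def range_query(db, query, eps):
    """All objects whose distance to the query object is at most eps."""
    return [name for name, vec in db.items() if math.dist(vec, query) <= eps]


def knn_query(db, query, k):
    """The k objects with the smallest distance to the query object."""
    return heapq.nsmallest(k, db, key=lambda name: math.dist(db[name], query))


# Hypothetical 3-dimensional feature vectors.
db = {
    "o1": (0.0, 0.0, 0.0),
    "o2": (1.0, 0.0, 0.0),
    "o3": (0.0, 3.0, 0.0),
    "o4": (5.0, 5.0, 5.0),
}
q = (0.0, 0.0, 0.0)
print(range_query(db, q, eps=1.5))
print(knn_query(db, q, k=3))
```

A range query fixes the degree of similarity and lets the result size vary, while a k-nearest neighbor query fixes the result size and lets the distance vary.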