- Jared Snow
1 Challenge
- Traditional data is one-dimensional.
- Multimedia data is multidimensional.
  - Ex. Maps are 2D.
  - Ex. Images are (width × height)-dimensional, assuming that each pixel is a feature.
- In general, if a piece of information has k features, it can be represented as a point in a k-dimensional space.
2 What kind of queries can we expect?
- Given a set of points in k-dimensional space:
  - Exact match: find whether a given point is in the set.
  - Nearest neighbor: find the closest point to a given point.
  - Range search: given a region (rectangle or circle), find all the points in that region.
3 General approach
- Divide the space into regions.
- Insert each new object into the corresponding region.
- If the region is full, split the region.
- Retrieval: determine which regions are required to answer a given query and limit the search to these regions.
4 Is there an alternative to multidimensional space decomposition?
- Yes! Convert the given k-d space to a 1D space. We know how to handle 1D space.
- Don't we lose information? Yes, but if we are careful, we can minimize the information loss.
5 Space-filling curves
- Convert a k-d space into a 1D space such that, ideally, points that are close to each other in the k-d space are also close to each other in the 1D space.
6 Row order/column order [figure]
7 Row order/column order [figure]
8 Row order/column order. Problems: [shown only in the figure]
9 Row-prime order/column-prime order. Problems / not a problem: [shown only in the figure]
10 Cantor diagonal order
11 Z-order curve (Morton order) [figure]
12 Z-order curve (Morton order)
- Easy to compute by bit shuffling (interleaving the bits of the coordinates): x = 1 (001) and y = 2 (010) give z = 6 (000110).
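The bit shuffling on this slide can be sketched as follows. This is a minimal illustration, assuming 2D integer coordinates and assuming that the bits of x take the higher position of each pair (the convention that reproduces the slide's 000110); the function name `z_order` is made up for this example.

```python
def z_order(x: int, y: int, bits: int = 3) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)  # bit i of x -> odd position
        z |= ((y >> i) & 1) << (2 * i)      # bit i of y -> even position
    return z

print(format(z_order(1, 2), "06b"))  # 000110, i.e. 6
```

Sorting points by this key and storing them in a 1D index such as a B-tree is one way to use the curve.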
13 Z-order curve (Morton order) [figure]
14 Z-order curve (Morton order)
- Range search can be implemented using tries.
15 Peano-Hilbert curve
16 Indexing
- What are we indexing?
  - Text → tries
  - Numbers, text → B-trees, B+-trees, B*-trees
  - Images → ?
- Which feature are we going to index on? Color? Texture? Time (image series)?
- What do we need to specify? Lines? Points? Space?
17 How do we index points?
- Given
  - a space of N dimensions,
  - M points,
  - a distance function between points,
- we can use multidimensional index structures:
  - k-d trees
  - point quadtrees
  - MX quadtrees
  - R-trees
  - TV-trees
  - X-trees
18 So
- we can answer queries of the form:
  - given a point x in N-dimensional space,
  - find all points y that are in its proximity (d(x, y) < ε).
19 Thus
- If
  - we represent any feature (color, texture, shape, etc.) as a point in N-dimensional space, and
  - we define a distance function between those points (larger distance → lower similarity),
- then we can find media objects with similar properties.
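As a baseline for the proximity query d(x, y) < ε, here is a minimal sketch that scans every point without any index. The three-component feature vectors, the Euclidean distance, and the names `features` and `proximity_query` are all assumptions made for illustration; the slides do not fix a particular feature or distance function.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def proximity_query(points, x, eps, dist=euclidean):
    """Return every y in points with dist(x, y) < eps (linear scan)."""
    return [y for y in points if dist(x, y) < eps]

# Hypothetical feature vectors for two stored images and one query image.
features = {"ImgA": (0.2, 0.7, 0.1), "ImgB": (0.9, 0.1, 0.4)}
query = (0.25, 0.65, 0.1)
matches = [name for name, f in features.items() if euclidean(query, f) < 0.2]
print(matches)  # ['ImgA']
```

The index structures that follow aim to return the same answer while examining far fewer points.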
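20 Populate database [figure: ImgA and ImgB in the feature space]
21 Populate database [figure: ImgA and ImgB in the feature space]
22 Map query image [figure: ImgA, ImgB, and the query in the feature space]
23 Range search [figure: ImgA, ImgB, and the query with range δ; a match]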
24 Grid File
- Every cell is one disk page.
[figure: grid with partition marks at 1/2, 3/4, 1]
25 Grid File
- Every cell is one disk page.
- Wasted directory space!
[figure: grid with partition marks at 1/2, 3/4, 1]
26 Point Trees
27 How can we divide space?
- Let us assume that the space is 2-d.
- There are many ways to divide the space:
  - Fixed-size squares
  - Triangles
  - Rectangles
  - Arbitrary space decomposition
- Each line divides the space into two:
  - Line: n1·x + n2·y = c
  - Regions: n1·x + n2·y >= c and n1·x + n2·y < c
[figure: arbitrary decomposition into regions A, B, C, D, E]
28 Point quadtrees (Finkel and Bentley, 1974)
- Key features:
  - Every node in a point quadtree implicitly represents a rectangular region.
  - Each node contains an explicit point labeling it.
  - Root represents the whole region.
  - Each node's region is split into 4 parts ("quadrants") by drawing a vertical and a horizontal line through the point labeling the node.
  - Each node has 4 children corresponding to the 4 quadrants above.
29 Point quadtrees: example
- Tree: root <12,11>, which represents the whole region. [figure: region with origin 0,0]
30 Point quadtrees: example
- Tree: root <12,11> with NW child <9,15>.
31 Point quadtrees: example
- Tree: root <12,11> with NW child <9,15> and SE child <14,3>.
32 Point quadtrees: example
- Tree: root <12,11> with NW child <9,15> and SE child <14,3>; <9,15> has SW child <1,13>.
33 Observation
- The structure of the tree depends on the insertion order!
- Exercise: insert the nodes in the order <14,3>, <12,11>, <1,13>, <9,15> and compare the resulting tree with the previous one.
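The insertion procedure and the dependence on insertion order can be sketched as follows. This is a minimal illustration, not the original Finkel and Bentley code; the class and function names are made up, and sending ties to the east/north quadrant is an assumption.

```python
class PQNode:
    """Point quadtree node: a labeling point and four quadrant children."""
    def __init__(self, point):
        self.point = point
        self.children = {"NW": None, "NE": None, "SW": None, "SE": None}

def quadrant(node_point, p):
    """Quadrant of p relative to node_point (ties go east/north: assumption)."""
    ew = "W" if p[0] < node_point[0] else "E"
    ns = "S" if p[1] < node_point[1] else "N"
    return ns + ew

def insert(root, p):
    """Walk down from the root and attach p at the first empty quadrant."""
    if root is None:
        return PQNode(p)
    node = root
    while node.point != p:  # stop early if p is already stored
        q = quadrant(node.point, p)
        if node.children[q] is None:
            node.children[q] = PQNode(p)
            break
        node = node.children[q]
    return root

def build(points):
    root = None
    for p in points:
        root = insert(root, p)
    return root

def height(node):
    if node is None:
        return 0
    return 1 + max(height(c) for c in node.children.values())

t1 = build([(12, 11), (9, 15), (14, 3), (1, 13)])   # order used in the example
t2 = build([(14, 3), (12, 11), (1, 13), (9, 15)])   # order from the exercise
print(height(t1), height(t2))  # 3 4
```

With the exercise's order, every point falls into the NW or NE quadrant of the previous one, so the tree degenerates into a chain.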
34 Key points
- Suppose a point quadtree has N nodes in it.
- Worst-case height = N.
- Worst-case insertion time = O(N).
- Other operations are:
  - Deletion: delete a point.
  - Range query: find all points within a given region.
  - NN query: find the nearest neighbor (or M nearest neighbors) of a given point.
35 Point quadtrees: range search [figure: the example tree and region, with the quadrants visited by a range query marked]
36 Deletion
- Suppose T is the root of a point quadtree and you want to delete <x,y>.
- Steps:
  - Find <x,y> by doing a search.
  - If it is a leaf node, simply set the appropriate link field of its parent to nil (and return the node to available storage).
  - What if it is not a leaf?
37 Delete <12,11> [figure: root <12,11> with NW child <9,15> and SE child <14,3>; <9,15> has SW child <1,13>]
38 Delete <12,11> [figure: the same tree, with <12,11> removed from the region]
39 Delete <12,11> [figure: which of the remaining points should replace the root?]
40 Let us choose <1,13>
- New tree: root <1,13> with NE child <9,15> and SE child <14,3>.
41 Problems with point quadtrees
- Deletion is slow.
- The tree can be highly unbalanced.
- The size of the regions associated with nodes can vary dramatically.
- All these factors make the time taken to compute NN and range queries unpredictable.
42 MX quadtrees
- In point quadtrees, the region is split by drawing a vertical and a horizontal line through the point labeling node N.
- In MX-quadtrees, the entire space is a 2^n × 2^n matrix, and a region is split by drawing a vertical and a horizontal line through the center of the region.
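43 MX quadtrees: example [figure: empty region]
44 MX quadtrees: example
- Insert <1,4>. [figure: path of quadrant labels from the root to the new leaf]
45 MX quadtrees: example
- Insert <1,2>. [figure: path of quadrant labels from the root to the new leaf]
46 MX quadtrees: example
- Insert <3,2>. [figure: path of quadrant labels from the root to the new leaf]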
47 MX-quadtrees: salient features
- Each node represents a region.
- Root (level 0) represents the 2^n × 2^n region.
- Nodes at level j represent 2^(n-j) × 2^(n-j) regions.
- Points label leaf nodes (at level n).
- Insertion takes time O(n).
- So does search for a point.
48 MX-quadtrees: deletion
- Very easy to delete a point.
- First search for the point (which must be a leaf) and delete the leaf.
- If the parent now has 4 empty child fields, then delete the parent, and repeat as long as possible. This process is termed collapsing.
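Insertion, search, and deletion with collapsing can be sketched with nested dictionaries. This is an illustrative sketch under assumptions not taken from the slides: cells are addressed by integer coordinates 0 to 2^n - 1, the quadrant at each level is read off one bit of x and one bit of y, and all function names are hypothetical.

```python
def path(x, y, n):
    """Quadrant labels from the root down to cell (x, y) of a 2^n x 2^n grid."""
    labels = []
    for level in range(n - 1, -1, -1):
        east = (x >> level) & 1
        north = (y >> level) & 1
        labels.append(("N" if north else "S") + ("E" if east else "W"))
    return labels

def mx_insert(root, x, y, n):
    """Create the n nodes on the path; the last one is the leaf for (x, y)."""
    node = root
    for q in path(x, y, n):
        node = node.setdefault(q, {})

def mx_contains(root, x, y, n):
    node = root
    for q in path(x, y, n):
        if q not in node:
            return False
        node = node[q]
    return True

def mx_delete(root, x, y, n):
    """Delete the leaf, then collapse ancestors that are left with no children."""
    trail, node = [], root
    for q in path(x, y, n):
        if q not in node:
            return False
        trail.append((node, q))
        node = node[q]
    for parent, q in reversed(trail):
        del parent[q]
        if parent:  # parent still has another child: stop collapsing
            break
    return True

root = {}
mx_insert(root, 1, 3, 2)
mx_insert(root, 0, 3, 2)
mx_delete(root, 1, 3, 2)
```

Each operation follows exactly one root-to-leaf path of length n, which matches the O(n) bound on the slide.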
49 PR-quadtrees
- An MX-quadtree works well if the data is discrete; otherwise, it may need to use buckets, which may increase search time.
- A PR-quadtree (point region quadtree) assumes a continuous space.
  - Structure is independent of insertion order.
  - Deletion is easy.
50 PR-quadtrees
- BD-trees minimize the waste of space.
51 KD-trees
- Deficiencies of the quadtree in k dimensions:
  - each node requires k comparisons;
  - each leaf contains 2^k null pointers;
  - node size gets larger as k increases.
52 KD-trees
- Solution: the KD-tree.
  - The tree is binary whatever k is!
  - Each node has two pointers only.
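A minimal sketch of why two pointers per node suffice: each level compares on a single coordinate, cycling through the k dimensions. The names are made up for this example, and sending equal keys to the right subtree is an assumption.

```python
class KDNode:
    """KD-tree node: one point and exactly two child pointers."""
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None

def kd_insert(root, point, depth=0):
    """Insert point, comparing on dimension (depth mod k) at each level."""
    if root is None:
        return KDNode(point)
    axis = depth % len(point)
    if point[axis] < root.point[axis]:
        root.left = kd_insert(root.left, point, depth + 1)
    else:  # ties go right (assumption)
        root.right = kd_insert(root.right, point, depth + 1)
    return root

def kd_contains(root, point):
    """Exact-match search: one comparison per level."""
    depth = 0
    while root is not None:
        if root.point == point:
            return True
        axis = depth % len(point)
        root = root.left if point[axis] < root.point[axis] else root.right
        depth += 1
    return False

kd_root = None
for p in [(12, 11), (9, 15), (14, 3), (1, 13)]:
    kd_root = kd_insert(kd_root, p)
```

The points are the ones from the point quadtree example, so the two structures can be compared side by side.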
53-62 KD-trees [figure sequence; only the quadrant labels NW, SE, SW, NE on slides 59-61 survive in the transcript]
63 R-trees
- R-trees are used to store two-dimensional rectangle data.
- They can be easily generalized to higher dimensions.
- R-trees themselves generalize the well-known B-trees.
64 Node capacity
- Each node in an R-tree can contain up to N rectangles.
- But in addition, each node must contain at least N/2 rectangles.
- We will assume henceforth that N >= [value missing]
65 Node structure
- Each node has between N/2 and N rectangles.
- Like a B-tree:
  - All leaves are at the same level.
  - Root has at least two children unless it's a leaf.
66 Node properties
- Each node implicitly represents a region.
- Root represents the whole space.
- The region of a node N, N.reg, is the bounding box of the rectangles stored at that node.
- Unlike quadtrees, it is possible for regions of siblings to intersect.
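The last two properties can be illustrated with a short sketch. Rectangles are assumed to be axis-aligned tuples (xmin, ymin, xmax, ymax); the sample rectangles and the names `node1`, `node2` are invented for this example.

```python
def bounding_box(rects):
    """Smallest axis-aligned rectangle enclosing all rects (the node's region)."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def intersects(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Two hypothetical sibling nodes, each storing two rectangles.
node1 = [(0, 0, 2, 2), (1, 3, 4, 5)]
node2 = [(3, 1, 6, 2), (5, 4, 7, 6)]
reg1 = bounding_box(node1)  # (0, 0, 4, 5)
reg2 = bounding_box(node2)  # (3, 1, 7, 6)
print(intersects(reg1, reg2))  # True
```

Here no rectangle in `node1` touches any rectangle in `node2`, yet the two regions overlap, so a query falling in the overlap has to descend into both siblings.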
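67 Example R-tree [figure: rectangles a, b, c, d stored in a single root node]
68 Example R-tree
- A fifth rectangle, e, arrives: no space in the root! Must split. [figure: rectangles a, b, c, d, e]
69 Example R-tree
- The root is split into two groups, g1 and g2: minimize the sum of the areas of the two bounding rectangles (BRs). [figure: groups g1 and g2 over rectangles a, b, c, d, e]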
70 Example: Let max size be
71 How do we split? [figure]
72 How do we split?
- Option: minimize the overlap of the BRs. [figure]
73 How do we split?
- Minimize the overlap of the BRs: this search range Q cannot be quickly pruned. [figure]
74 How do we split?
- Option: minimize the total area. [figure]
75 How do we split?
- Minimize the total area: this search range Q requires access to two pages! [figure]
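The "minimize total area" criterion can be sketched as an exhaustive search over all ways of dividing an overfull node into two groups. This is only an illustration of the objective: it is exponential in the node size, it is not the split heuristic of any particular R-tree variant, and `best_split` and `min_fill` are hypothetical names.

```python
from itertools import combinations

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def bounding_box(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def best_split(rects, min_fill):
    """Try every split into two groups of at least min_fill rectangles and
    keep the one whose two bounding rectangles have the smallest total area."""
    best = None
    idx = range(len(rects))
    for size in range(min_fill, len(rects) - min_fill + 1):
        for group in combinations(idx, size):
            g1 = [rects[i] for i in group]
            g2 = [rects[i] for i in idx if i not in group]
            cost = area(bounding_box(g1)) + area(bounding_box(g2))
            if best is None or cost < best[0]:
                best = (cost, g1, g2)
    return best

# Five hypothetical rectangles (xmin, ymin, xmax, ymax) in an overfull node.
rects = [(0, 0, 1, 1), (1, 0, 2, 1), (10, 10, 11, 11),
         (11, 10, 12, 11), (0, 1, 1, 2)]
cost, g1, g2 = best_split(rects, min_fill=2)
print(cost)  # 6
```

Replacing the cost with the area of the intersection of the two bounding rectangles would give the "minimize overlap" option instead.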
76 Insertion (similar to B-trees)
- Step 1: search for the appropriate page.
77 Insertion (similar to B-trees)
- Step 1: search for the appropriate page.
- Step 2: if the page is full, split and push up.
78 Insertion (similar to B-trees)
- Step 1: search for the appropriate page.
- Step 2: if the page is full, split and push up.
- How about deletion?
79 R+-tree
- Overlaps are bad...
- ...so, let's eliminate overlaps.
80 Overlap in R-tree
- The two BRs are overlapping. [figure: groups g1 and g2 over rectangles a, b, c, d, e, f]
81 No overlap in R+-tree
- The two BRs are not overlapping. [figure: groups g1 and g2; rectangle f appears as two pieces, f1 and f2]
82 Other range/region index structures
- Range tree, 2D range tree
  - Precise, too much overhead
- MX-CIF quadtree
  - Regular division
  - Each rectangle is associated with the quadtree page which covers it entirely
83 TV trees (telescopic vector trees) (Lin, Jagadish, Faloutsos, VLDB Journal, 1994)
- Dimensionality curse: R-trees do not work for large numbers of dimensions.
- Idea: not all features are equally important.
  - Order features based on importance (discrimination power).
  - Use as few features as possible.
  - Contract and extend feature vectors based on need.
84 Cost of a dimension
- Every rectangle has to have values describing all its dimensions.
85 Cost of a dimension
- Every rectangle has to have values describing all its dimensions. [figure: disk page vs. disk page]
86 Intuition
- Classification requires fewer features at the higher levels than it uses at the lower levels.
[figure: classification hierarchy. 0 features: transportation vehicle; 1 feature: air, sea, land; 2 features: jet-engine, propeller, road, land]
87 TV-trees
- Hierarchical
  - Leaves: objects (documents)
  - Internal nodes: minimum bounding regions
- Higher fan-out at the root
- Lower fan-out at the leaves (or lower levels)
88 Node structure in TV-trees
- In R-trees, every node is a hyper-rectangle.
- In TV-trees, every node has
  - a center (in k dimensions),
  - a radius (defined in n dimensions).
- Feature layout, in order of feature importance: f1, f2, ..., fk (center) | fk+1, ..., fk+n (radius) | remaining features (unused)
89 Node structure in TV-trees
- n stays constant! [figure: the same layout with arrows labeled "contract" and "extend"]
90 TV trees: example
- C, the center, has only one dimension, x.
- The radius has only one dimension, y.
- Any z is okay (z is unused).
[figure: axes x, y, z with the center and radius r]
91 TV trees: extension example
- C, the center, has two dimensions, x and y.
- The radius has only one dimension, z.
- Any z is not okay!
[figure: axes x, y, z with the center and radius r]
92 Drawback
- Information about the behaviour of single attributes, e.g., their selectivity, is required.
93 X-tree
- Like R-trees, but change the page size based on the depth to ensure that there is a larger fanout higher in the tree structure.
- A larger page size means multiple disk pages that are stored consecutively, so there is no page seek penalty during disk access.
94 Dimensionality curse
- Exponential growth in the number of pointers needed; wasted storage
- Exponential number of subqueries (quadtrees)
- Larger MBRs mean smaller fanout in trees, and this is bad
95 Pyramid trees (Berchtold, Böhm, Kriegel, SIGMOD 98)
- Motivation: drawbacks of already existing multidimensional index structures
  - Querying and indexing techniques which provide good results on low-dimensional data do not perform sufficiently well on high-dimensional data (curse of dimensionality)
  - High cost for insert/delete operations
  - Poor support for concurrency control/recovery
96 Pyramid tree
- Space-filling curves were using B-trees.
- Pyramid trees also do the same, without space-filling curves.
97-100 Pyramid tree [figure sequence: the space divided into numbered pyramids; points located at heights h1 and h2 within their pyramids]
101 Pyramid tree
- Each point gets a key of the form (pyramid number, height), e.g., key = (4, h2) and key = (3, h1).
- These keys can be used as inputs to a B-tree!
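One way to compute such a key is sketched below. The details are assumptions that go beyond the slide: data is taken to be normalized to the unit cube [0, 1]^d, the 2d pyramids meet at the center point (0.5, ..., 0.5), pyramids are numbered from 0, and the function names are made up.

```python
def pyramid_key(v):
    """Return (pyramid number, height) for a point v in the unit cube."""
    d = len(v)
    # The dimension in which v deviates most from the center picks the pyramid.
    j = max(range(d), key=lambda t: abs(0.5 - v[t]))
    i = j if v[j] < 0.5 else j + d   # which of the two pyramids on that axis
    h = abs(0.5 - v[j])              # height: distance from the center
    return i, h

def pyramid_value(v):
    """Single 1D key: h <= 0.5, so keys of different pyramids never interleave."""
    i, h = pyramid_key(v)
    return i + h

print(pyramid_key((0.125, 0.625)))   # (0, 0.375)
print(pyramid_value((0.5, 0.875)))   # 3.375
```

The resulting 1D values can then be stored in an ordinary B-tree or B+-tree.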
102 Pyramid tree
- If data is uniformly distributed, pages are likely to be of the same volume.
103 Pyramid tree
- If data is uniformly distributed, queries are likely to avoid thin pages, reducing the average access time. [figure: "bad" vs. "good" page shapes]
104 Other index structures
- Grids
- VA-files: extension of the grid idea
- SR-trees, SS-trees: like R-trees, but use spheres instead of rectangles
- X-trees: like R-trees, but change the page size based on the depth
105 Nearest neighbor search
- ...no range given:
  - First pick a (random) object o ∈ D and compute the distance dist(q, o). This is the first nearest neighbor candidate.
  - Start a range search on the hierarchy using the range r = dist(q, o).
  - Whenever you find a data object o such that dist(q, o) < r, where r is the current nearest neighbor range, pick o as the new nearest neighbor candidate and set dist(q, o) as the new range, r.
106 Nearest neighbor search
- ...great, but in which order do we visit the pages?
- ...how can we prune the pages that we have not visited yet most effectively?
107 Nearest neighbor search
- Given q and MBR M, mindist(q, M) is
  - the minimum possible distance between the query and the objects contained within the MBR;
  - the minimum distance between q and any of the faces of M (taken as 0 when q lies inside M).
- Optimistic; yet mindist-based ordering of the MBRs provides good pruning opportunities.
108 Nearest neighbor search
- Given q and MBR M, minmaxdist(q, M) is
  - an upper bound on the distance from q to its nearest object in M;
  - the minimum, over the dimensions, of the maximum distance to the closest face of M on that dimension.
- mindist(q, M) <= r <= minmaxdist(q, M)
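Both bounds can be computed directly from the corners of the MBR. The sketch below assumes Euclidean distance and returns squared values to avoid square roots; the formulation (nearer bound on one dimension, farther bound on all the others) follows the usual definition of these two metrics, and the function names are made up.

```python
def mindist_sq(q, lo, hi):
    """Squared distance from q to the box [lo, hi]; 0 if q is inside."""
    total = 0.0
    for p, s, t in zip(q, lo, hi):
        if p < s:
            total += (s - p) ** 2
        elif p > t:
            total += (p - t) ** 2
    return total

def minmaxdist_sq(q, lo, hi):
    """Squared minmaxdist: for each dimension k, take the nearer bound on k
    and the farther bound on every other dimension; return the minimum."""
    near = [s if p <= (s + t) / 2 else t for p, s, t in zip(q, lo, hi)]
    far = [s if p >= (s + t) / 2 else t for p, s, t in zip(q, lo, hi)]
    far_sq = [(p - f) ** 2 for p, f in zip(q, far)]
    total_far = sum(far_sq)
    return min(total_far - far_sq[k] + (q[k] - near[k]) ** 2
               for k in range(len(q)))

q, lo, hi = (0, 0), (1, 1), (3, 2)
print(mindist_sq(q, lo, hi), minmaxdist_sq(q, lo, hi))  # 2.0 5
```

In this example the nearest corner (1, 1) gives the lower bound 2, and the far end (1, 2) of the closest face gives the upper bound 5.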
109 Nearest neighbor search
- Cannot prune an MBR as long as mindist(q, M) <= r <= minmaxdist(q, M).
- Downward pruning: discard M if there exists M' s.t. mindist(q, M) > minmaxdist(q, M').
- Downward pruning: prune candidate object o if there exists M s.t. dist(q, o) = r > minmaxdist(q, M).
- Upward pruning: M is discarded if the current candidate is s.t. mindist(q, M) > r = dist(q, o).
110 Nearest neighbor search
- What if we are looking for more than one, say k, nearest neighbors?
  - Maintain a list of k candidates in memory.
  - Always prune the search space using the current k-th best candidate.
  - When you find an object better than the current k-th best candidate:
    - drop the current k-th best candidate;
    - include the new object in the list of k candidates;
    - identify the new k-th best candidate.
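The candidate list can be kept in a max-heap so that the current k-th best candidate is always at the top. The sketch below scans a plain list of points instead of traversing an index, which is a simplification; in a tree search, the distance of the k-th best candidate would serve as the pruning range r. The name `knn` is made up.

```python
import heapq
import math

def knn(points, q, k):
    """Return the k nearest points to q as (distance, point), nearest first."""
    heap = []  # max-heap via negated distances; heap[0] is the k-th best
    for p in points:
        d = math.dist(q, p)
        if len(heap) < k:
            heapq.heappush(heap, (-d, p))
        elif d < -heap[0][0]:
            # Better than the current k-th best: drop it and include p.
            heapq.heapreplace(heap, (-d, p))
    return sorted((-nd, p) for nd, p in heap)

result = knn([(0, 0), (5, 5), (1, 1), (2, 2), (9, 9)], (0, 0), 2)
print([p for _, p in result])  # [(0, 0), (1, 1)]
```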
111 Hashing for nearest neighbor search
- Hashing generally works for equality searches.
- ...can we use hashes for nearest-neighbor searches?
- ...if they are locality sensitive, then yes!
112 Locality Sensitive Hashing (LSH)
- What is locality sensitive hashing?
  - A grid is a locality sensitive hash.
  - A space-filling curve is a locality sensitive hash.
  - More specifically, these are deterministic functions that tend to map nearby points to the same or nearby values.
- Can we develop randomized locality sensitive hashes?
113 Locality Sensitive Hashing (LSH)
- Let sim() be a similarity function.
- A locality sensitive hash corresponding to sim() is a function, h(), such that prob(h(o1) = h(o2)) = sim(o1, o2).
- The challenge is to find the appropriate h() for a given sim().
114 Locality Sensitive Hashing (LSH)
- An LSH family, H, is (r, cr, P1, P2)-sensitive if, for any two objects oi and oj and for a randomly selected h() ∈ H,
  - if dist(oi, oj) <= r then prob(h(oi) = h(oj)) >= P1,
  - if dist(oi, oj) >= cr then prob(h(oi) = h(oj)) <= P2,
  - and P1 > P2.
115 Locality Sensitive Hashing (LSH)
- Consider an (r, cr, P1, P2)-sensitive hash family, H.
- Let's create L composite hash functions gj(o) = (h1,j(o), ..., hk,j(o)) by picking L × k hash functions, hi,j ∈ H, independently and uniformly at random from H.
116 Locality Sensitive Hashing (LSH)
- Let us be given g1() through gL() and a database, D.
- Hash each object o in D using g1() through gL() and include o in all matching hash buckets:
  - g1(o) = (h1,1(o), ..., hk,1(o)),
  - ...
  - gL(o) = (h1,L(o), ..., hk,L(o))
117 Locality Sensitive Hashing (LSH)
- Hash the query q also using g1() through gL() and consider all objects in these hash buckets:
  - g1(q) = (h1,1(q), ..., hk,1(q)),
  - ...
  - gL(q) = (h1,L(q), ..., hk,L(q))
- Key result: if L = log_{1 − P1^k} δ, then any object within range r is returned with probability at least 1 − δ.
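The key result follows from a short argument: an object within range r collides with q under one composite function g with probability at least P1^k, so it is missed by all L tables with probability at most (1 − P1^k)^L, and requiring this to be at most δ gives the stated L. A small sketch of the arithmetic, assuming 0 < P1 < 1 and 0 < δ < 1 and rounding up to a whole number of tables (the function name is made up):

```python
import math

def tables_needed(p1: float, k: int, delta: float) -> int:
    """Smallest integer L with (1 - p1**k)**L <= delta."""
    return math.ceil(math.log(delta) / math.log(1 - p1 ** k))

L = tables_needed(p1=0.9, k=10, delta=0.1)
print(L)  # 6
```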
118 Locality Sensitive Hashing (LSH)
- Then, how do we create an (r, cr, P1, P2)-sensitive hash family, H?
- ...depends on the underlying sim() or δ() function.
119 Locality Sensitive Hashing (LSH)
- Assume d-dimensional binary vectors; e.g., (0, 1, 1, 1, 0, ..., 1).
- Let δ() be the Hamming distance (the number of differing dimensions between two vectors).
- H contains all projections of the input point x on one of the coordinates; i.e., hi(x) = xi.
120 Locality Sensitive Hashing (LSH)
- Let p and q be two vectors in d-dimensional binary vector space.
  - δ() is the Hamming distance.
  - H contains hi(x) = xi.
- Note that prob[h(q) = h(p)] is equal to the fraction of coordinates on which p and q agree.
- Then, if we select P1 = 1 − (r/d) and P2 = 1 − c(r/d) such that c > 1, we have P1 > P2.
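Putting the pieces together for Hamming distance, here is a minimal sketch of L hash tables, each keyed by k randomly sampled coordinates. The class name, the `seed` parameter, and the use of tuples for binary vectors are assumptions for illustration; candidates would still need a final check with the true distance.

```python
import random
from collections import defaultdict

class BitSamplingLSH:
    """L tables; table j is keyed by g_j(x) = (x[i1], ..., x[ik])."""
    def __init__(self, d, k, L, seed=0):
        rng = random.Random(seed)
        # Each g_j is k coordinate indices drawn uniformly at random,
        # i.e. k projections h_i(x) = x[i] picked from the family H.
        self.gs = [tuple(rng.randrange(d) for _ in range(k)) for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, g, x):
        return tuple(x[i] for i in g)

    def add(self, x):
        """Place x in its bucket in every table."""
        for g, table in zip(self.gs, self.tables):
            table[self._key(g, x)].append(x)

    def candidates(self, q):
        """Union of the buckets that q falls into across the L tables."""
        found = set()
        for g, table in zip(self.gs, self.tables):
            found.update(table.get(self._key(g, q), []))
        return found

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

index = BitSamplingLSH(d=8, k=3, L=4)
p = (0, 1, 1, 1, 0, 0, 1, 0)
index.add(p)
far = tuple(1 - bit for bit in p)  # differs from p in every coordinate
```

Increasing k makes each table more selective (fewer false positives), while increasing L raises the chance that a true neighbor is found in at least one table.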
123 To eliminate false positives, the hash function of each table is the intersection of multiple element hashes h(), but this can increase the number of misses!
- g() = [h1(), ..., hc()]
[figure: cells [1,1] and [1,2] induced by h1 and h2, showing a miss and a false positive; animation by Renwei Yu]
124 To reduce misses, the union of L hash tables is used
- G = {g1, ..., gL}
- A single hash table can have many misses.
[figure: query q hashed by g1(q), ..., gi(q), ..., gL(q); candidate set; post-processing against the database; results; animation by Renwei Yu]
125 Issues of basic LSH (this slide by Renwei Yu)
- A large number of tables is needed to achieve good search quality.
  - L > 580 in [Buhler01]
- Impractical for large datasets; need to reduce the number of hash tables.
  - Entropy-based LSH [Panigrahy06]
  - Multi-Probe LSH [Qin07]
126 Issues of basic LSH (continued) (this slide by Renwei Yu)
- Data-dependent parameters need hand-tuning.
  - Different bucket sizes are required to collect enough candidates to answer different KNN queries.
  - LSH-Forest [Bawa05]
  - Multi-Probe LSH can also be self-tuning to answer different KNN queries.
Access Methods This is a modified version of Prof. Hector Garcia Molina s slides. All copy rights belong to the original author. Basic Concepts search key pointer Value record? value Search Key - set of
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu [Kumar et al. 99] 2/13/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu
More informationSpatial Data Structures for Computer Graphics
Spatial Data Structures for Computer Graphics Page 1 of 65 http://www.cse.iitb.ac.in/ sharat November 2008 Spatial Data Structures for Computer Graphics Page 1 of 65 http://www.cse.iitb.ac.in/ sharat November
More informationModule 3: Hashing Lecture 9: Static and Dynamic Hashing. The Lecture Contains: Static hashing. Hashing. Dynamic hashing. Extendible hashing.
The Lecture Contains: Hashing Dynamic hashing Extendible hashing Insertion file:///c /Documents%20and%20Settings/iitkrana1/My%20Documents/Google%20Talk%20Received%20Files/ist_data/lecture9/9_1.htm[6/14/2012
More informationChapter 17 Indexing Structures for Files and Physical Database Design
Chapter 17 Indexing Structures for Files and Physical Database Design We assume that a file already exists with some primary organization unordered, ordered or hash. The index provides alternate ways to
More informationChapter 12: Indexing and Hashing
Chapter 12: Indexing and Hashing Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree
More informationDatabase System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use
Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files Static
More informationA Tutorial on Spatial Data Handling
A Tutorial on Spatial Data Handling Rituparna Sinha 1, Sandip Samaddar 1, Debnath Bhattacharyya 2, and Tai-hoon Kim 3 1 Heritage Institute of Technology Kolkata-700107, INDIA rituparna.snh@gmail.com Hannam
More informationChapter 25: Spatial and Temporal Data and Mobility
Chapter 25: Spatial and Temporal Data and Mobility Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 25: Spatial and Temporal Data and Mobility Temporal Data Spatial
More informationSpatial Data Management
Spatial Data Management [R&G] Chapter 28 CS432 1 Types of Spatial Data Point Data Points in a multidimensional space E.g., Raster data such as satellite imagery, where each pixel stores a measured value
More informationData Structures for Approximate Proximity and Range Searching
Data Structures for Approximate Proximity and Range Searching David M. Mount University of Maryland Joint work with: Sunil Arya (Hong Kong U. of Sci. and Tech) Charis Malamatos (Max Plank Inst.) 1 Introduction
More informationOn the data structures and algorithms for contact detection in granular media (DEM) V. Ogarko, May 2010, P2C course
On the data structures and algorithms for contact detection in granular media (DEM) V. Ogarko, May 2010, P2C course Discrete element method (DEM) Single particle Challenges: -performance O(N) (with low
More informationSpatial Data Management
Spatial Data Management Chapter 28 Database management Systems, 3ed, R. Ramakrishnan and J. Gehrke 1 Types of Spatial Data Point Data Points in a multidimensional space E.g., Raster data such as satellite
More informationimproving raytracing speed
ray tracing II computer graphics ray tracing II 2006 fabio pellacini 1 improving raytracing speed computer graphics ray tracing II 2006 fabio pellacini 2 raytracing computational complexity ray-scene intersection
More informationIntroduction to Spatial Database Systems
Introduction to Spatial Database Systems by Cyrus Shahabi from Ralf Hart Hartmut Guting s VLDB Journal v3, n4, October 1994 Data Structures & Algorithms 1. Implementation of spatial algebra in an integrated
More informationCMSC724: Access Methods; Indexes 1 ; GiST
CMSC724: Access Methods; Indexes 1 ; GiST Amol Deshpande University of Maryland, College Park March 14, 2011 1 Partially based on notes from Joe Hellerstein Outline 1 Access Methods 2 B+-Tree 3 Beyond
More informationChapter 11: Indexing and Hashing
Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL
More informationMultidimensional Indexing The R Tree
Multidimensional Indexing The R Tree Module 7, Lecture 1 Database Management Systems, R. Ramakrishnan 1 Single-Dimensional Indexes B+ trees are fundamentally single-dimensional indexes. When we create
More informationNearest neighbor classification DSE 220
Nearest neighbor classification DSE 220 Decision Trees Target variable Label Dependent variable Output space Person ID Age Gender Income Balance Mortgag e payment 123213 32 F 25000 32000 Y 17824 49 M 12000-3000
More informationRay Tracing Acceleration Data Structures
Ray Tracing Acceleration Data Structures Sumair Ahmed October 29, 2009 Ray Tracing is very time-consuming because of the ray-object intersection calculations. With the brute force method, each ray has
More informationNearest Neighbor Classification. Machine Learning Fall 2017
Nearest Neighbor Classification Machine Learning Fall 2017 1 This lecture K-nearest neighbor classification The basic algorithm Different distance measures Some practical aspects Voronoi Diagrams and Decision
More informationCSIT5300: Advanced Database Systems
CSIT5300: Advanced Database Systems L08: B + -trees and Dynamic Hashing Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong SAR,
More informationMining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams
Mining Data Streams Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction Summarization Methods Clustering Data Streams Data Stream Classification Temporal Models CMPT 843, SFU, Martin Ester, 1-06
More informationIntersection Acceleration
Advanced Computer Graphics Intersection Acceleration Matthias Teschner Computer Science Department University of Freiburg Outline introduction bounding volume hierarchies uniform grids kd-trees octrees
More informationSystems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15
Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture II: Indexing Part I of this course Indexing 3 Database File Organization and Indexing Remember: Database tables
More informationComputational Geometry
Computational Geometry Search in High dimension and kd-trees Ioannis Emiris Dept Informatics & Telecoms, National Kapodistrian U. Athens ATHENA Research & Innovation Center, Greece Spring 2018 I.Emiris
More informationSummary. 4. Indexes. 4.0 Indexes. 4.1 Tree Based Indexes. 4.0 Indexes. 19-Nov-10. Last week: This week:
Summary Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Last week: Logical Model: Cubes,
More information9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology
Introduction Chapter 4 Trees for large input, even linear access time may be prohibitive we need data structures that exhibit average running times closer to O(log N) binary search tree 2 Terminology recursive
More informationIndexing High-Dimensional Data for Content-Based Retrieval in Large Databases
Indexing High-Dimensional Data for Content-Based Retrieval in Large Databases Manuel J. Fonseca, Joaquim A. Jorge Department of Information Systems and Computer Science INESC-ID/IST/Technical University
More informationUnsupervised Learning Hierarchical Methods
Unsupervised Learning Hierarchical Methods Road Map. Basic Concepts 2. BIRCH 3. ROCK The Principle Group data objects into a tree of clusters Hierarchical methods can be Agglomerative: bottom-up approach
More informationDatabase System Concepts, 5th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use
Chapter 12: Indexing and Hashing Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree
More informationSorting in Space and Network Distance Browsing
Sorting in Space and Network Distance Browsing Hanan Samet hjs@cs.umd.edu http://www.cs.umd.edu/ hjs Department of Computer Science University of Maryland College Park, MD 20742, USA These notes may not
More informationCS127: B-Trees. B-Trees
CS127: B-Trees B-Trees 1 Data Layout on Disk Track: one ring Sector: one pie-shaped piece. Block: intersection of a track and a sector. Disk Based Dictionary Structures Use a disk-based method when the
More informationSpatial Data Structures
15-462 Computer Graphics I Lecture 17 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) April 1, 2003 [Angel 9.10] Frank Pfenning Carnegie
More informationCAR-TR-990 CS-TR-4526 UMIACS September 2003
CAR-TR-990 CS-TR-4526 UMIACS 2003-94 September 2003 Object-based and Image-based Object Representations Hanan Samet Computer Science Department Center for Automation Research Institute for Advanced Computer
More informationApproximate Nearest Neighbor Search. Deng Cai Zhejiang University
Approximate Nearest Neighbor Search Deng Cai Zhejiang University The Era of Big Data How to Find Things Quickly? Web 1.0 Text Search Sparse feature Inverted Index How to Find Things Quickly? Web 2.0, 3.0
More informationDifferentially Private H-Tree
GeoPrivacy: 2 nd Workshop on Privacy in Geographic Information Collection and Analysis Differentially Private H-Tree Hien To, Liyue Fan, Cyrus Shahabi Integrated Media System Center University of Southern
More informationHot topics and Open problems in Computational Geometry. My (limited) perspective. Class lecture for CSE 546,
Hot topics and Open problems in Computational Geometry. My (limited) perspective Class lecture for CSE 546, 2-13-07 Some slides from this talk are from Jack Snoeyink and L. Kavrakis Key trends on Computational
More informationImage Analysis & Retrieval. CS/EE 5590 Special Topics (Class Ids: 44873, 44874) Fall 2016, M/W Lec 18.
Image Analysis & Retrieval CS/EE 5590 Special Topics (Class Ids: 44873, 44874) Fall 2016, M/W 4-5:15pm@Bloch 0012 Lec 18 Image Hashing Zhu Li Dept of CSEE, UMKC Office: FH560E, Email: lizhu@umkc.edu, Ph:
More informationA locality-aware similar information searching scheme
Int J Digit Libr DOI 10.1007/s00799-014-0128-9 A locality-aware similar information searching scheme Ting Li Yuhua Lin Haiying Shen Received: 5 June 2013 / Revised: 3 September 2014 / Accepted: 11 September
More informationR-Trees. Accessing Spatial Data
R-Trees Accessing Spatial Data In the beginning The B-Tree provided a foundation for R- Trees. But what s a B-Tree? A data structure for storing sorted data with amortized run times for insertion and deletion
More informationAdvanced Algorithm Design and Analysis (Lecture 12) SW5 fall 2005 Simonas Šaltenis E1-215b
Advanced Algorithm Design and Analysis (Lecture 12) SW5 fall 2005 Simonas Šaltenis E1-215b simas@cs.aau.dk Range Searching in 2D Main goals of the lecture: to understand and to be able to analyze the kd-trees
More informationModule 4: Index Structures Lecture 16: Voronoi Diagrams and Tries. The Lecture Contains: Voronoi diagrams. Tries. Index structures
The Lecture Contains: Voronoi diagrams Tries Delaunay triangulation Algorithms Extensions Index structures 1-dimensional index structures Memory-based index structures Disk-based index structures Classification
More informationUniversiteit Leiden Computer Science
Universiteit Leiden Computer Science Optimizing octree updates for visibility determination on dynamic scenes Name: Hans Wortel Student-no: 0607940 Date: 28/07/2011 1st supervisor: Dr. Michael Lew 2nd
More informationSpatial Data Structures
15-462 Computer Graphics I Lecture 17 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) March 28, 2002 [Angel 8.9] Frank Pfenning Carnegie
More informationSpatial data structures in 2D
Spatial data structures in 2D 1998-2016 Josef Pelikán CGG MFF UK Praha pepca@cgg.mff.cuni.cz http://cgg.mff.cuni.cz/~pepca/ Spatial2D 2016 Josef Pelikán, http://cgg.mff.cuni.cz/~pepca 1 / 34 Application
More information18 Multidimensional Data Structures 1
18 Multidimensional Data Structures 1 Hanan Samet University of Maryland 18.1 Introduction 18.2 PointData 18.3 BucketingMethods 18.4 RegionData 18.5 RectangleData 18.6 LineDataandBoundariesofRegions 18.7
More informationAlgorithms for Nearest Neighbors
Algorithms for Nearest Neighbors Classic Ideas, New Ideas Yury Lifshits Steklov Institute of Mathematics at St.Petersburg http://logic.pdmi.ras.ru/~yura University of Toronto, July 2007 1 / 39 Outline
More informationBranch and Bound. Algorithms for Nearest Neighbor Search: Lecture 1. Yury Lifshits
Branch and Bound Algorithms for Nearest Neighbor Search: Lecture 1 Yury Lifshits http://yury.name Steklov Institute of Mathematics at St.Petersburg California Institute of Technology 1 / 36 Outline 1 Welcome
More information