Indexing of moving objects. Simonas Šaltenis Aalborg University

Size: px
Start display at page:

Download "Indexing of moving objects. Simonas Šaltenis Aalborg University"

Transcription

1 Indexing of moving objects Simonas Šaltenis Aalborg University

2 Welcome! The audience of the course, what this course is about and what it is not about: research results not how to use commercial DBMSs, but we will talk about them too Important: ideas Agenda: Indexing of current time and future: TPR-trees, TPR*-trees, STRIPES Indexing of the past: B x -trees, BB x -trees, 3D R-trees, Partial persistence Acknowledgment Yufei Tao (TPR*-tree slides) Simonas Šaltenis, 006

3 Outline Motivation Background Multidimensional index structures R-tree Indexing the current and future positions: TPR-tree and TPR*-tree STRIPES B x -tree Trajectory indexing: Straight-forward R-tree-based approaches BB x -tree Partial persistence Conclusion of the course Simonas Šaltenis, 006 3

4 Why Moving Objects? Position-aware, online, moving objects are enabled by the following trends. Miniaturization of electronics Advances in positioning systems (e.g., GPS, assisted GPS,...) Advances in wireless communications Examples of position-aware online moving objects GPS-enabled mobile-phones, as well as diverse types of personal digital assistants (online cameras, wrist watches, etc.). The coming years will witness very large quantities of these. Vehicles, including cars, public transportation, recreational vehicles, sea vessels, airplanes, etc. Simonas Šaltenis, 006 4

5 Sample Location-Based Services Context-aware Location-Based Services: Location-aware advertising Integrated tourist services Traffic coordination and management Identification of impending traffic jams and expected fastest routes between positions Safety-related services It is possible to monitor tourists traveling in dangerous environments, and then react to emergencies. Games! Sensor-networks also generates MO data! Monitoring of any kind-of continuous variables, e.g., temperature, pressure (1D, nd). Simonas Šaltenis, 006 5

6 Requirements There is a need to track the positions of large amounts of objects in a database main-memory solutions may not always suffice, especially if history needs to be recorded There is a need for efficient processing of queries: Find all mobile-phones that are currently no further than 500m from the supermarket Find all mobile-phones that will be no further than 500m from the supermarket in 10 minutes (according to the current knowledge of their movement) Find ten objects that are currently closest to the supermarket Find all mobile-phones that were no further than 500m from the supermarket last Saturday between 10 and 11 a.m. Simonas Šaltenis, 006 6

7 Rules of the game Measure of efficiency number of I/O operations One I/O operation reads or writes a page of data (~Kb 16Kb) from the disk to memory A page of data = a node of data structure (index) consisting of a number B of entries The size of the data set in disk blocks n = N/B CPU cost is mostly neglected Operations that have to be supported: Different kind of queries (that's why we want an index) Insertions Deletions Update = Deletion + Insertion Bulk-loading the index Simonas Šaltenis, 006 7

8 Multidimensional DS Multidimensional Data Structures are used to answer queries on the set of stationary multidimensional points Range query ([x 1, x ], [y 1,y ]): find all points (x, y) such that x 1 <x<x and y 1 <y<y, and do it faster than a linear scan (Θ(n)) preferably O(log n) in worst-case or at least on average Simonas Šaltenis, 006 8

9 1D Range-Searching: B-trees B-trees [Bayer & McCreight, 197] a balanced diskbased search tree B+-trees All data entries (keys) are in leaf-nodes Based on an ordering < Simonas Šaltenis, 006 9

10 B-trees (cont.) Internal (directory) nodes direct queries and point to lower-level nodes Leaf nodes include actual keys and pointers to data items Guaranteed 50% capacity by insert and delete algorithms split / merge routines Range query: find x, such that x 1 <x<x Can be done in O(log n + k) worst-case time, where k is the size of the answer in disk blocks x 1 x Simonas Šaltenis,

11 Multidimensional Range Searching Much harder than one-dimensional problem A lot of multidimensional index methods have been proposed The theoretically optimal ones [Arge et al. 1999] that achieve O(log n + k) worst-case query performance: are unpractically complex, require a little bit more than linear-space Easy to implement, linear-space methods: K-D-B-trees, Quadtrees, Grid Files (for points) R-trees and variations (work for non-point objects as well) Concurrency control, recovery protocols for R-trees are well researched Implemented in Oracle, IBM Informix, PostgreSQL, MySQL Not only range queries, but a wide variety of other types of queries are supported by R-trees (nearest-neighbor queries, joins) Simonas Šaltenis,

12 Spatial Indexing With the R-Tree Example R1 p6 Query R3 p1 p R4 p4 R5 p7 R1 R p3 p5 p8 R3 R4 R5 R6 R7 R R7 p1 R6 p9 p11 p10 p13 p1 p p3 p4 Pointers to data items p5 p6 p7 p8 p9 p10p11 p1p13 Simonas Šaltenis, 006 1

13 R-tree Properties Leaf entry = <n-dimensional point, rid> Internal entry = <n-dim MBR, ptr to a child node> MBR a Minimum Bounding Rectangle of all points in the sub-tree pointed to by ptr R-tree is a balanced tree all leaves are at same depth from the root Nodes are kept at least m% full (except root) through insertion and deletion algorithms m is usually chosen to be 40%. m is the minimum fill factor, depending on the workload the average fill factor is usually 70%. Simonas Šaltenis,

14 Range Query in R-trees Range query: RangeQuery(Q, root) RangeQuery(Q, node) If node is internal, for each entry <MBR, ptr> in node: if MBR overlaps Q, RangeQuery(Q, ReadNode (ptr)) If node is leaf, for each entry <E, rid> in node: if E overlaps Q, report rid Note: We may have to search several subtrees at each node! What about B-trees? Worst-case performance O(n)! But in practice, R-trees exhibit good query performance for various data sets How do we insert and delete? Simonas Šaltenis,

15 Grow-Post trees Grow-Post trees: generalized R-tree-type indexes Bounding predicate (BP) = something that describes entries in a subtree Building blocks of algorithms: Consistent(BP, Q) returns true if results of query Q can be under BP (in the R-tree, MBR intersects Q) PickSplit(node) splits a page of entries into two groups Penalty(BP, E) returns an estimate how worse BP becomes if E is inserted under it Union(node) computes a BP of a collection of entries (in the R-tree, computes an MBR minimum and maximum in all dimensions ) Internal Nodes Leaf Nodes BP 1 BP..... BP n Simonas Šaltenis,

16 Insertion Insert(E) leaf = ChoosePath(E, root) Insert E into leaf PropogateUp (leaf) ChoosePath(E, node) If node is leaf, return node. From all entries in node, choose entry <MBR, ptr> with the smallest Penalty (MBR, E). ChoosePath(E, ReadNode(ptr)). PropogateUp(node) If node is overfull, call PickSplit(node) to produce n1 and n, replace node s old entry in its parent by e1 = Union(n1), e = Union(n), call PropogateUp(node's parent) Else if e = Union(node) is different from node s old entry in its parent, replace the old entry with e, callpropogateup(node's parent). Create a new root with two entries whenever a root is split. Simonas Šaltenis,

17 Heuristics for Penalty Heuristics of least area enlargement and smallest area are used in the R-tree s Penalty. R1 p6 R3 p1 p R4 p4 R5 p7 R1 R p3 p5 p8 R3 R4 R5 R6 R7 R R7 p1 R6 p9 p11 p10 p13 p14 p1 p p3 p4 Pointers to data tuples p5 p6 p7 p8 p9 p10p11 p1p13 Simonas Šaltenis,

18 Heuristics for Penalty Heuristics of least area enlargement and smallest area are used in the R-tree s Penalty. R1 p6 R3 p1 p R4 p4 R5 p7 R1 R p3 p5 p8 R3 R4 R5 R6 R7 R R7 p1 R6 p9 p11 p10 p13 p14 p1 p p3 p4 Pointers to data tuples p5 p6 p7 p8 p9 p10p11 p1p13 p14 Simonas Šaltenis,

19 PickSplit The entries in a node node plus the newly inserted entry must be distributed between n1 and n The goal is to reduce a likelihood of both n1 and n being searched on subsequent queries Again we use heuristics of minimum area. Redistribute so as to minimize the area of n1 plus the area of n. Problem there is an exponential number of possible distributions of entries to consider. Solutions: Guttman s quadratic gready algorithm Good split R*-tree s algrithm Bad split Simonas Šaltenis,

20 Deletion Delete(E) Using the search procedure, find a leaf where entry E is located Remove E from leaf. Call PropogateUp(leaf). PropogateUp(leaf) 1. If leaf is underfull, deallocate the node leaf remove leaf s entry in its parent, call PropogateUp(leaf s parent), and reinsert all leaf s entries.. Else if e = Union(leaf) is different from leaf s old entry in its parent, replace the old entry with e, call PropogateUp (leaf s parent). No additional heuristics are involved in Delete, underfull nodes are handled using Insert as a subroutine. Simonas Šaltenis, 006 0

21 Comments on R-Trees Works well for - 4 D datasets. Several variants (notably, R + and R*-trees) have been proposed; widely used Supports a wide variety of queries Point / range queries Spatial join queries [Brinkhoff et al., 1993] Direction, topological, distance queries [Papadias et al., 1995] k- Nearest neighbor queries [Roussopoulos et al., 1995] Simonas Šaltenis, 006 1

22 Exercises R-trees Insert points p1, p, p1, p13, p5, p14, p8, p7, p11, p9 from slide 17 into an initially empty R-tree with the node capacity of 3 entries and minimum allowed fan-out of. Use Guttman s quadratic split algorithm. Would you build a better tree from scratch if all points were known in advance? What is the difference between the B-tree and the 1D R-tree? What kind of data the 1D R-tree can store that the B-tree can not? What is the bounding predicate in B-trees? Consider nearest neighbor queries ( find a data point that is nearest to the query point ) in R-trees: Can they be implemented redefining the Consistent(E, Q) method, do you need something more? Simonas Šaltenis, 006

23 Outline Motivation Background Multidimensional index structures R-tree Indexing the current and future positions: TPR-tree and TPR*-tree STRIPES B x -tree Trajectory indexing: Straight-forward R-tree-based approaches BB x -tree Partial persistence Conclusion of the course Simonas Šaltenis, 006 3

24 Problem Statement We address the problem of indexing the ever-changing current and predicted future positions of point objects moving in one, two, and three-dimensional space. Indexing challenges specific to MOs continuous change of positions extrapolation between the last update and the current time must be supported hyper-dynamic workloads high-rates of updates Simonas Šaltenis, 006 4

25 Modeling Continuous Movement In conventional databases, data is assumed constant unless explicitly modified. With continuous movement, this is problematic. Too frequent updates Outdated, inacurate data Simonas Šaltenis, 006 5

26 Modeling Continuous Movement In conventional databases, data is assumed constant unless explicitly modified. With continuous movement, this is problematic. Too frequent updates Outdated, inacurate data Instead of storing position values, we store positions as functions of time, yielding time-parameterized positions. We use linear functions to capture the present and future positions. x( t) = x( t0) + v( t t0), where t now Updates are less frequent Tentative future queries are supported t 0 For example, given, the current and anticiapted, future position of a twodimensional point can be described by four parameters. x ( t0), y( t0), v x, v y Simonas Šaltenis, 006 6

27 Modeling Continuous Movement Three ways to think about continuously moving points in d-dimensional space: Lines in (d+1)-dimensional space d spatial dimensions and 1 time dimension Points in d-dimensional space d spatial and d velocity dimensions (function parameters: ) Time-parameterized points in d-dimensional space x t ), v ( 0 x x(t 0 ) o 3 o o 1 o o 1 o t v Simonas Šaltenis, 006 7

28 Queries Type 1: objects that intersect a given rectangle at t x Type : objects that 6 o 3 intersect a given 5 4 rectangle sometime 3 o from t1 to t o 1 o Type 3: objects that 1 1 o intersect a given 4 moving rectangle t sometime between t 1 andt We can expect, that most queries will be consentrated in the sliding window [CT, CT+W], i.e. CT <= t, t 1, t <= CT + W Simonas Šaltenis, 006 8

29 Time-Parameterized Rectangles The TPR-tree [Šaltenis et al. 000] is based on the R- tree. Moving points are bounded with time-parameterized rectangles. Are bounding from now on. The R-tree allows overlap. The tree employs conservative bounding rectangles. Union( node) : min i c i c max i c i c { } x ( t ) = min o. x ( t ) o node { } x ( t ) = max o. x ( t ) o node { } v = min o. v o node { } i Simonas Šaltenis, min i v = max o. v o node max i i

30 Entry Structure, Querying Do we need to store t c in each entry? No we use one common reference time t r (the same for data points): x = x ( t ) = x ( t ) + v ( t t ) Entry structure: <TPBR, ptr> min min min min i i r i c i r c x = x ( t ) = x ( t ) + v ( t t ) max max max max i i r i c i r c min max min max min max min max TPBR = MBR,VBR = ( x, x, x, x ),( v, v, v, v ) At any t > CT we can get a valid R-tree: TPR-tree(t) = R-tree x () t = x ( t ) + v ( t t ) min min min i i r i r x () t = x ( t ) + v ( t t ) max max max i i r i r Simonas Šaltenis,

31 Tightening Ideally, bounding rectangles should be always minimal. Excessive storage cost x t TPBRs are tightened (i.e., recomputed) whenever a bounded node is modified Simonas Šaltenis,

32 Tightening Ideally, bounding rectangles should be always minimal. Excessive storage cost x t TPBRs are tightened (i.e., recomputed) whenever a bounded node is modified Simonas Šaltenis, 006 3

33 Simonas Šaltenis, Insertion: Grouping Points How to group moving points (Penalty and PickSplit)? The R-tree s algorithms minimize characteristics of MBRs such as area, overlap, and margin. How does that work for moving points?

34 Insertion in the TPR-Tree The bounding rectangle characteristics (area, overlap, and margin) are functions of time. The goal is to minimize these for all time points from now to now+h. Minimizing the characteristics for time now + H/ does not work (e.g., the area of a conservative bounding rectangle is not linear). We use the regular R*-tree algorithms, but all bounding rectangle characteristics are replaced by their integrals. now+ H now A ( t) dt, where A(t) is, e.g., the area of an MBR What H to use? H depends on the update rate, and on how far queries may reach into the future (W). Simonas Šaltenis,

35 Example I We illustrate the working of the TPR-tree by means of an example. The subsequent figures are generated automatically, by the index code used for performance experiments. Data 0 one-dimensional points are used. Index Parameters Page size = 64 (5 entries in leaf nodes and 3 in non-leaf nodes). H = 8. Simonas Šaltenis,

36 Example II CT = 0 At CT=1, the point at x = 0, v = 0 is updated to have x = 18.5, v = Simonas Šaltenis,

37 Example III CT = 1 Inserting a moving point at position 14 with v = 0.5. Simonas Šaltenis,

38 Example IV After insertion Simonas Šaltenis,

39 What H to use? Intuitivly: we want H to be equal to the time during which queries will see the node that we consider modifying: H depends on the update rate, and on how far queries may reach into the future (W) Experiments show that H = UI + W consistently gives good query performance (UI average update interval) The system can track UI automatically (How?) W is usually smaller than UI can be tracked automatically too Simonas Šaltenis,

40 TPR-Tree Bulk-Loading I Regular R-tree bulk loading aims at dividing the space into roughly equal, square MBRs (e.g., STR). It is a straightforward approach to use STR, interpreting the d-dimensional moving points as d-dimensional points. The ratio between the spatial and velocity extents of BRs is important. α We introduce a parameter that expresses how many times the velocity extents of BRs are longer than their spatial extents. Given a fixed number of BRs to produce, we can make them large in the spatial dimensions, but small in the velocity dimensions (i.e., growing slowly). This corresponds to a small α. make them small in the spatial dimensions, but large in the velocity dimensions (i.e., fast growing). This corresponds to a large. α Simonas Šaltenis,

41 TPR-Tree Bulk Loading II Example: Using either initially large, but slow-growing, or initially small, but fast-growing, time-parameterized BRs. x(t 0 ) V s S v x t x(t 0 ) t α = 1 (good for a large H) 4 α = 1 Goal: find an optimal α as a function of H. v x (good for a small H) Simonas Šaltenis,

42 TPR-Tree Bulk Loading III One-dimensional uniform data Let k be the number of nodes and the space-time ratio S V SV k = s =. Area (length) A( t) = s + tsα. s sα kα H SV We integrate A( t) : I = s (1 + tα ) dt = ( H + 0 kα I To minimize I, we solve = 0 α = α H For D data,, and for 3D data, α α = H H gets smaller with increasing dimensionality Intuitive as hyper-volumes grow faster in more dimensions α H α). α H Simonas Šaltenis, 006 4

43 TPR-Tree Bulk Loading IV A modified STR is used that achieves the desired -ratio. Example: 000 points, 0 points per leaf-node. α H = 4, α = 1 H = 16, α = 1 8 Simonas Šaltenis,

44 What about Expanding BRs? Will the expanding TPBRs ruin the performance? That's what the experiments show: Settings: objects update only once per hour! D data, with W = 40. Due to the constant influx of updates, the performance of the TPR-tree does not degrade after reaching a certain level. Simonas Šaltenis,

45 Summary The TPR-tree indexes the current and predicted future positions of moving objects. The TPR-tree is based on the proven, widely used R-tree technology The tree extends the R*-tree by introducing conservative, timeparameterized bounding rectangles, which are tightened regularly. The tree s algorithms use integrals of area, overlap, etc. The tree can be tuned to take advantage of a specific update rate and querying window length. Other types of queries that are supported by the R-tree can be supported by the TPR-tree, e.g., nearest-neighbor queries. Simonas Šaltenis,

46 Exercises TPR-tree Write a Consistent (Q, E) function for a d-dimensional TPR-tree. E = [ x, x ],...,[ x, x ];[ v, v ],...,[ v, v ], 1 1 d d 1 1 d d Q= [ q, q ],...,[ q, q ];[ w, w ],...,[ w, w ];[ t, t ] 1 1 d d 1 1 d d where [ x, x ], and [ q, q ] are TPBRs' extents at time t. i i i i Consider SR-tree, where instead of MBR s minimum bounding circles (spheres) are used. Define the bounding predicate (and a Consistent (Q, E) function) of a time parameterized SR-tree. Simonas Šaltenis,

47 TPR*-tree Observation: TPR-tree integrated heuristics optimize the tree for time-slice queries (Why?) What if we want to target window and moving queries? A query is given as a TPBR plus a time window: Q = ( qx, qx, qx, qx ),( qv, qv, qv, qv ),( t, t ) min max min max min max min max min max TPR*-tree [Tao et al., 003]: Uses heuristics targeted for window and moving queries Uses other (structural) improvements of algorithms: Modified ChoosePath Modified PickSplit and PickWorst (for reinsertions) Active tightening during deletions Simonas Šaltenis,

48 Sweeping Region Heuristics Once again: What does it mean for a static rectangle to intersect with a static query? y axis y axis O Q O' Q center x axis i x axis Simonas Šaltenis,

49 Sweeping Region Heuristics What does it mean for a TPBR O to intersect with a moving query at some time point t? The same as for the TPBR O' to intersect the static query centroid: MBR of O' on i-th axis: VBR of O' on i-th axis: q q x x + min max max min ( v qv, v qv ) min max ( i, i i i ) i i i i y axis O -1 Q y axis O' 3 3 Q center x axis i x axis Simonas Šaltenis,

50 Sweeping Region Heuristics What does it mean for a TPBR O to intersect with a moving query at any point in time interval (t min,t max )? The same as for the sweeping region of TPBR O' to intersect the static query centroid y axis O' 3 Q center i x axis Simonas Šaltenis,

51 Sweeping Region Heuristics Instead of integrals of TPR-tree, areas of sweeping regions are used: E.g.: enlargment of the integral of area of O is replaced by the enlargment of the area of the sweeping region of O' Assumption about query: static point Compare these two heuristics: What happens if we have a static MBR? What happens if we have a moving MBR that does not change area? Does this confirm the intuition that these heuristics optimize for differnet queries? Simonas Šaltenis,

52 TPR Problem: ChoosePath To insert an entry, the TPR-tree picks the sub-tree incurring the minimum penalty y axis the (absolute) values of all velocities are 1 b a g y axis b g p d 4 i (static) e f c time 0 d h x axis What would a TPR-tree do? Simonas Šaltenis, i e f c inserting p at time May result in inserting an entry into a bad sub-tree a h x axis

53 TPR* solution: ChoosePath Aims at finding the best insertion path globally, namely, among all possible paths. Observation: We can find this path by accessing only a few more nodes (than the TPR-tree algorithm) y axis i b e f c inserting p at time g p a h d x axis Maintain a heap: [(g),0], [(h),0], [(i),0] the path expanded so far the accumulated penalty so far Simonas Šaltenis,

54 TPR* solution: ChoosePath Aims at finding the best insertion path globally, namely, among all possible paths. Observation: We can find this path by accessing only a few more nodes (than the TPR-tree algorithm). 10 y axis Visit node g: 8 6 b g p d [(h),0], [(a,g),3], [(i),0], [(b,g),3] 4 i e f c inserting p at time a h x axis complete paths already although nodes a and b are not visited Simonas Šaltenis,

55 TPR* solution: ChoosePath Aims at finding the best insertion path globally, namely, among all possible paths. Observation: We can find this path by accessing only a few more nodes (than the TPR-tree algorithm) y axis i b e f c inserting p at time g p a h d x axis Visit node h: [(a,g),3], [(d,h),9], [(c,h),17], [(i),0], [(b,g),3] The algorithm stops now. Simonas Šaltenis,

56 TPR* solution: ChoosePath It is more expensive. Why does it make sense to do? Query performance is obviously improved Insertions are more expensive But overall update improves: Remember: update is deletion + insertion The main part of the cost of deletion (and all update) is search, which we improved with the new ChoosePath! Simonas Šaltenis,

57 TPR: Tightening MBRs in deletion Entry deletion requires first finding the entry, which accesses many nodes of the tree. The TPR-tree uses this fact to tighten the MBR of non-leaf entries. Assume nodes h and i are accessed before e is found; then the TPR-tree will tighten the MBR of i only (enclosing g and f) y axis the (absolute) values of all velocities are 1 f h g a b i e c time 0 d j x axis y axis h i g b f a e deleting e at time 1 d c j x axis Simonas Šaltenis,

58 TPR: Tightening MBRs in deletion Entry deletion requires first finding the entry, which accesses many nodes of the tree. The TPR-tree uses this fact to tighten the MBR of non-leaf entries. Assume nodes h and i are accessed before e is found; then the TPR-tree will tighten the MBR of i only (enclosing g and f) y axis the (absolute) values of all velocities are 1 f h g a b i e c time 0 d j x axis y axis h i g b f a d c j x axis after deleting e at time 1 Simonas Šaltenis,

59 TPR* solution: Active tightening Tightening more entries for free. Assume nodes h and i are accessed before e is found; then the TPR*-tree will tighten the MBR of both h and i y axis the (absolute) values of all velocities are 1 f h g a b i e c time 0 d j x axis y axis h i g b f a e deleting e at time 1 d c j x axis Simonas Šaltenis,

60 TPR* solution: Active tightening Tightening more entries for free. Assume nodes h and i are accessed before e is found; then the TPR*-tree will tighten the MBR of both h and i y axis the (absolute) values of all velocities are 1 f g i e d y axis i g f 4 h a b c j 4 b h a d c j time 0 x axis x axis after deleting e at time 1 Simonas Šaltenis,

61 TPR* solution: Active tightening Another example: Assume the shaded nodes are accessed to find e. The active tightening can tighten the MBR of n 5, n 6, n 3, and n 4. But not n 1 and n. Simonas Šaltenis,

62 Outline Motivation Background Multidimensional index structures R-tree Indexing the current and future positions: TPR-tree and TPR*-tree STRIPES B x -tree Trajectory indexing: Straight-forward R-tree-based approaches BB x -tree Partial persistence Conclusion of the course Simonas Šaltenis, 006 6

63 Modeling Continuous Movement Three ways to think about continuously moving points in d-dimensional space: Lines in (d+1)-dimensional space d spatial dimensions and 1 time dimension Points in d-dimensional space d spatial and d velocity dimensions (function parameters: x ( t 0 ), v ) Time-parameterized points in d-dimensional space our approach x o 3 o o t o x(t 0 ) o 1 o v Simonas Šaltenis,

64 STRIPES Using Duality The main idea of STRIPES [Patel et al., 004] is to transform queries and points using the duality transformation: Lines become points, points become lines Trajectories become points Queries become region queries Use Quadtrees to index points That's it! Simonas Šaltenis,

65 Quadtrees Quadtree a four-way partition tree Linear space Good average query performance Orcale has them a b 1 c d e c f 3 f 4 d e b a Simonas Šaltenis,

66 Query Transformation Exercise Let's see how a window query Q = ([x l, x u ],[t l, t u ]) is transformed: x u x l 1 x x(t 0 ) o ? t v t i t u 1 o 1 o Simonas Šaltenis,

67 STRIPES: Summary Avoids the problem of expanding TPBRs Uses reference positions: No tightening is performed Has to assume L (the longest update interval) and to maintain two indixes: from t ref to t ref + L, and from t ref + L to t ref + L Index is not tuned to a specific horizon H (determined by updating and querying pattern) Remember TPR-trees bulk-loaded with different values of the α parameter (corresponding to the different values of H)! Simonas Šaltenis,

68 Outline Motivation Background Multidimensional index structures R-tree Indexing the current and future positions: TPR-tree and TPR*-tree STRIPES B x -tree Trajectory indexing: Straight-forward R-tree-based approaches BB x -tree Partial persistence Conclusion of the course Simonas Šaltenis,

69 B x -tree [Jensen et al., 004] The goal: to index the current and future positions of moving objects using B-trees B-trees are at the core of every DBMS Very good concurrency-protocols are available for B-trees (and are implemented in all DBMS) improtant when we track a lot of objects B-tree updates are very efficient and we have a lot of them But how do we store and query these future linear trajectories of objects? Key ideas used: Space-filling curves Query expansion Simonas Šaltenis,

70 Space-filling curves How can we index multi-dimensional points with B-trees? Space-filling curve: An infinite curve that visits all points of a discretized space (usually one quadrant (if D) of space) In this way it uniquely maps multi-dimensional points to natural numbers Building Hilbert curve: (3,4) Simonas Šaltenis,

71 Transforming Queries Other space-filling curves can be constructed, e.g., Peano curve (Z-ordering) How do we perform range queries on the B + - tree storing the transformed points? Multi-dimensional range query is transformed into a set of intervals Simonas Šaltenis,

72 Enlarging Queries What about moving points? Just index their positions re-computed at some reference time point (t ref ) How do we ask range queries then? Let's assume we know the VBR of all points, i.e., maximum velocities of objects in all four directions: v l 1,v u 1,v l,v u Insight: If the query time is t q, we can expand the query to catch all necessary objects, knowing that objects traveling at maximum speed v, could have moved not further than v t ref t q. Simonas Šaltenis, 006 7

73 Multiple B-trees Problem: we may need to expand queries too much if the difference t ref t q is large. Idea: divide objects into n + 1 groups use a separate B- tree with a separate t ref for each group! Example: n + 1 = Only half of the points are queried with the maximum enlargment. What n to choose? t 1 t t q Large n: Smaller average query enlargment But, more queries are performed in more B-trees Authors propose to use n = Simonas Šaltenis,

74 Index Structure All three B + -trees are combined into one that has three partitions: 0, 1, and. Objects are assigned to partitions based on their last update time-stamps t u. Let's assume that all objects update more frequently than every Δt mu time units: t lab =t u t mu /n l partition= t lab / t mu /nmod n1 xrep= xvaluexv t lab t u The key then is (partition,xrep) Simonas Šaltenis,

75 Technicalities If an object does not update itself in Δt mu, it is migrated from partition 0 to partition n. Instead of using maximum VBR of all objects for expansion: The space is divided into a grid of cells, each sell maintains a VBR of objects in that cell (in main memory). The global VBR is used to expand the query, then local VBR is computed from the covered cells. The local VBR is used to recompute a possibly smaller expansion of query. The filtering of query results has to be done at the end. Simonas Šaltenis,

76 Exercise Which objects will be returned by the query q? Which are false drops? What is the precision of the query? Assume global VBR for query expansions Positions of objects are given at update times (given after comma) E, C,4 q B,4 A,1 D, Δt mu =6 t q =7 0 3 CT 6 9 time Simonas Šaltenis,

77 Outline Motivation Background Multidimensional index structures R-tree Indexing the current and future positions: TPR-tree and TPR*-tree STRIPES B x -tree Trajectory indexing: Straight-forward R-tree-based approaches BB x -tree Partial persistence Conclusion of the course Simonas Šaltenis,

78 Trajectory data Positions between measurements are linearly interpolated. What we have are 3D polypines Two spatial dimensions + one time dimension The same types of queries as for the current-future data Simonas Šaltenis,

79 Recording of Movement How do we get the data? It is given as a static file of positions or new data is constantly appended to the end If new data is appended to the end: The last segment is probably an infinite trajectory based on the given velocity vector We have to assume that an object is somewhere between the last update and the current time Then, whenever an update happens, we have to update the last infinite peace of trajectory into a new peace of trajectory and insert a new infinite peace of trajectory Simonas Šaltenis,

80 Recording of Movement: Example pos Linear prediction of future movement Recorded polyline u1 u u3 u4 u5 CT time Simonas Šaltenis,

81 R-tree Based Indexing of Trajectories We can use a 3D R-tree! But Line segments (especially long ones) are difficult to index because they introduce large MBRs with a lot of overlap We can cut long line segments into smaller the index becomes larger STR-tree and TB-tree [Pfoser et al., 000]: Modifications of 3D R-tree, that strive for trajectory preservation (segments of the same trajectory should be clustered together): Heuristics of the tree changed In the TB-tree, a linked list conects line segments in the leaf level into a trajectory Trajectory queries are supported: Find all the taxis that on Monday at 10:00am were 1km around Fredrik Bajers Vej 7 and then drove to the city center Problem historical time-slice queries perform worse and worse, the more history you accumulate. Simonas Šaltenis,

82 BB x -tree It is easy to see how the B x -tree can be modified to record history: Have a separate B + -tree for each partition objects updated during [t ref Δt mu /n; t ref ] are stored in this B + -tree according to their positions at t ref Entry structure: (xrep, t start, t end, ptr), xrep is used as a key, for internal nodes t start and t end is the minimum and the maximum of timestamps of children. Simonas Šaltenis, 006 8

83 BB x -tree A query accesses those B + -trees whose lifespans overlap with the query BB x -tree can support both historical and future queries Technicalities: Together with each tree the histogram of VBRs is stored If an object does not update itself in Δt mu, it is migrated to the last partition: But it is not easy to do this, some sort of trigers are needed ( priority queue) Simonas Šaltenis,

84 Outline Motivation Background Multidimensional index structures R-tree Indexing the current and future positions: TPR-tree and TPR*-tree STRIPES B x -tree Trajectory indexing: Straight-forward R-tree-based approaches BB x -tree Partial persistence Conclusion of the course Simonas Šaltenis,

85 Partial Persistence Data structures that do not record history of changes are called ephemeral data structures (e.g., B-tree, R-tree) The technique of partial persistence transforms an ephemeral data structure into a partially persistent data structure which records the history and: Answers historical time-slice queries with the same asymptotic worst-case performance as an ephemeral sturcture Has the size proportional to the number of updates How is this achieved? Let's look at R-trees Entry structure: (point/mbr,t start, t end, ptr) For data (leaf) entries t start is the insertion time, t end is the deletion time. For non-leaf entries t start is the time when the ptr node was created, t end is the time when the ptr node was time split Simonas Šaltenis,

86 Time Split An entry is alive if t end =, else it is dead. The key operation is time spliting of a node A: A new node is created and all live entries from A are copied to the new node. No cahnges are done to A, and it will never change again, t end of the live parent entry of A is set to the current time, a new live entry is created pointing to the new node. A node is called dead if it was time split. When the root is time split, a pointer to it is put in a root array Queries consider only entries live at the time of query Simonas Šaltenis,

87 Time Split: Example Time spliting a node when it is overfull (B = 5), CT = 11: A(5,10) B(7, ) A(5,10) B(7, ) M(11, ) C(5,9) D(6, ) E(5, ) F(5,10) G(7, ) H(7,10) I(8,9) J(8, ) K(8,10) C(5,9) D(6, ) E(5, ) F(5,10) G(7, ) H(7,10) I(8,9) J(8, ) K(8,10) G(7, ) J(8, ) L(11, ) Insert L(11, ) What if an overfull node is full of live entries? Are there other situations when we do a time split? What does it give us? Simonas Šaltenis,

88 Invariants Weak version condition For each moment in time and for each non-root node, the number of alive entries is either zero or at least d (d = B/ k for some k). This guarantees that a time-slice for any time point sees an ephemeral structure that has O(B) fan-out! Queries have the same(good) worst-case performance The node is time-split when it is overfull or if weak version condition is violated. Strong version condition The number of alive entries in a node right after time-split is more than d+e and less than B-e (e = B/ j for some j) This ensures that the size of the index remains linear (why?) Simonas Šaltenis,

89 Algorithm Delete Insert Weak version underflow 1 Invariants OK AD Overflow Time split D Strong version underflow Time split sibling Merge two live nodes D Invariants OK A A a node of alive entries is bounded D a node of dead entries is bounded AD a mixed node is bounded Strong version overflow Key split live node A A Simonas Šaltenis,

90 Partially Persistent TPR-tree: TPBRs Straightforward TPBRs Double TPBRs with static "heads" x j Time of last update x j Double entry TPBR in TPR-tree x j x j x j x j Time of last update t CT t t CT t x j x j After insertion and time split x j x j Dead entry Time split Alive entry x j x j Time split Dead entry Double entry with an empty "head" t CT t t CT t Simonas Šaltenis,

91 Partially Persistent TPR-tree To make the TPR-tree partially persistent new double TPBRs are introduced for nodes with a mix of dead and alive entries Black box of ephemeral algorithms is opened Partial persistence algorithms have to be modified further to allow updating of the last-recorded trajectory segment (from infinite prediction to a finite line segment) Recall that time splits may have introduced a lot of copies of this live segment Simonas Šaltenis,

92 Exercise: Partially Persistent R-tree Do one operation per time unit: Insert p1, p, p1, p13, p5, p14; Delete p5, p; Insert p8; Delete p1; Insert p7, p11, p9. Assume: B = 6, m = 50% (3), d =, e = 1 p6 p1 p p4 p7 p3 p8 p5 p1 p9 p11 p10 p13 Simonas Šaltenis, 006 9

93 Not Covered Topics Uncertainty Either queries or objects are expanded Which of the covered indices support circle-objects? Update efficiency Using object-id-based look-ups in hash table to eliminate R-tree deletion multiple path searches [M. Lee et al., 003] Using buffer-tree [Arge, 1995, 1999] techniques Main-memory indexing Cache-conscious and cache-oblivious data structures Different types of queries (knn, closest pairs, etc.) Papadias et al. Infrastructure-constrained indexing Papadias et al., 003 Simonas Šaltenis,

94 Thank you How did you like the course? Acknowledgments Yufei Tao for his TPR*-tree slides Simonas Šaltenis,

Indexing the Positions of Continuously Moving Objects

Indexing the Positions of Continuously Moving Objects Indexing the Positions of Continuously Moving Objects Simonas Šaltenis Christian S. Jensen Aalborg University, Denmark Scott T. Leutenegger Mario A. Lopez Denver University, USA SIGMOD 2000 presented by

More information

Introduction to Indexing R-trees. Hong Kong University of Science and Technology

Introduction to Indexing R-trees. Hong Kong University of Science and Technology Introduction to Indexing R-trees Dimitris Papadias Hong Kong University of Science and Technology 1 Introduction to Indexing 1. Assume that you work in a government office, and you maintain the records

More information

Spatiotemporal Access to Moving Objects. Hao LIU, Xu GENG 17/04/2018

Spatiotemporal Access to Moving Objects. Hao LIU, Xu GENG 17/04/2018 Spatiotemporal Access to Moving Objects Hao LIU, Xu GENG 17/04/2018 Contents Overview & applications Spatiotemporal queries Movingobjects modeling Sampled locations Linear function of time Indexing structure

More information

Multidimensional Indexing The R Tree

Multidimensional Indexing The R Tree Multidimensional Indexing The R Tree Module 7, Lecture 1 Database Management Systems, R. Ramakrishnan 1 Single-Dimensional Indexes B+ trees are fundamentally single-dimensional indexes. When we create

More information

Spatial Data Management

Spatial Data Management Spatial Data Management [R&G] Chapter 28 CS432 1 Types of Spatial Data Point Data Points in a multidimensional space E.g., Raster data such as satellite imagery, where each pixel stores a measured value

More information

Spatial Data Management

Spatial Data Management Spatial Data Management Chapter 28 Database management Systems, 3ed, R. Ramakrishnan and J. Gehrke 1 Types of Spatial Data Point Data Points in a multidimensional space E.g., Raster data such as satellite

More information

Principles of Data Management. Lecture #14 (Spatial Data Management)

Principles of Data Management. Lecture #14 (Spatial Data Management) Principles of Data Management Lecture #14 (Spatial Data Management) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v Project

More information

Chap4: Spatial Storage and Indexing. 4.1 Storage:Disk and Files 4.2 Spatial Indexing 4.3 Trends 4.4 Summary

Chap4: Spatial Storage and Indexing. 4.1 Storage:Disk and Files 4.2 Spatial Indexing 4.3 Trends 4.4 Summary Chap4: Spatial Storage and Indexing 4.1 Storage:Disk and Files 4.2 Spatial Indexing 4.3 Trends 4.4 Summary Learning Objectives Learning Objectives (LO) LO1: Understand concept of a physical data model

More information

Abstract. 1. Introduction

Abstract. 1. Introduction Predicted Range Aggregate Processing in Spatio-temporal Databases Wei Liao, Guifen Tang, Ning Jing, Zhinong Zhong School of Electronic Science and Engineering, National University of Defense Technology

More information

R-Trees. Accessing Spatial Data

R-Trees. Accessing Spatial Data R-Trees Accessing Spatial Data In the beginning The B-Tree provided a foundation for R- Trees. But what s a B-Tree? A data structure for storing sorted data with amortized run times for insertion and deletion

More information

Advances in Data Management Principles of Database Systems - 2 A.Poulovassilis

Advances in Data Management Principles of Database Systems - 2 A.Poulovassilis 1 Advances in Data Management Principles of Database Systems - 2 A.Poulovassilis 1 Storing data on disk The traditional storage hierarchy for DBMSs is: 1. main memory (primary storage) for data currently

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing References Generalized Search Trees for Database Systems. J. M. Hellerstein, J. F. Naughton

More information

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes

More information

R-Tree Based Indexing of Now-Relative Bitemporal Data

R-Tree Based Indexing of Now-Relative Bitemporal Data 36 R-Tree Based Indexing of Now-Relative Bitemporal Data Rasa Bliujūtė, Christian S. Jensen, Simonas Šaltenis, and Giedrius Slivinskas The databases of a wide range of applications, e.g., in data warehousing,

More information

What we have covered?

What we have covered? What we have covered? Indexing and Hashing Data warehouse and OLAP Data Mining Information Retrieval and Web Mining XML and XQuery Spatial Databases Transaction Management 1 Lecture 6: Spatial Data Management

More information

Data Organization and Processing

Data Organization and Processing Data Organization and Processing Spatial Join (NDBI007) David Hoksza http://siret.ms.mff.cuni.cz/hoksza Outline Spatial join basics Relational join Spatial join Spatial join definition (1) Given two sets

More information

Multidimensional Indexes [14]

Multidimensional Indexes [14] CMSC 661, Principles of Database Systems Multidimensional Indexes [14] Dr. Kalpakis http://www.csee.umbc.edu/~kalpakis/courses/661 Motivation Examined indexes when search keys are in 1-D space Many interesting

More information

External-Memory Algorithms with Applications in GIS - (L. Arge) Enylton Machado Roberto Beauclair

External-Memory Algorithms with Applications in GIS - (L. Arge) Enylton Machado Roberto Beauclair External-Memory Algorithms with Applications in GIS - (L. Arge) Enylton Machado Roberto Beauclair {machado,tron}@visgraf.impa.br Theoretical Models Random Access Machine Memory: Infinite Array. Access

More information

I/O-Algorithms Lars Arge

I/O-Algorithms Lars Arge I/O-Algorithms Fall 203 September 9, 203 I/O-Model lock I/O D Parameters = # elements in problem instance = # elements that fits in disk block M = # elements that fits in main memory M T = # output size

More information

Update-efficient indexing of moving objects in road networks

Update-efficient indexing of moving objects in road networks DOI 1.17/s177-8-52-5 Update-efficient indexing of moving objects in road networks Jidong Chen Xiaofeng Meng Received: 22 December 26 / Revised: 1 April 28 / Accepted: 3 April 28 Springer Science + Business

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

Datenbanksysteme II: Multidimensional Index Structures 2. Ulf Leser

Datenbanksysteme II: Multidimensional Index Structures 2. Ulf Leser Datenbanksysteme II: Multidimensional Index Structures 2 Ulf Leser Content of this Lecture Introduction Partitioned Hashing Grid Files kdb Trees kd Tree kdb Tree R Trees Example: Nearest neighbor image

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Module 4: Tree-Structured Indexing

Module 4: Tree-Structured Indexing Module 4: Tree-Structured Indexing Module Outline 4.1 B + trees 4.2 Structure of B + trees 4.3 Operations on B + trees 4.4 Extensions 4.5 Generalized Access Path 4.6 ORACLE Clusters Web Forms Transaction

More information

Background: disk access vs. main memory access (1/2)

Background: disk access vs. main memory access (1/2) 4.4 B-trees Disk access vs. main memory access: background B-tree concept Node structure Structural properties Insertion operation Deletion operation Running time 66 Background: disk access vs. main memory

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Problem. Indexing with B-trees. Indexing. Primary Key Indexing. B-trees: Example. B-trees. primary key indexing

Problem. Indexing with B-trees. Indexing. Primary Key Indexing. B-trees: Example. B-trees. primary key indexing 15-82 Advanced Topics in Database Systems Performance Problem Given a large collection of records, Indexing with B-trees find similar/interesting things, i.e., allow fast, approximate queries 2 Indexing

More information

Storage hierarchy. Textbook: chapters 11, 12, and 13

Storage hierarchy. Textbook: chapters 11, 12, and 13 Storage hierarchy Cache Main memory Disk Tape Very fast Fast Slower Slow Very small Small Bigger Very big (KB) (MB) (GB) (TB) Built-in Expensive Cheap Dirt cheap Disks: data is stored on concentric circular

More information

CSE 530A. B+ Trees. Washington University Fall 2013

CSE 530A. B+ Trees. Washington University Fall 2013 CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key

More information

The B-Tree. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland

The B-Tree. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland Yufei Tao ITEE University of Queensland Before ascending into d-dimensional space R d with d > 1, this lecture will focus on one-dimensional space, i.e., d = 1. We will review the B-tree, which is a fundamental

More information

Lecture Notes: External Interval Tree. 1 External Interval Tree The Static Version

Lecture Notes: External Interval Tree. 1 External Interval Tree The Static Version Lecture Notes: External Interval Tree Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk This lecture discusses the stabbing problem. Let I be

More information

CMPUT 391 Database Management Systems. Spatial Data Management. University of Alberta 1. Dr. Jörg Sander, 2006 CMPUT 391 Database Management Systems

CMPUT 391 Database Management Systems. Spatial Data Management. University of Alberta 1. Dr. Jörg Sander, 2006 CMPUT 391 Database Management Systems CMPUT 391 Database Management Systems Spatial Data Management University of Alberta 1 Spatial Data Management Shortcomings of Relational Databases and ORDBMS Modeling Spatial Data Spatial Queries Space-Filling

More information

Access Methods. Basic Concepts. Index Evaluation Metrics. search key pointer. record. value. Value

Access Methods. Basic Concepts. Index Evaluation Metrics. search key pointer. record. value. Value Access Methods This is a modified version of Prof. Hector Garcia Molina s slides. All copy rights belong to the original author. Basic Concepts search key pointer Value record? value Search Key - set of

More information

UNIT III BALANCED SEARCH TREES AND INDEXING

UNIT III BALANCED SEARCH TREES AND INDEXING UNIT III BALANCED SEARCH TREES AND INDEXING OBJECTIVE The implementation of hash tables is frequently called hashing. Hashing is a technique used for performing insertions, deletions and finds in constant

More information

Chap4: Spatial Storage and Indexing

Chap4: Spatial Storage and Indexing Chap4: Spatial Storage and Indexing 4.1 Storage:Disk and Files 4.2 Spatial Indexing 4.3 Trends 4.4 Summary Learning Objectives Learning Objectives (LO) LO1: Understand concept of a physical data model

More information

File Structures and Indexing

File Structures and Indexing File Structures and Indexing CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/11/12 Agenda Check-in Database File Structures Indexing Database Design Tips Check-in Database File Structures

More information

Information Retrieval. Wesley Mathew

Information Retrieval. Wesley Mathew Information Retrieval Wesley Mathew 30-11-2012 Introduction and motivation Indexing methods B-Tree and the B+ Tree R-Tree IR- Tree Location-aware top-k text query 2 An increasing amount of trajectory data

More information

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11 DATABASE PERFORMANCE AND INDEXES CS121: Relational Databases Fall 2017 Lecture 11 Database Performance 2 Many situations where query performance needs to be improved e.g. as data size grows, query performance

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture II: Indexing Part I of this course Indexing 3 Database File Organization and Indexing Remember: Database tables

More information

Extended R-Tree Indexing Structure for Ensemble Stream Data Classification

Extended R-Tree Indexing Structure for Ensemble Stream Data Classification Extended R-Tree Indexing Structure for Ensemble Stream Data Classification P. Sravanthi M.Tech Student, Department of CSE KMM Institute of Technology and Sciences Tirupati, India J. S. Ananda Kumar Assistant

More information

Intersection Acceleration

Intersection Acceleration Advanced Computer Graphics Intersection Acceleration Matthias Teschner Computer Science Department University of Freiburg Outline introduction bounding volume hierarchies uniform grids kd-trees octrees

More information

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25 Indexing Jan Chomicki University at Buffalo Jan Chomicki () Indexing 1 / 25 Storage hierarchy Cache Main memory Disk Tape Very fast Fast Slower Slow (nanosec) (10 nanosec) (millisec) (sec) Very small Small

More information

I/O-Algorithms Lars Arge Aarhus University

I/O-Algorithms Lars Arge Aarhus University I/O-Algorithms Aarhus University April 10, 2008 I/O-Model Block I/O D Parameters N = # elements in problem instance B = # elements that fits in disk block M = # elements that fits in main memory M T =

More information

Database index structures

Database index structures Database index structures From: Database System Concepts, 6th edijon Avi Silberschatz, Henry Korth, S. Sudarshan McGraw- Hill Architectures for Massive DM D&K / UPSay 2015-2016 Ioana Manolescu 1 Chapter

More information

Kathleen Durant PhD Northeastern University CS Indexes

Kathleen Durant PhD Northeastern University CS Indexes Kathleen Durant PhD Northeastern University CS 3200 Indexes Outline for the day Index definition Types of indexes B+ trees ISAM Hash index Choosing indexed fields Indexes in InnoDB 2 Indexes A typical

More information

Relational Database Support for Spatio-Temporal Data

Relational Database Support for Spatio-Temporal Data Relational Database Support for Spatio-Temporal Data by Daniel James Mallett Technical Report TR 04-21 September 2004 DEPARTMENT OF COMPUTING SCIENCE University of Alberta Edmonton, Alberta, Canada Relational

More information

Chapter 17 Indexing Structures for Files and Physical Database Design

Chapter 17 Indexing Structures for Files and Physical Database Design Chapter 17 Indexing Structures for Files and Physical Database Design We assume that a file already exists with some primary organization unordered, ordered or hash. The index provides alternate ways to

More information

CMSC724: Access Methods; Indexes 1 ; GiST

CMSC724: Access Methods; Indexes 1 ; GiST CMSC724: Access Methods; Indexes 1 ; GiST Amol Deshpande University of Maryland, College Park March 14, 2011 1 Partially based on notes from Joe Hellerstein Outline 1 Access Methods 2 B+-Tree 3 Beyond

More information

Tree-Structured Indexes

Tree-Structured Indexes Introduction Tree-Structured Indexes Chapter 10 As for any index, 3 alternatives for data entries k*: Data record with key value k

More information

Lecture 6: External Interval Tree (Part II) 3 Making the external interval tree dynamic. 3.1 Dynamizing an underflow structure

Lecture 6: External Interval Tree (Part II) 3 Making the external interval tree dynamic. 3.1 Dynamizing an underflow structure Lecture 6: External Interval Tree (Part II) Yufei Tao Division of Web Science and Technology Korea Advanced Institute of Science and Technology taoyf@cse.cuhk.edu.hk 3 Making the external interval tree

More information

kd-trees Idea: Each level of the tree compares against 1 dimension. Let s us have only two children at each node (instead of 2 d )

kd-trees Idea: Each level of the tree compares against 1 dimension. Let s us have only two children at each node (instead of 2 d ) kd-trees Invented in 1970s by Jon Bentley Name originally meant 3d-trees, 4d-trees, etc where k was the # of dimensions Now, people say kd-tree of dimension d Idea: Each level of the tree compares against

More information

Chapter 25: Spatial and Temporal Data and Mobility

Chapter 25: Spatial and Temporal Data and Mobility Chapter 25: Spatial and Temporal Data and Mobility Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 25: Spatial and Temporal Data and Mobility Temporal Data Spatial

More information

An AVL tree with N nodes is an excellent data. The Big-Oh analysis shows that most operations finish within O(log N) time

An AVL tree with N nodes is an excellent data. The Big-Oh analysis shows that most operations finish within O(log N) time B + -TREES MOTIVATION An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations finish within O(log N) time The theoretical conclusion

More information

Point Cloud Filtering using Ray Casting by Eric Jensen 2012 The Basic Methodology

Point Cloud Filtering using Ray Casting by Eric Jensen 2012 The Basic Methodology Point Cloud Filtering using Ray Casting by Eric Jensen 01 The Basic Methodology Ray tracing in standard graphics study is a method of following the path of a photon from the light source to the camera,

More information

Course Content. Objectives of Lecture? CMPUT 391: Spatial Data Management Dr. Jörg Sander & Dr. Osmar R. Zaïane. University of Alberta

Course Content. Objectives of Lecture? CMPUT 391: Spatial Data Management Dr. Jörg Sander & Dr. Osmar R. Zaïane. University of Alberta Database Management Systems Winter 2002 CMPUT 39: Spatial Data Management Dr. Jörg Sander & Dr. Osmar. Zaïane University of Alberta Chapter 26 of Textbook Course Content Introduction Database Design Theory

More information

Clustering Billions of Images with Large Scale Nearest Neighbor Search

Clustering Billions of Images with Large Scale Nearest Neighbor Search Clustering Billions of Images with Large Scale Nearest Neighbor Search Ting Liu, Charles Rosenberg, Henry A. Rowley IEEE Workshop on Applications of Computer Vision February 2007 Presented by Dafna Bitton

More information

Tree-Structured Indexes. Chapter 10

Tree-Structured Indexes. Chapter 10 Tree-Structured Indexes Chapter 10 1 Introduction As for any index, 3 alternatives for data entries k*: Data record with key value k 25, [n1,v1,k1,25] 25,

More information

Physical Level of Databases: B+-Trees

Physical Level of Databases: B+-Trees Physical Level of Databases: B+-Trees Adnan YAZICI Computer Engineering Department METU (Fall 2005) 1 B + -Tree Index Files l Disadvantage of indexed-sequential files: performance degrades as file grows,

More information

High Dimensional Indexing by Clustering

High Dimensional Indexing by Clustering Yufei Tao ITEE University of Queensland Recall that, our discussion so far has assumed that the dimensionality d is moderately high, such that it can be regarded as a constant. This means that d should

More information

Multidimensional Divide and Conquer 2 Spatial Joins

Multidimensional Divide and Conquer 2 Spatial Joins Multidimensional Divide and Conque Spatial Joins Yufei Tao ITEE University of Queensland Today we will continue our discussion of the divide and conquer method in computational geometry. This lecture will

More information

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion, Introduction Chapter 5 Hashing hashing performs basic operations, such as insertion, deletion, and finds in average time 2 Hashing a hash table is merely an of some fixed size hashing converts into locations

More information

Spatial Data Structures

Spatial Data Structures 15-462 Computer Graphics I Lecture 17 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) April 1, 2003 [Angel 9.10] Frank Pfenning Carnegie

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part V Lecture 13, March 10, 2014 Mohammad Hammoud Today Welcome Back from Spring Break! Today Last Session: DBMS Internals- Part IV Tree-based (i.e., B+

More information

School of Computing & Singapore-MIT Alliance, National University of Singapore {cuibin, lindan,

School of Computing & Singapore-MIT Alliance, National University of Singapore {cuibin, lindan, Bin Cui, Dan Lin and Kian-Lee Tan School of Computing & Singapore-MIT Alliance, National University of Singapore {cuibin, lindan, tankl}@comp.nus.edu.sg Presented by Donatas Saulys 2007-11-12 Motivation

More information

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far Chapter 5 Hashing 2 Introduction hashing performs basic operations, such as insertion, deletion, and finds in average time better than other ADTs we ve seen so far 3 Hashing a hash table is merely an hashing

More information

R-Tree Based Indexing of Now-Relative Bitemporal Data

R-Tree Based Indexing of Now-Relative Bitemporal Data R-Tree Based Indexing of Now-Relative Bitemporal Data Rasa Bliujūtė Christian S. Jensen Simonas Šaltenis Giedrius Slivinskas Department of Computer Science Aalborg University, Denmark rasa, csj, simas,

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

Data Warehousing & Data Mining

Data Warehousing & Data Mining Data Warehousing & Data Mining Wolf-Tilo Balke Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Summary Last week: Logical Model: Cubes,

More information

Massive Data Algorithmics

Massive Data Algorithmics Database queries Range queries 1D range queries 2D range queries salary G. Ometer born: Aug 16, 1954 salary: $3,500 A database query may ask for all employees with age between a 1 and a 2, and salary between

More information

Spatial Data Structures

Spatial Data Structures CSCI 420 Computer Graphics Lecture 17 Spatial Data Structures Jernej Barbic University of Southern California Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees [Angel Ch. 8] 1 Ray Tracing Acceleration

More information

Chapter 12: Indexing and Hashing (Cnt(

Chapter 12: Indexing and Hashing (Cnt( Chapter 12: Indexing and Hashing (Cnt( Cnt.) Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition

More information

Spatial Data Structures

Spatial Data Structures 15-462 Computer Graphics I Lecture 17 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) March 28, 2002 [Angel 8.9] Frank Pfenning Carnegie

More information

Spatial Data Structures

Spatial Data Structures CSCI 480 Computer Graphics Lecture 7 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids BSP Trees [Ch. 0.] March 8, 0 Jernej Barbic University of Southern California http://www-bcf.usc.edu/~jbarbic/cs480-s/

More information

CMSC 754 Computational Geometry 1

CMSC 754 Computational Geometry 1 CMSC 754 Computational Geometry 1 David M. Mount Department of Computer Science University of Maryland Fall 2005 1 Copyright, David M. Mount, 2005, Dept. of Computer Science, University of Maryland, College

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

Remember. 376a. Database Design. Also. B + tree reminders. Algorithms for B + trees. Remember

Remember. 376a. Database Design. Also. B + tree reminders. Algorithms for B + trees. Remember 376a. Database Design Dept. of Computer Science Vassar College http://www.cs.vassar.edu/~cs376 Class 14 B + trees, multi-key indices, partitioned hashing and grid files B and B + -trees are used one implementation

More information

Advanced Algorithm Design and Analysis (Lecture 12) SW5 fall 2005 Simonas Šaltenis E1-215b

Advanced Algorithm Design and Analysis (Lecture 12) SW5 fall 2005 Simonas Šaltenis E1-215b Advanced Algorithm Design and Analysis (Lecture 12) SW5 fall 2005 Simonas Šaltenis E1-215b simas@cs.aau.dk Range Searching in 2D Main goals of the lecture: to understand and to be able to analyze the kd-trees

More information

Advanced Data Types and New Applications

Advanced Data Types and New Applications Advanced Data Types and New Applications These slides are a modified version of the slides of the book Database System Concepts (Chapter 24), 5th Ed., McGraw-Hill, by Silberschatz, Korth and Sudarshan.

More information

Topics to Learn. Important concepts. Tree-based index. Hash-based index

Topics to Learn. Important concepts. Tree-based index. Hash-based index CS143: Index 1 Topics to Learn Important concepts Dense index vs. sparse index Primary index vs. secondary index (= clustering index vs. non-clustering index) Tree-based vs. hash-based index Tree-based

More information

Dongmin Seo, Sungho Shin, Youngmin Kim, Hanmin Jung, Sa-kwang Song

Dongmin Seo, Sungho Shin, Youngmin Kim, Hanmin Jung, Sa-kwang Song Dynamic Hilbert Curve-based B + -Tree to Manage Frequently Updated Data in Big Data Applications Dongmin Seo, Sungho Shin, Youngmin Kim, Hanmin Jung, Sa-kwang Song Dept. of Computer Intelligence Research,

More information

Balanced Trees Part One

Balanced Trees Part One Balanced Trees Part One Balanced Trees Balanced search trees are among the most useful and versatile data structures. Many programming languages ship with a balanced tree library. C++: std::map / std::set

More information

Tree-Structured Indexes

Tree-Structured Indexes Tree-Structured Indexes Yanlei Diao UMass Amherst Slides Courtesy of R. Ramakrishnan and J. Gehrke Access Methods v File of records: Abstraction of disk storage for query processing (1) Sequential scan;

More information

Spatial Queries. Nearest Neighbor Queries

Spatial Queries. Nearest Neighbor Queries Spatial Queries Nearest Neighbor Queries Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer efficiently point queries range queries k-nn

More information

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See  for conditions on re-use Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files Static

More information

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana Introduction to Indexing 2 Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana Indexed Sequential Access Method We have seen that too small or too large an index (in other words too few or too

More information

Ray Tracing III. Wen-Chieh (Steve) Lin National Chiao-Tung University

Ray Tracing III. Wen-Chieh (Steve) Lin National Chiao-Tung University Ray Tracing III Wen-Chieh (Steve) Lin National Chiao-Tung University Shirley, Fundamentals of Computer Graphics, Chap 10 Doug James CG slides, I-Chen Lin s CG slides Ray-tracing Review For each pixel,

More information

TPR tree Running in Main Memory

TPR tree Running in Main Memory AALBORG UNIVERSITY Department of Computer Science TPR tree Running in Main Memory Beata Nitkiewicz Michał Pańczyk August 2006 1 2 Abstract As the branch of business, where locations based services matter,

More information

Physical Database Design: Outline

Physical Database Design: Outline Physical Database Design: Outline File Organization Fixed size records Variable size records Mapping Records to Files Heap Sequentially Hashing Clustered Buffer Management Indexes (Trees and Hashing) Single-level

More information

The B dual -Tree: Indexing Moving Objects by Space Filling Curves in the Dual Space

The B dual -Tree: Indexing Moving Objects by Space Filling Curves in the Dual Space The VLDB Journal manuscript No. (will be inserted by the editor) Man Lung Yiu Yufei Tao Nikos Mamoulis The B dual -Tree: Indeing Moving Objects by Space Filling Curves in the Dual Space Received: date

More information

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact: Query Evaluation Techniques for large DB Part 1 Fact: While data base management systems are standard tools in business data processing they are slowly being introduced to all the other emerging data base

More information

Priority Queues and Binary Heaps

Priority Queues and Binary Heaps Yufei Tao ITEE University of Queensland In this lecture, we will learn our first tree data structure called the binary heap which serves as an implementation of the priority queue. Priority Queue A priority

More information

Data Structures for Moving Objects

Data Structures for Moving Objects Data Structures for Moving Objects Pankaj K. Agarwal Department of Computer Science Duke University Geometric Data Structures S: Set of geometric objects Points, segments, polygons Ask several queries

More information

Implementation Techniques

Implementation Techniques V Implementation Techniques 34 Efficient Evaluation of the Valid-Time Natural Join 35 Efficient Differential Timeslice Computation 36 R-Tree Based Indexing of Now-Relative Bitemporal Data 37 Light-Weight

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 7 - Query execution References Generalized Search Trees for Database Systems. J. M. Hellerstein, J. F. Naughton

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part IV Lecture 14, March 10, 015 Mohammad Hammoud Today Last Two Sessions: DBMS Internals- Part III Tree-based indexes: ISAM and B+ trees Data Warehousing/

More information

Information Systems (Informationssysteme)

Information Systems (Informationssysteme) Information Systems (Informationssysteme) Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2018 c Jens Teubner Information Systems Summer 2018 1 Part IX B-Trees c Jens Teubner Information

More information

Query Processing and Advanced Queries. Advanced Queries (2): R-TreeR

Query Processing and Advanced Queries. Advanced Queries (2): R-TreeR Query Processing and Advanced Queries Advanced Queries (2): R-TreeR Review: PAM Given a point set and a rectangular query, find the points enclosed in the query We allow insertions/deletions online Query

More information

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology Introduction Chapter 4 Trees for large input, even linear access time may be prohibitive we need data structures that exhibit average running times closer to O(log N) binary search tree 2 Terminology recursive

More information

Spatial Data Structures

Spatial Data Structures Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) [Angel 9.10] Outline Ray tracing review what rays matter? Ray tracing speedup faster

More information