Indexing of moving objects. Simonas Šaltenis Aalborg University

Size: px

Start display at page:

Download "Indexing of moving objects. Simonas Šaltenis Aalborg University"

Arthur Wood
5 years ago
Views:

1 Indexing of moving objects Simonas Šaltenis Aalborg University

2 Welcome! The audience of the course, what this course is about and what it is not about: research results not how to use commercial DBMSs, but we will talk about them too Important: ideas Agenda: Indexing of current time and future: TPR-trees, TPR*-trees, STRIPES Indexing of the past: B x -trees, BB x -trees, 3D R-trees, Partial persistence Acknowledgment Yufei Tao (TPR*-tree slides) Simonas Šaltenis, 006

3 Outline Motivation Background Multidimensional index structures R-tree Indexing the current and future positions: TPR-tree and TPR*-tree STRIPES B x -tree Trajectory indexing: Straight-forward R-tree-based approaches BB x -tree Partial persistence Conclusion of the course Simonas Šaltenis, 006 3

4 Why Moving Objects? Position-aware, online, moving objects are enabled by the following trends. Miniaturization of electronics Advances in positioning systems (e.g., GPS, assisted GPS,...) Advances in wireless communications Examples of position-aware online moving objects GPS-enabled mobile-phones, as well as diverse types of personal digital assistants (online cameras, wrist watches, etc.). The coming years will witness very large quantities of these. Vehicles, including cars, public transportation, recreational vehicles, sea vessels, airplanes, etc. Simonas Šaltenis, 006 4

5 Sample Location-Based Services Context-aware Location-Based Services: Location-aware advertising Integrated tourist services Traffic coordination and management Identification of impending traffic jams and expected fastest routes between positions Safety-related services It is possible to monitor tourists traveling in dangerous environments, and then react to emergencies. Games! Sensor-networks also generates MO data! Monitoring of any kind-of continuous variables, e.g., temperature, pressure (1D, nd). Simonas Šaltenis, 006 5

6 Requirements There is a need to track the positions of large amounts of objects in a database main-memory solutions may not always suffice, especially if history needs to be recorded There is a need for efficient processing of queries: Find all mobile-phones that are currently no further than 500m from the supermarket Find all mobile-phones that will be no further than 500m from the supermarket in 10 minutes (according to the current knowledge of their movement) Find ten objects that are currently closest to the supermarket Find all mobile-phones that were no further than 500m from the supermarket last Saturday between 10 and 11 a.m. Simonas Šaltenis, 006 6

7 Rules of the game Measure of efficiency number of I/O operations One I/O operation reads or writes a page of data (~Kb 16Kb) from the disk to memory A page of data = a node of data structure (index) consisting of a number B of entries The size of the data set in disk blocks n = N/B CPU cost is mostly neglected Operations that have to be supported: Different kind of queries (that's why we want an index) Insertions Deletions Update = Deletion + Insertion Bulk-loading the index Simonas Šaltenis, 006 7

8 Multidimensional DS Multidimensional Data Structures are used to answer queries on the set of stationary multidimensional points Range query ([x 1, x ], [y 1,y ]): find all points (x, y) such that x 1 <x<x and y 1 <y<y, and do it faster than a linear scan (Θ(n)) preferably O(log n) in worst-case or at least on average Simonas Šaltenis, 006 8

9 1D Range-Searching: B-trees B-trees [Bayer & McCreight, 197] a balanced diskbased search tree B+-trees All data entries (keys) are in leaf-nodes Based on an ordering < Simonas Šaltenis, 006 9

10 B-trees (cont.) Internal (directory) nodes direct queries and point to lower-level nodes Leaf nodes include actual keys and pointers to data items Guaranteed 50% capacity by insert and delete algorithms split / merge routines Range query: find x, such that x 1 <x<x Can be done in O(log n + k) worst-case time, where k is the size of the answer in disk blocks x 1 x Simonas Šaltenis,

11 Multidimensional Range Searching Much harder than one-dimensional problem A lot of multidimensional index methods have been proposed The theoretically optimal ones [Arge et al. 1999] that achieve O(log n + k) worst-case query performance: are unpractically complex, require a little bit more than linear-space Easy to implement, linear-space methods: K-D-B-trees, Quadtrees, Grid Files (for points) R-trees and variations (work for non-point objects as well) Concurrency control, recovery protocols for R-trees are well researched Implemented in Oracle, IBM Informix, PostgreSQL, MySQL Not only range queries, but a wide variety of other types of queries are supported by R-trees (nearest-neighbor queries, joins) Simonas Šaltenis,

12 Spatial Indexing With the R-Tree Example R1 p6 Query R3 p1 p R4 p4 R5 p7 R1 R p3 p5 p8 R3 R4 R5 R6 R7 R R7 p1 R6 p9 p11 p10 p13 p1 p p3 p4 Pointers to data items p5 p6 p7 p8 p9 p10p11 p1p13 Simonas Šaltenis, 006 1

13 R-tree Properties Leaf entry = <n-dimensional point, rid> Internal entry = <n-dim MBR, ptr to a child node> MBR a Minimum Bounding Rectangle of all points in the sub-tree pointed to by ptr R-tree is a balanced tree all leaves are at same depth from the root Nodes are kept at least m% full (except root) through insertion and deletion algorithms m is usually chosen to be 40%. m is the minimum fill factor, depending on the workload the average fill factor is usually 70%. Simonas Šaltenis,

14 Range Query in R-trees Range query: RangeQuery(Q, root) RangeQuery(Q, node) If node is internal, for each entry <MBR, ptr> in node: if MBR overlaps Q, RangeQuery(Q, ReadNode (ptr)) If node is leaf, for each entry <E, rid> in node: if E overlaps Q, report rid Note: We may have to search several subtrees at each node! What about B-trees? Worst-case performance O(n)! But in practice, R-trees exhibit good query performance for various data sets How do we insert and delete? Simonas Šaltenis,

15 Grow-Post trees Grow-Post trees: generalized R-tree-type indexes Bounding predicate (BP) = something that describes entries in a subtree Building blocks of algorithms: Consistent(BP, Q) returns true if results of query Q can be under BP (in the R-tree, MBR intersects Q) PickSplit(node) splits a page of entries into two groups Penalty(BP, E) returns an estimate how worse BP becomes if E is inserted under it Union(node) computes a BP of a collection of entries (in the R-tree, computes an MBR minimum and maximum in all dimensions ) Internal Nodes Leaf Nodes BP 1 BP..... BP n Simonas Šaltenis,

16 Insertion Insert(E) leaf = ChoosePath(E, root) Insert E into leaf PropogateUp (leaf) ChoosePath(E, node) If node is leaf, return node. From all entries in node, choose entry <MBR, ptr> with the smallest Penalty (MBR, E). ChoosePath(E, ReadNode(ptr)). PropogateUp(node) If node is overfull, call PickSplit(node) to produce n1 and n, replace node s old entry in its parent by e1 = Union(n1), e = Union(n), call PropogateUp(node's parent) Else if e = Union(node) is different from node s old entry in its parent, replace the old entry with e, callpropogateup(node's parent). Create a new root with two entries whenever a root is split. Simonas Šaltenis,

17 Heuristics for Penalty Heuristics of least area enlargement and smallest area are used in the R-tree s Penalty. R1 p6 R3 p1 p R4 p4 R5 p7 R1 R p3 p5 p8 R3 R4 R5 R6 R7 R R7 p1 R6 p9 p11 p10 p13 p14 p1 p p3 p4 Pointers to data tuples p5 p6 p7 p8 p9 p10p11 p1p13 Simonas Šaltenis,

18 Heuristics for Penalty Heuristics of least area enlargement and smallest area are used in the R-tree s Penalty. R1 p6 R3 p1 p R4 p4 R5 p7 R1 R p3 p5 p8 R3 R4 R5 R6 R7 R R7 p1 R6 p9 p11 p10 p13 p14 p1 p p3 p4 Pointers to data tuples p5 p6 p7 p8 p9 p10p11 p1p13 p14 Simonas Šaltenis,

19 PickSplit The entries in a node node plus the newly inserted entry must be distributed between n1 and n The goal is to reduce a likelihood of both n1 and n being searched on subsequent queries Again we use heuristics of minimum area. Redistribute so as to minimize the area of n1 plus the area of n. Problem there is an exponential number of possible distributions of entries to consider. Solutions: Guttman s quadratic gready algorithm Good split R*-tree s algrithm Bad split Simonas Šaltenis,

20 Deletion Delete(E) Using the search procedure, find a leaf where entry E is located Remove E from leaf. Call PropogateUp(leaf). PropogateUp(leaf) 1. If leaf is underfull, deallocate the node leaf remove leaf s entry in its parent, call PropogateUp(leaf s parent), and reinsert all leaf s entries.. Else if e = Union(leaf) is different from leaf s old entry in its parent, replace the old entry with e, call PropogateUp (leaf s parent). No additional heuristics are involved in Delete, underfull nodes are handled using Insert as a subroutine. Simonas Šaltenis, 006 0

21 Comments on R-Trees Works well for - 4 D datasets. Several variants (notably, R + and R*-trees) have been proposed; widely used Supports a wide variety of queries Point / range queries Spatial join queries [Brinkhoff et al., 1993] Direction, topological, distance queries [Papadias et al., 1995] k- Nearest neighbor queries [Roussopoulos et al., 1995] Simonas Šaltenis, 006 1

22 Exercises R-trees Insert points p1, p, p1, p13, p5, p14, p8, p7, p11, p9 from slide 17 into an initially empty R-tree with the node capacity of 3 entries and minimum allowed fan-out of. Use Guttman s quadratic split algorithm. Would you build a better tree from scratch if all points were known in advance? What is the difference between the B-tree and the 1D R-tree? What kind of data the 1D R-tree can store that the B-tree can not? What is the bounding predicate in B-trees? Consider nearest neighbor queries ( find a data point that is nearest to the query point ) in R-trees: Can they be implemented redefining the Consistent(E, Q) method, do you need something more? Simonas Šaltenis, 006

23 Outline Motivation Background Multidimensional index structures R-tree Indexing the current and future positions: TPR-tree and TPR*-tree STRIPES B x -tree Trajectory indexing: Straight-forward R-tree-based approaches BB x -tree Partial persistence Conclusion of the course Simonas Šaltenis, 006 3

24 Problem Statement We address the problem of indexing the ever-changing current and predicted future positions of point objects moving in one, two, and three-dimensional space. Indexing challenges specific to MOs continuous change of positions extrapolation between the last update and the current time must be supported hyper-dynamic workloads high-rates of updates Simonas Šaltenis, 006 4

25 Modeling Continuous Movement In conventional databases, data is assumed constant unless explicitly modified. With continuous movement, this is problematic. Too frequent updates Outdated, inacurate data Simonas Šaltenis, 006 5

26 Modeling Continuous Movement In conventional databases, data is assumed constant unless explicitly modified. With continuous movement, this is problematic. Too frequent updates Outdated, inacurate data Instead of storing position values, we store positions as functions of time, yielding time-parameterized positions. We use linear functions to capture the present and future positions. x( t) = x( t0) + v( t t0), where t now Updates are less frequent Tentative future queries are supported t 0 For example, given, the current and anticiapted, future position of a twodimensional point can be described by four parameters. x ( t0), y( t0), v x, v y Simonas Šaltenis, 006 6

27 Modeling Continuous Movement Three ways to think about continuously moving points in d-dimensional space: Lines in (d+1)-dimensional space d spatial dimensions and 1 time dimension Points in d-dimensional space d spatial and d velocity dimensions (function parameters: ) Time-parameterized points in d-dimensional space x t ), v ( 0 x x(t 0 ) o 3 o o 1 o o 1 o t v Simonas Šaltenis, 006 7

28 Queries Type 1: objects that intersect a given rectangle at t x Type : objects that 6 o 3 intersect a given 5 4 rectangle sometime 3 o from t1 to t o 1 o Type 3: objects that 1 1 o intersect a given 4 moving rectangle t sometime between t 1 andt We can expect, that most queries will be consentrated in the sliding window [CT, CT+W], i.e. CT <= t, t 1, t <= CT + W Simonas Šaltenis, 006 8

29 Time-Parameterized Rectangles The TPR-tree [Šaltenis et al. 000] is based on the R- tree. Moving points are bounded with time-parameterized rectangles. Are bounding from now on. The R-tree allows overlap. The tree employs conservative bounding rectangles. Union( node) : min i c i c max i c i c { } x ( t ) = min o. x ( t ) o node { } x ( t ) = max o. x ( t ) o node { } v = min o. v o node { } i Simonas Šaltenis, min i v = max o. v o node max i i

30 Entry Structure, Querying Do we need to store t c in each entry? No we use one common reference time t r (the same for data points): x = x ( t ) = x ( t ) + v ( t t ) Entry structure: <TPBR, ptr> min min min min i i r i c i r c x = x ( t ) = x ( t ) + v ( t t ) max max max max i i r i c i r c min max min max min max min max TPBR = MBR,VBR = ( x, x, x, x ),( v, v, v, v ) At any t > CT we can get a valid R-tree: TPR-tree(t) = R-tree x () t = x ( t ) + v ( t t ) min min min i i r i r x () t = x ( t ) + v ( t t ) max max max i i r i r Simonas Šaltenis,

31 Tightening Ideally, bounding rectangles should be always minimal. Excessive storage cost x t TPBRs are tightened (i.e., recomputed) whenever a bounded node is modified Simonas Šaltenis,

32 Tightening Ideally, bounding rectangles should be always minimal. Excessive storage cost x t TPBRs are tightened (i.e., recomputed) whenever a bounded node is modified Simonas Šaltenis, 006 3

33 Simonas Šaltenis, Insertion: Grouping Points How to group moving points (Penalty and PickSplit)? The R-tree s algorithms minimize characteristics of MBRs such as area, overlap, and margin. How does that work for moving points?

34 Insertion in the TPR-Tree The bounding rectangle characteristics (area, overlap, and margin) are functions of time. The goal is to minimize these for all time points from now to now+h. Minimizing the characteristics for time now + H/ does not work (e.g., the area of a conservative bounding rectangle is not linear). We use the regular R*-tree algorithms, but all bounding rectangle characteristics are replaced by their integrals. now+ H now A ( t) dt, where A(t) is, e.g., the area of an MBR What H to use? H depends on the update rate, and on how far queries may reach into the future (W). Simonas Šaltenis,

35 Example I We illustrate the working of the TPR-tree by means of an example. The subsequent figures are generated automatically, by the index code used for performance experiments. Data 0 one-dimensional points are used. Index Parameters Page size = 64 (5 entries in leaf nodes and 3 in non-leaf nodes). H = 8. Simonas Šaltenis,

36 Example II CT = 0 At CT=1, the point at x = 0, v = 0 is updated to have x = 18.5, v = Simonas Šaltenis,

37 Example III CT = 1 Inserting a moving point at position 14 with v = 0.5. Simonas Šaltenis,

38 Example IV After insertion Simonas Šaltenis,

39 What H to use? Intuitivly: we want H to be equal to the time during which queries will see the node that we consider modifying: H depends on the update rate, and on how far queries may reach into the future (W) Experiments show that H = UI + W consistently gives good query performance (UI average update interval) The system can track UI automatically (How?) W is usually smaller than UI can be tracked automatically too Simonas Šaltenis,

40 TPR-Tree Bulk-Loading I Regular R-tree bulk loading aims at dividing the space into roughly equal, square MBRs (e.g., STR). It is a straightforward approach to use STR, interpreting the d-dimensional moving points as d-dimensional points. The ratio between the spatial and velocity extents of BRs is important. α We introduce a parameter that expresses how many times the velocity extents of BRs are longer than their spatial extents. Given a fixed number of BRs to produce, we can make them large in the spatial dimensions, but small in the velocity dimensions (i.e., growing slowly). This corresponds to a small α. make them small in the spatial dimensions, but large in the velocity dimensions (i.e., fast growing). This corresponds to a large. α Simonas Šaltenis,

41 TPR-Tree Bulk Loading II Example: Using either initially large, but slow-growing, or initially small, but fast-growing, time-parameterized BRs. x(t 0 ) V s S v x t x(t 0 ) t α = 1 (good for a large H) 4 α = 1 Goal: find an optimal α as a function of H. v x (good for a small H) Simonas Šaltenis,

42 TPR-Tree Bulk Loading III One-dimensional uniform data Let k be the number of nodes and the space-time ratio S V SV k = s =. Area (length) A( t) = s + tsα. s sα kα H SV We integrate A( t) : I = s (1 + tα ) dt = ( H + 0 kα I To minimize I, we solve = 0 α = α H For D data,, and for 3D data, α α = H H gets smaller with increasing dimensionality Intuitive as hyper-volumes grow faster in more dimensions α H α). α H Simonas Šaltenis, 006 4

43 TPR-Tree Bulk Loading IV A modified STR is used that achieves the desired -ratio. Example: 000 points, 0 points per leaf-node. α H = 4, α = 1 H = 16, α = 1 8 Simonas Šaltenis,

44 What about Expanding BRs? Will the expanding TPBRs ruin the performance? That's what the experiments show: Settings: objects update only once per hour! D data, with W = 40. Due to the constant influx of updates, the performance of the TPR-tree does not degrade after reaching a certain level. Simonas Šaltenis,

45 Summary The TPR-tree indexes the current and predicted future positions of moving objects. The TPR-tree is based on the proven, widely used R-tree technology The tree extends the R*-tree by introducing conservative, timeparameterized bounding rectangles, which are tightened regularly. The tree s algorithms use integrals of area, overlap, etc. The tree can be tuned to take advantage of a specific update rate and querying window length. Other types of queries that are supported by the R-tree can be supported by the TPR-tree, e.g., nearest-neighbor queries. Simonas Šaltenis,

46 Exercises TPR-tree Write a Consistent (Q, E) function for a d-dimensional TPR-tree. E = [ x, x ],...,[ x, x ];[ v, v ],...,[ v, v ], 1 1 d d 1 1 d d Q= [ q, q ],...,[ q, q ];[ w, w ],...,[ w, w ];[ t, t ] 1 1 d d 1 1 d d where [ x, x ], and [ q, q ] are TPBRs' extents at time t. i i i i Consider SR-tree, where instead of MBR s minimum bounding circles (spheres) are used. Define the bounding predicate (and a Consistent (Q, E) function) of a time parameterized SR-tree. Simonas Šaltenis,

47 TPR*-tree Observation: TPR-tree integrated heuristics optimize the tree for time-slice queries (Why?) What if we want to target window and moving queries? A query is given as a TPBR plus a time window: Q = ( qx, qx, qx, qx ),( qv, qv, qv, qv ),( t, t ) min max min max min max min max min max TPR*-tree [Tao et al., 003]: Uses heuristics targeted for window and moving queries Uses other (structural) improvements of algorithms: Modified ChoosePath Modified PickSplit and PickWorst (for reinsertions) Active tightening during deletions Simonas Šaltenis,

48 Sweeping Region Heuristics Once again: What does it mean for a static rectangle to intersect with a static query? y axis y axis O Q O' Q center x axis i x axis Simonas Šaltenis,

49 Sweeping Region Heuristics What does it mean for a TPBR O to intersect with a moving query at some time point t? The same as for the TPBR O' to intersect the static query centroid: MBR of O' on i-th axis: VBR of O' on i-th axis: q q x x + min max max min ( v qv, v qv ) min max ( i, i i i ) i i i i y axis O -1 Q y axis O' 3 3 Q center x axis i x axis Simonas Šaltenis,

50 Sweeping Region Heuristics What does it mean for a TPBR O to intersect with a moving query at any point in time interval (t min,t max )? The same as for the sweeping region of TPBR O' to intersect the static query centroid y axis O' 3 Q center i x axis Simonas Šaltenis,

51 Sweeping Region Heuristics Instead of integrals of TPR-tree, areas of sweeping regions are used: E.g.: enlargment of the integral of area of O is replaced by the enlargment of the area of the sweeping region of O' Assumption about query: static point Compare these two heuristics: What happens if we have a static MBR? What happens if we have a moving MBR that does not change area? Does this confirm the intuition that these heuristics optimize for differnet queries? Simonas Šaltenis,

52 TPR Problem: ChoosePath To insert an entry, the TPR-tree picks the sub-tree incurring the minimum penalty y axis the (absolute) values of all velocities are 1 b a g y axis b g p d 4 i (static) e f c time 0 d h x axis What would a TPR-tree do? Simonas Šaltenis, i e f c inserting p at time May result in inserting an entry into a bad sub-tree a h x axis

53 TPR* solution: ChoosePath Aims at finding the best insertion path globally, namely, among all possible paths. Observation: We can find this path by accessing only a few more nodes (than the TPR-tree algorithm) y axis i b e f c inserting p at time g p a h d x axis Maintain a heap: [(g),0], [(h),0], [(i),0] the path expanded so far the accumulated penalty so far Simonas Šaltenis,

54 TPR* solution: ChoosePath Aims at finding the best insertion path globally, namely, among all possible paths. Observation: We can find this path by accessing only a few more nodes (than the TPR-tree algorithm). 10 y axis Visit node g: 8 6 b g p d [(h),0], [(a,g),3], [(i),0], [(b,g),3] 4 i e f c inserting p at time a h x axis complete paths already although nodes a and b are not visited Simonas Šaltenis,

55 TPR* solution: ChoosePath Aims at finding the best insertion path globally, namely, among all possible paths. Observation: We can find this path by accessing only a few more nodes (than the TPR-tree algorithm) y axis i b e f c inserting p at time g p a h d x axis Visit node h: [(a,g),3], [(d,h),9], [(c,h),17], [(i),0], [(b,g),3] The algorithm stops now. Simonas Šaltenis,

56 TPR* solution: ChoosePath It is more expensive. Why does it make sense to do? Query performance is obviously improved Insertions are more expensive But overall update improves: Remember: update is deletion + insertion The main part of the cost of deletion (and all update) is search, which we improved with the new ChoosePath! Simonas Šaltenis,

57 TPR: Tightening MBRs in deletion Entry deletion requires first finding the entry, which accesses many nodes of the tree. The TPR-tree uses this fact to tighten the MBR of non-leaf entries. Assume nodes h and i are accessed before e is found; then the TPR-tree will tighten the MBR of i only (enclosing g and f) y axis the (absolute) values of all velocities are 1 f h g a b i e c time 0 d j x axis y axis h i g b f a e deleting e at time 1 d c j x axis Simonas Šaltenis,

58 TPR: Tightening MBRs in deletion Entry deletion requires first finding the entry, which accesses many nodes of the tree. The TPR-tree uses this fact to tighten the MBR of non-leaf entries. Assume nodes h and i are accessed before e is found; then the TPR-tree will tighten the MBR of i only (enclosing g and f) y axis the (absolute) values of all velocities are 1 f h g a b i e c time 0 d j x axis y axis h i g b f a d c j x axis after deleting e at time 1 Simonas Šaltenis,

59 TPR* solution: Active tightening Tightening more entries for free. Assume nodes h and i are accessed before e is found; then the TPR*-tree will tighten the MBR of both h and i y axis the (absolute) values of all velocities are 1 f h g a b i e c time 0 d j x axis y axis h i g b f a e deleting e at time 1 d c j x axis Simonas Šaltenis,

60 TPR* solution: Active tightening Tightening more entries for free. Assume nodes h and i are accessed before e is found; then the TPR*-tree will tighten the MBR of both h and i y axis the (absolute) values of all velocities are 1 f g i e d y axis i g f 4 h a b c j 4 b h a d c j time 0 x axis x axis after deleting e at time 1 Simonas Šaltenis,

61 TPR* solution: Active tightening Another example: Assume the shaded nodes are accessed to find e. The active tightening can tighten the MBR of n 5, n 6, n 3, and n 4. But not n 1 and n. Simonas Šaltenis,

62 Outline Motivation Background Multidimensional index structures R-tree Indexing the current and future positions: TPR-tree and TPR*-tree STRIPES B x -tree Trajectory indexing: Straight-forward R-tree-based approaches BB x -tree Partial persistence Conclusion of the course Simonas Šaltenis, 006 6

63 Modeling Continuous Movement Three ways to think about continuously moving points in d-dimensional space: Lines in (d+1)-dimensional space d spatial dimensions and 1 time dimension Points in d-dimensional space d spatial and d velocity dimensions (function parameters: x ( t 0 ), v ) Time-parameterized points in d-dimensional space our approach x o 3 o o t o x(t 0 ) o 1 o v Simonas Šaltenis,

64 STRIPES Using Duality The main idea of STRIPES [Patel et al., 004] is to transform queries and points using the duality transformation: Lines become points, points become lines Trajectories become points Queries become region queries Use Quadtrees to index points That's it! Simonas Šaltenis,

65 Quadtrees Quadtree a four-way partition tree Linear space Good average query performance Orcale has them a b 1 c d e c f 3 f 4 d e b a Simonas Šaltenis,

66 Query Transformation Exercise Let's see how a window query Q = ([x l, x u ],[t l, t u ]) is transformed: x u x l 1 x x(t 0 ) o ? t v t i t u 1 o 1 o Simonas Šaltenis,

67 STRIPES: Summary Avoids the problem of expanding TPBRs Uses reference positions: No tightening is performed Has to assume L (the longest update interval) and to maintain two indixes: from t ref to t ref + L, and from t ref + L to t ref + L Index is not tuned to a specific horizon H (determined by updating and querying pattern) Remember TPR-trees bulk-loaded with different values of the α parameter (corresponding to the different values of H)! Simonas Šaltenis,

68 Outline Motivation Background Multidimensional index structures R-tree Indexing the current and future positions: TPR-tree and TPR*-tree STRIPES B x -tree Trajectory indexing: Straight-forward R-tree-based approaches BB x -tree Partial persistence Conclusion of the course Simonas Šaltenis,

69 B x -tree [Jensen et al., 004] The goal: to index the current and future positions of moving objects using B-trees B-trees are at the core of every DBMS Very good concurrency-protocols are available for B-trees (and are implemented in all DBMS) improtant when we track a lot of objects B-tree updates are very efficient and we have a lot of them But how do we store and query these future linear trajectories of objects? Key ideas used: Space-filling curves Query expansion Simonas Šaltenis,

70 Space-filling curves How can we index multi-dimensional points with B-trees? Space-filling curve: An infinite curve that visits all points of a discretized space (usually one quadrant (if D) of space) In this way it uniquely maps multi-dimensional points to natural numbers Building Hilbert curve: (3,4) Simonas Šaltenis,

the B + - tree storing the transformed points?

71 Transforming Queries Other space-filling curves can be constructed, e.g., Peano curve (Z-ordering) How do we perform range queries on the B + - tree storing the transformed points? Multi-dimensional range query is transformed into a set of intervals Simonas Šaltenis,

72 Enlarging Queries What about moving points? Just index their positions re-computed at some reference time point (t ref ) How do we ask range queries then? Let's assume we know the VBR of all points, i.e., maximum velocities of objects in all four directions: v l 1,v u 1,v l,v u Insight: If the query time is t q, we can expand the query to catch all necessary objects, knowing that objects traveling at maximum speed v, could have moved not further than v t ref t q. Simonas Šaltenis, 006 7

73 Multiple B-trees Problem: we may need to expand queries too much if the difference t ref t q is large. Idea: divide objects into n + 1 groups use a separate B- tree with a separate t ref for each group! Example: n + 1 = Only half of the points are queried with the maximum enlargment. What n to choose? t 1 t t q Large n: Smaller average query enlargment But, more queries are performed in more B-trees Authors propose to use n = Simonas Šaltenis,

74 Index Structure All three B + -trees are combined into one that has three partitions: 0, 1, and. Objects are assigned to partitions based on their last update time-stamps t u. Let's assume that all objects update more frequently than every Δt mu time units: t lab =t u t mu /n l partition= t lab / t mu /nmod n1 xrep= xvaluexv t lab t u The key then is (partition,xrep) Simonas Šaltenis,

75 Technicalities If an object does not update itself in Δt mu, it is migrated from partition 0 to partition n. Instead of using maximum VBR of all objects for expansion: The space is divided into a grid of cells, each sell maintains a VBR of objects in that cell (in main memory). The global VBR is used to expand the query, then local VBR is computed from the covered cells. The local VBR is used to recompute a possibly smaller expansion of query. The filtering of query results has to be done at the end. Simonas Šaltenis,

76 Exercise Which objects will be returned by the query q? Which are false drops? What is the precision of the query? Assume global VBR for query expansions Positions of objects are given at update times (given after comma) E, C,4 q B,4 A,1 D, Δt mu =6 t q =7 0 3 CT 6 9 time Simonas Šaltenis,

77 Outline Motivation Background Multidimensional index structures R-tree Indexing the current and future positions: TPR-tree and TPR*-tree STRIPES B x -tree Trajectory indexing: Straight-forward R-tree-based approaches BB x -tree Partial persistence Conclusion of the course Simonas Šaltenis,

78 Trajectory data Positions between measurements are linearly interpolated. What we have are 3D polypines Two spatial dimensions + one time dimension The same types of queries as for the current-future data Simonas Šaltenis,

79 Recording of Movement How do we get the data? It is given as a static file of positions or new data is constantly appended to the end If new data is appended to the end: The last segment is probably an infinite trajectory based on the given velocity vector We have to assume that an object is somewhere between the last update and the current time Then, whenever an update happens, we have to update the last infinite peace of trajectory into a new peace of trajectory and insert a new infinite peace of trajectory Simonas Šaltenis,

80 Recording of Movement: Example pos Linear prediction of future movement Recorded polyline u1 u u3 u4 u5 CT time Simonas Šaltenis,

81 R-tree Based Indexing of Trajectories We can use a 3D R-tree! But Line segments (especially long ones) are difficult to index because they introduce large MBRs with a lot of overlap We can cut long line segments into smaller the index becomes larger STR-tree and TB-tree [Pfoser et al., 000]: Modifications of 3D R-tree, that strive for trajectory preservation (segments of the same trajectory should be clustered together): Heuristics of the tree changed In the TB-tree, a linked list conects line segments in the leaf level into a trajectory Trajectory queries are supported: Find all the taxis that on Monday at 10:00am were 1km around Fredrik Bajers Vej 7 and then drove to the city center Problem historical time-slice queries perform worse and worse, the more history you accumulate. Simonas Šaltenis,

82 BB x -tree It is easy to see how the B x -tree can be modified to record history: Have a separate B + -tree for each partition objects updated during [t ref Δt mu /n; t ref ] are stored in this B + -tree according to their positions at t ref Entry structure: (xrep, t start, t end, ptr), xrep is used as a key, for internal nodes t start and t end is the minimum and the maximum of timestamps of children. Simonas Šaltenis, 006 8

83 BB x -tree A query accesses those B + -trees whose lifespans overlap with the query BB x -tree can support both historical and future queries Technicalities: Together with each tree the histogram of VBRs is stored If an object does not update itself in Δt mu, it is migrated to the last partition: But it is not easy to do this, some sort of trigers are needed ( priority queue) Simonas Šaltenis,

84 Outline Motivation Background Multidimensional index structures R-tree Indexing the current and future positions: TPR-tree and TPR*-tree STRIPES B x -tree Trajectory indexing: Straight-forward R-tree-based approaches BB x -tree Partial persistence Conclusion of the course Simonas Šaltenis,

85 Partial Persistence Data structures that do not record history of changes are called ephemeral data structures (e.g., B-tree, R-tree) The technique of partial persistence transforms an ephemeral data structure into a partially persistent data structure which records the history and: Answers historical time-slice queries with the same asymptotic worst-case performance as an ephemeral sturcture Has the size proportional to the number of updates How is this achieved? Let's look at R-trees Entry structure: (point/mbr,t start, t end, ptr) For data (leaf) entries t start is the insertion time, t end is the deletion time. For non-leaf entries t start is the time when the ptr node was created, t end is the time when the ptr node was time split Simonas Šaltenis,

86 Time Split An entry is alive if t end =, else it is dead. The key operation is time spliting of a node A: A new node is created and all live entries from A are copied to the new node. No cahnges are done to A, and it will never change again, t end of the live parent entry of A is set to the current time, a new live entry is created pointing to the new node. A node is called dead if it was time split. When the root is time split, a pointer to it is put in a root array Queries consider only entries live at the time of query Simonas Šaltenis,

87 Time Split: Example Time spliting a node when it is overfull (B = 5), CT = 11: A(5,10) B(7, ) A(5,10) B(7, ) M(11, ) C(5,9) D(6, ) E(5, ) F(5,10) G(7, ) H(7,10) I(8,9) J(8, ) K(8,10) C(5,9) D(6, ) E(5, ) F(5,10) G(7, ) H(7,10) I(8,9) J(8, ) K(8,10) G(7, ) J(8, ) L(11, ) Insert L(11, ) What if an overfull node is full of live entries? Are there other situations when we do a time split? What does it give us? Simonas Šaltenis,

88 Invariants Weak version condition For each moment in time and for each non-root node, the number of alive entries is either zero or at least d (d = B/ k for some k). This guarantees that a time-slice for any time point sees an ephemeral structure that has O(B) fan-out! Queries have the same(good) worst-case performance The node is time-split when it is overfull or if weak version condition is violated. Strong version condition The number of alive entries in a node right after time-split is more than d+e and less than B-e (e = B/ j for some j) This ensures that the size of the index remains linear (why?) Simonas Šaltenis,

89 Algorithm Delete Insert Weak version underflow 1 Invariants OK AD Overflow Time split D Strong version underflow Time split sibling Merge two live nodes D Invariants OK A A a node of alive entries is bounded D a node of dead entries is bounded AD a mixed node is bounded Strong version overflow Key split live node A A Simonas Šaltenis,

90 Partially Persistent TPR-tree: TPBRs Straightforward TPBRs Double TPBRs with static "heads" x j Time of last update x j Double entry TPBR in TPR-tree x j x j x j x j Time of last update t CT t t CT t x j x j After insertion and time split x j x j Dead entry Time split Alive entry x j x j Time split Dead entry Double entry with an empty "head" t CT t t CT t Simonas Šaltenis,

91 Partially Persistent TPR-tree To make the TPR-tree partially persistent new double TPBRs are introduced for nodes with a mix of dead and alive entries Black box of ephemeral algorithms is opened Partial persistence algorithms have to be modified further to allow updating of the last-recorded trajectory segment (from infinite prediction to a finite line segment) Recall that time splits may have introduced a lot of copies of this live segment Simonas Šaltenis,

92 Exercise: Partially Persistent R-tree Do one operation per time unit: Insert p1, p, p1, p13, p5, p14; Delete p5, p; Insert p8; Delete p1; Insert p7, p11, p9. Assume: B = 6, m = 50% (3), d =, e = 1 p6 p1 p p4 p7 p3 p8 p5 p1 p9 p11 p10 p13 Simonas Šaltenis, 006 9

93 Not Covered Topics Uncertainty Either queries or objects are expanded Which of the covered indices support circle-objects? Update efficiency Using object-id-based look-ups in hash table to eliminate R-tree deletion multiple path searches [M. Lee et al., 003] Using buffer-tree [Arge, 1995, 1999] techniques Main-memory indexing Cache-conscious and cache-oblivious data structures Different types of queries (knn, closest pairs, etc.) Papadias et al. Infrastructure-constrained indexing Papadias et al., 003 Simonas Šaltenis,

94 Thank you How did you like the course? Acknowledgments Yufei Tao for his TPR*-tree slides Simonas Šaltenis,

Indexing the Positions of Continuously Moving Objects

Indexing the Positions of Continuously Moving Objects Simonas Šaltenis Christian S. Jensen Aalborg University, Denmark Scott T. Leutenegger Mario A. Lopez Denver University, USA SIGMOD 2000 presented by