Incremental Sub-Trajectory Clustering of Large Moving Object Databases


Information Management Lab (InfoLab), Department of Informatics, University of Piraeus
Nikos Pelekis, Panagiotis Tampakis, Marios Vodas, Yannis Theodoridis
August 2013

Table of Contents
1. INTRODUCTION
2. RELATED WORK
3. THE RETRATREE STRUCTURE
4. RETRATREE ALGORITHMS
REFERENCES

1. INTRODUCTION

Huge volumes of location information are available nowadays due to the rapid growth of positioning devices (GPS-enabled smartphones and tablets, on-board navigation systems in vehicles, vessels and planes, smart chips for animals, etc.). In the near future, this explosion will inevitably contribute to what is called the Big Data era, raising significant challenges for the data management research community. Assume, for example, the following scenario: Location-based Services (LBS) users transmit their location to a central LBS server asynchronously and in batch mode; on the server side, a Moving Object Database (MOD) system is responsible for organizing users' traces into trajectories; to provide high-quality services, the server executes extensive querying and mining processes on the trajectory database stored in the MOD engine. Clear challenges arise from this scenario:

Challenge 1: Since users (the data producers) transmit their location information asynchronously and in batch mode, the underlying index that supports MOD query processing should be able to handle this kind of information transmission.

Challenge 2: Mining operations in a MOD should be treated as first-class citizens, at least on par with query operators. To achieve this, the mining functionality should be provided as query operators in a MOD engine.

Challenge 3: Unlike relational data, for which algorithms such as BIRCH and CURE exist, the MOD domain lacks an efficient incremental clustering algorithm. The incremental characteristic is essential in the above scenario, since updates in the database are frequent and clustering results could quickly degrade. Such an incremental clustering approach should also support PAUSE/RESUME operations, due to the large volume of data to be handled.
In this report, we address the above issues and propose a novel indexing scheme for large MODs. The scheme is based on abstractions of trajectory data, the so-called Representative Trajectories (hence the term ReTraTree), and is able to (a) efficiently process both querying and mining operations in the MOD, (b) work in distributed mode, and (c) support trajectory databases that are fed in asynchronous and batch mode.

The rest of the report is organized as follows. Section 2 reviews related work. Section 3 presents the ReTraTree structure, and Section 4 provides the algorithms for maintaining the ReTraTree and exploiting it for querying and mining operations, along with a cost analysis. Section 5 discusses the settings and results of our empirical performance study.

2. RELATED WORK

In this section, we review related work on mobility data mining (focusing on clustering) and related access methods for mobility data.

Recently, several approaches have tried to adapt well-known mining algorithms to trajectories. The common building block of these approaches is the use of different similarity functions as the means of grouping trajectories into clusters. An interesting approach, which is also adopted in our work, is proposed in [10] for the efficient processing of most-similar trajectory (MST) queries. A similar distance function is used in the T-OPTICS algorithm [32] (and its variant TF-OPTICS, which focuses on the discovery of the temporal intervals that lead to the best clustering results), where the OPTICS [4] clustering algorithm is made applicable to trajectory data. These temporal intervals are given by the user, which is a limitation of the approach; TF-OPTICS then reapplies T-OPTICS on portions of trajectories that all lie in exactly the same temporal period. The best of the possible clusterings is chosen by applying some quality measures.
Our approach can be viewed as a generalization of this approach, as we automatically identify patterns of sub-trajectories in an unsupervised way, which may have various non-predefined lifespans. In [7] the authors proposed probabilistic techniques based on the EM algorithm for clustering short trajectories using regression mixture models. This approach also aims at global clustering of whole trajectories; the notions of segmentation, sampling and sub-trajectory pattern mining are out of its scope. In [39] the authors proposed a variant of the FCM algorithm for MODs, called CenTR-I-FCM. The approach makes use of local patterns in the time dimension as the basis for identifying global clusters of whole approximate/symbolic trajectories. Again, the discovered local patterns are predefined with respect to their lifespan. In [28] the authors proposed a partition-and-group framework for clustering 2-D trajectories, which enables the grouping of similar sub-trajectories, based on a trajectory partitioning algorithm that uses the minimum description length principle. At its core, it uses a variant of the DBSCAN algorithm that operates on the partitioned directed line segments. This work was the first to tackle the problem of identifying sub-patterns in mobility data and, although similar in principle to our approach, it presents certain limitations as discussed earlier on the example of Fig. 1.

An interesting line of research includes works that aim to discover several types of collective behavior among moving objects, such as flocks, leadership, convergence, encounter and sub-trajectory cluster patterns [5][16][26][15], moving clusters [23], convoys [22], and swarms [30]. Although these approaches provide lucid definitions of the mined patterns, their main limitation is that they are rather rigid and sensitive to parameters, while their computation raises efficiency issues.

Our approach also shares commonalities with well-known clustering algorithms for point (vector) data [47][43], which first sample the dataset and then start the clustering process (aiming at high efficiency). Not only are these vector-based algorithms inapplicable to MODs (due to the complex structure and properties of mobility data), but there is also an essential difference between those techniques and our approach: while those rely on random sampling, in our approach the clustering is driven by a sample that results from an optimization formula, thus leading to a deterministic solution of the sub-trajectory clustering problem.
The previous algorithms usually handle the issue of efficiency by employing a general-purpose access method (e.g. an R-tree-like structure), which however is implemented ad hoc and outside a DBMS or a specialized MOD. This means that concurrency and recovery are left out of the requirements, which diminishes the usability of the algorithms in real-world applications. Extending MOD engines, and commercial ORDBMSs in general, through interfaces like those of Informix Dynamic Server [21], Oracle's Extensible Indexing Interface [35] and DB2 UDB's table functions [8], does not reduce the complexity of understanding their concurrency and recovery protocols; as such, it does not reduce the implementation effort of an external access method compared to a built-in one, if identical levels of concurrency, robustness and integration are desired [25], as in our case. Actually, this complexity is the main reason why, although literally dozens of efficient access methods for mobility data have been proposed in the literature (to name but a few representatives, [41][44][19][34]), none of them has been integrated in a real DBMS under the aforementioned specifications. To handle this problem, the GiST [20] structure has been proposed, which however, to the best of our knowledge, has not been used in the context of mobility data. More interestingly, not all of the research proposals can be realized as GiST instances. For example, the TB-tree [41] cannot be reproduced due to the doubly linked list in the leaves of the tree. In this report, our proposal is to simulate the well-known 3D-Rtree [44] on PostgreSQL's GiST extensibility interface, applied on appropriate data types that allow mobility data representation. Our choice was driven by the fact that the 3D-Rtree has been used as the access method by several diverse algorithms, while it has exhibited balanced performance in a variety of benchmark queries [9].

3. THE RETRATREE STRUCTURE

Hierarchical splitting of the time domain

The idea of hierarchical partitioning of the time domain is to first partition each trajectory into $p \ll L_k$ equi-sized disjoint temporal periods (i.e. first-level partitioning into so-called chunks), and secondly to organize each of the latter into possibly overlapping equivalence classes according to the lifespans of the sub-trajectories inside the chunks (i.e. second-level partitioning into so-called sub-chunks).
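The first-level partitioning can be illustrated with a minimal Python sketch. It assumes trajectories are time-ordered lists of (x, y, t) points and that positions at chunk borders are obtained by linear interpolation; the function names `chunk_periods`, `interpolate` and `split_trajectory` are hypothetical and are not part of the report.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t)


def chunk_periods(t_min: float, t_max: float, p: int) -> List[Tuple[float, float]]:
    """Return p equal-length periods [start, end) covering the lifespan [t_min, t_max]."""
    length = (t_max - t_min) / p
    return [(t_min + i * length, t_min + (i + 1) * length) for i in range(p)]


def interpolate(a: Point, b: Point, t: float) -> Point:
    """Linearly interpolate the position between points a and b at time t."""
    ratio = (t - a[2]) / (b[2] - a[2])
    return (a[0] + ratio * (b[0] - a[0]), a[1] + ratio * (b[1] - a[1]), t)


def split_trajectory(points: List[Point], splits: List[float]) -> List[List[Point]]:
    """Cut a non-empty, time-ordered trajectory at every splitting timestamp it crosses.

    `splits` must be sorted in ascending order.
    """
    pieces, current = [], [points[0]]
    last = len(points) - 1
    for i in range(1, len(points)):
        a, b = points[i - 1], points[i]
        for s in splits:
            if a[2] < s < b[2]:
                # The segment crosses a chunk border: close the current piece there.
                cut = interpolate(a, b, s)
                current.append(cut)
                pieces.append(current)
                current = [cut]
        current.append(b)
        if b[2] in splits and i != last:
            # A recorded point lies exactly on a border: it ends one piece and starts the next.
            pieces.append(current)
            current = [b]
    pieces.append(current)
    return pieces


periods = chunk_periods(0.0, 10.0, 2)
splits = [start for start, _ in periods[1:]]  # the p-1 splitting timestamps
pieces = split_trajectory([(0.0, 0.0, 0.0), (10.0, 0.0, 10.0)], splits)
```

Because every trajectory is cut at the same splitting timestamps, the chunk of a sub-trajectory follows from its timestamps alone, which is what allows chunk borders to be shared across the whole database.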

More formally, given a trajectory $T_k$ as a sequence of $L_k - 1$ (3D) line segments $e_{k,i}$, the lifespan $l$ of all trajectories in the trajectory database $D$, and a target partitioning granularity $p \ll L_k$, the chunked sub-trajectory $ST_{k,i}$ of trajectory $T_k$ is the one that results from $T_k$ when restricted inside a temporal period $p_i$:

$$p_i = \left[\, t.D + (i-1)\frac{l}{p},\; t.D + i\,\frac{l}{p} \right), \quad 1 \le i \le p \qquad (1)$$

where $l/p$ is the length of each time interval (i.e. the lifespan of each chunk) and the timestamps $t.D + (i-1)\frac{l}{p}$, $2 \le i \le p$, are called splitting timestamps. As such, each trajectory is split into multiple sub-trajectories using the same $p-1$ splitting timestamps. Note that this strategy differs from that in [18], [19], [42], where each trajectory selects different splitting timestamps, while it is similar to that in [34]. Fig. 1 shows the splitting of the MOD into two chunks, each corresponding to the data of one day (see the mauve and green colored sub-trajectories, respectively).

The chunking process is applied incrementally whenever a batch of recordings from a moving object arrives. The algorithm tries to fit the batch into the existing chunks, taking into consideration the already created chunk borders. If the given trajectory does not fit in the existing temporal range, the set of chunks is extended suitably in order to fit the trajectory.

At the second level, each chunk produced by the first phase is subdivided into (possibly) overlapping equivalence classes. Specifically, we partition each chunk into smaller sub-chunks by grouping the sub-trajectories contained in the chunk by their lifespan and their starting and ending timepoints. Therefore, in this phase the temporal borders of each sub-chunk, which are defined not by the user but by the data, are the same or similar w.r.t. a user-defined temporal tolerance parameter tau. This parameter implies that two sub-trajectories are considered temporally similar if their starting (respectively, ending) timepoints differ by no more than tau/2.
Moreover, this parameter implies that two sub-trajectories cannot be considered spatio-temporally similar (as such, they will not be in the same cluster and their distance function will not be calculated) if the union of their non-common lifespans is bigger than tau (e.g. a trajectory of 20 minutes duration cannot be similar to a trajectory of 30 minutes duration when tau = 10 minutes). Obviously, setting tau = 0 minutes results in a large number of equivalence classes, as in real-world data it is rare for many trajectories to start their route at exactly the same time. In Fig. 1 the chunk corresponding to the first day (i.e. the mauve colored sub-trajectories) is subdivided into two sub-chunks, containing the sets of sub-trajectories $\langle T_1, T_2, T_3, T_4 \rangle$ and $\langle T_5, T_6 \rangle$, respectively.

Fig. 1: A MOD consisting of six trajectories
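The two roles of the tolerance tau can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: lifespans are modelled as (start, end) pairs in minutes, the helper names are hypothetical, and the strict inequalities follow the tests used in the insertion algorithm of Fig. 5 (under which the 20-minute vs. 30-minute example with tau = 10 is rejected).

```python
from typing import Tuple

Period = Tuple[float, float]  # (start, end) of a sub-trajectory lifespan


def temporally_similar(a: Period, b: Period, tau: float) -> bool:
    """Starting and ending timepoints each differ by less than tau/2."""
    return abs(a[0] - b[0]) < tau / 2 and abs(a[1] - b[1]) < tau / 2


def non_common_lifespan(a: Period, b: Period) -> float:
    """Total duration covered by exactly one of the two periods."""
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    return (a[1] - a[0]) + (b[1] - b[0]) - 2 * overlap


def may_be_compared(a: Period, b: Period, tau: float) -> bool:
    """Only pairs passing this test are worth a (costly) similarity computation."""
    return non_common_lifespan(a, b) < tau


# The example from the text: 20 vs. 30 minutes with tau = 10 minutes.
short_trip, long_trip = (0.0, 20.0), (0.0, 30.0)
rejected = not may_be_compared(short_trip, long_trip, tau=10.0)
```

The point of the cheap temporal test is pruning: pairs that fail it never reach the distance function.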

Table 1 summarizes the definitions of the symbols used in this report.

Table 1: Symbol table.
Symbol | Definition
$D$ | Given MOD, $D = \{T_1, T_2, \ldots, T_N\}$
$T_k$ | $k$-th trajectory of $D$
$L_k$ | Number of points of $T_k$
$p_{k,i}$ | $i$-th (3D) point of trajectory $T_k$, $p_{k,i} = (x_{k,i}, y_{k,i}, t_{k,i})$
$e_{k,i}$ | $i$-th (3D) line segment of trajectory $T_k$, $e_{k,i} = [p_{k,i}, p_{k,i+1}]$
$l_{k,i}$ | Lifespan of $e_{k,i}$, calculated as $l_{k,i} = t_{k,i+1} - t_{k,i}$
$LP_k$ | Number of sub-trajectories partitioning trajectory $T_k$
$P_k$ | Set of the sub-trajectories partitioning trajectory $T_k$
$P_{k,i}$ | $i$-th sub-trajectory of trajectory $T_k$
$V$ | Set of all voting descriptors in dataset $D$
$V_k$ | The voting descriptor of trajectory $T_k$
$VP_{k,i}$ | The voting descriptor of sub-trajectory $P_{k,i}$
$Nl_{k,i}$ | The descriptor of sub-trajectory $P_{k,i}$ w.r.t. the normalized lifespan of its line segments
$S$ | Sampling set of representatives, $S = \{R_1, \ldots, R_M\}$
$M$ | The cardinality of $S$, also the number of clusters in the resulting clustering
$SR(S)$ | Representativeness function of $S$
$V(P_{k,i}, P_{m,n})$ | Voting descriptor of $P_{k,i} \in D - S$ w.r.t. $P_{m,n} \in S$
$C$ | Clustering of sub-trajectories in $M$ clusters, $C = \{C_1, \ldots, C_M\}$
$Out$ | Set of sub-trajectories not belonging to $C$ (i.e., outliers)
$t.x$ | Minimum timestamp of object $x$
$T.x$ | Maximum timestamp of object $x$
$l$ | Lifespan of all $T_k$ in $D$
$p$ | Number of equi-time disjoint intervals (i.e. chunks)
$CK_i$ | $i$-th chunk of $D$, corresponding to period $p_i$, $1 \le i \le p$
$ST_{k,i}$ | Sub-trajectory of $T_k$ that belongs to $CK_i$
$S_n CK_i$ | $n$-th sub-chunk of the $i$-th chunk
tau | Temporal threshold (tolerance)
$S_n CK_i.per$ | Temporal period for $S_n CK_i$
$S_n CK_i.S$ | Representatives for $S_n CK_i$
$S_n CK_i.Out$ | Outliers for $S_n CK_i$

The ReTraTree data structure

The previous discussion regarding the hierarchical splitting of the time domain implicitly describes the first two levels of the ReTraTree. In detail, the root of the ReTraTree consists of entries corresponding to chunks sorted by time.
Note that for each chunk $CK_i$ there is no need to maintain the temporal periods in the index nodes, as these correspond to equal-length splitting intervals. Each entry $CK_i$ only maintains a pointer to the respective set of sub-chunks $S_n CK_i$, $n \ge 1$, forming the second level of the ReTraTree. The sub-chunks of a chunk are maintained as a sequence of triplets $\langle S_n CK_i.per, S_n CK_i.S, S_n CK_i.Out \rangle$, where $per$ is the temporal period of the sub-chunk, while $S$ and $Out$ are pointers to the sets of representative and outlier sub-trajectories of $S_n CK_i$, respectively. The sequence of triplets is ordered first by the starting timepoint and then by the ending timepoint of $per$. The entries of the set $S$ are pairs $\langle R_j, C_{R_j} \rangle$, each of which includes the representative sub-trajectory $R_j$ and a pointer $C_{R_j}$ to the subset of sub-trajectories that form a cluster around $R_j$. Similarly, $S$ is ordered by the time period of $R_j$. The set $Out$ contains the outlier sub-trajectories of the current sub-chunk. The sets $S$ and $Out$ (whose utility and role will be discussed in the subsequent section) form a third level of partitioning in the ReTraTree, while the actual data corresponding to all clusters $C_{R_j}$ form the fourth level of the structure. Let us refer to this subset of data as $D_{n,i}$. Note that all the sub-trajectories of all $C_{R_j}$ in a sub-chunk, namely $D_{n,i}$, are organized in a relation, whose column holding the sub-trajectories is indexed by a pg3D-Rtree, while the column holding the cluster identifier is indexed by a B+-tree. Of course, the relation further includes the identifiers of the trajectories, also indexed by a B+-tree. Obviously, these indices enable us to apply spatio-temporal queries to sub-chunk $D_{n,i}$ and facilitate direct access to the data of a specific cluster $C_{R_j}$.

Subsequently, we formalize the above discussion, while Fig. 2 and Fig. 3 depict the structure of the ReTraTree and its instantiation for the MOD of Fig. 1, respectively:

$root = \{\ldots, CK_i, \ldots\}, \quad 1 \le i \le p$
$CK_i = \{S_n CK_i, \ldots\}, \quad n \ge 1$
$S_n CK_i = \langle S_n CK_i.per,\; S_n CK_i.S,\; S_n CK_i.Out \rangle$
$S_n CK_i.S = \{\langle R_j, C_{R_j} \rangle\}, \quad j \ge 1$

Fig. 2: The structure of the ReTraTree
Fig. 3: A ReTraTree (omitting the fourth (data) level) built from the MOD of Fig. 1
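The four levels formalized above can be mirrored by a small in-memory Python sketch. This is an illustration only: the class and field names are hypothetical, the cluster members are kept in plain lists rather than in an indexed relation as in the report, and timestamps are assumed to be no earlier than the start of the MOD lifespan.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]        # (x, y, t)
SubTrajectory = List[Point]


@dataclass
class ClusterEntry:                        # one pair <R_j, C_Rj> of the set S
    representative: SubTrajectory
    members: List[SubTrajectory] = field(default_factory=list)


@dataclass
class SubChunk:                            # one triplet <per, S, Out>
    period: Tuple[float, float]
    representatives: List[ClusterEntry] = field(default_factory=list)
    outliers: List[SubTrajectory] = field(default_factory=list)


@dataclass
class Chunk:                               # second level: the sub-chunks of CK_i
    sub_chunks: List[SubChunk] = field(default_factory=list)

    def add_sub_chunk(self, sub_chunk: SubChunk) -> None:
        self.sub_chunks.append(sub_chunk)
        # Ordered by starting timepoint first, then by ending timepoint.
        self.sub_chunks.sort(key=lambda sc: sc.period)


@dataclass
class ReTraTree:                           # root: chunks sorted by time
    t_min: float                           # start of the MOD lifespan
    chunk_length: float                    # equal length of every chunk
    chunks: List[Chunk] = field(default_factory=list)

    def chunk_index(self, t: float) -> int:
        # Chunk periods are not stored; they follow from the equal-length split.
        return int((t - self.t_min) // self.chunk_length)

    def chunk_for(self, t: float) -> Chunk:
        index = self.chunk_index(t)
        while len(self.chunks) <= index:   # extend the root when data arrive later
            self.chunks.append(Chunk())
        return self.chunks[index]


tree = ReTraTree(t_min=0.0, chunk_length=86400.0)    # one chunk per day
day_two = tree.chunk_for(90000.0)
for period in [(100.0, 200.0), (50.0, 300.0), (50.0, 120.0)]:
    day_two.add_sub_chunk(SubChunk(period=period))
```

Computing the chunk index arithmetically reflects the remark that the root does not need to store chunk periods.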

4. RETRATREE ALGORITHMS

Below we provide a technical description of the algorithm that is presented abstractly in Fig. 2.

Algorithm S²T-Clustering
Input: MOD D = {T_1, T_2, ..., T_N}, w, ε
Output: Sampling set S, clusters C_i, i ∈ {1, ..., M}, outliers Out
1. V ← GVA(D, ε)
2. for each V_k ∈ V do
3.     P_k ← TSA(V_k, w)
4. S ← SSA(V, ∪_k P_k)
5. (C, Out) ← SCA(S, ∪_k P_k, ε)
6. return (S, C, Out)
Fig. 4: Algorithm for Sampling-based Sub-Trajectory Clustering

Incremental maintenance of ReTraTree

Recall that our goal is to incrementally maintain the ReTraTree whenever a batch of recordings of a moving object (i.e. a trajectory T_k) arrives. This methodology is described in Algorithm IS²T-Clustering (Fig. 5). In the previous discussion we have described how our method incrementally performs the first phase of partitioning in the time dimension (line 1). The update_root function returns the set of chunks CK and the respective set of sub-trajectories ST that correspond to the input trajectory T_k. Briefly, the rest of the methodology assigns each sub-trajectory to an appropriate sub-chunk (lines 4-6). If there is no matching sub-chunk w.r.t. time, a new sub-chunk is created, which is initialized with an empty representative set S and an outlier set Out that includes the unmatched sub-trajectory (lines 25-28). If there is an appropriate sub-chunk for the sub-trajectory under processing, the algorithm tries to assign it to an existing cluster (lines 8-13). If this attempt fails, the algorithm adds the sub-trajectory to the outlier set, which acts as a temporary relation upon which sampling-based sub-trajectory clustering (i.e. the S²T-Clustering algorithm) is applied whenever the size of the relation exceeds a user-defined threshold (e.g. > α MB). When this process takes place, a resulting new representative sub-trajectory extends the existing set of representatives only if it is different from them.
Subsequently, each of the resulting new outlier sub-trajectories is handled in one of two ways. If its size is smaller than w, which means that it cannot be clustered in a future clustering round, we delete it (i.e. store it in a permanent outliers relation). Otherwise, we re-drop the sub-trajectory from the top of the ReTraTree structure. This implies that we recursively apply the procedure for that sub-trajectory (until it is either clustered or partitioned into smaller pieces, due to successive applications of the S²T-Clustering algorithm) in order to search for other sub-chunks wherein it could be clustered, or to form a new sub-chunk (lines 15-28).
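The sub-chunk matching and cluster assignment steps described above can be sketched in Python as follows. This is a simplified illustration, not the report's implementation: the re-clustering of the outlier set and the recursive re-insertion are omitted, a sub-trajectory joins only the first cluster that accepts it, the voting function `vote` is a hypothetical caller-supplied similarity in [0, 1], and a vote of at least ε is assumed to mean "similar enough".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Period = Tuple[float, float]


@dataclass
class SubTraj:
    name: str
    period: Period                 # (start, end) lifespan


@dataclass
class Cluster:
    representative: SubTraj
    members: List[SubTraj] = field(default_factory=list)


@dataclass
class SubChunk:
    period: Period
    clusters: List[Cluster] = field(default_factory=list)
    outliers: List[SubTraj] = field(default_factory=list)


def non_common_lifespan(a: Period, b: Period) -> float:
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    return (a[1] - a[0]) + (b[1] - b[0]) - 2 * overlap


def insert(sub_chunks: List[SubChunk], st: SubTraj, tau: float, eps: float,
           vote: Callable[[SubTraj, SubTraj], float]) -> str:
    """Place st in a matching sub-chunk of one chunk and report what happened."""
    for sc in sub_chunks:
        starts_close = abs(st.period[0] - sc.period[0]) < tau / 2
        ends_close = abs(st.period[1] - sc.period[1]) < tau / 2
        if starts_close and ends_close:
            for cluster in sc.clusters:
                rep = cluster.representative
                # Cheap temporal test first, voting only for surviving pairs.
                if (non_common_lifespan(st.period, rep.period) < tau
                        and vote(st, rep) >= eps):
                    cluster.members.append(st)
                    return "clustered"
            sc.outliers.append(st)          # candidate for a later clustering round
            return "outlier"
    # No sub-chunk matches in time: open a new one holding st as its only outlier.
    sub_chunks.append(SubChunk(period=st.period, outliers=[st]))
    return "new sub-chunk"


def same_route(a: SubTraj, b: SubTraj) -> float:
    """Toy voting function: names starting with the same letter follow the same route."""
    return 1.0 if a.name[0] == b.name[0] else 0.0


chunk = [SubChunk(period=(0.0, 100.0),
                  clusters=[Cluster(SubTraj("a0", (0.0, 100.0)))])]
results = [insert(chunk, st, tau=10.0, eps=0.5, vote=same_route)
           for st in (SubTraj("a1", (1.0, 99.0)),
                      SubTraj("b1", (2.0, 98.0)),
                      SubTraj("c1", (500.0, 600.0)))]
```

In a fuller version, the "outlier" branch would also check the size of the outlier set and trigger the sampling-based clustering once the threshold is exceeded.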

Algorithm IS²T-Clustering
Input: ReTraTree root, trajectory T_k, tau, w, ε
Output: Updated ReTraTree
// PHASE 1: Chunking in the time domain
1. (CK, ST) ← update_root(root, T_k)
2. for each pair (CK_i, ST_k,i) ∈ (CK, ST) do
   // PHASE 2: Data-Driven Incremental Sub-Chunking and Clustering
3.     clustered ← false; match ← false
4.     SCK_i ← {S_nCK_i, |t.ST_k,i − t.S_nCK_i| < tau/2}
5.     for each S_nCK_i ∈ SCK_i do
6.         if (|T.ST_k,i − T.S_nCK_i| < tau/2) then
7.             for each R_j ∈ S_nCK_i.S do
8.                 if (non_common_lifespan(ST_k,i, R_j) < tau) then
9.                     if (V(ST_k,i, R_j) ≥ ε) then
10.                        C_Rj ← C_Rj ∪ ST_k,i
11.                        clustered ← true
12.            if (clustered = false) then
13.                S_nCK_i.Out ← S_nCK_i.Out ∪ ST_k,i
                   // PHASE 3: Sampling-based Sub-Trajectory Clustering
14.                if |S_nCK_i.Out| > α MB then
15.                    (S, C, Out) ← S²T-Clustering(S_nCK_i.Out, w, ε)
16.                    S_nCK_i.S ← S_nCK_i.S ∪ {S' ⊆ S, NOT ε-join(S_nCK_i.S, S)}
17.                    for each outlier O in Out do
18.                        if |O| < w then
19.                            delete O
20.                        else
21.                            IS²T-Clustering(root, O, tau, w, ε)
22.            match ← true
23.        if (match = true) then
24.            break
25.    if (match = false) then
26.        SCK_i ← SCK_i ∪ S_n+1CK_i   // i.e. create new sub-chunk
27.        S_n+1CK_i.S ← ∅
28.        S_n+1CK_i.Out ← S_n+1CK_i.Out ∪ ST_k,i
29. return
Fig. 5: Algorithm for inserting a trajectory in the ReTraTree structure

Query-based T-Clustering on ReTraTree

The above algorithm incrementally maintains the ReTraTree structure, which in its leaves includes already clustered sub-trajectories. However, given a temporal period, it is not enough to retrieve the clusters (i.e. the sub-trajectories following the representatives) that overlap this period, as the sub-trajectory clustering processes of overlapping sub-chunks may form clusters whose representatives (a) are almost identical (as such, a merge process should take place in order to report only one cluster as the union of the two clusters built around the two similar representatives), and/or (b) are such that one is the continuation of another (as such, an append process should take place to identify maximal clusters). In other words, what we require is a methodology that takes the ReTraTree structure as input and searches it, so as to identify maximal patterns w.r.t. the user requirements, while at the same time identifying places where internal re-organization could take place to improve the effectiveness and efficiency of the ReTraTree. Such a user requirement could be the discovery of all the valid clusters during a specific period of time (possibly the whole lifespan of the MOD, which also provides a solution to the whole- (vs. sub-) trajectory clustering problem). This is a reasonable requirement in the big mobility data setting that we envision, given that state-of-the-art clustering algorithms cannot be applied to currently available MOD sizes. Put differently, the proposed methodology implies an algorithm that acts as a query operator in a MOD engine, retrieves already clustered data according to user parameters, and performs the aforementioned merge and append refinements on the query results. To the best of our knowledge, such a query-based clustering approach is novel in the mobility data management and mining literature.

The following algorithm proposes such a solution on top of the ReTraTree. The user gives the period of interest as a parameter, and the algorithm traverses the tree and returns the clusters valid in this period. More specifically, the algorithm initially filters the chunks that overlap the given period and, for each of them, the corresponding valid sub-chunks (lines 1-3).
These sub-chunks are organized in a priority queue which, at this step (line 4), partitions the sub-chunks into equivalence classes according to whether the representatives that have been discovered inside these sub-chunks temporally overlap or not. To illustrate this, Fig. 7 shows only the representative sub-trajectories (not the outliers) of one chunk. Note that, for simplicity, the y-dimension has been omitted and the specific borders of sub-chunks are not depicted, while the representatives form two equivalence classes, i.e. the blue and the red one. Subsequently, the algorithm pops the equivalence classes one by one and sorts all representatives of a class w.r.t. the time dimension, similarly to sub-chunks, by interleaving the already sorted representatives of each sub-chunk (line 6). In Fig. 7, representatives coming from different sub-chunks are distinguished as dashed vs. continuous polylines. Then, the algorithm sweeps the temporally interleaved representatives along the time dimension (line 7) and, for each pair of overlapping sub-trajectories, it only checks whether the two representatives either have the same lifespan (line 8) or one ends when the next starts (line 11), in both cases w.r.t. the tau threshold.

In the first case, if the two representatives are similar (which means that they certainly come from different sub-chunks), then the first (in order) is annotated with a MERGE flag, hinting that a merge process should take place at this step. This implies that the ReTraTree should be appropriately updated (shrunk) to keep only one of them. Note that such a re-organization is not performed at query time; rather, the query results give the required hints so that it can be applied whenever applications allow it. Such a merging hint is depicted in Fig. 7 between sub-trajectories R_1 and R_2. Obviously, representatives like R_5 and R_6 will both be maintained in the final outcome, although they have similar lifespans.
In the second case, if the last point of the first representative is close, in terms of Euclidean distance (w.r.t. a distance threshold d), to the first point of the second representative, and a sufficient number of the same moving objects are represented by both representatives (w.r.t. a percentage threshold γ), the latter is appended to the first one (lines 12-13). This case is depicted in Fig. 7 between sub-trajectories R_3 and R_4. In any other case (line 15) the algorithm does nothing, meaning that it continues to the next pair and thus maintains both representatives in the sorted list. At the end of each sweep, the algorithm carries over to the next round only those representatives that end at most tau seconds before the border of the current chunk (e.g. R_7), as candidates for merging with subsequent representatives (lines 16-18). The rest of the representatives are part of the final outcome of the algorithm.
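The sweep over temporally sorted representatives can be sketched in Python as follows. The sketch rests on several assumptions that go beyond the text: only consecutive representatives are compared, as in Fig. 6; the share of common moving objects is measured against the smaller of the two id sets; `vote` is a hypothetical caller-supplied similarity function; and a vote of at least ε is taken to mean "similar".

```python
from dataclasses import dataclass
from math import dist
from typing import Callable, FrozenSet, List, Optional, Tuple


@dataclass
class Rep:
    name: str
    t_start: float
    t_end: float
    first_xy: Tuple[float, float]
    last_xy: Tuple[float, float]
    ids: FrozenSet[int]                 # moving objects represented by this cluster
    flag: Optional[str] = None          # set to "MERGE" or "APPEND" by the sweep


def non_common_lifespan(a: Rep, b: Rep) -> float:
    overlap = max(0.0, min(a.t_end, b.t_end) - max(a.t_start, b.t_start))
    return (a.t_end - a.t_start) + (b.t_end - b.t_start) - 2 * overlap


def common_ratio(a: Rep, b: Rep) -> float:
    """Share of common object ids, relative to the smaller id set (an assumption)."""
    return len(a.ids & b.ids) / min(len(a.ids), len(b.ids))


def sweep(reps: List[Rep], tau: float, eps: float, d: float, gamma: float,
          vote: Callable[[Rep, Rep], float]) -> List[Rep]:
    ordered = sorted(reps, key=lambda r: (r.t_start, r.t_end))
    for cur, nxt in zip(ordered, ordered[1:]):
        if non_common_lifespan(cur, nxt) < tau:
            if vote(cur, nxt) >= eps:               # same lifespan and similar
                cur.flag = "MERGE"
        elif abs(cur.t_end - nxt.t_start) < tau:    # one ends as the next starts
            if dist(cur.last_xy, nxt.first_xy) < d and common_ratio(cur, nxt) > gamma:
                cur.flag = "APPEND"
    return ordered


def toy_vote(a: Rep, b: Rep) -> float:
    return 1.0 if {a.name, b.name} == {"R1", "R2"} else 0.0


reps = [
    Rep("R3", 200.0, 300.0, (5.0, 5.0), (10.0, 10.0), frozenset({1, 2, 3})),
    Rep("R1", 0.0, 100.0, (0.0, 0.0), (4.0, 4.0), frozenset({7, 8})),
    Rep("R4", 301.0, 400.0, (10.5, 10.0), (20.0, 20.0), frozenset({2, 3, 4})),
    Rep("R2", 1.0, 101.0, (0.1, 0.0), (4.1, 4.0), frozenset({9})),
]
flags = [(r.name, r.flag) for r in sweep(reps, tau=5.0, eps=0.5, d=1.0, gamma=0.5, vote=toy_vote)]
```

As in the text, the flags are only hints: the sketch annotates representatives and leaves the actual merging or appending to a later re-organization step.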

Algorithm QuT-Clustering
Input: ReTraTree root, temporal period tp = [s, e), tau, d, γ
Output: Clusters C valid inside tp
1. CK ← {CK_i, overlap(root.CK_i, tp)}
2. for each CK_i ∈ CK do
3.     SCK_i ← {S_nCK_i, overlap(tp, CK_i.S_nCK_i.per)}
4.     TEQ_PQ ← bulk_push_2_teq(TEQ_PQ, SCK_i)
5.     while TEQ_PQ ≠ ∅ do
6.         S ← temporal_interleaving(TEQ_PQ.pop())
7.         for each R_j ∈ S do
8.             if (non_common_lifespan(R_j, R_j+1) < tau) then
9.                 if (V(R_j, R_j+1) ≥ ε) then
10.                    annotate R_j with MERGE flag
11.            else if (|T.R_j − t.R_j+1| < tau) then
12.                if (euclidean_dist(p(T.R_j), p(t.R_j+1)) < d) AND (common_ids(R_j, R_j+1) > γ) then
13.                    annotate R_j with APPEND flag
14.            else
15.                continue
16.        S' ← {R_j ∈ S, |T.R_j − T.CK_i| > tau}
17.        S ← S − S'
18.        C ← C ∪ S'
19. return C
Fig. 6: Algorithm for trajectory search in the ReTraTree structure

Fig. 7: Representatives of a chunk organized in a temporal equivalence class
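The partitioning of representatives into temporal equivalence classes (line 4 of Fig. 6) can be sketched in Python as follows. The report does not spell out how the classes are computed, so this sketch assumes that a class is a maximal group of periods connected through temporal overlap, and that periods which merely touch count as overlapping; the function name is hypothetical.

```python
from typing import List, Tuple

Period = Tuple[float, float]  # (start, end) of a representative's lifespan


def temporal_equivalence_classes(periods: List[Period]) -> List[List[Period]]:
    """Group periods into maximal chains of temporally overlapping periods."""
    classes: List[List[Period]] = []
    current: List[Period] = []
    current_end = 0.0
    for start, end in sorted(periods):
        if current and start > current_end:
            # A gap in time: nothing seen so far can overlap what follows.
            classes.append(current)
            current = []
        current.append((start, end))
        current_end = end if len(current) == 1 else max(current_end, end)
    if current:
        classes.append(current)
    return classes


groups = temporal_equivalence_classes(
    [(0.0, 10.0), (5.0, 20.0), (30.0, 40.0), (35.0, 50.0), (18.0, 22.0)]
)
```

Since representatives in different classes never overlap in time, each class can be swept independently for merge and append candidates.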

REFERENCES

[1] Almeida, V.T., Güting, R.H., and Behr, T. Querying moving objects in SECONDO. In Proceedings of MDM.
[2] Andrienko, G., Andrienko, N., Rinzivillo, S., Nanni, M., and Pedreschi, D. A visual analytics toolkit for cluster-based classification of mobility data. In Proceedings of SSTD.
[3] Andrienko, G., Andrienko, N., Rinzivillo, S., Nanni, M., Pedreschi, D., and Giannotti, F. Interactive visual clustering of large collections of trajectories. In Proceedings of VAST.
[4] Ankerst, M., Breunig, M.M., Kriegel, H.-P., and Sander, J. OPTICS: Ordering points to identify the clustering structure. In Proceedings of SIGMOD.
[5] Benkert, M., Gudmundsson, J., Hubner, F., and Wolle, T. Reporting flock patterns. In Proceedings of ESA.
[6] Brinkhoff, T. A framework for generating network-based moving objects. GeoInformatica, 6(2):153-180.
[7] Cadez, I.V., Gaffney, S., and Smyth, P. A general probabilistic framework for clustering individuals and objects. In Proceedings of KDD.
[8] Dessloch, S. and Mattos, N. Integrating SQL Databases with Content-Specific Search Engines. In Proceedings of VLDB.
[9] Düntgen, C., Behr, T., and Güting, R.H. BerlinMOD: a benchmark for moving object databases. VLDB Journal, 18(6).
[10] Frentzos, E., Gratsias, K., and Theodoridis, Y. Index-based most similar trajectory search. In Proceedings of ICDE.
[11] Frentzos, E., Gratsias, K., Pelekis, N., and Theodoridis, Y. Algorithms for nearest neighbor search on moving object trajectories. GeoInformatica, 11.
[12] Gaffney, S. and Smyth, P. Trajectory clustering with mixtures of regression models. In Proceedings of KDD.
[13] Giannotti, F. and Pedreschi, D. Mobility, Data Mining and Privacy, Geographic Knowledge Discovery. Springer-Verlag.
[14] Giannotti, F., Nanni, M., Pinelli, F., and Pedreschi, D. Trajectory pattern mining. In Proceedings of KDD.
[15] Gudmundsson, J., van Kreveld, M.J., and Speckmann, B. Efficient detection of patterns in 2D trajectories of moving points. GeoInformatica, 11(2).
[16] Gudmundsson, J., Loffler, M., Buchin, K., Buchin, M., and Luo, J. Detecting commuting patterns by clustering subtrajectories. In Proceedings of ISAAC.
[17] Guttman, A. R-Trees: A Dynamic Index Structure for Spatial Searching. In Proceedings of SIGMOD.
[18] Hadjieleftheriou, M., Kollios, G., Tsotras, V.J., and Gunopulos, D. Efficient Indexing of Spatiotemporal Objects. In Proceedings of EDBT.
[19] Hadjieleftheriou, M., Kollios, G., Gunopulos, D., and Tsotras, V.J. Indexing Spatio-Temporal Archives. VLDB Journal, 15(2).
[20] Hellerstein, J., Naughton, J., and Pfeffer, A. Generalized Search Trees for Database Systems. In Proceedings of VLDB.
[21] Informix Corp. Virtual Index Interface Guide.
[22] Jeung, H., Yiu, M.L., Zhou, X., Jensen, C., and Shen, H.T. Discovery of convoys in trajectory databases. In Proceedings of VLDB.
[23] Kalnis, P., Mamoulis, N., and Bakiras, S. On discovering moving clusters in spatio-temporal data. In Proceedings of SSTD.
[24] Kollios, G., Gunopulos, D., Koudas, N., and Berchtold, S. Efficient biased sampling for approximate clustering and outlier detection in large datasets. IEEE Transactions on Knowledge and Data Engineering, 15.
[25] Kornacker, M. High-Performance Extensible Indexing. In Proceedings of VLDB, pages 3-10.
[26] Laube, P., Imfeld, S., and Weibel, R. Discovering relative motion patterns in groups of moving point objects. International Journal of Geographical Information Science, 19(6).
[27] Lee, J.-G., Han, J., and Li, X. Trajectory outlier detection: A partition-and-detect framework. In Proceedings of ICDE.
[28] Lee, J.-G., Han, J., and Whang, K.-Y. Trajectory clustering: a partition-and-group framework. In Proceedings of SIGMOD.
[29] Lee, J.-G., Han, J., Li, X., and Gonzalez, H. TraClass: trajectory classification using hierarchical region-based and trajectory-based clustering. PVLDB.
[30] Li, Z., Ding, B., Han, J., and Kays, R. Swarm: Mining Relaxed Temporal Moving Object Clusters. In Proceedings of VLDB.
[31] Li, Y., Han, J., and Yang, J. Clustering moving objects. In Proceedings of KDD.
[32] Nanni, M. and Pedreschi, D. Time-focused clustering of trajectories of moving objects. Journal of Intelligent Information Systems, 27(3).
[33] Nanopoulos, A., Theodoridis, Y., and Manolopoulos, Y. Indexed-based density biased sampling for clustering applications. Data and Knowledge Engineering, 57(1).
[34] Ni, J. and Ravishankar, C.V. Indexing Spatio-Temporal Trajectories with Efficient Polynomial Approximations. IEEE Transactions on Knowledge and Data Engineering, 19(5).
[35] Oracle Corp. All Your Data: The Oracle Extensibility Architecture.
[36] Panagiotakis, C., Pelekis, N., Kopanakis, I., Ramasso, E., and Theodoridis, Y. Segmentation and sampling of moving object trajectories based on representativeness. IEEE Transactions on Knowledge and Data Engineering.
[37] Pelekis, N., Andrienko, G., Andrienko, N., Kopanakis, I., Marketos, G., and Theodoridis, Y. Visually Exploring Movement Data via Similarity-based Analysis. Journal of Intelligent Information Systems.
[38] Pelekis, N., Frentzos, E., Giatrakos, N., and Theodoridis, Y. HERMES: Aggregative LBS via a trajectory DB engine. In Proceedings of SIGMOD.
[39] Pelekis, N., Kopanakis, I., Kotsifakos, E., Frentzos, E., and Theodoridis, Y. Clustering uncertain trajectories. Knowledge and Information Systems, 28(1).
[40] Pelekis, N., Panagiotakis, C., Kopanakis, I., and Theodoridis, Y. Unsupervised trajectory sampling. In Proceedings of ECML-PKDD.
[41] Pfoser, D., Jensen, C.S., and Theodoridis, Y. Novel approaches to the indexing of moving object trajectories. In Proceedings of VLDB.
[42] Rasetic, S., Sander, J., Elding, J., and Nascimento, M.A. A Trajectory Splitting Model for Efficient Spatio-Temporal Indexing. In Proceedings of VLDB.
[43] Shim, K., Guha, S., and Rastogi, R. CURE: An efficient clustering algorithm for large databases. In Proceedings of SIGMOD.
[44] Theodoridis, Y., Vazirgiannis, M., and Sellis, T. Spatio-Temporal Indexing for Large Multimedia Applications. In Proceedings of ICMS.
[45] The R-Tree website. [Online].
[46] Vodas, M. Hermes - Building an Efficient Moving Object Database Engine. MSc Thesis, University of Piraeus.
[47] Zhang, T., Ramakrishnan, R., and Livny, M. BIRCH: An efficient data clustering method for very large databases. In Proceedings of SIGMOD.


Clustering Algorithms In Data Mining

Clustering Algorithms In Data Mining 2017 5th International Conference on Computer, Automation and Power Electronics (CAPE 2017) Clustering Algorithms In Data Mining Xiaosong Chen 1, a 1 Deparment of Computer Science, University of Vermont,

More information

Fast Discovery of Sequential Patterns Using Materialized Data Mining Views

Fast Discovery of Sequential Patterns Using Materialized Data Mining Views Fast Discovery of Sequential Patterns Using Materialized Data Mining Views Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo

More information