An Efficient Clustering Algorithm for Moving Object Trajectories


3rd International Conference on Computational Techniques and Artificial Intelligence (ICCTAI'2014), Feb. 11-12, 2014, Singapore

Hnin Su Khaing and Thandar Thein

Hnin Su Khaing is with the University of Computer Studies, Mandalay, Myanmar (hninsukhaing@gmail.com). Thandar Thein is with the University of Computer Studies, Yangon, Myanmar (thandartheinn@gmail.com).

Abstract: With the increasing and continuous diffusion of low-cost GPS devices, analyzing moving object trajectory data has become a challenge. Analyzing moving object trajectories requires a mechanism for clustering moving objects effectively. Trajectory clustering has long been an important research direction in movement mining, but it remains unclear which of the existing algorithms is most effective. In this paper, we propose a clustering algorithm based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN). DBSCAN cannot cluster data sets with large differences in density well. We address this problem with a proposed clustering algorithm that enhances DBSCAN by reducing its time consumption. Finally, we evaluate the proposed trajectory clustering algorithm on a real trajectory dataset by comparing it with DBSCAN. Evaluation results show that the proposed clustering algorithm provides better performance and lower error than DBSCAN.

Keywords: DBSCAN, MoveMine, Moving Object Trajectory, Trajectory Clustering.

I. INTRODUCTION

With the widespread use of location-aware devices such as mobile phones and GPS-enabled devices, huge amounts of moving object data have been collected. This has led to a growing research area concerned with the automatic analysis of animal behavior and traffic management using computer vision techniques. Many researchers have paid considerable attention to trajectory data modeling, indexing and query processing for trajectories, and have proposed new models specifically dedicated to moving objects and their trajectories.

Based on the above motivation, the MoveMine system was designed for the discovery of various kinds of movement patterns and knowledge in numerous applications such as traffic control, climatological forecasting and animal movement analysis. For instance, animal migration shows that the movement of creatures is temporally and spatially correlated. In biological domains, many researchers have discovered that some wild animals form large social groups when migration occurs. The study of animals' social behavior and wildlife migration is more concerned with the movement patterns of a group of animals than with those of each individual. The MoveMine system integrates two functions: moving object pattern mining and trajectory mining.

Trajectory data associated with moving objects is one of the fields in which data volume has increased considerably. This growth makes it a challenge to find moving animals that belong to the same group. Trajectory clustering is part of trajectory mining, and there exist many algorithms that use data mining techniques. In general, many data mining methods have been developed for analyzing moving animals, and they differ in the nature of the method. In particular, the data analysis task of clustering is to find objects that have moved in a similar way. DBSCAN is one of the algorithms for clustering trajectory data. It can find a number of clusters starting from the estimated density distribution of the corresponding nodes, but it cannot cluster well in the presence of very large densities.

The goal of this work is to propose an efficient clustering algorithm that solves this problem of DBSCAN for moving object trajectories. The algorithm is composed of three phases: partitioning, clustering and grouping. In the partitioning phase, we divide the trajectory data into k partitions.
Then, we develop the clustering phase using DBSCAN, and finally we group the separated clusters. The rest of this paper is organized as follows: Section II presents the related work and Section III describes background theory for trajectory clustering. In Section IV, the proposed clustering algorithm is discussed, and the evaluation is conducted in Section V. Finally, the conclusion is given in Section VI.

II. RELATED WORK

Trajectory clustering plays a major role in moving object trajectory mining. There are many studies of trajectory data in areas such as transportation management and behavioral analysis. The author of [3] studied the similarity between trajectory sets of moving objects. He designed a similarity metric between trajectory sets, where each set is generated by a moving object, and, based on this measure, proposed a clustering algorithm to cluster trajectory sets. To demonstrate the effectiveness and efficiency of his algorithm, he conducted intensive experiments using mobile phone data. To reduce the need to estimate complex parameters, as well as the complexity and computational cost for the human analyst, a vector field k-means clustering technique was proposed in [8]. It brought together ideas from visualization [2], data clustering and scalar field design to find a locally optimal clustering, and demonstrated how it can find global patterns and handle partial trajectories.

An extended k-means technique for clustering moving objects was proposed in [9]. The authors use direction as a heuristic to determine the number of clusters. They use the silhouette coefficient as a quality measure for their approach, and they showed its performance and accuracy on both real and synthetic datasets. The authors of [1] presented a density-based k-nearest neighbors clustering algorithm for trajectory data which resolves the problem of sensitive user-defined parameters in DBSCAN. This clustering method has three main features: it discovers clusters of arbitrary shape; it has a strong ability to handle noise; and its input parameter is easy to set, with a recommended value that is more accurate than others. They conducted extensive experiments on two real datasets of moving vehicles in Milan (Italy) and Athens (Greece).

To predict the locations of moving objects, periodic trajectories were clustered using a compact representation of spatio-temporal trajectories in [10]. The authors suggested an algorithm that uses cluster centroids to predict future locations, ran it on real-world data, and evaluated the precision and recall of the results. A new partition-and-group framework for trajectory clustering (TRACLUS) was proposed in [4]. In this algorithm, a trajectory is partitioned into a set of line segments, and similar line segments are then grouped together into a cluster. For the partitioning algorithm, they used the minimum description length (MDL) principle. They demonstrated that TRACLUS correctly discovers common sub-trajectories from real trajectory data.

In this paper, a new clustering algorithm is proposed, and we show how the algorithm is more efficient and effective than others by comparison on a real-world trajectory dataset.

III. PRELIMINARY CONCEPTS

Despite the growing demand for diverse applications, few scalable tools have been available for mining massive and sophisticated moving object data. The MoveMine system has two categories based on the nature of the methods: pattern mining and trajectory mining [1].

A. Pattern Mining

The first category is moving object pattern mining, which emphasizes the analysis of discrete locations with temporal information [11]. It includes the swarm pattern, periodic pattern and follower pattern shown in Fig. 1.

Fig. 1 Pattern Mining

B. Trajectory Mining

Trajectory mining, shown in Fig. 2, focuses more on the mining of trajectories associated with geometric shapes, such as clustering and finding outliers from hurricane paths across years [11]. Trajectory clustering is the process of grouping a set of physical or abstract objects into classes of similar objects by applying clustering algorithms such as k-means or k-nearest neighbors, depending on the trajectory dataset. A trajectory outlier is an object that is different from or inconsistent with the remaining set of data. It can be detected by outlier algorithms such as distribution-based, distance-based, density-based and deviation-based methods [6]. Trajectory classification is the construction of a model for predicting the class labels of moving objects based on their trajectories and other features.

Fig. 2 Trajectory Mining

C. Clustering Techniques

Clustering is a dynamic field of research in data mining and an unsupervised learning process, because there are no class labels to help. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. A cluster of data objects can be treated collectively as one group and so may be considered a form of data compression. In general, the major clustering methods can be classified into the following categories.
A partitioning method first creates an initial set of k partitions, where the parameter k is the number of partitions to construct. It then uses an iterative relocation technique that attempts to improve the partitioning by moving objects from one group to another. Typical partitioning methods include k-means, k-medoids and CLARANS. A hierarchical method creates a hierarchical decomposition of the given set of data objects. The method can be classified as either agglomerative (bottom-up) or divisive (top-down), based on how the hierarchical decomposition is formed. A density-based method clusters objects based on the notion of density. It either grows clusters according to the density of neighborhood objects (as in DBSCAN) or according to some density function (as in DENCLUE). A grid-based method first quantizes the object space into a finite number of cells that form a grid structure, and then performs clustering on the grid structure. A model-based method hypothesizes a model for each of the clusters and finds the best fit of the data to that model [5].

D. Distance Measure

In our analysis scenario, we evaluate the distance between the latitude and longitude of points using the Euclidean distance [5], shown in equation (1), where X1 = (x11, x12, ..., x1n) and X2 = (x21, x22, ..., x2n).

$$dist(X_1, X_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2} \qquad (1)$$

IV. PROPOSED EFFICIENT CLUSTERING ALGORITHM

The proposed trajectory clustering algorithm consists of three phases: partitioning, clustering and grouping. First, we perform the partitioning phase by decomposing the trajectory data into k partitions. Second, we apply the clustering phase to each partition. In the grouping phase, we re-form the separated clusters. The architecture of the proposed clustering algorithm is shown in Fig. 3.

Fig. 3 System Flow for Proposed Clustering Algorithm

A. Partitioning

First, we perform the partitioning phase on the trajectory dataset in order to improve the efficiency of our algorithm. To reduce the computation time of DBSCAN [7], which spends much of its time on the similarity measure, we enhance it by dividing the data into k partitions. This algorithm mainly targets huge amounts of data, and it requires a parameter k for the number of partitions.

B. Clustering

After partitioning the trajectory data, we apply the clustering algorithm. Having obtained k partitions from the previous step, we apply DBSCAN to cluster each partition. It needs two parameters: epsilon (eps), the distance within which we form a cluster, and the minimum number of points (MinPts) in each cluster. This phase starts with an arbitrary point that has not been visited and computes the similarity using the Euclidean distance in (1) to find the neighbor points (NeighborPts) within eps. If the number of neighbors is less than MinPts, we eliminate the point as noise. To expand the cluster, we find the points that lie at a similar distance from the points in NeighborPts. All points found in the eps-neighborhood are added to the cluster (C). This process continues until the connected cluster is completely found.

Algorithm: Efficient Clustering Algorithm
Input: number of partitions K, epsilon eps, minimum points MinPts, threshold t, trajectory dataset D
Output: set of trajectory clusters

    Set C to be the empty set;
    Partition(D, K);
    Grouping(t);

    /* PARTITIONING PHASE */
    Partition(D, K)
        for each (k ∈ K)                        // partition the data into k parts
            Clustering(D_k, eps, MinPts, t);

    /* CLUSTERING PHASE */
    Clustering(D, eps, MinPts, t)
        for each (P ∈ D) do                     // P is randomly selected
            mark P as visited;
            NeighborPts = regionquery(P, eps)   // find the neighbor points using the distance function
            if (sizeof(NeighborPts) < MinPts) then
                mark P as Noise;
            else
                C++;
                expandcluster(P, NeighborPts, C, eps, MinPts)

    function expandcluster(P, NeighborPts, C, eps, MinPts)
        add P to C;
        for each (p ∈ NeighborPts) do
            if (p is not visited) then
                mark p as visited;
                NeighborPts' = regionquery(p, eps)   // find the neighbor points using the distance function
                if (sizeof(NeighborPts') >= MinPts) then
                    NeighborPts = NeighborPts joined with NeighborPts'   // join the neighbor points
            if (p is not yet a member of any cluster) then
                add p to C
        return;

    function regionquery(P, eps)
        calculate the Euclidean distance from P to every point
        return all points within P's eps-neighborhood

    /* GROUPING PHASE */
    Grouping(t)
        for each (c ∈ C)
            mean(c)                             // calculate the mean value of each cluster
            diff = difference between the mean value of c and that of the previous c
            if (diff < t) then
                c = join the two clusters;      // join the two clusters
        return joined clusters

Fig. 4 Proposed Clustering Algorithm
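The three phases listed in Fig. 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (which the paper says was written in Java): the paper does not say how the data is split into k partitions, so the sketch slices the points in sorted coordinate order; it reads the "mean value" of a cluster as its centroid; and it reads the threshold test as the Euclidean distance between consecutive centroids. The function names `dbscan`, `group` and `efficient_clustering` are hypothetical.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def euclidean(p, q):
    """Equation (1): Euclidean distance between two coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def region_query(points, i, eps):
    """Indices of every point within eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if euclidean(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Clustering phase: plain DBSCAN on one partition; noise is dropped."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:  # corresponds to expandcluster in Fig. 4
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)
    clusters = [[] for _ in range(cluster_id + 1)]
    for p, label in zip(points, labels):
        if label != NOISE:
            clusters[label].append(p)
    return clusters


def centroid(cluster):
    """Mean point of a cluster (assumed meaning of 'mean value')."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def group(clusters, t):
    """Grouping phase: merge a cluster into the previous one when their
    centroids are closer than t (assumed reading of 'diff < t')."""
    merged = []
    for c in clusters:
        if merged and euclidean(centroid(merged[-1]), centroid(c)) < t:
            merged[-1] = merged[-1] + c
        else:
            merged.append(list(c))
    return merged


def efficient_clustering(points, k, eps, min_pts, t):
    """Partition the data, cluster each partition, then group the results."""
    ordered = sorted(points)  # assumption: partition along sorted order
    size = max(1, math.ceil(len(ordered) / k))
    partitions = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    clusters = []
    for part in partitions:
        clusters.extend(dbscan(part, eps, min_pts))
    return group(clusters, t)


# A 4x2 block of points that the partitioning cuts in half, plus a distant block.
data = [(x, y) for x in range(4) for y in range(2)]
data += [(20, 20), (20, 21), (21, 20), (21, 21)]
result = efficient_clustering(data, k=3, eps=1.5, min_pts=3, t=3.0)
print([len(c) for c in result])  # prints [8, 4]
```

In the usage example the first block is split across two partitions and clustered twice; the grouping step rejoins the two halves because their centroids are 2.0 apart, which is below the threshold t = 3.0. With a smaller threshold the halves would stay separate, which is the "spread cluster" situation the grouping phase is meant to repair.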

C. Grouping

Now, we present the grouping of the resulting clusters. To improve the effectiveness of the clustering algorithm, we group the clusters in each partition. This phase is necessary so that clusters spread out by the partitioning are not left outside the dense region they belong to. Because the partitioning phase spreads clusters, uncertain clusters are produced. In this phase, we calculate the mean value of each cluster and compare each cluster with the others. We need to define a threshold (t) for grouping two or more clusters. If the difference is less than the threshold, we group the clusters. We verify the effectiveness of our algorithm by measuring the Sum Squared Error (SSE).

V. EXPERIMENTAL EVALUATION

In this section, we evaluate the proposed clustering algorithm on a trajectory data set and compare it with DBSCAN. We also describe the data set used in the experiments and discuss the experimental results.

A. Experimental Study

An animal trajectory dataset is used to assess the effectiveness of the proposed clustering algorithm. It was generated by the Starkey project. This data set contains the radio-telemetry locations (with other information) of elk, deer, and cattle from the years 1993 through 1996. We use elk movements in 1993, deer movements in 1995 and cattle movements in [...]. Elk has 33 trajectories and [...] points; deer has 32 trajectories and 265 points; cattle has 41 trajectories and [...] points. The coordinates of the points are defined in Universal Transverse Mercator (UTM), and the data has 2 fields such as UTMGrid, UTMGridEast and UTMGridNorth [13]. We extract the x (UTMGridEast) and y (UTMGridNorth) coordinates from the telemetry data for our experiments. We evaluate the proposed clustering algorithm by comparing it with DBSCAN on the trajectory data.

B. Performance Metrics

We show the computation time on varying data sizes of animal trajectories by comparing DBSCAN and the proposed clustering algorithm. In our study, we find that changing the data size affects the number of clusters. We also measure the clustering quality by employing the Sum Squared Error (SSE). To measure the clustering quality independently of the features used for clustering and the number of clusters produced, our analysis uses the SSE in (2).

$$SSE = \sum_{i=1}^{num_{clus}} \left( \frac{1}{2|C_i|} \sum_{x \in C_i} \sum_{y \in C_i} dist(x, y)^2 \right) \qquad (2)$$

We conduct the experiments on a Core i7 machine with 8 GB of main memory, running Windows 7. We implement our algorithm in JDK 1.7 on Eclipse Juno.

C. Result Discussion

The experiment studies the effect of changing the trajectory data size on clustering computation time for both DBSCAN and the proposed algorithm. In this experiment, we find that our clustering algorithm performs well on large datasets. The experiment shows that, as the data size increases, our algorithm requires less computation time, while the DBSCAN algorithm takes more time to cluster all objects. Fig. 5 shows that the performance gain is more significant on large datasets. Although the data size changes, the running time of our algorithm changes only slightly.

Fig. 5 Performance Comparison of DBSCAN and Proposed Algorithm (x-axis: Animal Trajectory Data Size; y-axis: Time (millisec))

Fig. 6 shows the SSE values of the proposed algorithm. We find that the SSE value of our algorithm differs drastically from that of DBSCAN. The error of the proposed algorithm is less than that of DBSCAN, which means that its SSE values are small. A small SSE indicates that our algorithm clusters the data correctly.

Fig. 6 Sum Squared Error of DBSCAN vs. Proposed Algorithm (x-axis: Animal Trajectory Data Size; y-axis: SSE)

We also study how changing the data size affects the number of clusters. A small number of clusters means an increase in cluster size. Our algorithm clusters well regardless of changes in data size. We find that DBSCAN depends on the data size because of cluster expansion, so it takes more computation time and produces a larger number of clusters. We address these problems with an efficient clustering algorithm for large trajectory datasets.
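The SSE measure of equation (2) can be sketched in Python as follows. This is a minimal illustration that assumes each cluster is a list of coordinate tuples and that dist is the Euclidean distance of equation (1); the function name `sse` is hypothetical.

```python
import math


def sse(clusters):
    """Equation (2): for each cluster C_i, add the sum of squared pairwise
    distances over all ordered pairs (x, y), divided by 2 * |C_i|."""
    total = 0.0
    for cluster in clusters:
        if not cluster:
            continue  # an empty cluster contributes nothing
        pairwise = sum(math.dist(x, y) ** 2 for x in cluster for y in cluster)
        total += pairwise / (2 * len(cluster))
    return total


print(sse([[(0, 0), (2, 0)], [(5, 5)]]))  # prints 2.0
```

With the Euclidean distance, dividing the pairwise sum by 2|C_i| makes each cluster's term equal to the sum of squared distances from its points to the cluster centroid. In the example, the two points of the first cluster each lie at distance 1 from the centroid (1, 0), giving 1 + 1 = 2, and the single-point cluster adds 0.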

Fig. 7 Accuracy of DBSCAN vs. Proposed Algorithm (x-axis: Data Size; y-axis: No. of Clusters)

D. Effect of Parameter Values

We study the effect of changing the value of the parameter eps on the clustering result. If we use a smaller eps, we discover a larger number of clusters. But if the value of eps is less than 3, we find that only [...] cluster is discovered in the [...] algorithm. We have tested the effects of varying parameter values for both algorithms. To study the effect of the epsilon value on the number of clusters, we conduct the experiment with various epsilon values. According to the experimental results shown in Fig. 8, when the epsilon value is less than 45, the number of clusters is smaller. When the epsilon value is between 45 and 55, the optimal number of clusters is achieved. When the epsilon value is greater than 55, the number of clusters decreases. Fig. 8 shows the clustering results for parameter values between 35 and 125.

Fig. 8 Effect of eps values on number of clusters (x-axis: Epsilon (eps); y-axis: No. of Clusters)

VI. CONCLUSION

In this paper, we propose an efficient clustering algorithm for trajectory data. It is composed of three phases: partitioning, clustering and grouping. The DBSCAN clustering algorithm cannot cluster well in very large densities, and its distance calculation is time consuming. To overcome the time consumption issue, we first partition the dataset and then cluster the trajectories by applying the DBSCAN algorithm in each partition. Finally, we perform the grouping phase to integrate the spread clusters. To evaluate the effectiveness of the proposed algorithm, we conducted a performance evaluation and analyzed the results by comparing the proposed algorithm and DBSCAN.

REFERENCES

[1] A. K. Akasapu, P. S. Rao, L. K. Sharma and S. K. Satpathy, "Density Based k-Nearest Neighbors Clustering Algorithm for Trajectory Data", International Journal of Advanced Science and Technology, Vol. 31, June 2011.
[2] G. McArdle, A. Tahir and M. Bertolotto, "Spatio-Temporal Clustering of Movement Data: An Application to Trajectories Generated by Human-Computer Interaction", ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume I-2, 2012, XXII ISPRS Congress, 25 August - 1 September 2012, Melbourne, Australia.
[3] J. Dai, "A Novel Moving Object Trajectories Clustering Approach for Very Large Datasets", in: Proceedings of the 2nd International Conference on Computer Science and Electronic Engineering (ICCSEE 2013).
[4] J. G. Lee, J. Han and K.-Y. Whang, "Trajectory Clustering: A Partition-and-Group Framework", in SIGMOD '07: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. New York, NY, USA: ACM, 2007.
[5] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", 2nd edition, Morgan Kaufmann, p. 348 and 398, 2006.
[6] J. G. Lee, J. Han and X. Li, "Trajectory Outlier Detection: A Partition-and-Detect Framework", Data Engineering 2008 (ICDE 2008), IEEE International Conference, April 7-12, 2008.
[7] M. Ester, H.-P. Kriegel, J. Sander and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases", in: Proceedings of the 1996 International Conference on Knowledge Discovery and Data Mining (KDD '96), Portland, OR, Aug. 1996.
[8] N. Ferreira, J. Klosowski, C. E. Scheidegger and C. T. Silva, "Vector Field k-Means: Clustering Trajectories by Fitting Multiple Vector Fields", Eurographics Conference on Visualization (EuroVis) 2013, Volume 32 (2013), Number 3.
[9] O. Omnia, H. M. O. Mokhtar and M. E. El-Sharkawi, "An extended k-means technique for clustering moving objects", Egyptian Informatics Journal, Cairo University, March 2011, Volume 12, Issue 1.
[10] S. Elnekave, M. Last and O. Maimon, "Predicting Future Locations Using Clusters' Centroids", in: Proceedings of the 15th Annual ACM International Symposium on Advances in Geographic Information Systems (ACMGIS '07), November 7-9, 2007, Seattle, WA, USA.
[11] Z. Li, M. Ji, J. G. Lee, L. A. Tang, Y. Yu, J. Han and R. Kays, "MoveMine: Mining Moving Object Databases", in: Proceedings of SIGMOD '10, ACM SIGMOD International Conference on Management of Data, June 6-11, 2010, Indianapolis, Indiana, USA.
[12] Z. Li, J. Han, M. Ji, L. Tang, Y. Yu and B. Ding, "MoveMine: Mining Moving Object Data for Discovery of Animal Movement Patterns", ACM Transactions on Intelligent Systems and Technology (TIST), Volume 2, Issue 4, July 2011, Article 37, ACM, New York, NY, USA.
[13]
