A Density-Based Clustering Structure Mining Algorithm for Data Streams


Huan Wang 1, Yanwei Yu 2, Qin Wang 3, Yadong Wan 4,*
School of Computer and Communication Engineering, University of Science and Technology Beijing, 30 Xueyuan Road, Haidian District, Beijing, China.
1 huanwng@gmail.com, 2 yuyanwei0530@gmail.com, 3 wangqin@ies.ustb.edu.cn, 4,* yadong.wan@gmail.com

ABSTRACT
Advances in hardware and storage technology have created a demand for automatic data mining on data streams. Cluster analysis is an important tool for data stream mining. Although existing density-based clustering algorithms for data streams can discover clusters of arbitrary shape, their effectiveness depends on parameter settings. Moreover, the global parameters used in these algorithms limit their ability to discover overlapping clusters. In this paper, we propose OPCluStream, a novel density-based clustering structure mining algorithm for data streams. It adaptively discovers clusters of arbitrary shape as well as overlapping clusters. While satisfying the one-pass constraint, OPCluStream indexes points with a tree topology in which each point links to related points through directed pointers. This tree topology records the relationships among points, represents the clustering results for a broad range of parameter settings, and can be transformed into a clustering structure from which clusters are extracted. The clustering structure is equivalent to the index structure and is convenient to use. In addition, OPCluStream clusters efficiently because it indexes points with the tree topology and restricts computation to a limited area when new points are added to the data stream. A number of experiments on synthetic and real data sets illustrate the effectiveness, efficiency and insights provided by our method.

Categories and Subject Descriptors
H.3.3 [Information Systems]: Information Storage and Retrieval: Information Search and Retrieval, Clustering.

General Terms
Algorithms, Performance, Design, Experimentation, Theory.

Keywords
Density-based Clustering, Data Stream, Clustering Structure, Tree Topology.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. BigMine'12, August 12, 2012, Beijing, China. Copyright 2012 ACM /12/08... $

1. INTRODUCTION
In recent years, an enormous volume of streaming data has been generated by real-time applications such as intelligent transportation systems, traffic management, person tracking, logistics management systems and network flow management [1]. Discovering knowledge automatically and efficiently from data streams is important. Cluster analysis is a primary tool for data stream mining; it can be used either as a stand-alone tool for observation or as a preprocessing step for further analysis [2]. Clustering data streams has drawn considerable attention in the last few years. Discovering clusters of arbitrary shape from data streams in real time is a difficult and important problem in these application scenarios. Dependence on parameters also limits the effectiveness of mining algorithms.

Identifying clusters of arbitrary shape has always been a major problem in clustering. Density-based clustering algorithms such as DBSCAN [3], GDBSCAN [4] and DENCLUE [5] can discover clusters of arbitrary shape in databases, but they cannot be applied to data stream clustering. For data stream processing, Martin Ester et al. [6] propose Incremental DBSCAN, which inserts and deletes points incrementally on the basis of DBSCAN. They prove that any change occurs only in the neighborhood of the new points. It works efficiently in data warehousing.
It can find clusters of arbitrary shape but depends on parameter settings.

There are many other clustering algorithms for data streams. Aggarwal et al. [7] propose CluStream, which separates the clustering process into two components: an online micro-clustering component and an offline macro-clustering component. The online micro-clustering component maintains appropriate summary statistics of the data stream, and the offline macro-clustering component uses these summary statistics together with other user inputs to produce clusters. CluStream also uses a pyramidal time frame, which is a trade-off between the storage requirement and the ability to recall summaries from different time horizons. Many algorithms have been developed on the basis of the two-component structure of CluStream, such as ACluStream [8], DenStream [9], SDStream [10] and C-DenStream [11]. Like K-means [12], CluStream performs well on convex clusters but is ineffective on non-convex clusters. DenStream defines core-micro-cluster and outlier micro-cluster structures to maintain and distinguish clusters and outliers. It can recognize clusters of arbitrary shape but has a high time complexity because of the large number of time vector calculations. D-Stream [13] is a density-based two-phase clustering algorithm for data streams with two components: an online component and an offline component. The online component maps points into grids, and the offline component computes the grid densities and clusters the grids based on density. Grids are divided into three kinds: dense grids, sparse grids and transitional grids. When new points arrive, the related grids are updated, and they are augmented during clustering. D-Stream is efficient because it works on grids, but it loses some information about the data set. Tu et al. [14] propose the concept of grid attraction to address the information loss problem in D-Stream, but computing grid attraction is expensive and therefore increases the time complexity. OLDStream [21], designed on the basis of DBSCAN, uses a grid technique to optimize the query procedure and gain efficiency, but its effectiveness depends on its parameters.

Many adaptive clustering algorithms have been proposed. Kranen et al. [15] propose an algorithm that adaptively changes its processing speed according to the arrival speed of the data stream. It uses ClusTree, an index structure based on micro-clusters that they propose in the same paper. Xu et al. [16] propose an evolutionary framework, AFFECT, which adaptively obtains optimal smoothing parameter settings for static algorithms using a shrinkage method. The points at each time step generate a matrix of proximities, which has two states: the current state and the past state. These states are estimated using shrinkage estimators, and the results can then be used to optimize the static parameters. They propose a new method that estimates the weight of past proximities adaptively as it varies over time.

For static clustering, there is much research on parameter optimization for DBSCAN [17, 18, 19]. OPTICS was proposed to solve the parameter dependence problem of density-based clustering algorithms on static databases [20]. It introduces two concepts, core distance and reachability distance, to organize points.
During clustering, the organized points are added to a list ordered by reachability distance and spatial position. This list is called the clustering structure. A single clustering structure covers a broad range of parameter settings and can be used to retrieve many different cluster divisions according to ε_sub, a new parameter that works like ε in DBSCAN. OPTICS solves the parameter dependence problem and can find clusters of arbitrary shape and overlapping clusters, but it cannot be used on data streams.

Many algorithms have been proposed to discover clusters of arbitrary shape or overlapping clusters, but few of them solve both problems on data streams. CluStream can be used on data streams, but it is ineffective on non-convex clusters. Density-based algorithms are sensitive to arbitrarily shaped clusters, but many of them either cannot be used on data streams or have other problems in data stream mining. DBSCAN and related static algorithms cannot be used on data streams. DenStream is inefficient. D-Stream is efficient but ineffective. Parameter-adaptive methods for static density-based algorithms cannot be used on data streams, and solutions designed for data streams, such as AFFECT, cannot solve these two problems at the same time.

In this paper, we use a tree topology to describe the structure of the points in a data set and maintain this topology as new points arrive. This topology can find arbitrarily shaped clusters and overlapping clusters. It also covers a broad range of parameter settings by itself and thus solves the parameter dependence problem of density-based clustering algorithms. Every point in the data stream is directed to at most one point around it, so the points form a digraph. Generated under certain conditions, this digraph becomes a tree. The tree consists of all points in the data set and can be transformed into a clustering structure like the one in OPTICS, which can then be used in a similar way as in OPTICS.
The related definitions are presented in Section 2, where we briefly introduce the definitions from DBSCAN and OPTICS that are used in OPCluStream. In Section 3, we describe the basic definitions of OPCluStream, especially the tree topology concept, and present the algorithm. Experiments on effectiveness, efficiency and parameter sensitivity are reported in Section 4. Section 5 concludes the paper with a summary.

2. Density-based Clustering

2.1 Related Definitions
The formal definitions of the density notions are introduced in the following on the basis of the definitions in DBSCAN and OPTICS. There are two input parameters, ε (Epsilon Neighborhood) and MinPts (Minimum Number of Points). For a detailed presentation see DBSCAN [3] and OPTICS [20]. Let D be the current set of data points.

Definition 1. (ε-neighborhood of a point) The ε-neighborhood of a point p, denoted by N_ε(p), is defined by
$$N_\varepsilon(p) = \{\, q \in D \mid \mathrm{dist}(p, q) \le \varepsilon \,\}.$$

According to Definition 1, the points in D can be divided into three groups: core points, border points and noise.

Core point: A point p is a core point if the cardinality of its ε-neighborhood is not less than MinPts.
Border point: A point p is a border point if the cardinality of its ε-neighborhood is less than MinPts and at least one point in its ε-neighborhood is a core point.
Noise: If the cardinality of a point's ε-neighborhood is less than MinPts and its ε-neighborhood contains no core point, the point is regarded as noise.
Directly density-reachable: A point p is directly density-reachable from a point q if p is in the ε-neighborhood of q and q is a core point. Because p may not be a core point, q may not be directly density-reachable from p, so directly density-reachable is an asymmetric relation.
Density-reachable: A point p is density-reachable from a point q if there is a chain of points p1, p2, ..., pn such that p_i is directly density-reachable from p_{i+1} (i = 1, 2, ..., n-1), with p = p1 and q = pn.
Density-connected: Two points p and q are density-connected if both are density-reachable from a core point o. The density-connected relation is symmetric.

Definition 2. (Core Distance) For a point p, MinPts-distance(p) denotes the distance between p and its MinPts-th nearest neighbor. The core-distance of p, denoted by core-distance(p), is defined as
$$\text{core-distance}(p) = \begin{cases} \text{MinPts-distance}(p), & |N_\varepsilon(p)| \ge \mathit{MinPts}; \\ \text{UNDEFINED}, & \text{otherwise.} \end{cases}$$
The core-distance is only available for core points; it is the smallest value of ε for which the point satisfies the core point definition. For a non-core point, the core-distance is UNDEFINED.

Definition 3. (Reachability Distance) For a point p and any o ∈ N_ε(p), the reachability-distance of p with regard to the point o, denoted by reachability-distance(p, o), is defined as:

$$\text{reachability-distance}(p, o) = \begin{cases} \text{UNDEFINED}, & \text{if } |N_\varepsilon(o)| < \mathit{MinPts}; \\ \max(\mathrm{dist}(o, p), \text{core-distance}(o)), & \text{otherwise.} \end{cases}$$
If the reachability-distance of p is not UNDEFINED, we say that p gets its reachability-distance from o. Here, the point o is restricted to the ε-neighborhood of p. Only reachability-distances smaller than ε are worth distinguishing; points whose value exceeds ε are processed in the same way as noise. In addition, with this re-definition, the points that yield a value other than UNDEFINED and therefore need to be calculated are limited to the ε-neighborhood of the point, a much smaller area than the whole data set used in the OPTICS definition.

2.2 Order Points Clustering
A short example of the order points clustering approach is presented in Figure 1.

Figure 1. An example of ordering points

In the example of Figure 1, MinPts is assumed to be 3. Point o is a core point because it has five ε-neighbors. Its core-distance is C(o), the distance between o and its third nearest point p2. Any point nearer to o than p2 has its reachability-distance set to the core-distance of o, and any point within ε that is farther from o than p2 has its reachability-distance w.r.t. o set to its real distance to o. So points p1 and p2 get the same value. Points p3 and p4 are farther from o than p2, so their reachability-distances are their real distances to o. The points are now ordered from top to bottom as o, p1, p2, p3, p4, and all of them are added to a list in this order. Each point in this list is then processed one by one like o. Furthermore, the list must remain ordered by increasing reachability-distance when a new point is added.

3. OPCluStream: Order Points to Clustering Data Stream
In this section, we present OPCluStream (Order Points to CLUstering data STREAM), which is designed to discover clustering structures from data streams using the order points clustering method. A data stream is a partly visible and rapidly growing data set. OPCluStream maintains a tree topology of the received points and updates the tree whenever a new point arrives; the tree can be quickly transformed into an ordered point list representing the clustering structure when the user requests it.

3.1 Tree Topology
Definition 4. (Directly Core-Reachable) For any p, o ∈ D, p is directly core-reachable from o if, for every o' ∈ N_ε(p), the following condition holds:
$$\text{reachability-distance}(p, o) \le \text{reachability-distance}(p, o').$$
Points from which p is directly core-reachable must be in its ε-neighborhood.

Definition 5. (Core-Reachable) A point p is core-reachable from a point q if there is a chain of points p1, p2, ..., pn such that p_i is directly core-reachable from p_{i+1} (i = 1, 2, ..., n-1), with p = p1 and q = pn.

Father: A point p is the father of a point q if q is directly core-reachable from p and q gets its reachability-distance with respect to p. Since every point has only one reachability-distance, obtained from a point from which it is directly core-reachable, any point has at most one father.
Ancestor: A point p is an ancestor of a point q if there is a chain of points p1, p2, ..., pn such that p_i is the father of p_{i+1} (i = 1, 2, ..., n-1), with p = p1 and q = pn.

Father relates to directly core-reachable as ancestor relates to core-reachable: because every point has only one reachability-distance, the father is the one point actually used among all points that satisfy the directly core-reachable condition, and an ancestor satisfies the core-reachable condition because it is an extension of father.

Figure 2 shows a set of points D: a, b, c, ..., k, l, p, q. The points p and q are located at the center of the other points, which are arranged in a square shape.

Figure 2. A set of points D

Suppose MinPts = 2 and ε is large enough for the ε-neighborhood of each point in D to contain all points. According to the directly core-reachable relation, if a point m is directly core-reachable from another point n, we draw an arrow pointing from m to n.
We thus obtain a digraph topology, as shown in Figure 3. This digraph topology describes the hierarchical structure of the points. Points connected by arrows constitute groups; points in the same group are geographic neighbors and lie in regions of similar density. The arrows represent the expanding directions of a group. Points that have no arrow pointing to another point are border points of a group.

Figure 3. A digraph topology of D

To obtain the clustering structure from the digraph topology, we propose a tree topology. However, the digraph of a point set may contain isolated rings; in Figure 3, for example, the points a, b, c, ..., k, l form a ring, and so do p and q. Separate rings would be processed as separate trees, that is, as separate clusters. But a ring may merely be a result of direct core-reachability, and the rings may be high-density sub-clusters. To avoid the effect of input parameters,

we should try to avoid forming rings and organize all points into a single tree topology. Here we give a ring check and breaking method.

Ring check and breaking method: When a point p finds a core point o from which it is directly core-reachable, we check whether the ring condition is satisfied. If not, the core point o is set as p's father; otherwise, a breaking procedure is performed, as shown in Figure 4.

    RingCheckAndSetValue(p, o)
        t ← o;
        while (t.father <> p ∧ t.father <> null)
            t ← t.father;
        if (t <> null)
            if (p.dist(o.father) < o.dist(o.father))
                p.father ← o.father;
                o.father ← p;

Figure 4. Code of RingCheckAndSetValue

Following the ring check and breaking method, the tree topology of D in Figure 5 is obtained.

Figure 5. A tree topology of D

3.2 New Points Effect
When a new point arrives and is added to the data set, it affects the clustering structure only in a limited area. Thus, OPCluStream only needs to process the new point and its neighborhood: it re-assigns core-distances and reachability-distances and updates the tree topology when a new point is added to the current data set. When a new point p is received, the affected points can be divided into three groups:

1. New core points: points that change from border point or noise to core point. The set of new core points in the ε-neighborhood of p, denoted by NewCore(p), is defined as
$$\mathit{NewCore}(p) = \{\, q \in N_\varepsilon(p) \mid |N_\varepsilon(q)| < \mathit{MinPts} \text{ before } p \text{ is added and } |N_\varepsilon(q)| \ge \mathit{MinPts} \text{ after } p \text{ is added} \,\}.$$
If the new point p is a core point, p is an element of NewCore(p).

2. Changed core points: existing core points whose core-distance changes. The set of core points in the ε-neighborhood of p that change their core-distance, denoted by ChangeCoreDist(p), is defined as
$$\mathit{ChangeCoreDist}(p) = \{\, q \in N_\varepsilon(p) \mid |N_\varepsilon(q)| \ge \mathit{MinPts} \text{ and } \mathrm{dist}(q, p) \le \text{core-distance}(q) \,\}.$$

3. Reachability-distance changed points: points that change their reachability-distance. If p is a core point, the set of such points in the ε-neighborhood of p, denoted by ChangeReaDist(p), is defined as
$$\mathit{ChangeReaDist}(p) = \{\, q \in N_\varepsilon(p) \mid \text{reachability-distance}(q, p) < \text{reachability-distance}(q, o) \,\},$$
where o is the previous father of q.

Because order points clustering sorts the points by reachability-distance, the points in the first two groups do not change their own order, but the change in their core-distances may spread to the points in their ε-neighborhoods. The points in ChangeReaDist(p) change their order directly. Since the points in NewCore(p) and ChangeCoreDist(p) pass their effect on to their neighbors, the processing of NewCore(p) and ChangeCoreDist(p) is expanded to their neighborhoods. If p is a core point, ChangeReaDist(p) is processed within the expansion of NewCore(p). So we can perform the expansion directly after finding NewCore(p) and ChangeCoreDist(p).

Figure 6. New point added (p: new point)

Figure 6 shows an example of the effect of a new point, where p is the newly received point. Dotted circles represent core-distances, solid circles represent ε, and MinPts is 3. Because of the arrival of p, p and q1 become new core points, and o2 changes its core-distance. As a result of the spreading property of the first two kinds of changes, q2 changes its reachability-distance and o1 gets its new reachability-distance from q1.

3.3 Algorithm Procedure
When a new point p is received, p and the affected points are updated in the tree topology.

    insertionpt(p)
        SetOfPoints ← p;                      // add p to the current point set
        list ← NewCore(p);
        list ← list ∪ ChangeCoreDist(p);
        // nearest core point
        corePt ← getCorePoint(p);
        setReachabilityDistance(p, corePt);
        p.father ← corePt;
        foreach (q in list)
            foreach (d in N(q))
                expandupdated(q, d);

    expandupdated(corePt, d)
        j ← corePt.core-distance;
        dist ← dist(corePt, d);
        t ← (dist < j) ? j : dist;
        if (t < d.reachability-distance)
            if (d is not ancestor of corePt)
                d.reachability-distance ← t;
                d.father ← corePt;
            else if (reachability-distance(corePt, d.father) < d.reachability-distance)
                corePt.father ← d.father;
                d.father ← corePt;

Figure 7. Code of the insertion and expansion process

First, the new point p is processed: we find its father and then obtain its reachability-distance. If there is no core point from which p is directly core-reachable, the reachability-distance of p is UNDEFINED. Second, we find NewCore(p) and ChangeCoreDist(p); every point in the ε-neighborhood of these points is updated if it satisfies the reachability-distance change condition.
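The affected sets NewCore(p) and ChangeCoreDist(p) of Section 3.2 can be illustrated with a minimal Python sketch. It is not the authors' C# implementation: it assumes 2-D points stored as tuples, a brute-force neighbor search, and that a point counts as its own ε-neighbor (at distance 0). The names neighborhood, core_distance and affected_sets are hypothetical.

```python
import math

def neighborhood(points, p, eps):
    """Indices q with dist(points[p], points[q]) <= eps (p itself included)."""
    return [q for q in range(len(points))
            if math.dist(points[p], points[q]) <= eps]

def core_distance(points, p, eps, min_pts):
    """Distance to the MinPts-th nearest point in the eps-neighborhood,
    or None (standing in for UNDEFINED) when p is not a core point."""
    dists = sorted(math.dist(points[p], points[q])
                   for q in neighborhood(points, p, eps))
    return dists[min_pts - 1] if len(dists) >= min_pts else None

def affected_sets(points, new_point, eps, min_pts):
    """Return (NewCore, ChangeCoreDist) as sets of indices into points + [new_point]."""
    old = list(points)
    new = old + [new_point]
    p = len(old)                      # index of the newly arrived point
    new_core, change_core_dist = set(), set()
    for q in neighborhood(new, p, eps):
        # Core-distance before the arrival; the new point had none.
        before = core_distance(old, q, eps, min_pts) if q != p else None
        after = core_distance(new, q, eps, min_pts)
        if before is None and after is not None:
            new_core.add(q)           # border point or noise turned into a core point
        elif before is not None and math.dist(new[q], new_point) <= before:
            change_core_dist.add(q)   # existing core point affected by p
    return new_core, change_core_dist
```

Only the ε-neighborhood of the new point is inspected, which mirrors the limited computing area described above; a real implementation would replace the brute-force search with an index.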

If a point d in the ε-neighborhood of a point in NewCore(p) or ChangeCoreDist(p) changes its reachability-distance with respect to corePt, we check whether d is an ancestor of corePt. The detailed code is given in Figure 7.

3.4 Identifying Clustering Structure
The tree topology maintained by OPCluStream can be transformed into a clustering structure by the following method (see Figure 8). When a point is to be added to the clustering structure, we first check whether its father has already been added. If so, we add the point; otherwise, we add its father first. The father is added by the same procedure, recursively, until a point whose father is null is reached. A point whose father is null is added directly at the end of the clustering structure. savelist is the resulting clustering structure. The transform function traverses all points in SetOfPoints so that every point is added only after its father, if it exists, is already in savelist. When a point p with father o is added to savelist, we scan from the first point after its father and insert p at the first position whose point has a reachability-distance equal to or larger than that of p.

    savelist.get(SetOfPoints)
        foreach (dp in SetOfPoints)
            if (dp.processed is false)
                transform(dp);

    transform(dp)
        if (dp.father is null ∨ savelist.contains(dp.father))
            savelist.add(dp.father, dp);
        else
            transform(dp.father);
            savelist.add(dp.father, dp);

    savelist.add(corePt, p)
        if (corePt = null)
            savelist.addAtEnd(p);
        else
            i ← order(corePt) + 1;            // first position after the father
            while (i < savelist.count)
                if (savelist[i].reachability-distance ≥ p.reachability-distance)
                    savelist.insertAt(i, p); break;
                i ← i + 1;
            if (i == savelist.count)
                savelist.addAtEnd(p);

Figure 8. Code of savelist processing

order(p) is the index of point p in savelist.
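The transformation described above can be sketched in Python as follows. This is an illustrative reading of Figure 8 under stated assumptions, not the authors' code: the tree is given as a dictionary `father` mapping each point to its father (or None for a root), `reach` maps each point to its reachability-distance, and UNDEFINED is represented by `float("inf")`. The function name to_clustering_structure is hypothetical.

```python
def to_clustering_structure(father, reach):
    """Flatten a father-pointer forest into an ordered list (the savelist).

    father: dict point -> father point, or None for a root
    reach:  dict point -> reachability-distance (float('inf') if UNDEFINED)
    """
    savelist = []
    placed = set()

    def add(p):
        f = father[p]
        if f is None:
            savelist.append(p)            # roots go to the end of the list
            return
        i = savelist.index(f) + 1         # first position after the father
        # Skip points whose reachability-distance is smaller than that of p.
        while i < len(savelist) and reach[savelist[i]] < reach[p]:
            i += 1
        savelist.insert(i, p)

    def transform(p):
        if p in placed:
            return
        f = father[p]
        if f is not None and f not in placed:
            transform(f)                  # make sure the father is placed first
        add(p)
        placed.add(p)

    for p in father:
        transform(p)
    return savelist
```

The recursion follows father pointers only, so its depth is bounded by the height of the tree; for very deep trees an explicit stack would be a safer choice.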
3.5 Clusters
Based on the maintained tree topology, we define clusters as follows.

Definition 6. (Atom Cluster) A set of points A is an atom cluster if all of its points form a tree topology such that ∀p ∈ A, ∀q ∈ p.sons in the tree, q is core-reachable from p.

In an atom cluster, every point is an ancestor of each of its sons in the tree topology, and the root point is an ancestor of every point in the cluster except itself. The atom cluster is the basic unit of clusters.

Definition 7. (Cluster) A set of points C is a cluster if it satisfies the following conditions:
1. All points in C are in a tree topology.
2. For the new tree topology C' in which every atom cluster in C is represented by its root point's father, ∀p ∈ C', ∀q ∈ p.sons in the tree, q is core-reachable from p.

A cluster is thus composed of atom clusters and points. If all atom clusters are represented by their root points' fathers, the cluster can itself be seen as an atom cluster. Any set of points can be seen as a set of clusters, that is, as a forest or a tree in the topology.

4. Experiments
We conduct comprehensive performance experiments on real and synthetic data to evaluate the effectiveness and efficiency of OPCluStream in comparison with OPTICS and OLDStream. The OPTICS-Grid variant tested in the experiments is a version of OPTICS whose region queries are optimized with the grid technique proposed in OLDStream [21]. OPCluStream also uses this grid technique for querying neighbors. All algorithms are implemented in C# using Visual Studio. All experiments are conducted on a system with an Intel i5 2.4 GHz CPU and 3 GB of memory running Windows 7.

Data1 is a subset of SEQUOIA 2000 and contains 473 points located on squares. It contains several clusters of arbitrary shape and overlapping clusters.
Data2 is a synthetic database generated with Matlab and contains 4413 points, including 300 noise points, located on square. It contains convex and non-convex clusters.
Data3 is a subset of the real trajectory database of the GeoLife project run by Microsoft Research Asia [22]. This database collects 164,789 location points of 178 users from May 7, 2007 to May 16, ...
Data4 is a synthetic database containing 112,200 points located on a 5,000 × 5,000 square, generated by Thomas Brinkhoff's spatial data generator [23]. It has convex and non-convex clusters.

4.1 Effectiveness test
An effectiveness test of discovering clustering structures is performed on data1 and data2 for OPCluStream in comparison with OPTICS. The two algorithms use the same input parameters on the same database: ε = 5, MinPts = 2 on data1 and ε = 30, MinPts = 5 on data2. The clustering structures discovered by OPCluStream on data1 and data2 are shown in Figure 9(a) and Figure 9(c) respectively, and those discovered by OPTICS on data1 and data2 are shown in Figure 9(b) and Figure 9(d) respectively. ε_sub is the parameter used on the clustering structure to retrieve different cluster divisions. The results demonstrate that OPCluStream finds clustering structures similar to those of OPTICS. Although the isolated parts (shown in different colors in Figure 9) appear in a different order for the same data set, each part of one clustering structure has a counterpart with a similar shape in the other. Generally speaking, the two structures are the same. The parts of the clustering structures are isolated logically and topologically, and they appear in different orders because OPTICS traverses a static data set in random order according to its spatial information, whereas OPCluStream traverses the points in the order of the data stream.

An experiment on the effect of discovering clusters is conducted for OPCluStream and OPTICS on data1. Figures 10(a), 10(b) and 10(c) show the clusters extracted from the clustering structure discovered by our algorithm with different ε_sub. Figures 10(d), 10(e) and 10(f) show the clusters extracted from the clustering structure discovered by OPTICS with the corresponding ε_sub. The results show that OPCluStream can discover clusters of arbitrary shape and identify overlapping clusters.
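The paper does not spell out how cluster divisions are retrieved from the clustering structure for a given ε_sub. One plausible reading, sketched below, follows the DBSCAN-style extraction used with OPTICS orderings: a point whose reachability-distance exceeds ε_sub either starts a new cluster (if its core-distance is at most ε_sub) or is noise, and every other point joins the current cluster. The tuple layout and the name extract_clusters are assumptions made for this illustration.

```python
def extract_clusters(ordered, eps_sub):
    """Assign cluster labels along an ordered clustering structure.

    ordered: list of (point_id, reachability_distance, core_distance),
             where None stands for UNDEFINED
    returns: dict point_id -> cluster label, with -1 meaning noise
    """
    labels = {}
    current = -1                      # label of the cluster being extended
    next_id = 0
    for pid, reach, core in ordered:
        if reach is None or reach > eps_sub:
            # Not reachable at this threshold: new cluster or noise.
            if core is not None and core <= eps_sub:
                current = next_id
                next_id += 1
                labels[pid] = current
            else:
                labels[pid] = -1
        else:
            labels[pid] = current     # reachable: stays in the current cluster
    return labels
```

Calling the function with several values of ε_sub on the same ordered list yields different divisions without re-running the clustering, which is the property the experiments in Figure 10 rely on.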

Figure 9. Comparison of clustering structures between OPTICS and OPCluStream

Figure 10. Clusters with different ε_sub

4.2 Efficiency test
Since data1 is small and similar to data2, we use data2, data3 and data4 to test the clustering efficiency of OPCluStream in comparison with OPTICS, OPTICS with Grid and OLDStream. In the experiments, we set ε = 5, MinPts = 2 on data2; ε = 5, MinPts = 2 on data3; and ε = 100, MinPts = 2 on data4. The running times on the different data sets are shown in Figure 11.

On data2, OPTICS with Grid, OLDStream and OPCluStream produce similar curves, and their run times are roughly equal at every recorded point. The run time of OPTICS increases rapidly from 1,500 points onward and becomes roughly 3 times as large as that of the other three algorithms (see Figure 11(a)).

On data3 and data4, the run time of OPTICS increases sharply at about 5,000 points and then becomes much larger than those of the other three algorithms. OLDStream and OPTICS with Grid show similar trends, although the curve of OPTICS with Grid is not as smooth as that of OLDStream. The run time of OPCluStream increases linearly, and OPCluStream maintains a speed of 10,000 points per second when processing data sets beyond 50,000 points. At the record of 160,000 points on data3, the run time of OPCluStream is one-fourteenth of that of OLDStream and one-seventeenth of that of OPTICS with Grid (see Figure 11(b)).

Figure 11. Efficiency of algorithms

The curve of OPTICS with Grid is not smooth because of the grid optimization: in some cases, the grid has little effect because the points are dense around the target point. Even in the worst case, the grid version works better than the original algorithm. OPTICS needs to maintain a huge ordered list of points to be processed, which is extremely inefficient, especially on large data sets. OPCluStream instead uses links to connect points rather than maintaining a real list, and it deals with only a limited number of points around the target point each time. That is part of the reason why OPCluStream does better than OPTICS.
The difference in speed becomes increasingly obvious as the number of points rises. OLDStream produces a similar curve when dealing with large data sets of more than 100,000 points. The curve of OPCluStream is linear, and the speed of OPCluStream stays at 10,000 points per second on each data set.
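The grid technique for neighbor queries comes from OLDStream [21] and is not detailed in this paper. The following Python sketch shows one common way such a grid can work, assuming 2-D points and square cells whose side equals ε, so that all ε-neighbors of a point lie in its own cell or the eight surrounding cells. The class name GridIndex and its methods are hypothetical.

```python
import math
from collections import defaultdict

class GridIndex:
    """Uniform grid over 2-D points with cell side eps (an assumed layout)."""

    def __init__(self, eps):
        self.eps = eps
        self.cells = defaultdict(list)    # (cx, cy) -> points in that cell

    def _cell(self, point):
        return (math.floor(point[0] / self.eps),
                math.floor(point[1] / self.eps))

    def insert(self, point):
        self.cells[self._cell(point)].append(point)

    def query(self, point):
        """Return all stored points within distance eps of point."""
        cx, cy = self._cell(point)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get avoids creating empty cells while querying
                for q in self.cells.get((cx + dx, cy + dy), ()):
                    if math.dist(point, q) <= self.eps:
                        found.append(q)
        return found
```

With this layout a query inspects at most nine cells regardless of the data set size, but when many points fall into the same cells the distance filter still touches all of them, which is consistent with the weaker grid effect reported for dense regions.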

7 4.3 Sensitivity Analysis There are two parameters needed for OPCluStream and a parameter for clustering structure. Since different retrieving parameters of clustering structure don t change clustering structure itself, which is equivalent to the result of OPCluStream, we test about two parameters effects on OPCluStream. Figure 12(a) shows the running time with varying when MinPts is set to 5. And figure 12(b) shows the running time of our algorithm with varying MinPts when is set to 10. The running time is recorded when number of received points are 5k, 10k, 15k, 20k, 25k. Figure 12 Running time of algorithm w.r.t. parameters 7. Reference: [1] Sudipto Guha, Adam Meyerson, Nina Mishra et al Clustering ata Streams: Theory and Practice. IEEE Transaction on Knowledge and ata Engineering, 15(3): [2] Han Jiawei, Kamber M ata Mining Concepts and Techniques. Beijing: China Machine Press, [3] Martin Ester, Hans-Peter Kriegel, Jorg Sander et al A ensity-based Algorithm for iscovering Clusters in Large Spatial atabases with Noise. In: Proceedings of 2nd International Conference on Knowledge iscovery and ata Mining, Portland, USA: AAAI Press, [4] Jorg Sander, Martin Ester, Hans-Peter Kriegel, Xiaowei Xu ensity-based Clustering in Spatial atabases: The Algorithm GBSCAN and its Applications. ata Mining and Knowledge iscovery, 2(2): [5] Alexander Hinneburg, aniel A. Keim An Efficient Approach to Clustering in Large Multimedia atabases with Noise. In: Proceedings of 4th International Conference on Knowledge iscovery and ata Mining, New York, USA: AAAI Press, [6] Martin Ester, Hans-Peter Kriegel, Jorg Sander, Michael Wimmer, Xiaowei Xu Incremental Clustering for Mining in a ata Warehousing Environment. In: Proceedings of the 24th Very Large atabases Conference, Ner York, USA: Citeseer, With increasing, run time was rising with stable speed when was smaller than 15 and with rigid speed when was greater than 15. 
As we use grid technique to querying region, while increasing, areas needed to be processed are expanding and points needed to be updated are rising, too. (see figure 12(a)). Effects of clustering structure are not only directly related to value of. in large value doesn t improve clustering structure very much though it carries out much more run time. When is fixed, changing MinPts will affect core-distance and reachability-distance of points. And with MinPts increasing, more time is needed to calculate and update their information. As showed on figure 12(b), changing of MinPts doesn t bring serious run time differences. Run times are increasing slowly and stably with MinPts rising. 5. Conclusion In this paper, we propose a tree topology, which describes the density structure of data streams. In tree topology, every point directs to its father, if exists. For noises, there is no father. We maintain this tree while points arriving on a data streams and transform it to clustering structure whenever needed. Clusters divisions could be retrieved through clustering structure. We give methods how to generate and transform. Experiments on effectiveness show this clustering structure could find overlapping clusters and contain information with broad range of parameters. It has linear time complexity and show high efficiency, especially on large data sets, more than 100,000 points. 6. Acknowledgement This work is supported by National Natural Science Foundation of P. R. China ( , ), octoral Fund of Ministry of Education ( ) and National 863 Plan Project of China (2011AA040101). [7] C.C.Aggarwal, J.Han, J.Wang and P.S.Yu A framework for clustering evolving data streams. In: Proceedings of the 29th Very Large atabases Conference, Berlin, USA: VLB Endowment, [8] Zhu Wei-heng, Yin Jian, Xie Yi-huang Arbitrary Shape Cluster Algorithm for Clustering ata Stream. 


More information

A Real Time GIS Approximation Approach for Multiphase Spatial Query Processing Using Hierarchical-Partitioned-Indexing Technique

A Real Time GIS Approximation Approach for Multiphase Spatial Query Processing Using Hierarchical-Partitioned-Indexing Technique International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 6 ISSN : 2456-3307 A Real Time GIS Approximation Approach for Multiphase

More information

Voronoi-based Trajectory Search Algorithm for Multi-locations in Road Networks

Voronoi-based Trajectory Search Algorithm for Multi-locations in Road Networks Journal of Computational Information Systems 11: 10 (2015) 3459 3467 Available at http://www.jofcis.com Voronoi-based Trajectory Search Algorithm for Multi-locations in Road Networks Yu CHEN, Jian XU,

More information

manufacturing process.

manufacturing process. Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2014, 6, 203-207 203 Open Access Identifying Method for Key Quality Characteristics in Series-Parallel

More information

A Robust Image Zero-Watermarking Algorithm Based on DWT and PCA

A Robust Image Zero-Watermarking Algorithm Based on DWT and PCA A Robust Image Zero-Watermarking Algorithm Based on DWT and PCA Xiaoxu Leng, Jun Xiao, and Ying Wang Graduate University of Chinese Academy of Sciences, 100049 Beijing, China lengxiaoxu@163.com, {xiaojun,ywang}@gucas.ac.cn

More information

Multimedia Big Data Frame Combination Storage Strategy Based on Virtual Space Distortion

Multimedia Big Data Frame Combination Storage Strategy Based on Virtual Space Distortion Multimedia Big Data Frame Combination Storage Strategy Based on Virtual Space https://doi.org/0.399/ijoe.v3i02.66 Jian Luo Zhejiang Technical Institute of Economics, Hangzhou Zhejiang, China jiansluo@yahoo.com

More information

Social Network Recommendation Algorithm based on ICIP

Social Network Recommendation Algorithm based on ICIP Social Network Recommendation Algorithm based on ICIP 1 School of Computer Science and Technology, Changchun University of Science and Technology E-mail: bilin7080@163.com Xiaoqiang Di 2 School of Computer

More information

An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data

An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data Xiaorong Yang 1,2, Wensheng Wang 1,2, Qingtian Zeng 3, and Nengfu Xie 1,2 1 Agriculture Information Institute,

More information

EFFICIENT ATTRIBUTE REDUCTION ALGORITHM

EFFICIENT ATTRIBUTE REDUCTION ALGORITHM EFFICIENT ATTRIBUTE REDUCTION ALGORITHM Zhongzhi Shi, Shaohui Liu, Zheng Zheng Institute Of Computing Technology,Chinese Academy of Sciences, Beijing, China Abstract: Key words: Efficiency of algorithms

More information

Top-k Keyword Search Over Graphs Based On Backward Search

Top-k Keyword Search Over Graphs Based On Backward Search Top-k Keyword Search Over Graphs Based On Backward Search Jia-Hui Zeng, Jiu-Ming Huang, Shu-Qiang Yang 1College of Computer National University of Defense Technology, Changsha, China 2College of Computer

More information

Multiple Query Optimization for Density-Based Clustering Queries over Streaming Windows

Multiple Query Optimization for Density-Based Clustering Queries over Streaming Windows Worcester Polytechnic Institute DigitalCommons@WPI Computer Science Faculty Publications Department of Computer Science 4-1-2009 Multiple Query Optimization for Density-Based Clustering Queries over Streaming

More information