An Effective Clustering Mechanism for Uncertain Data Mining Using Centroid Boundary in UKmeans


2016 International Computer Symposium

An Effective Clustering Mechanism for Uncertain Data Mining Using Centroid Boundary in UKmeans

Kuan-Teng Liao, Chuan-Ming Liu
Computer Science and Information Engineering, National Taipei University of Technology, Taipei, TAIWAN

Abstract - Object errors affect the time cost and the effectiveness of uncertain data clustering. To decrease the time cost and increase the effectiveness, we propose two mechanisms for the centroid-based clustering algorithm UKmeans. The first mechanism is an improved similarity. Similarity is an intuitive factor that directly affects both the time cost and the effectiveness: similarity calculations based on integration favor the effectiveness of clustering but ignore the time cost, whereas simplified similarity calculations address the time cost but ignore the effectiveness. To consider both, we use a simplified similarity to reduce the time cost and add two further factors, the intersection and the density of clusters, to increase the effectiveness. The former increases the degree of belongingness of an object when a cluster overlaps it; the latter prevents objects from being attracted by clusters with large errors. The second mechanism is a new definition of the centroid boundary. In clustering, the position of a cluster centroid lies within a range averaged from the errors of its member objects, and a large range lowers the effectiveness of clustering. To shrink this range, we propose a square-root boundary that limits the upper bound of the possible centroid positions. The experimental results suggest that the two mechanisms perform well in both time cost and effectiveness and complement existing UKmeans approaches to uncertain data clustering.
Index Terms - Uncertain data clustering, similarity, cluster boundary, UKmeans

I. INTRODUCTION

UKmeans clustering approaches have been used to deal with uncertain objects for several years. Like other uncertain clustering approaches, UKmeans suffers from two imperfections, time cost and effectiveness, because it must consider object errors. Time cost is the common imperfection in uncertain data clustering (UDC): all possible positions of an uncertain cluster and an uncertain object should be considered when determining the belongingness of an object, so the clustering process is usually inefficient and time-consuming. Effectiveness is the other imperfection, since object errors often lead an object into a wrong cluster. In this study, we propose an improved similarity, UKmeans with maximum distance, density, and weighted intersection (KMDI_w), which considers both the time cost and the effectiveness, and we define a centroid boundary, the square-root boundary (SRB), which increases the effectiveness of clustering. These two mechanisms are introduced as follows.

KMDI_w extends the simplified similarity calculation in [7]. In that study, the belongingness of an object is determined by a ratio expected distance (ED), built from the minimum and maximum (minmax) distances, which replaces the ED calculated by integration. Compared with the integration-based ED, the ratio ED saves considerable calculation time because it ignores all possible positions of the object and the cluster and considers only the shortest and the farthest distance between them. Although the simplified similarity saves time, its effectiveness is sometimes low. In this study, two factors, the intersection and the cluster density, are added to the simplified similarity to increase the effectiveness of clustering.
The intersection factor is the first factor for increasing the effectiveness. It expresses the degree of overlap between an uncertain object and a cluster. In [1], the researchers showed that an object tends to belong to an overlapping cluster because both may be located in the overlapping area; using this factor, they obtained higher effectiveness than using the distance factor alone. In addition to the intersection factor, the cluster density guides the assignment when an object has the same distance and intersection degrees to different clusters. In certain (precise) data clustering, the cluster density is commonly used as a weight expressing the importance of a cluster. In uncertain clustering, however, such importance can produce a heavy cluster that holds most of the objects and attracts further objects more easily than the others, which reduces the effectiveness. In this study, a reversed cluster density is therefore used: if a cluster contains more objects than the others, it exerts a small centrifugal force that pushes objects away. Because this factor may cause objects to thrash between clusters, its effect should decay with the number of loops to ensure stable clustering results.

In addition to KMDI_w, the SRB is the other mechanism that we propose. It is used to describe the uncertainty of a centroid. In study [2], the researchers used a geometric boundary (GB), an average value of object errors, as the upper bound of the area of possible centroid positions. However, the GB is usually large because object errors are large. To tighten the boundary, we propose the SRB, which minimizes the size of the boundary and avoids clusters with large boundaries that cause excessive object overlapping. The SRB can therefore increase the effectiveness of clustering efficiently.

This study makes three contributions. First, we combine three factors that affect the clustering results to increase the effectiveness. Second, we inherit the concept of the simplified similarity to save clustering time. Last, we avoid the occurrence of large boundaries.

The paper is organized as follows. Section II briefly introduces the topics of similarity and boundary. Section III presents our proposed model for clustering uncertain data, and Section IV evaluates its performance and accuracy against different approaches. Section V concludes the paper.

II. RELATED WORK

UDC [1-6, 8] has been studied over the recent decade. Errors from objects and clusters affect both the effectiveness and the performance, and researchers have proposed many approaches against these errors. In the following, we briefly review improvements in the effectiveness term and then discuss the time cost term.

Similarity is an intuitive factor that affects the effectiveness, and its calculation differs across object representations. For example, the probability density function (pdf) is widely used to model a continuous uncertain object [8], while the errors-in-each-dimension (eied) representation, a simplified model constructed from the error in each dimension, is used in [5, 6].
In the pdf model, the similarity Sim is derived from the distance factor by integration, which evaluates the ED between a centroid C_c and an uncertain object Ō. The formula is shown in Formula 1, where f(x) is the pdf of Ō and d(.) is the Euclidean distance between a point x (x ∈ Ō) and the centroid:

    Sim(Ō, C_c) = ∫ f(x) d(x, C_c) dx    (1)

In addition to the distance factor, the researchers in [1] considered that the prioritized intersection factor can increase the effectiveness, and they showed that an uncertain object tends to belong to a cluster it intersects. Compared with the distance factor, the intersection factor was given higher priority (I_pr) when determining the belongingness of an object, because the positions of the object and of the cluster centroid may both fall in the overlapping area. Besides the pdf representation, computing similarity on eied-formatted objects, called the simplified similarity, is another popular approach. With an object center vector and the object error ψ(.) defined in Formula 2, where m is the dimensionality of an uncertain object Ō and O_{E_i} is the object error in dimension i, the similarity can be calculated easily:

    ψ(Ō) = sqrt( Σ_{i=1}^{m} O_{E_i}^2 )    (2)

Unlike the integration-based similarity, the simplified similarity considers only two kinds of distances, the minimum and the maximum, which are the lower and upper bounds of the actual distance (Dist_actual) between a cluster and an object. In [7], the researchers approximated Dist_actual with a ratio λ (0 ≤ λ ≤ 1) applied to the minimum distance MinDist(.) and the ratio (1 - λ) applied to the maximum distance MaxDist(.) between an uncertain cluster C̄ and an uncertain object Ō. According to their experiments, the maximum distance contributes more to the effectiveness of clustering than the minimum distance. The boundary of a centroid is another key that influences the effectiveness.
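As a concrete illustration of the eied error ψ (Formula 2) and the ratio ED of [7], the following Python sketch computes the minmax distances and the ratio expected distance from object and cluster centers and their error radii. The function names and the default λ are illustrative assumptions, not fixed by the paper.

```python
import math

def psi(errors):
    """Aggregate per-dimension errors of an eied object into one error
    radius, as in Formula 2: psi = sqrt(sum of squared errors)."""
    return math.sqrt(sum(e * e for e in errors))

def min_max_dist(center_o, psi_o, center_c, psi_c):
    """Lower and upper bounds on the actual distance between an uncertain
    object and an uncertain cluster, from their centers and error radii."""
    d = math.dist(center_o, center_c)
    return max(0.0, d - psi_o - psi_c), d + psi_o + psi_c

def ratio_expected_distance(center_o, psi_o, center_c, psi_c, lam=0.3):
    """Ratio ED of [7]: lam * MinDist + (1 - lam) * MaxDist, 0 <= lam <= 1."""
    lo, hi = min_max_dist(center_o, psi_o, center_c, psi_c)
    return lam * lo + (1.0 - lam) * hi
```

Because only two distances are evaluated, the cost per object-cluster pair is constant, whereas the integration of Formula 1 must visit many sample positions.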
It enhances the probability of object intersection, and the intersection factor in turn increases the effectiveness of clustering. In study [7], the researchers proposed the GB, derived from the errors of the objects in a cluster, to reflect the possible positions of a centroid; through the intersection factor, the GB increases the effectiveness of clustering.

The time cost is the other imperfection in UDC. For integration-based approaches, time is saved by reducing the number of ED calculations per object. Ngai et al. [6] proposed minmax distance pruning to decrease the ED calculations: with a minimum bounding rectangle (MBR) surrounding each object, the calculation for a cluster can be omitted if the minimum distance between the object and that cluster is farther than the maximum distance between the object and another cluster. Besides study [6], Lukic et al. [5] proposed a Voronoi diagram (VD) mechanism that further decreases the ED calculations and remedies an imperfection of minmax distance pruning. At the beginning, the VD is partitioned by the pairs of centroids into individual closed regions called Voronoi cells (VCs). The calculations for an object can be ignored if the object lies completely inside one VC; the belongingness of the remaining objects, which lie only partially in a VC, must still be determined by ED calculations. In addition to these two mechanisms, Ngai et al. [6] proposed the boundary of the distance (BD) mechanism, which uses the triangle inequality to reduce the number of ED calculations. The BD bound is shown in Equation 3, where Ō is an uncertain object, C_c is the centroid of a cluster, and y is a point in the MBR of Ō:

    ED(Ō, C_c) ≤ ED(Ō, y) + ED(C_c, y)    (3)

In the BD mechanism, the bound on ED(Ō, C_c) depends on (1) ED(Ō, y), which can be precomputed by integration with a fixed point y in the MBR of Ō, and (2) the Euclidean distance between the point y and the centroid, ED(C_c, y).
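The two time-saving ideas above, minmax pruning [6] and the BD bound of Equation 3, can be sketched in Python as follows. This is a simplified illustration with hypothetical helper names; the exact pruning bookkeeping in [5, 6] is more involved.

```python
import math

def minmax_prune(min_dists, max_dists):
    """Minmax pruning in the spirit of Ngai et al. [6]: cluster j needs no
    exact ED computation if its minimum distance to the object exceeds the
    smallest maximum distance to any cluster, since its ED can never be the
    smallest. Returns the indices of the clusters that survive pruning."""
    threshold = min(max_dists)
    return [j for j, lo in enumerate(min_dists) if lo <= threshold]

def bd_upper_bound(ed_o_y, y, centroid):
    """BD bound of Equation 3: ED(O, C_c) <= ED(O, y) + ED(C_c, y), where
    ED(O, y) is precomputed once by integration for a fixed point y in the
    object's MBR, and ED(C_c, y) is a plain Euclidean distance."""
    return ed_o_y + math.dist(y, centroid)
```

Only the surviving candidate clusters then pay for the expensive integration; the bound itself needs just one point-to-point distance per loop.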
Clearly, the BD mechanism saves time because ED(Ō, y) is calculated only once, in the first loop; much of the ED computation is therefore avoided. To decrease the time cost, some researchers instead use the simplified similarity to reduce the calculations. As mentioned previously, the simplified similarity uses the ratio of the minmax distances as the ED in place of the complex integration, so the similarity can be obtained far more efficiently.

III. PROPOSED APPROACHES

We propose two mechanisms in this study, the improved similarity KMDI_w and the SRB, to efficiently improve the time cost and the effectiveness. To consider both, KMDI_w builds on the concept of the simplified similarity and combines three factors that affect the effectiveness. Besides KMDI_w, we also propose a centroid boundary mechanism that affects the degree of intersection and thereby the effectiveness. We introduce the two mechanisms in turn.

Similarity is the common factor influencing both the time cost and the effectiveness in UDC. To decrease the time cost efficiently, the simplified similarity is appropriate because it considers only the minmax distances in the ED calculation. However, the effectiveness of the simplified similarity is lower than that of the integration-based one. To increase the effectiveness, we extend the simplified similarity with additional factors. The first is the intersection factor. As mentioned in Section II, the prioritized intersection factor can increase the effectiveness; however, it easily leads objects with large errors into wrong clusters. To avoid this situation, we use a weighted intersection instead of the prioritized intersection.
Unlike the prioritized intersection factor, the weighted intersection factor supplies a weight for a cluster when an object overlaps it. The weight expresses the degree of overlap, and its value lies between one and two, so the similarity increases only slightly when an overlap occurs. The intersection factor I(.) is defined in Formula 4, where d(.) is the Euclidean distance between the centers of a cluster and an object, and Cond = ψ(Ō) + ψ(C̄) - d(Ō, C̄) indicates whether the two overlap:

    I(Ō, C̄) = 1,                                             if Cond ≤ 0
    I(Ō, C̄) = 1 + (ψ(C̄) + ψ(Ō) - d(Ō, C̄)) / (2 ψ(Ō)),       if Cond > 0 and ψ(C̄) < d(Ō, C̄) + ψ(Ō)
    I(Ō, C̄) = 2,                                             if Cond > 0 and ψ(C̄) ≥ d(Ō, C̄) + ψ(Ō)    (4)

When an object is tangent to or does not overlap a cluster, the weighted intersection equals one, which implies no effect on the similarity. If the object and the cluster overlap without one completely covering the other, the weight equals 1 + (ψ(C̄) + ψ(Ō) - d(Ō, C̄)) / (2 ψ(Ō)), which reflects the degree of the intersection. The weight equals two when the cluster or the object completely contains the other.

The next factor is the cluster density. The density is commonly used to strengthen the importance of clusters that have special properties. In some situations, especially when an object has the same distance and intersection degree to several clusters, our density factor provides a slight reversed force that pushes the object toward the cluster with the lower density. This avoids the aggregation of most objects, especially objects with large errors, into one cluster. However, the density factor can also cause thrashing during clustering; to make the clustering results converge, its effect should decay with the number of loops.
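A direct Python transcription of the weighted intersection of Formula 4 might look as follows. Treating ψ as a single error radius per object and cluster is the paper's eied simplification; the function name is ours.

```python
import math

def weighted_intersection(center_o, psi_o, center_c, psi_c):
    """Weighted intersection factor of Formula 4. Returns 1 with no overlap
    (or tangency), 2 when the cluster boundary fully contains the object,
    and a value in (1, 2) that grows with the overlap depth otherwise."""
    d = math.dist(center_o, center_c)
    cond = psi_o + psi_c - d            # > 0 iff the two regions overlap
    if cond <= 0:
        return 1.0
    if psi_c >= d + psi_o:              # complete containment
        return 2.0
    return 1.0 + cond / (2.0 * psi_o)   # partial overlap, capped below 2
```

Bounding the weight by two keeps the intersection from dominating the similarity the way the prioritized intersection of [1] can.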
The density factor is shown in Formula 5, where l is the loop count and n is the number of objects belonging to the uncertain cluster C̄:

    M_smooth(C̄) = 1 + (1/n)^l    (5)

When the loop count grows large, M_smooth(C̄) → 1 except when n = 1. Combining these factors, the similarity between the two uncertain entities Ō and C̄ is given in Formula 6:

    Sim(Ō, C̄) = α · I(Ō, C̄) · M_smooth(C̄) / MaxDist(Ō, C̄)    (6)

where α is the coefficient weighing (1) the product of the intersection and density factors against (2) the distance. When α < 1, the distance factor contributes more than the product of the intersection and density; otherwise, the intersection and density factors are more important than the distance factor.

In addition to KMDI_w, we propose the boundary mechanism SRB to increase the effectiveness of clustering. In Section II, we introduced the definition of the GB, in which each point reflects a possible position of the centroid. In general, the GB is easily obtained from the object errors, with a radius R_GB formed from the average error of the uncertain objects. The definition of R_GB is shown in Formula 7:

    R_GB(C̄) = ( Σ_{k=1, Ō_k ∈ C̄}^{n} ψ(Ō_k) ) / n    (7)

When the objects in a cluster have large errors, R_GB is large and the cluster attracts objects too easily. To reduce the radius of the boundary, we use the SRB as the definition of the boundary uncertainty, which avoids large cluster boundaries. Unlike the GB, the SRB is computed from the square root of the sum of squared object errors and therefore yields a smaller boundary. The SRB radius is shown in Formula 8, and the proof follows:

    R_SRB(C̄) = sqrt( Σ_{k=1, Ō_k ∈ C̄}^{n} ψ(Ō_k)^2 ) / n    (8)
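The decaying density factor (Formula 5) and the two boundary radii (Formulas 7 and 8) are straightforward to compute; a small Python sketch, with illustrative function names, is given below.

```python
import math

def m_smooth(n, l):
    """Decaying density factor of Formula 5: 1 + (1/n)^l, where n is the
    cluster's object count and l the loop number. It shrinks toward 1 as
    the loops proceed (unless n == 1), so clustering can converge."""
    return 1.0 + (1.0 / n) ** l

def r_gb(psis):
    """Geometric boundary radius (Formula 7): average of object errors."""
    return sum(psis) / len(psis)

def r_srb(psis):
    """Square-root boundary radius (Formula 8): square root of the sum of
    squared errors, divided by n; never larger than r_gb, and equal to it
    only when the cluster holds a single object."""
    return math.sqrt(sum(p * p for p in psis)) / len(psis)
```

For example, two objects with errors 3 and 4 give R_GB = 3.5 but R_SRB = 2.5, matching the proof accompanying Formula 8.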

(a) Case: large intersection (b) Case: small intersection (c) Case: tangent (d) Case: no intersection
Fig. 1: Four cases of the areas of possible centroid locations, where Ō_1 and Ō_2 are uncertain objects, C_1 and C_2 are the centers of Ō_1 and Ō_2 respectively, the area surrounded by the red circle is the GB, and the area of the green circle is the SRB.

Proof.
    R_SRB(C̄) = (1/n) sqrt( Σ_{k=1, Ō_k ∈ C̄}^{n} ψ(Ō_k)^2 )
    (R_SRB(C̄))^2 = (1/n)^2 Σ_{k=1, Ō_k ∈ C̄}^{n} ψ(Ō_k)^2
                 ≤ (1/n)^2 ( Σ_{k=1, Ō_k ∈ C̄}^{n} ψ(Ō_k) )^2 = (R_GB(C̄))^2

Clearly, the GB and the SRB have the same size when the cluster contains only one object; otherwise, the SRB is smaller than the GB. To clearly show the sizes of the GB and the SRB, we discuss the boundaries formed by two objects in the four cases of Fig. 1a-1d. As shown in Fig. 1, assume Ō_1 and Ō_2 belong to the same cluster and have the same errors. The boundaries of the cluster can then be calculated directly from the individual definitions. Even though the intersection situations differ, both the GB and the SRB yield a fixed boundary size, and the size of the SRB is smaller than that of the GB.

Algorithm 1: Algorithm of KMDI_w
 1: Input: objects O = {o_1, o_2, o_3, ..., o_i, ..., o_n}
 2: Output: clusters C = {c_1, c_2, c_3, ..., c_k}, where each c_j contains objects
 3: define two variables, J and J'
 4: J: the sum of maximum similarities; J': the previous sum of maximum similarities
 5: repeat
 6:   J' ← J
 7:   for each object o_i and each cluster c_j do
 8:     calculate Sim(o_i, c_j)
 9:   select max Sim(o_i, c_j) and assign o_i to c_j
10:   add max Sim(o_i, c_j) to J
11:   update the SRB of the clusters
12: until |J - J'| ≈ 0

Combining KMDI_w with the SRB, the complete KMDI_w algorithm is shown as Algorithm 1.

IV. SIMULATIONS

We use two synthetic datasets and two real datasets to evaluate, in order, the time cost and the effectiveness of clustering with the different mechanisms.
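Before turning to the simulation details, the pieces of Section III can be combined into a runnable sketch of Algorithm 1 (Formulas 4-6 and 8). This is a hypothetical, simplified sketch: seeding from the first k objects, the convergence tolerance, and the running density counts are our assumptions, not details fixed by the paper.

```python
import math

def kmdi_w(objects, k, alpha=1.0, tol=1e-9, max_loops=50):
    """Sketch of Algorithm 1. Each object is a (center, psi) pair; each
    cluster keeps a centroid and an SRB radius (Formula 8). Similarity
    follows Formula 6: alpha * I * M_smooth / MaxDist, with the weighted
    intersection of Formula 4 and the decaying density of Formula 5."""
    centroids = [list(objects[j][0]) for j in range(k)]  # simple seeding
    radii = [0.0] * k                       # SRB radius of each cluster
    members = [[] for _ in range(k)]
    prev_j = -math.inf
    for loop in range(1, max_loops + 1):
        members = [[] for _ in range(k)]
        total = 0.0                         # J: sum of maximum similarities
        for center, psi_o in objects:
            best_j, best_sim = 0, -math.inf
            for j in range(k):
                d = math.dist(center, centroids[j])
                cond = psi_o + radii[j] - d
                if cond <= 0:
                    inter = 1.0             # tangent or disjoint
                elif radii[j] >= d + psi_o:
                    inter = 2.0             # boundary fully covers object
                else:
                    inter = 1.0 + cond / (2.0 * psi_o)
                n_j = max(len(members[j]), 1)
                decay = 1.0 + (1.0 / n_j) ** loop      # Formula 5
                sim = alpha * inter * decay / (d + psi_o + radii[j])
                if sim > best_sim:
                    best_sim, best_j = sim, j
            members[best_j].append((center, psi_o))
            total += best_sim
        for j in range(k):                  # recompute centroids, SRB radii
            if members[j]:
                n = len(members[j])
                dim = len(members[j][0][0])
                centroids[j] = [sum(c[i] for c, _ in members[j]) / n
                                for i in range(dim)]
                radii[j] = math.sqrt(sum(p * p for _, p in members[j])) / n
        if abs(total - prev_j) <= tol:      # until J - J' ~ 0
            break
        prev_j = total
    return centroids, members
```

On well-separated objects the sketch converges in a few loops; a full implementation would add the pruning techniques of Section II to cut the per-loop cost.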
In the time cost aspect, we measure the time for (1) the same number of objects and (2) the original number of objects under the different mechanisms. The former shows the time cost across various dimensions, whereas the latter shows the time spent on the different datasets.

Fig. 2: (a) The time cost of clustering one hundred objects with various mechanisms in the different datasets; (b) the time cost of the different datasets with various mechanisms.

In the effectiveness aspect, we first verify the correctness of clustering by three common external criteria, namely accuracy, F1 score, and purity. In addition, we analyze the effectiveness of the different boundaries. All simulations are programmed in C# and processed on an Intel(R) Core(TM) i GHz machine under Windows 7.

A. dataset

In the simulations, we use two synthetic datasets generated with MATLAB and two real datasets, iris and wine quality, provided by the UCI Machine Learning Repository 1. We generate a 2-dimensional and a 3-dimensional synthetic dataset with Gaussian distributions, where μ ranges over [4, 12] and σ over [0.2, 0.8]; each synthetic dataset contains four categories. Since the two synthetic datasets and the two real datasets consist of certain, precise values, we add Gaussian noise, with μ_err and σ_err around [0.15, 0.5] of the μ and σ of each original dimension, so that objects still belong to their original categories and the effectiveness can be measured. The information of the four datasets is shown in Table I.

1 UCI Machine Learning Repository. URL: Accessed:

Fig. 3: Effectiveness results on the various datasets. Panels (a)-(d) show the accuracy, (e)-(h) the F1 score, and (i)-(l) the purity on synthetic dataset I, synthetic dataset II, the iris dataset, and the wine quality dataset, respectively.

Fig. 4: Effectiveness results of the different boundaries on the various datasets, with the same panel layout as Fig. 3 (accuracy, F1 score, and purity on the four datasets).

TABLE I: The information of the datasets (the number of categories, the number of attributes, and the dataset size of artificial dataset I, artificial dataset II, the iris dataset, and the wine quality dataset).

B. compared methods

In the time cost aspect, we compare UKmeans with the popular mechanisms mentioned in Section II: the probability density function by integration (KP), maximum distance with geometric boundary (KMGB) [7], maximum distance with prioritized intersection and geometric boundary (KMI_pr GB) [1], and BD [6]. KP and BD calculate the similarity by integration. To keep their results as correct as possible, we construct 16 d slices per object in all datasets except the wine quality dataset, where d is the number of dimensions; for the wine quality dataset we adopt 8 d slices per object to save time. All object errors follow a uniform distribution. In the effectiveness aspect, we first compare the effectiveness of the aforementioned mechanisms; to simplify the calculations in KMDI_w, we set α to 1 in our simulations. In addition, we discuss the effectiveness of UKmeans with the two different boundaries, GB and SRB.

C. measurements

First, since the datasets contain different numbers of objects, we compare for fairness the time cost of clustering one hundred objects of each dataset under the various mechanisms. Clearly, KMGB spends the least time on all datasets. KMDI_w, the extended simplified similarity, spends more time than KMGB because of the additional density and weighted intersection calculations. The results are illustrated in Fig. 2a. Based on the results of clustering one hundred objects, the time cost on each full dataset is shown in Fig. 2b. The effectiveness results of clustering are shown in Fig. 3.
On the synthetic I, synthetic II, and iris datasets, KMDI_w obtains the highest accuracy, F1 score, and purity when the number of initial clusters K is smaller than the number of predefined categories. As the number of initial clusters increases, the accuracy of KMDI_w decreases because centroids with boundaries have a high probability of attracting other objects through intersections. On the wine quality dataset, all mechanisms show similar accuracy, F1 score, and purity because the number of dimensions is large. Besides the similarity, the boundary of the centroids is the other mechanism of concern. To compare the effect of the two boundaries, GB and SRB, we fix the number of initial clusters and measure the effectiveness of UKmeans with each; the two settings are abbreviated KMGB and KMSRB. The accuracy, F1 score, and purity of KMSRB are better than those of KMGB. The results are illustrated in Fig. 4.

V. CONCLUSION

We propose two mechanisms for clustering uncertain data in this study. First, we present a mechanism for modeling the cluster boundary. Previously, the GB was the only definition of the boundary; it honestly reflects the possible centroid locations induced by the errors of the member objects, but the clustering results with the GB are unfavorable when object errors are large. The SRB controls this effect through its smaller boundary and therefore provides stable clustering effectiveness. Next, we propose a mixed similarity mechanism that balances (1) clusters with large boundaries against (2) the fact that most objects do not absolutely belong to such clusters. Finally, we retain the low time cost of simplified clustering: although the time cost of our mechanisms cannot beat that of KMGB, they still cluster objects within a favorable time.

REFERENCES
[1] C. C. Aggarwal. On density based transforms for uncertain data mining. In Proc. IEEE 23rd International Conference on Data Engineering (ICDE). IEEE.
[2] C. C. Aggarwal and P. S. Yu. A framework for clustering uncertain data streams. In Proc. IEEE 24th International Conference on Data Engineering (ICDE). IEEE.
[3] B. Kao, S. D. Lee, D. W. Cheung, W.-S. Ho, and K. Chan. Clustering uncertain data using Voronoi diagrams. In Proc. Eighth IEEE International Conference on Data Mining (ICDM '08). IEEE.
[4] H.-P. Kriegel and M. Pfeifle. Density-based clustering of uncertain data. In Proc. Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. ACM.
[5] I. Lukic, M. Kohler, and N. Slavek. Improved bisector pruning for uncertain data mining. In Proc. International Conference on Information Technology Interfaces (ITI). IEEE.
[6] W. K. Ngai, B. Kao, C. K. Chui, R. Cheng, M. Chau, and K. Y. Yip. Efficient clustering of uncertain data. In Proc. Sixth International Conference on Data Mining (ICDM '06). IEEE.
[7] Y. Peng, Q. Luo, and X. Peng. UCK-means: A customized k-means for clustering uncertain measurement data. In Proc. Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), vol. 2. IEEE, 2011.
[8] L. Xiao and E. Hung. An efficient distance calculation method for uncertain objects. In Proc. IEEE Symposium on Computational Intelligence and Data Mining (CIDM). IEEE.


More information

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University

More information

1. Use the Trapezium Rule with five ordinates to find an approximate value for the integral

1. Use the Trapezium Rule with five ordinates to find an approximate value for the integral 1. Use the Trapezium Rule with five ordinates to find an approximate value for the integral Show your working and give your answer correct to three decimal places. 2 2.5 3 3.5 4 When When When When When

More information

Fast Efficient Clustering Algorithm for Balanced Data

Fast Efficient Clustering Algorithm for Balanced Data Vol. 5, No. 6, 214 Fast Efficient Clustering Algorithm for Balanced Data Adel A. Sewisy Faculty of Computer and Information, Assiut University M. H. Marghny Faculty of Computer and Information, Assiut

More information

Local Linear Approximation for Kernel Methods: The Railway Kernel

Local Linear Approximation for Kernel Methods: The Railway Kernel Local Linear Approximation for Kernel Methods: The Railway Kernel Alberto Muñoz 1,JavierGonzález 1, and Isaac Martín de Diego 1 University Carlos III de Madrid, c/ Madrid 16, 890 Getafe, Spain {alberto.munoz,

More information

Guidelines for proper use of Plate elements

Guidelines for proper use of Plate elements Guidelines for proper use of Plate elements In structural analysis using finite element method, the analysis model is created by dividing the entire structure into finite elements. This procedure is known

More information

Clustering uncertain data using voronoi diagrams and R-tree index. Kao, B; Lee, SD; Lee, FKF; Cheung, DW; Ho, WS

Clustering uncertain data using voronoi diagrams and R-tree index. Kao, B; Lee, SD; Lee, FKF; Cheung, DW; Ho, WS Title Clustering uncertain data using voronoi diagrams and R-tree index Author(s) Kao, B; Lee, SD; Lee, FKF; Cheung, DW; Ho, WS Citation Ieee Transactions On Knowledge And Data Engineering, 2010, v. 22

More information

GCSE Higher Revision List

GCSE Higher Revision List GCSE Higher Revision List Level 8/9 Topics I can work with exponential growth and decay on the calculator. I can convert a recurring decimal to a fraction. I can simplify expressions involving powers or

More information

On Indexing High Dimensional Data with Uncertainty

On Indexing High Dimensional Data with Uncertainty On Indexing High Dimensional Data with Uncertainty Charu C. Aggarwal Philip S. Yu Abstract In this paper, we will examine the problem of distance function computation and indexing uncertain data in high

More information

Chapter 4: Non-Parametric Techniques

Chapter 4: Non-Parametric Techniques Chapter 4: Non-Parametric Techniques Introduction Density Estimation Parzen Windows Kn-Nearest Neighbor Density Estimation K-Nearest Neighbor (KNN) Decision Rule Supervised Learning How to fit a density

More information

Constrained Clustering with Interactive Similarity Learning

Constrained Clustering with Interactive Similarity Learning SCIS & ISIS 2010, Dec. 8-12, 2010, Okayama Convention Center, Okayama, Japan Constrained Clustering with Interactive Similarity Learning Masayuki Okabe Toyohashi University of Technology Tenpaku 1-1, Toyohashi,

More information

Density Based Clustering using Modified PSO based Neighbor Selection

Density Based Clustering using Modified PSO based Neighbor Selection Density Based Clustering using Modified PSO based Neighbor Selection K. Nafees Ahmed Research Scholar, Dept of Computer Science Jamal Mohamed College (Autonomous), Tiruchirappalli, India nafeesjmc@gmail.com

More information

Clustering: Overview and K-means algorithm

Clustering: Overview and K-means algorithm Clustering: Overview and K-means algorithm Informal goal Given set of objects and measure of similarity between them, group similar objects together K-Means illustrations thanks to 2006 student Martin

More information

A Systematic Overview of Data Mining Algorithms. Sargur Srihari University at Buffalo The State University of New York

A Systematic Overview of Data Mining Algorithms. Sargur Srihari University at Buffalo The State University of New York A Systematic Overview of Data Mining Algorithms Sargur Srihari University at Buffalo The State University of New York 1 Topics Data Mining Algorithm Definition Example of CART Classification Iris, Wine

More information

MODELING FOR RESIDUAL STRESS, SURFACE ROUGHNESS AND TOOL WEAR USING AN ADAPTIVE NEURO FUZZY INFERENCE SYSTEM

MODELING FOR RESIDUAL STRESS, SURFACE ROUGHNESS AND TOOL WEAR USING AN ADAPTIVE NEURO FUZZY INFERENCE SYSTEM CHAPTER-7 MODELING FOR RESIDUAL STRESS, SURFACE ROUGHNESS AND TOOL WEAR USING AN ADAPTIVE NEURO FUZZY INFERENCE SYSTEM 7.1 Introduction To improve the overall efficiency of turning, it is necessary to

More information

Math 7 Glossary Terms

Math 7 Glossary Terms Math 7 Glossary Terms Absolute Value Absolute value is the distance, or number of units, a number is from zero. Distance is always a positive value; therefore, absolute value is always a positive value.

More information

Understanding Clustering Supervising the unsupervised

Understanding Clustering Supervising the unsupervised Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data

More information

A Study on Reverse Top-K Queries Using Monochromatic and Bichromatic Methods

A Study on Reverse Top-K Queries Using Monochromatic and Bichromatic Methods A Study on Reverse Top-K Queries Using Monochromatic and Bichromatic Methods S.Anusuya 1, M.Balaganesh 2 P.G. Student, Department of Computer Science and Engineering, Sembodai Rukmani Varatharajan Engineering

More information

Detecting Clusters and Outliers for Multidimensional

Detecting Clusters and Outliers for Multidimensional Kennesaw State University DigitalCommons@Kennesaw State University Faculty Publications 2008 Detecting Clusters and Outliers for Multidimensional Data Yong Shi Kennesaw State University, yshi5@kennesaw.edu

More information

WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW

WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW ISSN: 9 694 (ONLINE) ICTACT JOURNAL ON COMMUNICATION TECHNOLOGY, MARCH, VOL:, ISSUE: WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW V Lakshmi Praba and T Vasantha Department of Computer

More information

Note Set 4: Finite Mixture Models and the EM Algorithm

Note Set 4: Finite Mixture Models and the EM Algorithm Note Set 4: Finite Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine Finite Mixture Models A finite mixture model with K components, for

More information

Iterative random projections for high-dimensional data clustering

Iterative random projections for high-dimensional data clustering Iterative random projections for high-dimensional data clustering Ângelo Cardoso, Andreas Wichert INESC-ID Lisboa and Instituto Superior Técnico, Technical University of Lisbon Av. Prof. Dr. Aníbal Cavaco

More information

An Efficient Clustering Method for k-anonymization

An Efficient Clustering Method for k-anonymization An Efficient Clustering Method for -Anonymization Jun-Lin Lin Department of Information Management Yuan Ze University Chung-Li, Taiwan jun@saturn.yzu.edu.tw Meng-Cheng Wei Department of Information Management

More information

Using Natural Clusters Information to Build Fuzzy Indexing Structure

Using Natural Clusters Information to Build Fuzzy Indexing Structure Using Natural Clusters Information to Build Fuzzy Indexing Structure H.Y. Yue, I. King and K.S. Leung Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, New Territories,

More information

A Short SVM (Support Vector Machine) Tutorial

A Short SVM (Support Vector Machine) Tutorial A Short SVM (Support Vector Machine) Tutorial j.p.lewis CGIT Lab / IMSC U. Southern California version 0.zz dec 004 This tutorial assumes you are familiar with linear algebra and equality-constrained optimization/lagrange

More information

Level-set MCMC Curve Sampling and Geometric Conditional Simulation

Level-set MCMC Curve Sampling and Geometric Conditional Simulation Level-set MCMC Curve Sampling and Geometric Conditional Simulation Ayres Fan John W. Fisher III Alan S. Willsky February 16, 2007 Outline 1. Overview 2. Curve evolution 3. Markov chain Monte Carlo 4. Curve

More information

Distance-based Outlier Detection: Consolidation and Renewed Bearing

Distance-based Outlier Detection: Consolidation and Renewed Bearing Distance-based Outlier Detection: Consolidation and Renewed Bearing Gustavo. H. Orair, Carlos H. C. Teixeira, Wagner Meira Jr., Ye Wang, Srinivasan Parthasarathy September 15, 2010 Table of contents Introduction

More information

Link Recommendation Method Based on Web Content and Usage Mining

Link Recommendation Method Based on Web Content and Usage Mining Link Recommendation Method Based on Web Content and Usage Mining Przemys law Kazienko and Maciej Kiewra Wroc law University of Technology, Wyb. Wyspiańskiego 27, Wroc law, Poland, kazienko@pwr.wroc.pl,

More information

Clustering: Overview and K-means algorithm

Clustering: Overview and K-means algorithm Clustering: Overview and K-means algorithm Informal goal Given set of objects and measure of similarity between them, group similar objects together K-Means illustrations thanks to 2006 student Martin

More information

AN IMPROVED DENSITY BASED k-means ALGORITHM

AN IMPROVED DENSITY BASED k-means ALGORITHM AN IMPROVED DENSITY BASED k-means ALGORITHM Kabiru Dalhatu 1 and Alex Tze Hiang Sim 2 1 Department of Computer Science, Faculty of Computing and Mathematical Science, Kano University of Science and Technology

More information

Spatial Outlier Detection

Spatial Outlier Detection Spatial Outlier Detection Chang-Tien Lu Department of Computer Science Northern Virginia Center Virginia Tech Joint work with Dechang Chen, Yufeng Kou, Jiang Zhao 1 Spatial Outlier A spatial data point

More information

ISSN: (Online) Volume 4, Issue 1, January 2016 International Journal of Advance Research in Computer Science and Management Studies

ISSN: (Online) Volume 4, Issue 1, January 2016 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 4, Issue 1, January 2016 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Year 8 Mathematics Curriculum Map

Year 8 Mathematics Curriculum Map Year 8 Mathematics Curriculum Map Topic Algebra 1 & 2 Number 1 Title (Levels of Exercise) Objectives Sequences *To generate sequences using term-to-term and position-to-term rule. (5-6) Quadratic Sequences

More information

Discrete geometry. Lecture 2. Alexander & Michael Bronstein tosca.cs.technion.ac.il/book

Discrete geometry. Lecture 2. Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Discrete geometry Lecture 2 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 The world is continuous, but the mind is discrete

More information

Comment Extraction from Blog Posts and Its Applications to Opinion Mining

Comment Extraction from Blog Posts and Its Applications to Opinion Mining Comment Extraction from Blog Posts and Its Applications to Opinion Mining Huan-An Kao, Hsin-Hsi Chen Department of Computer Science and Information Engineering National Taiwan University, Taipei, Taiwan

More information

Course: Geometry PAP Prosper ISD Course Map Grade Level: Estimated Time Frame 6-7 Block Days. Unit Title

Course: Geometry PAP Prosper ISD Course Map Grade Level: Estimated Time Frame 6-7 Block Days. Unit Title Unit Title Unit 1: Geometric Structure Estimated Time Frame 6-7 Block 1 st 9 weeks Description of What Students will Focus on on the terms and statements that are the basis for geometry. able to use terms

More information

Geometric Computations for Simulation

Geometric Computations for Simulation 1 Geometric Computations for Simulation David E. Johnson I. INTRODUCTION A static virtual world would be boring and unlikely to draw in a user enough to create a sense of immersion. Simulation allows things

More information

A Population Based Convergence Criterion for Self-Organizing Maps

A Population Based Convergence Criterion for Self-Organizing Maps A Population Based Convergence Criterion for Self-Organizing Maps Lutz Hamel and Benjamin Ott Department of Computer Science and Statistics, University of Rhode Island, Kingston, RI 02881, USA. Email:

More information

Improved Version of Kernelized Fuzzy C-Means using Credibility

Improved Version of Kernelized Fuzzy C-Means using Credibility 50 Improved Version of Kernelized Fuzzy C-Means using Credibility Prabhjot Kaur Maharaja Surajmal Institute of Technology (MSIT) New Delhi, 110058, INDIA Abstract - Fuzzy c-means is a clustering algorithm

More information

HOT asax: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery

HOT asax: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery HOT asax: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery Ninh D. Pham, Quang Loc Le, Tran Khanh Dang Faculty of Computer Science and Engineering, HCM University of Technology,

More information

Distance based Clustering for Categorical Data

Distance based Clustering for Categorical Data Distance based Clustering for Categorical Data Extended Abstract Dino Ienco and Rosa Meo Dipartimento di Informatica, Università di Torino Italy e-mail: {ienco, meo}@di.unito.it Abstract. Learning distances

More information

COMP 465: Data Mining Still More on Clustering

COMP 465: Data Mining Still More on Clustering 3/4/015 Exercise COMP 465: Data Mining Still More on Clustering Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Describe each of the following

More information

Information Systems 36 (2011) Contents lists available at ScienceDirect. Information Systems

Information Systems 36 (2011) Contents lists available at ScienceDirect. Information Systems Information Systems 36 (211) 476 497 Contents lists available at ScienceDirect Information Systems journal homepage: www.elsevier.com/locate/infosys Metric and trigonometric pruning for clustering of uncertain

More information

Generating Decision Trees for Uncertain Data by Using Pruning Techniques

Generating Decision Trees for Uncertain Data by Using Pruning Techniques Generating Decision Trees for Uncertain Data by Using Pruning Techniques S.Vidya Sagar Appaji,V.Trinadha Abstract Current research techniques on data stream classification mainly focuses on certain data,

More information

Clustering on Uncertain Data using Kullback Leibler Divergence Measurement based on Probability Distribution

Clustering on Uncertain Data using Kullback Leibler Divergence Measurement based on Probability Distribution International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Clustering

More information

Aspects of Geometry. Finite models of the projective plane and coordinates

Aspects of Geometry. Finite models of the projective plane and coordinates Review Sheet There will be an exam on Thursday, February 14. The exam will cover topics up through material from projective geometry through Day 3 of the DIY Hyperbolic geometry packet. Below are some

More information

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams Mining Data Streams Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction Summarization Methods Clustering Data Streams Data Stream Classification Temporal Models CMPT 843, SFU, Martin Ester, 1-06

More information

Introduction to Computer Science

Introduction to Computer Science DM534 Introduction to Computer Science Clustering and Feature Spaces Richard Roettger: About Me Computer Science (Technical University of Munich and thesis at the ICSI at the University of California at

More information

A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems

A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University

More information

CHAPTER 4 VORONOI DIAGRAM BASED CLUSTERING ALGORITHMS

CHAPTER 4 VORONOI DIAGRAM BASED CLUSTERING ALGORITHMS CHAPTER 4 VORONOI DIAGRAM BASED CLUSTERING ALGORITHMS 4.1 Introduction Although MST-based clustering methods are effective for complex data, they require quadratic computational time which is high for

More information

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data Nian Zhang and Lara Thompson Department of Electrical and Computer Engineering, University

More information

TOPIC LIST GCSE MATHEMATICS HIGHER TIER (Bold HIGHER TIER ONLY) Number Topic Red Amber Green

TOPIC LIST GCSE MATHEMATICS HIGHER TIER (Bold HIGHER TIER ONLY) Number Topic Red Amber Green TOPIC LIST GCSE MATHEMATICS HIGHER TIER (Bold HIGHER TIER ONLY) Number Order whole, decimal, fraction and negative numbers Use the symbols =,, Add, subtract, multiply, divide whole numbers using written

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Mapping Common Core State Standard Clusters and. Ohio Grade Level Indicator. Grade 5 Mathematics

Mapping Common Core State Standard Clusters and. Ohio Grade Level Indicator. Grade 5 Mathematics Mapping Common Core State Clusters and Ohio s Grade Level Indicators: Grade 5 Mathematics Operations and Algebraic Thinking: Write and interpret numerical expressions. Operations and Algebraic Thinking:

More information

Clustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search

Clustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search Informal goal Clustering Given set of objects and measure of similarity between them, group similar objects together What mean by similar? What is good grouping? Computation time / quality tradeoff 1 2

More information

Ohio Tutorials are designed specifically for the Ohio Learning Standards to prepare students for the Ohio State Tests and end-ofcourse

Ohio Tutorials are designed specifically for the Ohio Learning Standards to prepare students for the Ohio State Tests and end-ofcourse Tutorial Outline Ohio Tutorials are designed specifically for the Ohio Learning Standards to prepare students for the Ohio State Tests and end-ofcourse exams. Math Tutorials offer targeted instruction,

More information

SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases

SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases Jinlong Wang, Congfu Xu, Hongwei Dan, and Yunhe Pan Institute of Artificial Intelligence, Zhejiang University Hangzhou, 310027,

More information

CHAPTER 7. PAPER 3: EFFICIENT HIERARCHICAL CLUSTERING OF LARGE DATA SETS USING P-TREES

CHAPTER 7. PAPER 3: EFFICIENT HIERARCHICAL CLUSTERING OF LARGE DATA SETS USING P-TREES CHAPTER 7. PAPER 3: EFFICIENT HIERARCHICAL CLUSTERING OF LARGE DATA SETS USING P-TREES 7.1. Abstract Hierarchical clustering methods have attracted much attention by giving the user a maximum amount of

More information

Sequences Modeling and Analysis Based on Complex Network

Sequences Modeling and Analysis Based on Complex Network Sequences Modeling and Analysis Based on Complex Network Li Wan 1, Kai Shu 1, and Yu Guo 2 1 Chongqing University, China 2 Institute of Chemical Defence People Libration Army {wanli,shukai}@cqu.edu.cn

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2008 CS 551, Spring 2008 c 2008, Selim Aksoy (Bilkent University)

More information

Superpixels Generating from the Pixel-based K-Means Clustering

Superpixels Generating from the Pixel-based K-Means Clustering Superpixels Generating from the Pixel-based K-Means Clustering Shang-Chia Wei, Tso-Jung Yen Institute of Statistical Science Academia Sinica Taipei, Taiwan 11529, R.O.C. wsc@stat.sinica.edu.tw, tjyen@stat.sinica.edu.tw

More information

Uncertain Data Classification Using Decision Tree Classification Tool With Probability Density Function Modeling Technique

Uncertain Data Classification Using Decision Tree Classification Tool With Probability Density Function Modeling Technique Research Paper Uncertain Data Classification Using Decision Tree Classification Tool With Probability Density Function Modeling Technique C. Sudarsana Reddy 1 S. Aquter Babu 2 Dr. V. Vasu 3 Department

More information

Lifting Transform, Voronoi, Delaunay, Convex Hulls

Lifting Transform, Voronoi, Delaunay, Convex Hulls Lifting Transform, Voronoi, Delaunay, Convex Hulls Subhash Suri Department of Computer Science University of California Santa Barbara, CA 93106 1 Lifting Transform (A combination of Pless notes and my

More information

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Bumjoon Jo and Sungwon Jung (&) Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul 04107,

More information

Fuzzy Voronoi Diagram

Fuzzy Voronoi Diagram Fuzzy Voronoi Diagram Mohammadreza Jooyandeh and Ali Mohades Khorasani Mathematics and Computer Science, Amirkabir University of Technology, Hafez Ave., Tehran, Iran mohammadreza@jooyandeh.info,mohades@aut.ac.ir

More information

Revision of Inconsistent Orthographic Views

Revision of Inconsistent Orthographic Views Journal for Geometry and Graphics Volume 2 (1998), No. 1, 45 53 Revision of Inconsistent Orthographic Views Takashi Watanabe School of Informatics and Sciences, Nagoya University, Nagoya 464-8601, Japan

More information

Balanced Box-Decomposition trees for Approximate nearest-neighbor. Manos Thanos (MPLA) Ioannis Emiris (Dept Informatics) Computational Geometry

Balanced Box-Decomposition trees for Approximate nearest-neighbor. Manos Thanos (MPLA) Ioannis Emiris (Dept Informatics) Computational Geometry Balanced Box-Decomposition trees for Approximate nearest-neighbor 11 Manos Thanos (MPLA) Ioannis Emiris (Dept Informatics) Computational Geometry Nearest Neighbor A set S of n points is given in some metric

More information

Rank Measures for Ordering

Rank Measures for Ordering Rank Measures for Ordering Jin Huang and Charles X. Ling Department of Computer Science The University of Western Ontario London, Ontario, Canada N6A 5B7 email: fjhuang33, clingg@csd.uwo.ca Abstract. Many

More information

A GENTLE INTRODUCTION TO THE BASIC CONCEPTS OF SHAPE SPACE AND SHAPE STATISTICS

A GENTLE INTRODUCTION TO THE BASIC CONCEPTS OF SHAPE SPACE AND SHAPE STATISTICS A GENTLE INTRODUCTION TO THE BASIC CONCEPTS OF SHAPE SPACE AND SHAPE STATISTICS HEMANT D. TAGARE. Introduction. Shape is a prominent visual feature in many images. Unfortunately, the mathematical theory

More information

Chapter 14 Global Search Algorithms

Chapter 14 Global Search Algorithms Chapter 14 Global Search Algorithms An Introduction to Optimization Spring, 2015 Wei-Ta Chu 1 Introduction We discuss various search methods that attempts to search throughout the entire feasible set.

More information

Unit Activity Correlations to Common Core State Standards. Geometry. Table of Contents. Geometry 1 Statistics and Probability 8

Unit Activity Correlations to Common Core State Standards. Geometry. Table of Contents. Geometry 1 Statistics and Probability 8 Unit Activity Correlations to Common Core State Standards Geometry Table of Contents Geometry 1 Statistics and Probability 8 Geometry Experiment with transformations in the plane 1. Know precise definitions

More information

C-NBC: Neighborhood-Based Clustering with Constraints

C-NBC: Neighborhood-Based Clustering with Constraints C-NBC: Neighborhood-Based Clustering with Constraints Piotr Lasek Chair of Computer Science, University of Rzeszów ul. Prof. St. Pigonia 1, 35-310 Rzeszów, Poland lasek@ur.edu.pl Abstract. Clustering is

More information

Clustering from Data Streams

Clustering from Data Streams Clustering from Data Streams João Gama LIAAD-INESC Porto, University of Porto, Portugal jgama@fep.up.pt 1 Introduction 2 Clustering Micro Clustering 3 Clustering Time Series Growing the Structure Adapting

More information

Lecture 2 September 3

Lecture 2 September 3 EE 381V: Large Scale Optimization Fall 2012 Lecture 2 September 3 Lecturer: Caramanis & Sanghavi Scribe: Hongbo Si, Qiaoyang Ye 2.1 Overview of the last Lecture The focus of the last lecture was to give

More information

Geometry Spring 2017 Item Release

Geometry Spring 2017 Item Release Geometry Spring 2017 Item Release 1 Geometry Reporting Category: Congruence and Proof Question 2 16743 20512 Content Cluster: Use coordinates to prove simple geometric theorems algebraically and to verify

More information

Shape spaces. Shape usually defined explicitly to be residual: independent of size.

Shape spaces. Shape usually defined explicitly to be residual: independent of size. Shape spaces Before define a shape space, must define shape. Numerous definitions of shape in relation to size. Many definitions of size (which we will explicitly define later). Shape usually defined explicitly

More information

Machine Learning for Signal Processing Clustering. Bhiksha Raj Class Oct 2016

Machine Learning for Signal Processing Clustering. Bhiksha Raj Class Oct 2016 Machine Learning for Signal Processing Clustering Bhiksha Raj Class 11. 13 Oct 2016 1 Statistical Modelling and Latent Structure Much of statistical modelling attempts to identify latent structure in the

More information

9.1. K-means Clustering

9.1. K-means Clustering 424 9. MIXTURE MODELS AND EM Section 9.2 Section 9.3 Section 9.4 view of mixture distributions in which the discrete latent variables can be interpreted as defining assignments of data points to specific

More information

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging 1 CS 9 Final Project Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging Feiyu Chen Department of Electrical Engineering ABSTRACT Subject motion is a significant

More information

THE discrete multi-valued neuron was presented by N.

THE discrete multi-valued neuron was presented by N. Proceedings of International Joint Conference on Neural Networks, Dallas, Texas, USA, August 4-9, 2013 Multi-Valued Neuron with New Learning Schemes Shin-Fu Wu and Shie-Jue Lee Department of Electrical

More information

Digital Image Stabilization and Its Integration with Video Encoder

Digital Image Stabilization and Its Integration with Video Encoder Digital Image Stabilization and Its Integration with Video Encoder Yu-Chun Peng, Hung-An Chang, Homer H. Chen Graduate Institute of Communication Engineering National Taiwan University Taipei, Taiwan {b889189,

More information

Conjectures concerning the geometry of 2-point Centroidal Voronoi Tessellations

Conjectures concerning the geometry of 2-point Centroidal Voronoi Tessellations Conjectures concerning the geometry of 2-point Centroidal Voronoi Tessellations Emma Twersky May 2017 Abstract This paper is an exploration into centroidal Voronoi tessellations, or CVTs. A centroidal

More information