An Accelerated Nearest Neighbor Search Method for the K-Means Clustering Algorithm


Proceedings of the Twenty-Sixth International Florida Artificial Intelligence Research Society Conference

An Accelerated Nearest Neighbor Search Method for the K-Means Clustering Algorithm

Adam Fausett and M. Emre Celebi
Department of Computer Science, Louisiana State University in Shreveport, Shreveport, LA, USA

Abstract

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, the nearest neighbor search step of this algorithm can be computationally expensive, as the distance between each input vector and every cluster center needs to be calculated. To accelerate this step, a computationally inexpensive distance estimation method can be tried first, resulting in the rejection of candidate centers that cannot possibly be the nearest center to the input vector under consideration. This way, the computational requirements of the search can be reduced, as most of the full distance computations become unnecessary. In this paper, a fast nearest neighbor search method that rejects impossible centers to accelerate the k-means clustering algorithm is presented. Our method uses geometrical relations among the input vectors and the cluster centers to reject many unlikely centers that are not typically rejected by similar approaches. Experimental results show that the method can reduce the number of distance computations significantly without degrading the clustering accuracy.

1 Introduction

Clustering, the unsupervised classification of patterns into groups, is one of the most important tasks in exploratory data analysis (Jain, Murty, and Flynn 1999). Primary goals of clustering include gaining insight into data (detecting anomalies, identifying salient features, etc.), classifying data, and compressing data. Clustering has a long and rich history in a variety of scientific disciplines including anthropology, biology, medicine, psychology, statistics, mathematics, engineering, and computer science. As a result, numerous clustering algorithms have been proposed since the early 1950s (Jain 2010).

Clustering algorithms can be broadly classified into two groups: hierarchical and partitional (Jain 2010). Hierarchical algorithms recursively find nested clusters either in a top-down (divisive) or bottom-up (agglomerative) fashion. In contrast, partitional algorithms find all the clusters simultaneously as a partition of the data and do not impose a hierarchical structure. Most hierarchical algorithms have quadratic or higher complexity in the number of data points (Jain, Murty, and Flynn 1999) and therefore are not suitable for large data sets, whereas partitional algorithms often have lower complexity.

Given a data set X = {x_1, x_2, ..., x_N} ⊂ R^D, i.e., N vectors each with D attributes, hard partitional algorithms divide X into K exhaustive and mutually exclusive clusters P = {P_1, P_2, ..., P_K}, with ∪_{i=1}^{K} P_i = X and P_i ∩ P_j = ∅ for 1 ≤ i ≠ j ≤ K. These algorithms usually generate clusters by optimizing a criterion function. The most intuitive and frequently used criterion function is the Sum of Squared Error (SSE) given by:

    SSE = \sum_{i=1}^{K} \sum_{x_j \in P_i} \| x_j - y_i \|_2^2    (1)

where ‖·‖_2 denotes the Euclidean norm and y_i = (1/|P_i|) Σ_{x_j ∈ P_i} x_j is the centroid of cluster P_i, whose cardinality is |P_i|. The number of ways in which a set of N objects can be partitioned into K non-empty groups is given by the Stirling numbers of the second kind:

    S(N, K) = \frac{1}{K!} \sum_{i=0}^{K} (-1)^{K-i} \binom{K}{i} i^N    (2)

which can be approximated by K^N / K!.
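As a quick illustration of how fast this count grows, the short Python sketch below evaluates (2) directly for a few small cases (the function name and the chosen values are illustrative, not from the paper):

    from math import comb, factorial

    def stirling2(n, k):
        # Stirling number of the second kind via the explicit sum in Eq. (2)
        return sum((-1) ** (k - i) * comb(k, i) * i ** n for i in range(k + 1)) // factorial(k)

    print(stirling2(10, 3))   # 9330 distinct 3-way partitions of only 10 points
    print(stirling2(100, 5))  # already far too many clusterings to enumerate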
It can be seen that a complete enumeration of all possible clusterings to determine the global minimum of (1) is clearly computationally prohibitive. In fact, this non-convex optimization problem is proven to be NP-hard even for K = 2 (Aloise et al. 2009) or D = 2 (Mahajan, Nimbhorkar, and Varadarajan 2012). Consequently, various heuristics have been developed to provide approximate solutions to this problem (Tarsitano 2003). Among these heuristics, Lloyd's algorithm (Lloyd 1982), often referred to as the k-means algorithm, is the simplest and most commonly used one (Celebi and Kingravi 2012; Celebi, Kingravi, and Vela 2013). This algorithm starts with K arbitrary centers, typically chosen uniformly at random from the data points. Each point is assigned to the nearest center, and then each center is recalculated as the mean of all points assigned to it. These two steps are repeated until a predefined termination criterion is met. Algo. 1 shows the pseudocode of this procedure. Despite its linear time complexity, k-means can be time consuming for large data sets due to its computationally expensive nearest neighbor search step.
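For reference, a minimal Python sketch of the two alternating steps just described is given below; the nearest neighbor search is performed by brute force, which is exactly the step the methods discussed in this paper try to accelerate (this is an illustrative sketch, not the authors' implementation):

    import numpy as np

    def kmeans(X, K, max_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        N, D = X.shape
        # Initialize the centers with K points chosen uniformly at random
        centers = X[rng.choice(N, size=K, replace=False)].copy()
        for _ in range(max_iter):
            # Assignment step: full nearest neighbor search (K distances per point)
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # Update step: recompute each center as the mean of its assigned points
            new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                    else centers[k] for k in range(K)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return centers, labels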

In this step, the nearest center to each input vector is determined using a full search. That is, given an input vector x, the distance from this vector to each of the K centers is calculated and then the center with the smallest distance is determined. The dissimilarity of two vectors x = (x_1, x_2, ..., x_D) and y = (y_1, y_2, ..., y_D) is measured using the squared Euclidean distance:

    d^2(x, y) = \| x - y \|_2^2 = \sum_{d=1}^{D} (x_d - y_d)^2    (3)

Equation (3) shows that each distance computation requires D multiplications and 2D − 1 additions/subtractions. It is therefore necessary to perform KD multiplications, K(2D − 1) additions/subtractions, and K − 1 comparisons to determine the nearest center to each input vector. In this paper, a novel method is proposed to minimize the number of distance computations necessary to find the nearest center by utilizing the geometrical relationships among the input vector and the cluster centers.

The rest of the paper is organized as follows. Section 2 presents the related work. Section 3 describes the proposed accelerated nearest neighbor search method. Section 4 provides the experimental results. Finally, Section 5 gives the conclusions.

input : X = {x_1, x_2, ..., x_N} ⊂ R^D (N × D input data set)
output: C = {c_1, c_2, ..., c_K} ⊂ R^D (K cluster centers)

Select a random subset C of X as the initial set of cluster centers;
while termination criterion is not met do
    for i = 1 to N do
        // Assign x_i to the nearest cluster
        m[i] = argmin_{k ∈ {1,2,...,K}} ||x_i − c_k||_2;
    // Recalculate the cluster centers
    for k = 1 to K do
        // Cluster P_k contains the set of points x_i that are nearest to the center c_k
        P_k = {x_i | m[i] = k};
        // Calculate the new center c_k as the mean of the points that belong to P_k
        c_k = (1 / |P_k|) Σ_{x_i ∈ P_k} x_i;

Algorithm 1: K-Means algorithm

2 Related Work

Numerous methods have been proposed for accelerating the nearest neighbor search step of the k-means algorithm. Note that k-means is also known as the Generalized Lloyd Algorithm (GLA) or the Linde-Buzo-Gray (LBG) algorithm (Linde, Buzo, and Gray 1980) in the vector quantization literature. In this section, we briefly describe several methods that perform well compared to full search.

Triangle Inequality Elimination (TIE) Method

Let x be an input vector and y_l be a known nearby center. Center y_j is a candidate to be the nearest center only if it satisfies d(x, y_j) ≤ d(x, y_l). This inequality constrains the centers which need to be considered; however, testing the inequality requires evaluating d(x, y_j) for each center y_j, which is computationally expensive. To circumvent this, the TIE method bounds d(y_l, y_j) rather than d(x, y_j). Using the triangle inequality, center y_j is a candidate to be the nearest center only if the following condition is satisfied (Huang and Chen 1990; Chen and Hsieh 1991; Orchard 1991; Phillips 2002):

    d(y_j, y_l) \le d(x, y_j) + d(x, y_l) \le 2 d(x, y_l)    (4)

With d(x, y_l) evaluated, (4) makes it possible to eliminate center y_j from consideration by evaluating the distance between y_j and another center, y_l, rather than between y_j and the input vector x. This means that these distance computations do not involve the input vector, and thus they can be precomputed, thereby allowing a large set of centers to be rejected by evaluating the single distance d(x, y_l). TIE stores the distance of each center y_i to every other center in a table. Therefore, the method requires K tables, each with K − 1 entries. The entries in each table are then sorted in increasing order.
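The following Python sketch illustrates the TIE idea (the function names and the per-query loop structure are assumptions, not taken from the cited papers): the center-to-center table is built once per k-means iteration, and a candidate y_j is skipped whenever d(y_j, y_l) exceeds 2 d(x, y_l):

    import numpy as np

    def precompute_center_distances(centers):
        # K x K table of center-to-center distances, computed once per k-means iteration
        return np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)

    def tie_nearest(x, centers, cc, start=0):
        best = start                                # y_l: a known nearby center
        best_dist = np.linalg.norm(x - centers[best])
        for j in range(len(centers)):
            if j == best:
                continue
            # Rejection test (4): y_j cannot be nearer if d(y_j, y_best) > 2 d(x, y_best)
            if cc[j, best] > 2.0 * best_dist:
                continue
            d = np.linalg.norm(x - centers[j])
            if d < best_dist:
                best, best_dist = j, d
        return best, best_dist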
Mean-Distance-Ordered Search (MOS) Method

Given an input vector x = (x_1, x_2, ..., x_D) and a center y_i = (y_{i1}, y_{i2}, ..., y_{iD}), the MOS method (Ra and Kim 1993) uses the following inequality between the mean and the distortion:

    \left( \sum_{d=1}^{D} (x_d - y_{id}) \right)^2 \le D \, d^2(x, y_i)    (5)

In this method, the centers are ordered according to their Euclidean norms prior to nearest neighbor search. For a given input vector x and an initial center, the search procedure moves up and down from the initial center alternately in the pre-ordered list. Let y_i be the center with the current minimum distance, d(x, y_i), among the centers examined so far. If a center y_j satisfies (6), we do not need to calculate d(x, y_j), because d(x, y_j) ≥ d(x, y_i):

    D \, d^2(x, y_i) \le \left( \sum_{d=1}^{D} (x_d - y_{jd}) \right)^2    (6)

Consequently, (6) does not have to be checked for a center y_k that is higher (or lower) in the pre-ordered list than y_j, as d(x, y_k) ≥ d(x, y_j) ≥ d(x, y_i) (Choi and Chae 2000).
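A minimal Python sketch of the bound (6) is shown below; it demonstrates only the rejection test, not the up-and-down traversal of the pre-ordered list (names are illustrative):

    import numpy as np

    def mos_reject(x, y_j, d2_min):
        # Bound (6): if D * d^2(x, y_i) <= (sum_d (x_d - y_jd))^2,
        # then d(x, y_j) >= d(x, y_i) and y_j can be skipped.
        D = x.shape[0]
        s = float(np.sum(x) - np.sum(y_j))   # difference of component sums
        return D * d2_min <= s * s

    # Example usage against a current best squared distance d2_min:
    # if not mos_reject(x, centers[j], d2_min):
    #     d2_min = min(d2_min, float(np.sum((x - centers[j]) ** 2)))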

Poggi's (PGG) Method

This method takes advantage of distance estimations to reject potential centers from consideration as the nearest center for an input vector x. Let y_l be the current nearest center to x and let y_j be any other center. The center y_j can be rejected if the following condition is satisfied (Poggi 1993):

    \left| \frac{1}{D} \sum_{d=1}^{D} x_d - \frac{1}{D} \sum_{d=1}^{D} y_{jd} \right| > \frac{d(x, y_l)}{\sqrt{D}}    (7)

If the center y_j is not rejected due to (7), then d(x, y_j) must be computed. The rejection procedure for an input vector x is shown in Algo. 2.

for j = 1 to K do
    r = d(x, y_l) / √D;
    α = β = 0;
    for d = 1 to D do
        α = α + x_d;
        β = β + y_{jd};
    α = α / D;  β = β / D;
    if |α − β| > r then
        Reject y_j as a potential nearest center for x;
    else
        Calculate d(x, y_j);

Algorithm 2: Rejection procedure for Poggi's method

Activity Detection (AD) Method

It can be observed that the only changes made by k-means are local changes in the set of centers, and the amount of change differs from center to center. Therefore, some centers will stabilize much faster than others. This information is useful because the activity of the centers can be monitored, indicating whether a center was changed during the previous iteration. The centers can then be classified into two groups: active and static. The cluster of a static center is called a static cluster. The number of static centers increases as the iterations progress (Kaukoranta, Fränti, and Nevalainen 2000).

This activity information is useful for reducing the number of distance computations in the case of vectors in static clusters. Consider an input vector x in a static cluster P_i, and denote the center of this cluster by y_i. It is not difficult to see that if another static center y_j was not the nearest center for x in the previous iteration, it cannot be the nearest center in the current iteration either. A vector in a static cluster therefore cannot move to another static cluster. For these vectors, it is sufficient to calculate the distances only to the active centers, because only they may become closer than the center of the current cluster.

This method is also useful for certain input vectors in an active cluster. Consider an input vector x in an active cluster α. If the center c_α moves closer to x than it was before the update, then x can be treated as if it were in a static cluster. Note that for each input vector x, its distance to the current nearest center must be stored. For the remaining input vectors (whose distance to the current center has increased), a full search must be performed by calculating the distances to all centers.

This method requires that a set of the active centers be maintained. This is done by comparing the new and the previous centers with at most O(KD) extra work and O(KD) extra space. For each input vector x, the method checks whether its current nearest center y_i is in the set of active centers. If this is not the case, we perform a search for the nearest center from the set of the active centers using any fast search method. Otherwise, when x is mapped to an active center y_i, we calculate the distance d(x, y_i). If this distance is greater than the distance in the previous iteration, a full search is necessary. If, however, the distance does not increase, only the set of the active centers is used. To store the distances of the previous iteration, O(N) extra space is needed.
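To illustrate the bookkeeping the AD method relies on, the sketch below flags the active centers after an update step by comparing the new centers against the previous ones (a hypothetical helper, not from the cited paper):

    import numpy as np

    def find_active_centers(prev_centers, new_centers, tol=0.0):
        # A center is "active" if it moved during the last update step; O(KD) work.
        moved = np.linalg.norm(new_centers - prev_centers, axis=1) > tol
        return np.flatnonzero(moved)          # indices of active centers

    # During the assignment step, a vector whose cluster is static and whose
    # distance to its center did not increase only needs to be compared against
    # the centers in: active = find_active_centers(prev_centers, centers)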
3 Proposed Accelerated Nearest Neighbor Search Method

Many centers can be omitted from the computations based on their geometrical relation to the input vector. Consider an input vector x, its nearest center y_i, and a new potential center y_j. If y_j is located outside the circle centered at x with a radius equal to the Euclidean distance between x and y_i, then it should not be considered as the nearest center, since the distance from x to y_j must be strictly larger than the current nearest center distance. For y_j to be a nearer center for x, it must be inside the described circle.

Evaluating whether or not a new center is inside this circle normally requires comparing the Euclidean distance between x and y_j with the radius. Such computations are expensive. However, they can be largely avoided if the circle is inscribed in a square whose sides are tangent to the circle. Any center that lies outside of the square is also outside of the circle and thus cannot be nearer to x than y_i. Fig. 1 illustrates this in two dimensions.

Figure 1: Geometrical relations among an input vector x and various cluster centers in two dimensions (Tai and Lin 1996)

To see this mathematically, let d_min be the Euclidean distance between the input vector x and its nearest center y_i. Any center whose projection on axis d falls outside the range [x_d − d_min, x_d + d_min] can be firmly discarded, as it has no chance of being nearer to x. This rule is applied to each axis (attribute) d ∈ {1, 2, ..., D}, eliminating any center that lies outside the square in Fig. 1 from further consideration.

The focus can now be narrowed to only those centers that lie inside the square. There exists an in-between region near the corners of the square where it is possible to be inside the square and yet outside the circle. Consider a circle of radius 10, centered at the origin, O. The point A(9, 9) is inside the square; however, d(O, A) = √(9² + 9²) = √162 ≈ 12.73, placing it outside of the circle since √162 > 10. A potential center located in this region should be rejected. Determining whether a point is in the in-between region in this manner requires a distance calculation. It is possible to make this decision with a less expensive operation by comparing the City-block distance (8) between O and A to the radius of the circle multiplied by the square root of D (the number of dimensions). Using this approach, T(O, A) = 9 + 9 = 18 ≥ 10√2 ≈ 14.14, so A is rejected as expected.

    T(x, y) = \sum_{d=1}^{D} | x_d - y_d |    (8)

Therefore, if T(x, y_j) ≥ d_min √D, then y_j lies in the region between the circle and the enclosing square and thus can be firmly discarded. If, at this point, the potential center has not been rejected, it becomes the new nearest center, as it lies inside the circle with a radius equal to the distance from x to y_i. The exact distance to the newly found nearest center, y_j, must now be calculated. The absolute differences calculated in (8) may be stored in an array: V[d] = |x_d − y_d|, d ∈ {1, 2, ..., D}. These values can then be reused in the event that the potential center is not rejected, resulting in the need for a full distance calculation. Eq. (3) then becomes (9), thereby not wasting any of the calculations performed thus far:

    d^2(x, y) = \sum_{d=1}^{D} (x_d - y_d)^2 = \sum_{d=1}^{D} V[d]^2    (9)

Implementation

Implementation of the proposed method is fairly straightforward, as shown in Algo. 3. Here, d_min is the current nearest neighbor distance found at any point in time, and index is the index of x's current nearest center.

d_min = d(x, y_index);
for j = 1 to K do
    if RejectionTest(x, y_j, d_min) == false then
        dist = 0;
        for d = 1 to D do
            dist = dist + V[d]^2;
        d_min = dist;
        index = j;

Algorithm 3: Proposed accelerated nearest neighbor search method

The RejectionTest procedure is given in Algo. 4. Note that a) it is necessary to set the variable radius equal to the square root of the current minimum distance, i.e., √d_min, rather than simply d_min, as we are using the squared Euclidean distance to measure the dissimilarity of two vectors, and b) the value of D does not change during the course of the nearest neighbor search. Therefore, the square root of D is calculated only once, outside of this method.

radius = √d_min;
T = 0;
for d = 1 to D do
    V[d] = |x_d − y_d|;
    if V[d] ≥ radius then
        return true;
    T = T + V[d];
if T ≥ radius √D then
    return true;
else
    return false;

Algorithm 4: Rejection test for the proposed method
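A direct Python transcription of Algorithms 3 and 4 could look as follows (a sketch; the array handling and function names are illustrative). As in Algo. 3, a candidate that survives the rejection test immediately replaces the current nearest center:

    import numpy as np

    def rejection_test(x, y, radius, sqrt_D, V):
        # Algorithm 4: per-axis test against the enclosing square, then the
        # City-block test against radius * sqrt(D); fills V with |x_d - y_d|.
        T = 0.0
        for d in range(len(x)):
            V[d] = abs(x[d] - y[d])
            if V[d] >= radius:
                return True                   # outside the square: reject
            T += V[d]
        return T >= radius * sqrt_D           # corner region: reject

    def proposed_nearest(x, centers, start=0):
        # Algorithm 3: d_min holds the squared distance to the current nearest center.
        D = len(x)
        sqrt_D = np.sqrt(D)
        V = np.empty(D)
        index = start
        d_min = float(np.sum((x - centers[index]) ** 2))
        for j in range(len(centers)):
            if j == index:
                continue
            if not rejection_test(x, centers[j], np.sqrt(d_min), sqrt_D, V):
                d_min = float(np.sum(V ** 2))   # reuse the differences via Eq. (9)
                index = j
        return index, d_min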
4 Experimental Results

Eight data sets from the UCI Machine Learning Repository (Frank and Asuncion 2012) were used in the experiments. The data set descriptions are given in Table 1.

Table 1: Descriptions of the Data Sets (N: # points, D: # attributes)

ID  Data Set                        N       D
1   Concrete Compressive Strength  1,…     …
2   Covertype                       581,…   …
3   Landsat Satellite (Statlog)     6,…     …
4   Letter Recognition              20,…    …
5   Parkinsons                      5,…     …
6   Person Activity                 164,…   …
7   Shuttle (Statlog)               58,…    …
8   Steel Plates Faults             1,…     …

Each method was executed 100 times, with each run initialized using a set of K centers randomly chosen from the input vectors. Tables 2 and 4 give the average percentage of distance computations and the average percentage of centers rejected per iteration for K ∈ {2, 4, 8, 16, 32}, respectively. Note that in the former table smaller values are better, whereas in the latter one larger values are better. Tables 3 and 5 summarize Tables 2 and 4, respectively, by averaging each method across the eight data sets for each K value. It can be seen that, when compared to the other methods, the proposed method reduces the number of distance computations significantly and rejects a substantial number of centers in each iteration. Furthermore, the advantage of our method is more prominent for larger K values.
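The quantity reported in Tables 2 and 3 can be measured by instrumenting the assignment step as sketched below, where the number of full distance computations is compared against the N × K required by full search (the counting harness and the search_fn interface are assumptions, not the authors' code):

    import numpy as np

    def percent_full_computations(X, centers, search_fn):
        # search_fn(x, centers, full_distance) must call full_distance(x, c) whenever
        # it needs an exact squared distance; this wrapper counts those calls.
        N, K = X.shape[0], centers.shape[0]
        counter = {"full": 0}

        def full_distance(x, c):
            counter["full"] += 1
            return float(np.sum((x - c) ** 2))

        for x in X:
            search_fn(x, centers, full_distance)
        # Full search would need N*K computations; report the percentage relative to that.
        return 100.0 * counter["full"] / (N * K)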

Table 2: % of full distance computations per iteration (PGG, AD, MOS, TIE, and the proposed method on each data set)

Table 3: Summary of Table 2 (each method averaged across the eight data sets)

Table 4: % of rejected centers per iteration (PGG, AD, MOS, TIE, and the proposed method on each data set)

Table 5: Summary of Table 4 (each method averaged across the eight data sets)

5 Conclusions

In this paper, we presented a novel approach to accelerate the k-means clustering algorithm based on an efficient nearest neighbor search method. The method exploits the geometrical relations among the input vectors and the cluster centers to reduce the number of necessary distance computations without compromising the clustering accuracy. Experiments on a diverse collection of data sets from the UCI Machine Learning Repository demonstrated the superiority of our method over four well-known methods.

6 Acknowledgments

This publication was made possible by a grant from the National Science Foundation ( ).

References

Aloise, D.; Deshpande, A.; Hansen, P.; and Popat, P. 2009. NP-Hardness of Euclidean Sum-of-Squares Clustering. Machine Learning 75(2).

Celebi, M. E., and Kingravi, H. 2012. Deterministic Initialization of the K-Means Algorithm Using Hierarchical Clustering. International Journal of Pattern Recognition and Artificial Intelligence 26(7).

Celebi, M. E.; Kingravi, H.; and Vela, P. A. 2013. A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm. Expert Systems with Applications 40(1).

Chen, S. H., and Hsieh, W. M. 1991. Fast Algorithm for VQ Codebook Design. IEE Proceedings I: Communications, Speech and Vision 138(5).

Choi, S. Y., and Chae, S. I. 2000. Extended Mean-Distance-Ordered Search Using Multiple l1 and l2 Inequalities for Fast Vector Quantization. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 47(4).

Frank, A., and Asuncion, A. 2012. UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences.

Huang, S. H., and Chen, S. H. 1990. Fast Encoding Algorithm for VQ-based Image Coding. Electronics Letters 26(19).

Jain, A. K.; Murty, M. N.; and Flynn, P. J. 1999. Data Clustering: A Review. ACM Computing Surveys 31(3).

Jain, A. K. 2010. Data Clustering: 50 Years Beyond K-Means. Pattern Recognition Letters 31(8).

Kaukoranta, T.; Fränti, P.; and Nevalainen, O. 2000. A Fast Exact GLA Based on Code Vector Activity Detection. IEEE Transactions on Image Processing 9(8).

Linde, Y.; Buzo, A.; and Gray, R. 1980. An Algorithm for Vector Quantizer Design. IEEE Transactions on Communications 28(1).

Lloyd, S. 1982. Least Squares Quantization in PCM. IEEE Transactions on Information Theory 28(2).

Mahajan, M.; Nimbhorkar, P.; and Varadarajan, K. 2012. The Planar k-Means Problem is NP-hard. Theoretical Computer Science 442.

Orchard, M. T. 1991. A Fast Nearest-Neighbor Search Algorithm. In Proceedings of the 1991 International Conference on Acoustics, Speech, and Signal Processing, volume 4.

Phillips, S. 2002. Acceleration of K-Means and Related Clustering Algorithms. In Proceedings of the 4th International Workshop on Algorithm Engineering and Experiments.

Poggi, G. 1993. Fast Algorithm for Full-Search VQ Encoding. Electronics Letters 29(12).

Ra, S. W., and Kim, J. 1993. A Fast Mean-Distance-Ordered Partial Codebook Search Algorithm for Image Vector Quantization. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 40(9).

Tai, S. C.; Lai, C. C.; and Lin, Y. C. 1996. Two Fast Nearest Neighbor Searching Algorithms for Image Vector Quantization. IEEE Transactions on Communications 44(12).

Tarsitano, A. 2003. A Computational Study of Several Relocation Methods for K-Means Algorithms. Pattern Recognition 36(12).
