An Accelerated Nearest Neighbor Search Method for the K-Means Clustering Algorithm
Proceedings of the Twenty-Sixth International Florida Artificial Intelligence Research Society Conference

Adam Fausett and M. Emre Celebi
Department of Computer Science, Louisiana State University in Shreveport, Shreveport, LA, USA

Abstract

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, the nearest neighbor search step of this algorithm can be computationally expensive, as the distance between each input vector and all cluster centers needs to be calculated. To accelerate this step, a computationally inexpensive distance estimation method can be tried first, resulting in the rejection of candidate centers that cannot possibly be the nearest center to the input vector under consideration. This way, the computational requirements of the search can be reduced, as most of the full distance computations become unnecessary. In this paper, a fast nearest neighbor search method that rejects impossible centers to accelerate the k-means clustering algorithm is presented. Our method uses geometrical relations among the input vectors and the cluster centers to reject many unlikely centers that are not typically rejected by similar approaches. Experimental results show that the method can reduce the number of distance computations significantly without degrading the clustering accuracy.

1 Introduction

Clustering, the unsupervised classification of patterns into groups, is one of the most important tasks in exploratory data analysis (Jain, Murty, and Flynn 1999). Primary goals of clustering include gaining insight into data (detecting anomalies, identifying salient features, etc.), classifying data, and compressing data. Clustering has a long and rich history in a variety of scientific disciplines including anthropology, biology, medicine, psychology, statistics, mathematics, engineering, and computer science.
As a result, numerous clustering algorithms have been proposed since the early 1950s (Jain 2010). Clustering algorithms can be broadly classified into two groups: hierarchical and partitional (Jain 2010). Hierarchical algorithms recursively find nested clusters either in a top-down (divisive) or bottom-up (agglomerative) fashion. In contrast, partitional algorithms find all the clusters simultaneously as a partition of the data and do not impose a hierarchical structure. Most hierarchical algorithms have quadratic or higher complexity in the number of data points (Jain, Murty, and Flynn 1999) and therefore are not suitable for large data sets, whereas partitional algorithms often have lower complexity.

Given a data set X = {x_1, x_2, ..., x_N} ⊂ R^D, i.e., N vectors each with D attributes, hard partitional algorithms divide X into K exhaustive and mutually exclusive clusters P = {P_1, P_2, ..., P_K}, with ∪_{i=1}^{K} P_i = X and P_i ∩ P_j = ∅ for 1 ≤ i ≠ j ≤ K. These algorithms usually generate clusters by optimizing a criterion function. The most intuitive and frequently used criterion function is the Sum of Squared Error (SSE) given by:

    SSE = Σ_{i=1}^{K} Σ_{x_j ∈ P_i} ||x_j − y_i||²₂    (1)

where ||·||₂ denotes the Euclidean norm and y_i = (1/|P_i|) Σ_{x_j ∈ P_i} x_j is the centroid of cluster P_i, whose cardinality is |P_i|. The number of ways in which a set of N objects can be partitioned into K non-empty groups is given by the Stirling numbers of the second kind:

    S(N, K) = (1/K!) Σ_{i=0}^{K} (−1)^{K−i} (K choose i) i^N    (2)

which can be approximated by K^N / K!. It can be seen that a complete enumeration of all possible clusterings to determine the global minimum of (1) is clearly computationally prohibitive. In fact, this non-convex optimization problem is proven to be NP-hard even for K = 2 (Aloise et al. 2009) or D = 2 (Mahajan, Nimbhorkar, and Varadarajan 2012).

Copyright © 2013, Association for the Advancement of Artificial Intelligence. All rights reserved.
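As a quick illustration (a sketch for small instances, not from the paper), the SSE criterion in (1) and the Stirling count in (2) can be computed directly:

```python
from math import comb, factorial

def sse(points, labels, K):
    """Sum of Squared Error (1): squared Euclidean distance of each
    point to the centroid of its assigned cluster."""
    D = len(points[0])
    total = 0.0
    for k in range(K):
        cluster = [p for p, l in zip(points, labels) if l == k]
        if not cluster:
            continue
        centroid = [sum(p[d] for p in cluster) / len(cluster) for d in range(D)]
        total += sum(sum((p[d] - centroid[d]) ** 2 for d in range(D)) for p in cluster)
    return total

def stirling2(N, K):
    """Stirling number of the second kind (2): the number of ways to
    partition N objects into K non-empty groups."""
    return sum((-1) ** (K - i) * comb(K, i) * i ** N for i in range(K + 1)) // factorial(K)

# Even for modest N and K the number of candidate clusterings is huge,
# which is why exhaustive minimization of (1) is hopeless:
print(stirling2(25, 5))
```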
Consequently, various heuristics have been developed to provide approximate solutions to this problem (Tarsitano 2003). Among these heuristics, Lloyd's algorithm (Lloyd 1982), often referred to as the k-means algorithm, is the simplest and most commonly used one (Celebi and Kingravi 2012; Celebi, Kingravi, and Vela 2013). This algorithm starts with K arbitrary centers, typically chosen uniformly at random from the data points. Each point is assigned to the nearest center, and then each center is recalculated as the mean of all points assigned to it. These two steps are repeated until a predefined termination criterion is met. Algo. 1 shows the pseudocode of this procedure.

Despite its linear time complexity, k-means can be time consuming for large data sets due to its computationally expensive nearest neighbor search step. In this step, the nearest center to each input vector is determined using a full search. That is, given an input vector x, the distance from this vector to each of the K centers is calculated, and then the center with the smallest distance is determined. The dissimilarity of two vectors x = (x_1, x_2, ..., x_D) and y = (y_1, y_2, ..., y_D) is measured using the squared Euclidean distance:

    d²(x, y) = ||x − y||²₂ = Σ_{d=1}^{D} (x_d − y_d)²    (3)

Equation (3) shows that each distance computation requires D multiplications and 2D − 1 additions/subtractions. It is therefore necessary to perform KD multiplications, K(2D − 1) additions/subtractions, and K − 1 comparisons to determine the nearest center to each input vector. In this paper, a novel method is proposed to minimize the number of distance computations necessary to find the nearest center by utilizing the geometrical relationships among the input vector and the cluster centers.

The rest of the paper is organized as follows. Section 2 presents the related work. Section 3 describes the proposed accelerated nearest neighbor search method. Section 4 provides the experimental results. Finally, Section 5 gives the conclusions.

input : X = {x_1, x_2, ..., x_N} ⊂ R^D (N × D input data set)
output: C = {c_1, c_2, ..., c_K} ⊂ R^D (K cluster centers)
Select a random subset C of X as the initial set of cluster centers;
while termination criterion is not met do
    for (i = 1; i ≤ N; i = i + 1) do
        /* Assign x_i to the nearest cluster */
        m[i] = argmin_{k ∈ {1,2,...,K}} ||x_i − c_k||₂;
    /* Recalculate the cluster centers */
    for (k = 1; k ≤ K; k = k + 1) do
        /* Cluster P_k contains the set of points x_i that are nearest to the center c_k */
        P_k = {x_i | m[i] = k};
        /* Calculate the new center c_k as the mean of the points that belong to P_k */
        c_k = (1/|P_k|) Σ_{x_i ∈ P_k} x_i;
Algorithm 1: K-means algorithm

2 Related Work

Numerous methods have been proposed for accelerating the nearest neighbor search step of the k-means algorithm.
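The two alternating steps of Algorithm 1 can be sketched in a few lines of Python (an illustrative sketch that uses unchanged assignments, capped by a fixed iteration limit, as the termination criterion; it is not the authors' code):

```python
import random

def kmeans(X, K, max_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and
    center recomputation until the assignments stop changing."""
    D = len(X[0])
    centers = random.sample(X, K)  # K arbitrary initial centers drawn from the data
    labels = [-1] * len(X)
    for _ in range(max_iter):
        # Assignment step: full nearest neighbor search for every input vector.
        new_labels = [min(range(K),
                          key=lambda k: sum((x[d] - centers[k][d]) ** 2
                                            for d in range(D)))
                      for x in X]
        if new_labels == labels:  # termination criterion: assignments unchanged
            break
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for k in range(K):
            members = [x for x, l in zip(X, labels) if l == k]
            if members:
                centers[k] = tuple(sum(x[d] for x in members) / len(members)
                                   for d in range(D))
    return centers, labels
```

The assignment step is where all KD multiplications per vector are spent, and it is the step the methods surveyed below try to accelerate.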
Note that k-means is also known as the Generalized Lloyd Algorithm (GLA) or the Linde-Buzo-Gray (LBG) algorithm (Linde, Buzo, and Gray 1980) in the vector quantization literature. In this section, we briefly describe several methods that perform well compared to full search.

Triangle Inequality Elimination (TIE) Method

Let x be an input vector and y_l be a known nearby center. Center y_j is a candidate to be the nearest center only if it satisfies d(x, y_j) ≤ d(x, y_l). This inequality constrains the centers which need to be considered; however, testing the inequality requires evaluating d(x, y_j) for each center y_j, which is computationally expensive. To circumvent this, the TIE method bounds d(y_l, y_j) rather than d(x, y_j). Using the triangle inequality, center y_j is a candidate to be the nearest center only if the following condition is satisfied (Huang and Chen 1990; Chen and Hsieh 1991; Orchard 1991; Phillips 2002):

    d(y_j, y_l) ≤ d(x, y_j) + d(x, y_l) ≤ 2d(x, y_l)    (4)

With d(x, y_l) evaluated, (4) makes it possible to eliminate center y_j from consideration by evaluating the distance between y_j and another center, y_l, rather than between y_j and the input vector x. This means that these distance computations do not involve the input vector, and thus they can be precomputed, thereby allowing a large set of centers to be rejected by evaluating the single distance d(x, y_l). TIE stores the distance of each center y_i to every other center in a table. Therefore, the method requires K tables, each with K − 1 entries. The entries in each table are then sorted in increasing order.

Mean-Distance-Ordered Search (MOS) Method

Given an input vector x = (x_1, x_2, ..., x_D) and a center y_i = (y_{i1}, y_{i2}, ..., y_{iD}), the MOS method (Ra and Kim 1993) uses the following inequality between the mean and the distortion:

    (Σ_{d=1}^{D} (x_d − y_{id}))² ≤ D d²(x, y_i)    (5)

In this method, the centers are ordered according to their Euclidean norms prior to nearest neighbor search.
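Inequality (5) turns a D-dimensional distance bound into a comparison of component sums, which is the basis of the MOS rejection test; a minimal sketch (illustrative only, with hypothetical names):

```python
def mos_reject(x, y_j, d2_min):
    """MOS rejection test: by inequality (5), (sum of component
    differences)^2 <= D * d^2(x, y_j). So if that squared sum already
    reaches D * d2_min, then d^2(x, y_j) >= d2_min and y_j cannot beat
    the current nearest center. d2_min is the current minimum *squared*
    Euclidean distance."""
    D = len(x)
    s = sum(x[d] - y_j[d] for d in range(D))
    return s * s >= D * d2_min  # True => reject without a full distance computation
```

The two component sums can each be precomputed once per vector and per center, so after that the test costs O(1) per candidate rather than O(D).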
For a given input vector x and an initial center, the search procedure moves up and down from the initial center alternately in the pre-ordered list. Let y_i be the center with the current minimum distance, d(x, y_i), among the centers examined so far. If a center y_j satisfies (6), we do not need to calculate d(x, y_j) because d(x, y_j) ≥ d(x, y_i):

    D d²(x, y_i) ≤ (Σ_{d=1}^{D} (x_d − y_{jd}))²    (6)

Consequently, (6) does not have to be checked for a center y_k that is higher, or lower, in the pre-ordered list than y_j, as d(x, y_k) ≥ d(x, y_j) ≥ d(x, y_i) (Choi and Chae 2000).

Poggi's (PGG) Method

This method takes advantage of distance estimations to reject potential centers from consideration as the nearest center for an input vector x. Let y_l be the current nearest center to x and let y_j be any other center. The center y_j can be rejected if the following condition is satisfied (Poggi 1993):

    |(1/D) Σ_{d=1}^{D} x_d − (1/D) Σ_{d=1}^{D} y_{jd}| > d(x, y_l) / √D    (7)

If the center y_j is not rejected due to (7), then d(x, y_j) must be computed. The rejection procedure for an input vector x is shown in Algo. 2.

for j = 1 to K do
    r = d(x, y_l) / √D;
    α = β = 0;
    for d = 1 to D do
        α = α + x_d;
        β = β + y_{jd};
    α = α/D; β = β/D;
    if |α − β| > r then
        Reject y_j as a potential nearest center for x;
    else
        Calculate d(x, y_j);
Algorithm 2: Rejection procedure for Poggi's method

Activity Detection (AD) Method

It can be observed that the only changes made by k-means are local changes in the set of centers, and the amount of change differs from center to center. Therefore, some centers will stabilize much faster than others. This information is useful because the activity of the centers can be monitored, indicating whether a center was changed during the previous iteration. The centers can then be classified into two groups: active and static. The cluster of a static center is called a static cluster. The number of static centers increases as the iterations progress (Kaukoranta, Fränti, and Nevalainen 2000).

This activity information is useful for reducing the number of distance computations in the case of vectors in static clusters. Consider an input vector x in a static cluster P_i, and denote the center of this cluster by y_i. It is not difficult to see that if another static center y_j was not the nearest center for x in the previous iteration, it cannot be the nearest center in the current iteration either. A vector in a static cluster therefore cannot move to another static cluster. For these vectors, it is sufficient to calculate the distances only to the active centers, because only they may become closer than the center of the current cluster. This method is also useful for certain input vectors in an active cluster. Consider an input vector x in an active cluster α.
If the center c_α moves closer to x than it was before the update, then x can be treated as if it were in a static cluster. Note that for each input vector x, its distance to its current nearest center must be stored. For the remaining input vectors (whose distance to the current center has increased), a full search must be performed by calculating the distances to all centers.

This method requires that a set of the active centers be maintained. This is done by comparing the new and the previous centers, with at most O(KD) extra work and O(KD) extra space. For each input vector x, the method checks to see if its current nearest center y_i is in the set of active centers. If this is not the case, we perform a search for the nearest center from the set of the active centers by using any fast search method. Otherwise, when x is mapped to an active center y_i, we calculate the distance d(x, y_i). If this distance is greater than the distance in the previous iteration, a full search is necessary. If, however, the distance does not increase, only the set of the active centers is searched. To store the distances of the previous iteration, O(N) extra space is needed.

3 Proposed Accelerated Nearest Neighbor Search Method

Many centers can be omitted from computations based on their geometrical relation to the input vector. Consider an input vector x, its nearest center y_i, and a new potential center y_j. If y_j is located outside the circle centered at x with a radius equal to the Euclidean distance between x and y_i, then it should not be considered as the nearest center, since the distance from x to y_j must be strictly larger than the current nearest center distance. For y_j to be a nearer center for x, it must be inside the described circle. Evaluating whether or not a new center is inside this circle normally requires comparing the Euclidean distance between x and y_j with the radius. Such computations are expensive.
However, they can be largely avoided if the circle is inscribed in a square whose sides are tangent to the circle. Any center that lies outside of the square is also outside of the circle and thus cannot be nearer to x than y_i. Fig. 1 illustrates this in two dimensions.

[Figure 1: Geometrical relations among an input vector x and various cluster centers in two dimensions (Tai and Lin 1996)]

To see this mathematically, let d_min be the Euclidean distance between the input vector x and its nearest center y_i. Any center whose projection on axis d is outside the range [x_d − d_min, x_d + d_min] can be firmly discarded, as it has no chance of being nearer to x. This rule is applied to each axis (attribute) d ≤ D, eliminating any center that lies outside the square in Fig. 1 from further consideration.

The focus can now be narrowed to only those centers that lie inside the square. There exists an in-between region near the corners of the square where it is possible to be inside the square and yet outside the circle. Consider a circle of radius 10, centered at the origin O. The point A(9, 9) is inside the square; however, d(O, A) = √(9² + 9²) = √162 ≈ 12.73, placing it outside of the circle as √162 > 10. A potential center located in this region should be rejected. Determining whether a point is in the in-between region in this manner requires a distance calculation. It is possible to make this decision with a less expensive operation by comparing the City-block distance (8) between O and A to the radius of the circle multiplied by the square root of D (the number of dimensions). Using this approach, T(O, A) = 9 + 9 = 18 > 10√2 ≈ 14.14, so A is rejected as expected.

    T(x, y) = Σ_{d=1}^{D} |x_d − y_d|    (8)

Therefore, if T(x, y_j) ≥ d_min √D, then y_j lies in the region between the circle and the enclosing square and thus can be firmly discarded. If, at this point, the potential center has not been rejected, it becomes the new nearest center, as it lies inside the circle with a radius equal to the distance from x to y_i. The exact distance to the newly found nearest center, y_j, must now be calculated.

The absolute differences calculated in (8) may be stored in an array: V[d] = |x_d − y_d| for d ∈ {1, 2, ..., D}. These values can then be reused in the event that the potential center is not rejected, resulting in the need for a full distance calculation. Eq. (3) then becomes (9), thereby not wasting any of the calculations performed thus far:

    d²(x, y) = Σ_{d=1}^{D} (x_d − y_d)² = Σ_{d=1}^{D} V[d]²    (9)

Implementation

Implementation of the proposed method is fairly straightforward, as shown in Algo. 3.
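The radius-10 example above can be checked numerically (a throwaway sketch, not part of the paper's implementation):

```python
import math

# Circle of radius 10 centered at the origin O; candidate point A(9, 9).
d_min, A = 10.0, (9.0, 9.0)
D = len(A)

# Per-axis test: both coordinates of A lie within [-d_min, +d_min],
# so the enclosing square alone cannot reject A.
assert all(abs(a) <= d_min for a in A)

# City-block test: T(O, A) = 18 exceeds d_min * sqrt(D) ~= 14.14,
# so A is rejected without an exact Euclidean distance computation.
T = sum(abs(a) for a in A)
print(T, d_min * math.sqrt(D))  # 18.0 followed by ~14.14

# The exact distance confirms the rejection: sqrt(162) ~= 12.73 > 10.
assert math.dist((0.0, 0.0), A) > d_min
```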
Here, d_min is the current nearest neighbor distance found at any point in time, and index is the index of x's current nearest center. The RejectionTest procedure is given in Algo. 4. Note that a) it is necessary to set the variable radius equal to the square root of the current minimum distance, i.e., √d_min, rather than simply d_min, as we are using the squared Euclidean distance to measure the dissimilarity of two vectors, and b) the value of D does not change during the course of nearest neighbor search. Therefore, the square root of D is calculated only once, outside of this procedure.

d_min = d(x, y_index);
for j = 1 to K do
    if RejectionTest(x, y_j, d_min) == false then
        dist = 0;
        for d = 1 to D do
            dist = dist + V[d]²;
        d_min = dist;
        index = j;
Algorithm 3: Proposed accelerated nearest neighbor search method

radius = √d_min;
T = 0;
for d = 1 to D do
    V[d] = |x_d − y_d|;
    if V[d] ≥ radius then
        return true;
    T = T + V[d];
if T ≥ radius √D then
    return true;
else
    return false;
Algorithm 4: Rejection test for the proposed method

4 Experimental Results

Eight data sets from the UCI Machine Learning Repository (Frank and Asuncion 2012) were used in the experiments. The data set descriptions are given in Table 1.

Table 1: Descriptions of the Data Sets (N: # points, D: # attributes)
1. Concrete Compressive Strength
2. Covertype
3. Landsat Satellite (Statlog)
4. Letter Recognition
5. Parkinsons
6. Person Activity
7. Shuttle (Statlog)
8. Steel Plates Faults
[The N and D values are not legible in this copy.]

Each method was executed 100 times, with each run initialized using a set of centers randomly chosen from the input vectors. Tables 2 and 4 give the average percentage of distance computations and the average percentage of centers rejected per iteration for K ∈ {2, 4, 8, 16, 32}, respectively. Note that in the former table smaller values are better, whereas in the latter one larger values are better. Tables 3 and 5 summarize Tables 2 and 4, respectively, by averaging each method across the eight data sets for each K value.
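Algorithms 3 and 4 combine into a compact sketch (an illustration, not the authors' code; it adds an explicit final comparison of the exact distance against the current minimum, since the two cheap tests only prove that a candidate is outside the enclosing square or the corner region, not that it is inside the circle):

```python
import math

def nearest_center(x, centers):
    """Accelerated nearest neighbor search: per-axis and City-block
    rejection tests avoid most full squared-Euclidean computations."""
    D = len(x)
    sqrt_D = math.sqrt(D)  # computed once, outside the rejection test
    # Start from an arbitrary candidate (index 0) as the current nearest.
    index = 0
    d2_min = sum((x[d] - centers[0][d]) ** 2 for d in range(D))
    for j in range(1, len(centers)):
        radius = math.sqrt(d2_min)  # d2_min is a *squared* distance
        y = centers[j]
        V = []           # absolute per-axis differences, reused for the full distance
        T = 0.0          # running City-block distance
        rejected = False
        for d in range(D):
            v = abs(x[d] - y[d])
            if v >= radius:          # outside the enclosing square: discard
                rejected = True
                break
            V.append(v)
            T += v
        if rejected or T >= radius * sqrt_D:   # corner region between square and circle
            continue
        dist = sum(v * v for v in V)           # full distance (9), reusing V
        if dist < d2_min:                      # final exact check
            d2_min, index = dist, j
    return index, d2_min
```

For example, nearest_center((9.0, 1.0), [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]) identifies the third center, computing full distances only for candidates that survive both cheap tests.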
It can be seen that, when compared to the other methods, the proposed method reduces the number of distance computations significantly and rejects a substantial number of centers in each iteration. Furthermore, the advantage of our method is more prominent for larger K values.

[Table 2: % of full distance computations per iteration; Table 3: Summary of Table 2; Table 4: % of rejected centers per iteration; Table 5: Summary of Table 4. The numeric entries for the PGG, AD, MOS, TIE, and Proposed methods are not legible in this copy.]

5 Conclusions

In this paper, we presented a novel approach to accelerate the k-means clustering algorithm based on an efficient nearest neighbor search method. The method exploits the geometrical relations among the input vectors and the cluster centers to reduce the number of necessary distance computations without compromising the clustering accuracy. Experiments on a diverse collection of data sets from the UCI Machine Learning Repository demonstrated the superiority of our method over four well-known methods.

6 Acknowledgments

This publication was made possible by a grant from the National Science Foundation ( ).

References

Aloise, D.; Deshpande, A.; Hansen, P.; and Popat, P. 2009. NP-Hardness of Euclidean Sum-of-Squares Clustering. Machine Learning 75(2).
Celebi, M. E., and Kingravi, H. 2012. Deterministic Initialization of the K-Means Algorithm Using Hierarchical Clustering. International Journal of Pattern Recognition and Artificial Intelligence 26(7).
Celebi, M. E.; Kingravi, H.; and Vela, P. A. 2013. A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm. Expert Systems with Applications 40(1).
Chen, S. H., and Hsieh, W. M. 1991. Fast Algorithm for VQ Codebook Design. IEE Proceedings I: Communications, Speech and Vision 138(5).
Choi, S. Y., and Chae, S. I. 2000. Extended Mean-Distance-Ordered Search Using Multiple l1 and l2 Inequalities for Fast Vector Quantization. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 47(4).
Frank, A., and Asuncion, A. 2012. UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences.
Huang, S. H., and Chen, S. H. 1990. Fast Encoding Algorithm for VQ-based Image Coding. Electronics Letters 26(19).
Jain, A. K.; Murty, M. N.; and Flynn, P. J. 1999. Data Clustering: A Review. ACM Computing Surveys 31(3).
Jain, A. K. 2010. Data Clustering: 50 Years Beyond K-means. Pattern Recognition Letters 31(8).
Kaukoranta, T.; Fränti, P.; and Nevalainen, O. 2000. A Fast Exact GLA Based on Code Vector Activity Detection. IEEE Transactions on Image Processing 9(8).
Linde, Y.; Buzo, A.; and Gray, R. 1980. An Algorithm for Vector Quantizer Design. IEEE Transactions on Communications 28(1).
Lloyd, S. 1982. Least Squares Quantization in PCM. IEEE Transactions on Information Theory 28(2).
Mahajan, M.; Nimbhorkar, P.; and Varadarajan, K. 2012. The Planar k-Means Problem is NP-hard. Theoretical Computer Science 442.
Orchard, M. T. 1991. A Fast Nearest-Neighbor Search Algorithm. In Proceedings of the 1991 International Conference on Acoustics, Speech, and Signal Processing, volume 4.
Phillips, S. 2002. Acceleration of K-Means and Related Clustering Algorithms. In Proceedings of the 4th International Workshop on Algorithm Engineering and Experiments.
Poggi, G. 1993. Fast Algorithm for Full-Search VQ Encoding. Electronics Letters 29(12).
Ra, S. W., and Kim, J. K. 1993. A Fast Mean-Distance-Ordered Partial Codebook Search Algorithm for Image Vector Quantization. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 40(9).
Tai, S. C.; Lai, C. C.; and Lin, Y. C. 1996. Two Fast Nearest Neighbor Searching Algorithms for Image Vector Quantization. IEEE Transactions on Communications 44(12).
Tarsitano, A. 2003. A Computational Study of Several Relocation Methods for K-Means Algorithms. Pattern Recognition 36(12).
Consensus clustering by graph based approach Haytham Elghazel 1, Khalid Benabdeslemi 1 and Fatma Hamdi 2 1- University of Lyon 1, LIESP, EA4125, F-69622 Villeurbanne, Lyon, France; {elghazel,kbenabde}@bat710.univ-lyon1.fr
More information9.1. K-means Clustering
424 9. MIXTURE MODELS AND EM Section 9.2 Section 9.3 Section 9.4 view of mixture distributions in which the discrete latent variables can be interpreted as defining assignments of data points to specific
More informationRecent Progress on RAIL: Automating Clustering and Comparison of Different Road Classification Techniques on High Resolution Remotely Sensed Imagery
Recent Progress on RAIL: Automating Clustering and Comparison of Different Road Classification Techniques on High Resolution Remotely Sensed Imagery Annie Chen ANNIEC@CSE.UNSW.EDU.AU Gary Donovan GARYD@CSE.UNSW.EDU.AU
More informationCompression of Image Using VHDL Simulation
Compression of Image Using VHDL Simulation 1) Prof. S. S. Mungona (Assistant Professor, Sipna COET, Amravati). 2) Vishal V. Rathi, Abstract : Maintenance of all essential information without any deletion
More informationEfficiency of k-means and K-Medoids Algorithms for Clustering Arbitrary Data Points
Efficiency of k-means and K-Medoids Algorithms for Clustering Arbitrary Data Points Dr. T. VELMURUGAN Associate professor, PG and Research Department of Computer Science, D.G.Vaishnav College, Chennai-600106,
More informationCHAPTER 4: CLUSTER ANALYSIS
CHAPTER 4: CLUSTER ANALYSIS WHAT IS CLUSTER ANALYSIS? A cluster is a collection of data-objects similar to one another within the same group & dissimilar to the objects in other groups. Cluster analysis
More informationRoad map. Basic concepts
Clustering Basic concepts Road map K-means algorithm Representation of clusters Hierarchical clustering Distance functions Data standardization Handling mixed attributes Which clustering algorithm to use?
More informationCHAPTER 6 INFORMATION HIDING USING VECTOR QUANTIZATION
CHAPTER 6 INFORMATION HIDING USING VECTOR QUANTIZATION In the earlier part of the thesis different methods in the spatial domain and transform domain are studied This chapter deals with the techniques
More informationFigure (5) Kohonen Self-Organized Map
2- KOHONEN SELF-ORGANIZING MAPS (SOM) - The self-organizing neural networks assume a topological structure among the cluster units. - There are m cluster units, arranged in a one- or two-dimensional array;
More informationA SURVEY ON CLUSTERING ALGORITHMS Ms. Kirti M. Patil 1 and Dr. Jagdish W. Bakal 2
Ms. Kirti M. Patil 1 and Dr. Jagdish W. Bakal 2 1 P.G. Scholar, Department of Computer Engineering, ARMIET, Mumbai University, India 2 Principal of, S.S.J.C.O.E, Mumbai University, India ABSTRACT Now a
More informationHard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering
An unsupervised machine learning problem Grouping a set of objects in such a way that objects in the same group (a cluster) are more similar (in some sense or another) to each other than to those in other
More informationA CSP Search Algorithm with Reduced Branching Factor
A CSP Search Algorithm with Reduced Branching Factor Igor Razgon and Amnon Meisels Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84-105, Israel {irazgon,am}@cs.bgu.ac.il
More informationBinary vector quantizer design using soft centroids
Signal Processing: Image Communication 14 (1999) 677}681 Binary vector quantizer design using soft centroids Pasi FraK nti *, Timo Kaukoranta Department of Computer Science, University of Joensuu, P.O.
More informationSeismic regionalization based on an artificial neural network
Seismic regionalization based on an artificial neural network *Jaime García-Pérez 1) and René Riaño 2) 1), 2) Instituto de Ingeniería, UNAM, CU, Coyoacán, México D.F., 014510, Mexico 1) jgap@pumas.ii.unam.mx
More informationClustering: K-means and Kernel K-means
Clustering: K-means and Kernel K-means Piyush Rai Machine Learning (CS771A) Aug 31, 2016 Machine Learning (CS771A) Clustering: K-means and Kernel K-means 1 Clustering Usually an unsupervised learning problem
More informationHierarchical Ordering for Approximate Similarity Ranking
Hierarchical Ordering for Approximate Similarity Ranking Joselíto J. Chua and Peter E. Tischer School of Computer Science and Software Engineering Monash University, Victoria 3800, Australia jjchua@mail.csse.monash.edu.au
More information2. CNeT Architecture and Learning 2.1. Architecture The Competitive Neural Tree has a structured architecture. A hierarchy of identical nodes form an
Competitive Neural Trees for Vector Quantization Sven Behnke and Nicolaos B. Karayiannis Department of Mathematics Department of Electrical and Computer Science and Computer Engineering Martin-Luther-University
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 4th, 2014 Wolf-Tilo Balke and José Pinto Institut für Informationssysteme Technische Universität Braunschweig The Cluster
More informationClustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York
Clustering Robert M. Haralick Computer Science, Graduate Center City University of New York Outline K-means 1 K-means 2 3 4 5 Clustering K-means The purpose of clustering is to determine the similarity
More informationA LOSSLESS INDEX CODING ALGORITHM AND VLSI DESIGN FOR VECTOR QUANTIZATION
A LOSSLESS INDEX CODING ALGORITHM AND VLSI DESIGN FOR VECTOR QUANTIZATION Ming-Hwa Sheu, Sh-Chi Tsai and Ming-Der Shieh Dept. of Electronic Eng., National Yunlin Univ. of Science and Technology, Yunlin,
More informationCluster Analysis. Jia Li Department of Statistics Penn State University. Summer School in Statistics for Astronomers IV June 9-14, 2008
Cluster Analysis Jia Li Department of Statistics Penn State University Summer School in Statistics for Astronomers IV June 9-1, 8 1 Clustering A basic tool in data mining/pattern recognition: Divide a
More informationData Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation
Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization
More informationEfficient Method for Half-Pixel Block Motion Estimation Using Block Differentials
Efficient Method for Half-Pixel Block Motion Estimation Using Block Differentials Tuukka Toivonen and Janne Heikkilä Machine Vision Group Infotech Oulu and Department of Electrical and Information Engineering
More information26 The closest pair problem
The closest pair problem 1 26 The closest pair problem Sweep algorithms solve many kinds of proximity problems efficiently. We present a simple sweep that solves the two-dimensional closest pair problem
More informationAPPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE A REVIEW ARTICLE
APPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE A REVIEW ARTICLE Sundari NallamReddy, Samarandra Behera, Sanjeev Karadagi, Dr. Anantha Desik ABSTRACT: Tata
More informationAssociative Cellular Learning Automata and its Applications
Associative Cellular Learning Automata and its Applications Meysam Ahangaran and Nasrin Taghizadeh and Hamid Beigy Department of Computer Engineering, Sharif University of Technology, Tehran, Iran ahangaran@iust.ac.ir,
More informationHartigan s Method for K-modes Clustering and Its Advantages
Proceedings of the Twelfth Australasian Data Mining Conference (AusDM 201), Brisbane, Australia Hartigan s Method for K-modes Clustering and Its Advantages Zhengrong Xiang 1 Md Zahidul Islam 2 1 College
More informationAccelerating Unique Strategy for Centroid Priming in K-Means Clustering
IJIRST International Journal for Innovative Research in Science & Technology Volume 3 Issue 07 December 2016 ISSN (online): 2349-6010 Accelerating Unique Strategy for Centroid Priming in K-Means Clustering
More informationPattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition
Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant
More informationIntroduction to Pattern Recognition Part II. Selim Aksoy Bilkent University Department of Computer Engineering
Introduction to Pattern Recognition Part II Selim Aksoy Bilkent University Department of Computer Engineering saksoy@cs.bilkent.edu.tr RETINA Pattern Recognition Tutorial, Summer 2005 Overview Statistical
More informationThis blog addresses the question: how do we determine the intersection of two circles in the Cartesian plane?
Intersecting Circles This blog addresses the question: how do we determine the intersection of two circles in the Cartesian plane? This is a problem that a programmer might have to solve, for example,
More informationClustering in Data Mining
Clustering in Data Mining Classification Vs Clustering When the distribution is based on a single parameter and that parameter is known for each object, it is called classification. E.g. Children, young,
More informationA Graph Theoretic Approach to Image Database Retrieval
A Graph Theoretic Approach to Image Database Retrieval Selim Aksoy and Robert M. Haralick Intelligent Systems Laboratory Department of Electrical Engineering University of Washington, Seattle, WA 98195-2500
More informationParticle Swarm Optimization applied to Pattern Recognition
Particle Swarm Optimization applied to Pattern Recognition by Abel Mengistu Advisor: Dr. Raheel Ahmad CS Senior Research 2011 Manchester College May, 2011-1 - Table of Contents Introduction... - 3 - Objectives...
More informationFast nearest-neighbor search algorithms based on approximation-elimination search
Pattern Recognition 33 (2000 1497}1510 Fast nearest-neighbor search algorithms based on approximation-elimination search V. Ramasubramanian, Kuldip K. Paliwal* Computer Systems and Communications Group,
More informationMultilevel Compression Scheme using Vector Quantization for Image Compression
Multilevel Compression Scheme using Vector Quantization for Image Compression S.Vimala, B.Abidha, and P.Uma Abstract In this paper, we have proposed a multi level compression scheme, in which the initial
More informationRelative Constraints as Features
Relative Constraints as Features Piotr Lasek 1 and Krzysztof Lasek 2 1 Chair of Computer Science, University of Rzeszow, ul. Prof. Pigonia 1, 35-510 Rzeszow, Poland, lasek@ur.edu.pl 2 Institute of Computer
More informationClustering. Unsupervised Learning
Clustering. Unsupervised Learning Maria-Florina Balcan 04/06/2015 Reading: Chapter 14.3: Hastie, Tibshirani, Friedman. Additional resources: Center Based Clustering: A Foundational Perspective. Awasthi,
More informationA note on quickly finding the nearest neighbour
A note on quickly finding the nearest neighbour David Barber Department of Computer Science University College London May 19, 2014 1 Finding your nearest neighbour quickly Consider that we have a set of
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.
More informationDigital Signal Processing
Digital Signal Processing 20 (2010) 1494 1501 Contents lists available at ScienceDirect Digital Signal Processing www.elsevier.com/locate/dsp Fast k-nearest neighbors search using modified principal axis
More informationHierarchical Minimum Spanning Trees for Lossy Image Set Compression
Hierarchical Minimum Spanning Trees for Lossy Image Set Compression Anthony Schmieder, Barry Gergel, and Howard Cheng Department of Mathematics and Computer Science University of Lethbridge, Alberta, Canada
More informationDecision Making. final results. Input. Update Utility
Active Handwritten Word Recognition Jaehwa Park and Venu Govindaraju Center of Excellence for Document Analysis and Recognition Department of Computer Science and Engineering State University of New York
More informationBipartite Graph Partitioning and Content-based Image Clustering
Bipartite Graph Partitioning and Content-based Image Clustering Guoping Qiu School of Computer Science The University of Nottingham qiu @ cs.nott.ac.uk Abstract This paper presents a method to model the
More informationINF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering
INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering Erik Velldal University of Oslo Sept. 18, 2012 Topics for today 2 Classification Recap Evaluating classifiers Accuracy, precision,
More informationChapter 2 Basic Structure of High-Dimensional Spaces
Chapter 2 Basic Structure of High-Dimensional Spaces Data is naturally represented geometrically by associating each record with a point in the space spanned by the attributes. This idea, although simple,
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationGeometric Computations for Simulation
1 Geometric Computations for Simulation David E. Johnson I. INTRODUCTION A static virtual world would be boring and unlikely to draw in a user enough to create a sense of immersion. Simulation allows things
More informationEfficient and optimal block matching for motion estimation
Efficient and optimal block matching for motion estimation Stefano Mattoccia Federico Tombari Luigi Di Stefano Marco Pignoloni Department of Electronics Computer Science and Systems (DEIS) Viale Risorgimento
More informationK-modes Clustering Algorithm for Categorical Data
K-modes Clustering Algorithm for Categorical Data Neha Sharma Samrat Ashok Technological Institute Department of Information Technology, Vidisha, India Nirmal Gaud Samrat Ashok Technological Institute
More informationA Study of Hierarchical and Partitioning Algorithms in Clustering Methods
A Study of Hierarchical Partitioning Algorithms in Clustering Methods T. NITHYA Dr.E.RAMARAJ Ph.D., Research Scholar Dept. of Computer Science Engg. Alagappa University Karaikudi-3. th.nithya@gmail.com
More informationBBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler
BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification Classification systems: Supervised learning Make a rational prediction given evidence There are several methods for
More informationUnsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing
Unsupervised Data Mining: Clustering Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 1. Supervised Data Mining Classification Regression Outlier detection
More informationCharacter Recognition
Character Recognition 5.1 INTRODUCTION Recognition is one of the important steps in image processing. There are different methods such as Histogram method, Hough transformation, Neural computing approaches
More informationModule 1 Lecture Notes 2. Optimization Problem and Model Formulation
Optimization Methods: Introduction and Basic concepts 1 Module 1 Lecture Notes 2 Optimization Problem and Model Formulation Introduction In the previous lecture we studied the evolution of optimization
More informationLecture 5 Finding meaningful clusters in data. 5.1 Kleinberg s axiomatic framework for clustering
CSE 291: Unsupervised learning Spring 2008 Lecture 5 Finding meaningful clusters in data So far we ve been in the vector quantization mindset, where we want to approximate a data set by a small number
More informationAn Approach to Polygonal Approximation of Digital CurvesBasedonDiscreteParticleSwarmAlgorithm
Journal of Universal Computer Science, vol. 13, no. 10 (2007), 1449-1461 submitted: 12/6/06, accepted: 24/10/06, appeared: 28/10/07 J.UCS An Approach to Polygonal Approximation of Digital CurvesBasedonDiscreteParticleSwarmAlgorithm
More informationThe Application of K-medoids and PAM to the Clustering of Rules
The Application of K-medoids and PAM to the Clustering of Rules A. P. Reynolds, G. Richards, and V. J. Rayward-Smith School of Computing Sciences, University of East Anglia, Norwich Abstract. Earlier research
More informationToday. Lecture 4: Last time. The EM algorithm. We examine clustering in a little more detail; we went over it a somewhat quickly last time
Today Lecture 4: We examine clustering in a little more detail; we went over it a somewhat quickly last time The CAD data will return and give us an opportunity to work with curves (!) We then examine
More information