Random Swap algorithm

Size: px
Start display at page:

Download "Random Swap algorithm"

Transcription

1 Random Swap algorithm Pasi Fränti

2 Definitions and data Set of N data points: X={x 1, x 2,, x N } Partition of the data: P={p 1, p 2,,p k }, Set of k cluster prototypes (centroids): C={c 1, c 2,,c k },

3 N i P i i C x d dn MSE N i C x P j i k j i 1, argmin 2 1 k j x C j P j P i j i i 1 1, Clustering problem Objective function: Optimality of partition: Optimality of centroid:

4 K-means algorithm X = Data set C = Cluster centroids P = Partition K-Means(X, C) (C, P) REPEAT C prev C; FOR i=1 TO N DO p i FindNearest(x i, C); FOR j=1 TO k DO c j Average of x i p i = j; Optimal partition Optimal centoids UNTIL C = C prev

5 Problems of k-means

6 Swapping strategy

7 Pigeon hole principle CI = Centroid index: P. Fränti, M. Rezaei and Q. Zhao "Centroid index: cluster level similarity measure Pattern Recognition, 47 (9), , September 2014, 2014.

8 Pigeon hole principle S2

9 Aim of the swap One centroid, but two clusters. Two centroids, but only one cluster. S2

10 Random Swap algorithm Random Swap(X) C, P C Select random representatives(x); P Optimal partition(x, C); REPEAT T times (C new,j) Random swap(x, C); P new Local repartition(x, C new, P, j); C new, P new Kmeans(X, C new, P new ); IF f(c new, P new ) < f(c, P) THEN (C, P) C new, P new ; RETURN (C, P);

11 Steps of the swap 1. Random swap: c j x i j random( 1, k), i random(1, N) 2. Re-allocate vectors from old cluster: p i argmind 1 jk 3. Create new cluster: 2 x, c i p j i j 2 1 p arg min d x, c i, N i k j kp i i k i

12 Swap Swap is made from centroid rich area to centroid poor area.

13 Local re-partition Re-allocate vectors Create new cluster

14 Iterate by k-means 1 st iteration

15 Iterate by k-means 2 nd iteration

16 Iterate by k-means 3 rd iteration

17 Iterate by k-means 16 th iteration

18 Iterate by k-means 17 th iteration

19 Iterate by k-means 18 th iteration

20 Iterate by k-means 19 th iteration

21 Final result 25 iterations

22 Extreme example

23 Dependency on initial solution MSE K-means Random + RS K-means + RS Split + RS Bridge Ward + RS

24 Data sets

25 Data sets

26 Data sets visualized Images Bridge House Miss America Europe 4x4 blocks RGB color 4x4 blocks frame differential Differential coordinates 16-d 3-d 16-d 2-d

27 Data sets visualized Artificial S 1 S 2 S 3 S 4 Unbalance DIM 32

28 Data sets visualized Artificial G G G A 1 A 2 A 3

29 Time complexity

30 Efficiency of the random swap Total time to find correct clustering: Time per iteration Number of iterations Time complexity of single iteration: Swap: O(1) Remove cluster: 2kN/k = O(N) Add cluster: 2N = O(N) Centroids: 2N/k + 2N/k + 2 = O(N/k) K-means: IkN = O(IkN) Bottleneck!

31 Efficiency of the random swap Total time to find correct clustering: Time per iteration Number of iterations Time complexity of single iteration: Swap: O(1) Remove cluster: 2kN/k = O(N) Add cluster: 2N = O(N) Centroids: 2N/k + 2N/k + 2 = O(N/k) (Fast) K-means: 4N = O(N) 2 iterations only! T. Kaukoranta, P. Fränti and O. Nevalainen "A fast exact GLA based on code vector activity detection" IEEE Trans. on Image Processing, 9 (8), , August 2000.

32 Estimated and observed steps Bridge N=4096, k=256, N/k=16, 8

33 Processing time profile 10 8 N = 4096 k = Bridge Time (ms) 6 4 KM 2 39 % 2 KM 1 53 % 0 Local repartition

34 Processing time profile N = k = Birch1 Time (ms) KM 2 48 % CI=0 reached 40 0 KM 1 49 % Local repartition 50 % 44 %

35 Effect of K-means iterations 190 MSE Bridge Time (s)

36 Effect of K-means iterations MSE Birch Time (s)

37 How many swaps?

38 Three types of swaps Trial swap Accepted swap Successful swap MSE improves CI improves Before swap Accepted Successful CI=2 CI=2 CI=

39 Accepted and successful swaps

40 Number of swaps needed Example with 35 clusters A3 CI=4 CI=9

41 Number of swaps needed Example from image quantization K-means clustering result (3 swaps needed) Final clustering result Remove Added Remove

42 Statistical observations N=5000, k=15, d=2, N/k=333, % 15 % 10 % 5 % Median = 26 Average = 35 S1 Max = % cases < 70 0 % Swaps

43 Statistical observations N=5000, k=15, d=2, N/k=333, % 15 % 10 % Median = 18 Average = 25 S4 Max = % 90% cases < 50 0 % Swaps

44 Theoretical estimation

45 Probability of good swap Select a proper prototype to remove: There are k clusters in total: p removal =1/k Select a proper new location: There are N choices: p add =1/N Only k are significantly different: p add =1/k Both happens same time: k 2 significantly different swaps. Probability of each different swap is p swap =1/k 2 Open question: how many of these are good? p = (/k)(/k) = O(/k) 2

46 Expected number of iterations Probability of not finding good swap: q 2 1 k 2 T Estimated number of iterations: log q T 2 T log 1 k 2 log q 2 log 1 2 k

47 Probability of failure (q) depending on T q Iterations

48 Observed probability (%) of fail N=5000, k=15, d=2, N/k=333, q =10% 10 q q =1% q =0.1% 0.1 S 1 -S 4 S1 S2 S3 S4 Dataset

49 Observed probability (%) of fail N=1024, k=16, d=32-128, N/k=64, q Dimensions q =10% q =1% q =0.1% Dim

50 Bounds for the iterations Upper limit: T ln 1 ln q - ln q -ln q α / k α / k k α 2 Lower limit similarly; resulting in: T 2 k - ln q 2 α

51 Multiple swaps (w) Probability for performing less than w swaps: Expected number of iterations: i T i w i k k i T q log 1 ˆ k w k i t w i

52 Expected time complexity tˆ N, k log w N -log 1. Linear dependency on N k α w 2. Quadratic dependency on k (With large number of clusters, it can be too slow) 3. Logarithmic dependency on w (Close to constant) Inverse dependency on (Higher the dimensionality, faster the method) α Nk 2

53 Linear dependency on N N< , k=100, d=2, N/k=1000, % Birch2 Iterations N= % Data size (thousands)

54 Quadratic dependency on k N< , k=100, d=2, N/k=1000, Birch2 Iterations Quadratic fit 75% 25% Clusters

55 Logarithmic dependency on w

56 Theory vs. reality

57 Neighborhood size

58 How much is? Voronoi neighbors Neighbors by distance 1 2 S dim: D-dim: 2(3k-6)/k = 6 12/k 2kD/2/k = O(2k D/2-1) Upper limit: k

59 Observed number of neighbors Data set S 2 Frequency 45 % 40 % 35 % 30 % 25 % 20 % 15 % 10 % 5 % 0 % Average = Number of neighbours

60 Estimate Five iterations of random swap clustering Each pair of prototypes A and B: 1. Calculate the half point HP = (A+B)/2 2. Find the nearest prototype C for HP 3. If C=A or C=B they are potential neighbors. Analyze potential neighbors: 1. Calculate all vector distances across A and B 2. Select the nearest pair (a,b) 3. If d(a,b) < min( d(a,c(a), d(b,c(b) ) then Accept = Number of pairs found / k

61 Observed values of

62 Optimality

63 Multiple optima (=plateaus) Very similar result (<0.3% diff. in MSE) CI-values significantly high (9%) Finds one of the near-optimal solutions

64 Experiments

65 Time-versus-distortion N=4096, k=256, d=16, N/k=16, 5.4 MSE K-means (single) 70 (28) (172) 60 (365) (825) 30 (3k) (146) Random Swap Bridge Repeated k-means ( repeats) (13k) (78k) Time 5 61 (10k) 0 (387k) (813k)

66 Time-versus-distortion N=6480, k=256, d=16, N/k=25, 17.1 MSE K-means (single) (62) Random Swap (501) (1632) (136) Missa Repeated k-means ( repeats) 88 (10k) (23k) (70k) (135k) (940k) Time

67 Time-versus-distortion N= , k=100, d=2, N/k=1000, (2) Birch1 MSE (22) 5 (43) Random Swap K-means 7 (single) (606) 0 Repeated k-means (2500 repeats) Time 3 (489)

68 Time-versus-distortion N= , k=100, d=2, N/k=1000, 3.1 MSE (1) 18 K-means (single) 20 (34) 10 (64) 5 Random Swap Repeated k-means (2500 repeats) (41) 0 (2067) Birch2 9 (845) (2500) Time

69 Time-versus-distortion N= , k=256, d=2, N/k=663, Europe MSE (11) 80 (24) Random Swap 60 (44) K-means (single) (305) Repeated k-means (1888) (9k) (500 repeats) Time 10 (58k) (576k)

70 Time-versus-distortion N=6500, k=8, d=2, N/k=821, (4) K-means (single) Unbalance MSE (20) 2 1 (43) (98) 0 (116) Random swap Repeated k-means Time 2 1 (3.219) 1 (20.000)

71 Variation of results 15 % 10 % Random swap Bridge 5 % Repeated k-means % MSE

72 Variation of results 15 % Random swap Birch1 10 % 5 % 0 % 4.64 Repeated k-means MSE

73 Comparison of algorithms k-means (KM) k-means++ repeated k-means (RKM) x-means agglomerative clustering (AC) random swap (RS) global k-means genetic algorithm D. Arthur and S. Vassilvitskii, "K-means++: the advantages of careful seeding", ACM-SIAM Symp. on Discrete Algorithms (SODA 07), New Orleans, LA, , January, D. Pelleg, and A. Moore, "X-means: Extending k- means with efficient estimation of the number of clusters", Int. Conf. on Machine Learning, (ICML 00), Stanford, CA, USA, June P. Fränti, T. Kaukoranta, D.-F. Shen and K.-S. Chang, "Fast and memory efficient implementation of the exact PNN", IEEE Trans. on Image Processing, 9 (5), , May A. Likas, N. Vlassis and J.J. Verbeek, "The global k-means clustering algorithm", Pattern Recognition 36, , P. Fränti, "Genetic algorithm with deterministic crossover for vector quantization", Pat. Rec. Let., 21 (1), 61-68, January 2000.

74 Processing time

75 Clustering quality

76 Conclusions

77 What we learned? 1. Random swap is efficient algorithm 2. It does not converge to sub-optimal result 3. Expected processing has dependency: Linear O(N) dependency on the size of data Quadratic O(k2) on the number of clusters Inverse O(1/) on the neighborhood size Logarithmic O(log w) on the number of swaps

78 References P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 5:13, 1-29, P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), , P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), , August P. Fränti, O. Virmajoki and V. Hautamäki, Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR 08), Tampa, FL, Dec Pseudo code:

79 Supporting material Implementations available: (C, Matlab, Java, Javascript, R and Python) Interactive animation: Clusterator:

Pasi Fränti K-means properties on six clustering benchmark datasets Pasi Fränti and Sami Sieranoja Algorithms, 2017.

Pasi Fränti K-means properties on six clustering benchmark datasets Pasi Fränti and Sami Sieranoja Algorithms, 2017. K-means properties Pasi Fränti 17.4.2017 K-means properties on six clustering benchmark datasets Pasi Fränti and Sami Sieranoja Algorithms, 2017. Goal of k-means Input N points: X={x 1, x 2,, x N } Output

More information

Efficiency of random swap clustering

Efficiency of random swap clustering https://doi.org/10.1186/s40537-018-0122-y RESEARCH Open Access Efficiency of random swap clustering Pasi Fränti * *Correspondence: franti@cs.uef.fi Machine Learning Group, School of Computing, University

More information

Comparative Study on VQ with Simple GA and Ordain GA

Comparative Study on VQ with Simple GA and Ordain GA Proceedings of the 9th WSEAS International Conference on Automatic Control, Modeling & Simulation, Istanbul, Turkey, May 27-29, 2007 204 Comparative Study on VQ with Simple GA and Ordain GA SADAF SAJJAD

More information

Genetic algorithm with deterministic crossover for vector quantization

Genetic algorithm with deterministic crossover for vector quantization Pattern Recognition Letters 21 (2000) 61±68 www.elsevier.nl/locate/patrec Genetic algorithm with deterministic crossover for vector quantization Pasi Franti * Department of Computer Science, University

More information

Random Projection for k-means Clustering

Random Projection for k-means Clustering Random Projection for k-means Clustering Sami Sieranoja and Pasi Fränti ( ) School of Computing, University of Eastern Finland, Joensuu, Finland {sami.sieranoja,pasi.franti}@uef.fi Abstract. We study how

More information

Codebook generation for Image Compression with Simple and Ordain GA

Codebook generation for Image Compression with Simple and Ordain GA Codebook generation for Image Compression with Simple and Ordain GA SAJJAD MOHSIN, SADAF SAJJAD COMSATS Institute of Information Technology Department of Computer Science Tobe Camp, Abbotabad PAKISTAN

More information

Generalizing centroid index to different clustering models

Generalizing centroid index to different clustering models Generalizing centroid index to different clustering models Pasi Fränti and Mohammad Rezaei University of Eastern Finland Abstract. Centroid index is the only measure that evaluates cluster level differrences

More information

Self-Adaptive Genetic Algorithm for Clustering

Self-Adaptive Genetic Algorithm for Clustering Journal of Heuristics, 9: 113 129, 3 c 3 Kluwer Academic Publishers. Manufactured in The Netherlands. Self-Adaptive Genetic Algorithm for Clustering JUHA KIVIJÄRVI Turku Centre for Computer Science (TUCS),

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution

More information

Dynamic local search algorithm for the clustering problem

Dynamic local search algorithm for the clustering problem UNIVERSITY OF JOENSUU DEPARTMENT OF COMPUTER SCIENCE Report Series A Dynamic local search algorithm for the clustering problem Ismo Kärkkäinen and Pasi Fränti Report A-2002-6 ACM I.4.2, I.5.3 UDK 519.712

More information

Binary vector quantizer design using soft centroids

Binary vector quantizer design using soft centroids Signal Processing: Image Communication 14 (1999) 677}681 Binary vector quantizer design using soft centroids Pasi FraK nti *, Timo Kaukoranta Department of Computer Science, University of Joensuu, P.O.

More information

Iterative split-and-merge algorithm for VQ codebook generation published in Optical Engineering, 37 (10), pp , October 1998

Iterative split-and-merge algorithm for VQ codebook generation published in Optical Engineering, 37 (10), pp , October 1998 Iterative split-and-merge algorithm for VQ codebook generation published in Optical Engineering, 37 (10), pp. 2726-2732, October 1998 Timo Kaukoranta 1, Pasi Fränti 2 and Olli Nevalainen 1 1 Turku Centre

More information

1090 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 26, NO. 5, MAY Centroid Ratio for a Pairwise Random Swap Clustering Algorithm

1090 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 26, NO. 5, MAY Centroid Ratio for a Pairwise Random Swap Clustering Algorithm 1090 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 26, NO. 5, MAY 2014 Centroid Ratio for a Pairwise Random Swap Clustering Algorithm Qinpei Zhao and Pasi Fränti, Senior Member, IEEE Abstract

More information

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of

More information

On the splitting method for vector quantization codebook generation

On the splitting method for vector quantization codebook generation On the splitting method for vector quantization codebook generation Pasi Fränti University of Joensuu Department of Computer Science P.O. Box 111 FIN-80101 Joensuu, Finland E-mail: franti@cs.joensuu.fi

More information

Towards the world s fastest k-means algorithm

Towards the world s fastest k-means algorithm Greg Hamerly Associate Professor Computer Science Department Baylor University Joint work with Jonathan Drake May 15, 2014 Objective function and optimization Lloyd s algorithm 1 The k-means clustering

More information

Clustering methods: Part 7 Outlier removal Pasi Fränti

Clustering methods: Part 7 Outlier removal Pasi Fränti Clustering methods: Part 7 Outlier removal Pasi Fränti 6.5.207 Machine Learning University of Eastern Finland Outlier detection methods Distance-based methods Knorr & Ng Density-based methods KDIST: K

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.

More information

Computer Vision. Exercise Session 10 Image Categorization

Computer Vision. Exercise Session 10 Image Categorization Computer Vision Exercise Session 10 Image Categorization Object Categorization Task Description Given a small number of training images of a category, recognize a-priori unknown instances of that category

More information

Improving K-Means by Outlier Removal

Improving K-Means by Outlier Removal Improving K-Means by Outlier Removal Ville Hautamäki, Svetlana Cherednichenko, Ismo Kärkkäinen, Tomi Kinnunen, and Pasi Fränti Speech and Image Processing Unit, Department of Computer Science, University

More information

Clustering: Centroid-Based Partitioning

Clustering: Centroid-Based Partitioning Clustering: Centroid-Based Partitioning Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong 1 / 29 Y Tao Clustering: Centroid-Based Partitioning In this lecture, we

More information

Clustering. Chapter 10 in Introduction to statistical learning

Clustering. Chapter 10 in Introduction to statistical learning Clustering Chapter 10 in Introduction to statistical learning 16 14 12 10 8 6 4 2 0 2 4 6 8 10 12 14 1 Clustering ² Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 1990). ² What

More information

K-Means Clustering. Sargur Srihari

K-Means Clustering. Sargur Srihari K-Means Clustering Sargur srihari@cedar.buffalo.edu 1 Topics in Mixture Models and EM Mixture models K-means Clustering Mixtures of Gaussians Maximum Likelihood EM for Gaussian mistures EM Algorithm Gaussian

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10. Cluster

More information

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York Clustering Robert M. Haralick Computer Science, Graduate Center City University of New York Outline K-means 1 K-means 2 3 4 5 Clustering K-means The purpose of clustering is to determine the similarity

More information

Non-Parametric Vector Quantization Algorithm

Non-Parametric Vector Quantization Algorithm Non-Parametric Vector Quantization Algorithm 31 Non-Parametric Vector Quantization Algorithm Haemwaan Sivaraks and Athasit Surarerks, Non-members ABSTRACT Recently researches in vector quantization domain

More information

Voronoi Region. K-means method for Signal Compression: Vector Quantization. Compression Formula 11/20/2013

Voronoi Region. K-means method for Signal Compression: Vector Quantization. Compression Formula 11/20/2013 Voronoi Region K-means method for Signal Compression: Vector Quantization Blocks of signals: A sequence of audio. A block of image pixels. Formally: vector example: (0.2, 0.3, 0.5, 0.1) A vector quantizer

More information

Strategies for simulating pedestrian navigation with multiple reinforcement learning agents

Strategies for simulating pedestrian navigation with multiple reinforcement learning agents Strategies for simulating pedestrian navigation with multiple reinforcement learning agents Francisco Martinez-Gil, Miguel Lozano, Fernando Ferna ndez Presented by: Daniel Geschwender 9/29/2016 1 Overview

More information

k-center Problems Joey Durham Graphs, Combinatorics and Convex Optimization Reading Group Summer 2008

k-center Problems Joey Durham Graphs, Combinatorics and Convex Optimization Reading Group Summer 2008 k-center Problems Joey Durham Graphs, Combinatorics and Convex Optimization Reading Group Summer 2008 Outline General problem definition Several specific examples k-center, k-means, k-mediod Approximation

More information

An Adaptive and Deterministic Method for Initializing the Lloyd-Max Algorithm

An Adaptive and Deterministic Method for Initializing the Lloyd-Max Algorithm An Adaptive and Deterministic Method for Initializing the Lloyd-Max Algorithm Jared Vicory and M. Emre Celebi Department of Computer Science Louisiana State University, Shreveport, LA, USA ABSTRACT Gray-level

More information

Clustering. (Part 2)

Clustering. (Part 2) Clustering (Part 2) 1 k-means clustering 2 General Observations on k-means clustering In essence, k-means clustering aims at minimizing cluster variance. It is typically used in Euclidean spaces and works

More information

CSE 40171: Artificial Intelligence. Learning from Data: Unsupervised Learning

CSE 40171: Artificial Intelligence. Learning from Data: Unsupervised Learning CSE 40171: Artificial Intelligence Learning from Data: Unsupervised Learning 32 Homework #6 has been released. It is due at 11:59PM on 11/7. 33 CSE Seminar: 11/1 Amy Reibman Purdue University 3:30pm DBART

More information

Overview. 1 Preliminaries. 2 k-center Clustering. 3 The Greedy Clustering Algorithm. 4 The Greedy permutation. 5 k-median clustering

Overview. 1 Preliminaries. 2 k-center Clustering. 3 The Greedy Clustering Algorithm. 4 The Greedy permutation. 5 k-median clustering KCenter K-Center Isuru Gunasekara University of Ottawa aguna100@uottawa.ca November 21,2016 Overview KCenter 1 2 3 The Algorithm 4 The permutation 5 6 for What is? KCenter The process of finding interesting

More information

Supervised vs. Unsupervised Learning

Supervised vs. Unsupervised Learning Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now

More information

Clustering: An art of grouping related objects

Clustering: An art of grouping related objects Clustering: An art of grouping related objects Sumit Kumar, Sunil Verma Abstract- In today s world, clustering has seen many applications due to its ability of binding related data together but there are

More information

STEPWISE ALGORITHM FOR FINDING UNKNOWN NUMBER OF CLUSTERS. Ismo Kärkkäinen and Pasi Fränti

STEPWISE ALGORITHM FOR FINDING UNKNOWN NUMBER OF CLUSTERS. Ismo Kärkkäinen and Pasi Fränti STEPWISE ALGORITHM FOR FINDING UNKNOWN NUMBER OF CLUSTERS Ismo Käräinen and Pasi Fränti Ismo.Karainen@cs.joensuu.fi, Pasi.Franti@cs.joensuu.fi University of Joensuu, Department of Computer Science, P.O.

More information

N-Candidate methods for location invariant dithering of color images

N-Candidate methods for location invariant dithering of color images Image and Vision Computing 18 (2000) 493 500 www.elsevier.com/locate/imavis N-Candidate methods for location invariant dithering of color images K. Lemström a,1, *, P. Fränti b,2 a Department of Computer

More information

Feature Extractors. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. The Perceptron Update Rule.

Feature Extractors. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. The Perceptron Update Rule. CS 188: Artificial Intelligence Fall 2007 Lecture 26: Kernels 11/29/2007 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit your

More information

Feature-Guided K-Means Algorithm for Optimal Image Vector Quantizer Design

Feature-Guided K-Means Algorithm for Optimal Image Vector Quantizer Design Journal of Information Hiding and Multimedia Signal Processing c 2017 ISSN 2073-4212 Ubiquitous International Volume 8, Number 6, November 2017 Feature-Guided K-Means Algorithm for Optimal Image Vector

More information

Clustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford

Clustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford Department of Engineering Science University of Oxford January 27, 2017 Many datasets consist of multiple heterogeneous subsets. Cluster analysis: Given an unlabelled data, want algorithms that automatically

More information

Unsupervised Learning Hierarchical Methods

Unsupervised Learning Hierarchical Methods Unsupervised Learning Hierarchical Methods Road Map. Basic Concepts 2. BIRCH 3. ROCK The Principle Group data objects into a tree of clusters Hierarchical methods can be Agglomerative: bottom-up approach

More information

Cluster Analysis. Summer School on Geocomputation. 27 June July 2011 Vysoké Pole

Cluster Analysis. Summer School on Geocomputation. 27 June July 2011 Vysoké Pole Cluster Analysis Summer School on Geocomputation 27 June 2011 2 July 2011 Vysoké Pole Lecture delivered by: doc. Mgr. Radoslav Harman, PhD. Faculty of Mathematics, Physics and Informatics Comenius University,

More information

Clustering. Supervised vs. Unsupervised Learning

Clustering. Supervised vs. Unsupervised Learning Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now

More information

ECS 234: Data Analysis: Clustering ECS 234

ECS 234: Data Analysis: Clustering ECS 234 : Data Analysis: Clustering What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed

More information

Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.

Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric. CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance

More information

CS 188: Artificial Intelligence Fall 2008

CS 188: Artificial Intelligence Fall 2008 CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley 1 1 Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance

More information

Randomized rounding of semidefinite programs and primal-dual method for integer linear programming. Reza Moosavi Dr. Saeedeh Parsaeefard Dec.

Randomized rounding of semidefinite programs and primal-dual method for integer linear programming. Reza Moosavi Dr. Saeedeh Parsaeefard Dec. Randomized rounding of semidefinite programs and primal-dual method for integer linear programming Dr. Saeedeh Parsaeefard 1 2 3 4 Semidefinite Programming () 1 Integer Programming integer programming

More information

Kapitel 4: Clustering

Kapitel 4: Clustering Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases WiSe 2017/18 Kapitel 4: Clustering Vorlesung: Prof. Dr.

More information

Master-Worker pattern

Master-Worker pattern COSC 6397 Big Data Analytics Master Worker Programming Pattern Edgar Gabriel Fall 2018 Master-Worker pattern General idea: distribute the work among a number of processes Two logically different entities:

More information

Compression of Image Using VHDL Simulation

Compression of Image Using VHDL Simulation Compression of Image Using VHDL Simulation 1) Prof. S. S. Mungona (Assistant Professor, Sipna COET, Amravati). 2) Vishal V. Rathi, Abstract : Maintenance of all essential information without any deletion

More information

9.913 Pattern Recognition for Vision. Class 8-2 An Application of Clustering. Bernd Heisele

9.913 Pattern Recognition for Vision. Class 8-2 An Application of Clustering. Bernd Heisele 9.913 Class 8-2 An Application of Clustering Bernd Heisele Fall 2003 Overview Problem Background Clustering for Tracking Examples Literature Homework Problem Detect objects on the road: Cars, trucks, motorbikes,

More information

CS229 Final Project: k-means Algorithm

CS229 Final Project: k-means Algorithm CS229 Final Project: k-means Algorithm Colin Wei and Alfred Xue SUNet ID: colinwei axue December 11, 2014 1 Notes This project was done in conjuction with a similar project that explored k-means in CS

More information

Iterative shrinking method for clustering problems

Iterative shrinking method for clustering problems Pattern Recognition 39 (2006) 761 775 www.elsevier.com/locate/patcog Iterative shrinking method for clustering problems Pasi Fränti, Olli Virmajoki Department of Computer Science, Universit of Joensuu,

More information

A Review of K-mean Algorithm

A Review of K-mean Algorithm A Review of K-mean Algorithm Jyoti Yadav #1, Monika Sharma *2 1 PG Student, CSE Department, M.D.U Rohtak, Haryana, India 2 Assistant Professor, IT Department, M.D.U Rohtak, Haryana, India Abstract Cluster

More information

9.1. K-means Clustering

9.1. K-means Clustering 424 9. MIXTURE MODELS AND EM Section 9.2 Section 9.3 Section 9.4 view of mixture distributions in which the discrete latent variables can be interpreted as defining assignments of data points to specific

More information

Master-Worker pattern

Master-Worker pattern COSC 6397 Big Data Analytics Master Worker Programming Pattern Edgar Gabriel Spring 2017 Master-Worker pattern General idea: distribute the work among a number of processes Two logically different entities:

More information

CS 343: Artificial Intelligence

CS 343: Artificial Intelligence CS 343: Artificial Intelligence Kernels and Clustering Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.

More information

TOWARDS NEW ESTIMATING INCREMENTAL DIMENSIONAL ALGORITHM (EIDA)

TOWARDS NEW ESTIMATING INCREMENTAL DIMENSIONAL ALGORITHM (EIDA) TOWARDS NEW ESTIMATING INCREMENTAL DIMENSIONAL ALGORITHM (EIDA) 1 S. ADAEKALAVAN, 2 DR. C. CHANDRASEKAR 1 Assistant Professor, Department of Information Technology, J.J. College of Arts and Science, Pudukkottai,

More information

Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering

Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering Clustering Algorithms Contents K-means Hierarchical algorithms Linkage functions Vector quantization SOM Clustering Formulation

More information

Image Compression with Competitive Networks and Pre-fixed Prototypes*

Image Compression with Competitive Networks and Pre-fixed Prototypes* Image Compression with Competitive Networks and Pre-fixed Prototypes* Enrique Merida-Casermeiro^, Domingo Lopez-Rodriguez^, and Juan M. Ortiz-de-Lazcano-Lobato^ ^ Department of Applied Mathematics, University

More information

Exploratory Data Analysis using Self-Organizing Maps. Madhumanti Ray

Exploratory Data Analysis using Self-Organizing Maps. Madhumanti Ray Exploratory Data Analysis using Self-Organizing Maps Madhumanti Ray Content Introduction Data Analysis methods Self-Organizing Maps Conclusion Visualization of high-dimensional data items Exploratory data

More information

k-means Clustering David S. Rosenberg April 24, 2018 New York University

k-means Clustering David S. Rosenberg April 24, 2018 New York University k-means Clustering David S. Rosenberg New York University April 24, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 April 24, 2018 1 / 19 Contents 1 k-means Clustering 2 k-means:

More information

CHAPTER 6 INFORMATION HIDING USING VECTOR QUANTIZATION

CHAPTER 6 INFORMATION HIDING USING VECTOR QUANTIZATION CHAPTER 6 INFORMATION HIDING USING VECTOR QUANTIZATION In the earlier part of the thesis different methods in the spatial domain and transform domain are studied This chapter deals with the techniques

More information

Kernels and Clustering

Kernels and Clustering Kernels and Clustering Robert Platt Northeastern University All slides in this file are adapted from CS188 UC Berkeley Case-Based Learning Non-Separable Data Case-Based Reasoning Classification from similarity

More information

Available online at ScienceDirect. Procedia Computer Science 89 (2016 )

Available online at   ScienceDirect. Procedia Computer Science 89 (2016 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 89 (2016 ) 778 784 Twelfth International Multi-Conference on Information Processing-2016 (IMCIP-2016) Color Image Compression

More information

Geometric Modeling. Mesh Decimation. Mesh Decimation. Applications. Copyright 2010 Gotsman, Pauly Page 1. Oversampled 3D scan data

Geometric Modeling. Mesh Decimation. Mesh Decimation. Applications. Copyright 2010 Gotsman, Pauly Page 1. Oversampled 3D scan data Applications Oversampled 3D scan data ~150k triangles ~80k triangles 2 Copyright 2010 Gotsman, Pauly Page 1 Applications Overtessellation: E.g. iso-surface extraction 3 Applications Multi-resolution hierarchies

More information

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani Clustering CE-717: Machine Learning Sharif University of Technology Spring 2016 Soleymani Outline Clustering Definition Clustering main approaches Partitional (flat) Hierarchical Clustering validation

More information

Recommendation System Using Yelp Data CS 229 Machine Learning Jia Le Xu, Yingran Xu

Recommendation System Using Yelp Data CS 229 Machine Learning Jia Le Xu, Yingran Xu Recommendation System Using Yelp Data CS 229 Machine Learning Jia Le Xu, Yingran Xu 1 Introduction Yelp Dataset Challenge provides a large number of user, business and review data which can be used for

More information

Contents. List of Figures. List of Tables. List of Algorithms. I Clustering, Data, and Similarity Measures 1

Contents. List of Figures. List of Tables. List of Algorithms. I Clustering, Data, and Similarity Measures 1 Contents List of Figures List of Tables List of Algorithms Preface xiii xv xvii xix I Clustering, Data, and Similarity Measures 1 1 Data Clustering 3 1.1 Definition of Data Clustering... 3 1.2 The Vocabulary

More information

Adaptive Binary Quantization for Fast Nearest Neighbor Search

Adaptive Binary Quantization for Fast Nearest Neighbor Search IBM Research Adaptive Binary Quantization for Fast Nearest Neighbor Search Zhujin Li 1, Xianglong Liu 1*, Junjie Wu 1, and Hao Su 2 1 Beihang University, Beijing, China 2 Stanford University, Stanford,

More information

Neural Network and SVM classification via Decision Trees, Vector Quantization and Simulated Annealing

Neural Network and SVM classification via Decision Trees, Vector Quantization and Simulated Annealing Neural Network and SVM classification via Decision Trees, Vector Quantization and Simulated Annealing JOHN TSILIGARIDIS Mathematics and Computer Science Department Heritage University Toppenish, WA USA

More information

3D Surface Recovery via Deterministic Annealing based Piecewise Linear Surface Fitting Algorithm

3D Surface Recovery via Deterministic Annealing based Piecewise Linear Surface Fitting Algorithm 3D Surface Recovery via Deterministic Annealing based Piecewise Linear Surface Fitting Algorithm Bing Han, Chris Paulson, and Dapeng Wu Department of Electrical and Computer Engineering University of Florida

More information

Optimal Parallel Randomized Renaming

Optimal Parallel Randomized Renaming Optimal Parallel Randomized Renaming Martin Farach S. Muthukrishnan September 11, 1995 Abstract We consider the Renaming Problem, a basic processing step in string algorithms, for which we give a simultaneously

More information

A Mixed Hierarchical Algorithm for Nearest Neighbor Search

A Mixed Hierarchical Algorithm for Nearest Neighbor Search A Mixed Hierarchical Algorithm for Nearest Neighbor Search Carlo del Mundo Virginia Tech 222 Kraft Dr. Knowledge Works II Building Blacksburg, VA cdel@vt.edu ABSTRACT The k nearest neighbor (knn) search

More information

Cluster Analysis. Ying Shen, SSE, Tongji University

Cluster Analysis. Ying Shen, SSE, Tongji University Cluster Analysis Ying Shen, SSE, Tongji University Cluster analysis Cluster analysis groups data objects based only on the attributes in the data. The main objective is that The objects within a group

More information

Based on Raymond J. Mooney s slides

Based on Raymond J. Mooney s slides Instance Based Learning Based on Raymond J. Mooney s slides University of Texas at Austin 1 Example 2 Instance-Based Learning Unlike other learning algorithms, does not involve construction of an explicit

More information

High Dimensional Indexing by Clustering

High Dimensional Indexing by Clustering Yufei Tao ITEE University of Queensland Recall that, our discussion so far has assumed that the dimensionality d is moderately high, such that it can be regarded as a constant. This means that d should

More information

VQ Encoding is Nearest Neighbor Search

VQ Encoding is Nearest Neighbor Search VQ Encoding is Nearest Neighbor Search Given an input vector, find the closest codeword in the codebook and output its index. Closest is measured in squared Euclidean distance. For two vectors (w 1,x 1,y

More information

Mean-shift outlier detection

Mean-shift outlier detection Mean-shift outlier detection Jiawei YANG a, Susanto RAHARDJA b a,1 and Pasi FRÄNTI a School of Computing, University of Eastern Finland b Northwestern Polytechnical University, Xi an, China Abstract. We

More information

An Accelerated Nearest Neighbor Search Method for the K-Means Clustering Algorithm

An Accelerated Nearest Neighbor Search Method for the K-Means Clustering Algorithm Proceedings of the Twenty-Sixth International Florida Artificial Intelligence Research Society Conference An Accelerated Nearest Neighbor Search Method for the -Means Clustering Algorithm Adam Fausett

More information

L9: Hierarchical Clustering

L9: Hierarchical Clustering L9: Hierarchical Clustering This marks the beginning of the clustering section. The basic idea is to take a set X of items and somehow partition X into subsets, so each subset has similar items. Obviously,

More information

On Initial Effects of the k-means Clustering

On Initial Effects of the k-means Clustering 200 Int'l Conf. Scientific Computing CSC'15 On Initial Effects of the k-means Clustering Sherri Burks, Greg Harrell, and Jin Wang Department of Mathematics and Computer Science Valdosta State University,

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences

More information

Clustering algorithms and introduction to persistent homology

Clustering algorithms and introduction to persistent homology Foundations of Geometric Methods in Data Analysis 2017-18 Clustering algorithms and introduction to persistent homology Frédéric Chazal INRIA Saclay - Ile-de-France frederic.chazal@inria.fr Introduction

More information

Multilevel Compression Scheme using Vector Quantization for Image Compression

Multilevel Compression Scheme using Vector Quantization for Image Compression Multilevel Compression Scheme using Vector Quantization for Image Compression S.Vimala, B.Abidha, and P.Uma Abstract In this paper, we have proposed a multi level compression scheme, in which the initial

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

1 Case study of SVM (Rob)

1 Case study of SVM (Rob) DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how

More information

Clustering Documentation

Clustering Documentation Clustering Documentation Release 0.3.0 Dahua Lin and contributors Dec 09, 2017 Contents 1 Overview 3 1.1 Inputs................................................... 3 1.2 Common Options.............................................

More information

Olmo S. Zavala Romero. Clustering Hierarchical Distance Group Dist. K-means. Center of Atmospheric Sciences, UNAM.

Olmo S. Zavala Romero. Clustering Hierarchical Distance Group Dist. K-means. Center of Atmospheric Sciences, UNAM. Center of Atmospheric Sciences, UNAM November 16, 2016 Cluster Analisis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster)

More information

Clustering k-mean clustering

Clustering k-mean clustering Clustering k-mean clustering Genome 373 Genomic Informatics Elhanan Borenstein The clustering problem: partition genes into distinct sets with high homogeneity and high separation Clustering (unsupervised)

More information

Clustering: Overview and K-means algorithm

Clustering: Overview and K-means algorithm Clustering: Overview and K-means algorithm Informal goal Given set of objects and measure of similarity between them, group similar objects together K-Means illustrations thanks to 2006 student Martin

More information

Comparison of Clustering Methods: a Case Study of Text-Independent Speaker Modeling

Comparison of Clustering Methods: a Case Study of Text-Independent Speaker Modeling Comparison of Clustering Methods: a Case Study of Text-Independent Speaker Modeling Tomi Kinnunen, Ilja Sidoroff, Marko Tuononen, Pasi Fränti Speech and Image Processing Unit, School of Computing, University

More information

A Comparison of Resampling Methods for Clustering Ensembles

A Comparison of Resampling Methods for Clustering Ensembles A Comparison of Resampling Methods for Clustering Ensembles Behrouz Minaei-Bidgoli Computer Science Department Michigan State University East Lansing, MI, 48824, USA Alexander Topchy Computer Science Department

More information

Solving Capacitated P-Median Problem by Hybrid K-Means Clustering and Fixed Neighborhood Search algorithm

Solving Capacitated P-Median Problem by Hybrid K-Means Clustering and Fixed Neighborhood Search algorithm Proceedings of the 2010 International Conference on Industrial Engineering and Operations Management Dhaka, Bangladesh, January 9 10, 2010 Solving Capacitated P-Median Problem by Hybrid K-Means Clustering

More information

Introduction to Artificial Intelligence

Introduction to Artificial Intelligence Introduction to Artificial Intelligence COMP307 Machine Learning 2: 3-K Techniques Yi Mei yi.mei@ecs.vuw.ac.nz 1 Outline K-Nearest Neighbour method Classification (Supervised learning) Basic NN (1-NN)

More information

Lecture 13: k-means and mean-shift clustering

Lecture 13: k-means and mean-shift clustering Lecture 13: k-means and mean-shift clustering Juan Carlos Niebles Stanford AI Lab Professor Stanford Vision Lab Lecture 13-1 Recap: Image Segmentation Goal: identify groups of pixels that go together Lecture

More information

Approximating max-min linear programs with local algorithms

Approximating max-min linear programs with local algorithms Approximating max-min linear programs with local algorithms Patrik Floréen, Marja Hassinen, Petteri Kaski, Topi Musto, Jukka Suomela HIIT seminar 29 February 2008 Max-min linear programs: Example 2 / 24

More information

CS 2750: Machine Learning. Clustering. Prof. Adriana Kovashka University of Pittsburgh January 17, 2017

CS 2750: Machine Learning. Clustering. Prof. Adriana Kovashka University of Pittsburgh January 17, 2017 CS 2750: Machine Learning Clustering Prof. Adriana Kovashka University of Pittsburgh January 17, 2017 What is clustering? Grouping items that belong together (i.e. have similar features) Unsupervised:

More information

ECE6095: CAD Algorithms. Optimization Techniques

ECE6095: CAD Algorithms. Optimization Techniques ECE6095: CAD Algorithms Optimization Techniques Mohammad Tehranipoor ECE Department 6 September 2010 1 Optimization Techniques Objective: A basic review of complexity Review of basic algorithms Some physical

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information