Random Swap algorithm
|
|
- Amy Arnold
- 5 years ago
- Views:
Transcription
1 Random Swap algorithm Pasi Fränti
2 Definitions and data Set of N data points: X={x 1, x 2,, x N } Partition of the data: P={p 1, p 2,,p k }, Set of k cluster prototypes (centroids): C={c 1, c 2,,c k },
3 N i P i i C x d dn MSE N i C x P j i k j i 1, argmin 2 1 k j x C j P j P i j i i 1 1, Clustering problem Objective function: Optimality of partition: Optimality of centroid:
4 K-means algorithm X = Data set C = Cluster centroids P = Partition K-Means(X, C) (C, P) REPEAT C prev C; FOR i=1 TO N DO p i FindNearest(x i, C); FOR j=1 TO k DO c j Average of x i p i = j; Optimal partition Optimal centoids UNTIL C = C prev
5 Problems of k-means
6 Swapping strategy
7 Pigeon hole principle CI = Centroid index: P. Fränti, M. Rezaei and Q. Zhao "Centroid index: cluster level similarity measure Pattern Recognition, 47 (9), , September 2014, 2014.
8 Pigeon hole principle S2
9 Aim of the swap One centroid, but two clusters. Two centroids, but only one cluster. S2
10 Random Swap algorithm Random Swap(X) C, P C Select random representatives(x); P Optimal partition(x, C); REPEAT T times (C new,j) Random swap(x, C); P new Local repartition(x, C new, P, j); C new, P new Kmeans(X, C new, P new ); IF f(c new, P new ) < f(c, P) THEN (C, P) C new, P new ; RETURN (C, P);
11 Steps of the swap 1. Random swap: c j x i j random( 1, k), i random(1, N) 2. Re-allocate vectors from old cluster: p i argmind 1 jk 3. Create new cluster: 2 x, c i p j i j 2 1 p arg min d x, c i, N i k j kp i i k i
12 Swap Swap is made from centroid rich area to centroid poor area.
13 Local re-partition Re-allocate vectors Create new cluster
14 Iterate by k-means 1 st iteration
15 Iterate by k-means 2 nd iteration
16 Iterate by k-means 3 rd iteration
17 Iterate by k-means 16 th iteration
18 Iterate by k-means 17 th iteration
19 Iterate by k-means 18 th iteration
20 Iterate by k-means 19 th iteration
21 Final result 25 iterations
22 Extreme example
23 Dependency on initial solution MSE K-means Random + RS K-means + RS Split + RS Bridge Ward + RS
24 Data sets
25 Data sets
26 Data sets visualized Images Bridge House Miss America Europe 4x4 blocks RGB color 4x4 blocks frame differential Differential coordinates 16-d 3-d 16-d 2-d
27 Data sets visualized Artificial S 1 S 2 S 3 S 4 Unbalance DIM 32
28 Data sets visualized Artificial G G G A 1 A 2 A 3
29 Time complexity
30 Efficiency of the random swap Total time to find correct clustering: Time per iteration Number of iterations Time complexity of single iteration: Swap: O(1) Remove cluster: 2kN/k = O(N) Add cluster: 2N = O(N) Centroids: 2N/k + 2N/k + 2 = O(N/k) K-means: IkN = O(IkN) Bottleneck!
31 Efficiency of the random swap Total time to find correct clustering: Time per iteration Number of iterations Time complexity of single iteration: Swap: O(1) Remove cluster: 2kN/k = O(N) Add cluster: 2N = O(N) Centroids: 2N/k + 2N/k + 2 = O(N/k) (Fast) K-means: 4N = O(N) 2 iterations only! T. Kaukoranta, P. Fränti and O. Nevalainen "A fast exact GLA based on code vector activity detection" IEEE Trans. on Image Processing, 9 (8), , August 2000.
32 Estimated and observed steps Bridge N=4096, k=256, N/k=16, 8
33 Processing time profile 10 8 N = 4096 k = Bridge Time (ms) 6 4 KM 2 39 % 2 KM 1 53 % 0 Local repartition
34 Processing time profile N = k = Birch1 Time (ms) KM 2 48 % CI=0 reached 40 0 KM 1 49 % Local repartition 50 % 44 %
35 Effect of K-means iterations 190 MSE Bridge Time (s)
36 Effect of K-means iterations MSE Birch Time (s)
37 How many swaps?
38 Three types of swaps Trial swap Accepted swap Successful swap MSE improves CI improves Before swap Accepted Successful CI=2 CI=2 CI=
39 Accepted and successful swaps
40 Number of swaps needed Example with 35 clusters A3 CI=4 CI=9
41 Number of swaps needed Example from image quantization K-means clustering result (3 swaps needed) Final clustering result Remove Added Remove
42 Statistical observations N=5000, k=15, d=2, N/k=333, % 15 % 10 % 5 % Median = 26 Average = 35 S1 Max = % cases < 70 0 % Swaps
43 Statistical observations N=5000, k=15, d=2, N/k=333, % 15 % 10 % Median = 18 Average = 25 S4 Max = % 90% cases < 50 0 % Swaps
44 Theoretical estimation
45 Probability of good swap Select a proper prototype to remove: There are k clusters in total: p removal =1/k Select a proper new location: There are N choices: p add =1/N Only k are significantly different: p add =1/k Both happens same time: k 2 significantly different swaps. Probability of each different swap is p swap =1/k 2 Open question: how many of these are good? p = (/k)(/k) = O(/k) 2
46 Expected number of iterations Probability of not finding good swap: q 2 1 k 2 T Estimated number of iterations: log q T 2 T log 1 k 2 log q 2 log 1 2 k
47 Probability of failure (q) depending on T q Iterations
48 Observed probability (%) of fail N=5000, k=15, d=2, N/k=333, q =10% 10 q q =1% q =0.1% 0.1 S 1 -S 4 S1 S2 S3 S4 Dataset
49 Observed probability (%) of fail N=1024, k=16, d=32-128, N/k=64, q Dimensions q =10% q =1% q =0.1% Dim
50 Bounds for the iterations Upper limit: T ln 1 ln q - ln q -ln q α / k α / k k α 2 Lower limit similarly; resulting in: T 2 k - ln q 2 α
51 Multiple swaps (w) Probability for performing less than w swaps: Expected number of iterations: i T i w i k k i T q log 1 ˆ k w k i t w i
52 Expected time complexity tˆ N, k log w N -log 1. Linear dependency on N k α w 2. Quadratic dependency on k (With large number of clusters, it can be too slow) 3. Logarithmic dependency on w (Close to constant) Inverse dependency on (Higher the dimensionality, faster the method) α Nk 2
53 Linear dependency on N N< , k=100, d=2, N/k=1000, % Birch2 Iterations N= % Data size (thousands)
54 Quadratic dependency on k N< , k=100, d=2, N/k=1000, Birch2 Iterations Quadratic fit 75% 25% Clusters
55 Logarithmic dependency on w
56 Theory vs. reality
57 Neighborhood size
58 How much is? Voronoi neighbors Neighbors by distance 1 2 S dim: D-dim: 2(3k-6)/k = 6 12/k 2kD/2/k = O(2k D/2-1) Upper limit: k
59 Observed number of neighbors Data set S 2 Frequency 45 % 40 % 35 % 30 % 25 % 20 % 15 % 10 % 5 % 0 % Average = Number of neighbours
60 Estimate Five iterations of random swap clustering Each pair of prototypes A and B: 1. Calculate the half point HP = (A+B)/2 2. Find the nearest prototype C for HP 3. If C=A or C=B they are potential neighbors. Analyze potential neighbors: 1. Calculate all vector distances across A and B 2. Select the nearest pair (a,b) 3. If d(a,b) < min( d(a,c(a), d(b,c(b) ) then Accept = Number of pairs found / k
61 Observed values of
62 Optimality
63 Multiple optima (=plateaus) Very similar result (<0.3% diff. in MSE) CI-values significantly high (9%) Finds one of the near-optimal solutions
64 Experiments
65 Time-versus-distortion N=4096, k=256, d=16, N/k=16, 5.4 MSE K-means (single) 70 (28) (172) 60 (365) (825) 30 (3k) (146) Random Swap Bridge Repeated k-means ( repeats) (13k) (78k) Time 5 61 (10k) 0 (387k) (813k)
66 Time-versus-distortion N=6480, k=256, d=16, N/k=25, 17.1 MSE K-means (single) (62) Random Swap (501) (1632) (136) Missa Repeated k-means ( repeats) 88 (10k) (23k) (70k) (135k) (940k) Time
67 Time-versus-distortion N= , k=100, d=2, N/k=1000, (2) Birch1 MSE (22) 5 (43) Random Swap K-means 7 (single) (606) 0 Repeated k-means (2500 repeats) Time 3 (489)
68 Time-versus-distortion N= , k=100, d=2, N/k=1000, 3.1 MSE (1) 18 K-means (single) 20 (34) 10 (64) 5 Random Swap Repeated k-means (2500 repeats) (41) 0 (2067) Birch2 9 (845) (2500) Time
69 Time-versus-distortion N= , k=256, d=2, N/k=663, Europe MSE (11) 80 (24) Random Swap 60 (44) K-means (single) (305) Repeated k-means (1888) (9k) (500 repeats) Time 10 (58k) (576k)
70 Time-versus-distortion N=6500, k=8, d=2, N/k=821, (4) K-means (single) Unbalance MSE (20) 2 1 (43) (98) 0 (116) Random swap Repeated k-means Time 2 1 (3.219) 1 (20.000)
71 Variation of results 15 % 10 % Random swap Bridge 5 % Repeated k-means % MSE
72 Variation of results 15 % Random swap Birch1 10 % 5 % 0 % 4.64 Repeated k-means MSE
73 Comparison of algorithms k-means (KM) k-means++ repeated k-means (RKM) x-means agglomerative clustering (AC) random swap (RS) global k-means genetic algorithm D. Arthur and S. Vassilvitskii, "K-means++: the advantages of careful seeding", ACM-SIAM Symp. on Discrete Algorithms (SODA 07), New Orleans, LA, , January, D. Pelleg, and A. Moore, "X-means: Extending k- means with efficient estimation of the number of clusters", Int. Conf. on Machine Learning, (ICML 00), Stanford, CA, USA, June P. Fränti, T. Kaukoranta, D.-F. Shen and K.-S. Chang, "Fast and memory efficient implementation of the exact PNN", IEEE Trans. on Image Processing, 9 (5), , May A. Likas, N. Vlassis and J.J. Verbeek, "The global k-means clustering algorithm", Pattern Recognition 36, , P. Fränti, "Genetic algorithm with deterministic crossover for vector quantization", Pat. Rec. Let., 21 (1), 61-68, January 2000.
74 Processing time
75 Clustering quality
76 Conclusions
77 What we learned? 1. Random swap is efficient algorithm 2. It does not converge to sub-optimal result 3. Expected processing has dependency: Linear O(N) dependency on the size of data Quadratic O(k2) on the number of clusters Inverse O(1/) on the neighborhood size Logarithmic O(log w) on the number of swaps
78 References P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 5:13, 1-29, P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), , P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), , August P. Fränti, O. Virmajoki and V. Hautamäki, Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR 08), Tampa, FL, Dec Pseudo code:
79 Supporting material Implementations available: (C, Matlab, Java, Javascript, R and Python) Interactive animation: Clusterator:
Pasi Fränti K-means properties on six clustering benchmark datasets Pasi Fränti and Sami Sieranoja Algorithms, 2017.
K-means properties Pasi Fränti 17.4.2017 K-means properties on six clustering benchmark datasets Pasi Fränti and Sami Sieranoja Algorithms, 2017. Goal of k-means Input N points: X={x 1, x 2,, x N } Output
More informationEfficiency of random swap clustering
https://doi.org/10.1186/s40537-018-0122-y RESEARCH Open Access Efficiency of random swap clustering Pasi Fränti * *Correspondence: franti@cs.uef.fi Machine Learning Group, School of Computing, University
More informationComparative Study on VQ with Simple GA and Ordain GA
Proceedings of the 9th WSEAS International Conference on Automatic Control, Modeling & Simulation, Istanbul, Turkey, May 27-29, 2007 204 Comparative Study on VQ with Simple GA and Ordain GA SADAF SAJJAD
More informationGenetic algorithm with deterministic crossover for vector quantization
Pattern Recognition Letters 21 (2000) 61±68 www.elsevier.nl/locate/patrec Genetic algorithm with deterministic crossover for vector quantization Pasi Franti * Department of Computer Science, University
More informationRandom Projection for k-means Clustering
Random Projection for k-means Clustering Sami Sieranoja and Pasi Fränti ( ) School of Computing, University of Eastern Finland, Joensuu, Finland {sami.sieranoja,pasi.franti}@uef.fi Abstract. We study how
More informationCodebook generation for Image Compression with Simple and Ordain GA
Codebook generation for Image Compression with Simple and Ordain GA SAJJAD MOHSIN, SADAF SAJJAD COMSATS Institute of Information Technology Department of Computer Science Tobe Camp, Abbotabad PAKISTAN
More informationGeneralizing centroid index to different clustering models
Generalizing centroid index to different clustering models Pasi Fränti and Mohammad Rezaei University of Eastern Finland Abstract. Centroid index is the only measure that evaluates cluster level differrences
More informationSelf-Adaptive Genetic Algorithm for Clustering
Journal of Heuristics, 9: 113 129, 3 c 3 Kluwer Academic Publishers. Manufactured in The Netherlands. Self-Adaptive Genetic Algorithm for Clustering JUHA KIVIJÄRVI Turku Centre for Computer Science (TUCS),
More informationThis article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and
This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution
More informationDynamic local search algorithm for the clustering problem
UNIVERSITY OF JOENSUU DEPARTMENT OF COMPUTER SCIENCE Report Series A Dynamic local search algorithm for the clustering problem Ismo Kärkkäinen and Pasi Fränti Report A-2002-6 ACM I.4.2, I.5.3 UDK 519.712
More informationBinary vector quantizer design using soft centroids
Signal Processing: Image Communication 14 (1999) 677}681 Binary vector quantizer design using soft centroids Pasi FraK nti *, Timo Kaukoranta Department of Computer Science, University of Joensuu, P.O.
More informationIterative split-and-merge algorithm for VQ codebook generation published in Optical Engineering, 37 (10), pp , October 1998
Iterative split-and-merge algorithm for VQ codebook generation published in Optical Engineering, 37 (10), pp. 2726-2732, October 1998 Timo Kaukoranta 1, Pasi Fränti 2 and Olli Nevalainen 1 1 Turku Centre
More information1090 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 26, NO. 5, MAY Centroid Ratio for a Pairwise Random Swap Clustering Algorithm
1090 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 26, NO. 5, MAY 2014 Centroid Ratio for a Pairwise Random Swap Clustering Algorithm Qinpei Zhao and Pasi Fränti, Senior Member, IEEE Abstract
More informationMachine Learning and Data Mining. Clustering (1): Basics. Kalev Kask
Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of
More informationOn the splitting method for vector quantization codebook generation
On the splitting method for vector quantization codebook generation Pasi Fränti University of Joensuu Department of Computer Science P.O. Box 111 FIN-80101 Joensuu, Finland E-mail: franti@cs.joensuu.fi
More informationTowards the world s fastest k-means algorithm
Greg Hamerly Associate Professor Computer Science Department Baylor University Joint work with Jonathan Drake May 15, 2014 Objective function and optimization Lloyd s algorithm 1 The k-means clustering
More informationClustering methods: Part 7 Outlier removal Pasi Fränti
Clustering methods: Part 7 Outlier removal Pasi Fränti 6.5.207 Machine Learning University of Eastern Finland Outlier detection methods Distance-based methods Knorr & Ng Density-based methods KDIST: K
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.
More informationComputer Vision. Exercise Session 10 Image Categorization
Computer Vision Exercise Session 10 Image Categorization Object Categorization Task Description Given a small number of training images of a category, recognize a-priori unknown instances of that category
More informationImproving K-Means by Outlier Removal
Improving K-Means by Outlier Removal Ville Hautamäki, Svetlana Cherednichenko, Ismo Kärkkäinen, Tomi Kinnunen, and Pasi Fränti Speech and Image Processing Unit, Department of Computer Science, University
More informationClustering: Centroid-Based Partitioning
Clustering: Centroid-Based Partitioning Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong 1 / 29 Y Tao Clustering: Centroid-Based Partitioning In this lecture, we
More informationClustering. Chapter 10 in Introduction to statistical learning
Clustering Chapter 10 in Introduction to statistical learning 16 14 12 10 8 6 4 2 0 2 4 6 8 10 12 14 1 Clustering ² Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 1990). ² What
More informationK-Means Clustering. Sargur Srihari
K-Means Clustering Sargur srihari@cedar.buffalo.edu 1 Topics in Mixture Models and EM Mixture models K-means Clustering Mixtures of Gaussians Maximum Likelihood EM for Gaussian mistures EM Algorithm Gaussian
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10. Cluster
More informationClustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York
Clustering Robert M. Haralick Computer Science, Graduate Center City University of New York Outline K-means 1 K-means 2 3 4 5 Clustering K-means The purpose of clustering is to determine the similarity
More informationNon-Parametric Vector Quantization Algorithm
Non-Parametric Vector Quantization Algorithm 31 Non-Parametric Vector Quantization Algorithm Haemwaan Sivaraks and Athasit Surarerks, Non-members ABSTRACT Recently researches in vector quantization domain
More informationVoronoi Region. K-means method for Signal Compression: Vector Quantization. Compression Formula 11/20/2013
Voronoi Region K-means method for Signal Compression: Vector Quantization Blocks of signals: A sequence of audio. A block of image pixels. Formally: vector example: (0.2, 0.3, 0.5, 0.1) A vector quantizer
More informationStrategies for simulating pedestrian navigation with multiple reinforcement learning agents
Strategies for simulating pedestrian navigation with multiple reinforcement learning agents Francisco Martinez-Gil, Miguel Lozano, Fernando Ferna ndez Presented by: Daniel Geschwender 9/29/2016 1 Overview
More informationk-center Problems Joey Durham Graphs, Combinatorics and Convex Optimization Reading Group Summer 2008
k-center Problems Joey Durham Graphs, Combinatorics and Convex Optimization Reading Group Summer 2008 Outline General problem definition Several specific examples k-center, k-means, k-mediod Approximation
More informationAn Adaptive and Deterministic Method for Initializing the Lloyd-Max Algorithm
An Adaptive and Deterministic Method for Initializing the Lloyd-Max Algorithm Jared Vicory and M. Emre Celebi Department of Computer Science Louisiana State University, Shreveport, LA, USA ABSTRACT Gray-level
More informationClustering. (Part 2)
Clustering (Part 2) 1 k-means clustering 2 General Observations on k-means clustering In essence, k-means clustering aims at minimizing cluster variance. It is typically used in Euclidean spaces and works
More informationCSE 40171: Artificial Intelligence. Learning from Data: Unsupervised Learning
CSE 40171: Artificial Intelligence Learning from Data: Unsupervised Learning 32 Homework #6 has been released. It is due at 11:59PM on 11/7. 33 CSE Seminar: 11/1 Amy Reibman Purdue University 3:30pm DBART
More informationOverview. 1 Preliminaries. 2 k-center Clustering. 3 The Greedy Clustering Algorithm. 4 The Greedy permutation. 5 k-median clustering
KCenter K-Center Isuru Gunasekara University of Ottawa aguna100@uottawa.ca November 21,2016 Overview KCenter 1 2 3 The Algorithm 4 The permutation 5 6 for What is? KCenter The process of finding interesting
More informationSupervised vs. Unsupervised Learning
Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now
More informationClustering: An art of grouping related objects
Clustering: An art of grouping related objects Sumit Kumar, Sunil Verma Abstract- In today s world, clustering has seen many applications due to its ability of binding related data together but there are
More informationSTEPWISE ALGORITHM FOR FINDING UNKNOWN NUMBER OF CLUSTERS. Ismo Kärkkäinen and Pasi Fränti
STEPWISE ALGORITHM FOR FINDING UNKNOWN NUMBER OF CLUSTERS Ismo Käräinen and Pasi Fränti Ismo.Karainen@cs.joensuu.fi, Pasi.Franti@cs.joensuu.fi University of Joensuu, Department of Computer Science, P.O.
More informationN-Candidate methods for location invariant dithering of color images
Image and Vision Computing 18 (2000) 493 500 www.elsevier.com/locate/imavis N-Candidate methods for location invariant dithering of color images K. Lemström a,1, *, P. Fränti b,2 a Department of Computer
More informationFeature Extractors. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. The Perceptron Update Rule.
CS 188: Artificial Intelligence Fall 2007 Lecture 26: Kernels 11/29/2007 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit your
More informationFeature-Guided K-Means Algorithm for Optimal Image Vector Quantizer Design
Journal of Information Hiding and Multimedia Signal Processing c 2017 ISSN 2073-4212 Ubiquitous International Volume 8, Number 6, November 2017 Feature-Guided K-Means Algorithm for Optimal Image Vector
More informationClustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford
Department of Engineering Science University of Oxford January 27, 2017 Many datasets consist of multiple heterogeneous subsets. Cluster analysis: Given an unlabelled data, want algorithms that automatically
More informationUnsupervised Learning Hierarchical Methods
Unsupervised Learning Hierarchical Methods Road Map. Basic Concepts 2. BIRCH 3. ROCK The Principle Group data objects into a tree of clusters Hierarchical methods can be Agglomerative: bottom-up approach
More informationCluster Analysis. Summer School on Geocomputation. 27 June July 2011 Vysoké Pole
Cluster Analysis Summer School on Geocomputation 27 June 2011 2 July 2011 Vysoké Pole Lecture delivered by: doc. Mgr. Radoslav Harman, PhD. Faculty of Mathematics, Physics and Informatics Comenius University,
More informationClustering. Supervised vs. Unsupervised Learning
Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now
More informationECS 234: Data Analysis: Clustering ECS 234
: Data Analysis: Clustering What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed
More informationCase-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.
CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance
More informationCS 188: Artificial Intelligence Fall 2008
CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley 1 1 Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance
More informationRandomized rounding of semidefinite programs and primal-dual method for integer linear programming. Reza Moosavi Dr. Saeedeh Parsaeefard Dec.
Randomized rounding of semidefinite programs and primal-dual method for integer linear programming Dr. Saeedeh Parsaeefard 1 2 3 4 Semidefinite Programming () 1 Integer Programming integer programming
More informationKapitel 4: Clustering
Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases WiSe 2017/18 Kapitel 4: Clustering Vorlesung: Prof. Dr.
More informationMaster-Worker pattern
COSC 6397 Big Data Analytics Master Worker Programming Pattern Edgar Gabriel Fall 2018 Master-Worker pattern General idea: distribute the work among a number of processes Two logically different entities:
More informationCompression of Image Using VHDL Simulation
Compression of Image Using VHDL Simulation 1) Prof. S. S. Mungona (Assistant Professor, Sipna COET, Amravati). 2) Vishal V. Rathi, Abstract : Maintenance of all essential information without any deletion
More information9.913 Pattern Recognition for Vision. Class 8-2 An Application of Clustering. Bernd Heisele
9.913 Class 8-2 An Application of Clustering Bernd Heisele Fall 2003 Overview Problem Background Clustering for Tracking Examples Literature Homework Problem Detect objects on the road: Cars, trucks, motorbikes,
More informationCS229 Final Project: k-means Algorithm
CS229 Final Project: k-means Algorithm Colin Wei and Alfred Xue SUNet ID: colinwei axue December 11, 2014 1 Notes This project was done in conjuction with a similar project that explored k-means in CS
More informationIterative shrinking method for clustering problems
Pattern Recognition 39 (2006) 761 775 www.elsevier.com/locate/patcog Iterative shrinking method for clustering problems Pasi Fränti, Olli Virmajoki Department of Computer Science, Universit of Joensuu,
More informationA Review of K-mean Algorithm
A Review of K-mean Algorithm Jyoti Yadav #1, Monika Sharma *2 1 PG Student, CSE Department, M.D.U Rohtak, Haryana, India 2 Assistant Professor, IT Department, M.D.U Rohtak, Haryana, India Abstract Cluster
More information9.1. K-means Clustering
424 9. MIXTURE MODELS AND EM Section 9.2 Section 9.3 Section 9.4 view of mixture distributions in which the discrete latent variables can be interpreted as defining assignments of data points to specific
More informationMaster-Worker pattern
COSC 6397 Big Data Analytics Master Worker Programming Pattern Edgar Gabriel Spring 2017 Master-Worker pattern General idea: distribute the work among a number of processes Two logically different entities:
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Kernels and Clustering Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.
More informationTOWARDS NEW ESTIMATING INCREMENTAL DIMENSIONAL ALGORITHM (EIDA)
TOWARDS NEW ESTIMATING INCREMENTAL DIMENSIONAL ALGORITHM (EIDA) 1 S. ADAEKALAVAN, 2 DR. C. CHANDRASEKAR 1 Assistant Professor, Department of Information Technology, J.J. College of Arts and Science, Pudukkottai,
More informationData Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering Clustering Algorithms Contents K-means Hierarchical algorithms Linkage functions Vector quantization SOM Clustering Formulation
More informationImage Compression with Competitive Networks and Pre-fixed Prototypes*
Image Compression with Competitive Networks and Pre-fixed Prototypes* Enrique Merida-Casermeiro^, Domingo Lopez-Rodriguez^, and Juan M. Ortiz-de-Lazcano-Lobato^ ^ Department of Applied Mathematics, University
More informationExploratory Data Analysis using Self-Organizing Maps. Madhumanti Ray
Exploratory Data Analysis using Self-Organizing Maps Madhumanti Ray Content Introduction Data Analysis methods Self-Organizing Maps Conclusion Visualization of high-dimensional data items Exploratory data
More informationk-means Clustering David S. Rosenberg April 24, 2018 New York University
k-means Clustering David S. Rosenberg New York University April 24, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 April 24, 2018 1 / 19 Contents 1 k-means Clustering 2 k-means:
More informationCHAPTER 6 INFORMATION HIDING USING VECTOR QUANTIZATION
CHAPTER 6 INFORMATION HIDING USING VECTOR QUANTIZATION In the earlier part of the thesis different methods in the spatial domain and transform domain are studied This chapter deals with the techniques
More informationKernels and Clustering
Kernels and Clustering Robert Platt Northeastern University All slides in this file are adapted from CS188 UC Berkeley Case-Based Learning Non-Separable Data Case-Based Reasoning Classification from similarity
More informationAvailable online at ScienceDirect. Procedia Computer Science 89 (2016 )
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 89 (2016 ) 778 784 Twelfth International Multi-Conference on Information Processing-2016 (IMCIP-2016) Color Image Compression
More informationGeometric Modeling. Mesh Decimation. Mesh Decimation. Applications. Copyright 2010 Gotsman, Pauly Page 1. Oversampled 3D scan data
Applications Oversampled 3D scan data ~150k triangles ~80k triangles 2 Copyright 2010 Gotsman, Pauly Page 1 Applications Overtessellation: E.g. iso-surface extraction 3 Applications Multi-resolution hierarchies
More informationClustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani
Clustering CE-717: Machine Learning Sharif University of Technology Spring 2016 Soleymani Outline Clustering Definition Clustering main approaches Partitional (flat) Hierarchical Clustering validation
More informationRecommendation System Using Yelp Data CS 229 Machine Learning Jia Le Xu, Yingran Xu
Recommendation System Using Yelp Data CS 229 Machine Learning Jia Le Xu, Yingran Xu 1 Introduction Yelp Dataset Challenge provides a large number of user, business and review data which can be used for
More informationContents. List of Figures. List of Tables. List of Algorithms. I Clustering, Data, and Similarity Measures 1
Contents List of Figures List of Tables List of Algorithms Preface xiii xv xvii xix I Clustering, Data, and Similarity Measures 1 1 Data Clustering 3 1.1 Definition of Data Clustering... 3 1.2 The Vocabulary
More informationAdaptive Binary Quantization for Fast Nearest Neighbor Search
IBM Research Adaptive Binary Quantization for Fast Nearest Neighbor Search Zhujin Li 1, Xianglong Liu 1*, Junjie Wu 1, and Hao Su 2 1 Beihang University, Beijing, China 2 Stanford University, Stanford,
More informationNeural Network and SVM classification via Decision Trees, Vector Quantization and Simulated Annealing
Neural Network and SVM classification via Decision Trees, Vector Quantization and Simulated Annealing JOHN TSILIGARIDIS Mathematics and Computer Science Department Heritage University Toppenish, WA USA
More information3D Surface Recovery via Deterministic Annealing based Piecewise Linear Surface Fitting Algorithm
3D Surface Recovery via Deterministic Annealing based Piecewise Linear Surface Fitting Algorithm Bing Han, Chris Paulson, and Dapeng Wu Department of Electrical and Computer Engineering University of Florida
More informationOptimal Parallel Randomized Renaming
Optimal Parallel Randomized Renaming Martin Farach S. Muthukrishnan September 11, 1995 Abstract We consider the Renaming Problem, a basic processing step in string algorithms, for which we give a simultaneously
More informationA Mixed Hierarchical Algorithm for Nearest Neighbor Search
A Mixed Hierarchical Algorithm for Nearest Neighbor Search Carlo del Mundo Virginia Tech 222 Kraft Dr. Knowledge Works II Building Blacksburg, VA cdel@vt.edu ABSTRACT The k nearest neighbor (knn) search
More informationCluster Analysis. Ying Shen, SSE, Tongji University
Cluster Analysis Ying Shen, SSE, Tongji University Cluster analysis Cluster analysis groups data objects based only on the attributes in the data. The main objective is that The objects within a group
More informationBased on Raymond J. Mooney s slides
Instance Based Learning Based on Raymond J. Mooney s slides University of Texas at Austin 1 Example 2 Instance-Based Learning Unlike other learning algorithms, does not involve construction of an explicit
More informationHigh Dimensional Indexing by Clustering
Yufei Tao ITEE University of Queensland Recall that, our discussion so far has assumed that the dimensionality d is moderately high, such that it can be regarded as a constant. This means that d should
More informationVQ Encoding is Nearest Neighbor Search
VQ Encoding is Nearest Neighbor Search Given an input vector, find the closest codeword in the codebook and output its index. Closest is measured in squared Euclidean distance. For two vectors (w 1,x 1,y
More informationMean-shift outlier detection
Mean-shift outlier detection Jiawei YANG a, Susanto RAHARDJA b a,1 and Pasi FRÄNTI a School of Computing, University of Eastern Finland b Northwestern Polytechnical University, Xi an, China Abstract. We
More informationAn Accelerated Nearest Neighbor Search Method for the K-Means Clustering Algorithm
Proceedings of the Twenty-Sixth International Florida Artificial Intelligence Research Society Conference An Accelerated Nearest Neighbor Search Method for the -Means Clustering Algorithm Adam Fausett
More informationL9: Hierarchical Clustering
L9: Hierarchical Clustering This marks the beginning of the clustering section. The basic idea is to take a set X of items and somehow partition X into subsets, so each subset has similar items. Obviously,
More informationOn Initial Effects of the k-means Clustering
200 Int'l Conf. Scientific Computing CSC'15 On Initial Effects of the k-means Clustering Sherri Burks, Greg Harrell, and Jin Wang Department of Mathematics and Computer Science Valdosta State University,
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences
More informationClustering algorithms and introduction to persistent homology
Foundations of Geometric Methods in Data Analysis 2017-18 Clustering algorithms and introduction to persistent homology Frédéric Chazal INRIA Saclay - Ile-de-France frederic.chazal@inria.fr Introduction
More informationMultilevel Compression Scheme using Vector Quantization for Image Compression
Multilevel Compression Scheme using Vector Quantization for Image Compression S.Vimala, B.Abidha, and P.Uma Abstract In this paper, we have proposed a multi level compression scheme, in which the initial
More informationPattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition
Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant
More information1 Case study of SVM (Rob)
DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how
More informationClustering Documentation
Clustering Documentation Release 0.3.0 Dahua Lin and contributors Dec 09, 2017 Contents 1 Overview 3 1.1 Inputs................................................... 3 1.2 Common Options.............................................
More informationOlmo S. Zavala Romero. Clustering Hierarchical Distance Group Dist. K-means. Center of Atmospheric Sciences, UNAM.
Center of Atmospheric Sciences, UNAM November 16, 2016 Cluster Analisis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster)
More informationClustering k-mean clustering
Clustering k-mean clustering Genome 373 Genomic Informatics Elhanan Borenstein The clustering problem: partition genes into distinct sets with high homogeneity and high separation Clustering (unsupervised)
More informationClustering: Overview and K-means algorithm
Clustering: Overview and K-means algorithm Informal goal Given set of objects and measure of similarity between them, group similar objects together K-Means illustrations thanks to 2006 student Martin
More informationComparison of Clustering Methods: a Case Study of Text-Independent Speaker Modeling
Comparison of Clustering Methods: a Case Study of Text-Independent Speaker Modeling Tomi Kinnunen, Ilja Sidoroff, Marko Tuononen, Pasi Fränti Speech and Image Processing Unit, School of Computing, University
More informationA Comparison of Resampling Methods for Clustering Ensembles
A Comparison of Resampling Methods for Clustering Ensembles Behrouz Minaei-Bidgoli Computer Science Department Michigan State University East Lansing, MI, 48824, USA Alexander Topchy Computer Science Department
More informationSolving Capacitated P-Median Problem by Hybrid K-Means Clustering and Fixed Neighborhood Search algorithm
Proceedings of the 2010 International Conference on Industrial Engineering and Operations Management Dhaka, Bangladesh, January 9 10, 2010 Solving Capacitated P-Median Problem by Hybrid K-Means Clustering
More informationIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence COMP307 Machine Learning 2: 3-K Techniques Yi Mei yi.mei@ecs.vuw.ac.nz 1 Outline K-Nearest Neighbour method Classification (Supervised learning) Basic NN (1-NN)
More informationLecture 13: k-means and mean-shift clustering
Lecture 13: k-means and mean-shift clustering Juan Carlos Niebles Stanford AI Lab Professor Stanford Vision Lab Lecture 13-1 Recap: Image Segmentation Goal: identify groups of pixels that go together Lecture
More informationApproximating max-min linear programs with local algorithms
Approximating max-min linear programs with local algorithms Patrik Floréen, Marja Hassinen, Petteri Kaski, Topi Musto, Jukka Suomela HIIT seminar 29 February 2008 Max-min linear programs: Example 2 / 24
More informationCS 2750: Machine Learning. Clustering. Prof. Adriana Kovashka University of Pittsburgh January 17, 2017
CS 2750: Machine Learning Clustering Prof. Adriana Kovashka University of Pittsburgh January 17, 2017 What is clustering? Grouping items that belong together (i.e. have similar features) Unsupervised:
More informationECE6095: CAD Algorithms. Optimization Techniques
ECE6095: CAD Algorithms Optimization Techniques Mohammad Tehranipoor ECE Department 6 September 2010 1 Optimization Techniques Objective: A basic review of complexity Review of basic algorithms Some physical
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More information