7/19/09 A Primer on Clustering: I. Models and Algorithms 1
- Milo Ethan Charles
4 A Primer on Clustering: I. Models and Algorithms
[Image caption: "Attention! United States 2 : Spain 1"]
1. Clustering: similarity, labels, and partitions
2. U only models: SAHN partitioning
3. V only models: SOM point prototypes
4. (U, V) models: c-means with AO
5. (U, V, +) models: GMD with EM-AO
5 Numerical Pattern Recognition: 4 topics
[Diagram: humans ("knowledge tidbits") and sensors supply X = object data and R = relation data. Box labels as transcribed: Process Description (feature nomination, design, test); Feature Analysis (extraction, preprocessing, 2D display); Classifier Design (identification, prediction, control); Cluster Analysis (exploration, validity, 2D display).]
This module: Cluster Analysis
6 2 Kinds of Data for Clustering
Objects: $O = \{o_1, \ldots, o_n\}$, where $o_i$ = i-th physical object.
Object data: $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^p$, where $x_i$ = feature vector for $o_i$ and $x_{ji}$ = j-th (measured) feature of $x_i$, $1 \le j \le p$.
Relational data: $R = [r_{ij}]$ = relationship between $(o_i, o_j)$ or $(x_i, x_j)$.
$s_{ij}$ = pairwise similarity of $(o_i, o_j)$ or $(x_i, x_j)$.
$d_{ij}$ = pairwise dissimilarity of $(o_i, o_j)$ or $(x_i, x_j)$.
Typically ($R = D$):
$d_{ii} = 0$ for $1 \le i \le n$ and $d_{ij} \ge 0$ for $1 \le i, j \le n$ (positive-definite);
$d_{ij} = d_{ji}$ for $1 \le i \ne j \le n$ (symmetric).
We often convert $X \to D$ with a distance $d_{ij} = \|x_i - x_j\|$.
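The conversion from object data X to relational data D can be sketched in a few lines. This is a minimal illustration that assumes the Euclidean norm; the function name `pairwise_euclidean` and the three sample points are made up for the example and do not come from the slides.

```python
import numpy as np

def pairwise_euclidean(X):
    """Return the n x n matrix D with D[i, j] = ||x_i - x_j||_2."""
    X = np.asarray(X, dtype=float)
    # Broadcasting builds all n*n difference vectors at once: shape (n, n, p).
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical object data: n = 3 points in R^2.
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = pairwise_euclidean(X)
```

The result has a zero diagonal and is symmetric, which are the two properties the slide lists for D.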
7 Measuring (Dis)Similarity: The Usual Suspects
Inner product norms:
Mahalanobis: $d_{Maha} = \|x - v\|_{M^{-1}} = \sqrt{(x - v)^T M^{-1} (x - v)}$
Diagonal: $d_{diag} = \|x - v\|_{D_M^{-1}} = \sqrt{(x - v)^T D_M^{-1} (x - v)}$
Euclidean: $d_2 = \|x - v\|_I = \sqrt{(x - v)^T (x - v)}$
Minkowski norms:
City block: $d_1 = \|x - v\|_1 = \sum_{j=1}^{p} |x_j - v_j|$
Sup or max: $d_\infty = \|x - v\|_\infty = \max_{1 \le j \le p} \{|x_j - v_j|\}$
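As a rough sketch of how these distances might be computed, the two helpers below cover the inner product family (with the weighting matrix passed in by the caller) and the Minkowski family. The function names, the parameter `q`, and the sample vectors are assumptions made for illustration only.

```python
import numpy as np

def inner_product_distance(x, v, A=None):
    """sqrt((x - v)^T A^{-1} (x - v)); A=None means A = I (Euclidean).

    Pass A = M for Mahalanobis or A = D_M for the diagonal norm.
    """
    d = np.asarray(x, dtype=float) - np.asarray(v, dtype=float)
    if A is None:
        return float(np.sqrt(d @ d))
    # Solve A y = d instead of forming the inverse explicitly.
    return float(np.sqrt(d @ np.linalg.solve(np.asarray(A, dtype=float), d)))

def minkowski_distance(x, v, q=1):
    """q = 1 gives city block, q = np.inf gives the sup (max) norm."""
    d = np.abs(np.asarray(x, dtype=float) - np.asarray(v, dtype=float))
    if np.isinf(q):
        return float(d.max())
    return float((d ** q).sum() ** (1.0 / q))

# Hypothetical vectors and covariance matrix.
x, v = [1.0, 2.0], [4.0, 6.0]
M = np.diag([9.0, 16.0])
d_euclid = inner_product_distance(x, v)
d_maha = inner_product_distance(x, v, M)
d_city = minkowski_distance(x, v, 1)
d_sup = minkowski_distance(x, v, np.inf)
```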
8 Sample Parameters of the Statistical Norms
Mean vector: $\bar{v} = \sum_{k=1}^{n} x_k / n$
Covariance matrix: $M = [m_{ij}] = \sum_{k=1}^{n} (x_k - \bar{v})(x_k - \bar{v})^T / n$
Variance (matrix): $D_M = \mathrm{diag}(m_{11}, \ldots, m_{pp}) = \mathrm{diag}(s_1^2, \ldots, s_p^2)$
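A small sketch of these sample parameters, assuming the divide-by-n form of the covariance as it appears to be written on the slide. The function name and the two-point data set are hypothetical.

```python
import numpy as np

def sample_parameters(X):
    """Return (mean vector, covariance matrix M, variance matrix D_M)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    v_bar = X.sum(axis=0) / n
    centered = X - v_bar
    # Sum of outer products (x_k - v_bar)(x_k - v_bar)^T, divided by n.
    M = centered.T @ centered / n
    # Keep only the diagonal (the per-feature variances).
    D_M = np.diag(np.diag(M))
    return v_bar, M, D_M

# Hypothetical data: n = 2 points in R^2.
X = np.array([[1.0, 2.0], [3.0, 6.0]])
v_bar, M, D_M = sample_parameters(X)
```

Note that many libraries divide by n - 1 by default; `np.cov(X.T, bias=True)` matches the divide-by-n form used here.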
9 What is Clustering? (Unsupervised Learning)
Label the similar objects.
What is similar? How many groups (c)?
10 Data sets usually contain mixtures of objects.
Clustering groups and labels objects.
Easy for humans! Hard for computers!
11 Clustering: 3 Problems
Unlabeled data: $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^p$ (or) $R = [r_{ij}] \in \mathbb{R}^{n \times n}$
P1: Tendency. X or R has clusters? No: stop.
P2: Clustering. Find (c) clusters. Most important and difficult problem!
P3: Validity. Are the c clusters OK? Yes: stop. No: return to P2.
12 Unipolar Labels, c = 3
Crisp: $N_{h3}$ = vertices
Fuzzy or probabilistic: $N_{f3}$ = triangle
Possibilistic: $N_{p3}$ = cube $- \{0\}$
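13 Partition Matrices
Membership functions: $u_i : O \to [0, 1]$, with $u_i(o_k) = u_{ik}$ = membership of $o_k$ in cluster i.
Columns are indexed by the objects $o_1, \ldots, o_k, \ldots, o_n$:
$$U = \begin{bmatrix} u_{11} & \cdots & u_{1k} & \cdots & u_{1n} \\ \vdots & & \vdots & & \vdots \\ u_{i1} & \cdots & u_{ik} & \cdots & u_{in} \\ \vdots & & \vdots & & \vdots \\ u_{c1} & \cdots & u_{ck} & \cdots & u_{cn} \end{bmatrix}$$
Row i = memberships of all $o_k$'s in cluster i.
Column k = memberships of $o_k$ in each cluster.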
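14 Partition matrix types
Property | Crisp | Fuzzy/Prob. | Possibilistic
Row sums | $\sum_k u_{ik} > 0$ | same | same
Col sums | $\sum_i u_{ik} = 1$ | same | $\sum_{i=1}^{c} u_{ik} \le c$
Memberships | $u_{ik} \in \{0, 1\}$ | $u_{ik} \in [0, 1]$ | same
Set name | $M_{hcn}$ | $M_{fcn}$ | $M_{pcn}$
Example | [example matrices not preserved in the transcription]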
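The three sets of constraints can be checked mechanically. The sketch below is one possible test, not a routine from the slides: the function name, the tolerance `tol`, and the sample matrices are assumptions. It also requires every column to be nonzero for the possibilistic case, which is how the "cube minus {0}" label set reads.

```python
import numpy as np

def partition_type(U, tol=1e-9):
    """Classify a c x n matrix as crisp, fuzzy/probabilistic, or possibilistic."""
    U = np.asarray(U, dtype=float)
    in_unit = np.all((U >= -tol) & (U <= 1.0 + tol))   # u_ik in [0, 1]
    rows_ok = np.all(U.sum(axis=1) > tol)              # no empty cluster
    cols_nonzero = np.all(U.max(axis=0) > tol)         # no all-zero label vector
    if not (in_unit and rows_ok and cols_nonzero):
        return "not a partition"
    cols_sum_to_one = np.allclose(U.sum(axis=0), 1.0)
    binary = np.all(np.isclose(U, 0.0) | np.isclose(U, 1.0))
    if cols_sum_to_one and binary:
        return "crisp"
    if cols_sum_to_one:
        return "fuzzy/probabilistic"
    return "possibilistic"

# Hypothetical examples with c = 2 clusters.
kind_crisp = partition_type([[1, 0, 0], [0, 1, 1]])
kind_fuzzy = partition_type([[0.9, 0.5], [0.1, 0.5]])
kind_poss = partition_type([[0.9, 0.5], [0.7, 0.2]])
```

Because the checks run from most to least restrictive, a crisp matrix is reported as crisp even though it also satisfies the fuzzy and possibilistic constraints.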
15 Clustering is NOT Classifier Design
Clustering: data $X$ → C → $U \in M_{hcn} \subset M_{fcn} \subset M_{pcn}$ (+ other parameters $V$, ...). Clustering labels only the submitted data.
Classifier: $z \in \mathbb{R}^p$ → D → $D(z) \in N_{hc} \subset N_{fc} \subset N_{pc}$. Classifier D can label all the data in its domain.
16 Hardening Partitions
Defuzzification (deprobabilization = Bayes rule for $U \in M_{fcn}$!): $U \in M_{pcn}$ → $H_1(U) = U_{MM} \in M_{hcn}$; resolve ties randomly.
$\alpha$-cut thresholding ($0 < \alpha < 1$) filters chosen levels of membership.
[Example matrices for U, H_1(U), H_.9(U) and H_.8(U) not preserved in the transcription]
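Both hardening operations are easy to sketch. The code below assumes that maximum-membership hardening puts a 1 in the row of the largest entry of each column (ties broken at random, as the slide says) and that an alpha-cut keeps entries with membership at least alpha; the second reading is an assumption, since the slide's example matrices did not survive. Function names and the sample U are hypothetical.

```python
import numpy as np

def harden_max_membership(U, rng=None):
    """Return the crisp partition with a single 1 per column, at the max entry."""
    U = np.asarray(U, dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    H = np.zeros(U.shape, dtype=int)
    for k in range(U.shape[1]):
        # All rows that attain the column maximum; pick one at random on ties.
        winners = np.flatnonzero(U[:, k] == U[:, k].max())
        H[rng.choice(winners), k] = 1
    return H

def alpha_cut(U, alpha):
    """Return 1 where u_ik >= alpha and 0 elsewhere (assumed definition)."""
    return (np.asarray(U, dtype=float) >= alpha).astype(int)

# Hypothetical fuzzy partition: c = 2 clusters, n = 3 objects.
U = np.array([[0.9, 0.2, 0.5],
              [0.1, 0.8, 0.5]])
H = harden_max_membership(U, rng=np.random.default_rng(0))
H_cut = alpha_cut(U, 0.8)
```

In this sketch the alpha-cut at 0.8 leaves the third column all zeros, so under the assumed definition the thresholded matrix is not necessarily a crisp partition.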
17 Clusters represented by partition matrix (U) or prototypes $V = \{v_k\}$
$U_c$: crisp, hard boundaries
$U_f$: fuzzy, soft boundaries
[Example matrices and figure not preserved in the transcription]
18 3 Types of Basic Clustering Models
U models: find clusters $U \in M_{pcn}$. Usually relational models, e.g., single linkage. $V = F(U)$ can be computed, if you want it.
V models: find point prototypes $V \in \mathbb{R}^{cp}$. Usually local sequential models, e.g., SOM. $U = G(V)$ can be computed; super important for VL data methods.
(U, V) models: find $(U, V) \in M_{pcn} \times \mathbb{R}^{cp}$. Usually global batch models, e.g., c-means.
(U, V, +) models have other parameters, e.g., $\{p_i\}$, $\{\Sigma_i\}$ in GMD.
19 A Primer on Clustering: I. Models and Algorithms
[Image caption: "Me, Budweiser man of the match? Why not me?"]
2. U only models: SAHN partitioning
3. V only models: point prototypes with SOM
4. (U, V) models: c-means with AO
5. (U, V, +) models: GMD with EM-AO
20 U Models: SAHN Clustering
S = Sequential: at most one $U_c \in M_{hcn}$ for c from n to 1
A = Agglomerative: begins at c = n, merges clusters until c = 1
(Divisive: begins at c = 1, splits clusters until c = n)
H = Hierarchical: joined objects remain in nested clusters
N = Non-overlapping: $\{U_k\} \subset M_{hcn}$ are crisp partitions of O
21 Algorithm SAHN: find $U_k \in M_{hcn}$ for each k from n to 1
Input: object data $X \subset \mathbb{R}^p$, converted to (or given directly as) relational data $R \in \mathbb{R}^{n \times n}$:
Dissimilarity: $D_{n \times n} = [D_{ij}] = [\|x_i - x_j\|]$, with $D_{ii} = 0 \ \forall i$ and $D = D^T$; (or)
Similarity: $S_{ii} = 1 \ \forall i$ and $S = S^T$.
Pick the intercluster (set) distance: $d_{SAHN} = d_{SL}$, $d_{AL}$ or $d_{CL}$.
ONLY 1 user choice: no parameters or thresholds! A nice aspect of SAHN clustering.
Set k = 1 and $C = \{\{1\}, \{2\}, \ldots, \{n\}\}$ (C = initial clusters).
22 Dissimilarity Input Case
For k = 2 to n:
Find W, V in C to merge: $\min d_k = d_{SAHN,k}(W, V) = \min_{S, T \in C,\ S \ne T} \{d_{SAHN,k}(S, T)\}$; merge W and V into $W \cup V$.
Update C: $C^* = C - \{W, V\}$; $C = C^* \cup \{W \cup V\}$.
Update D: delete the rows/cols in D for W and V; add a row/col in D for the new $W \cup V$: $\{d_{SAHN}(W \cup V, S) : S \in C,\ S \ne W \cup V\}$.
Outputs:
Crisp partitions $\{U_{SAHN,k} \in M_{h(n-k+1)n} : 1 \le k \le n\}$
Merger distances $\{\min d_k : 1 \le k \le n\}$
23 Geometry of Set Distances (dissimilarities)
Single linkage (set) distance: $d_{SL}(W, V) = \min_{i \in W,\ j \in V} \{d_{ij}\}$
Complete linkage (set) distance: $d_{CL}(W, V) = \max_{i \in W,\ j \in V} \{d_{ij}\}$
24 Geometry of Set Distances (dissimilarities)
Average linkage (set) distance: $d_{AL}(W, V) = \sum_{i \in W,\ j \in V} D_{ij} \,/\, (|W| \, |V|)$
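The SAHN loop and the three set distances above can be sketched as follows. This is a simplified illustration, not the slides' exact procedure: instead of deleting and adding rows and columns of D, it recomputes each set distance from the original D, which gives the same merges for SL, CL and AL. Object indices are 0-based, and the function names and the four-point data set are made up.

```python
import numpy as np

def set_distance(D, W, V, linkage):
    """Single ("SL"), complete ("CL") or average ("AL") linkage between W and V."""
    block = D[np.ix_(W, V)]            # all d_ij with i in W, j in V
    if linkage == "SL":
        return block.min()
    if linkage == "CL":
        return block.max()
    if linkage == "AL":
        return block.sum() / (len(W) * len(V))
    raise ValueError("linkage must be 'SL', 'CL' or 'AL'")

def sahn(D, linkage="SL"):
    """Return the nested crisp partitions (c = n down to 1) and merger distances."""
    D = np.asarray(D, dtype=float)
    clusters = [[i] for i in range(D.shape[0])]
    partitions = [[list(c) for c in clusters]]
    merge_distances = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = set_distance(D, clusters[a], clusters[b], linkage)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        # C* = C - {W, V}, then C = C* plus the merged cluster.
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
        partitions.append([sorted(c) for c in clusters])
        merge_distances.append(float(d))
    return partitions, merge_distances

# Hypothetical data: four points on a line at 0, 1, 3, 7.
pts = np.array([0.0, 1.0, 3.0, 7.0])
D = np.abs(pts[:, None] - pts[None, :])
parts_sl, dist_sl = sahn(D, "SL")
parts_cl, dist_cl = sahn(D, "CL")
```

The brute-force search over all cluster pairs keeps the sketch short; it is not meant to be efficient for large n.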
25 Geometry of Linkage Clusters (dissimilarities)
[Figure: one data set where SL works and CL fails, and one where SL fails and CL works]
AL supposedly mediates between these 2 extremes.
26 Ex. 1: SL Clustering
5 points X in R; D = dissimilarity matrix of Euclidean distances
[Figure and matrix entries not preserved in the transcription]
27 MST on 5 nodes shows evolution of SL clusters
1st clusters: $C_5 = \{\{1\}, \{2\}, \{3\}, \{4\}, \{5\}\}$
1st merger: $C_3 = \{3, 4, 5\}$
2nd merger: $C_2 = \{1, 2\}$
3rd merger: $C_1 = \{1, 2, 3, 4, 5\}$
[The reduced D matrices shown after each merger are not preserved; the only surviving entry is 2.24]
28 Dendrogram on 5 nodes shows evolution of SL clusters
[Figure: dendrogram with the SL merger distances min $d_k$ marked at each level from c = 5 down to c = 1]
29 A Primer on Clustering: I. Models and Algorithms
[Image caption: "Naturally, I can shut out Spain"]
3. V only models: SOM point prototypes
4. (U, V) models: c-means with AO
5. (U, V, +) models: GMD with EM-AO
30 V Models: 2 ways to get point prototypes $V \in \mathbb{R}^{cp}$
$V_S$: SELECTED in X (the prototypes are points of X)
$V_R$: BUILT from X (the prototypes are computed from X)
31 Point Prototype Algorithms: MANY choices
32 Three Kinds of CL Models
Nbhd. | Updated | Examples
$N_t^{-1} = v_i$ | Winner only | SHCM, VQ
$N_t^{-1} \subset V$ | A subset of V | SOM
$N_t^{-1} = V$ | All c nodes | GVQ-F, SCS, DR
$N_t^{-1}$ = update "neighborhood" in $\mathbb{R}^p$ at iterate t; it can be embedded in the definition of the learning rates.
The update subset of V is not a real neighborhood in $\mathbb{R}^p$; the neighborhood is determined in display space.
33 Network Architecture of SOM/VQ Models
[Figure: input layer with inputs $x_1, x_2, \ldots, x_p$; hidden layer with nodes $v_1, \ldots, v_j, \ldots, v_c$ computing $\|x - v_j\|$; output layer returning the winner $v_s$, where $s = \arg\min_j \{\|x - v_j\|\}$]
34 Basic CL Algorithm
Input: unlabeled object data $X \subset \mathbb{R}^p$
User picks: c, T, $\{\alpha_{ik,t}\}$; $\{N_t^{-1}\}$; p; err
Initialize: $V_0 = (v_{1,0}, \ldots, v_{c,0}) \in \mathbb{R}^{cp}$
For t = 0 to T
  For k = 1 to n
    $s = \arg\min_j \{\|x_k - v_{j,t}\|\}$
    $v_{i,t+1} = v_{i,t} + \alpha_{ik,t} (x_k - v_{i,t}) \ \ \forall\, v_{i,t} \in N_t^{-1}(v_{s,t})$ (the update nbhd is a function of the winner)
  If $\|V_{t+1} - V_t\| \le$ err STOP
  Else next k
Output: $V^* \in \mathbb{R}^{cp}$
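A minimal sketch of the winner-only special case (the VQ-style row of the CL table, where only the winning prototype is updated) may help make the loop concrete. Several choices here are assumptions rather than the slide's settings: the learning rate symbol and its linear decay `alpha0 * (1 - t / T)`, the Euclidean norm, the Frobenius norm in the stopping test, checking termination once per pass through the data, and all names and sample data.

```python
import numpy as np

def winner_only_cl(X, V0, T=50, alpha0=0.5, err=1e-4):
    """Sequential competitive learning that updates only the winning prototype."""
    X = np.asarray(X, dtype=float)
    V = np.array(V0, dtype=float)          # copy, so V0 is left untouched
    for t in range(T):
        alpha = alpha0 * (1.0 - t / T)     # assumed linear learning-rate decay
        V_old = V.copy()
        for x in X:
            s = np.argmin(np.linalg.norm(x - V, axis=1))   # winner index
            V[s] += alpha * (x - V[s])     # v_new = v_old + alpha (x - v_old)
        if np.linalg.norm(V - V_old) <= err:
            break
    return V

def nearest_prototype_labels(X, V):
    """Crisp labels U = G(V): index of the nearest prototype for each x_k."""
    X = np.asarray(X, dtype=float)
    dists = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Hypothetical data: two well-separated groups of four points in R^2.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
V = winner_only_cl(X, V0=[[0.0, 0.0], [11.0, 11.0]])
labels = nearest_prototype_labels(X, V)
```

Because each update is a convex combination of the old prototype and a data point, each prototype in this example stays inside the bounding box of the group it wins.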
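35 The Update Geometry of Most CL Models
$v_{new} = v_{old} + \alpha (x - v_{old})$
[Figure: $v_{new}$ lies on the line through $v_{old}$ and x, displaced from $v_{old}$ by $\alpha (x - v_{old})$. With $\alpha = 1$, $v_{new} = x$. A positive $\alpha$ (excitation) moves the prototype toward x; a negative $\alpha$ (inhibition) moves it away from x.]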
36 SOM/VQ Update Neighborhood $N_t^{-1}(v_{s,t})$
[Figure: three panels. Old hidden layer in $\mathbb{R}^p$ with prototypes $v_{1,t}, v_{2,t}, \ldots, v_{c,t}$; "display space" in $\mathbb{R}^q$ with cells 1, 2, ..., c; new hidden layer in $\mathbb{R}^p$. $v_{s,t}$ wins in $\mathbb{R}^p$; its cell in $\mathbb{R}^q$ defines the neighborhood $N_t(v_{s,t})$ in $\mathbb{R}^q$; the corresponding prototypes $N_t^{-1}(v_{s,t}) \subset \mathbb{R}^p$ are the ones that get updated.]
37 VQ Example: IRIS
Each 4-vector represents an Iris flower:
n = 150 vectors
p = 4 features
c = 3 physical subspecies
$n_i$ = 50 vectors in each class
$x_k = (x_{1k}, x_{2k}, x_{3k}, x_{4k})^T$ = (sepal length, sepal width, petal length, petal width)
[Figure: Iris data, features (3, 4)]
c = 3 physical clusters, but only c = 2 geometric clusters
38 VQ Prototypes: Initializations $I_1$, $I_2$ in conv(Iris)
[Figure: initializations $I_1$ and $I_2$ and their final prototypes $V_{Final}$; both reach the same V*. Labels as transcribed: m = min, M = max, (M + m)/2, err = .1, err = .225. Matrix entries not preserved.]
See reference [BP95] for algorithmic parameters.
39 VQ Prototypes: Initializations $I_3$, $I_4$ outside conv(Iris)
Setosa same, but not so hot for $I_3$, $I_4$.
[Figure: $I_3$ and $I_4$ with their final prototypes $V_{Final}$]
YIKES!!! Clusters U = G(V) you can get with these prototypes will SUCK!!!
40 Things you need to know about SOM/VQ
Kohonen does NOT advocate SOM for clustering!
MANY schemes for learning rates $\{\alpha_{i,t}\}$ and neighborhood $N_t$
Learning rates $\{\alpha_{i,t}\}$ that shrink to 0 with (t) force termination artificially
Learning rates are often linear: $\{\alpha_{i,t} = \alpha_{i,0} (1 - (t/T)) : t = 1, 2, \ldots, T\}$
Good initialization for V is crucial
$0 < \alpha_{k,t} < 1$, $\sum \alpha_{k,t} = \infty$ and $\sum \alpha_{k,t}^2 < \infty$ guarantee $\{V_t\} \to$ (some) V*, but the properties of V* are unknown!
MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 205-206 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL INFORMAZIONE POLITECNICO DI BARI
More information4. Cluster Analysis. Francesc J. Ferri. Dept. d Informàtica. Universitat de València. Febrer F.J. Ferri (Univ. València) AIRF 2/ / 1
Anàlisi d Imatges i Reconeixement de Formes Image Analysis and Pattern Recognition:. Cluster Analysis Francesc J. Ferri Dept. d Informàtica. Universitat de València Febrer 8 F.J. Ferri (Univ. València)
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,
More informationCluster Analysis. Ying Shen, SSE, Tongji University
Cluster Analysis Ying Shen, SSE, Tongji University Cluster analysis Cluster analysis groups data objects based only on the attributes in the data. The main objective is that The objects within a group
More informationClustering Algorithms for general similarity measures
Types of general clustering methods Clustering Algorithms for general similarity measures general similarity measure: specified by object X object similarity matrix 1 constructive algorithms agglomerative
More informationClustering & Classification (chapter 15)
Clustering & Classification (chapter 5) Kai Goebel Bill Cheetham RPI/GE Global Research goebel@cs.rpi.edu cheetham@cs.rpi.edu Outline k-means Fuzzy c-means Mountain Clustering knn Fuzzy knn Hierarchical
More informationData Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining, 2 nd Edition
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar Outline Prototype-based Fuzzy c-means
More informationClustering (COSC 416) Nazli Goharian. Document Clustering.
Clustering (COSC 416) Nazli Goharian nazli@cs.georgetown.edu 1 Document Clustering. Cluster Hypothesis : By clustering, documents relevant to the same topics tend to be grouped together. C. J. van Rijsbergen,
More informationData Informatics. Seon Ho Kim, Ph.D.
Data Informatics Seon Ho Kim, Ph.D. seonkim@usc.edu Clustering Overview Supervised vs. Unsupervised Learning Supervised learning (classification) Supervision: The training data (observations, measurements,
More informationChapter 6: Cluster Analysis
Chapter 6: Cluster Analysis The major goal of cluster analysis is to separate individual observations, or items, into groups, or clusters, on the basis of the values for the q variables measured on each
More informationKTH ROYAL INSTITUTE OF TECHNOLOGY. Lecture 14 Machine Learning. K-means, knn
KTH ROYAL INSTITUTE OF TECHNOLOGY Lecture 14 Machine Learning. K-means, knn Contents K-means clustering K-Nearest Neighbour Power Systems Analysis An automated learning approach Understanding states in
More informationMarket basket analysis
Market basket analysis Find joint values of the variables X = (X 1,..., X p ) that appear most frequently in the data base. It is most often applied to binary-valued data X j. In this context the observations
More informationCS 229 Midterm Review
CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask
More informationStatistics 202: Data Mining. c Jonathan Taylor. Clustering Based in part on slides from textbook, slides of Susan Holmes.
Clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group will be similar (or
More information5/15/16. Computational Methods for Data Analysis. Massimo Poesio UNSUPERVISED LEARNING. Clustering. Unsupervised learning introduction
Computational Methods for Data Analysis Massimo Poesio UNSUPERVISED LEARNING Clustering Unsupervised learning introduction 1 Supervised learning Training set: Unsupervised learning Training set: 2 Clustering
More informationCLUSTER ANALYSIS. V. K. Bhatia I.A.S.R.I., Library Avenue, New Delhi
CLUSTER ANALYSIS V. K. Bhatia I.A.S.R.I., Library Avenue, New Delhi-110 012 In multivariate situation, the primary interest of the experimenter is to examine and understand the relationship amongst the
More informationFunction approximation using RBF network. 10 basis functions and 25 data points.
1 Function approximation using RBF network F (x j ) = m 1 w i ϕ( x j t i ) i=1 j = 1... N, m 1 = 10, N = 25 10 basis functions and 25 data points. Basis function centers are plotted with circles and data
More informationClustering. Pattern Recognition IX. Michal Haindl. Clustering. Outline
Clustering cluster - set of patterns whose inter-pattern distances are smaller than inter-pattern distances for patterns not in the same cluster a homogeneity and uniformity criterion no connectivity little
More informationBased on Raymond J. Mooney s slides
Instance Based Learning Based on Raymond J. Mooney s slides University of Texas at Austin 1 Example 2 Instance-Based Learning Unlike other learning algorithms, does not involve construction of an explicit
More informationPartially Supervised Clustering for Image Segmentation
Partially Supervised Clustering for Image Segmentation Amine M. Bensaid, Lawrence O. Hall Department of Computer Science and Engineering University of South Florida Tampa, FL 33620 {bensaid,hall}@csee.usf.edu
More information