Clustering algorithms


Machine Learning. Hamid Beigy, Sharif University of Technology, Fall 1393.

Table of contents
1. Supervised & unsupervised learning
2. Clustering
3. Hierarchical clustering
4. Non-hierarchical clustering

Supervised & unsupervised learning
The learning methods covered in class up to this point have focused on classification and regression. A training example consisted of a pair of variables (x, t), where x is a feature vector and t is the label/value. Such learning problems are called supervised, since the system is given both the feature vector and the correct answer.
We now investigate methods that operate on unlabeled data. Given a collection of feature vectors X = {x_1, x_2, ..., x_N} without labels/values t_i, these methods attempt to build a model that captures the structure of the data. These methods are called unsupervised, since they are not provided with the correct answer.
Although unsupervised learning methods may appear to have limited capabilities, several reasons make them useful:
Labeling large data sets can be a costly procedure, while raw data is cheap.
Class labels may not be known beforehand.
Large data sets can be compressed by finding a small set of prototypes.
One can train with a large amount of unlabeled data and then use supervision to label the groupings found.
Unsupervised methods can be used for feature extraction.
Exploratory data analysis can provide insight into the nature or structure of the data.

Unsupervised Learning
Unsupervised learning algorithms fall into two groups:
Non-parametric methods: these methods make no assumption about the underlying densities; instead, we seek a partition of the data into clusters.
Parametric methods: these methods model the underlying class-conditional densities with a mixture of parametric densities, and the objective is to find the model parameters:
$p(x \mid \theta) = \sum_i p(x \mid \omega_i, \theta_i) \, P(\omega_i)$
Examples of unsupervised learning: dimensionality reduction, latent variable learning, clustering.
A cluster is a number of similar objects collected or grouped together. A clustering algorithm partitions examples into groups when no labels are available. Sample applications include novelty detection and outlier detection.
Clusters are connected regions of a multidimensional space containing a relatively high density of points, separated from other such regions by a region containing a relatively low density of points.

Applications of Clustering
Clustering retrieved documents to present more organized and understandable results to the user (diversified retrieval).
Detecting near-duplicates, such as entity resolution.
Exploratory data analysis.
Automated (or semi-automated) creation of taxonomies.
Comparison.

Why do Unsupervised Learning?
Clustering is a very difficult problem because data can reveal clusters with different shapes and sizes.
How many clusters do you see in the figure?

Why do Unsupervised Learning? (cont.)
How many clusters do you see in the figure? (The question is repeated for several different example figures.)

Clustering
Clustering algorithms can be divided into several groups:
Exclusive (each pattern belongs to only one cluster) vs. non-exclusive (each pattern can be assigned to several clusters).
Hierarchical (nested sequence of partitions) vs. partitional (a single partition).
Families of clustering algorithms:
Hierarchical clustering
Centroid-based clustering
Distribution-based clustering
Density-based clustering
Grid-based clustering
Constraint-based clustering

Clustering
Challenges in clustering:
Selection of an appropriate measure of similarity to define clusters; this is often both data dependent (cluster shape) and context dependent.
Choice of the criterion function to be optimized (evaluation function).
Choice of the optimization method.
Similarity/distance measures:
Euclidean distance ($L_2$ norm): $L_2(x, y) = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2}$
$L_1$ norm: $L_1(x, y) = \sum_{i=1}^{N} |x_i - y_i|$
Cosine similarity: $\text{cosine}(x, y) = \frac{\sum_{i=1}^{N} x_i y_i}{\|x\| \, \|y\|}$
Evaluation function: assigns a (usually real-valued) value to a clustering; it is typically a function of within-cluster similarity and between-cluster dissimilarity.
Optimization method: find a clustering that maximizes the criterion. This can be done by global optimization methods (often intractable), greedy search methods, or approximation algorithms.
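These measures are straightforward to compute. Below is a minimal sketch in Python (NumPy assumed); the function names are illustrative and not part of the slides:

```python
import numpy as np

def euclidean(x, y):
    # L2 distance: square root of the summed squared coordinate differences.
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    # L1 distance: summed absolute coordinate differences.
    return np.sum(np.abs(x - y))

def cosine_similarity(x, y):
    # Dot product divided by the product of the vector norms.
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 1.0])
print(euclidean(x, y), manhattan(x, y), cosine_similarity(x, y))
```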

Hierarchical clustering
Organizes the clusters in a hierarchical way and produces a rooted tree (dendrogram).
Example taxonomy: Animal splits into Vertebrate (Fish, Reptile, Amphibian, Mammal) and Invertebrate (Worm, Insect, Crustacean).
Recursive application of a standard clustering algorithm can produce a hierarchical clustering.

Hierarchical clustering (cont.)
Organizes the clusters in a hierarchical way and produces a rooted binary tree (dendrogram).
Types of hierarchical clustering:
Agglomerative (bottom-up): methods start with each example in its own cluster and iteratively combine them to form larger and larger clusters.
Divisive (top-down): methods recursively separate all examples into smaller and smaller clusters.

Agglomerative (bottom-up)
Assumes a similarity function for determining the similarity of two clusters.
Starts with all instances in separate clusters and then repeatedly joins the two clusters that are most similar, until there is only one cluster. The history of merging forms a binary tree or hierarchy.
Basic algorithm:
Start with all instances in their own cluster.
Until there is only one cluster:
Among the current clusters, determine the two clusters c_i and c_j that are most similar.
Replace c_i and c_j with a single cluster c_i ∪ c_j.
Cluster similarity: how do we compute the similarity of two clusters, each possibly containing multiple instances?
Single linkage: similarity of the two most similar members.
Complete linkage: similarity of the two least similar members.
Group average: average similarity between members. This method uses the average similarity across all pairs within the merged cluster to measure the similarity of two clusters, and is a compromise between single and complete linkage.
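To make the basic algorithm concrete, here is a minimal, unoptimized sketch in Python (NumPy assumed; the helper names and the `linkage` argument are illustrative, not from the slides). It recomputes pairwise cluster similarities naively, so it costs roughly O(n^3) rather than the O(n^2 log n) discussed on a later slide:

```python
import numpy as np

def cluster_similarity(a, b, sim, linkage="single"):
    # a, b: lists of point indices; sim: precomputed pairwise similarity matrix.
    pair_sims = [sim[i, j] for i in a for j in b]
    if linkage == "single":      # similarity of the two most similar members
        return max(pair_sims)
    if linkage == "complete":    # similarity of the two least similar members
        return min(pair_sims)
    return sum(pair_sims) / len(pair_sims)   # group average

def agglomerative(points, linkage="single"):
    # Use negative Euclidean distance as the pointwise similarity.
    n = len(points)
    diffs = points[:, None, :] - points[None, :, :]
    sim = -np.sqrt((diffs ** 2).sum(axis=-1))
    clusters = [[i] for i in range(n)]
    merges = []                               # history of merges (the dendrogram)
    while len(clusters) > 1:
        # Find the pair of current clusters with the highest similarity.
        i, j = max(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: cluster_similarity(clusters[ab[0]],
                                                     clusters[ab[1]], sim, linkage))
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
print(agglomerative(X, linkage="complete"))
```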

Single-Link (bottom-up)
$\text{sim}(c_i, c_j) = \max_{x \in c_i,\, y \in c_j} \text{sim}(x, y)$

Complete-Link (bottom-up)
$\text{sim}(c_i, c_j) = \min_{x \in c_i,\, y \in c_j} \text{sim}(x, y)$

Computational Complexity of HAC
In the first iteration, all HAC methods need to compute the similarity of all pairs of the n individual instances, which is O(n^2).
In each of the subsequent O(n) merging iterations, we must find the smallest-distance pair of clusters; maintaining a heap of candidate pairs gives O(n^2 log n) overall.
In each of the subsequent O(n) merging iterations, we must also compute the distance between the most recently created cluster and all other existing clusters. Can this be done in constant time, so that the overall cost remains O(n^2 log n)?

Centroid-Based Clustering
Assumes instances are real-valued vectors.
Clusters are represented via centroids, for example the average of the points in a cluster:
$\mu(c) = \frac{1}{|c|} \sum_{x \in c} x$
Reassignment of instances to clusters is based on distance to the current cluster centroids.
K-Means algorithm
Input: k = number of clusters, distance measure d.
Select k random instances s_1, s_2, ..., s_k as seeds.
Until the clustering converges or another stopping criterion is met:
For each instance x_i: assign x_i to the cluster c_j such that d(x_i, s_j) is minimum.
For each cluster c_j: update its centroid, s_j = µ(c_j).
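A minimal sketch of this algorithm in Python (NumPy assumed; the function name `kmeans` and its arguments are illustrative, not from the slides):

```python
import numpy as np

def kmeans(X, k, n_iters=100, rng=None):
    rng = rng or np.random.default_rng(0)
    # Select k random instances as the initial seeds.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of the points assigned to it.
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):   # converged
            break
        centroids = new_centroids
    return centroids, labels

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
centroids, labels = kmeans(X, k=2)
print(centroids)
```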

Time Complexity
Assume computing the distance between two instances is O(D), where D is the dimensionality of the vectors.
Reassigning clusters for N points: O(kN) distance computations, i.e. O(kND).
Computing centroids: each instance gets added once to some centroid, i.e. O(ND).
Assume these two steps are each done once in each of m iterations: O(mkND) total.
Problems with K-means
Results can vary based on random seed selection, especially for high-dimensional data.
Some seeds can result in a poor convergence rate, or in convergence to sub-optimal clusterings.
Sensitive to outliers.
Idea: combine HAC and K-means clustering.
Convergence of K-means
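A standard mitigation for seed sensitivity (not covered in the slides) is to run K-means several times with different random seeds and keep the clustering with the lowest within-cluster sum of squared distances. A short sketch, reusing the illustrative `kmeans` function and data X from the previous example:

```python
import numpy as np

def within_cluster_sse(X, centroids, labels):
    # Sum of squared distances from each point to its assigned centroid.
    return float(((X - centroids[labels]) ** 2).sum())

best = None
for seed in range(10):
    centroids, labels = kmeans(X, k=2, rng=np.random.default_rng(seed))
    sse = within_cluster_sse(X, centroids, labels)
    if best is None or sse < best[0]:
        best = (sse, centroids, labels)
print("best SSE:", best[0])
```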

Gaussian mixture model
A mixture model is a linear combination of K densities:
$p(x \mid \theta) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$
The set of parameters is $\theta = \{\{\pi_k\}, \{\mu_k\}, \{\Sigma_k\}\}$.
$\pi$ is a discrete distribution, i.e. $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K} \pi_k = 1$.
Each component is a multivariate Gaussian:
$\mathcal{N}(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{D/2} |\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)\right)$
To generate a sample x from the mixture model: (1) sample a mixture component $z \sim \pi$, (2) sample $x \in \mathbb{R}^D$ from the z-th component, $x \sim \mathcal{N}(\mu_z, \Sigma_z)$.
An alternative viewpoint: z is a 1-of-K binary vector. Then
$p(x) = \sum_z p(x \mid z) \, p(z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$
and the posterior distribution is
$p(z_k \mid x) = \frac{p(x \mid z_k) \, p(z_k)}{p(x)} = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}$
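A minimal sketch of this two-step generative process (sample a component, then sample from that component's Gaussian), assuming NumPy; the parameter values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-component mixture in D = 2 dimensions.
pis    = np.array([0.3, 0.7])                    # mixing coefficients, sum to 1
mus    = np.array([[0.0, 0.0], [4.0, 4.0]])      # component means
sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])  # component covariances

def sample_gmm(n):
    # Step 1: sample the component index z ~ pi for each draw.
    zs = rng.choice(len(pis), size=n, p=pis)
    # Step 2: sample x ~ N(mu_z, Sigma_z) from the chosen component.
    xs = np.array([rng.multivariate_normal(mus[z], sigmas[z]) for z in zs])
    return xs, zs

X, zs = sample_gmm(500)
print(X.shape, np.bincount(zs))
```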

Gaussian Mixtures and EM
Initialize $\pi$, $\mu$, and $\Sigma$. Repeat until convergence:
E-step: evaluate the posterior probabilities
$p(z_k \mid x_n) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$
M-step: update the parameter values
$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} p(z_k \mid x_n) \, x_n$
$\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} p(z_k \mid x_n) (x_n - \mu_k)(x_n - \mu_k)^T$
$\pi_k = \frac{N_k}{N}, \quad \text{where } N_k = \sum_{n=1}^{N} p(z_k \mid x_n)$
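A compact sketch of this EM loop in Python, assuming NumPy and SciPy's multivariate normal density (function and variable names are illustrative, and a small diagonal term is added to each covariance for numerical stability):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=50, rng=None):
    rng = rng or np.random.default_rng(0)
    N, D = X.shape
    # Initialize pi, mu, Sigma.
    pis = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, size=K, replace=False)]
    sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    for _ in range(n_iters):
        # E-step: responsibilities p(z_k | x_n) for every point and component.
        dens = np.column_stack([pis[k] * multivariate_normal.pdf(X, mus[k], sigmas[k])
                                for k in range(K)])
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update mu_k, Sigma_k, pi_k using the responsibilities.
        Nk = resp.sum(axis=0)
        mus = (resp.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            sigmas[k] = (resp[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pis = Nk / N
    return pis, mus, sigmas

pis, mus, sigmas = em_gmm(X, K=2)   # X from the sampling sketch above
print(pis)
print(mus)
```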