K-Means Clustering 3/3/17


Unsupervised Learning

We have a collection of unlabeled data points, and we want to find underlying structure in the data. Examples:
- Identify groups of similar data points: clustering
- Find a better basis to represent the data: principal component analysis
- Compress the data to a shorter representation: auto-encoders

Unsupervised Learning

We have a collection of unlabeled data points, and we want to find underlying structure in the data. Applications:
- Generating the input representation for another AI or ML algorithm. Clusters could lead to states in a state-space search or MDP model; a new basis could be the input to a classification or regression algorithm.
- Making data easier to understand, by identifying what's important and/or discarding what isn't.

The Goal of Clustering

Given a bunch of data, we want to come up with a representation that will simplify future reasoning. Key idea: group similar points into clusters. Examples:
- Identifying objects in sensor data
- Detecting communities in social networks
- Constructing phylogenetic trees of species
- Making recommendations from similar users

EM Algorithm

E step ("expectation", a terrible name): classify the data using the current model.
M step ("maximization", a slightly less terrible name): generate the best model using the current classification of the data.
Initialize the model, then alternate E and M steps until convergence.
Note: the EM algorithm has many variations, including some that have nothing to do with clustering.

K-Means Algorithm

Model: k clusters, each represented by a centroid.
E step: assign each point to the closest centroid.
M step: move each centroid to the mean of the points assigned to it.
Convergence: an E step completes with no point changing its assignment.
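The E/M alternation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the `init` parameter is my own convenience for supplying starting centroids instead of a random choice.

```python
import numpy as np

def kmeans(X, k, init=None, max_iter=100, rng=None):
    """Minimal k-means sketch: alternate E and M steps until no
    assignment changes. `init` optionally supplies starting centroids."""
    rng = rng or np.random.default_rng(0)
    if init is None:
        # Random M-step-style start: k random data points as centroids.
        centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    else:
        centroids = np.asarray(init, dtype=float).copy()
    assign = np.full(len(X), -1)
    for _ in range(max_iter):
        # E step: assign each point to the closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break                      # convergence: no assignment changed
        assign = new_assign
        # M step: move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = X[assign == j].mean(axis=0)
    return centroids, assign
```

On two well-separated blobs this converges in a couple of iterations, with each blob ending up in its own cluster.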

K-Means Example

Initializing K-Means

Reasonable options:
1. Start with a random E step: randomly assign each point to a cluster in {1, 2, ..., k}.
2. Start with a random M step, either:
   a) Pick random centroids within the range of the data.
   b) Pick random data points to use as the initial centroids.
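The options above can each be sketched directly; the function names here are mine, for illustration only.

```python
import numpy as np

def random_e_step(X, k, rng):
    # Option 1: randomly assign each point to a cluster in {0, ..., k-1}.
    return rng.integers(0, k, size=len(X))

def random_centroids_in_range(X, k, rng):
    # Option 2a: random centroids within the bounding box of the data.
    return rng.uniform(X.min(axis=0), X.max(axis=0), size=(k, X.shape[1]))

def random_data_points(X, k, rng):
    # Option 2b: k distinct data points as the initial centroids.
    return X[rng.choice(len(X), size=k, replace=False)]
```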

K-Means in Action https://www.youtube.com/watch?v=bvfg7fd1h30

Another EM Example: GMMs

GMM: Gaussian mixture model.
A Gaussian distribution is the multivariate generalization of the normal distribution (the classic bell curve).
A Gaussian mixture is a distribution composed of several weighted Gaussians.
If we model our data as a Gaussian mixture, we're saying that each data point was a random draw from one of several Gaussian distributions (though we may not know which one).
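To make that generative story concrete, here is a short sketch that draws points from a two-component mixture by first choosing a component according to its mixing weight, then sampling from that component's Gaussian. All names and parameter values are illustrative.

```python
import numpy as np

def sample_gmm(n, weights, means, covs, rng=None):
    """Draw n points from a Gaussian mixture: pick a component by its
    mixing weight, then sample from that component's Gaussian."""
    rng = rng or np.random.default_rng(0)
    comps = rng.choice(len(weights), size=n, p=weights)
    points = np.array([rng.multivariate_normal(means[c], covs[c]) for c in comps])
    return points, comps

# Two well-separated components with equal mixing weights.
X, comps = sample_gmm(
    200,
    weights=[0.5, 0.5],
    means=[[0.0, 0.0], [10.0, 10.0]],
    covs=[np.eye(2) * 0.01, np.eye(2) * 0.01],
)
```

The returned `comps` records which Gaussian generated each point; in real unsupervised data that label is exactly the information we don't get to see.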

EM for Gaussian Mixture Models

Model: data drawn from a mixture of k Gaussians.
E step: compute the (log) likelihood of the data, i.e., each point's probability of having been drawn from each Gaussian.
M step: update the mean and covariance of each Gaussian, weighted by how responsible that Gaussian was for each data point.
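The two steps above can be sketched as follows. To keep the sketch short it assumes *spherical* Gaussians (a single variance per component rather than a full covariance matrix), and it takes starting means as an argument; both are my simplifications, not part of the general algorithm.

```python
import numpy as np

def gmm_em(X, k, init_means, n_iter=50):
    """EM sketch for a mixture of k spherical Gaussians."""
    n, d = X.shape
    means = np.asarray(init_means, dtype=float).copy()
    variances = np.full(k, X.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E step: each point's responsibility under each Gaussian,
        # computed in log space for numerical stability.
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_p = (np.log(weights)
                 - 0.5 * d * np.log(2 * np.pi * variances)
                 - sq / (2 * variances))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M step: re-estimate weights, means, and variances,
        # weighted by each Gaussian's responsibility for each point.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        variances = (resp * sq).sum(axis=0) / (d * nk)
    return weights, means, variances
```

Note the contrast with k-means: the E step produces soft (fractional) responsibilities rather than hard assignments, and the M step weights every point by those responsibilities.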

How Do We Pick k?

There's no hard rule. Sometimes the application for which the clusters will be used dictates k. If k is flexible, then we need to consider the tradeoffs:
- Higher k will always decrease the error (increase the likelihood).
- Lower k will always produce a simpler model.
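The first tradeoff can be checked numerically. For 1-D data the optimal k-clustering uses contiguous groups of the sorted values, so a brute-force search over split points finds the exact minimum within-cluster sum of squared errors, which shrinks every time k grows. This helper and its toy data are purely illustrative.

```python
from itertools import combinations
import numpy as np

def best_sse(x, k):
    """Exact minimum within-cluster SSE for 1-D data, by brute force
    over contiguous partitions of the sorted values."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    best = np.inf
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        # Sum squared deviations from each segment's mean.
        sse = sum(((x[bounds[i]:bounds[i + 1]]
                    - x[bounds[i]:bounds[i + 1]].mean()) ** 2).sum()
                  for i in range(k))
        best = min(best, sse)
    return best
```

On the data [0, 1, 10, 11, 12], going from k = 1 to k = 2 buys a huge drop in error, while going from k = 2 to k = 3 buys very little; that diminishing return is what "elbow"-style heuristics for choosing k look for.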

Hierarchical Clustering

Organizes data points into a hierarchy: every level of the binary tree splits the points into two subsets. Points within a subset should be more similar than points in different subsets. The resulting clustering can be represented by a dendrogram.

Direction of Clustering

Agglomerative (bottom-up): each point starts in its own cluster; repeatedly merge the two most similar clusters until only one remains.
Divisive (top-down): all points start in a single cluster; repeatedly split the data into the two most self-similar subsets.
Either version can stop early if a specific number of clusters is desired.
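A naive version of the agglomerative procedure fits in a few lines. This O(n³) sketch uses single linkage (cluster distance = distance between the two closest members) and is for illustration only; real implementations such as scipy.cluster.hierarchy are far more efficient.

```python
import numpy as np

def agglomerative(X, n_clusters):
    """Naive bottom-up clustering: start with singleton clusters and
    repeatedly merge the two closest, under single linkage."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]   # merge the two most-similar clusters
        del clusters[b]
    return clusters
```

Stopping the merge loop at `n_clusters` is the "stop early" option mentioned above; letting it run to a single cluster records the full dendrogram.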