Clustering CS498

Today's lecture. Clustering and unsupervised learning. Hierarchical clustering. K-means, K-medoids, VQ.

Unsupervised learning. Supervised learning: use labeled data to do something smart. What if the labels don't exist?

Some inspiration: El Capitan, Yosemite National Park.

The way we'll see it. [3D scatter plot of the image's pixels in RGB color space; axes: Red, Green, Blue, each running 0 to 1]

A new question. I see classes, but: how do I find them? Can I automate this? How many are there? Answer: Clustering. [Same 3D RGB scatter plot; axes: Red, Green, Blue]

Clustering. Discover classes in data. Divide data into sensible clusters. Fundamentally an ill-defined problem: there is often no correct solution, and it relies on many user choices.

Clustering process. Describe your data using features: what's your objective? Define a proximity measure: how is the feature space shaped? Define a clustering criterion: when do samples make a cluster?

Know what you want. Features & objective matter. Which are the two classes?

Know what you want. Features & objective matter. Which are the two classes? Example: basketball player recruiting, with player's knowledge of entomology on one axis and player's height on the other.

Know your space. Define a sensible proximity measure.

Know your space. Define a sensible proximity measure. Example: angle of incidence, where 0, 2π, and 4π are the same angle.

Know your cluster type. What forms a cluster in your space?

Know your cluster type. What forms a cluster in your space? Example: planes near airports, plotted as altitude vs. speed.

Know your cluster type. What forms a cluster in your space? Example: sound bouncing off a wall, plotted as intensity vs. wavefront position.

Know your cluster type. What forms a cluster in your space? Example: ants departing a colony, plotted on North/South and East/West axes.

How many clusters? The deeper you look, the more you'll get.

There are no right answers! Part of clustering is an art. You need to experiment to get there, but some good starting points exist.

How to cluster. Tons of methods. We can use sequences of logical steps, e.g., find the two closest points and merge them, then repeat. Or formulate a global criterion.

Hierarchical methods. Agglomerative algorithms: keep pairing up your data. Divisive algorithms: keep breaking up your data.

Agglomerative approach. Look at your data points and form pairs. Keep at it.

More formally. Represent data as vectors: $X = \{x_i,\ i = 1,\dots,N\}$. Represent clusters by $C_j$, and the clustering by $R = \{C_j,\ j = 1,\dots,m\}$, e.g. $R = \{\{x_1, x_3\}, \{x_2\}, \{x_4, x_5, x_6\}\}$. Define a distance measure $d(C_i, C_j)$.

Agglomerative clustering. Choose: $R_0 = \{C_i = \{x_i\},\ i = 1,\dots,N\}$. For $t = 1, 2, \dots$: among all clusters in $R_{t-1}$, find the cluster pair $\{C_i, C_j\}$ with $(i,j) = \arg\min_{i,j} d(C_i, C_j)$; form the new cluster $C_q = C_i \cup C_j$ and replace the pair: $R_t = (R_{t-1} \setminus \{C_i, C_j\}) \cup \{C_q\}$. Until we have only one cluster.
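
A minimal Python/NumPy sketch of this loop, assuming single linkage as the cluster distance (the helper names are illustrative, not from the lecture):

```python
import numpy as np

def single_link(A, B):
    # d(C_i, C_j) = distance between the two closest points (single linkage).
    return min(np.linalg.norm(a - b) for a in A for b in B)

def agglomerative(X):
    R = [[x] for x in X]  # R_0: every point is its own cluster
    history = [R]
    while len(R) > 1:
        # Find the cluster pair with the smallest distance...
        pairs = [(i, j) for i in range(len(R)) for j in range(i + 1, len(R))]
        i, j = min(pairs, key=lambda p: single_link(R[p[0]], R[p[1]]))
        # ...merge it (C_q = C_i ∪ C_j) and replace the pair in R.
        R = [C for k, C in enumerate(R) if k not in (i, j)] + [R[i] + R[j]]
        history.append(R)
    return history  # the nested clusterings R_0, ..., R_{N-1}

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
for t, R in enumerate(agglomerative(X)):
    print(f"R_{t}: {len(R)} clusters")
```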

Dendrogram. The pretty-picture version. [Dendrogram over $x_1, \dots, x_5$ with merge levels $R_0$ through $R_4$ against a similarity axis]

Venn diagram. Pretty picture, take two. [Nested groupings of $x_1, \dots, x_5$ corresponding to $R_1$ through $R_4$]

Cluster distance? Complete linkage: merge the clusters that result in the smallest diameter. Single linkage: merge the clusters with the two closest data points. Group average: use the average of all pairwise distances.
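
As a sketch, the three criteria differ only in how the cluster distance is computed from pairwise point distances (function names are illustrative):

```python
import numpy as np

def pairwise(A, B):
    # All point-to-point Euclidean distances between clusters A and B.
    return [np.linalg.norm(a - b) for a in A for b in B]

def single_linkage(A, B):
    return min(pairwise(A, B))    # two closest points

def complete_linkage(A, B):
    return max(pairwise(A, B))    # farthest points: merging keeps diameters small

def group_average(A, B):
    d = pairwise(A, B)
    return sum(d) / len(d)        # average of all pairwise distances
```

In practice one would reach for scipy.cluster.hierarchy.linkage, which provides these as method='single', 'complete', and 'average'.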

What's involved. At level $t$ we have $N - t$ clusters. At level $t+1$ the number of pairs we consider is $\binom{N-t}{2} = \frac{(N-t)(N-t-1)}{2}$. Overall comparisons: $\sum_{t=0}^{N-1} \binom{N-t}{2} = \frac{(N-1)N(N+1)}{6}$.

Not good for our case. The El Capitan picture has 63,140 pixels. How many cluster comparisons is that? $\sum_{t=0}^{N-3} \binom{N-t}{2} = 41{,}946{,}968{,}141{,}536$. Thanks, but no thanks.

Divisive clustering. Works the other way around: start with all data in one cluster, then keep dividing it into sub-clusters.

Divisive clustering. Choose: $R_0 = \{X\}$. For $t = 1, 2, \dots$: for $k = 1,\dots,t$, find the least similar sub-clusters within each cluster; pick the least similar pair of all: $(k,i,j) = \arg\max_{k,i,j} d(C_{k,i}, C_{k,j})$. The new clustering is $R_t = (R_{t-1} \setminus \{C_k\}) \cup \{C_{k,i}, C_{k,j}\}$. Until each point is a cluster.
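
A hedged sketch of the divisive direction. Note it substitutes a cheap farthest-pair split for the exhaustive sub-cluster search described above, so it is a simplified variant, not the slide's exact algorithm:

```python
import numpy as np

def split_cluster(C):
    # Split C using its two mutually farthest points as seeds; each point
    # joins the seed it is closer to.
    D = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2)
    i, j = np.unravel_index(D.argmax(), D.shape)
    to_i = np.linalg.norm(C - C[i], axis=1) <= np.linalg.norm(C - C[j], axis=1)
    return C[to_i], C[~to_i]

def divisive(X, levels=3):
    R = [X]  # R_0 = {X}: all data in a single cluster
    for _ in range(levels):
        # Split the widest cluster (largest diameter = least internal similarity).
        widths = [np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2).max()
                  if len(C) > 1 else -1.0 for C in R]
        k = int(np.argmax(widths))
        if widths[k] < 0:
            break  # only singletons left, nothing to split
        a, b = split_cluster(R[k])
        R = R[:k] + R[k + 1:] + [a, b]
    return R
```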

Comparison. Which one is faster? Agglomerative: divisive has a complicated search step. Which one gives better results? Divisive: agglomerative makes only local observations.

Using cost functions. Given a set of data $x_i$, define a cost function: $J(\theta, U) = \sum_i \sum_j u_{ij}\, d(x_i, \theta_j)$, where $\theta$ are the cluster parameters, $U \in \{0,1\}^{N \times m}$ is an assignment matrix, and $d(\cdot)$ is a distance function.
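
This cost transcribes directly into NumPy. A sketch, assuming $d$ is squared Euclidean distance (anticipating k-means below):

```python
import numpy as np

def cost(X, theta, U):
    # J(theta, U) = sum_i sum_j u_ij * d(x_i, theta_j),
    # with d taken here as squared Euclidean distance.
    D = ((X[:, None, :] - theta[None, :, :]) ** 2).sum(axis=2)  # N x m distances
    return float((U * D).sum())
```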

An iterative solution. We can't use a gradient method: the assignment matrix is binary-valued. We have two sets of parameters to find, $\theta$ and $U$. Fix one and solve for the other, then swap roles and repeat. Iterate until happy.

Overall process. Initialize $\theta$ and iterate. Estimate $U$: $u_{ij} = 1$ if $d(x_i, \theta_j) = \min_k d(x_i, \theta_k)$, and $0$ otherwise. Estimate $\theta$: solve $\sum_i u_{ij} \frac{\partial}{\partial \theta_j} d(x_i, \theta_j) = 0$. Repeat until satisfied.

K-means. Standard and super-popular algorithm. Finds clusters in terms of region centers. Optimizes squared Euclidean distance: $d(x, \theta) = \lVert x - \theta \rVert^2$.

K-means algorithm. Initialize $k$ means $\mu_j$. Iterate: assign each sample $x_i$ to the closest mean $\mu_j$; re-estimate each $\mu_j$ from its assigned samples $x_i$. Repeat until convergence.
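
A minimal NumPy implementation of these two alternating steps, as a sketch rather than an optimized library routine:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]  # init: k random samples
    for _ in range(iters):
        # Assignment step: each sample goes to its closest mean.
        D = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = D.argmin(axis=1)
        # Update step: each mean becomes the average of its assigned samples
        # (keep the old mean if a cluster ends up empty).
        mu_new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                           else mu[j] for j in range(k)])
        if np.allclose(mu_new, mu):
            break  # converged
        mu = mu_new
    return mu, labels

# Two well-separated blobs; expect means near (0, 0) and (5, 5).
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
mu, labels = kmeans(X, k=2)
print(mu)
```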

Example run, steps 1 through 5. [Five scatter plots showing the means and assignments converging over successive iterations]

How well does it work? Converges to a (local) minimum of the cost function, though not for all distances! Is heavily biased by the starting positions; various initialization tricks exist. It's pretty fast!

K-means on El Capitan. [Two 3D scatter plots of the pixels in RGB space; axes: Red, Green, Blue]

K-means on El Capitan

K-means on El Capitan. [Two more 3D scatter plots in RGB space; axes: Red, Green, Blue]

K-means on El Capitan

One problem. K-means struggles with outliers. [Two scatter plots illustrating the effect of outliers on the means]

K-medoids. Medoid: the data point least dissimilar to all the others. Not as influenced by outliers as the mean. Replace the means with medoids: redesign k-means as k-medoids.
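
A sketch of that redesign: the same alternation as k-means, but the update step picks each cluster's medoid. This is the simple alternating variant, not the full PAM algorithm:

```python
import numpy as np

def medoid(points):
    # The point with the smallest summed distance to all others in its cluster.
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return points[D.sum(axis=1).argmin()]

def kmedoids(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step is unchanged from k-means...
        D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = D.argmin(axis=1)
        # ...but the update step picks a medoid instead of a mean.
        new_centers = np.array([medoid(X[labels == j]) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```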

K-medoids. [Three scatter plots: input data, k-means result, k-medoids result]

Vector Quantization. Use of clustering for compression. Keep a codebook (e.g., the k-means means). Transmit the index of the nearest codebook vector instead of the current sample: we send only the cluster index, not the full data for each sample.
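
A sketch of the encode/decode path, assuming the sender and receiver share a codebook (e.g., k-means centers) so that only integer indices travel over the wire:

```python
import numpy as np

def vq_encode(X, codebook):
    # Each sample is replaced by the index of its nearest codebook vector;
    # these small integers are all we transmit.
    D = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return D.argmin(axis=1)

def vq_decode(indices, codebook):
    # The receiver looks the indices back up in its copy of the codebook.
    return codebook[indices]

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # e.g. k-means centers
X = np.array([[0.1, -0.05], [0.9, 1.1], [0.2, 0.8]])
idx = vq_encode(X, codebook)
print(idx)                       # [0 1 2]
print(vq_decode(idx, codebook))  # reconstruction from the indices alone
```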

Simple example. [2-D scatter plots illustrating the quantization]

Vector Quantization in Audio. [Input sequence and coded sequence spectrograms with cluster indices 1-4; axes: Frequency vs. Time]

Recap. Hierarchical clustering: agglomerative, divisive; issues with performance. K-means: fast and easy. K-medoids for more robustness (but slower).