Cluster Evaluation and Expectation Maximization. Adapted from Doug Downey and Bryan Pardo, Northwestern University.


Kinds of Clustering
- Sequential: fast
- Cost optimization: fixed number of clusters
- Hierarchical: start with many clusters, join clusters at each step

k-means Clustering

Hierarchical Agglomerative Clustering
- Start with N groups, each containing one instance
- Merge similar groups to form larger groups until there is a single one
Divisive Clustering
- Start with a single group
- Divide large groups into smaller groups until each group contains a single instance

Sec. 17.2 Closest pair of clusters
Many variants for defining the closest pair of clusters (see the sketch below):
- Single-link: similarity of the most cosine-similar pair
- Complete-link: similarity of the furthest points, the least cosine-similar pair
- Centroid: clusters whose centroids are the most cosine-similar
- Average-link: average cosine similarity between pairs of elements
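To make these linkage variants concrete, here is a minimal sketch using SciPy's hierarchical clustering; the toy vectors and the choice of two clusters are invented for illustration (SciPy uses Euclidean distance by default, while the slides phrase the criteria in terms of cosine similarity).

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy document vectors (rows); in IR these would be tf-idf vectors.
X = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [0.5, 0.5]])

# Each method implements one definition of "closest pair of clusters".
for method in ["single", "complete", "average", "centroid"]:
    Z = linkage(X, method=method)                    # agglomerative merge tree
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
    print(method, labels)
```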

Cluster Labeling Differential Cluster Labeling Cluster-internal Labeling

What is a good clustering? Internal criteria. Example of an internal criterion: the reconstruction error in K-means,

$E(\{m_i\}_{i=1}^{k} \mid X) = \sum_t \sum_i b_i^t \, \lVert x^t - m_i \rVert^2$, where $b_i^t = 1$ if $\lVert x^t - m_i \rVert = \min_j \lVert x^t - m_j \rVert$ and $b_i^t = 0$ otherwise.

But an internal criterion often does not evaluate the actual utility of a clustering in the application. Alternative: external criteria, i.e., evaluate with respect to a human-defined classification.
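As a quick sanity check, a hedged sketch of this reconstruction error in NumPy; the data points and centroids are invented for the example.

```python
import numpy as np

def reconstruction_error(X, centroids):
    # Sum of squared distances from each point to its nearest centroid;
    # taking the min over centroids realizes the b_i^t indicator.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.sum(np.min(dists, axis=1) ** 2)

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
m = np.array([[0.05, 0.1], [5.1, 4.95]])
print(reconstruction_error(X, m))  # small: every point sits near a centroid
```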

External criteria for clustering quality. Based on a gold-standard data set, e.g., the Reuters collection we also used for the evaluation of classification. Goal: the clustering should reproduce the classes in the gold standard. (But we only want to reproduce how documents are divided into groups, not the class labels.) A first measure of how well the classes are reproduced: purity.

External criterion: Purity. Ω = {ω_1, ω_2, ..., ω_K} is the set of clusters and C = {c_1, c_2, ..., c_J} is the set of classes. For each cluster ω_k, find the class c_j with the most members n_kj in ω_k; sum these maxima and divide by the total number of points: purity(Ω, C) = (1/N) Σ_k max_j |ω_k ∩ c_j|.

Sec. 16.3 Purity example. Cluster I: max(5, 1, 0) = 5. Cluster II: max(1, 4, 1) = 4. Cluster III: max(2, 0, 3) = 3. Purity(clusters I, II, III; classes x, o, ⋄) = (1/17)(5 + 4 + 3) ≈ 0.71.
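A small sketch of the purity computation, using the cluster-by-class counts from the example above (only the counts matrix comes from the slide; the code itself is illustrative).

```python
import numpy as np

# Rows: clusters I, II, III; columns: classes x, o, diamond.
counts = np.array([[5, 1, 0],
                   [1, 4, 1],
                   [2, 0, 3]])

# Purity: majority count per cluster, summed, over total points.
purity = counts.max(axis=1).sum() / counts.sum()
print(round(purity, 2))  # 0.71
```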

Normalized Mutual Information (NMI). How much information does the clustering contain about the classification? Singleton clusters (number of clusters = number of docs) have maximum MI. Therefore: normalize by the entropy of the clusters and the classes.

Normalized Mutual Information
NMI(Ω, C) = I(Ω; C) / [(H(Ω) + H(C)) / 2], where I(Ω; C) is the mutual information between the clustering and the classification and H is entropy.
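A hedged sketch that computes NMI from the same counts matrix, hand-rolling the formula above (scikit-learn's normalized_mutual_info_score computes the same quantity with this arithmetic-mean normalization).

```python
import numpy as np

def entropy(p):
    p = p[p > 0]                        # drop zero-probability terms
    return -np.sum(p * np.log(p))

def nmi(counts):
    n = counts.sum()
    pk = counts.sum(axis=1) / n         # cluster marginals P(omega_k)
    pj = counts.sum(axis=0) / n         # class marginals P(c_j)
    pkj = counts / n                    # joint P(omega_k, c_j)
    nz = pkj > 0                        # avoid log(0)
    mi = np.sum(pkj[nz] * np.log(pkj[nz] / np.outer(pk, pj)[nz]))
    return mi / ((entropy(pk) + entropy(pj)) / 2)

counts = np.array([[5, 1, 0], [1, 4, 1], [2, 0, 3]])
print(round(nmi(counts), 2))  # ~0.36 for the x/o/diamond example
```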

Rand index. Definition: RI = (TP + TN) / (TP + FP + FN + TN), based on the 2x2 contingency table of all pairs of documents; TP + FN + FP + TN is the total number of pairs. There are N(N - 1)/2 pairs for N documents; for example, 17 * 16 / 2 = 136 pairs in the o/⋄/x example. Each pair is either positive or negative (the clustering puts the two documents in the same or in different clusters), and either true (correct) or false (incorrect): the clustering decision is correct or incorrect.

Rand Index: Example

Rand measure for the o/⋄/x example: RI = (20 + 72)/(20 + 20 + 24 + 72) = 92/136 ≈ 0.68.
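A sketch that derives these pair counts and the Rand index from explicit assignments; the label vectors below are reconstructed so that their counts match the o/⋄/x example, so treat them as illustrative.

```python
from itertools import combinations

# Cluster and class assignments whose counts match the example:
# cluster I = 5x + 1o, cluster II = 1x + 4o + 1d, cluster III = 2x + 3d.
clusters = [1]*6 + [2]*6 + [3]*5
classes  = ['x']*5 + ['o'] + ['x'] + ['o']*4 + ['d'] + ['x']*2 + ['d']*3

tp = fp = fn = tn = 0
for i, j in combinations(range(len(clusters)), 2):  # all 136 pairs
    same_cluster = clusters[i] == clusters[j]
    same_class = classes[i] == classes[j]
    if same_cluster and same_class:
        tp += 1
    elif same_cluster:
        fp += 1
    elif same_class:
        fn += 1
    else:
        tn += 1

print(tp, fp, fn, tn)                             # 20 20 24 72
print(round((tp + tn) / (tp + fp + fn + tn), 2))  # RI = 0.68
```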

Sec. 16.3 Rand index and cluster F-measure: P = TP / (TP + FP), R = TP / (TP + FN).

Cluster F-measure: Example. With the pair counts TP = 20, FP = 20, FN = 24 from the Rand index example: P = 20/40 = 0.50, R = 20/44 ≈ 0.45.
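Continuing from the same pair counts, a short sketch of pairwise precision, recall, and F-measure; the balanced F1 form is an assumption here, since the slides leave the F-measure's weighting unspecified.

```python
tp, fp, fn = 20, 20, 24                 # pair counts from the o/diamond/x example
precision = tp / (tp + fp)              # 0.50
recall = tp / (tp + fn)                 # ~0.45
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.5 0.45 0.48
```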

Evaluation results for the o/⋄/x example. All four measures range from 0 (really bad clustering) to 1 (perfect clustering).

Hard vs. soft clustering. Hard clustering: each document belongs to exactly one cluster; this is more common and easier to do. Soft clustering: a document can belong to more than one cluster, e.g., a document about Chinese cars (China and automobiles) or a document about electric cars (technology and environment).

Model-based Clustering. K-means is a special case of model-based clustering.

Model-Based Clustering: k-means

EM is a general framework. Create an initial model θ (arbitrarily, randomly, or from a small set of training examples). Use the model θ to obtain another model θ′ such that Σ_i log P_θ′(y_i) > Σ_i log P_θ(y_i), i.e., θ′ models the data better. Let θ = θ′ and repeat the above step until reaching a local maximum. Each iteration is guaranteed not to decrease the likelihood of the data.
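A schematic sketch of that loop in Python; e_step and m_step are hypothetical placeholders for whichever model family is being fit.

```python
import numpy as np

def em(X, theta, e_step, m_step, tol=1e-6, max_iter=100):
    # Generic EM loop: e_step returns soft assignments plus the current
    # log-likelihood; m_step re-estimates parameters from the assignments.
    # The likelihood never decreases, so we stop at a local maximum.
    prev_ll = -np.inf
    for _ in range(max_iter):
        resp, ll = e_step(X, theta)   # expectation: score the data under theta
        theta = m_step(X, resp)       # maximization: fit theta' to assignments
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return theta
```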

Example - clustering documents

Inferring the Model Parameters from the Data. Similar to K-means, EM alternates between an expectation step (corresponding to reassignment) and a maximization step (corresponding to recomputation of the parameters of the model).

Maximization Step. (Re)compute the parameters q_mk and α_k (the priors) as follows: q_mk = Σ_n r_nk I(t_m ∈ d_n) / Σ_n r_nk and α_k = (Σ_n r_nk) / N, where r_nk is the soft assignment of document d_n to cluster k and I(t_m ∈ d_n) = 1 if term t_m occurs in d_n, and 0 otherwise.

Expectation Step. Compute the soft assignment of documents to clusters given the current parameters q_mk and α_k as follows: r_nk ∝ α_k (Π_{t_m ∈ d_n} q_mk) (Π_{t_m ∉ d_n} (1 − q_mk)), normalized so that Σ_k r_nk = 1.
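A minimal sketch of both steps for this Bernoulli mixture model, assuming X is a binary term-document matrix with documents as rows; the variable names mirror the slides' q_mk, alpha_k, and r_nk.

```python
import numpy as np

def e_step(X, alpha, q, eps=1e-10):
    # r_nk proportional to alpha_k * prod_m q_mk^x_nm * (1 - q_mk)^(1 - x_nm),
    # computed in log space for numerical stability.
    log_r = (np.log(alpha + eps)
             + X @ np.log(q.T + eps)
             + (1 - X) @ np.log(1 - q.T + eps))    # shape (N, K)
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)        # normalize rows to 1

def m_step(X, r, eps=1e-10):
    # q_mk = sum_n r_nk x_nm / sum_n r_nk; alpha_k = sum_n r_nk / N.
    nk = r.sum(axis=0)                             # effective cluster sizes
    q = (r.T @ X) / (nk[:, None] + eps)            # shape (K, M)
    alpha = nk / X.shape[0]
    return alpha, q
```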

Customizing Sentiment Classifiers to New Domains: a Case Study by Aue and Gamon

E and M steps. Expectation: given the current model, figure out the expected probabilities of the documents belonging to each cluster, p(x | θ_c). Maximization: given the probabilistic assignment of all the documents, estimate a new model θ_c. Each iteration increases the likelihood of the data, and the procedure is guaranteed to converge!

Similar to K-Means. Iterate (a runnable sketch follows below):
- K-means: assign/cluster each document to the closest center. EM expectation: given the current model, figure out the expected probabilities p(x | θ_c) of the documents belonging to each cluster.
- K-means: recalculate centers as the mean of the points in a cluster. EM maximization: given the probabilistic assignment of all the documents, estimate a new model θ_c.
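To make the analogy concrete, a short driver that alternates the two steps from the Bernoulli-mixture sketch above on toy binary data, just as k-means alternates assignment and recomputation; the toy matrix and K = 2 are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy binary term-document matrix: two groups of ten documents that
# use mostly disjoint halves of a six-term vocabulary.
X = np.vstack([rng.random((10, 6)) < [0.9]*3 + [0.1]*3,
               rng.random((10, 6)) < [0.1]*3 + [0.9]*3]).astype(float)

K = 2
alpha = np.full(K, 1 / K)                  # uniform priors
q = rng.uniform(0.25, 0.75, size=(K, 6))   # random initial term probabilities

for _ in range(20):                        # iterate, k-means style
    r = e_step(X, alpha, q)                # soft reassignment
    alpha, q = m_step(X, r)                # recompute model parameters

print(r.argmax(axis=1))  # hard labels should split the two document groups
```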

EM example: figures from Chris Bishop.