INF4820, Algorithms for AI and NLP: Evaluating Classifiers / Clustering


INF4820, Algorithms for AI and NLP: Evaluating Classifiers / Clustering. Erik Velldal, University of Oslo, Sept. 18, 2012

Topics for today: Classification recap: evaluating classifiers with accuracy, precision, recall and F-score. Clustering: unsupervised machine learning for class discovery; flat vs. hierarchical clustering; k-means as an example of flat / partitional clustering.

Topics we covered last week: Supervised vs. unsupervised learning. Vector space classification. How to represent classes and class membership. Rocchio + kNN. Linear vs. non-linear decision boundaries.

Testing a classifier: We've seen how vector space classification amounts to computing the boundaries in the space that separate the class regions: the decision boundaries. To evaluate a classifier, we measure the number of correct classification predictions on unseen test items. There are many ways to do this... Why can't we test on the training data? Because we want to measure how well the model generalizes, we evaluate it on a held-out test set (or, if we have little data, by n-fold cross-validation). Labeled test data is sometimes referred to as the gold standard.
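As a rough illustration of the held-out / cross-validation setup (not from the slides; the function name and data layout are our own), an n-fold split could be sketched in Python along these lines:

    import random

    def cross_validation_folds(items, n=10, seed=0):
        # Split the labelled data into n folds; each fold in turn serves as the
        # held-out test set while the remaining folds form the training set.
        items = list(items)
        random.Random(seed).shuffle(items)       # shuffle to avoid ordering effects
        folds = [items[i::n] for i in range(n)]  # round-robin split into n folds
        for i in range(n):
            test = folds[i]
            train = [x for j, fold in enumerate(folds) if j != i for x in fold]
            yield train, test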

Example: Evaluating classifier decisions: Predictions for a given class can be wrong or correct in two ways:

                            gold = positive       gold = negative
    prediction = positive   true positive (TP)    false positive (FP)
    prediction = negative   false negative (FN)   true negative (TN)

Evaluation measures:
accuracy = (TP + TN) / N = (TP + TN) / (TP + TN + FP + FN). The ratio of correct predictions. Not suitable when the numbers of positive and negative examples are unbalanced.
precision = TP / (TP + FP). The proportion of detected class members that were correct.
recall = TP / (TP + FN). The proportion of actual class members that were detected.
Trade-off: predicting positive for all examples would give 100% recall but (typically) terrible precision.
F-score = 2 · precision · recall / (precision + recall). Balanced measure of precision and recall (harmonic mean).
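The measures above translate directly into code. A minimal Python sketch (the function name is our own; the slides do not give code):

    def evaluation_measures(tp, fp, fn, tn):
        # Accuracy, precision, recall and F-score computed from the four counts.
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f_score = 2 * precision * recall / (precision + recall)
        return accuracy, precision, recall, f_score

    # The worked example on the next slide: TP = 1, FP = 1, FN = 2, TN = 6
    print(evaluation_measures(1, 1, 2, 6))  # -> (0.7, 0.5, 0.33..., 0.4)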

Example: Evaluating classifier decisions: With TP = 1, TN = 6, FP = 1 and FN = 2 (N = 10): accuracy = (TP + TN) / N = (1 + 6) / 10 = 0.7; precision = TP / (TP + FP) = 1 / (1 + 1) = 0.5; recall = TP / (TP + FN) = 1 / (1 + 2) = 0.33; F-score = 2 · precision · recall / (precision + recall) = 0.4.

Evaluating multi-class predictions: Macro-averaging: compute precision and recall separately for each class, and then average these across classes. The macro-average is highly influenced by the small classes. Micro-averaging: sum the TPs, FPs, and FNs for all points/objects across all classes, and then compute global precision and recall from the pooled counts. The micro-average is highly influenced by the large classes.
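A minimal sketch of the two averaging schemes, assuming the per-class counts are available as (TP, FP, FN) triples (the data layout and names are our own):

    def macro_average(counts):
        # counts: list of (tp, fp, fn), one triple per class; every class weighs equally.
        precisions = [tp / (tp + fp) for tp, fp, fn in counts]
        recalls = [tp / (tp + fn) for tp, fp, fn in counts]
        return sum(precisions) / len(precisions), sum(recalls) / len(recalls)

    def micro_average(counts):
        # Pool the counts across classes first; large classes dominate the result.
        tp = sum(c[0] for c in counts)
        fp = sum(c[1] for c in counts)
        fn = sum(c[2] for c in counts)
        return tp / (tp + fp), tp / (tp + fn)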

Two categorization tasks in machine learning: Classification: supervised learning, requiring labeled training data. Given a training set of examples with class labels, train a classifier to predict the class labels of unseen objects. Clustering: unsupervised learning from unlabeled data. Automatically group similar objects together. No pre-defined classes: we only specify the similarity measure. Sometimes described as the search for structure in data. General objective: partition the data into subsets so that the similarity among members of the same group is high (homogeneity) while the similarity between the groups themselves is low (heterogeneity).

Example applications of cluster analysis: Visualization and exploratory data analysis. Generalization and abstraction: reason by analogy; we can have class-based models even without predefined classes, which helps alleviate the sparse data problem. Many applications within IR, for example: speeding up search (first retrieve the most relevant cluster, then retrieve documents from within that cluster); presenting search results (instead of ranked lists, organize the results as clusters, see e.g. clusty.com); news aggregation / topic directories; social network analysis (identifying sub-communities and user segments). Also image segmentation, product recommendations, demographic analysis, ...

Types of clustering methods: Different methods can be divided according to the memberships they create and the procedure by which the clusters are formed. Procedure: flat, hierarchical (agglomerative or divisive), or hybrid. Memberships: hard, soft, or disjunctive.

Types of clustering methods (cont'd): Hierarchical: creates a tree structure of hierarchically nested clusters. Divisive (top-down): let all objects be members of the same cluster; then successively split the group into smaller and maximally dissimilar clusters until each object is its own singleton cluster. Agglomerative (bottom-up): let each object define its own cluster; then successively merge the most similar clusters until only one remains. Flat: often referred to as partitional clustering when assuming hard and disjoint clusters (but can also be soft). Tries to directly decompose the data into a set of clusters.

Flat clustering: Given a set of objects O = {o_1, ..., o_n}, a hard flat clustering algorithm seeks to construct a set of clusters C = {c_1, ..., c_k}, where each object o_i is assigned to a single cluster. The cardinality k (= the number of clusters) must typically be specified manually as a parameter to the algorithm, but the most important parameter is the similarity function s. More formally, we want to define an assignment γ : O → C that optimizes some objective function F_s(γ). The objective function is defined in terms of the similarity function, and generally we want to optimize for high intra-cluster similarity and low inter-cluster similarity.

Flat clustering (cont'd): Optimization problems are search problems: there is a finite number of possible partitionings of O. Naive solution: enumerate all possible assignments Γ = {γ_1, ..., γ_m} and choose the best one, γ̂ = arg min_{γ ∈ Γ} F_s(γ). Problem: there are exponentially many possible partitions. Instead we approximate the solution by iteratively improving on an initial (possibly random) partition until some stopping criterion is met.

k-means: Unsupervised variant of the Rocchio classifier. Goal: partition the n observed objects into k clusters C so that each point x_j belongs to the cluster c_i with the nearest centroid μ_i. Typically assumes Euclidean distance as the similarity function s. The optimization problem: minimize the within-cluster sum of squares, F_s = WCSS, where WCSS = Σ_{c_i ∈ C} Σ_{x_j ∈ c_i} ||x_j − μ_i||². WCSS also amounts to the more general measure of how well a model fits the data known as the residual sum of squares (RSS). Minimizing RSS is equivalent to minimizing the average squared distance between objects and their cluster centroids (since n is fixed), a measure of how well each centroid represents the members assigned to its cluster.
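Written out concretely, the WCSS objective is just a double sum of squared distances. A small NumPy sketch, assuming points is an (n × d) array, assignment maps each point index to a cluster index, and centroids holds one vector per cluster (all names are our own):

    import numpy as np

    def wcss(points, assignment, centroids):
        # Sum of squared Euclidean distances from each point to the centroid
        # of the cluster it is assigned to.
        return sum(np.sum((points[i] - centroids[assignment[i]]) ** 2)
                   for i in range(len(points)))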

k-means (cont'd): Algorithm: Initialize: compute centroids for k seeds. Iterate: assign each object to the cluster with the nearest centroid, then compute new centroids for the clusters. Terminate: when the stopping criterion is satisfied. Properties: In short, we iteratively reassign memberships and recompute centroids until the configuration stabilizes. WCSS is monotonically decreasing (or unchanged) for each iteration. Guaranteed to converge, but not to find the global minimum. The time complexity is linear in the number of objects, O(kn) per iteration.
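A bare-bones sketch of this loop in NumPy (our own code, not from the slides; it seeds by picking k random objects, one of the heuristics mentioned on a later slide):

    import numpy as np

    def k_means(points, k, max_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        # Seed by picking k random objects from the collection as initial centroids.
        centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
        assignment = np.full(len(points), -1)
        for _ in range(max_iter):
            # Assign each object to the cluster with the nearest centroid.
            dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
            new_assignment = dists.argmin(axis=1)
            if np.array_equal(new_assignment, assignment):
                break                              # clusters unchanged: converged
            assignment = new_assignment
            # Recompute centroids; keep the old one if a cluster goes empty.
            for i in range(k):
                members = points[assignment == i]
                if len(members) > 0:
                    centroids[i] = members.mean(axis=0)
        return assignment, centroids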

(Figure: k-means example for k = 2 in R², from Manning, Raghavan & Schütze 2008.)

Comments on k-means: Seeding: We initialize the algorithm by choosing random seeds that we use to compute the first set of centroids. There are many possible heuristics for selecting the seeds: pick k random objects from the collection; pick k random points in the space; pick k sets of m random points and compute centroids for each set; compute a hierarchical clustering on a subset of the data to find k initial clusters; etc. The initial seeds can have a large impact on the resulting clustering (because we typically end up finding only a local minimum of the objective function). Outliers are troublemakers.

Comments on k-means: Possible termination criteria: a fixed number of iterations; clusters or centroids unchanged between iterations; a threshold on the decrease of the objective function (absolute or relative to the previous iteration). Some close relatives of k-means: k-medoids: like k-means, but uses medoids instead of centroids to represent the cluster centers. Fuzzy c-means (FCM): like k-means, but assigns soft memberships in [0, 1], where membership is a function of the distance to the centroid; the computations of both WCSS and the centroids are weighted by the membership function.

Flat clustering: the good and the bad: Pros: conceptually simple and easy to implement; efficient, typically linear in the number of objects. Cons: the dependence on random seeds makes the clustering non-deterministic; the number of clusters k must be pre-specified, and there is often no principled means of specifying k a priori; the clustering quality is often considered inferior to that of the less efficient hierarchical methods; not as informative as the more structured clusterings produced by hierarchical methods.

Next week: Agglomerative vs. divisive hierarchical clustering. Reading: Chapter 17 in Manning, Raghavan & Schütze (2008), Introduction to Information Retrieval; http://informationretrieval.org/ (see the course web page for the relevant sections).