Multiclass Classification

Instructor: Jessica Wu, Harvey Mudd College
The instructor gratefully acknowledges Eric Eaton (UPenn), David Kauchak (Pomona), Tommi Jaakkola (MIT), and the many others who made their course materials freely available online. Robot image credit: Viktoriya Sukhanova, 123RF.com.

Learning Goals
- Describe common strategies for multiclass classification and the trade-offs of each: one versus all, one versus one, black-box approach using output codes.
- Describe how to evaluate multiclass problems: micro averaging, macro averaging.
- Describe the difference between multiclass and multilabel problems (extra).

Setup
Most classification problems are multiclass (more than 2 labels). Can you name some real-world examples? Document classification, protein classification, handwriting recognition, face recognition, sentiment analysis, autonomous vehicles, emotion recognition. Which of our current classifiers work out of the box?

Black Box Approach
Abstraction: we have a generic binary classifier that optionally outputs a confidence/score. Can we use it to solve a multiclass problem? Consider a simple three-class classification task.

Approach: One versus All (OVA)
For each class, pose a binary classification task: all examples of this class are positive, all other examples are negative. E.g., apple vs. not, orange vs. not, banana vs. not. How do we classify?

OVA with Linear Classifiers
[Figure: a toy 2D dataset with one linear decision boundary per binary task: banana vs. not, pineapple vs. not, apple vs. not.]

OVA: Classify
If the classifier does not provide a confidence (this is rare) and there is ambiguity, pick one of the classes in conflict. Otherwise, pick the most confident positive, where confidence is measured, e.g., by distance to the hyperplane; if none vote positive, pick the least confident negative. What does the decision boundary look like?
[Figure: the three linear classifiers (banana vs. not, pineapple vs. not, apple vs. not) and the resulting multiclass decision regions.]

Generalizing to Output Codes
Let y ∈ {1, 2, 3}. To turn an m-class classification task into m binary classification tasks, define an output code R. Each column defines how classes are translated to binary classes in one component task; each row corresponds to one of the original classes. For three classes, the OVA code is:

            task 1   task 2   task 3
   y = 1      +1       -1       -1
   y = 2      -1       +1       -1
   y = 3      -1       -1       +1

Ex: To solve task 3, any example with class y = 2 has target binary class R(2,3) = -1.
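The OVA recipe above (one binary task per class, predict the most confident positive) is easy to sketch in code. The sketch below is illustrative rather than the course's implementation: it assumes a scikit-learn-style binary learner whose decision_function gives a signed confidence (distance to the hyperplane), and the helper names are mine.

```python
# Illustrative one-vs-all (OVA) sketch (not from the course materials).
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_ova(X, y, classes):
    """Train one binary classifier per class: this class vs. not."""
    models = {}
    for c in classes:
        y_binary = np.where(y == c, 1, -1)   # positives: class c; negatives: everything else
        models[c] = LogisticRegression(max_iter=1000).fit(X, y_binary)
    return models

def predict_ova(models, X):
    """Pick the class whose classifier is most confident. Taking the argmax of the
    signed scores also handles the 'no positive votes' case: the least confident
    negative (largest score) wins."""
    classes = list(models.keys())
    scores = np.column_stack([models[c].decision_function(X) for c in classes])
    return np.array([classes[i] for i in scores.argmax(axis=1)])
```

scikit-learn's OneVsRestClassifier wraps the same idea for any binary estimator.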

Training
Separately train classifiers h_1(x), h_2(x), h_3(x) to solve each associated binary task. If D = {(x_n, y_n)} is the original 3-class training set, then h_1(x) is trained on {(x_n, R(y_n, 1))}, so that its binary labels correspond to the first column of R.

Prediction
Combine the outputs of the trained binary classifiers into a full multiclass classifier. Clearly, prediction of the original labels has to be based on the output code R. If h_i(x) ∈ {+1, -1}, we can simply predict the class label that best agrees with the binary predictions:

   ŷ = argmax_y Σ_{j=1..k} R(y, j) · h_j(x)

where k = 3 is the number of binary tasks. Notes: if R(y,j) and h_j(x) match (in sign), then h_j(x) agrees with predicting (multiclass) y. If each component classifier h_j(x) predicts R(y,j) consistent with the true label y, then the above sum will be maximized for that y. Each product determines whether binary task j agrees or disagrees with the possible label y.
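A minimal sketch of this agreement-based decoding, assuming the code matrix R and the binary predictions are already available as NumPy arrays (the names are mine, not the slides'):

```python
# Output-code decoding: pick the class whose row of R best agrees (in sign)
# with the binary predictions h(x). Names and layout are illustrative.
import numpy as np

R = np.array([[+1, -1, -1],    # row for y = 1 (OVA code for 3 classes)
              [-1, +1, -1],    # row for y = 2
              [-1, -1, +1]])   # row for y = 3

def decode(h):
    """h: length-k vector of binary predictions in {+1, -1}."""
    agreement = R @ h                       # sum_j R(y, j) * h_j(x), one value per class y
    return int(np.argmax(agreement)) + 1    # classes are 1-indexed

print(decode(np.array([+1, -1, -1])))       # agrees perfectly with row y = 1, so prints 1
```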

Example
h_1(x) = +1, h_2(x) = -1, h_3(x) = -1 (OVA code). What is the predicted label?

Extensions
Problems: classifier outputs may be contradictory. Ex: if h_1(x) = +1, h_2(x) = -1, h_3(x) = +1, then we would predict y = 1 or y = 3. Also, using h_i(x) ∈ {+1, -1} omits how strongly each classifier insists on its binary label.
Solutions: use the discriminant function values h_i(x) = θ_i^T x. Use a loss function, then predict the class most consistent with the classifiers (minimum loss):

   ŷ = argmin_y Σ_{j=1..k} Loss(h_j(x), R(y, j))

The loss function measures how poorly the discriminant function matches a particular binary class [explored in homework].
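One way to realize the loss-based rule, under the assumption that a margin-based loss such as the hinge loss on R(y, j) · h_j(x) is used; the particular loss in the homework may differ, so treat this as a sketch:

```python
# Loss-based decoding with real-valued discriminants h_j(x) = theta_j^T x.
# The hinge loss on the margin R(y, j) * h_j(x) is one reasonable choice,
# not necessarily the one used in the course homework.
import numpy as np

R = np.array([[+1, -1, -1],
              [-1, +1, -1],
              [-1, -1, +1]])

def hinge(margin):
    return np.maximum(0.0, 1.0 - margin)

def decode_with_loss(h):
    """h: length-k vector of real-valued discriminants."""
    losses = hinge(R * h).sum(axis=1)   # total loss for each candidate class y
    return int(np.argmin(losses)) + 1

print(decode_with_loss(np.array([0.4, -2.0, 0.3])))  # tasks 1 and 3 weakly positive; prints 1
```

Unlike the ±1 rule, the real-valued scores let a strongly confident classifier outweigh a weakly confident contradictory one.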

Problems with OVA?
What kind of training set will be difficult for OVA?

One versus One (OVO, also called All versus All)
For three classes, each binary task compares one pair of classes; a 0 in the output code indicates that a class's examples are not part of that binary classification task.

            1 vs 2   1 vs 3   2 vs 3
   y = 1      +1       +1        0
   y = 2      -1        0       +1
   y = 3       0       -1       -1

Ex: class 3 is excluded from the 1 vs 2 task.
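A minimal one-vs-one sketch (illustrative names, a scikit-learn-style binary learner assumed): each pairwise classifier is trained only on the examples of its two classes, which is exactly what the 0 entries in the code express, and prediction is by majority vote.

```python
# Illustrative one-vs-one (OVO) sketch (not from the course materials).
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def train_ovo(X, y, classes):
    """One binary classifier per pair of classes, trained only on that pair."""
    models = {}
    for a, b in combinations(classes, 2):
        mask = (y == a) | (y == b)          # classes with code 0 are excluded from this task
        y_pair = np.where(y[mask] == a, 1, -1)
        models[(a, b)] = LogisticRegression(max_iter=1000).fit(X[mask], y_pair)
    return models

def predict_ovo(models, X, classes):
    """Each pairwise classifier votes for one of its two classes; the majority wins."""
    index = {c: i for i, c in enumerate(classes)}
    votes = np.zeros((X.shape[0], len(classes)), dtype=int)
    for (a, b), model in models.items():
        pred = model.predict(X)             # +1 means class a, -1 means class b
        votes[pred == 1, index[a]] += 1
        votes[pred == -1, index[b]] += 1
    return np.array([classes[i] for i in votes.argmax(axis=1)])
```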

Comparing Output Codes
- Some output codes require more binary tasks. OVA: m tasks. OVO: m(m-1)/2 tasks.
- Some binary tasks require a larger training set. OVA: each task uses all of the training examples. OVO: each task uses only the examples from its two classes.
- Some binary tasks are harder to solve than others. OVA: each task must separate one class from all the rest. OVO: each task only has to separate two classes.
If using binary classifiers, OVA is the most common. scikit-learn uses OVA for all linear models except SVC, and OVO for SVC (see the sketch below).

Minimal Output Code?
A minimal output code uses the fewest binary tasks (⌈log2 m⌉, only two tasks for three classes), but one task could be hard, leading to a poorly performing classifier; with only two tasks, a single poorly performing classifier degrades performance considerably. With more binary classifiers, the multiclass classifier has a chance to error-correct.
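For reference, both reductions are available as explicit wrappers in scikit-learn (standard API; the iris data below is just a convenient toy example):

```python
# Wrapping a binary estimator with scikit-learn's OVA and OVO reductions.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

X, y = load_iris(return_X_y=True)   # 3 classes
ova = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(len(ova.estimators_), len(ovo.estimators_))   # m vs. m*(m-1)/2 binary models (both 3 when m = 3)
```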

Multiclass Evaluation
How should we evaluate? recall = TP / (TP + FN)

   class        prediction
   apple        orange
   orange       orange
   apple        apple
   banana       pineapple
   banana       banana
   pineapple    pineapple

Problems?

Micro and Macro Averaging
Per-class recall: apple = 1/2, orange = 1/1, banana = 1/2, pineapple = 1/1.
Micro averaging: average over examples (the normal way of calculating); recall = 4/6.
Macro averaging: calculate the metric for each class, then average over classes, which puts more emphasis on rarer classes; recall = 3/4.
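Both averages are easy to reproduce with scikit-learn's recall_score (standard API) on the six-example table above:

```python
# Micro- vs. macro-averaged recall on the six-example table from the slides.
from sklearn.metrics import recall_score

y_true = ["apple", "orange", "apple", "banana", "banana", "pineapple"]
y_pred = ["orange", "orange", "apple", "pineapple", "banana", "pineapple"]

print(recall_score(y_true, y_pred, average="micro"))   # 4/6 ~= 0.667 (average over examples)
print(recall_score(y_true, y_pred, average="macro"))   # 3/4  = 0.75  (average over classes)
```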

Confusion Matrices
Entry (i, j) represents the number of examples of class i that were predicted to be class j (rows = actual, columns = predicted).

   actual \ predicted   Classic  Country  Disco  Hiphop  Jazz  Rock
   Classic                 86       2       0      4      18     1
   Country                  1      57       5      1      12    13
   Disco                    0       6      55      4       0     5
   Hiphop                   0      15      28     90       4    18
   Jazz                     7       1       0      0      37    12
   Rock                     6      19      11      0      27    48

Multilabel Classification
Consider three different sets of questions we could ask about an example. Differences?
- Is it edible? Is it sweet? Is it a fruit? Is it a banana? (nested / hierarchical)
- Is it a banana? Is it an apple? Is it an orange? Is it a pineapple? (exclusive / multiclass)
- Is it a banana? Is it yellow? Is it sweet? Is it round? (general / structured)
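From a confusion matrix, per-class recall is just the diagonal divided by the row sums; a quick sketch using the genre matrix above (values copied from the slide, variable names mine):

```python
# Per-class recall from the genre confusion matrix (rows = actual, cols = predicted).
import numpy as np

genres = ["Classic", "Country", "Disco", "Hiphop", "Jazz", "Rock"]
cm = np.array([[86,  2,  0,  4, 18,  1],
               [ 1, 57,  5,  1, 12, 13],
               [ 0,  6, 55,  4,  0,  5],
               [ 0, 15, 28, 90,  4, 18],
               [ 7,  1,  0,  0, 37, 12],
               [ 6, 19, 11,  0, 27, 48]])

recall_per_class = cm.diagonal() / cm.sum(axis=1)   # TP / (TP + FN) for each actual class
for genre, r in zip(genres, recall_per_class):
    print(f"{genre:8s} recall = {r:.2f}")
print("macro-averaged recall =", recall_per_class.mean())
```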

Multiclass vs. Multilabel
Multiclass: each example has one and exactly one label.
Multilabel (also called annotation): each example has zero or more labels.
Multilabel applications? Image annotation, document topics, medical diagnosis.
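Multilabel targets are usually represented as a binary indicator per label, so a binary classifier can be reused per column. A minimal sketch with scikit-learn's MultiLabelBinarizer (standard API; the toy annotations are mine):

```python
# Multilabel representation: one 0/1 indicator column per label.
from sklearn.preprocessing import MultiLabelBinarizer

# Toy document-topic annotations: each example has zero or more labels.
labels = [["sports"], ["politics", "economy"], [], ["sports", "economy"]]
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)
print(mlb.classes_)   # ['economy' 'politics' 'sports']
print(Y)              # one row per example, one column per label
```

Training one binary classifier per column (e.g., with OneVsRestClassifier) then turns any binary learner into a multilabel classifier.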