Evaluating Classifiers


Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website)

Evaluating Classifiers. What we want: a classifier that best predicts unseen ("test") data. Common assumption: the data are iid (independent and identically distributed).

Topics: Cross-Validation; Precision and Recall; ROC Analysis; Bias, Variance, and Noise.

Cross-Validation

Accuracy and Error Rate. Accuracy = fraction of correct classifications on unseen data (test set). Error rate = 1 − Accuracy.
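Both quantities are straightforward to compute once you have predictions on a held-out test set. A minimal sketch in Python (the labels and predictions below are made up purely for illustration):

```python
import numpy as np

# Hypothetical true labels and classifier predictions on an unseen test set.
y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

accuracy = np.mean(y_true == y_pred)   # fraction of correct classifications
error_rate = 1 - accuracy

print(f"accuracy = {accuracy:.3f}, error rate = {error_rate:.3f}")
```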

How to use the available data to best measure accuracy? Split the data into training and test sets. But how to split? Too little training data: cannot learn a good model. Too little test data: cannot evaluate the learned model. Also, how to learn the hyper-parameters of the model?

One solution: k-fold cross validation. Used to better estimate the generalization accuracy of a model. Used to learn the hyper-parameters of a model ("model selection").

Using k-fold cross validation to estimate accuracy. Each example is used both as a training instance and as a test instance. Instead of splitting the data into a training set and a test set, split the data into k disjoint parts: S_1, S_2, ..., S_k.
For i = 1 to k:
  Select S_i to be the test set. Train on the remaining data and test on S_i to obtain accuracy A_i.
Report (1/k) Σ_{i=1}^{k} A_i as the final accuracy.
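A minimal sketch of this procedure, assuming a generic `train` function (a hypothetical stand-in for whatever learning algorithm you are evaluating) that returns a fitted model with a `predict` method:

```python
import numpy as np

def cross_val_accuracy(X, y, train, k=5, seed=0):
    """Estimate generalization accuracy by k-fold cross validation.

    `train(X_train, y_train)` is assumed to return an object with a
    `predict(X)` method; it stands in for any learning algorithm.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))          # shuffle before splitting
    folds = np.array_split(indices, k)         # k disjoint parts S_1 ... S_k

    accuracies = []
    for i in range(k):
        test_idx = folds[i]                                    # S_i is the test fold
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])  # remaining data
        model = train(X[train_idx], y[train_idx])
        acc = np.mean(model.predict(X[test_idx]) == y[test_idx])
        accuracies.append(acc)

    return np.mean(accuracies)                 # (1/k) * sum_i A_i
```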

Using k-fold cross validation to learn hyper-parameters (e.g., learning rate, number of hidden units, SVM kernel, etc.).
Split the data into training and test sets. Put the test set aside. Split the training data into k disjoint parts: S_1, S_2, ..., S_k. Assume you are learning one hyper-parameter, and choose R possible values for it.
For j = 1 to R:
  For i = 1 to k:
    Select S_i to be the validation set.
    Train the classifier on the remaining data using the jth value of the hyper-parameter.
    Test the classifier on S_i to obtain accuracy A_{i,j}.
  Compute the average of the accuracies: A_j = (1/k) Σ_{i=1}^{k} A_{i,j}.
Choose the value j of the hyper-parameter with the highest A_j. Retrain the model on all the training data using this value of the hyper-parameter. Test the resulting model on the test set.
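A sketch of the same idea in code, reusing the hypothetical `train`-style interface above but now parameterized by a single hyper-parameter value (called `hp` here); this illustrates the procedure rather than any particular library API:

```python
import numpy as np

def select_hyperparameter(X_train, y_train, train_with_hp, hp_values, k=5, seed=0):
    """Pick the hyper-parameter value with the best k-fold CV accuracy.

    `train_with_hp(X, y, hp)` is assumed to return a fitted model with a
    `predict` method. The held-out test set is NOT touched here.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y_train)), k)

    mean_accs = []
    for hp in hp_values:                       # j = 1 .. R
        accs = []
        for i in range(k):                     # i = 1 .. k
            val_idx = folds[i]                 # S_i is the validation fold
            tr_idx = np.concatenate(folds[:i] + folds[i + 1:])
            model = train_with_hp(X_train[tr_idx], y_train[tr_idx], hp)
            accs.append(np.mean(model.predict(X_train[val_idx]) == y_train[val_idx]))
        mean_accs.append(np.mean(accs))        # A_j = (1/k) * sum_i A_{i,j}

    best_hp = hp_values[int(np.argmax(mean_accs))]
    # Retrain on ALL training data with the chosen value; the caller then
    # evaluates the resulting model once on the held-out test set.
    return train_with_hp(X_train, y_train, best_hp), best_hp
```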

Precision and Recall

Evaluating classification algorithms. Confusion matrix for a given class c:

                                        Predicted (or "classified")
                                        Positive (in class c)    Negative (not in class c)
Actual   Positive (in class c)          TruePositives            FalseNegatives
         Negative (not in class c)      FalsePositives           TrueNegatives

Example: A vs. B. Assume A is the positive class. Results from Perceptron:

Instance   Class   Perceptron Output
1          A       −1
2          A       +1
3          A       +1
4          A       −1
5          B       +1
6          B       −1
7          B       −1
8          B       −1

Confusion Matrix:

                    Predicted
                    Positive   Negative
Actual   Positive   2          2
         Negative   1          3

Accuracy:
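The counts in the matrix above can be reproduced directly from the table of perceptron outputs; a small sketch (treating class A as positive and output +1 as a positive prediction):

```python
# Instances from the slide: (true class, perceptron output)
results = [("A", -1), ("A", +1), ("A", +1), ("A", -1),
           ("B", +1), ("B", -1), ("B", -1), ("B", -1)]

TP = sum(1 for c, out in results if c == "A" and out == +1)  # true positives
FN = sum(1 for c, out in results if c == "A" and out == -1)  # false negatives
FP = sum(1 for c, out in results if c == "B" and out == +1)  # false positives
TN = sum(1 for c, out in results if c == "B" and out == -1)  # true negatives

accuracy = (TP + TN) / len(results)
print(TP, FN, FP, TN, accuracy)
```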

Evaluating classification algorithms. Confusion matrix for a given class c:

                                        Predicted (or "classified")
                                        Positive (in class c)          Negative (not in class c)
Actual   Positive (in class c)          TruePositive                   FalseNegative (Type 2 error)
         Negative (not in class c)      FalsePositive (Type 1 error)   TrueNegative

Recall and Precision
[Figure: ten test instances x_1, ..., x_10, partitioned into a "predicted positive" group and a "predicted negative" group, with the actual positives marked.]
Recall: fraction of positive examples predicted as positive. Here: 3/4. (Recall = Sensitivity = True Positive Rate.)
Precision: fraction of examples predicted as positive that are actually positive. Here: 1/2.

In-class exercise 1

Example: A vs. B. Assume A is the positive class. Results from Perceptron:

Instance   Class   Perceptron Output
1          A       −1
2          A       +1
3          A       +1
4          A       −1
5          B       +1
6          B       −1
7          B       −1
8          B       −1

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-measure = (2 · precision · recall) / (precision + recall)
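The three formulas translate directly into code; a minimal sketch in terms of the confusion-matrix counts (the zero-denominator conventions are my own choice, not part of the slide):

```python
def precision(TP, FP):
    return TP / (TP + FP) if (TP + FP) > 0 else 0.0

def recall(TP, FN):
    return TP / (TP + FN) if (TP + FN) > 0 else 0.0

def f_measure(TP, FP, FN):
    p, r = precision(TP, FP), recall(TP, FN)
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0
```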

Creating a Precision vs. Recall Curve
P = TP / (TP + FP), R = TP / (TP + FN)
[Table to fill in from the results of the classifier: one row per threshold (.9, .8, .7, .6, .5, .4, .3, .2, .1, –), with columns for Threshold, Accuracy, Precision, and Recall.]
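One way to fill in such a table is to sweep a threshold over the classifier's real-valued scores and recompute the counts at each setting. A sketch, assuming `scores` and `y_true` arrays (the values below are made up for illustration):

```python
import numpy as np

y_true = np.array([1, 1, 0, 1, 0, 1, 0, 0])             # hypothetical labels
scores = np.array([.95, .8, .75, .6, .55, .4, .3, .1])  # hypothetical classifier scores

for t in np.linspace(0.9, 0.1, 9):                      # thresholds .9, .8, ..., .1
    y_pred = (scores >= t).astype(int)                  # predict positive above the threshold
    TP = np.sum((y_pred == 1) & (y_true == 1))
    FP = np.sum((y_pred == 1) & (y_true == 0))
    FN = np.sum((y_pred == 0) & (y_true == 1))
    P = TP / (TP + FP) if TP + FP > 0 else 1.0          # convention when nothing is predicted positive
    R = TP / (TP + FN)
    acc = np.mean(y_pred == y_true)
    print(f"threshold {t:.1f}: accuracy {acc:.2f}, precision {P:.2f}, recall {R:.2f}")
```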

Multiclass classification: Mean Average Precision Average precision for a class: Area under precision/recall curve for that class Mean average precision: Mean of average precision over all classes
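Numerically, the area under a precision/recall curve can be approximated with the trapezoid rule, and mean average precision is then just the mean of those per-class areas. A sketch, assuming each class is given as a pair of arrays of recall and precision values sorted by recall (the input format is my assumption):

```python
import numpy as np

def average_precision(recalls, precisions):
    """Approximate area under the precision/recall curve (points sorted by recall)."""
    return np.trapz(precisions, recalls)

def mean_average_precision(per_class_curves):
    """per_class_curves: list of (recalls, precisions) pairs, one per class."""
    return np.mean([average_precision(r, p) for r, p in per_class_curves])
```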

ROC Analysis

[Figure: ROC space, with the true positive rate ("recall" or "sensitivity") on the y-axis and the false positive rate (1 − specificity) on the x-axis.]

Creating a ROC Curve
True Positive Rate (= Recall) = TP / (TP + FN)
False Positive Rate = FP / (TN + FP)
[Table to fill in from the results of the classifier: one row per threshold (.9, .8, .7, .6, .5, .4, .3, .2, .1, –), with columns for Threshold, Accuracy, TPR, and FPR.]

How to create a ROC curve for a binary classifier: Run your classifier on each instance in the test data, without doing the sgn step: Score(x) = w · x. Get the range [min, max] of your scores. Divide the range into about 200 thresholds, including −∞ and max. For each threshold, calculate TPR and FPR. Plot TPR (y-axis) vs. FPR (x-axis).
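A sketch of that recipe, assuming a linear score w · x and hypothetical `w`, `X_test`, `y_test` arrays (with both classes present in `y_test`); plotting the resulting points is left to whatever tool you prefer, e.g. matplotlib:

```python
import numpy as np

def roc_points(w, X_test, y_test, n_thresholds=200):
    """Compute (FPR, TPR) pairs for a linear classifier with score(x) = w . x."""
    scores = X_test @ w                        # skip the sgn step: keep raw scores
    thresholds = np.linspace(scores.min(), scores.max(), n_thresholds)

    points = []
    for t in thresholds:
        y_pred = (scores >= t).astype(int)
        TP = np.sum((y_pred == 1) & (y_test == 1))
        FN = np.sum((y_pred == 0) & (y_test == 1))
        FP = np.sum((y_pred == 1) & (y_test == 0))
        TN = np.sum((y_pred == 0) & (y_test == 0))
        points.append((FP / (FP + TN), TP / (TP + FN)))   # (FPR, TPR)
    return points   # plot TPR (y-axis) vs. FPR (x-axis)
```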

In-Class Exercise 2

Precision/Recall vs. ROC Curves. Precision/Recall curves are typically used in detection or retrieval tasks, in which positive examples are relatively rare. ROC curves are used in tasks where positive and negative examples are more balanced.

Bias, Variance, and Noise

Bias: Classifier is not powerful enough to represent the true function; that is, it underfits the function. From http://eecs.oregonstate.edu/~tgd/talks/bv.ppt

Variance: Classifier's hypothesis depends on the specific training set; that is, it overfits the function. From http://eecs.oregonstate.edu/~tgd/talks/bv.ppt

Noise: Underlying process generating the data is stochastic, or the data has errors or outliers. From http://eecs.oregonstate.edu/~tgd/talks/bv.ppt

Examples of bias? Examples of variance? Examples of noise?

From http://lear.inrialpes.fr/~verbeek/mlcr.10.11.php

Illustration of Bias / Variance Tradeoff
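A common way to produce such an illustration is to fit models of different complexity to noisy samples of a known function. The sketch below (a sine curve with polynomial fits is my own choice of example, not necessarily the figure shown in the lecture) exhibits underfitting (high bias) at degree 1 and overfitting (high variance) at degree 9:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 20))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)   # noisy samples of sin(2*pi*x)

x_grid = np.linspace(0, 1, 200)
for degree in (1, 3, 9):                  # low / moderate / high model complexity
    coeffs = np.polyfit(x, y, degree)     # least-squares polynomial fit
    fit = np.polyval(coeffs, x_grid)
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    true_err = np.mean((fit - np.sin(2 * np.pi * x_grid)) ** 2)
    print(f"degree {degree}: training MSE {train_err:.3f}, error vs. true function {true_err:.3f}")
```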

In-Class Exercise 3