Evaluating Classifiers


Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website)

Evaluating Classifiers. What we want: a classifier that best predicts unseen ("test") data. Common assumption: the data are i.i.d. (independent and identically distributed).

Topics: Cross-Validation; Precision and Recall; ROC Analysis; Bias, Variance, and Noise.

Cross-Validation

Accuracy and Error Rate. Accuracy = fraction of correct classifications on unseen data (the test set). Error rate = 1 − Accuracy.

How do we use the available data to best measure accuracy? Split the data into training and test sets. But how should we split? With too little training data we cannot learn a good model; with too little test data we cannot reliably evaluate the learned model. Also, how do we learn the hyper-parameters of the model?

One solution: k-fold cross-validation. It is used to better estimate the generalization accuracy of a model, and to learn the hyper-parameters of a model ("model selection").

Using k-fold cross-validation to estimate accuracy. Each example is used both as a training instance and as a test instance. Instead of splitting the data into a training set and a test set, split the data into k disjoint parts: S_1, S_2, ..., S_k. For i = 1 to k: select S_i to be the test set; train on the remaining data and test on S_i to obtain accuracy A_i. Report (1/k) Σ_{i=1}^{k} A_i as the final accuracy.
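
A minimal sketch of this procedure, assuming Python with scikit-learn; the dataset and the classifier (logistic regression) are placeholders standing in for any learner:

```python
# Sketch of k-fold cross-validation for estimating accuracy.
# Assumes scikit-learn; dataset and classifier are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=200, random_state=0)
k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=0)

accuracies = []
for train_idx, test_idx in kf.split(X):              # S_i is the held-out fold
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X[train_idx], y[train_idx])              # train on the remaining data
    accuracies.append(clf.score(X[test_idx], y[test_idx]))  # accuracy A_i

print("Estimated accuracy:", np.mean(accuracies))    # (1/k) * sum of the A_i
```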

Using k-fold cross-validation to learn hyper-parameters (e.g., learning rate, number of hidden units, SVM kernel, etc.). Split the data into training and test sets; put the test set aside. Split the training data into k disjoint parts: S_1, S_2, ..., S_k. Assume you are learning one hyper-parameter, and choose R possible values for it. For j = 1 to R: for i = 1 to k, select S_i to be the validation set, train the classifier on the remaining data using the j-th value of the hyper-parameter, and test the classifier on S_i to obtain accuracy A_{i,j}; then compute the average of the accuracies, Ā_j = (1/k) Σ_{i=1}^{k} A_{i,j}. Choose the value j of the hyper-parameter with the highest Ā_j. Retrain the model on all the training data using this value of the hyper-parameter. Test the resulting model on the test set.
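
As a hedged sketch, scikit-learn's GridSearchCV carries out essentially this procedure: it runs the inner k-fold loop for each candidate value, picks the best average score, and refits on the full training data. The regularization strength C below is just an example hyper-parameter:

```python
# Sketch of hyper-parameter selection with k-fold cross-validation.
# Assumes scikit-learn; C is an example hyper-parameter with R candidate values.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=300, random_state=0)

# Split off a test set and put it aside.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Inner k-fold loop: for each candidate value, average accuracy over the k folds.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}            # the R candidate values
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X_train, y_train)                           # also refits on all training data

print("Best hyper-parameter:", search.best_params_)
print("Test-set accuracy:", search.score(X_test, y_test))
```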

Precision and Recall

Evaluating classification algorithms. Confusion matrix for a given class c:

                                     Predicted (classified) positive    Predicted (classified) negative
                                     (in class c)                       (not in class c)
Actual positive (in class c)         True Positive                      False Negative (Type 2 error)
Actual negative (not in class c)     False Positive (Type 1 error)      True Negative

Recall and Precision

[Figure: all instances in the test set (x_1 ... x_10), partitioned into predicted-positive and predicted-negative sets.]

Recall: fraction of positive examples predicted as positive. Here: 3/4. Recall = Sensitivity = True Positive Rate.

Precision: fraction of examples predicted as positive that are actually positive. Here: 1/2.

In-class exercise 1

Example: A vs. B. Assume A is the positive class. Results from the perceptron:

Instance   Class   Perceptron Output
1          A       1
2          A       1
3          A       1
4          A       0
5          B       1
6          B       0
7          B       0
8          B       0

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-measure = (2 × Precision × Recall) / (Precision + Recall)
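
A small sketch in plain Python that applies these definitions; the label and prediction lists encode the table above, treating a perceptron output of 1 as "predicted A":

```python
# Sketch: precision, recall, and F-measure for positive class A.
# labels/predictions encode the table above (perceptron output 1 -> "A", 0 -> "B").
labels      = ["A", "A", "A", "A", "B", "B", "B", "B"]
predictions = ["A", "A", "A", "B", "A", "B", "B", "B"]
positive = "A"

tp = sum(1 for y, p in zip(labels, predictions) if y == positive and p == positive)
fp = sum(1 for y, p in zip(labels, predictions) if y != positive and p == positive)
fn = sum(1 for y, p in zip(labels, predictions) if y == positive and p != positive)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)
print(precision, recall, f_measure)
```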

Creating a Precision vs. Recall Curve. P = TP / (TP + FP), R = TP / (TP + FN). Results of the classifier (fill in precision and recall at each score threshold):

Threshold   Accuracy   Precision   Recall
.9
.8
.7
.6
.5
.4
.3
.2
.1
-
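
A sketch of filling such a table programmatically, assuming scikit-learn; precision_recall_curve sweeps the distinct scores as thresholds and returns the precision/recall pairs that trace the curve. The labels and scores below are hypothetical classifier outputs:

```python
# Sketch: precision/recall at each score threshold (assumes scikit-learn).
# y_true and scores are hypothetical classifier outputs.
from sklearn.metrics import precision_recall_curve

y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]   # classifier score per instance

precision, recall, thresholds = precision_recall_curve(y_true, scores)
for t, p, r in zip(thresholds, precision, recall):
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```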

Multiclass classification: Mean Average Precision. Average precision for a class: the area under the precision/recall curve for that class. Mean average precision: the mean of the average precision over all classes.
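
A minimal sketch for the multiclass case, assuming scikit-learn and one-vs-rest scores; average_precision_score summarizes each class's precision/recall curve (a step-wise approximation of the area under it), and the mean over classes gives the mean average precision. The labels and score matrix are placeholders:

```python
# Sketch: mean average precision over classes (assumes scikit-learn).
# y_true holds class labels; score_matrix[i, c] is a placeholder score for class c.
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([0, 1, 2, 1, 0, 2, 1, 0])
score_matrix = np.random.rand(len(y_true), 3)         # placeholder one-vs-rest scores

ap_per_class = [
    average_precision_score(y_true == c, score_matrix[:, c])  # AP for class c
    for c in range(score_matrix.shape[1])
]
print("Mean average precision:", np.mean(ap_per_class))
```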

ROC Analysis

[Figure: ROC curve; y-axis = recall (sensitivity), x-axis = 1 − specificity.]

Creating a ROC Curve. True Positive Rate (= Recall) = TP / (TP + FN). False Positive Rate = FP / (FP + TN). Results of the classifier (fill in TPR and FPR at each score threshold):

Threshold   Accuracy   TPR   FPR
.9
.8
.7
.6
.5
.4
.3
.2
.1
-

How to create a ROC curve for a binary classifier: run your classifier on each instance in the test data and calculate its score. Get the range [min, max] of your scores. Divide the range into about 200 thresholds, including min and max. For each threshold, calculate the TPR and FPR. Plot TPR (y-axis) vs. FPR (x-axis).
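
A sketch of this recipe using NumPy, with hypothetical labels and scores; it sweeps roughly 200 evenly spaced thresholds between the minimum and maximum score and computes TPR and FPR at each one:

```python
# Sketch: building a ROC curve by sweeping thresholds over classifier scores.
# y_true and scores are hypothetical; plotting is left out.
import numpy as np

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])

thresholds = np.linspace(scores.min(), scores.max(), 200)  # ~200 thresholds, incl. min and max
tpr, fpr = [], []
for t in thresholds:
    predicted = scores >= t
    tp = np.sum(predicted & (y_true == 1))
    fp = np.sum(predicted & (y_true == 0))
    fn = np.sum(~predicted & (y_true == 1))
    tn = np.sum(~predicted & (y_true == 0))
    tpr.append(tp / (tp + fn))       # TP / (TP + FN)
    fpr.append(fp / (fp + tn))       # FP / (FP + TN)
# Plot tpr (y-axis) vs. fpr (x-axis) to obtain the ROC curve.
```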

In-Class Exercise 2

Precision/Recall vs. ROC Curves. Precision/recall curves are typically used in detection or retrieval tasks, in which positive examples are relatively rare. ROC curves are used in tasks where positive/negative examples are more balanced.

Bias, Variance, and Noise

Bias, Variance, and Noise. A classifier's error can be separated into three components: bias, variance, and noise.

Bias: related to the ability of the model to learn the target function. High bias: the model is not powerful enough to learn the function; it learns a less-complicated function (underfitting). Low bias: the model is too powerful; it learns a too-complicated function (overfitting). From http://eecs.oregonstate.edu/~tgd/talks/bv.ppt and https://www.quora.com/topic/logistic-regression

Variance: how much the learned function will vary if different training sets are used. Same experiment, repeated with 50 samples of 20 points each. From http://eecs.oregonstate.edu/~tgd/talks/bv.ppt

Noise: the underlying process generating the data is stochastic, or the data has errors or outliers. From http://eecs.oregonstate.edu/~tgd/talks/bv.ppt

Examples of causes of bias? Examples of causes of variance? Examples of causes of noise?

From http://lear.inrialpes.fr/~verbeek/mlcr.10.11.php