Evaluating Classifiers

Arleen Beasley
6 years ago
1 Evaluating Classifiers

2 Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website)

3 Evaluating Classifiers
What we want: a classifier that best predicts unseen ("test") data.
Common assumption: the data is i.i.d. (independent and identically distributed).

4 Topics
- Cross-Validation
- Precision and Recall
- ROC Analysis
- Bias, Variance, and Noise

5 Cross-Validation

6 Accuracy and Error Rate
Accuracy = fraction of correct classifications on unseen data (the test set).
Error rate = 1 - Accuracy.

7 How to use available data to best measure accuracy?
Split the data into training and test sets. But how to split?
- Too little training data: cannot learn a good model.
- Too little test data: cannot evaluate the learned model.
Also, how to learn the hyper-parameters of the model?

8 One solution: k-fold cross-validation
- Used to better estimate the generalization accuracy of a model.
- Used to learn the hyper-parameters of a model ("model selection").

9 Using k-fold cross-validation to estimate accuracy
Each example is used both as a training instance and as a test instance.
Instead of splitting the data into a training set and a test set, split it into k disjoint parts: $S_1, S_2, \ldots, S_k$.
For i = 1 to k:
- Select $S_i$ to be the test set.
- Train on the remaining data and test on $S_i$ to obtain accuracy $A_i$.
Report $\frac{1}{k}\sum_{i=1}^{k} A_i$ as the final accuracy.
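
The loop on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `train_fn` and `predict_fn` are hypothetical callables standing in for whatever learner is being evaluated, the data is shuffled once before splitting, and the majority-class learner is a toy placeholder rather than anything from the lecture.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(X, y, k, train_fn, predict_fn, seed=0):
    """Return (1/k) * sum of A_i, where A_i is the accuracy on fold S_i.

    Assumes len(X) >= k so that no fold is empty.
    """
    folds = k_fold_indices(len(X), k, seed)
    accuracies = []
    for i in range(k):
        test_idx = folds[i]  # S_i is the test set for this round
        train_idx = [j for f in range(k) if f != i for j in folds[f]]
        model = train_fn([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict_fn(model, X[j]) == y[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))  # A_i
    return sum(accuracies) / k

# Hypothetical toy learner: always predict the majority class of the training set.
def train_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, x):
    return model
```

Each example lands in exactly one fold, so it is tested once and used for training in the other k - 1 rounds.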

10 Using k-fold cross-validation to learn hyper-parameters (e.g., learning rate, number of hidden units, SVM kernel, etc.)
Split the data into training and test sets. Put the test set aside.
Split the training data into k disjoint parts: $S_1, S_2, \ldots, S_k$.
Assume you are learning one hyper-parameter. Choose R possible values for this hyper-parameter.
For j = 1 to R:
    For i = 1 to k:
    - Select $S_i$ to be the validation set.
    - Train the classifier on the remaining data using the jth value of the hyper-parameter.
    - Test the classifier on $S_i$ to obtain accuracy $A_{i,j}$.
    Compute the average of the accuracies: $A_j = \frac{1}{k}\sum_{i=1}^{k} A_{i,j}$.
Choose the value j of the hyper-parameter with the highest $A_j$.
Retrain the model with all the training data, using this value of the hyper-parameter.
Test the resulting model on the test set.
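
A minimal sketch of the selection loop, assuming a single hyper-parameter and hypothetical `train_fn(X, y, value)` and `predict_fn(model, x)` callables. The threshold "learner" and the toy data below are invented for illustration only; here the hyper-parameter is the decision threshold itself.

```python
def select_hyperparameter(X_train, y_train, values, k, train_fn, predict_fn):
    """Return (best_value, mean_accuracies); ties go to the earliest value."""
    n = len(X_train)
    folds = [list(range(i, n, k)) for i in range(k)]  # k disjoint parts S_1..S_k
    mean_acc = []
    for value in values:                 # j = 1..R
        fold_acc = []
        for i in range(k):               # i = 1..k
            val_idx = folds[i]           # S_i is the validation set
            held_out = set(val_idx)
            fit_idx = [t for t in range(n) if t not in held_out]
            model = train_fn([X_train[t] for t in fit_idx],
                             [y_train[t] for t in fit_idx], value)
            correct = sum(predict_fn(model, X_train[t]) == y_train[t]
                          for t in val_idx)
            fold_acc.append(correct / len(val_idx))   # A_{i,j}
        mean_acc.append(sum(fold_acc) / k)            # A_j
    best = max(range(len(values)), key=lambda j: mean_acc[j])
    return values[best], mean_acc

# Hypothetical learner: nothing is fitted, the hyper-parameter is the model.
def train_threshold(X, y, threshold):
    return threshold

def predict_threshold(model, x):
    return 1 if x > model else -1

X = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [-1, -1, -1, -1, 1, 1, 1, 1]
best, scores = select_hyperparameter(X, y, [0.25, 0.5, 0.75], 4,
                                     train_threshold, predict_threshold)
print(best, scores)  # 0.5 [0.75, 1.0, 0.75]
```

After selection, the model would be retrained on all of the training data with `best` and evaluated once on the held-out test set, which this sketch never touches.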

11 Precision and Recall

12 Evaluating classification algorithms
Confusion matrix for a given class c:

                              Predicted (or "classified")
Actual                        Positive (in class c)    Negative (not in class c)
Positive (in class c)         TruePositives            FalseNegatives
Negative (not in class c)     FalsePositives           TrueNegatives

13 Example: A vs. B
Assume A is the positive class.
Results from Perceptron:

Instance    Class    Perceptron Output
1           A        -1
2           A        +1
3           A        +1
4           A        -1
5           B        +1
6           B        -1
7           B        -1
8           B        -1

Confusion matrix:

              Predicted
Actual        Positive    Negative
Positive      2           2
Negative      1           3

Accuracy:
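
As a check on this example, the counts can be recomputed in Python. The sketch assumes that the outputs shown without a sign are -1, which is what the confusion matrix on the slide implies; the function name `confusion_counts` is made up for this illustration.

```python
def confusion_counts(actual, predicted, positive):
    """Count (TP, FN, FP, TN), treating `positive` as the positive class."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn

# Slide data; unsigned outputs are read as -1 (an assumption, see above).
actual = ["A", "A", "A", "A", "B", "B", "B", "B"]
outputs = [-1, +1, +1, -1, +1, -1, -1, -1]
predicted = ["A" if o > 0 else "B" for o in outputs]

tp, fn, fp, tn = confusion_counts(actual, predicted, positive="A")
accuracy = (tp + tn) / (tp + fn + fp + tn)
error_rate = 1 - accuracy
print(tp, fn, fp, tn)        # 2 2 1 3
print(accuracy, error_rate)  # 0.625 0.375
```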

14 Evaluating classification algorithms
Confusion matrix for a given class c:

                              Predicted (or "classified")
Actual                        Positive (in class c)           Negative (not in class c)
Positive (in class c)         TruePositive                    FalseNegative (Type 2 error)
Negative (not in class c)     FalsePositive (Type 1 error)    TrueNegative

15 Recall and Precision
[Figure: all instances in the test set, x_1 through x_10, divided into predicted positive and predicted negative.]

16 Recall and Precision
[Figure: all instances in the test set, x_1 through x_10, divided into predicted positive and predicted negative.]
Recall: fraction of positive examples predicted as positive.

17 Recall and Precision
[Figure: all instances in the test set, x_1 through x_10, divided into predicted positive and predicted negative.]
Recall: fraction of positive examples predicted as positive. Here: 3/4.

18 Recall and Precision
[Figure: all instances in the test set, x_1 through x_10, divided into predicted positive and predicted negative.]
Recall: fraction of positive examples predicted as positive. Here: 3/4.
Precision: fraction of examples predicted as positive that are actually positive.

19 Recall and Precision
[Figure: all instances in the test set, x_1 through x_10, divided into predicted positive and predicted negative.]
Recall: fraction of positive examples predicted as positive. Here: 3/4.
Precision: fraction of examples predicted as positive that are actually positive. Here: 1/2.

20 Recall and Precision
[Figure: all instances in the test set, x_1 through x_10, divided into predicted positive and predicted negative.]
Recall: fraction of positive examples predicted as positive. Here: 3/4.
Recall = Sensitivity = True Positive Rate
Precision: fraction of examples predicted as positive that are actually positive. Here: 1/2.

21 In-class exercise 1

22 Example: A vs. B
Assume A is the positive class.
Results from Perceptron:

Instance    Class    Perceptron Output
1           A        -1
2           A        +1
3           A        +1
4           A        -1
5           B        +1
6           B        -1
7           B        -1
8           B        -1

$\text{Precision} = \frac{TP}{TP + FP}$
$\text{Recall} = \frac{TP}{TP + FN}$
$\text{F-measure} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$
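
The three formulas translate directly into code. This is a small sketch; returning 0.0 when a denominator is zero is a convention chosen here, not something the slides specify.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion-matrix counts.

    Ratios with a zero denominator are returned as 0.0 (a convention
    assumed for this sketch).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure

# A vs. B example, with the counts TP=2, FP=1, FN=2 from the confusion matrix:
# precision is 2/3, recall is 1/2, and the F-measure is 4/7.
p, r, f = precision_recall_f(tp=2, fp=1, fn=2)

# Ten-instance example from the earlier slides (TP=3, FP=3, FN=1):
# precision is 1/2 and recall is 3/4.
p10, r10, f10 = precision_recall_f(tp=3, fp=3, fn=1)
```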

23 Creating a Precision vs. Recall Curve
$P = \frac{TP}{TP + FP}$, $R = \frac{TP}{TP + FN}$
Results of classifier:

Threshold    Accuracy    Precision    Recall
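
One way to fill in such a table is to sweep a threshold over the raw classifier scores. The sketch below makes several assumptions: labels are +1 / -1, an instance is predicted positive when its score is at least the threshold, precision is reported as 1.0 when nothing is predicted positive, and the scores are invented for illustration.

```python
def precision_recall_table(scores, labels, thresholds):
    """Return one (threshold, accuracy, precision, recall) row per threshold."""
    rows = []
    for t in thresholds:
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l != 1)
        fn = sum(1 for s, l in zip(scores, labels) if s < t and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < t and l != 1)
        accuracy = (tp + tn) / len(labels)
        # Convention assumed here: precision is 1.0 if nothing is predicted positive.
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        rows.append((t, accuracy, precision, recall))
    return rows

# Hypothetical scores, not taken from the lecture.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, -1, 1, -1]
for row in precision_recall_table(scores, labels, [0.0, 0.5, 1.0]):
    print(row)
```

Plotting precision against recall over all rows gives the curve; lowering the threshold moves toward higher recall.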

24 Multiclass classification: Mean Average Precision
Average precision for a class: area under the precision/recall curve for that class.
Mean average precision: mean of the average precision over all classes.
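
A rough sketch of how this might be computed. Note the substitution: rather than integrating the curve, it uses a common ranked-list approximation of the area (the mean of the precision values at the rank of each positive example), which may differ from the definition the lecture intends. The function names and the 1/0 label encoding are assumptions of this example.

```python
def average_precision(scores, labels):
    """Approximate the area under the precision/recall curve for one class.

    labels are 1 for members of the class and 0 otherwise. Instances are
    ranked by decreasing score and precision is averaged over the ranks
    at which a positive example appears.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0

def mean_average_precision(per_class):
    """per_class maps a class name to its (scores, labels) pair."""
    aps = [average_precision(s, l) for s, l in per_class.values()]
    return sum(aps) / len(aps)
```

For a class whose positives are ranked first and third out of four, the average precision under this approximation is (1/1 + 2/3) / 2 = 5/6.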

25 ROC Analysis

26 [Figure: ROC plot. y-axis: true positive rate ("recall" or "sensitivity"); x-axis: false positive rate (1 - specificity).]

28 Creating a ROC Curve
$\text{True Positive Rate (= Recall)} = \frac{TP}{TP + FN}$
$\text{False Positive Rate} = \frac{FP}{TN + FP}$
Results of classifier:

Threshold    Accuracy    TPR    FPR

30 How to create a ROC curve for a binary classifier
- Run your classifier on each instance in the test data, without doing the sgn step: Score(x) = w · x
- Get the range [min, max] of your scores.
- Divide the range into about 200 thresholds, including -∞ and max.
- For each threshold, calculate TPR and FPR.
- Plot TPR (y-axis) vs. FPR (x-axis).
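
The recipe above might look as follows in Python. Several details are assumptions of this sketch rather than part of the slide: labels are +1 / -1, an instance counts as predicted positive when its score is strictly greater than the threshold, and -∞ is used as the lowest threshold so that the curve starts at (1, 1). Plotting is left out.

```python
def score(w, x):
    """Raw classifier score w . x, i.e. the output before the sgn step."""
    return sum(wi * xi for wi, xi in zip(w, x))

def roc_points(scores, labels, n_thresholds=200):
    """Return a list of (FPR, TPR) pairs, one per threshold.

    Assumes both classes occur in `labels` and n_thresholds >= 2.
    """
    lo, hi = min(scores), max(scores)
    step = (hi - lo) / (n_thresholds - 1)
    # -inf first, then evenly spaced values from min up to (and including) max.
    thresholds = [float("-inf")]
    thresholds += [lo + i * step for i in range(n_thresholds - 1)]
    thresholds.append(hi)
    n_pos = sum(1 for l in labels if l == 1)
    n_neg = len(labels) - n_pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, l in zip(scores, labels) if s > t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s > t and l != 1)
        points.append((fp / n_neg, tp / n_pos))  # (FPR, TPR)
    return points

# Hypothetical scores for four test instances.
points = roc_points([0.1, 0.4, 0.35, 0.8], [-1, -1, 1, 1])
print(points[0], points[-1])  # (1.0, 1.0) (0.0, 0.0)
```

Plotting the second coordinate (TPR) against the first (FPR) gives the ROC curve; the threshold at -∞ classifies everything as positive and the threshold at max classifies nothing as positive.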

31 In-Class Exercise 2

32 Precision/Recall vs. ROC Curves
Precision/recall curves are typically used in detection or retrieval tasks, in which positive examples are relatively rare.
ROC curves are used in tasks where positive and negative examples are more balanced.

33 Bias, Variance, and Noise

34 Bias: Classifier is not powerful enough to represent the true function; that is, it underfits the function. From http://eecs.oregonstate.edu/~tgd/talks/bv.ppt

35 Variance: Classifier's hypothesis depends on the specific training set; that is, it overfits the function. From http://eecs.oregonstate.edu/~tgd/talks/bv.ppt

36 Noise: Underlying process generating the data is stochastic, or the data has errors or outliers. From http://eecs.oregonstate.edu/~tgd/talks/bv.ppt

37 Examples of bias? Examples of variance? Examples of noise?

38 From

39 Illustration of Bias / Variance Tradeoff

40 In-Class Exercise 3
