Part III: Multi-class ROC

Similar documents
Part II: A broader view

Machine Learning: Symbolische Ansätze

Next week. Tuesday: talk by Ian Witten (instead of the lecture). Thursday, 4.12.: exercise session (no lecture), IGD, 10h. 1 J.

Rank Measures for Ordering

On classification, ranking, and probability estimation

The Relationship Between Precision-Recall and ROC Curves

Evaluation. Evaluate what? For really large amounts of data... A: Use a validation set.

F-Measure Curves for Visualizing Classifier Performance with Imbalanced Data

Context Change and Versatile Models in Machine Learning

ROC Analysis of Example Weighting in Subgroup Discovery

Evaluating Machine-Learning Methods. Goals for the lecture

What ROC Curves Can't Do (and Cost Curves Can)

Calibrating Random Forests

What ROC Curves Can't Do (and Cost Curves Can)

SHOPPING PACKAGE SOFTWARE USING ENHANCED TOP K-ITEM SET ALGORITHM THROUGH DATA MINING

Evaluating Machine Learning Methods: Part 1

Classification. Instructor: Wei Ding

Data Mining Classification: Alternative Techniques. Imbalanced Class Problem

Cost-Sensitive Classifier Selection Using the ROC Convex Hull Method

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München

A Wrapper for Reweighting Training Instances for Handling Imbalanced Data Sets

NETWORK FAULT DETECTION - A CASE FOR DATA MINING

Predict Employees Computer Access Needs in Company

K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection

Technical Report. No: BU-CE A Discretization Method based on Maximizing the Area Under ROC Curve. Murat Kurtcephe, H. Altay Güvenir January 2010

On the use of ROC analysis for the optimization of abstaining classifiers

Hierarchical Local Clustering for Constraint Reduction in Rank-Optimizing Linear Programs

IT is desirable that a classification method produces membership

Optimizing Classifiers for Hypothetical Scenarios

Evaluating Classifiers

Optimizing Classifiers for Hypothetical Scenarios

Learning to Rank by Maximizing AUC with Linear Programming

Fast or furious? - User analysis of SF Express Inc

Evaluating Classifiers

An Empirical Investigation of the Trade-Off Between Consistency and Coverage in Rule Learning Heuristics

A Critical Review of Multi-Objective Optimization in Data Mining: a position paper

Data Mining and Knowledge Discovery: Practice Notes

Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates?

CS145: INTRODUCTION TO DATA MINING

A Privacy Preserving Data Mining Methodology for Dynamically Predicting Emerging Human Threats

SD-Map A Fast Algorithm for Exhaustive Subgroup Discovery

A Survey on Positive and Unlabelled Learning

Cost-sensitive Discretization of Numeric Attributes

Data Mining and Knowledge Discovery Practice notes 2

Uplift Modeling with ROC: An SRL Case Study

Visualisation of multi-class ROC surfaces

Data Mining and Knowledge Discovery: Practice Notes

Similarity-Binning Averaging: A Generalisation of Binning Calibration

Using Real-valued Meta Classifiers to Integrate and Contextualize Binding Site Predictions

Using Decision Trees and Soft Labeling to Filter Mislabeled Data. Abstract

A Comparative Study of Locality Preserving Projection and Principle Component Analysis on Classification Performance Using Logistic Regression

The Fuzzy Search for Association Rules with Interestingness Measure

Evaluation Metrics. (Classifiers) CS229 Section Anand Avati

arxiv: v1 [stat.ml] 25 Jan 2018

Efficient Pairwise Classification

Improving Classifier Fusion via Pool Adjacent Violators Normalization

Further Thoughts on Precision

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data

Decision Tree CE-717 : Machine Learning Sharif University of Technology

Robust Classification for Imprecise Environments

Classification Part 4

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

Version Space Support Vector Machines: An Extended Paper

Multiobjective Optimization of Classifiers by Means of 3D Convex-Hull-Based Evolutionary Algorithms

Modifying ROC Curves to Incorporate Predicted Probabilities

Probabilistic Classifiers DWML, /27

CS570: Introduction to Data Mining

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

A Projection-Based Framework for Classifier Performance Evaluation

Machine Learning Techniques for Data Mining

Slides for Data Mining by I. H. Witten and E. Frank

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Stochastic propositionalization of relational data using aggregates

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

CS 584 Data Mining. Classification 3

Louis Fourrier Fabien Gaie Thomas Rolf

An Empirical Study of Lazy Multilabel Classification Algorithms

Antonio Bella Sanjuán. Supervisors: César Ferri Ramírez José Hernández Orallo Maria José Ramírez Quintana

DATA MINING AND MACHINE LEARNING. Lecture 6: Data preprocessing and model selection Lecturer: Simone Scardapane

Regression Error Characteristic Surfaces

The Expected Performance Curve: a New Assessment Measure for Person Authentication

HALF&HALF BAGGING AND HARD BOUNDARY POINTS. Leo Breiman Statistics Department University of California Berkeley, CA

Cost-sensitive C4.5 with post-pruning and competition

SSV Criterion Based Discretization for Naive Bayes Classifiers

Naïve Bayes Classification. Material borrowed from Jonathan Huang and I. H. Witten's and E. Frank's Data Mining and Jeremy Wyatt and others

A Genetic-Fuzzy Classification Approach to improve High-Dimensional Intrusion Detection System

Univariate Margin Tree

Part I. Classification & Decision Trees. Classification. Classification. Week 4 Based in part on slides from textbook, slides of Susan Holmes

WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá

Cost Curves: An Improved Method for Visualizing Classifier Performance

Sample Size and Modeling Accuracy with Decision Tree Based Data Mining Tools. James Morgan, Robert Dougherty, Allan Hilchie, and Bern Carey

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at

Correlation Based Feature Selection with Irrelevant Feature Removal

A Lazy Approach for Machine Learning Algorithms

Hybrid Feature Selection for Modeling Intrusion Detection Systems

ECS289: Scalable Machine Learning

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees

A Hierarchical Document Clustering Approach with Frequent Itemsets

Transcription:

Part III: Multi-class ROC

- The general problem: multi-objective optimisation, Pareto front, convex hull
- Searching and approximating the ROC hypersurface
- Multi-class AUC
- Multi-class calibration

4 July 2004, ICML'04 tutorial on ROC analysis, Peter Flach

The general problem

- Two-class ROC analysis is a special case of multi-objective optimisation:
  don't commit to a trade-off between objectives.
- The Pareto front is the set of points for which no other point improves all objectives;
  points not on the Pareto front are dominated.
  It assumes a monotonic trade-off between objectives.
- The convex hull is a subset of the Pareto front.
  It assumes a linear trade-off between objectives: e.g. accuracy, but not precision.
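As a concrete illustration (not part of the original tutorial), the sketch below contrasts the two notions on a handful of hypothetical two-class ROC points (fpr, tpr): the Pareto front keeps every non-dominated point, while the ROC convex hull additionally discards points that sit in a concavity. The point values and function names are my own assumptions.

    # Illustration only: Pareto front vs. ROC convex hull for a handful of
    # hypothetical two-class ROC points (fpr, tpr). A point is dominated if
    # some other point has fpr no higher and tpr no lower.

    def pareto_front(points):
        """Return the non-dominated (fpr, tpr) points, sorted by fpr."""
        front = [(f, t) for (f, t) in points
                 if not any(of <= f and ot >= t and (of, ot) != (f, t)
                            for (of, ot) in points)]
        return sorted(front)

    def roc_convex_hull(points):
        """Upper convex hull of the points plus the trivial corners (0,0) and (1,1)."""
        pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
        hull = []
        for p in pts:
            # pop the last hull point while it lies in a concavity (slope increases)
            while len(hull) >= 2:
                (x1, y1), (x2, y2) = hull[-2], hull[-1]
                if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                    hull.pop()
                else:
                    break
            hull.append(p)
        return hull

    classifiers = [(0.1, 0.5), (0.3, 0.6), (0.4, 0.9), (0.6, 0.85)]
    print(pareto_front(classifiers))     # [(0.1, 0.5), (0.3, 0.6), (0.4, 0.9)]
    print(roc_convex_hull(classifiers))  # [(0.0, 0.0), (0.1, 0.5), (0.4, 0.9), (1.0, 1.0)]

In this toy example (0.3, 0.6) is Pareto-optimal but falls inside a concavity, so it drops off the convex hull: that is the gap between assuming a monotonic trade-off and assuming a linear one.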

How many dimensions?

Depends on the cost model:
- 1-vs-rest: a fixed misclassification cost C(¬c|c) for each class c ∈ C ⇒ |C| dimensions.
  The ROC space is spanned by either the tpr for each class or the fpr for each class.
- 1-vs-1: a different misclassification cost C(c_i|c_j) for each pair of classes c_i ≠ c_j ⇒ |C|(|C|−1) dimensions.
  The ROC space is spanned by the fpr for each (ordered) pair of classes.
- Results about the convex hull, the optimal point given a linear cost function, etc. generalise (Srinivasan, 1999).
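As a quick illustration (my own example, with made-up counts, not from the slides), the per-class rates that span the 1-vs-rest ROC space can be read directly off a multi-class confusion matrix:

    import numpy as np

    # Illustration only: 1-vs-rest ROC coordinates from a three-class
    # confusion matrix (rows = true class, columns = predicted class).
    cm = np.array([[50,  3,  2],
                   [ 5, 40,  5],
                   [ 4,  6, 35]])

    for c in range(cm.shape[0]):
        tp = cm[c, c]
        fn = cm[c, :].sum() - tp
        fp = cm[:, c].sum() - tp
        tn = cm.sum() - tp - fn - fp
        print(f"class {c}: tpr = {tp / (tp + fn):.3f}, fpr = {fp / (fp + tn):.3f}")

    # Either the three tpr values or the three fpr values span a |C| = 3
    # dimensional 1-vs-rest ROC space; the 1-vs-1 model needs |C|(|C|-1) = 6 rates.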

Multi-class AUC

- In the most general case, we want to calculate the Volume Under the ROC Surface (VUS);
  see (Mossman, 1999) for VUS in the 1-vs-rest three-class case.
- It can be approximated by projecting down to a set of two-dimensional curves and averaging:
  - MAUC (Hand & Till, 2001): 1-vs-1, unweighted average of pairwise AUCs.
  - (Provost & Domingos, 2001): 1-vs-rest, AUC for class c weighted by P(c).
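A minimal sketch of both averaging schemes, assuming y_true is an array of class labels and scores is an (n_samples, n_classes) array of membership scores whose columns follow the order of classes; the function names are mine, and scikit-learn's roc_auc_score is used only for the underlying two-class AUCs.

    import numpy as np
    from itertools import combinations
    from sklearn.metrics import roc_auc_score

    def mauc_hand_till(y_true, scores, classes):
        """1-vs-1 MAUC (Hand & Till, 2001): unweighted average over class pairs.

        Each pairwise term uses only the examples of the two classes involved:
        A(i|j) ranks them by the scores for class i, A(j|i) by the scores for class j.
        """
        y_true = np.asarray(y_true)
        pair_aucs = []
        for i, j in combinations(range(len(classes)), 2):
            mask = np.isin(y_true, [classes[i], classes[j]])
            is_i = (y_true[mask] == classes[i]).astype(int)
            a_ij = roc_auc_score(is_i, scores[mask, i])
            a_ji = roc_auc_score(1 - is_i, scores[mask, j])
            pair_aucs.append((a_ij + a_ji) / 2)
        return float(np.mean(pair_aucs))

    def weighted_ovr_auc(y_true, scores, classes):
        """1-vs-rest AUC for each class c, weighted by its empirical prior P(c)."""
        y_true = np.asarray(y_true)
        total = 0.0
        for k, c in enumerate(classes):
            y_c = (y_true == c).astype(int)
            total += y_c.mean() * roc_auc_score(y_c, scores[:, k])
        return total

For reference, scikit-learn's roc_auc_score exposes closely related multi-class averages through its multi_class and average arguments.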

Multi-class calibration

1. How do we manipulate the scores f(x,c) in order to obtain different ROC points?
   This depends on the cost model.
2. How do we search these ROC points to find the optimum?
   Exhaustive search is probably infeasible, so it needs to be approximated.

A simple 1-vs-rest approach

1. From thresholds to weights: predict argmax_c w_c f(x,c).
   NB. two-class thresholds are a special case:
   w_+ f(x,+) > w_- f(x,-)  <=>  f(x,+)/f(x,-) > w_-/w_+
2. Setting the weights (Lachiche & Flach, 2003):
   - Assume an ordering on the classes and set the weights in a greedy fashion.
   - Set w_1 = 1.
   - For classes c = 2 to n, look for the best weight w_c according to the weights fixed so far
     for the classes c' < c, using the two-class algorithm.
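The sketch below spells out the greedy procedure on this slide. It is not the authors' code: the candidate weights are simply the values at which some validation example flips its prediction to class c, and each candidate is scored by validation accuracy, which stands in for the two-class ROC-based optimisation of Lachiche & Flach (2003). Scores f(x,c) are assumed non-negative (e.g. class probabilities).

    import numpy as np

    def greedy_weights(scores, y, eps=1e-9):
        """Greedy 1-vs-rest weight setting in the spirit of Lachiche & Flach (2003).

        scores : (n_samples, n_classes) array of non-negative scores f(x, c),
                 with columns ordered as the chosen class ordering.
        y      : true class indices (0 .. n_classes-1) on a validation set.
        Predictions are argmax_c w[c] * f(x, c).
        """
        scores, y = np.asarray(scores, dtype=float), np.asarray(y)
        n_classes = scores.shape[1]
        w = np.zeros(n_classes)
        w[0] = 1.0                                   # w_1 = 1 anchors the scale
        for c in range(1, n_classes):
            best_prev = (scores[:, :c] * w[:c]).max(axis=1)
            prev_pred = (scores[:, :c] * w[:c]).argmax(axis=1)
            # candidate weights: values at which some example flips its prediction to c
            flips = best_prev / np.maximum(scores[:, c], eps)
            best_acc, best_wc = -1.0, 1.0
            for wc in np.unique(np.concatenate([flips, flips * (1 + 1e-6)])):
                pred = np.where(wc * scores[:, c] > best_prev, c, prev_pred)
                acc = np.mean(pred == y)             # classes after c stay unpredictable for now
                if acc > best_acc:
                    best_acc, best_wc = acc, wc
            w[c] = best_wc
        return w

A typical use would be fitting w = greedy_weights(val_scores, val_y) on held-out data and then predicting (test_scores * w).argmax(axis=1); as the Discussion slide notes, ordering the columns with the largest classes first works best.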

Example: 3 classes

[Figure: three-class example; the corners (1,0,0), (0,1,0) and (0,0,1) correspond to classes 1, 2 and 3.]

Discussion

- Strong experimental results: 13 significant wins (at the 95% level), 22 draws and 2 losses on UCI data.
- Sensitive to the ordering of the classes; largest classes first works best.
- No guarantee of finding a global (or even a local) optimum,
  so there is lots of scope for improvement, e.g. stochastic search.

The many faces of ROC analysis

- ROC analysis for model evaluation and selection:
  - key idea: separate performance on the classes
  - think rankers, not classifiers!
  - information in ROC curves is not easily captured by statistics
- ROC visualisation for understanding ML metrics:
  - towards a theory of ML metrics
  - types of metrics, equivalences, skew-sensitivity
- ROC metrics for use within ML algorithms:
  - one classifier can be many classifiers!
  - separate the skew-insensitive parts of learning (probabilistic model, unlabelled tree)
    from the skew-sensitive parts (selecting thresholds or class weights, labelling and pruning)

Outlook

- Several issues were not covered in this tutorial:
  - optimising AUC rather than accuracy when training (several papers at ICML'03 and ICML'04),
    e.g. RankBoost optimises AUC (Cortes & Mohri, 2003)
- Many open problems remain:
  - ROC analysis in rule learning (overlapping rules)
  - the relation between training skew and testing skew
  - multi-class ROC analysis

See also http://splweb.bwh.harvard.edu:8000/pages/ppl/zou/roc.html for pointers to ROC literature.

References

C. Cortes and M. Mohri (2003). AUC optimization vs. error rate minimization. In Advances in Neural Information Processing Systems (NIPS'03). MIT Press.
T.G. Dietterich, M. Kearns, and Y. Mansour (1996). Applying the weak learning framework to understand and improve C4.5. In L. Saitta, editor, Proc. 13th International Conference on Machine Learning (ICML'96), pp. 96-103. Morgan Kaufmann.
C. Drummond and R.C. Holte (2000). Exploiting the cost (in)sensitivity of decision tree splitting criteria. In P. Langley, editor, Proc. 17th International Conference on Machine Learning (ICML'00), pp. 239-246.
T. Fawcett (2004). ROC graphs: Notes and practical considerations for data mining researchers. Technical Report HPL-2003-4, HP Laboratories, Palo Alto, CA, USA. Revised March 16, 2004. Available at http://www.purl.org/net/tfawcett/papers/roc101.pdf.
C. Ferri, P.A. Flach, and J. Hernández-Orallo (2002). Learning decision trees using the area under the ROC curve. In C. Sammut and A. Hoffmann, editors, Proc. 19th International Conference on Machine Learning (ICML'02), pp. 139-146. Morgan Kaufmann.
P.A. Flach (2003). The geometry of ROC space: understanding machine learning metrics through ROC isometrics. In T. Fawcett and N. Mishra, editors, Proc. 20th International Conference on Machine Learning (ICML'03), pp. 194-201. AAAI Press.
P.A. Flach and S. Wu (2003). Repairing concavities in ROC curves. In J.M. Rossiter and T.P. Martin, editors, Proc. 2003 UK Workshop on Computational Intelligence (UKCI'03), pp. 38-44. University of Bristol.
J. Fürnkranz and P.A. Flach (2003). An analysis of rule evaluation metrics. In T. Fawcett and N. Mishra, editors, Proc. 20th International Conference on Machine Learning (ICML'03), pp. 202-209. AAAI Press.
J. Fürnkranz and P.A. Flach (forthcoming). ROC 'n' rule learning: towards a better understanding of covering algorithms. Machine Learning, accepted for publication.
D. Gamberger and N. Lavrac (2002). Expert-guided subgroup discovery: methodology and application. Journal of Artificial Intelligence Research, 17, 501-527.
D.J. Hand and R.J. Till (2001). A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning, 45, 171-186.
N. Lachiche and P.A. Flach (2003). Improving accuracy and cost of two-class and multi-class probabilistic classifiers using ROC curves. In T. Fawcett and N. Mishra, editors, Proc. 20th International Conference on Machine Learning (ICML'03), pp. 416-423. AAAI Press.
A. Martin, G. Doddington, T. Kamm, M. Ordowski, and M. Przybocki (1997). The DET curve in assessment of detection task performance. In Proc. 5th European Conference on Speech Communication and Technology, vol. 4, pp. 1895-1898.
D. Mossman (1999). Three-way ROCs. Medical Decision Making, 19, 78-89.
F. Provost and T. Fawcett (2001). Robust classification for imprecise environments. Machine Learning, 42, 203-231.
F. Provost and P. Domingos (2003). Tree induction for probability-based ranking. Machine Learning, 52(3).
A. Srinivasan (1999). Note on the location of optimal classifiers in n-dimensional ROC space. Technical Report PRG-TR-2-99, Oxford University Computing Laboratory.
B. Zadrozny and C. Elkan (2002). Transforming classifier scores into accurate multiclass probability estimates. In Proc. 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'02), pp. 694-699.

Acknowledgements

Many thanks to:
- Johannes Fürnkranz, Cèsar Ferri, José Hernández-Orallo, Nicolas Lachiche and Shaomin Wu for joint work on ROC analysis and for some of the material
- Jim Farrand and Ronaldo Prati for ROC visualisation software
- Chris Drummond and Rob Holte for material and discussion on cost curves
- Tom Fawcett and Rich Roberts for some of the ROC graphs
- Terri Kamm for the DET slide