Trade-offs in Explanatory Model Learning


1 Trade-offs in Explanatory Model Learning. Data Analysis Project, 21st of February 2012. Madalina Fiterau. DAP Committee: Artur Dubrawski, Jeff Schneider, Geoff Gordon.

2 Outline. Motivation: need for interpretable models. Overview of data analysis tools. Model evaluation: accuracy vs. complexity. Model evaluation: understandability. Example applications. Summary.

3 Example Application: Nuclear Threat Detection. Border control: vehicles are scanned, and a human in the loop interprets the results. [Diagram: vehicle scan -> prediction -> feedback]

4 Boosted Decision Stumps: accurate, but hard to interpret. How is the prediction derived from the input?
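To make the opacity concrete, the prediction of a boosted ensemble of decision stumps has the general form below (standard AdaBoost notation; the symbols are illustrative, not taken from the slides):

    \[
    F(x) \;=\; \operatorname{sign}\Big(\sum_{t=1}^{T} \alpha_t\, h_t(x)\Big),
    \qquad
    h_t(x) \;=\; \operatorname{sign}\big(x_{j_t} - \theta_t\big),
    \]

i.e. a weighted vote over possibly hundreds of single-feature threshold tests, which is hard to trace back to any individual input feature.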

5 Decision Tree: more interpretable. [Example tree with tests: Radiation > x%?; Payload type = ceramics?; Uranium level > max. admissible for ceramics?; consider the balance of Th232, Ra226 and Co60; leaves labeled Threat / Clear]

6 Motivation. Many users are willing to trade accuracy to better understand the results the system yields. Need: a simple, interpretable model. Need: an explanatory prediction process.

7 Analysis Tools, black-box. Random Forests: a very accurate tree ensemble (L. Breiman, Random Forests, 2001). Boosting: guaranteed to decrease the training error (R. Schapire, The Boosting Approach to Machine Learning). Multi-boosting: bagged boosting (G. Webb, MultiBoosting: A Technique for Combining Boosting and Weighted Bagging).
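The boosting guarantee cited above is usually stated as the AdaBoost training-error bound (quoted here for context; the slide itself only names the result): if the t-th weak learner has weighted training error \epsilon_t = 1/2 - \gamma_t, then the ensemble's training error satisfies

    \[
    \widehat{\operatorname{err}}(F)
    \;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t\,(1-\epsilon_t)}
    \;=\; \prod_{t=1}^{T} \sqrt{1-4\gamma_t^{2}}
    \;\le\; \exp\!\Big(-2\sum_{t=1}^{T}\gamma_t^{2}\Big),
    \]

so the training error drops exponentially as long as each weak learner is slightly better than chance.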

8 Analysis Tools, white-box. CART: a decision tree based on the Gini impurity criterion. Feating: a decision tree with leaf classifiers (K. Ting, G. Webb, FaSS: Ensembles for Stable Learners). Subspacing: an ensemble in which each discriminator is trained on a random subset of features (R. Bryll, Attribute Bagging). EOP: builds a decision list that selects the classifier to handle a query point.

9 Explanation-Oriented Partitioning. [Figure: (X,Y) plots of synthetic data drawn from 2 Gaussians and a uniform cube]

10-11 EOP Execution Example (3D data). Step 1: Select a projection, e.g. (X1, X2).

12-13 EOP Execution Example (3D data). Step 2: Choose a good classifier on this projection; call it h1.

14-15 EOP Execution Example (3D data). Step 3: Estimate the accuracy of h1 at each point (points marked OK / NOT OK).

16-17 EOP Execution Example (3D data). Step 4: Identify high-accuracy regions.

18-19 EOP Execution Example (3D data). Step 5: Remove the covered training points from consideration.

20 EOP Execution Example (3D data). First iteration finished.

21 EOP Execution Example (3D data). Second iteration finished.

22 EOP Execution Example (3D data). Iterate until all data is accounted for or the error cannot be decreased (a code sketch of the full loop follows below).
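The iterative procedure of slides 10-22 can be summarized in a short sketch. This is a hedged, simplified reconstruction in Python, not the authors' implementation: it uses logistic regression as the per-projection classifier and an axis-aligned bounding box as the parametric region, and all names (fit_eop, min_acc, ...) are illustrative.

    import itertools
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_eop(X, y, max_stages=5, min_acc=0.9):
        """Greedy EOP-style training: build a decision list of
        (feature pair, classifier, bounding-box region) stages."""
        stages = []
        remaining = np.arange(len(X))            # indices of still-uncovered training points
        for _ in range(max_stages):
            if len(remaining) == 0:              # all data accounted for
                break
            best = None
            # Step 1: consider every 2-D projection of the remaining data
            for i, j in itertools.combinations(range(X.shape[1]), 2):
                Xp, yp = X[remaining][:, [i, j]], y[remaining]
                if len(np.unique(yp)) < 2:
                    continue
                clf = LogisticRegression().fit(Xp, yp)   # Step 2: a candidate classifier h
                ok = clf.predict(Xp) == yp               # Step 3: accuracy of h at each point
                if not ok.any():
                    continue
                # Step 4: a parametric high-accuracy region (bounding box of correct points)
                lo, hi = Xp[ok].min(axis=0), Xp[ok].max(axis=0)
                inside = np.all((Xp >= lo) & (Xp <= hi), axis=1)
                acc = ok[inside].mean()
                if acc >= min_acc and (best is None or acc > best[0]):
                    best = (acc, (i, j), clf, (lo, hi), inside)
            if best is None:                     # the error cannot be decreased any further
                break
            _, proj, clf, box, inside = best
            stages.append((proj, clf, box))
            remaining = remaining[~inside]       # Step 5: remove covered points from consideration
        return stages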

23 Learned Model: processing a query [x1 x2 x3]. [Diagram: if [x1 x2] is in R1, answer with h1(x1, x2); else if [x2 x3] is in R2, answer with h2(x2, x3); else if [x1 x3] is in R3, answer with h3(x1, x3); else return the default value]
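At query time, the decision list of slide 23 routes the query to the first stage whose region contains its projection, and falls back to a default value otherwise. A sketch consistent with the fit_eop stages above (again illustrative, not the authors' code):

    import numpy as np

    def predict_eop(stages, x, default=0):
        """Route one query through the EOP decision list (cf. slide 23)."""
        for (i, j), clf, (lo, hi) in stages:
            xp = np.asarray(x)[[i, j]]                    # project the query onto this stage's features
            if np.all(xp >= lo) and np.all(xp <= hi):     # is the projected query in the region?
                return clf.predict(xp.reshape(1, -1))[0]  # answer with that stage's classifier
        return default                                    # no region matched: return the default value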

24 Parametric / Nonparametric Regions. Bounding polyhedra: enclose points in convex shapes (hyper-rectangles / spheres); easy to test inclusion; visually appealing; but the decision boundary is inflexible. Nearest-neighbor score: consider the k nearest neighbors; the region is { X : Score(X) > t }, with t a learned threshold; easy to test inclusion; can look insular; deals with irregularities. [Diagram: a query point p and its nearest neighbors n1-n5, marked as correctly or incorrectly classified]
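For the nonparametric variant, region membership can be tested with a nearest-neighbor score. A minimal sketch, assuming Score(X) is the fraction of X's k nearest training points that the stage classifier got right (one plausible reading of this slide, not necessarily the exact score used):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def make_knn_region(Xp_train, correct, k=5, t=0.8):
        """Return a membership test for the region { X : Score(X) > t }."""
        nn = NearestNeighbors(n_neighbors=k).fit(Xp_train)
        correct = np.asarray(correct, dtype=float)        # 1 where the classifier was right at that point

        def contains(x):
            _, idx = nn.kneighbors(np.atleast_2d(x))      # indices of the k nearest training points
            return correct[idx[0]].mean() > t             # Score(X) > t  =>  X lies inside the region
        return contains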

25 Feating and EOP: decision structures to pick the right classification model. Feating: a decision tree over tiles in feature space, with models trained on all features. EOP: a decision list over flexible regions, with models trained on subspaces.

26 Outline. Motivation: need for interpretable models. Overview of data analysis tools. Model evaluation: accuracy vs. complexity. Model evaluation: understandability. Example applications. Summary.

27 Overview of datasets. Real-valued features, binary output. Artificial data: 10 features, low-dimensional Gaussians / uniform cubes. UCI repository datasets. Application-related datasets. Results obtained by k-fold cross-validation. Complexity = expected number of vector operations performed for a classification task.
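One plausible way to formalize the complexity measure above (my reading of the slide, not necessarily the exact accounting used): if a query reaches stage i of a decision list (or node i of a tree) with probability p_i, and evaluating that stage costs c_i vector operations (region tests plus the classifier call), then

    \[
    \text{Complexity} \;=\; \mathbb{E}[\text{vector operations per query}] \;=\; \sum_{i} p_i \, c_i .
    \]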

28 EOP vs. AdaBoost, SVM base classifiers. EOP is often less accurate, but not significantly so; the reduction in complexity is statistically significant. [Figure: accuracy and complexity of Boosting vs. nonparametric EOP across datasets] Mean difference in accuracy: 0.5% (p-value of 2-sided test: 0.832). Mean difference in complexity: 85 (p-value of 2-sided test: 0.003).

29 EOP (stumps as base classifiers) vs. CART on data from the UCI repository. CART is the most accurate; parametric EOP yields the simplest models. [Figure: accuracy and complexity of CART, nonparametric EOP (EOP N.) and parametric EOP (EOP P.) on each dataset]
Dataset        # of Features  # of Points
Breast Tissue  10             1006
Vowel          9              990
MiniBOONE      10             5000
Breast Cancer  10             596

30 Why are EOP models less complex? Typical XOR dataset

31 Why are EOP models less complex? Typical XOR dataset. CART is accurate, but takes many iterations and does not uncover or leverage the structure of the data.

32 Why are EOP models less complex? Typical XOR dataset. CART is accurate, but takes many iterations and does not uncover or leverage the structure of the data. EOP is equally accurate and uncovers the structure. [Figure: the two XOR clusters, recovered in Iteration 1 and Iteration 2]

33 Error Variation with Model Complexity for EOP and CART. [Figure: error vs. depth of the decision tree/list for CART and EOP on Breast Cancer Wisconsin, MiniBOONE, Breast Tissue and Vowel] At low complexities, EOP is typically more accurate.

34 UCI data: accuracy. [Figure: accuracy of R-EOP, N-EOP, CART, Feating, Subspacing, Multiboosting and Random Forests on Vowel, Breast Tissue, MiniBOONE and Breast Cancer Wisconsin]

35 UCI data: model complexity. [Figure: complexity of R-EOP, N-EOP, CART, Feating, Subspacing and Multiboosting on the same datasets] The complexity of Random Forests is huge, at thousands of nodes.

36 Accuracy Robustness. Accuracy-targeting EOP identifies which portions of the data can be confidently classified at a given rate. [Figure: accuracy of EOP when regions do not include noisy data, plotted against the maximum allowed error]

37 Outline. Motivation: need for interpretable models. Overview of data analysis tools. Model evaluation: accuracy vs. complexity. Model evaluation: understandability. Example applications. Summary.

38 Metrics of Explainability: Lift, Bayes Factor, J-Score, Normalized Mutual Information.
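For reference, the standard textbook forms of two of these metrics, for a region R, class label y, model partition C and true labels Y (the exact variants used in the evaluation may differ; one common normalization of NMI is shown):

    \[
    \operatorname{Lift}(R, y) \;=\; \frac{P(y \mid X \in R)}{P(y)},
    \qquad
    \operatorname{NMI}(C, Y) \;=\; \frac{I(C;Y)}{\sqrt{H(C)\,H(Y)}} .
    \]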

39 Evaluation with usefulness metrics. For 3 out of 4 metrics, EOP beats CART. BF = Bayes Factor, L = Lift, J = J-score, NMI = Normalized Mutual Information; higher values are better.
          CART                               EOP
          BF      L      J      NMI          BF      L      J      NMI
MB        1.982   0.004  0.389  0.040        1.889   0.007  0.201  0.502
BCW       1.057   0.007  0.004  0.011        2.204   0.069  0.150  0.635
BT        0.000   0.009  0.210  0.000        Inf     0.021  0.088  0.643
V         Inf     0.020  0.210  0.010        2.166   0.040  0.177  0.383
Mean      1.520   0.010  0.203  0.015        2.047   0.034  0.154  0.541

40 Outline. Motivation: need for interpretable models. Overview of data analysis tools. Model evaluation: accuracy vs. complexity. Model evaluation: understandability. Example applications. Summary.

41 Spam Detection (UCI SPAMBASE). 10 features: frequencies of miscellaneous words in e-mails. Output: spam or not. [Figure: accuracy and model complexity (number of splits)]

42 Spam Detection, Iteration 1. The classifier labels everything as spam, but the high-confidence regions do enclose mostly spam, namely where the incidence of the word 'your' is low and the length of text in capital letters is high.

43 Spam Detection, Iteration 2. The required incidence of capitals is increased; the square region on the left also encloses examples that will be marked as 'not spam'.

44 Spam Detection, Iteration 3. The classifier marks everything as spam; the frequencies of the words 'your' and 'hi' (feature word_frequency_hi) determine the regions.

45 Effects of Cell Treatment. A monitored population of cells; 7 features: cycle time, area, perimeter... Task: determine which cells were treated. [Figure: accuracy and model complexity (number of splits)]


47 MIMIC Medication Data. Information about administered medication. Features: dosage for each drug. Task: predict patient return to the ICU. [Figure: accuracy and model complexity (number of splits)]


49 Predicting Fuel Consumption. 10 features: vehicle and driving style characteristics. Output: fuel consumption level (high/low). [Figure: accuracy and model complexity (number of splits)]


51 Nuclear threat detection data. Random Forests accuracy: 0.94. Rectangular EOP accuracy: 0.881, but it yields explicit regions. Regions found in the 1st iteration for Fold 0: incident.riidfeatures.snr in [2.90, 9.2]; Incident.riidFeatures.gammaDose in [0, 1.86]*10^-8. Regions found in the 2nd iteration for Fold 1: incident.rpmfeatures.gamma.sigma in [2.5, 17.381]; incident.rpmfeatures.gammastatistics.skewdose in [1.31, ].

52 Summary. White-box models (CART, Feating, Subspacing) are roughly as accurate as typical black-box models (Boosting, Multi-boosting). In most cases EOP maintains accuracy, reduces complexity, and identifies useful aspects of the data; EOP wins in terms of expressiveness. Trade-offs: accuracy vs. complexity; accuracy vs. coverage. Open questions: What if no good low-dimensional projections are found? What to do with inconsistent models across different folds of cross-validation?