Data Mining Classification: Alternative Techniques. Imbalanced Class Problem


Data Mining Classification: Alternative Techniques
Imbalanced Class Problem
Introduction to Data Mining, 2nd Edition
by Tan, Steinbach, Karpatne, Kumar

Class Imbalance Problem
- Many classification problems have skewed classes (many more records from one class than from the other), for example:
  - credit card fraud
  - intrusion detection
  - defective products on a manufacturing assembly line

Challenges
- Evaluation measures such as accuracy are not well suited for imbalanced classes.
- Detecting the rare class is like finding a needle in a haystack.

Confusion Matrix

                        Predicted Class=Yes   Predicted Class=No
    Actual Class=Yes    a                     b
    Actual Class=No     c                     d

    a: TP (true positive)    b: FN (false negative)
    c: FP (false positive)   d: TN (true negative)
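The slides prescribe no code; as a minimal sketch, the four counts can be tallied from parallel lists of actual and predicted labels (the function name and signature are my own):

    def confusion_counts(y_true, y_pred, pos="Yes"):
        """Tally a (TP), b (FN), c (FP), d (TN) for a two-class problem."""
        a = b = c = d = 0
        for actual, predicted in zip(y_true, y_pred):
            if actual == pos:
                if predicted == pos:
                    a += 1    # true positive
                else:
                    b += 1    # false negative
            else:
                if predicted == pos:
                    c += 1    # false positive
                else:
                    d += 1    # true negative
        return a, b, c, d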

Accuracy

                        Predicted Class=Yes   Predicted Class=No
    Actual Class=Yes    a (TP)                b (FN)
    Actual Class=No     c (FP)                d (TN)

Most widely used metric:

    Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)

Problem with Accuracy
- Consider a 2-class problem:
  - Number of Class NO examples = 990
  - Number of Class YES examples = 10
- If a model predicts everything to be class NO, its accuracy is 990/1000 = 99%.
- This is misleading because the model does not detect a single class YES example.
- Detecting the rare class is usually the more interesting task (e.g., frauds, intrusions, defects).

Alternative Measures

                        Predicted Class=Yes   Predicted Class=No
    Actual Class=Yes    a                     b
    Actual Class=No     c                     d

    Precision (p) = a / (a + c)
    Recall (r)    = a / (a + b)
    F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)
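A minimal sketch (mine, not the slides') computing all four measures from the confusion-matrix counts; reporting 0/0 ratios as 0.0 is an assumption:

    def metrics(a, b, c, d):
        """Accuracy, precision, recall, F-measure from a=TP, b=FN, c=FP, d=TN."""
        accuracy = (a + d) / (a + b + c + d)
        precision = a / (a + c) if a + c else 0.0
        recall = a / (a + b) if a + b else 0.0
        f = 2 * a / (2 * a + b + c) if 2 * a + b + c else 0.0
        return accuracy, precision, recall, f

    # The all-NO model on 990 NO / 10 YES: 99% accurate, yet recall = 0.
    print(metrics(a=0, b=10, c=0, d=990))   # (0.99, 0.0, 0.0, 0.0)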

Alternative Measures

                        Predicted Class=Yes   Predicted Class=No
    Actual Class=Yes    10                    0
    Actual Class=No     10                    980

    Precision (p) = 10 / (10 + 10) = 0.5
    Recall (r)    = 10 / (10 + 0) = 1
    F-measure (F) = (2 * 1 * 0.5) / (1 + 0.5) ≈ 0.67
    Accuracy      = 990 / 1000 = 0.99

                        Predicted Class=Yes   Predicted Class=No
    Actual Class=Yes    1                     9
    Actual Class=No     0                     990

    Precision (p) = 1 / (1 + 0) = 1
    Recall (r)    = 1 / (1 + 9) = 0.1
    F-measure (F) = (2 * 0.1 * 1) / (1 + 0.1) ≈ 0.18
    Accuracy      = 991 / 1000 = 0.991

Alternative Measures

                        Predicted Class=Yes   Predicted Class=No
    Actual Class=Yes    40                    10
    Actual Class=No     10                    40

    Precision (p) = 0.8   Recall (r) = 0.8   F-measure (F) = 0.8   Accuracy = 0.8

                        Predicted Class=Yes   Predicted Class=No
    Actual Class=Yes    40                    10
    Actual Class=No     1000                  4000

    Precision (p) ≈ 0.04   Recall (r) = 0.8   F-measure (F) ≈ 0.08   Accuracy ≈ 0.8

Measures of Classification Performance

                        Predicted Yes   Predicted No
    Actual Yes          TP              FN
    Actual No           FP              TN

- α is the probability that we reject the null hypothesis when it is true. This is a Type I error, or false positive (FP); the corresponding rate is FPR = FP / (FP + TN).
- β is the probability that we accept the null hypothesis when it is false. This is a Type II error, or false negative (FN); the true positive rate is TPR = TP / (TP + FN), which equals recall.

Alternative Measures

                        Predicted Class=Yes   Predicted Class=No
    Actual Class=Yes    40                    10
    Actual Class=No     10                    40

    Precision (p) = 0.8   TPR = Recall (r) = 0.8   FPR = 0.2
    F-measure (F) = 0.8   Accuracy = 0.8

                        Predicted Class=Yes   Predicted Class=No
    Actual Class=Yes    40                    10
    Actual Class=No     1000                  4000

    Precision (p) ≈ 0.04   TPR = Recall (r) = 0.8   FPR = 0.2
    F-measure (F) ≈ 0.08   Accuracy ≈ 0.8

Alternative Measures

Three classifiers with the same precision (0.5) but very different TPR/FPR trade-offs:

                        Predicted Class=Yes   Predicted Class=No
    Actual Class=Yes    10                    40
    Actual Class=No     10                    40

    Precision (p) = 0.5   TPR = Recall (r) = 0.2   FPR = 0.2

                        Predicted Class=Yes   Predicted Class=No
    Actual Class=Yes    25                    25
    Actual Class=No     25                    25

    Precision (p) = 0.5   TPR = Recall (r) = 0.5   FPR = 0.5

                        Predicted Class=Yes   Predicted Class=No
    Actual Class=Yes    40                    10
    Actual Class=No     40                    10

    Precision (p) = 0.5   TPR = Recall (r) = 0.8   FPR = 0.8

ROC (Receiver Operating Characteristic)
- A graphical approach for displaying the trade-off between detection rate and false alarm rate.
- Developed in the 1950s in signal detection theory to analyze noisy signals.
- An ROC curve plots TPR (y-axis) against FPR (x-axis).
- The performance of a model is represented as a point on the ROC curve.
- Changing the classifier's threshold parameter changes the location of the point.

ROC Curve
Landmarks in (TPR, FPR) coordinates:
- (0, 0): declare everything to be the negative class
- (1, 1): declare everything to be the positive class
- (1, 0): the ideal classifier
- Diagonal line: random guessing
- Below the diagonal line: prediction is the opposite of the true class

ROC (Receiver Operating Characteristic)
- To draw an ROC curve, the classifier must produce a continuous-valued output.
  - The outputs are used to rank test records, from the record most likely to be in the positive class to the least likely.
- Many classifiers produce only discrete outputs (i.e., the predicted class).
- How to get continuous-valued outputs? Decision trees, rule-based classifiers, neural networks, Bayesian classifiers, k-nearest neighbors, and SVMs can all be adapted to produce them, as in the sketch below.
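As a hedged illustration of one way to obtain such scores (the slides name no library; scikit-learn and the synthetic data are my assumptions), a decision tree's predict_proba returns the fraction of positive training records in the leaf, exactly the kind of leaf score shown on the next slide:

    # A sketch: continuous scores from a decision tree (assumes scikit-learn).
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, weights=[0.9], random_state=0)
    tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

    # Fraction of positive training records in each leaf, used to rank records.
    scores = tree.predict_proba(X)[:, 1]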

Example: Decision Trees

[Figure: a decision tree with splits on x1 and x2 (x2 < 12.63, x1 < 13.29, x2 < 17.35, x1 < 6.56, x2 < 8.64, x1 < 2.15, x1 < 7.24, x2 < 1.38, x1 < 12.11, x1 < 18.88), shown once with plain class labels and once with each leaf annotated with the fraction of positive training records reaching it (0.059, 0.071, 0.107, 0.143, 0.164, 0.220, 0.271, 0.654, 0.669, 0.727, 0), which provides the continuous-valued output.]

ROC Curve Example

[Figure: the ROC curve traced by sweeping a threshold over the leaf scores of the tree above.]

ROC Curve Example
- A 1-dimensional data set containing 2 classes (positive and negative).
- Any point located at x > t is classified as positive.
- At threshold t: TPR = 0.5, FNR = 0.5, FPR = 0.12, TNR = 0.88.

Using ROC for Model Comparison
- Neither model consistently outperforms the other:
  - M1 is better for small FPR.
  - M2 is better for large FPR.
- Area Under the ROC Curve (AUC):
  - ideal classifier: area = 1
  - random guessing: area = 0.5

How to Construct an ROC Curve

    Instance   Score   True Class
    1          0.95    +
    2          0.93    +
    3          0.87    -
    4          0.85    -
    5          0.85    -
    6          0.85    +
    7          0.76    -
    8          0.53    +
    9          0.43    -
    10         0.25    +

- Use a classifier that produces a continuous-valued score for each instance; the more likely the instance is to be in the + class, the higher the score.
- Sort the instances in decreasing order of score.
- Apply a threshold at each unique value of the score.
- Count TP, FP, TN, FN at each threshold:
  - TPR = TP / (TP + FN)
  - FPR = FP / (FP + TN)

How to Construct an ROC Curve (continued)

    Class           +     -     +     -     -     -     +     -     +     +
    Threshold >=  0.25  0.43  0.53  0.76  0.85  0.85  0.85  0.87  0.93  0.95  1.00
    TP               5     4     4     3     3     3     3     2     2     1     0
    FP               5     5     4     4     3     2     1     1     0     0     0
    TN               0     0     1     1     2     3     4     4     5     5     5
    FN               0     1     1     2     2     2     2     3     3     4     5
    TPR              1   0.8   0.8   0.6   0.6   0.6   0.6   0.4   0.4   0.2     0
    FPR              1     1   0.8   0.8   0.6   0.4   0.2   0.2     0     0     0

[Figure: the resulting ROC curve, a step function from (0, 0) to (1, 1).]
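A minimal sketch (mine, not from the slides) that reproduces the table above from the ten (score, class) pairs and integrates the trapezoidal area under the resulting curve:

    # Sketch: build ROC points from scores, as in the table above.
    scores = [0.95, 0.93, 0.87, 0.85, 0.85, 0.85, 0.76, 0.53, 0.43, 0.25]
    labels = ['+', '+', '-', '-', '-', '+', '-', '+', '-', '+']

    P = labels.count('+')                    # total positives (5)
    N = labels.count('-')                    # total negatives (5)

    points = []
    for t in sorted(set(scores)) + [1.00]:   # one threshold per unique score, plus 1.00
        tp = sum(1 for s, c in zip(scores, labels) if s >= t and c == '+')
        fp = sum(1 for s, c in zip(scores, labels) if s >= t and c == '-')
        points.append((fp / N, tp / P))      # (FPR, TPR)

    # Trapezoidal AUC over the points sorted by ascending FPR.
    pts = sorted([(0.0, 0.0)] + points)
    auc = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    print(auc)                               # 0.56 for this toy data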

Handling the Class Imbalance Problem
- Class-based ordering (e.g., RIPPER): rules for the rare class are given higher priority.
- Cost-sensitive classification: misclassifying the rare class as the majority class is more expensive than misclassifying the majority class as the rare class.
- Sampling-based approaches.

Cost Matrix

Count matrix f(i, j):

                        Predicted Class=Yes   Predicted Class=No
    Actual Class=Yes    f(Yes, Yes)           f(Yes, No)
    Actual Class=No     f(No, Yes)            f(No, No)

Cost matrix C(i, j), the cost of misclassifying a class i example as class j:

                        Predicted Class=Yes   Predicted Class=No
    Actual Class=Yes    C(Yes, Yes)           C(Yes, No)
    Actual Class=No     C(No, Yes)            C(No, No)

    Total Cost = Σ over i, j of C(i, j) * f(i, j)

Computing the Cost of Classification

Cost matrix C(i, j):

                  Predicted +   Predicted -
    Actual +      -1            100
    Actual -      1             0

Model M1:

                  Predicted +   Predicted -
    Actual +      150           40
    Actual -      60            250

    Accuracy = 80%   Cost = 150(-1) + 40(100) + 60(1) + 250(0) = 3910

Model M2:

                  Predicted +   Predicted -
    Actual +      250           45
    Actual -      5             200

    Accuracy = 90%   Cost = 250(-1) + 45(100) + 5(1) + 200(0) = 4255

M1 has lower accuracy yet lower cost; under this cost matrix it is the better model.

Cost-Sensitive Classification
- Example: Bayesian classifier. Given a test record x:
  - Compute p(i|x) for each class i.
  - Decision rule: classify x as class k if k = argmax_i p(i|x).
  - For the 2-class case, classify x as + if p(+|x) > p(-|x).
- This decision rule implicitly assumes that C(+,+) = C(-,-) = 0 and C(+,-) = C(-,+).
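A minimal sketch (mine) verifying the two totals above as Σ C(i, j) * f(i, j):

    # Sketch: total cost = sum over (actual, predicted) of C(i, j) * f(i, j).
    C  = {('+', '+'): -1,  ('+', '-'): 100, ('-', '+'): 1,  ('-', '-'): 0}
    M1 = {('+', '+'): 150, ('+', '-'): 40,  ('-', '+'): 60, ('-', '-'): 250}
    M2 = {('+', '+'): 250, ('+', '-'): 45,  ('-', '+'): 5,  ('-', '-'): 200}

    def total_cost(f, cost):
        return sum(cost[ij] * f[ij] for ij in f)

    print(total_cost(M1, C), total_cost(M2, C))   # 3910 4255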

Cost-Sensitive Classification
- General decision rule: classify test record x as class k if

    k = argmin_k Σ over i of p(i|x) * C(i, k)

- For the 2-class case:

    Cost(+) = p(+|x) C(+,+) + p(-|x) C(-,+)
    Cost(-) = p(+|x) C(+,-) + p(-|x) C(-,-)

- Decision rule: classify x as + if Cost(+) < Cost(-).
- If C(+,+) = C(-,-) = 0, this reduces to: classify x as + if

    p(+|x) > C(-,+) / (C(-,+) + C(+,-))

Sampling-Based Approaches
- Modify the distribution of the training data so that the rare class is well represented in the training set:
  - Undersample the majority class.
  - Oversample the rare class.
- Each approach has its advantages and disadvantages (a combined sketch follows).
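To close, a minimal sketch (mine; the probabilities and records are synthetic, and the costs are taken from the earlier example) of the cost-derived decision threshold and of simple random oversampling:

    import random

    # Cost-sensitive decision rule with C(+,+) = C(-,-) = 0.
    C_neg_pos = 1     # C(-,+): cost of a false alarm (from the earlier cost matrix)
    C_pos_neg = 100   # C(+,-): cost of missing a positive

    threshold = C_neg_pos / (C_neg_pos + C_pos_neg)   # 1/101 ≈ 0.0099

    def classify(p_pos):
        """Classify as '+' whenever p(+|x) exceeds the cost-derived threshold."""
        return '+' if p_pos > threshold else '-'

    print(classify(0.05))   # '+': even a 5% fraud probability is worth flagging

    # Random oversampling of the rare class to rebalance the training data.
    majority = [('x%d' % i, '-') for i in range(990)]
    rare = [('y%d' % i, '+') for i in range(10)]
    random.seed(0)
    balanced = majority + random.choices(rare, k=len(majority))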