
Learning Classifier Systems for Class Imbalance Problems
Ester Bernadó-Mansilla
Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull, Barcelona, Spain

Aim
Enhance the applicability of LCSs to knowledge discovery from datasets:
- Classification problems
- Real-world domains

Framework
[Diagram: a dataset is fed to an LCS, which produces a model plus an estimated performance. Dataset-side factors: representativity of the target concept, geometrical complexity, class imbalance, noise. Learner-side factors: evolutionary pressures, interpretability, domain of applicability.]

Class Imbalance
One class is represented by a small number of examples compared to the other class(es). Usually the class that describes the circumscribed concept (the positive class) is the minority class.
Where?
- Rare medical diagnoses
- Fraud detection
- Oil spills in satellite images

Class Imbalance and Classifiers
Is there a bias towards the majority class? Probably, because most classifier schemes are trained to minimize the global error. As a result, they classify the examples of the majority class accurately, but tend to misclassify the examples of the minority class, which are often the ones representing the target concept.

Measures of Performance
Confusion matrix:

                Prediction A          Prediction B
    Actual A    true positive (TP)    false negative (FN)
    Actual B    false positive (FP)   true negative (TN)

Accuracy = (TP+TN) / (TP+FN+FP+TN)
TP rate = TP / (TP + FN)
TN rate = TN / (TN + FP)
ROC curves
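To make these definitions concrete, here is a minimal sketch (mine, not from the talk) that computes the three measures from the four confusion-matrix counts, and shows why accuracy alone is misleading under imbalance:

    # Minimal sketch: performance measures from a two-class confusion matrix.
    # tp, fn, fp, tn follow the matrix above (positive class = A).
    def performance_measures(tp, fn, fp, tn):
        accuracy = (tp + tn) / (tp + fn + fp + tn)
        tp_rate = tp / (tp + fn)   # recall on the positive (minority) class
        tn_rate = tn / (tn + fp)   # recall on the negative (majority) class
        return accuracy, tp_rate, tn_rate

    # A classifier that always predicts the majority class on a 10:1 dataset
    # (15 positives, 150 negatives) looks deceptively good:
    print(performance_measures(tp=0, fn=15, fp=0, tn=150))
    # -> (0.909..., 0.0, 1.0): 91% accuracy, yet no positive is recognized.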

The Higher the Class Imbalance, the Higher the Bias?
Dataset 1: concept 15 examples, counterpart 150, ratio 10:1.
Dataset 2: concept 15 examples, counterpart 45, ratio 3:1.

XCS
[Architecture diagram: the environment (the dataset) provides an input to XCS, which maintains a set of rules; genetic algorithms search for new rules, reinforcement learning updates the rules from the reward, and XCS outputs a class.]

Our Approach with XCS
- Bounding XCS's parameters for unbalanced datasets
- Online identification of small disjuncts
- Adaptation of parameters for the discovery of small disjuncts

XCS's Behavior in Unbalanced Datasets
[Plots: results on the unbalanced 11-multiplexer problem for ir=16:1, ir=32:1, and ir=64:1.]

XCS's Population
Most numerous rules at ir=128:1:

    Classifier       P          Error   F      Num
    ###########:0    1000       0.12    0.98   385
    ###########:1    1.2·10⁻⁴   0.074   0.98   366

Both are overgeneral classifiers with high fitness and too high numerosity (theoretically estimated predictions: 992.24 and 7.75; estimated error: 15.38). Test examples are classified as belonging to the majority class.
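These numbers match the theoretical expectation for overgeneral classifiers; the following verification sketch is mine, assuming the standard reward scheme R_max = 1000 and R_min = 0:

    # Minimal sketch: expected prediction and error of the two overgeneral
    # classifiers at ir=128:1, i.e. a minority fraction p = 1/129.
    p = 1 / 129                      # probability of drawing a minority example
    pred_maj = (1 - p) * 1000        # ###########:0 -> 992.25
    pred_min = p * 1000              # ###########:1 -> 7.75
    error = 2 * p * (1 - p) * 1000   # both classifiers -> 15.38
    print(pred_maj, pred_min, error)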

How Imbalance Affects XCS
- Classifier's error
- Stability of prediction and error estimates
- Occurrence-based reproduction

Classifier's Error in Unbalanced Datasets
Will an overgeneral classifier be detected as inaccurate (ε ≥ ε₀) if the imbalance ratio is high?
Given the estimated prediction and error of an overgeneral classifier,

    P(cl) = P_c(cl) · R_max + (1 − P_c(cl)) · R_min
    ε(cl) = 2 · P_c(cl) · (1 − P_c(cl)) · (R_max − R_min)

with R_min = 0 and p = c_min / C (the proportion of minority-class examples), the condition ε(cl) ≥ ε₀ becomes

    2 · p · (1 − p) · R_max ≥ ε₀

For R_max = 1000 and ε₀ = 1, we get a maximum imbalance ratio of ir_max = 1998.
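The bound is easy to check numerically; this sketch (my own, reproducing the slide's numbers) solves the quadratic 2·R_max·p² − 2·R_max·p + ε₀ = 0 for the smallest admissible minority proportion p:

    # Minimal sketch: maximum imbalance ratio at which an overgeneral
    # classifier still satisfies error >= eps_0, assuming R_min = 0.
    import math

    def ir_max(r_max=1000.0, eps_0=1.0):
        # Smallest root of 2*r_max*p^2 - 2*r_max*p + eps_0 = 0.
        disc = math.sqrt((2 * r_max) ** 2 - 8 * r_max * eps_0)
        p = (2 * r_max - disc) / (4 * r_max)   # minority proportion c_min/C
        return (1 - p) / p                     # imbalance ratio maj:min

    print(round(ir_max()))  # -> 1998, as on the slide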

Prediction and Error Estimates and Learning Rate
[Plots: evolution of the prediction and error estimates of classifier ###########:0 at ir=128:1, for learning rates β=0.002 and β=0.2.]

Occurrence-based Reproduction
Probability of occurrence (p_occ), given ir = maj/min (p_occb: balanced dataset; p_occi: imbalanced dataset):

    Classifier       p_occb   p_occi
    ###########:0    1/2      1/2
    ###########:1    1/2      1/2
    0000#######:0    1/32     ...
    0001#######:1    1/32     ...

[Plot: probability of occurrence (0 to 0.6) vs. imbalance ratio (1 to 256) for 00001######:1, 00000######:0, ###########:0, and ###########:1.]

Occurrence-based Reproduction
Probability of reproduction (p_GA):

    p_GA = 1 / T_GA, where T_GA = θ_GA if T_occ < θ_GA, and T_GA = T_occ otherwise.

With θ_GA = 20, and assuming non-overlapping niches:
- ###########:0 occurs almost every step (T_occ < θ_GA), so T_GA ≈ θ_GA and p_GA ≈ 1/θ_GA.
- 0000#######:0 occurs rarely (T_occ ≥ θ_GA), so T_GA ≈ T_occ and p_GA ≈ 1/T_occ.
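In code, the reproduction probability is a one-liner; this illustration is mine, and the T_occ values are hypothetical:

    # Minimal sketch: GA-reproduction probability from the average time
    # between a classifier's occurrences (t_occ) and the threshold theta_ga.
    def p_ga(t_occ, theta_ga=20):
        t_ga = theta_ga if t_occ < theta_ga else t_occ
        return 1.0 / t_ga

    print(p_ga(t_occ=1))    # overgeneral rule, occurs every step -> 1/20
    print(p_ga(t_occ=256))  # rare minority niche -> 1/256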

Guidelines for Parameter Tuning
- R_max and ε₀ determine the threshold between negligible noise and the imbalance ratio.
- β determines the size of the moving window. The window should be large enough to cover examples from both classes: β = k · f_min / f_maj.
- θ_GA can counterbalance the reproduction opportunities of the most frequent (majority) and least frequent (minority) niches: θ_GA = k' · 1 / f_min.

XCS with Parameter Tuning
[Plots: XCS with standard settings (ir=16:1, 32:1, 64:1) versus XCS with parameter tuning (ir=64:1, 256:1).]

XCS Tuning for Real-world Datasets
How can we estimate the niche frequency?
- Estimate it from the ratio of majority-class to minority-class instances. Problem: this may not be related to the distribution of niches in the feature space.
- Take the approach of the small disjuncts problem.

Online Identification of Small Disjuncts
We search for regions that promote overgeneral classifiers. Estimate ir_cl from the classifier's experience on each class:

    ir_cl = exp_max / exp_min

and adapt β and θ_GA according to ir_cl. Example: ir_cl = 20/4 = 5.
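A minimal sketch of such an adaptation, following the tuning guidelines above (the function name and the constants k and k' are my assumptions, not the talk's exact update rule):

    # Minimal sketch: adapt beta and theta_GA from a classifier's experience
    # on each class, following beta ~ f_min/f_maj and theta_GA ~ 1/f_min.
    def adapt_parameters(exp_max, exp_min, k=0.2, k_prime=20.0):
        ir_cl = exp_max / exp_min     # e.g. 20/4 = 5, as on the slide
        beta = k / ir_cl              # slower updates widen the moving window
        theta_ga = k_prime * ir_cl    # rarer niches wait longer before the GA
        return ir_cl, beta, theta_ga

    print(adapt_parameters(exp_max=20, exp_min=4))  # -> (5.0, 0.04, 100.0)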

Online Parameter Adaptation
[Plot: results with online parameter adaptation at ir=256:1.]

What about UCS?
A supervised XCS:
- Needs less exploration
- Avoids XCS's fitness dilemma
- More robust to parameter settings
However, overgeneral classifiers also tend to take over the population:
- Their probability of occurrence depends on the imbalance ratio
- This is partially minimized by fitness sharing

What about UCS?
[Plots: UCS results at ir=512:1 and ir=256:1.]

Are LCSs more error-prone to class imbalance than other classifier schemes? 24,99% 30,50% 17,07% 9,50% 12,87% 41,00% wpbc 4,52% 98,57% 6,02% 97,14% 11,70% 90,18% wine2c3 5,27% 98,33% 5,27% 98,33% 8,05% 95,00% wine2c2 10 10 16,63% 89,00% wine2c1 6,37% 93,83% 2,71% 94,81% 5,09% 92,47% wdbc 5,89% 95,83% 5,36% 95,42% 3,42% 92,95% wbdc 3,65% 87,86% 3,48% 89,97% 2,38% 77,64% wab2c3 2,57% 78,72% 4,05% 84,57% 3,89% 72,34% wav2c2 3,43% 87,24% 3,20% 88,51% 4,06% 75,74% wav2c1 8,05% 90,71% 21,35% 33,81% 10,34% 90,95% thy2c3 14,93% 90,83% 24,92% 54,17% 12,45% 94,17% thy2c2 16,10% 9 22,50% 76,67% 16,10% 9 thy2c1 5,72% 92,58% 6,17% 84,11% 2,14% 95,23% tao 9,75% 55,93% 6,42% 53,38% 13,27% 55,37% pim 9,78% 8 7,03% 8 13,29% 75,83% h-s 13,95% 81,79% 15,13% 59,82% gls2c6 14,21% 84,29% 9,64% 1 16,77% 77,14% gls2c5 25,40% 81,67% 25,40% 81,67% 32,63% 75,00% gls2c4 15,81% 5,00% 42,16% 3 gls2c3 49,72% 55,00% 33,75% 15,00% 47,43% 35,00% gls2c2 52,70% 5 42,16% 8 gls2c1 9,10% 61,38% 14,09% 42,95% bpa 6,88% 83,99% 5,59% 93,77% 6,04% 81,90% Bal2c3 6,00% 81,96% 4,64% 93,72% 6,83% 81,65% Bal2c2 Bal2c1 XCS SMO C4.5 TP rate

How Can We Minimize the Effects of Small Disjuncts?
Resampling the dataset:
- Classical methods: random oversampling, random undersampling (see the sketch after this list)
- Heuristic methods: Tomek links, CNN, one-sided selection, SMOTE, cluster-based oversampling (addresses small disjuncts directly, assuming that clustering will find the small disjuncts and match the classifier's approximation)
Cost-sensitive classifiers.
Could XCS benefit from the online identification of small disjuncts?
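Of these, random oversampling is the simplest to make concrete; the sketch below is my own illustration, not the talk's implementation:

    # Minimal sketch: random oversampling duplicates minority instances
    # until both classes are equally represented.
    import random

    def random_oversample(X, y, minority_label, seed=0):
        rng = random.Random(seed)
        minority = [i for i, label in enumerate(y) if label == minority_label]
        target = len(y) - len(minority)    # size of the majority class
        X, y = list(X), list(y)
        count = len(minority)
        while count < target:
            i = rng.choice(minority)       # pick a minority instance at random
            X.append(X[i]); y.append(y[i]) # duplicate it
            count += 1
        return X, y

    # Example: 2 minority ('+') vs. 6 majority ('-') instances become 6 vs. 6.
    X, y = random_oversample(list(range(8)), ['+','+','-','-','-','-','-','-'], '+')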

Domains of Applicability
- Should we use some counterbalancing scheme?
- Which learning scheme should we use?
- Is there a combination of counterbalancing scheme + learner that beats all others?
- How can we detect the presence of small disjuncts?
- Are there other complexity factors mixed up with the small disjuncts problem?

Domains of Applicability: Learn It!
[Diagram: a dataset undergoes dataset characterization (type of dataset: geometrical distribution of classes, possible presence of small disjuncts, other complexity factors), which feeds a prediction of the suggested approach: resampling, a classifier, or resampling + classifier. Where are LCSs placed?]

Future Directions
- Exploit XCS's potential to discover small disjuncts and learn from them online
- Further analyze UCS
- How do LCSs perform w.r.t. other classifiers on unbalanced datasets?
- Measures for identifying small disjuncts and other possible complexity factors: what is noise and what is a small disjunct?
- In which cases is an LCS applicable?
