Multi-label classification using rule-based classifier systems


Multi-label classification using rule-based classifier systems. Shabnam Nazmi (PhD candidate), Department of Electrical and Computer Engineering, North Carolina A&T State University. Advisor: Dr. A. Homaifar.

Outline: Motivation; Introduction; Multi-label classification overview; Confidence level in prediction; Multi-label classification using learning classifier systems (LCSs); Simulation results; Conclusion and future work.

Motivation
- Data-driven techniques are ubiquitous in applications such as classification, estimation, and modeling.
- In some classification applications, samples in the data set belong to more than one class simultaneously.
- Multi-label classification methods that address the task as a single learning problem are advantageous.
- The level of confidence in the labels assigned to a sample is vital for training an accurate model.
- When modeling a dynamical system, the overlap among adjacent sub-models can be handled using multi-label data with appropriate confidence levels.

Introduction: multi-class classification vs. multi-label classification.

Introduction: Multi-class classification
- In contrast to binary classification, each instance of the data set belongs to one of M > 2 different classes.
- The goal is to construct a function which, given a new data point, correctly predicts the class to which that point belongs.
- One-vs-all: trains M binary classifiers, one for each class.
- One-vs-one: trains M(M-1)/2 binary classifiers, one for each pair of classes.
- Common algorithms: decision trees, naïve Bayes, neural networks, and others.

Introduction: Multi-label classification
- In contrast to conventional (single-label) classification, multi-label classification (MLC) allows an instance to belong to several classes simultaneously.
- Multi-label classification tasks are ubiquitous in real-world problems.
- Text categorization: each document may belong to several predefined topics.
- Bioinformatics: when predicting its functional classes, one protein may have many effects on a cell.

Definitions
Notation:
- D: a multi-label data set.
- H: X → Y_i, where Y_i ⊆ Y and Y = {y_1, y_2, ..., y_l}.
- Label cardinality of D: the average number of labels per example in D.
- Label density of D: the average number of labels per example in D divided by |Y| (both are computed in the sketch below).
- Hamming loss: $HL(H, D) = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|Y_i \,\Delta\, \hat{Y}_i|}{|Y|}$, where $\Delta$ denotes the symmetric difference between the true and predicted label sets.
- Ranking loss: $RL(f) = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|R(x_i)|}{|Y_i|\,|\bar{Y}_i|}$, where $R(x_i)$ is the set of label pairs $(y_a, y_b) \in Y_i \times \bar{Y}_i$ that $f$ ranks in the wrong order.
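To make these measures concrete, the minimal sketch below computes label cardinality, label density, and Hamming loss from binary indicator matrices. The NumPy representation and the function names are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def label_cardinality(Y_true):
    """Average number of labels per example (rows = examples, cols = labels)."""
    return Y_true.sum(axis=1).mean()

def label_density(Y_true):
    """Label cardinality divided by the total number of labels |Y|."""
    return label_cardinality(Y_true) / Y_true.shape[1]

def hamming_loss(Y_true, Y_pred):
    """Average fraction of label positions where prediction and truth disagree,
    i.e. the size of the symmetric difference divided by |Y|, averaged over D."""
    return np.mean(Y_true != Y_pred)

# Toy example: 3 instances, 4 labels.
Y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
Y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 1],
                   [1, 1, 0, 1]])
print(label_cardinality(Y_true), label_density(Y_true), hamming_loss(Y_true, Y_pred))
```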

MLC methods: problem transformation methods and algorithm adaptation methods.

MLC methods: Problem transformation methods
- Select family: discards multi-label data or selects one of the multiple labels for each instance. This discards a lot of the information content in the original data set.
- Label power set method: considers each distinct set of labels as a single label. It may lead to a large number of classes with few examples per class.
- Binary relevance: learns |Y| binary classifiers, one for each label. The most common problem transformation method (a minimal sketch follows this list).
- Ranking by pairwise comparison: generates |Y|(|Y|-1)/2 binary-label data sets, one for each pair of labels, and outputs a ranking of labels based on the votes of the binary classifiers.
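As an illustration of the binary relevance transformation, the sketch below trains one independent binary classifier per label and stacks the per-label predictions back into an indicator matrix. The choice of scikit-learn's LogisticRegression as the base learner is an assumption made only for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class BinaryRelevance:
    """Train one binary classifier per label, independently of the others."""
    def __init__(self, base_estimator=LogisticRegression):
        self.base_estimator = base_estimator
        self.models_ = []

    def fit(self, X, Y):
        # Y is an (n_samples, n_labels) binary indicator matrix.
        self.models_ = []
        for j in range(Y.shape[1]):
            clf = self.base_estimator()
            clf.fit(X, Y[:, j])
            self.models_.append(clf)
        return self

    def predict(self, X):
        # Stack each label's 0/1 prediction back into an indicator matrix.
        return np.column_stack([clf.predict(X) for clf in self.models_])
```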

MLC methods: Problem transformation methods (continued)
- Random k-labelsets: breaks the initial set of labels into small random subsets, either disjoint or overlapping. It improves on the label power set results but is still challenged by domains with a large number of labels and instances.

MLC methods: Algorithm adaptation methods
- Decision trees: C4.5 was adapted to learn from multi-label data, producing models that are understandable by humans.
- Probabilistic methods: proposed for text classification; a generative model is trained in which each label generates different words, and the multi-label document is generated by a mixture of the word distributions of its labels, fitted using EM.
- Neural networks: the back-propagation algorithm is adapted by introducing a new error function similar to the ranking loss.
- Lazy methods: the k-nearest neighbors algorithm is used to maximize the posterior probability of the labels assigned to new instances, and outputs a ranking function for the probability of each label (see the sketch below).
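The lazy approach above can be illustrated with a simple neighbor-vote scheme: each label is scored by its frequency among the k nearest neighbors and thresholded at 0.5. This is a simplified stand-in assumed here for illustration; it is not the exact posterior-probability formulation used by ML-kNN.

```python
import numpy as np

def knn_label_scores(X_train, Y_train, x, k=5):
    """Score each label by its relative frequency among the k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Y_train[nearest].mean(axis=0)      # one score per label, in [0, 1]

def knn_predict(X_train, Y_train, x, k=5, threshold=0.5):
    """Predict the label set and return the per-label scores as a ranking."""
    scores = knn_label_scores(X_train, Y_train, x, k)
    return (scores >= threshold).astype(int), scores
```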

MLC methods: Algorithm adaptation methods (continued)
- Support vector machines: the one-versus-one strategy is used to partition a data set with |Y| labels into |Y|(|Y|-1)/2 double-label subsets, assuming double-label instances are located in the marginal region between positive and negative instances.
- Associative classification methods: construct classification rule sets using association rule mining. MMAC learns an initial set of rules, removes the examples covered by this rule set, and recursively learns a new rule set from the remaining examples until no further frequent items are left.

Confidence in prediction
- The AdaBoost algorithm has been extended to generate a confidence degree for the predictions of its weak hypotheses.
- Confidence scores indicate the reliability of each prediction.
- Classification methods such as probabilistic approaches and logistic regression output a value interpreted as the probability of a label being true.
- The idea of confidence in prediction can be extended one step prior to training: the training data itself can carry confidence levels provided by an expert.
- The hypothesis then learns these confidence levels and outputs a confidence degree along with its predicted labels for new instances.

Notation
- X denotes the instance space and Y = {y_1, y_2, ..., y_k} is the finite set of class labels.
- Each instance x ∈ X is associated with a subset of labels y ⊆ Y.
- D is the data set: D = {(x_1, λ_1, C_1), (x_2, λ_2, C_2), ..., (x_n, λ_n, C_n)} (see the sketch below).
- λ_i is the binary relevance vector of labels for instance x_i: λ_{i,j} = 1 if y_j ∈ y_i and 0 otherwise, for i ∈ [1, n] and j ∈ [1, k].
- C_i is the vector of confidence levels associated with the labels of x_i.
- H: X → (Y, C) outputs a set of predicted labels along with a vector of confidence levels (W) expressing the hypothesis's confidence in each of the labels.
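One possible in-memory representation of the (sample, relevance vector, confidence vector) triples is sketched below; the class and field names are illustrative choices rather than definitions from the slides.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LabeledInstance:
    x: np.ndarray        # feature vector, an element of the instance space X
    lam: np.ndarray      # binary relevance vector lambda_i over the k labels (0/1)
    conf: np.ndarray     # confidence level C_i for each label, in [0, 1]

# Example: four labels; the instance carries labels y_2 and y_3,
# with full confidence in y_2 and 0.9 confidence in y_3.
inst = LabeledInstance(x=np.array([0.1, -0.3, 0.25, 0.0]),
                       lam=np.array([0, 1, 1, 0]),
                       conf=np.array([0.0, 1.0, 0.9, 0.0]))
```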

LCS structure
- A strength-based Michigan-style classifier system is used to extract knowledge from multi-label data.
- Michigan-style classifier systems are rule-based, supervised learning systems with a fixed rule length.
- A genetic algorithm acts as the driving force that helps evolve useful rules.
- The classification model consists of a population of rules of the form IF condition THEN action.
- Originally structured for learning binary classification problems.
- The isolated structure of the action part of the classifiers allows further modifications to adapt to more general classification problems, namely multi-class and multi-label.

LCS structure
[Diagram: the training loop connects the data set, a randomly drawn training instance, the population [P], covering, the match set [M], conflict resolution (CR), the rule-parameter updates, the genetic algorithm, and the resulting model.]
- Data set: a set of triples of the form (sample, label, confidence level).
- Training instance: an individual drawn at random from the data set.

LCS structure
- [P]: the population of rules (classifiers).
- Classifier parameters: condition, action, strength (S), confidence estimate W = (w_1, w_2, ..., w_k), and confidence error (ε).

LCS structure
- Condition: for binary-valued attributes, a string over {0, 1, #}; for real-valued attributes, an ordered list of (center, spread) pairs (c_i, s_i).

LCS structure
- Action: an ordered list of 0s and 1s. Example: for a sample drawn from a four-class data set, the label set "0110" with confidence levels C = [0, 1, 0.9, 0].

LCS structure
- [M]: the set of classifiers whose conditions match the provided instance, i.e. $c_i - s_i < x_i < c_i + s_i$ for every attribute i.
- Covering: creates a new matching classifier if [M] is empty (a minimal sketch of the rule representation, matching, and covering follows).
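The sketch below illustrates the rule representation for real-valued attributes together with matching and covering, following the center-spread condition described above. The default strength, the fixed covering spread, and the field names are illustrative assumptions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Rule:
    centers: np.ndarray          # condition: interval centers c_i
    spreads: np.ndarray          # condition: interval spreads s_i
    action: np.ndarray           # ordered 0/1 label vector, e.g. [0, 1, 1, 0]
    strength: float = 10.0       # strength S
    conf: np.ndarray = None      # confidence estimate W = (w_1, ..., w_k)
    error: float = 1.0           # confidence error epsilon

    def matches(self, x):
        # A rule matches when c_i - s_i < x_i < c_i + s_i holds for every attribute.
        return np.all((self.centers - self.spreads < x) & (x < self.centers + self.spreads))

def cover(x, labels, conf, spread=0.2):
    """Covering: create a rule that matches x when the match set [M] is empty.
    The fixed spread of 0.2 is an illustrative choice."""
    return Rule(centers=x.copy(),
                spreads=np.full_like(x, spread),
                action=labels.copy(),
                conf=conf.copy())
```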

LCS structure
- CR: conflict resolution. Bidding identifies the classifier that gets to classify the instance: $B = S \mu e^{-\alpha \varepsilon}$, where μ is a function of the specificity (generality) of the classifier (sketched below).
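A possible reading of the bidding step, using the Rule sketch above, is shown below. The exact form of μ is not given on the slide, so the inverse-average-width measure used here is purely an assumption.

```python
import numpy as np

def bid(rule, alpha=1.0):
    """Bid B = S * mu * exp(-alpha * epsilon); mu rewards more specific rules.
    Here mu shrinks as the average interval width grows (an assumption)."""
    mu = 1.0 / (1.0 + np.mean(2.0 * rule.spreads))
    return rule.strength * mu * np.exp(-alpha * rule.error)

def resolve_conflict(match_set, alpha=1.0):
    """Conflict resolution: the matching rule with the highest bid classifies the instance."""
    return max(match_set, key=lambda r: bid(r, alpha))
```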

LCS structure
- The action set: the classifiers in [M] having the same action as the winning classifier.
- Genetic algorithm: randomly picks two classifiers from the action set and creates two offspring; the offspring are inserted back into [P].

LCS structure
- Genetic algorithm: favors classifiers with a higher fitness value and a lower confidence estimate error simultaneously.

LCS structure
- Taxes are deducted from the classifiers in both sets ([M] and the action set).
- Confidence error: $\varepsilon_i = \lVert W_i - C \rVert_1$.
- Delta-rule update scheme: $W_i \leftarrow W_i + \beta (C - W_i)$.
- Fitness- and error-proportionate resource sharing scheme: $R_i = \frac{S_i e^{-\alpha \varepsilon_i}}{\sum_j S_j e^{-\alpha \varepsilon_j}} R_0$, where $R_0$ is the total resource shared among the classifiers (both updates are sketched below).
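The sketch below illustrates the reconstructed update equations: the delta-rule confidence update and the fitness- and error-proportionate resource sharing. How the shared resource is credited back to a rule's strength is an assumption made for the example.

```python
import numpy as np

def update_confidence(rule, C, beta=0.2):
    """Delta-rule update of the confidence estimate and its L1 error:
    W_i <- W_i + beta * (C - W_i),  epsilon_i = ||W_i - C||_1."""
    rule.conf = rule.conf + beta * (C - rule.conf)
    rule.error = np.sum(np.abs(rule.conf - C))

def share_resource(action_set, R0=1.0, alpha=1.0):
    """Fitness- and error-proportionate resource sharing among the action-set rules:
    R_i = S_i * exp(-alpha * eps_i) / sum_j S_j * exp(-alpha * eps_j) * R0."""
    weights = np.array([r.strength * np.exp(-alpha * r.error) for r in action_set])
    shares = R0 * weights / weights.sum()
    for r, share in zip(action_set, shares):
        r.strength += share          # credit the shared resource to strength (assumption)
```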

LCS structure
- Model: the population of trained classifiers (rules) that collectively solve the classification problem after a sufficient number of training iterations.

Performance measures
- Hamming loss is employed as the measure of accuracy and is plotted against training iterations.
- The average confidence estimate error of the population is also plotted against training iterations.
- In the test stage, the prediction of the model is generated from the votes of the classifiers that match the instance, and the confidence level of the classification is reported as the weighted average of the confidence estimates of those matching classifiers (a minimal sketch of this test-stage voting appears below).
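A minimal sketch of the test-stage inference is given below; weighting the votes and confidence estimates by classifier strength is an assumption, since the slide only states that matching classifiers vote and that confidence estimates are averaged with weights.

```python
import numpy as np

def predict(population, x, threshold=0.5):
    """Test-stage inference: matched rules vote on each label (weighted by strength),
    and the reported confidence is the weighted average of their confidence estimates."""
    matched = [r for r in population if r.matches(x)]
    if not matched:
        return None, None                      # no matching rule; left undefined here
    weights = np.array([r.strength for r in matched])
    votes = np.array([r.action for r in matched], dtype=float)
    confs = np.array([r.conf for r in matched], dtype=float)
    label_scores = weights @ votes / weights.sum()
    labels = (label_scores >= threshold).astype(int)
    confidence = weights @ confs / weights.sum()
    return labels, confidence
```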

Simulation results
- Artificial binary-valued data set: five attributes and two classes.
- Artificial real-valued data set: four attributes and two classes; the attribute range is (-0.5, 0.5).

Simulation results
- Iris data: a three-class data set with 50 samples per class. All data are used for training; results are averaged over 10 runs.

Method               Accuracy (%)
OVO SVM              97.33
MLP                  99.48
Logistic Regression  98
Random Forest        100
LCS                  98

Conclusion and future work
- A strength-based learning classifier system is employed to design an embedded MLC algorithm.
- The classifier structure is adapted to handle the confidence levels of the labels provided in the training set.
- The model is tested on one real-world data set and two artificial data sets, and results are provided.
- Appropriate performance measures for test accuracy need to be implemented.
- The MLC method discussed here will be extended to the accuracy-based classifier system (UCS).

Thank you for your attention! Your questions are welcome and feedback is appreciated!