The Multilabel Naive Credal Classifier

Size: px

Start display at page:

Download "The Multilabel Naive Credal Classifier"

Maurice Johns
6 years ago
Views:

1 The Multilabel Naive Credal Classifier Alessandro Antonucci and Giorgio Corani Istituto Dalle Molle di Studi sull Intelligenza Artificiale - Lugano (Switzerland) ISIPTA 15, Pescara, July 21st, 2015

2 IPG IDSIA USI SUPSI LUGANO

3 IPG IDSIA USI SUPSI LUGANO

4 IPG IDSIA USI SUPSI LUGANO

5 IPG IDSIA USI SUPSI LUGANO University of Applied Sciences and Arts of Southern Switzerland (supsi.ch) Università della Svizzera Italiana (usi.ch)

6 IPG IDSIA USI SUPSI LUGANO

7 Chronology (Acknowledgements) Credal version of the naive Bayes classifier by Marco (Zaffalon) MAP algs for imprecise HMMs by Jasper (De Bock) & Gert (de Cooman) ISIPTA 01 ISIPTA 11 IJCAI-13 NIPS 14 ISIPTA 15 Bayes nets as multilabel classifiers by Denis (Mauá) & us MAP in generic credal nets by Jasper & Cassio (de Campos) & me A credal classifiers based on MAP tasks in credal nets by us

8 Chronology (Acknowledgements) Credal version of the naive Bayes classifier by Marco (Zaffalon) MAP algs for imprecise HMMs by Jasper (De Bock) & Gert (de Cooman) ISIPTA 01 ISIPTA 11 IJCAI-13 NIPS 14 ISIPTA 15 Bayes nets as multilabel classifiers by Denis (Mauá) & us MAP in generic credal nets by Jasper & Cassio (de Campos) & me A credal classifiers based on MAP tasks in credal nets by us

9 Chronology (Acknowledgements) Credal version of the naive Bayes classifier by Marco (Zaffalon) MAP algs for imprecise HMMs by Jasper (De Bock) & Gert (de Cooman) ISIPTA 01 ISIPTA 11 IJCAI-13 NIPS 14 ISIPTA 15 Bayes nets as multilabel classifiers by Denis (Mauá) & us MAP in generic credal nets by Jasper & Cassio (de Campos) & me A credal classifiers based on MAP tasks in credal nets by us

10 Chronology (Acknowledgements) Credal version of the naive Bayes classifier by Marco (Zaffalon) MAP algs for imprecise HMMs by Jasper (De Bock) & Gert (de Cooman) ISIPTA 01 ISIPTA 11 IJCAI-13 NIPS 14 ISIPTA 15 Bayes nets as multilabel classifiers by Denis (Mauá) & us MAP in generic credal nets by Jasper & Cassio (de Campos) & me A credal classifiers based on MAP tasks in credal nets by us

11 Chronology (Acknowledgements) Credal version of the naive Bayes classifier by Marco (Zaffalon) MAP algs for imprecise HMMs by Jasper (De Bock) & Gert (de Cooman) ISIPTA 01 ISIPTA 11 IJCAI-13 NIPS 14 ISIPTA 15 Bayes nets as multilabel classifiers by Denis (Mauá) & us MAP in generic credal nets by Jasper & Cassio (de Campos) & me A credal classifiers based on MAP tasks in credal nets by us

two (or more) colors Possible values in 2 C, a multilabel task!

12 Single- vs. multi-label classification A (fictious) classifier to detect eyes color Possible classes C := {brown,green,blue} SINGLE-LABEL Heterochromia iridum: two (or more) colors Possible values in 2 C, a multilabel task! Trivial approaches Standard classification over the power set Exponential in the number of labels! Each label as a separate Boolean variable a (standard) classifier for each label Ignored relations among classes! C = green MULTI-LABEL Graphical models (GMs) to depict relations among class labels (and features) Classification as (standard) inference in GMs C = {blue,brown}

13 Credal classifiers are not (yet) multilabel classifiers Class variable C and (discrete) features F, a test instance f Standard (single-label) classifier are maps: F C learn P(C, F ) from data and return c := arg max c C P(c, f ) Multi-label classifiers: F 2 C C = (C 1,..., C n) as an array of Boolean vars, one for each label learn P(C, F ) and solve the MAP task c := arg max c {0,1} n P(c, f ) Credal (single-label) classifiers: F 2 C learn credal set K(C, F ) and return all c C s.t. c : P(c, f ) > P(c, f ) P(C, F ) K(C, F ) Multilabel credal classifier (MCC): F 2 2C learn credal set K(C, F ) and return all sequences c s.t. c : P(c, f ) > P(c, f ) P(C, F ) K(C, F )

14 Compact Representation of the Output Output of a MCC might be exponentially large Jasper & Gert s idea to fix this with imprecise HMMs (Viterbi): decide whether or not there is at least an optimal sequence sucht that a variable is in a particular state (for each variable and state) With MCCs, for each class label, we can decide whether: the class is active for all the optimal sequences the class is inactive fro all the optimal sequences there are optimal sequences with the label active, and others with the label inactive Optimization task min c :c l max inf =0/1 c P(C,F ) K(C,F ) P(c, f ) P(c, f ) 1 O(2 treewidth ) for separately specified credal nets (e.g., local IDM) More complex with non-separate specifications

15 C F 1 F 2... F m NBC

16 C F 1 F 2... F m NCC=NBC+IDM

17 C 1 Multi-label? Naive topology over classes C 2... C n F 1 F 2... F m Structural learning to bound # of parents of the features and to select the super-class C 1

18 F 1 1 F F 1 m Features replicated: tree topology C 1 MNBC F n 1 F n 2... F n m C 2... C n F 2 1 F F 2 m

19 F 1 1 F F 1 m Features replicated: tree topology C 1 MNBC + IDM = MNCC F1 n F2 n... Fm n C 2... C n F1 2 F Fm 2

20 During the poster session I can Explain some detail about the learning of the structure Explain the feature replication trick (tis makes inference simpler) Explain the non-separate IDM-based quantification of the model Explain the detail of the (convex) optimization...

21 MNCC: the algorithm Input: test instance f (+ dataset D) / Output initialized: C 1 C 2... C n active inactive for l = 1,..., n do for c l = 0, 1 do if min c :c l =c l max c inf t P t(c,f ) P t(c,f ) 1 then Output(l,c l )=1 end if end for end for linear representation of a (exponential) number of maximal seqs

22 Testing MNCC Preliminary tests on real-world datasets Data set Classes Features Instances Emotions 6 44/ Scene 6 224/ E-mobility 10 14/ Slashdot / Perfomance described by: % of instance s.t. all maximal seqs all in the same state Accuracy of the precise model when MNCC is determinate Accuracy of the precise model when MNCC is indeterminate

23 C 1 C 2 C 3 C 4 C 5 C 6 Emotions

24 C 1 C 2 C 3 C 4 C 5 C 6 Scene

25 C 1 C 2 C 3 C 4 C 5 C 6 C 7 C 8 C 9 C E-mobility

26 C15 C2 C13 C20 C5 C6 C8 C17 C3 C7 C14 C11 C18 C16 C10 C4 C9 C19 C22 C21 C12 C Slashdot

27 Conclusions, Outlooks and Acks Among the first tools for robust multilabel classification Still lots of things to do: Extension to multidimensional/hierarchical case Extension to continuous variables (features) Extension to continuous class (multi-target interval-valued regression) More complex topologies (ETAN, de Campos, 2014) Variational approach to features replication Not only 0/1 losses (imprecise losses?)

Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables

Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables Scanagatta, M., Corani, G., de Campos, C. P., & Zaffalon, M. (2016). Learning Treewidth-Bounded Bayesian Networks with Thousands