Multi-Label Classification with Conditional Tree-structured Bayesian Networks

Size: px

Start display at page:

Download "Multi-Label Classification with Conditional Tree-structured Bayesian Networks"

Neal Booker
5 years ago
Views:

1 Multi-Label Classification with Conditional Tree-structured Bayesian Networks Original work: Batal, I., Hong C., and Hauskrecht, M. An Efficient Probabilistic Framework for Multi-Dimensional Classification. CIKM Charmgil Hong Department of Computer Science University of Pittsburgh

2 Machine Learning 2

3 3

4 Motivation Traditional classification Each data instance is associated with a single class variable 4

5 Motivation Multi-label classification In many real-world applications, each data instance can be associated with multiple class variables Examples 5

6 Motivation Multi-label classification In many real-world applications, each data instance can be associated with multiple class variables Examples A news article may cover multiple topics such as politics and economics 6

7 Motivation Multi-label classification In many real-world applications, each data instance can be associated with multiple class variables Examples An image may include multiple objects as beach, building and bridge Taken from the Scene dataset [Boutell et al., 04] 7

8 Motivation Multi-label classification In many real-world applications, each data instance can be associated with multiple class variables Examples A gene may be associated with several biological functions Image courtesy of ADAM Education 8

9 Motivation Multi-label classification Each data instance is associated with multiple class variables Objective: assign to each instance the most probable assignment of the class variables Class 1 { red, blue } Class 2 {,, } 9

10 Motivation Simplest solution Learning d independent classifiers for d class labels It does not capture the dependency relations between the classes Class 1 { red, blue } Class 2 {,, } 10

11 CTBN Conditional Tree-structured Bayesian Network 11

12 CTBN Conditional Tree-structured Bayesian Network (CTBN) for modeling P(Y1,...,Yd X) A class variable can have at most one other class variable as a parent (the dependencies among classes form a directed tree) The feature vector X is the common parent for all class variables X Y1 Y2 Y3 Y4 An example CTBN 12

13 CTBN Conditional Tree-structured Bayesian Network (CTBN) for modeling P(Y1,...,Yd X) A class variable can have at most one other class variable as a parent (the dependencies among classes form a directed tree) The feature vector X is the common parent for all class variables X Y1 Y2 Y3 Y4 An example CTBN 13

14 CTBN Conditional Tree-structured Bayesian Network (CTBN) for modeling P(Y1,...,Yd X) A class variable can have at most one other class variable as a parent (the dependencies among classes form a directed tree) The feature vector X is the common parent for all class variables X Y1 Y2 Y3 Y4 An example CTBN 14

15 CTBN Conditional Tree-structured Bayesian Network (CTBN) for modeling P(Y1,...,Yd X) A class variable can have at most one other class variable as a parent (the dependencies among classes form a directed tree) The feature vector X is the common parent for all class variables We restrict the dependency structure to a tree because: 1. The optimal structure can be learned efficiently (coming up) 2. Exact inference can be done in O(d) time (not in this talk) 15

16 Representation The conditional class distribution is: It is the product of the dependencies in the network 16

17 Representation The conditional class distribution is: the parent of yi in CTBN T It is the product of the dependencies in the network 17

18 Representation The conditional class distribution is: the parent of yi in CTBN T It is the product of the dependencies in the network X Y1 Y2 Y3 Y4 An example CTBN 18

19 Representation The conditional class distribution is: It is the product of the dependencies in the network X P(Y1 X,Y2) Y1 Y2 Y3 Y4 An example CTBN 19

20 Representation The conditional class distribution is: It is the product of the dependencies in the network X This network represents Y1 Y2 Y3 Y4 20

21 Representation The conditional class distribution is: It is the product of the dependencies in the network Each P(yi x, yπ(i,t)) is represented by classifier functions. X This network represents Y1 Y2 Y3 Y4 21

22 Representation The conditional class distribution is: It is the product of the dependencies in the network Each P(yi x, yπ(i,t)) is represented by classifier functions. f1,0(x) = P(Y1 X,Y2 = 0) Y2 {0, 1} X f1,1(x) = P(Y1 X,Y2 = 1) Y1 Y2 Y3 Y4 22

23 Structure learning Objective: Find the tree structure that best approximates P(Y X), i.e., that maximizes the conditional log-likelihood of data Idea: Cast the structure learning into the maximum branching tree problem [Edmonds, 67] Next: illustration through the example CTBN X Y1 Y2 Y3 Y4 23

24 Structure learning 1. Define a complete weighted directed graph G 24

25 Structure learning 1. Define a complete weighted directed graph G Draw d nodes for all class variables Yi : i {1,..., d} Y1 Y2 Y4 Y3 25

26 Structure learning 1. Define a complete weighted directed graph G Draw d nodes for all class variables Yi : i {1,..., d} Connect all the node pairs and add self-loops with directed edges Y1 Y2 Y4 Y3 26

27 Structure learning 2. Compute the edge weights using conditional log-likelihood of the data: Wφ 1 Y1 W1 2 W2 1 Y2 Wφ 2 W1 4 W4 1 W3 1 W2 4 W4 2 W1 3 W2 3 W3 2 Wφ 4 Y4 W4 3 W3 4 Y3 Wφ 3 27

28 Structure learning 3. Find the tree that maximizes the sum of the edge weights by solving the maximum branching tree problem Wφ 1 Y1 W1 2 W2 1 Y2 Wφ 2 W1 4 W4 1 W3 1 W2 4 W4 2 W1 3 W2 3 W3 2 Wφ 4 Y4 W4 3 W3 4 Y3 Wφ 3 28

29 Structure learning 4. The corresponding CTBN model is... Wφ 1 Y1 W1 2 W2 1 Y2 Wφ 2 Wφ 4 W1 4 W4 1 Y4 W3 1 W2 4 W4 2 W1 3 W4 3 W3 4 W2 3 W3 2 Y3 Wφ 3 X Y1 Y2 Y3 Y4 29

30 Experiments Compared methods Binary Relevance (BR) [Boutell et al., 04, Clare et al., 01] Classification with heterogeneous features (CHF) [Godbole and Sarawagi, 04] Multi-label k-nearest neighbor (MLKNN) [Zhang and Zhou, 07] Instance-based learning by logistic regression (IBLR) [Cheng and Hüllermeier, 09] Classifier chains (CC) [Read et al., 09] Maximum margin output coding (MMOC) [Zhang and Schneider, 12] 30

31 Experiments Data 10 publicly available datasets from different domains Dataset # Instances # Features # Classes Domain Emotions Music Yeast 2, Biology Scene 2, Image Enron 1,702 1, Text TMC ,596 30, Text RCV1_subset1 6,000 8, Text RCV1_subset2 6,000 8, Text RCV1_subset3 6,000 8, Text RCV1_subset4 6,000 8, Text RCV1_subset5 6,000 8, Text 31

32 Experiment Results Exact Match Accuracy The probability of all classes being predicted correctly (higher is better) 32

33 Experiment Results Exact Match Accuracy The probability of all classes being predicted correctly (higher is better) Dataset BR CHF MLKNN IBLR CC MMOC CTBN Emotions Yeast Scene Enron TMC RCV1_subset Could RCV1_subset not RCV1_subset finish RCV1_subset RCV1_subset #win-tie-loss

34 Experiment Results Normalized conditional log-likelihood loss Negative log-likelihood normalized on each dataset (lower is better) 34

35 Experiment Results Normalized conditional log-likelihood loss Negative log-likelihood normalized on each dataset (lower is better) 35

36 Conclusion We proposed a novel probabilistic approach to multilabel classification CTBN encodes the conditional dependence relations between classes Efficient structure learning and exact inference algorithms are presented Our approach outperforms several state-of-the-art methods 36

37 Thank you! 37

An Efficient Probabilistic Framework for Multi-Dimensional Classification

An Efficient Probabilistic Framework for Multi-Dimensional Classification Iyad Batal Computer Science Dept. University of Pittsburgh iyad@cs.pitt.edu Charmgil Hong Computer Science Dept. University of