MI2LS: Multi-Instance Learning from Multiple Information Sources

Dan Zhang (1), Jingrui He (2), Richard Lawrence (3)
(1) Facebook Incorporation, Menlo Park, CA
(2) Stevens Institute of Technology, Hoboken, NJ
(3) IBM T.J. Watson Research Lab, Yorktown Heights, NY

Outline
1. Introduction
   - Multiple Instance Learning
   - Multi-Instance Learning from Multiple Information Sources
2. Method
   - Procedure
3. Experiments
   - Dataset Description
   - Results
4. Future Works and Conclusions

Multiple Instance Learning: Notion

From (Z. Zhou, NIPS 2006)

- Multiple Instance Learning (MIL) can be used to handle the ambiguity problem.
- Each bag (object) contains several instances.
- A bag is labeled as positive if at least one of its instances is positive; otherwise it is labeled as negative.
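The bag-labeling rule above can be written in a few lines. This is a minimal sketch: the `bag_label` name and the +1/-1 label encoding are assumptions made for illustration. In a real MIL training set only the bag labels are observed, so the rule is shown here purely to make the definition concrete.

```python
from typing import Sequence


def bag_label(instance_labels: Sequence[int]) -> int:
    """Return +1 if at least one instance is positive, otherwise -1."""
    return 1 if any(label > 0 for label in instance_labels) else -1


# Hypothetical bags: instance labels are assumed known here for illustration only.
bags = {"bag_a": [-1, -1, 1], "bag_b": [-1, -1]}
labels = {name: bag_label(instances) for name, instances in bags.items()}
```

A single positive instance is enough to make `bag_a` positive, while `bag_b` is negative because none of its instances is positive.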

Multi-Instance Learning from Multiple Information Sources: MI2LS

We solve the rarely studied problem of Multi-Instance Learning from Multiple Information Sources (MI2LS).

Why do we need MI2LS?
- Examples are often described from several different information sources.
  - Webpages have disparate descriptions: textual content, in-bound links, and out-bound links.
  - Images have different kinds of features, such as SIFT features, RGB features, and texture features.
- Many previous multiple-source works (not for MIL) have demonstrated the benefits of considering the consistencies between different sources.

Multi-Instance Learning from Multiple Information Sources: MI2LS Objectives

Questions
- How to ensure the consistencies between different information sources?
- How to further improve the time complexity of MIL?
- Are there any new applications for MI2LS?

The general formulation can be described as follows:

$$
\min_{w^{(p)}} \;
\underbrace{\Omega\big(w^{(1)}, \dots, w^{(M)}\big)}_{\text{Regularization Term}}
+ \underbrace{L_c\big(D, w^{(1)}, \dots, w^{(M)}\big)}_{\text{Classification Loss Term}}
+ \underbrace{L_a\big(D, w^{(1)}, \dots, w^{(M)}\big)}_{\text{Consistency Term}}
$$

- $M$ is the number of sources.
- $w^{(p)}$ represents the classifier for the $p$-th source.
- $D = \{(B_i, Y_i),\ i = 1, \dots, n\}$ represents the labeled bags.

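Concrete formulation (two sources):

$$
\begin{aligned}
\min_{w^{(1)},\, w^{(2)}} \quad
& \frac{1}{2} \sum_{p=1}^{2} \big\| w^{(p)} \big\|^2
+ \frac{1}{n} \sum_{p=1}^{2} \sum_{i=1}^{n} C^{(p)} \xi_i^{(p)}
+ \frac{C}{N} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \eta_{ij} \\
\text{s.t.} \quad
& \forall i \in \{1, 2, \dots, n\}: \quad
Y_i \max_{j \in \{1, \dots, n_i\}} w^{(1)T} B^{(1)}_{ij} \ge 1 - \xi_i^{(1)}, \\
& \forall i \in \{1, 2, \dots, n\}: \quad
Y_i \max_{j \in \{1, \dots, n_i\}} w^{(2)T} B^{(2)}_{ij} \ge 1 - \xi_i^{(2)}, \\
& \forall i \in \{1, 2, \dots, n\},\ j \in \{1, 2, \dots, n_i\}: \quad
\big| w^{(1)T} B^{(1)}_{ij} - w^{(2)T} B^{(2)}_{ij} \big| \le \epsilon + \eta_{ij}
\end{aligned}
$$

Here $N = \sum_{i=1}^{n} n_i$ is the total number of instances, and $C^{(1)}$, $C^{(2)}$ and $C$ are trade-off parameters.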
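To make the three terms concrete, the sketch below evaluates the two-source objective for fixed classifiers. It assumes the slack variables are non-negative, so that at the optimum each slack equals a hinge value (the slide does not show non-negativity constraints explicitly). The function name `mi2ls_objective`, the default parameter values, and the toy data are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]
Bag = List[Vector]  # one bag = a list of instance feature vectors


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def mi2ls_objective(w1: Vector, w2: Vector, bags1: List[Bag], bags2: List[Bag],
                    labels: Sequence[int], c1: float = 1.0, c2: float = 1.0,
                    c: float = 1.0, eps: float = 0.1) -> float:
    """Regularization + classification loss + consistency for two sources."""
    n = len(labels)
    total_instances = sum(len(bag) for bag in bags1)  # N in the formulation
    reg = 0.5 * (dot(w1, w1) + dot(w2, w2))
    cls_loss = 0.0
    consistency = 0.0
    for bag1, bag2, y in zip(bags1, bags2, labels):
        scores1 = [dot(w1, x) for x in bag1]
        scores2 = [dot(w2, x) for x in bag2]
        # Bag-level slack (assumed non-negative): hinge on the best-scoring instance.
        cls_loss += c1 * max(0.0, 1.0 - y * max(scores1))
        cls_loss += c2 * max(0.0, 1.0 - y * max(scores2))
        # Instance-level slack (assumed non-negative): disagreement beyond eps.
        consistency += sum(max(0.0, abs(a - b) - eps)
                           for a, b in zip(scores1, scores2))
    return reg + cls_loss / n + c * consistency / total_instances


# Toy example: one positive bag with two instances, described in two sources.
bags_view1 = [[[2.0, 0.0], [0.0, 0.0]]]
bags_view2 = [[[0.0, 2.0], [0.0, 0.0]]]
value = mi2ls_objective([1.0, 0.0], [0.0, 1.0], bags_view1, bags_view2, [1])
```

In the toy example both sources score the instances identically and the bag margin is satisfied, so only the regularization term contributes.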

Procedure: Algorithm Outline

Observation
- The proposed optimization problem is non-convex.
- The numbers of bags and instances can be very large, which increases the training time.

Previous Efforts
- Constrained Concave-Convex Procedure (CCCP) + Cutting Plane.
- Adapting existing convex optimization methods, such as the bundle method, directly to the non-convex problem.

Procedure: Algorithm Outline

Methodology
- We use CCCP to decompose the optimization problem into a series of convex sub-problems.
- Stochastic Gradient Descent (SGD), chosen for its popularity, is employed to solve the sub-problems.
- Unlike previous SGD formulations, we have two sets of constraints: the ones on bags and the ones on instances.
- A two-level sampling scheme is therefore employed: sampling on bags, then sampling on instances.

Procedure: Description

CCCP iterations:
1. Initialize w_0 and the iteration counter t.
2. repeat
3.   Derive the CCCP sub-problem.
     Stochastic Gradient Descent iterations:
4.   for s = 1, ..., S
5.     Sample bags.
6.     Sample instances from the sampled bags.
7.     Calculate the sub-gradient and update the classifier.
8.   end for
9. until convergence
10. $\bar{w}^{t}_{S,\alpha} = \big( w^{t}_{(1-\alpha)S+1} + \cdots + w^{t}_{S} \big) / (\alpha S)$
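The loop structure above can be sketched in plain Python. This is an illustration under several assumptions, not the authors' implementation: the CCCP step is approximated by freezing the highest-scoring ("witness") instance of each positive bag, a fixed number of CCCP rounds stands in for the convergence test, the step size is 1/(s+1), slacks are treated as hinge losses, and sampling scale factors are folded into `c_cls` and `c_con`. All function and parameter names are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def add_scaled(target, x, scale):
    """In place: target += scale * x."""
    for k, value in enumerate(x):
        target[k] += scale * value


def argmax_instance(w, bag):
    return max(range(len(bag)), key=lambda j: dot(w, bag[j]))


def bag_score(w, bag):
    return max(dot(w, x) for x in bag)


def train_two_views(views, labels, cccp_rounds=3, sgd_steps=200, bags_per_step=2,
                    insts_per_bag=2, c_cls=1.0, c_con=0.1, eps=0.1, alpha=0.5, seed=0):
    """views[p][i][j] is the feature vector of instance j in bag i under source p."""
    rng = random.Random(seed)
    n = len(labels)
    ws = [[0.0] * len(view[0][0]) for view in views]
    for _ in range(cccp_rounds):
        # Assumed CCCP step: freeze the witness instance of every bag under the current w.
        witness = [[argmax_instance(w, bag) for bag in view] for w, view in zip(ws, views)]
        tail_sum = [[0.0] * len(w) for w in ws]
        tail_count = 0
        for s in range(1, sgd_steps + 1):
            lr = 1.0 / (s + 1)
            grads = [list(w) for w in ws]  # gradient of 0.5 * ||w||^2 is w
            # Level 1: sample bags.
            for i in rng.sample(range(n), min(bags_per_step, n)):
                y = labels[i]
                for p in range(2):
                    bag = views[p][i]
                    # Positive bags use the frozen witness; negative bags use the current max.
                    j = witness[p][i] if y > 0 else argmax_instance(ws[p], bag)
                    if y * dot(ws[p], bag[j]) < 1.0:
                        add_scaled(grads[p], bag[j], -c_cls * y / bags_per_step)
                # Level 2: sample instances from the sampled bag for the consistency term.
                n_i = len(views[0][i])
                for j in rng.sample(range(n_i), min(insts_per_bag, n_i)):
                    x1, x2 = views[0][i][j], views[1][i][j]
                    diff = dot(ws[0], x1) - dot(ws[1], x2)
                    if abs(diff) > eps:
                        sign = 1.0 if diff > 0 else -1.0
                        add_scaled(grads[0], x1, c_con * sign)
                        add_scaled(grads[1], x2, -c_con * sign)
            for w, g in zip(ws, grads):
                add_scaled(w, g, -lr)
            # Average only the last alpha * S iterates (step 10 of the procedure).
            if s > (1.0 - alpha) * sgd_steps:
                for acc, w in zip(tail_sum, ws):
                    add_scaled(acc, w, 1.0)
                tail_count += 1
        ws = [[v / tail_count for v in acc] for acc in tail_sum]
    return ws


# Toy data: each instance is [feature, bias]; both sources see the same values.
pos_bag = [[2.0, 1.0], [-2.0, 1.0]]
neg_bag = [[-2.0, 1.0], [-2.0, 1.0]]
view = [pos_bag, pos_bag, neg_bag, neg_bag]
ws = train_two_views([view, view], [1, 1, -1, -1])
```

Each bag is then scored by its highest-scoring instance via `bag_score`, and the sign of the score is taken as the predicted bag label.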

Procedure: Theoretical Results

- Proved the optimality of the solution from the adapted SGD method with respect to the number of iterations.
- Proved the Rademacher complexity and the generalization error rate of the formulation.

Dataset Description: Datasets

- Two datasets from Reuters21578
- Two datasets from WebKB
- One dataset from the new application, Insider Threat Detection (ITD)

[Table: Dataset | # Features View 1 | # Features View 2 | # Bags | # Instances, with rows for the two Reuters datasets, Course, Faculty, and ITD. The numeric entries are not preserved.]

Dataset Description: Insider Threat Detection

How Data Looks

Insider ID   start date   end date
AAM          /23/         /29/2010
AJR          /10/         /18/2010
BDV          /30/         /10/2010

Intuition
- Bad guys will not do bad things on every single day within the period from the start date to the end date.
- The labeling is ambiguous as to which days each bad guy did bad things on.

Dataset Description: MIL in ADAMS

How to Formulate
- Each time period (from an insider's start date to end date) is considered a bag; the features of each single day are considered an instance.
- The bag is labeled as positive if and only if the person did bad things on at least one day during this period; otherwise, it is negative.
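The period-to-bag mapping can be sketched as follows. The per-day feature dictionary, the feature values, the dates, and the function names are all hypothetical; the sketch only illustrates how one time period becomes one bag with one instance per day.

```python
from datetime import date, timedelta


def days_in_period(start, end):
    """All calendar days from start to end, inclusive."""
    return [start + timedelta(days=k) for k in range((end - start).days + 1)]


def build_bag(daily_features, start, end):
    """One bag per period: one instance per day that has recorded features."""
    return [daily_features[d] for d in days_in_period(start, end) if d in daily_features]


# Hypothetical per-day feature vectors for one person.
daily = {
    date(2010, 1, 1): [3.0, 1.0],
    date(2010, 1, 2): [0.0, 4.0],
    date(2010, 1, 4): [1.0, 1.0],
}
bag = build_bag(daily, date(2010, 1, 1), date(2010, 1, 3))
```

Days without recorded activity are skipped in this sketch; whether such days should instead appear as all-zero instances is a modeling choice that the slides do not specify.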

Dataset Description: MIL in ITD

Features
- Each single day can be described by two groups of features:
  - the group that describes the person's social behaviors, such as sending emails and interacting with friends on social media websites;
  - the group that depicts things the person did alone, such as logging in and out of a computer system.
- The method ensures the consistencies between the two groups.

Results on Reuters21578
[Figure: accuracy vs. training ratio on a Reuters dataset for FMI2LS, FMI2LS_0, MISVM, misvm, Citation KNN, and MILES.]

Results on Reuters21578
[Figure: time vs. training ratio on a Reuters dataset for FMI2LS, FMI2LS_0, MISVM, misvm, Citation KNN, and MILES.]

Results on WebKB
[Figure: accuracy vs. training ratio on the Course dataset for FMI2LS, FMI2LS_0, MISVM, misvm, Citation KNN, and MILES.]

Results on WebKB
[Figure: time vs. training ratio on the Course dataset for FMI2LS, FMI2LS_0, MISVM, misvm, Citation KNN, and MILES.]

Results on ITD
[Figure: results vs. training ratio on the ITD dataset for FMILMIS, FMILMIS_0, MISVM, misvm, Citation KNN, and MILES.]

Results on ITD
[Figure: time vs. training ratio on the ITD dataset for FMILMIS, FMILMIS_0, MISVM, misvm, Citation KNN, and MILES.]

Future Works and Conclusions

In this paper, we
- formulated a rarely studied problem: Multi-Instance Learning from Multiple Information Sources;
- suggested a formulation for this problem;
- adapted the Stochastic Gradient Descent method to solve this problem;
- introduced a new application, Insider Threat Detection;
- ran an extensive set of experiments to demonstrate the advantages of the proposed method.

In the future, we can address:
- how to combine the current study with traditional single-instance learning (e.g. MILEAGE from ICML 2013);
- how to tune different weights for different sources.

Questions?
