Guiding Semi-Supervision with Constraint-Driven Learning
1 Guiding Semi-Supervision with Constraint-Driven Learning. Ming-Wei Chang, Lev Ratinov, Dan Roth. Department of Computer Science, University of Illinois at Urbana-Champaign. Paper presentation by: Drew Stone
2 Outline: 1 Background (semi-supervised learning; constraint-driven learning). 2 Constraint-driven learning with semi-supervision (introduction; model; scoring; constraint pipeline; the Constraint-Driven Learning (CODL) algorithm). 3 Experiments.
4 Semi-supervised learning Introduction. Question: when labelled data is scarce, how can we take advantage of unlabelled data for training? Intuition: if there is structure in the underlying distribution of samples, points that are close together or clustered may share labels.
5 Semi-supervised learning Framework. Given a model M trained on m labelled samples from a distribution D, and a set U of unlabelled examples drawn from D: learn labels for each example in U (Learn(U, M)), then reuse the now-labelled examples to tune M into M'. Benefit: access to more training data. Drawback: the learned model may drift away from the correct classifier if the assumptions on the distribution do not hold.
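The loop above can be sketched in a few lines. The NearestCentroid model below is an illustrative stand-in (any classifier with fit/predict would do), and the one-dimensional data is a toy example, not from the paper:

```python
class NearestCentroid:
    """Toy 1-D classifier: predict the label of the nearest class centroid."""
    def fit(self, pairs):
        sums, counts = {}, {}
        for x, y in pairs:
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: sums[y] / counts[y] for y in sums}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda y: abs(x - self.centroids[y]))

def self_train(model, labelled, unlabelled, rounds=3):
    """Train on labelled data, then repeatedly label U and re-tune the model."""
    model.fit(labelled)
    for _ in range(rounds):
        pseudo = [(x, model.predict(x)) for x in unlabelled]  # Learn(U, M)
        model.fit(labelled + pseudo)                          # tune M -> M'
    return model

model = self_train(NearestCentroid(), [(0.0, "a"), (10.0, "b")],
                   [1.0, 2.0, 8.0, 9.0])
print(model.predict(2.5))  # nearby clustered points share label "a"
```

Note how the drawback shows up here too: if the pseudo-labels are wrong, refitting pulls the centroids in the wrong direction.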
6 Constraint-driven learning Introduction. Motivation: keep the learned model simple, using constraints to counterbalance its over-simplicity. Benefit: simpler models (fewer features) are more computationally efficient. Intuition: fix a set of task-specific constraints so that a simple machine learning model can be used; the constraints make learning both easier and more accurate.
7 Constraint-driven learning Framework. Given an objective function $\operatorname{argmax}_y \lambda \cdot F(x, y)$, define a set of linear (or non-linear) constraints $\{C_i\}_{i \le K}$, where $C_i : X \times Y \to \{0, 1\}$, and solve the optimization problem subject to those constraints. Note: any Boolean rule can be encoded as a set of linear inequalities.
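As a minimal illustration of that last note, the Boolean rule $y_1 \Rightarrow y_2$ over 0/1 variables is equivalent to the linear inequality $y_2 - y_1 \ge 0$ (the rule is an illustrative choice, not one from the paper), which we can verify by enumeration:

```python
from itertools import product

def rule(y1, y2):
    """Boolean rule: y1 implies y2."""
    return bool((not y1) or y2)

def linear(y1, y2):
    """Linear encoding of the same rule: y2 - y1 >= 0."""
    return y2 - y1 >= 0

# The two agree on every 0/1 assignment.
assert all(rule(a, b) == linear(a, b) for a, b in product([0, 1], repeat=2))
print("y1 -> y2 is exactly the inequality y2 - y1 >= 0")
```

More complex rules compose the same way, e.g. conjunctions become sums of inequalities.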
8 Constraint-driven learning Example task: sequence tagging with an HMM/CRF plus global constraints.
9 Constraint-driven learning Sequence tagging constraints: each word receives a unique label; edges must form a path; a verb must exist.
10 Constraint-driven learning Example task: semantic role labelling with independent classifiers plus global constraints.
11 Constraint-driven learning Example task. Each verb predicate carries with it a new inference problem.
12 Constraint-driven learning Example task (figure).
13 Constraint-driven learning SRL constraints: no duplicate argument classes; unique labels; ordering constraints.
14 Constraint-driven learning w/ semi-supervision Introduction. Intuition: use constraints to obtain better training samples from the unlabelled set. Benefit: deviations from the simple model are guided towards more expressive answers, since the constraints steer the model.
15 Constraint-driven learning w/ semi-supervision Examples. Consider a constraint over $\{0, 1\}^n$: a sequence that violates it may admit several equally good corrections, so which one is best to learn from? Similarly, consider the constraint that state transitions must occur on punctuation marks: many output sequences can satisfy it.
16 Constraint-driven learning w/ semi-supervision Model. Consider data and output domains X, Y with a distribution D defined over $X \times Y$. Input/output pairs $(x, y) \in X \times Y$ are given as sequence pairs: $x = (x_1, \ldots, x_N)$, $y = (y_1, \ldots, y_M)$. We wish to find a structured output classifier $h : X \to Y$ that uses a structured scoring function $f : X \times Y \to \mathbb{R}$ to choose the most likely output sequence, where the correct output sequence receives the highest score. We are also given (or define) a set of constraints $\{C_i\}_{i \le K}$, $C_i : X \times Y \to \{0, 1\}$.
17 Constraint-driven learning w/ semi-supervision Scoring. Scoring rule: $f(x, y) = \sum_{i=1}^{M} \lambda_i f_i(x, y) = \lambda \cdot F(x, y)$ (equivalently $w^T \phi(x, y)$). This form is compatible with any linear discriminative or generative model, such as HMMs and CRFs. The local feature functions $\{f_i\}_{i \le M}$ allow tractable inference by capturing contextual structure: the space of structured pairs can be huge, but locally there are far fewer distinctions to account for.
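A minimal sketch of this linear scoring rule, with toy emission and transition feature functions (the features, words, tags, and weights are illustrative, not the paper's):

```python
def F(x, y):
    """Global feature vector F(x, y): sums of local features over the pair."""
    emission = sum(1.0 for xi, yi in zip(x, y)
                   if (xi, yi) in {("the", "DET"), ("dog", "NOUN")})
    transition = sum(1.0 for a, b in zip(y, y[1:]) if (a, b) == ("DET", "NOUN"))
    return [emission, transition]

def score(lam, x, y):
    """f(x, y) = lambda . F(x, y), a dot product of weights and features."""
    return sum(l * f for l, f in zip(lam, F(x, y)))

print(score([1.0, 0.5], ["the", "dog"], ["DET", "NOUN"]))  # 2 + 0.5*1 = 2.5
```

An HMM's log-probability decomposes into exactly such local emission and transition terms, which is why it fits this template.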
18 Constraint-driven learning w/ semi-supervision Sample constraints (figure).
19 Constraint-driven learning w/ semi-supervision Constraint pipeline (hard). Define $1_{C(x)} \subseteq Y$ as the set of output sequences for which the constraints evaluate to 1 on a given x. The objective function then becomes $\operatorname{argmax}_{y \in 1_{C(x)}} \lambda \cdot F(x, y)$. Intuition: find the best output sequence y that maximizes the score while satisfying the hard constraints.
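A brute-force sketch of this hard-constrained inference: enumerate label sequences, keep those satisfying every constraint, and take the argmax. The toy score and no-adjacent-B constraint are illustrative; a real system would use beam search or ILP rather than enumeration:

```python
from itertools import product

def argmax_constrained(labels, length, score, constraints):
    """Best label sequence among those satisfying all hard constraints."""
    feasible = (y for y in product(labels, repeat=length)
                if all(c(y) for c in constraints))
    return max(feasible, key=score)

score = lambda y: y.count("B")  # toy model: the more B's the better
no_double_B = lambda y: all((a, b) != ("B", "B") for a, b in zip(y, y[1:]))

print(argmax_constrained(["A", "B"], 3, score, [no_double_B]))  # ('B', 'A', 'B')
```

Without the constraint the model would pick BBB; restricting the argmax to the feasible set changes the answer without changing the model.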
20 Constraint-driven learning w/ semi-supervision Constraint pipeline (soft). Given a suitable distance metric between an output sequence and the space of outputs respecting a single constraint, and a set of soft constraints $\{C_i\}_{i \le K}$ with penalties $\{\rho_i\}_{i \le K}$, we get a new objective function: $\operatorname{argmax}_y \lambda \cdot F(x, y) - \sum_{i=1}^{K} \rho_i\, d(y, 1_{C_i(x)})$. Goal: find the best output sequence under our model that violates the fewest constraints. Intuition: given a learned model $\lambda \cdot F(x, y)$, we bias its decisions by the amount by which the output sequence violates each soft constraint. Option: use the minimal Hamming distance to a constraint-satisfying sequence, $d(y, 1_{C_i(x)}) = \min_{y' \in C_i(x)} H(y, y')$.
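A sketch of this soft objective with the minimal-Hamming-distance option, computed by brute force over a toy label set (the has-a-verb constraint and the trivial model score are illustrative):

```python
from itertools import product

def hamming(y, yp):
    return sum(a != b for a, b in zip(y, yp))

def d_min(y, constraint, labels):
    """Minimal Hamming distance from y to any sequence satisfying the constraint."""
    return min(hamming(y, yp) for yp in product(labels, repeat=len(y))
               if constraint(yp))

def soft_score(model_score, y, constraints, rhos, labels):
    """lambda.F(x, y) minus the weighted constraint-violation distances."""
    penalty = sum(r * d_min(y, c, labels) for c, r in zip(constraints, rhos))
    return model_score(y) - penalty

has_verb = lambda y: "V" in y   # toy constraint: a verb must exist
flat = lambda y: 0.0            # trivial model score, isolating the penalty

print(soft_score(flat, ("N", "N", "N"), [has_verb], [2.0], ["N", "V"]))  # -2.0
```

One flipped label would satisfy the constraint, so the distance is 1 and the penalty is the full weight 2.0.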
21 Constraint-driven learning w/ semi-supervision Distance example (taken from the CCM NAACL-12 tutorial): approximate the distance by counting violations, $d(y, 1_{C_i(x)}) = \sum_{j=1}^{M} \phi_{C_i}(y_j)$.
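The violation-counting approximation can be sketched as a per-position indicator $\phi_{C_i}$ summed over the sequence; the punctuation-transition constraint and citation-style data below are illustrative choices:

```python
def count_violations(words, tags):
    """Sum of phi_Ci over positions: a state transition at position j counts
    as one violation unless the previous token is a punctuation mark."""
    return sum(1 for j in range(1, len(tags))
               if tags[j] != tags[j - 1] and words[j - 1] not in {".", ","})

words = ["A", "Smith", ",", "Title", "here"]
tags  = ["AUTH", "AUTH", "AUTH", "TITLE", "TITLE"]
print(count_violations(words, tags))  # the AUTH->TITLE switch follows ",": 0

print(count_violations(["no", "comma"], ["AUTH", "TITLE"]))  # 1 violation
```

Unlike the minimal Hamming distance, this count needs no search over Y, which is what makes it practical inside inference.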
22 Constraint-driven learning Recall: sequence tagging (figures).
23 Constraint-driven learning Recall: sequence tagging. (a) correct citation parsing; (b) HMM-predicted citation parsing. Adding the punctuation state-transition constraint yields the correct parsing under the same HMM.
24 Constraint-driven learning Recall: SRL (figures).
25 Constraint-driven learning w/ semi-supervision CODL algorithm. Constraints are used in inference, not in learning. $\lambda$ is estimated by maximum likelihood (an EM-style procedure). Semi-supervision occurs on lines 7 and 9 of the algorithm: for each unlabelled sample, we generate K training examples. All unlabelled samples are re-labelled in every training cycle, unlike in the self-training perspective.
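One CODL cycle can be sketched as follows. This is a toy: brute-force constrained inference and a count-based estimator stand in for the paper's beam search and maximum-likelihood estimation, and all data and names are illustrative:

```python
from itertools import product

def model_from(pairs):
    """Toy count-based estimator for emission weights (stands in for MLE)."""
    m = {}
    for x, y in pairs:
        for xi, yi in zip(x, y):
            m[(xi, yi)] = m.get((xi, yi), 0.0) + 1.0
    return m

def top_k_outputs(x, model, labels, penalty, k=2):
    """Constrained inference: rank all label sequences by model score minus
    the soft-constraint penalty; enumeration stands in for beam search."""
    score = lambda y: sum(model.get((xi, yi), 0.0) for xi, yi in zip(x, y))
    cands = sorted(product(labels, repeat=len(x)),
                   key=lambda y: score(y) - penalty(x, y), reverse=True)
    return cands[:k]

def codl_round(labelled, unlabelled, labels, penalty, k=2):
    """One cycle: re-label every unlabelled x with its top-K constrained
    outputs, then re-estimate the model on gold + K pseudo-labelled copies."""
    model = model_from(labelled)
    pseudo = [(x, y) for x in unlabelled
                     for y in top_k_outputs(x, model, labels, penalty, k)]
    return model_from(labelled + pseudo)

m = codl_round([(("a", "b"), ("X", "Y"))], [("a", "b")], ["X", "Y"],
               penalty=lambda x, y: 0.0)
print(m[("a", "X")])  # gold pair + two pseudo-labels starting with X -> 3.0
```

Because every unlabelled x is re-labelled from scratch each round, earlier pseudo-labels never accumulate, matching the "unlike self-training" remark above.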
26 Constraint-driven learning w/ semi-supervision Expectation Maximization: top-K hard EM. Standard EM procedures assign a distribution over all outputs for a given unlabelled $x \in X$. Instead, we choose the top K outputs $y \in Y$ maximizing the soft objective function and assign uniform probability to each; the constraints reshape this distribution at every step.
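The top-K hard E-step can be sketched directly: rather than a full posterior over outputs, put uniform mass 1/K on the K best-scoring candidates (the candidates and score below are toy examples):

```python
def hard_e_step(candidates, score, k=2):
    """Top-K hard E-step: uniform probability 1/K on the K best outputs."""
    top = sorted(candidates, key=score, reverse=True)[:k]
    return {y: 1.0 / len(top) for y in top}

print(hard_e_step(["aa", "ab", "bb"], score=lambda y: y.count("a")))
# {'aa': 0.5, 'ab': 0.5}
```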
27 Constraint-driven learning w/ semi-supervision Motivation for K > 1: there may be multiple ways to correct constraint-violating samples, and we gain access to more training data.
28 Experiments: citations (figures).
29 Experiments: HMM (figures).
30 Pictures taken from the CCM NAACL-12 tutorial. Chang, M.-W., Ratinov, L., & Roth, D. (2007). Guiding semi-supervision with constraint-driven learning. In Proceedings of ACL.
An Axiomatic Approach to IR UIUC TREC 2005 Robust Track Experiments Hui Fang ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign Abstract In this paper, we report
More informationAll lecture slides will be available at CSC2515_Winter15.html
CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 9: Support Vector Machines All lecture slides will be available at http://www.cs.toronto.edu/~urtasun/courses/csc2515/ CSC2515_Winter15.html Many
More informationStatistical parsing. Fei Xia Feb 27, 2009 CSE 590A
Statistical parsing Fei Xia Feb 27, 2009 CSE 590A Statistical parsing History-based models (1995-2000) Recent development (2000-present): Supervised learning: reranking and label splitting Semi-supervised
More informationFeature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule.
CS 188: Artificial Intelligence Fall 2008 Lecture 24: Perceptrons II 11/24/2008 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit
More informationDiscriminative Training for Phrase-Based Machine Translation
Discriminative Training for Phrase-Based Machine Translation Abhishek Arun 19 April 2007 Overview 1 Evolution from generative to discriminative models Discriminative training Model Learning schemes Featured
More informationLecture 9: Support Vector Machines
Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and
More informationKernels for Structured Data
T-122.102 Special Course in Information Science VI: Co-occurence methods in analysis of discrete data Kernels for Structured Data Based on article: A Survey of Kernels for Structured Data by Thomas Gärtner
More informationExpectation Maximization. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University
Expectation Maximization Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University April 10 th, 2006 1 Announcements Reminder: Project milestone due Wednesday beginning of class 2 Coordinate
More informationCLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS
CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of
More informationConditional Random Fields : Theory and Application
Conditional Random Fields : Theory and Application Matt Seigel (mss46@cam.ac.uk) 3 June 2010 Cambridge University Engineering Department Outline The Sequence Classification Problem Linear Chain CRFs CRF
More informationProbabilistic Graphical Models
10-708 Probabilistic Graphical Models Homework 4 Due Apr 27, 12:00 noon Submission: Homework is due on the due date at 12:00 noon. Please see course website for policy on late submission. You must submit
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationPredict the Likelihood of Responding to Direct Mail Campaign in Consumer Lending Industry
Predict the Likelihood of Responding to Direct Mail Campaign in Consumer Lending Industry Jincheng Cao, SCPD Jincheng@stanford.edu 1. INTRODUCTION When running a direct mail campaign, it s common practice
More informationAutomatic Summarization
Automatic Summarization CS 769 Guest Lecture Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of Wisconsin, Madison February 22, 2008 Andrew B. Goldberg (CS Dept) Summarization
More informationGradient of the lower bound
Weakly Supervised with Latent PhD advisor: Dr. Ambedkar Dukkipati Department of Computer Science and Automation gaurav.pandey@csa.iisc.ernet.in Objective Given a training set that comprises image and image-level
More informationAn Adaptive Framework for Multistream Classification
An Adaptive Framework for Multistream Classification Swarup Chandra, Ahsanul Haque, Latifur Khan and Charu Aggarwal* University of Texas at Dallas *IBM Research This material is based upon work supported
More informationECG782: Multidimensional Digital Signal Processing
ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting
More informationLarge-scale visual recognition The bag-of-words representation
Large-scale visual recognition The bag-of-words representation Florent Perronnin, XRCE Hervé Jégou, INRIA CVPR tutorial June 16, 2012 Outline Bag-of-words Large or small vocabularies? Extensions for instance-level
More informationRobust Regression. Robust Data Mining Techniques By Boonyakorn Jantaranuson
Robust Regression Robust Data Mining Techniques By Boonyakorn Jantaranuson Outline Introduction OLS and important terminology Least Median of Squares (LMedS) M-estimator Penalized least squares What is
More informationAn Exact Algorithm for the Statistical Shortest Path Problem
An Exact Algorithm for the Statistical Shortest Path Problem Liang Deng and Martin D. F. Wong Dept. of Electrical and Computer Engineering University of Illinois at Urbana-Champaign Outline Motivation
More informationMachine Learning: Algorithms and Applications Mockup Examination
Machine Learning: Algorithms and Applications Mockup Examination 14 May 2012 FIRST NAME STUDENT NUMBER LAST NAME SIGNATURE Instructions for students Write First Name, Last Name, Student Number and Signature
More informationAT&T: The Tag&Parse Approach to Semantic Parsing of Robot Spatial Commands
AT&T: The Tag&Parse Approach to Semantic Parsing of Robot Spatial Commands Svetlana Stoyanchev, Hyuckchul Jung, John Chen, Srinivas Bangalore AT&T Labs Research 1 AT&T Way Bedminster NJ 07921 {sveta,hjung,jchen,srini}@research.att.com
More informationMachine Learning. Supervised Learning. Manfred Huber
Machine Learning Supervised Learning Manfred Huber 2015 1 Supervised Learning Supervised learning is learning where the training data contains the target output of the learning system. Training data D
More informationInformation Fusion Dr. B. K. Panigrahi
Information Fusion By Dr. B. K. Panigrahi Asst. Professor Department of Electrical Engineering IIT Delhi, New Delhi-110016 01/12/2007 1 Introduction Classification OUTLINE K-fold cross Validation Feature
More information12 Classification using Support Vector Machines
160 Bioinformatics I, WS 14/15, D. Huson, January 28, 2015 12 Classification using Support Vector Machines This lecture is based on the following sources, which are all recommended reading: F. Markowetz.
More informationMachine Learning (CSE 446): Concepts & the i.i.d. Supervised Learning Paradigm
Machine Learning (CSE 446): Concepts & the i.i.d. Supervised Learning Paradigm Sham M Kakade c 2018 University of Washington cse446-staff@cs.washington.edu 1 / 17 Review 1 / 17 Decision Tree: Making a
More informationExpectation Maximization: Inferring model parameters and class labels
Expectation Maximization: Inferring model parameters and class labels Emily Fox University of Washington February 27, 2017 Mixture of Gaussian recap 1 2/27/2017 Jumble of unlabeled images HISTOGRAM blue
More informationQuickView: NLP-based Tweet Search
QuickView: NLP-based Tweet Search Xiaohua Liu, Furu Wei, Ming Zhou, Microsoft QuickView Team School of Computer Science and Technology Harbin Institute of Technology, Harbin, 150001, China Microsoft Research
More informationA Taxonomy of Semi-Supervised Learning Algorithms
A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle Max Planck Institute for Biological Cybernetics December 2005 Outline 1 Introduction 2 Generative models 3 Low density separation 4 Graph
More information