Structured Perceptron with Inexact Search

1 Structured Perceptron with Inexact Search
Liang Huang, Suphan Fayong, Yang Guo
Presented by Allan, July 15, 2016

2 Table of Contents
Structured Perceptron
Exact and Inexact Search
Violation in Update
Violation
The Convergence of Structured Perceptron
Violation-Fixing Perceptron Algorithm
Non-Convergence with Inexact Search
Violation-Fixing Updates
Early Update
Beam Search with Early Update
Other update methods for Inexact Search
Experiments
POS Tagging (Exact Search Possible)
Dependency Parsing (Exact Search Intractable)
Conclusions

3 Structured Perceptron The standard structured perceptron:
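The algorithm figure for this slide is not preserved, so the following is only a minimal sketch of the standard update rule w <- w + Phi(x, y) - Phi(x, z), where z is the highest-scoring output under the current weights. The tag set `TAGS`, the feature templates in `features`, and `toy_data` are hypothetical. Brute-force enumeration stands in for exact search here; a real tagger would use dynamic programming.

```python
from collections import Counter
from itertools import product

TAGS = ("N", "V")  # hypothetical toy tag set


def features(x, y):
    """Phi(x, y): emission and transition counts for a tagged sentence."""
    phi = Counter()
    prev = "<s>"
    for word, tag in zip(x, y):
        phi[("emit", word, tag)] += 1
        phi[("trans", prev, tag)] += 1
        prev = tag
    return phi


def score(w, phi):
    """Dot product w . Phi for sparse feature counts."""
    return sum(w[f] * v for f, v in phi.items())


def exact_argmax(w, x):
    """Exact search by enumerating every tag sequence (toy inputs only)."""
    return max(product(TAGS, repeat=len(x)),
               key=lambda z: score(w, features(x, z)))


def structured_perceptron(data, epochs=10):
    w = Counter()
    for _ in range(epochs):
        updates = 0
        for x, y in data:
            z = exact_argmax(w, x)
            if z != tuple(y):
                # w <- w + Phi(x, y) - Phi(x, z)
                w.update(features(x, y))
                w.subtract(features(x, z))
                updates += 1
        if updates == 0:  # a full pass without mistakes: converged
            break
    return w


# Hypothetical training data: (sentence, gold tag sequence) pairs.
toy_data = [(("dogs", "bark"), ("N", "V")),
            (("birds", "fly"), ("N", "V"))]
w = structured_perceptron(toy_data)
```

Because the arg max is exact, every update in this sketch is made on an output that scores at least as high as the gold sequence.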

5 Exact and Inexact Search
The output space sometimes contains an exponential number of classes.
Exact search: dynamic programming (DP), which cannot use non-local features.
Inexact search: for example, greedy search or beam search.

7 Violation
Definition 1. The standard confusion set $C_s(D)$ is a set of triples $(x, y, z)$ for the training data $D = \{(x^{(t)}, y^{(t)})\}_{t=1}^{n}$:
$$C_s(D) = \{(x, y, z) \mid (x, y) \in D,\ z \in \mathcal{Y}(x) - \{y\}\}$$
Here $S = \langle D, \Phi, C \rangle$ denotes a training scenario.
Definition 3. A triple $(x, y, z)$ is a violation in $S$ if $(x, y, z) \in C_s(D)$ and $w \cdot \Delta\Phi(x, y, z) \le 0$, where $\Delta\Phi(x, y, z) = \Phi(x, y) - \Phi(x, z)$.
Each update triple $(x, y, z)$ in Algorithm 1 is a violation.

8 Violation
$w \cdot \Delta\Phi(x, y, z) \le 0$ and $z \ne y$, so $(x, y, z) \in C_s(D)$.

9 Violation in Update
This section shows that if every update is guaranteed to be made on a violation (a "valid update"), the perceptron converges regardless of whether the search is exact, or how inexact it is.

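10 Definitions Needed for Convergence Proof
Definition 2. The training scenario $S = \langle D, \Phi, C \rangle$ is linearly separable with margin $\delta > 0$ if there exists a vector $u$ with $\|u\| = 1$ such that
$$\forall (x, y, z) \in C:\quad u \cdot \Delta\Phi(x, y, z) \ge \delta$$
where $\Delta\Phi(x, y, z)$ is the feature difference between $y$ and $z$.
The maximal margin $\delta(S)$ is defined as
$$\delta(S) = \max_{\|u\| = 1}\ \min_{(x, y, z) \in C} u \cdot \Delta\Phi(x, y, z)$$
Definition 4. The diameter is $R(S) = \max_{(x, y, z) \in C} \|\Delta\Phi(x, y, z)\|$.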

11 Diameter and Max Margin
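As a rough numeric illustration of the two quantities, the sketch below computes the diameter and the margin achieved by one particular direction u on a small set of feature-difference vectors. The vectors in `delta_phis` and the chosen direction are made up for illustration. Finding the maximal margin itself requires optimising over u, which is not attempted here, so the margin computed for a fixed u is only a lower bound on the maximal margin.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def diameter(delta_phis):
    """R(S): the largest norm of any feature-difference vector in C."""
    return max(norm(d) for d in delta_phis)


def margin_of(u, delta_phis):
    """Margin achieved by direction u after normalising it to unit length."""
    n = norm(u)
    unit = [c / n for c in u]
    return min(dot(unit, d) for d in delta_phis)


# Hypothetical feature-difference vectors, one per triple (x, y, z) in C.
delta_phis = [(2.0, 0.0), (1.0, 1.0), (0.0, 2.0)]

R = diameter(delta_phis)
delta = margin_of((1.0, 1.0), delta_phis)
# Since delta is only a lower bound on the maximal margin, this ratio is
# an upper bound on R^2 / delta^2 evaluated at the maximal margin.
ratio = R ** 2 / delta ** 2
```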

12 Proof of Convergence (Standard Structured Perceptron)
Theorem 1. For a separable scenario $S = \langle D, \Phi, C_s(D) \rangle$ with $\delta(S) > 0$, the standard structured perceptron in Algorithm 1 makes a finite number of updates:
$$\mathrm{err}(S) \le \frac{R^2(S)}{\delta^2(S)}$$

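13 Proof of Convergence (Standard Structured Perceptron)
Proof. Let $w^{(0)} = 0$, let $w^{(k)}$ be the weight vector before the $k$-th update, and suppose the $k$-th update is made on the triple $(x, y, z)$. We bound $\|w^{(k+1)}\|$ from two directions.
1. Lower bound. Since $w^{(k+1)} = w^{(k)} + \Delta\Phi(x, y, z)$, taking the dot product of both sides with $u$ gives
$$u \cdot w^{(k+1)} = u \cdot w^{(k)} + u \cdot \Delta\Phi(x, y, z) \ge u \cdot w^{(k)} + \delta(S) \ge k\,\delta(S)$$
The first inequality follows from Definition 2 and the second by induction.

14 Proof of Convergence (Standard Structured Perceptron)
2. Upper bound. We have
$$\|w^{(k+1)}\|^2 = \|w^{(k)} + \Delta\Phi(x, y, z)\|^2 = \|w^{(k)}\|^2 + \|\Delta\Phi(x, y, z)\|^2 + 2\, w^{(k)} \cdot \Delta\Phi(x, y, z) \le \|w^{(k)}\|^2 + R^2(S) + 0$$
The middle term is bounded using Definition 4, $R(S) = \max_{(x, y, z) \in C} \|\Delta\Phi(x, y, z)\|$, and the last term is at most 0 because each update triple is a violation, so $w^{(k)} \cdot \Delta\Phi(x, y, z) \le 0$. By induction, $\|w^{(k+1)}\|^2 \le k R^2(S)$.
3. Combining 1 and 2. Since $\|u\| = 1$, we have $\|w^{(k+1)}\| \ge u \cdot w^{(k+1)} \ge k\,\delta(S)$. Hence $k^2 \delta^2(S) \le \|w^{(k+1)}\|^2 \le k R^2(S)$, so $k \le R^2(S)/\delta^2(S)$.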

16 Violation-Fixing Perceptron Algorithm

18 Violation-Fixing Perceptron Algorithm
Theorem 2. For a separable scenario $S$, the perceptron converges with the same update bound of $R^2(S)/\delta^2(S)$ as long as every update triple is a violation (a "valid update").
The proof is the same as for the standard structured perceptron. In its second part, the condition $w^{(k)} \cdot \Delta\Phi(x, y, z) \le 0$ holds because the violation is guaranteed.
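The algorithm figure is not preserved, so the following is a hedged sketch of the idea behind the theorem: the training loop takes a pluggable `find_violation` function and applies whatever triple it returns. All names here (`LABELS`, `phi`, `exact_finder`, `toy_data`) are hypothetical, and a plain multiclass problem is used as the simplest structured case. The assertion inside the loop checks the "valid update" condition that the convergence bound depends on.

```python
from collections import Counter

LABELS = ("A", "B")  # hypothetical toy output space


def phi(x, y):
    """Phi(x, y): each input feature conjoined with the label."""
    return Counter({(f, y): 1 for f in x})


def delta_phi(x, y, z):
    """Feature difference Phi(x, y) - Phi(x, z)."""
    d = phi(x, y)
    d.subtract(phi(x, z))
    return d


def dot(w, d):
    return sum(w[f] * v for f, v in d.items())


def exact_finder(w, x, y):
    """One possible finder: exact arg max, returning None when correct."""
    z = max(LABELS, key=lambda c: dot(w, phi(x, c)))
    return None if z == y else (x, y, z)


def violation_fixing_perceptron(data, find_violation, epochs=10):
    w = Counter()
    updates = 0
    for _ in range(epochs):
        changed = False
        for x, y in data:
            triple = find_violation(w, x, y)
            if triple is None:
                continue
            d = delta_phi(*triple)
            # A valid update must be made on a violation: w . DeltaPhi <= 0.
            assert dot(w, d) <= 0
            for f, v in d.items():
                w[f] += v
            updates += 1
            changed = True
        if not changed:
            break
    return w, updates


toy_data = [(("furry", "barks"), "A"), (("furry", "meows"), "B")]
w, n_updates = violation_fixing_perceptron(toy_data, exact_finder)
```

Swapping `exact_finder` for a finder based on inexact search keeps the loop unchanged, provided the finder only returns violations.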

20 Non-Convergence with Inexact Search
Theorem 3. If the arg max in the standard structured perceptron is not exact, the algorithm might not converge.
Greedy search: at each step, commit to the single best action (e.g. the tag for the current word) given the previous actions.
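To make the greedy search description concrete, here is a small sketch of a greedy tagger together with a hand-picked weight vector on which it misses the true arg max. The tag set, feature names, and weights are all hypothetical and chosen only to show the search error. This is not the non-convergence example from the slides.

```python
TAGS = ("N", "V")  # hypothetical toy tag set


def local_score(w, x, i, prev, tag):
    """Score of tagging word i with `tag`, given the previous tag."""
    return w.get(("emit", x[i], tag), 0.0) + w.get(("trans", prev, tag), 0.0)


def sequence_score(w, x, y):
    """Total score of a full tag sequence."""
    total, prev = 0.0, "<s>"
    for i, tag in enumerate(y):
        total += local_score(w, x, i, prev, tag)
        prev = tag
    return total


def greedy_search(w, x):
    """Commit to the single best tag at each position, left to right."""
    z, prev = [], "<s>"
    for i in range(len(x)):
        tag = max(TAGS, key=lambda t: local_score(w, x, i, prev, t))
        z.append(tag)
        prev = tag
    return tuple(z)


# Hypothetical weights: "N" looks best for the first word in isolation,
# but the V -> V transition makes ("V", "V") the best sequence overall.
w = {("emit", "a", "N"): 1.0, ("trans", "V", "V"): 5.0}
x = ("a", "b")
greedy_output = greedy_search(w, x)
```

With these weights the greedy output scores lower than ("V", "V"), so an update made on the greedy output is not guaranteed to be a violation.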

21 Non-Convergence: Example Φ(x, y) = (1, 1)

23 Early Update

24 Early Update is Violation-Fixing
Early update (Collins and Roark, 2004): update at the first wrong action.
Definition 5. The greedy confusion set $C_g(D)$ is
$$C_g(D) = \{(x, y_{[1:i]}, z_{[1:i]}) \mid (x, y, z) \in C_s(D),\ 1 \le i \le |y|,\ z_{[1:i-1]} = y_{[1:i-1]},\ z_i \ne y_i\}$$
Greedy violation: a triple $(x, y', z')$ is a greedy violation if $(x, y', z') \in C_g(D)$ and $w \cdot \Delta\Phi(x, y', z') \le 0$.

25 Early Update Convergence
For a separable scenario $S$, the early-update perceptron makes a finite number of updates: $\mathrm{err}(S) < R^2(S)/\delta^2(S)$.

26 Beam Search with Early Update
Definition 7. The beam confusion set $C_b(D)$ is the set of triples $(x, y_{[1:i]}, z_{[1:i]})$ where $y_{[1:i]}$ and $z_{[1:i]}$ differ in at least one place:
$$C_b(D) = \{(x, y_{[1:i]}, z_{[1:i]}) \mid (x, y, z) \in C_s(D),\ 1 \le i \le |y|,\ y_{[1:i]} \ne z_{[1:i]}\}$$
Beam violation: a triple $(x, y', z')$ is a beam violation if $(x, y', z') \in C_b(D)$ and $w \cdot \Delta\Phi(x, y', z') \le 0$.
Convergence: beam search with early update also makes a finite number of updates, bounded by $R^2(S)/\delta^2(S)$.

27 Beam Search with Early Update
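The figure for this slide is not preserved. The following is a minimal sketch of beam search with early update under assumed feature templates: search stops as soon as the gold prefix falls off the beam, and the update is made on the gold prefix against the best prefix in the beam at that step. `TAGS`, `features`, `toy_data`, and the function names are hypothetical. With `beam_size=1` the procedure reduces to greedy search with early update.

```python
from collections import Counter

TAGS = ("N", "V")  # hypothetical toy tag set


def features(x, y):
    """Phi for a (possibly partial) tag sequence y over sentence x."""
    phi, prev = Counter(), "<s>"
    for word, tag in zip(x, y):  # zip stops at the end of the prefix
        phi[("emit", word, tag)] += 1
        phi[("trans", prev, tag)] += 1
        prev = tag
    return phi


def score(w, x, y):
    return sum(w[f] * v for f, v in features(x, y).items())


def beam_early_update(w, x, y, beam_size=2):
    """Return (gold_prefix, predicted_prefix) to update on, or None."""
    y = tuple(y)
    beam = [()]
    for i in range(1, len(x) + 1):
        candidates = [p + (t,) for p in beam for t in TAGS]
        candidates.sort(key=lambda p: score(w, x, p), reverse=True)
        beam = candidates[:beam_size]
        if y[:i] not in beam:
            # Gold prefix fell off the beam: stop and update here.
            return y[:i], beam[0]
    if beam[0] != y:
        # Gold survived but is not on top: full-sequence update.
        return y, beam[0]
    return None


def train(data, epochs=10, beam_size=2):
    w = Counter()
    for _ in range(epochs):
        updates = 0
        for x, y in data:
            pair = beam_early_update(w, x, y, beam_size)
            if pair is not None:
                gold, pred = pair
                w.update(features(x, gold))
                w.subtract(features(x, pred))
                updates += 1
        if updates == 0:
            break
    return w


toy_data = [(("dogs", "bark"), ("N", "V")),
            (("fish", "swim"), ("N", "V"))]
w_beam = train(toy_data, beam_size=2)
w_greedy = train(toy_data, beam_size=1)
```

At the step where the gold prefix drops out, every prefix kept in the beam scores at least as high as the gold prefix, which is why the returned pair is a violation.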

28 Other Update Methods for Inexact Search
Hybrid update: if the standard update is valid, perform it; otherwise perform the early update.
Max-violation update: choose the triple that is most violated:
$$(x, y^*, z^*) = \arg\min_{(x, y', z') \in C,\ z' \in \cup_i \{B_i[0]\}} w \cdot \Delta\Phi(x, y', z')$$
Latest update: choose the latest point where the update is still a violation:
$$(x, y^*, z^*) = \arg\max_{(x, y', z') \in C,\ z' \in \cup_i \{B_i[0]\},\ w \cdot \Delta\Phi(x, y', z') \le 0} |z'|$$
Here $B_i[0]$ is the top-scoring item in the beam at step $i$.
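The max-violation choice can be sketched as follows, assuming the same kind of toy tagger as elsewhere in these notes. The beam is run to the end of the sentence, the top item of each step is compared against the gold prefix of the same length, and the step with the smallest w . DeltaPhi is returned. The tag set, feature templates, and the hand-picked weights are hypothetical and were chosen so that the most violated prefix is longer than the first wrong one.

```python
from collections import Counter

TAGS = ("N", "V")  # hypothetical toy tag set


def features(x, y):
    """Phi for a (possibly partial) tag sequence y over sentence x."""
    phi, prev = Counter(), "<s>"
    for word, tag in zip(x, y):
        phi[("emit", word, tag)] += 1
        phi[("trans", prev, tag)] += 1
        prev = tag
    return phi


def score(w, x, y):
    return sum(w[f] * v for f, v in features(x, y).items())


def max_violation_pair(w, x, y, beam_size=2):
    """Return the most violated (gold_prefix, top_prefix), or None."""
    y = tuple(y)
    beam = [()]
    best = None  # (w . DeltaPhi, gold prefix, predicted prefix)
    for i in range(1, len(x) + 1):
        candidates = sorted((p + (t,) for p in beam for t in TAGS),
                            key=lambda p: score(w, x, p), reverse=True)
        beam = candidates[:beam_size]
        top = beam[0]
        if top != y[:i]:
            amount = score(w, x, y[:i]) - score(w, x, top)
            if best is None or amount < best[0]:
                best = (amount, y[:i], top)
    if beam[0] == y:
        return None  # final output is correct: no update
    return best[1], best[2]


# Hypothetical weights and sentence for illustration.
w = Counter({("emit", "a", "V"): 1, ("trans", "V", "V"): 3,
             ("emit", "c", "N"): 10, ("trans", "N", "N"): 1})
x, y = ("a", "b", "c"), ("N", "N", "N")
pair = max_violation_pair(w, x, y, beam_size=1)
```

In this example the first wrong action is at position 1, which is where early update would stop, but the largest violation occurs at prefix length 2, so the update uses more of the sentence.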

30 Experiments - POS Tagging

32 Experiments - Dependency Results

33 % of invalid updates for standard update

34 Conclusions
A unifying framework that guarantees convergence with inexact search.
The theory explains why early update works.
Several variants of the update method are proposed that lead to better results.

On Structured Perceptron with Inexact Search, NAACL 2012
