K-best Parsing Algorithms


1 K-best Parsing Algorithms Liang Huang University of Pennsylvania joint work with David Chiang (USC Information Sciences Institute)

2 k-best Parsing Liang Huang (Penn) k-best parsing 2

3 k-best Parsing I saw a boy with a telescope. Liang Huang (Penn) k-best parsing 2

4 k-best Parsing (figure: the 1-best parse tree) I saw a boy with a telescope. Liang Huang (Penn) k-best parsing 2

5 k-best Parsing (figure: the 1-best and 2nd-best parse trees, which differ in where the PP "with a telescope" attaches) I saw a boy with a telescope. Liang Huang (Penn) k-best parsing 2

6 Not a trivial task Liang Huang (Penn) k-best parsing 3

7 Not a trivial task Aravind Joshi Liang Huang (Penn) k-best parsing 3

8 Not a trivial task I saw her duck. Aravind Joshi Liang Huang (Penn) k-best parsing 3


11 I saw her duck. Liang Huang (Penn) k-best parsing 4

12 I saw her duck. how about I saw her duck with a telescope. Liang Huang (Penn) k-best parsing 4

13 I saw her duck with a telescope. Liang Huang (Penn) k-best parsing 5


16 Another Example I eat sushi with tuna. Aravind Joshi Liang Huang (Penn) k-best parsing 6


19 Or even... I eat sushi with tuna. Aravind Joshi Liang Huang (Penn) k-best parsing 7

20 Why k-best? postpone disambiguation in a pipeline: 1-best is not always optimal in the future, so propagate k-best lists instead of 1-best (e.g., a semantic role labeler uses k-best parses); approximate the set of all possible interpretations: reranking (Collins, 2000), minimum error rate training (Och, 2003), online training (McDonald et al., 2005). pipeline: part-of-speech tagging (lattice) => syntactic parsing (k-best parses) => semantic interpretation Liang Huang (Penn) k-best parsing 8

21 k-best Viterbi Algorithm? 1. topological sort; 2. visit each vertex v in sorted order and do updates: for each incoming edge (u, v) in E, use d(u) to update d(v): d(v) ⊕= d(u) ⊗ w(u, v); key observation: d(u) is fixed to optimal at this time; 3. time complexity: O(|V| + |E|). naive k-best: O(k|E|); good k-best: O(|E| + k log k |V|) Liang Huang 9 Dynamic Programming
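As a concrete companion to the graph case, here is a minimal sketch of the 1-best Viterbi update above in Python, taking ⊕ = max and ⊗ = × as for probabilities; the function name and the edge encoding are illustrative assumptions, not from the talk.

```python
# A minimal sketch of 1-best Viterbi on a weighted DAG (assumed encoding).
from collections import defaultdict

def viterbi_dag(vertices, incoming, source):
    """vertices: listed in topological order; incoming[v]: list of (u, w)
    pairs for edges (u, v) with probability weight w."""
    d = defaultdict(float)            # best score so far; 0.0 = unreachable
    d[source] = 1.0
    for v in vertices:                # visit each vertex in sorted order
        for u, w in incoming.get(v, []):
            d[v] = max(d[v], d[u] * w)   # d(u) is already optimal here
    return d
```

The k-best variants below generalize exactly this update: keep a length-k vector at each vertex instead of a single d(v).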

26 Parsing as Deduction Parsing with context-free grammars (CFGs); Dynamic Programming (the CKY algorithm). deduction step: (B, i, j) (C, j+1, k) ⊢ (A, i, k), given A → B C; e.g., (NP, 1, 3) (VP, 4, 6) ⊢ (S, 1, 6), given S → NP VP. computational complexity: O(n³ |P|), where P is the set of productions (rules) Liang Huang (Penn) k-best parsing 10
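To make the deduction step and the O(n³ |P|) bound concrete, here is a hedged sketch of Viterbi CKY for a PCFG in Chomsky normal form; the `binary` and `lexical` rule encodings are assumptions of this sketch, not the talk's.

```python
# A minimal Viterbi-CKY sketch. binary: (B, C) -> [(A, prob), ...];
# lexical: word -> [(A, prob), ...]. Spans use 1-based word positions.
from collections import defaultdict

def cky(words, binary, lexical):
    n = len(words)
    chart = defaultdict(float)        # (A, i, k) -> best inside probability
    for i, word in enumerate(words, 1):
        for A, p in lexical.get(word, []):
            chart[A, i, i] = max(chart[A, i, i], p)
    for span in range(2, n + 1):      # increasing span width
        for i in range(1, n - span + 2):
            k = i + span - 1
            for j in range(i, k):     # split: (B, i, j) + (C, j+1, k)
                for (B, C), rules in binary.items():
                    pq = chart[B, i, j] * chart[C, j + 1, k]
                    if pq > 0.0:
                        for A, p in rules:   # A -> B C with probability p
                            chart[A, i, k] = max(chart[A, i, k], pq * p)
    return chart                      # best parse score: chart['S', 1, n]
```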

30 Deduction => Hypergraph a hypergraph is a generalization of a graph: each hyperedge connects several vertices to one vertex (figure: vertices such as (VB, 3, 3), (PP, 4, 6), (NP, 1, 3), (VP, 4, 6), and (S, 1, 6), with hyperedges leading into single vertices) Liang Huang (Penn) k-best parsing 11
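A minimal sketch of this hypergraph representation; the class and field names (`Hyperedge`, `tails`, `incoming`) are illustrative choices rather than the talk's notation.

```python
# Each hyperedge has one head vertex, several tail vertices, and a
# (monotonic) weight function over the tails' scores.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class Hyperedge:
    head: str                          # e.g. "(S, 1, 6)"
    tails: Tuple[str, ...]             # e.g. ("(NP, 1, 3)", "(VP, 4, 6)")
    f: Callable[..., float]            # weight function along the edge

@dataclass
class Hypergraph:
    incoming: Dict[str, List[Hyperedge]] = field(default_factory=dict)

    def add_edge(self, e: Hyperedge) -> None:
        self.incoming.setdefault(e.head, []).append(e)
```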

33 Packed Forest as Hypergraph I saw a boy with a telescope; packed forest: a compact representation of all parse trees (figure: one parse tree of the sentence drawn as part of the forest) Liang Huang (Penn) k-best parsing 12

34 Packed Forest as Hypergraph I saw a boy with a telescope; packed forest: a compact representation of all parse trees (figure: the packed forest sharing sub-trees across the alternative PP attachments) Liang Huang (Penn) k-best parsing 13

36 Weighted Deduction/Hypergraph (B, i, j): p (C, j+1, k): q ⊢ (A, i, k): f(p, q), given A → B C; f is the weight function, e.g., in Probabilistic Context-Free Grammars: f(p, q) = p · q · Pr(A → B C) Liang Huang (Penn) k-best parsing 14

37 Monotonic Weight Functions all weight functions must be monotonic in each of their arguments (the optimal sub-problem property in dynamic programming): if b′ is no better than b, then f(b′, c) is no better than f(b, c). CKY example: A = (S, 1, 5), B = (NP, 1, 2), C = (VP, 3, 5); f(b, c) = b · c · Pr(S → NP VP) Liang Huang (Penn) k-best parsing 15

39 k-best Problem in Hypergraphs 1-best problem: find the best derivation of the target vertex t. k-best problem: find the top k derivations of the target vertex t. assumptions: a single target vertex t (in CKY, t = (S, 1, n)); acyclic, so that we can use topological order Liang Huang (Penn) k-best parsing 16

40 Outline Formulations Algorithms Generic 1-best Viterbi Algorithm Algorithm 0: naïve Algorithm 1: hyperedge-level Algorithm 2: vertex (item)-level Algorithm 3: lazy algorithm Experiments Applications to Machine Translation Liang Huang (Penn) k-best parsing 17

41 Generic 1-best Viterbi Algorithm traverse the hypergraph in topological order; for each vertex (bottom-up), for each incoming hyperedge, compute the result of the f function along the hyperedge and update the 1-best value for the current vertex if possible: with hyperedge f1 over tails u: a, w: b (e.g., u = (VP, 2, 4), w = (PP, 4, 7), v = (VP, 2, 7)) and hyperedge f2 over tails u′: c, w′: d, v: better(better(f1(a, b), f2(c, d)), ...). overall time complexity: O(|E|); in CKY: |E| = O(n³|P|) Liang Huang (Penn) k-best parsing 18
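The generic algorithm can be sketched over the Hypergraph structure above as follows; `topo` (vertices bottom-up) and `axioms` (scores of the leaf vertices) are assumed inputs, and "better" is taken to be max, as for probabilities.

```python
# A sketch of generic 1-best Viterbi over an acyclic hypergraph.
def viterbi_hypergraph(hg, topo, axioms):
    d = dict(axioms)                       # leaf vertices get axiom scores
    for v in topo:                         # topological (bottom-up) order
        for e in hg.incoming.get(v, []):
            score = e.f(*(d[u] for u in e.tails))   # f along the hyperedge
            if v not in d or score > d[v]:          # 'better' = max here
                d[v] = score
    return d                               # O(|E|) evaluations of f overall
```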

47 Dynamic Programming: the 1950s Richard Bellman, Andrew Viterbi: "We knew everything so far in your talk 40 years ago" Liang Huang (Penn) k-best parsing 21

49 k-best Viterbi Algorithm 0: naïve straightforward k-best extension: a vector of length k instead of a single value; vector components are maintained sorted. now what is f(a, b)? k² values -- the Cartesian product f(a_i, b_j); we just need the top k out of the k² values: mult_k(f, a, b) = top_k { f(a_i, b_j) }, so hyperedge f1 yields v: mult_k(f1, a, b) Liang Huang (Penn) k-best parsing 22

51 Algorithm 0: naïve and how to update? from two length-k vectors (2k elements), select the top k elements: O(k). with two hyperedges, v: merge_k(mult_k(f1, a, b), mult_k(f2, c, d)). overall time complexity: O(k² |E|) Liang Huang (Penn) k-best parsing 23
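A sketch of Algorithm 0's two primitives as just described; the names mult_k and merge_k follow the slides, while the Python encoding (sorted lists of scores) is an assumption.

```python
# Algorithm 0, naively: enumerate all k^2 combinations per hyperedge,
# then keep the top k across hyperedges.
import heapq

def mult_k(f, a, b, k):
    # Cartesian product of the two k-best vectors, then top k.
    return heapq.nlargest(k, (f(x, y) for x in a for y in b))

def merge_k(vectors, k):
    # Select the top k elements from the concatenated vectors.
    return heapq.nlargest(k, (s for vec in vectors for s in vec))

# per vertex: merge_k([mult_k(e.f, d[u], d[w], k) for each hyperedge e], k)
```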

53 Algorithm 1: speedup mult_k mult_k(f, a, b) = top_k { f(a_i, b_j) }. only interested in the top k, so why enumerate all k²? a and b are sorted! f is monotonic! so f(a_1, b_1) must be the 1-best; the 2nd-best must be either f(a_2, b_1) or f(a_1, b_2); what about the 3rd-best? Liang Huang (Penn) k-best parsing 24

60 Algorithm 1 (Demo) f(a, b) = a · b (figure: the grid of products f(a_i, b_j) over the sorted vectors a and b, explored cell by cell from the best corner) Liang Huang (Penn) k-best parsing 25

67 Algorithm 1 (Demo) use a priority queue (heap) to store the candidates (the frontier); in each iteration: 1. extract-max from the heap; 2. push the two shoulders into the heap. k iterations; O(k log k |E|) overall time Liang Huang (Penn) k-best parsing 27
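The heap-driven demo corresponds to a lazy mult_k for a single hyperedge, sketched below under the stated assumptions: `a` and `b` sorted best-first, `f` monotonic; the function name is illustrative.

```python
# Lazy mult_k (Algorithm 1): explore the grid frontier with a heap
# instead of enumerating all k^2 combinations.
import heapq

def mult_k_lazy(f, a, b, k):
    results, seen = [], {(0, 0)}
    heap = [(-f(a[0], b[0]), 0, 0)]        # max-heap via negated scores
    while heap and len(results) < k:
        neg, i, j = heapq.heappop(heap)    # 1. extract-max from the heap
        results.append(-neg)
        for i2, j2 in ((i + 1, j), (i, j + 1)):   # 2. push the two shoulders
            if i2 < len(a) and j2 < len(b) and (i2, j2) not in seen:
                seen.add((i2, j2))
                heapq.heappush(heap, (-f(a[i2], b[j2]), i2, j2))
    return results                          # k iterations, O(log k) each
```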

71 Algorithm 2: speedup merge_k Algorithm 1 works on each hyperedge sequentially; can we process them simultaneously? (figure: d hyperedges f_1, ..., f_i, ..., f_d, with tails such as u: a, w: b and p: x, q: y, all leading into the same vertex v) Liang Huang (Penn) k-best parsing 28

72 Algorithm 2 (Demo) starts with an initial heap of the 1-best derivations from each hyperedge (figure: an item-level heap at vertex v over the grids B_1 × C_1, B_2 × C_2, B_3 × C_3; k = 2, d = 3) Liang Huang (Penn) k-best parsing 29

73 Algorithm 2 (Demo) pop the best (.42) and push the two successors (.07 and .24) Liang Huang (Penn) k-best parsing 31

75 Algorithm 2 (Demo) pop the 2nd-best (.36) (figure: output so far: .42, .36) Liang Huang (Penn) k-best parsing 32
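A sketch of the item-level version just demonstrated: one heap per vertex, seeded with each incoming hyperedge's best combination, so all d hyperedges are processed simultaneously. The `(f, u, w)` encoding of binary hyperedges and the `dvecs` tail vectors are assumptions of this sketch.

```python
# Algorithm 2 sketch: k-best at one vertex via an item-level heap.
import heapq

def kbest_vertex(edges, dvecs, k):
    """edges: list of (f, u, w) binary hyperedges into this vertex;
    dvecs[u]: the sorted k-best vector of tail vertex u."""
    heap, seen, out = [], set(), []
    for eid, (f, u, w) in enumerate(edges):     # seed: each edge's 1-best
        heap.append((-f(dvecs[u][0], dvecs[w][0]), eid, 0, 0))
        seen.add((eid, 0, 0))
    heapq.heapify(heap)
    while heap and len(out) < k:
        neg, eid, i, j = heapq.heappop(heap)
        out.append(-neg)
        f, u, w = edges[eid]
        for i2, j2 in ((i + 1, j), (i, j + 1)): # successors in the same grid
            if (i2 < len(dvecs[u]) and j2 < len(dvecs[w])
                    and (eid, i2, j2) not in seen):
                seen.add((eid, i2, j2))
                heapq.heappush(heap, (-f(dvecs[u][i2], dvecs[w][j2]), eid, i2, j2))
    return out
```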

76 Algorithm 3: Offline (lazy) from Algorithm 0 to Algorithm 2: delaying the calculations until needed -- lazier, larger locality. even lazier (one step further): we are interested in the k-best derivations of the final item only! Liang Huang (Penn) k-best parsing 33

77 Algorithm 3: Offline (lazy) forward phase: do a normal 1-best search till the final item, but construct the hypergraph (forest) along the way. recursive backward phase: ask the final item: what's your 2nd-best? the final item will propagate this question till the leaves; then ask the final item: what's your 3rd-best? Liang Huang (Penn) k-best parsing 34

78 Algorithm 3 demo after the forward step (1-best parsing): forest = 1-best derivations from each hyperedge (figure: three hyperedges into S (1, 7): NP (1, 2) + VP (3, 7), NP (1, 3) + VP (4, 7), and VP (1, 5) + NP (6, 7); the 1-best derivation scores .42; k = 2) Liang Huang (Penn) k-best parsing 35

79 Algorithm 3 demo now the backward step: what's your 2nd-best? (asked of S (1, 7)) Liang Huang (Penn) k-best parsing 36

80 Algorithm 3 demo I'm not sure... let me ask my parents Liang Huang (Penn) k-best parsing 37

81 Algorithm 3 demo well, it must be either... or... (figure: the successor derivations of the 1-best along each hyperedge) Liang Huang (Penn) k-best parsing 38

82 Algorithm 3 demo but wait a minute, did you already know the ?'s? oops... forgot to ask more questions recursively Liang Huang (Penn) k-best parsing 39

84 Algorithm 3 demo what's your 2nd-best? (the question is now passed to the tail vertices) Liang Huang (Penn) k-best parsing 40

85 Algorithm 3 demo recursion goes on to the leaf nodes Liang Huang (Penn) k-best parsing 41

86 Algorithm 3 demo and reports back the numbers Liang Huang (Penn) k-best parsing 42

87 Algorithm 3 demo push .30 and .21 to the candidate heap (priority queue) Liang Huang (Penn) k-best parsing 43

88 Algorithm 3 demo pop the root of the heap (.30): "now I know my 2nd-best" S (1, 7) Liang Huang (Penn) k-best parsing 44
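The question-asking in this demo is a recursive "give me your j-th best" routine. Below is a hedged sketch of that lazy backward phase; it assumes the forward phase has pre-filled each vertex's 1-best entry and the initial candidate heaps, and it tracks scores only (no back-pointers), so all names are illustrative.

```python
# Algorithm 3 sketch: lazy, recursive k-best over the 1-best forest.
import heapq

def lazy_kbest(v, j, forest, kbest, cand, seen):
    """Return the j-th best (0-indexed) score of vertex v, or None.
    forest[v]: list of (f, tails) hyperedges (empty for leaves);
    kbest[v]: scores found so far, best first (1-best pre-filled);
    cand[v]: heap of (-score, edge_id, ranks), seeded per hyperedge
    with the all-zeros rank vector; seen[v]: candidates pushed so far."""
    while len(kbest[v]) <= j and cand[v]:
        neg, eid, ranks = heapq.heappop(cand[v])
        kbest[v].append(-neg)                 # next-best derivation of v
        f, tails = forest[v][eid]
        for t, u in enumerate(tails):         # successors: bump one rank
            r2 = ranks[:t] + (ranks[t] + 1,) + ranks[t + 1:]
            if (eid, r2) in seen[v]:
                continue
            # ask tail u for its deeper derivation, only now that we need it
            if lazy_kbest(u, r2[t], forest, kbest, cand, seen) is None:
                continue                      # u has no such derivation
            seen[v].add((eid, r2))
            score = f(*(kbest[tails[i]][r2[i]] for i in range(len(tails))))
            heapq.heappush(cand[v], (-score, eid, r2))
    return kbest[v][j] if j < len(kbest[v]) else None
```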

89 Interesting Properties 1-best is best everywhere (all decisions optimal). local picture: the 2nd-best is optimal everywhere except one decision; that decision must be 2nd-best, and it's the best of all 2nd-best decisions. so what about the 3rd-best? the kth-best is... Liang Huang (Penn) k-best parsing 45

90 Summary of Algorithms

Algorithms      Time Complexity         Locality
1-best          O(|E|)                  hyperedge
alg. 0: naïve   O(k^a |E|)              hyperedge (mult_k)
alg. 1          O(k log k |E|)          hyperedge (mult_k)
alg. 2          O(|E| + |V| k log k)    item (merge_k)
alg. 3: lazy    O(|E| + D k log k)      global

for CKY: a = 2, |E| = O(n³|P|), |V| = O(n²|N|), D = O(n); a is the arity of the grammar Liang Huang (Penn) k-best parsing 46

91 Outline Formulations; Algorithms: Alg. 0 through Alg. 3; Experiments: Collins/Bikel Parser, both efficiency and accuracy; Applications in Machine Translation Liang Huang (Penn) k-best parsing 47

92 Background: Statistical Parsing Probabilistic Grammar: induced from a treebank (the Penn Treebank). State-of-the-art Parsers: Collins (1999), Bikel (2004), Charniak (2000), etc. Evaluation of Accuracy: PARSEVAL tree-similarity (English treebank: ~90%). Previous work on k-best Parsing: Collins (2000): turn off dynamic programming; Charniak/Johnson (2005): coarse-to-fine, still too slow Liang Huang (Penn) k-best parsing 48

93 Efficiency Implemented Algorithms 0, 1, 3 on top of the Collins/Bikel Parser (figure: average wall-clock time per sentence on the Penn Treebank vs. k; the curves include the O(|E| + D k log k) lazy algorithm) Liang Huang (Penn) k-best parsing 49

94 Oracle Reranking given k parses of a sentence: oracle reranking picks the best parse according to the gold standard; real reranking picks the best parse according to the score function (figure: accuracy of the gold-standard correct parse (100%), of the oracle over the k-best parses, and of the 1-best (89%)) Liang Huang (Penn) k-best parsing 50
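Oracle reranking as defined here is straightforward to sketch: pick the parse in the k-best list that scores highest against the gold-standard tree. The PARSEVAL-style bracket F1 below, with parses encoded as sets of labeled spans, is an assumption of this sketch.

```python
# A toy oracle-reranking sketch over a k-best list of parses.
def bracket_f1(parse, gold):
    """parse, gold: sets of labeled spans, e.g. ('NP', 1, 3)."""
    p, g = set(parse), set(gold)
    hits = len(p & g)
    if not hits:
        return 0.0
    prec, rec = hits / len(p), hits / len(g)
    return 2 * prec * rec / (prec + rec)

def oracle_parse(kbest, gold):
    # The oracle picks by gold similarity; a real reranker would use
    # a learned score function instead.
    return max(kbest, key=lambda parse: bracket_f1(parse, gold))
```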

97 Quality of the k-best lists (figure: oracle score as a function of k; this work, on top of the Collins parser, vs. Collins (2000)) Liang Huang (Penn) k-best parsing 51

99 Why are our k-best lists better? (figure: average number of parses for sentences of a given length, this work vs. Collins (2000); one marked region is 7% of the test set) Collins (2000) turns off dynamic programming: theoretically exponential time complexity, with aggressive beam pruning to make it tractable in practice Liang Huang (Penn) k-best parsing 52

101 k-best Viterbi Algorithm? 1. topological sort; 2. visit each vertex v in sorted order and do updates: for each incoming edge (u, v) in E, use d(u) to update d(v): d(v) ⊕= d(u) ⊗ w(u, v); key observation: d(u) is fixed to optimal at this time; 3. time complexity: O(|V| + |E|). Algorithm 0: O(k|E|); Algorithm 2: O(|E| + k log k |V|); Algorithm 3: O(|E| + k log k D) Liang Huang 53 Dynamic Programming

107 Conclusions monotonic hypergraph formulation: the k-best derivations problem. k-best algorithms: Algorithm 0 (naïve) to Algorithm 3 (lazy). experimental results: efficiency; accuracy (effectively searching over a larger space). applications in machine translation: k-best rescoring and forest rescoring Liang Huang (Penn) Applications in MT 54

108 Implemented in... state-of-the-art statistical parsers: Charniak parser (2005); Berkeley parser (2006); McDonald et al. dependency parser (2005); Microsoft Research (Redmond) dependency parser (2006). generic dynamic programming languages/packages: Dyna (Eisner et al., 2005) and Tiburon (May and Knight, 2006). state-of-the-art syntax-based translation systems: Hiero (Chiang, 2005); ISI syntax-based system (2005); CMU syntax-based system (2006); BBN syntax-based system (2007) Liang Huang (Penn) k-best parsing 55

109 Applications in Machine Translation

110 Syntax-based Translation synchronous context-free grammars (SCFGs): generating pairs of strings/trees simultaneously; co-indexed nonterminals are further rewritten as a unit. S → 〈NP(1) VP(2), NP(1) VP(2)〉; VP → 〈PP(1) VP(2), VP(2) PP(1)〉; NP → 〈Baoweier, Powell〉; ... (figure: paired derivation trees for "Baoweier yu Shalong juxing le huitan" and "Powell held a meeting with Sharon") Liang Huang (Penn) Applications in MT 57

111 Translation as Parsing translation ("decoding") => monolingual parsing: parse the source input with the source projection; build the corresponding target sub-strings in parallel Liang Huang (Penn) Applications in MT 58

112 Translation as Parsing translation ("decoding") => monolingual parsing: parse the source input with the source projection; build the corresponding target sub-strings in parallel. S → 〈NP(1) VP(2), NP(1) VP(2)〉; VP → 〈PP(1) VP(2), VP(2) PP(1)〉; NP → 〈Baoweier, Powell〉; ... input: Baoweier yu Shalong juxing le huitan Liang Huang (Penn) Applications in MT 58

114 Translation as Parsing (figure: NP, PP, and VP constituents are built over the source sentence and combined bottom-up, yielding target sub-strings such as "held a talk with Sharon" in parallel) Liang Huang (Penn) Applications in MT 58

120 Language Model: Rescoring (figure: the noisy-channel pipeline: Spanish/English bilingual text => statistical analysis => translation model (TM, competency; here a synchronous CFG); English text => statistical analysis => language model (LM, fluency; an n-gram LM). the TM proposes candidates for "Que hambre tengo yo" -- "What hunger have I", "Hungry I am so", "Have I that hunger", "How hunger have I", ... -- and k-best rescoring with the LM selects "I am so hungry") Liang Huang (Penn) Applications in MT 59
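A toy sketch of the k-best rescoring step in this pipeline: the TM decoder emits k candidates with model scores, and an n-gram LM rescores them in log space; `lm_logprob` and the interpolation weight are illustrative assumptions.

```python
# k-best rescoring sketch: combine TM and LM scores over a k-best list.
def rescore(kbest, lm_logprob, lm_weight=0.5):
    """kbest: list of (translation_tokens, tm_logprob) candidates;
    lm_logprob: function scoring a token sequence under an n-gram LM."""
    def total(cand):
        tokens, tm_score = cand
        return tm_score + lm_weight * lm_logprob(tokens)
    return max(kbest, key=total)
```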

125 k-best rescoring results The ISI syntax-based translation system: currently the best performing system on the Chinese-to-English task in NIST evaluations; based on synchronous grammars. BLEU scores: translation model (TM) only vs. rescoring with a trigram LM on the k-best list Liang Huang (Penn) Applications in MT 60

126 Language Model: Integration (figure: the same pipeline, but with a single integrated (LM-integrated) decoder translating "Que hambre tengo yo" directly into "I am so hungry") computationally challenging! (Knight and Koehn, 2003) Liang Huang (Penn) Applications in MT 61

128 Integrated Decoding Results The ISI syntax-based translation system: currently the best performing system on the Chinese-to-English task in NIST evaluations; based on synchronous grammars. BLEU scores: translation model (TM) only vs. rescoring with a trigram LM on the k-best list vs. trigram integrated decoding -- but over 100 times slower! see my second talk for details! Liang Huang (Penn) Applications in MT 62

131 Rescoring vs. Integration (figure: quality vs. speed; rescoring is fast but poor, integration is good but slow) any compromise? (on-the-fly rescoring?) Yes, forest rescoring -- almost as fast as rescoring, and almost as good as integration Liang Huang (Penn) Applications in MT 63

135 Why Integration is Slow? split each node into +LM items (with boundary words); beam search: only keep the top-k +LM items at each node; but there are many ways to derive each node; can we avoid enumerating all combinations? (figure: hyperedges PP(1,3) + VP(3,6), PP(1,4) + VP(4,6), and NP(1,4) + VP(4,6) into VP(1,6), each tail carrying +LM items such as "with ... Sharon", "held ... talk", "hold ... Shalong") Liang Huang (Penn) Applications in MT 64

137 Forest Rescoring k-best parsing Algorithm 2: process all hyperedges simultaneously! significant savings of computation; with LM costs, we can only do k-best approximately (figure: the hyperedges PP(1,3) + VP(3,6), PP(1,4) + VP(4,6), NP(1,4) + VP(4,6) into the VP) Liang Huang (Penn) Applications in MT 65

138 Forest Rescoring Results on the Hiero system (Chiang, 2005): ~10-fold speed-up at the same level of BLEU. on my syntax-directed system (Huang et al., 2006): ~10-fold speed-up at the same level of search error. on a typical phrase-based system (Pharaoh): ~30-fold speed-up at the same level of search error; ~100-fold speed-up at the same level of BLEU. also used in NIST evals by the ISI, CMU, and BBN syntax-based systems; see my ACL '07 paper for details Liang Huang (Penn) Applications in MT 66

139 Conclusions monotonic hypergraph formulation: the k-best derivations problem. k-best algorithms: Algorithm 0 (naïve) to Algorithm 3 (lazy). experimental results: efficiency; accuracy (effectively searching over a larger space). applications in machine translation: k-best rescoring and forest rescoring Liang Huang (Penn) Applications in MT 67

140 Thank you!! Questions? Comments?

142 Thank you!! Questions? Comments? Liang Huang and David Chiang (2005). Better k-best Parsing. In Proceedings of IWPT, Vancouver, B.C. Liang Huang and David Chiang (2007). Forest Rescoring: Faster Decoding with Integrated Language Models. In Proceedings of ACL, Prague, Czech Rep.

143 Quality of the k-best lists (figure: oracle scores for the Collins (2000) k-best lists) Liang Huang (Penn) k-best parsing 69

144 Syntax-based Experiments

145 Tree-to-String System syntax-directed, English to Chinese (Huang, Knight, Joshi, 2006); the reverse direction is found in (Liu et al., 2006). synchronous tree-substitution grammars (STSG) (Galley et al., 2004; Eisner, 2003), related to STAG (Shieber/Schabes, 90). (figure: an STSG rule pairing the English VP "was shot to death by the police" with its Chinese counterpart using "bei") tested on 140 sentences; slightly better BLEU scores than Pharaoh Liang Huang (Penn) Applications in MT 71

146 Speed vs. Search Quality Liang Huang (Penn) Applications in MT 72

147 Speed vs. Translation Accuracy Liang Huang (Penn) Applications in MT 73

148 Cube Pruning a monotonic grid? (figure: the grid for hyperedge PP(1,3) + VP(3,6), with +LM items (PP "with ... Sharon", 1,3), (PP "along ... Sharon", 1,3), (PP "with ... Shalong", 1,3) against (VP "held ... meeting", 3,6), (VP "held ... talk", 3,6), (VP "hold ... conference", 3,6)) Liang Huang (Penn) Applications in MT 74

149 Cube Pruning the grid is non-monotonic due to LM combination costs, e.g. the bigram (meeting, with) Liang Huang (Penn) Applications in MT 75

152 Cube Pruning k-best parsing Algorithm 1: a priority queue of candidates; extract the best candidate, then push the two successors Liang Huang (Penn) Applications in MT 77

155 Cube Pruning items are popped out-of-order; solution: keep a buffer of pop-ups, then finally re-sort the buffer and return it in order Liang Huang (Penn) Applications in MT 80
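Putting the demo together, here is a hedged cube-pruning sketch for a single hyperedge: Algorithm 1's frontier exploration plus the pop-up buffer that is re-sorted at the end, since LM combination costs make the grid non-monotonic; `combine` and the item encoding are assumptions of this sketch.

```python
# Cube pruning sketch for one hyperedge: approximate k-best with +LM costs.
import heapq

def cube_prune(left_items, right_items, combine, k):
    """left_items, right_items: +LM items sorted best-first by their own
    scores; combine(x, y): score of the combination, including the LM
    cost across the boundary words (may break monotonicity)."""
    heap = [(-combine(left_items[0], right_items[0]), 0, 0)]
    seen, buf = {(0, 0)}, []
    while heap and len(buf) < k:
        neg, i, j = heapq.heappop(heap)
        buf.append((-neg, i, j))               # pop-ups may be out of order
        for i2, j2 in ((i + 1, j), (i, j + 1)):
            if (i2 < len(left_items) and j2 < len(right_items)
                    and (i2, j2) not in seen):
                seen.add((i2, j2))
                heapq.heappush(heap, (-combine(left_items[i2], right_items[j2]), i2, j2))
    buf.sort(key=lambda t: -t[0])              # finally re-sort the buffer
    return buf
```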


More information

Ling/CSE 472: Introduction to Computational Linguistics. 5/4/17 Parsing

Ling/CSE 472: Introduction to Computational Linguistics. 5/4/17 Parsing Ling/CSE 472: Introduction to Computational Linguistics 5/4/17 Parsing Reminders Revised project plan due tomorrow Assignment 4 is available Overview Syntax v. parsing Earley CKY (briefly) Chart parsing

More information

Parsing in Parallel on Multiple Cores and GPUs

Parsing in Parallel on Multiple Cores and GPUs 1/28 Parsing in Parallel on Multiple Cores and GPUs Mark Johnson Centre for Language Sciences and Department of Computing Macquarie University ALTA workshop December 2011 Why parse in parallel? 2/28 The

More information

Discriminative Training for Phrase-Based Machine Translation

Discriminative Training for Phrase-Based Machine Translation Discriminative Training for Phrase-Based Machine Translation Abhishek Arun 19 April 2007 Overview 1 Evolution from generative to discriminative models Discriminative training Model Learning schemes Featured

More information

Optimal Incremental Parsing via Best-First Dynamic Programming

Optimal Incremental Parsing via Best-First Dynamic Programming Optimal Incremental Parsing via Best-First Dynamic Programming Kai Zhao 1 James Cross 1 1 Graduate Center City University of New York 365 Fifth Avenue, New York, NY 10016 {kzhao,jcross}@gc.cuny.edu Liang

More information

CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers

CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers These notes are not yet complete. 1 Interprocedural analysis Some analyses are not sufficiently

More information

Learning Latent Linguistic Structure to Optimize End Tasks. David A. Smith with Jason Naradowsky and Xiaoye Tiger Wu

Learning Latent Linguistic Structure to Optimize End Tasks. David A. Smith with Jason Naradowsky and Xiaoye Tiger Wu Learning Latent Linguistic Structure to Optimize End Tasks David A. Smith with Jason Naradowsky and Xiaoye Tiger Wu 12 October 2012 Learning Latent Linguistic Structure to Optimize End Tasks David A. Smith

More information

I Know Your Name: Named Entity Recognition and Structural Parsing

I Know Your Name: Named Entity Recognition and Structural Parsing I Know Your Name: Named Entity Recognition and Structural Parsing David Philipson and Nikil Viswanathan {pdavid2, nikil}@stanford.edu CS224N Fall 2011 Introduction In this project, we explore a Maximum

More information

Graph-Based Parsing. Miguel Ballesteros. Algorithms for NLP Course. 7-11

Graph-Based Parsing. Miguel Ballesteros. Algorithms for NLP Course. 7-11 Graph-Based Parsing Miguel Ballesteros Algorithms for NLP Course. 7-11 By using some Joakim Nivre's materials from Uppsala University and Jason Eisner's material from Johns Hopkins University. Outline

More information

Probabilistic parsing with a wide variety of features

Probabilistic parsing with a wide variety of features Probabilistic parsing with a wide variety of features Mark Johnson Brown University IJCNLP, March 2004 Joint work with Eugene Charniak (Brown) and Michael Collins (MIT) upported by NF grants LI 9720368

More information

Natural Language Dependency Parsing. SPFLODD October 13, 2011

Natural Language Dependency Parsing. SPFLODD October 13, 2011 Natural Language Dependency Parsing SPFLODD October 13, 2011 Quick Review from Tuesday What are CFGs? PCFGs? Weighted CFGs? StaKsKcal parsing using the Penn Treebank What it looks like How to build a simple

More information

A Quick Guide to MaltParser Optimization

A Quick Guide to MaltParser Optimization A Quick Guide to MaltParser Optimization Joakim Nivre Johan Hall 1 Introduction MaltParser is a system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data

More information

Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training

Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training Xinyan Xiao, Yang Liu, Qun Liu, and Shouxun Lin Key Laboratory of Intelligent Information Processing Institute of Computing

More information

Syntax and Grammars 1 / 21

Syntax and Grammars 1 / 21 Syntax and Grammars 1 / 21 Outline What is a language? Abstract syntax and grammars Abstract syntax vs. concrete syntax Encoding grammars as Haskell data types What is a language? 2 / 21 What is a language?

More information

Query Lattice for Translation Retrieval

Query Lattice for Translation Retrieval Query Lattice for Translation Retrieval Meiping Dong, Yong Cheng, Yang Liu, Jia Xu, Maosong Sun, Tatsuya Izuha, Jie Hao # State Key Laboratory of Intelligent Technology and Systems Tsinghua National Laboratory

More information

Natural Language Processing

Natural Language Processing Natural Language Processing Classification III Dan Klein UC Berkeley 1 Classification 2 Linear Models: Perceptron The perceptron algorithm Iteratively processes the training set, reacting to training errors

More information

The CKY Parsing Algorithm and PCFGs. COMP-599 Oct 12, 2016

The CKY Parsing Algorithm and PCFGs. COMP-599 Oct 12, 2016 The CKY Parsing Algorithm and PCFGs COMP-599 Oct 12, 2016 Outline CYK parsing PCFGs Probabilistic CYK parsing 2 CFGs and Constituent Trees Rules/productions: S VP this VP V V is rules jumps rocks Trees:

More information

Let s get parsing! Each component processes the Doc object, then passes it on. doc.is_parsed attribute checks whether a Doc object has been parsed

Let s get parsing! Each component processes the Doc object, then passes it on. doc.is_parsed attribute checks whether a Doc object has been parsed Let s get parsing! SpaCy default model includes tagger, parser and entity recognizer nlp = spacy.load('en ) tells spacy to use "en" with ["tagger", "parser", "ner"] Each component processes the Doc object,

More information

CSE100. Advanced Data Structures. Lecture 12. (Based on Paul Kube course materials)

CSE100. Advanced Data Structures. Lecture 12. (Based on Paul Kube course materials) CSE100 Advanced Data Structures Lecture 12 (Based on Paul Kube course materials) CSE 100 Coding and decoding with a Huffman coding tree Huffman coding tree implementation issues Priority queues and priority

More information

Basic Parsing with Context-Free Grammars. Some slides adapted from Karl Stratos and from Chris Manning

Basic Parsing with Context-Free Grammars. Some slides adapted from Karl Stratos and from Chris Manning Basic Parsing with Context-Free Grammars Some slides adapted from Karl Stratos and from Chris Manning 1 Announcements HW 2 out Midterm on 10/19 (see website). Sample ques>ons will be provided. Sign up

More information

Cube Pruning as Heuristic Search

Cube Pruning as Heuristic Search Cube Pruning as Heuristic Search Mark Hopkins and Greg Langmead Language Weaver, Inc. 4640 Admiralty Way, Suite 1210 Marina del Rey, CA 90292 {mhopkins,glangmead}@languageweaver.com Abstract Cube pruning

More information

22nd International Conference on Computational Linguistics

22nd International Conference on Computational Linguistics Coling 2008 22nd International Conference on Computational Linguistics Advanced Dynamic Programming in Computational Linguistics: Theory, Algorithms and Applications Tutorial notes Liang Huang Department

More information

Principles of Programming Languages COMP251: Syntax and Grammars

Principles of Programming Languages COMP251: Syntax and Grammars Principles of Programming Languages COMP251: Syntax and Grammars Prof. Dekai Wu Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong, China Fall 2006

More information

Online Graph Planarisation for Synchronous Parsing of Semantic and Syntactic Dependencies

Online Graph Planarisation for Synchronous Parsing of Semantic and Syntactic Dependencies Online Graph Planarisation for Synchronous Parsing of Semantic and Syntactic Dependencies Ivan Titov University of Illinois at Urbana-Champaign James Henderson, Paola Merlo, Gabriele Musillo University

More information

Ling 571: Deep Processing for Natural Language Processing

Ling 571: Deep Processing for Natural Language Processing Ling 571: Deep Processing for Natural Language Processing Julie Medero February 4, 2013 Today s Plan Assignment Check-in Project 1 Wrap-up CKY Implementations HW2 FAQs: evalb Features & Unification Project

More information

Advanced Dynamic Programming in Semiring and Hypergraph Frameworks

Advanced Dynamic Programming in Semiring and Hypergraph Frameworks Advanced Dynamic Programming in Semiring and Hypergraph Frameworks Liang Huang Department of Computer and Information Science University of Pennsylvania lhuang3@cis.upenn.edu July 15, 2008 Abstract Dynamic

More information

Dependency Parsing CMSC 723 / LING 723 / INST 725. Marine Carpuat. Fig credits: Joakim Nivre, Dan Jurafsky & James Martin

Dependency Parsing CMSC 723 / LING 723 / INST 725. Marine Carpuat. Fig credits: Joakim Nivre, Dan Jurafsky & James Martin Dependency Parsing CMSC 723 / LING 723 / INST 725 Marine Carpuat Fig credits: Joakim Nivre, Dan Jurafsky & James Martin Dependency Parsing Formalizing dependency trees Transition-based dependency parsing

More information

Comparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation

Comparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation Comparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation Markus Dreyer, Keith Hall, and Sanjeev Khudanpur Center for Language and Speech Processing Johns Hopkins University 300

More information

Efficient Parallelization of Natural Language Applications using GPUs

Efficient Parallelization of Natural Language Applications using GPUs Efficient Parallelization of Natural Language Applications using GPUs Chao-Yue Lai Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2012-54

More information

Analysis of Algorithms

Analysis of Algorithms Algorithm An algorithm is a procedure or formula for solving a problem, based on conducting a sequence of specified actions. A computer program can be viewed as an elaborate algorithm. In mathematics and

More information

Structured Perceptron. Ye Qiu, Xinghui Lu, Yue Lu, Ruofei Shen

Structured Perceptron. Ye Qiu, Xinghui Lu, Yue Lu, Ruofei Shen Structured Perceptron Ye Qiu, Xinghui Lu, Yue Lu, Ruofei Shen 1 Outline 1. 2. 3. 4. Brief review of perceptron Structured Perceptron Discriminative Training Methods for Hidden Markov Models: Theory and

More information

Transition-Based Parsing of the Chinese Treebank using a Global Discriminative Model

Transition-Based Parsing of the Chinese Treebank using a Global Discriminative Model Transition-Based Parsing of the Chinese Treebank using a Global Discriminative Model Yue Zhang Oxford University Computing Laboratory yue.zhang@comlab.ox.ac.uk Stephen Clark Cambridge University Computer

More information

CSE 431/531: Algorithm Analysis and Design (Spring 2018) Greedy Algorithms. Lecturer: Shi Li

CSE 431/531: Algorithm Analysis and Design (Spring 2018) Greedy Algorithms. Lecturer: Shi Li CSE 431/531: Algorithm Analysis and Design (Spring 2018) Greedy Algorithms Lecturer: Shi Li Department of Computer Science and Engineering University at Buffalo Main Goal of Algorithm Design Design fast

More information

Sequence Labeling: The Problem

Sequence Labeling: The Problem Sequence Labeling: The Problem Given a sequence (in NLP, words), assign appropriate labels to each word. For example, POS tagging: DT NN VBD IN DT NN. The cat sat on the mat. 36 part-of-speech tags used

More information

Spanning Tree Methods for Discriminative Training of Dependency Parsers

Spanning Tree Methods for Discriminative Training of Dependency Parsers University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science January 2006 Spanning Tree Methods for Discriminative Training of Dependency Parsers Ryan

More information

Easy-First POS Tagging and Dependency Parsing with Beam Search

Easy-First POS Tagging and Dependency Parsing with Beam Search Easy-First POS Tagging and Dependency Parsing with Beam Search Ji Ma JingboZhu Tong Xiao Nan Yang Natrual Language Processing Lab., Northeastern University, Shenyang, China MOE-MS Key Lab of MCC, University

More information

Introduction to Data-Driven Dependency Parsing

Introduction to Data-Driven Dependency Parsing Introduction to Data-Driven Dependency Parsing Introductory Course, ESSLLI 2007 Ryan McDonald 1 Joakim Nivre 2 1 Google Inc., New York, USA E-mail: ryanmcd@google.com 2 Uppsala University and Växjö University,

More information