K-best Parsing Algorithms
|
|
- Rhoda Phillips
- 5 years ago
- Views:
Transcription
1 K-best Parsing Algorithms Liang Huang University of Pennsylvania joint work with David Chiang (USC Information Sciences Institute)
2 k-best Parsing Liang Huang (Penn) k-best parsing 2
3 k-best Parsing I saw a boy with a telescope. Liang Huang (Penn) k-best parsing 2
4 k-best Parsing S NP I v saw VP NP a boy VP prep PP NP 1-best with a telescope I saw a boy with a telescope. Liang Huang (Penn) k-best parsing 2
5 k-best Parsing S S NP I VP VP v NP saw a boy VP NP prep PP NP 1-best 2nd-best with a telescope I saw a boy with a telescope. Liang Huang (Penn) k-best parsing 2
6 Not a trivial task Liang Huang (Penn) k-best parsing 3
7 Not a trivial task Aravind Joshi Liang Huang (Penn) k-best parsing 3
8 Not a trivial task I saw her duck. Aravind Joshi Liang Huang (Penn) k-best parsing 3
9 Not a trivial task I saw her duck. Aravind Joshi Liang Huang (Penn) k-best parsing 3
10 Not a trivial task I saw her duck. Aravind Joshi Liang Huang (Penn) k-best parsing 3
11 I saw her duck. Liang Huang (Penn) k-best parsing 4
12 I saw her duck. how about I saw her duck with a telescope. Liang Huang (Penn) k-best parsing 4
13 I saw her duck with a telescope. Liang Huang (Penn) k-best parsing 5
14 I saw her duck with a telescope. Liang Huang (Penn) k-best parsing 5
15 I saw her duck with a telescope. Liang Huang (Penn) k-best parsing 5
16 Another Example I eat sushi with tuna. Aravind Joshi Liang Huang (Penn) k-best parsing 6
17 Another Example I eat sushi with tuna. Aravind Joshi Liang Huang (Penn) k-best parsing 6
18 Another Example I eat sushi with tuna. Aravind Joshi Liang Huang (Penn) k-best parsing 6
19 Or even... I eat sushi with tuna. Aravind Joshi Liang Huang (Penn) k-best parsing 7
20 Why k-best? postpone disambiguation in a pipeline 1-best is not always optimal in the future propagate k-best lists instead of 1-best e.g.: semantic role labeler uses k-best parses approximate the set of all possible interpretations reranking (Collins, 2000) minimum error training (Och, 2003) online training (McDonald et al., 2005) part-of-speech tagging lattice Syntactic Parsing k-best parses Semantic Interpretation Liang Huang (Penn) k-best parsing 8
21 k-best Viterbi Algorithm? 1. topological sort 2. visit each vertex v in sorted order and do updates for each incoming edge (u, v) in E use d(u) to update d(v): d(v) = d(u) w(u, v) key observation: d(u) is fixed to optimal at this time u w(u, v) v 3. time complexity: O( V + E ) Liang Huang 9 Dynamic Programming
22 k-best Viterbi Algorithm? 1. topological sort 2. visit each vertex v in sorted order and do updates for each incoming edge (u, v) in E use d(u) to update d(v): d(v) = d(u) w(u, v) key observation: d(u) is fixed to optimal at this time u w(u, v) v 3. time complexity: O( V + E ) Liang Huang 9 Dynamic Programming
23 k-best Viterbi Algorithm? 1. topological sort 2. visit each vertex v in sorted order and do updates for each incoming edge (u, v) in E use d(u) to update d(v): d(v) = d(u) w(u, v) key observation: d(u) is fixed to optimal at this time u w(u, v) v 3. time complexity: O( V + E ) Liang Huang 9 Dynamic Programming
24 k-best Viterbi Algorithm? 1. topological sort 2. visit each vertex v in sorted order and do updates for each incoming edge (u, v) in E use d(u) to update d(v): d(v) = d(u) w(u, v) key observation: d(u) is fixed to optimal at this time u w(u, v) naive k-best: O(k E) v 3. time complexity: O( V + E ) Liang Huang 9 Dynamic Programming
25 k-best Viterbi Algorithm? 1. topological sort 2. visit each vertex v in sorted order and do updates for each incoming edge (u, v) in E use d(u) to update d(v): d(v) = d(u) w(u, v) key observation: d(u) is fixed to optimal at this time u w(u, v) 3. time complexity: O( V + E ) v naive k-best: O(k E) good k-best: O(E + k log k V) Liang Huang 9 Dynamic Programming
26 Parsing as Deduction Parsing with context-free grammars (CFGs) Dynamic Programming (CKY algorithm) (B, i, j ) (C, j+1, k) (A, i, k) A B C (NP, 1, 3) (VP, 4, 6) (S, 1, 6) S NP VP Liang Huang (Penn) k-best parsing 10
27 Parsing as Deduction Parsing with context-free grammars (CFGs) Dynamic Programming (CKY algorithm) (B, i, j ) (C, j+1, k) (A, i, k) A B C (NP, 1, 3) (VP, 4, 6) (S, 1, 6) S NP VP B C i j j+1 k Liang Huang (Penn) k-best parsing 10
28 Parsing as Deduction Parsing with context-free grammars (CFGs) Dynamic Programming (CKY algorithm) (B, i, j ) (C, j+1, k) (A, i, k) A B C (NP, 1, 3) (VP, 4, 6) (S, 1, 6) S NP VP A i k Liang Huang (Penn) k-best parsing 10
29 Parsing as Deduction Parsing with context-free grammars (CFGs) Dynamic Programming (CKY algorithm) (B, i, j ) (C, j+1, k) (A, i, k) A B C (NP, 1, 3) (VP, 4, 6) (S, 1, 6) S NP VP A computational complexity: O(n 3 P ) P is the set of productions (rules) i k Liang Huang (Penn) k-best parsing 10
30 Deduction => Hypergraph hypergraph is a generalization of graph each hyperedge connects several vertices to one vertex (VB, 3, 3) (PP, 4, 6) (VB, 3, 3) (PP, 4, 6) (NP, 1, 3) (VP, 4, 6) (S, 1, 6) (NP, 1, 3) (VP, 4, 6) (S, 1, 6) Liang Huang (Penn) k-best parsing 11
31 Deduction => Hypergraph hypergraph is a generalization of graph each hyperedge connects several vertices to one vertex vertices (VB, 3, 3) (PP, 4, 6) (VB, 3, 3) (PP, 4, 6) (NP, 1, 3) (VP, 4, 6) (S, 1, 6) (NP, 1, 3) (VP, 4, 6) (S, 1, 6) Liang Huang (Penn) k-best parsing 11
32 Deduction => Hypergraph hypergraph is a generalization of graph each hyperedge connects several vertices to one vertex vertices (VB, 3, 3) (PP, 4, 6) (VB, 3, 3) (PP, 4, 6) (NP, 1, 3) (VP, 4, 6) (NP, 1, 3) (VP, 4, 6) (S, 1, 6) hyperedges (S, 1, 6) Liang Huang (Penn) k-best parsing 11
33 Packed Forest as Hypergraph I saw a boy with a telescope S S NP VP VP packed forest a compact representation of all parse trees I v saw VP NP a boy NP prep with PP NP a telescope Liang Huang (Penn) k-best parsing 12
34 Packed Forest as Hypergraph I saw a boy with a telescope S NP VP VP NP packed forest a compact representation of all parse trees I v saw NP a boy prep with PP NP a telescope Liang Huang (Penn) k-best parsing 13
35 Packed Forest as Hypergraph I saw a boy with a telescope S NP VP VP NP packed forest a compact representation of all parse trees I v saw NP a boy prep with PP NP a telescope Liang Huang (Penn) k-best parsing 13
36 Weighted Deduction/Hypergraph (B, i, j): p (C, j+1, k): q (A, i, k): f (p, q) A B C f is the weight function e.g.: in Probabilistic Context-Free Grammars: f (p, q)= p q Pr (A B C) Liang Huang (Penn) k-best parsing 14
37 Monotonic Weight Functions all weight functions must be monotonic on each of their arguments optimal sub-problem property in dynamic programming B: b A: f (b, c) C: c CKY example: A = (S, 1, 5) B = (NP, 1, 2), C = (VP, 3, 5) f (b, c) = b c Pr(S NP VP) Liang Huang (Penn) k-best parsing 15
38 Monotonic Weight Functions all weight functions must be monotonic on each of their arguments optimal sub-problem property in dynamic programming B: b b A: f (b, c) f (b, c) C: c CKY example: A = (S, 1, 5) B = (NP, 1, 2), C = (VP, 3, 5) f (b, c) = b c Pr(S NP VP) Liang Huang (Penn) k-best parsing 15
39 k-best Problem in Hypergraphs 1-best problem find the best derivation of the target vertex t k-best problem find the top k derivations of the target vertex t assumption in CKY, t = (S, 1, n) acyclic: so that we can use topological order Liang Huang (Penn) k-best parsing 16
40 Outline Formulations Algorithms Generic 1-best Viterbi Algorithm Algorithm 0: naïve Algorithm 1: hyperedge-level Algorithm 2: vertex (item)-level Algorithm 3: lazy algorithm Experiments Applications to Machine Translation Liang Huang (Penn) k-best parsing 17
41 Generic 1-best Viterbi Algorithm traverse the hypergraph in topological order for each vertex ( bottom-up ) for each incoming hyperedge compute the result of the f function along the hyperedge update the 1-best value for the current vertex if possible u: a f 1 w: b v : f 1 (a, b) Liang Huang (Penn) k-best parsing 18
42 Generic 1-best Viterbi Algorithm traverse the hypergraph in topological order for each vertex ( bottom-up ) for each incoming hyperedge compute the result of the f function along the hyperedge update the 1-best value for the current vertex if possible (VP, 2, 4) u: a f 1 (VP, 2, 7) (PP, 4, 7) w: b v : f 1 (a, b) Liang Huang (Penn) k-best parsing 18
43 Generic 1-best Viterbi Algorithm traverse the hypergraph in topological order for each vertex ( bottom-up ) for each incoming hyperedge compute the result of the f function along the hyperedge update the 1-best value for the current vertex if possible u: a f 1 w: b v : better ( f 1 (a, b), f 2 (c, d)) u : c f 2 w : d Liang Huang (Penn) k-best parsing 19
44 Generic 1-best Viterbi Algorithm traverse the hypergraph in topological order for each incoming hyperedge compute the result of the f function along the hyperedge update the 1-best value for the current vertex if possible u: a f 1 w: b v : better( better ( f 1 (a, b), f 2 (c, d)), ) u : c f 2 w : d Liang Huang (Penn) k-best parsing 20
45 Generic 1-best Viterbi Algorithm traverse the hypergraph in topological order for each incoming hyperedge compute the result of the f function along the hyperedge update the 1-best value for the current vertex if possible u: a f 1 w: b v : better( better ( f 1 (a, b), f 2 (c, d)), ) u : c f 2 overall time complexity: O( E ) w : d Liang Huang (Penn) k-best parsing 20
46 Generic 1-best Viterbi Algorithm traverse the hypergraph in topological order for each incoming hyperedge compute the result of the f function along the hyperedge update the 1-best value for the current vertex if possible u: a f 1 w: b v : better( better ( f 1 (a, b), f 2 (c, d)), ) u : c f 2 overall time complexity: O( E ) w : d in CKY: E =O (n 3 P ) Liang Huang (Penn) k-best parsing 20
47 Dynamic Programming: 1950 s Richard Bellman Andrew Viterbi Liang Huang (Penn) k-best parsing 21
48 Dynamic Programming: 1950 s Richard Bellman Andrew Viterbi We knew everything so far in your talk 40 years ago Liang Huang (Penn) k-best parsing 21
49 k-best Viterbi algorithm 0: naïve straightforward k-best extension: a vector of length k instead of a single value vector components maintain sorted now what s f (a, b)? k 2 values -- Cartesian Product f (a i, b j ) just need top k out of the k 2 values u: a f 1 mult k ( f, a, b) = top k { f (a i, b j ) } w: b v : mult k ( f 1, a, b) Liang Huang (Penn) k-best parsing 22
50 k-best Viterbi algorithm 0: naïve straightforward k-best extension: a vector of length k instead of a single value vector components maintain sorted now what s f (a, b)? k 2 values -- Cartesian Product f (a i, b j ) b just need top k out of the k 2 values.6.4 a.3.3 u: a f 1 mult k ( f, a, b) = top k { f (a i, b j ) } w: b v : mult k ( f 1, a, b) Liang Huang (Penn) k-best parsing 22
51 Algorithm 0: naïve straightforward k-best extension: a vector of length k instead of a single value and how to update? from two k-lengthed vectors (2k elements) select the top k elements: O(k) u: a f 1 w: b u : c f 2 v : merge k (mult k ( f 1, a, b), mult k ( f 2, c, d)) w : d Liang Huang (Penn) k-best parsing 23
52 Algorithm 0: naïve straightforward k-best extension: a vector of length k instead of a single value and how to update? from two k-lengthed vectors (2k elements) select the top k elements: O(k) u: a w: b u : c w : d f 2 f 1 v : merge k (mult k ( f 1, a, b), mult k ( f 2, c, d)) overall time complexity: O(k 2 E ) Liang Huang (Penn) k-best parsing 23
53 Algorithm 1: speedup mult k mult k ( f, a, b) = top k{ f (a i, b j ) }.1 b a.3.3 Liang Huang (Penn) k-best parsing 24
54 Algorithm 1: speedup mult k mult k ( f, a, b) = top k{ f (a i, b j ) } only interested in top k, why enumerate all k 2?.1 b a.3.3 Liang Huang (Penn) k-best parsing 24
55 Algorithm 1: speedup mult k mult k ( f, a, b) = top k{ f (a i, b j ) } only interested in top k, why enumerate all k 2? a and b are sorted!.1.3 b a.3 Liang Huang (Penn) k-best parsing 24
56 Algorithm 1: speedup mult k mult k ( f, a, b) = top k{ f (a i, b j ) } only interested in top k, why enumerate all k 2? a and b are sorted! f is monotonic!.1.3 b a.3 Liang Huang (Penn) k-best parsing 24
57 Algorithm 1: speedup mult k mult k ( f, a, b) = top k{ f (a i, b j ) } only interested in top k, why enumerate all k 2? a and b are sorted! f is monotonic! f (a 1, b 1 ) must be the 1-best.1.3 b a.3 Liang Huang (Penn) k-best parsing 24
58 Algorithm 1: speedup mult k mult k ( f, a, b) = top k{ f (a i, b j ) } only interested in top k, why enumerate all k 2? a and b are sorted! f is monotonic! f (a 1, b 1 ) must be the 1-best the 2nd-best must be either f (a 2, b 1 ) or f (a 1, b 2 ).1.3 b a.3 Liang Huang (Penn) k-best parsing 24
59 Algorithm 1: speedup mult k mult k ( f, a, b) = top k{ f (a i, b j ) } only interested in top k, why enumerate all k 2? a and b are sorted! f is monotonic! f (a 1, b 1 ) must be the 1-best the 2nd-best must be either f (a 2, b 1 ) or f (a 1, b 2 ) what about the 3rd-best?.1.3 b a.3 Liang Huang (Penn) k-best parsing 24
60 Algorithm 1 (Demo) f (a, b) = ab Liang Huang (Penn) k-best parsing 25
61 Algorithm 1 (Demo) f (a, b) = ab.1.3 b j a i Liang Huang (Penn) k-best parsing 25
62 Algorithm 1 (Demo) f (a, b) = ab.1.3 b j a i Liang Huang (Penn) k-best parsing 25
63 Algorithm 1 (Demo) f (a, b) = ab.1.3 b j a i Liang Huang (Penn) k-best parsing 25
64 Algorithm 1 (Demo) f (a, b) = ab.1.3 b j a i Liang Huang (Penn) k-best parsing 26
65 Algorithm 1 (Demo) f (a, b) = ab.1.3 b j a i Liang Huang (Penn) k-best parsing 26
66 Algorithm 1 (Demo) f (a, b) = ab b j a i Liang Huang (Penn) k-best parsing 26
67 Algorithm 1 (Demo) use a priority queue (heap) to store the candidates (frontier) b j a i Liang Huang (Penn) k-best parsing 27
68 Algorithm 1 (Demo) use a priority queue (heap) to store the candidates (frontier) b j a i Liang Huang (Penn) k-best parsing 27
69 Algorithm 1 (Demo) use a priority queue (heap) to store the candidates (frontier) b j a i Liang Huang (Penn) k-best parsing 27
70 Algorithm 1 (Demo) use a priority queue (heap) to store the candidates (frontier) b j in each iteration: 1. extract-max from the heap 2. push the two shoulders into the heap k iterations O(k logk E ) overall time a i Liang Huang (Penn) k-best parsing 27
71 Algorithm 2: speedup merge k Algorithm 1 works on each hyperedge sequentially can we process them simultaneously? u: a w: b p: x q: y f 1 f i f d v Liang Huang (Penn) k-best parsing 28
72 Algorithm 2 (Demo) starts with an initial heap of the 1-best derivations from each hyperedge B 1 x C 1 B 2 x C 2 B 3 x C item-level heap v k = 2, d = 3 Liang Huang (Penn) k-best parsing 29
73 Algorithm 2 (Demo) pop the best (.42) and B 1 x C 1 B 2 x C 2 B 3 x C item-level heap v k = 2, d = 3.42 Liang Huang (Penn) k-best parsing 30
74 Algorithm 2 (Demo) pop the best (.42) and push the two successors (.07 and.24) B 1 x C 1 B 2 x C 2 B 3 x C item-level heap v k = 2, d = 3 output.42 Liang Huang (Penn) k-best parsing 31
75 Algorithm 2 (Demo) pop the 2 nd -best (.36) B 1 x C 1 B 2 x C 2 B 3 x C item-level heap v k = 2, d = 3 output Liang Huang (Penn) k-best parsing 32
76 Algorithm 3: Offline (lazy) from Algorithm 0 to Algorithm 2: delaying the calculations until needed -- lazier larger locality even lazier (one step further) we are interested in the k-best derivations of the final item only! Liang Huang (Penn) k-best parsing 33
77 Algorithm 3: Offline (lazy) forward phase do a normal 1-best search till the final item but construct the hypergraph (forest) along the way recursive backward phase ask the final item: what s your 2nd-best? final item will propagate this question till the leaves then ask the final item: what s your 3rd-best? Liang Huang (Penn) k-best parsing 34
78 Algorithm 3 demo after the forward step (1-best parsing): forest = 1-best derivations from each hyperedge.20? 0.4? 0.5 NP (1, 2) VP (3, 7).42? 0.7? 0.6 NP (1, 3) VP (4, 7) 1-best.28? 0.7? 0.4 VP (1, 5) NP (6, 7) S (1, 7) k=2.42 Liang Huang (Penn) k-best parsing 35
79 Algorithm 3 demo now the backward step.20? 0.4? 0.5 NP (1, 2) VP (3, 7).42? 0.7? 0.6 NP (1, 3) VP (4, 7) 1-best.28? 0.7? 0.4 VP (1, 5) NP (6, 7) what s your 2nd-best? S (1, 7) k=2.42 Liang Huang (Penn) k-best parsing 36
80 Algorithm 3 demo.20? 0.4? 0.5 NP (1, 2) VP (3, 7).42? 0.7? 0.6 NP (1, 3) VP (4, 7) 1-best.28? 0.7? 0.4 VP (1, 5) NP (6, 7) I m not sure... let me ask my parents S (1, 7) k=2.42 Liang Huang (Penn) k-best parsing 37
81 Algorithm 3 demo well, it must be either or.20? 0.4? 0.5 NP (1, 2) VP (3, 7)??.42? 0.7? 0.6 NP (1, 3) VP (4, 7).28? 0.7? 0.4 VP (1, 5) NP (6, 7) S (1, 7) k=2.42 Liang Huang (Penn) k-best parsing 38
82 Algorithm 3 demo but wait a minute did you already know the? s?.20? 0.4? 0.5 NP (1, 2) VP (3, 7)?? ?? 0.6 NP (1, 3) VP (4, 7).28? 0.7? 0.4 VP (1, 5) NP (6, 7) S (1, 7) k=2.42 Liang Huang (Penn) k-best parsing 39
83 Algorithm 3 demo but wait a minute did you already know the? s?.20? 0.4? 0.5 NP (1, 2) VP (3, 7)?? ?? 0.6 NP (1, 3) VP (4, 7).28? 0.7? 0.4 VP (1, 5) NP (6, 7) oops forgot to ask more questions recursively S (1, 7) Liang Huang (Penn) k-best parsing 39 k=2.42
84 Algorithm 3 demo what s your 2nd-best?.20? 0.4? 0.5 NP (1, 2) VP (3, 7)??.42? 0.7? 0.6 NP (1, 3) VP (4, 7).28? 0.7? 0.4 VP (1, 5) NP (6, 7) S (1, 7) k=2.42 Liang Huang (Penn) k-best parsing 40
85 Algorithm 3 demo recursion goes on to the leaf nodes.20? 0.4? 0.5 NP (1, 2) VP (3, 7)??.42? 0.7? 0.6 NP (1, 3) VP (4, 7).28? 0.7? 0.4 VP (1, 5) NP (6, 7) S (1, 7) k=2.42 Liang Huang (Penn) k-best parsing 41
86 Algorithm 3 demo and reports back the numbers??.20? 0.4? 0.5 NP (1, 2) VP (3, 7) NP (1, 3) VP (4, 7).28? 0.7? 0.4 VP (1, 5) NP (6, 7) S (1, 7) k=2.42 Liang Huang (Penn) k-best parsing 42
87 Algorithm 3 demo push.30 and.21 to the candidate heap (priority queue) ? 0.4? 0.5 NP (1, 2) VP (3, 7) NP (1, 3) VP (4, 7).28? 0.7? 0.4 VP (1, 5) NP (6, 7) S (1, 7).42 k=2.30 Liang Huang (Penn) k-best parsing 43
88 Algorithm 3 demo pop the root of the heap (.30) ? 0.4? 0.5 NP (1, 2) VP (3, 7) NP (1, 3) VP (4, 7).28? 0.7? 0.4 VP (1, 5) NP (6, 7) now I know my 2nd-best S (1, 7).42 k=2.30 Liang Huang (Penn) k-best parsing 44
89 Interesting Properties 1-best is best everywhere (all decisions optimal) local picture: 2nd-best is optimal everywhere except one decision and that decision must be 2nd-best and it s the best of all 2nd-best decisions b j so what about the 3rd-best? kth-best is.6 a i Liang Huang (Penn) k-best parsing 45
90 Summary of Algorithms Algorithms Time Complexity Locality 1-best O( E ) hyperedge alg. 0: naïve O(k a E ) hyperedge (mult k ) alg. 1 O(k log k E ) hyperedge (mult k ) alg. 2 O( E + V k log k) item (merge k ) alg. 3: lazy O( E + D k log k) global for CKY: a=2, E =O(n 3 P ), V =O(n 2 N ), D =O(n) a is the arity of the grammar Liang Huang (Penn) k-best parsing 46
91 Outline Formulations Algorithms: Alg.0 thru Alg. 3 Experiments Collins/Bikel Parser both efficiency and accuracy Applications in Machine Translation Liang Huang (Penn) k-best parsing 47
92 Background: Statistical Parsing Probabilistic Grammar induced from a treebank (Penn Treebank) State-of-the-art Parsers Collins (1999), Bikel (2004), Charniak (2000), etc. Evaluation of Accuracy PARSEVAL: tree-similarity (English treebank: ~90%) Previous work on k-best Parsing: Collins (2000): turn off dynamic programming Charniak/Johnson (2005): coarse-to-fine, still too slow Liang Huang (Penn) k-best parsing 48
93 Efficiency Implemented Algorithms 0, 1, 3 on top of Collins/Bikel Parser Average (wall-clock) time on Penn Treebank (per sentence): O( E + D k log k) Liang Huang (Penn) k-best parsing 49
94 Oracle Reranking given k parses of a sentence oracle reranking: pick the best parse according to the gold-standard real reranking: pick the best parse according to the score function gold standard correct parse 1-best accuracy 100% 89% Liang Huang (Penn) k-best parsing 50
95 Oracle Reranking given k parses of a sentence gold standard correct parse accuracy 100% oracle reranking: pick the best parse according to the gold-standard real reranking: pick the best parse according to the score function k-best parses 1-best 96% 91% 89% 78% Liang Huang (Penn) k-best parsing 50
96 Oracle Reranking given k parses of a sentence oracle reranking: pick the best parse according to the gold-standard real reranking: pick the best parse according to the score function gold standard correct parse oracle k-best parses 1-best accuracy 100% 96% 91% 89% 78% Liang Huang (Penn) k-best parsing 50
97 Quality of the k-best lists This work on top of Collins parser Collins (2000) Liang Huang (Penn) k-best parsing 51
98 Quality of the k-best lists This work on top of Collins parser Collins (2000) Liang Huang (Penn) k-best parsing 51
99 Why are our k-best lists better? average number of parses average number of parses for sentences of certain length this work Collins (2000) sentence length Collins (2000): turn down dynamic programming theoretically exponential time complexity; aggressive beam pruning to make it tractable in practice Liang Huang (Penn) k-best parsing 52
100 Why are our k-best lists better? average number of parses average number of parses for sentences of certain length this work Collins (2000) 7% of the test-set sentence length Collins (2000): turn down dynamic programming theoretically exponential time complexity; aggressive beam pruning to make it tractable in practice Liang Huang (Penn) k-best parsing 52
101 k-best Viterbi Algorithm? 1. topological sort 2. visit each vertex v in sorted order and do updates for each incoming edge (u, v) in E use d(u) to update d(v): d(v) = d(u) w(u, v) key observation: d(u) is fixed to optimal at this time u w(u, v) v 3. time complexity: O( V + E ) Liang Huang 53 Dynamic Programming
102 k-best Viterbi Algorithm? 1. topological sort 2. visit each vertex v in sorted order and do updates for each incoming edge (u, v) in E use d(u) to update d(v): d(v) = d(u) w(u, v) key observation: d(u) is fixed to optimal at this time u w(u, v) v 3. time complexity: O( V + E ) Liang Huang 53 Dynamic Programming
103 k-best Viterbi Algorithm? 1. topological sort 2. visit each vertex v in sorted order and do updates for each incoming edge (u, v) in E use d(u) to update d(v): d(v) = d(u) w(u, v) key observation: d(u) is fixed to optimal at this time u w(u, v) v 3. time complexity: O( V + E ) Liang Huang 53 Dynamic Programming
104 k-best Viterbi Algorithm? 1. topological sort 2. visit each vertex v in sorted order and do updates for each incoming edge (u, v) in E use d(u) to update d(v): d(v) = d(u) w(u, v) key observation: d(u) is fixed to optimal at this time u w(u, v) Algorithm 0: O(k E) v 3. time complexity: O( V + E ) Liang Huang 53 Dynamic Programming
105 k-best Viterbi Algorithm? 1. topological sort 2. visit each vertex v in sorted order and do updates for each incoming edge (u, v) in E use d(u) to update d(v): d(v) = d(u) w(u, v) key observation: d(u) is fixed to optimal at this time u w(u, v) Algorithm 0: O(k E) v Algorithm 2: O(E + k log k V) 3. time complexity: O( V + E ) Liang Huang 53 Dynamic Programming
106 k-best Viterbi Algorithm? 1. topological sort 2. visit each vertex v in sorted order and do updates for each incoming edge (u, v) in E use d(u) to update d(v): d(v) = d(u) w(u, v) key observation: d(u) is fixed to optimal at this time u w(u, v) Algorithm 0: O(k E) 3. time complexity: O( V + E ) Liang Huang 53 v Algorithm 2: O(E + k log k V) Algorithm 3: O(E + k log k D) Dynamic Programming
107 Conclusions monotonic hypergraph formulation the k-best derivations problem k-best Algorithms Algorithm 0 (naïve) to Algorithm 3 (lazy) experimental results efficiency accuracy (effectively searching over larger space) applications in machine translation k-best rescoring and forest rescoring Liang Huang (Penn) Applications in MT 54
108 Implemented in... state-of-the-art statistical parsers Charniak parser (2005); Berkeley parser (2006) McDonald et al. dependency parser (2005) Microsoft Research (Redmond) dependency parser (2006) generic dynamic programming languages/packages Dyna (Eisner et al., 2005) and Tiburon (May and Knight, 2006) state-of-the-art syntax-based translation systems Hiero (Chiang, 2005) ISI syntax-based system (2005) CMU syntax-based system (2006) BBN syntax-based system (2007) Liang Huang (Penn) k-best parsing 55
109 Applications in Machine Translation
110 Syntax-based Translation synchronous context-free grammars (SCFGs) generating pairs of strings/trees simultaneously co-indexed nonterminal further rewritten as a unit S S NP (1) VP (2), NP (1) VP (2) VP PP (1) VP (2), VP (2) PP (1) NP Baoweier, Powell... S NP VP NP VP Baoweier PP VP Powell VP PP yu Shalong juxing le huitan held a meeting with Sharon Liang Huang (Penn) Applications in MT 57
111 Translation as Parsing translation ( decoding ) => monolingual parsing parse the source input with the source projection build the corresponding target sub-strings in parallel Liang Huang (Penn) Applications in MT 58
112 Translation as Parsing translation ( decoding ) => monolingual parsing parse the source input with the source projection build the corresponding target sub-strings in parallel S NP (1) VP (2), NP (1) VP (2) VP PP (1) VP (2), VP (2) PP (1) NP Baoweier, Powell... Baoweier yu Shalong juxing le huitan Liang Huang (Penn) Applications in MT 58
113 Translation as Parsing translation ( decoding ) => monolingual parsing parse the source input with the source projection build the corresponding target sub-strings in parallel S NP (1) VP (2), NP (1) VP (2) VP PP (1) VP (2), VP (2) PP (1) NP Baoweier, Powell... Baoweier yu Shalong juxing le huitan Liang Huang (Penn) Applications in MT 58
114 Translation as Parsing translation ( decoding ) => monolingual parsing parse the source input with the source projection build the corresponding target sub-strings in parallel S NP (1) VP (2), NP (1) VP (2) VP PP (1) VP (2), VP (2) PP (1) NP Baoweier, Powell... NP PP VP Baoweier yu Shalong juxing le huitan Liang Huang (Penn) Applications in MT 58
115 Translation as Parsing translation ( decoding ) => monolingual parsing parse the source input with the source projection build the corresponding target sub-strings in parallel S NP (1) VP (2), NP (1) VP (2) VP PP (1) VP (2), VP (2) PP (1) NP Baoweier, Powell... VP NP PP VP Baoweier yu Shalong juxing le huitan Liang Huang (Penn) Applications in MT 58
116 Translation as Parsing translation ( decoding ) => monolingual parsing parse the source input with the source projection build the corresponding target sub-strings in parallel S NP (1) VP (2), NP (1) VP (2) VP PP (1) VP (2), VP (2) PP (1) NP Baoweier, Powell... VP NP PP VP Baoweier yu Shalong juxing le huitan Liang Huang (Penn) Applications in MT 58
117 Translation as Parsing translation ( decoding ) => monolingual parsing parse the source input with the source projection build the corresponding target sub-strings in parallel S NP (1) VP (2), NP (1) VP (2) VP PP (1) VP (2), VP (2) PP (1) NP Baoweier, Powell... VP NP PP VP Baoweier yu Shalong juxing le huitan Liang Huang (Penn) Applications in MT 58
118 Translation as Parsing translation ( decoding ) => monolingual parsing parse the source input with the source projection build the corresponding target sub-strings in parallel S NP (1) VP (2), NP (1) VP (2) VP PP (1) VP (2), VP (2) PP (1) NP Baoweier, Powell... VP 3.0 Powell with Sharon held a talk NP PP VP Baoweier yu Shalong juxing le huitan Liang Huang (Penn) Applications in MT 58
119 Translation as Parsing translation ( decoding ) => monolingual parsing parse the source input with the source projection build the corresponding target sub-strings in parallel S NP (1) VP (2), NP (1) VP (2) VP PP (1) VP (2), VP (2) PP (1) NP Baoweier, Powell... held a talk with Sharon VP 3.0 Powell with Sharon held a talk NP PP VP Baoweier yu Shalong juxing le huitan Liang Huang (Penn) Applications in MT 58
120 Language Model: Rescoring Spanish/English Bilingual Text English Text Statistical Analysis Statistical Analysis Spanish translation model (TM) competency Broken English language model (LM) fluency English Que hambre tengo yo Liang Huang (Penn) What hunger have I Hungry I am so Have I that hunger I am so hungry How hunger have I... I am so hungry Applications in MT 59
121 Language Model: Rescoring Spanish/English Bilingual Text English Text Statistical Analysis Statistical Analysis Spanish translation model (TM) competency synchronous CFG Broken English language model (LM) fluency n-gram LM English Que hambre tengo yo Liang Huang (Penn) What hunger have I Hungry I am so Have I that hunger I am so hungry How hunger have I... I am so hungry Applications in MT 59
122 Language Model: Rescoring Spanish/English Bilingual Text English Text Statistical Analysis Statistical Analysis Spanish translation model (TM) competency synchronous CFG Broken English language model (LM) fluency n-gram LM English k-best rescoring Que hambre tengo yo Liang Huang (Penn) What hunger have I Hungry I am so Have I that hunger I am so hungry How hunger have I... I am so hungry Applications in MT 59
123 Language Model: Rescoring Spanish/English Bilingual Text English Text Statistical Analysis Statistical Analysis Spanish translation model (TM) competency synchronous CFG Broken English language model (LM) fluency n-gram LM English k-best rescoring TM Que hambre tengo yo Liang Huang (Penn) What hunger have I Hungry I am so Have I that hunger I am so hungry How hunger have I... I am so hungry Applications in MT 59
124 Language Model: Rescoring Spanish/English Bilingual Text English Text Statistical Analysis Statistical Analysis Spanish translation model (TM) competency synchronous CFG Broken English language model (LM) fluency n-gram LM English k-best rescoring TM LM Que hambre tengo yo Liang Huang (Penn) What hunger have I Hungry I am so Have I that hunger I am so hungry How hunger have I I am so hungry Applications in MT 59
125 k-best rescoring results The ISI syntax-based translation system currently the best performing system on Chinese to English task in NIST evaluations based on synchronous grammars translation model (TM) only: rescoring with trigram LM on best list: BLEU score Liang Huang (Penn) Applications in MT 60
126 Language Model: Integration Spanish/English Bilingual Text English Text Statistical Analysis Statistical Analysis Spanish translation model (TM) competency synchronous CFG Broken English language model (LM) fluency n-gram LM English Liang Huang (Penn) (Knight and Koehn, 2003) Applications in MT 61
127 Language Model: Integration Spanish/English Bilingual Text English Text Statistical Analysis Statistical Analysis Spanish translation model (TM) competency synchronous CFG Broken English language model (LM) fluency n-gram LM English Que hambre tengo yo decoder integrated (LM-integrated) decoder computationally challenging! I am so hungry Liang Huang (Penn) (Knight and Koehn, 2003) Applications in MT 61
128 Integrated Decoding Results The ISI syntax-based translation system currently the best performing system on Chinese to English task in NIST evaluations based on synchronous grammars translation model (TM) only: rescoring with trigram LM on best list: trigram integrated decoding: BLEU score Liang Huang (Penn) Applications in MT 62
129 Integrated Decoding Results The ISI syntax-based translation system currently the best performing system on Chinese to English task in NIST evaluations based on synchronous grammars translation model (TM) only: rescoring with trigram LM on best list: trigram integrated decoding: BLEU score Liang Huang (Penn) but over 100 times slower! Applications in MT 62
130 Integrated Decoding Results see my second talk for details! The ISI syntax-based translation system currently the best performing system on Chinese to English task in NIST evaluations based on synchronous grammars translation model (TM) only: rescoring with trigram LM on best list: trigram integrated decoding: BLEU score Liang Huang (Penn) but over 100 times slower! Applications in MT 62
131 Rescoring vs. Integration quality good poor rescoring slow fast speed Liang Huang (Penn) Applications in MT 63
132 Rescoring vs. Integration quality good integration poor rescoring slow fast speed Liang Huang (Penn) Applications in MT 63
133 Rescoring vs. Integration quality good integration? any compromise? (on-the-fly rescoring?) poor rescoring slow fast speed Liang Huang (Penn) Applications in MT 63
134 Rescoring vs. Integration quality good integration Yes, forest-rescoring -- almost as fast as rescoring, and almost as good as integration? any compromise? (on-the-fly rescoring?) poor rescoring slow fast speed Liang Huang (Penn) Applications in MT 63
135 Why Integration is Slow? hyperedge VP1, 6 PP1, 3 VP3, 6 PP1, 4 VP4, 6 NP1, 4 VP4, 6 with... Sharon along... Sharon with... Shalong held... talk held... meeting hold... talks split each node into +LM items (w/ boundary words) beam search: only keep top-k +LM items at each node but there are many ways to derive each node can we avoid enumerating all combinations? Liang Huang (Penn) Applications in MT 64
136 Why Integration is Slow? hyperedge VP1, 6 hold... Sharon held... Shalong hold... Shalong PP1, 3 VP3, 6 PP1, 4 VP4, 6 NP1, 4 VP4, 6 with... Sharon along... Sharon with... Shalong held... talk held... meeting hold... talks split each node into +LM items (w/ boundary words) beam search: only keep top-k +LM items at each node but there are many ways to derive each node can we avoid enumerating all combinations? Liang Huang (Penn) Applications in MT 64
137 Forest Rescoring k-best parsing Algorithm 2 hyperedge with LM cost, we can only do k-best approximately. VP PP1, 3 VP3, 6 PP1, 4 VP4, 6 NP1, 4 VP4, 6 process all hyperedges simultaneously! significant savings of computation Liang Huang (Penn) Applications in MT 65
138 Forest Rescoring Results on the Hiero system (Chiang, 2005) ~10 fold speed-up at the same level of BLEU on my syntax-directed system (Huang et al., 2006) ~10 fold speed-up at the same level of search-error on a typical phrase-based system (Pharaoh) ~30 fold speed-up at the same level of search-error ~100 fold speed-up at the same level of BLEU also used in NIST evals by ISI, CMU, and BBN syntax-based systems see my ACL 07 paper for details Liang Huang (Penn) Applications in MT 66
139 Conclusions monotonic hypergraph formulation the k-best derivations problem k-best Algorithms Algorithm 0 (naïve) to Algorithm 3 (lazy) experimental results efficiency accuracy (effectively searching over larger space) applications in machine translation k-best rescoring and forest rescoring Liang Huang (Penn) Applications in MT 67
140 Thank you!! Questions? Comments?
141 Thank you!! Questions? Comments?
142 Thank you!! Questions? Comments? Liang Huang and David Chiang (2005). Better k-best Parsing. In Proceedings of IWPT, Vancouver, B.C. Liang Huang and David Chiang (2007). Forest Rescoring: Faster Decoding with Integrated Language Models. In Proceedings of ACL, Prague, Czech Rep.
143 Quality of the k-best lists Collins (2000) Liang Huang (Penn) k-best parsing 69
144 Syntax-based Experiments
145 Tree-to-String System syntax-directed, English to Chinese (Huang, Knight, Joshi, 2006) the reverse direction is found in (Liu et al., 2006) VP!"""#$%&"""'$ bei synchronous treesubstitution grammars (STSG) VBD was VP VP-C PP (Galley et al., 2004; Eisner, 2003) related to STAG (Shieber/Schabes, 90) VBN PP IN NP-C shot TO to NP-C NN by DT the NN police tested on 140 sentences slightly better BLEU scores than Pharaoh Liang Huang (Penn) death Applications in MT 71
146 Speed vs. Search Quality Liang Huang (Penn) Applications in MT 72
147 Speed vs. Translation Accuracy Liang Huang (Penn) Applications in MT 73
148 Cube Pruning VP1, 6 PP1, 3 VP3, 6 monotonic grid? (PP with Sharon 1,3 ) (PP along Sharon 1,3 ) (PP with Shalong 1,3 ) (VP held meeting 3,6 ) (VP held talk 3,6 ) (VP hold conference 3,6 ) Liang Huang (Penn) Applications in MT 74
149 Cube Pruning VP1, 6 PP1, 3 VP3, 6 (PP with Sharon 1,3 ) (PP along Sharon 1,3 ) (PP non-monotonic grid due to LM combo costs with Shalong 1,3 ) (VP held meeting 3,6 ) (VP held talk 3,6 ) (VP hold conference 3,6 ) Liang Huang (Penn) Applications in MT 75
150 Cube Pruning VP1, 6 PP1, 3 VP3, 6 bigram (meeting, with) (PP with Sharon 1,3 ) (PP along Sharon 1,3 ) (PP non-monotonic grid due to LM combo costs with Shalong 1,3 ) (VP held meeting 3,6 ) (VP held talk 3,6 ) (VP hold conference 3,6 ) Liang Huang (Penn) Applications in MT 75
151 Cube Pruning VP1, 6 PP1, 3 VP3, 6 non-monotonic grid due to LM combo costs (PP with Sharon 1,3 ) (PP along Sharon 1,3 ) (PP with Shalong 1,3 ) (VP held meeting 3,6 ) (VP held talk 3,6 ) (VP hold conference 3,6 ) Liang Huang (Penn) Applications in MT 76
152 Cube Pruning k-best parsing Algorithm 1 a priority queue of candidates extract the best candidate (PP with Sharon 1,3 ) (PP along Sharon 1,3 ) (PP with Shalong 1,3 ) (VP held meeting 3,6 ) (VP held talk 3,6 ) (VP hold conference 3,6 ) Liang Huang (Penn) Applications in MT 77
153 Cube Pruning k-best parsing Algorithm 1 a priority queue of candidates extract the best candidate push the two successors (PP with Sharon 1,3 ) (PP along Sharon 1,3 ) (PP with Shalong 1,3 ) (VP held meeting 3,6 ) (VP held talk 3,6 ) (VP hold conference 3,6 ) Liang Huang (Penn) Applications in MT 78
154 Cube Pruning k-best parsing Algorithm 1 a priority queue of candidates extract the best candidate push the two successors (PP with Sharon 1,3 ) (PP along Sharon 1,3 ) (PP with Shalong 1,3 ) (VP held meeting 3,6 ) (VP held talk 3,6 ) (VP hold conference 3,6 ) Liang Huang (Penn) Applications in MT 79
155 Cube Pruning items are popped out-of-order solution: keep a buffer of pop-ups (PP with Sharon 1,3 ) (PP along Sharon 1,3 ) (PP with Shalong 1,3 ) (VP held meeting 3,6 ) (VP held talk 3,6 ) (VP hold conference 3,6 ) Liang Huang (Penn) Applications in MT 80
156 Cube Pruning items are popped out-of-order solution: keep a buffer of pop-ups finally re-sort the buffer and return inorder: (PP with Sharon 1,3 ) (PP along Sharon 1,3 ) (PP with Shalong 1,3 ) (VP held meeting 3,6 ) (VP held talk 3,6 ) (VP hold conference 3,6 ) Liang Huang (Penn) Applications in MT 80
Forest-Based Search Algorithms
Forest-Based Search Algorithms for Parsing and Machine Translation Liang Huang University of Pennsylvania Google Research, March 14th, 2008 Search in NLP is not trivial! I saw her duck. Aravind Joshi 2
More informationForest-based Algorithms in
Forest-based Algorithms in Natural Language Processing Liang Huang overview of Ph.D. work done at Penn (and ISI, ICT) INSTITUTE OF COMPUTING TECHNOLOGY includes joint work with David Chiang, Kevin Knight,
More informationNatural Language Processing
Natural Language Processing Spring 2017 Unit 3: Tree Models Lectures 9-11: Context-Free Grammars and Parsing required hard optional Professor Liang Huang liang.huang.sh@gmail.com Big Picture only 2 ideas
More informationParsing with Dynamic Programming
CS11-747 Neural Networks for NLP Parsing with Dynamic Programming Graham Neubig Site https://phontron.com/class/nn4nlp2017/ Two Types of Linguistic Structure Dependency: focus on relations between words
More informationForest-based Translation Rule Extraction
Forest-based Translation Rule Extraction Haitao Mi 1 1 Key Lab. of Intelligent Information Processing Institute of Computing Technology Chinese Academy of Sciences P.O. Box 2704, Beijing 100190, China
More informationLeft-to-Right Tree-to-String Decoding with Prediction
Left-to-Right Tree-to-String Decoding with Prediction Yang Feng Yang Liu Qun Liu Trevor Cohn Department of Computer Science The University of Sheffield, Sheffield, UK {y.feng, t.cohn}@sheffield.ac.uk State
More informationImproving Tree-to-Tree Translation with Packed Forests
Improving Tree-to-Tree Translation with Packed Forests Yang Liu and Yajuan Lü and Qun Liu Key Laboratory of Intelligent Information Processing Institute of Computing Technology Chinese Academy of Sciences
More informationStatistical parsing. Fei Xia Feb 27, 2009 CSE 590A
Statistical parsing Fei Xia Feb 27, 2009 CSE 590A Statistical parsing History-based models (1995-2000) Recent development (2000-present): Supervised learning: reranking and label splitting Semi-supervised
More informationBetter k-best Parsing
Better k-best Parsing Liang Huang Dept. of Computer & Information Science University of Pennsylvania 3330 Walnut Street Philadelphia, PA 19104 lhuang3@cis.upenn.edu David Chiang Inst. for Advanced Computer
More informationIterative CKY parsing for Probabilistic Context-Free Grammars
Iterative CKY parsing for Probabilistic Context-Free Grammars Yoshimasa Tsuruoka and Jun ichi Tsujii Department of Computer Science, University of Tokyo Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033 CREST, JST
More informationCMU-UKA Syntax Augmented Machine Translation
Outline CMU-UKA Syntax Augmented Machine Translation Ashish Venugopal, Andreas Zollmann, Stephan Vogel, Alex Waibel InterACT, LTI, Carnegie Mellon University Pittsburgh, PA Outline Outline 1 2 3 4 Issues
More informationNLP in practice, an example: Semantic Role Labeling
NLP in practice, an example: Semantic Role Labeling Anders Björkelund Lund University, Dept. of Computer Science anders.bjorkelund@cs.lth.se October 15, 2010 Anders Björkelund NLP in practice, an example:
More informationTuning. Philipp Koehn presented by Gaurav Kumar. 28 September 2017
Tuning Philipp Koehn presented by Gaurav Kumar 28 September 2017 The Story so Far: Generative Models 1 The definition of translation probability follows a mathematical derivation argmax e p(e f) = argmax
More informationAdmin PARSING. Backoff models: absolute discounting. Backoff models: absolute discounting 3/4/11. What is (xy)?
Admin Updated slides/examples on backoff with absolute discounting (I ll review them again here today) Assignment 2 Watson vs. Humans (tonight-wednesday) PARING David Kauchak C159 pring 2011 some slides
More informationTopics in Parsing: Context and Markovization; Dependency Parsing. COMP-599 Oct 17, 2016
Topics in Parsing: Context and Markovization; Dependency Parsing COMP-599 Oct 17, 2016 Outline Review Incorporating context Markovization Learning the context Dependency parsing Eisner s algorithm 2 Review
More informationForest Reranking for Machine Translation with the Perceptron Algorithm
Forest Reranking for Machine Translation with the Perceptron Algorithm Zhifei Li and Sanjeev Khudanpur Center for Language and Speech Processing and Department of Computer Science Johns Hopkins University,
More informationConstituency to Dependency Translation with Forests
Constituency to Dependency Translation with Forests Haitao Mi and Qun Liu Key Laboratory of Intelligent Information Processing Institute of Computing Technology Chinese Academy of Sciences P.O. Box 2704,
More informationAgenda for today. Homework questions, issues? Non-projective dependencies Spanning tree algorithm for non-projective parsing
Agenda for today Homework questions, issues? Non-projective dependencies Spanning tree algorithm for non-projective parsing 1 Projective vs non-projective dependencies If we extract dependencies from trees,
More informationJoint Decoding with Multiple Translation Models
Joint Decoding with Multiple Translation Models Yang Liu, Haitao Mi, Yang Feng, and Qun Liu Institute of Computing Technology, Chinese Academy of ciences {yliu,htmi,fengyang,liuqun}@ict.ac.cn 8/10/2009
More informationLarge-Scale Syntactic Processing: Parsing the Web. JHU 2009 Summer Research Workshop
Large-Scale Syntactic Processing: JHU 2009 Summer Research Workshop Intro CCG parser Tasks 2 The Team Stephen Clark (Cambridge, UK) Ann Copestake (Cambridge, UK) James Curran (Sydney, Australia) Byung-Gyu
More informationNTT SMT System for IWSLT Katsuhito Sudoh, Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki NTT Communication Science Labs.
NTT SMT System for IWSLT 2008 Katsuhito Sudoh, Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki NTT Communication Science Labs., Japan Overview 2-stage translation system k-best translation
More informationWord Lattice Reranking for Chinese Word Segmentation and Part-of-Speech Tagging
Word Lattice Reranking for Chinese Word Segmentation and Part-of-Speech Tagging Wenbin Jiang Haitao Mi Qun Liu Key Lab. of Intelligent Information Processing Institute of Computing Technology Chinese Academy
More informationAssignment 4 CSE 517: Natural Language Processing
Assignment 4 CSE 517: Natural Language Processing University of Washington Winter 2016 Due: March 2, 2016, 1:30 pm 1 HMMs and PCFGs Here s the definition of a PCFG given in class on 2/17: A finite set
More informationMachine Translation with Lattices and Forests
Machine Translation with Lattices and Forests Haitao Mi Liang Huang Qun Liu Key Lab. of Intelligent Information Processing Information Sciences Institute Institute of Computing Technology Viterbi School
More informationBetter Synchronous Binarization for Machine Translation
Better Synchronous Binarization for Machine Translation Tong Xiao *, Mu Li +, Dongdong Zhang +, Jingbo Zhu *, Ming Zhou + * Natural Language Processing Lab Northeastern University Shenyang, China, 110004
More informationA General-Purpose Rule Extractor for SCFG-Based Machine Translation
A General-Purpose Rule Extractor for SCFG-Based Machine Translation Greg Hanneman, Michelle Burroughs, and Alon Lavie Language Technologies Institute Carnegie Mellon University Fifth Workshop on Syntax
More informationCKY algorithm / PCFGs
CKY algorithm / PCFGs CS 585, Fall 2018 Introduction to Natural Language Processing http://people.cs.umass.edu/~miyyer/cs585/ Mohit Iyyer College of Information and Computer Sciences University of Massachusetts
More informationEfficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices
Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices Shankar Kumar 1 and Wolfgang Macherey 1 and Chris Dyer 2 and Franz Och 1 1 Google Inc. 1600
More informationWhat is Parsing? NP Det N. Det. NP Papa N caviar NP NP PP S NP VP. N spoon VP V NP VP VP PP. V spoon V ate PP P NP. P with.
Parsing What is Parsing? S NP VP NP Det N S NP Papa N caviar NP NP PP N spoon VP V NP VP VP PP NP VP V spoon V ate PP P NP P with VP PP Det the Det a V NP P NP Det N Det N Papa ate the caviar with a spoon
More informationDependency Parsing 2 CMSC 723 / LING 723 / INST 725. Marine Carpuat. Fig credits: Joakim Nivre, Dan Jurafsky & James Martin
Dependency Parsing 2 CMSC 723 / LING 723 / INST 725 Marine Carpuat Fig credits: Joakim Nivre, Dan Jurafsky & James Martin Dependency Parsing Formalizing dependency trees Transition-based dependency parsing
More informationJoint Decoding with Multiple Translation Models
Joint Decoding with Multiple Translation Models Yang Liu and Haitao Mi and Yang Feng and Qun Liu Key Laboratory of Intelligent Information Processing Institute of Computing Technology Chinese Academy of
More informationAdvanced PCFG Parsing
Advanced PCFG Parsing BM1 Advanced atural Language Processing Alexander Koller 4 December 2015 Today Agenda-based semiring parsing with parsing schemata. Pruning techniques for chart parsing. Discriminative
More informationDynamic Feature Selection for Dependency Parsing
Dynamic Feature Selection for Dependency Parsing He He, Hal Daumé III and Jason Eisner EMNLP 2013, Seattle Structured Prediction in NLP Part-of-Speech Tagging Parsing N N V Det N Fruit flies like a banana
More informationA Primer on Graph Processing with Bolinas
A Primer on Graph Processing with Bolinas J. Andreas, D. Bauer, K. M. Hermann, B. Jones, K. Knight, and D. Chiang August 20, 2013 1 Introduction This is a tutorial introduction to the Bolinas graph processing
More informationThe CKY Parsing Algorithm and PCFGs. COMP-550 Oct 12, 2017
The CKY Parsing Algorithm and PCFGs COMP-550 Oct 12, 2017 Announcements I m out of town next week: Tuesday lecture: Lexical semantics, by TA Jad Kabbara Thursday lecture: Guest lecture by Prof. Timothy
More informationOn LM Heuristics for the Cube Growing Algorithm
On LM Heuristics for the Cube Growing Algorithm David Vilar and Hermann Ney Lehrstuhl für Informatik 6 RWTH Aachen University 52056 Aachen, Germany {vilar,ney}@informatik.rwth-aachen.de Abstract Current
More informationForest-based Neural Machine Translation. Chunpeng Ma, Akihiro Tamura, Masao Utiyama, Tiejun Zhao, EiichiroSumita
Forest-based Neural Machine Translation Chunpeng Ma, Akihiro Tamura, Masao Utiyama, Tiejun Zhao, EiichiroSumita Motivation Key point: Syntactic Information To use or not to use? string-to-string model
More informationThe Expectation Maximization (EM) Algorithm
The Expectation Maximization (EM) Algorithm continued! 600.465 - Intro to NLP - J. Eisner 1 General Idea Start by devising a noisy channel Any model that predicts the corpus observations via some hidden
More informationCS 224N Assignment 2 Writeup
CS 224N Assignment 2 Writeup Angela Gong agong@stanford.edu Dept. of Computer Science Allen Nie anie@stanford.edu Symbolic Systems Program 1 Introduction 1.1 PCFG A probabilistic context-free grammar (PCFG)
More informationTransition-Based Dependency Parsing with Stack Long Short-Term Memory
Transition-Based Dependency Parsing with Stack Long Short-Term Memory Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, Noah A. Smith Association for Computational Linguistics (ACL), 2015 Presented
More informationDependency Hashing for n-best CCG Parsing
Dependency Hashing for n-best CCG Parsing Dominick g and James R. Curran -lab, School of Information Technologies University of Sydney SW, 2006, Australia {dominick.ng,james.r.curran}@sydney.edu.au e Abstract
More informationDiscriminative Parse Reranking for Chinese with Homogeneous and Heterogeneous Annotations
Discriminative Parse Reranking for Chinese with Homogeneous and Heterogeneous Annotations Weiwei Sun and Rui Wang and Yi Zhang Department of Computational Linguistics, Saarland University German Research
More informationTHE CUED NIST 2009 ARABIC-ENGLISH SMT SYSTEM
THE CUED NIST 2009 ARABIC-ENGLISH SMT SYSTEM Adrià de Gispert, Gonzalo Iglesias, Graeme Blackwood, Jamie Brunning, Bill Byrne NIST Open MT 2009 Evaluation Workshop Ottawa, The CUED SMT system Lattice-based
More informationAT&T: The Tag&Parse Approach to Semantic Parsing of Robot Spatial Commands
AT&T: The Tag&Parse Approach to Semantic Parsing of Robot Spatial Commands Svetlana Stoyanchev, Hyuckchul Jung, John Chen, Srinivas Bangalore AT&T Labs Research 1 AT&T Way Bedminster NJ 07921 {sveta,hjung,jchen,srini}@research.att.com
More informationIncremental Integer Linear Programming for Non-projective Dependency Parsing
Incremental Integer Linear Programming for Non-projective Dependency Parsing Sebastian Riedel James Clarke ICCS, University of Edinburgh 22. July 2006 EMNLP 2006 S. Riedel, J. Clarke (ICCS, Edinburgh)
More informationProbabilistic Parsing of Mathematics
Probabilistic Parsing of Mathematics Jiří Vyskočil Josef Urban Czech Technical University in Prague, Czech Republic Cezary Kaliszyk University of Innsbruck, Austria Outline Why and why not current formal
More informationDiscriminative Training with Perceptron Algorithm for POS Tagging Task
Discriminative Training with Perceptron Algorithm for POS Tagging Task Mahsa Yarmohammadi Center for Spoken Language Understanding Oregon Health & Science University Portland, Oregon yarmoham@ohsu.edu
More informationTransition-based Parsing with Neural Nets
CS11-747 Neural Networks for NLP Transition-based Parsing with Neural Nets Graham Neubig Site https://phontron.com/class/nn4nlp2017/ Two Types of Linguistic Structure Dependency: focus on relations between
More informationCS395T Project 2: Shift-Reduce Parsing
CS395T Project 2: Shift-Reduce Parsing Due date: Tuesday, October 17 at 9:30am In this project you ll implement a shift-reduce parser. First you ll implement a greedy model, then you ll extend that model
More informationLING/C SC/PSYC 438/538. Lecture 3 Sandiway Fong
LING/C SC/PSYC 438/538 Lecture 3 Sandiway Fong Today s Topics Homework 4 out due next Tuesday by midnight Homework 3 should have been submitted yesterday Quick Homework 3 review Continue with Perl intro
More informationFast(er) Exact Decoding and Global Training for Transition-Based Dependency Parsing via a Minimal Feature Set
Fast(er) Exact Decoding and Global Training for Transition-Based Dependency Parsing via a Minimal Feature Set Tianze Shi* Liang Huang Lillian Lee* * Cornell University Oregon State University O n 3 O n
More informationStatistical Machine Translation Part IV Log-Linear Models
Statistical Machine Translation art IV Log-Linear Models Alexander Fraser Institute for Natural Language rocessing University of Stuttgart 2011.11.25 Seminar: Statistical MT Where we have been We have
More informationINF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS. Jan Tore Lønning, Lecture 8, 12 Oct
1 INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS Jan Tore Lønning, Lecture 8, 12 Oct. 2016 jtl@ifi.uio.no Today 2 Preparing bitext Parameter tuning Reranking Some linguistic issues STMT so far 3 We
More informationAdvanced PCFG Parsing
Advanced PCFG Parsing Computational Linguistics Alexander Koller 8 December 2017 Today Parsing schemata and agenda-based parsing. Semiring parsing. Pruning techniques for chart parsing. The CKY Algorithm
More informationHierarchical Phrase-Based Translation with WFSTs. Weighted Finite State Transducers
Hierarchical Phrase-Based Translation with Weighted Finite State Transducers Gonzalo Iglesias 1 Adrià de Gispert 2 Eduardo R. Banga 1 William Byrne 2 1 Department of Signal Processing and Communications
More informationCoarse-to-fine n-best parsing and MaxEnt discriminative reranking
Coarse-to-fine n-best parsing and MaxEnt discriminative reranking Eugene Charniak and Mark Johnson Brown Laboratory for Linguistic Information Processing (BLLIP) Brown University Providence, RI 02912 {mj
More informationDependency grammar and dependency parsing
Dependency grammar and dependency parsing Syntactic analysis (5LN455) 2014-12-10 Sara Stymne Department of Linguistics and Philology Based on slides from Marco Kuhlmann Mid-course evaluation Mostly positive
More informationDependency grammar and dependency parsing
Dependency grammar and dependency parsing Syntactic analysis (5LN455) 2015-12-09 Sara Stymne Department of Linguistics and Philology Based on slides from Marco Kuhlmann Activities - dependency parsing
More informationDependency grammar and dependency parsing
Dependency grammar and dependency parsing Syntactic analysis (5LN455) 2016-12-05 Sara Stymne Department of Linguistics and Philology Based on slides from Marco Kuhlmann Activities - dependency parsing
More informationWord Graphs for Statistical Machine Translation
Word Graphs for Statistical Machine Translation Richard Zens and Hermann Ney Chair of Computer Science VI RWTH Aachen University {zens,ney}@cs.rwth-aachen.de Abstract Word graphs have various applications
More informationTekniker för storskalig parsning: Dependensparsning 2
Tekniker för storskalig parsning: Dependensparsning 2 Joakim Nivre Uppsala Universitet Institutionen för lingvistik och filologi joakim.nivre@lingfil.uu.se Dependensparsning 2 1(45) Data-Driven Dependency
More informationSparse Feature Learning
Sparse Feature Learning Philipp Koehn 1 March 2016 Multiple Component Models 1 Translation Model Language Model Reordering Model Component Weights 2 Language Model.05 Translation Model.26.04.19.1 Reordering
More informationVoting between Multiple Data Representations for Text Chunking
Voting between Multiple Data Representations for Text Chunking Hong Shen and Anoop Sarkar School of Computing Science Simon Fraser University Burnaby, BC V5A 1S6, Canada {hshen,anoop}@cs.sfu.ca Abstract.
More informationBetter Word Alignment with Supervised ITG Models. Aria Haghighi, John Blitzer, John DeNero, and Dan Klein UC Berkeley
Better Word Alignment with Supervised ITG Models Aria Haghighi, John Blitzer, John DeNero, and Dan Klein UC Berkeley Word Alignment s Indonesia parliament court in arraigned speaker Word Alignment s Indonesia
More informationLing/CSE 472: Introduction to Computational Linguistics. 5/4/17 Parsing
Ling/CSE 472: Introduction to Computational Linguistics 5/4/17 Parsing Reminders Revised project plan due tomorrow Assignment 4 is available Overview Syntax v. parsing Earley CKY (briefly) Chart parsing
More informationParsing in Parallel on Multiple Cores and GPUs
1/28 Parsing in Parallel on Multiple Cores and GPUs Mark Johnson Centre for Language Sciences and Department of Computing Macquarie University ALTA workshop December 2011 Why parse in parallel? 2/28 The
More informationDiscriminative Training for Phrase-Based Machine Translation
Discriminative Training for Phrase-Based Machine Translation Abhishek Arun 19 April 2007 Overview 1 Evolution from generative to discriminative models Discriminative training Model Learning schemes Featured
More informationOptimal Incremental Parsing via Best-First Dynamic Programming
Optimal Incremental Parsing via Best-First Dynamic Programming Kai Zhao 1 James Cross 1 1 Graduate Center City University of New York 365 Fifth Avenue, New York, NY 10016 {kzhao,jcross}@gc.cuny.edu Liang
More informationCS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers
CS 4120 Lecture 31 Interprocedural analysis, fixed-point algorithms 9 November 2011 Lecturer: Andrew Myers These notes are not yet complete. 1 Interprocedural analysis Some analyses are not sufficiently
More informationLearning Latent Linguistic Structure to Optimize End Tasks. David A. Smith with Jason Naradowsky and Xiaoye Tiger Wu
Learning Latent Linguistic Structure to Optimize End Tasks David A. Smith with Jason Naradowsky and Xiaoye Tiger Wu 12 October 2012 Learning Latent Linguistic Structure to Optimize End Tasks David A. Smith
More informationI Know Your Name: Named Entity Recognition and Structural Parsing
I Know Your Name: Named Entity Recognition and Structural Parsing David Philipson and Nikil Viswanathan {pdavid2, nikil}@stanford.edu CS224N Fall 2011 Introduction In this project, we explore a Maximum
More informationGraph-Based Parsing. Miguel Ballesteros. Algorithms for NLP Course. 7-11
Graph-Based Parsing Miguel Ballesteros Algorithms for NLP Course. 7-11 By using some Joakim Nivre's materials from Uppsala University and Jason Eisner's material from Johns Hopkins University. Outline
More informationProbabilistic parsing with a wide variety of features
Probabilistic parsing with a wide variety of features Mark Johnson Brown University IJCNLP, March 2004 Joint work with Eugene Charniak (Brown) and Michael Collins (MIT) upported by NF grants LI 9720368
More informationNatural Language Dependency Parsing. SPFLODD October 13, 2011
Natural Language Dependency Parsing SPFLODD October 13, 2011 Quick Review from Tuesday What are CFGs? PCFGs? Weighted CFGs? StaKsKcal parsing using the Penn Treebank What it looks like How to build a simple
More informationA Quick Guide to MaltParser Optimization
A Quick Guide to MaltParser Optimization Joakim Nivre Johan Hall 1 Introduction MaltParser is a system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data
More informationFast Generation of Translation Forest for Large-Scale SMT Discriminative Training
Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training Xinyan Xiao, Yang Liu, Qun Liu, and Shouxun Lin Key Laboratory of Intelligent Information Processing Institute of Computing
More informationSyntax and Grammars 1 / 21
Syntax and Grammars 1 / 21 Outline What is a language? Abstract syntax and grammars Abstract syntax vs. concrete syntax Encoding grammars as Haskell data types What is a language? 2 / 21 What is a language?
More informationQuery Lattice for Translation Retrieval
Query Lattice for Translation Retrieval Meiping Dong, Yong Cheng, Yang Liu, Jia Xu, Maosong Sun, Tatsuya Izuha, Jie Hao # State Key Laboratory of Intelligent Technology and Systems Tsinghua National Laboratory
More informationNatural Language Processing
Natural Language Processing Classification III Dan Klein UC Berkeley 1 Classification 2 Linear Models: Perceptron The perceptron algorithm Iteratively processes the training set, reacting to training errors
More informationThe CKY Parsing Algorithm and PCFGs. COMP-599 Oct 12, 2016
The CKY Parsing Algorithm and PCFGs COMP-599 Oct 12, 2016 Outline CYK parsing PCFGs Probabilistic CYK parsing 2 CFGs and Constituent Trees Rules/productions: S VP this VP V V is rules jumps rocks Trees:
More informationLet s get parsing! Each component processes the Doc object, then passes it on. doc.is_parsed attribute checks whether a Doc object has been parsed
Let s get parsing! SpaCy default model includes tagger, parser and entity recognizer nlp = spacy.load('en ) tells spacy to use "en" with ["tagger", "parser", "ner"] Each component processes the Doc object,
More informationCSE100. Advanced Data Structures. Lecture 12. (Based on Paul Kube course materials)
CSE100 Advanced Data Structures Lecture 12 (Based on Paul Kube course materials) CSE 100 Coding and decoding with a Huffman coding tree Huffman coding tree implementation issues Priority queues and priority
More informationBasic Parsing with Context-Free Grammars. Some slides adapted from Karl Stratos and from Chris Manning
Basic Parsing with Context-Free Grammars Some slides adapted from Karl Stratos and from Chris Manning 1 Announcements HW 2 out Midterm on 10/19 (see website). Sample ques>ons will be provided. Sign up
More informationCube Pruning as Heuristic Search
Cube Pruning as Heuristic Search Mark Hopkins and Greg Langmead Language Weaver, Inc. 4640 Admiralty Way, Suite 1210 Marina del Rey, CA 90292 {mhopkins,glangmead}@languageweaver.com Abstract Cube pruning
More information22nd International Conference on Computational Linguistics
Coling 2008 22nd International Conference on Computational Linguistics Advanced Dynamic Programming in Computational Linguistics: Theory, Algorithms and Applications Tutorial notes Liang Huang Department
More informationPrinciples of Programming Languages COMP251: Syntax and Grammars
Principles of Programming Languages COMP251: Syntax and Grammars Prof. Dekai Wu Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong, China Fall 2006
More informationOnline Graph Planarisation for Synchronous Parsing of Semantic and Syntactic Dependencies
Online Graph Planarisation for Synchronous Parsing of Semantic and Syntactic Dependencies Ivan Titov University of Illinois at Urbana-Champaign James Henderson, Paola Merlo, Gabriele Musillo University
More informationLing 571: Deep Processing for Natural Language Processing
Ling 571: Deep Processing for Natural Language Processing Julie Medero February 4, 2013 Today s Plan Assignment Check-in Project 1 Wrap-up CKY Implementations HW2 FAQs: evalb Features & Unification Project
More informationAdvanced Dynamic Programming in Semiring and Hypergraph Frameworks
Advanced Dynamic Programming in Semiring and Hypergraph Frameworks Liang Huang Department of Computer and Information Science University of Pennsylvania lhuang3@cis.upenn.edu July 15, 2008 Abstract Dynamic
More informationDependency Parsing CMSC 723 / LING 723 / INST 725. Marine Carpuat. Fig credits: Joakim Nivre, Dan Jurafsky & James Martin
Dependency Parsing CMSC 723 / LING 723 / INST 725 Marine Carpuat Fig credits: Joakim Nivre, Dan Jurafsky & James Martin Dependency Parsing Formalizing dependency trees Transition-based dependency parsing
More informationComparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation
Comparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation Markus Dreyer, Keith Hall, and Sanjeev Khudanpur Center for Language and Speech Processing Johns Hopkins University 300
More informationEfficient Parallelization of Natural Language Applications using GPUs
Efficient Parallelization of Natural Language Applications using GPUs Chao-Yue Lai Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2012-54
More informationAnalysis of Algorithms
Algorithm An algorithm is a procedure or formula for solving a problem, based on conducting a sequence of specified actions. A computer program can be viewed as an elaborate algorithm. In mathematics and
More informationStructured Perceptron. Ye Qiu, Xinghui Lu, Yue Lu, Ruofei Shen
Structured Perceptron Ye Qiu, Xinghui Lu, Yue Lu, Ruofei Shen 1 Outline 1. 2. 3. 4. Brief review of perceptron Structured Perceptron Discriminative Training Methods for Hidden Markov Models: Theory and
More informationTransition-Based Parsing of the Chinese Treebank using a Global Discriminative Model
Transition-Based Parsing of the Chinese Treebank using a Global Discriminative Model Yue Zhang Oxford University Computing Laboratory yue.zhang@comlab.ox.ac.uk Stephen Clark Cambridge University Computer
More informationCSE 431/531: Algorithm Analysis and Design (Spring 2018) Greedy Algorithms. Lecturer: Shi Li
CSE 431/531: Algorithm Analysis and Design (Spring 2018) Greedy Algorithms Lecturer: Shi Li Department of Computer Science and Engineering University at Buffalo Main Goal of Algorithm Design Design fast
More informationSequence Labeling: The Problem
Sequence Labeling: The Problem Given a sequence (in NLP, words), assign appropriate labels to each word. For example, POS tagging: DT NN VBD IN DT NN. The cat sat on the mat. 36 part-of-speech tags used
More informationSpanning Tree Methods for Discriminative Training of Dependency Parsers
University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science January 2006 Spanning Tree Methods for Discriminative Training of Dependency Parsers Ryan
More informationEasy-First POS Tagging and Dependency Parsing with Beam Search
Easy-First POS Tagging and Dependency Parsing with Beam Search Ji Ma JingboZhu Tong Xiao Nan Yang Natrual Language Processing Lab., Northeastern University, Shenyang, China MOE-MS Key Lab of MCC, University
More informationIntroduction to Data-Driven Dependency Parsing
Introduction to Data-Driven Dependency Parsing Introductory Course, ESSLLI 2007 Ryan McDonald 1 Joakim Nivre 2 1 Google Inc., New York, USA E-mail: ryanmcd@google.com 2 Uppsala University and Växjö University,
More information