Transition-Based Dependency Parsing with MaltParser
Joakim Nivre
Uppsala University and Växjö University
Outline

- Goals of the workshop
- Transition-based dependency parsing...
  - Transition systems
  - Scoring functions
  - Search algorithms
- ... with MaltParser
  - Parsing algorithm = transition system + search algorithm
  - Guide = scoring function
Goals of the Workshop

- Background:
  - OSDT meeting in Copenhagen
  - Subgroup interested in using MaltParser as a research platform
- Goals:
  - Enable participants to use MaltParser
  - Enable participants to modify MaltParser
  - Establish desiderata for future versions of MaltParser
- Expectations from participants?
Program

- Thursday morning:
  - Introduction: Transition-based parsing with MaltParser (Nivre)
  - MaltParser: Architecture, components and interfaces (Hall)
- Thursday afternoon:
  - Using MaltParser with built-in options (Nivre)
  - Extending MaltParser with plugins (Hall)
- Friday morning:
  - Building applications with MaltParser (Hall)
  - Challenges in using parsers at Google (Ringgaard)
- Friday afternoon:
  - Free for discussions, planning, etc.
Dependency Parsing

- Task definition: Map a sentence x = (w_1, ..., w_n) to a dependency graph G = (V, A), where
  1. V = {0, 1, ..., n} is a set of nodes (one for each word w_i, plus the artificial root 0),
  2. A ⊆ V × L × V is a set of labeled arcs (over label set L).
- We normally require G to be a directed tree rooted at 0.
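To make the definition concrete, here is a minimal Python sketch (not MaltParser code) of a dependency graph G = (V, A); the example sentence and the arc labels are invented for illustration.

```python
# A dependency graph G = (V, A): nodes are integer indices (0 is the artificial
# root), arcs are labeled triples (head, label, dependent) over a label set L.
sentence = ["Economic", "news", "had", "little", "effect"]   # w_1 ... w_n (invented)

V = set(range(len(sentence) + 1))   # {0, 1, ..., n}; node 0 is the root
A = {
    (2, "nmod", 1),   # "Economic" modifies "news"
    (3, "sbj",  2),   # "news" is the subject of "had"
    (0, "root", 3),   # "had" attaches to the artificial root
    (5, "nmod", 4),   # "little" modifies "effect"
    (3, "obj",  5),   # "effect" is the object of "had"
}

# Well-formedness check: G is a directed tree rooted at 0, so every node
# except 0 has exactly one head.
assert all(sum(1 for (h, l, d) in A if d == i) == 1 for i in V - {0})
```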
Transition-Based Dependency Parsing

- A transition system S = (C, T, c_s, C_t), where
  1. C is a set of configurations, each of which contains a buffer β of (remaining) nodes and a set A of dependency arcs,
  2. T is a set of transitions, each of which is a (partial) function t : C → C,
  3. c_s is an initialization function, mapping a sentence x = (w_1, ..., w_n) to a configuration with β = [1, ..., n],
  4. C_t ⊆ C is a set of terminal configurations.
- A scoring function λ : C × T → R, which assigns a real-valued score λ(c, t) to each transition t out of a configuration c.
- A search algorithm h(S, λ, x) for finding the optimal transition sequence C_{0,m} = c_0, ..., c_m (with c_0 = c_s(x) and c_m ∈ C_t) for sentence x in system S relative to the scoring function λ.
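As a rough sketch of these three components in Python (illustrative types only, not MaltParser's Java interfaces; following the arc-standard example on the next slide, a configuration is assumed to also carry a stack):

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Arc = Tuple[int, str, int]                       # (head, label, dependent)

@dataclass
class Configuration:
    stack: List[int]                             # sigma (used by stack-based systems)
    buffer: List[int]                            # beta: remaining input nodes
    arcs: Set[Arc] = field(default_factory=set)  # A: dependency arcs built so far

Transition = Callable[[Configuration], Configuration]   # t : C -> C (partial)
Scorer = Callable[[Configuration, Transition], float]   # lambda : C x T -> R

def c_s(n: int) -> Configuration:
    """Initialization: map a sentence of length n to a configuration with beta = [1, ..., n]."""
    return Configuration(stack=[0], buffer=list(range(1, n + 1)))

def is_terminal(c: Configuration) -> bool:
    """Membership in C_t; here taken to be 'buffer empty', as in the example system below."""
    return not c.buffer
```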
Example: Transition System

- Arc-standard shift-reduce parsing:
  - C = {(σ, β, A) | σ is a stack, β is a buffer, A is an arc set}
  - T = {Shift, LeftArc_l, RightArc_l}, where
    1. Shift: (σ, i|β, A) ⇒ (σ|i, β, A)
    2. LeftArc_l: (σ|i, j|β, A) ⇒ (σ, j|β, A ∪ {(j, l, i)})
    3. RightArc_l: (σ|i, j|β, A) ⇒ (σ, i|β, A ∪ {(i, l, j)})
  - c_s(x = (w_1, ..., w_n)) = ([0], [1, ..., n], ∅)
  - C_t = {(σ, β, A) ∈ C | β = []}
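A minimal sketch of the three arc-standard transitions on plain (σ, β, A) triples (illustrative Python, not MaltParser code); each function returns a new configuration exactly as specified above.

```python
# A configuration is a triple (sigma, beta, arcs):
#   sigma: list of node indices used as a stack (top = last element)
#   beta:  list of remaining input nodes (front = first element)
#   arcs:  set of labeled arcs (head, label, dependent)

def shift(config):
    """Shift: (sigma, i|beta, A) => (sigma|i, beta, A)"""
    sigma, beta, arcs = config
    return (sigma + [beta[0]], beta[1:], arcs)

def left_arc(config, label):
    """LeftArc_l: (sigma|i, j|beta, A) => (sigma, j|beta, A u {(j, l, i)})"""
    sigma, beta, arcs = config
    i, j = sigma[-1], beta[0]
    return (sigma[:-1], beta, arcs | {(j, label, i)})

def right_arc(config, label):
    """RightArc_l: (sigma|i, j|beta, A) => (sigma, i|beta, A u {(i, l, j)})"""
    sigma, beta, arcs = config
    i, j = sigma[-1], beta[0]
    return (sigma[:-1], [i] + beta[1:], arcs | {(i, label, j)})

def c_s(n):
    """c_s(x) = ([0], [1, ..., n], {})"""
    return ([0], list(range(1, n + 1)), set())

def terminal(config):
    """C_t: configurations with an empty buffer."""
    return not config[1]
```

Note that both LeftArc_l and RightArc_l pop the top of the stack, and RightArc_l puts the head i back at the front of the buffer, so a node only receives its own head after it has collected all of its dependents.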
Example: Scoring Function

- Feature-based classification: λ(c, t) = g(Φ(c, t)), where
  1. Φ : C × T → R^k is a feature model, which maps each pair (c, t) to a k-dimensional feature vector Φ(c, t),
  2. g : R^k → R is a (generalized) linear classifier, which maps a feature vector Φ(c, t) to a score in the interval [−1, 1].
- Classifier training:
  - Training instances (c, t) are derived from treebank data.
  - Supervised learning using support vector machines with kernels.
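A simplified sketch of such a scoring function (a plain linear model over binary features rather than MaltParser's SVMs with kernels; the feature templates below are typical stack/buffer word and tag features, chosen here for illustration):

```python
def phi(config, transition_name, words, tags):
    """Feature model Phi: map a (configuration, transition) pair to a sparse
    binary feature vector, represented as a set of feature strings.
    `words` and `tags` are indexed by node id, with index 0 for the root."""
    sigma, beta, _ = config
    feats = set()
    if sigma:
        s0 = sigma[-1]                       # top of the stack
        feats.add(f"{transition_name}|s0.word={words[s0]}")
        feats.add(f"{transition_name}|s0.tag={tags[s0]}")
    if beta:
        b0 = beta[0]                         # front of the buffer
        feats.add(f"{transition_name}|b0.word={words[b0]}")
        feats.add(f"{transition_name}|b0.tag={tags[b0]}")
    return feats

def g(weights, feats):
    """Classifier g: a linear score (dot product with a weight vector)."""
    return sum(weights.get(f, 0.0) for f in feats)

def score(weights, config, transition_name, words, tags):
    """lambda(c, t) = g(Phi(c, t))"""
    return g(weights, phi(config, transition_name, words, tags))
```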
Example: Search Algorithm

- Greedy, deterministic search:

  h(S, λ, x)
      c ← c_s(x)
      while c ∉ C_t
          t ← argmax_t′ λ(c, t′)
          c ← t(c)
      return G = ({0, 1, ..., n}, A_c)
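Putting the pieces together, a self-contained sketch of the greedy loop over the arc-standard system (illustrative Python, not MaltParser's implementation; the scorer is passed in as a function, and the permissibility checks are the usual ones, e.g. node 0 never receives a head):

```python
def greedy_parse(n, score):
    """h(S, lambda, x): starting from c_s(x), repeatedly apply the
    highest-scoring permissible transition until the buffer is empty,
    then return G = ({0, ..., n}, A_c)."""
    sigma, beta, arcs = [0], list(range(1, n + 1)), set()     # c <- c_s(x)
    while beta:                                               # while c not in C_t
        candidates = ["shift"]
        if sigma and sigma[-1] != 0:                          # node 0 never gets a head
            candidates.append("left_arc")
        if sigma:
            candidates.append("right_arc")
        c = (sigma, beta, arcs)
        t = max(candidates, key=lambda name: score(c, name))  # t <- argmax_t' lambda(c, t')
        if t == "shift":                                      # c <- t(c)
            sigma, beta = sigma + [beta[0]], beta[1:]
        elif t == "left_arc":
            sigma, arcs = sigma[:-1], arcs | {(beta[0], "dep", sigma[-1])}
        else:                                                 # right_arc
            i, j = sigma[-1], beta[0]
            sigma, beta, arcs = sigma[:-1], [i] + beta[1:], arcs | {(i, "dep", j)}
    return set(range(n + 1)), arcs                            # G = ({0, ..., n}, A_c)

# Example run with a dummy scorer that always prefers RightArc; the label "dep"
# is a placeholder, and every word ends up attached directly to node 0:
V, A = greedy_parse(3, lambda c, name: {"right_arc": 1, "shift": 0, "left_arc": -1}[name])
```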
Variations on Transition-Based Parsing

- Alternative transition systems:
  - Arc-eager shift-reduce parsing [Nivre 2003]
  - Arc-standard shift-reduce parsing [Yamada and Matsumoto 2003]
  - Restricted non-projective parsing [Attardi 2006]
  - Unrestricted non-projective parsing [Covington 2001, Nivre 2007]
- Alternative scoring functions:
  - Support vector machines [Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Isozaki et al. 2004, Cheng et al. 2004, Sagae and Lavie 2006]
  - Memory-based learning [Attardi 2006]
  - Maximum entropy [Cheng et al. 2005, Attardi 2006]
  - Perceptron learning [Ciaramita and Attardi 2007]
- Alternative search algorithms:
  - Greedy single-pass [Nivre et al. 2004]
  - Greedy iterative [Yamada and Matsumoto 2003]
  - Beam search [Johansson and Nugues 2006, Titov and Henderson 2007]
MaltParser as a Framework

- MaltParser: a framework for transition-based dependency parsing
- Orthogonal components:
  - Transition system
  - Scoring function
  - Search algorithm
- Designed for maximum flexibility:
  - Components can be varied independently.
  - Any combination of components should work (in principle).
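A tiny sketch (illustrative Python, not MaltParser's actual interfaces) of what the orthogonality means in practice: one driver parameterized by whichever transition system, scoring function, and search algorithm are plugged in.

```python
def parse(sentence_length, transition_system, scorer, search):
    """Compose the three components: `search` explores transition sequences of
    `transition_system`, guided by `scorer`, and returns the resulting graph."""
    return search(transition_system, scorer, sentence_length)

# In principle any combination should work, e.g. (hypothetical component names):
#   parse(n, arc_standard, svm_scorer, greedy_search)
#   parse(n, arc_eager,    svm_scorer, beam_search)
```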
MaltParser: Theory and Implementation 1

- Transition systems and search algorithms:
  - In MaltParser, a transition system is (currently) merged with a particular search algorithm into a parsing algorithm.
  - As a result, transition systems and search algorithms cannot be varied independently.
- Parsing algorithms:
  - Several parsing algorithms are built into the system.
  - New parsing algorithms can be added as plugins.
MaltParser: Theory and Implementation 2

- Scoring functions:
  - In MaltParser, a scoring function is currently split into a feature model and a learner.
  - As a result, feature models and learners can be varied independently.
- Feature models:
  - Feature models are defined using a specification language over built-in feature functions.
  - New feature functions can be added as plugins.
- Learners:
  - Learners are interfaces to machine learning packages.
  - New learners can be added as plugins.
References

Giuseppe Attardi. 2006. Experiments with a multilanguage non-projective dependency parser. In Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL), pages 166–170.

Yuchang Cheng, Masayuki Asahara, and Yuji Matsumoto. 2004. Deterministic dependency structure analyzer for Chinese. In Proceedings of the First International Joint Conference on Natural Language Processing (IJCNLP), pages 500–508.

Yuchang Cheng, Masayuki Asahara, and Yuji Matsumoto. 2005. Machine learning-based dependency analyzer for Chinese. In Proceedings of the International Conference on Chinese Computing (ICCC), pages 66–73.

Massimiliano Ciaramita and Giuseppe Attardi. 2007. Dependency parsing with second-order feature maps and annotated semantic information. In Proceedings of the 10th International Conference on Parsing Technologies (IWPT), pages 133–143.

Michael A. Covington. 2001. A fundamental algorithm for dependency parsing. In Proceedings of the 39th Annual ACM Southeast Conference, pages 95–102.

Hideki Isozaki, Hideto Kazawa, and Tsutomu Hirao. 2004. A deterministic word dependency analyzer enhanced with preference learning. In Proceedings of the 20th International Conference on Computational Linguistics (COLING), pages 275–281.

Richard Johansson and Pierre Nugues. 2006. Investigating multilingual dependency parsing. In Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL), pages 206–210.

Taku Kudo and Yuji Matsumoto. 2002. Japanese dependency analysis using cascaded chunking. In Proceedings of the Sixth Workshop on Computational Language Learning (CoNLL), pages 63–69.

Joakim Nivre, Johan Hall, and Jens Nilsson. 2004. Memory-based dependency parsing. In Proceedings of the 8th Conference on Computational Natural Language Learning (CoNLL), pages 49–56.

Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 149–160.

Joakim Nivre. 2007. Incremental non-projective dependency parsing. In Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT), pages 396–403.

Kenji Sagae and Alon Lavie. 2006. Parser combination by reparsing. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, pages 129–132.

Ivan Titov and James Henderson. 2007. A latent variable model for generative dependency parsing. In Proceedings of the 10th International Conference on Parsing Technologies (IWPT), pages 144–155.

Hiroyasu Yamada and Yuji Matsumoto. 2003. Statistical dependency analysis with support vector machines. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 195–206.