Klein & Manning, NIPS 2002
1 Agenda for today

- Factoring a complex model into a product of simpler models
  - Klein & Manning factored model: dependencies and constituents
- Dual decomposition for higher-order dependency parsing
  - Refresh memory on arc-factored vs. second-order models
  - Intuitions about dual decomposition for this application
  - Step through example inference on whiteboard
  - How does it do?
- Dependencies and higher-order parsing formalisms
  - Tree Adjoining Grammars, derived trees and derivation trees
2 Klein & Manning, NIPS 2002: Fast Exact Inference with a Factored Model for Natural Language Parsing

[Figure 1 of the paper: three kinds of parse structures for "Factory payrolls fell in September": (a) PCFG structure, (b) dependency structure, (c) combined (lexicalized) structure.]

- Unlexicalized PCFG structure (linguistically motivated non-terminals)
- Lexical dependency structures
- Lexicalized PCFGs

From the paper (Section 2, A Factored Model): Generative models for parsing typically model one of the kinds of structures shown in Figure 1. Figure 1a is a plain phrase-structure tree T, which primarily models syntactic units; Figure 1b is a dependency tree D, which primarily models word-to-word selectional affinities [5]; and Figure 1c is a lexicalized phrase-structure tree L, which carries both category and (part-of-speech-tagged) head word information at each node. A lexicalized tree can be viewed as the pair L = (T, D) of a phrase-structure tree T and a dependency tree D.
3 Factored models

- Constituent and dependency parsers annotate highly correlated information
- Very competitive approach: percolate heads in the constituent parse
  - Splitting non-terminals with lexical heads helps constituent parsing
  - Bi-lexical grammars come with high-complexity inference
- Klein & Manning (2002) break the probability model into two parts
  - Let T be a constituent tree and D a dependency graph. Then

      P(T, D) = P(T) P(D)    (1)

  - Probability mass is allocated to mismatched T, D (deficient)
- Inside/outside performed separately for T, D using O(n^3) algorithms
- Joint inference for T, D is an A* search, with heuristics from inside/outside
- Intuition: simpler models and inference by factoring
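The factored score in equation (1) is just a sum of log-probabilities, with joint inference restricted to compatible (T, D) pairs. A toy sketch (all tree names, probabilities, and the compatibility set are invented for illustration):

```python
import math

# Hypothetical scores: log P(T) over constituent trees and log P(D) over
# dependency graphs for the same sentence. Values are made up.
log_p_tree = {"T1": math.log(0.6), "T2": math.log(0.4)}
log_p_deps = {"D1": math.log(0.7), "D2": math.log(0.3)}

# Only compatible (T, D) pairs are valid joint analyses; mass the factored
# model puts on mismatched pairs is what makes it deficient.
compatible = {("T1", "D1"), ("T2", "D2")}

def joint_log_score(t, d):
    # Equation (1) in log space: log P(T, D) = log P(T) + log P(D)
    return log_p_tree[t] + log_p_deps[d]

best = max(compatible, key=lambda td: joint_log_score(*td))
```

In the actual model this maximization is an A* search over the joint space, guided by inside/outside heuristics rather than brute-force enumeration.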
4 Inference decomposition

- Similar motivation to the dual decomposition we'll look at later
  - Simplifying inference for complex models
  - Some guarantees on exact inference (finding the highest score)
  - Centrally relies on the models finding similar solutions
- Current modeling approaches are more principled
- Before moving on to present dual decomposition:
  - Refresh our memory on dependency parsing and MST approaches
  - Look again at higher-order models
5 Arc-factored models

[Figure: a weighted dependency graph over "ROOT the dog bit the postman"; candidate head-dependent arcs carry individual scores (e.g., 2, 30, 30, 40), since the arc-factored model scores each arc independently.]
6 MST algorithm

[Figure: the same weighted graph with arc scores (e.g., 2, 8, 30, 40); the maximum spanning tree over these arcs gives the highest-scoring dependency parse.]
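The first step of the Chu-Liu/Edmonds MST algorithm can be sketched as follows: every word greedily takes its highest-scoring incoming arc; if the result is acyclic it is already the maximum spanning tree, otherwise cycles are contracted and the step repeats (the contraction step is omitted here). The arc scores below are invented for illustration, not the values on the slide:

```python
# Made-up arc scores (head, dependent) -> score for "the dog bit the postman"
scores = {
    ("ROOT", "bit"): 40, ("ROOT", "the"): 2, ("ROOT", "dog"): 8,
    ("bit", "the"): 30, ("bit", "dog"): 30, ("dog", "the"): 11,
    ("the", "dog"): 5,
}

def best_heads(scores, words):
    """Greedy step: best incoming arc per word, ignoring tree-ness."""
    heads = {}
    for w in words:
        candidates = {h: s for (h, d), s in scores.items() if d == w}
        heads[w] = max(candidates, key=candidates.get)
    return heads

def has_cycle(heads):
    """Follow head pointers from each word; revisiting a node before ROOT
    means the greedy graph contains a cycle to be contracted."""
    for w in heads:
        seen = {w}
        h = heads[w]
        while h != "ROOT":
            if h in seen:
                return True
            seen.add(h)
            h = heads[h]
    return False

heads = best_heads(scores, ["the", "bit", "dog"])
# With these scores the greedy graph is acyclic, so it is the MST.
```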
7 Second-order models

- In constituent parsing, information in non-terminal labels can impact accuracy
  - E.g., parent annotation, siblings under Markov grammar factorization, etc.
  - Increases the parameter space in models for disambiguation
  - Typically at a cost in terms of grammar size and parsing efficiency
- May want to increase features in an edge-factored MST parser
  - First-order features know only about the nodes being linked by a dependency
  - Second-order features would also know about adjacent dependencies
- Unfortunately, exact inference with a second-order model is NP-hard
  - Proof by reduction from another NP-hard graph problem
- Common approach in NLP: approximate inference with a rich model is often preferable to exact inference with a weaker model
8 Adjacent sibling dependencies

- Collins (1997) bi-lexical parsing model used head-dependent parameters
  - This is a constituent model with rules from a head to its dependents
  - Rules were factored (binarized) from the head out, as discussed in lecture 3
- With such factored categories, forget siblings other than the previous k:

    P(d_{l,k} ... d_{l,1}, d_{r,1} ... d_{r,j} | h) =
        P(d_{l,1} | h) ∏_{i=2}^{k} P(d_{l,i} | h, d_{l,i-1})
      · P(d_{r,1} | h) ∏_{i=2}^{j} P(d_{r,i} | h, d_{r,i-1})

- This was a generative model, but the same idea works for feature configurations
  - Factor your dependencies into adjacent siblings; define features on those
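The head-outward factorization above can be computed as a running product, conditioning each dependent on the head and the previous sibling on the same side (with None marking the start of a side). A minimal sketch; the probability table `p` would come from a trained model and is invented here:

```python
import math

def head_outward_logprob(left, right, head, p):
    """Log-probability of a head's dependents under the adjacent-sibling
    factorization. `left` and `right` list dependents from the head
    outward; `p` maps (dep, head, prev_sibling) to a probability."""
    total = 0.0
    for side in (left, right):
        prev = None  # first child on each side conditions only on the head
        for dep in side:
            total += math.log(p[(dep, head, prev)])
            prev = dep
    return total
```

For example, with one left dependent and one right dependent the result is just log P(d_l1 | h) + log P(d_r1 | h).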
9 Second-order factorization (McDonald and Pereira, 2006)

From the paper: we write s(x_i, -, x_j) when x_j is the first left or first right dependent of word x_i. For example, s(2, -, 4) is the score of creating a dependency from hit to ball, since ball is the first child to the right of hit. More formally, if the word x_{i_0} has children x_{i_1}, ..., x_{i_j} to its left and x_{i_{j+1}}, ..., x_{i_m} to its right, the score factors as follows:

    ∑_{k=1}^{j-1} s(i_0, i_{k+1}, i_k) + s(i_0, -, i_j) + s(i_0, -, i_{j+1}) + ∑_{k=j+1}^{m-1} s(i_0, i_k, i_{k+1})

[Graph and projective parsing pseudocode (their Figure 4) from McDonald and Pereira (2006) omitted.]

- Partitioning dependencies to the left and right of the head subsumes the first-order factorization, since the score function s(i, k, j) could simply ignore the middle argument to simulate first-order scoring
- Features include the closest dependency on the same side that is closer to the head
- First-order features are also used in models for second-order parsing (get them for free)
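The score factorization above can be implemented as a single pass over each side of the head: the first child on each side pairs with a null sibling "-", and every later child pairs with its adjacent inner sibling. A sketch with a hypothetical score table `s` keyed (head, sibling, dependent):

```python
def second_order_score(head, left, right, s):
    """Score of a head's dependents under the second-order (adjacent
    sibling) factorization. `left` and `right` list dependents ordered
    from the head outward; missing entries score minus infinity."""
    total = 0.0
    for side in (left, right):
        prev = "-"  # null sibling marker for the first child on each side
        for dep in side:
            total += s.get((head, prev, dep), float("-inf"))
            prev = dep
    return total
```

Ignoring the middle (sibling) argument of `s` recovers exactly the first-order, arc-factored score, which is why this factorization subsumes it.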
10 Approximate approach

- Use Eisner's algorithm to build dependencies from the head out
  - The algorithm builds left and right dependencies separately
  - Has the information for sibling features accessible
  - Cubic-complexity algorithm; further, only projective dependencies
  - Exact inference for projective dependencies
- Greedy approximation:
  - Take the projective parse from Eisner as the starting point
  - For every word, try changing its head to other words
  - Choose the new (valid) head that increases the score the most
  - If none increase the score, then done; otherwise iterate
- Best projective parse is probably not far from the best overall parse
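The greedy approximation above can be sketched as a hill-climbing loop over single head changes. `tree_score` and `is_tree` are assumed helpers (the model's global score and a well-formedness check); both names are illustrative:

```python
def hill_climb(heads, words, tree_score, is_tree):
    """Greedy post-processing: starting from an initial parse (e.g., the
    best projective parse from Eisner's algorithm), repeatedly apply the
    single head change that most improves the global score, until no
    change helps. `heads` maps each word to its current head."""
    improved = True
    while improved:
        improved = False
        best_delta, best_change = 0.0, None
        for w in words:
            for h in ["ROOT"] + [x for x in words if x != w]:
                if h == heads[w]:
                    continue
                cand = dict(heads)
                cand[w] = h
                if not is_tree(cand):
                    continue  # skip changes that break tree-ness
                delta = tree_score(cand) - tree_score(heads)
                if delta > best_delta:
                    best_delta, best_change = delta, cand
        if best_change is not None:
            heads, improved = best_change, True
    return heads
```

Because each step re-attaches only one word, the search can reach non-projective parses even though the starting point is projective.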
11 Improvement over first-order models?

- Use second-order word/word and word/POS pairs, and POS/POS/POS triples as features (as well as all of the first-order features)
- Despite approximate inference, solid accuracy gains in three languages
  - English projective dependencies:
  - Czech non-projective dependencies:
  - Danish non-projective dependencies:
  - Also allow multiple parents in Danish: 85.6
- Unsurprising: major slowdown versus edge-factored (first-order) models (dominated by the projective part)
- Might a rescoring/post-editing process on standard first-order MST work?
12 Dual decomposition (intuitions)

- General method for breaking complex problems down into smaller problems
  - Also called Lagrangian relaxation
- For example, sibling dependencies in dependency graphs
  - MST with the full-blown model is NP-hard
  - But MST with an edge-factored (first-order) model is quadratic
  - Finding the best dependents for each head under a sibling model is also quadratic (using dynamic programming), but not guaranteed to be a tree
    - If it were a tree, it would be a solution
  - If MST and sibling dynamic programming found the same solution...
- Iterative strategy for solving the two problems and comparing their solutions
  - Each term has Lagrangian multipliers that are updated at each iteration
  - Comes with formal guarantees about finding the optimum
13 Dynamic programming: best dependents for each head

[Figure: a lattice for the head "bit" over "the dog bit the postman": nodes for candidate dependents between <s> and </s>, with arcs scored s(bit, sib, dep), e.g., s(bit, -, the), s(bit, the, postman), s(bit, dog, <s>).]

- O(n) nodes in each head's graph; O(n) incoming arcs; overall O(n^3)
  - Paper claims quadratic complexity, but that is per head position
  - Can collapse states into equivalence classes (head automata)
- Similar approach for grandparent models
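The per-head lattice can be solved with a small dynamic program whose state is the last dependent taken (or "-" for none yet). A sketch for one side of one head; the score table `s` is a hypothetical (head, sibling, dependent) map, and because each head is solved independently the combined output need not form a tree:

```python
def best_dependents_one_side(head, words, s):
    """Highest-scoring set of dependents for `head` on one side under the
    sibling model. `words` are candidate dependents ordered from the head
    outward. Returns (score, chosen dependents)."""
    # state: last dependent taken -> (best score, dependents so far)
    best = {"-": (0.0, [])}
    for w in words:
        new_best = dict(best)  # option 1: skip w, states carry over
        for prev, (score, deps) in best.items():
            # option 2: take w with sibling `prev`
            cand = score + s.get((head, prev, w), float("-inf"))
            if cand > new_best.get(w, (float("-inf"),))[0]:
                new_best[w] = (cand, deps + [w])
        best = new_best
    return max(best.values())
```

With O(n) states and O(n) words this is O(n^2) per head per side, matching the "quadratic per head position" note above.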
14 Some relevant notes

- Each best head-dependent configuration is calculated independently
  - Hence words can be dependents zero or more times (a tree requires exactly once)
- The three-word scores s(head, sib, dep) can include many features
  - Arc-factored features included without increasing complexity
  - Of course, POS tags, direction and distance are important
- Even if the best solution is not a tree, hopefully it is close to a tree
  - Dual decomposition works when the two solutions are close
  - Too many iterations are required otherwise
15 Lagrangian relaxation (following Koo et al., 2010)

- Let Y be the set of possible well-formed dependency trees, and Z the set of possible head-dependent relations (not necessarily a well-formed tree)
- Let g(y) be the score according to the edge-factored model for y ∈ Y
- Let f(z) be the score according to the sibling model for z ∈ Z
- Let z(i, j) be one if i is the head of j, zero otherwise (y(i, j) similarly defined)
- Let u(i, j) be the Lagrangian multiplier for head i and dependent j

    L* = max_{z ∈ Z, y ∈ Y, z = y} [ f(z) + g(y) ]

    L(u) = max_{z ∈ Z} [ f(z) − ∑_{i,j} u(i, j) z(i, j) ] + max_{y ∈ Y} [ g(y) + ∑_{i,j} u(i, j) y(i, j) ]

- L(u) is an upper bound on L*; so search (over all u) for the minimum L(u)
  - Known as the dual problem; the objective is convex but non-differentiable
16 Inference algorithm

- Iterative algorithm; parameterized maximum number of iterations
  - Initialize the Lagrangian multipliers (u scores for arcs) to 0
  - Perform both inference tasks independently
  - If z = y, then done
  - Reward u(i, j) for arcs (i, j) in z; penalize u(i, j) for arcs (i, j) in y
  - Iterate
- Some comments on its use
  - Remember: this is an inference algorithm, not a training algorithm
  - Expensive, due to iteration
  - Works when the independent z and y solutions are close
  - Some potential speedups, e.g., caching previous solutions
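The loop on this slide can be sketched as a subgradient method in the spirit of Koo et al. (2010). `solve_tree` and `solve_sibling` are assumed black boxes: each takes a dict of per-arc score adjustments and returns a head assignment {word: head}; the fixed step size is a simplification (in practice the step is usually decayed):

```python
def dual_decompose(words, solve_tree, solve_sibling, max_iters=50, step=1.0):
    """Dual-decomposition inference sketch. If the two sub-solvers agree,
    the agreed solution comes with a certificate of optimality; otherwise
    the multipliers u are updated to push the solutions together."""
    u = {}  # Lagrangian multiplier per (head, dependent) arc; 0 by default
    heads = None
    for _ in range(max_iters):
        y = solve_tree(u)                                 # maximizes g(y) + u·y
        z = solve_sibling({a: -v for a, v in u.items()})  # maximizes f(z) − u·z
        if y == z:
            return y, True  # z = y: done, provably optimal
        for w in words:     # update multipliers where the solutions differ
            if y[w] != z[w]:
                u[(z[w], w)] = u.get((z[w], w), 0.0) + step  # reward arcs in z
                u[(y[w], w)] = u.get((y[w], w), 0.0) - step  # penalize arcs in y
        heads = y
    return heads, False  # iteration budget exhausted, no certificate
```

Each iteration only re-runs the two cheap sub-solvers with adjusted arc scores, which is the whole point: the hard joint problem is never solved directly.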
17 Example: run an iteration, get solutions

- MST arc-factored solution: ROOT the aged bottle flies fast
- Sibling-model DP solution: ROOT the aged bottle flies fast
- Now what?
18 How well does it work? Koo et al., 2010 dependency parsing results, UAS

[Table 1 of Koo et al. (2010), garbled in transcription: a comparison of non-projective automaton-based parsers with results from previous work across Danish, Dutch, Portuguese, Slovene, Swedish, Turkish, Czech (PDT) and English (PTB), reporting UAS, certificate rates, and parse/train times. MST: their first-order baseline. Sib/G+S: non-projective head automata with sibling or grandparent/sibling interactions, decoded via dual decomposition. Ma09: the best UAS of the LP/ILP-based parsers introduced in Martins et al. (2009). Sm08: the best UAS of any LBP-based parser in Smith and Eisner (2008). Mc06: the best UAS reported by McDonald and Pereira (2006). Best: for the CoNLL-X languages only, the best UAS of any parser in the original shared task.]
19 How many iterations does it take to converge? Koo et al., 2010 dependency parsing results

[Figure 4 of Koo et al. (2010): the behavior of the dual-decomposition parser with sibling automata as the maximum number of iterations K is varied, plotting percentage validation UAS, percentage of certificates, and percentage of exact matches.]
20 Final notes on dual decomposition

- General method for inference, applicable to many problems
  - Every year it is being applied to more and more problems
  - Becoming part of the standard NLP toolbox
- Comes with great exact-inference guarantees
  - Important and useful, but not required for quality inference
  - Accuracy plateaus at a maximum of around 50 iterations, with approximately 75% of solutions coming with a guarantee of optimality
  - As is often the case in NLP, approximate inference is faster and just as accurate
- Expect new variants, potentially with new approximate heuristics
21 Lexicalized grammars

- Incorporation of lexical items into grammars influenced by lexicalized grammar formalisms
  - Those now known as mildly context-sensitive, e.g., TAGs
  - Also Lexical-Functional Grammar (LFG), a unification grammar
- Linguistic insight that words impact syntax, e.g., subcategorization
- Common approaches to mildly context-sensitive grammars build this into the formalism: lexical anchors in TAG; word tags in CCG
  - Leads to some important connections with dependencies
  - Also leads to spurious derivation ambiguities
- Worthwhile to think of derived versus derivation structure
22 Tree-Adjoining grammars

A Tree-Adjoining Grammar (TAG) G = (V, T, S, I, A):
- a set of non-terminal variables V
- a set of terminals T
- a special start symbol S ∈ V
- a set of initial trees I
  - Non-terminals on the frontier marked for substitution
- a set of auxiliary trees A
  - One non-terminal on the frontier marked as the foot node
  - Otherwise like initial trees
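The five-tuple above can be written down as a minimal data structure. The `Tree` representation (a label, children, and markers for substitution and foot nodes) is an illustrative assumption, not a standard encoding:

```python
from dataclasses import dataclass, field

@dataclass
class Tree:
    label: str
    children: list = field(default_factory=list)
    subst: bool = False   # frontier non-terminal marked for substitution
    foot: bool = False    # the single foot node of an auxiliary tree

@dataclass
class TAG:
    nonterminals: set   # V
    terminals: set      # T
    start: str          # S, must be a member of V
    initial: list       # I: initial trees
    auxiliary: list     # A: auxiliary trees (foot category == root category)

def foot_node(tree):
    """Return the foot node of an auxiliary tree (None for initial trees)."""
    if tree.foot:
        return tree
    for c in tree.children:
        f = foot_node(c)
        if f is not None:
            return f
    return None
```

A well-formedness check for an auxiliary tree would assert `foot_node(t).label == t.label`, mirroring the root/foot category constraint on the next slide.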
23 Elementary trees

- Initial trees (I) and auxiliary trees (A) together make up the set of elementary trees
  - In contrast to derived trees
- Elementary trees are of type X, where X is the root category
  - The foot node in an auxiliary tree must be of the same category as the root
- Lexicalized TAG (LTAG) requires at least one terminal item (the anchor) on every elementary tree
- Two operations defined on trees: substitution and adjunction
24 Initial tree

- Rooted at a single node (X)
- Yield of the tree can consist of terminals and non-terminals
  - Non-terminals are substitution nodes
- Trees with root category Y can substitute at a substitution node with category Y

[Figure: a schematic initial tree rooted in X, with terminal items and substitution nodes on its frontier.]
25 Auxiliary tree

- Rooted at a single node (X)
- Yield of the tree can consist of terminals and non-terminals
- One frontier non-terminal is the foot node, denoted with * (X*)
  - The foot node category must match the root category

[Figure: a schematic auxiliary tree rooted in X, with terminal items, substitution nodes, and the foot node X* on its frontier.]
26 Substitution and Adjunction

- Substitution
  - Replace a substitution node in the yield of tree T1 with a tree T2 rooted in the same category
- Adjunction
  - Detach the sub-tree rooted with category X from T1
  - Attach that subtree at the foot node of an auxiliary tree T2
  - Re-attach the root of T2 at the site of the original sub-tree
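The two operations can be sketched on a minimal tree encoding: (label, children) tuples, with "X!" marking a substitution node and "X*" a foot node. The encoding is an illustrative assumption, and for simplicity each operation applies at every matching node (a real implementation would target one addressed node):

```python
def substitute(tree, t2):
    """Replace substitution nodes of t2's root category with t2."""
    label, children = tree
    if label == t2[0] + "!":
        return t2  # drop the initial tree into the substitution slot
    return (label, [substitute(c, t2) for c in children])

def adjoin(tree, aux, site):
    """Adjoin auxiliary tree `aux` at nodes labeled `site`: the subtree
    rooted there detaches, aux takes its place, and the detached subtree
    re-attaches at aux's foot node (labeled site + '*')."""
    label, children = tree
    if label == site:
        detached = (label, children)
        return _plug_foot(aux, detached)
    return (label, [adjoin(c, aux, site) for c in children])

def _plug_foot(aux, subtree):
    """Re-attach the detached subtree at the foot node of `aux`."""
    label, children = aux
    if label == subtree[0] + "*":
        return subtree
    return (label, [_plug_foot(c, subtree) for c in children])
```

For example, adjoining the auxiliary tree (VP (ADV often) VP*) at the VP node of (S (NP John) (VP sleeps)) yields "John often sleeps" with the original VP re-attached under the foot.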
27 Substitution

[Figure: schematic of substitution: a tree rooted in Y replaces a Y substitution node on the frontier of a tree rooted in X, yielding a combined tree rooted in X.]
28 Adjunction

[Figure: schematic of adjunction in three steps: (a) a subtree rooted in X is detached from the original tree; (b) an auxiliary tree with root X and foot X* is attached at that site; (c) the detached subtree re-attaches at the foot node.]
29 Notes on TAG

- Adjunction makes this context-sensitive
  - With substitution alone it is context-free equivalent
  - Hence an O(n^3) parsing algorithm with no adjunction
  - Parsing with standard TAG is O(n^6)
- Instead of many rules, just two rules of combination
- Increased role of the lexicon to dictate possible structures
  - Information encoded in rules in a CFG is now encoded in the lexicon
30 Elementary trees (slide taken from Joshi & Schabes, 1997) 30
31 Why mention this formalism in this class?

- Putting trees together to build a parse, not rules
- Trees have lexical anchors (heads)
  - When a tree substitutes or adjoins, it attaches to the head
- The steps taken in putting them together form a derivation
  - The derivation is a tree of moves, not a sequence of moves
  - In other words, it is a dependency tree
- Hence we get both a phrase-structure tree and a dependency structure out of a TAG derivation
32 Derived tree (slide taken from Joshi & Schabes, 1997) 32
33 Derivation tree (slide taken from Joshi & Schabes, 1997) 33
34 Summary

- Covered dual decomposition for dependency parsing
  - Very useful approach for many NLP tasks; something of a hot topic
- Noted links between dependency trees and TAGs
  - Similar links with other approaches, e.g., CCG
- Next lecture will wrap up with some miscellaneous topics
  - Approximate inference techniques for dependency parsing
  - Using dependency parsing for machine translation
  - Using dependency parsing for language modeling
  - (All three related to some term projects in the works)
Outline 1 Definition Computer Science 331 Red-Black rees Mike Jacobson Department of Computer Science University of Calgary Lectures #20-22 2 Height-Balance 3 Searches 4 Rotations 5 s: Main Case 6 Partial
More informationSome Interdefinability Results for Syntactic Constraint Classes
Some Interdefinability Results for Syntactic Constraint Classes Thomas Graf tgraf@ucla.edu tgraf.bol.ucla.edu University of California, Los Angeles Mathematics of Language 11 Bielefeld, Germany 1 The Linguistic
More informationAdmin PARSING. Backoff models: absolute discounting. Backoff models: absolute discounting 3/4/11. What is (xy)?
Admin Updated slides/examples on backoff with absolute discounting (I ll review them again here today) Assignment 2 Watson vs. Humans (tonight-wednesday) PARING David Kauchak C159 pring 2011 some slides
More informationCSE 417 Branch & Bound (pt 4) Branch & Bound
CSE 417 Branch & Bound (pt 4) Branch & Bound Reminders > HW8 due today > HW9 will be posted tomorrow start early program will be slow, so debugging will be slow... Review of previous lectures > Complexity
More information4 Basics of Trees. Petr Hliněný, FI MU Brno 1 FI: MA010: Trees and Forests
4 Basics of Trees Trees, actually acyclic connected simple graphs, are among the simplest graph classes. Despite their simplicity, they still have rich structure and many useful application, such as in
More informationAdvanced PCFG algorithms
Advanced PCFG algorithms! BM1 Advanced atural Language Processing Alexander Koller! 9 December 2014 Today Agenda-based parsing with parsing schemata: generalized perspective on parsing algorithms. Semiring
More information1. Why Study Trees? Trees and Graphs. 2. Binary Trees. CITS2200 Data Structures and Algorithms. Wood... Topic 10. Trees are ubiquitous. Examples...
. Why Study Trees? CITS00 Data Structures and Algorithms Topic 0 Trees and Graphs Trees and Graphs Binary trees definitions: size, height, levels, skinny, complete Trees, forests and orchards Wood... Examples...
More informationContext-Free Grammars. Carl Pollard Ohio State University. Linguistics 680 Formal Foundations Tuesday, November 10, 2009
Context-Free Grammars Carl Pollard Ohio State University Linguistics 680 Formal Foundations Tuesday, November 10, 2009 These slides are available at: http://www.ling.osu.edu/ scott/680 1 (1) Context-Free
More informationElements of Language Processing and Learning lecture 3
Elements of Language Processing and Learning lecture 3 Ivan Titov TA: Milos Stanojevic Institute for Logic, Language and Computation Today Parsing algorithms for CFGs Recap, Chomsky Normal Form (CNF) A
More informationPhrase Structure Parsing. Statistical NLP Spring Conflicting Tests. Constituency Tests. Non-Local Phenomena. Regularity of Rules
tatistical NLP pring 2007 Lecture 14: Parsing I Dan Klein UC Berkeley Phrase tructure Parsing Phrase structure parsing organizes syntax into constituents or brackets In general, this involves nested trees
More informationQuery Evaluation Strategies
Introduction to Search Engine Technology Term-at-a-Time and Document-at-a-Time Evaluation Ronny Lempel Yahoo! Research (Many of the following slides are courtesy of Aya Soffer and David Carmel, IBM Haifa
More informationA Simple Syntax-Directed Translator
Chapter 2 A Simple Syntax-Directed Translator 1-1 Introduction The analysis phase of a compiler breaks up a source program into constituent pieces and produces an internal representation for it, called
More informationDesign and Analysis of Algorithms
CSE 101, Winter 018 D/Q Greed SP s DP LP, Flow B&B, Backtrack Metaheuristics P, NP Design and Analysis of Algorithms Lecture 8: Greed Class URL: http://vlsicad.ucsd.edu/courses/cse101-w18/ Optimization
More informationCS 4100 // artificial intelligence
CS 4100 // artificial intelligence instructor: byron wallace Constraint Satisfaction Problems Attribution: many of these slides are modified versions of those distributed with the UC Berkeley CS188 materials
More informationCOMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview
COMP 181 Compilers Lecture 2 Overview September 7, 2006 Administrative Book? Hopefully: Compilers by Aho, Lam, Sethi, Ullman Mailing list Handouts? Programming assignments For next time, write a hello,
More information8.1. Optimal Binary Search Trees:
DATA STRUCTERS WITH C 10CS35 UNIT 8 : EFFICIENT BINARY SEARCH TREES 8.1. Optimal Binary Search Trees: An optimal binary search tree is a binary search tree for which the nodes are arranged on levels such
More informationCollins and Eisner s algorithms
Collins and Eisner s algorithms Syntactic analysis (5LN455) 2015-12-14 Sara Stymne Department of Linguistics and Philology Based on slides from Marco Kuhlmann Recap: Dependency trees dobj subj det pmod
More informationProgramming, numerics and optimization
Programming, numerics and optimization Lecture C-4: Constrained optimization Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428 June
More informationAlgorithms for Integer Programming
Algorithms for Integer Programming Laura Galli November 9, 2016 Unlike linear programming problems, integer programming problems are very difficult to solve. In fact, no efficient general algorithm is
More informationIterative CKY parsing for Probabilistic Context-Free Grammars
Iterative CKY parsing for Probabilistic Context-Free Grammars Yoshimasa Tsuruoka and Jun ichi Tsujii Department of Computer Science, University of Tokyo Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033 CREST, JST
More informationFormal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr
Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Grammars Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/
More informationNotes slides from before lecture. CSE 21, Winter 2017, Section A00. Lecture 10 Notes. Class URL:
Notes slides from before lecture CSE 21, Winter 2017, Section A00 Lecture 10 Notes Class URL: http://vlsicad.ucsd.edu/courses/cse21-w17/ Notes slides from before lecture Notes February 13 (1) HW5 is due
More informationParsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)
Parsing Part II (Ambiguity, Top-down parsing, Left-recursion Removal) Ambiguous Grammars Definitions If a grammar has more than one leftmost derivation for a single sentential form, the grammar is ambiguous
More informationLing 571: Deep Processing for Natural Language Processing
Ling 571: Deep Processing for Natural Language Processing Julie Medero February 4, 2013 Today s Plan Assignment Check-in Project 1 Wrap-up CKY Implementations HW2 FAQs: evalb Features & Unification Project
More informationMotivation. CS389L: Automated Logical Reasoning. Lecture 5: Binary Decision Diagrams. Historical Context. Binary Decision Trees
Motivation CS389L: Automated Logical Reasoning Lecture 5: Binary Decision Diagrams Işıl Dillig Previous lectures: How to determine satisfiability of propositional formulas Sometimes need to efficiently
More informationSyntax Analysis. Chapter 4
Syntax Analysis Chapter 4 Check (Important) http://www.engineersgarage.com/contributio n/difference-between-compiler-andinterpreter Introduction covers the major parsing methods that are typically used
More informationWhat is Parsing? NP Det N. Det. NP Papa N caviar NP NP PP S NP VP. N spoon VP V NP VP VP PP. V spoon V ate PP P NP. P with.
Parsing What is Parsing? S NP VP NP Det N S NP Papa N caviar NP NP PP N spoon VP V NP VP VP PP NP VP V spoon V ate PP P NP P with VP PP Det the Det a V NP P NP Det N Det N Papa ate the caviar with a spoon
More informationTask Description: Finding Similar Documents. Document Retrieval. Case Study 2: Document Retrieval
Case Study 2: Document Retrieval Task Description: Finding Similar Documents Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 11, 2017 Sham Kakade 2017 1 Document
More informationA structure-sharing parser for lexicalized grammars
A structure-sharing parser for lexicalized grammars Roger Evans Information Technology Research Institute University of Brighton Brighton, BN2 4G J, UK Roger. Evans @it ri. bright on. ac. uk David Weir
More informationA Logical Approach to Structure Sharing in TAGs
Workshop TAG+5, Paris, 25-27 May 2000 171 A Logical Approach to Structure Sharing in TAGs Adi Palm Department of General Linguistics University of Passau D-94030 Passau Abstract Tree adjoining grammars
More informationText Compression through Huffman Coding. Terminology
Text Compression through Huffman Coding Huffman codes represent a very effective technique for compressing data; they usually produce savings between 20% 90% Preliminary example We are given a 100,000-character
More informationChapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.
Topics Chapter 4 Lexical and Syntax Analysis Introduction Lexical Analysis Syntax Analysis Recursive -Descent Parsing Bottom-Up parsing 2 Language Implementation Compilation There are three possible approaches
More informationLecturers: Sanjam Garg and Prasad Raghavendra March 20, Midterm 2 Solutions
U.C. Berkeley CS70 : Algorithms Midterm 2 Solutions Lecturers: Sanjam Garg and Prasad aghavra March 20, 207 Midterm 2 Solutions. (0 points) True/False Clearly put your answers in the answer box in front
More information1. Generalized CKY Parsing Treebank empties and unaries. Seven Lectures on Statistical Parsing. Unary rules: Efficient CKY parsing
even Lectures on tatistical Parsing 1. Generalized CKY Parsing Treebank empties and unaries -HLN -UBJ Christopher Manning LA Linguistic Institute 27 LA 354 Lecture 3 -NONEε Atone PTB Tree -NONEε Atone
More information