October 19, 2004 Chapter Parsing

Similar documents
Ling/CSE 472: Introduction to Computational Linguistics. 5/4/17 Parsing

Ling 571: Deep Processing for Natural Language Processing

Homework & NLTK. CS 181: Natural Language Processing Lecture 9: Context Free Grammars. Motivation. Formal Def of CFG. Uses of CFG.

Vorlesung 7: Ein effizienter CYK Parser

Parsing Chart Extensions 1

Algorithms for NLP. Chart Parsing. Reading: James Allen, Natural Language Understanding. Section 3.4, pp

The CKY algorithm part 2: Probabilistic parsing

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Computational Linguistics

Algorithms for NLP. Chart Parsing. Reading: James Allen, Natural Language Understanding. Section 3.4, pp

Parsing. Earley Parsing. Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 39

A simple syntax-directed

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Compiling Regular Expressions COMP360

Syntax and Grammars 1 / 21

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

Parsing partially bracketed input

ADTS, GRAMMARS, PARSING, TREE TRAVERSALS

Parsing. Parsing. Bottom Up Parsing. Bottom Up Parsing. Bottom Up Parsing. Bottom Up Parsing

The CKY algorithm part 1: Recognition

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

ADTS, GRAMMARS, PARSING, TREE TRAVERSALS

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

STRUCTURES AND STRATEGIES FOR STATE SPACE SEARCH

The CKY Parsing Algorithm and PCFGs. COMP-550 Oct 12, 2017

If you are going to form a group for A2, please do it before tomorrow (Friday) noon GRAMMARS & PARSING. Lecture 8 CS2110 Spring 2014

The CKY algorithm part 1: Recognition

The Earley Parser

Ling 571: Deep Processing for Natural Language Processing

CMPT 755 Compilers. Anoop Sarkar.

Non-deterministic Finite Automata (NFA)

Principles of Programming Languages COMP251: Syntax and Grammars

Introduction to Lexical Analysis

The CKY Parsing Algorithm and PCFGs. COMP-599 Oct 12, 2016

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

6.863J Natural Language Processing Lecture 7: parsing with hierarchical structures context-free parsing. Robert C. Berwick

CSE 3302 Programming Languages Lecture 2: Syntax

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Related Course Objec6ves

COP 3402 Systems Software Syntax Analysis (Parser)

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Lexical Analysis. Chapter 2

Introduction to Lexical Analysis

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Syntax Analysis Part I

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

KEY. A 1. The action of a grammar when a derivation can be found for a sentence. Y 2. program written in a High Level Language

Deliverable D1.4 Report Describing Integration Strategies and Experiments

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Compiler Design Overview. Compiler Design 1

CS 224N Assignment 2 Writeup

Algorithms for NLP. LR Parsing. Reading: Hopcroft and Ullman, Intro. to Automata Theory, Lang. and Comp. Section , pp.

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Compilers Course Lecture 4: Context Free Grammars

Lexical Analysis. Lecture 3-4

Abstract Syntax Trees & Top-Down Parsing

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

ADTS, GRAMMARS, PARSING, TREE TRAVERSALS

Assignment 4 CSE 517: Natural Language Processing

Compiler Design Concepts. Syntax Analysis

2 Ambiguity in Analyses of Idiomatic Phrases

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

CS 314 Principles of Programming Languages. Lecture 3

CYK- and Earley s Approach to Parsing

Introduction to Lexing and Parsing

Defining syntax using CFGs

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

ADTS, GRAMMARS, PARSING, TREE TRAVERSALS. Regrades 10/6/15. Prelim 1. Prelim 1. Expression trees. Pointers to material

Lexical Analysis. Lecture 2-4

CS 314 Principles of Programming Languages

Course Overview. Introduction (Chapter 1) Compiler Frontend: Today. Compiler Backend:

Principles of Programming Languages COMP251: Syntax and Grammars

Context Free Grammars and Recursive Descent Parsing

Compilers and Interpreters

syntax tree - * * * - * * * * * 2 1 * * 2 * (2 * 1) - (1 + 0)

Top down vs. bottom up parsing

Compilers and computer architecture From strings to ASTs (2): context free grammars

Structure of a Compiler: Scanner reads a source, character by character, extracting lexemes that are then represented by tokens.

Topic 3: Syntax Analysis I

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Topics in Parsing: Context and Markovization; Dependency Parsing. COMP-599 Oct 17, 2016

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis

Part 3. Syntax analysis. Syntax analysis 96

11-682: Introduction to IR, NLP, MT and Speech. Parsing with Unification Grammars. Reading: Jurafsky and Martin, Speech and Language Processing

Theory and Compiling COMP360

Admin PARSING. Backoff models: absolute discounting. Backoff models: absolute discounting 3/4/11. What is (xy)?

Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

COMPUTATIONAL SEMANTICS WITH FUNCTIONAL PROGRAMMING JAN VAN EIJCK AND CHRISTINA UNGER. lg Cambridge UNIVERSITY PRESS

CS153: Compilers Lecture 4: Recursive Parsing

2068 (I) Attempt all questions.

Grammars & Parsing. Lecture 12 CS 2112 Fall 2018

CS 842 Ben Cassell University of Waterloo

Examples of attributes: values of evaluated subtrees, type information, source file coordinates,

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Lecture 21: Relational Programming II. November 15th, 2011

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

An Efficient Implementation of PATR for Categorial Unification Grammar

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Transcription:

October 19, 2004 Chapter 10.3 10.6 Parsing 1

Overview Review: CFGs, basic top-down parser Dynamic programming Earley algorithm (how it works, how it solves the problems) Finite-state parsing 2

Last time What is parsing? Inadequate grammars for human languages lists, regex, CFG A simple top-down parser for CFGs 3

Corrected version of TOP-DOWN-PARSE function TOP-DOWN-PARSE(input,grammar) returns a parse tree agenda (Initial S tree, Beginning of input) css POP(agenda) loop if SUCCESSFUL-PARSE?(css) then return TREE(css) else if CAT(NODE-TO-EXPAND(css)) is a POS then if CAT(NODE-TO-EXPAND(css)) POS(CURRENT-INPUT(css)) then PUSH(APPLY-LEXICAL-RULE(css,grammar),agenda) else PUSH(APPLY-RULES(css,grammar),agenda) if agenda is empty then return reject else css POP(agenda) end 4

Problems with TOP-DOWN-PARSE Infinite loops with left-recursive grammars Ambiguity Inefficient reparsing of subtrees 5

Dynamic Programming First introduced by Bellman (1957) A table-driven method of solving problems by combining solutions to sub-problems In this context, what is the problem? What is a sub-problem? How are sub-problems combined? 6

The Earley Chart Parser: Chart and Dotted Rules For a sentence of N words, the chart contains N+1 cells. Each cell contains a list of states. A state consists of: a local subtree ( edge ), information about the degree of completion of that subtree, and information about how much of the string corresponds to the subtree. For example (in dotted-rule notation): S V P, [0, 0] NP Det Nominal, [1, 2] V P V NP, [0, 3] 7

The Earley Chart Parser, Outline Add the initial S (gamma S, [0,0]) to the chart at position 0. Loop, for each of the rest of the cells in the chart: If the state is incomplete, and the category to the right of the dot is not a POS, add (if not redundant) new states to the chart in the current position for each rule that expands that category. These new states all have the dot at the beginning of the rule, and a span that starts and ends at the current position. (PREDICTOR) 8

The Earley Chart Parser, Outline Loop (continued): If the state is incomplete, and the category to the right of the dot ( B ) is a POS, and B is a possible POS for the next word in the string, add the rule B word (if not redundant) to the next cell in the chart. (SCANNER) If the state is complete (dot all the way to the right), look in the cell corresponding to the beginning of the current state s span for states which are currently seeking a daughter of the same category as the mother of this state. For each one of those, add a state (if not redundant) to the current cell, with the dot moved over one, and the span increased to the end of the current word. (COMPLETER) 9

The Earley Chart Parser In which cell of the chart does one find the spanning edge(s)? Is the Earley algorithm top-down or bottom-up? Best-first or exhaustive? Uni-directional or bi-directional? Breadth-first or depth-first? 10

How does the Earley algorithm solve these problems? Ambiguity Inefficient reparsing of subtrees Infinite loops with left-recursive grammars NP NP PP The cat on the mat by the door slept 11

Expanding the Chart Parsing is apparently an exponential-time problem. The Earley alogrithm does recognition in polynomial time (O(N 3 )). Returning the trees is still potentially exponential. How would this alogrithm return the trees? Effectively takes an exponential-time problem and makes it polynomial. 12

Finite-State or Chunk Parsing When robustness is more important than precision, shallow parsing techniques are used instead. Finite-State or Chunk parsers recognize patterns within sentences as Noun groups, Verb groups etc. They can be used to return partial results, even if the whole string can t be treated. Done with cascades of FSAs. What does it mean to cascade an FSA? 13

Finite-State or Chunk Parsing More modern robust, shallow processing techniques involve machine learning to deal with learning all the different patterns. One example: RASP (Robust Accurate Statistical Parsing) http://www.cogs.susx.ac.uk/lab/nlp/rasp The European project Deep Thought is currently investigating methods for combining deep and shallow parsing for knowledge-intensive information extraction http://www.project-deepthought.net/ 14

Overview Review: CFGs, basic top-down parser Dynamic programming Earley algorithm (how it works, how it solves the problems) Finite-state parsing 15

Coming up next... Adding unification to enable more interesting grammars. Another parsing algorithm, with a different kind of chart. Probabilistic parsing, for meaningful best-first A parser such as the Earley Chart Parser is only as good as the grammars it interprets... 16