Ling 571: Deep Processing for Natural Language Processing

Similar documents
The CKY Parsing Algorithm and PCFGs. COMP-599 Oct 12, 2016

The CKY Parsing Algorithm and PCFGs. COMP-550 Oct 12, 2017

The CKY algorithm part 1: Recognition

The CKY algorithm part 1: Recognition

Homework & NLTK. CS 181: Natural Language Processing Lecture 9: Context Free Grammars. Motivation. Formal Def of CFG. Uses of CFG.

Ling/CSE 472: Introduction to Computational Linguistics. 5/4/17 Parsing

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

CKY algorithm / PCFGs

Assignment 4 CSE 517: Natural Language Processing

Admin PARSING. Backoff models: absolute discounting. Backoff models: absolute discounting 3/4/11. What is (xy)?

Parsing II Top-down parsing. Comp 412

October 19, 2004 Chapter Parsing

Natural Language Processing

The CKY algorithm part 2: Probabilistic parsing

Dependency grammar and dependency parsing

Dependency grammar and dependency parsing

CS 224N Assignment 2 Writeup

Dependency grammar and dependency parsing

Algorithms for NLP. Chart Parsing. Reading: James Allen, Natural Language Understanding. Section 3.4, pp

Computational Linguistics

Algorithms for NLP. Chart Parsing. Reading: James Allen, Natural Language Understanding. Section 3.4, pp

CS 406/534 Compiler Construction Parsing Part I

Ling 571: Deep Processing for Natural Language Processing

Parsing Chart Extensions 1

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Homework & Announcements

CSE 417 Dynamic Programming (pt 6) Parsing Algorithms

Parsing III. (Top-down parsing: recursive descent & LL(1) )

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Syntactic Analysis. Top-Down Parsing

Syntax Analysis Part I

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Computer Science 160 Translation of Programming Languages

The Earley Parser

Probabilistic Parsing of Mathematics

Introduction to Parsing. Comp 412

CSE302: Compiler Design

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

CS 314 Principles of Programming Languages

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Topics in Parsing: Context and Markovization; Dependency Parsing. COMP-599 Oct 17, 2016

What is Parsing? NP Det N. Det. NP Papa N caviar NP NP PP S NP VP. N spoon VP V NP VP VP PP. V spoon V ate PP P NP. P with.

Elements of Language Processing and Learning lecture 3

Introduction to Parsing. Lecture 5

CSCE 314 Programming Languages

Introduction to Parsing. Lecture 5

Advanced PCFG Parsing

CSE 3302 Programming Languages Lecture 2: Syntax

CSCI312 Principles of Programming Languages

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

STRUCTURES AND STRATEGIES FOR STATE SPACE SEARCH

LR Parsing LALR Parser Generators

Context-Free Grammars

A programming language requires two major definitions A simple one pass compiler

Syntax Analysis, III Comp 412

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

Compiler Design Concepts. Syntax Analysis

3. Parsing. Oscar Nierstrasz

LR Parsing LALR Parser Generators

Syntax Analysis, III Comp 412

Parsing. Cocke Younger Kasami (CYK) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 35

Bottom-Up Parsing. Lecture 11-12

Lexical and Syntax Analysis. Top-Down Parsing

Syntactic Analysis. Top-Down Parsing. Parsing Techniques. Top-Down Parsing. Remember the Expression Grammar? Example. Example

Summary: Semantic Analysis

Vorlesung 7: Ein effizienter CYK Parser

Parsing II Top-down parsing. Comp 412

6.863J Natural Language Processing Lecture 7: parsing with hierarchical structures context-free parsing. Robert C. Berwick

Advanced PCFG algorithms

Introduction to parsers

Parsing. Earley Parsing. Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 39

Syntax Analysis. Chapter 4

Principles of Programming Languages COMP251: Syntax and Grammars

Compilers and computer architecture From strings to ASTs (2): context free grammars

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Introduction to Parsing. Lecture 8

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Introduction to Syntax Analysis. Compiler Design Syntax Analysis s.l. dr. ing. Ciprian-Bogdan Chirila

Solving systems of regular expression equations

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Natural Language Processing

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

In One Slide. Outline. LR Parsing. Table Construction

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

CSCI 1260: Compilers and Program Analysis Steven Reiss Fall Lecture 4: Syntax Analysis I

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

Syntax. In Text: Chapter 3

Bottom-Up Parsing. Lecture 11-12

Types of parsing. CMSC 430 Lecture 4, Page 1

CS153: Compilers Lecture 4: Recursive Parsing

COP 3402 Systems Software Syntax Analysis (Parser)

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

Parsing partially bracketed input

Dependency Parsing. Allan Jie. February 20, Slides: Allan Jie Dependency Parsing February 20, / 16

Context Free Grammars. CS154 Chris Pollett Mar 1, 2006.

Transcription:

Ling 571: Deep Processing for Natural Language Processing Julie Medero January 14, 2013

Today s Plan Motivation for Parsing Parsing as Search Search algorithms Search Strategies Issues

Two Goal of Parsing Analyze input strings to assign proper structures For input A, grammar G: Assign zero or more parse tree(s) T: Cover all and only the elements of A Root of T is S (the start symbol of G) Do not necessarily pick one (or correct) analysis

Two Goal of Parsing Recognition Subtask of parsing For input A, grammar G Is A in the language defined by G?

Questions for Parsing Is this sentence in the language? FSAs accept the regular languages defined by automaton Parsers accept language defined by CFG What is the syntactic structure of this sentence? Syntactic parse provides framework for semantic analysis What is the subject? Useful for e.g. question answering

Parsing as Search Search through possible parse trees Want one (or more) that derive input Formally, search problems are defined by: Start state S, Goal state G, Successor Function: Transitions between states, Path cost function

One Model of Parsing as Start State: Start Symbol from grammar Goal test: Search Does parse tree cover all and only input? Successor function: Expand a non-terminal using production in grammar where nonterminal is LHS of grammar Path cost: We ll ignore here

One Model of Parsing as Search Node: Partial solution to search problem: Partial parse Search start node: Initial State Input string Start symbol of CFG Goal node: Full parse tree: covering all and only input, rooted at S

Search Algorithms Depth first Keep expanding non-terminal until reach words If no more expansions, back up Breadth first Consider all parses with one non-terminal expanded Then all with two expanded, etc. Other alternatives if have associated path costs

Parse Search Strategies Two constraints: Must start with the start symbol Must cover exactly the input string Correspond to main parsing search strategies Top-down search (Goal-directed search) Bottom-up search (Data-driven search)

Parse Search Strategies Breadth-First Depth-First Top-Down Bottom-Up

Today s Plan Motivation for Parsing Parsing as Search Search algorithms Search Strategies Issues

Breadth-First Search Uninformed search strategy Visiting all neighboring (sister) nodes first Then go deeper into the tree

Breadth-First Example A B C D E F G H I J K L M

Depth-First Search Uninformed search strategy Going down one branch all the way When needed, backtrack.

Depth-First Example A B C D E F G H I J K L M

Today s Plan Motivation for Parsing Parsing as Search Search algorithms Search Strategies Issues

A Toy Grammar

Top-Down Search Begin with productions with S on LHS E.g., S NP VP Successively expand non-terminals E.g., NP Det Nominal; VP V NP Terminate when all leaves are terminals Book that flight

Pros and Cons of Top- Down Search Pros: Doesn t explore trees not rooted at S Doesn t explore invalid subtrees Cons: Produces trees that may not match input May not terminate with recursive rules May rederive subtrees during search

Bottom-Up Search Find all trees that span the input Start with input string: Book that flight. Use all productions with current subtree(s) on RHS E.g., N Book; V Book Stop when spanned by S (or no more rules apply)

Depth-First Breadth-First

Pros and Cons of Bottom-Up Search Pros: Only explore trees that match input Fewer problems with recursive rules Useful for incremental/ fragment parsing Cons: Explore subtrees that will not fit full sentences

Today s Plan Motivation for Parsing Parsing as Search Search algorithms Search Strategies Issues

Parsing Challenges Ambiguity Recursion Repeated substructure

Parsing Ambiguity Lexical ambiguity Book/N; Book/V Structural ambiguity: Attachment ambiguity Constituent can attach in multiple places I shot an elephant in my pyjamas. Coordination ambiguity Different constituents can be conjoined Old men and women

Attachment Ambiguity

Disambiguation Local ambiguity: Ambiguity in subtree, resolved globally Global ambiguity: Multiple complete alternative parses Need strategy to select correct one Alternatively, keep all

Resolving Global Ambiguity Exploit other information Statistical Some prepositional structs more likely to attach high/low Some phrases more likely, e.g., (old (men and women)) Semantic Pragmatic (e.g., elephants and pyjamas)

Garden Path Sentences Misleading Partial Analyses The old man the boat The player kicked the ball kicked the ball The cotton clothing is made of grows in Mississippi

Recursion Direct Recursion (e.g., S S CONJ S) water under the bridge, Bill ran and Jane jogged Indirect Recursion... on a thimble in a box on a stool beside a table near a sofa... NP DT Nom Nom Nom PP PP Prep NP Can yield infinite searches e.g., Top-down search with S S conj S

Repeated Work Avoid repeatedly parsing substructures Good subtrees in globally bad parses Overall, bad parses will fail Reconstruction subtrees on other branches Can t avoid with static backtracking Store shared substructure for efficiency Typically with dynamic programming

Dynamic Programming Want to avoid repeated work Global parse made of parse subtrees Record parses of subtrees Dynamic programming Tabulate solutions to subproblems Avoid repeated work

Parsing w/dynamic Programming Makes parsing algorithms (relatively) efficient Polynomial time in input length Typically cubic (n 3 ) or less Several different implementations Cocke-Kasami-Younger (CKY) algorithm Earley algorithm Chart parsing

Chomsky Normal Form (CNF) CKY parsing requires grammars in CNF All productions of the form: A B C, or A a Most of our grammars are not of this form E.g., S -> Wh-NP Aux NP VP Need a general conversion procedue

CNF Conversion Hybrid rules: INF-VP to VP Unit productions: A B (e.g. Nom N) Long productions: A B C D

Hybrid Rule Conversion Replace all terminals with dummy nonterminals Problem Rule: INF-VP to VP New Rules: INF-VP TO VP TO to

Unit Productions Conversion Rewrite RHS with RHS of all derivable nonunit productions Old Rules: NP Nom Nom Det N New Rule: NP Det N

Long Productions Conversion Introduce new non-terminals and spread over rules Old Rule: S Aux NP VP New Rules: S X1 VP X1 Aux NP

CNF Conversion Summary For all non-conforming rules, Convert terminals to dummy nonterminals Convert unit productions Binarize all resulting rules

Result of CNF Conversion

Next Time CKY Parsing Algorithm Homework 1 Wrap-Up Project 1 Introduction