The CKY algorithm part 1: Recognition

The CKY algorithm part 1: Recognition
Syntactic analysis (5LN455), 2014-11-17
Sara Stymne, Department of Linguistics and Philology
Mostly based on slides from Marco Kuhlmann

Recap: Parsing

Parsing The automatic analysis of a sentence with respect to its syntactic structure.

Phrase structure trees A phrase structure tree for "I prefer a morning flight", with the root S at the top and the words as leaves at the bottom, in bracket notation: [S [NP [Pro I]] [VP [Verb prefer] [NP [Det a] [Nom [Nom [Noun morning]] [Noun flight]]]]]

Ambiguity The sentence "I booked a flight from LA" is ambiguous: the PP "from LA" can attach inside the noun phrase, as in [S [NP [Pro I]] [VP [Verb booked] [NP [Det a] [Nom [Nom [Noun flight]] [PP from LA]]]]]

Ambiguity ... or to the verb phrase, as in [S [NP [Pro I]] [VP [Verb booked] [NP [Det a] [Nom [Noun flight]]] [PP from LA]]]

Parsing as search Parsing as search: search through all possible parse trees for a given sentence bottom up: build parse trees starting at the leaves top down: build parse trees starting at the root node

Overview of the CKY algorithm The CKY algorithm is an efficient bottom-up parsing algorithm for context-free grammars. It was discovered at least three (!) times and named after Cocke, Kasami, and Younger. It is one of the most important and most used parsing algorithms.

Applications The CKY algorithm can be used to compute many interesting things. Here we use it to solve the following tasks: Recognition: Is there any parse tree at all? Probabilistic parsing: What is the most probable parse tree?

Restrictions The CKY algorithm as we present it here can only handle rules that are at most binary: C → wi, C → C1, C → C1 C2. This restriction is not a problem theoretically, but it requires preprocessing (binarization) and postprocessing (debinarization). A parsing algorithm that does away with this restriction is Earley's algorithm (Lecture 5 and J&M 13.4.2).
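The binarization step mentioned above can be sketched as follows. This is a minimal illustrative helper, not the lecture's code; the fresh-symbol naming scheme (VP_X0, ...) is an assumption.

```java
import java.util.*;

// A hypothetical sketch of rule binarization: a rule with more than two
// symbols on the right-hand side, e.g. VP -> V NP PP, is split into a
// chain of binary rules using fresh categories (VP -> V VP_X0,
// VP_X0 -> NP PP). Rules are encoded as {lhs, rhs1, rhs2?}.
public class Binarize {
    static List<String[]> binarize(String lhs, String[] rhs) {
        List<String[]> rules = new ArrayList<>();
        int fresh = 0;
        String left = lhs;
        String[] rest = rhs;
        while (rest.length > 2) {
            String x = lhs + "_X" + fresh++;        // fresh helper category
            rules.add(new String[]{left, rest[0], x});
            left = x;
            rest = Arrays.copyOfRange(rest, 1, rest.length);
        }
        rules.add(rest.length == 2
            ? new String[]{left, rest[0], rest[1]}  // final binary rule
            : new String[]{left, rest[0]});         // already at most unary
        return rules;
    }

    public static void main(String[] args) {
        for (String[] r : binarize("VP", new String[]{"V", "NP", "PP"}))
            System.out.println(r[0] + " -> "
                + String.join(" ", Arrays.copyOfRange(r, 1, r.length)));
        // Prints:
        // VP -> V VP_X0
        // VP_X0 -> NP PP
    }
}
```

Debinarization then simply removes the fresh VP_X0 nodes from the finished parse tree.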

Restrictions - details The CKY algorithm originally handles grammars in CNF (Chomsky normal form): C → wi, C → C1 C2, (S → ε). ε is normally not used in natural language grammars. We also allow unit productions, C → C1 ("extended CNF"): these are easy to integrate into CKY and make grammar conversion easier.

Conventions We are given a context-free grammar G and a sequence of word tokens w = w1 … wn. We want to compute parse trees of w according to the rules of G. We write S for the start symbol of G.

Fencepost positions We view the sequence w as a fence with n holes, one hole for each token wi, and we number the fenceposts from 0 to n: 0 I 1 want 2 a 3 morning 4 flight 5

Structure Is there any parse tree at all? What is the most probable parse tree?

Recognition

Recognition Recognizer A computer program that can answer the question "Is there any parse tree at all for the sequence w according to the grammar G?" is called a recognizer. In practical applications one also wants a concrete parse tree, not only an answer to the question of whether such a parse tree exists.

Recognition Parse trees An example parse tree for "I booked a flight from LA": [S [NP [Pro I]] [VP [Verb booked] [NP [Det a] [Nom [Nom [Noun flight]] [PP from LA]]]]]

Recognition Preterminal rules and inner rules Preterminal rules: rules that rewrite a part-of-speech tag to a token, i.e. rules of the form C → wi. Examples: Pro → I, Verb → booked, Noun → flight. Inner rules: rules that rewrite a syntactic category to other categories: C → C1 C2, C → C1. Examples: S → NP VP, NP → Det Nom, NP → Pro.

Recognition Recognizing small trees A preterminal rule C → wi lets us recognize a small tree whose root is labeled C and whose single leaf is the word wi. This tree covers all words between fenceposts i − 1 and i.

Recognition Recognizing big trees Suppose we have already recognized a tree with root C1 that covers all words between min and mid, and a tree with root C2 that covers all words between mid and max. Then an inner rule C → C1 C2 lets us combine them into a bigger tree with root C that covers all words between min and max.

Recognition Questions How do we know that we have recognized that the input sequence is grammatical? How do we need to extend this reasoning in the presence of unary rules C → C1?

Recognition Signatures The rules that we have just seen are independent of a parse tree's inner structure. The only thing that is important is how the parse tree looks from the outside. We call this the signature of the parse tree. A parse tree with signature [min, max, C] is one that covers all words between min and max and whose root node is labeled with C.

Recognition Questions What is the signature of a parse tree for the complete sentence? How many different signatures are there? Can you relate the runtime of the parsing algorithm to the number of signatures?

Implementation

Implementation Data structure The implementation represents signatures by means of a three-dimensional array chart. Initially, all entries of chart are set to false. (This is guaranteed by Java.) Whenever we have recognized a parse tree that spans all words between min and max and whose root node is labeled with C, we set the entry chart[min][max][c] to true.

Implementation Preterminal rules

for each wi from left to right
    for each preterminal rule C -> wi
        chart[i - 1][i][C] = true

Implementation Binary rules

for each max from 2 to n
    for each min from max - 2 down to 0
        for each syntactic category C
            for each binary rule C -> C1 C2
                for each mid from min + 1 to max - 1
                    if chart[min][mid][C1] and chart[mid][max][C2] then
                        chart[min][max][C] = true

Implementation Numbering of categories In order to use standard arrays, we need to represent syntactic categories by numbers. We write m for the number of categories; we number them from 0 to m − 1. We choose our numbers such that the start symbol S gets the number 0.

Implementation Skeleton code

// int n = number of words in the sequence
// int m = number of syntactic categories in the grammar
// int s = the (number of the) grammar's start symbol
boolean[][][] chart = new boolean[n + 1][n + 1][m]

// Recognize all parse trees built with preterminal rules.
// Recognize all parse trees built with inner rules.

return chart[0][n][s]
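Filling in the two loops gives a complete toy recognizer. A minimal sketch, assuming a tiny strict-CNF grammar for sentences like "I booked a flight" (here "I" is tagged directly as NP, so no unit productions are needed; the category numbering and rule encoding are illustrative choices, not part of the lecture code):

```java
import java.util.*;

public class CkyRecognizer {
    // Category numbers; the start symbol S gets number 0, as in the slides.
    static final int S = 0, NP = 1, VP = 2, DET = 3, NOM = 4, VERB = 5, M = 6;

    // Preterminal rules C -> w (a toy lexicon, chosen for illustration).
    static final Map<String, Integer> PRETERM = Map.of(
        "I", NP, "booked", VERB, "a", DET, "flight", NOM);

    // Inner binary rules C -> C1 C2, encoded as {C, C1, C2}.
    static final int[][] BINARY = {
        {S, NP, VP}, {NP, DET, NOM}, {VP, VERB, NP}};

    static boolean recognize(String[] w) {
        int n = w.length;
        boolean[][][] chart = new boolean[n + 1][n + 1][M];
        // Preterminal rules: word wi spans fenceposts i - 1 and i.
        for (int i = 1; i <= n; i++) {
            Integer c = PRETERM.get(w[i - 1]);
            if (c != null) chart[i - 1][i][c] = true;
        }
        // Inner rules, visiting smaller spans before larger ones (bottom-up).
        for (int max = 2; max <= n; max++)
            for (int min = max - 2; min >= 0; min--)
                for (int[] r : BINARY)
                    for (int mid = min + 1; mid <= max - 1; mid++)
                        if (chart[min][mid][r[1]] && chart[mid][max][r[2]])
                            chart[min][max][r[0]] = true;
        return chart[0][n][S];
    }

    public static void main(String[] args) {
        System.out.println(recognize("I booked a flight".split(" ")));  // true
        System.out.println(recognize("booked I".split(" ")));           // false
    }
}
```

Note how the loop order guarantees the bottom-up property: every span strictly inside [min, max] has already been filled before [min, max] is considered.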

Implementation Questions In what way is this algorithm bottom up? Why is that property of the algorithm important? How do we need to extend the code in order to handle unary rules C → C1?
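For the unary-rule question, one common approach can be sketched: after filling a span [min, max] with the preterminal or binary loop, unit productions C -> C1 are applied repeatedly until no chart entry changes. The rule encoding {C, C1} and the helper name are assumptions for illustration:

```java
public class UnaryClosure {
    // Apply unit productions C -> C1 (encoded as {C, C1}) to one span
    // until the chart stops changing; call this after the preterminal
    // loop for each [i - 1, i] and after the binary loop for each span.
    static void closeUnary(boolean[][][] chart, int[][] unaryRules,
                           int min, int max) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int[] r : unaryRules)
                if (chart[min][max][r[1]] && !chart[min][max][r[0]]) {
                    chart[min][max][r[0]] = true;  // C covers the span too
                    changed = true;
                }
        }
    }

    public static void main(String[] args) {
        // Toy check with NP -> Pro, numbering NP = 0 and Pro = 1:
        // a recognized Pro over a span also yields NP over the same span.
        boolean[][][] chart = new boolean[2][2][2];
        chart[0][1][1] = true;               // Pro spans fenceposts 0 .. 1
        closeUnary(chart, new int[][]{{0, 1}}, 0, 1);
        System.out.println(chart[0][1][0]);  // true: NP recognized
    }
}
```

The while loop is needed because unit productions can chain (e.g. C → C1 and C1 → C2).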

Summary The CKY algorithm is an efficient parsing algorithm for context-free grammars. Today: recognizing whether there is any parse tree at all. Next time: probabilistic parsing, computing the most probable parse tree.

Reading Recap of the introductory lecture: J&M chapters 12.1-12.7 and 13.1-13.3. CKY recognition: J&M section 13.4.1. CKY probabilistic parsing: J&M sections 14.1-14.2.