Parsing. Backoff models: absolute discounting. CS159, 3/4/11.


Admin

Updated slides/examples on backoff with absolute discounting (I'll review them again here today). Assignment 2. Watson vs. Humans (tonight through Wednesday).

PARSING
David Kauchak, CS159 Spring 2011
Some slides adapted from Ray Mooney

Backoff models: absolute discounting

  P_absolute(z | xy) = (C(xyz) - D) / C(xy)        if C(xyz) > 0
                       α(xy) * P_absolute(z | y)   otherwise

Subtract some absolute amount D from each of the counts (e.g. D = 0.75). This has a large effect on low counts and only a small effect on large counts. What is α(xy)?
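The piecewise definition above translates almost directly into code. Here is a minimal sketch, assuming trigram and bigram counts live in plain Python dicts and that α(x, y) has already been precomputed into a dict (how to compute it is worked out on the following slides); the function and argument names are illustrative, not from the course code.

```python
def p_absolute(z, x, y, tri_counts, bi_counts, alpha, p_backoff, D=0.75):
    """P_absolute(z | x y): absolute discounting with backoff to P(z | y)."""
    c_xyz = tri_counts.get((x, y, z), 0)
    c_xy = bi_counts.get((x, y), 0)
    if c_xyz > 0:
        # seen trigram: subtract D from its count, normalize by the bigram count
        return (c_xyz - D) / c_xy
    # unseen trigram: back off to the lower-order model, scaled by alpha(x, y)
    return alpha[(x, y)] * p_backoff(z, y)
```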

Backoff models: absolute discounting

Trigram counts:
  see the dog 1, see the cat 2, see the banana 4, see the man 1, see the woman 1, see the car 1
  the Dow Jones 10, the Dow rose 5, the Dow fell 5

p(rose | the Dow) = ?
p(cat | see the) = ?
p(jumped | the Dow) = ?
p(puppy | see the) = ?

p(cat | see the) = (2 - D) / 10 = (2 - 0.75) / 10 = 0.125

Backoff models: absolute discounting

p(puppy | see the) = ?
α(see the) = ?
How much probability mass did we reserve/discount for the bigram model?

  (# of word types seen after "see the") * D / count("see the")

For each of the unique trigrams, we subtracted D / count("see the") from the probability distribution.
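As a quick sanity check, the p(cat | see the) arithmetic from the slide, using the toy counts above (a throwaway snippet, not part of any assignment):

```python
D = 0.75
# counts of words observed after the bigram "see the"
after_see_the = {"dog": 1, "cat": 2, "banana": 4, "man": 1, "woman": 1, "car": 1}
count_see_the = sum(after_see_the.values())            # 10

p_cat = (after_see_the["cat"] - D) / count_see_the
print(p_cat)                                           # 0.125

# "puppy" never follows "see the" in the data, so p(puppy | see the) must come
# from the backed-off bigram model p(puppy | the), scaled by alpha(see the).
```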

Backoff models: absolute discounting

  see the dog 1, see the cat 2, see the banana 4, see the man 1, see the woman 1, see the car 1

p(puppy | see the) = ?
α(see the) = ?

  reserved_mass(see the) = (# of word types seen after "see the") * D / count("see the")
                         = 6 * 0.75 / 10
                         = 0.45

We distribute this probability mass to all of the bigrams we backed off to.

Calculating α

We have some number of bigrams we're going to back off to, i.e. those * where C(see the *) = 0: the unseen trigrams starting with "see the". When we back off, for each of these we'll be including their probability in the model: p(* | the). α is the normalizing constant so that the sum of these probabilities equals the reserved probability mass:

  α(see the) * Σ_{*: C(see the *) = 0} p(* | the) = reserved_mass(see the)

Calculating α in general: trigrams

We can calculate α two ways. Based on those continuations we haven't seen:

  α(see the) = reserved_mass(see the) / Σ_{*: C(see the *) = 0} p(* | the)

Or, more often, based on those we do see:

  α(see the) = reserved_mass(see the) / (1 - Σ_{*: C(see the *) > 0} p(* | the))

In general, for a trigram model backing off from a bigram "A B":
1. Calculate the reserved mass: reserved_mass(bigram) = (# of word types seen after the bigram) * D / count(bigram)
2. Calculate the sum of the backed-off probability: 1 - Σ_{*: C(A B *) > 0} p(* | B), or equivalently Σ_{*: C(A B *) = 0} p(* | B); either is fine in practice, but the left is easier to compute.
3. Calculate α: α(A B) = reserved_mass(A B) / (1 - Σ_{*: C(A B *) > 0} p(* | B)), where the denominator is 1 minus the sum of the bigram probabilities of those trigrams we saw starting with "A B".
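The reserved mass and α for the "see the" context follow directly from these formulas. A sketch, where p_bigram(w, context) stands for the already-estimated bigram model p(w | "the") (a placeholder name, not course code):

```python
D = 0.75
after_see_the = {"dog": 1, "cat": 2, "banana": 4, "man": 1, "woman": 1, "car": 1}
count_see_the = sum(after_see_the.values())                  # 10

# reserved_mass(see the) = (# of types after "see the") * D / count("see the")
reserved_mass = len(after_see_the) * D / count_see_the       # 6 * 0.75 / 10 = 0.45

def alpha_see_the(p_bigram):
    # "based on those we do see": 1 minus the bigram probabilities p(w | the)
    # of the continuations w that we did observe after "see the"
    seen = sum(p_bigram(w, "the") for w in after_see_the)
    return reserved_mass / (1 - seen)
```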

Calculating α in general: bigrams

Same idea, one level down. For a bigram model backing off from a word A:
1. Calculate the reserved mass: reserved_mass(unigram) = (# of word types seen after the unigram) * D / count(unigram)
2. Calculate the sum of the backed-off probability: 1 - Σ_{*: C(A *) > 0} p(*), or equivalently Σ_{*: C(A *) = 0} p(*); either is fine in practice, but the left is easier to compute.
3. Calculate α: α(A) = reserved_mass(A) / (1 - Σ_{*: C(A *) > 0} p(*)), where the denominator is 1 minus the sum of the unigram probabilities of those bigrams we saw starting with the word A.

Calculating backoff models in practice

Store the α's in another table:
- if it's a trigram backed off to a bigram, it's a table keyed by the bigrams
- if it's a bigram backed off to a unigram, it's a table keyed by the unigrams
Compute the α's during training: after calculating all of the probabilities of seen unigrams/bigrams/trigrams, go back through and calculate the α's (you should have all of the information you need). During testing, it should then be easy to apply the backoff model with the α's pre-calculated.

Backoff models: absolute discounting

  reserved_mass(bigram) = (# of word types seen after the bigram) * D / count(bigram)

Two nice attributes: the reserved mass decreases if we've seen the bigram more often (we should be more confident that the unseen trigram is no good), and it increases if the bigram tends to be followed by lots of other words (we will be more likely to see an unseen trigram).

Syntactic structure

  (S (NP (NP (DT the) (NN man)) (PP (IN in) (NP (DT the) (NN hat))))
     (VP (VBD ran) (PP (TO to) (NP (DT the) (NN park)))))

  The man in the hat ran to the park.
  DT  NN  IN DT  NN  VBD TO DT  NN
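The bracketed structure is easier to read when drawn as a tree, for example with NLTK's Tree class. A small sketch; the phrase labels (S, NP, VP, PP) are the reconstructed ones shown above, the POS tags are from the slide:

```python
from nltk import Tree

parse = ("(S (NP (NP (DT the) (NN man)) (PP (IN in) (NP (DT the) (NN hat)))) "
         "(VP (VBD ran) (PP (TO to) (NP (DT the) (NN park)))))")

# Parse the bracketed string and render it as an ASCII tree diagram
Tree.fromstring(parse).pretty_print()
```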

CFG: Example

Many possible CFGs for English; here is an example (fragment):

  S -> NP VP
  VP -> V NP
  NP -> DetP N | AdjP NP
  AdjP -> Adj | Adv AdjP
  N -> boy | girl
  V -> sees | likes
  Adj -> big | small
  Adv -> very
  DetP -> a | the

Grammar questions

Can we determine if a sentence is grammatical? Given a sentence, can we determine the syntactic structure? Can we determine how likely a sentence is to be grammatical? To be an English sentence? Can we generate candidate, grammatical sentences?

Parsing

Parsing is the field of NLP interested in automatically determining the syntactic structure of a sentence. Parsing can also be thought of as determining what sentences are valid English sentences.

Parsing

Given a CFG and a sentence, determine the possible parse tree(s).

  S -> NP VP
  NP -> PRP
  NP -> N PP
  NP -> N
  VP -> V NP
  VP -> V NP PP
  PP -> IN N
  PRP -> I
  V -> eat
  N -> sushi
  N -> tuna
  IN -> with

  I eat sushi with tuna

What parse trees are possible for this sentence? How did you figure it out?
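One way to answer "what parse trees are possible?" mechanically is to hand the toy grammar to a chart parser, for example NLTK's. A sketch only; the grammar is the one above, with terminals quoted as nltk.CFG requires:

```python
import nltk

grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> PRP | N PP | N
    VP -> V NP | V NP PP
    PP -> IN N
    PRP -> 'I'
    V  -> 'eat'
    N  -> 'sushi' | 'tuna'
    IN -> 'with'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I eat sushi with tuna".split()):
    print(tree)   # two parses: PP attached inside the NP, or directly to the VP
```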

Parsing

Two parses of "I eat sushi with tuna" under this grammar:

  (S (NP (PRP I)) (VP (V eat) (NP (N sushi) (PP (IN with) (N tuna)))))
  (S (NP (PRP I)) (VP (V eat) (NP (N sushi)) (PP (IN with) (N tuna))))

What is the difference between these parses?

Parsing

Given a CFG and a sentence, determine the possible parse tree(s). Approaches? Algorithms?

Parsing

Top-down parsing: start at the top (usually S) and apply rules, matching left-hand sides and replacing them with right-hand sides.
Bottom-up parsing: start at the bottom (i.e. the words) and build the parse tree up from there, matching right-hand sides and replacing them with left-hand sides.

Parsing Example

[A sequence of slides steps through the top-down and bottom-up parses of an example sentence node by node; only scattered category labels (Pronoun, Proper-Noun, Aux) survive in the transcription.]
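To make the top-down procedure concrete, here is a minimal recursive-descent recognizer over the small "I eat sushi with tuna" grammar from earlier (the grammar used in the stepped-through slides was not captured). It expands productions left to right and backtracks; note that it would loop forever on a left-recursive rule such as NP -> NP PP, which is one reason chart-based methods are introduced below.

```python
# Grammar as a dict: nonterminal -> list of right-hand sides.
# Anything not in the dict is treated as a terminal (a word).
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["PRP"], ["N", "PP"], ["N"]],
    "VP":  [["V", "NP"], ["V", "NP", "PP"]],
    "PP":  [["IN", "N"]],
    "PRP": [["I"]],
    "V":   [["eat"]],
    "N":   [["sushi"], ["tuna"]],
    "IN":  [["with"]],
}

def derive_symbol(sym, words, i):
    """Yield every position j such that sym can derive words[i:j]."""
    if sym not in GRAMMAR:                      # terminal: must match the next word
        if i < len(words) and words[i] == sym:
            yield i + 1
        return
    for rhs in GRAMMAR[sym]:                    # top-down: expand each production
        yield from derive_sequence(rhs, words, i)

def derive_sequence(rhs, words, i):
    """Yield every position reachable by matching the whole sequence rhs."""
    if not rhs:
        yield i
        return
    for j in derive_symbol(rhs[0], words, i):             # match the first symbol...
        yield from derive_sequence(rhs[1:], words, j)      # ...then the rest

words = "I eat sushi with tuna".split()
print(any(j == len(words) for j in derive_symbol("S", words, 0)))   # True
```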

Parsing

Pros/Cons?
Top-down: only examines parses that could be valid parses (i.e. with an S on top), but doesn't take into account the actual words!
Bottom-up: only examines structures that have the actual words as the leaves, but examines sub-parses that may not result in a valid parse!

Why is parsing hard?

Actual grammars are large, and there is lots of ambiguity! Most sentences have many parses; some sentences have a lot of parses. Even for sentences that are not ambiguous, there is often ambiguity in the subtrees (i.e. multiple ways to parse a phrase).

Why is parsing hard?

I saw the man on the hill with the telescope

What are some interpretations?
- I was on the hill that has a telescope when I saw a man.
- I saw a man who was on a hill and who had a telescope.
- I saw a man who was on the hill that has a telescope on it.
- I was on the hill when I used the telescope to see a man.
- Using a telescope, I saw a man who was on a hill.
- ...

Structural Ambiguity Can Give Exponential Parses

I saw the man on the hill with the telescope
(me, see, a man, the telescope, the hill)

Dynamic Programming Parsing

To avoid extensive repeated work you must cache intermediate results, specifically found constituents. Caching (memoizing) is critical to obtaining a polynomial-time parsing (recognition) algorithm for CFGs. Dynamic programming algorithms based on both top-down and bottom-up search can achieve O(n^3) recognition time, where n is the length of the input string.

Dynamic Programming Parsing Methods

CKY (Cocke-Kasami-Younger) algorithm: based on bottom-up parsing; requires first normalizing the grammar.
Earley parser: based on top-down parsing; does not require normalizing the grammar, but is more complex.
These both fall under the general category of chart parsers, which retain completed constituents in a chart.

CKY

First, the grammar must be converted to Chomsky normal form (CNF), in which productions have either exactly two non-terminal symbols on the RHS or one terminal symbol (lexicon rules). Then parse bottom-up, storing phrases formed from all substrings in a triangular table (the chart).

CNF Grammar

Original grammar:

  S -> NP VP
  VP -> VB NP
  VP -> VB NP PP
  NP -> DT NN
  NP -> NN
  NP -> NP PP
  PP -> IN NP
  DT -> the
  IN -> with
  VB -> film
  VB -> trust
  NN -> man
  NN -> film
  NN -> trust

Binarized (CNF) grammar, where the ternary rule VP -> VB NP PP is replaced by introducing a new symbol VP2:

  S -> NP VP
  VP -> VB NP
  VP -> VP2 PP
  VP2 -> VB NP
  NP -> DT NN
  NP -> NN
  NP -> NP PP
  PP -> IN NP
  DT -> the
  IN -> with
  VB -> film
  VB -> trust
  NN -> man
  NN -> film
  NN -> trust
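A compact CKY recognizer over the binarized grammar above, as a sketch: the test sentence "film the man with trust" is assumed here since the slide's example sentence was not captured, and the unit production NP -> NN is handled with a small unary-closure step after each cell is filled.

```python
from collections import defaultdict

# Binarized grammar from the slide: binary rules A -> B C plus one unary rule,
# with the lexicon kept separately as word -> set of preterminals.
BINARY = [
    ("S",   "NP",  "VP"),
    ("VP",  "VB",  "NP"),
    ("VP",  "VP2", "PP"),
    ("VP2", "VB",  "NP"),
    ("NP",  "DT",  "NN"),
    ("NP",  "NP",  "PP"),
    ("PP",  "IN",  "NP"),
]
UNARY = [("NP", "NN")]
LEXICON = {
    "the": {"DT"}, "with": {"IN"},
    "film": {"VB", "NN"}, "trust": {"VB", "NN"}, "man": {"NN"},
}

def unary_closure(cell):
    """Apply unary rules A -> B until no new symbols are added to the cell."""
    added = True
    while added:
        added = False
        for a, b in UNARY:
            if b in cell and a not in cell:
                cell.add(a)
                added = True
    return cell

def cky_recognize(words):
    n = len(words)
    chart = defaultdict(set)                       # chart[(i, j)] covers words[i:j]
    for i, w in enumerate(words):
        chart[(i, i + 1)] = unary_closure(set(LEXICON.get(w, ())))
    for length in range(2, n + 1):                 # longer spans built from shorter ones
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):              # every split point
                for a, b, c in BINARY:
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        chart[(i, j)].add(a)
            unary_closure(chart[(i, j)])
    return chart

words = "film the man with trust".split()          # assumed example sentence
chart = cky_recognize(words)
print(chart[(0, len(words))])   # e.g. {'VP', 'VP2'}: a verb phrase, but no S,
                                # since S -> NP VP would need an explicit subject NP
```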