Finite-State and the Noisy Channel
Intro to NLP - J. Eisner


Slide 1: Finite-State and the Noisy Channel (title slide)

Slide 2: Word Segmentation
x = theprophetsaidtothecity
What does this say? And what other words are substrings?
Could segment with parsing (how?), but slow.
Given L = a lexicon FSA that matches all English words, what do these regexps do?
  (here " " denotes the space character and 0 denotes the empty string ε)
  L*
  L* & x
  [ L " " ]*
  [ L " ":0 ]* .o. x     (what does this FSA look like? how many paths?)
  [ L " " ]* .o. [ [" ":0] | \" " ]* .o. x     (equivalent)

Slide 3: Word Segmentation
x = theprophetsaidtothecity
Given L = a lexicon FSA that matches all English words, let's try to recover the spaces with a single regexp:
  [ [ L " ":0 ]* .o. x ].u
  [ L " ":0 ]*   word sequence transduced to its spaceless version
  .o. x          restrict to paths that produce the observed string x
  .u             language of upper strings of those paths

Slide 4: Word Segmentation
x = theprophetsaidtothecity; L = a lexicon FSA that matches all English words.
Point of the lecture: this solution generalizes!
  [ [ L " ":0 ]* .o. x ].u
What if the lexicon is weighted? From unigrams to bigrams?
Smooth L to include unseen words?
Make the upper strings be over an alphabet of words, not letters?
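To make this concrete, here is a minimal Python sketch (not from the lecture; the toy lexicon and its unigram probabilities are invented) that recovers the best segmentation of a spaceless string by dynamic programming. When L is a weighted lexicon, this is the same computation as finding the best path in [ [ L " ":0 ]* .o. x ].u.

    import math

    # Hypothetical weighted lexicon: word -> unigram probability (toy numbers).
    lexicon = {"the": 0.30, "to": 0.25, "said": 0.10, "prophet": 0.05,
               "city": 0.05, "he": 0.05, "prop": 0.01}

    def segment(x, max_word_len=12):
        """Return (best log-probability, best word sequence) for spaceless x."""
        n = len(x)
        best = [(-math.inf, None)] * (n + 1)   # best[i] = (score, backpointer) for x[:i]
        best[0] = (0.0, None)
        for i in range(1, n + 1):
            for j in range(max(0, i - max_word_len), i):
                word = x[j:i]
                if word in lexicon and best[j][0] > -math.inf:
                    score = best[j][0] + math.log(lexicon[word])
                    if score > best[i][0]:
                        best[i] = (score, j)
        if best[n][0] == -math.inf:
            return None                        # no segmentation using this lexicon
        words, i = [], n
        while i > 0:                           # follow backpointers
            j = best[i][1]
            words.append(x[j:i])
            i = j
        return best[n][0], list(reversed(words))

    print(segment("theprophetsaidtothecity"))
    # -> (log-probability, ['the', 'prophet', 'said', 'to', 'the', 'city'])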

Slide 5: Spelling Correction
Spelling correction also needs a lexicon L, but there is distortion beyond deleting spaces.
Let T be a transducer that models common typos and other spelling errors:
  ance → ence            (deliverance, ...)
  e → ε                  (deliverance, ...)
  ε → e // Cons _ Cons   (athlete, ...)
  rr → r                 (embarrass, occurrence, ...)
  ge → dge               (privilege, ...)
  etc.
What does L .o. T represent? How is that useful?
Should L and T have probabilities? We'll want T to include all possible errors.

Slide 6: Noisy Channel Model
real language Y  →  noisy channel (Y → X)  →  observed string X
We want to recover Y from X.

Slide 7: Noisy Channel Model (spelling correction)
real language Y = correct spelling  →  noisy channel = typos  →  observed string X = misspelling
We want to recover Y from X.

Slide 8: Noisy Channel Model (word segmentation)
real language Y = (lexicon space)*  →  noisy channel = delete spaces  →  observed string X = text without spaces
We want to recover Y from X.

Slide 9: Noisy Channel Model (speech recognition)
real language Y = (lexicon space)*, scored by a language model  →  noisy channel = pronunciation and acoustic models  →  observed string X = speech
We want to recover Y from X.

Slide 10: Noisy Channel Model (parsing)
real language Y = tree, generated by a probabilistic CFG  →  noisy channel = delete everything but terminals  →  observed string X = text
We want to recover Y from X.

Slide 11: Noisy Channel Model
real language Y, with p(Y)  →  noisy channel, with p(X | Y)  →  observed string X
p(Y) · p(X | Y) = p(X, Y)
We want to recover y ∈ Y from x ∈ X: choose the y that maximizes p(y | x), or equivalently p(x, y).

Slide 12: Remember: Noisy Channel and Bayes' Theorem
Generate y with probability p(Y = y); the channel messes y up into x with probability p(X = x | Y = y); the decoder outputs the most likely reconstruction of y:
  maximize p(Y = y | X = x) = p(Y = y) p(X = x | Y = y) / p(X = x)
                            = p(Y = y) p(X = x | Y = y) / Σ_y' p(Y = y') p(X = x | Y = y')
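A small self-contained sketch of the decoder (illustration only; the candidate words and all probabilities below are invented, not from the slides): given a prior p(y) and a channel p(x | y), choose the y that maximizes p(y) p(x | y), which is the same y that maximizes the posterior p(y | x).

    # Toy noisy-channel decoder for a single misspelled word.
    prior = {"deliverance": 0.6, "delivery": 0.3, "dalliance": 0.1}     # p(y)
    channel = {                                                         # p(x | y)
        ("deliverence", "deliverance"): 0.20,
        ("deliverence", "delivery"):    0.001,
        ("deliverence", "dalliance"):   0.0001,
    }

    def decode(x):
        """Return (most likely y, its posterior p(y | x)) for observed x."""
        joint = {y: p_y * channel.get((x, y), 0.0) for y, p_y in prior.items()}  # p(x, y)
        evidence = sum(joint.values())                                           # p(x)
        if evidence == 0.0:
            return None
        best = max(joint, key=joint.get)
        return best, joint[best] / evidence

    print(decode("deliverence"))   # -> ('deliverance', posterior close to 1)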

Slides 13-14: Noisy Channel Model
[Diagram: a source FSA carrying p(y), composed (.o.) with a channel FST carrying p(x | y); each path of the result carries p(y) · p(x | y) = p(x, y). Note that p(x, y) sums to 1 over all pairs (x, y).]
Suppose x = "C"; what is the best y?

Slide 15: Noisy Channel Model
[Diagram: p(y) FSA  .o.  p(x | y) FST  .o.  an acceptor for the observed string, with weight 1 if X = x and 0 otherwise]
Composing with that acceptor restricts the machine to just the paths compatible with the output "C"; each surviving path keeps its weight p(x, y).

Slide 16: Noisy Channel Model, the general recipe
  [ Y .o. T .o. x ].u
  Y      source model FSA: accepts string y with weight p(y) (i.e., the paths accepting y have total probability p(y))
  T      noisy channel FST: accepts the pair (x, y) with weight p(x | y) (i.e., the paths transducing y to x have total probability p(x | y))
  .o. x  restrict to paths that produce the observed string x
  .u     language of upper strings of those paths
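The recipe can be imitated directly on toy weighted relations represented as Python dicts (a sketch for intuition only; a real system would use weighted FSTs, and the strings and weights here are invented): compose the source with the channel, restrict the lower side to the observed x, and project onto the upper strings.

    # Toy version of [ Y .o. T .o. x ].u with relations small enough to enumerate.
    Y = {"ab": 0.7, "ba": 0.3}                      # source model: p(y)
    T = {                                           # channel: p(x | y), keyed by (y, x)
        ("ab", "ab"): 0.8, ("ab", "aa"): 0.2,
        ("ba", "ba"): 0.9, ("ba", "aa"): 0.1,
    }

    def compose(source, channel):
        """Join p(y) with p(x | y) to get the joint p(x, y), keyed by (y, x)."""
        return {(y, x): p_y * p_xy
                for y, p_y in source.items()
                for (y2, x), p_xy in channel.items() if y2 == y}

    def restrict_and_project_up(joint, observed_x):
        """Keep pairs whose lower string is observed_x; return weights on upper strings."""
        return {y: p for (y, x), p in joint.items() if x == observed_x}

    joint = compose(Y, T)                                    # Y .o. T
    print(restrict_and_project_up(joint, "aa"))              # [ ... .o. x ].u
    # -> ab gets weight 0.14, ba gets 0.03 (up to rounding), so the best y is 'ab'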

Slide 17: It's Just Bayes' Theorem Again
  Y: p(y) is the prior on Y
  .o. T: p(x | y) is the likelihood of y, for each x
  .o. x: the evidence X = x (restrict to just the paths compatible with the observed output "C")
  = p(x, y); divide by the constant p(x) to get the posterior p(y | x)

Slide 18: Morpheme Segmentation
Let Lexicon be a machine that matches all Turkish words.
Same problem as word segmentation, just at a lower level: morpheme segmentation.
Turkish word: uygarlaştıramadıklarımızdanmışsınızcasına
  = uygar+laş+tır+ma+dık+ları+mız+dan+mış+sınız+ca+sı+na
  "(behaving) as if you are among those whom we could not cause to become civilized"
Some constraints on the morpheme sequence: bigram probabilities.
Generative model: concatenate, then fix up the joints
  stop + -ing = stopping, fly + -s = flies, vowel harmony
Use a cascade of transducers to handle all the fixups.
But this is just morphology! Can use probabilities here too (but people often don't).

Slide 19: An FST: Baby Think & Baby Talk
[Diagram: a weighted FST mapping "think" strings to "talk" strings. Its arcs include babbling arcs that emit a letter with no thought, such as ε:m and ε:b, and word arcs such as Mama:m and IWant:u, each with a probability.]
Observe the talk "mm" and recover the think string by composition. The candidate think strings include Mama, Mama IWant, Mama IWant IWant, ..., and the empty string (pure babbling), each with its path probability.

Slide 20: Joint Probability by Double Composition
[Diagram: an acceptor for all think strings  .o.  the baby FST  .o.  an acceptor for the observed talk "mm"; the composed machine keeps exactly the paths that transduce some think string to mm]
p(* : mm), the probability that the talk is mm with any think string, = sum of all paths in the composed machine.

Slides 21-23: Cute method for summing all paths
[Diagrams: the same composed machine; then the machine with its arc labels stripped away, since only the weights matter for the sum; then the result of compose + minimize, which collapses the parallel paths]
Minimization merges the parallel paths, so the total probability p(* : mm), the sum of all paths, can be read directly off the collapsed machine.

Slide 24: p(Mama | mm)
[Diagram: restrict the think side to Mama by composing with an acceptor for Mama, then compose + minimize as on the previous slides]
  p(Mama : mm) = sum of the Mama paths
  p(* : mm)    = sum of all paths
  p(Mama | mm) = p(Mama : mm) / p(* : mm)
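The same quantities can be checked by brute force on a tiny model. The sketch below uses invented arc probabilities (the slide's numbers did not survive transcription): it enumerates every path of a one-state baby FST that emits the talk string mm, sums the path probabilities for each think string, and normalizes to get the posterior p(think | mm).

    from collections import defaultdict

    STOP = 0.10                  # probability of stopping (toy number)
    ARCS = [                     # (think word, or None for babbling; emitted letters; probability)
        (None,    "m",  0.30),
        (None,    "b",  0.20),
        ("Mama",  "mm", 0.25),
        ("IWant", "u",  0.15),
    ]

    def joint_probs(talk):
        """Map each think sequence to p(think, talk) by summing its paths."""
        totals = defaultdict(float)
        def walk(remaining, think, prob):
            if not remaining:                      # may stop once all of talk is emitted
                totals[think] += prob * STOP
            for word, emit, p in ARCS:
                if remaining.startswith(emit):     # this arc explains a prefix of talk
                    walk(remaining[len(emit):],
                         think + (word,) if word else think,
                         prob * p)
        walk(talk, (), 1.0)
        return totals

    joint = joint_probs("mm")
    p_mm = sum(joint.values())                     # p(talk = mm) = sum of all paths
    for think, p in sorted(joint.items(), key=lambda kv: -kv[1]):
        print(think, p, "posterior:", p / p_mm)
    # ('Mama',) comes out as the most probable explanation of mm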

Slide 25: Edit Distance
The older baby said caca and was probably thinking clara (?). Do these match up well? How well? Some alignments of clara with caca:
  c l a r a / c a c a _                    3 substitutions + 1 deletion  = total cost 4
  c l a r _ a / c _ a _ c a                2 deletions + 1 insertion     = total cost 3
  c l a r a / c _ a c a                    1 deletion + 1 substitution   = total cost 2   ← minimum edit distance (best alignment)
  c l a r a _ _ _ _ / _ _ _ _ _ c a c a    5 deletions + 4 insertions    = total cost 9
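The minimum edit distance itself is the textbook dynamic program below (standard Levenshtein distance with unit costs for insertion, deletion, and substitution); it computes exactly the cost of the min-cost path through the grid drawn on the next slides.

    def edit_distance(upper, lower):
        m, n = len(upper), len(lower)
        # dist[i][j] = cost of the best alignment of upper[:i] with lower[:j]
        dist = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            dist[i][0] = i                         # i deletions
        for j in range(n + 1):
            dist[0][j] = j                         # j insertions
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                sub = 0 if upper[i - 1] == lower[j - 1] else 1
                dist[i][j] = min(dist[i - 1][j] + 1,        # deletion (horizontal edge)
                                 dist[i][j - 1] + 1,        # insertion (vertical edge)
                                 dist[i - 1][j - 1] + sub)  # substitution or copy (diagonal edge)
        return dist[m][n]

    print(edit_distance("clara", "caca"))   # -> 2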

Slides 26-27: Edit distance as min-cost path
[Diagram: an alignment grid. The horizontal axis is the position in the upper string (0 c 1 l 2 a 3 r 4 a 5); the vertical axis is the position in the lower string (0 c 1 a 2 c 3 a 4).]
The minimum-cost path through the grid shows the best alignment, and its cost is the edit distance.

Slide 28: Edit distance as min-cost path
A deletion edge has cost 1. It advances in the upper string only, so it's horizontal. It pairs the next letter of the upper string with ε (empty) in the lower string.

Slide 29: Edit distance as min-cost path
An insertion edge has cost 1. It advances in the lower string only, so it's vertical. It pairs ε (empty) in the upper string with the next letter of the lower string.

Slide 30: Edit distance as min-cost path
A substitution edge has cost 0 or 1. It advances in the upper and lower strings simultaneously, so it's diagonal. It pairs the next letter of the upper string with the next letter of the lower string; the cost is 0 or 1 depending on whether those letters are identical.

Slide 31: Edit distance as min-cost path
We're looking for a path from the upper left of the grid to the lower right (so as to get through both strings). Solid edges have cost 0, dashed edges have cost 1, so we want the path with the fewest dashed edges.

Slides 32-35: Edit distance as min-cost path
[Diagrams: the four alignments from slide 25 drawn as paths through the grid]
  clara / caca, letter by letter:            3 substitutions + 1 deletion  = total cost 4
  clar_a / c_a_ca:                           2 deletions + 1 insertion     = total cost 3
  clara / c_aca:                             1 deletion + 1 substitution   = total cost 2
  delete all of clara, insert all of caca:   5 deletions + 4 insertions    = total cost 9

Slide 36: Edit Distance Transducer
[Diagram: a one-state transducer over an alphabet of k letters]
  O(k) deletion arcs, O(k) insertion arcs, O(k²) substitution arcs, O(k) no-change arcs.

Slide 37: Stochastic Edit Distance Transducer
Same shape: O(k) deletion arcs, O(k) insertion arcs, O(k²) substitution arcs, O(k) identity arcs, but now likely edits are high-probability arcs.
This defines p(x | y), e.g. p(caca | clara).
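A sketch of the stochastic version (all probabilities invented, and a single probability per operation instead of one per letter pair): give each edit operation a probability, turn probabilities into costs with -log, and run the same dynamic program. The min-cost path is then the most probable single alignment, a Viterbi approximation to p(x | y).

    import math

    # Hypothetical operation probabilities.
    cost = {op: -math.log(p)
            for op, p in {"copy": 0.90, "sub": 0.02, "del": 0.04, "ins": 0.04}.items()}

    def best_alignment_neglogprob(y, x):
        """-log probability of the best alignment transducing y (upper) to x (lower)."""
        m, n = len(y), len(x)
        d = [[math.inf] * (n + 1) for _ in range(m + 1)]
        d[0][0] = 0.0
        for i in range(m + 1):
            for j in range(n + 1):
                if i < m:
                    d[i + 1][j] = min(d[i + 1][j], d[i][j] + cost["del"])
                if j < n:
                    d[i][j + 1] = min(d[i][j + 1], d[i][j] + cost["ins"])
                if i < m and j < n:
                    op = "copy" if y[i] == x[j] else "sub"
                    d[i + 1][j + 1] = min(d[i + 1][j + 1], d[i][j] + cost[op])
        return d[m][n]

    print(math.exp(-best_alignment_neglogprob("clara", "caca")))
    # approximate p(caca | clara), taking only its best alignment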

Slide 38: Most likely path that maps clara to caca?
  clara .o. (stochastic edit transducer) .o. caca
Find the best path in the composed machine (e.g. by Dijkstra's algorithm).

Slide 39: What string produced caca? (posterior distribution)
  Lexicon* .o. (stochastic edit transducer) .o. caca
= a more complicated FST, with cycles. Each path is a possible input string aligned to caca. A path's weight is proportional to the posterior probability of that path as an explanation for caca. Could find the best path.
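Continuing the sketch (toy lexicon, invented channel parameters, and the best-alignment approximation in place of summing over all alignments): score each candidate lexicon word y by log p(y) + log p(caca | y) and take the argmax. This is a brute-force stand-in for finding the best path in Lexicon* .o. (edit transducer) .o. caca when the lexicon is small.

    import math

    EDIT_P, COPY_P = 0.05, 0.90                               # hypothetical channel parameters
    lexicon = {"clara": 0.4, "cara": 0.3, "coco": 0.3}        # hypothetical p(y)

    def channel_logprob(y, x):
        """log p(x | y) from the best alignment of y (upper) with x (lower)."""
        m, n = len(y), len(x)
        d = [[math.inf] * (n + 1) for _ in range(m + 1)]
        d[0][0] = 0.0
        for i in range(m + 1):
            for j in range(n + 1):
                if i < m:
                    d[i + 1][j] = min(d[i + 1][j], d[i][j] - math.log(EDIT_P))     # deletion
                if j < n:
                    d[i][j + 1] = min(d[i][j + 1], d[i][j] - math.log(EDIT_P))     # insertion
                if i < m and j < n:
                    p = COPY_P if y[i] == x[j] else EDIT_P
                    d[i + 1][j + 1] = min(d[i + 1][j + 1], d[i][j] - math.log(p))  # copy or substitution
        return -d[m][n]

    scores = {y: math.log(p_y) + channel_logprob(y, "caca") for y, p_y in lexicon.items()}
    print(max(scores, key=scores.get))   # -> 'cara' under these toy numbers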

Slide 40: Speech Recognition by FST Composition (Pereira & Riley 1996)
  trigram language model     p(word seq)
  .o. pronunciation model    p(phone seq | word seq)
  .o. acoustic model         p(acoustics | phone seq)
  .o. observed acoustics

Slide 41: Speech Recognition by FST Composition (Pereira & Riley 1996)
[Diagram: the same cascade with the middle stages expanded: the trigram language model p(word seq), a pronunciation dictionary with arcs such as CAT : k æ t plus phone-context transducers, together giving p(phone seq | word seq), and an acoustic model giving p(acoustics | phone seq)]

Slide 42: HMM Tagging
Can think of this as a finite-state noisy channel, too! See the slides in the tagging lecture.
