Finite-State and the Noisy Channel (Intro to NLP, J. Eisner)
2 Word Segmentation
x = theprophetsaidtothecity. What does this say? And what other words are substrings?
Could segment with parsing (how?), but slow.
Given L = a lexicon FSA that matches all English words, what do these regexps do?
  L*
  L* & x
  [ L " " ]*
  [ L " ":0 ]* .o. x   (what does this FSA look like? how many paths?)
  [ L " " ]* .o. [ [" ":0] | \" " ]* .o. x   (equivalent)
3 Word Segmentation
x = theprophetsaidtothecity. What does this say? And what other words are substrings?
Could segment with parsing (how?), but slow.
Given L = a lexicon FSA that matches all English words, let's try to recover the spaces with a single regexp:
  [ [ L " ":0 ]* .o. x ].u
where [ L " ":0 ]* : a word sequence transduced to its spaceless version;
      .o. x        : restrict to paths that produce the observed string x;
      .u           : take the language of upper strings of those paths.
4 Word Segmentation
x = theprophetsaidtothecity. Given L = a lexicon FSA that matches all English words.
Point of the lecture: this solution generalizes!
  [ [ L " ":0 ]* .o. x ].u
(a word sequence transduced to its spaceless version; restrict to paths that produce the observed string x; take the language of upper strings of those paths)
What if the lexicon is weighted? From unigrams to bigrams? Smooth L to include unseen words? Make the upper strings be over an alphabet of words, not letters?
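Below is a minimal sketch of the weighted question this slide raises, in plain Python rather than an FST toolkit. The lexicon and its unigram probabilities are invented, and the dynamic program plays the role of finding the best path through [[ L " ":0 ]* .o. x].u when L is weighted.

```python
import math

# A minimal sketch (invented lexicon and unigram probabilities): Viterbi
# segmentation of x into lexicon words, the weighted analogue of the
# regexp above.
LEX = {"the": 0.10, "prophet": 0.01, "said": 0.05,
       "to": 0.08, "city": 0.02, "prop": 0.001, "he": 0.03}

def segment(x):
    """best[i] = best log-prob of segmenting x[:i] (assumes x is segmentable)."""
    best = [0.0] + [float("-inf")] * len(x)
    back = [None] * (len(x) + 1)
    for i in range(1, len(x) + 1):
        for j in range(max(0, i - 12), i):          # bound word length at 12
            w = x[j:i]
            if w in LEX and best[j] + math.log(LEX[w]) > best[i]:
                best[i] = best[j] + math.log(LEX[w])
                back[i] = j
    words, i = [], len(x)
    while i > 0:                                    # follow backpointers
        words.append(x[back[i]:i])
        i = back[i]
    return list(reversed(words))

print(segment("theprophetsaidtothecity"))
# ['the', 'prophet', 'said', 'to', 'the', 'city']
```

A bigram lexicon would index the chart by (position, previous word); smoothing L amounts to giving unseen letter strings a small nonzero weight.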
5 Spelling Correction
Spelling correction also needs a lexicon L, but there is distortion beyond deleting spaces.
Let T be a transducer that models common typos and other spelling errors:
  ance → ence            (deliverance, ...)
  e → ε                  (deliverance, ...)
  ε → e // Cons _ Cons   (athlete, ...)
  rr → r                 (embarrass, occurrence, ...)
  ge → dge               (privilege, ...)
  etc.
What does L .o. T represent? How's that useful?
Should L and T have probabilities? We'll want T to include all possible errors.
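As an illustration of what L .o. T could be used for, here is a toy stand-in in plain Python: the two rewrite patterns and the three-word lexicon below are invented examples, and a real T would be a weighted transducer covering all possible errors.

```python
import re

# A minimal sketch: L .o. T maps each correct word to its possible
# misspellings; inverting that relation maps an observed misspelling back
# to the lexicon words that could explain it.
LEXICON = {"deliverance", "privilege", "occurrence"}
RULES = [("ance", "ence"),   # ance -> ence
         ("ge",   "dge")]    # ge -> dge

def misspellings(word):
    """Apply each rule at each position once (a crude stand-in for T)."""
    out = {word}                      # T should also pass words through unchanged
    for src, tgt in RULES:
        for m in re.finditer(src, word):
            out.add(word[:m.start()] + tgt + word[m.end():])
    return out

def correct(observed):
    return [w for w in LEXICON if observed in misspellings(w)]

print(correct("deliverence"))   # ['deliverance']
print(correct("priviledge"))    # ['privilege']
```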
6 Noisy Channel Model
real language Y → noisy channel (Y → X) → observed string X
We want to recover Y from X.
7 Noisy Channel Model
real language Y = correct spelling → noisy channel = typos → observed string X = misspelling
We want to recover Y from X.
8 Noisy Channel Model
real language Y = (lexicon space)* → noisy channel = delete spaces → observed string X = text without spaces
We want to recover Y from X.
9 Noisy Channel Model
real language Y = (lexicon space)*, weighted by a language model → noisy channel = pronunciation model + acoustic model → observed string X = speech
We want to recover Y from X.
10 Noisy Channel Model
real language Y = tree, generated by a probabilistic CFG → noisy channel = delete everything but the terminals → observed string X = text
We want to recover Y from X.
11 Noisy Channel Model
real language Y → noisy channel → observed string X
p(y) * p(x | y) = p(x,y). We want to recover y ∈ Y from x ∈ X: choose the y that maximizes p(y | x), or equivalently p(x,y).
12 Remember: Noisy Channel and Bayes' Theorem
The source picks y with probability p(Y=y); the noisy channel messes up y into x with probability p(X=x | Y=y); the decoder seeks the most likely reconstruction of y:
  maximize p(Y=y | X=x) = p(Y=y) p(X=x | Y=y) / p(X=x)
                        = p(Y=y) p(X=x | Y=y) / Σ_y' p(Y=y') p(X=x | Y=y')
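A tiny worked instance of this decoder, with invented prior and channel numbers (the words and probabilities below are hypothetical):

```python
# A minimal sketch: two candidate sources y, one observed x = "thw".
# The denominator p(X=x) is the same for every y, so the argmax needs
# only the numerator p(y) * p(x | y).
prior      = {"the": 0.6, "thee": 0.4}            # p(Y = y)
likelihood = {("thw", "the"): 0.02,               # p(X = "thw" | Y = y)
              ("thw", "thee"): 0.001}

x = "thw"
joint     = {y: prior[y] * likelihood[(x, y)] for y in prior}   # p(x, y)
evidence  = sum(joint.values())                                 # p(X = x)
posterior = {y: p / evidence for y, p in joint.items()}         # p(y | x)
print(max(posterior, key=posterior.get), posterior)             # 'the' wins
```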
13 Noisy Channel Model
[Figure: a source FSA, composed (.o.) with a channel FST.]
p(y) * p(x | y) = p(x,y). Note that p(x,y) sums to 1. Suppose x = C; what is the best y?
14 Noisy Channel Model
[Same figure.] p(y) * p(x | y) = p(x,y). Suppose x = C; what is the best y?
15 Noisy Channel Model
[Figure: compose further with an acceptor for the observed output C, whose arc weights are 0 or 1, the indicator of (X == x).]
p(y) * p(x | y), restricted just to the paths compatible with the output C, = p(x,y).
16 Noisy Channel Model
  [ Y .o. T .o. x ].u
source model FSA Y: accepts string y with weight p(y) (i.e., the paths accepting y have total prob p(y));
noisy channel FST T: accepts the pair (x,y) with weight p(x | y) (i.e., the paths transducing y → x have total prob p(x | y));
.o. x: restrict to paths that produce the observed string x;
.u: take the language of upper strings of those paths.
17 Just Bayes' Theorem Again
p(y): the prior on Y.
.o. p(x | y): the likelihood of y, for each x.
.o. the evidence X = x: restrict just to the paths compatible with the output C.
= p(x,y): divide by the constant p(x) to get the posterior p(y | x).
18 Morpheme Segmentation
Let the lexicon be a machine that matches all Turkish words. Same problem as word segmentation, just at a lower level: morpheme segmentation.
Turkish word: uygarlaştıramadıklarımızdanmışsınızcasına = uygar+laş+tır+ma+dık+ları+mız+dan+mış+sınız+ca+sı+na, "(behaving) as if you are among those whom we could not cause to become civilized".
Some constraints on the morpheme sequence: bigram probs.
Generative model: concatenate, then fix up the joints (stop + -ing = stopping, fly + -s = flies, vowel harmony). Use a cascade of transducers to handle all the fixups.
But this is just morphology! Can use probabilities here too (but people often don't).
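A minimal sketch of "concatenate, then fix up the joints" for the slide's two English examples; the two regex rules below are toy stand-ins for a real transducer cascade, not an actual implementation of one.

```python
import re

# A minimal sketch: naive concatenation with a morpheme boundary marker '+',
# followed by two toy fixup rules, then deletion of the boundary.
def join(stem, suffix):
    word = stem + "+" + suffix
    # fly + s -> flies: consonant + y becomes ie before -s
    word = re.sub(r"([^aeiou])y\+s$", r"\1ie+s", word)
    # stop + ing -> stopping: double a final consonant after consonant+vowel
    # when the suffix starts with a vowel
    word = re.sub(r"([^aeiou][aeiou])([bdgmnprt])\+([aeiou])", r"\1\2\2+\3", word)
    return word.replace("+", "")        # finally delete the boundary marker

print(join("fly", "s"))     # flies
print(join("stop", "ing"))  # stopping
print(join("walk", "ing"))  # walking (no rule fires)
```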
19 An FST: Baby Think & Baby Talk
[Figure: a weighted FST mapping "think" strings over {Mama, IWant} to "talk" strings, with arcs such as Mama:m, IWant:u/.8, IWant:ε/.1, and ε:b, ε:m arcs for meaningless babbling (ε on the think side = just babbling, no thought).]
Observe the talk string mm; recover the think string by composition. Candidates include Mama/.05, Mama Mama/.005, Mama IWant/.0005, Mama IWant IWant/..., with decreasing weights.
20 Joint Prob. by Double Composition
[Figure: compose an acceptor for the think string, the Baby Talk FST, and an acceptor for the observed talk string mm.]
p(* : mm) = the sum of all paths.
21 Cute method for summing all paths
[Figure: the same double composition as before.] p(* : mm) = the sum of all paths.
22 Cute method for summing all paths
[Figure: relabel every arc of the composed machine as ε:ε, keeping the weights.] p(* : mm) = the sum of all paths.
23 Cute method for summing all paths
[Figure: after relabeling, compose + minimize; the machine collapses to a single path whose weight is p(* : mm) = the sum of all paths.]
24 p(Mama | mm)
[Figure: compose an acceptor for Mama on the think side with the FST and the acceptor for mm, then compose + minimize.]
p(Mama * | mm) = p(Mama * : mm) / p(* : mm) = (sum of the Mama paths) / (sum of all paths).
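These quantities can be checked by brute force on a toy machine. The sketch below uses made-up arcs and probabilities (not the slide's exact machine) and enumerates paths directly instead of composing and minimizing.

```python
# A minimal sketch: arcs are (upper symbol or None, lower symbol or None,
# prob, next state); None marks epsilon. State 9 is final.
ARCS = {
    0: [("Mama", "m", 0.5, 1), ("IWant", "u", 0.3, 1), (None, None, 0.2, 9)],
    1: [(None, "m", 0.4, 1), (None, None, 0.3, 0)],
}
FINAL = {9}

def paths(state, prob, upper, lower, depth=0, out=None):
    """Depth-limited enumeration of (upper string, lower string, prob) triples."""
    if out is None:
        out = []
    if state in FINAL:
        out.append((tuple(upper), tuple(lower), prob))
    if depth < 6:                                   # bound path length for the sketch
        for up, lo, w, nxt in ARCS.get(state, []):
            paths(nxt, prob * w,
                  upper + ([up] if up else []),
                  lower + ([lo] if lo else []),
                  depth + 1, out)
    return out

talk = ("m", "m")
matching = [(u, p) for u, l, p in paths(0, 1.0, [], []) if l == talk]
Z = sum(p for _, p in matching)                     # p(* : mm), sum of all paths
mama = sum(p for u, p in matching if u == ("Mama",))
print("p(Mama | mm) =", mama / Z)                   # sum of Mama paths / sum of all paths
```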
25 Edit Distance
The older baby said caca. Was probably thinking clara (?). Do these match up well? How well?
  clara / caca:    3 substitutions + 1 deletion = total cost 4
  clar a / c a ca: 2 deletions + 1 insertion    = total cost 3
  clara / c aca:   1 deletion + 1 substitution  = total cost 2
  clara / caca:    5 deletions + 4 insertions   = total cost 9
The minimum edit distance (best alignment) is 2.
26 Edit distance as min-cost path
[Figure: a grid whose horizontal axis runs 0 c 1 l 2 a 3 r 4 a 5 (position in the upper string) and whose vertical axis runs 0 c 1 a 2 c 3 a 4 (position in the lower string).]
The minimum-cost path shows the best alignment, and its cost is the edit distance.
27 Edit distance as min-cost path
[Same grid, redrawn.] The minimum-cost path shows the best alignment, and its cost is the edit distance.
28 Edit distance as min-cost path
A deletion edge has cost 1. It advances in the upper string only, so it's horizontal. It pairs the next letter of the upper string with ε (empty) in the lower string.
29 Edit distance as min-cost path
An insertion edge has cost 1. It advances in the lower string only, so it's vertical. It pairs ε (empty) in the upper string with the next letter of the lower string.
30 Edit distance as min-cost path
A substitution edge has cost 0 or 1. It advances in the upper and lower strings simultaneously, so it's diagonal. It pairs the next letter of the upper string with the next letter of the lower string; the cost is 0 or 1 depending on whether those letters are identical.
31 Edit distance as min-cost path
We're looking for a path from the upper left to the lower right (so as to get through both strings). Solid edges have cost 0, dashed edges have cost 1, so we want the path with the fewest dashed edges.
32 Edit distance as min-cost path
[Figure: the path through the grid for the alignment clara / caca.] 3 substitutions + 1 deletion = total cost 4.

33 Edit distance as min-cost path
[Figure: the path for clar a / c a ca.] 2 deletions + 1 insertion = total cost 3.

34 Edit distance as min-cost path
[Figure: the path for clara / c aca.] 1 deletion + 1 substitution = total cost 2.

35 Edit distance as min-cost path
[Figure: the path for clara / caca.] 5 deletions + 4 insertions = total cost 9.
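The grid these slides trace is exactly the classic edit-distance dynamic program; a minimal sketch:

```python
# A minimal sketch: fill in the minimum cost of reaching each grid point
# (i, j) from the upper-left corner, then read off the edit distance at
# the lower-right corner.
def edit_distance(upper, lower):
    n, m = len(upper), len(lower)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1): d[i][0] = i            # i deletions
    for j in range(1, m + 1): d[0][j] = j            # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = 0 if upper[i-1] == lower[j-1] else 1
            d[i][j] = min(d[i-1][j] + 1,             # deletion (horizontal)
                          d[i][j-1] + 1,             # insertion (vertical)
                          d[i-1][j-1] + same)        # copy/substitution (diagonal)
    return d[n][m]

print(edit_distance("clara", "caca"))   # 2, matching the best alignment above
```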
36 Edit Distance Transducer
O(k) deletion arcs, O(k²) substitution arcs, O(k) insertion arcs, O(k) no-change arcs (where k is the alphabet size).
37 Stochastic Edit Distance Transducer
O(k) deletion arcs, O(k²) substitution arcs, O(k) insertion arcs, O(k) identity arcs. Likely edits = high-probability arcs. Defines p(x | y), e.g., p(caca | clara).
38 Most likely path that maps clara to caca?
clara .o. [the stochastic edit-distance transducer] .o. caca; then find the best path (by Dijkstra's algorithm).
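The best path can be found over the same grid as before, now maximizing probability rather than minimizing cost; the four arc probabilities below are invented for illustration.

```python
import math

# A minimal sketch: the most likely path mapping the upper string to the
# lower string, i.e., the best path through clara .o. T .o. caca.
def best_alignment_logprob(y, x, p_id=0.9, p_sub=0.01, p_del=0.02, p_ins=0.02):
    NEG = float("-inf")
    n, m = len(y), len(x)
    best = [[NEG] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0                               # log-prob at the upper-left corner
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == NEG:
                continue
            if i < n:                              # deletion edge (horizontal)
                best[i+1][j] = max(best[i+1][j], best[i][j] + math.log(p_del))
            if j < m:                              # insertion edge (vertical)
                best[i][j+1] = max(best[i][j+1], best[i][j] + math.log(p_ins))
            if i < n and j < m:                    # diagonal: identity or substitution
                p = p_id if y[i] == x[j] else p_sub
                best[i+1][j+1] = max(best[i+1][j+1], best[i][j] + math.log(p))
    return best[n][m]

print(math.exp(best_alignment_logprob("clara", "caca")))   # prob of the best path
```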
39 What string produced caca? (posterior distribution)
Lexicon* .o. [the stochastic edit-distance transducer] .o. caca = a more complicated FST, with cycles. Each path is a possible input string aligned to caca. A path's weight is proportional to the posterior probability of that path as an explanation for caca. Could find the best path.
40 Speech Recognition by FST Composition (Pereira & Riley 1996)
  trigram language model: p(word seq)
  .o. pronunciation model: p(phone seq | word seq)
  .o. acoustic model: p(acoustics | phone seq)
  .o. observed acoustics
41 Speech Recognition by FST Composition (Pereira & Riley 1996)
[Figure: the same cascade drawn as machines: the trigram language model (p(word seq)) .o. a pronunciation transducer with arcs such as CAT:k æ t (p(phone seq | word seq)) .o. phone-context / acoustic models (p(acoustics | phone seq)).]
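To make the factorization concrete, here is a toy scoring of a single candidate word sequence under the three factors. Every table below is invented (a bigram LM stands in for the trigram, pronunciations are deterministic, one acoustic frame per phone), and a real recognizer searches over the composed FSTs rather than scoring one hypothesis.

```python
import math

# A minimal sketch: multiply the three factors the cascade composes.
bigram_lm = {("<s>", "the"): 0.1, ("the", "cat"): 0.01}       # stand-in for the trigram LM
pronounce = {"the": ["dh", "ah"], "cat": ["k", "ae", "t"]}    # pronunciation model
acoustic  = {("dh", 0): 0.6, ("ah", 1): 0.5, ("k", 2): 0.7,   # p(frame t | phone), toy
             ("ae", 3): 0.4, ("t", 4): 0.6}

def log_score(words):
    lp = 0.0
    for prev, w in zip(["<s>"] + words, words):               # p(word seq)
        lp += math.log(bigram_lm[(prev, w)])
    phones = [p for w in words for p in pronounce[w]]         # p(phone seq | word seq)
    for t, ph in enumerate(phones):                           # p(acoustics | phone seq)
        lp += math.log(acoustic[(ph, t)])
    return lp

print(log_score(["the", "cat"]))
```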
42 HMM Tagging
Can think of this as a finite-state noisy channel, too! See the slides in the tagging lecture.