Part-of-Speech Tagging
|
|
- Cameron Sullivan
- 6 years ago
- Views:
Transcription
1 Part-of-Speech Tagging A Canonical Finite-State Task Intro to NLP - J. Eisner 1
2 The Tagging Task Input: the lead paint is unsafe Output: the/ lead/n paint/n is/v unsafe/ Uses: text-to-speech (how do we pronounce lead?) can write regexps like () * N+ over the output preprocessing to speed up parser (but a little dangerous) if you know the tag, you can back off to it in other tasks Intro to NLP - J. Eisner 2
3 Why Do We Care? Input: the lead paint is unsafe Output: the/ lead/n paint/n is/v unsafe/ The first statistical NLP task Been done to death by different methods Easy to evaluate (how many tags are correct?) Canonical finite-state task Can be done well with methods that look at local context Though should really do it by parsing! Intro to NLP - J. Eisner 3
4 Degree of Supervision Supervised: Training corpus is tagged by humans Unsupervised: Training corpus isn t tagged Partly supervised: Training corpus isn t tagged, but you have a dictionary giving possible tags for each word We ll start with the supervised case and move to decreasing levels of supervision Intro to NLP - J. Eisner 4
5 Current Performance Input: the lead paint is unsafe Output: the/ lead/n paint/n is/v unsafe/ How many tags are correct? About 97% currently (for English) But baseline is already 90% Baseline is performance of stupidest possible method Tag every word with its most frequent tag Tag unknown words as nouns Intro to NLP - J. Eisner 5
6 What Should We Look At? correct tags PN Verb Prep Prep Bill directed a cortege of autos through the dunes PN Prep Prep Verb Verb Verb some possible tags for Prep each word (maybe more)? Each unknown tag is constrained by its word and by the tags to its immediate left and right. But those tags are unknown too Intro to NLP - J. Eisner 6
7 What Should We Look At? correct tags PN Verb Prep Prep Bill directed a cortege of autos through the dunes PN Prep Prep Verb Verb Verb some possible tags for Prep each word (maybe more)? Each unknown tag is constrained by its word and by the tags to its immediate left and right. But those tags are unknown too Intro to NLP - J. Eisner 7
8 What Should We Look At? correct tags PN Verb Prep Prep Bill directed a cortege of autos through the dunes PN Prep Prep Verb Verb Verb some possible tags for Prep each word (maybe more)? Each unknown tag is constrained by its word and by the tags to its immediate left and right. But those tags are unknown too Intro to NLP - J. Eisner 8
9 Three Finite-State Approaches Noisy Channel Model (statistical) real language Y noisy channel Y X observed string X part-of-speech tags (n-gram model) replace tags with words text want to recover Y from X Intro to NLP - J. Eisner 9
10 Three Finite-State Approaches 1. Noisy Channel Model (statistical) 2. erministic baseline tagger composed with a cascade of fixup transducers 3. Nondeterministic tagger composed with a cascade of finite-state automata that act as filters Intro to NLP - J. Eisner 10
11 Review: Noisy Channel real language Y noisy channel Y X obseved string X p(y) * p(x Y) = p(x,y) want to recover y Y from x X choose y that maximizes p(y x) or equivalently p(x,y) Intro to NLP - J. Eisner 11
12 Review: Noisy Channel.o. = p(y) * p(x Y) = p(x,y) Note p(x,y) sums to 1. Suppose x= C ; what is best y? Intro to NLP - J. Eisner 12
13 Review: Noisy Channel.o. = p(y) * p(x Y) = p(x,y) Suppose x= C ; what is best y? Intro to NLP - J. Eisner 13
14 Review: Noisy Channel.o. p(y) * p(x Y) restrict just to paths compatible with output C.o. * (X = x)? = = p(x, Y) Intro to NLP - J. Eisner 14
15 Noisy Channel for Tagging acceptor: p(tag sequence).o. Markov Model transducer: tags words Unigram Replacement p(y) p(x Y).o. * acceptor: the observed words (X = x)? straight line = = p(x, Y) transducer: scores candidate tag seqs on their joint probability with obs words; pick best path Intro to NLP - J. Eisner 15 *
16 Markov Model (bigrams) Verb Start Prep Stop Intro to NLP - J. Eisner 16
17 Markov Model Verb Start Prep Stop Intro to NLP - J. Eisner 17
18 Markov Model 0.8 Verb Start Prep Stop Intro to NLP - J. Eisner 18
19 Markov Model p(tag seq) 0.8 Verb Start Prep Stop 0.1 Start Stop = 0.8 * 0.3 * 0.4 * 0.5 * Intro to NLP - J. Eisner 19
20 Markov Model as an FSA p(tag seq) 0.8 Verb Start Prep Stop 0.1 Start Stop = 0.8 * 0.3 * 0.4 * 0.5 * Intro to NLP - J. Eisner 20
21 Markov Model as an FSA p(tag seq) 0.8 Verb Start Prep Stop 0.1 Start Stop = 0.8 * 0.3 * 0.4 * 0.5 * Intro to NLP - J. Eisner 21
22 Markov Model (tag bigrams) p(tag seq) 0.8 Start Stop Start Stop = 0.8 * 0.3 * 0.4 * 0.5 * Intro to NLP - J. Eisner 22
23 Noisy Channel for Tagging automaton: p(tag sequence).o. Markov Model transducer: tags words Unigram Replacement p(y) p(x Y).o. * automaton: the observed words p(x X) straight line = = p(x, Y) transducer: scores candidate tag seqs on their joint probability with obs words; pick best path Intro to NLP - J. Eisner 23 *
24 Start 0.4 Noisy Channel for Tagging Verb 0.2 Prep Stop.o. :cortege/ :autos/0.001 :Bill/0.002 :a/0.6 :the/0.4 = :cool/0.003 :directed/ p(y) p(x Y) :cortege/ o. * p(x X) the cool directed autos transducer: scores candidate tag seqs on their joint probability with obs words; we should pick best path = p(x, Y) Intro to NLP - J. Eisner 24 *
25 Unigram Replacement Model p(word seq tag seq) :cortege/ :autos/0.001 :Bill/0.002 sums to 1 :a/0.6 :the/0.4 :cool/0.003 :directed/ sums to 1 :cortege/ Intro to NLP - J. Eisner 25
26 Compose Start Verb Prep Stop 0.2 :cortege/ :autos/0.001 :Bill/0.002 :a/0.6 :the/0.4 :cool/0.003 :directed/ :cortege/ p(tag seq) 0.8 Verb 0.3 Start Prep Stop Intro to NLP - J. Eisner 26
27 Compose Start Verb Prep Stop 0.2 :cortege/ :autos/0.001 :Bill/0.002 :a/0.6 :the/0.4 :cool/0.003 :directed/ :cortege/ p(word seq, tag seq) = p(tag seq) * p(word seq tag seq) :a 0.48 :the 0.32 Start :cool :directed :cortege :cool :directed :cortege N:cortege N:autos Verb Prep Stop Intro to NLP - J. Eisner 27
28 Observed Words as Straight-Line FSA word seq the cool directed autos Intro to NLP - J. Eisner 28
29 Compose with the cool directed autos p(word seq, tag seq) = p(tag seq) * p(word seq tag seq) :a 0.48 :the 0.32 Start :cool :directed :cortege :cool :directed :cortege N:cortege N:autos Verb Prep Stop Intro to NLP - J. Eisner 29
30 Compose with the cool directed autos p(word seq, tag seq) = p(tag seq) * p(word seq tag seq) :the 0.32 Start why did this loop go away? :directed :cool N:autos Verb Prep Stop Intro to NLP - J. Eisner 30
31 The best path: Start Stop = 0.32 * the cool directed autos p(word seq, tag seq) = p(tag seq) * p(word seq tag seq) :the 0.32 Start :directed :cool N:autos Verb Prep Stop Intro to NLP - J. Eisner 31
32 In Fact, Paths Form a Trellis p(word seq, tag seq) Start :directed Stop The best path: Start Stop = 0.32 * the cool directed autos Intro to NLP - J. Eisner 32
33 The Trellis Shape Emerges from the Cross-Product Construction for Finite-State Composition 0 1 0,0 1,1 2,1 3, ,2 2,2 3,2.o = All paths here are 4 words 1,3 2,3 3,3 1,4 2,4 3,4 4,4 So all paths here must have 4 words on output side Intro to NLP - J. Eisner 33 4
34 Actually, Trellis Isn t Complete p(word seq, tag seq) Trellis has no or Stop arcs; why? Start :directed Stop The best path: Start Stop = 0.32 * the cool directed autos Intro to NLP - J. Eisner 34
35 Actually, Trellis Isn t Complete p(word seq, tag seq) Lattice is missing some other arcs; why? Start :directed Stop The best path: Start Stop = 0.32 * the cool directed autos Intro to NLP - J. Eisner 35
36 Actually, Trellis Isn t Complete p(word seq, tag seq) Lattice is missing some states; why? Start :directed Stop The best path: Start Stop = 0.32 * the cool directed autos Intro to NLP - J. Eisner 36
37 Find best path from Start to Stop Start :directed Stop Use dynamic programming like prob. parsing: What is best path from Start to each node? Work from left to right Each node stores its best path from Start (as probability plus one backpointer) Special acyclic case of Dijkstra s shortest-path alg. Faster if some arcs/states are absent Intro to NLP - J. Eisner 37
38 In Summary We are modeling p(word seq, tag seq) The tags are hidden, but we see the words Is tag sequence X likely with these words? Noisy channel model is a Hidden Markov Model : probs from tag bigram model probs from unigram replacement Start PN Verb Prep Pre Bill directed a cortege of autos thro Find X that maximizes probability product Intro to NLP - J. Eisner 38
39 Another Viewpoint We are modeling p(word seq, tag seq) Why not use chain rule + some kind of backoff? Actually, we are! p( ) Start PN Verb Bill directed a = p(start) * p(pn Start) * p(verb Start PN) * p( Start PN Verb) * * p(bill Start PN Verb ) * p(directed Bill, Start PN Verb ) * p(a Bill directed, Start PN Verb ) * Intro to NLP - J. Eisner 39
40 Another Viewpoint We are modeling p(word seq, tag seq) Why not use chain rule + some kind of backoff? Actually, we are! p( ) Start PN Verb Bill directed a = p(start) * p(pn Start) * p(verb Start PN) * p( Start PN Verb) * * p(bill Start PN Verb ) * p(directed Bill, Start PN Verb ) * p(a Bill directed, Start PN Verb ) * Start PN Verb Prep Prep Stop Bill directed a cortege of autos through the dunes Intro to NLP - J. Eisner 40
41 Three Finite-State Approaches 1. Noisy Channel Model (statistical) 2. erministic baseline tagger composed with a cascade of fixup transducers 3. Nondeterministic tagger composed with a cascade of finite-state automata that act as filters Intro to NLP - J. Eisner 41
42 Another FST Paradigm: Successive Fixups Like successive markups but alter Morphology Phonology Part-of-speech tagging input output Intro to NLP - J. Eisner 42
43 figure from Brill s thesis Transformation-Based Tagging (Brill 1995) Intro to NLP - J. Eisner 43
44 Transformations Learned figure from Brill s thesis BaselineTag* VB // TO _ VB //... _ etc. Compose this cascade of FSTs. Gets a big FST that does the initial tagging and the sequence of fixups all at once Intro to NLP - J. Eisner 44
45 figure from Brill s thesis Initial Tagging of OOV Words Intro to NLP - J. Eisner 45
46 Three Finite-State Approaches 1. Noisy Channel Model (statistical) 2. erministic baseline tagger composed with a cascade of fixup transducers 3. Nondeterministic tagger composed with a cascade of finite-state automata that act as filters Intro to NLP - J. Eisner 46
47 Variations Multiple tags per word Transformations to knock some of them out How to encode multiple tags and knockouts? Use the above for partly supervised learning Supervised: You have a tagged training corpus Unsupervised: You have an untagged training corpus Here: You have an untagged training corpus and a dictionary giving possible tags for each word Intro to NLP - J. Eisner 47
Finite-State and the Noisy Channel Intro to NLP - J. Eisner 1
Finite-State and the Noisy Channel 600.465 - Intro to NLP - J. Eisner 1 Word Segmentation x = theprophetsaidtothecity What does this say? And what other words are substrings? Could segment with parsing
More information/665 Natural Language Processing Assignment 6: Tagging with a Hidden Markov Model
601.465/665 Natural Language Processing Assignment 6: Tagging with a Hidden Markov Model Prof. Jason Eisner Fall 2018 Due date: Thursday 15 November, 9 pm In this assignment, you will build a Hidden Markov
More information/665 Natural Language Processing Assignment 6: Tagging with a Hidden Markov Model
601.465/665 Natural Language Processing Assignment 6: Tagging with a Hidden Markov Model Prof. Jason Eisner Fall 2017 Due date: Sunday 19 November, 9 pm In this assignment, you will build a Hidden Markov
More informationExam Marco Kuhlmann. This exam consists of three parts:
TDDE09, 729A27 Natural Language Processing (2017) Exam 2017-03-13 Marco Kuhlmann This exam consists of three parts: 1. Part A consists of 5 items, each worth 3 points. These items test your understanding
More informationIntroduction to Hidden Markov models
1/38 Introduction to Hidden Markov models Mark Johnson Macquarie University September 17, 2014 2/38 Outline Sequence labelling Hidden Markov Models Finding the most probable label sequence Higher-order
More informationThe Expectation Maximization (EM) Algorithm
The Expectation Maximization (EM) Algorithm continued! 600.465 - Intro to NLP - J. Eisner 1 General Idea Start by devising a noisy channel Any model that predicts the corpus observations via some hidden
More informationA Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation.
A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation May 29, 2003 Shankar Kumar and Bill Byrne Center for Language and Speech Processing
More informationMEMMs (Log-Linear Tagging Models)
Chapter 8 MEMMs (Log-Linear Tagging Models) 8.1 Introduction In this chapter we return to the problem of tagging. We previously described hidden Markov models (HMMs) for tagging problems. This chapter
More informationHidden Markov Models. Natural Language Processing: Jordan Boyd-Graber. University of Colorado Boulder LECTURE 20. Adapted from material by Ray Mooney
Hidden Markov Models Natural Language Processing: Jordan Boyd-Graber University of Colorado Boulder LECTURE 20 Adapted from material by Ray Mooney Natural Language Processing: Jordan Boyd-Graber Boulder
More informationLexicographic Semirings for Exact Automata Encoding of Sequence Models
Lexicographic Semirings for Exact Automata Encoding of Sequence Models Brian Roark, Richard Sproat, and Izhak Shafran {roark,rws,zak}@cslu.ogi.edu Abstract In this paper we introduce a novel use of the
More informationSchool of Computing and Information Systems The University of Melbourne COMP90042 WEB SEARCH AND TEXT ANALYSIS (Semester 1, 2017)
Discussion School of Computing and Information Systems The University of Melbourne COMP9004 WEB SEARCH AND TEXT ANALYSIS (Semester, 07). What is a POS tag? Sample solutions for discussion exercises: Week
More informationStatistical Methods for NLP
Statistical Methods for NLP Information Extraction, Hidden Markov Models Sameer Maskey * Most of the slides provided by Bhuvana Ramabhadran, Stanley Chen, Michael Picheny Speech Recognition Lecture 4:
More informationDynamic Feature Selection for Dependency Parsing
Dynamic Feature Selection for Dependency Parsing He He, Hal Daumé III and Jason Eisner EMNLP 2013, Seattle Structured Prediction in NLP Part-of-Speech Tagging Parsing N N V Det N Fruit flies like a banana
More informationOverview. Search and Decoding. HMM Speech Recognition. The Search Problem in ASR (1) Today s lecture. Steve Renals
Overview Search and Decoding Steve Renals Automatic Speech Recognition ASR Lecture 10 January - March 2012 Today s lecture Search in (large vocabulary) speech recognition Viterbi decoding Approximate search
More informationA Neuro Probabilistic Language Model Bengio et. al. 2003
A Neuro Probabilistic Language Model Bengio et. al. 2003 Class Discussion Notes Scribe: Olivia Winn February 1, 2016 Opening thoughts (or why this paper is interesting): Word embeddings currently have
More informationLing/CSE 472: Introduction to Computational Linguistics. 4/6/15: Morphology & FST 2
Ling/CSE 472: Introduction to Computational Linguistics 4/6/15: Morphology & FST 2 Overview Review: FSAs & FSTs XFST xfst demo Examples of FSTs for spelling change rules Reading questions Review: FSAs
More informationTopics in Parsing: Context and Markovization; Dependency Parsing. COMP-599 Oct 17, 2016
Topics in Parsing: Context and Markovization; Dependency Parsing COMP-599 Oct 17, 2016 Outline Review Incorporating context Markovization Learning the context Dependency parsing Eisner s algorithm 2 Review
More informationA Primer on Graph Processing with Bolinas
A Primer on Graph Processing with Bolinas J. Andreas, D. Bauer, K. M. Hermann, B. Jones, K. Knight, and D. Chiang August 20, 2013 1 Introduction This is a tutorial introduction to the Bolinas graph processing
More informationA Hidden Markov Model for Alphabet Soup Word Recognition
A Hidden Markov Model for Alphabet Soup Word Recognition Shaolei Feng 1 Nicholas R. Howe 2 R. Manmatha 1 1 University of Massachusetts, Amherst 2 Smith College Motivation: Inaccessible Treasures Historical
More informationMachine Learning. Sourangshu Bhattacharya
Machine Learning Sourangshu Bhattacharya Bayesian Networks Directed Acyclic Graph (DAG) Bayesian Networks General Factorization Curve Fitting Re-visited Maximum Likelihood Determine by minimizing sum-of-squares
More informationShallow Parsing Swapnil Chaudhari 11305R011 Ankur Aher Raj Dabre 11305R001
Shallow Parsing Swapnil Chaudhari 11305R011 Ankur Aher - 113059006 Raj Dabre 11305R001 Purpose of the Seminar To emphasize on the need for Shallow Parsing. To impart basic information about techniques
More informationCSEP 517 Natural Language Processing Autumn 2013
CSEP 517 Natural Language Processing Autumn 2013 Unsupervised and Semi-supervised Learning Luke Zettlemoyer - University of Washington [Many slides from Dan Klein and Michael Collins] Overview Unsupervised
More informationSequence Prediction with Neural Segmental Models. Hao Tang
Sequence Prediction with Neural Segmental Models Hao Tang haotang@ttic.edu About Me Pronunciation modeling [TKL 2012] Segmental models [TGL 2014] [TWGL 2015] [TWGL 2016] [TWGL 2016] American Sign Language
More informationConditional Random Fields - A probabilistic graphical model. Yen-Chin Lee 指導老師 : 鮑興國
Conditional Random Fields - A probabilistic graphical model Yen-Chin Lee 指導老師 : 鮑興國 Outline Labeling sequence data problem Introduction conditional random field (CRF) Different views on building a conditional
More informationA simple noise model. Algorithm sketch. A simple noise model. Estimating the probabilities
Recap: noisy channel model Foundations of Natural anguage Processing ecture 6 pelling correction, edit distance, and EM lex ascarides (lides from lex ascarides and haron Goldwater) 1 February 2019 general
More informationSequence Labeling: The Problem
Sequence Labeling: The Problem Given a sequence (in NLP, words), assign appropriate labels to each word. For example, POS tagging: DT NN VBD IN DT NN. The cat sat on the mat. 36 part-of-speech tags used
More informationDiscriminative Training with Perceptron Algorithm for POS Tagging Task
Discriminative Training with Perceptron Algorithm for POS Tagging Task Mahsa Yarmohammadi Center for Spoken Language Understanding Oregon Health & Science University Portland, Oregon yarmoham@ohsu.edu
More informationAssignment 4 CSE 517: Natural Language Processing
Assignment 4 CSE 517: Natural Language Processing University of Washington Winter 2016 Due: March 2, 2016, 1:30 pm 1 HMMs and PCFGs Here s the definition of a PCFG given in class on 2/17: A finite set
More informationStatistical NLP Spring 2009
Statistical NLP Spring 2009 Learning Models with EM Hard EM: alternate between E-step: Find best completions Y for fixed θ M-step: Find best parameters θ for fixed Y Example: K-Means Lecture 5: WSD / Maxent
More informationTransition-Based Dependency Parsing with Stack Long Short-Term Memory
Transition-Based Dependency Parsing with Stack Long Short-Term Memory Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, Noah A. Smith Association for Computational Linguistics (ACL), 2015 Presented
More informationStructural Text Features. Structural Features
Structural Text Features CISC489/689 010, Lecture #13 Monday, April 6 th Ben CartereGe Structural Features So far we have mainly focused on vanilla features of terms in documents Term frequency, document
More informationSupport Vector Machine Learning for Interdependent and Structured Output Spaces
Support Vector Machine Learning for Interdependent and Structured Output Spaces I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, ICML, 2004. And also I. Tsochantaridis, T. Joachims, T. Hofmann,
More informationInformation Extraction Techniques in Terrorism Surveillance
Information Extraction Techniques in Terrorism Surveillance Roman Tekhov Abstract. The article gives a brief overview of what information extraction is and how it might be used for the purposes of counter-terrorism
More informationPrivacy and Security in Online Social Networks Department of Computer Science and Engineering Indian Institute of Technology, Madras
Privacy and Security in Online Social Networks Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 25 Tutorial 5: Analyzing text using Python NLTK Hi everyone,
More informationReport for each of the weighted automata obtained ˆ the number of states; ˆ the number of ɛ-transitions;
Mehryar Mohri Speech Recognition Courant Institute of Mathematical Sciences Homework assignment 3 (Solution) Part 2, 3 written by David Alvarez 1. For this question, it is recommended that you use the
More informationThis document is a preliminary proposal to encode two characters into Unicode.
A preliminary proposal to encode two base characters William J G Overington 19 October 2015 1. Introduction This document is a preliminary proposal to encode two characters into Unicode. The two characters
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 15, 2011 Today: Graphical models Inference Conditional independence and D-separation Learning from
More informationTuesday, September 30, 14.
http://xkcd.com/208/ 1 Lecture 9 Regexes, Finite State Automata Intro to NLP, CS585, Fall 2014 http://people.cs.umass.edu/~brenocon/inlp2014/ Brendan O Connor (http://brenocon.com) 2 Exercise 5 out - due
More informationAutomata Computing. Mircea R. Stan UVA-ECE Center for Automata Processing (CAP) Co-Director Kevin Skadron, UVA-CS, CAP Director
Automata Computing Mircea R. Stan (mircea@virginia.edu), UVA-ECE Center for Automata Processing (CAP) Co-Director Kevin Skadron, UVA-CS, CAP Director 2013 Micron Technology, Inc. All rights reserved. Products
More informationStatistical NLP Spring 2009
Statistical NLP Spring 2009 Lecture 5: WSD / Maxent Dan Klein UC Berkeley Learning Models with EM Hard EM: alternate between E-step: Find best completions Y for fixed θ M-step: Find best parameters θ for
More informationProbabilistic parsing with a wide variety of features
Probabilistic parsing with a wide variety of features Mark Johnson Brown University IJCNLP, March 2004 Joint work with Eugene Charniak (Brown) and Michael Collins (MIT) upported by NF grants LI 9720368
More informationA Hybrid Neural Model for Type Classification of Entity Mentions
A Hybrid Neural Model for Type Classification of Entity Mentions Motivation Types group entities to categories Entity types are important for various NLP tasks Our task: predict an entity mention s type
More informationWeighted Finite State Transducers in Automatic Speech Recognition
Weighted Finite State Transducers in Automatic Speech Recognition ZRE lecture 15.04.2015 Mirko Hannemann Slides provided with permission, Daniel Povey some slides from T. Schultz, M. Mohri, M. Riley and
More informationINF FALL NATURAL LANGUAGE PROCESSING. Jan Tore Lønning, Lecture 4, 10.9
1 INF5830 2015 FALL NATURAL LANGUAGE PROCESSING Jan Tore Lønning, Lecture 4, 10.9 2 Working with texts From bits to meaningful units Today: 3 Reading in texts Character encodings and Unicode Word tokenization
More informationIntroduction to Machine Learning. Xiaojin Zhu
Introduction to Machine Learning Xiaojin Zhu jerryzhu@cs.wisc.edu Read Chapter 1 of this book: Xiaojin Zhu and Andrew B. Goldberg. Introduction to Semi- Supervised Learning. http://www.morganclaypool.com/doi/abs/10.2200/s00196ed1v01y200906aim006
More informationWeighted Finite State Transducers in Automatic Speech Recognition
Weighted Finite State Transducers in Automatic Speech Recognition ZRE lecture 10.04.2013 Mirko Hannemann Slides provided with permission, Daniel Povey some slides from T. Schultz, M. Mohri and M. Riley
More informationIntroduction to Graphical Models
Robert Collins CSE586 Introduction to Graphical Models Readings in Prince textbook: Chapters 10 and 11 but mainly only on directed graphs at this time Credits: Several slides are from: Review: Probability
More informationTokenization and Sentence Segmentation. Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017
Tokenization and Sentence Segmentation Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017 Outline 1 Tokenization Introduction Exercise Evaluation Summary 2 Sentence segmentation
More informationStack- propaga+on: Improved Representa+on Learning for Syntax
Stack- propaga+on: Improved Representa+on Learning for Syntax Yuan Zhang, David Weiss MIT, Google 1 Transi+on- based Neural Network Parser p(action configuration) So1max Hidden Embedding words labels POS
More information15-780: Graduate Artificial Intelligence. Computational biology: Sequence alignment and profile HMMs
5-78: Graduate rtificial Intelligence omputational biology: Sequence alignment and profile HMMs entral dogma DN GGGG transcription mrn UGGUUUGUG translation Protein PEPIDE 2 omparison of Different Organisms
More informationText Mining for Software Engineering
Text Mining for Software Engineering Faculty of Informatics Institute for Program Structures and Data Organization (IPD) Universität Karlsruhe (TH), Germany Department of Computer Science and Software
More informationA Re-estimation Method for Stochastic Language Modeling from Ambiguous Observations
A Re-estimation Method for Stochastic Language Modeling from Ambiguous Observations Mikio Yamamoto Institute of Information Sciences and Electronics University of Tsukuba 1-1-1 Tennodai, Tsukuba, Ibaraki
More informationLecture 21 : A Hybrid: Deep Learning and Graphical Models
10-708: Probabilistic Graphical Models, Spring 2018 Lecture 21 : A Hybrid: Deep Learning and Graphical Models Lecturer: Kayhan Batmanghelich Scribes: Paul Liang, Anirudha Rayasam 1 Introduction and Motivation
More informationENCODING STRUCTURED OUTPUT VALUES. Edward Loper. Computer and Information Science
ENCODING STRUCTURED OUTPUT VALUES Edward Loper A DISSERTATION PROPOSAL in Computer and Information Science Presented to the Faculties of the University of Pennsylvania in Partial Fulfillment of the Requirements
More informationRegular Languages. Natural Language Processing CS 6120 Spring 2013 Northeastern University
Regular Languages Natural Language Processing CS 6120 Spring 2013 Northeastern University David Smith with material from Jason Eisner, Andrew McCallum, and Lari Karttunen 1 Brief History: 1950s Early NLP
More informationRepresentation Learning using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval
Representation Learning using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval Xiaodong Liu 12, Jianfeng Gao 1, Xiaodong He 1 Li Deng 1, Kevin Duh 2, Ye-Yi Wang 1 1
More informationMotivation: Shortcomings of Hidden Markov Model. Ko, Youngjoong. Solution: Maximum Entropy Markov Model (MEMM)
Motivation: Shortcomings of Hidden Markov Model Maximum Entropy Markov Models and Conditional Random Fields Ko, Youngjoong Dept. of Computer Engineering, Dong-A University Intelligent System Laboratory,
More informationMIA - Master on Artificial Intelligence
MIA - Master on Artificial Intelligence 1 Hierarchical Non-hierarchical Evaluation 1 Hierarchical Non-hierarchical Evaluation The Concept of, proximity, affinity, distance, difference, divergence We use
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Bayes Nets: Inference Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.
More informationFinal Project Discussion. Adam Meyers Montclair State University
Final Project Discussion Adam Meyers Montclair State University Summary Project Timeline Project Format Details/Examples for Different Project Types Linguistic Resource Projects: Annotation, Lexicons,...
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2, 2012 Today: Graphical models Bayes Nets: Representing distributions Conditional independencies
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University April 1, 2019 Today: Inference in graphical models Learning graphical models Readings: Bishop chapter 8 Bayesian
More informationLet s get parsing! Each component processes the Doc object, then passes it on. doc.is_parsed attribute checks whether a Doc object has been parsed
Let s get parsing! SpaCy default model includes tagger, parser and entity recognizer nlp = spacy.load('en ) tells spacy to use "en" with ["tagger", "parser", "ner"] Each component processes the Doc object,
More informationExam III March 17, 2010
CIS 4930 NLP Print Your Name Exam III March 17, 2010 Total Score Your work is to be done individually. The exam is worth 106 points (six points of extra credit are available throughout the exam) and it
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 18, 2015 Today: Graphical models Bayes Nets: Representing distributions Conditional independencies
More informationSyntactic N-grams as Machine Learning. Features for Natural Language Processing. Marvin Gülzow. Basics. Approach. Results.
s Table of Contents s 1 s 2 3 4 5 6 TL;DR s Introduce n-grams Use them for authorship attribution Compare machine learning approaches s J48 (decision tree) SVM + S work well Section 1 s s Definition n-gram:
More informationApplications of Lexicographic Semirings to Problems in Speech and Language Processing
Applications of Lexicographic Semirings to Problems in Speech and Language Processing Richard Sproat Google, Inc. Izhak Shafran Oregon Health & Science University Mahsa Yarmohammadi Oregon Health & Science
More informationImproving the Performance of Text Categorization using N-gram Kernels
Improving the Performance of Text Categorization using N-gram Kernels Varsha K. V *., Santhosh Kumar C., Reghu Raj P. C. * * Department of Computer Science and Engineering Govt. Engineering College, Palakkad,
More informationStatistical Methods for NLP
Statistical Methods for NLP Text Similarity, Text Categorization, Linear Methods of Classification Sameer Maskey Announcement Reading Assignments Bishop Book 6, 62, 7 (upto 7 only) J&M Book 3, 45 Journal
More informationTALP: Xgram-based Spoken Language Translation System Adrià de Gispert José B. Mariño
TALP: Xgram-based Spoken Language Translation System Adrià de Gispert José B. Mariño Outline Overview Outline Translation generation Training IWSLT'04 Chinese-English supplied task results Conclusion and
More informationLexical Analysis. Lecture 3. January 10, 2018
Lexical Analysis Lecture 3 January 10, 2018 Announcements PA1c due tonight at 11:50pm! Don t forget about PA1, the Cool implementation! Use Monday s lecture, the video guides and Cool examples if you re
More informationEDAN20 Language Technology Chapter 13: Dependency Parsing
EDAN20 Language Technology http://cs.lth.se/edan20/ Pierre Nugues Lund University Pierre.Nugues@cs.lth.se http://cs.lth.se/pierre_nugues/ September 19, 2016 Pierre Nugues EDAN20 Language Technology http://cs.lth.se/edan20/
More informationslide courtesy of D. Yarowsky Splitting Words a.k.a. Word Sense Disambiguation Intro to NLP - J. Eisner 1
Splitting Words a.k.a. Word Sense Disambiguation 600.465 - Intro to NLP - J. Eisner Representing Word as Vector Could average over many occurrences of the word... Each word type has a different vector
More informationLog- linear models. Natural Language Processing: Lecture Kairit Sirts
Log- linear models Natural Language Processing: Lecture 3 21.09.2017 Kairit Sirts The goal of today s lecture Introduce the log- linear/maximum entropy model Explain the model components: features, parameters,
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2016 Lecturer: Trevor Cohn 21. Independence in PGMs; Example PGMs Independence PGMs encode assumption of statistical independence between variables. Critical
More informationLearning Latent Linguistic Structure to Optimize End Tasks. David A. Smith with Jason Naradowsky and Xiaoye Tiger Wu
Learning Latent Linguistic Structure to Optimize End Tasks David A. Smith with Jason Naradowsky and Xiaoye Tiger Wu 12 October 2012 Learning Latent Linguistic Structure to Optimize End Tasks David A. Smith
More informationXFST2FSA. Comparing Two Finite-State Toolboxes
: Comparing Two Finite-State Toolboxes Department of Computer Science University of Haifa July 30, 2005 Introduction Motivation: finite state techniques and toolboxes The compiler: Compilation process
More informationNatural Language Processing with Deep Learning CS224N/Ling284
Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 8: Recurrent Neural Networks Christopher Manning and Richard Socher Organization Extra project office hour today after lecture Overview
More informationNLP Final Project Fall 2015, Due Friday, December 18
NLP Final Project Fall 2015, Due Friday, December 18 For the final project, everyone is required to do some sentiment classification and then choose one of the other three types of projects: annotation,
More informationDomain Based Named Entity Recognition using Naive Bayes
AUSTRALIAN JOURNAL OF BASIC AND APPLIED SCIENCES ISSN:1991-8178 EISSN: 2309-8414 Journal home page: www.ajbasweb.com Domain Based Named Entity Recognition using Naive Bayes Classification G.S. Mahalakshmi,
More informationAutomatic Speech Recognition (ASR)
Automatic Speech Recognition (ASR) February 2018 Reza Yazdani Aminabadi Universitat Politecnica de Catalunya (UPC) State-of-the-art State-of-the-art ASR system: DNN+HMM Speech (words) Sound Signal Graph
More informationLattice Rescoring for Speech Recognition Using Large Scale Distributed Language Models
Lattice Rescoring for Speech Recognition Using Large Scale Distributed Language Models ABSTRACT Euisok Chung Hyung-Bae Jeon Jeon-Gue Park and Yun-Keun Lee Speech Processing Research Team, ETRI, 138 Gajeongno,
More informationCS395T Project 2: Shift-Reduce Parsing
CS395T Project 2: Shift-Reduce Parsing Due date: Tuesday, October 17 at 9:30am In this project you ll implement a shift-reduce parser. First you ll implement a greedy model, then you ll extend that model
More informationForest-Based Search Algorithms
Forest-Based Search Algorithms for Parsing and Machine Translation Liang Huang University of Pennsylvania Google Research, March 14th, 2008 Search in NLP is not trivial! I saw her duck. Aravind Joshi 2
More informationL435/L555. Dept. of Linguistics, Indiana University Fall 2016
for : for : L435/L555 Dept. of, Indiana University Fall 2016 1 / 12 What is? for : Decent definition from wikipedia: Computer programming... is a process that leads from an original formulation of a computing
More informationTowards Learning-Based Holistic Brain Image Segmentation
Towards Learning-Based Holistic Brain Image Segmentation Zhuowen Tu Lab of Neuro Imaging University of California, Los Angeles in collaboration with (A. Toga, P. Thompson et al.) Supported by (NIH CCB
More informationCOMPUTER SCIENCE TRIPOS
CST.95.3.1 COMPUTER SCIENCE TRIPOS Part IB Monday 5 June 1995 1.30 to 4.30 Paper 3 Answer five questions. Submit the answers in five separate bundles each with its own cover sheet. Write on one side of
More informationInference. Inference: calculating some useful quantity from a joint probability distribution Examples: Posterior probability: Most likely explanation:
Inference Inference: calculating some useful quantity from a joint probability distribution Examples: Posterior probability: B A E J M Most likely explanation: This slide deck courtesy of Dan Klein at
More informationMeaning Banking and Beyond
Meaning Banking and Beyond Valerio Basile Wimmics, Inria November 18, 2015 Semantics is a well-kept secret in texts, accessible only to humans. Anonymous I BEG TO DIFFER Surface Meaning Step by step analysis
More information+ = The Goal of Texture Synthesis. Image Quilting for Texture Synthesis & Transfer. The Challenge. Texture Synthesis for Graphics
Image Quilting for Texture Synthesis & Transfer Alexei Efros (UC Berkeley) Bill Freeman (MERL) The Goal of Texture Synthesis True (infinite) texture input image SYNTHESIS generated image Given a finite
More informationInteger Linear Programming
Integer Linear Programming Micha Elsner April 5, 2017 2 Integer linear programming A framework for inference: Reading: Clarke and Lapata 2008 Global Inference for Sentence Compression An Integer Linear
More informationNatural Language Processing
Natural Language Processing Language Models Language models are distributions over sentences N gram models are built from local conditional probabilities Language Modeling II Dan Klein UC Berkeley, The
More informationWFST: Weighted Finite State Transducer. September 12, 2017 Prof. Marie Meteer
+ WFST: Weighted Finite State Transducer September 12, 217 Prof. Marie Meteer + FSAs: A recurring structure in speech 2 Phonetic HMM Viterbi trellis Language model Pronunciation modeling + The language
More informationAutomatic Summarization
Automatic Summarization CS 769 Guest Lecture Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of Wisconsin, Madison February 22, 2008 Andrew B. Goldberg (CS Dept) Summarization
More informationCSC401 Natural Language Computing
CSC401 Natural Language Computing Jan 19, 2018 TA: Willie Chang Varada Kolhatkar, Ka-Chun Won, and Aryan Arbabi) Mascots: r/sandersforpresident (left) and r/the_donald (right) To perform sentiment analysis
More informationAdmin PARSING. Backoff models: absolute discounting. Backoff models: absolute discounting 3/4/11. What is (xy)?
Admin Updated slides/examples on backoff with absolute discounting (I ll review them again here today) Assignment 2 Watson vs. Humans (tonight-wednesday) PARING David Kauchak C159 pring 2011 some slides
More informationLocal Search with Very Large-Scale Neighborhoods for Optimal Permutations in Machine Translation
Appeared in Proceedings of the HLT-NAACL Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing, pp. 57-75, 2006. Local Search with Very Large-Scale Neighborhoods
More informationUsing Search-Logs to Improve Query Tagging
Using Search-Logs to Improve Query Tagging Kuzman Ganchev Keith Hall Ryan McDonald Slav Petrov Google, Inc. {kuzman kbhall ryanmcd slav}@google.com Abstract Syntactic analysis of search queries is important
More informationCollins and Eisner s algorithms
Collins and Eisner s algorithms Syntactic analysis (5LN455) 2015-12-14 Sara Stymne Department of Linguistics and Philology Based on slides from Marco Kuhlmann Recap: Dependency trees dobj subj det pmod
More informationNatural Language Processing Basics. Yingyu Liang University of Wisconsin-Madison
Natural Language Processing Basics Yingyu Liang University of Wisconsin-Madison Natural language Processing (NLP) The processing of the human languages by computers One of the oldest AI tasks One of the
More information