Discriminative Training with Perceptron Algorithm for POS Tagging Task
Mahsa Yarmohammadi
Center for Spoken Language Understanding
Oregon Health & Science University
Portland, Oregon

1 Introduction

One of the most popular algorithms for structured prediction problems in natural language and speech processing is the perceptron algorithm [Rosenblatt1958, Collins2002]. The perceptron algorithm can be used to estimate the model parameters in any structured prediction learning framework. Collins presented a discriminative log-linear model, with the perceptron algorithm to estimate its parameters, as a global framework for discriminative training. This framework can be used to train finite-state tagging models for tasks such as POS tagging, shallow parsing, sentence segmentation, and named entity recognition. In the first part of this study (Section 2), we show experimental results for POS tagging of English using the Collins framework.

Structured prediction models, including the perceptron, are supervised machine learning techniques, and they need a large amount of labeled input-output data to perform well. Training a model on a large amount of data can be slow. McDonald and colleagues [McDonald et al.2010] investigated distributed training strategies for the structured perceptron to reduce training times in two tasks, named entity recognition and dependency parsing. In the second part of this paper (Section 3), we investigate their techniques for another structured prediction task, POS tagging.

2 Discriminative Model and the Perceptron Algorithm

This section describes a discriminative log-linear model and the perceptron algorithm used to learn the model parameters when training a POS tagger. This framework was first presented by [Collins2002] as a global framework for discriminative training.

    Perceptron($\mathcal{T} = \{(x_i, y_i)\}_{i=1}^{N}$, $\bar{\alpha}$ {default = 0}, $T$)
      For $t = 1..T$
        For $i = 1..N$
          calculate $z_i = \arg\max_{z \in \mathrm{GEN}(x_i)} \Phi(x_i, z) \cdot \bar{\alpha}$
          If ($z_i \neq y_i$) then $\bar{\alpha} = \bar{\alpha} + \Phi(x_i, y_i) - \Phi(x_i, z_i)$
      return $\bar{\alpha}$

Figure 1: The perceptron algorithm

To train a discriminative POS tagging model, the task is to learn a mapping from inputs $x \in X$ to outputs $y \in Y$, where $X$ is the set of all input sentences and $Y$ is the set of all possible POS tag sequences. We are given a set of training examples $(x_i, y_i)$, a function $\mathrm{GEN}(x)$ that enumerates the possible POS tag sequences of length $n$ (where $n$ is the length of $x$), a parameter vector $\bar{\alpha} \in \mathbb{R}^d$, and a representation $\Phi$ that maps each $(x, y) \in X \times Y$ to a feature vector $\Phi(x, y)$. The mapping from an input $x$ to an output $F(x)$ is then defined by the formula:

$$F(x) = \arg\max_{y \in \mathrm{GEN}(x)} \Phi(x, y) \cdot \bar{\alpha} \qquad (1)$$

where $\Phi(x, y) \cdot \bar{\alpha}$ is the inner product $\sum_i \alpha_i \Phi_i(x, y)$. The model learns the parameter values $\bar{\alpha}$ during training, and the decoding algorithm searches for the $y$ that maximizes (1). The feature vector $\Phi(x, y)$ represents arbitrary features of the sentence and the POS tag sequence. In Section 2.1 we describe the feature templates used in our experiments.

To estimate the parameter values $\bar{\alpha}$ of the model we use the perceptron algorithm, shown in Figure 1 [Collins2002]. For each training example $(x_i, y_i)$, the algorithm finds the best-scoring hypothesis $z_i$ under the current parameters. If $z_i$ differs from the true hypothesis $y_i$, it updates the parameter vector $\bar{\alpha}$ by adding the feature values of $y_i$ to it and subtracting the feature values of $z_i$ from it. The algorithm then moves to the next example. This procedure is repeated $T$ times (epochs) over the training examples.

The regular perceptron algorithm is prone to overfitting. One solution to this problem is the averaged perceptron, which sets the final weight vector to the average of all the parameter vectors seen during training.
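The training loop of Figure 1 and the averaging step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tagger described in this paper: the tag set `TAGS` is a hypothetical toy set, `phi` uses only two simplified templates (tag with word, tag with previous tag), and `decode` enumerates GEN(x) by brute force, which is only feasible for very short sentences.

```python
from itertools import product

TAGS = ("DT", "NN", "VB")  # hypothetical toy tag set (assumption)


def phi(words, tags):
    """Sparse feature vector Phi(x, y) with two simplified templates."""
    feats = {}
    prev = "<s>"
    for word, tag in zip(words, tags):
        for f in (("tag,word", tag, word), ("tag,prev_tag", tag, prev)):
            feats[f] = feats.get(f, 0) + 1
        prev = tag
    return feats


def score(feats, alpha):
    """Inner product Phi(x, y) . alpha over sparse dictionaries."""
    return sum(count * alpha.get(f, 0.0) for f, count in feats.items())


def decode(words, alpha):
    """F(x): best-scoring sequence in GEN(x), found by brute force."""
    return max(product(TAGS, repeat=len(words)),
               key=lambda tags: score(phi(words, tags), alpha))


def perceptron(data, epochs, alpha=None, average=False):
    """Perceptron of Figure 1; with average=True, return the mean of
    the parameter vectors seen after every training example."""
    alpha = dict(alpha or {})
    total = {}  # running sum of the parameter vectors
    steps = 0
    for _ in range(epochs):
        for words, gold in data:
            guess = decode(words, alpha)
            if guess != tuple(gold):
                # add the features of the true sequence ...
                for f, c in phi(words, gold).items():
                    alpha[f] = alpha.get(f, 0.0) + c
                # ... and subtract those of the best-scoring one
                for f, c in phi(words, guess).items():
                    alpha[f] = alpha.get(f, 0.0) - c
            for f, v in alpha.items():
                total[f] = total.get(f, 0.0) + v
            steps += 1
    if average:
        return {f: v / steps for f, v in total.items()}
    return alpha


TOY_DATA = [
    (("the", "dog", "barks"), ("DT", "NN", "VB")),
    (("the", "cat", "sleeps"), ("DT", "NN", "VB")),
    (("a", "dog", "sleeps"), ("DT", "NN", "VB")),
]

if __name__ == "__main__":
    weights = perceptron(TOY_DATA, epochs=3, average=True)
    print(decode(("the", "dog", "sleeps"), weights))
```

Brute-force enumeration is exponential in the sentence length; it is used here only to keep the sketch short. A practical tagger would replace `decode` with a dynamic-programming search such as the Viterbi search mentioned in Section 2.2.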
2.1 Features

We extract features from the input word sequence and the output POS-tag sequence. The feature set includes bigrams of surrounding words, a window of size 2 of the next and previous words, the POS tag of the previous word, and orthographical features, as shown in Table 1. The orthographical feature set includes prefixes and suffixes of the words (up to 4 characters) and the presence of a hyphen, digit, or uppercase character. We do not restrict the orthographical features to rare words (words that occur fewer than 5 times in the training data) or unknown words; we activate them for all words. This improves accuracy at the cost of some speed, compared to activating the orthographical features only for rare or unknown words.

Similar to Ratnaparkhi [Ratnaparkhi1997] and Roark et al. [Roark et al.2012], we restrict the search space to the tag dictionary for each word. For known words, the tag dictionary contains the tags that occurred with the word in the training set; for unknown or rare words, it contains all tags in the tag set. Using the tag dictionary speeds up the tagger significantly without hurting accuracy.

    Lexical                        Orthographical
    $t_i, t_{i-1}$
    $t_i, w_i$                     $t_i, w_i[0]$
    $t_i, w_{i-1}$                 $t_i, w_i[0..1]$
    $t_i, w_{i+1}$                 $t_i, w_i[0..2]$
    $t_i, w_{i-2}$                 $t_i, w_i[0..3]$
    $t_i, w_{i+2}$                 $t_i, w_i[n]$
    $t_i, w_i, w_{i+1}$            $t_i, w_i[n-1..n]$
    $t_i, w_i, w_{i-1}$            $t_i, w_i[n-2..n]$
    $t_i, w_{i+1}, w_{i+2}$        $t_i, w_i[n-3..n]$
    $t_i, w_{i-1}, w_{i-2}$        $t_i$, $w_i$ containing digit
                                   $t_i$, $w_i$ containing hyphen
                                   $t_i$, $w_i$ containing uppercase

Table 1: Feature templates for POS tagging

2.2 Experimental Results

We ran our experiments on the WSJ Penn Treebank corpus [Marcus et al.1999], using sections 2-21 for training, section 24 for development, and section 23 for testing. Decoding is performed using a Viterbi search with a Markov order-0 assumption. Table 2 shows the accuracy of POS tagging for English using our tagger.

                accuracy
    dev set     97.1%
    test set    97.3%

Table 2: POS tagging accuracy on development (section 24) and test (section 23) data

To assess how the results of our tagger generalize to an independent test set, we used a k-fold cross-validation approach (k=20). All labeled examples (the combination of the training and development sets) are sequentially partitioned into k disjoint subsets. In each fold, one of the subsets is used as the development set and the union of the other subsets is used as the training set. Each of the k subsets is used exactly once as the development set. For each fold, we record the best performance on the development set and the number of epochs at which it was reached. The mean of these epoch counts is $\bar{i}$. Our final run on the test set (section 23) is then performed by training on all labeled examples for $\bar{i}$ epochs with no development set. The average of the best accuracies over the 20 folds is 97.1% ($\sigma = 0.2$), and the accuracy of the final run on the test set is 97.3%. These results suggest that the tagger generalizes well to held-out data.

3 Distributed Perceptron

In this section, we describe the two strategies we used for distributed training of the perceptron algorithm. Both are based on the strategies proposed by McDonald et al. [McDonald et al.2010].

3.1 Parameter Mixing

Parameter mixing is the straightforward strategy of training separate models on disjoint subsets of the training data in parallel, and then mixing their parameters to form the final model. Figure 2 shows this algorithm [McDonald et al.2010]. First, we partition the training data into S shards. Then we train S separate perceptron models on these shards in parallel. Finally, we mix the parameter values of the shard models by averaging them. In a map-reduce framework, training the separate perceptron models is done in the map step, and mixing (averaging) the parameter values is done in the reduce step. The advantages of this method are that it scales easily to very large data sets and that it is resource efficient with respect to network usage.

The disadvantage of this method is that it can be sub-optimal: it does not necessarily return a separating weight vector, even when the training set is separable.
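The steps above can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: `train_perceptron` is a stand-in per-token perceptron that uses only (tag, word) features, `TAGS` is a hypothetical toy tag set, and the shards are trained in a plain loop instead of on separate machines. Only the sharding and the uniform mixing step correspond directly to the description.

```python
TAGS = ("DT", "NN", "VB")  # hypothetical toy tag set (assumption)


def predict(word, alpha):
    """Best-scoring tag for one word under (tag, word) features."""
    return max(TAGS, key=lambda tag: alpha.get((tag, word), 0.0))


def train_perceptron(shard, init, epochs):
    """Stand-in trainer: per-token perceptron on one shard."""
    alpha = dict(init)
    for _ in range(epochs):
        for word, gold in shard:
            guess = predict(word, alpha)
            if guess != gold:
                alpha[(gold, word)] = alpha.get((gold, word), 0.0) + 1.0
                alpha[(guess, word)] = alpha.get((guess, word), 0.0) - 1.0
    return alpha


def shard_sequentially(data, num_shards):
    """Split data into num_shards contiguous, nearly equal parts."""
    n = len(data)
    return [data[n * s // num_shards: n * (s + 1) // num_shards]
            for s in range(num_shards)]


def mix(alphas, weights=None):
    """Weighted sum of parameter vectors; uniform weights by default."""
    if weights is None:
        weights = [1.0 / len(alphas)] * len(alphas)
    mixed = {}
    for mu, alpha in zip(weights, alphas):
        for feature, value in alpha.items():
            mixed[feature] = mixed.get(feature, 0.0) + mu * value
    return mixed


def parameter_mix(data, num_shards, epochs):
    """Train one model per shard (map step), then average (reduce step)."""
    shards = shard_sequentially(data, num_shards)
    # Each call is independent, so it could run on a separate worker.
    alphas = [train_perceptron(shard, {}, epochs) for shard in shards]
    return mix(alphas)


if __name__ == "__main__":
    data = [("the", "DT"), ("dog", "NN"), ("barks", "VB"),
            ("a", "DT"), ("cat", "NN"), ("sleeps", "VB")]
    mixed = parameter_mix(data, num_shards=2, epochs=2)
    print(predict("dog", mixed))
```

Because the shard models never communicate during training, the only network traffic is the final collection of the parameter vectors, which is the efficiency argument made above.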
    ParameterMix($\mathcal{T} = \{(x_i, y_i)\}_{i=1}^{N}$)
      Shard $\mathcal{T}$ into $S$ parts $\mathcal{T} = \{\mathcal{T}_1, \ldots, \mathcal{T}_S\}$
      $\bar{\alpha}_s = \mathrm{Perceptron}(\mathcal{T}_s, 0, T)$
      $\bar{\alpha} = \sum_s \mu_s \bar{\alpha}_s$
      return $\bar{\alpha}$

Figure 2: Parameter mixing method for distributed perceptron

3.2 Iterative Parameter Mixing

A slight modification to the parameter mixing method, called iterative parameter mixing, addresses this problem. Iterative parameter mixing finds a separating hyperplane (assuming that the training set is separable) and yields accuracies comparable to or better than those of the serially trained perceptron, at the cost of increased network usage. As in parameter mixing, we first split the training data into S shards. Then we train a separate single-epoch perceptron on each shard and mix (average) the model weights. We then train another single-epoch perceptron on each shard, this time using the mixed weight vector as the initial value. The process is repeated T times. Figure 3 shows this algorithm [McDonald et al.2010]. In a map-reduce framework, training the single-epoch perceptron models is done in the map step, and mixing the parameter values and re-sending them to the shards is done in the reduce step.

    IterativeParameterMix($\mathcal{T} = \{(x_i, y_i)\}_{i=1}^{N}$)
      Shard $\mathcal{T}$ into $S$ parts $\mathcal{T} = \{\mathcal{T}_1, \ldots, \mathcal{T}_S\}$
      Set $\bar{\alpha} = 0$
      For $t = 1..T$
        $\bar{\alpha}^{(s,t)} = \mathrm{Perceptron}(\mathcal{T}_s, \bar{\alpha}, 1)$
        $\bar{\alpha} = \sum_s \mu_{s,t} \bar{\alpha}^{(s,t)}$
      return $\bar{\alpha}$

Figure 3: Iterative parameter mixing method for distributed perceptron

3.3 Experiments

We investigated distributed training of the perceptron algorithm for the POS tagging task. We used WSJ Penn Treebank sections 2-21 for training and section 24 for development. Note that we focus on results from the training phase in this section. We compared POS tagging accuracy and perceptron training time in three systems:

1. Serial: non-distributed perceptron on all training data.
2. Parameter mix: distributed perceptron using the parameter mixing method.
3. Iterative parameter mix: distributed perceptron using the iterative parameter mixing method.

For all three systems we compared results for the regular and averaged perceptron algorithms. For the parallel systems (2 and 3) we used 10 disjoint, equal-sized shards of training data, built by sequentially splitting the complete training data. Each shard contains around 3,900 POS-tagged sentences. To mix the model weights, we used the uniform mixing strategy, taking the mean of the weight vectors, for both the regular and averaged perceptrons.

Note that the results reported in this section do not come from an actual map-reduce implementation such as Hadoop. Instead, we simulated a map-reduce framework by pipelining the perceptron epochs and the weight vector mixing procedure iteratively. The training time reported for each training epoch is the maximum training time among all parallel shards.

3.4 Results

Results for the regular and averaged perceptrons are shown in Figure 4. For both the regular and averaged perceptrons, the two distributed algorithms return models much faster than the serially trained perceptron, in terms of wall-clock time as well as the number of training epochs. Parameter mixing does not match the performance of serial training on all data for the averaged perceptron, nor for the regular perceptron except in the first few epochs. Iterative parameter mixing performs better than parameter mixing for both the regular and averaged perceptrons. Compared to the serial scenario, it also achieves better accuracy for the regular perceptron and comparable accuracy for the averaged perceptron. This happens because parameter mixing has an effect similar to that of the averaged perceptron.
Mixing the parameters by averaging them reduces the variance between different weight vectors and produces a regularization effect.

Figure 4: Accuracy versus time for regular and averaged distributed perceptron in POS tagging. [Plot not reproduced. Two panels, Regular Perceptron and Averaged Perceptron; x-axis: Time (ms); y-axis: POS tagging accuracy; curves: Serial, Parameter Mix, Iterative Parameter Mix.]

4 Conclusion and Future Work

In this paper, we described a discriminative training method for training a POS tagger with the perceptron algorithm in serial and distributed scenarios. Our POS tagger achieves over 97% accuracy for English. The training features are language-independent, so the tagger can be applied to other languages. To reduce the training time, we applied to POS tagging two distributed training strategies for the structured perceptron previously proposed in [McDonald et al.2010]. Both parameter mixing methods are fast and accurate; however, simple parameter mixing is not guaranteed to produce an optimal model. Both distributed approaches significantly reduce the time required to train the perceptron. The results previously reported for other structured prediction tasks, named entity recognition and dependency parsing, also hold for POS tagging.

Our tagger is available for download as POStagger. Support for other finite-state tagging tasks, such as shallow parsing or hedge segmentation [Yarmohammadi et al.2014], will be added to the toolkit soon. Another direction for future work is to implement a map-reduce version of the distributed perceptron algorithms in the Hadoop framework.
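As a compact illustration of the iterative parameter mixing procedure of Section 3.2, together with the simulated timing described in Section 3.3, the sketch below runs one single-epoch perceptron per shard, averages the results uniformly, and charges each epoch the maximum shard time. It is a hedged sketch under assumptions, not the code behind the reported numbers: `train_one_epoch` is a stand-in per-token perceptron with (tag, word) features only, and `TAGS` is a hypothetical toy tag set.

```python
import time

TAGS = ("DT", "NN", "VB")  # hypothetical toy tag set (assumption)


def predict(word, alpha):
    """Best-scoring tag for one word under (tag, word) features."""
    return max(TAGS, key=lambda tag: alpha.get((tag, word), 0.0))


def train_one_epoch(shard, init):
    """Stand-in single-epoch perceptron started from the mixed weights."""
    alpha = dict(init)
    for word, gold in shard:
        guess = predict(word, alpha)
        if guess != gold:
            alpha[(gold, word)] = alpha.get((gold, word), 0.0) + 1.0
            alpha[(guess, word)] = alpha.get((guess, word), 0.0) - 1.0
    return alpha


def uniform_mix(alphas):
    """Mean of the shard parameter vectors (uniform mixing weights)."""
    mixed = {}
    for alpha in alphas:
        for feature, value in alpha.items():
            mixed[feature] = mixed.get(feature, 0.0) + value / len(alphas)
    return mixed


def iterative_parameter_mix(data, num_shards, epochs):
    """Return the mixed weights and the simulated parallel training time."""
    n = len(data)
    shards = [data[n * s // num_shards: n * (s + 1) // num_shards]
              for s in range(num_shards)]
    alpha = {}
    simulated_seconds = 0.0
    for _ in range(epochs):
        alphas, shard_seconds = [], []
        for shard in shards:  # would run in parallel in a real setup
            start = time.perf_counter()
            alphas.append(train_one_epoch(shard, alpha))
            shard_seconds.append(time.perf_counter() - start)
        alpha = uniform_mix(alphas)  # mix, then resend to every shard
        # A parallel epoch takes as long as its slowest shard.
        simulated_seconds += max(shard_seconds)
    return alpha, simulated_seconds


if __name__ == "__main__":
    data = [("the", "DT"), ("dog", "NN"), ("barks", "VB"),
            ("a", "DT"), ("cat", "NN"), ("sleeps", "VB")]
    weights, seconds = iterative_parameter_mix(data, num_shards=2, epochs=3)
    print(predict("sleeps", weights), seconds)
```

The timing here ignores the cost of sending weight vectors between workers, which a real map-reduce deployment would have to pay after every epoch.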
References

[Collins2002] M. Collins. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, EMNLP '02, pages 1-8, Stroudsburg, PA, USA.

[Marcus et al.1999] Mitchell P. Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz, and Ann Taylor. 1999. Treebank-3. Linguistic Data Consortium, Philadelphia.

[McDonald et al.2010] Ryan McDonald, Keith Hall, and Gideon Mann. 2010. Distributed training strategies for the structured perceptron. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT '10, Stroudsburg, PA, USA. Association for Computational Linguistics.

[Ratnaparkhi1997] Adwait Ratnaparkhi. 1997. A maximum entropy model for part-of-speech tagging. In EMNLP.

[Roark et al.2012] Brian Roark, Kristy Hollingshead, and Nathan Bodenstab. 2012. Finite-state chart constraints for reduced complexity context-free parsing pipelines. Computational Linguistics, 38(4).

[Rosenblatt1958] Frank Rosenblatt. 1958. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6).

[Yarmohammadi et al.2014] Mahsa Yarmohammadi, Aaron Dunlop, and Brian Roark. 2014. Transforming trees into hedges and parsing with hedgebank grammars. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), Baltimore, Maryland, June.
More informationJOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation
JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based
More informationFrustratingly Easy Domain Adaptation
Frustratingly Easy Domain Adaptation Hal Daumé III School of Computing University of Utah Salt Lake City, Utah 84112 me@hal3.name Abstract We describe an approach to domain adaptation that is appropriate
More informationTopics in Parsing: Context and Markovization; Dependency Parsing. COMP-599 Oct 17, 2016
Topics in Parsing: Context and Markovization; Dependency Parsing COMP-599 Oct 17, 2016 Outline Review Incorporating context Markovization Learning the context Dependency parsing Eisner s algorithm 2 Review
More informationAlignment Link Projection Using Transformation-Based Learning
Alignment Link Projection Using Transformation-Based Learning Necip Fazil Ayan, Bonnie J. Dorr and Christof Monz Department of Computer Science University of Maryland College Park, MD 20742 {nfa,bonnie,christof}@umiacs.umd.edu
More informationTTIC 31190: Natural Language Processing
TTIC 31190: Natural Language Processing Kevin Gimpel Winter 2016 Lecture 2: Text Classification 1 Please email me (kgimpel@ttic.edu) with the following: your name your email address whether you taking
More informationUtilizing Dependency Language Models for Graph-based Dependency Parsing Models
Utilizing Dependency Language Models for Graph-based Dependency Parsing Models Wenliang Chen, Min Zhang, and Haizhou Li Human Language Technology, Institute for Infocomm Research, Singapore {wechen, mzhang,
More informationA Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy Francesco Sartorio Department of Information Engineering University of Padua, Italy sartorio@dei.unipd.it Giorgio Satta Department
More informationIterative Annotation Transformation with Predict-Self Reestimation for Chinese Word Segmentation
Iterative Annotation Transformation with Predict-Self Reestimation for Chinese Word Segmentation Wenbin Jiang and Fandong Meng and Qun Liu and Yajuan Lü Key Laboratory of Intelligent Information Processing
More informationFinal Project Discussion. Adam Meyers Montclair State University
Final Project Discussion Adam Meyers Montclair State University Summary Project Timeline Project Format Details/Examples for Different Project Types Linguistic Resource Projects: Annotation, Lexicons,...
More informationQANUS A GENERIC QUESTION-ANSWERING FRAMEWORK
QANUS A GENERIC QUESTION-ANSWERING FRAMEWORK NG, Jun Ping National University of Singapore ngjp@nus.edu.sg 30 November 2009 The latest version of QANUS and this documentation can always be downloaded from
More informationA simple pattern-matching algorithm for recovering empty nodes and their antecedents
A simple pattern-matching algorithm for recovering empty nodes and their antecedents Mark Johnson Brown Laboratory for Linguistic Information Processing Brown University Mark Johnson@Brown.edu Abstract
More informationTransition-Based Parsing of the Chinese Treebank using a Global Discriminative Model
Transition-Based Parsing of the Chinese Treebank using a Global Discriminative Model Yue Zhang Oxford University Computing Laboratory yue.zhang@comlab.ox.ac.uk Stephen Clark Cambridge University Computer
More informationCombine the PA Algorithm with a Proximal Classifier
Combine the Passive and Aggressive Algorithm with a Proximal Classifier Yuh-Jye Lee Joint work with Y.-C. Tseng Dept. of Computer Science & Information Engineering TaiwanTech. Dept. of Statistics@NCKU
More informationExam Marco Kuhlmann. This exam consists of three parts:
TDDE09, 729A27 Natural Language Processing (2017) Exam 2017-03-13 Marco Kuhlmann This exam consists of three parts: 1. Part A consists of 5 items, each worth 3 points. These items test your understanding
More informationFast, Piecewise Training for Discriminative Finite-state and Parsing Models
Fast, Piecewise Training for Discriminative Finite-state and Parsing Models Charles Sutton and Andrew McCallum Department of Computer Science University of Massachusetts Amherst Amherst, MA 01003 USA {casutton,mccallum}@cs.umass.edu
More informationApproximate Large Margin Methods for Structured Prediction
: Approximate Large Margin Methods for Structured Prediction Hal Daumé III and Daniel Marcu Information Sciences Institute University of Southern California {hdaume,marcu}@isi.edu Slide 1 Structured Prediction
More informationStructured Perceptron with Inexact Search
Structured Perceptron with Inexact Search Liang Huang Suphan Fayong Yang Guo presented by Allan July 15, 2016 Slides: http://www.statnlp.org/sperceptron.html Table of Contents Structured Perceptron Exact
More informationPattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition
Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant
More informationSchool of Computing and Information Systems The University of Melbourne COMP90042 WEB SEARCH AND TEXT ANALYSIS (Semester 1, 2017)
Discussion School of Computing and Information Systems The University of Melbourne COMP9004 WEB SEARCH AND TEXT ANALYSIS (Semester, 07). What is a POS tag? Sample solutions for discussion exercises: Week
More informationAdvanced PCFG Parsing
Advanced PCFG Parsing BM1 Advanced atural Language Processing Alexander Koller 4 December 2015 Today Agenda-based semiring parsing with parsing schemata. Pruning techniques for chart parsing. Discriminative
More informationNews-Oriented Keyword Indexing with Maximum Entropy Principle.
News-Oriented Keyword Indexing with Maximum Entropy Principle. Li Sujian' Wang Houfeng' Yu Shiwen' Xin Chengsheng2 'Institute of Computational Linguistics, Peking University, 100871, Beijing, China Ilisujian,
More informationVoting between Multiple Data Representations for Text Chunking
Voting between Multiple Data Representations for Text Chunking Hong Shen and Anoop Sarkar School of Computing Science Simon Fraser University Burnaby, BC V5A 1S6, Canada {hshen,anoop}@cs.sfu.ca Abstract.
More informationDiscriminative Parse Reranking for Chinese with Homogeneous and Heterogeneous Annotations
Discriminative Parse Reranking for Chinese with Homogeneous and Heterogeneous Annotations Weiwei Sun and Rui Wang and Yi Zhang Department of Computational Linguistics, Saarland University German Research
More informationOptimal Shift-Reduce Constituent Parsing with Structured Perceptron
Optimal Shift-Reduce Constituent Parsing with Structured Perceptron Le Quang Thang Hanoi University of Science and Technology {lelightwin@gmail.com} Hiroshi Noji and Yusuke Miyao National Institute of
More informationLANGUAGE MODEL SIZE REDUCTION BY PRUNING AND CLUSTERING
LANGUAGE MODEL SIZE REDUCTION BY PRUNING AND CLUSTERING Joshua Goodman Speech Technology Group Microsoft Research Redmond, Washington 98052, USA joshuago@microsoft.com http://research.microsoft.com/~joshuago
More informationFormalizing the Use and Characteristics of Constraints in Pipeline Systems
Formalizing the Use and Characteristics of Constraints in Pipeline Systems Kristy Hollingshead B.A., University of Colorado, 2000 M.S., Oregon Health & Science University, 2004 Presented to the Center
More informationSemantics Isn t Easy Thoughts on the Way Forward
Semantics Isn t Easy Thoughts on the Way Forward NANCY IDE, VASSAR COLLEGE REBECCA PASSONNEAU, COLUMBIA UNIVERSITY COLLIN BAKER, ICSI/UC BERKELEY CHRISTIANE FELLBAUM, PRINCETON UNIVERSITY New York University
More informationTransition-based Dependency Parsing with Rich Non-local Features
Transition-based Dependency Parsing with Rich Non-local Features Yue Zhang University of Cambridge Computer Laboratory yue.zhang@cl.cam.ac.uk Joakim Nivre Uppsala University Department of Linguistics and
More informationProbabilistic parsing with a wide variety of features
Probabilistic parsing with a wide variety of features Mark Johnson Brown University IJCNLP, March 2004 Joint work with Eugene Charniak (Brown) and Michael Collins (MIT) upported by NF grants LI 9720368
More informationMaking Sense Out of the Web
Making Sense Out of the Web Rada Mihalcea University of North Texas Department of Computer Science rada@cs.unt.edu Abstract. In the past few years, we have witnessed a tremendous growth of the World Wide
More informationLearning to Follow Navigational Route Instructions
Learning to Follow Navigational Route Instructions Nobuyuki Shimizu Information Technology Center University of Tokyo shimizu@r.dl.itc.u-tokyo.ac.jp Andrew Haas Department of Computer Science State University
More informationRobust Information Extraction with Perceptrons
Robust Information Extraction with Perceptrons Mihai Surdeanu Technical University of Catalonia surdeanu@lsi.upc.edu Massimiliano Ciaramita Yahoo! Research Barcelona massi@yahoo-inc.com Abstract We present
More informationDiscriminative Classifiers for Deterministic Dependency Parsing
Discriminative Classifiers for Deterministic Dependency Parsing Johan Hall Växjö University jni@msi.vxu.se Joakim Nivre Växjö University and Uppsala University nivre@msi.vxu.se Jens Nilsson Växjö University
More information