A CASE STUDY: Structure learning for Part-of-Speech Tagging. Danilo Croce WMR 2011/2012

Transcription:

A CASE STUDY: Structure learning for Part-of-Speech Tagging. Danilo Croce, WMR 2011/2012, 27 January 2012

TASK definition. Part-of-Speech tagging is one of the tasks of EVALITA 2009. EVALITA is an initiative devoted to the evaluation of Natural Language Processing and Speech tools for Italian. In the Part-of-Speech Tagging task, systems are required to assign a tag, consisting of a combination of lexical category (PoS tag) and morphological features, to each token in a set of sentences. http://www.evalita.it/2009/tasks/pos

POS tagging and learning. During the WMR course you have seen different quantitative approaches for modeling linguistic problems as stochastic processes: Hidden Markov Models (generative models) and Support Vector Machines (discriminative models). The POS tagging problem can be modeled as a sequential tagging task, and the linguistic information can be acquired from annotated examples. We will see how to combine these two paradigms.

SVM and POS tagging. We need to model the task as a stochastic process: we aim to classify a sentence (i.e. a sequence of words) with respect to a possible sequence of POS tags, and the complexity is combinatorial. We could classify each word without the contextual information, ignoring the other words in the sentence. This may work for unambiguous cases, such as the, but the context is crucial to classify a word like run. IDEA: classify words with respect to the POS tags, but use contextual information to find the best solution for the entire sentence.

SVM and POS tagging (2). An HMM model: the sentence is a SEQUENCE; words (represented through a set of features) are our OBSERVATIONS; HMM STATES are mapped onto POS tags. The transition probabilities are estimated from the training set, while SVM classifiers are used to estimate the emission probabilities. The solution is found by applying the Viterbi algorithm.
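The decoding step above is a standard Viterbi recursion over per-token scores. A minimal sketch in Java, assuming the emission scores (e.g. SVM margins) and transition scores are already given; the tag set, sentence and all numbers are illustrative, not taken from the slides:

```java
import java.util.Arrays;

// Minimal Viterbi decoder: finds the best tag sequence given per-token
// emission scores (e.g. SVM margins) and tag-transition scores.
public class ViterbiDemo {

    // emission[t][s]: score of tag s for token t; transition[p][s]: score of p -> s
    static int[] viterbi(double[][] emission, double[][] transition) {
        int n = emission.length, k = emission[0].length;
        double[][] score = new double[n][k];
        int[][] back = new int[n][k];
        score[0] = emission[0].clone();
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double v = score[t - 1][p] + transition[p][s];
                    if (v > best) { best = v; arg = p; }
                }
                score[t][s] = best + emission[t][s];
                back[t][s] = arg;
            }
        }
        // backtrack from the best final state
        int bestLast = 0;
        for (int s = 1; s < k; s++) if (score[n - 1][s] > score[n - 1][bestLast]) bestLast = s;
        int[] path = new int[n];
        path[n - 1] = bestLast;
        for (int t = n - 1; t > 0; t--) path[t - 1] = back[t][path[t]];
        return path;
    }

    public static void main(String[] args) {
        // two tags (0 = DET, 1 = NOUN), three tokens: "the old book"
        double[][] emission = { {2.0, -1.0}, {0.5, 0.4}, {-1.0, 2.0} };
        double[][] transition = { {-2.0, 1.0}, {0.5, -0.5} };  // DET -> NOUN favored
        System.out.println(Arrays.toString(viterbi(emission, transition)));  // prints [0, 1, 1]
    }
}
```

Note how the transition score rescues the ambiguous middle token: its emission slightly prefers DET, but the DET-to-NOUN transition makes NOUN win for the whole sequence.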

A simple example: a diagram showing the sentence "Yesterday a robber killed a guardian with a knife.", where each word w_i is represented by feature vectors x_{i,1}, x_{i,2}, ... and a dedicated classifier for each POS tag scores every word.

SVM HMM: Structured Learning for POS. The SVM HMM model learns a discriminative model isomorphic to a k-order Hidden Markov Model through the Structural SVM formulation. Input: feature vectors. Output: label sequence, scored through emissions and transitions given a history of length k. The cutting-plane algorithm is applied to estimate w in polynomial time.

SVM HMM input. Each line has the form: class qid:sentence_id feature:value ... # comment (sparse notation).
4 qid:1 1:1 2:1 51:1 247:1 2675:1 # four
12 qid:1 58:1 84:1 197:1 250:1 433:1 1145:1 2677:1 # score
3 qid:1 8:1 83:1 88:1 202:1 363:1 364:1 438:1 1147:1 # and
4 qid:1 16:1 47:1 87:1 135:1 197:1 365:1 366:1 # seven
15 qid:1 30:1 49:1 142:1 197:1 202:1 387:1 # years
8 qid:1 39:1 83:1 202:1 267:1 392:1 # ago
20 qid:1 83:1 87:1 247:1 269:1 2675:1 2676:1 # our
...
21 qid:2 5:1 83:1 576:1 923:1 1379:1 1469:1 # now
19 qid:2 23:1 84:1 87:1 577:1 926:1 1383:1 1470:1 # we
30 qid:2 26:1 83:1 84:1 88:1 433:1 578:1 627:1 # are
29 qid:2 7:1 8:1 9:1 87:1 88:1 438:1 628:1 1077:1 3377:1 # engaged
8 qid:2 15:1 16:1 17:1 23:1 47:1 185:1 1082:1 3381:1 # in
8 qid:3 23:1 47:1 48:1 87:1 219:1 1621:1 # on
7 qid:3 3:1 26:1 49:1 50:1 459:1 # a
9 qid:3 5:1 197:1 217:1 460:1 519:1 1535:1 1536:1 1537:1 # great
12 qid:3 8:1 109:1 202:1 219:1 522:1 531:1 1538:1 1539:1 1540:1 # battlefield
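A converter from tokens to this sparse format can be sketched as below. The feature-id bookkeeping and the template names (w=..., suf3=...) are assumptions for illustration, not part of the slides; note that the format requires feature ids in increasing order within a line:

```java
import java.util.*;

// Sketch (hypothetical helper): writes one token of a sentence in the sparse
// SVM HMM input format "tag qid:SENT id:1 id:1 ... # word".
public class SparseWriter {
    private final Map<String, Integer> featureIds = new HashMap<>();

    // Assigns ids on the fly; the first unseen feature gets id 1, and so on.
    int idOf(String feature) {
        return featureIds.computeIfAbsent(feature, f -> featureIds.size() + 1);
    }

    String line(int tag, int qid, String word, List<String> features) {
        // the format requires feature ids sorted in increasing order
        TreeSet<Integer> ids = new TreeSet<>();
        for (String f : features) ids.add(idOf(f));
        StringBuilder sb = new StringBuilder(tag + " qid:" + qid);
        for (int id : ids) sb.append(' ').append(id).append(":1");
        return sb.append(" # ").append(word).toString();
    }

    public static void main(String[] args) {
        SparseWriter w = new SparseWriter();
        System.out.println(w.line(4, 1, "four", Arrays.asList("w=four", "suf3=our")));
        // prints 4 qid:1 1:1 2:1 # four
        System.out.println(w.line(12, 1, "score", Arrays.asList("w=score", "suf3=ore")));
        // prints 12 qid:1 3:1 4:1 # score
    }
}
```

The same id map must be reused when writing the test file, so that training and classification share one feature space.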

How to use SVM HMM. Download http://download.joachims.org/svm_hmm/current/svm_hmm.tar.gz and compile. Learn: svm_hmm_learn -c <C> --t <ORDER_T> -e 0.1 --e 1 training_input.dat modelfile.dat. -c: the typical SVM parameter C, trading off slack vs. magnitude of the weight vector (1, 10, 100, 10^3, 10^4; it depends on the training set size). --t: order of dependencies of transitions in the HMM (1, 2 or 3). Classify: svm_hmm_classify test_input.dat modelfile.dat classify.tags
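Since the project must be written in Java while the SVM HMM tools are C executables, one option is to invoke them via ProcessBuilder. A sketch under the assumption that the compiled binary sits in the working directory; the parameter values and file names are illustrative:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Sketch: driving the svm_hmm_learn C binary from Java.
public class SvmHmmRunner {

    // Builds the training command line shown in the slides.
    static List<String> learnCommand(int c, int order, String train, String model) {
        return Arrays.asList("./svm_hmm_learn",
                "-c", String.valueOf(c),      // slack vs. weight-vector trade-off
                "--t", String.valueOf(order), // order of transition dependencies
                "-e", "0.1",                  // termination tolerance
                train, model);
    }

    // Runs a command, forwarding its output to our console; returns the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = learnCommand(10, 2, "training_input.dat", "modelfile.dat");
        System.out.println(String.join(" ", cmd));
        // prints ./svm_hmm_learn -c 10 --t 2 -e 0.1 training_input.dat modelfile.dat
        // run(cmd);  // uncomment once svm_hmm_learn has been compiled
    }
}
```

An analogous command list covers svm_hmm_classify; checking the exit code of run() tells the caller whether training succeeded.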

Feature Engineering. The better the feature representation of words, the better the performance. Feature engineering: contextual features (the k words before and after the target word), the word suffix, dictionary information. Feature post-processing: normalization. Do not mix features!!!
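The templates above can be sketched as follows; the window size, suffix lengths and feature names are illustrative choices, not prescribed by the slides:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of per-token feature extraction: a window of k words around the
// target, word suffixes, and two simple shape features.
public class Features {

    static List<String> extract(String[] sent, int i, int k) {
        List<String> feats = new ArrayList<>();
        for (int j = -k; j <= k; j++) {                 // contextual window
            int p = i + j;
            String w = (p < 0 || p >= sent.length) ? "<PAD>" : sent[p].toLowerCase();
            feats.add("w[" + j + "]=" + w);
        }
        String w = sent[i];
        for (int len = 1; len <= 3 && len <= w.length(); len++)  // suffixes
            feats.add("suf" + len + "=" + w.substring(w.length() - len));
        if (w.matches("[0-9]+")) feats.add("isNumber");          // shape features
        if (Character.isUpperCase(w.charAt(0))) feats.add("initCap");
        return feats;
    }

    public static void main(String[] args) {
        String[] sent = {"Yesterday", "a", "robber", "killed"};
        System.out.println(extract(sent, 2, 1));
        // prints [w[-1]=a, w[0]=robber, w[1]=killed, suf1=r, suf2=er, suf3=ber]
    }
}
```

Each template family (window, suffix, shape) maps to its own string prefix, which keeps the families separate in the feature space, in line with the "do not mix features" advice above.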

Project objectives. The project consists of defining and implementing a POS tagging system based on the SVM HMM learning framework. The system must be implemented in Java. For this course the experimental settings are: the coarse-grain POS tag set; the open task setting (you can use external resources). You will be provided with the training/development data.

Project objectives (2). The system must be CHAOS compliant. CHAOS is a modular and lexicalized syntactic and semantic parser for Italian and for English; it implements a modular and lexicalized approach to the syntactic parsing problem. The pool of modules defines a tokenizer, POS tagger, dependency parser and named entity recognizer. Modules define a sequence of annotators: e.g. POS tagging cannot be applied without the tokenizer. The XDG provides a data structure containing all the linguistic information added by each module. CHAOS is written in Java.

Project objectives (3). Training data will be provided within the XDG structure: tokenized and POS tagged sentences. Since SVM HMM is written in C, the system builds an input file for the learning system. Test data will be provided with no POS tags; again, since SVM HMM is written in C, the system builds a file for the classification system. We have an SVM HMM classifier in Java. You have to define a module to enrich words with POS tagging information. We will help you to integrate the classifiers.

Project objectives (4). A proper feature engineering must be defined. Contest: when the system is ready you will be provided with a test set; sentences must be labeled and we will measure the performance. Tagging accuracy is defined as the percentage of correctly tagged tokens with respect to the total number of tokens. A final short report is required.