Guiding Semi-Supervision with Constraint-Driven Learning


Guiding Semi-Supervision with Constraint-Driven Learning. Ming-Wei Chang, Lev Ratinov, Dan Roth. Department of Computer Science, University of Illinois at Urbana-Champaign. Paper presentation by: Drew Stone.

Outline
1. Background: semi-supervised learning; constraint-driven learning
2. Constraint-driven learning with semi-supervision: introduction, model, scoring, constraint pipeline, and the Constraint-Driven Learning (CODL) algorithm
3. Experiments

Semi-supervised learning: Introduction. Question: when labelled data is scarce, how can we take advantage of unlabelled data for training? Intuition: if there is structure in the underlying distribution of samples, points that are close together or clustered may share labels.

Semi-supervised learning: Framework. Given a model M trained on labelled samples from a distribution D, and a set U of m unlabelled examples drawn from D: learn labels for each example in U via Learn(U, M), then re-use the (now labelled) examples to tune M into a new model M'. Benefit: access to more training data. Drawback: the learned model may drift away from the correct classifier if the assumptions on the distribution do not hold.
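
This framework is essentially self-training. A minimal sketch, assuming hypothetical train and predict helpers (not part of the paper):

```python
# Minimal self-training sketch; train/predict are assumed helpers, not the paper's code.
def self_train(labelled, unlabelled, train, predict, rounds=5):
    """Repeatedly label U with the current model and retrain on L plus the pseudo-labels."""
    model = train(labelled)                                    # initial model M from labelled data
    for _ in range(rounds):
        pseudo = [(x, predict(model, x)) for x in unlabelled]  # Learn(U, M)
        model = train(labelled + pseudo)                       # tune M -> M' on the enlarged set
    return model
```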

Constraint-driven learning: Introduction. Motivation: keep the learned model simple, and use constraints to compensate for that simplicity. Benefit: simple models (with fewer features) are more computationally efficient. Intuition: fix a set of task-specific constraints so that a simple machine learning model suffices; the constraints encode task knowledge and make learning both easier and more accurate.

Constraint-driven learning: Framework. Given an objective function argmax_y λ · F(x, y), define a set of (linear or non-linear) constraints {C_i}, i = 1, ..., K, where each C_i : X × Y → {0, 1}, and solve the optimization problem subject to those constraints. Any Boolean rule can be encoded as a set of linear inequalities.
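
For intuition, a brute-force sketch of inference under hard constraints; the score and constraint callables are assumptions, and a real system would use beam search or integer linear programming rather than enumeration:

```python
from itertools import product

def constrained_argmax(x, labels, score, constraints):
    """Return the highest-scoring label sequence y that satisfies every hard constraint."""
    best_y, best_s = None, float("-inf")
    for y in product(labels, repeat=len(x)):       # enumerate candidate output sequences
        if all(c(x, y) for c in constraints):      # keep only sequences with C_i(x, y) = 1
            s = score(x, y)                        # lambda . F(x, y)
            if s > best_s:
                best_y, best_s = y, s
    return best_y
```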

Constraint-driven learning: Example task. Sequence tagging with an HMM or CRF plus global constraints.

Constraint-driven learning: Sequence tagging constraints. Each word gets a unique label; edges must form a path; a verb must exist.

Constraint-driven learning: Example task. Semantic role labelling (SRL) with independent classifiers plus global constraints.

Constraint-driven learning: Example task. Each verb predicate carries with it a new inference problem.

Constraint-driven learning: Example task (figure shown on slide).

Constraint-driven learning: SRL constraints. No duplicate argument classes; unique labels; ordering constraints.
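
Such constraints can be written as 0/1 predicates over (x, y). A small illustrative sketch; the argument-label set is an assumption, not the paper's exact inventory:

```python
# "No duplicate argument classes": each core argument label appears at most once per predicate.
CORE_ARGS = {"A0", "A1", "A2", "A3", "A4"}

def no_duplicate_core_args(x, y):
    core = [label for label in y if label in CORE_ARGS]
    return 1 if len(core) == len(set(core)) else 0
```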

Constraint-driven learning w/ semi-supervision: Introduction. Intuition: use constraints to obtain better training samples from the unlabelled set. Benefit: deviations from the simple model move only towards more expressive answers, since the constraints guide the model.

Constraint-driven learning w/ semi-supervision: Examples. Consider a constraint over {0, 1}^n requiring all 1s to precede all 0s (sequences of the form 1*0*). The sequence 1010 violates it, and there are two good fixes at Hamming distance one: 1110 and 1000. Which is the best one to learn from? Consider instead the constraint that state transitions must occur on punctuation marks: there could be many good sequences abiding by it.

Constraint-driven learning w/ semi-supervision: Model. Consider an input domain X, an output domain Y, and a distribution D defined over X × Y. Input/output pairs (x, y) ∈ X × Y are sequence pairs: x = (x_1, ..., x_N), y = (y_1, ..., y_M). We wish to find a structured output classifier h : X → Y that uses a structured scoring function f : X × Y → R to choose the most likely output sequence, where the correct output sequence receives the highest score. We are also given (or define) a set of constraints {C_i}, i = 1, ..., K, each C_i : X × Y → {0, 1}.

Constraint-driven learning w/ semi-supervision: Scoring. Scoring rule: f(x, y) = Σ_{i=1..M} λ_i f_i(x, y) = λ · F(x, y) (equivalently, w^T φ(x, y)). This is compatible with any linear discriminative or generative model, such as HMMs and CRFs. Local feature functions {f_i} allow for tractable inference by capturing contextual structure: the space of structured pairs can be huge, but locally there are only small distances to account for.
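
A minimal sketch of this linear scoring rule; the two local feature functions are illustrative assumptions for a citation-tagging task, not the paper's features:

```python
# f(x, y) = sum_i lambda_i * f_i(x, y) = lambda . F(x, y)
def score(x, y, weights, feature_fns):
    """Dot product between the weight vector and the global feature vector F(x, y)."""
    return sum(w * f(x, y) for w, f in zip(weights, feature_fns))

def f_capitalized_author(x, y):
    # counts capitalized tokens tagged AUTHOR
    return sum(1 for token, tag in zip(x, y) if token.istitle() and tag == "AUTHOR")

def f_author_to_title(x, y):
    # counts AUTHOR -> TITLE transitions
    return sum(1 for a, b in zip(y, y[1:]) if (a, b) == ("AUTHOR", "TITLE"))
```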

Constraint-driven learning w/ semi-supervision: Sample constraints (table shown on slide).

Constraint-driven learning w/ semi-supervision: Constraint pipeline (hard). Define 1_{C(x)} ⊆ Y as the set of output sequences for which every constraint evaluates to 1 on the given x. The objective function then becomes argmax_{y ∈ 1_{C(x)}} λ · F(x, y). Intuition: find the best output sequence y that maximizes the score while satisfying the hard constraints.

Constraint-driven learning w/ semi-supervision: Constraint pipeline (soft). Given a suitable distance metric between an output sequence and the space of outputs respecting a single constraint, and a set of soft constraints {C_i}, i = 1, ..., K, with penalties {ρ_i}, we get a new objective function: argmax_y λ · F(x, y) − Σ_{i=1}^{K} ρ_i d(y, 1_{C_i(x)}). Goal: find the best output sequence under the model that violates the constraints as little as possible. Intuition: given a learned model λ · F(x, y), we bias its decisions by the amount by which the output sequence violates each soft constraint. One option is the minimal Hamming distance to a constraint-satisfying sequence: d(y, 1_{C_i(x)}) = min_{y' ∈ C_i(x)} H(y, y').

Constraint-driven learning w/ semi-supervision: Distance example (taken from the CCM NAACL-12 tutorial). d(y, 1_{C_i(x)}) = Σ_{j=1}^{M} φ_{C_i}(y_j), i.e., count the violations.
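
A sketch of soft-constrained scoring with this violation-count approximation of the distance; the callables passed in are assumptions:

```python
# lambda.F(x, y) minus the rho-weighted number of constraint violations.
def soft_score(x, y, score, violation_counts, penalties):
    """score(x, y): model score lambda.F(x, y); violation_counts[i](x, y) approximates d(y, 1_{C_i(x)})."""
    penalty = sum(rho * d(x, y) for rho, d in zip(penalties, violation_counts))
    return score(x, y) - penalty
```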

Constraint-driven learning. Recall: sequence tagging (citation-parsing example figures shown on slides).

Constraint-driven learning. Recall: sequence tagging. (a) Correct citation parsing vs. (b) HMM-predicted citation parsing: adding the punctuation state-transition constraint recovers the correct parse under the same HMM.

Constraint-driven learning. Recall: SRL (example figures shown on slides).

Constraint-driven learning w/ semi-supervision: CODL algorithm. Constraints are used in inference, not in learning. The parameters λ are estimated by maximum likelihood (an EM-style procedure). Semi-supervision occurs on lines 7 and 9 of the algorithm: for each unlabelled sample, we generate K training examples. All unlabelled samples are re-labelled in each training cycle, unlike in self-training.
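
A rough sketch of the CODL training loop as described on this slide; the helper names, the parameter-dictionary representation, and the gamma-weighted interpolation with the supervised model are assumptions of this sketch, not a verbatim transcription of the paper's pseudocode:

```python
def codl(labelled, unlabelled, train, top_k_inference, K=5, cycles=10, gamma=0.9):
    """train(data) -> dict of parameters; top_k_inference(params, x, K) -> K best
    constrained outputs for x (constraints are applied only here, at inference time)."""
    supervised = train(labelled)                 # model learned once from the labelled data
    params = supervised
    for _ in range(cycles):
        pseudo = []
        for x in unlabelled:                     # ALL unlabelled samples are re-labelled each cycle
            for y in top_k_inference(params, x, K):
                pseudo.append((x, y))            # K training examples per unlabelled sample
        relearned = train(labelled + pseudo)     # maximum-likelihood re-estimation
        # keep the model close to the supervised one (gamma interpolation, an assumed detail)
        params = {k: gamma * supervised[k] + (1 - gamma) * relearned[k] for k in supervised}
    return params
```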

Constraint-driven learning w/ semi-supervision: Expectation Maximization, top-K hard EM. Standard EM assigns a distribution over all outputs for a given unlabelled x ∈ X. Instead, we take the top K outputs y ∈ Y that maximize the soft objective function and assign uniform probability to each of them. The constraints reshape this distribution at every step.
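
A sketch of the top-K hard E-step; passing an explicit list of candidate outputs and the soft_score callable are simplifying assumptions:

```python
def top_k_hard_e_step(params, x, candidate_outputs, soft_score, K=5):
    """Keep the K best constrained outputs for x and give them uniform weight."""
    ranked = sorted(candidate_outputs, key=lambda y: soft_score(params, x, y), reverse=True)
    top = ranked[:K]
    weight = 1.0 / len(top)            # uniform probability over the chosen outputs
    return [(y, weight) for y in top]
```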

Constraint-driven learning w/ semi-supervision: Motivation for K > 1. There may be multiple ways to correct constraint-violating samples, and we gain access to more training data.

Experiments: Citations (results shown on slide).

Experiments: HMM (results shown on slide).

Pictures taken from the CCM NAACL-12 tutorial. Chang, M.-W., Ratinov, L., & Roth, D. (2007). Guiding semi-supervision with constraint-driven learning. In Proceedings of ACL (pp. 280–287).