CS 6784 Paper Presentation
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
John Lafferty, Andrew McCallum, Fernando C. N. Pereira
Presented February 20, 2014

Main Contributions
This 2001 paper introduced the Conditional Random Field (CRF).
- Describes an efficient representation of field potentials in terms of features.
- Provides two algorithms for finding maximum-likelihood parameter values.
- Provides some really unconvincing examples... examples are NOT the strongest point of this paper.

Talk Structure
Brad: CRF in context; the Label Bias Problem.
Stephanie: Parameter Estimation; Experiments; Conclusion.

CRF in Context
Let X = {X_1, X_2, ...} be a set of observed random variables, Y = {Y_1, Y_2, ...} a set of label random variables, and (X, Y) a set of joint observations of X, Y.

              Generative   Discriminative
  Directed    HMM          MEMM
  Undirected  MRF          CRF

In 2001, HMM, MEMM and MRF were well known; the paper presents the CRF.

Generative vs. Discriminative
Generative: maximise the joint P(Y, X) = P(Y | X) P(X).
Discriminative: maximise the conditional P(Y | X).
When is discriminative helpful? Tractability requires independence... but sometimes there are important correlations in X.

Examples: important correlations
- Long-range interactions in human genomics
- Pronoun definition and binding
- Context in whole-scene image recognition
- Recursive structure in language
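
To make the distinction concrete, here is a minimal sketch (my own toy example, not from the paper) on a tiny discrete dataset: the generative objective forces us to estimate P(X) as well, while the discriminative objective touches only P(Y | X).

```python
from collections import Counter

data = [("rainy", "umbrella"), ("rainy", "umbrella"),
        ("sunny", "none"), ("sunny", "umbrella")]   # (x, y) pairs

joint = Counter(data)                       # generative: counts for P(Y, X)...
x_marginal = Counter(x for x, _ in data)    # ...which forces us to model P(X)

def p_joint(x, y):
    return joint[(x, y)] / len(data)

def p_conditional(y, x):
    # discriminative: P(Y | X) directly; P(X) never appears
    return joint[(x, y)] / x_marginal[x]

print(p_joint("rainy", "umbrella"))         # 0.5
print(p_conditional("umbrella", "rainy"))   # 1.0
```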

Directed vs. Undirected
For a graphical model G(E, V) with joint potential Φ(V), let C be the set of cliques (fully connected subgraphs) in G, with c ∈ C having edges E_c and vertices V_c. Finally, Dom(V) is the set of all values assumable by the random variables V (= X ∪ Y).

P(V) = (1/Z) Φ(V),   Z = Σ_{v ∈ Dom(V)} Φ(v)

Directed vs. Undirected, continued
Compactness requires factorization (Hammersley-Clifford, 1971):

Φ(V) = ∏_{c ∈ C} ψ_c(V_c)

Directed: local normalization — each ψ_c is a probability:

∀c ∈ C:  Σ_{v ∈ Dom(V_c)} ψ_c(v) = 1

Undirected: global normalization relaxes this constraint... but what does it buy us?
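
As a sketch of what global normalization means operationally (a toy model of my own, assuming binary variables on a three-node chain): the clique potentials below are arbitrary positive numbers, and a single constant Z computed over all joint assignments turns the product into a distribution.

```python
import itertools

labels = [0, 1]
# Potentials for a 3-node chain V1 - V2 - V3; the cliques are the two edges.
psi_12 = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
psi_23 = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 3.0, (1, 1): 1.0}

def phi(v):
    v1, v2, v3 = v
    return psi_12[(v1, v2)] * psi_23[(v2, v3)]   # product over cliques

# Global normalization: one Z over all assignments, not one per clique.
Z = sum(phi(v) for v in itertools.product(labels, repeat=3))

def p(v):
    return phi(v) / Z   # P(V) = Phi(V) / Z

assert abs(sum(p(v) for v in itertools.product(labels, repeat=3)) - 1) < 1e-9
print(p((0, 1, 0)))
```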

The Label Bias Problem: Conditional Markov Model (MEMM)
Toy problem: a fragment of an MEMM. [The slide shows training data and three tables whose numeric entries were lost in transcription: P(Y_1 | X_1 = r) for Y_1 ∈ {1, 2}; the relative joint counts of (Y_2, X_2, Y_1) for Y_2 ∈ {3, 4}, X_2 ∈ {i, o}; and the locally normalised conditional P(Y_2 | X_2, Y_1).]

Viterbi decoding: let's try it for X = (r, o). For each labeling (Y_1, Y_2), compute P(Y_1 | r) · P(Y_2 | Y_1, o) to get P(Y_1, Y_2 | r, o). Which labeling wins? ...but we want the other one. What happened?

Local normalization requires each conditional to be a probability. So a state whose outgoing conditional has low entropy must pass nearly all of its mass to its preferred successor regardless of the observation X_2 — the observation is effectively ignored, and the wrong labeling wins.
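
Since the slide's numeric tables did not survive transcription, here is a sketch of the same effect with assumed numbers: state 1's locally normalised conditional nearly ignores X_2, so its branch keeps winning even when the observation points the other way.

```python
# Assumed values (NOT the slide's): Y1 in {1, 2}, Y2 in {3, 4}, X = ('r', 'o').
p_y1 = {1: 0.6, 2: 0.4}                 # P(Y1 | X1 = 'r'): 'r' is ambiguous
p_y2 = {                                # P(Y2 | X2 = 'o', Y1)
    ('o', 1): {3: 0.9, 4: 0.1},         # state 1 nearly ignores X2
    ('o', 2): {3: 0.1, 4: 0.9},
}

scores = {(y1, y2): p_y1[y1] * p_y2[('o', y1)][y2]
          for y1 in (1, 2) for y2 in (3, 4)}
print(max(scores, key=scores.get), scores)
# (1, 3) wins with 0.54 even though X2 = 'o' points to branch 2 (0.36):
# once Viterbi enters state 1, local normalisation forces the mass onward.
```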

The Label Bias Problem: Potentials
Toy problem: the same fragment as a CRF. [The slide shows the training data and tables of unnormalised potentials ψ(X_1, Y_1), ψ(Y_1, Y_2), ψ(X_2, Y_2); the numeric entries were lost in transcription.]

For each labeling (Y_1, Y_2), aggregate ψ(X_1, Y_1) · ψ(Y_1, Y_2) · ψ(X_2, Y_2) for X = (r, o), then normalise. Which labeling wins, now? Qualitatively, the four labelings score small, tiny, infinitesimal, and ≈100% — the desired labeling now takes essentially all of the mass.

Because potentials do not have to normalize into probabilities until AFTER aggregation, they don't suffer from inappropriate conditioning.
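
And a companion sketch with assumed potential values for the CRF version of the toy: because scores are unnormalised until the very end, the observation potential ψ(X_2, Y_2) can be strong enough to outweigh the first step.

```python
# Assumed potential values (NOT the slide's) for X = ('r', 'o').
psi_x1y1 = {('r', 1): 3.0, ('r', 2): 3.0}
psi_y1y2 = {(1, 3): 9.0, (1, 4): 1.0, (2, 3): 1.0, (2, 4): 9.0}
psi_x2y2 = {('o', 3): 0.1, ('o', 4): 50.0}   # 'o' strongly favours Y2 = 4

score = {(y1, y2):
         psi_x1y1[('r', y1)] * psi_y1y2[(y1, y2)] * psi_x2y2[('o', y2)]
         for y1 in (1, 2) for y2 in (3, 4)}
Z = sum(score.values())                       # normalise once, at the end
probs = {k: v / Z for k, v in score.items()}
print(max(probs, key=probs.get), probs)       # (2, 4) takes ~90% of the mass
```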

Fun fact: we have seen this in class before!
Graphical model G(E, V) with joint potential Φ(V), C the set of cliques in G with c ∈ C having edges E_c and vertices V_c:

P(V) ∝ Φ(V) = ∏_{c ∈ C} ψ_c(V_c)

M³ nets: cliques are pairs, and all conditioned on the observed x:

P(y | x) ∝ ∏_{(i,j) ∈ E} ψ_ij(y_i, y_j, x)

AMNs: cliques are pairs of nodes and singletons:

P(y) = (1/Z) ∏_i ψ_i(y_i) ∏_{(i,j) ∈ E} ψ_ij(y_i, y_j)

Where do Parameters Come From?
CRFs are part of the same general class: P(y | x) ∝ ∏_c ψ_c(y|_c, x).

For trees, the cliques are pairs of vertices sharing an edge (e) and single vertices (v):

Φ = ∏_e ψ_e · ∏_v ψ_v

And because this is a conditional network, each potential can see the whole observation: ψ_e(e, y|_e, x) and ψ_v(v, y|_v, x).

An exponential identity, ψ = exp(log ψ), gives us

P(y | x) ∝ exp( Σ_e log ψ_e(e, y|_e, x) + Σ_v log ψ_v(v, y|_v, x) )

Potentials can be ANY positive values — like exponentiated linear combinations of arbitrary features:

P_θ(y | x) ∝ exp( Σ_{e,k} λ_k f_k(e, y|_e, x) + Σ_{v,k} μ_k g_k(v, y|_v, x) )
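
A minimal sketch of this exponential form for a linear chain, with a single (assumed) edge feature f and vertex feature g; real taggers use thousands of features:

```python
import math

def f_edge(y_prev, y_cur, x, i):    # edge features f_k
    return [1.0 if (y_prev, y_cur) == ('NOUN', 'VERB') else 0.0]

def g_vertex(y, x, i):              # vertex features g_k
    return [1.0 if x[i].istitle() and y == 'NOUN' else 0.0]

lam = [1.5]   # edge weights  (lambda_k)
mu = [2.0]    # vertex weights (mu_k)

def unnormalised_score(y, x):
    s = 0.0
    for i in range(len(x)):
        s += sum(w * g for w, g in zip(mu, g_vertex(y[i], x, i)))
        if i > 0:
            s += sum(w * f for w, f in zip(lam, f_edge(y[i - 1], y[i], x, i)))
    return math.exp(s)

x = ['Alice', 'runs']
print(unnormalised_score(['NOUN', 'VERB'], x))   # exp(2.0 + 1.5) ~ 33.1
# Dividing by Z(x), the sum over all label sequences, gives P(y | x).
```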

Training Algorithms: Improved Iterative Scaling
Want to maximize the log-likelihood with respect to the parameters θ = (λ_1, λ_2, ...; μ_1, μ_2, ...).

Method: Improved Iterative Scaling¹, an extension of Generalised Iterative Scaling (Darroch and Ratcliff, 1972). "Improved" because the features need not sum to a constant.

Idea: find a new set of parameters θ' = θ + Δ = (λ_1 + Δλ_1, ...; μ_1 + Δμ_1, ...) which will not decrease the objective function. Apply iteratively!

Problem: slow, and nobody uses this any more.

¹ Della Pietra et al. (1997)

Training Algorithms: Modern CRF Training with L-BFGS
Generally use the L-BFGS² algorithm. It approximates Newton's method: optimise a multivariate function f(θ) through the updates

θ_{t+1} = θ_t − [H f(θ_t)]⁻¹ ∇f(θ_t)

Quasi-Newton: approximates the Hessian H f(θ). Limited-memory: doesn't store the full (approximate) Hessian.

² Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm
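
In practice one hands the negative log-likelihood and its gradient to an off-the-shelf L-BFGS routine. A sketch using SciPy's optimiser on a stand-in objective (a tiny regularised logistic model, not an actual CRF):

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
reg = 1.0   # L2 regulariser keeps the optimum finite

def neg_log_likelihood(theta):
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return (-np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
            + 0.5 * reg * theta @ theta)

def gradient(theta):
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (p - y) + reg * theta

result = minimize(neg_log_likelihood, x0=np.zeros(2),
                  jac=gradient, method='L-BFGS-B')
print(result.x)   # fitted weights
```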

Performance: Label Bias
Generate data with a noisy HMM: five states (not counting the initial state), with transitions 1 → 2 → 3 and 4 → 5 → 3. Emissions are highly biased: P(X = Y's preferred value | Y) = 29/32 and P(X = other | Y) = 1/32 for each of the three other symbols. Preferred values: 1 → r, 4 → r, 2 → i, 5 → o, 3 → b.

Result: CRF error 4.6%, MEMM error 42%.
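
A sketch of that generating process (the uniform initial branch choice and the terminal handling of state 3 are my assumptions; the slide does not spell them out):

```python
import random

transitions = {0: [1, 4], 1: [2], 4: [5], 2: [3], 5: [3]}   # 0 = initial state
preferred = {1: 'r', 4: 'r', 2: 'i', 5: 'o', 3: 'b'}
alphabet = ['r', 'i', 'o', 'b']

def emit(state):
    # P(preferred symbol | state) = 29/32; each other symbol gets 1/32.
    if random.random() < 29 / 32:
        return preferred[state]
    return random.choice([s for s in alphabet if s != preferred[state]])

def sample_sequence():
    state, xs, ys = 0, [], []
    while state in transitions:              # state 3 is terminal
        state = random.choice(transitions[state])
        ys.append(state)
        xs.append(emit(state))
    return xs, ys

print(sample_sequence())   # e.g. (['r', 'i', 'b'], [1, 2, 3])
```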

Performance: Mixed-Order Sources
Generate data with a mixed-order HMM:
Transitions: p_α(y_i | y_{i−1}, y_{i−2}) = (1 − α) p_1(y_i | y_{i−1}) + α p_2(y_i | y_{i−1}, y_{i−2})
Emissions: p_α(x_i | y_i, x_{i−1}) = (1 − α) p_1(x_i | y_i) + α p_2(x_i | y_i, x_{i−1})
Five labels, 26 observation values. Training/testing: 1000 sequences of length 25. CRF trained with Algorithm S (modified IIS); MEMM trained with iterative scaling; Viterbi used to label the test set.
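
A sketch of the mixed-order source with randomly initialised component distributions (everything beyond the mixture form itself is an assumption of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
L, V, alpha = 5, 26, 0.5   # labels, observation values, mixing weight

def random_dist(*shape):
    d = rng.random(shape)
    return d / d.sum(axis=-1, keepdims=True)   # normalise last axis

p1_trans, p2_trans = random_dist(L, L), random_dist(L, L, L)
p1_emit, p2_emit = random_dist(L, V), random_dist(L, V, V)

def sample(length=25):
    y0 = int(rng.integers(L))
    ys, xs = [y0], [int(rng.choice(V, p=p1_emit[y0]))]
    for i in range(1, length):
        # mix first- and second-order transition distributions
        p2 = p2_trans[ys[-2], ys[-1]] if i >= 2 else p1_trans[ys[-1]]
        pt = (1 - alpha) * p1_trans[ys[-1]] + alpha * p2
        ys.append(int(rng.choice(L, p=pt)))
        # mix first- and second-order emission distributions
        pe = (1 - alpha) * p1_emit[ys[-1]] + alpha * p2_emit[ys[-1], xs[-1]]
        xs.append(int(rng.choice(V, p=pe)))
    return xs, ys

print(sample()[1][:10])
```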

Mixed-Order Sources: Results
[Scatter plot of CRF vs. MEMM error lost in transcription.] Squares mark runs with α < 0.5. CRF sort of wins?

Performance: Part-of-Speech Tagging
Penn Treebank: 45 syntactic tags; label each word in a sentence. Train a first-order HMM, MEMM, and CRF.

Spelling features exploit the conditional framework. Examples: starts with a number or upper case? contains a hyphen? has a particular suffix?
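
A sketch of such spelling features (the feature names are my own); note that they inspect the observation freely, which is fine because we condition on X rather than generate it:

```python
def spelling_features(word):
    return {
        'starts_with_number': word[:1].isdigit(),
        'starts_with_upper': word[:1].isupper(),
        'contains_hyphen': '-' in word,
        'suffix_-ing': word.endswith('ing'),
        'suffix_-ly': word.endswith('ly'),
    }

print(spelling_features('Self-labeling'))
# {'starts_with_number': False, 'starts_with_upper': True,
#  'contains_hyphen': True, 'suffix_-ing': True, 'suffix_-ly': False}
```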

Skip-Chain CRF
Example: the skip-chain CRF³. It has long-range features! Basic idea: extend a linear-chain CRF by joining some distant observations with skip edges, connecting multiple mentions of an entity across a whole document.

³ From "An Introduction to Conditional Random Fields for Relational Learning", Charles Sutton and Andrew McCallum, 2006.

Skip-Chain CRF: Example
[Factor graph figure lost in transcription.] Note: squares denote factors (e.g. potential functions).
Question: ignoring the skip edges (in blue), what potentials does Y_i appear in?
Answer: Ψ(Y_i, Y_{i−1}, X_i) and Ψ(Y_{i+1}, Y_i, X_{i+1}).

Skip-Chain Task
- Data: 485 announcements for seminars at CMU.
- Task: identify start time, end time, location, speaker.
- Model: a linear-chain CRF with skip edges between identical capitalised words (see the sketch below).
- Other word-specific features, e.g. appears in a list of first names, upper case, appears to be part of a time/date (by regex), etc.
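
A sketch of the skip-edge construction (function and variable names are assumptions of mine):

```python
def skip_edges(tokens):
    seen = {}
    edges = []
    for i, tok in enumerate(tokens):
        if tok[:1].isupper():                 # capitalised words only
            for j in seen.get(tok, []):
                edges.append((j, i))          # skip edge between mentions
            seen.setdefault(tok, []).append(i)
    return edges

tokens = 'Speaker : Prof. Smith . Smith will discuss CRFs .'.split()
print(skip_edges(tokens))   # [(3, 5)] joins the two mentions of "Smith"
```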

Skip-Chain Results
[Results table lost in transcription; values are F1 scores.]
Repeated occurrences of the speaker improve skip-chain performance. Tokens are consistently classified by the skip-chain model: the linear chain is inconsistent on 30.2 speakers, the skip chain on only 4.8.

Summary
CRFs combine discriminative (e.g. MEMM) and undirected (e.g. MRF) properties to solve problems:
- Global normalisation avoids label bias.
- Conditioning on observations avoids modelling complex dependencies.
- Enables the use of features built on global structure.
The examples in the paper are strangely insubstantial, but CRFs are widely and successfully used.