Introduction to CRFs. Isabelle Tellier

2 Plan 1. What is annotation for? 2. Linear and tree-shaped CRFs 3. State of the Art 4. Conclusion

3 1. What is annotation for? What is annotation?
- inputs can be texts, trees, or any structure built on items from a finite vocabulary
- annotating such a structure = associating with each of its items an output label belonging to another finite vocabulary
- the structure is given and preserved

4 1. What is annotation for? Examples of text annotations
- POS ("part of speech") labeling: item = word, annotation = morphosyntactic label (Det, N, etc.) of the word in the text
- named entities (NE), IE: item = word, annotation = type (D for Date, E for Event, P for Place...) + position in the NE (B for Begin, I for In, O for Out)
    In  2016  the  Olympic  Games  will  take  place  in  Rio  de  Janeiro
    O   DB    O    EB       EI     O     O     O      O   PB   PI  PI
- segmentation of a text into chunks, phrases, clauses...
- segmentation of a document into sections (e.g. distinguish Title, Menus, Adverts, etc. in a Web page)
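The type + position encoding of the named-entity example can be decoded back into entities with a few lines of code. The following is a minimal sketch, not taken from the slides: the function name `decode_entities` is hypothetical, and it assumes that a label is either `O` or a type letter followed by `B` or `I`, and that a stray `I` simply opens a new entity.

```python
def decode_entities(words, labels):
    """Group words into (type, text) entities from labels such as 'DB', 'PI', 'O'."""
    entities = []
    current_type, current_words = None, []
    for word, label in zip(words, labels):
        if label == "O":
            ent_type, position = None, "O"
        else:
            ent_type, position = label[:-1], label[-1]
        # The word continues the open entity only if it is an "I" of the same type.
        continues = position == "I" and bool(current_words) and ent_type == current_type
        if current_words and not continues:
            entities.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
        if continues:
            current_words.append(word)
        elif position in ("B", "I"):
            # "B" opens an entity; a stray "I" is treated leniently as an opening too.
            current_type, current_words = ent_type, [word]
    if current_words:
        entities.append((current_type, " ".join(current_words)))
    return entities


words = "In 2016 the Olympic Games will take place in Rio de Janeiro".split()
labels = "O DB O EB EI O O O O PB PI PI".split()
entities = decode_entities(words, labels)
```

On the sentence of the slide, `entities` contains one Date, one Event and one Place.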

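5 1. What is annotation for? Examples of text annotations
Text alignment for machine translation
[Figure: correspondence matrix between the French words "J'", "aime", "le", "chocolat" and the English words "I", "like", "chocolate"; an X marks each pair of aligned words]
- correspondence matrices are projected into pairs of annotations:
    J' 1   aime 2   le 3   chocolat 4
    I 1   like 2   chocolate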

6 1. What is annotation for? Examples of tree annotations
[Figure: syntactic tree of the sentence "Sligos va prendre pied au Royaume-Uni.", whose nodes (SENT, NP, VN, VP, PP) carry the syntactic functions SUJ, PRED, OBJ and MOD]
- syntactic functions, SRL (Semantic Role Labeling: agent, patient...) of a syntactic tree
- label = value of an attribute in an XML node

7 1. What is annotation for? Examples of tree annotations
[Figure: an HTML tree (HTML, BODY, DIV, TABLE, TR, TD, A, SPAN, #text nodes) next to its labeling (DelN, DelST, channel, item, title, link, description)]
- on the left: an HTML tree
- on the right: a labeling with editing operations
- DelN, DelST: Delete a Node / a SubTree
- channel, item, title, link, description: rename a node

8 1. What is annotation for? Examples of tree annotations
Execution of the editing operations
[Figure: the HTML tree and the tree obtained after executing the operations, with nodes channel, item, title, link, description]
- implemented application: generation of RSS feeds from HTML pages
- other possible application: extraction of portions of Web pages
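Executing a labeling made of editing operations can be sketched as a recursive tree traversal. The code below is an illustration under assumptions not stated in the slides: `DelN` is taken to delete the node and attach its surviving children to its parent, `DelST` to delete the whole subtree, and any other label to rename the node. The function name `apply_edits` and the small tree are hypothetical, not the tree of the slide.

```python
def apply_edits(node):
    """Return the list of nodes that replace `node` once its edit label is executed.

    A node is a tuple (tag, label, children); the result uses (tag, children).
    """
    tag, label, children = node
    if label == "DelST":
        return []  # the whole subtree disappears
    kept = [out for child in children for out in apply_edits(child)]
    if label == "DelN":
        return kept  # ASSUMPTION: the children are promoted to the parent
    return [(label, kept)]  # any other label renames the node


# Hypothetical labeled HTML tree
html = ("HTML", "DelN", [
    ("BODY", "channel", [
        ("DIV", "DelST", [("#text", "DelST", [])]),
        ("TABLE", "DelN", [
            ("TR", "item", [
                ("TD", "title", []),
                ("TD", "DelN", [("A", "link", [])]),
            ]),
        ]),
    ]),
])
result = apply_edits(html)
```

With this toy input, `result` is a single `channel` node containing one `item` with a `title` and a `link`.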

9 1. What is annotation for? Summary
- many tasks can be considered as annotation tasks
- for this, you need to specify:
    - the nature of the input items
    - the relationships between items: order relations of the input structure (sequence, tree...)
    - the nature of the annotations and their meaning
    - the relationships between annotations
    - the relationships between the items and their corresponding annotations
- pre-processing and post-processing are often necessary

11 2. Linear and Tree-shaped CRFs Basic notions
- classical notations: $x$ is the input, $y$ its annotation (of the same structure)
- $x$ and $y$ are decomposed into random variables: $x = \{X_1, X_2, \dots, X_n\}$ and $y = \{Y_1, Y_2, \dots, Y_n\}$
- a graphical model defines dependencies between the random variables in a graph
- in a generative model (HMM, PCFG), there are oriented dependencies from $Y_i$ to $X_j$: $Y_i \rightarrow X_j$
- by contrast, in a discriminative model (CRF), $p(y \mid x)$ can be computed directly without knowing $p(x)$
- learning: find the best possible parameters for $p(y \mid x)$ from annotated examples $(x, y)$ by maximizing the likelihood
- annotation: for a new $x$, compute $\hat{y} = \arg\max_y p(y \mid x)$

12 2. Linear and Tree-shaped CRFs Basic properties of CRFs
- CRFs define a non-oriented graph on the variables $Y_i$ (implicitly: every variable $X$ is connected)
- CRFs are Markovian discriminative models: $p(Y_i \mid X)$ only depends on $X$ and on the $Y_j$ ($i \neq j$) such that $Y_i$ and $Y_j$ are connected
- CRFs are defined by (Lafferty, McCallum and Pereira 01):
    $$p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{c \in C} \sum_k \lambda_k f_k(y_c, x, i) \Big)$$
- $C$ is the set of cliques of the graph
- $y_c$: values of $y$ on the clique $c$
- $Z(x)$: a normalization factor
- the $f_k$ are user-provided features
- the $\lambda_k$ are the parameters of the model (weights for the $f_k$)
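To make the definition concrete, $p(y \mid x)$ can be computed by brute force on a very small linear chain, where the cliques are the pairs $(y_{i-1}, y_i)$ and $Z(x)$ is obtained by enumerating every label sequence. This is only a sketch: the label set, the two features and the weights are hypothetical values chosen for illustration, and enumeration is exponential in the length of $x$.

```python
import itertools
import math

LABELS = ["Det", "N"]  # hypothetical label vocabulary


def f_the_is_det(y_prev, y_cur, x, i):
    return 1.0 if x[i] == "the" and y_cur == "Det" else 0.0


def f_det_then_n(y_prev, y_cur, x, i):
    return 1.0 if y_prev == "Det" and y_cur == "N" else 0.0


FEATURES = [f_the_is_det, f_det_then_n]
WEIGHTS = [2.0, 1.0]  # hypothetical lambda_k


def score(y, x):
    """Sum over cliques (positions) and over k of lambda_k * f_k."""
    total = 0.0
    for i in range(len(x)):
        y_prev = y[i - 1] if i > 0 else None
        for lam, f in zip(WEIGHTS, FEATURES):
            total += lam * f(y_prev, y[i], x, i)
    return total


def partition(x):
    """Z(x): sum of exp(score) over all label sequences of the same length as x."""
    return sum(math.exp(score(y, x))
               for y in itertools.product(LABELS, repeat=len(x)))


def probability(y, x):
    return math.exp(score(y, x)) / partition(x)


x = ["the", "cat"]
distribution = {y: probability(y, x)
                for y in itertools.product(LABELS, repeat=len(x))}
best = max(distribution, key=distribution.get)
```

Here `distribution` sums to 1 by construction, and `best` is the sequence that fires both features.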

13 2. Linear and Tree-shaped CRFs The usual graph for linear CRFs
    Y_1 - ... - Y_{i-1} - Y_i - Y_{i+1} - ... - Y_N
- the features can use any information in $x$ combined with any information in $y_c$
- examples of features $f_k(y_{i-1}, y_i, x, i)$ at position $i$:
    * $f_k(y_{i-1}, y_i, x, i) = 1$ if $x_{i-1} \in \{\text{the}, \text{a}\}$ and $y_{i-1} = \text{Det}$ and $y_i = \text{N}$; $= 0$ otherwise
    * $f_k(y_{i-1}, y_i, x, i) = 1$ if $\{\text{Mr}, \text{Mrs}, \text{Miss}\} \cap \{x_{i-3}, \dots, x_{i-1}\} \neq \emptyset$ and $y_i = \text{NE}$; $= 0$ otherwise
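The two example features translate almost literally into code. In this sketch the function names are hypothetical, positions are 0-based, and positions before the start of the sentence are simply ignored, which is an assumption the slides do not spell out.

```python
DETERMINERS = {"the", "a"}
TITLES = {"Mr", "Mrs", "Miss"}


def f_det_noun(y_prev, y_cur, x, i):
    """1 if x[i-1] is 'the' or 'a', y[i-1] = Det and y[i] = N; 0 otherwise."""
    if i >= 1 and x[i - 1] in DETERMINERS and y_prev == "Det" and y_cur == "N":
        return 1
    return 0


def f_title_before_ne(y_prev, y_cur, x, i):
    """1 if one of the three previous words is a title and y[i] = NE; 0 otherwise."""
    window = set(x[max(0, i - 3):i])  # words x[i-3] .. x[i-1], clipped at the start
    return 1 if bool(window & TITLES) and y_cur == "NE" else 0


x = ["Mr", "John", "Smith", "saw", "the", "dog"]
```

For instance, `f_title_before_ne(None, "NE", x, 1)` fires on "John" because "Mr" is in its left window, and `f_det_noun("Det", "N", x, 5)` fires on "dog".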

14 2. Linear and Tree-shaped CRFs Generate Features from the Labeled examples
    x       y
    La      Det
    bonne   Adj
    soupe   N
    fume    V
    .       ponct
    ...
Definition of features in software
- define a pattern (any shape on $x$, at most clique-width on $y$)
- corresponding instance: $f_1(y_{i-1}, y_i, x, i) = 1$ if ($x_i$ = La) AND ($y_i$ = Det); $= 0$ otherwise

15 2. Linear and Tree-shaped CRFs Generate Features from the Labeled examples
Associated feature
- $f_2(y_{i-1}, y_i, x, i) = 1$ if ($x_i$ = bonne) AND ($y_i$ = Adj); $= 0$ otherwise

16 2. Linear and Tree-shaped CRFs Generate Features from the Labeled examples
Associated feature
- $f_4(y_{i-1}, y_i, x, i) = 1$ if ($x_{i-1}$ = La) AND ($y_{i-1}$ = Det) AND ($x_i$ = bonne) AND ($y_i$ = Adj); $= 0$ otherwise
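The way software instantiates a pattern on every position of a labeled example can be sketched as follows. This is an illustration only, not the template syntax of any particular tool: the two patterns (current word with current label, and previous pair with current pair) mirror $f_1$, $f_2$ and $f_4$, and all function names are hypothetical.

```python
x = ["La", "bonne", "soupe", "fume", "."]
y = ["Det", "Adj", "N", "V", "ponct"]


def unigram_instances(x, y):
    """Pattern (x_i, y_i): one instance per observed (word, label) pair."""
    return [("U", x[i], y[i]) for i in range(len(x))]


def bigram_instances(x, y):
    """Pattern (x_{i-1}, y_{i-1}, x_i, y_i): one instance per adjacent pair."""
    return [("B", x[i - 1], y[i - 1], x[i], y[i]) for i in range(1, len(x))]


def make_feature(instance):
    """Turn an instance into an indicator function f(y_prev, y_cur, x, i)."""
    if instance[0] == "U":
        _, word, label = instance
        return lambda y_prev, y_cur, x, i: int(x[i] == word and y_cur == label)
    _, w_prev, l_prev, word, label = instance
    return lambda y_prev, y_cur, x, i: int(
        i >= 1 and x[i - 1] == w_prev and y_prev == l_prev
        and x[i] == word and y_cur == label)


instances = unigram_instances(x, y) + bigram_instances(x, y)
features = [make_feature(inst) for inst in instances]
```

On this example, `instances[0]` corresponds to $f_1$ (La, Det) and `instances[5]` to $f_4$ (La, Det, bonne, Adj); a real tool would additionally merge identical instances found in different sentences.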

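17 2. Linear and Tree-shaped CRFs Transform an HMM into a linear CRF
[Figure: HMM with states Det, Adj, N and V intr; the transition probabilities shown are 1/3, 1/3, 2/3, 2/3 and 1]
Emission probabilities:
    Det:     la 2/3, une 1/3
    Adj:     bonne 1/2, grande 1/2
    N:       bonne 1/3, soupe 2/3
    V intr:  fume 4/5, soupe 1/5
- $f_1(y_i, x, i) = 1$ if $y_i = \text{Det}$ and $x_i = \text{la}$ ($= 0$ otherwise), $\lambda_1 = \log(2/3)$
- $f_2(y_{i-1}, y_i, x, i) = 1$ if $y_{i-1} = \text{Det}$ and $y_i = \text{Adj}$ ($= 0$ otherwise), $\lambda_2 = \log(1/3)$
- (for a transition absent from the HMM, $\lambda = -\infty$)
- the computation of $p(y \mid x)$ is the same in both cases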
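The equivalence can be checked numerically on a small case: taking the logarithm of each HMM probability as a CRF weight yields the same $p(y \mid x)$. In the sketch below the emission probabilities are those listed on the slide, while the start distribution and the full transition table are assumptions made for illustration (only Det to Adj = 1/3 is confirmed by $\lambda_2$); the state "V intr" is abbreviated to `V`.

```python
import itertools
import math

STATES = ["Det", "Adj", "N", "V"]
START = {"Det": 1.0}                       # ASSUMED start distribution
TRANS = {                                  # ASSUMED transition table
    "Det": {"Adj": 1 / 3, "N": 2 / 3},
    "Adj": {"Adj": 1 / 3, "N": 2 / 3},
    "N": {"V": 1.0},
    "V": {},
}
EMIT = {                                   # emission probabilities from the slide
    "Det": {"la": 2 / 3, "une": 1 / 3},
    "Adj": {"bonne": 1 / 2, "grande": 1 / 2},
    "N": {"bonne": 1 / 3, "soupe": 2 / 3},
    "V": {"fume": 4 / 5, "soupe": 1 / 5},
}


def safe_log(p):
    """log(p), with log(0) = -infinity for absent transitions or emissions."""
    return math.log(p) if p > 0 else float("-inf")


def hmm_joint(y, x):
    p = START.get(y[0], 0.0) * EMIT[y[0]].get(x[0], 0.0)
    for i in range(1, len(x)):
        p *= TRANS[y[i - 1]].get(y[i], 0.0) * EMIT[y[i]].get(x[i], 0.0)
    return p


def crf_score(y, x):
    """Sum of the weights lambda = log(probability) of the features that fire."""
    s = safe_log(START.get(y[0], 0.0)) + safe_log(EMIT[y[0]].get(x[0], 0.0))
    for i in range(1, len(x)):
        s += safe_log(TRANS[y[i - 1]].get(y[i], 0.0))
        s += safe_log(EMIT[y[i]].get(x[i], 0.0))
    return s


def posteriors(x):
    """Return (HMM posterior, CRF posterior) as dicts over all label sequences."""
    seqs = list(itertools.product(STATES, repeat=len(x)))
    joint = {y: hmm_joint(y, x) for y in seqs}
    expo = {y: math.exp(crf_score(y, x)) for y in seqs}
    total_joint, z = sum(joint.values()), sum(expo.values())
    return ({y: joint[y] / total_joint for y in seqs},
            {y: expo[y] / z for y in seqs})


hmm_post, crf_post = posteriors(["la", "bonne", "soupe"])
```

Under these assumed transitions, both models give the same distribution, with most of the mass on Det Adj N and the remainder on Det N V.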

18 2. Linear and Tree-shaped CRFs Possible graphs for trees
[Figure: possible dependency graphs over the labels of a tree, illustrated on the nodes SUJ, PRED, OBJ and PRED, OBJ, MOD]

19 2. Linear and Tree-shaped CRFs Implementations
- learning step by maximizing the log-likelihood
    $$\log \prod_{(x,y) \in S} p(y \mid x) = \sum_{(x,y) \in S} \log p(y \mid x)$$
  (+ penalty...), by gradient-based optimization (L-BFGS)
- annotation by Viterbi (linear), inside-outside (trees), message passing (general)...
- computation in $K \cdot N \cdot |Y|^c$ ($c$: size of the largest clique)
- implementations available: Mallet, GRMM, CRFSuite, CRF++, Wapiti, XCRF (for 3-width clique trees), Factorie
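For the linear case, annotation by Viterbi can be sketched in a few lines. This is a minimal illustration, not the code of any of the listed libraries: `score(y_prev, y_cur, x, i)` stands for the weighted sum of features at position `i`, the toy scoring function and its weights are hypothetical, and `x` is assumed to be non-empty. A brute-force search is included only to check the result on small inputs.

```python
import itertools


def viterbi(x, labels, score):
    """Best label sequence for x under a sum of local scores (y_prev is None at i = 0)."""
    best = {y: score(None, y, x, 0) for y in labels}
    back = []  # back[i-1][y] = best predecessor of label y at position i
    for i in range(1, len(x)):
        new_best, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[p] + score(p, y, x, i))
            new_best[y] = best[prev] + score(prev, y, x, i)
            pointers[y] = prev
        best = new_best
        back.append(pointers)
    path = [max(labels, key=lambda y: best[y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))


def brute_force(x, labels, score):
    """Exhaustive search over all label sequences, for checking only."""
    def total(y):
        return sum(score(y[i - 1] if i else None, y[i], x, i) for i in range(len(x)))
    return list(max(itertools.product(labels, repeat=len(x)), key=total))


def toy_score(y_prev, y_cur, x, i):
    """Hypothetical weighted features."""
    s = 0.0
    if x[i] in {"the", "a"} and y_cur == "Det":
        s += 2.0
    if y_prev == "Det" and y_cur == "N":
        s += 1.0
    if x[i] == "sleeps" and y_cur == "V":
        s += 2.0
    return s


LABELS = ["Det", "N", "V"]
x = ["the", "cat", "sleeps"]
path = viterbi(x, LABELS, toy_score)
```

The dynamic program examines each pair of adjacent labels once per position, instead of enumerating every complete label sequence as the brute-force check does.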

21 3. State of the Art Use of CRFs for labeling tasks
- NE recognition (McCallum & Li, 2003)
- IE from tables (Pinto et al., 2003)
- POS labeling (Altun et al., 2003)
- shallow parsing (Sha & Pereira, 2003)
- SRL for trees (Cohn & Blunsom 2005)
- tree transformation (Gilleron et al. 2006)
- non-linguistic uses: image labeling/segmenting, RNA alignment...

22 3. State of the Art
Extensions about the graph
- add dependencies in the graph: skip-chain CRFs, dynamic (multi-level) CRFs...
- use CRFs for syntactic parsing (Finkel et al. 2008)
- build the tree structure of a CRF (Bradley & Guestrin 2010)
- CRFs for general graphs (grid-shaped for images)
How to build the features
- nearly always binary
- feature induction (McCallum 2003)
- features allow external knowledge to be integrated... (see below)
- more general features may be more effective (Pu et al. 2010)

23 3. State of the Art About the learning step
- unsupervised or semi-supervised CRFs (difficult, not very effective)
- add an L1 penalty to the likelihood to select the best features (Lavergne & Yvon 2010)
- add constraints at different possible levels (features, likelihood, labels...): LREC 2012 tutorial (Druck et al. 2012)
- MCMC inference methods
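As a reminder of what an L1 penalty looks like, the penalized learning criterion can be written as below. This is the generic form of L1 regularization, not a formula quoted from the cited work; $S$ is the set of annotated examples and $\rho > 0$ is a regularization strength whose symbol is introduced here for illustration.

```latex
\hat{\lambda} = \arg\max_{\lambda} \; \sum_{(x,y) \in S} \log p_{\lambda}(y \mid x) \; - \; \rho \sum_k |\lambda_k|
```

Because the penalty grows with each $|\lambda_k|$, many weights are driven to exactly zero, which is why it acts as a feature selector.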

24 3. State of the Art Linguistic interest
- sequential vs. direct complex labeling?
- how to integrate linguistic knowledge?
    - as external constraints
    - as additional labeled input data
    - as features

26 Conclusion
Interests
- very effective for many tasks
- allow the integration of many distinct sources of information
- many easy-to-use libraries are available
Weaknesses
- do not support unsupervised/semi-supervised learning well
- not very incremental
- learning complexity is still high with large cliques or a large label vocabulary
