Machine Translation Zoo


1 Machine Translation Zoo: Tree-to-tree transfer and Discriminative learning
Martin Popel, ÚFAL (Institute of Formal and Applied Linguistics), Charles University in Prague
May 5th 2013, Seminar of Formal Linguistics, Prague

2 Today's Menu
1. MT Intro: Taxonomy, Hybrids
2. Online Learning: Perceptron, Structured Prediction
3. Guided Learning
4. Back to MT: Easy-First Decoding in MT, Guided Learning in MT

4 Phrase-based MT (Moses)
Training: word alignment (GIZA++ & symmetrization), phrase extraction, parameter tuning (MERT).
Decoding: get all matching rules, find one derivation with a maximum score (beam search).

5 TectoMT
Training: analyze CzEng to the t-layer, align t-nodes, learn one MaxEnt model for each source lemma and formeme.
Decoding: get all translation variants for each lemma and formeme, find a labeling with a maximum score (HMTM).

6 TectoMT MaxEnt Model
[Figure: analysis-transfer-synthesis of "He agreed with the unions to cut all overtime." into "Dohodl se s odbory na zrušení všech přesčasů."; the source t-tree nodes (he/n:subj, agree/v:fin, union/n:with+X, cut/v:inf, all/adj:attr, overtime/n:obj) carry context features such as tense=past, voice=active, negation=0, sempos=v, has_left_child=0, has_right_child=1, tag=VB, position=right, named_entity=0, and the MaxEnt model scores the translation variants of "cut" (chop, saw, trim, shorten, lumber, hew, lower, delete, crop, abolish, cancel, ...).]

7 Machine Translation Taxonomy
- Level of transfer?
- Base translation unit (BTU)?
- Extract more segmentations in training?
- Try (search) more segmentations in decoding?
- Use more segmentations in the output translation?
- What is the context X in P(BTU_target | BTU_source, X)?

8 Machine Translation Taxonomy
Answers per system, considering just the Translation Model (segmentations: training / decoding / output):
- word-based (Brown et al., 1993): surface transfer; BTU = word; no / no / no; context X: nothing
- phrase-based (Koehn et al., 2003): surface transfer; BTU = phrase; yes / yes / no; context X: nothing
- hierarchical (Chiang, 2005): surface transfer; BTU = phrase with gaps; yes / yes / no; context X: nothing
- dep. treelet to string (Quirk and Menezes, 2006): shallow-syntax transfer; BTU = treelet; no / no / no; context X: neighboring treelets
- TectoMT (Mareček et al., 2010): tectogrammatical transfer; BTU = node; no / no / no; context X: neighboring nodes
- Monte Carlo (Arun, 2011): surface transfer; BTU = phrase; yes / yes / yes; context X: nothing

14 Hybrids: TectoMoses
Linearize source t-trees (two factors: lemma and formeme), translate with Moses, project dependencies, and use TectoMT synthesis.
[Figure: the TectoMT analysis-transfer-synthesis pyramid with rule-based and statistical blocks: analysis from the w-layer (segmentation, tokenization) through the m-layer (Morce tagger, lemmatization) and a-layer (McDonald's MST parser, analytical functions) up to the t-layer (build t-tree, fill formemes and grammatemes); transfer via dictionary query and HMTM; synthesis by adding functional words, imposing agreement, filling morphological categories, generating wordforms, and concatenation, from English to Czech.]

17 Hybrids: PhraseFix
Done for WMT 2013 by Petra Galuščáková: post-edit TectoMT output using Moses trained on cs-tectomt → cs-reference pairs (whole CzEng).
How to post-edit only when confident?
- filter the phrase table
- add a confidence feature for MERT
- improve the (monolingual) alignment
- boost the phrase table (e.g. with identities)
Future work:
- use also the source (English) sentences: multi-source translation
- project only content words (using TectoMT)
- factored translation with non-synchronous (overlapping) factors

19 Even More Hybrids: DepFix, AddToTrain, Chimera
DepFix (Rosa et al., 2012): post-edit SMT output using syntactic analysis and rules; exploits also the source sentences; robust parsing.
AddToTrain (Bojar, Galuščáková): translate monolingual news (or WMT devsets) with TectoMT and add this to the Moses parallel training data.
Chimera: post-edit AddToTrain output with DepFix; sent to WMT 2013 in an attempt to beat Google.

21 Today's Menu: part 2, Online Learning (Perceptron, Structured Prediction)

22 General Algorithm for Online Learning
w := 0                                  (initialize all weights to zero)
while (x, y_gold) := get_new_data()     (for each instance/observation: 1. get its features x)
    y_pred := prediction(w, x)          (2. do the prediction y_pred; 3. get the correct label y_gold)
    w += update(x, y_gold, y_pred)      (4. update the weights)
Output: w

Definition: conservative online learning. No error means no update, i.e., if y_pred = y_gold then update(x, y_gold, y_pred) = 0.
Definition: aggressive online learning. After the update, the instance would be classified correctly, i.e., prediction(w + update(x, y_gold, y_pred), x) = y_gold.
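As a concrete rendering of the loop above, here is a minimal Python sketch; `data`, `prediction`, and `update` are placeholders that a concrete learner (such as the perceptron below) must supply, and `update` is assumed to return a weight-increment vector. The names are illustrative, not taken from the slides.

```python
import numpy as np

def online_learn(data, prediction, update, dim):
    """Generic online-learning loop; the update rule decides whether
    the learner is conservative or aggressive."""
    w = np.zeros(dim)                    # initialize all weights to zero
    for x, y_gold in data:               # for each instance: get its features x
        y_pred = prediction(w, x)        # do the prediction y_pred
        w += update(x, y_gold, y_pred)   # see the correct label, update the weights
    return w                             # Output: w
```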

32 Perceptron
The same loop, instantiated as:
Binary Perceptron
- prediction(w, x) := [w · x > 0]
- update(x, y_gold, y_pred) := α (y_gold - y_pred) x
Multi-class Perceptron
- prediction(w, x) := argmax_y w · f(x, y)
- update(x, y_gold, y_pred) := α (f(x, y_gold) - f(x, y_pred))
Notation: w · x = Σ_i w_i x_i is the dot product (similarity score) of weights and features; the Iverson bracket [P] is 1 if P is true and 0 otherwise; α > 0 is the learning rate (step size).
Special case: multi-prototype features f(x, y) := ([y = class_1] x, [y = class_2] x, ..., [y = class_C] x), so the update becomes w := w + α f(x, y_gold) - α f(x, y_pred).
General case: any label-dependent features, e.g. f_101(x, y) := [(y = NNP or y = NNPS) and x capitalized].
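A minimal Python sketch of the two variants, assuming binary labels are coded as 0/1 (so the update α(y_gold - y_pred)x is zero on correct predictions, i.e. conservative) and `feats(x, y)` plays the role of f(x, y); all names are illustrative.

```python
import numpy as np

ALPHA = 1.0  # learning rate (step size); any fixed value > 0

def predict_binary(w, x):
    return 1 if w @ x > 0 else 0          # Iverson bracket [w . x > 0]

def update_binary(x, y_gold, y_pred):
    return ALPHA * (y_gold - y_pred) * x  # zero when the prediction was correct

def predict_multiclass(w, feats, x, labels):
    return max(labels, key=lambda y: w @ feats(x, y))  # argmax_y w . f(x, y)

def update_multiclass(feats, x, y_gold, y_pred):
    return ALPHA * (feats(x, y_gold) - feats(x, y_pred))
```

Plugged into online_learn above, update_binary leaves w untouched whenever y_pred = y_gold, which is exactly the conservative behavior defined earlier.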

39 Structured Prediction
- the number of possible labels is huge
- labels y have a structure (graph, tree, sequence, ...)
- usually the problem can be decomposed (factorized) into subproblems
Local features f_i(x, y, j) can use the whole x, but only those y_k where k is near j:
- f_101(x, y, j) := [(y_j = NNP or y_j = NNPS) and word x_j capitalized]
- f_102(x, y, j) := [y_j = NNP and y_{j-1} = NNP and |x| <= 6]
Global features F_i(x, y) := Σ_j f_i(x, y, j):
- F_101 = the number of capitalized words with tag NNP or NNPS
- F_102 = the number of NNP followed by NNP, or 0 if the sentence is longer than six words
We can also define features that cannot be decomposed.

40 Structured Prediction using Online Learning
1. Local approach: update after each local decision; the output of previous decisions is used in local features; e.g. Structured Perceptron (Collins, 2002):
   y_pred = argmax_y Σ_i w_i f_i(x, y_j, y_{j-1}, ...)
2. Global approach: generate an n-best list (lattice) of outputs y for the whole x, compute global features, and do one update per x (sentence), i.e. re-rank the n-best list; e.g. MIRA (Crammer and Singer, 2003):
   y_pred = argmax_y Σ_i w_i F_i(x, y)
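As a sketch of the global approach (with a plain structured-perceptron update standing in for MIRA for simplicity; the function names are hypothetical), global features are sums of local ones and the update re-ranks a given n-best list:

```python
import numpy as np

def global_features(local_feats, x, y):
    # F_i(x, y) = sum_j f_i(x, y, j): sum the local feature vectors over positions j
    return sum(local_feats(x, y, j) for j in range(len(y)))

def rerank_update(w, local_feats, x, nbest, y_gold, alpha=1.0):
    # re-rank the n-best list with the current weights ...
    y_pred = max(nbest, key=lambda y: w @ global_features(local_feats, x, y))
    # ... and, on error, move the weights toward the gold structure
    if y_pred != y_gold:
        w = w + alpha * (global_features(local_feats, x, y_gold)
                         - global_features(local_feats, x, y_pred))
    return w
```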

41 Margin-based Online Learning
Definitions:
- score(y) = w · f(x, y)
- margin(y) = score(y_gold) - score(y); margin > 0 means no error, and its size measures confidence
- hinge_loss(y) = max(0, 1 - margin(y))
Online prediction and update:
- y_pred := argmax_y w · f(x, y)
- w += α (f(x, y_gold) - f(x, y_pred))
Step sizes:
- Perceptron: α_Perc := 1 (or any fixed value > 0)
- Passive-Aggressive (PA): α_PA := hinge_loss(y_pred) / ||f(x, y_gold) - f(x, y_pred)||²
- Passive-Aggressive I: α_PA-I := min{C, α_PA}
- Passive-Aggressive II: α_PA-II := hinge_loss(y_pred) / (||f(x, y_gold) - f(x, y_pred)||² + 1/(2C))
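In Python, the step sizes might be computed as below; the PA-II denominator (adding 1/(2C)) follows the standard Passive-Aggressive formulation in the literature, since that formula was garbled in the transcription.

```python
import numpy as np

def hinge_loss(w, f_gold, f_pred):
    return max(0.0, 1.0 - (w @ f_gold - w @ f_pred))  # max(0, 1 - margin)

def step_size(w, f_gold, f_pred, variant="PA", C=1.0):
    """alpha for the update w += alpha * (f_gold - f_pred)."""
    if variant == "Perc":
        return 1.0                                    # perceptron: any fixed value > 0
    loss = hinge_loss(w, f_gold, f_pred)
    sq_norm = float(np.sum((f_gold - f_pred) ** 2))   # ||f(x,y_gold) - f(x,y_pred)||^2
    if variant == "PA-II":
        return loss / (sq_norm + 1.0 / (2.0 * C))
    if sq_norm == 0.0:
        return 0.0                                    # identical features: no direction to move
    return loss / sq_norm if variant == "PA" else min(C, loss / sq_norm)  # PA / PA-I
```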

46 Cost-sensitive Online Learning
Definitions:
- cost(y) = an external error metric (non-negative), e.g. 1 - similarity of y and y_gold
- hinge_loss(y) = max(0, cost(y) - margin(y))
Hope and Fear update: w += α (f(x, y_hope) - f(x, y_fear)), where
- min-cost hope:        y_hope := argmin_y cost(y)
- max-score fear:       y_fear := argmax_y score(y)
- cost-diminished hope: y_hope := argmax_y (score(y) - cost(y))
- cost-augmented fear:  y_fear := argmax_y (score(y) + cost(y))
- max-cost fear:        y_fear := argmax_y cost(y)
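Selecting the update pair from an n-best list is then a couple of argmax/argmin calls; a sketch with `score` and `cost` as defined above (the helper and its mode names are hypothetical):

```python
def select_hope_fear(nbest, score, cost, mode="augmented"):
    """Pick (y_hope, y_fear) for the update w += alpha * (f(x, y_hope) - f(x, y_fear))."""
    if mode == "extreme":
        return min(nbest, key=cost), max(nbest, key=cost)    # min-cost hope, max-cost fear
    if mode == "model":
        return min(nbest, key=cost), max(nbest, key=score)   # min-cost hope, max-score fear (y_pred)
    return (max(nbest, key=lambda y: score(y) - cost(y)),    # cost-diminished hope
            max(nbest, key=lambda y: score(y) + cost(y)))    # cost-augmented fear
```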

49 Cost-sensitive Online Learning: Hypothesis Selection
[Figure: the hypotheses of an n-best list plotted with score on one axis and -cost on the other; the highlighted points are the min-cost (max-BLEU) hope, the cost-diminished hope, the max-score fear (y_pred), the cost-augmented fear, and the max-cost fear.]

56 Application to MT
- x = the source sentence; y_gold = its reference translation
- multiple references are sometimes available
- the reference may be unreachable
- we score derivations (which include latent variables); one translation may have multiple derivations

57 Today's Menu: part 3, Guided Learning

58 Easy-First Decoding (PoS Tagging) (Shen, Satta and Joshi, 2007)
[Figure: tagging "Agatha found that book interesting"; each word has scored tag candidates (NNP for Agatha, VBD/VBN for found, DT/IN for that, NN/VB for book, JJ/RB for interesting), the most confident decision is fixed first, and context features such as f_123 := [y_j = DT, y_{j+1} = NN] and f_124 := [y_j = RB, y_{j+1} = NN] then rescore the neighbors of every fixed tag, until all words are tagged.]

67 Guided Learning (PoS Tagging) (Shen, Satta and Joshi, 2007)
[Figure: the same example during training: the highest-scoring remaining candidate is wrong (y_fear = VBN for "found") while the correct tag (y_hope = VBD) scores lower, so the weights are updated toward y_hope and away from y_fear.]
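A Python sketch of the easy-first strategy; `score(j, tag, tags)` is assumed to look at the already-fixed neighboring tags, so every fixed decision can change the scores of the remaining ones (all names illustrative):

```python
def easy_first_tag(words, candidates, score):
    """candidates[j]: the possible tags of words[j];
    score(j, tag, tags): model score of tagging position j with tag,
    given the partially filled sequence tags."""
    tags = [None] * len(words)
    remaining = set(range(len(words)))
    while remaining:
        # fix the easiest (highest-scoring) remaining decision, anywhere in the sentence
        j, tag = max(((j, t) for j in remaining for t in candidates[j]),
                     key=lambda jt: score(jt[0], jt[1], tags))
        tags[j] = tag
        remaining.remove(j)
    return tags
```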

68 Easy-First Decoding (Dependency Parsing) (Goldberg and Elhadad, 2010)
[Figure: easy-first parsing of "a brown fox jumped with joy"; at every step the highest-scoring action is performed (ATTACHRIGHT(1) attaches "a" and then "brown" under "fox", ATTACHLEFT actions attach "joy" under "with" and "with" under "jumped"), until the complete tree rooted in "jumped" is built.]

72 Today's Menu: part 4, Back to MT (Easy-First Decoding in MT, Guided Learning in MT)

73 Easy-First Decoding (Phrase-Based MT)
[Figure: translating "Agatha found that book interesting" into Czech ("Agátě přišla ta kniha zajímavá"); each source segment has scored translation candidates (e.g. found → našel / zjistil, že / přišla; that → ta / že; book → kniha / rezervovat; interesting → zajímavý / zajímavá), the easiest segment is translated first, and a language model rescores the neighbors of each fixed translation (e.g. preferring "zajímavá" next to "kniha"), until the whole sentence is translated.]

82 Features for Guided Learning in MT
Source Segment Features
- segment size (number of words)
- entropy of P(target | source): -Σ_i P(trg_i | src) log P(trg_i | src)
- log count(source)
- source language model: log P(source)
- word identity, e.g. f_42 := [src = "found that"]
- PoS identity, e.g. f_43 := [src_pos = "VBD IN"]
Target-dependent Features
- log P(trg | src)
- target language model: log P(target | previous segment)
- log count(target)?
- identity, e.g. f_142 := [src = "found that" & trg = "zjistil"]
Combinations and Quantizations
- [size(src) = 3] * log P(trg | src)
- [size(src) = 3 & -3 < log P(trg | src) < -2]
- etc.
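The combination and quantization features are just indicator products over the basic features; a small illustrative sketch with the thresholds from the slide's example (the bucket boundaries -3 and -2 are reconstructed, since the minus signs were lost in transcription):

```python
import math

def combination_features(src_size, p_trg_given_src):
    logp = math.log(p_trg_given_src)
    return {
        # [size(src) = 3] * log P(trg|src): a real value gated by an indicator
        "size3_logP": logp if src_size == 3 else 0.0,
        # [size(src) = 3 & -3 < log P(trg|src) < -2]: a fully binarized bucket
        "size3_bucket": 1.0 if (src_size == 3 and -3 < logp < -2) else 0.0,
    }
```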

84 Application to Tecto Trees
[Figure: the source t-tree of "Agatha found this book interesting": find/v:fin with children Agatha/n:subj, book/n:obj and interesting/adj:compl, and this/adj:attr under book; the same easy-first, guided-learning approach is applied to translating its nodes.]

86 What have you seen in the Zoo?

87 Predictions?

88 Predictions? Hope and Fear
