Machine Translation Zoo


1 Machine Translation Zoo: Tree-to-tree transfer and Discriminative learning
Martin Popel, ÚFAL (Institute of Formal and Applied Linguistics), Charles University in Prague
May 5th 2013, Seminar of Formal Linguistics, Prague

2 Today's Menu
1. MT Intro: Taxonomy, Hybrids
2. Online Learning: Perceptron, Structured Prediction
3. Guided Learning
4. Back to MT: Easy-First Decoding in MT, Guided Learning in MT

4 Phrase-based MT (Moses)
Training: word alignment (GIZA++ & symmetrization), phrase extraction, parameter tuning (MERT).
Decoding: get all matching rules, find one derivation with a maximum score (beam search).

5 TectoMT
Training: analyze CzEng to the t-layer, align t-nodes, learn one MaxEnt model for each source lemma and formeme.
Decoding: get all translation variants for each lemma and formeme, find a labeling with a maximum score (HMTM).

6 TectoMT MaxEnt Model
[Figure: analysis-transfer-synthesis of "He agreed with the unions to cut all overtime." into "Dohodl se s odbory na zrušení všech přesčasů."; the source t-tree nodes (he/n:subj, agree/v:fin, union/n:with+X, cut/v:inf, all/adj:attr, overtime/n:obj) carry context features such as tense=past, voice=active, negation=0, sempos=v, has_left_child=0, has_right_child=1, tag=VB, position=right, named_entity=0, and the MaxEnt model scores the translation variants of "cut" (chop, saw, trim, shorten, lumber, hew, lower, delete, crop, abolish, cancel, ...).]

7 Machine Translation Taxonomy
- Level of transfer?
- Base translation unit (BTU)?
- Extract more segmentations in training?
- Try (search) more segmentations in decoding?
- Use more segmentations in the output translation?
- What is the context X in P(BTU_target | BTU_source, X)?

8 Machine Translation Taxonomy
Answers per system, considering just the Translation Model (segmentations: training / decoding / output):
- word-based (Brown et al., 1993): surface transfer; BTU = word; no / no / no; context X: nothing
- phrase-based (Koehn et al., 2003): surface transfer; BTU = phrase; yes / yes / no; context X: nothing
- hierarchical (Chiang, 2005): surface transfer; BTU = phrase with gaps; yes / yes / no; context X: nothing
- dep. treelet to string (Quirk and Menezes, 2006): shallow-syntax transfer; BTU = treelet; no / no / no; context X: neighboring treelets
- TectoMT (Mareček et al., 2010): tectogrammatical transfer; BTU = node; no / no / no; context X: neighboring nodes
- Monte Carlo (Arun, 2011): surface transfer; BTU = phrase; yes / yes / yes; context X: nothing

14 Hybrids: TectoMoses
Linearize source t-trees (two factors: lemma and formeme), translate with Moses, project dependencies, and use TectoMT synthesis.
[Figure: the TectoMT analysis-transfer-synthesis pyramid with rule-based and statistical blocks: analysis from the w-layer (segmentation, tokenization) through the m-layer (Morce tagger, lemmatization) and a-layer (McDonald's MST parser, analytical functions) up to the t-layer (build t-tree, fill formemes and grammatemes); transfer via dictionary query and HMTM; synthesis by adding functional words, imposing agreement, filling morphological categories, generating wordforms, and concatenation, from English to Czech.]

17 Hybrids: PhraseFix
Done for WMT 2013 by Petra Galuščáková: post-edit TectoMT output using Moses trained on cs-tectomt → cs-reference pairs (whole CzEng).
How to post-edit only when confident?
- filter the phrase table
- add a confidence feature for MERT
- improve the (monolingual) alignment
- boost the phrase table (e.g. with identities)
Future work:
- use also the source (English) sentences: multi-source translation
- project only content words (using TectoMT)
- factored translation with non-synchronous (overlapping) factors

19 Even More Hybrids: DepFix, AddToTrain, Chimera
DepFix (Rosa et al., 2012): post-edit SMT output using syntactic analysis and rules; exploits also the source sentences; robust parsing.
AddToTrain (Bojar, Galuščáková): translate monolingual news (or WMT devsets) with TectoMT and add this to the Moses parallel training data.
Chimera: post-edit AddToTrain output with DepFix; sent to WMT 2013 in an attempt to beat Google.

21 Today's Menu: part 2, Online Learning (Perceptron, Structured Prediction)

22 General Algorithm for Online Learning
w := 0                                  (initialize all weights to zero)
while (x, y_gold) := get_new_data()     (for each instance/observation: 1. get its features x)
    y_pred := prediction(w, x)          (2. do the prediction y_pred; 3. get the correct label y_gold)
    w += update(x, y_gold, y_pred)      (4. update the weights)
Output: w

Definition: conservative online learning. No error means no update, i.e., if y_pred = y_gold then update(x, y_gold, y_pred) = 0.
Definition: aggressive online learning. After the update, the instance would be classified correctly, i.e., prediction(w + update(x, y_gold, y_pred), x) = y_gold.
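As a concrete rendering of the loop above, here is a minimal Python sketch; `data`, `prediction`, and `update` are placeholders that a concrete learner (such as the perceptron below) must supply, and `update` is assumed to return a weight-increment vector. The names are illustrative, not taken from the slides.

```python
import numpy as np

def online_learn(data, prediction, update, dim):
    """Generic online-learning loop; the update rule decides whether
    the learner is conservative or aggressive."""
    w = np.zeros(dim)                    # initialize all weights to zero
    for x, y_gold in data:               # for each instance: get its features x
        y_pred = prediction(w, x)        # do the prediction y_pred
        w += update(x, y_gold, y_pred)   # see the correct label, update the weights
    return w                             # Output: w
```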

32 Perceptron
The same loop, instantiated as:
Binary Perceptron
- prediction(w, x) := [w · x > 0]
- update(x, y_gold, y_pred) := α (y_gold - y_pred) x
Multi-class Perceptron
- prediction(w, x) := argmax_y w · f(x, y)
- update(x, y_gold, y_pred) := α (f(x, y_gold) - f(x, y_pred))
Notation: w · x = Σ_i w_i x_i is the dot product (similarity score) of weights and features; the Iverson bracket [P] is 1 if P is true and 0 otherwise; α > 0 is the learning rate (step size).
Special case: multi-prototype features f(x, y) := ([y = class_1] x, [y = class_2] x, ..., [y = class_C] x), so the update becomes w := w + α f(x, y_gold) - α f(x, y_pred).
General case: any label-dependent features, e.g. f_101(x, y) := [(y = NNP or y = NNPS) and x capitalized].
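A minimal Python sketch of the two variants, assuming binary labels are coded as 0/1 (so the update α(y_gold - y_pred)x is zero on correct predictions, i.e. conservative) and `feats(x, y)` plays the role of f(x, y); all names are illustrative.

```python
import numpy as np

ALPHA = 1.0  # learning rate (step size); any fixed value > 0

def predict_binary(w, x):
    return 1 if w @ x > 0 else 0          # Iverson bracket [w . x > 0]

def update_binary(x, y_gold, y_pred):
    return ALPHA * (y_gold - y_pred) * x  # zero when the prediction was correct

def predict_multiclass(w, feats, x, labels):
    return max(labels, key=lambda y: w @ feats(x, y))  # argmax_y w . f(x, y)

def update_multiclass(feats, x, y_gold, y_pred):
    return ALPHA * (feats(x, y_gold) - feats(x, y_pred))
```

Plugged into online_learn above, update_binary leaves w untouched whenever y_pred = y_gold, which is exactly the conservative behavior defined earlier.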

39 Structured Prediction
- the number of possible labels is huge
- labels y have a structure (graph, tree, sequence, ...)
- usually the problem can be decomposed (factorized) into subproblems
Local features f_i(x, y, j) can use the whole x, but only those y_k where k is near j:
- f_101(x, y, j) := [(y_j = NNP or y_j = NNPS) and word x_j capitalized]
- f_102(x, y, j) := [y_j = NNP and y_{j-1} = NNP and |x| <= 6]
Global features F_i(x, y) := Σ_j f_i(x, y, j):
- F_101 = the number of capitalized words with tag NNP or NNPS
- F_102 = the number of NNP followed by NNP, or 0 if the sentence is longer than six words
We can also define features that cannot be decomposed.

40 Structured Prediction using Online Learning
1. Local approach: update after each local decision; the output of previous decisions is used in local features; e.g. Structured Perceptron (Collins, 2002):
   y_pred = argmax_y Σ_i w_i f_i(x, y_j, y_{j-1}, ...)
2. Global approach: generate an n-best list (lattice) of outputs y for the whole x, compute global features, and do one update per x (sentence), i.e. re-rank the n-best list; e.g. MIRA (Crammer and Singer, 2003):
   y_pred = argmax_y Σ_i w_i F_i(x, y)
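As a sketch of the global approach (with a plain structured-perceptron update standing in for MIRA for simplicity; the function names are hypothetical), global features are sums of local ones and the update re-ranks a given n-best list:

```python
import numpy as np

def global_features(local_feats, x, y):
    # F_i(x, y) = sum_j f_i(x, y, j): sum the local feature vectors over positions j
    return sum(local_feats(x, y, j) for j in range(len(y)))

def rerank_update(w, local_feats, x, nbest, y_gold, alpha=1.0):
    # re-rank the n-best list with the current weights ...
    y_pred = max(nbest, key=lambda y: w @ global_features(local_feats, x, y))
    # ... and, on error, move the weights toward the gold structure
    if y_pred != y_gold:
        w = w + alpha * (global_features(local_feats, x, y_gold)
                         - global_features(local_feats, x, y_pred))
    return w
```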

41 Margin-based Online Learning
Definitions:
- score(y) = w · f(x, y)
- margin(y) = score(y_gold) - score(y); margin > 0 means no error, and its size measures confidence
- hinge_loss(y) = max(0, 1 - margin(y))
Online prediction and update:
- y_pred := argmax_y w · f(x, y)
- w += α (f(x, y_gold) - f(x, y_pred))
Step sizes:
- Perceptron: α_Perc := 1 (or any fixed value > 0)
- Passive-Aggressive (PA): α_PA := hinge_loss(y_pred) / ||f(x, y_gold) - f(x, y_pred)||²
- Passive-Aggressive I: α_PA-I := min{C, α_PA}
- Passive-Aggressive II: α_PA-II := hinge_loss(y_pred) / (||f(x, y_gold) - f(x, y_pred)||² + 1/(2C))
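In Python, the step sizes might be computed as below; the PA-II denominator (adding 1/(2C)) follows the standard Passive-Aggressive formulation in the literature, since that formula was garbled in the transcription.

```python
import numpy as np

def hinge_loss(w, f_gold, f_pred):
    return max(0.0, 1.0 - (w @ f_gold - w @ f_pred))  # max(0, 1 - margin)

def step_size(w, f_gold, f_pred, variant="PA", C=1.0):
    """alpha for the update w += alpha * (f_gold - f_pred)."""
    if variant == "Perc":
        return 1.0                                    # perceptron: any fixed value > 0
    loss = hinge_loss(w, f_gold, f_pred)
    sq_norm = float(np.sum((f_gold - f_pred) ** 2))   # ||f(x,y_gold) - f(x,y_pred)||^2
    if variant == "PA-II":
        return loss / (sq_norm + 1.0 / (2.0 * C))
    if sq_norm == 0.0:
        return 0.0                                    # identical features: no direction to move
    return loss / sq_norm if variant == "PA" else min(C, loss / sq_norm)  # PA / PA-I
```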

46 Cost-sensitive Online Learning
Definitions:
- cost(y) = an external error metric (non-negative), e.g. 1 - similarity of y and y_gold
- hinge_loss(y) = max(0, cost(y) - margin(y))
Hope and Fear update: w += α (f(x, y_hope) - f(x, y_fear)), where
- min-cost hope:        y_hope := argmin_y cost(y)
- max-score fear:       y_fear := argmax_y score(y)
- cost-diminished hope: y_hope := argmax_y (score(y) - cost(y))
- cost-augmented fear:  y_fear := argmax_y (score(y) + cost(y))
- max-cost fear:        y_fear := argmax_y cost(y)
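Selecting the update pair from an n-best list is then a couple of argmax/argmin calls; a sketch with `score` and `cost` as defined above (the helper and its mode names are hypothetical):

```python
def select_hope_fear(nbest, score, cost, mode="augmented"):
    """Pick (y_hope, y_fear) for the update w += alpha * (f(x, y_hope) - f(x, y_fear))."""
    if mode == "extreme":
        return min(nbest, key=cost), max(nbest, key=cost)    # min-cost hope, max-cost fear
    if mode == "model":
        return min(nbest, key=cost), max(nbest, key=score)   # min-cost hope, max-score fear (y_pred)
    return (max(nbest, key=lambda y: score(y) - cost(y)),    # cost-diminished hope
            max(nbest, key=lambda y: score(y) + cost(y)))    # cost-augmented fear
```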

49 Cost-sensitive Online Learning: Hypothesis Selection
[Figure: the hypotheses of an n-best list plotted with score on one axis and -cost on the other; the highlighted points are the min-cost (max-BLEU) hope, the cost-diminished hope, the max-score fear (y_pred), the cost-augmented fear, and the max-cost fear.]

56 Application to MT
- x = the source sentence; y_gold = its reference translation
- multiple references are sometimes available
- the reference may be unreachable
- we score derivations (which include latent variables); one translation may have multiple derivations

57 Today's Menu: part 3, Guided Learning

58 Easy-First Decoding (PoS Tagging) (Shen, Satta and Joshi, 2007)
[Figure: tagging "Agatha found that book interesting"; each word has scored tag candidates (NNP for Agatha, VBD/VBN for found, DT/IN for that, NN/VB for book, JJ/RB for interesting), the most confident decision is fixed first, and context features such as f_123 := [y_j = DT, y_{j+1} = NN] and f_124 := [y_j = RB, y_{j+1} = NN] then rescore the neighbors of every fixed tag, until all words are tagged.]

67 Guided Learning (PoS Tagging) (Shen, Satta and Joshi, 2007)
[Figure: the same example during training: the highest-scoring remaining candidate is wrong (y_fear = VBN for "found") while the correct tag (y_hope = VBD) scores lower, so the weights are updated toward y_hope and away from y_fear.]
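A Python sketch of the easy-first strategy; `score(j, tag, tags)` is assumed to look at the already-fixed neighboring tags, so every fixed decision can change the scores of the remaining ones (all names illustrative):

```python
def easy_first_tag(words, candidates, score):
    """candidates[j]: the possible tags of words[j];
    score(j, tag, tags): model score of tagging position j with tag,
    given the partially filled sequence tags."""
    tags = [None] * len(words)
    remaining = set(range(len(words)))
    while remaining:
        # fix the easiest (highest-scoring) remaining decision, anywhere in the sentence
        j, tag = max(((j, t) for j in remaining for t in candidates[j]),
                     key=lambda jt: score(jt[0], jt[1], tags))
        tags[j] = tag
        remaining.remove(j)
    return tags
```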

68 Easy-First Decoding (Dependency Parsing) (Goldberg and Elhadad, 2010)
[Figure: easy-first parsing of "a brown fox jumped with joy"; at every step the highest-scoring action is performed (ATTACHRIGHT(1) attaches "a" and then "brown" under "fox", ATTACHLEFT actions attach "joy" under "with" and "with" under "jumped"), until the complete tree rooted in "jumped" is built.]

72 Today's Menu: part 4, Back to MT (Easy-First Decoding in MT, Guided Learning in MT)

73 Easy-First Decoding (Phrase-Based MT)
[Figure: translating "Agatha found that book interesting" into Czech ("Agátě přišla ta kniha zajímavá"); each source segment has scored translation candidates (e.g. found → našel / zjistil, že / přišla; that → ta / že; book → kniha / rezervovat; interesting → zajímavý / zajímavá), the easiest segment is translated first, and a language model rescores the neighbors of each fixed translation (e.g. preferring "zajímavá" next to "kniha"), until the whole sentence is translated.]

82 Features for Guided Learning in MT
Source Segment Features
- segment size (number of words)
- entropy of P(target | source): -Σ_i P(trg_i | src) log P(trg_i | src)
- log count(source)
- source language model: log P(source)
- word identity, e.g. f_42 := [src = "found that"]
- PoS identity, e.g. f_43 := [src_pos = "VBD IN"]
Target-dependent Features
- log P(trg | src)
- target language model: log P(target | previous segment)
- log count(target)?
- identity, e.g. f_142 := [src = "found that" & trg = "zjistil"]
Combinations and Quantizations
- [size(src) = 3] * log P(trg | src)
- [size(src) = 3 & -3 < log P(trg | src) < -2]
- etc.
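The combination and quantization features are just indicator products over the basic features; a small illustrative sketch with the thresholds from the slide's example (the bucket boundaries -3 and -2 are reconstructed, since the minus signs were lost in transcription):

```python
import math

def combination_features(src_size, p_trg_given_src):
    logp = math.log(p_trg_given_src)
    return {
        # [size(src) = 3] * log P(trg|src): a real value gated by an indicator
        "size3_logP": logp if src_size == 3 else 0.0,
        # [size(src) = 3 & -3 < log P(trg|src) < -2]: a fully binarized bucket
        "size3_bucket": 1.0 if (src_size == 3 and -3 < logp < -2) else 0.0,
    }
```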

84 Application to Tecto Trees
[Figure: the source t-tree of "Agatha found this book interesting": find/v:fin with children Agatha/n:subj, book/n:obj and interesting/adj:compl, and this/adj:attr under book; the same easy-first, guided-learning approach is applied to translating its nodes.]

86 What have you seen in the Zoo?

87 Predictions?

88 Predictions? Hope and Fear
