Joint Decoding with Multiple Translation Models

Size: px

Start display at page:

Download "Joint Decoding with Multiple Translation Models"

Ursula Shepherd
5 years ago
Views:

1 Joint Decoding with Multiple Translation Models Yang Liu, Haitao Mi, Yang Feng, and Qun Liu Institute of Computing Technology, Chinese Academy of ciences 8/10/2009 ACL/IJCNLP 2009, ingapore 1

2 ystem Combination ystem 1 E1 ystem 2 E2 f Combiner e ystem n En decoding? postprocessing 8/10/2009 ACL/IJCNLP 2009, ingapore 2

3 This Work We propose a technique called joint decoding to combine different models directly in the decoding phase. Our preliminary work shows promising results. 8/10/2009 ACL/IJCNLP 2009, ingapore 3

4 Translation Hypergraph fabiao yanjiang give a talk give talks give 0-1 talk 1-2 8/10/2009 ACL/IJCNLP 2009, ingapore 4

5 Consensus Translations give talks give a talk give a talk give talks give a talk give a speech give 0-1 talks 1-2 give 0-1 talk 1-2 give 0-1 talk 1-2 phrase-based hierarchical phrase-based tree-to-string 8/10/2009 ACL/IJCNLP 2009, ingapore 5

6 haring Nodes give a speech give a talk give talks give 0-1 talk 1-2 talks 1-2 8/10/2009 ACL/IJCNLP 2009, ingapore 6

7 coring a Translation target sentence source sentence pe ( f) = ped, f d ( ef, ) ( ) one derivation the set of derivations that translate f into e a derivation can come from any model! 8/10/2009 ACL/IJCNLP 2009, ingapore 7

8 MDD and MTD ( ) p e f = d ( e, f ) exp λihi de,, f i Z ( ) Blunsom et al. (2008) eˆ = argmaxe exp λihi( de,, f ) d ( e, f ) i argmax ed, λihi( de,, f ) i max-translation decoding max-derivation decoding 8/10/2009 ACL/IJCNLP 2009, ingapore 8

9 An Example of MTD model name feature weight value p(e f) 0.1 hier p(f e) l(e f) exp(3.7) l(f e) 0.3 rc 3 t2s p(e f) p(f e) l(e f) exp(4.8) (exp(3.7) + exp(4.8)) exp(3.7) l(f e) 0.2 rc 4 const lm wc exp(3.7) 8/10/2009 ACL/IJCNLP 2009, ingapore 9

10 Joint Decoding fabiao yanjiang /10/2009 ACL/IJCNLP 2009, ingapore 10

11 Joint Decoding fabiao yanjiang give 0-1 8/10/2009 ACL/IJCNLP 2009, ingapore 11

12 Joint Decoding fabiao yanjiang give 0-1 talk 1-2 8/10/2009 ACL/IJCNLP 2009, ingapore 12

13 Joint Decoding fabiao yanjiang give 0-1 talk 1-2 talks 1-2 8/10/2009 ACL/IJCNLP 2009, ingapore 13

14 Joint Decoding fabiao yanjiang give 0-1 talk 1-2 talks 1-2 8/10/2009 ACL/IJCNLP 2009, ingapore 14

15 Joint Decoding fabiao yanjiang give a speech give a talk give talks give 0-1 talk 1-2 8/10/2009 ACL/IJCNLP 2009, ingapore 15

16 Joint Decoding fabiao yanjiang give a talk a pruned packed hypergraph give 0-1 talk 1-2 8/10/2009 ACL/IJCNLP 2009, ingapore 16

17 haring Hyperedges give a talk give 0-1 talk 1-2 8/10/2009 ACL/IJCNLP 2009, ingapore 17

18 Joint Decoding with Different Rules VP VV NN fabiao yanjiang /10/2009 ACL/IJCNLP 2009, ingapore 18

19 Joint Decoding with Different Rules VP X -> <fabiao, give> VV NN fabiao yanjiang give 0-1 8/10/2009 ACL/IJCNLP 2009, ingapore 19

20 Joint Decoding with Different Rules VV VP NN X -> <fabiao, give> X -> <yanjiang, talk> fabiao yanjiang give 0-1 talk 1-2 8/10/2009 ACL/IJCNLP 2009, ingapore 20

21 Joint Decoding with Different Rules (VP (VV:x1) (NN:x2)) -> x1 a x2 VV VP NN X -> <fabiao, give> X -> <yanjiang, talk> fabiao yanjiang give a talk give 0-1 talk 1-2 8/10/2009 ACL/IJCNLP 2009, ingapore 21

22 An Example of MDD model name feature weight value p(e f) 0.1 p(f e) 0.2 hier l(e f) 0.1 l(f e) 0.3 rc 2 p(e f) p(f e) t2s l(e f) 0.3 l(f e) 0.2 rc 1 const lm wc /10/2009 ACL/IJCNLP 2009, ingapore 22

23 haring Matrix Phrase Hiero T2 2T T2T Phrase node, edge node, edge node, edge node node Hiero node, edge node, edge node, edge node node T2 node, edge node, edge node, edge node node 2T node node node node, edge node, edge T2T node node node node, edge node, edge 8/10/2009 ACL/IJCNLP 2009, ingapore 23

24 How to Tune Feature Weights for MTD? model name feature weight value hier p(e f) p(f e) l(e f) eˆ = argmaxe exp λihi( d, e, f ) d ( e, f ) i l(f e) 0.3 rc 3 p(e f) p(f e) (exp(3.7) + exp(4.8)) exp(3.7) t2s l(e f) 0.3 l(f e) 0.2 rc 4 const lm wc /10/2009 ACL/IJCNLP 2009, ingapore 24

25 MERT for MDD f e1 e2 e3 d d d e1 e3 e2 8/10/2009 ACL/IJCNLP 2009, ingapore 25

26 MERT for MTD f e1 e2 e3 model 1 model 2 d d d d d d /10/2009 ACL/IJCNLP 2009, ingapore 26

27 Curves e1 e2 ( ) K = ak f x e + k= 1 x b k e3 8/10/2009 ACL/IJCNLP 2009, ingapore 27

28 etup Models Hierarchical phrase-based (Chiang, 2005) Tree-to-string (Liu et al., 2006) Training set: FBI (6.9M + 8.9M) Language model: 4-gram trained on GIGAWORD Xinhua portion Development set: NIT 2002 C2E Test set: NIT 2005 C2E 8/10/2009 ACL/IJCNLP 2009, ingapore 28

29 Individual Decoding Vs. Joint Decoding Model haring Max-derivation Time BLEU Max-translation Time BLEU Hiero T both node node & edge /10/2009 ACL/IJCNLP 2009, ingapore 29

30 Compared with ystem Combination Method Model BLEU individual Hiero T system comb. both joint both /10/2009 ACL/IJCNLP 2009, ingapore 30

31 Individual Training Vs. Joint Training Training individual joint Max-derivation Max-translation /10/2009 ACL/IJCNLP 2009, ingapore 31

32 Conclusion and Future Work We have presented a framework for combining different translation models in the decoding phrase. Future work Including more models Forced decoding Hypergraph-based MERT (Kumar et al., 2009) 8/10/2009 ACL/IJCNLP 2009, ingapore 32

33 Thanks! 8/10/2009 ACL/IJCNLP 2009, ingapore 33

Joint Decoding with Multiple Translation Models

Joint Decoding with Multiple Translation Models Yang Liu and Haitao Mi and Yang Feng and Qun Liu Key Laboratory of Intelligent Information Processing Institute of Computing Technology Chinese Academy of