Reassessment of the Role of Phrase Extraction in PBSMT

Size: px

Start display at page:

Download "Reassessment of the Role of Phrase Extraction in PBSMT"

Georgiana Miranda Lewis
5 years ago
Views:

cmu.edu Presented by: Nguyen Bach LTI-CMU

1 Reassessment of the Role of Phrase Extraction in Francisco Guzmán CCIR-ITESM Qin Gao LTI-CMU Stephan Vogel LTI-CMU Presented by: Nguyen Bach LTI-CMU MT Summit XII. Ottawa, Canada August 27,

2 Outline Introduction Analysis Setup Analysis I: Word Alignments Metrics Results Analysis II: Phrase Extraction Metrics Manual Evaluation Results Lessons Learned: Mind your gaps Experimental Results Conclusions 2

3 Word Alignment Beginning of SMT pipeline. Most subsequent steps based on WA. A lot of work to improve WA quality. Widely available Hand Alignments enabled discriminative approaches. Discriminative based on metrics such AER. 3

4 AER vs BLEU Fraser and Marcu, 2004 Evaluated correlation between BLEU and AER. Possible links => flaws. Variation of F-measure, uses a coefficient to modify balance between precision and recall. The optimal coefficient depends on the corpus. Vilar et al., 2004 Better BLEU scores can be obtained with "degraded" alignments. Mismatch between alignment and translation models. Support the use of AER. AER BLEU AER BLEU 4

5 AER vs BLEU Ayan and Dorr, 2004 Analyze the quality of the alignments and resulting phrase tables. Several types of alignments. Several lexical weightings CPER (Consistent Phrase Error Rate) 5

6 Our Objective Study the relationship between Word Alignment and Phrase Extraction Alignment characteristics => Phrase Table characteristics 6

7 Analysis Setup 7

8 Setup Alignments: Data: GIZA++ Chinese-English Symmetrized (growdiag-final) GIZA++ train (11M Sentences) Discriminative Phrase Extraction: phrase-extract (Och & Ney) DWA tune(500 Sentences) Test (200 Sentences) Max P Len= 7 8

9 Discriminative (Niehues & Vogel) CRF Based GIZA Uses GIZA++ byproducts as features Tuned towards AER lexicon Viterbi alignments fertilities Discriminative Word Aligner (DWA) 9

10 Continuous alignment matrix Binarized to different thresholds 10

11 Analysis I: Word Alignment 11

12 Word Alignment Metrics Qualitative: Quantitative: AER (F-measure) Number of links Precision Recall Number of unaligned words 12

13 Alignment quality results DWA alignments: higher threshold=>more precision Best AER from slightly more precise alignment (DWA-0.5) GIZA=> more precision than recall. SYM lower AER than GIZA. Precision/Recall AER Precision Recall Alignment quality DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 AER 30 GIZA S2T GIZA T2S SYM Alignment 13

14 Links Hand Align closer to (DWA-0.3) 100,000 90,000 Number of Links by alignment DWA aligns: high threshold=>fewer links Best AER (DWA-0.5) fewer links than HA 80,000 70,000 60,000 50,000 40,000 30,000 20,000 10, DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 GIZA S2T GIZA T2S SYM HA number of links Alignment 14

15 Unaligned Words HA Source: closer to SYM, DWA-0.3 HA Target: closer to DWA-0.4, DWA-0.3 GIZA asymmetry DWA: higher threshold, more unalignments. DWA: lower threshold=> more proportion Chinese words unaligned. total number of unaligned words 30,000 25,000 20,000 15,000 10,000 5,000 0 DWA-0.1 DWA-0.2 Unaligned Source and Target Words Source Unaligned Target Unaligned DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 GIZA S2T GIZA T2S SYM HA Alignment 15

16 Word Alignment: Summary Diversity of balance precision/recall between alignments. Usually precision prevails over recall. Two ways of describing to an alignment: links and unaligned words. In next section, we'll observe the importance of such factors in the generation of phrase pairs. 16

17 Analysis II: Phrase Extraction 17

18 Metrics Quantitative: Qualitative: Number of phrases Manual Evaluation Singletons (unique entries) Phrase lengths Gaps (unaligned words inside phrase pair) 18

19 target What do we mean by gaps? Alignment matrix In this phrase pair: 1 gap on the target phrase source In this phrase pair: 3 gaps on the source phrase 19

20 Number of Phrases PT grows as our alignment gets sparser Related to unaligned words rather than number of links Number of Unaligned Words 30,000 25,000 20,000 15,000 10,000 5,000 Number of generated phrase pairs Phrases Unaligned Source Unaligned Target 900, , , , , , , , ,000 0 DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 GIZA S2T number of phrases 0 GIZA T2S SYM HA Alignments 20

21 Number of Phrases PT grows as our alignment gets sparser Related to unaligned words rather than number of links 100,000 90,000 80,000 70,000 60,000 Number of generated phrase pairs Phrases Links 900, , , , ,000 number of links 50,000 40,000 30,000 20,000 10, , , , , DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 GIZA S2T GIZA T2S SYM HA number of phrases Alignments 21

22 Singletons Most of the phrase-pairs are singletons. Repeated phrase-pairs grow at a slower rate. 900, , , ,000 Generated phrase pairs Singletons Repeated 500, , , , ,000 0 DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 GIZA S2T GIZA T2S SYM HA Number of phrase pairs Alignments 22

23 Singletons Most of the phrase-pairs are singletons. 30,000 Generated phrase pairs Repeated Repeated phrase-pairs grow at a slower rate. 25,000 20,000 15,000 10,000 5,000 0 DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 GIZA S2T GIZA T2S SYM HA Number of phrase pairs Alignments 23

24 Phase Length As our PT grows, entries become longer and longer. Source phrase length distribution of generated phrase pairs Number of phrase pairs #7 #6 #5 #4 #3 #2 #1 0 DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 GIZA S2T GIZA T2S SYM HA Alignments 24

25 Gaps The gaps inside a phrase pair increases too. Gaps in generated phrase pairs The distribution of gaps in the generated phrases follows the distribution of unaligned words in the alignment Expected number of gaps in phrase pair DWA-0.1 exp. #source gaps exp. #target gaps DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 GIZA S2T GIZA T2S SYM HA Alignment 25

26 Human Evaluation of Phrase Pairs Setup: Native Chinese Speakers Each subject was asked whether a phrase pair was adequate No contextual information Included a noisy input 26

27 Sampling 27

28 Results HA w/o gaps yields better results than HA w/ gaps. DWA-0.1 very good (short phrase pairs) 100% 90% 80% 70% 8 24 Adequacy by Alignments DWA-0.5 not so great Random pairings are usually bad Test Cases (adequacy) 60% 50% 40% 30% 20% 10% 0% No Yes DWA-0.1 HA-nogap HA-gap DWA-0.2 DWA-0.4 DWA-0.3 DWA-0.7 DWA-0.6 DWA-0.5 SYM DWA-0.9 DWA-0.8 Random Alignments 28

29 Phrase Extraction: Summary 29

30 Lessons Learned: Mind your gaps 30

31 Taking into account GAPS Gaps inside phrase pairs have considerable impact on human perceived quality of phrase pair. Do they affect translation? Translation Experiment: Include gap count as a feature (similar to WC) Compare the performance of the different systems w/o the features 31

32 Setup Training GALE P3 Data Maximum sentence length 30 1 Million sentences (random) Tuning Test MT05 GALE DEV07-Blind 32

33 Tuning Baseline gets get better results 29 TUNE Over-fitting? DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 SYM BLEU Baseline Unalign- Feat 33

34 Experimental Results Overall gains Web performs better (~2 BP) Best system shifts to a higher precision alignment (DWA-0.5 => DWA-0.6) Higher recall alignments without much change Baseline Unalign-Feat NewsWire DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 SYM BLEU 34

35 Experimental Results Overall gains Web Web performs better (~2 BP) Best system shifts to a higher precision alignment (DWA-0.5 => DWA-0.6) Higher recall alignments without much change Baseline Unalign-Feat DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 SYM BLEU 35

36 Conclusions We can describe an alignment by its quality and its structure (links, unaligned words). Unaligned words have an important role in phrase extraction (more than number links). The distribution of the gaps inside a phrase pair is related to the distribution of unaligned words in the alignment. Extracted phrase pairs with more gaps have lower human perceived quality. Taking into account the number of gaps in an extracted phrase pair as features achieved overall improvements. 36

37 Questions? 37

38 Adequacy of Phrase Pair by adequate we mean that a source phrase could be used as a translation of a target phrase in at least ONE situation, without loss of meaning 38

39 Experiment Variables Dependent: Adequacy (yes/no) Independent: System : HA-Gaps HA-No Gaps DWA { } Random SYM Random (noisy) August 27, 2009 Number of Source, Target gaps Evaluator 39

40 Results ANOVA System is significative Interaction SourceGaps*TargetGaps is significative Evaluator*System is not significative 40

Reassessment of the Role of Phrase Extraction in PBSMT

Reassessment of the Role of Phrase Extraction in PBSMT Francisco Guzman Centro de Sistemas Inteligentes Tecnológico de Monterrey Monterrey, N.L., Mexico guzmanhe@gmail.com Qin Gao and Stephan Vogel Language