Translation Quality Estimation: Past, Present, and Future


1 Translation Quality Estimation: Past, Present, and Future André Martins MT Marathon, Lisbon, August 31st, 2017 André Martins (Unbabel) Quality Estimation MTM, 31/8/17 1 / 69

2 This Talk First part: largely based on Lucia Specia's MTM16 slides. Second part: joint work with Marcin, Fabio, Ramón, Chris, and Roman. Third part: my thoughts on the future of QE.

3 Outline 1 MT Evaluation & Quality Estimation 2 Pushing the Limits of Quality Estimation 3 The Future

4 Why Do We Care About Evaluation? In the business of developing MT, we need to: measure progress over new/alternative versions; compare different MT systems; decide whether a translation is good enough for something; optimize parameters of MT systems; understand where systems go wrong (diagnosis)... remember Yvette's lecture on Monday:

5 Why Do We Care About Evaluation? One should optimize a system using the same metric that will be used to evaluate it Issue: how to choose a metric? The choice should be related to the system's purpose (not the case in practice) Other aspects are important for tuning (sentence/corpus-level, fast, cheap, differentiable,...)

7 Complex Problem What does quality mean? Fluent? Adequate? Both? Easy to post-edit? System A better than system B?... Quality for whom/what? End-user (gisting vs dissemination) Post-editor (light vs heavy post-editing) Other applications (e.g. CLIR) MT-system (tuning or diagnosis for improvement)...

10 Complex Problem MT Do buy this product, it's their craziest invention! HT Do not buy this product, it's their craziest invention! Severe if the end-user does not speak the source language Trivial to post-edit by translators

13 Complex Problem MT Six-hours battery, 30 minutes to full charge last. HT The battery lasts 6 hours and it can be fully recharged in 30 minutes. Ok for gisting - meaning preserved Very costly for post-editing if style is to be preserved

14 A Taxonomy of MT Evaluation Methods Manual / Automatic

15 A Taxonomy of MT Evaluation Methods Manual: direct assessment (scoring); Automatic

16 Manual Assessment: Scoring Is this translation correct?

17 A Taxonomy of MT Evaluation Methods Manual: direct assessment (scoring, ranking); Automatic

18 Manual Assessment: Ranking

19 A Taxonomy of MT Evaluation Methods Manual: direct assessment (scoring, ranking), error annotation; Automatic

20 MQM (Multidimensional Quality Metrics)

21 A Taxonomy of MT Evaluation Methods Manual: direct assessment (scoring, ranking), error annotation, task-based (post-editing); Automatic

22 Amount of Post-Editing: HTER
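HTER, the label most sentence-level QE systems predict, is the number of edits a human post-editor makes divided by the length of the post-edited sentence. A minimal sketch, using plain word-level Levenshtein distance as an approximation (TER additionally allows block shifts, which are omitted here for simplicity):

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertions, deletions, substitutions).
    TER additionally counts block shifts; this sketch omits them."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # delete a hyp word
                          d[i][j - 1] + 1,          # insert a ref word
                          d[i - 1][j - 1] + cost)   # substitute / match
    return d[m][n]

def hter(mt, post_edit):
    """Approximate HTER: edits needed to reach the post-edit, normalized
    by the post-edit's length."""
    mt_toks, pe_toks = mt.split(), post_edit.split()
    return edit_distance(mt_toks, pe_toks) / len(pe_toks)
```

For the earlier example, turning "Do buy this product" into "Do not buy this product" takes a single insertion, so the approximate HTER is 1/5 = 0.2.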

24 A Taxonomy of MT Evaluation Methods Manual: direct assessment (scoring, ranking), error annotation, task-based (post-editing, reading comprehension); Automatic

25 Reading Comprehension

26 A Taxonomy of MT Evaluation Methods Manual: direct assessment (scoring, ranking), error annotation, task-based (post-editing, reading comprehension, eye-tracking); Automatic

27 Eye-Tracking

28 A Taxonomy of MT Evaluation Methods Manual: direct assessment (scoring, ranking), error annotation, task-based (post-editing, reading comprehension, eye-tracking); Automatic: reference-based

29 A Taxonomy of MT Evaluation Methods Manual: direct assessment (scoring, ranking), error annotation, task-based (post-editing, reading comprehension, eye-tracking); Automatic: reference-based (BLEU, Meteor, NIST, TER, WER, PER, CDER, BEER, CiDER, Cobalt, RATATOUILLE, RED, AMBER, PARMESAN,...), quality estimation

32 Reference-Based Evaluation Reference(s): subset of good translations, usually one. Some metrics expand matching, e.g. synonyms in Meteor. Huge variation in reference translations, e.g.: Source: 不过这一切都由不得你 (gloss: "However these all totally beyond the control of you") MT: But all this is beyond the control of you. HT 1: But all this is beyond your control HT 2: However, you cannot choose yourself HT 3: However, not everything is up to you to decide HT 4: But you can't choose that Metrics completely disregard the source segment Main problem: cannot be applied for MT systems in use
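For concreteness, here is a minimal sentence-level BLEU sketch (with illustrative add-one smoothing on the n-gram precisions, one of several common smoothing variants); note that it scores the MT output against the reference only and never looks at the source:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(hyp, ref, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions
    (add-one smoothed) times a brevity penalty. The source sentence plays
    no role at all."""
    hyp, ref = hyp.split(), ref.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # clip each hypothesis n-gram count by its count in the reference
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(len(hyp) - n + 1, 0)
        log_prec += math.log((clipped + 1) / (total + 1)) / max_n
    # brevity penalty for hypotheses shorter than the reference
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_prec)
```

Scoring the MT output above against HT 1 gives a value strictly between 0 and 1, while scoring it against a different reference (HT 2-4) would give a very different number, which is exactly the reference-variation problem the slide describes.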

39 Quality Estimation (Specia et al., 2013) Quality Estimation (QE): metrics that provide an estimate of the quality of translations on the fly Quality defined by the data: purpose is clear, no comparison to references, source considered Quality = Can we publish it as is? Quality = Can a reader get the gist? Quality = Is it worth post-editing it? Quality = How much effort to fix it?

40 Related: Confidence in MT (Blatz et al., 2004; Ueffing and Ney, 2007) Goal: augment the MT system to produce a confidence score Quality Estimation is slightly more general and advantageous: it does not require access to the internals of the MT system (i.e. can treat it as a black box) it makes it possible to use several MT systems, several domains, etc.

41 Quality Estimation: Framework. Building a model: X (examples of sources & translations) → feature extraction → features; features + Y (quality scores for the examples in X) → machine learning → QE model (Slide credit: Lucia Specia)

42 Quality Estimation: Framework. Applying the model: source text x → MT system → translation t' for x; (x, t') → feature extraction → features → QE model → quality score y' (Slide credit: Lucia Specia)

43 Data and Levels of Granularity Sentence level: 1-5 subjective scores, PE time, PE edits Word level: good/bad, good/delete/replace, MQM Phrase level: good/bad Document level: PE effort

44 Features and Algorithms Complexity features (source text), confidence features (MT system), adequacy features (source vs translation), fluency features (translation) Algorithms can be used off-the-shelf

47 Sentence-Level QE: Baseline Setting Features: number of tokens in the source and target sentences average source token length average number of occurrences of words in the target number of punctuation marks in source and target sentences LM probability of source and target sentences average number of translations per source word % of seen source n-grams SVM regression with RBF kernel Implementation: the QuEst framework
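A sketch of a few of these baseline features; only the resource-free ones are computed here, since the LM probabilities, translation tables, and n-gram coverage statistics require trained models:

```python
import string

def baseline_features(source, target):
    """A handful of the QuEst-style baseline features that need no external
    resources. LM probabilities, translations per source word and % of seen
    n-grams would require trained models and corpora, omitted here."""
    src, tgt = source.split(), target.split()
    punct = set(string.punctuation)
    return {
        "src_num_tokens": len(src),
        "tgt_num_tokens": len(tgt),
        "src_avg_token_len": sum(len(w) for w in src) / len(src),
        # average number of occurrences of each word type in the target
        "tgt_avg_word_freq": len(tgt) / len(set(tgt)),
        "src_num_punct": sum(w in punct for w in src),
        "tgt_num_punct": sum(w in punct for w in tgt),
    }
```

In the baseline setting, vectors like these (17 features in total) feed an SVM regressor with an RBF kernel, e.g. scikit-learn's `SVR(kernel="rbf")`, trained to predict the sentence's HTER.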

48 Sentence-Level QE: SOTA. Predicting HTER (WMT16, English-German); systems ranked by Pearson/Spearman correlation: YSDA/SNTX+BLEU+SVM, POSTECH/SENT-RNN-QV, SHEF-LIUM/SVM-NN-emb-QuEst, POSTECH/SENT-RNN-QV, SHEF-LIUM/SVM-NN-both-emb, UGENT-LT3/SCATE-SVM, UFAL/MULTIVEC, RTM/RTM-FS-SVR, UU/UU-SVM, UGENT-LT3/SCATE-SVM, RTM/RTM-SVR, Baseline SVM, SHEF/SimpleNets-SRC, SHEF/SimpleNets-TGT

49 Sentence-Level QE: SOTA. Predicting HTER (WMT17)

50 Word-Level QE: SOTA. Word-level OK/BAD labels (WMT16)

51 Word-Level QE: SOTA. Word-level OK/BAD labels (WMT17)

52 Outline 1 MT Evaluation & Quality Estimation 2 Pushing the Limits of Quality Estimation 3 The Future

53 Pushing the Limits of Quality Estimation Recent TACL paper: André F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramón Astudillo, Chris Hokamp, Roman Grundkiewicz. Pushing the Limits of Quality Estimation. TACL, 5, 2017.

56 In a Nutshell Quality estimation: evaluate a translation on the fly with no reference Useful for predicting (or sidestepping) human post-editing effort Until now: not really accurate enough for practical use This paper: considerable improvements by stacking a neural system and a linear sequential model, and by using automatic post-editing as an auxiliary task Overall (on the WMT16 En-De dataset): 57.47% (+7.95%) for word-level F1-MULT and a +13.26% gain in sentence-level Pearson score

57 But isn't MT nearly indistinguishable from humans now?

60 Wrong translation! We can fix it with a human post-editor!

62 Quality estimation: BAD BAD BAD OK OK OK Le travail de La traduction automatique fonctionne-t-elle? We can fix it with a human post-editor!

64 Quality Estimation (Blatz et al., 2004; Specia et al., 2013) BAD BAD BAD OK OK OK Le travail de La traduction automatique fonctionne-t-elle? Sentence-level QE: predict edit distance (HTER) Word-level QE: predict an OK/BAD label for each translated word This paper: we first engineer a strong word-level QE system, then convert it to make sentence-level predictions too

65 Why Quality Estimation? 1 Informs an end user about the reliability of translated content 2 Decides if a translation is good to go or requires a human to fix it 3 Highlights to a human post-editor the words that need to be revised

66 Unbabel Translation Pipeline (Credit: João Graça's EAMT presentation)

67 Dataset of Post-Edited Translations WMT 2016: English-German, IT domain 12,000 training sentences, 1,000 dev sentences, 2,000 test sentences Quality labels obtained from post-edited translations via TERCOM (Snover et al., 2006) In the paper: also experiments with WMT 2015 (English-Spanish)

68 Model #1: Linear Sequential Model First shot: a discriminative, shallow, CRF-like model. The model predicts a label sequence ŷ_{1:N} ∈ {OK, BAD}^N:

ŷ_{1:N} = argmax_{y_{1:N}} Σ_{i=1}^{N} w · φ_u(s, t, A, y_i) [unigram features] + Σ_{i=1}^{N+1} w · φ_b(s, t, A, y_i, y_{i-1}) [bigram features]
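Because the score factors into unigram terms and consecutive-bigram terms, the argmax can be computed exactly with Viterbi. A minimal sketch, where `unigram_score` and `bigram_score` are hypothetical stand-ins for the dot products w · φ_u and w · φ_b:

```python
def viterbi(n_words, labels, unigram_score, bigram_score):
    """Exact argmax for a first-order sequential model: unigram scores at
    each position plus bigram scores between consecutive labels, with
    virtual START/STOP positions (hence N+1 bigram terms)."""
    START, STOP = "<s>", "</s>"
    best = {START: 0.0}          # best score for each label at position i-1
    back = []                    # backpointers, one dict per position
    for i in range(n_words):
        new_best, new_back = {}, {}
        for y in labels:
            score, prev = max(
                (best[yp] + bigram_score(i, y, yp) + unigram_score(i, y), yp)
                for yp in best)
            new_best[y], new_back[y] = score, prev
        best, back = new_best, back + [new_back]
    # final transition into the STOP position
    _, last = max((best[yp] + bigram_score(n_words, STOP, yp), yp) for yp in best)
    path = [last]
    for ptrs in reversed(back[1:]):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path
```

With zero bigram scores the decoder simply picks the best unigram label at each position; nonzero bigram scores let the model smooth label sequences (e.g. making isolated BAD labels in an OK run more expensive).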

69 Features (inputs referenced by the i-th target word) Unigram features (on y_i): Bias; Word, LeftWord, RightWord; SourceWord, SourceLeftWord, SourceRightWord; LargestNGramLeft/Right; SourceLargestNGramLeft/Right; PosTag, SourcePosTag; Word+LeftWord, Word+RightWord; Word+SourceWord, PosTag+SourcePosTag. Simple bigram features (on y_i, y_{i-1}): Bias. Rich bigram features: all of the above on (y_i, y_{i-1}); plus, on (y_{i+1}, y_i): Word+SourceWord, PosTag+SourcePosTag. Syntactic features (on y_i): DepRel, Word+DepRel; HeadWord/PosTag+Word/PosTag; LeftSibWord/PosTag+Word/PosTag; RightSibWord/PosTag+Word/PosTag; GrandWord/PosTag+HeadWord/PosTag+Word/PosTag.

70 Performance of Linear Model (Dev Set) F1-MULT on WMT16 dev for: unigrams only; +simple bigrams; +rich bigrams; +syntactic (full). Large impact of rich bigram features (3 points) and syntactic features (another 2.5 points) Net improvement exceeds 6 points over the unigram model

71 Model #2: Neural Model Second shot: a model inspired by QUETCH (Kreutzer et al., 2015): a feedforward neural net whose inputs are target words and their aligned source words We depart from QUETCH in a few aspects: we add recurrent layers more depth embeddings for the POS tags (not just the words) dropout regularization layer normalization In the paper: ablation experiments to validate the architecture

72 [NeuralQE architecture, bottom-up: embeddings for target word (3 x 64), source word (3 x 64), target POS (3 x 50) and source POS (3 x 50) → feedforward layers (2 x 400) → 2 x BiGRU with feedforward → BiGRU with feedforward → 2 x feedforward → softmax over OK/BAD]

73 Linear vs Neural Models (Test Set) F1-MULT on WMT16 test: UGENT system, LinearQE, NeuralQE. UGENT is Tezcan et al. (2016) (best in WMT16 after us) Both our linear and neural systems outperform it by a big margin

74 Model #3: Stacked Architecture Stacked learning is a simple and effective way of ensembling structured models (Cohen and de Carvalho, 2005; Martins et al., 2008) Key idea: include the neural model's prediction as an additional feature in the linear model Better performance if we use several neural models with different random initializations and data shuffles
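A toy sketch of the stacking idea: the per-word BAD probabilities of several neural runs become extra unigram features for the linear model. Here each probability contributes a raw average plus a one-hot binned feature; the exact binning scheme is an illustrative assumption, not necessarily the paper's feature set:

```python
def stacked_features(base_features, neural_probs, n_bins=10):
    """Augment each word's linear-model feature dict with features derived
    from an ensemble of neural models' P(BAD) predictions.

    base_features: list of feature dicts, one per target word.
    neural_probs: one list of per-word P(BAD) values per neural model.
    """
    out = []
    for i, feats in enumerate(base_features):
        feats = dict(feats)                      # don't mutate the input
        probs = [p[i] for p in neural_probs]     # one prob per neural model
        feats["neural_avg_pbad"] = sum(probs) / len(probs)
        for m, p in enumerate(probs):
            # one-hot bin so the linear model can weight probability
            # ranges nonlinearly
            b = min(int(p * n_bins), n_bins - 1)
            feats[f"neural{m}_pbad_bin{b}"] = 1.0
        out.append(feats)
    return out
```

The linear model is then retrained on these augmented features (on held-out neural predictions, in the usual stacked-learning fashion, so it learns how much to trust the neural scores).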

76 Stacking Linear and Neural Models F1-MULT on WMT16: UGENT system, LinearQE, NeuralQE, StackedQE. Large improvement (3 points) just by combining the two systems! This shows that they are highly complementary We won the WMT16 shared task with this approach... but can we still do better? Yes.

77 Automatic Post-Editing (Simard et al., 2007)... remember Marcin's lecture on Wednesday:

78 Automatic Post-Editing (Simard et al., 2007) BAD BAD BAD OK OK OK Le travail de La traduction automatique fonctionne-t-elle? Goal: automatically correct the output of MT While word-level QE detects mistakes, APE seeks to correct them Still, the two tasks are pretty similar.

80 Roundtrip and Log-Linear Combination Current best APE system (Junczys-Dowmunt and Grundkiewicz, 2016): generate a large amount of artificial data ("roundtrip translations"), train two NMT systems (source-to-post-edit, s→p, and target-to-post-edit, t→p), and combine the two with a log-linear model Our strategy: train an APE system tuned for QE; at test time, project the predicted post-edited text p onto quality labels Two key differences with respect to the other pure QE systems: learned from finer-grained information (post-edited text); a lot more data (500,000 roundtrip translations)
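The projection step can be sketched with the same kind of alignment TERCOM uses to create the training labels: align the MT output t to the predicted post-edit p, mark unchanged MT words OK and substituted/deleted ones BAD. A sketch using plain Levenshtein alignment (TERCOM additionally handles block shifts):

```python
def project_labels(mt, ape):
    """Derive word-level OK/BAD labels for the MT output by aligning it to
    the (automatically) post-edited sentence with word-level Levenshtein.
    MT words that survive unchanged get OK; substituted or deleted words
    get BAD. Words that would need to be *inserted* have no MT position
    and are ignored, as in the WMT word-level task."""
    t, p = mt.split(), ape.split()
    m, n = len(t), len(p)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if t[i - 1] == p[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    # backtrace one optimal alignment, labelling MT words along the way
    labels = [None] * m
    i, j = m, n
    while i > 0:
        if j > 0 and d[i][j] == d[i - 1][j - 1] + (0 if t[i - 1] == p[j - 1] else 1):
            labels[i - 1] = "OK" if t[i - 1] == p[j - 1] else "BAD"
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:   # MT word must be deleted
            labels[i - 1] = "BAD"
            i -= 1
        else:                              # word inserted into MT: no MT position
            j -= 1
    return labels
```

Note that when the post-edit only inserts a word (e.g. the missing "not" in the earlier example), every MT word keeps its OK label, which is exactly the adequacy blind spot of OK/BAD labelling discussed later in the talk.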

81 Model #4: APE-Based Quality Estimation F1-MULT on WMT16: UGENT system, LinearQE, NeuralQE, StackedQE, APE-QE. This strategy outperforms the pure QE systems by 5 points!

82 Model #5: Combining Linear, Neural, and APE F1-MULT on WMT16: UGENT system, LinearQE, NeuralQE, StackedQE, APE-QE, FullStackedQE. Combining all the models together, we get another improvement of 2 points!

83 Examples Source Combines the hue value of the blend color with the luminance and saturation of the base color to create the result color. MT Kombiniert den Farbton Wert der Angleichungsfarbe mit der Luminanz und Sättigung der Grundfarbe zu erstellen. PE (Reference) Kombiniert den Farbtonwert der Mischfarbe mit der Luminanz und Sättigung der Grundfarbe. APE Kombiniert den Farbton der Mischfarbe mit der Luminanz und die Sättigung der Grundfarbe, um die Ergebnisfarbe zu erstellen. StackedQE Kombiniert den Farbton Wert der Angleichungsfarbe mit der Luminanz und Sättigung der Grundfarbe zu erstellen. ApeQE Kombiniert den Farbton Wert der Angleichungsfarbe mit der Luminanz und Sättigung der Grundfarbe zu erstellen. FullStackedQE Kombiniert den Farbton Wert der Angleichungsfarbe mit der Luminanz und Sättigung der Grundfarbe zu erstellen.

84 Performance over Sentence Length Figure: Average number of words predicted as BAD by the different systems on the WMT16 gold dev set, for different bins of sentence length.

85 Sentence-Level Quality Estimation Now that we have a strong word-level system, we apply a very simple procedure to obtain sentence-level predictions: For pure QE: convert word-level predictions to a sentence HTER prediction by counting the fraction of BAD words For APE: use the HTER between t and p directly For the combined system: average of the two above
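The three conversions above can be sketched in a few lines (plain word-level Levenshtein stands in for TER, which additionally allows block shifts):

```python
def sentence_hter_estimate(word_labels, mt=None, ape=None):
    """Simple word-to-sentence conversion: (a) fraction of BAD word-level
    labels; (b) if an APE output is available, the edit distance between
    MT and APE normalized by the APE length; the combined system averages
    the two. No further training or tuning involved."""
    frac_bad = word_labels.count("BAD") / len(word_labels)
    if ape is None:
        return frac_bad                      # pure QE prediction
    t, p = mt.split(), ape.split()
    m, n = len(t), len(p)
    # word-level Levenshtein between MT output t and post-edit p
    d = [[i + j if i * j == 0 else 0 for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if t[i - 1] == p[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    ape_hter = d[m][n] / n                   # APE-based prediction
    return (frac_bad + ape_hter) / 2         # combined system
```

For example, with all-OK word labels but an APE output that inserts one word into a four-word MT sentence, the combined estimate is (0 + 1/5) / 2 = 0.1.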

86 Sentence-Level Quality Estimation Pearson correlation on WMT16: YANDEX system, StackedQE, APE-QE, FullStackedQE. YANDEX was the best sentence-level system at WMT16 (Kozlova et al., 2016) Our simple conversion led to impressive results, well above the SOTA!

87 This Year's WMT17: Other Competitive Methods

89 Conclusions New SOTA systems for word-level and sentence-level QE, considerably more accurate than previously existing systems First, we proposed a new pure QE system which stacks a linear and a neural system Then, by relating the tasks of APE and word-level QE, we derived a new APE-based QE system Finally, we combined the two systems via a full stacking architecture The full system was extended to sentence-level QE by virtue of a simple word-to-sentence conversion (no further training or tuning)

90 Outline 1 MT Evaluation & Quality Estimation 2 Pushing the Limits of Quality Estimation 3 The Future

91 The Future of QE Word-level: go beyond OK/BAD labels look at words missing from the source (particularly relevant for NMT) mistakes of NMT systems have different patterns than PBMT how to estimate adequacy? Sentence-level: multi-task learning with word-level predict MQM scores (useful for quality assurance of crowd-sourced translation) Document-level: take inter-sentential context into account (very relevant for chat) evaluate the global coherence of the translation

92 The Future of QE Beyond MT: human translation quality estimation much harder: humans have higher variability hard to distinguish good non-literal translations from complete rubbish

93 We're Hiring! Excited about MT, crowdsourcing and Lisbon?

94 Acknowledgments EXPERT project (EU Marie Curie ITN No ) Fundação para a Ciência e Tecnologia (FCT), through contracts UID/EEA/50008/2013 and UID/CEC/50021/2013 LearnBig project (PTDC/EEI-SII/7092/2014) GoLocal project (grant CMUPERI/TIC/0046/2014) Amazon Academic Research Awards program

95 References I
Blatz, J., Fitzgerald, E., Foster, G., Gandrabur, S., Goutte, C., Kulesza, A., Sanchis, A., and Ueffing, N. (2004). Confidence estimation for machine translation. In Proc. of the International Conference on Computational Linguistics, page 315.
Cohen, W. W. and de Carvalho, V. R. (2005). Stacked sequential learning. In IJCAI.
Junczys-Dowmunt, M. and Grundkiewicz, R. (2016). Log-linear combinations of monolingual and bilingual neural machine translation models for automatic post-editing. In Proceedings of the First Conference on Machine Translation, Berlin, Germany. Association for Computational Linguistics.
Kozlova, A., Shmatova, M., and Frolov, A. (2016). YSDA participation in the WMT'16 Quality Estimation Shared Task. In Proceedings of the First Conference on Machine Translation.
Kreutzer, J., Schamoni, S., and Riezler, S. (2015). Quality Estimation from Scratch (QUETCH): Deep learning for word-level translation quality estimation. In Proceedings of the Tenth Workshop on Statistical Machine Translation.
Martins, A. F. T., Das, D., Smith, N. A., and Xing, E. P. (2008). Stacking dependency parsers. In Proc. of Empirical Methods in Natural Language Processing.
Simard, M., Ueffing, N., Isabelle, P., and Kuhn, R. (2007). Rule-based translation with statistical phrase-based post-editing. In Proceedings of the Second Workshop on Statistical Machine Translation.

96 References II
Snover, M., Dorr, B., Schwartz, R., Micciulla, L., and Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas.
Specia, L., Shah, K., de Souza, J. G., and Cohn, T. (2013). QuEst: a translation quality estimation framework. In Proc. of the Annual Meeting of the Association for Computational Linguistics: System Demonstrations.
Tezcan, A., Hoste, V., and Macken, L. (2016). UGENT-LT3 SCATE submission for the WMT16 shared task on quality estimation. In Proceedings of the First Conference on Machine Translation, Berlin, Germany. Association for Computational Linguistics.
Ueffing, N. and Ney, H. (2007). Word-level confidence estimation for machine translation. Computational Linguistics, 33(1):9-40.


More information

Learning Latent Linguistic Structure to Optimize End Tasks. David A. Smith with Jason Naradowsky and Xiaoye Tiger Wu

Learning Latent Linguistic Structure to Optimize End Tasks. David A. Smith with Jason Naradowsky and Xiaoye Tiger Wu Learning Latent Linguistic Structure to Optimize End Tasks David A. Smith with Jason Naradowsky and Xiaoye Tiger Wu 12 October 2012 Learning Latent Linguistic Structure to Optimize End Tasks David A. Smith

More information

Semantic image search using queries

Semantic image search using queries Semantic image search using queries Shabaz Basheer Patel, Anand Sampat Department of Electrical Engineering Stanford University CA 94305 shabaz@stanford.edu,asampat@stanford.edu Abstract Previous work,

More information

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based

More information

Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 12: Deep Reinforcement Learning

Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 12: Deep Reinforcement Learning Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound Lecture 12: Deep Reinforcement Learning Types of Learning Supervised training Learning from the teacher Training data includes

More information

Natural Language Processing with Deep Learning CS224N/Ling284

Natural Language Processing with Deep Learning CS224N/Ling284 Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 8: Recurrent Neural Networks Christopher Manning and Richard Socher Organization Extra project office hour today after lecture Overview

More information

Improving the quality of a customized SMT system using shared training data. August 28, 2009

Improving the quality of a customized SMT system using shared training data.  August 28, 2009 Improving the quality of a customized SMT system using shared training data Chris.Wendt@microsoft.com Will.Lewis@microsoft.com August 28, 2009 1 Overview Engine and Customization Basics Experiment Objective

More information

Neural Machine Translation In Linear Time

Neural Machine Translation In Linear Time Neural Machine Translation In Linear Time Authors: Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, Koray Kavukcuoglu Presenter: SunMao sm4206 YuZheng yz2978 OVERVIEW

More information

Lecture 14: Annotation

Lecture 14: Annotation Lecture 14: Annotation Nathan Schneider (with material from Henry Thompson, Alex Lascarides) ENLP 23 October 2016 1/14 Annotation Why gold 6= perfect Quality Control 2/14 Factors in Annotation Suppose

More information

The HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10

The HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10 The HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10 Patrick Simianer, Gesa Stupperich, Laura Jehl, Katharina Wäschle, Artem Sokolov, Stefan Riezler Institute for Computational Linguistics,

More information

Metric Learning for Large-Scale Image Classification:

Metric Learning for Large-Scale Image Classification: Metric Learning for Large-Scale Image Classification: Generalizing to New Classes at Near-Zero Cost Florent Perronnin 1 work published at ECCV 2012 with: Thomas Mensink 1,2 Jakob Verbeek 2 Gabriela Csurka

More information

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es

More information

Spatial Localization and Detection. Lecture 8-1

Spatial Localization and Detection. Lecture 8-1 Lecture 8: Spatial Localization and Detection Lecture 8-1 Administrative - Project Proposals were due on Saturday Homework 2 due Friday 2/5 Homework 1 grades out this week Midterm will be in-class on Wednesday

More information

Institute of Computational Linguistics Treatment of Markup in Statistical Machine Translation

Institute of Computational Linguistics Treatment of Markup in Statistical Machine Translation Institute of Computational Linguistics Treatment of Markup in Statistical Machine Translation Master s Thesis Mathias Müller September 13, 2017 Motivation Researchers often give themselves the luxury of

More information

Improving Stack Overflow Tag Prediction Using Eye Tracking Alina Lazar Youngstown State University Bonita Sharif, Youngstown State University

Improving Stack Overflow Tag Prediction Using Eye Tracking Alina Lazar Youngstown State University Bonita Sharif, Youngstown State University Improving Stack Overflow Tag Prediction Using Eye Tracking Alina Lazar, Youngstown State University Bonita Sharif, Youngstown State University Jenna Wise, Youngstown State University Alyssa Pawluk, Youngstown

More information

Natural Language Processing

Natural Language Processing Natural Language Processing Classification III Dan Klein UC Berkeley 1 Classification 2 Linear Models: Perceptron The perceptron algorithm Iteratively processes the training set, reacting to training errors

More information

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es

More information

DCU System Report on the WMT 2017 Multi-modal Machine Translation Task

DCU System Report on the WMT 2017 Multi-modal Machine Translation Task DCU System Report on the WMT 2017 Multi-modal Machine Translation Task Iacer Calixto and Koel Dutta Chowdhury and Qun Liu {iacer.calixto,koel.chowdhury,qun.liu}@adaptcentre.ie Abstract We report experiments

More information

Let s get parsing! Each component processes the Doc object, then passes it on. doc.is_parsed attribute checks whether a Doc object has been parsed

Let s get parsing! Each component processes the Doc object, then passes it on. doc.is_parsed attribute checks whether a Doc object has been parsed Let s get parsing! SpaCy default model includes tagger, parser and entity recognizer nlp = spacy.load('en ) tells spacy to use "en" with ["tagger", "parser", "ner"] Each component processes the Doc object,

More information

Integration of QuEst and Okapi

Integration of QuEst and Okapi Integration of QuEst and Okapi Final Activity Report Participants: Gustavo Paetzold, Lucia Specia and Kashif Shah (Sheffield University) Yves Savourel (Okapi) 1. Introduction The proposed project aimed

More information

A Simple (?) Exercise: Predicting the Next Word

A Simple (?) Exercise: Predicting the Next Word CS11-747 Neural Networks for NLP A Simple (?) Exercise: Predicting the Next Word Graham Neubig Site https://phontron.com/class/nn4nlp2017/ Are These Sentences OK? Jane went to the store. store to Jane

More information

CS395T Project 2: Shift-Reduce Parsing

CS395T Project 2: Shift-Reduce Parsing CS395T Project 2: Shift-Reduce Parsing Due date: Tuesday, October 17 at 9:30am In this project you ll implement a shift-reduce parser. First you ll implement a greedy model, then you ll extend that model

More information

Lign/CSE 256, Programming Assignment 1: Language Models

Lign/CSE 256, Programming Assignment 1: Language Models Lign/CSE 256, Programming Assignment 1: Language Models 16 January 2008 due 1 Feb 2008 1 Preliminaries First, make sure you can access the course materials. 1 The components are: ˆ code1.zip: the Java

More information

Predicting Popular Xbox games based on Search Queries of Users

Predicting Popular Xbox games based on Search Queries of Users 1 Predicting Popular Xbox games based on Search Queries of Users Chinmoy Mandayam and Saahil Shenoy I. INTRODUCTION This project is based on a completed Kaggle competition. Our goal is to predict which

More information

Transition-based Parsing with Neural Nets

Transition-based Parsing with Neural Nets CS11-747 Neural Networks for NLP Transition-based Parsing with Neural Nets Graham Neubig Site https://phontron.com/class/nn4nlp2017/ Two Types of Linguistic Structure Dependency: focus on relations between

More information

BBN System Description for WMT10 System Combination Task

BBN System Description for WMT10 System Combination Task BBN System Description for WMT10 System Combination Task Antti-Veikko I. Rosti and Bing Zhang and Spyros Matsoukas and Richard Schwartz Raytheon BBN Technologies, 10 Moulton Street, Cambridge, MA 018,

More information

Statistical Machine Translation Part IV Log-Linear Models

Statistical Machine Translation Part IV Log-Linear Models Statistical Machine Translation art IV Log-Linear Models Alexander Fraser Institute for Natural Language rocessing University of Stuttgart 2011.11.25 Seminar: Statistical MT Where we have been We have

More information

16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning. Spring 2018 Lecture 14. Image to Text

16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning. Spring 2018 Lecture 14. Image to Text 16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning Spring 2018 Lecture 14. Image to Text Input Output Classification tasks 4/1/18 CMU 16-785: Integrated Intelligence in Robotics

More information

Allstate Insurance Claims Severity: A Machine Learning Approach

Allstate Insurance Claims Severity: A Machine Learning Approach Allstate Insurance Claims Severity: A Machine Learning Approach Rajeeva Gaur SUNet ID: rajeevag Jeff Pickelman SUNet ID: pattern Hongyi Wang SUNet ID: hongyiw I. INTRODUCTION The insurance industry has

More information

The Perceptron. Simon Šuster, University of Groningen. Course Learning from data November 18, 2013

The Perceptron. Simon Šuster, University of Groningen. Course Learning from data November 18, 2013 The Perceptron Simon Šuster, University of Groningen Course Learning from data November 18, 2013 References Hal Daumé III: A Course in Machine Learning http://ciml.info Tom M. Mitchell: Machine Learning

More information

Sentence-Level Confidence Estimation for MT

Sentence-Level Confidence Estimation for MT Sentence-Level Confidence Estimation for MT Lucia Specia Nicola Cancedda, Marc Dymetman, Craig Saunders - Xerox Marco Turchi, Nello Cristianini Univ. Bristol Zhuoran Wang, John Shawe-Taylor UCL Outline

More information

Automatic Domain Partitioning for Multi-Domain Learning

Automatic Domain Partitioning for Multi-Domain Learning Automatic Domain Partitioning for Multi-Domain Learning Di Wang diwang@cs.cmu.edu Chenyan Xiong cx@cs.cmu.edu William Yang Wang ww@cmu.edu Abstract Multi-Domain learning (MDL) assumes that the domain labels

More information

FastText. Jon Koss, Abhishek Jindal

FastText. Jon Koss, Abhishek Jindal FastText Jon Koss, Abhishek Jindal FastText FastText is on par with state-of-the-art deep learning classifiers in terms of accuracy But it is way faster: FastText can train on more than one billion words

More information

Adversarial Examples and Adversarial Training. Ian Goodfellow, Staff Research Scientist, Google Brain CS 231n, Stanford University,

Adversarial Examples and Adversarial Training. Ian Goodfellow, Staff Research Scientist, Google Brain CS 231n, Stanford University, Adversarial Examples and Adversarial Training Ian Goodfellow, Staff Research Scientist, Google Brain CS 231n, Stanford University, 2017-05-30 Overview What are adversarial examples? Why do they happen?

More information

Automatic Summarization

Automatic Summarization Automatic Summarization CS 769 Guest Lecture Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of Wisconsin, Madison February 22, 2008 Andrew B. Goldberg (CS Dept) Summarization

More information

Sparse Non-negative Matrix Language Modeling

Sparse Non-negative Matrix Language Modeling Sparse Non-negative Matrix Language Modeling Joris Pelemans Noam Shazeer Ciprian Chelba joris@pelemans.be noam@google.com ciprianchelba@google.com 1 Outline Motivation Sparse Non-negative Matrix Language

More information

The Goal of this Document. Where to Start?

The Goal of this Document. Where to Start? A QUICK INTRODUCTION TO THE SEMILAR APPLICATION Mihai Lintean, Rajendra Banjade, and Vasile Rus vrus@memphis.edu linteam@gmail.com rbanjade@memphis.edu The Goal of this Document This document introduce

More information

Python & Web Mining. Lecture Old Dominion University. Department of Computer Science CS 495 Fall 2012

Python & Web Mining. Lecture Old Dominion University. Department of Computer Science CS 495 Fall 2012 Python & Web Mining Lecture 6 10-10-12 Old Dominion University Department of Computer Science CS 495 Fall 2012 Hany SalahEldeen Khalil hany@cs.odu.edu Scenario So what did Professor X do when he wanted

More information

VERIZON ACHIEVES BEST IN TEST AND WINS THE P3 MOBILE BENCHMARK USA

VERIZON ACHIEVES BEST IN TEST AND WINS THE P3 MOBILE BENCHMARK USA VERIZON ACHIEVES BEST IN TEST AND WINS THE P3 MOBILE BENCHMARK USA INTRODUCTION in test 10/2018 Mobile Benchmark USA The international benchmarking expert P3 has been testing the performance of cellular

More information

Corpus Acquisition from the Internet

Corpus Acquisition from the Internet Corpus Acquisition from the Internet Philipp Koehn partially based on slides from Christian Buck 26 October 2017 Big Data 1 For many language pairs, lots of text available. Text you read in your lifetime

More information

Neural Nets & Deep Learning

Neural Nets & Deep Learning Neural Nets & Deep Learning The Inspiration Inputs Outputs Our brains are pretty amazing, what if we could do something similar with computers? Image Source: http://ib.bioninja.com.au/_media/neuron _med.jpeg

More information

Prototype of Silver Corpus Merging Framework

Prototype of Silver Corpus Merging Framework www.visceral.eu Prototype of Silver Corpus Merging Framework Deliverable number D3.3 Dissemination level Public Delivery data 30.4.2014 Status Authors Final Markus Krenn, Allan Hanbury, Georg Langs This

More information

CSCI 5417 Information Retrieval Systems. Jim Martin!

CSCI 5417 Information Retrieval Systems. Jim Martin! CSCI 5417 Information Retrieval Systems Jim Martin! Lecture 7 9/13/2011 Today Review Efficient scoring schemes Approximate scoring Evaluating IR systems 1 Normal Cosine Scoring Speedups... Compute the

More information

Recurrent Neural Networks

Recurrent Neural Networks Recurrent Neural Networks 11-785 / Fall 2018 / Recitation 7 Raphaël Olivier Recap : RNNs are magic They have infinite memory They handle all kinds of series They re the basis of recent NLP : Translation,

More information

CS229 Final Project: Predicting Expected Response Times

CS229 Final Project: Predicting Expected  Response Times CS229 Final Project: Predicting Expected Email Response Times Laura Cruz-Albrecht (lcruzalb), Kevin Khieu (kkhieu) December 15, 2017 1 Introduction Each day, countless emails are sent out, yet the time

More information

Sequence Prediction with Neural Segmental Models. Hao Tang

Sequence Prediction with Neural Segmental Models. Hao Tang Sequence Prediction with Neural Segmental Models Hao Tang haotang@ttic.edu About Me Pronunciation modeling [TKL 2012] Segmental models [TGL 2014] [TWGL 2015] [TWGL 2016] [TWGL 2016] American Sign Language

More information

End-To-End Spam Classification With Neural Networks

End-To-End Spam Classification With Neural Networks End-To-End Spam Classification With Neural Networks Christopher Lennan, Bastian Naber, Jan Reher, Leon Weber 1 Introduction A few years ago, the majority of the internet s network traffic was due to spam

More information

CMU-UKA Syntax Augmented Machine Translation

CMU-UKA Syntax Augmented Machine Translation Outline CMU-UKA Syntax Augmented Machine Translation Ashish Venugopal, Andreas Zollmann, Stephan Vogel, Alex Waibel InterACT, LTI, Carnegie Mellon University Pittsburgh, PA Outline Outline 1 2 3 4 Issues

More information

A General-Purpose Rule Extractor for SCFG-Based Machine Translation

A General-Purpose Rule Extractor for SCFG-Based Machine Translation A General-Purpose Rule Extractor for SCFG-Based Machine Translation Greg Hanneman, Michelle Burroughs, and Alon Lavie Language Technologies Institute Carnegie Mellon University Fifth Workshop on Syntax

More information

Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks

Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks Zelun Luo Department of Computer Science Stanford University zelunluo@stanford.edu Te-Lin Wu Department of

More information

Algorithms for NLP. Machine Translation. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley

Algorithms for NLP. Machine Translation. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Algorithms for NLP Machine Translation Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Machine Translation Machine Translation: Examples Levels of Transfer Word-Level MT: Examples la politique

More information

Lecture 7: Neural network acoustic models in speech recognition

Lecture 7: Neural network acoustic models in speech recognition CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 7: Neural network acoustic models in speech recognition Outline Hybrid acoustic modeling overview Basic

More information

A Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions

A Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions A Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions Marcin Junczys-Dowmunt Faculty of Mathematics and Computer Science Adam Mickiewicz University ul. Umultowska 87, 61-614

More information

TALP: Xgram-based Spoken Language Translation System Adrià de Gispert José B. Mariño

TALP: Xgram-based Spoken Language Translation System Adrià de Gispert José B. Mariño TALP: Xgram-based Spoken Language Translation System Adrià de Gispert José B. Mariño Outline Overview Outline Translation generation Training IWSLT'04 Chinese-English supplied task results Conclusion and

More information

Seq2SQL: Generating Structured Queries from Natural Language Using Reinforcement Learning

Seq2SQL: Generating Structured Queries from Natural Language Using Reinforcement Learning Seq2SQL: Generating Structured Queries from Natural Language Using Reinforcement Learning V. Zhong, C. Xiong, R. Socher Salesforce Research arxiv: 1709.00103 Reviewed by : Bill Zhang University of Virginia

More information

Structured Learning. Jun Zhu

Structured Learning. Jun Zhu Structured Learning Jun Zhu Supervised learning Given a set of I.I.D. training samples Learn a prediction function b r a c e Supervised learning (cont d) Many different choices Logistic Regression Maximum

More information

Lecture 19: Generative Adversarial Networks

Lecture 19: Generative Adversarial Networks Lecture 19: Generative Adversarial Networks Roger Grosse 1 Introduction Generative modeling is a type of machine learning where the aim is to model the distribution that a given set of data (e.g. images,

More information

QuEst++ Tutorial. Carolina Scarton and Lucia Specia January 19, 2016

QuEst++ Tutorial. Carolina Scarton and Lucia Specia January 19, 2016 QuEst++ Tutorial Carolina Scarton and Lucia Specia January 19, 2016 Abstract In this tutorial we present QuEst++, an open source framework for pipelined Translation Quality Estimation. QuEst++ is the newest

More information

Practical Methodology. Lecture slides for Chapter 11 of Deep Learning Ian Goodfellow

Practical Methodology. Lecture slides for Chapter 11 of Deep Learning  Ian Goodfellow Practical Methodology Lecture slides for Chapter 11 of Deep Learning www.deeplearningbook.org Ian Goodfellow 2016-09-26 What drives success in ML? Arcane knowledge of dozens of obscure algorithms? Mountains

More information

Author Verification: Exploring a Large set of Parameters using a Genetic Algorithm

Author Verification: Exploring a Large set of Parameters using a Genetic Algorithm Author Verification: Exploring a Large set of Parameters using a Genetic Algorithm Notebook for PAN at CLEF 2014 Erwan Moreau 1, Arun Jayapal 2, and Carl Vogel 3 1 moreaue@cs.tcd.ie 2 jayapala@cs.tcd.ie

More information

Kyoto-NMT: a Neural Machine Translation implementation in Chainer

Kyoto-NMT: a Neural Machine Translation implementation in Chainer Kyoto-NMT: a Neural Machine Translation implementation in Chainer Fabien Cromières Japan Science and Technology Agency Kawaguchi-shi, Saitama 332-0012 fabien@pa.jst.jp Abstract We present Kyoto-NMT, an

More information

Lecture 20: Neural Networks for NLP. Zubin Pahuja

Lecture 20: Neural Networks for NLP. Zubin Pahuja Lecture 20: Neural Networks for NLP Zubin Pahuja zpahuja2@illinois.edu courses.engr.illinois.edu/cs447 CS447: Natural Language Processing 1 Today s Lecture Feed-forward neural networks as classifiers simple

More information

CS 288: Statistical NLP Assignment 1: Language Modeling

CS 288: Statistical NLP Assignment 1: Language Modeling CS 288: Statistical NLP Assignment 1: Language Modeling Due September 12, 2014 Collaboration Policy You are allowed to discuss the assignment with other students and collaborate on developing algorithms

More information

Parsing with Dynamic Programming

Parsing with Dynamic Programming CS11-747 Neural Networks for NLP Parsing with Dynamic Programming Graham Neubig Site https://phontron.com/class/nn4nlp2017/ Two Types of Linguistic Structure Dependency: focus on relations between words

More information

Final Project Discussion. Adam Meyers Montclair State University

Final Project Discussion. Adam Meyers Montclair State University Final Project Discussion Adam Meyers Montclair State University Summary Project Timeline Project Format Details/Examples for Different Project Types Linguistic Resource Projects: Annotation, Lexicons,...

More information

NLP in practice, an example: Semantic Role Labeling

NLP in practice, an example: Semantic Role Labeling NLP in practice, an example: Semantic Role Labeling Anders Björkelund Lund University, Dept. of Computer Science anders.bjorkelund@cs.lth.se October 15, 2010 Anders Björkelund NLP in practice, an example:

More information

Bayesian model ensembling using meta-trained recurrent neural networks

Bayesian model ensembling using meta-trained recurrent neural networks Bayesian model ensembling using meta-trained recurrent neural networks Luca Ambrogioni l.ambrogioni@donders.ru.nl Umut Güçlü u.guclu@donders.ru.nl Yağmur Güçlütürk y.gucluturk@donders.ru.nl Julia Berezutskaya

More information

Transition-based dependency parsing

Transition-based dependency parsing Transition-based dependency parsing Syntactic analysis (5LN455) 2014-12-18 Sara Stymne Department of Linguistics and Philology Based on slides from Marco Kuhlmann Overview Arc-factored dependency parsing

More information

TTIC 31190: Natural Language Processing

TTIC 31190: Natural Language Processing TTIC 31190: Natural Language Processing Kevin Gimpel Winter 2016 Lecture 2: Text Classification 1 Please email me (kgimpel@ttic.edu) with the following: your name your email address whether you taking

More information

Project 3 Q&A. Jonathan Krause

Project 3 Q&A. Jonathan Krause Project 3 Q&A Jonathan Krause 1 Outline R-CNN Review Error metrics Code Overview Project 3 Report Project 3 Presentations 2 Outline R-CNN Review Error metrics Code Overview Project 3 Report Project 3 Presentations

More information

Feature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule.

Feature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule. CS 188: Artificial Intelligence Fall 2008 Lecture 24: Perceptrons II 11/24/2008 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit

More information

A bit of theory: Algorithms

A bit of theory: Algorithms A bit of theory: Algorithms There are different kinds of algorithms Vector space models. e.g. support vector machines Decision trees, e.g. C45 Probabilistic models, e.g. Naive Bayes Neural networks, e.g.

More information

Experimenting in MT: Moses Toolkit and Eman

Experimenting in MT: Moses Toolkit and Eman Experimenting in MT: Moses Toolkit and Eman Aleš Tamchyna, Ondřej Bojar Institute of Formal and Applied Linguistics Faculty of Mathematics and Physics Charles University, Prague Tue Sept 8, 2015 1 / 30

More information

Lecture #11: The Perceptron

Lecture #11: The Perceptron Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be

More information

A Quick Guide to MaltParser Optimization

A Quick Guide to MaltParser Optimization A Quick Guide to MaltParser Optimization Joakim Nivre Johan Hall 1 Introduction MaltParser is a system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data

More information

Know your data - many types of networks

Know your data - many types of networks Architectures Know your data - many types of networks Fixed length representation Variable length representation Online video sequences, or samples of different sizes Images Specific architectures for

More information

CSE 573: Artificial Intelligence Autumn 2010

CSE 573: Artificial Intelligence Autumn 2010 CSE 573: Artificial Intelligence Autumn 2010 Lecture 16: Machine Learning Topics 12/7/2010 Luke Zettlemoyer Most slides over the course adapted from Dan Klein. 1 Announcements Syllabus revised Machine

More information

Tokenization and Sentence Segmentation. Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017

Tokenization and Sentence Segmentation. Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017 Tokenization and Sentence Segmentation Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017 Outline 1 Tokenization Introduction Exercise Evaluation Summary 2 Sentence segmentation

More information

DeepPose & Convolutional Pose Machines

DeepPose & Convolutional Pose Machines DeepPose & Convolutional Pose Machines Main Concepts 1. 2. 3. 4. CNN with regressor head. Object Localization. Bigger to smaller or Smaller to bigger view. Deep Supervised learning to prevent Vanishing

More information

An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation

An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio Université de Montréal 13/06/2007

More information

D6.1.2: Second report on scientific evaluations

D6.1.2: Second report on scientific evaluations D6.1.2: Second report on scientific evaluations UPVLC, XEROX, JSI-K4A, RWTH, EML and DDS Distribution: Public translectures Transcription and Translation of Video Lectures ICT Project 287755 Deliverable

More information

Domain-specific Concept-based Information Retrieval System

Domain-specific Concept-based Information Retrieval System Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical

More information

Linguistic Resources for Arabic Handwriting Recognition

Linguistic Resources for Arabic Handwriting Recognition Linguistic Resources for Arabic Handwriting Recognition Stephanie M. Strassel Linguistic Data Consortium 3600 Market Street, Suite 810 Philadelphia, PA 19104 USA strassel@ldc.upenn.edu Abstract MADCAT

More information

Context Encoding LSTM CS224N Course Project

Context Encoding LSTM CS224N Course Project Context Encoding LSTM CS224N Course Project Abhinav Rastogi arastogi@stanford.edu Supervised by - Samuel R. Bowman December 7, 2015 Abstract This project uses ideas from greedy transition based parsing

More information

D5.6 - Evaluation Benchmarks

D5.6 - Evaluation Benchmarks Th is document is part of the Project Machine Tr a n sla tion En h a n ced Com pu ter A ssisted Tr a n sla tion (Ma teca t ), fu nded by the 7th Framework Programme of th e Eu r opea n Com m ission th

More information

Metric Learning for Large Scale Image Classification:

Metric Learning for Large Scale Image Classification: Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost Thomas Mensink 1,2 Jakob Verbeek 2 Florent Perronnin 1 Gabriela Csurka 1 1 TVPA - Xerox Research Centre

More information

Instance-based Learning

Instance-based Learning Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 15 th, 2007 2005-2007 Carlos Guestrin 1 1-Nearest Neighbor Four things make a memory based learner:

More information

Semantic Estimation for Texts in Software Engineering

Semantic Estimation for Texts in Software Engineering Semantic Estimation for Texts in Software Engineering 汇报人 : Reporter:Xiaochen Li Dalian University of Technology, China 大连理工大学 2016 年 11 月 29 日 Oscar Lab 2 Ph.D. candidate at OSCAR Lab, in Dalian University

More information

Regularization and model selection

Regularization and model selection CS229 Lecture notes Andrew Ng Part VI Regularization and model selection Suppose we are trying select among several different models for a learning problem. For instance, we might be using a polynomial

More information