Translation Quality Estimation: Past, Present, and Future
1 Translation Quality Estimation: Past, Present, and Future André Martins MT Marathon, Lisbon, August 31st, 2017 André Martins (Unbabel) Quality Estimation MTM, 31/8/17 1 / 69
2 This Talk First part: largely based on Lucia Specia's MTM16 slides. Second part: joint work with Marcin, Fabio, Ramon, Chris, Roman. Third part: my thoughts on the future of QE. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 2 / 69
3 Outline 1 MT Evaluation & Quality Estimation 2 Pushing the Limits of Quality Estimation 3 The Future André Martins (Unbabel) Quality Estimation MTM, 31/8/17 3 / 69
4 Why Do We Care About Evaluation? In the business of developing MT, we need to: measure progress over new/alternative versions; compare different MT systems; decide whether a translation is good enough for a given purpose; optimize parameters of MT systems; understand where systems go wrong (diagnosis)... remember Yvette's lecture on Monday: André Martins (Unbabel) Quality Estimation MTM, 31/8/17 4 / 69
5 Why Do We Care About Evaluation? One should optimize a system using the same metric that will be used to evaluate it. Issue: how to choose a metric? The choice should be related to the system's purpose (not the case in practice). Other aspects are important for tuning (sentence/corpus-level, fast, cheap, differentiable,...) André Martins (Unbabel) Quality Estimation MTM, 31/8/17 5 / 69
7 Complex Problem What does quality mean? Fluent? Adequate? Both? Easy to post-edit? System A better than system B?... Quality for whom/what? End-user (gisting vs dissemination) Post-editor (light vs heavy post-editing) Other applications (e.g. CLIR) MT-system (tuning or diagnosis for improvement)... André Martins (Unbabel) Quality Estimation MTM, 31/8/17 6 / 69
10 Complex Problem MT Do buy this product, it's their craziest invention! HT Do not buy this product, it's their craziest invention! Severe if the end-user does not speak the source language; trivial to post-edit by translators. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 7 / 69
13 Complex Problem MT Six-hours battery, 30 minutes to full charge last. HT The battery lasts 6 hours and it can be fully recharged in 30 minutes. OK for gisting (meaning preserved); very costly for post-editing if style is to be preserved. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 8 / 69
14 A Taxonomy of MT Evaluation Methods Manual Automatic André Martins (Unbabel) Quality Estimation MTM, 31/8/17 9 / 69
15 A Taxonomy of MT Evaluation Methods Manual: Direct assessment (Scoring). Automatic. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 9 / 69
16 Manual Assessment: Scoring Is this translation correct? André Martins (Unbabel) Quality Estimation MTM, 31/8/17 10 / 69
17 A Taxonomy of MT Evaluation Methods Manual: Direct assessment (Scoring, Ranking). Automatic. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 11 / 69
18 Manual Assessment: Ranking André Martins (Unbabel) Quality Estimation MTM, 31/8/17 12 / 69
19 A Taxonomy of MT Evaluation Methods Manual: Direct assessment (Scoring, Ranking), Error annotation. Automatic. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 13 / 69
20 MQM (Multidimensional Quality Metrics) André Martins (Unbabel) Quality Estimation MTM, 31/8/17 14 / 69
21 A Taxonomy of MT Evaluation Methods Manual: Direct assessment (Scoring, Ranking), Error annotation, Task-based (Post-editing). Automatic. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 15 / 69
22 Amount of Post-Editing HTER André Martins (Unbabel) Quality Estimation MTM, 31/8/17 16 / 69
24 A Taxonomy of MT Evaluation Methods Manual: Direct assessment (Scoring, Ranking), Error annotation, Task-based (Post-editing, Reading comprehension). Automatic. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 17 / 69
25 Reading Comprehension André Martins (Unbabel) Quality Estimation MTM, 31/8/17 18 / 69
26 A Taxonomy of MT Evaluation Methods Manual: Direct assessment (Scoring, Ranking), Error annotation, Task-based (Post-editing, Reading comprehension, Eye-tracking). Automatic. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 19 / 69
27 Eye-Tracking André Martins (Unbabel) Quality Estimation MTM, 31/8/17 20 / 69
28 A Taxonomy of MT Evaluation Methods Manual: Direct assessment (Scoring, Ranking), Error annotation, Task-based (Post-editing, Reading comprehension, Eye-tracking). Automatic: Reference-based. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 21 / 69
29 A Taxonomy of MT Evaluation Methods Manual: Direct assessment (Scoring, Ranking), Error annotation, Task-based (Post-editing, Reading comprehension, Eye-tracking). Automatic: Reference-based (BLEU, Meteor, NIST, TER, WER, PER, CDER, BEER, CiDER, Cobalt, RATATOUILLE, RED, AMBER, PARMESAN,...), Quality estimation. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 21 / 69
32 Reference-Based Evaluation Reference(s): a subset of the good translations, usually just one. Some metrics expand matching, e.g. synonyms in Meteor. There is huge variation in reference translations, e.g.:
Source: 不过这一切都由不得你 (literal gloss: However these all totally beyond the control of you.)
MT: But all this is beyond the control of you.
HT 1: But all this is beyond your control
HT 2: However, you cannot choose yourself
HT 3: However, not everything is up to you to decide
HT 4: But you can't choose that
Metrics completely disregard the source segment. Main problem: cannot be applied to MT systems in use. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 22 / 69
39 Quality Estimation (Specia et al., 2013) Quality Estimation (QE): metrics that provide an estimate on the quality of translations on the fly Quality defined by the data: purpose is clear, no comparison to references, source considered Quality = Can we publish it as is? Quality = Can a reader get the gist? Quality = Is it worth post-editing it? Quality = How much effort to fix it? André Martins (Unbabel) Quality Estimation MTM, 31/8/17 24 / 69
40 Related: Confidence in MT (Blatz et al., 2004; Ueffing and Ney, 2007) Goal: augment the MT system to produce a confidence score. Quality Estimation is slightly more general and advantageous: it does not require access to the internals of the MT system (i.e. it can treat it as a black box), and it makes it possible to use several MT systems, several domains, etc. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 25 / 69
41 Quality Estimation: Framework Building a model: X (examples of source texts & their translations) goes through feature extraction to produce features; these, paired with Y (quality scores for the examples in X), are fed to machine learning to produce the QE model. (Slide credit: Lucia Specia) André Martins (Unbabel) Quality Estimation MTM, 31/8/17 26 / 69
42 Quality Estimation: Framework Applying the model: a source text s' and its translation t' from the MT system go through feature extraction; the resulting features are fed to the QE model, which outputs a quality score y'. (Slide credit: Lucia Specia) André Martins (Unbabel) Quality Estimation MTM, 31/8/17 26 / 69
43 Data and Levels of Granularity Sentence level: 1-5 subjective scores, PE time, PE edits. Word level: good/bad, good/delete/replace, MQM. Phrase level: good/bad. Document level: PE effort. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 27 / 69
44 Features and Algorithms Complexity features (source text), adequacy features (source-translation pair, over aligned windows s-1 s s+1 / t-1 t t+1), confidence features (MT system), fluency features (translation). Learning algorithms can be used off-the-shelf. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 28 / 69
47 Sentence-Level QE: Baseline Setting Features: number of tokens in the source and target sentences; average source token length; average number of occurrences of words in the target; number of punctuation marks in source and target sentences; LM probability of source and target sentences; average number of translations per source word; % of seen source n-grams. Learner: SVM regression with an RBF kernel. Implementation: the QuEst framework (Specia et al., 2013). André Martins (Unbabel) Quality Estimation MTM, 31/8/17 29 / 69
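A few of these black-box features can be sketched in a handful of lines. The helper below is illustrative only (the function name and feature subset are mine, and the LM and lexical-translation features are omitted since they need external models):

```python
# Hypothetical sketch of a few WMT-baseline QE features; not the exact
# QuEst implementation.
import string

def baseline_features(source, target):
    """Compute a few shallow QE features for one sentence pair."""
    src_toks = source.split()
    tgt_toks = target.split()
    punct = set(string.punctuation)
    return {
        "n_src_tokens": len(src_toks),
        "n_tgt_tokens": len(tgt_toks),
        "avg_src_token_len": sum(map(len, src_toks)) / max(len(src_toks), 1),
        # average number of occurrences of each distinct target word
        "avg_tgt_occurrences": len(tgt_toks) / max(len(set(tgt_toks)), 1),
        "n_src_punct": sum(t in punct for t in src_toks),
        "n_tgt_punct": sum(t in punct for t in tgt_toks),
    }

feats = baseline_features("the battery lasts 6 hours .",
                          "der Akku hält 6 Stunden .")
print(feats["n_src_tokens"])  # 6
```

In the baseline setting, 17 such features per sentence pair are fed to an SVM regressor with an RBF kernel (e.g. scikit-learn's SVR).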
48 Sentence-Level QE: SOTA Predicting HTER (WMT16), English-German; systems ranked by Pearson/Spearman correlation: YSDA/SNTX+BLEU+SVM, POSTECH/SENT-RNN-QV, SHEF-LIUM/SVM-NN-emb-QuEst, POSTECH/SENT-RNN-QV, SHEF-LIUM/SVM-NN-both-emb, UGENT-LT3/SCATE-SVM, UFAL/MULTIVEC, RTM/RTM-FS-SVR, UU/UU-SVM, UGENT-LT3/SCATE-SVM, RTM/RTM-SVR, Baseline SVM, SHEF/SimpleNets-SRC, SHEF/SimpleNets-TGT. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 30 / 69
49 Predicting HTER (WMT17) Sentence-Level QE: SOTA André Martins (Unbabel) Quality Estimation MTM, 31/8/17 31 / 69
50 Word-Level QE: SOTA Word-level OK/BAD labels (WMT16) André Martins (Unbabel) Quality Estimation MTM, 31/8/17 32 / 69
51 Word-Level QE: SOTA Word-level OK/BAD labels (WMT17) André Martins (Unbabel) Quality Estimation MTM, 31/8/17 33 / 69
52 Outline 1 MT Evaluation & Quality Estimation 2 Pushing the Limits of Quality Estimation 3 The Future André Martins (Unbabel) Quality Estimation MTM, 31/8/17 34 / 69
53 Pushing the Limits of Quality Estimation Recent TACL paper: André F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramón Astudillo, Chris Hokamp, Roman Grundkiewicz. Pushing the Limits of Quality Estimation. TACL, 5: , André Martins (Unbabel) Quality Estimation MTM, 31/8/17 35 / 69
56 In a Nutshell Quality estimation: evaluate a translation on the fly with no reference. Useful for predicting (or sidestepping) human post-editing effort. Until now: not really accurate enough for practical use. This paper: considerable improvements by: stacking a neural system and a linear sequential model; using automatic post-editing as an auxiliary task. Overall (on the WMT16 En-De dataset): 57.47% (a gain of 7.95%) in word-level F_mult, and a gain of 13.26% in sentence-level Pearson score. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 36 / 69
57 But isn't MT nearly indistinguishable from humans now? André Martins (Unbabel) Quality Estimation MTM, 31/8/17 37 / 69
60 Wrong translation! We can fix it with a human post-editor! André Martins (Unbabel) Quality Estimation MTM, 31/8/17 38 / 69
62 quality estimation BAD BAD BAD OK OK OK Le travail de La traduction automatique fonctionne-t-elle? We can fix it with a human post-editor! André Martins (Unbabel) Quality Estimation MTM, 31/8/17 38 / 69
63 Quality Estimation (Blatz et al., 2004; Specia et al., 2013) BAD BAD BAD OK OK OK Le travail de La traduction automatique fonctionne-t-elle? Sentence-level QE: predict edit distance (HTER). Word-level QE: predict an OK/BAD label for each translated word. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 39 / 69
64 Quality Estimation (Blatz et al., 2004; Specia et al., 2013) BAD BAD BAD OK OK OK Le travail de La traduction automatique fonctionne-t-elle? Sentence-level QE: predict edit distance (HTER). Word-level QE: predict an OK/BAD label for each translated word. This paper: we first engineer a strong word-level QE system, then convert it to make sentence-level predictions too. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 39 / 69
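As a rough sketch of the sentence-level target, HTER can be approximated as the word-level edit distance between the MT output and its human post-edit, normalised by post-edit length. Real TER, as computed by TERCOM, also allows block shifts; this illustrative helper (names are mine) omits them:

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over tokens (TER additionally allows block
    shifts; this sketch ignores them)."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def hter(mt, post_edit):
    """HTER: edits needed to turn the MT output into its human post-edit,
    normalised by the post-edit length."""
    mt_toks, pe_toks = mt.split(), post_edit.split()
    return word_edit_distance(mt_toks, pe_toks) / max(len(pe_toks), 1)

print(hter("Do buy this product", "Do not buy this product"))  # 0.2
```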
65 Why Quality Estimation? 1 Informs an end user about the reliability of translated content 2 Decides if a translation is good to go or requires a human to fix it 3 Highlights to a human post-editor the words that need to be revised André Martins (Unbabel) Quality Estimation MTM, 31/8/17 40 / 69
66 Unbabel Translation Pipeline (Credit: João Graça's EAMT presentation) André Martins (Unbabel) Quality Estimation MTM, 31/8/17 41 / 69
67 Dataset of Post-Edited Translations WMT 2016: English-German, IT domain 12,000 training sentences, 1,000 dev sentences, 2,000 test sentences Quality labels obtained from post-edited translations via TERCOM (Snover et al., 2006) In the paper: also experiments with WMT 2015 (English-Spanish) André Martins (Unbabel) Quality Estimation MTM, 31/8/17 42 / 69
68 Model #1: Linear Sequential Model First shot: a discriminative, shallow, CRF-like model. BAD BAD BAD OK OK OK The model predicts a label sequence ŷ_1:N ∈ {OK, BAD}^N:
ŷ_1:N = argmax_y Σ_{i=1..N} w·φ_u(s, t, A, y_i) (unigram features) + Σ_{i=1..N+1} w·φ_b(s, t, A, y_i, y_{i-1}) (bigram features)
André Martins (Unbabel) Quality Estimation MTM, 31/8/17 43 / 69
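Decoding this model is plain first-order Viterbi over the two labels. In the sketch below, the toy score tables stand in for the learned w·φ_u and w·φ_b:

```python
# Minimal Viterbi decoder for a first-order sequence model over {OK, BAD};
# the score tables are illustrative stand-ins for learned feature scores.
def viterbi(unigram, bigram, labels=("OK", "BAD")):
    """unigram: list of {label: score}, one dict per position;
    bigram: {(prev_label, label): score}. Returns the best label sequence."""
    n = len(unigram)
    best = [{} for _ in range(n)]
    back = [{} for _ in range(n)]
    for y in labels:
        best[0][y] = unigram[0][y]
    for i in range(1, n):
        for y in labels:
            scores = {p: best[i - 1][p] + bigram.get((p, y), 0.0)
                      for p in labels}
            prev = max(scores, key=scores.get)
            best[i][y] = scores[prev] + unigram[i][y]
            back[i][y] = prev
    y = max(best[-1], key=best[-1].get)
    path = [y]
    for i in range(n - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]

uni = [{"OK": 1.0, "BAD": 0.2},
       {"OK": 0.1, "BAD": 2.0},
       {"OK": 0.8, "BAD": 0.2}]
bi = {("OK", "OK"): 0.5, ("BAD", "BAD"): 0.5}
print(viterbi(uni, bi))  # ['OK', 'BAD', 'OK']
```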
69 Features
Unigram features (on label y_i): Bias; Word, LeftWord, RightWord; SourceWord, SourceLeftWord, SourceRightWord; LargestNGramLeft/Right, SourceLargestNGramLeft/Right; PosTag, SourcePosTag; Word+LeftWord, Word+RightWord; Word+SourceWord, PosTag+SourcePosTag.
Simple bigram features (on y_i, y_i-1): Bias.
Rich bigram features: all of the above on (y_i, y_i-1); plus Word+SourceWord and PosTag+SourcePosTag on (y_i+1, y_i).
Syntactic features (on y_i): DepRel, Word+DepRel; HeadWord/PosTag+Word/PosTag; LeftSibWord/PosTag+Word/PosTag; RightSibWord/PosTag+Word/PosTag; GrandWord/PosTag+HeadWord/PosTag+Word/PosTag.
(Inputs are referenced relative to the i-th target word.)
André Martins (Unbabel) Quality Estimation MTM, 31/8/17 44 / 69
70 Performance of Linear Model (Dev Set) F1-MULT on the WMT16 dev set, for: unigrams only, +simple bigrams, +rich bigrams, +syntactic (full). Large impact of the rich bigram features (3 points) and the syntactic features (another 2.5 points); the net improvement exceeds 6 points over the unigram model. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 45 / 69
71 Model #2: Neural Model Second shot: a model inspired by QUETCH (Kreutzer et al., 2015), a feedforward neural net whose inputs are target words and their aligned source words. We depart from QUETCH in a few aspects: we add recurrent layers; more depth; embeddings for the POS tags (not just the words); dropout regularization; layer normalization. In the paper: ablation experiments to validate the architecture. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 46 / 69
72 [Architecture diagram] NeuralQE: embedding layers for the target word, aligned source word, and their POS tags (64-dimensional word embeddings and 50-dimensional POS embeddings, each over a window of 3), feeding feedforward layers (400 units), BiGRU layers interleaved with further feedforward layers, and a final softmax producing OK/BAD. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 47 / 69
73 Linear vs Neural Models (Test Set) F1-MULT on WMT16 for the UGENT system, LinearQE, and NeuralQE. UGENT is Tezcan et al. (2016), the best system at WMT16 after ours; both our linear and neural systems outperform it by a big margin. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 48 / 69
74 Model #3: Stacked Architecture Stacked learning is a simple and effective way of ensembling structured models (Cohen and de Carvalho, 2005; Martins et al., 2008). Key idea: include the neural model's prediction as an additional feature in the linear model. Better performance if we use several neural models with different random initializations and data shuffles. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 49 / 69
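The stacking idea itself is tiny. In this illustrative sketch (all names are mine), the neural model's per-word BAD probability is simply appended to the linear model's feature vector; in the paper, the training-set neural predictions come from jackknifing so the stacked feature is not overfit:

```python
# Toy illustration of stacked learning for word-level QE. neural_score is a
# stand-in for p(BAD | word) from a trained NeuralQE model.
def neural_score(word):
    return 0.9 if word == "fonctionne-t-elle" else 0.1

def base_features(words, i):
    """Stand-in for the linear model's own unigram features."""
    return {"word=" + words[i]: 1.0, "bias": 1.0}

def stacked_features(target_words, base_feature_fn):
    rows = []
    for i, w in enumerate(target_words):
        feats = base_feature_fn(target_words, i)
        feats["neural_p_bad"] = neural_score(w)  # the stacked feature
        rows.append(feats)
    return rows

rows = stacked_features("La traduction fonctionne-t-elle".split(),
                        base_features)
print(rows[2]["neural_p_bad"])  # 0.9
```

Using several neural models would simply add one such feature per model.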
76 Stacking Linear and Neural Models F1-MULT on WMT16 for the UGENT system, LinearQE, NeuralQE, and StackedQE. Large improvement (3 points) just by combining the two systems! This shows that they are highly complementary. We won the WMT16 shared task with this approach... but can we still do better? Yes. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 50 / 69
77 Automatic Post-Editing (Simard et al., 2007)... remember Marcin's lecture on Wednesday: André Martins (Unbabel) Quality Estimation MTM, 31/8/17 51 / 69
78 Automatic Post-Editing (Simard et al., 2007) BAD BAD BAD OK OK OK Le travail de La traduction automatique fonctionne-t-elle? Goal: automatically correct the output of MT. While word-level QE detects mistakes, APE seeks to correct them. Still, the two tasks are pretty similar. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 52 / 69
80 Roundtrip and Log-Linear Combination Current best APE system (Junczys-Dowmunt and Grundkiewicz, 2016): generate a large amount of artificial data ("roundtrip translations"); train two NMT systems, s→p and t→p; combine the two systems with a log-linear model. Our strategy: train an APE system tuned for QE; at test time, project the predicted post-edit p onto quality labels. Two key differences with respect to the pure QE systems: learned from finer-grained information (post-edited text); a lot more data (500,000 roundtrip translations). André Martins (Unbabel) Quality Estimation MTM, 31/8/17 53 / 69
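The projection from a predicted post-edit to word labels can be sketched as a monolingual alignment between t and p: MT words that survive unchanged are OK, the rest BAD. The paper derives labels with TER-style alignment; this illustrative sketch uses difflib instead:

```python
# Sketch of APE-to-QE label projection: align the MT output t with the
# predicted post-edit p and mark unmatched MT words as BAD. difflib's
# matcher is a simplification of the TER-based alignment used in the paper.
import difflib

def ape_to_labels(t, p):
    t_toks, p_toks = t.split(), p.split()
    labels = ["BAD"] * len(t_toks)
    sm = difflib.SequenceMatcher(a=t_toks, b=p_toks, autojunk=False)
    for block in sm.get_matching_blocks():
        for k in range(block.size):
            labels[block.a + k] = "OK"
    return labels

print(ape_to_labels("Do buy this product", "Do not buy this product"))
# ['OK', 'OK', 'OK', 'OK']
```

Note that a pure insertion in p (the missing "not") leaves every MT token OK, which matches the WMT word-level task: only the translated words are labeled.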
81 Model #4: APE-Based Quality Estimation F1-MULT on WMT16 for the UGENT system, LinearQE, NeuralQE, StackedQE, and APE-QE. This strategy outperforms the pure QE systems by 5 points!! André Martins (Unbabel) Quality Estimation MTM, 31/8/17 54 / 69
82 Model #5: Combining Linear, Neural, and APE F1-MULT on WMT16 for the UGENT system, LinearQE, NeuralQE, StackedQE, APE-QE, and FullStackedQE... combining all the models together, we get another improvement of 2 points!! André Martins (Unbabel) Quality Estimation MTM, 31/8/17 55 / 69
83 Examples
Source: Combines the hue value of the blend color with the luminance and saturation of the base color to create the result color.
MT: Kombiniert den Farbton Wert der Angleichungsfarbe mit der Luminanz und Sättigung der Grundfarbe zu erstellen.
PE (Reference): Kombiniert den Farbtonwert der Mischfarbe mit der Luminanz und Sättigung der Grundfarbe.
APE: Kombiniert den Farbton der Mischfarbe mit der Luminanz und die Sättigung der Grundfarbe, um die Ergebnisfarbe zu erstellen.
StackedQE: Kombiniert den Farbton Wert der Angleichungsfarbe mit der Luminanz und Sättigung der Grundfarbe zu erstellen.
ApeQE: Kombiniert den Farbton Wert der Angleichungsfarbe mit der Luminanz und Sättigung der Grundfarbe zu erstellen.
FullStackedQE: Kombiniert den Farbton Wert der Angleichungsfarbe mit der Luminanz und Sättigung der Grundfarbe zu erstellen.
André Martins (Unbabel) Quality Estimation MTM, 31/8/17 56 / 69
84 Performance over Sentence Length Figure: Average number of words predicted as BAD by the different systems on the WMT16 gold dev set, for different bins of sentence length. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 57 / 69
85 Sentence-Level Quality Estimation Now that we have a strong word-level system, we apply a very simple procedure to obtain sentence-level predictions: For pure QE: convert the word-level predictions to a sentence-level HTER prediction by counting the fraction of BAD words. For APE: directly use the HTER between t and p. For the combined system: the average of the two above. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 58 / 69
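This word-to-sentence conversion fits in a few lines; the sketch below is a direct transcription of the slide's procedure (the 50/50 weighting for the combined system is my reading of "average"):

```python
# Word-to-sentence conversion for QE, as described on the slide.
def sentence_score_from_words(word_labels):
    """Pure QE: predicted HTER = fraction of words labeled BAD."""
    return sum(l == "BAD" for l in word_labels) / max(len(word_labels), 1)

def combined_score(word_labels, ape_hter):
    """Combined system: average of the pure-QE score and the APE HTER."""
    return 0.5 * (sentence_score_from_words(word_labels) + ape_hter)

labels = ["BAD", "BAD", "BAD", "OK", "OK", "OK"]
print(sentence_score_from_words(labels))  # 0.5
print(combined_score(labels, 0.3))        # 0.4
```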
86 Sentence-Level Quality Estimation Pearson correlation on WMT16 for YANDEX, StackedQE, APE-QE, and FullStackedQE. YANDEX was the best sentence-level system at WMT16 (Kozlova et al., 2016). Our simple conversion led to impressive results, well above the SOTA! André Martins (Unbabel) Quality Estimation MTM, 31/8/17 59 / 69
87 This Year's WMT17: Other Competitive Methods André Martins (Unbabel) Quality Estimation MTM, 31/8/17 60 / 69
88 This Year's WMT17: Other Competitive Methods André Martins (Unbabel) Quality Estimation MTM, 31/8/17 61 / 69
89 Conclusions New SOTA systems for word-level and sentence-level QE, considerably more accurate than previously existing systems. First, we proposed a new pure QE system which stacks a linear and a neural system. Then, by relating the tasks of APE and word-level QE, we derived a new APE-based QE system. Finally, we combined the two systems via a full stacking architecture. The full system was extended to sentence-level QE by virtue of a simple word-to-sentence conversion (no further training or tuning). André Martins (Unbabel) Quality Estimation MTM, 31/8/17 62 / 69
90 Outline 1 MT Evaluation & Quality Estimation 2 Pushing the Limits of Quality Estimation 3 The Future André Martins (Unbabel) Quality Estimation MTM, 31/8/17 63 / 69
91 The Future of QE Word-level: go beyond OK/BAD labels; look at words missing from the source (particularly relevant for NMT); mistakes of NMT systems have different patterns than PBMT; how to estimate adequacy? Sentence-level: multi-task learning with word-level; predict MQM scores (useful for quality assurance of crowd-sourced translation). Document-level: take inter-sentential context into account (very relevant for chat); evaluate the global coherence of a translation. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 64 / 69
92 The Future of QE Beyond MT: human translation quality estimation. Much harder: humans have higher variability, and it is hard to distinguish good non-literal translations from complete rubbish. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 65 / 69
93 We're Hiring! Excited about MT, crowdsourcing and Lisbon? André Martins (Unbabel) Quality Estimation MTM, 31/8/17 66 / 69
94 Acknowledgments EXPERT project (EU Marie Curie ITN No ) Fundação para a Ciência e Tecnologia (FCT), through contracts UID/EEA/50008/2013 and UID/CEC/50021/2013 LearnBig project (PTDC/EEI-SII/7092/2014) GoLocal project (grant CMUPERI/TIC/0046/2014) Amazon Academic Research Awards program André Martins (Unbabel) Quality Estimation MTM, 31/8/17 67 / 69
95 References I Blatz, J., Fitzgerald, E., Foster, G., Gandrabur, S., Goutte, C., Kulesza, A., Sanchis, A., and Ueffing, N. (2004). Confidence estimation for machine translation. In Proc. of the International Conference on Computational Linguistics, page 315. Cohen, W. W. and de Carvalho, V. R. (2005). Stacked sequential learning. In IJCAI. Junczys-Dowmunt, M. and Grundkiewicz, R. (2016). Log-linear combinations of monolingual and bilingual neural machine translation models for automatic post-editing. In Proceedings of the First Conference on Machine Translation, pages , Berlin, Germany. Association for Computational Linguistics. Kozlova, A., Shmatova, M., and Frolov, A. (2016). YSDA Participation in the WMT 16 Quality Estimation Shared Task. In Proceedings of the First Conference on Machine Translation, pages Kreutzer, J., Schamoni, S., and Riezler, S. (2015). Quality estimation from scratch (quetch): Deep learning for word-level translation quality estimation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages Martins, A. F. T., Das, D., Smith, N. A., and Xing, E. P. (2008). Stacking Dependency Parsers. In Proc. of Empirical Methods for Natural Language Processing. Simard, M., Ueffing, N., Isabelle, P., and Kuhn, R. (2007). Rule-based translation with statistical phrase-based post-editing. In Proceedings of the Second Workshop on Statistical Machine Translation, pages André Martins (Unbabel) Quality Estimation MTM, 31/8/17 68 / 69
96 References II Snover, M., Dorr, B., Schwartz, R., Micciulla, L., and Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas, pages . Specia, L., Shah, K., de Souza, J. G., and Cohn, T. (2013). QuEst - a translation quality estimation framework. In Proc. of the Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages . Tezcan, A., Hoste, V., and Macken, L. (2016). UGENT-LT3 SCATE submission for WMT16 shared task on quality estimation. In Proceedings of the First Conference on Machine Translation, pages , Berlin, Germany. Association for Computational Linguistics. Ueffing, N. and Ney, H. (2007). Word-level confidence estimation for machine translation. Computational Linguistics, 33(1):9 40. André Martins (Unbabel) Quality Estimation MTM, 31/8/17 69 / 69
More informationTopics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 12: Deep Reinforcement Learning
Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound Lecture 12: Deep Reinforcement Learning Types of Learning Supervised training Learning from the teacher Training data includes
More informationNatural Language Processing with Deep Learning CS224N/Ling284
Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 8: Recurrent Neural Networks Christopher Manning and Richard Socher Organization Extra project office hour today after lecture Overview
More informationImproving the quality of a customized SMT system using shared training data. August 28, 2009
Improving the quality of a customized SMT system using shared training data Chris.Wendt@microsoft.com Will.Lewis@microsoft.com August 28, 2009 1 Overview Engine and Customization Basics Experiment Objective
More informationNeural Machine Translation In Linear Time
Neural Machine Translation In Linear Time Authors: Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, Koray Kavukcuoglu Presenter: SunMao sm4206 YuZheng yz2978 OVERVIEW
More informationLecture 14: Annotation
Lecture 14: Annotation Nathan Schneider (with material from Henry Thompson, Alex Lascarides) ENLP 23 October 2016 1/14 Annotation Why gold 6= perfect Quality Control 2/14 Factors in Annotation Suppose
More informationThe HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10
The HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10 Patrick Simianer, Gesa Stupperich, Laura Jehl, Katharina Wäschle, Artem Sokolov, Stefan Riezler Institute for Computational Linguistics,
More informationMetric Learning for Large-Scale Image Classification:
Metric Learning for Large-Scale Image Classification: Generalizing to New Classes at Near-Zero Cost Florent Perronnin 1 work published at ECCV 2012 with: Thomas Mensink 1,2 Jakob Verbeek 2 Gabriela Csurka
More informationA Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition
A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es
More informationSpatial Localization and Detection. Lecture 8-1
Lecture 8: Spatial Localization and Detection Lecture 8-1 Administrative - Project Proposals were due on Saturday Homework 2 due Friday 2/5 Homework 1 grades out this week Midterm will be in-class on Wednesday
More informationInstitute of Computational Linguistics Treatment of Markup in Statistical Machine Translation
Institute of Computational Linguistics Treatment of Markup in Statistical Machine Translation Master s Thesis Mathias Müller September 13, 2017 Motivation Researchers often give themselves the luxury of
More informationImproving Stack Overflow Tag Prediction Using Eye Tracking Alina Lazar Youngstown State University Bonita Sharif, Youngstown State University
Improving Stack Overflow Tag Prediction Using Eye Tracking Alina Lazar, Youngstown State University Bonita Sharif, Youngstown State University Jenna Wise, Youngstown State University Alyssa Pawluk, Youngstown
More informationNatural Language Processing
Natural Language Processing Classification III Dan Klein UC Berkeley 1 Classification 2 Linear Models: Perceptron The perceptron algorithm Iteratively processes the training set, reacting to training errors
More informationA Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition
A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es
More informationDCU System Report on the WMT 2017 Multi-modal Machine Translation Task
DCU System Report on the WMT 2017 Multi-modal Machine Translation Task Iacer Calixto and Koel Dutta Chowdhury and Qun Liu {iacer.calixto,koel.chowdhury,qun.liu}@adaptcentre.ie Abstract We report experiments
More informationLet s get parsing! Each component processes the Doc object, then passes it on. doc.is_parsed attribute checks whether a Doc object has been parsed
Let s get parsing! SpaCy default model includes tagger, parser and entity recognizer nlp = spacy.load('en ) tells spacy to use "en" with ["tagger", "parser", "ner"] Each component processes the Doc object,
More informationIntegration of QuEst and Okapi
Integration of QuEst and Okapi Final Activity Report Participants: Gustavo Paetzold, Lucia Specia and Kashif Shah (Sheffield University) Yves Savourel (Okapi) 1. Introduction The proposed project aimed
More informationA Simple (?) Exercise: Predicting the Next Word
CS11-747 Neural Networks for NLP A Simple (?) Exercise: Predicting the Next Word Graham Neubig Site https://phontron.com/class/nn4nlp2017/ Are These Sentences OK? Jane went to the store. store to Jane
More informationCS395T Project 2: Shift-Reduce Parsing
CS395T Project 2: Shift-Reduce Parsing Due date: Tuesday, October 17 at 9:30am In this project you ll implement a shift-reduce parser. First you ll implement a greedy model, then you ll extend that model
More informationLign/CSE 256, Programming Assignment 1: Language Models
Lign/CSE 256, Programming Assignment 1: Language Models 16 January 2008 due 1 Feb 2008 1 Preliminaries First, make sure you can access the course materials. 1 The components are: ˆ code1.zip: the Java
More informationPredicting Popular Xbox games based on Search Queries of Users
1 Predicting Popular Xbox games based on Search Queries of Users Chinmoy Mandayam and Saahil Shenoy I. INTRODUCTION This project is based on a completed Kaggle competition. Our goal is to predict which
More informationTransition-based Parsing with Neural Nets
CS11-747 Neural Networks for NLP Transition-based Parsing with Neural Nets Graham Neubig Site https://phontron.com/class/nn4nlp2017/ Two Types of Linguistic Structure Dependency: focus on relations between
More informationBBN System Description for WMT10 System Combination Task
BBN System Description for WMT10 System Combination Task Antti-Veikko I. Rosti and Bing Zhang and Spyros Matsoukas and Richard Schwartz Raytheon BBN Technologies, 10 Moulton Street, Cambridge, MA 018,
More informationStatistical Machine Translation Part IV Log-Linear Models
Statistical Machine Translation art IV Log-Linear Models Alexander Fraser Institute for Natural Language rocessing University of Stuttgart 2011.11.25 Seminar: Statistical MT Where we have been We have
More information16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning. Spring 2018 Lecture 14. Image to Text
16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning Spring 2018 Lecture 14. Image to Text Input Output Classification tasks 4/1/18 CMU 16-785: Integrated Intelligence in Robotics
More informationAllstate Insurance Claims Severity: A Machine Learning Approach
Allstate Insurance Claims Severity: A Machine Learning Approach Rajeeva Gaur SUNet ID: rajeevag Jeff Pickelman SUNet ID: pattern Hongyi Wang SUNet ID: hongyiw I. INTRODUCTION The insurance industry has
More informationThe Perceptron. Simon Šuster, University of Groningen. Course Learning from data November 18, 2013
The Perceptron Simon Šuster, University of Groningen Course Learning from data November 18, 2013 References Hal Daumé III: A Course in Machine Learning http://ciml.info Tom M. Mitchell: Machine Learning
More informationSentence-Level Confidence Estimation for MT
Sentence-Level Confidence Estimation for MT Lucia Specia Nicola Cancedda, Marc Dymetman, Craig Saunders - Xerox Marco Turchi, Nello Cristianini Univ. Bristol Zhuoran Wang, John Shawe-Taylor UCL Outline
More informationAutomatic Domain Partitioning for Multi-Domain Learning
Automatic Domain Partitioning for Multi-Domain Learning Di Wang diwang@cs.cmu.edu Chenyan Xiong cx@cs.cmu.edu William Yang Wang ww@cmu.edu Abstract Multi-Domain learning (MDL) assumes that the domain labels
More informationFastText. Jon Koss, Abhishek Jindal
FastText Jon Koss, Abhishek Jindal FastText FastText is on par with state-of-the-art deep learning classifiers in terms of accuracy But it is way faster: FastText can train on more than one billion words
More informationAdversarial Examples and Adversarial Training. Ian Goodfellow, Staff Research Scientist, Google Brain CS 231n, Stanford University,
Adversarial Examples and Adversarial Training Ian Goodfellow, Staff Research Scientist, Google Brain CS 231n, Stanford University, 2017-05-30 Overview What are adversarial examples? Why do they happen?
More informationAutomatic Summarization
Automatic Summarization CS 769 Guest Lecture Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of Wisconsin, Madison February 22, 2008 Andrew B. Goldberg (CS Dept) Summarization
More informationSparse Non-negative Matrix Language Modeling
Sparse Non-negative Matrix Language Modeling Joris Pelemans Noam Shazeer Ciprian Chelba joris@pelemans.be noam@google.com ciprianchelba@google.com 1 Outline Motivation Sparse Non-negative Matrix Language
More informationThe Goal of this Document. Where to Start?
A QUICK INTRODUCTION TO THE SEMILAR APPLICATION Mihai Lintean, Rajendra Banjade, and Vasile Rus vrus@memphis.edu linteam@gmail.com rbanjade@memphis.edu The Goal of this Document This document introduce
More informationPython & Web Mining. Lecture Old Dominion University. Department of Computer Science CS 495 Fall 2012
Python & Web Mining Lecture 6 10-10-12 Old Dominion University Department of Computer Science CS 495 Fall 2012 Hany SalahEldeen Khalil hany@cs.odu.edu Scenario So what did Professor X do when he wanted
More informationVERIZON ACHIEVES BEST IN TEST AND WINS THE P3 MOBILE BENCHMARK USA
VERIZON ACHIEVES BEST IN TEST AND WINS THE P3 MOBILE BENCHMARK USA INTRODUCTION in test 10/2018 Mobile Benchmark USA The international benchmarking expert P3 has been testing the performance of cellular
More informationCorpus Acquisition from the Internet
Corpus Acquisition from the Internet Philipp Koehn partially based on slides from Christian Buck 26 October 2017 Big Data 1 For many language pairs, lots of text available. Text you read in your lifetime
More informationNeural Nets & Deep Learning
Neural Nets & Deep Learning The Inspiration Inputs Outputs Our brains are pretty amazing, what if we could do something similar with computers? Image Source: http://ib.bioninja.com.au/_media/neuron _med.jpeg
More informationPrototype of Silver Corpus Merging Framework
www.visceral.eu Prototype of Silver Corpus Merging Framework Deliverable number D3.3 Dissemination level Public Delivery data 30.4.2014 Status Authors Final Markus Krenn, Allan Hanbury, Georg Langs This
More informationCSCI 5417 Information Retrieval Systems. Jim Martin!
CSCI 5417 Information Retrieval Systems Jim Martin! Lecture 7 9/13/2011 Today Review Efficient scoring schemes Approximate scoring Evaluating IR systems 1 Normal Cosine Scoring Speedups... Compute the
More informationRecurrent Neural Networks
Recurrent Neural Networks 11-785 / Fall 2018 / Recitation 7 Raphaël Olivier Recap : RNNs are magic They have infinite memory They handle all kinds of series They re the basis of recent NLP : Translation,
More informationCS229 Final Project: Predicting Expected Response Times
CS229 Final Project: Predicting Expected Email Response Times Laura Cruz-Albrecht (lcruzalb), Kevin Khieu (kkhieu) December 15, 2017 1 Introduction Each day, countless emails are sent out, yet the time
More informationSequence Prediction with Neural Segmental Models. Hao Tang
Sequence Prediction with Neural Segmental Models Hao Tang haotang@ttic.edu About Me Pronunciation modeling [TKL 2012] Segmental models [TGL 2014] [TWGL 2015] [TWGL 2016] [TWGL 2016] American Sign Language
More informationEnd-To-End Spam Classification With Neural Networks
End-To-End Spam Classification With Neural Networks Christopher Lennan, Bastian Naber, Jan Reher, Leon Weber 1 Introduction A few years ago, the majority of the internet s network traffic was due to spam
More informationCMU-UKA Syntax Augmented Machine Translation
Outline CMU-UKA Syntax Augmented Machine Translation Ashish Venugopal, Andreas Zollmann, Stephan Vogel, Alex Waibel InterACT, LTI, Carnegie Mellon University Pittsburgh, PA Outline Outline 1 2 3 4 Issues
More informationA General-Purpose Rule Extractor for SCFG-Based Machine Translation
A General-Purpose Rule Extractor for SCFG-Based Machine Translation Greg Hanneman, Michelle Burroughs, and Alon Lavie Language Technologies Institute Carnegie Mellon University Fifth Workshop on Syntax
More informationShow, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks
Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks Zelun Luo Department of Computer Science Stanford University zelunluo@stanford.edu Te-Lin Wu Department of
More informationAlgorithms for NLP. Machine Translation. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley
Algorithms for NLP Machine Translation Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Machine Translation Machine Translation: Examples Levels of Transfer Word-Level MT: Examples la politique
More informationLecture 7: Neural network acoustic models in speech recognition
CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 7: Neural network acoustic models in speech recognition Outline Hybrid acoustic modeling overview Basic
More informationA Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions
A Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions Marcin Junczys-Dowmunt Faculty of Mathematics and Computer Science Adam Mickiewicz University ul. Umultowska 87, 61-614
More informationTALP: Xgram-based Spoken Language Translation System Adrià de Gispert José B. Mariño
TALP: Xgram-based Spoken Language Translation System Adrià de Gispert José B. Mariño Outline Overview Outline Translation generation Training IWSLT'04 Chinese-English supplied task results Conclusion and
More informationSeq2SQL: Generating Structured Queries from Natural Language Using Reinforcement Learning
Seq2SQL: Generating Structured Queries from Natural Language Using Reinforcement Learning V. Zhong, C. Xiong, R. Socher Salesforce Research arxiv: 1709.00103 Reviewed by : Bill Zhang University of Virginia
More informationStructured Learning. Jun Zhu
Structured Learning Jun Zhu Supervised learning Given a set of I.I.D. training samples Learn a prediction function b r a c e Supervised learning (cont d) Many different choices Logistic Regression Maximum
More informationLecture 19: Generative Adversarial Networks
Lecture 19: Generative Adversarial Networks Roger Grosse 1 Introduction Generative modeling is a type of machine learning where the aim is to model the distribution that a given set of data (e.g. images,
More informationQuEst++ Tutorial. Carolina Scarton and Lucia Specia January 19, 2016
QuEst++ Tutorial Carolina Scarton and Lucia Specia January 19, 2016 Abstract In this tutorial we present QuEst++, an open source framework for pipelined Translation Quality Estimation. QuEst++ is the newest
More informationPractical Methodology. Lecture slides for Chapter 11 of Deep Learning Ian Goodfellow
Practical Methodology Lecture slides for Chapter 11 of Deep Learning www.deeplearningbook.org Ian Goodfellow 2016-09-26 What drives success in ML? Arcane knowledge of dozens of obscure algorithms? Mountains
More informationAuthor Verification: Exploring a Large set of Parameters using a Genetic Algorithm
Author Verification: Exploring a Large set of Parameters using a Genetic Algorithm Notebook for PAN at CLEF 2014 Erwan Moreau 1, Arun Jayapal 2, and Carl Vogel 3 1 moreaue@cs.tcd.ie 2 jayapala@cs.tcd.ie
More informationKyoto-NMT: a Neural Machine Translation implementation in Chainer
Kyoto-NMT: a Neural Machine Translation implementation in Chainer Fabien Cromières Japan Science and Technology Agency Kawaguchi-shi, Saitama 332-0012 fabien@pa.jst.jp Abstract We present Kyoto-NMT, an
More informationLecture 20: Neural Networks for NLP. Zubin Pahuja
Lecture 20: Neural Networks for NLP Zubin Pahuja zpahuja2@illinois.edu courses.engr.illinois.edu/cs447 CS447: Natural Language Processing 1 Today s Lecture Feed-forward neural networks as classifiers simple
More informationCS 288: Statistical NLP Assignment 1: Language Modeling
CS 288: Statistical NLP Assignment 1: Language Modeling Due September 12, 2014 Collaboration Policy You are allowed to discuss the assignment with other students and collaborate on developing algorithms
More informationParsing with Dynamic Programming
CS11-747 Neural Networks for NLP Parsing with Dynamic Programming Graham Neubig Site https://phontron.com/class/nn4nlp2017/ Two Types of Linguistic Structure Dependency: focus on relations between words
More informationFinal Project Discussion. Adam Meyers Montclair State University
Final Project Discussion Adam Meyers Montclair State University Summary Project Timeline Project Format Details/Examples for Different Project Types Linguistic Resource Projects: Annotation, Lexicons,...
More informationNLP in practice, an example: Semantic Role Labeling
NLP in practice, an example: Semantic Role Labeling Anders Björkelund Lund University, Dept. of Computer Science anders.bjorkelund@cs.lth.se October 15, 2010 Anders Björkelund NLP in practice, an example:
More informationBayesian model ensembling using meta-trained recurrent neural networks
Bayesian model ensembling using meta-trained recurrent neural networks Luca Ambrogioni l.ambrogioni@donders.ru.nl Umut Güçlü u.guclu@donders.ru.nl Yağmur Güçlütürk y.gucluturk@donders.ru.nl Julia Berezutskaya
More informationTransition-based dependency parsing
Transition-based dependency parsing Syntactic analysis (5LN455) 2014-12-18 Sara Stymne Department of Linguistics and Philology Based on slides from Marco Kuhlmann Overview Arc-factored dependency parsing
More informationTTIC 31190: Natural Language Processing
TTIC 31190: Natural Language Processing Kevin Gimpel Winter 2016 Lecture 2: Text Classification 1 Please email me (kgimpel@ttic.edu) with the following: your name your email address whether you taking
More informationProject 3 Q&A. Jonathan Krause
Project 3 Q&A Jonathan Krause 1 Outline R-CNN Review Error metrics Code Overview Project 3 Report Project 3 Presentations 2 Outline R-CNN Review Error metrics Code Overview Project 3 Report Project 3 Presentations
More informationFeature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule.
CS 188: Artificial Intelligence Fall 2008 Lecture 24: Perceptrons II 11/24/2008 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit
More informationA bit of theory: Algorithms
A bit of theory: Algorithms There are different kinds of algorithms Vector space models. e.g. support vector machines Decision trees, e.g. C45 Probabilistic models, e.g. Naive Bayes Neural networks, e.g.
More informationExperimenting in MT: Moses Toolkit and Eman
Experimenting in MT: Moses Toolkit and Eman Aleš Tamchyna, Ondřej Bojar Institute of Formal and Applied Linguistics Faculty of Mathematics and Physics Charles University, Prague Tue Sept 8, 2015 1 / 30
More informationLecture #11: The Perceptron
Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be
More informationA Quick Guide to MaltParser Optimization
A Quick Guide to MaltParser Optimization Joakim Nivre Johan Hall 1 Introduction MaltParser is a system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data
More informationKnow your data - many types of networks
Architectures Know your data - many types of networks Fixed length representation Variable length representation Online video sequences, or samples of different sizes Images Specific architectures for
More informationCSE 573: Artificial Intelligence Autumn 2010
CSE 573: Artificial Intelligence Autumn 2010 Lecture 16: Machine Learning Topics 12/7/2010 Luke Zettlemoyer Most slides over the course adapted from Dan Klein. 1 Announcements Syllabus revised Machine
More informationTokenization and Sentence Segmentation. Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017
Tokenization and Sentence Segmentation Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017 Outline 1 Tokenization Introduction Exercise Evaluation Summary 2 Sentence segmentation
More informationDeepPose & Convolutional Pose Machines
DeepPose & Convolutional Pose Machines Main Concepts 1. 2. 3. 4. CNN with regressor head. Object Localization. Bigger to smaller or Smaller to bigger view. Deep Supervised learning to prevent Vanishing
More informationAn Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation
An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio Université de Montréal 13/06/2007
More informationD6.1.2: Second report on scientific evaluations
D6.1.2: Second report on scientific evaluations UPVLC, XEROX, JSI-K4A, RWTH, EML and DDS Distribution: Public translectures Transcription and Translation of Video Lectures ICT Project 287755 Deliverable
More informationDomain-specific Concept-based Information Retrieval System
Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical
More informationLinguistic Resources for Arabic Handwriting Recognition
Linguistic Resources for Arabic Handwriting Recognition Stephanie M. Strassel Linguistic Data Consortium 3600 Market Street, Suite 810 Philadelphia, PA 19104 USA strassel@ldc.upenn.edu Abstract MADCAT
More informationContext Encoding LSTM CS224N Course Project
Context Encoding LSTM CS224N Course Project Abhinav Rastogi arastogi@stanford.edu Supervised by - Samuel R. Bowman December 7, 2015 Abstract This project uses ideas from greedy transition based parsing
More informationD5.6 - Evaluation Benchmarks
Th is document is part of the Project Machine Tr a n sla tion En h a n ced Com pu ter A ssisted Tr a n sla tion (Ma teca t ), fu nded by the 7th Framework Programme of th e Eu r opea n Com m ission th
More informationMetric Learning for Large Scale Image Classification:
Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost Thomas Mensink 1,2 Jakob Verbeek 2 Florent Perronnin 1 Gabriela Csurka 1 1 TVPA - Xerox Research Centre
More informationInstance-based Learning
Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 15 th, 2007 2005-2007 Carlos Guestrin 1 1-Nearest Neighbor Four things make a memory based learner:
More informationSemantic Estimation for Texts in Software Engineering
Semantic Estimation for Texts in Software Engineering 汇报人 : Reporter:Xiaochen Li Dalian University of Technology, China 大连理工大学 2016 年 11 月 29 日 Oscar Lab 2 Ph.D. candidate at OSCAR Lab, in Dalian University
More informationRegularization and model selection
CS229 Lecture notes Andrew Ng Part VI Regularization and model selection Suppose we are trying select among several different models for a learning problem. For instance, we might be using a polynomial
More information