Reassessment of the Role of Phrase Extraction in PBSMT
|
|
- Georgiana Miranda Lewis
- 5 years ago
- Views:
Transcription
1 Reassessment of the Role of Phrase Extraction in Francisco Guzmán CCIR-ITESM Qin Gao LTI-CMU Stephan Vogel LTI-CMU Presented by: Nguyen Bach LTI-CMU MT Summit XII. Ottawa, Canada August 27,
2 Outline Introduction Analysis Setup Analysis I: Word Alignments Metrics Results Analysis II: Phrase Extraction Metrics Manual Evaluation Results Lessons Learned: Mind your gaps Experimental Results Conclusions 2
3 Word Alignment Beginning of SMT pipeline. Most subsequent steps based on WA. A lot of work to improve WA quality. Widely available Hand Alignments enabled discriminative approaches. Discriminative based on metrics such AER. 3
4 AER vs BLEU Fraser and Marcu, 2004 Evaluated correlation between BLEU and AER. Possible links => flaws. Variation of F-measure, uses a coefficient to modify balance between precision and recall. The optimal coefficient depends on the corpus. Vilar et al., 2004 Better BLEU scores can be obtained with "degraded" alignments. Mismatch between alignment and translation models. Support the use of AER. AER BLEU AER BLEU 4
5 AER vs BLEU Ayan and Dorr, 2004 Analyze the quality of the alignments and resulting phrase tables. Several types of alignments. Several lexical weightings CPER (Consistent Phrase Error Rate) 5
6 Our Objective Study the relationship between Word Alignment and Phrase Extraction Alignment characteristics => Phrase Table characteristics 6
7 Analysis Setup 7
8 Setup Alignments: Data: GIZA++ Chinese-English Symmetrized (growdiag-final) GIZA++ train (11M Sentences) Discriminative Phrase Extraction: phrase-extract (Och & Ney) DWA tune(500 Sentences) Test (200 Sentences) Max P Len= 7 8
9 Discriminative (Niehues & Vogel) CRF Based GIZA Uses GIZA++ byproducts as features Tuned towards AER lexicon Viterbi alignments fertilities Discriminative Word Aligner (DWA) 9
10 Continuous alignment matrix Binarized to different thresholds 10
11 Analysis I: Word Alignment 11
12 Word Alignment Metrics Qualitative: Quantitative: AER (F-measure) Number of links Precision Recall Number of unaligned words 12
13 Alignment quality results DWA alignments: higher threshold=>more precision Best AER from slightly more precise alignment (DWA-0.5) GIZA=> more precision than recall. SYM lower AER than GIZA. Precision/Recall AER Precision Recall Alignment quality DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 AER 30 GIZA S2T GIZA T2S SYM Alignment 13
14 Links Hand Align closer to (DWA-0.3) 100,000 90,000 Number of Links by alignment DWA aligns: high threshold=>fewer links Best AER (DWA-0.5) fewer links than HA 80,000 70,000 60,000 50,000 40,000 30,000 20,000 10, DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 GIZA S2T GIZA T2S SYM HA number of links Alignment 14
15 Unaligned Words HA Source: closer to SYM, DWA-0.3 HA Target: closer to DWA-0.4, DWA-0.3 GIZA asymmetry DWA: higher threshold, more unalignments. DWA: lower threshold=> more proportion Chinese words unaligned. total number of unaligned words 30,000 25,000 20,000 15,000 10,000 5,000 0 DWA-0.1 DWA-0.2 Unaligned Source and Target Words Source Unaligned Target Unaligned DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 GIZA S2T GIZA T2S SYM HA Alignment 15
16 Word Alignment: Summary Diversity of balance precision/recall between alignments. Usually precision prevails over recall. Two ways of describing to an alignment: links and unaligned words. In next section, we'll observe the importance of such factors in the generation of phrase pairs. 16
17 Analysis II: Phrase Extraction 17
18 Metrics Quantitative: Qualitative: Number of phrases Manual Evaluation Singletons (unique entries) Phrase lengths Gaps (unaligned words inside phrase pair) 18
19 target What do we mean by gaps? Alignment matrix In this phrase pair: 1 gap on the target phrase source In this phrase pair: 3 gaps on the source phrase 19
20 Number of Phrases PT grows as our alignment gets sparser Related to unaligned words rather than number of links Number of Unaligned Words 30,000 25,000 20,000 15,000 10,000 5,000 Number of generated phrase pairs Phrases Unaligned Source Unaligned Target 900, , , , , , , , ,000 0 DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 GIZA S2T number of phrases 0 GIZA T2S SYM HA Alignments 20
21 Number of Phrases PT grows as our alignment gets sparser Related to unaligned words rather than number of links 100,000 90,000 80,000 70,000 60,000 Number of generated phrase pairs Phrases Links 900, , , , ,000 number of links 50,000 40,000 30,000 20,000 10, , , , , DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 GIZA S2T GIZA T2S SYM HA number of phrases Alignments 21
22 Singletons Most of the phrase-pairs are singletons. Repeated phrase-pairs grow at a slower rate. 900, , , ,000 Generated phrase pairs Singletons Repeated 500, , , , ,000 0 DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 GIZA S2T GIZA T2S SYM HA Number of phrase pairs Alignments 22
23 Singletons Most of the phrase-pairs are singletons. 30,000 Generated phrase pairs Repeated Repeated phrase-pairs grow at a slower rate. 25,000 20,000 15,000 10,000 5,000 0 DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 GIZA S2T GIZA T2S SYM HA Number of phrase pairs Alignments 23
24 Phase Length As our PT grows, entries become longer and longer. Source phrase length distribution of generated phrase pairs Number of phrase pairs #7 #6 #5 #4 #3 #2 #1 0 DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 GIZA S2T GIZA T2S SYM HA Alignments 24
25 Gaps The gaps inside a phrase pair increases too. Gaps in generated phrase pairs The distribution of gaps in the generated phrases follows the distribution of unaligned words in the alignment Expected number of gaps in phrase pair DWA-0.1 exp. #source gaps exp. #target gaps DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 GIZA S2T GIZA T2S SYM HA Alignment 25
26 Human Evaluation of Phrase Pairs Setup: Native Chinese Speakers Each subject was asked whether a phrase pair was adequate No contextual information Included a noisy input 26
27 Sampling 27
28 Results HA w/o gaps yields better results than HA w/ gaps. DWA-0.1 very good (short phrase pairs) 100% 90% 80% 70% 8 24 Adequacy by Alignments DWA-0.5 not so great Random pairings are usually bad Test Cases (adequacy) 60% 50% 40% 30% 20% 10% 0% No Yes DWA-0.1 HA-nogap HA-gap DWA-0.2 DWA-0.4 DWA-0.3 DWA-0.7 DWA-0.6 DWA-0.5 SYM DWA-0.9 DWA-0.8 Random Alignments 28
29 Phrase Extraction: Summary 29
30 Lessons Learned: Mind your gaps 30
31 Taking into account GAPS Gaps inside phrase pairs have considerable impact on human perceived quality of phrase pair. Do they affect translation? Translation Experiment: Include gap count as a feature (similar to WC) Compare the performance of the different systems w/o the features 31
32 Setup Training GALE P3 Data Maximum sentence length 30 1 Million sentences (random) Tuning Test MT05 GALE DEV07-Blind 32
33 Tuning Baseline gets get better results 29 TUNE Over-fitting? DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 SYM BLEU Baseline Unalign- Feat 33
34 Experimental Results Overall gains Web performs better (~2 BP) Best system shifts to a higher precision alignment (DWA-0.5 => DWA-0.6) Higher recall alignments without much change Baseline Unalign-Feat NewsWire DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 SYM BLEU 34
35 Experimental Results Overall gains Web Web performs better (~2 BP) Best system shifts to a higher precision alignment (DWA-0.5 => DWA-0.6) Higher recall alignments without much change Baseline Unalign-Feat DWA-0.1 DWA-0.2 DWA-0.3 DWA-0.4 DWA-0.5 DWA-0.6 DWA-0.7 DWA-0.8 DWA-0.9 SYM BLEU 35
36 Conclusions We can describe an alignment by its quality and its structure (links, unaligned words). Unaligned words have an important role in phrase extraction (more than number links). The distribution of the gaps inside a phrase pair is related to the distribution of unaligned words in the alignment. Extracted phrase pairs with more gaps have lower human perceived quality. Taking into account the number of gaps in an extracted phrase pair as features achieved overall improvements. 36
37 Questions? 37
38 Adequacy of Phrase Pair by adequate we mean that a source phrase could be used as a translation of a target phrase in at least ONE situation, without loss of meaning 38
39 Experiment Variables Dependent: Adequacy (yes/no) Independent: System : HA-Gaps HA-No Gaps DWA { } Random SYM Random (noisy) August 27, 2009 Number of Source, Target gaps Evaluator 39
40 Results ANOVA System is significative Interaction SourceGaps*TargetGaps is significative Evaluator*System is not significative 40
Reassessment of the Role of Phrase Extraction in PBSMT
Reassessment of the Role of Phrase Extraction in PBSMT Francisco Guzman Centro de Sistemas Inteligentes Tecnológico de Monterrey Monterrey, N.L., Mexico guzmanhe@gmail.com Qin Gao and Stephan Vogel Language
More informationA Semi-supervised Word Alignment Algorithm with Partial Manual Alignments
A Semi-supervised Word Alignment Algorithm with Partial Manual Alignments Qin Gao, Nguyen Bach and Stephan Vogel Language Technologies Institute Carnegie Mellon University 000 Forbes Avenue, Pittsburgh
More informationCMU-UKA Syntax Augmented Machine Translation
Outline CMU-UKA Syntax Augmented Machine Translation Ashish Venugopal, Andreas Zollmann, Stephan Vogel, Alex Waibel InterACT, LTI, Carnegie Mellon University Pittsburgh, PA Outline Outline 1 2 3 4 Issues
More informationBetter Word Alignment with Supervised ITG Models. Aria Haghighi, John Blitzer, John DeNero, and Dan Klein UC Berkeley
Better Word Alignment with Supervised ITG Models Aria Haghighi, John Blitzer, John DeNero, and Dan Klein UC Berkeley Word Alignment s Indonesia parliament court in arraigned speaker Word Alignment s Indonesia
More informationPower Mean Based Algorithm for Combining Multiple Alignment Tables
Power Mean Based Algorithm for Combining Multiple Alignment Tables Sameer Maskey, Steven J. Rennie, Bowen Zhou IBM T.J. Watson Research Center {smaskey, sjrennie, zhou}@us.ibm.com Abstract Alignment combination
More informationStatistical Machine Translation Part IV Log-Linear Models
Statistical Machine Translation art IV Log-Linear Models Alexander Fraser Institute for Natural Language rocessing University of Stuttgart 2011.11.25 Seminar: Statistical MT Where we have been We have
More informationAlignment Link Projection Using Transformation-Based Learning
Alignment Link Projection Using Transformation-Based Learning Necip Fazil Ayan, Bonnie J. Dorr and Christof Monz Department of Computer Science University of Maryland College Park, MD 20742 {nfa,bonnie,christof}@umiacs.umd.edu
More informationTuning. Philipp Koehn presented by Gaurav Kumar. 28 September 2017
Tuning Philipp Koehn presented by Gaurav Kumar 28 September 2017 The Story so Far: Generative Models 1 The definition of translation probability follows a mathematical derivation argmax e p(e f) = argmax
More informationTHE CUED NIST 2009 ARABIC-ENGLISH SMT SYSTEM
THE CUED NIST 2009 ARABIC-ENGLISH SMT SYSTEM Adrià de Gispert, Gonzalo Iglesias, Graeme Blackwood, Jamie Brunning, Bill Byrne NIST Open MT 2009 Evaluation Workshop Ottawa, The CUED SMT system Lattice-based
More informationTALP: Xgram-based Spoken Language Translation System Adrià de Gispert José B. Mariño
TALP: Xgram-based Spoken Language Translation System Adrià de Gispert José B. Mariño Outline Overview Outline Translation generation Training IWSLT'04 Chinese-English supplied task results Conclusion and
More informationSparse Feature Learning
Sparse Feature Learning Philipp Koehn 1 March 2016 Multiple Component Models 1 Translation Model Language Model Reordering Model Component Weights 2 Language Model.05 Translation Model.26.04.19.1 Reordering
More informationA General-Purpose Rule Extractor for SCFG-Based Machine Translation
A General-Purpose Rule Extractor for SCFG-Based Machine Translation Greg Hanneman, Michelle Burroughs, and Alon Lavie Language Technologies Institute Carnegie Mellon University Fifth Workshop on Syntax
More informationSyMGiza++: Symmetrized Word Alignment Models for Statistical Machine Translation
SyMGiza++: Symmetrized Word Alignment Models for Statistical Machine Translation Marcin Junczys-Dowmunt, Arkadiusz Sza l Faculty of Mathematics and Computer Science Adam Mickiewicz University ul. Umultowska
More informationThe HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10
The HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10 Patrick Simianer, Gesa Stupperich, Laura Jehl, Katharina Wäschle, Artem Sokolov, Stefan Riezler Institute for Computational Linguistics,
More informationDiscriminative Training for Phrase-Based Machine Translation
Discriminative Training for Phrase-Based Machine Translation Abhishek Arun 19 April 2007 Overview 1 Evolution from generative to discriminative models Discriminative training Model Learning schemes Featured
More informationImproving Statistical Word Alignment with Ensemble Methods
Improving Statiical Word Alignment with Ensemble Methods Hua Wu and Haifeng Wang Toshiba (China) Research and Development Center, 5/F., Tower W2, Oriental Plaza, No.1, Ea Chang An Ave., Dong Cheng Dirict,
More informationThe QCN System for Egyptian Arabic to English Machine Translation. Presenter: Hassan Sajjad
The QCN System for Egyptian Arabic to English Machine Translation Presenter: Hassan Sajjad QCN Collaboration Ahmed El Kholy Wael Salloum Nizar Habash Ahmed Abdelali Nadir Durrani Francisco Guzmán Preslav
More informationAn Unsupervised Model for Joint Phrase Alignment and Extraction
An Unsupervised Model for Joint Phrase Alignment and Extraction Graham Neubig 1,2, Taro Watanabe 2, Eiichiro Sumita 2, Shinsuke Mori 1, Tatsuya Kawahara 1 1 Graduate School of Informatics, Kyoto University
More informationAlgorithms for NLP. Machine Translation. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley
Algorithms for NLP Machine Translation Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Machine Translation Machine Translation: Examples Levels of Transfer Word-Level MT: Examples la politique
More informationTraining a Super Model Look-Alike:Featuring Edit Distance, N-Gram Occurrence,and One Reference Translation p.1/40
Training a Super Model Look-Alike: Featuring Edit Distance, N-Gram Occurrence, and One Reference Translation Eva Forsbom, Uppsala University evafo@stp.ling.uu.se MT Summit IX Workshop on Machine Translation
More informationFinding parallel texts on the web using cross-language information retrieval
Finding parallel texts on the web using cross-language information retrieval Achim Ruopp University of Washington, Seattle, WA 98195, USA achimr@u.washington.edu Fei Xia University of Washington Seattle,
More informationPackage word.alignment
Type Package Package word.alignment October 26, 2017 Title Computing Word Alignment Using IBM Model 1 (and Symmetrization) for a Given Parallel Corpus and Its Evaluation Version 1.0.7 Date 2017-10-10 Author
More informationOutline GIZA++ Moses. Demo. Steps Output files. Training pipeline Decoder
GIZA++ and Moses Outline GIZA++ Steps Output files Moses Training pipeline Decoder Demo GIZA++ A statistical machine translation toolkit used to train IBM Models 1-5 (moses only uses output of IBM Model-1)
More informationA Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation.
A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation May 29, 2003 Shankar Kumar and Bill Byrne Center for Language and Speech Processing
More informationTuning Machine Translation Parameters with SPSA
Tuning Machine Translation Parameters with SPSA Patrik Lambert, Rafael E. Banchs TALP Research Center, Jordi Girona Salgado 1 3. 0803 Barcelona, Spain lambert,rbanchs@gps.tsc.upc.edu Abstract Most of statistical
More information3 Related Work. 2 Motivation. the alignment in either of the two directional alignments.
PACLIC 30 Proceedings Yet Another Symmetrical and Real-time Word Alignment Method: Hierarchical Sub-sentential Alignment using F-measure Hao Wang, Yves Lepage Graduate School of Information, Production
More informationNTT SMT System for IWSLT Katsuhito Sudoh, Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki NTT Communication Science Labs.
NTT SMT System for IWSLT 2008 Katsuhito Sudoh, Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki NTT Communication Science Labs., Japan Overview 2-stage translation system k-best translation
More informationTraining and Evaluating Error Minimization Rules for Statistical Machine Translation
Training and Evaluating Error Minimization Rules for Statistical Machine Translation Ashish Venugopal School of Computer Science Carnegie Mellon University arv@andrew.cmu.edu Andreas Zollmann School of
More informationEstimating Human Pose in Images. Navraj Singh December 11, 2009
Estimating Human Pose in Images Navraj Singh December 11, 2009 Introduction This project attempts to improve the performance of an existing method of estimating the pose of humans in still images. Tasks
More informationExperimenting in MT: Moses Toolkit and Eman
Experimenting in MT: Moses Toolkit and Eman Aleš Tamchyna, Ondřej Bojar Institute of Formal and Applied Linguistics Faculty of Mathematics and Physics Charles University, Prague Tue Sept 8, 2015 1 / 30
More informationRandom Restarts in Minimum Error Rate Training for Statistical Machine Translation
Random Restarts in Minimum Error Rate Training for Statistical Machine Translation Robert C. Moore and Chris Quirk Microsoft Research Redmond, WA 98052, USA bobmoore@microsoft.com, chrisq@microsoft.com
More informationLarge-scale Word Alignment Using Soft Dependency Cohesion Constraints
Large-scale Word Alignment Using Soft Dependency Cohesion Constraints Zhiguo Wang and Chengqing Zong National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy of Sciences {zgwang,
More informationImproving the quality of a customized SMT system using shared training data. August 28, 2009
Improving the quality of a customized SMT system using shared training data Chris.Wendt@microsoft.com Will.Lewis@microsoft.com August 28, 2009 1 Overview Engine and Customization Basics Experiment Objective
More informationAdvanced Search Algorithms
CS11-747 Neural Networks for NLP Advanced Search Algorithms Daniel Clothiaux https://phontron.com/class/nn4nlp2017/ Why search? So far, decoding has mostly been greedy Chose the most likely output from
More informationINF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS. Jan Tore Lønning, Lecture 8, 12 Oct
1 INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS Jan Tore Lønning, Lecture 8, 12 Oct. 2016 jtl@ifi.uio.no Today 2 Preparing bitext Parameter tuning Reranking Some linguistic issues STMT so far 3 We
More informationiems Interactive Experiment Management System Final Report
iems Interactive Experiment Management System Final Report Pēteris Ņikiforovs Introduction Interactive Experiment Management System (Interactive EMS or iems) is an experiment management system with a graphical
More informationAre Unaligned Words Important for Machine Translation?
Are Unaligned Words Important for Machine Translation? Yuqi Zhang Evgeny Matusov Hermann Ney Human Language Technology and Pattern Recognition Lehrstuhl für Informatik 6 Computer Science Department RWTH
More informationLoonyBin: Making Empirical MT Reproduciple, Efficient, and Less Annoying
LoonyBin: Making Empirical MT Reproduciple, Efficient, and Less Annoying Jonathan Clark Alon Lavie CMU Machine Translation Lunch Tuesday, January 19, 2009 Outline A (Brief) Guilt Trip What goes into a
More informationData Statistics Population. Census Sample Correlation... Statistical & Practical Significance. Qualitative Data Discrete Data Continuous Data
Data Statistics Population Census Sample Correlation... Voluntary Response Sample Statistical & Practical Significance Quantitative Data Qualitative Data Discrete Data Continuous Data Fewer vs Less Ratio
More informationBagging & System Combination for POS Tagging. Dan Jinguji Joshua T. Minor Ping Yu
Bagging & System Combination for POS Tagging Dan Jinguji Joshua T. Minor Ping Yu Bagging Bagging can gain substantially in accuracy The vital element is the instability of the learning algorithm Bagging
More informationA System of Exploiting and Building Homogeneous and Large Resources for the Improvement of Vietnamese-Related Machine Translation Quality
A System of Exploiting and Building Homogeneous and Large Resources for the Improvement of Vietnamese-Related Machine Translation Quality Huỳnh Công Pháp 1 and Nguyễn Văn Bình 2 The University of Danang
More informationAligning English Strings with Abstract Meaning Representation Graphs
Aligning English Strings with Abstract Meaning Representation Graphs Nima Pourdamghani, Yang Gao, Ulf Hermjakob, Kevin Knight Information Sciences Institute Department of Computer Science University of
More informationDiscriminative Reranking for Machine Translation
Discriminative Reranking for Machine Translation Libin Shen Dept. of Comp. & Info. Science Univ. of Pennsylvania Philadelphia, PA 19104 libin@seas.upenn.edu Anoop Sarkar School of Comp. Science Simon Fraser
More informationThe GREYC Machine Translation System for the IWSLT 2008 campaign
The GREYC Machine Translation System for the IWSLT 2008 campaign Yves Lepage Adrien Lardilleux Julien Gosme Jean-Luc Manguin GREYC, University of Caen, France (GREYC@IWSLT 2008) 1 / 12 The system The system
More informationThe Prague Bulletin of Mathematical Linguistics NUMBER 91 JANUARY Memory-Based Machine Translation and Language Modeling
The Prague Bulletin of Mathematical Linguistics NUMBER 91 JANUARY 2009 17 26 Memory-Based Machine Translation and Language Modeling Antal van den Bosch, Peter Berck Abstract We describe a freely available
More informationLocal Phrase Reordering Models for Statistical Machine Translation
Local Phrase Reordering Models for Statistical Machine Translation Shankar Kumar, William Byrne Center for Language and Speech Processing, Johns Hopkins University, 3400 North Charles Street, Baltimore,
More informationThe Prague Bulletin of Mathematical Linguistics NUMBER 91 JANUARY
The Prague Bulletin of Mathematical Linguistics NUMBER 9 JANUARY 009 79 88 Z-MERT: A Fully Configurable Open Source Tool for Minimum Error Rate Training of Machine Translation Systems Omar F. Zaidan Abstract
More informationCHAPTER 8 ANFIS MODELING OF FLANK WEAR 8.1 AISI M2 HSS TOOLS
CHAPTER 8 ANFIS MODELING OF FLANK WEAR 8.1 AISI M2 HSS TOOLS Control surface as shown in Figs. 8.1 8.3 gives the interdependency of input, and output parameters guided by the various rules in the given
More informationAn efficient and user-friendly tool for machine translation quality estimation
An efficient and user-friendly tool for machine translation quality estimation Kashif Shah, Marco Turchi, Lucia Specia Department of Computer Science, University of Sheffield, UK {kashif.shah,l.specia}@sheffield.ac.uk
More informationStatic Pruning of Terms In Inverted Files
In Inverted Files Roi Blanco and Álvaro Barreiro IRLab University of A Corunna, Spain 29th European Conference on Information Retrieval, Rome, 2007 Motivation : to reduce inverted files size with lossy
More informationD6.1.2: Second report on scientific evaluations
D6.1.2: Second report on scientific evaluations UPVLC, XEROX, JSI-K4A, RWTH, EML and DDS Distribution: Public translectures Transcription and Translation of Video Lectures ICT Project 287755 Deliverable
More informationMMT Modern Machine Translation. First Evaluation Plan
This project has received funding from the European Union s Horizon 2020 research and innovation programme under grant agreement No 645487. MMT Modern Machine Translation First Evaluation Plan Author(s):
More informationLeft-to-Right Tree-to-String Decoding with Prediction
Left-to-Right Tree-to-String Decoding with Prediction Yang Feng Yang Liu Qun Liu Trevor Cohn Department of Computer Science The University of Sheffield, Sheffield, UK {y.feng, t.cohn}@sheffield.ac.uk State
More informationPacket-Level Diversity From Theory to Practice: An based Experimental Investigation
Packet-Level Diversity From Theory to Practice: An 802.11- based Experimental Investigation E. Vergetis, E. Pierce, M. Blanco and R. Guérin University of Pennsylvania Department of Electrical & Systems
More informationShallow Parsing Swapnil Chaudhari 11305R011 Ankur Aher Raj Dabre 11305R001
Shallow Parsing Swapnil Chaudhari 11305R011 Ankur Aher - 113059006 Raj Dabre 11305R001 Purpose of the Seminar To emphasize on the need for Shallow Parsing. To impart basic information about techniques
More informationJoint Decoding with Multiple Translation Models
Joint Decoding with Multiple Translation Models Yang Liu and Haitao Mi and Yang Feng and Qun Liu Key Laboratory of Intelligent Information Processing Institute of Computing Technology Chinese Academy of
More informationLarge Scale Parallel Document Mining for Machine Translation
Large Scale Parallel Document Mining for Machine Translation Jakob Uszkoreit Jay M. Ponte Ashok C. Popat Moshe Dubiner Google, Inc. {uszkoreit,ponte,popat,moshe}@google.com Abstract A distributed system
More informationA Quantitative Analysis of Reordering Phenomena
A Quantitative Analysis of Reordering Phenomena Alexandra Birch Phil Blunsom Miles Osborne a.c.birch-mayne@sms.ed.ac.uk pblunsom@inf.ed.ac.uk miles@inf.ed.ac.uk University of Edinburgh 10 Crichton Street
More informationCombining PGMs and Discriminative Models for Upper Body Pose Detection
Combining PGMs and Discriminative Models for Upper Body Pose Detection Gedas Bertasius May 30, 2014 1 Introduction In this project, I utilized probabilistic graphical models together with discriminative
More informationSearch Engines Chapter 8 Evaluating Search Engines Felix Naumann
Search Engines Chapter 8 Evaluating Search Engines 9.7.2009 Felix Naumann Evaluation 2 Evaluation is key to building effective and efficient search engines. Drives advancement of search engines When intuition
More informationJoint Decoding with Multiple Translation Models
Joint Decoding with Multiple Translation Models Yang Liu, Haitao Mi, Yang Feng, and Qun Liu Institute of Computing Technology, Chinese Academy of ciences {yliu,htmi,fengyang,liuqun}@ict.ac.cn 8/10/2009
More informationTask Description: Finding Similar Documents. Document Retrieval. Case Study 2: Document Retrieval
Case Study 2: Document Retrieval Task Description: Finding Similar Documents Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 11, 2017 Sham Kakade 2017 1 Document
More informationBetter Evaluation for Grammatical Error Correction
Better Evaluation for Grammatical Error Correction Daniel Dahlmeier 1 and Hwee Tou Ng 1,2 1 NUS Graduate School for Integrative Sciences and Engineering 2 Department of Computer Science, National University
More informationEvaluating Spoken Dialogue Systems. Julia Hirschberg CS /27/2011 1
Evaluating Spoken Dialogue Systems Julia Hirschberg CS 4706 4/27/2011 1 Dialogue System Evaluation Key point about SLP. Whenever we design a new algorithm or build a new application, need to evaluate it
More informationFast Generation of Translation Forest for Large-Scale SMT Discriminative Training
Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training Xinyan Xiao, Yang Liu, Qun Liu, and Shouxun Lin Key Laboratory of Intelligent Information Processing Institute of Computing
More informationStone Soup Translation
Stone Soup Translation DJ Hovermale and Jeremy Morris and Andrew Watts December 3, 2005 1 Introduction 2 Overview of Stone Soup Translation 2.1 Finite State Automata The Stone Soup Translation model is
More informationLearning Tractable Word Alignment Models with Complex Constraints
University of Pennsylvania ScholarlyCommons Lab Papers (GRASP) General Robotics, Automation, Sensing and Perception Laboratory 3-10-2010 Learning Tractable Word Alignment Models with Complex Constraints
More informationCS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University
CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and
More informationBetter translations with user collaboration - Integrated MT at Microsoft
Better s with user collaboration - Integrated MT at Microsoft Chris Wendt Microsoft Research One Microsoft Way Redmond, WA 98052 christw@microsoft.com Abstract This paper outlines the methodologies Microsoft
More informationIMAGE ACQUISTION AND PROCESSING FOR FINANCIAL DUE DILIGENCE
Technical Disclosure Commons Defensive Publications Series August 24, 2016 IMAGE ACQUISTION AND PROCESSING FOR FINANCIAL DUE DILIGENCE Matthew Wood Patrick Dunagan John Clark Kristina Bohl David Gonzalez
More informationStabilizing Minimum Error Rate Training
Stabilizing Minimum Error Rate Training George Foster and Roland Kuhn National Research Council Canada first.last@nrc.gc.ca Abstract The most commonly used method for training feature weights in statistical
More informationCorpus Acquisition from the Interwebs. Christian Buck, University of Edinburgh
Corpus Acquisition from the Interwebs Christian Buck, University of Edinburgh There is no data like more data (Bob Mercer, 1985) Mining Bilingual Text "Same text in different languages" Usually: one side
More informationCS 288: Statistical NLP Assignment 1: Language Modeling
CS 288: Statistical NLP Assignment 1: Language Modeling Due September 12, 2014 Collaboration Policy You are allowed to discuss the assignment with other students and collaborate on developing algorithms
More informationImproved Models of Distortion Cost for Statistical Machine Translation
Improved Models of Distortion Cost for Statistical Machine Translation Spence Green, Michel Galley, and Christopher D. Manning Computer Science Department Stanford University Stanford, CA 9435 {spenceg,mgalley,manning}@stanford.edu
More informationEXPERIMENTAL SETUP: MOSES
EXPERIMENTAL SETUP: MOSES Moses is a statistical machine translation system that allows us to automatically train translation models for any language pair. All we need is a collection of translated texts
More informationChallenge. Case Study. The fabric of space and time has collapsed. What s the big deal? Miami University of Ohio
Case Study Use Case: Recruiting Segment: Recruiting Products: Rosette Challenge CareerBuilder, the global leader in human capital solutions, operates the largest job board in the U.S. and has an extensive
More informationFast Inference in Phrase Extraction Models with Belief Propagation
Fast Inference in Phrase Extraction Models with Belief Propagation David Burkett and Dan Klein Computer Science Division University of California, Berkeley {dburkett,klein}@cs.berkeley.edu Abstract Modeling
More informationGeometric Registration for Deformable Shapes 3.3 Advanced Global Matching
Geometric Registration for Deformable Shapes 3.3 Advanced Global Matching Correlated Correspondences [ASP*04] A Complete Registration System [HAW*08] In this session Advanced Global Matching Some practical
More informationMoses version 2.1 Release Notes
Moses version 2.1 Release Notes Overview The document is the release notes for the Moses SMT toolkit, version 2.1. It describes the changes to toolkit since version 1.0 from January 2013. The aim during
More informationLearning Non-linear Features for Machine Translation Using Gradient Boosting Machines
Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines Kristina Toutanova Microsoft Research Redmond, WA 98502 kristout@microsoft.com Byung-Gyu Ahn Johns Hopkins University
More informationA Large Scale Ranker-Based System for Search Query Spelling Correction
A Large Scale Ranker-Based System for Search Query Spelling Correction Jianfeng Gao Microsoft Research Redmond, WA, USA jfgao@microsoft.com Xiaolong Li Microsoft Corporation Redmond, WA, USA xiaolong.li@microsoft.com
More informationParallel Performance and Optimization
Parallel Performance and Optimization Gregory G. Howes Department of Physics and Astronomy University of Iowa Iowa High Performance Computing Summer School University of Iowa Iowa City, Iowa 25-26 August
More informationHierarchical Phrase-Based Translation with WFSTs. Weighted Finite State Transducers
Hierarchical Phrase-Based Translation with Weighted Finite State Transducers Gonzalo Iglesias 1 Adrià de Gispert 2 Eduardo R. Banga 1 William Byrne 2 1 Department of Signal Processing and Communications
More informationBBN System Description for WMT10 System Combination Task
BBN System Description for WMT10 System Combination Task Antti-Veikko I. Rosti and Bing Zhang and Spyros Matsoukas and Richard Schwartz Raytheon BBN Technologies, 10 Moulton Street, Cambridge, MA 018,
More informationStatistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora
Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora Chris Callison-Burch David Talbot Miles Osborne School on nformatics University of Edinburgh 2 Buccleuch Place Edinburgh
More informationBLANC: Learning Evaluation Metrics for MT
BLANC: Learning Evaluation Metrics for MT Lucian Vlad Lita and Monica Rogati and Alon Lavie Carnegie Mellon University {llita,mrogati,alavie}@cs.cmu.edu Abstract We introduce BLANC, a family of dynamic,
More informationTsinghuaAligner: A Statistical Bilingual Word Alignment System
TsinghuaAligner: A Statistical Bilingual Word Alignment System Yang Liu Tsinghua University liuyang2011@tsinghua.edu.cn April 22, 2015 1 Introduction Word alignment is a natural language task that aims
More informationA Document Graph Based Query Focused Multi- Document Summarizer
A Document Graph Based Query Focused Multi- Document Summarizer By Sibabrata Paladhi and Dr. Sivaji Bandyopadhyay Department of Computer Science and Engineering Jadavpur University Jadavpur, Kolkata India
More informationFeatures for Syntax-based Moses: Project Report
Features for Syntax-based Moses: Project Report Eleftherios Avramidis, Arefeh Kazemi, Tom Vanallemeersch, Phil Williams, Rico Sennrich, Maria Nadejde, Matthias Huck 13 September 2014 Features for Syntax-based
More informationJuggling the Jigsaw Towards Automated Problem Inference from Network Trouble Tickets
Juggling the Jigsaw Towards Automated Problem Inference from Network Trouble Tickets Rahul Potharaju (Purdue University) Navendu Jain (Microsoft Research) Cristina Nita-Rotaru (Purdue University) April
More informationMarcello Federico MMT Srl / FBK Trento, Italy
Marcello Federico MMT Srl / FBK Trento, Italy Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 Page 207 Symbiotic Human and Machine Translation
More informationStart downloading our playground for SMT: wget
Before we start... Start downloading Moses: wget http://ufal.mff.cuni.cz/~tamchyna/mosesgiza.64bit.tar.gz Start downloading our playground for SMT: wget http://ufal.mff.cuni.cz/eman/download/playground.tar
More informationLearning to Reweight Terms with Distributed Representations
Learning to Reweight Terms with Distributed Representations School of Computer Science Carnegie Mellon University August 12, 215 Outline Goal: Assign weights to query terms for better retrieval results
More informationHomework 1. Leaderboard. Read through, submit the default output. Time for questions on Tuesday
Homework 1 Leaderboard Read through, submit the default output Time for questions on Tuesday Agenda Focus on Homework 1 Review IBM Models 1 & 2 Inference (compute best alignment from a corpus given model
More informationOn LM Heuristics for the Cube Growing Algorithm
On LM Heuristics for the Cube Growing Algorithm David Vilar and Hermann Ney Lehrstuhl für Informatik 6 RWTH Aachen University 52056 Aachen, Germany {vilar,ney}@informatik.rwth-aachen.de Abstract Current
More informationLarge-Scale Discriminative Training for Statistical Machine Translation Using Held-Out Line Search
Large-Scale Discriminative Training for Statistical Machine Translation Using Held-Out Line Search Jeffrey Flanigan Chris Dyer Jaime Carbonell Language Technologies Institute Carnegie Mellon University
More informationTo make sense of data, you can start by answering the following questions:
Taken from the Introductory Biology 1, 181 lab manual, Biological Sciences, Copyright NCSU (with appreciation to Dr. Miriam Ferzli--author of this appendix of the lab manual). Appendix : Understanding
More informationOCR and Automated Translation for the Navigation of non-english Handsets: A Feasibility Study with Arabic
OCR and Automated Translation for the Navigation of non-english Handsets: A Feasibility Study with Arabic Jennifer Biggs and Michael Broughton Defence Science and Technology Organisation Edinburgh, South
More information3 Baseline Systems. LangPair Train Dev DevTest Test JPC-CJ 1,000,000 2,000 2,000 2,000 JPC-KJ 1,000,000 2,000 2,000 2,000
Overview of the 2nd Workshop on Asian Translation Toshiaki Nakazawa Japan Science and Technology Agency nakazawa@pa.jst.jp Hideya Mino National Institute of Information and Communications Technology hideya.mino@nict.go.jp
More informationJane: User s Manual. David Vilar, Daniel Stein, Matthias Huck, Joern Wuebker, Markus Freitag, Stephan Peitz, Malte Nuhn, Jan-Thorsten Peter
Jane: User s Manual David Vilar, Daniel Stein, Matthias Huck, Joern Wuebker, Markus Freitag, Stephan Peitz, Malte Nuhn, Jan-Thorsten Peter February 4, 2013 Contents 1 Introduction 3 2 Installation 5 2.1
More information