Refactored Moses. Hieu Hoang MT Marathon Prague August 2013

1 Refactored Moses Hieu Hoang MT Marathon Prague August 2013

2 What did you Refactor?
  Decoder
    Feature Function Framework
    moses.ini format
    Simplify class structure
    Deleted functionality
    NOT decoding algorithms
  Tuning
    NOT tuning algorithm
  EMS (Experiment Management System)

3 Why did you Refactor?
  Feature Function Framework
    easier to implement new features
    use sparse features
  Simplify class structure
    easier to develop with Moses
  Delete functionality
    easier to refactor code
    very little deletion

6 Language Model: IN THE BEGINNING
  [class diagram] LanguageModel, LanguageModelSingleFactor, LanguageModelMultiFactor, SRI, IRST, KEN

7 Language Model: THEN
  [class diagram] LM, LMImplementation, LMRefCount, LMSingleFactor, LMMultiFactor, KEN, LMPointerState, SRI, IRST

8 Language Model: NOW
  [class diagram] LM, LMImplementation, KEN, LMSingleFactor, LMMultiFactor, SRI, IRST

9 Phrase Tables: IN THE BEGINNING
  [class diagram] PhraseDictionary, Memory, TreeAdaptor, SCFG, OnDisk

10 Phrase Tables: THEN
  [class diagram] PhraseDictionary, PDFeature, Memory, TreeAdaptor, RuleTableTrie, OnDisk, Compact, SCFG, RuleTableUTrie

11 Phrase Tables: NOW
  [class diagram] PhraseDictionary, TreeAdaptor, RuleTableTrie, OnDisk, Compact, Memory, RuleTableUTrie

13 Deleted
  1. Translation Systems
     multiple engines in 1 decoding process
     replaced with alternative weights function
  2. Distributed Language Model
     send LM queries to different machines
     replace with multipass/asynchronous decoding?
  3. Continue Partial Translation
     start from non-empty hypothesis

15 Adding new Feature Function: THEN
  Add entry to Parameter class
  Read parameter in StaticData
    add variable to hold new feature
    load feature
  Bespoke ini file:
    [new-feature-section]
    file.name
    [weight-new-feature]

16 Adding new Feature Function: NOW
  StaticData
    Simple framework to load feature
  ini file:
    [feature]
    WordPenalty
    Distortion
    NEW-FEATURE file=path factor=0
    [weight]
    WordPenalty0=
    Distortion0= 0.14
    NEW-FEATURE0= 0.54
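Each line of the [feature] section above reads as a feature name followed by key=value arguments. A minimal sketch of splitting such a line, assuming whitespace-separated tokens; FeatureSpec and ParseFeatureLine are hypothetical names for illustration, not the Moses loader itself:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical holder for one line of the [feature] section.
struct FeatureSpec {
  std::string name;                         // e.g. "NEW-FEATURE"
  std::map<std::string, std::string> args;  // e.g. file -> path, factor -> 0
};

// Splits "NAME key=value key=value ..." on whitespace.
// Tokens after the name that contain no '=' are ignored in this sketch.
inline FeatureSpec ParseFeatureLine(const std::string &line) {
  FeatureSpec spec;
  std::istringstream in(line);
  in >> spec.name;
  std::string token;
  while (in >> token) {
    const std::string::size_type eq = token.find('=');
    if (eq == std::string::npos) continue;
    spec.args[token.substr(0, eq)] = token.substr(eq + 1);
  }
  return spec;
}
```

With this shape, a new feature needs no dedicated ini section: its arguments travel on its own line, and its weight is looked up by name in [weight].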

17 MERT: THEN
  Only know about certain feature functions
  Hardcoded array
    = qw(d=weight-d lm=weight-l tm=weight-t w=weight-w g=weight-generation lex=weight-lex I=weight-i dlm=weight-dlm pp=weight-pp wt=weight-wt pb=weight-pb lex=weight-lex glm=weight-glm);
    requires feature name and abbreviation

18 MERT: THEN
  Only know about certain feature functions
  Hardcoded array
    = qw(d=weight-d lm=weight-l tm=weight-t w=weight-w g=weight-generation lex=weight-lex I=weight-i dlm=weight-dlm pp=weight-pp wt=weight-wt pb=weight-pb lex=weight-lex glm=weight-glm);
    requires feature name and abbreviation
  Deleted array
  Ask decoder for feature function
  abbreviations deleted

19 Timeline of a Translation Rule
  File -> (Load) -> Memory -> (Apply to input sentence) -> Translation Option -> (Search) -> Hypothesis

20 Timeline of a Translation Rule
  File
    Load: Source phrase, Target phrase
  Memory
    Apply to input sentence: Input sentence, Input path
  Translation Option
    Search: Translation context, Segmentation
  Hypothesis

21 Timeline of a Translation Rule
  File
    Load: Once
  Memory
    Apply to input sentence: Per occurrence in sentence
  Translation Option
    Search: Per hypothesis
  Hypothesis

22 Feature Function API: Loading (File -> Memory)
  X → je suis X1
      I am X1
  Access to:
    Source phrase: je suis X1
    Target phrase: I am X1
  void Evaluate(source,
                target,
                scores,
                estimated future scores)
  Feature functions that use this:
    Word Penalty
    Phrase penalty
    Language model (partial)

23 Feature Function API: Apply to input sentence (Memory -> Translation Option)
  Access to:
    Input: je PRO suis VB 25 NUM ans NNS ..
    Input subphrase: je PRO suis VB 25 NUM
    Input scores:
  void Evaluate(input,
                input path,
                scores)
  Feature functions that use this:
    Input feature
    Bag-of-word features

24 Feature Function API: Search (Translation Option -> Hypothesis)
  Hypothesis:
    enumerated tuples
    defines the search
  Access to:
    Current hypothesis
    Previous hypotheses
    Segmentation
  Stateless features:
    void Evaluate(hypo, scores)
    void EvaluateChart(hypo, scores)
  Stateful features:
    state Evaluate(hypo,
                   previous state,
                   scores)
    state EvaluateChart(hypo,
                        previous state,
                        scores)

25 Feature Function API: Decoding (Translation Option -> Hypothesis)
  Feature functions that use this:
    All stateful features
      Language models
      Distortion model
      Lexicalized distortion
    Some stateless features
      Global lexicalized model
      Word translation
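The stateful signature above, state Evaluate(hypo, previous state, scores), can be illustrated with a distortion-like feature whose state is the source position where the previous phrase ended. This is a simplified sketch: the struct and function names are hypothetical, and the real decoder types carry far more information.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical state: source position where the previous phrase ended.
struct SketchDistortionState { int lastEnd; };

// Hypothetical view of a hypothesis extension: one source span [start, end].
struct SketchHypothesis { int start; int end; };

// Stateful evaluation: subtracts the jump distance from the running score
// and returns the state that the next extension will receive.
inline SketchDistortionState EvaluateDistortion(
    const SketchHypothesis &hypo,
    const SketchDistortionState &previous,
    float &score) {
  score -= static_cast<float>(std::abs(hypo.start - (previous.lastEnd + 1)));
  SketchDistortionState next;
  next.lastEnd = hypo.end;
  return next;
}
```

A stateless feature would have the same shape without the previous-state argument and without a return value.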

26 Feature Function
  Loading:
    void Evaluate(source,
                  target,
                  scores,
                  estimated future scores)
  Apply to Input:
    void Evaluate(input,
                  input path,
                  scores)
  Search:
    Stateless features:
      void Evaluate(hypo, scores)
      void EvaluateChart(hypo, scores)
    Stateful features:
      state Evaluate(hypo,
                     previous state,
                     scores)
      state EvaluateChart(hypo,
                          previous state,
                          scores)
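A feature that depends only on the rule can do all of its work in the loading hook and leave the other hooks empty. The sketch below assumes simplified stand-in types (Phrase, Scores) and gives the hooks distinct names for readability, whereas the slides call them all Evaluate; none of the class names are from Moses.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for the decoder's types (assumptions for illustration).
typedef std::vector<std::string> Phrase;
struct Scores {
  float value;
  Scores() : value(0.0f) {}
};

// Hypothetical base class with one hook per stage; the defaults do nothing.
class StagedFeature {
public:
  virtual ~StagedFeature() {}
  // Loading: called once per rule, sees only the source and target phrase.
  virtual void EvaluateOnLoad(const Phrase & /*source*/,
                              const Phrase & /*target*/,
                              Scores & /*scores*/,
                              Scores & /*estimatedFuture*/) const {}
  // Apply to input: called per occurrence of the rule in the sentence.
  virtual void EvaluateOnInput(const Phrase & /*input*/,
                               const Phrase & /*inputPath*/,
                               Scores & /*scores*/) const {}
};

// A word-penalty-like feature needs only the target phrase, so it
// overrides the loading hook and nothing else.
class SketchWordPenalty : public StagedFeature {
public:
  virtual void EvaluateOnLoad(const Phrase & /*source*/,
                              const Phrase &target,
                              Scores &scores,
                              Scores & /*estimatedFuture*/) const {
    scores.value -= static_cast<float>(target.size());
  }
};
```

Scoring at load time means the cost is paid once per rule rather than once per hypothesis.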

27 Strange Feature Functions (1)
  Language model
    implement 2 Evaluate()
    1. Loading
       evaluate full n-grams
       estimate future cost for partial n-grams
    2. Decoding
       example: reprise de la session / resumption of the session
       evaluate overlapping n-grams
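The split between the two language-model evaluations can be made concrete by listing which n-grams each stage sees. In this sketch, assuming a plain word-level n-gram model, FullNGrams returns the n-grams that lie entirely inside a target phrase (scorable at load time), and OverlappingNGrams returns those that reach back into the previous hypothesis (scorable only during decoding). Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> Words;

// Loading: n-grams that lie entirely inside the target phrase.
// They do not depend on the hypothesis, so they can be scored once.
inline std::vector<Words> FullNGrams(const Words &phrase, std::size_t order) {
  assert(order >= 1);
  std::vector<Words> out;
  for (std::size_t end = order; end <= phrase.size(); ++end) {
    out.push_back(Words(phrase.begin() + (end - order), phrase.begin() + end));
  }
  return out;
}

// Decoding: n-grams ending on one of the first (order - 1) words of the
// phrase; their context comes from the previous hypothesis.
inline std::vector<Words> OverlappingNGrams(const Words &previous,
                                            const Words &phrase,
                                            std::size_t order) {
  assert(order >= 1);
  std::vector<Words> out;
  Words joined(previous);  // previous words followed by the new phrase
  joined.insert(joined.end(), phrase.begin(), phrase.end());
  for (std::size_t i = 0; i < phrase.size() && i + 1 < order; ++i) {
    const std::size_t end = previous.size() + i + 1;  // one past the word
    const std::size_t begin = end >= order ? end - order : 0;
    out.push_back(Words(joined.begin() + begin, joined.begin() + end));
  }
  return out;
}
```

For a trigram model and the phrase "resumption of the session", the first function yields "resumption of the" and "of the session"; the n-grams ending on "resumption" and "of" are left to the second.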

28 Strange Feature Functions (2)
  Phrase-tables
  Unknown Word Penalty
  Generation Model
    integral part of decoding process
    uses no Evaluate()
    scores assigned by decoder

29 Register a Feature Function
  Register in moses/ff/factory.cpp
    add entry
    MOSES_FNAME(ClassName);
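A one-line registration macro of this kind is typically backed by a map from feature name to a creator function. The sketch below shows that general pattern only; FeatureRegistry, RegisteredFeature, CreateFeature and SKETCH_FNAME are hypothetical names, and the actual definition of MOSES_FNAME may differ.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical minimal feature interface.
class RegisteredFeature {
public:
  virtual ~RegisteredFeature() {}
  virtual std::string Name() const = 0;
};

// Hypothetical registry: maps the name used in the ini file to a creator.
class FeatureRegistry {
public:
  typedef RegisteredFeature *(*Creator)();
  void Add(const std::string &name, Creator creator) {
    creators_[name] = creator;
  }
  // Returns a new feature, or nullptr when the name was never registered.
  RegisteredFeature *Create(const std::string &name) const {
    std::map<std::string, Creator>::const_iterator it = creators_.find(name);
    return it == creators_.end() ? nullptr : it->second();
  }
private:
  std::map<std::string, Creator> creators_;
};

template <class F>
RegisteredFeature *CreateFeature() { return new F(); }

// Stringifies the class name so the ini-file name and the class stay in sync.
#define SKETCH_FNAME(registry, ClassName) \
  (registry).Add(#ClassName, &CreateFeature<ClassName>)

class NewFeature : public RegisteredFeature {
public:
  virtual std::string Name() const { return "NewFeature"; }
};
```

After SKETCH_FNAME(registry, NewFeature), a loader that reads "NewFeature" from the [feature] section can call registry.Create("NewFeature") without knowing the class.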

30 Language Model
  Inherit from LanguageModelSingleFactor
  class LanguageModelSingleFactor : ...
  {
    virtual LMResult GetValue(
      const std::vector<const Word*> &contextfactor,
      state* finalstate = NULL) const = 0;
  }

31 Phrase Table
  Inherit from PhraseDictionary
  Legacy API:
  class PhraseDictionary : ...
  {
  public:
    TargetPhraseCollection *GetTargetPhraseCollectionLEGACY(
      Phrase &src) const;

  protected:
    const TargetPhraseCollection
      *GetTargetPhraseCollectionNonCacheLEGACY(Phrase &src);
  }

32 Phrase Table
  New API:
  class PhraseDictionary : ...
  {
  public:
    void GetTargetPhraseCollectionBatch(InputPathList &);
  }
  InputPathList
    Input Sentence: je suis 25 ans.
    Input Path List:
      je
      je suis
      je suis 25
      suis
      suis 25
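The input path list shown above is the set of contiguous subphrases of the sentence, grouped by start position. A minimal sketch of building such a list, assuming a maximum span length; SketchInputPath and BuildInputPaths are hypothetical stand-ins, not the Moses InputPath class:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an input path: a contiguous span of the sentence.
struct SketchInputPath {
  std::size_t start;               // index of the first word
  std::vector<std::string> words;  // the words in the span
};

// Lists every contiguous subphrase of at most maxLength words,
// ordered by start position and then by length.
inline std::vector<SketchInputPath> BuildInputPaths(
    const std::vector<std::string> &sentence, std::size_t maxLength) {
  std::vector<SketchInputPath> paths;
  for (std::size_t start = 0; start < sentence.size(); ++start) {
    SketchInputPath path;
    path.start = start;
    for (std::size_t len = 1;
         len <= maxLength && start + len <= sentence.size(); ++len) {
      path.words.push_back(sentence[start + len - 1]);
      paths.push_back(path);  // copy of the span grown so far
    }
  }
  return paths;
}
```

Handing the whole list to the phrase table in one call lets each implementation decide how to share work between overlapping spans.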

33 Phrase-table Optimization
  Input Sentence: je suis 25 ans.
  Input Path List:
    je
    je suis
    je suis 25
    suis
  [diagram: phrase-table trie rooted at node 0, with edges labelled je and suis]
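One plausible reading of this slide is that a trie-backed phrase table can look up "je suis" by continuing from the node already reached for "je", instead of restarting at the root for every path. The sketch below illustrates that idea under this assumption; SketchTrie and LookupBatch are hypothetical and much simpler than any Moses phrase table.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

const std::size_t kNoNode = static_cast<std::size_t>(-1);

// Hypothetical phrase-table trie; node 0 is the root.
class SketchTrie {
  struct Node {
    std::map<std::string, std::size_t> next;
    bool hasTranslations;
    Node() : hasTranslations(false) {}
  };
public:
  SketchTrie() : nodes_(1) {}
  void Insert(const std::vector<std::string> &phrase) {
    std::size_t node = 0;
    for (std::size_t i = 0; i < phrase.size(); ++i) {
      const std::size_t found = Step(node, phrase[i]);
      if (found == kNoNode) {
        nodes_.push_back(Node());
        nodes_[node].next[phrase[i]] = nodes_.size() - 1;
        node = nodes_.size() - 1;
      } else {
        node = found;
      }
    }
    nodes_[node].hasTranslations = true;
  }
  // One step from node along word; kNoNode if there is no such edge.
  std::size_t Step(std::size_t node, const std::string &word) const {
    const std::map<std::string, std::size_t> &next = nodes_[node].next;
    std::map<std::string, std::size_t>::const_iterator it = next.find(word);
    return it == next.end() ? kNoNode : it->second;
  }
  bool HasTranslations(std::size_t node) const {
    return nodes_[node].hasTranslations;
  }
private:
  std::vector<Node> nodes_;
};

// Looks up all spans per start position, extending the prefix's node by one
// word at a time. Returns the matching spans as (start, length) pairs.
inline std::vector<std::pair<std::size_t, std::size_t> > LookupBatch(
    const SketchTrie &trie, const std::vector<std::string> &sentence) {
  std::vector<std::pair<std::size_t, std::size_t> > found;
  for (std::size_t start = 0; start < sentence.size(); ++start) {
    std::size_t node = 0;
    for (std::size_t len = 1; start + len <= sentence.size(); ++len) {
      node = trie.Step(node, sentence[start + len - 1]);
      if (node == kNoNode) break;  // no longer span can match either
      if (trie.HasTranslations(node)) found.push_back(std::make_pair(start, len));
    }
  }
  return found;
}
```

Each span costs a single trie step on top of its prefix, and a missing edge prunes every longer span with the same start.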

34 Phrase-table Lattice Input
  Input Path List:
    einen
    einen wettbewerb
    einen wettbewerbs
    einen wettbewerbsbedingten
    einen wettbewerb bedingten
    einen wettbewerbs bedingten
    einen wettbewerb bedingten preissturz
    einen wettbewerbs bedingten preis
    einen wettbewerbsbedingten preissturz
    einen wettbewerbsbedingten preis.

35 Phrase Table: Syntax Decoding
  New API:
    Active Parsing
    CKY+
  class PhraseDictionary : ...
  {
    virtual ChartRuleLookupManager *CreateRuleLookupManager(
      const ChartParser &,
      const ChartCellCollectionBase &) = 0;
  }
  class ChartRuleLookupManager
  {
    virtual void GetChartRuleCollection(
      const WordsRange &range,
      ChartParserCallback &outcoll) = 0;
  }

36 Results
  Lattice decoding
    use all phrase-table implementations
  Syntax model (nearly)
    One in-memory phrase-table implementation
    One on-disk phrase-table
      Syntax models AND phrase-based model
  Zens's binary implementation
    phrase-based only
    deprecated
    to be removed

37 Results: Translation Quality
  Pass regression tests
    tests have also changed
  BLEU unchanged
    subject to random variability
  [table: BLEU by language pair and system, columns "#", "Description", "v ...", "... th July"; language pairs En-es, Es-en, En-cs, Cs-en, En-de, De-en, En-fr, Fr-en; systems include pb, hiero, + CreateOnDiskPt, + POS de, + POS en, + placeholders, + factors, + kbmira, + pro, + Ken's incre algo, pb truecase, pb recase; only the values 15.7 and 22.46 survive]

38 Results: Speed
  Phrase-based
    [table: speed (sec) for v ... and for master + caching, with and without compact pt; values not recoverable]
  Hierarchical
    [table: time by pop limit, plus a Loading column]
    v 1.0: 1,044  1,393  2,620  4,...
    Reduction (excl load): -65%  -26%  -21%  -18%

39 Comparison with the decoders

Moses version 2.1 Release Notes

Moses version 2.1 Release Notes Moses version 2.1 Release Notes Overview The document is the release notes for the Moses SMT toolkit, version 2.1. It describes the changes to toolkit since version 1.0 from January 2013. The aim during

More information

Moses Version 3.0 Release Notes

Moses Version 3.0 Release Notes Moses Version 3.0 Release Notes Overview The document contains the release notes for the Moses SMT toolkit, version 3.0. It describes the changes to the toolkit since version 2.0 from January 2014. In

More information

MOSES. Statistical Machine Translation System. User Manual and Code Guide. Philipp Koehn University of Edinburgh

MOSES. Statistical Machine Translation System. User Manual and Code Guide. Philipp Koehn University of Edinburgh MOSES Statistical Machine Translation System User Manual and Code Guide Philipp Koehn pkoehn@inf.ed.ac.uk University of Edinburgh Abstract This document serves as user manual and code guide for the Moses

More information

Joint Decoding with Multiple Translation Models

Joint Decoding with Multiple Translation Models Joint Decoding with Multiple Translation Models Yang Liu, Haitao Mi, Yang Feng, and Qun Liu Institute of Computing Technology, Chinese Academy of ciences {yliu,htmi,fengyang,liuqun}@ict.ac.cn 8/10/2009

More information

Deliverable D1.1. Moses Specification. Work Package: WP1: Moses Coordination and Integration Lead Author: Barry Haddow Due Date: May 1st, 2012

Deliverable D1.1. Moses Specification. Work Package: WP1: Moses Coordination and Integration Lead Author: Barry Haddow Due Date: May 1st, 2012 MOSES CORE Deliverable D1.1 Moses Specification Work Package: WP1: Moses Coordination and Integration Lead Author: Barry Haddow Due Date: May 1st, 2012 August 15, 2013 2 Note This document was extracted

More information

NTT SMT System for IWSLT Katsuhito Sudoh, Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki NTT Communication Science Labs.

NTT SMT System for IWSLT Katsuhito Sudoh, Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki NTT Communication Science Labs. NTT SMT System for IWSLT 2008 Katsuhito Sudoh, Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki NTT Communication Science Labs., Japan Overview 2-stage translation system k-best translation

More information

The Prague Bulletin of Mathematical Linguistics NUMBER 94 SEPTEMBER An Experimental Management System. Philipp Koehn

The Prague Bulletin of Mathematical Linguistics NUMBER 94 SEPTEMBER An Experimental Management System. Philipp Koehn The Prague Bulletin of Mathematical Linguistics NUMBER 94 SEPTEMBER 2010 87 96 An Experimental Management System Philipp Koehn University of Edinburgh Abstract We describe Experiment.perl, an experimental

More information

A General-Purpose Rule Extractor for SCFG-Based Machine Translation

A General-Purpose Rule Extractor for SCFG-Based Machine Translation A General-Purpose Rule Extractor for SCFG-Based Machine Translation Greg Hanneman, Michelle Burroughs, and Alon Lavie Language Technologies Institute Carnegie Mellon University Fifth Workshop on Syntax

More information

CS6200 Informa.on Retrieval. David Smith College of Computer and Informa.on Science Northeastern University

CS6200 Informa.on Retrieval. David Smith College of Computer and Informa.on Science Northeastern University CS6200 Informa.on Retrieval David Smith College of Computer and Informa.on Science Northeastern University Indexing Process Indexes Indexes are data structures designed to make search faster Text search

More information

Features for Syntax-based Moses: Project Report

Features for Syntax-based Moses: Project Report Features for Syntax-based Moses: Project Report Eleftherios Avramidis, Arefeh Kazemi, Tom Vanallemeersch, Phil Williams, Rico Sennrich, Maria Nadejde, Matthias Huck 13 September 2014 Features for Syntax-based

More information

MOSES. Statistical Machine Translation System. User Manual and Code Guide. Philipp Koehn University of Edinburgh

MOSES. Statistical Machine Translation System. User Manual and Code Guide. Philipp Koehn University of Edinburgh MOSES Statistical Machine Translation System User Manual and Code Guide Philipp Koehn pkoehn@inf.ed.ac.uk University of Edinburgh Abstract This document serves as user manual and code guide for the Moses

More information

Hierarchical Phrase-Based Translation with WFSTs. Weighted Finite State Transducers

Hierarchical Phrase-Based Translation with WFSTs. Weighted Finite State Transducers Hierarchical Phrase-Based Translation with Weighted Finite State Transducers Gonzalo Iglesias 1 Adrià de Gispert 2 Eduardo R. Banga 1 William Byrne 2 1 Department of Signal Processing and Communications

More information

THE CUED NIST 2009 ARABIC-ENGLISH SMT SYSTEM

THE CUED NIST 2009 ARABIC-ENGLISH SMT SYSTEM THE CUED NIST 2009 ARABIC-ENGLISH SMT SYSTEM Adrià de Gispert, Gonzalo Iglesias, Graeme Blackwood, Jamie Brunning, Bill Byrne NIST Open MT 2009 Evaluation Workshop Ottawa, The CUED SMT system Lattice-based

More information

Statistical Machine Translation Part IV Log-Linear Models

Statistical Machine Translation Part IV Log-Linear Models Statistical Machine Translation art IV Log-Linear Models Alexander Fraser Institute for Natural Language rocessing University of Stuttgart 2011.11.25 Seminar: Statistical MT Where we have been We have

More information

Search Engines. Informa1on Retrieval in Prac1ce. Annotations by Michael L. Nelson

Search Engines. Informa1on Retrieval in Prac1ce. Annotations by Michael L. Nelson Search Engines Informa1on Retrieval in Prac1ce Annotations by Michael L. Nelson All slides Addison Wesley, 2008 Indexes Indexes are data structures designed to make search faster Text search has unique

More information

Better Word Alignment with Supervised ITG Models. Aria Haghighi, John Blitzer, John DeNero, and Dan Klein UC Berkeley

Better Word Alignment with Supervised ITG Models. Aria Haghighi, John Blitzer, John DeNero, and Dan Klein UC Berkeley Better Word Alignment with Supervised ITG Models Aria Haghighi, John Blitzer, John DeNero, and Dan Klein UC Berkeley Word Alignment s Indonesia parliament court in arraigned speaker Word Alignment s Indonesia

More information

iems Interactive Experiment Management System Final Report

iems Interactive Experiment Management System Final Report iems Interactive Experiment Management System Final Report Pēteris Ņikiforovs Introduction Interactive Experiment Management System (Interactive EMS or iems) is an experiment management system with a graphical

More information

KenLM: Faster and Smaller Language Model Queries

KenLM: Faster and Smaller Language Model Queries KenLM: Faster and Smaller Language Model Queries Kenneth heafield@cs.cmu.edu Carnegie Mellon July 30, 2011 kheafield.com/code/kenlm What KenLM Does Answer language model queries using less time and memory.

More information

Sparse Feature Learning

Sparse Feature Learning Sparse Feature Learning Philipp Koehn 1 March 2016 Multiple Component Models 1 Translation Model Language Model Reordering Model Component Weights 2 Language Model.05 Translation Model.26.04.19.1 Reordering

More information

K-best Parsing Algorithms

K-best Parsing Algorithms K-best Parsing Algorithms Liang Huang University of Pennsylvania joint work with David Chiang (USC Information Sciences Institute) k-best Parsing Liang Huang (Penn) k-best parsing 2 k-best Parsing I saw

More information

Outline GIZA++ Moses. Demo. Steps Output files. Training pipeline Decoder

Outline GIZA++ Moses. Demo. Steps Output files. Training pipeline Decoder GIZA++ and Moses Outline GIZA++ Steps Output files Moses Training pipeline Decoder Demo GIZA++ A statistical machine translation toolkit used to train IBM Models 1-5 (moses only uses output of IBM Model-1)

More information

The HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10

The HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10 The HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10 Patrick Simianer, Gesa Stupperich, Laura Jehl, Katharina Wäschle, Artem Sokolov, Stefan Riezler Institute for Computational Linguistics,

More information

CMU-UKA Syntax Augmented Machine Translation

CMU-UKA Syntax Augmented Machine Translation Outline CMU-UKA Syntax Augmented Machine Translation Ashish Venugopal, Andreas Zollmann, Stephan Vogel, Alex Waibel InterACT, LTI, Carnegie Mellon University Pittsburgh, PA Outline Outline 1 2 3 4 Issues

More information

The Prague Bulletin of Mathematical Linguistics NUMBER 91 JANUARY

The Prague Bulletin of Mathematical Linguistics NUMBER 91 JANUARY The Prague Bulletin of Mathematical Linguistics NUMBER 9 JANUARY 009 79 88 Z-MERT: A Fully Configurable Open Source Tool for Minimum Error Rate Training of Machine Translation Systems Omar F. Zaidan Abstract

More information

A Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions

A Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions A Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions Marcin Junczys-Dowmunt Faculty of Mathematics and Computer Science Adam Mickiewicz University ul. Umultowska 87, 61-614

More information

CS 406: Syntax Directed Translation

CS 406: Syntax Directed Translation CS 406: Syntax Directed Translation Stefan D. Bruda Winter 2015 SYNTAX DIRECTED TRANSLATION Syntax-directed translation the source language translation is completely driven by the parser The parsing process

More information

Apply. A. Michelle Lawing Ecosystem Science and Management Texas A&M University College Sta,on, TX

Apply. A. Michelle Lawing Ecosystem Science and Management Texas A&M University College Sta,on, TX Apply A. Michelle Lawing Ecosystem Science and Management Texas A&M University College Sta,on, TX 77843 alawing@tamu.edu Schedule for today My presenta,on Review New stuff Mixed, Fixed, and Random Models

More information

A Quantitative Analysis of Reordering Phenomena

A Quantitative Analysis of Reordering Phenomena A Quantitative Analysis of Reordering Phenomena Alexandra Birch Phil Blunsom Miles Osborne a.c.birch-mayne@sms.ed.ac.uk pblunsom@inf.ed.ac.uk miles@inf.ed.ac.uk University of Edinburgh 10 Crichton Street

More information

Opera&ng Systems ECE344

Opera&ng Systems ECE344 Opera&ng Systems ECE344 Lecture 8: Paging Ding Yuan Lecture Overview Today we ll cover more paging mechanisms: Op&miza&ons Managing page tables (space) Efficient transla&ons (TLBs) (&me) Demand paged virtual

More information

Machine Translation Zoo

Machine Translation Zoo Machine Translation Zoo Tree-to-tree transfer and Discriminative learning Martin Popel ÚFAL (Institute of Formal and Applied Linguistics) Charles University in Prague May 5th 2013, Seminar of Formal Linguistics,

More information

Search Engines. Informa1on Retrieval in Prac1ce. Annota1ons by Michael L. Nelson

Search Engines. Informa1on Retrieval in Prac1ce. Annota1ons by Michael L. Nelson Search Engines Informa1on Retrieval in Prac1ce Annota1ons by Michael L. Nelson All slides Addison Wesley, 2008 Evalua1on Evalua1on is key to building effec$ve and efficient search engines measurement usually

More information

RegMT System for Machine Translation, System Combination, and Evaluation

RegMT System for Machine Translation, System Combination, and Evaluation RegMT System for Machine Translation, System Combination, and Evaluation Ergun Biçici Koç University 34450 Sariyer, Istanbul, Turkey ebicici@ku.edu.tr Deniz Yuret Koç University 34450 Sariyer, Istanbul,

More information

D1.6 Toolkit and Report for Translator Adaptation to New Languages (Final Version)

D1.6 Toolkit and Report for Translator Adaptation to New Languages (Final Version) www.kconnect.eu D1.6 Toolkit and Report for Translator Adaptation to New Languages (Final Version) Deliverable number D1.6 Dissemination level Public Delivery date 16 September 2016 Status Author(s) Final

More information

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

More information

LoonyBin: Making Empirical MT Reproduciple, Efficient, and Less Annoying

LoonyBin: Making Empirical MT Reproduciple, Efficient, and Less Annoying LoonyBin: Making Empirical MT Reproduciple, Efficient, and Less Annoying Jonathan Clark Alon Lavie CMU Machine Translation Lunch Tuesday, January 19, 2009 Outline A (Brief) Guilt Trip What goes into a

More information

Experimenting in MT: Moses Toolkit and Eman

Experimenting in MT: Moses Toolkit and Eman Experimenting in MT: Moses Toolkit and Eman Aleš Tamchyna, Ondřej Bojar Institute of Formal and Applied Linguistics Faculty of Mathematics and Physics Charles University, Prague Tue Sept 8, 2015 1 / 30

More information

Transla'on, Protec'on, and Virtual Memory. 2/25/16 CS 152 Sec'on 6 Colin Schmidt

Transla'on, Protec'on, and Virtual Memory. 2/25/16 CS 152 Sec'on 6 Colin Schmidt Transla'on, Protec'on, and Virtual Memory 2/25/16 CS 152 Sec'on 6 Colin Schmidt Agenda Protec'on Transla'on Virtual Memory Lab 1 Feedback Ques'ons/Open-ended discussion Hand back Lab 1 Protec'on Why? Supervisor

More information

Diagnostic evaluation of MT with DELiC4MT

Diagnostic evaluation of MT with DELiC4MT Diagnostic evaluation of MT with DELiC4MT Final report MT Marathon 2012 Edinburgh, 5th September 2012 Walid Aransa, Luong Ngoc Quang, Antonio Toral Overview Automatic Diagonistic Evaluation on Linguistic

More information

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

SEMANTIC ANALYSIS TYPES AND DECLARATIONS SEMANTIC ANALYSIS CS 403: Type Checking Stefan D. Bruda Winter 2015 Parsing only verifies that the program consists of tokens arranged in a syntactically valid combination now we move to check whether

More information

INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS. Jan Tore Lønning, Lecture 8, 12 Oct

INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS. Jan Tore Lønning, Lecture 8, 12 Oct 1 INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS Jan Tore Lønning, Lecture 8, 12 Oct. 2016 jtl@ifi.uio.no Today 2 Preparing bitext Parameter tuning Reranking Some linguistic issues STMT so far 3 We

More information

Ways to implement a language

Ways to implement a language Interpreters Implemen+ng PLs Most of the course is learning fundamental concepts for using PLs Syntax vs. seman+cs vs. idioms Powerful constructs like closures, first- class objects, iterators (streams),

More information

Informa(on Retrieval

Informa(on Retrieval Introduc)on to Informa)on Retrieval CS3245 Informa(on Retrieval Lecture 7: Scoring, Term Weigh9ng and the Vector Space Model 7 Last Time: Index Construc9on Sort- based indexing Blocked Sort- Based Indexing

More information

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF) Chapter 3: Describing Syntax and Semantics Introduction Formal methods of describing syntax (BNF) We can analyze syntax of a computer program on two levels: 1. Lexical level 2. Syntactic level Lexical

More information

11. a b c d e. 12. a b c d e. 13. a b c d e. 14. a b c d e. 15. a b c d e

11. a b c d e. 12. a b c d e. 13. a b c d e. 14. a b c d e. 15. a b c d e CS-3160 Concepts of Programming Languages Spring 2015 EXAM #1 (Chapters 1-6) Name: SCORES MC: /75 PROB #1: /15 PROB #2: /10 TOTAL: /100 Multiple Choice Responses Each multiple choice question in the separate

More information

Advanced Topics in MNIT. Lecture 1 (27 Aug 2015) CADSL

Advanced Topics in MNIT. Lecture 1 (27 Aug 2015) CADSL Compiler Construction Virendra Singh Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/ E-mail:

More information

The Prague Bulletin of Mathematical Linguistics NUMBER 98 OCTOBER

The Prague Bulletin of Mathematical Linguistics NUMBER 98 OCTOBER The Prague Bulletin of Mathematical Linguistics NUMBER 98 OCTOBER 2012 63 74 Phrasal Rank-Encoding: Exploiting Phrase Redundancy and Translational Relations for Phrase Table Compression Marcin Junczys-Dowmunt

More information

EXPERIMENTAL SETUP: MOSES

EXPERIMENTAL SETUP: MOSES EXPERIMENTAL SETUP: MOSES Moses is a statistical machine translation system that allows us to automatically train translation models for any language pair. All we need is a collection of translated texts

More information

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web

More information

Language Support for Stream. Processing. Robert Soulé

Language Support for Stream. Processing. Robert Soulé Language Support for Stream Processing Robert Soulé Introduction Proposal Work Plan Outline Formalism Translators & Abstractions Optimizations Related Work Conclusion 2 Introduction Stream processing is

More information

Informa(on Retrieval

Informa(on Retrieval Introduc)on to Informa)on Retrieval CS3245 Informa(on Retrieval Lecture 7: Scoring, Term Weigh9ng and the Vector Space Model 7 Last Time: Index Compression Collec9on and vocabulary sta9s9cs: Heaps and

More information

Mul$ple Upstream Interface Support for IGMP/MLD Proxy

Mul$ple Upstream Interface Support for IGMP/MLD Proxy 93 rd IETF, July 2015, Prague, Czech Republic Mul$ple Upstream Interface Support for dra

More information

A Smorgasbord of Features to Combine Phrase-Based and Neural Machine Translation

A Smorgasbord of Features to Combine Phrase-Based and Neural Machine Translation A Smorgasbord of Features to Combine Phrase-Based and Neural Machine Translation Benjamin Marie Atsushi Fujita National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho,

More information

What were his cri+cisms? Classical Methodologies:

What were his cri+cisms? Classical Methodologies: 1 2 Classifica+on In this scheme there are several methodologies, such as Process- oriented, Blended, Object Oriented, Rapid development, People oriented and Organisa+onal oriented. According to David

More information

CS 288: Statistical NLP Assignment 1: Language Modeling

CS 288: Statistical NLP Assignment 1: Language Modeling CS 288: Statistical NLP Assignment 1: Language Modeling Due September 12, 2014 Collaboration Policy You are allowed to discuss the assignment with other students and collaborate on developing algorithms

More information

Introduc)on to. CS60092: Informa0on Retrieval

Introduc)on to. CS60092: Informa0on Retrieval Introduc)on to CS60092: Informa0on Retrieval Ch. 4 Index construc)on How do we construct an index? What strategies can we use with limited main memory? Sec. 4.1 Hardware basics Many design decisions in

More information

An Unsupervised Model for Joint Phrase Alignment and Extraction

An Unsupervised Model for Joint Phrase Alignment and Extraction An Unsupervised Model for Joint Phrase Alignment and Extraction Graham Neubig 1,2, Taro Watanabe 2, Eiichiro Sumita 2, Shinsuke Mori 1, Tatsuya Kawahara 1 1 Graduate School of Informatics, Kyoto University

More information

Compila(on /15a Lecture 6. Seman(c Analysis Noam Rinetzky

Compila(on /15a Lecture 6. Seman(c Analysis Noam Rinetzky Compila(on 0368-3133 2014/15a Lecture 6 Seman(c Analysis Noam Rinetzky 1 You are here Source text txt Process text input characters Lexical Analysis tokens Annotated AST Syntax Analysis AST Seman(c Analysis

More information

Compiler Optimization Intermediate Representation

Compiler Optimization Intermediate Representation Compiler Optimization Intermediate Representation Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology

More information

PBML. logo. The Prague Bulletin of Mathematical Linguistics NUMBER??? FEBRUARY Improved Minimum Error Rate Training in Moses

PBML. logo. The Prague Bulletin of Mathematical Linguistics NUMBER??? FEBRUARY Improved Minimum Error Rate Training in Moses PBML logo The Prague Bulletin of Mathematical Linguistics NUMBER??? FEBRUARY 2009 1 11 Improved Minimum Error Rate Training in Moses Nicola Bertoldi, Barry Haddow, Jean-Baptiste Fouet Abstract We describe

More information

: Advanced Compiler Design. 8.0 Instruc?on scheduling

: Advanced Compiler Design. 8.0 Instruc?on scheduling 6-80: Advanced Compiler Design 8.0 Instruc?on scheduling Thomas R. Gross Computer Science Department ETH Zurich, Switzerland Overview 8. Instruc?on scheduling basics 8. Scheduling for ILP processors 8.

More information
