Question of the Day. Machine Translation. Statistical Word Alignment. Centauri/Arcturan (Knight, 1997)


Question of the Day
Is it possible to learn to translate from plain example translations?

Machine Translation: Statistical Word Alignment
Based on slides by Philipp Koehn and Kevin Knight

Word Alignment

Centauri/Arcturan (Knight, 1997)
Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp
???
process of elimination

Your assignment, put these words in order: { jjat, arrat, mat, bat, oloat, at-yurp }
cognate? zero fertility

Clients do not sell pharmaceuticals in Europe => Clientes no venden medicinas en Europa
1a. Garcia and associates.
1b. Garcia y asociados.
2a. Carlos Garcia has three associates.
2b. Carlos Garcia tiene tres asociados.
3a. his associates are not strong.
3b. sus asociados no son fuertes.
4a. Garcia has a company also.
4b. Garcia tambien tiene una empresa.
5a. its clients are angry.
5b. sus clientes estan enfadados.
6a. the associates are also angry.
6b. los asociados tambien estan enfadados.
7a. the clients and the associates are enemies.
7b. los clientes y los asociados son enemigos.
8a. the company has three groups.
8b. la empresa tiene tres grupos.
9a. its groups are in Europe.
9b. sus grupos estan en Europa.
10a. the modern groups sell strong pharmaceuticals.
10b. los grupos modernos venden medicinas fuertes.
11a. the groups do not sell zenzanine.
11b. los grupos no venden zanzanina.
12a. the small groups are not modern.
12b. los grupos pequenos no son modernos.

Conclusion
- It is possible to find alignments between words... without prior knowledge
- Translation models can be learned from word alignment

Chicken and Egg Problem
- Statistical alignment models can be used to align data:
  $$\arg\max_a p(a \mid \mathbf{e}, \mathbf{f}) = \arg\max_a \frac{p(\mathbf{e}, a \mid \mathbf{f})}{p(\mathbf{e} \mid \mathbf{f})}$$
- Word aligned data is necessary to estimate model parameters

EM Algorithm
- Learning with incomplete data
  - word alignment is hidden
  - need to fill the gaps in the data
- Expectation Maximization (EM) in a nutshell
  1. initialize model parameters (e.g. uniform)
  2. assign probabilities to the missing data
  3. estimate model parameters from completed data
  4. iterate steps 2-3 until convergence

EM Algorithm
... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...
- Initial step: all alignments equally likely
- Model learns that, e.g., la is often aligned with the

EM Algorithm
... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...
- After one iteration
- Alignments, e.g., between la and the are more likely

EM Algorithm
... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...
- After another iteration
- It becomes apparent that alignments, e.g., between fleur and flower are more likely

EM Algorithm
... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...
- Convergence
- Inherent hidden structure revealed by EM

EM Algorithm
... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...
p(la|the) =
p(le|the) =
p(maison|house) =
p(bleu|blue) =
- Parameter estimation from the aligned corpus

IBM Model 1 and EM
- EM Algorithm consists of two steps
- Expectation-Step: Apply model to the data
  - parts of the model are hidden (here: alignments)
  - using the model, assign probabilities to possible alignments
- Maximization-Step: Estimate model from data
  - take assigned values as fractional counts
  - collect counts (weighted by probabilities)
  - estimate model from counts
- Iterate these steps until convergence

IBM Model 1 and EM
Probabilities:
  p(the|la) = 0.7        p(house|la) = 0.05
  p(the|maison) = 0.1    p(house|maison) = 0.8

Alignments of "la maison" with "the house":
  alignment                   p(e,a|f)   p(a|e,f)
  the-la, house-maison        0.56       0.824
  the-la, house-la            0.035      0.052
  the-maison, house-maison    0.08       0.118
  the-maison, house-la        0.005      0.007

Counts:
  c(the|la) = 0.824 + 0.052        c(house|la) = 0.052 + 0.007
  c(the|maison) = 0.118 + 0.007    c(house|maison) = 0.824 + 0.118

IBM Model 1 and EM: Expectation Step
- We need to compute p(a|e,f)
- Applying the chain rule:
  $$p(a \mid \mathbf{e}, \mathbf{f}) = \frac{p(\mathbf{e}, a \mid \mathbf{f})}{p(\mathbf{e} \mid \mathbf{f})}$$
- We already have the formula for p(e,a|f) (definition of Model 1)

IBM Model 1 and EM: Expectation Step
- We need to compute p(e|f):
  $$\begin{aligned}
  p(\mathbf{e} \mid \mathbf{f}) &= \sum_a p(\mathbf{e}, a \mid \mathbf{f}) \\
  &= \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} p(\mathbf{e}, a \mid \mathbf{f}) \\
  &= \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} \frac{\epsilon}{(l_f+1)^{l_e}} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})
  \end{aligned}$$

IBM Model 1 and EM: Expectation Step
  $$\begin{aligned}
  p(\mathbf{e} \mid \mathbf{f}) &= \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} \frac{\epsilon}{(l_f+1)^{l_e}} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)}) \\
  &= \frac{\epsilon}{(l_f+1)^{l_e}} \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)}) \\
  &= \frac{\epsilon}{(l_f+1)^{l_e}} \prod_{j=1}^{l_e} \sum_{i=0}^{l_f} t(e_j \mid f_i)
  \end{aligned}$$
- Note the trick in the last line
  - removes the need for an exponential number of products
  - this makes IBM Model 1 estimation tractable
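The worked example on this slide can be reproduced with a short brute-force sketch. It enumerates the four alignments of "the house" with "la maison", multiplies the t values, normalizes to get p(a|e,f), and sums the posteriors into fractional counts. The function names are hypothetical, and the sketch assumes no NULL word, as in the slide's four alignments. The slide's p(e,a|f) values appear to be the plain products of t values, so the constant factor is left out here too; it cancels in the posterior anyway.

```python
from itertools import product

# Lexical translation probabilities t(e|f), taken from the slide.
t = {
    ("the", "la"): 0.7,
    ("house", "la"): 0.05,
    ("the", "maison"): 0.1,
    ("house", "maison"): 0.8,
}

e_sent = ["the", "house"]
f_sent = ["la", "maison"]  # assumption: no NULL word on the foreign side


def alignment_posteriors(e_sent, f_sent, t):
    """Return (posteriors, joint); both map an alignment tuple a to a probability.

    a[j] is the foreign position aligned to English position j.
    joint[a] is the product of t(e_j | f_a(j)), i.e. p(e,a|f) up to a constant.
    """
    joint = {}
    for a in product(range(len(f_sent)), repeat=len(e_sent)):
        p = 1.0
        for j, i in enumerate(a):
            p *= t[(e_sent[j], f_sent[i])]
        joint[a] = p
    z = sum(joint.values())  # proportional to p(e|f)
    posteriors = {a: p / z for a, p in joint.items()}
    return posteriors, joint


def fractional_counts(e_sent, f_sent, posteriors):
    """Sum p(a|e,f) over all alignments in which e is linked to f."""
    counts = {}
    for a, p in posteriors.items():
        for j, i in enumerate(a):
            key = (e_sent[j], f_sent[i])
            counts[key] = counts.get(key, 0.0) + p
    return counts


posteriors, joint = alignment_posteriors(e_sent, f_sent, t)
counts = fractional_counts(e_sent, f_sent, posteriors)

for a in sorted(posteriors, key=posteriors.get, reverse=True):
    links = ", ".join(f"{e_sent[j]}-{f_sent[i]}" for j, i in enumerate(a))
    print(f"{links:28s} joint={joint[a]:.3f} posterior={posteriors[a]:.3f}")
print({k: round(v, 3) for k, v in counts.items()})
```

The posteriors agree with the slide up to rounding in the third decimal place.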

The Trick (case l_e = l_f = 2)
$$\begin{aligned}
p(\mathbf{e} \mid \mathbf{f}) &= \frac{\epsilon}{3^2} \sum_{a(1)=0}^{2} \sum_{a(2)=0}^{2} \prod_{j=1}^{2} t(e_j \mid f_{a(j)}) \\
&= \frac{\epsilon}{3^2} \big( t(e_1|f_0)\,t(e_2|f_0) + t(e_1|f_0)\,t(e_2|f_1) + t(e_1|f_0)\,t(e_2|f_2) \\
&\qquad + t(e_1|f_1)\,t(e_2|f_0) + t(e_1|f_1)\,t(e_2|f_1) + t(e_1|f_1)\,t(e_2|f_2) \\
&\qquad + t(e_1|f_2)\,t(e_2|f_0) + t(e_1|f_2)\,t(e_2|f_1) + t(e_1|f_2)\,t(e_2|f_2) \big) \\
&= \frac{\epsilon}{3^2} \big( t(e_1|f_0)\,(t(e_2|f_0) + t(e_2|f_1) + t(e_2|f_2)) \\
&\qquad + t(e_1|f_1)\,(t(e_2|f_0) + t(e_2|f_1) + t(e_2|f_2)) \\
&\qquad + t(e_1|f_2)\,(t(e_2|f_0) + t(e_2|f_1) + t(e_2|f_2)) \big) \\
&= \frac{\epsilon}{3^2} \, (t(e_1|f_0) + t(e_1|f_1) + t(e_1|f_2)) \, (t(e_2|f_0) + t(e_2|f_1) + t(e_2|f_2))
\end{aligned}$$

IBM Model 1 and EM: Expectation Step
- Combine what we have:
  $$\begin{aligned}
  p(a \mid \mathbf{e}, \mathbf{f}) &= \frac{p(\mathbf{e}, a \mid \mathbf{f})}{p(\mathbf{e} \mid \mathbf{f})} \\
  &= \frac{\frac{\epsilon}{(l_f+1)^{l_e}} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})}{\frac{\epsilon}{(l_f+1)^{l_e}} \prod_{j=1}^{l_e} \sum_{i=0}^{l_f} t(e_j \mid f_i)} \\
  &= \prod_{j=1}^{l_e} \frac{t(e_j \mid f_{a(j)})}{\sum_{i=0}^{l_f} t(e_j \mid f_i)}
  \end{aligned}$$

IBM Model 1 and EM: Maximization Step
- Now we have to collect counts
- Evidence from a sentence pair (e, f) that word e is a translation of word f:
  $$c(e \mid f; \mathbf{e}, \mathbf{f}) = \sum_a p(a \mid \mathbf{e}, \mathbf{f}) \sum_{j=1}^{l_e} \delta(e, e_j)\, \delta(f, f_{a(j)})$$
- Note that δ(a, b) = 1 if a = b, and 0 otherwise
  - Count how many times e is aligned to f in alignment a, and
  - weight each count by the likelihood p(a|e,f) of that alignment

IBM Model 1 and EM: Maximization Step
- After collecting these counts over a corpus, we can estimate the model:
  $$t(e \mid f; \mathbf{e}, \mathbf{f}) = \frac{\sum_{(\mathbf{e},\mathbf{f})} c(e \mid f; \mathbf{e}, \mathbf{f})}{\sum_{e} \sum_{(\mathbf{e},\mathbf{f})} c(e \mid f; \mathbf{e}, \mathbf{f})}$$
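As a sanity check on the trick above, the sum over all alignments of the product of t values can be compared numerically with the product over English positions of the per-word sums. This is a minimal sketch: the sentence lengths and the random table are made up for illustration, and `sum_over_alignments` and `product_of_sums` are hypothetical names.

```python
import random
from itertools import product
from math import prod

random.seed(0)
l_e, l_f = 3, 4  # hypothetical sentence lengths; foreign position 0 plays the role of NULL

# t[j][i] stands for t(e_j | f_i) with i = 0..l_f.
# Random values are enough to check the algebra; they are not real probabilities.
t = [[random.random() for _ in range(l_f + 1)] for _ in range(l_e)]


def sum_over_alignments(t):
    """Exponential version: enumerate all (l_f + 1)^l_e alignments."""
    n_foreign = len(t[0])
    return sum(
        prod(t[j][a[j]] for j in range(len(t)))
        for a in product(range(n_foreign), repeat=len(t))
    )


def product_of_sums(t):
    """Factored version: l_e sums of l_f + 1 terms each."""
    return prod(sum(row) for row in t)


brute = sum_over_alignments(t)
fast = product_of_sums(t)
print(brute, fast)
```

The two numbers agree up to floating-point error. The first function needs (l_f + 1)^l_e products, while the second needs only l_e * (l_f + 1) additions and l_e multiplications.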

IBM Model 1 and EM: Pseudocode
Input: set of sentence pairs (e, f)
Output: translation prob. t(e|f)

initialize t(e|f) uniformly
while not converged do
    // initialize
    count(e|f) = 0 for all e, f
    total(f) = 0 for all f
    for all sentence pairs (e, f) do
        // compute normalization
        for all words e in e do
            s-total(e) = 0
            for all words f in f do
                s-total(e) += t(e|f)
            end for
        end for
        // collect counts
        for all words e in e do
            for all words f in f do
                count(e|f) += t(e|f) / s-total(e)
                total(f) += t(e|f) / s-total(e)
            end for
        end for
    end for
    // estimate probabilities
    for all foreign words f do
        for all English words e do
            t(e|f) = count(e|f) / total(f)
        end for
    end for
end while

Convergence
Sentence pairs: das Haus / the house, das Buch / the book, ein Buch / a book
Table of t(e|f) with columns: e, f, initial, 1st it., 2nd it., 3rd it., ..., final
Rows (e, f): (the, das), (book, das), (house, das), (the, buch), (book, buch), (a, buch), (book, ein), (a, ein), (the, haus), (house, haus)

Perplexity
- How well does the model fit the data?
- Perplexity: derived from probability of the training data according to the model
  $$\log_2 PP = -\sum_s \frac{1}{S} \log_2 p(\mathbf{e}_s \mid \mathbf{f}_s)$$
- Example (ε = 1)
  Table with columns: initial, 1st it., 2nd it., 3rd it., ..., final
  Rows: p(the house|das haus), p(the book|das buch), p(a book|ein buch), unnormalized perplexity

Higher IBM Models
  IBM Model 1    lexical translation
  IBM Model 2    adds absolute reordering model
  IBM Model 3    adds fertility model
  IBM Model 4    relative reordering model
  IBM Model 5    fixes deficiency
- Only IBM Model 1 has global maximum
  - training of a higher IBM model builds on previous model
- Computationally biggest change in Model 3
  - trick to simplify estimation does not work anymore
  - exhaustive count collection becomes computationally too expensive
  - sampling over high probability alignments is used instead
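Returning to the Model 1 pseudocode and the perplexity definition on this page, the following is a minimal Python sketch run on the three German-English sentence pairs from the convergence slide. Several choices here are assumptions rather than part of the slides: a fixed number of iterations stands in for "while not converged", there is no NULL word (the pseudocode has none), S is taken to be the number of sentence pairs, and the names `train_model1`, `sentence_prob` and `log2_perplexity` are hypothetical.

```python
import math
from collections import defaultdict


def train_model1(pairs, iterations):
    """EM for t(e|f), following the slide's pseudocode (assumption: no NULL word)."""
    e_vocab = {e for es, _ in pairs for e in es}
    f_vocab = {f for _, fs in pairs for f in fs}
    # initialize t(e|f) uniformly
    t = {(e, f): 1.0 / len(e_vocab) for e in e_vocab for f in f_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # count(e|f)
        total = defaultdict(float)  # total(f)
        for es, fs in pairs:
            for e in es:
                # normalization: sum of t(e|f) over the foreign words of this pair
                s_total = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / s_total  # fractional count
                    count[(e, f)] += frac
                    total[f] += frac
        # estimate probabilities from the collected counts
        for e, f in t:
            t[(e, f)] = count[(e, f)] / total[f]
    return t


def sentence_prob(es, fs, t, epsilon=1.0):
    """p(e|f) = epsilon / l_f^l_e * prod_j sum_i t(e_j|f_i), without a NULL word."""
    p = epsilon / (len(fs) ** len(es))
    for e in es:
        p *= sum(t[(e, f)] for f in fs)
    return p


def log2_perplexity(pairs, t, normalize=True):
    """-sum_s log2 p(e_s|f_s), divided by the number of pairs S if normalize is set."""
    neg_log = -sum(math.log2(sentence_prob(es, fs, t)) for es, fs in pairs)
    return neg_log / len(pairs) if normalize else neg_log


pairs = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]

for n in (0, 1, 2, 3, 50):
    t = train_model1(pairs, n)
    pp = 2 ** log2_perplexity(pairs, t, normalize=False)
    print(n, round(t[("the", "das")], 4), round(pp, 1))
```

With uniform initialization, the first iteration moves t(the|das) from 0.25 to 0.5, and later iterations push it toward 1. The `normalize=False` setting corresponds to the "unnormalized perplexity" row of the example table.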

Typical Training Scheme
- iterations over alignment models of increasing complexity:
  1. n EM iterations of IBM Model 1 with uniform initialization
  2. n EM iterations of IBM Model 2 or HMM initialized with Model 1
  3. parameter transfer from IBM Model 2 / HMM to IBM Model 3
  4. n hill-climbing iterations of IBM Model 3 based on best alignment
  5. parameter transfer from IBM Model 3 to IBM Model 4
  6. n hill-climbing iterations of IBM Model 4 based on best alignment
- typical number of iterations: 5
- Popular implementation: GIZA++

Conclusion
- IBM Models were the pioneering models in statistical machine translation
- EM training
  - learn from incomplete data by maximizing data likelihood
  - iteratively converge to local maximum
  - approximations needed for IBM 3 and higher
- Recommended reading (besides the textbook):
  - SMT Tutorial Workbook (Kevin Knight, 1999)
  - Introductory article by Kevin Knight (1997)
  - Lecture notes by Michael Collins on IBM Models 1 and 2
  - Hardcore: Brown et al., 1993, The Mathematics of Statistical Machine Translation: Parameter Estimation


More information

Unsupervised Learning. Clustering and the EM Algorithm. Unsupervised Learning is Model Learning

Unsupervised Learning. Clustering and the EM Algorithm. Unsupervised Learning is Model Learning Unsupervised Learning Clustering and the EM Algorithm Susanna Ricco Supervised Learning Given data in the form < x, y >, y is the target to learn. Good news: Easy to tell if our algorithm is giving the

More information

Advanced Java Programming Daniel Liang

Advanced Java Programming Daniel Liang We have made it easy for you to find a PDF Ebooks without any digging. And by having access to our ebooks online or by storing it on your computer, you have convenient answers with advanced java programming

More information

Time series, HMMs, Kalman Filters

Time series, HMMs, Kalman Filters Classic HMM tutorial see class website: *L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol.77, No.2, pp.257--286, 1989. Time series,

More information

Introduction to Optimization Using Metaheuristics. The Lecturer: Thomas Stidsen. Outline. Name: Thomas Stidsen: Nationality: Danish.

Introduction to Optimization Using Metaheuristics. The Lecturer: Thomas Stidsen. Outline. Name: Thomas Stidsen: Nationality: Danish. The Lecturer: Thomas Stidsen Name: Thomas Stidsen: tks@imm.dtu.dk Outline Nationality: Danish. General course information Languages: Danish and English. Motivation, modelling and solving Education: Ph.D.

More information

Convex Optimization CMU-10725

Convex Optimization CMU-10725 Convex Optimization CMU-10725 Conjugate Direction Methods Barnabás Póczos & Ryan Tibshirani Conjugate Direction Methods 2 Books to Read David G. Luenberger, Yinyu Ye: Linear and Nonlinear Programming Nesterov:

More information

LSTM for Language Translation and Image Captioning. Tel Aviv University Deep Learning Seminar Oran Gafni & Noa Yedidia

LSTM for Language Translation and Image Captioning. Tel Aviv University Deep Learning Seminar Oran Gafni & Noa Yedidia 1 LSTM for Language Translation and Image Captioning Tel Aviv University Deep Learning Seminar Oran Gafni & Noa Yedidia 2 Part I LSTM for Language Translation Motivation Background (RNNs, LSTMs) Model

More information

Lecture 10 May 14, Prabhakar Raghavan

Lecture 10 May 14, Prabhakar Raghavan Lecture 10 May 14, 2001 Prabhakar Raghavan Centroid/nearest-neighbor classification Bayesian Classification Link-based classification Document summarization Given training docs for a topic, compute their

More information

Constraints in Particle Swarm Optimization of Hidden Markov Models

Constraints in Particle Swarm Optimization of Hidden Markov Models Constraints in Particle Swarm Optimization of Hidden Markov Models Martin Macaš, Daniel Novák, and Lenka Lhotská Czech Technical University, Faculty of Electrical Engineering, Dep. of Cybernetics, Prague,

More information

MPLS Configuration On Cisco IOS Software (Networking Technology) [Kindle Edition] By Umesh Lakshman;Lancy Lobo READ ONLINE

MPLS Configuration On Cisco IOS Software (Networking Technology) [Kindle Edition] By Umesh Lakshman;Lancy Lobo READ ONLINE MPLS Configuration On Cisco IOS Software (Networking Technology) [Kindle Edition] By Umesh Lakshman;Lancy Lobo READ ONLINE If searched for the book MPLS Configuration on Cisco IOS Software (Networking

More information

1 Case study of SVM (Rob)

1 Case study of SVM (Rob) DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how

More information

Approximate Bayesian Computation. Alireza Shafaei - April 2016

Approximate Bayesian Computation. Alireza Shafaei - April 2016 Approximate Bayesian Computation Alireza Shafaei - April 2016 The Problem Given a dataset, we are interested in. The Problem Given a dataset, we are interested in. The Problem Given a dataset, we are interested

More information

Reassessment of the Role of Phrase Extraction in PBSMT

Reassessment of the Role of Phrase Extraction in PBSMT Reassessment of the Role of Phrase Extraction in PBSMT Francisco Guzman Centro de Sistemas Inteligentes Tecnológico de Monterrey Monterrey, N.L., Mexico guzmanhe@gmail.com Qin Gao and Stephan Vogel Language

More information

Clustering web search results

Clustering web search results Clustering K-means Machine Learning CSE546 Emily Fox University of Washington November 4, 2013 1 Clustering images Set of Images [Goldberger et al.] 2 1 Clustering web search results 3 Some Data 4 2 K-means

More information

Bilinear Programming

Bilinear Programming Bilinear Programming Artyom G. Nahapetyan Center for Applied Optimization Industrial and Systems Engineering Department University of Florida Gainesville, Florida 32611-6595 Email address: artyom@ufl.edu

More information

Reference Services Division Presents. Excel Introductory Course

Reference Services Division Presents. Excel Introductory Course Reference Services Division Presents Excel 2007 Introductory Course OBJECTIVES: Navigate Comfortably in the Excel Environment Create a basic spreadsheet Learn how to format the cells and text Apply a simple

More information

Clustering Lecture 5: Mixture Model

Clustering Lecture 5: Mixture Model Clustering Lecture 5: Mixture Model Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics

More information

A simple noise model. Algorithm sketch. A simple noise model. Estimating the probabilities

A simple noise model. Algorithm sketch. A simple noise model. Estimating the probabilities Recap: noisy channel model Foundations of Natural anguage Processing ecture 6 pelling correction, edit distance, and EM lex ascarides (lides from lex ascarides and haron Goldwater) 1 February 2019 general

More information

Applications of Machine Translation

Applications of Machine Translation Applications of Machine Translation Index Historical Overview Commercial Products Open Source Software Special Applications Future Aspects History Before the Computer: Mid 1930s: Georges Artsrouni and

More information

Stone Soup Translation

Stone Soup Translation Stone Soup Translation DJ Hovermale and Jeremy Morris and Andrew Watts December 3, 2005 1 Introduction 2 Overview of Stone Soup Translation 2.1 Finite State Automata The Stone Soup Translation model is

More information

Discriminative Training with Perceptron Algorithm for POS Tagging Task

Discriminative Training with Perceptron Algorithm for POS Tagging Task Discriminative Training with Perceptron Algorithm for POS Tagging Task Mahsa Yarmohammadi Center for Spoken Language Understanding Oregon Health & Science University Portland, Oregon yarmoham@ohsu.edu

More information

Three-Dimensional Sensors Lecture 6: Point-Cloud Registration

Three-Dimensional Sensors Lecture 6: Point-Cloud Registration Three-Dimensional Sensors Lecture 6: Point-Cloud Registration Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inria.fr http://perception.inrialpes.fr/ Point-Cloud Registration Methods Fuse data

More information

Topics du jour CS347. Centroid/NN. Example

Topics du jour CS347. Centroid/NN. Example Topics du jour CS347 Lecture 10 May 14, 2001 Prabhakar Raghavan Centroid/nearest-neighbor classification Bayesian Classification Link-based classification Document summarization Centroid/NN Given training

More information

Outline for today s lecture. Informed Search. Informed Search II. Review: Properties of greedy best-first search. Review: Greedy best-first search:

Outline for today s lecture. Informed Search. Informed Search II. Review: Properties of greedy best-first search. Review: Greedy best-first search: Outline for today s lecture Informed Search II Informed Search Optimal informed search: A* (AIMA 3.5.2) Creating good heuristic functions Hill Climbing 2 Review: Greedy best-first search: f(n): estimated

More information

HIDDEN MARKOV MODELS AND SEQUENCE ALIGNMENT

HIDDEN MARKOV MODELS AND SEQUENCE ALIGNMENT HIDDEN MARKOV MODELS AND SEQUENCE ALIGNMENT - Swarbhanu Chatterjee. Hidden Markov models are a sophisticated and flexible statistical tool for the study of protein models. Using HMMs to analyze proteins

More information

Lecture 3 of 42. Lecture 3 of 42

Lecture 3 of 42. Lecture 3 of 42 Search Problems Discussion: Term Projects 3 of 5 William H. Hsu Department of Computing and Information Sciences, KSU KSOL course page: http://snipurl.com/v9v3 Course web site: http://www.kddresearch.org/courses/cis730

More information

Chapter 6. Dynamic Programming. Modified from slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Chapter 6. Dynamic Programming. Modified from slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. Chapter 6 Dynamic Programming Modified from slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 Think recursively (this week)!!! Divide & conquer and Dynamic programming

More information

Intra-sentence Punctuation Insertion in Natural Language Generation

Intra-sentence Punctuation Insertion in Natural Language Generation Intra-sentence Punctuation Insertion in Natural Language Generation Zhu ZHANG, Michael GAMON, Simon CORSTON-OLIVER, Eric RINGGER School of Information Microsoft Research University of Michigan One Microsoft

More information

Learning Undirected Models with Missing Data

Learning Undirected Models with Missing Data Learning Undirected Models with Missing Data Sargur Srihari srihari@cedar.buffalo.edu 1 Topics Log-linear form of Markov Network The missing data parameter estimation problem Methods for missing data:

More information

Fitting D.A. Forsyth, CS 543

Fitting D.A. Forsyth, CS 543 Fitting D.A. Forsyth, CS 543 Fitting Choose a parametric object/some objects to represent a set of tokens Most interesting case is when criterion is not local can t tell whether a set of points lies on

More information

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed

More information

A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation.

A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation. A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation May 29, 2003 Shankar Kumar and Bill Byrne Center for Language and Speech Processing

More information

Reassessment of the Role of Phrase Extraction in PBSMT

Reassessment of the Role of Phrase Extraction in PBSMT Reassessment of the Role of Phrase Extraction in Francisco Guzmán CCIR-ITESM guzmanhe@gmail.com Qin Gao LTI-CMU qing@cs.cmu.edu Stephan Vogel LTI-CMU stephan.vogel@cs.cmu.edu Presented by: Nguyen Bach

More information

Programming Language Design and Implementation. Cunning Plan. Your Host For The Semester. Wes Weimer TR 9:30-10:45 MEC 214. Who Are We?

Programming Language Design and Implementation. Cunning Plan. Your Host For The Semester. Wes Weimer TR 9:30-10:45 MEC 214. Who Are We? Programming Language Design and Implementation Wes Weimer TR 9:30-10:45 MEC 214 #1 Who Are We? Cunning Plan Wes, Pieter, Isabelle Administrivia What Is This Class About? Brief History Lesson Understanding

More information