Question of the Day. Machine Translation. Statistical Word Alignment. Centauri/Arcturan (Knight, 1997)
- Julia Perry
- 6 years ago
Transcription
Question of the Day
Is it possible to learn to translate from plain example translations?

Machine Translation
Statistical Word Alignment
Based on slides by Philipp Koehn and Kevin Knight

Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp
Annotations across the slide builds of this assignment: "???", "process of elimination"
Your assignment, put these words in order: { jjat, arrat, mat, bat, oloat, at-yurp }
Annotations on these slides: "cognate?", "zero fertility"

Clients do not sell pharmaceuticals in Europe => Clientes no venden medicinas en Europa
1a. Garcia and associates.
1b. Garcia y asociados.
2a. Carlos Garcia has three associates.
2b. Carlos Garcia tiene tres asociados.
3a. his associates are not strong.
3b. sus asociados no son fuertes.
4a. Garcia has a company also.
4b. Garcia tambien tiene una empresa.
5a. its clients are angry.
5b. sus clientes estan enfadados.
6a. the associates are also angry.
6b. los asociados tambien estan enfadados.
7a. the clients and the associates are enemies.
7b. los clientes y los asociados son enemigos.
8a. the company has three groups.
8b. la empresa tiene tres grupos.
9a. its groups are in Europe.
9b. sus grupos estan en Europa.
10a. the modern groups sell strong pharmaceuticals.
10b. los grupos modernos venden medicinas fuertes.
11a. the groups do not sell zenzanine.
11b. los grupos no venden zanzanina.
12a. the small groups are not modern.
12b. los grupos pequenos no son modernos.

Conclusion
- It is possible to find alignments between words... without prior knowledge
- Translation models can be learned from word alignment
Chicken and Egg Problem
- Statistical alignment models can be used to align data:
  $$\arg\max_a p(a \mid \mathbf{e}, \mathbf{f}) = \arg\max_a \frac{p(\mathbf{e}, a \mid \mathbf{f})}{p(\mathbf{e} \mid \mathbf{f})}$$
- Word-aligned data is necessary to estimate model parameters

EM Algorithm
- Learning with incomplete data
  - word alignment is hidden
  - need to fill the gaps in the data
- Expectation Maximization (EM) in a nutshell
  1. initialize model parameters (e.g. uniform)
  2. assign probabilities to the missing data
  3. estimate model parameters from completed data
  4. iterate steps 2-3 until convergence

EM Algorithm
  ... la maison ... la maison bleu ... la fleur ...
  ... the house ... the blue house ... the flower ...
- Initial step: all alignments equally likely
- Model learns that, e.g., la is often aligned with the

EM Algorithm
  ... la maison ... la maison bleu ... la fleur ...
  ... the house ... the blue house ... the flower ...
- After one iteration
- Alignments, e.g., between la and the are more likely
EM Algorithm
  ... la maison ... la maison bleu ... la fleur ...
  ... the house ... the blue house ... the flower ...
- After another iteration
- It becomes apparent that alignments, e.g., between fleur and flower are more likely

EM Algorithm
  ... la maison ... la maison bleu ... la fleur ...
  ... the house ... the blue house ... the flower ...
- Convergence
- Inherent hidden structure revealed by EM

EM Algorithm
  ... la maison ... la maison bleu ... la fleur ...
  ... the house ... the blue house ... the flower ...
- Parameter estimation from the aligned corpus: p(la|the), p(le|the), p(maison|house), p(bleu|blue) (numeric values not preserved in the transcription)

IBM Model 1 and EM
- EM Algorithm consists of two steps
- Expectation-Step: Apply model to the data
  - parts of the model are hidden (here: alignments)
  - using the model, assign probabilities to possible alignments
- Maximization-Step: Estimate model from data
  - take assigned values as fractional counts
  - collect counts (weighted by probabilities)
  - estimate model from counts
- Iterate these steps until convergence
IBM Model 1 and EM
- Probabilities
  p(the|la) = 0.7, p(house|la) = 0.05
  p(the|maison) = 0.1, p(house|maison) = 0.8
- Alignments of "la maison" with "the house"
  alignment                    p(e,a|f)   p(a|e,f)
  the-la, house-maison         0.56       0.824
  the-la, house-la             0.035      0.051
  the-maison, house-maison     0.08       0.118
  the-maison, house-la         0.005      0.007
- Counts c(the|la), c(house|la), c(the|maison), c(house|maison) are accumulated from the alignment probabilities p(a|e,f)

IBM Model 1 and EM: Expectation Step
- We need to compute $p(a \mid \mathbf{e}, \mathbf{f})$
- Applying the chain rule:
  $$p(a \mid \mathbf{e}, \mathbf{f}) = \frac{p(\mathbf{e}, a \mid \mathbf{f})}{p(\mathbf{e} \mid \mathbf{f})}$$
- We already have the formula for $p(\mathbf{e}, a \mid \mathbf{f})$ (definition of Model 1)

IBM Model 1 and EM: Expectation Step
- We need to compute $p(\mathbf{e} \mid \mathbf{f})$
  $$p(\mathbf{e} \mid \mathbf{f}) = \sum_a p(\mathbf{e}, a \mid \mathbf{f})$$
  $$= \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} p(\mathbf{e}, a \mid \mathbf{f})$$
  $$= \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} \frac{\epsilon}{(l_f+1)^{l_e}} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})$$

IBM Model 1 and EM: Expectation Step
  $$p(\mathbf{e} \mid \mathbf{f}) = \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} \frac{\epsilon}{(l_f+1)^{l_e}} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})$$
  $$= \frac{\epsilon}{(l_f+1)^{l_e}} \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})$$
  $$= \frac{\epsilon}{(l_f+1)^{l_e}} \prod_{j=1}^{l_e} \sum_{i=0}^{l_f} t(e_j \mid f_i)$$
- Note the trick in the last line
  - removes the need for an exponential number of products
  - this makes IBM Model 1 estimation tractable
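The "la maison" / "the house" example above can be reproduced with a few lines of Python. This is only a sketch: it enumerates the four alignments explicitly, leaves out the NULL word and the constant factor ε/(l_f+1)^{l_e} (both cancel or are absent in the slide's numbers), and the variable names are illustrative rather than taken from the slides.

```python
from itertools import product

# Lexical translation probabilities t(e|f) from the example slide.
t = {
    ("the", "la"): 0.7,
    ("house", "la"): 0.05,
    ("the", "maison"): 0.1,
    ("house", "maison"): 0.8,
}
e_sent = ["the", "house"]
f_sent = ["la", "maison"]


def joint(alignment):
    """p(e, a | f) without the constant factor, which cancels on normalization."""
    p = 1.0
    for j, i in enumerate(alignment):
        p *= t[(e_sent[j], f_sent[i])]
    return p


# An alignment assigns each English position j one foreign position a(j).
alignments = list(product(range(len(f_sent)), repeat=len(e_sent)))
joints = {a: joint(a) for a in alignments}

# p(e | f) is the sum of the joint probabilities over all alignments.
p_e_given_f = sum(joints.values())

# Chain rule: p(a | e, f) = p(e, a | f) / p(e | f).
posteriors = {a: p / p_e_given_f for a, p in joints.items()}

# Fractional counts: each alignment contributes its posterior to every link it contains.
counts = {pair: 0.0 for pair in t}
for a, post in posteriors.items():
    for j, i in enumerate(a):
        counts[(e_sent[j], f_sent[i])] += post

for a in alignments:
    links = ", ".join(f"{e_sent[j]}-{f_sent[i]}" for j, i in enumerate(a))
    print(f"{links}: p(e,a|f)={joints[a]:.3f}  p(a|e,f)={posteriors[a]:.3f}")
for pair, c in counts.items():
    print(f"c({pair[0]}|{pair[1]}) = {c:.3f}")
```

Enumerating alignments like this is only feasible for tiny sentences, which is exactly why the factorization of p(e|f) on the following slides matters.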
The Trick (case $l_e = l_f = 2$)
  $$p(\mathbf{e} \mid \mathbf{f}) = \sum_{a(1)=0}^{2} \sum_{a(2)=0}^{2} \frac{\epsilon}{3^2} \prod_{j=1}^{2} t(e_j \mid f_{a(j)})$$
  $$= \frac{\epsilon}{3^2} \big( t(e_1|f_0)\,t(e_2|f_0) + t(e_1|f_0)\,t(e_2|f_1) + t(e_1|f_0)\,t(e_2|f_2)$$
  $$\quad + t(e_1|f_1)\,t(e_2|f_0) + t(e_1|f_1)\,t(e_2|f_1) + t(e_1|f_1)\,t(e_2|f_2)$$
  $$\quad + t(e_1|f_2)\,t(e_2|f_0) + t(e_1|f_2)\,t(e_2|f_1) + t(e_1|f_2)\,t(e_2|f_2) \big)$$
  $$= \frac{\epsilon}{3^2} \big( t(e_1|f_0)\,(t(e_2|f_0) + t(e_2|f_1) + t(e_2|f_2))$$
  $$\quad + t(e_1|f_1)\,(t(e_2|f_0) + t(e_2|f_1) + t(e_2|f_2))$$
  $$\quad + t(e_1|f_2)\,(t(e_2|f_0) + t(e_2|f_1) + t(e_2|f_2)) \big)$$
  $$= \frac{\epsilon}{3^2} \big( t(e_1|f_0) + t(e_1|f_1) + t(e_1|f_2) \big) \big( t(e_2|f_0) + t(e_2|f_1) + t(e_2|f_2) \big)$$

IBM Model 1 and EM: Expectation Step
- Combine what we have:
  $$p(a \mid \mathbf{e}, \mathbf{f}) = \frac{p(\mathbf{e}, a \mid \mathbf{f})}{p(\mathbf{e} \mid \mathbf{f})}$$
  $$= \frac{\frac{\epsilon}{(l_f+1)^{l_e}} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})}{\frac{\epsilon}{(l_f+1)^{l_e}} \prod_{j=1}^{l_e} \sum_{i=0}^{l_f} t(e_j \mid f_i)}$$
  $$= \prod_{j=1}^{l_e} \frac{t(e_j \mid f_{a(j)})}{\sum_{i=0}^{l_f} t(e_j \mid f_i)}$$

IBM Model 1 and EM: Maximization Step
- Now we have to collect counts
- Evidence from a sentence pair $\mathbf{e}, \mathbf{f}$ that word $e$ is a translation of word $f$:
  $$c(e \mid f; \mathbf{e}, \mathbf{f}) = \sum_a p(a \mid \mathbf{e}, \mathbf{f}) \sum_{j=1}^{l_e} \delta(e, e_j)\, \delta(f, f_{a(j)})$$
- Note that $\delta(a, b) = 1$ if $a = b$, and $0$ otherwise
  - Count how many times $e$ is aligned to $f$ in alignment $a$, and
  - weight each count by the likelihood $p(a \mid \mathbf{e}, \mathbf{f})$ of that alignment

IBM Model 1 and EM: Maximization Step
- After collecting these counts over a corpus, we can estimate the model:
  $$t(e \mid f; \mathbf{e}, \mathbf{f}) = \frac{\sum_{(\mathbf{e},\mathbf{f})} c(e \mid f; \mathbf{e}, \mathbf{f})}{\sum_{e} \sum_{(\mathbf{e},\mathbf{f})} c(e \mid f; \mathbf{e}, \mathbf{f})}$$
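The factorization in "The Trick" can be checked numerically. The sketch below compares the explicit sum over all (l_f+1)^{l_e} alignments with the product of per-word sums. The t(e|la) and t(e|maison) values are those of the earlier example; the NULL-word entries and the function names are assumptions made purely for illustration.

```python
from itertools import product
from math import isclose, prod

# t(e|f); the NULL entries are hypothetical values chosen for this check only.
t = {
    ("the", "NULL"): 0.1,
    ("house", "NULL"): 0.15,
    ("the", "la"): 0.7,
    ("house", "la"): 0.05,
    ("the", "maison"): 0.1,
    ("house", "maison"): 0.8,
}


def p_brute_force(e_sent, f_sent, t, epsilon=1.0):
    """p(e|f) as an explicit sum over every alignment (exponential in l_e)."""
    f_null = ["NULL"] + f_sent  # position 0 is the NULL word
    norm = epsilon / (len(f_null) ** len(e_sent))
    total = 0.0
    for a in product(range(len(f_null)), repeat=len(e_sent)):
        total += prod(t[(e, f_null[i])] for e, i in zip(e_sent, a))
    return norm * total


def p_factored(e_sent, f_sent, t, epsilon=1.0):
    """p(e|f) as a product over English words of sums over foreign words."""
    f_null = ["NULL"] + f_sent
    norm = epsilon / (len(f_null) ** len(e_sent))
    return norm * prod(sum(t[(e, f)] for f in f_null) for e in e_sent)


e_sent, f_sent = ["the", "house"], ["la", "maison"]
brute = p_brute_force(e_sent, f_sent, t)
fast = p_factored(e_sent, f_sent, t)
print(brute, fast, isclose(brute, fast))
```

The brute-force version visits 3² = 9 alignments here, while the factored version needs only 2 sums of 3 terms; the gap grows exponentially with sentence length.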
IBM Model 1 and EM: Pseudocode

Input: set of sentence pairs (e, f)
Output: translation prob. t(e|f)

initialize t(e|f) uniformly
while not converged do
  // initialize
  count(e|f) = 0 for all e, f
  total(f) = 0 for all f
  for all sentence pairs (e, f) do
    // compute normalization
    for all words e in sentence e do
      s-total(e) = 0
      for all words f in sentence f do
        s-total(e) += t(e|f)
      end for
    end for
    // collect counts
    for all words e in sentence e do
      for all words f in sentence f do
        count(e|f) += t(e|f) / s-total(e)
        total(f) += t(e|f) / s-total(e)
      end for
    end for
  end for
  // estimate probabilities
  for all foreign words f do
    for all English words e do
      t(e|f) = count(e|f) / total(f)
    end for
  end for
end while

Convergence
Training corpus: das Haus / the house, das Buch / the book, ein Buch / a book
Table of t(e|f) with columns: e, f, initial, 1st it., 2nd it., 3rd it., ..., final (numeric values not preserved in the transcription)
Rows (e, f): (the, das), (book, das), (house, das), (the, buch), (book, buch), (a, buch), (book, ein), (a, ein), (the, haus), (house, haus)

Perplexity
- How well does the model fit the data?
- Perplexity: derived from probability of the training data according to the model
  $$\log_2 PP = -\sum_s \frac{1}{S} \log_2 p(\mathbf{e}_s \mid \mathbf{f}_s)$$
- Example ($\epsilon = 1$)
  Table with columns: initial, 1st it., 2nd it., 3rd it., ..., final (numeric values not preserved in the transcription)
  Rows: p(the haus|das haus), p(the book|das buch), p(a book|ein buch), unnormalized perplexity

Higher IBM Models
  IBM Model 1   lexical translation
  IBM Model 2   adds absolute reordering model
  IBM Model 3   adds fertility model
  IBM Model 4   relative reordering model
  IBM Model 5   fixes deficiency
- Only IBM Model 1 has global maximum
  - training of a higher IBM model builds on previous model
- Computationally biggest change in Model 3
  - trick to simplify estimation does not work anymore
  - exhaustive count collection becomes computationally too expensive
  - sampling over high probability alignments is used instead
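The pseudocode above translates fairly directly into Python. The following is a minimal sketch, not a reference implementation: it runs a fixed number of iterations instead of testing for convergence, omits the NULL word (as the pseudocode does), assumes every sentence is non-empty, and the function name `train_ibm_model1` is made up for this example. It is run on the three-sentence das Haus / das Buch / ein Buch corpus from the convergence slide.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """EM training of t(e|f) for IBM Model 1.

    corpus: list of (english_words, foreign_words) pairs, each a non-empty list.
    Returns a dict mapping (e, f) to t(e|f).
    """
    e_vocab = {e for e_sent, _ in corpus for e in e_sent}
    f_vocab = {f for _, f_sent in corpus for f in f_sent}

    # Initialize t(e|f) uniformly over the English vocabulary.
    t = {(e, f): 1.0 / len(e_vocab) for e in e_vocab for f in f_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # fractional counts count(e|f)
        total = defaultdict(float)  # normalizer total(f)

        for e_sent, f_sent in corpus:
            # Compute normalization: s_total(e) = sum over f of t(e|f).
            s_total = {e: sum(t[(e, f)] for f in f_sent) for e in e_sent}
            # Collect counts, weighted by the posterior link probability.
            for e in e_sent:
                for f in f_sent:
                    delta = t[(e, f)] / s_total[e]
                    count[(e, f)] += delta
                    total[f] += delta

        # Estimate probabilities; pairs that never co-occur get probability 0.
        t = {(e, f): count[(e, f)] / total[f] for (e, f) in t}

    return t


corpus = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]

t = train_ibm_model1(corpus, iterations=50)
for (e, f), p in sorted(t.items(), key=lambda item: (item[0][1], -item[1])):
    if p > 0.0:
        print(f"t({e}|{f}) = {p:.4f}")
```

Dictionaries keyed by word pairs keep the sketch close to the pseudocode; for realistic corpora one would more likely store only co-occurring pairs and track a convergence criterion such as the change in perplexity.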
Typical Training Scheme
- iterations over alignment models of increasing complexity:
  1. n EM iterations of IBM Model 1 with uniform initialization
  2. n EM iterations of IBM Model 2 or HMM, initialized with Model 1
  3. parameter transfer from IBM Model 2 / HMM to IBM Model 3
  4. n hill-climbing iterations of IBM Model 3 based on best alignment
  5. parameter transfer from IBM Model 3 to IBM Model 4
  6. n hill-climbing iterations of IBM Model 4 based on best alignment
- typical number of iterations: 5
- Popular implementation: GIZA++

Conclusion
- IBM Models were the pioneering models in statistical machine translation
- EM training
  - learn from incomplete data by maximizing data likelihood
  - iteratively converge to local maximum
  - approximations needed for IBM Model 3 and higher
- Recommended reading (besides the textbook):
  - SMT Tutorial Workbook (Kevin Knight, 1999)
  - Introductory article by Kevin Knight (1997)
  - Lecture notes by Michael Collins on IBM Models 1 and 2
  - Hardcore: Brown et al., 1993, The Mathematics of Statistical Machine Translation: Parameter Estimation
More informationA simple noise model. Algorithm sketch. A simple noise model. Estimating the probabilities
Recap: noisy channel model Foundations of Natural anguage Processing ecture 6 pelling correction, edit distance, and EM lex ascarides (lides from lex ascarides and haron Goldwater) 1 February 2019 general
More informationApplications of Machine Translation
Applications of Machine Translation Index Historical Overview Commercial Products Open Source Software Special Applications Future Aspects History Before the Computer: Mid 1930s: Georges Artsrouni and
More informationStone Soup Translation
Stone Soup Translation DJ Hovermale and Jeremy Morris and Andrew Watts December 3, 2005 1 Introduction 2 Overview of Stone Soup Translation 2.1 Finite State Automata The Stone Soup Translation model is
More informationDiscriminative Training with Perceptron Algorithm for POS Tagging Task
Discriminative Training with Perceptron Algorithm for POS Tagging Task Mahsa Yarmohammadi Center for Spoken Language Understanding Oregon Health & Science University Portland, Oregon yarmoham@ohsu.edu
More informationThree-Dimensional Sensors Lecture 6: Point-Cloud Registration
Three-Dimensional Sensors Lecture 6: Point-Cloud Registration Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inria.fr http://perception.inrialpes.fr/ Point-Cloud Registration Methods Fuse data
More informationTopics du jour CS347. Centroid/NN. Example
Topics du jour CS347 Lecture 10 May 14, 2001 Prabhakar Raghavan Centroid/nearest-neighbor classification Bayesian Classification Link-based classification Document summarization Centroid/NN Given training
More informationOutline for today s lecture. Informed Search. Informed Search II. Review: Properties of greedy best-first search. Review: Greedy best-first search:
Outline for today s lecture Informed Search II Informed Search Optimal informed search: A* (AIMA 3.5.2) Creating good heuristic functions Hill Climbing 2 Review: Greedy best-first search: f(n): estimated
More informationHIDDEN MARKOV MODELS AND SEQUENCE ALIGNMENT
HIDDEN MARKOV MODELS AND SEQUENCE ALIGNMENT - Swarbhanu Chatterjee. Hidden Markov models are a sophisticated and flexible statistical tool for the study of protein models. Using HMMs to analyze proteins
More informationLecture 3 of 42. Lecture 3 of 42
Search Problems Discussion: Term Projects 3 of 5 William H. Hsu Department of Computing and Information Sciences, KSU KSOL course page: http://snipurl.com/v9v3 Course web site: http://www.kddresearch.org/courses/cis730
More informationChapter 6. Dynamic Programming. Modified from slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.
Chapter 6 Dynamic Programming Modified from slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 Think recursively (this week)!!! Divide & conquer and Dynamic programming
More informationIntra-sentence Punctuation Insertion in Natural Language Generation
Intra-sentence Punctuation Insertion in Natural Language Generation Zhu ZHANG, Michael GAMON, Simon CORSTON-OLIVER, Eric RINGGER School of Information Microsoft Research University of Michigan One Microsoft
More informationLearning Undirected Models with Missing Data
Learning Undirected Models with Missing Data Sargur Srihari srihari@cedar.buffalo.edu 1 Topics Log-linear form of Markov Network The missing data parameter estimation problem Methods for missing data:
More informationFitting D.A. Forsyth, CS 543
Fitting D.A. Forsyth, CS 543 Fitting Choose a parametric object/some objects to represent a set of tokens Most interesting case is when criterion is not local can t tell whether a set of points lies on
More informationCS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed
More informationA Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation.
A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation May 29, 2003 Shankar Kumar and Bill Byrne Center for Language and Speech Processing
More informationReassessment of the Role of Phrase Extraction in PBSMT
Reassessment of the Role of Phrase Extraction in Francisco Guzmán CCIR-ITESM guzmanhe@gmail.com Qin Gao LTI-CMU qing@cs.cmu.edu Stephan Vogel LTI-CMU stephan.vogel@cs.cmu.edu Presented by: Nguyen Bach
More informationProgramming Language Design and Implementation. Cunning Plan. Your Host For The Semester. Wes Weimer TR 9:30-10:45 MEC 214. Who Are We?
Programming Language Design and Implementation Wes Weimer TR 9:30-10:45 MEC 214 #1 Who Are We? Cunning Plan Wes, Pieter, Isabelle Administrivia What Is This Class About? Brief History Lesson Understanding
More information