Better Word Alignment with Supervised ITG Models. Aria Haghighi, John Blitzer, John DeNero, and Dan Klein UC Berkeley
1 Better Word Alignment with Supervised ITG Models Aria Haghighi, John Blitzer, John DeNero, and Dan Klein UC Berkeley
2-5 Word Alignment [alignment-grid figure, built up across slides 2-5: a Chinese sentence aligned link by link to the English sentence "Indonesia 's parliament speaker arraigned in court"; only the English axis labels survive in this transcription]
6-8 Experimental Setup: 2002 Chinese-English NIST hand-aligned data; labeled training examples (the count is not preserved in this transcription) and 191 test examples. Evaluation: precision and recall against the gold alignments, plus end-to-end BLEU.
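For reference, the evaluation quantities on this slide fit in a few lines. The sketch below follows the usual sure/possible-link definitions of precision, recall, and AER; the function and variable names are mine, not the talk's.

    def alignment_metrics(predicted, sure, possible):
        """Precision, recall, and AER of a predicted link set against a gold
        annotation with sure links S and possible links P (S is a subset of P)."""
        hits_p = len(predicted & possible)   # predicted links allowed by the annotation
        hits_s = len(predicted & sure)       # sure links the model recovered
        precision = hits_p / len(predicted) if predicted else 0.0
        recall = hits_s / len(sure) if sure else 0.0
        denom = len(predicted) + len(sure)
        aer = 1.0 - (hits_p + hits_s) / denom if denom else 0.0
        return precision, recall, aer

    # Toy check: three predicted links against two sure links and one extra possible link.
    pred = {(0, 0), (1, 2), (2, 1)}
    S = {(0, 0), (2, 1)}
    P = S | {(1, 1)}
    print(alignment_metrics(pred, S, P))   # (0.666..., 1.0, 0.2)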
9-14 Alignment Families [bar chart, built up across slides 9-14: the best AER achievable within each alignment family ("optimal AER"); the unconstrained family A reaches 0.0, while restricting to 1-1 matchings (A_1-1) leaves some gold links unreachable; the remaining bar values are not preserved in this transcription]
15-22 Alignment Families: Inversion Transduction Grammar (ITG) [Wu, 97]. Terminal productions align single words or leave them unaligned: X → f/e, X → f/ε, X → ε/e. Binary productions combine two sub-spans either monotonically, X → [X(L) X(R)] (the two parts appear in the same order in both languages), or with inversion, X → ⟨X(L) X(R)⟩ (the parts appear in opposite orders in the two languages).
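To make the grammar concrete, here is a small illustrative dynamic program over bitext spans for the one-to-one ITG family: given a score for each candidate link, it returns the best score of any ITG alignment by trying every monotone and inverted split of a span pair. It is a brute-force teaching sketch (roughly cubic in each sentence length, with names of my choosing), not the authors' aligner.

    from functools import lru_cache

    def itg_viterbi_score(link_score, n, m):
        """Best score of a one-to-one ITG alignment of a source sentence of
        length n and a target sentence of length m, where link_score[i][j] is
        the score of aligning source word i to target word j and unaligned
        words score zero."""

        @lru_cache(maxsize=None)
        def best(i, j, k, l):
            # Best score for source span [i, j) paired with target span [k, l).
            if j == i and l == k:
                return 0.0
            value = float('-inf')
            if j - i == 1 and l - k == 1:
                value = link_score[i][k]              # terminal X -> f/e
            if (j - i, l - k) in ((1, 0), (0, 1)):
                value = 0.0                           # null terminals X -> f/eps, eps/e
            for s in range(i, j + 1):
                for t in range(k, l + 1):
                    # Monotone rule X -> [X X]: left halves pair up, right halves pair up.
                    if not ((s == i and t == k) or (s == j and t == l)):
                        value = max(value, best(i, s, k, t) + best(s, j, t, l))
                    # Inverted rule X -> <X X>: left source half pairs with right target half.
                    if not ((s == i and t == l) or (s == j and t == k)):
                        value = max(value, best(i, s, t, l) + best(s, j, k, t))
            return value

        return best(0, n, 0, m)

    # Toy check: two words per side with strong diagonal scores.
    scores = [[2.0, -1.0],
              [-1.0, 2.0]]
    print(itg_viterbi_score(scores, 2, 2))   # 4.0

The skipped split points are the degenerate ones that would reproduce the whole span and recurse forever; every other split strictly shrinks both parts, so memoization over span pairs terminates.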
23-26 Alignment Families [the optimal-AER chart again, now adding A_ITG alongside A and A_1-1; the exact bar values are not preserved in this transcription]
27-31 Alignment Families: Block ITG (BITG). Terminal productions may now emit multi-word blocks, X → f̄/ē, aligning a contiguous source phrase to a contiguous target phrase (covering many-to-one and many-to-many links); the monotone X → [X(L) X(R)] and inverted X → ⟨X(L) X(R)⟩ binary productions are unchanged.
32-36 Alignment Families [optimal-AER chart completed with A_BITG: block terminals recover nearly all gold links, with an optimal AER of about 1.2; the values for A_1-1 and A_ITG are garbled in this transcription (a value of 10.2 appears but cannot be attributed reliably)]
37-39 Supervised Data [figure: a training example is a sentence pair x together with a hand-annotated gold alignment a*, shown on the Indonesia example grid]
40-45 Loss Function: comparing the gold alignment a* with a guess â, L(a*, â) = (# of sure gold links missing from â) + (# of links in â that are incorrect).
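Read literally, this loss is a simple count over link sets. The sketch below is my reading of it, with "incorrect" taken to mean "not among the possible gold links" (an assumption; the slide does not spell out the sure/possible distinction here).

    def alignment_loss(gold_sure, gold_possible, guess):
        """L(a*, a-hat): sure gold links the guess misses, plus guessed links
        the annotation does not license."""
        missing_sure = len(gold_sure - guess)
        incorrect = len(guess - gold_possible)
        return missing_sure + incorrect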
46-51 Linear Model: an alignment a ∈ A is scored as w^T φ(a). The feature vector decomposes over alignment cells and unaligned positions: φ(a) = Σ_{(i,j) ∈ a} φ_ij + Σ_{i ∉ a} φ_{i∅} + Σ_{j ∉ a} φ_{∅j}. Example cell features: φ_13 = { PartialDictMatch(Indonesia, Indonesia), ... }.
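Because the features decompose cell by cell, φ(a) and the model score are easy to assemble with sparse vectors. The sketch below does exactly that; the Counter-valued feature maps and their names (Dice, Dictionary, and so on, echoing the slides) stand in for whatever real feature functions would supply.

    from collections import Counter

    def alignment_features(alignment, cell_feats, null_src_feats, null_tgt_feats, n, m):
        """phi(a): sum the per-link feature vectors, plus a feature vector for
        every unaligned source position and every unaligned target position."""
        phi = Counter()
        aligned_src = {i for i, _ in alignment}
        aligned_tgt = {j for _, j in alignment}
        for i, j in alignment:
            phi.update(cell_feats[i][j])       # e.g. Dice(i,j), Dictionary(i,j), ...
        for i in range(n):
            if i not in aligned_src:
                phi.update(null_src_feats[i])  # source word i left unaligned
        for j in range(m):
            if j not in aligned_tgt:
                phi.update(null_tgt_feats[j])  # target word j left unaligned
        return phi

    def model_score(w, phi):
        """w^T phi(a) as a sparse dot product over feature names."""
        return sum(w.get(name, 0.0) * value for name, value in phi.items())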
52-59 Margin Training: decode the loss-augmented best guess â = argmax_{a ∈ A} w^T φ(a) + λ L(a*, a), then require the gold to win by a margin, w^T φ(a*) ≥ w^T φ(â) + L(a*, â), taking the smallest weight change that satisfies the constraint: min_w ||w − w'||² subject to that margin.
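When only the single most-violated constraint is used, this update has a standard closed form. The sketch below shows that one-constraint MIRA step; the guess â would come from the loss-augmented Viterbi decode above. The clipping cap and all names are illustrative choices, not details taken from the talk.

    def mira_update(w, phi_gold, phi_guess, loss, cap=1.0):
        """One-constraint MIRA step: the smallest change to w that makes the
        (projected) gold score beat the guess by a margin of `loss`.
        All vectors are sparse dicts mapping feature names to values."""
        delta = {f: phi_gold.get(f, 0.0) - phi_guess.get(f, 0.0)
                 for f in set(phi_gold) | set(phi_guess)}
        gap = sum(w.get(f, 0.0) * d for f, d in delta.items())   # current score margin
        norm_sq = sum(d * d for d in delta.values())
        if norm_sq == 0.0:
            return dict(w)                                       # identical feature vectors
        tau = min(cap, max(0.0, loss - gap) / norm_sq)           # clipped step size
        new_w = dict(w)
        for f, d in delta.items():
            new_w[f] = new_w.get(f, 0.0) + tau * d
        return new_w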
60-69 Oracle Projection: the hand-annotated gold a* generally lies outside the restricted family A. Let m* = min_{a ∈ A} L(a*, a) be the smallest loss any family member can achieve, and M(a*) = {a ∈ A : L(a*, a) = m*} the set of minimizers. Training uses a projected pseudo-gold a_p ∈ M(a*) in place of a*.
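One way to picture the projection for the 1-1 family: a member of M(a*) keeps as many sure gold links as the one-link-per-row-and-column constraint allows and proposes nothing outside the gold annotation. The greedy sketch below only illustrates that idea; it is not the procedure used in this work (the projection could equally be computed with the family's own Viterbi machinery, scoring links by the loss), and greedy tie-breaking can be suboptimal when sure links compete for the same position.

    def project_onto_one_to_one(gold_sure, gold_possible):
        """Toy projection of a gold alignment onto the 1-1 family: keep sure
        links first (at most one per source and per target position), then any
        possible links that still fit, and never propose anything else."""
        used_src, used_tgt, projected = set(), set(), set()
        for links in (sorted(gold_sure), sorted(gold_possible - gold_sure)):
            for i, j in links:
                if i not in used_src and j not in used_tgt:
                    projected.add((i, j))
                    used_src.add(i)
                    used_tgt.add(j)
        return projected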
70-74 Learning Alignments [precision/recall chart comparing the 1-1, ITG, and BITG families, MIRA-trained with Viterbi inference and simple features (Dice, lexical, distance, dictionary); the numeric values are not preserved in this transcription]
75-83 Likelihood Training: P_w(a | x) ∝ exp{w^T φ(a)}, with normalizer Z_w(x) = Σ_{a ∈ A} exp{w^T φ(a)}. For A_1-1 this sum is #P-hard to compute (it amounts to a matrix permanent), but for A_ITG and A_BITG it can be computed exactly by a dynamic program over bitext spans [i, j] × [k, l], with block terminals handled at the leaves for BITG. [Figures show the source/target span chart for the two families]
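To see why the choice of family matters for the normalizer, the sketch below computes Z_w(x) by brute force, enumerating every 1-1 matching of a toy sentence pair. The enumeration grows factorially, which is the #P obstacle the slide alludes to; the ITG and BITG families replace it with a polynomial dynamic program over spans. The helper names and toy feature are mine.

    import math
    from itertools import combinations, permutations

    def one_to_one_alignments(n, m):
        """Every 1-1 matching (including partial ones) of n source and m target words."""
        for k in range(min(n, m) + 1):
            for src in combinations(range(n), k):
                for tgt in permutations(range(m), k):
                    yield frozenset(zip(src, tgt))

    def log_partition(w, phi, n, m):
        """log Z_w(x) = log sum_a exp(w^T phi(a)), by exhaustive enumeration
        over the 1-1 family (feasible only for toy sentence lengths)."""
        scores = [sum(w.get(f, 0.0) * v for f, v in phi(a).items())
                  for a in one_to_one_alignments(n, m)]
        top = max(scores)
        return top + math.log(sum(math.exp(s - top) for s in scores))

    # Toy check: a single feature counting on-diagonal links, weight 1.0.
    w = {'diag': 1.0}
    phi = lambda a: {'diag': sum(1.0 for i, j in a if i == j)}
    print(log_partition(w, phi, 2, 2))   # about 2.82 over the 7 matchings of a 2x2 pair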
84-90 Derivations vs. Alignments [figures only in this transcription: a single alignment can have many distinct ITG derivations, so summing over derivations is not the same as summing over alignments; this spurious ambiguity must be dealt with before likelihood training]
91-99 ITG Normal Form [figures: a normal-form grammar controls n-ary productions and fixes where null-aligned words attach, so that derivations correspond to alignments without double counting]
100-106 Oracle Projection (for likelihood): with P_w(a | x) ∝ exp{w^T φ(a)}, we would like to maximize Σ_i log P_w(a*_i | x_i), but a*_i may lie outside the family A; instead maximize the probability of the projected set, Σ_i log P_w(M(a*_i) | x_i).
107-110 Learning Alignments [precision/recall chart comparing margin (MIRA) and likelihood training, both with Viterbi inference and the simple features (Dice, lexical, distance, dictionary); numeric values not preserved]
111-116 Learning Alignments [precision/recall chart comparing GIZA++, the HMM aligner, NeurAlign [Ayan & Dorr, 06], and the likelihood-trained BITG model with HMM features (Likelihood-BITG +HMM); numeric values not preserved]
117-123 End-to-End Experiments. Decoder: Joshua [Li et al., 2009]. Data: FBIS, 100k sentences, maximum length 40. LM: 5-gram trained on the Xinhua portion of English Gigaword. Tuning: 300 sentences of the NIST MT04 test set. Test: NIST 2005 Chinese-English. [Results table with rows GIZA++, HMM, and LL-BITG and columns Recall, Prec, BLEU; the numbers are not preserved in this transcription]
124-129 Conclusions: Blocks are important, and ITG inference keeps them tractable. A normal form and oracle projection make likelihood training possible. Word alignment improvements yield BLEU improvements. Software: nlp.cs.berkeley.edu
130 Thanks!
131-135 Likelihood Criterion (backup): P_w(a | x) = s_w(a) / Σ_{a′ ∈ A_ITG} s_w(a′), trained by max_w Σ_{(x, a*) ∈ D} log P(a* | x). Since a* may fall outside A_ITG, define m* = min_{a ∈ A_ITG} L(a*, a) and M(a*) = {a ∈ A_ITG : L(a*, a) = m*}, and instead maximize max_w Σ_{(x, a*) ∈ D} log P(M(a*) | x). [Chart comparing MIRA (21.1) and log-likelihood training on A_BITG; the second value is not preserved]
136 Adding External Features: adding joint HMM posteriors from DeNero et al. (2007) as features. [Table comparing MIRA and likelihood training across the 1-1, ITG, BITG, BITG-S, and BITG-N models, reporting P, R, and AER with the base features (Dice, distance, blocks, dictionary, lexical) and with HMM posteriors added; the numbers are not preserved in this transcription]
137 End-to-End Results: using the Joshua decoder (Hiero grammar). [Table with alignment precision/recall, extracted rule counts (in millions), and BLEU for GIZA++, joint HMM, Viterbi ITG, and posterior ITG alignments; the numbers are not preserved in this transcription]
138-147 Alignment Families (backup) [figures: nested diagrams of the alignment families A, A_1-1, A_ITG, and A_BITG with their optimal AERs; the only value preserved is roughly 1.2 for A_BITG]
148-151 Learning Alignments (backup): for a ∈ A, s_w(a) = w^T φ(a), with φ(a) = Σ_{(i,j) ∈ a} φ_ij + Σ_{i ∉ a} φ_{i∅} + Σ_{j ∉ a} φ_{∅j}.
152-156 MIRA: Margin Criterion (backup): with pseudo-gold a_p ∈ A_1-1, decode the loss-augmented guess â = argmax_{a ∈ A_1-1} s_{w_t}(a) + λ L(a_p, a), then update w_{t+1} = argmin_w ||w − w_t||² subject to s_{w_{t+1}}(a_p) ≥ s_{w_{t+1}}(â) + L(a_p, â).
157-160 Alignment Families (backup) [figures: the unconstrained family A and the 1-1 matching family A_1-1; no further text preserved]
161-166 Results (backup) [precision/recall charts built up across slides: GIZA++, the HMM aligner [Liang et al., 06], LL-BITG with simple features, NeurAlign [Ayan & Dorr, 06], and LL-BITG with HMM features; numeric values not preserved]
167-168 Linear Model (backup): score w^T φ(a) with φ(a) = Σ_{(i,j) ∈ a} φ_ij, where φ_ij = { Dice(i,j), Dictionary(i,j), LogFreqDiff(i,j), ... }.