Better Word Alignment with Supervised ITG Models. Aria Haghighi, John Blitzer, John DeNero, and Dan Klein UC Berkeley

Size: px
Start display at page:

Download "Better Word Alignment with Supervised ITG Models. Aria Haghighi, John Blitzer, John DeNero, and Dan Klein UC Berkeley"

Transcription

1 Better Word Alignment with Supervised ITG Models Aria Haghighi, John Blitzer, John DeNero, and Dan Klein UC Berkeley

2 Word Alignment s Indonesia parliament court in arraigned speaker

3 Word Alignment s Indonesia parliament court in arraigned speaker

4 Word Alignment s Indonesia parliament court in arraigned speaker

5 Word Alignment s Indonesia parliament court in arraigned speaker

6 Experimental Setup

7 Experimental Setup 2002 Chinese-English NIST data labeled training examples 191 test examples

8 Experimental Setup 2002 Chinese-English NIST data labeled training examples 191 test examples Evaluation - - Prec, Recall over gold alignments BLEU end-to-end

9 Alignment Families A

10 Alignment Families Optimal AER A

11 Alignment Families Optimal AER A 0.0 A

12 Alignment Families Optimal AER A 0.0 A A 1-1

13 Alignment Families Optimal AER A 0.0 A A 1-1

14 Alignment Families Optimal AER A A A A 1-1

15 Alignment Families

16 Alignment Families Inversion Transduction Grammar (ITG) [Wu, 97]

17 Alignment Families Inversion Transduction Grammar (ITG) X f,e, f,,,e [Wu, 97]

18 Alignment Families Inversion Transduction Grammar (ITG) e X f,e, f,,,e f [Wu, 97]

19 Alignment Families Inversion Transduction Grammar (ITG) e X f,e, f,,,e f X X (L) X (R) [Wu, 97]

20 Alignment Families Inversion Transduction Grammar (ITG) e X f,e, f,,,e f X X (L) X (R) X (L) X (R) [Wu, 97]

21 Alignment Families Inversion Transduction Grammar (ITG) e X f,e, f,,,e f X X (L) X (R) X (L) X (R) X X (L) X (R) [Wu, 97]

22 Alignment Families Inversion Transduction Grammar (ITG) e X f,e, f,,,e f X X (L) X (R) X (L) X (R) X X (L) X (R) X (L) X (R) [Wu, 97]

23 Alignment Families Optimal AER A A A A 1-1

24 Alignment Families Optimal AER A A A A 1-1 A ITG

25 Alignment Families Optimal AER A A A A 1-1 A ITG

26 Alignment Families Optimal AER A A 1-1 A ITG A A 1-1 A ITG

27 Alignment Families

28 Alignment Families Block ITG (BITG)

29 Alignment Families Block ITG (BITG) X f,e, f,e

30 Alignment Families Block ITG (BITG) e e X f,e, f,e f f

31 Alignment Families Block ITG (BITG) e e X f,e, f,e f f X X (L) X (R) X (L) X (R) X X (L) X (R) X (L) X (R)

32 Alignment Families Optimal AER A A 1-1 A ITG A A 1-1 A ITG

33 Alignment Families Optimal AER A A 1-1 A ITG A A 1-1 A BITG A ITG

34 Alignment Families Optimal AER A A 1-1 A ITG A A 1-1 A BITG A ITG

35 Alignment Families Optimal AER A A A ITG 1.2 A BITG 10.2 A A 1-1 A BITG A ITG

36 Alignment Families Optimal AER A BITG 1.2 A A BITG

37 Supervised Data

38 Supervised Data x s Indonesia parliament court in arraigned speaker

39 Supervised Data a x s Indonesia parliament court in arraigned speaker

40 Loss Function

41 Loss Function a

42 Loss Function a â

43 Loss Function a â L(a, a) = # of missing sure alignments + # of incorrect alignments

44 Loss Function a â L(a, a) = # of missing sure alignments + # of incorrect alignments

45 Loss Function a â L(a, a) = # of missing sure alignments + # of incorrect alignments

46 Linear Model

47 Linear Model a A

48 Linear Model a A Score w T φ(a)

49 Linear Model a A Score w T φ(a) Features φ(a) = φ ij (i,j) a

50 Linear Model a A Score w T φ(a) Features φ(a) = φ ij (i,j) a φ 13 = { PartialDictMatch(Indonesia, Indonesia,

51 Linear Model a A Score w T φ(a) Features φ(a) = φ ij + (i,j) a φ j i/ a φ i + j/ a

52 Margin Training

53 Margin Training a

54 Margin Training a â

55 Margin Training a â arg max a A w T φ(a)

56 Margin Training a â arg max a A w T φ(a)+λl(a, a)

57 Margin Training a â

58 Margin Training a â w T φ(a ) w T φ(â)+l(a, â)

59 Margin Training a â min w w w 2 w T φ(a ) w T φ(â)+l(a, â)

60 Oracle Projection a

61 Oracle Projection a A

62 Oracle Projection a A A

63 Oracle Projection m =min a A L(a, a) a A A

64 Oracle Projection m =min a A L(a, a) M(a ) = {a A s.t. a L(a, a) =m } A A

65 Oracle Projection m =min a A L(a, a) M(a ) = {a A s.t. a L(a, a) =m } A A M(a )

66 Oracle Projection a A A M(a )

67 Oracle Projection a A A M(a )

68 Oracle Projection a A A M(a )

69 Oracle Projection a a p A A M(a )

70 Learning Alignments 1-1 ITG BITG

71 Learning Alignments 1-1 ITG BITG MIRA Trained

72 Learning Alignments 1-1 ITG BITG MIRA Trained Viterbi Inference

73 Learning Alignments 1-1 ITG BITG MIRA Trained - Dice - Lexical - Distance - Dictionary Viterbi Inference Simple Features

74 Learning Alignments 1-1 ITG BITG 90 MIRA Trained Viterbi Inference Simple Features Dice - Lexical - Distance - Dictionary Precision Recall

75 Likelihood Training

76 Likelihood Training P w (a x) exp{w T φ(a)}

77 Likelihood Training P w (a x) exp{w T φ(a)} Z w (x) = exp{w T φ(a)} a A

78 Likelihood Training P w (a x) exp{w T φ(a)} Z w (x) = exp{w T φ(a)} a A A 1-1

79 Likelihood Training P w (a x) exp{w T φ(a)} Z w (x) = exp{w T φ(a)} a A A 1-1 #P

80 Likelihood Training P w (a x) exp{w T φ(a)} Z w (x) = exp{w T φ(a)} a A A X 1-1 #P

81 Likelihood Training P w (a x) exp{w T φ(a)} Z w (x) = exp{w T φ(a)} a A A 1-1 AITG X#P

82 Likelihood Training P w (a x) exp{w T φ(a)} Z w (x) = exp{w T φ(a)} a A A 1-1 AITG X#P j k i l Target Source

83 Likelihood Training P w (a x) exp{w T φ(a)} Z w (x) = exp{w T φ(a)} a A A 1-1 AITG X#P j k i l Target Source ABITG Target j k i l Source

84 Derivations vs. Alignments

85 Derivations vs. Alignments

86 Derivations vs. Alignments

87 Derivations vs. Alignments

88 Derivations vs. Alignments

89 Derivations vs. Alignments

90 Derivations vs. Alignments

91 ITG Normal Form

92 ITG Normal Form N-ary Productions

93 ITG Normal Form N-ary Productions

94 ITG Normal Form N-ary Productions

95 ITG Normal Form N-ary Productions

96 ITG Normal Form N-ary Productions Null Attachment

97 ITG Normal Form N-ary Productions Null Attachment

98 ITG Normal Form N-ary Productions Null Attachment

99 ITG Normal Form N-ary Productions Null Attachment

100 Oracle Projection

101 Oracle Projection P w (a x) exp{w T φ(a)}

102 Oracle Projection P w (a x) exp{w T φ(a)} max w log P w (a i x i ) i

103 Oracle Projection P w (a x) exp{w T φ(a)} max w log P w (a i x i ) i A a

104 Oracle Projection P w (a x) exp{w T φ(a)} max w log P w (a i x i ) i A a A

105 Oracle Projection P w (a x) exp{w T φ(a)} max w log P w (a i x i ) i A a A M(a )

106 Oracle Projection P w (a x) exp{w T φ(a)} max w log P w (M(a i ) x i ) i A a A M(a )

107 Learning Alignments Margin Likelihood

108 Learning Alignments Margin Likelihood Viterbi Inference

109 Learning Alignments Margin Likelihood Viterbi Inference Simple Features - Dice - Lexical - Distance - Dictionary

110 Learning Alignments Margin Likelihood 90 Viterbi Inference Simple Features Dice - Lexical - Distance - Dictionary 50 Precision Recall

111 Learning Alignments GIZA++ HMM NeurAlign Likelihood-BITG [Fayan & Dorr, 06] (+HMM)

112 Learning Alignments 90.0 GIZA++ HMM NeurAlign Likelihood-BITG [Fayan & Dorr, 06] (+HMM) Precision Recall

113 Learning Alignments 90.0 GIZA++ HMM NeurAlign Likelihood-BITG [Fayan & Dorr, 06] (+HMM) Precision Recall

114 Learning Alignments 90.0 GIZA++ HMM NeurAlign Likelihood-BITG [Fayan & Dorr, 06] (+HMM) Precision Recall

115 Learning Alignments 90.0 GIZA++ HMM NeurAlign Likelihood-BITG [Fayan & Dorr, 06] (+HMM) Precision Recall

116 Learning Alignments 90.0 GIZA++ HMM NeurAlign Likelihood-BITG [Fayan & Dorr, 06] (+HMM) Precision Recall

117 End-to-End Experiments

118 End-to-End Experiments Decoder: JosHUa [Li et. al, 2009]

119 End-to-End Experiments Decoder: JosHUa [Li et. al, 2009] Data: FBIS 100k sents max length 40

120 End-to-End Experiments Decoder: JosHUa [Li et. al, 2009] Data: FBIS 100k sents max length 40 LM: 5-gram on Eng. Gigaword Xinhua

121 End-to-End Experiments Decoder: JosHUa [Li et. al, 2009] Data: FBIS 100k sents max length 40 LM: 5-gram on Eng. Gigaword Xinhua Tuning: 300 sents. of NIST MT04 test

122 End-to-End Experiments Decoder: JosHUa [Li et. al, 2009] Test: NIST 2005 Chinese-English Data: FBIS 100k sents max length 40 LM: 5-gram on Eng. Gigaword Xinhua Tuning: 300 sents. of NIST MT04 test

123 End-to-End Experiments Decoder: JosHUa [Li et. al, 2009] Test: NIST 2005 Chinese-English Data: FBIS 100k sents max length 40 LM: 5-gram on Eng. Gigaword Xinhua Tuning: 300 sents. of NIST MT04 test Alignments Recall Prec BLEU GIZA HMM LL-BITG

124 Conclusions

125 Conclusions Blocks are important, ITG tractable

126 Conclusions Blocks are important, ITG tractable Normal form and oracle projection allow for likelihood training

127 Conclusions Blocks are important, ITG tractable Normal form and oracle projection allow for likelihood training Word alignment improvements yield BLEU improvements

128 Conclusions Blocks are important, ITG tractable Normal form and oracle projection allow for likelihood training Word alignment improvements yield BLEU improvements

129 Conclusions Blocks are important, ITG tractable Normal form and oracle projection allow for likelihood training Word alignment improvements yield BLEU improvements Software nlp.cs.berkeley.edu

130 Thanks!

131 Likelihood Criterion A â a P w (a x) = s w (a) a A s w(a ) max w (x,a ) D log P (a x)

132 Likelihood Criterion A A ITG â a P w (a x) = s w (a) a A ITG s w (a ) max w (x,a ) D log P (a x)

133 Likelihood Criterion M(a ) a m = min L(a, a) a A ITG M(a )={a A ITG s.t. L(a, a) =m }

134 Likelihood Criterion M(a ) a max w (x,a ) D log P (M(a ) x)

135 Likelihood Criterion MIRA 21.1 LogLike A BITG

136 Adding External Features Adding Joint HMM posteriors from DeNero et. al. (2007) MIRA Likelihood 1-1 ITG BITG BITG-S BITG-N Features P R AER P R AER P R AER P R AER P R AER Dice, dist, blcks, dict, lex HMM TODO: Keynote Chart

137 End-to-End Results Using JosHUa decoder (HIERO) Alignments Translations Model Prec Rec Rules BLEU GIZA M Joint HMM M Viterbi ITG M Posterior ITG M TODO: Keynote Chart

138 Alignment Families A A 1-1

139 Alignment Families A A 1-1

140 Alignment Families A A 1-1 A ITG

141 Alignment Families A A 1-1 A ITG

142 Alignment Families A A 1-1 A ITG

143 Alignment Families A A 1-1 A ITG

144 Alignment Families A A 1-1 A ITG Optimal AER A 1-1 A ITG

145 Alignment Families A A 1-1 A ITG A BITG

146 Alignment Families A A 1-1 A ITG A BITG

147 Alignment Families Optimal AER A A 1-1 A 1-1 A ITG A ITG A BITG 1.2 A BITG

148 Learning Alignments a A

149 Learning Alignments a A s(a) =w T φ(a)

150 Learning Alignments a A Score w T φ(a) Features φ(a) = φ ij + (i,j) a φ j i/ a φ i + j/ a

151 Learning Alignments a A s w (a) =w T φ(a) φ(a) = φ ij + (i,j) a φ j i/ a φ i + j/ a

152 MIRA: Margin Criterion a â

153 MIRA: Margin Criterion A â arg max s(a) a A a s w (a ) s w (â )+L(a, â )

154 MIRA: Margin Criterion A â a w t+1 = arg min w w w t 2 s.t. â = arg max a A s w t (a) s wt+1 (a ) s wt+1 (â )+L(a, â )

155 MIRA: Margin Criterion A A 1-1 â a a p â = arg max a A 1-1 s wt (a)+λl(a p, a)

156 MIRA: Margin Criterion A A 1-1 â a a p w t+1 = arg min w w w t 2 s.t. â = arg max a A 1-1 s wt (a)+λl(a p, a) s wt+1 (a ) s wt+1 (â )+L(a p, â )

157 Alignment Families A

158 Alignment Families A A 1-1

159 Alignment Families A A 1-1

160 Alignment Families A A A 1-1

161 Results GIZA++

162 Results GIZA Precision Recall

163 Results 90.0 GIZA++ HMM [Liang et. al., 06] Precision Recall

164 Results 90.0 GIZA++ HMM LL-BITG (simple) Precision Recall

165 Learning Alignments 90.0 GIZA++ HMM LL-BITG NeurAlign (simple) [Fayan & Dorr, 06] Precision Recall

166 Learning Alignments 90.0 GIZA++ HMM LL-BITG NeurAlign LL-BITG (simple) (+HMM) Precision Recall

167 Linear Model a A Score w T φ(a) Features φ(a) = φ ij (i,j) a

168 Linear Model a A Score w T φ(a) Features φ(a) = φ ij (i,j) a { Dice(i,j), Dictionary(i,j), LogFreqDiff(i,j),... }

Fast Inference in Phrase Extraction Models with Belief Propagation

Fast Inference in Phrase Extraction Models with Belief Propagation Fast Inference in Phrase Extraction Models with Belief Propagation David Burkett and Dan Klein Computer Science Division University of California, Berkeley {dburkett,klein}@cs.berkeley.edu Abstract Modeling

More information

Reassessment of the Role of Phrase Extraction in PBSMT

Reassessment of the Role of Phrase Extraction in PBSMT Reassessment of the Role of Phrase Extraction in Francisco Guzmán CCIR-ITESM guzmanhe@gmail.com Qin Gao LTI-CMU qing@cs.cmu.edu Stephan Vogel LTI-CMU stephan.vogel@cs.cmu.edu Presented by: Nguyen Bach

More information

An Unsupervised Model for Joint Phrase Alignment and Extraction

An Unsupervised Model for Joint Phrase Alignment and Extraction An Unsupervised Model for Joint Phrase Alignment and Extraction Graham Neubig 1,2, Taro Watanabe 2, Eiichiro Sumita 2, Shinsuke Mori 1, Tatsuya Kawahara 1 1 Graduate School of Informatics, Kyoto University

More information

Algorithms for NLP. Machine Translation. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley

Algorithms for NLP. Machine Translation. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Algorithms for NLP Machine Translation Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Machine Translation Machine Translation: Examples Levels of Transfer Word-Level MT: Examples la politique

More information

Joint Decoding with Multiple Translation Models

Joint Decoding with Multiple Translation Models Joint Decoding with Multiple Translation Models Yang Liu, Haitao Mi, Yang Feng, and Qun Liu Institute of Computing Technology, Chinese Academy of ciences {yliu,htmi,fengyang,liuqun}@ict.ac.cn 8/10/2009

More information

NTT SMT System for IWSLT Katsuhito Sudoh, Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki NTT Communication Science Labs.

NTT SMT System for IWSLT Katsuhito Sudoh, Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki NTT Communication Science Labs. NTT SMT System for IWSLT 2008 Katsuhito Sudoh, Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki NTT Communication Science Labs., Japan Overview 2-stage translation system k-best translation

More information

A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation.

A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation. A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation May 29, 2003 Shankar Kumar and Bill Byrne Center for Language and Speech Processing

More information

Discriminative Training for Phrase-Based Machine Translation

Discriminative Training for Phrase-Based Machine Translation Discriminative Training for Phrase-Based Machine Translation Abhishek Arun 19 April 2007 Overview 1 Evolution from generative to discriminative models Discriminative training Model Learning schemes Featured

More information

Hierarchical Phrase-Based Translation with WFSTs. Weighted Finite State Transducers

Hierarchical Phrase-Based Translation with WFSTs. Weighted Finite State Transducers Hierarchical Phrase-Based Translation with Weighted Finite State Transducers Gonzalo Iglesias 1 Adrià de Gispert 2 Eduardo R. Banga 1 William Byrne 2 1 Department of Signal Processing and Communications

More information

TALP: Xgram-based Spoken Language Translation System Adrià de Gispert José B. Mariño

TALP: Xgram-based Spoken Language Translation System Adrià de Gispert José B. Mariño TALP: Xgram-based Spoken Language Translation System Adrià de Gispert José B. Mariño Outline Overview Outline Translation generation Training IWSLT'04 Chinese-English supplied task results Conclusion and

More information

Large-scale Word Alignment Using Soft Dependency Cohesion Constraints

Large-scale Word Alignment Using Soft Dependency Cohesion Constraints Large-scale Word Alignment Using Soft Dependency Cohesion Constraints Zhiguo Wang and Chengqing Zong National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy of Sciences {zgwang,

More information

K-best Parsing Algorithms

K-best Parsing Algorithms K-best Parsing Algorithms Liang Huang University of Pennsylvania joint work with David Chiang (USC Information Sciences Institute) k-best Parsing Liang Huang (Penn) k-best parsing 2 k-best Parsing I saw

More information

A Semi-supervised Word Alignment Algorithm with Partial Manual Alignments

A Semi-supervised Word Alignment Algorithm with Partial Manual Alignments A Semi-supervised Word Alignment Algorithm with Partial Manual Alignments Qin Gao, Nguyen Bach and Stephan Vogel Language Technologies Institute Carnegie Mellon University 000 Forbes Avenue, Pittsburgh

More information

Local Phrase Reordering Models for Statistical Machine Translation

Local Phrase Reordering Models for Statistical Machine Translation Local Phrase Reordering Models for Statistical Machine Translation Shankar Kumar, William Byrne Center for Language and Speech Processing, Johns Hopkins University, 3400 North Charles Street, Baltimore,

More information

Forest Reranking for Machine Translation with the Perceptron Algorithm

Forest Reranking for Machine Translation with the Perceptron Algorithm Forest Reranking for Machine Translation with the Perceptron Algorithm Zhifei Li and Sanjeev Khudanpur Center for Language and Speech Processing and Department of Computer Science Johns Hopkins University,

More information

A General-Purpose Rule Extractor for SCFG-Based Machine Translation

A General-Purpose Rule Extractor for SCFG-Based Machine Translation A General-Purpose Rule Extractor for SCFG-Based Machine Translation Greg Hanneman, Michelle Burroughs, and Alon Lavie Language Technologies Institute Carnegie Mellon University Fifth Workshop on Syntax

More information

Language in 10 minutes

Language in 10 minutes Language in 10 minutes http://mt-class.org/jhu/lin10.html By Friday: Group up (optional, max size 2), choose a language (not one y all speak) and a date First presentation: Yuan on Thursday Yuan will start

More information

Power Mean Based Algorithm for Combining Multiple Alignment Tables

Power Mean Based Algorithm for Combining Multiple Alignment Tables Power Mean Based Algorithm for Combining Multiple Alignment Tables Sameer Maskey, Steven J. Rennie, Bowen Zhou IBM T.J. Watson Research Center {smaskey, sjrennie, zhou}@us.ibm.com Abstract Alignment combination

More information

Training a Super Model Look-Alike:Featuring Edit Distance, N-Gram Occurrence,and One Reference Translation p.1/40

Training a Super Model Look-Alike:Featuring Edit Distance, N-Gram Occurrence,and One Reference Translation p.1/40 Training a Super Model Look-Alike: Featuring Edit Distance, N-Gram Occurrence, and One Reference Translation Eva Forsbom, Uppsala University evafo@stp.ling.uu.se MT Summit IX Workshop on Machine Translation

More information

Computationally Efficient M-Estimation of Log-Linear Structure Models

Computationally Efficient M-Estimation of Log-Linear Structure Models Computationally Efficient M-Estimation of Log-Linear Structure Models Noah Smith, Doug Vail, and John Lafferty School of Computer Science Carnegie Mellon University {nasmith,dvail2,lafferty}@cs.cmu.edu

More information

Alignment Link Projection Using Transformation-Based Learning

Alignment Link Projection Using Transformation-Based Learning Alignment Link Projection Using Transformation-Based Learning Necip Fazil Ayan, Bonnie J. Dorr and Christof Monz Department of Computer Science University of Maryland College Park, MD 20742 {nfa,bonnie,christof}@umiacs.umd.edu

More information

Sparse Feature Learning

Sparse Feature Learning Sparse Feature Learning Philipp Koehn 1 March 2016 Multiple Component Models 1 Translation Model Language Model Reordering Model Component Weights 2 Language Model.05 Translation Model.26.04.19.1 Reordering

More information

Improving Statistical Word Alignment with Ensemble Methods

Improving Statistical Word Alignment with Ensemble Methods Improving Statiical Word Alignment with Ensemble Methods Hua Wu and Haifeng Wang Toshiba (China) Research and Development Center, 5/F., Tower W2, Oriental Plaza, No.1, Ea Chang An Ave., Dong Cheng Dirict,

More information

THE CUED NIST 2009 ARABIC-ENGLISH SMT SYSTEM

THE CUED NIST 2009 ARABIC-ENGLISH SMT SYSTEM THE CUED NIST 2009 ARABIC-ENGLISH SMT SYSTEM Adrià de Gispert, Gonzalo Iglesias, Graeme Blackwood, Jamie Brunning, Bill Byrne NIST Open MT 2009 Evaluation Workshop Ottawa, The CUED SMT system Lattice-based

More information

INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS. Jan Tore Lønning, Lecture 8, 12 Oct

INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS. Jan Tore Lønning, Lecture 8, 12 Oct 1 INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS Jan Tore Lønning, Lecture 8, 12 Oct. 2016 jtl@ifi.uio.no Today 2 Preparing bitext Parameter tuning Reranking Some linguistic issues STMT so far 3 We

More information

BBN System Description for WMT10 System Combination Task

BBN System Description for WMT10 System Combination Task BBN System Description for WMT10 System Combination Task Antti-Veikko I. Rosti and Bing Zhang and Spyros Matsoukas and Richard Schwartz Raytheon BBN Technologies, 10 Moulton Street, Cambridge, MA 018,

More information

Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines

Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines Kristina Toutanova Microsoft Research Redmond, WA 98502 kristout@microsoft.com Byung-Gyu Ahn Johns Hopkins University

More information

Left-to-Right Tree-to-String Decoding with Prediction

Left-to-Right Tree-to-String Decoding with Prediction Left-to-Right Tree-to-String Decoding with Prediction Yang Feng Yang Liu Qun Liu Trevor Cohn Department of Computer Science The University of Sheffield, Sheffield, UK {y.feng, t.cohn}@sheffield.ac.uk State

More information

CS 664 Flexible Templates. Daniel Huttenlocher

CS 664 Flexible Templates. Daniel Huttenlocher CS 664 Flexible Templates Daniel Huttenlocher Flexible Template Matching Pictorial structures Parts connected by springs and appearance models for each part Used for human bodies, faces Fischler&Elschlager,

More information

Conditional Random Fields and beyond D A N I E L K H A S H A B I C S U I U C,

Conditional Random Fields and beyond D A N I E L K H A S H A B I C S U I U C, Conditional Random Fields and beyond D A N I E L K H A S H A B I C S 5 4 6 U I U C, 2 0 1 3 Outline Modeling Inference Training Applications Outline Modeling Problem definition Discriminative vs. Generative

More information

Genome 559. Hidden Markov Models

Genome 559. Hidden Markov Models Genome 559 Hidden Markov Models A simple HMM Eddy, Nat. Biotech, 2004 Notes Probability of a given a state path and output sequence is just product of emission/transition probabilities If state path is

More information

CMU-UKA Syntax Augmented Machine Translation

CMU-UKA Syntax Augmented Machine Translation Outline CMU-UKA Syntax Augmented Machine Translation Ashish Venugopal, Andreas Zollmann, Stephan Vogel, Alex Waibel InterACT, LTI, Carnegie Mellon University Pittsburgh, PA Outline Outline 1 2 3 4 Issues

More information

Guiding Semi-Supervision with Constraint-Driven Learning

Guiding Semi-Supervision with Constraint-Driven Learning Guiding Semi-Supervision with Constraint-Driven Learning Ming-Wei Chang 1 Lev Ratinov 2 Dan Roth 3 1 Department of Computer Science University of Illinois at Urbana-Champaign Paper presentation by: Drew

More information

Outline GIZA++ Moses. Demo. Steps Output files. Training pipeline Decoder

Outline GIZA++ Moses. Demo. Steps Output files. Training pipeline Decoder GIZA++ and Moses Outline GIZA++ Steps Output files Moses Training pipeline Decoder Demo GIZA++ A statistical machine translation toolkit used to train IBM Models 1-5 (moses only uses output of IBM Model-1)

More information

Structured Learning for Taxonomy Induction with Belief Propagation

Structured Learning for Taxonomy Induction with Belief Propagation Structured Learning for Taxonomy Induction with Belief Propagation Mohit Bansal TTI-Chicago David Burkett Twitter Inc. Gerard de Melo Tsinghua U. Dan Klein UC Berkeley A Lexical Taxonomy! Captures types

More information

Forest-based Algorithms in

Forest-based Algorithms in Forest-based Algorithms in Natural Language Processing Liang Huang overview of Ph.D. work done at Penn (and ISI, ICT) INSTITUTE OF COMPUTING TECHNOLOGY includes joint work with David Chiang, Kevin Knight,

More information

Thesis Proposal: Mapping Natural Language to and from the Abstract Meaning Representation

Thesis Proposal: Mapping Natural Language to and from the Abstract Meaning Representation Thesis Proposal: Mapping Natural Language to and from the Abstract Meaning Representation Jeffrey Flanigan April 1, 2016 Abstract A key task in intelligent language processing is obtaining semantic representations

More information

Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training

Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training Xinyan Xiao, Yang Liu, Qun Liu, and Shouxun Lin Key Laboratory of Intelligent Information Processing Institute of Computing

More information

Refactored Moses. Hieu Hoang MT Marathon Prague August 2013

Refactored Moses. Hieu Hoang MT Marathon Prague August 2013 Refactored Moses Hieu Hoang MT Marathon Prague August 2013 What did you Refactor? Decoder Feature Func?on Framework moses.ini format Simplify class structure Deleted func?onality NOT decoding algorithms

More information

Reassessment of the Role of Phrase Extraction in PBSMT

Reassessment of the Role of Phrase Extraction in PBSMT Reassessment of the Role of Phrase Extraction in PBSMT Francisco Guzman Centro de Sistemas Inteligentes Tecnológico de Monterrey Monterrey, N.L., Mexico guzmanhe@gmail.com Qin Gao and Stephan Vogel Language

More information

Artificial Intelligence Naïve Bayes

Artificial Intelligence Naïve Bayes Artificial Intelligence Naïve Bayes Instructors: David Suter and Qince Li Course Delivered @ Harbin Institute of Technology [M any slides adapted from those created by Dan Klein and Pieter Abbeel for CS188

More information

Cube Pruning as Heuristic Search

Cube Pruning as Heuristic Search Cube Pruning as Heuristic Search Mark Hopkins and Greg Langmead Language Weaver, Inc. 4640 Admiralty Way, Suite 1210 Marina del Rey, CA 90292 {mhopkins,glangmead}@languageweaver.com Abstract Cube pruning

More information

Incremental Integer Linear Programming for Non-projective Dependency Parsing

Incremental Integer Linear Programming for Non-projective Dependency Parsing Incremental Integer Linear Programming for Non-projective Dependency Parsing Sebastian Riedel James Clarke ICCS, University of Edinburgh 22. July 2006 EMNLP 2006 S. Riedel, J. Clarke (ICCS, Edinburgh)

More information

RLAT Rapid Language Adaptation Toolkit

RLAT Rapid Language Adaptation Toolkit RLAT Rapid Language Adaptation Toolkit Tim Schlippe May 15, 2012 RLAT Rapid Language Adaptation Toolkit - 2 RLAT Rapid Language Adaptation Toolkit RLAT Rapid Language Adaptation Toolkit - 3 Outline Introduction

More information

Joint Decoding with Multiple Translation Models

Joint Decoding with Multiple Translation Models Joint Decoding with Multiple Translation Models Yang Liu and Haitao Mi and Yang Feng and Qun Liu Key Laboratory of Intelligent Information Processing Institute of Computing Technology Chinese Academy of

More information

CS 343: Artificial Intelligence

CS 343: Artificial Intelligence CS 343: Artificial Intelligence Bayes Nets: Inference Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.

More information

EXPERIMENTAL SETUP: MOSES

EXPERIMENTAL SETUP: MOSES EXPERIMENTAL SETUP: MOSES Moses is a statistical machine translation system that allows us to automatically train translation models for any language pair. All we need is a collection of translated texts

More information

Learning and Recognizing Visual Object Categories Without First Detecting Features

Learning and Recognizing Visual Object Categories Without First Detecting Features Learning and Recognizing Visual Object Categories Without First Detecting Features Daniel Huttenlocher 2007 Joint work with D. Crandall and P. Felzenszwalb Object Category Recognition Generic classes rather

More information

Improved Typesetting Models for Historical OCR

Improved Typesetting Models for Historical OCR Improved Typesetting Models for Historical OCR Taylor Berg-Kirkpatrick Dan Klein Computer Science Division University of California, Berkeley {tberg,klein}@cs.berkeley.edu Abstract We present richer typesetting

More information

SyMGiza++: Symmetrized Word Alignment Models for Statistical Machine Translation

SyMGiza++: Symmetrized Word Alignment Models for Statistical Machine Translation SyMGiza++: Symmetrized Word Alignment Models for Statistical Machine Translation Marcin Junczys-Dowmunt, Arkadiusz Sza l Faculty of Mathematics and Computer Science Adam Mickiewicz University ul. Umultowska

More information

CS 188: Artificial Intelligence

CS 188: Artificial Intelligence CS 188: Artificial Intelligence Bayes Nets: Inference Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188

More information

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed

More information

Tuning. Philipp Koehn presented by Gaurav Kumar. 28 September 2017

Tuning. Philipp Koehn presented by Gaurav Kumar. 28 September 2017 Tuning Philipp Koehn presented by Gaurav Kumar 28 September 2017 The Story so Far: Generative Models 1 The definition of translation probability follows a mathematical derivation argmax e p(e f) = argmax

More information

A Quantitative Analysis of Reordering Phenomena

A Quantitative Analysis of Reordering Phenomena A Quantitative Analysis of Reordering Phenomena Alexandra Birch Phil Blunsom Miles Osborne a.c.birch-mayne@sms.ed.ac.uk pblunsom@inf.ed.ac.uk miles@inf.ed.ac.uk University of Edinburgh 10 Crichton Street

More information

Motivation: Shortcomings of Hidden Markov Model. Ko, Youngjoong. Solution: Maximum Entropy Markov Model (MEMM)

Motivation: Shortcomings of Hidden Markov Model. Ko, Youngjoong. Solution: Maximum Entropy Markov Model (MEMM) Motivation: Shortcomings of Hidden Markov Model Maximum Entropy Markov Models and Conditional Random Fields Ko, Youngjoong Dept. of Computer Engineering, Dong-A University Intelligent System Laboratory,

More information

Type-based MCMC for Sampling Tree Fragments from Forests

Type-based MCMC for Sampling Tree Fragments from Forests Type-based MCMC for Sampling Tree Fragments from Forests Xiaochang Peng and Daniel Gildea Department of Computer Science University of Rochester Rochester, NY 14627 Abstract This paper applies type-based

More information

ECE521: Week 11, Lecture March 2017: HMM learning/inference. With thanks to Russ Salakhutdinov

ECE521: Week 11, Lecture March 2017: HMM learning/inference. With thanks to Russ Salakhutdinov ECE521: Week 11, Lecture 20 27 March 2017: HMM learning/inference With thanks to Russ Salakhutdinov Examples of other perspectives Murphy 17.4 End of Russell & Norvig 15.2 (Artificial Intelligence: A Modern

More information

Inference. Inference: calculating some useful quantity from a joint probability distribution Examples: Posterior probability: Most likely explanation:

Inference. Inference: calculating some useful quantity from a joint probability distribution Examples: Posterior probability: Most likely explanation: Inference Inference: calculating some useful quantity from a joint probability distribution Examples: Posterior probability: B A E J M Most likely explanation: This slide deck courtesy of Dan Klein at

More information

Aligning English Strings with Abstract Meaning Representation Graphs

Aligning English Strings with Abstract Meaning Representation Graphs Aligning English Strings with Abstract Meaning Representation Graphs Nima Pourdamghani, Yang Gao, Ulf Hermjakob, Kevin Knight Information Sciences Institute Department of Computer Science University of

More information

The QCN System for Egyptian Arabic to English Machine Translation. Presenter: Hassan Sajjad

The QCN System for Egyptian Arabic to English Machine Translation. Presenter: Hassan Sajjad The QCN System for Egyptian Arabic to English Machine Translation Presenter: Hassan Sajjad QCN Collaboration Ahmed El Kholy Wael Salloum Nizar Habash Ahmed Abdelali Nadir Durrani Francisco Guzmán Preslav

More information

Dynamic Time Warping

Dynamic Time Warping Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. Dynamic Time Warping Dr Philip Jackson Acoustic features Distance measures Pattern matching Distortion penalties DTW

More information

CS 188: Artificial Intelligence Fall Machine Learning

CS 188: Artificial Intelligence Fall Machine Learning CS 188: Artificial Intelligence Fall 2007 Lecture 23: Naïve Bayes 11/15/2007 Dan Klein UC Berkeley Machine Learning Up till now: how to reason or make decisions using a model Machine learning: how to select

More information

3 Related Work. 2 Motivation. the alignment in either of the two directional alignments.

3 Related Work. 2 Motivation. the alignment in either of the two directional alignments. PACLIC 30 Proceedings Yet Another Symmetrical and Real-time Word Alignment Method: Hierarchical Sub-sentential Alignment using F-measure Hao Wang, Yves Lepage Graduate School of Information, Production

More information

CS 188: Artificial Intelligence Fall 2011

CS 188: Artificial Intelligence Fall 2011 CS 188: Artificial Intelligence Fall 2011 Lecture 21: ML: Naïve Bayes 11/10/2011 Dan Klein UC Berkeley Example: Spam Filter Input: email Output: spam/ham Setup: Get a large collection of example emails,

More information

Pictorial Structures for Object Recognition

Pictorial Structures for Object Recognition Pictorial Structures for Object Recognition Felzenszwalb and Huttenlocher Presented by Stephen Krotosky Pictorial Structures Introduced by Fischler and Elschlager in 1973 Objects are modeled by a collection

More information

The HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10

The HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10 The HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10 Patrick Simianer, Gesa Stupperich, Laura Jehl, Katharina Wäschle, Artem Sokolov, Stefan Riezler Institute for Computational Linguistics,

More information

CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring Announcements CS 188: Artificial Intelligence Spring 2011 Lecture 20: Naïve Bayes 4/11/2011 Pieter Abbeel UC Berkeley Slides adapted from Dan Klein. W4 due right now Announcements P4 out, due Friday First contest competition

More information

Mention Detection: Heuristics for the OntoNotes annotations

Mention Detection: Heuristics for the OntoNotes annotations Mention Detection: Heuristics for the OntoNotes annotations Jonathan K. Kummerfeld, Mohit Bansal, David Burkett and Dan Klein Computer Science Division University of California at Berkeley {jkk,mbansal,dburkett,klein}@cs.berkeley.edu

More information

Homework 1. Leaderboard. Read through, submit the default output. Time for questions on Tuesday

Homework 1. Leaderboard. Read through, submit the default output. Time for questions on Tuesday Homework 1 Leaderboard Read through, submit the default output Time for questions on Tuesday Agenda Focus on Homework 1 Review IBM Models 1 & 2 Inference (compute best alignment from a corpus given model

More information

Learning The Lexicon!

Learning The Lexicon! Learning The Lexicon! A Pronunciation Mixture Model! Ian McGraw! (imcgraw@mit.edu)! Ibrahim Badr Jim Glass! Computer Science and Artificial Intelligence Lab! Massachusetts Institute of Technology! Cambridge,

More information

Forest-Based Search Algorithms

Forest-Based Search Algorithms Forest-Based Search Algorithms for Parsing and Machine Translation Liang Huang University of Pennsylvania Google Research, March 14th, 2008 Search in NLP is not trivial! I saw her duck. Aravind Joshi 2

More information

Parsing with Dynamic Programming

Parsing with Dynamic Programming CS11-747 Neural Networks for NLP Parsing with Dynamic Programming Graham Neubig Site https://phontron.com/class/nn4nlp2017/ Two Types of Linguistic Structure Dependency: focus on relations between words

More information

Optimizing feature representation for speaker diarization using PCA and LDA

Optimizing feature representation for speaker diarization using PCA and LDA Optimizing feature representation for speaker diarization using PCA and LDA itsikv@netvision.net.il Jean-Francois Bonastre jean-francois.bonastre@univ-avignon.fr Outline Speaker Diarization what is it?

More information

INF4820, Algorithms for AI and NLP: Hierarchical Clustering

INF4820, Algorithms for AI and NLP: Hierarchical Clustering INF4820, Algorithms for AI and NLP: Hierarchical Clustering Erik Velldal University of Oslo Sept. 25, 2012 Agenda Topics we covered last week Evaluating classifiers Accuracy, precision, recall and F-score

More information

Fully Distributed EM for Very Large Datasets

Fully Distributed EM for Very Large Datasets Fully Distributed for Very Large Datasets Jason Wolfe Aria Haghighi Dan Klein Computer Science Division UC Berkeley Overview Task: unsupervised learning via الولاياتالمتحدة المتحدة المتحدة مو تمر الولايات

More information

Learning Latent Linguistic Structure to Optimize End Tasks. David A. Smith with Jason Naradowsky and Xiaoye Tiger Wu

Learning Latent Linguistic Structure to Optimize End Tasks. David A. Smith with Jason Naradowsky and Xiaoye Tiger Wu Learning Latent Linguistic Structure to Optimize End Tasks David A. Smith with Jason Naradowsky and Xiaoye Tiger Wu 12 October 2012 Learning Latent Linguistic Structure to Optimize End Tasks David A. Smith

More information

Computer Science February Homework Assignment #2 Due: Friday, 9 March 2018 at 19h00 (7 PM),

Computer Science February Homework Assignment #2 Due: Friday, 9 March 2018 at 19h00 (7 PM), Computer Science 401 13 February 2018 St. George Campus University of Toronto Homework Assignment #2 Due: Friday, 9 March 2018 at 19h00 (7 PM), Statistical Machine Translation TA: Mohamed Abdalla (mohamed.abdalla@mail.utoronto.ca);

More information

Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora

Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora Chris Callison-Burch David Talbot Miles Osborne School on nformatics University of Edinburgh 2 Buccleuch Place Edinburgh

More information

Time series, HMMs, Kalman Filters

Time series, HMMs, Kalman Filters Classic HMM tutorial see class website: *L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol.77, No.2, pp.257--286, 1989. Time series,

More information

COMP90051 Statistical Machine Learning

COMP90051 Statistical Machine Learning COMP90051 Statistical Machine Learning Semester 2, 2016 Lecturer: Trevor Cohn 20. PGM Representation Next Lectures Representation of joint distributions Conditional/marginal independence * Directed vs

More information

Marcello Federico MMT Srl / FBK Trento, Italy

Marcello Federico MMT Srl / FBK Trento, Italy Marcello Federico MMT Srl / FBK Trento, Italy Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 Page 207 Symbiotic Human and Machine Translation

More information

Stephen Scott.

Stephen Scott. 1 / 33 sscott@cse.unl.edu 2 / 33 Start with a set of sequences In each column, residues are homolgous Residues occupy similar positions in 3D structure Residues diverge from a common ancestral residue

More information

Introduction to HTK Toolkit

Introduction to HTK Toolkit Introduction to HTK Toolkit Berlin Chen 2003 Reference: - The HTK Book, Version 3.2 Outline An Overview of HTK HTK Processing Stages Data Preparation Tools Training Tools Testing Tools Analysis Tools Homework:

More information

Advanced Search Algorithms

Advanced Search Algorithms CS11-747 Neural Networks for NLP Advanced Search Algorithms Daniel Clothiaux https://phontron.com/class/nn4nlp2017/ Why search? So far, decoding has mostly been greedy Chose the most likely output from

More information

Discriminative Training with Perceptron Algorithm for POS Tagging Task

Discriminative Training with Perceptron Algorithm for POS Tagging Task Discriminative Training with Perceptron Algorithm for POS Tagging Task Mahsa Yarmohammadi Center for Spoken Language Understanding Oregon Health & Science University Portland, Oregon yarmoham@ohsu.edu

More information

Reversing Morphological Tokenization in English-to-Arabic SMT

Reversing Morphological Tokenization in English-to-Arabic SMT Reversing Morphological Tokenization in English-to-Arabic SMT Mohammad Salameh Colin Cherry Grzegorz Kondrak Department of Computing Science National Research Council Canada University of Alberta 1200

More information

Semi-Supervised Learning of Named Entity Substructure

Semi-Supervised Learning of Named Entity Substructure Semi-Supervised Learning of Named Entity Substructure Alden Timme aotimme@stanford.edu CS229 Final Project Advisor: Richard Socher richard@socher.org Abstract The goal of this project was two-fold: (1)

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 01-31-017 Outline Background Defining proximity Clustering methods Determining number of clusters Comparing two solutions Cluster analysis as unsupervised Learning

More information

Comparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation

Comparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation Comparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation Markus Dreyer, Keith Hall, and Sanjeev Khudanpur Center for Language and Speech Processing Johns Hopkins University 300

More information

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based

More information

AT&T: The Tag&Parse Approach to Semantic Parsing of Robot Spatial Commands

AT&T: The Tag&Parse Approach to Semantic Parsing of Robot Spatial Commands AT&T: The Tag&Parse Approach to Semantic Parsing of Robot Spatial Commands Svetlana Stoyanchev, Hyuckchul Jung, John Chen, Srinivas Bangalore AT&T Labs Research 1 AT&T Way Bedminster NJ 07921 {sveta,hjung,jchen,srini}@research.att.com

More information

Ranked Retrieval. Evaluation in IR. One option is to average the precision scores at discrete. points on the ROC curve But which points?

Ranked Retrieval. Evaluation in IR. One option is to average the precision scores at discrete. points on the ROC curve But which points? Ranked Retrieval One option is to average the precision scores at discrete Precision 100% 0% More junk 100% Everything points on the ROC curve But which points? Recall We want to evaluate the system, not

More information

Hierarchical Phrase-Based Translation with Weighted Finite State Transducers

Hierarchical Phrase-Based Translation with Weighted Finite State Transducers Hierarchical Phrase-Based Translation with Weighted Finite State Transducers Gonzalo Iglesias Adrià de Gispert Eduardo R. Banga William Byrne University of Vigo. Dept. of Signal Processing and Communications.

More information

Complex Prediction Problems

Complex Prediction Problems Problems A novel approach to multiple Structured Output Prediction Max-Planck Institute ECML HLIE08 Information Extraction Extract structured information from unstructured data Typical subtasks Named Entity

More information

Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices

Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices Shankar Kumar 1 and Wolfgang Macherey 1 and Chris Dyer 2 and Franz Och 1 1 Google Inc. 1600

More information

On LM Heuristics for the Cube Growing Algorithm

On LM Heuristics for the Cube Growing Algorithm On LM Heuristics for the Cube Growing Algorithm David Vilar and Hermann Ney Lehrstuhl für Informatik 6 RWTH Aachen University 52056 Aachen, Germany {vilar,ney}@informatik.rwth-aachen.de Abstract Current

More information

A Simple Approach to the Design of Site-Level Extractors Using Domain-Centric Principles

A Simple Approach to the Design of Site-Level Extractors Using Domain-Centric Principles A Simple Approach to the Design of Site-Level Extractors Using Domain-Centric Principles Chong Long Yahoo! Labs, Beijing chongl@yahoo-inc.com Xiubo Geng Yahoo! Labs, Beijing gengxb@yahoo-inc.com Sathiya

More information

Lightly Supervised Learning of Procedural Dialog Systems

Lightly Supervised Learning of Procedural Dialog Systems Lightly Supervised Learning of Procedural Dialog Systems Svitlana Volkova in collaboration with Bill Dolan, Pallavi Choudhury, Luke Zettlemoyer, Chris Quirk Outline Motivation Data Query Logs Office.com

More information

Large-Scale Discriminative Training for Statistical Machine Translation Using Held-Out Line Search

Large-Scale Discriminative Training for Statistical Machine Translation Using Held-Out Line Search Large-Scale Discriminative Training for Statistical Machine Translation Using Held-Out Line Search Jeffrey Flanigan Chris Dyer Jaime Carbonell Language Technologies Institute Carnegie Mellon University

More information

CRF Feature Induction

CRF Feature Induction CRF Feature Induction Andrew McCallum Efficiently Inducing Features of Conditional Random Fields Kuzman Ganchev 1 Introduction Basic Idea Aside: Transformation Based Learning Notation/CRF Review 2 Arbitrary

More information